CS655 - Problem Set 2: Types

University of Virginia, Department of Computer Science
CS655: Programming Languages
Spring 2000

Problem Set 2: Types

Out: 17 Feb 2000
Due: Thursday, 2 March in class

Problem set answers may be hand-written, but only if your handwriting is neat enough for us to read.

Warning: This problem set is believed to be substantially harder than Problem Set 1. You are encouraged to start thinking about these problems early (everything relevant to this problem set has already been covered in class) and, after you have tried them on your own, to collaborate with your classmates. (Remember to list everyone you collaborated with.)

Part I: Typing UmniVML

Make sure you understand Sections 1-4 of the Stata and Abadi paper before attempting this part.

Sammy Lucko, President of Colossal Software, has concluded that the stack-based virtual machine used by JVML is too inefficient and inflexible for the mobile code needs of future Internet devices, and has decided to create a register and heap-based virtual machine language UmniVML. Since UmniVML will run in toasters and can openers, security is very important and the security of UmniVML programs depends on type safety.

We start with a subset of UmniVML denoted UmniVML0. UmniVML0 has one primitive type (integer) and recursive reference types. A reference type is described as ref (type), so the type denoted by ref (ref (integer)) is a reference to a reference to an integer (described syntactically as REF REF INTEGER). The grammar for UmniVML0 is:

Program ::= TypeHint* Statement*

TypeHint ::= TYPE MemoryLocation Type

Location MemoryLocation is declared to contain a Type. The type of MemoryLocation is ref (type denoted by Type).

Type ::= INTEGER

Denotes the type integer.

| REF Type

Denotes the type ref (type denoted by Type).

Statement ::= STORE Expression_m Expression_v

Store the value of Expression_v in the memory location denoted by Expression_m. Expression_m must have type ref (type of Expression_v).

| READ Expression

Get an integer input from the user. Store it in location denoted by Expression. Expression must have type ref(integer).

| PRINT Expression

Display value of Expression on screen. Expression must have type integer.

| HALT

Terminate execution. The result of the execution is the value stored in M0.

Expression ::= ADD Expression_1 Expression_2

Value is the integer sum of the values of Expression_1 and Expression_2. Expression_1 and Expression_2 must have type integer. Value has type integer.

| REF Expression

Value is a reference to location defined by Expression. Expression must have type ref (T). Value has type ref (ref (T)).

| DEREF Expression

Value is the value in location corresponding to Expression. Expression must have type ref (T). Value has type T.

| IntLiteral

Value is the value denoted by the integer literal. Value has type integer.

| MemoryLocation

Value is the memory location denoted by the location literal. Value has type ref (type contained in MemoryLocation).

MemoryLocation ::= M[0-9][0-9]*

IntLiteral ::= [-]?[0-9][0-9]*
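
As a reading aid (not part of the original handout), the following Python sketch shows one way to read the two lexical classes and the Type syntax above. The names MEMORY_LOCATION, INT_LITERAL, type_denoted, and type_of_location are hypothetical; the regular expressions are transcribed from the grammar, and types are represented as plain strings purely for illustration.

```python
import re

# Lexical classes transcribed from the grammar above.
MEMORY_LOCATION = re.compile(r"M[0-9][0-9]*")
INT_LITERAL = re.compile(r"-?[0-9][0-9]*")


def type_denoted(tokens):
    """Return the type denoted by a Type phrase such as ['REF', 'INTEGER']."""
    if tokens == ["INTEGER"]:
        return "integer"
    if len(tokens) > 1 and tokens[0] == "REF":
        return "ref (" + type_denoted(tokens[1:]) + ")"
    raise ValueError("not a Type: " + " ".join(tokens))


def type_of_location(hint):
    """Return the type of the location declared by a TypeHint line.

    A hint 'TYPE M Type' says M contains a Type, so M itself has
    type ref (type denoted by Type).
    """
    keyword, location, *type_tokens = hint.split()
    if keyword != "TYPE" or not MEMORY_LOCATION.fullmatch(location):
        raise ValueError("not a TypeHint: " + hint)
    return "ref (" + type_denoted(type_tokens) + ")"


print(type_denoted("REF REF INTEGER".split()))   # ref (ref (integer))
print(type_of_location("TYPE M2 REF INTEGER"))   # ref (ref (integer))
```

Note that a location's type always has one more ref than the Type written in its hint, which is why M2 above has type ref (ref (integer)).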

1. (.02) Understanding UmniVML0

Consider the UmniVML0 program:

          [h0] TYPE M0 INTEGER      ;;; Type of M0 is ref (integer)
          [h1] TYPE M1 INTEGER
          [h2] TYPE M2 REF INTEGER
          [0]  READ M0
          [1]  READ M1
          [2]  STORE M2 M0
          [3]  PRINT DEREF M2
          [4]  HALT

Statement 3 contains a type error. Describe it and show how to correct it.

2. (.23) Operational (Dynamic) Semantics
Develop an operational semantics for UmniVML0. Your operational semantics should end in a stuck state if the program contains a type error.

(a) (.05) Machine Description
Describe the virtual machine you will use for your semantics, and how you will describe its configurations.

(b) (.06) Initial Configuration
Show how an UmniVML0 program is mapped to an initial configuration.

(c) (.12) Transition Rules

Show the transition rules for:

STORE MemoryLocation IntLiteral
STORE MemoryLocation MemoryLocation
ADD IntLiteral IntLiteral
REF IntLiteral
REF MemoryLocation
DEREF MemoryLocation

You should write these rules generally enough that they suffice to simulate any UmniVML0 program. For example, your rules should be able to handle statements like (parentheses added for readability; they are not part of the language): STORE (DEREF M4) (ADD (DEREF M5) (ADD 23 (DEREF (DEREF M7)))).
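
Because every keyword takes a fixed number of operands, the prefix form can be grouped without any parentheses. The Python sketch below (a reading aid, not part of the original handout) recovers the grouping of the example statement. The names EXPRESSION_ARITY, STATEMENT_ARITY, parse_expression, and parse_statement are hypothetical, and the sketch only groups tokens: it does not check that leaves are valid literals, and it says nothing about types or evaluation.

```python
# Number of operand expressions each keyword takes, read off the grammar.
EXPRESSION_ARITY = {"ADD": 2, "REF": 1, "DEREF": 1}
STATEMENT_ARITY = {"STORE": 2, "READ": 1, "PRINT": 1, "HALT": 0}


def parse_expression(tokens, pos):
    """Parse one Expression starting at tokens[pos]; return (tree, next_pos)."""
    head = tokens[pos]
    pos += 1
    if head not in EXPRESSION_ARITY:
        return head, pos  # IntLiteral or MemoryLocation: a leaf (not validated)
    operands = []
    for _ in range(EXPRESSION_ARITY[head]):
        operand, pos = parse_expression(tokens, pos)
        operands.append(operand)
    return (head, *operands), pos


def parse_statement(line):
    """Parse one Statement written in prefix form into a nested tuple."""
    tokens = line.split()
    head = tokens[0]
    pos = 1
    operands = []
    for _ in range(STATEMENT_ARITY[head]):
        operand, pos = parse_expression(tokens, pos)
        operands.append(operand)
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens in: " + line)
    return (head, *operands)


tree = parse_statement("STORE DEREF M4 ADD DEREF M5 ADD 23 DEREF DEREF M7")
print(tree)
```

The nested tuple printed for the example has the same grouping as the parenthesized form shown above.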

3. (.25) Static Semantics
Show the static semantics for UmniVML0. Your static semantics should allow a type judgment to be proven for any type-safe statement, and not allow any type judgment proof for any non-type-safe statement. Your rules should be of the form used in Lecture 6.

4. (.25) (Challenge) Pointer Arithmetic
Sammy Lucko insists that efficient programs cannot be generated in UmniVML without pointer arithmetic. He proposes adding the ADDP expression (call the new language UmniVML1):

Expression ::= ADDP Expression_1 Expression_2

Value is a reference to the memory location that lies (value of Expression_2) slots beyond the location referred to by Expression_1; this location must have the same type as Expression_1. Expression_2 must have type integer. The type of the ADDP expression is the type of Expression_1.

An example of a well-typed UmniVML1 program (and a hint for question 1) is:

          [h0] TYPE M0 INTEGER      ;;; Type of M0 is ref (integer)
          [h1] TYPE M1 INTEGER
          [h2] TYPE M2 REF INTEGER
          [0]  READ M0
          [1]  READ M1
          [2]  STORE M2 M0
          [3]  STORE M2 ADDP DEREF M2 1   ;;; STORE M2 ADDP DEREF M2 2 would not be well-typed.
          [4]  HALT

(a) (.15) Dynamic Semantics
Show the changes you need to make to your operational semantics to support ADDP.

(b) (.10) Static Semantics
Show the changes you need to make to your static semantics to support ADDP, or argue clearly and convincingly that it is not possible to devise a static semantics that supports ADDP. If you figured out a general way to do this, how would the world be different?

Part II: Reasoning about Data Abstractions

After selling Colossal to NanoSoft, Sammy Lucko plans a trip around the world. He hires Colleen P. Hacker to develop a program he can use for analyzing different routes. Her program uses a weighted, directed graph (wdgraph) data structure. She finished most of the program, but was called away on urgent Federation business before finishing the wdgraph implementation. In this problem, you will finish her implementation and reason about its correctness.

Colleen P. Hacker's partial implementation of wdgraph is shown here (it is not necessary to understand all the code to answer the questions):

   wdgraph = cluster is create, add_node, add_edge, has_edge, find_least_path, unparse
      % Overview
      %    A wdgraph is a set of nodes (named by strings) connected by
      %    directed edges with positive integer weights.  For example,
      %
      %                         68
      %     Charlottesville .--------> . Richmond
      %           /\         <--------
      %            |            73
      %            | 110
      %            |
      %            . Washington DC
      %
      %    is a wdgraph where you can go from Charlottesville to Richmond in 68 minutes,
      %    Richmond to Charlottesville in 73 minutes, Washington to Charlottesville in
      %    110 minutes, but you can't go from Charlottesville to Washington.
      %
      % Abstraction Function
      %    A typical wdgraph is:
      %        [ (N_i0, N_j0, w_i0j0),
      %          (N_i1, N_j1, w_i1j1),
      %           ...,
      %          (N_in, N_jn, w_injn) ].
      %
      %    The abstraction function is A(r) = ???
      %
      % Rep Invariant
      %    Still working
      %

      edge = oneof [ noedge: null, weight: int ]
      as = array[string]
      ae = array[edge]
      aae = array[ae]
      rep = record [ nodes: as, edges: aae ]

      create = proc () returns (wdgraph)
         % effects Returns a new, empty wdgraph.
         return up(rep${nodes: as$new (), edges: aae$new ()})
         end create

      lookup_node = proc (g: cvt, node: string) returns (int) signals (not_found)
         % Note: not in cluster header, so it cannot be called from outside.
         % effects If there is a node in rep.nodes matching name node, returns
         %     its index; otherwise signals not_found.

         ... implementation not shown
         end lookup_node

      add_node = proc (g: cvt, node: string) signals (duplicate_node)
         % modifies g
         % effects Adds a new node to the wdgraph with no edges.  If there
         %     is already a matching node, signals duplicate_node.
         lookup_node (g, node)
            except when not_found:
               as$addh (g.nodes, node)
               % Add a new column by adding a new no edge to each row in edge matrix.
               for row: ae in aae$elements (g.edges) do
                   ae$addh (row, edge$make_noedge (nil))
                   end % for
               % Add a new row to the edge matrix for edges from the new node (currently none).
               aae$addh (g.edges, ae$fill (1, as$size (g.nodes), edge$make_noedge (nil)))
               return
            end
         % No exception means it's a duplicate!
         signal duplicate_node
         end add_node

      add_edge = proc (g: cvt, start: string, end: string, weight: int)
                       signals (duplicate_edge, missing_node)
         % modifies g
         % effects Adds an edge from start to end with weight weight to
         %     g.  Signals duplicate_edge if there is already an edge
         %     from start to end (in that direction) in g.  Signals missing_node
         %     if either start or end does not match a node in g.

         % needs implementation
         end add_edge

      find_least_path = proc (g: cvt, from_node: string, to_node: string)
                              returns (array[string]) signals (missing_node, no_path)
         % effects Returns the path from from_node to to_node in g with the
         %    lowest total edge weights.  Signals no_path if there is no path
         %    between from_node and to_node in g.  Signals missing_node if
         %    from_node or to_node is not in g.

         % ...
         end find_least_path

      has_edge = proc (g: cvt, from_node: string, to_node: string)
                       returns (bool) signals (missing_node (string))
         % effects Returns true iff there is an edge from from_node to to_node
         %    in g.  Signals missing_node if from_node or to_node is not in g.

         findex : int := lookup_node (g, from_node)
            except when not_found: signal missing_node (from_node) end
         tindex : int := lookup_node (g, to_node)
            except when not_found: signal missing_node (to_node) end
         return (edge$is_weight (g.edges [findex][tindex]))
         end has_edge

      unparse = proc (g: cvt) returns (string)
         % effects Returns a string representation of g.
         % ...
         end unparse

end wdgraph
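
For readers unfamiliar with CLU, the following Python sketch mirrors only create and add_node from the listing above, to show how the two parts of the rep grow together. It is an illustration under stated assumptions, not Colleen's code: the dictionary layout, the NOEDGE placeholder, and the use of ValueError in place of the CLU signal are all hypothetical stand-ins, and Python lists are indexed from 0 whereas the CLU arrays here start at 1. It deliberately leaves out add_edge.

```python
NOEDGE = None  # hypothetical stand-in for edge$make_noedge (nil)


def create():
    """Mirror of create: an empty node list and an empty edge matrix."""
    return {"nodes": [], "edges": []}


def add_node(g, node):
    """Mirror of add_node: add a column to every row, then add a new row."""
    if node in g["nodes"]:
        raise ValueError("duplicate_node")  # stands in for the CLU signal
    g["nodes"].append(node)
    # New column: one more "no edge" entry at the end of each existing row.
    for row in g["edges"]:
        row.append(NOEDGE)
    # New row: edges from the new node to every node (currently none).
    g["edges"].append([NOEDGE] * len(g["nodes"]))


g = create()
for name in ["Charlottesville", "Richmond", "Washington DC"]:
    add_node(g, name)
print(len(g["nodes"]), [len(row) for row in g["edges"]])  # 3 [3, 3, 3]
```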

5. (.05) Implementing add_edge

To demonstrate that you understand wdgraphs and Colleen's representation, write an implementation for add_edge. Your implementation need not be syntactically correct CLU, but it should be real code, not pseudocode.

6. (.20) Abstraction Function

(a) (.05) Abstract Description
Colleen's description of a typical wdgraph is not adequate. Show this clearly by illustrating two wdgraphs that are the same according to her representation, but will behave differently according to the specifications of the wdgraph operations.

(b) (.15) Abstraction Function
Suggest an adequate abstract description of a wdgraph and define an abstraction function for the given rep and your abstract description.

7. (.25) Rep Invariant

(a) (.10) Define the rep invariant for the wdgraph implementation.
(b) (.05) Prove that create returns a wdgraph that satisfies your rep invariant.
(c) (.05) Prove that add_node preserves your rep invariant.
(d) (.05) Prove that your add_edge implementation preserves your rep invariant.


cs655-staff@cs.virginia.edu
Last modified: Mon Feb 26 12:48:15 2001
