University of Virginia, Department of Computer Science
CS655: Programming Languages
Spring 2000
Problem Set 3: Subtyping and Proof-Carrying Code
Out: 21 March 2000
Due: Tuesday, 11 April in class

Problem set answers may be hand-written, but only if your handwriting is neat enough for us to read.

Warning: This problem set is believed to be substantially harder than Problem Set 2. You are encouraged to start thinking about these problems early (everything relevant to this problem set has already been covered in class), and after you have tried them on your own to collaborate with your classmates. (Remember to list everyone you collaborated with.)

Note: This problem set contains a large amount of background material. We recommend that you skim through the whole problem set before trying to answer any of the questions.

One of the (many) reasons Java programs run slowly is the overhead of run-time checking. A smart compiler can eliminate much of the unnecessary run-time checking, but this is only useful if it can also construct a proof that convinces an untrusting JavaVM (which doesn't see the source code) that it is safe to execute the program without the run-time checks. The goal of this problem set is to use axiomatic semantics and proof-carrying code techniques to remove run-time checking from a Java program.

Background

Consider the following Java class:

public class Scrunch {
  public static String Scrunch (String a[]) {
    Object [] ar = new Object [100];

    for (int i = 0; i < a.length; i++) {
      ar[i] = a[i];
    }

    String s = "";

    for (int i = 0; i < a.length; i++) {
      s = s.concat ((String) ar[i]);
    }

    return s;
  }
}
For simplicity, we use a code example that would not appear in a (reasonable) Java program. In a real Java program, this would make more sense if ar were a Vector (which because of the lack of parameterized types in Java must be a Vector of Objects) instead of an Object [].
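
As a hedged illustration of the remark above, the same method could be written with a Vector. This is only a sketch: the class name ScrunchVector is hypothetical, and Vector<Object> stands in for the unparameterized Vector of Objects that the Java of the time would use. The cast on each element read back from the Vector is still required.

```java
import java.util.Vector;

class ScrunchVector {
    // Hypothetical variant of Scrunch that copies into a Vector instead of an Object[].
    static String scrunch(String[] a) {
        // Vector<Object> models the Vector of Objects described in the text.
        Vector<Object> v = new Vector<Object>();

        for (int i = 0; i < a.length; i++) {
            v.addElement(a[i]);
        }

        String s = "";

        for (int i = 0; i < a.length; i++) {
            // Elements come back typed as Object, so a checked cast is still needed.
            s = s.concat((String) v.elementAt(i));
        }

        return s;
    }

    public static void main(String[] args) {
        System.out.println(scrunch(new String[] {"ab", "cd", "ef"}));
    }
}
```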

A Java compiler (Sun's JDK) produces the following byte codes for the Scrunch method (you can see this for yourself by running javap -c -verbose class):

Method java.lang.String Scrunch(java.lang.String[])
   0 bipush 100        Push the constant 100 on the stack
   2 anewarray class <Class java.lang.Object>
                       Pop the top of the stack, and construct a new array of element type java.lang.Object of that size
   5 astore_1          Store the top of the stack (the array we just created) in local 1 (corresponds to ar)
   6 iconst_0          Push the constant 0
   7 istore_2          Store it in local 2 (corresponds to i)
   8 goto 20           Jump to instruction numbered 20
  11 aload_1           Push local 1 (ar)
  12 iload_2           Push local 2 (i)
  13 aload_0           Push local 0 (the parameter, a)
  14 iload_2           Push local 2 (i)
                       The top of stack is now: [i, a, i, ar, ...]
  15 aaload            Pop i (top), a (next) from stack; push a[i]
                       aaload performs run-time bounds checking
  16 aastore           Pop a[i] (top), i (next), ar (next) from stack; store a[i] in ar[i]
  17 iinc 2 1          Increment the integer in local 2 by one (i++)
  20 iload_2           Push the integer in local 2 (i)
  21 aload_0           Push the object in local 0 (the parameter, a)
  22 arraylength       Replace stack top with length of array
  23 if_icmplt 11      Pop x (top), y (next) from stack; if y < x jump to 11 (beginning of loop)
  26 ldc <String "">   Push the constant String ""
  28 astore_3          Store top of stack in local 3 (s)
  29 iconst_0          Push constant 0
  30 istore 4          Store it in local 4 (i in second loop)
  32 goto 50           Jump to instruction numbered 50
  35 aload_3           Push local 3 (s)
  36 aload_1           Push local 1 (ar)
  37 iload 4           Push local 4 (i)
  39 aaload            Pop i (top), ar (next) from stack; push ar[i]
  40 checkcast <Class java.lang.String>
                       If the run-time type of the top of the stack (ar[i]) is not a subtype of java.lang.String, issue a run-time type error; otherwise, continue knowing its type is a subtype of java.lang.String
  43 invokevirtual <Method java.lang.String concat(java.lang.String)>
                       Invoke the method concat on the receiver object (s, pushed at instruction 35), which lies just below the argument on the stack. Note that if this object is an instance of a subtype of java.lang.String that overrides the concat method, the method in the subtype is called. Pass the top of the stack (the cast ar[i]) as the argument.
  46 astore_3          Store the result (which is put on top of the stack) in local 3 (s)
  47 iinc 4 1          Increment local 4 (i) by 1
  50 iload 4           Push local 4 (i)
  52 aload_0           Push local 0 (a)
  53 arraylength       Replace top of stack with its length (a.length)
  54 if_icmplt 35      If i < a.length go to 35 (continue loop)
  57 aload_3           Push s on stack
  58 areturn           Return to caller; result is on top of stack.

Eliminating Run-Time Checks

The generated byte codes have several unnecessary run-time checks:
  1. 15 aaload - the run-time bounds checking on the array load, a[i].
  2. 16 aastore - the run-time bounds checking on the array store to ar[i] is unnecessary if we know a.length <= 100; a smart compiler will check this once before the loop, and remove the checking inside the loop.
  3. 39 aaload - the run-time bounds checking on the array load, ar[i].
We will attempt to replace these with:
  1. 15 safe_aaload - an array load with no run-time bounds checking
  2. 16 safe_aastore - an array store with no run-time bounds checking (after inserting one check before the loop)
  3. 39 safe_aaload - an array load with no run-time bounds checking
These are not real JavaVM instructions, but one could imagine a future version of Java with a more sophisticated byte code verifier supporting them. We call JVML extended with the safe_ instructions JVML+safe.


1. (.10) VCGen
Write the VCGen function for JVML+safe. You may assume all the normal JVML instructions are checked by the bytecode verifier, so the only instructions that generate predicates are the new safe_ instructions.

To write the VCGen, we use a shorthand that uses arguments to represent the stack slots. You may assume the byte code verifier ensures the top stack slots match the argument types. You may assume no object is NULL.

The form of your VCGen should be:


VCGen (PC) =
   if Inst[PC] = safe_aaload <i: int> <a: array>
        <predicate for safe_aaload> /\ VCGen (PC + 1)
   ...
   else % all non-safe instructions
      VCGen (PC + 1)
         (Don't worry about falling off the end or multiple-word instructions.)
Complete VCGen with rules for the instructions in JVML+safe.

Optimized Code

Tortilla Systems' certifying optimizing compiler generates the following code for Scrunch:

Method java.lang.String Scrunch(java.lang.String[])
   0 bipush 100
   2 anewarray class <Class java.lang.Object>
   5 astore_1
   6 iconst_0
   7 istore_2
   8 check aload_0.length <= 100
   9 invariant aload_0.length <= 100 /\ iload_2 >= 0 /\ aload_1.length = 100
  10 iload_2
  11 aload_0
  12 arraylength
  13 if_icmpge 22
  14 aload_1
  15 iload_2
  16 aload_0
  17 iload_2
  18 safe_aaload
  19 safe_aastore
  20 iinc 2 1
  21 goto 9
  22 ldc <String "">
  23 astore_3
  24 iconst_0
  25 istore 4
  26 invariant ???
  27 iload 4
  28 aload_0
  29 arraylength
  30 if_icmpge 40
  31 aload_3
  32 aload_1
  33 iload 4
  34 safe_aaload
  35 checkcast <Class java.lang.String>
  36 invokevirtual <Method java.lang.String concat(java.lang.String)>
  37 astore_3
  38 iinc 4 1
  39 goto 26
  40 aload_3
  41 areturn
This code differs from the code produced by Sun's JDK in four ways:
  1. Loops have been rearranged to make analysis easier.
  2. The check instruction at instruction 8 has been added. A check instruction performs a run-time check that its predicate holds. Immediately after a check instruction, the predicate can safely be assumed to be true, since execution would terminate if it were false.
  3. An invariant has been introduced for each loop. This follows the PCC requirement that all back-edges point to instructions with associated invariants. The invariant for instruction 26 is not shown (you will produce it in question 3).
  4. The slow aaload and aastore instructions in the original program have been replaced with the faster instructions safe_aaload and safe_aastore.
We assume Java objects cannot be NULL. Without this assumption, we would need to include tests for null in the check and invariant clauses.
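
At the source level, the optimized bytecode corresponds roughly to the sketch below. It is an illustration under stated assumptions, not the compiler's output: the class name ScrunchHoisted is hypothetical, and throwing IllegalStateException is only one way to model a failed check (the text says only that execution terminates). In real Java source the array accesses would still be bounds-checked by the JavaVM; the comments mark where the safe_ instructions sit in the bytecode.

```java
class ScrunchHoisted {
    static String scrunch(String[] a) {
        Object[] ar = new Object[100];

        // check a.length <= 100 (instruction 8): performed once, before the loop.
        if (!(a.length <= 100)) {
            throw new IllegalStateException("check failed: a.length <= 100");
        }

        // invariant (instruction 9): a.length <= 100 /\ i >= 0 /\ ar.length = 100
        for (int i = 0; i < a.length; i++) {
            ar[i] = a[i]; // safe_aaload and safe_aastore in the optimized bytecode
        }

        String s = "";

        // The invariant for this loop (instruction 26) is left open, as in the listing.
        for (int i = 0; i < a.length; i++) {
            s = s.concat((String) ar[i]); // safe_aaload, then checkcast
        }

        return s;
    }

    public static void main(String[] args) {
        System.out.println(scrunch(new String[] {"x", "y"}));
        try {
            scrunch(new String[101]); // too long: rejected by the single hoisted check
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```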

First Loop

This section walks through most of the safety proof required for the first loop. In question 3, you will need to derive the invariant and proof for the second loop yourself.

For simplicity, we can view the loop between instructions 9 and 21 as:

         while i < a.length do
            ar[i] := a[i] % safe array loads and stores
            i := i + 1
         end

To prove the safe_aaload in instruction 18 is safe, we need to show:


VCGen (18 safe_aaload <i> <a>) ==
                  i >= 0 /\ i < a.length /\ VCGen (19) 
(Hint: you should check that your answer to question 1 would produce this predicate.)
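
A minimal sketch of how the form of VCGen given in question 1 yields the predicate above for safe_aaload. It assumes straight-line code, represents predicates as strings, and treats falling off the end as true; the names VCGenSketch and Inst are hypothetical. Rules for the other safe_ instructions are deliberately omitted, so the sketch rejects them rather than guessing.

```java
import java.util.ArrayList;
import java.util.List;

class VCGenSketch {
    // One instruction plus the symbolic names of the stack slots it consumes
    // (the <i: int> <a: array> shorthand from question 1).
    static final class Inst {
        final String opcode;
        final String[] args;

        Inst(String opcode, String... args) {
            this.opcode = opcode;
            this.args = args;
        }
    }

    // VCGen(pc) for straight-line code, following the recursive form in question 1.
    static String vcgen(List<Inst> code, int pc) {
        if (pc >= code.size()) {
            return "true"; // assumption: nothing left to verify past the end
        }
        Inst inst = code.get(pc);
        String rest = vcgen(code, pc + 1);
        if (inst.opcode.equals("safe_aaload")) {
            String i = inst.args[0];
            String a = inst.args[1];
            return i + " >= 0 /\\ " + i + " < " + a + ".length /\\ " + rest;
        }
        if (inst.opcode.startsWith("safe_")) {
            throw new UnsupportedOperationException("no rule in this sketch for " + inst.opcode);
        }
        return rest; // all non-safe instructions generate no predicate
    }

    public static void main(String[] args) {
        List<Inst> code = new ArrayList<Inst>();
        code.add(new Inst("iload_2"));
        code.add(new Inst("safe_aaload", "i", "a"));
        code.add(new Inst("iinc"));
        System.out.println(vcgen(code, 0)); // prints: i >= 0 /\ i < a.length /\ true
    }
}
```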

We assume (for now) VCGen (19) is true, and show i >= 0 /\ i < a.length.

Since the invariant was provided by the untrustworthy code supplier, we cannot assume it is correct. Instead, we must prove the invariant holds. Then, we use the invariant to prove VCGen(18).

The axiomatic semantics partial correctness rule for while (since it's a safety proof, we don't care about showing termination) is:

P => Inv,
Inv { Pred } Inv,
Inv /\ Pred { Statement } Inv,
(Inv /\ ~Pred) => Q,
___________________________________

P { while Pred do Statement end } Q
Inv is given by instruction 9:
   9 invariant aload_0.length <= 100 /\ iload_2 >= 0 /\ aload_1.length = 100
We can rewrite this as:
       a.length <= 100 /\ i >= 0 /\ ar.length = 100

P can be any predicate that we can prove from the code before the loop. The check clause gives a.length <= 100, instructions 0-5 give ar.length = 100 and instructions 6-7 give i = 0. This is argued informally, but could be shown using axiomatic semantics rules for assignment along with a specification of anewarray. This gives,

        P == a.length <= 100 /\ ar.length = 100 /\ i = 0

Q is what we need to be true after the loop. Since we won't know this until doing the second loop, we can start with the weakest possible post-condition, Q = true. In question 4, you will find a stronger post-condition is needed, and change Q.

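We prove each antecedent clause in turn:

  1. P => Inv: P gives a.length <= 100 and ar.length = 100 directly, and i = 0 implies i >= 0.
  2. Evaluating Pred preserves Inv: the test i < a.length has no side effects, so Inv still holds after it.
  3. Inv /\ Pred { Statement } Inv: the body does not change a.length or ar.length, and it replaces i with i + 1. Since 0 <= i < a.length <= 100 before the body, 0 <= i + 1 <= 100 after it, so i >= 0 still holds.
  4. (Inv /\ ~Pred) => Q: Q = true, so this holds trivially.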

Now, we can use the invariant to prove VCGen (18). At instruction 18, we can assume the invariant is true (because of the above proof, and nothing has been modified since the beginning of the loop), and i < a.length because of the loop predicate. We need to show this is enough to satisfy VCGen (18), assuming VCGen (19):
         a.length <= 100 /\ i >= 0 /\ ar.length = 100 /\ (i < a.length)
     ==> i >= 0 /\ i < a.length /\ VCGen (19)
This is trivially true.
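
The implications used in this argument can be sanity-checked by brute force over a small range of states. The sketch below is not a proof, and everything in it beyond the predicates themselves is an assumption: the class name FirstLoopCheck, the ranges tried, and modelling a state by the three integers a.length, ar.length and i. The clause (Inv /\ ~Pred) => Q is not checked because Q = true makes it trivial.

```java
class FirstLoopCheck {
    // P: what holds on entry to the first loop.
    static boolean p(int aLen, int arLen, int i) {
        return aLen <= 100 && arLen == 100 && i == 0;
    }

    // Inv: the invariant from instruction 9.
    static boolean inv(int aLen, int arLen, int i) {
        return aLen <= 100 && i >= 0 && arLen == 100;
    }

    // Pred: the loop guard i < a.length.
    static boolean pred(int aLen, int i) {
        return i < aLen;
    }

    // Checks each implication over a small, arbitrary range of states.
    static boolean allHold() {
        for (int aLen = 0; aLen <= 120; aLen++) {
            for (int arLen = 90; arLen <= 110; arLen++) {
                for (int i = -5; i <= 120; i++) {
                    // P => Inv
                    if (p(aLen, arLen, i) && !inv(aLen, arLen, i)) return false;
                    // Inv /\ Pred { Statement } Inv: the body changes only i
                    // (and the contents of ar), so Inv must hold at i + 1.
                    if (inv(aLen, arLen, i) && pred(aLen, i) && !inv(aLen, arLen, i + 1)) return false;
                    // Inv /\ Pred => i >= 0 /\ i < a.length (the safe_aaload predicate)
                    if (inv(aLen, arLen, i) && pred(aLen, i) && !(i >= 0 && i < aLen)) return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allHold()); // prints: true
    }
}
```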


2. (.10) Instruction 19

a. (.05) Predicate.
Show the verification predicate your VCGen generates for instruction 19: safe_aastore <a[i]> <i> <ar>.

b. (.05) Proof.
Show the proof that VCGen (19) is satisfied (assuming VCGen (20) is true). You may use everything that was used in the proof of VCGen (18) above.


Second Loop

For simplicity, we can view the loop between instructions 26 and 39 as:
         while i < a.length do
            s := s.concat ((String) ar[i]) % safe load
            i := i + 1
         end
You should construct your arguments at the same level of detail as the proof for the first loop above.


3. (.40) Safe Load
a. (.05) Verification Predicates.
Show the verification predicate for instruction 34, safe_aaload <i> <ar>.

b. (.15) Invariant.
Write out a loop invariant (missing from instruction 26) that will be sufficient to prove your verification predicate for instruction 34.

c. (.20) Proof.
Use the axiomatic semantics rule for while to prove VCGen (34) is true (assuming VCGen (35)). Your proof should follow the structure of the proof of VCGen (18) --- you should prove the invariant holds first, and then use the invariant to prove VCGen (34).

4. (.20/.50 with Challenge) Safe Cast
The next generation Tortilla Systems virtual machine adds an instruction safe_cast <type>. Unlike checkcast, which does (expensive) run-time checking to ensure the run-time type satisfies the cast constraint, safe_cast implies the type constraint can be verified statically. Our goal is to replace the checkcast in instruction 35 with

  35 safe_cast <Class java.lang.String>
a. (.05) VCGen
Show the clause added to VCGen to handle safe_cast.

b. (.05)
Show VCGen (35), the verification predicate generated for the safe_cast version of instruction 35.

c. (.10)
Prove VCGen (35) holds for the second loop. You will need to strengthen the invariant, and assume a stronger pre-condition on entry to the loop.

d. (.30) (Challenge)
Prove that the pre-condition you used for the second loop is true. You will need to strengthen the invariant for the first loop.

5. (.20) Subtyping
Being oxygen-deprived at the top of the Eiffel Tower, Mertrude Bryer suggests adding the following typing judgments to Java:

          S <= T                             (<= means is a subtype of)
   ____________________                      [monotonic-arrays]

   array[S] <= array[T]


  P_1 <= Q_1, ..., P_n <= Q_n, S <= T
_________________________________________    [monotonic-procedures]

     proc (P_1, ..., P_n) returns (S)
  <= proc (Q_1, ... , Q_n) returns (T)

Show that an attacker could exploit these rules by passing an argument to Scrunch that leads to a type safety violation. This means it passes the Java type checker, but contains a type error that is not detected at run time.
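
For context, standard Java already treats arrays covariantly and relies on a run-time type check on every aastore to stay safe. The sketch below (the class name ArrayStoreDemo is hypothetical) only illustrates that existing check; it is not an answer to the question, which concerns what happens when no such run-time check catches the error.

```java
class ArrayStoreDemo {
    // Returns true if storing an Integer through an Object[] alias of a String[]
    // is rejected at run time.
    static boolean storeIsRejected() {
        String[] strings = new String[1];
        Object[] objects = strings; // accepted statically: Java arrays are covariant
        try {
            objects[0] = Integer.valueOf(42); // aastore checks the element type at run time
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeIsRejected()); // prints: true
    }
}
```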


cs655-staff@cs.virginia.edu
Last modified: Mon Feb 26 12:48:14 2001