University of Virginia, Department of Computer Science
CS655: Programming Languages, Spring 2001

Problem Set 4:
Decaffeinating Java
Out: 10 April 2001
Due: Thursday, 19 April 2001 (in class)

Formal Semantics and Proof-Carrying Code

Purpose
This problem set will require you to apply several of the formal semantics techniques we have seen in class to the problem of improving the performance of Java programs.
Collaboration Policy
You may choose anyone you want to work with on this assignment. A similar assignment was used in last year's CS655; it would be dishonorable to use it to assist your work.

If you choose to work alone, you should turn in your own solution. If you choose to work with someone else, you should turn in a single solution that represents your combined work.

Warning
This problem set is believed to be long and difficult. You should start thinking about these problems early.
Optional Reminder
This assignment is optional. You should do as much or as little of it as you think would be worthwhile. For most of you, I believe it will be worthwhile to do all of this problem set, but you should use your best judgement to allocate your time between this assignment, your project, and your other tasks. Skipping this problem set won't count against your final grade in this course, provided you manage to convince me through your class contribution, project and final that you have a good understanding of this material (but it is probably hard to get a good understanding of these formal semantics tools without actually attempting to use them).

Background

One of the (many) reasons Java programs run slowly is the overhead associated with all the run-time checking. A smart compiler can eliminate much of the unnecessary run-time checking, but this is only useful if it can also construct a proof that convinces an untrusting JavaVM (that doesn't see the source code) that it is safe to execute the program without the run-time checks. The goal of this problem set is to apply proof-carrying code techniques to remove run-time checking from a Java program.

Consider the following Java class:

public class Scrunch {
  public static String Scrunch (String a[]) {
    Object [] ar = new Object [100];

    for (int i = 0; i < a.length; i++) {
      ar[i] = a[i];
    }

    String s = "";

    for (int i = 0; i < a.length; i++) {
      s = s.concat ((String) ar[i]);
    }

    return s;
  }
}
For simplicity, we use a code example that would not appear in a (reasonable) Java program. In a real Java program, this would make more sense if ar were a Vector (which because of the lack of parameterized types in Java must be a Vector of Objects) instead of an Object [].

A Java compiler (Sun's JDK) produces the following byte codes (the actual byte codes are shown because it builds character to read them, but we will deal with the code at a higher level for our proof) for the Scrunch method (you can see this for yourself by running javap -c -verbose class):

Method java.lang.String Scrunch(java.lang.String[])
   0 bipush 100        Push the constant 100 on the stack
   2 anewarray class <Class java.lang.Object>
                       Pop the top of the stack, and construct a new array of element type java.lang.Object of that size
   5 astore_1          Store the top of the stack (the array we just created) in local 1 (corresponds to ar)
   6 iconst_0          Push the constant 0
   7 istore_2          Store it in local 2 (corresponds to i)
   8 goto 20           Jump to instruction numbered 20
  11 aload_1           Push local 1 (ar)
  12 iload_2           Push local 2 (i)
  13 aload_0           Push parameter (a)
  14 iload_2           Push local 2 (i)
                       The top of stack is now: [i, a, i, ar, ...]
  15 aaload            Pop i (top), a (next) from stack; push a[i]
                       aaload performs run-time bounds checking
  16 aastore           Pop a[i] (top), i (next), ar (next) from stack; store a[i] in ar[i]
  17 iinc 2 1          Increment the integer in local 2 by one (i++)
  20 iload_2           Push the integer in local 2 (i)
  21 aload_0           Push the object in local 0 (the parameter, a)
  22 arraylength       Replace stack top with length of array
  23 if_icmplt 11      Pop x (top), y (next) from stack; if y < x jump to 11 (beginning of loop)
  26 ldc <String "">   Push the constant String ""
  28 astore_3          Store top of stack in local 3 (s)
  29 iconst_0          Push constant 0
  30 istore 4          Store it in local 4 (i in second loop)
  32 goto 50           Jump to instruction numbered 50
  35 aload_3           Push local 3 (s)
  36 aload_1           Push local 1 (ar)
  37 iload 4           Push local 4 (i)
  39 aaload            Pop i (top), ar (next) from stack; push ar[i]
  40 checkcast <Class java.lang.String>
                       If the runtime type of the top of the stack (ar[i]) is not a subtype of java.lang.String, issue a run-time type error; otherwise, continue knowing its type is a subtype of java.lang.String
  43 invokevirtual <Method java.lang.String concat(java.lang.String)>
                       Pop the argument (top of the stack, ar[i]) and the receiver (next on the stack, s), and invoke the method concat on the receiver. Note that if the receiver is an instance of a subtype of java.lang.String that overrides the concat method, the method in the subtype is called.
  46 astore_3          Store the result (which is put on top of the stack) in local 3 (s)
  47 iinc 4 1          Increment local 4 (i) by 1
  50 iload 4           Push local 4 (i)
  52 aload_0           Push local 0 (a)
  53 arraylength       Replace top of stack with its length (a.length)
  54 if_icmplt 35      If i < a.length goto 35 (continue loop)
  57 aload_3           Push s on stack
  58 areturn           Return to caller; result is on top of stack.

Eliminating Run-Time Checks

The generated byte codes have several unnecessary run-time checks:
  1. 15 aaload - the run-time bounds checking on the array load, a[i].
  2. 16 aastore - the run-time bounds checking on the array store to ar[i] is unnecessary if we know a.length <= 100 (ar has 100 elements); a smart compiler will check this once before the loop, and remove the checking inside the loop.
  3. 39 aaload - the run-time bounds checking on the array load, ar[i].
We will attempt to replace these with:
  1. 15 safe_aaload - an array load with no run-time bounds checking
  2. 16 safe_aastore - an array store with no run-time bounds checking (after inserting one check before the loop)
  3. 39 safe_aaload - an array load with no run-time bounds checking
These are not real JavaVM instructions, but one could imagine a future version of Java with a more sophisticated byte code verifier supporting them. We call JVML extended with the safe_aastore and safe_aaload instructions JVML+safe.
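At the source level, the effect of the planned replacement can be pictured roughly as follows. This is only a sketch: the class name HoistedCheck and the choice of exception are assumptions made for illustration, and a real JVM still performs its own bounds check on every array access, since safe_aaload and safe_aastore do not exist.

```java
// Source-level picture of the optimization. HoistedCheck is a hypothetical
// name; the JVM running this still bounds-checks every access.
final class HoistedCheck {
    static String scrunch(String[] a) {
        Object[] ar = new Object[100];

        // One explicit check before the loop, standing in for the
        // per-iteration bounds check that aastore performs on ar[i].
        if (a.length > ar.length) {
            throw new ArrayIndexOutOfBoundsException(
                "a.length exceeds " + ar.length);
        }

        // Inside the loop 0 <= i < a.length <= ar.length, so neither
        // a[i] nor ar[i] can be out of bounds.
        for (int i = 0; i < a.length; i++) {
            ar[i] = a[i];
        }

        String s = "";
        for (int i = 0; i < a.length; i++) {
            s = s.concat((String) ar[i]);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(scrunch(new String[] {"de", "caf"})); // prints "decaf"
    }
}
```

The point of the rest of the problem set is that the facts written in the comments must be proved to the verifier rather than merely asserted by the compiler.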

Generating Verification Conditions

1. (.10) VCGen
Complete the VCGen function for JVML+safe below. You may assume all the normal JVML instructions are checked by the bytecode verifier, so the only instructions that generate predicates are the new safe_ instructions.

To write the VCGen, we use a shorthand that uses arguments to represent the stack slots. You may assume the byte code verifier ensures the top stack slots match the argument types. You may assume no object is NULL. A complete verifier would also need to check these things.


VCGen (PC) =
   if Inst[PC] = safe_aaload <i: int> <a: array[T]>
       i < a.length /\ i >= 0 /\ VCGen (PC + 1)
   else if Inst[PC] = safe_aastore <val: S> <i: int> <a: array[T]>
            where S <= T (S is a subtype of T)
       ... 
   else % all non-safe instructions
      VCGen (PC + 1)
         (Don't worry about falling off the end or multiple-word instructions.)
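To make the shape of VCGen concrete, here is a minimal Java sketch that handles only the safe_aaload clause given above. The class VcGenSketch, the Inst representation (an opcode plus symbolic names for its stack operands, top of stack first) and the string form of predicates are all assumptions made for illustration. It walks the instructions with a loop instead of recursion, treats the end of the code as true, and deliberately leaves the safe_aastore clause unimplemented, since that clause is what this question asks for.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of VCGen for straight-line JVML+safe code.
final class VcGenSketch {
    // An opcode plus symbolic names for its stack operands, top first.
    static final class Inst {
        final String op;
        final String[] args;

        Inst(String op, String... args) {
            this.op = op;
            this.args = args;
        }
    }

    // Collects the conjuncts of VCGen(pc), scanning forward from pc.
    static List<String> vcGen(List<Inst> code, int pc) {
        List<String> conjuncts = new ArrayList<>();
        for (int k = pc; k < code.size(); k++) {
            Inst inst = code.get(k);
            if (inst.op.equals("safe_aaload")) {
                String i = inst.args[0];
                String a = inst.args[1];
                conjuncts.add(i + " < " + a + ".length");
                conjuncts.add(i + " >= 0");
            } else if (inst.op.equals("safe_aastore")) {
                // Left open on purpose: this clause is the exercise.
                throw new UnsupportedOperationException(
                    "safe_aastore clause is not given in this sketch");
            }
            // All non-safe instructions contribute no predicate.
        }
        return conjuncts;
    }

    // Renders the conjunction in the handout's notation.
    static String render(List<String> conjuncts) {
        return conjuncts.isEmpty() ? "true" : String.join(" /\\ ", conjuncts);
    }
}
```

For a fragment ending in safe_aaload with operands i and a, render(vcGen(code, 0)) yields the predicate i < a.length /\ i >= 0.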

Optimized Code

Tortilla Systems' certifying optimizing compiler generates the following code for Scrunch:

Method java.lang.String Scrunch(java.lang.String[])
   0 bipush 100
   2 anewarray class <Class java.lang.Object>
   5 astore_1
   6 iconst_0
   7 istore_2
   8 check aload_0.length <= 100
   9 invariant aload_0.length <= 100 /\ iload_2 >= 0 /\ aload_1.length = 100
  10 iload_2
  11 aload_0
  12 arraylength
  13 if_icmpge 22
  14 aload_1
  15 iload_2
  16 aload_0
  17 iload_2
  18 safe_aaload
  19 safe_aastore
  20 iinc 2 1
  21 goto 9
  22 ldc <String "">
  23 astore_3
  24 iconst_0
  25 istore 4
  26 invariant ???
  27 iload 4
  28 aload_0
  29 arraylength
  30 if_icmpge 40
  31 aload_3
  32 aload_1
  33 iload 4
  34 safe_aaload
  35 checkcast <Class java.lang.String>
  36 invokevirtual <Method java.lang.String concat(java.lang.String)>
  37 astore_3
  38 iinc 4 1
  39 goto 26
  40 aload_3
  41 areturn
This code differs from the code produced by Sun's JDK in four ways:
  1. Loops have been rearranged to make analysis easier.
  2. The check instruction at instruction 8 has been added. A check instruction performs a run-time check that its predicate holds. Immediately after a check instruction, the predicate can safely be assumed to be true, since execution would terminate if it were false.
  3. An invariant has been introduced for each loop. This follows the PCC requirement that all back-edges point to instructions with associated invariants. The invariant for instruction 26 is not shown (you will produce it in question 3).
  4. The slow aaload and aastore instructions in the original program have been replaced with the faster instructions safe_aaload and safe_aastore.
We assume Java objects cannot be NULL. Without this assumption, we would need to include tests for null in the check and invariant clauses.

First Loop

This section walks through most of the safety proof required for the first loop. In question 3, you will need to derive the invariant and proof for the second loop yourself.

For simplicity, we can view the loop between instructions 9 and 21 as:

         while i < a.length do
            ar[i] := a[i] % safe array loads and stores
            i := i + 1
         end

To prove the safe_aaload in instruction 18 is safe, we need to show:


VCGen (18 safe_aaload <i> <a>) ==
                  i >= 0 /\ i < a.length /\ VCGen (19) 
(Hint: you should check that your answer to question 1 would produce this predicate.)

We assume (for now) VCGen (19) is true, and show i >= 0 /\ i < a.length.

Since the invariant was provided by the untrustworthy code supplier, we cannot assume it is correct. Instead, we must prove the invariant holds. Then, we use the invariant to prove VCGen(18).

The axiomatic semantics partial correctness rule for while (since it's a safety proof, we don't care about showing termination) is:

P => Inv,
Inv { Pred } => Inv,
Inv /\ Pred { Statement } Inv,
(Inv /\ ~Pred) => Q,
___________________________________

P { while Pred do Statement end } Q
Inv is given by instruction 9:
   9 invariant aload_0.length <= 100 /\ iload_2 >= 0 /\ aload_1.length = 100
We can rewrite this as:
       a.length <= 100 /\ i >= 0 /\ ar.length = 100

P can be any predicate that we can prove from the code before the loop. The check clause gives a.length <= 100, instructions 0-5 give ar.length = 100 and instructions 6-7 give i = 0. This is argued informally, but could be shown using axiomatic semantics rules for assignment along with a specification of anewarray. This gives,

        P == a.length <= 100 /\ ar.length = 100 /\ i = 0

Q is what we need to be true after the loop. Since we won't know this until doing the second loop, we can start with the weakest possible post-condition, Q = true. In question 4, you will find a stronger post-condition is needed, and change Q.

We prove each antecedent clause in turn:
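One way to discharge the four antecedents, taking Q = true, is sketched below in LaTeX notation. It is a reconstruction from the definitions of P and Inv above, not a formal derivation: it reasons informally about the array store and assumes, as elsewhere in this problem set, that no object is NULL.

```latex
\begin{enumerate}
\item $P \Rightarrow \mathit{Inv}$:
  \[
    a.\mathit{length} \le 100 \land ar.\mathit{length} = 100 \land i = 0
    \;\Rightarrow\;
    a.\mathit{length} \le 100 \land i \ge 0 \land ar.\mathit{length} = 100
  \]
  holds because $i = 0$ implies $i \ge 0$; the other conjuncts appear on both sides.
\item $\mathit{Inv}\ \{\mathit{Pred}\} \Rightarrow \mathit{Inv}$:
  evaluating $i < a.\mathit{length}$ modifies no variable, so $\mathit{Inv}$ still holds afterwards.
\item $\mathit{Inv} \land \mathit{Pred}\ \{\mathit{Statement}\}\ \mathit{Inv}$:
  the store $ar[i] := a[i]$ changes an element of $ar$ but neither $a.\mathit{length}$ nor $ar.\mathit{length}$.
  By the assignment rule for $i := i + 1$ it remains to show
  \[
    a.\mathit{length} \le 100 \land i \ge 0 \land ar.\mathit{length} = 100 \land i < a.\mathit{length}
    \;\Rightarrow\;
    a.\mathit{length} \le 100 \land i + 1 \ge 0 \land ar.\mathit{length} = 100,
  \]
  which holds because $i \ge 0$ implies $i + 1 \ge 0$.
\item $(\mathit{Inv} \land \lnot \mathit{Pred}) \Rightarrow Q$:
  immediate for $Q = \mathit{true}$.
\end{enumerate}
```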

Now, we can use the invariant to prove VCGen (18). At instruction 18, we can assume the invariant is true (because of the above proof, and nothing has been modified since the beginning of the loop), and i < a.length because of the loop predicate. We need to show this is enough to satisfy VCGen (18), assuming VCGen (19):
         a.length <= 100 /\ i >= 0 /\ ar.length = 100 /\ (i < a.length)
     ==> i >= 0 /\ i < a.length /\ VCGen (19)
This is trivially true.
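As a sanity check on the argument, the simplified first loop can be run with the invariant tested at the loop head and the safe_aaload predicate tested just before the load. The sketch below does this in Java; the class FirstLoopCheck and its method names are hypothetical. Running it on particular inputs is testing, not a proof, so it complements the reasoning above and cannot replace it.

```java
// Hypothetical dynamic check of Inv and the VC for the first loop.
final class FirstLoopCheck {
    // Inv from instruction 9, written over the source-level names.
    static boolean inv(String[] a, Object[] ar, int i) {
        return a.length <= 100 && i >= 0 && ar.length == 100;
    }

    // Runs the simplified first loop and returns the iteration count.
    // Throws AssertionError if Inv or the load predicate ever fails.
    static int run(String[] a) {
        // Stands in for the check at instruction 8.
        if (a.length > 100) {
            throw new IllegalArgumentException("check a.length <= 100 fails");
        }
        Object[] ar = new Object[100];
        int i = 0;
        int iterations = 0;
        while (true) {
            if (!inv(a, ar, i)) {
                throw new AssertionError("Inv fails at loop head, i = " + i);
            }
            if (!(i < a.length)) {
                break; // loop predicate is false: leave the loop
            }
            // Predicate required by safe_aaload <i> <a>.
            if (!(i >= 0 && i < a.length)) {
                throw new AssertionError("load predicate fails, i = " + i);
            }
            ar[i] = a[i];
            i = i + 1;
            iterations++;
        }
        return iterations;
    }
}
```

Calling FirstLoopCheck.run on arrays of length 0 through 100 should complete without an AssertionError, while a longer array is rejected by the initial check.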

2. (.10) Instruction 19

a. (.05) Predicate.
Show the verification predicate your VCGen generates for instruction 19: safe_aastore <a[i]> <i> <ar>.

b. (.05) Proof.
Show the proof that VCGen (19) is satisfied (assuming VCGen (20) is true). You may use everything that was used in the proof of VCGen (18) above.

Second Loop

For simplicity, we can view the loop between instructions 26 and 39 as:
         while i < a.length do
            s := s.concat ((String) ar[i]) % safe load
            i := i + 1
         end
You should construct your arguments at the same level of detail as the proof for the first loop above.

3. (.40) Safe Load
a. (.05) Verification Predicates.
Show the verification predicate for instruction 34, safe_aaload <i> <ar>.

b. (.15) Invariant.
Write out a loop invariant (missing from instruction 26) that will be sufficient to prove your verification predicate for instruction 34.

c. (.20) Proof.
Use the axiomatic semantics rule for while to prove VCGen (34) is true (assuming VCGen (35)). Your proof should follow the structure of the proof of VCGen (18) --- you should prove the invariant holds first, and then use the invariant to prove VCGen (34).

4. (.20) Safe Cast
The next-generation Tortilla Systems virtual machine adds an instruction safe_cast <type>. Unlike checkcast, which does (expensive) run-time checking to ensure the run-time type satisfies the cast constraint, safe_cast implies the type constraint can be verified statically. Our goal is to replace the checkcast in instruction 35 with

  35 safe_cast <Class java.lang.String>
a. (.05) VCGen
Show the clause added to VCGen to handle safe_cast.

b. (.05)
Show VCGen (35), the verification predicate generated for the safe_cast version of instruction 35.

c. (.10)
Prove VCGen (35) holds for the second loop. You will need to strengthen the invariant, and assume a stronger pre-condition on entry to the loop.

In order to complete the proof, you would need to prove that the pre-condition you used for the second loop is true. This would involve strengthening the invariant for the first loop. (It is somewhat tedious to do this, so it is not recommended that you do so.)

5. (.20) Subtyping
Being oxygen-deprived at the top of the Eiffel Tower, Mertrude Bryer suggests adding the following typing judgments to Java:

          S <= T                             (<= means is a subtype of)
   ____________________                      [monotonic-arrays]

   array[S] <= array[T]


  P_1 <= Q_1, ..., P_n <= Q_n, S <= T
_________________________________________    [monotonic-procedures]

     proc (P_1, ..., P_n) returns (S)
  <= proc (Q_1, ... , Q_n) returns (T)

Show that an attacker could exploit these rules by passing an argument to Scrunch that leads to a type safety violation. This means it passes the Java type checker, but contains a type error that is not detected at run time.


David Evans
evans@virginia.edu