University of Virginia, Department of Computer Science

CS655: Programming Languages, Spring 2001

Problem Set 4: Decaffeinating Java

Out: 10 April 2001

Due: Thursday, 19 April 2001 (in class)

## Formal Semantics and Proof-Carrying Code

Purpose: This problem set will require you to apply several of the formal semantics techniques we have seen in class to the problem of improving the performance of Java programs.

Collaboration Policy: You may choose anyone you want to work with on this assignment. A similar assignment was used in last year's CS655; it would be dishonorable to use it to assist your work. If you choose to work alone, you should turn in your own solution. If you choose to work with someone else, you should turn in a single solution that represents your combined work.

Warning: This problem set is believed to be long and difficult. You should start thinking about these problems early.

Optional Reminder: This assignment is optional. You should do as much or as little of it as you think would be worthwhile. For most of you, I believe it will be worthwhile to do all of this problem set, but you should use your best judgement to allocate your time between this assignment, your project, and your other tasks. It won't count against your final grade in this course if you don't do this problem set but manage to convince me through your class contribution, project and final that you have a good understanding of this material (but it is probably hard to get a good understanding of these formal semantics tools without actually attempting to use them).

## Background

One of the (many) reasons Java programs run slowly is the overhead associated with all the run-time checking. A smart compiler can eliminate much of the unnecessary run-time checking, but this is only useful if it can also construct a proof that convinces an untrusting JavaVM (that doesn't see the source code) that it is safe to execute the program without the run-time checks. The goal of this problem set is to apply proof-carrying code techniques to remove run-time checking from a Java program.

Consider the following Java class:

    public class Scrunch {
        public static String Scrunch (String a[]) {
            Object [] ar = new Object [100];

            for (int i = 0; i < a.length; i++) {
                ar[i] = a[i];
            }

            String s = "";

            for (int i = 0; i < a.length; i++) {
                s = s.concat ((String) ar[i]);
            }

            return s;
        }
    }

For simplicity, we use a code example that would not appear in a (reasonable) Java program. In a real Java program, this would make more sense if ar were a Vector (which because of the lack of parameterized types in Java must be a Vector of Objects) instead of an Object [].

A Java compiler (Sun's JDK) produces the following byte codes for the Scrunch method (you can see this for yourself by running javap -c -verbose class). The actual byte codes are shown because it builds character to read them, but we will deal with the code at a higher level for our proof.

Method java.lang.String Scrunch(java.lang.String[])

| Instruction | Effect |
|---|---|
| 0 bipush 100 | Push the constant 100 on the stack |
| 2 anewarray class <Class java.lang.Object> | Pop the top of the stack, and construct a new array of element type java.lang.Object of that size |
| 5 astore_1 | Store the top of the stack (the array we just created) in local 1 (corresponds to ar) |
| 6 iconst_0 | Push the constant 0 |
| 7 istore_2 | Store it in local 2 (corresponds to i) |
| 8 goto 20 | Jump to instruction numbered 20 |
| 11 aload_1 | Push local 1 (ar) |
| 12 iload_2 | Push local 2 (i) |
| 13 aload_0 | Push parameter (a) |
| 14 iload_2 | Push local 2 (i). The top of stack is now: [i, a, i, ar, ...] |
| 15 aaload | Pop i (top), a (next) from stack; push a[i]. aaload performs run-time bounds checking |
| 16 aastore | Pop a[i] (top), i (next), ar (next) from stack; store a[i] in ar[i] |
| 17 iinc 2 1 | Increment the integer in local 2 by one (i++) |
| 20 iload_2 | Push the integer in local 2 (i) |
| 21 aload_0 | Push the object in local 0 (the parameter, a) |
| 22 arraylength | Replace stack top with length of array |
| 23 if_icmplt 11 | Pop x (top), y (next) from stack; if y < x jump to 11 (beginning of loop) |
| 26 ldc <String ""> | Push the constant String "" |
| 28 astore_3 | Store top of stack in local 3 (s) |
| 29 iconst_0 | Push constant 0 |
| 30 istore 4 | Store it in local 4 (i in second loop) |
| 32 goto 50 | Jump to instruction numbered 50 |
| 35 aload_3 | Push local 3 (s) |
| 36 aload_1 | Push local 1 (ar) |
| 37 iload 4 | Push local 4 (i) |
| 39 aaload | Pop i (top), ar (next) from stack; push ar[i] |
| 40 checkcast <Class java.lang.String> | If the runtime type of the top of the stack (ar[i]) is not a subtype of java.lang.String, issue a run-time type error; otherwise, continue knowing its type is a subtype of java.lang.String |
| 43 invokevirtual <Method java.lang.String concat(java.lang.String)> | Invoke the method concat on the receiver object, which sits just below the argument on the stack (here s). Note that if this object is an instance of a subtype of java.lang.String that overrides the concat method, the method in the subtype is called. Pass the item on top of the stack (the cast ar[i]) as the argument. |
| 46 astore_3 | Store the result (which is put on top of the stack) in local 3 (s) |
| 47 iinc 4 1 | Increment local 4 (i) by 1 |
| 50 iload 4 | Push local 4 (i) |
| 52 aload_0 | Push local 0 (a) |
| 53 arraylength | Replace top of stack with its length (a.length) |
| 54 if_icmplt 35 | If i < a.length goto 35 (continue loop) |
| 57 aload_3 | Push s on stack |
| 58 areturn | Return to caller; result is on top of stack. |
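The run-time checks described in the listing above can be observed from ordinary Java. The following is a minimal sketch: the class name ScrunchChecksDemo and the helper overflows are hypothetical names introduced for illustration, and only the body of scrunch is taken from the assignment.

```java
// Hypothetical demo class; only the body of scrunch comes from the problem set.
final class ScrunchChecksDemo {
    // Same logic as the Scrunch method in the assignment.
    static String scrunch(String[] a) {
        Object[] ar = new Object[100];
        for (int i = 0; i < a.length; i++) {
            ar[i] = a[i];                 // aaload and aastore, both bounds-checked
        }
        String s = "";
        for (int i = 0; i < a.length; i++) {
            s = s.concat((String) ar[i]); // aaload, then checkcast
        }
        return s;
    }

    // Returns true if the bounds check on the store to ar[i] fires
    // for an input array of n elements.
    static boolean overflows(int n) {
        String[] a = new String[n];
        java.util.Arrays.fill(a, "x");
        try {
            scrunch(a);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(scrunch(new String[] {"de", "caf"})); // prints "decaf"
        System.out.println(overflows(100));                      // prints "false"
        System.out.println(overflows(101));                      // prints "true"
    }
}
```

An input of exactly 100 elements fits in ar, while 101 elements trips the bounds check on the store to ar[100]. This is the boundary that a single check before the loop would need to capture.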

## Eliminating Run-Time Checks

The generated byte codes have several unnecessary run-time checks:

- 15 aaload - the run-time bounds checking on the array load, a[i].
- 16 aastore - the run-time bounds checking on the array store to ar[i] is unnecessary if we know a.length <= 100; a smart compiler will check this once before the loop, and remove the checking inside the loop.
- 39 aaload - the run-time bounds checking on the array load, ar[i].

We will attempt to replace these with:

- 15 safe_aaload - an array load with no run-time bounds checking
- 16 safe_aastore - an array store with no run-time bounds checking (after inserting one check before the loop)
- 39 safe_aaload - an array load with no run-time bounds checking

These are not real JavaVM instructions, but one could imagine a future version of Java with a more sophisticated byte code verifier supporting them. We call JVML extended with the safe_aastore and safe_aaload instructions JVML+safe.

## Generating Verification Conditions

1. (.10) VCGen

Complete the VCGen function for JVML+safe below. You may assume all the normal JVML instructions are checked by the bytecode verifier, so the only instructions that generate predicates are the new safe_ instructions.

To write the VCGen, we use a shorthand that uses arguments to represent the stack slots. You may assume the byte code verifier ensures the top stack slots match the argument types. You may assume no object is NULL. A complete verifier would also need to check these things.

    VCGen (PC) =
       if Inst[PC] = safe_aaload <i: int> <a: array[T]>
          i < a.length /\ i >= 0 /\ VCGen (PC + 1)
       else if Inst[PC] = safe_aastore <val: S> <i: int> <a: array[T]> where S <= T (S is a subtype of T)
          ...
       else % all non-safe instructions
          VCGen (PC + 1)

(Don't worry about falling off the end or multiple-word instructions.)

## Optimized Code

Tortilla Systems' certifying optimizing compiler generates the following code for Scrunch:

    Method java.lang.String Scrunch(java.lang.String[])
     0 bipush 100
     2 anewarray class
     5 astore_1
     6 iconst_0
     7 istore_2
     8 check aload_0.length <= 100
     9 invariant aload_0.length <= 100 /\ iload_2 >= 0 /\ aload_1.length = 100
    10 iload_2
    11 aload_0
    12 arraylength
    13 if_icmpge 22
    14 aload_1
    15 iload_2
    16 aload_0
    17 iload_2
    18 safe_aaload
    19 safe_aastore
    20 iinc 2 1
    21 goto 9
    22 ldc
    23 astore_3
    24 iconst_0
    25 istore 4
    26 invariant ???
    27 iload 4
    28 aload_0
    29 arraylength
    30 if_icmpge 40
    31 aload_3
    32 aload_1
    33 iload 4
    34 safe_aaload
    35 checkcast
    36 invokevirtual
    37 astore_3
    38 iinc 4 1
    39 goto 26
    40 aload_3
    41 areturn

This code differs from the code produced by Sun's JDK in four ways:

- Loops have been rearranged to make analysis easier.
- The check instruction in line 8 has been added. A check instruction performs a run-time check that its predicate holds. Immediately after a check instruction, the predicate can safely be assumed to be true, since execution would terminate if it were false.
- An invariant has been introduced for each loop. This follows the PCC requirement that all back-edges point to instructions with associated invariants. The invariant for instruction 26 is not shown (you will produce it in question 3).
- The slow aaload and aastore instructions in the original program have been replaced with the faster instructions safe_aaload and safe_aastore.

We assume Java objects cannot be NULL. Without this assumption, we would need to include tests for null in the check and invariant clauses.
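At the source level, the shape of the optimized code can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name HoistedCheckSketch is hypothetical, the check instruction is modelled by throwing IllegalStateException (the handout says only that execution terminates if the predicate is false), and plain Java still performs its own bounds checks on every array access, so the sketch shows the control structure rather than any speedup.

```java
// Hypothetical source-level view of the optimized method.
final class HoistedCheckSketch {
    static String scrunchChecked(String[] a) {
        Object[] ar = new Object[100];

        // Models instruction 8: one run-time check before the first loop.
        // The exception type is an assumption made for this sketch.
        if (!(a.length <= 100)) {
            throw new IllegalStateException("check failed: a.length <= 100");
        }

        int i = 0;
        while (i < a.length) {   // loop head, where the invariant of instruction 9 applies
            ar[i] = a[i];        // would be safe_aaload / safe_aastore in JVML+safe
            i = i + 1;
        }

        String s = "";
        i = 0;
        while (i < a.length) {   // loop head, instruction 26
            s = s.concat((String) ar[i]);
            i = i + 1;
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(scrunchChecked(new String[] {"a", "b"})); // prints "ab"
    }
}
```

With the check hoisted, an oversized input is rejected once, before any element is copied, instead of failing partway through the first loop.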

## First Loop

This section walks through most of the safety proof required for the first loop. In question 3, you will need to derive the invariant and proof for the second loop yourself.

For simplicity, we can view the loop between instructions 9 and 21 as:

    while i < a.length do
       ar[i] := a[i]     % safe array loads and stores
       i := i + 1
    end

To prove the safe_aaload in instruction 18 is safe, we need to show:

    VCGen (18 safe_aaload <i> <a>) == i >= 0 /\ i < a.length /\ VCGen (19)

(Hint: you should check that your answer to question 1 would produce this predicate.)

We assume (for now) VCGen (19) is true, and show i >= 0 /\ i < a.length.

Since the invariant was provided by the untrustworthy code supplier, we cannot assume it is correct. Instead, we must prove the invariant holds. Then, we use the invariant to prove VCGen (18).

The axiomatic semantics partial correctness (since it's a safety proof, we don't care about showing termination) rule for while is:

    P => Inv,
    Inv { Pred } => Inv,
    Inv /\ Pred { Statement } Inv,
    (Inv /\ ~Pred) => Q
    ___________________________________
    P { while Pred do Statement end } Q

Inv is given by instruction 9:

    9 invariant aload_0.length <= 100 /\ iload_2 >= 0 /\ aload_1.length = 100

We can rewrite this as:

    a.length <= 100 /\ i >= 0 /\ ar.length = 100

P can be any predicate that we can prove from the code before the loop. The check clause gives a.length <= 100, instructions 0-5 give ar.length = 100 and instructions 6-7 give i = 0. This is argued informally, but could be shown using axiomatic semantics rules for assignment along with a specification of anewarray. This gives,

    P == a.length <= 100 /\ ar.length = 100 /\ i = 0

Q is what we need to be true after the loop. Since we won't know this until doing the second loop, we can start with the weakest possible post-condition, Q = true. In question 4, you will find a stronger post-condition is needed, and change Q.

We prove each antecedent clause in turn:

- P => Inv

      a.length <= 100 /\ ar.length = 100 /\ i = 0
         => a.length <= 100 /\ i >= 0 /\ ar.length = 100

  This is true since i = 0 => i >= 0 and all the other clauses match exactly.

- Inv { Pred } => Inv

  Trivially true, since Pred = i < a.length is side-effect free.

- Inv /\ Pred { Statement } Inv

  We need to show:

      (a.length <= 100 /\ i_0 >= 0 /\ ar.length = 100) /\ (i_0 < a.length) /\ i_0 = i
         { ar[i] := a[i]
           i := i + 1 }
      a.length <= 100 /\ i >= 0 /\ ar.length = 100

  We push the second assignment using the axiomatic semantics assignment rule:

      (a.length <= 100 /\ i_0 >= 0 /\ ar.length = 100) /\ (i_0 < a.length) /\ i_0 = i
         { ar[i] := a[i] }
      a.length <= 100 /\ (i_0 + 1) >= 0 /\ ar.length = 100

  The first assignment does not change the length of either array or the value of i, so we need to show:

      a.length <= 100 /\ i_0 >= 0 /\ ar.length = 100 /\ i < a.length /\ i_0 = i
         ==> a.length <= 100 /\ (i_0 + 1) >= 0 /\ ar.length = 100

  This holds, since if i >= 0 we know i + 1 must also be >= 0.

- Inv /\ ~Pred => Q

  Since Q is true, this always holds.

Now, we can use the invariant to prove VCGen (18). At instruction 18, we can assume the invariant is true (because of the above proof, and nothing has been modified since the beginning of the loop), and i < a.length because of the loop predicate. We need to show this is enough to satisfy VCGen (18), assuming VCGen (19):

    a.length <= 100 /\ i >= 0 /\ ar.length = 100 /\ (i < a.length)
       ==> i >= 0 /\ i < a.length /\ VCGen (19)

This is trivially true.
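The proof obligations above can also be spot-checked dynamically, which is a useful sanity test before attempting a proof. The sketch below instruments the first loop with explicit checks for the invariant from instruction 9 and for VCGen (18). The names FirstLoopInvariantCheck, inv, require, and runFirstLoop are hypothetical, and passing these checks on some inputs is much weaker than the proof, which covers all inputs.

```java
// Hypothetical instrumentation of the first loop; a test aid, not a proof.
final class FirstLoopInvariantCheck {
    // Inv from instruction 9: a.length <= 100 /\ i >= 0 /\ ar.length = 100
    static boolean inv(String[] a, Object[] ar, int i) {
        return a.length <= 100 && i >= 0 && ar.length == 100;
    }

    static void require(boolean cond, String what) {
        if (!cond) {
            throw new IllegalStateException("violated: " + what);
        }
    }

    // Runs the first loop with each obligation checked; returns the iteration count.
    static int runFirstLoop(String[] a) {
        Object[] ar = new Object[100];
        int i = 0;
        require(a.length <= 100, "check at instruction 8");
        require(inv(a, ar, i), "P => Inv");
        while (i < a.length) {
            // VCGen (18) without the VCGen (19) conjunct
            require(i >= 0 && i < a.length, "VCGen (18)");
            ar[i] = a[i];
            i = i + 1;
            require(inv(a, ar, i), "Inv /\\ Pred { Statement } Inv");
        }
        // On exit the invariant holds and the loop predicate is false.
        require(inv(a, ar, i) && !(i < a.length), "Inv /\\ ~Pred");
        return i;
    }

    public static void main(String[] args) {
        System.out.println(runFirstLoop(new String[3])); // prints "3"
    }
}
```

The store to ar[i] is deliberately left without its own check here, since stating and proving that predicate is the subject of the next question.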

2. (.10) Instruction 19

a. (.05) Predicate.

Show the verification predicate your VCGen generates for instruction 19: safe_aastore <a[i]> <i> <ar>.

b. (.05) Proof.

Show the proof that VCGen (19) is satisfied (assuming VCGen (20) is true). You may use everything that was used in the proof of VCGen (18) above.

## Second Loop

For simplicity, we can view the loop between instructions 26 and 39 as:

    while i < a.length do
       s := s.concat ((String) ar[i])   % safe load
       i := i + 1
    end

You should construct your arguments at the same level of detail as the proof for the first loop above.

3. (.40) Safe Load

a. (.05) Verification Predicates.

Show the verification predicate for instruction 34, safe_aaload <i> <ar>.

b. (.15) Invariant.

Write out a loop invariant (missing from instruction 26) that will be sufficient to prove your verification predicate for instruction 34.

c. (.20) Proof.

Use the axiomatic semantics rule for while to prove VCGen (34) is true (assuming VCGen (35)). Your proof should follow the structure of the proof of VCGen (18): you should prove the invariant holds first, and then use the invariant to prove VCGen (34).

4. (.20) Safe Cast

The next generation Tortilla Systems virtual machine adds an instruction safe_cast <type>. Unlike checkcast, which does (expensive) run-time checking to ensure the run-time type satisfies the cast constraint, safe_cast implies the type constraint can be verified statically. Our goal is to replace the checkcast in instruction 35 with

    35 safe_cast <Class java.lang.String>

a. (.05) VCGen

Show the clause added to VCGen to handle safe_cast.

b. (.05)

Show VCGen (35), the verification predicate generated for the safe_cast version of instruction 35.

c. (.10)

Prove VCGen (35) holds for the second loop. You will need to strengthen the invariant, and assume a stronger pre-condition on entry to the loop.

In order to complete the proof, you would need to prove that the pre-condition you used for the second loop is true. This would involve strengthening the invariant for the first loop. (It is somewhat tedious to do this, so it is not recommended that you do so.)

5. (.20) Subtyping

Being oxygen-deprived at the top of the Eiffel Tower, Mertrude Bryer suggests adding the following typing judgments to Java:

    S <= T   (<= means is a subtype of)
    ____________________   [monotonic-arrays]
    array[S] <= array[T]

    P_1 <= Q_1, ..., P_n <= Q_n, S <= T
    _________________________________________   [monotonic-procedures]
    proc (P_1, ..., P_n) returns (S) <= proc (Q_1, ... , Q_n) returns (T)

Show that an attacker could exploit these rules by passing an argument to Scrunch that leads to a type safety violation. This means it passes the Java type checker, but contains a type error that is not detected at run time.
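As background for this question, standard Java already treats array types covariantly and relies on a run-time check inside aastore to stay type safe. The sketch below shows that check firing. The class name ArrayCovarianceDemo and the method storeIsRejected are hypothetical, and the example illustrates existing Java behaviour only; it is not an answer to the question, which concerns the proposed rules.

```java
// Hypothetical demo of the run-time element-type check performed by aastore.
final class ArrayCovarianceDemo {
    // Tries to store a non-String into a String[] that is viewed as an Object[].
    static boolean storeIsRejected() {
        Object[] objs = new String[1];      // accepted statically: String[] is a subtype of Object[]
        try {
            objs[0] = Integer.valueOf(655); // aastore checks the element type at run time
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(storeIsRejected()); // prints "true"
    }
}
```

The store type-checks statically because the static element type is Object, so the only thing standing between this code and a String[] holding an Integer is the run-time check.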

David Evans, evans@virginia.edu