University of Virginia, Department of Computer Science
CS201J: Engineering Software, Fall 2003

Problem Set 3: Implementing Data Abstractions Out: 9 September 2003
Due: Tuesday, 23 September 2003, beginning of class

Collaboration Policy - Read Carefully

For this problem set, you may either work alone and turn in a problem set with just your name on it, or work with one other student in your section of your choice. If you worked with a partner on PS2, you may not work with the same person for PS3. If you work with a partner, you and your partner should turn in one assignment with both of your names on it.

Regardless of whether you work alone or with a partner, you are encouraged to discuss this assignment with other students in the class and ask and provide help in useful ways. You may consult any outside resources you wish including books, papers, web sites and people. If you use resources other than the class materials, indicate what you used along with your answer.

Reading: Before beginning this assignment, you should read Chapter 5.3-5.10.

Purpose

In this assignment, you will answer questions about data abstraction, and then use the Google API to implement a program that ranks words most commonly associated with a particular search term. You have two weeks to complete this assignment. You should finish at least the first three questions before September 16 to have enough time to complete the rest of the assignment.


Reasoning About Data Abstractions

First, we will consider a implementing the Poly abstraction from Chapter 5 (specified in Figure 5.4). Suppose we implement Poly using a java.util.Vector. The Vector type allows us to represent an ordered collection of Objects. The objects we will store in our vector are records of <term, coefficient>. Here is the representation:
import java.util.Vector;

class TermRecord {
   // OVERVIEW: Record type 
   int power;
   int coeff;
   TermRecord (int p_coeff, int p_power);
}

public class Poly {
   private Vector terms; // A Vector of TermRecord objects.

   ...
}
Suppose this is the implementation of degree:
   public int degree () {
      // EFFECTS: Returns the degree of this, i.e., the largest exponent
      //    with a non-zero coefficient.  Returns 0 if this is the zero Poly.
      return terms.lastElement ().power;
    }

Question 1: (8) What rep invariant would make the implementation of degree above satisfy its specification?

In an alternate implementation with the same rep, suppose this is the implementation of coeff:

   public int coeff (int d) {
      // EFFECTS: Returns the coefficient of the term of this whose
      //     exponent is d.  

      int res = 0;
      Enumeration els = terms.elements ();

      while (els.hasMoreElements ()) {
	 TermRecord r = (TermRecord) els.nextElement ();
	 if (r.power == d) { res += r.coeff; }
      }

      return res;
   }
Question 2: (7) What rep invariant would make the implementation of coeff above satisfy its specification? Explain how a stronger rep invariant would make it possible to implement coeff more efficiently.

Annotating Rep Invariants

Directions:
  • Download this file to your machine: ps3.zip
  • Save it to c:\localdata as you did with Problem Set 2.
  • Log into your Home Directory.
  • Open Eclipse by selecting Start | SEAS | Java Apps | Eclipse 2.1 | Eclipse CS201J (from the Windows Start menu)
  • Inside Eclipse (as with Problem Set 2):
    • Click on File | New | Project
    • Select the default option: Java->Java Project
    • Type in ps3-1 as the project name and click Finish.
    • Click Yes when Eclipse asks you if you want to switch to the Java perspective
    • Then, in Eclipse's Package Explorer, right click on ps3 and select Import.
    • Select Zip file as the import source.
    • Type c:\localdata\ps3.zip into the "From zip file" box. Hit tab. Then click Finish.

    Now, set up your project to run ESC/Java on the source code files.
    • Click on Project | Properties.
    • Click on the ESC/Java option on the left.
    • Click on Enable ESC/Java.
    • Click Ok.

The StringSet data abstraction is similar to IntSet from Figure 5.6. We represent a set of strings using a Vector. Its abstraction function is similar to that for IntSet (p. 101):

   AF(c) = { c.els[i] | 0 <= i < c.els.size }
And its rep invariant (p. 102) is:
   c.els != null && 
   for all integers i . c.els[i] is a String &&
   there are no duplicates in c.els
If we run ESC/Java on StringSet.java, eight warnings are produced.

The first warning message is for StringSet.java line 26:

Warning: Possible null dereference (Null)
      if (getIndex (s) < 0) els.add (s); 
                               ^
In Java, a variable declared with an object type can either hold a value of the declared type, or the special value null (which is treated as a value of any object type). So, els is either a Vector or null. But, if els is null, it would be an error to invoke a method on it, hence ESC/Java produces a warning for the method call els.add. How does the programmer know this call is safe?

The answer is the rep invariant — it contains the term c.els != null, so we know els is not null for the this object at the beginning of insert. Since nothing in insert assigns to els, we know it is still not null when we call els.add.

To prevent the ESC/Java warning, we need to document the rep invariant using a formal annotation. Formal annotations are Java comments that are ignored by the Java compiler, but interpreted by ESC/Java. They are denoted by a @ character after the comment open. We express it as:

   //@invariant els != null

Add the annotation to StringSet.java. It should be after the declaration of the representation.

Running ESC/Java should now produce four warnings. The first message warns that the precondition for add may not be satisfied for StringSet.java line 29:

Warning: Precondition possibly not established (Pre)
     if (getIndex (s) < 0) els.add (s); 
                                   ^
Associated declaration is "C:\Program Files\escjava\lib\specs\java\util\Collection.spec", line 217, col 8:
    //@ requires !containsNull ==> o!=null
        ^
A Vector may contain either all non-null elements or some possibly null elements. If the Vector containins only non-null elements, the precondition for add requires that the parameter is non-null. Note that our informal rep invariant did not preclude having null in the Vector.

To make ESC/Java happy, we must explicity state whether or not the Vector can contain null. We choose to allow null in the Vector by adding

      //@invariant els.containsNull == true
to indicate that the els Vector may contain null.

The second message is similar to the first one, except it is warning about the type of the Vector elements, not whether or not they are null:

StringSet.java:27: Warning: Precondition possibly not established (Pre)
        if (getIndex (s) < 0) els.add (s); 
                                      ^
Associated declaration is "C:\Program Files\escjava\lib\specs\java\util\Collection.spec", line 218, col 8:
    //@ requires \typeof(o) <: elementType || o==null
        ^
Since the elements of a Vector may be any object type, we need to document the actual type of the Vector elements. In our informal rep invariant, we expressed this as for all integers i . c.els[i] is a String. We can express this with a formal annotation: //@invariant els.elementType == \type(String).

So, now our invariant annotations are:

    //@invariant els != null
    //@invariant els.containsNull == true
    //@invariant els.elementType == \type(String)
Add the new annotations to StringSet.java and try rebuilding. ESC/Java should now produce four warnings. The first two reveal bugs in ESC/Java:
StringSet: StringSet() ...
------------------------------------------------------------------------
StringSet.java:24: Warning: Possible violation of object invariant (Invariant)
    }
    ^
Associated declaration is "StringSet.java", line 18, col 7:
    //@invariant els.containsNull == true
       ^
------------------------------------------------------------------------
StringSet.java:24: Warning: Possible violation of object invariant (Invariant)
    }
    ^
Associated declaration is "StringSet.java", line 19, col 7:
    //@invariant els.elementType == \type(String)
ESC/Java is not able to prove the invariant is true for the constructor, but by inspecting the code we can convince ourselves that it is. The Vector constructor returns a vector with no elements, so it does not contain null, and every element it contains (that is, none) is of type String. We can add set annotations to convince ESC/Java the invariant is true:
    public StringSet () {
	// EFFECTS: Initializes this to be empty: { }
	els = new Vector ();
	//@set els.elementType = \type(String)
	//@set els.containsNull = true
    }  // ESC/Java is unable to prove the invariant for the empty constructor without the set's.

The next warning is:

StringSet: remove(java.lang.String) ...
------------------------------------------------------------------------
StringSet.java:38: Warning: Precondition possibly not established (Pre)
        els.removeElementAt (i);
                            ^
Associated declaration is "C:\Program Files\escjava\lib\specs\java\util\Vector.spec", line 569, col 8:
    //@ requires index < elementCount ;
        ^
The precondition for Vector.removeElementAt requires that the value of the parameter is a valid index of an element in the vector: requires index < elementCount. The elementCount is a specification variable that indicates the number of elements in the vector.

We know this is safe because getIndex always returns a value less than elementCount. To enable ESC/Java to use this, we need to document it as a postcontition of getIndex. This is done by adding an ensures clause (which has the same meaning as an informal EFFECTS clause):

    private int getIndex (String s) 
       //@ensures \result < els.elementCount
The final warning is:
StringSet: getIndex(java.lang.String) ...
------------------------------------------------------------------------
StringSet.java:50: Warning: Possible null dereference (Null)
            if (s.equals (els.elementAt (i))) {
                 ^
The method call dereferences s, the parameter to getIndex. We could either add a precondition that s is not null, or fix the code of getIndex to handle the case where s is null. Since we decided to allow null in the StringSet, we take the second approach and change the implementation of getIndex to work when s is null:
    private int getIndex (String s) 
        //@ensures \result < els.elementCount
    {
	// EFFECTS: If x is in this returns index where x appears, else returns -1.
	for (int i = 0; i < els.size (); i++) {
	    if (s == null) {
		if (els.elementAt (i) == null) {
		    return i;
		}
	    } else {
		if (s.equals (els.elementAt (i))) {
		    return i;
		}
	    }
	}

	return -1;
    }
Now, running ESC/Java on StringSet.java produces no warnings.

Note that we did not include the there are no duplicates in c.els part of our informal invariant in our formal annotations. Some terms in invariants are too complex to describe and check with ESC/Java. To check this at runtime, we should define and use a repOk method.

Question 3: (10) In the example above, we decided to allow null in the els vector. A reasonable alternative design decision would be to not allow null in the els vector. Modify StringSet.java to not allow null in the string set.

Replace the

   //@invariant els.containsNull == true
you added earlier with
   //@invariant els.containsNull == false

Check your implmentation with ESC/Java, and add annotations or fix the code so running ESC/Java on your program produces no warnings. (Recall that you can require a parameter to be non-null using //@requires s != null.)

The Google API

Google indexes 3 Billion web pages and responds to 200 million search requests per day using a cluster of more than 10,000 servers. The Google API provides a simple interface to Google's search engine through a Java program.

The WordTally Abstraction

A useful data abstraction will keep track of tallies of the number of times a word appears. Your assignment is to implement the WordTally datatype that does that. Your implementation must satisfy this specification:
public class WordTally {
    // Overview: WordTally is a mutable datatype for associating counts
    //    with words.  A typical WordTally is
    //        { < w1, t1 >, < w2, t2 >, ... , < wn, tn > }
    //    Each word in a < word, tally > pair is unique, and
    //    all tally values must be nonzero.
    
    //@ghost public int numEntries; // Number of words in the table.
    
    public WordTally()
	// EFFECTS: Initializes this to an empty WordTally: { }
	//@ensures numEntries == 0;
    { }
    
    public void tallyWord(/*@non_null@*/ String word)
	// EFFECTS: Increases the count associated with word in this by
	//     one.  If this_pre = { ..., < word, t >, ... } then
	//     this_post = { ..., < word, t + 1 >, ... }.  If there
	//     is no entry in this whose word matches word, then
	//     this_post = this_pre U { < word, 1 > }.
	// MODIFIES: this
	//@modifies numEntries;
	//@ensures numEntries >= \old(numEntries);
    { }
    
    public void resetWord(String word)
	// EFFECTS: Set the tally associated with word to 0.
	//    If this_pre = { ..., < word, t >, ... } then 
	//    this_post = this_pre - { < word, t> }.  Otherwise,
	//    this_post = this_pre.
    { }
    
    public int getTally(String word)
	// EFFECTS: Returns the tally associated with word.  If
	//    this = { ..., < word, t >, ... } returns t.  
	//    If there is no entry in this whose word matches
	//    word, returns 0.
	//@ensures \result >= 0;
    { }
    
    public int numberOfWords()
	// EFFECTS: Returns the number of words in this. 
	//@ensures \result == numEntries;
    { }
    
    public String getRankedWord(int n)
	// REQUIRES: n >= 1
	//@requires n >= 1;
	// EFFECTS: If this contains at least n words, returns the word
	//     whose tally is nth highest.  If two words have the same
	//     tally value, they are ranked alphabetically.  If this
	//     contains fewer than n words, returns null.
	// 
	//     For example, if 
	//        this = { < "cookies", 27 >, < "marbles", 25 >,  
	//     < "ducks", 25 > } then getRanked (1) should return
	//     "cookies", getRanked (2) should return "ducks", 
	//     getRanked (3) should return "marbles", and 
	//     getRanked (4) should return null.
    { }
    
    public String toString()
	// EFFECTS: Returns a string representation of this.
    { }
}

Question 4: (15) Describe at least two possible representation choices for WordTally. Discuss the advantages and disadvantages of each representation. Select which representation you will use for your implementation and explain why you selected the representation you did. In particular, you should consider:
  • Which methods will be harder to implement with one representation, but easier to implement with the other representation?
  • Which methods will be more efficient with one implementation than the other?
  • What types of programming mistakes are more likely with each representation?
  • Will the choice of representation have any impact on how much work it is to satisfactorily test your implementation?
Your choice of representation will have a big impact on how difficult it is to implement the WordTally data type, so think carefully about how you will implement all the methods of WordTally in choosing your representation. Keep in mind that your time is usually more valuable than computing time — its better to choose an ineffecient design that will be easier to implement correctly and test, then to choose an efficient design that will be more complex. One of the advantages of data abstract is that if you found out later that the inefficient design was too slow, you could replace it with the efficient design without having to change code outside the data abstraction.

Question 5: (10) What is your abstraction function? Your abstraction function should map your concrete representation to the abstract notation used in the WordTally specification.

Question 6: (10) What is your representation invariant? First, express your representation invariant informally using clear English. Then, express as much as possible about your representation invariant using ESC/Java invariant annotations.

Question 7: (25) Implement the WordTally datatype. Your implementation should include ESC/Java annotations to document your invariant and preconditions and postconditions of methods as necessary to remove all ESC/Java warnings.

We have provided the MostCommonWords.java class that uses the WordTally datatype and the Google API. Before running MostCommonWords, you will need to edit the value of googleKey to use your google developer's key. After that, try running with main class MostCommonWords and Program Arguments "\"University of Virginia\"" (the quotes should be exactly as shown, so the quotes around University of Virginia are preserved as part of the google query). You should get a result similar to that below (determining whether or not Google's database accurately reflects the importance of students to the University of Virginia beyond the scope of this assignment):

Scanning page: http://www.virginia.edu/
   Good words: 48
Scanning page: http://etext.lib.virginia.edu/
   Good words: 88
Scanning page: http://etext.lib.virginia.edu/uvaonline.html
   Good words: 140
Scanning page: http://www.lib.virginia.edu/
   Good words: 50
Scanning page: http://www.uva.edu/
   Good words: 16
Scanning page: http://hsc.virginia.edu/
   Good words: 99
Scanning page: http://www.cs.virginia.edu/
   Good words: 162
Scanning page: http://www.law.virginia.edu/
   Good words: 61
Scanning page: http://www.upress.virginia.edu/
   Good words: 60
Scanning page: http://www.darden.edu/
   Good words: 93
Different words: 568
1. Virginia (21)
2. University (19)
3. Center (16)
4. Health (9)
5. Library (8)
6. Medicine (8)
7. virginia (7)
8. images (6)
9. substring (6)
10. Rector (5)
11. September (5)
12. Visitors (5)
13. texts (5)
14. Darden's (4)
15. School (4)
16. about (4)
17. access (4)
18. award (4)
19. collections (4)
20. document (4)
21. library (4)
22. other (4)
23. paper (4)
24. provide (4)
25. students (4)
Finished processing results successfully.


Question 8: (10) Devise a testing strategy for your implementation. You should add a main method to your TallyWord class that runs the test cases. Describe your testing strategy in general, and any problems in your implementation that were discovered in testing.

Question 9: (10) Submit your code using the Submission Form. Your program will be tested on public test cases, as well as secret test cases (which will not be revealed until after the assignment is due). You receive 10 points if your code passes all the test cases.

Question 10: (Optional, worth up to 100 bonus points) Improve the MostCommonWords program to produce better results or reveal something more interesting about content on the web. This is optional. Before attempting it, make sure you have all the basic functionality working first. Be creative! You can do anything you want for this.

Turn-in Checklist: On Tuesday, 23 September, bring to class a stapled turn in containing your written answers to all questions and all the code you wrote (including the modified StringSet.java for question 3).

Corrections

9 September 2003: Question 4 should read "Describe at least two possible representation choices for WordTally" instead of "Describe at least two possible representation choices for TallyWord" as originally. (Fixed above).

14 September 2003: The abstraction function should be

   AF(c) = { c.els[i] | 0 <= i < c.els.size }
(note the < instead of <=).

Credits: This problem set was developed for UVA CS 201J Fall 2003 by Leonid Bolotnyy and David Evans.


CS201J University of Virginia
Department of Computer Science
CS 201J: Engineering Software
Sponsored by the
National Science Foundation
cs201j-staff@cs.virginia.edu