University of Virginia, Department of Computer Science
CS201J: Engineering Software, Fall 2002

Problem Set 2: Using Data Abstractions
Out: 3 September 2002
Due: 10 September 2002, before class

Collaboration Policy - Read Carefully

For this problem set, you may either work alone and turn in a problem set with just your name on it, or work with one other student in the class of your choice. If you work with a partner, you and your partner should turn in one assignment with both of your names on it.

Regardless of whether you work alone or with a partner, you are encouraged to discuss this assignment with other students in the class and ask and provide help in useful ways. You may consult any outside resources you wish including books, papers, web sites and people. If you use resources other than the class materials, indicate what you used along with your answer.

Reading: Before beginning this assignment, you should read through Chapter 5.2 except for Chapter 4, Chapter 6-6.3 and Chapter 10-10.2 and 10.7-10.11.

Purpose

Background

For this assignment, you will examine and create some programs that analyze first names selected for American babies. The data was collected from the Social Security Administration's web site (http://www.ssa.gov/OACT/babynames/index.html). We wrote a Java program (you don't need to look at this, but if you are curious you can find the code in GrabNames.java) to download these pages and store the information in a more useful format. The files year[f | m] store the most popular names for female and male babies in the given year (or decade, for data before 1999). For example, 2001f contains the most popular names for female babies in 2001 and 1920m contains the most popular names for male babies in the 1920s. Each line in these files is a name followed by the fraction of babies given that name. For example, the line in 2001f
Alyssa: 0.006623470702678305
means that 0.66% or about 7 out of 1000 females born in the United States in 2001 were named Alyssa.
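To make the file format concrete, here is a minimal sketch of how one such line could be split into its name and rate. The class NameLineDemo and its methods are hypothetical and are not part of the provided ps2 code; the provided StringTable constructor already does this work for you.

```java
// Hypothetical helper for illustration only; not part of the provided ps2 code.
class NameLineDemo {
    // Returns the name: the text before the first ':' with surrounding spaces removed.
    static String parseName(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing ':' in line: " + line);
        }
        return line.substring(0, colon).trim();
    }

    // Returns the rate: the floating point number after the first ':'.
    static double parseRate(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing ':' in line: " + line);
        }
        return Double.parseDouble(line.substring(colon + 1).trim());
    }

    public static void main(String[] args) {
        String line = "Alyssa: 0.006623470702678305";
        double rate = parseRate(line);
        // Scale the fraction to "babies per 1000" and round to the nearest whole number.
        System.out.println(parseName(line) + ": about "
                           + Math.round(rate * 1000) + " out of 1000");
    }
}
```

Multiplying the fraction by 1000 and rounding is how the "about 7 out of 1000" figure above is obtained.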

We have provided a Java class, StringTable, that provides a data abstraction that associates double values with Strings. The specification for that class is shown below (and in StringTable.spec). We have included ESC/Java annotations (denoted with @ markers) in the specification.

public class StringTable 
{
    // overview: StringTable is a set of <String, double> entries,
    //    where the String values are unique keys.  A typical StringTable
    //    is {<s0: d0>, <s1: d1>, ... }.
    //

    // Specification variable for representing the number of entries in the table:
    //@ghost public int numEntries; 
    
    public StringTable () 
      // effects: Initializes this as an empty table: { }.
      //@ensures numEntries == 0;
      { }

    public StringTable (java.io.InputStream instream) 
       // requires: The stream instream is a names file containing lines of the form
       //                   <name>: <rate>
       //           where the name is a string of non-space characters and the rate is
       //           a floating point number.
       // modifies: instream
       // effects:  Initializes this as a names table using the data from instream.
       { }
    
    public void addName (/*@non_null@*/ String key, double value) 
       throws DuplicateEntryException
       // requires: The parameter key is not null.  (This is what the
       //    ESC/Java /*@non_null@*/ annotation means.)
       // modifies: this   
       // effects:  If key matches the String of an entry in this, 
       //    throws DuplicateEntryException.  Otherwise, inserts
       //    <key, value> into this.
       //      e.g., if this_pre = {<s0, d0>, <s1, d1>} 
       //            then this_post = {<s0, d0>, <s1, d1>, <key, value>}.
       //                 
       //@modifies numEntries
       //@ensures  numEntries == \old(numEntries) + 1;
       { }  
    
    public double getValue (String key)
       // effects: Returns the value associated with key in this.  
       //    If there is no entry matching key, returns 0.
       //      Note: it would be better to throw an exception (but we
       //      haven't covered that yet).
       { }
    
    public /*@non_null@*/ String getNthLowest (int index)
       // requires: The parameter index is non-negative and less than
       //    the number of entries in this.
       //@requires index >= 0;   
       //@requires index < numEntries;
       // effects: Returns the key such that there are exactly index
       //    entries in the table for which the value of the entry is
       //    lower than the value of the returned key.  If two keys have 
       //    the same value, they will be ordered in an arbitrary way
       //    such that getNthLowest (n) returns the first key and
       //    getNthLowest (n + 1) returns the second key. 
       //
       //    e.g., getNthLowest (0) returns the key associated with
       //             the lowest value in the table.
       //          getNthLowest (size () - 1)  returns the key 
       //              associated with the highest value in the table.
       { }
   
    public int size ()
       // effects: Returns the number of entries in this.
       //@ensures \result == numEntries;
       { }
    
    public String toString ()
       // effects: Returns a string representation of this.
       { }

    public /*@non_null@*/ StringIterator names () 
       // Note: this should be called keys, but is called names in
       //    the implementation we provided.
       // effects: Returns a StringIterator that will iterate through 
       //    all the keys in this in order from lowest to highest.
       { }
}

The last method of StringTable returns an iterator (see Chapter 6) that iterates through the keys in the StringTable in order. The specification for StringIterator is in StringIterator.spec.
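To see how the specified operations behave together, here is a rough sketch using a small stand-in class. MiniTable is hypothetical: it mimics only addName, getValue, getNthLowest and size from the specification above, it throws IllegalArgumentException where the specification calls for DuplicateEntryException, and the provided StringTable may well be implemented differently. The rates in main are made up for illustration and are not SSA data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the provided StringTable.
class MiniTable {
    private final Map<String, Double> entries = new LinkedHashMap<>();

    // Inserts <key, value>; rejects a key that is already present.
    void addName(String key, double value) {
        if (entries.containsKey(key)) {
            // The real specification throws DuplicateEntryException here.
            throw new IllegalArgumentException("Duplicate key: " + key);
        }
        entries.put(key, value);
    }

    // Returns the value for key, or 0 if there is no matching entry.
    double getValue(String key) {
        Double value = entries.get(key);
        return value == null ? 0.0 : value;
    }

    int size() {
        return entries.size();
    }

    // Returns the key whose value is the index-th lowest in the table.
    String getNthLowest(int index) {
        if (index < 0 || index >= entries.size()) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        List<String> keys = new ArrayList<>(entries.keySet());
        // Sort keys by their values, lowest first.
        keys.sort((a, b) -> Double.compare(entries.get(a), entries.get(b)));
        return keys.get(index);
    }

    public static void main(String[] args) {
        MiniTable table = new MiniTable();
        table.addName("Alyssa", 0.0066);  // made-up rates, not SSA data
        table.addName("Beth", 0.0120);
        table.addName("Cora", 0.0003);
        System.out.println(table.getNthLowest(0));                // key with the lowest value
        System.out.println(table.getNthLowest(table.size() - 1)); // key with the highest value
        System.out.println(table.getValue("Nobody"));             // missing key yields 0
    }
}
```

Re-sorting on every call keeps the sketch short; it says nothing about how the provided class stores or orders its entries.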

Getting Started with ESC/Java

The Extended Static Checker for Java (ESC/Java) is a tool that attempts to find common programming errors in Java programs by static analysis of the program text. Because ESC/Java is analyzing the code itself, it can find problems that may not be revealed in testing.

Consider the (buggy) AverageLength program below:

// import cs201j.*; - cut for now because of problem with ITC file locations
import java.io.*;

public class AverageLength {
   public static void main (/*@non_null@*/ String args[]) throws RuntimeException
    {
        String filename = args[0];
        
        try {
            FileInputStream infile = new FileInputStream (filename);
            StringTable names = new StringTable (infile);
            int numnames = names.size ();
            int totallength = 0;
            
            // Calculate the average length of all the names in the file.
            
            for (int index = 0; index <= numnames; index++) {
                String name = names.getNthLowest (index);
                totallength = totallength + name.length ();
            }
            
            System.out.println ("The average name length is: " 
                                + (double) totallength / numnames); 
            // The double cast is necessary to produce a precise (non-integer) result
        } catch (FileNotFoundException e) {
            System.err.println ("Cannot find file: " + filename);
            System.exit (1);
        }
    }
}

Downloads: You will need to download this file to your machine: ps2.zip

Create a cs201j subdirectory in your home directory, and a ps2 subdirectory in that directory. Unzip ps2.zip in the ps2 subdirectory by executing unzip ps2.zip in a command shell.

Try running ESC/Java on the AverageLength.java program. Your results should match this:

ESC/Java version 1.2.4, 27 September 2001

AverageLength ...

AverageLength: main(java.lang.String[]) ...
------------------------------------------------------------------------
AverageLength.java:7: Warning: Array index possibly too large (IndexTooBig)
        String filename = args[0];
                              ^
------------------------------------------------------------------------
AverageLength.java:18: Warning: Precondition possibly not established (Pre)
                String name = names.getNthLowest (index);
                                                 ^
Associated declaration is "./StringTable.spec", line 47, col 10:
       //@requires index < numEntries;
          ^
Execution trace information:
    Reached top of loop after 0 iterations in "AverageLength.java", line 17, col 12.

------------------------------------------------------------------------
    [0.411 s]  failed

AverageLength: AverageLength() ...
    [0.012 s]  passed
  [1.302 s total]
2 warnings

The second warning reports that the precondition for getNthLowest may not be established. The precondition (described by the requires clause) is that the index must be at least 0 and less than the number of entries in the StringTable object. The ESC/Java annotations express this precisely as
                //@requires index >= 0;	
                //@requires index < numEntries;

Question 1: (5) Explain in clear English what the problem is with the program and how to fix it. (Hint: you should only need to remove one character from AverageLength.java to fix the problem.)

After fixing this problem, you should be able to run AverageLength on the sample input files with reasonable results. Running ESC/Java should produce two warnings (assuming you fixed the problem in Question 1 in the simplest possible way).

Question 2: (10) For both of the warnings ESC/Java has reported, explain a situation in which the code would produce a run-time error. Illustrate your explanations with a sample test case that demonstrates the problem.

Question 3: (10) Improve the AverageLength code to fix the two problems reported by ESC/Java. Once you have made the changes, ESC/Java should report no warnings for your modified code. Note that you cannot necessarily do something useful for all possible inputs; in this case, it is reasonable to just exit gracefully instead of failing with a run-time error.

Procedural Specifications

We did not provide a specification for AverageLength.main, but you were able to guess more or less what it should do by its name.

Question 4: (10) Write a specification for the main method of AverageLength. Your specification should follow the requires/modifies/effects style from Liskov Chapter 3. Your code for Question 3 should satisfy this specification.

Trendy Names

Suppose we wanted to answer questions like, "What name gained the most popularity over the past 2 years?" Some might believe that trends in baby names reveal important social phenomena, but we won't make any such speculations.

Your task is to create a program that takes two name files and produces as output the names ordered by how much more popular they are in the second file than they were in the first file. That is,

rate of name in second file - rate of name in first file
--------------------------------------------------------
             rate of name in first file

For example, with the input files earlypresidents (which has the first name rates for US Presidents before 1900) and latepresidents (which has the first name rates for US Presidents since 1900), your program should produce output similar to:

> java NameTrends earlypresidents latepresidents
Thomas -1.0
Martin -1.0
Zachary -1.0
Millard -1.0
Abraham -1.0
Ulysses -1.0
Rutherford -1.0
Chester -1.0
Grover -1.0
Benjamin -1.0
Andrew -1.0
James -1.0
John -0.5555552
Franklin 0.33332373341013255
William 0.3333385333541334
George 1.6666450668394654
The negative values mean the name appeared more frequently in earlypresidents than in latepresidents. For example, -1.0 for Thomas indicates that there were 100% fewer Thomases among the late presidents than among the early presidents, and 1.66 for George indicates that the rate of Georges was 166% higher among the late presidents (2 out of 18) than among the early presidents (1 out of 24).
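The arithmetic behind these numbers can be checked with a small sketch. TrendDemo and relativeChange are hypothetical names, and rejecting a first-file rate of 0 is an assumption made only for this sketch; how to handle that case is something your own specification needs to settle.

```java
// Hypothetical sketch of the relative-change formula; not a full NameTrends program.
class TrendDemo {
    // Computes (secondRate - firstRate) / firstRate.
    // Assumption for this sketch: a first-file rate of 0 is rejected.
    static double relativeChange(double firstRate, double secondRate) {
        if (firstRate == 0.0) {
            throw new IllegalArgumentException("Rate in first file must be nonzero");
        }
        return (secondRate - firstRate) / firstRate;
    }

    public static void main(String[] args) {
        // George: 1 out of 24 early presidents, 2 out of 18 late presidents.
        System.out.println("George " + relativeChange(1.0 / 24, 2.0 / 18));

        // A name missing from the second file has rate 0 there, so any
        // nonzero first-file rate gives exactly -1.0 (0.05 is a made-up rate).
        System.out.println("Missing from second file " + relativeChange(0.05, 0.0));
    }
}
```

With exact fractions the George value comes out close to 1.6667; the sample output differs slightly in the later digits, presumably because the rates stored in the files are not exact fractions. The -1.0 case is consistent with getValue returning 0 for a key that has no entry.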

Question 5: (10) Write a specification for the program that calculates and prints out naming trends as described above. If the description above is insufficient to write a good specification, you should make assumptions as necessary to write a precise specification.

Question 6: (25) Implement a program that satisfies your specification. Your program should produce no warnings when checked by ESC/Java.

Try your program out on some of the SSA name data to find out what names have been gaining and losing popularity in the past few years, and over the decades.

Question 7: (20) Estimate how many tests would be needed to perform black-box and glass-box path-complete testing for your program. Suggest and carry out a feasible testing strategy that will give you high confidence that your program will always work correctly.

Question 8: (10) How confident are you that your program will always work as intended? (Where "as intended" means as you specified it in Question 5, except that for inputs not covered by your specification it must behave as the course staff intended.) Express your answer as a bet of between 0 and 20 points. If the customer (grader) agrees that your program always works as intended, you get the points. If not, you lose twice your bet.

Turn-in Checklist: On Tuesday, 10 September, bring to class a stapled turn-in containing your written answers to all questions and all the code you wrote. Also email the code you wrote for Question 6 to cs201j-staff@cs.virginia.edu.

Credits: This problem set was developed for UVA CS 201J Fall 2002.


Sponsored by the National Science Foundation