cs205: engineering software? (none) 05 April 2010

### Testing

Consider the histogram specification from problem set 2 (question 4):
```
public static int [] histogram (int [] a) throws NegativeValue
// EFFECTS:  If a contains any negative values, throws NegativeValue.
//    If a is null, throws a NullPointerException.
//    Otherwise, returns an array, result, where result[x] is the
//    number of times x appears in a.  The result array has
//    maxval(a) + 1 elements.  For example,
//      histogram ({1, 1, 2, 5}) = { 0, 2, 1, 0, 0, 1 }
```
1. (9.4 average out of 10) Describe a good set of black box test cases for the histogram procedure, as specified above.
Black box test cases should test every path through the specification, as well as boundary conditions. For this specification, we should test:
• An input that contains a negative value, e.g., { 1, -4, 7, 7 }. If we are being extra vigilant, we should also include cases where all the values are negative, where only the first value is negative, and where only the last value is negative. For all of these inputs, the NegativeValue exception should be thrown.
• An input that is null: null. The NullPointerException should be thrown.
• The rest of the specification has only one path, but we should test several different inputs along that path. Good choices include the example in the specification ({1, 1, 2, 5}), an array containing many elements of the same value ({12, 12, 12, 12, 12, 12, 12}), and an array where no element is repeated ({1, 2, 3, 4, 5}).
• Boundary cases: the most important boundary case to consider here is the empty array ({ }). Note that this is different from null. The specification itself is unclear on what should be done with the empty array, since maxval(a) is not defined for an empty array. The most sensible result would be to return an empty array, but the specification should be rewritten to make this clear. Other boundary cases are an array containing 0 (the boundary between negative and non-negative values) and a singleton array.
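The test cases listed above can be written down as a small test driver. The following is only a sketch: the histogram body is a reference implementation written for this example (it is not the problem set solution), NegativeValue is assumed to be a checked exception, the empty array is assumed to produce an empty result, and the names HistogramTests, check, returns, throwsNegativeValue, and throwsNullPointer are made up for illustration.

```java
import java.util.Arrays;

// Assumed for this sketch: NegativeValue is a checked exception.
class NegativeValue extends Exception {}

class HistogramTests {
    // Reference implementation, written only so the tests have something to run against.
    static int[] histogram(int[] a) throws NegativeValue {
        int max = -1;
        for (int x : a) {                 // a null array throws NullPointerException here
            if (x < 0) throw new NegativeValue();
            max = Math.max(max, x);
        }
        int[] result = new int[max + 1];  // an empty input yields an empty result (assumption)
        for (int x : a) result[x]++;
        return result;
    }

    // True if histogram(a) throws NegativeValue.
    static boolean throwsNegativeValue(int[] a) {
        try { histogram(a); return false; }
        catch (NegativeValue e) { return true; }
    }

    // True if histogram(a) throws NullPointerException.
    static boolean throwsNullPointer(int[] a) {
        try { histogram(a); return false; }
        catch (NullPointerException e) { return true; }
        catch (NegativeValue e) { return false; }
    }

    // True if histogram(a) returns an array equal to expected.
    static boolean returns(int[] a, int[] expected) {
        try { return Arrays.equals(histogram(a), expected); }
        catch (NegativeValue e) { return false; }
    }

    static void check(boolean passed, String name) {
        if (!passed) throw new AssertionError("failed: " + name);
    }

    public static void main(String[] args) {
        // Path 1: negative values
        check(throwsNegativeValue(new int[] {1, -4, 7, 7}), "one negative");
        check(throwsNegativeValue(new int[] {-1, -2}), "all negative");
        check(throwsNegativeValue(new int[] {-1, 2, 3}), "first negative");
        check(throwsNegativeValue(new int[] {1, 2, -3}), "last negative");
        // Path 2: null input
        check(throwsNullPointer(null), "null");
        // Path 3: normal results
        check(returns(new int[] {1, 1, 2, 5}, new int[] {0, 2, 1, 0, 0, 1}), "spec example");
        int[] twelves = new int[13];
        twelves[12] = 7;
        check(returns(new int[] {12, 12, 12, 12, 12, 12, 12}, twelves), "repeated value");
        check(returns(new int[] {1, 2, 3, 4, 5}, new int[] {0, 1, 1, 1, 1, 1}), "no repeats");
        // Boundary cases
        check(returns(new int[] {}, new int[] {}), "empty array");
        check(returns(new int[] {0}, new int[] {1}), "array containing 0");
        check(returns(new int[] {3}, new int[] {0, 0, 0, 1}), "singleton");
        System.out.println("all tests passed");
    }
}
```

Each check corresponds to one bullet above, so a failing test points directly at the part of the specification that was violated.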
2. (8.1 / 10) If an implementation of histogram successfully passes all of your test cases, what can you say about the implementation?
Even if the implementation passes all the test cases, we can say very little about it. We certainly cannot claim that it is correct for all inputs; we can't even claim that it is always correct for the particular inputs we tested. For example, consider this implementation of histogram:
```
import java.util.Calendar;

...
   public static int [] histogram (int [] a) throws NegativeValue
   {
      Calendar now = Calendar.getInstance();
      if (now.get(Calendar.MONTH) == Calendar.NOVEMBER
          && now.get(Calendar.DAY_OF_WEEK) == Calendar.TUESDAY
          && now.get(Calendar.DAY_OF_MONTH) <= 8) {
         ... // incorrect histogram implementation
      } else {
         ... // correct histogram implementation
      }
   }
}
```
(Any similarity between this code and Diebold's voting machines is purely coincidental.)

### Data Abstraction

Patty Page is implementing a new web search engine. Similar to popular search engines, her search engine will rank pages in response to a query based on the number of times the query word appears on the page, as well as a measure of its popularity based on the number of other pages that link to this page (and how popular they are).

Her design involves two datatypes: WordIndex, a reverse index that maps words to the pages on which they appear, and PageGraph, a graph representing the links between pages.

Consider the specification below for the WordIndex datatype:

```
public class WordIndex
// OVERVIEW: A WordIndex is a mutable datatype for representing
//    the words appearing in a collection of pages.  For each
//    word entry, there is an occurrence list of < page, count >
//    entries where page is the URL of a page containing the word,
//    and count is the number of times the word appears on that page.
//    A typical WordIndex is:
//       { < word_1, [ < page_1_1, count_1_1 >,
//                        ..., < page_1_m, count_1_m > ] >,
//         ...
//         < word_n, [ < page_n_1, count_n_1 >,
//                        ..., < page_n_k, count_n_k> ] > }
//    A given page may appear in a word's associated list at most once.
//    A given word may not appear as a key in the WordIndex more
//        than once.

public WordIndex()
// EFFECTS: Initializes this to an empty WordIndex: {}

public void addInstance(String word, String page)
// MODIFIES: this
// EFFECTS: If word is not a word in this, adds a new entry
//   to this:
//     this_post = this_pre U { < word, [ < page, 1 > ] > }
//   Otherwise, if the occurrence list associated with word does not
//     include page, adds an entry for page with count 1 to word's
//     occurrence list.
//   Otherwise, adds one to the count associated with word for page.

public int getCount(String word, String page)
// EFFECTS: Returns the number of times page contains the word
//     in this.
//   If this has no entry for word, the result is 0.
//   If there is an entry for word, but the occurrence list does not
//      include page, the result is 0.
//   Otherwise, the result is the count value associated with the
//      entry for page in the occurrence list for word.
```
3. (8.7 / 10) Suppose an implementation that satisfies this specification is available. Will this be useful for Patty's search engine implementation? (Either explain how Patty would use this datatype, or what essential functionality she needs is missing and cannot be created using the abstract operations provided.)
The provided datatype is not adequate for use in building a search engine. The problem is that it does not provide adequate observers. There is no method for finding the pages that contain a word. The closest method, getCount, takes both the word and the page and returns the number of times the word appears on the page. If we knew all the pages in the system, we could try calling getCount with each page as a parameter, but the WordIndex datatype provides no observer for obtaining a list of all the pages. Even if such an observer were provided, this would be a ridiculously inefficient way of finding the pages that contain a particular word. For a large set of pages (in the billions if Patty is attempting to index the web), it is infeasible to try each possible page for every query.
Here is a partial implementation of the specified WordIndex datatype:
```
   // Rep:
   class PageRecord {
      String url;
      int count;
      PageRecord (String url, int count) {
         this.url = url;
         this.count = count;
      }
   }

   class WordRecord {
      String word;
      Vector<PageRecord> occurences;
      WordRecord (String word) {
         this.word = word;
         this.occurences = new Vector<PageRecord>();
      }
   }

   private Vector<WordRecord> entries;

   // Abstraction Function (r) =
   //   { < word_i, [ < page_i_j, count_i_j > ] > |
   //     where
   //       word_i = entries.elementAt(i).word
   //       page_i_j = entries.elementAt(i).occurences.elementAt(j).url
   //       count_i_j = entries.elementAt(i).occurences.elementAt(j).count
   //       forall 0 ≤ i < entries.size ()
   //       forall 0 ≤ j < entries.elementAt(i).occurences.size() }

   public void addInstance(String word, String page) {
      for (WordRecord r : entries) {
         if (r.word.equals(word)) {
            for (PageRecord p : r.occurences) {
               if (p.url.equals(page)) {
                  p.count = p.count + 1;
                  return;
               }
            }
            // No page record found, add a new one with count 1
            r.occurences.add(new PageRecord (page, 1));
            return;
         }
      }
      // No word record found, add a new one whose only page is page
      WordRecord wr = new WordRecord (word);
      wr.occurences.add(new PageRecord (page, 1));
      entries.add(wr);
   }
}
```
4. (9 / 10) Define an adequate rep invariant for the WordIndex implementation shown.
The most important assumptions the rep invariant needs to capture are that each word can appear as the word in only one WordRecord in entries, and that each page can appear at most once in the occurences list for a given word. The code for addInstance satisfies these properties (but that isn't enough to know they are required in the rep invariant). The abstraction function, though, makes it pretty clear that they are expected. Without these no-duplicates properties in the rep invariant, the abstraction function could produce a result that does not satisfy the abstract state requirements stated in the overview specification.

So, an adequate rep invariant would be:

```
   entries is not null
   all elements of entries are not null
   all fields of all elements of entries are not null
   entries does not contain multiple WordRecord elements with the same word
   foreach WordRecord w: entries,
      w.occurences does not contain multiple PageRecord elements with the same url
      foreach PageRecord p: w.occurences
         p.count >= 1
```
Note that it might also be reasonable to have p.count >= 0 (but this makes less sense when the abstraction function is considered).
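A rep invariant like this one can also be written as an executable check, which is handy to call at the end of each mutator while testing. The sketch below is one possible way to do that; the class name WordIndexRepCheck and the method name repOk are made up for illustration, and the record classes are simplified copies of the rep shown earlier.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.Vector;

// Hypothetical holder class for this sketch; mirrors the rep shown earlier.
class WordIndexRepCheck {
    static class PageRecord {
        String url;
        int count;
        PageRecord(String url, int count) { this.url = url; this.count = count; }
    }

    static class WordRecord {
        String word;
        Vector<PageRecord> occurences = new Vector<PageRecord>();
        WordRecord(String word) { this.word = word; }
    }

    Vector<WordRecord> entries = new Vector<WordRecord>();

    // Returns true if and only if the rep invariant holds.
    boolean repOk() {
        if (entries == null) return false;
        Set<String> words = new HashSet<String>();
        for (WordRecord w : entries) {
            if (w == null || w.word == null || w.occurences == null) return false;
            if (!words.add(w.word)) return false;       // same word in two WordRecords
            Set<String> urls = new HashSet<String>();
            for (PageRecord p : w.occurences) {
                if (p == null || p.url == null) return false;
                if (!urls.add(p.url)) return false;     // same page twice for one word
                if (p.count < 1) return false;
            }
        }
        return true;
    }
}
```

Set.add returns false when the element is already present, which is what makes the two no-duplicates checks one line each.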

Note that this is not a good implementation choice for WordIndex, since the time to look up a word record is linear in the number of words, and the time to find a page within that record is linear in the number of pages containing the word. If we want to build a WordIndex that will scale to support a large index, we should use a datatype with faster indexing such as the HashMap datatype:

```
   private HashMap<String, Vector<PageRecord>> entries;
```
so we can do a lookup for a given word in constant time on average (that is, the time it takes to find a word does not increase as the number of words in the index increases).
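To make the hashing idea concrete, here is a minimal sketch of the two specified operations on top of hash maps. It is not the representation from the exam: the class name HashedWordIndex is made up, and it uses a nested map (word to page URL to count) in place of PageRecord objects so that both the word lookup and the page lookup avoid a linear scan.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical alternative rep for WordIndex: word -> (page URL -> count).
class HashedWordIndex {
    private final Map<String, Map<String, Integer>> entries =
        new HashMap<String, Map<String, Integer>>();

    // Adds one occurrence of word on page, creating entries as needed.
    public void addInstance(String word, String page) {
        Map<String, Integer> occurrences = entries.get(word);
        if (occurrences == null) {
            occurrences = new HashMap<String, Integer>();
            entries.put(word, occurrences);
        }
        Integer count = occurrences.get(page);
        occurrences.put(page, count == null ? 1 : count + 1);
    }

    // Returns the recorded count, or 0 if word or page has no entry.
    public int getCount(String word, String page) {
        Map<String, Integer> occurrences = entries.get(word);
        if (occurrences == null) return 0;
        Integer count = occurrences.get(page);
        return count == null ? 0 : count;
    }
}
```

With this rep the no-duplicates properties hold by construction, since a map cannot contain the same key twice.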

5. (8 / 10) Assuming the rep invariant from your previous answer, provide a correct implementation of the getCount method.

```
public int getCount(String word, String page) {
   for (WordRecord r : entries) {
      if (r.word.equals(word)) {
         for (PageRecord p : r.occurences) {
            if (p.url.equals(page)) {
               return p.count;
            }
         }
         break; // No need to try other entries
      }
   }
   return 0;
}
```
Note that this implementation takes advantage of the no duplicates properties in our rep invariant.
To represent the link structure, Patty proposes to use the PageGraph abstract datatype, partially specified below:
```
public class PageGraph
// OVERVIEW: A PageGraph is a mutable datatype that represents a
//    set of hypertext pages including links between the pages.
//    A typical PageGraph is:
//     { < url_1, [ <tag_1_1, target_1_1>,
//                    ..., < tag_1_m, target_1_m > ] > ,
//       ...
//       < url_n, [ <tag_n_1, target_n_1>,
//                    ..., <tag_n_k, target_n_k> ] > }
//    where the target_i_j's are URLs that may or may not
//    correspond to url_k's in the PageGraph.

public PageGraph()
// EFFECTS: Initializes this to a new, empty PageGraph: { }

public void addPage(String url, String [] tags, String [] targets)
throws DuplicateException
// MODIFIES: this
// EFFECTS: If url is a url in this, throws DuplicateException.
//    Otherwise, adds a new page to the page graph with URL url:
//      this_post = this_pre U
//         { < url,
//              [ < tags[0], targets[0] >, ...,
//                < tags[tags.length - 1],
//                            targets[targets.length - 1] > ]
//           >  }

public String [] getTargets(String url)
// EFFECTS: If url is a url in this, returns an array
//    representing the targets of links from url.
```
Two reasonable possibilities for the specification of an addLink method are:
```
public void addLink (String url, String tag, String target) throws NoURLException
// MODIFIES: this
// EFFECTS: If url is not a url in this, throws NoURLException.
//    Otherwise, adds < tag, target > to the links
//    list associated with url in this.
```
or,
```
public void addLink (String url, String tag, String target)
// MODIFIES: this
// EFFECTS: If url is not a url in this, adds a new entry to this
//    containing < url, [ < tag, target > ] >.  Otherwise,
//    adds < tag, target > to the links list
//    associated with url in this.
```
If we think of a typical use of PageGraph being in a web crawler where a page is reached first (and added using addPage), and then the links on that page are added (using addLink), the first specification seems preferable. It supports a more defensive style of client programming where new pages have to be added before their links are added.

Many of your specifications also included something like,

```
   If the links list for url already contains an element < tag, target >,
   throws DuplicateException.
```
Although it's definitely a good idea to think about duplicates in a specification like this, in this case it doesn't really make much sense to disallow them. Note that the abstract notation used [ ... ] for the link list, not { ... }. In addition, if what we are representing is the links on a page, it is perfectly normal (and common) for a given web page to contain multiple instances of the same link (even with the same tag text). For example, this web page contains two instances of the link < "cs205: engineering software", "http://www.cs.virginia.edu/cs205" >. For determining page rankings, it may be useful to keep track of both of those in the PageGraph. Throwing an exception in this case places an unnecessary extra burden on clients of the datatype for no good reason that I can see.
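As an illustration of the first addLink specification with duplicate links allowed, here is a minimal sketch. It is not Patty's representation: the class name SimplePageGraph, the nested Link class, and the simplified addEmptyPage method (standing in for addPage) are all made up for this example, NoURLException is assumed to be a checked exception, and returning an empty array from getTargets for an unknown url is an assumption, since the specification does not say what happens in that case.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Assumed for this sketch: NoURLException is a checked exception.
class NoURLException extends Exception {}

// Hypothetical, simplified PageGraph used only to illustrate addLink.
class SimplePageGraph {
    // One < tag, target > element of a page's links list.
    static class Link {
        final String tag;
        final String target;
        Link(String tag, String target) { this.tag = tag; this.target = target; }
    }

    // url -> links list, kept in insertion order
    private final Map<String, List<Link>> pages = new HashMap<String, List<Link>>();

    // Simplified stand-in for addPage: registers url with an empty links list.
    public void addEmptyPage(String url) {
        if (!pages.containsKey(url)) pages.put(url, new ArrayList<Link>());
    }

    // First specification: the page must already be in the graph.
    public void addLink(String url, String tag, String target) throws NoURLException {
        List<Link> links = pages.get(url);
        if (links == null) throw new NoURLException();
        links.add(new Link(tag, target));   // duplicates are kept, not rejected
    }

    // Targets of the links from url (empty array for an unknown url: an assumption).
    public String[] getTargets(String url) {
        List<Link> links = pages.get(url);
        if (links == null) return new String[0];
        String[] targets = new String[links.size()];
        for (int i = 0; i < targets.length; i++) targets[i] = links.get(i).target;
        return targets;
    }
}
```

Because the links are stored in a list and not a set, adding the same < tag, target > pair twice simply produces two elements, matching the [ ... ] notation in the overview.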

Patty proposes to implement PageGraph using the following representation:

```
   // rep:
   class LinkRecord {   // (record name assumed)
      String tag;
      String target;
   }

   class PageRecord {
      String url;
      Vector<LinkRecord> links;
   }

   private Vector<PageRecord> pages;
```
7. (9.2 / 10) Write a plausible abstraction function for Patty's representation.
We need to create an instance of the abstract notation introduced in the overview specification from the concrete representation:
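```
AF(c) = { < url_i, [ < tag_i_0, target_i_0 >,
                     ..., < tag_i_mi, target_i_mi > ] > |
            url_i = pages.elementAt(i).url
            tag_i_j = pages.elementAt(i).links.elementAt(j).tag
            target_i_j = pages.elementAt(i).links.elementAt(j).target
            mi = pages.elementAt(i).links.size () - 1
            forall 0 ≤ i < pages.size ()
            forall 0 ≤ j ≤ mi }
```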
8. (8.6 / 10) Write a rep invariant for the PageGraph implementation.
The rep invariant should ensure that there are not multiple entries with the same url:
```
   RepInvariant(c) =
      pages is not null
      no null values inside pages elements in any field
      no two elements of pages have the same url value
```
Suppose the overview specification was changed to use a different abstract notation for the PageGraph:
```
   // ... A typical PageGraph is:
   //  < [ url_1, ..., url_n ],
   //       [ < url_k1, tag_k1, target_k1 >,
   //         ..., < url_km, tag_km, target_km > ] >
   // where the target_ki's are URLs that may or may not
   // correspond to url_i's in the PageGraph.
```
9. (7.4 / 10) Write a plausible abstraction function for the new specification, assuming the same representation as in questions 7 and 8, but the new overview specification.
Although both abstract notations are reasonable, this one is less similar to the chosen concrete representation, so it is harder to express the abstraction function. The important thing to remember is that the goal is to produce an instance of the abstract notation from the concrete representation.
```
AF(c) = < [ url_1, ..., url_n ], links >
   where
      n = pages.size()
      url_i = pages.elementAt (i - 1).url
      links = [ ]
      foreach PageRecord p: pages {
         foreach link record l: p.links {
            append < p.url, l.tag, l.target > to links
         }
      }
```

### Subtyping

Consider the two interfaces defined below:
```
public interface Animal {
   public void eat(Object food)
      throws CannibalismException,
             InedibleException, OvereatingException;
}

public interface Vegetarian {
   public void eat(Object food)
      throws NonVegetableException, OvereatingException;
}
```
You may assume all the exception datatypes are direct subtypes of Exception.

10. (7.7 / 10) Suppose we implement a datatype Cow that is a subtype of both Animal and Vegetarian (that is, it implements both interfaces). Assuming Java follows the substitution principle correctly, what exceptions can the eat(Object) method for Cow throw? Explain why.

A method can only be declared to throw exceptions that are subtypes of the exceptions thrown by the corresponding supertype method. In this case, we have two supertypes, so it is less clear what should be allowed. But recall the reason for the original rule: if client code calls the supertype method correctly (that is, it catches all the supertype method's exceptions), it should still work when the subtype method is substituted. This means the subtype method should not be able to throw any exception that could not be thrown by the supertype. Since Cow has both Animal and Vegetarian as supertypes, the substitution principle requires that anywhere we used Animal before we can now safely use Cow, and anywhere we used Vegetarian before, we can now safely use Cow. This means the exceptions a subtype method may throw must be subtypes of the intersection of the exceptions of all of its supertypes' methods. In this case, the intersection of the exceptions listed by the two supertypes' methods is OvereatingException. Hence, the exceptions listed in the throws clause of the Cow eat method must all be subtypes of OvereatingException.
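The rule can be checked against the Java compiler with a small example. In the sketch below the exception classes are declared as direct subtypes of Exception, as the question assumes; the meals counter and its limit of 3 are arbitrary details invented to give eat something to do.

```java
class CannibalismException extends Exception {}
class InedibleException extends Exception {}
class OvereatingException extends Exception {}
class NonVegetableException extends Exception {}

interface Animal {
    void eat(Object food)
        throws CannibalismException, InedibleException, OvereatingException;
}

interface Vegetarian {
    void eat(Object food)
        throws NonVegetableException, OvereatingException;
}

class Cow implements Animal, Vegetarian {
    private int meals = 0;   // hypothetical state for this sketch

    // Only OvereatingException (or its subtypes) may be listed here. Adding
    // InedibleException would conflict with Vegetarian.eat, and adding
    // NonVegetableException would conflict with Animal.eat.
    public void eat(Object food) throws OvereatingException {
        if (meals >= 3) throw new OvereatingException();   // limit chosen arbitrarily
        meals++;
    }
}
```

A client holding a Cow through an Animal reference still has to catch all three of Animal's exceptions, and a client holding it through a Vegetarian reference still has to catch both of Vegetarian's, so both kinds of existing client code keep working.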

Bertrand Meyer would probably argue that, as with the contravariant typing of parameters, this is another place where the substitution principle is in conflict with what we might want. One might argue that it makes more sense to allow a cow to throw all the exceptions of the supertype methods. The best solution might be to make NonVegetableException a subtype of InedibleException.

11. (no credit) Do you feel your performance on this exam will fairly reflect your understanding of the course material so far? If not, explain why.
Most people seemed to think so, except for overemphasizing abstraction functions with two questions. My hope was the first one (question 7) would be straightforward enough that most people would get it well, and question 9 would distinguish between people who really understand how to construct abstraction functions. It also makes the point that you don't have to change the rep to need to change the abstraction function.
12. (no credit) Do you prefer lectures that use mostly slides or mostly chalk?
Three people prefer mostly chalk; two people prefer slides (one with an exclamation point, so that counts extra). The rest prefer a combination or have no clear preference. So, I guess there is no clear answer here, but I will use a mixture depending on what seems to work best for the material. As one response said, "When you switch it up, it keeps us on our toes: we have no idea what to expect next!".