cs205: engineering software?
20 September 2010
Problem Set 3 - Comments
Implementing Data Abstractions

1. (8.375 average out of 10) What abstraction function and rep invariant are needed to make the implementation of degree above satisfy its specification?

This is the implementation of degree:
   public int degree () {
      // EFFECTS: Returns the degree of this, i.e., the largest exponent
      //    with a non-zero coefficient.  Returns 0 if this is the zero Poly.
      return terms.lastElement ().power;
   }
We need these properties to ensure the code will execute without any possible run-time exceptions:
terms != null (otherwise terms.lastElement () could throw a NullPointerException)
terms.size > 0 (otherwise lastElement () could throw a NoSuchElementException)
terms does not contain null (otherwise .power could throw a NullPointerException)
This is not enough to know the implementation meets its specification. We also need to know that the last element in the vector corresponds to the term with the largest exponent with a non-zero coefficient. A sufficient rep invariant would state that the last element has a non-zero coefficient, and that every other element in the terms vector has either a lower power than the last element or a zero coefficient:
for all 0 <= i < terms.size - 1:
   terms[i].power < terms[terms.size - 1].power \/ terms[i].coeff == 0
/\ terms[terms.size-1].coeff != 0
A more reasonable rep invariant would require that the terms are sorted by their power value, and that all the terms have non-zero coefficients (except when representing the zero Poly):
terms.size == 1 && terms[0].power == 0 && terms[0].coeff == 0
\/ for all 0 <= i < terms.size:
      terms[i].power >= 0
      terms[i].coeff != 0
      for all 0 <= j < i, terms[j].power <= terms[i].power
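As a rough illustration, the sorted, non-zero-coefficient invariant could be checked at run time with a method like the one below. This is only a sketch: the class name PolyRepCheck, the method name repOk, and the TermRecord constructor are assumptions made for the example, not part of the problem set code. It also folds in the null and non-empty requirements listed earlier.

```java
import java.util.Vector;

// Record type named in the text; the constructor is added for illustration.
class TermRecord {
    int coeff;
    int power;

    TermRecord(int coeff, int power) {
        this.coeff = coeff;
        this.power = power;
    }
}

// Hypothetical helper class, not part of the problem set code.
class PolyRepCheck {
    // Returns true if terms satisfies the sorted, non-zero-coefficient invariant.
    static boolean repOk(Vector<TermRecord> terms) {
        if (terms == null || terms.isEmpty() || terms.contains(null)) {
            return false;
        }
        // Special case: the zero Poly is represented by the single term 0x^0.
        if (terms.size() == 1 && terms.get(0).power == 0 && terms.get(0).coeff == 0) {
            return true;
        }
        for (int i = 0; i < terms.size(); i++) {
            TermRecord t = terms.get(i);
            if (t.power < 0 || t.coeff == 0) {
                return false;
            }
            // Comparing neighbours is enough: if every adjacent pair is in
            // order, every earlier term has a power no larger than this one.
            if (i > 0 && terms.get(i - 1).power > t.power) {
                return false;
            }
        }
        return true;
    }
}
```

Calling a check like this at the end of every constructor and mutator during testing is one way to catch a broken invariant close to where it happens.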
In an alternate implementation with the same rep (although the abstraction function and rep invariant may be different), suppose this is the implementation of coeff:
   public int coeff (int d) {
      // EFFECTS: Returns the coefficient of the term of this whose
      //     exponent is d.

      int res = 0;
      for (TermRecord r : terms) {
         if (r.power == d) { res += r.coeff; }
      }
      return res;
   }
2. (8.625 / 10) What rep invariant would make the implementation of coeff above correctly satisfy its specification?
Note that the loop sums the values of the coefficients for every record with its power matching d. This indicates that the author of this code is not assuming a given power can appear in terms only once. Hence, the rep invariant could be:
terms != null (otherwise iterating over terms could throw a NullPointerException)
terms does not contain null
Nothing else is needed to be convinced the implementation is correct.
3. (9.125 / 10) Explain how a stronger rep invariant would make it possible to implement coeff more efficiently.
If the implementer of coeff could rely on terms not containing duplicate powers, then coeff could return as soon as it finds the first matching power, instead of having to look through all the terms.
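A minimal sketch of that early-return version follows, assuming a rep invariant that additionally forbids two terms with the same power. The class name NoDuplicatePoly and the addTerm helper are hypothetical and exist only to make the example self-contained.

```java
import java.util.Vector;

// Record type named in the text; the constructor is added for illustration.
class TermRecord {
    int coeff;
    int power;

    TermRecord(int coeff, int power) {
        this.coeff = coeff;
        this.power = power;
    }
}

// Hypothetical class, not part of the problem set code.
class NoDuplicatePoly {
    // Assumed rep invariant: terms != null, terms does not contain null,
    // and no two elements of terms have the same power.
    private final Vector<TermRecord> terms = new Vector<TermRecord>();

    // Hypothetical helper: merges into an existing term so powers stay unique.
    void addTerm(int coeff, int power) {
        for (TermRecord r : terms) {
            if (r.power == power) {
                r.coeff += coeff;
                return;
            }
        }
        terms.add(new TermRecord(coeff, power));
    }

    // EFFECTS: Returns the coefficient of the term of this whose exponent is d.
    int coeff(int d) {
        for (TermRecord r : terms) {
            if (r.power == d) {
                return r.coeff; // safe only because powers are unique
            }
        }
        return 0;
    }
}
```

The worst case is still a scan of the whole vector (when d is absent or stored last), so the saving is in the average case rather than the asymptotic bound.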

If we made the rep invariant even stronger:

terms[i].power = i for 0 <= i < terms.size
(that is, the vector contains a term record for every power in order) we could implement coeff with just:
public int coeff (int d) {
   if (d < 0 || d >= terms.size ()) return 0;
   else return terms.elementAt (d).coeff;
}
Note that this rep invariant does not work well for sparse polys (e.g., representing 3x^2034 now requires a terms vector with 2035 elements).
4. For each representation choice (a, b, and c), provide an abstraction function and rep invariant.
a. (7.6 / 10)
Vector<String> nodes;
boolean [][] edges;
The most obvious interpretation of this representation is that the boolean value at edges[i][j] indicates whether there is an edge from nodes.elementAt(i) to nodes.elementAt(j). This means the nodes vector must not contain duplicates (otherwise there could be different edge values for the same node pairs). It also means the size of the edges array must be at least as big as the nodes vector in both directions. It would be reasonable to require it to be equal in size instead, but that would be overly restrictive since it would require copying the edges matrix into a new, bigger matrix every time a node is added.
   Abstraction function:
     Nodes = { nodes[i] | 0 <= i < nodes.size () }
     Edges = { { nodes[a], nodes[b] } | 
                  forall 0 <= a, b < nodes.size()
                     where edges[a][b] = true }
   Rep Invariant:
      nodes != null; edges != null
      no duplicates in nodes
      edges.length >= nodes.size ()
      forall 0 <= i < nodes.size: edges[i].length = edges.length
         The last two clauses state that edges is a square matrix, 
         at least as big as nodes (but possibly bigger).
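The rep invariant for representation A translates fairly directly into a run-time check. The following is only a sketch; the class name RepACheck and the method name repOk are assumptions for the example. It also rejects null rows, which the invariant above leaves implicit.

```java
import java.util.HashSet;
import java.util.Vector;

// Hypothetical helper class, not part of the problem set code.
class RepACheck {
    // Returns true if (nodes, edges) satisfies the rep invariant for representation A.
    static boolean repOk(Vector<String> nodes, boolean[][] edges) {
        if (nodes == null || edges == null) {
            return false;
        }
        // No duplicates: a set built from nodes has the same size as nodes.
        if (new HashSet<String>(nodes).size() != nodes.size()) {
            return false;
        }
        // edges has at least as many rows as there are nodes.
        if (edges.length < nodes.size()) {
            return false;
        }
        // Every row that corresponds to a node is as long as the matrix is tall.
        for (int i = 0; i < nodes.size(); i++) {
            if (edges[i] == null || edges[i].length != edges.length) {
                return false;
            }
        }
        return true;
    }
}
```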
b. (4.0 / 5)
Set<String> nodes;
Set<Edge> edges;
where Edge is a record type containing two String values:
class Edge { 
   String a, b; 
   Edge (String p_a, String p_b) { a = p_a; b = p_b; }
}
This one maps very naturally onto the abstract notation. The only important invariant is that all the names of nodes in the Edge objects in edges match names of nodes in nodes:
Abstraction Function:
    Nodes = nodes
    Edges = edges
Rep Invariant:
     nodes != null
     edges != null
     for all e in edges, e.a and e.b are in nodes
c. (3.5 / 5)
Set<NodeRecord> rep;
where NodeRecord is a record type that records a String and an associated set of Strings:
class NodeRecord {
   String key;
   Set<String> values;
}
Here, the abstraction function is more complicated since we need to extract the nodes and edges from the NodeRecord objects:
Abstraction Function:
   Nodes = { el.key | el is an element in rep }
   Edges = { { el.key, value } | el is an element 
               in rep and value is an element in el.values }

   or, more precisely
     Edges = {}
     for (NodeRecord r: rep) {
        for (String val: r.values) {
           Edges = Edges U { { r.key, val } }
        }
     }

Rep Invariant:
   rep != null
   elements of rep are not null
   for every element e of rep, every string in e.values equals
      f.key for some element f of rep
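The abstraction function for representation C can also be written as ordinary Java that computes the abstract Nodes and Edges from a rep value. This is a sketch under assumptions: the class AbstractionC, the NodeRecord constructor, and the choice to encode each edge as a two-element list [from, to] are all illustrative and not taken from the problem set code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Record type from the text; the constructor is added for illustration.
class NodeRecord {
    String key;
    Set<String> values;

    NodeRecord(String key, Set<String> values) {
        this.key = key;
        this.values = values;
    }
}

// Hypothetical helper class, not part of the problem set code.
class AbstractionC {
    // Nodes = { el.key | el is an element in rep }
    static Set<String> nodes(Set<NodeRecord> rep) {
        Set<String> result = new HashSet<String>();
        for (NodeRecord r : rep) {
            result.add(r.key);
        }
        return result;
    }

    // Edges = one [key, value] pair for each value in each record's values set.
    static Set<List<String>> edges(Set<NodeRecord> rep) {
        Set<List<String>> result = new HashSet<List<String>>();
        for (NodeRecord r : rep) {
            for (String val : r.values) {
                result.add(Arrays.asList(r.key, val));
            }
        }
        return result;
    }
}
```

A toString method for this representation would need essentially the same nested loop, which is one reason it ends up more involved than for the other two reps.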
5. (7.375 / 10) Which representation choice would make implementing addNode most difficult? Explain why.
Representation A. To add a node, we need to not only add it to the nodes vector, but also may need to expand the size of the edges matrix to preserve the rep invariant. For both representations B and C, adding a node can be done by just adding a new element to a set.
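To make the extra work for representation A concrete, here is a sketch of addNode that grows the matrix when it is full. The class name StringGraphASketch, the doubling policy, the initial capacity of 2, and the use of IllegalArgumentException in place of the course's DuplicateException and NoNodeException are all assumptions made so the example compiles on its own.

```java
import java.util.Vector;

// Hypothetical class, not the posted StringGraphA implementation.
class StringGraphASketch {
    private final Vector<String> nodes = new Vector<String>();
    private boolean[][] edges = new boolean[2][2];

    void addNode(String name) {
        if (nodes.contains(name)) {
            // Stand-in for the course's DuplicateException.
            throw new IllegalArgumentException("duplicate node: " + name);
        }
        if (nodes.size() == edges.length) {
            grow(); // preserve edges.length >= nodes.size ()
        }
        nodes.add(name);
    }

    // Copies the existing edge values into a square matrix twice as large.
    private void grow() {
        int newSize = edges.length * 2;
        boolean[][] bigger = new boolean[newSize][newSize];
        for (int i = 0; i < edges.length; i++) {
            System.arraycopy(edges[i], 0, bigger[i], 0, edges.length);
        }
        edges = bigger;
    }

    void addEdge(String from, String to) {
        int a = nodes.indexOf(from);
        int b = nodes.indexOf(to);
        if (a < 0 || b < 0) {
            // Stand-in for the course's NoNodeException.
            throw new IllegalArgumentException("unknown node");
        }
        edges[a][b] = true;
    }

    boolean hasEdge(String from, String to) {
        int a = nodes.indexOf(from);
        int b = nodes.indexOf(to);
        return a >= 0 && b >= 0 && edges[a][b];
    }

    int capacity() {
        return edges.length;
    }
}
```

Doubling rather than growing by one keeps the copying cost amortized over many insertions, which is why the rep invariant allows the matrix to be larger than the nodes vector.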
6. (4.375 / 5) Which representation choice would enable the most efficient getAdjacent implementation? Explain why.
The only reasonable answer to this question is "it depends". It depends on two things: what we mean by efficient, and what types of graphs we are representing.

If efficient means the asymptotic running time of getAdjacent, then we need to consider how the work scales with the size of the graph. (If you have not yet taken CS150 or CS216, this explanation probably won't make much sense. Don't worry about this, but it is included since CS150 graduates should be thinking this way.)

For representation A, we need to find the node in the nodes vector. With the given rep invariant that does not impose any ordering on the strings in the vector, this requires up to one comparison with every string in the vector, so it is Θ(n) where n is the number of nodes. Then, we need to look through one row of the edges matrix to find the edges. For each true value, we find the corresponding element in nodes (this is constant time), and add the corresponding string to the result (also constant time, given a good Set implementation). The size of the matrix is the number of nodes, n, so this operation is also Θ(n). Performing two Θ(n) operations in sequence is also Θ(n), so the total running time for implementation A scales linearly with the number of nodes in the graph.

For representation B, we need to go through the elements of the edges set to find all edges that start with the parameter node. This requires e iterations where e is the number of edges in the graph, so it is Θ(e). Note that the number of edges in a graph can scale as the square of the number of nodes (if every node is connected to all nodes), so this is worse than Θ(n) for representation A.

For representation C, we need to find the node record element of the rep with a key matching the parameter, and then return a copy of the set of values associated with that node. (Note that it must be a copy, otherwise the rep is exposed.) This requires Θ(n) iterations to look through the set elements, and then Θ(n) work to copy the values (the maximum number of edges for a given node is n). So, it is equivalent to representation A, Θ(n). Thus, for asymptotic running time, the most efficient representations are A and C.

If efficient means the actual running time on a particular Java implementation, then it is hard to know without knowing details of the underlying Vector and Set implementations.

If efficient means minimal memory usage, then they are all equivalent (all need to create the Set to return), unless we allow the rep to be exposed. If rep exposure is allowed (that is, we modify the spec for getAdjacent to require that the caller may not modify the result or use it after the graph is modified), then C can be implemented most efficiently by just returning the corresponding values Set.

The other part of the efficiency question depends on what types of graphs we are representing. If the graphs are very dense (the number of edges is scaling as the square of the number of nodes), then the first representation may be best since it represents the edges with a fixed size matrix. If the graphs are sparse (there is a large number of nodes, but most nodes are just connected to a few other nodes), then we are better off with either B or C.

7. (16.75 / 20) Implement the StringGraph datatype specified above. You may use any of the datatypes from PS2 you want except the DirectedGraph datatype (note that the specification suggests using the NoNodeException, DuplicateException, and Set provided datatypes). Your implementation may use any representation you want (including the ones described above, but not limited to those choices). Your implementation should clearly document its abstraction function and rep invariant.
My implementations using the three different reps from question 4 are attached and available at http://www.cs.virginia.edu/cs205/ps/ps3/ps3-mine.zip. Note that I made StringGraph into an interface, so I can have the three different implementations StringGraphA, StringGraphB, and StringGraphC implement that interface. This makes them all subtypes of StringGraph, so they can be used interchangeably (as they are in the test code). Implementation B is the shortest, but C is probably the simplest except for the complex toString method (which is related to the complex abstraction function needed).

8. (8.875 / 10) Describe a testing strategy for your StringGraph datatype. Include all the code you developed for testing in your answer.
We can develop most of the testing strategy independent of the implementation (that is, black box testing). We should try all operations on an empty graph, a graph with some nodes and no edges, and a graph with many nodes and edges. We should also try inputs that cover all paths through the method specifications. See the provided code for my tests: TestGraph.java
9. (8 / 10) Consider adding a removeNode method to the StringGraph datatype that removes a node from a graph. Write a declarative specification for the removeNode method. Consider carefully what should happen with the edges of the graph when a node is removed, and make sure your specification is total.
public void removeNode(String s) throws NoNodeException
  // MODIFIES: this
  // EFFECTS: If s is not a node in this, throw NoNodeException.
  //    Otherwise, remove s from the nodes of this, and remove
  //    all edges from the edges of this where either endpoint
  //    of the edge matches s.
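One way this specification might be implemented for representation B is sketched below. The class name StringGraphBSketch and its helper methods are hypothetical, and java.util.NoSuchElementException stands in for the course's NoNodeException so that the example is self-contained.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;

// Record type from question 4b.
class Edge {
    String a, b;

    Edge(String p_a, String p_b) {
        a = p_a;
        b = p_b;
    }
}

// Hypothetical class, not the posted StringGraphB implementation.
class StringGraphBSketch {
    private final Set<String> nodes = new HashSet<String>();
    private final Set<Edge> edges = new HashSet<Edge>();

    void addNode(String s) {
        nodes.add(s);
    }

    void addEdge(String from, String to) {
        if (!nodes.contains(from) || !nodes.contains(to)) {
            throw new NoSuchElementException("unknown node");
        }
        edges.add(new Edge(from, to));
    }

    // MODIFIES: this
    // EFFECTS: If s is not a node in this, throws NoSuchElementException
    //    (standing in for NoNodeException). Otherwise, removes s and
    //    every edge with s as either endpoint.
    void removeNode(String s) {
        if (!nodes.remove(s)) {
            throw new NoSuchElementException(s);
        }
        // Use the iterator's remove so the set can be modified while scanning it.
        Iterator<Edge> it = edges.iterator();
        while (it.hasNext()) {
            Edge e = it.next();
            if (e.a.equals(s) || e.b.equals(s)) {
                it.remove();
            }
        }
    }

    boolean hasNode(String s) {
        return nodes.contains(s);
    }

    int edgeCount() {
        return edges.size();
    }
}
```

Removing the incident edges is what keeps the rep invariant for B intact: afterwards, every remaining Edge still names only nodes that are in the nodes set.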
10. (9.5 / 10) For this question you have a choice, either do choice 1 or choice 2:
Unsurprisingly (and disappointingly), no one chose choice 2 even though it is much easier (but required figuring out a few new things on your own, or from the examples already provided). To implement a generic DirectedGraph datatype, all you would need to do is add <T> to the class declaration, and replace String with T at appropriate places in the implementation.
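A rough sketch of what the generic version might look like follows. It is not the PS2 DirectedGraph code: the class name GenericGraphSketch, the map-based rep, and the use of standard Java exceptions in place of DuplicateException and NoNodeException are assumptions for the example. The point is only that the node type appears as T wherever String would otherwise appear.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical generic graph; T takes the place of String as the node type.
class GenericGraphSketch<T> {
    // Rep: maps each node to the set of nodes it has an edge to.
    private final Map<T, Set<T>> adjacent = new HashMap<T, Set<T>>();

    void addNode(T node) {
        if (adjacent.containsKey(node)) {
            // Stand-in for the course's DuplicateException.
            throw new IllegalArgumentException("duplicate node");
        }
        adjacent.put(node, new HashSet<T>());
    }

    void addEdge(T from, T to) {
        if (!adjacent.containsKey(from) || !adjacent.containsKey(to)) {
            // Stand-in for the course's NoNodeException.
            throw new NoSuchElementException("unknown node");
        }
        adjacent.get(from).add(to);
    }

    // Returns a copy so that callers cannot modify the rep.
    Set<T> getAdjacent(T node) {
        Set<T> out = adjacent.get(node);
        if (out == null) {
            throw new NoSuchElementException("unknown node");
        }
        return new HashSet<T>(out);
    }
}
```

A client would then write, for example, GenericGraphSketch<Integer> or GenericGraphSketch<String>, and the compiler checks that only nodes of that type are added.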