[an error occurred while processing this directive]
Problem Set 2 - Comments
Procedural Abstraction and Using Abstract Datatypes
For the next question, consider these two specifications for the
sort procedure:
A. From the Java 2 Platform API documentation (java.util.Arrays):
public static void sort(int[ ] a)
Sorts the specified array of ints into
ascending numerical order. The sorting algorithm is a tuned quicksort,
adapted from Jon L. Bentley and M. Douglas McIlroy's "Engineering a Sort
Function", Software-Practice and Experience, Vol. 23(11) P. 1249-1265
(November 1993). This algorithm offers n*log(n) performance on many data
sets that cause other quicksorts to degrade to quadratic performance.
Parameters:
a - the array to be sorted.
|
B. From the textbook (p.46):
public static void sort(int[ ] a)
MODIFIES: a
EFFECTS: Rearranges the elements of a into ascending order.
e.g., if a = [3, 1, 6, 1], a_post = [1, 1, 3, 6]
|
1. (10) Describe three advantages of specification B over
specification A.
There are many advantages:
- Specification B is shorter than Specification A.
- Specification B is declarative, specification A is operational.
- Specification B states clearly what may be modified by the method.
- Specification B does not overconstrain the implementation.
Specification A requires a particular sorting algorithm; if a better
algorithm is discovered for a particular platform, it cannot be used if
Specification A is used.
2. (5) Describe one scenario where specification A is more useful.
If the client needs to know the performance of the sort method,
the operational information in specification A may be helpful. This
would be the case, for example, if the client is using sort to
sort a large array, and needs to know if it will perform adequately (on
all Java platforms), or if it is necessary to implement a sort routine
that has known performance properties.
3. (10) Write a complete, declarative specification of
histogram. Your specification should be enough for a client to
safely use the implementation provided above.
public static int [] histogram (int [] a)
// REQUIRES: a is non-null and the values in a are non-negative.
// EFFECTS: Returns an array, result, where result[x] is the
// number of times x appears in a. The result array has
// maxval(a) + 1 elements. For example,
// histogram ({1, 1, 2, 5}) = { 0, 2, 1, 0, 0, 1 }
Note that the REQUIRES is needed. The implementation provided would
produce a run-time exception if its argument includes a negative value.
4. (10) Write an alternative specification for the histogram
procedure that places less burden on the client by using exceptions.
What are the advantages and disadvantages of this specification compared
to your answer to question 3?
public static int [] histogram (int [] a) throws NegativeValue
// EFFECTS: If a contains any negative values, throws NegativeValue.
// If a is null, throws a NullPointerException.
// Otherwise, returns an array, result, where result[x] is the
// number of times x appears in a. The result array has
// maxval(a) + 1 elements. For example,
// histogram ({1, 1, 2, 5}) = { 0, 2, 1, 0, 0, 1 }
(Note we need to create the NegativeValue exception class.)
The advantage of this specification is the behavior is predictable when
an array containing a negative value is passed in. The disadvantage is
it requires a more complex implementation.
5. (10) Write an alternative specification for the histogram
procedure that is total. A client should be able to pass any array of
integers into this procedure and obtain a meaningful return value.
What are the advantages and disadvantages of this specification compared
to your answers to questions 3 and 4?
A better solution requires changing the return value of
histogram to represent the histogram in a more complex way.
One possibility is to keep the
int [] array return value, but
start the indexes from the minimum value in
a:
public static int [] histogram (int [] a)
// EFFECTS: If a is null, returns { }. Otherwise,
// returns an array, result, where result[minValue(a) + x] is the
// number of times x appears in a and minValue(a) is the lowest value
// in a. The result array has maxValue(a) - minValue(a) + 1
// elements. For example,
// histogram ({1, 1, 2, 5}) = { 2, 1, 0, 0, 1 }
// histogram ({-2, 0, 1, -2}) = { 2, 0, 1, 1 }
This is a bit awkward for a client, however, since it needs to know
minValue(a) to use the return value usefully.
Another option is to return a value table:
public static java.util.HashMap histogram (int [] a)
// EFFECTS: Returns a HashMap where the value associated with x in
// the result is the number of times x appears in a. That is,
// if result.containsKey (x)
// appearances of x = result.get (x)
// else
// appearances of x = 0.
This may be more useful for the client, but requires changing the return
type to a data type that can represent a mapping (such as the
java.util.HashMap used here.
6. (17 points for code, 8 points for tests) Impement the task
schedule as described above. Your program should take a file name as
input, and output a valid schedule and completion time for completing
the first task in the file. You implementation should be total: no
matter what input it is given, it should behave in a sensible
way. You should use procedural abstraction to avoid code duplication
and enhance the clarity of your implementation.
You should implement your program by creating a new
TaskScheduler class (in the ps2 package) with a main
method.
There are lots of possible implementations. Here is a solution that
follows the structure we talked about in class (
http://www.cs.virginia.edu/cs205/ps/ps2/mine/TaskScheduler.java).
Note that it avoids the problem of circular dependencies in a fairly
kludgey way (using the
depth and
maxdepth parameters of
scheduleTask to
recognize when too many nested calls to
scheduleTask have been
made). There are many more elegant ways to handle this problem, but the
provided
DirectedGraph datatype is fairly inadequate (for
example, without a
remove operation, or a way of iterating
through all nodes, we cannot implement a standard topological sort).
[an error occurred while processing this directive]
7. (10) Write a specification for your program. Your specification
should be total: it should describe what the program does on all
possible inputs.
It is quite difficult to write a clear and complete specification for the
main method. The specification needs to describe the input
file format and constraints on the contents of valid files. We write a
total spec (no requires clause) that describes the behavior on all
possible inputs (even though for some inputs the only sensible behavior
is to report an error and exit). An alternative would be to use a
REQUIRES clause to impose preconditions on the input, including the
input file. This makes the implementation easier, but means the
behavior on some inputs is unpredictable.
EFFECTS: If the command line arguments do not contain at least one element, and
the first element is not the name of a readable file that satisfies the
file format described below, the output is an error message.
Otherwise, outputs a valid schedule for completing the first task listed
in the input file. A valid schedule is a schedule in which: (1) only
tasks that must be completed to complete to first task are included, and
(2) every task is preceeded by the tasks on which it depends.
The input file is described by this (somewhat informal) grammar:
Line ::= CommentLine | TaskLine
CommentLine ::= # any characters up to and including the end of line
TaskLine ::= Name Number { DependsList } end of line
DependsList ::=
DependsList ::= Name DependsList
Name ::= a sequence of non-space printable characters
Number ::= a sequence of characters that can be parsed as an integer
Every task name that appears in a DependsList must also appear as
the Name of a TaskLine. The same Name may
not appear as the Name on more than on TaskLine; the
same Name may not appear more than once in a
DependsList. The file must describe an acyclic graph. That
is, if a given task A depends on task B, there must be no path in the
graph from B to A.
8. (16) Describe a testing strategy for an implementation of the task
schedule program. Your answer should include a list of black-box test
cases, and any additional glass-box test cases.
The black-box test cases should explore paths through the specification;
the glass-box test cases should explore paths through the
implementation.
There are many tests needed to try all paths through the specification.
In particular, we should try:
- Valid and invalid command lines — 0, 1 and more arguments
(note that our specification implies that it is okay to have more than
one argument, but doesn't do anything with the additional arguments)
- An unreadable file
- An empty file
- Readable files that do not satisfy the file format. Lots of
possibilities including any input that does not satisfy the provided
grammar. Obvious possibilities include files with incomplete task lines
(no Number, no {, no }) at the end and in the
middle of the file.
- Readable files that satisfy the file format but not the
constraints. Obvious possibilities include having task names in
depends lists but not in task lines, having cyclic dependencies,
duplicate tasks, and duplicate tasks in the depends lists.
- Valid inputs including tasks with deep dependency chains, inputs
that include unnecessary tasks (not needed to complete the first task),
inputs where there are no dependent tasks for the first task.
For glass box testing, we consider any paths through the code that do
not appear to be tested by our black box test cases. The main issue
this turns up is from the code for
scheduleTask — we
should be sure to include some test cases where the true brach of the
schedule.contains(t) predicate would be taken. For example,
main 10 { a b }
a 10 { c }
b 10 { c }
c 5 { }
Note that the implementation contains infinitely many possible paths, so
we cannot possible test them all. But, the black box and glass box
tests described about should at least cover all lines in the program.
9. (10) How confident are you that your program will always work as intended? (Where "as intended" means as you
specified it in question 7, except if there are inputs that are not
covered by your specification it must behave as the course staff
intended.) Express your answer as a bet of between 0 and 20 points. If the
customer (grader) agrees that your program always works as intended, you
get the points. If not, you lose twice your bet. You should
assume that your program will run with implementations of the
provided datatypes that satisfy their specifications, but not
necessarily the same implementations as were provided.
I don't know of any inputs for which my code fails, but I'd be very
surprised if it is completely correct. But, I'll a gambler, so I'll bet
20 points. If you find an input for which my code behaves badly, you
get the 20 points.
Here are the tests I ran (for people who bet more than 0 points for
question 9):
- No command line arguments
- engine.tsl — the example
- missingbrace.tsl — file
contains a task line with a missing close brace
- duplicates.tsl — file
contains duplicate tasks with the same name
- dupdepends.tsl — file
lists the same task twice in a dependencies list
- circular.tsl — file contains
circular dependencies
- unused.tsl — file contains
unnecessary tasks (not needed to complete first task)