Because we have to deal with `None`, the code is fairly awkward and complex:

    def equal(self, t):
        if not self.__value == t.__value:
            return False
        if self.getLeft () == None:
            if not t.getLeft () == None:
                return False
        else:
            if t.getLeft () == None:
                return False
            if not self.getLeft ().equal (t.getLeft ()):
                return False
        if self.getRight () == None:
            if not t.getRight () == None:
                return False
        else:
            if t.getRight () == None:
                return False
            if not self.getRight ().equal (t.getRight ()):
                return False
        return True
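Much of the special-casing can be avoided if the comparison is allowed to receive `None` for either argument. The following is a minimal Python 3 sketch of that idea, not the course's solution; the `Node` class and the `trees_equal` function are hypothetical names introduced only for illustration.

```python
class Node:
    """Minimal binary tree node (hypothetical stand-in for the Tree class)."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def trees_equal(a, b):
    """Return True if the (possibly empty) trees a and b match in shape and values."""
    if a is None or b is None:
        # Two empty trees are equal; an empty tree never equals a non-empty one.
        return a is None and b is None
    return (a.value == b.value
            and trees_equal(a.left, b.left)
            and trees_equal(a.right, b.right))
```

Because `and` short-circuits, the comparison still stops at the first mismatch, as the method above does.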

- What is the worst case running time of your `equal` method?

  Θ(*N*): the worst case is when the trees are equal (or the only difference is in the rightmost leaf) and every node must be compared. This will involve *N* calls to the `equal` method. The work for each call is constant. It involves lots of comparisons, but no work that scales with the tree size.

- What is the best case running time of your `equal` method?

  *O*(1): if the root nodes are unequal, only one comparison is needed (to reach the first `return False`), so the running time is constant and does not scale with the size of the tree.

- What is the worst case space usage of your `equal` method?

  The space for each call is constant (no local variables are used), so the space scales with the number of calls that might be on the stack. The worst case occurs when the tree is completely unbalanced, so its height is *N*. In this case, we could have *N* recursive calls active as we walk down the tree, so the worst case space usage is Θ(*N*).

- What is the worst case space usage of your `equal` method if the input trees are both well-balanced?

  If the trees are well balanced, the height of each tree is Θ(log *N*), so the maximum recursive depth is Θ(log *N*), and the worst case space usage (for balanced trees) is Θ(log *N*).
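The difference between the two space bounds can be checked experimentally. The Python 3 sketch below is an illustration under assumed names (`Node`, `equal_depth`, `chain`, and `balanced` are all hypothetical): it performs the recursive comparison and also reports how many calls on real nodes were active at once.

```python
class Node:
    """Minimal binary tree node (hypothetical stand-in for the Tree class)."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def equal_depth(a, b):
    """Return (are_equal, deepest chain of active calls on real nodes)."""
    if a is None or b is None:
        return (a is None and b is None), 0
    if a.value != b.value:
        return False, 1
    left_ok, left_depth = equal_depth(a.left, b.left)
    if not left_ok:
        return False, 1 + left_depth
    right_ok, right_depth = equal_depth(a.right, b.right)
    return right_ok, 1 + max(left_depth, right_depth)


def chain(n):
    """Completely unbalanced tree: n nodes, each with only a right child."""
    root = None
    for value in range(n, 0, -1):
        root = Node(value, None, root)
    return root


def balanced(lo, hi):
    """Well-balanced tree holding the values lo .. hi-1."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(mid, balanced(lo, mid), balanced(mid + 1, hi))
```

For a chain of 100 nodes the reported depth is 100, while for a balanced tree of 127 nodes it is 7.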

**3.** Define a method `isomorphic` in the `Tree` class
that takes a tree as its parameter, and evaluates to true if and only if
the input tree is isomorphic to self. Two trees are considered
isomorphic if their root nodes are equal, and each node in the tree
either (1) has a left child that is isomorphic to the left child of the
corresponding node in the self tree and has a right child that is
isomorphic to the right child of the corresponding node in the self
tree; or (2) has a left child that is isomorphic to the right child of
the corresponding node in the self tree and has a right child that is
isomorphic to the left child of the corresponding node in the self
tree. (The intuition behind our definition is the two trees would be
equal if you could swap left and right children.)

There are lots of possibilities here. One is to modify our `equal` definition to add the isomorphic cases. This would be pretty complex, however, especially since we have to deal specially with `None` children. So, instead we implement a somewhat simpler approach:

    def numChildren(self):
        num = 0
        if not self.__left == None:
            num += 1
        if not self.__right == None:
            num += 1
        return num

    def isomorphic(self, t):
        if not self.__value == t.__value:
            return False
        if not self.numChildren () == t.numChildren ():
            return False
        if self.numChildren () == 0:
            return True
        elif self.numChildren () == 1:
            schild = self.getLeft ()
            if schild == None:
                schild = self.getRight ()
            tchild = t.getLeft ()
            if tchild == None:
                tchild = t.getRight ()
            return schild.isomorphic (tchild)
        else:
            return (self.getLeft ().isomorphic (t.getLeft ()) \
                    and self.getRight ().isomorphic (t.getRight ())) \
                   or \
                   (self.getLeft ().isomorphic (t.getRight ()) \
                    and self.getRight ().isomorphic (t.getLeft ()))

Another option would be to use `equal` and swap children in our comparisons, but repair them after. This is risky: we need to know there is no other code running concurrently that might observe the tree in its altered state. It does make the code simpler, however:

    def isomorphic(self, t):
        if self.equal(t):
            return True
        else:
            (self.__left, self.__right) = (self.__right, self.__left)
            res = self.equal(t)
            (self.__right, self.__left) = (self.__left, self.__right)
            return res

As written, this second version only tries a swap at the root, so it does not detect trees that differ by swaps at deeper nodes.
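For comparison, here is a minimal Python 3 sketch of a third possibility: a standalone function that accepts `None` for either argument, considers the swapped case at every node, and never modifies either tree. It is an illustration rather than the course's solution, and the `Node` class and function name are hypothetical.

```python
class Node:
    """Minimal binary tree node (hypothetical stand-in for the Tree class)."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def isomorphic(a, b):
    """Return True if a and b are equal up to swapping children at any node."""
    if a is None or b is None:
        return a is None and b is None
    if a.value != b.value:
        return False
    # Case 1: children correspond in the same order.
    if isomorphic(a.left, b.left) and isomorphic(a.right, b.right):
        return True
    # Case 2: children correspond after swapping left and right.
    return isomorphic(a.left, b.right) and isomorphic(a.right, b.left)
```

Like the first version above, this may explore both pairings at each node, so it can do much more work than `equal` when many nodes share the same value.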

This is pretty tricky. We need to find a way to keep track of the state of the comparison. With the recursive definition, Python's runtime stack does this for us. If we can't use recursion, though, we need to keep track of this ourselves. Our strategy is to maintain a list of pairs of nodes that remain to be checked.

    def equalIter(self, t):
        print "equalIter: " + str(self) + " / " + str(t)
        nodes = [[self, t]]
        while not len(nodes) == 0:
            nnodes = []
            for pair in nodes:
                print "Checking pair: " + str(pair[0]) + " / " + str(pair[1])
                if pair[0] == None:
                    if not pair[1] == None:
                        return False
                elif pair[1] == None:
                    return False
                else:
                    if pair[0].getValue () != pair[1].getValue ():
                        return False
                    nnodes.append ([pair[0].getLeft (), pair[1].getLeft ()])
                    nnodes.append ([pair[0].getRight (), pair[1].getRight ()])
            nodes = nnodes
        return True

Note that the code is actually simpler than our recursive code because we don't need as much special code for handling the `None` cases.
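The same worklist strategy can be sketched in Python 3 with a single queue of pending pairs instead of one list per level. This is an illustrative variant, not the code above; `Node` and `equal_iter` are hypothetical names, and `collections.deque` is used so that removing from the front of the queue is cheap.

```python
from collections import deque


class Node:
    """Minimal binary tree node (hypothetical stand-in for the Tree class)."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def equal_iter(a, b):
    """Compare two trees without recursion, using a queue of node pairs."""
    pending = deque([(a, b)])
    while pending:
        x, y = pending.popleft()
        if x is None or y is None:
            if x is not y:          # exactly one side is missing
                return False
            continue                # both missing: nothing more to check
        if x.value != y.value:
            return False
        pending.append((x.left, y.left))
        pending.append((x.right, y.right))
    return True
```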

- What is the worst case running time of your `iterEqual` method?

  Θ(*N*): each pair of corresponding nodes is examined once, so in the case where all the nodes are equal the number of pairs examined is proportional to *N*. The easiest way to see this is noticing that each node value must be compared once. The running time of each operation is constant. This assumes the list append and access operations are all *O*(1).

- What is the best case running time of your `iterEqual` method?

  As in 2b, *O*(1).

- What is the worst case space usage of your `iterEqual` method?

  Since there are no recursive calls, the stack depth is constant. But, `iterEqual` uses memory to store the `nnodes` list. The space needed to store a list scales linearly in the number of elements in the list. So, we need to figure out the longest list it could be.

  The `nnodes` list contains the pairs of nodes at a given depth of the tree, so its maximum length is the maximum number of nodes at any tree depth. This is maximized for a well-balanced tree, where it is the number of leaves in the tree (which are all at the same depth in a well-balanced tree). The maximum number of leaves in a tree of *N* nodes is about *N*/2. So, the memory use is in Θ(*N*).

- What is the worst case space usage of your `iterEqual` method if the input trees are both well-balanced?

  That is the worst case, Θ(*N*), as explained above.
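The claim that balanced trees are the expensive case for the level-by-level approach can be checked with a small experiment. The Python 3 sketch below uses hypothetical names (`Node`, `level_compare`, `chain`, `balanced`) and records the longest list of pairs held at any one time. The list also holds pairs for missing children, so its length can exceed the leaf count, but it remains proportional to *N*.

```python
class Node:
    """Minimal binary tree node (hypothetical stand-in for the Tree class)."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


def level_compare(a, b):
    """Return (are_equal, length of the longest per-level list of pairs)."""
    nodes = [(a, b)]
    longest = 0
    while nodes:
        longest = max(longest, len(nodes))
        next_nodes = []
        for x, y in nodes:
            if x is None or y is None:
                if x is not y:
                    return False, longest
            elif x.value != y.value:
                return False, longest
            else:
                next_nodes.append((x.left, y.left))
                next_nodes.append((x.right, y.right))
        nodes = next_nodes
    return True, longest


def chain(n):
    """Completely unbalanced tree: n nodes, each with only a right child."""
    root = None
    for value in range(n, 0, -1):
        root = Node(value, None, root)
    return root


def balanced(lo, hi):
    """Well-balanced tree holding the values lo .. hi-1."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(mid, balanced(lo, mid), balanced(mid + 1, hi))
```

For a balanced tree of 127 nodes the longest list has 128 pairs, while for a chain of 100 nodes it never exceeds 2.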

**6.** The provided `insert` method has expected running time
in Θ(*N*) where *N* is the number of entries in the
table. (We are optimistically assuming the Python slicing and access
operations are in *O*(1).) Define an `insert` method that
has expected running time in Θ(log *N*).

We use the same search strategy as in `lookup` to find the correct insertion position:

    def insert(self, key, value):
        def insertposition(low, high):
            if (low >= high):
                return low
            middle = (low + high) / 2
            if key < self.items[middle].key:
                return insertposition (low, middle)
            elif key > self.items[middle].key:
                return insertposition (middle + 1, high)
            else:
                print "ERROR! Duplicate key"
                assert (False)
        pos = insertposition (0, len(self.items))
        self.items.insert (pos, Record (key,value))

This has expected running time in Θ(log *N*) since each recursive call to `insertposition` halves the number of locations that are under consideration.
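In Python 3, the standard library's `bisect` module performs the same binary search for an insertion position. The sketch below is an illustration only: the `SortedTable` class, with keys and values held in two parallel lists, is a hypothetical layout and not the provided table class. Only the search is logarithmic here; the shifting done by `list.insert` is not counted, in the same optimistic spirit as the question.

```python
import bisect


class SortedTable:
    """Hypothetical table that keeps keys sorted, with values in a parallel list."""
    def __init__(self):
        self.keys = []
        self.values = []

    def insert(self, key, value):
        # Binary search for the leftmost position where key could go.
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            raise KeyError("duplicate key: %r" % (key,))
        self.keys.insert(pos, key)
        self.values.insert(pos, value)

    def lookup(self, key):
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.values[pos]
        return None
```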

**7.** Ari Tern suggests replacing the implementation of
`lookup` with this implementation (`tlookup` in
`ContinuousTable.py`):

    def tlookup(self, key):
        def lookuprange(items):
            if len(items) == 0:
                return None
            if len(items) == 1:
                if items[0].key == key:
                    return items[0].value
                else:
                    return None
            split1 = len(items) / 3
            split2 = 2 * len(items) / 3
            if key < items[split1].key:
                return lookuprange (items[:split1])
            elif key < items[split2].key:
                return lookuprange (items[split1:split2])
            else:
                return lookuprange (items[split2:])
        return lookuprange(self.items)

Is this a good idea? (A good answer will consider the effect of Ari's change on both the asymptotic and absolute properties of the procedure.)

The `tlookup` implementation requires fewer recursive calls to `lookuprange` than were required with `lookup`, since each call eliminates two thirds of the items from consideration, instead of just one half. This means the number of expected calls is $\log_3 N$ instead of $\log_2 N$. Within our order notation, this doesn't matter, though, since changing the base of a log only alters the value by a constant factor. So, the asymptotic running time is still in Θ(log *N*).

The actual running time, however, will be affected. We argued in the previous paragraph that the number of calls is reduced from $\log_2 N$ to $\log_3 N$. In Lecture 5, we saw $\log_b x = \log_a x / \log_a b$. So, this reduces the number of calls by a factor of $1 / \log_2 3 \approx 0.63$. The cost is an increase in the size (and complexity) of the code, and an increase in the running time of each call. We can estimate the running time increase by the number of expected comparisons. In the original code, one comparison is always needed (`key < items[middle].key`). (We ignore the end cases where the length is 0 or 1 since these are only encountered once.) In the modified code, this is more complex. We always make the first comparison (`key < items[split1].key`). If it is true, we are done. Otherwise, we need to make the second comparison. Assuming the calls to `lookup` are evenly distributed over the list, we expect the first comparison to be true only 1/3 of the time. Hence, the expected number of comparisons is 1 + 2/3. If our assumption that comparisons dominate the running time holds, then the expected running time is 0.63 * (1 + 2/3) = 1.05 times the running time of `lookup`. So, we would expect it to be slightly slower, but after accounting for the overhead of the calls and the other work, this would be reduced. Hence, the change is a bad idea. There is no likely performance improvement (and a possible reduction), and the size of the code has increased.
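The arithmetic in this estimate can be reproduced directly. The short Python 3 sketch below only restates the numbers from the argument above, under the same assumptions that comparisons dominate the cost and that lookups are spread evenly over the list. The variable names are chosen for illustration.

```python
import math

# Ratio of call counts: log_3 N / log_2 N = 1 / log_2 3, independent of N.
call_ratio = math.log(2) / math.log(3)

# Expected key comparisons per call.
comparisons_binary = 1.0
# Ternary: the first comparison always happens; the second is needed
# in the 2/3 of cases where the first comparison is false.
comparisons_ternary = 1.0 + 2.0 / 3.0

# Estimated running time of tlookup relative to lookup.
relative_cost = call_ratio * comparisons_ternary / comparisons_binary

print(round(call_ratio, 2), round(relative_cost, 2))
```

The two printed values round to 0.63 and 1.05, matching the figures used in the answer.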

We need to find an example where the best possible phylogeny does not match the one found by the greedy algorithm. Any case where the best phylogeny does not directly connect the two elements with the highest goodness score would satisfy this, since we know the greedy algorithm would connect those elements.

The greedy algorithm could be implemented with a running time in $\Theta(n^2)$ where $n$ is the number of species in the input set.

We need to first compute the goodness matrix. This involves computing the best alignment of each pair of sequences. There are $n^2$ cells to fill. If we use the Needleman-Wunsch algorithm (Lecture 4), each one requires work in $\Theta(|U||V|)$. If we assume the lengths of the input genomes do not scale (that is, we are concerned with $n$ scaling, but the genome lengths are bounded), this is constant time with respect to the input size (which is measured in the number of species in the input set).

Then, we execute the greedy algorithm. Finding the best initial pair requires running time in $O(n^2)$ assuming we can access each cell in the matrix in constant time. We just need to look at all the cells to find the best goodness score.

Adding each element requires considering all remaining elements (there are $O(n)$ of them). For each one, we need to consider all possible tree locations where it could be added. This scales with the number of nodes in the current tree, since each node can have at most two children to consider. The number of nodes in the tree is up to $n$. This is $O(n^2)$. For each, we need to compute the total goodness score. If we use the result from the previous tree, though, we can compute this by just adding the new goodness score to the old score, so this can be done in constant time.

Hence, the total running time is in $O(n^2)$.
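The step of finding the best initial pair amounts to one scan over the goodness matrix. Here is a minimal Python 3 sketch of that scan. It assumes the matrix is a square list of lists that is symmetric, so only the cells above the diagonal are examined; the function name `best_pair` and the sample scores are hypothetical.

```python
def best_pair(goodness):
    """Return (i, j, score) for the highest off-diagonal score, or None if n < 2."""
    n = len(goodness)
    best = None
    for i in range(n):
        # Assumed symmetric, so scanning the upper triangle is enough.
        for j in range(i + 1, n):
            if best is None or goodness[i][j] > best[2]:
                best = (i, j, goodness[i][j])
    return best


# Made-up scores for three species, for illustration only.
sample = [[0, 3, 7],
          [3, 0, 5],
          [7, 5, 0]]
```

Calling `best_pair(sample)` returns `(0, 2, 7)`. The scan visits about $n^2/2$ cells, each in constant time.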

CS216: Program and Data Representation, University of Virginia |
David Evans, evans@cs.virginia.edu