CS432, Fall 2005
Solutions for Problem Set 4

Part 1:
-------

(A) Vertex cover problem

Problem: given G = (V, E), find the smallest subset W of V such that W "covers" every edge in E, meaning that for every edge vw in E, either v or w belongs to W.

A greedy algorithm builds up a solution one vertex at a time until the set covers all edges. That set is the final solution, W. Each possible greedy solution is based on a particular greedy rule, i.e. a rule for which vertex to add to W next.

Solution 1: Let G2 be a copy of G (we'll modify G2 as we go).
Greedy rule: Choose the vertex v in G2 that has the largest degree (number of edges incident to v). Add v to the partial solution W, and remove all edges incident to v from G2. Repeat until no edges are left in G2. (It is very likely that an identical or similar strategy can be found that doesn't need a copy of the graph to modify.)

Why doesn't this always produce an optimal result? A simple counterexample shows that it can fail when there is more than one equally good choice (from the point of view of the greedy rule) but the choice makes a difference.

Counterexample:

    A--B--C--D--E

The optimal solution is {B, D}, but the greedy rule above might just as easily choose C first, since B, C, and D each have degree 2. Choosing C leaves edges A--B and D--E uncovered, so two more vertices are needed and the resulting cover has size 3.

NOTE on counterexamples (or how to fool a greedy algorithm): Finding an input for which a greedy approach fails is sometimes hard. But a good strategy (usually) is to come up with an "adversary input" that tries to "trick" the algorithm into making an apparently OK choice that later turns out to be bad. (Often this strategy relies on two apparently equal choices, one of which turns out to be bad later on.)

(B) Graph coloring

Here we want to assign a color c_i to each vertex v_i in V such that no two adjacent vertices have the same color.

Solution 1: (See p. 521 of our textbook!)
    for each vertex v in V {
        assign v the smallest color that is not assigned to one of its neighbors.
    }

The greedy selection rule is simply the step inside the loop. But note that this algorithm processes vertices in whatever order they happen to be stored, which is arbitrary. (This might be the order they're found in the adjacency list or adjacency matrix data structure.) This is what makes it fail. Consider this counterexample, which shows it does not always produce an optimal coloring. Let's say you process vertices in alphabetical order. For this graph:

    A--C--D--B

you'd assign 1 to A, then 1 to B (uh oh), then 2 to C, and then you'd have to assign 3 to D. Two colors are enough: A=1, C=2, D=1, B=2.

Solution 2: You can use the same basic strategy as above, but use DFS (or BFS) to move through the vertices. Imagine creating a DFS tree. As you visit a new vertex v, you look at each adjacent vertex w, which is either an already visited vertex (in which case it has a color assigned to it) or an unvisited vertex. Assign v the lowest color value that is not assigned to one of its visited/colored neighbors.

Not optimal? It does work for the counterexample given above. But here's a counterexample for greedy DFS graph coloring:

      B--F
     /|  |\
    A |  | E
     \|  |/
      C--D

If DFS starts at A and happens to visit the vertices in the order A, C, D, E, F, B (adjacency lists can be stored in any order), the algorithm produces a 4-coloring: A=1, C=2, D=1, E=2, F=3, B=4. But here's a 3-coloring: A=1, B=2, C=3, D=2, E=1, F=3.

The same shape can be relabeled as a counterexample for BFS:

      A--B
     /|  |\
    C |  | E
     \|  |/
      D--F

Starting at A and visiting neighbors in alphabetical order, the algorithm again produces a 4-coloring: A=1, B=2, C=2, D=3, E=1, F=4. But we know this shape can be 3-colored.

=====================================================
Part 2:

(Sunday, 11/6: Uh oh, left my textbook at school, so I'm trying to remember what these problems were. I think this is right, but I'll check them first thing when I get back to the office.)
=====================================================

Problem 6 from the textbook, page 295.
=======================================
Discuss the worst-case behavior for Prim's MST algorithm if the graph is a complete (undirected) graph and a minheap is used to implement the priority queue.

This situation leads to the priority queue (PQ) reaching its largest possible size as early as possible. After the first vertex is removed, all the other n-1 nodes go onto the PQ right away. So each PQ remove() and decreaseKey() operation could cost the worst case for a PQ that is as large as it can be at that point.

The main loop still goes around n times; we'll count them from i=1 to n. Let's look at each of the PQ-related costs:

a) Cost of PQ.remove(). The first time, the PQ has size 1, since just the starting vertex is on the PQ. No comparisons are needed to fix the heap after the only node in it is removed. Each of the n-1 times after the first, one node is removed and the PQ shrinks by one. So for loop iterations i=2 to n, we'll call siftdown() on a heap of size n-i. (For i=n the heap is empty and nothing needs fixing.) So the cost will be Sum{i=2 to n-1}(lg(n-i)), which can be rewritten as Sum{i=1 to n-2}(lg(i)). If you think back to heap algorithms, you'll remember this is just like using insert() to construct a heap, and you should remember this is Theta(n lg n).

b) Cost of PQ.insert(). The first time through the loop, all remaining n-1 unseen nodes are inserted into the PQ in succession. This is just constructing a heap of size n-1 using insert(), so just like (a) above this is Theta(n lg n).

c) Cost of PQ.decreaseKey(). Is it possible that decreaseKey() could be called *every* possible time? It would have to be the case that each vertex you choose has edges that are better candidate edges to all the remaining fringe vertices. This is possible -- try to sketch it for a small value of n, say, n=4. How many calls to decreaseKey() is this? Well, initially all n-1 unseen nodes adjacent to the start node go into the PQ.
Next, the 2nd node is removed, and there are n-2 remaining fringe nodes, and decreaseKey() will be called on each of those. And so on, with one less call each time. So there are Sum{i=n-2 downto 1}(i) calls to decreaseKey(), which is (n-1)(n-2)/2 calls, which is Theta(n^2) calls.

NOTE: This is no surprise. Slide 42 "Priority Queue Costs and Prim's" says (indirectly) that there could be Theta(m) calls of decreaseKey(), and in the case of a complete graph m is Theta(n^2).

But we're not quite done. There are Theta(n^2) calls to decreaseKey(). How much does each one cost when you use a minheap? It depends. If you use indirect heaps, each one costs BigOh(lg n). If you don't, each one will cost BigTheta(n), since you first have to search the heap to find the node.

TOTAL COST: The cost for (c) is larger than the costs for (a) and (b) above. If you say you're using indirect heaps, the overall complexity is BigOh(n^2 lg n). If you don't say this, then the cost is BigTheta(n^3).

Problem 8 from the textbook, page 295.
=======================================

This is covered on slide 42 mentioned above, "Priority Queue Costs and Prim's". If your vertices are identified by an integer, you can just look them up in an array. If they're not, you can use a hash table to find the adjacency list in constant time.

a) Cost of PQ.remove(). This will be a findMin() scan over the array, so each call is BigTheta(n) and the overall cost is Theta(n^2).

b) Cost of PQ.insert(). Here you just change a flag to indicate the node is no longer unseen but is now fringe. Each insert costs Theta(1) and the total is Theta(n).

c) Cost of PQ.decreaseKey(). Just like (b), you just change the candidate edge weight value that's stored for each node. So this is constant too, and if we have to do it Theta(n^2) times for our worst-case complete graph, the total cost is Theta(n^2).

TOTAL COST: Two parts of the algorithm are Theta(n^2), so that's its overall cost.
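
As a rough illustration of the bookkeeping in (a) through (c), here is a minimal Python sketch of Prim's algorithm with the fringe kept in plain arrays. It is not taken from the textbook or the slides. The function name prim_unsorted_array, the adjacency-matrix input (with None meaning "no edge"), and the status/key/parent arrays are all assumptions made for this example.

```python
def prim_unsorted_array(weights):
    """Prim's MST using plain arrays instead of a heap (a sketch).

    weights is assumed to be an n x n symmetric matrix where
    weights[i][j] is the edge weight, or None if there is no edge.
    Returns (total weight, list of tree edges as (parent, child)).
    """
    n = len(weights)
    UNSEEN, FRINGE, TREE = 0, 1, 2
    status = [UNSEEN] * n
    key = [float("inf")] * n      # best candidate edge weight per vertex
    parent = [None] * n
    status[0], key[0] = FRINGE, 0  # start from vertex 0
    total, edges = 0, []

    for _ in range(n):
        # PQ.remove(): linear scan for the cheapest fringe vertex, Theta(n)
        v = None
        for u in range(n):
            if status[u] == FRINGE and (v is None or key[u] < key[v]):
                v = u
        if v is None:              # no fringe vertex left: graph not connected
            break
        status[v] = TREE
        total += key[v]
        if parent[v] is not None:
            edges.append((parent[v], v))

        for w in range(n):
            wt = weights[v][w]
            if wt is None or w == v or status[w] == TREE:
                continue
            if status[w] == UNSEEN:
                # PQ.insert(): flip the flag and record the candidate, Theta(1)
                status[w], key[w], parent[w] = FRINGE, wt, v
            elif wt < key[w]:
                # PQ.decreaseKey(): overwrite the stored candidate, Theta(1)
                key[w], parent[w] = wt, v
    return total, edges
```

For example, on the 4-vertex complete graph with weights 0-1: 1, 0-2: 4, 0-3: 3, 1-2: 2, 1-3: 5, 2-3: 6, this returns a total weight of 6 with tree edges (0, 1), (1, 2), (0, 3). The outer loop runs n times and each pass does a linear scan plus a linear update, which is where the Theta(n^2) total comes from.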
Bottom line: In a worst-case situation, with this complete graph that forces the number of calls to decreaseKey() to be of order Theta(n^2), the unsorted array (Theta(n^2) overall) is better than the minheap (BigOh(n^2 lg n) even with indirect heaps)!
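
To double-check the (n-1)(n-2)/2 count of decreaseKey() calls from Problem 6(c), here is a small Python sketch that runs Prim's algorithm on a complete graph with adversarial weights and counts the calls. The weight formula and the name count_decrease_key_calls are assumptions for illustration only; any weights where each newly chosen vertex offers a cheaper edge to every remaining fringe vertex would do. The sketch uses simple arrays in place of a heap, since only the number of calls is being counted, not their cost.

```python
def count_decrease_key_calls(n):
    """Run Prim's on a complete graph with n vertices (0..n-1) and
    return how many decreaseKey operations are performed.

    The (hypothetical) weights are chosen so that edges out of a
    later-numbered vertex are always cheaper than edges out of an
    earlier-numbered one, which forces a decreaseKey on every
    remaining fringe vertex at every step.
    """
    def weight(i, j):
        lo, hi = min(i, j), max(i, j)
        return (n - lo) * n + hi   # unique, and decreasing in lo

    INF = float("inf")
    in_tree = [False] * n
    best = [INF] * n               # candidate edge weight for each vertex
    best[0] = 0                    # start at vertex 0
    decrease_calls = 0

    for _ in range(n):
        # remove the non-tree vertex with the smallest candidate weight
        v = min((u for u in range(n) if not in_tree[u]),
                key=lambda u: best[u])
        in_tree[v] = True
        for w in range(n):
            if w == v or in_tree[w]:
                continue
            if best[w] == INF:
                best[w] = weight(v, w)         # insert: first time seen
            elif weight(v, w) < best[w]:
                best[w] = weight(v, w)         # decreaseKey
                decrease_calls += 1
    return decrease_calls
```

For example, count_decrease_key_calls(4) returns 3, which matches (4-1)(4-2)/2 and the n=4 sketch suggested in part (c).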