|
1
|
- Solving Recurrences Continued
- The Master Theorem
- Introduction to heapsort
|
|
2
|
- MergeSort(A, left, right) {
-     if (left < right) {
-         mid = floor((left + right) / 2);
-         MergeSort(A, left, mid);
-         MergeSort(A, mid+1, right);
-         Merge(A, left, mid, right);
-     }
- }
- // Merge() takes two sorted subarrays of A and
- // merges them into a single sorted subarray of A.
- // Code for this is in the book. It requires O(n)
- // time, and *does* require allocating O(n) space
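The pseudocode above can be sketched in Python as follows. This is a minimal sketch, assuming 0-based inclusive indices; the `merge` helper is one common way to write Merge() and is not the book's code.

```python
def merge(a, left, mid, right):
    """Merge sorted runs a[left..mid] and a[mid+1..right] (inclusive bounds)."""
    # Copying both runs is the O(n) extra space mentioned on the slide.
    lo = a[left:mid + 1]
    hi = a[mid + 1:right + 1]
    i = j = 0
    k = left
    while i < len(lo) and j < len(hi):
        if lo[i] <= hi[j]:          # <= keeps equal keys in their original order
            a[k] = lo[i]
            i += 1
        else:
            a[k] = hi[j]
            j += 1
        k += 1
    # At most one of the two runs still has elements; copy them back.
    a[k:right + 1] = lo[i:] + hi[j:]


def merge_sort(a, left=0, right=None):
    """Sort a[left..right] in place, mirroring MergeSort(A, left, right)."""
    if right is None:
        right = len(a) - 1
    if left < right:
        mid = (left + right) // 2
        merge_sort(a, left, mid)
        merge_sort(a, mid + 1, right)
        merge(a, left, mid, right)
```

Calling `merge_sort(data)` on a list sorts it in place; the default arguments cover the whole list.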
|
|
3
|
- [Table: each MergeSort statement and its effort]
- So T(n) = Θ(1) when n = 1, and T(n) = 2T(n/2) + Θ(n) when n > 1
- This expression is a recurrence
|
|
4
|
- Substitution method
- Iteration method
- Master method
|
|
5
|
- The substitution method
- A.k.a. the “making a good guess method”
- Guess the form of the answer, then use induction to find the constants
and show that solution works
- Run an example: merge sort
- T(n) = 2T(n/2) + cn
- We guess that the answer is O(n lg n)
- Prove it by induction
- Can similarly show T(n) = Ω(n lg n), thus Θ(n lg n)
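The induction step for the upper bound can be written out as below. This is a sketch that assumes the hypothesis T(m) ≤ d·m·lg m for all m < n, where d is a constant introduced here for illustration, and that ignores floors and the base case.

```latex
\begin{align*}
T(n) &= 2T(n/2) + cn \\
     &\le 2\left(d \cdot \tfrac{n}{2} \lg \tfrac{n}{2}\right) + cn \\
     &= dn \lg n - dn + cn \\
     &\le dn \lg n \quad \text{whenever } d \ge c.
\end{align*}
```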
|
|
6
|
- The “iteration method”
- Expand the recurrence
- Work some algebra to express as a summation
- Evaluate the summation
- We showed several examples and were in the middle of:
|
|
7
|
- T(n) =
- aT(n/b) + cn
- a(aT(n/b/b) + cn/b) + cn
- a^2 T(n/b^2) + cna/b + cn
- a^2 T(n/b^2) + cn(a/b + 1)
- a^2 (aT(n/b^2/b) + cn/b^2) + cn(a/b + 1)
- a^3 T(n/b^3) + cn(a^2/b^2) + cn(a/b + 1)
- a^3 T(n/b^3) + cn(a^2/b^2 + a/b + 1)
- …
- a^k T(n/b^k) + cn(a^(k-1)/b^(k-1) + a^(k-2)/b^(k-2) + … + a^2/b^2 + a/b + 1)
|
|
8
|
- So we have
- T(n) = a^k T(n/b^k) + cn(a^(k-1)/b^(k-1) + ... + a^2/b^2 + a/b + 1)
- For k = log_b n
- n = b^k
- T(n) = a^k T(1) + cn(a^(k-1)/b^(k-1) + ... + a^2/b^2 + a/b + 1)
- = a^k c + cn(a^(k-1)/b^(k-1) + ... + a^2/b^2 + a/b + 1)
- = c a^k + cn(a^(k-1)/b^(k-1) + ... + a^2/b^2 + a/b + 1)
- = cn a^k/b^k + cn(a^(k-1)/b^(k-1) + ... + a^2/b^2 + a/b + 1)
- = cn(a^k/b^k + ... + a^2/b^2 + a/b + 1)
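The closed form can be checked against the recurrence directly. The sketch below assumes T(1) = c and that n is an exact power of b, as in the derivation; the function names are illustrative, and `Fraction` is used so the comparison is exact.

```python
from fractions import Fraction


def t_recursive(n, a, b, c):
    """T(n) = a*T(n/b) + c*n with T(1) = c, for n an exact power of b."""
    if n == 1:
        return Fraction(c)
    return a * t_recursive(n // b, a, b, c) + c * n


def t_closed_form(n, a, b, c):
    """c*n * (1 + a/b + ... + (a/b)^k), where k = log_b n."""
    k = 0
    m = n
    while m > 1:            # count how many times b divides n
        m //= b
        k += 1
    ratio = Fraction(a, b)
    return c * n * sum(ratio ** i for i in range(k + 1))
```

For example, `t_recursive(3 ** 4, 9, 3, 1) == t_closed_form(3 ** 4, 9, 3, 1)` compares the two for a = 9, b = 3.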
|
|
9
|
- So with k = log_b n
- T(n) = cn(a^k/b^k + ... + a^2/b^2 + a/b + 1)
- What if a = b?
- T(n) = cn(k + 1)
- = cn(log_b n + 1)
- = Θ(n log n)
|
|
13
|
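- So with k = log_b n
- T(n) = cn(a^k/b^k + ... + a^2/b^2 + a/b + 1)
- What if a < b?
- Recall that x^k + x^(k-1) + … + x + 1 = (x^(k+1) - 1)/(x - 1)
- So, with x = a/b < 1:
- a^k/b^k + ... + a/b + 1 = (1 - (a/b)^(k+1))/(1 - a/b) < 1/(1 - a/b)
- T(n) = cn · Θ(1) = Θ(n)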
|
|
20
|
- So with k = log_b n
- T(n) = cn(a^k/b^k + ... + a^2/b^2 + a/b + 1)
- What if a > b?
- The sum is ((a/b)^(k+1) - 1)/((a/b) - 1) = Θ((a/b)^k)
- T(n) = cn · Θ(a^k / b^k)
- = cn · Θ(a^(log_b n) / b^(log_b n)) = cn · Θ(a^(log_b n) / n)
- recall logarithm fact: a^(log_b n) = n^(log_b a)
- = cn · Θ(n^(log_b a) / n) = Θ(cn · n^(log_b a) / n)
- = Θ(n^(log_b a))
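The three regimes (a = b, a < b, a > b) can also be observed numerically. The sketch below evaluates the closed form for growing n and divides by the growth rate derived for each case; if the analysis holds, the ratios should settle near a constant. The function names and the choice c = 1 are assumptions made for illustration.

```python
import math


def closed_form(n, a, b, c=1.0):
    """c*n * (1 + a/b + ... + (a/b)^k) with k = log_b n, n a power of b."""
    k, m = 0, n
    while m > 1:
        m //= b
        k += 1
    return c * n * sum((a / b) ** i for i in range(k + 1))


def predicted_growth(n, a, b):
    """Growth rate derived for each relation between a and b."""
    if a == b:
        return n * math.log(n, b)      # Theta(n log n)
    if a < b:
        return n                       # Theta(n)
    return n ** math.log(a, b)         # Theta(n^(log_b a))


for a, b in [(2, 2), (1, 2), (4, 2)]:
    ratios = [closed_form(b ** k, a, b) / predicted_growth(b ** k, a, b)
              for k in (5, 10, 15, 20)]
    print(a, b, [round(r, 3) for r in ratios])
```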
|
|
22
|
- Given: a divide and conquer algorithm
- An algorithm that divides the problem of size n into a subproblems,
each of size n/b
- Let the cost of each stage (i.e., the work to divide the problem +
combine solved subproblems) be described by the function f(n)
- Then, the Master Theorem gives us a cookbook for the algorithm’s running
time:
|
|
23
|
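- if T(n) = aT(n/b) + f(n), with a ≥ 1 and b > 1, then
- Case 1: f(n) = O(n^(log_b a - ε)) for some ε > 0, so T(n) = Θ(n^(log_b a))
- Case 2: f(n) = Θ(n^(log_b a)), so T(n) = Θ(n^(log_b a) lg n)
- Case 3: f(n) = Ω(n^(log_b a + ε)) for some ε > 0, and a·f(n/b) ≤ c·f(n) for some c < 1 and all large n, so T(n) = Θ(f(n))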
|
|
24
|
- T(n) = 9T(n/3) + n
- a = 9, b = 3, f(n) = n
- n^(log_b a) = n^(log_3 9) = Θ(n^2)
- Since f(n) = O(n^(log_3 9 - ε)), where ε = 1, case 1 applies: T(n) = Θ(n^(log_b a))
- Thus the solution is T(n) = Θ(n^2)
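For the common special case where f(n) is a plain polynomial Θ(n^d), the case analysis reduces to comparing d with log_b a. The helper below is a sketch under that assumption only; `master_polynomial` and its return convention are made up for illustration and do not handle general f(n).

```python
import math


def master_polynomial(a, b, d):
    """Apply the Master Theorem to T(n) = a*T(n/b) + Theta(n^d).

    Returns (case, exponent, has_log_factor), meaning
    T(n) = Theta(n^exponent * lg n) if has_log_factor,
    otherwise T(n) = Theta(n^exponent).
    """
    if a < 1 or b <= 1 or d < 0:
        raise ValueError("need a >= 1, b > 1, d >= 0")
    critical = math.log(a, b)              # log_b a
    if math.isclose(d, critical, abs_tol=1e-12):
        return 2, d, True                  # f(n) matches n^(log_b a)
    if d < critical:
        return 1, critical, False          # f(n) is polynomially smaller
    # d > critical: for f(n) = n^d the regularity condition holds, since
    # a*(n/b)^d = (a / b^d) * n^d and a / b^d < 1.
    return 3, d, False
```

With the example above, `master_polynomial(9, 3, 1)` reports case 1 with an exponent of about 2.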
|
|
25
|
- So far we’ve talked about two algorithms to sort an array of numbers
- What is the advantage of merge sort?
- What is the advantage of insertion sort?
- Next on the agenda: Heapsort
- Combines advantages of both previous algorithms
|
|
26
|
- A heap can be seen as a complete binary tree:
- What makes a binary tree complete?
- Is the example above complete?
|
|
27
|
- A heap can be seen as a complete binary tree:
- The book calls them “nearly complete” binary trees; can think of
unfilled slots as null pointers
|
|
28
|
- In practice, heaps are usually implemented as arrays:
|
|
29
|
- To represent a complete binary tree as an array:
- The root node is A[1]
- Node i is A[i]
- The parent of node i is A[i/2] (note: integer divide)
- The left child of node i is A[2i]
- The right child of node i is A[2i + 1]
|
|
30
|
- So…
- Parent(i) { return ⌊i/2⌋; }
- Left(i) { return 2*i; }
- Right(i) { return 2*i + 1; }
- An aside: How would you implement this most efficiently?
- Another aside: Really?
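One common answer to the first aside is to use bit shifts, sketched below for 1-based indices. Whether this helps in practice is doubtful: in many compiled languages an optimizing compiler already emits shifts for multiplication and division by two, so the hand-written version is shown only for illustration.

```python
def parent(i):
    return i >> 1          # floor(i / 2)


def left(i):
    return i << 1          # 2 * i


def right(i):
    return (i << 1) | 1    # 2 * i + 1; the low bit of 2*i is always 0
```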
|
|
31
|
- Heaps also satisfy the heap property:
- A[Parent(i)] ≥ A[i] for all nodes i > 1
- In other words, the value of a node is at most the value of its parent
- Where is the largest element in a heap stored?
- Definitions:
- The height of a node in the tree = the number of edges on the longest
downward path to a leaf
- The height of a tree = the height of its root
|
|
32
|
- What is the height of an n-element heap? Why?
- This is nice: basic heap operations take at most time proportional to
the height of the heap
|
|
33
|
- Heapify(): maintain the heap property
- Given: a node i in the heap with children l and r
- Given: two subtrees rooted at l and r, assumed to be heaps
- Problem: The subtree rooted at i may violate the heap property (How?)
- Action: let the value of the parent node “float down” so subtree at i
satisfies the heap property
- What do you suppose will be the basic operation between i, l, and r?
|
|
34
|
- Heapify(A, i)
- {
-     l = Left(i); r = Right(i);
-     if (l <= heap_size(A) && A[l] > A[i])
-         largest = l;
-     else
-         largest = i;
-     if (r <= heap_size(A) && A[r] > A[largest])
-         largest = r;
-     if (largest != i) {
-         Swap(A, i, largest);
-         Heapify(A, largest);
-     }
- }
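A runnable Python sketch of Heapify() follows. It assumes a 1-based layout in which `a[0]` is an unused placeholder, so the children of node i sit at 2i and 2i + 1 as on the slides; passing `heap_size` as a parameter is a choice made here, not something the slides prescribe.

```python
def heapify(a, i, heap_size):
    """Float a[i] down until the subtree rooted at i is a max-heap.

    a is 1-based: a[0] is an unused placeholder. The subtrees rooted at
    the children of i are assumed to be max-heaps already.
    """
    l, r = 2 * i, 2 * i + 1
    largest = l if l <= heap_size and a[l] > a[i] else i
    if r <= heap_size and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        heapify(a, largest, heap_size)   # only recurse after a swap
```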
|
|
44
|
- Aside from the recursive call, what is the running time of Heapify()?
- How many times can Heapify() recursively call itself?
- What is the worst-case running time of Heapify() on a heap of size n?
|
|
45
|
- Fixing up relationships between i, l, and r takes Θ(1) time
- If the heap at i has n elements, how many elements can the subtrees at l
or r have?
- Answer: 2n/3 (worst case: bottom row 1/2 full)
- So time taken by Heapify() is given by
- T(n) ≤ T(2n/3) + Θ(1)
|
|
46
|
- So we have
- T(n) ≤ T(2n/3) + Θ(1)
- By case 2 of the Master Theorem (a = 1, b = 3/2, f(n) = Θ(1) = Θ(n^(log_b a))),
- T(n) = O(lg n)
- Thus, Heapify() takes logarithmic time
|
|
47
|
- We can build a heap in a bottom-up manner by running Heapify() on
successive subarrays
- Fact: for an array of length n, all elements in range
A[⌊n/2⌋ + 1 .. n] are heaps (Why?)
- So:
- Walk backwards through the array from ⌊n/2⌋ to 1, calling Heapify() on
each node.
- Order of processing guarantees that the children of node i are heaps
when i is processed
|
|
48
|
- // given an unsorted array A, make A a heap
- BuildHeap(A)
- {
-     heap_size(A) = length(A);
-     for (i = ⌊length(A)/2⌋ downto 1)
-         Heapify(A, i);
- }
|
|
49
|
- Work through example
A = {4, 1, 3, 2, 16, 9, 10, 14, 8, 7}
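The example can be worked through mechanically with a small Python sketch of BuildHeap(). As an assumption of this sketch, the array is 1-based with `a[0]` as an unused placeholder, and `heapify` is an iterative variant of the recursive Heapify() shown earlier.

```python
def heapify(a, i, heap_size):
    """Float a[i] down within a[1..heap_size]; a[0] is an unused placeholder."""
    while True:
        l, r = 2 * i, 2 * i + 1
        largest = i
        if l <= heap_size and a[l] > a[largest]:
            largest = l
        if r <= heap_size and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_heap(a):
    """Turn a[1..n] into a max-heap, bottom-up."""
    n = len(a) - 1
    # Nodes n//2 + 1 .. n are leaves, hence already one-element heaps.
    for i in range(n // 2, 0, -1):
        heapify(a, i, n)


a = [None, 4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_heap(a)
print(a[1:])   # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```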
|
|
50
|
- Each call to Heapify() takes O(lg n) time
- There are O(n) such calls (specifically, ⌊n/2⌋)
- Thus the running time is O(n lg n)
- Is this a correct asymptotic upper bound?
- Is this an asymptotically tight bound?
- A tighter bound is O(n)
- How can this be? Is there a flaw
in the above reasoning?
|
|
51
|
- To Heapify() a subtree takes O(h) time where h is the height of the
subtree
- h = O(lg m), m = # nodes in subtree
- The height of most subtrees is small
- Fact: an n-element heap has at most ⌈n/2^(h+1)⌉ nodes of height h
- CLR 7.3 uses this fact to prove that BuildHeap() takes O(n) time
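The shape of that argument can be sketched as follows: charge O(h) to each node of height h and sum over all heights. This outline relies on the standard series value \sum_{h \ge 0} h/2^h = 2 and omits the details handled in the book.

```latex
\begin{align*}
\sum_{h=0}^{\lfloor \lg n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil O(h)
  &= O\!\left( n \sum_{h=0}^{\lfloor \lg n \rfloor} \frac{h}{2^{h}} \right) \\
  &= O\!\left( n \sum_{h=0}^{\infty} \frac{h}{2^{h}} \right)
   = O(2n) = O(n).
\end{align*}
```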
|
|
52
|
- Given BuildHeap(), an in-place
sorting algorithm is easily constructed:
- Maximum element is at A[1]
- Discard by swapping with element at A[n]
- Decrement heap_size(A)
- A[n] now contains correct value
- Restore heap property at A[1] by calling Heapify()
- Repeat, always swapping A[1] for A[heap_size(A)]
|
|
53
|
- Heapsort(A)
- {
-     BuildHeap(A);
-     for (i = length(A) downto 2)
-     {
-         Swap(A, 1, i);
-         heap_size(A) -= 1;
-         Heapify(A, 1);
-     }
- }
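Put together, the whole algorithm fits in a short Python sketch. As assumptions of this sketch, the array is 1-based with `a[0]` as an unused placeholder, `heap_size` is a local variable rather than an attribute of the array, and BuildHeap() is inlined as the first loop.

```python
def heapify(a, i, heap_size):
    """Float a[i] down within a[1..heap_size]; a[0] is an unused placeholder."""
    while True:
        l, r = 2 * i, 2 * i + 1
        largest = i
        if l <= heap_size and a[l] > a[largest]:
            largest = l
        if r <= heap_size and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def heapsort(a):
    """Sort a[1..n] in place in ascending order."""
    n = len(a) - 1
    for i in range(n // 2, 0, -1):     # BuildHeap
        heapify(a, i, n)
    heap_size = n
    for i in range(n, 1, -1):
        a[1], a[i] = a[i], a[1]        # move the current maximum to its final slot
        heap_size -= 1
        heapify(a, 1, heap_size)       # restore the heap property at the root
```

A caller with an ordinary list `data` can run `a = [None] + data; heapsort(a)` and read the sorted values from `a[1:]`.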
|
|
54
|
- The call to BuildHeap() takes O(n) time
- Each of the n - 1 calls to Heapify() takes O(lg n) time
- Thus the total time taken by Heapsort()
= O(n) + (n - 1) O(lg n)
= O(n) + O(n lg n)
= O(n lg n)
|
|
55
|
- Heapsort is a nice algorithm, but in practice Quicksort (coming up)
usually wins
- But the heap data structure is incredibly useful for implementing priority
queues
- A data structure for maintaining a set S of elements, each with an
associated value or key
- Supports the operations Insert(), Maximum(), and ExtractMax()
- What might a priority queue be useful for?
|
|
56
|
- Insert(S, x) inserts the element x into set S
- Maximum(S) returns the element of S with the maximum key
- ExtractMax(S) removes and returns the element of S with the maximum key
- How could we implement these operations using a heap?
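One possible answer is sketched below: keep the elements in a max-heap, so Maximum() reads the root, ExtractMax() removes the root and re-heapifies, and Insert() appends a leaf and floats it up. The class name, the method names, and the 1-based list with an unused slot at index 0 are all assumptions of this sketch.

```python
class MaxPriorityQueue:
    """Priority queue backed by a 1-based max-heap (index 0 is unused)."""

    def __init__(self):
        self._a = [None]

    def __len__(self):
        return len(self._a) - 1

    def insert(self, key):
        """Append key as a new leaf, then float it up: O(lg n)."""
        a = self._a
        a.append(key)
        i = len(a) - 1
        while i > 1 and a[i // 2] < a[i]:
            a[i // 2], a[i] = a[i], a[i // 2]
            i //= 2

    def maximum(self):
        """Return the largest key without removing it: O(1)."""
        if len(self) == 0:
            raise IndexError("priority queue is empty")
        return self._a[1]

    def extract_max(self):
        """Remove and return the largest key: O(lg n)."""
        a = self._a
        if len(a) == 1:
            raise IndexError("priority queue is empty")
        top = a[1]
        last = a.pop()
        if len(a) > 1:                 # something is left besides the placeholder
            a[1] = last
            self._heapify(1)
        return top

    def _heapify(self, i):
        a, n = self._a, len(self._a) - 1
        while True:
            l, r = 2 * i, 2 * i + 1
            largest = i
            if l <= n and a[l] > a[largest]:
                largest = l
            if r <= n and a[r] > a[largest]:
                largest = r
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest
```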
|