1
CS 332: Algorithms
  • Introduction to heapsort
2
Review: The Master Theorem
  • Given: a divide and conquer algorithm
    • An algorithm that divides the problem of size n into a subproblems, each of size n/b
    • Let the cost of each stage (i.e., the work to divide the problem + combine solved subproblems) be described by the function f(n)
  • Then, the Master Theorem gives us a cookbook for the algorithm’s running time:
3
Review: The Master Theorem
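  • if T(n) = aT(n/b) + f(n), where a ≥ 1 and b > 1, then
    • Case 1: if f(n) = O(n^(log_b a − ε)) for some ε > 0, then T(n) = Θ(n^(log_b a))
    • Case 2: if f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) lg n)
    • Case 3: if f(n) = Ω(n^(log_b a + ε)) for some ε > 0, and a·f(n/b) ≤ c·f(n) for some c < 1 and all sufficiently large n, then T(n) = Θ(f(n))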
4
Sorting Revisited
  • So far we’ve talked about two algorithms to sort an array of numbers
    • What is the advantage of merge sort?
      • Answer: O(n lg n) worst-case running time
    • What is the advantage of insertion sort?
      • Answer: sorts in place
      • Also: when the array is “nearly sorted”, it runs fast in practice
  • Next on the agenda: Heapsort
    • Combines advantages of both previous algorithms
5
Heaps
  • A heap can be seen as a complete binary tree:







    • What makes a binary tree complete?
    • Is the example above complete?
6
Heaps
  • A heap can be seen as a complete binary tree:







    • The book calls them “nearly complete” binary trees; can think of unfilled slots as null pointers
7
Heaps
  • In practice, heaps are usually implemented as arrays:






8
Heaps
  • To represent a complete binary tree as an array:
    • The root node is A[1]
    • Node i is A[i]
    • The parent of node i is A[i/2] (note: integer divide)
    • The left child of node i is A[2i]
    • The right child of node i is A[2i + 1]
9
Referencing Heap Elements
  • So…
    • Parent(i) { return ⌊i/2⌋; }
    • Left(i) { return 2*i; }
    • Right(i) { return 2*i + 1; }
  • An aside: How would you implement this most efficiently?
    • Trick question, I was looking for “i << 1”, etc.
    • But, any modern compiler is smart enough to do this for you (and it makes the code hard to follow)
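The index arithmetic above can be tried out directly. The following is a minimal Python sketch, not part of the original slides; it assumes a list whose slot 0 is an unused placeholder so that the root lives at A[1], matching the 1-based convention used here. The lowercase function names are illustrative.

```python
def parent(i: int) -> int:
    """Index of the parent of node i; floor division plays the role of ⌊i/2⌋."""
    return i // 2


def left(i: int) -> int:
    """Index of the left child of node i."""
    return 2 * i


def right(i: int) -> int:
    """Index of the right child of node i."""
    return 2 * i + 1


# Slot 0 is an unused placeholder so that the root is A[1], as on the slides.
A = [None, 16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
i = 2                              # the node holding 14
print(A[parent(i)])                # 16
print(A[left(i)], A[right(i)])     # 8 7
```

The shift form mentioned in the aside, `i << 1`, computes the same value as `left(i)`; the arithmetic form is kept here because it reads more clearly.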
10
The Heap Property
  • Heaps also satisfy the heap property:
  • A[Parent(i)] ≥ A[i] for all nodes i > 1
    • In other words, the value of a node is at most the value of its parent
    • Where is the largest element in a heap stored?
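As a small illustration, the heap property can be checked mechanically. This is a sketch under the same assumption as the slides' array layout (1-based, with slot 0 of the Python list left unused); `is_max_heap` is a hypothetical helper name, not something defined in the lecture.

```python
def is_max_heap(A, heap_size):
    """Return True if A[parent(i)] >= A[i] holds for every node i in 2..heap_size."""
    return all(A[i // 2] >= A[i] for i in range(2, heap_size + 1))


heap = [None, 16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
not_heap = [None, 4, 1, 3, 2, 16, 9, 10, 14, 8, 7]

print(is_max_heap(heap, 10))       # True
print(is_max_heap(not_heap, 10))   # False (node 4 holds 2, its parent holds 1)
```

Because every node is at most its parent, following parents upward from any node never decreases the value, so the maximum of a valid heap sits at the root A[1].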
11
Heap Height
  • Definitions:
    • The height of a node in the tree = the number of edges on the longest downward path to a leaf
    • The height of a tree = the height of its root
  • What is the height of an n-element heap? Why?
  • This is nice: basic heap operations take time at most proportional to the height of the heap
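One way to explore the height question is to compute it two ways and compare. The sketch below assumes the 1-based array layout from the earlier slides; both function names are illustrative. The first uses the closed form ⌊lg n⌋, the second counts edges along the leftmost root-to-leaf path, which is a longest downward path in a nearly complete tree.

```python
def heap_height(n: int) -> int:
    """Height of an n-element heap (n >= 1), computed as floor(lg n) with integers."""
    return n.bit_length() - 1


def height_by_walking(n: int) -> int:
    """Count edges from the root down the leftmost path until there is no left child."""
    i, edges = 1, 0
    while 2 * i <= n:
        i, edges = 2 * i, edges + 1
    return edges


for n in (1, 2, 3, 10, 16):
    print(n, heap_height(n), height_by_walking(n))
```

The two agree because a heap of height h has between 2^h and 2^(h+1) − 1 elements.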


12
Heap Operations: Heapify()
  • Heapify(): maintain the heap property
    • Given: a node i in the heap with children l and r
    • Given: two subtrees rooted at l and r, assumed to be heaps
    • Problem: The subtree rooted at i may violate the heap property (How?)
    • Action: let the value of the parent node “float down” so subtree at i satisfies the heap property
      • What do you suppose will be the basic operation between i, l, and r?
13
Heap Operations: Heapify()
  • Heapify(A, i)
  • {
  •     l = Left(i); r = Right(i);
  •     if (l <= heap_size(A) && A[l] > A[i])
  •         largest = l;
  •     else
  •         largest = i;
  •     if (r <= heap_size(A) && A[r] > A[largest])
  •         largest = r;
  •     if (largest != i) {
  •         Swap(A, i, largest);
  •         Heapify(A, largest);
  •     }
  • }
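A runnable Python rendering of the pseudocode above may help when experimenting. This is a sketch rather than the course's reference code: it assumes a 1-based list with an unused slot 0, and it passes `heap_size` as an explicit parameter instead of storing it on the array.

```python
def heapify(A, i, heap_size):
    """Float A[i] down until the subtree rooted at i is a max-heap.

    Assumes the subtrees rooted at 2i and 2i + 1 are already max-heaps.
    A is 1-based: A[0] is an unused placeholder.
    """
    l, r = 2 * i, 2 * i + 1
    largest = l if l <= heap_size and A[l] > A[i] else i
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]   # swap parent with its larger child
        heapify(A, largest, heap_size)        # continue in the disturbed subtree


# Node 2 (value 4) violates the heap property; both of its subtrees are heaps.
A = [None, 16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
heapify(A, 2, 10)
print(A[1:])   # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```

The value 4 moves down twice, first swapping with 14 and then with 8, before it reaches a leaf.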
14-22
Heapify() Example
  • [Slides 14 through 22: figure-only worked example of Heapify(); images not recovered]
23
Analyzing Heapify(): Informal
  • Aside from the recursive call, what is the running time of Heapify()?
  • How many times can Heapify() recursively call itself?
  • What is the worst-case running time of Heapify() on a heap of size n?
24
Analyzing Heapify(): Formal
  • Fixing up relationships between i, l, and r takes Θ(1) time
  • If the heap at i has n elements, how many elements can the subtrees at l or r have?
    • Draw it
  • Answer: at most 2n/3 (worst case: bottom row exactly half full)
  • So time taken by Heapify() is given by
  • T(n) ≤ T(2n/3) + Θ(1)
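The 2n/3 bound can be sanity-checked numerically. The sketch below assumes the 1-based array layout; `left_subtree_size` is a hypothetical helper that counts the nodes under node 2 of an n-element heap. The left subtree is never smaller than the right one in a heap, so bounding it is enough.

```python
def left_subtree_size(n: int) -> int:
    """Number of nodes in the subtree rooted at node 2 of an n-element heap."""
    count, lo, hi = 0, 2, 2          # lo..hi spans one level of the left subtree
    while lo <= n:
        count += min(hi, n) - lo + 1  # nodes of this level that actually exist
        lo, hi = 2 * lo, 2 * hi + 1   # move down to the next level
    return count


# n = 11 has its bottom row exactly half full: the left subtree holds 7 of 11 nodes.
print(left_subtree_size(11))   # 7

# Integer form of "size <= 2n/3", checked over a range of heap sizes.
print(all(3 * left_subtree_size(n) <= 2 * n for n in range(1, 10_000)))   # True
```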
25
Analyzing Heapify(): Formal
  • So we have
  • T(n) ≤ T(2n/3) + Θ(1)
  • By case 2 of the Master Theorem,
  • T(n) = O(lg n)
  • Thus, Heapify() takes logarithmic time
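For reference, the case-2 step can be spelled out. Case 2 of the Master Theorem applies when f(n) = Θ(n^(log_b a)) and gives T(n) = Θ(n^(log_b a) lg n). The following worked derivation is a sketch of how the parameters line up; since the recurrence is an inequality, the conclusion is an upper bound.

```latex
\begin{align*}
T(n) &\le T\!\left(\tfrac{2n}{3}\right) + \Theta(1)
  && a = 1,\quad b = \tfrac{3}{2},\quad f(n) = \Theta(1) \\
n^{\log_b a} &= n^{\log_{3/2} 1} = n^{0} = 1
  && \text{so } f(n) = \Theta\!\left(n^{\log_b a}\right) \text{ (case 2)} \\
T(n) &= O\!\left(n^{\log_b a} \lg n\right) = O(\lg n)
\end{align*}
```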


26
Heap Operations: BuildHeap()
  • We can build a heap in a bottom-up manner by running Heapify() on successive subarrays
    • Fact: for an array of length n, all elements in the range A[⌊n/2⌋ + 1 .. n] are heaps (Why?)
    • So:
      • Walk backwards through the array from ⌊n/2⌋ to 1, calling Heapify() on each node.
      • Order of processing guarantees that the children of node i are heaps when i is processed
27
BuildHeap()
  • // given an unsorted array A, make A a heap
  • BuildHeap(A)
  • {
  •     heap_size(A) = length(A);
  •     for (i = ⌊length(A)/2⌋ downto 1)
  •         Heapify(A, i);
  • }
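The same procedure can be written as runnable Python. This is a sketch under stated assumptions: the list is 1-based with an unused slot 0, `heap_size` is returned rather than stored on the array, and `heapify` is a direct transcription of Heapify(). The sample values are arbitrary.

```python
def heapify(A, i, heap_size):
    """Float A[i] down; assumes the subtrees at 2i and 2i + 1 are max-heaps."""
    l, r = 2 * i, 2 * i + 1
    largest = l if l <= heap_size and A[l] > A[i] else i
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        heapify(A, largest, heap_size)


def build_heap(A):
    """Rearrange A[1..n] into a max-heap in place and return the heap size n."""
    heap_size = len(A) - 1                    # slot 0 is a placeholder
    # Nodes n//2 + 1 .. n are leaves, so only n//2 down to 1 need fixing.
    for i in range(heap_size // 2, 0, -1):
        heapify(A, i, heap_size)
    return heap_size


A = [None, 5, 3, 17, 10, 84, 19, 6, 22, 9]
n = build_heap(A)
print(A[1], n)   # 84 9
```

Processing indices in decreasing order is what guarantees that both children of node i already root heaps when Heapify(A, i) runs.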
28
BuildHeap() Example
  • Work through example
    A = {4, 1, 3, 2, 16, 9, 10, 14, 8, 7}
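To check a hand-worked answer, the example can be traced in code. The sketch below records the array after each Heapify() call; it assumes a 1-based list with an unused slot 0, and `build_heap_trace` is a hypothetical helper introduced only for this illustration.

```python
def heapify(A, i, heap_size):
    """Float A[i] down; assumes the subtrees at 2i and 2i + 1 are max-heaps."""
    l, r = 2 * i, 2 * i + 1
    largest = l if l <= heap_size and A[l] > A[i] else i
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        heapify(A, largest, heap_size)


def build_heap_trace(values):
    """Run BuildHeap on a copy of values; return (i, array state) after each call."""
    A = [None] + list(values)       # shift to 1-based indexing
    n = len(values)
    snapshots = []
    for i in range(n // 2, 0, -1):
        heapify(A, i, n)
        snapshots.append((i, A[1:]))   # A[1:] is a fresh copy of the current state
    return snapshots


for i, state in build_heap_trace([4, 1, 3, 2, 16, 9, 10, 14, 8, 7]):
    print(f"after Heapify(A, {i}): {state}")
# The last line printed is:
# after Heapify(A, 1): [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```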
29
Analyzing BuildHeap()
  • Each call to Heapify() takes O(lg n) time
  • There are O(n) such calls (specifically, ⌊n/2⌋)
  • Thus the running time is O(n lg n)
    • Is this a correct asymptotic upper bound?
    • Is this an asymptotically tight bound?
  • A tighter bound is O(n)
    • How can this be?  Is there a flaw in the above reasoning?
30
Analyzing BuildHeap(): Tight
  • To Heapify() a subtree takes O(h) time where h is the height of the subtree
    • h = O(lg m), m = # nodes in subtree
    • The height of most subtrees is small
  • Fact: an n-element heap has at most ⌈n/2^(h+1)⌉ nodes of height h
  • CLR 7.3 uses this fact to prove that BuildHeap() takes O(n) time
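A sketch of how the argument goes, assuming the fact above about the number of nodes at each height: sum the O(h) cost of Heapify() over all heights, then bound the finite sum by the infinite series.

```latex
\begin{align*}
T(n) &= \sum_{h=0}^{\lfloor \lg n \rfloor}
        \left\lceil \frac{n}{2^{h+1}} \right\rceil O(h)
      = O\!\left( n \sum_{h=0}^{\lfloor \lg n \rfloor} \frac{h}{2^{h}} \right) \\
     &= O\!\left( n \sum_{h=0}^{\infty} \frac{h}{2^{h}} \right)
      = O(2n) = O(n),
\end{align*}
```

using the identity \(\sum_{h \ge 0} h x^{h} = x/(1-x)^{2}\), which equals 2 at \(x = 1/2\). The O(n lg n) estimate is still a valid upper bound; it is loose because it charges every call the cost of the tallest subtree.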


31
Heapsort
  • Given BuildHeap(),  an in-place sorting algorithm is easily constructed:
    • Maximum element is at A[1]
    • Discard by swapping with element at A[n]
      • Decrement heap_size(A)
      • A[n] now contains correct value
    • Restore heap property at A[1] by calling Heapify()
    • Repeat, always swapping A[1] for A[heap_size(A)]
32
Heapsort
  • Heapsort(A)
  • {
  •     BuildHeap(A);
  •     for (i = length(A) downto 2)
  •     {
  •         Swap(A, 1, i);
  •         heap_size(A) -= 1;
  •         Heapify(A, 1);
  •     }
  • }
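For experimentation, here is the whole algorithm as runnable Python. It is a sketch, not the course's reference code: it assumes a 1-based list with an unused slot 0 and tracks `heap_size` in a local variable instead of on the array.

```python
def heapify(A, i, heap_size):
    """Float A[i] down; assumes the subtrees at 2i and 2i + 1 are max-heaps."""
    l, r = 2 * i, 2 * i + 1
    largest = l if l <= heap_size and A[l] > A[i] else i
    if r <= heap_size and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        heapify(A, largest, heap_size)


def heapsort(A):
    """Sort A[1..n] in place into ascending order; A[0] is an unused placeholder."""
    n = len(A) - 1
    for i in range(n // 2, 0, -1):        # BuildHeap
        heapify(A, i, n)
    heap_size = n
    for i in range(n, 1, -1):
        A[1], A[i] = A[i], A[1]           # move the current maximum to its final slot
        heap_size -= 1                    # A[i..n] is now sorted and outside the heap
        heapify(A, 1, heap_size)          # restore the heap property at the root


A = [None, 4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
heapsort(A)
print(A[1:])   # [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
```

No auxiliary array is allocated, which is the in-place property the slides attribute to heapsort.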
33
Analyzing Heapsort
  • The call to BuildHeap() takes O(n) time
  • Each of the n - 1 calls to Heapify() takes O(lg n) time
  • Thus the total time taken by Heapsort()
    = O(n) + (n - 1) O(lg n)
    = O(n) + O(n lg n)
    = O(n lg n)
34
Priority Queues
  • Heapsort is a nice algorithm, but in practice Quicksort (coming up) usually wins
  • But the heap data structure is incredibly useful for implementing priority queues
    • A data structure for maintaining a set S of elements, each with an associated value or key
    • Supports the operations Insert(), Maximum(), and ExtractMax()
    • What might a priority queue be useful for?


35
Priority Queue Operations
  • Insert(S, x) inserts the element x into set S
  • Maximum(S) returns the element of S with the maximum key
  • ExtractMax(S) removes and returns the element of S with the maximum key
  • How could we implement these operations using a heap?
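One possible answer, sketched in Python. Everything here is an assumption for illustration, not the lecture's own solution: the class name `MaxPriorityQueue`, the 1-based internal list with an unused slot 0, and the choice to insert by appending at the end and floating the new key up toward the root. The float-down step is Heapify() written iteratively.

```python
class MaxPriorityQueue:
    """Max-priority queue backed by a binary max-heap (hypothetical sketch)."""

    def __init__(self):
        self._a = [None]                  # slot 0 unused so the root is index 1

    def __len__(self):
        return len(self._a) - 1

    def insert(self, key):
        """Append key as a new leaf, then swap it upward while it beats its parent."""
        a = self._a
        a.append(key)
        i = len(a) - 1
        while i > 1 and a[i // 2] < a[i]:
            a[i // 2], a[i] = a[i], a[i // 2]
            i //= 2

    def maximum(self):
        """Return the largest key without removing it."""
        if len(self) == 0:
            raise IndexError("priority queue is empty")
        return self._a[1]

    def extract_max(self):
        """Remove and return the largest key, then restore the heap property."""
        if len(self) == 0:
            raise IndexError("priority queue is empty")
        a = self._a
        top = a[1]
        last = a.pop()                    # detach the last leaf
        if len(a) > 1:                    # heap still non-empty: re-seat the leaf at the root
            a[1] = last
            self._heapify(1)
        return top

    def _heapify(self, i):
        """Iterative float-down, equivalent to the recursive Heapify()."""
        a, n = self._a, len(self._a) - 1
        while True:
            l, r, largest = 2 * i, 2 * i + 1, i
            if l <= n and a[l] > a[largest]:
                largest = l
            if r <= n and a[r] > a[largest]:
                largest = r
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest


pq = MaxPriorityQueue()
for key in (3, 1, 4, 1, 5, 9, 2, 6):
    pq.insert(key)
print(pq.maximum())                                   # 9
print([pq.extract_max() for _ in range(len(pq))])     # [9, 6, 5, 4, 3, 2, 1, 1]
```

With this layout, maximum() takes constant time, while insert() and extract_max() each follow a single root-to-leaf path and so take O(lg n) time.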