Sorting Asymptotics
Licensed under Creative Commons: other posts

theory

Worst- and best-case runtime for sorting, and why anyone cares.

A few weeks ago I commented on asymptotics, or analyzing how programs behave on very large inputs. Last week I failed to write a sequel to this post. As I reflected on that failure I decided part of it was because I hadn’t given enough background before writing asymptotics. Today I hope to give a smattering of more-accessible examples.

# Sorting

Suppose I have a deck of cards and I want to put them in order. The cards could be in any order to start out; for example, if they are already in order then all I need to do is glance through the deck to verify that and I’m done. If they are in some “‍random‍” order I need to move some of them around too. Let’s see if we can formalize the amount of work required.

First we need to formalize the basic operations we consider. Let’s start simple: we can check to see if two cards are in order or not, and we can remove a card from one spot and insert it somewhere else. There are plenty of possible operations this ignores, such as putting a card on a particular spot on a table or checking to see how many cards are between two cards, but it’s enough for many sorting algorithms.

In the best case we need to verify that the deck is already in order. For a 52-card deck this requires comparing at least 51 cards. Thus, the the best-case runtime of a card sorting algorithm is n – 1, which is in O(n).

# Sorting Algorithms

Usually when a problem like sorting is considered it is considered in terms of a few example algorithms. We’ll do the same, looking at a few such algorithms

bubble sort
for each position i { if cards i and i + 1 are out of order { swap cards i and i + 1 } }

Bubble sort is a particularly simple algorithm to express, but also rather inefficient. A single pass through the deck is not sufficient; for example, if we start with 364152 the first pass runs as follows:

4152
3152
3452
3412
3415

Another pass gives us 314256, then another pass for 132456, and a fourth for 123456. The worst case for bubble sort is an initially backwards deck: 654321 → 543216 → 432156 → 321456 → 213456 → 123456. Backwards decks use n – 1 comparisons and swaps to get the last card in place, then n – 2 comparisons and swaps to get the next-to-last card in place, etc. Thus, bubble sort’s worst-case runtime is
 n2 – n 2
, which is in O(n2).

binary insertion sort
for each s in 1, 2, … n – 1 { call card s + 1 “‍c‍”; consider the window 1 … s; while the window has more than one card { if c is after the middle card in the window { keep the second half of the window } otherwise { keep the first half of the window } }; move c before or after the window, as appropriate. }

The binary insertion sort is both more complicated and more efficient than bubble sort. The idea is that the first s cards are in order and we efficiently decide where to stick the next card into that set. The whole “‍window‍” bit is called a “‍binary search‍” and is the quickest way to find the right spot in an ordered list. Again using 364152, the entire search is

 64152 → 36 4152
4152 → 6 4152 → 346 152
 152 → 46 152 → 1346 52
 52 → 13 52 → 136 52 → 13456 2
 2 → 456 2 → 3456 2 → 123456

One of the interesting things about this algorithm is that is takes approximately the same amount of time for every input. That time is a bit messy to derive or even express It involves the Pochhammer symbol, not exactly well-known mathematics. , but the binary search takes approximately log2(s) and the end result is in O(n log(n)).

We could go on like this, discussing heap sort and merge sort and tom sort and quick sort and so on, but it would get tedious.

# Sorting Bounds

At some point we stop asking “‍what about doing this‍” and start asking “‍how good can we do?‍” This is my favorite kind of question, and the sort of thing I am paid to research. Fortunately, it isn’t that hard to explain the results.

Let’s start by asking, “‍how many initial ordering of cards can there be?‍” Well, there are 52 cards that might be first, and for each of those there are 51 cards left that might be second, and 50 cards that might be third, and so on. This gives us 52 × 51 × 50 × … × 1 = 52! total orderings of a deck of 52 cards.

Next, let’s ask “‍how much information we can learn each time we compare two cards?‍” At most, such a comparison splits the possible decks we might have in half, so if initially I might be holding any one of 52! decks then after one comparison I only need consider 52! ÷ 2 decks, after two comparison only 52! ÷ 4, etc.

A particular sequence of card moves results in a sorted deck for only one initial deck. Hence, the comparisons we use must be sufficient to identify just a single initial deck in order to select the right sequence of moves. Thus, we must perform at least log2(n!) comparisons. log2(n!) is again rather messy, but it works out to be in O(n log2(n)).

Thus, if the only way we have of learning about the deck of cards is by comparisons of pairs of cards then no sorting algorithm can possibly be faster than O(n log(n)) for all inputs.

# So What?

Who really cares about sorting anyway? Why go to all the bother of finding the “‍best‍” algorithm and proving it “‍best‍”? Why bother?

People want everything to be faster. Once we make an algorithm “‍fast enough‍” someone always finds a new problem that taxes that algorithm to the breaking point. Every millisecond a search engine or online commerce site takes doing its job costs the site customers. Speed matters.

Knowing the bounds an algorithm cannot pass serves two purposes. First, it means that we don’t need to waste effort trying to push an algorithm past where it can go. Second, the assumptions of the proof itself (such as “‍all we know about the deck we learn by comparisons‍”) tell us what we need to work around to get something faster. Another way of reading the previous section is “‍if you need to sort it faster than O(n log(n)), find out more about it than the relative ordering of pairs of cards.‍”

Theory is the study of the possible. Knowing the possible is quite often useful.