The Right Redundancy
© 16 May 2012 Luther Tychonievich
Licensed under Creative Commons: CC BY-NC-ND 3.0

Redundancy. Computers take it out, then put it back in again. Why?


As a child I was intrigued by people who collected things. It seemed like most of my acquaintances collected something, and I couldn’t quite figure it out. I tried a few different collections, without much sticking power.

My father collected redundancies: common English phrases that say the same thing more than once, such as “Aid and abet”, “heat up”, and “PIN number”. There are also completely extraneous phrases, such as “go ahead and”, which can be removed anywhere they appear without changing the meaning.

As I was starting to consciously improve my manner of speech and expression in my early twenties, eliminating redundancies and extraneous phrases was one of my first goals. This roughly coincided with my beginning to learn about computing. And redundancy removal plays a big role in computing. We call it “compression” and we use it all the time. There’s a good chance many of the web sites you visited today were compressed before being sent to you. The MP3 scheme for removing redundancy from audio files is so successful the term “MP3” has become almost synonymous with “digital audio file.” Communication without redundancy is less demanding of others, more efficient, cleaner.
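To make the idea concrete, here is a minimal Python sketch using the standard library's gzip module (gzip is one of the encodings web servers commonly apply to pages before sending them). The deliberately repetitive sample text is made up for illustration; a real page is far less repetitive, so its compression ratio will be much less extreme.

```python
import gzip

# Hypothetical, deliberately redundant "web page": the same line 200 times.
page = ("<p>Aid and abet. Heat up. PIN number.</p>\n" * 200).encode("utf-8")

# Squeeze out the repetition.
packed = gzip.compress(page)
print(len(page), "->", len(packed), "bytes")

# gzip is lossless: decompressing restores the exact original bytes.
assert gzip.decompress(packed) == page
```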

But there’s also a problem with communication without redundancy: it requires precision to be understood. If you mishear some of the sounds in the words I say, you can not only detect that you misheard but also reconstruct what I probably said, because human language has redundancy. Lots of redundancy: the audio waveform of each sound repeats dozens of times, grammar allows only a relatively small set of words in any given position, and context allows only a relatively small set of concepts in any given sentence.

Compression tries to remove all of this context and repetition; if there are only twenty words that could come next in context, why waste more than a single letter denoting which one it is? But with fully compressed communication, any miscommunication becomes undetectable. In an ordinary essay, a misspelling just makes the page look a little sloppy; in a zero-redundancy essay, a misspelling changes the rest of the essay completely: it changes not only the next word but also the context for the word after that, and so on.
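Here is a toy sketch of that effect, under loudly stated assumptions: the ten-word vocabulary and the `candidates` rule (rotate the word list by an amount computed from the previous word) are invented stand-ins for "context", not a real compression scheme. Each word is sent as a single digit naming its position in the context's candidate list. Because every digit decodes to some valid word, a corrupted digit raises no alarm, and since each word sets the context for the next, the damage runs through the rest of the sentence.

```python
# Hypothetical vocabulary; a real language model would be vastly larger.
VOCAB = ["the", "cat", "dog", "sat", "ran", "on", "mat", "rug", "a", "home"]


def candidates(prev: str) -> list[str]:
    """Toy 'context': VOCAB rotated by an amount derived from the previous word."""
    shift = sum(map(ord, prev)) % len(VOCAB)
    return VOCAB[shift:] + VOCAB[:shift]


def encode(words: list[str]) -> list[int]:
    """Replace each word with one digit: its index among the context's candidates."""
    out, prev = [], ""
    for w in words:
        out.append(candidates(prev).index(w))
        prev = w
    return out


def decode(indices: list[int]) -> list[str]:
    """Invert encode(); any index 0-9 decodes to *some* valid word."""
    out, prev = [], ""
    for i in indices:
        w = candidates(prev)[i]
        out.append(w)
        prev = w
    return out


sentence = "the cat sat on the mat".split()
code = encode(sentence)
assert decode(code) == sentence

# Simulate a single transmission error in the second symbol.
corrupted = list(code)
corrupted[1] = (corrupted[1] + 1) % len(VOCAB)
garbled = decode(corrupted)
print(garbled)  # ['the', 'dog', 'on', 'a', 'mat', 'rug']

# Every decoded word is still a legal word, so nothing signals the error.
assert all(w in VOCAB for w in garbled)
```

One wrong digit changed every word after it, yet the result is still a grammatical-looking string of real words.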

Thus, computing first compresses and then adds redundancy back in. We call this process “error-correcting codes” and “transmission protocols”. We take a web page, compress it to 20% of its original size, then wrap that compressed document in a whole stack of redundancy-injecting protocols that might expand its size 150%–200%. We do this because the redundancies that help humans figure out that “Helko, gow are yiu?” probably meant “Hello, how are you?” are not redundancies computers can easily exploit. So the computer strips out the redundancies it can find easily, adds in redundancies it can use to correct miscommunications easily, and then undoes all of that at the other end so the human mind can do the same thing.
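As an illustration of "redundancy a computer can use easily", here is a sketch of the classic textbook Hamming(7,4) code. It is not necessarily what any particular protocol stack uses; real links use a variety of more elaborate codes. Every 4 data bits gain 3 parity bits (175% of the original size, inside the range mentioned above), and the parity pattern pinpoints any single flipped bit in a block so it can be fixed. The helper names are hypothetical.

```python
from functools import reduce
from operator import xor


def encode_block(d: list[int]) -> list[int]:
    """Hamming(7,4): 4 data bits -> 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode_block(c: list[int]) -> list[int]:
    """Fix at most one flipped bit, then return the 4 data bits."""
    # XOR of the 1-based positions holding a 1 is zero for a valid codeword;
    # after a single-bit error it equals the position of the bad bit.
    syndrome = reduce(xor, (pos for pos, bit in enumerate(c, start=1) if bit), 0)
    fixed = list(c)
    if syndrome:
        fixed[syndrome - 1] ^= 1
    return [fixed[2], fixed[4], fixed[5], fixed[6]]


def to_nibbles(data: bytes) -> list[list[int]]:
    """Split each byte into two 4-bit groups, most significant bit first."""
    return [[(b >> s) & 1 for s in (k + 3, k + 2, k + 1, k)]
            for b in data for k in (4, 0)]


def from_nibbles(nibbles: list[list[int]]) -> bytes:
    """Reassemble bytes from 4-bit groups (inverse of to_nibbles)."""
    bits = [bit for nib in nibbles for bit in nib]
    return bytes(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))


message = b"Hello"
sent = [encode_block(n) for n in to_nibbles(message)]

# A noisy channel: flip one bit in every 7-bit block.
received = [list(block) for block in sent]
for i, block in enumerate(received):
    block[i % 7] ^= 1

decoded = from_nibbles([decode_block(b) for b in received])
assert decoded == message
```

Note the trade-off: this code repairs one error per block but can be fooled by two, which is why real systems layer several kinds of redundancy.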
