This problem set is intended to help you prepare for Exam 1. You may work on it by yourself or with any number of other students of your choice, but keep in mind that you will need to do the exam on your own. Regardless of whether you work alone or not, you are encouraged to discuss this assignment with other students in the class and to ask for and provide help in useful ways. Remember to follow the pledge you read and signed at the beginning of the semester. For this assignment, you may consult any outside resources you wish, including books, papers, web sites, and people, except for materials from previous cs1120, cs150, and cs200 courses. If you use resources other than the class materials, lectures, and course staff, explain what you used in your turn-in.
As always, you are strongly encouraged to take advantage of the scheduled help hours and office hours for this course.
The encrypt function takes a plaintext (unencrypted) message and produces a ciphertext (encrypted) message. Encrypt scrambles and alters the letters of the plaintext message so that an eavesdropper who intercepts the message would not be able to understand its meaning. The decrypt function takes a ciphertext message and produces the corresponding plaintext message. Encryption works as intended if the only person who can perform the decrypt function is the intended recipient of the message.
Since making up good encrypt and decrypt functions and keeping them secret is hard, most cryptosystems are designed to be secure even if the encryption and decryption algorithms are revealed. The security relies on a key which is kept secret and known only to the sender and receiver. The key alters the encryption and decryption algorithm in some way that would be hard for someone who doesn’t know the key to figure out. If the sender and receiver use the same secret key we call it a symmetric cipher. If the sender and receiver can use different keys we call it an asymmetric cipher (asymmetric ciphers are also known as public-key cryptosystems).
In this problem set, you will explore a symmetric cipher based on the Lorenz Cipher that was used by the German Army High Command to send some of the most important and secret messages during World War II. The Lorenz Cipher was broken by British cryptographers at Bletchley Park. Arguably the first electronic programmable computer, Colossus, was designed and built by Tommy Flowers, a Post Office engineer working at Bletchley Park during the war. (There is a lot of arguing about what should be considered the first computer. Babbage's Analytical Engine would be the first, except he was not able to actually build it. Konrad Zuse built the first working universal computer, but had the misfortune of working in Nazi Germany. Who Invented the Computer? The Legal Battle That Changed Computing History supports John Atanasoff's case; this Scientific American post summarizes some of the other candidates.)
Ten Colossi were built in 1943 and 1944, and used to break some of the most important German messages during World War II. Messages broken by Colossus were crucial to the D-Day invasion, since the Allies were able to learn that their campaign to deceive Hitler about where the attack would come was succeeding, and knew where German troops were positioned.
Bletchley Park (Summer 2004)
It is regretted that it is not possible to give an adequate idea of the fascination of a Colossus at work: its sheer bulk and apparent complexity; the fantastic speed of thin paper tape round the glittering pulleys; the childish pleasure of not-not, span, print main heading and other gadgets; the wizardry of purely mechanical decoding letter by letter (one novice thought she was being hoaxed); the uncanny action of the typewriter in printing the correct scores without and beyond human aid; the stepping of display; periods of eager expectation culminating in the sudden appearance of the longed-for score; the strange rhythms characterizing every type of run; the stately break-in, the erratic short run, the regularity of wheel-breaking, the stolid rectangle interrupted by the wild leaps of the carriage-return, the frantic chatter of a motor run, the ludicrous frenzy of hosts of bogus scores.
D. Michie, J. Good, G. Timms. General Report on Tunny, 1945. (Released to the Public Record Office in 2000.)
This file contains:
Note [added 29 Sept]: When you draw your machine, it is fine to draw the NOT and AND operations as simple boxes, instead of transistors. Your NOT box should have one input and one output; your AND box should have two inputs and one output.
In cryptography, it is usually easier to deal with 1 and 0 instead of true and false. For this problem set, use 1 to represent true and 0 to represent false. Each 0 or 1 represents one bit of information.
Your xor-bits function should produce these evaluations:
> (xor-bits 0 0)
0
> (xor-bits 0 1)
1
> (xor-bits 1 0)
1
> (xor-bits 1 1)
0
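The four evaluations above pin down the whole function, since there are only four possible inputs. As a minimal sketch (one possible definition among several, not the intended answer), XOR can be read as "the two bits differ":

```racket
#lang racket
;; One possible definition: the XOR of two bits is 1 exactly when they differ.
(define (xor-bits a b)
  (if (= a b) 0 1))

;; Print the full truth table, one row per input pair.
(for* ([a (list 0 1)]
       [b (list 0 1)])
  (printf "(xor-bits ~a ~a) => ~a\n" a b (xor-bits a b)))
```

A useful property to keep in mind for the rest of the problem set is that XOR-ing with the same bit twice gives back the original bit.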
Fortunately for the Allies, the Nazis didn't have a way of generating and distributing perfectly random bit sequences. Instead, they generated non-random sequences of bits using rotors. Because the sequences of bits used as the key were not random, the cryptographers at Bletchley Park were able to figure out a way to determine the most likely key, and hence, to decrypt the message. We'll talk more about how they did this in class (but you do not need to understand it to complete this problem set).
(Historical aside: The Soviet Union also used a one-time pad cipher following WWII. Because they sometimes reused keys and made some mistakes in constructing the keys, however, the NSA was able to decrypt many important KGB messages. Details of this were released in 1995, and are available at the NSA's web site. The decrypted messages were used to expose Julius and Ethel Rosenberg's spy ring, as well as Soviet attempts to obtain atomic weapons research.)
We will look at how the Lorenz cipher used xor to encrypt messages soon. First, we consider how to turn messages into sequences of bits.
Input ::= BitSeq + BitSeq #
BitSeq ::= ε
BitSeq ::= 0 BitSeq
BitSeq ::= 1 BitSeq
When your machine reaches the Halt state, the tape should contain BitSeq # (it's okay if there is more stuff to the right of the #), where the output BitSeq is the XOR of the two inputs. For example, if the input is 0110+1111#, the output should be 1001#. If the two input sequences are of different lengths, your machine should end in an Error state.
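To make the required input and output concrete, here is a small reference sketch of the behaviour the machine should have. It is not a Turing Machine and says nothing about states or transitions. The name xor-tape is hypothetical, and returning 'error for input that does not match the grammar at all is an assumption (the text only specifies the Error state for sequences of different lengths).

```racket
#lang racket
;; xor-tape (hypothetical name) models only the input/output behaviour of the
;; machine: it takes the initial tape as a string and returns the final tape,
;; or 'error where the machine should end in its Error state.
(define (xor-tape input)
  ;; Match BitSeq + BitSeq # and capture the two bit sequences.
  (define m (regexp-match #px"^([01]*)\\+([01]*)#$" input))
  (cond
    [(not m) 'error] ; assumption: malformed input is also treated as an error
    [(not (= (string-length (second m)) (string-length (third m)))) 'error]
    [else
     (string-append
      (list->string
       (for/list ([a (in-string (second m))]
                  [b (in-string (third m))])
         ;; Two bit characters XOR to 1 exactly when they differ.
         (if (char=? a b) #\0 #\1)))
      "#")]))

(xor-tape "0110+1111#") ; the example from the text
(xor-tape "01+111#")    ; different lengths
```

A sketch like this can serve as a checker when tracing a machine design by hand on a few short inputs.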
Your answer should show the design of a Turing Machine that solves the problem. The fewer states you use in your answer, the better.
Note: It is not necessary to complete this problem before going on to the rest of the problem set. You may prefer to come back to this question later after finishing the Scheme programming questions.
Note [added 29 Sept]: If you prefer to have a # at the beginning of the input and output, that is fine (and may make it a bit easier to design the machine). With this change, your rule for Input is Input ::= # BitSeq + BitSeq #.
Table 1 shows the letter mappings for the Baudot code. For example, the letter H is represented by 10100 while I is 00110. We can put letters together just by concatenating their encodings. For example, the string "HI" is represented in Baudot as 10100 00110. There are some values in the Baudot code that are awkward to print: carriage return, line feed, letter shift, figure shift and error. For our purposes we will use printable characters unused in the Baudot code to represent those values so we can print encrypted messages as normal strings. Table 1 shows the replacement values in parentheses.
Character | Code | Character | Code | Character | Code | Character | Code | Character | Code
A | 00011 | H | 10100 | O | 11000 | V | 11110 | space | 00100
B | 11001 | I | 00110 | P | 10110 | W | 10011 | carriage return (,) | 01000
C | 01110 | J | 01011 | Q | 10111 | X | 11101 | line feed (-) | 00010
D | 01001 | K | 01111 | R | 01010 | Y | 10101 | letter shift (.) | 11111
E | 00001 | L | 10010 | S | 00101 | Z | 10001 | figure shift (!) | 11011
F | 01101 | M | 11100 | T | 10000 | | | error (*) | 00000
G | 11010 | N | 01100 | U | 00111 | | | |
We use lists of ones and zeros to represent Baudot codes. H is represented as (list 1 0 1 0 0). A string is represented as a list of these lists: "HI" is
(list (list 1 0 1 0 0) (list 0 0 1 1 0)).
We have provided these two functions in lorenz.rkt:
char-to-baudot: Character → Baudot
Takes a character as input, and outputs the corresponding Baudot code represented as a list of five bits.
baudot-to-char: Baudot → Character
Takes a Baudot code, represented as a list of five bits, as input and outputs the corresponding character.
Characters are represented in DrRacket using #\character. For example, #\H is the character H and #\space is the space character.
There are two useful built-in procedures for converting between a string and a list of characters:
string->list: String → List
Takes a String as input and outputs the corresponding List of Characters.
list->string: List → String
Takes a List of Characters as input and outputs the corresponding String.
For example:
> (string->list "HI")
(#\H #\I)
> (list->string (list #\H #\I))
"HI"
Your functions should be inverses. Hence, you can test your code by evaluating baudot-to-string composed with string-to-baudot. For example,
(baudot-to-string (string-to-baudot "HELLO"))
should evaluate to "HELLO".
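The round trip can be sketched as follows. To keep the example runnable on its own, char-to-baudot and baudot-to-char are replaced here by stand-ins backed by a five-letter excerpt of Table 1; the real versions come from lorenz.rkt and cover every character. The bodies of string-to-baudot and baudot-to-string are one possible approach, not necessarily the intended one.

```racket
#lang racket
;; Stand-in table: only the letters needed for this example, copied from Table 1.
(define baudot-table
  (list (cons #\H (list 1 0 1 0 0))
        (cons #\E (list 0 0 0 0 1))
        (cons #\L (list 1 0 0 1 0))
        (cons #\O (list 1 1 0 0 0))
        (cons #\I (list 0 0 1 1 0))))

;; Stand-ins for the procedures provided in lorenz.rkt.
(define (char-to-baudot c)
  (cdr (assoc c baudot-table)))

(define (baudot-to-char bits)
  (car (findf (lambda (entry) (equal? (cdr entry) bits)) baudot-table)))

;; Convert a string to a list of five-bit lists, one per character.
(define (string-to-baudot s)
  (map char-to-baudot (string->list s)))

;; Convert a list of five-bit lists back to a string.
(define (baudot-to-string codes)
  (list->string (map baudot-to-char codes)))

(string-to-baudot "HI")
(baudot-to-string (string-to-baudot "HELLO"))
```

Both directions are a map over the characters combined with string->list or list->string, which is why they compose to the identity on any string whose characters are all in the table.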
Lorenz Cipher Machine
The Lorenz cipher was an encryption algorithm developed by the Germans during World War II. It was used primarily for communications between high commanders in European capitals controlled by the Nazis. The original Lorenz machine consisted of 12 wheels, each one having 23 to 61 unique positions. Each position of a wheel represented either a one or a zero.
The first 5 wheels were called the K wheels. Each bit of the Baudot representation of a letter was xor-ed with the value showing on the respective wheel. The same process was repeated with the next 5 wheels, named the S wheels. The resulting value represented the encrypted letter. After each message letter, each K wheel advanced by one position. The movement of the S wheels was determined by the positions of the final two wheels, called the M wheels.
Like all good ciphers, the Lorenz machine uses a key to control the encryption and decryption functions. The key was the starting position of each of the 12 wheels. To decipher the message you simply need to start the wheels with the same position as was used to encrypt and enter the ciphertext.
There are 16,033,955,073,056,318,658 possible starting positions (this is about the number of transistors we should expect Intel to sell in 2022). This made the Nazis very confident that without knowing the key (starting positions of the wheels), no one would be able to break messages encrypted using the Lorenz machine. (As we will see, however, they did not account for the cleverness of the Bletchley Park cryptographers and machine designers!)
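The count of starting positions is the product of the number of positions on each wheel. The individual wheel sizes below are not given in this problem set; they are taken from published descriptions of the historical machine and are consistent with the 23 to 61 range mentioned above. Treat the sketch as a check of the arithmetic under that assumption.

```racket
#lang racket
;; Wheel sizes of the historical Lorenz machine. These figures are NOT from
;; this problem set; they are assumed from published descriptions.
(define K-sizes (list 41 31 29 26 23))
(define S-sizes (list 43 47 51 53 59))
(define M-sizes (list 37 61))

;; Every wheel can start at any of its positions independently,
;; so the number of keys is the product of all twelve sizes.
(define key-space (apply * (append K-sizes S-sizes M-sizes)))

key-space
```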
For this problem set, you will simulate a simplified version of the Lorenz cipher (dealing with the full Lorenz cipher is left as a bonus problem). You should be suitably amazed that the Allied cryptographers in 1943 were able to build a computer to solve a problem that is still hard for us to solve today! (Of course, they did have more than a week to solve it, and a much more serious motivation than we can provide in this course.)
Our Lorenz machine will use 11 wheels, each with only 5 positions. The first five wheels will be the K wheels and the next five the S wheels. Unlike the real Lorenz machine, for this problem set we will assume all five K wheels start at the same position and all five S wheels start at the same position, so each group of five has a single starting position.
The final wheel will act as the M wheel. After each letter all the K wheels and the M wheel should rotate. If the M wheel shows a 1 the S wheels should also rotate, but if the M wheel shows a 0 the S wheels do not rotate.
We have provided 3 lists that represent the wheels. The first is called K-wheels and is a list of lists, each of the inner lists containing the 5 settings. The definition is:
(define K-wheels (list (list 1 1 0 1 0)
                       (list 0 1 0 0 1)
                       (list 1 0 0 1 0)
                       (list 1 1 1 0 1)
                       (list 1 0 0 0 1)))
There is a similar list called S-wheels to represent the S wheels of our simulated machine.
The final list represents the M wheel and is just a single list. The definition is:
(define M-wheel (list 0 0 1 0 1))
The first number in the list represents the current position of the wheel. Thus, we can simulate rotating a wheel by removing the number at the front of the list and placing it at the back.
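A minimal sketch of that idea, using the names rotate-wheel and rotate-wheel-by as assumptions about what the procedures might be called:

```racket
#lang racket
;; Rotate one position: the front element moves to the back.
(define (rotate-wheel wheel)
  (if (null? wheel)
      wheel
      (append (cdr wheel) (list (car wheel)))))

;; Rotate n positions by rotating one position n times.
(define (rotate-wheel-by wheel n)
  (if (= n 0)
      wheel
      (rotate-wheel-by (rotate-wheel wheel) (- n 1))))

(rotate-wheel (list 0 0 1 0 1))      ; => (0 1 0 1 0)
(rotate-wheel-by (list 0 0 1 0 1) 5) ; a full turn gives back the original
```

Because each wheel has 5 positions, rotating by 5 is the same as not rotating at all, which is why only 5 distinct starting offsets exist per wheel.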
Next, define similar procedures that work on a list of wheels at a time instead of a single wheel.
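One way to sketch the list versions is to map the single-wheel procedure over the list. The names rotate-wheel-list and rotate-wheel-list-by are assumptions, and rotate-wheel is repeated here only so the example runs on its own.

```racket
#lang racket
;; Single-wheel rotation, repeated here so the sketch is self-contained.
(define (rotate-wheel wheel)
  (append (cdr wheel) (list (car wheel))))

;; Rotate every wheel in the list by one position.
(define (rotate-wheel-list wheels)
  (map rotate-wheel wheels))

;; Rotate every wheel in the list by n positions.
(define (rotate-wheel-list-by wheels n)
  (if (= n 0)
      wheels
      (rotate-wheel-list-by (rotate-wheel-list wheels) (- n 1))))

(define K-wheels (list (list 1 1 0 1 0) (list 0 1 0 0 1) (list 1 0 0 1 0)
                       (list 1 1 1 0 1) (list 1 0 0 0 1)))

(rotate-wheel-list K-wheels)
```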
Now that we can rotate our wheels, we can simulate the (simplified) Lorenz machine using our K and S wheels. Since both sets of wheels are doing the same thing, we should be able to write one procedure that will work with either the K wheels or the S wheels.
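A minimal sketch of such a procedure, assuming a letter is a list of five bits and each wheel's current position is the first element of its list. The name wheel-encrypt is an assumption about what the procedure might be called.

```racket
#lang racket
(define (xor-bits a b)
  (if (= a b) 0 1))

;; XOR each bit of the letter with the bit currently showing on the
;; corresponding wheel. Works for any list of five wheels (K or S).
(define (wheel-encrypt letter wheels)
  (map (lambda (bit wheel) (xor-bits bit (car wheel)))
       letter
       wheels))

(define K-wheels (list (list 1 1 0 1 0) (list 0 1 0 0 1) (list 1 0 0 1 0)
                       (list 1 1 1 0 1) (list 1 0 0 0 1)))

;; The letter C is 01110 in Table 1.
(wheel-encrypt (list 0 1 1 1 0) K-wheels) ; => (1 1 0 0 1)
```

Applying wheel-encrypt twice with the same wheels returns the original letter, which is the property that lets the same machine both encrypt and decrypt.
<test>
(unless (equal? (wheel-encrypt (list 0 1 1 1 0) K-wheels) (list 1 1 0 0 1))
  (error "C xor-ed with the K wheels should be 11001"))
(unless (equal? (wheel-encrypt (wheel-encrypt (list 0 1 1 1 0) K-wheels) K-wheels)
                (list 0 1 1 1 0))
  (error "applying the same wheels twice should restore the letter"))
</testh>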
We now have all the procedures we need to implement our simplified Lorenz machine. A quick review of how the machine should work:
> (do-lorenz (string-to-baudot "COOKIE") K-wheels S-wheels M-wheel)
((1 1 0 1 0) (0 0 0 0 1) (1 1 0 0 1) (1 0 0 0 1) (0 0 1 1 1) (1 1 0 1 1))
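The evaluation above can be reproduced with the following sketch, under two assumptions. First, the M wheel is read before it rotates: the S wheels step after a letter exactly when the M wheel showed a 1 while that letter was encrypted. Second, the S-wheels list here is a stand-in, because the real definition lives in lorenz.rkt and is not shown in this handout; its leading entries were chosen to be consistent with the worked examples in this handout, and the remaining entries are arbitrary filler. The input is given directly as Baudot codes for "COOKIE" so the sketch does not depend on string-to-baudot.

```racket
#lang racket
(define (xor-bits a b) (if (= a b) 0 1))
(define (rotate-wheel wheel) (append (cdr wheel) (list (car wheel))))
(define (rotate-wheel-list wheels) (map rotate-wheel wheels))
(define (wheel-encrypt letter wheels)
  (map (lambda (bit wheel) (xor-bits bit (car wheel))) letter wheels))

(define K-wheels (list (list 1 1 0 1 0) (list 0 1 0 0 1) (list 1 0 0 1 0)
                       (list 1 1 1 0 1) (list 1 0 0 0 1)))
(define M-wheel (list 0 0 1 0 1))
;; STAND-IN values (assumption): not the definition from lorenz.rkt.
(define S-wheels (list (list 0 0 0 1 0) (list 0 1 1 0 0) (list 0 0 1 0 0)
                       (list 1 1 0 0 0) (list 1 0 1 1 0)))

;; Encrypt each letter with the K wheels, then the S wheels, then step the wheels.
(define (do-lorenz letters k-wheels s-wheels m-wheel)
  (if (null? letters)
      '()
      (cons (wheel-encrypt (wheel-encrypt (car letters) k-wheels) s-wheels)
            (do-lorenz (cdr letters)
                       (rotate-wheel-list k-wheels)   ; K wheels always step
                       (if (= (car m-wheel) 1)        ; S wheels step only if M shows 1
                           (rotate-wheel-list s-wheels)
                           s-wheels)
                       (rotate-wheel m-wheel)))))     ; M wheel always steps

;; Baudot codes for C, O, O, K, I, E from Table 1.
(define cookie (list (list 0 1 1 1 0) (list 1 1 0 0 0) (list 1 1 0 0 0)
                     (list 0 1 1 1 1) (list 0 0 1 1 0) (list 0 0 0 0 1)))

(do-lorenz cookie K-wheels S-wheels M-wheel)
```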
You should now be able to encrypt strings using the simplified Lorenz cipher. To test it, call your lorenz-encrypt function with a string and offsets of your choice to produce ciphertext. Since our encryption and decryption functions are the same, if you evaluate lorenz-encrypt again using the ciphertext and the same offsets you should get your original message back.
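Putting the pieces together, lorenz-encrypt might be assembled as in the sketch below. Several things here are assumptions: the three offsets are taken to be the K, S, and M starting rotations in that order; the M wheel is read before it rotates; the S-wheels list is a stand-in (the real one is in lorenz.rkt), with its first four positions chosen to agree with the examples in this handout and the last position arbitrary; and char-to-baudot and baudot-to-char are re-created from Table 1 by listing the 32 characters in order of their code read as a binary number.

```racket
#lang racket
;; Characters of Table 1 ordered by code value: index 0 is 00000 (*), 31 is 11111 (.)
(define baudot-chars "*E-A SIU,DRJNFCKTZLWHYPQOBG!MXV.")

(define (number->bits n) ; five bits, most significant first
  (for/list ([shift (list 4 3 2 1 0)])
    (bitwise-and 1 (arithmetic-shift n (- shift)))))
(define (bits->number bits)
  (foldl (lambda (bit acc) (+ (* 2 acc) bit)) 0 bits))

;; Stand-ins for the procedures provided in lorenz.rkt.
(define (char-to-baudot c)
  (number->bits (for/first ([ch (in-string baudot-chars)]
                            [i (in-naturals)]
                            #:when (char=? ch c))
                  i)))
(define (baudot-to-char bits)
  (string-ref baudot-chars (bits->number bits)))

(define (xor-bits a b) (if (= a b) 0 1))
(define (rotate-wheel wheel) (append (cdr wheel) (list (car wheel))))
(define (rotate-by wheel n)
  (if (= n 0) wheel (rotate-by (rotate-wheel wheel) (- n 1))))
(define (wheel-encrypt letter wheels)
  (map (lambda (bit wheel) (xor-bits bit (car wheel))) letter wheels))

(define K-wheels (list (list 1 1 0 1 0) (list 0 1 0 0 1) (list 1 0 0 1 0)
                       (list 1 1 1 0 1) (list 1 0 0 0 1)))
(define M-wheel (list 0 0 1 0 1))
;; STAND-IN values (assumption): not the definition from lorenz.rkt.
(define S-wheels (list (list 0 0 0 1 0) (list 0 1 1 0 0) (list 0 0 1 0 0)
                       (list 1 1 0 0 0) (list 1 0 1 1 0)))

(define (do-lorenz letters k-wheels s-wheels m-wheel)
  (if (null? letters)
      '()
      (cons (wheel-encrypt (wheel-encrypt (car letters) k-wheels) s-wheels)
            (do-lorenz (cdr letters)
                       (map rotate-wheel k-wheels)
                       (if (= (car m-wheel) 1) (map rotate-wheel s-wheels) s-wheels)
                       (rotate-wheel m-wheel)))))

;; Rotate each group to its starting offset, run the machine, convert back to text.
(define (lorenz-encrypt text k-offset s-offset m-offset)
  (list->string
   (map baudot-to-char
        (do-lorenz (map char-to-baudot (string->list text))
                   (map (lambda (w) (rotate-by w k-offset)) K-wheels)
                   (map (lambda (w) (rotate-by w s-offset)) S-wheels)
                   (rotate-by M-wheel m-offset)))))

(lorenz-encrypt (lorenz-encrypt "HELLO WORLD" 4 0 2) 4 0 2)
```

The final expression encrypts and then decrypts with the same offsets, so it should give back the original string regardless of what the wheels contain.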
For example:
> (lorenz-encrypt "CAKE" 1 2 3)
"BNR!"
> (lorenz-encrypt "BNR!" 1 2 3)
"CAKE"
The messages were sent to John Tiltman at Bletchley Park. Tiltman was able to discern both messages and determine the generated key. The messages were then passed on to Bill Tutte who, after two months of work, figured out the complete structure of the Lorenz machine only from knowing the key it generated. The British were then able to break the Lorenz codes, but much of the work needed to be done by hand, which took a number of weeks to complete. By the time the messages were decrypted they were mostly useless.
The problem was given to Tommy Flowers, an electronics engineer from the Post Office. Flowers designed and built a device called Colossus that worked primarily with electronic valves. Colossus was arguably the first electronic programmable computer. It was able to decrypt the Lorenz messages in a matter of hours, a huge improvement over the previous methods. The British built ten Colossi in all and were able to decrypt an enormous number of messages sent between Hitler and his high commanders. After the war, eight of the Colossi were quickly destroyed; the remaining two were destroyed in 1960, and all drawings were burnt. The British kept Colossus secret until the 1970s. The details of the breaking of the Lorenz Cipher were kept secret until 2000, but are now available at http://www.codesandciphers.org.uk/documents/newman/newmix.htm.
Our simplified Lorenz cipher is small enough that it is possible to break encrypted messages just by trying all possible keys until you find the one that works. Since there are only 5 starting positions for the K wheels, 5 for the S wheels, and 5 for the M wheel, there are only 125 different keys. This is such a small number that we can simply try every possibility and look at the results to find the original message. Breaking a cipher by trying all possible key values is called a brute-force attack. A good cipher must have far too many keys for someone to be able to test them all, even with access to the most powerful computers. Typical ciphers today, such as the AES cipher, use a key that is at least 128 bits long (so there are 2^{128} > 10^{38} possible keys, which makes the number of transistors Intel sells look like a very small number!)
(brute-force-lorenz (lambda (s) (begin (display s) (newline))) ciphertext)
should print out the 125 possible decoded strings for the ciphertext input (defined in lorenz.rkt). If your procedure works correctly, the one message generated that looks like sensible English is the plaintext message.
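The enumeration of all 125 keys can be sketched with three nested loops over the offsets 0 through 4. To keep the example runnable without the rest of the machine, the hypothetical brute-force-with below takes the decryption procedure as an extra argument; a real brute-force-lorenz would take only the two arguments shown above and call lorenz-encrypt directly. The toy-decrypt procedure is a placeholder that only labels each candidate with its key.

```racket
#lang racket
;; brute-force-with (hypothetical name): call proc on the candidate plaintext
;; for every combination of K, S, and M offsets. decrypt must accept a
;; ciphertext and three offsets.
(define (brute-force-with decrypt proc ciphertext)
  (for* ([k (in-range 5)]
         [s (in-range 5)]
         [m (in-range 5)])
    (proc (decrypt ciphertext k s m))))

;; Placeholder decryption (assumption): tags the text with the key tried,
;; so the order in which keys are visited is visible in the output.
(define (toy-decrypt text k s m)
  (format "~a ~a ~a: ~a" k s m text))

(brute-force-with toy-decrypt
                  (lambda (s) (begin (display s) (newline)))
                  "BNR!")
```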
The challenge-ciphertext message was encrypted using a Lorenz-like cipher, but unlike the ciphertext for Question 12 (and the machine you simulated to solve that), the wheels in each group are not necessarily all rotated by the same amount. So, each of the five K-wheels, each of the five S-wheels, and the one M-wheel can be rotated by any amount. This means instead of having 125 possible initial wheel settings, this machine has 5^{11} = 48828125 possible settings.
Solving this challenge is worth at least a gold star bonus (and potentially multiple gold stars depending on how you solve it), as well as an offer to join my research group (with a paid position over the summer). There is no deadline for submitting a solution to this challenge; whoever solves it first wins. The winner will be expected to explain their solution to me, and present it to the class.
Note: this is definitely a challenging problem, but trivially easy compared to breaking the actual Lorenz cipher as was done at Bletchley Park in 1943, not even accounting for the fact that they had to start by inventing a computer! (Hint: The encrypted message is in English, but you definitely don’t want to look at 48 million output strings by hand to see which one is English!)
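One way to avoid reading candidates by hand is to give each decoded string a rough "looks like English" score and keep only the best. The sketch below is a deliberately crude illustration, not the intended solution: the word list, the names english-score and best-candidate, and the scoring rule are all assumptions, and a real attack on 48 million settings would also need to avoid enumerating every setting.

```racket
#lang racket
;; A few very common English words, in upper case to match Baudot text.
(define common-words (list "THE" "AND" "OF" "TO" "IN" "IS" "THAT" "FOR"))

;; Score a candidate by counting how many of its space-separated
;; words appear in the common-words list.
(define (english-score text)
  (for/sum ([word (string-split text)])
    (if (member word common-words) 1 0)))

;; Return the candidate with the highest score (the list must be non-empty).
(define (best-candidate candidates)
  (argmax english-score candidates))

(best-candidate (list "XQ ZV.K" "THE KEY IS IN THE BOX" "B!NR, -W"))
```

With millions of candidates it would be better to track the best score seen so far while enumerating, rather than building the whole list in memory first.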
Colossus (Original, 1943)
Colossus (Rebuilt, 2004)
I tried to download the ps4 zip file, but it says that the page requested could not be found
Oops, sorry! The link is fixed now.
I’ve added a couple notes to Question 1 and Question 3 that should make it a bit easier to answer these.