University of Virginia, Department of Computer Science CS588: Cryptography, Spring 2005

### Permutation Cipher Redux

1. (20) How large is the set of messages that can be transmitted with perfect secrecy using the 8-bit transposition cipher (as in PS1, Question 4b) with an alphabet containing only 2 symbols? That is, strings in {0, 1}*. Justify your answer by describing the elements of your message set, and why it is not possible to transmit any larger set of possible messages without revealing some potentially useful information to a passive attacker. (Note: you may transmit as many blocks as you want, but they must all be encrypted using the same transposition key.)

Apparently these transposition cipher questions are harder than I think, since no one answered this one correctly either. Before discussing the correct answer, let me explain why the common attempts were misguided:
• Several of you used unicity distance to answer this. Information theoretical perfect secrecy can not be estimated using unicity distance. Unicity distance is a way of approximating the amount of ciphertext an attacker needs for a brute force guessed plaintext attack to work. It depends on the language of the plaintext being redundant. If all plaintexts are equally likely, there is no way to check if a guess is correct. Information theoretical perfect secrecy depends on the ciphertext revealing no extra information about the plaintext (than the attacker already knows from the distribution of messages).
• Many people answered that there are $2^8$ different messages because that is the most you can send in one block, and if you send more than one block you are reusing the key (which we know is insecure from our experience with the two-time pad). Both parts of this argument are faulty. You cannot actually send $2^8$ different messages in one block without revealing something to the attacker. With the transposition cipher, the attacker learns the distribution of symbols in the input. So, an attacker receiving "00000000" (which must be one of the $2^8$ possible ciphertext messages if we are transmitting $2^8$ different messages with one block) knows that the plaintext was "00000000"; an attacker who receives a message that contains a "1" bit knows the plaintext was not "00000000". This means the cipher is not perfect, since the attacker has learned something about the plaintext message from the ciphertext. So, the largest set of messages that can be transmitted with perfect secrecy using only one block is the largest set of messages we can construct with the same distribution of symbols. That is the set of blocks with four "0" bits and four "1" bits. The number of elements in this set is $\binom{8}{4} = \frac{8!}{4!\,4!} = 70$ (we are choosing 4 positions for the "1" bits out of 8 possible positions). But the question did not limit the message length to one block. We can transmit multiple blocks with the same key, so long as all possible plaintext messages are still equally likely for every possible ciphertext message.
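There are 8! transposition keys, and perfect secrecy requires at least as many keys as messages, so 8! is an upper bound on the size of the message set. To reach it, we need to think of 8! different messages that can be transmitted with the transposition cipher so that every message can be mapped to every ciphertext using some transposition key.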
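The single-block count can be checked by brute force. The Python sketch below (variable names are illustrative, not from the problem set) groups all 256 eight-bit blocks by their number of "1" bits, which is the one property a transposition cannot hide, and confirms that the largest group has 70 members.

```python
from collections import Counter
from itertools import product
from math import comb, factorial

BLOCK_BITS = 8

# A transposition only reorders bits, so the number of "1" bits (the Hamming
# weight) of the plaintext block is visible in the ciphertext.
# Group all 256 blocks by weight and find the largest group.
class_sizes = Counter(sum(bits) for bits in product((0, 1), repeat=BLOCK_BITS))

best_weight = max(class_sizes, key=class_sizes.get)
assert class_sizes[best_weight] == comb(BLOCK_BITS, 4)

print(best_weight, class_sizes[best_weight])  # 4 70
print(factorial(BLOCK_BITS))                  # 40320 keys in total
```

The gap between 70 and 40320 is what the multi-block construction that follows is meant to close.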

Recall that for the 26-symbol version, we found the 8! different messages by considering all permutations of "ABCDEFGH". If we can represent each of these 8 symbols using the 2-symbol alphabet in a way that makes them indistinguishable from the ciphertext (that is, each represented symbol can map to any other under some key), then we are done. There are many ways we could do this. A simple one is:

```
A = 00000001
B = 00000010
C = 00000100
D = 00001000
E = 00010000
F = 00100000
G = 01000000
H = 10000000
```
Note that there are equally many keys for which the intercepted ciphertext "10000000" corresponds to a plaintext "A" (all keys that move the eighth position to the first) as to a plaintext "H" (all keys that leave the first position where it is).
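This claim about key counts can be checked exhaustively. The sketch below assumes a key is a permutation of the 8 bit positions and that encryption moves the bit at position i to position key[i]; the helper `transpose` and the dictionary `ENCODING` are illustrative names, not part of the problem set.

```python
from itertools import permutations

def transpose(block: str, key: tuple) -> str:
    """Move the bit at position i of `block` to position key[i]."""
    out = [""] * len(block)
    for i, bit in enumerate(block):
        out[key[i]] = bit
    return "".join(out)

# The one-hot encoding from the table: A = 00000001, ..., H = 10000000.
ENCODING = {letter: format(1 << i, "08b") for i, letter in enumerate("ABCDEFGH")}

ciphertext = "10000000"

# For each candidate plaintext letter, count the keys that produce `ciphertext`.
keys_per_letter = {
    letter: sum(transpose(block, key) == ciphertext
                for key in permutations(range(8)))
    for letter, block in ENCODING.items()
}
print(keys_per_letter)  # every letter is explained by 5040 (= 7!) keys
```

Because every letter is consistent with the same number of keys, a single intercepted block does not favor any letter over another.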

Now, we have to argue that receiving 8 blocks encoded in this manner corresponding to the permutations of "ABCDEFGH" reveals nothing to the attacker. Suppose the attacker intercepts "10000000 00010000". She knows the first two blocks transmitted were not the same plaintext blocks. This sounds like the cipher is imperfect. But the key point is that the attacker knew that already, since the set of possible messages includes only 8-block sequences where all the blocks are different. Hence, the attacker learns nothing new from the ciphertext blocks. Since the second block could correspond to any letter other than the one the first block corresponds to, no information is learned and the cipher is perfect. This argument continues through the 8th block. If the sender attempted to transmit a 9th block, something would be revealed, since one of the previous blocks must be repeated.
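The multi-block argument can also be checked by enumeration. In this sketch each one-hot block is represented by the index of its single "1" bit, so an 8-block message is a permutation of 0..7, and a key (assumed to move position p to key[p]) is applied to every block. The function names are illustrative.

```python
from itertools import permutations

POSITIONS = range(8)

def encrypt_message(message: tuple, key: tuple) -> tuple:
    """Apply the same transposition key to every one-hot block of the message."""
    return tuple(key[p] for p in message)

def ciphertexts_of(message: tuple) -> set:
    """All ciphertexts the message can produce, over all 8! keys."""
    return {encrypt_message(message, key) for key in permutations(POSITIONS)}

msg_1 = (0, 1, 2, 3, 4, 5, 6, 7)
msg_2 = (7, 0, 3, 1, 6, 2, 5, 4)
c1, c2 = ciphertexts_of(msg_1), ciphertexts_of(msg_2)

# Each message reaches 8! distinct ciphertexts (one per key), and both messages
# reach the same set, so a ciphertext is equally consistent with either one.
print(len(c1), c1 == c2)  # 40320 True

# A 9th block must repeat an earlier one, and the repeat shows in the ciphertext.
msg_9 = msg_1 + (0,)
print(all(c[8] == c[0] for c in ciphertexts_of(msg_9)))  # True
```

The last check illustrates the leak from a 9th block: the attacker sees which earlier block was repeated.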

The encoding we describe seems very inefficient: we are using eight 8-bit blocks (64 bits) to transmit 8! = 40320 < $2^{16}$ different messages. Without worrying about the perfect secrecy property, we should only need 2 blocks, but our encoding uses 8. This brings up:

Challenge Problem 3. Either:
1. Prove that there is no more efficient encoding that is still information theoretically perfectly secure.
2. Devise a more efficient encoding and prove that it is information theoretically perfectly secure.

### Smashing Hashes

The attached paper — Adrian Perrig, Robert Szewczyk, J. D. Tygar, Victor Wen and David Culler, SPINS: Security Protocols for Sensor Networks. Wireless Networks, 2002 (originally in MobiCom 2001) — describes several protocols for securing sensor networks. These questions concern only µTESLA, a protocol that provides authenticated streaming broadcast without needing public-key encryption. (You are encouraged to read the whole paper, though, since it touches on many things we have seen in this class.)

2. (20) Since wireless communication in sensor networks is unreliable, it is possible that a node misses one of the released keys. How serious of a problem is it if a node misses a key release message? (Your answer should explain how the node can validate the next received key, and whether or not messages transmitted using the missed key can eventually be authenticated.)

If a node misses a sequence in a chain, it just needs to hash the next revealed key multiple times until it matches the previous known key.

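Suppose the node heard $K_3 = F^{97}(k)$. It misses $K_4$ and $K_5$, but hears $K_6$. It can verify $K_6$ by checking $F(F(F(K_6))) = K_3$, since $K_6 = F^{94}(k)$. The intermediate values $F(K_6) = K_5$ and $F(F(K_6)) = K_4$ are the missed keys, so messages buffered from those intervals can still be authenticated.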
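A minimal sketch of this check follows. It assumes a chain of length 100 and uses SHA-256 as a stand-in for the one-way function F (the paper's choice differs); `make_chain` and `verify_key` are hypothetical helper names, and `max_missed` is an assumed bound on how many consecutive key releases a node tolerates losing.

```python
import hashlib

def F(key: bytes) -> bytes:
    """Stand-in one-way function (an assumption; not the paper's choice)."""
    return hashlib.sha256(key).digest()

def make_chain(secret: bytes, n: int) -> list:
    """Return [K_0, ..., K_n] with K_n = secret and K_i = F(K_{i+1})."""
    chain = [secret]
    for _ in range(n):
        chain.append(F(chain[-1]))
    chain.reverse()
    return chain

def verify_key(candidate: bytes, last_trusted: bytes, max_missed: int) -> bool:
    """Accept `candidate` if hashing it 1..max_missed+1 times gives `last_trusted`."""
    value = candidate
    for _ in range(max_missed + 1):
        value = F(value)
        if value == last_trusted:
            return True
    return False

chain = make_chain(b"sender's secret k", 100)
K3, K6 = chain[3], chain[6]

print(verify_key(K6, K3, max_missed=2))  # True: F(F(F(K6))) == K3

# The missed keys fall out of the same computation.
K5 = F(K6)
K4 = F(K5)
```

Bounding `max_missed` keeps a receiver from doing unbounded hashing on a bogus key; the right bound depends on the deployment.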

3. (30) If µTESLA is used on a long-lived sensor network application, eventually the end of the hash chain will be reached (the sender would need to use Kn and have no more keys left). Suggest a modification to the µTESLA protocol that can be used to extend its lifetime indefinitely. Explain the security risks of using your modified protocol.

The sender can initialize a new hash chain by generating a new secret $K_{bn}$ and sending the first key in the new hash chain authenticated with the last key in the previous chain.

Suppose the first sequence started with $K_{a0} = F^{100}(K_a)$, where the secret $K_a$ is $K_{a100}$, so that $K_{a99} = F^{1}(K_{a100})$. During the time quantum for key $K_{a100}$ the sender will transmit $K_{b0} = F^{100}(K_{bn})$ (taking the new chain to have length $n = 100$ as well) along with a MAC generated using key $K_{a100}$. After the time quantum expires, the sender reveals $K_{a100}$. The next time quantum will use $K_{b1}$ as the key, which the receivers can verify using the (now-verified) $K_{b0}$ which started the new hash chain. Note there is no need for secrecy in sending $K_{b0}$, only a need for integrity, which is provided by the MAC.

We should worry about the situation where the message revealing $K_{a100}$ is lost, or the message sending $K_{b0}$ is lost. In either case, the receiver would be unable to recover and all subsequent messages would be unauthenticatable. The easy solution is just to transmit these messages multiple times. The risk is increasing the time between the first revelation of $K_{b0}$ and the end of the $K_{b0}$ time quantum. Ideally, this should be no longer than the time between any other key revelation.
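The handover between chains might be sketched as follows. This is an illustration under assumptions, not the protocol's specification: SHA-256 and HMAC-SHA-256 stand in for F and the MAC, the chain key is used directly as the MAC key, both chains have length 100, and `make_chain` and `accept_new_chain` are hypothetical names.

```python
import hashlib
import hmac

def F(key: bytes) -> bytes:
    """Stand-in one-way function (an assumption)."""
    return hashlib.sha256(key).digest()

def mac(key: bytes, message: bytes) -> bytes:
    """Stand-in MAC (an assumption): HMAC-SHA-256 keyed with the chain key."""
    return hmac.new(key, message, hashlib.sha256).digest()

def make_chain(secret: bytes, n: int) -> list:
    """Return [K_0, ..., K_n] with K_n = secret and K_i = F(K_{i+1})."""
    chain = [secret]
    for _ in range(n):
        chain.append(F(chain[-1]))
    chain.reverse()
    return chain

chain_a = make_chain(b"secret for chain a", 100)
chain_b = make_chain(b"secret for chain b", 100)

# Sender, during the last time quantum of chain a: announce the first key of
# chain b, authenticated under the last (not yet revealed) key of chain a.
announcement = chain_b[0]
tag = mac(chain_a[100], announcement)

def accept_new_chain(trusted_prev: bytes, revealed_last: bytes,
                     announced: bytes, announced_tag: bytes) -> bool:
    """Receiver: check the revealed key against the old chain, then the MAC."""
    if F(revealed_last) != trusted_prev:
        return False
    return hmac.compare_digest(mac(revealed_last, announced), announced_tag)

# Receiver buffered (announcement, tag); the sender now reveals the last key.
print(accept_new_chain(chain_a[99], chain_a[100], announcement, tag))  # True
```

Once the announcement is accepted, keys from the new chain are verified exactly as before, for example by checking `F(chain_b[1]) == chain_b[0]`.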

The paper (which was published in 2001, before the weaknesses in cryptographic hashing algorithms were known) suggests using MD5 as the cryptographic hash algorithm to generate the µTESLA hash chain. In Lecture 8 we saw a Perl program that demonstrates MD5 does not provide strong collision resistance, as is expected in a cryptographic hash algorithm. Make sure you understand the difference between weak and strong collision resistance (defined in Lecture 8) when you answer questions 4 and 5.

4. (20) How secure is the authentication provided by µTESLA if a hash algorithm that does not provide strong collision resistance is used?

This is not a serious problem. A failure of strong collision resistance means an attacker can find some pair x, y such that F(x) = F(y). As long as neither x nor y appears as a key in the hash chain, the collision does not help the attacker. Since the actual values in the hash chain should be randomly distributed, the chance of one of them appearing is exceedingly low. The strong collision does not even help the attacker, though, since it is just as useful to have two pairs, x1 = F(y1) and x2 = F(y2), and wait for either x1 or x2 to appear in the hash chain. When it appears, the attacker can use y1 or y2 as the next key in the sequence to hijack the receivers. However, if the MD5 hash function is used with its 128-bit output, the chance that a particular value is used is $2^{-128}$, so the attacker will need to wait an awfully long time.

5. (10) How secure is the authentication provided by µTESLA if a hash algorithm that does not provide weak collision resistance is used?

This could be a serious problem. If the function F does not provide weak collision resistance, the attacker can find a value x such that F(x) = y for an arbitrary y. So, if the previous key revealed is y, the attacker can use x as the next key in the sequence. This allows the attacker to send bogus messages during this time quantum. Note that the attacker would need to find another weak collision to continue the hash chain. Otherwise, if the receiver receives the next key revelation message from the legitimate sender, the receiver will know something went horribly wrong. Suppose the last three revealed keys were a, b and c, where a and c are from the original good hash chain and b is a collision found by the attacker. The receiver finds a = F(b) (everything looks okay after the attacker's key was revealed), but F(c) != b and F(F(c)) = a. Note that this assumes the likely event that the attacker does not find the actual value in the original hash chain.
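The a, b, c scenario can be acted out with a deliberately broken toy hash. The sketch below truncates SHA-256 to 16 bits so that a brute-force search succeeds quickly; this toy F, the 4-byte search space, and the name `find_preimage` are all assumptions made for illustration, and nothing here reflects an actual attack on MD5 or SHA-1.

```python
import hashlib
from itertools import count

def F(key: bytes) -> bytes:
    """Toy hash: SHA-256 truncated to 16 bits, weak on purpose."""
    return hashlib.sha256(key).digest()[:2]

def find_preimage(target: bytes) -> bytes:
    """Brute-force some x with F(x) == target (feasible only because F is tiny)."""
    for i in count():
        x = i.to_bytes(4, "big")
        if F(x) == target:
            return x

# Legitimate chain segment: a = F(real_b), real_b = F(c).
c = b"legitimate key two steps ahead"
real_b = F(c)
a = F(real_b)              # last key the sender has revealed

# Attacker forges a "next key" b without knowing real_b.
# (b is longer than real_b only because of the toy search space.)
b = find_preimage(a)

print(F(b) == a)           # True: the receiver accepts the forged key
print(F(c) == b)           # False: the genuine key c no longer chains to b
print(F(F(c)) == a)        # True: but c still chains to a, exposing the forgery
```

With a full-size hash, the search in `find_preimage` would have to finish within one time quantum, which is the constraint discussed next.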

This attack would require a very fast technique for finding the weak collision. It would need to find the collision before the time quantum expires, since after the key is revealed it has no value. The discovered collision is only useful for the remainder of the time quantum. The reported attacks on SHA-1 are very very very far away from doing this. They are a $2^{69}$ work attack (which is significantly better than the $2^{80}$ expected work if SHA-1 were perfect). Further, they are only able to find particular collisions. They break strong collision resistance, but not weak collision resistance.

### Censorship-Resistant Publishing

Note: this question was added on Friday, 18 February after Chenxi Wang's guest lecture.

6. (20) Censorship-resistant publishing schemes rely on secret sharing where some k out of n pieces of information are needed to construct a document, but fewer than k pieces provide no useful information. We can describe secret sharing schemes according to the value of n and k where n is the number of shares distributed, and k is the number of shares needed to recover the secret. For example, the last question on Problem Set 1 considered a (3, 3) secret-sharing scheme. For censorship-resistant publishing, we need a scheme where k < n, so if one of the participants refuses to provide her share, the others can still recover the document.

Invent a (2, 3) secret-sharing scheme using XOR as the only operation. Your answer should explain how a secret is divided into 3 shares, how any 2 of those shares can be combined to recover the secret, and include a convincing information theoretical argument why any single share provides no information.

Generate three random keys, K1, K2 and K3, each as long as M, and let K12 = K1 XOR K2, K13 = K1 XOR K3 and K23 = K2 XOR K3. Then:
A gets K1 and M XOR K12
B gets K2 and M XOR K23
C gets K3 and M XOR K13
A and B have K1, K2 and M XOR K12.
A and C have K1, K3 and M XOR K13.
B and C have K2, K3 and M XOR K23.

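So, any pair can determine M. No single participant learns anything about M: the only value a participant holds that depends on M is XORed with a random key that participant does not hold (A lacks K2, B lacks K3, C lacks K1), so perfect secrecy is preserved.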
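A short sketch of this scheme, with C holding M XOR K13 as the pairwise recovery requires. The function names, the `NEXT` table, and the use of Python's `secrets` module for the random keys are choices made for illustration.

```python
import secrets

def xor(*values: bytes) -> bytes:
    """Bytewise XOR of equal-length byte strings."""
    result = bytes(len(values[0]))
    for value in values:
        result = bytes(x ^ y for x, y in zip(result, value))
    return result

def split(message: bytes) -> dict:
    """Return the three shares as {name: (own key, masked message)}."""
    k1, k2, k3 = (secrets.token_bytes(len(message)) for _ in range(3))
    return {
        "A": (k1, xor(message, k1, k2)),  # M XOR K12
        "B": (k2, xor(message, k2, k3)),  # M XOR K23
        "C": (k3, xor(message, k1, k3)),  # M XOR K13
    }

# Each holder's masked value uses its own key and the key of the "next" holder.
NEXT = {"A": "B", "B": "C", "C": "A"}

def recover(pair: dict) -> bytes:
    """Recover M from any two entries of the dictionary returned by split()."""
    (n1, (key1, masked1)), (n2, (key2, masked2)) = pair.items()
    masked = masked1 if NEXT[n1] == n2 else masked2
    return xor(masked, key1, key2)

shares = split(b"attack at dawn")
for names in (("A", "B"), ("A", "C"), ("B", "C")):
    print(names, recover({name: shares[name] for name in names}))
```

Each pair prints the original message, using whichever of the two masked values is covered by the two keys the pair holds.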

Note that we can produce a (k, n) secret sharing scheme using just XOR for any values of k and n. It gets pretty expensive though, since we are generating different values for all possible combinations of shares that could decrypt the message. This is why more complex schemes (like the points on a line scheme described in Chenxi Wang's lecture) are used.
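One way to realize such a general XOR-only scheme, offered as a sketch and not necessarily the construction intended here, is to split M afresh into k XOR pieces for every k-member subset of the n participants. The names `split` and `recover` are illustrative.

```python
import secrets
from functools import reduce
from itertools import combinations
from math import comb

def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def split(message: bytes, k: int, n: int) -> list:
    """Return shares[i] = {subset: piece} for participants 0..n-1."""
    shares = [{} for _ in range(n)]
    for subset in combinations(range(n), k):
        # k-1 fresh random pads, plus a last piece that XORs with them to M.
        pads = [secrets.token_bytes(len(message)) for _ in range(k - 1)]
        last = reduce(xor, pads, message)
        for member, piece in zip(subset, pads + [last]):
            shares[member][subset] = piece
    return shares

def recover(group: dict) -> bytes:
    """`group` maps exactly k participant indices to their share dictionaries."""
    subset = tuple(sorted(group))
    return reduce(xor, (group[member][subset] for member in subset))

k, n = 3, 5
shares = split(b"secret", k, n)
print(recover({i: shares[i] for i in (0, 2, 4)}))  # b'secret'
print(len(shares[0]), comb(n - 1, k - 1))          # 6 6 pieces per participant
```

Each participant stores one piece for every k-subset it belongs to, so the storage grows as $\binom{n-1}{k-1}$ message-sized values per participant, which is the expense noted above.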