Changelog:
 4 Feb 2023: don’t actually describe trap door functions as asymmetric encryption, add section on message authentication codes; reorganize asymmetric encyrption section to not assume that encryption function and decryption are the same function with different parameters; add section on pitfalls with cryptographic protocols; reorganize DiffieHellman key exchange to separate out of the more mathematical parts
 5 Feb 2023: adjust security property phrasing for MACs to mirror symmetric ciphers more; add further resourecs; expand on the existence of other protocol attacks
Three basic primitives provide a great deal of modern security: hashes, symmetric (or sharedkey) ciphers, and asymmetric (or publickey) ciphers. This writeup is designed to explain the concepts behind these tools and how they are used to achieve common security objectives without the algorithmic and mathematical complexity of current implementations.
1 Four useful tools
1.1 Hashes
A hash function takes an input of any size (often generalized as a byte stream) and returns an output of a fixed size (often generalized as a large binary number) such that changing any byte of the input would also change the output.
The word hash
is used to refer to (a) the output of this
function, (b) the function itself, and (c) the action of running this
function. It is grammatically correct to say I hashed it with my
favorite hash to get the following hash:
f4ca408dc68ca2776e5d294e8f1166b56ee5af9b
.
A crytographicallysecure hash function is not feasibly invertible. That is, given the output of the hash function and the code of the function itself, finding an input that would generate that output requires work equivalent to guessing random inputs until one happens to have the desired output.
The following code implements a hash function, in that any sized input will provide a fixedsize output
(uint *arr, size_t len) {
uint hash= 0;
uint ans for(size_t i=0; i<len; i+=1) {
= (ans << 1) ^ arr[i] ^ (ans>>31);
ans }
return ans;
}
However, that code it not even close to being cryptographically
secure; to get a desired output x
all that is needed is to
input an array containing only x
.
It is very difficult to prove that a hash function is crytographically secure, and we have a long history of compromising one hash function (e.g., by figuring out how to rule out 99% of all inputs so we only have to guess 1% as many) and adopting another.
1.2 Symmetric chiphers
A symmetric cipher or symmetrickey cipher is a pair of functions, one to encrypt and one to decrypt, each taking two inputs: a message of arbitrary length and a limitedsize secret key. They should have the following properties:
 e(m, k) = c \ne m — that is, encryption results in a ciphertext which is unlike the original message
 d(e(m, k), k) = m — that is, decryption undoes encryption
 recovering m from e and c without k is no easier than trying every possible k until one works
The following code implements a symmetric cipher, albeit not a secure one
void e(uint *arr, size_t len, size_t key) {
for(size_t i=0; i<len; i+=1)
[i] += key;
arr}
void d(uint *arr, size_t len, size_t key) {
for(size_t i=0; i<len; i+=1)
[i] = key;
arr}
Describing why this is insecure stems from the fact that messages have internal patterns. If I know that you are sending a PDF document, for example, I know that the first four bytes are always 0x25, 0x50, 0x44, 0x46; I can simply subtract that from the first four bytes of the ciphertext and recover the key.
Getting the first two properties is easy; getting the third is hard, contributing to a history of finding a weakness in one cipher and picking another to replace it.
1.3 Message authentication codes and authenticated encryption
A message authentication code are functions similar to a hash function except it takes a limitedsize secret key k. These functions have the following properties:
 f(m, k) = t — applying the message authentication code f produces a tag t
 recovering k from f and m and t is no easier than trying every possible k until one works
 knowing f and m and t, finding another message m' and corresponding tag t' such that f(m',k)=t' is no easier than finding k
By sending the tag
along with our message, we can ensure that
the message was sent by someone who had the secret key k.
Frequently, this is used in conjunction with symmetric ciphers. The symmetric cipher itself only ensures that the message contents is secret, but it doesn’t ensure that it is not maliciously corrupted. Combining asymmetric encryption and a message authentication code is called authenticated encryption.
1.4 Asymmetric ciphers
An asymmetric cipher or publickey cipher^{1} is pair of functions that can both encrypt and decrypt. However, unlike symmetric ciphers, keys come in pairs. If (e,d) is an asymmetric cipher and (k_1, k_2) is a key pair then
 d(e(m, k_1), k_2) = m — k_2 decrypts what k_1 encrypts, and
 knowing d, c, and k_1 should not make determining k_2 or m too easy
Generally, k_1 is called the public key and k_2 is called the private key.
In practice, only a limited set of asymmetric ciphers are known and
they do not offer quite as good security as no better than brute
force,
but they do have (near) proofs of difficulty to break
if
 Certain generallyaccepted but notyetproven theorems hold (notably
P ≠ NP
andfactorization ∉ P
)  You only encrypt small data
The following code implements an asymmetric cipher, albeit not a secure one
int e(int msg, int key) {
return msg * key;
}
int d(int msg, int key) {
return msg * key;
}
A key pair is a pair of integers that multiplies to give
1
, which is possible because int
has finite
size; for example 2501 * 3221654797 == 0x75400000001
, which
truncated to 32 bits is just 1; thus (2501, 3221654797) is a key pair
for this function.
int message = 0xdeadbeef;
int k1 = 2501, k2 = 3221654797;
("> message: %x; ciphertext: %x; decrypted: %x\n",
printf, e(message, k1), d(e(message, k1), k2));
message("< message: %x; ciphertext: %x; decrypted: %x\n",
printf, e(message, k2), d(e(message, k2), k1)); message
displays
> message: deadbeef; ciphertext: 776a54eb; decrypted: deadbeef
< message: deadbeef; ciphertext: ba965523; decrypted: deadbeef
This cipher is not secure because the extended Euclid’s algorithm can efficiently derive k_1 from k_2.
Typically, keys are generated in pairs and one of them (chosen arbitrarily) is kept secret while the other one is shared publicly. If you have my public key, you can use it to encrypt messages only I can decrypt (you can’t even decrypt them yourself) and to decrypt messages only I can encrypt.
Unlike symmetric ciphers, of which there are many, only a few asymmetric ciphers have ever been popular:
 RSA^{2} encryption is the most well known by far.
 ElGamal encryption and ECIES and some similar schemes, which are all based on DiffieHellman key exchange (see below)
1.5 Digital Signatures
A digital signature scheme (sometimes just called a
signature or publickey signature) is
pair of functions s
(sign) and v
(verify).
Like asymmetric encryption, digital signature schemes use keypairs. If
(s,v) is a digital signature scheme and
(k_1,k_2), then
 s(k_2,m)=S is called a signature,
 v(k_1,S,m) = 1 but v(k_1,X,m)=0 for almost all X \not= S
 knowing s, v, m, S and k_1 should not make determining k_2 easy, and
 knowing s, v, m, S and k_1 should not make finding other values S', m' such v(k_1,S',m')=1 too easy
Like with asymmetric encryption, k_1 is called the public key and k_2 is called the private key.
Less formally, we can describe that last security porperty as
disallowing the forging
of a signature on any new message.
Like with asymmetric ciphers, only a few digital signature schemes have ever been popular: * RSA signatures. RSA signatures and RSA encryption are based on the same underlying functions, so the decrypt function in RSA encryption is similar to the sign function for RSA signatures. * DSA and ECDSA and EdDSA and some similar schemes, which are all mathematically related to DiffieHellman key exchange.
Trapdoor functions and relating signatures and publickey encryption
Most of the popular digital signature schemes have mathematically related asymmetric encryption algorithms. Typically, both the signature scheme and the asymmetric encryption scheme are based on the same trapdoor function.
A trapdoor function will be easy to compute in one direction but very
hard to compute in the inverse direction without some private
information. To construct a trapdoor function, someone will generate a
private key (the private information) and a corresponding public key
which is related to the private key. will call the function that is easy
to compute the public
operation and the function that is hard to
compute the private
operation. The public operation is requires
the public key and the private operation requires the private key.
Given a trapdoor function, asymmetric encryption can be implemented by something similar to using the public operation as the encryption function and the private operation as decryption operation. Digital sigantures can be implemented by using something similar to: * using the private operation as the sign function, and * applying the public operation to a signature and comparing to the original message for the verify function In practice, there are often some extra details required to make a secure asymmetric encryption or signature scheme secure, depending on the details of the trapdoor function.
There are some digital signature schemes which are not based on a trapdoor function, like the Lamport signatures. There are some asymmetric encryption schemes where a related digital signature scheme is far less straightforward than for the popular RSA or ECDSA, such as McEliece.
1.6 DiffieHellman
The DiffieHellman key exchange^{3} is a method of causing two (or more) parties to agree on a single shared random integer without anyone else listening in being able to know what number they agreed upon.
Without going into mathematical details, when performing DiffieHellamn key exchange: * each party generates a random private key * each party generates a public value from their private key and sends it to the other party * each party can then combine the other’s public value with their private key to obtain the same number
1.6.1 Mathematical details
To make this possible mathematically, the process requires
identifying a cyclic group
– that is, a set of values (integers
are preferred) and an operator on elements of that set such that op(op(x, y), z) = op(op(x, z), y). For this
scheme to be secure, the set of values f works on should be large and the f operation hard to undo (i.e., knowing both
x and op(x,y) should not make it easy to determine
y).
The following code implements a cyclic group (i.e., f(f(x,y),z) == f(f(x,z),y)
,
albeit not a secure one
int f(int generator, int key) {
return generator * key;
}
This is insecure for the same reason it was for asymmetric ciphers.
If two people want to create a shared secret, they
 Agree (in public) on an initial random number g
 Each (privately, without telling anyone at all) pick a second random number, their private key
 Each applies the operator to g and their key, sharing the result (in public) with the others
 Each applies the operator to what they received and their key to attain their shared secret
person A knows  shared publicly  person B knows 

f, g  
a  b  
f(g, a), f(g, b)  
f(f(g,b), a)  f(f(g,a), b) 
Suppose you and I are communicating, using the (insecure) f from the previous example.
One of us picks
0x3308006d
as g and we both agree to use itI pick
0x71563a35
as my key, computef(g, 0x71563a35) = 0xa25ec891
, and share0xa25ec891
You pick
0xd380203f
as your key, computef(g, 0xd380203f) = 0x9c85bad3
, and share0x9c85bad3
I use what you shared (
0x9c85bad3
) and my secret (0x71563a35
) to computef(0x9c85bad3, 0x71563a35) = 0x99e57baf
You use what I shared (
0xa25ec891
) and your secret (0xd380203f
) to computef(0xa25ec891, 0xd380203f) = 0x99e57baf
You and I now both know
0x99e57baf
, but all anyone listening in knows are0x3308006d
,0xa25ec891
, and0x9c85bad3
.Because we picked an insecure operator f, this public information does allow others to figure out our secret keys via the extended Euclid’s algorithm and thus learn our shared secret. If we had picked a better f (an elliptic curve, perhaps) this would not have been feasible.
1.7 Cryptographicallysecure random numbers
Many security activities depend at some point on a random number being used. These need to be random in the sense that there is no ability to guess them, even if you know a great deal about the code that created them.
Most software random number generators
are actually
pseudorandom number generators, meaning they are created by a
deterministic subroutine with some internal state; if you know the value
of that state, you can predict the sequence of random
numbers
perfectly, and observing a sequence of generated numbers is sufficient
to reconstruct the state.
Cryptographicallysecure random number generators instead rely on
some form of entropy harvesting to collect
unpredictable bits from sources invisible to outsiders. For example, as
I type this writeup the loworder bit of the microsecond at which I
press each key may appear to be pure entropy
—that is, without
pattern or meaning. We could likewise harvest the loworder bits of each
mouse movement, CPU temperature reading, etc. We don’t actually know
that such measurements are random; there might be a pattern we haven’t
noticed, and if so even secure algorithms may be breakable by attackers
that know those patterns. However, it’s almost certainly more secure
than using something entirely predictable like the output of a
pseudorandom generator seeded with the time of day.
Modern operating systems typically harvest likely sources of entropy and maintain an entropy pool. Since entropy cannot be generated on demand, there is always risk that the entropy pool will not contain enough bytes for a needed cryptographic purpose.
As explained in man 4 urandom
,
Linux maintains an entropy pool in /dev/random
. It is
designed to be readonce: if you read a byte from
/dev/random
, there is one fewer byte there than there was
before, and it is relatively easy to empty it entirely.
To avoid having cryptographic tools wait for more entropy to be
generated, Linux also provides /dev/urandom
, a
pseudorandom number generator that uses harvested entropy to prevent
itself becoming predictable.
As noted on the manual page, The
Explaining how the cryptographic security of
/dev/random
interface is considered a legacy interface, and
/dev/urandom
is preferred and sufficient in all use cases,
with the exception of applications which require randomness during early
boot time./dev/urandom
was established is beyond the scope of this
course.
2 Using the tools
2.1 Signatures
 Task
 Verify that a document has been approved in its current form by a particular individual.
 Solution

Have them sign the document — or, more commonly, a hash of the document, with their private key using a digital signature scheme.
To check the signature, see if the digital signature scheme’s verify function returns true.
2.2 Should I trust you?
 Task
 Convince someone you don’t know that you are who you say you are.
 Solution

 Have someone who knows both of you sign a
document saying
this person’s public key for asymmetric encryption is x.
 Give that document, called a certificate, to the skeptic who doubts your identity.
 The skeptic then generates a random number y, encrypts it with your public key, and tells you the encrypted result z.
 You decrypt z with your private key and tell the skeptic
it’s really me: your number was y.
 Have someone who knows both of you sign a
document saying
The internet today uses a few wellknown certificate authorities who validating website identity and signing certificates.^{4}
A key insight about digital certificates is that presenting the certificate is just half of the authentication process; it must be followed with a challenge to verify you can use your private key, and that private key is never given away. It’s a bit like (but much more secure than) an ID card with a picture of you on it: you have to have your card (which others could steal) and your face (which you never intentionally give to anyone) to use it.
In this example, the certificate contained a public key for encryption. Something similar can be done if the certificate instead contains a public key for a digital signature.
There are alternative ways of establishing trust without a centralized set of certificate authorities, but the approach described here is dominant at the time of writing^{5}.
2.3 Storing passwords
 Task
 I want the computer to recognize my password, but even the root account is not to be able to learn what my password is.
 Solution

Store only a hash of the password.
Note that a simple implementation of this makes possible exploits like rainbow tables, where the hashes of a large number of plausible passwords are precomputed to compare against the hashes stored on a computer. One partial solution to this is to salt the password before hashing: we generate a random value, store it with the password, and include it and the password in the hash.
Passwords on the internet are much less secure than this. We hope servers store only hashes (though verifying this is hard), but we still have to get the password to them to hash, so they typically are transmitted in their raw forms. Thus, if your browser is remembering your passwords, it cannot be remembering only the hashes because it needs to enter the actual passwords into web forms for you. However, if you are good about using a different password for each site, browser storage of them can be more secure than typing them manually, as it prevents you accidentally providing one site with another site’s password.
A better approach to web passwords is to use a password manager. This
still has to store your passwords, but it can encrypt them using a
symmetric key derived from your master
password, which it stores
only as a hash, so that compromise of the password manager has limited
loss. It can also generate random, virtuallyunguessable passwords for
each site and continue to provide access to your passwords even if you
are not on your usual browser.
2.4 Let’s talk privately
 Task
 Initialize private communication with a new acquaintance in a public place.
 Solution

Since communication has patterns that can be used to bypass the security of public key cryptography, we need to establish a shared key for use with a symmetric cipher. Such a key is generally little more than a random number, so we can use DiffieHellman to generate it.
Of course, we first need to agree on which symmetric cipher to use and possibly share certificates to make sure we are who we say we are, and DiffieHellman involves several communication steps itself, so establishing this communication requires a multistep backandforth protocol. One popular protocol for this is called TLS, which provides an interface similar to TCP sockets that gaurentees privacy and prevents tampering by malicious networks.
TLS key exchange
The TLS protocol has a lot of ways it can work, depending on the server and browser configuraiton, but the basic flow of one of the more common configurations today when your browser connects to a web server:
 your browser generates a private key for using DiffieHellman key exchange and sends the corresponding public value to the server
 the server generates a temporary private key for using DiffieHellamn key exchange and sends the coresponding public value to the server
 the browser and server combine the public values with private keys to obtain several new keys for symmetric encryption and message authentication codes.
 the server sends its certificate(s). The certificate contains a copy of the server’s longterm public key, signed by a certificate authority that the browser is already aware of.
 the server hashes all the messages it sent so far, then signs this hash with its longterm public signature key. This lets the client make sure the server is the one identified by the certificate.
 the server sends a message authentication code over everything sent so far. This lets the client make sure that the server is the same one who sent the public DiffieHellman value the client used.
 the browser verifies the signature on the certificate. Then, using the information in the certificate, it verifies and message authentication code
 the client and server communicate using the authenticated symmetric encryption using keys generated via DiffieHellman
3 Pitfalls of cryptographic protocols
Noncareful use of encryption and authentication has often resulted in suprising security flaws despite using otherwise secure cryptopgraphic functions.
As an example, let’s say A is trying to tell B what payments to make for company.
A naive approach to do this would be for A to simply encrypt their messages to B, either using shared secret key and symmetric encryption or using a keypair and asymmetric encryption. These ideas are likely to run into a lot of problems.
3.1 Symmetric cencryption but not authentication
Just because messages are encrypted does not mean they cannot be tampered with.
With symmetric encryption, often it is possible for a malicious party to manipulate the encrypted message despite not being able to encrypt it. For example, let’s say A sending a message like:
pay $ 1000.00 to Mallroy
and encrypts this message. An attacker might not be able to decrypt
this message, but may be able to guess the contents of the message from
other sources. With many encryption algorithms this would let them
change the contents of the message in a targeted way. For example, they
would know where in the cipertext the data for 1000.00
would be
and could try flipping some bits to change the number there. A
countermeasure to this type of attack is to authenticate messages (such
as with a message authentication code).
3.2 Public key encryption but not authentication
With publickey encryption, the public keys are public. This means that anyone can encrypt a message to a public key. So if A is sending a message to B by encrypting to B’s public key that says:
pay $ 1000.00 to Mallroy
then an attacker M can do exactly what A did: send their own message to B by encrypting with B’s public key. And when M does this process, they could change the number. To prevent M from pretending to be B, B could, for example, require messages to also be signed using a public key that corresponds to a private key only A has.
3.3 Replay attacks
Let’s go back to the example of the message:
pay $ 1000.00 to Mallroy
If this message is signed and encrypted, an attacker might still have access to the signature and ciphertext. Using this, they can send the message as many times as they like. If A is willing to make multiple payments for B, then A will end up paying Mallroy too much.
A common countermeasure for replay attacks are nonces, numbers used only once. For example, instead the message could be:
payment #15: pay $ 1000.00 to Mallroy
and B could verify that the payment number is larger than any that has been used before.
If we wanted to avoid having B remember the highest payment number, an alternate scheme would be to have B choose a large random number to use a nonce, remember it and send it to A, and require that A’s next message uses a number it just sent A.
3.4 Other attacks, prevention
Besides failing to authenticate or replay attacks, there are other issues that have plagued implementations of cryptographic protocols, despite use of otherwise secure building blocks. Some examples:
 trying to have A prove that they are talking to B by decrypting a message asymmetrically encrypted from B to A, but not realizing that this allows B decrypt messages others sent to A
 implementing verification of the server on the other end of the connection, but skipping it when the server sends a message that would normally only be used after the verification completed
Because of the difficulty of verifying protocol correctness by hand, the state of the art is to use machinechecked proofs that protocols have certain security properties.
4 Side channels
Having a secure protocol and entropy source is not sufficient to have a secure system. Humans are often a weak point, picking bad passwords and sharing them too freely. However, side channels are also a major possible weakness.
A side channel is something that can carry information but was not the communication channel the designer of the system had in mind. There are many side channels and many side channel attacks, but one example will suffice to show the complexity involved in anticipating and preventing them.
Suppose I want to learn the randomlygenerated private key you used in DiffieHellman. I look through your code and consult the specs of your processor and notice that it takes your code 2 microseconds to process each 0bit and 3 microseconds to process each 1bit. There are a few other aspects that are hard for me to time perfectly, like how long it takes me to see your messages, but after watching a few hundred DH key exchanges I get a reasonable estimate on those as well.
Now suppose for a particular 32bit DH session I time your code and determine you have 13 1bits in your key. To bruteforce guess your key, I only need to check the 347,373,600 32bit numbers that have 13 1bits, not all 4,294,967,296, a savings of approximately 12×.
5 Further resources
5.1 Textbooks
Anderson, Security Engineering (e.g. the Protocols chapter of any edition; note that PDFs of the second edition are on that website)
Ferguson, Schneier, and Kohno, Cryptography Engineering
Menezes, Oorschot, and Vanstone, Handbook of Applied Cryptography (particularly chapter 1) (This book takes a very mathematical approach, unlike the prior two books, which more oriented towards cryptography implementators.
Note that
publickey
is an overloaded term; for example, DiffieHellman does not use the type of asymmetric cipher described here, but is asymmetric in a different way and is also sometimes called apublickey
protocol.↩︎RSA is short for RivestShamirAdleman, named after Ron Rivest, Adi Shamir, and Len Adleman, its inventors. It was previously invented by Clifford Cocks, but Cocks’ document describing it was given a topsecret classification by the British government and not made public until 1997.↩︎
There is some difference of opinion about the justice of this name. Whitfield Diffie and Martin Hellman published a paper describing it in 1976, but Hellman later stated the idea in the paper was that of their graduate student Ralph Merkle.↩︎
Since you are connecting to a website, the
identity
that most certificate authorities are verifying is control over a particular domain name. You might think that this would be done by looking up who has the domain registered and contacting them in some secure way. The reality is maybe less satisfying. As of this writing, the CA/Browser forum sets the criteria for verifying domain control and primarily it’s based on what machine(s) the certificate finds when they lookup that name on their internet connection(s) when they create the certificate.↩︎2023↩︎