Matasano - Cryptopals

Note: most links below are just links to files within this repo for your
convenience.

Set 1: Basics

1. Convert hex to base64

Trivial introduction to hex parsing to get you going. I
implemented b64decode out of curiosity.
2. Fixed XOR

Introduction to XOR between strings. Implementation in xor.py.
3. Single-byte XOR cipher

Ahh. Crypto! To break it, loop through all 256 possible bytes to find which
of the keys produces the decrypted text that looks the most like English.

frequency.py has code to score how much a string looks
like English (testing how close to a certain distribution our data is).

xor.py has code to encrypt/decrypt using a single-byte XOR
cipher.
4. Detect single-character XOR

Very similar to challenge #3. Break all ciphertexts, pick the one with the
best "English score".
5. Implement repeating-key XOR

Simply cycle the key when XOR-ing with the plaintext.
6. Break repeating-key XOR

Interesting! Breaking it amounts to breaking a single-byte XOR cipher for the
1st character of every repeated key, then for the 2nd, the 3rd, etc.

Guessing the keysize is done by picking the keysize that minimizes the
normalized hamming distance (distance / keysize) between
"blocks" of the repeated key. This is because we can expect 2 different blocks
encrypted with the right keysize to have similar bit patterns (e.g. matching
characters), so minimizing the normalized hamming distance gives us the
keysize that produces the most similar blocks. We need to normalize because
e.g. a keysize of 4 would have 2 times as many bits as a keysize of 2.
7. AES in ECB mode

Intro to using the pycrypto API. aes.py has the code that
handles AES encryption/decryption.
8. Detect AES in ECB mode

ECB mode encrypts the blocks independently. This means that two identical
blocks of plaintext at different positions will produce the same ciphertext
block. All that we need to do is find the ciphertext that has the most (or
any, really) duplicate ciphertext blocks, which is very unlikely with 16 bytes
of otherwise random data.

Set 2: Block crypto

9. Implement PKCS#7 padding

We pad the plaintext so that it has a length that is a multiple of the
blocksize (e.g. 16). If it already is, we add a full block of padding. The
padding byte to use is equal to the length of the padding.

Examples:
```
pad('0123456789') = '0123456789' + '666666'
pad('') = '\x10' * 16
```
10. Implement CBC mode

Wikipedia's
image on the matter is enough of a description to implement this. Basically,
every encrypted block is XORed with the previous block (or the IV if this is
the first block). The result is that 2 identical plaintext blocks will no
longer automatically encrypt to the same ciphertext block (compared to ECB).
11. An ECB/CBC detection oracle

When using ECB, two identical plaintext blocks will encrypt to the same
ciphertext block.

Therefore, a block that contains duplicate plaintext blocks will contain
duplicate ciphertext blocks once encrypted.

Our oracle checks if the ciphertext contains duplicate blocks. If it does, we
consider the ciphertext to be encrypted using ECB. Otherwise, we consider that
it used CBC.
12. Byte-at-a-time ECB decryption (Simple)

Detecting the blocksize is relatively easy. We start by noting
len(ciphertext). We continue adding prefix bytes until we notice a change in
len(ciphertext). When that happens, it means that we have crossed a block
boundary and we know the blocksize to be len(after) - len(start).

Using the ECB property that we exploited earlier (identical plaintext blocks
encrypt to identical ciphertext blocks), we can bruteforce individual bytes
at the end of blocks.

For example, assume that we want to bruteforce the (unknown) plaintext:
hello this is a test.

At first, we want to isolate the first character (h, but we don't know that
yet) at the end of a block, through a prefix of As. Like this:
```
AAAAAAAAAAAAAAAh
```
We get the encrypted result of this (call this C0). We can try all
possible bytes that could follow our prefix, like the following:
```
AAAAAAAAAAAAAAAa
AAAAAAAAAAAAAAAb
AAAAAAAAAAAAAAAc
...
```
Eventually, we will get an encrypted block that will match C0, and thus
we will know the first byte of our plaintext.

We can bruteforce the second byte (e) similarly. We start by getting C1 by
encrypting:
```
AAAAAAAAAAAAAAhe
```
Then by trying all possible bytes:
```
AAAAAAAAAAAAAAha
AAAAAAAAAAAAAAhb
AAAAAAAAAAAAAAhc
...
```
Until we find a block that encrypts to C1. And so on, for all bytes in the
first block.

For the following blocks, the idea is the same, but a little more thought must
be put into where we get our Ci and what should be put as a prefix.

The block Ci is merely the index of the block we already are at with the
next byte that we want to bruteforce. The reason is quite simple: with our
padding, we will only be pushing that byte at the end of its current block,
not produce a next block.

The padding only has to be enough to put the next bruteforced byte at the end
of a block. When bruteforcing, however, the bytes before the tried byte must
be the last 15 (blocksize - 1) known bytes, to fit Ci.

This is all clearer through an example:
```
plaintext:    hello this is a (...unknown....)
              |---block 0----||---block 1----|
bruteforcing: t (unknown yet)

We will pad with As:
padded:       AAAAAAAAAAAAAAAhello this is a t(...unknown....)
              |---block 0----||---block 1----||---block 2----|
Ci = block 1

Bruteforce tries: ello this is a a
                  ello this is a b
                  ello this is a c
                  ...
```
We repeat the steps up until the point where we reach the last block. Then, we
will eventually hit the padding.

If the padding happens to be 1 byte, we will successfully find it (this is
just like bruteforcing a normal byte that just happens to be 0x01).

If the padding is anything else, we will first successfully "bruteforce" a
0x01 in the plaintext (because we will be padding the plaintext until it is
only missing a byte, so it will be padded with 0x01). Then, we will try to
bruteforce the next byte, but fail. The reason is that the padding will now be
[0x02, 0x02], while we are trying [0x01, 0x??]. Once that happens, we know
that we have successfully bruteforced the whole plaintext. We can remove our
undesirable bruteforced 0x01 padding and enjoy reading our plaintext. 🎉

Using the same exploitable ECB property as before, we craft multiple messages
and pick-and-choose the parts that we want.

Blocks A:

email=aaaaaaaaaa (A0)
adminPPPPPPPPPPP (A1) (P is the padding that will be needed as a final block)
&uid=10&role=use (A2)
rPPPPPPPPPPPPPPP (A3)

Blocks B:

email=aaaaaaaaaa (B0)
aaa&uid=10&role= (B1)
userPPPPPPPPPPPP (B2)

By crafting the ciphertext B0 + B1 + A1, we get:

email=aaaaaaaaaa (B0)
aaa&uid=10&role= (B1)
adminPPPPPPPPPPP (A1)

Which grants us admin access to log in.

14. Byte-at-a-time ECB decryption (Harder)

This attack is the same as the challenge #12, but with some required initial
work, offsets and padding to apply.

We first need to find out how long the prefix is. We do this by generating 2
blocks of fixed data (e.g. [0xA] * 32) and gradually increasing the size of
the fixed data until we find 2 neighbour duplicate blocks in the ciphertext.
This indicates that we have fully padded the last block of the prefix and that
we have produced two blocks of our own input after that. To make sure that we
don't just have identical blocks because the prefix happened to end with our
fixed value (therefore fooling us into thinking that we have padded 1 more
byte than we really have), we can try with 2 different fixed values, e.g.
[0] * 32 and [1] * 32.

Then, one can use the index where the duplicate blocks begin to find where the
first block after the prefix starts. With that information, we can find the
amount of padding that was required to pad the prefix to a multiple of
blocksize through len(fixed_data) - 2 * blocksize. The length of the prefix
is then index of first of the duplicates - padding length.

With the length of the prefix, we just use our algorithm from challenge #12,
but prefixing our input with some padding to pad the prefix to a blocksize
multiple. We also need to offset any index in the produced ciphertext by the
amount of blocks in the prefix.
15. PKCS#7 padding validation

We get the last byte of the plaintext as n and make sure that the last n
bytes of the plaintext are equal to n. If it is not the case, raise an
exception.
16. CBC bitflipping attacks

Start out by encrypting a normal token for a block of 16 bytes. This will be
where we will inject our crafted block. Call this encrypted block
current_block.

We want to inject target = ";admin=true;abc=".

We know that the plaintext block following our input is:
next_plain = ";comment2=%20lik".

When decrypting with CBC, the following is done:
next_plain = next_block_pre_xor ^ current_block

We can calculate next_block_pre_xor = next_plain ^ current_block.

We want next_block_pre_xor ^ crafted_block to yield target, so we choose:
crafted_block = target ^ next_block_pre_xor.

Then, all we need is to swap current_block with our crafted_block to get
admin access. The decryption of current_block will yield scrambled
plaintext, but it is not a problem since it only modifies comment1.

Set 3: Block & stream crypto

17. The CBC padding oracle

If we have an oracle that tells us if the padding on a ciphertext is valid or
not, we are able to recover the plaintext.

Keep in mind that decryption happens like this:
```
block_pre_xor xxxxxxxxxxxxxxxx ^
   prev_block pppppppppppppppp =
    plaintext tttttttttttttttt
```
We can play with the prev_block to search for a byte that results in valid
padding:
```
block_pre_xor xxxxxxxxxxxxxxx? ^
   prev_block pppppppppppppppB = (we bruteforce 'B')
    plaintext ttttttttttttttt1
```
Once we have the prev_block byte that gives valid padding, we can deduce
pre_xor byte ?.

Once we know the last N bytes of the pre_xor, we can use that information
to craft the ending of the padding in just the right way to bruteforce the
N+1th byte (starting from the right).

With all the pre_xor bytes recovered from a block, we can XOR them with
out real prev_block to find the plaintext.

However, we must keep the following tricky thing in mind:

The last block (and other plaintext blocks, if we're unlucky) can produce two
valid pre_xor bytes, instead of just one:
E.g. block ...55555
We'll find these 2 valid pre_xor bytes:
- the one that produces ...55551
- the one that produces ...55555
When this happens, we can just alter the 2nd last byte to produce junk, so
that only the padding ending with a 1 will pass:
E.g. changing 5 to 4
- the pre_xor that produces ...55541 passes
- the pre_xor that produces ...55545 does not
18. Implement CTR, the stream cipher mode

There isn't much to say here... We generate a keystream with:

aes(key, nonce + counter)

Where nonce and counter are 64-bits integers encoded in little-endian and
counter is the amount of blocks generated so far.
19. Break fixed-nonce CTR mode using substitutions

We reuse our english_test function and pick the key that has the best
score. In some cases, we pick a key that is not as good as some others
because we notice (by analysis of the plaintexts obtained) that the key
is not the right one.

As mentioned by the challenge description, this is a bit manual indeed.
20. Break fixed-nonce CTR statistically

Very similar to the previous challenge, except that we can heavily reuse code
that we've written before.
21. Implement the MT19937 Mersenne Twister RNG

This task surprisingly took a little longer than just copying Wikipedia's
pseudocode. I spent some time wondering why Python's random would yield
different numbers (and state!) from me, so I investigated.

I compared with C++'s std::mt19937, which yielded the same results as
me. I couldn't find posts complaining about this specifically so I took a
look at CPython's source.
It turns out that the
underlying C module
seeds in a different way from what we see on Wikipedia. It assumes that
the seed could be greater than 32-bits and just uses all of the bits of
the seed in a different procedure. The result: it ends up with a
different state from us.

Just to make sure I looked at how numpy's random and it does generate
the same state as us when seeding. Therefore, the tests for this one use
values that I extracted with C++'s implementation.¨
22. Crack an MT19937 seed

Pretty straightforward. Bruteforce all sensible seeds (a couple of seconds
in the past) and pick the one that matches.
23. Clone an MT19937 RNG from its output

The state is directly coming from the next output, so it is really easy to
recover the next state from a given output.

Essentially, we need to look at 624 transformations of the state to fully
recover the MT state:
```
state = [untemper(output) for output in outputs[-624:]]
```
Our untemper needs to reverse the right and left shift operations.

Let's look at a case with 8-bits: y = 10101010

If we're shifting right by 3 bits, we'll do the following:
```
10101010 y
00010101 y >> 3
10111111 y ^ y >> 3
```
We notice that the first 3 bits of the result will match the first 3 bits of
the original y. Then, those original 3 bits will be used (once shifted) to
xor with the original y. By using the known first 3 bits, we can recover
the next 3 bits of y by doing this:
```
10111111 y ^ y >> 3
00010100 known_bits >> 3
10101011 (y ^ y >> 3) ^ (known_bits >> 3)
```
We can then recover the whole y this way. The left shift is very similar,
but we & with a constant before xoring.
24. Create the MT19937 stream cipher and break it

This challenge was very similar to challenge 22. Basically, we bruteforce the
seed space to find one that gives an expected decryption/token.

Set 4: Stream crypto and randomness

25. Break "random access read/write" AES CTR

Here we're doing a terrible mistake: we are reusing our keystream on a
different plaintext. Solution? We provide 00000000... and it gets directly
xored with the keystream (yielding the keystream). We xor that with the
original ciphertext and we get the secret back!
26. CTR bitflipping

This one is really straightforward. We know the plaintext, so we can find the
keystream and use it to encrypt our token the way that we want.
27. Recover the key from CBC with IV=Key

When we split the obtained ciphertext in 3 blocks:
```
C1, C2, C3
```
We can then produce a ciphertext in the following way:
```
C1, 0, C1
```
and capture the produced plaintext from the error message.

This means that P1 is the result of KEY ^ P3 (since P3 is unchanged by
xoring with 0). We can recover the key through:
P1 ^ P3 = (P3 ^ KEY) ^ P3 = KEY.
28. Implement a SHA-1 keyed MAC

Wrote implementation based on Wikipedia's pseudocode (inspired by existing
implementations).
29. Break a SHA-1 keyed MAC using length extension

The final SHA-1 hash is just the h values directly encoded to bytes (after
doing the pre-processing of appending the glue for a given length and
processing all the chunks in sequence).

Thus, if we know that SHA-1(prefix) gives a certain digest, we know that
the h states after processing prefix + glue(len(prefix)) will be
digest. We can inject glue after our prefix to know what h will be
at that point, without even knowing prefix (apart from its length).

If we know the length, we just need to initialize a SHA-1 instance to have
h = unpacked_digest, and msg_len = len(prefix) + len(glue(len(prefix))).
If we update that SHA-1, we can then inject whatever we want.

In practice, since we don't know the length of the password (pseudocode):
```
# Assuming we have 'digest = H(secret + msg)' and don't know 'secret'.
for secret_len in range(128):  # some upper-bound
  h = unpack(digest)
  prefix_len = secret_len + len(msg)
  prefix_glue = glue(prefix_len)
  msg_len = prefix_len + len(prefix_glue)
  sha = Sha1(h=h, msg_len=msg_len)
  new_mac = sha.update(b';admin=true')
  new_message = msg + prefix_glue + b';admin=true'
  if try_message(new_message, new_mac):
    print("Bingo! ", new_message, " ", new_digest)
```
30. Break an MD4 keyed MAC using length extension

After refactoring the sha1 implementation to a general merkle_damgard
structure,
I then reimplemented md4 using similar primitives (only implementing
process_chunk). The logic for length-extension attacks could then be made
generally available in merkle_damgard, which I simply used for MD4 and it
worked out of the box.
31. Implement and break HMAC-SHA1 with an artificial timing leak

If we're bruteforcing the ith byte of the HMAC, we measure, for each byte,
how long it takes to evaluate a file with hmac[i] = byte. We do so round
times to get some distribution. Simply taking the median of each byte times
and taking the byte with the maximum median seemed sufficient here.

Note that this takes a while to run, so I added (optional) logic to only
add a sleep for the current byte. This is cheating because in reality each
sleep adds more noise (that's part of the challenge), but we also want Travis
to successfully run on it, in reasonable time. :)
32. Break HMAC-SHA1 with a slightly less artificial timing leak

Took the same implementation and reduced sleep to 0.1ms. Note that I'm
still doing this all without going through HTTP, which is EZ mode...
Attacking this in practice seems difficult. Good experience. :)

Set 5: Diffie-Hellman and friends

33. Implement Diffie-Hellman

Pretty easy in python, especially with pow(a, b, p). Noteworthy: to
implement modexp, you can do so with exponentiation-by-squaring, with
modulos on the way.
34. Implement a MITM key-fixing attack on Diffie-Hellman with parameter injection

By giving p instead of A to the server, it ends up computing
s = A^b mod p, which gives:
```
s = A^b mod p
  = p^b mod p
  = 0 mod p
```
So we then know the secret key. The same goes for when we return p instead
of p.
35. Implement DH with negotiated groups, and break with malicious "g" parameters

Three scenarios.

For g=1, we have:
```
s = g^a^b mod p
  = 1^a^b mod p
  = 1 mod p
```
For g=p, we have:
```
s = g^a^b mod p
  = p^a^b mod p
  = 0 mod p
```
This is the same as challenge 35. Note that it would be the same for g=0.

For g=p-1 (note p-1 mod p = -1 mod p), we have:
```
s = g^a^b mod p
  = (p-1)^a^b mod p
  = (-1)^a^b mod p
  = { -1 mod p   if a*b is odd
    { 1 mod p    if a*b is even
```
We could already do a lot by just restricting to two possibilities, but we
can narrow it down by finding if a and b are individually odd/even:
- If A == 1 mod p, a is even. If it is p-1 mod p, a is odd.
- If B == 1 mod p, b is even. If it is p-1 mod p, b is odd.
Then, a*b will be odd iff a and b are odd.
36. Implement Secure Remote Password (SRP)

Implement the protocol under srp.py.
37. Break SRP with a zero key

By passing A = 0, the server will then do:
```
s = (A * v^u)^b mod N
  = (0 * v^u)^b mod N
  =         0^b mod N
  =           0 mod N
```
We get the same result by passing A = cN, where c is an integer.
38. Offline dictionary attack on simplified SRP

This challenge gives a bit of clarity into why SRP does B = kv + g^b mod n.
If it didn't, like in the challenge, then we can do an offline brute-force of
s, without knowing the password verifier v. We do (pseudocode):
```
# Known from eaves-dropping an example: salt, A, u, hmac
# Forced via MITM: b
for password in dictionary:
  x = H(salt + password)
  v = pow(g, x, n)
  s = pow(A * pow(v, u, n), b, n)
  if hmac(s, salt) == hmac: print("FOUND ", password)
```
So why does B = kv + g^b mod n help then?

First, we can't do the same attack because the HMAC that we'll receive is no
longer computed directly from g^b mod n on the client side, it is computed
on g^b - kv mod n (because it assumes that we added it to our B).

This page has relevant information about
this attack and the protocol. First, why is it safe to add kv to a public
parameter? We pick g such that it is a primitive root of GF(n), which
means that we can get any integer a such that 1 < a < n via
a = g^k mod n (they're distributed uniformly). In that way, doing
g^b + kv mod n does not leak kv, because an eaves-dropper doesn't know
g^b. Alright, but why not something else?

So we want to mix knowledge about the password (v) into B to protect
against offline bruteforce, but we have to be careful about it. If we were to
do e.g. v * g^b mod n (multiply), we could also attack it:
```
x = H(salt + password)
v = pow(g, x, n)
# Assuming we sent `B = g^b mod n` instead of `B = v * g^b mod n` during mitm.
# Client would do: `s = pow(B * invmod(v), a + u * x, n)`.
# s =     (B / v)^(a+ux) mod n
#   = (g^b / g^x)^(a+ux) mod n
#   =   (g^(b-x))^(a+ux) mod n
#   =   (g^(a+ux))^(b-x) mod n
#   =       (Av^u)^(b-x) mod n
s = pow(A * pow(v, u, n), (b - x) % n, n)
```
Note that we pretend that we are doing v * g^b mod n instead of
kv * g^b mod n, because otherwise we end up with a tricky term that we don't
know how to work with (we don't know a+ux):
(Av^u)^(b-x) * k^(-a - ux) mod n

To show that this attack would work if we did v * g^b mod n, I also
implemented that behind a boolean option to confirm that the math made sense.

If we were to use xor instead, we could do a "partition attack" by eliminating
possible passwords (if B = v xor g^b, then a guess v' is invalid if
B > n). If g is not a primitive root of GF(n), we can also do a
partition attack.

So modular addition (in the real SRP) appears to be a reasonable choice here.
39. Implement RSA

We implement egcd and modinv in mod.py.

We also implement is_prime in prime.py and random_prime.

To implement primality checking for big numbers (think >1024 bits), we
use a probabilistic method, Miller-Rabin.

First, probabilistic? It functions by testing for properties that are always
true for primer numbers, but only sometimes for composite numbers (with a test
that has a chance to filter out any possible composite number). For
Miller-Rabin, the probability of tagging a composite number as "probably
prime" is 1/4 per round. By doing 50 rounds, the probability of having a
composite number pass all 50 rounds (we call this a "strong pseudoprime") is
1/4^50, or ~= 10^(-30). To put this in perspective,
this link compares that probability to
the probability of a cosmic ray flipping the 1-bit result of a deterministic
test. In other words, probabilistic is fine.

The Miller-Rabin test works by starting from Fermat's little theorem:
For n prime, a^(n-1) = 1 (mod n). We can rewrite a^(n-1) as
a^(2^s * d), where d is odd (factor out powers of 2).

If n is prime, (mod n) is a field and x^2 = 1 (mod n) has only two
roots: x = 1 (mod n) and x = -1 (mod n). To prove that there are only two
roots, we can use
Euclid's lemma:
x^2 - 1 = (x + 1)(x - 1) = 0 (mod n). Then it follows that since n divides
(x + 1)(x - 1), it divides one of the factors.

So, from a^(n-1) = 1 (mod n), if n is prime, we can take square roots as
long as the result is 1, and we should get -1 eventually (or reach a^d = 1).

In other words, a prime number will have:
```
a^d = 1 (mod n)
  or 
a^(2^r * d) = -1 (mod n), for some 0 <= r < s
```
From the contrapositive, we can "witness" that n is composite if:
```
a^d != 1 (mod n)
  and
a^(2^r * d) != -1 (mod n), for all 0 <= r < s
```
It turns out that for a random a, the probability of n being a strong
pseudoprime is <= 1/4 (see
this). We can repeat
this for multiple random bases a.

As for the RSA implementation, it's under rsa.py.
40. Implement an E=3 RSA Broadcast attack

For this challenge, we'll make use of the
"Chinese Remainder Theorem".

As a side-note, I was interested in the origin of the name (i.e. why wasn't it
named after the person that found the theorem, like we've done for Euler,
Fermat, and countless others). On
mathoverflow, there is some mention of the
difficulties in knowing exactly where this problem first appeared, but a good
name for the theorem might be the
Sun Zi Theorem
(this reference also contains interesting history around the problem and its
origins). One of my favorite quotes while looking this up, from the last
paragraph of that last reference:

Euler, Lagrange and Gauss presented their achievements in indeterminate
analysis in the 18th century [...] at that time Europeans considered their
results in mathematics unique and very significant. They did not know that
they had been completely solved in the East at least several hundred years
earlier.

... There's something there. :)

Let's look at the Sun Zi theorem and
solution:
```
From the congruences:
  N = r_1  (mod m_1)
  N = r_2  (mod m_2)
  ...
  N = r_n  (mod m_n)

where m_1, m_2, ..., m_n mutually coprime.

From known r_i, m_i (1 <= i <= n), what is N?
```
The solution:
```
Let M = m_1 * m_2 * ... * m_n and M_i = M / m_i
Note that gcd(M_i, m_i) = 1.
Note that M_i = 0  (mod m_j) for i != j.

Let s_1, s_2, ..., s_n be the modular inverses of M_1, M_2, ..., M_n,
respectively.
I.e., s_i * M_i = 1  (mod m_i)
Note that s_i * r_i * M_i = r_i  (mod m_i).

Since M_i = 0  (mod m_j) for i != j, then r_i * M_i * s_i = 0  (mod m_j).

Then, the sum \sum_{i=1}^n {r_i * M_i * s_i} satisfies all the initial
congruences. This is because we'll get 0 terms for indices != i, with only the
term r_i left for a given modulus m_i.

Any number equal to the sum (mod M) will be a solution to the system of
congruences.
```
The code lies under mod.py.

In our case, we have the following examples:
```
plaintext^3 = ciphertext_1  (mod N_1)
plaintext^3 = ciphertext_2  (mod N_2)
plaintext^3 = ciphertext_3  (mod N_3)

With gcd(N_i, N_j) != 1 for i != j. If that were not the case, we could
trivially factor p, q for one of them and recover d and plaintext this way.
```
With the Sun Zi theorem, we can recover plaintext^3, and compute the cube
root.

To compute the integer cube root, I found the following links to be super
helpful to understand how we can apply Newton's Method and prove that the
number of iterations is O(lg lg n). Note that we could have done a dumb
algorithm where we do a binary search over [1, n/2], and that would have
been O(lg n) iterations, but it's interesting to see how to do it in a
smarter way!

Links: isqrt
iroot

The code lies under roots.py.

Note that all of this is not necessary if m is small enough to not even wrap
in the first place (m**3 < N, e.g. bits(m) * e < N). In that case, we can
directly take the cube root. To force m**e to wrap around N, we can add
static padding to m to make it close to bits(N), but then it is vulnerable
to the attack documented here.

Set 6: RSA and DSA

41. Implement unpadded message recovery oracle

This one is fairly straightforward. We captured c = p^e mod N and want to
recover p. By getting the decrypted c' = (s^e mod N) * c mod N, we can
recover p:
```
c' = (s^e mod N) * c  mod N
   = s^e * p^e        mod N
   = (s * p)^e        mod N

p' = decrypt(c')
   = s * p            mod N

=> p = s^(-1) * p'    mod N
```

42. Bleichenbacher's e=3 RSA Attack

This attack comes from the
notes
of Bleichenbacher.

PKCS v1.5 (RFC3447) encodes a
particular digest as:

00 01 FF ... FF 00 <digest_info>

digest_info is the following ASN.1 (DER-encoded):
SEQUENCE {
  SEQUENCE {
    OBJECT-IDENTIFIER (hash algorithm oid),
    NULL
  },
  OCTET-STRING (hash digest)
}

This is implemented in pkcs1_v1_5.py. A subset of ASN.1
DER is implemented under asn1.py.

The vulnerability in this challenge comes from extracting the digest_info
essentially via a regex \x00\x01\xff+\x00(.*), without ensuring that there
are enough ffs to cover the entire space (i.e. without making sure that
digest_info is right-justified). Because ASN.1 encodes the length of the
digest, this will accept a signature that encrypts as:

00 01 FF .. FF 00 <digest_info> <garbage>

Because signature validation implies doing signature^3 and checking the
digest, if we are able to find a value that cubes to something of the form
above, we can forge a signature.

We can do so like this:

digest = hashlib.sha1(b"hi mom").digest()
digest_info = pkcs1_v1_5.sha1_digest_info(digest)
target_digest_info = pkcs1_v1_5.signing_pad(digest_info, total_len=1024//8)
forged_padded = b"\x00\x01\xFF\x00" + target_digest_info
forged_padded += b"\xff" * (1024 - len(forged_padded))
forged_signature = iroot(int.from_bytes(forged_padded, "big"), 3)

... And it works! Note that I couldn't get this to work at first with SHA-256,
because there's not much space to find a valid cube root with a SHA-256 digest
and 1024 bits total size.

Out of curiosity, I also implemented the "by hand" approach from
Bleichenbacher
, with some personal notes to understand why it works:

Let's work with a 3072-bit key (more space to find a root).
Size of 00<digest_info> for SHA-1 is 36 bytes (288 bits).
We'll be producing 00 01 FF ... FF 00 <digest_info> <garbage>.

Reminder that (A-B)^3 = A^3 - 3(A^2)B + 3A(B^2) - B^3.
So if we can formulate our target as something that looks like that, we can
just use (A-B) as our forged signature.

Following Bleichenbacher's notes, we define:
D := 00 <digest_info>
N := 2^288 - D  (note 288 comes from size in bits of <digest_info>)
We assume that N = 0 (mod 3) (we'll have a division by 3 later).
We choose to place D 2072 bits over from the right (numerically, D * 2^2072).
Our padded version will look like:
00 01 FF ... FF <D> <GARBAGE>

To represent our "prefix" (00 01 FF ... FF) numerically (followed by zeros
since it's just the prefix), we can do:
2^(3072 - 15) - 2^(2072 + 288) = 2^3057 - 2^2360
  => '15' is the number of 0 bits in 00 01
  => '2072 + 288' is the number of bits for <D> <GARBAGE>
By doing one minus the other, we get the numerical value for having a series
of 1s in the positions where we want 01 FF ... FF.

Our padded numerical value is thus:
2^3057 - 2^2360 + D*2^2072 + garbage
This can be rewritten as:
2^3057 - N*2^2072 + garbage
The cube root of this is then 2^1019 - (N*2^34/3). That's our forged
signature.

To check that this works, let's cube it:
(2^1019 - (N * 2^34 / 3))^3  (note this is of the form (A-B)^3)
Reminder that (A-B)^3 = A^3 - 3(A^2)B + 3A(B^2) - B^3.
= (2^1019)^3 - 3*(2^1019)^2*(N*2^34/3) + 3*2^1019*(N*2^34/3)^2 - (N*2^34/3)^3
= 2^3057 - (3*2^2038*N*2^34/3) + (3*2^1019*N^2*2^68/9) - (N^3*2^102/27)
= 2^3057 - N*2^2072 + N^2*2^1087/3 - N^3*2^102/27
This fits the pattern:
2^3057 - N*2^2072 + garbage
So it works!

Now, there is a more interesting variant of this attack that we can look at:
What if we have a validator that checks that the ASN.1 decoding has no
left-over (i.e. the hash is right-justified), but not that the padding is a
sequence of FFs? In other words, what if the validator checks:

^00 01 [^00]+ 00 <digest_info>$

This attack broke python-rsa, as decribed in this
blog post.

We know how to find x such that x^3 has a given prefix, but what about a
given suffix? The blog post above gives an iterative algorithm if suffix is
odd, motivated by the observation that flipping the Nth bit (from the right)
in x causes the Nth bit in x^3 to flip, while leaving the bits 0 to N-1
unaffected. We can thus start from bit 0 and iteratively flip bits (as needed)
of x to find our target suffix. Once we have found the prefix and suffix of
our signature, we try random filler bytes in-between until we find a cube that
doesn't have 00s in the padding area.

This got me curious about how to solve the cube suffix generally for even
suffixes and how could we show that this algorithm works for any odd number,
so I wrote up about this in a separate
repository.
The tl;dr is that we can state this problem as:

Solve for x in x^3 = suffix (mod 2^bitlen(suffix))
  or, equivalently,
Find roots of f(x) = x^3 - suffix (mod 2^bitlen(suffix))
  or, more generally,
Solve for x in f(x) = x^3 - suffix = 0 (mod p^k), for prime 'p'.

We find that we can make use of
Hensel's Lemma to "lift" a
solution (mod p^k) to (mod p^(k+1)) when f'(x) != 0 (mod p). That
solution is unique and can be computed directly. We find that in our case, an
odd suffix implies f'(x) != 0 (mod 2), which explains why we can always lift
a solution to the next power of 2 (the N+1th bit). For even suffixes, we need
a recursive approach, described in more details in the linked repository, but
summarized here:

hensel_lift(f, p, k):
  if k = 1: return [x for x in range(p) if f(x) = 0 (mod p)]

  prev_roots := hensel_lift(f, p, k-1)
  new_roots  := []
  for r in prev_roots:
    if f'(r) != 0 (mod p):                     # Hensel's Lemma (simple root)
      s := (r - f(r) * f'(r)^(-1))) (mod p^k)  # Note f'(r)^(-1) is in (mod p)
      new_roots.append(s)
    elif f(r) = 0 (mod p^k):                   # If r+tp^(k-1) are all solutions
      for t in range(p):
        s := (r + tp^(k-1)) (mod p^k)
        new_roots.append(s)
  return new_roots

43. DSA key recovery from nonce

Code for DSA lies under dsa.py. We implement parameter
generation following FIPS 186-4 documentation.

We implement the key recovery as described in the challenge, derived here:
```
s = k^(-1) (h + xr) mod q
=> sk = h + xr mod q
=> sk - h = xr mod q
=> x = (sk - h) / r mod q
```
Then, we can test x by checking if g^x mod p equals y.

A lot of subtle bugs because of how we manually handle mod n operations, as
opposed to using a construct like Mod in Sage.

Regarding DSA's correctness (i.e. why does signature validation work), if we follow the notes on Wikipedia:
```
Note that g = h^((p-1)/q)         mod p.
This means that g^q = h^(p-1) = 1 mod p  (Fermat's little theorem)
With this and g > 0, q is prime => g must have order q.

The signer computes:
s = k^(-1) (H(m) + xr) mod q
Which we can rearrange:
k = H(m)s^(-1) + xrs^(-1) mod q
  = H(m)w + xrw           mod q
Now, since g has order q and k = H(m)w + xrw mod q:
g^k = g^(H(m)w) g^(xrw) mod p
    = g^(H(m)w) y^(rw)  mod p
    = g^u1 y^u2         mod p
    
And when verifying:
r = (g^k mod p)       mod q
  = (g^u1 y^u2 mod p) mod q
  = v
```
Regarding the FIPS 186-4 parameter generation procedure, it seems a bit cryptic at first. I found this answer helpful in understanding why it is done this way -- it is mainly to be able to verify the generation through a seed. The computation shifts and adds multiple hash outputs to generate a pseudorandom sequence, then subtracts a remainder to have it be p = 1 mod 2q (IIUC so that q divides p-1, and so that it is an odd number). Alternatively, we could have generated a sequence of random bits, set the top and bottom bits to 1, and done a similar trick to make p = 1 mod q.
44. DSA nonce recovery from repeated nonce

The formula comes from:
```
Assuming k was reused for (m1, s1) and (m2, s2):
(all mod q)
s1 = (h1 + x * r1) / k
s2 = (h2 + x * r2) / k
r1 = r2, since r = ((g^k) mod p) mod q

Then:
s1 - s2
=> k (s1 - s2) = h1 + xr - (h2 + xr)
=> k (s1 - s2) = h1 - h2
=> k = (h1 - h2) / (s1 - s2)
```
This also means that we just need to find a duplicate r in the signatures to know that k was reused. Then, we recover k with the formula above, and recover the private key with the attack from the previous challenge.

45. DSA parameter tampering

For this one, we assume that y was generated initially using some g, and that we overwrite the g used in signature/verification. Otherwise, y would be computed using our tampered g and be trivial to deduce.

For g = 0:

Assume we are signing "hello".
The signer does:
	r = (g^k mod p) mod q
	s = k^(-1) (H(m) + xr) mod q
So r=0.
And s has not dependence on the private key anymore!
	s = k^(-1) H(m) mod q
When verifying:
	v = (g^u1 * y^u2 mod p) mod q
g^u1 will give 0, so v will always be 0, so will allow any s we give, for any message.
E.g. (0, s) is valid for "hello", (0, s) is valid for "hi", (0, 424242) is valid for "hi".

We had to make changes to sign/verify to pass a flag that allows 0s to avoid checks, which means that this is somewhat dependent on the implementation not making checks on r or s.

For g = p + 1 = 1 mod p:

Assume we are signing "hello".
The signer does:
	r = (g^k mod p) mod q
	s = k^(-1) (H(m) + xr) mod q
So r = 1.
	s = k^(-1) (H(m) + xr) mod q
	=> sk = H(m) + x mod q
Still some dependence on unknowns x and k.

Note that verifying (r, s) would fail, because the g used for generating y
is not the same as what we are passing now, in our setup.

When verifying:
	v = (g^u1 * y^u2 mod p) mod q
	  = (y^u2 mod p) mod q
	u2 = r/s mod q
If we choose some fixed z value and compute:
	r = (y^z mod p) mod q
	s = r/z mod q
So in our setup:
	u2 = r/s mod q
	   = r / (r/z) mod q
	   = z mod q
Verification will do:
	v = (y^u2 mod p) mod q
	  = (y^z mod p) mod q
	  = r mod q
So we can craft a signature that looks somewhat normal and will pass verification!
E.g.
	z = 2**32 -5  # Arbitrary
	r = Zq(pow(y, z, p))
	s = r / z
Then we can sign any string.

46. RSA parity oracle

When we manipulate our ciphertext c = p^e mod n, we mess with the plaintext p:

2^e * c mod n
= 2^e * p^e mod n
= (2p)^e mod n
i.e. ciphertext for 2p

Through that, knowing that n is odd, being the product of two odd primes, we know that an even number going over the modulus once will produce an odd number 2p - n = even - odd = odd. We can start with a range for our plaintext value of [0, n) and do log2(n) steps to narrow it down to our exact plaintext value.

So, if we look at a small example:

p, q = 3, 5
n = p * q  # 15
phi = (p-1) * (q-1)
e = 3
d = pow(e, -1, phi)  # Nice Python 3.8 feature :)
pt = 2  # what we want to recover
ct = pow(pt, e, n)  # what we're given, 8
oracle = lambda c: pow(c, d, n) % 2 == 0

# 4 iterations, since n.bit_length() == 4
# 0 <= pt < 15
# Double our plaintext through our ciphertext
ct = (ct * pow(2, e, n)) % n  # 2pt, encrypted
print(oracle(ct))  # True, did not wrap. I.e. 2pt < 15 (that is, 4 < 15)

# 0 <= pt < 15/2
ct = (ct * pow(2, e, n)) % n  # 4pt, encrypted
print(oracle(ct))  # True, did not wrap. I.e. 4pt < 15 (that is, 8 < 15)

# 0 <= p < 15/4
ct = (ct * pow(2, e, n)) % n  # 8pt, encrypted
print(oracle(ct))  # False, wrapped. I.e. 8pt >= 15 (that is, 16 >= 15)

# 15/8 <= pt < 15/4
ct = (ct * pow(2, e, n)) % n  # 2(8pt-n), encrypted
print(oracle(ct))  # True, did not wrap. I.e. 2(8pt-n) < 15 (that is, 2 < 15)

16pt - 2n < n
16pt < 3n

# We end up with 15/8 <= pt < 3*15/16, i.e. 1.875 <= pt < 2.8125, i.e. pt = 2

One thing that was surprising to me was that when I tried to implement it similar to a regular binary search, I was able to decrypt all the plaintext except for the last byte. E.g. with this approach:

lower, upper = 0, n
for _ in range(n.bit_length()):
    c = (c * 2**e) % n
    mid = (upper + lower) // 2
    if parity_oracle_fn(c):
        upper = mid
    else:
        lower = mid
    print(byteops.int_to_bytes(upper))  # !! NOTE: Does not recover the last byte !!

I would get an incorrect last byte (e.g. non-printable ascii). and not always the same.

Looking around, I found that a github project called Crypton had the same problem. Looking around for other solutions to this challenge, I found this repository that solved the problem by keeping track of numerators/denominators instead of using divisions in the loop. This makes sense, we can end up with bounds like 3/4 N, and truncating multiple times along the way will create slight inaccuracies. So we can instead implement the attack like so (available in rsa.py):

lower, upper = 0, 1
denominator = 1
for _ in range(n.bit_length()):
	c = (c * 2**e) % n
    delta = upper - lower
    lower *= 2
    upper *= 2
    denominator *= 2
    if parity_oracle_fn(c):
        upper -= delta
    else:
        lower += delta
    plaintext = n * upper // denominator
    print(byteops.int_to_bytes(plaintext))

While we're at it, also sent a PR to Crypton.

47. Bleichenbacher's PKCS 1.5 Padding Oracle (Simple Case)

48. Bleichenbacher's PKCS 1.5 Padding Oracle (Complete Case)

Had to take a look at other implementations of this attack to iron out some mis-readings on my part of the formulas and exact meaning of the union of intervals in step 3, but this works out to a relatively concise algorithm! We add PKCS #1 encryption padding to pkcs1_v1_5.py and change our RSA decryption to make use of the CRT optimization, because this does involve running sometimes millions of oracle decryptions. We also implement a ceil_div function, following this nice trick. Even with a 256-bit modulus, I would still end up often getting more than one interval, so had to implement all steps to really test things out. Challenge 48 is then the same problem, but with a larger modulus.

Following the paper, we end up with something along the lines of:

def oracle_s(s):
    """Test out an `s` value through our PKCS#1 oracle."""
    return oracle((c * pow(s, e, n)) % n)

# Step 1: Blinding. Nothing to do for us, we have a known 'c'.
B = 2**(8 * (k - 2))
M = {(2*B, 3*B-1)}

# Step 2: Search for PKCS-conforming messages.
# Step 2.a: Starting the search.
s = next(s1 for s1 in range(ceil_div(n, 3*B), n) if oracle_s(s1))

while len(M) > 1 or next(iter(M))[0] != next(iter(M))[1]:
    if len(M) > 1:
        # Step 2.b: Searching with more than one interval left.
        s = next(si for si in range(s+1, n) if oracle_s(si))
    else:
        # Step 2.c: Searching with one interval left.
        a, b = next(iter(M))
        r = ceil_div(2 * (b*s - 2*B), n)
        found = False
        while not found:
            s_min = ceil_div(2*B + r*n, b)
            s_max = ceil_div(3*B + r*n, a)
            for new_s in range(s_min, s_max):
                if oracle_s(new_s):
                    found = True
                    break
            r += 1
        s = new_s
    # Step 3: Narrowing the set of solutions.
    new_M = set()
    for a, b in M:
        r_min = ceil_div(a*s-3*B+1, n)
        r_max = ceil_div(b*s-2*B, n)
        for r in range(r_min, r_max+1):
            interval_min = max(a, ceil_div(2*B+r*n, s))
            interval_max = min(b, (3*B-1+r*n) // s)
            if interval_min <= interval_max:
                new_M.add((interval_min, interval_max))
    M = new_M
a, _ = next(iter(M))
m = a  # because s0 == 1

I found this resource invaluable to understand the steps of the attack and possible follow-up improvements to lower the median number of oracle calls required to decrypt.

Set 7: Hashes

49. CBC-MAC Message Forgery

It took me some time to understand the architecture described in the challenge, originally I was trying with a single server and was confused how the key management was done. I later realized that I should treat one server as the "web server", that will sign transactions after authenticating that the user owns the account, and an "API" that will apply transactions with valid MACs. Both servers share a secret key. We are in a position where we can MITM between the two servers.

For the first variant (controlled IV), we can manipulate the first block by doing the same bitflips to the IV as we are doing to the plaintext. We then make a transaction against ourselves (e.g. from=2&to=2&amount=1), manipulate it to make it come from Alice, then repeat each time, increasing the amount to our current balance (so that the server accepts our initial useful transaction by seeing that we have the funds).
```
# Our transaction: from=2&to=2&amount=1234...
#  (block 0)       ||||||||||||||||
# We want:         from=0&to=2&amount=1234
msg, iv, mac = parse(my_transaction)
block0 = msg[:16]
target = b"from=0&to=2&amou"
xors = xor_bytes(block0, target)
iv = xor_bytes(iv, xors)
payload = target + msg[16:] + iv + mac
```
For the second variant (fixed IV), we can append a full message, as long as it's fine for our first appended block to get scrambled. This is because we are in a similar situation where we know the "IV" at a certain stage during decryption (i.e. Alice's MAC is the decryption of the last block of her padded message), we can do similar to the previous attack. but now using a plaintext to make sure the xored value comes off to the same first block as our appended message. With this, we know our final MAC will be the same as the MAC of our appended message. Now, we intercept a message from Alice, create a message with two transactions to ourselves (to have things align well), append our message to Alice's to have her also send cash to us, and repeat until we're rich.
```
alice_msg, alice_mac = parse(intercepted)
# We are sending: from=2&tx_list=2:0;2:1000
#  (block 0)      ||||||||||||||||
# Appending to Alice with a scrambled first block (after xors to match fixed IV), we'll get:
#    <Alice's msg><scrambled data>:0;2:1000
# We're assuming a lenient server that will accept the scrambled data and ignore that one.
my_msg, my_mac = parse(my_transaction)
block0 = my_msg[:16]
# Note: the fixed IV is 0.
block0_fixed = xor_bytes(block0, alice_mac)  # block0_fixed will give block0 when xored in CBC.
# Note: alice_mac was computed based on pad(alice_msg) behind-the-scenes! Do the same here.
payload = pad(alice_msg) + block0_fixed + my_msg[16:] + my_mac
```
Remembering to pad Alice's message prior to appending our message turns out to be important.

In a way, the nature of our protocol allowed us to attack it in such a way. Ideas for improvements:
- (not sufficient) More stringent validation of format, deny all transactions if we fail to parse one in the list.
- User-derived keys as opposed to a fixed key.
- Include the length of the message at the start, as part of the protocol (note: at the end is not sufficient).
- Reading on Wikipedia, we also learn about another approach: re-encrypt the last block, using a separate key.

50. Hashing with CBC-MAC

All that we care about to produce target_hash is that our last block's pre-encryption value looks like decrypt_block(key, target_hash). With the known key, we can craft a Javascript message with strategic middle bytes to push our last block's pre-encryption bytes to fit:

assert cbc_mac_hash(b"alert('MZA who was that?');\n") == target_hash
target_pre = aes_decrypt_block(key, target_hash)  # What we want to dupe the hash.

payload = b"alert('Ayo, the Wu is back!');/*"  # Open comments, we'll add noise to get there.
payload += b"A" * (-len(payload) % 16)  # Pad to block size for convenience

# Figure out what `iv` will be at this stage in CBC (the last decrypted block)
iv = b"\x00" * 16
for block in get_blocks(payload):
    block = xor_bytes(iv, block)
    iv = aes_encrypt_block(key, block)

# We want our last block to look like:
# <...>*/\x01  (close comment & \x01 for implicit CBC padding)
ending = b"BBBBBBBBBBBBB*/"  # Leave room for \x01 byte
ending_padded = ending + b"\x01"

# Craft middle bytes to produce our target_pre bytes at the end.
middle_encrypted = xor_bytes(ending_padded, target_pre)
middle = aes_decrypt_block(key, middle_encrypted)
middle = xor_bytes(iv, middle)

crafted += middle + ending
assert cbc_mac_hash(crafted) == target_hash

51. Compression Ratio Side-Channel Attacks

The first case is relatively simple -- try every char, remember the ones that produced the lower oracle length (so more compressed). Here it's possible that we have more than one candidate, since the compressions operates at the bit level and we are getting byte-level information, it can happen that our guess doesn't cross a byte boundary. To overcome this, when we have multiple candidates, we try each of them as if they were now part of our known prefix and see which one gave us the shorter next-step of our next iteration. This is a bit wasteful in terms of oracle calls, but gives a relatively concise algorithm where we can reuse a function for "give me all next-letter candidates with this prefix".

The block encryption case is trickier, because due to block-padding during encryption, we need to be at a block boundary to get informative oracle lengths. I approached this one by a simple retry approach of the previous algorithm: if we failed to fully deduce the next byte (even after trying each candidates at the next stage), retry with extra padding before our known prefix. We add a random byte to make our padding less likely to be compressible. Eventually, this will lead our next byte to fall on a block boundary, where we can do our previous attack successfully. Again, this is wasteful in terms of oracle calls (we could maybe figure out where we are in a block, then reusing that information to pad for the next bytes to guess), but keeps the code simple while still being functional.

def decrypt_compression(oracle_fn, known_prefix, done_fn, block_based=False):
    decrypted = bytearray()
    padding = b""  # For block-based decryption.
    while not done_fn(decrypted):
        prefix = padding + known_prefix + decrypted
        candidates, _ = next_char_candidates(oracle_fn, prefix)
        if len(candidates) > 1:
            # Didn't cross a byte boundary, perhaps. Try each one on the next step
            # to narrow down our candidates to one.
            candidates = narrow_candidates(candidates, oracle_fn, prefix)
            if len(candidates) > 1:
                assert block_based, "For byte-based decryption, this should have worked."
                # Perhaps we didn't cross a block boundary. Try again, with extra
                # random padding.
                rand_byte = random_number(below=128)  # ascii char
                padding += byte([rand_byte])
                continue
        decrypted.append(ord(candidates[0]))
    return bytes(decrypted)

def next_char_candidates(oracle_fn, prefix):
    alphabet = string.printable
    letters_lens = {letter: oracle_fn(prefix + letter.encode("ascii")) for letter in alphabet}
    best_len = min(letters_lens.values())
    candidates = [letter for letter, len_ in letters_lens.items()
                  if len_ == best_len]
    return candidates, best_len

def narrow_candidates(candidates, oracle_fn, prefix):
    next_candidates = {
        letter: next_char_candidates(oracle_fn, prefix + letter.encode("ascii"))
        for letter in candidates
    }
    best_len = min(oracle_len for _, (_, oracle_len) in next_candidates.items())
    best_candidates = [letter for letter, (_, oracle_len) in next_candidates.items()
                       if oracle_len == best_len]
    return best_candidates


# Usage:
ends_in_newline = lambda text: text.endswith(b"\n")
print(decrypt_compression(oracle_stream_cipher, known_prefix=b"sessionid=",
                          done_fn=ends_in_newline))
print(decrypt_compression(oracle_block_cipher, known_prefix=b"sessionid=",
                          done_fn=ends_in_newline, block_based=True))

52. Iterated Hash Function Multicollisions

We can write a generic implementation of collision generation directly as part of our merkle_damgard.py class. In particular, we can turn this in to a CollisionGeneratorclass to make it easier to do the second part, where we have a chance of not finding a g collision with b2/2 f collisions. By having it as a class that remembers its state, we can just do the extra work of generating twice as many collisions by finding one more block, as opposed to starting over with n+1. This ends up looking like:

class CollisionGenerator:
    """Helper class to iteratively generator more collisions of a MD hash."""
    def __init__(self, hash_cls: merkle_damgard.Hash):
        self.hash_cls = hash_cls
        # Sequential pairs of blocks that give a collision under 'hash_cls'.
        # With 'n' pairs of colliding blocks, we can generate 2**n collisions.
        self.colliding_blocks = []
        self.current_state = hash_cls()._state  # start from default internal state
        self.n = 0  # Number of steps that we ran (2**n collisions).
        self.num_calls = 0  # Total calls made to the hash function.
    
    def next(self):
        """Finds the next colliding block, doubling our total collisions."""
        state_to_block = {}
        while True:
            block = random_helper.random_bytes(self.hash_cls.BLOCK_SIZE)
            state = tuple(self.current_state)
            # Note: we could do a full hash from the total sequence of blocks so far,
            # but it's equivalent to just focus on the current's block processing.
            state = self.hash_cls.process_chunk(block, state)
            self.num_calls += 1
            if state in state_to_block and block != state_to_block[state]:
                # New collision!
                self.colliding_blocks.append((block, state_to_block[state]))
                break
            state_to_block[state] = block
        self.current_state = state
        self.n += 1
        
    def num_collisions(self):
        return 2**self.n
    
    def all_collisions(self):
        return (b"".join(blocks) for blocks in itertools.product(*self.colliding_blocks))

I chose to implement f's compression function as an AES call using h (16-bits) left-padded with zeros as key and 2-byte blocks to be left-padded with zeros, using the first 2 bytes of the output as our output. g was implemented similarly, but with h twice as big, taking 4 bytes from the output. The collision finding code for h then looks like:

b2 = 32
f_generator = merkle_damgard.CollisionGenerator(CheapHash)
for _ in range(b2//2):
    f_generator.next()
done = False
while not done:
    g_hash_to_block = {}
    for x in f_generator.all_collisions():
        g_hash = g(x)
        if g_hash in g_hash_to_block and x != g_hash_to_block[g_hash]:
            y = g_hash_to_block[g_hash]
            done = True
            break
        g_hash_to_block[g_hash] = x
    if not done:
        f_generator.next()
assert f(x) == f(y)
assert g(x) == g(y)
assert h(x) == h(y)

53. Kelsey and Schneier's Expandable Messages

Building expandable messages gives us k choices, each path leading to the same final state -- either we use a single block, or 2**i+1 blocks (i goes over [0, k]). What this allows us to do is to generate any message with length n blocks, as long as k <= n <= k + 2**k - 1. The way we do this is first by noticing that each step we either add 1, or 1 plus a power of 2. That means that if we look at the binary representation of n - k, it tells us whether we should take the longer block (to get +2**i) or a shorter one (to leave that bit as 0).

We can build a helper class to create expandable messages:

class ExpandableMessages:
def __init__(self, hash_cls: merkle_damgard.Hash, k: int):
    self.k = k
    self.hash_cls = hash_cls
    self.short_blocks = []  # When we want +1
    self.long_blocks = []  # When we want +2**i+1
    
    state = hash_cls().state()  # Initial state
    for i in reversed(range(k)):
        long = bytearray()
        long_state = state
        for _ in range(2**i):  # The 2**i blocks before our 2**i+1th
            block = random_helper.random_bytes(self.hash_cls.BLOCK_SIZE)
            long_state = hash_cls.process_chunk(block, long_state)
            long.extend(block)
        state, short_block, long_block = self._block_collision(state, long_state)
        
        long.extend(long_block)
        self.short_blocks.append(short_block)
        self.long_blocks.append(bytes(long))
    self.final_state = state

def _block_collision(self, left_state, right_state):
    """Returns (colliding_state, left_block, right_block)."""
    left_seen = {}
    right_seen = {}
    while True:
        block = random_helper.random_bytes(self.hash_cls.BLOCK_SIZE)
        left_hash = self.hash_cls.process_chunk(block, left_state)
        if left_hash in right_seen:
            return left_hash, block, right_seen[left_hash]
        left_seen[left_hash] = block
        right_hash = self.hash_cls.process_chunk(block, right_state)
        if right_hash in left_seen:
            return right_hash, left_seen[right_hash], block
        right_seen[right_hash] = block

def expand_to(self, n):
    """Generate n blocks (in [k, k+2**k-1]) that produce 'final_state'."""
    assert self.k <= n <= self.k + 2**self.k - 1
    message = bytearray()
    binary = bin(n - self.k)[2:].zfill(self.k)
    for i, bit in enumerate(binary):
        if bit == "0":
            message.extend(self.short_blocks[i])
        else:
            message.extend(self.long_blocks[i])
    return bytes(message)

With this, we can create a 2nd preimage collision for a message that has 2**k blocks:

def second_preimage_collision(hash_cls, msg):
    """Find m s.t. H(m) = H(msg), |msg| must have 2**k blocks, int k."""
    k = round(math.log2(len(msg) / hash_cls.BLOCK_SIZE))
    assert 2**k * hash_cls.BLOCK_SIZE == len(msg)
    expandable = ExpandableMessages(hash_cls, k)
    intermediate = {}
    state = hash_cls().state()  # Initial state.
    for i in range(2**k):
        block = msg[i * hash_cls.BLOCK_SIZE:(i+1) * hash_cls.BLOCK_SIZE]
        if i > k:  # Need at least k+1 for prefix+bridge.
        	intermediate[state] = i
        state = hash_cls.process_chunk(block, state)
    bridge_state = None
    while bridge_state not in intermediate:
        bridge = random_helper.random_bytes(hash_cls.BLOCK_SIZE)
        bridge_state = hash_cls.process_chunk(bridge, expandable.final_state)
    suffix_idx = intermediate[bridge_state]
    suffix = msg[suffix_idx * hash_cls.BLOCK_SIZE:]
    prefix_len = len(msg) - len(bridge) - len(suffix)
    assert prefix_len % hash_cls.BLOCK_SIZE == 0
    prefix = expandable.expand_to(prefix_len // hash_cls.BLOCK_SIZE)
    collision = prefix + bridge + suffix
    assert len(collision) == len(msg)
    return collision

It is not obvious to me how one would deal with messages that don't have exactly 2**k blocks, but this is a very neat attack nonetheless.

54. Kelsey and Kohno's Nostradamus Attack

This one is relatively straightforward -- we create a funnel starting from 2**k states, colliding pairs at each stage to eventually get a single final state.

We can create a helper class that handles all of it:

class NostradamusGenerator:
    def __init__(self, hash_cls, k, msg_len):
        """Precompute 2**k states to collide into known final digest, for |msg|."""
        assert msg_len % hash_cls.BLOCK_SIZE == 0
        self.k = k
        self.msg_len = msg_len
        self.funnel = []  # k elements, each a map from state to block
        self.hash_cls = hash_cls
        states = set()
        while len(states) < 2**k:
            block = hash_cls.random_block()  # Helper function, like above.
            states.add(hash_cls.process_chunk(block, hash_cls.init_state()))
        states = list(states)
        for _ in range(k):
            new_states = []
            state_to_block = {}
            for i in range(0, len(states), 2):
                left, right = states[i:i+2]
                new_state, left_block, right_block = _block_collision_parallel(
                	hash_cls, left, right)  # Find collision from 2 start states.
                new_states.append(new_state)
                states_to_block[left] = left_block
                states_to_block[right] = right_block
            self.funnel.append(state_to_block)
            states = new_states
        assert len(states) == 1
        final_state = next(iter(states))
        padding = hash_cls.length_padding(msg_len)
        # Helper method to process multiple blocks in a row.
        _, digest_state = hash_cls.process_blocks(padding, final_state)
        self.digest = hash_cls.state_to_digest(digest_state)
    
    def get_message(self, prefix, pad_char):
        """Produces msg m with given prefix s.t. H(m) = self.digest."""
        glue_len = (self.k + 1) * self.hash_cls.BLOCK_SIZE
        assert len(prefix) + glue_len <= self.msg_len
        pad = pad_char * (self.msg_len - glue_len - len(prefix))
        prefix += pad.encode("ascii")
        assert len(prefix) + glue_len == self.msg_len
        return prefix + self._get_glue(prefix)
    
    def _get_glue(self, prefix):
        assert len(prefix) % self.hash_cls.BLOCK_SIZE == 0
        _, state = self.hash_cls.process_blocks(prefix,
                                                self.hash_cls.init_state())
        leaves = self.funnel[0]
        state, bridge = _block_collision_into(self.hash_cls, state, leaves)
        glue = bytearray()
        glue.extend(bridge)
        for state_to_block in self.funnel:
            assert state in state_to_block
            block = state_to_block[state]
            glue.extend(block)
            state = self.hash_cls.process_chunk(block, state)
        assert len(glue) == (self.k + 1) * self.hash_cls.BLOCK_SIZE
        return glue

With that, we can now show off!

msg_len = 5000  # Roughly. We'll pad to it.
b = MyHash.state_size() * 8
generator = NostradamusGenerator(MyHash, k=b//2, msg_len=msg_len)
print("I am Nostradamus. I know the baseball future. Here is my proof:")
print(generator.digest.hex())

scores = baseball_season()  # Wait...

print("Ah. Just like I predicted! My prediction was...")
prediction_prefix = scores.encode("ascii") + b"\n\nMy secret notes (ignore):\n"
prediction = generator.get_message(prediction_prefix, pad_char=" ")
print(prediction)
print(f"Hash: {MyHash().update(prediction).digest().hex()}")
assert generator.digest == MyHash().update(prediction).digest()

Here again I'm not sure how one would go around removing the restriction on msg_len (to be a block size multiple in this case), since all our precomputations assume that we are at a block boundary, but presumably this wouldn't often be an issue in practice since we are already adding some padding, let's assume we can add a bit more to reach a block size multiple.

55. MD4 Collisions

TODO: challenge

JesseEmond/matasano-cryptopals