"need the sequel to babyfeistel pls"
`nc juniorfeistel.challs.open.ecsc2024.it 38014`
In this challenge we are tasked with performing a chosen-plaintext key-recovery attack on a custom 64-bit block cipher. The challenge imposes a data limit of 7 million plaintext-ciphertext pairs and a time limit of 10 minutes. The custom block cipher is a Feistel network that uses modular addition instead of XOR to combine the branches. Additionally, the round function is non-invertible. A single round (out of 10) looks as follows:
*(figure: the round function)*

*(figure: the key schedule that generates the round keys)*
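To make the data flow concrete, here is a minimal sketch of a modular-addition Feistel round. The `F` below is a placeholder, not the challenge's real round function (which is more complex and non-invertible); only the branch structure is meant to match.

```python
MASK = 0xFFFFFFFF  # two 32-bit branches make up the 64-bit block

def F(x, k):
    # Placeholder round function: the challenge's real F is different
    # and non-invertible; this stand-in only illustrates the data flow.
    x = (x + k) & MASK
    return (x ^ (x >> 5)) & MASK

def feistel_round(L, R, k):
    # One Feistel round where the branches are combined with modular
    # addition instead of XOR: (L, R) -> (R, L + F(R, k) mod 2^32).
    return R, (L + F(R, k)) & MASK
```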
In the rest of this write-up I will document my key-recovery attack. It follows the standard 3-step framework of finding a distinguisher, collecting data, and recovering the key.
In the first section I will discuss the process of finding the distinguisher, which in this case will be through differential cryptanalysis. In the second section I will discuss how we can get the most out of the data limit by using input structures to combine multiple differential distinguishers. The third section will be about the key recovery itself. Finally, I will finish with some thoughts on the challenge.
The easiest (to implement) attack avenues for a block cipher are probably SMT modelling, MitM attacks or integral attacks, since all of these attacks are deterministic. However, due to the relatively large number of rounds, the complexity of the round function, the use of modular addition and the low data limit, none of these avenues are viable. Given the modular addition and rotations, it would be cool to use rotational(-xor) cryptanalysis; alas, due to the key addition, this is only applicable in the related-key setting. The next best things are probably differential or linear cryptanalysis. Let's focus on the former.
The idea behind differential cryptanalysis is to propagate a pair of values with a known difference through the block cipher and evaluate the probability of seeing another predetermined difference at the output. The reason this is so useful is that a difference deterministically propagates through the principal component of any block cipher: the key addition. With addition we mean the group operation of an Abelian group; think bit vectors with XOR, or addition modulo an integer. Given a known difference $\Delta = x' \ominus x$ between two inputs, adding the same key $k$ to both leaves the difference unchanged: $(x' \boxplus k) \ominus (x \boxplus k) = \Delta$.
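As a quick sanity check (assuming, as in the rest of this write-up, 32-bit branches with differences taken modulo $2^{32}$), key addition indeed preserves modular differences:

```python
import random

M = 2**32  # assumed branch modulus; differences are taken mod 2^32

random.seed(0)
delta = 0x1337C0DE
for _ in range(1000):
    x, k = random.randrange(M), random.randrange(M)
    # the difference between the pair is delta before the key addition...
    assert ((x + delta) - x) % M == delta
    # ...and still delta afterwards: the key cancels out
    assert (((x + delta + k) % M) - ((x + k) % M)) % M == delta
```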
In general, only linear functions have deterministic difference propagation. For example, the copy function $x \mapsto (x, x)$ propagates an input difference $\Delta$ to the output difference $(\Delta, \Delta)$ with probability one.
To approximate the probability of a difference propagating through a complex function, like a block cipher, we can start from a chosen input difference and propagate it forwards through every component of the cipher. The input and output difference, together with all the intermediate differences, are called a differential characteristic. The product of the probabilities of all the transitions in the characteristic is then, under Markov assumptions, a lower bound on the key-averaged probability of the differential over the whole cipher. I'll discuss what that means in the last section of this write-up, but in non-extreme cases (such as ours) it means that the probability of the characteristic is a reasonable approximation of the actual probability of the differential.
Before we continue to the search for a trail: in the rest of this write-up we will be using differential cryptanalysis over the group $(\mathbb{Z}/2^{32}\mathbb{Z}, \boxplus)$ on each branch, i.e. differences are taken modulo $2^{32}$, matching the cipher's modular addition.
If the round function of a Feistel network is non-invertible, there is a good repeatable 2-round meta-characteristic that allows us to skip every second round function: a state difference $(\Delta, 0)$ passes the first round for free (the round function sees a zero difference) and becomes $(0, \Delta)$; in the second round, if the round function maps the difference $\Delta$ to $0$, the state difference returns to $(\Delta, 0)$. Instantiating this with a specific difference should give us a reasonably high probability differential.

To instantiate this characteristic, we have to look for a difference $\Delta$ such that the propagation $\Delta \to 0$ through the round function has high probability; such collisions can only exist because the round function is non-invertible.
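To illustrate the kind of search involved, here is a Monte-Carlo estimate of $\Pr[F(x \boxplus \Delta) = F(x)]$, i.e. the probability of the transition $\Delta \to 0$. The `toy_F` below is a made-up non-invertible function just for demonstration; the real search runs over the challenge's round function.

```python
import random

M = 2**32
C = 0x9E3779B9  # odd multiplier for the toy round function

def toy_F(x):
    # Toy non-invertible round function: the truncation to 16 bits
    # creates collisions, just like the challenge's F admits them.
    return (((x * C + 0x1337) % M) >> 8) & 0xFFFF

def zero_diff_prob(delta, samples=1 << 14):
    # Estimate Pr[toy_F(x + delta) = toy_F(x)] by random sampling.
    rng = random.Random(0)
    hits = sum(toy_F((x + delta) % M) == toy_F(x)
               for x in (rng.randrange(M) for _ in range(samples)))
    return hits / samples

# A difference with delta * C == 1 (mod 2^32) barely disturbs the
# truncated output, so delta -> 0 happens with high probability...
good = pow(C, -1, M)
assert zero_diff_prob(good) > 0.9
# ...while a generic difference essentially never collides.
assert zero_diff_prob(1) < 0.1
```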
Ordinarily, differential cryptanalysis is performed over the group $(\mathbb{F}_2^n, \oplus)$ of bit vectors under XOR. In that setting there is a well-known trick to get more pairs out of the same data: query an input structure, a set of plaintexts in which many pairs realize each of several chosen differences at once, so the same queries serve multiple differentials.
We can apply a similar trick to differential cryptanalysis over the group we are working in. Even more so, we can construct an input structure from a single difference, because unlike in the bit-vector case, most group elements are not their own inverse. That is, we can use pairs of the form $(x, x \boxplus i\Delta)$: a single arithmetic progression $x, x \boxplus \Delta, x \boxplus 2\Delta, \ldots$ contains pairs realizing the differences $\Delta, 2\Delta, 3\Delta$, and so on.
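A small sketch of such a structure (the base value, step and size below are arbitrary illustrations):

```python
from itertools import combinations

M = 2**32

def structure(base, delta, size):
    # An input structure: an arithmetic progression base, base+delta, ...
    # Unlike the XOR setting, elements are not self-inverse, so the
    # progression contains pairs with differences delta, 2*delta, 3*delta, ...
    return [(base + i * delta) % M for i in range(size)]

def pairs_with_difference(struct, diff):
    # All ordered pairs (x, y) from the structure with y - x == diff (mod 2^32).
    return [(x, y) for x, y in combinations(struct, 2) if (y - x) % M == diff]

S = structure(0x1000, 5, 4)                      # 4 queries...
assert len(pairs_with_difference(S, 5)) == 3     # ...give 3 pairs with diff 5,
assert len(pairs_with_difference(S, 10)) == 2    # 2 pairs with diff 10,
assert len(pairs_with_difference(S, 15)) == 1    # and 1 pair with diff 15.
```

So four queries yield six usable pairs spread over three differences, which is how the structures stretch the data limit.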
Don't get too distracted by the figure though; while summoning RGB demons can be fun, cryptanalysis is even more fun (and less dangerous). So... what do these input structures mean for our differences? Well, the additive order of the difference is what matters: as long as the relevant multiples of $\Delta$ stay distinct, a single structure covers pairs for each of them.
To mount the attack we first query all the input structures. Since the probability of each of the three differential characteristics is somewhat low, we query enough structures that a few right pairs are expected for each of them.
In this next step we try to recover the last round key. We do this by partially decrypting the filtered output pairs for each possible guess of the round key. Then we keep track of how many times a right pair was detected with each key. The key with the highest number of detected right pairs is with high probability the correct last round key. If there are multiple keys with the same number of detected right pairs, we'll have to test each of them in the next round.
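The counting step can be sketched on a toy cipher (8-bit branches and a random S-box as a stand-in round function; the real attack does exactly this with the challenge's round function and 32-bit round keys):

```python
import random
from collections import Counter

MASK = 0xFF  # toy 8-bit branches so the sketch runs instantly

random.seed(42)
SBOX = list(range(256))
random.shuffle(SBOX)

def toy_F(x, k):
    # Hypothetical stand-in for the challenge's round function.
    return SBOX[(x + k) & MASK]

def last_round(state, k):
    L, R = state
    return R, (L + toy_F(R, k)) & MASK

def invert_last_round(state, k):
    # Peel off the final round under a key guess.
    L2, R2 = state
    return (R2 - toy_F(L2, k)) & MASK, L2

def recover_last_key(cipher_pairs, expected_diff, key_space=256):
    # For every key guess, count how many surviving ciphertext pairs
    # decrypt one round to the difference predicted by the distinguisher;
    # the correct key should collect the most votes.
    votes = Counter()
    for c1, c2 in cipher_pairs:
        for k in range(key_space):
            La, Ra = invert_last_round(c1, k)
            Lb, Rb = invert_last_round(c2, k)
            if ((Lb - La) & MASK, (Rb - Ra) & MASK) == expected_diff:
                votes[k] += 1
    return votes.most_common(1)[0][0]

# Demo: simulate 20 right pairs entering the last round with difference (7, 3).
true_key, diff = 0x3A, (7, 3)
pairs = []
for _ in range(20):
    L, R = random.randrange(256), random.randrange(256)
    s1 = last_round((L, R), true_key)
    s2 = last_round(((L + diff[0]) & MASK, (R + diff[1]) & MASK), true_key)
    pairs.append((s1, s2))
assert recover_last_key(pairs, diff) == true_key
```

Wrong keys only match the expected difference by chance, so their vote counts stay far below the correct key's.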
With the last round key in hand, we can guess the second-to-last round key and compute the key schedule backwards to recover a possible master key. For each of the possible master keys we can then check, with a few known plaintexts, which one encrypts to the right ciphertexts. Et voilà, we have the master key and can collect our flag.
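This final check is straightforward; a hedged sketch (the `encrypt` callable and the toy cipher below are stand-ins, the real attack plugs in the challenge's cipher):

```python
def verify_master_key(candidates, encrypt, known_pairs):
    # Return the first candidate master key consistent with every known
    # plaintext/ciphertext pair, or None if no candidate survives.
    for key in candidates:
        if all(encrypt(pt, key) == ct for pt, ct in known_pairs):
            return key
    return None

# Tiny stand-in cipher, only to exercise the helper.
def toy_encrypt(pt, key):
    return (pt * 31 + key) % 2**16

known = [(p, toy_encrypt(p, 0x3EEF)) for p in (1, 2, 3)]
assert verify_master_key(range(2**16), toy_encrypt, known) == 0x3EEF
```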
You can find an implementation of the attack in the code directory. However, I'm not done here; as an extra challenge I wanted to see how low I could get the data.
For this bonus challenge, I wanted to see how much I could reduce the data without increasing the amount of brute-forcing necessary. I first checked whether I had missed any other good differentials over the round function, so I made the search code a bit faster. The search turns up the following 8 differences with very similar probabilities:
| Difference | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Probability | 0.0296 | 0.0296 | 0.0296 | 0.0296 | 0.0282 | 0.0282 | 0.0282 | 0.0296 |
This is a lot more than I found earlier. Moreover, all of these differences share a common multiple, which means structures built from a single base difference can cover several of them at once.
Another thing to check is whether specific values for parts of the input make these differentials more or less likely to hold.
A final insight that I didn't have during the CTF is that the filter on the output differences is so strong that, at the low amount of data we have, practically all pairs that survive the filtering process are guaranteed to be right pairs for the nine-round distinguisher. So instead of decrypting all of the filtered pairs for every possible last round key, it is possible to filter the key in the same way as we did for the second-to-last round key. In fact, a single right pair should suffice for this.
Alas, it seems that a single right pair gives rise to a large number of possible last round keys. I checked how many key candidates are left on average when filtering with 1 to 10 right pairs. There might be high variance in these numbers, as I only took 10 samples per measurement.
| Right pairs | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Key candidates | 5.3e7 | 1.3e6 | 462 | 200 | 36 | 6 | 7 | 5 | 4 | 2 |
Clearly there are a lot of possible key candidates when using only a single right pair to filter: around 50 million of them on average. It looks like even 10 right pairs isn't quite enough to rule out all but one key, but a good balance between the number of key candidates and the number of right pairs seems to be 6 right pairs.
As a final optimization, to prevent going over the time limit of the challenge, I simply restart the challenge if there aren't at least 6 right pairs. You can find the implementation in the code directory.
All in all, this extra push for lower data complexity didn't lead to a grand improvement. We did learn, however, that the round function of this cipher is really, really weird. The cipher would probably be even more broken if the roles of the key and the input were switched in the round function, as there are a large number of collisions on the key.
This was a really cool challenge. It is the first time I applied differential cryptanalysis in a group other than $(\mathbb{F}_2^n, \oplus)$. Recall that the product of the probabilities of all the transitions in a characteristic is, under Markov assumptions, only a lower bound on the key-averaged probability of the transition from input to output difference over the whole cipher, which suggests that the process I described to find a good differential property isn't quite the whole story.
Clustering is the phenomenon where multiple differential characteristics with the same input and output difference contribute significantly to the probability of a single differential. You can simply sum the probabilities of these characteristics to get a better lower bound on the key-averaged probability of the differential. This phenomenon is probably what happens with the original characteristic I found, although I suspect there is a better explanation involving switching between groups during the analysis. However, that would probably lead us beyond the state of the art.
On another note, I don't really like the name clustering; it suggests that this is some special phenomenon that doesn't happen often. However, it always happens: to get the exact key-averaged probability of a differential, it is necessary to sum over all characteristics. Approximating this sum with one or a small number of characteristics is called a dominant trail approximation. I would much prefer this term to be used, as it makes clear that we are only approximating probabilities. Note also that I have been using the term key-averaged probability. This means something slightly different than you might expect. Let's get to that.
The Markov (cipher) assumption is implicitly used in almost every application of differential cryptanalysis. Its name references Markov processes: sequences of events in which the probability of an event depends only on the event that preceded it. In differential cryptanalysis, by taking the product of the probabilities of transitions through individual components to get the probability of a characteristic, we are implicitly assuming that the transitions happen independently of each other. In reality this is not the case: the fact that one transition happened influences the probabilities of the following transitions. To truly get independent transitions, every component would need a uniformly random and statistically independent key addition before and after it. This is of course not the case either, since block ciphers use key schedules to expand a relatively small master key into round keys. The key-averaged probability from earlier actually refers to these idealized independent masking keys. If you use the key-averaged probability, you are making a second assumption: that a random key behaves similarly to the average. This is the hypothesis of stochastic equivalence.
To remove the Markov assumption from the analysis, it is necessary to track the conditions on the input and output values of the pairs through the whole cipher, for each transition and through each component. There is a systematic way to do this called quasi-differential cryptanalysis. This work shows that it is common for the probability of a characteristic under a fixed key to differ from the key-averaged probability: there are often large classes of keys for which the probability is significantly smaller, but equally large classes for which it is correspondingly higher.
All in all, while the theoretical foundations of differential cryptanalysis are quite shaky, you can't deny that it works. At least now you know where to look if you ever encounter weird phenomena. Before I go, I want to thank the author of this fun challenge; the insights I acquired might actually be useful for a research project I'm planning. And congratulations to you, the reader, for getting all the way to the end. I hope you learned something new.
Michiel out!