# About GOST's code, Grasshopper, its SBox and lost seeds

We recently returned from the EuroCrypt 2019 conference, where we met extremely smart people and at the same time learned new, extremely offensive facts about GOST SBox.

So, this is the second approach to the projectile. Corrected and supplemented.

This time there will be no incomprehensible red-blue slides, but there will be original documents from the ISO committee with explanations from the authors of Grasshopper.

And even the challenge at the end!

Go.

First educational program. In the previous article it was not, this time I correct.

## Chosen Plaintext Attack (CPA)

In this model, the attacker knows everything about the cryptosystem except the encryption key. He can create any plaintext, get the corresponding ciphertexts and his task is to calculate the key, because this is the only variable.

The block cipher in this situation can be considered as a pseudo-random substitution. A function that matches a plaintext block to a ciphertext block in such a way that it is impossible to determine the relationship between them.

An ideal block cipher can be imagined as a large table, where horizontally there will be all possible keys from 000 ... 000 to 111 ... 111, vertically - all possible open texts also from 000 ... 000 to 111 ... 111 , and at the place of their intersection - randomly generated ciphertexts that would uniquely associate a pair of "key - plaintext". To create such a table in real life is not possible because of its size, so it is emulated using various block encryption algorithms.

The attack on the block cipher can be called successful if we can determine the “randomness” with which the algorithm selects ciphertexts that correspond to plaintexts. This randomness allows in the worst cases to calculate the encryption key.

## (Non) linearity

The encryption process in a block cipher can be represented by the simple formula
C = M x K
where C is ciphertext, M is plaintext, K is the encryption key, and x is the block cipher.

This formula is visually similar to the school formula of the linear equation y = kx + b, the graph of which is a straight line.

Any straight line can be restored in just two points. And at the same time, we really do not want that in two pairs of plaintext - ciphertext, you could restore the encryption key. For this, special layers are added to the encryption algorithms that are responsible for non- linearity. These layers are designed to prevent the possibility of calculating the relationship between plaintext, ciphertext, and the key.

Their quality is critical to the security of the algorithm.

## What is SBox?

This is the same nonlinear layer. A function that, in the case of Grasshopper and some other ciphers, uniquely maps one byte to another byte.

Very often it is set by a simple correspondence table, for example this:

Because otherwise it cannot be described. At first sight.

## Why is SBox important?

Because it is the only non-linear function in the entire cipher. Without it, breaking a cipher will not be easy, but very simple, presenting it as a system of linear equations. Therefore, the substitution function has so much attention. There are even practical exercises for hacking AES with linear SBox.

## Why can't you create one safe SBox at all?

The problem is that cryptography is not an exact science. The only provably strong encryption algorithm is a one-time pad . All the rest is timid attempts to fit into the range of knowledge available to us, the set of which is not so great.
We do not know for sure whether RSA or AES or elliptic curves are resistant, but we know that such and such things are definitely not possible . All in between is creativity.

Hence the constant paranoia about the various “magic constants” and other points that the authors of the algorithms present as safe, but cannot prove it .

## How to create SBox?

The various SBox variants are 256 !, which is approximately 2 1684 . The choice is large and over the years of cryptanalysis, metrics and characteristics have been developed that SBox should have, resistant to today's known attacks. Of course, the creators of the tables think for years to come and try to create substitutions that would be resistant even to attacks invented in 5-10 years. But this is more from the category of magic and shamanism.

There are two fundamentally different approaches to creating SBox.

The first is a random search. Developers generate random tables, look at their characteristics and filter out those that are not suitable. This continues until the developers are satisfied with what they have found.

In the civilized world, this happens, for example, as follows:

1. Some initial value is taken, for example, a quote from a book or the first few digits of Pi
2. Runs through a hash
3. The hash result is used as data to form the SBox.
4. If SBox did not fit, take the current hash and return to step 2

So anyone can reproduce this process and make sure that at least the minimum requirements for pseudo-random search are met.

Do you know where the seeds from the main symmetric algorithm of the country go? Lost ! I thought they didn’t specifically tell them out, the secret is there or something, but Russian colleagues at EuroCrypt said that during the development of the algorithm in 2007, for some reason no one thought that it would be necessary to justify the design of the lookup table, and the values ​​from which it came out were forever lost. The story is beautiful, just do not forget that the algorithm was created not at school at a break, but in the bowels of the FSB.

The second way is to create the SBox yourself, guided by the available mathematical apparatus. So didAES authors and they did pretty well. If we compare the nonlinearity of SBox AES, SM4 (Chinese standard) and Grasshopper (it uses the same SBox as the Stribog hash), then the result will not be in favor of the latter
AES non-linearity (min, max) = (112.0, 112.0)
SM4 non-linearity (min, max) = (112.0, 112.0)
Streebog non-linearity (min, max) = (102.0, 110.0)
The nonlinearity calculation code uses the Walsh Transform and is available here.

## Documents

I got two documents from ISO. In the first, Grasshopper designers explain how they created SBox, in another, the committee discusses their arguments.

from the first document we are interested in two quotes:

and

I hope the topic with “Leo Perrin himself invented that the authors searched for SBox randomly” is now closed.

From the explanations of designers it follows that

1. They really looked for the SBox in a pseudo-random way (given security criteria)
2. No hidden structure was allegedly laid in it.

And in this place they thoroughly screwed up.

## What is a structure?

A structure as applied to a lookup table is an algorithm that describes this table.

The document mentioned AES. But the lookup table for AES was originally created not by random search, but with the help of several mathematical techniques that allowed us to describe a nonlinear layer with several formulas . This, incidentally, is the uniqueness of AES.

On the contrary, if you are looking for SBox randomly, then there should not be such structures in it and the problem with Grasshopper SBox is that the words of the algorithm designers are very different from the facts. I wrote about the structure of a grasshopper called TKLog in a previous article , this time let's talk about structures in general.

## Kolmogorov complexity

This is the result of research from Leo Perrin 's latest article on the Grasshopper.

The main counterargument to articles about structures in Grasshopper SBox is that “in almost any SBox you can find some structure if you want to.” And "even though the probability of finding the structure that Leo found is negligible, if there were another SBox, then there would be another, also with an insignificant probability."

Let's say this is so. But. As it turned out, it is possible to derive a certain “degree of structuredness” of SBox, which does not depend on the probability of falling into one or another structure.

But it depends on the size of the algorithm that is needed to generate this SBox!

This is called the Kolmogorov complexity.

If you imagine SBox as a string of bytes, then in the case of a random string, there should not be an algorithm that generates this string and at the same time is smaller than this string.

## For a grasshopper

So, the size of SBox is 256 bytes. Here is a readable version of the code of authorship of Leo Perrin, which implements the Grasshopper table. At the input is the source byte, at the output is the corresponding byte from the Grasshopper SBox. The main condition for such an algorithm is a ban on the use of language or platform structures that cheatfully reduce the size of the program. For example, if somewhere inside the standard library there is a constant containing half of SBox, then you cannot use it.

Challenge - write a program that will be smaller than SBox.

``````unsigned char p(unsigned char x){
unsigned char
s[]={1,221,146,79,147,153,11,68,214,215,78,220,152,10,69},
k[]={0,32,50,6,20,4,22,34,48,16,2,54,36,52,38,18,0};
if(x) {
unsigned char l=1, a=2;
while(a!=x) {
a=(a<<1)^(a>>7)*29;
l++;
}
if (l%17) return 252^k[l%17]^s[l/17];
else return 252^k[l/17];
}
else return 252;
}
``````

Our task is to show that Grasshopper SBox has a “strong” structure, such that its size is much smaller than the size of SBox itself. The code above takes 416 characters, which is a bit much for now.

If you package constants in Unicode characters and get rid of some beauty, you get the following code:

``````p(x){
unsigned char
*t="@`rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8",
a=2,l=0,b=17;
while(x && (l++,a^x)) a=2*a^a/128*29;
return l%b ? t[l%b]^t[b+l/b]^b : t[l/b]^188;
}
``````

This source already takes 196 bytes, which is already 23% less than the SBox itself. But we move on. If you remove the extra spaces and line breaks, then the best working version of C code to date looks like this:

``````p(x){char*k="@`rFTDVbpPBvdtfR@\xacp?\xe2>4\xa6\xe9{z\xe3q5\xa7\xe8";int l=256,b=17;while(--l*x^l)x=2*x^x/128*285;return l%b?k[l%b]^k[b+l/b]^b:k[l/b]^188;}
``````

To display SBox, you can use the following simple code:

``````int main() {
for(int i = 0; i < 256; i++){
if (i % 16 == 0){
printf("\n");
}
printf("%d, ", (unsigned char)p(i));
}
}
``````

You can run and make sure that the code really works and corresponds to Grasshopper SBox.
This version of C code takes 153 characters. Since all characters in the code are valid ANSI, each character can be considered equal to 7 bits, not 8. Thus, we have 1071 bits or ~ 134 bytes. And this is almost half the size of the table, although it is still a text source.

As for the compiled code, for the Cortex-M4 architecture Leo was able to compile a code of only 80 bytes (the assembler code is in the article).

This is also not the limit, there are already working implementations of less than 64 bytes in size .

## And what does that mean backdoor?

No, we can’t talk about the presence of a backdoor in Grasshopper until it is completely found.

But the structure, 4 times smaller than the Sbox, cannot get into the SBox by accident, no matter what the authors and advocates of Grasshopper say.

Just think about it. The actual size of the Grasshopper lookup table found by "pseudo-random search" is comparable to the size of the AES table ( 60 characters , GolfScript), whose structure was known from the very beginning.

Whether or not there is a backdoor in Grasshopper - we don’t know. But there was no doubt that the authors lied.

## conclusions

Grasshopper designers have repeatedly declared the absence of hidden structures in the most important element of the algorithm - SBox. But studies have shown that the process of creating a lookup table was far from being randomly searched. Even with the criteria that the authors described in their explanation .

If you believe the authors (and on this side of the border they unconditionally believe), then they accidentally stumbled upon a structure whose size is 4 times smaller than the SBox itself. I repeat, cryptography is not an exact science and the slightest reason for doubt is enough to arouse paranoia in people. In this situation, the size of the motive is 4 times more than sufficient to consider the algorithm developed in the bowels of our illustrious power structures to be very suspicious. No, it’s true, it’s funny when our valiant FSB officers include schoolchildren and say that 11 years ago this algorithm began to be developed almost in their free time, so they eventually lost seeds and most likely the program that generated the lookup table. It just so happened that the side project became the national standard ¯ \ _ (ツ) _ / ¯.

## Afterword

Now in ISO is a six-month period of study Grasshopper for the presence of a backdoor. Based on its results, a decision will be made on the fate of the algorithm as an international standard. Or the process will be extended for another half year.

According to people personally familiar with the Grasshopper designers, the algorithm was developed at a time when no one needed to explain why SBox has exactly the form they specified. Therefore, no one has saved the starting values ​​for the search. The argument, as for me, is rather weak.

At the conference, I spoke with Curve25519 author Daniel J. Bernstein and Tanja Lange, who are members of the ISO committee for standardizing new algorithms. They said that it is not necessary to provide seeds, it is enough to show the program itself that generated this ill-fated SBox. This has not been done so far and the probability of this event is not too great. For a secret.

In general, in the discussion protocol, you can find many interesting details and remarks that allow you to understand the thoughts of the committee members.

As for the fate of the algorithm, the probability of accepting it as an international standard is rather low. This is recognized by both ISO members and employees of Russian companies that I met on EuroCrypt.

Moreover, according to the latest data, there is a non-zero probability of sawing out the Stribog hash already accepted in the standard (with the same SBox), as well as RFC 7801, which describes the Grasshopper.

It would be possible to create a new SBox that is correct according to all canons, even for the sake of interest they did it in one of the Russian companies (by the way, they promised to show the results). But the problem is that in Russia, the Grasshopper is already standardized, implemented in hardware and implemented wherever possible. Replacing a Grasshopper with a Grasshopper V2 will cost a very round sum.

The nineties and zero are long behind. It is time to stop creating encryption algorithms in basement sharak and say that "but apart from the FSB no one can create anything like this in Russia, we have no choice."

Maybe at least try to hold a competition before rushing with such words? And do not forget that AES was actually not invented in the USA.

Maybe then we will no longer need to beat the shovels of gouging, who find a place for children's excuses in algorithms that protect, among other things, the state. a secret.

## Challenge!

We need to write an implementation of Grasshopper's SBox function, which would be as small as possible than that found by Leo and his colleagues. But those that are simply less than 256 bytes will go. Technical details here . There are several examples in different languages. The most successful ones are fame, respect and respect for their contribution to honest cryptography.

So far, the record is 58 bytes in the Stax language. This is less than a quarter of the size of a “randomly found" SBox.

Thanks for attention.