The simplest explanation of how symmetric encryption algorithms work today

Original author: Colm MacCárthaigh
  • Translation

(I found a thread on Twitter with a very cool explanation of symmetric ciphers. It was written by Colm MacCárthaigh, one of the main contributors to Apache. I asked Colm for permission to translate it, and he kindly agreed.)


I will explain in plain language what happens when data is encrypted, hopefully without the mysticism and complexity that cryptographers have piled on top.


So, symmetric encryption is what we use in most cases when we want to encrypt a bunch of data. Your browser sends and receives data using symmetric encryption. If you encrypt files or a disk, symmetric encryption is at work there too. iMessage, Signal, and WhatsApp all use symmetric encryption to keep your correspondence secure.


If you think that encryption scrambles data so that no one can read it without a key, that is pretty much how it actually happens.


Here is a simple example. Suppose I have the string "Ovaltine" and want to encrypt it. I could use rot13, a very simple old-school Caesar cipher that makes a round dance of letters where a and z hold hands, and replaces each letter with the letter 13 positions away in the alphabet. Thus "O" turns into "B" and "v" becomes "i", so "Ovaltine" turns into "Binygvar". Of course, this is not very safe. It is a naive example that is very easy to crack: an attacker can work out which letter occurs most often (in English text it is usually "e") and recover the remaining letters from there.
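rot13 fits in a few lines of Python. This illustrative version handles both letter cases and leaves everything else alone:

```python
def rot13(text: str) -> str:
    """Rotate each letter 13 places; applying it twice returns the original."""
    result = []
    for ch in text:
        if "a" <= ch <= "z":
            result.append(chr((ord(ch) - ord("a") + 13) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            result.append(chr((ord(ch) - ord("A") + 13) % 26 + ord("A")))
        else:
            result.append(ch)  # digits, spaces, punctuation pass through
    return "".join(result)

print(rot13("Ovaltine"))  # Binygvar
print(rot13("Binygvar"))  # Ovaltine
```

Note that the same function both encrypts and decrypts, a symmetry we will meet again below.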


Now you can imagine that there must be trickier ways to "mix" the letters. For example, some complex scheme in which "a" maps to "p" the first time, but to "f" when encrypted again. Maybe the scheme sometimes even encrypts "a" as two letters, say "jd". Such a scheme might encrypt "Ovaltine" into the string "FGyswDmweeRq" (note that it has become longer). Ciphers that worked this way did exist in the past, but that is not at all how modern encryption works.


Instead of "shuffling" letters, modern encryption takes your secret string and artfully combines it with random data. This is similar to rot13 in two ways: encryption and decryption are essentially the same operation, and everything happens "in place". Indeed, have you noticed that rot13 is both an encryption and a decryption algorithm? rot13(Ovaltine) -> Binygvar, rot13(Binygvar) -> Ovaltine. I think this is a very beautiful symmetry in symmetric encryption. But back to our topic. The trick is that we use the bitwise XOR operation. In cryptography, formal logic, and code, XOR is written in different notations, but I will use the one you are most likely familiar with. It looks like this: ^.


XOR stands for "exclusive OR". It is an operator (or function, if you prefer) that takes two arguments and returns a result: A ^ B = C. The operator is called "bitwise" because it is applied to corresponding bits. If A and B are bytes, then A ^ B = C is essentially 8 separate operations happening at once: ^ compares the first bit of A with the first bit of B and puts the result in the first bit of C, then repeats the same thing 7 more times for the remaining bits. The rules are simple: if the bit from A is "1" OR the bit from B is "1", we set the corresponding bit of C to "1", but only if the bits from A and B are not both "1" at the same time. That is the exclusive part. Here is an old-school truth table:


A | B | C
0 | 0 | 0
1 | 0 | 1
0 | 1 | 1
1 | 1 | 0

The coolest thing about XOR is that it behaves like rot13: we can use it for both encryption and decryption. I will show this with a simple example. Imagine we want to encrypt the number "3" and that our encryption key is another number, "7". Then 3 ^ 7 = 4, so the encryption result is "4". Now let's decrypt the number. I just do the same thing again: 4 ^ 7 = 3. Take any number or any data you like, and it always works: applying XOR with the same key a second time always undoes it.
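The arithmetic from the paragraph above, written out so you can see the bits:

```python
key = 7
plaintext = 3
ciphertext = plaintext ^ key     # 011 ^ 111 == 100, so 3 ^ 7 == 4
decrypted = ciphertext ^ key     # 100 ^ 111 == 011, so 4 ^ 7 == 3
print(ciphertext, decrypted)     # 4 3
```

The same line of code encrypts and decrypts; only the direction you read it in changes.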


Bit by bit: this is how we actually encrypt and decrypt data. There is no mixing, only XOR-ing. The hard part is finding data to XOR with. One approach is to keep a large chunk of secret data on hand and use it as the second argument to XOR. In that case, everyone involved in transmitting the encrypted data must use the same set of secret data for encryption and decryption. And it works. True, there are several problems.
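Here is that approach as a sketch: a one-time-pad-style XOR, where the "secret data" is a truly random pad the same length as the message. (This is for illustration; the two problems described next are exactly why it is impractical.)

```python
import secrets

message = b"Ovaltine"
pad = secrets.token_bytes(len(message))  # random secret data, same length, used once

ciphertext = bytes(m ^ p for m, p in zip(message, pad))
recovered = bytes(c ^ p for c, p in zip(ciphertext, pad))

print(recovered)  # b'Ovaltine'
```

Both sides need the same pad, and the pad can never be reused, which is the setup for the problems below.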


The first problem: the secret data must look random. You cannot take text from a book or anything like that; any patterns will show up in the encrypted data. Patterns like these are precisely the kind of weakness that gave the Allied forces their edge in World War II.


The second problem: you cannot reuse the secret data, or patterns will reappear. So you must somehow deliver large chunks of secret data to everyone who needs them, as with a one-time pad. This is too hard.


In modern encryption, we "generate" the secret data we need from small keys, which are much easier to carry and protect. This is what symmetric encryption algorithms really are: schemes for deterministically generating random-looking data from a key. The "deterministic" part is very important: two people with the same key must generate exactly the same data, otherwise they will not be able to understand each other. You have probably heard of such algorithms: AES, 3DES, DES, RC4, ChaCha20. They all do this.


It turns out that the mathematical problem of generating a random-looking data stream (one with no patterns of any predictable form) from a key is very hard. Of the list above, only AES and ChaCha20 are considered safe today. The other algorithms have been broken: people found ways to predict their output. Moreover, AES has a slightly tarnished reputation, because cryptographers say the following:


AES is the most important and most analyzed encryption algorithm. The absolute gold standard! 😎

But at the same time they add:


Software (not hardware) implementations of AES are either insecure or slow, and sometimes both insecure and slow. It was not designed with cache-timing attacks in mind. 🤦

Do not be too scared if this is not clear to you. The main idea is this: AES is gorgeous mathematically, but very tricky to implement safely in software. The good news is that we almost always have AES support at the hardware level (a list of processors with hardware AES support can be found at https://en.wikipedia.org/wiki/AES_instruction_set - translator's note).


Be that as it may, let's continue... How do these algorithms actually work? How can we take a key and safely generate a random data stream? I will simplify things a bit here and start with blocks.


These algorithms take three input parameters and produce ciphertext as output. The inputs are a key, the plaintext, and... surprise: something strange called an initialization vector (IV).


AES(key, IV, plaintext) -> encrypted_data

The key and IV are combined to create a set of "starting conditions" for the algorithm; this is similar to the initial shuffling of tiles in a Scrabble game. The same combination of key and IV will always create the same starting conditions. Why do we even need an IV then, you ask? We need it so that we can encrypt multiple messages with the same key. Without an IV, every generated data stream would be identical, and that is bad: it would violate one of the rules we talked about earlier, that we cannot reuse the same data for encryption. So we need the IV to vary the result. But unlike the key, the IV can be public.


So, when you encrypt a message and send it to someone, you can also add: "Hey, here is the IV I used." It is still critical that we never reuse a key-and-IV combination, because that would produce a repeated stream of random data. There are two ways to achieve this: 1) make the IV a counter that we increment with each new message; 2) generate the IV randomly and make it large enough that we do not need to worry much about collisions. Be that as it may, I mentioned that I would talk about blocks.


The key and IV are "mixed" or combined in such a way as to create a set of starting conditions... and these conditions are actually the initial "block" of random data. That block is 128 bits for AES (the 128 and 256 in AES128 and AES256 refer to the key size) and 512 bits for ChaCha20. And here the real magic and individuality of each encryption algorithm shows itself. Their essence lies in how the sequence of blocks is generated and how each block relates to its neighbors. The relationship between these blocks must remain unpredictable to anyone who does not have the key.
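To make the block structure concrete, here is a deliberately insecure toy (it is NOT AES or ChaCha20, just a sketch of the shape): each "block" of keystream is derived by hashing the key, the IV, and a block counter, then the keystream is XORed with the data. The key and IV values are made up for the example.

```python
import hashlib

def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy keystream: hash key || IV || counter for each 32-byte block.
    Deterministic: the same key and IV always give the same stream. NOT secure."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor_crypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    # XOR is its own inverse, so this one function encrypts AND decrypts.
    return bytes(d ^ k for d, k in zip(data, keystream(key, iv, len(data))))

key, iv = b"my-secret-key", b"unique-iv-1"
ct = xor_crypt(key, iv, b"Ovaltine")
print(xor_crypt(key, iv, ct))  # b'Ovaltine' - same operation decrypts
```

Changing the IV (say, to b"unique-iv-2") gives a completely different keystream under the same key, which is exactly why the IV lets one key encrypt many messages.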


I will not go deep into how these algorithms work, but if you want to know more, I suggest starting with linear congruential generators (LCGs). An LCG is a function that produces "circular" sequences of data in a random-looking, non-repeating way. Then take a look at Feistel networks, the next step up. Then work through S-Boxes, and then look at how the Salsa20 core does its mixing in the ChaCha20 algorithm. All of this is much more accessible than you might think!


So, we now know how a random data stream can be combined with text to encrypt and decrypt it, and we have some idea of how those random streams are created. Isn't that all we need? For disk encryption, that's really almost all. We can encrypt each block or sector of storage with one key and an IV derived from the "position" on the disk. That way we can always decrypt any block anywhere on the disk, as long as we have the key. But there is one problem... someone can tamper with our encrypted data. If an attacker changes the value of any byte, even without having the key, we will end up decrypting the block to garbage, and nothing protects us against this kind of interference. When sending messages and data over a network, this becomes even more critical: we do not want anyone corrupting our data in transit. So we need to add an integrity check! There are several schemes for doing this.
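The tampering problem is easy to demonstrate with the XOR approach: an attacker who flips a bit of the ciphertext flips exactly that bit of the decrypted plaintext, and decryption "succeeds" without any error. A toy sketch (the message and attack position are invented for the example):

```python
import secrets

message = b"PAY $100"
pad = secrets.token_bytes(len(message))
ciphertext = bytes(m ^ p for m, p in zip(message, pad))

# An attacker without the key flips one bit of the ciphertext in transit.
tampered = bytearray(ciphertext)
tampered[5] ^= 0x01  # flips the low bit of the byte that decrypts to '1'

decrypted = bytes(c ^ p for c, p in zip(tampered, pad))
print(decrypted)  # b'PAY $000' - the amount changed, and nothing noticed
```

No key was needed, and no error was raised, which is exactly why an integrity check is not optional.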


HMAC, GCM, and Poly1305 are the most common modern integrity-checking schemes. They basically work like this: they take the data and another key (the so-called integrity key) as input, and after some computation they output a MAC (message authentication code), or tag, which is just another piece of data that acts as a signature.


Thus, for encryption and protection, our scheme may look like this:


AES(key, IV, "Ovaltine") -> encrypted_output
HMAC(integrity_key, encrypted_output) -> MAC

and then by wire we send:


IV | encrypted_output | MAC

For decryption, we verify the MAC by generating it again and comparing the result with the received MAC, and only then decrypt the data. There are internal differences in how HMAC, GCM, and Poly1305 generate these signatures, but you don't have to worry about that. These days this combination of operations is usually wrapped in a single function called "AEAD" (Authenticated Encryption with Additional Data). Under the hood, it does everything I described above:


AEAD(key, IV, plaintext, additional_data) -> IV_encrypted_data_MAC

The "additional_data" piece is just data that you can use to verify that the sending party also has it, even though it is not sent. It's like metadata, for example for setting access rights. Often this field is left empty.
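The encrypt-then-MAC flow can be sketched with Python's standard library. This is not a real AEAD like GCM, just the wire-format idea: MAC over the IV, the ciphertext, and the additional data, verified in constant time before anything is decrypted. The keys, the IV, and the "msg-id:42" additional data are all invented for the example.

```python
import hashlib
import hmac
import secrets

def make_tag(integrity_key: bytes, iv: bytes, ciphertext: bytes,
             aad: bytes = b"") -> bytes:
    """MAC covering everything we send plus the additional data."""
    return hmac.new(integrity_key, iv + ciphertext + aad, hashlib.sha256).digest()

# Sender side: encrypt (toy XOR pad for brevity), then tag.
integrity_key = secrets.token_bytes(32)
iv = secrets.token_bytes(12)
pad = secrets.token_bytes(8)
ciphertext = bytes(m ^ p for m, p in zip(b"Ovaltine", pad))
tag = make_tag(integrity_key, iv, ciphertext, aad=b"msg-id:42")
# We would then send: iv | ciphertext | tag

# Receiver side: recompute the tag and compare in constant time first.
expected = make_tag(integrity_key, iv, ciphertext, aad=b"msg-id:42")
print(hmac.compare_digest(tag, expected))  # True -> safe to decrypt

# Any tampering with the ciphertext changes the tag and is caught.
tampered = bytes([ciphertext[0] ^ 1]) + ciphertext[1:]
print(hmac.compare_digest(tag, make_tag(integrity_key, iv, tampered,
                                        aad=b"msg-id:42")))  # False
```

`hmac.compare_digest` exists precisely so the comparison itself does not leak timing information.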


Nevertheless, AEAD can still get you into trouble if you reuse the same IV. This is bad! There are attempts to improve the situation: my colleague Shay is working on a cool scheme called SIV that adds a layer of protection against this problem. But if you always use a unique IV, modern encryption is very secure. You could publish the ciphertext in the New York Times and no one could crack it. The cipher remains secure even if "some" of the plaintext is known. For example, in Internet protocols a large amount of the text is known: HTTP servers always respond the same way, and the first bytes are always predictable. But this does not matter at all; it will not help an attacker learn a single piece of the remaining data... We have come a long way since the Second World War.


But there are attacks that do work! If you send data over a network and someone records the timing and size of your messages, the encrypted data can be attacked using traffic analysis.




Let's look at length first. Obviously, length is not a hidden characteristic, and that is fine if you are trying to protect a password or credit card number somewhere in the middle of a message. Not a big problem. But it means that anyone can potentially determine the type of content you are sending. A simple example: if you send a GIF through a messenger and the size of that image is unique, an attacker intercepting your data can guess which GIF was just sent. There are trickier versions of this attack against Google Maps, Netflix, Wikipedia, and so on. To protect against it, you can pad outgoing messages with extra bytes so that all of them are the same length no matter what. Encryption used in military networks always pads the traffic with extra data, so to an interceptor it always looks the same! Another length-related problem: if you use compression and give the attacker the ability to change any part of the content the user sees on a page, that lets the attacker learn even the smallest secrets. Look up an attack called CRIME. It is gorgeous and scary.
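Padding to a fixed bucket size can be sketched in a few lines. This uses a simple 0x80-marker-plus-zeros scheme (an assumption for the example; it breaks if the plaintext itself ends in zero bytes, which real protocols handle by tracking lengths explicitly):

```python
def pad_to_bucket(message: bytes, bucket: int = 256) -> bytes:
    """Pad so every message is a multiple of `bucket` bytes long,
    hiding the true length from anyone watching the wire."""
    padded_len = ((len(message) // bucket) + 1) * bucket
    # One 0x80 marker byte, then zeros, so the padding can be stripped.
    return message + b"\x80" + b"\x00" * (padded_len - len(message) - 1)

def unpad(padded: bytes) -> bytes:
    # Assumes the plaintext does not end in 0x00 bytes.
    return padded.rstrip(b"\x00")[:-1]

short = pad_to_bucket(b"hi")
long_ = pad_to_bucket(b"a much longer secret message")
print(len(short), len(long_))  # 256 256 - an observer sees identical sizes
```

To an eavesdropper, a two-byte message and a thirty-byte message now look exactly the same.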


I also said that the other problem is timing. Obviously, the sending time of each message is public information. Could this be a problem? It could! For example, if you send a packet every time you press a key, it is trivial to work out what is being typed from timing analysis alone. Cool! Another example is VoIP. If your calling app sends data only when people speak, but not during silence, that alone is enough to reconstruct about 70% of English speech. Just from the silences. Scary, but cool.


These examples are just the tip of the iceberg. Even when you use encryption algorithms and schemes that have been refined for 80 years, there are still gaps that can be exploited to break security. That is why it is valuable to know about all this!


Be that as it may, this is the level of explanation I want to stop at for now, but we have covered the most important things to know. If you have read this far - thanks! You should now have a better understanding of what happens during encryption and what to watch out for.


Feel free to ask questions.


Translation is published under CC BY-NC-SA 4.0 License

