We implement an even more secure VPN protocol.
This post continues an earlier one on our blog, "We implement a secure VPN protocol". Here we do not redesign or rewrite the protocol; we only modify it slightly. Everything described below is already implemented in GoVPN 3.1.
The transport protocol is slightly modified to generate noise, and the handshake protocol is changed to support augmentation and password strengthening. The details follow.
Hiding payload size and timing
At the end of the previous article, I noted that we keep the contents of the transmitted data confidential, but hide neither the packet sizes nor the fact that packets are sent. Sometimes timing alone (how often packets appear) strongly suggests that, for example, DHCP is running over the encrypted channel: the traffic is encrypted, yet we still know which processes are running inside. Alternatively, an observer can correlate one client's incoming traffic with outgoing traffic elsewhere and thereby deanonymize that client.
We solve this problem quite simply, though at some cost in resources: we add noise to the traffic.
In the transport protocol, two bytes containing the payload size are added after the nonce (they are encrypted). The size may be zero, which is convenient for heartbeat packets showing that the client or server is still “alive” on the network. As a side effect, the MTU of the virtual TAP interface shrinks by these two bytes.
Before encryption, each packet is padded with zeros up to the maximum packet size GoVPN sends. After encryption it becomes noise in which the payload cannot be told apart from the padding.
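The size field and zero padding can be sketched as follows. This is a minimal illustration and not GoVPN's actual code: the names `pad_plaintext`, `unpad_plaintext` and `MAX_PAYLOAD`, the value 1500 and the big-endian byte order are all assumptions. Encryption is left out; the functions only build and parse the plaintext that would be encrypted.

```python
import struct

MAX_PAYLOAD = 1500  # hypothetical maximum payload size, in bytes
SIZE_FIELD = struct.Struct(">H")  # two-byte length; big-endian is an assumption


def pad_plaintext(payload: bytes, max_payload: int = MAX_PAYLOAD) -> bytes:
    """Prefix the payload with its 2-byte size and zero-pad to a fixed length."""
    if len(payload) > max_payload:
        raise ValueError("payload larger than the maximum packet size")
    padding = bytes(max_payload - len(payload))  # all-zero bytes
    return SIZE_FIELD.pack(len(payload)) + payload + padding


def unpad_plaintext(plaintext: bytes) -> bytes:
    """Recover the payload from a decrypted fixed-size plaintext."""
    (size,) = SIZE_FIELD.unpack_from(plaintext)
    if size > len(plaintext) - SIZE_FIELD.size:
        raise ValueError("size field exceeds packet length")
    return plaintext[SIZE_FIELD.size:SIZE_FIELD.size + size]
```

Every plaintext produced this way has the same length, including the one for an empty (heartbeat) payload, so after encryption the packet size says nothing about the payload size.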
This hides the message size, but not the fact that messages appear on the network. That problem is solved by generating traffic at a constant packet rate. Technically it is simple: a tick generator is started, and on each tick we check whether there is a packet to send. If not, an empty packet is sent. All packets are padded to the maximum size, so every one of them looks like noise.
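The tick loop can be sketched like this. It is an illustration under assumptions rather than GoVPN's implementation: `constant_rate_sender`, its parameters and the `send` callback are hypothetical names, and the loop runs for a fixed number of ticks so that it terminates.

```python
import queue
import time


def constant_rate_sender(outgoing, send, ticks, interval=0.0):
    """Emit exactly one packet per tick: a queued payload if any, else an empty one."""
    for _ in range(ticks):
        try:
            payload = outgoing.get_nowait()
        except queue.Empty:
            payload = b""  # heartbeat: zero-length payload
        send(payload)  # the caller is expected to pad and encrypt before sending
        time.sleep(interval)  # wait for the next tick
```

With two queued payloads and five ticks, `send` is called five times: twice with real data and three times with an empty payload. An observer sees five packets at a steady rate either way.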
The scheme for forming the transport layer packet looks like this:
Strong password authentication protocol
As user cebka correctly pointed out in the comments on the previous post, a 256-bit Curve25519 public key is not a random string of bytes but a point on an elliptic curve. So when an attacker tries a candidate key and decrypts the public key, getting a valid point rather than random data tells them that the shared authentication key has been guessed correctly. In the previous GoVPN implementation, including its examples, the shared authentication key was assumed to come from a PRNG rather than from a password, so in practice brute-forcing the key would not work. With passwords, however, this becomes a problem, since passwords have far less entropy and are susceptible to dictionary attacks.
Why do we want to use passwords? Because the shared authentication key has to be protected somehow anyway. Either it is stored on a disk with full-disk encryption, or it is encrypted with, for example, PGP and its decrypted copy is placed in memory (a RAM disk) while in use. Both the disk and PGP are in turn protected by passphrases. Why not use those passphrases directly in the GoVPN protocol and have fewer software dependencies and attack vectors?
A small digression: what should be used is passphrases, not passwords. To a computer there may be no technical difference, but to a person it is significant: a password is usually a short string of high-entropy (random) characters, while a passphrase is a long string of low-entropy ones. Low entropy makes it easy for a person to remember. Ordinary English text is believed to carry 1-2 bits of entropy per character, so a hundred characters give at least a hundred bits in total and are usually still easy to remember. The only technical “but”: a password could in principle be stored in a database as is (which of course should not be done), whereas a passphrase is inconvenient to store that way, so a hash of it is stored instead.
For an authentication protocol to be called “strong”, it must be safe to use even with weak passwords. In our case the password “foobar” would quickly be found with a dictionary, and successful decryption of the public key during the handshake would confirm the guess. So this is not yet a zero-knowledge protocol.
This can be fixed with Elligator, a special encoding of curve points that makes them indistinguishable from noise. That is enough for the protocol to become zero-knowledge, to tolerate even weak passwords, and thus to qualify as a “strong authentication protocol”. Elligator is applied to the public key on one side before encryption and inverted on the other side after decryption.
Elligator cannot be applied to every Curve25519 key pair: on average, about half of the points cannot be encoded as a random-looking string. When generating a Curve25519 key pair, we try to encode the public key and check whether that succeeds; if not, we repeat the procedure. The unpleasant side effect is that generating Curve25519 keys takes, on average, twice as much entropy and computation on each side.
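The retry procedure is a plain rejection loop. The sketch below shows only its shape: `generate_keypair` and `try_encode` are hypothetical callables standing in for real Curve25519 key generation and a real Elligator encoder, neither of which is implemented here. If each attempt succeeds with probability 1/2, the number of attempts follows a geometric distribution with mean 2, which matches the doubled cost mentioned above.

```python
def generate_encodable_keypair(generate_keypair, try_encode, max_attempts=128):
    """Retry key generation until the public key has an Elligator encoding.

    generate_keypair() -> (private, public) and
    try_encode(public) -> bytes or None are hypothetical callables
    supplied by the caller; they are not part of any real library here.
    """
    for attempt in range(1, max_attempts + 1):
        private, public = generate_keypair()
        encoded = try_encode(public)
        if encoded is not None:
            # Success: return the private key, the encoded public key
            # and the number of attempts it took.
            return private, encoded, attempt
    raise RuntimeError("no encodable key pair found")
```

The `max_attempts` bound is only a safety net: with a success probability of about 1/2, the chance of 128 consecutive failures is negligible.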
Password Strengthening
After applying Elligator, the protocol is zero-knowledge and suitable for authentication with weak passwords. But authentication data is stored on both the server and the client. On the client it need not be on disk at all, since the passphrase is entered by hand, but on the server it is a separate file. If the contents of the server's disk are compromised or the database of authentication keys leaks, an attacker can run an offline dictionary attack against the passwords. This is a very powerful attack, capable of recovering a huge number of real-world passwords and even passphrases.
If the server stores a plain hash of the password (since that is convenient to store), the attacker simply hashes candidate passwords and compares the results with what is on disk. Hashes are fast to compute. Therefore stored passwords or passphrases must always and everywhere be strengthened.
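To show how little work the attacker needs, here is a minimal sketch of a dictionary attack against an unsalted fast hash. The choice of SHA-512 and the function name `dictionary_attack` are assumptions made for the illustration; the point is only that each guess costs a single hash computation.

```python
import hashlib


def dictionary_attack(stolen_hash: bytes, candidates):
    """Return the first candidate whose unsalted SHA-512 matches, else None."""
    for word in candidates:
        if hashlib.sha512(word.encode("utf-8")).digest() == stolen_hash:
            return word
    return None
```

Given the stored hash of “foobar” and any word list that contains it, the function returns the password after at most one hash per candidate.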
Common password strengthening methods are PBKDF2, bcrypt and scrypt. We will not describe these algorithms in detail, since there are plenty of articles on the topic (because people still manage to use none of them, showing no regard for their users' secrets).
Personally, I do not consider bcrypt an option, because its input is normally limited to 72 bytes (a Blowfish property), which is short (all my passphrases are 90-110 characters long). The main argument for bcrypt is that it is slower. True, but what stops you from increasing the PBKDF2 iteration count? Overall the difference between them is rather blurry: the idea is the same, only the tools differ slightly.
Scrypt is interesting, but there are many arguments against it, albeit debatable ones. It would deserve a closer look were it not for the approaching final of the Password Hashing Competition, which aims to produce a high-quality password strengthening function that takes into account both memory cost and timing side-channel attacks. It features genuinely interesting ideas and implementations, and “judges” who know the subject well. But until a winner is selected, GoVPN uses PBKDF2-SHA512.
As a rule, strengthening consists of adding entropy to the password and performing some expensive operation. The extra entropy is needed, at a minimum, so that identical passwords do not strengthen to identical values; this is what the so-called “salt” is for. It also protects against precomputed hash tables. The expensive operation in PBKDF2 is many (thousands of) iterations of the hash function.
GoVPN uses the existing 128-bit client identifier as the salt; it is not secret and need not be hidden.
The server stores the already strengthened version, which is also what the handshake protocol uses. Before connecting, the user enters a passphrase; it is strengthened with the client identifier as the salt, and the result serves as the authentication key in the handshake with the server. Strengthening is expensive, but it is performed only on the client, once, when the daemon starts.
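Strengthening with PBKDF2-SHA512 and the client identifier as salt can be sketched with Python's standard `hashlib.pbkdf2_hmac`. The iteration count, the 32-byte output length and the name `strengthen` are assumptions made for the example; the text above does not state GoVPN's actual parameters.

```python
import hashlib

ITERATIONS = 1 << 16  # hypothetical iteration count, not GoVPN's actual value
KEY_LENGTH = 32  # bytes; assumed output length


def strengthen(passphrase: str, client_id: bytes, iterations: int = ITERATIONS) -> bytes:
    """Derive an authentication key from a passphrase, salted with the client ID."""
    if len(client_id) != 16:
        raise ValueError("client identifier must be 128 bits (16 bytes)")
    return hashlib.pbkdf2_hmac(
        "sha512",
        passphrase.encode("utf-8"),
        client_id,  # public, non-secret salt
        iterations,
        dklen=KEY_LENGTH,
    )
```

The same passphrase with the same identifier always gives the same key, so the client can recompute it at every start. Two clients that happen to choose the same passphrase still get different keys, because their identifiers differ.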
Authentication Augmentation
If the server's database of client authentication keys is compromised, the attacker is unlikely to recover the users' passwords easily. But the attacker does hold the strengthened keys, which are what actually authenticates the parties. With that data the attacker can impersonate a client and connect to the VPN server.
If the server could store something that can only verify authentication data but cannot itself be used as authentication data, the problem would be solved. This technique is commonly called augmentation and is described in the paper on EKE. Instead of passwords, the server stores so-called "verifiers".
There are many ways to do this. We use one based on asymmetric signatures, specifically Ed25519, by the author of the already used Curve25519, Salsa20 and Poly1305. It is an easy-to-implement, fast and reliable (well-analysed) algorithm for creating and verifying signatures. In addition, it needs no extra entropy when creating signatures.
Here augmentation means that the verifier is the Ed25519 public key of a key pair generated from the strengthened password. This verifier, instead of the strengthened password, is used to encrypt the Diffie-Hellman public keys. At the end of the handshake the client additionally signs the shared key K obtained from the Diffie-Hellman exchange and sends the signature to the server. Since the verifier is just a public key, the server can verify the signature with it and be sure that the client really holds the private part of the key, which can only be obtained by knowing the cleartext password. An attacker cannot create the signature and so cannot impersonate a client.
The verifier is created in advance on the client side with a utility included in GoVPN. The user enters their identifier (which can be generated on either side) and a passphrase, from which the strengthened key and the Ed25519 key pair are derived, and then sends the verifier to the server administrator. As side effects, handshake traffic grows by the length of the signature, and CPU time is spent on the client to create the signature and on the server to verify it.
The final handshake protocol now looks like this:
rand(xbit) | read X bits from the PRNG |
CDHPriv | client's private Diffie-Hellman key |
SDHPriv | server's private Diffie-Hellman key |
CDHPub | client's public Diffie-Hellman key |
SDHPub | server's public Diffie-Hellman key |
enc(K, N, D) | Salsa20 encryption of data D with key K and nonce N |
H() | HSalsa20 hash function; the particular choice does not matter, it could be SHA2 |
El() | Elligator encoding of a curve point, and also the inverse of that operation |
DSAPub | client's Ed25519 public key, generated from the password |
DSAPriv | client's Ed25519 private key, generated from the password |
Sign(K, D) | create an Ed25519 signature of data D with private key K |
Verify(K, D) | verify an Ed25519 signature of data D with public key K |
What else is worth doing or fixing?
The dependence on a high-quality PRNG has not gone away, and using GoVPN safely on closed proprietary operating systems is technically impossible. The only fix is to change the OS or platform for good. Fixed in version 3.4: third-party EGD-compatible PRNG sources can be used.
The only thing that may indirectly reveal traffic as GoVPN-specific is that at the start (during the handshake) packets of fixed, well-defined sizes are exchanged, and only then does the “noise” begin. Handshake messages are indistinguishable from noise and do not reveal the client identifier, but their size is not hidden. Fixed in version 4.0: handshake messages can also be padded with noise.
Some statistics, no longer current:
Transport protocol overhead | 26 bytes per Ethernet packet of the TAP interface |
Handshake protocol overhead | 264 bytes; 2 packets from the client, 2 from the server |
IPv4 TCP throughput | 786 Mbps on amd64 FreeBSD 10.1, Intel i5-2450M CPU 2.5 GHz, Go 1.5.1, daemon using a single core |
Code size of the transport protocol encryption and decryption functions | 1 screen, 1 screen |
Code size of the server and client parts of the handshake protocol | 2 screens, 1.5 screens |
Supported platforms | i386 / amd64 GNU/Linux and FreeBSD |
Available as packages in | Arch Linux, FreeBSD |
Sergey Matveev, Python and Go developer at ivi.ru
Our previous publications:
» We implement a secure VPN protocol
» Extra elements or how we balance between servers
» Blowfish on guard ivi
» Non-personalized recommendations: association method
» By city and by weight or how we balance between CDN nodes
» I am Groot. We do our analytics on events
» All for one or how we built CDN