How and why we do TLS at Yandex

    I work on product security at Yandex, and now seems like the right time to describe on Habr, in more detail than we did at YaC, how we are deploying TLS.

    Using HTTPS is an important part of a secure web service, since it is HTTPS that ensures the confidentiality and integrity of data in transit between the client and the service. We are gradually moving all our services to HTTPS only. Many of them already work exclusively over it: Passport, Mail, Direct, Metrica, Taxi, Yandex.Money, as well as all feedback forms that handle users' personal data. For more than a year, Yandex.Mail has also been exchanging data over SSL/TLS with other mail services that support it.



    We all know that HTTPS is HTTP wrapped in TLS. Why TLS and not SSL? Because TLS is essentially a newer SSL, and the name of the new protocol describes its purpose more accurately. And in light of the POODLE vulnerability, you can officially consider SSL no longer usable.

    Besides HTTP, almost any application layer protocol can be wrapped in TLS: SMTP, IMAP, POP3, XMPP and so on. But since deploying HTTPS is the most common task and, because of browser behavior, has a large number of subtleties, that is what I will talk about. With some caveats, much of it also holds for other protocols. I will try to cover the necessary minimum that will be useful to our colleagues.

    I will loosely divide the story into two parts: infrastructure, which covers everything below HTTP, and changes at the application level.

    Termination


    The first thing a team that wants to deploy HTTPS has to deal with is choosing a TLS termination method. TLS termination is the process of encapsulating an application layer protocol in TLS. There are usually three options to choose from:

    1. Use one of the many third-party services: Amazon ELB, Cloudflare, Akamai and others. The main disadvantage of this method is the need to protect the channels between the third-party service and your servers, which will most likely still require deploying TLS in one form or another. Another big drawback is complete dependence on the service provider for the functionality you need and for how quickly vulnerabilities get fixed. Having to hand over your certificates can be a separate problem. Despite this, it is a good solution for startups or companies using PaaS.
    2. For companies that run their own hardware in their own data centers, a hardware load balancer with TLS termination is a possible option. The only advantage here is performance. By choosing such a solution you become completely dependent on your vendor and, since these products often share the same hardware components inside, on the chip manufacturers as well. As a result, new features take far longer to arrive than you would like. Potential customs difficulties with importing such products are outside the scope of this article.
    3. Software solutions are the golden mean. Existing open source solutions (Nginx, Haproxy, Bud and others) give you almost complete control over adding features and optimizations. The downside is performance, which is lower than that of hardware solutions.

    At Yandex, we use software solutions. If you go our way, unifying your components will be an important step in deploying TLS.

    Unification


    Historically, our services used different web server software at different times, so to unify everything we decided to abandon most of those solutions in favor of Nginx and, where that was impossible, to hide them behind Nginx. The exception was Search, which uses its own in-house development called, surprise, Balancer.

    Balancer can do many things that other solutions, even commercial ones, cannot. I think the team will tell more about it one day. With a talented development team, we can afford to maintain a web server of our own in addition to Nginx.

    As for the cryptography itself, we use the OpenSSL library. Today it is the most stable and best-performing TLS implementation with an acceptable license. It is important to use OpenSSL 1.0 or later, since it has optimized memory handling and supports all the necessary modern ciphers and protocols. All of our further recommendations are aimed at users of the Nginx web server.

    Certificates


    To use HTTPS on your service you need a certificate. A certificate is a public key plus a set of data in ASN.1 format, signed by a Certificate Authority (CA). Typically, such certificates are signed by intermediate certification authorities (Intermediate CA) and contain the domain name of your service in the Common Name field or in the Subject Alternative Name extension.

    To validate the certificate, the browser verifies the digital signature on the end-entity certificate, and then on the certificate of each intermediate certification authority. The last intermediate certificate in the chain must be signed by a so-called root certification authority (Root CA).

    Certificates of root certification authorities are stored in the operating system or in the user's browser (for example, in Firefox). When setting up the web server, it is important to send the client not only the server certificate, but also all the intermediate ones. In this case, you do not need to send the root certificate - it is already in the OS.
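
    As a minimal sketch of the point about intermediate certificates: nginx reads the chain from the same file as the server certificate, so the file passed to ssl_certificate should contain the server certificate first, followed by the intermediates. The file names below are hypothetical.

```nginx
# Hypothetical paths. fullchain.pem holds the server certificate first,
# then each intermediate CA certificate; the root certificate is left out.
ssl_certificate     /etc/nginx/ssl/fullchain.pem;
ssl_certificate_key /etc/nginx/ssl/key.pem;
```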



    Large companies can afford to have their own Intermediate CA. For example, until 2012 all Yandex certificates were signed by YandexExternalCA. Running your own Intermediate CA gives additional opportunities for optimization and certificate pinning, but it also imposes additional responsibility: it allows you to issue a certificate for almost any domain name, and a compromise can lead to serious consequences, up to the revocation of the intermediate CA certificate.

    Maintaining your own CA can be too expensive and complicated, which is why some companies use one in Managed PKI (MPKI) mode. For most users, buying certificates from one of the commercial suppliers will suffice.

    Certificates can be classified by the following characteristics:
    1. The digital signature algorithm and hash function used;
    2. The type of certificate.

    Digital signature algorithm. Certificates are signed with public-key cryptographic algorithms, most often RSA, DSA or ECDSA. We will not dwell on the GOST family of algorithms, since they have not yet received broad support in client software.

    RSA certificates are the most widely used today and are supported by all protocol and OS versions.

    The disadvantages of this algorithm are the key size and the cost of generating and verifying digital signatures: certificates with keys shorter than 2048 bits are unsafe, while operations with larger keys consume a large amount of CPU resources.

    DSA-like schemes are faster than RSA at generating signatures (with the same parameter sizes), and ECDSA is much faster than classical DSA, since all operations take place in the group of points of an elliptic curve. According to our tests, one Xeon 5645 server can handle up to 3200 TLS handshakes per second in Nginx with a certificate signed with a 2048-bit RSA key (ECDHE-RSA-AES128-GCM-SHA256). With an ECDSA certificate (ECDHE-ECDSA-AES128-GCM-SHA256), the same server handles 6300 handshakes per second, almost twice as many.

    Unfortunately, Windows XP before SP3 and some browsers, whose share among the clients of large sites is non-zero, do not support ECC certificates.

    The strength of the most common digital signature algorithms depends directly on the strength of the hash function used. The main hash algorithms are:
    1. MD5: considered unsafe today and no longer used;
    2. SHA-1: used to sign most certificates until 2014, now recognized as unsafe;
    3. SHA-256: the algorithm that is already replacing SHA-1;
    4. SHA-512: rarely used today, so we will not dwell on it.

    SHA-1 is already officially considered unsafe and is gradually being phased out. Yandex.Browser and other browsers of the Chromium family will, in the coming months, begin to mark certificates that are signed using SHA-1 and expire after January 1, 2016 as unsafe. All new certificates should be signed using SHA-256. Unfortunately, not all browsers and operating systems (Windows XP before SP3) support this hash function, and for really large resources this can lead to a loss of clients.

    All end-entity server certificates used for TLS can be loosely divided by validation method into Extended Validation (EV) and others (most often Domain Validated).

    Technically, an Extended Validation certificate carries additional fields with the EV marker and often the company address. An EV certificate implies legal verification that the certificate holder exists, while a Domain Validated certificate confirms only that the holder really controls the domain name.

    In addition to the nice green bar, the EV marker also affects how browsers check revocation status. Even browsers of the Chromium family, which use neither OCSP nor CRL and rely only on Google's CRLSets, check the status of EV certificates using OCSP. I will describe these protocols in more detail below.

    Now that we have sorted out certificates, we need to decide which protocol versions to use. As we all remember, SSLv2 and SSLv3 contain fundamental vulnerabilities, so they must be disabled. Almost all clients now support TLS 1.0. We recommend using TLS 1.1 and 1.2.

    If you have a significant number of clients using SSLv3, you can, as a compensating measure, allow it only with the RC4 algorithm; that is what we did for the transition period. However, if you do not have as many users with old browsers as we do, I recommend disabling SSLv3 completely. The correct Nginx configuration in terms of protocols looks like this:
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

    As for the choice of cipher suites, that is, the sets of ciphers and hash functions used for TLS connections, web servers should allow only secure ciphers. It is important to strike a balance between security, speed, performance and compatibility.

    Security vs. Performance


    It is generally believed that HTTPS is very costly, both in server-side performance and in the speed of loading and rendering resources on the client side. This is partly true: incorrectly configured HTTPS can add 2 (or more) round-trip times (RTT) for the handshake.

    The following methods are used to mitigate the delays introduced by HTTPS:

    1. Content Delivery Networks (CDN). Placing the termination point closer to the client reduces the RTT and thus makes the delays introduced by HTTPS imperceptible. Yandex successfully uses this technique and is constantly increasing the number of points of presence.

    2. Optimization of certificate status checks. When a secure connection is established, some browsers check the revocation status of the server certificate. Such checks make it possible to verify that the certificate has not been revoked by its owner. The need to revoke a server certificate may arise, for example, after the private keys have been compromised. For instance, certificates were revoked en masse after the discovery of the Heartbleed vulnerability.

    Today, there are two main protocols used to check certificate statuses:
    • Certificate Revocation Lists (CRL). With this method, the browser downloads over HTTP, from the URL specified in the certificate, a list of serial numbers of revoked certificates. The list is maintained and signed by the CA. Since the file can be large, it is cached for a given period, most often 1 week.
    • Online Certificate Status Protocol (OCSP). With this method, the browser asks the CA's OCSP responder for the status of one specific certificate.


    Since both protocols work on top of HTTP and checking the certificate status is a blocking procedure, the location of the servers that distribute CRLs and of the OCSP responders directly affects the speed of the TLS handshake.

    Different browsers check certificate statuses differently. Firefox uses only OCSP for regular certificates, but also checks CRL for EV. IE and Opera check both CRL and OCSP, while Yandex.Browser and other browsers of the Chromium family do not use the traditional protocols and rely on CRLSets, lists of revoked certificates of popular resources that are delivered with browser updates.

    To optimize these checks, a mechanism called OCSP stapling was also invented. It lets the server send the OCSP responder's response to the client as a TLS extension, together with the certificate. All modern desktop browsers except Safari support OCSP stapling.

    In nginx, OCSP stapling is enabled with the directive ssl_stapling on;. Be sure to also specify a resolver.
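
    A minimal sketch of such a setup, assuming a local caching DNS resolver on 127.0.0.1 (the address and the timeouts are illustrative, not values from our production config):

```nginx
ssl_stapling on;

# nginx needs a resolver to look up the host name of the OCSP responder
# taken from the certificate. The address and timings are assumptions.
resolver 127.0.0.1 valid=300s;
resolver_timeout 5s;
```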

    But if you have a really large, heavily loaded resource, you will most likely want to be sure that the OCSP responses you cache (Nginx caches responses for 1 hour) are correct:
    ssl_stapling_verify on;
    ssl_trusted_certificate /path/to/your_intermediate_CA_and_root_certs;


    With OCSP stapling, high-traffic resources may run into the problem of incorrect time on client systems. According to the standard, a responder's response is valid only within a clearly defined time interval, while the clock on a client machine can be 5, 10 or 20 minutes behind. To solve this problem for users, we had to teach the server to start serving responses about a day after they were generated (we do roughly the same when rolling out new certificates).

    This way we are able to show a warning about the wrong time to people whose system clock is off by up to a day. To control the rotation of OCSP responses ourselves, we use the ssl_stapling_file directive. For clients that do not support OCSP stapling, we cache the responses of OCSP responders in our CDN, thereby reducing the response time.
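
    A sketch of the ssl_stapling_file approach, assuming a pre-fetched response is stored in DER format at a hypothetical path and is replaced by an external job (that job is not shown here and is not part of nginx):

```nginx
ssl_stapling on;

# Staple the response from this file instead of querying the OCSP responder.
# The path is hypothetical; the file must be a DER-encoded OCSP response.
ssl_stapling_file /etc/nginx/ssl/ocsp/yourserver.com.der;
```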

    Another effective way to optimize the checks is to use short-lived certificates, that is, certificates that contain no status-check endpoints. The lifetime of such certificates is very short, though: from one to three months.

    Certification authorities can afford to use such certificates. They are almost always used for OCSP responders, since status-check endpoints in the responder's certificate can cause Internet Explorer to check the status of the OCSP responder's own certificate, which creates additional delays.

    But even with OCSP stapling or short-lived certificates, the standard TLS handshake (4 steps) adds 2 RTT of delay.

    The TLS False Start mechanism allows the client to send application data after 3 steps, without waiting for the server's response, thus saving 1 RTT. TLS False Start is supported by Yandex.Browser and other browsers of the Chromium family, IE, Safari and Firefox.



    Unfortunately, unlike browsers, not all web servers support this mechanism. Therefore, browsers usually treat the following as a signal that TLS False Start can be used:
    • The server advertises NPN / ALPN (not required for Safari and IE);
    • The server uses Perfect Forward Secrecy cipher suites.


    Perfect forward secrecy


    Prior to SSLv3, an attacker who gained access to the server’s private key could passively decrypt all communications that passed through the server. Later, Forward Secrecy (sometimes used with the prefix Perfect) was devised. It uses a key agreement protocol (usually based on the Diffie-Hellman scheme) and ensures that session keys cannot be recovered even if an attacker gains access to the server's private key.

    A typical nginx configuration for a service working with user data looks like this:
    ssl_prefer_server_ciphers on;
    ssl_ciphers kEECDH+AES128:kEECDH:kEDH:-3DES:kRSA+AES128:kEDH+3DES:DES-CBC3-SHA:!RC4:!aNULL:!eNULL:!MD5:!EXPORT:!LOW:!SEED:!CAMELLIA:!IDEA:!PSK:!SRP:!SSLv2;


    In this configuration, we give the highest priority to AES with a 128-bit session key that is generated using the Elliptic Curve Diffie-Hellman (ECDH) algorithm. Next come any other ciphers with ECDH. The second “E” in the abbreviation stands for Ephemeral, that is, a session key that exists only within a single connection.
    Next, we allow the use of regular Diffie-Hellman (EDH). It is important to note that Diffie-Hellman with a 2048-bit key can be quite expensive.

    This part of the config gives us PFS support. If you use processors with AES-NI support, AES will be practically free for you in terms of resources. We then disable 3DES and enable AES128 in non-PFS mode. For compatibility with very old clients we keep 3DES with EDH and 3DES in CBC mode. We disable the unsafe RC4 and the rest. It is important to use a recent version of OpenSSL, so that "AES128" also expands to AEAD ciphers.
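
    For the EDH part, a sketch of supplying your own Diffie-Hellman parameters. The file path is illustrative, and the file is assumed to have been generated beforehand with openssl dhparam 2048:

```nginx
# Hypothetical path to pre-generated 2048-bit DH parameters
# used for the EDH (DHE) key exchange.
ssl_dhparam /etc/nginx/ssl/dhparam.pem;
```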

    PFS has one drawback: a performance penalty. Modern web servers (including Nginx) use an event-driven model. Asymmetric cryptography, however, is most often a blocking operation, so the web server process is blocked and the clients it serves suffer. What can we optimize here?

    1. SPDY.

      If you read about the experience of deploying SPDY in Mail, you noticed that SPDY reduces the number of connections, and hence the number of handshakes. In nginx 1.5+, SPDY is enabled by adding 4 letters to the config (the server must be built with the SPDY module, --with-http_spdy_module).

      listen 443 default spdy ssl;
    2. Use elliptic curve cryptography. Asymmetric algorithms based on elliptic curves are more efficient than their classical counterparts, which is why we raise the priority of ECDH when configuring cipher suites. As I wrote earlier, in addition to ECDH you can use certificates with an elliptic curve digital signature (ECDSA), which will improve performance.

      Unfortunately, Windows XP before SP3 and some other browsers, whose share among the clients of large sites is non-zero, do not support ECC certificates. A solution may be to use different certificates for different clients, which saves resources on the newer clients that make up the majority. OpenSSL 1.0.2 allows you to select the server certificate depending on client parameters. Unfortunately, Nginx does not yet allow multiple certificates for a single server out of the box.
    3. Use session reuse. Reusing sessions not only saves server resources (by avoiding asymmetric cryptography) for PFS / False Start connections, but also reduces the TLS handshake delay to 1 RTT for regular connections.




    There are currently two session reuse mechanisms:
    1. SSL session cache. This mechanism is based on giving the client a unique identifier for each connection and storing the session key on the server under that identifier. The advantage is support by almost all browsers, including old ones. The downside is the need to synchronize caches containing critical data between physical servers and data centers, which can lead to security problems.

      In the case of Nginx, the session cache works only if the client lands on the same real server where the original SSL handshake took place. We still recommend enabling the SSL session cache, as it is useful for configurations with a small number of real servers, where the probability of a user landing on the same one is higher.

      In nginx, the configuration looks something like this, where SOME_UNIQ_CACHE_NAME is the cache name (using different names for different certificates is recommended, but not necessary in nginx 1.7.5+ and 1.6.2+), 128 MB is the cache size, and 28 hours is the session lifetime:
      ssl_session_cache shared:SOME_UNIQ_CACHE_NAME:128m;
      ssl_session_timeout 28h;

      When you increase the session lifetime, be prepared for entries of the following form to appear in the error log:
      2014/03/18 13:36:08 [crit] 18730#0: ngx_slab_alloc() failed: no memory in SSL session shared cache "SSL"

      This is due to the way nginx evicts data from the session cache: it tries to allocate memory for the session and, if the limit is reached, drops one of the oldest sessions and repeats the operation. That is, the session is successfully added to the cache, but an error is written to the log on the first call to the allocator function. Such errors can be ignored, as they do not affect functionality (fixed in Nginx 1.4.7).

    2. TLS session tickets. The mechanism is supported only by browsers of the Chromium family, including Yandex.Browser, and by Firefox. Here the client is sent the session state, encrypted with a key known to the server, along with the key identifier. In this case, only the keys need to be shared between servers.

      Nginx added support for static session ticket keys in versions 1.5.8+. With multiple servers, TLS session tickets are configured as follows:
      ssl_session_ticket_key current.key;
      ssl_session_ticket_key prev.key;
      ssl_session_ticket_key prevprev.key;


      Here current.key is the key currently in use, prev.key is the key that was used for N hours before current.key, and prevprev.key is the key that was used for N hours before prev.key. The value of N must equal the one specified in ssl_session_timeout. We recommend starting with 28 hours.

      An important point is the key rotation method, since an attacker who steals a ticket encryption key can decrypt all sessions (including PFS ones) made during the key's lifetime.


    Yandex has special mechanisms for generating and safely delivering keys to end servers.

    Applications


    CSP


    Once the infrastructure problems are solved, let's get back to the applications. The first thing you need to do is get rid of so-called mixed content. How hard that is depends on the scale of the project and the quantity and quality of the code. In some places you can get by with sed or with nginx, but in others you have to hunt for hardcoded http schemes in the DOM tree. The Content Security Policy mechanism came to our aid; our colleagues from Mail wrote about its deployment earlier.

    By adding the following header on a test bench, you will receive reports about any content loaded over protocols other than data: and https:
    Content-Security-Policy-Report-Only: default-src https:; script-src https: 'unsafe-eval' 'unsafe-inline'; style-src https: 'unsafe-inline'; img-src https: data:; font-src https: data:; report-uri /csp-report

    Secure cookies


    After you get rid of mixed content, it is important to make sure the Secure attribute is set on cookies. It tells the browser that these cookies must not be sent over an insecure connection. At Yandex there are currently two cookies, sessionid2 and Session_id, one of which is sent only over a secure connection while the other remains “insecure” for backward compatibility. Without the “secure” cookie you cannot get into Mail, Disk and other important services.

    Set-Cookie: session=1234567890abcdef; HttpOnly; Secure;
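
    Ideally the application sets these attributes itself. Where that is not yet possible, one option is to add them at the proxy. The sketch below assumes nginx 1.19.3 or later, where the proxy_cookie_flags directive is available, and a hypothetical backend address. It marks every cookie as Secure, so it does not fit a setup that deliberately keeps one cookie without the attribute.

```nginx
location / {
    proxy_pass http://127.0.0.1:8080;   # hypothetical backend

    # Add Secure and HttpOnly to every cookie set by the backend
    # ("~" is an empty regular expression that matches any cookie name).
    proxy_cookie_flags ~ secure httponly;
}
```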

    HSTS


    And finally, once you have verified that your service works correctly over HTTPS and have set up a redirect from the HTTP version to HTTPS, it is important to tell the browser that this resource must no longer be accessed over unprotected HTTP.

    The HTTP Strict Transport Security header was created for this purpose.
    Strict-Transport-Security: max-age=31536000; includeSubDomains;

    The max-age parameter sets the period (1 year) during which the secure protocol must be used. The optional includeSubDomains flag indicates that all subdomains of the domain may also be accessed only over encrypted connections.
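
    A minimal sketch of the redirect plus the header in nginx, assuming the service answers on yourserver.com and reusing illustrative certificate paths. The HSTS header is sent only from the HTTPS server block:

```nginx
# Plain HTTP: redirect everything to the HTTPS version.
server {
    listen      80;
    server_name yourserver.com;
    return      301 https://$host$request_uri;
}

# HTTPS: tell the browser to use only secure connections from now on.
server {
    listen              443 ssl;
    server_name         yourserver.com;
    ssl_certificate     /etc/nginx/ssl/cert.pem;   # illustrative path
    ssl_certificate_key /etc/nginx/ssl/key.pem;    # illustrative path
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains";
}
```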

    To make sure that users of Firefox and of browsers of the Chromium family always use secure connections, even on the first visit, you can add your resource to the browser's HSTS preload list. Besides improving security, this also saves one redirect on first use.

    To do this, add the “preload” flag to the header and submit the domain here: hstspreload.appspot.com.
    Strict-Transport-Security: max-age=31536000; includeSubDomains; preload

    For example, Yandex.Passport is added to the preload list of browsers.

    Conclusion


    The whole configuration of a single nginx server will look something like this:

    http {
        [...]
        ssl_stapling on;
        resolver 77.88.8.1; # or 127.0.0.1 if a local resolver is used
        keepalive_timeout     120 120;
        server {
            listen              443 ssl spdy;
            server_name         yourserver.com;
            ssl_certificate     /etc/nginx/ssl/cert.pem; # server certificate
            ssl_certificate_key /etc/nginx/ssl/key.pem; # server key
            ssl_dhparam         /etc/nginx/ssl/dhparam.pem; # generated with: openssl dhparam 2048
            ssl_prefer_server_ciphers on;
            ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
            ssl_ciphers kEECDH+AES128:kEECDH:kEDH:-3DES:kRSA+AES128:kEDH+3DES:DES-CBC3-SHA:!RC4:!aNULL:!eNULL:!MD5:!EXPORT:!LOW:!SEED:!CAMELLIA:!IDEA:!PSK:!SRP:!SSLv2;
            ssl_session_cache    shared:SSL:64m;
            ssl_session_timeout  28h;
            add_header Strict-Transport-Security "max-age=31536000; includeSubDomains;";
            add_header Content-Security-Policy-Report-Only "default-src https:; script-src https: 'unsafe-eval' 'unsafe-inline'; style-src https: 'unsafe-inline'; img-src https: data:; font-src https: data:; report-uri /csp-report";
            location / {
                ...
            }
        }
    }


    In conclusion, I would like to add that HTTPS is gradually becoming the de facto standard for the web, and it is used not only by browsers: most mobile application APIs are starting to work over HTTPS. Some specifics of implementing HTTPS safely on mobile can be found in the talk by Yuri “tracer0tong” Leonyshev at a community work day in Nizhny Novgorod.
