R01 registrar DNS failure and a near-fatal chain of coincidences

    Today R01, one of the oldest registrars, announced a failure in its DNS.
    On that note, I want to tell a short, instructive story about how this failure almost killed our company.

    We are a SaaS web analytics service. Our main tool is a JavaScript file that collects statistics. The file is embedded on many of our users' sites, so it has to be impeccably stable: an outage on our side must never affect our customers' sites. We put a lot of effort into this: we hosted the script on an excellent, powerful CDN and put our own domain in front of that CDN (so that we could switch CDNs at any time if it failed or became too expensive). But we overlooked one small detail: the domain's DNS was hosted at the registrar.

    During the R01 failure, all domains resolved to one specific IP address, which answered HTTP requests with an ordinary domain parking page with ads. Or it answered nothing at all, because that IP was hit by a spontaneous DoS from everyone trying to reach their usual sites whose DNS was hosted at R01. Our case was different, though. That server responds to requests for /somescript.js with a script. And not just any script, but a dynamically generated one, like this:

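    var redir_url = 'http://<domain that resolved to the fateful IP>/';
    if (window != top) {
        // Loaded inside a frame: navigate the top-level page away.
        top.location.href = redir_url;
    } else {
        // Loaded in the top-level page: navigate it directly.
        window.location = redir_url;
    }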


    The server stripped the request parameters and generated a script that redirected to the base domain. Every site that had our analytics installed received, instead of our script, a foreign one that redirected its visitors to an unrelated page.

    What partly saved us was that the server behind that IP was under DoS almost the whole time because of the number of requests hitting it, so for the vast majority of users the request for our script simply timed out (when the script is unavailable, we of course do not affect other people's sites in any way). But the fraction of a percent who were "lucky" enough to get a response from the server got the redirect.
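
    To illustrate why a timeout is harmless, here is a minimal sketch of a non-blocking loader. It is not our actual embed code: the function name loadAnalytics, the doc parameter and the example.com URL are assumptions made for this example.

```javascript
// Hypothetical loader sketch. `doc` is passed in so the function can be
// exercised outside a browser; in a page you would pass `document`.
function loadAnalytics(doc, src) {
  var s = doc.createElement("script");
  s.async = true; // do not block HTML parsing while the file downloads
  s.src = src;
  s.onerror = function () {
    // Network failure or timeout: the host page simply runs without analytics.
  };
  // Insert next to the first existing <script> element on the page.
  var first = doc.getElementsByTagName("script")[0];
  first.parentNode.insertBefore(s, first);
  return s;
}

// Usage in a page (placeholder URL):
// loadAnalytics(document, "https://stats.example.com/somescript.js");
```

    A loader like this protects the host page only when the file cannot be fetched. If a server does answer with valid JavaScript, the browser runs whatever it receives, which is exactly what happened to us.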

    This fatal combination of circumstances, as often happens, almost cost us two years of work on the project. Any one of the following would have avoided the catastrophic effect on our users' sites: DNS that was simply unavailable; available DNS that routed requests to an unavailable server; available DNS and an available server that did not answer script requests with a script; or all of the above, but with a script that did not redirect. I don't know whether we will recover from such a blow to our service; many customers, of course, received complaints from their users about strange redirects. We did not hide anything: we immediately sent out an e-mail asking customers to disable our service until DNS caches around the world had refreshed.
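
    The chain of conditions above can be written down as a small decision function. This is only an illustrative model of the reasoning; the function name, the field names and the returned labels are made up for this sketch.

```javascript
// Hypothetical model of the failure chain. Every condition must hold
// for visitors of a customer's site to be redirected.
function outcomeForCustomerSite(state) {
  if (!state.dnsResolves) return "no-load";            // DNS unavailable: script never loads
  if (!state.serverReachable) return "timeout";        // DNS points to a dead server
  if (!state.respondsWithScript) return "not-a-script"; // server answers, but not with a script
  if (!state.scriptRedirects) return "no-redirect";    // foreign script runs, but does not redirect
  return "redirect-hijack";                            // what actually happened
}

// Only the last outcome is the catastrophic one.
function isCatastrophic(state) {
  return outcomeForCustomerSite(state) === "redirect-hijack";
}
```

    Breaking any single link in the chain yields one of the first four outcomes, which is why this incident needed so many things to go wrong at once.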

    The moral of the story is simple: it is impossible to foresee everything, and even such reliable and familiar things as DNS servers cannot be trusted.

    It was extremely difficult to foresee this situation: too many things lined up against us. Of course, we will change our DNS provider, and perhaps even run our own DNS servers if that turns out to be more reliable. But DNS, a CDN, anything at all can break. The only thing you can do to minimize the chance of getting hit as hard as we were is to make sure that every point in your system uses the best, most reliable solution available. Even when it comes to choosing a DNS server or a registrar.
