Half the fire that never happened: how we moved to the new data center

    Two moves are equal to one fire.
    (Folk wisdom)


    In place of a foreword


    A well-known heart surgeon brings his car to a repair shop. A mechanic working there seizes the opportunity, calls the doctor over and asks:

    - Doctor! We really do the same thing: I take out the "hearts" of cars, pull out their valves and put in new ones. I can even replace the whole engine. One way or another, after my work the car lives on with a new "heart". But you rake in money by the shovelful, while I get pennies for my work. Why is that?!

    To which the doctor reasonably replied:

    - Well, my friend, try doing a major overhaul on a running engine!

    We are growing fast, and we constantly need new capacity to house our equipment. At the same time, growth in volume must under no circumstances reduce the quality of our services. This is a strategic challenge.

    Summer is holiday season, the quietest period for most webmasters: an extra scheduled server "reboot" is taken more calmly.

    We waited for the summer and moved to our new data center!

    Task


    What is there to talk about, it would seem? At first glance there is nothing tricky about moving: with due care you can transport anything anywhere, especially when the cargo is boxes of hardware, which is what server and network equipment physically is. In fact, moving a hosting service to a new technical site, and in St. Petersburg at that (this is an important point!), has its own piquant features; in particular, it is highly desirable that the hosting keeps working during the move. So the main problem to solve was minimizing downtime in the provision of services. The means were chosen with this goal in mind.

    Relocation planning was carried out on the basis of the following data:

    • Location: St. Petersburg. The old and new data centers are on opposite banks of the Neva River, 12 kilometers apart. For reference: on summer nights the city's bridges are raised.
    • The servers and other equipment had to be moved from the old data center to the new one. This equipment provides virtual hosting (shared, premium) for several tens of thousands of sites, VDS rental and dedicated server services.
    • Shutting down the old technical site and starting work at full capacity in the new location was planned to be completed within three weeks.

    Solutions


    The problem could be solved in various ways, each of which was carefully analyzed. Three main groups of solutions were identified.

    Simple, cheap, clumsy


    The simplest solution would look like this:

    • Dismantle all the hardware.
    • Load it into a truck and bring it to the new technical site.
    • Mount it there, power it on and see what happens.
    • While the servers are in transit, change the announcements of our networks, so that all sites continue to work at the new location under their old IP addresses.

    Pros:

    • It would be very cheap.
    • The solution is relatively easy to implement.
    • The whole relocation would be over very quickly.
    • No changes would be needed in the zones of the domains hosted with us.

    Cons:

    • A huge risk of equipment damage in transit (possibly all of it at once, and beyond recovery).
    • Significant downtime: dismantling and installing all the equipment would take several hours, which is completely unacceptable.

    Although the advantages of this scenario were more numerous, they could not outweigh the seriousness of its drawbacks, and the option was rejected.

    Time-consuming, expensive, elegant


    An elegant solution would be to deploy a completely new technical platform at the new location: new equipment in the same quantity as in the old data center, and new networks of IP addresses. Once the new site was ready, it would be possible to:

    • Manually transfer each site from its old server to the corresponding new one (copy files, databases, mail).
    • Lower the TTL of the domain zones in advance, then change the values of the corresponding records.
    • While DNS updates propagate, set up proxying so that sites also open at the old addresses for visitors whose DNS information is, for one reason or another, still cached.
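
    The timing behind the TTL step can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name, the switch date and both TTL values are hypothetical and not taken from any real zone. The point it shows is that a lowered TTL takes full effect only after the old, longer TTL has expired in resolver caches, so the TTL has to be reduced at least one old TTL before the records are changed.

```python
from datetime import datetime, timedelta


def ttl_change_plan(switch_at, old_ttl, new_ttl):
    """Return key moments for a DNS record change (hypothetical helper).

    switch_at -- when the record values will be changed
    old_ttl   -- current TTL in seconds (assumed value)
    new_ttl   -- lowered TTL in seconds (assumed value)
    """
    if new_ttl >= old_ttl:
        raise ValueError("new_ttl must be lower than old_ttl")
    # Resolvers may keep a record fetched with the old TTL for up to old_ttl
    # seconds, so the TTL has to be lowered at least that long in advance.
    lower_ttl_by = switch_at - timedelta(seconds=old_ttl)
    # After the switch, caches that honor TTL refresh within new_ttl seconds.
    # Caches that ignore TTL are the reason proxying is still needed.
    caches_fresh_by = switch_at + timedelta(seconds=new_ttl)
    return {"lower_ttl_by": lower_ttl_by, "caches_fresh_by": caches_fresh_by}


# Assumed example: switch at 02:00, TTL lowered from one day to five minutes.
plan = ttl_change_plan(datetime(2024, 7, 6, 2, 0), old_ttl=86400, new_ttl=300)
print(plan["lower_ttl_by"])     # 2024-07-05 02:00:00
print(plan["caches_fresh_by"])  # 2024-07-06 02:05:00
```

    Even with such a plan, the second date is only a lower bound: it holds for resolvers that respect TTL, which the hosting provider cannot guarantee.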

    Pros:

    • Low downtime for sites. For many, the move would have gone completely unnoticed.

    Cons:

    • Transferring several tens of thousands of sites requires an unpredictably large amount of time from qualified specialists. Since routine work must go on as well, the move would drag on for months.
    • Not all domains of the sites we host are delegated to our NS servers. For those who maintain their domain zones themselves, this solution would show no elegance at all, quite the opposite.
    • The time it takes for DNS information to update can be neither predicted nor influenced: this process does not depend on the hosting provider (for more details see the article All about domain registration and transfer).
    • We host more than just sites in the usual sense of the word: a number of clients use our computing capacity to run specialized software, and the described approach simply could not be applied to such services.
    • A substantial investment would be required to buy a surplus of new equipment, plus the time to set it up and the cost of maintaining it.

    The cons seriously outweigh the pros, and this solution was also judged unsuitable: nobody can afford to rely on uncontrolled processes as a tool for solving their problems.

    True to life


    Having analyzed the factors that determine how long services are interrupted, we developed a solution that we are sincerely proud of.

    Technical factors


    • Our company has the status of a local Internet registry (LIR) and operates its own address space. In this respect we do not depend on Internet providers, which helped us considerably during the move. To avoid changing the domain zone records of the sites we host, we decided to keep working in the current address space. To make it usable at both technical sites simultaneously, the data centers were connected by a virtual network (VLAN). In practice this let us switch a server off at the old technical site and switch it on at the new one without changing its IP addresses and without changing the routing.
    • Before moving the server that runs the company's own services (billing, the primary NS server, etc.), we additionally checked the operation of the backup NS server, which is physically located outside the main technical site.
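
    Why a server could keep its address after the move can be illustrated with a small Python check. It is a sketch under assumptions: the prefixes come from the documentation ranges of RFC 5737 rather than from any real network, and the helper name is hypothetical. The idea is that once the same subnet is available at both sites over the VLAN, a relocated server's address remains valid; without the link, the new site would offer only a different prefix and the server would have to be renumbered.

```python
import ipaddress

# Assumed example prefix from the documentation range (RFC 5737).
HOSTING_SUBNET = ipaddress.ip_network("203.0.113.0/24")


def needs_renumbering(server_ip, site_subnets):
    """True if the address lies outside every subnet available at the site."""
    addr = ipaddress.ip_address(server_ip)
    return not any(addr in net for net in site_subnets)


# With the VLAN stretched between the data centers, the new site carries
# the same subnet as the old one.
new_site_with_vlan = [HOSTING_SUBNET]
# Without the link, the new site would only have some other prefix (assumed).
new_site_without_vlan = [ipaddress.ip_network("198.51.100.0/24")]

print(needs_renumbering("203.0.113.25", new_site_with_vlan))     # False
print(needs_renumbering("203.0.113.25", new_site_without_vlan))  # True
```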

    Organizational Factors


    • The hard drives were transported separately from the servers themselves; in the new data center they were installed in new servers. This saved time on dismantling and installing the equipment and the structured cabling system (SCS); besides, transporting hard drives at night, unnoticed by an outside observer, is much easier to organize than transporting many servers at once.
    • To minimize the chance of being stopped by the traffic police, the disks were carried in an ordinary passenger car driven without the slightest violation of traffic rules (speed limits, etc.).
    • The shared and premium servers were moved at night on weekends, right after the bridges were lowered. The transportation time for dedicated servers was agreed with customers in advance.

    Once the physical transfer of the equipment to the new data center was complete, all that remained was to shut down the network in the old data center and change the routing of our networks, which was done successfully. The resulting interruption in the sites' operation could easily go unnoticed. For those who, for technical reasons, did notice it, the sites "disappeared" from view for no more than 10 minutes.

    The only drawbacks of the solution we chose and carried out are its considerable labor intensity and some overhead costs (for example, buying "buffer" equipment for the new technical site). These did not affect the quality of the process, so they proved acceptable.

    Organizational conclusions


    Of course, we did not manage to "overhaul a running engine": for objective reasons, equipment cannot be physically relocated without suspending its operation. But we are glad that we managed to prevent "half a fire". To users of shared hosting and to most VDS and dedicated server rental customers, the physical move looked no different from an ordinary scheduled server reboot, performed, for example, to upgrade hardware or system software: instead of the planned two hours of downtime that we had warned customers about in our newsletters, the average site unavailability was 1 hour 20 minutes.
