Modernizing a live data center

Original author: Nick Craver
  • Translation
On our blog we write a lot about building the 1cloud cloud service (for example, about implementing on-the-fly management of server disk space), but there is also a lot to be learned from other companies' experience of running their own infrastructure.

We have already covered the imgix photo service's data center and the story of Algolia's SSD problems, and today we'll talk about upgrading the Stack Exchange data center. / photo by Dennis van Zuijlekom CC




The Stack Exchange Network is a family of Q&A websites covering a wide range of areas of expertise. The first site launched in 2008, and by 2012 the network had grown to 90 sites and 2 million users. Social features include voting on questions and answers, and content can be edited wiki-style. One of the most popular sites is Stack Overflow, which covers software development and programming topics.

The company's core infrastructure is located in New York and New Jersey, and in keeping with its general habit of sharing experience, the company decided to describe how its engineers modernized their data center, which had been relocated from New York.

The decision to upgrade the hardware was made at a company meeting in Denver. There, the optimal hardware life cycle was set at approximately four years, a figure based on experience with the very first hardware kit, which had just passed the four-year mark. The investment the company had raised also made the decision easier, of course.

Planning took three days. The start of the upgrade coincided with a snowstorm warning, but the team decided to go ahead. Before starting the upgrade proper, the engineers dealt with an important task involving the Redis servers that handle Targeted Job ads. These machines had previously run Cassandra and, later, Elasticsearch.

For these systems, Samsung 840 Pro SSDs had been ordered, which, as it turned out, have a serious firmware bug. Because of it, server-to-server copies over the redundant 10-gigabit network crawled along at about 12 MB/s, so it was decided to update the firmware. The update ran in the background, in parallel with other work (rebuilding a RAID 10 array with data on it takes several tens of minutes, even on SSDs). As a result, file copy speeds rose to 100-200 MB/s.
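
To get a feel for why 12 MB/s was a problem, here is a back-of-the-envelope estimate of copy times before and after the firmware update. The 200 GB data set size is an assumption for illustration (the article does not give one); the throughput figures are the ones quoted above.

```powershell
# Rough copy-time estimate for an illustrative 200 GB data set
# (the 200 GB figure is an assumption, not from the article).
$dataGB   = 200
$slowMBps = 12      # throughput with the buggy firmware
$fastMBps = 150     # mid-range of the 100-200 MB/s seen after the update

$slowHours = ($dataGB * 1024) / $slowMBps / 3600
$fastHours = ($dataGB * 1024) / $fastMBps / 3600

"{0:N1} h at {1} MB/s vs {2:N1} h at {3} MB/s" -f $slowHours, $slowMBps, $fastHours, $fastMBps
# => roughly 4.7 h vs 0.4 h
```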

In server rack C, the plan was to replace the 10 Gb SFP+ connectivity (with its 1 Gb external uplinks) with 10GBASE-T interfaces (RJ45 connectors). This was done for several reasons: the cable used for SFP+, called twinaxial, is very hard to route and hard to connect directly to the daughter cards of Dell servers; and an SFP+ FEX does not allow connecting 10GBASE-T devices that may appear later (there are none in this rack yet, but they may show up in others, for example in a load balancer).

The plan was to simplify the network configuration, reduce the number of cables and components overall, and free up 4U of space. The KVMs had to keep working during the job, so they were temporarily relocated. Next, the SFP+ FEX was moved down to make room for the new 10GBASE-T FEX in its place. FEX technology allows from 1 to 8 uplink ports to be allocated to each FEX, so during the upgrade the uplink allocation changed in stages: 8/0 -> 4/4 -> 0/8.
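
A small sketch of what that staged scheme means for capacity, assuming 10 Gb/s per uplink (an assumption consistent with the hardware described, not a figure from the article): at every step at least one FEX keeps a usable set of uplinks, so connectivity is never fully lost.

```powershell
# Uplink capacity per FEX at each stage of the 8/0 -> 4/4 -> 0/8 migration,
# assuming 10 Gb/s per uplink port.
$linkGbps = 10
foreach ($stage in @(@{Old = 8; New = 0}, @{Old = 4; New = 4}, @{Old = 0; New = 8})) {
    "old FEX: {0} Gb/s, new FEX: {1} Gb/s" -f ($stage.Old * $linkGbps), ($stage.New * $linkGbps)
}
```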

It was decided to replace the old virtual machine hosts with two Dell PowerEdge FX2 chassis (each with two FC630 blade servers). Each blade has two 18-core Intel E5-2698v3 processors and 768 GB of RAM. Each chassis has 80 Gb/s of uplink bandwidth through its 10-gigabit I/O modules. These I/O aggregators are, in essence, full switches with four external and eight internal 10 Gb/s ports (two per blade server).

When the new blade servers came online, all the virtual machines were migrated to these first two hosts, which made it possible to dismantle the old servers and free up more space, after which the last two blades were brought up. As a result, the company ended up with significantly more powerful processors, much more memory (the total went from 512 GB to 3072 GB, i.e. four blades at 768 GB each), and an upgraded network layer (the new blade hosts use 20-gigabit trunks to reach all networks).

Next, a new EqualLogic PS6210 storage array replaced the old PS6200, which had twenty-four 900 GB 10k HDDs and SFP+ transceivers. The newer model supports 10GBASE-T and holds twenty-four 1.2 TB 10k HDDs, meaning it is faster, offers more disk space, and is easier to attach to and detach from hosts. Along with the new storage, a new NY-LOGSQL01 server was installed, replacing an obsolete Dell R510. The space freed up by the old virtual machine hosts made it possible to install a new file server and a utility server.
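
For reference, a quick raw-capacity comparison of the two arrays based on the drive counts above (before any RAID overhead, which the article does not specify):

```powershell
# Raw capacity of the two EqualLogic arrays, before RAID overhead.
$oldTB = 24 * 0.9    # 24 x 900 GB drives
$newTB = 24 * 1.2    # 24 x 1.2 TB drives
"old: {0:N1} TB raw, new: {1:N1} TB raw ({2:P0} more)" -f $oldTB, $newTB, (($newTB - $oldTB) / $oldTB)
```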

Because of the worsening weather, public transport stopped running, and the storm even threatened to knock out the power. The team decided to keep working as long as possible in order to get ahead of schedule. The folks from QTS came to the rescue: their manager was tasked with finding the team at least some place to sleep.

By the time all the virtual machines were running, the storage was configured, and a few old cables had been ripped out, the clock read 9:30 AM on Tuesday. The second day was devoted to rebuilding the web servers: upgrading the R620s by replacing their four-port gigabit daughter cards with cards carrying two 10-gigabit and two gigabit ports.

At this point, only the old storage system and the old R510 SQL storage server remained connected, but some difficulties arose with installing a third PCIe card in the backup server. It came down to the tape drive, which needed its own SAS controller that had simply been forgotten.

The new plan was to use the existing 10-gigabit network daughter card (NDC), pull out the PCIe SFP+ adapter, and replace it with the new 12 Gb/s SAS controller. This required fabricating a replacement mounting bracket out of scrap metal (the cards ship without brackets). It was then decided to move the last two SFP+ connections in rack C over to the backup server (including the new MD1400 DAS).

The last element to take its place in rack C was NY-GIT02, the new GitLab and TeamCity server. This was followed by a lengthy round of cable management, which touched almost everything, including previously connected equipment. By the time work on the two web servers was finished, the cables were laid, and the old network gear was removed, the clock read 8:30 in the morning.

On the third day, the team returned to the data center at about 5 PM. What remained was to replace the Redis servers and the "service servers" (the boxes running the tag engine, the Elasticsearch indexing server, and others). One of the boxes destined for the tag engine was the familiar R620 that had received its 10 Gb upgrade a little earlier, so it was left untouched this time.

The process of rebuilding these boxes went as follows: deploy from a Windows 2012 R2 image, install updates and DSC, and then roll out the specialized software via scripts. StackServer is, from a sysadmin's point of view, just a Windows service, and a TeamCity build controls the installation; there is a special flag for this. These boxes also run small IIS instances for internal services, which are a trivial add-on. Their last job is to provide access to DFS storage.
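
To make the "image, updates, DSC, then scripts" step a bit more concrete, here is a minimal PowerShell DSC sketch of what a configuration for such a service box might look like. The resource layout, paths, and the assumption that the StackServer binaries are laid down by a TeamCity-driven script are illustrative guesses, not the team's actual configuration.

```powershell
Configuration ServiceBox {
    Import-DscResource -ModuleName PSDesiredStateConfiguration

    Node 'localhost' {
        # Small IIS instance for internal services
        WindowsFeature IIS {
            Name   = 'Web-Server'
            Ensure = 'Present'
        }

        # StackServer is just a Windows service from the admin's point of view;
        # the binaries themselves are assumed to be installed by a TeamCity-driven script.
        Service StackServer {
            Name        = 'StackServer'
            StartupType = 'Automatic'
            State       = 'Running'
            DependsOn   = '[WindowsFeature]IIS'
        }
    }
}

# Compile the configuration to a MOF and apply it (paths are placeholders)
ServiceBox -OutputPath 'C:\Dsc\ServiceBox'
Start-DscConfiguration -Path 'C:\Dsc\ServiceBox' -Wait -Verbose
```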

The old Dell R610 web tier, with two Intel E5640 processors and 48 GB of RAM (upgraded over the years), was replaced by new machines with two Intel 2687W v3 processors and 64 GB of DDR4 memory.

Day 4 was spent on rack D. It took time to add cable guides (where they were missing) and carefully re-run most of the wires. How do you know a cable is plugged into the right port? How do you know where its other end goes? Labels. Many, many labels. It took a lot of time, but it will save far more in the future.
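
One hedged way to keep that labeling honest is to maintain the patch map as data and generate label text from it. This is only a sketch; the file name and column layout are assumptions, not the team's actual scheme.

```powershell
# Generate label text from a CSV patch map.
# Expected columns (hypothetical): Cable,FromDevice,FromPort,ToDevice,ToPort
Import-Csv 'rack-d-patch-map.csv' |
    ForEach-Object {
        '{0}: {1}/{2} -> {3}/{4}' -f $_.Cable, $_.FromDevice, $_.FromPort, $_.ToDevice, $_.ToPort
    }
```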

Full photo report

Stream #SnowOps on Twitter

Hardware upgrades on this scale never go entirely without problems. Much like in 2010, when the database servers were upgraded and the performance gain fell short of expectations, a bug was found in systems with BIOS versions 1.0.4/1.1.4, where the performance settings are simply ignored.
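
A quick way to spot machines on the affected firmware from the OS side might look like this sketch. The version strings are the ones quoted above; the host names are placeholders, not the real fleet.

```powershell
# Check a list of Windows hosts for the BIOS versions mentioned above
# (host names here are placeholders).
$affected = '1.0.4', '1.1.4'
foreach ($server in 'NY-WEB01', 'NY-WEB02') {
    $bios = Get-CimInstance -ClassName Win32_BIOS -ComputerName $server
    if ($affected -contains $bios.SMBIOSBIOSVersion) {
        "{0}: BIOS {1} needs an update" -f $server, $bios.SMBIOSBIOSVersion
    }
}
```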

Other issues included problems running the DSC configuration scripts, bringing up a bare server with IIS, moving RAID disks from an R610 to an R630 (the machine just falls back to a PXE boot prompt), and so on. In addition, it turned out that the Dell MD1400 DAS arrays (13th generation, 12 Gb/s) do not support attaching to 12th-generation servers such as the R620 backup server, and that Dell's hardware diagnostics do not even check the power supplies. Incidentally, the hypothesis was confirmed that turning a web server upside down is rather problematic when several important parts have been unscrewed from it.

As a result, the team managed to reduce page render times from roughly 30-35 ms to 10-15 ms, but, in their words, that is only the tip of the iceberg.

P.S. On our Habr blog we share not only our own experience of building the 1cloud virtual infrastructure service, but also the experience of Western experts. Don't forget to subscribe for updates, friends!
