Inventing Servers - Open Compute Project

    Launched by Facebook in 2011, the Open Compute Project (OCP) creates open standards and hardware architectures for building energy-efficient and economical data centers. OCP began as a hardware project for Facebook's data center in Prineville, Oregon. Facebook then decided to open up the architecture, including server boards, power supplies, server chassis and racks, and released OCP specifications with recommendations for compact, energy-efficient rack-mount server designs and cooling techniques.



    Below, we take a closer look at what these servers are made of, how they work and what this approach delivers.

    The Open Compute Project came about thanks to Facebook's technical director Frank Frankovsky. He launched the initiative that allowed the industry community not only to study the design of Facebook's Oregon data center, but also to take part in the further development of the new architecture. The ultimate goal is to improve data centers and form an ecosystem for building more energy-efficient and cost-effective servers.


    The idea broadly resembles the open-source software community, where developers create and improve products together. The project turned out to be so compelling that it was backed by large companies: OCP now has more than 150 members.



    Server and storage architectures are created in accordance with the OCP Open Rack specifications, which cover hardware such as motherboards and power system components. The project also develops standards, in particular for management. Last year OCP gained new members; IBM, Microsoft, Yandex, Box.net and many other well-known companies now participate in Open Compute.



    Cheaper, or You Lose


    If the OCP architecture becomes the de facto standard for data centers, it can simplify system deployment and management. But the main point is the savings, which make it possible to offer customers cheaper services and thus win in a highly competitive market. OCP aims to increase MTBF, raise server density, simplify maintenance with access from the cold aisle, and improve energy efficiency - which matters especially for companies operating thousands of servers.

    For example, the Facebook data center in Oregon consumes 38% less electricity than the company's other data centers. It uses a freon-free adiabatic cooling system, and its PUE reaches 1.07-1.08 (without resorting to water cooling), while the industry average is about 1.5. At the same time, capital expenditures were cut by a quarter.
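    To put those PUE figures in context, here is a minimal sketch using only the numbers quoted above (PUE is total facility power divided by IT equipment power); the calculation is plain arithmetic, not additional Facebook data.

```python
# PUE = total facility energy / IT equipment energy.
# Compare the overhead (cooling, power conversion, lighting, ...) at the
# quoted values: ~1.07 for the Prineville site vs ~1.5 industry average.

def overhead_per_kw_it(pue: float) -> float:
    """kW of non-IT (overhead) power drawn for every kW of IT load."""
    return pue - 1.0

for label, pue in [("Prineville (OCP)", 1.07), ("Industry average", 1.5)]:
    print(f"{label}: PUE {pue:.2f} -> {overhead_per_kw_it(pue) * 1000:.0f} W of overhead per kW of IT load")

# Prineville (OCP): PUE 1.07 -> 70 W of overhead per kW of IT load
# Industry average: PUE 1.50 -> 500 W of overhead per kW of IT load
```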

    Open Compute servers are lightweight and can operate at elevated temperatures. They are much lighter than a regular server but taller: the chassis height is 1.5U instead of 1U, which leaves room for tall heatsinks and larger, more efficient fans. The modular design of the Open Compute server simplifies access to all of its components - processors, disks, network cards and memory modules - and no tools are required for maintenance.

    By 2014, OCP specifications already covered a whole range of "open" hardware, from servers to data center infrastructure, and the number of companies using OCP had grown. As it turned out, many OCP innovations are suitable not only for large data centers but also for private/public cloud solutions.

    This year, Facebook introduced several more developments under the OCP project. In particular, work is underway with Intel on the Yosemite "server on a chip" platform, and Accton and Broadcom are participating in the Wedge switch development project.

    From Freedom to Leopard


    The introduction of technologies created under the project has enabled Facebook to save more than $2 billion over three years. It should be borne in mind, however, that such significant savings were also achieved through software optimization, and five standard platforms were developed, one for each type of application.

    Today, these are mainly variants of Facebook's Leopard platform on Xeon E5 processors. It is a further development of earlier servers such as the 2012 Windmill system on Intel Sandy Bridge-EP processors and the AMD Opteron 6200/6300-based platforms.

    Generations of OCP servers from Facebook

|  | Freedom (Intel) | Freedom (AMD) | Windmill (Intel) | Watermark (AMD) | Winterfell | Leopard |
|---|---|---|---|---|---|---|
| Platform | Westmere-EP | Interlagos | Sandy Bridge-EP | Interlagos | Sandy Bridge-EP / Ivy Bridge-EP | Haswell-EP |
| Chipset | 5500 | SR5650 / SP5100 | C602 | SR5650 / SR5670 / SR5690 | C602 | C226 |
| CPU models | X5500 / X5600 | Opteron 6200/6300 | E5-2600 | Opteron 6200/6300 | E5-2600 v1/v2 | E5-2600 v3 |
| Sockets | 2 | 2 | 2 | 2 | 2 | 2 |
| TDP (W) | 95 | 85 | 115 | 85 | 115 | 145 |
| RAM per socket | 3x DDR3 | 12x DDR3 | 8x DDR3 | 8x DDR3 | 8x DDR3 | 8x DDR4 / NVDIMM |
| ~Node width (inches) | 21 | 21 | 8 | 21 | 6.5 | 6.5 |
| Form factor (U) | 1.5 | 1.5 | 1.5 | 1.5 | 2 | 2 |
| Fans per node | 4 | 4 | 2 | 4 | 2 | 2 |
| Fan size (mm) | 60 | 60 | 60 | 60 | 80 | 80 |
| 3.5" drive bays | 6 | 6 | 6 | 6 | 1 | 1 |
| Disk interface | SATA II | SATA II | SATA III | SATA III | SATA III / RAID HBA | SATA III / M.2 |
| DIMM slots per socket | 9 | 12 | 9 | 12 | 8 | 8 |
| DDR generation | 3 | 3 | 3 | 3 | 3 | 4 |
| Ethernet | 1 GbE fixed | 2 GbE fixed | 2 GbE fixed + PCIe mezzanine | 2 GbE fixed | 1 GbE fixed + 8x PCIe mezzanine | 8x PCIe mezzanine |
| Where deployed | Oregon | Oregon | Sweden | Sweden | Pennsylvania | ? |
| PSU model | PowerOne SPAFCBK-01G | PowerOne SPAFCBK-01G | PowerOne | PowerOne | - | - |
| Number of PSUs | 1 | 1 | 1 | 1 | - | - |
| PSU power (W) | 450 | 450 | 450 | 450 | - | - |
| Number of nodes | 1 | 1 | 2 | 2 | 3 | 3 |
| BMC | No (Intel RMM) | No | No (Intel RMM) | No | No (Intel RMM) | Yes (Aspeed AST1250 with 1 GB Samsung DDR3 DIMM K4B1G1646G-BCH9) |


    One of the problems with the Freedom servers is the lack of a redundant PSU. Adding a second PSU to each server would increase not only CAPEX but also OPEX, since in active/passive mode the standby PSU still consumes electricity.

    The logical move was to pool the power supplies of several servers in the chassis. This is reflected in the Open Rack v1 architecture, where the power supplies sit on 12.5 V DC "power shelves" feeding power "zones" of 4.2 kW each.


    Each zone has its own power shelf (3 OU high; OpenU, 1 OU = 48 mm). The power supplies are backed up in a 5+1 scheme and occupy a total of 10 OU in the rack. When power consumption is low, some of the PSUs are automatically switched off so that the remaining ones run at their optimal load.
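    The "switch off unneeded PSUs" behavior can be illustrated with a small sketch. The per-module rating below (the 4.2 kW zone treated as six equal modules) and the efficiency sweet spot are illustrative assumptions, not published OCP shelf firmware logic.

```python
import math

ZONE_PSUS = 6                      # 5 + 1 redundant modules per power shelf (from the text above)
PSU_RATING_W = 4200 / ZONE_PSUS    # assumption: 4.2 kW zone spread evenly over six modules
TARGET_LOAD_FRACTION = 0.7         # assumed efficiency "sweet spot" for each PSU

def psus_to_keep_active(zone_load_w: float) -> int:
    """Keep just enough PSUs on so each runs near its efficient load point,
    plus one extra module for N+1 redundancy, capped at the shelf size."""
    needed = math.ceil(zone_load_w / (PSU_RATING_W * TARGET_LOAD_FRACTION))
    return min(ZONE_PSUS, max(1, needed) + 1)

for load in (800, 2000, 3800):
    print(load, "W ->", psus_to_keep_active(load), "PSUs active")
```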

    Improvements also touched the power distribution system. There are no power cables that have to be disconnected during every server maintenance: power is delivered to each zone via vertical bus bars, and when a server slides into the rack, a connector at its rear engages the bus. A separate 2 OU compartment is reserved for switches.

    The Open Rack specifications call for 48U racks, which improves air circulation around the equipment and simplifies access for technical staff. An Open Rack is 24 inches wide, but its equipment bay is 21 inches wide - 2 inches wider than in a regular rack. This makes it possible to install three motherboards or five 3.5-inch drives side by side in a chassis.
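    A quick width check shows why the 21-inch bay fits three server boards or five drives side by side; the 4-inch drive-slot figure is an assumption (a 3.5" drive is 101.6 mm wide, plus carrier clearance), while the node width comes from the table above.

```python
BAY_WIDTH_IN = 21.0    # Open Rack equipment bay width (from the spec description above)
NODE_WIDTH_IN = 6.5    # Winterfell-class node width (see table above)
DRIVE_SLOT_IN = 4.0    # assumption: 3.5" drive plus carrier clearance

print("Nodes side by side:", int(BAY_WIDTH_IN // NODE_WIDTH_IN))        # -> 3
print('3.5" drives side by side:', int(BAY_WIDTH_IN // DRIVE_SLOT_IN))  # -> 5
```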

    OCP Knox and others


    Open Rack v1 required a new server design. Using the Freedom chassis without a PSU left a lot of empty space, and simply filling it with 3.5" HDDs would be wasteful - most Facebook workloads do not need that many drives. A solution similar to the one for power supplies was chosen: the drives were grouped and moved out of the server nodes, and the Knox storage system was born.


    Generally speaking, OCP Knox is a regular JBOD disk shelf built for Open Rack; the HBAs of neighboring Winterfell server nodes connect to it. It differs from standard 19" designs in that it holds 30 3.5" drives and is very easy to maintain: to replace a disk, a "tray" is pulled out, the corresponding bay is opened, the disk is swapped, and everything slides back in.

    Seagate has developed its own Ethernet drive specification, known as Seagate Kinetic. These drives act as object storage connected directly to the data network. A new BigFoot Storage Object Open chassis was also developed around these drives, with 12 10GbE ports in a 2 OU package.

    At Facebook, a similar goal led to the Honey Badger system, a Knox modification for storing images. It is equipped with Panther+ compute nodes based on Intel Avoton SoCs (C2350 and C2750) with four DDR3 SODIMM slots and mSATA / M.2 SATA3 interfaces.

    Such a system can work without a head node, which is usually a Winterfell server (Facebook does not plan to use Leopard servers with Knox). A slightly modified version of Knox is used as archive storage: its disks and fans spin up only when needed.

    Another Facebook archive system built on Open Rack uses 24 magazines of 36 cartridges, each holding 12 Blu-ray discs, for a total capacity of 1.26 PB. Blu-ray discs can be stored for 50 years or more, and the system operates much like a jukebox.
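    The quoted capacity is easy to sanity-check; the per-disc figure below is derived from the article's own numbers rather than taken from a Facebook specification.

```python
magazines = 24
cartridges_per_magazine = 36
discs_per_cartridge = 12
total_capacity_pb = 1.26

discs = magazines * cartridges_per_magazine * discs_per_cartridge
per_disc_gb = total_capacity_pb * 1000**2 / discs   # PB -> GB (decimal units)

print(discs, "discs")                       # 10368 discs
print(round(per_disc_gb, 1), "GB per disc") # ~121.5 GB per disc implied by the figures above
```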

    Winterfell Servers


    A Freedom chassis without a PSU is essentially just a motherboard, a fan and a boot disk, so Facebook engineers created a more compact form factor - Winterfell. It resembles a Supermicro dual-node server, but three such nodes fit on one ORv1 shelf. A 2 OU Winterfell node contains a modified Windmill motherboard, a power bus connector and a backplane that routes power and fan cables to the motherboard. The motherboard accepts a full-size x16 PCIe card and a half-size x8 card, as well as a mezzanine x8 PCIe network interface card. The boot disk is connected via SATA or mSATA.



    Open Rack v2


    During the deployment of ORv1 it became clear that three power zones with three bus bars each are overkill - that much power is simply not needed. A new version appeared, Open Rack v2, with two power zones instead of three and a single bus bar per zone. The switch compartment also grew to 3 OU.



    The changes to the power system made the rack incompatible with Winterfell, so a new design appeared - Project Cubby. In essence, Cubby is a chassis in the spirit of Supermicro's TwinServer, but instead of two PSUs built into each server module it draws power from the bus. This design uses three 3.3 kW power supplies (2+1) per power zone instead of six, and each zone provides 6.3 kW. The bottom of the rack can house three Battery Backup Units (BBUs) for use in the event of a power failure.

    Leopard Servers


    So, Leopard. This is the latest update of Windmill, with the Intel C226 chipset and support for up to two Haswell Xeon E5-2600 v3 processors.



    Enlarged CPU heatsinks and good airflow make it possible to use processors with thermal packages of up to 145 W, that is, the entire Xeon family except the 160-watt E5-2687W v3. Each processor has 8 DIMM slots available, and DDR4 allows the future use of 128 GB memory modules, which would provide up to 2 TB of RAM - more than enough for Facebook. NVDIMM modules (flash memory in the DIMM form factor) can also be used, and Facebook is testing this option.
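    The 2 TB figure follows directly from the slot count; a minimal check, assuming both sockets are fully populated with the anticipated 128 GB modules:

```python
sockets = 2
dimm_slots_per_socket = 8   # per the Leopard column in the table above
dimm_capacity_gb = 128      # anticipated DDR4 module size mentioned above

max_ram_gb = sockets * dimm_slots_per_socket * dimm_capacity_gb
print(max_ram_gb, "GB =", max_ram_gb / 1024, "TiB")   # 2048 GB = 2.0 TiB
```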



    Other changes include the removal of the external PCIe connector, support for a mezzanine card with two QSFP+ ports, an mSATA / M.2 slot for SATA / NVMe drives, and 8 more PCIe lanes for an add-in card - 24 in total. There is no SAS connector: Leopard is not used as a head unit for Knox.



    An important addition is the Baseboard Management Controller (BMC), an Aspeed AST1250 controller with IPMI and Serial-over-LAN access. The BMC allows CPLD, VR, BMC and UEFI firmware to be updated remotely, and power control is provided to manage the PSU load.
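    As an illustration of what IPMI access to such a BMC looks like in practice, the sketch below shells out to the standard ipmitool utility; the host and credentials are placeholders, and this is generic IPMI usage, not Facebook's own management tooling.

```python
import subprocess

BMC_HOST = "10.0.0.42"   # placeholder BMC address
BMC_USER = "admin"       # placeholder credentials
BMC_PASS = "changeme"

def ipmi(*args: str) -> str:
    """Run an ipmitool command against the BMC over IPMI-over-LAN (lanplus)."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", BMC_HOST,
           "-U", BMC_USER, "-P", BMC_PASS, *args]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

print(ipmi("chassis", "power", "status"))  # e.g. "Chassis Power is on"
print(ipmi("sensor", "list"))              # temperatures, fan speeds, voltages
# Serial-over-LAN console (interactive): ipmitool -I lanplus ... sol activate
```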

    Certified Solutions


    OCP equipment is usually made to order, but there are also "retail" versions of Leopard, and manufacturers offer their own modifications. Examples include Quanta QCT cloud servers and Wiwynn systems with an increased number of drives. HP, Microsoft and Dell have not stood aside either.



    As the number of Open Compute equipment suppliers grew, it became necessary to make sure they followed the accepted specifications correctly. Two certifications appeared: OCP Ready and OCP Certified. The first means that the equipment meets the specifications and can operate in an OCP environment. The second is assigned by dedicated testing organizations, of which there are only two - at the University of Texas at San Antonio (UTSA) in the USA and at the Industrial Technology Research Institute (ITRI) in Taiwan. The first vendors to certify their equipment were Wiwynn and Quanta QCT.



    OCP innovations are gradually being adopted in data centers and standardized, becoming available to a wide range of customers.
    Open technologies make it possible to offer customers any combination of compute nodes, storage systems and switches in a rack, built from off-the-shelf or proprietary components. There is also no rigid tie to a particular switching system - switches from any vendor can be used.

    Microsoft Innovations


    Microsoft published detailed specifications for its Open Compute servers and even released the source code of its infrastructure management software, with server diagnostics, cooling and power monitoring features. The company contributed the Open CloudServer architecture to OCP. These servers are optimized for Windows Server and are built to the demanding availability, scalability and efficiency requirements of the Windows Azure cloud platform.

    According to Microsoft, server cost has been reduced by almost 40%, energy efficiency has improved by 15%, and infrastructure can be deployed 50% faster.

    At the Open Compute Project (OCP) Summit in the USA, Microsoft showed another interesting development: a distributed UPS technology called Local Energy Storage (LES). It is a combination of a power module and a battery compatible with the Open CloudServer (OCS) v2 chassis, and the new LES units are interchangeable with the previous PSUs. Depending on the data center topology and power redundancy requirements, you can choose which type of PSU to use. Typically, UPSs are housed in a separate room, and lead-acid batteries are used to back up the IT equipment. For a number of reasons this approach is inefficient: it occupies large areas, and energy is lost in the AC/AC and AC/DC (DC/AC) conversions. Double conversion and battery charging can increase the data center PUE by up to 17%, reliability falls, and operating costs rise.



    Switching to adiabatic cooling can reduce costs and simplify operations.



    But how can the power distribution system and the UPS be improved? Can everything be radically simplified here? Microsoft decided to do away with a separate UPS room and move the power modules closer to the IT load, at the same time integrating the battery system with IT management. This is how LES came about.



    In the LES topology, the PSU design has been changed: batteries, a battery management controller and a low-voltage charger have been added.



    The batteries are lithium-ion, as in electric vehicles. In effect, the LES developers took the standard elements of a PSU plus commodity batteries and combined them in one module. What does this give? According to Microsoft:

    • Up to five times lower cost compared with a traditional UPS: the power supply system in the data center is greatly simplified, and commercially available batteries act as the energy store.
    • Moving the battery into the server eliminates the roughly 9% loss typical of conventional UPSs. Lithium-ion batteries lose only about 2% on charging (versus up to 8% for lead-acid), plus about 1% in the power supply, so PUE goes down (see the sketch after this list).
    • Data center floor space is reduced by 25%, a radical saving in capital costs.
    • Maintenance is greatly simplified: LES modules are easy to replace and contain no acid, and the consequences of a failure are minimized and localized.
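    A minimal sketch of that loss comparison, using only the percentages listed above and treating each figure as a single end-to-end loss factor (a simplification of the real conversion chain):

```python
# Energy delivered to the IT load per 1.00 unit drawn from the grid,
# using the loss figures quoted in the list above.

def delivered(losses_percent):
    energy = 1.0
    for loss in losses_percent:
        energy *= (1.0 - loss / 100.0)
    return energy

traditional_ups = delivered([9])   # ~9% lost in a centralized double-conversion UPS
les = delivered([2, 1])            # ~2% Li-ion charging loss + ~1% in the power supply

print(f"Traditional UPS path: {traditional_ups:.3f}")            # 0.910
print(f"LES path:             {les:.3f}")                        # 0.970
print(f"Relative gain:        {les / traditional_ups - 1:.1%}")  # ~6.6%
```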

    The Open Compute Project's innovations are changing the data center and cloud services market, standardizing server solutions and making them cheaper to develop. This is just one of many examples; in the next installments you will get acquainted with other interesting solutions.

    At Hostka, we also build servers ourselves to lease out as dedicated machines: traditional solutions do not make it technologically possible to offer low-cost machines to customers. After many attempts, a 4U platform codenamed Aero10 went into pilot production; solutions on this platform fully cover the demand for microservers with 2-4 core processors.



    We use conventional mini-ITX motherboards; the rest of the design is entirely our own - from the case to the power distribution electronics and control circuits - and everything was done locally in Moscow. For now we still use Chinese power supplies: Mean Well RSP-1000-12 (12 V) units with load sharing and hot swap.

    The platform hosts dedicated microservers based on Celeron J1800 (2 cores, 2.4 GHz), Celeron J1900 (4 cores, 2.0 GHz) and Core i3-4360 (2 cores, 3.7 GHz) processors, plus the flagship Core i7-4790 (4 cores, 3.6 GHz). In the middle segment we build servers based on the Xeon E3-1230 v3 with a remote management module, on the same platform, using ASUS P9D-I motherboards.

    This solution allows us to cut CAPEX by up to 50%, save 20-30% on electricity and keep developing the platform further. All of this lets us offer customers competitive prices on the Russian market, where the loans, leasing, three-year installment plans and other financing tools available to us in the Netherlands are absent.
    The project already has a design for 19 mini-ITX blades in the same 4U, and we are constantly working on improving the power and cooling system. In the future, operating such servers will not require a traditional data center.
    In the near future we will cover this platform and our other developments in detail - subscribe and stay tuned.
