About the process of creating a server: from idea to details
Hello! My name is Aleksey. I lead hardware development at YADRO, coordinating the work of everyone who is involved in the development process in one way or another.
At the end of the previous article, Maxim maxf75 briefly touched on how the memory connectors are arranged. Today I will describe in general terms how we arrived at the architecture and layout we are working on now.
Rear view of the designed server with the rear grille removed.
The design started from one key requirement: provide the maximum possible amount of memory. How the company began developing on the basis of OpenPOWER, defined the objectives for the server and arrived at this requirement is a topic for a separate article, and we will tell that story between other publications. For now, let's take it as the design starting point: an OpenPOWER-based server with maximum memory capacity.
The solutions currently on the market that really do offer a large amount of memory share one significant drawback: the server itself costs several times more than the memory installed in it. That is why we decided to build a server that breaks with this tradition and provides up to 8 TB of memory in a single machine while keeping the total cost of the solution as low as possible (as far as that is achievable, given the price of 8 TB of DDR4 itself).
Along with maximizing the amount of memory, we also wanted high density: it often turns out to be an important competitive advantage when the server is compared with others. After a couple of weeks of thinking and paperwork, I got the feeling that all of this could fit into a standard 19" 2U chassis.
Memory
Given the target memory size and the total number of modules, memory placement is the determining factor in the server layout. It is clear that placing 128 DIMMs directly on the motherboard is simply impossible, both for banal geometric reasons (the board would be gigantic) and because of signal integrity requirements. Obviously, to fit that much memory we need risers placed vertically in the chassis and connected to the system board. Each riser carries the DIMM connectors and a Centaur memory buffer, which contains a cache and provides the processor with access to memory (one processor supports up to four memory buffers).
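As a quick back-of-the-envelope check of these numbers, here is a minimal sketch in Python. The 8 DIMMs per Centaur buffer is the figure given later in the article; the 64 GB module size is my assumption, implied by spreading the 8 TB target over 128 slots.

    # Memory topology check (figures from the article; 64 GB modules are an assumption).
    sockets = 4                # POWER8 processors in the chassis
    buffers_per_socket = 4     # up to 4 Centaur memory buffers per processor
    dimms_per_buffer = 8       # DIMM slots behind each Centaur (see below)

    total_dimms = sockets * buffers_per_socket * dimms_per_buffer   # 128
    dimm_size_gb = 64                                               # assumed module size
    total_memory_tb = total_dimms * dimm_size_gb / 1024             # 8.0

    print(total_dimms, "DIMMs,", total_memory_tb, "TB")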
The first idea for the riser layout was to place the modules on one side and the memory buffer next to them, as in the picture. But, firstly, we ran into the limit on trace length from the buffer to the DIMMs, and secondly, we realized there would be problems with length-matching those traces.
The initial layout of the memory riser
We had to do it differently and place the memory buffer chip between two groups of DIMMs. At first it was not clear whether such a solution would fit in height, but after carefully calculating the riser height we realized that, with components placed at minimal tolerances, the resulting board just barely fits between the bottom and the cover of a 2U chassis. As a result, the connector to the system board had to go on the side, and the riser turned out like this:
The board turned out to be complex: 18 layers.
Local storage and cooling fans
Then we started building the overall server layout. Traditionally, the front of the chassis houses the drives for local storage. For a 2U chassis the most common options are either 24 × 2.5" or 12 × 3.5". We chose the first: 3.5" drives are of little interest to us in this project, since we are focusing on SSDs.
Fans classically go behind the drives, and there were no particular questions here either: we put in five fans of the common 80 × 38 mm size, essentially the maximum that fits across the width. There was still something to tinker with, though: with five fans there is practically no room left for the connectors, and we need to be able to replace fans on the fly. We got out of it by finding very compact connectors and placing them, in effect, within the volume occupied by the fans themselves.
Fan connections. For clarity, the nearest fan and its guide frame are hidden.
The fans plug into a board lying underneath them that distributes power and the speed-control lines; each fan has its own control channel. The picture shows the power buses leading to this board: they run along the left side of the server (when it is turned with the power supplies facing you). Along the right side runs a ribbon cable that carries the PWM control signals from the system board.
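To illustrate what a dedicated control channel per fan gives you, here is a purely illustrative sketch of a proportional control loop. read_zone_temp_c() and set_fan_pwm() are hypothetical stand-ins for the real sensor and fan-controller interfaces, which this article does not describe.

    FAN_COUNT = 5                 # five 80 x 38 mm fans, as described above
    MIN_PWM, MAX_PWM = 30, 100    # duty-cycle limits in percent (illustrative values)

    def read_zone_temp_c(fan_id: int) -> float:
        """Hypothetical helper: temperature of the zone cooled by one fan.
        Returns a fixed dummy value here so the sketch actually runs."""
        return 42.0

    def set_fan_pwm(fan_id: int, duty_percent: int) -> None:
        """Hypothetical helper: program the PWM duty cycle of one fan channel.
        A real implementation would talk to the fan controller; here we just print."""
        print(f"fan {fan_id}: PWM {duty_percent}%")

    def control_step(target_c: float = 35.0, gain: float = 4.0) -> None:
        # Because every fan has its own channel, each one can be driven independently.
        for fan_id in range(FAN_COUNT):
            error = read_zone_temp_c(fan_id) - target_c
            duty = int(max(MIN_PWM, min(MAX_PWM, MIN_PWM + gain * error)))
            set_fan_pwm(fan_id, duty)

    if __name__ == "__main__":
        control_step()   # real firmware would run this periodically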
Connecting the local drives is also not so simple. We really like the NVMe standard and generally believe it is the future. Whatever new memory types appear in the foreseeable future (for example, 3D XPoint from the Intel and Micron alliance), they will most likely show up as NVMe drives, because PCI Express is the shortest path for connecting anything to the processor (yes, we know about NV-DIMM, but it is a very expensive compromise that also eats up memory slots that are precious to us). On the other hand, we did not want to drop SAS/SATA support completely and irrevocably. These considerations quite logically led us to the decision to place connectors on the system board that let us run the PCI Express bus over a cable to the disk controllers.
As the most suitable connector-and-cable pair we chose Molex's NanoPitch solution (essentially an implementation of OCuLink, the standard actively promoted by PCI-SIG). The cables for internal connections are quite compact, and a single cable can carry up to 8 PCIe Gen3 lanes.
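As a rough sanity check of what one such cable provides, here is a small calculation using the standard PCIe Gen3 figures (8 GT/s per lane, 128b/130b encoding), which are not specific to this design.

    # Usable bandwidth of one cable carrying 8 PCIe Gen3 lanes.
    lanes = 8
    raw_gt_per_s = 8.0               # 8 GT/s per lane (PCIe Gen3)
    encoding_efficiency = 128 / 130  # 128b/130b line encoding

    per_lane_gb_s = raw_gt_per_s * encoding_efficiency / 8   # bits -> bytes, ~0.985 GB/s
    cable_gb_s = per_lane_gb_s * lanes                        # ~7.9 GB/s per direction

    print(f"{cable_gb_s:.2f} GB/s per direction")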
Then the question arose of where to actually place the disk controllers. Putting them on the backplane the drives connect to is simply impossible (both SAS controller chips and PCIe switches are too big for that). After carefully studying the drive dimensions, the maximum allowable chassis height and various disk tray designs, it became clear that a board with the controllers could, in principle, be placed above the drives. This arrangement, firstly, simplifies its connection to the disk backplane (standard card-edge connectors can be used), and secondly, allows the disk trays to be made lower, since the light pipes can be dropped and all the indicators moved onto the controller board.
As a result, we ended up with the following connection scheme:
For a change, some hand-drawn art: the disk connection diagram. Magnetic whiteboard, markers, 2016.
The board with the PCIe switch or SAS controller sits above the drives and is connected to the system board with a cable. The board itself plugs into the disk backplane, into which the drives are inserted.
Power supplies
Power supplies are usually placed in the left or right rear corner of the chassis. Given the design of our PDB (Power Distribution Board), it was more convenient for us to put them on the left (when viewed from the rear). For the power supplies we decided to use the CRPS standard. Its main advantages are high power density (2 kW today, up to 2.4 kW almost tomorrow), high efficiency and, most importantly, the fact that it is not some proprietary standard of a single vendor but a standard originally initiated by Intel and supported by a significant number of companies. In our case the two 2 kW power supplies sit one above the other.
A little more about memory
Since each riser carries one memory buffer and 8 DIMMs (the maximum Centaur supports), we need four risers per processor, that is, 16 in total in the chassis. Given the height of standard DDR4 RDIMMs, no more than 11 such risers fit across the width of the chassis (and even that requires ultra-low seating DIMM sockets and squeezing things down to tenths of a millimeter). The remaining 5 risers therefore had to go elsewhere, at the rear of the system board. This is what ultimately led to one of the processors being rotated by 180 degrees (the last, Cthulhu-like picture in the previous article). And since our memory risers span the full height from the chassis floor to the server cover, the shape of the motherboard gained yet another cutout constraint.
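A quick consistency check of the riser counts above, using only the numbers already mentioned in the article:

    # Riser placement check.
    processors = 4
    risers_per_processor = 4                   # one riser per Centaur buffer
    risers_total = processors * risers_per_processor        # 16

    risers_front = 11                          # maximum that fits across the chassis width
    risers_rear = risers_total - risers_front  # the remaining 5 go in the rear part of the board
    dimm_slots = risers_total * 8              # 128 DIMM slots in the whole server

    print(risers_total, risers_rear, dimm_slots)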
After that, all that remained was to place the connectors for standard PCI Express cards, whose number was determined entirely by the free space. We managed to fit 5 slots, plus a separate connector for the management board (to offload the system board, we decided to move the BMC, USB and Ethernet onto a separate small card, which is installed in that sixth slot).
General scheme
The result is the following component layout (top view; the controller board that sits above the drives is not shown so as not to obscure them):
Legend:
0. The motherboard.
1. IBM POWER8 SCM Turismo processors.
2. Memory risers with DIMMs.
3. 24 × 2.5" drives and the disk backplane.
4. Slots for expansion cards (only HHHL, that is, low-profile cards, are supported).
5. Management Board.
6. Cooling system fans.
7. Power supplies.
These are the considerations that shaped the look of the server. To sum up: a standard 19" 2U chassis, 4 sockets for POWER8 processors, 16 slots for memory risers (8 DIMMs per riser, up to 128 DIMM modules in the whole server), 5 standard PCI Express slots for expansion cards and one management card.