DIY supercomputer
Today it is possible to build a supercomputer at home, and this article shows how.
This article discusses how to build the hardware for high-performance computing systems. One interesting application is cryptography: thanks to modern hardware, cracking MD5 or WPA is now within anyone's reach. With some effort (such information tends to be removed quickly), you can even find a method online for breaking the A5/2 algorithm used in GSM. Other applications include engineering, financial, and medical calculations, as well as bitcoin mining.
A bit of history
The first written mention of supercomputers can be dated to March 1, 1920, when New York newspapers wrote about machines with the capacity of one hundred mathematicians. These were tabulators, electromechanical calculating machines made by IBM (then called CTR). Later, computers became electronic, and several players emerged in the supercomputer market, such as Cray, HP, IBM, and NEC. Their machines used vector processors, which operate on whole vectors rather than on individual numbers. Computing nodes communicated over the manufacturers' proprietary technologies. One such technology, connecting processors in a three-dimensional torus topology, hides a very simple idea behind the name: each node is connected to six others.
Further development of supercomputers gave rise to massively parallel systems and clusters. Clusters, the quintessence of this direction, use roughly the same communication algorithms between computing nodes as supercomputers do, only over network interfaces. Those interfaces are the weak point of such systems: besides a network topology that is non-standard compared with the classic star (Fat Tree, a multidimensional torus, or Dragonfly), they require special switching equipment.
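The neighbour count in a torus can be checked with a few lines of Python. This is an illustrative sketch, not anything taken from a vendor's interconnect: the function name `torus_neighbors` and the ring size of four nodes are assumptions chosen for the example. Each dimension adds two links per node (one in each direction around the ring), so a torus with d dimensions gives every node 2d neighbours.

```python
def torus_neighbors(node, dims):
    """Return the set of nodes directly linked to `node` in a torus.

    `node` is a tuple of coordinates and `dims` the ring length along
    each axis (both hypothetical inputs for this illustration).
    """
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coords = list(node)
            # Moving past the edge wraps around, which is what makes it a torus.
            coords[axis] = (coords[axis] + step) % size
            neighbors.add(tuple(coords))
    # A ring of length 1 would map the node onto itself; drop that case.
    neighbors.discard(tuple(node))
    return neighbors


print(len(torus_neighbors((0, 0, 0), (4, 4, 4))))        # 6 in three dimensions
print(len(torus_neighbors((0, 0, 0, 0), (4, 4, 4, 4))))  # 8 in four dimensions
```

With rings of at least three nodes, six neighbours therefore correspond to three dimensions, and a four-dimensional torus would give eight.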
On this topic, it is worth mentioning that one of today's promising directions in supercomputer development is adding coprocessors, architecturally similar to video cards, to a standard computer architecture.
CPU choice
Today the main processor manufacturers are Intel and AMD. RISC processors such as the POWER7+, attractive as they are, remain exotic and expensive. For example, a server built on one, and not even the newest model, costs more than a million.
(Incidentally, an inexpensive and efficient cluster can be assembled from Xbox 360 or PS3 consoles: their processors are Power-like, and a million buys more than one console.)
With that in mind, we will look at options for building a high-performance system that are attractive for their price. It should, of course, be multiprocessor. For such tasks Intel offers Xeon processors and AMD offers Opteron.
If you have a lot of money
Separately, we note the extremely expensive but powerful line of Intel Xeon processors for the LGA1567 socket.
The top processor in this series is the E7-8870, with ten cores at 2.4 GHz. It costs $4616. For such CPUs, HP and Supermicro produce eight-processor (!) server chassis. Eight 10-core Xeon E7-8870 2.4 GHz processors with Hyper-Threading support 8 × 10 × 2 = 160 threads, which Windows Task Manager displays as one hundred and sixty CPU load graphs in a 10×16 grid.
To fit eight processors in the case, they are not mounted directly on the motherboard but on separate boards that plug into it. The photo shows four processor boards installed in the motherboard (two processors on each). This is the Supermicro solution. In the HP solution, each processor has its own board. The HP solution costs two to three million, depending on the number of processors, the amount of memory, and other options. The Supermicro chassis costs $10,000, which is more attractive. In addition, Supermicro can take four coprocessor expansion cards in its PCI Express x16 slots (with room left for an InfiniBand adapter to build a cluster of such machines), while HP takes only two. So for building a supercomputer, the eight-processor Supermicro platform is the more attractive one. The next photo, from an exhibition, shows a complete computer with four GPU boards.
However, it is very expensive.
What is cheaper
There is, however, the option of building a supercomputer on the more affordable AMD Opteron G34, Intel Xeon LGA2011, and LGA1366 processors.
To pick a specific model, I compiled a table in which I calculated price / (number of cores × frequency) for each processor. I excluded processors below 2 GHz and, for Intel, those with a bus slower than 6.4 GT/s.
Model | Number of cores | Frequency, GHz | Price, $ | Price / core, $ | Price / core / GHz, $ |
AMD | | | | | |
6386 SE | 16 | 2.8 | 1392 | 87 | 31 |
6380 | 16 | 2.5 | 1088 | 68 | 27 |
6378 | 16 | 2.4 | 867 | 54 | 23 |
6376 | 16 | 2.3 | 703 | 44 | 19 |
6348 | 12 | 2.8 | 575 | 48 | 17 |
6344 | 12 | 2.6 | 415 | 35 | 13 |
6328 | 8 | 3.2 | 575 | 72 | 22 |
6320 | 8 | 2.8 | 293 | 37 | 13 |
INTEL | | | | | |
E5-2690 | 8 | 2.9 | 2057 | 257 | 89 |
E5-2680 | 8 | 2.7 | 1723 | 215 | 80 |
E5-2670 | 8 | 2.6 | 1552 | 194 | 75 |
E5-2665 | 8 | 2.4 | 1440 | 180 | 75 |
E5-2660 | 8 | 2.2 | 1329 | 166 | 76 |
E5-2650 | 8 | 2.0 | 1107 | 138 | 69 |
E5-2687W | 8 | 3.1 | 1885 | 236 | 76 |
E5-4650L | 8 | 2.6 | 3616 | 452 | 174 |
E5-4650 | 8 | 2.7 | 3616 | 452 | 167 |
E5-4640 | 8 | 2.4 | 2725 | 341 | 142 |
E5-4617 | 6 | 2.9 | 1611 | 269 | 93 |
E5-4610 | 6 | 2.4 | 1219 | 203 | 85 |
E5-2640 | 6 | 2.5 | 885 | 148 | 59 |
E5-2630 | 6 | 2.3 | 612 | 102 | 44 |
E5-2667 | 6 | 2.9 | 1552 | 259 | 89 |
X5690 | 6 | 3.46 | 1663 | 277 | 80 |
X5680 | 6 | 3.33 | 1663 | 277 | 83 |
X5675 | 6 | 3.06 | 1440 | 240 | 78 |
X5670 | 6 | 2.93 | 1440 | 240 | 82 |
X5660 | 6 | 2.8 | 1219 | 203 | 73 |
X5650 | 6 | 2.66 | 996 | 166 | 62 |
E5-4607 | 6 | 2.2 | 885 | 148 | 67 |
X5687 | 4 | 3.6 | 1663 | 416 | 115 |
X5677 | 4 | 3.46 | 1663 | 416 | 120 |
X5672 | 4 | 3.2 | 1440 | 360 | 113 |
X5667 | 4 | 3.06 | 1440 | 360 | 118 |
E5-2643 | 4 | 3.3 | 885 | 221 | 67 |
The model with the lowest ratio is shown in bold italics. Underlined are the most powerful AMD model and the Xeon that is, in my opinion, closest to it in performance.
Thus, my choice of processors for the supercomputer is the Opteron 6386 SE, Opteron 6344, Xeon E5-2687W and Xeon E5-2630.
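The price / (cores × frequency) figure from the table is easy to recompute. The sketch below does so for the four chosen processors, using the prices, core counts, and clock speeds listed in the table; the function name `price_per_core_ghz` and the dictionary layout are only illustrative choices.

```python
def price_per_core_ghz(price_usd, cores, ghz):
    """Dollars paid for one core running at one gigahertz."""
    return price_usd / (cores * ghz)


# (price in $, number of cores, frequency in GHz), copied from the table.
cpus = {
    "Opteron 6386 SE": (1392, 16, 2.8),
    "Opteron 6344": (415, 12, 2.6),
    "Xeon E5-2687W": (1885, 8, 3.1),
    "Xeon E5-2630": (612, 6, 2.3),
}

# Cheapest core-GHz first.
ranked = sorted(cpus, key=lambda name: price_per_core_ghz(*cpus[name]))
for name in ranked:
    print(f"{name}: {price_per_core_ghz(*cpus[name]):.0f} $ per core-GHz")
```

The metric ignores differences in per-clock performance between architectures, so it is best read as a rough price filter rather than a benchmark.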
Motherboards
PICMG
Ordinary motherboards cannot take more than four dual-slot expansion cards. There is another architecture: backplanes, such as the BPG8032 PCI Express Backplane.
PCI Express expansion cards and a single processor board are plugged into such a backplane, somewhat like the processor boards in the eight-processor Supermicro servers discussed above, except that these processor boards comply with the PICMG industry standards. The standards evolve slowly, and such boards often do not support the latest processors. The most capable processor board of this kind currently available, the Trenton BXT7059 SBC, carries two Xeon E5-2448L processors.
Such a system will cost at least $5,000 without GPUs.
Ready-made TYAN platforms
For the same money you can buy a ready-made platform for building supercomputers, the TYAN FT72B7015. It takes up to eight GPUs and two LGA1366 Xeons.
Ordinary server motherboards
For LGA2011
Supermicro X9QR7-TF: this motherboard takes four processors and four expansion cards.
Supermicro X9DRG-QF: this board is designed specifically for building high-performance systems.
For Opteron
Supermicro H8QGL-6F: this board takes four processors and three expansion cards.
Reinforcing the platform with expansion cards
This market is almost entirely held by Nvidia, which makes compute cards in addition to gaming video cards. AMD has a smaller share, and Intel entered the market only recently.
Such coprocessors are characterized by a large amount of on-board memory, fast double-precision calculations, and energy efficiency.
Model | FP32, Tflops | FP64, Tflops | Price, thousand $ | Memory, GB |
Nvidia Tesla K20X | 3.95 | 1.31 | 5.5 | 6 |
AMD FirePro S10000 | 5.91 | 1.48 | 3.6 | 6 |
Intel Xeon Phi 5110P | | 1 | 2.7 | 8 |
Nvidia GTX Titan | 4.5 | 1.3 | 1.1 | 6 |
Nvidia GTX 680 | 3 | 0.13 | 0.5 | 2 |
AMD HD 7970 GHz Edition | 4 | 1 | 0.5 | 3 |
AMD HD 7990 Devil 13 | 2 x 3.7 | 2 x 0.92 | 1.6 | 2 x 3 |
Nvidia's top-end solution is the Tesla K20X, based on the Kepler architecture. These are the cards installed in Titan, the world's most powerful supercomputer. However, Nvidia recently released the GeForce Titan video card. Older models had FP64 performance cut down to 1/24 of FP32 (GTX 680), but for the Titan the manufacturer promises fairly high double-precision performance. AMD's solutions are good too, but they are built on a different architecture, which can make it difficult to run computations optimized for CUDA (Nvidia's technology).
Intel's solution, the Xeon Phi 5110P, is interesting in that all of the coprocessor's cores are based on the x86 architecture, so no special code optimization is needed to run calculations on it. But my favorite among the coprocessors is the relatively inexpensive AMD HD 7970 GHz Edition. In theory, this video card delivers the most performance for the money.
Combining machines into a cluster
To improve performance, several computers can be combined into a cluster, which distributes the computing load among its member computers.
Standard gigabit Ethernet is too slow as the network interface for connecting the computers. InfiniBand is most often used for this purpose. Relative to the cost of a server, an InfiniBand host adapter is inexpensive: on eBay such adapters sell for as little as $40. For example, an x4 DDR adapter (20 Gb/s) costs about $100 including delivery to Russia.
Switching equipment for InfiniBand, on the other hand, is quite expensive. And, as mentioned above, the classic star is not the best topology for such a network.
However, InfiniBand hosts can be connected to each other directly, without a switch. That makes one option quite interesting: a cluster of two computers connected via InfiniBand. Such a supercomputer can be assembled at home.
How many video cards do you need
In Cray Titan, today's most powerful supercomputer, the ratio of processors to “video cards” is 1:1: it has 18,688 16-core processors and 18,688 Tesla K20X cards.
In Tianhe-1A, the Chinese supercomputer built on Xeons, the ratio is different: two six-core processors to one Nvidia M2050 card (weaker than the K20X).
We will take these ratios as optimal for our builds (because it is cheaper), that is, 12 to 16 processor cores per GPU. The table below gives, for each GPU count, the number of cores and the number of CPUs of each type needed at 12 and at 16 cores per GPU. The practically feasible options are in bold, and the ones I consider most successful are underlined.
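GPUs | Cores, min | Cores, max | 6-core CPUs, min | 6-core CPUs, max | 8-core CPUs, min | 8-core CPUs, max | 12-core CPUs, min | 12-core CPUs, max | 16-core CPUs, min | 16-core CPUs, max |
2 | 24 | 32 | 4 | 5 | 3 | 4 | 2 | 3 | 2 | 2 |
3 | 36 | 48 | 6 | 8 | 5 | 6 | 3 | 4 | 2 | 3 |
4 | 48 | 64 | 8 | 11 | 6 | 8 | 4 | 5 | 3 | 4 |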
If a system that already has the chosen processor-to-video-card ratio has room for additional compute devices, we will add them to increase the build's performance.
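The CPU counts in the table follow from the rule of 12 to 16 cores per GPU. The sketch below reproduces them under one assumption that is inferred from the table values rather than stated in the text: fractional CPU counts are rounded to the nearest whole processor, with halves rounded up. The function name `cpu_count_range` is hypothetical.

```python
import math


def cpu_count_range(gpus, cores_per_cpu, min_cores_per_gpu=12, max_cores_per_gpu=16):
    """Return (fewest, most) CPUs that give 12 to 16 cores per GPU.

    Rounding to the nearest whole CPU (halves up) is an assumption
    inferred from the table, not a rule stated in the article.
    """
    def nearest(x):
        # Built-in round() rounds halves to even, so use floor(x + 0.5).
        return math.floor(x + 0.5)

    low = nearest(gpus * min_cores_per_gpu / cores_per_cpu)
    high = nearest(gpus * max_cores_per_gpu / cores_per_cpu)
    return low, high


for gpus in (2, 3, 4):
    row = [cpu_count_range(gpus, cores) for cores in (6, 8, 12, 16)]
    print(gpus, row)
```

Because of the rounding, some combinations land slightly outside the 12 to 16 range; for example, two 16-core CPUs for three GPUs give about 10.7 cores per GPU.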
So how much does it cost
The options below are for a bare supercomputer without RAM, hard drives, or software. All of them use the AMD HD 7970 GHz Edition video adapter, which can be swapped for another (for example, a Xeon Phi) if the task requires it. Where the system allows, one of the AMD HD 7970 GHz Edition cards is replaced with the three-slot AMD HD 7990 Devil 13.
Option 1 on the Supermicro H8QGL-6F motherboard
Component | Model | Quantity | Price, $ | Amount, $ |
Motherboard | Supermicro H8QGL-6F | 1 | 1200 | 1200 |
CPU | AMD Opteron 6344 | 4 | 500 | 2000 |
CPU cooler | Thermaltake CLS0017 | 4 | 40 | 160 |
Case 1400W | SC748TQ-R1400B | 1 | 1000 | 1000 |
Graphics Accelerator | AMD HD 7970 GHz Edition | 3 | 500 | 1500 |
Total | | | | 5860 |
Theoretically, performance will be around 12 Tflops.
Option 2 on the TYAN S8232 motherboard, cluster
This board does not support the Opteron 63xx, so a 62xx is used. In this option, two computers are clustered over x4 DDR InfiniBand with two cables. In theory, the link speed is then limited by PCIe x8 bandwidth, i.e. 32 Gb/s. Each computer has two power supplies; instructions for making them work together can be found online.
Component | Model | Quantity | Price, $ | Amount, $ |
Motherboard | TYAN S8232 | 1 | 790 | 790 |
CPU | AMD Opteron 6282SE | 2 | 1000 | 2000 |
CPU cooler | Noctua NH-U12DO A3 | 2 | 60 | 120 |
Housing | Antec Twelve Hundred Black | 1 | 200 | 200 |
Power Supply | FSP AURUM PRO 1200W | 2 | 200 | 400 |
Graphics Accelerator | AMD HD 7970 GHz Edition | 2 | 500 | 1000 |
Graphics Accelerator | AX7990 6GBD5-A2DHJ | 1 | 1000 | 1000 |
Infiniband adapter | X4 DDR Infiniband | 1 | 140 | 140 |
Infiniband cable | X4 DDR Infiniband | 1 | 30 | 30 |
Total (per block) | | | | 5680 |
A cluster needs two such configurations, which will cost $11,360. Power consumption at full load will be about 3000 W. In theory, performance will be up to 31 Tflops.
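The 32 Gb/s link estimate for this option can be reproduced with simple arithmetic. The sketch below assumes the adapter sits in a PCIe 2.0 x8 slot (the text says only "PCIe x8") and counts raw data rate after 8b/10b line coding, ignoring protocol overhead. The function names are hypothetical.

```python
def infiniband_data_rate_gbps(lanes, lane_signal_gbps, links=1):
    """Usable data rate of InfiniBand links with 8b/10b line coding."""
    # 8b/10b coding carries 8 data bits in every 10 transmitted bits.
    return links * lanes * lane_signal_gbps * 8 / 10


def pcie2_data_rate_gbps(lanes):
    """Usable data rate of a PCIe 2.0 slot (assumed generation)."""
    # PCIe 2.0 signals at 5 GT/s per lane and also uses 8b/10b coding.
    return lanes * 5.0 * 8 / 10


# x4 DDR InfiniBand: 4 lanes at 5 Gb/s each (20 Gb/s signalling), two cables.
ib = infiniband_data_rate_gbps(lanes=4, lane_signal_gbps=5.0, links=2)
pcie = pcie2_data_rate_gbps(lanes=8)

print(ib, pcie, min(ib, pcie))  # 32.0 32.0 32.0
```

Under these assumptions the two cables and the x8 slot are evenly matched, so adding a third cable to the same adapter would not raise throughput.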
Option 3 on the Tyan FT72B7015 platform
This option differs in having only two CPUs for eight GPUs. Its performance on real-world tasks will therefore depend on how well the program parallelizes.
Component | Model | Quantity | Price, $ | Amount, $ |
Chassis (3000W) | Tyan FT72B7015 | 1 | 4900 | 4900 |
CPU | Xeon X5680 | 2 | 1300 | 2600 |
CPU cooler | SuperMicro SNK-P0040AP4 | 2 | 40 | 80 |
Graphics Accelerator | AMD HD 7970 GHz Edition | 8 | 500 | 4000 |
Total | | | | 11580 |
Theoretically, performance will be up to 32 Tflops.
Option 4 for LGA2011, clustered
Component | Model | Quantity | Price, $ | Amount, $ |
Motherboard | Supermicro X9DRG-QF | 1 | 600 | 600 |
CPU | Intel Xeon E5-2687W | 2 | 2000 | 4000 |
CPU cooler | Supermicro SNK-P0050AP4 | 2 | 50 | 100 |
Housing | Antec Twelve Hundred Black | 1 | 200 | 200 |
Power Supply | FSP AURUM PRO 1200W | 2 | 200 | 400 |
Graphics Accelerator | AMD HD 7970 GHz Edition | 3 | 500 | 1500 |
Graphics Accelerator | AX7990 6GBD5-A2DHJ | 1 | 1000 | 1000 |
Infiniband adapter | X4 DDR Infiniband | 1 | 140 | 140 |
Infiniband cable | X4 DDR Infiniband | 1 | 30 | 30 |
Total (per block) | | | | 7970 |
A cluster needs two such configurations, which will cost $15,940. Total power consumption at full load will be about 4000 W. In theory, performance will be up to 39 Tflops.
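The totals for any of the options can be recomputed from the parts lists. The sketch below does this for Option 4 under two assumptions: the theoretical performance counts only the peak FP32 throughput of the GPUs (the CPUs are ignored, which matches the figures quoted for each option), and the AX7990 card in the parts list is the HD 7990 Devil 13 from the coprocessor table. The names `block_cost` and `peak_tflops` are hypothetical.

```python
# Peak FP32 throughput per card in Tflops, from the coprocessor table.
FP32_TFLOPS = {
    "HD 7970 GHz Edition": 4.0,
    "HD 7990 Devil 13": 2 * 3.7,  # dual-GPU card
}


def block_cost(parts):
    """Sum quantity * unit price over (quantity, unit_price_usd) pairs."""
    return sum(qty * price for qty, price in parts)


def peak_tflops(gpu_counts, blocks=1):
    """Add up GPU peak FP32 throughput; CPUs are ignored (assumption)."""
    per_block = sum(FP32_TFLOPS[card] * n for card, n in gpu_counts.items())
    return blocks * per_block


# Option 4, one block: (quantity, unit price in $) in parts-list order.
option4_parts = [(1, 600), (2, 2000), (2, 50), (1, 200), (2, 200),
                 (3, 500), (1, 1000), (1, 140), (1, 30)]
option4_gpus = {"HD 7970 GHz Edition": 3, "HD 7990 Devil 13": 1}

print(2 * block_cost(option4_parts))                  # two-block cluster cost, $
print(round(peak_tflops(option4_gpus, blocks=2), 1))  # peak FP32, Tflops
```

These are peak figures for single precision. Sustained performance on real tasks, and double-precision performance in particular, will be considerably lower.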