DIY supercomputer

Today it is possible to build a supercomputer at home, and this article explains how.

This article discusses how to build the hardware for high-performance computing systems. One interesting application is cryptography. For example, thanks to modern technology, cracking MD5 or WPA has become accessible to anyone. With some effort you can even find on the Internet a way to break the A5/2 algorithm used in GSM (such information gets taken down quickly). Other applications include engineering, financial and medical calculations, and bitcoin mining.

A bit of history


1920 newspaper article on a supercomputer

The first written mention of supercomputers can be dated to March 1, 1920, when New York newspapers wrote about machines with the power of one hundred mathematicians. These were tabulators: electromechanical computers manufactured by IBM (then called CTR). Later, computers became electronic, and several players emerged in the supercomputer market, such as Cray, HP, IBM and NEC. Their computers had vector processors, which operate on vectors rather than on individual numbers. Compute nodes communicated over the manufacturers' proprietary technologies. One of them, connecting processors in a three-dimensional torus topology, hides a very simple idea behind the name: each node is connected to six others.

Further development of supercomputers gave rise to massively parallel systems and clusters. Clusters, the quintessence of this direction, use roughly the same communication algorithms between compute nodes as supercomputers, only over network interfaces. These interfaces are the weak point of such systems: in addition to a non-standard network topology (compared with the classic star), such as Fat Tree, a multidimensional torus or Dragonfly, they require special switching equipment.
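The torus remark above can be checked with a short sketch. It assumes a regular torus in which every node has one link in each direction along every axis, so a node in an n-dimensional torus has 2n neighbours and six links correspond to three dimensions. The function name and the side length of 4 are illustrative choices, not taken from any vendor's documentation.

```python
def torus_neighbors(node, dims):
    """Return the distinct neighbours of `node` in a torus with side lengths `dims`."""
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coords = list(node)
            coords[axis] = (coords[axis] + step) % size  # wrap around the ring
            neighbors.add(tuple(coords))
    neighbors.discard(tuple(node))  # only matters for rings of length 1
    return neighbors


if __name__ == "__main__":
    print(len(torus_neighbors((0, 0, 0), (4, 4, 4))))        # 3-D torus
    print(len(torus_neighbors((0, 0, 0, 0), (4, 4, 4, 4))))  # 4-D torus
```

The wrap-around (`% size`) is what distinguishes a torus from a plain grid: nodes on the edge have as many links as nodes in the middle.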

On this topic, it must be mentioned that one of today's promising directions in supercomputer development is adding coprocessors, architecturally similar to video cards, to a standard computer architecture.

CPU choice


Today, the main processor manufacturers are Intel and AMD. RISC processors such as the POWER7+, despite their appeal, are rather exotic and expensive. For example, a server with a model that is not even the newest costs more than a million.

(By the way, it is possible to assemble an inexpensive and efficient cluster from Xbox 360 or PS3 consoles: their processors are also Power-based, and a million buys more than one console.)

With that in mind, let us look at options for building a high-performance system that are attractive in price. Naturally, it should be a multiprocessor system. Intel offers Xeon processors for such tasks, and AMD offers Opteron.

If you have a lot of money

Separately, we note the extremely expensive but powerful line of Intel Xeon processors for the LGA1567 socket.
The top processor in this series is the E7-8870, with ten 2.4 GHz cores. It costs $4616. For these CPUs, HP and Supermicro make eight-processor (!) server chassis. Eight 10-core Xeon E7-8870 2.4 GHz processors with Hyper-Threading support 8 * 10 * 2 = 160 threads, which Windows Task Manager displays as one hundred and sixty CPU load graphs in a 10x16 grid.
Xeon E7-8870 Performance Monitor
CPU boards

To fit eight processors in the case, they are mounted not directly on the motherboard but on separate boards that plug into it. The photo shows four processor boards installed in the motherboard (two processors on each). This is the Supermicro solution. In the HP solution, each processor has its own board. The HP solution costs two to three million, depending on the number of processors, the amount of memory and other options. The Supermicro chassis costs $10,000, which is more attractive. In addition, the Supermicro chassis takes four coprocessor expansion cards in its PCI-Express x16 slots (with room left for an Infiniband adapter to build a cluster of such machines), while the HP takes only two. So for building a supercomputer, the eight-processor Supermicro platform is the more attractive one. The next photo, taken at an exhibition, shows a complete computer with four GPU boards.
Supercomputer with 4 GPU boards

However, it is very expensive.

Cheaper options

But it is also feasible to build a supercomputer from more affordable AMD Opteron G34, Intel Xeon LGA2011 and LGA1366 processors.

To choose specific models, I compiled a table and calculated price / (number of cores * frequency) for each processor. I excluded processors below 2 GHz, and Intel processors with a bus slower than 6.4 GT/s.
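| Model | Cores | Frequency, GHz | Price, $ | Price/core, $ | Price/core/GHz, $ |
|---|---|---|---|---|---|
| AMD | | | | | |
| 6386 SE | 16 | 2.8 | 1392 | 87 | 31 |
| 6380 | 16 | 2.5 | 1088 | 68 | 27 |
| 6378 | 16 | 2.4 | 867 | 54 | 23 |
| 6376 | 16 | 2.3 | 703 | 44 | 19 |
| 6348 | 12 | 2.8 | 575 | 48 | 17 |
| 6344 | 12 | 2.6 | 415 | 35 | 13 |
| 6328 | 8 | 3.2 | 575 | 72 | 22 |
| 6320 | 8 | 2.8 | 293 | 37 | 13 |
| INTEL | | | | | |
| E5-2690 | 8 | 2.9 | 2057 | 257 | 89 |
| E5-2680 | 8 | 2.7 | 1723 | 215 | 80 |
| E5-2670 | 8 | 2.6 | 1552 | 194 | 75 |
| E5-2665 | 8 | 2.4 | 1440 | 180 | 75 |
| E5-2660 | 8 | 2.2 | 1329 | 166 | 76 |
| E5-2650 | 8 | 2 | 1107 | 138 | 69 |
| E5-2687W | 8 | 3.1 | 1885 | 236 | 76 |
| E5-4650L | 8 | 2.6 | 3616 | 452 | 174 |
| E5-4650 | 8 | 2.7 | 3616 | 452 | 167 |
| E5-4640 | 8 | 2.4 | 2725 | 341 | 142 |
| E5-4617 | 6 | 2.9 | 1611 | 269 | 93 |
| E5-4610 | 6 | 2.4 | 1219 | 203 | 85 |
| E5-2640 | 6 | 2.5 | 885 | 148 | 59 |
| E5-2630 | 6 | 2.3 | 612 | 102 | 44 |
| E5-2667 | 6 | 2.9 | 1552 | 259 | 89 |
| X5690 | 6 | 3.46 | 1663 | 277 | 80 |
| X5680 | 6 | 3.33 | 1663 | 277 | 83 |
| X5675 | 6 | 3.06 | 1440 | 240 | 78 |
| X5670 | 6 | 2.93 | 1440 | 240 | 82 |
| X5660 | 6 | 2.8 | 1219 | 203 | 73 |
| X5650 | 6 | 2.66 | 996 | 166 | 62 |
| E5-4607 | 6 | 2.2 | 885 | 148 | 67 |
| X5687 | 4 | 3.6 | 1663 | 416 | 115 |
| X5677 | 4 | 3.46 | 1663 | 416 | 120 |
| X5672 | 4 | 3.2 | 1440 | 360 | 113 |
| X5667 | 4 | 3.06 | 1440 | 360 | 118 |
| E5-2643 | 4 | 3.3 | 885 | 221 | 67 |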

Bold italics mark the model with the lowest ratio; underlining marks the most powerful AMD processor and, in my opinion, the Xeon closest to it in performance.

Thus, my choice of processors for the supercomputer is the Opteron 6386 SE, Opteron 6344, Xeon E5-2687W and Xeon E5-2630.
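The price / (number of cores * frequency) indicator used for this choice can be reproduced with a few lines. This is a minimal sketch: the function name is illustrative, and the prices, core counts and frequencies are simply copied from the table for the four selected processors.

```python
def price_per_core_ghz(price_usd, cores, freq_ghz):
    """Price divided by (number of cores * frequency in GHz)."""
    return price_usd / (cores * freq_ghz)


# (price in $, cores, frequency in GHz), taken from the table above
processors = {
    "Opteron 6386 SE": (1392, 16, 2.8),
    "Opteron 6344": (415, 12, 2.6),
    "Xeon E5-2687W": (1885, 8, 3.1),
    "Xeon E5-2630": (612, 6, 2.3),
}

if __name__ == "__main__":
    # Sort from the cheapest to the most expensive core-gigahertz
    ranked = sorted(processors.items(), key=lambda kv: price_per_core_ghz(*kv[1]))
    for name, (price, cores, freq) in ranked:
        ratio = price_per_core_ghz(price, cores, freq)
        print(f"{name:16s} {ratio:6.1f} $ per core-GHz")
```

The indicator ignores differences in per-clock performance between architectures, so it is only a rough guide for comparing an Opteron with a Xeon.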

Motherboards


PICMG

Ordinary motherboards cannot take more than four dual-slot expansion cards. There is another architecture: backplanes, such as the BPG8032 PCI Express Backplane.
BPG8032 PCI Express Backplane

PCI Express expansion cards and one processor board are plugged into such a backplane, somewhat like the processor boards in the eight-processor Supermicro servers discussed above, except that these processor boards comply with the PICMG industry standards. The standards evolve slowly, and such boards often do not support the latest processors. The most powerful processor board of this kind currently made, the Trenton BXT7059 SBC, carries two Xeon E5-2448L processors.
Trenton BXT7059 SBC


Such a system will cost at least $5,000 without a GPU.

Ready-made TYAN platforms

For the same amount, you can buy a ready-made platform for building supercomputers, the TYAN FT72B7015. It accepts up to eight GPUs and two LGA1366 Xeons.

Ordinary server motherboards

For LGA2011

Supermicro X9QR7-TF - this motherboard takes 4 expansion cards and 4 processors.

Supermicro X9DRG-QF - this board is designed specifically for building high-performance systems.

For Opteron

Supermicro H8QGL-6F - this board takes four processors and three expansion cards.

Reinforcing the platform with expansion cards


This market is almost completely captured by Nvidia, which produces computing cards in addition to gaming video cards. AMD has a smaller market share, and Intel entered the market only recently.

These coprocessors are distinguished by a large amount of on-board RAM, fast double-precision calculations and energy efficiency.
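| Model | FP32, Tflops | FP64, Tflops | Price, $ thousand | Memory, GB |
|---|---|---|---|---|
| Nvidia Tesla K20X | 3.95 | 1.31 | 5.5 | 6 |
| AMD FirePro S10000 | 5.91 | 1.48 | 3.6 | 6 |
| Intel Xeon Phi 5110P | | 1 | 2.7 | 8 |
| Nvidia GTX Titan | 4.5 | 1.3 | 1.1 | 6 |
| Nvidia GTX 680 | 3 | 0.13 | 0.5 | 2 |
| AMD HD 7970 GHz Edition | 4 | 1 | 0.5 | 3 |
| AMD HD 7990 Devil 13 | 2x3.7 | 2x0.92 | 1.6 | 2x3 |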

Nvidia's top-end solution is the Tesla K20X, built on the Kepler architecture. These are the cards installed in Titan, the world's most powerful supercomputer. However, Nvidia recently released the GeForce Titan graphics card. In earlier consumer models, FP64 performance was cut down to 1/24 of FP32 (GTX 680), but for the Titan the manufacturer promises fairly high double-precision performance. AMD's solutions are also good, but they are built on a different architecture, which can make it difficult to run calculations optimized for CUDA (Nvidia's technology).

Intel's solution, the Xeon Phi 5110P, is interesting in that all the coprocessor's cores are based on the x86 architecture, so no special code optimization is needed to run calculations. But my favorite among the coprocessors is the relatively inexpensive AMD HD 7970 GHz Edition. In theory, this video card delivers the most performance for the money.
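The performance-for-the-money claim can be checked against the figures in the accelerator table. The sketch below assumes that the price column is in thousands of dollars (the ranking does not depend on the unit) and treats the dual-GPU HD 7990 Devil 13 as the sum of its two chips. The dictionary layout and function names are illustrative.

```python
# name: (FP32 Tflops or None if not listed, FP64 Tflops, price in $ thousand)
accelerators = {
    "Nvidia Tesla K20X": (3.95, 1.31, 5.5),
    "AMD FirePro S10000": (5.91, 1.48, 3.6),
    "Intel Xeon Phi 5110P": (None, 1.0, 2.7),
    "Nvidia GTX Titan": (4.5, 1.3, 1.1),
    "Nvidia GTX 680": (3.0, 0.13, 0.5),
    "AMD HD 7970 GHz Edition": (4.0, 1.0, 0.5),
    "AMD HD 7990 Devil 13": (2 * 3.7, 2 * 0.92, 1.6),
}


def tflops_per_kusd(name, precision):
    """Tflops per thousand dollars; `precision` is 'fp32' or 'fp64'."""
    fp32, fp64, price = accelerators[name]
    value = fp32 if precision == "fp32" else fp64
    return None if value is None else value / price


def best_value(precision):
    """Name of the card with the highest Tflops per dollar for a precision."""
    rated = {n: tflops_per_kusd(n, precision) for n in accelerators}
    rated = {n: v for n, v in rated.items() if v is not None}
    return max(rated, key=rated.get)


if __name__ == "__main__":
    for precision in ("fp32", "fp64"):
        print(precision, "->", best_value(precision))
```

These are peak figures from the table, so the result says nothing about how well a particular program will run on a given card.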

Connecting into a cluster


To improve system performance, several computers can be combined into a cluster, which distributes the computing load among its member computers.

Standard gigabit Ethernet is too slow as a network interface for connecting the computers. Infiniband is most often used for this purpose. An Infiniband host adapter is inexpensive relative to the server. For example, on eBay such adapters sell for as little as $40, and an X4 DDR adapter (20 Gb/s) will cost about $100 with delivery to Russia.

At the same time, switching equipment for Infiniband is quite expensive. Besides, as mentioned above, the classic star is not the best choice of network topology.

However, InfiniBand hosts can be connected to each other directly, without a switch. This makes one option quite interesting: a cluster of two computers connected via InfiniBand. Such a supercomputer can be assembled at home.

How many video cards do you need


In Cray Titan, today's most powerful supercomputer, the ratio of processors to "video cards" is 1:1: it has 18688 16-core processors and 18688 Tesla K20X cards.

In Tianhe-1A, the Chinese Xeon-based supercomputer, the ratio is two six-core processors to one Nvidia M2050 card (weaker than the K20X).

We will take this ratio as optimal for our builds (because it is cheaper): 12-16 processor cores per GPU. In the table below, bold marks the practically feasible options, and underlining marks the most successful ones from my point of view.
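| GPU | Cores | 6-core CPU | 8-core CPU | 12-core CPU | 16-core CPU |
|---|---|---|---|---|---|
| 2 | 24 / 32 | 4 / 5 | 3 / 4 | 2 / 3 | 2 / 2 |
| 3 | 36 / 48 | 6 / 8 | 5 / 6 | 3 / 4 | 2 / 3 |
| 4 | 48 / 64 | 8 / 11 | 6 / 8 | 4 / 5 | 3 / 4 |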
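The CPU counts in the table follow from the 12-16 cores per GPU rule. The sketch below reproduces them under one assumption that is inferred from the numbers rather than stated in the text: fractional CPU counts are rounded to the nearest whole number, with halves rounded up. The function name is illustrative.

```python
import math


def cpus_needed(gpus, cores_per_gpu, cores_per_cpu):
    """CPUs needed to give each GPU `cores_per_gpu` cores, rounded half up."""
    exact = gpus * cores_per_gpu / cores_per_cpu
    return math.floor(exact + 0.5)  # assumption: nearest integer, .5 goes up


if __name__ == "__main__":
    for gpus in (2, 3, 4):
        cells = []
        for cores_per_cpu in (6, 8, 12, 16):
            low = cpus_needed(gpus, 12, cores_per_cpu)   # 12 cores per GPU
            high = cpus_needed(gpus, 16, cores_per_cpu)  # 16 cores per GPU
            cells.append(f"{low} / {high}")
        print(gpus, f"{gpus * 12} / {gpus * 16}", *cells, sep=" | ")
```

`math.floor(x + 0.5)` is used instead of the built-in `round`, because `round` sends exact halves to the nearest even number and would give 4 rather than 5 for 36 cores on 8-core CPUs.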

If a system with an already established processor-to-video-card ratio can take additional computing devices on board, we will add them to increase the power of the build.

So how much does it cost


The options below are supercomputer chassis without RAM, hard drives or software. All of them use the AMD HD 7970 GHz Edition video adapter, which can be replaced by another accelerator (for example, a Xeon Phi) as the task requires. Where the system allows, one of the AMD HD 7970 GHz Edition cards is replaced by the three-slot AMD HD 7990 Devil 13.

Option 1 on the Supermicro H8QGL-6F motherboard

Enclosure SC748TQ-R1400B

     
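| Component | Model | Qty | Price, $ | Amount, $ |
|---|---|---|---|---|
| Motherboard | Supermicro H8QGL-6F | 1 | 1200 | 1200 |
| CPU | AMD Opteron 6344 | 4 | 500 | 2000 |
| CPU cooler | Thermaltake CLS0017 | 4 | 40 | 160 |
| Case 1400W | SC748TQ-R1400B | 1 | 1000 | 1000 |
| Graphics Accelerator | AMD HD 7970 GHz Edition | 3 | 500 | 1500 |
| Total | | | | 5860 |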

Theoretically, performance will be around 12 Tflops.

Option 2 on the TYAN S8232 motherboard, cluster


This board does not support the Opteron 63xx, so the 62xx is used. In this option, two computers are joined into a cluster over Infiniband x4 DDR with two cables. In theory, the connection speed is then limited by the PCIe x8 speed, i.e. 32 Gb/s. Two power supplies are used; how to make them work together can be found on the Internet.
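The 32 Gb/s figure above can be reconstructed from the link parameters. The sketch below assumes a PCIe 2.0 slot (the text does not name the PCIe generation) and uses the standard rates for both interfaces: 5 Gb/s of signalling per lane with 8b/10b encoding, which leaves 8 payload bits out of every 10 transmitted. The function names are illustrative.

```python
SIGNAL_GBPS_PER_LANE = 5  # InfiniBand DDR and PCIe 2.0 both signal at 5 Gb/s per lane


def payload_gbps(lanes, links=1):
    """Payload bandwidth in Gb/s after 8b/10b encoding overhead."""
    return links * lanes * SIGNAL_GBPS_PER_LANE * 8 / 10


def infiniband_ddr_gbps(links=1):
    """Payload bandwidth of `links` InfiniBand x4 DDR cables."""
    return payload_gbps(lanes=4, links=links)


def pcie2_gbps(lanes=8):
    """Payload bandwidth of a PCIe 2.0 slot (assumed generation)."""
    return payload_gbps(lanes=lanes)


if __name__ == "__main__":
    print("1 x Infiniband x4 DDR:", infiniband_ddr_gbps(1), "Gb/s")
    print("2 x Infiniband x4 DDR:", infiniband_ddr_gbps(2), "Gb/s")
    print("PCIe 2.0 x8 slot:     ", pcie2_gbps(8), "Gb/s")
```

Under these assumptions, two cables carry exactly as much payload as the x8 slot can deliver, so adding a third link to the same adapter would not help. For comparison, gigabit Ethernet offers 1 Gb/s.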
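| Component | Model | Qty | Price, $ | Amount, $ |
|---|---|---|---|---|
| Motherboard | TYAN S8232 | 1 | 790 | 790 |
| CPU | AMD Opteron 6282SE | 2 | 1000 | 2000 |
| CPU cooler | Noctua NH-U12DO A3 | 2 | 60 | 120 |
| Housing | Antec Twelve Hundred Black | 1 | 200 | 200 |
| Power Supply | FSP AURUM PRO 1200W | 2 | 200 | 400 |
| Graphics Accelerator | AMD HD 7970 GHz Edition | 2 | 500 | 1000 |
| Graphics Accelerator | AX7990 6GBD5-A2DHJ | 1 | 1000 | 1000 |
| Infiniband adapter | X4 DDR Infiniband | 1 | 140 | 140 |
| Infiniband cable | X4 DDR Infiniband | 1 | 30 | 30 |
| Total (per block) | | | | 5680 |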

A cluster needs two such configurations, which will cost $11360 in total. Power consumption at full load will be about 3000 W. Theoretical performance is up to 31 Tflops.

Option 3 on the Tyan FT72B7015 platform

Housing Tyan FT72B7015

This option is distinctive in that there are only two CPUs for eight GPUs. Its performance in real-world tasks will therefore depend on how well the program can be parallelized.
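| Component | Model | Qty | Price, $ | Amount, $ |
|---|---|---|---|---|
| Chassis (3000W) | Tyan FT72B7015 | 1 | 4900 | 4900 |
| CPU | Xeon X5680 | 2 | 1300 | 2600 |
| CPU cooler | SuperMicro SNK-P0040AP4 | 2 | 40 | 80 |
| Graphics Accelerator | AMD HD 7970 GHz Edition | 8 | 500 | 4000 |
| Total | | | | 11580 |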

Theoretically, performance will be up to 32 Tflops.

Option 4 for LGA2011, clustered

| Component | Model | Qty | Price, $ | Amount, $ |
|---|---|---|---|---|
| Motherboard | Supermicro X9DRG-QF | 1 | 600 | 600 |
| CPU | Intel Xeon E5-2687W | 2 | 2000 | 4000 |
| CPU cooler | Supermicro SNK-P0050AP4 | 2 | 50 | 100 |
| Housing | Antec Twelve Hundred Black | 1 | 200 | 200 |
| Power Supply | FSP AURUM PRO 1200W | 2 | 200 | 400 |
| Graphics Accelerator | AMD HD 7970 GHz Edition | 3 | 500 | 1500 |
| Graphics Accelerator | AX7990 6GBD5-A2DHJ | 1 | 1000 | 1000 |
| Infiniband adapter | X4 DDR Infiniband | 1 | 140 | 140 |
| Infiniband cable | X4 DDR Infiniband | 1 | 30 | 30 |
| Total (per block) | | | | 7970 |

A cluster needs two such configurations, which will cost $15,940 in total. Total power consumption at full load will be about 4000 W. Theoretical performance is up to 39 Tflops.
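The four options can be compared side by side with a small calculation. The sketch rests on one reading of the figures that the text does not spell out: the theoretical performance appears to be the sum of the peak FP32 ratings of the video cards alone, at 4 Tflops per HD 7970 GHz Edition and 2 x 3.7 Tflops per HD 7990, with the CPUs not counted. Block costs and card counts come from the option tables; all names in the code are illustrative.

```python
HD7970_TFLOPS = 4.0      # peak FP32, from the accelerator table
HD7990_TFLOPS = 2 * 3.7  # dual-GPU card, peak FP32

# Cost per block in $, number of blocks, and video cards per block
options = {
    "Option 1": {"block_cost": 5860, "blocks": 1, "hd7970": 3, "hd7990": 0},
    "Option 2": {"block_cost": 5680, "blocks": 2, "hd7970": 2, "hd7990": 1},
    "Option 3": {"block_cost": 11580, "blocks": 1, "hd7970": 8, "hd7990": 0},
    "Option 4": {"block_cost": 7970, "blocks": 2, "hd7970": 3, "hd7990": 1},
}


def summarize(option):
    """Return (total cost in $, peak Tflops, $ per Tflops) for one option."""
    cost = option["block_cost"] * option["blocks"]
    per_block = option["hd7970"] * HD7970_TFLOPS + option["hd7990"] * HD7990_TFLOPS
    tflops = per_block * option["blocks"]
    return cost, tflops, cost / tflops


if __name__ == "__main__":
    for name, option in options.items():
        cost, tflops, usd_per_tflops = summarize(option)
        print(f"{name}: ${cost}, {tflops:.1f} Tflops, {usd_per_tflops:.0f} $/Tflops")
```

With this reading the computed values round to the 12, 31, 32 and 39 Tflops quoted for the four options. The comparison covers chassis cost only, since RAM, drives and software are excluded from all four builds.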
