Supercomputer at NArFU: exploring the Arctic with numerical methods

    Modern technical and engineering universities routinely tackle serious computational problems, the kind that would take days or weeks to solve on an ordinary computer. Dozens of universities in Russia have already built powerful "number-crunching" systems. One of them is the supercomputer recently built by Fujitsu and Softline at the Northern (Arctic) Federal University (NArFU) in Arkhangelsk.



    What kind of computing called for a supercomputer?


    Many problems are far easier to solve by numerical methods than analytically. Typically these are applied problems of mathematical modeling of various production units, for example chemical reactors, heat exchangers, or the torch of a welding machine. A reliable model lets you accurately predict how the real device will behave as its operating parameters change, and then improve it. To obtain a reliable model, you usually have to compare the computed results against experimental data more than once, adjust the model, and recompute. This is computationally very expensive, even when intermediate versions of the model are calculated at reduced accuracy. A few days or weeks of computation on an ordinary computer is a common reality.
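
    As a toy illustration of what "solving numerically" means here, the sketch below steps a one-dimensional heat equation forward in time with an explicit finite-difference scheme. All physical parameters (diffusivity, rod length, grid size) are invented for the example; a real production model would be three-dimensional and far finer-grained.

```c
/* Minimal sketch: explicit finite-difference solution of the 1D heat
 * equation u_t = a * u_xx, the kind of model that gets refined and
 * recomputed many times against experimental data. */
#include <stdio.h>

#define N 101   /* grid points */

int main(void) {
    double a = 1.0e-4;                 /* thermal diffusivity, m^2/s (assumed) */
    double len = 1.0, dx = len / (N - 1);
    double dt = 0.4 * dx * dx / a;     /* stability requires dt <= 0.5*dx^2/a */
    double u[N], un[N];

    for (int i = 0; i < N; ++i) u[i] = 20.0;   /* initial temperature, C */
    u[0] = 100.0; u[N - 1] = 20.0;             /* fixed boundary temperatures */

    for (int step = 0; step < 10000; ++step) {
        for (int i = 1; i < N - 1; ++i)
            un[i] = u[i] + a * dt / (dx * dx) * (u[i-1] - 2.0*u[i] + u[i+1]);
        for (int i = 1; i < N - 1; ++i) u[i] = un[i];
    }
    printf("temperature at midpoint: %.2f C\n", u[N / 2]);
    return 0;
}
```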

    At NArFU, such resource-intensive calculations are needed in several scientific and applied fields at once.

    The first area is molecular dynamics: for example, modeling diffusion, absorption, and mass transfer in gas mixtures, all computed with high accuracy, down to the behavior of hundreds and thousands of individual molecules. In practice this helps improve the properties of filter materials and refine technologies for separating mixtures and purifying chemicals.
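
    To give a feel for this kind of workload, here is a minimal molecular-dynamics sketch: a few Lennard-Jones particles in one dimension, integrated with the velocity Verlet scheme in reduced units. The particle count, time step, and initial spacing are illustrative only; production diffusion and mass-transfer runs track thousands of molecules in three dimensions over many more steps.

```c
/* Minimal molecular-dynamics sketch: 1D Lennard-Jones chain, velocity Verlet. */
#include <stdio.h>

#define NP 8          /* particles */
#define DT 0.001      /* time step in reduced units */

static void forces(const double x[NP], double f[NP]) {
    for (int i = 0; i < NP; ++i) f[i] = 0.0;
    for (int i = 0; i < NP; ++i)
        for (int j = i + 1; j < NP; ++j) {
            double r = x[j] - x[i];
            double r2 = r * r, r6 = r2 * r2 * r2;
            /* -dU/dr for U = 4(r^-12 - r^-6), divided back onto the x axis */
            double fij = 24.0 * (2.0 / (r6 * r6) - 1.0 / r6) / r;
            f[i] -= fij;   /* push i away from (or toward) j */
            f[j] += fij;
        }
}

int main(void) {
    double x[NP], v[NP] = {0}, f[NP], fnew[NP];
    for (int i = 0; i < NP; ++i) x[i] = 1.12 * i;   /* near the LJ minimum */
    forces(x, f);
    for (int step = 0; step < 1000; ++step) {
        for (int i = 0; i < NP; ++i) x[i] += v[i] * DT + 0.5 * f[i] * DT * DT;
        forces(x, fnew);
        for (int i = 0; i < NP; ++i) { v[i] += 0.5 * (f[i] + fnew[i]) * DT; f[i] = fnew[i]; }
    }
    printf("final position of the last particle: %.3f\n", x[NP - 1]);
    return 0;
}
```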

    The second area is fluid and gas dynamics. These are also applied, production-oriented tasks, particularly in engineering. One example is the numerical simulation of the flame in a gas burner. Calculating the velocities, pressures, and temperatures in different layers of gas, along with the turbulence, ultimately makes it possible to improve welding technology, improve the tools, and raise quality and speed. The NArFU branch in Severodvinsk works on similar tasks; the city is the forge of the navy, and there is plenty of real work there on improving production technologies.
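
    A real burner or torch model is three-dimensional, turbulent, and coupled to chemistry, but the basic building block is the same kind of time-marching update shown in this illustrative sketch, which advects a temperature profile through a gas stream with a first-order upwind scheme. The velocity, grid, and step sizes are assumed.

```c
/* Sketch of a gas-dynamics style update: 1D advection of temperature
 * through a stream, first-order upwind scheme. */
#include <stdio.h>

#define NX 200

int main(void) {
    double T[NX], Tn[NX];
    double u = 2.0;                /* gas velocity, m/s (assumed) */
    double dx = 0.01, dt = 0.002;  /* CFL = u*dt/dx = 0.4 < 1, stable */

    for (int i = 0; i < NX; ++i) T[i] = (i < 20) ? 1500.0 : 300.0;  /* hot inflow region */

    for (int step = 0; step < 300; ++step) {
        Tn[0] = T[0];                       /* hold the inflow boundary */
        for (int i = 1; i < NX; ++i)
            Tn[i] = T[i] - u * dt / dx * (T[i] - T[i - 1]);   /* upwind update */
        for (int i = 0; i < NX; ++i) T[i] = Tn[i];
    }
    printf("temperature at x = 1 m: %.1f K\n", T[100]);
    return 0;
}
```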

    The third area is heat engineering and thermodynamics calculations. It was the heat engineering department that supplied the first task ever run on the supercomputer: a bachelor's thesis project that built a mathematical model of a heat exchanger for recovering the by-product heat of industrial furnaces, carried off in the form of hot gases.
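
    The student's model itself is not reproduced here, but a rough idea of the quantities involved can be had from the standard effectiveness-NTU estimate for a counter-flow heat exchanger, sketched below with invented stream parameters (flow rates, temperatures, UA).

```c
/* Lumped effectiveness-NTU estimate for a counter-flow heat exchanger
 * recovering heat from hot flue gas. All numbers are illustrative. */
#include <stdio.h>
#include <math.h>

int main(void) {
    double C_hot  = 1.2;      /* heat-capacity rate of flue gas, kW/K (assumed) */
    double C_cold = 2.0;      /* heat-capacity rate of water, kW/K (assumed) */
    double UA     = 3.0;      /* overall conductance, kW/K (assumed) */
    double T_hot_in = 450.0, T_cold_in = 20.0;   /* inlet temperatures, C */

    double Cmin = fmin(C_hot, C_cold), Cmax = fmax(C_hot, C_cold);
    double Cr = Cmin / Cmax, NTU = UA / Cmin;

    /* counter-flow effectiveness for Cr != 1 */
    double eff = (1.0 - exp(-NTU * (1.0 - Cr))) /
                 (1.0 - Cr * exp(-NTU * (1.0 - Cr)));

    double Q = eff * Cmin * (T_hot_in - T_cold_in);   /* recovered heat, kW */
    printf("effectiveness %.2f, recovered heat %.1f kW\n", eff, Q);
    return 0;
}
```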

    In addition, the Institute of Mathematics, Information and Space Technologies of NArFU actively uses the supercomputer for teaching and for practical work on creating and optimizing parallel algorithms.
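
    A typical classroom exercise of this kind is sketched below: an MPI program in which every process sums its share of a series and MPI_Reduce gathers the partial sums on rank 0. It would be compiled with mpicc and submitted to the queue like any other job; the series and its length are arbitrary.

```c
/* Minimal MPI exercise: distributed summation with MPI_Reduce. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long N = 100000000;                 /* terms to sum */
    double local = 0.0, total = 0.0;
    for (long i = rank; i < N; i += size)     /* round-robin split of the work */
        local += 1.0 / ((double)i + 1.0);

    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("harmonic sum over %ld terms on %d processes: %.6f\n", N, size, total);

    MPI_Finalize();
    return 0;
}
```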



    What it consists of


    The NArFU supercomputer is relatively small: it has 20 compute nodes, each a dual-processor server with 10 cores per processor, for a total of 40 processors and 400 cores. That is not much next to thousand-processor monsters, but it is very good for a university and sufficient for NArFU's computational problems.

    Eight of these 20 nodes are equipped with Intel Xeon Phi coprocessors: 60-core number crunchers similar in function to NVIDIA GPUs. They are very fast at certain specific classes of problems, above all computations with large matrices and the numerical solution of systems of differential equations. Using them gives a tangible boost in performance, especially on the specific tasks they are designed for.
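
    The kind of loop that suits these accelerators is sketched below: an OpenMP-parallel dense matrix-vector product. On the Phi-equipped nodes the same loop can run natively on the coprocessor or be offloaded via the Intel compiler's offload support; a plain OpenMP version is shown here for portability, with an illustrative matrix size.

```c
/* OpenMP-parallel dense matrix-vector product, the style of data-parallel
 * loop that maps well onto many-core accelerators. Compile with -fopenmp. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N 2048

int main(void) {
    double *A = malloc((size_t)N * N * sizeof *A);
    double *x = malloc(N * sizeof *x), *y = malloc(N * sizeof *y);

    for (size_t i = 0; i < (size_t)N * N; ++i) A[i] = 1.0 / (double)(i % 97 + 1);
    for (int i = 0; i < N; ++i) x[i] = 1.0;

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (int i = 0; i < N; ++i) {
        double s = 0.0;
        for (int j = 0; j < N; ++j) s += A[(size_t)i * N + j] * x[j];
        y[i] = s;
    }
    printf("y[0] = %.4f, elapsed %.3f s on up to %d threads\n",
           y[0], omp_get_wtime() - t0, omp_get_max_threads());

    free(A); free(x); free(y);
    return 0;
}
```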

    Although Intel Xeon Phi is not a cheap pleasure, using the coprocessors is much more advantageous in terms of price/performance than running the same tasks on ordinary compute nodes without them.

    In addition to the compute nodes, there are two head servers for queuing jobs and administering the cluster, and four servers that serve the storage system (more on that below).

    Supercomputer communications


    In clusters, two factors are most critical for performance:

    1. speed of communication between nodes,
    2. speed of access to large files.

    The point is that the program should spend its time computing, not waiting on I/O operations. These are the bottlenecks that have to be eliminated first.

    A separate, fastest network is used to exchange data between processes running on different nodes. This is an InfiniBand network with very high bandwidth (up to 56 gigabits per second) and low latency. It is used very intensively and is shown in pink in the diagram.
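
    A simple way to see what such a fabric gives is an MPI ping-pong test like the sketch below, in which two ranks placed on different nodes bounce a buffer back and forth and report the effective bandwidth. The message size and repetition count are arbitrary.

```c
/* Minimal MPI ping-pong bandwidth sketch; run with at least two ranks. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "needs at least 2 ranks\n");
        MPI_Finalize();
        return 1;
    }

    const int SIZE = 1 << 22;              /* 4 MiB message */
    const int REPS = 100;
    char *buf = calloc(SIZE, 1);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; ++i) {
        if (rank == 0) {
            MPI_Send(buf, SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - t0;
    if (rank == 0) {
        double gbytes = 2.0 * REPS * SIZE / 1e9;   /* data moved in both directions */
        printf("average bandwidth: %.2f GB/s\n", gbytes / elapsed);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
```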

    The second, separate network (shown in orange in the diagram) is used by the job management system to connect to the nodes and to transmit commands and service messages. The speed requirements here are much lower than for the first network.

    The third network, shown in green in the diagram, is a technical network for servicing the hardware. Modern servers can be managed at the hardware level, independently of the installed operating system: powering on and off, checking the parameters of hardware components, running diagnostics, rebooting. All of this is done over this network.

    Data storage


    The 60 TB network storage system running the Fujitsu Exabyte File System (FEFS) delivers up to 1.7 gigabytes per second of throughput, far faster than any single hard drive. Physically it consists of two disk enclosures served by four servers.

    FEFS consists of a metadata server, which stores the namespace metadata, and several object storage servers, which hold the actual file data.

    Software


    The operating system on the compute nodes is Red Hat Linux.

    The job management system is PBS Professional.

    Cluster management is handled by Fujitsu HPC Gateway; its job is to install and reinstall compute nodes, power them on and off, and so on.

    Of the commercial engineering software, ANSYS was purchased; it is what actually performs the engineering calculations.

    How it all looks from the user's point of view


    There is a head server that users log in to, for example remotely over ssh. There they can upload their files, compile them, and submit the resulting job to the queue for computation; this is done through PBS Pro. When the job has finished, they look at the results and repeat the cycle if necessary.

    The second way is to send a model for computation on the supercomputer at the press of a button, straight from the engineering environment. This can be done from ANSYS and from other engineering software too; it only needs to be properly integrated with the job management system.

    How it all looks physically


    The central building of the university houses a fairly large server room with several rows of racks. The supercomputer equipment occupies three racks, which are deliberately not fully loaded in order to optimize cooling.

    The compute nodes are dual-processor (Intel Xeon E5-2680 v) servers in a half-width 1U form factor. There are two models, Fujitsu PRIMERGY CX250 S2 and CX270 S2; the latter is distinguished by carrying an Intel Xeon Phi coprocessor.

    Fujitsu PRIMERGY RX300 and RX200 rack servers are used to serve the storage system and as head nodes.

    The supercomputer can draw up to 50 kilowatts of electricity (including cooling and power redundancy), which is a lot by the standards of Arkhangelsk. Fortunately, it proved possible to connect it to the existing reserve capacity of the university's infrastructure. In general, though, high power consumption can be a real problem for a university.

    Welcome to the club


    Many Russian universities have already built their own supercomputers; they are united by the Supercomputer Consortium of Russian Universities ( http://hpc-russia.ru/ ), which NArFU has also joined. The consortium's main mission is to popularize parallel computing and to provide mutual assistance among its members: if a task turns out to be too resource-intensive, it is possible to turn to partners. One result of this joint work was the inclusion of the annual youth scientific and practical school "High-Performance Computing on GRID Systems" ( http://itprojects.narfu.ru/grid/ ), held at NArFU, in the consortium's list of events.

    Before the supercomputer was purchased, NArFU staff ran their computations on the clusters of other universities, both in Russia and in the countries neighboring the northwestern region: Sweden, Norway, and Finland. Now colleagues from other institutions use the NArFU cluster in turn.

    We thank Alexander Vasilievich Rudalev, a leading software engineer at the Department of Applied Mathematics and High Performance Computing at NArFU, for his help in preparing this article.
