The complex made simple: high-performance computing for engineering and research

What do you need to dig a vegetable garden? If you have the plot, you need working tools and labor (workers). But what if you need to dig faster, or dig more? You can call friends or hire other people, that is, increase the number of workers. This is an example of high-performance garden digging. You cannot always raise productivity by looking for stronger workers, because each individual worker's productivity is limited. So you have to bring in more workers.

High-performance computing works the same way. Following the terminology of the MATLAB package, the individual computers and processor cores in computing clusters are called workers. The documentation of other clusters calls these cores and computers nodes, and that is what I will call them in this note.



Introduction

Habrahabr has already published a lot about high-performance, distributed and parallel computing (HPC). freetonik gave a detailed, illustrated introduction to parallel computing and then continued it, keleg reviewed HPC, mkosyakov covered the theory of distributed computing, Melges described his experience of organizing parallel computing over a network in C, and XakepRU described how to parallelize processes in Linux. After rereading these notes, I realized that none of them helps you actually start using HPC to solve engineering and scientific problems. This is probably true of many sources on the subject. Programmers write good programs that do what they are supposed to do. University professors explain how and why high-performance computing is worthwhile. But as soon as researchers realize it is time for them to use HPC, they find few "bridges" linking an understanding of HPC with the direct use of HPC systems in their own work. At university, students can find such a bridge in laboratory and practical classes. I will try to fill this gap, in the hope that the material will be useful to those who have not studied the subject and will help them start using HPC.

High-performance computing (HPC) comes to the rescue when you need to reduce computation time or get access to more memory. For example, your program may need a week to perform the calculations, but you need the results tomorrow. If you divide the program into parts and run each part on a separate node, then in theory you can speed up the calculations in proportion to the number of nodes involved. That is only in theory, though: in practice something always gets in the way (this has been described in detail elsewhere on Habrahabr).

Another case worth mentioning is when your program needs a large amount of RAM. For example, your computer has only 4 GB of RAM, but the calculations need at least 64 GB. In HPC systems, each node has a certain amount of memory installed. So if each node has 2 GB available, you can again divide the program into 32 parts, each running on a separate node, interacting with the other parts and exchanging data with them. In the end, the program as a whole has access to 64 GB of memory.
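The arithmetic in the preceding paragraph can be sketched in a few lines of bash. The function names are hypothetical. The Amdahl's-law estimate is an addition that is not in the original text. It is one standard way to quantify why the real speedup falls short of the ideal one when part of the program cannot be parallelized.

```bash
#!/usr/bin/env bash
# Back-of-the-envelope HPC estimates (hypothetical helper functions).

# Nodes needed so that their combined memory covers the requirement
# (ceiling division on whole gigabytes).
nodes_for_memory() {
    local total_gb="$1" per_node_gb="$2"
    echo $(( (total_gb + per_node_gb - 1) / per_node_gb ))
}

# Ideal run time in hours if the work splits perfectly across N nodes.
ideal_hours() {
    local serial_hours="$1" nodes="$2"
    awk -v t="$serial_hours" -v n="$nodes" 'BEGIN { printf "%.2f\n", t / n }'
}

# Amdahl's law (an assumption, not from the original text):
# speedup = 1 / ((1 - p) + p / N), where p is the parallelizable fraction.
amdahl_speedup() {
    local p="$1" nodes="$2"
    awk -v p="$p" -v n="$nodes" 'BEGIN { printf "%.2f\n", 1 / ((1 - p) + p / n) }'
}

nodes_for_memory 64 2      # 32 nodes with 2 GB each give 64 GB
ideal_hours 168 32         # one week (168 h) on 32 nodes: 5.25 h in theory
amdahl_speedup 0.9 32      # if 10% of the work is serial: about 7.80x, not 32x
```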

From these examples you have probably gathered that high-performance computing means computing performed on computer systems whose specifications far exceed those of ordinary computers. This definition is loose. There may be a more precise one, but I could not find it. HPC can be parallel, distributed, or a combination of the two.

Parallel computing means developing programs that run as several parallel, interacting processes. For example, modeling the characteristics of a solar cell involves the interaction of four models describing carrier transport, the propagation of incident light inside the cell, thermal effects, and tension-compression. Carrier transport, tension-compression, and the refractive index of the material (used in the optical model of incident light) all depend on temperature. The models describing these effects must therefore interact with each other during the calculation. To speed up the calculations, you can run the carrier transport model on one node, the light propagation model on another, the temperature model on a third, and so on. The nodes then perform interacting calculations in parallel.
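Here is a toy illustration of "interacting" processes in bash. Two functions run as concurrent processes connected by a pipe, so one "model" feeds data to the other while both are running. The function names and formulas are placeholders, not physical models. Real coupled simulations would typically exchange data through MPI rather than a shell pipe.

```bash
#!/usr/bin/env bash
# Toy example of two interacting processes running at the same time.

# Placeholder "temperature model": emits a temperature for each time step.
temperature_model() {
    local step
    for step in 1 2 3; do
        echo "$step $(( 300 + 10 * step ))"   # step number and T in kelvin
    done
}

# Placeholder "transport model": consumes each temperature as it arrives.
transport_model() {
    local step temp
    while read -r step temp; do
        echo "step $step: transport computed at T=$temp K"
    done
}

# The pipe runs both functions as separate processes that exchange data.
temperature_model | transport_model
```

In this toy, data flows in one direction only. In the solar-cell example it would flow both ways, which is why such codes are usually written with MPI.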

Distributed computing uses several nodes and processes that do not interact with each other. Very often the same code runs on different nodes. For example, suppose we need to evaluate the expansion and compression of a solar cell as a function of temperature. Here temperature is the model's input parameter, and the same model code can run on different nodes for different temperature values.
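On a single machine, the distributed pattern can be sketched in bash. The same program is launched once per temperature as independent background processes, and the script simply waits for all of them to finish. `run_model` is a hypothetical stand-in for your real model executable, and its formula is a placeholder. On a cluster, each call would instead become a separate job sent to a different node.

```bash
#!/usr/bin/env bash
# Same code, different input parameter, no communication between tasks.

# Hypothetical stand-in for the real model: takes a temperature in kelvin
# and prints a placeholder result (NOT a physical formula).
run_model() {
    local temp="$1"
    echo "T=$temp result=$(( (temp - 300) * 2 ))"
}

# Launch one independent task per temperature and wait for all of them.
sweep_temperatures() {
    local outdir="$1"; shift
    local temp
    mkdir -p "$outdir"
    for temp in "$@"; do
        run_model "$temp" > "$outdir/result_T$temp.txt" &
    done
    wait   # tasks never talk to each other; we only collect their files
}

outdir="$(mktemp -d)"
sweep_temperatures "$outdir" 280 300 320
cat "$outdir"/result_T*.txt
```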

The choice between distributed and parallel computing depends on how the program code is organized, on the physical model itself, and on which HPC systems are available to the end user. The rest of this article covers:
  • how the end user interacts with an HPC system;
  • which HPC systems are available and what their limitations are;
  • clusters built with the Condor software and with MATLAB (chosen simply because the author has experience with them);
  • a little about supercomputers and grids;
  • and how you can take advantage of all this machinery.


User Interaction with High Performance Computing

If the user works with an HPC system remotely, they need a computer on which to:
  • prepare programs for running on HPC systems;
  • launch calculations on HPC systems, or connect to HPC systems to launch programs;
  • process the calculation results.


Much here depends on the user's personal preferences, the availability of the necessary software, and other requirements. You have to choose a programming language, an operating system for the workstation and the cluster, software libraries, and software for organizing clusters. Personally, I always try to write programs in C or C++ using MPI and OpenMP for Linux (there are already good articles on Habrahabr about these "fathers and mothers" of high-performance computing), but for various reasons this does not always work out. A typical situation: the boss comes in on Friday and says we urgently need results. It ends with me writing a MATLAB program for the necessary calculations. To get the results faster, the program then runs on our organization's MATLAB cluster until Monday.

As for the operating system of the user's workstation, in most cases it is most convenient to use the same operating system and distribution that is installed on the HPC system. Most HPC systems currently run various Linux distributions. If our cluster runs Scientific Linux, it is easier to install the same system on the workstation so that you do not get confused by commands later. If you plan to use a MATLAB-based cluster, the choice of operating system does not matter, since programs written in MATLAB run on any OS for which MATLAB is available.

Suppose you choose a mixed setup, with a Microsoft Windows operating system on your computer and an HPC system built on Linux. You then need a client to connect to the remote system (for example, PuTTY), and perhaps an X server, or simply Cygwin, which includes all of this. Local administrators can always help you choose the software.

An important point: HPC systems usually either do not support interactive programs at all or support them only to a limited extent. Interactive programs are those that request input during execution or wait for other user actions such as keystrokes or mouse clicks. Graphical interfaces are treated the same way: they are usually not provided, and HPC systems are used in text mode from the command line (the exception, again, is MATLAB). Before you run your program on an HPC system, you must debug it. You then adapt it so that it runs on the HPC system without human intervention, performs the calculations, and saves the results to files or delivers them to the user in some other way.
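Here is a minimal sketch of such an adaptation in bash. It assumes a hypothetical program whose only input is a temperature. Instead of prompting the user (the commented-out `read -p` line), the batch version takes its parameters from the command line. It fails immediately with an error message if they are missing and writes its result to a file instead of the screen. The function name and the calculation are placeholders.

```bash
#!/usr/bin/env bash
# Interactive version (unsuitable for an HPC queue: it would hang waiting):
#   read -r -p "Temperature, K: " temp
#   echo "result: $(( temp * 2 ))"

# Batch-friendly version (hypothetical): parameters come from arguments.
batch_run() {
    if [ "$#" -ne 2 ]; then
        echo "usage: batch_run TEMPERATURE_K OUTPUT_FILE" >&2
        return 2                      # fail fast instead of waiting for input
    fi
    local temp="$1" outfile="$2"
    case "$temp" in
        ''|*[!0-9]*)
            echo "error: temperature must be a non-negative integer" >&2
            return 2 ;;
    esac
    echo "result: $(( temp * 2 ))" > "$outfile"   # placeholder calculation
}

out="$(mktemp)"
batch_run 300 "$out" && cat "$out"   # prints "result: 600"
```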

The requirements for the user's computer are minimal, since the main calculations run on HPC systems. You can monitor the status of calculations, and start and stop them, from a computer with a minimal configuration, or even from a mobile phone.

Some HPC systems

General overview

Supercomputers, computer clusters, and grids are the systems most often used for HPC.
Supercomputers are computer systems that significantly exceed most existing computers in parameters such as performance, available RAM, and number of processors. For more information, see the TOP500 list of the five hundred most powerful supercomputers in the world.

A computer cluster is a group of computers that can work together to increase the available memory and the number of processors involved. Such clusters are most often built within research groups or organizations.

Grids are groups of clusters and supercomputers spread across different cities and countries. For example, you can submit your computational task to a server in Switzerland, and it will actually run on clusters in Germany, France or Poland. The best-known example of a grid is the European EGEE grid system, which combines about forty thousand processors and several petabytes of disk space.

It is often difficult or impossible for the end user to distinguish between supercomputers and clusters. Here are three examples:

1. A group of computers connected by high-speed networks is quite often called a supercomputer, although it is essentially a computer cluster;

2. At the same time, there are clusters built on HTCondor software. These are also groups of computers interacting with a server over a local network (often a slow one), and nobody would dare call such clusters supercomputers;

3. There are NVIDIA supercomputers (with a system unit larger than that of an ordinary office computer) in which the entire computing system is not spread across a network but fits inside this one unit.

Comparing examples 2 and 3, the difference between a supercomputer and a cluster is obvious. Comparing examples 1 and 3, there is no visible boundary: the systems in both examples are called supercomputers.

HTCondor clusters (Condor, renamed HTCondor in 2012)

The software for organizing such a cluster can be downloaded free of charge from the project page. Clusters of this type consist of worker computers and a server. In the comments, dunordavind made an important clarification: such HPC systems are not clusters in the classical sense but rather resource managers (to avoid rewriting the whole text, I will keep calling them clusters anyway). The advantage of such a cluster is that ordinary office and laboratory computers with the client software installed can act as worker computers. During the day, people use these computers for their regular work. As soon as they stop using them (what counts as idle depends on the settings), the server starts running tasks that were submitted to it earlier. A prerequisite for using this cluster is that the client software must be installed on the computer from which users submit tasks. In other words, that computer must be part of the cluster. Supported operating systems: Microsoft Windows, macOS and Linux.

To run a program, you must compile it into executable code for the target OS and transfer it to the server together with the necessary libraries. This also applies to programs written in MATLAB: you have to compile them too, using the C compiler that comes with MATLAB. To run the program on the cluster, you write a simple configuration script (a submit description file). It states the requirements for the program's runtime environment (RAM size, operating system and so on) and lists the files to transfer along with the program. As an example, here is the text of one such file (let's call it cost_top.txt):

universe              = vanilla 
executable            = cost_top.bat 
transfer_input_files  = cost_top.exe 
output                = dump.txt 
error                 = errdump.txt 
log                   = foo.log 
requirements          = (OpSys == "WINNT51") 
rank                  = kflops 
transfer_files        = ALWAYS 
queue


As you have probably guessed, this file "explains" the important details to the Condor software:
  • the name of the executable and which files to transfer to the cluster;
  • which file receives the program's output, which receives error messages, and which receives additional log messages;
  • which operating system the node must run, and how to rank nodes by performance (here by kflops);
  • whether to transfer files.

The contents of the cost_top.bat file, which runs on the node:

path=c:\windows\system32;c:\windows;c:\windows\system;p:\matlab6\bin\win32 
cost_top.exe 


You have probably figured out that the first line of this script adds the necessary paths to the PATH environment variable, and the second line launches the program we need.

To submit your task to the cluster server, type 'condor_submit cost_top.txt' at the command line. Your task is then queued, and after a while the server will be ready to run it on client computers. The time it waits in the queue depends on each user's priority and on the cluster load, and is determined by the server's scheduling system.
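This kind of cluster suits distributed tasks, so a typical use is a parameter sweep. Below is a bash sketch that generates a variant of cost_top.txt. It queues several copies of the program, each receiving its own job number through Condor's `$(Process)` macro. It assumes that cost_top.bat forwards its argument (for example `cost_top.exe %1`) and that the program maps the number to a parameter value. The pair `should_transfer_files`/`when_to_transfer_output` is the form newer Condor versions use in place of `transfer_files`. The function name `make_sweep_submit` is hypothetical.

```bash
#!/usr/bin/env bash
# Generate an HTCondor submit description for N independent runs.
make_sweep_submit() {
    local njobs="$1" file="$2"
    # Quoted heredoc: $(Process) must reach Condor literally, not be run by bash.
    cat > "$file" <<'EOF'
universe                = vanilla
executable              = cost_top.bat
arguments               = $(Process)
transfer_input_files    = cost_top.exe
output                  = dump_$(Process).txt
error                   = errdump_$(Process).txt
log                     = foo.log
requirements            = (OpSys == "WINNT51")
rank                    = kflops
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
EOF
    # Condor sets $(Process) to 0 .. njobs-1, one value per queued job.
    printf 'queue %d\n' "$njobs" >> "$file"
}

submit_file="$(mktemp)"
make_sweep_submit 10 "$submit_file"
tail -n 1 "$submit_file"          # prints "queue 10"
# On the submitting machine you would then run: condor_submit <that file>
```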

Clusters of this type have limitations:
  • from the moment the task is queued until the calculation finishes, your client computer must stay on and connected to the local network, since the server and the client exchange files;
  • this cluster supports only distributed HPC tasks;
  • it is difficult to use third-party programs (anything other than the program you wrote and copied) and programs that require many libraries.


MATLAB clusters

MATLAB itself can create a cluster. For this you need the appropriate toolbox and server: the Parallel Computing Toolbox (formerly the Distributed Computing Toolbox) and MATLAB Distributed Computing Server. Modern processors have more than one core, so MATLAB can also deploy a local cluster directly on your workstation. This cluster configuration is known as the local configuration. It is convenient in two cases: when you want to speed up calculations a little without much effort, and when you need to test a program before running it on a more serious HPC system such as a supercomputer or cluster.

Besides the local configuration, there are others: for example, for a cluster made up of a group of computers on a local network, or for a group of computers in a cluster or grid. If administrators have the time and are not lazy, they usually set up MATLAB clusters and run training courses so that users can easily use them.

Advantages of MATLAB clusters:
  • the client computer from which tasks are submitted can be switched off after submission, and the user can collect the results later;
  • they can run both distributed and parallel computing tasks;
  • MATLAB users find it easier to start using such clusters, since the programming language is already familiar;
  • programs do not need to be compiled;
  • adapting a program that already contains 'for' loops for parallel computing is very simple: just replace such a loop with 'parfor' and add a couple of lines to start the cluster and close it when the work is done.


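For example, here is the code without parfor:

clear all;
Na = 4:50;
Nc = 4:30;
% Preallocate the result matrices (rows: Na values, columns: Nc values)
FigOM = zeros(length(Na), length(Nc));
dF    = zeros(length(Na), length(Nc));
for i1 = 1:length(Na)
    for i2 = 1:length(Nc)
        % fom is the user's own function (not shown) returning two values
        [out1, out2] = fom(Na(i1), Na(i1), Nc(i2), 0);
        FigOM(i1, i2) = out1;
        dF(i1, i2)    = out2;
    end
end
save FigOM.dat FigOM -ascii
save dF.dat dF -ascii
exit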


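And here is the same code with parfor and four nodes:

clear all;
matlabpool open 4   % start 4 workers (newer releases: parpool(4))
Na = 4:50;
Nc = 4:30;
FigOM = zeros(length(Na), length(Nc));
dF    = zeros(length(Na), length(Nc));
for i1 = 1:length(Na)
    parfor i2 = 1:length(Nc)
        [out1, out2] = fom(Na(i1), Na(i1), Nc(i2), 0);
        % Sliced outputs: i2 is distributed across workers, i1 stays fixed
        FigOM(i1, i2) = out1;
        dF(i1, i2)    = out2;
    end
end
matlabpool close    % newer releases: delete(gcp)
save FigOM.dat FigOM -ascii
save dF.dat dF -ascii
exit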


Disadvantages:
  • MATLAB is not free, and some users simply cannot afford it;
  • the cluster software does not come with a load-balancing program (one can be installed separately). As a result, some users can occupy all the cluster nodes and block access for everyone else.


Supercomputers and grids

As mentioned above, it is sometimes hard to tell a supercomputer, a computing cluster, and a grid apart. From this side of the terminal window they all look the same. All of them have many processors and a lot of memory. Their installed software includes compilers and the MPI and OpenMP libraries. Sometimes MATLAB and other programs that can use a group of nodes and their memory are installed as well.

The most common workflow is as follows:
  • the user connects (usually via SSH) to special nodes (login nodes), where they can run some commands interactively and from which they control their calculations;
  • loads the modules needed for the task, for example the gcc compiler and an MPI library;
  • if necessary, compiles the program against the required libraries;
  • as with the HTCondor cluster, prepares a file of settings and commands for running the program (a job submission file);
  • submits this file to the execution queue with the command 'qsub file_name';
  • once the program has finished, collects the results (this is easiest if they are saved to files).
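The steps above can be sketched as a bash script that you might run on a login node. It has a dry-run switch so it can be tried without a cluster. The module names come from the MPI job file in this article. The source file name m-mpi.c, the job file name job.sh and the helper `run` are hypothetical. qsub/qstat are the Grid Engine/PBS-style commands, and other schedulers use different ones.

```bash
#!/usr/bin/env bash
# Typical login-node workflow. With DRY_RUN=1 (the default here) commands are
# only printed; set DRY_RUN=0 on a real cluster to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

submit_workflow() {
    local jobfile="$1"
    run module add compilers/intel/12.1.15      # 1. load the modules
    run module add mpi/intel/mvapich2/1.8.1
    run mpicc -O2 -o m-mpi m-mpi.c              # 2. compile (hypothetical source)
    run qsub "$jobfile"                         # 3. put the job file in the queue
    run qstat -u "${USER:-$(id -un)}"           # 4. check the queue later
}

submit_workflow job.sh
```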


These settings files are similar to HTCondor cluster files. For example, to run the parfor example above, you can use the following file:

#!/bin/sh 
#$ -l h_rt=10:00:00 
/usr/local/bin/matlab -nodisplay -nosplash -r "run('/home/el/calmap.m')"


The second line sets the maximum run time for the task. The third is the command this system executes to run the user's MATLAB code.
Here is another example file, for running a program that uses MPI libraries:

#!/bin/bash
#$ -l h_rt=4:00:00
#$ -pe mvapich2-ib 12
#
export LDFLAGS="-L$HOME/opt/lib -lm"
export CPPFLAGS="-I$HOME/opt/include"
export LD_LIBRARY_PATH="$HOME/opt/lib:$LD_LIBRARY_PATH"
export PATH="$PATH:$HOME/opt/bin"
module add compilers/intel/12.1.15
module add mpi/intel/mvapich2/1.8.1
mpirun -np 12 m-mpi test7.ct


The second line sets the maximum run time. The third names the parallel environment (defined by the administrators) and the number of requested nodes. The next four lines set environment variables, and the two after that load the required modules. The last line launches the program, which will use 12 nodes.
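Below is one possible refinement, sketched in bash. It assumes the cluster runs Grid Engine, which sets the NSLOTS variable inside parallel-environment jobs. The idea is to let mpirun take its process count from NSLOTS so that it always matches the number requested with -pe, and to generate the job file from a few parameters. The function `write_mpi_job` and its arguments are hypothetical, and the parallel environment name is site-specific. The compile-time LDFLAGS/CPPFLAGS lines are left out on the assumption that nothing is compiled inside the job.

```bash
#!/usr/bin/env bash
# Write a Grid Engine job file for an MPI program.
# Arguments: wall-clock limit (H:MM:SS), number of slots, output file.
write_mpi_job() {
    local walltime="$1" slots="$2" file="$3"
    # Unquoted heredoc: $walltime and $slots are filled in now, while \$ keeps
    # $HOME, $PATH and $NSLOTS literal for the compute node to expand later.
    cat > "$file" <<EOF
#!/bin/bash
#$ -l h_rt=$walltime
#$ -pe mvapich2-ib $slots
export LD_LIBRARY_PATH="\$HOME/opt/lib:\$LD_LIBRARY_PATH"
export PATH="\$PATH:\$HOME/opt/bin"
module add compilers/intel/12.1.15
module add mpi/intel/mvapich2/1.8.1
mpirun -np "\$NSLOTS" m-mpi test7.ct
EOF
}

job_file="$(mktemp)"
write_mpi_job 4:00:00 12 "$job_file"
cat "$job_file"
# then, on the login node: qsub <that file>
```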

Conclusion

It is impossible to embrace the immensity, but one can and should try. In this article I have tried to give an overview of high-performance computing systems. My aim was to help novice users see the range of options, what is available, and how it can be used. As you can see, even without access to supercomputers and grids, you can build your own cluster based on MATLAB or the free Condor software.

P.S. If you can add to this note or find an error, please write about it in the comments below. In the end, this will only improve knowledge and understanding of the subject and help improve the note.
P.P.S. You can also use CUDA technology to speed up calculations in C/C++ and MATLAB by putting GPU cores to work, but a lot has already been written about this.
