Do-it-yourself Matlab cluster

The information in this article is out of date.



Introduction


Sooner or later, every user of the Matlab/Simulink package runs into the limited performance of the computer it runs on. This happened to me too.

The first thing that came to mind was upgrading the hardware. I work on an old laptop, so I started thinking about a desktop PC. But that did not promise a large gain in productivity, and in any case it is a gamer's method, not an engineer's.


Since this was my first experience with parallel computing, the article is also aimed at beginners. I will be grateful to connoisseurs of parallel computing for good advice and comments on what I did.

So:
  1. New computer.
    As I mentioned above, for comfortable "scientific" work I needed nothing less than a "gaming" PC, with a multi-core processor and a graphics card with CUDA compute capability of at least 1.3. That is expensive and does not scale. I was also sure that I would master its unbridled power quite quickly, and it would soon be not enough again.
  2. Optimization / minimization / refinement of models and code.
    I had already travelled this path before I hit the ceiling of my laptop's capabilities. The method exhausted itself completely once simulating a scenario took longer than a standard coffee/tea break. "You can start the simulation and go to sleep!" those in the know will say, but that requires an already optimized model and a clear idea of how it will behave. I needed performance already in the early stages of development.
  3. Using a computing cluster.
    There was no cluster at hand, and getting access to shared machines such as the Uran supercomputer means knocking on doors for a long time. Besides, the problem statement was not well suited to third-party resources: I was not ready to run the task once and simply collect the results. That stage was still a long way off.

As a result, I decided to build my own cluster: a small one, but with blackjack.

Finding hardware


Hoping that good prevails in our world, I turned to colleagues and acquaintances for help, and some of them volunteered their PCs for my experiment. There were several options, but I had to settle on just two machines. Here is why:
  • Each had a Core i7 processor with four physical cores, each providing two virtual (logical) cores. In total I could create 16 virtual WORKERS on these two machines, which is pretty good. In addition, one machine had 6 GB of RAM and the other 8 GB.
  • They were on the same local network, which was essential to minimize the time spent transferring data between the virtual labs. To reach this network I set up a VPN connection, and to work with each PC I configured a standard remote desktop.

There were also cons:
  • Both PCs ran Windows 7 x64, while my laptop had a 32-bit (x86) operating system. One of the mandatory requirements for a Matlab cluster is that all operating systems have the same bitness. As a result, I lost the opportunity to add two more WORKERs to the cluster using my laptop's processor.
  • The CUDA compute capability of the video cards was below 1.3, so they could not be included in the cluster either.

But the pros still outweighed the cons, and so the first stone of this makeshift cluster was laid.

Building a cluster


Software requirements

I covered the hardware requirements above. The software requirements are as follows:
  1. Matlab on every machine.
    There is no way around the platform itself.
  2. Parallel Computing Toolbox (called Distributed Computing Toolbox, DCT, in older releases) on each machine.
  3. MATLAB Distributed Computing Engine (MDCE) on each machine.
    The two products only work together.
  4. A C compiler other than the one bundled with Matlab.
    This is needed on machines with a 64-bit operating system if we intend to use them as hosts for compiling Simulink models into C code. Matlab supports a long list of compilers, some of which are free. I used Microsoft Visual C++ 2010.
  5. Simulink and other extensions depending on the type of activity.

Cluster launch

First, MDCE must be installed and started as a service on each machine. This can be done without leaving Matlab (the "!" prefix executes a command as a system command) or from the command line. The *.bat files for this are located in the \toolbox\distcomp\bin\ folder under the Matlab installation directory.

Install and run MDCE on each machine:
% Go to the folder with the MDCE batch files
% (here C:\Program Files\MATLAB\R2010b\toolbox\distcomp\bin\)
cd(fullfile(matlabroot, 'toolbox', 'distcomp', 'bin'))
!mdce install
!mdce start

Run the scheduler (job manager), which will manage the parallel computations:
!startjobmanager -name jm -v

The -v parameter enables verbose output of the startup process in the Matlab command window. The scheduler can also be started remotely on any PC at our disposal.

Next, we run a loop that starts the so-called workers, which will execute the tasks assigned to them in parallel:
% Host running the job manager
clientHost = 'slovak';
% Machines that will host workers
node = {'slovak', 'puls'};
for i = 1:numel(node)
    for j = 1:8   % 8 workers per machine, one per virtual core
        % Worker names follow the pattern w_<number>_<host>, e.g. w_1_slovak
        cmd = sprintf(['startworker -name w_%d_%s -jobmanagerhost %s ' ...
            '-jobmanager jm -remotehost %s -v'], j, node{i}, clientHost, node{i});
        system(cmd);
    end
end

The host 'Slovak' is the computer on which the scheduler is running. In my case, this is one of the working PCs. I had to rename it, because the previous name was in Cyrillic, which Matlab does not tolerate.
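Before launching the loop, it can help to check the naming scheme and host names outside Matlab. Below is a minimal sketch in Python, a hypothetical helper that is not part of Matlab or MDCE. It builds the same startworker command lines as the Matlab loop and rejects non-ASCII host names, generalizing the Cyrillic-name problem to an assumed rule of "ASCII only". It only returns the commands and launches nothing.

```python
def worker_commands(nodes, jm_host, jm_name="jm", per_node=8):
    """Build startworker command lines using the w_<j>_<node> naming scheme.

    Hypothetical helper: it mirrors the Matlab loop's string building only.
    Host names are required to be plain ASCII, an assumption based on the
    author's experience that Matlab did not tolerate a Cyrillic host name.
    """
    for host in [jm_host, *nodes]:
        if not host.isascii():
            raise ValueError(f"non-ASCII host name: {host!r}")
    commands = []
    for node in nodes:
        for j in range(1, per_node + 1):
            commands.append(
                f"startworker -name w_{j}_{node} -jobmanagerhost {jm_host} "
                f"-jobmanager {jm_name} -remotehost {node} -v"
            )
    return commands


if __name__ == "__main__":
    cmds = worker_commands(["slovak", "puls"], "slovak")
    print(len(cmds))  # 2 machines x 8 workers
    print(cmds[0])
```

With the two machines from this article the helper yields 16 command lines, one per worker the job manager should eventually report.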

After creating the workers, we can admire our cluster:
!nodestatus -infolevel 3
...
Job manager:
Name jm
Running on host Slovak
Number of workers 16
...

The -infolevel 3 parameter requests detailed information about the nodes.

The cluster can also be managed through a GUI: the batch file folder contains admincenter.bat, which launches the administration utility. At the time of writing, the PC puls had "fallen off", so the admin center showed only 8 workers.

Next, we need to find the created scheduler from Matlab:
jm = findResource('scheduler', 'type', 'jobmanager', ...
    'LookupURL', 'slovak:27350', 'Name', 'jm');

After this step you can solemnly cut the ribbon and be confident that the cluster is created and ready to work. What remains is to distribute the work correctly between the nodes, organize access to shared resources, and much more. Note also that at this stage we can only run parallel tasks from Matlab. To parallelize the Simulink model I had to jump through hoops more than once. In the end I fundamentally changed the concept of my whole work to make it convenient to solve in parallel, but that is a subject for a separate article.

Information on the topic was drawn from the following resources:
  1. Matlab Website
  2. Matlab User Blogs
  3. Russian-language Matlab user forum
  4. N.N. Olenev, R.V. Pechenkin, A.M. Chernetsov. Parallel Programming in MATLAB and Its Applications. Moscow: Computing Centre of the RAS, 2007. 120 p.
  5. The method of scientific trial and error

Thank you for attention!

PS: All the work described here was done with Matlab R2010b. A brief glance at the distcomp folder of the new R2011a release suggests that the MathWorks team has done a lot of work. Unfortunately, I don't have the new version myself yet, but as soon as I get the chance I will try to cover the new features it offers. For now, though, I'm busier generating C code from Simulink models for microcontrollers.
