Creating and Using Matlab Clusters

Purpose of the article: I want to share my experience in creating three Matlab computing clusters, as well as their remote administration.

A short introduction

When researching or modeling natural phenomena (and not only those), you occasionally need more computing power than a home PC can provide, no matter how powerful it is. Eventually I ran into this need myself.

Modeling that involves solving systems of nonlinear differential equations over a long span of simulated time takes a lot of processor time, so I decided to parallelize the whole thing.

So, let's go through everything in order


Available hardware:

At home: a desktop (Phenom II X4 840, Windows 7 x64) and a laptop (Athlon II Dual-Core M320, Windows 7 x64), connected to the same network through a good old DIR-300 router.

At my girlfriend's place: a desktop (i5 4440, Windows 7 x64).

At work: 10 computers (Athlon II Dual-Core, Windows XP x86) on one network in one room, and 4 more (Athlon II Dual-Core, Windows XP x86) on a separate network in another room. There is no local network between the rooms.

All of these machines have Internet access.

We start by creating the 2 clusters at work.


The article I followed describes a way to create a cluster, but it leaves out many of the pitfalls along the way, and they almost buried my venture. (Everything was done according to those wonderful instructions, for which I thank the author!)

To begin with, you need to install Matlab correctly. The point is not "holding your breath during installation" or picking the "right" components, but that there are 2 versions of Matlab: a server one and a local one. If you install only one of them, the cluster will not work.
This article contains information that helps to understand the version issue. Note, however, that in the new Matlab R2013b the installer works a little differently from the screenshots in that article. First install the local version with the Parallel Computing Toolbox, and only then install the server version with the Distributed Computing Server into a different folder. Otherwise, starting a parallel pool from the Parallel Computing Toolbox produces this error:

Starting parallel pool (parpool) using the 'MJSProfileXXX' profile ... 
Error using parpool (line 111)
Failed to start a parallel pool. (For information in
addition to the causing error, validate the profile
'MJSProfile8' in the Cluster Profile Manager.)
Error in parallel.internal.ui.PoolHelper.startPool (line 11)
            parpool();
Caused by:
    Error using
    parallel.internal.pool.InteractiveClient/start (line
    326)
    Failed to start pool.
        Error using parallel.Job/submit (line 304)
        All dimension arguments must be greater than zero

This error comes up often on various forums, but no one explains how to get rid of it. It occurs when a parallel pool is started from the server version of Matlab (the Start Parallel Pool button).

Therefore, parallel (cluster) computations must be launched on the Master from the local version of Matlab.

The article above recommends installing both versions of Matlab into one folder. The R2013b installer did not allow this and, as it turned out, rightly so!
After the full installation, 2 folders should be present on the Master. The first contains the local version of Matlab, which registers all the Matlab file extensions and creates the shortcuts. The second contains the installed Matlab Distributed Computing Server. (The second folder is self-contained and can be copied to all computers on the local network to save time when deploying the cluster.)

We will treat the computer with both versions of Matlab installed as the Master, since our program will be launched from it. The other computers on this local network need only the server version of Matlab (or you can simply copy the second folder from the Master to any directory on them).

Now you can safely follow the article above to create a local Matlab cluster. Note, however, that the first time you run the !mdce install and !mdce start commands, you should run them from the Matlab window itself, no matter which version. This avoids the error about missing vcredist64 / 86 libraries: when !mdce install is run from the Matlab window, they are installed automatically. Otherwise the mdce service, started for example from a batch file, may simply fail to come up because the libraries are missing.
Run these commands from the bin directory of the 2nd folder (the mdce.bat file is located there).
In my case, 3 commands on the Master were enough to start the cluster:
!mdce install
!mdce start
!admincenter

From the Admin Center you can then create a scheduler and distribute workers across the computers. But here there is another pitfall: the damn firewall! I strongly recommend disabling it completely, with all its inbound and outbound rules and all exceptions. Only this way was I able to get Admin Center to add all the computers on the LAN. By the way, when adding a computer you can give its name, for example Siegurd-PC, and need not be afraid of dashes. At least in the latest version of Matlab, this works.
When computers are added in the Admin Center, the mdce service must already be running on each of them and be visible among the processes. Matlab itself can be closed on each computer, since it is not involved in any way.

The Admin Center can also start the mdce service remotely, but I was not able to get this to work. Perhaps the cause is a lack of administrator rights on the other computers' folders, but this is not important and does not affect the task.
Also, when you start mdce you will most likely see messages like these:

Setting permissions on LOGBASE C:\Windows\TEMP\MDCE\Log 
Setting permissions on CHECKPOINTBASE C:\Windows\TEMP\MDCE\Checkpoint 
Setting permissions on SECURITY_DIR C:\Windows\TEMP\MDCE\Checkpoint\security 
Unable to give the "Administrators" group full control for C:\Windows\TEMP\MDCE\Checkpoint\security 
You may need to manually give the "Administrators" group read and write access to this directory. 
Unable to give the "CREATOR OWNER" group full control for C:\Windows\TEMP\MDCE\Checkpoint\security 
You may need to manually give the "CREATOR OWNER" group read and write access to this directory. 
Unable to give the "Authenticated Users" group traversal rights for C:\Windows\TEMP\MDCE\Checkpoint\security 
You may need to manually give the "Authenticated Users" group traversal rights to this directory. 

I simply ignore them, since they do not affect the health of the cluster. The article about correct installation describes in detail how to fix these errors, which are related to a Russian-language OS.

Important! If errors occur when using the Admin Center to create workers, the developers recommend making sure that port 7 is open.
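
A rough way to check this from the Matlab window is to try opening a connection to port 7 on each machine. The sketch below is only an illustration under assumptions: it uses a plain TCP connection through Matlab's built-in Java interface, the host list is a placeholder, and the timeout is an arbitrary choice. It is not the check that the Admin Center itself performs.

```matlab
% Hypothetical host list: replace with your own machine names or IP addresses.
hosts = {'localhost'};
port = 7;            % the port mentioned above
timeoutMs = 2000;    % assumed connection timeout, in milliseconds

isOpen = false(size(hosts));
for k = 1:numel(hosts)
    sock = java.net.Socket();
    try
        % Throws a Java exception if the port is closed, filtered,
        % or the host name cannot be resolved.
        sock.connect(java.net.InetSocketAddress(hosts{k}, port), timeoutMs);
        isOpen(k) = true;
    catch
        isOpen(k) = false;
    end
    sock.close();
    fprintf('%s, port %d reachable: %d\n', hosts{k}, port, isOpen(k));
end
```

A result of 0 only says that this particular TCP attempt failed; it does not tell you whether a firewall or the target machine itself is the cause.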

Create Scheduler

In the Admin Center, right-click any of the added computers and select Start MJS in the context menu, or click Start in the Admin Center window itself.


After creating the scheduler, add workers to each computer in the same way.
That completes the cluster setup on the local network.

Start computing

To start cluster computing, run the local version of Matlab on the Master and add the scheduler. In the main Matlab window, click Discover Clusters...


Then search the local network for the previously created scheduler.


After adding it, you need to select the number of workers in the settings of the parallel profile!


Go into Parallel Preferences and set the number of workers to connect to.

Important! If the number of workers set in the preferences is smaller than the number created in the Admin Center, the calculations will run only on the specified number of workers. That is, if you created 20 workers and the preference is set to 4, only the first 4 in the Admin Center list will work. The status of the workers you are connected to should change from idle to busy.
If the number set is larger than the number created, Matlab connects to all existing workers without errors.
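
The same choice can also be made from the command line instead of the preferences dialog. The following is a minimal sketch, assuming a cluster profile named 'MJSProfile1' (a placeholder, use the name of the profile you added) and a requested pool size of 4. To stay on the safe side it never asks for more workers than are currently idle.

```matlab
profileName = 'MJSProfile1';   % hypothetical profile name, replace with yours
requested = 4;                 % how many workers we would like to use

% Get a cluster object for the profile and report what the scheduler sees.
c = parcluster(profileName);
fprintf('%d workers in the cluster, %d idle\n', c.NumWorkers, c.NumIdleWorkers);

% Do not ask for more workers than are idle right now.
nWorkers = min(requested, c.NumIdleWorkers);
if nWorkers < 1
    error('No idle workers are available on this cluster.');
end

pool = parpool(c, nWorkers);
fprintf('Pool started with %d workers\n', pool.NumWorkers);
```

When the calculations are finished, delete(pool) shuts the pool down and returns the workers to the idle state.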

After all this, you can safely run your code, which will be parallelized across all the workers of the current scheduler (cluster). (Personally, I used the parfor loop for this, but there are other commands.)
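
As an illustration of the parfor approach, here is a small sketch that solves the same nonlinear system for several parameter values, one value per iteration. The van der Pol oscillator, the parameter values, the time span and the initial state are all made up for the example and are not the model from my own work.

```matlab
% Hypothetical problem: van der Pol oscillator for several damping values.
muValues = [0.5 1 2 4];
tSpan = [0 20];          % integration interval
y0 = [2; 0];             % initial state
finalState = zeros(numel(muValues), 2);

parfor k = 1:numel(muValues)
    mu = muValues(k);
    % Right-hand side of the system y1' = y2, y2' = mu*(1 - y1^2)*y2 - y1
    rhs = @(t, y) [y(2); mu * (1 - y(1)^2) * y(2) - y(1)];
    [~, y] = ode45(rhs, tSpan, y0);
    finalState(k, :) = y(end, :);   % keep only the state at the final time
end

disp(finalState);
```

Each iteration is independent of the others and writes only to its own row of finalState, which is what allows parfor to hand the iterations out to different workers.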
This scheme is used in both rooms, and the Masters are administered remotely from home through a remote-desktop tool. Unfortunately, the two clusters are not connected to each other, since that requires a fully connected network (every machine to every other), and I could not set up a VPN between so many computers.

Creating a Home VPN Cluster


In many ways the deployment was similar to the previous ones, but using a VPN created several obstacles, which I managed to get rid of only after a long struggle.
To unite the home computers and my girlfriend's computer in one network, the notorious Hamachi was used. The current version allows up to 5 machines in a network for free.

How to do everything right:
Install Hamachi on all computers. Create a virtual network on any one of them and connect all the others to it. Again, turn off the firewall if something does not work, then start the services and the scheduler.
When adding computers in the Admin Center, add them by IP address, not by name. This is important! The Master computer, from which calculations will be launched in the local version of Matlab, must instead be added by name.
In Hamachi's settings, disable traffic encryption, traffic compression, and traffic filtering (set the value to: allow all). This was the only way I could get all workers to the Connected status. Before these changes the workers were created, but their status was Failed to connect.
Once all the workers have the Connected status, you can safely start the calculations following the instructions above.

Note. Hamachi is good but unstable, especially when you experiment with network settings. If you suddenly cannot connect to a remote computer, I recommend restarting both machines (the Master and the computer it cannot reach). If that does not help, reinstall Hamachi.

One more thing. Once installed and started, the mdce service starts automatically with the computer until it is explicitly stopped. However, when network settings change or conflicts arise, restarting the service sometimes solves the problem. The commands are:

!mdce stop
!mdce start


Summary


Thus, after overcoming all these subtleties, 3 computing clusters were set up, and they can be administered remotely through a remote-desktop tool from any computer at any time. The main thing: do not forget to turn off sleep and automatic power-off on all computers!

Once again I want to thank the authors of the 2 articles used above. Thank you!
Indeed, without your efforts I would never have been able to bring these clusters up!

I hope my article helps people who wanted to create a Matlab computing cluster but stumbled on the pitfalls and could not get it working.

PS: If anyone knows how to create a fully connected, free VPN on Windows, please let me know. This will help many scientists in organizing and conducting serious scientific research.

Good luck with the deployment of clusters!

Thank you for your attention!
