Cluster performance testing under Windows. Linpack, Lizard

    Hello,

    Today's post is about the delicate topic of cluster performance testing. Many will say (and rightly so) that, by and large, the results of such tests serve only for press releases and TOP500 submissions and have no practical use. However, testing tools can also be used to identify system bottlenecks. In this first post we will talk about Linpack and Lizard.

    Table of contents:

    1) Linpack Overview
    2) The main parameters of Linpack
    3) Lizard. Linpack Implementation for Windows Systems
    4) Lizard. Linpack Optimization for Windows Systems
    5) Native cluster testing tools


    Note: some of the tests below measure the performance of the compute nodes, others the performance of the network. Together, these two factors determine overall cluster performance.

    1) Linpack Overview



    Since the 1980s, the Linpack library (since superseded by the more capable LAPACK, Linear Algebra PACKage) has been regarded as the reference benchmark for measuring the performance of supercomputers (not only clusters). Its distributed-memory version, HPL (High-Performance Linpack), is the benchmark used for the TOP500 list. The library has interfaces for Fortran and C.

    LAPACK analogues:
    * Intel MKL
    * AMD ACML
    * Sun Performance Library
    * NAG's LAPACK
    * HP's MLIB

    Each manufacturer, in the best IT tradition, develops its own library tuned for its architecture. Naturally, on Intel processors Intel's MKL library will deliver better performance than the reference LAPACK.

    The main task of Linpack and its analogues/modifications is to solve a system of linear algebraic equations of the form Ax = f by LU factorization with partial pivoting (the pivot is chosen as the largest element in the current column), where A is a dense matrix of order N. The original matrix is divided into logical blocks of size NB × NB. These blocks are in turn distributed over a P × Q grid of processes, so that each process of the system works on its own set of blocks.
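    To make the LU step concrete, here is a minimal serial sketch of LU factorization with partial pivoting applied to Ax = f. HPL itself runs a blocked, distributed version of this algorithm on top of optimized BLAS routines (such as those in MKL); this unblocked pure-Python version, with the hypothetical function name lu_solve, only illustrates the arithmetic.

```python
def lu_solve(a, f):
    """Solve A x = f by LU factorization with partial pivoting.

    a: n x n matrix as a list of rows, f: right-hand side of length n.
    The inputs are not modified. Raises ValueError if A is singular.
    """
    n = len(a)
    lu = [list(map(float, row)) for row in a]  # working copy, becomes L and U
    perm = list(range(n))                      # row permutation applied to f

    for k in range(n):
        # Partial pivoting: pick the largest |entry| in column k, rows k..n-1.
        piv = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[piv][k] == 0.0:
            raise ValueError("matrix is singular")
        if piv != k:
            lu[k], lu[piv] = lu[piv], lu[k]
            perm[k], perm[piv] = perm[piv], perm[k]
        # Eliminate below the pivot; multipliers are stored in the L part.
        for r in range(k + 1, n):
            lu[r][k] /= lu[k][k]
            for c in range(k + 1, n):
                lu[r][c] -= lu[r][k] * lu[k][c]

    # Forward substitution: L y = P f (L has an implicit unit diagonal).
    y = []
    for i in range(n):
        y.append(f[perm[i]] - sum(lu[i][j] * y[j] for j in range(i)))

    # Back substitution: U x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(lu[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / lu[i][i]
    return x
```

    For example, lu_solve([[2, 1, 1], [4, -6, 0], [-2, 7, 2]], [5, -2, 9]) returns [1.0, 1.0, 2.0]; the first step swaps rows because 4 is the largest entry in column 0.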

    More information about the mathematical basis of the test is available on the Intuit website: www.intuit.ru/department/supercomputing/tbucs/4/2.html

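    Performance in the Linpack benchmark is measured as the number of floating-point operations performed per second. The unit of measurement is the flop/s (FLOPS): one floating-point operation per second. Cluster results are usually quoted in Gflop/s (10^9 flop/s) or Tflop/s (10^12 flop/s).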
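    The flop/s figure is derived from an operation count rather than counted in hardware. To my understanding, HPL credits 2/3 N^3 + 3/2 N^2 floating-point operations for solving an N × N system and divides that by the wall-clock time of the run; the N^3 term dominates for any realistic N. The sketch below reproduces that arithmetic; the helper names are hypothetical.

```python
def hpl_flops(n):
    """Operation count credited for solving an N x N system: 2/3 N^3 + 3/2 N^2."""
    return 2.0 / 3.0 * n**3 + 1.5 * n**2


def gflops(n, seconds):
    """Performance in Gflop/s for a run of order n that took `seconds` seconds."""
    return hpl_flops(n) / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical timing, for illustration only.
    print(f"{hpl_flops(26000):.3e} operations")
    print(f"{gflops(26000, 1000.0):.1f} Gflop/s")
```

    For N = 26,000 this gives about 1.17 × 10^13 operations, so a hypothetical run of 1,000 s would correspond to roughly 11.7 Gflop/s.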

    2) The main parameters of Linpack



    N is the order (size) of the matrix. The larger N is, the more floating-point operations are performed. N is limited by the amount of memory the system can allocate to the HPL process. Lizard can pick what it considers optimal parameters on its own. For example, N = 26,000 suits four nodes with 2 GB of RAM each. Still, it is better to choose the value empirically, starting small and increasing it: performance drops noticeably once the system starts writing to the swap file, and at that point N should be reduced slightly to find the optimum. N must be equal to or greater than P * Q.
    P and Q are the dimensions of the process grid: P * Q must equal the number of processes (and, as noted above, must not exceed N). Setting P to the number of processors per node and Q to the number of nodes is a quite reasonable choice. Before configuring, take Hyper-Threading into account (or, better, disable it altogether).
    NB is the block size: the matrix is divided into NB × NB blocks, so NB determines how large a piece of data each process receives at a time. Smaller values generally give a more even processor load, but values that are too small reduce computational efficiency (less data reuse in cache) and increase the number of messages. In practice, try several values and compare the resulting performance on your architecture. N must be divisible by NB with no remainder.
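    The sizing rules above can be turned into a small calculator, similar in spirit to the Excel spreadsheet. This is a sketch under stated assumptions: the matrix holds 8-byte doubles; `fraction` is the share of total RAM you are willing to give HPL (a tuning guess, not a fixed rule); P × Q is chosen as close to square as possible with P ≤ Q (a common HPL tuning suggestion, offered as an alternative to the per-node rule above); and blocks are assigned to processes in the 2-D block-cyclic pattern HPL uses. All function names are hypothetical.

```python
import math

BYTES_PER_DOUBLE = 8  # HPL works in double precision


def estimate_n(total_mem_bytes, fraction, nb):
    """Largest N whose N x N matrix fits in `fraction` of the total memory,
    rounded down to a multiple of NB so that N % NB == 0."""
    elements = int(total_mem_bytes * fraction) // BYTES_PER_DOUBLE
    n = math.isqrt(elements)
    return n - n % nb


def near_square_grid(nprocs):
    """Return (P, Q) with P * Q == nprocs, P <= Q, and P as large as possible."""
    p = math.isqrt(nprocs)
    while nprocs % p:
        p -= 1
    return p, nprocs // p


def block_owner(i, j, p, q):
    """Grid coordinates of the process owning NB x NB block (i, j), block-cyclic."""
    return i % p, j % q


def blocks_per_process(n, nb, p, q):
    """Count how many NB x NB blocks each process in the P x Q grid receives."""
    if n % nb:
        raise ValueError("N must be a multiple of NB")
    nblocks = n // nb
    counts = {}
    for i in range(nblocks):
        for j in range(nblocks):
            owner = block_owner(i, j, p, q)
            counts[owner] = counts.get(owner, 0) + 1
    return counts
```

    For four nodes with 2 GiB each, estimate_n(4 * 2 * 2**30, 0.8, 100) gives 29300. The 26,000 quoted above occupies about 5.4 GB (26,000² × 8 bytes), roughly two-thirds of the 8 GB total, which leaves more headroom for the operating system. With four nodes of two processors each, near_square_grid(8) returns (2, 4), the same grid the per-node rule produces. blocks_per_process shows how evenly the blocks spread over the grid: a smaller NB yields more blocks and therefore a more even split.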

    For convenience, you can use the Excel Linpack spreadsheet: fill in the corresponding cells and it calculates the coefficient values for you.

    HPL saves its results, with detailed comments, to an output file in its working folder. Unfortunately, I was not able to get such a file from our configuration into a readable form.
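    If the output file is hard to read, the summary rows can be pulled out with a short script. The sketch below assumes the standard HPL layout, in which each result row contains T/V, N, NB, P, Q, Time and Gflops, and the T/V code starts with WR or WC. Lizard's own output may be formatted differently, so treat the regular expression as an assumption to adjust; the class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class HplResult(NamedTuple):
    variant: str   # T/V code, e.g. "WR..." (algorithm variant)
    n: int
    nb: int
    p: int
    q: int
    seconds: float
    gflops: float


# Assumed result-row layout: T/V  N  NB  P  Q  Time  Gflops
_NUM = r"([-+]?[\d.]+(?:[eE][-+]?\d+)?)"
_ROW = re.compile(
    r"^(W[RC]\S*)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+" + _NUM + r"\s+" + _NUM + r"$"
)


def parse_hpl_output(text):
    """Return one HplResult per result row found in HPL output text."""
    results = []
    for line in text.splitlines():
        m = _ROW.match(line.strip())
        if m:
            variant, n, nb, p, q, t, g = m.groups()
            results.append(
                HplResult(variant, int(n), int(nb), int(p), int(q), float(t), float(g))
            )
    return results
```

    Usage: parse_hpl_output(open(path).read()) returns the rows as named tuples, which can then be sorted by gflops to compare runs with different N, NB, P and Q.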

    3) Lizard. Linpack Implementation for Windows Systems



    Naturally, Microsoft, having unexpectedly broken into the TOP500 with its new system, could not stay on the sidelines. For lazy Windows administrators, a dedicated front end for cluster performance testing was developed: Lizard (Linpack Wizard). It wraps the canonical library in a convenient visual wizard and ships with the HPC Tool Pack 2008. The wizard offers both an express test, with standard parameters chosen automatically, and an advanced mode in which you set specific coefficients yourself. Every step comes with explanatory comments.

    4) Lizard. Linpack Optimization for Windows Systems



    For optimization, Microsoft recommends disabling all services that the system does not directly depend on. Script (run it from a command prompt with administrator rights):
    rem Stop the services for the current session
    sc stop wuauserv
    sc stop WinRM
    sc stop WinHttpAutoProxySvc
    sc stop WAS
    sc stop W32Time
    sc stop TrkWks
    sc stop SstpSvc
    sc stop Spooler
    sc stop ShellHWDetection
    sc stop RemoteRegistry
    sc stop RasMan
    sc stop NlaSvc
    sc stop NetTcpActivator
    sc stop NetTcpPortSharing
    sc stop netprofm
    sc stop NetPipeActivator
    sc stop MSDTC
    sc stop KtmRm
    sc stop KeyIso
    rem sc stop gpsvc
    sc stop bfe
    sc stop CryptSvc
    sc stop BITS
    sc stop AudioSrv
    sc stop SharedAccess
    sc stop SENS
    sc stop EventSystem
    sc stop PolicyAgent
    sc stop AeLookupSvc
    sc stop WerSvc
    sc stop hkmsvc
    sc stop UmRdpService
    sc stop MpsSvc
    rem Disable the services so they stay off after a reboot (sc requires "start=" with no space before "=")
    sc config wuauserv start= disabled
    sc config WinRM start= disabled
    sc config WinHttpAutoProxySvc start= disabled
    sc config WAS start= disabled
    sc config W32Time start= disabled
    sc config TrkWks start= disabled
    sc config SstpSvc start= disabled
    sc config Spooler start= disabled
    sc config ShellHWDetection start= disabled
    sc config RemoteRegistry start= disabled
    sc config RasMan start= disabled
    sc config NlaSvc start= disabled
    sc config NetTcpActivator start= disabled
    sc config NetTcpPortSharing start= disabled
    sc config netprofm start= disabled
    sc config NetPipeActivator start= disabled
    sc config MSDTC start= disabled
    sc config KtmRm start= disabled
    sc config KeyIso start= disabled
    rem sc config gpsvc start= disabled
    sc config bfe start= disabled
    sc config CryptSvc start= disabled
    sc config BITS start= disabled
    sc config AudioSrv start= disabled
    sc config SharedAccess start= disabled
    sc config SENS start= disabled
    sc config EventSystem start= disabled
    sc config PolicyAgent start= disabled
    sc config AeLookupSvc start= disabled
    sc config WerSvc start= disabled
    sc config hkmsvc start= disabled
    sc config UmRdpService start= disabled
    sc config MpsSvc start= disabled
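    Because the stop list and the disable list must name exactly the same services, it can help to generate both from a single list so they cannot drift apart. The Python sketch below is a hypothetical helper, not part of Microsoft's recommendation. It uses the services from the script above (gpsvc stays commented out there, so it is omitted), prints every command, and only executes them when dry_run is False on a Windows machine.

```python
import subprocess
import sys

# Services taken from the batch script above.
SERVICES = [
    "wuauserv", "WinRM", "WinHttpAutoProxySvc", "WAS", "W32Time", "TrkWks",
    "SstpSvc", "Spooler", "ShellHWDetection", "RemoteRegistry", "RasMan",
    "NlaSvc", "NetTcpActivator", "NetTcpPortSharing", "netprofm",
    "NetPipeActivator", "MSDTC", "KtmRm", "KeyIso", "bfe", "CryptSvc", "BITS",
    "AudioSrv", "SharedAccess", "SENS", "EventSystem", "PolicyAgent",
    "AeLookupSvc", "WerSvc", "hkmsvc", "UmRdpService", "MpsSvc",
]


def sc_commands(services):
    """All 'sc stop' commands first, then the matching 'sc config' commands.
    "start=" and "disabled" are separate arguments, as sc.exe expects."""
    stops = [["sc", "stop", name] for name in services]
    disables = [["sc", "config", name, "start=", "disabled"] for name in services]
    return stops + disables


def run_commands(commands, dry_run=True):
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run and sys.platform == "win32":
            # check=False: 'sc stop' fails harmlessly for a service already stopped.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    run_commands(sc_commands(SERVICES))  # dry run: print only
```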


    5) Native cluster testing tools



    In addition to Linpack and Lizard, Windows HPC Server 2008 (more precisely, HPC Pack 2008) includes standard cluster performance tests, such as:
    * MPI Ping-Pong Lightweight Throughput (packet transfer between nodes)
    * MPI Ping-Pong Quick Check (checks network latency, bandwidth, etc.)

    Of course, this is not the full list: there are more than 10 tests covering all of the cluster's functionality.
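    The idea behind the ping-pong tests can be illustrated with a small sketch: one side sends a message, the other echoes it back, half of the average round-trip time gives the one-way latency, and the message size divided by that time gives the bandwidth. This is not the HPC Pack test, which runs over MPI between nodes; it is a hypothetical localhost illustration over a socket pair, so the numbers reflect only local socket overhead, not your cluster interconnect.

```python
import socket
import threading
import time


def _recv_exact(sock, n):
    """Read exactly n bytes; recv() may return fewer per call."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def _echo(sock, size, reps):
    # "Pong" side: send every received message straight back.
    for _ in range(reps):
        sock.sendall(_recv_exact(sock, size))


def ping_pong(size, reps=100):
    """Return (one-way latency in seconds, bandwidth in bytes/s) for `size`-byte messages."""
    a, b = socket.socketpair()
    pong = threading.Thread(target=_echo, args=(b, size, reps))
    pong.start()
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(reps):
        a.sendall(payload)       # ping
        _recv_exact(a, size)     # wait for the echo
    elapsed = time.perf_counter() - start
    pong.join()
    a.close()
    b.close()
    one_way = elapsed / reps / 2  # half of the average round trip
    return one_way, size / one_way


if __name__ == "__main__":
    for size in (8, 1024, 1 << 20):
        lat, bw = ping_pong(size)
        print(f"{size:>8} B: {lat * 1e6:10.1f} us one-way, {bw / 1e6:10.1f} MB/s")
```

    Small messages expose latency, large ones expose bandwidth, which is why such tests sweep over message sizes.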

    Thank you for your attention.
