VK_practice September 5, 2016 at 13:42

Huawei KunLun Server - Our Testing

First, a few words about the KunLun architecture, since there is practically no information about it in the Russian-language segment of the Internet. KunLun was designed as a High-End platform, so all of its components are redundant, including the management modules and the NUMA node controllers. Redundancy is not the server's only High-End feature, however: without stopping the OS you can replace not only PCIe cards (which is nothing new) but also processors and memory. The system warns proactively about components that are likely to fail soon, without waiting for the failure itself, so they can be replaced while the OS keeps running. At present, hot swapping of processors and memory modules is supported only in EulerOS (Huawei's counterpart to CentOS). Out-of-the-box support for RHEL and SLES is promised soon.

Server boards, each carrying 1 processor and 24 memory modules, are joined through a switching fabric into physical partitions of 4, 8, 16 or 32 processors. Finer granularity is possible only with logical partitioning (a hypervisor).
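As a quick sanity check on these figures, the sketch below counts the memory modules in each physical partition size. The per-module capacity is not stated in the article; it is inferred here from the 24 TB maximum quoted for the 32-processor configuration, assuming every slot holds an identical module.

```python
# One board = 1 processor + 24 memory modules (figures from the article).
DIMMS_PER_BOARD = 24
PARTITION_SIZES = (4, 8, 16, 32)  # processors per physical partition

# Assumption: the 24 TB maximum refers to a fully populated
# 32-processor partition with identical modules in every slot.
MAX_MEMORY_GB = 24 * 1024


def dimm_count(processors: int) -> int:
    """Number of memory modules in a partition with this many processors."""
    return processors * DIMMS_PER_BOARD


def implied_dimm_size_gb() -> float:
    """Module size implied by the 24 TB maximum (an inference, not a spec)."""
    return MAX_MEMORY_GB / dimm_count(max(PARTITION_SIZES))


def partition_memory_tb(processors: int) -> float:
    """Memory of a fully populated partition, under the same assumption."""
    return dimm_count(processors) * implied_dimm_size_gb() / 1024


if __name__ == "__main__":
    for n in PARTITION_SIZES:
        print(n, "CPUs:", dimm_count(n), "modules,", partition_memory_tb(n), "TB")
```

Under that assumption the largest partition holds 768 modules of 32 GB each, and the smallest 4-processor partition tops out at 3 TB.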

The server also has internal drives: up to 4 drive cages of 12 disks each, with hardware RAID available within each cage. In some cases this makes an external disk array unnecessary.

What is the main feature of KunLun? The ability to combine up to 32 Intel Xeon processors and up to 24 TB of memory in a single partition. As a bonus, the system uses Huawei's own BIOS, and the vendor is ready to provide its source code for software certification.

Why can't every manufacturer offer a 32-processor system?

With standard means alone, no more than 8 Intel processors can be combined in one server. Going beyond that requires special devices, NUMA node controllers. Intel does not produce them, but the QPI bus is designed to allow for them. HP, SGI and Huawei took advantage of this, each building its own controller. Clearly, creating such a controller involves large-scale research and the associated costs; Huawei's development, for example, took 8 years.

The remaining vendors (Intel among them) declined to develop their own controllers. Why? First, increasing the number of processors slows down memory access. This is largely due to the need to keep processor caches coherent: the more processors have cached a region of memory, the more notifications are required when one of them modifies it. Second, one to four processors are enough for the vast majority of computing tasks.
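The cache argument can be illustrated with a deliberately simplified toy model. It assumes that a write to a cached region sends exactly one invalidation notification to every other processor holding a copy; this is an illustrative assumption, not a description of how QPI or any vendor's node controller actually works.

```python
def invalidations_per_write(sharers: int) -> int:
    """Notifications sent when one of `sharers` processors writes the region.

    Toy model: one message per other processor holding a cached copy.
    """
    if sharers < 1:
        raise ValueError("at least the writing processor must hold the region")
    return sharers - 1


def worst_case_by_processor_count(counts):
    """Worst case: every processor in the system has cached the region."""
    return {n: invalidations_per_write(n) for n in counts}


if __name__ == "__main__":
    for n, msgs in worst_case_by_processor_count([4, 8, 16, 32]).items():
        print(f"{n} processors: up to {msgs} notifications per write")
```

Even in this crude model the worst-case traffic per write grows linearly with the number of processors, which is the effect described above.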

EulerOS

The manufacturer claims that processors and memory can be replaced on the fly. This requires a specialized OS, EulerOS. Information about it on the Internet is very scarce and mostly concerns its certification against the latest version of Linux Standard Base. In practice, EulerOS turned out to be built from RHEL (Red Hat Enterprise Linux) sources, much like CentOS. Huawei customizes it for its own hardware, in particular by adding CPU/RAM hot-swap drivers.

In addition to EulerOS, support on KunLun has been announced for RHEL, SLES 11 and 12, and Windows Server 2012.

SPECint / SPECfp Performance Test

KunLun does fine with arithmetic. During the SPECint run, processes are bound to specific cores and work only with local memory.
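Binding a process to a core can be sketched in a few lines. This is a minimal illustration using Python's standard `os.sched_setaffinity`, which is exposed only on Linux; it is not the tooling used in the test, and the helper name `pin_to_core` is made up for this example. Pinning the CPU does not by itself bind memory: keeping allocations on the local NUMA node is normally handled by the operating system's memory policy or by tools such as numactl.

```python
import os


def pin_to_core(core: int) -> set:
    """Restrict the current process to a single core.

    Returns the affinity set reported by the OS after the change.
    Hypothetical helper for illustration; Linux only.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity is only exposed on Linux")
    os.sched_setaffinity(0, {core})  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Pick a core the process is already allowed to run on.
    allowed = sorted(os.sched_getaffinity(0))
    print("pinned to:", pin_to_core(allowed[0]))
```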

Server	SPECint	SPECfp
SGI UV 300 (32 x Intel Xeon E7-8890 v3)	22600	15700
KunLun 9032 (32 x Intel Xeon E7-8890 v3)	22900	16300
IBM Power E880 (16 x Power8 4.0 GHz, 192 cores)	14400	11400
KunLun 9016 (16 x Intel Xeon E7-8890 v3)	11700	8050
SGI UV 300 (16 x Intel Xeon E7-8890 v3)	11400	7880
Integrity Superdome X (16 x Intel Xeon E7-8890 v3)	11100	7670
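To make the table easier to read, the following sketch computes two ratios from it: how the 16-processor KunLun compares with the IBM Power E880, and how KunLun scores change when going from 16 to 32 processors. The numbers are copied from the table; the dictionary layout and function names are only for this illustration.

```python
# (SPECint, SPECfp) scores from the table, keyed by (server, processors).
RESULTS = {
    ("SGI UV 300", 32): (22600, 15700),
    ("KunLun 9032", 32): (22900, 16300),
    ("IBM Power E880", 16): (14400, 11400),
    ("KunLun 9016", 16): (11700, 8050),
    ("SGI UV 300", 16): (11400, 7880),
    ("Integrity Superdome X", 16): (11100, 7670),
}


def ratio(a, b):
    """Element-wise ratio of two (SPECint, SPECfp) score pairs."""
    return tuple(x / y for x, y in zip(a, b))


def kunlun_vs_power():
    """KunLun 9016 scores as a fraction of the IBM Power E880 scores."""
    return ratio(RESULTS[("KunLun 9016", 16)], RESULTS[("IBM Power E880", 16)])


def kunlun_scaling():
    """Score growth when going from KunLun 9016 to KunLun 9032."""
    return ratio(RESULTS[("KunLun 9032", 32)], RESULTS[("KunLun 9016", 16)])


if __name__ == "__main__":
    print("KunLun 9016 / Power E880:", kunlun_vs_power())
    print("KunLun 9032 / KunLun 9016:", kunlun_scaling())
```

By these numbers the 16-processor KunLun reaches roughly 81% of the E880's SPECint score and 71% of its SPECfp score, and doubling from 16 to 32 processors roughly doubles both scores.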

The comparison of KunLun with the top-end IBM Power E880 (also with 16 processors) turned out to be interesting: the gap between them is not that wide. In other words, in compute performance the Intel Xeon-based Huawei server is a real competitor to Power8.

SLOB Performance Test (Oracle)

Here the test measured mainly the speed of memory access rather than of the calculations themselves. DBMS processes are not bound to NUMA nodes; for the test, all memory is treated as equidistant from the processors. The results confirmed that server performance scales non-linearly as resources are added.

A sevenfold increase in processor capacity (from 16 to 144 cores, allowing for the lower clock frequency) produced a 5-fold increase in server performance (71% efficiency). A 4-fold increase in the number of cores, from 16 (4 CPUs) to 64 (16 CPUs), raised performance 2.7 times (68% efficiency).
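The efficiency figures follow from dividing the measured speedup by the growth in resources. A minimal sketch of that arithmetic, using only the numbers quoted above (the function name is illustrative):

```python
def scaling_efficiency(speedup: float, resource_growth: float) -> float:
    """Fraction of the added resources that turned into extra performance."""
    if resource_growth <= 0:
        raise ValueError("resource growth must be positive")
    return speedup / resource_growth


if __name__ == "__main__":
    # 7x processor capacity gave a 5x performance increase.
    print(f"{scaling_efficiency(5, 7):.1%}")
    # 4x cores (16 -> 64) gave a 2.7x performance increase.
    print(f"{scaling_efficiency(2.7, 4):.1%}")
```

The two cases come out at about 0.714 and 0.675, which round to the 71% and 68% quoted in the text.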

KunLun Applications

The main advantage of KunLun is the impressive amount of memory on board (24 TB now, 32 TB in the future). This matters most for in-memory analytics, where the entire database is held in RAM. KunLun makes it possible to cut data access time by 3 orders of magnitude compared with hard drives, which speeds up database queries. KunLun is a good fit for SAP HANA and SAP S/4HANA: the amount of memory allows HANA to run even in a single-node KunLun configuration. Oracle Database (especially with the In-Memory option) and QlikView also look good on the Chinese super server.

Retailers can use this solution as a SAP HANA platform for analyzing large volumes of data on customer demand for particular goods, stock balances and so on. The combination of the Oracle In-Memory option and KunLun will help banks assess customers' creditworthiness on the fly, calculate capital adequacy ratios and so on. Telecom operators will be able to build customer loyalty management on this solution: customer profiling and targeting.

In addition, KunLun, as an x86-based system, can replace RISC systems. Some companies have vertically scalable workloads that outgrew the x86 servers of the past and now run on RISC. The cost of KunLun is comparable to the price of one year of maintenance for a RISC system. KunLun is not inferior to RISC systems in reliability and wins on the variety of application software available. Notably, in its home market KunLun is actively used for import substitution, mainly as a platform for migration from RISC systems.

This article was prepared by Dmitry Glushenko, a systems architect at the Jet Infosystems Computer Design Center. We welcome your constructive comments.
