"Find the Five Differences." Scalable and Generation Difference - New Batch of Tests
Less than two years after the first announcement, Intel introduced the second generation of Intel Xeon Scalable processors, built on the new Cascade Lake architecture. The official launch date was April 2. The company itself calls it the largest launch in its history and strategically very important. Well, let's figure out what is so special about these new Scalables.
What stayed the same?
Cascade Lake processors, or more precisely Cascade Lake-SP, still belong to the Purley platform like their Skylake predecessors; this is now its second generation, Purley Refresh. They are socket-compatible with Skylake, and the chipsets and motherboards are inherited from the first generation. There are nuances, though: for example, a new BIOS is needed.
The process technology has not changed: it is the same 14 nm, albeit with optimizations.
The general naming scheme with the Platinum, Gold, Silver and Bronze series has remained the same. There are more suffixes, though: Y, N, V and S were added to the existing L, M and T. The second digit (hundreds) of the model number has changed from one to two, so that, for example, Gold 6240 is the successor to Gold 6140.
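The numbering rule can be written down as a tiny sketch. The helper name `second_gen_successor` is mine and purely illustrative; it encodes only the "hundreds digit 1 becomes 2" rule from the paragraph above and ignores suffixes. It is an assumption that a given first-generation model actually has a direct successor.

```python
import re

# Hypothetical helper: encodes only the numbering rule described in the text.
# It does not check whether such a second-generation model really exists.
def second_gen_successor(model: str) -> str:
    """Map a first-generation name such as 'Gold 6140' to its second-generation name."""
    match = re.fullmatch(r"(Platinum|Gold|Silver|Bronze) (\d)1(\d{2})", model)
    if match is None:
        raise ValueError(f"not a first-generation Xeon Scalable name: {model!r}")
    series, tier, tail = match.groups()
    # Only the hundreds digit changes: 1 -> 2.
    return f"{series} {tier}2{tail}"


if __name__ == "__main__":
    for name in ("Silver 4110", "Silver 4114", "Gold 6130", "Gold 6140"):
        print(name, "->", second_gen_successor(name))
```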
Otherwise, the basic characteristics and feature set have not changed. The core counts and cache sizes hold their positions: up to 28 cores, 1 MB of L2 per core and up to 38.5 MB of shared L3. The number and type of PCIe lanes are the same as before: 48 lanes of version 3.0. Scalability is also unchanged: up to 3 UPI links at 10.4 GT/s and up to 8 sockets in a glueless system.
What was added?
In general, there are many small updates, but I would single out the following as more or less significant.
First, Cascade Lake brought hardware fixes for last year's sensational vulnerabilities. Intel introduced software and hardware mitigations against variants 2 (Spectre), 3, 3a and 4 (Spectre-NG) and L1TF (Foreshadow). For Spectre variant 1, only a software patch is still offered. That is, everything that is already in the Intel Core i9 line. This is how it looks in the press release:
- Variant 1. Protection is provided by the OS and VMM (Virtual Machine Monitor)
- Variant 2. Hardware branch prediction hardening (to prevent future attacks by this method) + OS and VMM
- Variant 3. Hardware hardening
- Variant 3a. Hardware
- Variant 4. Hardware + OS/VMM
- L1TF. Already closed by the hardware hardening for variant 3
Secondly, support for DDR4-2933 memory appeared, but with reservations: only for the Gold and Platinum lines (Bronze and Silver still work with DDR4-2400) and only with one DIMM per channel. With two DIMMs per channel, the speed drops to 2666 MT/s.
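To keep these reservations straight, here is a minimal sketch of the rule exactly as stated in the paragraph above. The function name `effective_memory_speed` is hypothetical, and the per-series values are taken from this article only, not from per-model Intel specifications.

```python
# Hypothetical helper that encodes the rule as this article states it.
def effective_memory_speed(series: str, dimms_per_channel: int) -> int:
    """Return the memory speed in MT/s for a series and DIMM population."""
    if dimms_per_channel not in (1, 2):
        raise ValueError("expected 1 or 2 DIMMs per channel")
    name = series.lower()
    if name in ("bronze", "silver"):
        # Per the article, these lines still work with DDR4-2400.
        return 2400
    if name in ("gold", "platinum"):
        # DDR4-2933 only with one DIMM per channel, otherwise 2666 MT/s.
        return 2933 if dimms_per_channel == 1 else 2666
    raise ValueError(f"unknown series: {series!r}")


if __name__ == "__main__":
    for series in ("Silver", "Gold"):
        for dpc in (1, 2):
            print(series, f"{dpc} DIMM(s) per channel:",
                  effective_memory_speed(series, dpc), "MT/s")
```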
Thirdly, Intel Optane DC Persistent Memory (DCPM) premiered. The clearest wording about what it is came from Tiskoma, so I quote:
“Intel Optane DC Persistent Memory (DCPM) is a new class of technology that combines the concepts of ‘memory’ and ‘storage’ for use in data centers.”
You may recall that Intel previously introduced Intel Memory Drive Technology for Xeon Skylake: a hypervisor (Xen) plus Optane NVMe modules. We even ran tests on it, but the results were not inspiring, so we decided to wait for a more impressive solution. It seems the wait is over =)
At the core of Intel's new solution are DCPMM modules, which look like DIMMs and are electrically and mechanically compatible with them. They operate at 2666 MT/s and come in capacities of 128, 256 and 512 GB. At the logical level, they use the DDR4-T (Transaction) protocol, which, according to Intel, is approved by JEDEC, but in practice is supported only by Cascade Lake memory controllers. In other words, Intel put non-volatile memory made with 3D XPoint technology into a DDR4 DIMM slot; according to Intel, it outperforms widespread NAND flash by three orders of magnitude (1000 times) in metrics such as speed and endurance.
The solution turned out to be very interesting and extremely ambiguous: it has its operating peculiarities (as usual), and there are questions of price and applications. We will not dwell on this killer feature of the new processor line here - a detailed story about it goes far beyond the scope of today's article. As soon as the tests in all possible operating modes of this technology are ready, we will publish a longread right away :-)
Fourth, Intel Resource Director Technology (RDT), Speed Select Technology (SST) and Intel DL Boost have been upgraded.
I'll start with RDT. It provides mechanisms for fairly fine-grained monitoring and control of application execution and resource usage. The feature is not new, but in this line it has been thoroughly reworked. The bottom line is that a higher-priority application gets everything it needs on time - naturally, at the expense of "infringing the rights" of other applications.
Now SST. It is the same idea, but at the level of cores: it lets you firmly designate a group of cores that will have higher priority than the others. This is not its debut either, but the appearance is quite spectacular.
And for dessert, Intel DL Boost. This is a new set of instructions, previously known as Vector Neural Network Instructions (VNNI). It is a feature for AI, or rather, for more flexible training of deep learning networks. In fact, it is another extension on top of AVX-512.
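To give a feel for what such an instruction does, below is a simplified pure-Python model of one 32-bit lane of VPDPBUSD, one of the VNNI instructions: four unsigned 8-bit values are multiplied by four signed 8-bit values, and the sum of the products is added to a 32-bit accumulator. This is only a sketch of the arithmetic; the function name is mine, and the real instruction processes many such lanes of a vector register at once.

```python
# Simplified model of a single 32-bit lane of VPDPBUSD (non-saturating form).
# The function name and list-based interface are illustrative assumptions.
def vpdpbusd_lane(acc: int, unsigned_bytes: list, signed_bytes: list) -> int:
    """Return acc + sum(u8[i] * s8[i]) for i in 0..3, wrapped to signed 32 bits."""
    if len(unsigned_bytes) != 4 or len(signed_bytes) != 4:
        raise ValueError("each lane holds exactly four 8-bit values")
    if any(not 0 <= u <= 255 for u in unsigned_bytes):
        raise ValueError("first operand must contain unsigned 8-bit values")
    if any(not -128 <= s <= 127 for s in signed_bytes):
        raise ValueError("second operand must contain signed 8-bit values")
    total = acc + sum(u * s for u, s in zip(unsigned_bytes, signed_bytes))
    # Emulate 32-bit wraparound of the accumulator.
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total & 0x80000000 else total


if __name__ == "__main__":
    # 10 + (1*5 + 2*(-6) + 3*7 + 4*(-8)) = 10 + (-18) = -8
    print(vpdpbusd_lane(10, [1, 2, 3, 4], [5, -6, 7, -8]))
```

Without VNNI, the same multiply-and-accumulate takes a sequence of several AVX-512 instructions, which is where the gain comes from.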
And finally, fifth. By old tradition, Intel refreshes bring higher frequencies and more cores :-) Both base and boost frequencies have grown by 200-300 MHz. With some exceptions, two cores were added per processor. The amount of supported RAM has also increased.
Separately, it is worth noting Intel's work on optimizing cache and RAM usage, probably to minimize the negative performance impact of the patches for the Spectre and Meltdown family of vulnerabilities.
More details on the Cascade Lake architecture can be found on WikiChip. I recommend reading it. And now - the traditional testing.
Testing
The testing involved eight Intel Xeon Scalable processors:
- first generation - Silver 4110, Silver 4114, Gold 6130, Gold 6140
- second generation - Silver 4210, Silver 4214, Gold 6230 and Gold 6240.
Platform specifications
All processors have the same basic configuration.
- Platform: Intel Corporation S2600WFT (BIOS SE5C620.86B.02.01.0008.031920191559)
- RAM:
  - 16 GB Samsung DDR4-2933 - 12 modules (one per channel) for Gold 6230 and 6240 processors
  - 16 GB Samsung DDR4-2666 - 12 modules (one per channel) for Gold 6130 and 6140 processors
  - 16 GB Samsung DDR4-2400 - 12 modules (one per channel) for Silver processors of both generations
- SSD: Intel DC S4500 480 GB - 2 pieces in RAID1
- Dual processor configuration
Software: CentOS Linux 7 x86_64 (7.6.1810)
Kernel: 3.10.0-957.12.2.el7.x86_64
Optimizations relative to the standard installation: the kernel boot options elevator=noop selinux=0 were added
Testing was performed with all patches against the Spectre, Meltdown and Foreshadow attacks backported to this kernel.
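If you want to check the same things on your own server, a minimal sketch might look like this. It assumes a Linux kernel that exposes mitigation status under /sys/devices/system/cpu/vulnerabilities and the boot parameters in /proc/cmdline; the function names are mine, and this is not the script used for the article.

```python
from pathlib import Path

# Assumption: the running kernel exposes this sysfs directory.
VULN_DIR = "/sys/devices/system/cpu/vulnerabilities"


def parse_cmdline(text: str) -> dict:
    """Split a kernel command line into {parameter: value or None}."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def read_mitigation_status(base: str = VULN_DIR) -> dict:
    """Return {vulnerability name: status line}; empty if the directory is absent."""
    directory = Path(base)
    if not directory.is_dir():
        return {}
    status = {}
    for entry in sorted(directory.iterdir()):
        try:
            status[entry.name] = entry.read_text().strip()
        except OSError:
            # Skip entries that cannot be read.
            continue
    return status


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.is_file():
        params = parse_cmdline(cmdline_file.read_text())
        print("elevator =", params.get("elevator"), "| selinux =", params.get("selinux"))
    for name, line in read_mitigation_status().items():
        print(f"{name}: {line}")
```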
The list of tests that we will conduct:
- Geekbench
- Sysbench
- Phoronix Test Suite
Detailed test description
Geekbench Test
A package of tests run in single-threaded and multi-threaded modes. The result is a performance index for each mode. In this test, we consider two main indicators:
- Single-Core Score - single-threaded tests.
- Multi-Core Score - multi-threaded tests.
Units of measure: abstract "parrots". The more parrots, the better.
Sysbench test
Sysbench is a package of tests (or benchmarks) for evaluating the performance of various computer subsystems: processor, RAM and storage devices. The test is multi-threaded and uses all cores. In this test, I measured one indicator, CPU speed events per second: the number of operations the processor performs per second. The higher the value, the more productive the system.
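For those who want to reproduce the measurement, here is a rough sketch of how the indicator could be collected. The exact invocation is not given in the article, so the command line (sysbench 1.0 syntax, `sysbench cpu --threads=N run`) and the helper names are my assumptions, and the sample text in the usage part is illustrative, not a real result.

```python
import os
import re
import subprocess


def parse_events_per_second(output: str) -> float:
    """Extract the 'events per second' value from sysbench CPU test output."""
    match = re.search(r"events per second:\s*([0-9]+(?:\.[0-9]+)?)", output)
    if match is None:
        raise ValueError("no 'events per second' line found")
    return float(match.group(1))


def run_sysbench_cpu(threads=None) -> float:
    """Run the sysbench CPU test on all cores (assumes sysbench 1.0 is installed)."""
    threads = threads or os.cpu_count() or 1
    result = subprocess.run(
        ["sysbench", "cpu", f"--threads={threads}", "run"],
        capture_output=True, text=True, check=True,
    )
    return parse_events_per_second(result.stdout)


if __name__ == "__main__":
    # Illustrative text in the shape of sysbench output, not a measured value.
    sample = "CPU speed:\n    events per second:  1234.56\n"
    print(parse_events_per_second(sample))
```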
Phoronix Test Suite
Phoronix Test Suite is a very rich test suite. Almost all the tests presented here are multithreaded. Only two of them are an exception: single-threaded tests Himeno and LAME MP3 Encoding.
In these tests, the higher the score, the better.
- John the Ripper - a multi-threaded password-cracking test. We use the Blowfish crypto algorithm. It measures the number of operations per second.
- Himeno - a linear solver for the Poisson pressure equation using the Jacobi point method.
- 7-Zip Compression - a 7-Zip test using p7zip with its integrated benchmark function.
- OpenSSL - a set of tools that implement the SSL (Secure Sockets Layer) and TLS (Transport Layer Security) protocols. The test measures RSA 4096-bit performance in OpenSSL.
- Apache Benchmark - the test measures how many requests per second a given system can sustain when executing 1,000,000 requests with 100 requests running concurrently.
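The Apache Benchmark parameters from the list above map directly onto the `ab` utility: `-n` is the total number of requests and `-c` is the concurrency level. Below is a small sketch that only builds such a command line; the target URL is a placeholder of mine, and the Phoronix Test Suite sets up its own server and arguments.

```python
# Builds an ApacheBench command line with the parameters named in the article.
# The URL is a hypothetical placeholder; point it at your own test server.
def build_ab_command(url: str, requests: int = 1_000_000, concurrency: int = 100) -> list:
    """Return the argument list for an 'ab' run with -n (requests) and -c (concurrency)."""
    if concurrency > requests:
        raise ValueError("concurrency cannot exceed the total number of requests")
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


if __name__ == "__main__":
    print(" ".join(build_ab_command("http://localhost/")))
```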
And in these, lower is better: every one of them measures the time taken to complete.
- C-Ray tests CPU performance on floating-point calculations. The test is multi-threaded (16 threads per core), shoots 8 rays from each pixel for anti-aliasing and generates a 1600x1200 image. The time taken to complete the test is measured.
- Parallel BZIP2 Compression - the test measures the time it takes to compress a file (a .tar archive of the Linux kernel source code) using BZIP2 compression.
- LAME MP3 Encoding - an audio encoding test that runs in a single thread. The time taken to complete the test is measured.
- Timed GCC Compilation - shows how long it takes to build the GNU GCC compiler (version 8.2.0). Units are seconds.
In this round of testing, I removed the ffmpeg test because it stopped running properly on the total number of cores that modern Golds have in a dual-processor configuration.
Test results
In the Geekbench test, both single-threaded and multi-threaded, the new Scalables beat the old ones across the board: by 3% to 6% in the single-threaded test and by 6% to 13% in the multi-threaded one. The apotheosis is Silver 4210, which is a full 33% better than Silver 4110.
In the Sysbench test, the difference ranges from 22% to 37%, except for the Gold 6140 and Gold 6240 pair, where the gap is minimal: 7% in favor of the new processor.
In the John the Ripper test, Silver 4210 overtakes Silver 4110 by 41%, and the difference between Silver 4214 and Silver 4114 is almost 30% - naturally, in favor of the former. Now the Golds. Gold 6230 is 16% faster than Gold 6130. The minimum gap, between Gold 6140 and Gold 6240, is 7.6%.
Silver 4210 overtakes Silver 4110 by 29%, and Silver 4214 beats its predecessor by 23%. The gaps within the Gold pairs are 20% and 8%, respectively.
In the single-threaded Himeno test, you can see the pure effect of the extra 200-300 MHz: from 2.2% to 6% in favor of the new generation.
The compress-7zip test almost exactly repeats the result of the John the Ripper Blowfish test. There is a handsome gap between Silver 4110 and Silver 4210: the 4210 is almost 35% faster than its predecessor. Silver 4214 and Gold 6230 are 18% and 20% better than the 4114 and 6130, respectively. The gap between Gold 6140 and Gold 6240 is minimal: the new one is 4.7% better than the old.
In the compress-pbzip2 test, the picture is similar to compress-7zip. The one significant difference is that the gap between Gold 6130 and Gold 6230 has narrowed; here it is 5.6%.
In the single-threaded encode-mp3 test, we again see the effect of the extra 200-300 MHz: the second-generation Scalables are 4% to 7% better than the first in this test.
In the OpenSSL test, the biggest gap, 41%, is between Silver 4110 and Silver 4210. Between the 4114 and 4214 it is 29%. The Golds show less: 23% between Gold 6130 and 6230, and 4.6% for the Gold 6140 and 6240 pair. I note that Gold 6240 is only 0.78% better than Gold 6230.
In the Apache test, Silver 4210 is 40% better than Silver 4110, Silver 4214 overtakes Silver 4114 by 36%, Gold 6230 is 21% better than Gold 6130, and Gold 6240 beats Gold 6140 by 29%. I will dwell on Silver 4210, Silver 4214 and Gold 6230 in particular: Gold 6230 is only 3% better than Silver 4210 and 1.5% better than Silver 4214. That is, the gap is minimal. Gold 6240 is 13% better than Gold 6230.
In the GCC test, the new generation outperforms its predecessors by about 19%, 16%, 11% and 9.5%, respectively.
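A note on how percentages like these can be derived from raw results. The sketch below is one reasonable way to do it, not necessarily the exact formula used for the charts: for "higher is better" tests the gain is the relative growth of the score, and for timed tests I assume it is expressed as the speedup (old time divided by new time). The function name and the sample numbers are hypothetical.

```python
# One possible way to compute a generation-over-generation gain.
# Assumption: for timed tests the gain is old_time / new_time - 1.
def percent_gain(old: float, new: float, higher_is_better: bool = True) -> float:
    """Return how much better the new result is, in percent."""
    if old <= 0 or new <= 0:
        raise ValueError("results must be positive")
    ratio = new / old if higher_is_better else old / new
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(f"score 100 -> 133: {percent_gain(100, 133):.1f}%")
    print(f"time 120 s -> 100 s: {percent_gain(120, 100, higher_is_better=False):.1f}%")
```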
What is the result?
We observe a significant gap between Silver 4110 and Silver 4210: in multi-threaded tests the new generation is better than the previous one by roughly 20% to 40%. Thank you, frequencies and cores.
The difference between Silver 4114 and Silver 4214 is already smaller; it peaks at 36% in the Apache test.
Further on, the gap narrows. Gold 6230 is ahead of Gold 6130 by 11% in the GCC test and by up to 23% in the OpenSSL test.
And finally, the smallest gap is in the Gold 6140 and Gold 6240 pair: the new one is 3% to 10% ahead of the old one in most tests. The exception is the Apache test, where the difference is 28% - fewer cores, higher base frequency (Apache is generally a very interesting test).
And now we pass to additional tests. But first, a brief background.
RAM Testing
The new Gold 62xx Intel Xeon Scalable processors now support a new memory speed, DDR4-2933. Quite logically, we asked ourselves how much the memory frequency affects overall system performance. In general, on the assumption that a plus added to a plus always gives something positive, we expected a fresh processor paired with the new memory to perform great. But it is one thing to assume and another to verify experimentally.
For the test, we took the Gold 6240 processor in a dual-processor configuration. The platform specifications and software have not changed. We tested three kinds of memory: DDR4-2400, DDR4-2666 and DDR4-2933.
It is always nice to have everything you need at hand to test a hypothesis =) And now let's see what came of it.
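For reference, here is what the three memory speeds mean on paper. The sketch computes the theoretical peak bandwidth per socket, assuming six memory channels per processor (12 modules, one per channel, across two sockets) and a 64-bit (8-byte) data bus per channel. These are ceilings, not measurements, and the function name is mine.

```python
# Theoretical peak bandwidth: transfers per second * bytes per transfer * channels.
# Assumptions: 6 channels per socket, 8 bytes per transfer on each channel.
def peak_bandwidth_gbs(mt_per_s: int, channels: int = 6, bus_bytes: int = 8) -> float:
    """Return the theoretical peak memory bandwidth per socket in GB/s."""
    return mt_per_s * bus_bytes * channels / 1000.0


if __name__ == "__main__":
    baseline = peak_bandwidth_gbs(2400)
    for speed in (2400, 2666, 2933):
        bandwidth = peak_bandwidth_gbs(speed)
        gain = (bandwidth / baseline - 1.0) * 100.0
        print(f"DDR4-{speed}: {bandwidth:.1f} GB/s per socket (+{gain:.1f}% vs DDR4-2400)")
```

On paper, DDR4-2933 gives about 22% more bandwidth than DDR4-2400, so any application-level difference can only be a fraction of that.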
RAM test results
Too much of a good thing is bad. So I abandoned the idea of drawing every graph and put the results into tables - more convenient and faster, although less visual. There will be charts too, but only the ones I find most interesting.
“Either we are doing something wrong, or one of two things.”
This quotation from the Pilot Brothers, albeit slightly paraphrased, came in very handy once the memory testing was completed...
As in all tests, we took ten measurements and used their average. As you can see, the readings diverge as much as the testimony of citizen Krolikov from the movie Shirley-Myrli.
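When the readings diverge like this, it helps to look not only at the average of the ten runs but also at their spread. A minimal sketch using the standard `statistics` module follows; the run values in the usage part are made up for illustration and are not the article's data.

```python
import statistics


def summarize(runs: list) -> dict:
    """Return the mean, sample standard deviation and coefficient of variation (%)."""
    if len(runs) < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.mean(runs)
    stdev = statistics.stdev(runs)
    return {"mean": mean, "stdev": stdev, "cv_percent": stdev / mean * 100.0}


if __name__ == "__main__":
    # Hypothetical scores from ten runs, for illustration only.
    runs = [1002, 998, 1010, 990, 1005, 995, 1001, 999, 1008, 992]
    summary = summarize(runs)
    print(f"mean={summary['mean']:.1f} stdev={summary['stdev']:.1f} "
          f"cv={summary['cv_percent']:.2f}%")
```

If the difference between two memory configurations is smaller than the spread within one configuration, it is hard to call either of them a winner.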
In the Phoronix tests, the top results are split about 50/50 between the configurations with 2400 and 2933 MHz memory. In Geekbench, the 2933 memory came out ahead on the Memory Score_Single and Memory Score_Multi parameters, but the overall result is surprising.
One assumption is the effect of a higher frequency on latency: this is where the balance between speed and response time comes in. But, to be honest, I'm not sure... If you have something to say about this, please do so in the comments.
Recently I became convinced that not populating all of the processor's memory channels has a greater influence on the test results. In the next round of processor testing, we will definitely look at this effect, and I will tell you what we find and how.
A small step for man, but a huge step for humanity
As Comrade Kamnoedov would say (I love the Strugatskys), "roughly in such an acceptance" is how Intel positions the new line of Xeon Scalable processors. At the beginning of the article, I said that the release of the new Scalables is an important strategic step for Intel itself. Now I will explain.
On the one hand, the new Scalables kicked off a global upgrade of the data center platform, and a couple of interesting announcements already await us in the second half of the year. On the other hand, none of the innovations are accidental: they are an answer to the industry's current demands. And quite a decent answer. Not enough memory? Here is Optane DC Persistent Memory. Wanted hardware prioritization of processes and cores? Here you go, upgraded SST and RDT. Dreamed of professional network training? :-) Sign here for a new set of instructions for AI. One can only be happy for Intel.
Although personally it seems to me that this release contains the wishlist that Intel did not manage to implement last time. And, of course, something had to be done about the hardware holes, hunting for which has already become a kind of entertainment for various specialists. Everything that Intel took away from the user with the Spectre and Meltdown holes, it has now given back while keeping the price.
In addition, AMD is pressing from all sides. Its products were much less affected by Spectre and Meltdown, and it has lately been giving Intel a good shake in the desktop segment (I wish I had such youthful energy at such a respectable age) and, slightly, in the server segment. By the way, regarding the latter, it will be very interesting to see how the new AMD Epyc Rome performs, since the current generation of Epyc did not leave me indifferent.
But back to Scalable.
What is the bottom line for a user who is not burdened with AI and trained networks? A clear performance increase thanks to more cores and higher base and turbo boost frequencies. And while for Gold processors of different generations this increase reaches a maximum of 23% (both generations are good), for Silver it reaches 40% in some tests. Given the almost unchanged cost, the difference is quite pleasant, although I always want more =)
If you trust Intel's statement that this is only the beginning, even a skeptic like me is curious to see what we will be offered in the future.
In testing we used servers based on Intel Xeon Scalable processors: Silver 4110, Silver 4114, Silver 4210, Silver 4214, Gold 6130, Gold 6140, Gold 6230, Gold 6240.
Until July 25, servers with the new Xeon Scalable can be ordered on the 1dedic.ru website with a 25% discount for 1 month using the NEW_SCALABLE promotional code. The promotional code expires at midnight on July 26, 2019.
A 10% discount applies to any dedicated server when paying for a year.
Tested and written for you by Trashwind, senior system administrator of the FirstDEDIC operations department