Servers based on the new Intel Xeon E5v3 processors and DDR4 memory


    We are pleased to announce that dedicated servers based on the new Intel Xeon E5v3 processor family and DDR4 memory are now available for order in our data centers. We are the first in Russia to offer customers these servers, the most advanced and highest-performing configurations available to date.

    A complete list of available configurations is presented on our website.

    In this article, we will talk in detail about the new processors and their capabilities.


    Intel Xeon E5v3: what's new


    Intel Xeon E5v3 processors are manufactured on the same process technology as the previous generation, but they incorporate numerous improvements at the microarchitecture level as well as a revised chip design.

    The main innovations are listed in the table below:

    | Area | Change | Benefit |
    | On-die interconnect | Two ring buses per processor | Higher processing power and core throughput |
    | Memory controller (home agent) | DDR4 support; two home agents in more processor models | Higher memory bandwidth and energy efficiency; higher per-socket bandwidth |
    | Last Level Cache (LLC) | Cluster on Die (COD); improved LLC allocation policy | Higher performance and lower latency; better overall performance |
    | Power management | Each core runs at its own frequency and at a voltage matching its current load | Better performance per watt; power savings on an idle socket |
    | QPI 1.1 | Up to 9.6 GT/s | Faster cache synchronization in multiprocessor configurations |
    | Integrated I/O controller (IO hub) | I/O cache affinity tracking; larger PCIe buffers | Higher PCIe bandwidth under contention (multiple simultaneous attempts to access the same cache line) |
    | PCI Express 3.0 | Simultaneous writes to the bus by multiple devices | Reduced bus bandwidth utilization |


    Below we will talk about them in more detail.

    Cluster on Die (COD)


    As mentioned above, Intel Xeon E5v3 processors use the Cluster on Die (COD) scheme. To better understand its features, let us compare it with previous microarchitectures.


    Sandy Bridge processors consisted of two rows of cores and last-level cache blocks connected by a single ring bus. Ivy Bridge processors (Intel Xeon E5v2) had three rows connected by two ring buses. The ring buses moved data in opposite directions (clockwise and counterclockwise) to deliver it along the shortest route and reduce latency. Once data entered the ring structure, its route had to be coordinated to avoid collisions with data already in flight.

    In Intel Xeon E5v3 processors, the cores are arranged in four rows around two last-level cache blocks. Controlling data movement in such a scheme is very difficult, so the following solution was adopted: the two ring buses were separated from each other, and a buffered switch is used to exchange data between them, much like an Ethernet switch dividing a network into two segments.

    The ring buses can operate independently, which increases bandwidth. This is especially useful when FMA/AVX instructions operate on large 256-bit chunks of data.

    In addition to the configuration described above, there are two others. The basic parameters of all three are summarized in the table below:


    | Die configuration | Rows of cores | Home agents | Cores | Power consumption, W | Transistors, billions | Die area, mm² |
    | HCC | 4 | 2 | 14 - 18 | 110 - 145 | 5.69 | 662 |
    | MCC | 3 | 2 | 6 - 12 | 65 - 160 | 3.84 | 492 |
    | LCC | 2 | 1 | 4 - 8 | 55 - 140 | 2.60 | 354 |

    The first (LCC) supports 4 to 8 cores. It consists of one double ring bus, two rows of cores, and a single memory controller agent. The last-level cache in this configuration is smaller, and its latency is lower.

    The second configuration (MCC) supports 10 to 12 cores and is a scaled-down version of the configuration already described above. Here the chip is equipped with two memory controller agents. The blue dots on the diagram indicate where data enters the ring buses.

    In all three processor designs the die layout is asymmetrical: for example, an 18-core processor has 8 cores and 20 MB of last-level cache on one side and 10 cores and 25 MB of cache on the other.

    A core's data and instructions are not necessarily stored in the cache slices located next to it. This solution may not provide the minimum possible latency, but it avoids cache overflow. Data is distributed by physical address, which provides uniform access to all last-level cache slices. Transactions take the shortest route.

    Each ring bus operates at its own frequency and voltage, optimized for the current load. Under increased load, the buses can be allocated additional power, which allows them to run faster.
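
    A practical side effect of Cluster on Die worth noting: when COD mode is enabled in the BIOS, each socket is typically exposed to the operating system as two NUMA nodes. The Linux-only sketch below simply counts the NUMA nodes the kernel reports, which is one quick way to see whether the mode is active; it assumes the standard sysfs layout and is an illustration, not a vendor tool.

        /* Count the NUMA nodes exposed by the Linux kernel. With Cluster on Die
           enabled, a dual-socket Xeon E5v3 host would typically show four nodes. */
        #include <stdio.h>
        #include <dirent.h>
        #include <string.h>
        #include <ctype.h>

        int main(void)
        {
            DIR *d = opendir("/sys/devices/system/node");
            if (!d) { perror("/sys/devices/system/node"); return 1; }

            int nodes = 0;
            struct dirent *e;
            while ((e = readdir(d)) != NULL) {
                /* count only entries named "node<N>" */
                if (strncmp(e->d_name, "node", 4) == 0 &&
                    isdigit((unsigned char)e->d_name[4]))
                    nodes++;
            }
            closedir(d);

            printf("NUMA nodes exposed by the kernel: %d\n", nodes);
            return 0;
        }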

    Instruction set improvements


    Compared with the previous generation, the performance of the new processors has increased significantly. This was made possible primarily by AVX 2.0, the most substantial improvement to the instruction set in the past three years.

    The width of the vector execution units in the new processors has been increased from 128 to 256 bits. As a result, floating-point performance has increased by 70-100%. Many application-level operations have become faster as well: for example, when calculating checksums for data deduplication and thin provisioning, processor load is roughly halved.
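
    As an illustration of processing data 256 bits at a time, here is a minimal, hypothetical sketch in C of a toy XOR-fold checksum built on AVX2 intrinsics. It is not the hash used by any particular deduplication engine; it must be compiled with AVX2 enabled (e.g. gcc -mavx2 -O2).

        /* Toy checksum: XOR-fold the buffer 32 bytes (256 bits) at a time,
           then reduce the vector accumulator to a single 64-bit value. */
        #include <immintrin.h>   /* AVX2 intrinsics */
        #include <stdint.h>
        #include <stddef.h>
        #include <stdio.h>

        static uint64_t xor_checksum_avx2(const uint8_t *buf, size_t len)
        {
            __m256i acc = _mm256_setzero_si256();
            size_t i = 0;

            for (; i + 32 <= len; i += 32)
                acc = _mm256_xor_si256(acc,
                      _mm256_loadu_si256((const __m256i *)(buf + i)));

            uint64_t lanes[4];
            _mm256_storeu_si256((__m256i *)lanes, acc);
            uint64_t sum = lanes[0] ^ lanes[1] ^ lanes[2] ^ lanes[3];

            /* scalar tail: fewer than 32 bytes remain */
            for (unsigned shift = 0; i < len; i++, shift = (shift + 8) % 64)
                sum ^= (uint64_t)buf[i] << shift;

            return sum;
        }

        int main(void)
        {
            uint8_t data[100];
            for (size_t i = 0; i < sizeof(data); i++)
                data[i] = (uint8_t)i;
            printf("checksum: %016llx\n",
                   (unsigned long long)xor_checksum_avx2(data, sizeof(data)));
            return 0;
        }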

    AVX 2.0 also includes an updated set of FMA instructions (fused multiply-add: multiplication and addition with a single rounding). For code that performs sequential multiply-add operations, FMA halves the number of cycles required. This significantly speeds up high-performance applications (professional graphics, pattern recognition, and so on).
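
    The following is a minimal sketch (an illustration, not vendor sample code) of using the _mm256_fmadd_ps intrinsic so that the multiply and the add of an accumulation loop are fused into a single instruction; compile with e.g. gcc -mavx2 -mfma -O2.

        /* y[i] = a[i] * b[i] + y[i], processing 8 floats per iteration with FMA. */
        #include <immintrin.h>
        #include <stddef.h>
        #include <stdio.h>

        static void fma_accumulate(float *y, const float *a, const float *b, size_t n)
        {
            size_t i = 0;
            for (; i + 8 <= n; i += 8) {
                __m256 va = _mm256_loadu_ps(a + i);
                __m256 vb = _mm256_loadu_ps(b + i);
                __m256 vy = _mm256_loadu_ps(y + i);
                /* one instruction: multiply and add with a single rounding */
                vy = _mm256_fmadd_ps(va, vb, vy);
                _mm256_storeu_ps(y + i, vy);
            }
            for (; i < n; i++)            /* scalar tail */
                y[i] = a[i] * b[i] + y[i];
        }

        int main(void)
        {
            float a[16], b[16], y[16];
            for (int i = 0; i < 16; i++) { a[i] = (float)i; b[i] = 2.0f; y[i] = 1.0f; }
            fma_accumulate(y, a, b, 16);
            printf("y[3] = %.1f\n", y[3]);   /* expect 3*2 + 1 = 7.0 */
            return 0;
        }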

    The new processor family also improves the Intel Advanced Encryption Standard New Instructions (AES-NI) instruction set, nearly doubling encryption and decryption speed.
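
    Whether a given server actually exposes these extensions can be checked at run time. The sketch below relies on the GCC/Clang-specific __builtin_cpu_supports() extension, so the feature names are the compiler's, not Intel's marketing names.

        /* Report AVX2, FMA and AES-NI availability on the current host. */
        #include <stdio.h>

        int main(void)
        {
            __builtin_cpu_init();   /* initialize the compiler's CPU feature cache */
            printf("AVX2   : %s\n", __builtin_cpu_supports("avx2") ? "yes" : "no");
            printf("FMA    : %s\n", __builtin_cpu_supports("fma")  ? "yes" : "no");
            printf("AES-NI : %s\n", __builtin_cpu_supports("aes")  ? "yes" : "no");
            return 0;
        }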

    Energy efficiency


    Intel Xeon E5v3

    Compared to the previous generation, Intel Xeon E5v3 processors consume 36% less power.

    This increase in energy efficiency was made possible by Per-Core P-States (PCPS) technology. First, each core now operates at its own frequency and at a voltage corresponding to its current load. Second, the cores of the new processors can enter low-power mode not only all together (as in the previous generation), but also individually.
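
    One way to observe per-core P-states in practice is to read each core's current frequency from the Linux cpufreq sysfs interface. The sketch below assumes a Linux host with a cpufreq driver (e.g. intel_pstate) loaded and simply prints the first eight logical CPUs; it is an illustration, not a monitoring tool.

        /* Print the current frequency of the first few logical CPUs. */
        #include <stdio.h>

        int main(void)
        {
            char path[128];
            for (int cpu = 0; cpu < 8; cpu++) {
                snprintf(path, sizeof(path),
                         "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq", cpu);
                FILE *f = fopen(path, "r");
                if (!f)
                    break;                       /* fewer CPUs than expected, or no cpufreq */
                long khz = 0;
                if (fscanf(f, "%ld", &khz) == 1)
                    printf("cpu%d: %ld MHz\n", cpu, khz / 1000);
                fclose(f);
            }
            return 0;
        }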

    Virtualization


    The number of cores in Intel Xeon E5v3 processors has increased, making it possible to increase virtualization density (the number of VMs per server). This in turn called for improvements to hardware virtualization support. Such improvements have been implemented: they include Cache Quality of Service Monitoring, Virtual Machine Control Structure Shadowing, and Extended Page Accessed and Dirty Bits technologies, as well as improvements to Data Direct I/O technology.
    Let us consider these technologies and their advantages in more detail.

    Cache Quality of Service Monitoring


    Intel Xeon E5v3

    This technology allows real-time monitoring of cache usage at the core, thread, application, or virtual machine level. It reduces the likelihood of one VM's data being evicted from the cache by another, improves the responsiveness of virtual machines, and increases their performance.

    Cache QoS monitoring helps identify "bad neighbors" consuming too much cache in a virtual environment, and helps optimize load in multi-tenant environments.
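
    On Linux, the availability of this monitoring can be checked via the CPU feature flags the kernel reports. The flag names used below ("cqm_llc", "cqm_occup_llc") are those printed by recent kernels and are an assumption here; older kernels may not report them even when the hardware supports the feature.

        /* Look for the cache-monitoring feature flags in /proc/cpuinfo. */
        #include <stdio.h>
        #include <string.h>

        int main(void)
        {
            FILE *f = fopen("/proc/cpuinfo", "r");
            if (!f) { perror("/proc/cpuinfo"); return 1; }

            char line[4096];
            while (fgets(line, sizeof(line), f)) {
                if (strncmp(line, "flags", 5) == 0) {   /* feature flags of cpu0 */
                    printf("cqm_llc       : %s\n", strstr(line, "cqm_llc")       ? "yes" : "no");
                    printf("cqm_occup_llc : %s\n", strstr(line, "cqm_occup_llc") ? "yes" : "no");
                    break;
                }
            }
            fclose(f);
            return 0;
        }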

    Virtual Machine Control Structure Shadowing


    This technology makes it possible to run a hypervisor inside a hypervisor: it gives guest hypervisors access to the server hardware (under the control of the primary hypervisor). Thanks to this, any guest software can be run with minimal loss of performance.
    VMCS Shadowing can be used for testing and debugging hypervisors, operating systems, and other programs that need direct access to the VMCS (Virtual Machine Control Structure) and the VMM (Virtual Machine Monitor).

    Support for VMCS Shadowing is already implemented in KVM 3.1 and in Xen 4.3 and later.
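
    On a Linux/KVM host, one quick way to see whether nested virtualization and shadow VMCS are enabled is to read the kvm_intel module parameters. The parameter names below ("nested", "enable_shadow_vmcs") are those exposed by recent kernels and are an assumption here; the files exist only when the kvm_intel module is loaded.

        /* Print selected kvm_intel module parameters, or "n/a" if unreadable. */
        #include <stdio.h>
        #include <string.h>

        static void show_param(const char *name)
        {
            char path[128], value[32] = "n/a";

            snprintf(path, sizeof(path), "/sys/module/kvm_intel/parameters/%s", name);
            FILE *f = fopen(path, "r");
            if (f) {
                if (fgets(value, sizeof(value), f)) {
                    char *nl = strchr(value, '\n');
                    if (nl) *nl = '\0';             /* strip trailing newline */
                }
                fclose(f);
            }
            printf("%-18s: %s\n", name, value);
        }

        int main(void)
        {
            show_param("nested");
            show_param("enable_shadow_vmcs");
            return 0;
        }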

    Extended Pages Accessed and Dirty Bits


    Migration of virtual machines involves a number of challenges, especially when migrating actively running VMs. During migration it is important that up-to-date memory contents are transferred.

    EPT A/D Bits (Extended Page Accessed and Dirty Bits) technology gives hypervisors more information about the state of VM memory pages via the Accessed and Dirty flags, which reduces the number of VM exits during migration. This speeds up migration and therefore reduces virtual machine downtime.
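
    For the curious, support for EPT accessed/dirty flags is reported by the IA32_VMX_EPT_VPID_CAP MSR (index 0x48C, bit 21 in Intel's documentation). The Linux-only sketch below reads it through the msr driver, so it requires root and a prior "modprobe msr", and checks CPU 0 only; it is a hedged illustration rather than a supported diagnostic.

        /* Read IA32_VMX_EPT_VPID_CAP and test the EPT A/D capability bit. */
        #include <stdio.h>
        #include <stdint.h>
        #include <fcntl.h>
        #include <unistd.h>

        #define IA32_VMX_EPT_VPID_CAP 0x48C

        int main(void)
        {
            int fd = open("/dev/cpu/0/msr", O_RDONLY);
            if (fd < 0) { perror("open /dev/cpu/0/msr"); return 1; }

            uint64_t cap = 0;
            if (pread(fd, &cap, sizeof(cap), IA32_VMX_EPT_VPID_CAP) != sizeof(cap)) {
                perror("pread");
                close(fd);
                return 1;
            }
            close(fd);

            printf("EPT accessed/dirty bits: %s\n",
                   (cap & (1ULL << 21)) ? "supported" : "not supported");
            return 0;
        }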

    Data Direct I/O improvements


    One of the key innovations of the Intel Xeon E5 processor family was Data Direct I/O (DDIO), which allows peripheral devices to send I/O traffic directly to the processor's cache. This reduces the amount of data transferred to system memory, optimizes power consumption, and lowers I/O latency.

    In Intel Xeon E5v3 processors this technology has been improved: you can now configure the binding of the last-level cache to specific cores and PCIe lanes. This reduces memory and cache accesses during I/O virtualization, which further improves performance and reduces latency.

    Detailed information on these technologies can also be found on the Intel Developer Blog.

    DDR4


    One of the key features of Intel Xeon E5v3 processors is support for the new DDR4 SDRAM.

    An important difference between DDR4 and previous generations is the organization of the memory chips: the number of banks has been doubled to 16 (for technical details see, for example, here). Switching between banks is faster, and DDR4 chips open random rows twice as fast as DDR3.

    The increased number of banks and related technological innovations, first, make it possible to build memory modules of higher capacity and, second, contribute to higher performance.

    DDR4 memory is also highly energy efficient: while delivering higher performance than the previous generation, it consumes 40% less power.

    Available configurations


    We offer the following server configurations based on the Intel Xeon E5v3 processor family:

    | CPU | Memory | Disks | Price, rub/month |
    | Xeon E5-1650v3 3.5 GHz | 64 GB DDR4 | 2 × 4 TB SATA | 10 000 |
    | Xeon E5-1650v3 3.5 GHz | 64 GB DDR4 | 2 × 480 GB SSD | 12 000 |
    | 2 × Xeon E5-2630v3 2.4 GHz | 128 GB DDR4 | 2 × 480 GB SSD | 15 500 |
    | 2 × Xeon E5-2630v3 2.4 GHz | 64 GB DDR4 | 2 × 4 TB SATA, 2 × 480 GB SSD | 16 000 |
    | 2 × Xeon E5-2670v3 2.3 GHz | 256 GB DDR4 | 2 × 800 GB SSD | 30 000 |

    These servers are already available for order in St. Petersburg and will become available in Moscow in the very near future. It will also soon be possible to order custom-configuration servers based on the new Xeon E5v3 processors.

    P.S. Servers in all previous configurations are now on sale. Hurry up and rent a dedicated server on very favorable terms!

    Readers who for one reason or another are unable to leave comments here are welcome to visit our blog.
