HPE Superdome Flex: A New Level of Performance and Scaling
Last December, HPE announced the HPE Superdome Flex, a modular in-memory computing platform with the greatest scaling capabilities in the world. It is a breakthrough in computing systems for mission-critical applications, real-time analytics and data-intensive high-performance computing.
The HPE Superdome Flex platform has several features that make it unique in its industry. Below is a translation of an article from the Servers: The Right Compute blog that covers the platform's modular, scalable architecture.
Scaling capabilities that exceed Intel's
Like most x86 server vendors, HPE uses the latest Intel Xeon Scalable processor family, codenamed Skylake, in its latest generation of servers, including the HPE Superdome Flex. The Intel reference architecture for these processors uses the new UltraPath Interconnect (UPI) technology, which limits scaling to eight sockets. Most vendors that use these processors build servers with a glueless interconnect, but the HPE Superdome Flex uses a unique modular architecture that scales beyond what Intel's design allows: from 4 to 32 sockets in one system.
We chose this architecture because we saw a need for platforms that scale beyond Intel's eight sockets. That need is especially pressing today, when data volumes are growing at an unprecedented rate. In addition, because Intel developed UPI primarily for two- and four-socket servers, glueless eight-socket servers face bandwidth problems. Our architecture provides high bandwidth even when the system grows to its maximum configuration.
Price/performance as a competitive advantage
The modular architecture of the HPE Superdome Flex is based on a four-socket chassis and scales to eight chassis and 32 sockets in a single server system. A wide choice of processors is available for the server, from inexpensive Gold models to the top-end Platinum series of Xeon Scalable processors.
This choice between Gold and Platinum processors across the full scaling range provides excellent price/performance advantages, even compared with entry-level systems. For example, in a typical configuration with 6 TB of memory, Superdome Flex is cheaper and more powerful than competing four-socket offerings. Why? Because of their architecture, other manufacturers of four-socket systems are forced to use 128 GB DIMMs and the more expensive processors that support 1.5 TB per socket. This is significantly more expensive than using 64 GB DIMMs in an eight-socket Superdome Flex. As a result, the Superdome Flex platform with eight sockets and 6 TB of memory provides twice the processing power.
Similarly, for an eight-socket configuration with 6 TB of memory, the Superdome Flex platform can be the less expensive and more efficient solution. How? Other manufacturers of eight-socket systems are forced to use the more expensive Platinum processors, while an eight-socket Superdome Flex can use inexpensive Gold processors and provide the same amount of memory.
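The DIMM arithmetic behind the 6 TB comparison can be checked with a short sketch. It assumes 12 DIMM slots per socket (derived from the 48 slots in a four-socket Superdome Flex chassis), that every slot is populated with the same DIMM size, and that competing four-socket systems have the same slot count per socket. The function names are illustrative only.

```python
# Illustrative arithmetic only. DIMM_SLOTS_PER_SOCKET is derived from the
# article's figure of 48 DIMM slots per four-socket chassis and is ASSUMED
# to hold for competing four-socket systems as well.
DIMM_SLOTS_PER_SOCKET = 48 // 4  # 12 slots per socket
GB_PER_TB = 1024


def dimm_size_needed_gb(target_tb: float, sockets: int) -> float:
    """DIMM size (GB) needed to reach target_tb with every slot populated."""
    total_slots = sockets * DIMM_SLOTS_PER_SOCKET
    return target_tb * GB_PER_TB / total_slots


def memory_per_socket_tb(target_tb: float, sockets: int) -> float:
    """Memory (TB) that each socket must be able to address."""
    return target_tb / sockets


if __name__ == "__main__":
    for sockets in (4, 8):
        print(
            f"{sockets} sockets: "
            f"{dimm_size_needed_gb(6, sockets):.0f} GB DIMMs, "
            f"{memory_per_socket_tb(6, sockets):.2f} TB per socket"
        )
```

Under these assumptions, a four-socket system needs 128 GB DIMMs and 1.5 TB per socket to reach 6 TB, while an eight-socket system reaches the same capacity with 64 GB DIMMs and 0.75 TB per socket.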
In fact, among platforms based on the Intel Xeon Scalable processor family, only Superdome Flex supports the more economical Gold processors in configurations with eight or more sockets. (Intel's glueless architecture supports eight sockets only with the expensive Platinum processors.) We also offer a large selection of processors with different core counts, from 4 to 28 cores per processor, so that they can be matched to workload requirements.
The importance of scaling within a single system
The ability to grow within a single system, known as scaling up or vertical scaling, provides a number of benefits for the mission-critical workloads and databases for which the HPE Superdome Flex is best suited. These include traditional and in-memory databases, real-time analytics, ERP, CRM and other transactional applications. For these types of workloads, it is easier and cheaper to manage one vertically scaled environment than a horizontally scaled cluster. Vertical scaling also significantly reduces latency and improves performance.
Read the blog post Speed operations with horizontal and vertical scaling with SAP S/4HANA to understand why vertical scaling is much more efficient than horizontal scaling (clustering) for these types of workloads. In short, it is all about speed and the ability to perform at the level these critical applications require.
Consistent high performance up to maximum configurations
The Superdome Flex's high scalability is achieved through the unique HPE Superdome Flex ASIC chipset, which connects the separate four-socket chassis, as shown in Figures 1 and 2. All ASICs are directly connected to one another (one hop apart), which ensures minimal latency when accessing remote resources and maximum performance. HPE Superdome Flex ASIC technology uses adaptive routing to balance the load on the fabric and to optimize latency and throughput, which improves system performance and availability. The ASIC integrates the chassis into a cache-coherent fabric and maintains cache coherence across all processors, using a large directory of cache-line state records that is built directly into the ASIC. This coherence scheme plays a crucial role in allowing Superdome Flex to maintain near-linear performance scaling from 4 to 32 sockets. Typical glueless architectures show limited performance scaling even between four and eight sockets, because they broadcast coherence (snoop) requests.
Fig. 1. Connection diagram of the HPE Flex Grid fabric in a 32-socket Superdome Flex server
Fig. 2. 4-processor chassis
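The scaling difference between broadcast and directory-based coherence can be illustrated with a toy message count. This is a deliberately simplified model and not a description of the actual Flex ASIC or UPI protocols: it assumes a broadcast design probes every other socket on each miss, while a directory design performs one directory lookup and then contacts only the sockets that actually hold the line. Both function names are hypothetical.

```python
# Toy model only: real coherence protocols are far more involved.
# Both functions and their cost formulas are ASSUMPTIONS for illustration.


def broadcast_snoop_messages(sockets: int) -> int:
    """Requests per cache miss when every other socket is probed."""
    if sockets < 1:
        raise ValueError("sockets must be at least 1")
    return sockets - 1


def directory_messages(sharers: int) -> int:
    """Requests per miss with a directory: one lookup plus one per sharer."""
    if sharers < 0:
        raise ValueError("sharers cannot be negative")
    return 1 + sharers


if __name__ == "__main__":
    sharers = 2  # assumed number of sockets caching the line
    for sockets in (4, 8, 16, 32):
        print(
            f"{sockets:2d} sockets: broadcast={broadcast_snoop_messages(sockets):2d}, "
            f"directory={directory_messages(sharers)}"
        )
```

In this model the broadcast cost grows with the socket count, while the directory cost depends only on how many sockets share the line, which is the intuition behind the scaling claim above.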
Shared memory
As with processor resources, memory capacity can be increased by adding chassis to the system. Each chassis has 48 DDR4 DIMM slots that accept 32 GB RDIMM, 64 GB LRDIMM or 128 GB 3DS LRDIMM memory modules, for a maximum memory capacity of 6 TB per chassis. Accordingly, the total RAM of an HPE Superdome Flex in the maximum 32-socket configuration reaches 48 TB, which is enough for the most demanding in-memory applications.
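The capacity figures follow directly from the slot count and DIMM sizes quoted above. The sketch below reproduces them, assuming every slot in every chassis is populated with the same DIMM size. The function name is illustrative.

```python
# Figures taken from the text; the helper itself is illustrative.
DIMM_SLOTS_PER_CHASSIS = 48
SOCKETS_PER_CHASSIS = 4
MAX_CHASSIS = 8
SUPPORTED_DIMM_SIZES_GB = (32, 64, 128)
GB_PER_TB = 1024


def total_memory_tb(chassis: int, dimm_gb: int) -> float:
    """Total memory (TB) with all slots filled with identical DIMMs (assumed)."""
    if not 1 <= chassis <= MAX_CHASSIS:
        raise ValueError(f"chassis must be between 1 and {MAX_CHASSIS}")
    if dimm_gb not in SUPPORTED_DIMM_SIZES_GB:
        raise ValueError(f"unsupported DIMM size: {dimm_gb} GB")
    return chassis * DIMM_SLOTS_PER_CHASSIS * dimm_gb / GB_PER_TB


if __name__ == "__main__":
    for chassis in (1, 2, 4, 8):
        sockets = chassis * SOCKETS_PER_CHASSIS
        print(f"{sockets:2d} sockets: {total_memory_tb(chassis, 128):.0f} TB")
```

With 128 GB DIMMs this gives 6 TB for one chassis and 48 TB for eight chassis, matching the numbers in the text.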
High I/O flexibility
As for I/O, each Superdome Flex chassis can be configured with either 16 or 12 I/O slots, which gives many options for installing standard PCIe 3.0 cards and the flexibility to keep the system balanced for any workload. In either configuration, the I/O slots are connected directly to the processors, without bus repeaters or expanders that could increase latency or reduce throughput. This ensures the highest possible performance for each I/O card.
Low latency
Low-latency access to the entire shared memory space is a key factor in the high performance of Superdome Flex. Whether data resides in local memory or in remote memory (in a different chassis), copies of it may be held in the caches of different processors in the system. The cache coherence mechanism keeps the cached copies consistent when a process changes the data. A processor accesses its local memory with a latency of about 100 ns. Accessing data in the memory of another processor over a UPI link takes about 130 ns. When a processor accesses data in the memory of another chassis, the request travels between two Flex ASICs (which are always directly connected) with a latency of less than 400 ns, regardless of which chassis the processor is in. As a result, Superdome Flex provides a bisection bandwidth (the bandwidth between the two halves of the fabric) of more than 210 GB/s in an 8-socket configuration, more than 425 GB/s in a 16-socket configuration and more than 850 GB/s in a 32-socket configuration. This is more than enough for the most demanding and resource-intensive workloads.
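The three latency tiers and the bisection bandwidth figures above can be captured in a small lookup sketch. The socket numbering is an assumption made for illustration: sockets are numbered consecutively from 0, so that the chassis index is the socket number divided by four. The remote-chassis value is the upper bound quoted in the text, and the bandwidth values are the quoted lower bounds.

```python
# Latency and bandwidth figures come from the text. The consecutive socket
# numbering (chassis = socket // 4) is an ASSUMPTION for illustration.
SOCKETS_PER_CHASSIS = 4
LOCAL_NS = 100               # approximate local memory latency
UPI_NS = 130                 # approximate latency to another socket, same chassis
REMOTE_CHASSIS_MAX_NS = 400  # upper bound for memory in another chassis

# Lower bounds on bisection bandwidth (GB/s) by socket count.
BISECTION_GBPS = {8: 210, 16: 425, 32: 850}


def access_latency_ns(cpu_socket: int, memory_socket: int) -> int:
    """Approximate (or worst-case, for remote chassis) access latency in ns."""
    if cpu_socket == memory_socket:
        return LOCAL_NS
    if cpu_socket // SOCKETS_PER_CHASSIS == memory_socket // SOCKETS_PER_CHASSIS:
        return UPI_NS
    return REMOTE_CHASSIS_MAX_NS


def bisection_per_socket_gbps(sockets: int) -> float:
    """Quoted bisection bandwidth divided evenly across sockets."""
    return BISECTION_GBPS[sockets] / sockets


if __name__ == "__main__":
    for sockets in sorted(BISECTION_GBPS):
        print(f"{sockets:2d} sockets: {bisection_per_socket_gbps(sockets):.2f} GB/s per socket")
```

Dividing the quoted figures by the socket count shows that bisection bandwidth per socket stays roughly constant (a little over 26 GB/s) from 8 to 32 sockets, which is consistent with the claim that bandwidth keeps pace as the system grows.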
Why are high modular scaling capabilities important?
It is no secret that the amount of data is growing at an unprecedented rate, which means that infrastructure must cope with ever more demanding requirements for processing and analyzing critical, constantly expanding data. But growth rates can be unpredictable.
When deploying memory-intensive applications, you may ask: what will the next terabyte of memory cost me? Superdome Flex lets you increase the amount of memory without replacing hardware, because you are not limited to the DIMM slots in one chassis. In addition, as the number of users grows, critical applications continue to require high performance, regardless of the size of the workload.
Today, in-memory databases require low-latency, high-bandwidth hardware platforms. Thanks to its innovative architecture, the HPE Superdome Flex platform delivers exceptional performance, high throughput and consistently low latency, even in the largest configurations. Moreover, you can get all this for your critical workloads and databases at a very attractive price/performance ratio compared with systems from other manufacturers.
You can learn about the unique reliability, availability and serviceability (RAS) properties of the Superdome Flex server from the blog post HPE Superdome Flex: The unique RAS properties and from the technical description HPE Superdome Flex: server architecture and RAS characteristics. A blog post about the HPE Superdome Flex updates announced at HPE Discover was also published recently.
In this article, you can learn how HPE Superdome Flex is used to solve problems in cosmology, and how the platform is prepared for memory-driven computing, a new memory-centric computing architecture.
More information about the platform is also available in the webinar recording.