Non-volatile NVDIMM memory for cache protection in RAIDIX 4.6
In this article, we take a closer look at non-volatile memory (NVDIMM) support in RAIDIX 4.6. The new version has already been adopted by our key partners: for example, RAIDIX 4.6 is already used in the new Trinity FlexApp storage system from Trinity.
Persistent memory and NVDIMM standard
The new RAIDIX works with persistent memory (PMEM), which combines the advantages of traditional storage devices and the high bandwidth of DRAM. This type of memory is byte-addressable (load/store) and, unlike traditional block devices, operates at DRAM speed with correspondingly low latencies. If the server loses power, the entire contents of the memory remain intact and can be restored after reboot. This type of memory is currently available in the form of Non-Volatile Dual Inline Memory Modules (NVDIMMs).
NVDIMMs are an example of two technologies used together: RAM and non-volatile memory. The standard itself is not new: it was approved a couple of years ago, and many companies have already presented their battery-backed memory modules. However, thanks to the rapid development of flash chips, a modern NVDIMM carries enough NAND memory not only to preserve data integrity at the moment of a power outage, but also to cache all data passing through RAM on the fly.
Non-volatile NVDIMM memory comes in several types: NVDIMM-N, NVDIMM-F, and NVDIMM-P. An NVDIMM-N module includes both an SDRAM chip (RAM) and a flash memory chip (SSD) for backing up the RAM contents in case of a failure.
If NVDIMM-N is RAM with extended functionality, then NVDIMM-F is essentially a storage device. There are no RAM cells in the "F" modules; they contain only flash memory chips. NVDIMM-P combines the functions of NVDIMM-F and NVDIMM-N in a single module: both DRAM and NAND are accessed at the same level. All three configurations can significantly increase performance when working with big data, HPC, and similar workloads.
In 2017, there was something of a breakthrough for NVDIMM memory in the server segment: Micron introduced new 32 GB modules. These modules operate at DDR4-2933 with CL21 latency, which is much faster than other DDR4 for server applications. Somewhat earlier, 8 GB and 16 GB memory modules were released.
Fig. 1 Micron 32 GB DDR4 NVDIMM-N module
Micron's NVDIMM-N is ECC DRAM with the full 32 GB available to the system; the NAND flash is used only for data backup.
Accessing NVDIMM
There are two main ways to organize memory access on an NVDIMM:
1. Direct access with PMEM
RAIDIX 4.6 uses direct access to the modules via PMEM, combining them into a single address space. In this case, the physical address space of the NVDIMM (DPA) is mapped to the system physical address space (SPA). If there are several NVDIMM modules in the system, the memory controller can map them however it sees fit. For example, like this:
Fig. 2 Direct access scheme using PMEM
Accordingly, the modules are accessed as a single entity. This is not always the preferred option: in some cases you need to access each module separately, for example, to build a RAID array from them. For such tasks there is a second mode.
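To illustrate the direct (PMEM) access mode, below is a minimal sketch of how an application can map such a region and write to it with ordinary CPU stores. It assumes the kernel exposes the region as /dev/pmem0; the device name and the use of msync() instead of dedicated cache-flush instructions are simplifications for the example.

```c
/* Minimal sketch of direct (PMEM) access, assuming the NVDIMM region
 * is exposed by the kernel as /dev/pmem0 (the actual device name may
 * differ on a given system). */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t len = 4096;                 /* map one page for the demo */
    int fd = open("/dev/pmem0", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* Byte-addressable store: an ordinary CPU write, no block I/O involved. */
    strcpy((char *)addr, "hello, persistent memory");

    /* Push the update out of CPU caches so it survives a power loss. */
    if (msync(addr, len, MS_SYNC) != 0)
        perror("msync");

    munmap(addr, len);
    close(fd);
    return 0;
}
```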
2. Access using BLK apertures
Access is provided to each module separately using so-called "access windows" (apertures):
Fig. 3 Access scheme using BLK apertures
Modern NVDIMM modules often support both of these modes simultaneously. For this, namespaces are used in the same way as on NVMe devices. Also, if the NVDIMM conforms to NFIT (NVDIMM Firmware Interface Table), special headers (labels) are stored at the beginning of each module, according to which the address space is divided into areas with different access modes (BLK or PMEM).
It is imperative that these areas do not overlap, as concurrent access to the same area using different methods is likely to lead to data corruption. (Read more about NFIT in ACPI 6.1.)
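As a rough illustration of how these namespaces appear to the operating system, the sketch below walks the Linux libnvdimm sysfs tree and prints each namespace together with its access mode. The /sys/bus/nd/devices path and the mode attribute are assumptions about the kernel interface; in practice the ndctl utility is the usual tool for inspecting namespaces.

```c
/* Sketch: list libnvdimm namespaces and their access mode via sysfs.
 * The paths and attribute names are assumptions about the Linux
 * libnvdimm driver interface. */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *base = "/sys/bus/nd/devices";
    DIR *dir = opendir(base);
    if (!dir) { perror("opendir"); return 1; }

    struct dirent *e;
    while ((e = readdir(dir)) != NULL) {
        /* Namespace devices are assumed to be named namespaceX.Y */
        if (strncmp(e->d_name, "namespace", 9) != 0)
            continue;

        char path[512];
        snprintf(path, sizeof(path), "%s/%s/mode", base, e->d_name);

        FILE *f = fopen(path, "r");
        if (!f)
            continue;

        char mode[32] = "";
        if (fgets(mode, sizeof(mode), f))
            mode[strcspn(mode, "\n")] = '\0';
        fclose(f);

        printf("%s: mode=%s\n", e->d_name, mode);
    }
    closedir(dir);
    return 0;
}
```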
NVDIMM Write Cache Protection in RAIDIX 4.6
Space organization
Prior to version 4.6, a block update performed by the primary controller in RAIDIX was accompanied by a synchronous copy of the block into the RAM of the secondary controller. In the event of a power outage, the UPS had to hold a significant reserve of energy, enough for the cluster to save all copies of the blocks to disk before shutting down completely. When power returned, a substantial amount of time was needed to recharge the batteries before the cluster could be brought back into operation. The total downtime depended on both the cluster's power consumption and the battery capacity, which in turn depended on the quality of battery maintenance and storage conditions.
In single-controller mode, there was a risk of data loss during emergency reboots, and the battery capacity of backup power sources could prove insufficient as the solution was scaled vertically.
The new functionality introduced in version 4.6 simplifies system maintenance and avoids adding redundant hardware components. How is this implemented technically?
NVDIMM is used as a more reliable place to keep our cache than ordinary RAM. To do this, the module is mapped into the virtual address space so that the mapping covers the entire area that needs to be accessed.
Below is an example of the organization of information storage:
Fig. 4 Layout of data and metadata
First, all available space is divided into several namespaces. They store the data and the metadata describing its location on the RAID arrays. The metadata also contains identifiers that unambiguously identify the data and make it possible to recover it after a crash.
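As a purely illustrative sketch of such a layout (RAIDIX's real on-media format is not public), the example below maps the beginning of a PMEM namespace, places a hypothetical metadata header in front of the cached block, and flushes both before the write is considered durable. The structure fields, the /dev/pmem0 device name, and the offsets are invented for the example.

```c
/* Illustrative only: a hypothetical cache-entry header of the kind
 * described above (identifier + location on the array), stored in a
 * mapped PMEM namespace. Field names and layout are invented. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

struct cache_entry_meta {
    uint64_t entry_id;      /* unique identifier used for recovery    */
    uint64_t raid_lba;      /* where the cached block belongs on RAID */
    uint32_t length;        /* payload length in bytes                */
    uint32_t flags;         /* e.g. dirty/valid bits                  */
};

int main(void)
{
    const size_t region_len = 1 << 20;       /* 1 MiB of the namespace */
    int fd = open("/dev/pmem0", O_RDWR);     /* assumed device name    */
    if (fd < 0) { perror("open"); return 1; }

    uint8_t *base = mmap(NULL, region_len, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* Metadata at the start of the region, payload data right after it. */
    struct cache_entry_meta *meta = (struct cache_entry_meta *)base;
    uint8_t *payload = base + sizeof(*meta);

    memset(payload, 0xAB, 512);              /* the cached block        */
    meta->entry_id = 42;
    meta->raid_lba = 123456;
    meta->length   = 512;
    meta->flags    = 1;                      /* mark the entry valid    */

    /* Flush both metadata and payload before acknowledging the write. */
    msync(base, region_len, MS_SYNC);

    munmap(base, region_len);
    close(fd);
    return 0;
}
```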
What kind of memory to use?
In this article, we focus on how RAIDIX 4.6 interacts with Micron's NVDIMM-N persistent memory. NVDIMM maintains data integrity even in the event of a power outage, and the Micron product combines DRAM performance with the stability and reliability of NAND memory, ensuring data integrity and business continuity.
In the event of a failure, the NVDIMM's internal controller moves the data stored in DRAM to the non-volatile memory area. After the system is restored, the controller transfers the data from NAND back to DRAM without loss, allowing applications to continue working. The AgigA Tech PowerGEM ultracapacitor can act as a backup power source for the Micron NVDIMM.
Fig. 5 NVDIMM with AgigA PowerGEM ultracapacitor
Micron's NVDIMM technology combines volatile and non-volatile memory (DRAM, NAND flash, and a stand-alone power source in the memory subsystem). Micron's DDR4 NVDIMM-N provides fast DRAM read and write speeds and backs up the DRAM data if the power supply is lost.
Below is a little more about the lossless data transfer process.
Data transfer process
In the new version, the system writes data taking into account the state of the supercapacitors (warranted for up to 5 years) that power only the NVDIMM-N modules. The guarantee of data integrity rests on an accurate estimate of the energy reserve required to transfer the data to non-volatile memory.
As noted above, persistent memory combines the advantages of traditional storage devices with the high throughput of DRAM; its distinguishing feature is byte addressing at high speed and with very low latency.
Within one minute after a power outage, NVDIMM-N modules automatically flush data from DRAM to NAND. The transfer is accompanied by a corresponding indicator. Once the transfer is complete, the modules can be removed from the failed controller and installed in a working one, just like ordinary DIMM modules. This feature is relevant for all single-controller configurations, from budget disk-based solutions to specialized all-flash ones.
What is the advantage?
In a dual-controller configuration, any of the NVDIMM-N modules can be replaced together with its controller at any convenient time, without interrupting access to data. NVDIMM coherence can be provided not only in software but also in hardware. This eliminates the need to maintain batteries (BBUs in older RAID controllers, or UPSs). With vertical and horizontal scaling of the solution, re-assessing the risks of data loss is no longer required!
Trinity FlexApp Storage Features Based on RAIDIX 4.6
Within the Trinity FlexApp storage system, each controller is a regular server with NVDIMM-N modules installed in memory slots:
Fig. 6 Trinity FlexApp storage components
100 Gbps support
The system allows administrators to connect Linux client machines through high-performance Mellanox ConnectX-4 100 Gb InfiniBand interfaces. As a result, the system delivers minimal latency and increased performance in big data, HPC, and enterprise environments. In addition, a number of usability and resource management improvements have been made to the software.
Cluster-in-a-box
Trinity FlexApp storage based on RAIDIX supports heterogeneous clusters in Active-Active mode, which allows the system to be scaled vertically without interrupting data access and controllers to be quickly replaced with more modern and powerful ones. This minimizes current and future risks to the development of the system as a whole.
High performance and data protection
The key tasks of the system are security, consistency, and efficiency of simultaneous data access for defined user groups and concurrent connections, based on corporate policies and directories. In addition, RAIDIX includes a Silent Data Corruption Protection mechanism and provides tolerance to errors associated with data corruption at the disk level (read noise, vibration resonance, and shock).
RAIDIX-based solutions with NVDIMM-N are already used not only in Russia but also abroad, for example in HPC projects for the largest scientific cluster in Japan. Software-defined RAIDIX technology meets the needs of high-performance computing, providing sustained throughput (up to 25 GB/s per processor core), high fault tolerance (proprietary RAID levels 6, 7.3, and N+M), scalability, and compatibility with Intel Lustre*.