CSS ScaleFlux, or how to speed up databases two to three times by simply replacing NVMe drives

    Databases, content delivery networks, big data, artificial intelligence, machine learning: all these data-driven workloads demand high performance from the entire IT infrastructure. For the storage subsystem the fix is simple: install high-speed NVMe SSDs instead of SAS and SATA drives. The compute side is harder, because central processors cannot keep up with many time-sensitive operations. To eliminate this bottleneck, ScaleFlux has developed a new type of drive in which FPGA components work side by side with 3D NAND memory and take over many common data operations. In this post we describe the ScaleFlux solution in detail.



    Principle of operation


    In ScaleFlux's case, CSS stands for Computational Storage System. The device comes either as a PCIe expansion card or as a U.2 drive. Inside are fast flash memory (1.6 TB, 3.2 TB or 6.4 TB) and a chip with the complicated name "field-programmable gate array", better known as FPGA.



    In an infrastructure with conventional SSDs, the central processor handles all computing operations, including those most closely tied to the data. Compression is one example: applications that work with large volumes of data compress it (for instance with GZIP) to save disk space.
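    To make the CPU cost concrete, here is a minimal sketch of the host-side GZIP compression an application performs when it runs on a conventional SSD. It uses only Python's standard gzip module and measures the CPU time the compression consumes; the log-like payload is a synthetic assumption, and real data will compress at a different ratio and speed.

```python
import gzip
import time
from typing import Tuple


def compress_on_host(data: bytes, level: int = 6) -> Tuple[bytes, float]:
    """Compress data with GZIP on the host CPU.

    Returns the compressed bytes and the CPU seconds this process spent on it.
    """
    start = time.process_time()  # CPU time of this process, not wall-clock time
    compressed = gzip.compress(data, compresslevel=level)
    cpu_seconds = time.process_time() - start
    return compressed, cpu_seconds


if __name__ == "__main__":
    # Synthetic, highly repetitive "log-like" payload (an assumption for illustration)
    payload = b"2024-01-01 INFO card_txn id=000123 amount=42.00 status=OK\n" * 200_000
    compressed, cpu = compress_on_host(payload)
    ratio = len(payload) / len(compressed)
    print(f"original: {len(payload)} B, compressed: {len(compressed)} B, "
          f"ratio: {ratio:.1f}x, host CPU time: {cpu:.3f} s")
    # Round-trip check: the compression is lossless
    assert gzip.decompress(compressed) == payload
```

    Every CPU second reported here is time the processor is not spending on the application itself; a computational storage drive performs this step inside the device instead.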

    In an infrastructure with CSS ScaleFlux, compression is performed directly in the drive, as are other frequent operations, for example:

    • Erasure coding
    • Search in key-value stores
    • AES-128/256 encryption
    • SHA-3 hashing
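    The post does not say which erasure code ScaleFlux uses, so the sketch below shows only the simplest member of the family, single XOR parity (the scheme behind RAID 5), to illustrate the kind of computation being moved off the CPU. The function names are hypothetical and do not correspond to any ScaleFlux API.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(blocks: List[bytes]) -> bytes:
    """Compute a single parity block over equal-length data blocks."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all of the same length")
    return reduce(xor_bytes, blocks)


def recover_block(surviving: List[bytes], parity: bytes) -> bytes:
    """Rebuild the one missing block as the XOR of the parity and all surviving blocks."""
    return reduce(xor_bytes, surviving, parity)


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = encode_parity(data)
    # Pretend the second block was lost and rebuild it from the rest plus parity
    rebuilt = recover_block([data[0], data[2]], parity)
    assert rebuilt == data[1]
```

    Single parity survives the loss of any one block; production erasure codes such as Reed-Solomon tolerate several losses at the price of heavier arithmetic, which is exactly the kind of work worth offloading.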

    This frees up processor resources for the applications themselves. That is the principle; now let's look at how it works in real conditions.

    ScaleFlux in popular applications


    Our main goal is for CSS ScaleFlux to work without jumping through hoops. Along with the device we supply a software package for Linux (kernel 2.6 or later is required). The package configures the FPGA, the computational part of CSS, within a few minutes; systems then access it through a compatible API. So far we have released software for nine popular data-driven systems: MySQL, PostgreSQL, Hadoop, Aerospike, HBase, Hortonworks, RocksDB, Spark, Vitesse Data.

    To decide whether to develop support for a particular system, we run benchmarks that compare the performance of similar configurations with NVMe cards and with CSS ScaleFlux. Here are the results:


    More detailed results for each scenario, with graphs and test configurations, are available on our site.

    The list of officially supported platforms does not yet include several fairly well-known ones: MongoDB, Cassandra, Vertica and others. We are working on compatibility with these systems and will add them once we have ironed out all the rough edges. If you use CSS with applications that lack official support, you get a standard NVMe block storage device. Later, if needed, you can easily switch to supported systems and start using the computational part.

    Data protection and general questions


    CSS ScaleFlux uses several technologies to protect data: flash RAID, redundant writes, scanning and error correction. Checkpoints are created continuously for critical information such as address tables.

    CSS has additional capacitors to protect against power loss: if external power fails, they hold enough charge to write out the necessary data without loss. For operation at elevated temperatures, thermal throttling is provided.

    In price, CSS ScaleFlux cards are comparable to conventional NVMe cards: the difference usually does not exceed 9%. In practice, this difference is often offset by the space saved through the “delegated” compression. The CSS ScaleFlux warranty is three years, rated at 5 full drive writes per day.
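    The pricing and warranty figures above reduce to simple arithmetic, sketched below. Two assumptions are stated explicitly: the comparison NVMe drive stores data uncompressed, and the 2:1 compression ratio is purely illustrative, since real ratios depend on the data. The endurance figure is capacity × 5 drive writes per day × 365 days × 3 years.

```python
DAYS_PER_YEAR = 365


def warranty_tbw(capacity_tb: float, dwpd: float = 5, years: int = 3) -> float:
    """Total terabytes written over the warranty at the given drive writes per day."""
    return capacity_tb * dwpd * DAYS_PER_YEAR * years


def break_even_ratio(price_premium: float) -> float:
    """Compression ratio at which a drive costing (1 + premium) stores data as cheaply per TB."""
    return 1.0 + price_premium


def relative_cost_per_stored_tb(price_premium: float, compression_ratio: float) -> float:
    """Cost per TB of user data, relative to an uncompressed drive of equal raw capacity."""
    return (1.0 + price_premium) / compression_ratio


if __name__ == "__main__":
    for cap in (1.6, 3.2, 6.4):  # capacities listed earlier in the post
        print(f"{cap} TB drive: {warranty_tbw(cap):,.0f} TB written over 3 years")
    print(f"break-even compression ratio for a 9% premium: {break_even_ratio(0.09):.2f}:1")
    # 2:1 is an illustrative assumption, not a measured figure
    print(f"relative cost per stored TB at 2:1: {relative_cost_per_stored_tb(0.09, 2.0):.3f}")
```

    In other words, any compression ratio above roughly 1.09:1 already pays back a 9% price premium, and for a 6.4 TB drive the warranty covers about 35,040 TB of writes.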

    We can share some deployment experience. One of our customers in finance processes 4 billion card transactions per year, stores all the data in HBase and analyzes it to create new offers. After deploying ScaleFlux, the space occupied by its analytics data was halved, and so was the database query time. Another client, a developer of digital security tools, uses a different database, Aerospike. It replaced six SATA SSDs with one ScaleFlux system and doubled its transaction rate.

    If you want to see and test CSS ScaleFlux, you can contact us via the form, in the comments to this post, by email at ru@globaldots.com or by phone at +7-495-762-45-85.
