Data Reduction Features in Dell Compellent

    Earlier we talked about the implementation of QoS technology in Dell Compellent systems, but the innovations in SCOS version 7 do not end there, so today a few words about the rest of them.

    Data Progression technology has been part of Compellent systems for quite some time, back when many midrange storage systems had nothing of the kind. As before, data blocks move between storage tiers on a schedule rather than "online". This lets you control the load on the system and avoid performance degradation caused by ongoing migration, although it also means the workload profile has to be taken into account in advance when planning. Data in Compellent systems is organized into pages ranging in size from 512KB to 4MB (the default page size is 2MB), and migration is performed at the level of individual pages, so data can be distributed across storage tiers more selectively.
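
    Schematically, per-page migration might look like the Python sketch below. The placement policy shown (hot pages to the fastest tier, cold pages down) and all the names are assumptions for illustration only; the actual Data Progression rules are Dell's own:

        from dataclasses import dataclass

        PAGE_SIZE = 2 * 1024 * 1024   # default page size; configurable from 512KB to 4MB

        @dataclass
        class Page:
            tier: str    # e.g. "ssd" or "hdd"
            hot: bool    # stand-in for whatever access statistics the array keeps

        def scheduled_migration(pages):
            # Runs on a schedule rather than inline with host I/O, so the
            # migration load can be planned around business hours.
            for page in pages:
                target = "ssd" if page.hot else "hdd"
                if page.tier != target:
                    page.tier = target   # migration is done per individual page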

    Starting with SCOS 6.5.1, data compression was added. It also runs offline, in parallel with the redistribution of pages across storage tiers. To explain how Data Reduction works in Dell Compellent storage systems, we first need to note that the pages the system operates on come in three types (a small state sketch follows the list):

    ● “Active” pages hold recently written data. These pages always sit on the fastest storage tier, and the data in them can be both read and overwritten.
    ● “Frozen accessible” pages appear after a snapshot is taken. The data in them has not yet been overwritten, so it is still readable; as soon as there is an attempt to change it, a new active page is created and this page switches to “frozen inaccessible” mode.
    ● “Frozen inaccessible” pages hold data that was overwritten after the snapshot was taken. This data can no longer be read by accessing the volume (only by restoring the snapshot), and the overwritten blocks live in active pages on the fastest storage tier.
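
    To make the transitions concrete, here is a minimal Python sketch of this page life cycle. The class and function names are invented for illustration and are not part of SCOS:

        from enum import Enum, auto

        class PageState(Enum):
            ACTIVE = auto()               # writable, always on the fastest tier
            FROZEN_ACCESSIBLE = auto()    # snapshotted, still readable in place
            FROZEN_INACCESSIBLE = auto()  # snapshotted data that was later overwritten

        class Page:
            def __init__(self, data):
                self.data = data
                self.state = PageState.ACTIVE

        def take_snapshot(pages):
            # All active pages "freeze" the moment a snapshot is taken.
            for page in pages:
                if page.state is PageState.ACTIVE:
                    page.state = PageState.FROZEN_ACCESSIBLE

        def overwrite(page, new_data):
            # Writing to a frozen page allocates a new active page; the old
            # one is no longer readable through the volume itself.
            if page.state is PageState.FROZEN_ACCESSIBLE:
                page.state = PageState.FROZEN_INACCESSIBLE
            return Page(new_data)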

    All Data Progression and Data Reduction processes involve only “frozen” pages. Every day, on schedule, another snapshot is created (at that moment all active pages “freeze”) and migration between tiers begins. Compression always works with 64KB blocks, regardless of page size. If, after applying the compression algorithms, the compressed data together with the necessary metadata takes up more space than the uncompressed data, compression is not applied to that page. And since the system operates on pages, at the final stage of the Data Progression process partially filled pages are consolidated to free up disk space. SCOS can also recognize blocks filled with zeros and replace them with references, which allows for even higher reduction ratios.
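
    The accept-or-reject decision can be expressed in a few lines of Python. zlib here is only a stand-in for the array's real algorithm, and METADATA_COST is an invented constant:

        import zlib

        BLOCK_SIZE = 64 * 1024   # compression always works on 64KB blocks
        METADATA_COST = 64       # hypothetical per-block metadata overhead, bytes

        def reduce_block(block):
            if block == bytes(len(block)):
                return b""              # zero-filled block: keep only a reference
            compressed = zlib.compress(block)
            if len(compressed) + METADATA_COST < len(block):
                return compressed       # compression pays off, store compressed
            return block                # otherwise write the block "as is"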

    But the use of SSDs and the spread of All-Flash systems call for more active use of data optimization technologies to bring storage costs down. Therefore, in SCOS 7, Data Reduction was supplemented with deduplication. Since all optimizations run in parallel with the movement of data between tiers, both compression and deduplication require a Data Progression license. All-Flash arrays with a single storage tier are the exception: for them this license does not have to be purchased. Note that if there are no SSDs in the array at all, neither compression nor deduplication can be used. At first glance this seems rather strange, because both technologies work on pages that could be moved from the fastest tier down to regular disks. And indeed, compressed data can be stored on regular disks, but accessing it always requires an additional lookup of service metadata. That is why SSDs (at least 6 drives) are also needed to store this metadata: if it were placed on regular disks, there would be significant performance degradation when accessing compressed data.
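
    The read path shows why the SSD requirement exists. In the rough sketch below (the names and layout are assumptions, not SCOS internals), every read of compressed data costs an extra metadata lookup, which stays cheap only if that lookup hits SSD:

        import zlib

        metadata_on_ssd = {}   # logical block id -> (device, offset) of stored data
        hdd_store = {}         # (device, offset) -> compressed bytes, may sit on HDD

        def read_compressed(block_id):
            location = metadata_on_ssd[block_id]   # extra lookup on every read
            return zlib.decompress(hdd_store[location])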

    Deduplication is supported only on controllers with sufficient processor performance: SC4020, SC7000, SC8000, and SC9000. Systems based on SC40 and SCv2000 are not supported. Dedicated processor cores in the controllers handle data optimization, so in most cases the background processes have no noticeable effect on I/O performance. And at any time you can pause the Data Reduction processes if they really do start to affect the speed of the system.

    Optimization processes run in the background and can start either on a schedule or as soon as a new snapshot of a volume is taken. After the snapshot is created, an “on-demand data progression” process starts, followed by compression and deduplication (if, of course, they are enabled for that volume). There is an important difference between the two options: the daily process optimizes all frozen pages, while the “fast” on-demand option handles only the pages that were frozen by the new snapshot. As a result, the background process loads the system less during business hours.
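
    The difference between the two runs is easy to show schematically. The volume and page attributes here are invented for the example:

        def pages_to_optimize(volume, on_demand):
            if on_demand:
                # "Fast" run: only pages frozen by the most recent snapshot.
                return [p for p in volume.frozen_pages
                        if p.snapshot_id == volume.latest_snapshot_id]
            # Daily run: every frozen page of the volume.
            return list(volume.frozen_pages)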

    Deduplication, unlike compression, works with 4K blocks. The implementation principle is standard: as in other systems, a hash of each block is computed and compared against a dictionary. If the hash is already in the dictionary, the data block is replaced by a reference, which saves disk space. Compression then runs after deduplication, and at that stage only the remaining unique 4K blocks are compressed. Everything said about compression earlier remains valid: in some cases compression may not give the desired effect, and then the page is written to disk “as is”.
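
    Put together, the pipeline looks roughly like this. SHA-256 and zlib stand in for whatever hash and compression algorithms the array actually uses, and the dictionary is simplified to a plain in-memory dict:

        import hashlib
        import zlib

        DEDUP_BLOCK = 4 * 1024   # deduplication works with 4K blocks

        def dedup_then_compress(page, dictionary):
            unique_blocks, references = [], []
            for offset in range(0, len(page), DEDUP_BLOCK):
                block = page[offset:offset + DEDUP_BLOCK]
                digest = hashlib.sha256(block).digest()
                if digest in dictionary:
                    # Known block: store a reference instead of the data.
                    references.append((offset, dictionary[digest]))
                else:
                    dictionary[digest] = len(dictionary)   # assign a new block id
                    unique_blocks.append(block)
            # Compression runs after deduplication, on the unique blocks only.
            return [zlib.compress(b) for b in unique_blocks], references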

    For each volume, you can choose which technologies to use: compression, compression paired with deduplication, or no Data Reduction at all. It is not possible to enable deduplication without compression. Depending on which storage tiers are used, optimization works in different ways.

    As usually happens with a new solution, there are some implementation details to keep in mind. Storage-level replication is supported for compressed and deduplicated volumes, but at the moment of the actual transfer the data is decompressed in controller memory (the data on the disks remains compressed) and only uncompressed data is sent “out”. Yes, when setting up replication you can separately enable data deduplication, but this process only works when sending data to the remote system, and the data itself is rehydrated beforehand. If you want to change the controller that owns a volume with Data Reduction enabled, you must first turn off compression and deduplication, wait for full rehydration (after the next Data Progression cycle), and only then change the owner to the other controller.

    Yes, there was a time when tiered storage allowed Dell Compellent to be positioned as a new and unique solution. But now, as All-Flash gains popularity, you cannot leave a system without data optimization: the price per GB becomes too high compared to competitors. The arrival of new functionality in SCOS is welcome news, and since the update is also supported on existing controllers, customers get the opportunity to make better use of the storage systems they already own. Whether periodic (post-process) deduplication is better or worse than the alternatives is a perennial question, and each project will have its own answer. The right approach is to test the equipment before purchasing and to listen to more than just the vendors’ marketing claims when choosing a solution.

    Trinity engineers will be happy to advise you on server virtualization, storage, workstations, applications, and networks.

    Visit the popular Trinity Tech Forum or request a consultation.

    Other Trinity articles can be found on the Trinity blog and hub. Subscribe!