FlexClone - a Digital Dolly the Sheep

    image

    Dolly the sheep, mentioned in the title, is the household name of the first successful clone of a higher animal created by science. While cloning in biology is still an experimental field, in IT and in data storage the topic of clones has long since taken firm and broad root.
    The ability to quickly (and, most importantly, economically in terms of space and time) create a complete copy of a large data set is in high demand today. Such a data clone can be used for testing, development, and other experiments when doing them on production data is undesirable or outright impossible.


    A practical example of where and how clones can be applied.
    Suppose our company uses a large database in which our entire business is concentrated, for example the database of an ERP system. The business lives and grows, and along with it the development team writes new functionality for the database, designs new jobs, and implements new business logic. All of this needs to be tested and verified on data, and the more voluminous and realistic that data is, the more effective and accurate the testing will be.
    Ideally, they should test on real data, on your company's real database. But who in their right mind would let them test anything on a production database?
    Now, if only you could create an exact copy of the database, in its current state, just for testing!

    But to create a complete, independent clone, we need storage capacity at least equal to that of the main database. Moreover, the performance of such a copy should ideally match production, so copying the data onto "a bunch of USB drives" won't do. Not to mention that the implementation requirements of an ERP system such as SAP mandate that the infrastructure include development and QA (Quality Assurance, validation and verification) landscapes, and often a training database for educating new users as well; each of these needs its own copy of the current database for application development and validation, and for staff training.

    These requirements often mean that enterprises implementing an ERP system automatically end up with two or three storage systems in addition to the main one, or with two to three times as many disks in it, paying not only for the purchase but also for operation, service, administration, and so on. And the copies still have to be kept current, which means regularly rolling changes into them so that they stay in sync with the primary database.
    Nothing can be done about it: keeping several identical copies of the database is a necessity.
    Or is it?

    This is where clones come into play.
    Clones are complete copies of data that behave like full copies: not just "read-only", but like a regular copy. The contents of such a clone can not only be read but also modified, i.e. fully written to, and for an application this is indistinguishable from working with an ordinary data volume.

    You can make a clone in three ways. First, simply copy the data to free space and maintain its currency manually. This is obvious and uninteresting: it is really a copy rather than a clone, it consumes a huge amount of expensive storage space, and it takes a long time to create.
    The second option is the so-called Copy-on-Write (COW) copy. Here, a write that, say, the application under test wants to make to its "copy" first causes the original data to be copied into a specially reserved space, so that both the original data set and the changes are preserved. The problems are the same as with COW snapshots: each logical write turns into roughly three physical operations, which cuts the performance of such storage almost threefold. This, too, is uninteresting.
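The write amplification described above can be illustrated with a toy model. The class and field names below are invented for illustration; this is a sketch of the COW idea, not any vendor's implementation.

```python
# Toy model of a Copy-on-Write clone: before overwriting a block, the
# original is preserved in reserved space, so one logical write costs
# three physical I/O operations (read original, save it, write new data).

class CowClone:
    def __init__(self, source):
        self.source = source      # original blocks
        self.reserved = {}        # preserved originals ("copy-out" area)
        self.io_ops = 0           # count of physical I/O operations

    def write(self, block_no, data):
        if block_no not in self.reserved:
            original = self.source[block_no]    # 1. read the original block
            self.io_ops += 1
            self.reserved[block_no] = original  # 2. write it to reserved space
            self.io_ops += 1
        self.source[block_no] = data            # 3. write the new data
        self.io_ops += 1

    def read(self, block_no):
        return self.source[block_no]

store = CowClone(source=["A", "B", "C", "D"])
store.write(1, "B'")
print(store.io_ops)   # 3 physical operations for a single logical write
```

Once a block has already been preserved, subsequent writes to it cost one operation again, but the first touch of every block pays the triple price.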

    And the third option is the one NetApp implemented.
    As you already know, the underlying structure of all NetApp storage systems, the block allocation scheme called WAFL, is arranged so that changed blocks are not overwritten in place: the new data is written into free space on the volume, after which the pointers to the current data are updated.
    This scheme solves the problem of changed blocks in a clone easily and elegantly. The clone always remains a virtual "copy" of the data (which means it does not have to occupy disk space of its own, but simply references the blocks of the source data), while all the block changes made in the clone are accumulated separately.
    Therefore such clones occupy disk space only in the amount of the changes made to them.
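The pointer-redirect idea can be sketched as follows. All names here are invented for illustration; this is a minimal model of a redirect-on-write clone in the spirit of WAFL/FlexClone, not the actual on-disk format.

```python
# Toy model of a pointer-based clone: the clone starts as a copy of the
# parent's pointer table and shares all physical blocks. A write allocates
# a fresh block in free space and redirects one pointer, so the clone
# consumes space only for the blocks that were actually changed.

class Volume:
    def __init__(self, blocks):
        self.blocks = list(blocks)            # shared physical block store
        self.ptrs = list(range(len(blocks)))  # logical -> physical pointers
        self.own_blocks = set()               # physical blocks unique to this volume

    def clone(self):
        c = Volume([])
        c.blocks = self.blocks        # share the same physical blocks
        c.ptrs = list(self.ptrs)      # copy only the small pointer table
        return c

    def write(self, logical_no, data):
        self.blocks.append(data)              # write into free space...
        new_phys = len(self.blocks) - 1
        self.ptrs[logical_no] = new_phys      # ...then redirect the pointer
        self.own_blocks.add(new_phys)

    def read(self, logical_no):
        return self.blocks[self.ptrs[logical_no]]

parent = Volume(["A", "B", "C", "D"])   # 4 blocks of "production" data
c = parent.clone()                      # instant, costs only a pointer table
c.write(2, "C'")                        # clone diverges on one block
print(c.read(2), parent.read(2))        # C' C
print(len(c.own_blocks))                # 1 - the clone occupies one new block
```

Creating the clone itself is nearly free, and the parent's view of the data is untouched: only the changed block costs the clone any space.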

    image

    If the programmers in your development department change a couple of terabytes of data in their virtual clone of a 6 TB database, the clone will occupy exactly those 2 TB of free space on disk, and nothing more.

    image

    The screenshot shows what such a clone looks like in practice. There are three files on the disk, each 10 GB in size from Explorer's point of view, yet the space all three occupy on disk is only 10 GB.

    image

    It is also useful that such clones can easily be "split off" from the original volume and, if necessary, turned into a physical copy.
    Did the developers verify the process of upgrading the database to a new version, or test some patches? They upgraded, checked that all the software works correctly with the new version, made sure everything runs on the updated database, and then simply swapped the working volume for the updated and verified clone, with all the changes made in it; in this case the clone becomes a fully independent volume.
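The split step can be modeled in a few lines. This is a self-contained illustrative sketch (in Data ONTAP the operation corresponds to `vol clone split`): splitting physically copies every block the clone still shares with its parent, after which the clone no longer depends on the parent volume at all.

```python
# Toy model of splitting a clone off its parent: the clone's own changed
# blocks are kept, and every still-shared block is physically copied from
# the parent, yielding a fully independent volume.

def split_clone(parent_blocks, clone_changes):
    """Return an independent copy: shared blocks are copied from the
    parent, and the clone's changed blocks take precedence."""
    independent = dict(parent_blocks)   # copy every still-shared block
    independent.update(clone_changes)   # keep the clone's own changes
    return independent

parent = {0: "A", 1: "B", 2: "C"}       # production volume
clone_changes = {1: "B-patched"}        # only block 1 changed in the clone

full = split_clone(parent, clone_changes)
print(full)         # {0: 'A', 1: 'B-patched', 2: 'C'}
```

After the split, the parent can be replaced or retired without affecting the former clone, which is exactly the "promote the verified clone to production" workflow described above.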

    By the way, each volume can hold up to 255 such clones (not per system as a whole - per volume!), so you need not limit yourself in the number of clones. If your developers have several options and would like to choose one of them, just give them a clone for each option: let them experiment and choose, comparing all the desired variants at once.

    As you can see, the availability of such a simple and effective data cloning mechanism often changes the whole approach to using data. Things that were not done before "because it was impossible", such as practical debugging on real data or parallel development on voluminous multi-gigabyte and terabyte data sets, become quite feasible with such clones.

    The picture in the title of the article shows two "clones" of its own: a pair of the new FAS3200 series controllers.
