ZFS Storage, standby and test environments
- Do we have a snapshot from January, closer to February?
- Let me check... Yes, there is one! Opening it now.
It happens: a test database has an average lifetime, and snapshots have a retention period agreed on by everyone involved, yet one environment lingers for a long time on its snapshot, which therefore never gets deleted... and then that snapshot turns out to be useful to colleagues. Two minuses make a plus.
Any system in which something can go wrong needs backups. If the system is also being actively developed, it needs development and test environments somewhere as well. Backups and test environments work with essentially the same data, and both need a lot of space. The environments also have to be brought up to the current state somehow. All of this costs hardware and time.
In our case, these needs were covered by the Oracle ZFS Storage Appliance and Oracle/Sun servers, which effectively merged into one ecosystem with the Exadata that had arrived shortly before them.
Because Exadata contains an InfiniBand switch through which its components communicate, and ZFS Storage is also an Oracle appliance:
- first, it was connected directly to this switch through some of its ports;
- second, it can store tablespace files with segments compressed using Exadata Hybrid Columnar Compression (EHCC), which saves us a lot of space in the main system. If you restore the database on an ordinary separate server instead, then after recovery any access to the compressed data fails with an error to the effect that "data files compressed with EHCC must be stored on an Oracle appliance";
- third, its capacity can be used to store the files of test environments.
And what about space? Duplication has to be avoided.
Test environments need the same data that is in the backups. Can one copy of the data serve both purposes: be a backup and also the foundation for any privileged test environment that needs the complete data set? It can!
The Oracle ZFS Storage Appliance is an array that, among other things, can export network shares backed by the ZFS file system. Within ZFS you can create snapshots and deploy clones based on them, and each clone is visible as a new network share. We use this feature as follows:
- On ZFS Storage (as we will call the array, to avoid confusion with the file system), two shares are created: archived logs go into one and the database files into the other;
- The shares are mounted on an Oracle/Sun server (which is also an appliance), and an Oracle Database instance runs on that server as a cascaded physical standby: it receives logs from what is nominally the standby site and applies the changes to the files in the share;
- Log apply is organized by work units (hello to everyone involved in distributed computing!). The algorithm has a notion of a work unit, which corresponds to a certain time interval. Once the logs for the required interval have been applied, the instance is shut down, and the share holds files that are consistent with each other and with the control file. In effect this is a cold backup, or an image copy, and a snapshot is taken on top of it;
- When it is time to re-create a test environment, a clone is created from the desired snapshot. The clone is mounted on the server that runs the environment, and the files in it are opened as a database under a different name in read/write mode;
- As the test database is used, the changes made to it accumulate inside the clone, which gradually grows. The environment reaches its maximum size by the end of its life cycle;
- To reduce disk space consumption even further, we use LZJB compression, which ZFS Storage performs on the fly.
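The snapshot and clone steps above can be sketched as a pair of REST calls. This is only an illustration under stated assumptions: the host name, port, pool, project and share names, the snapshot naming scheme and the 6-hour work unit are all hypothetical, and the URL layout is modeled on the appliance's REST API as we understand it, so check it against the documentation for your release. The requests are built as plain data and never sent, so the sketch runs without an appliance.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical values, chosen only for this illustration.
API_ROOT = "https://zfs-storage.example.com:215/api/storage/v1"  # assumed base URL
POOL, PROJECT, SHARE = "pool0", "standby", "datafiles"           # assumed layout


@dataclass(frozen=True)
class Request:
    """A REST call described as data; nothing is sent over the network."""
    method: str
    url: str
    body: dict


def work_unit_end(start: datetime, length: timedelta) -> datetime:
    """Logs are applied up to this point in time, then the instance is stopped."""
    return start + length


def snapshot_name(unit_end: datetime) -> str:
    """Name the snapshot after the time its files are consistent to (assumed scheme)."""
    return unit_end.strftime("wu_%Y%m%d_%H%M")


def share_url() -> str:
    """URL of the share that holds the standby's database files (assumed path layout)."""
    return f"{API_ROOT}/pools/{POOL}/projects/{PROJECT}/filesystems/{SHARE}"


def create_snapshot(unit_end: datetime) -> Request:
    """Snapshot the share after the standby instance has been shut down."""
    return Request("POST", f"{share_url()}/snapshots", {"name": snapshot_name(unit_end)})


def clone_snapshot(snapshot: str, clone_share: str) -> Request:
    """Turn a snapshot into a new writable share for a test environment."""
    return Request("PUT", f"{share_url()}/snapshots/{snapshot}/clone", {"share": clone_share})


if __name__ == "__main__":
    end = work_unit_end(datetime(2024, 1, 31, 18, 0), timedelta(hours=6))
    snap = create_snapshot(end)
    clone = clone_snapshot(snap.body["name"], "test_env_1")
    print(snap.method, snap.url, snap.body)
    print(clone.method, clone.url, clone.body)
```

Keeping the calls as data makes the sequence easy to log and review before anything touches the array; sending them is then a thin layer on top of any HTTP client.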
As a result:
- In the current configuration, test environments can drive I/O at up to 3.75 Gb/s.
 Reads are capped by the current settings of the InfiniBand ports on the server; writes are capped by the CPUs of the ZFS Storage controllers and peak at roughly 2 Gb/s. (Yes, really: because 10 GbE was not enough, separate switches were purchased for the test side, connecting ZFS Storage and the servers themselves);
- Several snapshots are created per day and are currently kept, depending on the database, for 2 weeks to 2 months. After that they are all deleted, except for the snapshots taken at 00:00 on the 1st of each month, which are kept for more than a quarter. There have been cases where snapshots about six months old turned out to be useful;
- If necessary, the entire production database can be restored from the desired snapshot, also at about 1 to 3 Gb/s. A far more popular option, though, is to create a clone from the desired snapshot and extract the data of the required tables from it;
- Re-creating a test environment takes about 1 hour (including the transfer of several additional schemas into it, and so on);
- Providing colleagues with a clone from which they can pull data for recovery, or simply for analysis, takes from 15 minutes (under ideal conditions) to 1-2 hours when ZFS Storage, or our team :), is under heavy parallel load;
- If necessary, data can be recovered from either a snapshot or a clone, up to and including the entire database;
- The main performance limiter is the number of IOPS generated by the test environments and the cascaded standby instances. Here the system behaves entirely predictably: once sustained load approaches 75 IOPS per HDD (the array uses 3.5" disks at 7200 rpm), performance gradually starts to sag. Short bursts are absorbed noticeably more easily thanks to Write Flash and Read Flash;
- The number of IOPS, the total volume of incoming data, the CPU load, the number of reads served from the RAM and Flash caches, and a few dozen (if not hundreds of) other metrics can be viewed in the web-based management interface;
- ZFS Storage objects can be managed through the REST requests described in the documentation. We have used them to automate the removal of outdated snapshots, and much more could be done!
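The retention rule described above (regular snapshots expire after a fixed period, while the ones taken at 00:00 on the 1st of a month live longer) can be sketched as a small selection function. This is not our production script: the snapshot names, the 14-day and 120-day periods and the `in_use` set are assumptions made for the example. The `in_use` set stands for snapshots that still have a clone built on them, like the lingering environment from the introduction; the sketch simply leaves those alone.

```python
from datetime import datetime, timedelta


def is_monthly(taken: datetime) -> bool:
    """True for snapshots created at 00:00 on the 1st of a month."""
    return taken.day == 1 and taken.hour == 0 and taken.minute == 0


def expired(taken: datetime, now: datetime,
            regular_keep: timedelta, monthly_keep: timedelta) -> bool:
    """A snapshot is expired once it is older than the period for its kind."""
    keep = monthly_keep if is_monthly(taken) else regular_keep
    return now - taken > keep


def snapshots_to_delete(snapshots: dict, now: datetime, in_use: set,
                        regular_keep: timedelta = timedelta(days=14),    # assumed period
                        monthly_keep: timedelta = timedelta(days=120)):  # assumed period
    """Return the names of expired snapshots that no clone depends on, sorted."""
    return sorted(
        name for name, taken in snapshots.items()
        if name not in in_use and expired(taken, now, regular_keep, monthly_keep)
    )


if __name__ == "__main__":
    now = datetime(2024, 6, 1, 12, 0)
    snapshots = {
        "s_20240101_0000": datetime(2024, 1, 1, 0, 0),    # monthly, past its period
        "s_20240401_0000": datetime(2024, 4, 1, 0, 0),    # monthly, still kept
        "s_20240510_0600": datetime(2024, 5, 10, 6, 0),   # regular, past its period
        "s_20240525_1200": datetime(2024, 5, 25, 12, 0),  # regular, still kept
        "s_20240301_1200": datetime(2024, 3, 1, 12, 0),   # old, but a clone uses it
    }
    for name in snapshots_to_delete(snapshots, now, in_use={"s_20240301_1200"}):
        print("delete", name)
```

The names returned by `snapshots_to_delete` are what a cleanup job would then pass to the appliance's snapshot delete request, one call per snapshot.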