Fault tolerance in Qsan storage
Today, with virtualization in widespread use, storage systems are the core of the IT infrastructure: they hold all of the virtual machines. A failure of this node can bring the entire data center to a halt. Although much server equipment has some form of fault tolerance “by default”, storage systems face stricter “survivability” requirements precisely because of their special role in the data center.

The most effective way to ensure fault tolerance in IT is to use several instances of hardware and software (in the simplest case, duplication). A storage system can, of course, be duplicated entirely, and for disaster recovery that is exactly the approach used. But not every company can afford such a solution: it means not only doubling the hardware cost but also paying to set up and then support the duplicated configuration.
However, even when the equipment can be duplicated, fault tolerance still has to be ensured at the component level. In storage systems, redundancy is applied to power supplies, cooling modules, drives and, of course, controllers. All of this has long been commonplace, and it is hard to find a storage system that is not designed this way. Qsan is no exception. In this article, though, we want to talk about features that are less obvious but are aimed first and foremost at increasing the fault tolerance of the system as a whole.
Cooling modules
Storage systems in 2U-3U chassis very often use combined modules that house both a power supply and fans. On the one hand this is convenient, because only one unit needs to be serviced. On the other hand, if the cooling part fails, the power supply may be forcibly shut down to avoid overheating. That may not seem like the most critical situation, but there is clearly no point in adding vulnerabilities to a storage system.
In Qsan storage systems, cooling is provided by separate hot-swappable modules that are independent of the power supplies. The power supplies do have their own fans, but those are designed to cool only the PSU itself. Each cooling module holds two independent fans that back each other up. There are two such modules in the storage system, one on the right and one on the left, to provide efficient airflow over all components. If one fan fails, all the others automatically increase their speed to compensate for the lost airflow. That is why a single fan failure does not put the whole device at risk of overheating.
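The compensation idea can be illustrated with a rough sketch. It assumes that airflow is approximately proportional to fan speed and that the total airflow must stay constant; the function name, the duty-cycle values and the four-fan layout (two modules with two fans each) are illustrative assumptions, not the actual Qsan firmware logic.

```python
def compensated_duty(total_fans, failed_fans, nominal_duty, max_duty=100.0):
    """Duty cycle (%) each working fan needs to keep total airflow constant.

    Simplified model (an assumption): airflow scales linearly with duty cycle,
    so the remaining fans share the airflow of the failed ones equally.
    """
    working = total_fans - failed_fans
    if working <= 0:
        raise ValueError("no working fans left")
    required = nominal_duty * total_fans / working
    # A real fan cannot exceed its maximum speed, so clamp the result.
    return min(required, max_duty)


if __name__ == "__main__":
    # Four fans at 60% duty, one fails: the other three must run at 80%.
    print(compensated_duty(total_fans=4, failed_fans=1, nominal_duty=60.0))
```

In this model a single failure is absorbed as long as the required duty stays below the maximum, which is why fans are normally not run near full speed in a healthy system.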
Expansion shelf connection topology
The classic way to connect expansion shelves to a storage system is a topology called a cascade. Here, each storage controller is connected to the corresponding shelf controller by a single SAS cable, which gives two cables in total for a dual-controller system. A second shelf is connected to the first shelf in the same way, and so on. The advantage of this topology is that it is easy to implement in hardware. The drawback is some vulnerability to a sudden break in the SAS chain, caused either by a cross failure of a storage controller and a shelf controller that are not connected to each other, or by a loss of power on one of the expansion shelves in the middle of the chain. The result is loss of access to some of the drives and a possible collapse of the RAID group if it is spread across several enclosures.
Qsan protects against a cross-controller failure with an internal logical link between the controllers through the storage backplane. That is, a storage controller sees not only the JBOD controller directly connected to it but also the “neighbor” controller, via a dedicated link in the backplane. As a result, if such a failure occurs and nobody physically pulls out the SAS cables between the storage system and the shelf, access to all drives is preserved.

To protect against a break in the SAS chain, for example when an expansion shelf loses power, a different connection topology is usually used: the reverse cascade. In this case the storage system is connected to both the first and the last shelf in the chain, so it can reach the drives from both ends.
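The difference between the two topologies can be shown with a small reachability model. This is a minimal sketch under simplifying assumptions: shelves form a single linear chain, a powered-off shelf passes no SAS traffic, and the function name and shelf numbering are hypothetical.

```python
def reachable_shelves(num_shelves, failed, reverse_cascade):
    """Return the set of shelf indices (0-based) the storage head can still reach.

    Shelves are chained 0-1-2-...-(n-1). The head is always cabled to shelf 0;
    in a reverse cascade it is also cabled to the last shelf.
    """
    failed = set(failed)
    reachable = set()
    # Walk down the chain from the first shelf until a failed shelf blocks it.
    for i in range(num_shelves):
        if i in failed:
            break
        reachable.add(i)
    if reverse_cascade:
        # Walk up the chain from the last shelf as well.
        for i in range(num_shelves - 1, -1, -1):
            if i in failed:
                break
            reachable.add(i)
    return reachable


if __name__ == "__main__":
    # Four shelves, the second one (index 1) loses power.
    print(reachable_shelves(4, {1}, reverse_cascade=False))  # {0}
    print(reachable_shelves(4, {1}, reverse_cascade=True))   # {0, 2, 3}
```

With a plain cascade, everything behind the failed shelf is lost; with a reverse cascade, only the drives in the failed shelf itself become unavailable.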

If you want stronger protection, you can build larger configurations using, for example, a tree topology, or combine the topologies mentioned above. This is possible thanks to the large number of SAS ports on the devices (2 on each storage controller and 5 on each JBOD controller), which automatically detect whether they operate as inputs or outputs. The main thing is that the administrator does not get confused; the storage system will work out the configuration correctly on its own.
Fast rebuild
Having hot spare disks in the system significantly increases the reliability of data storage. However, the mere fact that such disks are allocated does not mean absolute protection. The recovery process (rebuild) is laborious and often takes a long time. The difficulty comes from the ongoing access to the production data: alongside its normal work, the system must also copy data to the new disk. The duration of the rebuild depends directly on the capacity of the drive and its speed. And since the system knows nothing about how much disk space is actually occupied, during a rebuild it simply copies everything, block by block.
As a result, rebuilding a modern high-capacity disk of 10+ TB while the storage system is under serious load can easily take a week or more. Keep in mind, too, that during a rebuild the probability of other drives failing rises significantly because of the extra load on them. That can pose a serious danger with, for example, RAID5, which tolerates the loss of only one drive.
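The dependence on capacity and speed comes down to simple arithmetic. The sketch below is only an illustration: the function name and the effective rebuild rate are assumptions, not Qsan figures. Under heavy production load, the rate left over for the rebuild can be a small fraction of the drive's sequential speed.

```python
def rebuild_hours(capacity_tb, rate_mb_s):
    """Hours needed to copy a whole drive block by block.

    capacity_tb -- drive capacity in decimal terabytes (10**12 bytes)
    rate_mb_s   -- assumed effective rebuild rate in MB/s (10**6 bytes/s)
    """
    seconds = (capacity_tb * 1e12) / (rate_mb_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # A 10 TB drive with only 15 MB/s left for the rebuild under load.
    hours = rebuild_hours(10, 15)
    print(round(hours), "hours, or about", round(hours / 24, 1), "days")
    # prints: 185 hours, or about 7.7 days
```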
Many storage developers have therefore worked on speeding up the recovery process. The approaches differ, but the essence is the same: during a rebuild, copy only the blocks that are actually occupied. Qsan has not stood aside either. In this vendor's storage systems, when the Fast Rebuild option is enabled, the system keeps track of the blocks that have been written to, so that if a disk fails it can copy only those blocks to the new drive.
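The general idea of tracking written blocks can be sketched as follows. This is a simplified illustration, not Qsan's implementation: the class and function names are hypothetical, the "disk" is a list in memory, and the source object stands for the data reconstructed from the surviving RAID members.

```python
class TrackedDisk:
    """Toy block device that remembers which blocks have ever been written."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        self.written = [False] * num_blocks  # one flag per block

    def write(self, index, data):
        self.blocks[index] = data
        self.written[index] = True  # the extra bookkeeping done on every write


def fast_rebuild(source, target):
    """Copy only the blocks marked as written; return how many were copied."""
    copied = 0
    for index, used in enumerate(source.written):
        if used:
            target.write(index, source.blocks[index])
            copied += 1
    return copied


if __name__ == "__main__":
    source = TrackedDisk(1000)
    for i in (3, 42, 512):
        source.write(i, b"data")
    spare = TrackedDisk(1000)
    print(fast_rebuild(source, spare), "of", len(source.blocks), "blocks copied")
    # prints: 3 of 1000 blocks copied
```

The flag update inside `write` is the cost of this approach: every write carries a small amount of additional tracking work in exchange for a much shorter rebuild on a partly filled volume.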

The Fast Rebuild option is not enabled by default when new volumes are created, because it affects performance, especially for random writes:
- the system has to track writes at the block level;
- during a rebuild, checksums are not recalculated for unallocated space, so when something is first written to such an area, the area has to be “initialized” first.
Fast Rebuild is therefore not recommended for volumes that, for example, host heavily loaded databases, or for video surveillance systems, where the volume will be 100% full anyway. For file or mail servers, however, the option is very useful.
Instead of a conclusion
Every storage manufacturer implies that its devices are reliable. And if there were no fatal miscalculations in the design and no excessive urge to cut costs in production and testing, then on the whole we can agree with the vendor. However, you need to understand the following:
- basic fault tolerance of a storage system is, first of all, a way to keep access to data when any component (or several) fails;
- additional fault tolerance options (such as those described above) eliminate certain types of malfunction and increase your chances of keeping access to data;
- 100% reliability, alas, does not exist. But to get as close to it as possible, most sensible storage vendors (Qsan among them) make every effort to keep improving their products in both hardware and software.
At the same time, do not forget that no degree of storage reliability removes the need for backups, clear and rehearsed disaster recovery plans, and prompt technical support from the vendor.