A family photo without Grandma, or what to do when ECC fails?
- Translation
Not long ago I wrote about an interesting error correction algorithm called LDPC. But what if error correction cannot do its job? Kent Smith recently published a good note on this subject on the LSI blog, and I decided to translate it.
Many users do not even suspect that when reading data from storage devices, whether traditional HDDs or SSDs, computers constantly run into errors. Error correction codes (ECC) are therefore used to fix the erroneous bits before incorrect data is returned to the user. But the capabilities of ECC are limited: if the number of errors exceeds a certain threshold, ECC fails. This is why most companies in the storage industry are developing more sophisticated algorithms. In its SandForce controllers, LSI goes much further to protect user data.
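To make the idea of a correction limit concrete, here is a minimal sketch using the textbook Hamming(7,4) code. This is an illustration only: real SSD controllers use far stronger codes (BCH, LDPC), and the function names `encode`, `decode` and `flip` are made up for this example. The code can repair one flipped bit per codeword; with two flipped bits it "corrects" the wrong position and silently returns bad data.

```python
def encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(codeword):
    """Fix at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the suspected error
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def flip(codeword, *positions):
    """Return a copy of the codeword with the given 0-based bits flipped."""
    c = list(codeword)
    for p in positions:
        c[p] ^= 1
    return c


data = [1, 0, 1, 1]
stored = encode(data)
print(decode(flip(stored, 4)) == data)     # one error: within the limit
print(decode(flip(stored, 0, 5)) == data)  # two errors: beyond the limit
```

The second case is the situation the article is about: the decoder has no way of knowing that it exceeded its limit, so something other than ECC has to provide the safety net.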
What happens when ECC fails?
If the ECC algorithms cannot do their job, only backup mechanisms can come to the rescue. There are three alternatives. The first is for users to make backups themselves, guarding against ECC failures and other threats that can damage data or make it inaccessible. These range from natural disasters that damage buildings and everything inside them (earthquakes, mudslides, landslides) to more exotic problems, from lightning damage to computers without proper surge protection to plain theft. According to recent studies, less than 10% of data is properly backed up. Not a very comforting figure.
The second solution is to use RAID (Redundant Array of Independent Disks). Data is automatically stored redundantly on several disks (sometimes even attached to different computers), and in the event of a failure this redundancy makes it possible to restore the lost data. The technology is very widely used in the corporate sector, but it is a rarity among home users.
Is there a simple, automatic solution that works for a single drive?
Yes on all three counts: this is exactly the third solution, which LSI implements in SandForce chipsets under the name RAISE™ data protection. The technology was introduced in 2009 with the first SandForce chipset. RAISE stands for Redundant Array of Independent Silicon Elements. It sounds like RAID, and in some ways the technologies are similar: RAISE treats individual flash elements inside the SSD like the disks in a RAID array, storing data with some redundancy. The original Level 1 RAISE can protect against the failure of an entire flash memory page (I have already written about flash pages here; translator's note), which is definitely beyond the power of classic ECC.
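The RAID analogy can be sketched with plain XOR parity, the scheme used by RAID 5. The internals of RAISE are not described in the source, so this is an assumption-laden illustration rather than LSI's implementation, and the names `xor_pages`, `write_stripe` and `rebuild` are hypothetical. Each "page" stands for the part of a stripe stored on a separate flash element.

```python
from functools import reduce


def xor_pages(pages):
    """Byte-wise XOR of equally sized pages."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def write_stripe(data_pages):
    """Return the stripe: the data pages followed by one parity page."""
    return list(data_pages) + [xor_pages(data_pages)]


def rebuild(stripe, lost_index):
    """Recompute the page at lost_index from all the surviving pages."""
    survivors = [page for i, page in enumerate(stripe) if i != lost_index]
    return xor_pages(survivors)


# Three data pages, each imagined to live on a different flash element.
stripe = write_stripe([b"family", b"photo!", b"grandm"])

# Suppose the second page becomes unreadable and ECC cannot recover it.
print(rebuild(stripe, 1))
```

One parity page per stripe is enough to survive the loss of any single page in that stripe, which is also why the scheme costs some capacity.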
Introduced last December, the SF3700 makes RAISE even more flexible, allowing users to protect their data better. The original Level 1 RAISE required a certain amount of space to be reserved solely for redundant data. On a 64 GB drive, the capacity available to the user was 60 or even 55 GB. Such losses are unwelcome on so small a drive, and the only way to avoid them was to disable RAISE protection. In the newer versions these losses have become optional. The new "fractional" RAISE option lets the technology use a minimal amount of memory while still protecting the data and leaving a sufficient level of spare area (over-provisioning). The latter is especially important, since it helps fight write amplification, keeping SSDs fast and protecting them from excessive wear.
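The capacity figures above translate into a reserved share as follows. This is a back-of-the-envelope calculation that assumes the full 64 GB is the raw capacity and that the whole difference goes to redundancy and spare area; the helper name `reserved_fraction` is made up for the example.

```python
def reserved_fraction(raw_gb, usable_gb):
    """Share of the raw capacity that is not available to the user."""
    return (raw_gb - usable_gb) / raw_gb


for usable_gb in (60, 55):
    share = reserved_fraction(64, usable_gb)
    print(f"64 GB raw, {usable_gb} GB usable: {share:.2%} reserved")
```

Under these assumptions the drive gives up between roughly 6% and 14% of its capacity, which is noticeable on a 64 GB model.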
Better protection with Level 2 RAISE
The new Level 2 RAISE protects users from even larger failures, from read errors on multiple pages to the failure of a whole chip. The technology redistributes data automatically, taking into account the number of errors on each chip. If a chip is close to failure, the protection moves its data to the other chips. Because this would otherwise reduce the capacity available to the user, Level 2 RAISE is able to "fall back" to Level 1 protection without giving up any user-visible storage.
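The redistribution idea can be sketched as a simple policy: watch an error counter per chip and move data off any chip that crosses a threshold. The source does not say how the controller actually decides, so the threshold, the data layout and the function name `rebalance` are all assumptions made for illustration.

```python
def rebalance(chips, error_limit):
    """Move pages off every chip whose error count has reached error_limit.

    chips maps a chip name to {"errors": int, "pages": list}.
    Returns the names of the chips that were emptied.
    """
    retired = [name for name, chip in chips.items() if chip["errors"] >= error_limit]
    healthy = [name for name in chips if name not in retired]
    if retired and not healthy:
        raise RuntimeError("no healthy chip left to receive the data")
    for name in retired:
        pages = chips[name]["pages"]
        while pages:
            # Send each page to the healthy chip that currently holds the least data.
            target = min(healthy, key=lambda n: len(chips[n]["pages"]))
            chips[target]["pages"].append(pages.pop())
    return retired


chips = {
    "chip0": {"errors": 2, "pages": ["a", "b"]},
    "chip1": {"errors": 97, "pages": ["c", "d", "e"]},  # close to failure
    "chip2": {"errors": 0, "pages": ["f"]},
}
print(rebalance(chips, error_limit=50))
print(chips)
```

After the move the failing chip holds nothing, so losing it costs no data, but the remaining chips are fuller, which is the capacity trade-off described above.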
Another feature of the new chipset is an additional (ninth) flash memory channel. It lets manufacturers build drives with more raw capacity, which in turn makes it possible to use Level 1 RAISE without reducing the capacity available to the user (without it, drive capacities drop to 60, 120 and 240 GB).
Of course, RAISE will not protect you from theft or from a disaster such as a power surge or a flood, but such events are clearly less likely than an ordinary ECC failure. The best strategy, therefore, is to buy a drive with RAISE and make periodic backups to protect against the larger problems.