SCT Error Recovery Control

    ... or what a 'RAID edition' hard drive really is



    A bit of theory


    A hard drive can follow one of two strategies when it detects an error:
    • standalone / desktop - keep retrying the read to the bitter end. To the user it looks like a "sluggish drive" that still works: if the failure is a one-off, the drive stalls for a while and then carries on, accompanied by the characteristic clatter of the heads recalibrating.
    • RAID - give up right away. To the user it looks like "a disk error suddenly popped up, but then MHDD and the like FOUND NOTHING. WHAT DO I DO?"
    The strategies obviously serve different purposes. A desktop drive is better off struggling through the read than returning an error. In a RAID there is a redundant drive, and nobody can afford to sit through a stalled read. Could not read the sector? Read it from the redundant drives, mark the whole drive as failed, start a resync, and send the drive off for disposal. Perhaps undeservedly, but a drive in a responsible position has no business hiccuping.

    Control over the error-handling strategy is a feature of expensive hard drives. In desktop series it is often simply absent, or present but without the ability to turn it on, so the drive grinds away at the error for as long as it sees fit. The second important point: on RAID drives this option is enabled by default, which can lead to problems.

    Decoding the name


    The ability to control a drive's behavior on errors has a very, very confusing name: SCT ERC. It stands for SCT Error Recovery Control. SCT, in turn, is the name of the general protocol, SMART Command Transport. SMART, in turn, stands for Self-Monitoring, Analysis and Reporting Technology, so the full expansion of SCT ERC is: Self-Monitoring, Analysis and Reporting Technology Command Transport Error Recovery Control (exhale).

    Quick reference


    You can check whether a drive supports error recovery control with the command smartctl -a /dev/sdxx; look at the SCT capabilities lines:

    SCT capabilities:  (0x303f) SCT Status supported.
    			SCT Error Recovery Control supported.  *****
    			SCT Feature Control supported.
    

    If the marked line is missing, the drive does not support the command.
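
    The check above can be scripted. This is a minimal sketch, not an official smartmontools recipe: the function name has_sct_erc is made up for illustration, and it only looks for the capability line shown in the output above.

```shell
#!/bin/sh
# has_sct_erc (hypothetical helper name): reads `smartctl -a` output on stdin
# and exits 0 if the "SCT Error Recovery Control supported" line is present.
has_sct_erc() {
    grep -q 'SCT Error Recovery Control supported'
}

# Usage (needs root and a real device; /dev/sda is only an example):
#   if smartctl -a /dev/sda | has_sct_erc; then
#       echo "ERC is advertised"
#   else
#       echo "ERC is not advertised"
#   fi
```

    Note that this only reports what the drive advertises. As the table of defaults further down shows, some drives advertise ERC and still refuse to change it.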

    Next comes the actual management. The drives I have seen expose two parameters: the read timeout and the write timeout. Below I list the values for every drive I could get my hands on.

    To view the timeouts, use the command smartctl -l scterc /dev/sda. The output looks like this:

    # smartctl -l scterc /dev/sda
    SCT Error Recovery Control:
               Read:     70 (7.0 seconds)
              Write:     70 (7.0 seconds)
    # smartctl -l scterc /dev/sde
    SCT Error Recovery Control:
               Read: Disabled
              Write: Disabled
    # smartctl -l scterc /dev/sdd
    Warning: device does not support SCT Error Recovery Control command
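
    For scripting, the three kinds of output shown above can be reduced to a single line. The sketch below assumes the output format is exactly as printed above; erc_timeouts is a made-up helper name, and mapping "Disabled" to 0 is my own convention.

```shell
#!/bin/sh
# erc_timeouts (hypothetical helper name): reads `smartctl -l scterc` output on
# stdin and prints "READ WRITE" in tenths of a second, with 0 for "Disabled",
# or the word "unsupported" if the values cannot be found.
erc_timeouts() {
    awk '
        /does not support SCT Error Recovery Control/ { unsupported = 1 }
        $1 == "Read:"  { r = ($2 == "Disabled") ? 0 : $2 }
        $1 == "Write:" { w = ($2 == "Disabled") ? 0 : $2 }
        END {
            if (unsupported || r == "" || w == "") print "unsupported"
            else print r, w
        }
    '
}

# Usage (needs root; /dev/sda is only an example):
#   smartctl -l scterc /dev/sda | erc_timeouts
```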
    

    To set the timeouts, give the values separated by commas after scterc: smartctl -l scterc,120,60 /dev/sde (the values are in tenths of a second, so 120 means 12 seconds; the first number is the read timeout, the second is the write timeout). 0 means "to the bitter end", that is, retry indefinitely.
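
    Since the unit is easy to get wrong, here is a small sketch that converts whole seconds to tenths and prints the resulting command instead of running it. The helper name erc_set_cmd and its argument order are assumptions for illustration only.

```shell
#!/bin/sh
# erc_set_cmd (hypothetical helper name): prints, but does not run, the smartctl
# command that sets the ERC timeouts.
# Arguments: device, read timeout, write timeout (whole seconds; 0 = no limit).
erc_set_cmd() {
    dev=$1
    read_ds=$(( $2 * 10 ))     # smartctl expects tenths of a second
    write_ds=$(( $3 * 10 ))
    printf 'smartctl -l scterc,%d,%d %s\n' "$read_ds" "$write_ds" "$dev"
}

# Usage: review the printed command, then run it as root.
#   erc_set_cmd /dev/sde 12 6
```

    With the arguments above it prints the same command as in the example: smartctl -l scterc,120,60 /dev/sde.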

    Default values


    Here is the data from the various drives I have on hand:

    Title                                       Model                  ERC (supported or not; default read/write values in seconds, if any)
    Western Digital VelociRaptor                WDC WD1500HLFS-01G6U1  Yes, 7/7
    Western Digital RE4 Serial ATA              WDC WD1500HLFS-01G6U1  Yes, 7/7
    Western Digital RE3 Serial ATA family       WD1002FBYS-02A6B0      Yes, 7/7
    Western Digital Caviar Green (Adv. Format)  WDC WD20EARS-00MVWB0   Not supported
    Western Digital Caviar Green                WD7500AACS-00D6B0      Yes, 0/0, cannot be enabled
    Seagate Maxtor DiamondMax 22                STM3500320AS           Yes, 0/0, can be enabled
    Seagate Barracuda 7200.9                    ST3400633AS            No (the Maxtor and the Seagate are from the same years, yet the Seagate lacks it - wow)
    Seagate Barracuda 7200.10                   ST3500630AS            No
    Seagate Barracuda 7200.11                   ST31500341AS           (suddenly!) Yes, 0/0, can be enabled
    Seagate Barracuda LP                        ST31500541AS           Yes, 0/0 (that is, off), can be enabled
    SAMSUNG SpinPoint F4 EG (AFT)               SAMSUNG HD204UI        Yes, 0/0 (off), can be enabled
    Hitachi Deskstar 7K3000                     HDS723030ALA640        Yes, 0/0, cannot be enabled (SCSI error: aborted command)
    Hitachi Deskstar T7K500                     HDT725032VLA360        Yes, 0/0, cannot be enabled

    (Just don't ask why I have so many drives at home.)

    Moral


    People who buy RE4 drives (or RAID editions from the other remaining manufacturers) or VelociRaptors for use as their only hard drive, and do not set ERC to zero, commit a gigantic folly. It is matched only by the folly of people who put desktop drives into a RAID without configuring ERC and hope that the array will save them when a drive fails.

    In practice: if you bought a single fancy drive for home, turn ERC off (0,0). If you bought a drive for a RAID, check that its ERC is nonzero, and preferably close to a sensible value of around 3-10 s (30-100 in smartctl's tenths of a second).
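
    The rule of thumb can be written down as a tiny check. This is only a sketch of the advice in this section: the function name erc_verdict and the role names "single" and "raid" are made up, and the 3-10 s range is expressed as 30-100 because smartctl counts in tenths of a second.

```shell
#!/bin/sh
# erc_verdict (hypothetical helper name): compares the current ERC timeouts
# against the advice in this article.
# Arguments: role ("single" or "raid"), read and write timeouts in tenths of a
# second, as printed by `smartctl -l scterc` (use 0 for "Disabled").
erc_verdict() {
    role=$1 read_ds=$2 write_ds=$3
    case $role in
        single)
            if [ "$read_ds" -eq 0 ] && [ "$write_ds" -eq 0 ]; then
                echo "ok: ERC is off"
            else
                echo "fix: set scterc,0,0"
            fi ;;
        raid)
            if [ "$read_ds" -eq 0 ] || [ "$write_ds" -eq 0 ]; then
                echo "fix: ERC is off, set a timeout"
            elif [ "$read_ds" -lt 30 ] || [ "$read_ds" -gt 100 ] ||
                 [ "$write_ds" -lt 30 ] || [ "$write_ds" -gt 100 ]; then
                echo "check: outside 3-10 s"
            else
                echo "ok: ERC is on"
            fi ;;
        *)
            echo "unknown role: $role" >&2
            return 2 ;;
    esac
}

# Usage:
#   erc_verdict single 70 70
#   erc_verdict raid 70 70
```

    A RAID drive that reports "check" is not necessarily misconfigured; the 3-10 s range is only the rough guideline given above.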

    Models that need attention when used on a desktop: WD RE3, RE4, Raptor, Seagate NS.

    PS Besides ERC, manufacturers promise better quality and reliability for the RE / NS series. We cannot verify that, whereas the presence or absence of ERC is an objective, easily checked property. A drive without ERC should not be in a RAID under any circumstances: when it fails, it will do more harm than good.

    PPS How to work with SMART on Microsoft Windows, I have no idea. Call the manufacturer's support and ask. Phone: 8 (800) 200-8001.

    For Mac OS X, as far as I know, there is a port of smartmontools, so these commands should run there as well (as root).

    PPPS (from the comments) For WD there is a utility, WDTLER (TLER stands for Time-Limited Error Recovery), with which ERC / TLER can still be enabled on some Green-series drives: blog.agdunn.net/?p=208
