
The problem with using kernel RAID autodetect for / on /dev/md0 alongside superblock v1.2 on other /dev/md devices, or how to take down (and bring back up) a server by updating it
Thanks for reading the title all the way to the end. That was a test.
Today, after yet another update of my favorite Gentoo server and a precautionary reboot, /dev/md1 suddenly dropped out, and the wise kernel said: sdc1 does not have a valid v0.90 superblock, not importing!
Shock! Panic! Well, not a kernel panic, at least...
So what was actually going on?
First, a few words about the server configuration, to make the essence of the problem and its solution easier to follow. The kernel is 3.10.7 with RAID autodetect enabled, and there are two RAID1 (mirror) arrays.
Root is mounted from /dev/md0, and /dev/md1 holds the database (Percona):
db13 ~ # cat /etc/fstab | grep md
/dev/md0 / ext3 noatime 0 1
/dev/md1 /mnt/db reiser4 noatime 0 0
And the relevant part of /boot/grub/grub.conf:
title Gentoo Linux 3.10.7 md0
root (hd0,0)
kernel /boot/kernel-3.10.7 root=/dev/md0
For the kernel to assemble md devices successfully at boot, two conditions must be met:
- the partitions the RAID is built on must have type 0xFD;
- the array must have been created by mdadm with superblock (metadata) format version 0.90.
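Both conditions can be checked per member partition. The following is a minimal POSIX shell sketch, not the author's procedure: the function names `superblock_version` and `autodetect_ok` are hypothetical, and it assumes the `Version : ...` line that `mdadm --examine` prints for a member (the partition type is supplied by hand).

```sh
#!/bin/sh
# Sketch: will in-kernel RAID autodetect assemble a given member partition?
# Both conditions must hold: MBR partition type 0xFD and a v0.90 superblock.

# superblock_version: read `mdadm --examine` (or `mdadm --detail`) output on
# stdin and print the value of its "Version :" field, e.g. "1.2".
superblock_version() {
    awk -F' : ' '/^ *Version :/ { print $2; exit }'
}

# autodetect_ok TYPE VERSION: exit 0 only if TYPE is fd/0xfd (any case) and
# VERSION starts with 0.90 (mdadm may print it as "0.90.00").
autodetect_ok() {
    ptype=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    ptype=${ptype#0x}
    [ "$ptype" = "fd" ] || return 1
    case "$2" in
        0.90*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on an md1 member (needs root; type passed in by hand):
#   ver=$(mdadm --examine /dev/sdc1 | superblock_version)
#   autodetect_ok fd "$ver" || echo "sdc1 will not be assembled by autodetect"
```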
Condition 1 (type 0xFD) was met in my configuration, but, as it turned out, the superblock format was 1.2. I suspect I created /dev/md1 after a newer version of mdadm arrived, one that uses this format by default. As a result, the kernel complains in no uncertain terms:
dmesg | grep md
[0.000000] Command line: root=/dev/md0 raid=/dev/md0
[0.000000] Kernel command line: root=/dev/md0 raid=/dev/md0
[1.063603] md: raid1 personality registered for level 1
[1.266420] md: Waiting for all devices to be available before autodetect
[1.266494] md: If you don't use raid, use raid=noautodetect
[1.266781] md: Autodetecting RAID arrays.
[1.293670] md: invalid raid superblock magic on sdc1
[1.294210] md: sdc1 does not have a valid v0.90 superblock, not importing!
[1.312482] md: invalid raid superblock magic on sdd1
[1.312556] md: sdd1 does not have a valid v0.90 superblock, not importing!
[1.312579] md: Scanned 4 and added 2 devices.
[1.312595] md: autorun ...
[1.312610] md: considering sdb3 ...
[1.312626] md: adding sdb3 ...
[1.312641] md: adding sda3 ...
[1.312657] md: created md0
[1.312665] md: bind
[1.312754] md: bind
[1.312770] md: running:
[1.313064] md/raid1:md0: active with 2 out of 2 mirrors
[1.313166] md0: detected capacity change from 0 to 7984840704
[1.313262] md: ... autorun DONE.
[1.320413] md0: unknown partition table
[1.338528] EXT3-fs (md0): mounted filesystem with ordered data mode
[2.581420] systemd-udevd[861]: starting version 208
[3.122748] md: bind
[4.896331] EXT3-fs (md0): using internal journal
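Picking the rejected members out of a long boot log takes only a small filter keyed on the kernel message quoted above. A sketch: `rejected_members` is a hypothetical name, and the pattern assumes device names made of lowercase letters and digits (sdc1, sdd1).

```sh
#!/bin/sh
# rejected_members: read kernel log text on stdin and print each partition
# that in-kernel autodetect refused for lacking a v0.90 superblock.
# grep -o also copes with several messages glued onto one line.
rejected_members() {
    grep -o 'md: [a-z0-9]* does not have a valid v0\.90 superblock' |
        awk '{ print $2 }'
}

# Typical use:
#   dmesg | rejected_members
```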
When Google Doesn’t Help
The choice is slim: either disable array autodetection in the kernel (recompiling it and editing grub.conf), or change the superblock format (a full data backup, breaking the mirror and then rebuilding it). Neither is really an option: both are destructive by nature, can lead to data loss, and can take a lot of time. (As I learned while searching for a solution, kernel autodetect is a deprecated feature.)
By the way, once the server is up, /dev/md1 starts fine with the command
mdadm --manage /dev/md1 --run
Of course, I could put this line somewhere in the rc scripts, but, you'll agree, that's not very sporting.
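If you do want that stopgap anyway, a guarded version would only touch md1 when /proc/mdstat reports it inactive. A sketch: `md_state` is a hypothetical helper, and the assumption that the half-assembled md1 shows up as `inactive` in /proc/mdstat is mine, not something the log above confirms.

```sh
#!/bin/sh
# md_state NAME: read /proc/mdstat text on stdin and print the state word
# ("active" or "inactive") of array NAME; prints nothing if it is absent.
md_state() {
    awk -v dev="$1" '$1 == dev && $2 == ":" { print $3; exit }'
}

# Hypothetical rc-script fragment (not the author's method):
#   if [ "$(md_state md1 < /proc/mdstat)" = "inactive" ]; then
#       mdadm --manage /dev/md1 --run
#   fi
```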
Eureka!
The solution did not come right away, although it was lying on the surface: all you need to do is remove type 0xFD (replacing it with 0x83) from the partitions that make up /dev/md1. The kernel then stops its unsuccessful attempts to assemble this array and no longer prevents udevd from doing its job. Indeed, after I changed the type of both partitions in the mirror with fdisk and rebooted the server, everything miraculously came up:
dmesg | grep md
[0.000000] Command line: root=/dev/md0 raid=/dev/md0
[0.000000] Kernel command line: root=/dev/md0 raid=/dev/md0
[1.063924] md: raid1 personality registered for level 1
[1.248078] md: Waiting for all devices to be available before autodetect
[1.248201] md: If you don't use raid, use raid=noautodetect
[1.248504] md: Autodetecting RAID arrays.
[1.265058] md: Scanned 2 and added 2 devices.
[1.265243] md: autorun ...
[1.265258] md: considering sda3 ...
[1.265274] md: adding sda3 ...
[1.265290] md: adding sdb3 ...
[1.265305] md: created md0
[1.265321] md: bind
[1.265331] md: bind
[1.265428] md: running:
[1.265865] md/raid1:md0: active with 2 out of 2 mirrors
[1.265891] md0: detected capacity change from 0 to 7984840704
[1.266068] md: ... autorun DONE.
[1.276627] md0: unknown partition table
[1.294892] EXT3-fs (md0): mounted filesystem with ordered data mode
[2.713383] systemd-udevd[860]: starting version 208
[3.128295] md: bind
[3.159107] md: bind
[3.170320] md/raid1:md1: active with 2 out of 2 mirrors
[3.170333] md1: detected capacity change from 0 to 17170300928
[3.178113] md1: unknown partition table
[4.911712] EXT3-fs (md0): using internal journal
[5.027077] reiser4: md1: found disk format 4.0.0.
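Two quick post-reboot checks follow from this fix: only the md1 members should have lost type 0xFD (after the change the kernel scans exactly two autodetect partitions, sda3 and sdb3), and both mirrors should report all their members. The sketch below assumes `sfdisk --dump` output and the kernel's `md/raid1:mdN: active with X out of Y mirrors` wording; the helper names are hypothetical. The author changed the type with fdisk interactively; the non-interactive `sfdisk --part-type` in the comments exists only in the rewritten sfdisk of util-linux 2.26 and later.

```sh
#!/bin/sh
# fd_partitions: read `sfdisk --dump` output on stdin and print every
# partition still typed 0xFD ("Linux raid autodetect"). Matches both the
# newer "type=fd" and the older "Id=fd" spelling of the type field.
fd_partitions() {
    awk '/^\/dev\// && /(type|Id)= *fd/ { print $1 }'
}

# raid1_status: read kernel log text on stdin and print "mdN active/total"
# for every RAID1 array the md driver reported as started.
raid1_status() {
    grep -o 'md/raid1:md[0-9]*: active with [0-9]* out of [0-9]* mirrors' |
        awk -F'[: ]+' '{ print $2, $5 "/" $8 }'
}

# Example use (needs root):
#   for d in sda sdb sdc sdd; do sfdisk --dump /dev/$d | fd_partitions; done
#   # expected here: /dev/sda3 and /dev/sdb3 only
#   dmesg | raid1_status    # differing numbers mean a degraded mirror
# Non-interactive type change, rewritten sfdisk (util-linux 2.26+) only:
#   sfdisk --part-type /dev/sdc 1 83
```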
I would be glad if Google, having indexed this text, showed it to fellow sysadmins who find themselves in a similar situation.