Replacing a disk while preserving the OSD numbering in Ceph
The point of this method is to preserve the order in which the OSDs are listed by the ceph osd tree command. When they are in order, the output is easier to read and to count through when needed.
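As an illustration (hostnames and weights are made up, not from a real cluster), a fragment of ceph osd tree output with the OSDs in order looks roughly like this:
-1 10.91998 root default
-2  3.64000     host node1
 0  1.81999         osd.0    up  1.00000
 1  1.81999         osd.1    up  1.00000
-3  3.64000     host node2
 2  1.81999         osd.2    up  1.00000
 3  1.81999         osd.3    up  1.00000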
A lyrical digression on the subject. The official way to replace a disk in Ceph is to delete from the cluster all the logical entities associated with that disk and then re-create them. As a result, the newly created OSD can (under certain circumstances) end up with a different number (the NUMBER in the entity name osd.NUMBER) and a different position in the CRUSH map, and it will naturally be shown in a different place by ceph osd tree and other commands; in other words, its sequence number changes.
The idea of this method is that we do not touch any logical entities at all, but simply slip a new disk into the “old” place in the cluster. To do that, the correct data structures have to be (re)created on the new disk: the various ids, symlinks, and keys.
Partition the new disk.
parted /dev/DATA_DISK mklabel gpt
Create a new partition on the disk
parted /dev/sdaa mkpart primary ext2 0% 100%
/sbin/sgdisk --change-name=1:'ceph data' -- /dev/sdaa
Get the UUID of the dead OSD
ceph osd dump | grep 'osd.NUMBER'
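The UUID is the last field of the matching line; the output looks roughly like this (the state, epochs and addresses are made up for the example):
osd.14 up in weight 1 up_from 683 up_thru 693 down_at 680 last_clean_interval [645,679) 10.0.0.5:6804/12345 10.0.0.5:6805/12345 10.0.0.5:6806/12345 10.0.0.5:6807/12345 exists,up 99886b14-7904-4396-acef-c031095d4b62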
Set the PARTUUID of the data partition to the UUID of the dead OSD
/sbin/sgdisk --partition-guid=1:99886b14-7904-4396-acef-c031095d4b62 -- /dev/DATA_DISK
Find the journal partition
ceph-disk list | grep for | sort
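Journal partitions appear in the ceph-disk list output as lines of the form 'ceph journal, for ...', roughly like this (device names are made up):
/dev/sdab1 ceph journal, for /dev/sdaa1
/dev/sdab2 ceph journal, for /dev/sdac1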
Create a filesystem on the data partition
/sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdaa1
Mount FS
mount -o rw,noatime,attr2,inode64,noquota /dev/DATA_PARTITION /var/lib/ceph/osd/ceph-NUMBER
Copying files from a neighboring OSD
This is actually the nastiest part of the procedure; everything has to be done carefully.
When copying, skip the directory /var/lib/ceph/osd/ceph-NUMBER/current, which is the data directory. The symlink to the journal will be created later.
Copy:
for i in activate.monmap active ceph_fsid fsid journal_uuid keyring magic ready store_version superblock systemd type whoami; do cp /var/lib/ceph/osd/ceph-NEIGHBOR_NUMBER/${i} /var/lib/ceph/osd/ceph-NUMBER; done
Find the journal
ceph-disk list | grep for | sort
find the corresponding journal partition in that output, then check which by-partuuid symlink points at it
ls -l /dev/disk/by-partuuid | grep PARTITION_NAME
Create a symlink to that UUID
ln -s /dev/disk/by-partuuid/UUID /var/lib/ceph/osd/ceph-NUMBER/journal
Fill in fsid with the correct value.
This fsid is the unique id under which the OSD is registered in the cluster. It is important: if the id does not match, the OSD will not see the cluster, and the cluster will not see the OSD.
The value must be taken from the PARTUUID of the data partition, the one we set earlier to the UUID of the dead OSD.
echo -n UUID >/var/lib/ceph/osd/ceph-NUMBER/fsid
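To avoid retyping the UUID by hand, it can also be read straight from the data partition (a sketch, assuming the data partition is /dev/sdaa1 as above):
UUID=$(blkid -o value -s PARTUUID /dev/sdaa1)
echo -n "$UUID" >/var/lib/ceph/osd/ceph-NUMBER/fsid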
Fill in the keyring.
This is what the OSD uses to authenticate to the cluster. The key can be looked up with:
ceph auth list | grep --after-context=1 'osd.NUMBER'
It is written into the keyring file in the format
[osd.NUMBER]
key = KEY_STRING
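One way to write that file, sketched here with a heredoc (KEY_STRING stands for the actual key taken from ceph auth list):
cat >/var/lib/ceph/osd/ceph-NUMBER/keyring <<EOF
[osd.NUMBER]
key = KEY_STRING
EOF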
Fill in whoami
Simply write the number of the OSD we are reviving into this file.
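For example:
echo NUMBER >/var/lib/ceph/osd/ceph-NUMBER/whoami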
Zero out the journal
dd bs=32M oflag=direct if=/dev/zero of=/var/lib/ceph/osd/ceph-НОМЕР/journal
Create the metadata for the journal and the OSD
ceph-osd --mkfs -i NUMBER
ceph-osd --mkjournal -i NUMBER
Change data owner
chown -R ceph:ceph /var/lib/ceph/osd/ceph-NUMBER
Start ceph-osd
Warning: immediately after ceph-osd starts, a rebalance will begin if the ceph osd out NUMBER command was not issued before the disk was removed from the cluster.
systemctl start ceph-osd@NUMBER
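After that it is worth checking that the OSD came back up under its old number and watching the recovery, for example with:
ceph osd tree
ceph -s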