Replacing a disk while preserving the correct numbering in Ceph

The goal of this method is to preserve the order in which disks are displayed by the ceph osd tree command. When they are listed in order, the output is easier to read and, if necessary, to count.

A lyrical digression. The official method of replacing a disk in Ceph involves deleting all logical entities associated with the disk from the cluster and then re-creating them. As a result, the newly created OSD can (under certain circumstances) get a different number (the number in the entity name, osd.NUMBER) and a different location in the CRUSH map, so it will naturally show up in a different place in ceph osd tree and other commands. In other words, its sequence number changes.

The idea of this method is that we do not change any logical entities at all; we simply slip a new disk into the "old" place in the cluster. To do this, the correct data structures must be (re)created on the new disk: the various IDs, symlinks, and keys.

Create a partition table on the new disk.

parted /dev/DATA_DISK mklabel gpt

Create a new partition on the disk

parted /dev/sdaa mkpart primary ext2 0% 100%
/sbin/sgdisk --change-name=1:'ceph data' -- /dev/sdaa

Get the UUID of the dead OSD

ceph osd dump | grep 'osd.NUMBER'
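
A plain grep for osd.1 also matches osd.10, osd.11 and so on. As a rough sketch, the UUID can be pulled out with an exact match on the OSD name instead. The helper name osd_uuid_from_dump is made up for illustration, and the sketch assumes that the UUID is the last field of the OSD's line in the ceph osd dump output, which is worth checking against your Ceph version.

```shell
# Hypothetical helper: read `ceph osd dump` output on stdin and print the
# UUID of osd.<number>.
# Assumption: the UUID is the last whitespace-separated field of the OSD line.
osd_uuid_from_dump() {
    awk -v osd="osd.$1" '$1 == osd { print $NF }'
}

# Usage on a cluster node:
#   ceph osd dump | osd_uuid_from_dump 12
```

The exact comparison on the first field is the only design choice here; everything else is the same lookup as the grep above.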

Set the PARTUUID on the data disk

/sbin/sgdisk --typecode=1:99886b14-7904-4396-acef-c031095d4b62 -- /dev/DATA_DISK

Find the journal partition

ceph-disk list | grep for | sort

Create a file system on the disk

/sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdaa1

Mount the file system

mount -o rw,noatime,attr2,inode64,noquota /dev/DATA_PARTITION /var/lib/ceph/osd/ceph-OSD_NUMBER

Copy data from a neighboring OSD

This is in fact the most unpleasant part of the procedure, and everything has to be done carefully.

When copying, skip the directory /var/lib/ceph/osd/ceph-NUMBER/current; this is the data directory. The symlink to the journal will be created later.


for i in activate.monmap active ceph_fsid fsid journal_uuid keyring magic ready store_version superblock systemd type whoami; do cp /var/lib/ceph/osd/ceph-NEIGHBOR_NUMBER/${i} /var/lib/ceph/osd/ceph-NUMBER; done
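
Since this step is easy to get wrong, the same copy can be wrapped in a small function that refuses to run on anything but two existing directories and reports files that the neighbor does not have. This is only a sketch: the name copy_osd_metadata is made up, and the file list is the one from the loop above.

```shell
# Hypothetical helper: copy OSD metadata files from a neighbor's directory.
# "current" (the data directory) and "journal" (symlink, created later)
# are deliberately not in the list.
copy_osd_metadata() {
    src=$1
    dst=$2
    [ -d "$src" ] && [ -d "$dst" ] || {
        echo "both arguments must be existing directories" >&2
        return 1
    }
    for f in activate.monmap active ceph_fsid fsid journal_uuid keyring magic \
             ready store_version superblock systemd type whoami; do
        if [ -e "$src/$f" ]; then
            cp -- "$src/$f" "$dst/" || return 1
        else
            echo "skipped (not present in neighbor): $f" >&2
        fi
    done
}

# Usage:
#   copy_osd_metadata /var/lib/ceph/osd/ceph-NEIGHBOR_NUMBER /var/lib/ceph/osd/ceph-NUMBER
```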

Find the journal

ceph-disk list | grep for | sort

Having found the partition, run

ls -l /dev/disk/by-partuuid | grep PARTITION_NUMBER
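
The same lookup can be done without reading the ls output by eye. The sketch below walks the symlinks in /dev/disk/by-partuuid and prints the name of the one that resolves to the given partition. The function name partuuid_of is made up, and the optional second argument (an alternative directory) exists only so the function can be tried outside a real /dev tree. It relies on readlink -f as found on Linux systems.

```shell
# Hypothetical helper: print the PARTUUID whose symlink points at the
# given partition device.
#   $1  partition device, e.g. /dev/sdb2
#   $2  directory with the symlinks (defaults to /dev/disk/by-partuuid)
partuuid_of() {
    target=$(readlink -f -- "$1") || return 1
    dir=${2:-/dev/disk/by-partuuid}
    for link in "$dir"/*; do
        if [ "$(readlink -f -- "$link")" = "$target" ]; then
            basename -- "$link"
            return 0
        fi
    done
    return 1   # no symlink points at this partition
}

# Usage:
#   partuuid_of /dev/sdb2
```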

Create a symlink to this UUID

ln -s /dev/disk/by-partuuid/UUID /var/lib/ceph/osd/ceph-NUMBER/journal

Fill in fsid with the correct value.

The fsid is in fact the unique ID under which the OSD is registered in the cluster. It matters: if you get the ID wrong, the OSD will not see the cluster, and the cluster will not see the OSD.

The value must be taken from the PARTUUID of the data partition.

echo -n UUID >/var/lib/ceph/osd/ceph-NUMBER/fsid
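
Because a wrong value here leaves the OSD invisible to the cluster, it may be worth validating the string before writing it. The sketch below checks that the argument looks like a UUID and then writes it without a trailing newline, mirroring the echo -n above. The name write_fsid is made up for illustration.

```shell
# Hypothetical helper: validate a UUID and write it to a file without a
# trailing newline (same effect as `echo -n UUID > file`).
write_fsid() {
    uuid=$1
    file=$2
    if ! printf '%s' "$uuid" |
        grep -Eqx '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'; then
        echo "not a UUID: $uuid" >&2
        return 1
    fi
    printf '%s' "$uuid" > "$file"
}

# Usage:
#   write_fsid UUID /var/lib/ceph/osd/ceph-NUMBER/fsid
```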

Fill in the keyring.

This is what the OSD uses to authenticate with the cluster.

ceph auth list | grep --after-context=1 'osd.NUMBER'

It is written to the keyring file in the following format:

[osd.NUMBER]
	key = KEY

Fill in whoami

Simply write into this file the number of the OSD we want to revive.

Zero out the journal

dd bs=32M oflag=direct if=/dev/zero of=/var/lib/ceph/osd/ceph-NUMBER/journal

Create the journal and OSD metadata

ceph-osd --mkfs -i OSD_NUMBER
ceph-osd --mkjournal -i OSD_NUMBER

Change the data owner

chown -R ceph:ceph /var/lib/ceph/osd/ceph-NUMBER
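
Before the daemon is started, a quick check of the prepared directory can catch a forgotten step. The sketch below only verifies what was set up by hand in this procedure: that fsid, keyring and whoami are non-empty, that journal is a symlink, and that whoami holds the expected number. The name check_osd_dir is made up, and passing the check does not guarantee that the OSD will start.

```shell
# Hypothetical pre-start check for a hand-prepared OSD directory.
#   $1  OSD directory, e.g. /var/lib/ceph/osd/ceph-NUMBER
#   $2  expected OSD number
check_osd_dir() {
    dir=$1
    number=$2
    rc=0
    for f in fsid keyring whoami; do
        [ -s "$dir/$f" ] || { echo "missing or empty: $f" >&2; rc=1; }
    done
    [ -L "$dir/journal" ] || { echo "journal is not a symlink" >&2; rc=1; }
    if [ -s "$dir/whoami" ] && [ "$(cat "$dir/whoami")" != "$number" ]; then
        echo "whoami does not contain $number" >&2
        rc=1
    fi
    return $rc
}

# Usage:
#   check_osd_dir /var/lib/ceph/osd/ceph-NUMBER NUMBER
```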

Start ceph-osd

Warning: a rebuild will begin immediately after ceph-osd is started if the ceph osd out NUMBER command was not issued before the disk was taken out of the cluster.

systemctl start ceph-osd@NUMBER
