
Recovering a Lost LVM Volume in XenServer
Once upon a time I had a machine running XenServer 6.5 with several arrays of SATA disks. Recently, SATA performance stopped being enough, so I decided to replace one of the arrays with SAS disks. For this I got hold of an Adaptec 3805 RAID controller (I know it's old, but it was free).
After successfully creating a RAID array from the SAS disks (I confess, I used the Adaptec's own hardware RAID) and adding it as LVM storage, I started moving one of the virtual machine images to it. While I was watching the transfer progress, I began to suspect something was wrong: the sound the server made had changed. When the server then rebooted on its own, I started to go a little gray... The final blow came after the reboot: the image I had been moving was not in any of the storage repositories, and the new storage itself was shown with the status “not available”.
After a short walk and a cup of coffee to calm my nerves, I rolled up my sleeves (yeah, on a T-shirt) and started thinking about how to restore the image...
First, of course, I went to the logs and saw that an error had occurred while creating the storage on the SAS array:
Error code: SR_BACKEND_FAILURE_47
This error means that the storage is not available. I checked the LVM physical volumes with pvdisplay and did not see a volume on the SAS array; pvs did not find it either.
This meant that the storage repository had never actually been created. More precisely, XenServer created a storage object, but it was not associated with any physical storage. Why XenServer behaved this way, and why it even allowed an image to be moved to this storage, I still have not found out.
So there is no point in looking for the image on the SAS array, since nothing was physically transferred to it. Instead, I had to try to restore the image from the storage where it originally lived.
An Internet search on recovering LVM logical volumes gave me the initial direction to dig in.
LVM stores its current configuration in /etc/lvm/backup/ and, under normal conditions, keeps an archive of older configurations in /etc/lvm/archive/. The UUID of a XenServer storage repository is part of the LVM VolumeGroup name (VG_XenStorage-<SR UUID>). But it turns out that in XenServer this archive is disabled:
/etc/lvm/lvm.conf
# Configuration of metadata backups and archiving. In LVM2 when we
# talk about a 'backup' we mean making a copy of the metadata for the
# *current* system. The 'archive' contains old metadata configurations.
# Backups are stored in a human readeable text format.
backup {
# Should we maintain a backup of the current metadata configuration?
# Use 1 for Yes; 0 for No.
# Think very hard before turning this off!
backup = 1
# Where shall we keep it?
# Remember to back up this directory regularly!
backup_dir = "/etc/lvm/backup"
# Should we maintain an archive of old metadata configurations.
# Use 1 for Yes; 0 for No.
# On by default. Think very hard before turning this off.
archive = 0
# Where should archived files go?
# Remember to back up this directory regularly!
archive_dir = "/etc/lvm/archive"
# What is the minimum number of archive files you wish to keep?
retain_min = 10
# What is the minimum time you wish to keep an archive file for?
retain_days = 30
}
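To confirm this setting on a particular host without paging through the whole file, the `backup { ... }` section can be read with a few lines of Python. This is a minimal sketch, not a full lvm.conf parser: it assumes the simple layout shown above (one `key = value` per line, the opening brace on the `backup` line, `#` comments), and the function name `read_backup_section` is hypothetical.

```python
import re


def read_backup_section(text: str) -> dict:
    """Return the key/value settings found inside the `backup { ... }` section."""
    settings = {}
    inside = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        if not inside:
            # Skip everything until the line that opens the backup section.
            if re.fullmatch(r"backup\s*\{", line):
                inside = True
            continue
        if line == "}":
            break  # end of the backup section
        m = re.fullmatch(r"(\w+)\s*=\s*(.+)", line)
        if m:
            # Values are kept as strings; surrounding double quotes are removed.
            settings[m.group(1)] = m.group(2).strip().strip('"')
    return settings
```

For a file like the one above, `read_backup_section(open("/etc/lvm/lvm.conf").read())["archive"]` returns `"0"`, i.e. archiving is off.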
Further searching showed that LVM also keeps previous versions of the VolumeGroup configuration (until they are overwritten) in the metadata area at the very beginning of the physical volumes that make up the VolumeGroup. Since my storage is on a separate array, I look at the beginning of that array:
# hexdump -C /dev/md1 | less
If you see something that looks like configs, dump these sectors to a file for easier reading (in my case, the archive fit into the first 100 MB):
# dd if=/dev/md1 of=dump.conf bs=100M count=1
The lvmdump utility can also be used to collect this data.
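Paging through the dump by eye works, but the copies can also be listed automatically. LVM increments `seqno` every time it writes new VG metadata, so a higher number means a newer configuration. The sketch below assumes each on-disk copy is followed by a `creation_time = <unix time>` line, as LVM2 text metadata normally is; if a copy has no such line (for example, because it was partly overwritten), its time is printed as "?". The script and the name `find_snapshots` are illustrative, not part of LVM.

```python
import re
import sys
from datetime import datetime, timezone

SEQNO = re.compile(rb"seqno = (\d+)")
CTIME = re.compile(rb"creation_time = (\d+)")


def find_snapshots(dump: bytes):
    """List (offset, seqno, creation_time) for each metadata copy found in the dump.

    creation_time is None when no timestamp appears before the next copy starts.
    """
    hits = [(m.start(), int(m.group(1))) for m in SEQNO.finditer(dump)]
    result = []
    for i, (offset, seqno) in enumerate(hits):
        # Only look for a timestamp up to the start of the next copy.
        end = hits[i + 1][0] if i + 1 < len(hits) else len(dump)
        m = CTIME.search(dump, offset, end)
        result.append((offset, seqno, int(m.group(1)) if m else None))
    return result


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    for offset, seqno, ctime in find_snapshots(data):
        when = (datetime.fromtimestamp(ctime, tz=timezone.utc).isoformat()
                if ctime is not None else "?")
        print(f"offset={offset:#x} seqno={seqno} created={when}")
```

Run it as `python3 list_snapshots.py dump.conf` (file name hypothetical) against the dump made with dd, then open the dump at the offset of the last copy written before the image disappeared.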
In the dump, I look for a configuration dated before the moment the image was lost:
Configuration
VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191 {
id = "TprMi6-z1OR-BGcz-uReP-if22-6122-tfu0zP"
seqno = 5
status = ["RESIZEABLE", "READ", "WRITE"]
flags = []
extent_size = 8192 # 4 Megabytes
max_lv = 0
max_pv = 0
metadata_copies = 0
physical_volumes {
pv0 {
id = "0gexgQ-urcH-GZd0-iehs-ne0y-6JYz-ZTGbna"
device = "/dev/md1" # Hint only
status = ["ALLOCATABLE"]
flags = []
dev_size = 473571375 # 225.816 Gigabytes
pe_start = 20608
pe_count = 57806 # 225.805 Gigabytes
}
}
logical_volumes {
MGT {
id = "Znwuly-qcgx-AHbd-1qg9-Jjp8-eogk-N5ASme"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
segment_count = 1
segment1 {
start_extent = 0
extent_count = 1 # 4 Megabytes
type = "striped"
stripe_count = 1 # linear
stripes = [
"pv0", 0
]
}
}
VHD-6bdf21c1-cc52-45d1-ab9e-56bd7aa9bc89 {
id = "yLMfFb-9yOk-vf1N-FTmz-5NiL-F5lx-NNmkuN"
status = ["READ", "WRITE", "VISIBLE"]
flags = []
segment_count = 1
segment1 {
start_extent = 0
extent_count = 12827 # 50.1055 Gigabytes
type = "striped"
stripe_count = 1 # linear
stripes = [
"pv0", 1
]
}
}
}
}
From this config I only need the entry for the missing image, i.e. the entry absent from the current config in /etc/lvm/backup/<corresponding VG> (in my case, VHD-6bdf21c1-cc52-45d1-ab9e-56bd7aa9bc89). I copy it into the current configuration and tell LVM to restore the VG from this backup:
vgcfgrestore -f /etc/lvm/backup/VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191 -v VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191
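The step of copying the LV entry into the current backup file was done by hand. If you prefer to script it, here is a minimal sketch that cuts the `name { ... }` stanza out of the old metadata text by counting braces and appends it to the `logical_volumes` section of the current file. The helper names are hypothetical; the code assumes no braces appear inside quoted values (true for the configs shown here), and it does not check that the extents listed under `stripes` are still free in the current VG. Verify that by eye before running vgcfgrestore, and keep a copy of the original backup file.

```python
def find_block(text: str, name: str) -> tuple:
    """Return (start, end) of the `name { ... }` stanza; end is just past its closing brace."""
    start = text.find(name + " {")
    if start < 0:
        raise KeyError(name)
    depth = 0
    # Walk from the stanza's opening brace, tracking the nesting depth.
    for i in range(text.index("{", start), len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return start, i + 1
    raise ValueError("unbalanced braces after " + name)


def splice_lv(current: str, old: str, lv_name: str) -> str:
    """Copy the LV stanza `lv_name` from `old` into the logical_volumes section of `current`."""
    if lv_name + " {" in current:
        raise ValueError(lv_name + " is already in the current config")
    s, e = find_block(old, lv_name)
    stanza = old[s:e]
    _, lv_end = find_block(current, "logical_volumes")
    close = lv_end - 1  # position of the section's closing brace
    return current[:close] + stanza + "\n" + current[close:]
```

For example, put the text of the chosen snapshot from dump.conf into `old` (decoding with `errors="replace"`, since the dump contains binary data), read /etc/lvm/backup/VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191 into `current`, and write `splice_lv(current, old, "VHD-6bdf21c1-cc52-45d1-ab9e-56bd7aa9bc89")` back to that file before running vgcfgrestore.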
I check that the Logical Volume has been picked up:
# lvdisplay
--- Logical volume ---
LV Name /dev/VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191/MGT
VG Name VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191
LV UUID Znwuly-qcgx-AHbd-1qg9-Jjp8-eogk-N5ASme
LV Write Access read/write
LV Status available
# open 0
LV Size 4.00 MB
Current LE 1
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:0
--- Logical volume ---
LV Name /dev/VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191/VHD-6bdf21c1-cc52-45d1-ab9e-56bd7aa9bc89
VG Name VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191
LV UUID yLMfFb-9yOk-vf1N-FTmz-5NiL-F5lx-NNmkuN
LV Write Access read/write
LV Status available
# open 0
LV Size 50.11 GB
Current LE 1
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 253:0
If you now rescan the repository (via XenCenter or xe sr-scan), XenServer will happily wipe this record again, and you will have to start all over. As I understand it, this happens because XenServer has no VDI (disk image) whose UUID matches the UUID of the restored Logical Volume.
When XenServer uses LVM, it stores disk images directly in Logical Volumes; more precisely, each Logical Volume is a VHD image. So I figured I could make XenServer see the image by copying it over another image of the same size.
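Finding "another image of the same size" comes down to extent arithmetic. In LVM metadata, `extent_size` is given in 512-byte sectors, so the restored LV from the config above (extent_count = 12827, extent_size = 8192, i.e. 4 MiB extents) takes 12827 × 4 MiB = 53,800,337,408 bytes, about 50.1 GiB, which matches the "50.1055 Gigabytes" comment and the 50.11 GB reported by lvdisplay. A small sketch for comparing against a candidate target, whose size in bytes could be taken from `blockdev --getsize64` on the target LV; the function names are illustrative.

```python
SECTOR_BYTES = 512  # LVM metadata expresses extent_size in 512-byte sectors


def lv_size_bytes(extent_count: int, extent_size_sectors: int) -> int:
    """Size of a linear LV computed from its extent count and the VG extent size."""
    return extent_count * extent_size_sectors * SECTOR_BYTES


def same_size(extent_count: int, extent_size_sectors: int, target_bytes: int) -> bool:
    """True if a target volume of target_bytes has exactly the LV's size."""
    return lv_size_bytes(extent_count, extent_size_sectors) == target_bytes


restored = lv_size_bytes(12827, 8192)
print(restored, round(restored / 2**30, 4))  # 53800337408 50.1055
```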
To copy the volume, I activate the logical volumes in this VG:
# vgchange -ay VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191
Now the LVM volume is accessible, so it can be copied with dd:
# dd if=/dev/VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191/VHD-6bdf21c1-cc52-45d1-ab9e-56bd7aa9bc89 of=image.vhd bs=100M
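Before attaching the copy anywhere, it is worth checking that image.vhd really starts with a VHD structure. Per the published VHD format specification, dynamic and differencing VHDs keep a copy of the 512-byte footer at offset 0; it begins with the cookie `conectix`, and its fields are big-endian. The sketch below decodes a few fields and verifies the footer checksum. It assumes the image is dynamic or differencing (a fixed VHD has its footer only at the end), and the names are illustrative.

```python
import struct
import sys

DISK_TYPES = {2: "fixed", 3: "dynamic", 4: "differencing"}


def read_vhd_footer(block: bytes) -> dict:
    """Decode a 512-byte VHD footer (big-endian fields, layout per the VHD spec)."""
    if len(block) < 512 or block[:8] != b"conectix":
        raise ValueError("no VHD footer cookie")
    (current_size,) = struct.unpack_from(">Q", block, 48)
    disk_type, checksum = struct.unpack_from(">II", block, 60)
    # Checksum = one's complement of the sum of all footer bytes except the checksum field.
    total = sum(block[:64]) + sum(block[68:512])
    return {
        "disk_type": DISK_TYPES.get(disk_type, str(disk_type)),
        "current_size": current_size,
        "unique_id": bytes(block[68:84]).hex(),
        "checksum_ok": (~total & 0xFFFFFFFF) == checksum,
    }


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as f:
        # Dynamic and differencing VHDs keep a copy of the footer at offset 0.
        print(read_vhd_footer(f.read(512)))
```

Running `python3 vhd_footer.py image.vhd` (file name hypothetical) prints the decoded fields; a `ValueError` about the cookie suggests the copy does not start with a VHD footer.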
Once the copied image had been attached to the virtual machine and the machine had booted from it, my happiness knew no bounds!
However, things were not that rosy. I still had to shut the server down to pull out the buggy controller, and when I turned it back on, the image was gone again! It turns out that at startup XenServer compares the image UUID in LVM with the UUID in its database and deletes the image if they do not match.
While digging through LVM, I had noticed that when an image is moved from one storage to another, its UUID changes too. Based on this, I assumed the image could be fully resurrected by simply moving the dd-copied image to another storage: this should update the image's UUID to match the one in the database. I repeat the whole procedure, then move the image to a temporary storage created for this purpose, attach it to the virtual machine, and try to start it. It starts fine.
I reboot the server, hands shaking with impatience and fatigue, check the list of images and... the image is in place! Overjoyed and pleased with myself, I ride off into the sunset...