Recover Lost LVM Volume in XenServer

    Once upon a time I had a machine running XenServer 6.5 with several arrays of SATA disks. At some point SATA performance stopped being enough, and it was decided to replace one array with SAS disks. For this purpose an Adaptec 3805 RAID controller was dug up (I know it's old, but it was free).

    After successfully creating a RAID array from the SAS disks (I confess, I used the Adaptec RAID utility) and adding it as LVM storage, I started transferring one of the virtual machine images to it. While contemplating the transfer progress, a suspicion that something was wrong crept in, as the tone of the server's hum changed. When the server rebooted on its own, I started to go a little gray... And what finished me off was that after the reboot I could not find the image being transferred in any of the storages, and the new storage itself was shown with the status "not available".

    After a short walk to calm my nerves and a cup of coffee, I rolled up my sleeves (yeah, on a T-shirt) and started thinking about how to restore the image...

    For a start, of course, I went into the logs and saw that an error had occurred while creating the storage from the SAS array:
    Error code: SR_BACKEND_FAILURE_47
    

    The error means that the storage is not available. I decided to check the physical LVM volumes with pvdisplay and did not see the volume created on the SAS array; pvs did not find it either.
    This meant that the storage, in fact, had never been created. More precisely, a storage object was created in XenServer, but it was never associated with physical storage. Why XenServer behaved this way, and moreover allowed the image to be transferred to this storage, I never found out.
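    For reference, the checks themselves (standard LVM commands; on a healthy setup the SAS array would show up here as a physical volume):
    # pvdisplay
    # pvs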

    It follows that there is no point even looking for the image on the SAS array, since nothing was physically transferred to it. So I needed to try to restore the image on the storage where it originally lived.

    An Internet search on recovering LVM logical volumes set the initial direction for digging.

    LVM stores its current configuration in /etc/lvm/backup/ and, under normal conditions, an archive of old configurations, as plain-text files, in /etc/lvm/archive/. The UUID of the XenServer storage corresponds to the LVM volume group name (an easy way to verify this follows the config excerpt below). But it turns out that in XenServer this very archive is disabled:
    /etc/lvm/lvm.conf
    # Configuration of metadata backups and archiving. In LVM2 when we
    # talk about a 'backup' we mean making a copy of the metadata for the
    # *current* system. The 'archive' contains old metadata configurations.
    # Backups are stored in a human readable text format.
    backup {

    # Should we maintain a backup of the current metadata configuration?
    # Use 1 for Yes; 0 for No.
    # Think very hard before turning this off!
    backup = 1

    # Where shall we keep it?
    # Remember to back up this directory regularly!
    backup_dir = "/etc/lvm/backup"

    # Should we maintain an archive of old metadata configurations.
    # Use 1 for Yes; 0 for No.
    # On by default. Think very hard before turning this off.
    archive = 0

    # Where should archived files go?
    # Remember to back up this directory regularly!
    archive_dir = "/etc/lvm/archive"

    # What is the minimum number of archive files you wish to keep?
    retain_min = 10

    # What is the minimum time you wish to keep an archive file for?
    retain_days = 30
    }
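
    As promised above, the SR-to-VG correspondence is easy to check: the volume group is named VG_XenStorage-<SR UUID>. A quick way to compare the two (standard xe and LVM commands):
    # xe sr-list params=uuid,name-label
    # vgs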

    Further digging showed that LVM keeps an archive of all volume group configurations in the initial sectors of those same volume groups. Since my storage lives on a separate array, I look at the beginning of that array:
    # hexdump -C /dev/md1 | less
    

    If you see something resembling configs there, you can dump those sectors to a file for easier reading (in my case the archive fit within 100 MB):
    # dd if=/dev/md1 of=dump.conf bs=100M count=1
    

    You can also dump using the lvmdump command.
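
    To locate the individual configuration copies inside the dump, searching for recurring markers helps (a hypothetical helper step; each metadata copy contains a seqno field, and the name of the lost image is a good anchor too):
    # grep -a -n 'seqno' dump.conf
    # grep -a -n 'VHD-' dump.conf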

    In the dump I’m looking for a config whose date precedes the moment the image was lost:
    Configuration
    VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191 {
            id = "TprMi6-z1OR-BGcz-uReP-if22-6122-tfu0zP"
            seqno = 5
            status = ["RESIZEABLE", "READ", "WRITE"]
            flags = []
            extent_size = 8192              # 4 Megabytes
            max_lv = 0
            max_pv = 0
            metadata_copies = 0
            physical_volumes {
                    pv0 {
                            id = "0gexgQ-urcH-GZd0-iehs-ne0y-6JYz-ZTGbna"
                            device = "/dev/md1"    # Hint only
                            status = ["ALLOCATABLE"]
                            flags = []
                            dev_size = 473571375    # 225.816 Gigabytes
                            pe_start = 20608
                            pe_count = 57806        # 225.805 Gigabytes
                    }
            }
            logical_volumes {
                     MGT {
                            id = "Znwuly-qcgx-AHbd-1qg9-Jjp8-eogk-N5ASme"
                            status = ["READ", "WRITE", "VISIBLE"]
                            flags = []
                            segment_count = 1
                            segment1 {
                                    start_extent = 0
                                    extent_count = 1        # 4 Megabytes
                                    type = "striped"
                                    stripe_count = 1        # linear
                                    stripes = [
                                            "pv0", 0
                                    ]
                            }
                    }
                    VHD-6bdf21c1-cc52-45d1-ab9e-56bd7aa9bc89 {
                            id = "yLMfFb-9yOk-vf1N-FTmz-5NiL-F5lx-NNmkuN"
                            status = ["READ", "WRITE", "VISIBLE"]
                            flags = []
                            segment_count = 1
                            segment1 {
                                    start_extent = 0
                                    extent_count = 12827    # 50.1055 Gigabytes
                                    type = "striped"
                                    stripe_count = 1        # linear
                                    stripes = [
                                            "pv0", 1
                                    ]
                            }
                    }
            }
    }
    


    From this config only one entry is needed: the one corresponding to the missing image (the entry that is absent from the current config in /etc/lvm/backup/<corresponding VG>; in my case VHD-6bdf21c1-cc52-45d1-ab9e-56bd7aa9bc89). I copy it into the current configuration and tell LVM to restore the VG from the backup:
    vgcfgrestore -f /etc/lvm/backup/VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191 -v VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191
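
    If in doubt, the restore can be rehearsed first with LVM's standard --test option, which reports what would be done without committing anything to disk:
    # vgcfgrestore --test -f /etc/lvm/backup/VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191 -v VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191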
    

    I verify that the logical volume was picked up:
    # lvdisplay
      --- Logical volume ---
      LV Name                /dev/VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191/MGT
      VG Name                VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191
      LV UUID                Znwuly-qcgx-AHbd-1qg9-Jjp8-eogk-N5ASme
      LV Write Access        read/write
      LV Status              available
      # open                 0
      LV Size                4.00 MB
      Current LE             1
      Segments               1
      Allocation             inherit
      Read ahead sectors     auto
      - currently set to     256
      Block device           253:0
      --- Logical volume ---
      LV Name                /dev/VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191/VHD-6bdf21c1-cc52-45d1-ab9e-56bd7aa9bc89
      VG Name                VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191
      LV UUID                yLMfFb-9yOk-vf1N-FTmz-5NiL-F5lx-NNmkuN
      LV Write Access        read/write
      LV Status              available
      # open                 0
      LV Size                50.11 GB
      Current LE             12827
      Segments               1
      Allocation             inherit
      Read ahead sectors     auto
      - currently set to     256
      Block device           253:1
    

    If you rescan the storage now (via XenCenter or xe sr-scan), XenServer will happily overwrite this record, and you will have to do it all over again. As I understand it, XenServer does not see a VDI (disk image) with a UUID that matches the UUID of our restored logical volume.

    When using LVM, XenServer stores disk images directly in logical volumes; more precisely, the logical volume is the VHD image. So I figured that XenServer could be made to see the image by copying it on top of another image of the same size.

    To copy the volume, I activate the logical volumes in this VG:
    # vgchange -ay VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191
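
    A quick sanity check that the activated volume really holds a VHD: a dynamic VHD begins with a copy of its footer, whose first eight bytes are the ASCII cookie "conectix" (a hedged check, not something the workflow itself requires):
    # hexdump -C -n 16 /dev/VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191/VHD-6bdf21c1-cc52-45d1-ab9e-56bd7aa9bc89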
    

    Now the LVM volume is accessible as a block device, which means it can be copied with dd:
    # dd if=/dev/VG_XenStorage-a1744b5b-cc65-ac9a-390c-8cfacf2cc191/VHD-6bdf21c1-cc52-45d1-ab9e-56bd7aa9bc89 of=image.vhd bs=100M
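
    Before handing the copy to a VM, it is worth validating it (assuming the vhd-util tool that ships with XenServer; its check subcommand verifies the VHD metadata):
    # vhd-util check -n image.vhd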
    

    After the copied image was attached to a virtual machine, and the machine booted from it, my happiness knew no bounds!

    However, not everything was so rosy. The server still had to be shut down to pull out the flaky controller. And when I turned it back on, the image had disappeared again! It turns out that at startup XenServer checks the image UUIDs in LVM against the UUIDs in its database and, if they do not match, deletes the image.

    While poking around in LVM, I noticed that when an image is transferred from one storage to another, its UUID also changes; based on this, I figured the image could be fully resurrected by simply copying the dd-transferred image to another storage. This should update the UUID of the image, bringing it in line with the UUID in the database. So I repeat all the procedures again, then move the image to a temporary storage created for this purpose, attach it to the virtual machine and try to boot. Startup goes fine.
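
    In xe terms the final move can be expressed as a VDI copy (a hedged sketch; the storage name and UUIDs here are placeholders to be taken from xe vdi-list and xe sr-list output):
    # xe sr-list name-label=TempSR params=uuid --minimal
    # xe vdi-copy uuid=<restored VDI uuid> sr-uuid=<temporary SR uuid>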

    I reboot the server, hands shaking with impatience and fatigue, check the list of images and... the image is in place! My happiness knows no bounds and, pleased with myself, I ride off into the sunset...
