Restoring virtual machines from an erroneously initialized Datastore. The story of one blunder with a happy ending
Disclaimer: this note is for entertainment. The density of useful information in it is low. It was written "for myself."
Lyrical introduction
The file dump in our organization runs in a Windows Server 2016 virtual machine on VMware ESXi 6. And it is not just a dump. It is the file exchange server between departments: collaboration, project documentation, and the folders of the network scanners all live there. In short, it is the whole production life.
And this container of all production life began to hang. The guest could quietly hang on its own without affecting anything else. It could drag the whole host down with it and, accordingly, all the other guest machines. It could hang and take the vSphere client services with it: the processes of the other guests are alive, the machines work properly and respond, but the file dump is down and the vSphere Client cannot connect to the host. No pattern could be found. Hangs could occur in the daytime under light load, at night under zero load, at night during a differential backup under medium load, or at the weekend during a full backup under high load. And the situation was clearly degrading. At first it happened once a year, then once every six months. By the time my patience ran out, twice a week.
I suspected the RAM. But I was not allowed to stop the dump even on weekends to run Memtest. I waited for the May holidays. Over the May holidays I ran Memtest and ... it found no errors.
I was amazed and decided to go on vacation. While I was on vacation, the dump did not hang once. And on Monday, my first day back at work, it hung. It survived the full backup and hung right at the end. Such a warm welcome back pushed me to the decision to physically move the guest's disks to another host.
And although it has long been known that nothing serious should be done on the first day after a vacation, and although I had been telling myself exactly that all the way to work, my indignation at yet another hang knocked both the mindset and the vows out of my head ...
The physical disks were moved to another host and hot-plugged. The disks show up in the storage settings on the Drives tab. On the Datastores tab, the storage that lives on those disks does not. Refresh: still nothing. Well, of course, the first impulse is Add Storage. The Add wizard lists what it supports. Of course it supports VMFS; I had no doubt. A quick glance over the wizard's messages at each step: Next, Next, Next, Finish. My eye never came close to catching the small yellow circle with an exclamation mark at the bottom of the window in one of the wizard's steps.
When the wizard finished, a fresh Datastore appeared in the list ... and along with it the Datastores from the other physical disks.
I navigate to the newly added Datastore, and it ... is empty. Of course, I was amazed again. 8 am on the clock, the first 15 minutes at work after the vacation, even the sugar in the coffee has not been stirred yet. And here it is. My first thought was that I had pulled the wrong drive from the "native" host. I checked whether the required Datastore was still present on the "native" host: no, it was not. The second thought was: “oh sh#t!”. I am not sure, but it seems to me that the third, fourth and at least the fifth thoughts were the same.
To dispel doubts, I quickly installed a fresh ESXi on a test machine, took a spare drive and, this time reading carefully, went through the steps of the wizard. Yes. When a Datastore is added through the wizard, all data on the disk is lost, with no way to roll back the operation or restore the data. Later I read on one of the forums an assessment of this wizard design: shitsome crap. And at that moment I fully agreed.
Starting with the sixth, my thoughts flowed in a more constructive direction. Okay. Initialization takes a matter of seconds even for a 3 TB drive. So this is high-level formatting. So the partition table was simply rewritten. So the data is still there. So now we find some kind of unformat tool and voila.
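The guess that the old metadata is still on the disk can be checked before reaching for recovery tools. Below is a minimal sketch, assuming read-only access to the raw disk or to an image of it. The function name `find_signature` is made up for illustration, and the magic bytes in the demo are a placeholder, not a real VMFS signature: the actual value would have to come from the file system's documentation.

```python
import io

SECTOR = 512  # assumed sector size; adjust for 4Kn disks


def find_signature(stream, signature, limit_bytes, sector=SECTOR):
    """Return byte offsets of sectors that start with `signature`.

    Only sector boundaries are checked, and only the first `limit_bytes`
    of the stream are examined.
    """
    hits = []
    offset = 0
    while offset < limit_bytes:
        stream.seek(offset)
        head = stream.read(len(signature))
        if len(head) < len(signature):
            break  # reached the end of the stream
        if head == signature:
            hits.append(offset)
        offset += sector
    return hits


# Demo on an in-memory "disk": 8 zeroed sectors with a made-up magic
# written at the start of sector 5.
magic = b"\xDE\xAD\xBE\xEF"  # placeholder, NOT a real VMFS magic
disk = bytearray(8 * SECTOR)
disk[5 * SECTOR:5 * SECTOR + len(magic)] = magic
demo_hits = find_signature(io.BytesIO(bytes(disk)), magic, len(disk))
print(demo_hits)
```

Seeking sector by sector is slow on a real 3 TB drive; a practical version would read in large chunks, but the idea is the same: if the signature is still found, the format did not touch the data area.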
I boot the machine from the Strelec boot image ... and find out that partition recovery programs know every layout except VMFS. For example, they know the Synology partition layout, but not VMFS.
Trying one program after another brings no comfort: at best, GetDataBack and R.Saver find NTFS partitions with intact directory structures and intact file names. But that does not suit me. I need two vmdk files: the system disk and the file dump disk.
And then I realize that, apparently, I am about to install Windows and restore from the file backup. And at the same moment I remember that there was a DFS root on that server. And also a system of access rights to the departments' folders that is utterly wild in volume and branching. Not an option. The only option acceptable in terms of time is to restore the system state and the data disk with all the permissions.
Again googling, forums, KB articles, and again Yaroslavna's lament: VMware ESXi provides no data recovery mechanism. All discussion threads end in one of two ways: either someone recovered the data with the not-so-cheap DiskInternals VMFS Recovery, or someone was helped by a person actively promoting his services who used vmfs-tools and dd. Buying a DiskInternals VMFS Recovery license for $700 is not an option. Letting an outsider from the “territory of a potential adversary” near corporate data is not an option either. But googling also turned up that UFS Explorer can read VMFS partitions too.
DiskInternals VMFS Recovery
The trial version was downloaded and installed. The program successfully saw the empty VMFS partition.
In Undelete (Fast Scan) mode, it also found the overwritten Datastore with the virtual machine folders and the disks inside them.
A preview showed that the files were alive.
Mounting the partition into the system succeeded, but for some reason all three folders contained one and the same virtual machine. And of course, by the law of meanness, it was not the one I needed.
Three lines of shame
An attempt to shamelessly crack the software ended in failure. But UFS Explorer was cracked.
I was in a catastrophic situation and was not at all proud of the measures I had resorted to.
I am extremely negative about software piracy. In no way do I encourage circumventing protection against unlicensed use.
UFS Explorer
Scanning the disk revealed 7 nodes. The number of nodes “surprisingly” coincided with the number of *-flat.vmdk files detected by VMFS Recovery.
Comparing file sizes with node sizes also showed a match down to the byte. That made it possible to recover the names of the *-flat.vmdk files and, accordingly, which virtual machines they belonged to.
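The size comparison is easy to do by eye for 7 nodes, but it can also be sketched in a few lines. This is a minimal illustration, assuming the node sizes and the file sizes have been typed in by hand from the two programs; the function name `match_by_size` and all names and sizes in the demo are made up.

```python
from collections import defaultdict


def match_by_size(nodes, files):
    """Pair anonymous nodes with named files whose byte size is identical.

    nodes: {node_id: size_in_bytes}, files: {file_name: size_in_bytes}.
    Returns (matches, ambiguous): matches maps node_id -> file_name,
    ambiguous lists sizes shared by several nodes or several files.
    """
    nodes_by_size = defaultdict(list)
    files_by_size = defaultdict(list)
    for node_id, size in nodes.items():
        nodes_by_size[size].append(node_id)
    for name, size in files.items():
        files_by_size[size].append(name)

    matches, ambiguous = {}, []
    for size, node_ids in nodes_by_size.items():
        names = files_by_size.get(size, [])
        if len(node_ids) == 1 and len(names) == 1:
            matches[node_ids[0]] = names[0]
        elif names:
            ambiguous.append(size)  # cannot decide by size alone
    return matches, sorted(ambiguous)


# Made-up demo data: two unique sizes and one size shared by two disks.
demo_nodes = {"node1": 42949672960, "node2": 2748779069440,
              "node3": 1073741824, "node4": 1073741824}
demo_files = {"srv-flat.vmdk": 42949672960,
              "srv_1-flat.vmdk": 2748779069440,
              "a-flat.vmdk": 1073741824, "b-flat.vmdk": 1073741824}
demo_matches, demo_ambiguous = match_by_size(demo_nodes, demo_files)
print(demo_matches, demo_ambiguous)
```

Disks of identical size end up in the ambiguous list and have to be told apart some other way, for example by looking inside them.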
In general, from the point of view of ESXi, a vmdk disk consists of two files: a data file (<machine name>-flat.vmdk) and a disk descriptor file (<machine name>.vmdk). If you upload only a *-flat.vmdk file from a local machine to a Datastore, ESXi will not recognize it as a valid disk file. The VMware Knowledge Base has an article on how to create a disk descriptor file manually: kb.vmware.com/s/article/1002511, but I did not have to do this: I simply copied the contents of the corresponding descriptor files from the file content preview area in DiskInternals VMFS Recovery.
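If the descriptor had not survived in the preview, the one value in it that depends on the data file is the extent size. Below is a minimal sketch of that arithmetic. It assumes, as I understand the format, that the extent line of a VMFS descriptor has the form `RW <sectors> VMFS "<flat file>"` with the size counted in 512-byte sectors; verify this against the KB article or a known-good descriptor before relying on it. The function name `extent_line` and the file name in the demo are made up.

```python
def extent_line(flat_size_bytes, flat_name, sector=512):
    """Build the extent line of a descriptor for a given flat file.

    Assumption: the size field is the flat file size in 512-byte sectors.
    """
    if flat_size_bytes <= 0 or flat_size_bytes % sector:
        raise ValueError("flat file size must be a positive multiple of "
                         "the sector size")
    return f'RW {flat_size_bytes // sector} VMFS "{flat_name}"'


# Demo: a hypothetical 40 GiB system disk.
demo_line = extent_line(40 * 1024**3, "srv-flat.vmdk")
print(demo_line)
```

The rest of the descriptor (geometry, adapter type, hardware version) is not derived here; the KB article covers how to obtain it.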
After 4 hours of exporting the 2.5 TB node from UFS Explorer and 20 hours of uploading it to the hypervisor's Datastore, the recovered disk files were attached to a freshly created virtual machine. The disks were picked up. No data loss was noticed.