From Michael's Information Zone

Problem

Environment

ESXi and vSphere use a NetApp appliance for NFS storage. The NetApp is set to create snapshots daily and keep them for several days.

Steps to reproduce

  • Create a couple of snapshots of the VM using vSphere.
  • Move the disks to another NFS volume.
  • Change the location of the disk in the VM settings using vSphere.
  • Somehow end up with a corrupted VM that cannot be deleted.
  • Recreate the VM and add the existing disks to it.

At this point, one of the three disks had a couple of snapshots that could not be consolidated or deleted, but the drive was usable. At one point I messed up the permissions on the disk and thought it would be easier to restore the disk from a NetApp snapshot. However, vSphere would not copy the disk because it was "locked" (even though it was in a snapshot directory).

  • Delete the problem disk.
  • Download the disk from the NetApp snapshot directory.
  • Upload the disk to the VM directory.

Now I had the disk restored, but it would not mount. The error message reported missing dependencies. It turned out the snapshots were located in two separate VM directories: the original VM that died and the rebuild.

  • Download the disk files for both VM directories from the NetApp snapshot.
  • Replace the disk files with the ones downloaded.

For the next couple of days the disk ran as expected. However, an admin reached out to me saying the disk was no longer functioning. It turned out that the disk was referencing the snapshot directory! WTF?!?! Of course the snapshot was old and had since been deleted. Without those files the disk could not stay mounted.
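A check like the following might have caught this earlier. It is only a sketch: it scans the text of a descriptor file for quoted values that point into a snapshot directory. The `.snapshot` directory name, the datastore path, and the sample descriptor are assumptions for illustration and are not taken from the actual VM.

```python
import re

# Quoted values: key="value" lines and the file name at the end of extent lines.
QUOTED = re.compile(r'"([^"]+)"')

def snapshot_references(descriptor_text, marker=".snapshot"):
    """Return (line_number, line) pairs whose quoted values contain marker."""
    hits = []
    for number, line in enumerate(descriptor_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if any(marker in value for value in QUOTED.findall(stripped)):
            hits.append((number, stripped))
    return hits

# Hypothetical snapshot descriptor whose parent hint points into a NetApp snapshot.
sample = '''# Disk DescriptorFile
version=1
CID=11111111
parentCID=7b7644b2
createType="vmfsSparse"
parentFileNameHint="/vmfs/volumes/nfs01/.snapshot/daily.0/examplevm/examplevm.vmdk"
# Extent description
RW 20971520 VMFSSPARSE "examplevm-000001-delta.vmdk"
'''

for number, line in snapshot_references(sample):
    print(number, line)
```

Any hit means the disk depends on a file that will vanish when the storage snapshot is rotated out.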

  • Download ALL files from the earliest snapshot available just in case.
  • Delete the disk files.
  • Upload the backups.
  • Experience the CID mismatch


Solution

So it turns out that the datastore browser does not show all the files. Instead it consolidates them into one entry per disk. If you download a VMDK that holds the contents of a disk, you actually get several files. The first is the virtual machine disk descriptor file (the plain .vmdk), which provides the following information [1]:

examplevm.vmdk:

# Disk DescriptorFile
version=1
CID=7b7644b2
parentCID=ffffffff
createType="vmfs"


# Extent description
RW 20971520 VMFS "examplevm-flat.vmdk"

# The Disk Data Base
#DDB

ddb.toolsVersion = "0"
ddb.adapterType = "lsilogic"
ddb.geometry.sectors = "63"
ddb.geometry.heads = "255"
ddb.geometry.cylinders = "1305"
ddb.uuid = "60 00 C2 9f ae de ba e9-95 4e a7 a6 4e 95 c1 c1"
ddb.virtualHWVersion = "4"
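Since the descriptor is plain text, it is easy to read with a script. The following is a minimal sketch, not an official parser: it splits a descriptor into its key/value fields and its extent lines. The function name `parse_descriptor` is made up for this example, and the 512-byte sector size used in the usage line is the usual VMDK convention rather than something stated in the file.

```python
import re

# Extent lines look like: RW 20971520 VMFS "examplevm-flat.vmdk"
EXTENT = re.compile(r'^(RW|RDONLY|NOACCESS)\s+(\d+)\s+(\S+)\s+"([^"]+)"')

def parse_descriptor(text):
    """Return (fields, extents) parsed from VMDK descriptor text."""
    fields, extents = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = EXTENT.match(line)
        if match:
            access, sectors, kind, filename = match.groups()
            extents.append({"access": access, "sectors": int(sectors),
                            "type": kind, "file": filename})
            continue
        key, sep, value = line.partition("=")
        if sep:
            # Tolerates spaces around "=" and strips surrounding quotes.
            fields[key.strip()] = value.strip().strip('"')
    return fields, extents

example = '''# Disk DescriptorFile
version=1
CID= 7b7644b2
parentCID=ffffffff
createType="vmfs"

# Extent description
RW 20971520 VMFS "examplevm-flat.vmdk"

ddb.adapterType = "lsilogic"
'''

fields, extents = parse_descriptor(example)
print(fields["CID"], fields["parentCID"], extents[0]["file"])
print(extents[0]["sectors"] * 512 / 2**30, "GiB")
```

For the example above, the extent works out to 10 GiB, and a parentCID of ffffffff marks a base disk with no parent.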

The next file is the delta file, which holds the changes made to the disk since a snapshot was taken. Why are these not committed first? I have no idea! The third file is the actual data, stored in a flat vmdk file.
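To see what the datastore browser is hiding, it can help to sort a directory listing by role. This sketch guesses the role of each file from its name alone, assuming the common ESXi naming pattern (`-flat`, `-delta`, and a six-digit snapshot number). The file names are hypothetical, and a VM whose own name ends in one of those suffixes would be misclassified.

```python
import re

def classify(filename):
    """Guess the role of a .vmdk file from its name alone."""
    if not filename.endswith(".vmdk"):
        return "other"
    stem = filename[:-len(".vmdk")]
    if stem.endswith("-flat"):
        return "flat extent (base disk data)"
    if stem.endswith("-delta"):
        return "delta extent (snapshot changes)"
    if re.search(r"-\d{6}$", stem):
        return "snapshot descriptor"
    return "base descriptor"

# Hypothetical listing of a VM directory as seen over NFS or SSH.
listing = [
    "examplevm.vmdk",
    "examplevm-flat.vmdk",
    "examplevm-000001.vmdk",
    "examplevm-000001-delta.vmdk",
    "examplevm.vmx",
]

for name in listing:
    print(f"{name:32} {classify(name)}")
```

Only the descriptors are small text files; the flat and delta extents carry the actual data.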

  • With the files downloaded, I deleted the disk files from the VM and consolidated the redo logs. This basically told vSphere that the disk does not exist and that there are no related snapshots.
  • Next I compared the CIDs in each descriptor file and updated the descriptor of the base (flat) disk with the expected CID, which came from another file that was no longer used.
  • At this point, upload the files back to the VM.
  • Add the disk and it should be recognized.
  • Add a new disk to the VM, replicate the disk files to the new disk, and replace the old disk with the new one.
  • Delete the old disk.
[1] https://kb.vmware.com/s/article/1007969
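The CID comparison in the steps above can be sketched in code. The rule being checked is that each snapshot descriptor's parentCID must equal the CID of the descriptor below it in the chain. This is a rough sketch that works on descriptor text only; the helper names and the sample values are made up. It repairs a mismatch by rewriting the child's parentCID, whereas I went the other way and changed the CID on the base disk. Either direction makes the two values agree, so treat the choice here as an assumption and keep backups of the descriptors before editing anything.

```python
import re

CID_RE = re.compile(r"^CID[ \t]*=[ \t]*([0-9a-fA-F]{8})[ \t]*$", re.MULTILINE)
PARENT_RE = re.compile(r"^parentCID[ \t]*=[ \t]*([0-9a-fA-F]{8})[ \t]*$", re.MULTILINE)

def read_ids(text):
    """Return (CID, parentCID) from descriptor text, lowercased."""
    cid, parent = CID_RE.search(text), PARENT_RE.search(text)
    if cid is None or parent is None:
        raise ValueError("descriptor is missing CID or parentCID")
    return cid.group(1).lower(), parent.group(1).lower()

def check_chain(descriptors):
    """descriptors: list of (name, text), base disk first.

    Returns (child_name, found_parentCID, expected_parentCID) per mismatch.
    """
    problems = []
    for (_, parent_text), (child_name, child_text) in zip(descriptors, descriptors[1:]):
        parent_cid, _ = read_ids(parent_text)
        _, child_parent = read_ids(child_text)
        if child_parent != parent_cid:
            problems.append((child_name, child_parent, parent_cid))
    return problems

def set_parent_cid(text, new_parent_cid):
    """Return descriptor text with its parentCID line rewritten."""
    return PARENT_RE.sub("parentCID=" + new_parent_cid, text, count=1)

base = "version=1\nCID=7b7644b2\nparentCID=ffffffff\n"
snap = "version=1\nCID=11111111\nparentCID=deadbeef\n"  # stale parentCID

chain = [("examplevm.vmdk", base), ("examplevm-000001.vmdk", snap)]
for name, found, expected in check_chain(chain):
    print(f"{name}: parentCID {found}, expected {expected}")
    snap = set_parent_cid(snap, expected)
```

After the rewrite, running `check_chain` on the updated text should report nothing, and the edited descriptor can be uploaded in place of the old one.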