VM backup fails due to quiescing failure

Some time ago we were at a customer site and ran into a problem backing up virtual machines using VSS and snapshots.

First I’ll explain the situation we got into.

This customer has geographically dispersed offices.

The customer is currently in the middle of a datacenter consolidation. The organization is migrating from a decentralized IT environment to a centralized environment using virtualization. In the “old” environment every office location housed its own file and application servers.

These remote file and application servers are being consolidated into a single datacenter, and further into two virtual Windows Server 2008 R2 file servers running on vSphere 5 and presented through DFS (Distributed File System).

Both new file servers are configured with one C-drive with multiple mount points to hold all the different file shares of the branch offices that used to be remote.

This is how the virtual machines were configured:

server 01:

3 pvSCSI cards

Disk 1 c:\ at SCSI 0:0

11 disks Disk 2 – 12 at SCSI 1:0 – 1:10

11 disks Disk 13 – 23 at SCSI 2:0 – 2:10

 

server 02:

3 pvSCSI cards

Disk 1 c:\ at SCSI 0:0

13 disks Disk 2 – 14 at SCSI 1:0 – 1:13

13 disks Disk 15 – 27 at SCSI 2:0 – 2:13

 

So far so good: all the data has been migrated and the file servers are up and running. There was one issue, however. When we tried to back up the virtual machines with a virtualization-aware backup tool, the backup failed.

For Windows systems the backup tool uses the VMware VSS (Volume Shadow Copy Service) driver to create application-consistent snapshots.

We ran into a problem with one of the two file servers. With this particular server (server 02) we were unable to create a quiesced snapshot. Server 01 did not experience any problems creating snapshots.

After quite some time on Google and other sites, we found a page that hinted at what the problem might be:

http://markcampbell.sys-con.com/node/1544145/mobile

In one of the bottom cells in the right column of the table there is this statement: 

“There must as many free SCSI slots in the virtual machine as the number of disks. For example, if there are 8 SCSI disks on SCSI adapter 1, there are not enough SCSI slots free to perform application quiescing.”

Digging deeper to find VMware documentation on this, I found three articles.

The first is a guide to developing vSphere backup solutions, which more or less confirms the statement in the previous article:

http://www.vmware.com/support/developer/vddk/vadp_vsphere_backup12.pdf 

“The virtual machine must use SCSI disks only and have enough free SCSI slots free as the number of disks.”

This confirmed the statement on the blog site, but still did not explain why, so we dug deeper…

The second is a KB article I found earlier that day about snapshot creation failing on Windows 2008 with more than 30 disks: http://kb.vmware.com/kb/1037754

The third and last article, and the most recent one I found, is in the vSphere 5 Documentation Center. It covers the driver type and quiescing mechanism used for each guest operating system. In other words, the table in this article shows which driver (sync driver or VSS driver) and which quiescing method (file-system or application-consistent) is used by default for each combination of OS version and ESX version.

http://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.datarecovery.admin.doc_20%2FGUID-6F339449-8A9F-48C0-BE70-91A2654A79D2.html

Conclusion: after reading these last two articles it became clear what was going on and what we needed to do to fix the backup issue: use the right number of SCSI adapters and snapshot creation should run smoothly.

Explanation:

We are dealing with Windows Server 2008 R2 and vSphere 5.0, so the default quiescing driver is the VMware VSS driver and the quiescing method is application-level. Application quiescing on Windows 2008 requires hot-add of the snapshot disks so that VSS and applications in the guest can modify the snapshot files to represent the quiesced state.

This means that there must be at least as many SCSI slots free as there are in use.

Now the math begins. The maximum number of disks (or targets, as the “vSphere configuration maximums” document calls them) on a SCSI bus is 15. Given that a virtual machine can have up to four SCSI controllers, you end up with 60 disks at most. But once you take the hot-add requirement into account and keep enough slots free, you end up with a maximum of 30 disks per virtual machine.

In the example from the beginning, both virtual machines use fewer than 30 disks, but they also use only three SCSI controllers each. Three SCSI controllers give a maximum of 45 (3 * 15) slots. Half of those slots should be reserved for hot-add. Because 45 is an odd number, the maximum number of disks in this setup is 22 (ROUNDDOWN(45/2)) if you want application-level quiesced snapshots. Server 02 has more than 22 disks and violates the rule, so the backup fails. The solution is to add another SCSI controller, which increases the number of available SCSI slots.
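The arithmetic above can be written down as a small Python sketch. The function names are made up for illustration, and the 15 slots per controller and the "one free slot per disk" rule are taken from the text above rather than from any VMware API.

```python
# Sketch of the slot arithmetic described above; all names are illustrative.
SLOTS_PER_CONTROLLER = 15  # disk targets per virtual SCSI bus, as stated in the article


def max_disks_for_app_quiescing(controllers: int) -> int:
    """Largest disk count that still leaves one free SCSI slot per disk."""
    total_slots = controllers * SLOTS_PER_CONTROLLER
    # Half of the slots must stay free for the hot-added snapshot disks;
    # integer division rounds an odd total down.
    return total_slots // 2


def can_app_quiesce(disks: int, controllers: int) -> bool:
    """True when the disk count fits within the limit for this controller count."""
    return disks <= max_disks_for_app_quiescing(controllers)


if __name__ == "__main__":
    server02_disks = 1 + 13 + 13  # C: drive plus two groups of 13 data disks
    for controllers in (3, 4):
        limit = max_disks_for_app_quiescing(controllers)
        ok = can_app_quiesce(server02_disks, controllers)
        print(f"{controllers} controllers: limit {limit} disks, "
              f"server 02 ({server02_disks} disks) fits: {ok}")
```

With three controllers the limit is 22 disks, and with four it is 30, which is why adding a fourth controller brings server 02 back within the rule.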

In this situation we are using only file-server functionality, so filesystem-level quiescing is good enough. One way to get around the 30-disk maximum in this scenario is to force the snapshot mechanism to use filesystem quiescing: when the virtual machine VMX option disk.EnableUUID is set to false, filesystem-level quiescing is used.
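As a rough illustration of that setting, the Python sketch below rewrites the text of a .vmx file so that it contains disk.EnableUUID = "FALSE". The helper name set_vmx_option is hypothetical, and the sketch assumes the usual key = "value" line layout of VMX files and treats option names as case-insensitive; it only manipulates text and does not talk to vSphere.

```python
def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with key set to value, replacing any existing entry.

    Hypothetical helper: assumes one `key = "value"` pair per line.
    """
    new_line = f'{key} = "{value}"'
    out = []
    replaced = False
    for line in vmx_text.splitlines():
        name = line.split("=", 1)[0].strip()
        if name.lower() == key.lower():  # assumption: option names are case-insensitive
            if not replaced:
                out.append(new_line)
                replaced = True
            # any further duplicate entries for the same key are dropped
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)  # option was absent, so append it
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = 'scsi0.present = "TRUE"\ndisk.EnableUUID = "TRUE"\n'
    print(set_vmx_option(sample, "disk.EnableUUID", "FALSE"))
```

In practice the same option can be set through the virtual machine's advanced configuration parameters instead of editing the file by hand.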

Please feel free to post a relevant comment.

 

1 Comment

  1. Ali

    Thanks a lot for such a great explanation. Please keep up the great work.

    I have one question: if the server is powered on, how do I add the SCSI controller, given that the option is dimmed? I need it for the snapshot to succeed.
