This week we tried to clone a VM and ran into an error (see screenshot). Whatever we tried, we could not figure out what caused the problem.
At first we thought the upgrade to Virtual Center 2.5 had caused the problem. Luckily, we have another ESX/VC installation at another site, and being able to test the clone there without any problem proved otherwise. Our next guess concerned the version of the hosts. We were in the middle of the upgrade to ESX 3.5 + VC 2.5 and still had to upgrade the ESX hosts, so we had an upgraded VC 2.5 communicating with a couple of ESX hosts that were still running version 3.0.2.
Further testing ruled out any causal connection between the ESX/VC versions and the error. In another datacenter within the same Virtual Center instance we could clone a VM without any error. That left us with the conclusion that the cause was to be found at the datacenter or even the cluster level. To get more information we raised the logging level in Virtual Center, hoping to see an error message telling us why the clone kept failing. I had read a thread this week about improved error messaging in the new version of ESX/VC, so that kept up the faith. Unfortunately, in this case the error messages were not that clear: An unhandled exception occurred … etc.
Trying another approach called ‘common sense’ proved to be much more effective. First we looked at the process of cloning. Cloning consists of a number of steps. Which steps are taken, and in which part of the process does the error pop up? It turned out that the error popped up at the stage where the wizard should present us with a list of available datastores. Next we analyzed the list of datastores, which also contains the local storage of each host. That was it! One of the ESX hosts in the cluster had been shut down because of a hardware failure, but it still appeared as a member of the cluster (with a red cross). Because repairing the host had already taken some time (and had been given a low priority), we had forgotten about it. As a result, the local storage of that cluster resource (the ESX host) could not be queried, which gave us the error shown in the screenshot.
Removing the ESX host with the hardware defect from the inventory proved the theory. After the removal we could clone a machine again.
So when you run into an error while cloning a VM, take a look at the cloning process and determine the step at which it goes wrong. Then take a good look at your inventory and check for hardware defects or connectivity issues with the specific components involved.
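The inventory check described above can be illustrated with a small sketch. This is not Virtual Center code and it does not use any VMware API. The `Host` class, the function names, and the host and datastore names are all hypothetical, and the rule that one unreachable host breaks the whole datastore listing is simply modeled on what we observed, not on how Virtual Center is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    # Hypothetical model of an ESX host as it appears in the inventory.
    name: str
    connected: bool
    local_datastores: list = field(default_factory=list)


def unreachable_hosts(cluster):
    """Return the names of hosts that are in the inventory but not connected."""
    return [host.name for host in cluster if not host.connected]


def list_clone_targets(cluster, shared_datastores):
    """Build the datastore list a clone wizard might offer.

    Assumption: a single host whose local storage cannot be queried
    makes the whole listing fail, as it did in our case.
    """
    targets = list(shared_datastores)
    for host in cluster:
        if not host.connected:
            raise RuntimeError(f"cannot query local storage of {host.name}")
        targets.extend(host.local_datastores)
    return targets


# Example inventory with made-up names.
cluster = [
    Host("esx01", True, ["esx01-local"]),
    Host("esx02", False, ["esx02-local"]),  # shut down after a hardware failure
    Host("esx03", True, ["esx03-local"]),
]

# Check the inventory first: which hosts would break the listing?
print(unreachable_hosts(cluster))
```

In this model, calling `list_clone_targets(cluster, ["san-lun1"])` fails as long as `esx02` is in the list, and succeeds once that host is removed, which mirrors what happened after we removed the defective host from the inventory.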