Kit Wai Chan - Fotolia

How does vSphere high availability affect VM restart order?

If you encounter a VM failure, you can use vSphere HA to automatically detect and restart VMs. Before you restart the VMs, you'll also need to meet resource and host criteria.

VMs can fail for a number of reasons, but with the proper configuration, vSphere High Availability can automatically detect the failure and attempt a restart -- as long as your system meets five criteria.

The restart process for a VM in a high availability cluster that fails is often more complicated than it seems because the original host system is likely no longer available. The cluster's master host must find another host system that is both available and capable of running the afflicted VM. The cluster's master host must also evaluate certain parameters before it restarts a failed VM node.

The first step in the vSphere High Availability process is for the cluster's master host to determine whether the VM files are accessible. If the master can't find the necessary files, it can't restart the VM. In most cases, this requires the host to access a snapshot or VM image that is currently running on another active cluster node.

Next, the master node must determine whether other suitable host systems are available -- and whether the VM is even capable of running on those available host systems. The replacement host must be a different physical system than other nodes in the cluster. That way, you avoid running duplicate VM nodes on the same physical host, which would defeat the purpose of using vSphere High Availability. If there are no other compatible host systems available, it's impossible to restart the VM.

Before a VM can start, the potential host system must have enough unreserved resources available to meet its resource requirements.

After the cluster's master host finds compatible host systems, the master considers any resource reservations on those systems. A system can reserve processors, memory, network interfaces and virtual flash. Before a VM can start, the potential host system must have enough unreserved resources available to meet its resource requirements. If there aren't enough resources available -- unreserved processor, memory, network interface or virtual flash capacity -- the VM won't be able to restart on that system.

Next, the cluster's master host must check for any prevailing host limits. For example, the VM won't restart on a system if that action violates the maximum number of supported vCPUs or VMs. If this process violates host limits, the master attempts to select an alternate host.

Finally, the master has to obey VM affinity or anti-affinity rules. For example, VM placement might be subject to VM affinity rules, which limit the VM to run on a certain subset of available host systems. In contrast, VM anti-affinity rules prevent VMs from starting on certain systems -- even if those systems are available and meet other criteria. If no available systems meet VM affinity or anti-affinity rules, it's impossible to restart the VM.

If conditions prevent a VM from restarting, the master triggers a log event noting that vSphere High Availability can't restart the VM. VSphere High Availability will try to restart the VM later if conditions change.

Dig Deeper on Disaster recovery planning and management