Recognizing and correcting common VM migration problems
A careful and systematic approach can speed the recognition and correction of common VM migration problems.
Migration is a fundamental benefit of virtualization, allowing virtual machines to move between servers without noticeable disruption. It has become an essential tool for data center tasks ranging from server workload balancing and troubleshooting to routine jobs like clearing a host ahead of a scheduled maintenance window. But migration isn't always seamless. Server configuration oversights, incompatible hardware between servers, unwanted dependencies on dedicated hardware, a lack of network access and inadequate computing resources can all disrupt the migration process. This tip examines the most important causes of workload migration problems and offers suggestions to avoid and correct them.
Check migration settings or reconnect the host servers
VM migration requires that migration be enabled on both the source and destination servers in the first place. For example, two servers using VMware ESX or ESXi must both have vMotion enabled. If two Hyper-V servers must handle a migration, verify that Live Migration is available and enabled on both. With VMware ESX or ESXi, the vMotion setting is buried in each host's Configuration tab in the vSphere client, so IT administrators should consult the documentation that accompanies each hypervisor and confirm that migration is enabled on each server.
Alternatively, a technician can use the hypervisor's console to logically disconnect and reconnect each host server. For example, VMware ESX and ESXi hosts can be disconnected through the vSphere client's host Inventory. Once the disconnect task is complete, return to the inventory, select the host and connect it again. Retry the migration after the hosts have reconnected successfully through vSphere.
In some cases, migration may be disrupted by software bugs in the hypervisor, and it might be necessary to toggle migration settings off and on again at either (or both) of the affected servers. For example, this type of problem is known to have occurred on VMware ESX/ESXi 4.0 prior to Update 2, and technicians had to toggle the Migrate.Enabled setting on each host's vSphere Configuration tab. Patching ESX/ESXi 4.0 to Update 2 or later should resolve the problem.
Check for compatible server hardware and device dependencies
Virtualized servers are specifically intended to abstract the underlying hardware from the workloads running on top -- this abstraction is what makes workload migration possible -- but some rare situations may result in hardware discontinuities between the source and destination servers that prevent a successful migration.
It's important to start a troubleshooting effort by reviewing the server hardware and its configuration. As a simple example, source and destination servers need compatible processors, typically from the same vendor and exposing the same feature set to the VM, for successful workload migration. Other hardware problems might arise from subtle differences in the processor or I/O virtualization settings in each system's BIOS.
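One way to spot a processor mismatch ahead of time is to compare the feature flags each host reports. The sketch below is illustrative only: it assumes the flags are available as Linux /proc/cpuinfo-style text, and the function names and sample CPU strings are made up for this example. Hypervisors run their own compatibility checks, which remain the authority.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux labels the line "flags"; the first processor entry is enough here.
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_on_destination(source_flags, destination_flags):
    """List features the source offers that the destination lacks."""
    return sorted(source_flags - destination_flags)


# Hypothetical sample data standing in for the two hosts' reports.
source = parse_cpu_flags("model name : Example CPU A\nflags : fpu sse2 avx avx2\n")
destination = parse_cpu_flags("model name : Example CPU B\nflags : fpu sse2 avx\n")
print(missing_on_destination(source, destination))  # ['avx2']
```

An empty list suggests the destination offers everything the source does; any entry is a feature a running VM might depend on that would vanish after the move.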
Migration problems can also crop up when a VM relies on the presence of a hardware device that is not available on the destination server. For example, hypervisors like VMware ESX/ESXi allow VMs to connect to physical drives. If the VM relies on a physical drive connected to the source server -- but not present on the destination server -- the migration can be disrupted. Try safely disconnecting any local physical drives or other client devices from the VM on the source server and try the migration again.
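The device check described above can be sketched as a simple filter over a VM's device list. Everything here is hypothetical: the VirtualDevice record, its backing labels and the sample devices are invented for illustration and do not correspond to any hypervisor's API or configuration format.

```python
from dataclasses import dataclass


@dataclass
class VirtualDevice:
    name: str
    backing: str     # hypothetical labels: "host-physical" or "datastore-image"
    connected: bool


def blocking_devices(devices):
    """Return names of connected devices that are backed by host-local hardware."""
    return [d.name for d in devices if d.connected and d.backing == "host-physical"]


# Hypothetical inventory for one VM on the source server.
vm_devices = [
    VirtualDevice("CD/DVD drive 1", "host-physical", True),
    VirtualDevice("Floppy drive 1", "host-physical", False),
    VirtualDevice("Hard disk 1", "datastore-image", True),
]
print(blocking_devices(vm_devices))  # ['CD/DVD drive 1']
```

Any device the function returns is a candidate to disconnect before retrying the migration.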
Check network connectivity between the servers
Migration depends on network connectivity, so any connectivity problems at the source or destination server can easily disrupt migration actions. The most straightforward approach is to test connectivity between the source and destination servers using a ping utility. For example, VMware provides the vmkping utility, which can ping a destination server using a command shell at the source server. Simply enter the hostname or IP address of the destination server and look for a successful ping response, such as:
vmkping 192.168.1.1
You can also easily use a standard ping command through a Windows Command Prompt or Linux command line. If the ping is successful, you know that the source and destination servers can indeed communicate across the LAN. If not, there may be an issue with compatibility between the network interface cards (NICs) on the servers.
One common compatibility issue is the use of jumbo frames. For example, if one server NIC is configured for jumbo frames and the other is not, both NICs are fully functional, yet oversized frames from one side can be dropped on the way to the other. Small pings may still succeed while migration traffic fails, so workload migration will be unreliable or impossible until both NICs use the same frame size (MTU) setting. Another common problem occurs when an attempt to ping uses the destination server's hostname. If the hostname ping fails but the IP address ping works, the likely culprit is hostname resolution, which must be fixed to overcome the connectivity problem.
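The ping checks above lend themselves to a small script. This is a minimal sketch, not a definitive tool: it assumes the operating system's standard ping utility is on the path, and the function names and diagnosis strings are invented for this example. The payload helper relies on the 20-byte IPv4 header and 8-byte ICMP header that each echo request carries.

```python
import platform
import subprocess

IPV4_HEADER_BYTES = 20
ICMP_HEADER_BYTES = 8


def ping_once(target, timeout_s=5):
    """Send one echo request with the OS ping tool; True if it reports success."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", target],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    # Exit code 0 normally means a reply arrived; some platforms are less strict.
    return result.returncode == 0


def diagnose(ip_reachable, hostname_reachable):
    """Turn the two ping outcomes into a next troubleshooting step."""
    if ip_reachable and hostname_reachable:
        return "connectivity looks fine"
    if ip_reachable:
        return "check hostname resolution"
    if hostname_reachable:
        return "hostname resolves to a different address than the one tested"
    return "check cabling, NIC settings and jumbo frame configuration"


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


print(diagnose(ip_reachable=True, hostname_reachable=False))  # check hostname resolution
print(max_ping_payload(9000))  # 8972
```

To use it against real hosts, pass the results of ping_once for the destination's IP address and hostname into diagnose. A ping sized to max_ping_payload with fragmentation disabled is a common way to confirm that jumbo frames pass end to end.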
Check computing resources on the destination server
A workload cannot migrate to a destination server that lacks the computing resources required to support it. Migration problems can occur when the destination server is short of processor cores, memory, NIC ports or storage, and so cannot reserve resources for the new workload. This is an increasingly common problem as physical server counts drop and workload consolidation levels increase.
For example, resource shortages can happen unexpectedly if the destination server has already received additional workloads failed over from other systems. Shortages can also occur if existing workloads on the destination server have received additional computing resources to accommodate greater resource demands due to factors like increased user activity and so on. Try migrating the workload to another system with adequate computing resources (such as an idle or spare server), or perform workload balancing to free resources on the required server.
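The resource check can be expressed as a comparison between what a workload needs and what each candidate host has free. The sketch below is illustrative only: the Capacity record, the host names and all the figures are hypothetical, and a real inventory would come from the hypervisor's management tools.

```python
from dataclasses import dataclass


@dataclass
class Capacity:
    cpu_cores: int
    memory_gb: int
    storage_gb: int


def shortfalls(free, needed):
    """Map each resource the host lacks to the amount it is short by."""
    gaps = {}
    for field in ("cpu_cores", "memory_gb", "storage_gb"):
        gap = getattr(needed, field) - getattr(free, field)
        if gap > 0:
            gaps[field] = gap
    return gaps


def first_fit(hosts, needed):
    """Return the name of the first host with no shortfall, or None."""
    for name, free in hosts.items():
        if not shortfalls(free, needed):
            return name
    return None


# Hypothetical workload requirement and free capacity per host.
vm = Capacity(cpu_cores=4, memory_gb=16, storage_gb=200)
hosts = {
    "host-a": Capacity(cpu_cores=2, memory_gb=64, storage_gb=500),
    "host-b": Capacity(cpu_cores=8, memory_gb=32, storage_gb=400),
}
print(shortfalls(hosts["host-a"], vm))  # {'cpu_cores': 2}
print(first_fit(hosts, vm))             # host-b
```

Reporting the size of each gap, rather than a plain yes or no, shows how much capacity workload balancing would need to free on the preferred server.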
One common issue is a shortage of disk space on the destination server, so inspect the disk space available. For example, VMware ESX/ESXi users can open a console to the destination server and use the df -h command to check the amount of used space (or use the vdf -h command to check space on a VMFS volume). If there isn't enough free space to store the migrated workload, an administrator will need to free space or migrate the workload to another system. If storage is provided through a storage area network (SAN), verify that both the source and destination servers are configured to use the same zoning.
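The same disk space check can be scripted wherever the destination volume is mounted and Python is available. This is a minimal sketch: the function names and the 10% headroom default are assumptions made for this example, not a hypervisor requirement.

```python
import shutil

GIB = 1024 ** 3


def has_room(total_bytes, free_bytes, required_bytes, headroom=0.10):
    """True if the workload fits while leaving a reserve of total * headroom."""
    reserve = total_bytes * headroom
    return free_bytes - reserve >= required_bytes


def volume_has_room(path, required_bytes, headroom=0.10):
    """Apply has_room to the volume that holds the given path."""
    usage = shutil.disk_usage(path)
    return has_room(usage.total, usage.free, required_bytes, headroom)


# Hypothetical 500 GiB volume with 120 GiB free; 50 GiB is held in reserve.
print(has_room(500 * GIB, 120 * GIB, 60 * GIB))  # True
print(has_room(500 * GIB, 120 * GIB, 80 * GIB))  # False
```

Keeping a reserve avoids filling the volume completely, since snapshots and swap files can grow after the workload arrives.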
Migrating workloads between physical servers is an essential feature of a virtualized environment, but the process is fraught with potential problems. Factors like hypervisor bugs, migration settings, unexpected hardware dependencies, network connectivity problems and configuration issues, resource shortages, and SAN setups can all conspire to prevent successful workload migrations. Fortunately, IT professionals can usually isolate and correct many migration snafus once they understand the most common problem areas.