Virtualization challenges that negatively affect your systems
Virtualization provides lower costs and improved productivity, but these benefits can be offset by certain challenges, such as unresponsive VMs and VM network latency.
Virtualization is an industry standard in modern IT. IT administrators should be aware of its challenges, such as unresponsive VMs, VM network latency, monster VMs, resource contention and zombie VMs, to ensure successful operation of their virtual systems. Otherwise, they risk lackluster performance, which can lead to critical issues.
Virtualization provides admins with many benefits, such as single-purpose servers, expedited deployment and redeployment, lower costs, quicker backups and improved productivity. But a set of unique challenges can sometimes counterbalance these benefits. Admins must monitor their systems vigilantly to avoid system overload and must put specific tactics -- such as VM tags -- in place to help mitigate problems.
Resource contention within VMs
One of the main virtualization challenges, and the cause of most performance issues within a VM, is a lack of resources within a storage array. Generally, a virtualization host has a finite pool of hardware resources, so every VM within a given system must share those resources. If VMs generate a large volume of I/O requests, they risk overwhelming the storage array.
To solve this, admins can move VMs to a storage array with enough resources to handle them. In addition, specific performance monitoring tools enable admins to see the number of IOPS that a VM requires. These tools can also compare admins' workloads against their storage hardware's capabilities.
By monitoring their systems, admins can move VMs prior to a performance issue arising and avoid resource contention.
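As a rough illustration of comparing workloads against a storage array's capabilities, the following Python sketch sums per-VM IOPS and lists which VMs could be moved to bring the array back under a chosen utilization threshold. The `VmIoSample` type, the 80% threshold and the input figures are all hypothetical. Real numbers would come from whatever monitoring tool is in use.

```python
from dataclasses import dataclass


@dataclass
class VmIoSample:
    name: str
    avg_iops: float  # average IOPS reported by a monitoring tool (hypothetical input)


def array_utilization(samples, array_max_iops):
    """Return the fraction of the array's rated IOPS consumed by all VMs."""
    if array_max_iops <= 0:
        raise ValueError("array_max_iops must be positive")
    return sum(s.avg_iops for s in samples) / array_max_iops


def migration_candidates(samples, array_max_iops, threshold=0.8):
    """Return VM names, busiest first, to move until load is within the threshold.

    The 0.8 default is an assumed safety margin, not a vendor recommendation.
    """
    budget = array_max_iops * threshold
    total = sum(s.avg_iops for s in samples)
    candidates = []
    for s in sorted(samples, key=lambda s: s.avg_iops, reverse=True):
        if total <= budget:
            break
        candidates.append(s.name)
        total -= s.avg_iops
    return candidates
```

Moving the busiest VMs first is only one possible policy. In practice, admins might prefer to move several smaller VMs rather than disrupt a single critical workload.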
Unresponsive VMs
Frozen VMs happen as a result of locked up or unresponsive VM tasks. In some cases, the guest OS might also refuse to respond, which can cause admins to struggle with halting, restarting, or powering off and on unresponsive VMs. A VM might become unresponsive for a variety of reasons, such as issues with storage, network and available resources on the host server. When a VM freezes, some admins might opt to kill the VM process through the hypervisor interface, but this should be a last resort.
Prior to taking action, admins should first determine whether there are one or more unresponsive VMs. If several VMs have become unresponsive on a single host server, the problem most likely originates with the host server itself. If a VM does respond through specific interfaces, then admins can trace the issue by checking logs or error messages at the hypervisor console.
Once admins narrow down the specific issue, they can then discover the root cause for unresponsive VMs. If admins cannot trace the issue, they should consider whether a specific task causes VMs within the host to freeze. Then, admins can inspect the configuration of the VM and its host system, ensuring that enough resources are available. Finally, admins should check whether their network and shared storage accommodate each VM.
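The first triage step, deciding whether the problem lies with one VM or with the host, can be sketched in a few lines of Python. The input format and the threshold of two frozen VMs per host are assumptions made for illustration. The responsiveness data itself would have to come from the hypervisor or a monitoring system.

```python
from collections import defaultdict


def triage_unresponsive(vm_status, host_threshold=2):
    """Group unresponsive VMs by host and suggest where to look first.

    vm_status: iterable of (vm_name, host_name, responsive) tuples.
    Returns a dict mapping each affected host to "host" (several VMs are
    frozen, so suspect the host) or "vm" (a single VM is frozen).
    """
    frozen_by_host = defaultdict(list)
    for vm_name, host_name, responsive in vm_status:
        if not responsive:
            frozen_by_host[host_name].append(vm_name)
    return {
        host: ("host" if len(vms) >= host_threshold else "vm")
        for host, vms in frozen_by_host.items()
    }
```

Hosts with no frozen VMs are omitted from the result, so an empty dictionary means nothing needs attention.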
VM network latency
VMs require network access to operate, but issues such as extended ping response times can lead to performance issues, which can then impact the operations of admins' systems. To address VM network latency, admins must first rule out any LAN issues within their systems.
Network congestion, such as the traffic generated by busy antimalware software, often causes network latency. Admins might also find that IP address conflicts and faulty or poorly configured network equipment can cause network latency. In addition, virtual processor overcommitment can create problems. When the host system provides certain VMs with more virtual processors than they require, other VMs might not get adequate processor time.
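To make the overcommitment point concrete, this Python sketch computes a host's vCPU-to-core ratio and lists VMs that were given more vCPUs than they have ever used at peak. The function names and input dictionaries are hypothetical, and what counts as an acceptable ratio depends on the workload, so no limit is hard-coded here.

```python
def vcpu_overcommit_ratio(vcpus_per_vm, physical_cores):
    """Return allocated vCPUs divided by physical cores on the host."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vcpus_per_vm.values()) / physical_cores


def oversized_vms(vcpus_per_vm, peak_vcpus_used):
    """Return names of VMs allocated more vCPUs than their observed peak usage.

    VMs with no usage data are skipped rather than flagged.
    """
    return sorted(
        name
        for name, allocated in vcpus_per_vm.items()
        if allocated > peak_vcpus_used.get(name, allocated)
    )
```

For example, with allocations of 8, 16 and 4 vCPUs on a 16-core host, the ratio is 1.75, and any VM whose peak usage sits below its allocation is a candidate for a smaller vCPU count.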
Once admins rule out the network itself, they can then look for the issue on the host server. For example, issues such as a poorly configured basic input/output system (BIOS), an improperly configured network port and outdated VM drivers can cause network latency. For Windows Server environments, a common cause of network latency is the power plan. If the power plan is not set correctly -- such as set to balanced -- it can lead to performance problems. Instead, admins should set the power plan to high-performance to reduce network latency.
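One way to check the power plan across many Windows hosts is to parse the output of `powercfg /getactivescheme`. The Python sketch below assumes that output contains the active plan's GUID, and it compares that GUID against the built-in High performance plan. Custom or vendor-supplied plans have different GUIDs and would need their own handling.

```python
import re
import subprocess

# GUID of the built-in Windows "High performance" power plan.
HIGH_PERFORMANCE_GUID = "8c5e7fa8-e8bf-4a96-9a85-a6e23a8c635c"

_GUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def active_plan_guid(powercfg_output):
    """Extract the first GUID from powercfg output, or None if there is none."""
    match = _GUID_RE.search(powercfg_output)
    return match.group(0).lower() if match else None


def is_high_performance(powercfg_output):
    """Return True if the active plan is the built-in High performance plan."""
    return active_plan_guid(powercfg_output) == HIGH_PERFORMANCE_GUID


def read_active_plan():
    """Run powercfg on a Windows host and return its raw text output."""
    result = subprocess.run(
        ["powercfg", "/getactivescheme"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On a host where the check fails, running `powercfg /setactive 8c5e7fa8-e8bf-4a96-9a85-a6e23a8c635c` from an elevated prompt switches to the High performance plan.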
Monster VMs and application killers
Monster VMs run more than 8 vCPUs and 255 GB of virtual RAM, and admins use them to run applications that require high CPU and memory resources. But monster VMs can also cause performance problems because of resource scheduling issues. To better manage monster VMs, admins should consult the CPU and memory demand metrics in vRealize Operations (vROps) and use them to rightsize the monster VMs.
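The arithmetic behind rightsizing can be sketched as follows. This is not how vROps computes its recommendations. It is a simplified Python illustration that assumes peak CPU demand is available as a percentage of the VM's current allocation and that a 20% headroom is acceptable. Both the function and its defaults are hypothetical.

```python
import math


def rightsize_vcpus(allocated_vcpus, peak_cpu_demand_pct, headroom=0.2, min_vcpus=1):
    """Suggest a vCPU count from peak demand plus headroom.

    peak_cpu_demand_pct is peak demand as a percentage of the current
    allocation. The result never exceeds the current allocation, so this
    sketch only recommends shrinking a VM, never growing it.
    """
    needed = allocated_vcpus * (peak_cpu_demand_pct / 100.0) * (1.0 + headroom)
    suggested = max(min_vcpus, math.ceil(needed))
    return min(suggested, allocated_vcpus)
```

A 16-vCPU VM that peaks at 25% demand would be sized down to 5 vCPUs under these assumptions, while a VM that already peaks at 100% keeps its current allocation.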
In addition, a virtual system can contain several performance killers that significantly affect application operations. If admins experience performance latency, they can take the top-down approach to identify the issue, which starts with the application stack and, subsequently, moves down to the OS stack, VM stack, ESXi stack and, lastly, the infrastructure. Once admins single out the issue, they can then use tools such as the ESXi command line and vROps to remediate performance issues.
Zombie VMs and VM sprawl
Zombie VMs don't perform any beneficial tasks, yet they consume valuable system resources. Essentially, admins create a zombie VM when they abandon a VM. Automation is a critical component of modern IT, and admins can now automatically create VMs in large volumes. As a result, admins might lose track of VMs within their systems, which can lead to virtualization sprawl (VM sprawl).
Tracking down these zombie VMs and mitigating VM sprawl can become difficult, but admins can use VM tags to track VMs more easily. Once admins create VMs, they can attach a unique tag to each VM to help determine the exact purpose of the VM. If admins neglect to use VM tags at the time of VM creation, they must then monitor their entire systems for unusual performance behavior. For example, if admins notice performance issues within the CPU, memory and network of their systems, this can indicate zombie VMs and VM sprawl.
But admins should not kill VMs right away. Some VMs, such as backup Active Directory controllers and domain name system servers, don't stay active all the time, so a quiet VM is not necessarily an abandoned one. Admins must monitor their systems closely and, once they determine the presence of a zombie VM, test the VM by disconnecting it from the network and moving it to a disk to ensure they make no mistakes.
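The tagging and monitoring approach can be sketched as a review list rather than a delete list. The Python example below flags VMs that lack required tags or look idle and records the reason for each flag. The `VmRecord` type, the `owner` and `purpose` tag names and the idle thresholds are all assumptions. Inventory and utilization data would come from the virtualization platform.

```python
from dataclasses import dataclass, field


@dataclass
class VmRecord:
    name: str
    tags: dict = field(default_factory=dict)
    avg_cpu_pct: float = 0.0   # average CPU use over the review period
    avg_net_kbps: float = 0.0  # average network throughput over the same period


def zombie_candidates(vms, required_tags=("owner", "purpose"),
                      cpu_idle_pct=2.0, net_idle_kbps=1.0):
    """Return {vm_name: [reasons]} for VMs that deserve a manual review.

    Nothing here deletes or powers off a VM. An idle but properly tagged VM,
    such as a backup domain controller, is reported so a person can decide.
    """
    flagged = {}
    for vm in vms:
        reasons = []
        missing = [tag for tag in required_tags if not vm.tags.get(tag)]
        if missing:
            reasons.append("missing tags: " + ", ".join(missing))
        if vm.avg_cpu_pct < cpu_idle_pct and vm.avg_net_kbps < net_idle_kbps:
            reasons.append("idle")
        if reasons:
            flagged[vm.name] = reasons
    return flagged
```

A tagged backup domain controller that sits idle would appear with the single reason "idle", which tells the reviewer to confirm its purpose instead of removing it.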