How do I size capacity for VMware oversubscription?

What best practices and tools can I use to allocate the right resources in an oversubscribed virtualization deployment?

Analytics and automation are twin keys to mitigating the risk of VMware oversubscription and unlocking the true value of virtualization.

With virtualization, IT administrators can take advantage of memory sharing, ballooning and compression, and thin provisioning, all of which decouple virtual server allocations from a 1:1 ratio with physical servers. Oversubscription lets you provision virtual machines as needed, let the hypervisor optimize the resources, and then plan capacity based on the real utilization of the physical infrastructure. In that sense, oversubscription is a bit of a misnomer: Regardless of what IT resources you allocate to a VM, if you plan capacity based on what the workload actually needs, then you are not oversubscribing but subscribing fit to purpose.

Oversubscription's risk lies in changes in workload behavior -- such as when a system that uses only 50% of its allocated resources suddenly spikes to 90%. If your virtualization infrastructure capacity is not sized to handle those kinds of spikes, you could overrun a resource on a host, cluster or datastore. This in turn could degrade performance or even cause downtime.
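To make the spike risk concrete, here is a minimal Python sketch that checks whether a host can absorb one VM jumping from 50% to 90% of its allocation. The function names, the VM sizes and the host capacities are all hypothetical values chosen for illustration; real demand figures would come from your analytics tool.

```python
def demand_gb(vms):
    """Total memory in use: each VM's allocation times its utilization fraction."""
    return sum(alloc_gb * util for alloc_gb, util in vms)


def survives_spike(host_capacity_gb, vms, spiking_index, spike_util):
    """Return True if the host still has enough memory after one VM spikes."""
    spiked = list(vms)
    alloc_gb, _ = spiked[spiking_index]
    spiked[spiking_index] = (alloc_gb, spike_util)
    return demand_gb(spiked) <= host_capacity_gb


# Hypothetical host: three VMs, each allocated 64 GB and using 50% of it.
vms = [(64, 0.50), (64, 0.50), (64, 0.50)]

print(demand_gb(vms))                     # steady-state demand in GB
print(survives_spike(110, vms, 0, 0.90))  # host sized close to steady-state demand
print(survives_spike(128, vms, 0, 0.90))  # host sized with more headroom
```

In this example the 110 GB host covers the steady-state demand but is overrun when a single VM spikes, while the 128 GB host absorbs the same spike.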

Analytics provides the data required to make decisions about workload placement, reconfiguration and capacity requirements for VMware oversubscription. Automation carries out some analytics-based actions automatically, making the IT environment more agile and reactive to workload changes in real time.

Administrators should consider analytics tools, automation tools and tools that bundle both steps together. Options in the virtualization space include VMware vRealize Operations Manager and vRealize Orchestrator, VMTurbo Operations Manager and Veeam One. DevOps configuration management tools also offer capabilities to manage VMware oversubscription; consider Chef, Puppet, Ansible and others.

Tools like VMTurbo Operations Manager bundle the analytics, decision-making and automation into one pane of glass. The same effect could be created in-house by gluing vRealize Operations Manager and vRealize Orchestrator together and building in the decision-making intelligence via scripts. Look for a product that works well out of the box and offers customization to your needs. Regardless of tool set, focus on metrics, long-term demand and overhead when optimizing your infrastructure and workloads.

The object is not only to minimize capacity spending, but also to keep the virtual infrastructure performing. A 10-terabyte bank of 7,200-rpm SATA drives is undoubtedly cheaper than the equivalent capacity in enterprise flash drives, but it may not provide acceptable performance for the workloads. Metrics that tie directly to capacity are critical for proper VMware oversubscription, but also take into account performance metrics, such as IOPS and read/write latency for storage. Other infrastructure components have these ancillary metrics as well: Metrics on memory contention and CPU ready time will help inform oversubscription decisions.
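CPU ready time is often exposed as a summation value in milliseconds per sampling interval, which is hard to judge at a glance. The sketch below converts it to a percentage using the commonly cited formula: ready milliseconds divided by the interval length in milliseconds, times 100. The function name is hypothetical, the 20-second default assumes real-time performance charts, and dividing by the vCPU count is an assumption made here to get a per-vCPU figure. Check how your own tool reports the counter before relying on it.

```python
def cpu_ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a CPU ready summation (ms) into a percentage of the interval.

    interval_s: length of the sampling interval in seconds (assumed 20 s).
    vcpus: number of vCPUs the summation covers (assumption: the value is
           aggregated across vCPUs and should be averaged per vCPU).
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms * 100 / (interval_s * 1000 * vcpus)


# Hypothetical reading: 2,000 ms of ready time in a 20-second sample.
print(cpu_ready_percent(2000))           # treated as a single-vCPU VM
print(cpu_ready_percent(2000, vcpus=4))  # same total spread over four vCPUs
```

A rising percentage means VMs are waiting longer for a physical CPU, which is a sign that CPU oversubscription on that host is starting to cost performance.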

Once you have established good analytical data, it's time to make decisions and carry out actions in the VMware virtualization deployment based on this data. Read on to see the main actions that you should automate.

Make sure you understand the workloads over time. Some VMs remain idle for weeks on end and then work very hard generating a month-end report, for example. Whatever your analytics engine, it must take into account the workload's behavior over an extended period of time before making decisions. Differentiate between average and peak readings. Some decisions can be made on average utilization, but others must account for peaks.
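The gap between average and peak readings is easy to see with a small Python sketch. The sample data is invented to mimic the month-end reporting VM described above, with one reading per day, and the nearest-rank 95th percentile is just one reasonable way to summarize near-peak demand. The function name and the dictionary keys are hypothetical.

```python
import math
from statistics import mean


def summarize_utilization(samples):
    """Return average, 95th percentile (nearest rank) and peak of the samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(len(ordered) * 95 / 100)  # nearest-rank percentile position
    return {
        "average": mean(ordered),
        "p95": ordered[rank - 1],
        "peak": ordered[-1],
    }


# Hypothetical daily CPU utilization (%): idle for 28 days, busy for 2.
daily_cpu = [5] * 28 + [90] * 2

summary = summarize_utilization(daily_cpu)
print(summary)
```

Sizing this VM on its average would leave it badly short during the two days it matters, which is why the observation window has to cover the whole business cycle.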

You still need to have resource overhead with VMware oversubscription. Equipping a virtual deployment with analytics and automation lets it stay agile and react quickly to changes in workloads, but only if the resource overhead is there to absorb those changes. Workload optimization means that you may not need capacity for the full allocation of every virtual machine, but you do need to maintain some buffer above observed demand. Expect to provision somewhere between 10% and 20% capacity above the observed load on the infrastructure, then adjust this buffer as appropriate for your VMs.
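As a quick worked example of the 10% to 20% guideline, the following Python sketch turns an observed load into a provisioning range. The function name and the 400 GB figure are hypothetical; the observed load should come from your own analytics data.

```python
def capacity_with_buffer(observed_load, buffer_fraction):
    """Capacity to provision: observed load plus a fractional buffer on top."""
    if observed_load < 0 or buffer_fraction < 0:
        raise ValueError("observed_load and buffer_fraction must be non-negative")
    return observed_load * (1 + buffer_fraction)


# Hypothetical cluster with 400 GB of observed memory demand.
observed_gb = 400
low = capacity_with_buffer(observed_gb, 0.10)
high = capacity_with_buffer(observed_gb, 0.20)
print(f"Provision between {low:.0f} GB and {high:.0f} GB")
```

For 400 GB of observed demand, that works out to roughly 440 GB to 480 GB of provisioned capacity, a starting point to tune as you learn how spiky your VMs are.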
