How to keep VM sprawl in check
Virtualization improves hardware use, but the pendulum can swing the other way and result in an overallocation of resources. Here's how to maintain the balance.
During the deployment of virtual environments, the focus is on the design and setup. Rarely are the environments revisited to check if improvements are possible.
Virtualization brought many benefits to data center operations, such as reliability and flexibility. One drawback is that it can lead to VM sprawl: more and more VMs that contend for a finite amount of resources. VMs are not free; storage and compute have a real capital cost. This cost gets amplified if you look to move these resources into the cloud. It's up to the administrator to examine the infrastructure resources and make sure these VMs have just what they need because the costs never go away and typically never go down.
Use Excel to dig into resource usage
One of the fundamental tools you need for this isn't Hyper-V or some virtualization product -- it's Excel. Dashboards are nice, but there are times you need the raw data for more in-depth analysis. Nothing can provide that like Excel.
Most monitoring tools export data to CSV format. You can import this file into Excel for analysis. Shared storage is expensive, so I always like to see a report on drive space. It's interesting to see what servers consume the most drive space, and where. If you split your servers into a C:\ for the OS and D:\ for the data, shouldn't most of the C:\ drives use the same amount of space? Outside of your application install, why should the C:\ drives vary in space? Are admins leaving giant ISOs in the download folder or recycle bin? Or are multiple admins logging on with roaming profiles?
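The same check can be scripted when the export is too large to eyeball in a spreadsheet. The following is a minimal Python sketch, not a finished tool. It assumes the CSV export has columns named `server`, `drive` and `used_gb`, which are hypothetical names; your monitoring tool will likely use different headers. It flags any server whose C: drive uses noticeably more space than the median.

```python
import csv
import io
from statistics import median


def flag_large_system_drives(csv_text, drive="C:", factor=1.5):
    """Return (server, used_gb) pairs whose `drive` uses more than
    `factor` times the median usage for that drive, largest first.

    Column names (server, drive, used_gb) are assumptions for this sketch.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    usage = {r["server"]: float(r["used_gb"]) for r in rows if r["drive"] == drive}
    if not usage:
        return []
    typical = median(usage.values())
    outliers = [(name, gb) for name, gb in usage.items() if gb > factor * typical]
    return sorted(outliers, key=lambda item: item[1], reverse=True)


# Invented sample data standing in for a monitoring tool's export.
SAMPLE = """server,drive,used_gb
app01,C:,42
app01,D:,300
app02,C:,45
app03,C:,118
app04,C:,40
"""

print(flag_large_system_drives(SAMPLE))  # [('app03', 118.0)]
```

The median is used rather than the average so that one bloated drive does not drag the baseline up and hide itself. The 1.5 factor is an arbitrary starting point to tune for your environment.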
Whatever the reason, runaway C:\ drives can chew up your primary storage quickly. If it is something simple such as ISO files that should have been removed, keep in mind that this affects your backups as well. You can just buy additional storage in a pinch and, because many of us in IT are often on autopilot, it's easy to not give drive space issues a second thought.
Overallocation is not as easy to correct
VM sprawl is one thing, but when was the last time you compared the resources you allocated to those VMs with what they actually use? The allocation process is still something of a guess until things are fully up and running. Underallocation is usually noticed promptly and corrected quickly, and everything moves forward.
Do you ever check for overallocation? Do you ever go back and remove extra CPU cores or RAM? In my experience, no one ever does. If everything runs well, there's little incentive to make changes.
Some in IT like to gamble and assume everything will run properly most of the time, but it's less stressful to prepare for some of these unlikely events. Is it possible that a host or two will fail, or that a network issue strikes your data center? You have to be prepared for failure and at a scale that is more than what you might think. We all know things will rarely fail in a way that is favorable to you. A review process could reveal places that could use an adjustment to drain resources from overallocated VMs to avoid trouble in the future.
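One way to start such a review is to compare each VM's allocation against its observed peak plus some headroom. The Python sketch below is illustrative only. The `VmUsage` fields, the 25% headroom and the sample numbers are all assumptions, and the peak figures would have to come from whatever monitoring history you have.

```python
import math
from dataclasses import dataclass


@dataclass
class VmUsage:
    name: str
    vcpus: int           # vCPUs currently allocated
    ram_gb: int          # RAM currently allocated, in GB
    peak_cpu_pct: float  # observed peak CPU, as a percentage of allocated vCPUs
    peak_ram_gb: float   # observed peak RAM in use, in GB


def suggest_allocation(vm, headroom=0.25):
    """Suggest (vcpus, ram_gb) sized to the observed peak plus headroom.

    The suggestion never exceeds the current allocation, because this
    sketch only looks for overallocation, not underallocation.
    """
    needed_vcpus = math.ceil(vm.vcpus * vm.peak_cpu_pct / 100 * (1 + headroom))
    needed_ram = math.ceil(vm.peak_ram_gb * (1 + headroom))
    return max(1, min(vm.vcpus, needed_vcpus)), min(vm.ram_gb, needed_ram)


# Invented examples: one oversized VM and one that is already right-sized.
sql01 = VmUsage("sql01", vcpus=8, ram_gb=64, peak_cpu_pct=35, peak_ram_gb=20)
web01 = VmUsage("web01", vcpus=2, ram_gb=8, peak_cpu_pct=70, peak_ram_gb=6)

print(suggest_allocation(sql01))  # (4, 25)
print(suggest_allocation(web01))  # (2, 8)
```

Treat the output as a list of candidates to investigate, not as an instruction. A VM that looks idle for a month might still need its full allocation at quarter end.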
Look closer at all aspects of VM sprawl to trim costs
Besides the resource aspect, what about the licensing cost? With more and more products now licensed by core, overallocation has an immediate impact on the up-front application cost, and it gets worse from there. The annual maintenance costs keep picking at your budget and drain it for no gain if you cannot tighten your resource allocation.
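To put a number on that, here is a rough worked example in Python. Every figure in it is invented for illustration: the per-core price, the 25% annual maintenance rate and the core counts are assumptions, not any vendor's actual terms.

```python
def excess_license_cost(allocated_cores, needed_cores, price_per_core,
                        maintenance_rate, years):
    """Return (upfront, maintenance) cost attributable to unneeded cores.

    maintenance_rate is the annual maintenance fee as a fraction of the
    license price. All inputs are hypothetical planning figures.
    """
    extra_cores = max(0, allocated_cores - needed_cores)
    upfront = extra_cores * price_per_core
    maintenance = upfront * maintenance_rate * years
    return upfront, maintenance


# A VM given 8 cores that needs only 4, at an invented 1,000 per core
# with 25% annual maintenance, over 4 years.
upfront, maintenance = excess_license_cost(8, 4, 1000, 0.25, 4)
print(upfront, maintenance)  # 4000 4000.0
```

In this made-up case, the maintenance paid on the unused cores matches their purchase price after four years, which is the slow drain described above.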
One other maintenance item that gets overlooked is reboots. When a majority of Windows Server deployments moved from physical hardware to virtualization, uptime typically increased. This increase in stability brought with it an inadvertent problem. Busy IT shops without structured patching and reboot cycles had too often performed these tasks only when a server went offline, which -- for better or worse -- created a maintenance window.
With virtualization, the servers tend to run for longer stretches and expose issues that were previously hidden. Memory leaks that might have gone unnoticed before -- because they were reset during a reboot -- can affect servers in unpredictable ways. Virtualization admins need to be on alert to recognize behaviors that might be out of the norm. If you right-size your VMs, you should have enough resources for them to run normally and still handle the occasional spikes in demand. If you see your VMs requiring more resources than normal, this could point to resource leaks that a reboot would clear.
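A crude way to spot that kind of creep is to compare memory use shortly after the last reboot with memory use now. The Python sketch below assumes you have evenly spaced memory samples since the last reboot. The window size and the 20% growth threshold are arbitrary assumptions, and the heuristic cannot tell a leak apart from legitimate growth in workload.

```python
from statistics import mean


def looks_like_leak(samples_gb, window=3, growth_threshold=0.2):
    """Return True if the average of the last `window` samples exceeds the
    average of the first `window` samples by more than `growth_threshold`
    (a fraction, so 0.2 means 20%).
    """
    if len(samples_gb) < 2 * window:
        return False  # not enough history to compare two windows
    start = mean(samples_gb[:window])
    end = mean(samples_gb[-window:])
    return start > 0 and (end - start) / start > growth_threshold


# Invented memory samples (GB in use) taken since the last reboot.
steady = [4.0, 4.1, 3.9, 4.0, 4.1, 4.0]
climbing = [4.0, 4.2, 4.5, 5.0, 5.6, 6.1]

print(looks_like_leak(steady))    # False
print(looks_like_leak(climbing))  # True
```

A VM that trips this check is worth a closer look before you simply give it more RAM.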
Often, the process to get systems online is rushed, which leads to VM sprawl and leaves no time for optimization. That optimization can be anything from trimming overallocations to simple cleanup. If it isn't done, you lose out on ways to make the environment more efficient, giving up both performance and capacity. While this all makes sense, it's important to follow through and actually do it.