
How to monitor and deallocate Azure VMs to save money

Enterprises can save costs by deallocating Azure VMs during inactive periods. Teams can use log files and automation to monitor VM jobs and push deallocation when a job is complete.

Deallocated VMs in Azure incur charges only for disk storage and a few smaller items, such as reserved public IP addresses. This knowledge can help enterprises reduce costs in certain scenarios, so it's key to know why, how and when to deallocate a VM.

For example, a company might use a deallocation policy for a large computation job that runs once a quarter. The cloud team could turn on the VM, use it and shut it down -- ensuring it's deallocated -- until next time.

It's possible to build automation pipelines to achieve similar savings, but those can be overly complex for a single VM-based job the team uses infrequently. It's easier to shut the VM down and deallocate it until necessary.

Managing VM jobs on expensive nodes can be an art form, so let's explore how deallocation works and how it can benefit enterprises.

Deallocated vs. powered-off VMs

Here are the main differences between a deallocated VM and a powered-off VM:

  • Deallocated. A deallocated VM is no longer scheduled on a host, but it still appears in the inventory and is marked as "Deallocated" in the management system. It doesn't consume compute resources or incur compute charges, although its disks are still billed.
  • Powered-off. A VM that is powered off from within the guest OS stops running but stays allocated to its host, so it continues to incur compute charges until it is deallocated.

It isn't possible to deallocate a machine from within the target VM; shutting down the guest OS only powers it off. Staff must deallocate it from the Azure control plane. Stopping the VM from the Azure portal deallocates it, and teams can also deallocate a VM via automation.

When running big jobs, it's important to keep in mind that Microsoft could patch the underlying host at any point, which can interrupt or restart the VM in question. Teams should build checkpoints into complex jobs so they can resume after an interruption.
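One way to make a job resumable is to record each finished unit of work in a checkpoint file and skip those units on restart. The following is a minimal sketch of that idea, not a prescribed method. The names CHECKPOINT_FILE, process_unit and run_job are hypothetical, and process_unit stands in for the real computation.

```shell
# Hypothetical checkpoint location; override it through the environment.
CHECKPOINT_FILE="${CHECKPOINT_FILE:-./job.checkpoint}"

# Placeholder for the real computation on one unit of work.
process_unit() {
  echo "processing $1"
}

# Run every unit passed as an argument, skipping units already recorded.
run_job() {
  local unit
  touch "$CHECKPOINT_FILE"
  for unit in "$@"; do
    # -x matches the whole line, -F treats the unit name literally
    if grep -qxF -- "$unit" "$CHECKPOINT_FILE"; then
      continue
    fi
    process_unit "$unit" || return 1
    # Record the unit only after it finished successfully
    echo "$unit" >> "$CHECKPOINT_FILE"
  done
}
```

If the VM restarts halfway through, calling run_job again with the same list of units picks up at the first unit that is missing from the checkpoint file.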

Cost savings

Deallocating large VMs when they're not active can save a considerable amount of money. Typically, the high-end Azure SKUs aren't end user- or customer-facing, so it might be wise to shut them down between runs.

Some GPU-heavy SKUs can easily cost over $20 an hour, depending on the configuration, with costs just under $10,000 per month, excluding disks, for 24/7 use.

Powering off such a server for the weekend alone could save around $1,300 per month or approximately $15,500 per year, per machine. But long-running jobs often need to run for days or weeks. It can be difficult to manage these jobs for several reasons, including visibility into job progress and hypervisor patching. Teams can power these machines on or off using Start/Stop VMs automation built into Azure.
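The arithmetic behind a weekend-shutdown estimate can be sketched in a few lines of shell. This is an illustration under stated assumptions: the VM is deallocated for 48 hours on each of 52 weekends, and the hourly rate is whatever the team's own SKU costs. The function names and the $6.25 rate used in the note below are placeholders, not quoted Azure prices.

```shell
# Assumptions: the VM is deallocated for all of Saturday and Sunday, every week.
WEEKEND_HOURS=48
WEEKENDS_PER_YEAR=52

# Print the yearly saving in dollars for the hourly rate given as $1.
weekend_saving_yearly() {
  LC_ALL=C awk -v rate="$1" -v h="$WEEKEND_HOURS" -v w="$WEEKENDS_PER_YEAR" \
    'BEGIN { printf "%.2f\n", rate * h * w }'
}

# Print the average monthly saving (the yearly saving spread over 12 months).
weekend_saving_monthly() {
  LC_ALL=C awk -v rate="$1" -v h="$WEEKEND_HOURS" -v w="$WEEKENDS_PER_YEAR" \
    'BEGIN { printf "%.2f\n", rate * h * w / 12 }'
}
```

With an illustrative rate of $6.25 per hour, weekend_saving_monthly 6.25 prints 1300.00 and weekend_saving_yearly 6.25 prints 15600.00. Higher hourly rates scale the saving proportionally.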

How to monitor long-running VMs

Teams have several ways to keep a long-running VM process in check when using a single node.

A basic way to achieve this is to use the built-in monitoring tools for RAM and CPU. A better method is to ensure the application log output is verbose. For example, a long-running compute job that processes data for the continental U.S. benefits from logging its output to files, with one log file per state being processed. The log files then give insight into the status of the job.
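A per-state log layout like the one described above might look like the following sketch. The LOG_DIR location, the function names and the DONE marker are assumptions made for illustration; a real job would log whatever its own application emits.

```shell
# Hypothetical log directory; override it through the environment.
LOG_DIR="${LOG_DIR:-./logs}"

# Append a timestamped message to the log file for one state.
log_state() {
  local state="$1"
  shift
  mkdir -p "$LOG_DIR"
  printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$LOG_DIR/$state.log"
}

# Print how many state logs contain a line ending in the DONE marker.
job_progress() {
  local total=0 done_count=0 f
  for f in "$LOG_DIR"/*.log; do
    [ -e "$f" ] || continue
    total=$((total + 1))
    if grep -q ' DONE$' "$f"; then
      done_count=$((done_count + 1))
    fi
  done
  echo "$done_count/$total states complete"
}
```

The job would call log_state TX "DONE" when it finishes Texas, and anyone checking in can run job_progress to see how far along the run is without attaching to the process.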

Email is a simple but effective way to notify teams of job completion. It's straightforward to add code that pushes an email notification to the individual or team overseeing the job when it's finished. The team can then proceed to deallocate the VM.
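As a rough sketch, the notification step could be a small shell function called at the end of the job. It assumes the VM has a working mail command (mailx) and a configured mail relay, which is not a given on a fresh Azure image. The function names are hypothetical, and the subject uses the "Completed: hostname" form that the PowerShell fragment later in this article searches for.

```shell
# Build the subject line; the name defaults to this machine's hostname.
completion_subject() {
  echo "Completed: ${1:-$(hostname)}"
}

# Send the notification to the address given as $1.
# Assumes `mail` is installed and can relay messages from this VM.
notify_completion() {
  local recipient="$1"
  echo "Job finished at $(date -u)" | mail -s "$(completion_subject)" "$recipient"
}
```

If the guest hostname differs from the Azure VM name, pass the VM name to completion_subject explicitly so that any automation reading the email targets the right machine.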

How to deallocate a VM

It's possible to deallocate a VM programmatically. For example, teams can use the following Azure CLI command to achieve deallocation:

az vm deallocate -n VM -g ResourceGroup --verbose
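After issuing the command, it can be worth confirming that the VM really reached the deallocated state rather than just the stopped state. The sketch below wraps the same az vm deallocate call and then reads the power state with az vm show --show-details. The function names are hypothetical, the script assumes a logged-in Azure CLI, and the billing check is deliberately conservative: it treats anything other than "VM deallocated" as possibly still billed.

```shell
# Conservative mapping from a power state string to possible compute billing.
compute_may_be_billed() {
  case "$1" in
    "VM deallocated") echo "no" ;;
    *) echo "yes" ;;
  esac
}

# Read the power state of VM $1 in resource group $2.
vm_power_state() {
  az vm show --show-details --name "$1" --resource-group "$2" \
    --query powerState --output tsv
}

# Deallocate the VM, then report the state Azure returns afterward.
deallocate_and_verify() {
  local state
  az vm deallocate --name "$1" --resource-group "$2" || return 1
  state="$(vm_power_state "$1" "$2")"
  echo "$1: $state (compute may be billed: $(compute_may_be_billed "$state"))"
}
```

A call such as deallocate_and_verify VM ResourceGroup uses the same placeholder names as the command above.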

Teams can also program a job to intercept notification emails about a completed job and automatically deallocate the VM. Below is an example PowerShell code fragment -- not the entire script -- that searches for an email from the job and deallocates the named VM. The fragment expects the mailbox folder in $inbox and the resource group name in $resourceGroupName, both of which must be set earlier in the script.

Here is the code fragment:

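# Assumes $inbox and $processedFolder are Outlook folder objects and that
# $resourceGroupName is set earlier in the script (not shown in this fragment).

# Iterate over a copy of the inbox items, because moving an item changes the collection
foreach ($email in @($inbox.Items)) {
    # Check if the subject is "Completed:" followed by a hostname
    if ($email.Subject -match '^Completed:\s*(\S+)$') {
        # Extract the hostname from the subject
        $hostname = $Matches[1]

        # Deallocate the Azure VM with the extracted hostname.
        # Stop-AzVM (Az.Compute module) deallocates unless -StayProvisioned is used.
        Stop-AzVM -ResourceGroupName $resourceGroupName -Name $hostname -Force

        # Mark the email as complete, then move it out of the inbox
        $email.FlagStatus = 1
        $email.Save()
        $email.Move($processedFolder) | Out-Null
    }
}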

Don't use this code fragment anywhere near production without testing. An inadvertent, unscheduled shutdown of production systems could cause major issues. Also, a dedicated email inbox can streamline the process and keep the code from rummaging through a shared inbox of nonpublic items. Don't forget to include any dependencies, such as the Azure PowerShell module.

Stuart Burns is a virtualization expert at a Fortune 500 company. He specializes in VMware and system integration with additional expertise in disaster recovery and systems management. Burns received vExpert status in 2015.
