
Improve hybrid cloud monitoring through automation, alerts

To effectively monitor hybrid cloud infrastructure -- without being overloaded with data and alerts -- IT teams need to rethink some of their existing processes. Use these five best practices to get started.

George Lawton

Published: 17 Sep 2018

As applications and data increasingly span private and public clouds, enterprises face a host of new challenges -- and monitoring is one of them.

IT teams need to build a robust hybrid cloud monitoring strategy that ensures solid application performance, high availability and low costs across different infrastructures. In addition, they require a monitoring tool that can aggregate data about different infrastructure components, such as compute, storage and network. Automation should play a key role in any hybrid cloud monitoring strategy, as it improves data collection, reduces both false positives and false negatives, and helps scale infrastructure dynamically.

Here's a look at some tools and best practices to monitor a hybrid cloud deployment.

Standardize back-end monitoring

Every public cloud platform and private cloud tool generates different kinds of data, and application performance monitoring, logging and tracing tools add still more. As a result, enterprises often store multiple data sets in separate monitoring tools, which makes it difficult to make effective decisions, identify problems and automatically scale cloud infrastructure.

In addition, different roles in an organization care about different kinds of information. Application developers, for example, are more interested in debugging code, while operations teams want to know how to respond to incidents. To address this, some organizations build separate applications for these tasks that integrate directly with the various monitoring tools -- but this only increases complexity.

A good practice is to aggregate a useful subset of monitoring data from the various cloud platforms in use into a single monitoring tier. This enables you to deliver the most appropriate alerts or reporting data to the right people, through the most appropriate tool and without building new integrations.
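As a rough illustration of a single monitoring tier, the Python sketch below maps samples from several platforms onto one shared schema and keeps only a curated subset of metrics. The raw record layout (`metric`, `resource`, `value`, `ts`), the `Metric` class and the `NAME_MAP` table are hypothetical; real platforms return data in their own formats, so the sketch assumes a collector has already fetched and flattened each sample into a dict.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass(frozen=True)
class Metric:
    """One sample in the shared schema used by the single monitoring tier."""
    source: str      # platform that produced the sample, e.g. "aws"
    resource: str    # host, VM or service identifier
    name: str        # normalized metric name
    value: float
    timestamp: int   # Unix epoch seconds

# Illustrative mapping from each platform's metric names to one vocabulary.
# Anything not listed here is outside the curated subset and is dropped.
NAME_MAP: Dict[str, Dict[str, str]] = {
    "aws": {"CPUUtilization": "cpu_percent"},
    "azure": {"Percentage CPU": "cpu_percent"},
    "onprem": {"cpu.usage": "cpu_percent"},
}

def normalize(source: str, raw: Dict[str, Any]) -> Optional[Metric]:
    """Translate one raw sample into the shared schema, or return None to drop it."""
    name = NAME_MAP.get(source, {}).get(raw["metric"])
    if name is None:
        return None
    return Metric(source, raw["resource"], name, float(raw["value"]), int(raw["ts"]))

def aggregate(batches: Dict[str, List[Dict[str, Any]]]) -> List[Metric]:
    """Merge samples from every platform into one time-ordered list."""
    merged: List[Metric] = []
    for source, samples in batches.items():
        for raw in samples:
            metric = normalize(source, raw)
            if metric is not None:
                merged.append(metric)
    return sorted(merged, key=lambda m: m.timestamp)
```

Because every retained sample shares the same `name` vocabulary, a single dashboard or alert rule on `cpu_percent` can cover all three environments.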

Identify important KPIs

Enterprises need to identify and define key performance indicators (KPIs) to measure success. But there is also a risk of KPI fatigue -- where users are presented with so many different KPIs that each one loses its importance.

A hybrid cloud monitoring infrastructure generates an enormous amount of useful data, so enterprises need to focus not only on which metrics they should monitor, but on how those metrics relate to specific organizational or departmental goals. Relevant metrics could include page load times, cloud cost optimization or conversion rates.

Evaluate where your company encounters the most challenges, and then focus on the KPIs that track them. Continuously reevaluate KPIs to determine whether others should take precedence.
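One way to act on this advice is to compare each KPI with its target and surface only the few that miss by the widest margin. The sketch below assumes each KPI has a nonzero numeric target and a known direction (higher or lower is better); the `Kpi` class, `focus_list` function, KPI names and figures are hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Kpi:
    name: str
    current: float
    target: float            # assumed nonzero
    higher_is_better: bool

    def shortfall(self) -> float:
        """Relative miss against the target; 0.0 means the target is met."""
        if self.higher_is_better:
            gap = (self.target - self.current) / self.target
        else:
            gap = (self.current - self.target) / self.target
        return max(gap, 0.0)

def focus_list(kpis: List[Kpi], limit: int = 3) -> List[Kpi]:
    """Return the KPIs furthest from target, worst first.

    Capping the list at `limit` keeps attention on a few KPIs and
    guards against KPI fatigue. Rerun it on fresh values to reevaluate.
    """
    missing = [k for k in kpis if k.shortfall() > 0]
    return sorted(missing, key=lambda k: k.shortfall(), reverse=True)[:limit]
```

For example, with a p95 page load time of 3.2 seconds against a 2-second target (a 60% miss), a conversion rate of 2.5% against 3% (about a 17% miss) and monthly cloud spend of $52,000 against $50,000 (a 4% miss), `focus_list(kpis, limit=2)` returns page load time and conversion rate.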

Establish a useful alert threshold

Hybrid cloud monitoring can also generate a large number of alerts. If the alert threshold is too sensitive, it produces many false positives, and engineers can experience alert fatigue and overlook pressing problems. However, if the threshold is not sensitive enough, it produces false negatives, and engineers will not receive notifications in time to act on an important issue.
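To see how the choice of threshold trades false positives against false negatives, a team could replay historical metric values against intervals already known to contain a real incident. The sketch below assumes a simple static rule (alert when the value exceeds the threshold) and hand-labeled incident data; the function names and the sample data are hypothetical simplifications for illustration.

```python
from typing import Dict, Iterable, Sequence, Tuple

def score_threshold(samples: Sequence[float], incidents: Sequence[bool],
                    threshold: float) -> Tuple[int, int]:
    """Count (false positives, false negatives) for one static threshold.

    samples:   metric values, one per interval
    incidents: True where that interval had a real problem
    An alert fires when the value is strictly above the threshold.
    """
    fp = fn = 0
    for value, real in zip(samples, incidents):
        fired = value > threshold
        if fired and not real:
            fp += 1          # noise: alert with no real problem
        elif real and not fired:
            fn += 1          # miss: real problem with no alert
    return fp, fn

def sweep(samples: Sequence[float], incidents: Sequence[bool],
          candidates: Iterable[float]) -> Dict[float, Tuple[int, int]]:
    """Score several candidate thresholds against the same history."""
    return {t: score_threshold(samples, incidents, t) for t in candidates}
```

With CPU readings `[55, 62, 91, 70, 95, 88, 60]` and real incidents in the third and fifth intervals, a threshold of 60 is too sensitive (three false positives), 96 is not sensitive enough (both incidents missed), and 90 catches both incidents with no false alarms.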

To avoid this, automate the scoring of alerts and their timely delivery. For example, some hybrid cloud monitoring tools now use AI to identify and tune alert thresholds. The combination of AI and automation can correlate multiple related alerts and highlight the issue that engineers need to address.
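Commercial tools' AI models are beyond a short example, so the sketch below swaps in two much simpler techniques to show the idea: a statistical baseline that derives a threshold from recent history (mean plus `k` standard deviations), and time-window grouping that folds alerts arriving close together into one candidate incident. The `k` value, window size and `(timestamp, source, message)` alert layout are assumptions, not any vendor's method.

```python
import statistics
from typing import List, Sequence, Tuple

Alert = Tuple[int, str, str]  # (epoch seconds, source, message); assumed layout

def dynamic_threshold(history: Sequence[float], k: float = 3.0) -> float:
    """Threshold = mean + k * population standard deviation of recent values."""
    return statistics.fmean(history) + k * statistics.pstdev(history)

def correlate(alerts: List[Alert], window: int = 60) -> List[List[Alert]]:
    """Group alerts that arrive within `window` seconds of the previous alert in the group."""
    groups: List[List[Alert]] = []
    for alert in sorted(alerts):  # tuples sort by timestamp first
        if groups and alert[0] - groups[-1][-1][0] <= window:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return groups
```

Given alerts at 100, 130, 150 and 400 seconds, `correlate` returns two groups instead of four separate alerts. The earliest alert in each group is a reasonable place to start investigating, though it is not guaranteed to be the root cause.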

Connect monitoring and management

As IT teams advance their hybrid cloud monitoring practices, they may find themselves addressing the same problems repeatedly. It is time-consuming to identify the root cause of a problem and then fix it, especially when you do it over and over again.

Integrate your monitoring system with collaboration and incident management tools, such as Slack or PagerDuty. This enables you to automatically capture data about how a team responds to a particular type of problem. If the same issue resurfaces, review prior communications about the alert, and quickly apply the same fix.
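A minimal sketch of that idea: derive a stable fingerprint for each type of problem and file the team's resolution notes under it, so the next alert with the same fingerprint can surface them. Pulling messages out of Slack or PagerDuty is not shown; the notes here are plain strings standing in for that exported data, and the `fingerprint` fields and `ResolutionLog` class are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, List

def fingerprint(source: str, metric: str, resource_type: str) -> str:
    """Stable key for a type of problem; ignores the specific host and timestamp."""
    raw = f"{source}|{metric}|{resource_type}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:12]

class ResolutionLog:
    """In-memory store of how the team resolved each type of alert."""

    def __init__(self) -> None:
        self._fixes: DefaultDict[str, List[str]] = defaultdict(list)

    def record(self, key: str, note: str) -> None:
        """Attach a resolution note to a problem fingerprint."""
        self._fixes[key].append(note)

    def prior_fixes(self, key: str) -> List[str]:
        """Return earlier notes for this fingerprint (empty if none)."""
        # .get avoids creating an empty entry for unseen keys
        return list(self._fixes.get(key, []))
```

When a new alert arrives, the monitoring tier computes its fingerprint and attaches `prior_fixes(key)` to the notification, so responders see what worked last time.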

Look for ways to integrate your monitoring tool directly with your cloud management tool. This will enable alerts to automatically kick off tasks, like service restarts, resource scaling or rollbacks of a new deployment.
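A simple way to wire alerts to actions is a lookup table from alert type to remediation task, with a fallback that pages a person when no automation exists. In the sketch below, the alert types, `PLAYBOOK` table and action functions are hypothetical placeholders that only return a description; a real version would call the cloud management tool's API at those points.

```python
from typing import Callable, Dict

AlertDict = Dict[str, str]

# Placeholder actions: each returns a description instead of calling a real API.
def restart_service(alert: AlertDict) -> str:
    return f"restart {alert['resource']}"

def scale_out(alert: AlertDict) -> str:
    return f"scale out {alert['resource']} by 1 instance"

def roll_back(alert: AlertDict) -> str:
    return f"roll back {alert['resource']} to previous release"

# Map each alert type to the task it should kick off.
PLAYBOOK: Dict[str, Callable[[AlertDict], str]] = {
    "service_unresponsive": restart_service,
    "cpu_saturation": scale_out,
    "error_rate_after_deploy": roll_back,
}

def handle(alert: AlertDict) -> str:
    """Run the mapped remediation, or escalate to on-call when none is defined."""
    action = PLAYBOOK.get(alert["type"])
    if action is None:
        return f"page on-call: {alert['type']} on {alert['resource']}"
    return action(alert)
```

Keeping the fallback explicit matters: alert types without a tested remediation still reach a human rather than being silently ignored.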

Explore vendor tool options

Many enterprises turn to their primary public cloud provider's native monitoring tools. AWS, Microsoft and Google all offer them, though not all have native support for hybrid infrastructures.

Amazon CloudWatch monitors applications, resources and CPU usage on the AWS platform. While it does not directly support hybrid monitoring, there are third-party tools -- such as those from BMC, CA, Datadog, Dynatrace, New Relic and Stackify -- that can aggregate CloudWatch data into their alerts and reports.

Azure Monitor provides an end-to-end view of all private and public cloud resources that run on Windows and Linux servers, as well as VMs. The tool captures, analyzes and produces alerts. It is best for companies that require unified monitoring between private infrastructure and Azure. Azure Monitor also supports tight integration into Microsoft Operations Management Suite Automation to dynamically scale private and public cloud infrastructure.

Google Stackdriver offers monitoring, logging and metadata for private infrastructure, as well as for services that run on Google and AWS clouds, to support availability and performance optimization. It integrates with Google's analytics tools, like BigQuery and Cloud Datalab. Additionally, Stackdriver can send automated alerts to apps such as PagerDuty and Slack.
