Observability tools add FinOps amid macroeconomic worries

FinOps features added to observability tools from Datadog and Sysdig this week reflect concerns about cloud cost management amid gloomy economic forecasts.

Enterprise IT teams concerned with cloud cost management have new FinOps options built into observability tools as forecasts about next year's global economy grow pessimistic.

Inflation, conflicts such as Russia's war in Ukraine and the ongoing COVID-19 pandemic will slow economic growth worldwide from 6.0% in 2021 to 3.2% in 2022 and 2.7% in 2023, according to a report this month by the International Monetary Fund.

In response to these macroeconomic concerns, the enterprise rush to cloud computing over the last two years, spurred by the pandemic and fueled by an "open checkbook" for IT, has now started to give way to pressure to reduce IT spending, according to a report issued this week by Andy Thurai, vice president and principal analyst at Constellation Research.

"The onset of the COVID-19 pandemic forced businesses to move online faster than they desired, rather than evolving or improving their progression to digital maturity," Thurai's report states. "Although some of those digital transformation projects have lived up to expectations, most organizations -- especially the organizations that have not matured their digital operations -- are one major incident away from bankrupting themselves."

A haphazard approach to digital transformation led to tool sprawl that threatens the reliability of enterprise services, Thurai's report asserts. Observability, in which all types of data from all systems in an organization are pooled in one repository, will be important to reduce this tool sprawl and improve enterprise IT reliability going forward, according to the report.

"Observability is about knowing at all times what is happening with your systems and whether anything is going to break down in the near future," Thurai's report states. "The widespread practice of bolting on monitoring and management after applications are deployed is no longer sufficient."

Observability vendors play musical chairs as users seek consolidation

To capitalize on these trends, observability vendors, many of which began as application performance monitoring (APM) specialists, have been steadily broadening the kinds of data they collect. Tools from vendors such as Dynatrace and New Relic now include logs and traces in addition to metrics and events. These vendors have also expanded data analysis to include application reliability and security in addition to performance, in a bid to be their customers' choice for tool consolidation.

This week, as scrutiny of IT costs continued to intensify, two observability vendors, Datadog and Sysdig, also added FinOps support. FinOps, a more recent term for IT cost management, is a blend of financial and IT operations.

One Datadog customer who presented this week during the vendor's annual Dash conference said the vendor's new Cloud Cost Management FinOps add-on is timely.

"Given the current macroeconomic climate, there has been a renewed focus on Opex, and as a result, more scrutiny around cloud costs," said Martin Amps, principal engineer at online clothing subscription retailer Stitch Fix, during a conference breakout session.

Stitch Fix had beta access to Cloud Cost Management, which it added to its software engineer observability dashboards. Amps showed an example of such a dashboard during the Dash conference keynote, which included data on Amazon Relational Database Service (RDS) spending.

"Our plan works subtly, but successfully; the service owner was guilt tripped into optimizing their service," he said. "Before, they were spending $430 per day on RDS but only utilizing a fraction of the clusters capacity. Using this additional insight, they were able to identify that they had to resize their cluster utilization, achieving savings of 78%."

'Real-time utilization data changes the game'

In its first version, which became generally available this week, Datadog's Cloud Cost Management is limited to analyzing AWS services on virtual machines, although support for more cloud providers and Kubernetes environments is planned. Cloud Cost Management costs $7.50 per host per month for Datadog users.

Sysdig, which has roots in container monitoring and security, will take a different tack in its first release of a free Cost Advisor feature this week. Cost Advisor focuses only on resource utilization within cloud Kubernetes environments, but it supports AWS, Azure and Google Cloud Platform.

Both Datadog's and Sysdig's FinOps tools issue alerts about cost spikes and rank the services that cost the most within customer organizations. Datadog's service highlights changes in daily, weekly and monthly spending, while Sysdig Cost Advisor puts real-time and historical cost data into a time series database for long-term analysis.
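
As a rough illustration of those two behaviors, ranking services by spend and alerting on cost spikes, the following Python sketch uses made-up service names and daily costs. The trailing-average baseline and the 25% threshold are assumptions chosen for the example; the article does not describe how either vendor actually detects spikes.

```python
from statistics import mean

# Hypothetical daily costs in USD per service, oldest day first.
daily_costs = {
    "checkout-db": [410.0, 425.0, 430.0, 435.0, 428.0, 640.0],
    "search-api": [120.0, 118.0, 125.0, 122.0, 119.0, 121.0],
    "image-cache": [60.0, 62.0, 58.0, 61.0, 59.0, 63.0],
}


def rank_by_total_cost(costs: dict[str, list[float]]) -> list[str]:
    """Return service names ordered from most to least expensive."""
    return sorted(costs, key=lambda name: sum(costs[name]), reverse=True)


def find_spikes(costs: dict[str, list[float]], threshold: float = 1.25) -> list[str]:
    """Flag services whose latest day exceeds threshold x the mean of prior days."""
    spikes = []
    for name, series in costs.items():
        baseline = mean(series[:-1])
        latest = series[-1]
        if baseline > 0 and latest > threshold * baseline:
            spikes.append(name)
    return spikes


print("Ranked by total cost:", rank_by_total_cost(daily_costs))
print("Cost spikes:", find_spikes(daily_costs))
```

A fixed multiplier over a short trailing average is the simplest possible rule; a production tool would likely need to account for weekly seasonality and planned changes in usage.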

These tools join other relatively recent FinOps additions, including open source Cloud Custodian and Apptio, which partners with ServiceNow. CI/CD tools from Harness.io can also embed FinOps data into developer workflows.

Cloud cost management tools aren't a new concept, either, said Gregg Siegfried, an analyst at Gartner, citing past examples such as CloudHealth, now VMware Aria Cost, and Turbonomic, among others.

What's new is the widespread use of public cloud providers, which have standard billing rates and from which observability vendors can collect up-to-the-minute utilization data, Siegfried said. By contrast, cloud provider tools such as AWS Cost Explorer have a latency of about 24 hours between resource usage and cost data delivery.

This quick access to cost data can be used to predict and avoid cost overruns, the same way observability data can help to predict and avoid performance or availability failures, according to Siegfried.
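
One simple way to picture that kind of prediction is a run-rate projection: extrapolate spending so far in the month to the end of the month and compare it with a budget. The Python sketch below is only an illustration under that assumption; the function names, dollar amounts and budget are hypothetical and do not come from any vendor's product.

```python
def projected_month_end_spend(spend_to_date: float, days_elapsed: int,
                              days_in_month: int) -> float:
    """Extrapolate month-to-date spend linearly to the end of the month."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    daily_run_rate = spend_to_date / days_elapsed
    return daily_run_rate * days_in_month


def will_overrun(spend_to_date: float, days_elapsed: int,
                 days_in_month: int, budget: float) -> bool:
    """Return True if the linear projection exceeds the monthly budget."""
    projection = projected_month_end_spend(spend_to_date, days_elapsed, days_in_month)
    return projection > budget


# Hypothetical example: $6,000 spent after 10 days of a 30-day month, $15,000 budget.
projection = projected_month_end_spend(6000.0, 10, 30)
print(f"Projected month-end spend: ${projection:,.2f}")
print("Over budget:", will_overrun(6000.0, 10, 30, 15000.0))
```

The fresher the utilization data, the earlier in the month a projection like this can be recomputed and acted on.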

"Suddenly, everybody's concerned about costs in the macroeconomic climate, and now we have that in combination with the data being available to us in relatively real time," he said. "Real-time utilization data [delivered] in a way that you can make workload placement decisions based on it changes the game and makes the whole thing much more interesting."

Observability itself requires cost management

Reducing the number of tools and their associated licensing fees can have an obvious impact on IT spending. In addition, the kind of predictive analysis observability tools can do on large amounts of wide-ranging data can save organizations from dealing with costly cascading incidents in production.

But as distributed systems grow larger and spread out further to encompass edge computing locations for enterprises, collecting more data for observability tools can itself lead to cost overruns without careful planning.

That was the case for Yum! Brands, parent company for restaurant chains including Taco Bell, KFC and Pizza Hut, which replaced multiple IT monitoring tools for its e-commerce sites with Datadog in late 2021 and then sought to extend that consolidated observability to 53,000 restaurant locations.

The sheer volume of data collection across that many locations could have generated an estimated 14 billion lines of log data per week if Yum! Brands had imported all of the logs from all of the locations, said Sivaram Adhiappan, director of site reliability engineering at Yum!, in a Dash conference presentation. It never got to that scale, but it still ingested enough data to crash its edge observability system early in its rollout.

"Things started getting out of hand a little bit, the logs were too much, the cost was escalating, and then one of our Datadog instances turned off logs in production," as restaurant locations came online, Adhiappan said. "We started diving into log-based metrics and rebuilt our dashboards based on metrics ... and when issues do pop up, we can now rehydrate a narrow slice of logs to find out what's going on."

This and the addition of Datadog's Live Tail log parser and exclusion filters cut out an estimated 30% to 40% of log data ingestion costs for the restaurant rollout, Adhiappan said.
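
The general idea behind exclusion filters combined with log-based metrics can be sketched in a few lines of Python. This is not Datadog's API or configuration syntax; the function name, pattern list and sample log lines are hypothetical. The sketch drops noisy lines before they would be stored, while still counting them so the signal survives as a metric.

```python
import re
from collections import Counter

# Hypothetical exclusion list: message text considered too noisy to store.
EXCLUDED_PATTERNS = ["printer out of paper"]


def filter_and_count(lines, patterns=EXCLUDED_PATTERNS):
    """Drop lines matching an excluded pattern, counting each drop as a metric."""
    compiled = [(p, re.compile(re.escape(p), re.IGNORECASE)) for p in patterns]
    kept = []
    dropped_counts = Counter()
    for line in lines:
        for name, regex in compiled:
            if regex.search(line):
                dropped_counts[name] += 1  # keep the count, discard the line
                break
        else:
            kept.append(line)  # no exclusion matched, so the line is retained
    return kept, dropped_counts


# Made-up log lines for illustration only.
sample_logs = [
    "store=0012 level=WARN msg=Printer out of paper",
    "store=0012 level=ERROR msg=Payment terminal timeout",
    "store=0457 level=WARN msg=printer out of paper",
    "store=0931 level=INFO msg=Order 1881 completed",
    "store=0457 level=WARN msg=Printer out of paper",
]

kept_logs, dropped = filter_and_count(sample_logs)
print(f"Kept {len(kept_logs)} of {len(sample_logs)} lines")
print("Dropped counts:", dict(dropped))
```

Counting before discarding is the design choice that matters here: dashboards can still show how often a condition occurs without paying to ingest every matching line.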

"If you look at the log volume coming through and the top five or 10 [log types], 'printer out of paper' kept popping up all the time for us," Adhiappan said. "We were able to start making a little more progress there with exclusion filters."

Beth Pariseau, senior news writer at TechTarget, is an award-winning veteran of IT journalism. She can be reached on Twitter @PariseauTT.
