Charity Majors on AI observability and the future of SRE
The observability maven reflects on 'Observability 2.0' and urges site reliability engineers to embrace AI agents, despite their perceived threat -- before it's too late.
The CTO who pioneered the term observability now faces an inflection point as generative AI goes mainstream and AI agents loom.
Charity Majors co-founded observability software maker Honeycomb in 2016, drawing on her experience building and managing distributed systems at Parse, Facebook and Linden Lab, the maker of Second Life. She was among the first to apply the term observability to an IT management process, borrowing a concept from control theory, a field at the intersection of applied mathematics and engineering. In both cases, observability is a measure of the ability to understand internal system states by observing outputs.
In the software realm, the popularization of observability led to an emphasis on collecting a comprehensive range of data types from IT systems, including metrics, logs and traces. These came to be known as the "pillars" of observability. After collection, observability tools flexibly query that pool of data in response to issues or to evaluate system performance.
But it's time for that "mental model" to evolve, Majors says in this episode of IT Ops Query with Informa TechTarget senior news writer Beth Pariseau.
Observability evolution by any name
Majors initially described a vision for the next evolution of observability as Observability 2.0, but now avoids that term.
"Some feathers have definitely gotten ruffled at calling, let's say, the previous mental model 1.0, so ... instead of calling it 1.0, I'll just call that the multiple pillars world," she said. "If you look at all of the observability startups that have been founded since 2021, the ones that still survive ... have been built around a very different mental model, which I've been calling 2.0, but you could also call it consolidated storage or unified storage."
In this newer observability model, instead of storing data in different formats, tools store events, which Majors described as "arbitrarily wide structured data blobs with lots of context," in a unified repository, often a columnar database. Centralized data storage and a unified format can cut costs, reduce context switching and make it easier to correlate data points to understand application behavior.
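To make the idea of wide events concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Honeycomb's schema or query engine: the field names (`build_id`, `llm_model`, `prompt_tokens`) and the helper functions `mean_by` and `error_rate_by` are hypothetical, and a plain in-memory list stands in for a columnar database. It shows how metric-style answers could be derived at query time from one pool of context-rich events rather than from separately stored data types.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical wide events: one record per request, carrying whatever context
# was available. Field names are illustrative only, not a real vendor schema.
EVENTS = [
    {"service": "checkout", "build_id": "b42", "duration_ms": 100, "status": 200,
     "user_id": "u1", "llm_model": "model-a", "prompt_tokens": 512},
    {"service": "checkout", "build_id": "b42", "duration_ms": 300, "status": 500,
     "user_id": "u2", "llm_model": "model-a", "prompt_tokens": 2048},
    # An older build with no LLM fields; events need not share every field.
    {"service": "checkout", "build_id": "b41", "duration_ms": 50, "status": 200,
     "user_id": "u3"},
]


def mean_by(events, key, field):
    """Average a numeric field per distinct value of key.

    Events that lack either the key or the field are skipped.
    """
    buckets = defaultdict(list)
    for event in events:
        if key in event and field in event:
            buckets[event[key]].append(event[field])
    return {value: mean(samples) for value, samples in buckets.items()}


def error_rate_by(events, key):
    """Fraction of events with a 5xx status per distinct value of key."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for event in events:
        if key not in event:
            continue
        totals[event[key]] += 1
        if event.get("status", 0) >= 500:
            errors[event[key]] += 1
    return {value: errors[value] / totals[value] for value in totals}


if __name__ == "__main__":
    # The same events answer latency, error and AI-usage questions.
    print(mean_by(EVENTS, "build_id", "duration_ms"))
    print(error_rate_by(EVENTS, "build_id"))
    print(mean_by(EVENTS, "llm_model", "prompt_tokens"))
```

Because every event keeps its full context, the same records can be grouped by build, user or model without deciding in advance which dimensions matter, which is roughly the correlation benefit described above.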
What does all this have to do with AI observability? According to Majors, generative AI applications and emerging agentic AI applications make fundamental principles of observability "more true than ever" -- including the need to test and debug applications in production.
"You can't understand [an AI] model in a vacuum, [and] it's complexity that is driving a lot of this," she said. "[With] the surge of generative AI, complexity was increasing fast to begin with. It's now been cranked up even faster."
As a result, Majors said, "the wheels are coming off" the original observability model of data pillars.
"Any attempt to clearly delineate or separate out generative AI observability from software observability is just kind of doomed," she said. "In reality, data is made valuable by context. The more context you can pack into something, the more valuable every bit of that data is. And so I think that generative AI and the rise of AI in general is going to accelerate people's paths to this."
Agentic AI advice for SREs
AI observability is one thing, but agentic AI represents a disruptive leap forward, especially as it accelerates the automated generation of "disposable software," Majors said. In that world, which is at least half a decade away by her estimate, not every company will necessarily need to deeply understand the software it runs.
"The closest analog for me is when the camera was invented, because most of what painters used to do is paint little miniatures of people so that people could carry around their loved ones' faces," she said. "And then the camera came out, and all of a sudden [there] was this crisis in the art world about, 'What are our skills good for?'"
She said a more recent analogy was the refusal of many visual artists and graphic designers to adopt generative AI for images three years ago. Majors also compared the current position of site reliability engineers (SREs) with that of QA engineers a decade and a half ago. In both cases, skilled professionals refused to adapt to new technology and were left behind, she said.
"If we in the SRE community don't acknowledge that, then we're going to get left out," Majors said. "I feel very strongly that you don't help solve collective action problems by opting out of them, and that is my worry that SREs will do."
Beth Pariseau, senior news writer for Informa TechTarget, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.