Cribl Search marks fresh observability sortie for upstart
The Splunk nemesis begins new forays onto the turf of incumbent vendors with federated search that doesn't require data migration or indexing -- and big roadmap plans.
If Cribl Search meets its roadmap goals, it could challenge established observability players with queries that don't require a central data repository and related infrastructure costs -- and its potentially disruptive product plans don't end there, according to its CEO.
Cribl Search and a related data processing agent, Cribl Edge, became generally available in 2022, alongside a renamed data pipeline product, Cribl Stream, formerly LogStream. The Cribl Search SaaS tool connects to Cribl Edge agents to index and search observability data locally. At least theoretically, Cribl's in-place search eliminates the need for users to send data to a separate system for conversion into another data format, indexing and long-term retention -- all of which incur licensing or storage costs, or both, depending on the vendor. Instead, Cribl Edge indexes data at its point of origin without converting it into a proprietary format, and performs a federated search in response to queries.
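The fan-out pattern behind a federated search can be illustrated with a minimal Python sketch. It is not Cribl's implementation or API: the `EdgeNode` class, the `federated_search` function and the sample log lines are all hypothetical, and a plain substring scan stands in for whatever local indexing a real agent performs. The point is only that each node scans its own data and returns just the matches, so no central copy of the logs is needed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EdgeNode:
    """Hypothetical stand-in for an agent holding logs where they originate."""

    name: str
    logs: list[str] = field(default_factory=list)

    def search(self, term: str) -> list[tuple[str, str]]:
        # The scan runs on the node itself; only matching lines leave it.
        return [(self.name, line) for line in self.logs if term in line]


def federated_search(nodes: list[EdgeNode], term: str) -> list[tuple[str, str]]:
    """Send the same query to every node and merge the partial results."""
    results: list[tuple[str, str]] = []
    for node in nodes:
        results.extend(node.search(term))
    return results


# Illustrative data only; these hosts and log lines are made up.
nodes = [
    EdgeNode("web-1", ["GET /cart 200", "GET /pay 500"]),
    EdgeNode("web-2", ["GET /home 200", "POST /pay 500"]),
]
hits = federated_search(nodes, "500")
```

In this sketch the coordinator never stores the raw logs. It holds only the merged matches, which is the property the in-place approach depends on.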
With these updates, Cribl began a move away from a business model that ran afoul of log monitoring bellwether Splunk, but it might start to step on more competitive toes if it can lure customers away from search services offered by Elastic Inc., AWS and others.
"Cribl Search works with the data wherever that data is," said Clint Sharp, CEO and co-founder of Cribl. "We are giving people the ability to continue to own their data. … When I put data into Splunk, or Snowflake or Datadog, that data becomes the vendor's data … that gives you a great experience retrieving this data, but the downside of that is that I have to maintain a relationship with that vendor in order to get the data back."
Early-stage Cribl Search presents tradeoffs
Cribl Search is still a version 1.0 product, Sharp said. It supports only log data so far, despite Cribl Stream's expansion to support metrics, events and traces in 2019 and 2020. Cribl Search also has yet to offer fine-grained monitoring and alerting based on its search results; Sharp said Cribl will build in both soon.
That roadmap has one customer thinking about replacing its OpenSearch deployment with Cribl Search in the long term. OpenSearch requires data to be copied into search cluster nodes; Amazon OpenSearch Service offers lower-cost storage tiers but also requires separate copies of the original application's data.
"Cribl Search could eliminate the need for us to host our own OpenSearch and have to store S3 data twice," said Bob Chen, director of infrastructure engineering at iHerb, an online retailer for health and wellness products in Irvine, Calif. "We need a few more feature parity items in order for it to match up to … OpenSearch [such as] threshold alerting and dashboards."
In the meantime, Cribl Stream, Edge and the Cribl.Cloud managed service have already made an impact on iHerb's observability costs, Chen said. Cribl Stream prevents insignificant data from being sent to OpenSearch and speeds up the restoration of historical data from cold storage in S3 when needed. This reduced the higher-tier storage and compute resources iHerb has to maintain for OpenSearch by 25%, at a cost savings of tens of thousands of dollars per month. Cribl.Cloud SaaS offloaded upgrades and maintenance of Cribl Stream and Edge agents from iHerb's five-person SRE team, which supports about 300 developers.
"We went from three [search] clusters down to one," Chen said. "We also reduced the number of support tickets SREs got about logs missing or to triage a backlog to almost nothing."
Since S3 buckets can't host the Cribl Edge agent, Cribl Search triggers short-lived AWS Lambda functions to run it for S3 data searches. That approach could present cost tradeoffs for large-scale S3 users, in the form of network egress charges incurred when the Cribl-hosted Lambda function accesses data inside the user's AWS account.
"In your model, I'm paying to push the data to you so you can process it," said Carl Fugate, director and cloud technology adviser at electronic health records software maker Netsmart in Overland Park, Kan., during a recorded presentation and Q&A session with Cribl reps at a Tech Field Day event in November. Fugate asked if Cribl had plans to allow customers to host Lambda functions and Cribl Edge agents themselves to avoid those charges.
"We could certainly do that, but it would require [giving us] permissions to access compute resources living in your account," said Oliver Draese, senior principal software engineer at Cribl, during the Q&A session. "You would still have to account for some network egress, because the Lambdas are producing some filter data … to put into the cloud environment where we run the UI, post-processing, keep the query history and so on."
Cribl Search supports cloud region-aware search to minimize egress costs, according to a company spokesperson. It does not yet search data within Amazon Glacier cold storage instances.
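The egress tradeoff discussed above comes down to how many bytes cross a network boundary. The following Python sketch uses assumptions throughout: the `filter_in_region` function and the sample records are hypothetical, and no real pricing or Cribl behavior is modeled. It only shows that when filtering runs next to the data, the bytes leaving the region are limited to the matches rather than everything scanned.

```python
from __future__ import annotations


def filter_in_region(
    records: list[bytes], needle: bytes
) -> tuple[list[bytes], int, int]:
    """Filter records where they live and report scanned versus returned bytes."""
    matches = [r for r in records if needle in r]
    scanned_bytes = sum(len(r) for r in records)  # read locally, no egress
    egress_bytes = sum(len(r) for r in matches)   # only matches leave the region
    return matches, scanned_bytes, egress_bytes


# Illustrative records only.
records = [b"ok login", b"error disk full", b"ok logout"]
matches, scanned_bytes, egress_bytes = filter_in_region(records, b"error")
```

Here 32 bytes are scanned but only 15 would be sent onward. How that ratio translates into charges depends on the cloud provider's pricing and where each component runs, which this sketch does not attempt to capture.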
Cribl CEO addresses product roadmap, Splunk lawsuit
Cribl doesn't market its own back-end data storage and analytics -- rather, it's primarily focused on refining data sent to such systems. But Cribl also plans to roll out its own version of a data lake sometime in the next two years, according to Sharp.
"We will make some sort of offering maybe this year, maybe next year, that will help people set up their data lakes," Sharp said. "And if you want to pay us to own the S3 bucket, you can. … But if you opt to give us data and put data into a lake that we've orchestrated for you, you're never locked in to what Cribl's doing. You can immediately substitute us for any other vendor."
Cribl will also eventually take further steps into the AI and machine learning realm, where Cribl Search has already established the company's first foothold, Sharp said. Future efforts here for Cribl would likely focus on network infrastructure and security data sets, which lend themselves better to analysis with machine learning than application data, in Sharp's view.
"There are certain areas we will move into over the next couple of years -- I don't have any specific timelines," Sharp said. "AI is really good at finding new things it's been trained on and it's very difficult to train it on something that's never happened before, so these approaches tend not to work as well in general observability."
Meanwhile, Cribl has filed a motion to dismiss the lawsuit Splunk brought against it in October. That motion, based on arguments about Splunk's patent claims, won't be considered until March. It doesn't mention Sharp's October social media rebuttal to Splunk's allegations of stolen intellectual property, which pointed out that some of the data collection IP in question is available as open source code on GitHub.
"We're going through the legal process the way our law firm recommends … and we're really optimistic about our chances," Sharp said. "There are now many ways for [customers] to get value out of Cribl … and we have many successful joint customers [with Splunk]. … I'm very highly confident that we're going to continue to coexist in this world with them."
Beth Pariseau, senior news writer at TechTarget, is an award-winning veteran of IT journalism. She can be reached at [email protected] or on Twitter @PariseauTT.