Log management tools add finesse to Elasticsearch
Vendors that bring log management expertise to open source Elasticsearch helped unlock collaboration and incident response for companies that had struggled with ELK on their own.
Data management and query features from log management software vendors proved crucial for two companies that found raw Elasticsearch unwieldy to use.
Log management used to be practiced mostly by bleeding-edge IT departments, but the rise of microservices applications and complex cloud-native architectures has made detailed log data collection a common requirement for mainstream enterprises.
The ELK stack, which consists of Elasticsearch for log querying, Logstash for log data collection and management and the Kibana data visualization tool, is widely used to collect, index and query log data. While versatile in their raw form, Elasticsearch and the ELK stack can be cumbersome to manage for IT pros who don't have deep expertise in Elasticsearch's native query language and log data structures.
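To give a sense of what that native query language asks of users, here is a minimal sketch of an Elasticsearch Query DSL request body built in Python. The field names (`message`, `level`, `@timestamp`) and the `build_error_query` helper are assumptions made for illustration; they depend on how a given index is mapped and do not come from either company's setup.

```python
import json


def build_error_query(term, level="error", window="now-15m"):
    """Return a Query DSL body: full-text match plus exact-value filters.

    Hypothetical helper; the field names are assumed, not taken from the article.
    """
    return {
        "query": {
            "bool": {
                # Scored full-text match on the log message
                "must": [{"match": {"message": term}}],
                # Unscored filters: exact log level and a relative time window
                "filter": [
                    {"term": {"level": level}},
                    {"range": {"@timestamp": {"gte": window}}},
                ],
            }
        }
    }


query = build_error_query("timeout")
print(json.dumps(query, indent=2))
```

Even a simple search like this requires knowing the difference between `match` and `term` clauses and how the index maps each field, which is the kind of detail a simplified query interface hides.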
That's where log management software vendors LogDNA and Logz.io came in over the last 18 months for a financial services firm and a web messaging startup. The vendors' products, which use Elasticsearch behind the scenes, include features such as easily accessible query interfaces and sophisticated log data parsing that improved DevOps collaboration and IT incident response for these customers.
"Some of the other competitors in the field … expose a little bit more of the Elasticsearch native [query] engine to the end user, so one has to know a bit more about how Elasticsearch works to get data out of there," said Mark Pimentel, cloud engineering lead at PlatformZero, a financial services software division of Capco, a digital consultancy company based in London. "[LogDNA] allows you to query for various information via keys and tags, elements from an index, and building a query in LogDNA was pretty rudimentary."
LogDNA simplifies queries for DevOps collaboration
PlatformZero initially sought a log management software product to create a separate, access-controlled pool of data for developers and product managers who wouldn't otherwise have direct access to system logs that had been collected internally through Elasticsearch. It selected LogDNA to create that data repository, in part because its simplified query interface would make information accessible for developers conceptually as well as logically.
LogDNA Enterprise software melds a proprietary message brokering service called Buzzsaw with an Elasticsearch back end. This system handles log parsing, a process that sorts log files into consistent chunks of information that are easier to manipulate, store and search. It also presents its own query interface to end users through a web UI that PlatformZero staff found easier to use than the native Elasticsearch query language, Pimentel said.
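Log parsing in general can be sketched in a few lines of Python. The example below is not LogDNA's implementation; the line format, the field names and the `parse_line` helper are all assumptions chosen for illustration.

```python
import re

# Assumed line format: "<timestamp> <LEVEL> <service>: <message>"
LINE_PATTERN = re.compile(
    r"(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>[\w-]+):\s+(?P<message>.*)"
)


def parse_line(line):
    """Split one raw log line into named fields (hypothetical helper)."""
    line = line.strip()
    match = LINE_PATTERN.match(line)
    if match is None:
        # Keep lines that don't fit the pattern instead of dropping them
        return {"message": line}
    return match.groupdict()


record = parse_line(
    "2020-07-14T09:31:05Z ERROR checkout-api: upstream timeout after 30s"
)
print(record)
```

Once every line is reduced to the same set of fields, a search by level or service becomes a lookup on a key rather than a text scan.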
LogDNA is simple enough to be used by application developers who are not steeped in infrastructure management and the ELK stack, as well as release managers who work with the developers to evaluate the success of software deployments. But it's also sophisticated enough to be used by the company's site reliability engineers (SREs) in tandem with a SignalFx APM tool for incident response.
LogDNA introduced a feature called Usage Quotas in March that limits the data output from various services when users query them, to cut down on cost spikes associated with broad data searches. PlatformZero rolled out this feature in production soon after it was introduced.
"It doesn't so much reduce costs as it makes them more predictable," Pimentel said.
PlatformZero used SignalFx before Splunk acquired it. While ease of use with LogDNA's tool was paramount, Pimentel said the company would like to see the vendor add to its roadmap some of the advanced log management features that competitors offer. These include AIOps and other sophisticated log analytics functions such as post-ingestion indexing.
In addition to Usage Quotas, LogDNA has data management features such as Exclusion Rules, which allow teams to choose which logs they store, as well as Extract and Aggregate Fields, which gives users the ability to view and export fields from log lines that have already been indexed. LogDNA officials did not say whether AIOps and other data analytics features are on the company's roadmap.
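The idea behind exclusion rules, deciding which log lines are worth storing, can be sketched generically in Python. This is not LogDNA's API or rule syntax; the patterns and the `should_store` helper are hypothetical.

```python
import re

# Hypothetical rules: drop health-check requests and debug-level chatter
EXCLUSION_PATTERNS = [
    re.compile(r"GET /healthz"),
    re.compile(r"\bDEBUG\b"),
]


def should_store(line, patterns=EXCLUSION_PATTERNS):
    """Return True when no exclusion pattern matches the line."""
    return not any(pattern.search(line) for pattern in patterns)


lines = [
    "INFO web: GET /healthz 200",
    "DEBUG cache: miss for key user:42",
    "ERROR checkout-api: upstream timeout after 30s",
]
kept = [line for line in lines if should_store(line)]
print(kept)
```

Filtering before storage trades completeness for lower, steadier storage volume, so rules like these are usually limited to lines nobody expects to search later.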
Logz.io eases ELK troubleshooting
As a startup with a 15-engineer team responsible for every aspect of IT, New York-based Holler sought outside help with log management after an incident in 2019.
"We were scaling pretty quickly to bring in new partners, Venmo among them, and it was really hard to get visibility into the back end when things went wrong," said Daniel Seravalli, lead engineer for the company, which makes GIFs and stickers used in popular web and mobile apps. "We had a monitoring stack, but it never worked right."
In July 2019, the company started to experience lengthy outages that sometimes took weeks to resolve.
"Investigating them meant collecting raw data from servers and aggregating it manually -- we didn't have dashboards that we could use as a starting point for our investigation," Seravalli said.
Then the company released a new version of its software development kit (SDK) to a large partner, and it started generating much more log data than Seravalli's team had expected. This put strain on the company's Kafka data pipeline and storage infrastructure.
"It took us two weeks to really nail down what was going on there -- we just didn't have the data to narrow it down," Seravalli said. "Nine months later, we had a similar incident but had Logz.io and it took us a day to figure it out."
Logz.io is a software as a service (SaaS) provider that hosts open source observability data and visualization tools, including the ELK stack. Holler decided to switch from an internally managed ELK stack to the Logz.io version after the 2019 incident with its SDK. It also considered Splunk but chose Logz.io in part because its pricing was appealing.
Since then, Holler has also begun to expand its observability tools to include distributed tracing and time series metrics. Logz.io offers both, through a service based on Jaeger released in 2019 and Prometheus as a service, which became available in March. Holler has also used Logz.io's Grafana-based interface for metrics monitoring. Logz.io adds value to these open source tools by correlating data between them and providing direct links between their dashboards.
Logz.io's dashboards are also preconfigured to deliver key information as needed, as opposed to internal Holler developers' previous attempts to view data through Kibana, which Seravalli described as "flying blind."
Finally, Logz.io tech support engineers consulted with Holler IT pros on how to set up monitoring for Kafka data pipelines, including creating complex log parsing rules.
"That meant a lot to us, that Logz.io was willing to help us out like that, post-sale," Seravalli said.
As with PlatformZero's Pimentel, Seravalli would like to use more AIOps and data analytics features within Logz.io as his company grows, and he said he hopes to see Logz.io add synthetic tracing to its Jaeger-based services.
Synthetic tracing will probably be delivered next year, according to Logz.io officials.
"We are working hard with the community to beef up Jaeger for more and more APM use cases," said Logz.io CTO Jonah Kowall in an email. "This contribution to both Jaeger and OpenTelemetry is a work in progress… one important component of APM is Synthetic monitoring, and this is likely the next step for Logz.io."
As Holler continues to grow, it may also add in-house ops expertise and run its own ELK stack again, which is why the open source basis for Logz.io's tools is important, Seravalli said. But in the meantime, working with a service provider has also shielded Seravalli's team from having to deal with Elasticsearch licensing controversies that arose in the early months of this year.
"That's why I work with a service provider, so I don't have to worry about this stuff," he said. "But I also love the managed open source model, because if we bring this back in house in two years, we haven't spent the last five learning a proprietary technology."
Beth Pariseau, senior news writer at TechTarget, is an award-winning 15-year veteran of IT journalism. She can be reached on Twitter @PariseauTT.