Apache Pulsar joins Kafka in Splunk Data Stream Processor

Splunk is integrating open source Apache Pulsar capabilities alongside its existing Apache Kafka support in its latest Data Stream Processor release.

Splunk expanded its event streaming capabilities on Wednesday with an update to its Data Stream Processor that brings in more data for analysis on the Splunk platform.

The DSP technology is a foundational component of the security information and event management vendor's Data-to-Everything approach. The new release, DSP 1.1, includes a series of improvements, among them better integration for ingesting data from Microsoft Office 365.

Pulsar versus Kafka

The DSP update also benefits from Splunk's October 2019 acquisition of streaming data vendor Streamlio, a leader of the open source Apache Pulsar streaming data project. Pulsar is often seen as a rival to Apache Kafka, though the Splunk Data Stream Processor now integrates both technologies to enable its event streaming capabilities.

 "While Kafka certainly has the edge over Pulsar in terms of market presence and user traction, proponents argue that Pulsar's decoupled architecture provides it with performance advantages over Kafka, while it also boasts solid message queueing and multi-tenancy functionality," said Matt Aslett, research director at S&P Global Market Intelligence. "Like Kafka, Pulsar has also been expanding at a rapid pace beyond simple messaging."

Splunk is fairly new to the stream processing niche, but it has ambitions to drive significant business from Data Stream Processor, beyond simple integration and enterprise-wide data delivery, with greater emphasis on delivering automated actions, Aslett noted.

Pulsar event streaming boosts Splunk DSP

Splunk has been busy integrating Apache Pulsar as a foundational element for event stream processing and data collections, said Josh Klahr, vice president of product management at Splunk.

"There are certain use cases where Pulsar works very well when compared against Kafka," he said. "What Pulsar provides is slightly more resilience for stateful jobs."

For example, Klahr said Pulsar is well suited to users running large-scale data lookups and enriching data on the stream. He argued that Pulsar is also often a better fit than Kafka when a data connection suffers from latency or drops intermittently. When a connection is interrupted, Pulsar can hold data on a node until the connection becomes stable again.
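
The idea of holding data until a connection stabilizes can be illustrated with a minimal Python sketch. It does not use the Pulsar client library and does not claim to reflect Pulsar's internals; `BufferedSender`, `send_fn` and `flaky_send` are hypothetical names invented for this example, and the in-memory queue is an assumption made to keep it short.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class BufferedSender:
    """Hypothetical store-and-forward buffer; not Pulsar's API or internals."""

    def __init__(self, send_fn: Callable[[str], bool]) -> None:
        # send_fn returns True when the downstream link accepted the message.
        self._send_fn = send_fn
        self._pending: Deque[str] = deque()

    def publish(self, message: str) -> None:
        # Store first, then try to drain, so a message is never dropped
        # just because the link is down at publish time.
        self._pending.append(message)
        self.flush()

    def flush(self) -> int:
        # Deliver in order; stop at the first failure and keep the rest.
        delivered = 0
        while self._pending:
            if not self._send_fn(self._pending[0]):
                break
            self._pending.popleft()
            delivered += 1
        return delivered

    @property
    def pending(self) -> int:
        return len(self._pending)


# Simulated unstable connection (assumption for the demo).
link: Dict[str, bool] = {"up": False}
received: List[str] = []


def flaky_send(message: str) -> bool:
    if not link["up"]:
        return False
    received.append(message)
    return True


sender = BufferedSender(flaky_send)
sender.publish("event-1")  # link is down: the message is held
sender.publish("event-2")
link["up"] = True
sender.flush()             # link is back: held messages go out in order
```

A message leaves the buffer only after the send succeeds, which is the property that lets delivery resume without loss once the connection returns. A production system would also persist the buffer to disk, which this sketch omits.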

"Pulsar makes sure that there is a guaranteed delivery of all the messages across the network," Klahr said.

Splunk DSP 1.0 had already integrated support for Kafka as an event streaming data technology. With DSP 1.1, users will now get the benefits of both Kafka and Pulsar, without having to choose one or the other exclusively.

Screenshot of Splunk Data Stream Processor update

"The decision about what happens in the back end is kind of abstracted away when users are creating data pipelines," Klahr said. "There's not a specific choice that the user needs to make about how the processing is done."

Splunk Data Stream Processor 1.1 updates

Beyond the Apache Pulsar integration, Klahr explained that Splunk's goal for the new DSP release is to make data more accessible.

One of the data sources that is now more accessible in DSP 1.1 is Microsoft Office 365. Splunk has had other methods of getting data from Microsoft Office 365, including using an agent as an endpoint data collector, Klahr noted. However, that approach didn't allow for data manipulation, enrichment or alerting on the data coming from Office 365 as an event stream.

Splunk users tend to pull several types of data from Office 365, including audit logs for Active Directory, service status information and data from the management API that can be useful for security visibility.
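
To make the notion of enrichment and alerting on an event stream more concrete, here is a small Python sketch. It is not DSP's pipeline language and does not follow the Office 365 management API schema; the field names (`user`, `operation`, `department`), the sample values and the functions `enrich` and `alerts` are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator

Event = Dict[str, str]

# Hypothetical lookup table mapping a user ID to a department.
DEPARTMENTS: Dict[str, str] = {"alice": "finance", "bob": "engineering"}


def enrich(events: Iterable[Event], lookup: Dict[str, str]) -> Iterator[Event]:
    # Add a department field to each event as it streams past,
    # copying the event so the original record is left untouched.
    for event in events:
        enriched = dict(event)
        enriched["department"] = lookup.get(event.get("user", ""), "unknown")
        yield enriched


def alerts(events: Iterable[Event], watched_operation: str) -> Iterator[Event]:
    # Pass through only the events whose operation should raise an alert.
    for event in events:
        if event.get("operation") == watched_operation:
            yield event


# Made-up audit-style events standing in for a live stream.
audit_events = [
    {"user": "alice", "operation": "UserLoginFailed"},
    {"user": "carol", "operation": "FileAccessed"},
    {"user": "bob", "operation": "UserLoginFailed"},
]

flagged = list(alerts(enrich(audit_events, DEPARTMENTS), "UserLoginFailed"))
```

Because both steps are generators, each event is enriched and checked as it arrives rather than after a batch has been collected, which is the practical difference between stream processing and agent-based collection followed by later analysis.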

"Now, with DSP 1.1, we're providing a more modern way to get that data from Office 365," Klahr said.
