Kafka at center of new event processing infrastructure
Events are as important as data in emerging applications underlying many e-commerce efforts. Streams of events tell a company what motivates customers to use online products.
Event processing was once a niche undertaking, the province of Wall Street and intelligence agencies looking to work on small bits of big data in real time. Now, with globally deployed applications from the likes of Amazon, Walmart, Uber and Lyft, event processing is going mainstream.
With apps like Uber's, in particular, nearly every activity is logged, and each logged activity becomes an event: a discrete occurrence within a larger data process. With Uber, for example, events are spawned when a user opens the app or browses locations and services. More events are created as the service estimates a price for a trip or suggests a driver's route.
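A stream of such events can be pictured as a sequence of small records. The sketch below is purely illustrative; the field names and event types are assumptions, not Uber's actual schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical event record -- field names are illustrative assumptions.
@dataclass
class Event:
    user_id: str
    event_type: str          # e.g. "app_opened", "price_estimated"
    payload: dict = field(default_factory=dict)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# Each user action spawns a new event; together they form a stream.
stream = [
    Event("u123", "app_opened"),
    Event("u123", "locations_browsed", {"city": "SF"}),
    Event("u123", "price_estimated", {"estimate_usd": 18.50}),
]
print([e.event_type for e in stream])
```

In a production system, records like these would be serialized and published to a messaging layer such as Kafka rather than held in a list.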
Events tell a company what is happening with its business. They are becoming the common currency in web and cloud applications.
Event processing, with its focus on data flow, is a big change in both compute and data architecture. It is often enabled by Kafka, the messaging system created at LinkedIn and open-sourced a few years ago.
Event processing also brings with it new microservice schemes and varied infrastructure components. It even changes the status of databases that have long been the workhorse of commerce, especially as vendors look to deliver their software on the cloud. The database is no longer the center of corporate computing.
Swiping events
Event processing architecture is "turning the database inside out," said Kyle Bendickson, software engineer for big data at Los Angeles-based Tinder, an app that lets users quickly swipe through profiles -- swiping right is a "like" -- to find potential dates.
All that swiping generates a lot of data on user behavior, Bendickson told attendees at the Kafka Summit in New York on April 2.
Tinder handles about 1 million events per second and 40 TB of data per day as it connects people, Bendickson said. Other presenters at the summit went further, clocking their estimated event processing levels at multiple millions of events per second.
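Those two figures imply a rough average event size. A back-of-envelope check, assuming the 40 TB is spread evenly across the day and all of it arrives as events:

```python
# Back-of-envelope check on the reported figures (assumptions: even
# distribution across the day, all data arriving as events).
events_per_second = 1_000_000
bytes_per_day = 40e12                  # 40 TB, decimal terabytes
seconds_per_day = 86_400

events_per_day = events_per_second * seconds_per_day   # 8.64e10 events
avg_event_bytes = bytes_per_day / events_per_day       # ~463 bytes each
print(round(avg_event_bytes))
```

A few hundred bytes per event is consistent with small, structured records of user actions rather than bulky documents.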
In such architectures, Bendickson said, traditional database transactions become almost secondary to all the associated pieces that continually combine to make events.
Leading a Kafka Summit session on event-driven architecture, and emphasizing that he spoke for himself and not for Tinder, Bendickson advised attendees to keep their microservices simple and to avoid excessive use of remote procedure calls when building event streaming systems.
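The point about avoiding excessive remote procedure calls can be sketched in miniature. Instead of one service synchronously calling another, a producer publishes an event and consumers subscribe independently. The in-memory "bus" below stands in for Kafka topics; it is a simplifying assumption for illustration, not Tinder's design:

```python
from collections import defaultdict

# Minimal in-memory stand-in for a topic-based event bus.
subscribers = defaultdict(list)

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, event):
    # The producer never calls a specific service directly -- any number
    # of consumers can react, which is what decouples microservices.
    for handler in subscribers[topic]:
        handler(event)

seen = []
subscribe("swipes", lambda e: seen.append(e))
publish("swipes", {"user": "bob", "direction": "right"})
print(seen)
```

The producer here has no knowledge of who consumes its events, which is the property that lets teams swap services in and out without coordinated RPC changes.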
Bendickson listed Spark, Flink and RocksDB along with Kafka and other elements as part of the event processing mix. Such components can be swapped in and out, as teams constantly update applications.
In event-driven apps like this, "a database is really just an insert log," Bendickson said. The view is somewhat stark, but it shows how events disclosing customers' intentions are gaining great importance in big data applications.
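One way to read "a database is really just an insert log" is that current state becomes a view derived by replaying appended events. A minimal in-memory sketch, not tied to any particular product:

```python
# The "database" is an append-only log of events; current state is
# just a view computed by replaying the log.
log = []

def append(event):
    log.append(event)          # inserts only -- no updates or deletes

def materialize_likes(log):
    """Derive per-user like counts by replaying every event."""
    likes = {}
    for e in log:
        if e["type"] == "swipe_right":
            likes[e["user"]] = likes.get(e["user"], 0) + 1
    return likes

append({"user": "alice", "type": "swipe_right"})
append({"user": "alice", "type": "swipe_left"})
append({"user": "alice", "type": "swipe_right"})
print(materialize_likes(log))
```

Because the log is the source of truth, a different view (say, swipe-left counts) can be materialized later from the same history, which is the "inside out" property Bendickson described.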
Challenges in operations
Complexity inherent in event processing with systems like Kafka goes beyond simply designing the new style of system.
For example, administering and managing the underlying infrastructure for event processing is difficult, and not necessarily a chore that organizations want to take on, according to Oskari Saarenmaa, CEO at Aiven, a vendor of managed cloud services, who was on hand at the Kafka Summit.
"There is a lot of operational complexity around Kafka," Saarenmaa said, citing cluster management as an example. In fact, Kafka clusters require Apache ZooKeeper, a coordination service that handles naming, configuration, synchronization and related duties. Not many developers have ZooKeeper skills, compared with the skills needed for other event processing components.
"Kafka event processing is complex just for the fact that it requires a cluster and a consensus system such as ZooKeeper," Saarenmaa said. "Kafka requires a set of tools that you have to apply in a certain way in order for them to work out."
It doesn't get easier going to the cloud, which is where many event processing applications are ultimately likely to reside. One reason, Saarenmaa said, is that the underlying cloud infrastructure is changing rapidly.
Taming event processing infrastructure
"Cloud is the direction this whole space is moving into, to get this whole type of system as a service and not to have to worry about any of the back-end hassles," said Jay Kreps, one of Kafka's original creators at LinkedIn and now co-founder and CEO at Confluent, the host of the Kafka Summit.
"What people want is a whole platform -- in other words, not just the engine, but the whole car around the engine," Kreps said in an interview.
As he put it, "open source Kafka is the engine of the car," to which Confluent has added data streaming, data management, analytical querying and other tools.
At the summit, Kreps discussed version 5.2 of the Confluent Platform, which continues the company's efforts to tame the rawer aspects of event-style programming. Improvements to the company's Kafka libraries are intended to bring C/C++, Python and .NET clients on par with Java development for Kafka. And an update to the Confluent Replicator simplifies streaming of events across on-premises and public cloud environments.
Turbocharging exhaust data
Like other companies arising from the open source big data movement, Confluent will be pressed to further expand support for event processing infrastructure on the cloud, according to Doug Henschen, industry analyst at Constellation Research.
Henschen said he sees similarities between Confluent, with Kafka, and Databricks, with Spark. Databricks, like Confluent, is headed by original creators of the software it sells, he pointed out.
"Confluent is to Kafka as Databricks is to Spark," he said.
"You see both open source products used all over the place, but Confluent is less visible and well-known than Databricks," he said.
But Confluent does not yet have a special partnership with a big cloud provider akin to Databricks' relationship with Microsoft, which offers Databricks' brand of Spark as part of its Azure cloud platform, in Henschen's view. As time goes by, Henschen said, he expects Confluent to get closer to a major cloud vendor.
Henschen suggested Confluent's Kafka expertise could be a good fit with Google's cloud efforts, as cloud leader Amazon continues to offer Kinesis as an alternative to Kafka-based streaming. In fact, at Google Cloud Next 2019 last week, Confluent and Google described a partnership that allows Confluent's cloud software to be managed from a Google Cloud Platform console and enables integrated billing and support.
In the cloud or on premises, events are becoming central to a new class of e-commerce applications. All the elements around transactions, what some people have called "exhaust data," will gain added attention as a result.