Real-time vs. near-real-time analytics -- how to choose
Data is king, but what happens when the data isn't truly real-time? Expert Tom Nolle explains the cloud platform choices when it comes to 'real-time' analytics.
There may be no concept more popular with CxOs than the idea that "knowing more about the business" will pay large dividends. Analytics, or the application of computer intelligence to data analysis and correlation, is one of the top priorities of executives, and the top question isn't the business scope analytics should take but the timeframe it should adopt. Do you follow the business in real time, and if "near-real-time analytics" is the focus, how near should it be? The answer will probably depend on the availability of the data, the granularity of the decision process, the true value of as-it-happens insight, and the time and cost of analysis. All this ends with dividing your applications by the value of real-time analysis, not presuming one approach fits all.
Real-time processing means collecting and analyzing data at the time it's generated, usually by accessing sensors that provide direct input from the real world. Normally, real-time data is in the form of simple events, things like counts or measurements, and normally, the purpose of analyzing it is to generate a response in the form of some kind of alarm, process control command or other real-world reaction. It's easy, when working through how you might analyze data, to forget that there are a lot of complications associated with real-time information.
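As a concrete illustration, a minimal real-time event handler might look like the sketch below, assuming a hypothetical sensor feed and alarm threshold. The point is that the reaction happens at the moment the measurement arrives, not after the data lands in a warehouse.

```python
from dataclasses import dataclass
import time

@dataclass
class SensorEvent:
    sensor_id: str
    value: float          # e.g., a temperature or a count
    timestamp: float      # epoch seconds at the moment of measurement

ALARM_THRESHOLD = 85.0    # hypothetical limit for this sensor

def handle_event(event: SensorEvent) -> None:
    """React at the time the event is generated, not after batch storage."""
    if event.value > ALARM_THRESHOLD:
        # In a real deployment this would raise an alarm or issue a
        # process-control command; here we just print the reaction.
        print(f"ALARM: {event.sensor_id} read {event.value} at {event.timestamp}")

# Simulated real-time feed
handle_event(SensorEvent("line-3-temp", 91.2, time.time()))
```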
One critical issue in the real-time vs. near-real-time analytics debate is data availability. Most analytics are based on business records, which in most cases means transactional or partially digested and contextualized information. A sensor that counts boxes might look much like one that counts trucks, and what's inside either may or may not be immediately associated with the counting. IT processes operating at the edge of the business collect data points, not information, so you have to step back through the edge activities to the point where truly useful information exists in a detectable business context.
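A rough sketch of that stepping back follows, with hypothetical record shapes: a raw edge count only becomes useful information once it is joined with business context held elsewhere in the company's systems.

```python
# Raw edge data point: a count with no business meaning attached.
raw_event = {"sensor": "dock-7-counter", "count": 12, "ts": "2016-05-02T09:15:00"}

# Context held elsewhere in the business systems (hypothetical records).
shipment_manifest = {
    "dock-7-counter": {"po_number": "PO-4411", "item": "brake assemblies"},
}

def contextualize(event: dict, manifest: dict) -> dict:
    """Turn a data point into information by attaching its business context."""
    context = manifest.get(event["sensor"], {})
    return {**event, **context}

print(contextualize(raw_event, shipment_manifest))
```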
Another edge-process question when it comes to near-real-time analytics is the granularity of data. While edge devices may collect information in real time, the information may not be used and stored in that form. How many cartons of parts are received per day? Clearly, that information can't have less than one-day granularity, and if someone believed that hourly information would be useful, it might be necessary to revamp data collection and aggregation to have the data available. The legacy data would not have the same granularity, and the value of real-time analytics would be reduced as a result of being cut off from historical correlation.
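The sketch below, using invented carton-receipt timestamps, shows why granularity is fixed at collection time: a per-hour view can only be built if the hour was kept when the event was recorded, and daily legacy totals cannot be broken back down.

```python
from collections import Counter
from datetime import datetime

# Hypothetical receiving-dock events: one timestamp per carton received.
carton_events = [
    datetime(2016, 5, 2, 9, 15),
    datetime(2016, 5, 2, 9, 40),
    datetime(2016, 5, 2, 14, 5),
    datetime(2016, 5, 3, 10, 20),
]

# Daily granularity: the finest question daily totals can answer.
per_day = Counter(e.date() for e in carton_events)

# Hourly granularity is only possible because the hour was kept at the edge;
# if the edge system had stored daily totals, this view could not be rebuilt.
per_hour = Counter((e.date(), e.hour) for e in carton_events)

print(per_day)    # e.g., 3 cartons on May 2, 1 on May 3
print(per_hour)
```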
Granularity is important in another sense, which is the decision level. There's not much point to spending time and effort gathering business information in real time only to submit it to the monthly board meeting. Generally, real-time information implies a closed control loop, an application that converts information into immediate action rather than supporting a measured decision process. If you are not expecting to generate an application reaction to an event, then you probably only have a near-real-time analytics need.
The control loop of an event-driven application is the information path between event generation and control response. The length of a control loop is the amount of time within which the process under inspection requires a response. This can be set by a mechanical element, like an assembly line moving at a fixed speed, or by the tolerance of a worker waiting, for example, for a gate to open. Where the control loop has to be very short, the application requires real-time analytics; where it can be longer, near-real-time analytics will do.
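One way to make the control loop concrete is to treat its length as a time budget and check each reaction against it, as in this sketch with a hypothetical 50 ms budget.

```python
import time

CONTROL_LOOP_BUDGET_S = 0.050   # hypothetical 50 ms budget set by the process

def respond_within_budget(event_time: float, act) -> bool:
    """Run the reaction, then report whether it fit inside the control loop."""
    act()                                    # e.g., open a gate, stop a belt
    elapsed = time.monotonic() - event_time  # event generation to response
    if elapsed > CONTROL_LOOP_BUDGET_S:
        print(f"Loop overrun: {elapsed * 1000:.1f} ms; handle as near-real-time instead")
        return False
    return True

# Simulated event whose timestamp is captured at generation time.
respond_within_budget(time.monotonic(), lambda: None)
```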
Sometimes, you can gain an insight into just how real-time your analytics need to be by looking at the tools that seem to suit your needs best. Analytics applications, as they morph into real-time processes, start to look increasingly like complex event processing or event correlation. If your applications test the relationship between things that are happening, you're looking at a real-time process. Complex event processing and analytics are related, but most applications will fall cleanly into one category or the other.
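A toy complex-event-processing rule might correlate two of the earlier example streams, box counts and truck arrivals, within a short window; the event names and window length below are illustrative only.

```python
from collections import deque
import time

WINDOW_S = 5.0                      # hypothetical correlation window
recent_truck_arrivals = deque()     # (timestamp, count) pairs

def on_truck_count(ts: float, count: int) -> None:
    recent_truck_arrivals.append((ts, count))

def on_box_count(ts: float, count: int) -> None:
    """CEP-style rule: relate box counts to trucks seen inside the window."""
    # Discard truck events that fall outside the correlation window.
    while recent_truck_arrivals and ts - recent_truck_arrivals[0][0] > WINDOW_S:
        recent_truck_arrivals.popleft()
    if not recent_truck_arrivals:
        print(f"{count} boxes counted with no recent truck arrival: flag for review")

on_truck_count(time.monotonic(), 1)
on_box_count(time.monotonic() + 1.0, 40)
```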
Applications that don't fall cleanly into either category can often be divided into a portion relating to historical data and another relating to real-time data. Many applications with a real-time component actually analyze historical data in order to interpret real-time events. That analysis is neither real-time nor near-real-time, but its results are used to guide control loop applications.
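That split can be sketched as a baseline computed offline from historical readings, which the real-time side then consults without recomputing; the numbers here are invented.

```python
from statistics import mean, stdev

# Historical / near-real-time side: summarize past readings into a baseline.
historical_readings = [70.1, 69.8, 71.3, 70.6, 69.9, 70.4]   # hypothetical
baseline = mean(historical_readings)
spread = stdev(historical_readings)

# Real-time side: the control loop only compares a live event to the baseline.
def is_anomalous(live_value: float, sigmas: float = 3.0) -> bool:
    return abs(live_value - baseline) > sigmas * spread

print(is_anomalous(70.2))   # False: within the normal range
print(is_anomalous(75.0))   # True: triggers the control-loop reaction
```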
This seems to be the solution that cloud providers worldwide are building for. Amazon, Google and Microsoft have all moved quickly to enhance their web services supporting event processing and event-based applications. Google frames the result as a microservices strategy, while Amazon and Microsoft lead with serverless functions, AWS Lambda and Azure Functions, named after the functional programming technique used to create event-handling components.
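In the serverless model, each event invokes a small, stateless handler. The sketch below follows the Python handler convention AWS Lambda uses (an event plus a context argument); the payload shape and alarm rule are assumptions for illustration.

```python
import json

def lambda_handler(event, context):
    """Per-event entry point in the style AWS Lambda invokes for Python.

    The platform passes the triggering record as `event` and runtime
    metadata as `context`; the function keeps no state between calls.
    """
    reading = json.loads(event["body"])      # assumed JSON payload with a "value" field
    action = "alarm" if reading.get("value", 0) > 85.0 else "none"
    return {"statusCode": 200, "body": json.dumps({"action": action})}

# Local simulation of a single invocation (context unused here).
print(lambda_handler({"body": json.dumps({"value": 91.2})}, None))
```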
Cloud giants see event processing as a mesh of paths, some leading to short control loop reactions and others leading to longer-term analytics. This vision supports a future where real-time and near-real-time analytics don't have sharp boundaries but rather have controlled-time boundaries where the length of control loops and the role of analysis are specifically integrated into application design. That's a much more logical outcome.
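Such a mesh can be approximated as a dispatcher that feeds every event to the analytics path while reserving the short control loop for events that demand an immediate reaction; the field names here are hypothetical.

```python
from queue import Queue

analytics_queue: Queue = Queue()    # longer control loop: batched analysis later

def react(event: dict) -> None:
    print(f"immediate reaction to {event.get('id')}")

def dispatch(event: dict) -> None:
    """Route each event by the control-loop length its handling requires."""
    if event.get("requires_immediate_action"):
        react(event)                # short loop: act now
    analytics_queue.put(event)      # every event also feeds analytics

dispatch({"id": "evt-1", "requires_immediate_action": True})
dispatch({"id": "evt-2", "requires_immediate_action": False})
```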
We can already see that application and business trends are making control loop applications more broadly useful, and similar trends are driving analytics. Today, merging the two means looking at the way the application sets control loop limits and working within them. Tomorrow, application design will support a much more flexible approach. Since the tools to support that future model are already becoming available, it would be wise to explore the cloud providers' near-real-time analytics solutions today and be ready for the inevitable point when they'll become the accepted strategy for everyone.