Amazon Kinesis: When to use Amazon's new big-data processing service
In this expert answer, Chris Moyer offers advice on using Amazon Kinesis for near-real-time processing of streaming big data.
Amazon recently launched Amazon Kinesis, a managed service for rapid processing of large amounts of data. Do you have any thoughts about the significance of this service and where it might be most useful?
Amazon Kinesis is a pipeline for processing data streams in near real time. It's similar to Apache Storm, but it's hosted and scaled entirely by Amazon Web Services (AWS).
Kinesis offers a complete solution for processing data in near real time (within 10 seconds). It doesn't introduce any new concepts, but as a managed service it can serve as a complete suite for specific use cases.
Some examples include:
- Processing log data in real time (to find possible errors and alert IT staff when problems occur)
- Receiving real-time analysis of application usage
- Setting up real-time alerts for notification when someone mentions a particular company on Twitter, Facebook or Google+
- Monitoring real-time news sources for content using specific keywords, then delivering that content to a mobile device
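As a rough illustration of the first example, the sketch below scans a stream of log lines and raises an alert when errors cluster together. It uses a plain Python iterable as a stand-in for the stream rather than the Kinesis API, and the function name `error_alerts`, the `window` and `threshold` parameters and the "ERROR" marker are all assumptions made for this example.

```python
from collections import deque
from typing import Iterable, Iterator


def error_alerts(lines: Iterable[str], window: int = 5, threshold: int = 3) -> Iterator[str]:
    """Yield an alert when at least `threshold` of the last `window` lines are errors."""
    recent = deque(maxlen=window)  # True/False flags for the most recent lines
    for line in lines:
        recent.append("ERROR" in line)  # hypothetical marker for an error line
        if sum(recent) >= threshold:
            yield f"ALERT: {sum(recent)} errors in last {len(recent)} lines"
            recent.clear()  # reset so one burst of errors raises a single alert


if __name__ == "__main__":
    sample = ["INFO ok", "ERROR a", "ERROR b", "INFO ok", "ERROR c", "INFO ok"]
    for alert in error_alerts(sample):
        print(alert)
```

In a real deployment the iterable would be replaced by records read from the stream, and the alert would go to whatever notification channel the IT staff already use.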
The type of processing that Kinesis does can already be done with other Amazon services. You can use Amazon's Simple Queue Service (SQS), Simple Notification Service (SNS) and autoscaling capability to process real-time streams of data (in fact, at my company, Newstex, this is exactly what we do). The real advantage of Amazon Kinesis is that it makes it easier to build new services from scratch, because it provides a completely managed service for the entire process.
Ultimately, Amazon Kinesis will be most useful for addressing big-data problems, such as:
- Processing log data
- Processing social media streams for specific terms or key phrases
- Identifying trends in stock prices
- Analyzing real-time sales statistics
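To make one of these concrete, here is a minimal sketch of trend detection on a stream of prices: each new price is compared with the average of the previous few. It is only one simple way to define a trend, it works on an in-memory iterable instead of a Kinesis stream, and the name `trend_signals` and the `window` parameter are assumptions for illustration.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def trend_signals(prices: Iterable[float], window: int = 3) -> Iterator[Tuple[float, str]]:
    """Compare each price with the mean of the previous `window` prices."""
    recent = deque(maxlen=window)  # the last `window` prices seen so far
    for price in prices:
        if len(recent) == window:  # wait until there is enough history
            average = sum(recent) / window
            if price > average:
                direction = "up"
            elif price < average:
                direction = "down"
            else:
                direction = "flat"
            yield price, direction
        recent.append(price)


if __name__ == "__main__":
    for price, direction in trend_signals([10.0, 11.0, 12.0, 15.0, 9.0, 12.0]):
        print(price, direction)
```

The same pattern, a small rolling window updated one record at a time, applies to sales statistics or any other numeric stream.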
Even if you don't have a large amount of data to process, this type of architecture helps with fault tolerance as well as scaling. So there are good reasons to use Amazon Kinesis, or something similar, if you need real-time processing of any stream of data.
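One way to picture the scaling side is how a stream is split into shards. Kinesis documentation describes hashing each record's partition key with MD5 into a 128-bit integer and using that value to pick a shard. The sketch below mimics that idea under the simplifying assumption that the hash space is divided evenly among shards; `shard_for` is a hypothetical helper for this example, not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits wide


def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming equal-sized hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")  # integer in [0, 2**128)
    return hash_value * shard_count // HASH_SPACE


if __name__ == "__main__":
    for key in ["user-1", "user-2", "user-3", "user-4"]:
        print(key, "-> shard", shard_for(key, shard_count=4))
```

Because the same key always lands on the same shard, records for one user or one log source stay in order, while adding shards spreads the overall load across more workers.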