Editor's note
The rise of the public cloud has legitimized the data analytics market -- making big data a bigger deal than ever before. Companies that have been collecting terabytes of data for years can now use the public cloud as a cost-effective way to mine and analyze that data. And successful big data analytics strategies often give companies a competitive advantage.
Amazon Web Services (AWS) offers a variety of data analytics tools. AWS customers can do everything from processing data in real time to adding machine learning to their applications. Currently, there are five primary AWS products for cloud-based analytics: Elastic MapReduce (EMR), Kinesis, Redshift, Data Pipeline and Machine Learning.
Third-party tools also extend the AWS analytics portfolio. Because each service supports big data in its own way, administrators need to understand each offering to ensure proper data integration.
1. Accessing, protecting and storing big data
Amazon Web Services makes managing big data easier and more cost-effective than ever, with a variety of options for storing petabytes of data. Amazon's core storage products, including Simple Storage Service (S3) and Elastic Block Store (EBS), are well equipped for big data. But speed matters in data analytics: the faster an enterprise can access its data, the faster it can act on it. Enterprises can access data more quickly with a secure NoSQL database backed by solid-state drives (SSDs). DynamoDB is a good place to start, though third-party options are available. Amazon Relational Database Service (RDS) plays a complementary role to a NoSQL database, offering fast, consistent performance optimized for transactional workloads. Amazon Elastic File System (EFS) can also be useful in big data projects because it scales up to handle large flows of data.
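To make the key-value access pattern behind DynamoDB more concrete, here is a minimal Python sketch that writes and reads a single item through the `put_item` and `get_item` methods of a boto3 DynamoDB `Table` resource. The table layout (partition key `device_id`, sort key `ts`), the attribute names and the `save_reading`/`load_reading` helpers are all hypothetical. The table object is passed in, so the functions can be exercised without AWS credentials; values are integers because the boto3 resource rejects Python floats (it expects `Decimal` for non-integer numbers).

```python
from typing import Any, Mapping, Optional, Protocol


class DynamoTable(Protocol):
    """The subset of a boto3 DynamoDB Table resource used in this sketch."""

    def put_item(self, *, Item: Mapping[str, Any]) -> Mapping[str, Any]: ...

    def get_item(self, *, Key: Mapping[str, Any]) -> Mapping[str, Any]: ...


# Hypothetical helpers. Assumes a table whose partition key is "device_id"
# and whose sort key is "ts" (a Unix timestamp).
def save_reading(table: DynamoTable, device_id: str, timestamp: int, value: int) -> None:
    """Store one sensor reading as a single item."""
    table.put_item(Item={"device_id": device_id, "ts": timestamp, "value": value})


def load_reading(
    table: DynamoTable, device_id: str, timestamp: int
) -> Optional[Mapping[str, Any]]:
    """Fetch one reading by its full primary key, or None if it is absent."""
    response = table.get_item(Key={"device_id": device_id, "ts": timestamp})
    # get_item includes an "Item" entry only when the key exists.
    return response.get("Item")
```

Against a real table, the same functions would receive `boto3.resource("dynamodb").Table("sensor-readings")`, where the table name is again only an example. Looking items up by their full primary key, rather than scanning, is what keeps reads fast as the table grows.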
2. Process big data, and then visualize it
Once you're ready to mine and process data from your databases, there is no shortage of tools to help. In some situations, enterprises need information instantly, such as financial transactions, social media responses and clickstreams. Amazon Kinesis allows users to build a dashboard or application that monitors information as soon as it arrives from the data stream. Kinesis dashboards are one way to visualize big data, but they might not suit the needs of every business. Third-party options such as Tableau offer connectivity to EMR and other AWS products. Analyzing past data and using it to build predictive models is another challenge, and creating mathematical algorithms to interpret future data can be difficult and time-consuming. Amazon Machine Learning provides visualization tools and helps create models that react to real-time data.
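As a rough sketch of the producer side of a real-time pipeline like this, the Python below writes one clickstream event to a Kinesis data stream using the `put_record` call of the boto3 Kinesis client. The stream name, the event fields and the `send_click_event` helper are hypothetical. The client is passed in as a parameter so the function can be tried without AWS credentials.

```python
import json
from typing import Any, Mapping, Protocol


class KinesisClient(Protocol):
    """The subset of boto3's Kinesis client used in this sketch."""

    def put_record(
        self, *, StreamName: str, Data: bytes, PartitionKey: str
    ) -> Mapping[str, Any]: ...


# Hypothetical helper; the event shape (a "user_id" field) is an assumption.
def send_click_event(
    client: KinesisClient, stream_name: str, event: Mapping[str, Any]
) -> str:
    """Serialize one event as JSON and write it to a Kinesis data stream.

    Records that share a partition key go to the same shard, so keying by
    user ID keeps each user's clicks in order for downstream consumers.
    """
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    response = client.put_record(
        StreamName=stream_name,
        Data=payload,
        PartitionKey=str(event["user_id"]),
    )
    # Kinesis returns the sequence number it assigned within the shard.
    return response["SequenceNumber"]
```

With real credentials, the call would look like `send_click_event(boto3.client("kinesis"), "clickstream", {"user_id": 42, "page": "/home"})`, assuming a stream named `clickstream` already exists. A dashboard or consumer application then reads these records from the stream's shards as they arrive.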