Latest AWS machine learning service takes aim at AI novices
The latest service in AWS' AI portfolio, Amazon SageMaker, is designed to simplify the deployment of new machine learning models, but still requires data science skills.
Cloud services fueled a renaissance in AI algorithms, as cloud providers packaged them into easily consumable offerings. And now, there are two sides to the cloud AI market: development frameworks for experts, and packaged services for specific use cases.
As a result, a gap has formed that leaves out IT teams who want to develop custom applications using machine or deep learning models, but don't have the expertise to understand the nuances of different algorithms. Similar to the early days of the internet, when only academics and engineers could build and browse websites, AI needs new tools to enable mass adoption.
Amazon started to address this deficiency at re:Invent last year with SageMaker, a managed AWS machine learning service that accelerates the development, training and deployment of AI models.
Amazon SageMaker provides a significant improvement in usability, but its launch -- along with a dearth of updates around Amazon's original machine learning service, released in the spring of 2015 -- created confusion about the state of AWS' AI portfolio and which tool to use in different situations.
AWS' AI line-up
Until the release of Amazon SageMaker, the cloud provider targeted its AWS machine learning services at developers who wanted to run AI frameworks across compute instances. In this scenario, AI experts needed to size and provision the right amount of compute capacity and install all the software required to run a model development framework. AWS provides a variety of compute platforms designed for machine and deep learning workloads, plus Amazon Machine Images (AMIs) bundled with the necessary software. These resources include:
- Elastic Compute Cloud (EC2) C5 instances, which combine Intel Xeon Scalable (Skylake) processors and AVX-512 vector instructions with support for up to 72 vCPUs and 144 GB of memory;
- P3 instances, which pair a Broadwell Xeon processor with one to eight NVIDIA Tesla V100 GPUs -- among the fastest available -- each with 16 GB of GPU memory;
- Amazon Machine Images (AMIs) prebuilt with popular deep learning frameworks, including TensorFlow, Microsoft Cognitive Toolkit, Caffe2, Theano, Apache MXNet and Gluon -- a promising new deep learning API designed to simplify model development and training, sketched below.
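To give a sense of why Gluon draws attention, here is a minimal sketch of its imperative model definition style; the tiny network, synthetic data and hyperparameters are illustrative assumptions, not anything prescribed by the AMIs.

```python
# Minimal Gluon sketch: define and train a tiny regression network.
# The architecture, data and hyperparameters are illustrative only.
from mxnet import autograd, gluon, nd

# Synthetic data: y = 2x + 1 plus a little noise
X = nd.random.uniform(shape=(100, 1))
y = 2 * X + 1 + 0.01 * nd.random.normal(shape=(100, 1))

net = gluon.nn.Dense(1)           # single fully connected layer
net.initialize()
loss_fn = gluon.loss.L2Loss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1})

for epoch in range(10):
    with autograd.record():       # record operations for autodiff
        loss = loss_fn(net(X), y)
    loss.backward()               # compute gradients
    trainer.step(batch_size=100)  # update the weights
```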
As with all applications that run on EC2, deep learning instances can access other AWS resources and tools, like Simple Storage Service for persistent data, Glue for data extraction and transformation, Elastic MapReduce (EMR) for Hadoop and Spark processing, and Athena or Redshift for database and data warehouse operations. These integrations help developers build cloud-native AI applications rather than standalone software that's unable to access higher-level application services.
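For instance, a training script running on a deep learning instance can pull a data set straight from S3 with boto3; in this hedged sketch, the bucket and key names are hypothetical placeholders.

```python
# Hypothetical example: load a training data set from S3 on an EC2 instance.
# Bucket and key names are placeholders, not real resources.
import boto3

s3 = boto3.client('s3')
s3.download_file(Bucket='my-training-data',   # hypothetical bucket
                 Key='datasets/train.csv',    # hypothetical key
                 Filename='/tmp/train.csv')

# The file is now available to any framework installed on the instance,
# such as pandas, TensorFlow or MXNet data loaders.
```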
At the opposite end of the spectrum, the AWS AI portfolio provides fully packaged applications that use deep learning to solve particular problems (a usage sketch follows the list):
- Amazon Rekognition for image and video recognition;
- Amazon Transcribe for speech transcription;
- Amazon Polly to perform text-to-speech conversion;
- Amazon Comprehend for natural language processing tasks such as sentiment analysis;
- Amazon Lex for conversational chatbots.
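These packaged services are consumed through simple API calls rather than model training. As an illustration, the sketch below runs sentiment analysis through Comprehend with boto3; the region choice and sample text are assumptions.

```python
# Sketch: sentiment analysis with Amazon Comprehend via boto3.
import boto3

comprehend = boto3.client('comprehend', region_name='us-east-1')
result = comprehend.detect_sentiment(
    Text='The new release fixed every issue we reported.',
    LanguageCode='en')

print(result['Sentiment'])        # e.g. POSITIVE
print(result['SentimentScore'])   # per-label confidence scores
```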
Machine learning platform services
In between low-level DIY platforms and packaged applications, there are three AWS machine learning services designed to simplify and accelerate custom development: Amazon Machine Learning, SageMaker and DeepLens. The last of the three is a hardware video appliance designed for software experimentation and learning.
Amazon Machine Learning appears dormant; the AWS team hasn't published a blog post about it since the summer of 2016. This AWS machine learning service simply gets lost amid the deep learning hype. It's designed for predictive analytics using three types of statistical models: binary classification, multiclass classification and regression.
Amazon Machine Learning's approach to data-driven predictions enables the software to automatically find the right algorithm, based on the training data you supply and the category of problem. For example, if you perform regression analysis, the system will determine which type of equation -- such as linear, quadratic or exponential -- best fits the data. Amazon Machine Learning also provides visual assessments of model performance to help optimize the model and training data set.
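A rough sketch of that workflow through boto3 appears below; the model and data source IDs, feature names and endpoint are hypothetical, and in practice you would first create the data source from S3 and a real-time endpoint for the model.

```python
# Sketch: train a regression model with the Amazon Machine Learning API.
# All IDs and feature names below are hypothetical placeholders.
import boto3

ml = boto3.client('machinelearning', region_name='us-east-1')

# Assumes a data source was already created from S3 training data.
ml.create_ml_model(
    MLModelId='my-regression-model',          # hypothetical ID
    MLModelName='House price regression',
    MLModelType='REGRESSION',                 # or BINARY / MULTICLASS
    TrainingDataSourceId='my-training-data')  # hypothetical data source

# Once training finishes and a real-time endpoint exists,
# request predictions one record at a time.
prediction = ml.predict(
    MLModelId='my-regression-model',
    Record={'sqft': '1450', 'bedrooms': '3'},  # hypothetical features
    PredictEndpoint='https://realtime.machinelearning.us-east-1.amazonaws.com')
print(prediction['Prediction'])
```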
Crash course in SageMaker
As a mix of PaaS and an integrated development environment, Amazon SageMaker covers the entire AI software development lifecycle, including model development, building, optimization, validation and deployment. The development environment uses Jupyter Notebooks to describe and load the data set, select and configure the algorithm, and validate the model. SageMaker has 11 preconfigured algorithms to handle a variety of problem types, including deep learning algorithms not covered by Amazon Machine Learning (a training sketch follows the list):
- Binary and multiclass classification of discrete variables;
- Regression on quantitative variables, via the Linear Learner and extreme gradient boosting (XGBoost) algorithms with tunable hyperparameters;
- Group identification within a data sample, via algorithms such as k-means and principal component analysis;
- Image classification;
- Supervised sequence-to-sequence learning for machine translation, text summarization and speech-to-text transcription;
- Unsupervised learning for content summarization, topic modeling and sentiment analysis.
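As a concrete sketch of the notebook workflow, the snippet below trains the built-in k-means algorithm with the SageMaker Python SDK; the IAM role, output bucket, instance choices and stand-in data set are assumptions for illustration.

```python
# Sketch: train SageMaker's built-in k-means algorithm from a notebook.
# Role ARN, bucket and instance types are hypothetical placeholders.
import numpy as np
from sagemaker import KMeans

role = 'arn:aws:iam::123456789012:role/SageMakerRole'  # hypothetical role

kmeans = KMeans(role=role,
                train_instance_count=1,
                train_instance_type='ml.c4.xlarge',
                output_path='s3://my-bucket/kmeans-output',  # hypothetical
                k=10)                                        # cluster count

train_data = np.random.rand(1000, 50).astype('float32')  # stand-in data set
kmeans.fit(kmeans.record_set(train_data))                # launches a training job
```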
SageMaker runs training jobs to establish the model's parameters, deploys the finished model and validates the production model with samples from a known data set. The service can also train machine learning models from custom TensorFlow or MXNet code. Although the included Python libraries fit most situations, developers can also pair SageMaker with Spark running on Amazon EMR for more advanced data processing needs.
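Deployment is similarly compact. Continuing the hypothetical k-means example above, a single call hosts the trained model behind a real-time endpoint:

```python
# Sketch, continuing the k-means example: host the trained model
# on a real-time endpoint and run a sample prediction.
predictor = kmeans.deploy(initial_instance_count=1,
                          instance_type='ml.m4.xlarge')

sample = np.random.rand(1, 50).astype('float32')   # stand-in record
print(predictor.predict(sample))                   # nearest cluster assignment

predictor.delete_endpoint()  # avoid idle hosting charges
```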
Enterprises can apply SageMaker to a variety of applications, including targeted marketing for ads or promotions, fraud detection, creditworthiness assessment, predictive maintenance based on internet of things data and other time-series forecasts.
Factor price, expertise into your decision
The SageMaker pricing model isn't simple. It includes several parameters for each of the three phases of model development. During the building stage, AWS bills customers for machine learning notebook instance usage per hour, with rates depending on instance type; solid-state drive (SSD) storage and data storage per GB per month; and data ingress or egress per GB. The training stage also bills customers for machine learning instances per hour and SSD storage. The hosting stage bills customers on the same criteria as the building stage, namely, instance usage, storage and data I/O.
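As a back-of-the-envelope illustration of how those parameters combine, the sketch below totals a month of hypothetical usage across the three phases; every rate is a placeholder for illustration, not a published AWS price.

```python
# Back-of-the-envelope SageMaker cost model.
# All rates are hypothetical placeholders -- check current AWS pricing.
notebook_hours, notebook_rate = 160, 0.05   # build: notebook instance hours
training_hours, training_rate = 20, 1.26    # train: training instance hours
hosting_hours, hosting_rate = 720, 0.30     # host: endpoint instance hours
storage_gb, storage_rate = 50, 0.14         # SSD storage, per GB-month
data_gb, data_rate = 100, 0.02              # data in/out, per GB

total = (notebook_hours * notebook_rate
         + training_hours * training_rate
         + hosting_hours * hosting_rate
         + storage_gb * storage_rate
         + data_gb * data_rate)
print(f'Estimated monthly cost: ${total:.2f}')
```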
Ultimately, SageMaker enhances the AWS AI portfolio. But, as is evident from the technical intricacies of the supported models, the service still requires data science or AI expertise. If you put a novice business analyst in front of the SageMaker console, don't expect meaningful results from the data.
The Amazon Machine Learning service provides a more automated option for statistical model development and tuning, but it has greater limitations than SageMaker. The service is convenient if it fits your needs, but, in most situations, SageMaker offers more flexibility and should be the preferred AWS development environment for future AI and predictive analytics projects.