How to use Amazon SageMaker Canvas for accurate predictions
Amazon SageMaker Canvas is a useful machine learning tool for both technical and nontechnical professionals. This tutorial shows how to create data sets and train custom models.
Building and using machine learning models typically require at least an intermediate understanding of software engineering.
While most organizations encourage collaboration between technical and nontechnical teams, technical staff can deliver only a limited number of tasks because of limited resources. This constraint is especially noticeable in ML projects, which typically require multiple time-consuming iterations. Therefore, exposing ML functionality to nontechnical team members can increase team productivity and deliver significant value.
Amazon SageMaker Canvas gives users without software coding experience the ability to build and use ML models that predict outcomes from available data sets. This helps make ML technology accessible across an organization. Canvas offers a user-friendly GUI in the AWS console and supports single sign-on authentication, so users can gain access without needing a dedicated AWS console user.
Launch SageMaker Canvas
To get started with SageMaker Canvas, account admins must first configure a SageMaker domain. The domain holds identity and access management (IAM) permission settings and user profiles, as well as VPC and storage details.
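Admins who prefer scripting can create the domain with the AWS SDK instead of the console. The sketch below assumes Python with boto3 and uses the SageMaker `CreateDomain` and `CreateUserProfile` APIs; the domain name, VPC ID, subnet IDs, execution role ARN and profile name are hypothetical placeholders, and the `SSO` auth mode assumes IAM Identity Center is already set up. The request builder runs offline; only the last two functions call AWS.

```python
"""Sketch: create a SageMaker domain and a Canvas user profile with boto3.

All names, IDs and ARNs passed to these helpers are hypothetical placeholders.
"""


def build_domain_request(domain_name, vpc_id, subnet_ids, role_arn, auth_mode="SSO"):
    """Return keyword arguments for the SageMaker CreateDomain API."""
    if auth_mode not in ("SSO", "IAM"):
        raise ValueError("auth_mode must be 'SSO' or 'IAM'")
    return {
        "DomainName": domain_name,
        "AuthMode": auth_mode,  # "SSO" signs users in through IAM Identity Center
        "DefaultUserSettings": {"ExecutionRole": role_arn},  # IAM permissions for users
        "VpcId": vpc_id,  # networking details stored in the domain
        "SubnetIds": list(subnet_ids),
    }


def create_domain(request):
    """Call AWS (needs boto3 and credentials) and return the new domain ID."""
    import boto3  # imported lazily so build_domain_request() works without boto3

    sm = boto3.client("sagemaker")
    arn = sm.create_domain(**request)["DomainArn"]
    return arn.split("/")[-1]  # the ARN ends in domain/<domain-id>


def create_user_profile(domain_id, profile_name):
    """Add a user profile once the domain has finished creating."""
    import boto3

    sm = boto3.client("sagemaker")
    return sm.create_user_profile(DomainId=domain_id, UserProfileName=profile_name)
```

Keeping the request builder separate from the AWS calls makes the configuration easy to review or test before anything is created in the account.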
From the Amazon SageMaker console, users can click the Canvas link.
This link takes users to the Canvas landing page. From there, they can select a user profile, which is configured during the SageMaker domain setup, and launch Canvas.
After a user clicks Open Canvas, SageMaker launches the relevant AWS resources, such as the workspace instance that runs the UI and the resources used for build and prediction processes.
As part of this step, SageMaker creates an S3 bucket with the following name pattern: sagemaker-<region>-<aws-account-id>. It's important to be aware of this S3 bucket, as it stores data and artifacts related to the tasks completed by Canvas.
Launching the UI takes a few minutes to complete.
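Because the S3 bucket name follows a fixed pattern, it can be derived in code, which is handy for auditing what Canvas has stored. This sketch assumes Python with boto3 for the optional listing step, which looks up the account ID with the STS `get_caller_identity` call and lists keys with S3 `list_objects_v2`. It does not assume any particular folder layout inside the bucket, and the region and account ID in the test values are placeholders.

```python
def canvas_bucket_name(region, account_id):
    """Return the bucket name following sagemaker-<region>-<aws-account-id>."""
    account_id = str(account_id)
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12-digit numbers")
    return f"sagemaker-{region}-{account_id}"


def list_canvas_objects(region, prefix="", max_keys=50):
    """List up to max_keys object keys in the bucket (needs boto3 and credentials)."""
    import boto3  # lazy import: canvas_bucket_name() works without boto3

    account_id = boto3.client("sts").get_caller_identity()["Account"]
    s3 = boto3.client("s3", region_name=region)
    response = s3.list_objects_v2(
        Bucket=canvas_bucket_name(region, account_id),
        Prefix=prefix,
        MaxKeys=max_keys,
    )
    # "Contents" is absent when no objects match, so default to an empty list.
    return [obj["Key"] for obj in response.get("Contents", [])]
```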
Import data and create data sets
Once SageMaker creates all the resources and the UI is ready to use, one of the first steps is to import the data to analyze. Canvas supports a variety of data formats, such as CSV, plaintext, and image and document files (PNG, JPG, PDF, TIFF). The appropriate format depends on the type of prediction and the ML model used.
Canvas provides ready-to-use ML models powered by AWS services, such as the following:
- Amazon Comprehend for sentiment analysis, entity extraction, language detection and personal information detection in CSV or plaintext files.
- Amazon Rekognition for object and text detection in images.
- Amazon Textract for expense analysis, identity document analysis, and document analysis in document and image files.
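The list above amounts to a lookup from file type to the service that can process it. The following Python sketch encodes that lookup; the extension sets are an assumption inferred from the formats this article mentions (CSV and plaintext for Comprehend, images for Rekognition, documents and images for Textract), not an official compatibility table.

```python
from pathlib import Path

# Assumed mapping, inferred from this article: backing service -> file extensions.
READY_TO_USE_FORMATS = {
    "Amazon Comprehend": {".csv", ".txt"},
    "Amazon Rekognition": {".png", ".jpg", ".jpeg"},
    "Amazon Textract": {".pdf", ".tif", ".tiff", ".png", ".jpg", ".jpeg"},
}


def services_for(filename):
    """Return the services (alphabetically) whose assumed formats include the file's extension."""
    extension = Path(filename).suffix.lower()  # case-insensitive match on the suffix
    return sorted(
        service for service, extensions in READY_TO_USE_FORMATS.items()
        if extension in extensions
    )
```

An image such as `cat.jpg` maps to both Rekognition and Textract, which mirrors the overlap in the list: the right choice depends on whether the goal is object detection or document analysis.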
From the main Canvas page, users can select ready-to-use models. For this example, I selected Object detection in images, a model that helps business analysts and application developers detect specific patterns and subjects in multiple image files and act on the resulting predictions.
Because I selected the Single prediction option, as seen in Figure 5, the tool next provides the option to upload a single image.
The model then generates predictions on the uploaded image.
Canvas also offers the option to create a new data set, or use an existing one, to analyze multiple files as a batch prediction. Because data sets can be reused, different teams within an organization can share relevant data sets.
Canvas provides the option to create a data set from either a local file or an S3 location. As mentioned earlier, the selected ML model type determines the data format the data set uses.
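For the S3 option, the files first need to be in a bucket. The sketch below assumes Python with boto3 and mirrors a local folder into an S3 prefix using the S3 client's `upload_file` method. The bucket name, prefix and folder layout are hypothetical, and the sketch does not assume any specific file organization that Canvas might expect.

```python
from pathlib import Path


def s3_key(relative_path, prefix):
    """Join an optional prefix and a relative file path into an S3 key with forward slashes."""
    relative = Path(relative_path).as_posix()
    prefix = prefix.strip("/")
    return f"{prefix}/{relative}" if prefix else relative


def upload_folder(folder, bucket, prefix):
    """Upload every file under folder to s3://bucket/prefix/ (needs boto3 and credentials)."""
    import boto3  # lazy import so s3_key() works offline

    s3 = boto3.client("s3")
    root = Path(folder)
    uploaded = []
    for path in sorted(root.rglob("*")):  # walk subfolders recursively
        if path.is_file():
            key = s3_key(path.relative_to(root), prefix)
            s3.upload_file(str(path), bucket, key)
            uploaded.append(key)
    return uploaded
```

Once uploaded, the S3 location can be selected in Canvas when creating the data set.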
View predictions
Once the data set is created, users can view the predictions for all files included in it, which makes it practical to analyze many files at once. For this data set, the model identified multiple attributes associated with each picture. It detected that each image contains an animal (in this case, a cat or dog) and included other relevant predictions, along with a confidence level for each attribute.
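Because the object detection model is powered by Amazon Rekognition, a similar per-image result, a list of labels each with a confidence score, can be produced with the Rekognition `DetectLabels` API. This is a hedged sketch, not how Canvas itself calls the service: it assumes boto3 and credentials for the AWS call, and the 80% threshold and `MaxLabels=10` are arbitrary illustrative choices. The filtering function runs offline on any dictionary shaped like a `DetectLabels` response, where confidence is a percentage.

```python
def confident_labels(response, min_confidence=80.0):
    """Return (name, confidence) pairs at or above min_confidence, highest first."""
    labels = [
        (label["Name"], label["Confidence"])
        for label in response.get("Labels", [])
        if label["Confidence"] >= min_confidence
    ]
    return sorted(labels, key=lambda pair: pair[1], reverse=True)


def detect_labels(path, region=None):
    """Send a local image to Rekognition DetectLabels (needs boto3 and credentials)."""
    import boto3  # lazy import so confident_labels() works offline

    client = boto3.client("rekognition", region_name=region)
    with open(path, "rb") as image:
        return client.detect_labels(Image={"Bytes": image.read()}, MaxLabels=10)
```

Usage, with a hypothetical file name: `confident_labels(detect_labels("pet.jpg"))`.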
Users can also download prediction results in CSV or zip format. This capability enables users to handle large data sets and automate or share further actions based on the exported prediction results.
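Once exported, the CSV results can feed simple automation, such as counting high-confidence detections per label. The column names `label` and `confidence`, and the 0-to-1 confidence scale, are hypothetical assumptions for this sketch; check the header row of an actual Canvas export and adjust the parameters to match.

```python
import csv
import io
from collections import Counter


def count_confident_labels(csv_text, label_col="label", confidence_col="confidence",
                           threshold=0.9):
    """Count rows per label whose confidence is at or above threshold."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):  # header row supplies the keys
        if float(row[confidence_col]) >= threshold:
            counts[row[label_col]] += 1
    return counts
```

To use it on a downloaded file, pass the file contents, for example `count_confident_labels(open("predictions.csv").read())`, where the file name is hypothetical.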
Build new models
Canvas also enables users to build and train custom models using two modes: Quick and Standard. A Quick build can take two to 30 minutes, depending on the model type, such as numeric prediction, categorical prediction, time-series forecasting, image prediction or text prediction.
A Standard build can take two to five hours, but it is expected to produce a more accurate model than a Quick build. Quick mode also limits input data sets to 50,000 entries, which can be records or images, so it isn't suitable for larger data sets.
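The trade-off between the two modes can be summarized as a small decision rule. This Python sketch encodes only the figures given in this article, the 50,000-entry Quick limit and Standard's expected accuracy advantage; the function name and the `need_best_accuracy` flag are hypothetical.

```python
QUICK_BUILD_MAX_ENTRIES = 50_000  # Quick mode limit stated in this article


def choose_build_mode(num_entries, need_best_accuracy=False):
    """Suggest "Quick" or "Standard" for a data set with num_entries records or images."""
    if num_entries <= 0:
        raise ValueError("the data set must contain at least one entry")
    if num_entries > QUICK_BUILD_MAX_ENTRIES or need_best_accuracy:
        return "Standard"  # too large for Quick, or accuracy outweighs build time
    return "Quick"  # fits the limit; minutes instead of hours
```

A common pattern is to iterate with Quick builds while exploring a data set and switch to a Standard build once the setup looks promising.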
To build a new model, go to My models in the left navigation menu, and then click New model.
Then, select the type of model to use as a starting point, based on the problem at hand: predictive analysis, image analysis or text analysis. For this example, I selected Image analysis. The details related to the following steps vary according to the selected model and data set types.
Next, select an existing data set for the model build.
In the case of image analysis, the data set should contain labels that are relevant for the provided images. All images must be labeled, and each label must have at least 25 images assigned to it.
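Before starting a build, both labeling rules can be checked in a script. This sketch assumes the labels are available as a Python dictionary mapping each image file name to its label, or `None` if unlabeled; that representation is hypothetical, not a Canvas format.

```python
from collections import Counter
from typing import Dict, List, Optional

MIN_IMAGES_PER_LABEL = 25  # minimum stated in this article


def validate_image_dataset(labels_by_image: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of problems; an empty list means both rules are satisfied."""
    problems = []
    # Rule 1: every image must have a label.
    unlabeled = sorted(name for name, label in labels_by_image.items() if not label)
    if unlabeled:
        problems.append(f"{len(unlabeled)} unlabeled image(s), e.g. {unlabeled[0]}")
    # Rule 2: each label needs at least MIN_IMAGES_PER_LABEL images.
    counts = Counter(label for label in labels_by_image.values() if label)
    for label, count in sorted(counts.items()):
        if count < MIN_IMAGES_PER_LABEL:
            problems.append(
                f"label {label!r} has {count} image(s); needs {MIN_IMAGES_PER_LABEL}"
            )
    return problems
```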
Once the data set complies with all requirements, users can start the build process. In this case, I selected the Quick build option.
When the model build is complete, Canvas displays a page with a summary of the model's performance, as seen in Figure 17.
At this point, users can run any number of predictions against new or existing data sets.
Share custom models
Canvas also enables users to share custom models in SageMaker Model Registry. Team members can access the models and eventually deploy them in other environments, including production ones.
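Outside Canvas, team members can look up shared models programmatically. This sketch assumes boto3, a hypothetical model package group name, and that someone has approved at least one version. It lists the group with the SageMaker `list_model_packages` call (first page only, for brevity) and picks the most recently created approved version.

```python
from typing import Optional


def latest_approved_arn(summaries) -> Optional[str]:
    """Return the ARN of the newest model package whose status is Approved."""
    approved = [s for s in summaries if s.get("ModelApprovalStatus") == "Approved"]
    if not approved:
        return None
    # boto3 returns CreationTime as a datetime, so max() picks the newest version.
    return max(approved, key=lambda s: s["CreationTime"])["ModelPackageArn"]


def fetch_model_packages(group_name):
    """Return the first page of ModelPackageSummaryList (needs boto3 and credentials)."""
    import boto3  # lazy import so latest_approved_arn() works offline

    sm = boto3.client("sagemaker")
    response = sm.list_model_packages(ModelPackageGroupName=group_name)
    return response["ModelPackageSummaryList"]
```

Usage, with a hypothetical group name: `latest_approved_arn(fetch_model_packages("canvas-pet-classifier"))`.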
Encourage team members to log out of Canvas once they complete their tasks, because Canvas costs $1.90 per hour per active session, or $1,368 per month if a session stays open 24 hours a day for 30 days.
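The monthly figure corresponds to 720 hours of active session time. The arithmetic below is a sketch that hardcodes the $1.90 rate quoted in this article (check current AWS pricing before relying on it). It works in cents to avoid floating-point drift and shows why logging out matters: a session used 8 hours a day on 22 workdays (176 hours) costs $334.40 instead of $1,368.

```python
SESSION_RATE_CENTS = 190  # $1.90 per hour per active session, as quoted above


def session_cost(hours):
    """Return the session charge in dollars for the given number of active hours."""
    return round(SESSION_RATE_CENTS * hours / 100, 2)


print(session_cost(24 * 30))  # always-on session for 30 days: 1368.0
print(session_cost(8 * 22))   # 8 hours a day, 22 workdays: 334.4
```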
Who should use SageMaker Canvas?
SageMaker Canvas is a great visual tool for building, training, sharing and making predictions using ML models, but it does require a significant amount of manual work. It's a valuable way for nontechnical users to familiarize themselves with ML models. It's also useful for quickly evaluating existing models and building prototypes.
For long-term production deployments and high-frequency, high-volume predictions, however, Canvas isn't necessarily the recommended tool. In those cases, it's preferable to trigger the required ML steps in SageMaker using automation tools outside of the Canvas interface, such as custom scripts and application components. That said, any team that uses SageMaker would benefit from getting familiar with the Canvas interface and functionality.
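As one example of triggering predictions outside Canvas, a script can send records to a model that has been deployed to a SageMaker real-time endpoint. This sketch assumes boto3, a hypothetical endpoint name, and a model that accepts CSV rows with `text/csv` content. It batches records so that each `invoke_endpoint` call carries several rows instead of one.

```python
def csv_batches(rows, batch_size=100):
    """Yield CSV payloads, each containing up to batch_size rows."""
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        yield "\n".join(",".join(str(value) for value in row) for row in chunk)


def predict_all(endpoint_name, rows, batch_size=100):
    """Send every batch to the endpoint and collect the raw responses (needs boto3)."""
    import boto3  # lazy import so csv_batches() works offline

    runtime = boto3.client("sagemaker-runtime")
    results = []
    for body in csv_batches(rows, batch_size):
        response = runtime.invoke_endpoint(
            EndpointName=endpoint_name, ContentType="text/csv", Body=body
        )
        results.append(response["Body"].read().decode("utf-8"))  # Body is a stream
    return results
```

A script like this can run from a scheduler or an application component, which avoids keeping a Canvas session open for routine predictions.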