Build an automated image processing pipeline with AWS tools

In this guide, learn how to automate image processing with AWS Step Functions. Build an image processing pipeline with example code, then explore ways to expand your new workflow.

Your organization is rolling out a new application, and as a requirement for launch, users must be able to upload an image. The images will be stored in an Amazon S3 bucket and shown publicly on a single-page application with your company's branding.

The launch date for this new application is right around the corner, but a tester has found a problem: There is absolutely no filtering on the kinds of images users can upload, meaning that someone could post objectionable content right next to your company's logo. You now face a choice between telling shareholders you need to delay the application's launch or risking a PR nightmare.

To address the problem, you need a way to detect inappropriate content in uploaded images and remove problematic uploads before they reach your front page. Fortunately, using AWS Step Functions, you can quickly set up an automated image processing pipeline that scales as fast as your users do.

What are Step Functions in AWS?

AWS Step Functions is a service for orchestrating serverless workflows and passing stateful data between various Amazon services and APIs.

Workflows are stored in a state machine: a workflow where stateful data is passed from one component to another in JSON format. State machines can have simple instructions, in which values are passed to different services, or complex branching logic containing retries, conditional statements and parallel processing.

You can configure workflows either through a graphical drag-and-drop interface or in a JSON document written in Amazon States Language. The automated image processing pipeline you'll build in this tutorial uses only two services, but you can add as many others as you like.

Diagram of a state machine for automated image processing built using AWS Step Functions.
Figure 1. After completing this tutorial, your finished product will resemble the above diagram.

Other AWS tools used in this tutorial

Before you get started, familiarize yourself with a few of the other Amazon services that you'll be using in this tutorial:

  • S3. With the Amazon S3 storage service, users can put objects into buckets and serve those objects as static content. S3 can also be configured to send an EventBridge notification when a new object is added.
  • Rekognition. This managed computer vision service analyzes images and video. The Rekognition API lets users scan images for potentially objectionable content, such as nudity or violence.

This is by no means a complete list -- you can add additional services and tasks to this state machine as you see fit.

Prerequisites for building an image processing pipeline with Step Functions

This tutorial assumes that you've set up a state machine to consume an EventBridge notification, which S3 will send after a new object is added to an existing bucket. If you haven't, you can find instructions to do so in the AWS Step Functions documentation.

If you've set up your state machine correctly, every time you upload a new object to your S3 bucket, the state machine will start with a JSON object similar to the following one as input for your first step.

{
  "version": "0",
  "id": "768de0a0-cefe-2da5-e520-70a2def70fa8",
  "detail-type": "Object Created",
  "source": "aws.s3",
  "account": "123456789012",
  "time": "2023-02-20T02:06:02Z",
  "region": "us-east-1",
  "resources": [
    "arn:aws:s3:::sfn-demo-bucket"
  ],
  "detail": {
    "version": "0",
    "bucket": {
      "name": "sfn-demo-bucket"
    },
    "object": {
      "key": "path/to/new/object",
      "size": 182239,
      "etag": "786b38bc6194f30ea41dadef4ab8d956",
      "sequencer": "0A3E38E9F8063F2D58"
    },
    "request-id": "TSEVVAR6EBREWQB5",
    "requester": "123456789012",
    "source-ip-address": "123.456.789.012",
    "reason": "PutObject"
  }
}
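
To make the shape of this event concrete, here is a minimal Python sketch that pulls out the two values the rest of the pipeline relies on: the bucket name and the object key. The function name extract_s3_location and the trimmed sample_event are illustrative assumptions, not part of Step Functions or any AWS SDK.

```python
from typing import Tuple


def extract_s3_location(event: dict) -> Tuple[str, str]:
    """Return (bucket, key) from an S3 "Object Created" EventBridge event."""
    detail = event["detail"]
    return detail["bucket"]["name"], detail["object"]["key"]


# Trimmed copy of the sample event shown above (only the fields used here).
sample_event = {
    "detail-type": "Object Created",
    "source": "aws.s3",
    "detail": {
        "bucket": {"name": "sfn-demo-bucket"},
        "object": {"key": "path/to/new/object", "size": 182239},
    },
}

bucket, key = extract_s3_location(sample_event)
print(bucket, key)
```

These are the same two values that the state machine reads later through the JSON paths $.detail.bucket.name and $.detail.object.key.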

Next, clear any steps in your state machine. Now you're ready to make a new workflow by editing your state machine in the Workflow Studio, where you'll do most of the configuration for this image processing pipeline.

Set up a state machine to perform automated image processing

In Workflow Studio, search for DetectModerationLabels in the left-hand panel showing available services, and add it to your workflow. This is the Rekognition API action that detects objectionable content in images. It returns a JSON object whose ModerationLabels array lists anything it found, and it represents the first step of your content moderation pipeline.

After adding this step, click on it to show its properties in the pane on the right-hand side of Workflow Studio. Next, you'll change several fields, leaving the rest set to their defaults.

1. Configure API parameters

Under the Configuration tab, replace the API Parameters field with the following code block.

{
  "Image": {
    "S3Object": {
      "Bucket.$": "$.detail.bucket.name",
      "Name.$": "$.detail.object.key"
    }
  }
}

After completing this step, your API Parameters field should mirror the screenshot shown in Figure 2.

Screenshot of Configuration tab in AWS Step Functions, showing the correct settings for the DetectModerationLabels API parameters.
Figure 2. Configuring parameters for DetectModerationLabels API.

The .$ notation at the end of the Bucket and Name parameter names tells Step Functions that the value is a JSON path to resolve against the step's input rather than a literal string. It is required for any value taken from the state passed in from a previous step -- in this case, the EventBridge notification from S3.

The values themselves, which start with $., are the JSON paths to the fields to read from that input object. You can find a complete explanation of the parameters for the Rekognition DetectModerationLabels API in the documentation.
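
The substitution can be easier to follow in code. The Python sketch below imitates, in a deliberately simplified way, how parameter names ending in .$ are swapped for the values their JSON paths point to. It is not how Step Functions is implemented: get_path and resolve_parameters are hypothetical helpers, and they handle only plain dotted paths such as $.detail.bucket.name.

```python
def get_path(state: dict, path: str):
    """Resolve a simple dotted JSON path such as "$.detail.bucket.name"."""
    if path != "$" and not path.startswith("$."):
        raise ValueError(f"unsupported JSON path: {path!r}")
    value = state
    for part in path[2:].split("."):
        if part:  # "$" on its own refers to the whole state
            value = value[part]
    return value


def resolve_parameters(params: dict, state: dict) -> dict:
    """Replace each "Name.$": "<path>" entry with "Name": <value at path>."""
    resolved = {}
    for name, value in params.items():
        if name.endswith(".$"):
            resolved[name[:-2]] = get_path(state, value)
        elif isinstance(value, dict):
            resolved[name] = resolve_parameters(value, state)
        else:
            resolved[name] = value  # literal value, passed through unchanged
    return resolved


state_input = {
    "detail": {
        "bucket": {"name": "sfn-demo-bucket"},
        "object": {"key": "path/to/new/object"},
    }
}
api_parameters = {
    "Image": {
        "S3Object": {
            "Bucket.$": "$.detail.bucket.name",
            "Name.$": "$.detail.object.key",
        }
    }
}

request = resolve_parameters(api_parameters, state_input)
print(request)
```

The resolved request contains plain Bucket and Name keys, which is the form the DetectModerationLabels API expects.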

2. Enable the state machine to access original JSON state and detected content labels

Under the Output tab, check the box labeled "Add original input to output using ResultPath," select "Combine original input with result" from the drop-down menu, and enter $.result in the box. After completing this step, the values should match the screenshot shown in Figure 3.

Screenshot of Output tab in AWS Step Functions, showing the correct settings for the DetectModerationLabels API response.
Figure 3. Defining output behavior for DetectModerationLabels API.

With this box checked, the state machine keeps the step's original JSON input and adds the step's results to it under a new field: result. This gives the next step in the state machine access to the original JSON state as well as the content labels detected by the Rekognition API.
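
As a rough model of this setting, the Python sketch below merges a step's result into a copy of its input at the location named by the result path. The helper apply_result_path is a hypothetical illustration that handles only simple $.field paths, and the empty ModerationLabels result is made-up sample data.

```python
import copy


def apply_result_path(state_input: dict, result, result_path: str = "$.result") -> dict:
    """Return a copy of state_input with result stored at result_path."""
    if not result_path.startswith("$."):
        raise ValueError("this sketch only handles paths of the form $.field")
    keys = result_path[2:].split(".")
    output = copy.deepcopy(state_input)  # leave the caller's input untouched
    target = output
    for key in keys[:-1]:
        target = target.setdefault(key, {})
    target[keys[-1]] = result
    return output


step_input = {
    "detail": {
        "bucket": {"name": "sfn-demo-bucket"},
        "object": {"key": "path/to/new/object"},
    }
}
rekognition_result = {"ModerationLabels": []}  # made-up result for a clean image

step_output = apply_result_path(step_input, rekognition_result, "$.result")
print(sorted(step_output))
```

The output keeps the detail field from the S3 event and gains a result field, so later steps can read both.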

3. Analyze images and define system behavior for objectionable content

From here, your state machine will need to make a choice. If the Rekognition API finds anything problematic in the image, you'll remove it from S3. If not, it's cleared to be posted on the site, and no further action is needed.

In a state machine, you can set up this behavior with a flow step. Navigate to the Flow tab on the left-hand side and drag the Choice step under the DetectModerationLabels step. Choice steps are similar to if-else blocks in other programming languages: You set rules to check and then run the steps under that rule. If no rules evaluate to True, the default rule will run.

For this workflow, because you want to leave the image alone if no objectionable content is found, the default rule will be to take no action. To add this behavior, return to the Flow tab in the left-hand panel and drag the Success step under the Default rule path.

Next, set up a rule to follow when objectionable content is detected. Click on the Choice step. Under the Configuration tab, click on the Edit button next to Rule #1, then choose Edit conditions.

Enter the following line of code as a condition:

$.result.ModerationLabels[0] is present

Then select DeleteObject from the drop-down menu to match the screenshot shown in Figure 4, and save the rule.

Screenshot of Choice Rules in AWS Step Functions, showing the correct settings for Rule #1 in the Choice step.
Figure 4. Setting up rules to handle objectionable vs. acceptable images with a Choice step.

Because you configured the DetectModerationLabels step to store the Rekognition output under the result field, the state now contains an array at $.result.ModerationLabels. If Rekognition finds any questionable labels, it adds them to that array along with details and confidence scores; otherwise, the array is empty. Checking whether the array has a first element therefore tells you whether anything was flagged.
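
The rule's logic can be sketched in a few lines of Python. The functions below are hypothetical stand-ins for the Choice step, the branch names assume the steps are called DeleteObject and Success, and the label shown is made-up sample data in the general shape Rekognition returns (a name plus a confidence score).

```python
def first_label_is_present(state: dict) -> bool:
    """Mirror the rule `$.result.ModerationLabels[0] is present`."""
    labels = state.get("result", {}).get("ModerationLabels", [])
    return len(labels) > 0


def next_state(state: dict) -> str:
    """Pick the branch the Choice step would follow."""
    return "DeleteObject" if first_label_is_present(state) else "Success"


# Made-up examples: one flagged image and one clean image.
flagged = {"result": {"ModerationLabels": [{"Name": "Violence", "Confidence": 97.5}]}}
clean = {"result": {"ModerationLabels": []}}

print(next_state(flagged))
print(next_state(clean))
```

Note that this rule reacts to any label at all, regardless of its confidence score.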

With the rule set up, search for the S3 DeleteObject action in the left-hand panel, and drag it under your new rule in the Choice step. As with the DetectModerationLabels step, you'll need to configure the API parameters for the DeleteObject step using the following code.

{
  "Bucket.$": "$.detail.bucket.name",
  "Key.$": "$.detail.object.key"
}

Your results should resemble the screenshot shown in Figure 5.

Screenshot showing the correct settings for the DeleteObject API parameters in AWS Step Functions.
Figure 5. Configuring parameters for the DeleteObject API.

Once that's done, grab a Fail step from the Flow tab and add it under the DeleteObject step.
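
For reference, the workflow assembled in the steps above might correspond to a definition along the lines of the following sketch, written as a Python dictionary and serialized to JSON. Treat it as an approximation rather than the exact output of Workflow Studio: the state names, the Comment text and the two Resource ARNs (which follow the AWS SDK integration pattern) are assumptions to verify against the definition generated for your own state machine.

```python
import json

# Shared JSON paths into the S3 "Object Created" event.
BUCKET_PATH = "$.detail.bucket.name"
KEY_PATH = "$.detail.object.key"

state_machine = {
    "Comment": "Moderate images uploaded to S3",  # assumed description
    "StartAt": "DetectModerationLabels",
    "States": {
        "DetectModerationLabels": {
            "Type": "Task",
            # Assumed ARN, following the AWS SDK integration naming pattern.
            "Resource": "arn:aws:states:::aws-sdk:rekognition:detectModerationLabels",
            "Parameters": {
                "Image": {"S3Object": {"Bucket.$": BUCKET_PATH, "Name.$": KEY_PATH}}
            },
            "ResultPath": "$.result",  # keep the input, add the labels under result
            "Next": "Choice",
        },
        "Choice": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.result.ModerationLabels[0]",
                    "IsPresent": True,
                    "Next": "DeleteObject",
                }
            ],
            "Default": "Success",
        },
        "DeleteObject": {
            "Type": "Task",
            # Assumed ARN, following the AWS SDK integration naming pattern.
            "Resource": "arn:aws:states:::aws-sdk:s3:deleteObject",
            "Parameters": {"Bucket.$": BUCKET_PATH, "Key.$": KEY_PATH},
            "Next": "Fail",
        },
        "Fail": {"Type": "Fail"},
        "Success": {"Type": "Succeed"},
    },
}

asl_json = json.dumps(state_machine, indent=2)
print(asl_json)
```

Comparing this output with the code view of your own state machine is a quick way to spot a missing ResultPath or a mistyped JSON path.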

Testing and enhancing your automated image processing pipeline

Now you've set up a state machine that scans images uploaded to S3 for graphic, explicit or otherwise unwanted content -- and automatically deletes any that are flagged.

Next, try adding some images to the bucket and watch each one go through the state machine. Each execution shows which steps ran, along with the JSON input and output of each step, which simplifies troubleshooting.
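
If you'd rather re-run the workflow against an object that is already in the bucket, one option is to start an execution yourself with a hand-built input. The sketch below assumes a boto3 Step Functions client is passed in, and the helper names and the cut-down event are illustrative. The event includes only the fields this workflow reads, so it is not a full EventBridge notification.

```python
import json


def build_object_created_event(bucket: str, key: str) -> dict:
    """Build a minimal stand-in for the S3 "Object Created" event."""
    return {
        "detail-type": "Object Created",
        "source": "aws.s3",
        "detail": {"bucket": {"name": bucket}, "object": {"key": key}},
    }


def start_test_execution(sfn_client, state_machine_arn: str, bucket: str, key: str) -> str:
    """Start one execution for an existing object and return its execution ARN."""
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(build_object_created_event(bucket, key)),
    )
    return response["executionArn"]


# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   arn = start_test_execution(
#       boto3.client("stepfunctions"),
#       "<your state machine ARN>",
#       "sfn-demo-bucket",
#       "path/to/new/object",
#   )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real account.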

Step Functions is a simple but powerful tool for automating your AWS workloads. From no-code workflows that call existing APIs to workflows that invoke custom scripts, it can benefit many organizations. For example, you could expand the above workflow with enhancements such as the following:

  • Recording offending IP addresses and file names before deleting them, then blocking uploads from IPs that have previously uploaded offensive content.
  • Using another Rekognition API to detect content that might be useful for analytics and store it for analysis.
  • Using AWS Batch to train a machine learning model from new data ingested through Step Functions.
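
As a starting point for the first enhancement, the following Python sketch shows one way to assemble a record of an offending upload from the state available just before the DeleteObject step. The function name, the record's field names and the sample values are all assumptions, and where the record is stored and how blocking is enforced are left open.

```python
def build_offense_record(state: dict) -> dict:
    """Collect the uploader's IP, the object location and the labels found."""
    detail = state["detail"]
    return {
        "source_ip": detail.get("source-ip-address"),
        "bucket": detail["bucket"]["name"],
        "key": detail["object"]["key"],
        "labels": [label["Name"] for label in state["result"]["ModerationLabels"]],
    }


# Made-up state resembling what the Choice step sees for a flagged image.
flagged_state = {
    "detail": {
        "bucket": {"name": "sfn-demo-bucket"},
        "object": {"key": "path/to/new/object"},
        "source-ip-address": "192.0.2.10",
    },
    "result": {"ModerationLabels": [{"Name": "Violence", "Confidence": 97.5}]},
}

record = build_offense_record(flagged_state)
print(record)
```

A function like this could run in its own step placed between the Choice step and DeleteObject, so the details are captured before the image is removed.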
