
Assent Compliance automates text analytics with AWS

With more than 20,000 documents to review each month, Assent Compliance, a supply chain data management vendor, turned to AWS to automate its text analysis operations.

In dire need of a way to analyze documents for key data, Assent Compliance turned to AWS' text analytics tools for help.

Assent Compliance is a supply chain data management company founded in 2005 and based in Ottawa.

In particular, the vendor helps companies collect and manage the supply chain data they need to mitigate risk associated with complying with constantly changing global regulations. Among them are corporate social responsibility guidelines, product compliance regulations and vendor management protocols.

Among its many customers are aerospace and defense companies, including Spirit AeroSystems and GE Aviation; medical companies, like Johnson & Johnson and Thermo Fisher Scientific; and retail giants, such as Ralph Lauren and Urban Outfitters.

Each industry has its own guidelines, and for each of those guidelines, Assent's customers file copious amounts of paperwork.

In all, Assent collects more than 20,000 information requests per month from its customers, according to Sandeep Mistry, senior machine learning engineer at Assent.

Managing that volume is onerous. Even with 250 employees working on information requests, Assent could not analyze the text in every document quickly without text analytics tools that automate the process.

Such tools analyze all the text in each request and help structure the data in each document for record-keeping and even potential data modeling. They can determine what subject a document references, what information it includes and what necessary information may be missing -- all in mere seconds.

Assent Compliance senior machine learning engineer Sandeep Mistry talks about text analytics during AWS re:Invent.

"We can't have humans read them all, so [we] wanted to apply machine learning to help aid this process," Mistry said.

The AWS text analytics stack consists of Amazon Textract, Amazon Comprehend and Amazon Augmented AI (A2I), each of which is a managed service with pre-built APIs, so users do not have to build or train their own models to get started.

Textract, first introduced in 2018, is a fully managed machine learning service that automatically extracts text from documents. Comprehend, introduced in 2017, is a natural language processing service that automatically discovers relationships and insights in unstructured text. Finally, A2I, made generally available in April 2020, is a service that enables users to add human review of low-confidence machine learning predictions so errors can be identified and models can be improved.

"We've identified the problem of understanding documents into three steps," Mona Mona, AI/machine learning solutions specialist at AWS, said during a session at the virtual AWS re:Invent conference on Jan. 13. "There's extracting text from documents, then getting insights from this extracted text, and lastly having human oversight between the processes or after the process."

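The three steps Mona describes can be sketched in a few lines of Python. This is a minimal illustration, not Assent's or AWS' implementation: the sample dictionaries are hand-written to mimic the response shapes of Textract's DetectDocumentText and Comprehend's DetectEntities operations, and the function names and the 0.80 review threshold are assumptions made for the example.

```python
# Illustrative sketch only. The dicts below imitate the shape of Textract
# DetectDocumentText and Comprehend DetectEntities responses; function names
# and the threshold are hypothetical.

REVIEW_THRESHOLD = 0.80  # assumed cutoff, not a value from AWS or Assent


def extract_lines(textract_response):
    """Step 1: collect the text of every LINE block in a Textract-style response."""
    return [
        block["Text"]
        for block in textract_response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def split_by_confidence(comprehend_response, threshold=REVIEW_THRESHOLD):
    """Steps 2 and 3: keep confident entities, queue the rest for human review."""
    accepted, needs_review = [], []
    for entity in comprehend_response.get("Entities", []):
        target = accepted if entity["Score"] >= threshold else needs_review
        target.append((entity["Type"], entity["Text"]))
    return accepted, needs_review


# Hand-written sample responses standing in for real service output.
sample_textract = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Example Supplier Inc."},
        {"BlockType": "LINE", "Text": "REACH declaration"},
        {"BlockType": "WORD", "Text": "REACH"},
    ]
}
sample_comprehend = {
    "Entities": [
        {"Type": "ORGANIZATION", "Text": "Example Supplier Inc.", "Score": 0.97},
        {"Type": "DATE", "Text": "2020", "Score": 0.55},
    ]
}

lines = extract_lines(sample_textract)
accepted, needs_review = split_by_confidence(sample_comprehend)
```

In a live setup, the two sample dictionaries would presumably come from calls such as `boto3.client("textract").detect_document_text(...)` and `boto3.client("comprehend").detect_entities(Text=..., LanguageCode="en")`, with the `needs_review` items handed to a human review service such as A2I.
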
The initial text analytics workflow Assent built using AWS tools was designed to do intelligent document analysis related to environmental regulations, including the Restriction of Hazardous Substances Directive in the European Union; Registration, Evaluation, Authorization and Restriction of Chemicals (REACH) in the European Union; Section 1502 of the Dodd-Frank Act, which deals with conflict minerals in the United States; and Proposition 65 in California, which contains a list of chemicals deemed to cause cancer or reproductive toxicity.

Using a document from Laird Technologies related to REACH compliance for a series of parts, Mistry demonstrated how AWS was able to extract information about the company submitting the form.

That information included the company's name; contact information of the person who wrote the form; legislative reference to REACH; date of reference so Assent could apply the proper version of REACH to the request; a list of the parts and products that need to adhere to REACH; a signature; and a date for the document that could be different from the date of reference contained within.

Using a text analytics workflow that includes Amazon Textract and Amazon Comprehend, Mistry was able to drag and drop five PDFs and receive a summary of what was contained in the documents -- including reference to REACH and information about the parts -- within 15 seconds.

"Humans can only process one document at a time, but we've optimized our step function workflow to process pages in parallel," Mistry said. "If a document has three pages, we can process all of these at the same time to decrease latency."

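The idea of processing pages in parallel can be illustrated locally with Python's standard thread pool. This is only a stand-in for the concept, not the Step Functions workflow Mistry describes; `analyze_page` is a hypothetical per-page step that a real system would replace with service calls.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_page(page_text):
    """Hypothetical per-page step; a real one would call a text analytics service."""
    return {"characters": len(page_text), "mentions_reach": "REACH" in page_text}


def analyze_document(pages, max_workers=4):
    """Analyze all pages concurrently and return the results in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, even if pages finish out of order.
        return list(pool.map(analyze_page, pages))


# A three-page document, as in Mistry's example; the page text is made up.
pages = ["Supplier declaration", "Subject: REACH compliance", "Signature page"]
results = analyze_document(pages)
```

Because each page is independent, total latency approaches the time of the slowest page rather than the sum of all pages.
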
That summary then enabled Mistry to act on the documents if any key information was missing and, once all information was present, move the documents into Assent's data warehouse, where they can be stored and accessed for analysis that will help customers better manage their supply chains.
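
A completeness check of this kind could look like the following sketch. The field names are assumptions derived from the items listed in the REACH example, and the two action labels are hypothetical; nothing here reflects Assent's actual schema or warehouse.

```python
# Hypothetical field names based on the items in the REACH example.
REQUIRED_FIELDS = (
    "company_name",
    "contact",
    "legislative_reference",
    "reference_date",
    "parts",
    "signature",
    "document_date",
)


def missing_fields(summary):
    """Return the required fields that are absent or empty in a document summary."""
    return [field for field in REQUIRED_FIELDS if not summary.get(field)]


def next_action(summary):
    """Decide whether a document needs follow-up or is ready to be stored."""
    missing = missing_fields(summary)
    if missing:
        return ("follow_up", missing)
    return ("load_to_warehouse", [])


# Made-up summary with the signature and document date not yet extracted.
incomplete = {
    "company_name": "Example Supplier Inc.",
    "contact": "jane.doe@example.com",
    "legislative_reference": "REACH",
    "reference_date": "2020-06-25",
    "parts": ["PART-001", "PART-002"],
}
action, missing = next_action(incomplete)
```

Here the incomplete summary is routed to follow-up with the two missing fields listed, while a summary with every field filled in would be marked ready to load.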

"We're very happy with our partnership with AWS," Mistry said, noting that the project to build Assent's text analytics workflow started with just two people but, after its initial success, now includes a much larger team.

He added that Assent plans to add more AWS capabilities that will assist the company's text analytics operations, including Amazon A2I.
