Alation adds automation to deal with exploding data volume

The data catalog specialist's Workflow Automation provides customers with bots and prebuilt workflows that aid data stewardship by automating repetitive data management tasks.

Alation on Monday unveiled Workflow Automation, a set of bots and prebuilt workflows designed to automate repetitive data management tasks as data volumes grow beyond what data stewards can handle.

Among other bots, the suite includes one that automatically fills in metadata titles and descriptions as data is ingested. Another adds security classifications and ensures that data adheres to policy updates. The prebuilt workflows, meanwhile, can be customized to meet the needs of different users.

Given that Workflow Automation addresses a specific problem, it is a significant set of tools for Alation customers, according to Donald Farmer, founder and principal of TreeHive Strategy.

Overall, worldwide data generation is increasing at an exponential rate. At the same time, the amount of data that individual organizations collect is both growing rapidly and becoming more complex.

As a result, those responsible for assigning characteristics to data to make it discoverable are often unable to keep pace without the assistance of automation.

"Workflow Automation is an important addition to catalog and governance platforms like Alation," Farmer said. "As data volumes grow and environments become more complex, manually governing metadata, classifications and policies across many data sources becomes untenable."

Data teams are often tasked with overseeing hundreds of data sources, he continued. For some large organizations, data sources number in the thousands. Manually organizing and overseeing that much data quickly becomes difficult.

"Automating repetitive and, often, tedious data stewardship tasks improves efficiency, consistency and compliance at scale while reducing errors," Farmer said. "It allows data teams to manage more data sources. Perhaps not revolutionary, it's a significant evolutionary step to more scalable data governance."

Based in Redwood City, Calif., Alation is a data catalog specialist whose peers include Atlan and Collibra.

Using the vendor's Data Intelligence Platform, customers can connect data from various sources to create data sets that can train models and inform other data products. In addition, Alation's data catalog enables users to organize data products themselves so they can be reused to fuel decisions.

Beyond its data catalog, Alation offers metadata management so that customers can trace the lineage of their data and ensure its quality.

New capabilities

Data stewards can only do so much.

Their role is to manage their organization's data and data products so they are high quality and can easily be found and operationalized to inform decisions.

Before the advent of the cloud, organizations collected data from a small number of sources. That data was stored on premises, and all of it was structured.

Data stewardship, as a result, was relatively straightforward.

In recent years, however, as many enterprises have migrated their data operations to the cloud, they have expanded the number of repositories -- data warehouses, data lakes, data lakehouses and databases -- they use for storage. Simultaneously, they have increased the number of sources from which they collect data, adding web pages and mobile devices, among others.

Further complicating matters is that organizations now collect unstructured data such as text, images and audio files in addition to the structured data that has historically been used to inform business intelligence. That unstructured data needs to be transformed to give it structure so it can be operationalized to inform models, dashboards and other analytics assets.

Making sure that the metadata for potentially billions of data points is correct, and that every one of an organization's data points is high quality, is too much for even a team of humans to handle.

Alation is attempting to help with the launch of Workflow Automation. It is not only aimed at giving data stewards a measure of relief from manual tasks but also at providing more accurate data classification than overworked and overwhelmed people can.

Stewart Bond, an analyst at IDC, noted that modern data is highly distributed, meaning data comes in from various systems, gets stored in various systems, takes on various forms and is constantly changing. Similarly, the metadata that describes and defines data is highly distributed.

"This reality makes the job of the data steward daunting and difficult to keep on top of all the changes and movement of data being managed by the organization," Bond said.

Anything that can assist data stewards as they attempt to manage their organization's data is significant, he continued.

"These new Alation bots will provide data stewards with more automation to not only keep up with the changes but also … improve the quality and relevancy of the data intelligence maintained by the catalog, resulting in improved data governance, analytics and -- ultimately -- data-driven decisions," Bond said.

Workflow Automation was built using a combination of Python, the open source Django development framework, Docker and Amazon EC2, according to Junaid Saiyed, Alation's chief technology officer. The suite includes the following capabilities:

  • Completeness Bot, a tool that accelerates data curation and subsequent operationalization by filling in metadata titles and descriptions so that data can be categorized into domains. In addition, Completeness Bot identifies and corrects missing and incorrect metadata.
  • Compliance Bot, a feature that aims to improve data consistency and regulatory compliance by assigning security classifications and ensuring that data adheres to policy updates.
  • Current Content Bot, a tool that automatically checks data to make sure it is current and adheres to the latest standards and regulations.
  • Metadata Management Bot, a feature designed to help data stewards and other administrators curate content in their data catalog by alerting them when new metadata, such as tables and columns, are added to key databases.
  • Custom Workflows, which are prebuilt workflows that include links to various activities required to complete a given task and can be customized to suit the needs of individual users.

Collectively, the bots reduce the manual labor required of data stewards, freeing them up to do more in-depth work than what amounts to data entry. In addition, they work together to make data available for analysis more quickly than is possible when humans are involved.

Individually, the Completeness and Compliance Bots stand out as having the biggest potential impact for data stewards, according to Farmer.

"The Compliance Bot substantially frees up data stewards and reduces risk," he said. "But I'm also intrigued by the Completeness Bot. Filling in missing metadata is really tedious, and because it is tedious, it becomes error prone. But it's important for business users downstream to understand the data. Automating this allows much faster curation and time-to-value for data products."

Collibra and Atlan, perhaps Alation's closest competitors, also offer some automated capabilities, but not on the scale of those now provided by Alation, Farmer continued.

For example, Collibra has automated data lineage and Atlan offers users metadata suggestions, he noted.

Beyond helping existing data stewards, the bots have the potential to make it easier to use Alation's data catalog, according to Bond.

That could lead to more people within organizations using the vendor's tools and, in turn, contributing additional information and expertise to their organization's data catalog.

"These bots help address the top issue data catalog software faces in the customer environment, [which is] adoption," Bond said. "The more that people in the organization … contribute knowledge to the catalog, the more valuable it becomes. These bots will make the platform easier to use, lowering barriers to adoption and improving the value of the platform."

While Workflow Automation directly addresses the problem of growing data volumes, the motivation for its development was a combination of customer feedback and Alation's own focus on providing users with what it calls active governance features, according to Saiyed.

Data curation, in particular, was identified as a labor-intensive process that could be helped by automation.

"As customers [catalog and curate] a growing volume of data, they clearly need more efficient tools to assist with these tasks," Saiyed said. "This solution directly responds to the dual drivers of customer demand and Alation's commitment to empowering users with tools that facilitate more effective data governance and curation."

Next steps

With Workflow Automation now part of the Data Intelligence Platform, Alation has a broad roadmap for future development, according to Saiyed.

The vendor's areas of focus include AI governance to help customers prepare for broader use of AI, adding capabilities to make the Data Intelligence Platform more scalable, improving the user experience within the platform and continuing to expand its partner ecosystem.

"These efforts aim to empower business and data leaders with robust tools to measure and maximize value," Saiyed said.

Farmer, meanwhile, said that while Workflow Automation provides more automation capabilities than those offered by Alation's closest competitors, there's still room for the vendor to add more.

One potential area to address with more automation is data quality, which is imperative to the accuracy of any data model or application. Another is proactive recommendations.

"I'd be interested to see how these automation capabilities could proactively monitor data quality and integrity over time. Could bots identify potential data issues and alert stewards?" Farmer said. "Finally, providing benchmarks and recommendations on where to apply automation for the most significant impact would help guide customers."

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.