How automated metadata management improves business insights

Automating metadata management can cut down time spent on tasks such as data tagging and cataloging. Explore how automated metadata management is improving data quality.

Studies have consistently shown that data-driven organizations outperform their less data-driven counterparts, with data-fueled programs improving everything from customer relations to operations to forecasting.

Yet leaders at many organizations still struggle with making their data programs efficient and effective, and executives are reporting as much. The "2021 Big Data and AI Executive Survey" from NewVantage Partners found that 99% of the responding C-level executives are investing in data initiatives, with 96% saying they've had measurable business outcomes from those data programs.

But according to the NewVantage Partners survey, a mere 24% said they've created a data-driven organization, only 24.2% said they've forged a data culture, and just 39.3% said they're managing data as a business asset.

The reasons for such struggles are varied.

Problems with data governance and, in particular, metadata management, are certainly factors, according to analysts, executives and management advisors.

Consider statistics from Gartner, the tech research and advisory firm: It reported that its clients spend 90% or more of their time preparing data for advanced analytics, data science and data engineering.

"A large part of that effort is spent addressing inadequate (missing or erroneous) metadata or inferring missing metadata," according to the Gartner report "The State of Metadata Management: Data Management Solutions Must Become Augmented Metadata Platforms."

Strong metadata management is essential to both good data governance and an effective data program, but it is no easy task. The volume of work involved with metadata management requires automation -- and ultimately intelligence -- to yield the greatest benefits.

Augmented and automated metadata management can cut down on time spent on data tagging, cataloging and linking, while also providing intelligence that helps uncover insights and connections that would be difficult to identify otherwise.

Automation also helps solve problems of scale that weren't an issue 10 or 20 years ago when the number of data sources was much smaller, said Max Martynov, CTO of IT services firm Grid Dynamics.

"Now there's a larger range of data and more types of data -- streaming, images, videos, voice and text, structured and unstructured," Martynov said. "There are more data sources, more data, it's faster, it's changing faster. It has become much harder to manage everything manually. You need some automation."

What is metadata management?

Metadata is often described as data about data. While that is true, the definition isn't particularly informative.

More specifically, metadata is information that describes and gives context to data so that the data itself can be better understood. It's something like the who, what, when and where behind data collection.

Typical metadata elements include title and description, tags and categories, who created or modified it, when it was created or modified and who can access it.

Metadata can also be categorized into different buckets, with administrative, descriptive and structural being three common categories.

Administrative metadata covers who owns the data, any restrictions on the data, and how the data can be used. Descriptive metadata covers details on who created the data and what it contains. Structural metadata describes how the data set is organized.
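To make the three categories concrete, here is a minimal Python sketch of how a metadata record for a single data set might be modeled. It is an illustration only, not a standard schema or any vendor's format: the class names, field names and the missing_fields helper are all assumptions chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class AdministrativeMetadata:
    # Who owns the data, restrictions on it, and how it may be used.
    owner: str
    restrictions: list[str] = field(default_factory=list)
    permitted_uses: list[str] = field(default_factory=list)


@dataclass
class DescriptiveMetadata:
    # Who created the data and what it contains.
    title: str
    description: str
    created_by: str
    created_on: date
    tags: list[str] = field(default_factory=list)


@dataclass
class StructuralMetadata:
    # How the data set is organized: column name -> data type.
    columns: dict[str, str] = field(default_factory=dict)


@dataclass
class DatasetMetadata:
    # Hypothetical container grouping the three categories for one data set.
    administrative: AdministrativeMetadata
    descriptive: DescriptiveMetadata
    structural: StructuralMetadata

    def missing_fields(self) -> list[str]:
        """Return the names of key elements that were left empty."""
        gaps = []
        if not self.administrative.owner:
            gaps.append("administrative.owner")
        if not self.descriptive.description:
            gaps.append("descriptive.description")
        if not self.descriptive.tags:
            gaps.append("descriptive.tags")
        if not self.structural.columns:
            gaps.append("structural.columns")
        return gaps


# Example record with a blank description and no tags.
orders = DatasetMetadata(
    administrative=AdministrativeMetadata(owner="sales-ops"),
    descriptive=DescriptiveMetadata(
        title="Orders",
        description="",
        created_by="jdoe",
        created_on=date(2021, 6, 1),
    ),
    structural=StructuralMetadata(columns={"order_id": "int"}),
)
print(orders.missing_fields())
```

A check like missing_fields hints at why gaps matter: a record with no description or tags is exactly the kind of missing metadata that teams otherwise have to infer by hand.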

Such details about enterprise data are essential to understanding and trusting the data.

Metadata also ensures enterprise data teams aren't missing data sets critical to their analytics initiatives. Analyses that are missing important data sets typically produce incomplete or flawed results.

This is essential to remember for the many organizations that are still in the early stages of their metadata management initiatives.

"For most organizations, metadata tends to be an afterthought or the ugly stepchild. Even companies focusing on metadata tend to just focus on the definitions or maybe a little around data lineage," said Doug Laney, an innovation fellow with the consulting firm West Monroe.

Laney said a good metadata management program enables data and technology teams to more efficiently and effectively find and access all the data they need for analyses.

"It's critical to understanding data well and the relationships within data," he said, adding that the lack of metadata management is "like having a library without a card catalog."

Yet that's where most organizations are as they struggle to capture data upfront and understand its basic elements.

"They're not tracking or not diving into all the important elements that they can collect," Gartner analyst Alan Dayley said, noting that as a result many enterprise data teams "take a lot of time cleaning and prepping data because they have missing information about metadata and they're making inferences."

Advantages from automated metadata management

According to experts, organizations with strong metadata management programs that include automation -- and those that also have introduced some intelligence -- generally gain the following advantages:

  • A more complete collection of data sets for analytics and queries that then produce more insightful, more accurate results. "The more you put in, the better the outcomes," Dayley said.
  • Higher levels of efficiency as data teams can more quickly identify, find and access the data they need.
  • An increased trust in data being used for analytics, because teams can readily trace the data's lineage and cite any other requested or required information about the data; consequently, there's also more trust in the results of the analyses.
  • Better compliance with security and privacy regulations, as an enterprise can automatically manage more (if not all) instances of protected and sensitive data as well as quickly identify where specific data, such as customer information, exists.
  • Inference -- "That's when it starts providing answers before you have the questions," Dayley said.

What is automated metadata management?

The neatness of metadata categories and elements can be misleading, as the scope of metadata is vast.

"When you're talking about millions of data elements, there's no way to manually process those; you need some level of automation," Dayley said.
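As a rough illustration of what the simplest form of automation can look like, the Python sketch below tags columns by matching their names against a small set of rules and then builds a catalog from the results. It is a deliberately basic, rule-based example under stated assumptions, not how any particular product works and not the machine learning approach that more advanced tools use. The rule set, tag names and function names are all hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical rules: tag -> pattern matched against a column name.
TAG_RULES = {
    "pii:email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "pii:phone": re.compile(r"phone|mobile", re.IGNORECASE),
    "pii:name": re.compile(r"(first|last|full)[-_]?name", re.IGNORECASE),
    "finance": re.compile(r"price|amount|revenue", re.IGNORECASE),
}


def suggest_tags(column_name: str) -> list[str]:
    """Return every tag whose pattern matches the column name, sorted."""
    return sorted(
        tag for tag, pattern in TAG_RULES.items() if pattern.search(column_name)
    )


def build_catalog(tables: dict[str, list[str]]) -> dict[str, dict[str, list[str]]]:
    """Map each table to its columns and the tags suggested for each column."""
    return {
        table: {column: suggest_tags(column) for column in columns}
        for table, columns in tables.items()
    }


def find_tagged(catalog: dict[str, dict[str, list[str]]], prefix: str) -> list[tuple[str, str]]:
    """List (table, column) pairs carrying any tag that starts with prefix."""
    return [
        (table, column)
        for table, columns in catalog.items()
        for column, tags in columns.items()
        if any(tag.startswith(prefix) for tag in tags)
    ]


# Example schema; table and column names are made up for illustration.
catalog = build_catalog(
    {
        "customers": ["customer_id", "full_name", "email", "phone_number"],
        "orders": ["order_id", "customer_id", "unit_price"],
    }
)
sensitive = find_tagged(catalog, "pii:")
```

Name-based rules like these miss anything with an unhelpful column name, which is one reason vendors layer content inspection and machine learning on top. Even so, a lookup such as find_tagged shows how a populated catalog lets a team quickly locate where customer information lives.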

Gartner tracks about 60 vendors that offer metadata management technology. Dayley noted that some vendors offer standalone products, while others embed metadata management tools in larger suites such as data science platforms. Additionally, some metadata management tools include intelligence, so they not only automate managing and populating data catalogs but also allow data teams to surface new insights and correlations within the data sets.

"That's where machine learning comes into play, recognizing elements for analytics, or operational data purposes," Dayley said.

Most enterprises are far from enabling such capabilities, however; Dayley estimated that only 5% to 20% of organizations are automating metadata management and fewer than 5% are adding intelligence to this function.
