

How to make a metadata management framework

Don't wait until you have a metadata management problem to address the issue. Put a metadata management framework in place to prepare for potential issues.

Organizations often grow into metadata problems, rather than out of them.

IT departments invest time and effort into data management to ensure data quality, security and availability. These practices keep a modern organization running effectively. In contrast, metadata management tends to get attention only when problems start to appear.

As companies expand, they often encounter data silos, where each department collects, stores and describes data differently. This lack of standardization leads to inconsistencies, which can hinder collaboration and decision-making.

For example, a sales system might use client reference numbers whereas the marketing app uses customer IDs. At first glance, it might not seem like an issue because both terms refer to a unique identifier for each customer. However, without proper metadata management, this inconsistency can lead to a host of problems. And every new data source can exacerbate these inconsistencies.

This is when most organizations realize they need metadata management. By establishing a common data language and a centralized metadata framework, companies can bring clarity and consistency, cataloging and categorizing data assets for collaborative and cross-functional analysis and operations.

What is a metadata management framework?

A metadata management framework is not a technology, but a comprehensive set of policies, processes and tools designed to ensure consistent and accurate metadata capture, storage and use across an organization. It establishes a common language for describing data, enabling users to understand -- and trust -- the information they work with.

Policies often involve defining the roles, responsibilities and accountability for metadata management across the organization. A good policy outlines who is responsible for updating and maintaining metadata. This ownership of metadata might be different from ownership of the data itself. For example, a system administrator might be responsible for the complex data in an ERP system, but the metadata about that system, held in a data catalog for broader use, might have a different owner.

Processes can help resolve systems' conflicting definitions and identify appropriate metadata tags. In the example of customer ID and client reference number, although both refer to customers, it is common for marketing departments to include a wide range of prospects in their work. At the same time, sales might focus only on potential customers who have made direct contact or have already started to buy. Organizations need a process to reconcile those differences into a shared definition that could be useful for analysts covering both areas. Similarly, if metadata is updated in one system -- for example, if a product is moved into a new category -- organizations would need a process to ensure that change propagates across all systems.
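One way to picture the reconciliation process is as a small shared glossary that maps each system's local field name to one agreed business term. The sketch below is illustrative only: the system names, field names and the wording of the shared definition are hypothetical, not taken from any particular product or organization.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GlossaryTerm:
    """A shared business definition and the local names that map to it."""
    name: str
    definition: str
    # Maps a (hypothetical) system name to the field name used in that system.
    local_names: dict[str, str] = field(default_factory=dict)


def resolve(terms: list[GlossaryTerm], system: str, local_field: str) -> str | None:
    """Return the shared term for a system-specific field, or None if unmapped."""
    for term in terms:
        if term.local_names.get(system) == local_field:
            return term.name
    return None


# Hypothetical entry reconciling the two identifiers from the example above.
glossary = [
    GlossaryTerm(
        name="customer_id",
        definition="Unique identifier for a person or company that has made "
                   "direct contact or placed an order.",
        local_names={
            "sales": "client_reference_number",
            "marketing": "customer_id",
        },
    )
]

sales_term = resolve(glossary, "sales", "client_reference_number")
marketing_term = resolve(glossary, "marketing", "customer_id")
unknown_term = resolve(glossary, "finance", "account_no")  # not yet reconciled
```

Both the sales and marketing fields resolve to the same shared term, while an unmapped field returns `None`, which is a useful signal that a definition still needs an owner.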

Metadata tools used to be highly centralized and cumbersome, which is not surprising, given the complexity of some of these policies. The metadata repository is a centralized system that provides metadata across the organization, which enables administrators to apply policies with a definitive model of how to organize data. However, the repository can become as complex as the systems it was meant to simplify.

A metadata catalog is a modern replacement for the repository. A catalog is a searchable inventory of an organization's data assets, along with their associated metadata. Individual departments make their own updates, and a simple, effective browsing interface helps users discover data assets.
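To make "searchable inventory" concrete, here is a minimal sketch of a catalog as a list of entries with a keyword search. The entry fields, asset names and owners are assumptions made for illustration; commercial catalogs offer far richer models and indexing.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One data asset and its descriptive metadata (hypothetical fields)."""
    name: str
    owner: str
    description: str
    tags: tuple[str, ...]


def search(catalog: list[CatalogEntry], keyword: str) -> list[CatalogEntry]:
    """Case-insensitive match against an entry's name, description and tags."""
    needle = keyword.lower()
    return [
        entry for entry in catalog
        if needle in entry.name.lower()
        or needle in entry.description.lower()
        or any(needle in tag.lower() for tag in entry.tags)
    ]


# Hypothetical inventory of three data assets.
catalog = [
    CatalogEntry("sales.orders", "sales-ops",
                 "Confirmed orders by customer", ("sales", "customer")),
    CatalogEntry("marketing.leads", "marketing",
                 "Prospects and campaign responses", ("marketing", "prospect")),
    CatalogEntry("erp.products", "erp-admin",
                 "Product master data with categories", ("product",)),
]

customer_assets = [entry.name for entry in search(catalog, "customer")]
product_assets = [entry.name for entry in search(catalog, "PRODUCT")]
```

Note that each entry carries its own metadata owner, which can differ from the owner of the underlying system.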

The benefits of a metadata framework

Improved data searching

A primary benefit of a metadata management framework is enhanced search. When data assets are cataloged and categorized with relevant, helpful metadata tags, users can find them quickly and easily. This reduces the time and effort spent searching, which makes teams more productive and efficient and cuts duplicated work.


Data lineage tracking and impact analysis

A practical framework also enables organizations to track the lineage of their data, from its origin to its current state. This includes understanding how data is transformed, moved and used across different systems and processes. Data lineage tracking records how data is handled over time, which is crucial for ensuring data quality, compliance and auditability.

Where lineage shows where data comes from, impact analysis looks at where it is going. Suppose you intend to delete a data element or change it significantly. Impact analysis identifies the downstream applications, reports and analyses the change would affect, as well as who is using the data and who owns it. This advance warning can save a great deal of disruption and aggravation.
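Lineage and impact analysis can both be thought of as walks over the same graph of data flows, in opposite directions. The sketch below assumes a hypothetical set of lineage edges and uses a plain breadth-first traversal; it is not how any specific tool implements these features.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed as its value.
DOWNSTREAM = {
    "crm.customers": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["report.churn", "report.revenue_by_region"],
    "erp.orders": ["warehouse.fact_orders"],
    "warehouse.fact_orders": ["report.revenue_by_region"],
}


def reachable(graph, start):
    """Breadth-first walk returning every asset reachable from `start`."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


def invert(graph):
    """Reverse every edge so the walk runs from consumers back to sources."""
    inverted = {}
    for source, targets in graph.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


# Impact analysis: what is affected if crm.customers changes?
impacted = reachable(DOWNSTREAM, "crm.customers")

# Lineage: where does the revenue report get its data from?
lineage = reachable(invert(DOWNSTREAM), "report.revenue_by_region")
```

In practice each node would also carry its owner and known users, so the result of the walk doubles as a notification list.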

Collaboration and a common data language

A metadata management framework establishes a common data language across an organization. It creates a shared understanding of data assets, their relationships and their business context by standardizing data descriptions and categorization.

This common language enables more effective communication, collaboration and decision-making, because everyone works from the same definitions and assumptions. It's particularly important to avoid confusion and ambiguity in large, complex organizations that share data across multiple business units.

An often-overlooked side effect of this shared data language is that users trust data more and will be more likely to use it for decision-making. Data initiatives often flop when different departments disagree on the basic definitions of data because it erodes trust in the data itself. For example, if the sales team defines customer differently than the marketing team, teams might be skeptical of any analysis or insights derived from customer data.

Automated management

Modern metadata management frameworks often include automated tools and processes to capture, update and maintain metadata. Automation reduces the burden on IT teams and data stewards, who would otherwise need to manage metadata across disparate systems manually. Automated metadata management ensures that metadata remains accurate and up-to-date as data assets evolve.
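As a rough illustration of automated capture, the sketch below harvests column metadata directly from a database's own schema and detects a change between two snapshots. It uses an in-memory SQLite database from Python's standard library purely as a stand-in; the table and column names are hypothetical, and real metadata tools connect to many more source types.

```python
import sqlite3


def harvest_columns(conn):
    """Return {table: [(column, declared_type), ...]} from SQLite's schema."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    metadata = {}
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        metadata[table] = [(row[1], row[2]) for row in rows]
    return metadata


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
snapshot_before = harvest_columns(conn)

# A source-system owner adds a column; the next harvest picks it up.
conn.execute("ALTER TABLE customers ADD COLUMN region TEXT")
snapshot_after = harvest_columns(conn)

added = [
    column for column in snapshot_after["customers"]
    if column not in snapshot_before["customers"]
]
```

Comparing snapshots on a schedule is one simple way to keep a catalog current without asking stewards to record every change by hand.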

Steps to establish a metadata management framework

1. Define success

The first critical step in establishing a metadata management framework is defining success.

The framework seeks to bridge the gap between a technical architecture and the business uses that live over it, so it's best to define strategic objectives in business terms. For example, if an organization wants more collaboration across organizational boundaries, make it a clear goal. If you want more data sharing and reuse, that, too, can be an excellent goal.

Other objectives might be more tactical, such as more insight into lineage and impact analysis, greater data accessibility, or less disruption to processes due to metadata changes. Organizations should align the framework with their data strategy and business objectives to secure buy-in and support from stakeholders.

2. Create a metadata team

In a large organization, a metadata management framework might require substantial effort to deliver. In smaller organizations, metadata management often falls to a cross-functional virtual team. In either case, it's best to involve stakeholders across the entire data value chain. Include the owners of source systems, where metadata is often created.

Many organizations have some middleware architecture, such as a data warehouse or data lake, and admins from those systems should also be involved. Metadata in the data pipelines or extract, transform and load processes changes frequently, and someone responsible for those systems should be included.

Finally, include a representative of the data consumers, who might be business intelligence users, application developers, data scientists or even business users working with self-service tools.

That sounds like a lot of stakeholders, but in many organizations, functions overlap and share responsibilities, which lowers the number of people needed.

3. Establish processes and policies

The next step is establishing the processes and policies that govern the framework. Straightforward processes ensure that even nonspecialists can create, update and maintain metadata consistently across the organization.

First, define the metadata creation process. Owners of operational applications must know how downstream integration and analytic processes will use their data. They need a process that enables them to create new data items, and update or delete existing items while keeping downstream systems, such as data warehouses and data workflows, updated.

Next, establish policies that govern metadata security. Metadata is data about data, but it can reveal critical information about the underlying data or an organization's operations. Exposure can indirectly compromise the security and privacy of sensitive data, reveal data usage patterns or provide insights into the organization's strategies.

Other processes to consider include metadata approval and change management, metadata audit, collaboration and methods for reporting the state of metadata.

Metadata tools

When selecting metadata tools, it's essential to consider more than just ease of use. Factors such as scalability, integration with existing systems, and support for automation and collaboration can ensure the framework meets organizational needs.

A metadata repository is a good choice for organizations that have a large, complex metadata landscape with many different data sources and systems. The metadata catalog might be sufficient if the metadata is relatively simple or needs to integrate with existing catalogs or discovery tools. Catalogs provide a searchable, user-friendly interface.

If metadata needs are very specific or unique, as in a vertical market with complex requirements such as defense or manufacturing, a homegrown option might be best. Organizations can tailor custom tools to their requirements and integrate them into existing systems and processes.

Metadata repositories can require significant investments in upfront costs, maintenance and support. Catalogs can be a considerable investment, but are cost-effective in the long term with automation features that enable a fast time-to-solution.

A hybrid approach might be the best option, combining elements of repositories, catalogs and custom options to create a tailored metadata management framework.

Donald Farmer is principal of TreeHive Strategy and advises software vendors, enterprises and investors on data and advanced analytics strategies. He has worked on some of the leading data technologies in the market and previously led design and innovation teams at Microsoft and Qlik.
