18 top data catalog software tools to consider using in 2025
Numerous tools can be used to build and manage data catalogs. Here's a look at the key features, capabilities and components of 18 prominent data catalog tools.
Many organizations face a growing sprawl of data across various databases and other repositories in on-premises systems, cloud services and IoT infrastructure. That makes data management more challenging, and BI and data analytics initiatives are less effective if data scientists, other data analysts and business users can't find relevant data and understand what it means. "Organizations are drowning in data yet starving for insights," said Priya Iragavarapu, vice president of data science and analytics at consulting firm AArete.
Data catalogs can provide a unified view of all the data assets in an enterprise. The idea of a catalog has been around since the early days of relational databases, when IT teams wanted to keep track of how data sets were linked, joined and transformed across SQL tables. Modern data catalog tools inventory data and collect metadata about it from a wider variety of data stores, including data lakes, data warehouses, NoSQL databases, cloud object storage and more.
They're also commonly integrated with data governance software to help organizations keep pace with changing regulatory compliance requirements and other aspects of governance programs. In addition, the tools are evolving to take advantage of natural language queries, machine learning and other AI functionality. Early data catalogs required custom scripts to crawl data and capture metadata. But newer tools can do that automatically.
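To show what that crawling step involves, here's a minimal sketch in Python that captures table and column metadata from a SQLite database. The `crawl_sqlite` function and the shape of its output are hypothetical. The only SQLite features it relies on are the `sqlite_master` table and `PRAGMA table_info`, and commercial tools use their own connectors for each type of data store.

```python
import sqlite3


def crawl_sqlite(conn: sqlite3.Connection) -> dict:
    """Capture table and column metadata from a SQLite database (hypothetical helper)."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        quoted = table.replace('"', '""')  # escape the identifier for the PRAGMA
        # Each row is (cid, name, type, notnull, dflt_value, pk)
        rows = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
        catalog[table] = [
            {
                "column": name,
                "type": col_type,
                "declared_not_null": bool(notnull),
                "primary_key": bool(pk),
            }
            for _cid, name, col_type, notnull, _default, pk in rows
        ]
    return catalog


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
    for table, columns in crawl_sqlite(conn).items():
        print(table, [c["column"] for c in columns])
```

Running it prints each table's column names. A catalog tool would then store this technical metadata centrally and enrich it with business descriptions, owners and usage information.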
Here, in alphabetical order, are details on 18 popular data catalog tools that organizations can use to tame their metadata management challenges and make data more readily accessible and understandable to end users. The list was compiled by Informa TechTarget editors based on research of available technologies, as well as market reports and vendor rankings from Forrester Research and Gartner.
1. Alation Data Catalog
Alation was founded in 2012 and launched its first products in 2015. The company's flagship data catalog software uses AI, machine learning, automation and natural language processing (NLP) techniques to simplify data discovery, create business glossaries and power its core Behavioral Analysis Engine. The engine indexes various data sources and uses pattern recognition to generate popularity rankings, usage recommendations and other insights. In addition, it analyzes data usage patterns with an eye toward streamlining data stewardship, data governance and query optimization tasks.
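Alation hasn't published the internals of its Behavioral Analysis Engine, so the sketch below is not its method. It only illustrates the general idea of ranking data assets by usage, assuming query logs are available as plain SQL strings. The `rank_tables` function and the `TABLE_REF` regex are hypothetical, and the regex is deliberately simplified: it ignores subqueries, common table expressions and quoted identifiers.

```python
import re
from collections import Counter

# Simplified, hypothetical pattern: a table name that follows FROM or JOIN.
# It ignores subqueries, CTEs and quoted identifiers.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def rank_tables(query_log):
    """Rank tables by how many logged queries reference them."""
    counts = Counter()
    for query in query_log:
        # A set counts each table once per query, even if it's joined twice
        counts.update({name.lower() for name in TABLE_REF.findall(query)})
    return counts.most_common()


if __name__ == "__main__":
    log = [
        "SELECT * FROM sales.orders o JOIN sales.customers c ON o.cust_id = c.id",
        "select sum(total) from sales.orders",
        "SELECT region FROM sales.customers",
        "SELECT * FROM sales.orders WHERE total > 100",
    ]
    print(rank_tables(log))
```

Counting each table once per query keeps a single self-join from inflating a table's score; a production system would more likely parse SQL fully and weight usage by user, recency and other signals.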
Alation also offers data governance and data lineage applications as part of its overall Alation Data Intelligence Platform. In that vein, Alation Data Catalog includes guided curation and various collaboration features. For example, an AI copilot named Allie AI can document new data assets, recommend metadata descriptions and identify potential data stewards. Users can create wiki articles and searchable conversations in catalogs, and prebuilt analytics dashboards offer customizable reporting.
Other key features in the Alation tool include the following:
- The ability to flag data health issues and define enterprise data governance policies.
- Prebuilt connectors to various data sources, plus an Open Connector Framework SDK for building custom ones.
- A built-in SQL editor that can be used as an alternative to natural language search.
2. Alex Augmented Data Catalog
Alex Solutions is a data catalog and metadata management provider founded in 2016. The company architected its data catalog software to take advantage of AI and machine learning techniques. Alex Augmented Data Catalog helps automate the process of discovering data assets and then bringing them into a consolidated catalog, with support for various types of structured, semistructured and unstructured data. The tool also includes a set of collaboration features for things such as data sharing and curation.
In addition, Alex automates various aspects of data governance and data quality within the data catalog tool. For example, data governance managers can create policies, assign data stewards and keep track of data pipeline processes from a central console.
Alex Augmented Data Catalog also provides the following features:
- Google-like natural language search and query capabilities.
- A marketplace of plug-and-play metadata connectors to popular data sources.
- Built-in automation for populating and enriching metadata in data catalogs.
3. Ataccama Data Catalog
Ataccama, which was founded in 2008, offers a data catalog tool as a core component of Ataccama One, a consolidated platform that supports data governance and management functions automated through the use of AI. Ataccama Data Catalog can catalog data from databases, data lakes, file systems and other sources. It comes with connectors for a variety of popular on-premises and cloud data platforms.
The data catalog software includes capabilities that help automate data discovery and change detection. The tool can also automate data quality assessments and detect and flag data anomalies, and it can be plugged into business process management workflows to automate data policy enforcement. It supports workflows spanning a diverse set of roles in organizations, including data stewards, data engineers, business users, data analysts and system owners.
Ataccama Data Catalog also includes the following features:
- Continuous data quality monitoring and data cleansing.
- Built-in data profiling, data classification, data lineage, data observability, relationship discovery and metadata management capabilities.
- Functions for configuring workflows, user permissions and custom metadata.
4. Atlan Data Discovery & Catalog
Atlan hit the market with its data catalog tool in 2018. The product is built on design principles borrowed from Google and from end-user tools such as GitHub and Slack. Atlan Data Discovery & Catalog supports natural language searches for data assets as well as searches based on associated business metrics, while data engineers can use a SQL syntax search capability. In addition, the tool can be used to integrate collaborative data workflows into catalogs.
For example, catalog users can create Jira requests to report issues they find while exploring data sets. The software also enables contextual discussions about data in Slack chats using a reverse metadata feature, which makes metadata available in applications outside of a data catalog. A Companion Sidebar feature provides at-a-glance information about data assets, their usage, Jira issues and more to help users decide whether the data is trustworthy.
Atlan Data Discovery & Catalog also includes the following features:
- Open APIs that enable fully customizable ingestion of metadata.
- A plugin marketplace with connectors to various data tools and platforms.
- Atlan AI, a copilot tool that can be used to generate data lineage summaries, SQL queries and descriptions of business terms.
5. AWS Glue Data Catalog
AWS Glue Data Catalog is the persistent metadata store in AWS Glue, a fully managed extract, transform and load (ETL) service. It enables data management teams to store, annotate and share metadata for use in ETL integration jobs when they create data warehouses or data lakes on the AWS cloud platform. AWS Glue Data Catalog is compatible with the metastore repository in Apache Hive, a popular open source data warehouse tool, and it can be used as an external metastore for Hive data.
The catalog tool helps enforce data governance requirements by tracking changes to schemas and data access controls. In addition, it supports data processes that span different AWS services, such as AWS Lake Formation, Amazon Athena, Amazon Redshift Spectrum and Amazon EMR. The tool can also be used to populate business data catalogs in Amazon DataZone, a separate data management service.
Other features offered by the AWS software include the following:
- A wizard for creating crawlers that automatically scan repositories and capture information on schemas and data types.
- Data lineage information, such as a record of data transformations.
- Integration with AWS Lake Formation for managing access to data catalogs and underlying data assets.
6. BigID Data Catalog
Founded in 2016, BigID developed this tool as part of BigID Data Intelligence Platform, which supports data security, data privacy and data governance initiatives. The catalog software uses machine learning algorithms to find data assets and harvest technical, business and operational metadata. Data classification, data profiling and metadata tagging are also automated through AI and machine learning. Catalogs can include structured and unstructured data from various cloud and on-premises sources.
Applications built into the BigID platform can be used to remove duplicate data from catalogs, manage data retention policies and address data governance issues. BigID Data Catalog also provides capabilities for identifying ungoverned or unsecured data assets. End users can run natural language searches to look for relevant data objects and related information about governance and usage policies.
In addition, BigID Data Catalog includes the following features:
- A function for revisiting recently viewed data objects in a catalog.
- Native connections to more than 150 data sources.
- Support for multiple data classification techniques, including advanced ones based on deep learning and NLP.
7. Collibra Data Catalog
Collibra, which started as a company in 2008, offers a namesake data intelligence platform centered on Collibra Data Catalog. The catalog tool supports a set of automated features, powered by machine learning and AI, for data discovery, data classification and data curation. That includes the ability to use generative AI (GenAI) to create descriptions of data assets. Data profiling and mapping of data lineage information across source systems are also automated.
Collibra Data Catalog contains more than 100 prebuilt integrations for ingesting metadata from various data stores, as well as business applications, BI platforms and data science tools. It also provides configurable workflows for managing data catalogs, guided data stewardship features and granular controls for enforcing data security and privacy protections, all in a single console.
In addition, the Collibra software offers the following features:
- Built-in views of data quality metrics.
- Collaboration capabilities, including crowdsourced feedback on data assets through ratings, reviews and comments.
- An integrated data marketplace that enables users to search for relevant data based on specified filters, such as relationships between data assets.
![Diagram showing how data catalogs work](https://www.techtarget.com/rms/onlineimages/example_of_how_a_data_catalog_works-f_mobile.png)
8. Data.world
Data.world is a cloud-native data catalog tool offered as a SaaS platform by a vendor with the same name. The company, which was founded in 2015, built the catalog software on a knowledge graph architecture that provides a semantically organized view of enterprise data assets and their associated metadata across disparate systems. That's designed to make it easier for business and analytics users to find relevant data and understand its context.
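The knowledge graph idea can be illustrated with a toy set of subject-predicate-object triples. This is a hypothetical sketch, not Data.world's data model: the predicate names (`hasColumn`, `representsTerm`), the sample assets and the `match` and `assets_for_term` helpers are invented for illustration, and production knowledge graphs typically use a dedicated graph store and query language rather than Python lists.

```python
# Each fact is a (subject, predicate, object) triple; all names are hypothetical.
TRIPLES = [
    ("sales.orders", "hasColumn", "sales.orders.customer_id"),
    ("crm.customers", "hasColumn", "crm.customers.id"),
    ("crm.customers", "hasColumn", "crm.customers.email"),
    ("sales.orders.customer_id", "representsTerm", "Customer ID"),
    ("crm.customers.id", "representsTerm", "Customer ID"),
    ("crm.customers.email", "representsTerm", "Email Address"),
    ("sales.orders", "ownedBy", "Finance"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


def assets_for_term(triples, term):
    """Two-hop query: business term -> columns -> tables that contain those columns."""
    columns = {s for s, _, _ in match(triples, p="representsTerm", o=term)}
    return sorted({s for s, _, o in match(triples, p="hasColumn") if o in columns})


if __name__ == "__main__":
    print(assets_for_term(TRIPLES, "Customer ID"))
```

Because business terms and technical assets live in the same graph, a user can start from a term like "Customer ID" and reach every table that carries it without knowing the underlying schemas.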
The Data.world platform includes a set of bots that can help organizations deploy and manage data catalogs and then automate data governance tasks. Another set of bots uses GenAI to improve data discovery for catalog users through a chat-like interface. They can assist in data searches, suggest research questions, produce data lineage information and automatically generate natural language descriptions to enrich data assets and associated metadata. The platform also contains a visual map of data and relationships, plus a dashboard that provides metrics, alerts and recommendations.
Other notable features in the Data.world software include the following:
- A third set of bots that automate DataOps workflows and communications about data quality between data teams and end users.
- The ability to create customizable data governance workflows and task management processes.
- Support for both virtualized and federated access to data, with built-in governance controls.
9. Erwin Data Catalog by Quest
The first Erwin software was created in 1983 for data modeling. The product line went through several acquisitions over the years and is now owned by Quest Software. It also has been expanded to incorporate additional technologies, including this data catalog tool that was developed as part of a broader Erwin Data Intelligence platform launched in 2017 to support different aspects of data governance.
Erwin Data Catalog by Quest, as the software is formally known, automatically harvests, catalogs and curates metadata. It also includes components for data mapping, reference data management, data lifecycle management, data lineage and classification of sensitive data. Standard data connectors can ingest data from common databases, and optional ones can be added for streaming data, cloud applications, BI environments and other data sources. In addition, the data catalog software can be used together with companion data literacy and data quality tools in Erwin Data Intelligence.
Erwin Data Catalog also provides the following features:
- A management dashboard that can be used to view and analyze data catalog attributes.
- An impact analysis function for assessing the potential effects of changes in a catalog.
- Automated functions to accelerate data movement and transformation, as well as code generation and documentation.
10. Google Cloud Dataplex Catalog
Google Cloud Dataplex Catalog is a metadata management and data discovery platform that works across cloud and on-premises data sources. The tool, which became generally available in mid-2024, is part of Google's Dataplex data fabric environment and supports cataloging and other functionality via the Dataplex UI or a CLI. Potential uses include searching for data assets, exploring associated metadata, enriching and annotating metadata fields, and creating an inventory of available data sources for data engineers. The metadata stored in catalogs also aids in data governance initiatives.
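A catalog search function can be approximated with an inverted index that maps each word to the assets whose name, description or tags contain it. The following is a hypothetical sketch, not how Dataplex Catalog implements search; `Asset`, `CatalogIndex` and the sample assets are invented for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    description: str = ""
    tags: list = field(default_factory=list)


def tokenize(text):
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class CatalogIndex:
    """A toy inverted index over asset names, descriptions and tags."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> names of assets containing it

    def add(self, asset):
        text = " ".join([asset.name, asset.description, *asset.tags])
        for token in tokenize(text):
            self._postings[token].add(asset.name)

    def search(self, query):
        """Return names of assets that contain every query token."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted(hits)


if __name__ == "__main__":
    index = CatalogIndex()
    index.add(Asset("sales.orders", "One row per customer order", ["finance", "pii"]))
    index.add(Asset("crm.customers", "Customer master data", ["pii"]))
    print(index.search("customer pii"))
```

The index only matches exact tokens, so a query for "customers" won't find an asset described only with "customer"; real catalog search typically adds stemming, synonyms and relevance ranking on top.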
Google still offers its predecessor Data Catalog service within Dataplex, too. Compared with that service, Dataplex Catalog includes a new web interface and API, increased scalability and support for more types of metadata. On the other hand, it doesn't support some Data Catalog features. Users of the older service who don't have custom metadata can switch to Dataplex Catalog simply by making it their default tool; those with custom metadata need to go through a two-step preparation and transfer process.
The following features are also included in Google Cloud Dataplex Catalog:
- The ability to store both business and technical metadata in catalogs.
- Automatic harvesting of metadata from various Google Cloud data sources, plus support for importing metadata from other systems.
- Role-based permissions through Dataplex's identity and access management controls.
11. IBM Knowledge Catalog
IBM Knowledge Catalog is a metadata repository that was designed to support AI, machine learning and other analytics workflows. Part of the IBM Cloud Pak for Data platform, the tool can catalog various data and analytics assets, including machine learning models and structured, unstructured and semistructured data types. It supports AI-driven search for data discovery and provides automated data governance functions for tasks such as data quality assessments and managing data privacy policies.
The data catalog software also includes metadata enrichment capabilities powered by large language models (LLMs), plus a set of Knowledge Accelerators -- industry-specific vocabularies of business terms designed to streamline data governance and analytics deployments. In addition, it can use a knowledge graph and the FoundationDB open source database to visually map relationships between data assets and governance artifacts.
The IBM tool also offers the following features:
- Integration with IBM Manta Data Lineage for advanced metadata importing capabilities and IBM Match 360 to provide consolidated views of governed data sets.
- A built-in business glossary that can serve as a foundation for data governance efforts.
- More than 45 connectors to both IBM and external data sources.
12. Informatica Cloud Data Governance and Catalog
Informatica, which was founded in 1993 with a focus on data integration tools, now provides a broad set of technologies as part of its Intelligent Data Management Cloud platform. As the name indicates, Cloud Data Governance and Catalog combines data governance and data cataloging capabilities. It can automatically find, ingest, classify and inventory data through Claire, Informatica's AI and machine learning engine. Automated data curation features also use AI and machine learning algorithms to identify relationships between data sets and associate business terms with technical metadata.
Supported data sources include cloud and on-premises data stores, plus BI tools, ETL software, business applications and more. Data lineage capabilities track the movement of data through systems and data pipelines, with the ability to do impact analysis on changes to data assets. Built-in collaboration capabilities enable catalog users to add reviews, ratings and annotations to data assets, and subject matter experts can answer questions from users through a Q&A feature.
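Lineage-based impact analysis amounts to walking a directed graph of data flows to find everything downstream of a changed asset. The sketch below assumes lineage is stored as a simple adjacency list; the asset names and the `downstream` function are hypothetical and not part of Informatica's software.

```python
from collections import deque


def downstream(lineage, asset):
    """Return every asset reachable downstream of `asset`, in breadth-first order."""
    seen = {asset}
    order = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:  # skip assets already reached, which also guards against cycles
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: each source maps to the assets built from it
LINEAGE = {
    "crm.customers": ["staging.customers"],
    "staging.customers": ["warehouse.dim_customer"],
    "erp.orders": ["warehouse.fact_orders"],
    "warehouse.dim_customer": ["bi.revenue_dashboard"],
    "warehouse.fact_orders": ["bi.revenue_dashboard"],
}

if __name__ == "__main__":
    # Which assets would a schema change in the CRM source affect?
    print(downstream(LINEAGE, "crm.customers"))
```

Tracking visited assets in a set keeps the traversal from looping if the lineage graph contains cycles, and breadth-first order lists the most directly affected assets first.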
Other features provided by Informatica Cloud Data Governance and Catalog include the following:
- Data quality tracking capabilities to view data profiling statistics and data quality rules, scorecards and metrics.
- A natural language search function and browsable hierarchical views for finding relevant data in a catalog.
- A knowledge graph that displays views of the connections between related data assets.
13. Microsoft Purview Unified Catalog
This tool is part of Microsoft Purview, a data governance, compliance and risk management cloud service introduced in 2022. Initially known as Microsoft Purview Data Catalog, it was renamed in late 2024 as part of the launch of a revised data governance offering that's built around the catalog software. The features in the tool itself remain the same as before, though.
Users can search a catalog for data assets or data products, such as tables, files and Microsoft Power BI reports. Microsoft Purview Unified Catalog also provides a business glossary that can be used to find relevant data products by searching on glossary terms, key data elements or business objectives. In addition, an AI copilot can aid in catalog searches. The data catalog tool runs on top of Microsoft Purview Data Map, a companion metadata management product.
Other features provided by Microsoft Purview Unified Catalog include the following:
- Data curation capabilities, such as organizing data by different governance domains and grouping together related data assets and products.
- Integrated data quality management capabilities, including built-in quality rules and functions for quality scanning, scoring and alerting.
- Data health controls and management actions to help organizations track data governance practices and take steps to address issues.
14. Oracle Cloud Infrastructure Data Catalog
Oracle Cloud Infrastructure Data Catalog -- or OCI Data Catalog, for short -- was designed to complement Oracle's own technology ecosystem. The metadata management cloud service creates an inventory of data assets and a business glossary for users. It can automatically harvest metadata from Oracle data stores and a half dozen other data sources in both cloud and on-premises systems, using either an on-demand or a schedule-based approach.
OCI Data Catalog also uses fuzzy matching algorithms plus AI and machine learning techniques to help data stewards and other data experts curate and enrich metadata. As part of that, the tool recommends links between data objects and the terms and categories in a business glossary to make it easier for catalog users to find relevant data.
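Oracle doesn't describe its exact matching algorithm here, so the following sketch swaps in Python's standard `difflib` module, which scores string similarity with `SequenceMatcher`, as a stand-in for fuzzy matching of column names to glossary terms. The `recommend_terms` function, the sample names and the 0.6 cutoff are assumptions for illustration.

```python
import difflib


def recommend_terms(column_names, glossary_terms, cutoff=0.6):
    """Suggest the closest glossary term for each column name, if any is close enough."""
    # Normalize "Customer ID" -> "customer_id" so terms look like column names
    normalized = {term.lower().replace(" ", "_"): term for term in glossary_terms}
    suggestions = {}
    for column in column_names:
        best = difflib.get_close_matches(column.lower(), list(normalized), n=1, cutoff=cutoff)
        if best:
            suggestions[column] = normalized[best[0]]
    return suggestions


if __name__ == "__main__":
    glossary = ["Customer ID", "Order Date", "Total Amount"]
    print(recommend_terms(["cust_id", "order_dt", "ttl_amt", "zip"], glossary))
```

Normalizing glossary terms to lowercase snake_case is an assumption about naming conventions. As in the recommendation workflow described above, the output is best treated as suggestions for a data steward to review rather than links to apply automatically.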
The Oracle data catalog software also includes the following features:
- Data discovery capabilities that enable users to search for data by technical metadata names, business glossary terms and tags.
- Integration with the Oracle Cloud Infrastructure Events service to distribute notifications about the status of metadata harvesting processes.
- Support for using the tool's Hive-compatible metastore as an external repository for schema definitions in Oracle's OCI Data Flow, OCI Big Data and OCI Data Science services.
15. OvalEdge
Founded in 2013, OvalEdge sells a data catalog tool as the centerpiece of its namesake data governance platform. The company touts the software's ease of use and affordability, plus its support for creating Amazon-like data marketplaces that can be searched in natural language or explored with external tools. The OvalEdge catalog crawls more than 100 data sources to index metadata. It then uses AI and machine learning algorithms to automatically organize and catalog data based on tags, usage statistics and other markers.
A data profiling function automatically generates statistical summaries of data sets, and data relationships can be marked by embedded algorithms or manual inputs. Data lineage information can also be created either automatically or manually, and the software provides AI-driven data classification capabilities, including recommended business glossary terms to apply to data objects. Role-based access control is supported at the data asset and column levels in catalogs or for different OvalEdge modules.
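A basic column profile of the kind described above can be computed in a few lines. This is a hypothetical sketch, not OvalEdge's implementation; `profile_column` takes a plain Python list of values, with `None` standing in for SQL nulls.

```python
import statistics


def profile_column(values):
    """Summarize one column: row count, nulls, distinct values and, for numbers, min/max/mean."""
    non_null = [v for v in values if v is not None]
    summary = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }
    # Numeric statistics only make sense when every non-null value is a number
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        summary["min"] = min(non_null)
        summary["max"] = max(non_null)
        summary["mean"] = statistics.fmean(non_null)
    return summary


if __name__ == "__main__":
    print(profile_column([10, 20, None, 20, 50]))
    print(profile_column(["US", "DE", None, "US"]))
```

Statistics like the null count and number of distinct values are cheap to compute and often reveal quality problems, such as an ID column that isn't actually unique, before anyone queries the data.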
The OvalEdge data catalog also includes the following features:
- A set of self-service catalog tools designed for different groups of users.
- Collaboration through a built-in chat function and integration with Slack.
- Alerts to notify end users about data quality issues or changes to data.
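Column-level access control, which OvalEdge supports as noted above, can be modeled as a list of grants that tie a role to an asset and, optionally, a single column. The `Grant` class and `visible_columns` function below are hypothetical and only illustrate the concept.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grant:
    role: str
    asset: str
    column: Optional[str] = None  # None grants every column in the asset


def visible_columns(grants, user_roles, asset, columns):
    """Return the columns of `asset` that any of the user's roles may see."""
    allowed = set()
    for grant in grants:
        if grant.role in user_roles and grant.asset == asset:
            if grant.column is None:
                return list(columns)  # an asset-level grant covers all columns
            allowed.add(grant.column)
    return [c for c in columns if c in allowed]


if __name__ == "__main__":
    grants = [
        Grant("analyst", "sales.orders", "order_id"),
        Grant("analyst", "sales.orders", "total"),
        Grant("steward", "sales.orders"),
    ]
    cols = ["order_id", "customer_email", "total"]
    print(visible_columns(grants, {"analyst"}, "sales.orders", cols))
```

The check denies by default: a column is visible only if some role held by the user has been granted it, either directly or through an asset-level grant.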
16. Pentaho Data Catalog
This is the latest iteration of the data catalog tool offered by Hitachi Vantara, now through a separate Pentaho business unit that also sells data integration, data quality and business analytics software. The tool was originally developed by Waterline Data, which Hitachi Vantara bought in 2020. First rebranded as Lumada Data Catalog before getting the Pentaho name in late 2023, it also includes technology from Io-Tahoe, another data catalog vendor acquired in 2021. The catalog software supports data discovery and metadata management on both structured and unstructured data assets from various sources.
Pentaho Data Catalog uses machine learning and AI to automatically populate data catalogs and apply tags to data. AI technology also drives self-service data discovery through a metadata-based search function designed to identify dark data that might be missed by manual tagging. To aid in data governance, the software can also automatically identify, classify and secure sensitive data and track metadata that's needed for regulatory compliance.
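A very simple form of automated sensitive-data classification matches sample values from a column against known patterns. The sketch below is hypothetical and far cruder than the machine learning approaches these vendors describe; the `PATTERNS` dictionary, the `classify_column` function and the 0.8 match threshold are assumptions for illustration.

```python
import re

# Hypothetical patterns; production classifiers combine many more signals
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def classify_column(samples, threshold=0.8):
    """Label a column when at least `threshold` of its non-empty sample values match a pattern."""
    values = [s.strip() for s in samples if s and s.strip()]
    if not values:
        return []
    labels = []
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.fullmatch(v))
        if hits / len(values) >= threshold:
            labels.append(label)
    return labels


if __name__ == "__main__":
    print(classify_column(["alice@example.com", "bob@example.org", "n/a"], threshold=0.6))
    print(classify_column(["123-45-6789", "987-65-4321"]))
```

Requiring a threshold share of matches, rather than a single hit, keeps one stray email address in a free-text column from getting the whole column labeled as email data.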
Pentaho Data Catalog also provides the following features:
- Data profiling processes that generate data quality metrics and statistics.
- Data lineage tracking, including the ability to find hidden links between data assets.
- A "galaxy view" that gives catalog users a visual representation of interconnected data assets and the relationships between them.
17. Tableau Catalog
Founded in 2003 and acquired by Salesforce in 2019, Tableau was one of the pioneers of self-service BI and interactive data analysis. It later expanded into data management technologies, including Tableau Catalog. The tool is part of Tableau Data Management, a software bundle available for use with Tableau's BI and analytics platform. Tableau Catalog is designed to help improve data discovery and usage in Tableau installations.
The catalog software automatically ingests information about Tableau data sets into a centralized repository. It also includes data lineage and impact analysis features that can help users better understand data relationships and how changes to data sets or pipelines will affect analytics processes. In addition, the tool supports features like data quality warnings and contextual metadata to give users information they need to validate data sets for analytics uses.
Other features in Tableau Catalog include the following:
- A set of APIs to ingest metadata from other applications for analysis in Tableau.
- Integration with enterprise data catalogs through Tableau APIs or prebuilt connections from other catalog vendors.
- The ability to run data searches against columns, databases and tables.
18. Talend Data Catalog
Qlik was founded in 1993 as a BI and analytics vendor. Even more so than rival Tableau, it has added various data management technologies through acquisitions, most prominently the 2023 purchase of Talend, which offered data integration software and other tools for managing data. Talend Data Catalog is now part of Qlik's data quality and governance software suite, along with other Talend products for functions such as data preparation and data stewardship.
Talend Data Catalog is fundamentally a metadata management tool. It can automatically crawl, profile, organize and enrich metadata to support data discovery by users. The software also traces data lineage and tracks compliance with data privacy policies and regulations. Collaboration features enable catalog users to update metadata or business glossary information, with a role-based process for assigning responsibilities and capabilities for specific data objects.
The following features are also built into Talend Data Catalog:
- Data sampling and profiling capabilities to ensure that the associated metadata is complete and to identify required changes.
- Semantic mapping to create contextual links between data objects that are related to one another.
- Connectors, or "bridges," for harvesting metadata from various data stores, BI tools, business applications and other data sources.
Open source data catalog software
Organizations can also consider various open source data catalog tools. Many were developed by enterprises trying to address their own data cataloging challenges. The following are some of the available open source options:
- Amundsen. This data discovery and metadata engine was created by Lyft to help increase the productivity of data scientists and other users in its internal data infrastructure. The ride-sharing company released the tool as an open source technology in 2019.
- Apache Atlas. The Atlas software includes data catalog, metadata management and data governance features. It was started by former big data platform vendor Hortonworks, initially for use in Hadoop clusters, and was handed off to the Apache Software Foundation in 2015.
- DataHub. LinkedIn's data team created this metadata search and data discovery tool to help internal users understand the context of data, rearchitecting and expanding on an earlier tool called WhereHows. DataHub became open source in 2020.
- Metacat. This federated metadata discovery and exploration tool was created by Netflix to simplify data discovery, data preparation and data science workflows in its big data environment. The technology was made open source in 2018.
- OpenDataDiscovery Platform. Developed by AI services firm Provectus and released as open source software in 2021, it features a federated data catalog plus data discovery, data lineage, data quality, reference data management and collaboration capabilities.
- OpenMetadata. Created primarily by software vendor Collate and also launched in 2021, OpenMetadata is a metadata management platform that supports data discovery, observability, governance and quality management and includes built-in collaboration features.
Editor's note: Informa TechTarget editors updated this article in January 2025 for timeliness and to add new information.
George Lawton is a journalist based in London. Over the last 30 years he has written more than 3,000 stories about computers, communications, knowledge management, business, health and other areas that interest him.
Craig Stedman is an industry editor who creates in-depth packages of content on analytics, data management and other technology areas.