How to build a master data index: Static vs. dynamic indexing

Expert David Loshin explores the differences between static and dynamic indexing in master data management systems, and which queries each approach can support.

By David Loshin, Knowledge Integrity Inc.

Published: 31 Oct 2018

Master data management systems are intended to present a unified view of information about key data domains, such as customers or products, using data pulled from original sources located within and outside the organization. Identity resolution and record linkage techniques are used to load all of the input source records, block the records according to predefined strategies, look for similarities and link records presumed to represent the same real-world entities. In some MDM systems, data collected from the linked records is combined into a single master record.
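The block, score and link steps can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the sample records, a blocking key built from the first letter of the last name, a similarity score averaged from Python's `difflib.SequenceMatcher` ratios, and the 0.85 threshold. Production MDM tools use their own blocking strategies and scoring methods.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical input records: (first name, last name, phone).
records = [
    ("John", "Smith", "555-0100"),
    ("Jon", "Smith", "555-0100"),
    ("Mary", "Jones", "555-0199"),
    ("Joan", "Smythe", "555-0142"),
]

def blocking_key(rec):
    """Assumed blocking strategy: first letter of the last name."""
    return rec[1][:1].upper()

def similarity(a, b):
    """Assumed scoring method: mean string similarity across the attributes."""
    scores = [SequenceMatcher(None, x.lower(), y.lower()).ratio()
              for x, y in zip(a, b)]
    return sum(scores) / len(scores)

def link_records(records, threshold=0.85):
    """Group record positions presumed to describe the same entity."""
    parent = list(range(len(records)))  # union-find forest over positions

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Only records that share a block are compared with each other.
    blocks = {}
    for i, rec in enumerate(records):
        blocks.setdefault(blocking_key(rec), []).append(i)

    for members in blocks.values():
        for i, j in combinations(members, 2):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(j)] = find(i)  # link the two groups

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

linked_groups = link_records(records)
```

With this sample data, the two Smith records end up in one group, while the Jones and Smythe records each stay on their own.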

However, there is a risk that such artificially produced master records are inconsistent with the original source records. A different approach to provide accessibility to the information about a sought-after entity is to use a searchable master data index. The goal of the master index is to allow consumer applications to request information about a named entity and retrieve all the original records that have been linked together.

Identity resolution is typically performed as a batch operation that pulls data from the original sources, extracts the values of the data attributes to be used for similarity scoring (we will call them the "matching attributes") and then links sets of similar records into groups. Each group of linked records is assigned a unique identifier, and this unique identifier becomes the key for building the master data index. That index consists of two mapping tables: the search table maps the set of matching attributes to the unique identifier, and the index table maps the assigned unique identifier to all records assigned that identifier.
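As a rough sketch of the two mapping tables, the example below builds both from records that have already been linked. The field names, the sample records and the `group_of` mapping (standing in for the output of the batch identity resolution step) are all hypothetical, and in-memory dictionaries stand in for what would normally be database tables.

```python
from collections import defaultdict

# Hypothetical source records; the field names are illustrative only.
records = [
    {"source": "crm", "record_id": "c-17",
     "last": "Smith", "first": "John", "phone": "555-0100"},
    {"source": "billing", "record_id": "b-03",
     "last": "Smith", "first": "Jon", "phone": "555-0100"},
    {"source": "web", "record_id": "w-88",
     "last": "Jones", "first": "Mary", "phone": "555-0199"},
]

# Assumed output of batch identity resolution:
# (source, record_id) -> unique identifier of the linked group.
group_of = {
    ("crm", "c-17"): "E1",
    ("billing", "b-03"): "E1",
    ("web", "w-88"): "E2",
}

def matching_key(rec):
    """The matching attribute values, used as the search table key."""
    return (rec["last"], rec["first"], rec["phone"])

def build_master_index(records, group_of):
    search_table = {}                # matching attributes -> unique identifier
    index_table = defaultdict(list)  # unique identifier -> linked records
    for rec in records:
        uid = group_of[(rec["source"], rec["record_id"])]
        # Simplification: assumes one identifier per distinct key.
        search_table[matching_key(rec)] = uid
        index_table[uid].append(rec)
    return search_table, dict(index_table)

search_table, index_table = build_master_index(records, group_of)
```

The search table holds one entry per distinct combination of matching values, so both spellings of the Smith record point at the same identifier, and the index table leads from that identifier back to every original record in the group.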

Static indexing vs. dynamic indexing

This index configuration provides what could be called a static master index used for search and retrieve. The search process begins with a consumer request for any records associated with a set of presented matching values (such as a customer's last name, first name and telephone number). The search table is queried to find any records with the presented matching values. If any records are found, it means there was a match in the data set, and for each of the found records, the corresponding unique identifier is looked up in the index table to find all other records linked to the found record. All those associated records can be retrieved and assembled into a result set given back to the data consumer.
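The search-and-retrieve flow can be sketched in a few lines. The table contents and the `static_lookup` function name are hypothetical, and dictionaries stand in for the two mapping tables; the point is only that the search is an exact-key lookup followed by a second lookup on the unique identifier.

```python
# Hypothetical search table: matching attributes -> unique identifier.
search_table = {
    ("Smith", "John", "555-0100"): "E1",
    ("Smith", "Jon", "555-0100"): "E1",
    ("Jones", "Mary", "555-0199"): "E2",
}

# Hypothetical index table: unique identifier -> linked source records.
index_table = {
    "E1": ["crm:c-17", "billing:b-03"],
    "E2": ["web:w-88"],
}

def static_lookup(last, first, phone):
    """Exact-match search followed by retrieval of all linked records."""
    uid = search_table.get((last, first, phone))
    if uid is None:
        return []  # no exact match in the search table
    return list(index_table[uid])

hit = static_lookup("Smith", "Jon", "555-0100")
miss = static_lookup("Smith", "Jonathan", "555-0100")
```

In this sketch, searching on the "Jon" spelling returns both linked records, including the one stored as "John", while a spelling that appears in no indexed record returns an empty result.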

This master index solution works well, as long as there is an exact match for the attributes provided by the consumer seeking the data. The challenge is that even though this configuration is designed to link records in the presence of data variation, it does not support approximate searching, in which there is tolerance for variation in the presented matching values. In other words, unless you know the exact values for at least one of the indexed records, you won't be able to find any matches.

This suggests the need for a second type of master data index that can be called a dynamic index. A dynamic indexing system uses the same two mapping tables, but it also relies on the same type of identity resolution techniques used to create the master data index in the first place.

The records in the search table need to be blocked according to the same blocking keys used by the identity resolution process that created the master data index. Any set of matching values presented by a data consumer is used to determine the blocks that might contain matching records, and the records in those blocks are selected from the search table. Instead of executing an exact match, each selected record is paired with the presented matching values and is subjected to the same similarity scoring method used for the batch identity resolution. At this point, the unique identifiers mapped to any search records that score at or above the matching threshold are used to search the index table for all other records linked to the found records. All the associated records can be retrieved and assembled into a result set given back to the data consumer.
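A minimal sketch of that dynamic search follows. The blocking key (first letter of the last name), the similarity score (the mean of `difflib.SequenceMatcher` ratios across the attributes), the 0.85 threshold and all sample data are assumptions chosen for illustration. A real system would reuse whatever blocking keys and scoring method its batch identity resolution used.

```python
from difflib import SequenceMatcher

# Hypothetical search table rows: (matching attributes, unique identifier).
search_rows = [
    (("Smith", "John", "555-0100"), "E1"),
    (("Smith", "Jon", "555-0100"), "E1"),
    (("Jones", "Mary", "555-0199"), "E2"),
    (("Smythe", "Joan", "555-0142"), "E3"),
]

# Hypothetical index table: unique identifier -> linked source records.
index_table = {
    "E1": ["crm:c-17", "billing:b-03"],
    "E2": ["web:w-88"],
    "E3": ["crm:c-52"],
}

def blocking_key(attrs):
    """Assumed blocking strategy: first letter of the last name."""
    return attrs[0][:1].upper()

def similarity(a, b):
    """Assumed scoring method: mean string similarity across the attributes."""
    scores = [SequenceMatcher(None, x.lower(), y.lower()).ratio()
              for x, y in zip(a, b)]
    return sum(scores) / len(scores)

# Block the search table once, using the same key as the batch process.
blocks = {}
for attrs, uid in search_rows:
    blocks.setdefault(blocking_key(attrs), []).append((attrs, uid))

def dynamic_lookup(presented, threshold=0.85):
    """Score every candidate in the presented values' block, then retrieve."""
    candidates = blocks.get(blocking_key(presented), [])
    uids = {uid for attrs, uid in candidates
            if similarity(attrs, presented) >= threshold}
    return {uid: index_table[uid] for uid in sorted(uids)}

# "Jhon" is a typo that an exact-match search would not find.
result = dynamic_lookup(("Smith", "Jhon", "555-0100"))
```

Here the misspelled first name still scores above the threshold against both Smith records, so the search returns the records linked under `E1`, while the Smythe record in the same block scores below the threshold and is left out.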

A static master data index responds relatively quickly, because a search needs only a standard query against the search table followed by index table accesses that use that query's results. A dynamic index search may take longer, since every candidate record in the selected blocks has to be scored, but it finds matching records that an exact-match search would miss, providing more complete visibility of information about the sought-after entity.
