
Maintaining data integrity key for data quality

Maintaining data integrity through improved communication and data literacy is paramount for enterprises seeking to ensure data quality and trust.

For Kevin Quinn, senior director of data management and business intelligence at University of Pittsburgh Medical Center, building data integrity and quality into his organization was difficult.

Over the last five years, his team has learned that a partnership between departments and leaders within an organization is crucial to creating trustworthy data. Not every team works with the data directly, but clinicians and administrators still need to feel comfortable with the processes and language. Building that enterprise-wide trust and communication has helped maintain data integrity for UPMC.

Organizations need to adjust their processes, involve different parts of the organization and invest in improving employees' data literacy in order to maintain data integrity and data quality.

What is data integrity?

Data integrity is the maintenance of the accuracy and consistency of data throughout its life. It is a key part of implementing and using any system that stores or relies on data. Organizations have become more dependent on data in their decision-making, which further heightens the importance of data integrity.

"Data integrity is one of the aspects about overall data quality," said Melody Chien, senior research director at Gartner.

Overall, the intent of data integrity is to ensure that data is recorded and retrieved as intended and that unintentional changes during the data lifecycle do not affect results.
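One common way to confirm that data comes back exactly as it was recorded is to store a checksum alongside it and recompute that checksum on retrieval. The following is a minimal sketch of that idea, not a method described by Chien or UPMC; the function names and the sample record are hypothetical.

```python
import hashlib


def fingerprint(payload: bytes) -> str:
    """Return a SHA-256 hex digest that serves as a fingerprint of the payload."""
    return hashlib.sha256(payload).hexdigest()


def is_unchanged(payload: bytes, expected_digest: str) -> bool:
    """Recompute the fingerprint and compare it with the one stored at write time."""
    return fingerprint(payload) == expected_digest


# Hypothetical record, fingerprinted when it is first written.
record = b"patient_id=1001,visit_date=2021-03-04"
stored_digest = fingerprint(record)

# On retrieval, an identical record passes and an altered one does not.
retrieved_ok = is_unchanged(record, stored_digest)
retrieved_altered = is_unchanged(b"patient_id=1001,visit_date=2021-03-05", stored_digest)
```

A check like this only detects that something changed. It does not say what changed or whether the original value was correct in the first place.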

Types of data integrity

Data integrity can broadly be separated into two categories:

  • Physical integrity: Ensuring physical integrity relates to the difficulties associated with storing and retrieving data. These problems include power outages, storage corrosion and natural disasters.
  • Logical integrity: Ensuring logical integrity means tackling software bugs, design flaws and human error. Logical integrity also focuses on the correctness of a piece of data in a particular context.
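Logical integrity checks can often be written as simple rules that a record must satisfy in its context. The sketch below assumes a hypothetical hospital-visit record; the field names and the age limit are illustrative assumptions, not rules taken from UPMC or Gartner.

```python
from datetime import date
from typing import List


def check_visit(record: dict) -> List[str]:
    """Return a list of logical-integrity problems found in a visit record."""
    problems = []

    # Every visit must identify a patient.
    if not record.get("patient_id"):
        problems.append("missing patient_id")

    # A discharge cannot happen before the admission it belongs to.
    admitted = record.get("admitted")
    discharged = record.get("discharged")
    if admitted and discharged and discharged < admitted:
        problems.append("discharged before admitted")

    # Assumed plausibility range for age; a real rule would be agreed on by the business.
    age = record.get("age")
    if age is None or not 0 <= age <= 120:
        problems.append("age out of range")

    return problems


good = {"patient_id": "P-1001", "admitted": date(2021, 3, 1),
        "discharged": date(2021, 3, 4), "age": 57}
bad = {"patient_id": "", "admitted": date(2021, 3, 4),
       "discharged": date(2021, 3, 1), "age": 430}

good_problems = check_visit(good)
bad_problems = check_visit(bad)
```

Each value in the second record is well-formed on its own. The record fails only because the values are checked against their context.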

Data integrity and data quality

Data integrity ties into building better data quality in an organization, though it can be difficult to measure.

"Looking into the tables, looking into the data source to see what's going on," Chien said. "This is what we call measuring data quality."

Chien said organizations can use data quality tools to ease this burden and find issues within a data set. Any changes to data as it is brought in or taken out can have dramatic effects on results and conclusions, leading to misinformed decisions.

Building communication for better data integrity

To prevent issues with data integrity, it is best to ensure different areas of an organization are on the same page. Encouraging proper communication and cooperation across departments can be a difficult transition for organizations. This was especially difficult in years past as teams that focused on data and analytics were less common.

Now that the infrastructure is there, Quinn and his team work with organization leaders and encourage them to ask questions about results and the data process. Any strange results or odd conclusions can be brought to the appropriate team so it can check the data integrity. The process has evolved over time, with a lot of back and forth, to make sure the right questions are being asked and everyone involved is comfortable with it.

"Oftentimes people don't know what questions can be answered with data," Quinn said. "We listen to the questions and we try our best to answer them with what we know so far."

Better questions come from improving the data literacy within the organization. Establishing lines of communication between departments is an important piece, but without data literacy the conversations can't proceed as effectively as they should. Issues and errors can't be easily explained if one side doesn't understand the process the same way.

"I tend to think of it from like almost a data literacy angle," Quinn said. "People don't know what is available or what does exist or what can be done."

Investing in training for less-technical departments helps those teams understand what to expect and what might be attainable. It also allows for more back and forth between departments, which increases the number of checks in a system reliant on technical expertise.

"It's very important that you have alignment, you have agreements about your data quality rules and policies," Chien said. "So that people can sit down together and agree on what would be the expectation for certain data objects."

Tips for maintaining data integrity

Some tips to maintain data integrity are:

  • Data profiling. This is the process of examining the available data from a source and creating profiles about this data. By getting to know the individual parts of a larger data set, an organization can check its data for issues before using it.
  • Data cataloging. Keep an organized inventory of your organization's data assets. This allows for a better understanding of your data's attributes and can prevent poor-quality data from hindering business operations.
  • Improving data literacy. Line-of-business teams have become a larger part of the data and analytics process. This has increased the importance of ensuring all employees have a level of familiarity and comfort when it comes to discussing data and common issues.
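As a rough illustration of the data profiling tip, the sketch below summarizes a single column by counting rows, missing values and distinct values. It is a simplified stand-in for what a data quality tool would do; the function name and the sample column are hypothetical.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: row count, missing values, distinct values, top value."""
    # Treat None and empty strings as missing (an assumption for this sketch).
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Hypothetical column of state codes pulled from a source system.
states = ["PA", "PA", "OH", "", None, "pa", "PA"]
profile = profile_column(states)
```

In this sample, the profile counts "PA" and "pa" as separate values, which is the kind of inconsistency profiling is meant to surface before the data is used.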

Patience can improve data integrity

To align different departments and improve data literacy, organizations have to be aware of their competitors' successes and failures and be patient with their internal development. For Quinn and UPMC, the transition might have been difficult, but they have seen benefits over time.

"You almost learn what not to do, and how other industries and other organizations have tripping points -- and you learn from that," Quinn said.

Other parts of UPMC, such as the clinical analytics team, had a working familiarity with UPMC data, which aided Quinn. His team took it a step further by attempting to use that same data with a more rounded approach.

"That hard work is absolutely paying off now because we have a lot of those processes built," Quinn said. "Five years ago, it was okay to think, 'Let's look at just acquiring data on a monthly basis.'"

That cadence evolved over time from monthly to weekly to hourly. Quinn's team had to learn what could be done from a technology perspective while balancing that against the broader business's comfort level with the process. Processing massive amounts of data has become easier, but it has also created more opportunities for data integrity to slip. A slow approach has allowed his team to take on more manageable challenges without getting overwhelmed.

An organization needs to ensure data integrity in order to trust its data and its decisions. Patience and communication can help, but there must be a cultural commitment to maintaining data integrity if an organization seeks to succeed.
