How Regenstrief’s Privacy-Preserving EHR Linkage Fuels Clinical Research

NIH is leveraging Regenstrief’s privacy-preserving record linkage (PPRL) technology for COVID-19 clinical research.

Regenstrief Institute supports privacy-preserving record linkage (PPRL), which protects patients’ identities while allowing researchers to access health data from EHRs and other sources for clinical research.

“The data in the healthcare space can be fragmented,” Umberto Tachinardi, MD, MSc, chief information officer and interim president and CEO for Regenstrief Institute, told EHRIntelligence in an interview.

Connecting healthcare data in an identified way requires patient consent, which is a complex, costly, and lengthy process, he explained.

“In cases that we need a lot of data for a lot of people very fast, the only way to do that is connecting data without using the person identifiers,” Tachinardi said. “That's where the privacy preserving record linkage comes into play, because none of the true identifiers are used to identify a person.”

PPRL, developed in partnership with Datavant and the Indiana Clinical and Translational Sciences Institute (CTSI), produces unique sets of de-identified tokens that are used to match patients. The tokens bear no relationship to the true identifiers and cannot be used to reproduce them.
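
The general idea of token-based matching can be illustrated with a minimal sketch. This is not the Regenstrief or Datavant algorithm, whose details the article does not describe; it assumes a keyed hash (HMAC-SHA256) over a few normalized demographic fields, and the key, field choice, and function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret shared by contributing sites (an assumption for this sketch).
SECRET_KEY = b"hypothetical-shared-site-key"


def normalize(value: str) -> str:
    """Uppercase the value and keep only letters and digits."""
    return "".join(ch for ch in value.upper() if ch.isalnum())


def make_token(first: str, last: str, dob: str, sex: str, key: bytes = SECRET_KEY) -> str:
    """Derive a de-identified token from normalized identifiers with a keyed hash."""
    message = "|".join(normalize(part) for part in (first, last, dob, sex))
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


# Two sites record the same (fictional) person with different casing and spacing.
site_a = make_token("Maria", "Lopez", "1980-02-14", "F")
site_b = make_token("maria", "LOPEZ ", "1980-02-14", "f")
print(site_a == site_b)
```

Because the hash is keyed and one-way, a party holding only the token cannot read the name or birth date back out of it, yet two sites that apply the same procedure to the same person arrive at the same token.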

The National Institutes of Health (NIH) recently extended its relationship with Regenstrief Institute as the Linkage Honest Broker (LHB) for the National COVID-19 Cohort Collaborative (N3C), a national effort to gather EHR data to help scientists understand COVID-19.

The N3C has a unique model in which NIH holds the data in a centralized FedRAMP platform, while Regenstrief holds the de-identified tokens corresponding to the patient data and links tokens originating from disparate data contributors.
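
As a rough sketch of that division of labor, the linking party needs only tokens and opaque record identifiers, never clinical data or names. The function name, tuple layout, and sample values below are hypothetical and are not drawn from the N3C implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Submission = Tuple[str, str, str]  # (contributor, token, local_record_id)


def link_by_token(submissions: Iterable[Submission]) -> Dict[str, List[Tuple[str, str]]]:
    """Group records by token and keep tokens seen at more than one contributor."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for contributor, token, record_id in submissions:
        groups[token].append((contributor, record_id))
    return {
        token: records
        for token, records in groups.items()
        if len({contributor for contributor, _ in records}) > 1
    }


# Made-up tokens and record IDs for illustration only.
submissions = [
    ("hospital_a", "tok_1", "A-100"),
    ("hospital_b", "tok_1", "B-774"),
    ("hospital_b", "tok_2", "B-775"),
]
links = link_by_token(submissions)
print(links)
```

Only `tok_1` appears in the result, since it is the only token submitted by two different contributors.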

Tachinardi, who serves as the N3C LHB director, explained that N3C is using PPRL to combine data on tests, viral variants, mortality, and vaccinations with EHR data.

He noted that when sites contribute EHR data for their patients, they can define what kind of linkage they agree to. For instance, they could choose to permit linkages to mortality data, but not viral variant data.
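
A site-level linkage preference of this kind could be modeled as a simple allow-list checked before any link is made. The sketch below is illustrative only; the class, dataset labels, and token values are hypothetical and do not reflect how N3C actually stores site agreements.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SiteConsent:
    """Hypothetical record of the linkage types a contributing site has approved."""
    site_id: str
    allowed_linkages: Set[str] = field(default_factory=set)


def may_link(consent: SiteConsent, dataset: str) -> bool:
    """Return True only if the site has explicitly approved this dataset type."""
    return dataset in consent.allowed_linkages


def permitted_links(consent: SiteConsent, candidates: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Drop candidate token matches for any dataset the site has not approved."""
    return {
        dataset: tokens
        for dataset, tokens in candidates.items()
        if may_link(consent, dataset)
    }


# A site that allows mortality linkage but not viral variant linkage.
consent = SiteConsent(site_id="site-001", allowed_linkages={"mortality"})
candidates = {"mortality": ["tok_1", "tok_2"], "viral_variant": ["tok_1"]}
result = permitted_links(consent, candidates)
print(result)
```

Defaulting to an empty allow-list means that a site which has specified nothing is linked to nothing, which is the conservative choice for this kind of check.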

Pending approval from the National Institutes of Health (NIH) National Center for Advancing Translational Sciences (NCATS) and the Centers for Medicare & Medicaid Services (CMS), Regenstrief also plans to incorporate CMS claims data into the N3C data enclave, which Tachinardi said will have major value for clinical research.

“The claims contain some redundancy with data that we obtain from EHR systems, but they can have additional information,” he explained. “For instance, we will have a lengthier vision of the patient medical history.”

Regenstrief will be able to track information from providers that are not part of the network, as the claims information will include data from all services, Tachinardi elaborated.

“If a patient is seen in systems that contribute data to N3C, but also in systems that do not contribute data, we will see data for both,” he said. “That's going to be extremely useful in learning more details about the medical history and the medical conditions for each patient.”

Tachinardi also noted that incorporating CMS claims data will provide more complete medication information.

“If the patients are also covered by the Medicaid program, we may have some information about prescriptions paid by their claims,” he said.

Tachinardi noted that Regenstrief wants to build an evaluation platform to make sure that patient matching using this technology is accurate and is not adding biases to the data.

For instance, members of underserved populations may have names that are not easy to spell, he noted.

“The spelling of the name is one of the contributors to the tokens that are generated to produce the PPRL, so those people may be negatively affected if we don't introduce certain improvements in the algorithm,” he explained.

Inaccuracy of demographic information may also affect patient matching because the tokens are generated out of those identifiers, Tachinardi added.
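
The sensitivity to spelling is easy to see with any hash-based token: a one-character difference in the input produces an entirely different token. The sketch below uses a hypothetical HMAC-based token and a simple accent-folding step as one possible mitigation. It is an assumption for illustration, not the improvement Regenstrief has in mind, and it does not help with outright misspellings.

```python
import hashlib
import hmac
import unicodedata

KEY = b"hypothetical-key"  # assumption for this sketch


def token(name: str, dob: str) -> str:
    """Hypothetical token: keyed hash of the name and date of birth."""
    return hmac.new(KEY, f"{name}|{dob}".encode("utf-8"), hashlib.sha256).hexdigest()


def fold(name: str) -> str:
    """Strip diacritics, uppercase, and keep letters only."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in stripped.upper() if ch.isalpha())


DOB = "1975-06-01"

# Same fictional person, entered with and without diacritics.
raw_match = token("Nguyễn", DOB) == token("Nguyen", DOB)
folded_match = token(fold("Nguyễn"), DOB) == token(fold("Nguyen"), DOB)

# A genuine misspelling still breaks the match, even after folding.
typo_match = token(fold("Nguyen"), DOB) == token(fold("Nguyan"), DOB)

print(raw_match, folded_match, typo_match)
```

Normalization of this kind recovers matches lost to formatting differences, while misspellings and changed demographics would need other handling, such as the detection work Tachinardi describes.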

Regenstrief is working to ensure changes to demographic information can be properly detected.

“We will not fix anything, but as long as we can detect, we can improve the matching to not carry forward biases that are added by our process,” he said. “Of course, we cannot improve the data beyond what was originally captured, but we can make sure that we are not adding more inconsistencies in the data moving forward.”

The contract between Regenstrief and NIH, initially worth more than $2.3 million over the first year, allows for expansion of the data linkage based on future needs of the NIH and the National Center for Advancing Translational Sciences, which could include the use of PPRL to create databases for other conditions.

“While we are focusing primarily on COVID specific, there are no limitations in the architecture of the platform to stop there,” Tachinardi said. “We can easily adjust it to other conditions.”

An additional section of the agreement could increase the funding up to its pre-approved ceiling of $15 million for the first two years, Regenstrief officials noted.
