Supporting Secure Data Sharing, Patient Privacy During COVID-19

Regenstrief Institute is partnering with NIH and other organizations to promote secure data sharing and enhance research related to COVID-19.

When COVID-19 began spreading across the US, the healthcare industry quickly moved to improve its secure data sharing practices in order to accelerate research efforts and treatment development.

Because the crisis was unfolding at such a large scale, leaders needed a way to safely share virus-related data among different organizations.

“When the pandemic hit, it became obvious that we should have a large COVID-19 database that people could use for research,” Umberto Tachinardi, MD, MSc, chief information officer for the Regenstrief Institute and director of informatics for Regenstrief and Indiana Clinical and Translational Sciences Institute (CTSI), told HealthITAnalytics.

This idea led to the creation of the National COVID Cohort Collaborative (N3C), a national effort to securely collect data to help scientists understand and develop treatments for COVID-19. NIH launched the N3C as a centralized analytics platform to store and study large amounts of EHR data from people tested for the virus.

The N3C is a partnership among the National Center for Data to Health (CD2H) and National Center for Advancing Translational Sciences (NCATS)-supported Clinical and Translational Science Awards (CTSA) Program hubs, with stewardship by NCATS.

Since its launch, the initiative has expanded to include data from a range of institutions, Tachinardi noted.

“Because of its success, other healthcare organizations have started to send data to the N3C enclave, not only CTSA hubs. We're talking about close to 100 healthcare organizations that are sending data to this platform, and it's all de-identified by design,” said Tachinardi.

To further preserve data security and patient privacy for the initiative, Regenstrief Institute recently announced that it will serve as the project’s Honest Data Broker, employing processes and technologies to ensure N3C data are shared in compliance with HIPAA standards.

These practices will help investigators overcome the challenges of securely collecting patient-level data, which is typically fragmented and difficult to use in large-scale research.

“EHR data comes labeled with identifiers – the patient’s name, date-of-birth, insurance numbers, and zip codes. One of the things that the sites do before they send the data to the enclave is remove most of the identifiers, and they will only send a little bit of a pseudo-identifier, or close-to identifiers, authorized by HIPAA. We call that the limited data set,” Tachinardi explained.

“However, once we get rid of all those identifiers, the problem is that we cannot add more data to a patient. So, if data was sent by a laboratory and another set of data was sent by a hospital that is separated from that laboratory, we’d need some identifiers in order to link these datasets together.”

To address this issue, N3C sought to develop a solution that would keep the data de-identified while still allowing patient matching, even without knowing who the patient is.

That solution, called privacy-preserving record linkage (PPRL), eliminates the need to expose identifiers and will help Regenstrief leaders ensure N3C data are shared securely and privately.

“We’re using a technology based on cryptographic methods. Once the site strips the identifiers out of the data, the software will use pieces of all those identifiers to create a hashed token. To make sure that it's even more difficult to re-identify patients, once the hash is created, we do another scrambling of the characters that define the token. So, it's kind of a double-encryption,” Tachinardi said.
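The token-building idea Tachinardi describes can be illustrated with a minimal Python sketch. This is not the software N3C or Regenstrief uses. The function names, the choice of identifier pieces, SHA-256, and the keyed second pass with a hypothetical `shared_key` are all assumptions made for illustration.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Lowercase the value and drop accents, spaces, and punctuation."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return "".join(ch for ch in ascii_only.lower() if ch.isalnum())


def make_token(first_name: str, last_name: str, birth_date: str,
               zip_code: str, shared_key: bytes) -> str:
    """Build a linkage token from pieces of identifiers (illustrative only)."""
    # Step 1: combine normalized pieces of the identifiers.
    # Which pieces to use is an assumption, not the N3C specification.
    pieces = "|".join([
        normalize(first_name)[:3],
        normalize(last_name),
        normalize(birth_date),
        normalize(zip_code)[:3],
    ])
    # Step 2: hash the combined pieces so the raw values are not exposed.
    first_pass = hashlib.sha256(pieces.encode("utf-8")).digest()
    # Step 3: a second, keyed pass stands in for the extra "scrambling" step.
    # shared_key is hypothetical; every site would need the same key
    # for tokens to match across sites.
    return hmac.new(shared_key, first_pass, hashlib.sha256).hexdigest()


key = b"demo-key-not-for-production"
token_a = make_token("José", "O'Brien", "1970-01-02", "46202", key)
token_b = make_token(" jose", "OBRIEN", "1970/01/02", "46202-1234", key)
print(token_a == token_b)  # same person, different formatting
```

Normalizing before hashing matters because a hash changes completely when its input differs by even one character. Without that step, two sites that format the same name or date differently would produce tokens that never match.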

“We’re now setting up an infrastructure to start supporting sites that are sending the de-identified data to the enclave. Once they join this process, those tokens will be stored at Regenstrief. We will not store any identifier, but we will keep the tokens so we can provide the means for linking the data itself without ever exposing the identifier.”
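How stored tokens could link records from separate sources, such as the laboratory and hospital in Tachinardi's earlier example, can be sketched as follows. The record fields, token values, and the `link_by_token` helper are hypothetical and do not reflect the actual N3C data model.

```python
from collections import defaultdict


def link_by_token(*sources):
    """Group de-identified records from several sources by shared token.

    Each source is a (source_name, records) pair, and every record is a
    dict carrying a "token" key. No direct identifier is ever needed.
    """
    linked = defaultdict(dict)
    for source_name, records in sources:
        for record in records:
            # Keep the clinical fields, keyed by the source they came from.
            payload = {k: v for k, v in record.items() if k != "token"}
            linked[record["token"]].setdefault(source_name, []).append(payload)
    return dict(linked)


# Made-up tokens and fields, for illustration only.
lab_records = [
    {"token": "a3f1", "test": "SARS-CoV-2 PCR", "result": "positive"},
    {"token": "9c0d", "test": "SARS-CoV-2 PCR", "result": "negative"},
]
hospital_records = [
    {"token": "a3f1", "event": "admission", "length_of_stay_days": 4},
]

linked = link_by_token(("lab", lab_records), ("hospital", hospital_records))
for token, by_source in linked.items():
    print(token, sorted(by_source))
```

In this sketch, the record with token `a3f1` ends up with both laboratory and hospital data attached, while `9c0d` has laboratory data only. The linkage step sees tokens, never names or birth dates.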

PPRL will also help researchers associate new pieces of data with an individual without revealing the person’s identity, Tachinardi said.

“For instance, we can associate an individual with images, genomic data, social determinants, future information. When people start getting vaccinations, we can continue to track what's going on with them. This is going to be very exciting,” he said.

The N3C initiative will provide researchers and providers with critical information related to the virus and its impact on different patient populations.

“Right now, we don't know the long-term effects of COVID-19, and because this is a database of COVID-positive patients, we’ll have information that can help us learn more about the virus and its side effects,” Tachinardi said.

“The larger the cohort, the faster we'll get answers and the higher the possibility that we’ll find something new.”

Going forward, Tachinardi believes that innovations accelerated by the COVID-19 pandemic will continue to serve the healthcare industry even after the crisis has passed.

“The concept of data sharing and data integration is not new, but developing an infrastructure like this in a matter of months is unprecedented,” Tachinardi concluded.

“There are a number of accomplishments that we are seeing here that can be repeated, and now we know better. So, in the event of another pandemic, we can achieve these things faster and cheaper. We can also apply these techniques to other conditions, like diabetes, cardiovascular disease, and obesity. The principles of N3C will definitely be a game changer in healthcare research.”
