How Architectural Privacy-Enhancing Tools Support Health Analytics
Architectural privacy-enhancing technologies play a key role in protecting patient data during healthcare analytics projects, but what are they?
Data analytics is critical to advancing healthcare quality and medical breakthroughs, but protecting patient data must be a priority throughout the process.
Privacy-enhancing technologies (PETs) are critical tools healthcare organizations can leverage for data privacy and security. PETs can be broken down into three categories: algorithmic, architectural, and augmentation. To support healthcare analytics, a combination of these types is recommended.
This is the second installment of a series breaking down each category of PET and its healthcare use cases, following a deep dive into algorithmic PETs.
Here, HealthITAnalytics will explore the second type: architectural PETs.
FEDERATED LEARNING
Unlike algorithmic PETs, which alter how data are represented to protect privacy, architectural PETs are concerned with the structure of the data or computation environments. These PETs focus on confidentially exchanging information without sharing the underlying data.
Federated learning is an approach that is often used in the development of artificial intelligence (AI) and machine learning (ML) models.
IBM conceptualizes federated learning as a method to help train these models without anyone having access to the data underpinning the model itself:
“Under federated learning, multiple people remotely share their data to collaboratively train a single deep learning model, improving on it iteratively, like a team presentation or report. Each party downloads the model from a datacenter in the cloud, usually a pre-trained foundation model. They train it on their private data, then summarize and encrypt the model’s new configuration. The model updates are sent back to the cloud, decrypted, averaged, and integrated into the centralized model. Iteration after iteration, the collaborative training continues until the model is fully trained.”
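In code, that loop looks something like the minimal sketch below, which assumes a simple linear model and three hypothetical sites. Production systems use deep networks, weighted averaging, and encrypted update exchange, but the core pattern of training locally and sharing only model weights is the same.

```python
import numpy as np

def local_update(global_weights, X, y, lr=0.1, epochs=5):
    """One site's local training: gradient descent on its own private data."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w = w - lr * grad
    return w

def federated_round(global_weights, site_datasets):
    """One round of federated averaging: sites train locally, server averages weights."""
    local_weights = [local_update(global_weights, X, y) for X, y in site_datasets]
    return np.mean(local_weights, axis=0)  # simple, unweighted average of site models

# Three hypothetical sites whose raw data never leave their own environment.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
sites = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=100)
    sites.append((X, y))

weights = np.zeros(2)
for _ in range(10):  # iterative rounds of collaborative training
    weights = federated_round(weights, sites)
print(weights)  # approaches [2.0, -1.0] without the sites ever pooling their records
```

Only the weight vectors cross institutional boundaries here; the per-site arrays `X` and `y`, standing in for private patient data, stay where they were created.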
Research indicates that most federated learning applications for biomedical data are focused on radiology and oncology. Some use cases include brain imaging, COVID-19 diagnostics, tumor detection, cancer biomarker prediction, and Internet of Healthcare Things (IoHT) applications. Researchers have also proposed a federated learning framework to improve fairness in AI-based screening tools.
Researchers from the Perelman School of Medicine at the University of Pennsylvania (Penn) undertook the first application of federated learning to real-world medical imaging data in 2018.
The study describing these efforts was published in 2019, demonstrating that a deep-learning model trained via federated learning could accurately segment brain tumor images, achieving 99 percent of the performance of the same model when trained via traditional data-sharing methods.
The work helped to establish the feasibility of utilizing federated learning to address data acquisition, labeling, and sharing challenges typically associated with imaging analytics research.
That same year, researchers at Penn’s Center for Biomedical Image Computing & Analytics (CBICA) were awarded a three-year, $1.2 million federal grant to develop a federated learning framework focused on tumor segmentation.
The grant resulted in Penn spearheading a collaborative of 29 institutions around the world to advance these efforts.
The potential benefits of federated learning-based healthcare applications include enhancing data privacy, achieving a balance of accuracy and utility, enabling low-cost health data training, and reducing data fragmentation. The approach also enables asynchronous transmissions, which can bolster multi-institutional collaboration and communication.
Since federated learning allows users to move the model to the data, instead of vice versa, local model training doesn’t require high-dimensional, storage-intensive medical data to be duplicated by each user. Researchers indicate this helps the model scale naturally with a growing dataset without increasing data storage requirements.
Federated learning can also be combined with other PETs, like differential privacy and secure multi-party computation, to fulfill additional data protection requirements in medical research. To this end, experts are working to benchmark strategies in federated learning for biomedical data.
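One common combination is to add differential privacy to the updates each site shares. The sketch below, in the same spirit as the earlier example, clips a site’s weight update and adds Gaussian noise before it is sent, limiting how much any single patient record can influence what leaves the site. The clipping norm and noise scale shown are illustrative values, not a calibrated privacy budget.

```python
import numpy as np

def privatize_update(weight_update, clip_norm=1.0, noise_std=0.1, rng=None):
    """Clip a site's model update and add Gaussian noise before sharing it.

    clip_norm and noise_std are illustrative; a real deployment would calibrate
    them against a formal differential-privacy budget.
    """
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(weight_update)
    if norm > clip_norm:                      # bound any single site's influence
        weight_update = weight_update * (clip_norm / norm)
    return weight_update + rng.normal(scale=noise_std, size=weight_update.shape)

# A site would apply this to (local_weights - global_weights) before uploading.
```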
However, there are some notable challenges to applying federated learning in healthcare analytics.
Like other PETs, federated learning is computationally intensive and requires high communication bandwidth.
Transparency is another issue. Because the training data behind a federated learning model are kept private, a separate system is needed to test the accuracy, fairness, and potential bias of the model’s outputs. But the technology is still new enough that no such system has been developed and widely adopted.
Researchers are working to address these challenges, as shown by a recent proposal for a novel architecture to tackle missing values, data harmonization, and ‘learning task’ schema issues present in federated learning efforts in the biomedical space.
SECURE MULTI-PARTY COMPUTATION
Secure multi-party computation is an architectural PET that, like federated learning, allows parties to share data for computations without revealing that data, which is why the two are often used together.
Experts underscore that secure multi-party computation is an important tool for developing large-scale privacy-preserving applications, as it “enable[s] a group to jointly perform a computation without disclosing any participant’s private inputs. The participants agree on a function to compute, and then can use [a multi-party computation] protocol to jointly compute the output of that function on their secret inputs without revealing them.”
Successful secure multi-party computation also requires stakeholders to uphold five requirements: privacy, correctness, independence of inputs, guaranteed output, and fairness.
Privacy requires that no party learn anything about the other parties’ inputs; each party should receive only the computed output. Correctness guarantees that the output each party receives is accurate. Independence of inputs dictates that each party must provide its input independently of the others, while guaranteed output indicates that all parties should receive the output and be willing to respect it.
Finally, fairness holds that each party should receive calculated outputs only if all other parties receive theirs.
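A toy example helps make the idea concrete. The sketch below is a minimal additive secret-sharing scheme, one of the building blocks behind many secure multi-party computation protocols; the hospital names and counts are hypothetical, and real protocols add safeguards this sketch omits.

```python
import random

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(secret, num_parties):
    """Split a secret into random additive shares that sum to it modulo PRIME."""
    shares = [random.randrange(PRIME) for _ in range(num_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def secure_sum(secrets):
    """Each party shares its input; only the combined shares reveal the total."""
    num_parties = len(secrets)
    all_shares = [share(s, num_parties) for s in secrets]
    # Each party i sums the i-th share it received from every participant...
    partial_sums = [sum(col) % PRIME for col in zip(*all_shares)]
    # ...and only these partial sums are combined to reconstruct the total.
    return sum(partial_sums) % PRIME

# Three hypothetical hospitals jointly compute a total patient count
# without any one of them revealing its own count.
counts = [1200, 845, 970]
print(secure_sum(counts))  # 3015
```

Because each individual share is a uniformly random value on its own, seeing any single share reveals nothing about the input behind it; only the combined partial sums expose the agreed-upon output.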
Secure multi-party computation is being evaluated to bolster genome research and patient risk stratification. Researchers are also looking into how secure multi-party computation can support privacy-preserving healthcare data analytics and developing secure multi-party computation-based tools for biomedical research.
The use of this PET in other industries highlights some of its major benefits. A study evaluating the use of secure multi-party computation in the automotive industry found that the tool enables new strategies for technology-based control, reduces the need for inter-organizational trust, and prevents losing competitive advantage due to data leakage.
Despite these benefits, there are multiple limitations to the use of secure multi-party computation.
The PET requires stakeholders to trust the technology, which is a challenge in healthcare, as clinicians and other stakeholders can be hesitant to buy into a 'black box' tool.
The approach can also introduce new risks of data misuse. Secure multi-party computation requires a certain level of communication between parties, and the nature of the PET creates the potential for two parties to collude to determine a third party’s data.
Essentially, if the first two parties are willing to share their data with one another, it is possible for them to deduce the third party’s data from the jointly computed output. Preventing this requires stakeholders to have a robust secure multi-party computation strategy and to implement certain protocols, like privacy zones.
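To see why collusion is a risk when the jointly computed function is something as simple as a sum, consider this short illustration, reusing the hypothetical counts from the earlier sketch:

```python
# Values are hypothetical. Every party learns the jointly computed total.
total = 3015
party_a, party_b = 1200, 845          # two colluders pool their own inputs...
party_c = total - party_a - party_b   # ...and recover the third party's value
print(party_c)                        # 970, learned without that party's consent
```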
Privacy zones can be set up by using multiple domains or servers that each contain sets of privacy restrictions. Using the privacy zone framework enables separate parties to engage in secure multi-party computation while protecting the data, as none of those data are stored on the same servers or located within the same domains.
BLOCKCHAIN
Blockchain isn’t always considered a PET on its own, but it can be when combined with approaches like secure multi-party computation, homomorphic encryption, and zero-knowledge proofs to protect data privacy.
Blockchain networks are distributed ledger technologies, meaning that they allow stakeholders to record, track, share, and synchronize information without a central entity. Each exchange made on a blockchain is transparent, immutable, and permanent as a result.
Each exchange or transaction is recorded as a 'block' of data with a unique identifier in the form of a 'hash.' This hash changes if the information within the block does.
Blocks are then connected to those before and after using a 'chain' that ensures the block cannot be altered. The chain also guarantees that another block cannot be inserted between two already existing ones.
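A minimal sketch of that block-and-hash structure, assuming SHA-256 hashing, is shown below. It is a toy ledger for illustration rather than a production blockchain; real networks add consensus mechanisms and distribute the chain across many nodes.

```python
import hashlib
import json

def block_hash(block):
    """Hash a block's contents; any change to the data changes the hash."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, data):
    """Link a new block to the previous one via the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"index": len(chain), "data": data, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    chain.append(block)
    return chain

def is_valid(chain):
    """Verify no block was altered or inserted: every hash must still match."""
    for i, block in enumerate(chain):
        recomputed = block_hash({k: v for k, v in block.items() if k != "hash"})
        if block["hash"] != recomputed:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

# Hypothetical exchanges recorded on the ledger.
chain = []
append_block(chain, {"record": "consent granted", "patient": "anon-001"})
append_block(chain, {"record": "result shared", "patient": "anon-001"})
print(is_valid(chain))                            # True
chain[0]["data"]["record"] = "consent revoked"    # tampering with a block...
print(is_valid(chain))                            # ...is detected: False
```

Because each block’s hash is embedded in the next block’s `prev_hash` field, altering or inserting a block breaks the chain for every block that follows it.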
In healthcare, blockchain has been used to overcome IT barriers, and it can be combined with AI to fuel big data analytics. Proposed blockchain applications for healthcare include EHR interoperability, improved data security, and fog computing for the Internet of Things (IoT).
Life sciences organizations have also leveraged blockchain to secure patients’ fertility data, a potentially valuable use case as healthcare rethinks patient privacy following the overturn of Roe v. Wade and experts push for healthcare data sharing to protect patient privacy.
Blockchains also benefit from their typically decentralized network architectures, which rely on multiple servers, each of which can act as an individual ‘master’ server to manage data. In healthcare, these networks can help support data fluidity and break down data silos.
But to effectively leverage blockchain, healthcare organizations must understand how it potentially impacts healthcare data security and other measures. Some of the most important considerations, along with security, are confidentiality, data storage, and data availability.
Making blockchain privacy-compliant across sectors has also been a challenge, particularly in healthcare.
Research in this area suggests that one challenge lies in analyzing vulnerabilities to attacks within each layer of blockchain architecture to prevent data breaches. Using blockchain alongside other PETs can help minimize some of these risks. But experts indicate that each PET has its own shortcomings, necessitating that stakeholders maintain a strong risk mitigation strategy for these technologies while also supporting the development of new PETs.