Using Algorithmic Privacy-Enhancing Technologies in Healthcare Analytics
Successful healthcare analytics efforts require stakeholders to prioritize data de-identification through the use of algorithmic privacy-enhancing technologies.
Big data analytics in healthcare have the potential to advance medical research and lead to important breakthroughs, but ensuring both security and access to protected health information (PHI) can be a challenge.
Healthcare data access is a major barrier to the advancement of technologies like artificial intelligence, but implementing robust data de-identification practices can help mitigate this obstacle.
A key part of data de-identification within healthcare analytics involves finding the tools that best support an organization’s needs.
Many of these tools are known as privacy-enhancing technologies (PETs), which the Organisation for Economic Co-operation and Development (OECD) defines as “digital solutions that allow information to be collected, processed, analysed, and shared while protecting data confidentiality and privacy.”
There are many types of PETs, but only a handful are best suited to healthcare use cases.
Experts indicate that a framework combining algorithmic, architectural, and augmentation PETs can help ensure appropriate data de-identification and protect patient privacy during a healthcare analytics project.
Below, HealthITAnalytics will outline some of the algorithmic PETs used to support healthcare data de-identification and analytics.
HOMOMORPHIC ENCRYPTION
Algorithmic PETs are designed to protect data privacy by altering how the data are represented while still ensuring that the information is usable. These PETs, which encompass tools like summary statistics and encryption, provide measurability and mathematical rigor to the data being analyzed.
In healthcare, encryption is a boon to data security. However, it is important to note that healthcare data encryption and de-identification are distinct, though essential, processes.
Data encryption involves transforming data into a new form that makes it harder to access and gain meaningful insights, while de-identification focuses on removing identifiers from PHI. At their core, though, both are concerned with limiting the traceability or readability of healthcare data to protect patient privacy, and the two are often used in tandem to that end.
Encryption typically involves leveraging mathematical models to scramble data so that only parties with a special key can unscramble, or decrypt, and access that data. The process transforms “plaintext” into “ciphertext,” akin to using a secret code to convey a message so that only those who understand the code can read it.
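To make the plaintext-and-ciphertext idea concrete before turning to homomorphic encryption, the short Python sketch below encrypts and decrypts a sample record with conventional symmetric encryption. It assumes the open-source cryptography package (specifically its Fernet recipe), and the sample values are purely illustrative.

```python
# A minimal sketch of conventional symmetric encryption, assuming the
# third-party `cryptography` package is installed (pip install cryptography).
from cryptography.fernet import Fernet

# Generate a symmetric key; only holders of this key can decrypt.
key = Fernet.generate_key()
cipher = Fernet(key)

# Illustrative plaintext record (not real patient data).
plaintext = b"Patient: Jane Doe, MRN 12345, Dx: hypertension"

ciphertext = cipher.encrypt(plaintext)  # scrambled bytes, unreadable without the key
recovered = cipher.decrypt(ciphertext)  # readable again only with the key

print(recovered == plaintext)  # True
```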
Homomorphic encryption is one type of encryption that can be utilized in healthcare. It differs from other types of encryption in that it enables users to perform computations on the encrypted data without requiring decryption or an access key. The results remain encrypted and can later be decrypted only by those with the access key.
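To illustrate what computing on encrypted data looks like in practice, the sketch below uses the open-source python-paillier (phe) package, which implements a partially homomorphic scheme that supports addition on ciphertexts. The scenario, values, and package choice are illustrative assumptions rather than a production-ready pipeline.

```python
# A minimal sketch of (partially) homomorphic encryption, assuming the
# open-source python-paillier package is installed (pip install phe).
from phe import paillier

# The data owner generates a keypair and encrypts individual readings.
public_key, private_key = paillier.generate_paillier_keypair()
heart_rates = [62, 75, 88, 71]
encrypted = [public_key.encrypt(rate) for rate in heart_rates]

# An untrusted analytics party can sum the ciphertexts without ever seeing
# the underlying values or holding the private key.
encrypted_total = encrypted[0]
for ciphertext in encrypted[1:]:
    encrypted_total = encrypted_total + ciphertext

# Only the key holder can decrypt the aggregate result.
total = private_key.decrypt(encrypted_total)
print(total, total / len(heart_rates))  # 296 74.0
```

Fully homomorphic schemes extend this idea to arbitrary computations on encrypted data, which is where much of the complexity and performance cost discussed below comes from.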
Researchers reviewing the use cases for homomorphic encryption in healthcare demonstrated that the technology has been used to support the detection of Long QT Syndrome, cancer research, heart rate monitoring, prediction of heart attack probability, and medicine side effect query-generating systems.
This PET has also been proposed for medical image encryption to bolster cloud storage security.
Despite its promise, homomorphic encryption presents some drawbacks, including that it is often more complex, more limited, and slower than traditional encryption. The complex nature of this tool also requires that stakeholders wishing to implement it within their systems have access to engineers with a strong cryptography background and expertise in homomorphic encryption.
DIFFERENTIAL PRIVACY
Differential privacy is an approach that is often used in combination with tools like encryption.
It is not a specific process or tool that can be used on its own but a property that a process or tool can have. For example, an algorithm can be considered differentially private if it can be proven that said algorithm satisfies the parameters of differential privacy.
These parameters were first outlined in 2006, but a definition for data privacy was later laid out during a 2016 lecture: “[data privacy is achieved when] the outcome of any analysis is equally likely, independent of whether any individual joins, or refrains from joining, the dataset.”
The Harvard University Privacy Tools Project expands on this, noting that “an algorithm is said to be differentially private if by looking at the output, one cannot tell whether any individual's data was included in the original dataset or not.”
Using this framework, an algorithm’s behavior barely changes when one individual’s data is added to or removed from the dataset. Because of this, that algorithm’s outputs will be roughly the same, whether an individual contributes their data or not.
But how is this achieved?
According to the Brookings Institution, the success of differential privacy relies on random “noise” that is added to computations performed on a dataset. The amount of noise added is pre-determined and serves to obscure information about individual data.
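As a concrete illustration of that noise-adding step, the sketch below applies the classic Laplace mechanism to a simple count query. The toy cohort and the epsilon value are assumptions chosen for demonstration, not recommendations.

```python
# A minimal sketch of the Laplace mechanism, a standard way to make a counting
# query differentially private by adding calibrated random noise.
import numpy as np

rng = np.random.default_rng()

def dp_count(records, predicate, epsilon=0.5):
    """Differentially private count of records matching `predicate`.

    A count changes by at most 1 when any one person joins or leaves the
    dataset (sensitivity = 1), so Laplace noise with scale 1/epsilon yields
    epsilon-differential privacy. Smaller epsilon means more noise and more privacy.
    """
    true_count = sum(1 for record in records if predicate(record))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Toy cohort: how many patients carry a diagnosis code of interest?
cohort = [{"dx": "E11.9"}, {"dx": "I10"}, {"dx": "E11.9"}, {"dx": "J45"}]
print(dp_count(cohort, lambda record: record["dx"] == "E11.9"))
```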
Research suggests that differential privacy may be valuable for medical data analysis, particularly to improve existing medical data publishing methods. Some studies have also found that differential privacy has application potential for public health surveillance, genomics, and medical image analysis.
The major benefit of utilizing differential privacy is that it assumes all information is identifying information, making it resistant to re-identification attacks that rely on auxiliary information.
These characteristics allow researchers to gain insights into patterns about a given population without compromising the privacy of any individuals within that population.
However, differential privacy is computationally intensive. It requires vast datasets, personnel, and resources to deploy. Organizations leveraging differential privacy could also overstate how much privacy they provide to users, or they may not wish to disclose how much private information they use in their analyses.
Additionally, differential privacy is relatively new, making standards, best practices, and tools to support its use difficult to access outside academic research communities.
ZERO-KNOWLEDGE PROOFS
Zero-knowledge proofs are another type of algorithmic PET with potential for use in healthcare analytics.
The National Institute of Standards and Technology (NIST) defines zero-knowledge proofs as “cryptographic schemes in which a prover is able to convince a verifier that a statement is true without providing any more information than that single bit (that is, that the statement is true rather than false).”
This PET is often adopted within blockchain applications, but research suggests that it may improve privacy and efficiency for other use cases, such as banking and economics, as well. In healthcare, researchers have proposed that zero-knowledge proofs could bolster Internet of Healthcare applications (IoHA).
Insights from the Centers for Disease Control and Prevention’s (CDC) Emerging Technology & Design and Acceleration Branch (ETDAB) indicate that zero-knowledge proofs could also be leveraged to enhance the security and privacy of patient data while improving the accuracy and completeness of patient records, as they would enable providers to prove that they have access to a patient’s medical record without revealing its contents.
Data privacy is the main benefit of relying on zero-knowledge proofs, but they can also significantly reduce the cost of verifying the correctness of a statement.
Computations require human capital in addition to resources like electricity and hardware. Technologies like blockchain often require users to re-execute a computation to verify the correctness of each block added to the end of a chain. Zero-knowledge proofs, on the other hand, allow one party to prove to others that the computation was executed correctly the first time.
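To give a flavor of how a prover can convince a verifier without revealing the underlying secret, the sketch below implements the textbook Schnorr identification protocol, made non-interactive with the Fiat-Shamir heuristic. The tiny group parameters are assumptions chosen for readability and are far too small to be secure; real systems use much larger groups and vetted cryptographic libraries.

```python
# A minimal sketch of a zero-knowledge proof of knowledge: the prover shows it
# knows a secret x with y = g^x mod p without ever revealing x (Schnorr protocol,
# made non-interactive via the Fiat-Shamir heuristic).
import hashlib
import secrets

# Public group parameters: p is prime and g generates a subgroup of prime
# order q (here, 4 has order 11 modulo 23). Toy-sized for readability only.
p, q, g = 23, 11, 4

def prove(x):
    """Prover: knows secret x and publishes y = g^x mod p plus a proof (t, s)."""
    y = pow(g, x, p)
    r = secrets.randbelow(q)                    # one-time random nonce
    t = pow(g, r, p)                            # commitment
    c = int.from_bytes(hashlib.sha256(f"{g}{y}{t}".encode()).digest(), "big") % q
    s = (r + c * x) % q                         # response; reveals nothing about x
    return y, t, s

def verify(y, t, s):
    """Verifier: checks the proof using only public values, never x itself."""
    c = int.from_bytes(hashlib.sha256(f"{g}{y}{t}".encode()).digest(), "big") % q
    return pow(g, s, p) == (t * pow(y, c, p)) % p

secret = 7                                      # stays with the prover
print(verify(*prove(secret)))                   # True
```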
Despite these benefits, zero-knowledge proofs are subject to a drawback shared by many PETs: their novelty. Because these technologies are newer than more traditional methods of privacy protection, standards for deployment and management are often difficult to find or establish.
While zero-knowledge proofs and other PETs have the potential to significantly improve the security and privacy of patient data, they remain in the early stages of development and come with limitations. Zero-knowledge proof systems, for example, can be complex and resource-intensive, making them difficult to deploy and manage. Stakeholders must also consider the potential risks associated with using these technologies, such as the risk of data being lost or corrupted.
In addition to these algorithmic PETs, users wishing to engage in data de-identification for healthcare analytics may also utilize data masking or obfuscation, which involves modifying the data in such a way that it will be of little or no value to unauthorized personnel.
This approach has been used in healthcare-driven machine learning (ML) and healthcare business intelligence (BI) platforms and to improve security for contact-tracing apps.
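As a simple illustration of masking, the sketch below pseudonymizes a direct identifier with a keyed one-way hash and coarsens or redacts other fields before a record is shared for analytics. The field names, salt handling, and masking rules are illustrative assumptions; real projects should follow an applicable de-identification standard.

```python
# A minimal sketch of data masking: direct identifiers are pseudonymized with
# a keyed one-way hash and quasi-identifiers are coarsened before sharing.
import hashlib
import hmac

# Illustrative secret; in practice this key is kept outside the shared dataset.
SECRET_SALT = b"replace-with-a-secret-kept-outside-the-shared-dataset"

def pseudonymize(value: str) -> str:
    """Replace an identifier with a stable, non-reversible token."""
    return hmac.new(SECRET_SALT, value.encode(), hashlib.sha256).hexdigest()[:16]

def mask_record(record: dict) -> dict:
    """Return a masked copy of the record for the analytics team."""
    return {
        "patient_token": pseudonymize(record["mrn"]),  # stable join key, not the MRN
        "name": "REDACTED",
        "birth_year": record["dob"][:4],               # keep year, drop month and day
        "dx": record["dx"],                            # clinical fields kept for analysis
    }

raw = {"mrn": "12345", "name": "Jane Doe", "dob": "1984-06-02", "dx": "E11.9"}
print(mask_record(raw))
```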