Machine Learning Tracks EHR Data to Predict Disease Risk
A machine learning technique can track patients’ EHR data over time to predict their risk of developing different diseases.
A new sequential approach uses machine learning to link the diagnoses and medications recorded in patients’ EHRs over time and quantify disease risk, according to a study published in the Cell Press journal Patterns.
While EHRs contain important information about patients’ health conditions and the care they receive, these records are not always precise. EHRs may not be direct indicators of patients’ true health states at different points in time, but rather reflect clinical processes, patients’ interactions with the system, and the recording process.
Researchers from Massachusetts General Hospital developed a strategy that uses machine learning to follow patients’ diagnoses and medications over time, rather than treating each health record entry in isolation.
“Over the past decade, billions of dollars have been spent to institute meaningful use of EHR systems. For a multitude of reasons, however, EHR data are still complex and have ample quality issues, which make it difficult to leverage these data to address pressing health issues, especially during pandemics such as COVID-19, when rapid responses are needed,” said lead author Hossein Estiri, PhD, of the Mass General Laboratory of Computer Science.
“In this paper, we propose an algorithm for exploiting the temporal information in the EHRs that is distorted by layers of administrative and health care system processes.”
Analyses showed that this sequential approach can accurately estimate the likelihood that a patient actually has an underlying disease.
“The temporal relationships encoded in the new approach capture some of the complexities of the clinical process that are lost in the conventional approach,” the researchers noted.
For example, coronary artery disease followed by chest pain in the medical record was found to be more useful in predicting the development of heart failure than either of the factors on their own or in a different order.
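The idea of treating ordered pairs of records as features can be illustrated with a minimal Python sketch. This is not the researchers' algorithm: the function names (`ordered_pairs`, `sequence_rates`), the code labels, and the four toy patients are all hypothetical, and the sketch only compares how often each "A then B" sequence appears among patients with and without the outcome.

```python
from collections import defaultdict
from itertools import combinations


def ordered_pairs(records):
    """Return the set of (earlier_code, later_code) pairs for one patient.

    `records` is a list of (iso_date, code) tuples. Only the first
    occurrence of each code is used; codes first seen on the same
    day produce no pair. (Illustrative simplification.)
    """
    first_seen = {}
    for date, code in sorted(records):
        first_seen.setdefault(code, date)

    pairs = set()
    for (a, date_a), (b, date_b) in combinations(first_seen.items(), 2):
        if date_a < date_b:
            pairs.add((a, b))
        elif date_b < date_a:
            pairs.add((b, a))
    return pairs


def sequence_rates(patients, labels):
    """Map each ordered pair to (share of cases, share of controls) showing it.

    `labels` maps patient id to 1 (has the outcome) or 0 (does not).
    """
    counts = defaultdict(lambda: [0, 0])  # pair -> [controls, cases]
    for pid, records in patients.items():
        for pair in ordered_pairs(records):
            counts[pair][labels[pid]] += 1

    n_cases = sum(labels.values())
    n_controls = len(labels) - n_cases
    return {
        pair: (c[1] / max(n_cases, 1), c[0] / max(n_controls, 1))
        for pair, c in counts.items()
    }


# Toy, made-up records: two patients with the outcome, two without.
patients = {
    "p1": [("2019-01-05", "CAD"), ("2019-06-10", "chest_pain")],
    "p2": [("2018-03-01", "chest_pain"), ("2019-02-01", "CAD")],
    "p3": [("2019-01-01", "CAD"), ("2019-03-01", "chest_pain"),
           ("2019-03-15", "CAD")],
    "p4": [("2019-04-01", "chest_pain")],
}
labels = {"p1": 1, "p2": 0, "p3": 1, "p4": 0}

rates = sequence_rates(patients, labels)
print(rates[("CAD", "chest_pain")])   # (1.0, 0.0)
print(rates[("chest_pain", "CAD")])   # (0.0, 0.5)
```

In this toy data, the order of the two codes separates the groups even though both codes appear in cases and controls alike, which is the kind of signal a single-code feature would miss. A real pipeline would also need feature selection and a trained classifier, which this sketch leaves out.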
The new approach could give providers new insights into common diseases, and highlight patterns that might be less obvious to caregivers.
“Our study doesn’t rely on single diagnostic codes but instead relies on sequences of codes with the expectation that a sequence of relevant characteristics over time is more likely to represent reality than a single element,” Estiri said.
“Additionally, the computer sorts through thousands of patients and can find sequences that a physician would likely never identify on their own as relevant, but actually are associated with the disease.”
The method could also help identify disease markers that are interpretable by clinicians. This could lead to new computational models for identifying and validating new disease markers and for advancing medical discoveries.
The proposed way of thinking about medical records could also help identify patients in a community who are at risk of developing a variety of other diseases and flag them for evaluation by healthcare practices.
The results of the study demonstrate the utility and value of temporal data in the EHR, and how this information can be used to inform care practices.
“Given the rapidly increasing prevalence of EHR systems in today’s practice, exploiting the temporal information in EHRs can advance medical knowledge discovery and meaningfully change clinical care by identifying and validating novel disease markers,” the researchers concluded.
“Much like the genomics community, the identified sequences of medical records can be cataloged and shared on an accessible platform that would allow for the collaborative clinical use of the sequences as risk factors for diseases in many domains.”