ML Highlights Population Differences in Long COVID Risk, Symptoms
Weill Cornell Medicine researchers leveraged machine learning to uncover how long COVID symptoms and risk vary across populations.
Using machine learning (ML)-based analysis of electronic health record (EHR) data, researchers from Weill Cornell Medicine found that long COVID risk and symptoms present differently across diverse populations.
The study, which was published last month in Nature Communications, aimed to characterize post-acute sequelae of SARS-CoV-2 infection (PASC), also known as long COVID, for different populations. Doing so remains a challenge because of the condition’s complexity.
“Long COVID is a new disease that is very complicated and quite difficult to characterize,” said Chengxi Zang, PhD, an instructor in population health sciences at Weill Cornell Medicine and lead author on the paper, in the press release. “It affects multiple organs and presents a severe burden to society, making it urgent that we define this disease and determine how well that definition applies among different populations. This paper provides the basis for furthering research on long COVID.”
To help investigate long COVID’s impact on various populations, the researchers analyzed EHRs from two clinical research networks within the National Patient-Centered Clinical Research Network (PCORnet).
The first dataset was pulled from the INSIGHT Clinical Research Network and contained data from 11 million patients in the New York City area. The second was sourced from the OneFlorida+ network, which included information from 16.8 million patients from Georgia, Florida, and Alabama.
Using these data and a machine learning tool, the research team identified diagnoses that were more common in patients with recent COVID infections than in non-infected patients.
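The comparison described above can be illustrated with a small sketch. The article does not name the machine learning tool or the statistics the researchers used, so everything here is an assumption made for illustration: the function name `diagnosis_risk_ratios`, the toy cohorts, the smoothed risk ratio, and the `min_ratio` cutoff are all hypothetical and are not the study's method.

```python
from collections import Counter


def diagnosis_risk_ratios(infected, uninfected, min_ratio=1.5):
    """Return {diagnosis: risk ratio} for diagnoses over-represented after infection.

    Each cohort is a list of sets; each set holds one patient's diagnosis labels.
    The ratio compares the share of infected patients with a diagnosis to the
    share of uninfected patients with it.
    """
    inf_counts = Counter(dx for patient in infected for dx in patient)
    uninf_counts = Counter(dx for patient in uninfected for dx in patient)

    ratios = {}
    # Only diagnoses seen in the infected cohort are candidates.
    for dx, n_inf in inf_counts.items():
        # Small pseudo-counts (an assumption) avoid division by zero when a
        # diagnosis never appears in the comparison cohort.
        rate_inf = (n_inf + 0.5) / (len(infected) + 1)
        rate_uninf = (uninf_counts[dx] + 0.5) / (len(uninfected) + 1)
        ratio = rate_inf / rate_uninf
        if ratio >= min_ratio:
            ratios[dx] = ratio
    return ratios


# Toy, made-up cohorts; real EHR data would need matching and adjustment.
infected_cohort = [
    {"fatigue", "hair loss"},
    {"fatigue"},
    {"chest pain"},
    {"fatigue", "chest pain"},
]
uninfected_cohort = [{"fatigue"}, set(), {"fracture"}, set()]

excess = diagnosis_risk_ratios(infected_cohort, uninfected_cohort)
for dx, ratio in sorted(excess.items(), key=lambda item: -item[1]):
    print(f"{dx}: {ratio:.2f}x more common after infection")
```

A real analysis of this kind would also have to account for confounders such as age, comorbidities, and care-seeking behavior, which this sketch ignores.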
“Our approach, which uses machine learning with electronic health records, provides a data-driven way to define long COVID and determine how generalizable our definition of the disease is,” Zang explained. “Comparing records across diverse populations in regions that experienced the COVID-19 pandemic differently highlighted how variable long COVID is for patients and emphasized the need for further investigation to improve the diagnosis and treatment of the disease.”
Diagnoses found across both populations included hair loss, fatigue, abnormal heartbeat, blood clots in the lung, chest pain, sores in the small intestine and stomach, and dementia.
However, the researchers also found more types of symptoms and a higher risk of long COVID in the New York City cohort. The research team indicated that these differences may be the result of multiple factors, such as New York City’s more diverse population and the fact that the area was among those hit by the first waves of the COVID-19 pandemic.
Other institutions are also using ML approaches to investigate long COVID.
In January, researchers from the University of California, Berkeley leveraged ML software and EHR data to find common symptoms and identify condition subtypes of long COVID.
Their approach computationally modeled PASC phenotype data found in EHRs and assessed phenotypic similarities among patients. Using these similarities, the research team then clustered patients into groups based on patient-patient similarity scores.
These efforts yielded six clusters of PASC patients with distinct profiles of phenotypic abnormalities.
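To make the idea of clustering on patient-patient similarity scores concrete, here is a minimal sketch. The article does not describe the similarity measure or clustering algorithm the researchers used, so this example swaps in simple stand-ins: Jaccard similarity over sets of phenotype terms, and grouping by connected components of a thresholded similarity graph. The names `jaccard` and `cluster_patients`, the `threshold` value, and the patient data are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of phenotype terms (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_patients(phenotypes, threshold=0.5):
    """Group patients whose pairwise similarity reaches `threshold`.

    `phenotypes` maps a patient id to a set of phenotype terms. Patients are
    linked when their similarity is at least `threshold`, and each cluster is
    a connected component of the resulting graph (tracked with union-find).
    """
    parent = {pid: pid for pid in phenotypes}

    def find(pid):
        # Follow parent links to the cluster representative, shortening the path.
        while parent[pid] != pid:
            parent[pid] = parent[parent[pid]]
            pid = parent[pid]
        return pid

    for p, q in combinations(phenotypes, 2):
        if jaccard(phenotypes[p], phenotypes[q]) >= threshold:
            parent[find(p)] = find(q)

    clusters = {}
    for pid in phenotypes:
        clusters.setdefault(find(pid), set()).add(pid)
    # Sort clusters by their member ids so the output order is stable.
    return sorted(clusters.values(), key=lambda members: sorted(members))


# Toy, made-up patients and phenotype terms.
patients = {
    "p1": {"fatigue", "dyspnea"},
    "p2": {"fatigue", "dyspnea", "cough"},
    "p3": {"hair loss"},
    "p4": {"hair loss", "rash"},
    "p5": {"palpitations"},
}

for members in cluster_patients(patients):
    print(sorted(members))
```

Connected-component grouping is used here only because it is short and has no dependencies. Published phenotype-clustering work typically relies on ontology-aware similarity measures and more robust clustering methods.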