
New ML tool enables more equitable genomic research
A new machine learning tool could help reduce ancestral bias in genetic data, improving equity in genomic research and personalizing care across diverse populations.
Though the risk of bias among AI tools is well-documented, a new tool may help close gaps in genetic research by mitigating ancestral bias in datasets, boosting equity in genomic precision medicine.
Developed by University of Florida researchers, the PhyloFrame tool uses machine learning (ML) to correct ancestral bias in genetic data. Ancestral bias refers to the diversity gaps in databases where individuals with European ancestry are dramatically overrepresented while other groups are underrepresented. The study authors note that the largest genomic database explicitly defining ancestry, GWAS Catalog, is currently 95% European, highlighting the need to close diversity gaps.
"If our training data doesn’t match our real-world data, we have ways to deal with that using machine learning," Kiley Graim, Ph.D., an assistant professor in the Department of Computer & Information Science & Engineering at the University of Florida who led the research, said in a press release. "They're not perfect, but they can do a lot to address the issue."
The researchers described the new ML-based PhyloFrame tool and how it could improve precision medicine outcomes in the journal Nature Communications. The tool accounts for ancestral bias by integrating functional interaction networks and population genomics data with transcriptomic training data. Essentially, it combines large datasets of genomic data from healthy individuals with the smaller disease-specific datasets used to train precision medicine models.
The researchers compared PhyloFrame and benchmark models across several performance metrics using genomic data for breast, thyroid and uterine cancers. They found that PhyloFrame performed better across several metrics and was able to correct ancestral bias.
"PhyloFrame signatures are more consistent across models, demonstrating more stability and therefore likely more biological relevance than the comparable benchmark," they wrote. "Hard-to-predict samples are better served by PhyloFrame models than the benchmark. Furthermore, when considering individuals from ancestries that are severely underrepresented in the data (e.g., East African), PhyloFrame models are better able to accurately predict outcomes."
Researchers also noted that the PhyloFrame tool's ability to offer accurate predictions by accounting for genetic data gaps could enhance ML models across populations.
"We want these models to work for any patient, not just the ones in our studies," Graim said. "Having diverse training data makes models better for Europeans, too. Having the population genomics data helps prevent models from overfitting, which means that they'll work better for everyone, including Europeans."
PhyloFrame is the latest example of an AI tool that could enhance precision medicine efforts.
In May 2024, Pennsylvania State University researchers announced a new AI model to provide insight into how gene expression impacts autoimmune disease risk. The EXpression PREdiction with Summary Statistics Only (EXPRESSO) tool models the ways in which autoimmune disease-associated genes are expressed and regulated, which could be used to flag additional risk genes and improve therapies.
Further, Arizona State University researchers developed an ML model to predict whether a patient's immune system will recognize pathogens and other foreign cells, which could enable clinical care teams to personalize treatment plans for conditions like cancer.
Anuja Vaidya has covered the healthcare industry since 2012. She currently covers the virtual healthcare landscape, including telehealth, remote patient monitoring and digital therapeutics.