275 Million New Genetic Variants Identified in “All of Us” Dataset

Nearly 4 million newly identified genetic variants found in 250,000 participants from the NIH’s All of Us Research Program are thought to be tied to disease risk.

Shania Kennedy, Assistant Editor

Published: 20 Feb 2024

Researchers have identified over 275 million previously unreported genetic variants within the National Institutes of Health’s All of Us Research Program dataset, which may provide additional insights into the genomic drivers of health and disease.

The findings, published this week in Nature, may help advance genomic research for populations historically underrepresented in these studies, as approximately half of the genomic information in the dataset is from participants of non-European genetic ancestry.

Over 90 percent of participants in large genomics studies to date have been of European genetic ancestry, leading to significant concerns around health equity in precision medicine.

“As a physician, I’ve seen the impact the lack of diversity in genomic research has had in deepening health disparities and limiting care for patients,” said Josh Denny, MD, MS, chief executive officer of the All of Us Research Program, in the news release. “The All of Us dataset has already led researchers to findings that expand what we know about health – many that may not have been possible without our participants' contributions of DNA and other health information. Their participation is setting a course for a future where scientific discovery is more inclusive, with broader benefits for all.”

Research teams are already using the All of Us dataset to explore how genomic diversity can enhance precision medicine.

In one study, published in Communications Biology, researchers looking at the frequency of pathogenic variation identified in the All of Us cohort found ancestry-driven disparities. The research team underscored that the significant variability in the frequency of variants associated with disease risk among different genetic ancestry groups may be the result of limited diversity and a disease-focused approach in past studies.

In another study, published in Nature Medicine, researchers used the All of Us dataset to optimize polygenic risk scores for 10 common diseases across diverse populations. The research team emphasized that the genomic datasets used to inform polygenic risk scores typically overrepresent individuals of European genetic ancestry, creating representation gaps that can misrepresent a person’s risk for disease.

To date, the All of Us dataset includes roughly three times as many participants of non-European ancestry as other datasets previously used to calculate polygenic risk scores. Using these data, the researchers recalibrated polygenic risk scores for atrial fibrillation, breast cancer, chronic kidney disease, coronary heart disease, hypercholesterolemia, prostate cancer, type 1 diabetes, type 2 diabetes, obesity, and asthma.

Using the optimized scores, the research team found that one in five study participants were at high risk for at least one of these conditions. Further, participants’ diverse ancestral backgrounds helped demonstrate that the recalibrated polygenic risk scores were effective for all populations and not skewed toward individuals of European ancestry.
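
For readers unfamiliar with the underlying arithmetic, a polygenic risk score is commonly described as a weighted sum of the risk alleles a person carries. The Python sketch below is illustrative only and is not the method used in the Nature Medicine study: the function names, the weights, and the choice to standardize scores within each ancestry group are all assumptions made for the example, and the study's actual calibration approach may differ.

```python
from statistics import mean, pstdev


def raw_prs(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant).

    `weights` stands in for per-variant effect sizes; the values used
    below are made up for illustration.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize_within_groups(scores, groups):
    """Convert raw scores to z-scores using each group's own mean and spread.

    This is one simple, assumed way to keep scores comparable across
    ancestry groups; it is not taken from the study.
    """
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)

    stats = {g: (mean(v), pstdev(v)) for g, v in by_group.items()}

    standardized = []
    for score, group in zip(scores, groups):
        group_mean, group_sd = stats[group]
        # Guard against a group whose scores are all identical.
        z = (score - group_mean) / group_sd if group_sd > 0 else 0.0
        standardized.append(z)
    return standardized


# Hypothetical example: three variants, made-up effect sizes.
example_weights = [0.2, 0.5, 0.1]
example_score = raw_prs([2, 1, 0], example_weights)

# Hypothetical raw scores for four people in two groups labeled "A" and "B".
example_z = standardize_within_groups([1.0, 3.0, 10.0, 14.0], ["A", "A", "B", "B"])
```

In this toy example, the raw scores in group "B" are much larger than those in group "A", yet after standardization each person is ranked only against others in the same group. That illustrates, in a simplified way, why a score tuned to one population can misstate risk in another.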

These efforts play a key role in advancing the All of Us program’s goal of improving diversity in medical research.

“All of Us values intentional community engagement to ensure that populations historically underrepresented in biomedical research can also benefit from future scientific discoveries,” said Karriem Watson, DHSc, MS, MPH, chief engagement officer of the All of Us Research Program. “This starts with building awareness and improving access to medical research so that everyone has the opportunity to participate.”

