NEW YORK – In a set of new papers, members of the National Institutes of Health's All of Us research effort have released data on almost a quarter-million genome sequences generated from individuals in diverse populations across the US, highlighting the genetic variation and disease risk clues gleaned from the data so far.
"The All of Us Research Program has performed whole-genome sequencing on samples from nearly a quarter of a million individuals from diverse backgrounds," Alexander Bick, a genetic medicine researcher with the Vanderbilt University Medical Center, said in an email. "The resource will accelerate genomic medicine discoveries benefiting all of us."
For the first of the papers, published online in Nature on Monday, Bick and colleagues at Vanderbilt University, Baylor College of Medicine, the Broad Institute, and elsewhere provided a high-level view of the genetic variation and disease risk insights identified in clinical-grade whole-genome sequences for 245,388 All of Us participants. They noted that more than three-quarters of the individuals came from groups that have been underrepresented in biomedical research in the past, while nearly 46 percent self-identified as non-European.
"Historically, large groups of individuals have been left out of biomedical research in general and genomics research in particular, limiting our comprehensive understanding of human health," Bick explained. "All of Us seeks to address this gap by nurturing partnerships with at least a million participants who reflect the diversity of the United States and delivering information they share through a dataset broadly accessible to researchers."
With these genomes — which belong to an All of Us genomics data release — the team tracked down some 1.1 billion genetic variants. Among them were more than 275 million genetic variants not found in prior studies, including more than 3.9 million novel variants falling in protein-coding portions of the genome.
The authors reported that summary-level data are publicly available, and researchers can access individual-level data through the All of Us Researcher Workbench using a unique data passport model, with a median time of 29 hours from initial researcher registration to data access. "We anticipate that this diverse dataset will advance the promise of genomic medicine for all," the authors noted.
Meanwhile, the investigators combined the genome sequences with electronic health record data for more than 287,000 individuals, survey responses from around 413,370 participants, physical measurements for more than 337,500 individuals, and array-based genotyping profiles for 312,925 All of Us participants. With these data, they flagged more than 3,700 genetic variants with apparent ties to 117 conditions, prompting them to take a closer look at the genetic associations behind low-density lipoprotein (LDL) cholesterol.
The work pointed to high replication between some of the associations in European and African ancestry participants, Bick said, noting that the investigators successfully brought genomic data generated at three sequencing centers together with EHR data from more than 50 US health systems for the analyses.
For their part, members of an international team led by investigators in the UK and Germany brought together data from the All of Us Research Program, the Million Veteran Program, Biobank Japan, and other large research efforts in the US and beyond to dig into genetic contributors to type 2 diabetes (T2D) for an effort known as the Type 2 Diabetes Global Genomics Initiative, work they described in another Nature study. Using GWAS data for more than 2.5 million individuals, including 428,452 individuals with T2D, the researchers tracked down almost 1,300 variants with genome-wide significant T2D associations.
Combining published ATAC-seq chromatin accessibility profiles with single-cell chromatin accessibility atlases for 222 adult or fetal cell types, they mapped the variants to 611 new or known loci, which fell into eight distinct clusters based on established relationships to cardiometabolic traits.
"These clusters are differentially enriched for cell type-specific regions of open chromatin, including pancreatic islets, adipocytes, endothelial cells, and neuroendocrine cells," the authors reported, noting that they incorporated data for up to 279,552 more diverse individuals to further assess associations with vascular outcomes and come up with polygenic risk scores (PRS) specific to T2D clusters linked to coronary artery disease and other cardiometabolic traits.
"[O]ur findings show the value of integrating multi-ancestry GWASs of T2D and cardiometabolic traits with single-cell epigenomics across diverse tissues to disentangle the etiological heterogeneity driving the development and progression of T2D across population groups," authors of that study explained. "Improved understanding of the varied pathophysiological processes that link T2D to vascular outcomes could offer a route to genetically informed diabetes care and global opportunities for the clinical translation of findings from T2D GWASs."
In Communications Biology, researchers at Baylor College of Medicine, the University of Washington, and elsewhere used All of Us cohort data to explore ancestry-related differences in pathogenic or likely pathogenic variant frequencies in individuals from African, Latino/admixed American, East Asian, European, Middle Eastern, South Asian, and other ancestry groups.
When the team considered pathogenic or likely pathogenic variants implicated in conditions ranging from breast cancer, Li-Fraumeni syndrome, and Lynch syndrome to familial hypercholesterolemia, dilated cardiomyopathy, and long QT syndrome, it saw higher pathogenic variant frequencies in European ancestry individuals but lower rates in individuals of African or Latino/admixed American ancestry.
"This variability is likely the result of multiple factors, but ascertainment of pathogenic variants in databases is likely to contribute substantially," the authors suggested, noting that "[f]uture work will show whether variant interpretation of the All of Us diverse cohort will have an impact on this ascertainment bias of pathogenic variants, but future targeted efforts that aim to perform clinical interpretation of non-European participants could also be necessary."
In Nature Medicine, meanwhile, researchers relied on a National Human Genome Research Institute-funded "Electronic Medical Records and Genomics" (eMERGE) framework to come up with human diversity-informed polygenic risk scores for 10 chronic conditions.
Authors of that study suggested that "the eMERGE Network's work in PRS development represents an important step forward in the implementation of PRS-based risk assessment (in combination with other risk estimates from monogenic testing and family history) in clinical practice."
Finally, in a corresponding commentary in Nature Medicine, All of Us Research Program investigator Joshua Denny and his colleagues offered an overarching view of the program and its goals, along with the progress so far and remaining gaps.
"We represent the NIH institutes, centers, and offices and All of Us," they wrote, "and we invite researchers to join us in expanding the All of Us platform to all disease domains, diverse populations, common and rare genomic variants, social and commercial determinants of health, or other modalities to ensure the advancement of precision health, medicine, and equity for everyone."