Nature | Exome Sequencing of 454,787 Individuals Reveals 564 Genes Linked to Health-Related Traits, Paving the Way for Functional Genomics and Therapeutic Discovery

Nov 30, 2021 14:01 CST

A major goal of human genetics is to use natural variation to understand the phenotypic effects caused by alterations in each protein-coding gene in the genome. Genomicists once estimated that achieving the initial goal of understanding the impact of genetic variations in every human gene on health might require sequencing millions of well-characterized individuals. In recent years, with the advancement of gene sequencing technology, large-scale population genome sequencing has made the "gene-phenotype" goal of human genetics seem attainable.

Recently, a research team from the Regeneron Genetics Center, in collaboration with the UK Biobank team, performed exome sequencing on about 450,000 participants and used bioinformatics analysis to study the phenotypic changes brought about by protein alterations. The findings were published in Nature in an article titled "Exome sequencing and analysis of 454,787 UK Biobank participants".

Corresponding author of the article, Dr. Manuel A. Ferreira

The research team conducted exome sequencing on 454,787 UK Biobank study participants to explore protein variations and their effects. Within the coding regions of 18,893 genes, the team identified about 12 million coding variants, including approximately 1 million loss-of-function variants and around 1.8 million deleterious missense variants (Figure 1). The number of coding variants identified in this study exceeds the combined total in the TOPMed and gnomAD databases. The identified variants comprise 3,457,173 synonymous variants, 7,878,586 missense variants, and 915,289 predicted loss-of-function (pLOF) variants. By combining a large number of samples with thousands of available phenotypes, this coding variant database provides a resource for large-scale evaluation of gene function.

Figure 1. Statistics of gene variants identified in exome sequencing data. Source: Nature
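As a quick consistency check, the three variant classes quoted above can be summed to confirm that they add up to the "12 million" headline figure. The sketch below uses only the counts given in this article; the dictionary and variable names are illustrative.

```python
# Variant counts by predicted consequence, as quoted in the article.
variant_counts = {
    "synonymous": 3_457_173,
    "missense": 7_878_586,
    "pLOF": 915_289,
}

# Sum across classes to compare against the "about 12 million" headline.
total = sum(variant_counts.values())
print(f"total coding variants: {total:,}")

# Share of each consequence class in the total.
for consequence, count in variant_counts.items():
    share = 100 * count / total
    print(f"{consequence:>10}: {count:>10,} ({share:.1f}%)")
```

The total comes to 12,251,048, in line with the rounded figure of 12 million.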

Large-Scale Exome Sequencing Data Reveals Associations Between Genes and Human Traits

Generally speaking, sequencing data alone cannot explain gene function. To confirm that exome sequencing data can be used directly for gene function analysis, the researchers tested the variants found in the study for association with 3,994 health-related traits and discovered that 564 genes were linked to these traits.

The research team first analyzed whole-exome sequencing (WES) data from 430,998 individuals of European ancestry and conducted approximately 2.3 billion association tests between traits and individual variants across 18,811 genes, eventually discovering 8,865 significant associations involving 564 genes, 492 traits, and 2,283 gene-trait pairs (Figure 2). The large number of gene-phenotype associations identified in this study provides analytical pathways for understanding the phenotypic impact of human protein variation and for identifying new therapeutic targets.

Figure 2. 564 genes associated with rare variants. Source: Nature
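To give a sense of how strict a significance cutoff has to be at this scale, the sketch below computes a Bonferroni-style per-test threshold for roughly 2.3 billion tests. The family-wise error rate of 0.05 and the use of Bonferroni correction are assumptions made for illustration; this article does not state which threshold the authors applied.

```python
# Assumption: Bonferroni correction at a family-wise error rate of 0.05.
# The article only gives the approximate number of association tests.
alpha = 0.05
n_tests = 2.3e9

threshold = alpha / n_tests
print(f"per-test significance threshold: {threshold:.2e}")

# Counts quoted in the article.
n_associations = 8_865
n_gene_trait_pairs = 2_283
n_genes = 564

# Simple ratios showing how the significant associations are distributed.
print(f"associations per gene-trait pair: {n_associations / n_gene_trait_pairs:.2f}")
print(f"gene-trait pairs per gene: {n_gene_trait_pairs / n_genes:.2f}")
```

Under these assumptions the per-test threshold is on the order of 2 × 10^-11, which is why only very strong signals survive among billions of tests.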

One of the significant associations was between SLC9A3R2 and the risk of hypertension. SLC9A3R2 encodes NHERF-2, a scaffold protein expressed in the kidney that is functionally linked to sodium absorption through its interaction with sodium/hydrogen transporters. The study found that, after conditioning on Arg2200Cys in PKD1, both the burden of rare predicted loss-of-function (pLOF) and deleterious missense variants and the Arg171Trp variant in SLC9A3R2 remained strongly associated with systolic blood pressure (SBP), diastolic blood pressure (DBP), and hypertension. Overall, this signal is consistent with the role of sodium balance in regulating blood pressure and suggests that blocking SLC9A3R2 could help manage blood pressure.

Another is the association between the risk of childhood asthma and the burden of rare predicted loss-of-function (pLOF) and deleterious missense variants in SLC27A3. SLC27A3 encodes an acyl-CoA synthetase that activates long-chain fatty acids and is most highly expressed in arterial, adipose, and lung tissues. The analysis also showed that SLC27A3 is associated with blood eosinophil counts; eosinophils are a cell type with critical effector functions in allergic asthma.
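The "burden" of rare variants mentioned in both examples can be pictured as collapsing all qualifying rare variants in a gene into a single carrier flag per person and then comparing the trait between carriers and non-carriers. The sketch below is a simplified illustration on made-up toy data, not the method used in the study; the genotype matrix, the trait values, and the `burden_indicator` helper are all hypothetical, and a real analysis would fit a regression model with covariates.

```python
from statistics import mean

# Hypothetical toy data: rows are individuals, columns are rare qualifying
# variants (pLOF or deleterious missense) in one gene; 1 = carries the variant.
genotypes = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
]
# Hypothetical trait values (for example, systolic blood pressure in mmHg).
trait = [128.0, 119.0, 131.0, 121.0, 126.0, 118.0]


def burden_indicator(genotype_rows):
    """Collapse per-variant genotypes into one carrier flag per individual."""
    return [int(any(row)) for row in genotype_rows]


carriers = burden_indicator(genotypes)
carrier_values = [t for t, c in zip(trait, carriers) if c]
noncarrier_values = [t for t, c in zip(trait, carriers) if not c]

# Difference in mean trait value between carriers and non-carriers.
difference = mean(carrier_values) - mean(noncarrier_values)
print(f"carriers: {sum(carriers)}, mean difference: {difference:.1f}")
```

Collapsing in this way pools variants that are individually too rare to test, at the cost of assuming they affect the trait in the same direction.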

What scale of exome sequencing is required to analyze all variants?

At the current sample size, the number of variants observed closely matches the number predicted, so the research team used the current dataset as a baseline for extrapolation. The team predicts that once exome sequencing at a scale of 5 million people becomes feasible, mutations in 18,035 genes will be captured, covering the vast majority of human protein-coding genes (Figure 3).

Figure 3. Number of genes with mutations in the exome sequencing data. Source: Nature

The results also show that, among reference panels of different sizes, larger panels are better able to infer rare variants. The research team therefore expects that as the reference panel grows to 400,000 individuals or more, it will become possible to infer even rarer variants (Figure 4).

Figure 4. Inference of rare variants from exome sequencing. Source: Nature

Summary

The study performed exome sequencing on 454,787 individuals and constructed a dataset of coding variants larger than the combined total of TOPMed and gnomAD. In the initial analysis, the research team linked 564 genes to health-related traits. These findings reveal new biological functions of multiple genes and point to potential therapeutic strategies, including enzyme replacement, therapeutic blockade, and other approaches. The team also projects that expanding sequencing to 5 million individuals would identify mutations in approximately 15,000 genes, covering most of the human protein-coding genome. These results and datasets provide a resource for generating new datasets and for promoting the comprehensive analysis of gene-phenotype associations.

References:

Backman, J.D., Li, A.H., Marcketta, A. et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature 599, 628–634 (2021). https://doi.org/10.1038/s41586-021-04103-z