Nature Publishes Landmark Study on Whole-Genome Sequencing of 150,119 Individuals from UK Biobank, Uncovering Rare Variant Associations with Human Traits

Aug 09, 2022 10:58 CST Updated 10:58

In recent years, whole-exome sequencing (WES) and whole-genome sequencing (WGS) of large, richly phenotyped cohorts have deepened our understanding of how diversity in the human genome sequence affects phenotypic diversity. The UK Biobank (UKB) documents phenotypic variation in 500,000 participants across the UK and is whole-genome sequencing all participants to an average depth of at least 23.5×. Its first WGS data release, covering 150,119 individuals, reports a large number of sequence variants, including single nucleotide polymorphisms (SNPs), short insertions or deletions (indels), microsatellites, and structural variants (SVs). WES, by contrast, is limited to known coding regions and reveals only a small portion (2-3%) of the sequence variation in the human genome.
Recently, teams from Reykjavik University in Iceland, deCODE Genetics, Amgen, and other institutions analyzed WGS data from 150,119 UKB individuals, highlighting the discovery of rare variant-trait associations with large effects that are difficult or impossible to identify using WES and SNP array datasets. This represents the largest whole-genome sequencing effort to date. The findings were published in Nature in an article titled "The sequences of 150,119 genomes in the UK Biobank".

The article was published in Nature

In the WGS data, the research team identified a total of 585,040,410 SNPs, representing 7.0% of all possible human SNPs. In the regions of the genome that can be mapped with short sequence reads, this corresponds to an average of one SNP every 4.8 bp. The study observed 81.5% of all possible autosomal CpG>TpG mutations, 11.8% of other transitions, and 4.0% of transversions. Restricting the analysis to the 17,345,777 autosomal CpG dinucleotides that are methylated in the germline, the researchers found transition variants at 89.1% of methylated CpGs. Due to the high saturation of CpG mutations (Figure 1), the ratio of transitions to transversions (1.66) is lower than that found in smaller WGS datasets and de novo mutation studies.

Figure 1. Mutation types of sequence variants in UKB.
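
To make the transition-to-transversion ratio mentioned above concrete, here is a minimal Python sketch that classifies SNPs and computes the ratio. The function names and the toy SNP list are assumptions made for illustration only; this is not the pipeline used in the study.

```python
# Transitions swap purine for purine (A<->G) or pyrimidine for pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def is_transition(ref, alt):
    """Return True if the ref>alt substitution is a transition."""
    return (ref.upper(), alt.upper()) in TRANSITIONS


def ti_tv_ratio(snps):
    """Compute the transition/transversion ratio for (ref, alt) pairs.

    Assumes every pair is a true SNP (ref != alt, single bases).
    """
    ti = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ti
    if tv == 0:
        raise ValueError("no transversions in input; ratio is undefined")
    return ti / tv


# Hypothetical toy data: three transitions and two transversions.
toy_snps = [("C", "T"), ("A", "G"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ti_tv_ratio(toy_snps))
```

Once most possible CpG>TpG transitions have already been observed in a cohort, additional samples contribute proportionally more transversions, which is consistent with the lower ratio reported for this large dataset.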

Subsequently, the research team used the number of sequence variants in the UKB to search for conserved regions in 500 bp windows across the human genome. The team tabulated the number of variants in each window and compared it with the expected count based on the heptamer (7-nucleotide) composition of the window, the fraction of heptamers carrying a sequence variant genome-wide, and their mutation classes. Each 500 bp window was then assigned a score from 0 (most depleted) to 100 (least depleted), referred to as the Depletion Rank (DR). As expected, coding exons had a low DR (average DR = 28.4), but many non-coding regions had even lower DRs (more depletion), including non-coding regulatory elements. Among the 1% of regions with the lowest DR, 13.0% are coding and 87.0% are non-coding, with overrepresentation of splice regions, UTRs, and regions upstream and downstream of genes (Figure 2). DR increases with distance from coding exons. After removing coding exons, GWAS variants were 3.2-fold overrepresented in the 1% of regions with the lowest DR and only 0.4-fold as frequent as expected in the 1% with the highest DR, indicating that the DR score can be useful for GWAS analysis.

Figure 2. Functionally Important Areas.
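
The Depletion Rank idea described above can be sketched roughly in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the per-heptamer rate table, and the standardized-difference statistic used to order windows are all hypothetical choices, and the sketch ignores mutation classes.

```python
import math


def heptamers(seq):
    """Return every overlapping 7-mer in a window sequence."""
    return [seq[i:i + 7] for i in range(len(seq) - 6)]


def expected_variants(window_seq, heptamer_rate):
    """Expected variant count for a window.

    heptamer_rate maps a heptamer to the (assumed) genome-wide fraction
    of its occurrences that carry a variant; unknown heptamers count as 0.
    """
    return sum(heptamer_rate.get(h, 0.0) for h in heptamers(window_seq))


def depletion_rank(observed, expected):
    """Rank windows from 0 (most depleted) to 100 (least depleted).

    Assumption: windows are ordered by (observed - expected) / sqrt(expected).
    """
    if len(observed) < 2 or len(observed) != len(expected):
        raise ValueError("need at least two windows with matching counts")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    scores = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for position, idx in enumerate(order):
        ranks[idx] = 100.0 * position / (len(scores) - 1)
    return ranks


# Hypothetical toy example: three windows with the same expected count.
toy_ranks = depletion_rank(observed=[1, 5, 10], expected=[5.0, 5.0, 5.0])
print(toy_ranks)
```

In this toy example the window with far fewer variants than expected receives the lowest rank, mirroring how strongly conserved regions end up with low DR values.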

The analysis shows that, on average, each haploid genome carries 341,510 SNP and indel alternative alleles (Figure 3). Because the current human reference genome is primarily derived from individuals of European ancestry, more alternative alleles are typically found in individuals from populations outside Europe, with African individuals carrying the largest number. The research team also constructed a cohort-specific DR and found that exon depletion in African individuals was greater than in European and Asian individuals. Individuals in the European, African, and Asian cohorts had an average of 1,330, 9,623, and 8,340 singleton variants, respectively. Even in European individuals, the expected number of new variants discovered per genome remains substantial. The differences between cohorts are most likely due largely to the intensive sampling of certain regions.

Figure 3. Mutation Call Set.
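
As a small illustration of what a singleton count per individual means, the following Python sketch counts, for each person, the variants whose alternative allele appears exactly once in the whole cohort. The data layout (a dictionary of per-individual alternative allele counts), the function name, and the sample and variant identifiers are assumptions made for this example only.

```python
from collections import Counter


def singletons_per_individual(genotypes):
    """Count singleton variants carried by each individual.

    genotypes maps an individual ID to a dict of variant ID -> number of
    alternative alleles carried (1 = heterozygous, 2 = homozygous).
    A singleton is a variant whose alternative allele is seen exactly
    once across the whole cohort.
    """
    allele_totals = Counter()
    for calls in genotypes.values():
        for variant, alt_count in calls.items():
            allele_totals[variant] += alt_count
    return {
        person: sum(1 for variant in calls if allele_totals[variant] == 1)
        for person, calls in genotypes.items()
    }


# Hypothetical toy cohort of three individuals.
toy_genotypes = {
    "s1": {"v1": 1, "v2": 1},
    "s2": {"v2": 1, "v3": 2},
    "s3": {"v4": 1},
}
print(singletons_per_individual(toy_genotypes))
```

The more densely a population is sampled, the more likely it is that a rare allele is seen in a second carrier, so singleton counts per individual tend to fall as sampling intensity rises.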

The research team tested the GraphTyper SNP/indel, microsatellite, and SV datasets for association with a total of 8,180, 1,291, and 459 phenotypes in the European, African, and Asian cohorts, respectively, highlighting associations that are not easily identifiable in WES or SNP array data. The team used Manta to identify SVs in each individual and genotyped the resulting 895,055 SVs using GraphTyper, of which 637,321 were considered reliable.
In addition, the research team used popSTR to identify 14,321,152 alleles at 2,536,688 microsatellite loci in the 150,119 WGS individuals, who carry an average of 810,606 non-reference microsatellite alleles. In the UKB cohort, the distribution of the number of non-reference microsatellite alleles carried per individual is similar to that of the other variant types in the study. Microsatellites are among the fastest-mutating variants in the human genome and are a source of genetic variation that is often overlooked in GWAS.
Kari Stefansson, founder of deCODE Genetics and co-corresponding author of the paper, said: "This study provides a type and quantity of variation that will revolutionize our ability to identify and characterize intergenic sequences significant to human diversity, whether for disease risk and treatment response or other traits."
In summary, the research team conducted whole-genome sequencing analysis on 150,119 individuals from the UK Biobank. The analysis revealed that coding exons represent only a small fraction of the genomic regions subject to strong sequence conservation. The study identified 895,055 structural variants (SVs) and 2,536,688 microsatellites, variant types that are typically excluded from large-scale whole-genome sequencing studies. Using this powerful new resource, the research team described several rare variant-trait associations with large effects that had not previously been found in studies based on WES or imputation.
References:
Halldorsson, B.V., Eggertsson, H.P., Moore, K.H.S. et al. The sequences of 150,119 genomes in the UK Biobank. Nature 607, 732–740 (2022). https://doi.org/10.1038/s41586-022-04965-x