ChinaMAP Releases Largest-Scale Whole-Genome Sequencing and Phenotypic Study of the Chinese Population

Jun 28, 2020 08:00 CST

MGI

Gene Sequencing Instruments and Related Reagent & Consumables R&D Manufacturer

On April 30, 2020, the China Metabolic Analytics Project (ChinaMAP) Consortium, initiated by the National Clinical Research Center for Metabolic Diseases (Shanghai) and Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, in collaboration with 29 research institutions and hospitals across China, published the first report on the largest-scale whole-genome sequencing and phenotypic study of the Chinese population in Cell Research, a journal hosted by the Shanghai Institute for Biological Sciences of the Chinese Academy of Sciences.


The foundational data for this study were generated on the MGI DNBSEQ sequencing platform and cover more than 10,000 samples from eight ethnic groups across 27 provinces and municipalities in China. Built on domestically developed instruments, platforms, and analytical methods, the study offers an unprecedentedly deep and extensive investigation of the genomic characteristics of the Chinese population. Academician Ning Guang, Professor Wang Weiqing, and Professor Bi Yufang of Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, and the National Clinical Research Center for Metabolic Diseases are co-corresponding authors of the paper. Researcher Cao Yanan, Researcher Li Lin, and Researcher Xu Min are among the co-first authors.

On June 26, 2020, the 7th N·GS Innovation Developer Conference, co-hosted by GeneCloud Health, the Medical Laboratory Engineering Branch of the Chinese Society of Biomedical Engineering, People’s Daily Online, and the Administrative Committee of Qiantang New Area, was held in Hangzhou’s Qiantang New Area. Researcher Cao Yanan attended and delivered a keynote address, pointing out that several ultra-large-scale prospective epidemiological cohort studies are under way in countries around the world, providing systematic data to support a range of applied research.

Cao explained that unbiased sampling of the population gives better estimates of the association between genetic effects and diseases or disease markers, and that long-term follow-up of health records is indispensable for assessing how diseases relate to environmental factors. Large cohorts also accumulate enough incident cases to support etiologic research on low-incidence diseases. Cao further emphasized that elucidating pathogenic mechanisms and establishing diagnoses for common diseases are prerequisites for accurate risk assessment.

So what does this study, built on an unselected Chinese population cohort, reveal about the genomic characteristics of the Chinese population? How do ethnic groups within China differ from one another, and how does the Chinese population differ from other populations? What do these findings mean for precision medicine applications and clinical guidelines? After the presentation, VCBeat interviewed Researcher Cao Yanan exclusively about these questions; the discussion is summarized below.

China’s Progress in Large-Scale Population Cohort Studies

VCBeat: First, could you tell us about the origins of ChinaMAP? Against what background was this large-scale study of the Chinese population launched?

Researcher Cao Yanan: Genomics and multi-omics big data from large-scale population cohorts are playing a leading role in the prevention, diagnosis, and new drug development for major chronic diseases, cancers, and genetic disorders, driving transformative changes in personalized precision health management and disease diagnosis and treatment. The United States and Europe have implemented numerous medical research initiatives based on genotyping and genomic sequencing data from large-scale cohorts, including the renowned UK Biobank, The Cancer Genome Atlas (TCGA) program, and the Trans-Omics for Precision Medicine (TOPMed) program, yielding a series of landmark achievements with profound impact.

For a long time, many studies of genetic diseases in the Chinese population have directly applied data and conclusions derived from other populations. However, populations from different regions and ethnic groups differ substantially in historical origin and genetic background, so knowledge biased toward other populations is an incomplete and unreliable basis for disease risk assessment, genetic counseling, diagnosis, or treatment in the Chinese population. The National Clinical Research Center for Metabolic Diseases (Shanghai), led by Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, has therefore conducted multiple nationwide cohort studies. Drawing on the National Major Science and Technology Infrastructure for Translational Medicine (Shanghai) and the State Key Laboratory of Medical Genomics, it launched the China Metabolic Analytics Project (ChinaMAP) to build a precision medicine system for the Chinese population from Chinese data.

VCBeat: What research methods and strategies did you and your team adopt to conduct such a large-scale population study?

Researcher Cao Yanan: We performed whole-genome sequencing at 40× depth on 10,588 DNA samples from the cohort, representing diverse regions and ethnic groups across China. A sample of this size places stringent demands on sequencing cost and throughput while still requiring high accuracy, so we chose MGI’s domestically developed, high-throughput DNBSEQ sequencing platform.
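To give a sense of the scale involved, the link between mean sequencing depth and output is simple arithmetic: total bases is roughly depth multiplied by genome length. The sketch below assumes a haploid human genome of about 3.1 billion base pairs and ignores duplicate, low-quality, and unmapped reads, so it is a lower-bound estimate of raw output, not the project’s actual data volume. The function name is illustrative.

```python
# Approximate haploid human genome length in base pairs (an assumption).
GENOME_SIZE_BP = 3.1e9


def bases_needed(mean_depth: float, genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Sequenced bases required for a target mean depth: depth x genome size.

    Ignores duplicate, low-quality and unmapped reads, which in practice
    push the required raw yield higher.
    """
    return mean_depth * genome_size_bp


if __name__ == "__main__":
    per_sample = bases_needed(40)      # 40x whole-genome sequencing
    cohort_total = per_sample * 10_588  # number of samples sequenced
    print(f"per sample: ~{per_sample / 1e9:.0f} Gb")
    print(f"whole cohort: ~{cohort_total / 1e15:.2f} Pb")
```

Under these assumptions, 40× coverage works out to roughly 124 Gb of sequence per sample and about 1.3 Pb across all 10,588 samples, which illustrates why throughput and cost per base weighed so heavily in the choice of platform.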

Once the sequencing data were in hand, we built a high-quality dataset of genetic variants in the Chinese population, analyzed the population structure of the cohort, compared genomic features across populations, and characterized variant spectra and pathogenic variants.

Currently, the ChinaMAP Phase I database contains 136 million single nucleotide polymorphism (SNP) sites and 10 million insertion or deletion (INDEL) sites, half of which are novel variants not found in the internationally recognized dbSNP, 1000 Genomes, gnomAD, and TOPMed databases.

VCBeat: How is the database for the ChinaMAP study made accessible and utilized?

Researcher Cao Yanan: The location, annotation, frequency, and data-quality information for every variant in the ChinaMAP database can be searched on the National Clinical Research Center for Metabolic Diseases’ website at www.mBiobank.com, supporting medical and life sciences research in China.

Chinese Characteristics of Large-Scale Population Cohort Studies

VCBeat: Based on the findings of this study, are there differences in genetic characteristics among different regions and ethnic groups in China?

Researcher Cao Yanan: China has a vast territory and a multi-ethnic population. Phase I of ChinaMAP covered seven major geographical regions of the country and included the Han, Zhuang, Hui, Manchu, Miao, Yi, Tibetan, and Mongolian ethnic groups, all among China’s ten most populous. The data reflect the diversity and complexity of genetic backgrounds across China’s geographical regions.

The research team showed for the first time that the Han Chinese population can be divided into seven distinct subgroups: Northern Han (Beijing, Tianjin, Henan, Hebei, Shandong, Liaoning, Jilin, Heilongjiang, Shanxi), Northwestern Han (Gansu, Shaanxi), Eastern Han (Jiangsu, Zhejiang, Shanghai, Anhui), Central Han (Hubei), Southern Han (Guizhou, Sichuan, Chongqing, Hunan, Yunnan, Jiangxi), Southeastern Han (Fujian), and Lingnan Han (Guangdong, Guangxi).

Among ethnic minorities, the Tibetan, Yi, Mongolian, Miao, and Zhuang groups each form distinct genetic clusters, whereas the Manchu are genetically similar to Northern Han Chinese and the Hui resemble Northwestern and Northern Han Chinese. Regional patterns of variation also reflect historical migrations and demographic changes in China. The Hexi Corridor, for example, was a key Silk Road route along which many peoples migrated, and groups including the Sogdians lived and traded there. ChinaMAP found that modern populations in the Hexi Corridor carry more polymorphic loci, with greater genetic complexity, than populations elsewhere.

VCBeat: Based on this study, which diseases are Chinese populations more susceptible to compared to European and American populations?

Researcher Cao Yanan: ChinaMAP conducted a comprehensive analysis of variants associated with hereditary diseases in the Chinese population. The research team found that carrier frequencies of pathogenic variants for conditions such as congenital hypothyroidism, chronic pancreatitis, and hereditary palmoplantar keratoderma are significantly higher in the Chinese population than in European and American populations and show distinct regional distribution patterns. These findings provide valuable references for screening, prevention, and control of key hereditary diseases in China. For instance, the allele frequency of rs142859678, a pathogenic SERPINB7 variant associated with Nagashima-type palmoplantar keratosis (a condition prevalent among Chinese and Japanese populations), is approximately 20 times higher in the Chinese population than in European and American populations. Likewise, certain pathogenic variants associated with hypothyroidism are more than 10 times more frequent in the Chinese population than in European and American populations.

Differences in the frequency of disease-associated variant loci between Chinese and Euro-American populations indicate that genetic counseling and interpretation in China, research on variants of uncertain significance (VUS), and the development of related clinical guidelines and pathways must be based on large-scale, high-quality data from the Chinese population.

VCBeat: What are the differences between the Chinese population and those in Europe and America regarding major chronic diseases with high incidence rates, such as type 2 diabetes and obesity?

Researcher Cao Yanan: Among the genetic factors contributing to complex diseases, many high-effect variants are found predominantly in specific geographic regions and ethnic groups. Only comprehensive analysis of data from a given population can accurately assess genetic disease risk within that population. For instance, TCF7L2 variants such as rs7903146, the most significant genetic risk factor for type 2 diabetes in European populations, occur at very low frequency in Chinese individuals. Relying solely on results from European and American populations for reference and validation is therefore insufficient for studying metabolic traits and diseases.

Although any single variant an individual carries may confer only modest risk, the combined effect of many variants can substantially influence a trait. Assessing individual disease risk with polygenic risk scores (PRS) built on large genotype and phenotype databases from the relevant population is therefore a relatively accurate approach. In ChinaMAP, the researchers calculated a type 2 diabetes PRS for each participant and plotted every individual’s position within the cohort in three dimensions: PRS rank, age, and blood glucose. Blood glucose differed significantly between individuals ranked at high and low PRS risk, and with advancing age, high-risk individuals had significantly higher fasting and 2-hour postprandial glucose than those at moderate or low risk.
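The scoring and ranking steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ChinaMAP pipeline: the variant names, effect sizes, and the tertile cut-offs used to label low, moderate, and high risk are all hypothetical. Each person’s score is the sum, over scored variants, of the variant’s effect size multiplied by the number of risk alleles carried (0, 1, or 2); individuals are then ranked by percentile within the cohort.

```python
from bisect import bisect_left
from typing import Dict, List

# Hypothetical per-allele effect sizes (e.g. log odds ratios). The variant
# names and values are illustrative only, not ChinaMAP results.
WEIGHTS: Dict[str, float] = {"var_a": 0.12, "var_b": 0.08, "var_c": -0.05}


def polygenic_risk_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Sum of effect size x risk-allele count (0, 1 or 2) over scored variants.

    Variants missing from `dosages` are treated as 0 copies for simplicity.
    """
    return sum(beta * dosages.get(variant, 0) for variant, beta in weights.items())


def percentile_ranks(scores: List[float]) -> List[float]:
    """For each score, the fraction of the cohort with a strictly lower score."""
    ordered = sorted(scores)
    n = len(scores)
    return [bisect_left(ordered, s) / n for s in scores]


def risk_group(percentile: float) -> str:
    """Tertile-based label; these cut-offs are an assumption, not the study's."""
    if percentile < 1 / 3:
        return "low"
    if percentile < 2 / 3:
        return "moderate"
    return "high"


if __name__ == "__main__":
    # Hypothetical genotypes for a three-person toy cohort.
    cohort = {
        "person1": {"var_a": 2, "var_b": 1, "var_c": 0},
        "person2": {"var_a": 0, "var_b": 0, "var_c": 2},
        "person3": {"var_a": 1, "var_b": 1, "var_c": 1},
    }
    ids = list(cohort)
    scores = [polygenic_risk_score(cohort[i], WEIGHTS) for i in ids]
    for pid, score, pct in zip(ids, scores, percentile_ranks(scores)):
        print(f"{pid}: PRS={score:.2f}, percentile={pct:.2f}, group={risk_group(pct)}")
```

Real scores combine thousands to millions of variants with weights estimated from large genome-wide association studies in a reference population, which is why the choice of East Asian versus European baseline data affects how well the ranking reflects actual risk.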

Comparative validation also confirmed that risk estimates based on East Asian baseline data are more accurate than those based on European data. These findings underscore the importance of Chinese baseline data for precise risk assessment of type 2 diabetes and other metabolic diseases, with significant value for preventing major chronic diseases, personalized health management, and public health decision-making.

In BMI-related analyses, the research team identified a novel East Asian–specific locus in CADM2, a gene that animal studies have shown to participate in regulating body weight and energy homeostasis. Conversely, important obesity-associated loci identified in European and American populations, such as FTO, did not reach significance in ChinaMAP. These findings suggest that large-scale genomic research specific to the Chinese population is crucial for establishing a precision medicine framework spanning molecular mechanisms and personalized diagnosis and treatment.

The Precision Medicine System for the Chinese Population Is an Indispensable Part of the Global Precision Medicine System

VCBeat: What insights has ChinaMAP provided for the current stage and the foreseeable future?

Researcher Cao Yanan: First, ChinaMAP has revealed that the frequencies of many disease-causing genetic variants in the Chinese population differ from those in European and American populations, providing a reference for genetic counseling and interpretation, as well as for the screening, prevention, and control of genetic diseases in China.

Second, ChinaMAP analyzed and compared genetic characteristics related to drug metabolism in the Chinese population. The team examined which individuals may need a reduced dose of the anticoagulant warfarin, which are suitable candidates for the antiplatelet drug clopidogrel, and who is at risk of side effects from statin lipid-lowering drugs. For simvastatin, a commonly used treatment for hyperlipidemia, more than 20% of Chinese individuals are at risk of rhabdomyolysis as an adverse reaction, indicating that certain medications must be prescribed with caution.

Third, ChinaMAP analyzed and compared genetic characteristics related to nutrient metabolism in Chinese individuals. It confirmed that rs671, a variant in the aldehyde dehydrogenase 2 (ALDH2) gene that impairs alcohol metabolism and causes facial flushing after drinking, is specific to East Asians. Its carrier rate in the Chinese population (4.50% homozygous and 34.27% heterozygous) is significantly higher than in other populations worldwide. Because rs671 is also a significant risk factor for esophageal cancer, people who flush after drinking should limit their alcohol intake.
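The two rs671 genotype percentages also determine how common the variant allele itself is. Assuming both figures are genotype frequencies measured in the same sample, each homozygote carries two copies of the variant and each heterozygote one, out of two alleles per person, so the allele frequency is the homozygote share plus half the heterozygote share. The function names below are illustrative, not part of any ChinaMAP tool.

```python
def carrier_frequency(hom_freq: float, het_freq: float) -> float:
    """Share of people carrying at least one copy of the variant allele."""
    return hom_freq + het_freq


def allele_frequency(hom_freq: float, het_freq: float) -> float:
    """Variant allele frequency from genotype frequencies.

    Each person has two alleles; homozygotes contribute two variant copies
    and heterozygotes one, so q = (2 * hom + het) / 2 = hom + het / 2.
    """
    return hom_freq + het_freq / 2


if __name__ == "__main__":
    hom, het = 0.0450, 0.3427  # rs671 genotype frequencies quoted in the text
    print(f"carriers: {carrier_frequency(hom, het):.1%}")
    print(f"allele frequency: {allele_frequency(hom, het):.1%}")
```

With the quoted rates, about 38.8% of individuals carry at least one copy of rs671, and the variant allele frequency is roughly 21.6%.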

Finally, ChinaMAP will also have a profound impact on drug development, research into disease mechanisms, and precision therapy.

VCBeat readers who wish to learn more can register to watch Researcher Cao Yanan’s online presentation, “ChinaMAP: Whole-Genome and Phenotypic Study of the China Metabolic Analytics Project.”