The Future of Cancer Genomics: Translating Data into Clinical Applications

Feb 26, 2015 16:30 CST Updated 16:30

Editor's Note

The Cancer Genome Atlas (TCGA) is undoubtedly a successful initiative launched by the U.S. National Cancer Institute in recent years. Involving 10,000 patients with more than 20 different types of cancer, it has generated comprehensive clinical, genomic, and molecular biology data. With the participation of hundreds of top-tier laboratories and experts, TCGA has significantly advanced basic cancer research, leading to a deeper understanding of cancer cells and their genomes. On the other hand, data from these 10,000 patients have revealed that the complexity of genomic variations in cancer is far greater than previously imagined. Even within the same type of cancer, there are substantial differences in genetic mutations among patients. This poses significant challenges to our understanding of the mechanisms underlying carcinogenesis, progression, and metastasis. However, with the rapid advancement of sequencing technologies, as well as developments in computing and big data analytics, an ever-growing volume of data will become available. We have good reason to believe that our understanding and treatment of cancer will undergo qualitative improvements within the next 5–10 years.

With the completion of The Cancer Genome Atlas, it is now time to assess its impact and mine its data to gain a better understanding of cancer biology and treatment. On February 5, 2015, Nature Medicine published an editorial titled “The future of cancer genomics,” which evaluated the project’s influence on cancer research and the potential of data mining.

In 2015, The Cancer Genome Atlas (TCGA) will wind down, bringing to completion the largest project led by the U.S. National Institutes of Health. Launched as a pilot program in 2006, the project was tasked with generating a comprehensive catalog of alterations across all tumor types, with the aim of yielding new insights into cancer biology that could be used to develop improved therapies. This high-throughput approach departed from traditionally funded hypothesis-driven projects in pursuit of the ambitious goal of capturing the full spectrum of cancer alterations. Although initially welcomed by the scientific community, it also faced skepticism. It is now time to evaluate TCGA and determine how its insights can be used to benefit the cancer community.

In terms of data generation, the project has been an unequivocal success. Since its inception nearly a decade ago, with a total investment of $375 million, TCGA has incorporated scientific contributions from more than 150 researchers across 16 countries and collected 10,000 tumor samples representing over 25 different cancer types. Its 20-petabyte dataset encompasses 10 million mutations, which have been reported in 17 publications by the TCGA Research Network to date and cited in hundreds of papers. These remarkable figures reflect the project's exponential growth, made possible by rapid advances in sample collection, sequencing, and analytical technologies.

The TCGA project continues to generate a vast amount of information. TCGA data have been used to identify novel mutations, define intrinsic tumor subtypes, determine similarities and differences across cancer types, elucidate mechanisms of drug resistance, and gather evidence of tumor evolution. Undoubtedly, we can now study cancer in unprecedented detail; however, we are not yet able to fully explain the overall landscape of this disease or clarify its underlying mechanisms.

Some TCGA researchers believe that more insights can be gained by continuing to search for new cancer alterations. However, recent assessments have highlighted how daunting the cancer sequencing task is: given background mutation rates, more than 10,000 samples would need to be characterized to detect alterations occurring at a 1% frequency in certain tumor types. Therefore, Louis Staudt, Director of the Office of Cancer Genomics at the National Cancer Institute (NCI), announced that the TCGA Research Network will now focus on using whole-genome sequencing to expand the characterization of three selected tumor types: lung adenocarcinoma, colorectal cancer, and ovarian cancer. The aim is to identify alterations present in only 2% of tumors and to uncover previously overlooked changes, such as translocations.
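
The sample-size argument above can be illustrated with a simplified calculation. This is only a minimal sketch under stated assumptions, not the method behind the assessment cited in the editorial: it treats each tumor as an independent draw that carries the alteration with a fixed frequency, and it uses a hypothetical threshold, min_hits, as a crude stand-in for the number of mutated samples needed to rise above background. The function names and the 90% power target are likewise illustrative choices. Because it ignores background mutation rates, which the editorial identifies as the driver of the real estimates, the counts it produces are far lower than the figures quoted above; what it shows is how the required cohort grows as the target frequency falls.

```python
from math import comb


def detection_power(n_samples, frequency, min_hits):
    """Probability that at least `min_hits` of `n_samples` tumors carry an
    alteration present at the given frequency (simple binomial model).

    Assumption: samples are independent and background mutation is ignored.
    """
    p_fewer = sum(
        comb(n_samples, k) * frequency**k * (1 - frequency) ** (n_samples - k)
        for k in range(min_hits)
    )
    return 1.0 - p_fewer


def samples_needed(frequency, min_hits, target_power=0.9):
    """Smallest cohort size whose detection power reaches `target_power`."""
    n = min_hits
    while detection_power(n, frequency, min_hits) < target_power:
        n += 1
    return n


if __name__ == "__main__":
    # min_hits=3 is a hypothetical threshold chosen only for illustration.
    for freq in (0.05, 0.02, 0.01):
        print(f"frequency {freq:.0%}: about {samples_needed(freq, 3)} samples")
```

In this toy model, halving the target frequency roughly doubles the cohort required, which is consistent with the decision to sequence a small number of tumor types deeply rather than all types thinly.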

This pilot project will also strive to overcome past financial and logistical barriers. Sample acquisition—once the largest financial burden for TCGA—will now be aligned with ongoing clinical trials of targeted cancer therapies, thereby enabling a more comprehensive characterization of genotypes and phenotypes across different stages of cancer. Importantly, the NCI will invest resources to ensure the accessibility and proper analysis of sequencing data. The newly established NCI Genomics Data Commons will provide a portal offering interactive support and best practices for users of genomic data. The results of this pilot study will determine whether a similar approach should be applied to broader oncology research.

Sequencing efforts will continue, albeit on a smaller scale, and they remain crucial for determining the next steps. This will require renewed effort, creativity, and courage from the cancer community, as well as robust support from funding agencies.

Translating TCGA data into applications presents several challenges, along with possible solutions. First, researchers are developing improved computational models to identify relevant variants amid the noise of background genetic variation. While this may reduce data complexity, functional studies must also scale up to match the scale of genomic research. For instance, recent advances in genome editing tools, such as CRISPR-Cas9, have provided unprecedented capabilities to study genetic variants in a rapid, scalable, and more cost-effective manner.

However, to obtain meaningful insights, we need to investigate genetic alterations within the complex and heterogeneous physiological tumor microenvironment. This will require integrating cell lines, organoids, and patient-derived models into a unified workflow to enable high-throughput functional testing of genetic variants. Furthermore, better integration between cancer genomics and clinical practice will allow us to directly identify phenotype–genotype correlations.

TCGA represents a significant contribution to the field of cancer research. By translating cancer genomics into mechanistic insights and future therapeutic strategies, its findings elevate the discipline to a new level and herald a new era in cancer research.

Qiang Sun, a senior bioinformatics expert currently serving at the National Cancer Institute (NCI), specializes in cancer genomics database management. A passionate advocate of big data, he has been a volunteer contributor to Big Data Digest for over a year, eager to connect with like-minded individuals through his writings and foster meaningful collaborations within the big data community. Having lived in the United States for many years, he resides in the Greater Washington D.C. area with his capable wife and three adorable daughters.

Educational Background: Shandong University, Institute of Botany, Chinese Academy of Sciences, and University of California, Los Angeles (UCLA)
Cities Lived In: Zibo, Jinan, Beijing, Los Angeles, Washington
Previous Employers: BioDiscovery Inc., The Institute for Genomic Research (TIGR), J. Craig Venter Institute (JCVI), National Institutes of Health (NIH)
Other hobbies: football, fishing, playing cards

This article is republished by VCBeat with authorization from Big Data Digest.

Author: BioExplore biodiscover.com Big Data Digest [WeChat ID: BigDataDigest]