Industry Analysis of Precision Medicine and Genetic Testing (Part I): Introduction and Product Overview

May 07, 2015 15:35 CST Updated 15:35

Editor's Note: With the advent of precision medicine, genetic testing, already a hot topic, has drawn increasing attention and discussion. Since the first human genome was sequenced in 2003 at a cost of $3 billion, advances in technology and falling costs have led to a proliferation of genetic testing technologies and services. Today, genetic testing is used in the diagnosis and management of over a thousand diseases (Centers for Disease Control and Prevention, 2015). There are currently 26,000 registered laboratory tests covering 5,400 conditions and 3,700 genes (NCBI, 2015). A whole-genome sequencing test costs only $6,995 (Skirton, Jackson, Goldsmith, & Connor, 2013), while direct-to-consumer genetic testing is available for as little as $99. A market report published by UnitedHealth Group (UnitedHealth, 2012) indicates that genetic testing is the fastest-growing segment of the laboratory testing market, with national expenditures reaching $5 billion and projected to rise to between $15 billion and $25 billion by 2021.

Amid this boom, what exactly is genetic testing? What are its different types, and how does it contribute to healthcare and treatment? What role does data analysis play? In an era of transformative change toward universal health insurance coverage, how does the U.S. insurance industry approach genetic testing? How does the general public currently perceive such tests? The authors, Mr. and Mrs. Chen, are leading professionals in big data analytics for biomedicine, with years of experience in the United States developing theoretical models and conducting practical research, and they explain these concepts here. Despite the capital market's fervent enthusiasm, few truly understand the underlying theories and practical applications. We hope this article will encourage interested readers to gain a deeper understanding and will prompt further elaboration and discussion by more professionals.

I. What Is Genetic Testing

Genetic and Genomic Testing: Genetic and genomic testing employs laboratory methods to analyze the DNA instructions inherited from your parents, namely your genes. It can be used to identify increased risks for health conditions, guide treatment selection, or assess response to therapy.

Genetic testing is typically used in the context of hereditary diseases to detect specific genes associated with disease susceptibility (susceptibility genes), thereby assessing an individual’s risk of developing such conditions. For instance, it is well known that the presence of a mutated BRCA1 or BRCA2 gene increases one’s risk of developing breast or ovarian cancer. In contrast, genomic testing generally focuses solely on the cancer tumor itself. It is primarily used to provide patients and physicians with detailed information about a specific tumor: for example, indicating whether more aggressive treatment is required for certain tumors, or whether milder, more appropriate therapeutic approaches can be adopted for “indolent” tumors. This has also given rise to two prominent concepts: targeted cancer therapy and personalized medicine. Steve Jobs, who suffered from pancreatic cancer (one of the most aggressive forms of cancer), comprehensively employed various treatment methods, extending his life by nine years while maintaining a good quality of life.

This article uses the former sense, namely genetic testing. Because genes are central to both kinds of testing, the field is often referred to broadly as genetic and genomic testing, but the definition and content of genomic testing are not covered here.

II. Types and Utility of Genetic Testing

So, what types of genetic testing are currently available, and what purpose does each serve? The National Human Genome Research Institute provides definitions for seven categories of tests in common use today.

Diagnostic Testing: Used to accurately identify the disease responsible for an individual's illness. The results of diagnostic tests can help individuals make timely decisions regarding treatment or health management.

Predictive and Pre-symptomatic Genetic Testing: Used to identify genetic variations that may increase an individual's susceptibility to disease. The results of these tests can be utilized to predict an individual's risk of developing specific diseases, thereby potentially guiding adjustments in lifestyle and healthcare management.

Carrier Testing: Used to identify individuals who carry susceptibility genes associated with diseases. Carriers themselves may not exhibit any overt symptoms of the disease, but they can pass these susceptibility genes on to the next generation. Offspring may then develop the disease or become new carriers. For some diseases, susceptibility genes must be inherited from both parents. This type of testing is particularly important for individuals with a family history of genetic disorders, as they often face a higher risk of specific hereditary diseases than the general population.

Prenatal Testing: Used during pregnancy to help identify whether the fetus has certain serious diseases.

Newborn Screening: Used to screen newborns aged one to two days for known diseases that may affect their health and future development.

Pharmacogenomic Testing: Provides information on how specific drugs act within the human body. This testing helps healthcare providers select the most effective medications based on an individual's genetic makeup.

Research-Grade Genetic Testing: Used to gain a deeper understanding of the contribution of genes to health and disease. The results of such studies may not directly benefit participants, but they can help researchers better understand the human body, health, and disease, thereby advancing medical and health sciences and benefiting others in the future.

III. Can Genetic Testing Screen for All Diseases?

Although thousands of genes associated with hundreds of human diseases and traits have been discovered, for most diseases only a small fraction of the genetic basis has been identified. Moreover, an association does not mean that the gene is the culprit causing the disease; that is, association is not equivalent to causation.

Therefore, for almost all complex diseases, even those known to be highly heritable, existing genetic risk analyses have thus far been able to only partially explain disease occurrence (Do C., et al., 2012). For instance, in the case of ten complex diseases—Alzheimer’s disease, bipolar disorder, breast cancer, coronary artery disease, Crohn’s disease, prostate cancer, schizophrenia, systemic lupus erythematosus, type 1 diabetes, and type 2 diabetes—only approximately 0.4% to 31.2% of their occurrence is explained by known susceptibility gene variants (So HC., et al., 2011). This indicates that predictions of disease development based solely on currently identified genes remain highly inadequate for conditions with genetic components. In other words, risk prediction models built on single nucleotide polymorphisms (SNPs) (broadly understood as a type of DNA genetic variation) generally yield poor predictive performance using currently known markers (identifiable regions on chromosomes) (Do C., et al., 2012).

Therefore, in clinical practice, there is considerable caution regarding the use of genetic information for disease risk prediction. Furthermore, genomics encompasses not only genetic factors but also environmental influences, as well as more complex interactions such as gene-gene and gene-environment interactions, thereby increasing the complexity of data collection and subsequent modeling.

IV. Data Analysis in Genetic Testing

As mentioned above, risk prediction models derived from genetic information are less than ideal. So, how do we evaluate the quality of a prediction model? To answer this, it is necessary to understand the specifics of Genome-Wide Association Studies (GWAS) and how data analysis is applied within them. GWAS, also known as Common-Variant Association Studies (CVAS), involves examining numerous common genetic variants across individuals to determine whether any variants are associated with a particular trait, such as disease phenotypes. GWAS typically focuses on investigating the associations between single nucleotide polymorphisms (SNPs) and major diseases or traits. The most common approach in such studies is phenotypic stratification, where participants are divided into two groups based on their clinical characteristics—for example, a patient group and a healthy control group—and their SNPs are then detected and compared. If a specific variant (allele) appears more or less frequently in the patient group, and statistical validation confirms that this occurrence is not due to chance, the SNP is considered associated with the disease. The genomic region identified by such an associated SNP is thus believed to influence disease risk. It is crucial to reiterate that correlation does not imply causation.
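The case-control comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the allele counts are invented, the function names are our own, and a Pearson chi-square test on a 2x2 allele-count table is one common choice of statistical validation, not necessarily the one used in any particular study.

```python
import math

def allele_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table."""
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in each cell if allele and disease status were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

def odds_ratio(case_alt, case_ref, control_alt, control_ref):
    """Odds of carrying the variant allele in patients relative to controls."""
    return (case_alt * control_ref) / (case_ref * control_alt)

# Hypothetical counts: 1,000 patients and 1,000 controls, two alleles per person.
chi2, p_value = allele_chi_square(460, 1540, 400, 1600)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
print(f"odds ratio = {odds_ratio(460, 1540, 400, 1600):.3f}")
```

Because a genome-wide scan tests a very large number of variants, studies conventionally require far smaller p-values than a single test would before declaring an association. Even then, a small p-value indicates association only, not causation.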

An SNP that GWAS identifies as associated with a specific disease cannot, on that basis alone, be considered to directly cause the disease or increase the risk of developing it.

Gene-based risk prediction models calculate an individual's disease risk coefficient based on the strength of the association between single nucleotide polymorphisms (SNPs) and the disease, and further stratify individuals into different risk groups according to this coefficient. The accuracy of the prediction directly determines whether an individual is correctly classified into a high-risk or low-risk group.
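To make the idea of a risk coefficient concrete, here is a minimal sketch. The article does not give a formula, so the additive form used here (risk-allele counts weighted by the logarithm of each SNP's odds ratio) is an assumption, as are the SNP names, odds ratios, and threshold, all of which are invented for illustration.

```python
import math

# Hypothetical SNP identifiers and odds ratios, invented for illustration only.
ODDS_RATIOS = {"snp_a": 1.30, "snp_b": 1.15, "snp_c": 0.85}

def risk_score(genotype, odds_ratios=ODDS_RATIOS):
    """Sum of allele counts (0, 1, or 2 per SNP) weighted by log odds ratio.

    An odds ratio above 1 raises the score; one below 1 lowers it.
    """
    return sum(math.log(odds_ratios[snp]) * count for snp, count in genotype.items())

def stratify(score, threshold=0.3):
    """Assign a risk group; the threshold is an arbitrary illustrative value."""
    return "high" if score >= threshold else "low"

person_1 = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
person_2 = {"snp_a": 0, "snp_b": 0, "snp_c": 2}

for name, genotype in (("person_1", person_1), ("person_2", person_2)):
    score = risk_score(genotype)
    print(name, round(score, 3), stratify(score))
```

The stronger an SNP's association with the disease, the more each copy of the allele moves the score, which is why errors in the estimated associations translate directly into misclassified individuals.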

One of the most commonly used tools for assessing the precision of risk stratification, originally applied in signal detection theory, is the receiver operating characteristic (ROC) curve. The area under this curve is referred to as the area under the curve (AUC). Together, the two are used to evaluate the accuracy of risk stratification. In practice, since virtually no predictive model is perfect, it is necessary to calculate the correct classification rate and error rate of the predictive model. Accordingly, we introduce four fundamental concepts here: True High Risk, False High Risk, False Low Risk, and True Low Risk, as shown in Figure 1. Given the known disease outcomes for all individuals, we count how many individuals the model's risk classification assigns to each of these four categories.
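As a rough illustration of the AUC mentioned above, it can be read as the probability that a randomly chosen individual who develops the disease receives a higher predicted risk score than a randomly chosen individual who does not. The sketch below computes it by direct pairwise comparison; the scores and outcomes are invented, and the function name is our own.

```python
def auc(scores, outcomes):
    """Area under the ROC curve by pairwise comparison; ties count as half."""
    affected = [s for s, y in zip(scores, outcomes) if y == 1]
    unaffected = [s for s, y in zip(scores, outcomes) if y == 0]
    if not affected or not unaffected:
        raise ValueError("need at least one affected and one unaffected individual")
    wins = 0.0
    for a in affected:
        for u in unaffected:
            if a > u:
                wins += 1.0
            elif a == u:
                wins += 0.5
    return wins / (len(affected) * len(unaffected))

# Hypothetical predicted risk scores and known outcomes (1 = developed the disease).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
outcomes = [1, 1, 0, 1, 0, 1, 0, 0]
print(auc(scores, outcomes))
```

A value of 1 means the scores separate the two groups perfectly, while 0.5 means they do no better than chance.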

Figure 1:

Real Risk: The sum of true high-risk and true low-risk cases, meaning that the classification is correct: individuals classified as high-risk indeed have the disease, while those classified as low-risk do not.

False Risk: The sum of false high-risk and false low-risk cases, that is, misclassification.

We demonstrate this using Figure 2. From left to right, the classification threshold shifts from highly conservative to highly aggressive. The leftmost panel adopts a threshold that requires an extremely high predicted risk score for assignment to the high-risk group, thereby resulting in many false low-risk cases (many individuals classified as low-risk eventually develop the disease). In contrast, the rightmost panel assigns individuals to the high-risk group based on relatively low predicted risk scores, leading to many false high-risk cases. Thus, the choice of classification threshold substantially affects the overall misclassification rate of the predictive model.
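The effect of the threshold can be reproduced with a small sketch. The scores, outcomes, and thresholds below are invented for illustration, and the function and key names are our own.

```python
def classify_counts(scores, outcomes, threshold):
    """Count the four categories when scores at or above the threshold are high-risk."""
    counts = {"true_high": 0, "false_high": 0, "false_low": 0, "true_low": 0}
    for score, diseased in zip(scores, outcomes):
        high = score >= threshold
        if high and diseased:
            counts["true_high"] += 1
        elif high:
            counts["false_high"] += 1
        elif diseased:
            counts["false_low"] += 1
        else:
            counts["true_low"] += 1
    return counts

# Hypothetical predicted risk scores and known outcomes (1 = developed the disease).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
outcomes = [1, 1, 0, 1, 0, 1, 0, 0]

# From a conservative threshold (few high-risk calls) to an aggressive one (many).
for threshold in (0.85, 0.5, 0.15):
    counts = classify_counts(scores, outcomes, threshold)
    false_risk = counts["false_high"] + counts["false_low"]
    print(threshold, counts, "false risk:", false_risk)
```

With these made-up numbers, the conservative threshold produces only false low-risk errors and the aggressive threshold only false high-risk errors, while the intermediate threshold yields the fewest misclassifications overall.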

Figure 2:

A robust classification criterion should naturally maximize the number of true high-risk and true low-risk cases, while minimizing the number of false high-risk and false low-risk cases. Generally speaking, whether a classification approach is considered “aggressive” depends on which type of misclassification—false high-risk or false low-risk—would lead to more severe health consequences for the individual after being categorized. We typically use four ratios to reflect the accuracy and error rates of a classification criterion, as shown in Figure 3 and the formulas below (this is closely related to the insurance industry’s stance on genetic testing, so a brief introduction here is warranted):

Figure 3 (the dark-shaded area represents the numerator, and the combined dark- and light-shaded areas represent the denominator):

We aim for the following conditions regarding these ratios: the higher the true risk ratio, the better; the lower the false risk ratio, the better; the higher the true risk prediction ratio, the better; and the lower the false risk prediction ratio, the better. Although all values range from 0 to 1, a value of 1 is optimal for both the true risk ratio and the true risk prediction ratio, whereas a value of 0 is optimal for both the false risk ratio and the false risk prediction ratio.
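The four ratios can be computed from the four counts as sketched below. The denominators are our assumption, since the formulas themselves are not reproduced in this text: we read the true and false risk ratios as shares of individuals who actually do or do not have the disease, and the two prediction ratios as shares of the individuals predicted to be high-risk or low-risk. In particular, the false risk prediction ratio could also be read as the share of high-risk predictions that are wrong.

```python
def risk_ratios(true_high, false_high, false_low, true_low):
    """Four ratios in [0, 1] under the assumed definitions described in the text above."""
    def ratio(numerator, denominator):
        # Undefined when the denominator group is empty.
        return numerator / denominator if denominator else float("nan")

    return {
        # Of those who have the disease, the share classified as high-risk.
        "true_risk_ratio": ratio(true_high, true_high + false_low),
        # Of those without the disease, the share classified as high-risk.
        "false_risk_ratio": ratio(false_high, false_high + true_low),
        # Of those classified as high-risk, the share who have the disease.
        "true_risk_prediction_ratio": ratio(true_high, true_high + false_high),
        # Of those classified as low-risk, the share who have the disease (assumed reading).
        "false_risk_prediction_ratio": ratio(false_low, false_low + true_low),
    }

# Hypothetical counts for eight individuals.
for name, value in risk_ratios(true_high=3, false_high=1, false_low=1, true_low=3).items():
    print(name, value)
```

Under these definitions a perfect classifier would score 1 on the two "true" ratios and 0 on the two "false" ratios, matching the optimal values stated above.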

In the following section, we will continue to discuss the receiver operating characteristic curve and practical operational recommendations from insurance institutions regarding genetic testing. Stay tuned.

This article is republished by VCBeat with authorization from Zhenlipai. The views expressed are those of the author alone and do not represent the position of VCBeat.