AI as the Catalyst for Genetic Testing: Unlocking the Full Potential of 'AI + Genomics'

Apr 24, 2019 08:00 CST Updated 08:00
Berry Genomics

High-throughput Gene Sequencing Technology Developer

After experiencing a "sudden surge," a "frenzy," and a "chilly autumn," AI venture capital investment has begun to rationalize. At this juncture, downstream applications of artificial intelligence are truly coming to the forefront. It is evident that AI has become the central theme at major medical exhibitions this year, spanning from early surgical robots to more advanced intelligent devices, image recognition, and drug discovery. In the field of gene technology, artificial intelligence has also garnered significant industry attention.

 

“It could become an accelerator for corporate competitiveness.” Dr. Zhou Daixing, CEO and co-founder of Berry Genomics, described it this way at the 2019 CHCC held recently.

 

Despite the United States spending $1 trillion annually on healthcare, the returns on this massive expenditure are underwhelming. For instance, while breast cancer can be screened for, screening does not prevent disease progression. Similarly, it is estimated that prescription drugs are effective in only 25% of cases, resulting in significant waste of clinical resources.

 

Figure (breast cancer screening): 75% of prescription drugs fail to achieve optimal efficacy

 

The underlying reason is that individuals differ in how they absorb and metabolize drugs, yet current prescribing practice does not account for individualized dosing. “The Internet of Things emphasizes the digitization of objects, an area in which we have excelled. However, the level of digitization concerning humans themselves remains very low,” explained Zhou Daixing. Genetic information is specific to each individual: the genome’s 3 billion base pairs act as the body’s program code, regulating a series of physical and chemical changes within the human body. Gene sequencing is the clearest illustration of digitized life.

 

Guiding disease diagnosis, treatment, and lifestyle management is the ultimate purpose of genetic testing.


Prior to 2012, the cost of digitization was prohibitively high. However, as sequencing costs continued to decline, surpassing the trajectory predicted by Moore’s Law, sequencing expenses gradually ceased to be a significant barrier.

 

Figure (Moore’s Law): Sequencing costs and the ultra-Moore’s Law

 

“It is now a matter of public acceptance,” he continued. The launch of the NIPT pilot program in 2014 marked the first step toward the clinical application of genetic technologies; today, annual testing volume for this technology has surpassed 4 million. The first approval for an NGS-based tumor genetic test was issued in July 2018, marking the beginning of clinical oncology testing. Furthermore, consumer-grade genetic testing under the “light healthcare” concept has already established a market abroad, with annual testing volume exceeding 26 million in 2018. Although China is not a country characterized by significant population migration and thus does not have the same demand for ancestry testing as the United States, its large population base and growing health management needs have nonetheless created a substantial market for consumer genetic testing.

 

Beyond NIPT, oncology testing, and consumer genomics, what further roles does genetic testing play? In a half-hour presentation, Zhou Daixing shared a story with the audience:

 

Two sisters from an ordinary family in Haicheng, Liaoning Province—aged 24 and 16—had visited numerous hospitals since childhood, only to be unfortunately diagnosed with “cerebral palsy.” However, whole-exome sequencing (WES) revealed that they actually suffer from a rare condition known as dopa-responsive dystonia (DRD). After one month of targeted treatment by physicians, the sisters were able to feed themselves; within 50 days of medication, they could independently use mobile phones and host live streams. The monthly cost of their medication amounts to just over 100 yuan.

 

This is a highly representative case that explains the root cause of the disease through molecular-level diagnosis. The ultimate significance of genetic testing may well lie in its ability to guide clinical treatment and daily life by deciphering the genetic code.

 

Artificial Intelligence Is a Prerequisite for Whole-Exome Sequencing


“These sisters are relatively fortunate in that the cause of their condition has been identified and a treatment plan is available,” Zhou Daixing told reporters. “In fact, for a considerable number of diseases, corresponding genes have not yet been identified through research.” Beyond chromosomal disorders and monogenic diseases, most conditions are governed by multiple genetic loci. There are complex interactions among these loci, and different combinations of variants may give rise to distinct disease subtypes. Furthermore, in addition to genetic factors, lifestyle and environmental elements are closely linked to disease development; individuals carrying pathogenic variants do not necessarily develop the disease. Even when the disease does manifest, clinical phenotypes can vary due to differences in individual tolerance. Therefore, while our aspirations are ambitious, we must acknowledge the stark reality: it is extremely difficult to elucidate the precise correspondence between diseases and genetic loci through manual effort alone.

 

AI-Powered Mining of Unknown Associations


After obtaining an individual’s genetic information, it is typically necessary to align it with the human reference genome to identify potential mutations. The determination of correlations between mutations and diseases relies largely on public databases, which are primarily constructed through data mining of published literature. However, given the vast volume of papers updated globally on a daily basis, relying solely on manual curation is impractical. This is where the value of artificial intelligence becomes evident.
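
The comparison step described above can be illustrated with a minimal sketch. It assumes the sample sequence is already aligned to the reference and has the same length, and it only detects single-base substitutions; real pipelines use dedicated aligners and variant callers. The function names and the `KNOWN_ASSOCIATIONS` table are hypothetical stand-ins for a curated public database, not real data.

```python
from typing import Dict, List, Tuple

Variant = Tuple[int, str, str]  # (1-based position, reference base, sample base)

# Hypothetical lookup table standing in for a public variant database.
KNOWN_ASSOCIATIONS: Dict[Variant, str] = {
    (5, "G", "A"): "example condition X",
}


def call_substitutions(reference: str, sample: str) -> List[Variant]:
    """List every position where the sample base differs from the reference."""
    if len(reference) != len(sample):
        raise ValueError("toy example expects pre-aligned sequences of equal length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


def annotate(variants: List[Variant]) -> List[Tuple[Variant, str]]:
    """Attach a database annotation to each variant, if one is known."""
    return [(v, KNOWN_ASSOCIATIONS.get(v, "no known association")) for v in variants]


if __name__ == "__main__":
    for variant, note in annotate(call_substitutions("ACGTGCTA", "ACGTACTT")):
        print(variant, note)
```

The second variant in this toy run has no database entry, which is the situation the article describes: the call itself is easy, while the interpretation depends on how complete the database is.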

 

Currently, the predominant technical approach in artificial intelligence (AI) is based on artificial neural networks, which come in several variants, such as ART networks, LVQ networks, Kohonen networks, and Hopfield networks. Machine learning is at the core of contemporary AI, enabling a system to learn from and integrate large volumes of unstructured data and to uncover the associations hidden in them. By continuously mining both existing and newly published literature, AI can keep identifying and updating potential correlations between mutation sites and diseases.
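
As a rough illustration of literature mining, the sketch below counts how often a gene name and a disease name appear in the same abstract. This is plain co-occurrence counting, a much simpler technique than the neural-network approaches mentioned above, and it is not a description of any company's actual system. The vocabularies and the example sentences are invented for illustration.

```python
import re
from collections import Counter
from itertools import product
from typing import Iterable, Tuple

# Hypothetical vocabularies; a real system would use curated ontologies.
GENES = {"GCH1", "BRCA1"}
DISEASES = {"dopa-responsive dystonia", "breast cancer"}


def cooccurrences(abstracts: Iterable[str]) -> Counter:
    """Count (gene, disease) pairs mentioned together in the same abstract."""
    counts: Counter = Counter()
    for text in abstracts:
        tokens = set(re.findall(r"[A-Za-z0-9]+", text))  # case-sensitive gene symbols
        lowered = text.lower()
        genes = sorted(g for g in GENES if g in tokens)
        diseases = sorted(d for d in DISEASES if d in lowered)
        counts.update(product(genes, diseases))
    return counts


if __name__ == "__main__":
    # Invented sentences, used only to exercise the function.
    abstracts = [
        "Variants in GCH1 were found in patients with dopa-responsive dystonia.",
        "BRCA1 carriers have elevated breast cancer risk.",
        "A GCH1 variant segregated with dopa-responsive dystonia in one family.",
    ]
    pair: Tuple[str, str]
    for pair, n in cooccurrences(abstracts).most_common():
        print(pair, n)
```

Re-running such a count as new papers appear is one simple way to keep an association table current; co-occurrence alone does not establish that a variant causes a disease.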

 

“The more extensive this associative coverage is, the stronger and more accurate people’s ability to interpret genes becomes,” he told VCBeat, adding that this is also what IBM Watson does.

 

Deriving New Discoveries from “Old Data”

 

For diseases already covered, the significance of AI in genomics may extend to assisting in disease diagnosis. Taking the type 2 diabetes study released by 23andMe in March 2019 as an example, based on extensive data training, 23andMe can determine whether a user has type 2 diabetes relying solely on genetic data.

 

Although this is a polygenic disease, 23andMe has been able to train its model to achieve 79% accuracy with the support of large-scale data. However, you might wonder: since the clinical diagnosis of type 2 diabetes is relatively straightforward, why resort to more complex diagnostic methods? A different example may provide a more intuitive perspective. Eighty percent of depression cases are linked to genetic factors, and it is also a polygenic disease. Currently, clinical diagnosis of depression relies primarily on patient questionnaires and is heavily dependent on the individual experience of psychiatrists. It is no exaggeration to say that the diagnosis of depression remains in the era of empirical medicine.
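
To make the idea of a polygenic model and its accuracy concrete, here is a minimal sketch of a weighted-sum risk score passed through a logistic function. The article does not describe how 23andMe built its model, so this is only a generic illustration: the variant names, weights, intercept, and the four-person cohort are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical effect weights (log-odds per risk allele) and baseline log-odds.
WEIGHTS: Dict[str, float] = {"rs_A": 0.30, "rs_B": 0.10, "rs_C": -0.20}
INTERCEPT = -0.5


def polygenic_score(genotype: Dict[str, int]) -> float:
    """Weighted sum of risk-allele counts (0, 1 or 2 per variant)."""
    return sum(weight * genotype.get(variant, 0) for variant, weight in WEIGHTS.items())


def predicted_probability(genotype: Dict[str, int]) -> float:
    """Logistic transform of the score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + polygenic_score(genotype))))


def accuracy(cohort: List[Tuple[Dict[str, int], bool]], threshold: float = 0.5) -> float:
    """Fraction of individuals whose thresholded prediction matches their label."""
    correct = sum(
        (predicted_probability(genotype) >= threshold) == has_disease
        for genotype, has_disease in cohort
    )
    return correct / len(cohort)


if __name__ == "__main__":
    # Invented cohort: (allele counts, whether the person has the condition).
    cohort = [
        ({"rs_A": 2, "rs_B": 2, "rs_C": 0}, True),
        ({"rs_A": 0, "rs_B": 0, "rs_C": 2}, False),
        ({"rs_A": 1, "rs_B": 0, "rs_C": 0}, True),
        ({"rs_A": 2, "rs_B": 0, "rs_C": 0}, True),
    ]
    print(f"accuracy: {accuracy(cohort):.2f}")
```

In practice the weights are learned from very large cohorts, which is why the article ties model quality so closely to data volume.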

 

“If a preliminary diagnosis of depression could be made based solely on genetic data, even an accuracy rate of 50% would represent a significant breakthrough compared to current methods,” explained Zhou Daixing.

 

Large-Scale Data Is the Prerequisite for Intelligence


It began with non-invasive prenatal testing, gained momentum with cancer detection, and reached its peak with whole-genome sequencing. In the foreseeable future, the widespread adoption of whole-genome or whole-exome sequencing is an inevitable trend. However, data interpretation for whole-genome or whole-exome sequencing has long faced bottlenecks. If relying solely on manual effort, a bioinformatics engineer may only be able to generate one to two reports per day—a pace that makes product scalability nearly impossible. Therefore, artificial intelligence is essential for the large-scale commercialization of whole-exome sequencing.

 

But how can such artificial intelligence be realized? The regulatory approval journey of digital health company AliveCor may offer valuable lessons. AliveCor launched the “KardiaBand” strap for the Apple Watch, capable of instant electrocardiogram (ECG) measurements. The process involved 1.1 million ECG data points, with over 200,000 records of atrial fibrillation cases compared against 700,000 normal records, and the product passed the assessment only after continuous calibration.

 

For all artificial intelligence applications, the prerequisite for intelligence is extensive data training. In the field of genetic testing, the prerequisite for such extensive data training is the generation of large volumes of data, which necessitates widespread adoption of sequencing technologies.

 

Delivering the Diagnostic Products Society Needs


“Companies must first deliver products that meet societal needs,” pointed out Zhou Daixing. He believes that only products capable of fulfilling societal demands can gain market acceptance. Undoubtedly, NIPT (Non-Invasive Prenatal Testing) serves as a successful precedent. However, we must also acknowledge that chromosome testing alone covers too limited an information scope. “We are currently working to promote whole-exome sequencing, which can cover more than 99% of genetic information,” he revealed. He added that the current price of this product is below RMB 3,000 and continues to decline.

 

Furthermore, 23andMe’s success in the consumer testing market has provided significant insights for professionals in the genetic testing industry. Although microarray-based testing covers a relatively limited amount of information, 23andMe has already achieved notable results based on such data, including research on insomnia-related genes and type 2 diabetes, as mentioned earlier. “Whole-exome sequencing yields 100 times more information than microarray-based testing, so I believe there will be even more breakthroughs,” said Zhou Daixing.

 

In February 2019, Berry Genomics announced the establishment of Beijing Yuanyuan Gene Technology Co., Ltd. (Yuan Gene), a consumer-grade genetic testing company jointly funded with Prenetics, a genetic testing firm influential in Southeast Asian and European markets. The company invited Yan Jun, former General Manager of Strategic Cooperation at Google China, to join as Chief Executive Officer. It is understood that Yuan Gene will commence operations in the second quarter of 2019. In an interview, Zhou Daixing revealed that Yuan Gene will also adopt whole-exome sequencing to make its test reports more objective and comprehensive.

 

“Regardless of what is being tested, screening should be as comprehensive as possible,” emphasized Zhou Daixing. “From a developmental perspective, products that meet societal needs must come first to enable data accumulation. Only with this foundation can artificial intelligence analysis be applied, expanding coverage from a single disease to multiple diseases.”

 

Data Standards and Management


Beyond volume, data quality has long been a hotly debated issue in the field of artificial intelligence. A large data volume is not synonymous with big data; the degree of data structuring and standardization is also crucial.

 

In clinical practice, different physicians may describe the same symptom in varying terms. For instance, what Physician A describes as “abdominal pain” may be documented by Physician B as “abdominal cramping.” Furthermore, due to interindividual differences in pain perception and expression, the same clinical manifestation can elicit a wide array of descriptions.

 

In routine clinical practice, individual physicians’ habits do not significantly influence disease diagnosis and treatment; however, when such fragmented data are aggregated into a single dataset, it becomes challenging to perform accurate and effective statistical analysis and summarization. More importantly, corporate databases may source data from multiple hospitals. With tens of thousands of cases, it is difficult to imagine the wide variety of expressions different physicians may use to describe the same symptom.

 

Standardizing these linguistic variations into a unified format would undoubtedly facilitate data mining and utilization. To this end, Berry Genomics has developed a computational software called NLPearl, which leverages natural language processing (NLP) to standardize diverse terminological conventions. Through multi-layered learning, NLPearl can summarize the natural language descriptions used in hospitals; consequently, when encountering unstructured natural language inputs in the future, the system can automatically calibrate them into standardized descriptions. As data accumulates to a sufficient scale, it becomes possible to map any type of natural language description to corresponding genetic loci. Conversely, for patients with specific mutations, the system can infer the potential spectrum of clinical phenotypes they may exhibit. Potentially, once the system reaches a certain level of training maturity and whole-genome or whole-exome sequencing becomes widespread, preliminary diagnoses could be made before patients even visit the hospital. In such a scenario, clinical consultations at hospitals may place greater emphasis on discussing treatment strategies.
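
The standardization idea can be sketched in a few lines. The article does not disclose how NLPearl works internally, so the example below swaps in a much simpler technique: a synonym dictionary with fuzzy string matching from Python's standard `difflib` module. The `SYNONYMS` table and the `normalize` function are hypothetical.

```python
import difflib
from typing import Dict, Optional

# Hypothetical mapping from free-text descriptions to standardized terms.
SYNONYMS: Dict[str, str] = {
    "abdominal pain": "abdominal pain",
    "abdominal cramping": "abdominal pain",
    "stomach ache": "abdominal pain",
    "headache": "headache",
    "head pain": "headache",
}


def normalize(description: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a free-text symptom description to a standardized term, or None."""
    key = " ".join(description.lower().split())  # lowercase, collapse whitespace
    if key in SYNONYMS:
        return SYNONYMS[key]
    # Tolerate small spelling differences by picking the closest known phrase.
    close = difflib.get_close_matches(key, list(SYNONYMS), n=1, cutoff=cutoff)
    return SYNONYMS[close[0]] if close else None


if __name__ == "__main__":
    for note in ["Abdominal  Cramping", "abdominal crampng", "fever"]:
        print(repr(note), "->", normalize(note))
```

A fixed dictionary only covers phrasings someone has already listed; a learned model of the kind the article describes is meant to generalize to descriptions it has not seen before.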

 

“Furthermore, data standardization will have a significant impact on future developments.” He pointed out that the data volume of a single hospital or company is insufficient to achieve scale; “whether data can be shared will become a key issue.” More importantly, strict industry standards must be enforced during the data-sharing process to ensure the privacy and security of data holders.

 

For individual enterprises, artificial intelligence may serve as a tool that reinforces the advantage of market leaders. AI assistance will further enhance the efficiency and accuracy of testing while indirectly reducing testing costs—key manifestations of a company’s competitive edge in the market. For the industry as a whole, AI acts as both a compass and an accelerator, enabling gene technology to enter the market and achieve widespread adoption with greater precision and speed.

 

Perhaps one day, genetic testing will become as commonplace a clinical tool as electrocardiography (ECG), and physicians will no longer be held back by a lack of training in molecular genetics, because artificial intelligence will assist them in interpreting and analyzing the data. With the aid of genetic technologies, artificial intelligence, and other emerging innovations, the secrets of Alzheimer’s disease may finally be unraveled; meanwhile, internet-based consultations and telemedicine will likely gain more robust technical support...

 

There are many visions for the future, and we believe that all of these possibilities may come to fruition. To put this into practice, the industry’s first step should begin with data generation. Of course, reaching industry consensus and establishing standards will undoubtedly accelerate the arrival of that day.