DeepDiagnos Files IPO Prospectus: Pioneering AI-Powered Early Cancer Detection Through Genomic Analysis

Jun 18, 2018 08:00 CST Updated 08:00
DeepDiagnos (US), founded in 2017, is dedicated to applying artificial intelligence to early cancer screening. Its core technology uses AI models to scan the genome locus by locus, identifying potential tumor-associated mutations and raising alerts at the earliest stages of carcinogenesis.


Perhaps it was destiny: from his student days through his professional career and into entrepreneurship, Zhang Xinjun has never strayed from the field of bioinformatics.


Harbin Institute of Technology (HIT) was the first university in China to establish a bioinformatics program. It is where Zhang Xinjun’s dreams took root, and where he began studying bioinformatics as an undergraduate. Later, his entrepreneurial team also largely originated from HIT, with most members being his senior peers or classmates from the same university.


“We have similar backgrounds and know each other well,” he told VCBeat.


After obtaining his master’s degree in 2009, Zhang Xinjun traveled to the United States for further studies. He enrolled at Indiana University, where his research focused on using machine learning algorithms to predict gene mutations that may lead to disease. Applied to the analysis of massive tumor genomics datasets, this approach can be used to identify relevant driver mutations. This work laid the foundation for what is now the DeepDiagnos algorithmic model.


After completing his Ph.D. in 2015, Zhang Xinjun joined Thermo Fisher Scientific, one of the world’s leading medical device companies. The company’s sequencers hold the second-largest market share, trailing only Illumina. He was assigned to the Clinical Sciences division, where he was responsible for software design for sequencing instruments.


“My job at the time was to design software for analyzing cell-free DNA in patients’ blood, to detect whether they carried specific mutations and to identify suitable targeted therapies,” he recalled. In short, it involved data analysis and processing for companion diagnostics.

 

>>>>

Followers of Technology

 

Since 2014, next-generation sequencing (NGS) technology has driven significant growth in the gene sequencing industry. Liquid biopsy technology has also continued to mature, with many companies beginning to apply it to tumor detection, treatment planning support, and even early screening.

 

“Liquid biopsy technology enables the detection of trace amounts of circulating tumor DNA (ctDNA) in blood. However, early cancer screening cannot be achieved by detecting DNA mutations alone,” said Dr. Hu Yang, CSO of DeepDiagnos.

 

Hu Yang, a fellow Harbin Institute of Technology alumnus of Zhang Xinjun, is currently conducting postdoctoral research at Harvard University. He has published 21 SCI papers in fields including neurological and immune diseases and the functional annotation of complex human diseases. He was first author or corresponding author on 15 of them, and the papers have a cumulative impact factor of 75.


Hu Yang and Zhang Xinjun are longtime friends and classmates, and Hu was one of the earliest members of DeepDiagnos. “When I first had the idea of starting a business in 2017, I discussed it with Hu Yang, and we then decided to build the team together,” recalled Zhang Xinjun. Another partner, Cheng Liang, is also an alumnus of Harbin Institute of Technology. In September 2016, he was exceptionally promoted to Associate Professor at the School of Bioinformatics, Harbin Medical University, and in 2017, he was selected as a member of the Professional Group on Bioinformatics of the China Computer Federation.

 

Hu Yang believes that to achieve early cancer screening, two issues must be addressed: one is determining whether detected mutations indicate the presence of cancer; the other is identifying the location where the cancer has developed. Liquid biopsy provides the prerequisite for non-invasive testing, namely the stage of data generation, but subsequent data analysis relies on the robust application of machine learning and deep learning.

 

“One of the most complex challenges in cancer research is identifying driver mutations—the mutations that directly cause cancer. Leveraging our proprietary AI algorithms, we can accurately identify driver mutations, determine whether cancer has developed, and even pinpoint the location of the pathological changes,” added Zhang Xinjun. “Only then can the entire screening process be considered complete.”


In 2017, the DeepDiagnos team was fully assembled, officially embarking on its entrepreneurial journey. The members of DeepDiagnos hail from Harvard Medical School, Stanford University, and renowned multinational corporations such as Thermo Fisher Scientific and AstraZeneca.


The combined domestic and international teams consist of approximately 10 members. Frankly speaking, the company’s current team is not large. The overseas team is primarily responsible for technical R&D and diagnostic model design, while the domestic team handles sample collection and government liaison.


"Although small, it has all the vital organs."

 

>>>>

Efficient Screening of Tumor Markers: Using AI Algorithms to Take On Early Cancer Detection

 

Currently, there are two main technical approaches to early cancer screening: one is the methylation sequencing approach represented by companies such as Grail; the other is the approach combining gene mutations with protein biomarkers, represented by CancerSeek from Johns Hopkins University.

 

The two approaches represent distinct directions. The methylation-based approach involves extracting cell-free tumor DNA and performing methylation sequencing, then analyzing methylation patterns to determine the tumor’s anatomical origin. A key advantage of methylation analysis is its ability to accurately identify the tissue of origin for tumors. Its application in hepatocellular carcinoma diagnosis is well recognized, whereas research on methylation-based diagnostics for other cancer types remains limited. Moreover, there is currently no highly compelling evidence demonstrating the accuracy of methylation markers for early cancer detection.


DeepDiagnos employs a technology similar to CancerSeek, characterized by a relatively simple workflow, high stability of its algorithmic models, and low cost. By detecting a fixed set of mutation sites and conventional serum biomarkers, it enables accurate cancer screening and determination of tissue origin.


Figure: DeepDiagnos workflow diagram


Zhang Xinjun told VCBeat, “In April 2018, Grail disclosed research data suggesting that the predictive performance of the CancerSeek model is not inferior to that of whole-genome methylation data. Moreover, since Grail uses whole-genome methylation data, its cost is at least ten times higher than that of CancerSeek. However, CancerSeek is by no means perfect; it currently performs poorly in diagnosing Stage I tumors, but it represents a very promising start for early cancer screening.”


The key distinction from CancerSeek is that DeepDiagnos employs its proprietary AI algorithms to precisely identify driver mutations, whereas CancerSeek relies on an “empirical” approach, selecting commonly known driver mutations. The clear advantage of the AI algorithm lies in its ability to handle cancers on which research is limited; in such cases, relying solely on empirical knowledge may fail to yield sufficient mutation sites, thereby compromising model accuracy. Meanwhile, exploring novel driver mutations through traditional research methods is an arduous and time-consuming process.


DeepDiagnos’s proprietary driver mutation screening algorithm can rapidly analyze a patient’s whole-genome data and accurately identify the driver mutations within. Due to the high genomic heterogeneity among tumor patients, the probability that two patients share the same set of driver mutations is extremely low; thus, well-known driver mutations documented in academia cannot cover all patients. Only powerful AI algorithms can ensure that no clinically significant mutation site is overlooked.


Figure: Leveraging proprietary algorithms to achieve high-precision whole-genome screening for the identification of cancer-associated driver gene mutations


The advantages of AI algorithms are even more pronounced in addressing ethnic differences. For instance, in non-small cell lung cancer (NSCLC), the EGFR mutation rate is significantly higher among East Asian patients than among their European and American counterparts. Therefore, it is predictable that relying solely on common driver mutations may lead to overlooking certain cases specific to Asian populations. However, this issue does not arise with AI algorithms.


Hu Yang explained that their algorithmic model is primarily divided into two components. The first component involves tumor assessment, where a list of mutated genes is initially identified through algorithms to evaluate the likelihood of tumorigenesis based on these mutations. The second component entails constructing disease-specific models, into which the detected data are input for scoring. The results are then ranked in descending order of scores, with higher scores indicating a greater probability of occurrence.
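The two components Hu Yang describes can be pictured with a small sketch. This is only an illustration of the general idea (a likelihood from a mutated-gene list, then per-disease scoring ranked in descending order), not DeepDiagnos's actual model: the gene weights, the bias term, the logistic form, and the function names `tumor_likelihood` and `rank_diseases` are all hypothetical assumptions made for the example.

```python
from __future__ import annotations

from math import exp

# Hypothetical per-gene weights for component 1; illustrative values, not real data.
TUMOR_WEIGHTS = {"EGFR": 1.2, "KRAS": 1.0, "TP53": 0.8}

# Hypothetical disease-specific models for component 2: disease -> gene weights.
DISEASE_MODELS = {
    "lung": {"EGFR": 1.5, "KRAS": 0.9, "TP53": 0.4},
    "colorectal": {"KRAS": 1.1, "TP53": 0.7, "APC": 1.6},
}


def logistic(x: float) -> float:
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def tumor_likelihood(mutated_genes: set[str], bias: float = -2.0) -> float:
    """Component 1: turn a list of mutated genes into one likelihood value."""
    total = bias + sum(TUMOR_WEIGHTS.get(gene, 0.0) for gene in mutated_genes)
    return logistic(total)


def rank_diseases(mutated_genes: set[str]) -> list[tuple[str, float]]:
    """Component 2: score every disease model, then sort by descending score."""
    scores = {
        disease: sum(w for gene, w in weights.items() if gene in mutated_genes)
        for disease, weights in DISEASE_MODELS.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = {"EGFR", "TP53"}  # made-up input for demonstration
    print(tumor_likelihood(sample))
    print(rank_diseases(sample))
```

In this toy version the highest-ranked entry is simply the disease whose model assigns the largest summed weight to the detected mutations; a real system would learn such weights from clinical data rather than hard-code them.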


“This is, in fact, a process of quantification,” he explained.

 

>>>>

Data Acquisition Is the Greatest Challenge


Over the past year, the team has primarily focused on advancing early screening for lung cancer. “This type of cancer has a high incidence rate, but patients diagnosed at an early stage have a favorable prognosis, and treatment options are evolving rapidly; therefore, early diagnosis is highly meaningful,” said Hu Yang.

However, developing algorithmic models is no easy task; the greatest challenge lies in acquiring high-quality data. Once a machine learning model is constructed, it requires extensive clinical data for training and testing. In general, the larger the sample size, the more accurate the model’s predictions will be.


“As more data is collected, model performance will continue to improve, and the algorithmic component will undergo continuous iteration,” he stated, noting that the process of collecting high-quality data is inherently challenging.



Through years of data accumulation, they have amassed genomic data from over 1,000 tumor cases, covering a wide range of common cancers.


They started with some publicly available data, but it was clearly insufficient in both volume and diversity. After multiple rounds of discussion with the Third Affiliated Hospital of Harbin Medical University, they finally reached an agreement with the hospital to use its clinical research data for model training.

 

“At the outset, a cancer test may require only a few hundred samples, but as the product advances toward market launch, the required sample size continues to grow,” Zhang Xinjun explained to VCBeat. Bringing an in vitro diagnostic (IVD) product to market involves three stages. The first, prospective studies, may require only a few hundred samples to validate the underlying principle. The second, multicenter independent trials, requires thousands of samples; these trials establish product reliability and prepare for the third stage, regulatory approval for market entry.

 

In addition, DeepDiagnos has signed cooperation agreements with two other oncology hospitals, through which it will collect samples for preliminary research collaboration.


He disclosed to VCBeat that the panel for early lung cancer screening has been fully developed, incorporating multiple mutated genes and protein biomarkers associated with lung cancer. Building on this foundation, they have completed the development of an algorithmic model capable of determining whether a subject has developed cancer and of estimating the tumor’s location and stage.


“This model can later be expanded to pan-cancer applications,” he added. It is understood that the team has already begun developing a diagnostic model for colorectal cancer.

 

>>>>

Product Objective: Price Reduction and Inclusion in the National Medical Insurance Reimbursement List


Unlike companies focused on liquid biopsy, DeepDiagnos positions itself as a provider of pan-cancer detection for Stage I and II tumors. They aim to enable patients to become aware and take action at the early stages of tumor development through precise prediction.


This has also enabled its products to take on diverse forms. On one hand, the product can function as a health monitoring tool, similar to routine check-up items. For example, in collaboration with insurance companies, it provides screening services every one to three years for high-risk populations. On the other hand, like other in vitro diagnostic products, it can serve as an auxiliary tool for the clinical diagnosis of tumors.


“Integration with insurance is undoubtedly a path that must be taken. China has hundreds of millions of smokers, among whom the high-risk population for lung cancer may reach 100 million. This represents a vast market for early screening,” said Hu Yang. “In addition, there are carriers of hereditary genetic mutations within families, so the health checkup market is substantial.”


“Our ultimate goal is still to have our products included in the national medical insurance reimbursement list. This is also the aspiration of most in vitro diagnostic (IVD) companies,” he added.


Although early screening based on liquid biopsy costs significantly less than MRI or PET scans, the price is still too high for large-scale market adoption.


In the United States, healthcare is closely integrated with commercial insurance. By partnering with insurers, healthcare providers can help identify high-risk populations and provide them with regular screening, thereby reducing future medical costs. For high-risk patients, insurer involvement encourages more proactive screening, which not only lowers disease risk but also reduces potential out-of-pocket medical expenses for patients.

 

“This is a win-win for both insurance companies and users. It is just like dental insurance in the United States, which typically covers two free professional cleanings per year to reduce the risk of serious oral disease later, thereby saving costs for insurers,” said Zhang Xinjun.


“Initially, there may be some high-end users who opt for testing by paying out of pocket. However, if the test can be incorporated into the national medical insurance system, then as the user base expands, the cost-spreading effect will become increasingly significant, making the price much more affordable and substantially broadening the user base.” He believes that if the average revenue per user (ARPU) can be kept within RMB 3,000, a large proportion of high-risk individuals will undergo testing every one to two years, with an estimated annual testing volume of 50 million to 100 million tests.
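The 50 million to 100 million range is consistent with simple arithmetic on the figures quoted in this article. The sketch below assumes, purely for illustration, that the whole high-risk population of roughly 100 million cited earlier by Hu Yang is tested once per interval; the article itself only says "a large proportion", so this is an upper bound. The function name `annual_tests` is hypothetical.

```python
def annual_tests(high_risk_population: int, interval_years: float) -> float:
    """Tests per year if everyone in the population is tested once per interval."""
    if interval_years <= 0:
        raise ValueError("interval_years must be positive")
    return high_risk_population / interval_years


# Assumption: the ~100 million high-risk figure quoted in the article,
# with full participation (an upper bound, not a forecast).
HIGH_RISK_POPULATION = 100_000_000

low = annual_tests(HIGH_RISK_POPULATION, 2)   # one test every two years
high = annual_tests(HIGH_RISK_POPULATION, 1)  # one test every year
print(f"{low:,.0f} to {high:,.0f} tests per year")
```

Any participation rate below 100 percent scales both ends of the range down proportionally.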


Prior to inclusion in the national medical insurance reimbursement list, these products will be marketed as laboratory-developed tests (LDTs). Subsequently, early lung cancer screening products will enter prospective clinical studies and prepare for regulatory submission for clinical medical device certification.

 

>>>>

Next Step: Talent and Scientific Research


The company is currently preparing for clinical trials of its early lung cancer screening product and applying for government funding and projects. The colorectal cancer product model is also under development, with subsequent products set to enter the market closely following the lung cancer product.


Next, talent expansion will be one of the key priorities. “I’ve basically reached out to all my former senior and junior fellow students,” Zhang Xinjun joked. “Most of them are currently engaged in scientific research at top-tier institutions such as Harvard and Stanford, which is exactly the kind of talent our company needs.”


On the other hand, they are engaging with research institutions both domestically and internationally, seeking to collaborate with more scientific and academic institutions in the United States and China to further develop and validate additional early screening products, thereby laying the foundation for subsequent clinical trials.


It is reported that the company has completed a seed funding round of several million RMB, with the funds primarily allocated to project initiation and technological research and development. The company currently has additional financing needs.