Zhitu Biopharma Files IPO Prospectus: Building a 30-Billion-Compound AI-Driven Database for Drug Discovery

Aug 03, 2020 08:00 CST Updated 08:00

“Future drug development will definitely require the involvement of AI.” In 2016, while pursuing his Ph.D. at Xiamen University, Chen Xingqiang followed his advisor’s recommendation and made some early entrepreneurial forays into the “AI + healthcare” sector.

Chen Xingqiang's background spans theoretical physics and biophysics, and he specializes in computer-aided drug design and AI technology R&D. During his academic years, his research focused on computational simulation of protein–small molecule interactions and of the chemical reactions between proteins and small molecules. In his professional career, he has devoted himself primarily to applying AI technologies and commercializing products.

As early as 2013, Chen Xingqiang had already planted the seeds for a career in drug R&D and worked diligently behind the scenes. He told VCBeat that he had been waiting for the right opportunity to enter the pharmaceutical industry, which finally arrived in 2016.

“Seeing the AI boom, I wanted to enter the healthcare industry to make an impact.” In October 2016, Chen Xingqiang embarked on his first entrepreneurial venture in the “AI + Healthcare” sector by founding Xiamen Xia Zhi Yi Biotechnology Co., Ltd. (hereinafter referred to as “Xia Zhi Yi”). The company entered the medical field by leveraging AI to enhance medical imaging screening, using artificial intelligence to help physicians achieve more precise diagnoses of patients’ lung images.

In March 2020, leveraging his extensive experience in the practical implementation of AI applications, Chen Xingqiang decided to return to his area of expertise and passion—computer-aided drug design (CADD)—and founded Zhejiang Zhitu Biopharmaceutical Technology Co., Ltd. (hereinafter referred to as “Zhitu Bio”). The company is dedicated to applying advanced machine learning algorithms to provide precise and efficient solutions for new drug discovery.

VCBeat (WeChat ID: vcbeat) conducted an exclusive interview with founder Chen Xingqiang about his two consecutive entrepreneurial ventures, aiming to examine the core competitiveness of Zhitu Bio and gain insight into the future of AI-empowered drug discovery.

Building a database of 3 billion virtual compounds, with data cleaning, reorganization, and a tenfold expansion expected to be completed by the end of the year

>>>>

Q: "What are your thoughts on the application of AI in this industry?"

“First, we must clarify the distinctions and connections between AI and traditional computer software. Traditional software is largely an aggregate of functions built on the Turing machine model, aiming to leverage CPU-intensive computing to improve our everyday work efficiency. AI, by contrast, outputs a capability rather than specific functions. If you distinguish carefully between the two, you will find that software functionality is deterministic, whereas AI's ‘capabilities’ are dynamic and evolving. Software functions are applied to specific workflows, while capabilities are the core attributes required to solve a whole class of problems, which sets a higher standard. AI capabilities must reach the level of human experts before they can be integrated into production processes and deployed commercially. This places a new requirement on computing systems, one that goes beyond merely aggregating discrete functions.

Meanwhile, as we recognize the distinctions between AI and traditional software, we must also acknowledge their interconnections. Whether it is conventional software or AI systems, none can be divorced from the specific scenarios in which problems are solved. Within any given scenario, functionality alone is insufficient, as is capability alone; both are required. This presents a common challenge for today’s AI practitioners and software developers: how to clearly define their respective functional attributes and leverage the advantages of integration.

The AI-driven capabilities in the pharmaceutical industry must reach expert-level proficiency and withstand scrutiny and endorsement from regulatory bodies such as the China Food and Drug Administration (CFDA) and the U.S. Food and Drug Administration (FDA), as well as from healthcare professionals and experts, to achieve clinical-grade AI applications. Underpinning this achievement is the need for AI to develop its own models for addressing industry-specific challenges, which requires substantial data support and deep domain expertise.

Data is always the first step in any AI-driven initiative, and this issue cannot be avoided. Across the myriad problems of the real world, vast amounts of data that could be used for reference and calibration are constantly being generated, and just as constantly disappearing.

If we revisit the concept of big data, I believe two aspects must be addressed. On one hand, acquiring data of significant value always incurs costs; however, as computer technology and industry practices advance, the costs associated with cloud computing and big data development tools are gradually decreasing, making big data a viable option for enterprises to reconsider their strategic pathways and growth. On the other hand, people’s recognition of data’s value generation and their understanding of the boundaries of data analytics capabilities are continuously evolving.

From this perspective, big data may only be at a nascent stage, because without the iterative advancement of AI as a tool, the mining and application of big data would remain merely theoretical. Therefore, generating, storing, and applying big data sensibly is an essential task that every data-driven company must consider and carry out, particularly those in the AI industry. We cannot explore data in isolation from specific industries, nor can we seek industry-specific solutions without leveraging industry data; and it is impossible to create valuable tools out of thin air.”

>>>>

Q: "Could you elaborate on how Zhitu Bio applies, generates, and stores data in the pharmaceutical R&D industry?"

“Zhitu Bio has two core strategic pillars in terms of data: one relies on going global, while the other depends on self-reliance.”

Going global means that our company's data construction must not be detached from the industry's pain points and problems. We must accurately identify the principal contradictions that exist in the industry today and, on that basis, determine which data we need to collect and store. Self-reliance means that we must rely on ourselves, though not merely through subjective effort; rather, we need to leverage AI technology to produce and optimize data.

"Guided by these two considerations, it is evident that in the pharmaceutical industry, validating the relationship between targets and lead compounds is a difficult problem that is well worth solving in depth. As practitioners in the AI sector, our primary objective is to streamline legacy workflows, enhance problem-solving efficiency, and emphasize innovation and transformation."

>>>>

Q: "From a long-term perspective, how does your company envision applying big data in the pharmaceutical industry?"

“Zhitu Bio aims to integrate the various types of omics data generated by current research, including genomics, epigenomics, transcriptomics, proteomics, and cytomics, to investigate pathological mechanisms and identify potential targets for specific diseases. Centered on these targets, the company establishes data acquisition workflows, constructs corresponding libraries of lead compounds, and applies deep learning algorithms to search for and recommend suitable candidate compounds.”

"The company's long-term goal is to integrate omics data with in vitro experimental data and clinical-stage trial data for comprehensive analysis and algorithmic application, classify the data, and establish a series of ab initio databases for relevant targets. Ultimately, the collected datasets will be applied to machine learning models for continuous training and iterative optimization."

>>>>

Q: “What are the company’s current core products in development?”

“Currently, the company has developed a virtual screening platform named MolecularFlow. We leverage open-source data on small-molecule compounds, amounting to approximately 3 billion data entries. Based on the existing 150,000 potential drug-like small molecules, we conduct generative learning and exploration of novel compounds, integrating graph convolutional networks (GCN), reinforcement learning (RL), and generative adversarial networks (GAN) to create new drug-like small-molecule compounds. We expect to complete a tenfold expansion of the base data by the end of this year, further clean and organize the data, and expand the database's valid data to 30 billion records, extending the small-molecule library into a larger chemical space.”

From the outset, our product was designed to address workflow and efficiency challenges in drug development. Unlike existing CRO companies that offer AI-assisted drug design, we are driven primarily by an integrated system that combines algorithms and software. While most pharmaceutical companies use large-scale drug screening software merely as a standalone tool, Zhitu Bio has improved the integration between such traditional tools and R&D workflows. We have integrated, optimized, and streamlined the entire process through a unified algorithmic system, enabling enterprises to fulfill any drug-related requirements through our platform.

This is a clear illustration of the difference between the capabilities that AI outputs and the functions that conventional software outputs. For targets that have already been validated, Zhitu Bio conducts targeted, iterative screenings of the database based on client requirements. Through repeated cycles of “screening” and “recall,” the number of candidate compounds is progressively reduced by orders of magnitude, ultimately yielding a more precise set of target small-molecule compounds. We anticipate that the entire virtual screening process can be completed in approximately 3–5 days.
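The iterative narrowing described above can be pictured as a funnel in which each round ranks the remaining candidates and keeps only the best-scoring fraction. The Python sketch below is illustrative only and is not MolecularFlow's actual method, which the article does not disclose. The names `funnel_screen` and `make_toy_scorer`, the compound IDs, and the reduction factors are all hypothetical, and the scorer is a deterministic pseudo-random placeholder standing in for a real docking or machine-learning model.

```python
import random
from typing import Callable, List, Tuple

Compound = str  # hypothetical stand-in, e.g. a SMILES string or a library ID
Scorer = Callable[[Compound], float]


def funnel_screen(
    library: List[Compound],
    stages: List[Tuple[Scorer, int]],
) -> List[Compound]:
    """Run successive screening rounds.

    Each stage is (score_fn, reduction_factor): candidates are ranked by
    score_fn and only the top 1/reduction_factor of them survive.
    """
    candidates = list(library)
    for score_fn, reduction_factor in stages:
        ranked = sorted(candidates, key=score_fn, reverse=True)
        keep = max(1, len(ranked) // reduction_factor)  # never drop to zero
        candidates = ranked[:keep]
    return candidates


def make_toy_scorer(seed: int) -> Scorer:
    """Return a deterministic pseudo-random scorer (placeholder for a real model)."""
    def score(compound: Compound) -> float:
        # Seeding with a string makes the score repeatable for each compound.
        return random.Random(f"{seed}:{compound}").random()
    return score


# Toy library of 10,000 hypothetical compound IDs, cut tenfold in each round.
library = [f"CPD-{i:05d}" for i in range(10_000)]
stages = [(make_toy_scorer(1), 10), (make_toy_scorer(2), 10), (make_toy_scorer(3), 10)]
hits = funnel_screen(library, stages)
print(len(hits))  # 10,000 -> 1,000 -> 100 -> 10
```

In a realistic pipeline the early rounds would presumably use cheap filters over the whole library and the later rounds more expensive scoring on the few survivors, which is what makes a multi-day turnaround on billions of entries plausible.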

>>>>

Q: "In the market for AI-empowered new drug development, why has Zhitu Bio chosen to enter at this time?"

“The state has consistently encouraged and supported the R&D of innovative drugs in recent years. With clear industry demand and a series of favorable policies recently introduced, the market and the opportunity are both there. Pharmaceutical companies typically prioritize the technical strength of CROs, expecting them to deliver well-defined solutions and credible results. Therefore, only by truly demonstrating the value of its technology to pharmaceutical companies can Zhitu Bio lead the market to recognize the value and capabilities of AI.”

>>>>

Q: "Which research institutions has Zhitu Bio currently established partnerships with, and will it develop its own drugs in the future?"

“Zhitu Bio is currently collaborating with the Laboratory of Xiamen University, its School of Pharmaceutical Sciences, and the Shenzhen Institute of Advanced Technology. The company is also actively exploring new partnership opportunities. Zhitu Bio positions itself as an AI-enabled CRO specializing in novel drug discovery, a strategic focus that will remain unchanged regardless of future corporate development. Our priority is to excel in our role as a CRO by fostering strong collaborations with leading pharmaceutical companies. Once the market fully recognizes our capabilities, we will then consider transitioning to independent original drug development. This phased approach offers a more rational and steady path for growth.”

>>>>

Q: “Finally, could you share your expectations and vision for the company’s next steps in development?”

“Zhitu Bio already has prototype products in three directions: expansion of the lead compound library, accelerated virtual screening, and vaccine design, among other areas. We are currently conducting preliminary validation of our first product, MolecularFlow; specific product details have not yet been disclosed. It has been just over three months since Zhitu Bio was established, during which we have completed 30% of our first project. We expect to complete construction of the entire database backend by October this year. The company has also initiated its pre-A financing round, aiming to raise approximately RMB 10 million, which will be used primarily for database expansion, validation, process optimization, and talent acquisition.”