Shengshi Junlian Rewrites Biologic Drug Discovery with BioAI Powered by a 300-Billion-Strong Real-World Compound Library

Nov 23, 2023 07:59 CST Updated 08:00
ABLINK

Large Molecule AI Drug Developer

ABLINK was founded in 2016. Over the course of eight years, it has built a real biopharmaceutical library with 300-billion-level diversity and, on this basis, created BioAI, a large-molecule AI drug discovery platform that combines dry-lab and wet-lab experiments.


How did the company advance from a real biopharmaceutical library to an AI drug discovery platform in eight years? And how can the high barriers of AI drug discovery be broken through? VCBeat conducted an exclusive interview with Liu Jianghai, CEO of ABLINK.


>>>>

VCBeat: ABLINK started with biopharmaceutical discovery and optimization services. Why did it spontaneously create an AI-driven drug research and development platform?


Liu Jianghai: Over the past three years, as the AI concept has gone from explosive popularity to a slight cooling, many excellent large AI models have emerged. When applied to fields close to daily life, with massive data and frequent interactions, they have achieved overwhelming victories over traditional work methods. However, in specialized fields with relatively limited public data and slower data validation feedback, such as the biopharmaceutical field, how can AI make its entry? Personally, I firmly believe that AI will eventually transform the drug research and development model. But as a researcher with a biomedical background, I cannot understand the mathematical formulas or computer code within AI models, let alone program or adjust parameters, which makes it impossible for me to use AI directly in my work. At the same time, like most researchers accustomed to traditional methods, I constantly question the accuracy of AI-generated data.


In essence, this is due to the differences in cognition, language, understanding, and logical pathways between algorithm model developers and biopharmaceutical researchers, which in real-world scenarios lead to ineffective communication and application barriers characterized by "you don't understand me, I don't understand you." What we lack is not excellent large AI models but methods for applying these models deeply in specialized fields. Over the past few years, while actively engaging with and using the latest AI tools, we have collaborated with or served several leading AI pharmaceutical companies in China, accumulating a wealth of practical experience in the process. We believe that high-quality labeled data, AI algorithms based on biological logic, and "foolproof" application software are the keys for AI to unlock the door to biopharmaceutical research and development.


Building a 300-Billion-Level Diversity Real Biopharmaceutical Library

>>>>

VCBeat: Can you share your in-depth thoughts on the use of AI in biopharmaceutical R&D? For instance, what constitutes high-quality labeled data? Where does ABLINK source its data for AI learning and training?


Liu Jianghai: High-quality data is the foundation of AI learning and training, and in the biopharmaceutical field, there is a particular need for real, verified data with biological annotations. Over the past seven years, ABLINK has built a "300-billion-level diversity" real biopharmaceutical library, including human antibody libraries, nanobody libraries, peptide libraries, affibody libraries, CAR-T libraries, and TCR libraries. Relying on this platform, ABLINK's proprietary projects have obtained tens of millions of data points with multiple biological tags, and this number is still growing rapidly.


Data quality and originality are protected by continuously advancing technical barriers. The "300-billion-level diversity" achieved by ABLINK's drug library stems from one of the company's core technologies: "Fully Synthetic Library Technology." The foundational methodology was introduced from Genentech, a U.S.-based company, in 2016. Over the following five years, two significant local technological breakthroughs were achieved: the first being "multi-site mutation efficiency increased from an initial 20% to 100%," and the second being "a single construction yielding a fully synthetic library with 10^11 diversity." It is currently possible to construct a fully synthetic library with 10^11 diversity within 2 weeks, and one with 10^13 diversity within 3 months, which enables ABLINK to achieve library construction efficiency and sequence diversity far surpassing peers using traditional technical routes (such as mouse hybridoma or B cell sorting).
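To give a rough sense of what diversity figures on the order of 10^11 or 10^13 mean, the sketch below counts how many fully randomized positions would be needed to reach a given theoretical diversity. It assumes that each randomized position may hold any of the 20 standard amino acids. That assumption and the function name are ours, chosen for illustration only; the interview does not describe how ABLINK's libraries are actually designed.

```python
AMINO_ACIDS = 20  # assumption: every randomized position allows all 20 standard amino acids


def positions_needed(target_diversity: int, alphabet: int = AMINO_ACIDS) -> int:
    """Smallest number of fully randomized positions whose combinations reach the target."""
    positions = 0
    combinations = 1
    while combinations < target_diversity:
        combinations *= alphabet
        positions += 1
    return positions


for exponent in (11, 13):
    n = positions_needed(10 ** exponent)
    print(f"10^{exponent} diversity: about {n} randomized positions (20^{n} = {20 ** n:.2e})")
```

Under this simplified model, a handful of randomized positions is already enough to exceed what could be tested one sequence at a time, which is why library construction and screening are done in pools.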


Using its globally leading fully synthetic library technology and a "300-billion-level diversity" real biopharmaceutical library, ABLINK has been providing technical services such as biopharmaceutical library construction and biopharmaceutical discovery and optimization to numerous top pharmaceutical companies since 2019. It has established a strong reputation in the industry and gained extensive first-hand insights into the real pain points and challenges of the drug development process.


Beyond ensuring data quality, authenticity, and originality, the continuity, labeling, and ranking of data are also important for AI learning and training. Typically, AI training uses data from public databases, which are fragmented in terms of sequence similarity and carry single, unrelated sequence labels, whereas the sequences we obtain from the biopharmaceutical library exhibit excellent continuity and relevance.


First, the site-directed continuous mutation of fully synthetic library technology enables the sequences of biopharmaceutical libraries to exhibit continuous amino acid changes, which correspond one-to-one with biological properties. This allows AI to easily learn the biological significance of single or multiple amino acid mutations.


Second, through targeted design and screening, fully synthetic libraries can tag different sequences with the same biological label, or tag identical or similar sequences with different biological labels.


Third, the sequences produced by the fully synthetic library are directly sorted, and the screened sequences will be presented according to affinity strength, stability level, and activation capability.


Fourth, through positive and negative screening, the data produced by the fully synthetic library also shows a clear clustering of positive and negative data.
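The four properties above (stepwise mutations, multiple labels, ranking, and positive/negative clustering) can be pictured as a simple data record. The following is a minimal sketch using made-up toy sequences and scores. The class, field names, and scoring convention are hypothetical and are not ABLINK's actual data format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SequenceRecord:
    """One screened library sequence with its annotations (all field names are hypothetical)."""
    sequence: str
    labels: Dict[str, str] = field(default_factory=dict)  # biological tags, e.g. the target
    affinity: float = 0.0   # assumed convention: larger value means stronger binding
    positive: bool = True   # True if recovered in positive screening, False if negative


def point_mutations(parent: str, variant: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, parent residue, variant residue) where two sequences differ."""
    if len(parent) != len(variant):
        raise ValueError("sequences must have the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(parent, variant)) if a != b]


def rank_by_affinity(records: List[SequenceRecord]) -> List[SequenceRecord]:
    """Order records from strongest to weakest binder."""
    return sorted(records, key=lambda r: r.affinity, reverse=True)


def split_by_screen(records: List[SequenceRecord]):
    """Separate records into positive and negative screening groups."""
    positives = [r for r in records if r.positive]
    negatives = [r for r in records if not r.positive]
    return positives, negatives


# Toy data: made-up sequences and scores, not real measurements.
records = [
    SequenceRecord("ACDEFG", {"target": "T1"}, affinity=1.0, positive=False),
    SequenceRecord("ACDKFG", {"target": "T1"}, affinity=5.0),
    SequenceRecord("ACDKFW", {"target": "T1"}, affinity=8.0),
]
ranked = rank_by_affinity(records)
positives, negatives = split_by_screen(records)
step = point_mutations(records[0].sequence, records[1].sequence)
```

Here each neighboring pair of sequences differs by a single residue, so a change in the measured property can be attributed to one substitution. That is the kind of continuity the interview contrasts with fragmented public data.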


Therefore, our real biologics library can obtain drug sequences with a very rich combination of tags, continuous amino acid changes, and sorted biological properties, which are more adaptable to the AI application concept of "multi-task collaborative optimization." Based on these advantages, ABLINK's fully synthetic library has been providing data packages or customized production data to seven AI companies in China since 2021.


In addition, fully synthetic library technology is also a powerful tool for high-throughput validation of AI prediction data. Whether AI can outperform traditional manual workflows also depends crucially on the efficiency of validation and iteration. In the field of small-molecule drugs, AI can already narrow the design down to fewer than 20 small-molecule candidates for a specific disease target, making it relatively easy to verify each one individually. However, in the field of large-molecule drugs, AI often predicts more than 10^10 large-molecule candidates for a single target, which cannot be expressed and verified one by one. The fully synthetic library approach builds libraries from AI-predicted sequences and screens them in high throughput, which allows more than 10^12 large-molecule candidates to be validated rapidly. Currently, the maximum diversity ABLINK has handled when validating AI-predicted data exceeds 10^20, in a commercial service project provided for a leading AI enterprise in China.


Building an AI Biopharmaceutical R&D System with "Model + Software"


>>>>

VCBeat: Why use biological logic to build AI algorithms? How does ABLINK do it?


Liu Jianghai: We know that small-molecule drugs have smaller, rigid structures with limited binding areas to target proteins, and their spatial configurations are relatively stable. AI models based on energy or amino acid physicochemical properties, working through geometric approximations, do not lose much critical real information. However, large-molecule drugs have larger binding areas with their targets, and their interactions remain in a dynamic process of mutual attraction and pulling.


Because our team members have backgrounds in structural biology and biomedicine, we consistently believe that it is unreasonable to train AI using data derived from energy and physics calculations to represent the dynamic changes of protein macromolecules themselves or between them. This can lead to inaccuracies in the structural predictions of macromolecular drugs by AI and unreliable assessments of bioactivity, introducing uncertainty into drug development.


So, how should AI technology empower the research and development of macromolecular drugs? Can we develop an AI model based on biology, particularly using protein structure and evolution as the underlying logic? Our unique fully synthetic library technology has the ability to continuously and directionally produce massive amounts of high-quality data. If we can use these biologically tagged, continuous, and ordered protein sequences to train AI, while paying special attention to the biological evolution and structural biology logic of these sequences, we will be able to obtain a unique biological AI model (BioAI).


Therefore, in 2021, we began to build our own AI technology team. At the time of its establishment, the team set clear goals:

(1) Rather than independently developing Transformer and hyperparameter large models, use authorized large models and proprietary data to develop pre-trained models and specialized models tailored for biopharmaceutical R&D. This is our BioAI.

(2) At the same time, develop professional software based on BioAI to help each scientific researcher solve specific tasks in biopharmaceutical R&D.


BioAI uses the amino acid sequence of proteins as code, associating it with multiple biological tags. During training, it focuses on the randomness and preferences in protein evolution, the conservation and diversity of antibody sequences, the polymorphism of protein structures, and the rigidity and flexibility of protein-protein interaction interfaces. BioAI may not understand why over four billion years of evolution has led to the specific sequences of proteins we see today, but by comparing and correlating data, it will establish a one-to-one correspondence between sequence codes and biological properties. This allows it to predict how changes in amino acids at specific positions alter biological properties, enabling intelligent directed evolution of proteins, which is exactly what is needed for the generation and optimization of biopharmaceuticals.
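The idea of predicting how a change at a specific position alters a property can be illustrated with a generic single-site mutation scan. This is a minimal sketch, not BioAI itself: the scoring function below is a deliberately trivial placeholder standing in for any trained sequence-to-property model, and all names are hypothetical.

```python
from typing import Callable, List, Tuple

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_site_variants(parent: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, new residue, variant sequence) for every single substitution."""
    variants = []
    for i, original in enumerate(parent):
        for residue in STANDARD_AMINO_ACIDS:
            if residue != original:
                variants.append((i + 1, residue, parent[:i] + residue + parent[i + 1:]))
    return variants


def best_variant(parent: str, score: Callable[[str], float]) -> Tuple[int, str, str]:
    """Pick the single-site variant that the supplied scoring model rates highest."""
    return max(single_site_variants(parent), key=lambda v: score(v[2]))


def toy_score(sequence: str) -> float:
    """Placeholder model (an assumption for illustration): counts lysine residues."""
    return float(sequence.count("K"))


position, residue, variant = best_variant("ACD", toy_score)
print(position, residue, variant)
```

A parent of length L built from standard residues has 19 × L single-site variants, so even exhaustive scans of one position at a time stay small. The search space only explodes when several positions are varied together.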


The construction of BioAI requires the collaboration of talents from different professional backgrounds. To address the classification of massive data and high-dimensional correlation issues, ABLINK has brought in Huang Chen, who has years of experience at Microsoft and Oracle, as Co-CEO to oversee the application of generative AI. To train AI based on biological logic, ABLINK has hired Zeng Xin, a biology Ph.D. from Tsinghua University, as CTO, and Professor Zhang Kang, who has extensive experience in AI medical applications, as CSO. Additionally, to expedite the development of specialized software, ABLINK has introduced Shen Yun, with years of experience at Microsoft, as Chief Architect to develop "foolproof" biopharmaceutical R&D software.


It is gratifying that the paper led by Professor Zhang Kang, with ABLINK's participation, which studies protein-protein interactions (PPI) through AI based on biological logic, was published in Nature Medicine in August this year.


"BioAI for Scientists" Software Launches


>>>>

VCBeat: You mentioned that you want to use BioAI to create "foolproof" software. Could you elaborate on how it can help researchers use AI to solve R&D problems?


Liu Jianghai: The availability of AI models does not mean they can be directly applied to specific projects in biopharmaceutical R&D. The professional software developed by ABLINK is called "BioAI for Scientists." We hope this software can achieve the "Four AI Transformations":


AI Specialization: Focus on the deep application of large AI models in the biopharmaceutical field, using exclusive real data for pre-training and allowing AI to continuously iterate and evolve in vertical applications, so that AI prediction data approaches or even surpasses data obtained from traditional experiments.


AI Scenarios: Set digital scenarios for specific experiments in biopharmaceutical R&D, match AI algorithms step by step for calculation and prediction in each scenario, embed the logic and rules of biopharmaceutical R&D into each step to control the quality of AI outputs, and achieve AI substitution for traditional R&D processes.


AI Toolization: Through software engineering, integrate specialized and scenario-based AI into software and apps that can solve key problems and steps in biopharmaceutical R&D, enabling every scientist, researcher, technician, and student to easily use AI with "one-click input, one-click output" foolproof operations.


AI High-Throughput: Through continuous training and validation in real-world scenarios, develop AI into an exceptional scientist that provides near real-time feedback while managing multiple workflows in parallel, significantly enhancing the efficiency of drug discovery and scientific research.


In October 2023, after three years of technical accumulation and extensive data validation, ABLINK's "BioAI for Scientists" software series was officially launched. Researchers can now simply log in to ABLINK's "BioAI-Driven Biopharmaceutical R&D" portal, select the corresponding service type, and submit the relevant sequence data to await AI screening results. The standard AI computation time does not exceed one week.


Structure Diagram of BioAI for Scientists


"BioAI for Scientists" relies on ABLINK's proprietary technology platform and Biological Artificial Intelligence (BioAI) to build a closed-loop, automatically iterating AI biopharmaceutical R&D software system that alternates between dry-lab and wet-lab work. This system includes real data production, AbCypher data processing, BioAI algorithms, AI software, and high-throughput real validation. Through simple operations such as "one-click input of target sequence, one-click output of drug sequence" and "one-click input of parent sequence, one-click output of optimized sequence," it provides every drug researcher with efficient, convenient, and accurate AI technical service software. Most importantly, by embedding biopharmaceutical R&D rules into the software steps, it ensures that the output results are no less effective than traditional experimental results, preserving the professional rigor of biopharmaceutical R&D.


The Future of AI Drug Development: "Boundaryless" Biopharmaceutical Creation


>>>>

VCBeat: What are your expectations for the future of BioAI? Or what are the future goals of ABLINK?


Liu Jianghai: BioAI actually has strong scalability. Its applications are not limited to the discovery of biopharmaceuticals but can also extend to bioenzymes, physicochemical property research, mRNA downstream validation, and downstream process development. The virtual biopharmaceutical library is infinitely large and diverse, an advantage that real drug libraries cannot match. This means a larger and more organized pool of drug sources, enabling limitless diversity in drug discovery.


I believe ABLINK has entered the second phase of AI development in biopharmaceuticals: from human experience to AI intelligence, from limited discovery to unlimited discovery. Although our current AI still cannot break free from natural laws and data dependency, I believe that after 2-3 years of technological development, AI can progress to the drug creation stage. Once sufficient data training and project nurturing are completed, AI will no longer require pre-training with data, thereby evolving into a drug creator AI, or the third phase of GenAI. It will become a boundaryless drug creator, generating entirely new and unprecedented drugs. From natural discoveries to AI's "boundaryless" creation, this will signify that biomedicine has truly reached AI intelligence. This is also the future goal that ABLINK strives for.