
AI-Powered Molecular Simulation in the Cloud: Accelerating Drug Discovery and Overcoming R&D Bottlenecks

Aug 11, 2022 10:00 CST Updated 10:00

New drug development is becoming increasingly difficult.

 

The increasing complexity of human diseases and the dwindling number of druggable targets have led to a decline in the speed of new drug development and market approval. Currently, it takes more than 10 years for a new drug to go from discovery through clinical validation to regulatory approval and market launch.

 

Moreover, new drug development is characterized by high investment. According to relevant data, the market size for preclinical R&D alone reaches $60 billion, with the market for protein-scale experiments amounting to as much as $18 billion.

 

Therefore, given the inherent characteristics of drug development—namely, high investment, high technical complexity, high risk, and long cycles—it is not feasible for most companies to rely solely on experimental methods for drug discovery. Instead, they are leveraging computational technologies such as artificial intelligence (AI) and big data to accelerate the drug development process.

 

Specifically, in AI-assisted drug development, AI is used mainly to simulate and compute the mechanisms by which candidate drug molecules and other compounds bind to proteins, and to model gene function. Typical application scenarios include virtual drug screening and protein structure prediction.

 

Cutting-edge information technologies such as AI and big data are becoming necessities in the pharmaceutical industry, creating a blue-ocean market and attracting a large number of companies into the space. This is an uncharted path, and it is unlikely to be a smooth one.

 

Algorithms and Data: The Two Major Challenges in AI-Driven Drug Discovery


Generally, the development of new drugs requires first identifying a specific disease target, which serves as the primary site of action for the drug. Since these targets are often proteins, understanding proteins can be considered the first step in new drug research and development.

 

“However, traditional proteomics analysis techniques and methods are not entirely suitable for studying protein systems; what is lacking is the process of accumulating quantitative data on proteins, as well as appropriate algorithms,” said Guo Tiannan, a specially appointed researcher at Westlake University, in a media interview.

 

Zhang Linfeng, Founder and Chief Scientist of DP Technology, also mentioned that when the company decided to use artificial intelligence and molecular simulation algorithms to give R&D professionals tools for computation and design at the microscopic level, it likewise faced challenges in handling AI models and data.


Zhang Linfeng, Founder and Chief Scientist of DP Technology

 

The primary reason is that AI requires relatively large-scale, structured data for training in order to extract patterns and further optimize the model. In “AI + molecular simulation,” however, high-quality data are often lacking, which makes it difficult for AI to be effective.

 

To break the “chicken-or-egg” deadlock between data and models, DP Technology starts at the source: it trains AI models on first-principles calculations so that they learn the underlying scientific principles of the relevant field. It then runs simulations with these AI-learned principles to generate new data. By combining the AI-generated data with experimental data and repeating the cycle, the company iteratively arrives at higher-level AI models.
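The iterative cycle described here can be sketched as a toy loop. This is only an illustration under stated assumptions, not DP Technology's actual method: `reference_energy` is a hypothetical stand-in for an expensive first-principles calculation, the "model" is plain linear interpolation, and "uncertainty" is simply the distance to the nearest labeled point.

```python
import math

def reference_energy(x):
    """Hypothetical stand-in for an expensive first-principles calculation."""
    return math.sin(3 * x) + 0.5 * x * x

def predict(labeled, x):
    """Toy surrogate model: linear interpolation between labeled points."""
    pts = sorted(labeled.items())
    if x <= pts[0][0]:
        return pts[0][1]
    if x >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

def uncertainty(labeled, x):
    """Toy uncertainty: distance to the nearest labeled configuration."""
    return min(abs(x - known) for known in labeled)

def max_error(labeled, grid):
    """Worst-case gap between the surrogate and the reference on the grid."""
    return max(abs(predict(labeled, x) - reference_energy(x)) for x in grid)

def run_loop(rounds=6, picks_per_round=2):
    # Configurations that the (imaginary) simulation visits.
    grid = [i / 100 for i in range(-200, 201)]
    # Small seed data set labeled with the reference method.
    labeled = {x: reference_energy(x) for x in (-2.0, 0.0, 2.0)}
    history = [max_error(labeled, grid)]
    for _ in range(rounds):
        for _ in range(picks_per_round):
            # Pick the configuration the current model trusts least ...
            x_new = max(grid, key=lambda x: uncertainty(labeled, x))
            # ... and label it with the reference method.
            labeled[x_new] = reference_energy(x_new)
        history.append(max_error(labeled, grid))
    return labeled, history

labeled, history = run_loop()
print(len(labeled), history[0] > history[-1])
```

Each round adds only a few expensive labels, chosen where the current model is least reliable, which is the general idea behind letting the model and the data improve each other.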

 

Ultimately, DP Technology has increased the computational speed of molecular dynamics by at least five orders of magnitude while maintaining quantum mechanical accuracy, with computational resource requirements scaling linearly with the number of atoms in the system.
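To put these two claims in concrete terms, here is a small arithmetic sketch. The one-year baseline and the per-atom cost are hypothetical numbers chosen only for illustration; the five orders of magnitude and the linear scaling come from the text above.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def accelerated_runtime(baseline_seconds, orders_of_magnitude=5):
    """Runtime after a speed-up of 10**orders_of_magnitude."""
    return baseline_seconds / 10 ** orders_of_magnitude

def linear_cost(n_atoms, cost_per_atom=1.0):
    """Linear scaling: cost grows in direct proportion to the atom count."""
    return n_atoms * cost_per_atom

# Hypothetical baseline: a simulation that would take one year.
baseline = 365 * SECONDS_PER_DAY
print(accelerated_runtime(baseline))            # roughly 315 seconds, about five minutes
print(linear_cost(2_000) / linear_cost(1_000))  # doubling the atoms doubles the cost
```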

 

After establishing the new “computation + experimentation” paradigm and driving a transformation in drug discovery, DP Technology has encountered a new pain point: a surge in computing power demand.

 

“At the outset, our self-developed ‘AI + molecular simulation’ algorithm could run on a single laptop. As our solutions scaled up, computational demands increased, prompting us to progressively upgrade to small-scale supercomputing clusters. Given the inherent characteristics of the ‘AI + molecular simulation’ computational paradigm, varying computational scale requirements across different stages, and specific hardware specifications, achieving efficient computation has become an unavoidable challenge in the company’s development,” said Zhang Linfeng.


New R&D Paradigm, Soaring Demand for Computing Power


In fact, the demand for computing power is not a challenge faced solely by DP Technology, but a shared imperative across the currently booming field of life sciences.

 

Computer-aided drug design and gene sequencing are two major application scenarios in life sciences. The introduction of new technologies has improved efficiency, but also led to a surge in computational power demands for both.

 

High-throughput gene sequencing generates massive volumes of genomic sequence data following sample preparation and instrument-based sequencing, entailing extensive data storage, computation, and transmission, which impose stringent requirements on underlying infrastructure.

 

Shengting Medical, a biomedical company specializing in genetic testing and precision medicine, has grown rapidly in recent years. As data volumes increased, the costs of servers and of operations and maintenance surged, while the computing power of its IDC data centers struggled to meet demand. Large volumes of sequencing files were forced to wait in queues, placing significant pressure on sequencing personnel and degrading the patient experience.

 

In computer-aided drug design (CADD), the field DP Technology works in, the substantial computing power of high-performance computing (HPC) is often required, both in the drug discovery phase (for example, target identification and compound synthesis) and in preclinical research stages such as compound screening.

 

Emerging research paradigms demand massive computational power, while the rapid development of the industry has further driven a steep rise in enterprises’ demand for computing resources.

 

Migrating to the cloud has become a common choice for life sciences companies.



He Wanqing, Head of High-Performance Computing (HPC) Product R&D at Alibaba Cloud, discusses how HPC+AI supports the rapid development of the life sciences industry

 

By migrating to the cloud, Shengting Medical ultimately resolved the data reliability, O&M cost, and efficiency problems of its traditional IDC clusters and improved the efficiency of gene sequence alignment and analysis by 70%. The Alibaba Cloud Supercomputing Team integrated Slurm job dependency management with auto-scaling on E-HPC, which minimized wasted computing resources and effectively reduced usage costs.
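Why combining job dependencies with auto-scaling saves resources can be shown with a small model. This is a sketch under assumptions, not the E-HPC implementation: the pipeline, job names, and node counts are hypothetical. In real Slurm, dependencies are declared at submission time, for example with `sbatch --dependency=afterok:<jobid>`.

```python
def runnable_waves(jobs):
    """Group jobs into waves; a job is runnable once all its dependencies finished.

    jobs maps a job name to (nodes_needed, [dependency names]).
    """
    done, waves = set(), []
    pending = dict(jobs)
    while pending:
        ready = sorted(
            name for name, (_, deps) in pending.items() if set(deps) <= done
        )
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del pending[name]
    return waves

def node_plan(jobs):
    """Nodes an auto-scaler would need to keep online during each wave."""
    return [sum(jobs[name][0] for name in wave) for wave in runnable_waves(jobs)]

# Hypothetical sequencing pipeline: two alignments, then variant calling, then annotation.
PIPELINE = {
    "align_sample_a": (4, []),
    "align_sample_b": (4, []),
    "call_variants": (2, ["align_sample_a", "align_sample_b"]),
    "annotate": (1, ["call_variants"]),
}

print(runnable_waves(PIPELINE))
print(node_plan(PIPELINE))  # [8, 2, 1]
```

A statically sized cluster would have to keep all 8 nodes online for the whole run. A scaler that knows the dependency graph can shrink to 2 nodes and then 1 as later stages start, because the jobs still waiting on dependencies do not need capacity yet.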

 

Like most startups, DP Technology chose to build its business on the cloud from the outset. Its computing requirements are clear: because its workload fluctuates significantly, it needs to deploy services for customers rapidly while using the cloud’s elastic scaling to maximize resource utilization and keep computation efficient and cost-effective. This matches what most life sciences enterprises want: rapid scalability and lower operations and maintenance costs.

 

By combining its Hermite™ platform with Alibaba Cloud’s Elastic High Performance Computing (E-HPC) clusters, DP Technology has reduced the average number of compounds synthesized per pipeline from thousands to dozens, significantly lowering synthesis costs and shortening wait times for custom synthesis. The time required to advance a pipeline to a preclinical candidate compound has also been halved.


Leveraging Cloud Elasticity to Drive Massive Computational Power


Why are life sciences companies flocking to the cloud? It may be closely tied to their growing demand for high-performance computing.

 

For startups, traditional IDC data centers clearly cannot keep pace with rapid growth, and an asset-light approach has allowed DP Technology to focus on its core business. The limitations of self-built IDC data centers are also apparent. Enterprise IT infrastructure can face three core challenges: fixed capacity that is hard to scale, long construction cycles, and high operations and maintenance costs for hardware. As life sciences companies expand rapidly, these shortcomings become even more pronounced. This is why Shengting Medical, having entered the fast lane of development, chose to migrate to the cloud.

 

“Compared to the traditional IT era, the most distinctive feature of the cloud era is the ‘servitization’ of software and hardware—delivering capabilities through IaaS, PaaS, and SaaS,” pointed out Zhang Xiantao, Head of Alibaba Cloud’s Elastic Computing Product Line. “In the traditional IT era, the biggest drawback was that all products had to be purchased outright, requiring companies to hire staff for application operations, infrastructure maintenance, and middleware/database management. Today, with these components transformed into services, cloud providers offer a comprehensive cloud service system spanning from IaaS to PaaS to SaaS.”

 

Alibaba Cloud organizes computing, storage, and network resources in the form of “resource pools,” which not only prevents the idling of limited resources but also enables timely allocation during peak loads. Meanwhile, it allows enterprises to enjoy the technological dividends of the cloud more quickly and at a lower cost, optimizing their IT resource configuration and significantly reducing their investment in IT operations and maintenance.

 

In addition to DP Technology and Shengting Medical, both mentioned earlier, other institutions have adopted Alibaba Cloud’s high-performance computing solutions, including Xunyin Bio, which focuses on single-cell sequencing, and the Global Health Drug Discovery Institute (GHDDI), which is committed to developing drugs for “diseases of the poor.”

 

Alibaba Cloud Launches Three New Solutions to Support the Development of Life Sciences


Leveraging its first-mover advantage in the cloud computing sector, Alibaba Cloud independently developed Apsara, China’s only cloud operating system, as early as 2009. Subsequently, by building distributed computing infrastructure “X-Dragon,” distributed networking “Luoshen,” and distributed storage “Pangu,” it has enabled enterprises to elastically schedule and utilize computing resources.

 

Building on its general-purpose computing platform, Alibaba Cloud released the white paper “Cloud Solutions and Best Practices for the Life Sciences Industry” at the 2022 Alibaba Cloud Life Sciences and Intelligent Computing Summit on August 5 and launched three high-performance computing (HPC) solutions: high-performance containers, large-memory instances, and high-I/O instances. The offerings, led by He Wanqing, Head of HPC Product R&D at Alibaba Cloud, are designed for the diverse needs of life sciences scenarios, including massive-scale data analysis and the heterogeneous workflows and environments found in gene sequencing and AI-driven drug discovery. Together with the existing public cloud and hybrid cloud offerings, they bring Alibaba Cloud’s portfolio for the life sciences industry to five solutions.



 

DP Technology currently employs a hybrid cloud solution. This strategic decision is driven by the nature of its client base—primarily universities and pharmaceutical companies—which possess available computing resources in their own IDC data centers.

 

Xunyin Bio has opted for the large-memory solution. Single-cell sequencing data analysis involves hundreds of thousands of reads per cell and therefore produces substantially larger datasets, and analysis at this scale places higher demands on the memory capacity of cloud servers.

 

Xunyin Bio deployed its single-cell sequencing analysis tasks on Alibaba Cloud i4p persistent memory instances, which are powered by 3rd Generation Intel® Xeon® Scalable Processors (codenamed Ice Lake) and Intel® Optane™ Persistent Memory, and used Memory Machine, a large-memory software product developed by MemVerge. This eliminated the I/O bottlenecks caused by disk reads and writes and made it possible to run sequencing data analysis tasks with large cell counts and extensive sample sizes.

Furthermore, with the ZeroIO memory snapshot feature of Memory Machine, the time required for data export and loading fell from 1,000 seconds to just 2.5 seconds, a 400-fold reduction that improves data reading efficiency by more than two orders of magnitude.

Meanwhile, Alibaba Cloud E-HPC supports one-click installation and deployment of the environment and manages the elastic scaling and automatic integration of MemVerge nodes: during peak business periods, ECS i4p instances with Memory Machine are added to the HPC cluster, and during off-peak times they are released to reduce costs.
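The load-time figures above can be checked with a few lines of arithmetic. This only restates the numbers quoted in the text; the function name is illustrative.

```python
import math

def speedup(before_seconds, after_seconds):
    """How many times faster the operation became."""
    return before_seconds / after_seconds

factor = speedup(1000, 2.5)
orders = math.log10(factor)

print(factor)            # 400.0
print(round(orders, 2))  # 2.6
```

A factor of 400 corresponds to about 2.6 orders of magnitude, so the improvement sits between two and three orders of magnitude.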

 

The collaboration between DP Technology and Alibaba Cloud is deepening further. The Hermite™ drug computational design platform, built on Alibaba Cloud’s Compute Nest and E-HPC services, provides pharmaceutical companies with a delivery solution that balances SaaS flexibility with data asset security. This approach maximizes the operational efficiency of Hermite™ in the cloud and further strengthens customer trust in the platform.

 

As DP Technology continues to explore the integration of AI and molecular simulation, a growing number of pharmaceutical companies are converging on this field. As He Wanqing, Head of High-Performance Computing Product R&D at Alibaba Cloud, noted, “The unique connectivity and elasticity of cloud computing can help break down R&D silos, facilitating the reuse and innovation of data-driven outcomes.” These advances should help the life sciences industry accelerate pharmaceutical research and development.