Home GHDDI Leverages Alibaba Cloud Supercomputing to Enable High-Throughput Molecular Screening for COVID-19 Drug Discovery

GHDDI Leverages Alibaba Cloud Supercomputing to Enable High-Throughput Molecular Screening for COVID-19 Drug Discovery

Jun 04, 2020 08:00 CST Updated 08:00

On the early morning of June 1, 2020, the Wuhan Municipal Health Commission issued an announcement stating that on May 31, nucleic acid testing was conducted on more than 60,000 individuals in Wuhan, with no asymptomatic infections detected. This marked the first time since Wuhan began reporting figures for asymptomatic cases that the daily number of new asymptomatic infections was zero, representing another key milestone in China’s epidemic control efforts.

 

However, the impact of the global COVID-19 pandemic on socioeconomic activities continues to persist, and identifying effective therapeutic regimens remains a critical task for scientists worldwide. In an interview with CCTV journalist Bai Yansong in mid-April, Bill Gates specifically stated, “The R&D team at GHDDI will help the world better understand and combat the pandemic.” This independently operated, non-profit new drug research and development institution, founded in 2016, has once again entered the public spotlight in such a high-profile manner.

 

GHDDI, the Global Health Drug Discovery Institute, was jointly established in Beijing, China by the Bill & Melinda Gates Foundation, Tsinghua University, and the Beijing Municipal People’s Government. Professor Ding Sheng, Dean of the School of Pharmaceutical Sciences at Tsinghua University and a Bayer Distinguished Professor, serves as the Center’s Director. By leveraging top-tier global resources and capitalizing on China’s unique strengths, GHDDI is committed to building leading capabilities in biomedical R&D and an innovative platform for drug translation, tackling major disease challenges facing humanity and improving global health.

 

Alibaba Cloud also joined the fight against the pandemic alongside GHDDI. On January 29, Alibaba Cloud announced that it would provide free access to all its AI computing resources for public research institutions worldwide to support epidemic control efforts. Prior to this announcement, Alibaba Cloud’s high-performance computing platform had already been providing free support to GHDDI’s research on the novel coronavirus.

 

As early as January 2020, at the outset of the COVID-19 pandemic, GHDDI announced that it would provide global researchers with free access to its internal R&D platforms and pharmaceutical research resources, including high-throughput drug screening platforms and multiple compound libraries. Subsequently, hundreds of domestic and international research institutions and teams leveraged GHDDI’s open resources to develop drugs and vaccines against the SARS-CoV-2 virus.

 

Two days after opening its drug screening platform and internal pharmaceutical R&D resources, GHDDI’s AI R&D team launched “Targeting COVID-19,” a one-stop scientific data and information sharing platform dedicated to coronavirus research. Reportedly, the “Targeting COVID-19” platform was designed and built in just four days. In addition, GHDDI partnered with Alibaba Cloud to jointly establish a global cloud computing system for AI-driven coronavirus research, leveraging top-tier databases and high-performance supercomputing resources to support innovative efforts in the global fight against the pandemic.


High-Throughput: Rapidly Identifying Potential Targets for SARS-CoV-2


On January 21, confirmed cases of COVID-19 had emerged in 13 provinces and municipalities across China, intensifying the outbreak. Meanwhile, the genomic sequence of the novel coronavirus was publicly released.

 

At the inquiry of He Wanqing, head of high-performance computing at Alibaba Cloud, Dr. Pan Lurong, head of the AI department at GHDDI, compared the similarity between the novel coronavirus and the SARS virus. Recognizing the potential threat posed by this virus, GHDDI resolved to commit resources to help combat the novel coronavirus.


微信图片_20200603181416.jpg

 

In fact, as early as January, GHDDI simulated the three-dimensional structures of nearly all SARS-CoV-2-related targets and conducted comprehensive analyses of homology and infectivity, thereby rapidly identifying antigenic targets that play a pivotal role in subsequent drug development and antibody design. After determining these key targets, GHDDI released its preliminary research data to enable external teams to pursue drug development, while simultaneously conducting virtual screening of drug molecules based on computational models.

 

VCBeat learned from relevant GHDDI officials that the GHDDI drug research team focused on the concept of “drug repurposing.” By conducting structure-activity relationship and historical data analyses on more than 9,000 existing small molecules with antiviral activity and its internal ReFRAME compound library (which contains over 12,000 clinically safe compounds), the team screened out several hundred small molecules with a high probability of exhibiting anti-SARS-CoV-2 activity. Throughout this process, GHDDI has maintained an open-source approach, releasing its scientific data and multi-level antigen target phenotypic analysis models built upon these data to support the global research community in conducting subsequent studies on drug developability.

 

Amid the backdrop of the COVID-19 pandemic, sharing resources and research findings can undoubtedly greatly accelerate researchers’ progress and avoid duplication of effort.

 

We know that drug development is a highly complex and time-consuming process. During the compound discovery phase, traditional methods rely on extensive experimental screening to identify potentially suitable compounds. Taking the identification of small molecules that bind to viral proteases as an example, given the vast number of commercially available compound libraries—each containing millions of compounds, amounting to hundreds of millions in total—it is virtually impossible to test them all individually through experimental approaches alone.


微信图片_20200603181427.png

 

Consequently, scientists have attempted to use computational methods, such as simulating the interactions between molecular compounds and targets, to screen for potentially effective compounds for low-throughput experiments. One traditional virtual screening approach involves docking small molecules with targets, scoring the binding efficacy of different ligands, or performing further calculations via molecular dynamics simulations. This process helps identify high-scoring ligands with reasonable binding modes as candidate drugs for experimental validation, thereby accelerating the drug discovery process.

 

Due to the vast size of molecular libraries, even virtual screening implemented via computers poses a significant challenge to computational performance, as it must be completed within a limited timeframe. Assuming a compound library contains 10,000 candidate ligands, and calculating based on an average processing time of 1.5 hours per compound on a single-core CPU, the total time required to complete the molecular screening of this library would be 15,000 hours (625 days). The application of high-performance computing clusters provides essential support for modern drug research and development. If 625 CPUs are used for parallel computing on a high-performance cluster, the aforementioned task can be completed in one day. Furthermore, if an artificial intelligence model trained on high-performance GPUs is employed for predictive screening, the same task can be accomplished in just four minutes on a single GPU.


Validation of Cloud Supercomputing in Drug R&D Scenarios


High-Performance Computing (HPC), also known as supercomputing, is a method that utilizes supercomputers or large-scale computing clusters to address tasks requiring substantial computational power, such as parallel computing and AI model operations. It is widely applied in fields like oil exploration, weather forecasting, and drug development. Generally, to complete molecular screening for drug development within a specified timeframe, researchers require a computing platform with robust computational capabilities, high-capacity storage, and a comprehensive suite of high-performance application software, such as Amber and NAMD.

 

For many years following the advent of high-performance computing (HPC), cloud computing was not favored by HPC development experts due to performance overhead caused by virtualization. In single-node experiments, physical machines invariably outperform virtual machines, making the use of the best and fastest physical hardware an unwritten rule in the HPC field.

 

In 2017, Alibaba Cloud unveiled the X-Dragon server at the Apsara Conference. This cloud server, independently developed by Alibaba Cloud, primarily leverages self-developed chips and MOC cards to achieve virtualization capabilities. By elevating the control and management of storage and networking, it ensures that CPU resources are no longer wasted, dedicating 100% of their capacity to computational tasks.

 

Although it theoretically still consumes resources, the advantages of ECS Bare Metal Instance are obvious. The performance of containers running on ECS Bare Metal Instance is 20-30% higher than that on traditional physical machines. This is because when containers are deployed at high density on traditional physical machines, the CPU resources occupied by the core of storage network virtualization and those occupied by business services compete with each other. As the overall load increases, service latency deteriorates rapidly, eventually leading to service unavailability. In contrast, on ECS Bare Metal Instance, the data links between containers are isolated using hardware queues of chips, so they do not affect each other. Even when the load approaches 90%, the change in latency remains minimal.

 

By eliminating virtualization overhead, the X-Dragon architecture has made cloud-based supercomputing a reality. Alibaba Cloud’s supercomputing cluster uses X-Dragon servers as its computational foundation, interconnected via high-speed RoCE networks and integrated with the Cloud Parallel File System (CPFS), thereby providing the complete hardware infrastructure required for high-performance computing (HPC). On the software scheduling front, Elastic High Performance Computing (E-HPC) enables users to self-service deploy their own HPC clusters in the cloud, configure high-performance servers and large-capacity storage, and leverage multi-node software execution and high-throughput task processing solutions, directly meeting the computational platform needs of drug R&D professionals.

 

During the COVID-19 pandemic, GHDDI built an open and shared platform on Alibaba Cloud, utilizing Elastic High Performance Computing (E-HPC) to establish high-performance computing clusters for molecular docking, molecular dynamics simulations, and deep learning model training in drug discovery. Additionally, it created separate cloud HPC sub-accounts for partners to facilitate the sharing of computational resources and data.

 

Dr. He Wanqing, a Senior Expert at Alibaba Cloud, told VCBeat that GHDDI’s shared platform helps scientists more readily translate immediate ideas into guided innovative explorations, significantly improving the development efficiency of COVID-19-related drugs and vaccines. In the future, Alibaba Cloud’s supercomputing will provide the necessary computational power support for more drug research and development efforts, leveraging its elastic high-performance computing capabilities.