Accelerated Rational PROTAC Design via AI and Molecular Simulations: A Dual-Driven Approach to Targeting 'Undruggable' Proteins

Sep 16, 2022 13:17 CST Updated Sep 17, 21:21

Galixir

AI Technology Empowers Drug Developers

As an emerging therapeutic strategy, proteolysis-targeting chimeras (PROTACs) harness the ubiquitin-proteasome system to degrade pathogenic target proteins, holding great potential to tackle "undruggable" targets. However, the three-part structure of PROTAC molecules results in a high molecular weight, which makes optimizing their drug-like properties extremely challenging.

Galixir's R&D team collaborated with Professor Yuedong Yang of the National Supercomputing Center in Guangzhou at Sun Yat-sen University. Relying on the "Tianhe-2" supercomputer, and in an interdisciplinary effort spanning supercomputing, AI, and biopharmaceuticals, the team fully integrated reinforcement learning-based molecular generation with first-principles-based molecular simulation. Using this dual-driven combination of intelligent and simulation computing to accelerate PROTAC drug development, the team identified novel lead compounds with high degradation activity and favorable pharmacokinetic properties within 49 days and completed wet-lab validation. The study is of great significance for the rational design of PROTACs and the optimization of their drug-like properties.

The results were published on September 15 in Nature Machine Intelligence, a Nature sub-journal, under the title "Accelerated rational PROTAC design via deep learning and molecular simulations".

This is Galixir's second AI drug discovery paper published in a Nature sub-journal this year, following "Nature Sub-Journal: Galixir Provides Navigation for Biosynthetic Route Planning of Natural Products".

Breaking Through "Undruggable" Targets

Challenges in the Rational Design of PROTAC

Since the first proof of concept for proteolysis-targeting chimeras (PROTACs) in 2001, PROTACs have become a revolutionary tool for selectively degrading target proteins through the ubiquitin-proteasome system. A PROTAC consists of three parts: a ligand that targets the protein of interest (POI), also known as the warhead; a ligand that recruits an E3 ubiquitin ligase; and a chemical linker that connects the two ligands. Because of this bifunctional structure, a PROTAC can bind the target protein and the E3 ubiquitin ligase simultaneously, forming an active ternary complex. PROTACs therefore need to bind the target protein only briefly to induce its ubiquitination and degradation. In addition, because PROTACs do not need to occupy a druggable active site, any surface binding site on the target protein can in principle be used, which makes it possible to modulate "undruggable" targets. However, the design and optimization of PROTACs still rely on empirical, iterative refinement, which limits current development strategies.

The most challenging issue in PROTAC development is selecting a linker that yields an active ternary complex with both degradation activity and target selectivity. Because the ternary structure is complex and dynamic, linker design is difficult: the length, composition, flexibility, and attachment sites of the linker can all substantially affect the outcome. A second design challenge is that PROTAC molecules often fall outside the property ranges commonly expected of oral drugs. As multi-component molecules, they have a relatively large molecular weight, which, compared with traditional small molecules, leads to poor solubility, low permeability, poor bioavailability, and an unpredictable hook effect, all of which hinder the clinical translation of PROTACs. Rationally optimizing PROTAC molecules to overcome these problems under such constraints therefore remains a major challenge in the field.

Dual-Driven by Intelligence and Simulation Computing

PROTAC-RL Discovers Novel Lead Compound in Just 49 Days

To address this issue, the research team proposed PROTAC-RL, a rational PROTAC design algorithm based on deep generative models. The model takes a pair consisting of an E3 ligand and a warhead as input, outputs designed linkers, and, guided by reinforcement learning (RL), generates PROTAC molecules with specific properties. Specifically, the team built the linker generation model on a Transformer neural network. To overcome the limited amount of PROTAC training data, the model was first pre-trained on a large number of quasi-PROTAC molecules that share a similar chemical space with PROTACs, and then fine-tuned on real PROTACs and augmented data. The trained model was subsequently integrated into a memory-based reinforcement learning module with empirical reward functions to generate PROTACs with better pharmacokinetic properties.
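The reinforcement learning step described above can be illustrated with a minimal sketch. The article does not give the reward formula, the memory mechanism, or the update rule, so everything here is an assumption made for illustration: `empirical_reward` is a hypothetical triangular reward around a target score, `RewardMemory` is a hypothetical memory that discounts repeated linkers, and `policy_gradient_loss` is a plain REINFORCE-style loss with a mean baseline rather than the authors' actual objective.

```python
from dataclasses import dataclass, field


def empirical_reward(predicted_score: float, target: float, tolerance: float = 1.0) -> float:
    """Hypothetical reward in [0, 1]: 1 at the target score, falling linearly to 0."""
    distance = abs(predicted_score - target)
    return max(0.0, 1.0 - distance / tolerance)


@dataclass
class RewardMemory:
    """Hypothetical memory: a linker seen before earns a discounted reward,
    which pushes the generator toward new structures."""
    repeat_penalty: float = 0.5  # assumed discount applied per earlier occurrence
    seen: dict = field(default_factory=dict)

    def adjust(self, linker: str, raw_reward: float) -> float:
        count = self.seen.get(linker, 0)
        self.seen[linker] = count + 1
        return raw_reward * (self.repeat_penalty ** count)


def policy_gradient_loss(log_probs, rewards):
    """REINFORCE-style loss with a mean-reward baseline (an assumed update rule).

    log_probs: log-likelihood of each sampled linker under the generator.
    rewards:   memory-adjusted reward of each sampled linker.
    """
    baseline = sum(rewards) / len(rewards)
    weighted = [(r - baseline) * lp for lp, r in zip(log_probs, rewards)]
    return -sum(weighted) / len(weighted)


# Usage: score two samples of the same (hypothetical) linker SMILES.
memory = RewardMemory()
raw = empirical_reward(predicted_score=3.0, target=3.0)
first = memory.adjust("CCOCCOCC", raw)   # full reward on first sight
second = memory.adjust("CCOCCOCC", raw)  # discounted on the repeat
```

In a real system the log-probabilities would come from the Transformer generator and the loss would be minimized with an automatic differentiation library; the sketch only shows how a reward and a memory could shape that signal.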

As a proof of concept, the research team selected BRD4 as the target protein and generated more than 5,000 PROTACs. Leveraging supercomputing power, the team used high-throughput machine learning scoring functions and molecular dynamics simulations to cluster and screen these virtual molecules. Based on synthetic accessibility, the researchers ultimately selected, synthesized, and experimentally tested six PROTACs, three of which demonstrated inhibitory activity against BRD4. One lead compound also showed high anti-proliferative potency in tumor cell lines and favorable pharmacokinetics in mice (Figure 1). Building on Galixir's long-term exploration and accumulated expertise in PROTAC research, the large-scale parallel molecular dynamics simulation methods of the National Supercomputing Center in Guangzhou at Sun Yat-sen University, and the massive GPU computing power used to train the deep learning models, the entire research process took only 49 days. This demonstrates that combining supercomputing, deep learning, and molecular dynamics can enable efficient, rational PROTAC design and optimization.

Figure 1. Schematic diagram of the PROTAC-RL process

Comparative Experiments Show That PROTAC-RL Outperforms Other Methods on All Metrics

The PROTAC-RL model consists of two parts: the base generation model, Proformer, and the drug-likeness reinforcement learning model, RL. To evaluate Proformer, the researchers first split the PROTAC dataset in an 8:1:1 ratio for training, validation, and testing. For each pair of warhead and E3 ligand in the test set, 10 candidate PROTACs were generated, and the percentage of test cases in which the true PROTAC was reproduced (the reproducibility rate) was used as the evaluation metric. The researchers compared Proformer with other state-of-the-art fragment-linking methods, including the graph learning-based method DeLinker, the sequence-based method SyntaLinker, and versions of both retrained on the PROTAC training set. As shown in Figure 2-A, Proformer achieved a reproducibility rate of 43.0%, significantly outperforming the best existing baseline. The molecules generated by Proformer are also closer to the real chemical space of PROTACs than those of the other methods (Figure 2-B). The ablation experiments (Figure 2-C) further support the rationale of the model design. After reinforcement learning is added, the scores of the molecules generated by PROTAC-RL are also much higher than those of the other model variants (Figure 2-D). In a selected case, PROTAC-RL generates linkers with specific properties according to different target score settings (Figure 2-E, F). Overall, compared with other methods, PROTAC-RL shows superior performance in terms of reproducibility, validity, uniqueness, and novelty.
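The evaluation protocol above (an 8:1:1 split, 10 candidates per test pair, and the reproducibility rate) can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: `split_8_1_1` and `reproducibility_rate` are hypothetical helper names, the `generate` callable stands in for any linker-generation model, and molecules are compared as strings, which presumes they have already been converted to one canonical form.

```python
import random


def split_8_1_1(items, seed=0):
    """Shuffle and split a dataset into train/validation/test at 8:1:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_valid = len(shuffled) // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


def reproducibility_rate(test_set, generate, k=10):
    """Percentage of test cases whose true PROTAC is among the k candidates.

    test_set: iterable of (warhead, e3_ligand, true_protac) triples.
    generate: hypothetical callable (warhead, e3_ligand, k) -> list of PROTACs.
    """
    cases = list(test_set)
    hits = 0
    for warhead, e3_ligand, true_protac in cases:
        candidates = generate(warhead, e3_ligand, k)[:k]
        if true_protac in candidates:
            hits += 1
    return 100.0 * hits / len(cases)
```

Under this definition, a model that reproduces the true PROTAC for 43 of 100 test pairs would score 43.0%, the same scale as the figure reported for Proformer.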

Figure 2. Comparison results of PROTAC-RL with the latest prediction methods and the results of ablation experiments

Validation Case: PROTAC Design Targeting BRD4 and Wet Lab Validation

BRD4 is an epigenetic regulator that plays a crucial role in cancer development. Although PROTACs targeting BRD4 have been studied extensively, no candidate drug has yet successfully advanced to clinical trials, primarily because of pharmacokinetic and toxicity issues. To evaluate the performance of PROTAC-RL in practical drug development, the researchers used it to generate more than 5,000 virtual molecules with favorable pharmacokinetic scores. After machine learning screening, molecular dynamics simulation assessment, structural clustering, synthesizability analysis, and patent searches, six final candidate molecules were selected for synthesis and validation.
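The structural clustering step in this funnel can be pictured with a small sketch. The article does not say which clustering algorithm or fingerprints were used, so this is an assumed stand-in: a greedy, leader-style selection that keeps the best-scored molecule of each structural neighborhood, using Tanimoto similarity over hypothetical fingerprint bit sets. The names `tanimoto` and `pick_diverse` and the 0.6 threshold are illustrative choices.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pick_diverse(candidates, threshold=0.6):
    """Keep the top-scored representative of each structural cluster.

    candidates: list of (name, score, fingerprint_bits) tuples, where the
    score is any screening score for which higher is better.
    A candidate is kept only if its similarity to every molecule already
    kept is below the threshold (an assumed cutoff).
    """
    kept = []
    for name, score, bits in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(tanimoto(bits, kept_bits) < threshold for _, _, kept_bits in kept):
            kept.append((name, score, bits))
    return [name for name, _, _ in kept]


# Usage with toy fingerprints: "B" is a near-duplicate of the better-scored "A".
toy = [
    ("A", 0.9, frozenset({1, 2, 3, 4})),
    ("B", 0.8, frozenset({1, 2, 3, 5})),
    ("C", 0.7, frozenset({7, 8, 9})),
]
selected = pick_diverse(toy)
```

Selecting one representative per cluster keeps the shortlist structurally diverse, which matters when only a handful of molecules, six in this study, can be synthesized.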

The researchers first used Western blotting (immunoblotting) to assess protein degradation. As shown in Figure 3, compounds 1, 2, and 3 showed activity in reducing BRD4 protein. The researchers then investigated the in vitro anti-proliferative effects of compounds 1, 2, and 3 on Molt4 cells. As shown in Figures 4-A to 4-C, the compounds exhibited varying degrees of anti-proliferative activity against the Molt4 cell line, with IC₅₀ values of 116 μM, 5.1 μM, and 21 μM for compounds 1, 2, and 3, respectively. Beyond activity, the researchers also evaluated the inhibitory effect of compound 1 on the hERG channel and found that its hERG inhibition was very low (27.4 μM), indicating that compound 1 has no significant cardiac toxicity. Compound 1 also exhibited favorable physicochemical properties, with a logS of 1.42 and a logD of 3.27.

To verify that PROTAC-RL improves the pharmacokinetic properties of PROTACs, the research team went on to test Compound 1 in animal models. After Compound 1 was administered to mice by intraperitoneal injection (2 mg/kg), the half-lives across the three doses were similar, at approximately 2.22 hours (Figure 4-D). The peak plasma concentration (Cmax) after the first intraperitoneal dose was 194 ng/mL. This compares favorably with the positive reference compound dBET6, which has a Cmax of 176 ng/mL and a half-life of 0.52 hours. These results demonstrate that Compound 1 is a potent BRD4 degrader with favorable pharmacokinetics.
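To put the reported half-lives in perspective, the short calculation below compares Compound 1 with dBET6. It assumes simple first-order (single-compartment) elimination, which the article does not state, so the remaining-fraction figures are illustrative only; the half-life and Cmax values are the ones quoted above.

```python
import math


def elimination_rate(half_life_h: float) -> float:
    """First-order elimination rate constant k = ln(2) / t_half, per hour."""
    return math.log(2) / half_life_h


def fraction_remaining(half_life_h: float, t_h: float) -> float:
    """Fraction of the peak concentration left after t_h hours,
    assuming first-order elimination: exp(-k * t) = 0.5 ** (t / t_half)."""
    return 0.5 ** (t_h / half_life_h)


# Values reported in the article.
compound1 = {"half_life_h": 2.22, "cmax_ng_ml": 194.0}
dbet6 = {"half_life_h": 0.52, "cmax_ng_ml": 176.0}

# Half-life is roughly 4.3 times longer; Cmax is roughly 1.1 times higher.
half_life_ratio = compound1["half_life_h"] / dbet6["half_life_h"]
cmax_ratio = compound1["cmax_ng_ml"] / dbet6["cmax_ng_ml"]

# Fraction of peak left after one Compound 1 half-life, under the assumed model.
left_compound1 = fraction_remaining(compound1["half_life_h"], 2.22)
left_dbet6 = fraction_remaining(dbet6["half_life_h"], 2.22)
```

Under this assumption, after 2.22 hours Compound 1 would still be at half of its peak concentration while dBET6 would have passed through more than four half-lives, so the longer half-life is the more substantial difference between the two compounds.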

Figure 3. Bioactivity Testing and Pharmacokinetic Testing

This study presents a fully automated computational framework that integrates reinforcement learning-driven deep generative models, machine learning, and molecular dynamics simulations for the rational design and optimization of PROTACs. In a case study targeting BRD4, Galixir's research team, in collaboration with Professor Yuedong Yang of the National Supercomputing Center in Guangzhou at Sun Yat-sen University, relied on "Tianhe-2" and combined interdisciplinary expertise in supercomputing, AI, and biopharmaceuticals. By pairing reinforcement learning-based molecular generation with physics-based molecular simulation, the team used dual-driven intelligent and simulation computing to accelerate PROTAC drug discovery. Within just 49 days, they identified novel lead compounds with high degradation activity and excellent pharmacokinetic properties and completed wet-lab validation. This further demonstrates that integrating supercomputing, AI-driven computational strategies, and experimental approaches is a crucial route to effective drug candidates.