NVIDIA, Google DeepMind, and EMBL Launch World's Largest Protein Complex Dataset at GTC 2026 to Accelerate AI-Driven Drug Discovery

Feb 06, 2026 15:09 CST Updated 15:09

NVIDIA

Artificial Intelligence Computing Service Provider

NVIDIA, Google DeepMind, the European Bioinformatics Institute (EMBL-EBI), part of the European Molecular Biology Laboratory, and the Steinegger Lab at Seoul National University have significantly expanded the AlphaFold Protein Structure Database, adding 1.7 million high-confidence predicted protein complexes to the searchable database and making approximately 30 million additional predicted structures available for bulk download.

This newly added dataset is the largest of its kind, transforming the database into a comprehensive resource platform for protein–protein interaction modeling at an unprecedented scale.

Google DeepMind’s AlphaFold-Multimer model provides the database with AI-predicted protein structures. In addition, NVIDIA computing libraries, including NVIDIA TensorRT and cuEquivariance, were integrated into the OpenFold inference pipeline, improving inference speed by more than 100 times compared with traditional methods.

The database offers these precomputed protein structures as research hypotheses, accelerating experimental validation in drug target discovery and disease mechanism research. This significantly lowers the barrier to research, particularly for researchers in resource-limited settings who lack access to high-performance computing.

The project prioritizes the reference proteome—a protein collection representing taxonomic diversity—and the World Health Organization’s list of priority pathogens to advance infectious disease research.

For the pharmaceutical industry, these predicted structures can serve as robust initial hypotheses, significantly accelerating subsequent wet-lab experiments and saving valuable time and resources.