DeepMind's AlphaFold Triumphs in CASP13, Ushering in a New Era of Protein Structure Prediction

Dec 04, 2018 08:00 CST Updated 08:00

On December 2, at the 13th Critical Assessment of Protein Structure Prediction (CASP13) held in Cancun, Mexico, the organizers announced that DeepMind’s latest artificial intelligence program, AlphaFold, had defeated all competitors in an extremely challenging task: predicting the three-dimensional structures of proteins, the fundamental molecules of life.

In its blog, DeepMind described AlphaFold, a foundational technology, as the “first major milestone” in demonstrating how artificial intelligence research can drive and accelerate new scientific discoveries.

Through an interdisciplinary approach, DeepMind has brought together experts from the fields of structural biology, physics, and machine learning to apply cutting-edge technologies for predicting the 3D structure of proteins based solely on their gene sequences.

In response to this epoch-making breakthrough, VCBeat revisits the “milestone” event in which DeepMind once again surpassed human-designed models, and reviews DeepMind’s explorations in the healthcare sector, using the following outline.

1. The Epoch-Making Significance of Protein Structure Prediction

2. How Did DeepMind Win the Championship?

3. AI Algorithms Shorten the Lengthy and Laborious Prediction Process to Just a Few Hours

4. How Much Disruption Will DeepMind’s AI Implementation Bring to Medicine?

# Nobel Prize-Winning Scientific Challenges

Proteins are large, complex molecules essential for sustaining life. Nearly all functions performed by our bodies—such as muscle contraction, light perception, or the conversion of food into energy—can be traced back to one or more proteins and how they move and change. The recipes for these proteins are called genes.

What any given protein can do depends on its unique 3D structure. For example, antibody proteins that constitute our immune system are “Y-shaped” and resemble specialized hooks. By binding to viruses and bacteria, antibody proteins detect and mark disease-causing microbes for destruction.

Similarly, collagen has a rope-like structure and transmits tensile forces among cartilage, ligaments, bones, and skin. Other examples include Cas9, the protein used in CRISPR gene editing, which functions like scissors to cut and paste DNA; antifreeze proteins, whose 3D structures enable them to bind ice crystals and prevent organisms from freezing; and ribosomes, which act as programmed assembly lines to facilitate protein synthesis.

However, determining the three-dimensional structure of proteins solely from their genetic sequences is a complex task that has challenged scientists for decades. The challenge lies in the fact that DNA contains only information about the sequence of protein building blocks, known as amino acid residues, which form long chains. Predicting how these chains fold into the intricate 3D structures of proteins is what is known as the “protein folding problem.”

Schematic of Predicted Protein 3D Structure Model (Image Source: DeepMind Official Website)

Protein folding is a remarkable molecular process that is rarely discussed outside the scientific community, yet it is critically important. Living organisms are built from proteins, and a protein’s function is determined by its shape. Understanding how proteins fold can help researchers usher in a new era of scientific and medical research.

Therefore, the protein folding problem has been listed as a key topic in “biophysics of the 21st century.” It represents a major unresolved biological issue pertaining to the central dogma of molecular biology. Although proteins can fold from their primary structure into three-dimensional conformations within a short timeframe, researchers are unable to computationally predict protein structures from amino acid sequences in a comparable timeframe, and often fail to obtain accurate three-dimensional structures.

Dr. Christian Anfinsen of the U.S. National Institutes of Health (NIH) was awarded the Nobel Prize in Chemistry in 1972 for his discovery that proteins can spontaneously complete the folding process without the need for additional assistance.

Demis Hassabis, co-founder and CEO of DeepMind, said: “This is a very critical moment for DeepMind. This is a ‘lighthouse’ project, representing our first major investment in personnel and resources, while also addressing a fundamental and highly significant real-world scientific question.”

In 2017, biophysicists at JILA, a physics research institute at the University of Colorado, found through more detailed measurements of protein folding that the process is more complex than scientists had previously predicted. This means that our understanding of proteins remains superficial.

Protein molecules are fundamentally composed of amino acid chains. Through a series of intermediate steps, these chains fold into three-dimensional structures, much like origami, and only then become functional. Accurately describing this folding process requires knowledge of the conformations of all intermediate states. The JILA study revealed many previously unknown states in this process, and its findings were published in the March 3 issue of Science that year.

# How Does AI Successfully Predict Protein 3D Structures?

CASP, which brought DeepMind back into the spotlight, is regarded as the “Olympic Games” of protein structure prediction. According to the official CASP13 results, the DeepMind team (competing under the name “A7D”) produced the best individual model for 25 of the 43 target proteins and ranked first with a cumulative score of 120.35. The second-place team, “Zhang,” scored 107.03.

According to DeepMind, the achievement rests on two things: using neural networks to predict the physical properties of proteins, and developing new methods for constructing protein structure predictions.

DeepMind built complete structures in two different ways, and both rely on deep neural networks trained to predict properties of a protein from its genetic sequence. The networks predict (a) the distances between pairs of amino acids and (b) the angles between the chemical bonds that connect those amino acids. Predicting distances is an advance over commonly used techniques that only estimate whether two amino acids are close to each other, that is, whether they are in contact.
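
To illustrate why a predicted distance carries more information than a contact call, here is a minimal sketch. It assumes the 8 Å cutoff conventionally used in contact prediction; the cutoff, the function name, and the toy distances are illustrative and do not come from DeepMind.

```python
CONTACT_CUTOFF = 8.0  # angstroms; conventional contact threshold (assumption)


def to_contacts(distances, cutoff=CONTACT_CUTOFF):
    """Collapse a matrix of pairwise distances into binary contact calls."""
    return [[d < cutoff for d in row] for row in distances]


# Hypothetical pairwise distances (angstroms) for a 3-residue toy chain.
toy_distances = [
    [0.0, 3.8, 12.5],
    [3.8, 0.0, 7.9],
    [12.5, 7.9, 0.0],
]
toy_contacts = to_contacts(toy_distances)

# 3.8 and 7.9 both become "in contact", so the difference between them is lost.
for row in toy_contacts:
    print(row)
```

Once the distances are reduced to contacts, a pair 3.8 Å apart and a pair 7.9 Å apart look identical, which is the information a distance prediction retains.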

Given a new protein, AlphaFold uses its neural networks to predict the distances between pairs of amino acids and the angles between the chemical bonds connecting them. In a second step, it refines the resulting draft structure to find the most energetically favorable conformation.

DeepMind trained a neural network to predict a separate distribution of distances for every pair of residues in a protein. These probabilities were then combined into a score that estimates how accurate a proposed protein structure is. A second neural network was trained to use all of the distances in aggregate to estimate how close the proposed structure is to the right answer.
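
The idea of turning per-pair distance distributions into a single score can be sketched as follows. The article does not say how the probabilities are combined, so this sketch assumes a sum of negative log-probabilities; the bin edges, the predicted distributions, and all function names are hypothetical.

```python
import math

# Hypothetical distance bins in angstroms; a fifth, open-ended bin covers >= 16.
BIN_EDGES = [0.0, 4.0, 8.0, 12.0, 16.0]


def bin_index(distance):
    """Return the index of the bin that contains `distance`."""
    for i in range(len(BIN_EDGES) - 1):
        if BIN_EDGES[i] <= distance < BIN_EDGES[i + 1]:
            return i
    return len(BIN_EDGES) - 1  # beyond the last edge


def structure_score(coords, predicted):
    """Sum of negative log-probabilities over residue pairs; lower is better."""
    total = 0.0
    for (i, j), distribution in predicted.items():
        d = math.dist(coords[i], coords[j])
        p = distribution[bin_index(d)]
        total += -math.log(max(p, 1e-9))  # guard against log(0)
    return total


# Made-up "network output": one probability distribution per residue pair.
predicted = {
    (0, 1): [0.9, 0.1, 0.0, 0.0, 0.0],
    (1, 2): [0.9, 0.1, 0.0, 0.0, 0.0],
    (0, 2): [0.1, 0.8, 0.1, 0.0, 0.0],
}
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0)]

good_score = structure_score(extended, predicted)
bad_score = structure_score(bent, predicted)
print(good_score, bad_score)
```

The extended chain places every pair in a high-probability bin and so receives the lower (better) score, while the bent chain is penalized for the two pairs that fall in unlikely bins.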

Predicting Physical Properties Using Neural Networks (Image source: DeepMind official website)

The second method optimizes the score through gradient descent, a mathematical technique commonly used in machine learning for making small, incremental improvements, and it produced highly accurate structures. The technique is applied to the entire protein chain rather than to fragments that must be folded separately before assembly, which reduces the complexity of the prediction process.
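
Gradient descent on a whole chain can be sketched in a few lines. This is not AlphaFold’s scoring function: as a stand-in, the sketch assumes a simple squared error between the current pairwise distances and a set of target distances, and the coordinates, step size, and function names are all hypothetical.

```python
import math
import random


def stress(coords, targets):
    """Sum of squared differences between actual and target pair distances."""
    return sum((math.dist(coords[i], coords[j]) - t) ** 2
               for (i, j), t in targets.items())


def gradient(coords, targets):
    """Analytic gradient of `stress` with respect to every coordinate."""
    grad = [[0.0, 0.0, 0.0] for _ in coords]
    for (i, j), t in targets.items():
        d = math.dist(coords[i], coords[j]) or 1e-9  # avoid division by zero
        for k in range(3):
            g = 2.0 * (d - t) * (coords[i][k] - coords[j][k]) / d
            grad[i][k] += g
            grad[j][k] -= g
    return grad


def descend(coords, targets, step=0.01, iterations=2000):
    """Plain gradient descent: move all residues a small step downhill each time."""
    coords = [list(c) for c in coords]
    for _ in range(iterations):
        grad = gradient(coords, targets)
        for c, g in zip(coords, grad):
            for k in range(3):
                c[k] -= step * g[k]
    return coords


# Target distances taken from a made-up 4-residue reference shape.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
targets = {(i, j): math.dist(reference[i], reference[j])
           for i in range(4) for j in range(i + 1, 4)}

rng = random.Random(0)
start = [[rng.uniform(-5, 5) for _ in range(3)] for _ in reference]
folded = descend(start, targets)
print(stress(start, targets), "->", stress(folded, targets))
```

Every residue is updated in every iteration, which mirrors the point made above: the whole chain is optimized at once instead of being assembled from separately folded pieces.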

Developing New Methods for Protein Structure Prediction (Image source: DeepMind official website)

Using these scoring functions, DeepMind searched the space of possible protein structures for those that matched its predictions. The first method builds on techniques commonly used in structural biology and repeatedly replaces pieces of a protein structure with new protein fragments, keeping the changes that improve the score. To develop AlphaFold, DeepMind trained a neural network on thousands of known proteins until it could predict the 3D structure based solely on amino acid sequences.
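
A fragment-replacement search of this kind can be sketched as a simple loop. The sketch makes several simplifying assumptions that are not from DeepMind: each residue is described by a single angle, the fragment library is a fixed hand-written list, the score is a toy squared deviation from target angles, and only improvements are accepted.

```python
import random


def toy_score(angles, target):
    """Stand-in for a learned score: squared deviation from target angles."""
    return sum((a - t) ** 2 for a, t in zip(angles, target))


def fragment_search(angles, fragments, score, steps=500, seed=0):
    """Repeatedly splice a random fragment into the chain; keep only improvements."""
    rng = random.Random(seed)
    best = list(angles)
    best_score = score(best)
    for _ in range(steps):
        frag = rng.choice(fragments)
        start = rng.randrange(len(best) - len(frag) + 1)
        candidate = best[:start] + list(frag) + best[start + len(frag):]
        candidate_score = score(candidate)
        if candidate_score < best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Hypothetical target: six helix-like residues followed by six strand-like ones.
target = [-60.0] * 6 + [-120.0] * 6
fragments = [(-60.0,) * 3, (-120.0,) * 3, (180.0,) * 3]  # made-up fragment library

initial = [0.0] * 12
initial_score = toy_score(initial, target)
final, final_score = fragment_search(
    initial, fragments, lambda angles: toy_score(angles, target)
)
print(initial_score, "->", final_score)
```

Accepting only improvements keeps the sketch short; a real search would typically also accept some worse moves to escape local minima.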

Once AlphaFold is provided with a new protein, it leverages its neural network to predict the distances between constituent amino acid pairs and the angles of their connecting chemical bonds, thereby generating an initial structural model. Subsequently, AlphaFold refines this model to identify the most energetically favorable conformation.

Although AlphaFold took two weeks to predict its first protein structure, the program can now complete the task within a few hours.

# AI Shortens the Lengthy and Labor-Intensive Prediction Process to Just a Few Hours

According to data reported by The Guardian, as of 2010, only 0.6% of known protein sequences had their corresponding structures resolved.

Over the past five decades, scientists have been able to determine the shapes of proteins in the laboratory using experimental techniques such as cryo-electron microscopy, nuclear magnetic resonance (NMR), or X-ray crystallography. However, each method relies on extensive trial and error, which can take years and cost tens of thousands of dollars per structure. This is why biologists are turning to artificial intelligence approaches as an alternative to this lengthy and labor-intensive process.

Regarding the complexity of protein folding, overseas media once reported that simulating this process on the fastest computers then available would take 100 years. Those computers performed from several trillion to more than ten trillion floating-point operations per second (TFLOPS). Although today’s most powerful supercomputers reach a peak of 200 quadrillion operations per second, simulating protein folding may still take years or even decades of computation.

Each protein is a chain of amino acids, of which there are 20 types. Proteins can twist and fold between amino acids, so a protein containing hundreds of amino acids can potentially adopt an astonishing number (10 to the power of 300) of structural conformations. It has long been recognized that malfunctioning proteins can cause diseases, and historically, targeting their structures with drugs to activate or deactivate them has yielded therapeutic effects. Due to limitations in computer algorithms and computational power, elucidating protein structures has remained challenging until recently.
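
A rough calculation shows why brute-force enumeration is hopeless even on the fastest machines. The sketch below assumes 300 residues with 10 possible states each (chosen so that the total matches the 10 to the power of 300 figure) and, very optimistically, that a 200-quadrillion-operations-per-second supercomputer could check one conformation per operation. These assumptions are illustrative only.

```python
import math

STATES_PER_RESIDUE = 10     # assumption: gives 10**300 for 300 residues
RESIDUES = 300              # "hundreds of amino acids"
OPS_PER_SECOND = 200e15     # 200 quadrillion operations per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600

conformations = STATES_PER_RESIDUE ** RESIDUES  # exact integer, 10**300

# Work in powers of ten; assumes one conformation checked per operation.
log10_seconds = math.log10(conformations) - math.log10(OPS_PER_SECOND)
log10_years = log10_seconds - math.log10(SECONDS_PER_YEAR)
print(f"roughly 10^{log10_years:.0f} years")  # prints "roughly 10^275 years"
```

Under these assumptions an exhaustive search would take on the order of 10^275 years, which is why prediction methods must avoid enumerating conformations.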

According to Wang Zhizhen, a researcher at the State Key Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, and an academician of the Chinese Academy of Sciences, errors in protein folding and conformation can lead to diseases such as Alzheimer’s disease, Parkinson’s disease, Huntington’s disease, and cystic fibrosis. As research into protein folding deepens, more of the true causes of diseases and more targeted therapeutic approaches will be discovered, enabling the design of more effective interventions.

If scientists can learn to predict protein shapes from their chemical compositions, they can determine their functions, identify how they might malfunction and cause harm, and design new proteins to combat diseases or perform other tasks. In short, by understanding how proteins fold, researchers can usher in a new era of scientific and medical advancement.

Take Alzheimer’s disease, the most common cause of dementia, as an example. It can remain latent in the body for more than a decade, and because its etiology is complex, it is still clinically difficult to detect even a few years before onset.

Fortunately, due to the rapid decline in the cost of gene sequencing, the field of genomics has become data-rich. Consequently, deep learning methods for prediction tasks reliant on genomic data have grown increasingly popular in recent years. DeepMind’s work in this area led to the development of AlphaFold, which was submitted to CASP this year.

DeepMind stated in its blog post: “We are proud to be part of what the CASP organizers have called ‘an unprecedented advance in the ability of computational methods to predict protein structure,’ ranking first among participating teams. Our team focused on the challenge of modeling target shapes from scratch, without using previously solved proteins as templates. We achieved a high degree of accuracy in predicting the physical properties of protein structures, and then employed two different methods to construct predictions of complete protein structures.”

In the 2013 annual report of a science and technology project (see http://www.nstrs.cn/xiangxiBG.aspx?id=64700; the report is for reference only and does not constitute any actual prediction or judgment), we found the following statement: “Virtual drug screening and computational biology are constrained by computational resources, methods, and software, making it difficult to conduct systematic virtual screening of tens of millions of compounds or to perform ab initio folding simulations of general protein structures, thereby failing to meet the demands of innovative drug development and computational biology research. Therefore, there is an urgent need to develop ultra-large-scale parallel platforms for virtual screening and molecular dynamics simulations of protein folding, in order to satisfy the needs of life sciences and innovative drug research.”

From this perspective, one of the application scenarios for DeepMind’s protein structure prediction will be compound screening for drug innovation.

In fact, as early as 2016, after AlphaGo’s victory over Lee Sedol, DeepMind quickly turned its attention to protein folding. In October 2017, DeepMind stated in a public interview that the team had begun to take an interest in the application of artificial intelligence in drug development, and that a critical step in new drug development is the accurate determination of the three-dimensional structure of target proteins.

Liam McGuffin, a researcher at the University of Reading, stated: “The ability to predict the folded structure of any protein is a major breakthrough. It holds profound significance for addressing many of the 21st century’s challenges, impacting health, ecology, and the environment, and fundamentally solving any problem involving living systems.”

# Repeated Technological Breakthroughs: DeepMind’s Exploration in the Healthcare Sector

Following AlphaGo’s rise to fame, DeepMind explored numerous data-driven tools and techniques, particularly machine learning methods that underpin artificial intelligence, offering hope for improving healthcare systems and services. Eric Schmidt, Executive Chairman of Alphabet, stated that new deep learning capabilities, exemplified by AlphaGo, could enhance daily productivity and create countless opportunities for businesses, especially in the fields of healthcare, transportation, and government.

Fundus Screening

In March 2016, DeepMind Health (now part of Google Health) employed the same deep learning technology used in the AlphaGo system. Collaborating with researchers from University College London and Moorfields Eye Hospital, it developed software that leverages deep learning to identify dozens of common eye diseases from 3D scans and then recommends appropriate treatments for patients.

This work is the result of years of collaboration among three institutions. Although the software is not yet ready for clinical use, it could be deployed in hospitals within a few years.

As described in a paper published in Nature Medicine, the software is built on established principles of deep learning, which use algorithms to identify common patterns in data. In this case, the data are derived from 3D scans of patients’ eyes using a technique called optical coherence tomography (OCT). Generating these scans takes approximately 10 minutes and involves reflecting near-infrared light off the inner surfaces of the eye to create three-dimensional images of ocular tissues, a commonly used method for assessing eye health.

The software was trained on nearly 15,000 OCT scans from approximately 7,500 patients, all of whom received treatment at Moorfields Eye Hospital. In one test, the AI’s assessments were compared against diagnoses made by a panel of eight physicians, and the software provided the same recommendations 94% of the time.

Breast Cancer Screening

In April 2018, DeepMind joined a groundbreaking new research partnership led by the Cancer Research UK Imperial Centre at Imperial College London, to explore whether artificial intelligence technologies could help clinicians diagnose breast cancer more quickly and effectively.

The study will analyze approximately 30,000 mammograms collected from women at hospitals between 2007 and 2018. AI technology will analyze these images together with the historical de-identified mammograms already provided through the UK OPTIMAM Mammography Database, to determine whether the technology can detect signs of cancerous tissue on these X-rays more effectively than existing screening methods. During the project, Jikei University Hospital will also share breast ultrasound data from approximately 30,000 women and 3,500 breast MRI scans.

These collaborations have laid the foundation for greater use of AI within the NHS by providing data that DeepMind can use to train healthcare algorithms.

Assisting Physicians in Developing Radiotherapy Plans

In September 2018, DeepMind and the Radiotherapy Department of University College London Hospitals NHS Foundation Trust were developing an artificial intelligence (AI) system capable of analyzing medical imaging scans for head and neck cancer and segmenting them to a standard comparable to that of expert clinicians. Organ segmentation is an essential yet time-consuming step in radiotherapy planning. DeepMind is developing a new performance metric designed to better reflect clinical workflows in model evaluation, along with a test dataset to assist physicians in organ segmentation and delineation of organs at risk.

Predicting the Risk of Acute Kidney Injury Worsening

In February 2018, DeepMind established a medical research partnership with the U.S. Department of Veterans Affairs (VA), one of the world’s leading healthcare organizations, which is responsible for providing high-quality medical services to veterans and their families across the United States.

This project, in collaboration with world-renowned VA clinicians and researchers, is analyzing approximately 700,000 historical de-identified medical records to determine whether machine learning can accurately identify risk factors for patient deterioration and correctly predict its onset, with a primary focus on acute kidney injury (AKI).

As can be seen from the aforementioned research by DeepMind, its exploration of artificial intelligence technologies across various fields remains in the experimental stage and has not yet entered the clinical phase.

Some media outlets argue that the integration of AI into biology is not an isolated case. In recent years, artificial intelligence teams led by Google have made comprehensive breakthroughs in the biomedical sector, achieving remarkable results in areas such as cancer pathology image recognition, genomic mutation detection, and disease risk assessment—performance that equals or even surpasses human-level capabilities. However, these seemingly successful models inevitably face challenges related to generalizability, usability, and interpretability.

From an algorithmic perspective, DeepMind’s technological breakthrough in the fundamental research of protein folding is epoch-making. Hassabis nevertheless stated that DeepMind has not completely solved the protein folding problem and that prediction is only the first step: “Protein folding is a highly challenging problem, but we have a robust system and some ideas yet to be implemented.”

Commendable as AlphaFold’s achievements are, the originality of its methodology can only be fully understood, and recognized as a research contribution, once it is described in detail in a peer-reviewed research paper.

That said, AlphaFold’s comprehensive success in this instance is a clear sign that the scientific community may soon be able to leverage technology to effectively predict protein structures.

As its focus shifts from games to real-world problems, it will be interesting to see which scientific questions DeepMind turns its attention to next.

Reference links:

https://deepmind.com/blog/alphafold/

https://www.theguardian.com/science/2018/dec/02/google-deepminds-ai-program-alphafold-predicts-3d-shapes-of-proteins

https://mp.weixin.qq.com/s/QAzcRAnZOmlBAm3PM7ZLNA

https://mp.weixin.qq.com/s/6BTN7WTQlIyrEEgNYUR7kQ