The AI system behind this historic news is AlphaFold, developed by DeepMind, Google’s London-based sister company. The human genome encodes the human proteome, the full complement of proteins the body expresses. Until now, however, 3D structures had been determined for only about a third of that proteome.
Now, AlphaFold has predicted the structures of more than 350,000 proteins, with varying accuracy, and made them available through a public database. According to a Nature report, the database is set to grow to 130 million structures by the end of 2021, and it aims to cover every protein in humans as well as in 20 model organisms.
The DeepMind program proved its value by outperforming approximately 100 other teams in CASP (Critical Assessment of Structure Prediction), a protein-structure prediction challenge.
Proteins constitute one of the key areas of focus for therapeutic targets, especially so in the last few years, with research investigating protein-protein interactions and targeted protein degradation. Unfortunately, this research has been limited in progress by something known as the ‘protein folding problem’.
A long-standing hypothesis holds that, in theory, a protein’s amino acid sequence should fully determine its structure. The challenge is that predicting that structure from the sequence has been extremely difficult, because of the vast number of conformations the chain could adopt before settling into its final 3D structure.
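To give a rough sense of how vast that number is, the back-of-the-envelope sketch below counts conformations under a deliberately crude assumption: each residue independently picks one of a few local shapes. The function names, the three states per residue, and the sampling rate are illustrative assumptions, not figures from DeepMind or the Nature report.

```python
def count_conformations(num_residues: int, states_per_residue: int = 3) -> int:
    """Count chain conformations if every residue independently adopts one of
    `states_per_residue` local shapes (a simplifying assumption)."""
    if num_residues < 0 or states_per_residue < 1:
        raise ValueError("need num_residues >= 0 and states_per_residue >= 1")
    return states_per_residue ** num_residues


def years_to_enumerate(num_conformations: int, samples_per_second: float = 1e13) -> float:
    """Years needed to visit every conformation once at an assumed sampling rate."""
    seconds_per_year = 365.25 * 24 * 3600
    return num_conformations / samples_per_second / seconds_per_year


if __name__ == "__main__":
    total = count_conformations(100)  # a modest 100-residue protein
    print(f"{total:.2e} possible conformations")
    print(f"{years_to_enumerate(total):.2e} years to try them all")
```

Even with these generous assumptions, exhaustive search is out of reach, which is why a method that predicts the folded structure directly is so valuable.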
A folded protein can be thought of as a “spatial graph” which is important for understanding the physical interactions within proteins as well as their evolutionary history. According to DeepMind, AlphaFold works by creating an “attention-based neural network system, trained end-to-end, that attempts to interpret the structure of this graph. It uses evolutionarily related sequences, multiple sequence alignment, and a representation of amino acid residue pairs to refine this graph.”
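The “spatial graph” idea can be illustrated with a toy example: treat each residue as a node and connect two residues when they sit close together in 3D space. This is only a sketch of the concept, not AlphaFold’s internal representation; the coordinates, the 5 Å cutoff, and the function names are made up for illustration.

```python
import math
from itertools import combinations


def contact_graph(coords, cutoff=5.0):
    """Build an adjacency map: residue index -> set of residues within `cutoff`."""
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def non_local_contacts(graph):
    """Contacts between residues that are not neighbours along the chain."""
    return sorted((i, j) for i in graph for j in graph[i] if j - i > 1)


# Hypothetical coordinates (in angstroms) for a six-residue chain that folds
# back on itself like a hairpin.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
]

graph = contact_graph(toy_coords)
print(non_local_contacts(graph))
```

The non-local contacts are the interesting part: they link residues that are far apart in the sequence but close together in the folded shape.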
A neural network system is a form of deep learning, which belongs to a branch of AI called machine learning (ML). Deep learning is a specialised area of ML that attempts to model abstraction from large-scale data using multi-layered deep neural networks (DNNs). Abstraction is a computer science term that refers to the process of filtering out irrelevant data in order to focus on the desired information. The neural network structure is loosely inspired by how the human brain processes information, passing data through successive layers of simple computations.
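As a minimal sketch of what “multi-layered” means in practice, the snippet below passes an input through two small layers, each a weighted sum followed by a simple non-linear function. It is a generic toy network with made-up weights, nothing like the scale or design of AlphaFold; all names here are illustrative.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Keep positive values and zero out the rest."""
    return [max(0.0, v) for v in values]


def sigmoid(value):
    """Squash a number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-value))


def forward(inputs, layers):
    """Run inputs through every layer: ReLU between layers, sigmoid at the end."""
    activations = inputs
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return [sigmoid(v) for v in dense(activations, weights, biases)]


# Hypothetical weights: a 2-unit hidden layer feeding a 1-unit output layer.
toy_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, -2.0]], [0.0]),
]

output = forward([2.0, 1.0], toy_layers)
print(output)
```

In a real system the weights are not chosen by hand but learned from data, and there are many more layers and units.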
Through this complex process, AlphaFold develops strong predictions of the underlying physical protein structure and can produce highly accurate structures in the space of a few days.
AlphaFold was trained on publicly available data comprising approximately 170,000 protein structures, along with large databases containing the sequences of proteins with unknown structures.
DeepMind is optimistic about the impact of AlphaFold on biological research, especially in terms of understanding disease pathology. Understanding the 3D structure of a protein is important because that structure plays a critical role in the protein’s function and in its contribution to physiological changes in the body.
Genes determine the amino acid sequence which determines the final structure of the protein – hence, an error in the genetic code may result in the malformation of a protein, causing disease or death.
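The chain from genetic code to amino acid sequence can be sketched in a few lines. The example below translates a short illustrative DNA fragment and shows how changing a single base swaps one amino acid for another. The codon table is a small subset of the standard genetic code, and the fragment and function names are chosen purely for illustration.

```python
# Subset of the standard genetic code, enough for the toy fragment below.
CODON_TABLE = {
    "ATG": "M", "GTG": "V", "CAT": "H", "CTG": "L", "ACT": "T",
    "CCT": "P", "GAG": "E", "AAG": "K", "TAA": "*",  # "*" marks a stop codon
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence into one-letter amino acid codes."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    protein = []
    for i in range(0, len(dna), 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]  # KeyError if codon not in the subset
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def point_mutate(dna: str, position: int, new_base: str) -> str:
    """Return a copy of `dna` with the base at `position` (0-based) replaced."""
    return dna[:position] + new_base + dna[position + 1:]


reference = "ATGGTGCATCTGACTCCTGAGGAGAAGTAA"
mutant = point_mutate(reference, 19, "T")  # turns one GAG codon into GTG

print(translate(reference))
print(translate(mutant))
```

A single changed letter in the DNA replaces a glutamate (E) with a valine (V) in the protein, the kind of small change that can alter how the protein folds and behaves.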
The link between protein malformation and disease is not a new concept. However, targeting proteins has been an uphill struggle because the final 3D structure of many proteins remains unknown. Knowing the genetic code of a protein is not enough: the structure is the key to drug targeting.
That is not to say that scientists have failed to determine protein structures. Experimental techniques like X-ray crystallography have been used over the last few decades to successfully determine protein shape. Unfortunately, these methods can take years to perform, cost thousands of dollars per protein structure, and depend heavily on trial and error.
In terms of contributing to therapeutic advancements, deep learning could help accelerate research by predicting a protein’s shape computationally from its sequence alone, rather than through time-consuming, laborious lab work involving techniques like X-ray crystallography.
Even more interesting is that “some of the regions that AlphaFold predicted with low confidence match up with those that biologists suspect are disordered”, according to the Head of AI for science at DeepMind. This is a first step for researchers towards understanding how protein structure contributes to specific diseases, by revealing more about these complex structures in detail. In other words, AlphaFold could help to identify proteins that malfunction and provide more information about how they interact.
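One way a researcher might act on that observation is to scan a prediction’s per-residue confidence scores for sustained low-confidence stretches and flag them as candidate disordered regions. The sketch below assumes scores on a 0 to 100 scale and uses a threshold of 50 and a minimum run length of 3; the scores, both parameters, and the function name are hypothetical choices for illustration.

```python
def low_confidence_segments(scores, threshold=50.0, min_length=3):
    """Return (start, end) index pairs, end exclusive, for runs of at least
    `min_length` consecutive residues scoring below `threshold`."""
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = i  # a new low-confidence run begins here
        elif start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the chain.
    if start is not None and len(scores) - start >= min_length:
        segments.append((start, len(scores)))
    return segments


# Hypothetical per-residue confidence scores for a 13-residue chain.
toy_scores = [92, 88, 45, 40, 38, 42, 85, 91, 47, 90, 30, 35, 33]

print(low_confidence_segments(toy_scores))
```

Isolated dips, such as the single low score in the middle of the toy chain, are ignored so that only sustained low-confidence regions are reported.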
These insights could contribute to more precise targeting in drug development, should researchers identify the structure of malfunctioning proteins and target specific regions of interest.
In early 2020, AlphaFold predicted the previously unknown structures of several SARS-CoV-2 proteins, including ORF3a and ORF8. Recent work by experimentalists has confirmed the structure of both proteins, supporting the accuracy of AlphaFold’s predictions. This is a significant achievement given how challenging these proteins are: very few related sequences were available for the AI system to draw on.
Charlotte Di Salvo, Former Editor & Chief Medical Writer
PharmaFEATURES