NEW YORK — By incorporating deep learning approaches with a co-evolution analysis, researchers have generated new models of core protein complexes found in yeast.
Proteins that interact with one another are likely to have co-evolved, a property that scientists have exploited to identify interacting pairs with higher accuracy than previous approaches such as yeast two-hybrid screens.
Researchers led by the University of Washington's David Baker applied this idea, with some tweaks, to the Saccharomyces cerevisiae proteome to identify protein complexes. As they reported in Science on Thursday, they additionally incorporated the deep learning-based structure prediction methods RoseTTAFold and AlphaFold. In this way, they identified more than 1,000 protein pairs that are likely to interact and built models for hundreds of those complexes. These protein complexes have a range of functions in eukaryotic cells, from DNA repair to protein transport.
"Together with the advances in monomeric structure prediction, our results herald a new era of structural biology in which computation plays a fundamental role in both interaction discovery and structure determination," Baker and colleagues wrote in their paper.
Using OrthoDB and proteome sequences from the NCBI and JGI databases, the researchers identified orthologs in other species for a set of about 4,000 single-copy yeast proteins and generated paired multiple sequence alignments. This yielded more than 4 million candidate protein pairs, vastly more than the gold-standard set of 768 yeast protein pairs that are known to interact.
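To give a sense of why the candidate set runs into the millions, the short sketch below counts unordered pairs among a set of proteins. It assumes every protein is paired with every other one, which the article does not state, and it uses the rounded figure of 4,000 proteins, so the result is a rough estimate rather than the study's own count. The function name `candidate_pairs` is made up for this illustration.

```python
from math import comb


def candidate_pairs(n_proteins: int) -> int:
    """Return the number of unordered pairs among n_proteins distinct proteins."""
    return comb(n_proteins, 2)


# "About 4,000" is the rounded figure from the article, not an exact count.
n_proteins = 4000
# Size of the gold-standard set of known interacting pairs, from the article.
gold_standard_pairs = 768

total_pairs = candidate_pairs(n_proteins)
print(f"All-against-all pairs: {total_pairs:,}")
print(f"Gold-standard share:   {gold_standard_pairs / total_pairs:.4%}")
```

Under these assumptions the known interactions make up well under a tenth of a percent of the candidates, which is why a further filtering step is needed.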
To better distinguish that core set, the researchers combined their co-evolution approach with RoseTTAFold and AlphaFold, both deep learning-based structure prediction approaches. Though RoseTTAFold was trained on monomeric protein structures rather than complexes, it can predict protein complexes, as long as the paired multiple sequence alignments are long enough. AlphaFold was similarly trained on monomeric protein structures rather than complexes, but the researchers suspected that, given its higher accuracy than RoseTTAFold on monomeric structures, it too might perform well on protein complexes.
By applying RoseTTAFold followed by AlphaFold to their dataset, the researchers identified a group of 715 likely interacting pairs. At the same time, they identified a set of 1,251 likely interacting pairs by applying AlphaFold to a literature-curated dataset. As 461 pairs overlapped, they uncovered 1,505 protein-protein interactions in total.
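The combined total follows from inclusion-exclusion: pairs found by both routes are counted once, so the union is 715 + 1,251 - 461 = 1,505. A minimal sketch of that arithmetic, using a helper name (`union_size`) invented for this illustration:

```python
def union_size(size_a: int, size_b: int, overlap: int) -> int:
    """Size of the union of two sets, given their sizes and the size of their overlap."""
    if overlap < 0 or overlap > min(size_a, size_b):
        raise ValueError("overlap must be between 0 and the smaller set size")
    return size_a + size_b - overlap


# Figures quoted in the article.
de_novo_pairs = 715           # RoseTTAFold followed by AlphaFold
literature_pairs = 1251       # AlphaFold on the literature-curated dataset
shared_pairs = 461            # pairs found by both routes

print(union_size(de_novo_pairs, literature_pairs, shared_pairs))  # 1505
```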
For about 800 of these protein pairs, the researchers developed and analyzed structural models. Many of these protein complexes were involved in DNA repair, but others had roles in protein transport or metabolism.
They predicted, for instance, the structures of the complexes that Spo11 forms with Ski8 and Rec102. Spo11 is needed for sexual reproduction in most eukaryotes. While their Spo11-Ski8 structure is similar to a previous model based on the Ski3-Ski8 complex, it also indicates that there may be a more extensive interaction surface.
Likewise, the predicted interaction models of Rpl12B-Rmt2 and Rpl7A-Fpr4, which have roles in protein modifications, suggested increased cross-talk between ribosome-maturation pathways and metabolic sensors as well as between regulators of translation initiation and transcription factors.
They further modeled the four subunits of the GARP complex that is involved in the docking and fusing of vesicles with the Golgi apparatus as well as the SNARE proteins, which are also involved in the fusion of the intracellular membranes of vesicles and organelles.
The study has a number of limitations, the researchers noted, including that it examined only about two-thirds of the yeast proteome and that it would miss interactions that occur in only a few organisms. Still, the analysis "should advance understanding of a wide range of eukaryotic cellular processes and provide new targets for therapeutic intervention," according to the researchers, who added that the approach could be directly extended to the human proteome.