NEW YORK – New research based on predicted protein structures has provided clues to viral protein functions that were not obvious from genome sequence data alone.
"In this study, we really asked if we can use predicted structures to learn things about viral proteins that we can't see using sequence alone. And, we found quite a lot," Jason Nomburg, a postdoctoral research fellow in Jennifer Doudna's lab at the University of California, Berkeley, and the first author of a study published in Nature on Monday, said in an email.
For their work, Nomburg, Doudna, and their colleagues at UC Berkeley, Lawrence Berkeley National Laboratory, and the Gladstone-UCSF Institute of Data Science and Biotechnology relied on a structure prediction tool called ColabFold to analyze multiple sequence alignments for nearly 4,500 eukaryotic virus species from the RefSeq database, predicting structures for 67,715 proteins.
From there, the team used the Foldseek tool to group the proteins into 18,192 clusters based on structural alignment, compared with the 21,913 sequence-based clusters identified in the same protein collection using the MMseqs2 tool.
"Structural similarity searches greatly expanded the taxonomic diversity of protein clusters, revealing putative protein functions by connecting unannotated viral proteins with annotated analogues," the authors reported, adding that "[s]tructural similarity both with other viral proteins and with host proteins can offer functional insights and provide insight into the origin and evolution of viral proteins."
The work builds on prior studies that hinted at strong conservation of protein structure across evolutionarily distant organisms, Nomburg noted. But while many of those analyses focused on experimentally determined protein structures, the vast diversity of viral proteins has been largely underrepresented in the past.
"[H]ere, we were actually able to do these structural comparisons more broadly, which let us find interesting potential functions for viral proteins that lack an experimental structure," he explained.
Across the set of predicted protein structures, the investigators estimated that 62 percent were distinct from structures in the AlphaFold database, while many of the remaining structures had analogs from nonviral sources. By examining the structural similarities in the latter set, the team got a glimpse of possible functions of viral proteins, including those shared with nonviral human pathogens.
For example, the analyses highlighted conservation of a class of enzymes known as RNA ligase T (LigT)-like phosphodiesterases that are suspected of helping viruses dodge their hosts' innate immune systems, Nomburg said.
These and other findings suggested that viruses can target distinct immune features using a conserved protein scaffold, he added, noting that "RNA viruses use this fold to target the RNA-sensing OAS-RNaseL pathway, while DNA viruses (avian poxviruses) use this same fold to target the DNA-sensing cGAS-STING pathway."
More broadly, the study's authors suggested that building out protein structure and structure prediction databases further "will continue to enable functional inference" that is "important not only from a fundamental biology perspective, but also in light of the continued emergence of novel viruses with pandemic potential."