NEW YORK – A team led by researchers at the Max Planck Institute of Biochemistry has generated proteomes covering 100 organisms from a wide range of taxonomic groups, allowing for large-scale cross-species comparisons at the protein level.
Detailed in a paper published on Wednesday in Nature, the effort is one of the largest to date and aims to mirror similar work in genomics where, the authors noted, researchers have managed studies comparing the genomes of large numbers of diverse organisms.
Proteomics has lagged in this regard, largely because of the technical challenges of mass spectrometry-based experiments. The Max Planck team addressed this issue using a recently developed chip-based chromatography method along with deep learning-based informatics. Together, these allowed the researchers to achieve good depth of proteome coverage even when running samples at relatively high throughput, and to maintain good reproducibility across the many samples analyzed.
The study looked at 19 archaea, 49 bacteria and 32 eukaryotes as well as 14 viruses, identifying 349,164 proteins across these organisms, including 9,500 proteins in a human cell line. The researchers then used protein homology data to compare protein levels across the different organisms, building a graph database illustrating these connections. In total, they identified more than 8 million nodes and more than 53.8 million connections between different peptides, proteins, gene ontology terms and other organism characteristics, all of which, they wrote, can be "queried for any relationship between all of these nodes, as visualized for MS-identified homologues of two species."
Among other things, the resource will allow researchers to characterize proteins of unknown function by looking up their homologs in organisms with functional information. This could similarly be done for entire cellular pathways as well as organelles and cell compartments, the authors noted.
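As a rough illustration of the homolog lookup described above, the following minimal Python sketch stores a few relationships as a small in-memory graph and collects gene ontology terms from a protein's annotated homologs. It is not the authors' database or query interface: the protein identifiers, the relation names HOMOLOG_OF and ANNOTATED_WITH, and the functions build_graph and inferred_annotations are all hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical toy data: identifiers, relation names and GO terms are
# made up for illustration and do not come from the published resource.
EDGES = [
    ("protein:A1", "HOMOLOG_OF", "protein:B1"),
    ("protein:B1", "ANNOTATED_WITH", "go:translation"),
    ("protein:A2", "HOMOLOG_OF", "protein:B2"),
]


def build_graph(edges):
    """Index edges as graph[node][relation] -> set of neighboring nodes."""
    graph = defaultdict(lambda: defaultdict(set))
    for source, relation, target in edges:
        graph[source][relation].add(target)
        if relation == "HOMOLOG_OF":
            # Homology is treated as symmetric in this sketch.
            graph[target][relation].add(source)
    return graph


def inferred_annotations(graph, protein):
    """Collect the GO terms attached to any homolog of the given protein."""
    terms = set()
    for homolog in graph[protein]["HOMOLOG_OF"]:
        terms |= graph[homolog]["ANNOTATED_WITH"]
    return terms


graph = build_graph(EDGES)
print(inferred_annotations(graph, "protein:A1"))
print(inferred_annotations(graph, "protein:A2"))
```

In this toy data, protein:A1 has no annotation of its own but picks one up from its homolog, while protein:A2 stays unannotated because its only homolog is also unannotated, which is the situation the article later describes as the "dark proteome."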
The study also provides a look at the composition of proteomes, finding among other things that proteins linked to the production and regulation of proteins comprised 10 percent of the total protein mass across all the organisms profiled.
The researchers also found that a substantial portion of the proteome remains unexplored, with 38.4 percent of the proteins identified having no functional annotation. Even among the most highly expressed proteins, almost 23 percent of the 100 most abundant proteins for each species had no functional annotation.
This points "to a very large number of highly expressed proteins without any functional annotation or sequence homology to proteins with known gene ontology terms," the authors wrote, adding that "exploration of this part of the 'dark proteome' would be attractive: these proteins may indicate essential but unique features in the evolutionary development of these organisms that may be of biological or biotechnological interest."