NEW YORK (GenomeWeb News) – In a paper published online today in Science’s special section on plant genomes, an international group of researchers described a project to start deciphering and mapping the Arabidopsis thaliana proteome.
The work not only catalogues some organ-specific proteins and biomarkers, but also provides a glimpse into the relationship between A. thaliana’s genetic code and its protein complement. Overall, the group identified about half the proteins predicted from available Arabidopsis genetic data, along with several dozen that were not predicted.
“From the predicted proteome we find close to 50 percent,” senior author Sacha Baginsky, a researcher affiliated with Zurich’s Institute of Plant Sciences and the University of Zurich’s Center for Model Organism Proteomes, told GenomeWeb Daily News.
The team spent about three years collecting and interpreting data on the proteins in half a dozen A. thaliana organs sampled at different stages of development. They examined the protein content of each using high-throughput shotgun proteomics, an approach in which researchers analyze a mixture of unknown proteins by mass spectrometry. First, they prepared a complex peptide mixture and then fractionated it on a reverse-phase column as the mass spectrometer acquired data.
Specifically, the team analyzed the protein content in six plant organs using 1,354 mass spectrometry runs on a linear trap quadrupole (LTQ) ion trap instrument. They then applied two algorithms, PeptideProphet and PepSplice, to analyze the data, eventually matching 86,456 unique peptide sequences to 13,029 proteins.
While the actual data acquisition was relatively quick, Baginsky noted, the data handling and analysis were much more laborious. To deal with this data, the team developed its own database for uploading, searching, and mining data.
That database is not yet publicly available, but results from the experiment, including the mass spec data, are. They are being offered to the public through the PRoteomics IDEntifications, or PRIDE, database. “It was very important for us that we make everything available,” Baginsky said, since being able to scrutinize the mass spec data “gives you the transparency to judge the quality of the assignments.”
Not all proteins were equally detectable, since some are present in larger quantities than others. “This approach is biased for abundant proteins,” Baginsky said. Many of the predicted proteins that weren’t identified are expressed at low transcript levels, he added, and the detection approach is also biased against small proteins.
Even so, the researchers were able to identify a broad range of proteins by using high-throughput proteomics across several different plant organs and tissues and by using biochemical fractionation to create different protein pools for analysis.
The proteins that overlapped between most or all of the samples represented the core proteome — the proteins that are essential for all of the plant tissues. Others were organ specific. In fact, using Gene Ontology classifications, Baginsky and his team identified 571 organ-specific biomarkers.
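The distinction between a core proteome and organ-specific proteins can be illustrated with simple set operations. The following is a minimal sketch on made-up data, not the team's method: the organ names, protein IDs, and the function `split_core_and_specific` are hypothetical, "core" is simplified here to mean "detected in every organ," and the Gene Ontology step the researchers used for biomarkers is not modeled.

```python
from collections import Counter


def split_core_and_specific(detections):
    """Split detected proteins into a core set and per-organ specific sets.

    detections: dict mapping an organ name to the set of protein IDs
    detected in that organ (hypothetical input format).
    """
    # Count in how many organs each protein was detected.
    counts = Counter(p for proteins in detections.values() for p in proteins)
    n_organs = len(detections)

    # Simplifying assumption: "core" means detected in every organ.
    core = {p for p, c in counts.items() if c == n_organs}

    # Organ-specific: detected in exactly one organ.
    specific = {
        organ: {p for p in proteins if counts[p] == 1}
        for organ, proteins in detections.items()
    }
    return core, specific


# Toy data for illustration only; these are not real Arabidopsis identifiers.
detections = {
    "root": {"P1", "P2", "P3"},
    "leaf": {"P1", "P4"},
    "flower": {"P1", "P2", "P5"},
}
core, specific = split_core_and_specific(detections)
print("core:", sorted(core))
for organ in sorted(specific):
    print(organ, "specific:", sorted(specific[organ]))
```

In this toy example, a protein seen in two of three organs falls into neither group, which mirrors the article's point that only some proteins are shared by all tissues or unique to one.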
Proteins related to transcriptional regulation and signaling were under-represented in most samples, while those involved in metabolic processes such as glycolysis and translation were over-represented. The most plentiful proteins in specific organs tended to vary with organ function. For instance, proteins involved in oxidative stress response and intracellular protein transport were abundant in roots. And, not surprisingly, photosynthesis and chloroplast-related proteins were over-represented in leaves.
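The article does not say which statistical test the team used to call a functional category over- or under-represented. One common way to ask that kind of question is a hypergeometric test, sketched below under that assumption. The function name `hypergeom_sf` and all the numbers are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Probability of seeing at least k annotated proteins by chance.

    N: size of the background (e.g. all predicted proteins)
    K: background proteins carrying the annotation of interest
    n: proteins detected in the sample
    k: detected proteins carrying the annotation
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


# Made-up numbers: 1,000 background proteins, 100 annotated with a category,
# 50 proteins detected in an organ, 15 of which carry the annotation.
# By chance alone, about 5 annotated proteins would be expected (50 * 100 / 1000).
p_value = hypergeom_sf(15, 1000, 100, 50)
print(f"p-value for over-representation: {p_value:.2e}")
```

A small p-value suggests the category appears more often in the sample than chance would explain. Under-representation can be tested the same way by summing the lower tail instead.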
There were also 57 cases in which the researchers discovered proteins in forms different from those predicted from genetic information. These alternate gene models were surprising, Baginsky said. “We can find peptides from proteins that aren’t even predicted to be there.”
The work supplies information about A. thaliana specifically, but also highlights the importance of integrating different types of complementary biological information — in this case, genome annotation, gene prediction, and proteomic data.
Without a genome sequence, Baginsky said, compiling and interpreting proteomic data on this scale would have been much more difficult. But even with a good Arabidopsis genome sequence database, the team found unexpected gene models, he added, suggesting that proteomic data are also crucial for annotating the genome.