The Plant-Associated Microbe Gene Ontology project has contributed more than 700 terms through an ongoing three-year Gene Ontology term-development initiative and has presented them in eight papers in a special supplement to BMC Microbiology.
The PAMGO Consortium developed the ontology to address a lack of vocabulary in the area of microbe-host interactions and to help researchers better mine full genome sequences and perform cross-genome analyses.
PAMGO is a multi-institutional effort that was formed in 2004 and includes the Virginia Bioinformatics Institute at the Virginia Polytechnic Institute and State University; the Institute for Genome Sciences at the University of Maryland School of Medicine; Cornell University; North Carolina State University; the University of Wisconsin, Madison; Wells College in Aurora, NY; and the European Bioinformatics Institute in the UK.
By developing new GO terms, the scientists want to improve data-mining results from genome-sequence, high-throughput microarray, and proteomic analyses that shed light on microbes and hosts.
"The community-driven Gene Ontology resource is a big step forward in allowing researchers to make comparisons across microbial species of the many processes involved in microbe-host environment interactions," Brett Tyler, PAMGO project leader, co-author of five of the eight papers, and Virginia Bioinformatics Institute researcher, said in a statement.
He added that he hopes the scientific community will add to this controlled vocabulary based on its own research findings.
"Extending the GO to include terms specific to pathogen function is a watershed step for infectious disease researchers," Fiona McCarthy, researcher at the College of Veterinary Medicine at Mississippi State University, told BioInform in an e-mail. She was not part of the PAMGO effort but is part of the Gene Ontology Consortium, which works on ontology development.
"Now we can use existing GO tools to functionally model host-pathogen interactions," McCarthy said. Mississippi State hosts the AgBase database, a curated, open-source agricultural research resource that researchers can use for functional analysis of agricultural plant and animal gene products.
The papers cover a wide range of topics. For example, they address the "struggle for control" of the programmed cell death process between microbes and their hosts and the terms to describe that process; vocabularies that describe how microbes draw nutrients from their hosts; and the use of the new terms to annotate the genome of Magnaporthe oryzae, a fungus that is highly destructive to rice.
PAMGO's GO terms stand to help scientists annotate symbiont genomes in ways not possible through sequence analysis alone. Effector proteins that are deployed by microbes to manipulate host-cell structure and functions have "little in common at the sequence level" even though they might use similar strategies to defeat a host's defenses, Tyler and his colleagues pointed out in the series.
For the functional comparison of effectors in all their diversity, the scientists stated that an approach is needed that does not depend on sequence similarities.
"The GO provides such an approach," the researchers wrote, explaining that GO annotations "efficiently" summarize information about gene products from the literature in a standardized way.
Trudy Torto-Alalibo, PAMGO Project Coordinator at the Virginia Bioinformatics Institute and co-author of five papers, said in a statement she believes that this resource will enable "more comprehensive cross-kingdom analyses" and allow scientists to better understand the molecular mechanisms underlying microbial interactions with their hosts.
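One way to picture a comparison that does not depend on sequence similarity is to treat each gene product as the set of GO terms assigned to it and to measure how much two sets overlap. The Python sketch below does this with Jaccard similarity. The effector names and their term assignments are hypothetical placeholders rather than PAMGO annotations, and Jaccard overlap is just one simple measure chosen for illustration, not the method the researchers describe.

```python
def jaccard(terms_a, terms_b):
    """Return shared terms divided by all distinct terms across two sets."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0  # two unannotated gene products share nothing measurable
    return len(a & b) / len(a | b)


# Hypothetical effectors and term assignments, invented for illustration only.
ANNOTATIONS = {
    "effector_1": {"recognition of host", "adhesion to host"},
    "effector_2": {"adhesion to host"},
    "effector_3": {"pathogenesis"},
}


def rank_by_shared_function(query, annotations):
    """Rank all other gene products by GO-term overlap with the query."""
    scores = [
        (other, jaccard(annotations[query], terms))
        for other, terms in annotations.items()
        if other != query
    ]
    # Highest overlap first; no sequence data is consulted at any point.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_by_shared_function("effector_1", ANNOTATIONS):
        print(f"{name}\t{score:.2f}")
```

In this toy data set, effector_1 ranks closest to effector_2 because they share an annotation, regardless of whether their sequences resemble each other.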
Finding the Right Words
Michelle Gwinn-Giglio, a researcher at the Institute for Genome Sciences and a PAMGO contributor who was formerly at The Institute for Genomic Research, told BioInform in an e-mail that the advantage of ontologies is that they provide a mechanism for translating information usually captured in human language, such as in peer-reviewed literature and protein names, into machine-readable, computable data that can be used to make predictions.
"The PAMGO project has expanded these powers of the GO system so that they can now be applied to the realm of symbiotic interactions," she said.
Biologists stand to benefit from this expansion of GO, she said, because these annotations allow information collected in studies to be utilized "by many more scientists than it would be if they used only literature publications as a means of communicating their findings." These terms are a way to "spread awareness and use of the knowledge" scientists have accumulated in their work.
As part of PAMGO, researchers have been working on reference genomes to provide "high quality examples of the usage of the new terms," Gwinn-Giglio, Torto-Alalibo, and Cornell University researcher Candace Collmer pointed out in their overview paper. The genomes are those of the bacteria Pseudomonas syringae pv. tomato DC3000, Dickeya dadantii (Erwinia chrysanthemi) 3937, and Agrobacterium tumefaciens C58; the fungus Magnaporthe oryzae; and the oomycete Phytophthora sojae.
The more than 29,000 annotations that have resulted from PAMGO can be viewed online, the scientists pointed out in their paper.
The more than 700 GO terms the scientists presented include very general terms describing microbe-host interactions, such as "adhesion to host." In each case, so-called child terms, or sub-terms, describe specifics relative to the parent terms. Parent terms, the scientists stated in their paper, usually describe processes shared across organisms.
More Low- Than High-Tech
"This was really a pretty low-tech effort," Gwinn-Giglio said.
The first phase of the project, she said, was all about term development, so it involved frequent exchange of Word files with term lists and their definitions.
"Eventually we started working much more closely with members of the GO editorial office," Gwinn-Giglio said. "At that point, although there were still a lot of Word files flying around, we started to use OBO-Edit a lot more." OBO-Edit is a tool for building and editing ontologies, which are stored in the OBO format.
PAMGO began with a focus primarily on microbial pathogens, and terms were initially generated to annotate microbial genes involved in interacting with the host, such as "recognition of host."
Gwinn-Giglio and her colleagues noted that the team realized terms were also needed to frame the process from the host's perspective, such as "recognition of symbiont." They developed parallel term sets for those perspectives and also integrated terms to apply in cases in which neither organism can be clearly identified as being solely host or symbiont.
After the bulk of the terms were integrated into GO, the annotation phase began, which consisted mainly of literature-based manual assignment and curation of GO terms, she said.
The different consortium members each had their own tools and processes for doing this, she explained. Given the long-running collaboration between the Cornell P. syringae group and TIGR, that group, for example, chose the annotation tool Manatee, which was developed at TIGR and continues to be developed at the Institute for Genome Sciences, Gwinn-Giglio said.
As Gwinn-Giglio pointed out, pathogenesis is only one kind of symbiosis. The term "symbiosis" has child terms including "pathogenesis" as well as "interaction with host," which can apply to symbiotic relationships that involve hosts, whether they are mutually beneficial or harmful. "You can think of symbiosis as a continuum of interactions where on one end you have mutually beneficial and on the other end you have one organism kills the other," she said.
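The parent-child structure described here can be pictured as a small graph in which each term points to its parents. In the minimal Python sketch below, the links from "pathogenesis" and "interaction with host" up to "symbiosis" follow the description above. The placement of "recognition of host" under "interaction with host" is an assumption made for illustration, and the real GO holds far more terms and relationship types than this toy dictionary.

```python
# Each term maps to its direct parents. The last entry is an assumed placement
# used only to give the example a second level.
PARENTS = {
    "symbiosis": [],
    "pathogenesis": ["symbiosis"],
    "interaction with host": ["symbiosis"],
    "recognition of host": ["interaction with host"],  # assumption
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    found = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, []))
    return found


def implied_terms(term, parents=PARENTS):
    """Return the term itself plus all of its ancestors."""
    return {term} | ancestors(term, parents)


if __name__ == "__main__":
    print(sorted(implied_terms("recognition of host")))
```

Walking upward like this reflects a general property of GO: a gene product annotated with a specific child term is implicitly annotated with that term's parents as well, which is what lets a search on a general term pick up genes annotated at a finer level.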
Besides having filled the gap in the vocabularies, she and her colleagues believe this project helps researchers "gain insight on the commonalities of pathogenic processes across diverse species" and develop approaches for intervening in these pathogenic processes. The PAMGO species are plant pathogens that have profound impacts on plants that are important food sources worldwide, and thus on human health.
With sequencing technologies accelerating the availability of genomes, including microbial genomes, the window is opening on pathogens and their mechanisms as well as organisms that live more peaceably in symbiosis or close association with a host.
Structural and functional annotation are needed to use genome sequences to understand microbe-host interactions, Gwinn-Giglio and her colleagues pointed out in their paper. "Meaningful cross-genome searches" require a vocabulary that describes the functions of gene products so that they are "universally understandable across organisms."
Terms have to do justice to the biology on multiple levels. Genes encoding functionally equivalent proteins often carry different names in different organisms, and some umbrella terms describe "very different biological processes." One example of such a term is sporulation, which, depending on the context, is either a reproductive process or a reaction of a microbe to environmental stress.
The GO terms will also flow into the Human Microbiome Project's Data Analysis and Coordination Center, which is at the Institute for Genome Sciences. As Owen White, the HMP DACC's principal investigator, said in a statement, the PAMGO initiative has delivered terms for the HMP community that "can serve as the standard for data capture and exchange, greatly facilitating the use of HMP data."
As Gwinn-Giglio explained, part of the Institute for Genome Sciences' role in the DACC is to apply standards to the Human Microbiome Project genome data. "All of our annotation pipelines here at IGS already incorporate GO terms as part of the annotation that gets assigned to each gene," Gwinn-Giglio said.
As the DACC updates and adds annotation data sets that come from the HMP sequencing centers, GO terms will be added and updated as well. "These GO terms will include the terms derived from the PAMGO effort so they will be applied to the HMP data," she said.
On the DACC site, users will be able to query these data and, using GO terms, find genes that share common functions and processes across species and datasets, she said.
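A query of this kind amounts to looking up a GO term and collecting the genes annotated with it, grouped by organism. The Python sketch below illustrates the idea only. It is not the DACC interface, whose details are not described here, and the record layout, gene identifiers, and organism labels are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical annotation records: (organism, gene, GO term name).
RECORDS = [
    ("bacterium_X", "gene_x1", "adhesion to host"),
    ("bacterium_X", "gene_x2", "recognition of host"),
    ("fungus_Y", "gene_y1", "adhesion to host"),
    ("oomycete_Z", "gene_z1", "recognition of host"),
    ("oomycete_Z", "gene_z2", "adhesion to host"),
]


def genes_by_term(records, term):
    """Group the genes annotated with `term` by the organism they belong to."""
    hits = defaultdict(list)
    for organism, gene, go_term in records:
        if go_term == term:
            hits[organism].append(gene)
    return dict(hits)


if __name__ == "__main__":
    for organism, genes in genes_by_term(RECORDS, "adhesion to host").items():
        print(organism, genes)
```

Because every organism's genes are labeled with the same controlled vocabulary, one term retrieves functionally related genes from all three placeholder organisms, whatever those genes happen to be named locally.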
Know the Vocabulary
Gwinn-Giglio said that funding agencies "recognize the importance of controlled vocabularies and standards." Recent requests for applications contain language about using ontologies and standards, for example. "But it would be nice to have the funding centers mandate the use of certain standards or at least participation in standards development," she said.
PAMGO got its start in 2004 with scientists devoting their spare time to the project before it began receiving funding, she explained.
Since that time, "a lot has changed," including plummeting sequencing costs, soaring data output, and ever-new, high-throughput technologies that allow scientists to sequence genomes of interest. These developments make it "essential" to put in place systems to organize and standardize information associated with genomes and the genes they contain.
"There has been a significant shift" toward putting sequencing efforts into metagenomic projects, which involve a community of organisms rather than a single organism. For example, NIH launched the Human Microbiome Project to do metagenomic analysis of the communities that live in and on the human body. "This is particularly relevant to the PAMGO project with its focus on host-microbe interactions," she said.