The National Science Foundation has awarded a total of $10 million in research funding that will be spread out over five years to support further development of the Gramene database — a resource for comparative genomic analysis in grass species.
The funds will also support development of a new repository for biological network information dubbed the Plant Reactome, which will adopt a framework similar to the one used to develop the Human Reactome database.
According to a grant abstract, the project investigators intend to “expand the number of plant genomes incorporated into the [Gramene] portal, add new capabilities for studying gene expression, pathways, and networks, and fundamentally improve the database architecture and user interfaces to facilitate sophisticated systems-level analyses.”
The intent is to develop an analysis system that is “built on structured metadata and implemented through a high-capacity data warehouse and an advanced search engine,” the abstract states.
So far, the NSF has awarded around $4 million to the project.
In a statement, Doreen Ware, a scientist with the US Department of Agriculture's Agricultural Research Service and adjunct associate professor at Cold Spring Harbor Laboratory, said that the new funding is particularly timely given the acute need for biologists to integrate uniquely valuable but often scattered bits of genomic and related data.
Lincoln Stein, who heads informatics efforts at the Ontario Institute for Cancer Research and is one of the senior investigators on the Gramene project, noted that many useful resources are “underutilized because of the fragmentation of datasets and perhaps even more because of the scarcity of tools to make meaningful connections among them.”
Ware, who is the project’s principal investigator, believes that “by honing resources like Gramene, we are bringing to bear the knowledge and hidden insights that leading-edge information technology makes visible in order to serve the needs that plant biologists have in generating ever more sophisticated analyses of experimental data.”
The Gramene project was launched to provide added value to data collected on plant genomes. It currently hosts data on grasses such as rice, maize, sorghum, and barley, as well as on the genomes of broad-leaf crops such as soybean, tomato, poplar, and grapevine, and the model plant Arabidopsis thaliana. It also includes information on “lower” plant genomes such as moss and algae.
With the current influx of funds, the investigators will add 20 new plant genomes to Gramene’s portal, Pankaj Jaiswal, an assistant professor in Oregon State University’s botany and plant pathology department and a co-principal investigator of the project, said in a statement.
These will include genomes from blue-green algae among other plant species, Jaiswal told BioInform.
The new genomes will include information on gene annotations, structural variants, gene expression profiles, and associated phenotypes among other types of data, he said.
Reactome for Plants
In parallel, the investigators will develop the Plant Reactome repository using the framework developed for the Human Reactome database, which supports the combination of metabolic and regulatory networks in a single resource, Jaiswal said. His group is leading the development of the Plant Reactome for Gramene.
The Reactome infrastructure is “able to connect the dots between the datasets that are hosted in the Reactome and the datasets that are present in various genome and annotation projects like UniProt,” for example, he explained.
That kind of unifying infrastructure is necessary for plant projects because currently data from plant metabolic and regulatory networks are available in different formats and stored in multiple repositories — an unsustainable situation given the growing body of data in plant research, he said.
For example, “there are metabolic networks being investigated in plants, there are large-scale gene-gene interaction networks that are being studied, there are a lot of regulatory networks and epigenome networks that are being investigated in plants,” he said.
These and other plant research efforts have shown, for example, that there is broad conservation across grasses such as rice, maize, sorghum, barley, oats, wheat, and rye, as well as in the order of genes across chromosomes in these species. Other studies provide clues about gene function across species, which could ultimately improve crop yields or extend plants’ range.
Because of these developments and the need to maximize knowledge derived from plant genome data, “we thought that this is the right moment to adopt and develop [a] Plant Reactome proposal to have a set resource where we can put together these interaction networks, and invite the community to start working with us on collaborative aspects, contributing the data, and making sure the data quality is good,” Jaiswal said.
To start with, the Plant Reactome team will work on bringing together data from Arabidopsis, rice, and corn because these have the “most mature metabolic network” data, some of which was curated in earlier Gramene projects.
“Whatever we have curated in the metabolic networks, in the current grants, we will take those and import it into the Reactome framework using the community standard data exchange formats,” Jaiswal said.
“From that point onwards, we will build on bringing in regulatory networks for these three species and work with our collaborators on extending it to find the novel interactions, expression datasets [that show] which … genes in these networks are expressed at a given time or a given growth stage or under a given treatment,” he continued.
The developers plan to release a version of Reactome for rice first, followed by one for Arabidopsis and then corn, Jaiswal said.
The rice version should be available later this year or early in 2013, he said.
Other activities planned under this round of funding include adopting the European Bioinformatics Institute’s Gene Expression Atlas to help display and analyze gene expression profiles. The investigators also plan to incorporate plant data and software from Ensembl and BioMart as well as a GWAS viewer.
Another institution that will participate in the project is the American Society of Plant Biologists, which will focus on developing mechanisms to facilitate the integration of data objects in Gramene with associated articles that have been published in ASPB’s journals — Plant Cell and Plant Physiology.
In a statement, Crispin Taylor, ASPB’s executive director, described ASPB’s portion of the project as “novel” for the plant research community.
He said that “in addition to providing important new venues for the discoverability of information in journal articles and databases in the plant sciences,” it “ought to serve as a compelling example for other disciplines.”