NEW YORK (GenomeWeb News) – A pair of policy papers appearing in today's issue of Science are tackling the problems of 'omics data sharing and genome project classifications.
In the first of these articles, an international group of genomics researchers addressed challenges related to integrating and comparing large-scale data on DNA sequence, gene expression, and other processes.
The team proposed a set of guidelines for data sharing in genomic, transcriptomic, metabolomic, and proteomic research, touching on everything from the role of funding agencies to tools for sharing such information electronically.
For instance, the team suggested that researchers should be required to submit data-sharing plans in conjunction with grant applications. They also called for international formatting standards and rules mandating the submission of supporting or complete data to suitable databases after 'omics papers are published.
"We recommend that single, brief, high-level consensus guidelines serve as a template for policy documents at the funder, community, and project levels," the team wrote. "At its heart should be the public and timely release of data. It should be based on the principle that funders and the research community must work together to develop best practice."
For its part, the group has developed a site called BioSharing Web as a means to centralize 'omics data, standards, and policy information.
Meanwhile, in a separate policy paper, researchers from several large sequencing centers and elsewhere argued in favor of overhauling the way genome sequencing projects are reported and classified.
With sequencing speed increasing and cost decreasing, some have estimated that public databases will house 12,000 draft genomes by 2012. But because the quality and completion of these genomes vary dramatically, the authors suggest new standards are needed to classify draft and finished genomes, and everything in between.
"Exponential leaps in raw sequencing capability and greatly reduced prices have further skewed the time- and cost-ratios of draft data generation versus the painstaking process of improving and finishing a genome," the authors wrote. "The result is an ever-widening gap between drafted and finished genomes that only promises to continue; hence, there is an urgent need to distinguish good from poor data sets."
To combat problems and confusion in the future, the researchers suggested four categories between "standard draft" and "finished" genomes: "high quality draft" genomes, "improved high quality draft" genomes, "annotation directed improvement" genomes, and "non-contiguous finished" genomes.
"In the past we've been limited to two options, requiring us and the other centers to come up with internal definitions," lead author Patrick Chain, a metagenomics researcher at the US Department of Energy Joint Genome Institute, said in a statement. "But these are not clear and they're not propagated to the databases to which we submit sequences. So when users try to download genomes they get data of unknown quality with no information, or a complete genome that they assume has been checked for missing-data errors."
The genome classification categories may not all apply to individual genome sequencing centers, senior author Chris Detter, also with JGI, said in a statement. But whatever categories do apply will benefit other members of the research community in the future, he said. "[M]y hope is that the smaller genomics groups adopt the classes as written to help the rest of the scientific community know what they are generating and submitting."
The genome project standards group, which includes members of both large and small sequencing centers, spun out of a 2005 meeting at the Los Alamos National Laboratory, Chain noted. The group is reportedly discussing its proposed genome project standards with public databases housing genomic data. Its members also plan to join forces with the Genomic Standards Consortium, a group of scientists focused on developing data collection standards for genome projects.