NEW YORK (GenomeWeb News) – The practice of releasing large reference genomics data sets quickly, before publication, and making them widely available can be "profoundly valuable to the scientific enterprise," and a policy of pre-publication data release should be adopted by researchers in similar fields, according to a statement from a recent Data Release Workshop hosted by Genome Canada and other funding agencies.
The "Toronto Statement" was generated by a group of scientists, ethicists, lawyers, and editors who met in May 2009, and was published this week as an op-ed in the journal Nature. The authors included Eric Green, a senior investigator in the Genome Technology Branch at the National Human Genome Research Institute, among others.
The group from the Toronto International Data Release Workshop argues that rapid, pre-publication release of data should be encouraged for projects that generate data that is large in scale and broad in utility, that create reference data sets, and that have community buy-in.
The data release policy should "go beyond genomics and proteomics studies to other data sets – including chemical structure, metabolomic and RNAi data sets," and to clinical resources such as cohorts, tissue banks, and case-control studies, according to the op-ed.
More specifically, the group advised that pre-publication data-release policies should cover whole-genome or mRNA sequences of a reference organism or tissue; genome-wide association analyses of thousands of samples; whole-genome sequences of microbial communities in different environments; whole-genome expression profiles from a large panel of reference samples; mass spectrometry data sets from large panels of normal and diseased tissues; large-scale cataloguing of 3D structures of proteins or compounds; and others.
The group also advised that funders should adopt an optional pre-publication release policy for studies that include genotyping of selected gene candidates, gene variant studies, mass spectrometry from limited data sets, and similar projects that are more focused and smaller in scale.
Funding agencies, the Toronto group advised, should take several steps when adopting such a policy. They should explicitly inform applicants of their data-release requirements, particularly any mandatory pre-publication data release requirement. They also should evaluate applicants' data-release plans, establish plans and timelines for projects engaging in pre-publication release, help develop appropriate consent, security, access, and other measures to protect participants, and provide long-term database support.
Researchers should state their intentions for their data and should inform potential data users about what information will be generated and how it will be analyzed, should provide relevant metadata such as questionnaires and environmental information, and should include all data, even raw data, in databases.
Researchers who use such pre-publication data should "respect the scientific etiquette that allows data producers to publish the first global analysis," should contact the creator of the data with any plans to publish, should cite the data producer, and should ensure that their use of the data is ethical and does not harm research participants.
"The rapid pre-publication release of sequencing data has served the field of genomics well," the authors noted, and they acknowledge that "policies for pre-publication release of data need to evolve with the changing research landscape, that there is a range of opinion within the scientific community, and that community behavior (as opposed to intentions) needs to be reviewed on a regular basis."