Mitchell Sogin of the Marine Biological Laboratory in Woods Hole, Mass., found himself in the spotlight two weeks ago when an article in Science detailed a dispute involving raw Giardia lamblia sequence data generated in his lab.
Sogin claimed that Hyman Hartman and Alexei Federov of MIT and Harvard, respectively, used the data posted on his website (www.mbl.edu/Giardia) to publish a paper in the Proceedings of the National Academy of Sciences on eukaryote evolution. While Hartman didn’t deny using Sogin’s data along with several other publicly available data sets, he told Science that the Giardia data was only used as a “filter” to delete the proteins of more primitive organisms.
In protest, Sogin shut down the website for two weeks.
The website is back online, with the revamped data release policy prominently displayed on the home page as well as on any downloaded files. “It’s not our intention to limit publication of a handful of genes,” Sogin told BioInform. “We just didn’t expect people to download the whole genome as part of an evolutionary biology study.”
While it’s common practice to conduct research and even publish on small bits of genomic information downloaded from sequencing centers, scientific etiquette has dictated that scientists check with the sequencing center before publishing work based on its data. But with more and more of this data now accessible online, ease of availability has lowered some of the barriers to whole-genome data grabbing that existed in the past, raising concerns that cases like Sogin’s won’t remain isolated incidents.
Free access to pre-published sequence data has been a tenet of the genome community since 1996, when attendees of the First International Strategy Meeting on Human Genome Sequencing held in Bermuda drafted the “Bermuda rules” to ensure rapid access to primary sequence data. But the tide seems to be shifting a bit as genome sequencing enters the realm of everyday biological lab work.
“The human genome was a special event and the Bermuda rules were completely appropriate for that,” said Sean Eddy of Washington University. “But there’s a lot of projects now where an organism is going to be sequenced for a particular reason because it’s got very interesting biology and the lab that’s sequencing it is actually very interested in that biology.” In such cases, Eddy suggested, sequencers ought to be given “a clear field” to study their organism.
Furthermore, Sogin added, this issue may be just “a passing stage in the evolution of genomic science.” As the pace of genomic sequencing increases and the technology becomes less novel, there will be less of a need to release data before publication, he predicted.
Noting that he’s received “quite a number of supportive statements” from his peers since his story made news, Sogin said that despite the lack of enforcement of data access policies in the field, “most of the scientific community has behaved in a respectable fashion … I do still believe that we should be releasing data and making it available to the community.”