Sogin Case Raises Questions over Access to Pre-Publication Sequence Data


Mitchell Sogin of the Marine Biological Laboratory in Woods Hole, Mass., found himself in the spotlight two weeks ago when an article in Science detailed a dispute involving raw Giardia lamblia sequence data generated in his lab.

Sogin claimed that Hyman Hartman and Alexei Federov of MIT and Harvard, respectively, used the data posted on his website to publish a paper in the Proceedings of the National Academy of Sciences on eukaryote evolution. While Hartman didn’t deny using Sogin’s data along with several other publicly available data sets, he told Science that the Giardia data was used only as a “filter” to delete the proteins of more primitive organisms.

In protest, Sogin shut down his website for two weeks.

The website is back online, with the revamped data release policy prominently displayed on the home page as well as on any downloaded files. “It’s not our intention to limit publication of a handful of genes,” Sogin told BioInform. “We just didn’t expect people to download the whole genome as part of an evolutionary biology study.”

While it’s common practice to conduct research and even publish on small bits of genomic information downloaded from sequencing centers, scientific etiquette has dictated that scientists using such data check with the sequencing center before publishing work based on it. But with more and more of this data now accessible online, ease of availability has lowered some of the barriers that once stood in the way of whole-genome data grabbing, raising concerns that cases like Sogin’s won’t remain isolated incidents.

Free access to pre-published sequence data has been a tenet of the genome community since 1996, when attendees of the First International Strategy Meeting on Human Genome Sequencing held in Bermuda drafted the “Bermuda rules” to ensure rapid access to primary sequence data. But the tide seems to be shifting a bit as genome sequencing enters the realm of everyday biological lab work.

“The human genome was a special event and the Bermuda rules were completely appropriate for that,” said Sean Eddy of Washington University. “But there’s a lot of projects now where an organism is going to be sequenced for a particular reason because it’s got very interesting biology and the lab that’s sequencing it is actually very interested in that biology.” In such cases, Eddy suggested, sequencers ought to be given “a clear field” to study their organism.

Furthermore, Sogin added, this issue may be just “a passing stage in the evolution of genomic science.” As the pace of genomic sequencing increases and the technology becomes less novel, there will be less of a need to release data before publication, he predicted.

Noting that he’s received “quite a number of supportive statements” from his peers since his story made news, Sogin said that despite the lack of enforcement of data access policies in the field, “most of the scientific community has behaved in a respectable fashion … I do still believe that we should be releasing data and making it available to the community.”

— BT
