
Controversial Publication of Arabidopsis Papers Sparks Debate Over Use of Public Data


Science’s decision to publish two papers on Arabidopsis on December 15, just one day after the main sequencing paper was published in Nature, has stirred up a debate over the appropriate uses of public genomic data.

Some Arabidopsis researchers said it is unethical for scientists outside the sequencing project to use the publicly available data for whole-genome analyses before the genome has been fully sequenced and published.

Others disagreed, saying that researchers should be free to do what they want with publicly available data so long as they don’t publish their findings before the main sequencing paper is printed.

If there is a consensus, it is that more discussion is needed.

“I think it’s a question that deserves some serious dialogue among scientists and journals,” said Dick McCombie, associate professor at Cold Spring Harbor Laboratory.

The two Science papers, “Arabidopsis Transcription Factors” by lead author J.L. Riechmann of Mendel Biotechnology and “The Origins of Genomic Duplications in Arabidopsis” by lead author Todd Vision of the US Department of Agriculture’s Agricultural Research Service, performed whole-genome analyses based on data made publicly available before the plant was fully sequenced. Neither Riechmann nor Vision was part of the public sequencing project.

The papers followed the December 14 paper in Nature by the Arabidopsis Genome Initiative, the international public consortium that sequenced the mustard plant. The US part of the effort included The Institute for Genomic Research (TIGR), Cold Spring Harbor Laboratory, and groups of academic researchers.

Steven Salzberg, TIGR’s senior director of bioinformatics, objected to the publication of the papers in Science because both performed genome-wide analyses on draft data that had not been peer-reviewed.

He said such papers should not be published, both because they could contain errors arising from the unfinished state of the data and because they take away some of the incentive for scientists to participate in genome sequencing efforts. He said scientists used to wait until a project was finished before undertaking such broader analyses.

“Why would I spend years of my life generating genomic data if all I have to do is sit in my office and download someone else’s and I can write the same paper?” Salzberg asked.

However, some of his colleagues on the Arabidopsis project and at TIGR offered different views.

Claire Fraser, TIGR’s president and director, agreed that this is a “thorny issue” and that the papers could be viewed as violating the spirit of the rules governing the public data. But, she added, the genome project “certainly wasn’t scooped” by the papers, because they came out the next day, and much of the genome had been publicly available for months.

“We can’t start splitting hairs now and say a day is not long enough,” she said, adding that she would have felt very differently if the papers had appeared a week before.

McCombie added that he did not think using the data was unethical in any way, but acknowledged the need to give sequencers incentives to continue making data public.

“We put the data out there to make it publicly available and for people to use it,” said McCombie. “On the other hand, I think there is a concern that the system [should] not work in such a way to be a disincentive for people to do what we do, which is to make the data available.”

He said that these questions are going to become even more important as more genomes are sequenced.

McCombie rejected the idea that scientists who are not working on the sequencing should be free to do single-gene analyses but not whole-genome analyses.

“Looking at things one gene at a time is a thing of the past. The real exciting biology is going to come from looking at whole genomes,” said McCombie, adding that the real story is why there weren’t more papers in Science since so much of the data had been in the public domain.

Barbara Jasny, Science’s supervisory senior editor, said that she is sympathetic to both sides of the debate but added that Science and the researchers who wrote the papers didn’t do anything wrong because the data was available to scientists without restrictions.

Jasny said that Science does not have a particular position on the issue but rather follows the guidelines that the scientific community establishes. She said that Science hopes more discussion will take place.

“Policies in this area are evolving. I think we’re all waiting to see what the next steps will be,” said Jasny.

Jasny referred to a letter by Lee Rowen and Lee Hood of the Institute for Systems Biology and Gane Wong and Robert Lane of the University of Washington that Science published in its September 15 issue, which was intended to foster debate on these issues.

—Matthew Dougherty
