Controversial Publication of Arabidopsis Papers Sparks Debate Over Use of Public Data


Science’s decision to publish two papers on Arabidopsis on December 15, just one day after the main sequencing paper was published in Nature, has stirred up a debate over the appropriate uses of public genomic data.

Some Arabidopsis researchers said they think it is unethical for scientists not involved in the sequencing project to use the publicly available data for whole genome analyses before the genome has been fully sequenced and published.

Others, however, disagreed, saying that researchers should be free to do what they want with publicly available data so long as they don’t publish their findings before the main sequencing paper is printed.

If there is a consensus, it is that more discussion is needed.

“I think it’s a question that deserves some serious dialogue among scientists and journals,” said Dick McCombie, associate professor at Cold Spring Harbor Laboratory.

The two Science papers, “Arabidopsis Transcription Factors” by lead author J.L. Riechmann of Mendel Biotechnology and “The Origins of Genomic Duplications in Arabidopsis” by lead author Todd Vision of the US Department of Agriculture’s Agricultural Research Service, presented whole genome analyses based on data made publicly available before the plant was fully sequenced. Neither Riechmann nor Vision was part of the public sequencing project.

The papers followed the December 14 paper in Nature by the Arabidopsis Genome Initiative, the international public consortium that sequenced the mustard plant. The US part of the effort included the Institute for Genomic Research, Cold Spring Harbor, and groups of academic researchers.

Steven Salzberg, TIGR’s senior director of bioinformatics, objected to the publication of the papers in Science because both papers did genome-wide analyses on draft data that had not been peer-reviewed.

He said such papers should not be published because they could contain errors stemming from the unfinished nature of the data and because publishing them takes away some of the incentive for scientists to participate in genome sequencing efforts. He said scientists used to wait until a project was done before doing such broader analyses.

“Why would I spend years of my life generating genomic data if all I have to do is sit in my office and download someone else’s and I can write the same paper?” Salzberg asked.

However, some of his colleagues on the Arabidopsis project and at TIGR offered different views.

Claire Fraser, TIGR’s president and director, agreed that this is a “thorny issue” and the papers could be viewed as violating the spirit of the rules governing the public data. But, she added, the genome project “certainly wasn’t scooped” by the papers because they came out the next day. And much of the genome had been publicly available for months.

“We can’t start splitting hairs now and say a day is not long enough,” she said, adding that she would have felt very differently if the papers had appeared a week before.

McCombie added that he did not think using the data was unethical in any way, but acknowledged the need to give sequencers incentives to continue making data public.

“We put the data out there to make it publicly available and for people to use it,” said McCombie. “On the other hand, I think there is a concern that the system [should] not work in such a way to be a disincentive for people to do what we do, which is to make the data available.”

He said that these questions are going to become even more important as more genomes are sequenced.

McCombie rejected the idea that it is fine for scientists to do single gene analyses but not whole genome analyses unless they are working on the sequencing.

“Looking at things one gene at a time is a thing of the past. The real exciting biology is going to come from looking at whole genomes,” said McCombie, adding that the real story is why there weren’t more papers in Science since so much of the data had been in the public domain.

Barbara Jasny, Science’s supervisory senior editor, said that she is sympathetic to both sides of the debate but added that Science and the researchers who wrote the papers didn’t do anything wrong because the data was available to scientists without restrictions.

Jasny said that Science does not have a particular position on the issue but rather follows the guidelines that the scientific community establishes. She said that Science hopes more discussion will take place.

“Policies in this area are evolving. I think we’re all waiting to see what the next steps will be,” said Jasny.

Jasny referred to a letter, published in Science’s September 15 issue and intended to foster debate on these issues, by Lee Rowen and Lee Hood of the Institute for Systems Biology and Gane Wong and Robert Lane of the University of Washington.

—Matthew Dougherty
