NEW YORK, May 16 – Authors of the annotated Drosophila genome published in Science in March 2000 have voiced doubts about Nature’s decision to publish a paper that highlights a number of errors in the initial release of the data.
The paper, which appears in today’s Nature, compares the annotation of the Drosophila genome by researchers from Celera Genomics and the Berkeley Drosophila Genome Project with known protein sequences in the SwissProt database. The comparison indicates that 45 percent of predicted proteins in the Celera/BDGP genome had sequence differences of more than 1 percent over the length of the SwissProt protein.
While the paper concludes that “Predictions based solely on the basis of statistical and homology methods may prove to be intrinsically inaccurate,” Susan Celniker, co-director of the BDGP and a co-author on the original Science paper, said, “the community absolutely knows that there are limitations to the programs that have been written to do an automated annotation and that they have to be curated.”
“We were surprised the paper got accepted in Nature,” Celniker said. “Certainly within the genome community, this is common knowledge.”
Samuel Karlin, lead author on the Nature paper, said he originally submitted it in April to Science, but the journal refused to publish it without a response from the original authors. Karlin said that Nature had it reviewed and “the reviewers were extremely positive.”
But the paper’s statement that “Individual sequences should continually be corrected and refined by multiple rounds of annotation backed up with experimental data before the Drosophila genome can be considered complete and accurate” is certainly not news to genomics researchers, who understand that the sequenced and annotated Drosophila genome is far from “finished,” Celniker said.
Celniker added that the BDGP’s GadFly annotation database offers researchers the opportunity to fix specific annotations, and that the BDGP has been collecting corrections with the intention of including them in a complete release scheduled for September.
But Karlin noted that while many researchers may be aware of the annotation problems in published sequences, the actual error rate is much higher than perceived. “I think the article, hopefully, serves the purpose of being a caveat, a warning that people have to be careful if they’re using it,” Karlin said.
Karlin said he set out to indicate the dangers of rushing publication of sequenced genomes. “Since they knew about the errors, they should have spent maybe another six months resolving these differences, but they wanted to get the genome published,” he said.
Karlin added that he has spoken to a number of Drosophila researchers, “and they’re sort of split. Half of them think it’s a very good thing to have the genome as early as they have it and they’ll worry about how to use it, and the other half are saying it’s a very good thing to have these cautionary articles to make them aware they have to be more careful.”
Celniker said that several BDGP researchers are drafting a response to the Nature paper.