NEW YORK (GenomeWeb News) – In a paper scheduled to appear online this week in the Proceedings of the National Academy of Sciences, a team of researchers from Brigham Young University suggests that DNA barcoding may inadvertently over-estimate species diversity by amplifying pseudogenes in the nucleus.
The researchers warned that nuclear mitochondrial pseudogenes, which were rife in the four grasshopper and four cave crayfish species they tested, could lead to species misidentification or over-estimates during DNA barcoding. Those involved in the International Barcode of Life, or iBOL, project counter that pseudogenes are well-known and are already being addressed.
“We think it’s a pretty reasonable idea to be able to identify species with genetic tags,” senior author Keith Crandall, a biologist at BYU, told GenomeWeb Daily News. But, he said, pseudogenes could wreak havoc with barcoding efforts if researchers don’t take adequate steps to account for them. “If you weren’t watching for such things, [pseudogenes] would pose some problems for DNA barcoding.”
“Sadly, the authors of this paper do not understand barcoding protocols,” Paul Hebert, director of the Biodiversity Institute of Ontario at the University of Guelph, told GenomeWeb Daily News. Calling the title of the paper misleading, he said barcoders have been aware of nuclear pseudogenes for years and have already designed some strategies for dealing with the problems described in the paper.
DNA barcoding, in which a standard genetic sequence is amplified to catalog and identify species, has been touted as a tool for everything from exploring biodiversity to tracking poachers to ensuring food safety.
But because barcoding relies on mitochondrial sequence, it may be confounded by nuclear mitochondrial pseudogenes or “numts” — chunks of mitochondrial DNA that are inserted into the nuclear genome, Crandall explained. The function of the numts, if any, is unknown.
Crandall and his team encountered numts during a phylogeography project on cave crayfish, during which they sequenced small stretches of DNA using primers designed to anneal to mitochondrial DNA. They came up with a large number of numts — as did another research team working on grasshoppers.
The two groups decided to collaborate, characterizing the numts in these species. They found that many of the grasshopper and crayfish species contained one or several numts that could be co-amplified along with mitochondrial DNA. Most, but not all, of these numts carried tell-tale signals such as indels, in-frame stop codons, and certain nucleotides.
The paper went on to explore the potential pitfalls of numts for DNA barcoding and described approaches for minimizing these problems: for example, focusing on mitochondria-rich tissues, amplifying longer sequences, or looking at additional markers that could decrease the risk of barcoding numts.
“Whereas DNA barcoding strives for rapid and inexpensive generation of molecular tags, we demonstrate that the presence of [cytochrome c oxidase I] numts makes this goal difficult to achieve when numts are prevalent and can introduce serious ambiguity into DNA barcoding,” the authors wrote.
“Given that pseudogenes were reported 25 years ago, it’s not new news to us,” Hebert said. He said the team focused on species in which numts are particularly common and drew conclusions based on these eight species. Barcoding projects such as iBOL, he said, include data from thousands of species and are carried out using methods that differ from those described in the paper.
Hebert emphasized that the Barcode of Life Data Systems, or BOLD, database scours sequences for indels, stop codons, and other tell-tale pseudogene signs. Barcoding sequences are also screened against a pool of sequences representing known contaminants, he said. Sequences that raise red flags are then set aside for further assessment, including longer sequence analysis or RT-PCR.
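For readers curious what such a screen might look like in practice, the following is a minimal Python sketch of the kind of check described above. It is not BOLD's actual pipeline, which the article does not detail. The function names, the `expected_length` parameter, and the choice of reading frame are illustrative assumptions. The sketch flags a candidate barcode if its length is off by a number of bases that is not a multiple of three (a hint of a frame-shifting indel) or if it contains an in-frame stop codon under the invertebrate mitochondrial genetic code (NCBI translation table 5, in which TAA and TAG are stops).

```python
# Illustrative sketch only; names and thresholds are assumptions, not BOLD's method.

# Stop codons under the invertebrate mitochondrial genetic code (NCBI table 5).
INVERT_MITO_STOPS = {"TAA", "TAG"}


def in_frame_stops(seq, frame=0):
    """Return 0-based positions of stop codons in the given reading frame."""
    seq = seq.upper().replace("-", "")  # drop alignment gaps, if any
    hits = []
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in INVERT_MITO_STOPS:
            hits.append(i)
    return hits


def numt_flags(seq, expected_length, frame=0):
    """Return a list of reasons a sequence looks like a possible pseudogene.

    expected_length is a hypothetical parameter: the length of the
    barcode fragment the primers are supposed to produce.
    """
    clean = seq.upper().replace("-", "")
    flags = []
    # A length difference that is not a multiple of 3 suggests a frameshift indel.
    if (len(clean) - expected_length) % 3 != 0:
        flags.append("frameshift_length")
    # The barcode region is internal to the gene, so no stop codon is expected.
    if in_frame_stops(clean, frame):
        flags.append("in_frame_stop")
    return flags
```

A sequence that returns an empty list has passed only these two checks. As the BYU team noted, some numts carry none of these signals, so a clean result does not rule out a pseudogene.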
And, he noted, large barcoding studies typically amalgamate DNA barcode data with information provided by taxonomy, morphology, ecology, and other biological measures. “We’ve never advocated that sequence information alone is declarative for species boundaries,” he said.
“In certain taxonomic groups we need to do better in adding to the standard barcoding,” Hebert admitted. He said barcoders are working to come up with new solutions for analyzing pseudogene-prone groups. Some options include weeding out pseudogenes by using nuclear histones as a marker for nuclear DNA or developing new primer sets for pseudogene-rich species. Alternatively, he said, it may eventually be simpler to tease nuclear and mitochondrial DNA apart with commercial kits.
But, Hebert emphasized, pseudogenes are not a concern in the majority of species, especially those in the most diverse species groups.
While he is confident that iBOL is taking adequate steps to address numts, Hebert is concerned that individuals who naively read the PNAS paper might mischaracterize iBOL’s work plan. He and others are currently preparing a response to the paper.
Overall, though, Hebert said such criticisms will ultimately serve to make the program stronger. “Barcoding has weathered many, many other assaults,” he said. “We’re ready to weather this storm, too.”
For his part, Crandall conceded that large barcoding projects such as iBOL “have excellent strategies for quality control of data” and are already applying many of the steps he and his colleagues recommended. Still, he said, the fact that some people are already worrying about numts does not mean everyone in the field is addressing the problems appropriately.
“The message is relevant for everybody doing barcoding,” Crandall emphasized. In particular, he noted, the paper should be a wake-up call for those who are attempting to do DNA barcoding projects that rely on static databases such as GenBank that do not use the same sort of pseudogene screening methods employed by BOLD.
At press time, the PNAS paper had not yet been added to the journal's website.