An international consortium of researchers said it plans to increase by more than 10-fold the catalog of eukaryotic species that are tagged by a DNA barcode, and to develop new barcoding technology to identify specimens rapidly and inexpensively.
The first phase of the project, which will generate a library of barcoded species, will largely involve Sanger sequencing technology, according to one of the organizers. Second-generation sequencing technologies will find applications in environmental barcoding studies later on, and a long-term goal of the project is to develop a hand-held barcoding sequencer.
The initiative, called the International Barcode of Life Project, or iBOL, currently involves 26 countries. Planning for iBOL started a year ago at a workshop at the University of Guelph in Canada that brought together a variety of international researchers with a shared interest in barcoding.
Barcoding involves sequencing a short, standardized gene region that differs between species. In animals, for example, researchers use a portion of the mitochondrial cytochrome c oxidase I gene as a barcode.
iBOL’s first aim is to create a reference library by barcoding 5 million specimens representing 500,000 species within five years. This project, set to begin next year, will significantly expand the current library, which comprises approximately 41,000 barcoded species.
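As a rough check on the scale of these targets, the short sketch below works through the arithmetic using only the figures reported here. The variable names are illustrative, and the current library size is the approximate value quoted above.

```python
# Figures as reported in the article (current library size is approximate).
current_species = 41_000
target_species = 500_000
target_specimens = 5_000_000
years = 5

# How much larger the species catalog would become.
fold_increase = target_species / current_species
# Average number of specimens barcoded per species.
specimens_per_species = target_specimens / target_species
# Average yearly throughput needed to finish on schedule.
specimens_per_year = target_specimens / years

print(f"Fold increase in barcoded species: {fold_increase:.1f}")
print(f"Specimens per species: {specimens_per_species:.0f}")
print(f"Specimens per year: {specimens_per_year:,.0f}")
```

On these numbers the library would grow roughly 12-fold, with an average of 10 specimens per species and about 1 million specimens barcoded per year.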
What distinguishes this project most from other large-scale genomic projects is that each sequence read derives from a different sample, according to Paul Hebert, director of the Biodiversity Institute of Ontario at the University of Guelph and an iBOL organizer.
To generate a 650-base read for each of hundreds of thousands of samples, “Sanger sequencing technology is really the only feasible way to go,” he said.
His institute, which opened in May 2007, includes a core facility dedicated to barcoding called the Canadian Centre for DNA Barcoding.
Equipped with three Applied Biosystems 3730xl sequencers that are assembled with other instrumentation into a semi-automated production line, the center — currently the largest barcoding facility in the world, according to Hebert — is capable of producing 1 million reads per year.
International collaborators either visit the center and bring new samples, or send tissue samples by mail for analysis along with images and other information about the specimens, said Hebert. All information about a species goes into the Barcode of Life Data System, or BOLD, a barcoding data-repository and -analysis platform that is currently housed at CCDB. Sequence data from BOLD also eventually enters GenBank for long-term storage.
The CCDB owns a 454 sequencer as well, though Hebert said it will not use the instrument for the library project because it does not multiplex sufficiently. Rather, institute researchers use the 454 technology for environmental barcoding, or determining which species are contained in an environmental sample. That application, Hebert said, is similar to metagenomic sequencing, but it delivers species names since it focuses on the barcoding region.
The 454 technology is suitable for this application even though its reads do not cover the entire barcode, Hebert said. “The interesting thing that we have discovered is that you certainly don’t need the whole barcode read to gain an identification,” he said. “You can identify almost all species on the planet from just 100 base pairs of information.”
iBOL is currently focused on raising at least $100 million of its $155 million budget from various funding sources around the world, according to a project outline published by the consortium last month.
iBOL says it complements the Consortium for the Barcode of Life, an existing partnership of 150 biodiversity organizations. While CBOL focuses on setting standards and networking, iBOL sees its mission in generating actual barcoding data.
Participants in iBOL are organized in three tiers. Four so-called central nodes (in Canada, China, the European Union, and the US) are each scheduled to bear $25 million of the cost and to be mainly responsible for coordinating the project and supporting its core facilities and data repositories. In addition, nine other countries, or “regional nodes,” will each contribute $5 million for barcoding work, and seven countries, or “national nodes,” will provide $1 million each.
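The tier contributions can be tallied with a short sketch such as the one below. It uses only the node counts and per-node amounts reported here; the dictionary layout and variable names are illustrative assumptions, not part of the iBOL outline.

```python
# (number of nodes, contribution per node in millions of US dollars),
# as reported in the article.
tiers = {
    "central": (4, 25),
    "regional": (9, 5),
    "national": (7, 1),
}

# Total contribution per tier and across all tiers.
tier_totals = {name: count * amount for name, (count, amount) in tiers.items()}
pledged_total = sum(tier_totals.values())

# Overall budget reported in the project outline, in millions of US dollars.
budget = 155
shortfall = budget - pledged_total

for name, total in tier_totals.items():
    print(f"{name} nodes: ${total} million")
print(f"All tiers: ${pledged_total} million (budget: ${budget} million)")
```

As listed, the three tiers add up to $152 million, which is $3 million short of the $155 million budget. The project outline as reported here does not say how the difference would be covered.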
The majority of the funding, $92 million, is earmarked for barcoding the 5 million specimens. In particular, barcoding efforts will focus on vertebrates, land plants, fungi, human pathogens and vectors of pathogens, agricultural and forestry pests and their parasites, pollinators, freshwater organisms, marine species, butterflies, ants, earthworms, and polar species.
According to Hebert, data for this part of the project will probably be generated by a variety of facilities of different sizes. Several institutes, for example in the Netherlands, are currently planning to build dedicated barcoding sequencing core facilities similar to the CCDB, he said.
In addition, existing academic sequencing centers, such as Genoscope in France, as well as commercial contract sequencing facilities, will contribute data to iBOL.
“The final route is artisanal,” Hebert said, involving labs with a single medium-throughput capillary sequencer that might “generate a few 1,000 sequences” for iBOL.
Besides generating the barcode library, a total of $20 million will help develop protocols that will enable a later phase of iBOL, during which researchers want to barcode all known 10 million eukaryotic species. As a test, participants plan to barcode all eukaryotes found at single environmental sites. They also plan to barcode specimens kept in museums and in permafrost.
Another $13 million will be devoted to maintaining and scaling up the BOLD data system, and $10 million is budgeted for administering iBOL.
Lastly, $20 million is earmarked for new sequencing technologies that will play a role at a later stage once the library of barcodes is established.
In particular, second-generation sequencers will be used in “massive biodiversity scans” such as “blending a container full of insects and soon knowing the species composition of the slurry,” according to the iBOL report, which states that pilot studies using pyrosequencing technology — such as the University of Guelph’s 454 sequencer — are already underway, focusing on aquatic invertebrates. These projects will generate large amounts of data, which “will require both storage and analysis, and the floodwaters will only rise over iBOL’s lifespan,” according to the report.
The other aim is to develop a small, integrated DNA barcode reader — initially a tabletop instrument, followed by a handheld device later — that would enable, for example, customs officers at a border checkpoint or researchers in the field to identify species without having to send samples to a core facility.
Such an instrument would integrate all steps of the barcode analysis, according to Hebert, who said that “we are in the early stages of assembling a group of people on this mission,” including Microchip Biotechnologies, a Californian company that is working on a miniaturized Sanger sequencer, and a new, undisclosed company. “The technology is there, it just needs to be bound intelligently,” Hebert said.
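For reference, the budget lines reported above can be summed and expressed as shares of the total with a sketch like the following. The category labels are shorthand chosen for this example; the dollar amounts are the ones given in the article.

```python
# Budget lines in millions of US dollars, as reported in the article.
# The labels are shorthand for this example, not official category names.
allocations = {
    "specimen barcoding": 92,
    "protocol development": 20,
    "BOLD data system": 13,
    "administration": 10,
    "new sequencing technologies": 20,
}

total = sum(allocations.values())
# Each line as a percentage of the total budget.
shares = {name: 100 * amount / total for name, amount in allocations.items()}

for name, amount in allocations.items():
    print(f"{name}: ${amount} million ({shares[name]:.1f}%)")
print(f"Total: ${total} million")
```

The five lines sum to $155 million, matching the overall budget, with specimen barcoding accounting for a little under 60 percent of it.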