Plant breeding research firm KeyGene and genomic services provider Amplicon Express have teamed up for a new service designed to provide a sequence-based bacterial artificial chromosome map of plants and animals with the initial goal of helping whole-genome de novo sequencing projects.
The new method used in the service, called Whole Genome Profiling, relies on Amplicon Express' expertise with BAC libraries and pooling strategies, and KeyGene's sequencing services on Illumina's Genome Analyzer and internal software tools.
So far, the companies have tested their patent-pending method in-house on the 120-megabase Arabidopsis genome and the 450-megabase melon genome, and are currently applying it to an undisclosed 2.6-gigabase plant genome, their first customer project for the service.
Presented earlier this month at the Plant and Animal Genome conference in San Diego, the new service grew out of collaborations between the two companies on several genome projects that started more than two years ago.
Amplicon's BAC-related services and KeyGene's high-throughput sequencing capabilities "made a nice fit" and provided Amplicon's customers with access to second-generation sequencing, according to Amplicon CEO Robert Bogden.
"Rather than making a big investment in that kind of instrumentation and people, it made beautiful sense to go right to KeyGene," which can also offer other profiling as well as breeding services (see In Sequence 1/23/2007) to Amplicon's plant clients, Bogden said.
KeyGene, on the other hand, stands to benefit from Amplicon's experience with BAC-related technology. KeyGene currently has one Genome Analyzer in-house but expects to add a second instrument as the new service gains traction, according to Michiel van Eijk, KeyGene's vice president of upstream research.
To construct a sequence-based BAC map under their service, KeyGene and Amplicon scientists first generate a 10x BAC library, pool individual BAC clones in a multi-dimensional format, prepare restriction fragment libraries from BAC pool DNA, and sequence the ends of the restriction fragments on the Genome Analyzer. They then assemble overlapping BAC clones based on the sequence information, which consists of 30-base pair reads every 2 to 3 kilobases across each BAC clone.
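The assembly step described above can be illustrated with a toy sketch. The companies have not published their software, so everything here is an assumption made for illustration: the function name `build_contigs`, the `min_shared` threshold, and the rule that two BACs overlap when they share a minimum number of identical sequence tags. It shows only the general idea of linking clones through shared short reads, not the Whole Genome Profiling pipeline itself.

```python
from itertools import combinations


def build_contigs(bac_tags, min_shared=2):
    """Group BACs into contigs (hypothetical rule, not the vendors' algorithm).

    bac_tags maps a BAC name to the set of short sequence tags found on it.
    Two BACs are linked when they share at least min_shared tags, and each
    contig is a connected group of linked BACs.
    """
    parent = {bac: bac for bac in bac_tags}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(bac_tags, 2):
        if len(bac_tags[a] & bac_tags[b]) >= min_shared:
            parent[find(a)] = find(b)

    contigs = {}
    for bac in bac_tags:
        contigs.setdefault(find(bac), set()).add(bac)
    return sorted(sorted(members) for members in contigs.values())


# Made-up tags: A overlaps B, B overlaps C, and D stands alone.
example = {
    "BAC_A": {"t1", "t2", "t3", "t4"},
    "BAC_B": {"t3", "t4", "t5", "t6"},
    "BAC_C": {"t5", "t6", "t7"},
    "BAC_D": {"t9", "t10"},
}
contigs = build_contigs(example)
print(contigs)  # [['BAC_A', 'BAC_B', 'BAC_C'], ['BAC_D']]
```

A real pipeline would also have to deconvolute the pooled reads back to individual clones and cope with repeated sequences and read errors, none of which this sketch attempts.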
The cost for the service will range from $90,000 for small genomes like Arabidopsis to $250,000 for mid-sized genomes similar in size to the melon's, and $1.9 million for a genome the size of the human's. The service will include everything from BAC library generation to data assembly and analysis. Customers can request the service from either company.
Compared to a traditional physical BAC map, or snapshot map, which puts BACs in order based on their restriction fragment fingerprints, the sequence-based approach provides not only the order of the BACs but also sequence information, which can be added directly to whole-genome shotgun-sequencing data.
In addition, although both types of maps cost roughly the same, a sequence-based map is more accurate than a restriction fragment map, which is based on fragment length and mobility rather than sequence, according to van Eijk.
"It's a better-quality product for a comparable price," he said.
Another profiling method, Sanger-based end-sequencing of a BAC library, costs as much as $2.4 million, based on a human BAC library with 300,000 clones and a cost of $4 per read, according to the firms. That figure does not include library preparation, pooling, or assembly.
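The $2.4 million figure follows from simple arithmetic if each clone is read once from each of its two ends, which is an assumption here, since the firms' statement gives only the clone count and the per-read cost. A minimal sketch, with a hypothetical function name:

```python
def bac_end_sequencing_cost(num_clones, cost_per_read, reads_per_clone=2):
    """Sequencing cost alone; library prep, pooling, and assembly are excluded.

    reads_per_clone=2 assumes one read from each end of every clone.
    """
    return num_clones * reads_per_clone * cost_per_read


cost = bac_end_sequencing_cost(num_clones=300_000, cost_per_read=4)
print(f"${cost:,}")  # $2,400,000
```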
Their whole-genome-profiling service is less expensive and provides higher resolution because it generates sequence data across the entire length of a BAC instead of only the ends, according to KeyGene and Amplicon.
To validate their new approach, the companies assembled BACs from the 120-megabase Arabidopsis genome that they sequenced in a single Genome Analyzer run last summer. According to van Eijk, they were able to cover 99 percent of the genome with a maximum gap of 125 kilobases between two clones, corresponding to the length of a single BAC.
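For a sense of scale, the article's figures can be combined into a rough estimate of library size and tag density. This is a back-of-the-envelope sketch, not data from the companies: it assumes every library uses the 10x coverage and the 125-kilobase BAC length mentioned in the article, and the function names are hypothetical.

```python
def clones_needed(genome_bp, coverage, insert_bp):
    """Clones required to reach the given coverage, rounded up."""
    total_bp = genome_bp * coverage
    return -(-total_bp // insert_bp)  # ceiling division on integers


def tags_per_bac(insert_bp, spacing_bp):
    """Approximate number of sequence tags on one BAC at a given spacing."""
    return insert_bp // spacing_bp


BAC_BP = 125_000  # assumed uniform BAC length
for name, size_bp in [("Arabidopsis", 120_000_000),
                      ("melon", 450_000_000),
                      ("2.6-gigabase plant", 2_600_000_000)]:
    print(name, clones_needed(size_bp, 10, BAC_BP))
# Arabidopsis 9600
# melon 36000
# 2.6-gigabase plant 208000

# One 30-base read every 3 kb or every 2 kb along a 125-kb clone:
print(tags_per_bac(BAC_BP, 3_000), tags_per_bac(BAC_BP, 2_000))  # 41 62
```

Under these assumptions, each clone would carry roughly 40 to 60 tags, which is the information available for detecting overlaps between neighboring BACs.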
Encouraged by these results, the partners last fall generated a sequence-based BAC map of the 450-megabase melon genome, a plant "that we are interested in from a commercial point of view," according to van Eijk. The resulting map had 670 contigs and is now serving as a scaffold for an internal melon genome-sequencing project that is still in progress.
To see if they can apply the method to even larger genomes, the companies are currently constructing a BAC map for an undisclosed 2.6-gigabase plant genome, a customer project, which they plan to complete in a couple of months. Initial results from this project are better than expected, van Eijk said.
"We expected some loss of performance for complex genomes, but so far we have not seen that," he said.
According to Bogden, a sequence-based BAC map could be a starting point for new whole-genome sequencing projects, especially if the researchers want to be able to pull out specific clones for further research.
"For people who want to do research, as soon as they analyze the data, they say, 'I want to play with this piece of DNA,' and then they need to have a resource," he said. However, retrieving a specific clone in this way is not possible with approaches that avoid clone-based libraries and rely entirely on shotgun libraries sequenced by second-generation sequencing technologies.
Also, it is still not possible to assemble certain genomes with a clone-free approach, according to van Eijk. "For many genomes, that is still not feasible, especially for the complex genomes such as crop species that have genome sizes that are even bigger than human genomes," he said.
The new method could also help rescue projects in which researchers have obtained sequence reads that do not assemble, called orphans, from a second-generation sequencer, although the companies have not yet applied it to such a project. Researchers can use the BAC map data "to help to organize their orphans and build BAC contigs and obtain a superstructure for an existing whole-genome sequencing project," Bogden said.
In the future, sequence-based BAC maps could also be used to study structural variations between genomes, such as tumor genomes, at high resolution; or to generate high-resolution synteny maps between closely related species, according to van Eijk.
"These are applications that we have not yet carried out, but they are logical extensions of the concept," he said.
Researchers participating in several ongoing plant and animal genome projects believe the Whole Genome Profiling service is interesting, but caution that they have not yet seen any detailed results. Also, many of them have already generated physical BAC maps based on restriction fingerprinting and are not planning to add another one.
"WGP seems an interesting technology for physical map construction, but I would like to have more details about it," said Jordi Garcia-Mas, a scientist at the Institute of Agro-Food Research and Technology in Spain.
Garcia-Mas is part of a Spanish public-private melon genome sequencing project that uses the 454 sequencing technology and is unrelated to KeyGene's effort. That project has already obtained a BAC fingerprinting map and is not planning to try WGP, "unless we had serious problems with our current approach," he said. However, "we would be interested in using it in the future, but it mainly depends on the cost of this technology," he added.
Likewise, the International Sheep Genomics Consortium has already end-sequenced a BAC library and has generated a physical map of the sheep genome based on these data, according to John McEwan, a scientist at AgResearch in New Zealand who is part of this project.
"However, that said, we are looking to create a reference genome for sheep and will explore all options to fill any likely gaps we may have in that exercise," he said. "If we feel the assembled physical map resources are insufficient, we will look for the best way to solve the problem."