When it comes to de novo sequencing, next-generation technologies have largely stayed within the realm of bacteria. But two recent, independent genome projects, led by teams in the US and in Italy, show that next-gen sequencing combined with Sanger sequencing can tackle eukaryotic genomes, despite the homopolymer runs and repetitive sequences these genomes tend to harbor.
“When all is said and done, our conclusion will be that [next-generation technologies] have a big contribution to make in eukaryotic hybrid assemblies,” said Stephen Kingsmore, president of the National Center for Genome Resources, who headed one of the projects.
Both projects, which sequenced the genomes of the Pinot Noir grapevine and the plant pathogen Phytophthora capsici, were presented at last week’s Plant and Animal Genome conference in San Diego. Each used a different blend of Sanger and 454 sequencing to build its assembly, and only the P. capsici project included paired 454 reads.
Sequencing of the 475-megabase grapevine (Vitis vinifera) genome was completed recently, and last month the researchers submitted it to the EMBL Nucleotide Sequence Database; 50 million of its bases are not yet assigned to chromosomes. Riccardo Velasco of the Istituto Agrario San Michele All’Adige (IASMA) in Trento, northern Italy, who led the effort, said the genome project started six years ago and that Sanger sequencing began about a year and a half ago.
However, “after 7x coverage with Sanger, we saw that to go ahead with traditional Sanger to fill in the gaps would have been very expensive and time-consuming,” he said.
Thus, about six months ago, “we tried some rounds of 454 sequencing, and it worked very well in filling up small gaps that were still present after assembling the sequence,” Velasco said.
Velasco heads the department of genetics and molecular biology at IASMA, located in Trento in the Dolomite region of the Alps, an area well known for its vineyards and orchards. The grapevine genome project, funded with €10 million ($13 million) from the province of Trento, was a collaboration between IASMA; Myriad Genetics in Salt Lake City, which performed most of the Sanger sequencing and provided bioinformatics software; and 454 Life Sciences, which sequenced samples at its service center, producing reads that averaged 200 bases.
In the end, the researchers settled for 7x coverage with Sanger sequencing and 4.2x coverage with 454 sequencing. Short homopolymer stretches and longer repetitive regions, both characteristic of eukaryotic genomes, were a challenge for 454 sequencing alone but could be overcome by combining the two technologies, Velasco said.
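Fold coverage is simply the total number of sequenced bases divided by the genome size, so the grapevine figures can be turned into rough data volumes. The sketch below does that arithmetic for the 475-megabase genome; it assumes that the 200-base average read length quoted for 454's service center applies to all of the 454 data and ignores trimming and quality filtering, and it does not estimate a Sanger read count because no Sanger read length is given. The function names are illustrative, not taken from any project software.

```python
def bases_sequenced(genome_size_bp: int, coverage: float) -> int:
    """Total bases implied by a fold-coverage figure (coverage = bases / genome size)."""
    return round(genome_size_bp * coverage)


def approx_read_count(genome_size_bp: int, coverage: float, mean_read_len_bp: int) -> int:
    """Rough read count, assuming every read has the stated average length."""
    return round(bases_sequenced(genome_size_bp, coverage) / mean_read_len_bp)


GRAPEVINE_BP = 475_000_000  # genome size reported for Vitis vinifera

sanger_bp = bases_sequenced(GRAPEVINE_BP, 7.0)          # 7x Sanger
bp_454 = bases_sequenced(GRAPEVINE_BP, 4.2)             # 4.2x 454
reads_454 = approx_read_count(GRAPEVINE_BP, 4.2, 200)   # assumed ~200-base 454 reads

print(f"Sanger (7x): {sanger_bp / 1e9:.1f} Gb")
print(f"454 (4.2x):  {bp_454 / 1e9:.1f} Gb, roughly {reads_454 / 1e6:.1f} million reads")
```

Under those assumptions the project amounts to roughly 3.3 billion bases of Sanger sequence and about 2 billion bases of 454 sequence, or on the order of 10 million 454 reads.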
454 sequencing was essential in determining the two haplotypes of the particular grapevine cultivar his team chose to sequence, a cultivated Pinot Noir that is highly heterozygous. “454 helped quite a lot to identify stretches that belong to genome A and stretches that belong to genome B,” Velasco said.
According to Velasco, the “most difficult threshold” in the grapevine project was to merge the different reads from the Sanger and 454 sequencers into a single assembly. The researchers at Myriad Genetics helped with that by developing base-calling and assembly software that integrates the two data types. “The most convincing reason why we decided to collaborate with Myriad was its strength in bioinformatics,” he said.
Spurred by the success of the grapevine project, Velasco’s consortium has already moved on to the apple genome, sequencing the Golden Delicious cultivar. Six months into the project, they have reached 3x coverage with Sanger sequencing and plan to add 454 sequencing soon.
“Our strategy with apple will be much less Sanger and much more 454,” Velasco said. “We are planning, at the moment, 4x coverage with Sanger, and maybe 10x with 454. … We have to see if that is enough Sanger sequencing.”
He said his institute is “very interested” in acquiring its own 454 sequencer, adding that he expects 454 sequencing for the apple genome to be performed both in the US and in Italy.
The P. capsici genome project, funded by the National Science Foundation, the US Department of Agriculture, and the Department of Energy, serves “to figure out the role of next-generation sequencing technologies” for the NSF/USDA’s microbial genome sequencing program, said Kingsmore, who heads the project. He plans to publish the assembly this spring and to finish the project by the end of the year.
The researchers on the project, which included teams from NCGR, the DoE’s Joint Genome Institute, the University of Tennessee, Ohio State University, 454 Life Sciences, and the Virginia Bioinformatics Institute, used a mix of 5x Sanger sequencing, 23x 454 sequencing, and approximately 1 million short paired 454 reads to tackle the 65-megabase genome. “This is the first de novo assembly using 454 paired reads,” said Kingsmore.
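One way to see why depth alone is not the whole story is the idealized Lander–Waterman model, which treats read placement as uniformly random, so that the number of reads covering a given base is Poisson-distributed and the expected fraction of bases with no coverage at depth c is e^(-c). The sketch below applies that model to the 65-megabase P. capsici genome. It is an assumption-laden illustration, not the project's analysis: it ignores the paired reads, treats the Sanger and 454 depths as simply additive, and, above all, ignores repeats, homopolymers and sequencing bias, which are the challenges the researchers actually cite.

```python
import math


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson probability that a base is hit by zero reads at mean depth `coverage`."""
    return math.exp(-coverage)


GENOME_BP = 65_000_000  # P. capsici genome size reported in the article

# Depths from the article; "combined" naively adds the two (an assumption).
for label, depth in [("Sanger only", 5.0), ("454 only", 23.0), ("combined", 28.0)]:
    frac = expected_uncovered_fraction(depth)
    print(f"{label:12s} {depth:4.0f}x  uncovered ~ {frac:.2e} ({frac * GENOME_BP:,.0f} bp)")
```

At 5x alone the model expects roughly 0.7 percent of the genome (a few hundred kilobases) to go unsampled, while at the combined depth the expected unsampled amount falls well below a single base, so gaps that remain at such depths are more plausibly due to the sequence features the model leaves out.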
The researchers also tried three assemblers: 454’s own Newbler assembler, a Phrap assembler used at JGI for bacterial genomes, and the FORGE assembler developed by Darren Platt at JGI. Only FORGE could incorporate all three data types, and it produced the best assembly, with approximately 250 scaffolds, according to Kingsmore.
“Clearly, for our assembly, the key to success were lots of fosmid reads and lots of 454 paired reads,” Kingsmore said. “Those two things really helped.” What is needed now, he said, are assemblers that can handle paired reads whose separation varies slightly, as it does with the pairs 454’s approach creates. “There needs to be some flexibility there.”
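The kind of flexibility Kingsmore describes can be pictured as a distance check with a tolerance: an assembler uses a read pair only if the two mates sit roughly the expected distance apart. The sketch below is a hypothetical illustration of that idea, not how Newbler, Phrap or FORGE actually work; the `MatePair` record, the expected distance of 2,500 bases and both tolerances are invented for the example and are not from the P. capsici data.

```python
from dataclasses import dataclass


@dataclass
class MatePair:
    """Hypothetical record: both mates placed on the same contig (0-based positions)."""
    name: str
    left_pos: int
    right_pos: int

    def distance(self) -> int:
        # Observed separation between the two mates on the contig.
        return self.right_pos - self.left_pos


def consistent_pairs(pairs, expected_bp, tolerance_bp):
    """Keep pairs whose observed separation lies within expected_bp +/- tolerance_bp."""
    return [p for p in pairs if abs(p.distance() - expected_bp) <= tolerance_bp]


# Illustrative numbers only.
pairs = [
    MatePair("a", 1_000, 3_400),    # 2,400 bases apart
    MatePair("b", 5_000, 7_550),    # 2,550 bases apart
    MatePair("c", 9_000, 11_700),   # 2,700 bases apart
    MatePair("d", 12_000, 15_600),  # 3,600 bases apart, far outside the expected spread
]

strict = consistent_pairs(pairs, expected_bp=2_500, tolerance_bp=50)
flexible = consistent_pairs(pairs, expected_bp=2_500, tolerance_bp=300)
print([p.name for p in strict])    # ['b']
print([p.name for p in flexible])  # ['a', 'b', 'c']
```

With a tight tolerance, most genuine pairs from a library whose distances spread out are thrown away; a looser window keeps them while still rejecting the pair that is clearly inconsistent.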
There is still work to do on the P. capsici project, he said, such as finding out what causes the remaining gaps and working out strategies for identifying genes, especially where frameshifts are present in the sequence. “I don’t think this project will answer the question, ‘What’s the perfect mix that’s most cost-effective between next-gen and traditional technology?’ but it’s exploring that space and gives some recommendations,” he said.