This article was originally published May 18.
Roche's 454 Life Sciences used a mixture of shotgun and BAC-pool sequencing to analyze the genome of the oil palm, a strategy that could be useful — though costly — for other complex plant genome sequencing projects, according to a company official.
Last week, 454 said that its team, in collaboration with two Malaysian companies, bioinformatics firm Synamatix and plantation company Sime Darby, had sequenced the 1.7-gigabase oil palm genome (see In Sequence 5/12/2009).
According to a company statement, the result is the first de novo assembly of a "large and highly complex" plant genome to be completed without the use of Sanger sequencing data.
The hope is that markers identified in the genome can help breeders develop plants that are more disease-resistant and more tolerant of drought and salt, and that produce new varieties of palm oil, according to Sime Darby.
Michael Egholm, 454's vice president of research and development, told In Sequence last week that the project started about a year and a half ago, although the experimental phase took less than a year and the sequencing data production took just several months.
Synamatix conducted the project on behalf of Sime Darby, which paid an undisclosed amount for the work, and sub-contracted the sequencing to 454. Until a few weeks ago, even 454 researchers did not know Sime Darby's identity, and were only "very reluctantly" told which genome they were working on, according to Egholm. "It was really a stealth project," he said.
Sime Darby's relationship with Synamatix goes even further back. According to the company, its Sime Darby Technology Centre started a collaboration with Synamatix on oil palm gene expression analysis in 2005, and "it was during this time that the possibility of sequencing the genome was explored." After conducting a feasibility study last June, the company became convinced that it should invest in an oil palm genome project.
Sequencing the oil palm was a challenge because over 60 percent of its genome contains repeats, Egholm said. Therefore, the researchers decided to tackle the genome with a combination of shotgun sequencing and sequencing of BAC pools, using a mixture of 250-base 454 GS FLX and 500-base Titanium reads.
It took a significant amount of time to identify a vendor for the BAC pools and to determine experimentally how many BACs could be pooled so that each complex repeat would appear only once in a given pool, according to Egholm.
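As a rough illustration of the pooling trade-off Egholm describes, the sketch below estimates how likely it is that a pool contains at most one copy of a given repeat as the pool grows. It is not the partners' method, which was worked out experimentally. The Poisson model, the 100-kilobase BAC insert size, and the 50-copy repeat are all hypothetical assumptions chosen for illustration; only the 1.7-gigabase genome size comes from the article.

```python
import math

GENOME_SIZE_BP = 1_700_000_000  # 1.7 gigabases, as reported for the oil palm
BAC_INSERT_BP = 100_000         # hypothetical insert size, not from the article
REPEAT_COPIES = 50              # hypothetical copy number of one repeat family


def prob_at_most_one_copy(pool_size, repeat_copies=REPEAT_COPIES,
                          bac_insert_bp=BAC_INSERT_BP,
                          genome_size_bp=GENOME_SIZE_BP):
    """Probability that a pool of BACs holds zero or one copy of a repeat.

    Assumes repeat copies are scattered uniformly at random across the
    genome and BACs are sampled uniformly, so the number of copies in a
    pool is approximately Poisson-distributed.
    """
    # Expected number of copies of the repeat captured by the whole pool.
    expected = pool_size * repeat_copies * bac_insert_bp / genome_size_bp
    # Poisson P(X <= 1) = e^(-lambda) * (1 + lambda).
    return math.exp(-expected) * (1 + expected)


if __name__ == "__main__":
    for pool_size in (10, 50, 100, 500):
        p = prob_at_most_one_copy(pool_size)
        print(f"{pool_size:4d} BACs per pool: P(at most one copy) = {p:.3f}")
```

Under these assumptions the probability falls as more BACs are added to a pool, which is why pool size has to be balanced against the number of pools that must be sequenced.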
For the shotgun sequencing, the company used 3-kilobase and 20-kilobase libraries and is adding libraries of other sizes now, he said.
Synamatix assembled the genome using an iterative approach, he said, integrating both the BAC and the shotgun data. The company used 454's Newbler software to assemble the BAC pools and put these contigs together using other assemblers.
According to its website, Synamatix uses a new approach to build databases, in which patterns, their relationship, and their significance are maintained in a so-called "SynaBase," which the company says is the core technology underlying almost all of its applications. Computationally demanding analyses, such as assembling second-generation sequencing data, "can be completed hundreds of times faster" than with conventional databases, according to the firm.
Sime Darby said in a statement that the genome was sequenced at 30-fold coverage and with 93.8-percent completeness.
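To put the 30-fold figure in perspective, the following back-of-the-envelope sketch converts it into raw sequence and read counts. It assumes that coverage means total sequenced bases divided by genome size, and it treats every read as exactly 250 or 500 bases long. The actual mix of GS FLX and Titanium reads was not disclosed, so the two results are only illustrative bounds.

```python
GENOME_SIZE_BP = 1_700_000_000  # 1.7 gigabases, as reported
COVERAGE = 30                   # 30-fold, per Sime Darby's statement


def total_bases(genome_size_bp, coverage):
    """Raw bases needed to reach the given average fold coverage."""
    return genome_size_bp * coverage


def reads_needed(genome_size_bp, coverage, read_length_bp):
    """Reads required if every read had the same length (a simplification)."""
    return total_bases(genome_size_bp, coverage) // read_length_bp


if __name__ == "__main__":
    print(f"Total sequence: {total_bases(GENOME_SIZE_BP, COVERAGE):,} bases")
    for read_length in (250, 500):
        n = reads_needed(GENOME_SIZE_BP, COVERAGE, read_length)
        print(f"{read_length}-base reads only: {n:,} reads")
```

By this estimate, 30-fold coverage corresponds to about 51 gigabases of raw sequence, or between roughly 102 million and 204 million reads depending on read length.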
According to Egholm, the quality of the assembly is "very good," reflected by the fact that 99.9 percent of sequence reads from a transcriptome project that was conducted in parallel could be mapped back to the assembly.
In that second project, the researchers sequenced the transcriptomes of 12 different samples, including different plant parts and plants with certain desirable properties.
Synamatix has also annotated the genome and has generated "an essentially complete gene list," according to Egholm. The plan is to publish details of the techniques and the assembly, he said, though he did not say when the partners plan to do so.
In the meantime, 454 and Synamatix are planning to sequence additional, undisclosed genomes of commercially important plants using the same approach, he said.
Egholm also expects that other companies wanting to sequence crops might become interested in the service. Now that the experimental procedures have been worked out, he said, it will likely take no longer than three to six months for a project "from start to finish, at a fraction of a cost it takes to do a big plant genome today" using Sanger technology.
However, he acknowledged that the analysis is "not cheap" and requires high coverage for generating a mixture of shotgun and BAC pool data. "For very complex genomes, such as wheat, I think this is a viable approach," he said.
He added that he would like to sequence complex plant genomes using only 454 long reads and paired reads, "and we may get there, [but] we are not there with Newbler." In the meantime, "here you have a strategy that actually works."