454 Life Sciences, working with Baylor College of Medicine, and the J. Craig Venter Institute have each completed an individual’s genome sequence, taking the discipline one step closer to personal genomics.
Even before scientists get the chance to pore over analyses of James Watson’s and Craig Venter’s genomes in detail — both analyses are currently under review at scientific journals — the two sequencing teams are already discussing the completion of their projects.
The two teams, whose subjects were on opposite sides of the race to sequence the human genome for the first time almost a decade ago, used markedly different approaches: 454 and collaborators at Baylor’s Human Genome Sequencing Center mapped Watson’s genome by aligning reads from 454’s Genome Sequencer FLX to the public human reference genome, and adding new sequence reads that were not contained in the reference. JCVI researchers, meantime, relied on their ABI 3730 workhorses to increase their coverage of Venter’s genome, which was already part of Celera’s assembly of the human genome, and assembled it de novo.
Illumina, meanwhile, has been sequencing an African HapMap individual using its Genome Analyzer. Earlier this year, the company said it had generated 4X coverage with single reads and planned to add paired reads. The project is ongoing, according to a company spokeswoman.
Watson’s Genome: Fast and Cheap
Last week, at a ceremony at Baylor College of Medicine, 454 and Baylor scientists handed Watson a copy of his genome on a DVD. The entire project cost less than $1 million and took two months, according to 454.
At a conference earlier this year, Michael Egholm, 454’s vice president of research and development, and David Wheeler, a researcher in Baylor’s Human Genome Sequencing Center and an associate professor in the department of molecular and human genetics at Baylor, presented preliminary analyses of the results (see In Sequence 3/13/2007).
At the time, the researchers had generated 40 million single reads with an average read length of 250 base pairs, or 10 gigabases of sequence data, equivalent to 3X coverage of the genome, or 1.5X per haplotype.
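The coverage figures follow from simple arithmetic, sketched below in Python. The read count and read length are the ones reported above; the haploid genome size of about 3.1 gigabases is an assumption supplied for illustration, not a figure from the researchers, and the function name is hypothetical. With that assumption the result comes out slightly above 3X, which is consistent with the rounded figures quoted.

```python
def fold_coverage(num_reads: int, mean_read_length_bp: float, genome_size_bp: float) -> float:
    """Average fold coverage: total sequenced bases divided by genome size."""
    return num_reads * mean_read_length_bp / genome_size_bp


# Assumption for illustration: approximate haploid human genome size.
HAPLOID_GENOME_BP = 3.1e9

# Figures reported for the preliminary Watson data set.
num_reads = 40_000_000
mean_read_length_bp = 250

total_bases = num_reads * mean_read_length_bp  # 10 gigabases
coverage = fold_coverage(num_reads, mean_read_length_bp, HAPLOID_GENOME_BP)

# A diploid individual carries two haplotypes, so each one receives
# roughly half of the total coverage.
per_haplotype = coverage / 2

print(f"total bases:   {total_bases / 1e9:.1f} Gb")
print(f"fold coverage: {coverage:.1f}X")
print(f"per haplotype: {per_haplotype:.1f}X")
```

The same function applies to the later figures in this article: doubling the number of 250-base-pair reads doubles the fold coverage, which is how the 6X figure relates to the 3X one.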
They had found 1.9 million reads with single-base substitutions, representing potential SNPs, as well as 68,000 reads with potential insertions or deletions larger than two bases. In addition, about a million reads matched neither the public reference genome nor the Celera assembly, potentially representing new or unassembled sequences.
Since then, the researchers have increased the coverage to 6X, or 3X per haplotype, using 250-base-pair reads. Another 2X with 150-base-pair reads is “to be added shortly,” Wheeler told In Sequence in an e-mail message this week.
In addition, the team has validated a number of indels in the coding sequence that range in size from three to 33 bases, and has validated a set of deletions and duplications larger than 30 kilobases. “We are working on other types of rearrangement, which I am hopeful we can find,” he said. In addition, “we may be honing in on some genes in the fraction of the reads that don’t hit the reference genome.”
The data is being submitted to the National Center for Biotechnology Information Trace Archive, while the SNP locations and read mapping locations are being submitted to genome browsers at Cold Spring Harbor Laboratory, NCBI, and the European Molecular Biology Laboratory, according to Wheeler.
However, Watson has the final say over which parts of his genome will be published. So far, he has decided to exclude from the analysis his apoE gene, which could reveal susceptibility to late-onset Alzheimer’s disease.
A “high-profile journal” is currently analyzing the data, according to Richard Gibbs, co-director of Baylor’s Human Genome Sequencing Center, who spoke during the press conference.
Venter’s Genome: True and Trusted Technology
The JCVI researchers have submitted their analysis of Venter’s genome to Public Library of Science Biology and are hoping for it to be published in July, according to a spokeswoman for the institute.
The scientists had a head start for their project because a 3.5X coverage of Venter’s genome was already contained in Celera’s assembly of the human genome that was published in 2001.
“Once [Celera] deposited those [traces] in the Trace Archive, we could access those and use it, and we supplemented that [with 4X coverage here] to complete it to about 7.5X coverage,” said Sam Levy, a senior scientist in the human genomic medicine group at JCVI who headed the project.
The scientists generated the additional coverage by capillary electrophoresis sequencing with average read lengths of 800 base pairs — longer than the 500 base-pair reads of the Celera data. They used a modified version of the original Celera assembler to create a de novo assembly of the genome. Those modifications, which they plan to make publicly available once their paper is published, allowed them to display a diploid genome.
“We really tried to use the widest range of potential mixtures of library sizes; we have optimized the assembly, we did assemblies as we went along to try and get the best possible complete sequence at the end,” Levy said.
Having an assembly of the diploid genome “really enabled us to identify the full scope of the structural variation, and DNA variation from single base pair changes, all the way up to large copy-number changes,” he said. “And we have done extra work to corroborate some of the findings.”
All data, including the traces and the assembly, are currently available from NCBI, according to Bob Strausberg, deputy director of the institute and leader of its human genomic medicine group.
The JCVI scientists would not elaborate on the cost of the project, which has been ongoing for several years, although “it could have been expedited more,” Strausberg said. “I think that cost has to be put not in the context of just fold-coverage of a genome but actually what the end product looks like,” he said.
But scientists will have to wait at least a few more weeks to read about the results of the two sequencing efforts in journal articles.