On its way to reconstructing the Neandertal genome, a research team led by the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, has published an analysis of the mitochondrial genome, based on sequencing data generated by Roche’s 454 Life Sciences.
The results are part of a project, supported by an undisclosed amount of funding from the Max Planck Society, to sequence the Neandertal nuclear genome to one-fold coverage. 454 Life Sciences, which has been generating the sequence data for the project under contract, expects to complete data production, using its Titanium upgrade, within the next few months.
The researchers published their analysis of the mitochondrial genome in Cell last week. Svante Pääbo, director of the department of evolutionary genetics at the MPI in Leipzig, had already presented a summary of the results at the Biology of Genomes meeting at Cold Spring Harbor Laboratory in May (see In Sequence 5/13/2008).
According to Michael Egholm, vice president of research and development at 454, the publication contains data from the first and the second phase of the project.
The first phase was designed to address technical issues associated with sequencing ancient DNA and “took much longer than we thought,” he said. During the second phase, the scientists generated approximately 100 megabases of Neandertal DNA, “enough data to be able to do a meaningful analysis.” The third phase, for which data production is slated to be completed by the end of the year, will rack up the coverage to one-fold, or 3 gigabases.
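For orientation, fold coverage is simply the number of sequenced bases divided by the size of the genome. The following is a minimal Python sketch of that arithmetic using the figures quoted above; the function and variable names are illustrative, and it assumes that the phase-two data counts toward the one-fold total.

```python
def fold_coverage(sequenced_bases: float, genome_size: float) -> float:
    """Average number of times each genome position is covered."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sequenced_bases / genome_size


GENOME_SIZE = 3e9        # 3 gigabases, the one-fold target quoted in the article
PHASE_TWO_BASES = 100e6  # roughly 100 megabases of Neandertal DNA from phase two

# Coverage reached after phase two, and bases still needed for one-fold
# (assumption: phase-two data counts toward the total).
phase_two = fold_coverage(PHASE_TWO_BASES, GENOME_SIZE)
remaining = GENOME_SIZE - PHASE_TWO_BASES
```

Under these figures, phase two corresponds to roughly 0.03-fold coverage, which would leave about 2.9 gigabases for the third phase.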
For their most recent publication, the researchers used the Genome Sequencer FLX to generate 39 million sequence reads in 147 runs from nine 454 libraries. These libraries were generated from three DNA extracts, each made from approximately 100 to 200 milligrams of the same Neandertal bone, which was uncovered in Vindija, Croatia.
The reason the researchers were able to reconstruct the mitochondrial genome ahead of the nuclear genome is that mitochondria are present in the cell in high copy numbers, explained Ed Green, a postdoctoral fellow in Pääbo’s group and the lead author of the paper.
“It got to the point that we just had so much mitochondrial sequence that we could pretty easily assemble it, [so] we decided to do that and see how that looks,” he said.
Since he and his colleagues wrote the paper, the coverage of the mitochondrial genome has doubled from 35-fold to about 70-fold and continues to climb, he said.
The scientists made two improvements to the standard 454 protocol. Earlier this year, they published a new protocol for quantifying 454 libraries that uses qPCR and reduces the need for costly titration runs on the sequencer (see In Sequence 1/8/2008). According to the recent Cell paper, they also devised a new way to increase the DNA yield during library preparation. That protocol has just been submitted for publication, according to Green, who declined to provide details.
To limit contamination of the Neandertal sample with modern human DNA, the scientists extracted the DNA from the bone in a cleanroom and generated the 454 libraries using a project-specific four-base key, also in a cleanroom (see In Sequence 9/11/2007).
These precautions were necessary because it turned out that about 10 percent of the first 454 library they sequenced, results from which they published in Nature in 2006 (see GenomeWeb Daily News 11/15/2006), was modern human DNA. They have now eliminated that dataset from the project.
However, the conclusions from that first publication still hold up, according to Egholm. “The only thing that this paper showed is that you can get genomic DNA from a Neandertal” and that it can provide a first estimate for the time it took modern humans to diverge from Neandertals, he said.
For their analysis, the scientists first searched their data, which consists overwhelmingly of contaminating bacterial sequences, for sequences similar to human mitochondrial DNA. For this task, they used a 256-core compute cluster dedicated to their project and administered by the Rechenzentrum Garching, a computing center near Munich that offers computing services to Max Planck institutes.
Next, they aligned the data to the human mitochondrial genome “to get some idea of where they would go in the assembly,” according to Green, who added that they factored damage specific to ancient DNA into their analysis.
“All of this then goes into our assembly procedure for reconstructing what the Neandertal sequence was 38,000 years ago, when this guy died,” he said. For the assembly, the researchers developed their own algorithm that is customized for ancient DNA. “It is modeled around the typical mapping assemblers, where you take a reference sequence to learn where things go,” Green said. “The main difference is that it knows what ancient DNA looks like [and] how often specific changes happen at each position of the molecule.” The assembler, which has not been described in a publication yet, is freely available to researchers, he said.
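The general idea of a reference-guided consensus that discounts likely damage can be illustrated with a toy Python sketch. This is not the group's assembler, which the article says is unpublished. It assumes, purely for illustration, that damage appears as C-to-T mismatches near the 5' end of a read and G-to-A mismatches near the 3' end, and the `end_window` and `penalty` values are hypothetical.

```python
from collections import defaultdict


def damage_weight(ref_base, read_base, dist_5p, dist_3p,
                  end_window=5, penalty=0.25):
    """Down-weight observations that look like damage near read ends.

    Assumed toy model: C->T near the 5' end, G->A near the 3' end.
    end_window and penalty are hypothetical values, not from the paper.
    """
    if ref_base == "C" and read_base == "T" and dist_5p < end_window:
        return penalty
    if ref_base == "G" and read_base == "A" and dist_3p < end_window:
        return penalty
    return 1.0


def consensus(reference, aligned_reads):
    """Call a weighted-majority consensus from ungapped aligned reads.

    aligned_reads is a list of (start, sequence) pairs, where start is the
    0-based position of the read's first base on the reference.
    Positions without any read support are reported as 'N'.
    """
    votes = [defaultdict(float) for _ in reference]
    for start, seq in aligned_reads:
        for i, base in enumerate(seq):
            pos = start + i
            if not 0 <= pos < len(reference):
                continue  # ignore bases that fall outside the reference
            weight = damage_weight(reference[pos], base, i, len(seq) - 1 - i)
            votes[pos][base] += weight
    return "".join(max(v, key=v.get) if v else "N" for v in votes)


reference = "ACGTACGTAC"
reads = [
    (1, "TGTGCG"),      # T at its 5' end over a reference C: treated as likely damage
    (0, "ACGTGCGTAC"),  # supports C at position 1 and G at position 4
]
result = consensus(reference, reads)
```

In this example the C-to-T mismatch at the read end is outvoted, while the A-to-G difference at position 4, seen in the interior of both reads, is kept as a genuine difference from the reference.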
3 Gigabases and Beyond
The researchers plan to generate one-fold coverage of the Neandertal nuclear genome by the end of the year. According to Egholm, the remaining runs will be performed using 454’s new Titanium chemistry and plates, which the company plans to launch later this year. Though the project will not be able to profit from the longer read lengths that the new chemistry enables, because ancient DNA is broken into small pieces, the upgrade will allow the researchers to obtain more reads per run.
“The biggest drawback is [that] most of the DNA that we recover is not Neandertal,” said Green, adding that he and his colleagues are “experimenting with ways to get the percentage of [Neandertal] DNA that we care about higher.”
But the microbial sequence data, which make up about 95 percent of the total, will not go to waste and will be analyzed separately, according to Egholm. “It’s the biggest metagenomics project in the world.”
It is unclear yet whether the project will continue beyond one-fold coverage. “The obvious thing to do is multifold coverage and trying to get a finished sequence,” Green said, though “there is no hard and fast plan for that.” In their paper, the scientists estimate that in order to obtain an error rate of 1 in 10,000, they would need 12-fold coverage.
In May, Pääbo told In Sequence that it was not yet clear which sequencing platform he and his team would use to increase the coverage.
According to Green, the group tests new sequencing platforms for how suitable they are for ancient DNA work. In addition to the 454 technology, the institute has already acquired an Illumina Genome Analyzer and has used it to sequence Neandertal DNA “on the side,” he said.
However, the group took a pass on Applied Biosystems’ SOLiD sequencer, which seemed “less well suited, at this point, for ancient DNA,” according to Green.
The color-space encoding of bases that the SOLiD provides, he explained, requires a reference genome to map against, “and for any ancient DNA project, you really don’t have an incredibly close genome to search against.” In addition, the SOLiD did not handle single-base indels well in their test of the platform. “It’s not something that the [SOLiD] standard workflow does out of the box, … and we were not really keen to invest more time to try to figure out how to make it work for ancient DNA,” he said.
The analysis of the Neandertal mitochondrial genome provided new insights into the biology of the species. For a start, it allows a better estimate of when Neandertals diverged from modern humans — according to the latest data, approximately 660,000 years ago.
Also, an evolutionary test of the protein-coding sequences showed that there was weaker so-called “purifying selection” in Neandertals than in a number of great apes, according to Green. One explanation for that, he said, is that the Neandertal population was small, an estimated fewer than 10,000 individuals. If that is true, it suggests that modern humans were not responsible for the Neandertals’ demise and that “they were struggling in maintaining respectable numbers long before we came on the scene.”
A more direct way to estimate Neandertal population size would be to get diversity information from multiple individuals, he said, adding that a colleague of his is “working on a method for selective capture [of DNA] that looks pretty promising.” Right now, the group has between five and 10 Neandertal bones that are suitable for analysis, he said.
Pääbo mentioned in May that the data also suggest that Neandertals probably did not interbreed with humans, since there is no evidence of Neandertal mitochondrial DNA, which is inherited maternally, in modern humans.
Ultimately, the Neandertal genome sequence will help scientists to better understand the evolution of modern humans.
Egholm said the researchers will pay especially close attention to areas in the genome that Neandertals and chimps share, but where modern humans differ. “We are using Neandertals purely as a signpost on human development,” he said. The project will “increase our understanding of what makes us human.”