Researchers led by Penn State University have used a combination of technologies to sequence the complete genomes of a Khoisan hunter-gatherer from the Kalahari Desert in southern Africa and of Archbishop Desmond Tutu, a Bantu. They also sequenced the exomes of three additional men, Bushmen from other regions of the Kalahari Desert.
Their results, they said, confirmed the hypothesis that the hunter-gatherer tribes of southern Africa are genetically divergent both from other humans and from one another. They also said that the Khoisan genome could eventually serve as a reference genome. The study was published last week in Nature.
The team used an assortment of sequencing technologies, including 454 GS FLX Titanium, SOLiD, and Illumina, and assembled one genome de novo from 454 sequence data alone, using the Phusion assembler. Stephan Schuster, a professor of biochemistry and molecular biology at Penn State University and the lead author of the study, said that using several sequencing technologies allowed the team to obtain a more accurate genome than a single platform would have produced.
"Through the combination [of sequencing platforms] you get validated SNP sets and you can also discover novel SNPs that one platform alone will not allow you to detect," said Schuster.
Jun Wang, executive director of BGI-Shenzhen, agreed that combining platforms could improve the genome quality. "If we can use different technologies to sequence the same human genome to a completed manner, the quality of the final sequence should be improved," he told In Sequence in an e-mail. "The sequencing errors that are caused systematically by different technologies potentially could be solved by using a different platform."
However, he added, it is still possible to generate a high-quality genome on one platform by sequencing with enough coverage and using different insert sizes.
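The cross-platform validation Schuster and Wang describe can be pictured as set arithmetic on variant calls. The sketch below assumes each platform's SNP calls have already been reduced to (chromosome, position, reference allele, alternate allele) records; calls shared by both platforms form the cross-validated set, while calls seen on only one platform are candidates needing further checks. The `Variant` type, the `compare_call_sets` helper, and the toy calls are hypothetical and are not data or code from the study; a real pipeline would also normalize variant representation and weigh call quality.

```python
from __future__ import annotations

from typing import NamedTuple


class Variant(NamedTuple):
    """A biallelic SNP call (hypothetical record type for illustration)."""
    chrom: str
    pos: int  # 1-based position
    ref: str
    alt: str


def compare_call_sets(calls_a: set[Variant], calls_b: set[Variant]) -> dict[str, set[Variant]]:
    """Split two platforms' SNP calls into concordant and platform-specific sets."""
    return {
        "validated": calls_a & calls_b,  # called independently by both platforms
        "only_a": calls_a - calls_b,     # seen only on platform A
        "only_b": calls_b - calls_a,     # seen only on platform B
    }


# Invented toy calls, not results from the study.
platform_a = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr1", 2000, "C", "T"),
    Variant("chr2", 500, "G", "A"),
}
platform_b = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 500, "G", "A"),
    Variant("chr3", 42, "T", "C"),
}

result = compare_call_sets(platform_a, platform_b)
for name, calls in result.items():
    print(name, len(calls))
```

The union of both platforms' calls is what lets a combined approach surface SNPs "that one platform alone will not allow you to detect," while the intersection supplies the validated set.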
BGI has used the Illumina technology alone for the de novo assembly of several genomes, including the giant panda genome and two human genomes, which were all published in December (see In Sequence 12/15/2009).
Using long 454 Titanium reads, the Penn State team assembled the genome of the Khoisan hunter-gatherer de novo with the Phusion assembler. The shotgun reads averaged 350 base pairs in length and covered the genome 10.2-fold. The team also sequenced long-insert libraries with 454 Titanium paired-end technology, with insert sizes of up to 17 kilobases and 12.3-fold non-redundant clone coverage. Assembled contigs totaled 2.79 gigabases, with an N50 contig size of 5.5 kilobases. The total scaffold size, including gaps, was 3.09 gigabases, with an N50 scaffold size of 156 kilobases.
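N50 is the standard contiguity statistic behind the 5.5-kilobase and 156-kilobase figures: sort contigs (or scaffolds) from longest to shortest and report the length at which the running total first reaches half of the assembly size. A minimal Python sketch follows, using an invented list of contig lengths. The last lines estimate how much of the scaffold length is gap, assuming all of the 2.79 gigabases of contig sequence was placed into the 3.09-gigabase scaffolds.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50: the length of the sequence at which the cumulative
    length, summed from longest to shortest, first reaches half the total."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached at least half of the assembly
            return length
    raise AssertionError("unreachable: the running total always reaches the total")


# Invented toy contig lengths, not the study's assembly.
toy_contigs = [10, 8, 6, 4, 2]
print(n50(toy_contigs))

# Gap share of the scaffolds, assuming every contig was placed in a scaffold.
contig_total_gb = 2.79
scaffold_total_gb = 3.09
gap_fraction = (scaffold_total_gb - contig_total_gb) / scaffold_total_gb
print(f"{gap_fraction:.1%}")
```

For the toy list the total is 30; the two longest contigs (10 + 8 = 18) are the first to pass half of it, so the N50 is 8. Under the stated assumption, roughly a tenth of the scaffold length is gap.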
According to the study, some of the 454 sequence data "resulted in contigs and scaffolds that do not map against the human reference genome. Many of these scaffolds corresponded to gaps in the current human reference assembly, including gaps over 200,000 base pairs in length."
Schuster said this shows that 454 technology can assemble sequences that other platforms, including Sanger sequencing, cannot detect. "This will allow us to eventually have a more complete human genome than ever before," he said.
The researchers also sequenced the complete genome of Archbishop Desmond Tutu to over 30-fold coverage using Life Technologies' SOLiD 3.0. The sequence data for both Tutu and the Khoisan individual were then validated by whole-genome sequencing on Illumina's Genome Analyzer, at 23.2-fold coverage for the Khoisan hunter-gatherer and 7.2-fold for Tutu.
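Fold coverage relates read count, read length, and genome size through the Lander–Waterman relation C = N × L / G. The back-of-envelope sketch below turns the coverage figures quoted above into approximate sequencing volumes. It assumes a haploid human genome size of about 3.1 gigabases, treats "over 30-fold" as exactly 30, and uses the idealized Poisson estimate e^(-C) for the fraction of bases left with no reads; real coverage is uneven, so these are illustrations, not the study's numbers.

```python
import math

GENOME_SIZE_BP = 3.1e9  # assumption: approximate haploid human genome size


def bases_needed(coverage: float, genome_size: float = GENOME_SIZE_BP) -> float:
    """Total sequenced bases for a given mean fold coverage (C * G)."""
    return coverage * genome_size


def reads_needed(coverage: float, read_length: float,
                 genome_size: float = GENOME_SIZE_BP) -> float:
    """Reads of mean length L needed for mean coverage C: N = C * G / L."""
    return coverage * genome_size / read_length


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases with zero reads under a Poisson model."""
    return math.exp(-coverage)


# 454 shotgun data for the Khoisan genome: 10.2-fold at ~350 bp reads.
print(f"454 reads: {reads_needed(10.2, 350) / 1e6:.0f} million")

# Coverage figures quoted for the SOLiD and Illumina runs ("over 30" taken as 30).
for label, cov in [("SOLiD, Tutu", 30.0),
                   ("Illumina, Khoisan", 23.2),
                   ("Illumina, Tutu", 7.2)]:
    print(f"{label}: {bases_needed(cov) / 1e9:.1f} Gb, "
          f"uncovered ~{fraction_uncovered(cov):.1e}")
```

The Poisson term shows why the 7.2-fold Illumina run validates less of Tutu's genome than the 23.2-fold run does for the Khoisan genome: at 7.2-fold, roughly 0.07 percent of bases would be expected to receive no reads at all, even before accounting for uneven coverage.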
Schuster said the team used two different approaches for whole-genome sequencing because it wanted to compare platforms. He said each sequencing platform had different advantages, but declined to give specifics because the team plans to publish a paper comparing the three platforms in the coming weeks. He added that Life Tech's Applied Biosystems group is currently sequencing the Khoisan individual, and that three other sequencing companies have requested the DNA samples in order to demonstrate their sequencing technologies.
Schuster did say that for de novo assembly he considered 454 the best choice because of its long read lengths. He added that the team is continuing to improve the Khoisan genome using 454 technology, with the goal of making it "a reference for a genome sequenced with next-gen technology."
The hunter-gatherer tribes of southern Africa are the oldest known lineage of modern humans, and it has been thought that they are genetically divergent from other humans and exhibit a great deal of genetic variation between tribes.
The sequencing data confirmed this hypothesis. The team detected over 4 million SNPs in the Khoisan individual, including over 700,000 novel SNPs, more than had ever been reported in a human genome. They also found more genetic variation between the African individuals than between a European and an Asian individual. When they compared the genomes with the publicly available Yoruba genome, they found more variation between the Bushmen and the Yoruba genomes than between the European and Yoruba genomes.
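Statements such as "more variation between the African individuals than between a European and an Asian" rest on counting genotype differences at shared sites, and "novel" SNPs are calls absent from existing variant catalogs. The article does not say which metrics the team used, so the sketch below swaps in a simple, commonly used allele-sharing distance and assumes "novel" means absent from a public database such as dbSNP. The individuals, genotypes, and helper names are hypothetical and invented for illustration.

```python
from __future__ import annotations

from itertools import combinations


def allele_sharing_distance(g1: list[int], g2: list[int]) -> float:
    """Mean per-site allele difference between two individuals.

    Genotypes are alt-allele counts (0, 1 or 2) at the same ordered sites.
    Each site contributes |g1 - g2| / 2, so identical genotypes give 0 and
    opposite homozygotes give 1.
    """
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotype vectors must be non-empty and equal length")
    return sum(abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


def novel_snps(called: set[tuple[str, int]],
               known: set[tuple[str, int]]) -> set[tuple[str, int]]:
    """SNP positions called in a genome but absent from a variant database."""
    return called - known


# Hypothetical genotypes at six shared sites; not the study's data.
genotypes = {
    "ind_A": [0, 1, 2, 0, 1, 0],
    "ind_B": [2, 1, 0, 1, 0, 2],
    "ind_C": [0, 1, 2, 1, 1, 0],
}
for x, y in combinations(genotypes, 2):
    print(x, y, round(allele_sharing_distance(genotypes[x], genotypes[y]), 3))

called = {("chr1", 100), ("chr1", 250), ("chr2", 75)}
known = {("chr1", 100)}
print(len(novel_snps(called, known)), "novel")
```

In this toy example, ind_A and ind_C differ at only one site, while ind_A and ind_B differ at five; ranking all pairs this way is the kind of comparison behind the between-population statements above.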
Schuster said the results could have implications for personalized medicine, particularly for people from southern Africa, for whom no well-characterized genome has been available.
In addition, adding the newly found variants to current databases could aid understanding of region-specific diseases. For example, the researchers found that the Bushmen lacked an African-specific allele for malaria resistance, likely because they did not adopt a farming lifestyle.