Two New Methods Tackle Whole-Genome Haplotyping, a Key Step in Bringing Sequencing to the Clinic


By Monica Heger

Despite considerable advances in whole-genome sequencing in recent years, key information is still missing from all of the genomes sequenced with next-generation sequencing technology — haplotype information.

Now, separate papers published this week in Nature Biotechnology detail two different ways to haplotype whole genomes. In one paper, a group from the University of Washington combined next-gen sequencing with large-insert cloning to produce a sequenced genome with haplotype information. In the second paper, a group from Stanford University used microfluidics technology in combination with genotyping to obtain haplotype information at the single-cell level. While the two methods are different, they could complement each other, said Jay Shendure, who led the University of Washington team.

"The two papers show different techniques that are available to use for research immediately," Rade Drmanac, Complete Genomics' chief scientific officer, told In Sequence. Drmanac, who is also developing a haplotyping method at Complete Genomics, was not affiliated with either study published this week.

Until now, the only two genomes that had been completely haplotyped were the reference human genome and Craig Venter's genome, both of which relied on Sanger sequencing and clone mapping to resolve the haplotypes — a labor-intensive and costly process.

While the newer sequencing technologies have allowed for exponential cost reductions and much higher throughput, the shorter reads are not amenable to obtaining haplotype information, which will be critical in the fields of personalized medicine and population genetics.

"For personal genome sequencing and diagnostics, [haplotyping] is really going to be critical," particularly when genomes are sequenced early in life with the goal of predicting disease risk, Drmanac said. Without haplotype information, "reported genomes are just a consensus of the two parental genomes," making disease risk prediction difficult.

Old School Meets New School

In the University of Washington paper, the team combined "old school genomics with new school genomics," said Shendure.

First, they made a fosmid library from the DNA of a HapMap individual of Indian descent, with inserts of about 37 kilobase pairs. They then split the library into more than 100 different pools, so that the odds of both alleles being contained in one pool were very low. After barcoding the pools, the team shotgun-sequenced the libraries on the Illumina Genome Analyzer to a mean depth of 2.4-fold per haploid clone. Next, before phasing, the team used whole-genome resequencing to search for variants. They used the Illumina HiSeq 2000 with 50-base paired-end reads and sequenced the genome to 15-fold coverage.

Combining the results from the sequencing of the pooled fosmid libraries and the resequencing, they were able to construct haplotype blocks of around 37 kilobase pairs, the size of their insert. They were then able to assemble those into longer haplotype blocks with an N90 of 89 kilobase pairs, an N50 of 386 kilobase pairs, and an N10 of 1 megabase. The assembled blocks included 94 percent of heterozygous SNPs.
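For readers unfamiliar with the N50-style figures quoted above, the following is a minimal sketch of how such statistics are commonly computed. The function name and the block lengths are hypothetical and chosen only for illustration; they are not data from the paper, and the authors' exact procedure may differ.

```python
def nx(block_lengths, x):
    """Return the N-x statistic for a list of block lengths.

    N-x is the length L such that blocks of length L or longer
    together account for at least x percent of the total length.
    """
    if not block_lengths:
        raise ValueError("need at least one block")
    target = sum(block_lengths) * x / 100.0
    running = 0
    # Walk from the longest block down, accumulating length.
    for length in sorted(block_lengths, reverse=True):
        running += length
        if running >= target:
            return length


# Hypothetical haplotype block lengths in kilobase pairs (not real data).
example_blocks_kb = [500, 400, 300, 200, 100]

n10 = nx(example_blocks_kb, 10)
n50 = nx(example_blocks_kb, 50)
n90 = nx(example_blocks_kb, 90)
```

By this definition N10 is at least as large as N50, which is at least as large as N90, matching the ordering of the figures reported for the assembly.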

In order to evaluate accuracy, they compared the assembly with HapMap predictions. While the results were concordant for common variants, the team actually achieved better results for rare variants. "We show, for this genome, that we can phase rare variants better than HapMap and can obtain poorly ascertained areas of the genome," Shendure said.

Complete Genomics' Drmanac said that the major drawback with this method was its use of fosmid libraries. While making clone libraries is becoming simpler, "it's still an involved process and most scientists believe it won't be a routine process for sequencing genomes."

However, he added, the method is a good approach and can be used now. He said it would be useful for research purposes, but not in a clinical setting.

Shendure said his team is currently working on making the method more efficient. "Part of the value of next-gen sequencing is getting rid of clone-based approaches, and in fact, we're going back to that." However, he added, "we think we're trying to recover the best parts of the clone-based approach, without bringing with them the worst parts." For instance, he said, all the steps were done in large pools, and they never had to deal with individual colonies.

The authors noted that the sample prep cost about $4,000, due primarily to the reagents for fosmid and shotgun library construction, and could be completed in less than two weeks by one technician. While the price is low relative to the overall cost of whole-genome sequencing, Drmanac noted that it would be cost-prohibitive in a clinical setting.

"There is this goal to get to the $1,000 genome, and I have no doubt that will be achieved," he said. "When you have such a low cost of sequencing, you can't spend a couple thousand on sample prep; it really has to be a few hundred or under a hundred dollars."

Single-Cell Approach

In the second paper, Stephen Quake's group at Stanford University used microfluidics technology to haplotype single cells. A microfluidic device captured single cells, used a protease digestion to release the chromosomes, and then randomly separated the chromosomes into 48 regions. The chromosomes were then individually amplified and analyzed with PCR, so that two pools with differing homologous chromosomes could be created. The researchers then genotyped the two pools, each containing one haploid genome, using an Illumina SNP array, which yielded haplotype blocks the size of a full chromosome.
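A rough back-of-the-envelope sketch shows why random separation into 48 regions tends to put the two homologous copies of a chromosome in different regions. It assumes, purely for illustration, that each chromosome lands independently and uniformly in one of the 48 regions and that the cell carries 23 homologous pairs; the real device and the biology of metaphase chromosomes are more complicated, and none of these variable names come from the paper.

```python
from fractions import Fraction

NUM_REGIONS = 48   # regions on the device, as described in the article
NUM_PAIRS = 23     # assumed number of homologous chromosome pairs

# Under the uniform, independent placement assumption, the second copy
# of a chromosome shares a region with the first with probability 1/48.
p_pair_collides = Fraction(1, NUM_REGIONS)
p_pair_separated = 1 - p_pair_collides

# Each pair's outcome depends only on its own two chromosomes,
# so the pairs are independent under this simplified model.
p_all_pairs_separated = p_pair_separated ** NUM_PAIRS
expected_colliding_pairs = NUM_PAIRS * p_pair_collides

print(f"one pair separated:  {float(p_pair_separated):.3f}")
print(f"all pairs separated: {float(p_all_pairs_separated):.3f}")
print(f"expected collisions: {float(expected_colliding_pairs):.3f}")
```

Under these assumptions any given pair is separated about 98 percent of the time, but the chance that every pair in a single cell is separated is noticeably lower.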

"We get the haplotype of the entire chromosome, end to end," said Christina Fan, a graduate student in Quake's lab at Stanford and lead author of the paper.

They tested their technique on cell lines from a European HapMap trio, and one unrelated European individual, and compared their results to the haplotype data from the HapMap analysis and also to statistical phasing. For each individual, they analyzed between three and four single cells.

In the HapMap analysis, about 80 percent of the SNPs in the child of the trio could be unambiguously determined because one parent was homozygous for that SNP. In 20 percent of the cases, though, phasing had to be determined by statistical methods.
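The trio logic described above can be sketched in a few lines. This is an illustrative toy, not the HapMap pipeline: the function name is hypothetical, genotypes are assumed to be two-character strings of alleles, and genotyping errors are ignored.

```python
def phase_child_snp(child, mother, father):
    """Phase one child SNP from trio genotypes.

    Each genotype is a two-character string such as "AG".
    Returns (maternal_allele, paternal_allele) when the trio
    determines the phase, or None when it is ambiguous.
    """
    # Every (maternal, paternal) allele combination the parents could
    # transmit that reproduces the child's observed genotype.
    consistent = {
        (m, f)
        for m in mother
        for f in father
        if sorted((m, f)) == sorted(child)
    }
    if not consistent:
        raise ValueError("genotypes are not Mendelian-consistent")
    if len(consistent) == 1:
        return next(iter(consistent))
    return None  # more than one assignment fits


# Mother is homozygous, so the child's A must be maternal.
resolved = phase_child_snp("AG", "AA", "AG")

# Child and both parents are heterozygous: the trio alone cannot decide.
ambiguous = phase_child_snp("AG", "AG", "AG")
```

The second case is the kind of SNP that has to be phased statistically in a trio analysis, or measured directly as in the Stanford approach.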

The new method developed by Quake's team matched the HapMap analysis 99.8 percent of the time. When looking at only the ambiguous SNPs, however, the two differed 5.7 percent of the time, and the majority of the inconsistencies were due to errors in the HapMap analysis, which "highlights the utility of direct experimental phasing even when family data are available," the authors wrote.

Next, the team applied the method to Quake's own genome, which was sequenced and annotated earlier this year (IS 5/4/2010). They were able to phase around 99.2 percent of SNPs. The SNPs they were unable to phase tended to cluster together in regions with high GC content.

While the method does not include whole-genome sequencing, one potential advantage over the University of Washington method is that it achieves full-chromosome resolution, as opposed to the approximately 40-kilobase-pair resolution achieved by Shendure's team.

However, one drawback of the Stanford team's method is that it requires isolating single cells while they are in the metaphase stage of mitosis, when the chromosomes align in the center of the cell.

"The benefit is you have the whole chromosome, but the disadvantage is that you need those [specific] cells," Drmanac said. Also, he added, while the University of Washington group did not have full chromosomal haplotypes, they did achieve very long and good contigs of around 300 to 400 kilobase pairs. Additionally, including information about population genetics could probably extend the contigs to full chromosomes, he said.

Fan said that the Stanford group is continuing to work on optimizing the method and also to incorporate whole-genome sequencing.

She said she did not know the exact cost of the approach, but said that the most expensive portion is the Illumina SNP arrays, which run around $200 each. For each individual, they used eight arrays.

She said the team plans to use the method in noninvasive prenatal diagnostics as well as in HLA haplotyping.

Earlier this month, researchers from the Chinese University of Hong Kong and Sequenom showed that the entire fetal genome was present in maternal plasma (IS 12/14/2010). While the authors of that paper essentially had to distinguish each individual allele to determine whether it was maternal DNA or fetal DNA, Fan said that the new method would be a more efficient way of determining haplotype.
