Researchers at Illumina have developed a dilution-based haplotyping method that is suitable for both targeted and whole-genome sequencing experiments.
The method, published online in PNAS last week, allows users to obtain SNP phasing information for DNA stretches of up to several hundred kilobases in length. According to the authors, it can be used by any researcher with access to a next-generation sequencer, not only those using Illumina's platform.
Illumina's approach is similar in principle to Complete Genomics' long fragment read technology and other dilution-based methods: large genomic DNA fragments are first diluted and aliquoted into a 96-well microtiter plate, such that each well contains a fraction of the haploid genome. DNA from each pool is then amplified, using multiple displacement amplification, and converted into a barcoded sequencing library. Samples are then either pooled and sequenced, or pooled and enriched for a specific target using hybridization probes prior to sequencing.
According to Jacob Kitzman, a graduate student in Jay Shendure's lab at the University of Washington, a "unique twist" of Illumina's approach is the pull-down of targets. "This could potentially reduce the amount of sequencing coverage required and could make this approach feasible for use with many individuals, if the pull-down reagents were available at low cost," he told In Sequence.
Last year, Kitzman and his colleagues published a study in which they sequenced a fetal genome from cell-free circulating DNA, work that also involved genome-wide haplotyping of the mother's genome (GWDN 6/6/2012). Complete Genomics published its version of dilution-based haplotyping last year as well (IS 7/17/2012).
Kitzman said one drawback of all dilution-based haplotyping methods is that they require both standard sequencing to detect variants and additional sequencing of the sub-haploid pools to obtain phasing information. In their paper, the authors did not provide an estimate of the extra cost of phasing but said it is "largely determined by the extra sequencing required," which they expect will decrease over time.
Another drawback of dilution-based methods, Kitzman said, is that they are limited by the length of the genomic DNA fragments. As a result, haplotype assemblies break in regions where heterozygous markers are sparse or where short reads do not map due to sequence repeats. Thus, he said, "there remains a need to combine these approaches with others which can resolve phase at the physical scale of whole chromosomes."
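The way sparse markers break a haplotype assembly can be illustrated with a minimal sketch. It assumes a simplified linking rule that is not taken from the paper: two neighboring heterozygous SNPs can be joined into the same block only if they lie no farther apart than the longest input DNA fragment. The function name, the SNP positions, and the 100-kilobase fragment length are hypothetical.

```python
def split_into_blocks(het_positions, max_fragment_len):
    """Group heterozygous SNP positions into phase blocks.

    Assumed rule (illustrative only): adjacent SNPs are linked when the
    distance between them is at most max_fragment_len base pairs.
    """
    blocks = []
    for pos in sorted(het_positions):
        if blocks and pos - blocks[-1][-1] <= max_fragment_len:
            blocks[-1].append(pos)   # close enough: extend the current block
        else:
            blocks.append([pos])     # gap too large: start a new block
    return blocks


# Hypothetical positions (bp); the 255 kb gap exceeds the fragment length.
positions = [1_000, 20_000, 45_000, 300_000, 310_000]
print(split_into_blocks(positions, max_fragment_len=100_000))
```

In this toy case the assembly splits into two blocks at the marker-poor stretch, which is the kind of break Kitzman described. Unmappable repeats would have a similar effect by removing usable markers from a region.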
Compared to Illumina's Moleculo method, which sequences DNA fragments of around 10 kilobases with short reads and stitches the data together into long reads, the haplotyping approach "may be better suited for phasing resequencing data against a high-quality reference genome, whereas Moleculo may be better suited for de novo assembly and other related applications," he said.
It is unclear whether Illumina plans to make the method and protocols commercially available to its customers, or whether it plans to integrate it into its human whole-genome sequencing service for research or its Individual Genome Sequencing service for clinicians. The company declined to comment for this article.
In their paper, led by Illumina scientist Jian-Bing Fan, the researchers first demonstrate proof of concept for their method by performing targeted sequencing of a 1-megabase region of the Duchenne muscular dystrophy gene, which is located on the X chromosome, in two male DNA samples and in a mix of the two samples. Because each male genome has only one X chromosome, the haplotype of each sample can be determined by sequencing the samples separately and then compared to the result from the combined sample.
The researchers diluted the combined sample and distributed the DNA into 96 wells, so that each one would contain 0.2 haploid copies. Each aliquot was then amplified by MDA and converted into a barcoded sequencing library. The libraries were pooled, enriched for the DMD-gene region, and sequenced on an Illumina GAIIx.
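Why such a low input per well matters can be shown with a back-of-the-envelope calculation. The sketch below is not from the paper. It assumes that the number of fragments covering a given locus in a well is Poisson-distributed, that the two haplotypes contribute equally (half of the stated haploid copies each), and that they are sampled independently. The function name is hypothetical.

```python
import math


def mixed_well_fraction(haploid_copies_per_well):
    """Among wells that cover a locus, the fraction holding BOTH haplotypes.

    Assumption (illustrative): fragment counts per haplotype are independent
    Poisson variables, each with mean haploid_copies_per_well / 2.
    """
    per_haplotype_mean = haploid_copies_per_well / 2
    p_one_haplotype = 1 - math.exp(-per_haplotype_mean)
    p_any_coverage = 1 - math.exp(-haploid_copies_per_well)
    return p_one_haplotype ** 2 / p_any_coverage


for copies in (0.2, 0.4):
    print(f"{copies} haploid copies per well: "
          f"{mixed_well_fraction(copies):.1%} of covered wells are mixed")
```

Under these assumptions, about 5 percent of the wells that cover a locus would contain both haplotypes at 0.2 haploid copies per well, and about 10 percent at 0.4 copies. In the remaining wells, reads from that locus derive from a single haplotype, which is what allows them to be phased.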
Overall, they called 1,210 heterozygous SNPs in the DMD region, of which they were able to phase 1,200 into one of two haplotype blocks, one 303 kilobases and the other 687 kilobases in length. After comparing the phased haplotypes to the known haplotypes from sequencing the two individual samples, they found that the phasing accuracy was 99 percent for the longer and 97 percent for the shorter haplotype block.
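As a rough illustration of these figures, the sketch below computes the phased fraction from the reported counts and shows one plausible way to score a phased block against known haplotypes. The scoring rule is an assumption, since the article does not describe the paper's exact metric: haplotype labels are arbitrary, so a block is scored against both the truth and its complement, and the better match is taken. The function name and the 10-SNP toy block are hypothetical.

```python
def phasing_accuracy(called, truth):
    """Fraction of SNPs in one block whose phase agrees with the truth.

    called, truth: equal-length strings of '0'/'1', one character per
    heterozygous SNP, giving the haplotype each alternate allele sits on.
    Labels are arbitrary, so the complementary assignment counts as well.
    """
    if not called or len(called) != len(truth):
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(c == t for c, t in zip(called, truth))
    return max(matches, len(called) - matches) / len(called)


# Reported counts: 1,200 of 1,210 heterozygous SNPs were phased.
print(f"phased fraction: {1200 / 1210:.1%}")

# Toy block (hypothetical): labels are flipped relative to the truth and
# one SNP is mis-phased, so 9 of 10 agree.
print(f"toy accuracy: {phasing_accuracy('0011010110', '1100101000'):.0%}")
```

The reported counts work out to roughly 99.2 percent of heterozygous SNPs phased. A switch-error count, which penalizes each change of phase between adjacent SNPs, would be an alternative metric.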
Following the targeted sequencing experiment, the scientists switched to whole-genome haplotyping, applying their method, with several protocol modifications, to an African Yoruban HapMap sample, NA18506, and then to a European HapMap sample, NA12878.
For the Yoruban sample, they diluted the DNA to 0.4 haploid copies per well and sequenced a total of 192 aliquots. They achieved 81.6-fold average genome coverage, covering 90.3 percent of the genome. They called about 3 million heterozygous SNPs, of which they phased 95.6 percent into about 9,200 haplotype blocks. The average block size was 264 kilobases, with an N50 of 702 kilobases.
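The gap between the average block size and the N50 reflects how the two statistics are defined. N50 is the block length at which blocks of that size or larger contain at least half of the total phased length, so a few long blocks pull it above the mean. A minimal sketch with made-up block lengths (not the paper's data):

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths.

    N50 is the length L such that blocks of length >= L together
    account for at least half of the summed length of all blocks.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical haplotype block lengths in kilobases.
blocks_kb = [700, 500, 300, 200, 100, 100, 100]
print("mean:", round(sum(blocks_kb) / len(blocks_kb), 1))
print("N50:", n50(blocks_kb))
```

For these toy values the mean is about 286 kilobases while the N50 is 500 kilobases, the same direction of difference seen in the reported results.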
For the European genome, they also sequenced 192 aliquots, each at 0.4 haploid copies of genomic DNA, on the HiSeq 2000. In total, they obtained 115-fold average genome coverage and covered 90.3 percent of the genome. Of about 2.1 million heterozygous SNPs, they phased 98.5 percent and generated about 10,500 haplotype blocks, with a mean block size of 221 kilobases and an N50 of 542 kilobases.
According to the researchers, an "important challenge" is the bias introduced by MDA, which favors certain loci over others. Increasing the number of aliquots, however, can make up for differences in average read depth and reduce allelic dropout.
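The effect of aliquot number on allelic dropout can be sketched with a toy model. This is an illustration, not the authors' analysis. It assumes Poisson sampling of fragments into wells, equal contribution from the two haplotypes, and a fixed, hypothetical probability that amplification bias leaves a well without usable reads at a locus even though the fragment is present. The function name and the 50 percent failure rate are invented for the example.

```python
import math


def haplotype_dropout_probability(n_aliquots, haploid_copies_per_well,
                                  per_well_failure):
    """Probability that one haplotype at a locus is seen in no well at all.

    Assumptions (illustrative): a well holds the haplotype with probability
    1 - exp(-copies / 2); if present, amplification bias independently
    leaves it unusable with probability per_well_failure.
    """
    p_present = 1 - math.exp(-haploid_copies_per_well / 2)
    p_usable = p_present * (1 - per_well_failure)
    return (1 - p_usable) ** n_aliquots


for n in (24, 96, 192):
    p = haplotype_dropout_probability(n, 0.4, per_well_failure=0.5)
    print(f"{n} aliquots: dropout probability {p:.2e}")
```

In this model the dropout probability falls geometrically as aliquots are added, because each extra well is another independent chance to capture and amplify the haplotype.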
They also found that larger numbers of dilution aliquots increased the average haplotype block size significantly. Other improvements in phasing accuracy and haplotype block length could come from using longer template DNA fragments and from using custom-designed rather than public algorithms for data analysis, they wrote.
Overall, the researchers were able to either fully or partially phase about 91 percent of genes, where they said the "majority of phasing interest lies." This is because heterozygous SNPs on the same allele affect the same transcript and the same protein, which can have phenotypic and clinical implications.