NEW YORK (GenomeWeb News) – In a pair of papers appearing online yesterday in Nature Biotechnology, two research teams from the University of Washington and Stanford University outlined their new strategies for haplotyping human genomes.
In the first of these, University of Washington genomics researcher Jay Shendure and his colleagues used large-insert cloning to resolve the haplotypes in the genome of a Gujarati Indian woman, the first individual from the Indian subcontinent to have her genome sequenced. Although genotypes are notoriously difficult to impute in this genetically diverse population, the researchers explained, their method placed almost all of the woman's heterozygous SNPs into haplotype blocks.
Only two other human genomes have been sequenced in this haplotype-resolved fashion, Shendure told GenomeWeb Daily News: the original human genome, which was sequenced BAC by BAC and derived largely from one individual, and Craig Venter's genome, which was haplotype-resolved using a combination of long Sanger reads and long inserts.
In contrast, the short reads and insert sizes of second-generation sequencing make it harder to haplotype genomes directly, Shendure and his co-authors explained, noting that "the short read-lengths and paucity of contiguity information are such that it remains challenging to determine haplotypes at a genome-wide scale."
While phasing methods that rely on linkage disequilibrium patterns in population data, on pedigree data, or on other data can help resolve haplotypes, Shendure explained, there has been no direct method for genome-wide haplotyping using second-generation sequencing.
"Haplotype inference is great when you have good linkage disequilibrium and when your variants are common," he said. But the ability to infer haplotypes accurately breaks down quickly when variants are rare or linkage disequilibrium is weak, as is the case in populations with relatively high genetic diversity.
To address this problem, the researchers first used the Illumina HiSeq platform to carry out whole-genome shotgun sequencing of a female Gujarati Indian HapMap participant from Houston, to a depth of about 15-fold.
"This person in particular appeared to have ancestry not only from what's called the ancestral north Indian group and the ancestral south Indian group, but also some other group," Shendure noted. "So this is the sort of person in which having haplotype resolved information would be useful for population genetics."
After this sequencing step, the team was left with three to four million variants whose haplotypes were unresolved, Shendure explained. They then combined these whole-genome data with data from a fosmid library generated from the same individual. The library was subdivided into about 100 barcoded pools, each representing some three percent of the diploid genome, and the pools were sequenced on the Illumina Genome Analyzer IIx.
"Because it's only three percent representation, the odds at any given location that both chromosomes will be represented is very low," Shendure said. "So each pool of fosmids is essentially sampling a haploid subset of the genome."
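Shendure's point about the odds can be checked with a back-of-envelope calculation. The Python sketch below is a simplified model, not the authors' calculation: it assumes that, at any locus, each of the two homologs is independently present in a pool with probability equal to the pool's share of the diploid genome (about three percent). Under that assumption, both homologs appear together at only about 0.09 percent of loci, and even among loci the pool covers at all, only about 1.5 percent carry both homologs.

```python
# Hypothetical back-of-envelope model, not the authors' calculation.
POOL_FRACTION = 0.03  # each pool holds ~3% of the diploid genome (per the article)


def homolog_sampling(p):
    """Return (P(both homologs present), P(both | at least one present)) at a
    locus, assuming each homolog is independently present with probability p."""
    both = p * p
    at_least_one = 1 - (1 - p) ** 2
    return both, both / at_least_one


p_both, p_both_given_any = homolog_sampling(POOL_FRACTION)
print(f"P(both homologs in pool)       = {p_both:.4f}")
print(f"P(both | locus covered at all) = {p_both_given_any:.4f}")
```

The conditional probability simplifies to p / (2 - p), which is why shrinking the pool fraction makes each pool look ever more haploid.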
Because the method relies on pooled fosmids, the team avoided the time and expense of picking individual fosmid clones, lead author Jacob Kitzman, a graduate student in Shendure's University of Washington lab, told GWDN.
By combining the two datasets, the team assembled haplotype blocks encompassing some 94 percent of the known heterozygous SNPs in the woman's genome. They also gleaned information about duplicated and structurally variant genes using read depth, array, and other data.
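To illustrate how haploid fosmid data can be stitched into haplotype blocks, here is a minimal Python sketch. It is not the authors' algorithm; it swaps in a simple parity union-find. Each clone that spans two heterozygous SNPs records whether the alleles it shows at those SNPs are equal or different on that one haplotype, and overlapping clones chain those links into blocks. The input format, positions, and all names are hypothetical, and a real pipeline would weigh conflicting evidence rather than merely counting it.

```python
from collections import defaultdict


class ParityUnionFind:
    """Union-find over SNP positions that also records each SNP's parity
    (0 = same allele index, 1 = opposite) relative to its parent."""

    def __init__(self):
        self.parent = {}
        self.parity = {}  # parity of each node relative to its parent

    def find(self, x):
        """Return (root, parity of x relative to root), registering x if new."""
        if x not in self.parent:
            self.parent[x], self.parity[x] = x, 0
        if self.parent[x] == x:
            return x, 0
        root, p = self.find(self.parent[x])
        # Path compression: attach x directly to the root, folding parities.
        self.parent[x] = root
        self.parity[x] ^= p
        return root, self.parity[x]

    def union(self, a, b, rel):
        """Link a and b with parity rel; return False if it contradicts earlier links."""
        ra, pa = self.find(a)
        rb, pb = self.find(b)
        if ra == rb:
            return (pa ^ pb) == rel
        self.parent[rb] = ra
        self.parity[rb] = pa ^ pb ^ rel
        return True


def phase_blocks(fragments):
    """fragments: one dict per haploid clone, het-SNP position -> allele (0 ref, 1 alt).
    Returns (blocks, conflicts); each block maps position -> allele on one
    haplotype, arbitrarily giving the block's root SNP the ref allele."""
    uf = ParityUnionFind()
    conflicts = 0
    for frag in fragments:
        sites = sorted(frag)
        for site in sites:
            uf.find(site)  # register SNPs even if a clone covers only one
        for a, b in zip(sites, sites[1:]):
            # XOR of the two alleles seen together on one haplotype.
            if not uf.union(a, b, frag[a] ^ frag[b]):
                conflicts += 1  # e.g. both homologs in one pool, or a read error
    blocks = defaultdict(dict)
    for site in list(uf.parent):
        root, p = uf.find(site)
        blocks[root][site] = p
    return list(blocks.values()), conflicts


def fraction_phased(blocks, het_sites):
    """Share of heterozygous SNPs that sit in a block with at least one other SNP."""
    phased = {s for b in blocks if len(b) >= 2 for s in b}
    return len(phased & set(het_sites)) / len(het_sites)


# Hypothetical toy data: five clones covering seven het SNPs.
fragments = [
    {100: 0, 250: 1},   # clone 1
    {250: 1, 400: 1},   # clone 2, overlaps clone 1 at SNP 250
    {250: 0, 400: 0},   # clone 3, from the other homolog
    {900: 1, 1200: 0},  # clone 4, a separate region
    {5000: 1},          # clone 5 covers a single het SNP
]
blocks, conflicts = phase_blocks(fragments)
het_sites = [100, 250, 400, 900, 1200, 5000, 7000]  # no clone covers 7000
print(blocks, conflicts, fraction_phased(blocks, het_sites))
```

Here five of the seven heterozygous SNPs land in multi-SNP blocks; SNP 5000 is seen on only one clone with no partner, and SNP 7000 is never covered.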
Based on their findings so far, the team believes their haplotyping method may have applications for everything from population genetics to studies of rare Mendelian and recessive diseases. The team is currently exploring liquid handling steps that would further automate and streamline the process, Shendure explained.
Meanwhile, in the second Nature Biotechnology paper published online yesterday, Stanford University bioengineer Stephen Quake, a co-founder of Fluidigm and Helicos, and his colleagues outlined a microfluidics method for haplotyping individuals from single cells. The team demonstrated the feasibility of their approach, which they call direct deterministic phasing, or DDP, by haplotyping four individuals of European descent.
"We developed a microfluidic device capable of separating and amplifying homologous copies of each chromosome from a single human metaphase cell," Quake and his co-authors wrote. "Single nucleotide polymorphism array analysis of amplified DNA enabled us to achieve completely deterministic, whole-genome, personal haplotypes of four individuals."
For their part, Quake and his co-authors emphasized the potential applications of individual haplotyping in personalized medicine and pharmacogenomics, as well as its usefulness for studying complex traits, population genetics, and human migration.
The DDP strategy they developed uses a microfluidic device that first isolates individual metaphase cells and then separates and amplifies chromosomes from these cells.
By using a protease to release the chromosomes from a metaphase cell and partitioning them into several dozen compartments, they explained, the researchers were able to amplify individual chromosomes. By combining the amplified samples into two pools, each containing one copy of every chromosome, they could then genotype the homologous chromosomes in each pool separately using the Illumina HumanOmni1-Quad BeadChip array.
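Why genotyping the two pools separately yields phase directly, with no statistical inference, can be sketched in a few lines of Python. This is a simplified, hypothetical model, not the authors' pipeline: it assumes each SNP has a diploid genotype (alt-allele count 0, 1, or 2) plus a haploid call from each of the two homolog pools, and it accepts a site only when the two pool calls add up to the diploid genotype. Site names, data, and the function name are made up for illustration.

```python
def phase_from_pools(diploid, pool_a, pool_b):
    """diploid: site -> alt-allele count (0, 1 or 2) from ordinary genotyping.
    pool_a / pool_b: site -> haploid alt-allele call (0 or 1) from each pool,
    where each pool holds one homolog of every chromosome.
    Returns (hap_a, hap_b, inconsistent_sites)."""
    hap_a, hap_b, inconsistent = {}, {}, []
    for site, g in diploid.items():
        if site not in pool_a or site not in pool_b:
            continue  # missing call in a pool: leave the site unphased
        a, b = pool_a[site], pool_b[site]
        if a + b != g:
            inconsistent.append(site)  # calls disagree with the diploid genotype
            continue
        hap_a[site], hap_b[site] = a, b  # each pool's call is one haplotype's allele
    return hap_a, hap_b, inconsistent


# Hypothetical toy data for five SNPs.
diploid = {"snp1": 1, "snp2": 1, "snp3": 2, "snp4": 0, "snp5": 1}
pool_a = {"snp1": 1, "snp2": 0, "snp3": 1, "snp4": 0, "snp5": 1}
pool_b = {"snp1": 0, "snp2": 1, "snp3": 1, "snp4": 1}
hap_a, hap_b, inconsistent = phase_from_pools(diploid, pool_a, pool_b)
print(hap_a, hap_b, inconsistent)
```

Because each pool carries only one homolog, every consistent heterozygous site is phased relative to all others on the same chromosome, which is the sense in which the phasing is "deterministic."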
Using this approach, the team successfully haplotyped three European individuals from a HapMap trio as well as the Quake genome, which was sequenced last fall.
They also showed that their strategy could directly detect recombination events and phase heterozygous deletions in the HapMap trio, and assess human leukocyte antigen (HLA) haplotypes in the individually sequenced genome.
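How directly phased haplotypes expose recombination in a trio can also be sketched. The Python below assumes, hypothetically, that a parent's two haplotypes and the haplotype the child inherited from that parent are all known at the same sorted SNP positions; a crossover then shows up wherever the child switches from copying one parental haplotype to the other at sites where the parent is heterozygous. This naive version, with made-up names and data, treats every switch as real, so a single genotyping error would appear as two spurious crossovers.

```python
def crossover_intervals(positions, hap1, hap2, transmitted):
    """positions: sorted SNP positions; hap1/hap2: one parent's two phased
    haplotypes (0/1 alleles); transmitted: the haplotype the child received
    from that parent. Returns (left, right) position pairs bracketing each
    switch between parental haplotypes."""
    origins = []  # (position, 1 or 2) at sites where the parent is heterozygous
    for pos, a, b, t in zip(positions, hap1, hap2, transmitted):
        if a == b:
            continue  # parent homozygous: cannot tell which copy was passed on
        origins.append((pos, 1 if t == a else 2))
    return [(p0, p1) for (p0, o0), (p1, o1) in zip(origins, origins[1:]) if o0 != o1]


# Hypothetical toy chromosome with six SNPs.
positions = [10, 20, 30, 40, 50, 60]
hap1 = [0, 0, 1, 1, 0, 1]
hap2 = [1, 0, 0, 1, 1, 0]
transmitted = [0, 0, 1, 1, 1, 0]  # copies hap1 up to 30, hap2 from 50 on
print(crossover_intervals(positions, hap1, hap2, transmitted))
```

The crossover can only be localized to the interval between the flanking informative sites, here between positions 30 and 50.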
"[W]e showed that amplified materials from separated chromosome homologs could be directly sequenced, yielding phasing information for variants, including the rare and private ones, which are absent on standard genotyping arrays," they concluded. "Combining DDP SNPs analysis with shotgun genome sequencing could allow the determination of the complete haplotype of an individual, even in the absence of family information."