NEW YORK (GenomeWeb) – A team from the Netherlands and Canada has demonstrated the feasibility of using its directional single-cell sequencing method, known as Strand-seq, to directly develop chromosome-length haplotypes with short-read sequence data using individual cells from one person.
As they reported in Genome Research this week, the researchers came up with an analytical pipeline called StrandPhase to phase variants in template DNA strands from individual cells, using Watson and Crick strand directionality to distinguish between maternal and paternal sequences before putting together consensus haplotypes from multiple individual cells.
Using this approach, the team did chromosome-level haplotyping for an individual previously profiled with her parents for a HapMap trio. In the absence of parental samples, Strand-seq data from almost 200 individual cells led to a haplotype that was more than 99 percent concordant with an existing reference. When Strand-seq data for a few hundred parental cells were included, meanwhile, it became possible to map meiotic recombination in the family.
"The analysis … shows that we can develop chromosome-length haplotypes without knowing the parents," senior author Peter Lansdorp, a researcher affiliated with the University of Groningen, the University of British Columbia, and the BC Cancer Agency, told GenomeWeb. "By doing the same with the parents, we can map where meiotic breakpoints and exchanges in the germline took place to give rise to the child."
Lansdorp noted that the analysis also uncovered subsets of cells with distinct haplotypes or with loss of heterozygosity, pointing to another potential application for Strand-seq in the clinic once the method becomes speedier and more affordable.
The general Strand-seq strategy involves selectively sequencing one strand of DNA from each homologous chromosome before mapping reads relative to the reference genome to distinguish between Watson and Crick strands based on read directionality.
The team grows cells in the presence of the DNA-labeling compound BrdU, which gets incorporated into new strands of sister chromatids during DNA synthesis. After mitosis, each BrdU-containing replicated strand can be taken out of the equation with ultraviolet light, which nicks the labeled DNA through photolysis.
This leaves one original template strand per chromosome from each parent available for sequencing. And for chromosomes represented by a Watson strand from one parent and a Crick strand from the other, the researchers explained, it becomes possible to bioinformatically tease apart distinct sequences from each parent with StrandPhase based on read directionality.
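As a rough illustration of that first step, the minimal sketch below classifies one chromosome in one cell by the share of reads assigned to each template strand. It is not the team's code: the function name `strand_state`, the "W"/"C" read labels, and the 20 and 80 percent cutoffs are assumptions made for the example.

```python
from collections import Counter


def strand_state(read_labels, wc_band=(0.2, 0.8)):
    """Classify one chromosome in one cell as 'WW', 'CC', or 'WC'.

    read_labels: iterable of 'W' or 'C', one label per mapped read,
                 already derived from read directionality (assumed input).
    wc_band:     hypothetical cutoffs on the Watson fraction; values
                 between them are called 'WC'.
    """
    counts = Counter(read_labels)
    total = counts["W"] + counts["C"]
    if total == 0:
        return None  # no reads, no call
    watson_fraction = counts["W"] / total
    low, high = wc_band
    if watson_fraction >= high:
        return "WW"  # both homologues contributed Watson templates
    if watson_fraction <= low:
        return "CC"  # both homologues contributed Crick templates
    return "WC"      # one Watson and one Crick template: usable for phasing


print(strand_state(["W"] * 5 + ["C"] * 5))
```

Only chromosomes called "WC" in a given cell separate the two homologues by read direction, which is why such a filter would come before any phasing.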
The analytical software starts by finding stretches of sequence in a given cell that are represented by a Watson and Crick strand. From there, haplotypes are incrementally built up for a cell based on phasing for SNPs on each strand of DNA — information that can be combined with phased variants from other cells to build up consensus haplotypes.
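The combining step can be sketched roughly as follows. This is a simplified stand-in for StrandPhase, not its actual implementation: the function `merge_cells`, the dictionary layout, and the majority-vote rule are assumptions for illustration. The key point is that the homologue carried by the Watson template differs from cell to cell, so each cell has to be oriented against the growing consensus before its alleles are counted.

```python
from collections import Counter, defaultdict


def merge_cells(cells):
    """Build two consensus haplotypes from per-cell strand-phased SNPs.

    cells: list of (watson, crick) pairs; each is a dict mapping a SNP
           position to the allele seen on that template strand in one cell.
    Returns (hap1, hap2) as dicts of position -> majority allele.
    """
    votes = (defaultdict(Counter), defaultdict(Counter))

    def consensus(vote_table):
        return {pos: c.most_common(1)[0][0] for pos, c in vote_table.items()}

    for watson, crick in cells:
        hap1 = consensus(votes[0])
        # Count matches to the current hap1 under both possible orientations.
        same = sum(hap1.get(pos) == allele for pos, allele in watson.items())
        flipped = sum(hap1.get(pos) == allele for pos, allele in crick.items())
        first, second = (crick, watson) if flipped > same else (watson, crick)
        for pos, allele in first.items():
            votes[0][pos][allele] += 1
        for pos, allele in second.items():
            votes[1][pos][allele] += 1

    return consensus(votes[0]), consensus(votes[1])


cells = [
    ({100: "A", 200: "G"}, {100: "T", 200: "C"}),
    ({200: "C", 300: "T"}, {200: "G", 300: "A"}),  # opposite orientation
    ({100: "A", 300: "A"}, {100: "T", 300: "T"}),
]
hap1, hap2 = merge_cells(cells)
print(hap1)
print(hap2)
```

In the toy input, no single cell covers all three SNPs, yet overlapping positions let the sparse per-cell calls be stitched into two full-length haplotypes, and repeated observations of the same allele reinforce each call.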
"[O]ur algorithm concatenates haplotype information from multiple single cells," the authors wrote, "reinforcing and validating the phased variants in a consensus haplotype for each homologue."
Lansdorp and colleagues from the BC Cancer Agency first described Strand-seq in a Nature Methods study in 2012. There, the directional single-cell sequencing approach was primarily applied to mapping sequence swaps between sister chromatids in dividing mouse embryonic stem cells, though there were hints that it could improve genome assembly accuracy.
Members of Lansdorp's team have since used Strand-seq to find improperly oriented regions spanning millions of bases in the human, mouse, Xenopus frog, pig, and zebrafish genomes. And in a Genome Research study published earlier this year, they described analytical methods that made it possible to track down polymorphic inversions in the genome with the help of Strand-seq.
"The [sequencing] method really hasn't changed," Lansdorp said. "We've now developed some of the applications that we anticipated in the earlier paper, including haplotyping."
In the latest study, the researchers set out to test their new StrandPhase analytical pipeline using an individual who has already been well haplotyped: a female dubbed NA12878, who was sequenced as part of a HapMap trio. The Strand-seq approach does not include a whole-genome amplification step before sequencing libraries are prepared, Lansdorp noted, which may prevent some phasing problems related to PCR-based biases or errors.
Using the Illumina HiSeq2500 instrument, the team sequenced a Strand-seq library from just one cell, generating sequence that spanned roughly 5 percent of the human reference genome. With these data, they could phase almost 78,000 variants from chromosomes in the cell that were represented by Watson and Crick template strands.
The researchers then expanded their analysis to include Strand-seq data for 183 cells from NA12878, uncovering almost 2.2 million SNPs, or nearly three-quarters of those previously identified in the HapMap reference genome. Of those, about 1.3 million SNPs fell within each of the consensus haplotypes produced with StrandPhase.
More than 99 percent of the phased single nucleotide variants assembled in these haplotypes matched with those reported previously, the team noted, though the analysis also unearthed examples of haplotype switches suspected of stemming from homozygous inversions.
The remaining variants in the StrandPhase-based NA12878 haplotype, almost 24,000, diverged from the HapMap reference. More than half the time, these mismatched variants turned up in multiple single cells, prompting the researchers to suggest that at least some of the haplotype discordance was due to polymorphic inversions, mutations, or errors in the reference.
Indeed, they found that their Strand-seq-based haplotypes were more than 99 percent concordant with phasing patterns present in long Pacific Biosciences RNA sequence reads generated previously for NA12878. And the StrandPhase approach uncovered 42 of 49 known de novo germline mutations in the HapMap individual.
"While our approach requires preparation of single-cell libraries," the authors wrote, "it circumvents the need for generational information and rapidly builds accurate whole chromosome haplotypes."
A few other methods for parental sample-free haplotyping have been proposed, Lansdorp noted, including computational predictions based on SNP patterns, long-read sequencing, or strategies based on isolating and sequencing individual chromosomes.
Given repetitive regions and palindromic sequences in the human genome, though, it can still be tricky to phase long stretches of sequences, particularly using the shorter reads produced with affordable sequencing technologies such as Illumina, he explained, noting that "Strand-seq is probably going to be very useful in combination with those approaches."
When the researchers expanded their haplotype analysis of NA12878 to include 233 Strand-seq cells from the individual's father and 267 from her mother, they were able to produce haplotypes for both parents that not only revealed the parent of origin for NA12878's heterozygous SNPs, but also provided a look at dozens of meiotic recombinations in the family.
The available Strand-seq data were used to phase just under 23 percent of small insertions and deletions in the trio, though the indels that were phased matched very closely with phasing patterns reported in the past. The method also showed promise for picking up larger structural variants, including inversions, as well as mosaic recombination events affecting some, but not all, cells.
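One way to picture the recombination mapping is shown in the sketch below, which assumes chromosome-length haplotypes are already in hand for a parent and the child. It is an illustrative simplification rather than the authors' method, and the function `recombination_breakpoints` and its inputs are hypothetical. A crossover shows up as the interval where the child's inherited haplotype stops matching one parental homologue and starts matching the other.

```python
def recombination_breakpoints(child_hap, parent_h1, parent_h2):
    """Locate switches between a parent's two homologues along a child haplotype.

    Each argument is a dict mapping SNP position -> allele. Returns a list of
    (left_pos, right_pos) intervals that bracket each inferred crossover.
    """
    breakpoints = []
    prev_source, prev_pos = None, None
    for pos in sorted(child_hap):
        a1, a2 = parent_h1.get(pos), parent_h2.get(pos)
        if a1 is None or a2 is None or a1 == a2:
            continue  # site does not distinguish the parental homologues
        allele = child_hap[pos]
        if allele == a1:
            source = 1
        elif allele == a2:
            source = 2
        else:
            continue  # matches neither homologue (error or new mutation)
        if prev_source is not None and source != prev_source:
            breakpoints.append((prev_pos, pos))
        prev_source, prev_pos = source, pos
    return breakpoints


parent_h1 = {10: "A", 20: "C", 30: "G", 40: "T"}
parent_h2 = {10: "G", 20: "T", 30: "A", 40: "C"}
child_hap = {10: "A", 20: "C", 30: "A", 40: "C"}
print(recombination_breakpoints(child_hap, parent_h1, parent_h2))
```

In this toy case the child follows the first parental homologue through position 20 and the second from position 30 onward, so the crossover is bracketed by those two SNPs. Resolution in such a scheme is limited by the spacing of informative heterozygous sites.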
Based on their findings, Lansdorp and his colleagues called the method "a unique and powerful approach to completely phase individual genomes and map inheritance patterns in families, while preserving haplotype differences between single cells."
The team is continuing to explore strategies to improve the throughput and cost of Strand-seq in the hopes of making it more amenable to medical genetic and tumor biology applications in the clinic. At the moment, for example, the BrdU incorporation step typically requires a cell culture step prior to single cell isolation.
To ramp up the Strand-seq throughput, Lansdorp explained, the investigators are considering everything from chemistry tweaks to reactions done in droplets or other small volumes. They are also assessing the number of Strand-seq libraries that need to be sequenced, depending on the application and the availability of complementary sequence data.
"I imagine we'll probably end up [requiring] somewhere between 30 and 100 cells that are sequenced more shallow if all we want to do is break the gaps left by other sequencing technologies," Lansdorp said.
So far, the researchers have relied on Illumina instruments for their Strand-seq experiments, though the approach is expected to be compatible with other sequencing technologies.