Several research groups have been exploring the Pacific Biosciences RS sequencer for characterizing repeat structures in the human genome, applying the platform's long reads to defined target areas.
At the Personal Genomes and Medical Genomics conference at Cold Spring Harbor Laboratory last month, David Mittelman, an associate professor at the Virginia Bioinformatics Institute at Virginia Tech, presented a PacBio assay for measuring genomic repeat instability, and Mark Wang from Baylor's Human Genome Sequencing Center reported on long-insert capture PacBio sequencing to resolve structural rearrangements involving low copy repeats.
Mittelman has been studying expanded triplet repeat tracts that are involved in more than a dozen inherited disorders, such as Huntington's disease and spinocerebellar ataxias. These tracts can be hundreds of bases long, making them unsuitable for short-read sequencing. They also vary between cells due to somatic instability, a feature that Sanger sequencing cannot capture adequately because it generates reads averaged over many cells.
PacBio sequencing, on the other hand, "gives you an entire spread" of repeat lengths, Mittelman said. And although there are other methods available for analyzing fragments of different lengths, having the actual DNA sequence ensures that only the repeats and no PCR artifacts are studied.
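To illustrate what such a spread of repeat lengths looks like in practice, here is a minimal Python sketch. It is not the group's pipeline: the function names are hypothetical, and it assumes error-free reads in which the repeat appears as an uninterrupted run of CAG units. Real PacBio reads carry insertion and deletion errors, so an actual analysis would more likely measure the distance between aligned flanking sequences.

```python
import re
from collections import Counter


def longest_repeat_run(read, unit="CAG"):
    """Return the number of units in the longest uninterrupted run of `unit`.

    Simplifying assumption: exact matches only, no tolerance for read errors.
    """
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    run_lengths = [m.end() - m.start() for m in pattern.finditer(read.upper())]
    return max(run_lengths, default=0) // len(unit)


def repeat_length_spread(reads, unit="CAG"):
    """Tally how many reads show each repeat count (the 'spread')."""
    return Counter(longest_repeat_run(read, unit) for read in reads)


if __name__ == "__main__":
    # Toy reads, invented for illustration only.
    toy_reads = ["TTCAGCAGCAGAA", "GGCAGCAGTT", "ACGT"]
    for repeat_count, n_reads in sorted(repeat_length_spread(toy_reads).items()):
        print(repeat_count, n_reads)
```

Because every read contributes its own count, the tally keeps the cell-to-cell variation that an averaged Sanger trace would blur.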
At the moment, Mittelman and his colleagues at Stanford University amplify the repeats by PCR, introducing barcodes at the beginning of the process to distinguish true somatic variation from PCR amplification errors.
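The idea behind the barcodes can be sketched roughly as follows. This is an assumption-laden illustration, not the team's method: it supposes each read has already been reduced to a hypothetical (barcode, repeat count) pair, and that reads sharing a barcode descend from the same original molecule, so disagreement within a barcode group reflects PCR or sequencing error while differences between groups reflect real variation.

```python
from collections import Counter, defaultdict


def collapse_by_barcode(reads):
    """Map each barcode to the most common repeat count among its reads.

    `reads` is an iterable of (barcode, repeat_count) pairs. Ties are broken
    in favor of the smaller repeat count so the result is deterministic.
    """
    groups = defaultdict(list)
    for barcode, repeat_count in reads:
        groups[barcode].append(repeat_count)

    consensus = {}
    for barcode, counts in groups.items():
        tally = Counter(counts)
        # Highest frequency first; on a tie, prefer the smaller count.
        best_count, _ = max(tally.items(), key=lambda kv: (kv[1], -kv[0]))
        consensus[barcode] = best_count
    return consensus


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_reads = [("AAA", 40), ("AAA", 40), ("AAA", 39), ("CCC", 55), ("CCC", 55)]
    print(collapse_by_barcode(toy_reads))
```

In this toy case the lone 39-repeat read under barcode AAA is outvoted and treated as an artifact, while the difference between the 40-repeat and 55-repeat molecules survives as candidate somatic variation.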
"There is tremendous potential for using PacBio to capture these long repetitive regions and look at variability" between cells, he told In Sequence, noting that he is not concerned much about the platform's high single-read error rate because his team is primarily interested in the size of the repeats.
His group is not the only one exploring the PacBio for studying triplet repeats: last month, a group from the University of California, Davis, published a paper in which they used the platform to characterize expanded repeats in fragile X syndrome (IS 10/23/2012).
Mittelman and his coworkers developed their assay as part of their work on using engineered nucleases to disrupt or shrink expanded repeat tracts permanently — a collaboration with Matt Porteus at Stanford University. They turned to PacBio sequencing because they needed a way to measure the effect of the nuclease on repeat size.
In 2009, Mittelman published a proof-of-concept study in PNAS, showing that zinc finger nucleases can recognize and cleave CAG repeat sequences. Now, "we're trying to cut the repeat and push the cell to essentially chop up the repeat and shrink it," he explained. Their plan is to build a model system for exploring the impact of different size repeats on disease. In the distant future, the approach might also be used as a new type of gene therapy to target expanded repeats directly in patients. "If you can shrink the really big allele, then you can maybe delay the onset of the disorder," he said.
Besides tracking the effect of nucleases on repeat size, the PacBio assay will be a good method for studying genetic mosaicism in general, Mittelman said. "There are lots of sequences that are just highly variable, even in your own body," and being able to study this genomic instability between and within tissues will be very useful, he said.
"It's one of many ways that PacBio can establish itself as a leader in an area that isn't really dominated yet," he said. "The future is going to be in studying the genetic diversity within one individual, and this is one way you can do it."
The assay could be further improved by moving from PCR amplification of the repeats to capturing them directly, which would require only a small amount of amplification afterward.
"It would be great to take tissues from real patients and be able to capture the DNA we need, and run it through the PacBio," Mittelman said, noting that this would cut down on noise in the data.
Targeting Low Copy Repeats
Meanwhile, a team at the Human Genome Sequencing Center at Baylor College of Medicine has been working on long-insert capture for PacBio sequencing.
Structural rearrangements, such as duplications and deletions, that involve large low copy repeats are difficult to resolve using Illumina sequencing but might be amenable to targeted PacBio sequencing, so Mark Wang and his colleagues set out to combine long-insert capture with PacBio.
To construct the long-insert libraries, they first capture defined DNA targets and then add SMRT adaptors to the capture products for sequencing.
In one project, in collaboration with Baylor's Jim Lupski, they have been studying a complex deletion in Smith-Magenis syndrome, which involves large near-identical low copy repeat regions flanking the deletion area. The goal is to characterize single nucleotide variants in this region by Illumina sequencing, and to resolve the structure of the deletion using PacBio sequencing.
For the study, they designed probe sets, in collaboration with Roche/NimbleGen, to cover the 7-megabase deletion region. Starting with a microgram of DNA, they then constructed a 1.3-kilobase long-insert capture library, which they sequenced on the PacBio instrument. The mean mapped subread length was 770 base pairs, the N50 mapped subread length was more than 900 base pairs, and 65 percent of the reads were on target, which Wang called "quite good."
Subsequently, they constructed a 4.5-kilobase long-insert library using the same DNA, which led to a "dramatic improvement" in mean mapped subread length, to about 2,200 base pairs, and an N50 mapped subread length of about 3,700 base pairs. About 70 percent of reads were on target, and the mapping rate increased from 81 percent to 88 percent with the longer subreads.
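For readers unfamiliar with the read-length summaries quoted for the two libraries, the short Python sketch below shows how a mean and an N50 are typically computed from a list of mapped subread lengths. N50 follows its usual definition: the length at which reads that long or longer account for at least half of all bases. The function names and the toy lengths are illustrative assumptions, not the Baylor data.

```python
def mean_length(lengths):
    """Arithmetic mean of the read lengths."""
    if not lengths:
        raise ValueError("no read lengths supplied")
    return sum(lengths) / len(lengths)


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_of_bases = sum(ordered) / 2
    running_total = 0
    for length in ordered:
        running_total += length
        if running_total >= half_of_bases:
            return length
    raise ValueError("no read lengths supplied")


if __name__ == "__main__":
    # Toy lengths in base pairs, invented for illustration only.
    toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
    print(mean_length(toy_lengths), n50(toy_lengths))
```

Because N50 weights each read by the bases it contributes, a handful of long reads can pull it well above the mean, which is consistent with the N50 exceeding the mean in both libraries.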
In order to analyze the breakpoint of the deletion and obtain a "clear picture of the repeat structure," the Baylor team plans to generate more data. It also hopes to be able to capture even longer pieces of DNA to improve the mapping rate further. "Especially for this complex region, the longer reads will help," Wang said.