By Julia Karow
A little less than a year after the full commercial launch of Pacific Biosciences' PacBio RS sequencer, customers and collaborators are using the system for a variety of applications, including de novo assembly of genomes and transcriptomes in combination with other types of sequencing data, variant validation, analyses of trinucleotide repeat sequences, genome methylation analysis, and targeted cancer gene resequencing.
At the Advances in Genome Biology and Technology meeting in Marco Island, Fla., last month, researchers discussed their use of the platform in conference talks, poster presentations, and during a company-organized workshop.
One popular application of PacBio's long single-molecule reads appears to be the de novo assembly of genomes and transcriptomes, usually in combination with other types of sequencing data.
Adam Phillippy's group at the National Biodefense Analysis and Countermeasures Center in Frederick, Md., for example, has developed a hybrid error-correction and de novo assembly method that maps short reads from 454, Illumina, or Ion Torrent to the error-prone PacBio reads, improving their accuracy from about 85 percent to up to 99.9 percent. The team then assembles the corrected PacBio reads using a new version of the Celera assembler (IS 1/24/2012).
Phillippy and his colleagues have applied this strategy to several genomes, including a bacterial genome, which they were able to assemble into a single contig; the genome of a parakeet; and the transcriptome of corn.
Erich Jarvis from the Duke University Medical Center, who collaborated with Phillippy on the parakeet genome, noted that using a combination of 454 data and error-corrected PacBio reads doubled the N50 contig size, compared to other combinations of sequence data that did not use PacBio.
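For readers unfamiliar with the metric, N50 is the contig length at which contigs of that length or longer account for at least half of the total assembled bases. The following is a minimal sketch of that calculation, not the code used by either group; the function name `n50` and the contig lengths are made up for illustration and are not data from the parakeet assembly.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
fragmented_assembly = [40, 30, 20, 10, 10, 10]
more_contiguous_assembly = [70, 30, 20]

print(n50(fragmented_assembly))
print(n50(more_contiguous_assembly))
```

Both toy assemblies contain the same total number of bases, so the higher N50 of the second reflects only that its bases sit in longer contigs.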
While PacBio data is difficult to use on its own because of its high error rate, Phillippy said, it is powerful in combination with other types of sequence reads. He said he sees future applications for the technology in genome finishing, the assembly of complex eukaryotic genomes, haplotype phasing, and the analysis of mixed samples.
However, the platform's reliability, throughput, accuracy, and cost remain challenges, he noted, as does the large amount of starting DNA required to construct long-read libraries.
Others have been exploring the PacBio for hybrid de novo assemblies of genomes as well. David Jaffe from the Broad Institute, for example, presented the assembly of near-finished prokaryotic genomes from Illumina and PacBio reads, and researchers from the Department of Energy Joint Genome Institute reported a similar application. Gregory Harhay's team from the US Department of Agriculture's Meat Animal Research Center has explored a combination of PacBio and 454 data to produce reference genome sequences of bacteria.
A group at Cold Spring Harbor Laboratory, meanwhile, has used a combination of Illumina and PacBio data to assemble the genomes of yeast and other eukaryotes.
Other groups, including researchers at Baylor College of Medicine led by Adam English and at the JGI led by Cliff Han, have been improving and, in some cases, finishing existing reference genomes with the help of PacBio data.
Targeted validation of variants found by other sequencing technologies is another area where groups have found use for the PacBio platform.
Mark DePristo from the Broad Institute, for example, said that the institute is already using PacBio as a standard validation technology for human resequencing projects, including the 1000 Genomes Project. He said that the platform's long reads are useful for invalidating miscalled variants that result from mismapped short reads.
Large-scale resequencing projects, including human exome sequencing, require several hundred variants to be validated in hundreds of samples, he said, and PacBio is a good complement to other validation technologies, such as Sequenom's MassArray or Illumina's MiSeq.
Cold Spring Harbor Laboratory has also tested the use of the PacBio, as well as the MiSeq, for validating SNPs, according to a poster presentation.
While many users have been applying PacBio in concert with other sequencing data, others have explored the platform's unique capabilities.
Paul Hagerman from the University of California, Davis, School of Medicine, for example, has been collaborating with PacBio to sequence expanded CGG trinucleotide repeats in the 5' untranslated region of the human FMR1 gene. More than 200 CGG repeats inactivate the gene, resulting in fragile X mental retardation, while between 55 and 200 repeats lead to fragile X-associated tremor/ataxia syndrome.
Up until now, no DNA sequencing technology has been able to sequence through more than about 100 CGG repeats, he said, though it is important to determine their precise number for disease studies as well as for genetic counseling.
Using PacBio's circular consensus sequencing mode, Hagerman and his colleagues were able to sequence up to 750 PCR-amplified CGG repeats, he reported, the largest CGG element sequenced so far. Future goals include increasing the fraction of reads that exceed 10 kilobases and sequencing the expanded CGG repeats without PCR amplification or cloning steps.
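To make the repeat-count thresholds described above concrete, here is a rough sketch of how a repeat count could be read off a sequence and binned. It assumes an error-free consensus sequence with an uninterrupted CGG tract, which real reads will not always provide. The function names are hypothetical, and the handling of counts of exactly 55 and exactly 200 is an assumption, since the article gives only the ranges.

```python
import re


def longest_cgg_run(sequence):
    """Count the CGG units in the longest uninterrupted CGG tract."""
    runs = re.findall(r"(?:CGG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_repeat_count(n):
    """Bin a CGG repeat count using the ranges quoted in the article.

    Boundary handling (55 and 200 inclusive in the middle bin) is an
    assumption made for this sketch.
    """
    if n > 200:
        return "more than 200 repeats"
    if n >= 55:
        return "55 to 200 repeats"
    return "fewer than 55 repeats"


# Hypothetical consensus read: flanking bases around 60 CGG units.
read = "AT" + "CGG" * 60 + "TA"
count = longest_cgg_run(read)
print(count, classify_repeat_count(count))
```

A production tool would need to tolerate sequencing errors and interruptions within the tract; this sketch only illustrates the counting and thresholding steps.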
Other research teams have been exploiting the capability of the PacBio platform to distinguish methylated from unmethylated DNA. Richard Roberts from New England Biolabs, for example, in a collaboration with PacBio, has used this feature to study the specificity of several bacterial DNA methyltransferases.
The platform will also be useful for determining the methylome of microbial genomes, he said, and possibly, in the future, of eukaryotic genomes.
Matt Waldor from Harvard University Medical School, also in collaboration with PacBio, has determined genome-wide methylation in the German E. coli outbreak strain from 2011, and has been working with Roberts to determine which methylases are responsible for the methylation.
Waldor said that the ability to use the PacBio platform in the future to detect not only methylation but also other types of DNA modifications will open up a new field of biology.
Finally, at least one research group has started to use the PacBio for clinical applications. John McPherson from the Ontario Institute for Cancer Research provided an update on a feasibility study in which he and his colleagues used the PacBio platform to sequence a panel of 19 actionable cancer genes in patients with advanced metastatic cancer in order to help guide their treatment (CSN 8/3/2011).
They have already found actionable mutations in a number of patients, he said, four of whom have had their treatment changed as a result of the test.
While the pilot study is being conducted using PacBio's system, the researchers are now also considering the Ion Torrent platform, which requires less input DNA, he said.