Harnessing the capability of the Pacific Biosciences RS platform to sequence through long repetitive DNA stretches, researchers at the University of California, Davis, and PacBio have analyzed expanded CGG repeats in the fragile X syndrome gene.
The work, published online in Genome Research this month, could lead to more precise diagnosis of fragile X syndrome and related disorders, resulting in better prediction of clinical development and treatment response. In addition, the authors said the method lends itself to newborn screening for the disease.
"The paper is a proof-of-principle experiment, and it really opens things up for people to start looking at the clinical questions, and whether knowing the full sequence through these very large repeat regions changes prognosis or response to treatment," said Lisa Edelmann, director of the Mount Sinai Genetic Testing Laboratory in New York, who was not involved in the research.
Fragile X syndrome, a neurodevelopmental disorder that leads to intellectual disability, results from expansions of a CGG trinucleotide repeat in the 5' untranslated region of the fragile X mental retardation 1, or FMR1, gene. More than 200 CGG repeats, called the "full mutation range," cause fragile X syndrome, in which production of the FMR1 protein is shut down through gene silencing. The "premutation range," between 55 and 200 repeats, predisposes carriers to a different disorder called fragile X-associated tremor/ataxia syndrome.
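The repeat-size ranges described above can be expressed as a simple lookup. The following is a minimal sketch, not a clinical tool: the function name and the label for alleles below 55 repeats are hypothetical, and treating exactly 200 repeats as premutation is an assumption based on the "more than 200" wording.

```python
def classify_fmr1_allele(cgg_repeats: int) -> str:
    """Map a CGG repeat count to the ranges named in the article.

    Assumptions (not from the study itself): the function name, the
    handling of exactly 200 repeats, and the label used below 55.
    """
    if cgg_repeats < 0:
        raise ValueError("repeat count cannot be negative")
    if cgg_repeats > 200:
        # "More than 200 CGG repeats" is the full mutation range.
        return "full mutation"
    if cgg_repeats >= 55:
        # "Between 55 and 200 repeats" is the premutation range.
        return "premutation"
    # The article does not subdivide alleles below 55 repeats.
    return "below premutation range"


if __name__ == "__main__":
    # Allele sizes used in the study's test libraries.
    for size in (29, 36, 95, 100, 750):
        print(size, classify_fmr1_allele(size))
```

For a mosaic patient, each sequenced molecule could be classified this way, giving a per-range tally rather than a single call.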
Until now, it has been impossible to determine the precise length of the expanded repeats because existing DNA sequencing technologies cannot sequence through more than about 100 CGG repeats.
Fragment-sizing methods based on PCR or Southern blotting provide an estimate of the repeat expansion size, and yield some information about methylation status and repeat interruptions. But those methods are unable to detect minor alleles that are present in many patients and can make a difference in their disease development.
According to Paul Hagerman, a professor of biochemistry and molecular medicine at the UC Davis School of Medicine and the senior author of the study, most individuals with repeats in the full mutation range also have a number of different repeat sizes, or minor alleles, meaning they are mosaic, and those minor alleles "will, in some instances, determine clinical outcome."
For example, while alleles in the full mutation range will not produce any FMR1 protein, minor alleles in the premutation range might produce some protein, so the disease is less severe. "So understanding clinical outcome will be improved if we can really say what those alleles are," Hagerman said.
PacBio's single-molecule real-time sequencing platform has allowed the researchers to sequence through the repeat expansions, and to determine the nature of minor repeat species.
"When I first heard about this [technology], I realized that this could present us with an opportunity to actually size by sequencing," Hagerman said. And because the PacBio sequences single DNA molecules, "you can actually get sequences for essentially every representative length within that pool [of DNA] … and since you are sequencing individual molecules, you should be able to pick up 99-plus percent of all of these rare alleles."
"There are a lot of features of this gene in the way it behaves and its association with clinical involvement that we don't understand, and being able to sequence these alleles and look at the distribution of allele size and methylation status is going to be helped greatly by this SMRT sequencing capability," he said.
For their study, the researchers generated PacBio circular consensus sequencing libraries from previously sized FMR1 DNA, either DNA cloned in bacteria, containing alleles with 36 and 95 CGG repeats, or PCR-amplified DNA with 29, 100, or 750 CGG repeats. Using the PacBio RS at the UC Davis Genome Center, they sequenced each library at least in duplicate, and analyzed pooled data from all sequencing runs.
They were able to generate sequence data for all the CGG repeat elements, including the 750 repeats that extend "well into the full mutation repeat range" that is relevant for fragile X disease, according to the paper. They also picked up repeat-size distributions within the same sample, and they determined AGG interruptions of the CGG repeats, which can be medically important.
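To illustrate what "sizing by sequencing" and locating AGG interruptions mean in practice, here is a minimal sketch that counts triplets in a single read. It is not the authors' analysis pipeline: the function name is hypothetical, and it assumes an error-free read, whereas real single-molecule reads contain insertions and deletions and would likely need an alignment-aware approach.

```python
import re


def summarize_repeat_tract(read: str) -> tuple[int, list[int]]:
    """Return (number of triplets, 1-based positions of AGG units)
    for the longest uninterrupted run of CGG/AGG triplets in a read.

    Assumption: the read is error-free, so the tract stays in frame.
    """
    read = read.upper()
    # Find every maximal run of consecutive CGG or AGG triplets.
    runs = [m.group(0) for m in re.finditer(r"(?:CGG|AGG)+", read)]
    if not runs:
        return 0, []
    # Take the longest run as the repeat tract.
    tract = max(runs, key=len)
    units = [tract[i:i + 3] for i in range(0, len(tract), 3)]
    agg_positions = [i + 1 for i, unit in enumerate(units) if unit == "AGG"]
    return len(units), agg_positions


if __name__ == "__main__":
    # Toy read: 9 CGG, one AGG interruption, 9 CGG, with short flanks.
    toy_read = "TTGCA" + "CGG" * 9 + "AGG" + "CGG" * 9 + "TCGAT"
    print(summarize_repeat_tract(toy_read))
```

Running this over every molecule in a sample and tallying the triplet counts would give the kind of repeat-size distribution the paper describes.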
With at least three-fold single-molecule coverage, data accuracy in the sequences flanking the CGG repeats approached 100 percent, sufficient for making variant calls, they noted.
The scientists also generated kinetic data for DNA polymerase moving along the repetitive DNA and found the enzyme to be sensitive to local and regional sequence elements. Those results "establish a clear foundation for the detection of epigenetic modifications within the CGG-repeat region, which has not been possible at the nucleotide level for full mutation alleles that are epigenetically silenced in fragile X syndrome," they wrote.
The main limitation of the approach, Hagerman said, is the lack of a good target enrichment method, one that is more accurate than PCR and performs better on simple repeats. His lab is currently working on a non-PCR selection method, but he declined to provide further details at this time.
Hagerman's group is currently using the PacBio to further study the distribution of repeat expansion alleles and patterns of methylation in fragile X syndrome. Right now, they use bisulfite sequencing to look at methylation, but he hopes that direct methylation analysis will be possible on the PacBio in the future. "That's really developmental at this point; there is a lot of work that needs to be done to move that forward," he said.
The same method could be used to study repeat expansions associated with other diseases, such as myotonic dystrophy, Huntington's disease, Friedreich's ataxia, and amyotrophic lateral sclerosis-frontotemporal dementia, although Hagerman's group is not pursuing these at the moment.
His group is also looking at potential diagnostic applications of the approach, such as how CGG repeat size, minor alleles, and methylation patterns can predict the severity of the disease, or the outcome of therapeutic interventions.
A recent study, for example, showed that for a particular class of drugs, treatment outcome depends on whether patients have repeat expansions in the full mutation range or whether they are mosaic. "Detailed knowledge of the degree or extent and nature of mosaicism is going to be important also for predicting outcome," he said.
In addition, Hagerman is interested in applying PacBio sequencing to high-throughput newborn screening for fragile X syndrome. "It has potential to screen many thousands of individuals very rapidly, whereas existing methods can't touch that," he said. The disease is suitable for newborn screening because there are early interventions available, he added.
While diagnostic or screening applications of the method are "nothing we can offer tomorrow," he said, he is hopeful that they can be developed and validated "within a year or two."
Hagerman said he believes the PacBio platform lends itself to diagnostic testing in a CLIA environment, though this still needs to be proven. "Our goal is to utilize such instrumentation in a CLIA setting," he said. "To us, it's inevitable. We really want to push that." While the instrument is expensive, its cost could be amortized over "thousands of tests," he said.
Using the PacBio approach clinically for fragile X syndrome "would probably require additional research first," Mount Sinai's Edelmann said. The phenotype of the disease is variable, and this variability — for example in disease severity or response to treatment — may correlate with something in the sequence, such as AGG interruptions or other kinds of insertions, or minor alleles. "This is all speculative. This is a big open field," she said.
Hagerman said he is following the development of other sequencing methods, such as nanopore sequencing, that promise to have similar capabilities to the PacBio, but those haven't proven themselves yet. "I'm an opportunist, and if something comes along that's better, I'm really interested in it," he said, but "I need to see more genomic sequencing done with [nanopores] before I will be convinced."