NEW YORK (GenomeWeb) – Researchers from Rice University have developed a new method for creating modular hybridization probes that overcome some of the main challenges in detecting long, complex, hypervariable, or repetitive DNA sequences, whether by PCR or in support of targeted or exome sequencing assays.
This new way of designing probe molecules, which the group calls M-Probes, offers a way to target certain areas of the genome that may have significant relevance to human health and disease, but can't be sufficiently assayed by currently available hybridization tools, researchers explained in a study in Nature Chemistry this week.
As a proof-of-concept demonstration, the group showed that they could create an M-Probe-based hybrid capture assay to determine the exact triplet repeat expansion number in the Huntington's gene, and they suggested that similar results should be possible for other triplet-repeat disorders.
Unlike existing hybridization probes, M-Probes involve a modular, or interchangeable, linkage of multiple probe segments, with each segment targeting one section of a potentially long sequence, explained Rice University Professor David Zhang, the study's principal investigator.
At Zhang's lab at Rice, he and his colleagues have been working for several years on optimizing various aspects of DNA hybridization, including developing other novel probe designs for next-generation sequencing target enrichment, and working to improve multiplex PCR methods.
The group previously designed technology that involves coupling probe sequences with a complementary strand that mimics a target sequence in order to create a competitive hybridization reaction.
"We make partially double-stranded probes and use the idea of molecular competition to try to insure single-base specificity," Zhang said. "The idea is that if there is a particular part of the genome that we want to bind to – we construct a target-mimic that looks like the one you want to bind to outcompete all the other non-specific sequences."
The team's new M-Probes are essentially a multi-stranded equivalent of this predecessor, in which a collection of probes and target mimics, which together correspond to some long or hypervariable sequence, are strung out and connected by junctions or "arms." These arms have their own unique sequences that are unlikely to bind to the human genome.
In their report this week, the Rice investigators described how, by precisely designing the junction points within their M-Probes, they could sensitively detect crucial points of variation in a particular segment of the genome while ignoring variation at other nearby locations.
The junctions between the probe segments tolerate up to 7 nucleotides of sequence variation without significant effect on binding affinity, while variation of even a single nucleotide at other locations results in a more than threefold reduction, the authors reported. In this way, a signal of the variant in question comes through without being confounded by the presence of nearby benign variation.
"If there are places that might be variable that's where we put the arms, because near those crosses it's basically robust to sequence changes. Anything near there gets washed out, and it's not sensitive to single nucleotide changes at that location," Zhang explained.
"You can put the arms wherever you want," he added, "so by using databases … you can design a probe around the knowledge of where these non-clinically relevant variations take place … almost anywhere in the genome."
Zhang raised the example of EGFR T790M mutations. "You have T790M, but right beside there, about 10 bases away, there is another synonymous mutation at the 787 position, and about 40 percent of people have this nonpathogenic SNP," he said.
"That means that if you have a probe or diagnostic system that too-specifically recognizes the T790M from wild-type, it might accidentally pick up this 787 mutation. We don't want that. We want to be able to distinguish the positions that are pathogenic from the areas of normal variability between humans," he added.
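The placement logic Zhang describes can be illustrated with a toy sketch. It is not the authors' design software: the function name, the coordinates, and the simplification that each junction masks a fixed 7-nucleotide window (borrowing the tolerance figure reported above) are all assumptions made for illustration.

```python
# Toy model only: assumes each junction ("arm") masks a fixed window of
# nucleotides, using the 7-nt tolerance figure reported in the article.
JUNCTION_TOLERANCE_NT = 7


def classify_positions(junction_starts, variant_positions,
                       window=JUNCTION_TOLERANCE_NT):
    """Label each variant position as 'tolerated' (inside a junction
    window) or 'discriminated' (inside a probe segment)."""
    labels = {}
    for pos in variant_positions:
        in_junction = any(start <= pos < start + window
                          for start in junction_starts)
        labels[pos] = "tolerated" if in_junction else "discriminated"
    return labels


# Hypothetical probe coordinates: a benign SNP at position 40 and a
# pathogenic variant about 10 bases away at position 50.
print(classify_positions(junction_starts=[37], variant_positions=[40, 50]))
# prints {40: 'tolerated', 50: 'discriminated'}
```

With a junction placed over the benign site, only the pathogenic position remains subject to single-nucleotide discrimination in this simplified picture.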
According to Zhang and his colleagues, their M-Probe results are the first to experimentally demonstrate single-nucleotide sensitivity for a target region with simultaneous tolerance to multi-nucleotide variation at other specified positions.
In the study published this week, the group explored application of the approach to diagnosis of trinucleotide repeat disorders like Huntington's disease and fragile X syndrome.
In each of these conditions, the relevant gene carries a triplet repeat count above a certain threshold. For example, 27 CAG repeats in the Huntington's gene HTT is considered disease-causing, while lower numbers of repeats may be present in normal, healthy individuals.
To demonstrate the potential of M-Probes in this setting, Zhang and colleagues designed probes that would selectively bind DNA with HTT sequences exceeding the threshold number of triplet repeats. "By designing two different M-Probes, one targeting nine repeats and one targeting 27 repeats, we could control for sample variability and determine the potential disease status through the difference in the observed … values," the authors wrote.
The M-Probe approach correctly identified the length status of five samples with known HTT genotypes, and determined that two other samples with unknown genotypes did not cross the 27-repeat threshold.
The authors wrote that the two-probe approach represents a minimal protocol needed for determining disease likelihood in an unknown sample, but even more precise quantitation of triplet repeat numbers could be achieved by including more M-Probes with varying triplet repeat thresholds.
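The idea of adding probes at more thresholds can be sketched in a few lines. This is a simplified illustration, not the study's analysis: it assumes each probe yields a yes/no call meaning the sample has at least that many repeats, whereas the authors compared the difference in observed values between probes. The function name and the extra thresholds of 18 and 36 are hypothetical.

```python
def bracket_repeat_count(calls):
    """Narrow down a repeat count from per-probe binary calls.

    `calls` maps a probe's repeat threshold to True if the sample bound
    that probe (assumed here to mean "at least this many repeats").
    Returns (lower, upper) such that lower <= count < upper; either
    bound is None when no probe constrains that side.
    """
    lower = max((t for t, bound in calls.items() if bound), default=None)
    upper = min((t for t, bound in calls.items() if not bound), default=None)
    if lower is not None and upper is not None and lower >= upper:
        raise ValueError("inconsistent probe calls")
    return lower, upper


# Two-probe protocol with the thresholds described in the article.
print(bracket_repeat_count({9: True, 27: False}))
# prints (9, 27)

# Hypothetical four-probe panel giving a tighter bracket.
print(bracket_repeat_count({9: True, 18: True, 27: False, 36: False}))
# prints (18, 27)
```

Each additional threshold splits the remaining interval, which is why a larger panel would give finer resolution under these assumptions.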
According to the authors, besides diagnosis of triplet repeat disorders, the M-Probe hybridization methodology also shows promise for highly multiplexed enrichment for downstream NGS without missing genetic regions that contain complex variations.
"Of particular benefit may be enrichment of DNA structural variants (for example, translocations and fusions), RNA alternative splice patterns, and other sequences currently difficult to assay with short-read sequencing," Zhang and his coauthors wrote.
In the study, the team reported that they could create M-Probes that sequence-selectively bind a continuous DNA sequence of greater than 500 nucleotides.
"Experimentally we demonstrated 560 bases," Zhang said. "That also demonstrates how this could be a way to confirm the sequences that short-read technologies like Illumina or Ion Torrent produce.
Zhang and colleagues spun out a commercial company several years ago that was initially called Searna, but which he said has since been renamed NuProbe, and is anticipating soon closing a Series A financing round.
NuProbe is mainly focused on cancer diagnosis, Zhang said, so the firm is not sure whether M-Probes — which are more directly applicable to diagnosis of triplet repeat disorders — will fit in with its commercial goals.