NEW YORK (GenomeWeb) – Researchers at Baylor College of Medicine are exploring whether Pacific Biosciences' long sequence reads can detect genetic disease mutations that would otherwise go unnoticed.
"There are certain kinds of variants you just can't find with shorter reads," said Matthew Bainbridge, an assistant professor at Baylor's Human Genome Sequencing Center, who is involved in the project.
Baylor's medical genetics laboratory — now part of Baylor Miraca Genetics Laboratories, a joint venture between Baylor College of Medicine and Japan-based Miraca — was one of the first labs to offer clinical whole-exome sequencing to patients with likely genetic diseases. But the lab's diagnostic rate for genomic tests has been hovering around 30 percent, Bainbridge said, and the hope is that long PacBio reads may identify pathogenic variants — including repeat expansions and other structural variants — that short-read technologies miss.
"If you don't diagnose someone, and you think it's genetic, it could be one of two things: either it's a brand-new gene, which is what most people spend their time looking for … or it could be some kind of variant in a well-known gene that we have had trouble finding, or appreciating its pathogenicity," Bainbridge said. Examples of the latter include intronic deletions that affect gene splicing. The goal is to see if PacBio reads can find a subset of these variants.
"When whole-genome sequencing first came out, people said, 'This will solve all our problems. We will be able to find all the structural variants. It's going to be very easy,'" he said. But over the last couple of years, it has become clear that many complex structural variants (SVs) are hard to find with current methods. "Any one structural variant program finds maybe a quarter to a third, and using them together, you can sort of get up to half, or a little bit over," he said. "Having a 6,000- or 10,000-base pair long read will actually make that a lot easier."
"The worst things to find are repeat expansions," he added. "If the repeat expansion is actually longer than your read, you simply can't tell how long it is."
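Bainbridge's point about read length can be illustrated with a toy example. The Python sketch below counts copies of a repeat unit in a read and reports whether the read spans the whole repeat, meaning that non-repeat flanking sequence is visible on both sides. The function names, the `min_flank` threshold, and the flanking sequences are invented for illustration, and this is not the method used by the Baylor team or by any real repeat-sizing tool. The CGG unit is the motif expanded in fragile X syndrome.

```python
from typing import Tuple


def longest_repeat_run(seq: str, unit: str) -> Tuple[int, int]:
    """Return (start, copies) of the longest uninterrupted run of `unit` in `seq`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    best_start, best_copies = -1, 0
    i, n, k = 0, len(seq), len(unit)
    while i <= n - k:
        if seq.startswith(unit, i):
            start, copies = i, 0
            # Walk forward one unit at a time while the run continues.
            while seq.startswith(unit, i):
                copies += 1
                i += k
            if copies > best_copies:
                best_start, best_copies = start, copies
        else:
            i += 1
    return best_start, best_copies


def size_repeat(read: str, unit: str, min_flank: int = 5) -> Tuple[int, bool]:
    """Return (copies_seen, spans_repeat) for one read.

    `spans_repeat` is True only if at least `min_flank` non-repeat bases
    (a hypothetical threshold) are visible on both sides of the run.
    """
    start, copies = longest_repeat_run(read, unit)
    if copies == 0:
        return 0, False
    end = start + copies * len(unit)
    spans = start >= min_flank and (len(read) - end) >= min_flank
    return copies, spans


# Toy locus: made-up flanks around 30 copies of CGG.
toy_locus = "ACGTTGCA" + "CGG" * 30 + "TTGACCAT"
long_read = toy_locus            # covers both flanks and the full repeat
short_read = toy_locus[20:56]    # 36 bases taken from inside the repeat

print(size_repeat(long_read, "CGG"))
print(size_repeat(short_read, "CGG"))
```

For the read that lies entirely inside the repeat, the copy count is only a lower bound on the true repeat length, which is the limitation the quote describes.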
For a proof-of-concept study to investigate what long PacBio reads may be able to add, the Baylor team is focusing on a set of 100-plus known disease genes, among them several genes involved in repeat expansion disorders, such as fragile X syndrome and several ataxias; genes with single-exon dropouts; and the almost 60 genes recommended by the American College of Medical Genetics and Genomics for incidental findings analysis, which include many cancer predisposition genes.
The sequencing approach the team is taking, called PacBio-LITS for large insert targeted sequencing, was published by Baylor researchers earlier this year and combines probe-based target capture with large-insert PacBio library prep and sequencing.
To validate their panel, the researchers have tested it on a number of samples with known single-exon dropouts and repeat expansions. For their first pilot experiment, they are now sequencing samples from about a dozen women with strong family histories of breast cancer who previously tested negative for mutations in BRCA1 and BRCA2. For that study, Bainbridge said, they are analyzing just three genes: BRCA1, BRCA2, and TP53. "We're not just capturing the exons, we're capturing the whole gene, the [untranslated regions], the introns, the exons, and a little bit of the upstream space," he said.
Depending on whether they find any new pathogenic mutations, they may then add more breast cancer patients, as well as a set of patients with apparent Lynch syndrome who tested negative for known Lynch syndrome genes. The plan is to have the first results within the next couple of months, Bainbridge said.
If the PacBio approach can indeed uncover new pathogenic mutations, as well as detect all mutations that current sequencing tests can, Baylor could eventually develop it into a targeted clinical test. "That's our goal, to see whether we can come up with a better, more thorough test," he said.
For example, he said, this could become a tier 1 test for patients with late-onset neurodegenerative disorders who are suspected of having a repeat expansion disease, or a tier 2 test for patients who tested negative on a standard Lynch syndrome test. "We will probably find that there is going to be some niche markets for this," he said.
But the Baylor researchers also want to use the PacBio technology to discover novel pathogenic mutations. Right now, only about a dozen repeat expansion sites in the genome are known to play a role in disease, "but we are sort of looking under the lamplight," Bainbridge said. "We really want to find out whether there is a whole bunch of new diseases out there that are being caused by things like repeat expansions that are difficult to find."
The genome harbors many areas with highly repetitive sequences, in both coding and non-coding regions, that could potentially be repeat expansion sites, he said. To find them, the Baylor team plans to use the PacBio technology to sequence the genomes of patients for whom no other genetic test has so far yielded a diagnosis.
"These are cases where we have done everything under the sun," Bainbridge said, including exome and genome sequencing and, in some cases, RNA-seq on affected tissues. "They certainly have a genetic disease that's very severe, and … we have looked for everything, but we still can't solve them," he said.
For that project, the researchers are waiting for the new PacBio sequencing machine, the Sequel, which is scheduled to arrive at Baylor next week. The instrument promises to bring down sequencing consumables costs several-fold compared to the current RS II. They will also need to explore how much coverage is needed to find the structural variants they are looking for.
Bainbridge said he has samples from about half a dozen patients that would be a good fit for the project, and other researchers at Baylor have more.
"Those are really the next big targets for us, to start doing some whole genomes and see if there are complex rearrangements, or repeat expansions, or something just missed by the shorter-read technology," he said.