NEW YORK (GenomeWeb) – Complex structural variants appear to play a role in some Mendelian disease cases, according to a recent study by UK researchers, and their clinical detection could give diagnostic rates another boost, possibly aided by long-read sequencing technologies.
In a preprint published in bioRxiv this month, researchers at the Cambridge University Hospitals NHS Foundation Trust and their collaborators reported analyzing complex structural variants in 1,324 patients with undiagnosed neurodevelopmental or retinal disorders, using short-read whole-genome sequencing and three validation methods, among them nanopore sequencing.
Figuring out the precise breakpoints of these structural variants is important for assessing their effect on potentially disease-causing genes, and nanopore sequencing helped the researchers define these boundaries for one of the cases.
The study made use of the National Institute for Health Research (NIHR) BioResource in England, a collection of DNA samples from patients with common and rare diseases as well as from healthy volunteers. As part of that effort, patients with undiagnosed rare disorders underwent whole-genome sequencing.
Keren Carss, a research associate at the University of Cambridge and one of the senior authors of the study, said her group had become more interested in structural variants (SVs) as a cause of disease over the past year or so. While looking for simple SVs in patients, such as insertions, deletions, or inversions, that overlap known disease genes, they happened upon a strange-looking SV that consisted of two duplications and one inversion in the same region.
"We wanted to look into this a little bit more systematically and see how commonly these might be causative of disease in our patients," Carss said.
For their project, the researchers performed short-read Illumina whole-genome sequencing on patient samples from three sub-projects: 725 from the Inherited Retinal Disorders project, 472 from the Neurological and Developmental Disorders project, and 127 from the Next Generation Children project, which analyzes parent-child trios of patients from neonatal and pediatric intensive care units.
After calling structural variants from the short-read data, they identified possible complex SVs in disease-associated genes by clustering adjacent SV calls, which yielded 81 candidates. After manually evaluating these and reconstructing their likely architecture, the researchers narrowed the list to 46 likely genuine complex SVs. However, 42 of these were unlikely to be pathogenic, either because the patient's phenotype was inconsistent with the disrupted gene or because the patient was heterozygous for the variant while the associated disease is recessive.
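The clustering step described above can be illustrated with a minimal sketch. The preprint's actual clustering criteria are not described in this article, so everything here is an assumption: SV calls are reduced to simple intervals, calls on the same chromosome are grouped if they overlap or lie within a hypothetical `max_gap` of 1,000 base pairs, and a cluster counts as a candidate complex SV only if it contains at least two calls and overlaps a gene of interest. The gene names and coordinates in the usage example (`GENE_A`, `GENE_B`) are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SVCall:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    svtype: str  # e.g. "DUP", "INV", "DEL"


def cluster_adjacent_calls(calls: List[SVCall], max_gap: int = 1000) -> List[List[SVCall]]:
    """Group calls on the same chromosome that overlap or sit within max_gap bp.

    max_gap is a hypothetical threshold, not a value taken from the study.
    """
    clusters: List[List[SVCall]] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        current = clusters[-1] if clusters else None
        if (
            current
            and current[-1].chrom == call.chrom
            and call.start - max(c.end for c in current) <= max_gap
        ):
            current.append(call)     # adjacent to the running cluster
        else:
            clusters.append([call])  # start a new cluster
    return clusters


def candidate_complex_svs(
    calls: List[SVCall],
    genes: Dict[str, Tuple[str, int, int]],
    max_gap: int = 1000,
) -> List[Tuple[List[str], List[SVCall]]]:
    """Return (overlapped gene names, cluster) for multi-call clusters hitting a gene."""
    candidates = []
    for cluster in cluster_adjacent_calls(calls, max_gap):
        if len(cluster) < 2:
            continue  # a lone call is a simple SV, not a complex candidate
        chrom = cluster[0].chrom
        start = min(c.start for c in cluster)
        end = max(c.end for c in cluster)
        hits = sorted(
            name
            for name, (g_chrom, g_start, g_end) in genes.items()
            if g_chrom == chrom and start < g_end and g_start < end
        )
        if hits:
            candidates.append((hits, cluster))
    return candidates


# Hypothetical duplication-inversion-duplication on chrX plus an unrelated deletion.
calls = [
    SVCall("chrX", 100_000, 150_000, "DUP"),
    SVCall("chrX", 150_500, 170_000, "INV"),
    SVCall("chrX", 170_200, 200_000, "DUP"),
    SVCall("chr1", 5_000_000, 5_010_000, "DEL"),
]
genes = {
    "GENE_A": ("chrX", 140_000, 160_000),
    "GENE_B": ("chr1", 5_005_000, 5_006_000),
}
result = candidate_complex_svs(calls, genes)
for gene_names, cluster in result:
    print(gene_names, [c.svtype for c in cluster])
```

In this toy input, the three chrX calls merge into one candidate overlapping `GENE_A`, while the isolated chr1 deletion is treated as a simple SV and skipped. In the study, such automatically generated candidates were only a starting point; each one still required manual review and architecture reconstruction.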
For the four cases with a seemingly clinically relevant complex SV, they used Sanger sequencing of PCR products to confirm the predicted novel breakpoint junctions, and microarrays to confirm predicted copy number changes and regions of homozygosity.
In one of the four patients, a child born with encephalopathy, whole-genome sequencing found a duplication-inversion-duplication on the X chromosome that overlapped the CDKL5 gene. However, the short-read data suggested two possible architectures for the complex SV, one predicting that the second copy of CDKL5 was intact, the other that it was disrupted. Sanger sequencing alone could not confirm either model, so the researchers turned to nanopore sequencing, which covered all the breakpoints and confirmed the presence of one intact copy of CDKL5. Since the complex structural variant might still affect gene regulation, they eventually called it a variant of unknown significance.
While only 0.3 percent of patients in the study were found to carry a likely disease-related complex SV, Carss believes there are probably more, since their assessment was conservative. For example, some patients may be found by array to have a copy number variant, when in fact they carry a complex SV. "Unless you do some more detailed assays, you wouldn't come across it," Carss said.
"We don't think it is causing a very high number of Mendelian diseases — it is going to be a rare cause — but we think that they are probably underestimated," she added.
Carss said that other groups have taken a different approach to analyzing complex SVs: long-insert sequencing of fragments about 5 kilobases in size, using short-read technology. While that provides better resolution in repetitive regions than standard paired-end sequencing, she said, it cannot detect SVs smaller than about 5 kilobases.
Others agree that the UK study confirms the role of unusual structural variants and copy number variants as a cause of rare disease. "This is yet another example that some SVs/CNVs are in fact more complex than initially thought," said Alexander Hoischen, an assistant professor of immuno-genomics at Radboud University Medical Center in the Netherlands.
"The study urges for the use of long-read sequencing such as [Oxford Nanopore Technologies] or [Pacific Biosciences] for unsolved rare disease cases," he said, adding that several current projects, including the Solve-RD consortium his group is involved with, plan to study rare diseases more systematically with the help of long-read sequencing.
Using the PacBio platform in a pilot study of five patient-parent trios, he and his colleagues identified 23,000 SVs and more than 30,000 indels per genome, "the majority of which are not identified routinely in short-read sequencing data," Hoischen said.
"In the long run, I could envision a truly generic genetic test for all rare disease patients using long-read sequencing data as a first-tier test," he said.
According to Carss, complex structural variants tend to show up in regions of the genome that are repetitive and difficult to sequence conventionally. "For that reason, it seems likely that long-read technologies, such as the nanopores we used for one of our patients, would be the most efficient way of resolving them," she said.
The error rate of nanopore sequencing is still relatively high, she said, and read coverage in their experiment was low, so they could not use the technique on its own. However, the technology is improving rapidly, "and I expect that using long-read sequencing technology on a larger cohort of patients would give more insights into the frequency and distribution of these types of variants," she said.
Clinically, defining complex SVs precisely might be especially important if a gene of interest lies near the edge of a deletion or duplication. "I can imagine a scenario, for example, where you could use a copy number array as a first-line test, and if the breakpoint of the rearrangement is close to the gene you're interested in, you have to do some more detailed assay to work out what exactly is going on," Carss said.
"If understanding of those aspects is essential for understanding whether or not the variant might be pathogenic, then, what we have shown is that a combination of short-read whole-genome sequencing and long-read whole-genome sequencing is an effective way to resolve that," she said.
However, "the field has not quite yet come to a consensus as to the best way to identify these, characterize them, and confirm them," she added. "It's more of a research finding at this stage than a clinical diagnostic test. However, technologies seem to be moving fairly quickly from the research realm into clinical diagnostics, so I imagine in a few years' time, it probably will be [used by clinical labs]."