SAN FRANCISCO (GenomeWeb) – As long-read sequencing and genome mapping technologies continue to improve, researchers are increasingly using them to identify structural variants that were previously impossible to detect, both cost-effectively and at a large scale. The technologies are being used for a variety of applications and are increasingly moving into clinical research and diagnostics.
Last year, for instance, whole-genome sequencing on Pacific Biosciences' technology was used to diagnose a pathogenic structural variant for the first time.
Other groups are developing methods to look at clinically relevant structural variants using Oxford Nanopore's MinIon sequencer and Bionano Genomics' Saphyr DNA mapping instrument.
Such methods have been enabled by technology improvements. For instance, both PacBio and Oxford Nanopore have been regularly increasing the read length and throughput of the Sequel and MinIon, respectively, while early last year Bionano launched a new instrument, Saphyr, which is faster and more accurate than the company's previous system and offers higher throughput.
Also, recently, researchers from 10x Genomics described in a publication on the BioRxiv preprint server how they used the company's linked-read technology to look at structural variants across the entire human genome.
"There's an immense challenge in looking at genome structure using only short-read sequencing-based platforms," said Chia-Lin Wei, director of genome technologies at the Jackson Laboratory.
Her team recently demonstrated in a publication posted on the BioRxiv preprint server that the MinIon could be used to identify structural variants in cancer genomes.
Wei said that the initial results, using the MinIon to look at somatic structural variants, were promising and that her team next plans to use the platform along with a custom-developed pipeline, called Picky, to analyze xenograft models to try to understand tumor heterogeneity and evolution. In addition, she said, the team will analyze well-characterized samples, like the trios from the 1000 Genomes Project, "to see how many more new structural variants we can exhume through nanopore sequencing that were not previously revealed."
Wei said that in 2016, Oxford Nanopore made a number of improvements to its nanopore sequencing technology that made it feasible to study tumor genome structure and structural variants. The base accuracy and output both improved and the read length increased to a point where she said the technology could be useful for larger, more complex genomes. Although the team had not planned to develop its own bioinformatics pipeline, the structural variant callers available when it began using the MinIon were all designed for short-read sequencing platforms. "The concept and assumptions were based on shorter reads and higher accuracy and so were not applicable to a long-read platform," she said. Even some of the algorithms that researchers had designed for PacBio data seemed to "break the long reads into shorter fragments," Wei said, essentially borrowing the concepts of short-read sequencing algorithms and applying them to long reads. "We felt that defeated the purpose."
The pipeline the Jackson Lab developed has two main components, an alignment tool and a structural variant calling tool. The key to the alignment tool, Wei said, is that it has to work with long reads that have lower base accuracy. In the BioRxiv paper, the researchers demonstrated their strategy on a breast cancer cell line used as a model for triple negative breast cancer. Its genome is known to contain extensive structural variation and has been previously sequenced. They sequenced the genome to 2.5-fold coverage and found that the nanopore reads extended further into repetitive regions of the genome than shorter reads. For instance, one 14.7-kilobase nanopore read extended into a region that is "rich in short interspersed nuclear elements/long interspersed nuclear elements," which was not covered in the previously generated data.
The team also found that the pipeline could accurately call structural variants. For example, the researchers used PCR to confirm all 40 structural variants that were supported by more than one read. Of 173 structural variants supported by just one read, they validated 136, or 79 percent.
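The validation figures above can be checked with a few lines of arithmetic. The following is only a minimal sketch: the counts come from the article, while the variable and function names are illustrative, and the combined rate is derived here rather than reported by the researchers.

```python
# Counts as reported in the article for PCR validation of called variants.
# All names below are illustrative; none come from the Picky pipeline itself.
multi_read_called = 40        # variants supported by more than one read
multi_read_validated = 40     # all of them were confirmed by PCR
single_read_called = 173      # variants supported by just one read
single_read_validated = 136   # number of those confirmed by PCR


def validation_rate(validated: int, called: int) -> float:
    """Return the percentage of called variants that were validated."""
    return 100.0 * validated / called


multi_rate = validation_rate(multi_read_validated, multi_read_called)
single_rate = validation_rate(single_read_validated, single_read_called)

# Derived figure (not stated in the article): rate across both groups.
combined_rate = validation_rate(
    multi_read_validated + single_read_validated,
    multi_read_called + single_read_called,
)

print(f"multi-read support:  {multi_rate:.1f}%")
print(f"single-read support: {single_rate:.1f}%")
print(f"combined (derived):  {combined_rate:.1f}%")
```

The single-read rate works out to roughly 78.6 percent, which rounds to the 79 percent cited.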
Other groups have also begun developing bioinformatics pipelines to call structural variants from MinIon sequence data. A team from the University Medical Center Utrecht, for example, recently described a pipeline called NanoSV in Nature Communications, as well as at an Oxford Nanopore-sponsored workshop. Wigard Kloosterman, associate professor at UMC Utrecht, led a team that used the MinIon to sequence the genome of an individual with chromothripsis, a phenomenon involved in congenital diseases and cancer that is characterized by complex genome rearrangements. They demonstrated that NanoSV was able to detect 40 structural variant breakpoints that had been previously identified by Illumina sequencing.
In addition, a team from Baylor College of Medicine, the University of Vienna, and Cold Spring Harbor Laboratory described in a publication on BioRxiv a pipeline called Sniffles for calling structural variants from long-read sequencing data.
Wei said that although her team plans to move forward with the MinIon and its custom-developed pipeline to analyze cancer samples, she has also tested both PacBio's and 10x Genomics' technologies. "We find comparable results with PacBio sequencing for structural variant analysis," she said. Currently, the group is generating longer reads with the MinIon, up to hundreds of kilobases in length, but it continues to evaluate PacBio technology and will be watching it closely. 10x Genomics' Chromium platform is also useful, she said, but because the instrument is "ultimately still using Illumina sequencing, the repeats are still a challenge."
Aside from nanopore sequencing, researchers are also evaluating PacBio's sequencing technology and Bionano's genome mapping platform for detecting clinically relevant structural variants.
Stanford's Clinical Genomics Service team, for example, used the PacBio Sequel to sequence the genome of an individual with an unknown disease and identified the causative mutation, a structural variant that had been missed by previous testing. PacBio, meanwhile, has been working on improvements that will eventually enable low-coverage whole-genome sequencing to detect structural variants for $1,000.
PacBio has also demonstrated that its technology is amenable to targeted sequencing using the CRISPR/Cas9 system, and groups have described developing targeted panels that focus on repetitive regions or structural variants. For instance, a group at the Parkinson's Institute and Clinical Center in Sunnyvale, California, is developing a CRISPR/Cas9-based capture enrichment strategy to identify repeat expansions associated with ataxias and Parkinson's disease. The company plans to develop the protocol into a commercial product this year.
Meanwhile, a team from the University of California, Los Angeles and Children's National Health System demonstrated in a study published in Genome Medicine last year that genome mapping using the Bionano Saphyr instrument can identify structural variants that cause Duchenne muscular dystrophy. Eric Vilain, director of the Center for Genetic Medicine Research at Children's National, was an early-access user of Saphyr when he was a professor of human genetics at UCLA. He is also a senior author on the Genome Medicine study.
"It's very difficult to identify structural variants with NGS, especially when they don't result in any variation in copy number, like inversions," Vilain said. "The genome mapping technology provides us with a way to visualize structural variants in a way that's novel and full of potential."
Vilain is now testing whether genome mapping can help identify the cause of unknown, rare diseases. His team is currently using the Saphyr system on samples from patients enrolled in the National Institutes of Health's Undiagnosed Diseases Network who have already gone through a suite of diagnostic tests, including exome sequencing, and have still not received a diagnosis. The hypothesis is that at least some of these patients will have a pathogenic structural variant, and the goal is to see whether genome mapping can identify those structural variants.
Currently, exome sequencing has a diagnostic rate of about 30 percent, Vilain said. "That's great and it's much better than before, but it's still a minority," he said. "If we're able to move up from that 30 percent using genome mapping, that has huge implications." Vilain added that he does not think all the remaining 70 percent of cases will have a causative structural variant. There could be a complicated combination of genetics involved, or environmental factors, he said.
Stan Nelson, a professor of human genetics and co-director of the Center for Duchenne Muscular Dystrophy at UCLA, said that his team, in collaboration with Children's National, plans to evaluate at least 50 unsolved cases from the Undiagnosed Diseases Network using the Saphyr instrument. The technology "gives a very different view of the genome than we get with short-read sequencing," he said, "so I think one goal on our end is to work out how we should best use it."
Nelson added that the Genome Medicine study was a proof of principle demonstrating that the technology worked. But in order to roll it out clinically, "we need to prove how sensitively we can observe de novo mutations genome-wide."
Ultimately, Nelson predicted, no single technology will serve every purpose. "It will be all about integrating these different tools to get a comprehensive look at the genome," he said.