NEW YORK (GenomeWeb) – Long sequence reads are increasingly moving into the clinical space as sequencing costs continue to drop and the technology improves.
At the Advances in Genome Biology and Technology meeting held earlier this month in Orlando, Florida, a number of researchers described projects using long-read sequencing technology from Pacific Biosciences and Oxford Nanopore in clinical applications ranging from outbreak tracking and the development of niche diagnostic assays to understanding infectious disease evolution and identifying complex gene fusions and rearrangements in cancer patients.
Mt. Sinai's diagnostic assays
Researchers at the Icahn School of Medicine at Mt. Sinai have been at the forefront of moving NGS into the clinic and are among the earliest adopters of PacBio's single-molecule sequencing technology. During a PacBio-sponsored workshop at AGBT, Robert Sebra, director of technology at Mt. Sinai, discussed a number of diagnostics the group is developing on PacBio instruments.
Mt. Sinai's NGS suite includes three PacBio RSII systems, two of PacBio's recently launched Sequel instruments, seven Illumina HiSeqs (including both the 4000 and 2500), two Illumina MiSeqs, 10 Thermo Fisher Ion Protons, five Ion S5 XL machines, two Ion PGMs, and one CE sequencer. It also has a number of sample prep instruments and other technologies designed to process samples upstream of NGS, including 11 Ion Chefs, one of 10X Genomics' platforms for generating linked reads, one Fluidigm C1 system for single-cell processing, and two of Berkeley Lights' beta instruments for single-cell selection.
Sebra said that the long reads of the PacBio instruments have been particularly useful for identifying pseudogenes, repeat expansions, and polymorphic loci.
"Every individual has about 1,000 structural variants that are more than 2,500 base pairs in length," Sebra said, which are difficult to analyze with short-read sequencing.
Sebra said Mt. Sinai researchers last year created a de novo assembly of a human genome using technologies from Illumina, PacBio, and BioNano Genomics and are now designing a number of diagnostic tests on the PacBio system.
For instance, he described a pharmacogenomics test that would analyze the CYP2D6 gene, which is involved in metabolizing 20 to 25 percent of all medications, but is also extremely polymorphic and has homologous pseudogenes and copy number variants that complicate analysis. There are over 100 different so-called star alleles that have implications for an individual's ability to metabolize certain drugs, as well as many known duplications, Sebra said.
It's important "to tackle and correctly phase the variants" in that gene since it could influence drug choice and dosing, Sebra said. The Mt. Sinai team has designed an assay that uses long-range PCR and sequencing on a PacBio platform to sequence the full CYP2D6 gene.
In a validation study published late last year in Human Mutation, the researchers found that in 10 previously genotyped controls, their assay identified not only the known alleles but also novel ones. In addition, Sebra said that the assay had "higher resolution" than the previous methods used to genotype the samples, and that it was able to resolve 14 cases that had unclear genotypes.
The researchers have also been designing an assay to sequence the full BRCA1 and BRCA2 genes. They have designed a 70-amplicon panel composed of amplicons between 3 kb and 6 kb long that span the entirety of the two genes. The assay is intended both for discovery, to identify variants "beyond what's in current databases," which are highly biased toward variants in exons, and for diagnostic use.
After designing the assay, the Mt. Sinai researchers compared it to more targeted panels on the MiSeq and Proton platforms. They found that for known SNVs, the platforms were largely concordant, but the PacBio assay had a "significant increase in indel calls," which Sebra attributed to the longer reads.
"When you open the floodgates and look at novel regions, you see dozens to hundreds of novel variants," he added. The group is still validating these results, and Sebra said it would also need to test the assay on a larger cohort of samples.
Long reads are also useful in diagnosing diseases caused by repeat expansions and in analyzing disease-related genes that have known pseudogenes.
One cause of amyotrophic lateral sclerosis is a repeat expansion of four guanine bases and two cytosine bases. A repeat of more than 30 of that series is considered pathogenic, but some cases have "been shown to have thousands of repeats," Sebra said.
Sequencing through those repeats is difficult with short-read sequencing technology, but doable with longer reads. Sebra said that the Mt. Sinai researchers have tested both the RSII and the Sequel for targeting this repeat expansion, and that the results from the two systems were comparable. Because the Sequel has a higher throughput than the RSII, he said, they would be able to use it to barcode and multiplex samples.
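To make the repeat threshold Sebra cited concrete, the following is a minimal Python sketch, not the Mt. Sinai team's method, that counts back-to-back copies of the six-base unit in a read and checks the count against the "more than 30" figure. The function names and the synthetic reads are invented for illustration. It assumes error-free reads given on the strand where the unit reads GGGGCC; real long reads contain errors and would need a more tolerant approach.

```python
import re

REPEAT_UNIT = "GGGGCC"      # four guanines followed by two cytosines
PATHOGENIC_THRESHOLD = 30   # "more than 30" repeats, the figure quoted in the article


def longest_repeat_run(read: str, unit: str = REPEAT_UNIT) -> int:
    """Return the largest number of back-to-back exact copies of `unit` in `read`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(read.upper())]
    return max(runs, default=0)


def exceeds_threshold(read: str, threshold: int = PATHOGENIC_THRESHOLD) -> bool:
    """True if the longest exact run in the read is above the threshold."""
    return longest_repeat_run(read) > threshold


# Synthetic reads, made up for illustration only (hypothetical flanking bases).
short_read = "ACGT" + REPEAT_UNIT * 5 + "TTGA"
long_read = "ACGT" + REPEAT_UNIT * 40 + "TTGA"

for name, read in [("short_read", short_read), ("long_read", long_read)]:
    print(name, longest_repeat_run(read), exceeds_threshold(read))
```

A read can only report a repeat count if it spans the whole expansion plus some flanking sequence, which is why read length matters here: 40 copies of a six-base unit already occupy 240 bases, and thousands of copies run to many kilobases.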
Gene fusions and splice mutations
Adam Ameur, a bioinformatician at Uppsala University, said that Uppsala's core facility operates one HiSeq X Ten system, 17 HiSeq 2000s and 2500s, three MiSeqs, one NextSeq, two RSIIs, and expects to receive a Sequel system in the spring.
The university has developed several clinical applications on its RSII machines, including an assay to identify BCR-ABL1 gene fusions in chronic myeloid leukemia patients. The fusion is important to diagnose because patients who have it can be treated with the drug imatinib.
In addition, patients sometimes develop mutations in the fusion transcript that confer resistance to the drug. Traditional methods to identify the resistance-conferring mutations rely on Sanger sequencing, but that method is not very sensitive, according to Ameur, since it can only identify mutations at a 10 percent or higher frequency.
To improve this sensitivity, the researchers set about designing an assay on the RSII. They created a 1.6-kb amplicon that spans the fusion and sequenced it to more than 10,000-fold coverage. In one sample, they ran the assay at diagnosis and then six months later, at which point they identified two mutations that the tumor had acquired in the fusion transcript.
One mutation that was present in all of the PacBio reads was also detected with Sanger sequencing, but the second mutation, which was only present in about 4 percent of the PacBio reads, was not immediately detectable with Sanger sequencing. However, a few months later, the group re-analyzed the patient, at which point the mutation was present at a high enough frequency to be detected by both the PacBio and Sanger methods.
Ameur said the team has now tested more than 100 patients, running the PacBio assay alongside the Sanger test, and has achieved 100 percent consistency with Sanger results.
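The sensitivity gap Ameur described comes down to simple arithmetic on read counts. The sketch below is an illustration only, not the Uppsala pipeline: it computes the fraction of reads carrying a non-reference base at one position and compares it with the roughly 10 percent limit quoted for Sanger sequencing. The function names and the pileup are hypothetical, and the pileup is synthetic, sized to match the figures in the article (about 10,000-fold coverage, a variant in about 4 percent of reads).

```python
from collections import Counter

# Approximate Sanger sensitivity quoted by Ameur (10 percent or higher).
SANGER_LIMIT = 0.10


def variant_fraction(bases, reference_base):
    """Fraction of reads whose base at one position differs from the reference."""
    counts = Counter(bases)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads cover this position")
    return (total - counts[reference_base]) / total


def above_sanger_limit(fraction, limit=SANGER_LIMIT):
    """True if a variant at this fraction would meet the quoted Sanger limit."""
    return fraction >= limit


# Synthetic pileup, invented for illustration: 10,000 reads, 400 carry a variant base.
pileup = ["C"] * 9600 + ["T"] * 400
fraction = variant_fraction(pileup, "C")
print(f"variant fraction: {fraction:.2%}")
print("above Sanger limit:", above_sanger_limit(fraction))
```

At this depth a 4 percent variant is backed by about 400 reads, yet it sits below the quoted Sanger limit. Whether a given count is distinguishable from sequencing error depends on the platform's error profile, which this sketch does not model.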
Separately at AGBT, Somasekar Seshagiri, associate director of molecular biology at Genentech, discussed his firm's use of sequencing in clinical research projects that inform and drive drug development at Genentech. While Genentech predominantly relies on short-read sequencing, he said the group has used PacBio's Iso-Seq application, an RNA sequencing protocol, to look at splice mutations in mesothelioma. Seshagiri said that his team has found mutations that alter splicing of the gene SF3B1 in mesothelioma cases. In addition, he said they found a number of splice mutations that impact tumor suppressor genes.
Infectious disease
Finally, a number of groups are using both PacBio and Oxford Nanopore technology for infectious disease diagnosis and outbreak tracking.
Nick Loman, an independent research fellow at the University of Birmingham, described his lab's use of the MinION real-time platform to sequence patient samples suspected to be infected with Ebola during the outbreak in western Africa. Josh Quick, a graduate student in Loman's lab, traveled to Guinea during the outbreak and worked out of a mobile lab set up by Miles Carroll, head of research and deputy director of microbiology services for Public Health England.
Loman said that the group was able to get reads as long as 2 kb from fresh samples, and they were able to uncover the evolution of the virus and work out a transmission route. For instance, he said, previous analysis had shown that there were two main lineages of Ebola: one that had originated in Guinea and one that had originated in Sierra Leone. By sequencing in real time and collaborating with other researchers who were also sequencing Ebola strains, the group was able to establish that there had been cross-border transmissions, and it shared its results with the World Health Organization, Loman said.
Loman's group published the results of the project earlier this month in Nature, describing sequencing results from 142 samples.
Others are using long reads to sequence viruses such as HIV. Ben Murrell, from the University of California, San Diego, described using PacBio technology to sequence the ENV gene in HIV. The ENV gene is "really diverse and challenging to sequence," Murrell said. But it encodes the virus's only surface protein, and some patients have been found to generate antibodies against it.
Because of ENV's diversity, it is hard to make antibodies against it, so Murrell's team used circular consensus sequencing on the PacBio instrument to sequence the entire ENV gene to "look at the ENV variants in lots of detail," he said. The plan is to use the assay to follow patients over time, especially those who produce antibodies, and to track the evolution of the gene in order to figure out how to develop a vaccine. The group is also collaborating with the International AIDS Vaccine Initiative on this work.
Murrell said that the team is preparing a study to submit to a peer-reviewed journal describing the phylogeny of the ENV gene over time in response to antibodies.
Charles Chiu, who heads the Viral Diagnostics and Discovery Lab at the University of California, San Francisco, described progress he has made in designing an infectious disease diagnostic on the MinION. The lab originally developed a metagenomic sequencing test that runs on the Illumina MiSeq, which it plans to launch as a laboratory-developed test in May, but Chiu has also been working on using the MinION to do metagenomic sequencing. He has previously described the method the team is developing and said at the AGBT conference that it plans to develop a test based on that strategy for febrile diseases on the MinION.