This story includes reporting by Huanjia Zhang.
NEW YORK – As the reigning Nature Methods method of the year, long-read sequencing featured prominently in many of the talks at this year's Advances in Genome Biology and Technology meeting, held in Hollywood, Florida, last week.
Pacific Biosciences presented new data from its forthcoming Revio instrument, as did several speakers, including Karen Miga of the University of California, Santa Cruz. She also described the role of long reads in two reference genome-related projects she leads, the Telomere-to-Telomere (T2T) Consortium and the Human Pangenome Reference Consortium, and provided some data from Oxford Nanopore Technologies' Duplex read technology.
Winston Timp of Johns Hopkins University shared strategies his lab is working on with Oxford Nanopore sequencing, and Mitchell Vollger, a postdoc in Andrew Stergachis' lab at the University of Washington, described the lab's use of Fiber-seq, a method for analyzing regions of open chromatin that takes advantage of PacBio's ability to call methylated bases.
Once upon a time, AGBT might have been the venue for PacBio's launch of Revio, its follow-up to the Sequel II and IIe instruments. But times have changed: Market dynamics and company aspirations led that firm to instead introduce it in October at the American Society of Human Genetics meeting.
The first early-access instrument has already been shipped to the Broad Institute, the company shared at a Thursday workshop. With flow cells exhibiting millions more of the features that make its single-molecule, long-read method work, Revio could provide the juice to increase the use of long reads in many studies.
In the workshop, PacBio's Aaron Wenger shared some of the first data from internal sequencing runs on seven human samples, two animal samples, and two plant samples. For HiFi reads, the highest quality reads available on PacBio instruments, yield per flow cell has "consistently" been around 96 Gb, and sometimes over 100 Gb, with average read length of about 15 Kb. On average, 90 percent of bases have Phred scores higher than Q30.
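For readers less familiar with the scale, a Phred score Q corresponds to an estimated per-base error probability of 10^(-Q/10), so Q30 works out to roughly one error per 1,000 bases. The short Python sketch below illustrates that standard conversion; the function names are illustrative and are not taken from any PacBio software.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to an estimated per-base error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert a per-base error probability back to a Phred quality score."""
    return -10 * math.log10(p)


# Illustrative scores only; Q30 is the threshold quoted in the article.
for q in (20, 30, 40):
    bases_per_error = round(1 / phred_to_error_prob(q))
    print(f"Q{q}: about 1 error in {bases_per_error:,} bases")
```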
And Revio performs the same, or better, on methylation, SNV, indel, and structural variant calling, he said, based on F1 scores, a combination of precision and recall.
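The F1 score is conventionally the harmonic mean of precision and recall. As a rough illustration of how such a score is computed for a set of variant calls, here is a minimal Python sketch; the counts are made up for the example and are not data from PacBio's presentation.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when undefined."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)  # share of calls that are correct
    recall = true_pos / (true_pos + false_neg)     # share of true variants that were called
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 990 correct calls, 10 false calls, 20 missed variants.
print(f"F1 = {f1_score(990, 10, 20):.4f}")
```

Because it is a harmonic mean, the F1 score drops sharply if either precision or recall is poor, which is why it is a common single-number summary for variant callers.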
Johns Hopkins University professor Michael Schatz spoke at the workshop and announced that the National Institutes of Health's All of Us project would release 1,027 HiFi genomes it has created as part of a spring data release, with thousands more coming in the next several years.
During the workshop, PacBio also announced a new 16S rRNA kit for its MAS-seq line of long-read sequencing assays, coming in the second half of this year, and a new bulk RNA-seq kit, with no timeline for launch.
Miga also provided some Revio data in her Thursday afternoon talk, noting that the new platform "consistently" provides the same results as Sequel II in terms of providing "awesome telomere-to-telomere candidate [chromosome assemblies]."
Having last year created the first gapless assembly of a (haploid) human genome, the T2T consortium is working on creating assemblies of diploid human genomes. Miga described the challenges they face as they seek the combination of technologies that requires the least human intervention.
They began by using 170X HiFi coverage and 170X ultralong reads from Oxford Nanopore Technologies, which provided 25 T2T chromosome candidates out of 46 chromosomes.
The T2T consortium is now exploring the use of ONT's Duplex reads, a method that adds adapters to both strands of a DNA molecule and sequences them one right after the other. Doing so provides a "shift in quality" to higher Phred scores, she said, pushing them into the range of Q29 to Q30. The most recent batch of duplex flow cells her lab tested yielded 65 Gb.
With 42X coverage using only duplex reads, Miga's team was able to get 24 T2T chromosome candidates, while PacBio alone offers 21 candidates. However, she said the two technologies' errors are in different places. She then tried combining 42.5X coverage from HiFi and 20X coverage from duplex reads. "When you turn the same crank, you now have two T2T assemblies," she said. "This is moving us in the right direction."
Miga noted that her UCSC lab is trying out an all-nanopore combination of ultralong reads, duplex reads, and Pore-C, a method for probing chromosome conformation. "We're seeing how far we can take this," she said.
Vollger's Tuesday afternoon talk provided an update on the Fiber-seq method, which he said provides a data profile that looks like ATAC-seq (assay for transposase-accessible chromatin by sequencing). Vollger has led development of new computational tools that he believes will help more groups adopt the method.
Fiber-seq takes nuclei and treats them with a methyltransferase which "stencils" regions of open chromatin with m6A, a mark that PacBio's instruments can pick up on. It correlates well with short-read-based chromatin accessibility assays, he said.
"The exciting part about Fiber-seq is that you're sequencing a real molecule, not a PCR-amplified product," which means the assay is quantitative. "If you're looking at regulatory elements on single molecules of data, you can calculate the exact fraction of cells with accessible DNA at a particular site," he said.
Because it's a single-molecule long read, "we can tell you if a SNP 20 Kb away is affecting a regulatory element close [to the accessible region]," he said.
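To make the quantitative claim concrete, the per-site fraction Vollger described can be thought of as a simple count over single-molecule reads. The Python sketch below is a toy illustration only: the read representation, the `min_marks` threshold, and the rule for calling a molecule "accessible" are assumptions made for this example, not the Stergachis lab's actual method or toolkit.

```python
from typing import Optional


def fraction_accessible(reads, start, end, min_marks=3) -> Optional[float]:
    """Fraction of reads spanning [start, end) with at least `min_marks` m6A calls inside it.

    Returns None when no read spans the whole window.
    """
    spanning = [r for r in reads if r["start"] <= start and r["end"] >= end]
    if not spanning:
        return None
    accessible = sum(
        1
        for r in spanning
        if sum(start <= pos < end for pos in r["m6a"]) >= min_marks
    )
    return accessible / len(spanning)


# Hypothetical reads: alignment start/end plus positions of m6A calls.
toy_reads = [
    {"start": 0, "end": 20_000, "m6a": [1_010, 1_050, 1_120, 1_180]},  # accessible
    {"start": 500, "end": 18_000, "m6a": [1_030, 9_000]},              # one mark in window
    {"start": 0, "end": 15_000, "m6a": [1_005, 1_100, 1_190]},         # accessible
    {"start": 0, "end": 22_000, "m6a": []},                            # no marks
    {"start": 1_100, "end": 30_000, "m6a": [1_150, 1_160, 1_170]},     # does not span window
]

print(fraction_accessible(toy_reads, 1_000, 1_200))  # 2 of 4 spanning reads -> 0.5
```

Because each read is an unamplified molecule, the denominator is a direct count of molecules observed at the site rather than a PCR-skewed read depth.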
Most exciting for him is that it's identifying new regulatory elements never seen before, because they're in regions that are difficult to sequence without long reads. These are some of the fastest-evolving regions in the human genome, offering a chance to discover exciting new biology.
Vollger's talk focused on a new m6A caller and a toolkit that is 1,000 times faster than the previous pipeline. Now, analysis can be run in about six CPU hours, down from 5,000 or even 10,000 CPU hours.
About a dozen labs are already using Fiber-seq, and the Stergachis lab is running a pilot project with the All of Us and HPRC teams to try the method out in those study contexts.
In a Thursday afternoon talk, Winston Timp, a biomedical engineering professor at Johns Hopkins University, showcased many of the ways his lab is using sequencing technologies from Oxford Nanopore and PacBio.
For starters, his lab is working with the MIT spinout Volta Labs to automate the process of extracting high molecular weight DNA. In general, the sample prep process is one of the obstacles to analyzing more samples with long reads.
His lab paired the Volta platform with Oxford Nanopore's MinION sequencer. The pilot run generated "greater than 10 Gb sequencing yield with pretty good N50" of about 8.4 Kb, he said.
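Read N50 is the length at which reads of that size or longer account for at least half of all sequenced bases, which makes it less sensitive to many very short reads than a simple average. A minimal Python sketch of that standard calculation follows; the read lengths are invented for illustration and are not from Timp's run.

```python
def n50(lengths):
    """Return the N50: the largest length L such that reads >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() requires at least one read length")


# Hypothetical read lengths in bases.
toy_lengths = [2_000, 3_000, 4_000, 5_000, 6_000]
print(n50(toy_lengths))  # 6,000 + 5,000 covers at least half of 20,000 bases -> 5000
```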
Timp also shared his lab's efforts to use nanopore sequencing on single-cell gene expression libraries generated on 10x Genomics' Chromium instrument.
"When cDNA comes off the 10x instrument, it's actually full-length," he said. "They chop it down to fit on short-read sequencing."
"When you want to ask questions about isoforms, which are extremely important in neural samples and in the brain, you can do it just by pairing existing 10x tools to either PacBio or Oxford Nanopore," he added.
The lab used this technology combination to identify isoforms for Npas1 in inhibitory neurons, "which would have been impossible before" with only short-read technology, Timp said.
To make this type of sequencing even cheaper, Timp's lab has been collaborating with Twist Bioscience to design tiled hybridizing probe sets for specific genes, which can result in 1,000- to 5,000-fold enrichment.