NEW YORK – Nanopore-based long-read sequencing can reveal detailed information about higher-order chromatin structure, according to a recent study using a new assay developed by researchers at the New York Genome Center and Weill Cornell Medicine in collaboration with Oxford Nanopore Technologies.
Pore-C, the new chromatin conformation assay, builds on methods like Hi-C to provide information on genome structure that is lost with short-read sequencing technologies.
"It's built on essentially the same assay as Hi-C with a few modifications," said senior author Marcin Imielinski, a researcher at Weill Cornell and the New York Genome Center. Like Hi-C, Pore-C uses crosslinking, in situ digestion, proximity ligation, and sequencing to get information on contact points between genomic locations in three dimensions. But while Hi-C generally provides pairwise contacts, due to shearing the concatenated products for short-read sequencing, Pore-C de-crosslinks the ligated products and uses long-reads to extract information on 20, even 50, loci that might provide insight into higher-order chromatin structures.
Additionally, the method does not require amplification, which minimizes sequencing bias, provides access to repeat regions of the genome, and offers the potential to analyze base modifications, Imielinski said. In the study, the researchers generated concatenated reads with MinION, GridION, and PromethION instruments; sequencing runs had N50 read lengths ranging from approximately 1.5 kilobases to 6.8 kilobases.
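For readers unfamiliar with the N50 metric quoted above, here is a minimal sketch of how it is commonly computed: sort the read lengths from longest to shortest and report the length at which the running total first reaches half of all sequenced bases. The function name and the example lengths are illustrative assumptions, not values or code from the study.

```python
def n50(read_lengths):
    """Return the N50 of a collection of read lengths.

    The N50 is the length L such that reads of length >= L together
    contain at least half of all sequenced bases.
    """
    if not read_lengths:
        raise ValueError("need at least one read length")
    ordered = sorted(read_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical read lengths in bases (made up for illustration).
example_lengths = [8000, 6000, 4000, 2000, 1000, 1000]
print(n50(example_lengths))
```

Because the metric weights reads by the bases they carry, a handful of long reads can raise the N50 well above the median read length.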
His lab and their Oxford Nanopore collaborators used Pore-C to reconstruct complex and aneuploid rearranged alleles in a breast cancer cell line and generated a chromosome-scale de novo assembly of the HG002 genome. They published their results in a bioRxiv preprint in November.
"Our results establish Pore-C as the most simple and scalable assay for the genome-wide assessment of combinatorial chromatin interactions. We look forward to seeing how users of nanopore devices utilize this technique to gain new insights across a broad range of research applications," Sissel Juul, director of genomic applications at Oxford Nanopore and one of the study's authors, said in a statement.
Pore-C adds to a small but growing number of methods that can capture multiple data points on genome or nuclear structure.
"They were able to piece together multiple DNA molecules … and validate some of the multi-way interactions," said Sophia Quinodoz, a doctoral student at the California Institute of Technology. She and her advisor, CalTech Professor Mitchell Guttman, developed one of the other multi-way methods, called split-pool recognition of interactions by tag extension (SPRITE). "That was really great to see. I think they're the first to show that, because there aren't many multi-way methods," she said.
"Until you can get a global picture [of nuclear structure], it's impossible to understand how structure impacts function," Guttman added. Using methods like Hi-C, which debuted in 2009, "we've learned a lot, but there are still many unknowns," he said.
The collaboration that led to Pore-C began in 2017, when Imielinski, a core faculty member of the New York Genome Center, encountered the Oxford Nanopore team, which has a satellite office there.
Both sides contributed bench development and analytical tools. Imielinski's lab developed a custom alignment pipeline to interpret combinatorial outputs, and the collaboration used tools his team developed to look at rearranged genomes. Oxford Nanopore's Eoghan Harrington led development of raw data processing.
In their preprint, the researchers compared their method to both Hi-C and SPRITE.
"Visual inspection of the Pore-C virtual contact map revealed previously identified features of … chromatin structure … all of which were reflected in the corresponding Hi-C dataset," the authors wrote. "Additionally, we found a close correlation between Pore-C virtual pairwise contact maps and Hi-C data."
In addition, the preprint stated that Pore-C "also detects higher-order chromatin structure at 18.5-fold higher efficiency and greater fidelity than SPRITE."
SPRITE uses crosslinking to create clusters of DNA, RNA, and even proteins. After fragmenting chromatin, the method introduces combinatorial barcoding, resulting in clusters of molecules with the same series of tags. After sequencing, those clusters are recreated bioinformatically using the barcodes. In addition to capturing multi-way contacts between molecules, "SPRITE is measuring the 3D distance between different sites," Guttman said.
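The bioinformatic step described above, recreating clusters from shared barcodes, amounts to grouping reads by their full tag sequence. The following is a minimal sketch of that grouping; the read representation, tag names, and function name are hypothetical and do not reflect the SPRITE software itself.

```python
from collections import defaultdict


def cluster_by_barcode(reads):
    """Group loci whose reads carry the same series of barcode tags.

    `reads` is an iterable of (barcode, locus) pairs, where `barcode`
    is a tuple of tags acquired over successive split-pool rounds.
    """
    clusters = defaultdict(list)
    for barcode, locus in reads:
        clusters[barcode].append(locus)
    return dict(clusters)


# Toy reads with made-up tags and loci.
toy_reads = [
    (("A3", "B7", "C1"), "chr1:1200000"),
    (("A3", "B7", "C1"), "chr2:300000"),
    (("A9", "B2", "C5"), "chr4:8800000"),
    (("A3", "B7", "C1"), "chr1:5700000"),
]
for barcode, loci in cluster_by_barcode(toy_reads).items():
    print(barcode, len(loci), loci)
```

In this toy example the first barcode yields a three-way cluster, while the second barcode matches no other read and forms a cluster of one.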
Imielinski said they looked at data in Guttman and Quinodoz's paper to determine that SPRITE clusters had a median order of four, and 11 percent of clusters had an order greater than two. For Pore-C, median order was seven, and 78 percent of concatemers had an order greater than two.
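The two summary statistics cited here, median order and the share of clusters or concatemers with an order greater than two, can be computed from a list of per-read contact counts. This is a small sketch on made-up numbers; the function name and the toy orders are assumptions and are not data from either study.

```python
from statistics import median


def order_summary(orders, threshold=2):
    """Return (median order, fraction of entries with order > threshold)."""
    if not orders:
        raise ValueError("need at least one order value")
    above = sum(1 for order in orders if order > threshold)
    return median(orders), above / len(orders)


# Hypothetical contact orders for five concatemers.
toy_orders = [2, 3, 7, 9, 12]
median_order, fraction_above_two = order_summary(toy_orders)
print(median_order, fraction_above_two)
```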
But Guttman said the comparison was "kind of like comparing a motorcycle and a car. Ultimately, what we're capturing and measuring is different things." SPRITE measures the 3D distance between different sites, while proximity ligation techniques "measure the frequency at which two DNA sites are close together in the nucleus," he said. He also suggested that the Pore-C preprint had overemphasized the number of SPRITE reads whose barcodes did not match other reads, approximately 30 percent in human cells.
Imielinski noted that the preprint was not finalized and that the comparison with SPRITE might need to be adjusted. "SPRITE does have these massively multi-way contacts, e.g. 1,000-way, which we don't see in Pore-C," he said. "Those may represent biologically significant signal."
According to Imielinski, the cost per gigabase of nanopore sequencing is approaching parity with Illumina, making Pore-C a more attractive choice. He added that the method "just has a markedly higher proportion of multi-way contacts among the sequenced reads."
Imielinski's lab has made Pore-C analysis tools available on GitHub.