BALTIMORE – Researchers at the New York Genome Center and Weill Cornell Medicine, along with their collaborators from Oxford Nanopore Technologies, have developed a new assay and algorithm that can decipher high-order three-dimensional (3D) interactions between more than two genomic loci at a genome-wide scale.
Described in a Nature Biotechnology paper earlier this month, the method, named Pore-C, combines chromosome conformation capture (3C) with nanopore sequencing to directly detect 3D locus groupings across the entire genome.
In conjunction with Pore-C, the scientists also developed Chromunity, a statistical algorithm that can uncover these biologically important interactions and distinguish them from background contacts.
Although high-order 3D interactions among groups of genomic loci are commonly observed in human chromatin, their role in gene regulation remains nebulous, said Marcin Imieliński of the New York Genome Center and Weill Cornell Medicine, the study’s senior author.
According to Imieliński, over the past two decades, 3C-based assays such as Hi-C have become the "workhorse" of the 3D genomics field. By sequencing DNA concatemers, which form when spatially proximal DNA fragments in the sample are ligated together, Hi-C can infer how frequently pairs of DNA loci are close to each other in 3D.
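The pairwise bookkeeping behind a Hi-C contact map can be sketched in a few lines of Python. This is a simplified illustration, not part of any published pipeline: it assumes each read pair has already been aligned to a (chromosome, position) for both ends, and the 1 Mb bin size and the function name `bin_pairs` are hypothetical choices.

```python
from collections import Counter

# Hypothetical bin size of 1 Mb; real analyses pick a resolution to suit depth.
BIN_SIZE = 1_000_000

def bin_pairs(read_pairs, bin_size=BIN_SIZE):
    """Count pairwise contacts between genomic bins.

    Each read pair is ((chrom1, pos1), (chrom2, pos2)). Both ends are mapped
    to bins and the unordered bin pair is counted, giving a sparse contact
    matrix (upper triangle only) stored as a Counter.
    """
    counts = Counter()
    for (c1, p1), (c2, p2) in read_pairs:
        a = (c1, p1 // bin_size)
        b = (c2, p2 // bin_size)
        counts[tuple(sorted((a, b)))] += 1  # order-insensitive key
    return counts

pairs = [
    (("chr1", 1_200_000), ("chr1", 5_700_000)),
    (("chr1", 5_100_000), ("chr1", 1_900_000)),  # same bin pair, reversed
    (("chr2", 300_000), ("chr1", 1_500_000)),
]
print(bin_pairs(pairs))
```

Sorting each bin pair before counting makes the matrix symmetric by construction, which is why only one triangle needs to be stored.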
However, because most 3C-based methods rely on Illumina short-read sequencing, which typically captures just two ligated fragments per read pair, they cannot detect higher-order groupings of genomic loci beyond pairwise interactions.
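The limitation of pairwise data can be shown with a toy example. In this hypothetical sketch (the loci A, B and C and both helper names are illustrative), one dataset contains a single three-way contact and the other contains three separate two-way contacts. Once each contact is reduced to its pairs, the two datasets look identical.

```python
from collections import Counter
from itertools import combinations

def pairwise_counts(contacts):
    """Decompose each multi-locus contact into all of its locus pairs."""
    counts = Counter()
    for loci in contacts:
        for pair in combinations(sorted(loci), 2):
            counts[pair] += 1
    return counts

def exact_counts(contacts):
    """Count each multi-locus contact as a whole, ignoring locus order."""
    return Counter(frozenset(loci) for loci in contacts)

# One "team meeting" of three loci...
meeting = [{"A", "B", "C"}]
# ...versus three separate one-on-one conversations.
one_on_ones = [{"A", "B"}, {"B", "C"}, {"A", "C"}]

print(pairwise_counts(meeting) == pairwise_counts(one_on_ones))  # True
print(exact_counts(meeting)[frozenset("ABC")])                   # 1
print(exact_counts(one_on_ones)[frozenset("ABC")])               # 0
```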
Imieliński suggested imagining the cell nucleus as a big company and the genomic loci as its employees. While Hi-C can reveal how frequently pairs of employees are close to each other, it rarely detects interactions among a group of individuals. In real life, however, important decisions within an organization are often made during team meetings, he said.
In addition to 3C, other genome-wide approaches, such as split-pool recognition of interactions by tag extension (SPRITE) and genome architecture mapping (GAM), investigate high-order 3D chromatin structure without proximity ligation, either by applying molecular barcodes to chemically fixed chromatin complexes or by sequencing thin sections of fixed nuclei. However, Imieliński said these methods mostly detect long-range genomic interactions and lack the specificity to detect the close-range interactions between loci that may be pertinent to gene regulation. In terms of the company analogy, these technologies may be able to tell which individuals are in the same office building, but cannot discern more granular activities within the same room, he added.
To bridge the gap, Imieliński’s team married the backbone of Hi-C with nanopore sequencing. "We thought long reads would be good for this," he said, adding that the "rapid increase" in long-read sequencing throughput in recent years, driven by Oxford Nanopore’s PromethION, made it technically feasible to scale the assay up genome-wide.
Mechanistically, Imieliński said, Pore-C is "essentially taking the 3C protocol with some modifications." When DNA is digested with restriction enzymes and the fragments are re-ligated based on their proximity, "it’s like a musical chairs thing happens," he explained, and sequencing the resulting concatemers can reveal which genomic loci were near one another in 3D.
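Conceptually, a single Pore-C read is a chain of ligated fragments that can be aligned piece by piece, so one read yields one multi-way contact. The sketch below is a simplification under stated assumptions, not the Pore-C pipeline: it takes already-aligned segments of one read as (chromosome, start, end) tuples, assigns each to a fixed-size bin by its midpoint, and returns the distinct bins touched. The number of bins is the contact’s order. The bin size and the function name `concatemer_to_contact` are hypothetical.

```python
# Hypothetical 1 Mb resolution; the published analyses may use other bin sizes.
BIN_SIZE = 1_000_000

def concatemer_to_contact(segments, bin_size=BIN_SIZE):
    """Collapse one read's aligned segments into a multi-way contact.

    segments: list of (chrom, start, end) alignments from a single read.
    Returns a sorted tuple of distinct (chrom, bin_index) pairs; segments
    whose midpoints fall in the same bin are merged into one entry.
    """
    bins = {(chrom, ((start + end) // 2) // bin_size)
            for chrom, start, end in segments}
    return tuple(sorted(bins))

read = [
    ("chr8", 127_735_000, 127_736_200),
    ("chr8", 128_220_400, 128_221_000),
    ("chr8", 127_900_100, 127_900_900),  # same 1 Mb bin as the first segment
    ("chr14", 105_600_000, 105_601_500),
]
contact = concatemer_to_contact(read)
print(contact, "order =", len(contact))
```

Short-read Hi-C would see at most one pair out of this read. Keeping the whole set preserves the three-way grouping.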
Compared with conventional 3C methods, Pore-C is also simpler because it eliminates the need for amplification and for the biotin pull-down often used to enrich ligated fragments, according to Imieliński. In addition, because nanopore sequencing can detect base modifications, Pore-C provides bonus information on DNA methylation on top of high-order 3D genomic structure.
With Pore-C enabling researchers to detect 3D group activities of loci at the genome scale, the next big challenge is analysis, Imieliński said: distinguishing biologically important genomic "meetings" from the background of pairwise interactions, the equivalent of watercooler talk in real life.
To do that, his team developed Chromunity, a statistical algorithm that identifies sets of genomic loci whose high-order 3D contacts occur significantly more often than expected from the background, so-called "synergies."
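The idea of a synergy (a group of loci that co-occur on concatemers more often than their lower-order contacts would predict) can be illustrated with a simple test. This is not Chromunity’s model. It is a hypothetical sketch that uses the Kirkwood superposition approximation as a stand-in null: it estimates the expected three-way frequency from observed single and pairwise frequencies, then scores the observed count with a one-sided Poisson tail probability. The names `poisson_sf` and `triplet_synergy` and the toy data are assumptions.

```python
import math
from itertools import combinations

def poisson_sf(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), summed term by term."""
    term = math.exp(-expected)  # P(X = 0)
    cdf = 0.0
    for x in range(observed):
        cdf += term
        term *= expected / (x + 1)  # turn P(X = x) into P(X = x + 1)
    return max(0.0, 1.0 - cdf)

def triplet_synergy(concatemers, triplet):
    """Compare observed vs. expected co-occurrence of three loci.

    concatemers: list of sets of locus IDs, one set per read.
    Expected count uses the Kirkwood superposition approximation,
        p_abc ~= p_ab * p_ac * p_bc / (p_a * p_b * p_c),
    i.e. the three-way frequency implied by single and pairwise frequencies.
    Returns (observed, expected, one-sided Poisson p-value).
    """
    n = len(concatemers)

    def freq(loci):
        # Fraction of concatemers containing every locus in `loci`.
        return sum(1 for c in concatemers if loci <= c) / n

    singles = math.prod(freq({locus}) for locus in triplet)
    pairs = math.prod(freq(set(pair)) for pair in combinations(triplet, 2))
    expected = n * pairs / singles if singles > 0 else 0.0
    observed = sum(1 for c in concatemers if set(triplet) <= c)
    return observed, expected, poisson_sf(observed, expected)

# Hypothetical toy data: loci A, B and C meet as a trio 10 times, and each
# also appears 40 times with an unrelated partner (X, Y or Z).
data = ([{"A", "B", "C"} for _ in range(10)]
        + [{"A", "X"} for _ in range(40)]
        + [{"B", "Y"} for _ in range(40)]
        + [{"C", "Z"} for _ in range(40)])
obs, expected_count, p = triplet_synergy(data, ("A", "B", "C"))
print(obs, round(expected_count, 2), p)
```

With these made-up numbers the trio is observed 10 times against an expected count of about 1.04, so it would be flagged. Kirkwood’s approximation is only one possible null; the paper and the Chromunity repository describe the background model the authors actually use.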
"We called the algorithm Chromunity because we’re doing an analysis called community detection, which is a popular method to cluster sparse data for single-cell analysis, on these concatemers," Imieliński said.
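The single-cell analogy can be made concrete by treating concatemers as the "cells" and genomic bins as the "genes." The sketch below is not Chromunity’s implementation. It assumes each concatemer has already been reduced to a non-empty set of bins, and it links every pair of concatemers by the Jaccard similarity of their bin sets. It also substitutes a simple deterministic label-propagation algorithm for the modularity-based methods, such as Louvain, that are common in single-cell tools. Function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def jaccard(a, b):
    """Fraction of bins shared by two (non-empty) concatemers."""
    return len(a & b) / len(a | b)

def label_propagation(concatemers, max_iter=100):
    """Group concatemers into communities with overlapping bin sets.

    Builds a weighted graph (edge weight = Jaccard similarity > 0), then
    repeatedly gives each node the label carrying the most edge weight among
    its neighbours, breaking ties toward the smallest label.
    """
    n = len(concatemers)
    neighbours = defaultdict(dict)
    for i, j in combinations(range(n), 2):
        w = jaccard(concatemers[i], concatemers[j])
        if w > 0:
            neighbours[i][j] = w
            neighbours[j][i] = w

    labels = list(range(n))
    for _ in range(max_iter):
        changed = False
        for node in range(n):
            if not neighbours[node]:
                continue  # an isolated concatemer keeps its own label
            weight = defaultdict(float)
            for nb, w in neighbours[node].items():
                weight[labels[nb]] += w
            best = min(weight, key=lambda lab: (-weight[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break

    groups = defaultdict(list)
    for node, lab in enumerate(labels):
        groups[lab].append(node)
    return sorted(groups.values())

# Hypothetical concatemers given as sets of bin IDs.
concatemers = [{1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4},
               {10, 11, 12}, {10, 11}, {11, 12, 13}]
print(label_propagation(concatemers))
```

The all-pairs loop scales quadratically with the number of concatemers. A genome-wide analysis would more plausibly build a sparse k-nearest-neighbour graph and run an established algorithm such as Louvain or Leiden.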
Applying these methods to human cell lines, the researchers found that synergies preferentially involved enhancers and promoters in active chromatin, particularly those of genes associated with cell identity. "There’s a clear link to gene regulation and gene expression, and our technology allowed us to see that," Imieliński said.
Specifically, in prostate cancer cells, the analysis identified synergies at binding sites of androgen-driven transcription factors and at the promoters of androgen-regulated genes, while in breast cancer cells, synergies were linked to a class of complex DNA amplicons called tyfonas.
This study "really pushes the limits of the technologies available in the field to a new state," said Robert Beagrie, a group leader at the Wellcome Centre for Human Genetics in the UK who helped develop the GAM method.
According to Beagrie, although there have been attempts over the past decade to extend 3C methods to detect interactions among more than two genomic loci, this is, to his knowledge, the first study that can "scale those technologies, so that you can look at a whole genome in an appropriate depth all at once."
Another advantage of this method, he said, is that because it is based on 3C, "it should be relatively quick and easy to apply." Ligation-free methods such as SPRITE and GAM can be "a little bit more labor-intensive," he said, adding that he believes "it’s very useful for the field to have a new method that will help us answer these biological questions a little bit more quickly."
More importantly, Beagrie said, the tools described in the study can help decipher the functional importance of higher-order interactions among groups of loci in the genome, which remains "one of the most important outstanding questions in the field."
According to Imieliński, the Pore-C data processing pipeline, which is written in Python, is available as open-source software on GitHub and maintained by Oxford Nanopore. Chromunity, which is written in R, is also available on GitHub and is maintained by his lab.
Despite Pore-C’s strengths, Imieliński also pointed out some current limitations of the method, which his lab plans to improve moving forward.
For one, he said, a tricky aspect of the method, as with many nanopore sequencing-based applications these days, is the high DNA input requirement. After library prep and size selection, the method called for microgram quantities of DNA, which in the paper meant starting with 10 million or more cells, he said. "That’s still a technical challenge," especially for researchers hoping to apply the method to tissues, he added, and an important goal of future experiments is to determine how low the DNA input can go.
Additionally, the data in the study are "still somewhat sparse," he said, and the signal could get even better. If the throughput of these datasets could be increased five- to 10-fold, he added, the method could potentially generate even more granular signals, particularly for higher-order interactions.
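One way to see why higher-order signals are especially sparse is to count the possible combinations of genomic bins. In this back-of-the-envelope sketch, the bin count and the number of contacts are hypothetical, and contacts are assumed to spread uniformly over all combinations, which real chromatin does not do. The point is only the scaling: each step up in order multiplies the number of possible combinations by roughly a thousand at this resolution, while a higher throughput raises per-combination support only linearly.

```python
from math import comb

def mean_reads_per_combination(n_contacts, n_bins, order):
    """Average support per bin combination if contacts spread uniformly."""
    return n_contacts / comb(n_bins, order)

# Hypothetical: ~3,100 bins, roughly the human genome at 1 Mb resolution.
N_BINS = 3_100

for order in (2, 3, 4):
    print(order, comb(N_BINS, order))

# A 5- to 10-fold increase in throughput raises per-combination support by
# the same factor; 10 million three-way contacts is a hypothetical baseline.
base = mean_reads_per_combination(10_000_000, N_BINS, 3)
print(base, 5 * base, 10 * base)
```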
Beyond 3D genomics applications, Pore-C also has "a lot of potential in genome assembly," Imieliński said. "We’re particularly interested in cancer genomes, both in reconstructing highly amplified and rearranged cancer chromosomes and then understanding how they actually impact 3D structures." There is also still room for additional algorithm development for genome assembly, he added, particularly for phased assembly and polyploid or cancer genomes.
Lastly, Imieliński noted that indels remain the main source of error in nanopore sequencing. "As with any other nanopore applications, I think this [method] would benefit from that error rate being lower," he said.
Although Pore-C is theoretically compatible with any long-read sequencing technology, Imieliński said his lab plans to stick with nanopore sequencing for now, given the platform’s scalability and its ability to detect methylation, which he considers an important biological signal.