CHICAGO – Researchers affiliated with the University of Oxford and genomics startup Nucleome Therapeutics have published a new method that maps 3D genome structure down to a single base pair.
"We've got it down to the lowest possible level. It cannot get any higher resolution," said Jim Hughes, a professor of gene regulation at Oxford and a cofounder of Nucleome.
The technique, called Micro-Capture-C (MCC), is an advanced chromosome conformation capture (3C) method. It uses multiplexed assays and combines a new molecular biology protocol with a new computational platform to zoom in on previously obscure regions of the genome. It is primarily meant to inform drug discovery.
The researchers described their work in a paper published last week in Nature. It builds on Capture-C, a technique first published in 2014 that two of the new paper's key authors developed at Oxford's MRC Weatherall Institute of Molecular Medicine, and on a 2015 paper in which some of the same researchers described a 3C method with much lower resolution than MCC.
Micro-Capture-C boosts resolution to a single base pair, enabling detection of physical contacts between different gene regulatory elements. Previous chromatin-mapping methods worked at resolutions of 1 kilobase or coarser, too coarse to define the physical contacts that control gene expression at the scale of individual proteins.
"The problem with these [earlier] techniques, though, is that the resolution is somewhat limited below about 500 base pairs," said James Davies, a hematologist in the MRC Weatherall Institute and an academic founder of Nucleome. "Most of the key proteins that control when genes are turned on and off bind much shorter sequences of DNA, say something between seven and 22 base pairs, so we really wanted to see the mechanics in much finer detail."
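A toy calculation makes Davies' point concrete. The Python sketch below assumes two hypothetical 15-bp protein-binding sites separated by 40 bp and counts how many distinct peaks survive when contact positions are grouped into 500-bp bins versus 1-bp bins. The positions and the helper names `bin_contacts` and `count_peaks` are illustrative inventions, not part of MCC or its software.

```python
# Toy illustration: bin size limits which nearby signals can be told apart.
# All positions are hypothetical and chosen only to illustrate the idea.

def bin_contacts(positions, bin_size):
    """Count contact positions (bp) falling into each bin of width bin_size."""
    counts = {}
    for pos in positions:
        start = (pos // bin_size) * bin_size  # left edge of the bin
        counts[start] = counts.get(start, 0) + 1
    return counts

def count_peaks(counts, bin_size):
    """Count runs of adjacent occupied bins; each run is treated as one peak."""
    peaks = 0
    prev = None
    for start in sorted(counts):
        if prev is None or start - prev > bin_size:
            peaks += 1  # a gap separates this bin from the previous one
        prev = start
    return peaks

# Two hypothetical 15-bp binding sites, separated by a 40-bp gap.
site_a = range(1_000, 1_015)
site_b = range(1_055, 1_070)
contacts = list(site_a) + list(site_b)

print(count_peaks(bin_contacts(contacts, 500), 500))  # 1: sites merge
print(count_peaks(bin_contacts(contacts, 1), 1))      # 2: sites resolved
```

At 500-bp bins both sites fall into the same bin and read as one signal; at 1-bp bins the 40-bp gap between them is visible.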
The MCC method opens up previously unexplored parts of the genome. "This is a marked increase in resolution compared to all other available 3C methods, including Hi-C, Micro-C, promoter capture Hi-C, 4C, and next-generation (NG) Capture-C," the authors wrote.
Oxford, UK-based Nucleome Therapeutics, which has exclusively licensed MCC from the university, wants to decode the "dark genome" to open up new possibilities for drug discovery and development, according to CEO and Cofounder Danuta Jeziorska.
"The discovery and development of therapeutics is really costly and time-consuming," Jeziorska noted, but added that genetics is helping to reduce the failure rate. "However, most of the drug discovery in genomics is focusing on just 2 percent of our DNA."
The other 98 percent is the poorly characterized "regulatory dark matter" of the genome, which is considered to be noncoding, she said. "This 98 percent of the DNA acts like an instruction manual for the other 2 percent."
Nucleome was founded in August 2019 as a spinout from Oxford. The company raised £5.2 million ($7.3 million) in a seed round led by Oxford Sciences Innovation in its first year, and also was part of the 2019-20 cohort of Creative Destruction Lab, a business accelerator for science and technology companies.
Jeziorska said that the company has eight employees and will be adding at least five more in the next few months.
The gains described in the Nature article came through five advances over the NG Capture-C method: the use of micrococcal nuclease rather than restriction enzymes; using intact cells permeabilized with digitonin instead of chromatin in solution or purified nuclei to minimize disruption of nuclear architecture; a 1,000-fold increase in the depth of data from individual viewpoints over Hi-C and Micro-C; the generation of contact maps accurate to a single base pair by directly sequencing ligation junctions between DNA fragments from different sites; and the development of a new data analysis pipeline to help pinpoint ligation junctions and reconstruct protein-protein interactions.
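The idea of pinpointing a ligation junction within a sequenced read can be illustrated with a deliberately simplified sketch. It assumes a read that begins inside a known viewpoint sequence and then crosses a junction into DNA from elsewhere, and it places the junction where the read stops matching the viewpoint exactly. The function name, seed length, and sequences are hypothetical; this is not the published MCC pipeline, and real tools align reads to the genome rather than relying on exact matching.

```python
def split_at_junction(read, viewpoint, seed_len=10):
    """Split a read into (viewpoint_part, reporter_part) at a ligation junction.

    Toy exact-match approach: find the read's first seed_len bases in the
    viewpoint reference, then extend base by base until the read stops
    agreeing with the reference. Returns None if the seed is not found.
    """
    offset = viewpoint.find(read[:seed_len])
    if offset == -1:
        return None
    n = 0
    while (n < len(read) and offset + n < len(viewpoint)
           and read[n] == viewpoint[offset + n]):
        n += 1
    return read[:n], read[n:]

# Hypothetical viewpoint sequence and a read that crosses a junction after
# 14 bases into DNA from a distant (also hypothetical) reporter site.
viewpoint = "ACGTTGCATGCCGATAGGCTTACG"
read = "TGCATGCCGATAGG" + "TTTTCCCAAAGG"

print(split_at_junction(read, viewpoint))
# ('TGCATGCCGATAGG', 'TTTTCCCAAAGG')
```

Exact matching can place a junction a base too late if the first reporter base happens to match the viewpoint by chance, which is one reason production pipelines align both halves of the read independently.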
"It's not a single technology," Jeziorska said. "It's multiple technologies that are overcoming the key challenges that really hinder the translation of this dark genome to treatments."
The platform allows researchers to pinpoint the variants that affect function and mechanism and to test their effects on gene expression, according to Jeziorska. "Because we are doing everything at scale, it allows us to start identifying disease-affected pathways," she said.
Machine learning and computational genomics tools define the relevant variants and cell types, and the bioinformatics platform then conducts the 3D genome analysis.
Davies and Hughes likened 3D genome analysis to a telescope. A researcher can choose a broad panorama of stars in the sky at low resolution or home in on the fine details of a single star.
"What we've been really trying to understand is how genes are turned on and off [by] look[ing] at individual genes in huge detail," Davies said.
"The key behind Nucleome is that we are coming from the expertise of gene regulation and we combine the molecular biology, the lab experiments, understanding how genes are regulated, together with the computational and also the machine learning," Jeziorska said.
"The method gives us more precision and confidence in linking the genes to genetic changes," she continued. "This allows us to identify potential novel, safe, and better drug targets that are guided by genetics with corresponding biomarkers."
Davies put 3D genome mapping into perspective by noting that if a single base pair of a human genetic sequence were a millimeter across, each cellular nucleus would contain 6,000 kilometers of DNA packed into a ball about 10 meters in diameter.
"Some of these contacts that we can see are 1 million base pairs apart, so that would be a kilometer of DNA," Davies said. "We can measure that to the nearest few millimeters and … we're able to pull out these sequences and work out what they're contacting."
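Davies' analogy reduces to a unit conversion. The sketch below assumes, as he does, 1 mm per base pair, along with roughly 6 billion base pairs per human nucleus (the figure his 6,000 km implies); `bp_to_km` is a hypothetical helper written for illustration.

```python
MM_PER_BP = 1.0  # the analogy's scale: each base pair drawn 1 mm across
MM_PER_KM = 1_000_000

def bp_to_km(bp, mm_per_bp=MM_PER_BP):
    """Length in kilometers of a stretch of DNA at the analogy's scale."""
    return bp * mm_per_bp / MM_PER_KM

# Roughly 6 billion base pairs per human nucleus, rounded as in the quote.
print(bp_to_km(6_000_000_000))  # 6000.0 (km of DNA per nucleus)
print(bp_to_km(1_000_000))      # 1.0 (km for a contact 1 million bp away)
```

At the same scale, measuring "to the nearest few millimeters" corresponds to locating a contact to within a few base pairs.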
On the computational side, Hughes — the principal investigator for the newly published research — said that his laboratory at Oxford applied machine learning to understand functionality embedded in the "dark" genome because genomes simply contain so much data. Nucleome and Oxford built their bioinformatics essentially from scratch.
"This combination of machine learning for the genome analysis functional validation, it's like a compendium of tools that are allowing us to tackle this specific problem," Jeziorska said.
The primary cellular data published in the paper comes from mouse cell lines, though Davies said that the Nucleome and Oxford researchers have subsequently begun applying their technique to primary human cells. They are preparing another manuscript based on this more recent work.
"Conventionally, what people have done is to strip off the cytoplasmic membrane of the cells when they do these kinds of assays," Davies said. Removing the membrane risks disrupting the structure of the nucleus, though, so the researchers took a gentler approach, permeabilizing intact cells with digitonin.
"We … introduced [digitonin] just to punch small holes in the cell membrane," Davies explained. "That allows you to get the reagents in to do the molecular biology without disrupting the nucleus structure."
While the researchers still performed a "conventional" exome capture for the experiment described in Nature, Davies said that with their bioinformatics, they were able to increase the depth from a few thousand to hundreds of thousands of reads.
"These results show that it is possible to identify the precise location of the sequences bound by regulatory proteins in the enhancers that control a specific promoter using a single technique," the authors wrote.
The increased resolution allowed the Oxford-Nucleome team to define previously undetectable contacts with enhancers within 2 kilobases of the promoter. It also permitted them to separate signals from contacts between regulatory elements located very close to each other.
"The increased resolution of MCC enabled us to interrogate gene-dense loci that have previously been difficult to characterize with 3C methods," they added. "Promoters were seen to colocalize in gene-dense regions; the Klf1 locus, for example, contacts at least 15 other promoters and enhancers in its vicinity."
They said that earlier 3C methods lacked the "complex modeling algorithms" to identify peaks of interaction in 3C datasets.
Jeziorska said that Nucleome represents the translation of 10 years of research from multiple scientists at Oxford, particularly Davies and Hughes.
Nucleome is initially focusing on autoimmune diseases, and on lymphocytes in particular, with the goal of building a pipeline of compounds.
"We are using the platform to prioritize and discover novel and safer drug targets," potentially in partnership with pharmaceutical companies, Jeziorska said. "But at the same time, we will be generating different data pipes, so we will be building it as a dataset that allows us to explore how the genome is regulated for drug target discovery by derisking existing targets."
In a commentary accompanying the main article in Nature, Anne van Schoonhoven and Ralph Stadhouders of Erasmus University Medical Center in the Netherlands said that MCC represents a "huge leap forward" for 3D genome mapping resolution.
"[This] approach also enables DNA-binding-protein 'footprints' (the DNA sites to which such proteins bind) to be detected because DNA that is bound to proteins is protected from digestion by MNase," van Schoonhoven and Stadhouders wrote.
"Although, at first glance, the individual technological innovations in the MCC method might not seem revolutionary, when combined, they offer something the field has long been waiting for: a way to precisely detect which DNA bases mediate long-range genomic interactions," they added. Although the new method may not yet be suited to genome-scale analyses, the commentators said it could be improved to allow them in the future.
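The commentators' point about footprints, that protein-bound DNA is shielded from MNase digestion, can be sketched in a few lines. Assuming a list of hypothetical MNase cut positions (observed fragment ends) across a small region, the function below reports uncut stretches whose length falls in the 7- to 22-bp range Davies cited for regulatory proteins. The function name and data are illustrative; the sketch ignores sequencing depth and MNase's known sequence preferences and is not the method used in the paper.

```python
def find_footprints(cut_sites, min_len=7, max_len=22):
    """Return (start, end) spans with no observed MNase cuts of a given length.

    cut_sites: positions (bp) where fragment ends were observed. A candidate
    footprint is the run of uncut bases strictly between two consecutive cuts;
    min_len and max_len default to the 7- to 22-bp binding range Davies cites.
    """
    cuts = sorted(set(cut_sites))
    footprints = []
    for left, right in zip(cuts, cuts[1:]):
        gap = right - left - 1  # number of uncut bases between the two cuts
        if min_len <= gap <= max_len:
            footprints.append((left + 1, right - 1))  # inclusive coordinates
    return footprints

# Hypothetical cut positions: dense cutting every 2 bp, except a protected
# stretch between positions 38 and 55.
observed_cuts = list(range(0, 40, 2)) + list(range(55, 80, 2))
print(find_footprints(observed_cuts))  # [(39, 54)]
```

The length window matters: an uncut stretch much longer than a typical binding site is as likely to reflect low coverage or a nucleosome as a bound regulatory protein.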