NEW YORK – Researchers at the Friedrich Miescher Laboratory of the Max Planck Society and the University of Groningen have developed a new, higher-throughput version of an assay that captures chromatin accessibility and gene expression data from the same cell.
The coassay is based on simultaneous high-throughput ATAC and RNA expression with sequencing (SHARE-seq), a method published in 2020 by Jason Buenrostro and Aviv Regev's labs at the Broad Institute. While that method required custom sequencing runs, the new method, dubbed easySHARE-seq, was designed to be compatible with a standard Illumina sequencing run.
"We have streamlined easySHARE-seq to pack all its barcode complexity into 8 bp in Index 2 [of an Illumina library prep] and 17 bp in Index 1, along with a quicker and more efficient protocol, relative to SHARE-seq," Frank Chan, a researcher at the University of Groningen and senior author of a bioRxiv preprint about the method, said in an email.
EasySHARE-seq should enable new applications such as allele-specific expression studies and could help reduce costs by allowing researchers to tune how many cells they sequence per sample.
Besides SHARE-seq, there are several other methods for obtaining open chromatin and gene expression data from the same cell, said Darren Cusanovich, a researcher at the University of Arizona, who is an expert on single-cell ATAC-seq but was not involved in the study. 10x Genomics' single-cell multiome kit is a commercialized but pricier solution, and Paired-seq, developed by Bing Ren's lab at the University of California, San Diego, is another available academic protocol. "Any methods pushing scale are of value in thinking about designing studies going after larger populations or more [sample] replicates," Cusanovich said. "The challenges are trying to maintain data quality for both assays."
In SHARE-seq, single cells or nuclei are exposed to Tn5 transposase to mark regions of open chromatin, and mRNA is reverse transcribed with a biotin tag. These targets are then tagged with hybridizing barcodes using a split-pool technique, which gives most cells a unique combination of barcoded molecules. The cDNAs are separated from the chromatin using bead pull-down, and each library is sequenced separately.
Paired-seq also uses Tn5-based tagmentation and split-pool barcoding but has "more complex molecular steps," Chan said, including three ligation steps, "which altogether results in much higher costs per cell."
EasySHARE-seq uses only two rounds of barcoding, after which cells are divided into "sub-libraries" of approximately 3,500 cells and the cDNA and chromatin fragment libraries are generated. Each sub-library is amplified using matched indexing primers "to allow identification of paired cellular scRNA- and scATAC-seq profiles. By scaling up the numbers of sub-libraries, this barcoding strategy therefore allows for high-throughput experiments of hundreds of thousands of cells, only limited by the availability of indexing primers," the preprint authors wrote. After quality control, sub-libraries yield approximately 2,500 cells or nuclei, Chan noted.
In proof-of-concept experiments on mouse liver samples, each nucleus had on average 3,629 unique molecular identifiers (a quality metric for gene expression) and 2,213 gene fragments.
Overall costs depend on how many cells are processed per experiment. For sample prep, a 100,000-cell library costs €0.06 ($0.07) per cell. "This makes it slightly more expensive in terms of reagents than SHARE-seq [at $0.05], but because you can multiplex these libraries on any sequencer, it makes it in total much cheaper since you save on sequencing costs," Chan said.
Moreover, researchers can choose how many cells to sequence, rather than sequencing the entire library as was often done with SHARE-seq. "In easySHARE-seq, you can decide in steps of approximately 2,500 cells how many you want to sequence," Chan said. "This allows, among other things, sequencing one sub-library to check for data quality or simply fine-tune the number of cells, which is advantageous in studies with many experiments or conditions."
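As a rough illustration of that step-wise planning, the short Python sketch below rounds a target cell count up to whole sub-libraries. It assumes the approximate figure of 2,500 cells per sub-library after quality control that Chan cited; the function names are hypothetical and are not part of any easySHARE-seq software, and actual yields will vary between experiments.

```python
# Approximate number of cells or nuclei per sub-library that pass
# quality control, as quoted in the article. Real yields will vary.
CELLS_PER_SUBLIBRARY_AFTER_QC = 2_500


def sublibraries_needed(target_cells: int,
                        cells_per_sublibrary: int = CELLS_PER_SUBLIBRARY_AFTER_QC) -> int:
    """Return the smallest number of whole sub-libraries covering target_cells."""
    if target_cells <= 0 or cells_per_sublibrary <= 0:
        raise ValueError("cell counts must be positive")
    # Integer ceiling division: sub-libraries can only be sequenced whole.
    return -(-target_cells // cells_per_sublibrary)


def expected_cells(n_sublibraries: int,
                   cells_per_sublibrary: int = CELLS_PER_SUBLIBRARY_AFTER_QC) -> int:
    """Return the approximate number of cells recovered from n sub-libraries."""
    if n_sublibraries < 0 or cells_per_sublibrary <= 0:
        raise ValueError("inputs must be non-negative / positive")
    return n_sublibraries * cells_per_sublibrary


if __name__ == "__main__":
    for target in (2_500, 12_000, 100_000):
        n = sublibraries_needed(target)
        print(f"{target} cells -> {n} sub-libraries (~{expected_cells(n)} cells)")
```

Under these assumptions, a quality check on a single sub-library would cover roughly 2,500 cells, while a 100,000-cell experiment would correspond to about 40 sub-libraries.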
While there is a well-observed correlation between open chromatin and gene expression, it's not always a perfect correlation, Cusanovich said. "Genes are regulated by multiple elements, some very far away." When making any causal inferences about how one regulates the other, "it's helpful to have both types of information from the same cell," he said, adding that chromatin provides important regulatory information not available from gene expression alone.
With several maturing methods for assaying both cell features, "it becomes about study design, cost, and ease of implementation," Cusanovich said. "Those sound like pedestrian economic considerations, but they're fundamental to biological insights. You need to profile enough cells to see the variation present and have enough replicates to see if they're robust."
With easySHARE-seq, researchers can sequence up to 300 bp of each molecule. "The longer insert sequence improves the power and ability to resolve allele-specific expression, possibly doubling the number of genes where we can find the differences," Chan said.
Whether the method will catch on remains to be seen. The original SHARE-seq paper has been cited 458 times, according to Scopus, Elsevier's citation database. Chan noted that he is the only user of easySHARE-seq so far.
He added that his team is not pursuing commercialization. "We want as many groups as possible to use this, so we are not pursuing this strategy," he said.