Researchers at the Gladstone Institutes and elsewhere introduce an algorithm called CellWalker, designed to assess regulatory element features by combining single-cell ATAC-seq (scATAC-seq) data, which profiles open chromatin regions, with single-cell RNA sequencing (scRNA-seq) data and bulk sequencing data. "We present CellWalker, a generalizable network model that improves the resolution of cell populations in scATAC-seq data, determines cell label similarity, and generates cell type-specific labels for bulk data by integrating information from scRNA-seq and a variety of bulk data," the team writes. When the authors applied CellWalker to scRNA-seq and scATAC-seq data from developing human brain samples, they identified candidate cell type-specific regulatory elements, including elements and cell types linked to genes implicated in autism spectrum disorder, developmental delay, and other neurological traits and conditions.
A Peking University-led team outlines the "integration of multiple single-cell datasets by adversarial paired-style transfer networks" (iMAP) algorithm, a deep learning framework for integrating multiple single-cell RNA-seq datasets while reducing the batch effects that make it harder to interpret genuine biological variation. Using available single-cell RNA-seq data generated for more than 50,000 individual cells with Smart-seq2 and 10x Genomics approaches, for example, the researchers profiled tumor-infiltrating immune cells in samples from 18 individuals with colorectal cancer, identifying previously unappreciated immune cell interactions. The iMAP method "may be easily extended to tackle other types of single-cell measurements," they note. "We expect this work to be further improved to suit the multi-dimensional nature of the new single cell data."
Finally, investigators at the University of Zurich and the SIB Swiss Institute of Bioinformatics share a strategy for producing more realistic simulations of high-throughput, short-read Illumina sequence data. The team reasoned that simulations "are a critical part of method comparisons, but for standard Illumina sequencing of genomic DNA, they are often oversimplified, which leads to optimistic results for most tools." In an effort to come up with more authentic sequence simulations, the authors developed a tool known as ReSeq that accounts for systematic sequence errors by training on large datasets, which they applied to 11 available datasets. "We show that ReSeq outperforms all competitors in terms of delivering a realistic simulation," they write, "and therefore lays the methodological groundwork for accurate benchmarking of genomics tools."