Nature Papers Discuss Ways to Consider Dropout, Missing Data; Simulation of SARS-CoV-2 Proteins

The high proportion of zeros in typical single-cell RNA sequencing datasets has resulted in widespread but inconsistent use of terminology such as dropout and missing data, a pair of University of Chicago scientists say in this week's Nature Genetics. Arguing that much of this terminology unnecessarily complicates matters, they offer ideas about how to think and talk about such data to limit confusion. First, the authors suggest that observed scRNA-seq counts reflect two distinct factors — the variation in actual expression levels among cells and the imperfect measurement process — and that carefully distinguishing between these contributions can help to clarify thinking. They also propose that method development should begin with a Poisson measurement model, rather than more complex models, because it is simple and generally consistent with existing data. "Both ideas are simple, and neither is new," they write. "We show how many scRNA-seq observation models can be interpreted as combining a Poisson measurement model with different expression models, clarifying their underlying assumptions about expression variation." They also explain how the ideas can help address questions of biological interest, such as whether messenger RNA expression levels are multi-modal among cells.
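
As a rough illustration of the point about combining a Poisson measurement model with different expression models, the sketch below computes how often a gene would be observed as zero under two simple assumptions about expression. It is not the authors' code. The function names, the `depth` parameter (a stand-in for the number of molecules sampled per cell), and the choice of a gamma expression model are assumptions made for this example. It relies on the standard result that a Poisson count whose rate is gamma-distributed follows a negative binomial distribution.

```python
import math


def zero_prob_constant_expression(depth, mean_expr):
    """P(count == 0) when every cell has the same relative expression.

    Assumed model: count ~ Poisson(depth * mean_expr), so zeros arise
    from the measurement process alone.
    """
    return math.exp(-depth * mean_expr)


def zero_prob_gamma_expression(depth, mean_expr, shape):
    """P(count == 0) when expression varies among cells.

    Assumed model: expression ~ Gamma(shape, scale=mean_expr / shape),
    count | expression ~ Poisson(depth * expression). Marginally the
    count is negative binomial, and its zero probability has this
    closed form.
    """
    return (1.0 + depth * mean_expr / shape) ** (-shape)


if __name__ == "__main__":
    depth, mean_expr = 1000.0, 5e-4  # hypothetical values: expected count 0.5
    print("constant expression:", zero_prob_constant_expression(depth, mean_expr))
    for shape in (0.5, 1.0, 10.0, 1e6):
        p0 = zero_prob_gamma_expression(depth, mean_expr, shape)
        print(f"gamma expression, shape={shape}: {p0}")
```

With the same mean expression, the gamma model always gives at least as many zeros as the constant model, and the two agree as `shape` grows large. In this sketch the extra zeros come from variation in expression among cells rather than from a separate "dropout" process.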

The results of a crowdsourced effort to simulate SARS-CoV-2 proteins are reported in this week's Nature Chemistry. While many of the protein structures involved in SARS-CoV-2 activities like infection initiation and replication have been solved, little is known about their relevant conformational changes, a team led by Washington University in St. Louis investigators write. To address this, more than one million citizen scientists from around the world donated their computer resources to the Folding@home distributed computing initiative, which aims to create the first exascale computer and simulate 0.1 seconds of the virus' proteome. "Using this resource, we constructed quantitative maps of the structural ensembles of over two dozen proteins and complexes that pertain to SARS-CoV-2 from milliseconds of simulation data generated for each system," the paper's authors write. "Together, we have run 0.1 s of simulation," uncovering the "mechanisms of conformational changes that are essential for SARS-CoV-2’s replication cycle and reveal a multitude of new therapeutic opportunities."