Stanford Software Uses Bayesian Analysis to Parse Noise, Combine Data Sets for Genetic Screens


NEW YORK (GenomeWeb) – Earlier this week, Stanford scientists led by Michael Bassik published a study in Nature Biotechnology comparing the results of CRISPR/Cas9 knock-out and RNAi knock-down screens for essential genes.

As reported by GenomeWeb, while the screens performed comparably in the number of essential genes found by each method, the results could be combined using a software package developed by first author David Morgens to provide even better results. Running the combined data through the analysis tool revealed genes that were missed by one or both methods.

"It takes data for an overarching hypothesis from multiple, perhaps disparate sources," Morgens told GenomeWeb. "It takes the overlap, plus the confident things that appear in only one screen or the other."

The result is a tool that combines data between any two screening methods to improve results. And because it was developed to generally address experimental noise, it has the potential to be relevant in numerous areas beyond genetic screening.

Morgens, whose background is in mathematics and bioinformatics, said he started working on the analysis tool, called Cas9 high-throughput maximum likelihood estimator (casTLE), even before Bassik's lab did the CRISPR versus RNAi screening comparison.

He had just started rotating into the lab and was trying to get involved in the experimental work. As he acclimated to the bench, he saw that his more experienced lab mates relied on an intuitive sense to decide whether an effect in an experiment was real, and he wondered if he could capture that in code. "It was me trying to take that biological intuition that comes from doing experiments and codifying it in a strict statistical manner," he said.

His lab mates would tell him, "'For a given gene, if one of these reagents looks really good and the data supports that the gene has some kind of function, there are two possible explanations,'" he said. "'Either this gene really does play a role or it's noise.' What they would use to decide between the two was to look at other reagents targeting that gene."

What he came up with was an empirical Bayesian framework that helps parse the noise inherent in biological experiments. In genetic screens, each gene has some maximum possible phenotype, "a true answer," Morgens said. The effect of each reagent targeting that gene lies somewhere between zero and the maximum possible effect, he explained, but is obscured by a lens of noise. "[casTLE] explicitly considers that there are two kinds of noise in our system," he said: technical and biological. "There's the noise that comes from the experiment, and separately there's this idea that if you have multiple reagents targeting the same gene they'll have a range of phenotypes."

"The reason it works is that the technical variability can be estimated," Morgens said. "We have large numbers of reagents in our experiment that we expect to have no effect whatsoever. We can measure that component of noise." For example, if only a few of the CRISPR/Cas9 guide RNAs targeting a gene showed a distinct phenotype, but the shRNAs also indicated that the gene might be important, combining the two data sets with casTLE could use that information to improve confidence in the CRISPR result.
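The idea of measuring technical noise from reagents expected to have no effect can be illustrated with a short Python sketch. This is not the casTLE code: the function names, the use of log enrichment ratios, and the Gaussian summary of the controls are all assumptions made for illustration.

```python
from statistics import mean, stdev


def estimate_technical_noise(control_log_ratios):
    """Summarize no-effect control reagents as a (center, spread) pair.

    Assumption: each value is a log enrichment ratio for one negative
    control reagent, and the controls are roughly Gaussian.
    """
    return mean(control_log_ratios), stdev(control_log_ratios)


def standardized_effect(log_ratio, center, spread):
    """Express one reagent's measurement in units of technical noise."""
    return (log_ratio - center) / spread


# Hypothetical control measurements, not data from the study.
controls = [0.1, -0.1, 0.2, -0.2, 0.0]
center, spread = estimate_technical_noise(controls)

# A reagent far outside the control spread is unlikely to be technical noise.
z = standardized_effect(-1.0, center, spread)
```

With many control reagents the spread estimate becomes stable, which is why a large pool of no-effect reagents makes this component of the noise measurable.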

The biological noise is trickier. This kind of noise can happen a lot in CRISPR/Cas9 knockout screens, since an indel might lead to an in-frame mutation that leads to a phenotype where the gene product is altered but still somewhat functional. Morgens said that while it's impossible to measure that, "what we can do is make the fewest assumptions possible."

What casTLE does is combine the two sources of noise, and comparing genetic screens turned out to be a great proof-of-concept. For every gene, it spits out the maximum possible effect and a confidence score.
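A toy Python sketch shows how an effect estimate and a confidence score might fall out of such a model. It is an illustration under stated assumptions, not the published casTLE implementation. It assumes that each reagent's true effect is spread uniformly between zero and the gene's maximum effect, that technical noise is Gaussian with a known standard deviation sigma, and that the best effect is found by a simple grid search. The function names and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def log_likelihood(effect, observations, sigma):
    """Log-likelihood of the observations if the gene's maximum effect is `effect`."""
    noise = NormalDist(0.0, sigma)
    total = 0.0
    for x in observations:
        if effect == 0:
            # No true effect: the observation is pure technical noise.
            density = noise.pdf(x)
        else:
            lo, hi = sorted((0.0, effect))
            # True reagent effect uniform on [lo, hi], blurred by Gaussian noise.
            density = (noise.cdf(x - lo) - noise.cdf(x - hi)) / (hi - lo)
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total


def estimate_effect(observations, sigma, grid):
    """Return the grid value with the highest likelihood and a
    likelihood-ratio score against the no-effect hypothesis."""
    best = max(grid, key=lambda e: log_likelihood(e, observations, sigma))
    score = 2 * (log_likelihood(best, observations, sigma)
                 - log_likelihood(0.0, observations, sigma))
    return best, score


# Candidate effects from -5.0 to 5.0 in steps of 0.1 (includes 0.0).
grid = [i / 10 for i in range(-50, 51)]

# Hypothetical reagent measurements for two genes, not data from the study.
hit_effect, hit_score = estimate_effect([-3.8, -0.5, -2.1, -3.0], 0.5, grid)
null_effect, null_score = estimate_effect([0.1, -0.2, 0.05, 0.3], 0.5, grid)
```

In this sketch the first gene gets a strongly negative effect and a large score even though one of its reagents looks weak, while the second gene's score stays near zero. Under the same assumptions, reagents from a second screening method could be pooled by summing their log-likelihoods, each with its own sigma, before taking the maximum.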

"I built this framework with no thought in mind of combining results from multiple screens, but we realized, almost after the fact, that the difference between Cas9 and shRNA screens should be only the technical noise," Morgens said.

Doing CRISPR/Cas9 and RNAi screens side by side, by the same people, helped eliminate other technical noise. "We knew some of the answers already and we knew what to expect, which made the comparison much easier," he said.

Because it is reagent agnostic, Morgens said that as it currently exists, casTLE could combine data from any two screening methods. In the paper, he validated casTLE with CRISPR interference and activation as well as knock-out and shRNA knock-down. He also thinks it could be applied in target identification in drug screening.

"What we're hoping to do, now that we have this tool, is apply it to new biological problems." Morgens put the code to help run casTLE in an online, open-source code repository. "People can download it if they want to," he said. "That's the exciting part about the academic world. This is out there now, and people can make it better."