NEW YORK – Researchers from the Broad Institute and the Wellcome Sanger Institute have found that results from two pan-cancer CRISPR-Cas9 screens recently published by the institutes were highly concordant across multiple metrics despite significant differences in experimental protocols and reagents.
In a paper published on Friday in Nature Communications, the researchers said both common and specific cancer dependencies were jointly identified across the two studies. Further, robust biomarkers of gene dependency found in one dataset were recovered in the other.
"Through further analysis and replication experiments at each institute, we show that batch effects are driven principally by two key experimental parameters: the reagent library and the assay length," the authors wrote. "These results indicate that the Broad and Sanger CRISPR-Cas9 viability screens yield robust and reproducible findings."
The researchers compared two sets of pooled genome-scale CRISPR-Cas9 drop-out screens in cancer cell lines, considering a total of 147 cell lines and 16,733 genes that were screened independently by both institutes. They performed comparisons of individual gene scores by quantifying the reduction of cell viability that resulted when genes were inactivated through CRISPR targeting. They also analyzed the profiles of the gene scores across cell lines and the profiles of the scores across genes in individual cell lines.
The team found concordant gene scores across all genes and cell lines for processed, unprocessed, and batch-corrected data, noting that the mean gene scores among all cell lines showed excellent agreement. The researchers further tested whether it was possible to recover consistent sets of common dependencies and found that the Broad and Sanger screens jointly identified 1,031 common dependency genes.
In a separate experiment, the researchers also looked at whether the dependency genes identified in the two studies could be reliably associated with informative molecular features of cancer, or biomarkers. To that end, they performed a systematic test for molecular-feature/dependency associations on the two datasets.
Of 29,350 possible associations between molecular features and gene dependency, they found 71 to be significant when using the Broad unprocessed data and 90 when using the Sanger unprocessed data. Of these, 55 (77 percent of the Broad associations and 61 percent of the Sanger ones) were found in both datasets.
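The overlap percentages follow directly from the reported counts. A minimal sketch of that arithmetic in Python is below; the function name `overlap_percentages` is illustrative and does not come from the paper.

```python
def overlap_percentages(n_first, n_second, n_shared):
    """Return the shared associations as a whole-number percentage of each set."""
    return (
        round(100 * n_shared / n_first),
        round(100 * n_shared / n_second),
    )


# Counts reported in the article: 71 significant associations in the Broad
# data, 90 in the Sanger data, and 55 found in both.
broad_pct, sanger_pct = overlap_percentages(71, 90, 55)
print(broad_pct, sanger_pct)  # 77 61
```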
Despite the concordance observed between the two datasets, however, the researchers did find some batch effects in the unprocessed data, both in individual genes and across cell lines. Although the bulk of these effects was mitigated by applying an established correction procedure, the investigators sought to elucidate their cause by conducting a gene set enrichment analysis. They found significant enrichment for genes involved in the spliceosome and ribosomes in the first principal component, indicating that screen quality likely explains some variability in the data. They also found that the choice of single guide RNA can significantly influence the observed phenotype in CRISPR-Cas9 experiments, implicating the differing sgRNA libraries as a likely source of batch effect.
They also noted that the different durations of the screens may have had some effect on the screens' agreement. The Broad used a 21-day assay, whereas the Sanger used a 14-day assay. When the researchers compared the distribution of gene scores for genes known to reduce viability upon inactivation at early or late time points, they found that early dependencies had similar score distributions in both datasets, but that late dependencies were more depleted in the Broad's dataset.
"Our findings illustrate a high degree of consistency in estimating gene dependencies between studies at multiple levels of data processing, albeit with the longer duration of the Broad screens leading to stronger dependencies for a number of genes," the authors concluded.