Abstract
The CRISPR–Cas9 system has revolutionized gene editing both at single genes and in multiplexed loss-of-function screens, thus enabling precise genome-scale identification of genes essential for proliferation and survival of cancer cells (refs. 1,2). However, previous studies have reported that a gene-independent antiproliferative effect of Cas9-mediated DNA cleavage confounds such measurement of genetic dependency, thereby leading to false-positive results in copy number–amplified regions (refs. 3,4). We developed CERES, a computational method to estimate gene-dependency levels from CRISPR–Cas9 essentiality screens while accounting for the copy number–specific effect. In our efforts to define a cancer dependency map, we performed genome-scale CRISPR–Cas9 essentiality screens across 342 cancer cell lines and applied CERES to this data set. We found that CERES decreased false-positive results and estimated sgRNA activity for both this data set and previously published screens performed with different sgRNA libraries. We further demonstrate the utility of this collection of screens, after CERES correction, for identifying cancer-type-specific vulnerabilities.
References
Wang, T. et al. Identification and characterization of essential genes in the human genome. Science 350, 1096–1101 (2015).
Hart, T. et al. High-resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell 163, 1515–1526 (2015).
Aguirre, A.J. et al. Genomic copy number dictates a gene-independent cell response to CRISPR/Cas9 targeting. Cancer Discov. 6, 914–929 (2016).
Munoz, D.M. et al. CRISPR screens provide a comprehensive assessment of cancer vulnerabilities but generate false-positive hits for highly amplified genomic regions. Cancer Discov. 6, 900–913 (2016).
Cheung, H.W. et al. Systematic investigation of genetic vulnerabilities across cancer cell lines reveals lineage-specific dependencies in ovarian cancer. Proc. Natl. Acad. Sci. USA 108, 12372–12377 (2011).
Marcotte, R. et al. Essential gene profiles in breast, pancreatic, and ovarian cancer cells. Cancer Discov. 2, 172–189 (2012).
Cowley, G.S. et al. Parallel genome-scale loss of function screens in 216 cancer cell lines for the identification of context-specific genetic dependencies. Sci. Data 1, 140035 (2014).
Tzelepis, K. et al. A CRISPR dropout screen identifies genetic vulnerabilities and therapeutic targets in acute myeloid leukemia. Cell Rep. 17, 1193–1205 (2016).
Wang, T. et al. Gene essentiality profiling reveals gene networks and synthetic lethal interactions with oncogenic Ras. Cell 168, 890–903.e15 (2017).
Tsherniak, A. et al. Defining a cancer dependency map. Cell 170, 564–576.e16 (2017).
Fellmann, C., Gowen, B.G., Lin, P.-C., Doudna, J.A. & Corn, J.E. Cornerstones of CRISPR–Cas in drug discovery and therapy. Nat. Rev. Drug Discov. 16, 89–100 (2017).
Corsello, S.M. et al. The Drug Repurposing Hub: a next-generation drug library and information resource. Nat. Med. 23, 405–408 (2017).
Doench, J.G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat. Biotechnol. 34, 184–191 (2016).
Hart, T., Brown, K.R., Sircoulomb, F., Rottapel, R. & Moffat, J. Measuring error rates in genomic perturbation screens: gold standards for human functional genomics. Mol. Syst. Biol. 10, 733 (2014).
Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 15, 554 (2014).
Hart, T. & Moffat, J. BAGEL: a computational framework for identifying essential genes from pooled library screens. BMC Bioinformatics 17, 164 (2016).
Doench, J.G. et al. Rational design of highly active sgRNAs for CRISPR-Cas9–mediated gene inactivation. Nat. Biotechnol. 32, 1262–1267 (2014).
Xu, H. et al. Sequence determinants of improved CRISPR sgRNA design. Genome Res. 25, 1147–1157 (2015).
Xiang, X. et al. Grhl2 determines the epithelial phenotype of breast cancers and promotes tumor progression. PLoS One 7, e50781 (2012).
Werner, S. et al. Dual roles of the transcription factor grainyhead-like 2 (GRHL2) in breast cancer. J. Biol. Chem. 288, 22993–23008 (2013).
Zhang, X.D. A pair of new statistical parameters for quality control in RNA interference high-throughput screening assays. Genomics 89, 552–561 (2007).
Kanehisa, M. & Goto, S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, D353–D361 (2017).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S.L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Boyd, S. & Vandenberghe, L. Convex Optimization 1–730 (Cambridge Univ. Press, 2004).
Acknowledgements
This work was supported by grants U01 CA176058, U01 CA199253, and P01 CA154303 (W.C.H.) and by the Slim Initiative for Genomic Medicine, a project funded by the Carlos Slim Foundation and the H.L. Snyder Foundation.
Author information
Contributions
R.M.M., J.G.B., and A.T. conceived and designed the study. R.M.M., J.G.B., and J.M.M. performed computational analysis and interpretation of results. J.G.B. wrote and implemented the modeling software. R.M.M., B.A.W., and A.E.S. processed and managed data. H.X. and N.V.D. assisted with computational analysis. P.G.M. provided computational tools. G.S.C., S.P., and F.V. provided project management. A.G., Y.L., L.D.A., G.J., R.L., W.F.H., M.S., T.W., D.C.H., V.A.Z., M.R.W., Z.K., J.J.C., and M.O. assisted with data generation. R.M.M., J.G.B., J.M.M., W.C.H., and A.T. wrote and/or revised the manuscript with assistance from other authors. K.S., T.R.G., J.S.B., F.V., D.E.R., W.C.H., and A.T. supervised the study and performed an advisory role.
Ethics declarations
Competing interests
W.C.H. reports receiving a commercial research grant from Novartis and serving as a consultant/advisory-board member for Novartis as well as for KSQ Therapeutics. No potential conflicts of interest are disclosed by the other authors.
Integrated supplementary information
Supplementary Figure 1 Screen quality and the copy number effect in previously published CRISPR–Cas9 essentiality screens.
(a) Screen quality as measured by area under the receiver operating characteristic curve (AUC) in discriminating between sets of common core essential and nonessential genes in two previously published datasets. (b) sgRNA depletion is regressed against genomic cut sites using a saturating linear model. The fit is plotted in red, and the median depletion at various levels of cuts is shown with horizontal bars. The r-squared of the fit is shown as a function of breakpoint in the inset plot. To the right, the distributions of sgRNAs targeting the ribosome, proteasome, and spliceosome are shown in red, non-targeting sgRNAs in blue, and all other sgRNAs in grey. (c) The fit for all cell lines is shown, as in Figure 1b, for the previously published cell lines. (d) For each dataset, the fits in Figure 1b and in panel c are averaged across cell lines, grouped by p53 mutation status. Shaded regions indicate mean ± 1 s.d.
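The saturating linear fit in panel b can be illustrated with a small sketch. It assumes the model form depletion ≈ intercept + slope × min(cuts, breakpoint) and a grid search over candidate breakpoints, which is one plausible reading of the caption rather than the authors' implementation; the function name `fit_saturating_linear` and its return structure are hypothetical.

```python
def fit_saturating_linear(cuts, depletion, breakpoints):
    """Fit depletion ~ intercept + slope * min(cuts, bp) for each candidate bp.

    Assumed model form (not taken from the paper's code). Returns the best
    fit by r-squared and the r-squared obtained at every breakpoint tried.
    """
    n = len(cuts)
    mean_y = sum(depletion) / n
    ss_tot = sum((y - mean_y) ** 2 for y in depletion)
    best = None
    r2_by_breakpoint = {}
    for bp in breakpoints:
        # Cap the number of cuts at the breakpoint, then do ordinary least squares.
        x = [min(c, bp) for c in cuts]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0 or ss_tot == 0:
            continue  # no variation to fit at this breakpoint
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, depletion))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        ss_res = sum((yi - (intercept + slope * xi)) ** 2
                     for xi, yi in zip(x, depletion))
        r2 = 1 - ss_res / ss_tot
        r2_by_breakpoint[bp] = r2
        if best is None or r2 > best["r2"]:
            best = {"breakpoint": bp, "slope": slope,
                    "intercept": intercept, "r2": r2}
    return best, r2_by_breakpoint
```

The second return value corresponds to the kind of r-squared-versus-breakpoint curve described for the inset plot.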
Supplementary Figure 2 Copy number–amplified genes are enriched for depletion ranks in CRISPR–Cas9 essentiality screens.
(a) A one-sample K-S test is performed on the depletion ranks of the 100 most amplified genes per cell line in each dataset. The K-S enrichment statistic for each line is plotted against the mean copy number of these 100 genes. Red points indicate cell lines for which these 100 genes are significantly enriched at P < 0.05. (b) As in Figure 1d, genes are ranked by average guide score for cell lines screened in two previously published datasets, and the ranks (median and IQR) of the 100 genes with the highest copy number measurements are plotted and colored by their mean copy number.
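As a rough illustration of the one-sample K-S test in panel a, the sketch below scales depletion ranks to the unit interval and compares them with a uniform distribution. The scaling, the function name `ks_uniform`, and the use of the one-sided statistic as an enrichment measure are assumptions; significance testing would normally be delegated to a statistics library such as `scipy.stats.kstest`.

```python
def ks_uniform(ranks, n_genes):
    """One-sample K-S statistics of scaled ranks against Uniform(0, 1).

    ranks: depletion ranks (1 = most depleted) of the genes of interest.
    n_genes: total number of ranked genes (used to scale ranks to (0, 1]).
    Returns (d_plus, d_minus, d). A large d_plus means the ranks pile up
    near the most-depleted end of the list.
    """
    x = sorted(r / n_genes for r in ranks)
    n = len(x)
    # Empirical CDF above the uniform CDF (enrichment toward low ranks).
    d_plus = max((i + 1) / n - xi for i, xi in enumerate(x))
    # Empirical CDF below the uniform CDF (enrichment toward high ranks).
    d_minus = max(xi - i / n for i, xi in enumerate(x))
    return d_plus, d_minus, max(d_plus, d_minus)
```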
Supplementary Figure 3 CERES corrects the copy number effect in CRISPR–Cas9 essentiality screens.
(a) Boxplots of gene dependency scores across copy number levels before and after CERES correction of two previously published datasets. (b) Boxplots as in Figure 3a and panel a filtering for only genes called unexpressed by RNA-seq. (c) The dependency of each gene across cell lines is correlated with its copy number measurements, before and after CERES correction. The distribution of Pearson correlation coefficients is shown for each dataset analyzed for all genes on the left and for unexpressed genes on the right. The mean of each distribution is shown with a dotted line.
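The per-gene correlation in panel c can be sketched as follows, assuming dependency scores and copy number values are stored as dictionaries keyed by gene, with one value per cell line in matching order. The data layout and the function names are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def copy_number_correlations(dependency, copy_number):
    """Correlate each gene's dependency with its copy number across cell lines.

    dependency, copy_number: dict mapping gene -> list of values, one per
    cell line, in the same cell-line order (assumed layout).
    """
    return {gene: pearson(dependency[gene], copy_number[gene])
            for gene in dependency}
```

Running this once on uncorrected scores and once on corrected scores would give two distributions of coefficients of the kind the caption compares.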
Supplementary Figure 4 CERES improves the specificity of CRISPR–Cas9 essentiality screens while preserving data in copy number–amplified regions.
(a) The recall of common core essential genes at a 5% FDR of nonessential genes is plotted for each cell line before (red) and after (blue) CERES correction for two previously published datasets. (b) The recall at 5% FDR is plotted after correction using a linear model of copy number correction for the Avana dataset. (c) Using a simple filtering scheme removing all genes with a copy number > 4, the total number of genes filtered per cell line is plotted on the left, and the number of genes per cell line with a CERES gene effect < -0.6 on the right. The means are shown with dotted lines.
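One way to read the recall metric in panels a and b is sketched below: pick the score threshold below which 5% of nonessential genes fall, then report the fraction of essential genes scoring below that threshold. This interpretation, the convention that more negative scores indicate stronger dependency, and the function name `recall_at_fdr` are assumptions, not details confirmed by the caption.

```python
def recall_at_fdr(essential_scores, nonessential_scores, fdr=0.05):
    """Recall of essential genes at a threshold set by nonessential genes.

    Scores are assumed to be more negative for stronger dependency.
    The threshold is chosen so that at most a fraction `fdr` of the
    nonessential genes score strictly below it (when scores are distinct).
    Returns (recall, threshold).
    """
    negatives = sorted(nonessential_scores)
    allowed = int(fdr * len(negatives))  # nonessential genes tolerated as calls
    threshold = negatives[allowed]
    hits = sum(1 for s in essential_scores if s < threshold)
    return hits / len(essential_scores), threshold
```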
Supplementary Figure 5 Example correction of JAK2 amplification in the HEL cell line from the Wang 2017 dataset.
Copy number and gene dependency scores, before and after CERES correction, are plotted as in Figure 3c. JAK2 is highlighted in orange and labeled, as well as RCL1, which is involved in the biogenesis of the 40S ribosomal subunit.
Supplementary Figure 6 CERES preserves known cancer-specific genetic dependencies.
(a-d) Known cancer-specific genetic dependencies are plotted against copy number before (left) and after (right) CERES correction for 342 cell lines. The dependencies plotted are: KRAS dependency colored by KRAS mutation status (a), BRAF dependency colored by BRAF mutation status (b), PIK3CA dependency colored by PIK3CA mutation status (c), and MYCN dependency with neuroblastoma cell lines in purple (d).
Supplementary Figure 7 CERES infers guide activity scores in a previously published dataset.
(a) The composition of guide activity scores inferred by CERES from the Wang 2017 dataset. (b) For the set of 18,501 sgRNAs shared between the Avana and Wang 2017 libraries, sgRNAs are ranked by guide activity scores in each dataset and are plotted against each other, with darker purple representing greater density of sgRNAs.
Supplementary Figure 9 Differentially dependent genes in breast cancer cell lines.
(a,b) Taking genes that were called differential dependencies in breast lines in the Avana dataset, a differential dependency analysis using a dataset of RNAi screens is plotted for genes on chromosome 8q (a) and other regions (b). (c) Example relationships between highly expressed transcription factors in breast cancer and differential dependency scores after CERES correction. Breast cell lines are highlighted in pink. TRPS1 and GRHL2 are labeled in red, indicating that they are on chromosome 8q.
Supplementary Figure 10 Precision recall of random subsamplings of cell lines.
The F1-measure (harmonic mean of precision and recall) is calculated for each random sub-sampling of cell lines and compared to improvement in F1-measure from the full run of CERES on 342 cell lines.
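For reference, the F1-measure named in this caption can be computed as in the short sketch below. The set-based inputs (genes called as dependencies versus a reference set of essential genes) and the function names are illustrative assumptions.

```python
def precision_recall(called, reference):
    """Precision and recall of a set of called genes against a reference set."""
    called, reference = set(called), set(reference)
    true_positives = len(called & reference)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def f1_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```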
Supplementary Figure 11 Hyperparameter optimization of the CERES model.
Error of the CERES algorithm evaluated on training and test data at 25 values of the regularization parameter for each of the three datasets analyzed. As the regularization strength is eased (moving from left to right), error on the training set decreases monotonically, while error on the test set decreases then increases.
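The train/test behaviour described here can be reproduced in miniature with a toy model. The sketch below is not the CERES objective: a one-parameter ridge regression stands in for it, and the grid bounds, data layout, and function names are all assumptions. It sweeps 25 regularization values from strong to weak and picks the one with the lowest held-out error.

```python
def ridge_weight(x, y, lam):
    """Closed-form ridge solution for a single weight with no intercept."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / (sxx + lam)


def mse(x, y, w):
    """Mean squared error of the prediction w * x."""
    return sum((b - w * a) ** 2 for a, b in zip(x, y)) / len(x)


def sweep_regularization(train, test, n_values=25, log10_max=2.0, log10_min=-2.0):
    """Evaluate train and test error on a log-spaced grid, strong to weak.

    train, test: (x, y) tuples of equal-length lists.
    Returns (results, best), where results is a list of
    (lambda, train_error, test_error) and best minimizes test error.
    """
    x_tr, y_tr = train
    x_te, y_te = test
    step = (log10_max - log10_min) / (n_values - 1)
    results = []
    for i in range(n_values):
        lam = 10 ** (log10_max - i * step)  # regularization eases as i grows
        w = ridge_weight(x_tr, y_tr, lam)
        results.append((lam, mse(x_tr, y_tr, w), mse(x_te, y_te, w)))
    best = min(results, key=lambda r: r[2])
    return results, best
```

In this toy setting the training error can only fall as the penalty is relaxed, whereas the held-out error is lowest at an intermediate penalty whenever the unregularized fit overshoots the held-out data.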
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–11
Supplementary Table 1
Cancer cell line information
Sample information for the 342 cancer cell lines used in this study
Supplementary Table 2
sgRNA sequences and targets
sgRNA barcode sequences with genome alignments and coding sequence mappings for the Avana library
Supplementary Table 3
Avana gene-knockout effects
CERES-estimated gene-knockout effects for 342 cancer cell lines screened with the Avana sgRNA library
Supplementary Table 4
GeCKOv2 gene-knockout effects
CERES-estimated gene-knockout effects for 33 cancer cell lines screened with the GeCKOv2 sgRNA library published in Aguirre et al. (2016)
Supplementary Table 5
Wang2017 gene-knockout effects
CERES-estimated gene-knockout effects for 14 AML cell lines screened with the Wang2017 sgRNA library published in Wang et al. (2017)
Supplementary Table 6
Avana guide activity scores
CERES-estimated guide activity scores for sgRNAs in the Avana dataset
Supplementary Table 7
GeCKOv2 guide activity scores
CERES-estimated guide activity scores for sgRNAs in the GeCKOv2 dataset
Supplementary Table 8
Wang guide activity scores
CERES-estimated guide activity scores for sgRNAs in the Wang2017 dataset
About this article
Cite this article
Meyers, R., Bryan, J., McFarland, J. et al. Computational correction of copy number effect improves specificity of CRISPR–Cas9 essentiality screens in cancer cells. Nat Genet 49, 1779–1784 (2017). https://doi.org/10.1038/ng.3984
DOI: https://doi.org/10.1038/ng.3984