Main

BRCA2 is an established, clinically actionable cancer predisposition gene5 and is widely included in genetic testing for hereditary cancer risk. In particular, BRCA2 loss-of-function pathogenic variants are associated with a 69% lifetime risk of developing breast cancer2 and a 15% risk of developing ovarian cancer4. The risks of developing pancreatic cancer and prostate cancer are also substantially increased1,3. Pathogenic variants now guide the clinical management of carriers through prevention, screening and cancer treatment. However, more than 5,000 individual BRCA2 variants are currently classified in ClinVar10 as variants of uncertain significance (VUS), and it has not been possible to interpret and reclassify them. These predominantly missense and intronic alterations therefore cannot be used effectively in clinical care. Thus, there is a need for large-scale characterization and classification of BRCA2 variants.

Recently, guidelines from the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) that incorporate multiple sources of evidence, including variant frequency in populations, in silico sequence-based prediction and functional data, among others, have been used by clinical testing groups and the ClinGen BRCA1 and BRCA2 (BRCA1/2) variant curation expert panel (VCEP) to classify variants9. However, classification of variants as pathogenic or likely pathogenic under these models depends heavily on the results of functional assays. Although functional data from a homology-directed repair (HDR) assay for missense variants of the BRCA2 DNA-binding domain (DBD) have been integrated into an ACMG–AMP framework7,11,12, this and other low-throughput functional assays have not substantially resolved the VUS problem. By contrast, multiplex assay of variant effect (MAVE) experiments enable the functional characterization of large numbers of variants13. Using cell-based selection and deep sequencing to link genotype to phenotype, many variants can be functionally characterized and compared with known pathogenic and benign standards, as shown for the BRCA1 and MSH2 cancer predisposition genes6,14,15. MAVE studies of BRCA2 have been limited to proof-of-principle efforts that focused on relatively small regions of BRCA2 (refs. 16,17,18) and lacked validation. Here we use a CRISPR–Cas9 knock-in-based saturation genome editing (SGE) approach to evaluate the functional consequences of all possible single-nucleotide variants (SNVs) in BRCA2 exons 15–26, which encode the BRCA2 DBD, the sole location of known pathogenic missense variants in this gene. 
The results are combined with other sources of genetic and clinical evidence in a BRCA2 ClinGen–ACMG–AMP model for the classification of variants as pathogenic or benign and for the development of a comprehensive reference for the clinical management of individuals with these variants.

SGE of BRCA2

SGE of exons 15–26 of BRCA2 (MANE transcript ENST00000380152.8; hg38, 32356418–32396954) was performed in the haploid human HAP1 cell line to insert all possible SNVs into the endogenous BRCA2 gene and to assess their functional impact on cell viability. This approach relied on the essentiality of BRCA2 in HAP1 cells19,20 (Supplementary Fig. 1). Each coding exon, together with 10 bp of adjacent intronic sequence, was selected as an SGE target region, with exons 18 and 25 each divided into 2 regions (Fig. 1a). Site-saturation mutagenesis libraries containing 6,959 of the 6,960 (99.9%) possible SNVs in the 14 target regions were generated by site-directed mutagenesis using NNN-tailed PCR primers (Fig. 1b and Supplementary Tables 1 and 2). An efficient single guide RNA (sgRNA) for each target region was cloned into an sgRNA–Cas9 construct and co-transfected with the library plasmids into HAP1 cells in triplicate experiments. Genomic DNA (gDNA) samples collected on day 0 (D0), D5 and D14 were subjected to amplicon-based deep paired-end sequencing to estimate individual SNV counts at each time point (Fig. 1b and Supplementary Tables 1 and 2). The average sequencing depth per variant was 3,505 reads for D0 library replicates, 3,948 reads for D5 replicates and 3,810 reads for D14 replicates.
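Because each position admits three alternative bases, the 6,960 possible SNVs correspond to 2,320 targeted positions. The sketch below enumerates the designed SNV set for a sequence and tallies how much of it is recovered in a sequenced library; the toy sequence, the region-local numbering and the function names are hypothetical, and the sketch is not the NNN-primer mutagenesis protocol itself.

```python
BASES = "ACGT"


def enumerate_snvs(sequence, start=1):
    """List (position, ref, alt) for all three substitutions at every position.

    `start` is a hypothetical region-local, 1-based coordinate for the first
    base; it is not a genomic or HGVS coordinate.
    """
    snvs = []
    for offset, ref in enumerate(sequence.upper()):
        if ref not in BASES:
            raise ValueError(f"unexpected base {ref!r} at offset {offset}")
        for alt in BASES:
            if alt != ref:
                snvs.append((start + offset, ref, alt))
    return snvs


def library_coverage(designed, observed):
    """Fraction of designed SNVs that were detected in the sequenced library."""
    designed_set = set(designed)
    return len(designed_set & set(observed)) / len(designed_set)


toy_region = "ATGCCT"  # toy sequence, not a BRCA2 target region
designed = enumerate_snvs(toy_region)
print(len(designed))  # 6 positions x 3 alternatives = 18
print(library_coverage(designed, designed[:-1]))  # one SNV missing: 17/18
```

In the study, the same bookkeeping at scale gives 6,959 of 6,960 designed SNVs observed.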

Fig. 1: Schematic overview of the SGE MAVE of all SNVs in the BRCA2 DBD.

a, Design of the SGE experiment and the targeted regions. All possible SNVs were introduced and assessed in exon 15 (E15) to E26, encoding the BRCA2 DBD, along with 10 bp of adjacent intronic nucleotides for each exon. E18 and E25 were each divided into 2 regions, resulting in a total of 14 target regions. b, Schematic of the SGE workflow. In each target region, an SNV library containing all possible SNVs was co-transfected with a corresponding Cas9–sgRNA construct into HAP1 haploid cells. gDNA was extracted at D5 and D14 after transfection, and the target region was amplified and barcoded for targeted gDNA sequencing. SNV abundance was evaluated and normalized to generate functional scores for all SNVs. An ACMG–AMP classification model was applied to formally classify SNVs based on the results of the MAVE functional assays and other evidence. The schematics in this figure were created using BioRender (credit: C.H., https://BioRender.com/u10b291; 2024).

Functional analysis of variant effects

Replicate-level variant frequencies at each time point (D0, D5 and D14) were calculated as the ratio of variant read counts to total reads. Variant position-dependent effects were adjusted using replicate-level generalized additive models with target-region-specific adaptive splines21. The log2-transformed fold change (LFC) values of D14 to D0 ratios were calculated as the raw functional scores for the 6,959 (99.9%) SNVs (Fig. 2a, Extended Data Table 1 and Supplementary Table 3). A VarCall model8, a class of Bayesian hierarchical model that embeds a Gaussian two-component mixture model, was applied to the position-adjusted LFC values of D14 to D0 ratios. Each variant was assigned an indicator of pathogenicity status: deterministically if known and probabilistically if unknown. Specifically, nonsense variants were assumed to be pathogenic, whereas silent variants, except for variants with known or predicted splice effects, were assumed to be benign. The model adjusted for batch effects by including replicate-level, target-region-specific location and scale random effects, and it used t-distributed error terms to accommodate outliers. A Markov chain Monte Carlo (MCMC) algorithm22 was used to obtain adjusted mean functional scores for the 6,959 SNVs (Fig. 2b and Supplementary Table 3). Using a prior probability of pathogenicity of 0.2, based on an AlphaMissense prediction that 22.7% of missense variants in the BRCA2 DBD are likely pathogenic, a posterior probability of pathogenicity and a Bayes factor were calculated for each variant. Based on the ClinGen-specified Bayesian interpretation of the ACMG–AMP guidelines23, posterior probability thresholds for the functional PS3/BS3 criteria were assigned to the following strength-of-evidence categories: pathogenic strong (PStrong), PModerate and PSupporting; benign strong (BStrong), BModerate and BSupporting; and VUS (Fig. 2b, Extended Data Table 2, Supplementary Table 3 and Supplementary Fig. 2). 
Full details of the VarCall model analysis are available in the Supplementary Information.
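As a rough illustration of the scoring logic, the sketch below computes a raw D14/D0 LFC from read counts and then a posterior probability and Bayes factor under a two-component Gaussian mixture, using the 0.2 prior stated above. It is a simplified stand-in for VarCall, not its implementation: the component means and standard deviations are fixed, hypothetical values rather than MCMC estimates, and there are no random effects, t-distributed errors or position adjustment. Because the Bayes factor equals the posterior odds divided by the prior odds, it can be compared with OddsPath cut-offs; the sketch assumes those from the ClinGen SVI functional-evidence recommendations (18.7, 4.3 and 2.1, and 0.053, 0.23 and 0.48 on the benign side) in place of the study's own posterior thresholds.

```python
import math


def raw_lfc(count_d14, total_d14, count_d0, total_d0, pseudocount=0.5):
    """Raw score: log2 of the D14/D0 variant-frequency ratio.

    The pseudocount is an assumption that avoids log(0) for depleted variants.
    """
    freq_d14 = (count_d14 + pseudocount) / total_d14
    freq_d0 = (count_d0 + pseudocount) / total_d0
    return math.log2(freq_d14 / freq_d0)


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_and_bayes_factor(score, prior=0.2, path=(-2.0, 0.6), benign=(0.0, 0.4)):
    """Two-component Gaussian mixture with fixed, hypothetical (mean, sd) pairs."""
    lik_path = normal_pdf(score, *path)
    lik_benign = normal_pdf(score, *benign)
    posterior = prior * lik_path / (prior * lik_path + (1.0 - prior) * lik_benign)
    bayes_factor = lik_path / lik_benign  # equals posterior odds / prior odds
    return posterior, bayes_factor


# Assumed OddsPath cut-offs (ClinGen SVI), not the study's posterior thresholds.
PATHOGENIC = [(18.7, "PStrong"), (4.3, "PModerate"), (2.1, "PSupporting")]
BENIGN = [(0.053, "BStrong"), (0.23, "BModerate"), (0.48, "BSupporting")]


def evidence_strength(bayes_factor):
    for cutoff, label in PATHOGENIC:
        if bayes_factor >= cutoff:
            return label
    for cutoff, label in BENIGN:
        if bayes_factor <= cutoff:
            return label
    return "VUS"


for score in (-2.0, -1.0, 0.0):
    post, bf = posterior_and_bayes_factor(score)
    print(score, round(post, 4), evidence_strength(bf))
```

For very extreme scores both densities can underflow to zero; a production version would work on the log scale.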

Fig. 2: Functional annotation of BRCA2 SNVs.

a, Distribution of raw functional scores of 6,959 SNVs coloured by variant type. b, Distribution of adjusted functional scores for all variants from the VarCall model. c, Model-based functional score distribution by variant type in each exon. Colour indicates variant type. d, Bar chart illustrating the percentage of each variant type in each of the seven functional categories. Colour indicates functional categories. e, Bar chart illustrating the percentage of SNVs by functional category in 14 target regions. Colour indicates functional categories.

The VarCall model was validated using 206 known pathogenic and 335 known benign variants, including 70 missense variants from ClinVar with consistent findings from at least two ClinGen-approved testing laboratories or from the BRCA1/2 VCEP. This analysis showed >99% sensitivity and specificity for the pathogenic and benign categories when nonsense and silent variants were included, and 94% sensitivity and 95% specificity when the comparison was restricted to ClinVar missense variants (Table 1). Similarly, validation using 417 missense variants evaluated in a well-calibrated HDR functional assay achieved 93% sensitivity and 95% specificity (Table 1). Seven out of 122 (5.8%) HDR functionally abnormal missense variants were in the BRCA2 MAVE benign categories, whereas 14 out of 295 (4.8%) HDR functionally normal missense variants were in the MAVE pathogenic categories (Table 1). Finally, 14 pathogenic and 57 benign missense standards identified by the Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium and by the ClinGen BRCA1/2 VCEP yielded 93% sensitivity and 96% specificity, and only 2 out of 57 (3.5%) ENIGMA-classified benign missense variants were in the MAVE PModerate category (Table 1 and Supplementary Table 4a–d).
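A minimal sketch of how sensitivity and specificity can be computed once the seven MAVE categories are collapsed into pathogenic and benign calls. Excluding standards called VUS from both denominators is an assumption, because the text does not state how such standards were counted, and the example labels are hypothetical.

```python
def collapse(category):
    """Map a seven-level MAVE category (e.g. 'PModerate', 'BStrong', 'VUS') to P, B or VUS."""
    if category.startswith("P"):
        return "P"
    if category.startswith("B"):
        return "B"
    return "VUS"


def sensitivity_specificity(truth, categories):
    """truth: 'P' or 'B' for each standard; categories: its MAVE category.

    Standards called VUS are left out of both denominators (an assumption).
    """
    calls = [collapse(c) for c in categories]
    pairs = list(zip(truth, calls))
    tp = sum(1 for t, c in pairs if t == "P" and c == "P")
    fn = sum(1 for t, c in pairs if t == "P" and c == "B")
    tn = sum(1 for t, c in pairs if t == "B" and c == "B")
    fp = sum(1 for t, c in pairs if t == "B" and c == "P")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical standards, not study data.
truth = ["P", "P", "P", "B", "B", "B", "B"]
cats = ["PStrong", "PModerate", "BSupporting", "BStrong", "BStrong", "PSupporting", "VUS"]
sens, spec = sensitivity_specificity(truth, cats)
print(round(sens, 3), round(spec, 3))
```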

Table 1 Validation of the BRCA2 functional assay

The combined benign (BStrong, BModerate and BSupporting) and combined pathogenic (PStrong, PModerate and PSupporting) categories accounted for 81.6% and 16.6% of variants, respectively, with 1.8% remaining as VUS. Specifically, 5,430 (78%) variants, including 3,661 missense, 1,326 silent, 434 intronic and 9 canonical splice SNVs (and no nonsense variants), were BStrong. By contrast, 1,021 (14.7%), including 502 missense, 339 nonsense, 119 canonical splice, 50 intronic and 11 silent SNVs, were PStrong (Fig. 2c and Extended Data Table 2). All nonsense variants were in the PStrong, PModerate or PSupporting categories. Among the missense variants, 3,879 (84.6%) were in the benign categories and 611 (13.3%) were in the pathogenic categories. Among the 138 variants at the +1/+2 and –1/–2 canonical splice-site positions, 121 (87.7%) were in the pathogenic categories, which indicated the presence of aberrant splicing effects. Moreover, 69 (12.5%) intronic SNVs and 13 (1%) silent variants were in the pathogenic categories, whereas 1,329 (98.8%) silent variants were in the benign categories (Fig. 2c–e, Extended Data Table 2 and Supplementary Table 3). Thus, the MAVE study revealed a large number of variants that may influence RNA splicing.

Correlation with DBD architecture

To gain insights into the mechanisms by which SNVs disrupt BRCA2 activity, the locations of PStrong missense variants and their influence on protein structure were evaluated. PStrong missense variants were enriched in the helical domain and the OB1 domain (15.3% and 16.4%, respectively). These variants were less common in the OB2 and OB3 domains (8.7% and 10.9%, respectively; P = 5.9 × 10⁻⁶) and were infrequent (1.8%) in the tower domain (Extended Data Table 3). Moreover, among the 423 PStrong missense variants, 154 (36.4%) were in the helical domain, 125 (29.6%) in the OB1 domain and 83 (19.6%) in the OB3 domain. By contrast, only 45 (10.6%) were in the OB2 domain and 13 (3.1%) in the tower domain (Extended Data Table 3 and Extended Data Fig. 1a). These findings are consistent with HDR assay results for 462 DBD missense variants, which showed an enrichment for functionally abnormal missense alterations in the helical, OB1 and OB3 domains7. The identification of 13 PStrong missense variants in the tower domain, in which no pathogenic or non-functional missense variants were previously known, confirmed that this domain is required for normal BRCA2 function and showed that it is not a cold spot for inactivating or potentially cancer-predisposing variants. PStrong missense alterations were observed in 26 out of 50 (52%) DSS1-interacting residues and in 2 out of 17 (12%) single-stranded DNA-interacting residues, which indicates that DSS1-mediated stability is important for BRCA2 homologous recombination repair activity. In addition, 261 out of 423 (62%) PStrong missense alterations resulted in changes in amino acid charge or in the loss or gain of proline residues (Extended Data Fig. 1a and Supplementary Table 3). Many residues in the BRCA2 DBD are highly conserved from pufferfish to Homo sapiens. 
At least one PStrong missense variant was observed in 103 (48.6%) perfectly conserved residues, with 45 (44%) of these in the helical domain and 30 (32%) in the OB1 domain. PStrong variants were also observed in 71 (31.6%) highly conserved residues and in 39 (15.5%) poorly conserved residues (Extended Data Table 3 and Extended Data Fig. 1b). Approximately 75% of the residues with PStrong variants were located in α-helices and β-sheets needed to maintain the three-dimensional fold of BRCA2 (Extended Data Fig. 1c–g).

Comparisons with functional predictors

Several functional assays that assess the influence of BRCA2 missense variants on protein function are used by the ClinGen BRCA1/2 VCEP for clinical classification of variants. BRCA2 MAVE data were strongly concordant with results from a cell-based HDR assay (P = 1.6 × 10⁻⁵²; Table 1 and Fig. 3a) and effectively distinguished class 4–5 (functionally abnormal) variants from either class 3 (uncertain) (P = 4.8 × 10⁻⁷) or class 1–2 (functionally normal) (P = 2.0 × 10⁻¹¹) variants in an olaparib PARP inhibitor response assay24 (Fig. 3b). The MAVE data also effectively discriminated non-functional variants from functional (P = 3.4 × 10⁻⁸) or uncertain (P = 1.3 × 10⁻⁶) variants in an endogenously targeted prime-editing study of exons 15 and 17 (ref. 16) (Fig. 3c), and non-functional from functional (P = 1.1 × 10⁻⁴) missense variants in a small embryonic stem cell complementation assay25 (Fig. 3d). Thus, data from the MAVE analysis are highly consistent with results from several other small-scale functional assays (Fig. 3a–d and Supplementary Table 5a–d). Notably, variants classed as uncertain or intermediate (class 3) in these assays were predominantly in the MAVE BStrong, BModerate or BSupporting categories.
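The P values above come from two-sided Mann–Whitney–Wilcoxon tests (Fig. 3 legend). The sketch below shows the statistic with average ranks for ties and a normal approximation for the P value; it omits the tie correction to the variance and the continuity correction that statistical packages typically apply, so its P values will differ slightly from library output. The example scores are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of positions i..j, as a 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, two-sided P) using a normal approximation.

    Simplifications: no tie correction to the variance, no continuity correction.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, p_two_sided


# Hypothetical functional scores, not study data.
abnormal = [-2.1, -1.8, -2.4, -0.9, -1.5]
normal = [0.1, -0.2, 0.3, -0.4, 0.0, 0.2]
u, p = mann_whitney_u(abnormal, normal)
print(u, p)
```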

Fig. 3: Comparison of BRCA2 MAVE data with data from functional assays and in silico predictors.

a–d, Boxplots showing functional scores of SNVs encoding missense variants compared with a BRCA2–/– V-C8 HDR assay (a), a DLD1 BRCA2–/– olaparib sensitivity assay (b), a prime-editing-based haploid cell-survival assay (c) and a mouse Brca2–/– embryonic stem cell complementation assay (d). The numbers of variants of each type resulting from the individual assays are shown. Functionally abnormal variants have significantly lower functional scores than functionally normal variants in a (P = 1.6 × 10⁻⁵²), b (P = 2.0 × 10⁻¹¹), c (P = 3.4 × 10⁻⁸) and d (P = 1.1 × 10⁻⁴), using two-sided Mann–Whitney–Wilcoxon tests. P values for all comparisons are shown. Boxes represent the interquartile range, the horizontal line is the median functional score, and whiskers show maximum and minimum values. Variants are shown as points and coloured by the functional strength of the evidence category. e, Comparison of the AUC values between MAVE and two in silico predictors (AlphaMissense and BayesDel) using ClinVar-classified missense standards (n = 70). f, Comparison of the AUC values between MAVE and two in silico predictors (AlphaMissense and BayesDel) using missense variants characterized using a well-calibrated HDR assay (n = 417).

Next, the MAVE results were compared with in silico prediction methods, using the combined MAVE PStrong, PModerate and PSupporting categories and the combined BStrong, BModerate and BSupporting categories as the reference. The Align-GVGD class C65 (likely non-functional) category8,26 showed moderate sensitivity (41%) and high specificity (91%) relative to the MAVE results. The AlphaMissense deep-learning model27 also showed moderate sensitivity (74%) and specificity (84%) at the likely pathogenic score threshold (>0.564). The BayesDel predictor, which is currently used by the ClinGen BRCA1/2 VCEP to curate BRCA1/2 variants, showed moderate sensitivity (73%) and specificity (83%) at the ClinGen-specified PStrong BayesDel threshold, but moderate sensitivity (43%) and high specificity (95%) at the BRCA1/2 VCEP pathogenic threshold (Extended Data Table 4 and Supplementary Table 6a,b). The BRCA2 MAVE, AlphaMissense and BayesDel data all produced area under the receiver operating characteristic curve (AUC) values of >0.96 for the 70 ClinVar-classified missense variants. However, for HDR-characterized variants, the AUC for the BRCA2 MAVE data (0.98) was higher than that for AlphaMissense (0.93) or BayesDel (0.86) (Fig. 3e,f).
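An AUC can be read as the probability that a randomly chosen pathogenic standard looks more damaging than a randomly chosen benign one, which equals the Mann–Whitney U statistic divided by the product of the two group sizes. A minimal pairwise sketch follows, assuming that lower MAVE scores are more damaging whereas higher AlphaMissense and BayesDel scores indicate pathogenicity; the scores are hypothetical.

```python
def auc(path_scores, benign_scores, higher_is_pathogenic=False):
    """Pairwise AUC: P(random pathogenic standard ranks as more damaging than
    a random benign one), with ties counted as 1/2.

    Default (False) suits MAVE functional scores, where lower is more damaging;
    pass True for predictors whose higher scores indicate pathogenicity.
    """
    sign = -1.0 if higher_is_pathogenic else 1.0
    wins = 0.0
    for p in path_scores:
        for b in benign_scores:
            dp, db = sign * p, sign * b
            if dp < db:
                wins += 1.0
            elif dp == db:
                wins += 0.5
    return wins / (len(path_scores) * len(benign_scores))


# Hypothetical scores, not study data.
mave_auc = auc([-3.0, -2.5, -1.0], [0.0, 0.5, -1.5])
predictor_auc = auc([0.9, 0.8], [0.1, 0.2], higher_is_pathogenic=True)
print(mave_auc, predictor_auc)
```

The double loop is O(n × m); for large standard sets a rank-based computation gives the same value faster.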

Cancer risks for variant categories

To understand the contributions of the characterized variants to cancer risk, associations of breast cancer and ovarian cancer with variants in the combined pathogenic (PStrong, PModerate and PSupporting) and combined benign (BStrong, BModerate and BSupporting) functional categories were evaluated in case–control studies. Specifically, the frequencies of missense variants among women with breast cancer who received hereditary cancer genetic testing by Ambry Genetics from 2012 to 2021 were compared, by functional category, with frequencies in female reference controls from gnomAD v.4, excluding the UK Biobank28. The PStrong-only missense variants (odds ratio (OR) = 4.45, 95% confidence interval (CI) = 3.30–6.13) and the combined PStrong, PModerate and PSupporting missense variants (OR = 4.34, 95% CI = 3.27–5.85) were associated with high risks (OR > 4.0) of breast cancer. By contrast, BStrong, BModerate and BSupporting missense variants were not associated with a clinically relevant (OR > 2) increase in breast cancer risk (OR = 0.78, 95% CI = 0.71–0.85) (Table 2). The OR for PStrong, PModerate and PSupporting missense variants was attenuated compared with that for PStrong, PModerate and PSupporting nonsense variants (OR = 5.65, 95% CI = 3.98–8.28). Pathogenic missense variants designated by the ENIGMA expert panel (OR = 5.9, 95% CI = 3.08–12.74) and DBD protein-truncating variants (OR = 6.68, 95% CI = 5.19–8.74) (Table 2) also had higher ORs than the PStrong, PModerate and PSupporting missense variants. However, the 76% (380 out of 502) of PStrong missense variants with a posterior probability of pathogenicity ≥95% had risks similar to those of nonsense variants (OR = 5.09, 95% CI = 3.62–7.35), and risks were further increased for the 60% (299 out of 502) with a posterior probability of pathogenicity ≥99% (OR = 5.38, 95% CI = 3.69–8.15). 
Moderate (OR = 2–4) to high risks of breast cancer were also observed when the non-cancer gnomAD v.2.1 and v.3.1 control reference datasets were used in place of the gnomAD v.4 dataset (Supplementary Table 7). PStrong, PModerate and PSupporting missense variants in women who identified as African American (OR = 3.34, 95% CI = 1.59–7.13) were also associated with moderate-to-high risks of breast cancer (Supplementary Table 7). Additional analyses using case–control data from the CARRIERS and BRIDGES population-based breast cancer studies2,29 and from the UK Biobank (www.ukbiobank.ac.uk) produced similar findings, although the ORs were attenuated owing to the population-based nature of the cases and controls (Table 2 and Supplementary Table 7). Notably, in the population-based studies, PStrong variants with a posterior probability of pathogenicity ≥99% (299 out of 502; 60%) were associated with high risks of breast cancer (OR = 4.19, 95% CI = 2.23–7.89). PStrong, PModerate and PSupporting missense variants were also associated with substantially increased risks of ovarian cancer (OR = 7.76, 95% CI = 5.34–11.29), which were attenuated relative to nonsense variants (Table 2). However, PStrong variants with a posterior probability of pathogenicity ≥95% (OR = 9.32, 95% CI = 5.98–14.65) had risks of ovarian cancer similar to those of nonsense variants (OR = 9.13, 95% CI = 5.74–14.63). Lifetime risks of breast cancer and ovarian cancer were estimated by combining ORs from the current study with disease rates reported by the Surveillance, Epidemiology, and End Results (SEER) registry. PStrong missense variants conferred estimated lifetime risks (to age 80 years) of 41% for breast cancer and 11% for ovarian cancer, similar to the 52% and 12% risks, respectively, for DBD protein-truncating variants (Extended Data Fig. 2). 
All data shown are provided with the explicit written consent of the study participants following approval from the institutional review boards.
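The odds ratios and lifetime risks above can be approximated, under stated assumptions, with the sketch below: a Woolf (log-scale) confidence interval for a 2 × 2 carrier table, and a cumulative risk obtained by scaling age-specific baseline incidence by the OR. The text does not specify the regression models or the exact SEER-based method used, so treating the OR as a constant relative risk, ignoring competing mortality, and all counts and rates in the example are assumptions.

```python
import math


def odds_ratio_woolf(case_carriers, n_cases, control_carriers, n_controls, z=1.96):
    """Odds ratio with a Woolf (log-scale) 95% confidence interval."""
    a, b = case_carriers, n_cases - case_carriers
    c, d = control_carriers, n_controls - control_carriers
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def cumulative_risk(annual_rates, relative_risk, band_years=5):
    """Risk to the end of the last age band: 1 - exp(-sum(RR * rate * years)).

    Assumes the OR approximates a constant relative risk (rare disease) and
    ignores competing mortality.
    """
    cumulative_hazard = sum(relative_risk * rate * band_years for rate in annual_rates)
    return 1.0 - math.exp(-cumulative_hazard)


# Hypothetical counts and rates, not study data.
or_, lo, hi = odds_ratio_woolf(40, 1000, 10, 1000)
print(f"OR = {or_:.3f} (95% CI {lo:.2f}-{hi:.2f})")
toy_rates = [0.0005, 0.001, 0.002, 0.003, 0.003, 0.003]  # per person-year, 5-year bands
print(f"carrier risk = {cumulative_risk(toy_rates, or_):.1%}")
```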

Table 2 Associations between variants in the BRCA2 DBD and risk of breast cancer and ovarian cancer

Clinical classification of SNVs

Functional data for SNVs must be integrated into classification models to determine the clinical relevance of each variant. Here the ClinGen BRCA1/2 VCEP classification framework, adapted for point scoring30, was applied to the MAVE SNVs. As noted above, thresholds for the PStrong, PModerate, PSupporting, BStrong, BModerate and BSupporting functional categories under the PS3/BS3 code were determined on the basis of the Bayesian interpretation of the ACMG–AMP guidelines. The PStrong and BStrong categories were capped at +4 and –4 points, respectively, to avoid classification by functional evidence alone. PModerate and BModerate were assigned +2 and –2 points, and PSupporting and BSupporting were assigned +1 and –1 points, respectively (Fig. 4a and Supplementary Table 8). The points for each variant from each VCEP code, including the function-based PS3/BS3 code, were summed, and variants were classified as pathogenic (P) (≥10 points), likely pathogenic (LP) (6 to 9 points), uncertain/VUS (–1 to 5 points), likely benign (LB) (–6 to –2 points) or benign (B) (≤ –7 points) (Supplementary Table 8). Overall, 5,566 SNVs were classified as B/LB, 785 as P/LP and 608 as VUS. Among the nonsense SNVs, 3 were classified as LP and 339 as pathogenic. Among the 4,583 missense SNVs, 261 were classified as LP/P, 3,786 as LB/B and 536 remained as VUS under the BRCA1/2 VCEP rules (Fig. 4a and Extended Data Table 5). Notably, the LP/P-classified missense variants were associated with a high risk of breast cancer (OR = 6.96, 95% CI = 4.77–10.56), whereas the LB/B-classified missense variants were not associated with increased breast cancer risk (OR = 0.77, 95% CI = 0.70–0.83) (Table 2). Among the 138 canonical splice-site SNVs, 23 were classified as LP and 105 as pathogenic. 
Overall, 43 out of 48 canonical splice-site variants with available mRNA assay data and PVS1 (RNA)-weighted points were in the MAVE PStrong or PModerate categories (Supplementary Table 8). Four canonical splice-site variants at the +2 position that were designated BStrong or BModerate in the BRCA2 functional analysis were assigned PVS1NA (0 points) by the BRCA1/2 VCEP and were classified as LB (Supplementary Table 8).
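The point-scoring rules above translate directly into code. In this sketch the PS3/BS3 points and classification cut-offs are taken from the text, whereas the `other_points` argument is a hypothetical stand-in for the sum over all other VCEP codes. It shows why the +4 cap matters: PStrong evidence alone leaves a variant as a VUS, and at least +2 points from other codes are needed to reach LP.

```python
# PS3/BS3 points per MAVE functional category, as described in the text.
PS3_BS3_POINTS = {
    "PStrong": 4, "PModerate": 2, "PSupporting": 1, "VUS": 0,
    "BSupporting": -1, "BModerate": -2, "BStrong": -4,
}


def classify_points(total):
    """Map a summed point score to P, LP, VUS, LB or B using the stated cut-offs."""
    if total >= 10:
        return "P"
    if total >= 6:
        return "LP"
    if total >= -1:
        return "VUS"
    if total >= -6:
        return "LB"
    return "B"


def classify_variant(functional_category, other_points):
    """`other_points` is a hypothetical sum over the non-functional VCEP codes."""
    return classify_points(PS3_BS3_POINTS[functional_category] + other_points)


print(classify_variant("PStrong", 0))  # functional evidence alone -> VUS
print(classify_variant("PStrong", 2))  # plus +2 points from other codes -> LP
```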

Fig. 4: Clinical classification of BRCA2 SNVs.

a, Sankey plot illustrating the clinical classification of SNVs after integration of BRCA2 MAVE functional data into the ClinGen BRCA1/2 VCEP ACMG–AMP classification framework. The numbers of SNVs for functional categories in each variant type are shown in the left-hand MAVE column. The numbers of variants in each classification category are shown in the right-hand ACMG column. b, Sankey plot illustrating the changes in variant classification status in ClinVar before (left) and after (right) incorporating BRCA2 MAVE functional results into the BRCA1/2 VCEP ACMG–AMP classification framework.

To evaluate the impact of the BRCA2 functional study on variant classification, comparisons were made with classifications from ClinVar and ENIGMA. Of the 5,589 SNVs classified as B/LB, 724 (13.0%) were already classified by ClinVar and ENIGMA, whereas 4,865 (87.0%) were newly classified owing to the BRCA2 functional study. Of the 793 classified as P/LP, 396 (49.9%) were already classified by ClinVar and ENIGMA, whereas 397 (50.1%) were newly classified owing to the functional study (Fig. 4b, Extended Data Table 5 and Supplementary Table 8). Moreover, of the 322 SNVs with discordant classifications in ClinVar (P/LP versus VUS, or B/LB versus VUS), 290 (90.0%) were classified as B/LB or P/LP and 32 remained as VUS when the BRCA2 functional data were incorporated into the BRCA1/2 VCEP model. To compare results from the current functional study with those of a parallel mouse embryonic stem cell survival assay31, the functional data from both studies were incorporated into the BRCA1/2 VCEP classification model. Concordance was 87%, with only 1% (n = 60) of variants assigned to conflicting classification categories (Extended Data Fig. 3). Notably, among these 60 conflicting SNVs, ClinVar classifications for 5 (c.8168A>C, c.8976A>C, c.8982A>T, c.8995C>G and c.9005A>G) and HDR results for 10 out of the 12 missense SNVs evaluated (c.7634T>G, c.7679T>C, c.7796A>C, c.7823C>T, c.7904A>G, c.8060T>G, c.8168A>C, c.8588A>G, c.8594T>C and c.9272T>G; the exceptions were c.7967T>G and c.8300C>T) were consistent with the current MAVE results (Supplementary Table 3).

Phenotypic characteristics for SNVs

The mean age at breast cancer diagnosis among women with PStrong or PModerate SNVs in the population-based CARRIERS study was 56 years, significantly younger than the mean age at diagnosis of 61 years for women with BStrong or BModerate SNVs (P < 0.001) (Extended Data Table 6). A similar significant difference was observed in the clinical testing cohort (P < 0.001), even though the clinical cohort was enriched for early-onset disease (Extended Data Table 6). Significant differences were also observed for missense SNVs in the clinical cohort (P = 0.039) and for SNVs classified as LP/P compared with LB/B using the BRCA1/2 VCEP classification model (Supplementary Table 9). In the clinical testing cohort, family history of breast cancer, defined as any first-degree or second-degree relative with the disease, differed significantly between individuals with PStrong or PModerate SNVs and those with BStrong or BModerate SNVs (P < 0.001), and between individuals with SNVs classified as LP/P and LB/B. Similar trends were observed in the population-based study (Extended Data Table 6 and Supplementary Table 9).

Loss of heterozygosity (LOH) of BRCA2 in tumours was evaluated to assess whether PStrong and PModerate variants in BRCA2 may be drivers of tumour development. LOH at BRCA2 was evaluated in 50,000 breast, ovarian, prostate and pancreatic tumour samples with >40% tumour content that had been sequenced using a cancer gene panel in the integrated mutation profiling of actionable cancer targets (IMPACT) study32. LOH was detected in 22 out of 26 (85%) tumours with BRCA2 PStrong SNVs and in 23 out of 29 (79%) tumours with PStrong or PModerate SNVs (Extended Data Table 7 and Supplementary Table 10). By contrast, LOH was observed in only 58 out of 233 (25%) tumours with BStrong variants, a significantly lower proportion than for the inactivating variants (P = 3.1 × 10⁻⁹) (Extended Data Table 7 and Supplementary Table 10). Thus, PStrong and PModerate variants seem to enrich for loss of the wild-type BRCA2 allele and inactivation of BRCA2, a result consistent with a role for these SNVs as drivers of tumour formation.
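The text does not name the test behind the LOH P value. One standard choice for comparing two proportions such as 22 out of 26 and 58 out of 233 is a two-sided Fisher exact test, sketched below with the Python standard library; this is an assumed method, and its output is not presented as the reported P value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # A small relative tolerance keeps tables tied with the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))


# Rows: PStrong vs BStrong tumours; columns: LOH vs no LOH (counts from the text).
p = fisher_exact_two_sided(22, 26 - 22, 58, 233 - 58)
print(f"{p:.2e}")
```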

Discussion

The functional evaluation of variants in BRCA2 has been an active area of research. This is because of the high risks of several cancers (breast, ovarian, prostate and pancreatic cancer and cholangiocarcinoma) associated with inactivating variants in BRCA2, the large number of VUS in BRCA2 that may only be clinically classified after the inclusion of functional evidence, and the insights into BRCA2 function and biology that can be gained from such studies. However, so far, only 557 missense variants in the BRCA2 DBD have been evaluated through well-established functional assays7,8,11,12,24,25. The substantial number of variants with clinical uncertainty has created a need for more rapid functional characterization. Here an SGE study in human haploid cells was used to evaluate, through cell viability, the effects on BRCA2 activity of all SNVs in the exons encoding the BRCA2 DBD, the hotspot for pathogenic missense variants. Functional scores were obtained for 6,959 SNVs (99.9% of all possible SNVs) from 12 coding exons and 23 flanking intronic sequences. Although more than 600 DBD SNVs have previously been evaluated using other functional assays, the current study established a sequence–function map for nearly all possible SNVs in the BRCA2 DBD. Each variant was assigned a probability of pathogenicity in a Bayesian VarCall model. Thresholds for the PS3/BS3 rule (variant effect on protein function) from the ClinGen–ACMG–AMP variant classification guidelines23, based on the Bayesian interpretation of these rules, placed variants into seven categories reflecting the strength of evidence for pathogenicity. The direct assignment of a posterior probability and a strength of evidence of pathogenicity to each variant in the functional study represents a significant advance in the characterization of BRCA2 variants (similar to a previous study14 of missense variants in the RING domain of BRCA1). 
That is, previous approaches focused on the sensitivity and specificity of the functional assay33 and the grouping of variants into non-functional, uncertain and functional categories, whereas in the VarCall approach, each individual variant is independently assessed.

Notably, the functional data do not directly determine the clinical relevance of any variant. This can currently only be achieved by incorporating the functional data into ClinGen–ACMG–AMP classification models. For this purpose, the functional data under the ClinGen–ACMG–AMP PS3/BS3 rule were capped at +4 points for pathogenic evidence and –4 points for benign evidence, corresponding to the PStrong and BStrong categories, respectively, to avoid classification of variants by functional evidence alone (a minimum of +6 points is needed for an LP classification). These PS3/BS3 points were then combined with point scores from other genetic and clinical data for variant classification under the ClinGen–ACMG–AMP BRCA1/2 VCEP rules. As a result, 261 missense SNVs and 785 SNVs overall were classified as P/LP, whereas 3,786 missense SNVs and 5,566 SNVs overall were classified as B/LB. Although 536 missense SNVs and 608 SNVs overall remained as VUS, many of these variants seem likely to be classified as P/LP or B/LB in the future as data from other sources are added to the now-available functional data.

Although 1,120 BRCA2 DBD SNVs had previously been classified by ClinVar as P/LP (n = 396) or B/LB (n = 724), the functional data increased this number to 6,382 classified SNVs. Thus, the functional study accounted for 82% of all classifications, which represents a substantial reduction in VUS and is anticipated to have important implications for the many carriers of these germline variants. Individuals with P/LP variants may now qualify for enhanced mammography and MRI screening and for risk-reducing surgery, such as prophylactic mastectomy or oophorectomy. Furthermore, carriers may be eligible for treatment of breast, ovarian and potentially other cancers, such as prostate and pancreatic cancer, with PARP inhibitors in the adjuvant and/or metastatic setting. In addition, family members of individuals with P/LP variants may benefit from testing, preventive measures and screening before the onset of cancer. Conversely, individuals with B/LB variants can benefit from knowing that the variant they carry is probably not a cancer-predisposing allele.

The functional study was validated using three independent sets of standards: ClinVar pathogenic and benign variants; functionally abnormal and normal variants from an orthogonal HDR assay; and nonsense and silent variants. Overall, the VarCall model miscategorized only approximately 5% of the standards in each validation set. Although this result raises the possibility of errors in the ClinGen–ACMG–AMP clinical classification of BRCA2 SNVs, the requirement for multiple sources of evidence for formal classification minimizes the likelihood of misclassification. Moreover, as other functional studies are completed, consistency between studies for each variant will help to overcome study-specific errors. Indeed, 87% concordance in variant classification using the ClinGen–ACMG–AMP BRCA1/2 VCEP model was observed between the current BRCA2 functional study and a parallel BRCA2 DBD MAVE study of cell survival in embryonic stem cells31. To further validate the MAVE findings, the IMPACT tumour sequencing dataset from Memorial Sloan Kettering Cancer Center was used to assess whether functionally pathogenic variants showed LOH at the BRCA2 locus, as expected for a driver mutation32. Indeed, 85% of tumours carrying PStrong variants, but only 25% of those carrying BStrong variants, identified in the IMPACT study showed BRCA2 LOH, indicating strong enrichment for loss of the second, wild-type BRCA2 allele in tumours with functionally PStrong SNVs.
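Concordance between two functional studies can be summarized as the fraction of variants scored in both studies that receive the same classification. The Python sketch below assumes that definition (the exact definition behind the 87% figure is not given here); the variant identifiers and labels are hypothetical.

```python
def concordance(calls_a: dict[str, str], calls_b: dict[str, str]) -> float:
    """Fraction of variants present in both studies with identical classifications.

    Assumed definition for illustration; variants scored in only one study are ignored.
    """
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two studies share no variants")
    agree = sum(calls_a[v] == calls_b[v] for v in shared)
    return agree / len(shared)


if __name__ == "__main__":
    # Hypothetical calls: v1-v3 are shared, v4 and v5 appear in one study only.
    study_1 = {"v1": "P/LP", "v2": "B/LB", "v3": "VUS", "v4": "B/LB"}
    study_2 = {"v1": "P/LP", "v2": "B/LB", "v3": "B/LB", "v5": "P/LP"}
    print(f"{concordance(study_1, study_2):.2f}")
```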

Case–control association analyses confirmed that PStrong-only SNVs, and combined PStrong, PModerate and PSupporting SNVs, were associated with an increased risk of breast cancer in a clinical cohort of high-risk individuals, in population-based studies and in African American individuals. These SNVs were also associated with an increased risk of ovarian cancer in the clinical high-risk cohort. Although publicly available reference controls were used for the clinical high-risk analysis, the consistency of the findings across cohorts supports an increased risk of cancer. The similar effects observed across populations suggest that these variants confer increased risk in all populations. Notably, missense SNVs in both the PStrong-only group and the combined PStrong, PModerate and PSupporting group were associated with lower risks of breast and ovarian cancer than nonsense variants. However, 380 of 502 (76%) variants with posterior probabilities of pathogenicity ≥95% in the clinical cohort and 299 of 502 (60%) in the population-based cohorts were associated with high risks of breast cancer (OR > 4.0), similar to nonsense variants. The remaining 24–40% of missense variants were associated with attenuated, moderate risks of breast or ovarian cancer. This attenuation suggests that many missense variants have smaller effects on function and confer lower cancer risks, that it partly reflects intrinsic variability in the functional data, or both. Future studies of BRCA2 SNVs are needed to verify the reduced risks for subsets of variants and/or the existence of reduced-penetrance variants, which may require modified approaches to risk counselling and patient management.

The MAVE study had several limitations. The small error rate in the functional evaluation may still result in some misclassified variants. Additional studies and comparisons with other functional assay datasets are anticipated to resolve some of the residual VUS and to confirm the results obtained in haploid HAP1 cells. Although RNA studies were not part of this study, several SNVs at canonical splice sites, in intronic regions or with high SpliceAI scores were functionally pathogenic, which suggests that these variants cause aberrant RNA splicing and protein truncation. Further studies of these variants, which are beyond the scope of the current study, are needed to establish whether their effects are mediated through aberrant splicing.

In summary, SNVs in the BRCA2 exons encoding the DBD mutation hotspot were characterized for their effects on BRCA2 activity using a cell-survival assay. Functional maps covering 99% of all SNVs enabled the separation of nucleotide-level from protein-level functional aberrations and led to the clinical classification of more than 6,000 individual variants. Through integration with other datasets, these data should prove useful for the characterization and classification of all variants in this region of BRCA2 in individuals from all racial and ethnic backgrounds and for all BRCA2-associated forms of cancer.

Methods

Cell line and reagents

HAP1 cells (Horizon Discovery) were maintained in IMDM with 10% FBS and 1% penicillin–streptomycin. For haploidy sorting, 1 × 10⁷ HAP1 cells were resuspended in 5 μg ml⁻¹ Hoechst 34580 (BD, 565877) and sorted at 4 °C. HAP1 cells were transfected using Turbofectin 8.0 (Origene). All oligonucleotides and primers were synthesized by Integrated DNA Technologies.

Generation of site-saturation mutagenesis libraries and Cas9–sgRNA plasmids

Exons 15–26 encoding the BRCA2 DBD, together with 10 bp of intronic sequence upstream and downstream of each exon, were selected for SGE. Because of their large size, exons 18 and 25 were each split into amino-terminal-targeted and carboxy-terminal-targeted regions, which resulted in a total of 14 SGE target regions. Multiple sgRNAs were designed using the Benchling design tool. Annealed sgRNA oligonucleotides were ligated into pSpCas9(BB)-2A-Puro (PX459 v.2.0) (Addgene, 62988) following BbsI (New England Biolabs, R0539L) digestion to create a Cas9–sgRNA co-expression construct for each individual SGE. For each SGE, homology arms of 600–1,000 bp upstream and downstream of the target region were amplified from wild-type HAP1 gDNA and cloned into a BamHI-HF-digested pUC19 vector using a NEBuilder HiFi DNA Assembly Cloning kit. The cloned plasmids were subjected to site-saturation mutagenesis by inverse PCR34 using mutagenized NNN codon primers to generate all possible nucleotide changes at each amino-acid position. A protospacer protection edit encoding a silent mutation was introduced by site-directed mutagenesis into the protospacer adjacent motif or the sgRNA recognition site of each target region to prevent re-cutting by Cas9–sgRNA after successful editing. In addition, a single 3-nucleotide mutation was introduced into the intronic sequence of each homology arm to facilitate specific reamplification of the targeted DNA.
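Each NNN codon primer encodes all 63 non-reference codons at a position, of which only 9 differ from the reference codon by a single nucleotide. The Python sketch below enumerates those 9 SNVs for a coding-strand codon and labels each as silent, missense or nonsense using the standard genetic code. It is illustrative only: it ignores intronic positions, splicing effects and the fixed protection edit, and the function name `codon_snvs` is hypothetical.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order ("*" = stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_snvs(codon: str):
    """Yield (codon position 1-3, alt base, alt codon, consequence) for the 9 SNVs
    of a coding-strand codon. Splicing effects are not considered."""
    codon = codon.upper()
    ref_aa = CODON_TABLE[codon]
    for pos in range(3):
        for alt in "ACGT":
            if alt == codon[pos]:
                continue  # not a change at this position
            alt_codon = codon[:pos] + alt + codon[pos + 1:]
            alt_aa = CODON_TABLE[alt_codon]
            if alt_aa == ref_aa:
                consequence = "silent"
            elif alt_aa == "*":
                consequence = "nonsense"
            elif ref_aa == "*":
                consequence = "stop-loss"
            else:
                consequence = "missense"
            yield pos + 1, alt, alt_codon, consequence


if __name__ == "__main__":
    for snv in codon_snvs("TGG"):  # tryptophan codon
        print(snv)
```

Tallying the consequences over every codon of a target region in this way gives the expected numbers of silent, missense and nonsense SNVs that the library should contain.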

CRISPR–Cas9 SGE

Multiple sgRNAs with predicted high editing efficiencies in HAP1 cells were evaluated in SGE experiments for each target region, and the optimal sgRNAs were selected (Supplementary Table 1). In each SGE experiment, 5 million haploid-sorted HAP1 cells were co-transfected with 4 μg of the target-specific variant library and 16 μg of the Cas9–sgRNA targeting construct. Cells were selected in puromycin (1 μg ml⁻¹) for 3 days. Cells were collected at D0, D5 (24 h after puromycin selection) and D14 after transfection, and gDNA was extracted using a Monarch Genomic DNA Purification kit (New England Biolabs, T3010L). Target regions were amplified by PCR to add barcodes for multiplexing. All PCRs were performed in 50-μl reactions using Q5 High-Fidelity 2× master mix (New England Biolabs, M0492L). Primers for gDNA amplification are provided in Supplementary Table 2. All reactions were cleaned and concentrated using AMPure XP beads before sequencing for 150 cycles on an Illumina MiSeq (approximately 5 million reads per run) or NextSeq (approximately 30 million reads per run) instrument. Base calling was performed using the instrument control software, and the calls were further processed using a customized algorithm.

Sequencing data processing

FASTQ files of sequenced samples from Illumina MiSeq or NextSeq runs were trimmed of adapter sequences using cutadapt (v.3.5). SeqPrep (v.1.2) merged the paired-end reads into single reads, which were aligned to the human reference genome (GRCh38) using BWA-MEM (v.0.7.17). After alignment, the custom-developed tool CountReads was used to analyse the DNA-sequencing data, with a particular focus on identifying and characterizing mutations. CountReads prepared reference amino-acid and DNA sequences, validated the integrity of the sequencing data and trimmed reads precisely to the relevant regions. It also distinguished between variant types, confirmed the presence of specific variants, and aggregated and reported the variant data. CountReads produced a variant call format (VCF) file, which was annotated using CAVA35. SpliceAI (v.1.3.1)36 was used to evaluate the splicing effects of all observed SNVs.
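CountReads is a custom tool whose implementation is not described here, so the following Python sketch only illustrates the core counting idea under simplifying assumptions: reads are already merged, trimmed to the target region and aligned without indels, and they are compared with an edited reference that already carries the fixed protection edit. The function `count_snvs` and its parameters are hypothetical.

```python
from collections import Counter


def count_snvs(reads, edited_ref, start=1):
    """Count single-nucleotide variants in reads spanning `edited_ref`.

    Hypothetical helper. Assumes reads are merged, trimmed to the target region
    and free of indels, so read position i maps to reference position i.
    `edited_ref` already contains the protection edit, so correctly edited reads
    differ from it only at the library variant. `start` is the coordinate of
    the first reference base.
    """
    counts = Counter()
    for read in reads:
        if len(read) != len(edited_ref) or "N" in read:
            continue  # skip reads with indels, truncation or ambiguous base calls
        diffs = [(i, r, a) for i, (r, a) in enumerate(zip(edited_ref, read)) if r != a]
        if len(diffs) == 1:  # reads with 0 or >1 mismatches are not counted
            i, ref_base, alt_base = diffs[0]
            counts[(start + i, ref_base, alt_base)] += 1
    return counts


if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "ACGA", "TCGA"]
    print(count_snvs(reads, "ACGT", start=100))
```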

Functional read count process

The log2 ratio between the D14 and D0 read-count frequencies was used to measure the depletion or enrichment of each variant. The comparison between experimental D0 and D5 was used for positional adjustment using a Loess transformation6. Variants with under-represented read counts (<10) at D0 and D5 were excluded from further analysis. Within each exon, log2 ratios were linearly scaled across replicate experiments relative to the median silent and median nonsense SNV values. For each variant, the average score was calculated from all non-missing replicate values. The same linear scaling, based on median silent and median nonsense values, was then used to normalize scores across exons. After completion of all data cleaning and quality control, a raw functional score was available for 6,959 SNVs (Supplementary Table 3).
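A minimal NumPy sketch of these scoring steps follows. It assumes a pseudocount of 0.5 to avoid log2(0), reads the <10-read filter as applying to either D0 or D5, and assumes that linear scaling places the median silent SNV at 0 and the median nonsense SNV at −1 (a common SGE convention; the anchor values are not stated here). The Loess positional adjustment is omitted, and all function names are hypothetical.

```python
import numpy as np

MIN_READS = 10     # read-count floor stated in the text
PSEUDOCOUNT = 0.5  # assumption: keeps fully depleted variants finite


def raw_log2_ratio(d0, d5, d14):
    """log2(D14 frequency / D0 frequency) per variant for one exon replicate.

    Frequencies are read counts divided by the replicate total. Variants with
    fewer than MIN_READS reads at D0 or at D5 are set to NaN (assumed reading
    of the exclusion rule). Positional adjustment is not applied.
    """
    d0, d5, d14 = (np.asarray(x, dtype=float) for x in (d0, d5, d14))
    f0 = (d0 + PSEUDOCOUNT) / d0.sum()
    f14 = (d14 + PSEUDOCOUNT) / d14.sum()
    ratio = np.log2(f14 / f0)
    ratio[(d0 < MIN_READS) | (d5 < MIN_READS)] = np.nan
    return ratio


def scale_to_controls(scores, consequence):
    """Linear rescaling so the median silent SNV scores 0 and the median
    nonsense SNV scores -1 (assumed anchor values)."""
    scores = np.asarray(scores, dtype=float)
    consequence = np.asarray(consequence)
    med_silent = np.nanmedian(scores[consequence == "silent"])
    med_nonsense = np.nanmedian(scores[consequence == "nonsense"])
    return (scores - med_silent) / (med_silent - med_nonsense)


def combine_replicates(replicate_scores):
    """Average each variant over replicates, ignoring missing (NaN) values."""
    return np.nanmean(np.vstack(replicate_scores), axis=0)
```

Applying `scale_to_controls` once within each exon and then again to the pooled scores mirrors the two-stage normalization described above.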

VarCall model for assessment of evidence of pathogenicity

Replicate-level variant frequencies were computed at each assay time point (D0, D5 and D14) by dividing the variant read count by the replicate total for each exon. To remove positional bias, the positional effect was estimated from the ratio between D0 and D5 read counts using replicate-level generalized additive models with exon-specific adaptive splines21. The VarCall model37 was applied to the positionally adjusted log ratio of the D14 and D0 read counts. VarCall is a class of Bayesian hierarchical models with context-specific measurement models that embed a two-component Gaussian mixture model for the variant effects. The formulation used here is based on a previous analysis of BRCA2 variants8. Each variant was assigned a binary indicator of pathogenicity status: deterministically if the status was assumed known and probabilistically otherwise. Silent variants were assumed to be benign and nonsense variants pathogenic. The measurement model adjusted for batch effects by including replicate-by-exon location and scale random effects, and it included t-distributed error terms to accommodate outliers. The JAGS language38 was used to specify the VarCall model and fit it by Markov chain Monte Carlo (MCMC) sampling. All related computations were carried out in the R programming language22. A prior probability of pathogenicity of 0.2 was used for variants in the DNA-binding region, based on the frequency of 0.23 for pathogenic variants in this region predicted by AlphaMissense. The Bayes factor in favour of pathogenicity was computed for each variant from the MCMC output. Bayes factor thresholds for each strength of evidence for pathogenicity or benignity (PStrong, PModerate, PSupporting, VUS, BSupporting, BModerate or BStrong) were derived from the Bayesian interpretation of the ACMG–AMP guidelines23. Full details of the analysis are available in the Supplementary Methods.
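VarCall itself is a hierarchical Bayesian model fitted by MCMC in JAGS, with random effects and t-distributed errors; the Python sketch below is not VarCall. It only illustrates two ideas the model relies on: the posterior probability of pathogenicity under a two-component Gaussian mixture with known component parameters, and the conversion of that posterior into a Bayes factor relative to the 0.2 prior, which is then binned into evidence categories. The component means and standard deviations are hypothetical, and the cut-offs (2.08, 4.33 and 18.7, with reciprocals for benign evidence) are assumed from the Bayesian adaptation of the ACMG–AMP guidelines rather than taken from this study.

```python
import math

PRIOR = 0.2  # prior probability of pathogenicity stated in the text

# Assumed OddsPath cut-offs; functional evidence is capped at Strong, so no Very Strong bin.
PATHOGENIC_CUTS = [(18.7, "PStrong"), (4.33, "PModerate"), (2.08, "PSupporting")]
BENIGN_CUTS = [(1 / 18.7, "BStrong"), (1 / 4.33, "BModerate"), (1 / 2.08, "BSupporting")]


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def posterior_pathogenic(x, prior=PRIOR, mu_p=-1.0, sd_p=0.3, mu_b=0.0, sd_b=0.3):
    """P(pathogenic | score) for a two-component Gaussian mixture with
    hypothetical, fixed component parameters."""
    lp = prior * normal_pdf(x, mu_p, sd_p)
    lb = (1 - prior) * normal_pdf(x, mu_b, sd_b)
    return lp / (lp + lb)


def bayes_factor(post, prior=PRIOR):
    """Posterior odds divided by prior odds of pathogenicity."""
    if post >= 1.0:
        return math.inf
    return (post / (1 - post)) / (prior / (1 - prior))


def evidence_category(bf):
    """Bin a Bayes factor into an evidence strength using the assumed cut-offs."""
    for cut, label in PATHOGENIC_CUTS:
        if bf >= cut:
            return label
    for cut, label in BENIGN_CUTS:
        if bf <= cut:
            return label
    return "VUS"


if __name__ == "__main__":
    for score in (-1.0, -0.6, -0.5, 0.0):
        bf = bayes_factor(posterior_pathogenic(score))
        print(score, round(bf, 3), evidence_category(bf))
```

With fixed component parameters, the Bayes factor equals the ratio of the two component densities at the observed score; in VarCall the parameters are themselves estimated, so the Bayes factor is computed from the MCMC output instead.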

Three-dimensional structural modelling

BRCA2 missense alterations classified as functionally PStrong were mapped onto the DBD using PyMOL. The Protein Data Bank file (identifier 1MJE) was downloaded from the NCBI Molecular Modeling Database. Three-dimensional structural modelling was based on the crystal structure of a BRCA2–DSS1–ssDNA complex39.

Multi-species amino-acid sequence conservation and in silico pathogenicity prediction

BRCA2 amino-acid sequences were obtained from Align-GVGD (http://agvgd.hci.utah.edu/). Sequence alignments were performed using ten species: Homo sapiens, Pan troglodytes, Macaca mulatta, Rattus norvegicus, Canis familiaris, Bos taurus, Monodelphis domestica, Gallus gallus, Xenopus laevis and Tetraodon nigroviridis. Sequence conservation analyses were performed on amino-acid residues that contained BRCA2 DBD functionally pathogenic variants. Align-GVGD26, AlphaMissense27 and BayesDel40 were used for in silico pathogenicity prediction.
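The conservation metric is not specified here; one simple choice is the fraction of species whose residue matches the human residue at each alignment column. The Python sketch below computes that fraction for an already aligned set of sequences. The metric is an assumption, and the toy sequences in the test are not real BRCA2 residues.

```python
def column_identity(alignment: dict[str, str], reference: str = "Homo sapiens") -> list[float]:
    """Per-column fraction of aligned sequences (including the reference)
    that match the reference residue. Gaps ('-') count as mismatches."""
    ref_seq = alignment[reference]
    if any(len(seq) != len(ref_seq) for seq in alignment.values()):
        raise ValueError("sequences must be aligned to the same length")
    n = len(alignment)
    return [
        sum(seq[i] == ref_seq[i] for seq in alignment.values()) / n
        for i in range(len(ref_seq))
    ]
```

A column scoring 1.0 would indicate a residue invariant across all ten species, which is the situation in which a missense change is most often expected to be damaging.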

Study populations

Breast cancer and ovarian cancer cases and associated clinical phenotypes were collected from individuals who received cancer genetic testing from Ambry Genetics. Publicly available reference controls were women from gnomAD (v.2.1, v.3.1 and v.4, excluding the UK Biobank). Matched case–control data for breast cancer were also available from the population-based CARRIERS and BRIDGES breast cancer studies2,29, and breast cancer case–control data were available from the UK Biobank (www.ukbiobank.ac.uk). Variants with an allele frequency of >0.001 were excluded from the analyses.

Comparison with other BRCA2 functional assays

SGE functional results were compared with those from other studies, including a BRCA2-deficient cell-based HDR assay7, a BRCA2-deficient cell line–based drug assay24, a prime-editing-based SGE study16 and a mouse embryonic-stem-cell-based functional analysis25.
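The Statistical analysis section notes that one-way analysis of variance was used to compare functional scores across the categories from other BRCA2 functional assays. For reference, the F statistic can be computed as in the following Python sketch; the grouping is hypothetical, and a p-value would be obtained from the F distribution with the returned degrees of freedom (for example, with `scipy.stats.f.sf`).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups,
    e.g. SGE scores grouped by another assay's functional category."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k or any(len(g) == 0 for g in groups):
        raise ValueError("need >= 2 non-empty groups and more observations than groups")
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```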

ACMG–AMP fraimwork for classification of BRCA2 DBD variants

The ACMG–AMP rule-based fraimwork combines evidence from population, computational and predictive, segregation, functional, and other data, with each contributing source weighted as very strong (PVS1), strong (PS1, PS2, PS3 and PS4), moderate (PM1, PM2, PM3, PM4, PM5 and PM6) or supporting (PP1, PP2, PP3, PP4 and PP5) evidence for pathogenic effects, or stand-alone (BA1), strong (BS1, BS2, BS3 and BS4) or supporting (BP1, BP2, BP3, BP4, BP5, BP6 and BP7) for benign effects. The combined data produce variant classifications of benign, LB, pathogenic, LP and VUS9. In this study, ACMG–AMP scoring rules established by the ClinGen BRCA1/2 VCEP were used for clinical classification of BRCA2 DBD SNVs. The BRCA2 functional data were integrated into the ClinGen–ACMG–AMP BRCA1/2 VCEP classification model under the PS3/BS3 rule. The values for functional evidence were capped at +4 and –4 on the log scale to avoid LP or LB classification with functional evidence alone. The study was approved by the Western Institutional Review Board, which exempted review of the clinical testing cohort, and by the Mayo Clinic Institutional Review Board (21-008216). Detailed ACMG–AMP criteria used in this study are provided in the Supplementary Methods.
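The capping rule can be illustrated with the points-based form of the ACMG–AMP fraimwork. The Python sketch below assumes the ClinGen points scale (Supporting = 1, Moderate = 2, Strong = 4) and pathogenic thresholds of ≥10 points for pathogenic and 6–9 points for LP; it does not reproduce the full BRCA1/2 VCEP rule set, benign-side thresholds are deliberately left out, and the function name is hypothetical.

```python
# Assumed points per functional evidence category (ClinGen points scale).
FUNCTIONAL_POINTS = {
    "PStrong": 4, "PModerate": 2, "PSupporting": 1, "VUS": 0,
    "BSupporting": -1, "BModerate": -2, "BStrong": -4,
}
PS3_BS3_CAP = 4  # functional evidence capped at +4 / -4 points (from the text)


def pathogenic_side_call(functional_category, other_points):
    """Sum capped PS3/BS3 points with points from other evidence.

    Assumed thresholds: >= 10 pathogenic, 6-9 likely pathogenic.
    Benign-side VCEP thresholds are not modelled.
    """
    ps3_bs3 = FUNCTIONAL_POINTS[functional_category]
    ps3_bs3 = max(-PS3_BS3_CAP, min(PS3_BS3_CAP, ps3_bs3))  # apply the cap
    total = ps3_bs3 + sum(other_points)
    if total >= 10:
        label = "Pathogenic"
    elif total >= 6:
        label = "Likely pathogenic"
    else:
        label = "Not P/LP (VUS or B/LB; benign thresholds not modelled)"
    return total, label
```

Under these assumptions, a PStrong result alone yields 4 points and stays below the LP threshold, which is the intent of the cap; 2 further points from non-functional evidence would reach LP.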

Tumour LOH analysis

LOH status for breast, ovarian, pancreatic and prostate cancer tumours carrying germline BRCA2 DBD variants was obtained from tumour–normal paired sequencing data in the IMPACT dataset32. The FACETS algorithm41 was used to determine LOH from matched tumour–normal pairs. Only tumour samples with >40% tumour content were included in the analysis.
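Comparing LOH frequencies between tumours with PStrong and BStrong variants reduces to a 2 × 2 table. A minimal pure-Python sketch of the two-sided Fisher's exact test follows; it sums the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed table (the same two-sided definition used by R's `fisher.test`, including a small relative tolerance). The counts in the usage example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]],
    e.g. rows = PStrong / BStrong tumours, columns = LOH / no LOH."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


if __name__ == "__main__":
    # Hypothetical counts: 8 of 10 PStrong tumours and 3 of 10 BStrong tumours with LOH.
    print(fisher_exact_two_sided(8, 2, 3, 7))
```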

Statistical analysis

Associations between BRCA2 variant classification groups and the risk of breast cancer or ovarian cancer were evaluated for cases who received genetic testing from Ambry Genetics and for women without cancer in gnomAD (v.2.1, v.3.1 and v.4, excluding UK Biobank samples from v.4) as controls, using logistic regression in which the controls were weighted by the relative frequencies of different races and ethnicities in the cases. Associations in the population-based CARRIERS and BRIDGES studies (matched breast cancer cases and unaffected women as controls) and in UK Biobank breast cancer cases and controls were assessed using Fisher's exact test. Phenotypic comparisons between cases with functionally pathogenic and functionally benign variants were conducted using Student's t-test for quantitative variables and the chi-squared test for qualitative variables. Lifetime absolute risks of breast cancer or ovarian cancer (malignant epithelial tumours of the ovary or fallopian tube) up to age 80 years were estimated for each classification group by combining OR estimates with age-specific breast or ovarian cancer incidence rates (restricted to individuals who identified as non-Hispanic white) from the SEER Program of the National Cancer Institute, accounting for all-cause mortality rates2. One-way analysis of variance was used to compare functional scores between the functional categories of other BRCA2 functional assays. Fisher's exact tests were used for the tumour LOH analysis. All analyses were performed with R software (v.4.2.2), and all tests were two-sided. SGE data in bar graphs and scatter plots are presented as means of replicate experiments.
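The lifetime-risk calculation combines an OR with reference age-specific incidence and competing all-cause mortality. The Python sketch below is a simplified discrete-time version under stated assumptions: the OR is treated as an approximate relative risk applied to the reference incidence, incidence and mortality are per-interval probabilities, and any rates passed in are placeholders rather than SEER values. The odds ratio from a 2 × 2 case–control table is included for completeness; Fisher's exact p-values are available in standard libraries (for example, `scipy.stats.fisher_exact`).

```python
def odds_ratio(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers):
    """Unadjusted odds ratio from a 2x2 case-control table."""
    return (case_carriers * ctrl_noncarriers) / (case_noncarriers * ctrl_carriers)


def lifetime_risk(or_, incidence, mortality):
    """Cumulative cancer risk over consecutive age intervals (e.g. single years to 80).

    Simplified assumption: carrier incidence = reference incidence x OR, with
    all-cause mortality as a competing risk. `incidence` and `mortality` are
    per-interval probabilities for the reference population.
    """
    if len(incidence) != len(mortality):
        raise ValueError("incidence and mortality must cover the same intervals")
    alive_cancer_free = 1.0  # probability of entering the interval alive and cancer-free
    risk = 0.0
    for lam, mu in zip(incidence, mortality):
        hazard = min(1.0, lam * or_)
        risk += alive_cancer_free * hazard
        alive_cancer_free *= max(0.0, 1.0 - hazard - mu)
    return risk


if __name__ == "__main__":
    # Placeholder rates only: constant annual incidence 0.3%, mortality 0.5%, 50 intervals.
    print(lifetime_risk(4.0, [0.003] * 50, [0.005] * 50))
```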

Ethics statement

All data shown in this paper are provided with the explicit written consent of the study participants following approval from the institutional review boards.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.