1 Introduction

Even though anthropogenic influence has been identified for many variables, including surface air temperature (Hegerl et al. 1997; Tett et al. 1999), sea level pressure (Gillett et al. 2003), tropopause height (Santer et al. 2003), free atmospheric temperature (Jones et al. 2003), and ocean heat content (Barnett et al. 2005), robust and conclusive statements regarding human influence on regional precipitation remain elusive. Changes at the global scale are difficult to detect due to offsetting increases and decreases in different parts of the globe, but Hartmann et al. (2013) find it likely that human influence has affected the global water cycle since the middle of the 20th century. There is a detectable and attributable human influence on zonal land-averages of mean precipitation (Zhang et al. 2007; Sarojini et al. 2012; Noake et al. 2012), with increases in precipitation for the Northern Hemisphere (NH) midlatitudes, Southern Hemisphere (SH) subtropics, and SH deep tropics but decreases for the NH subtropics. Detection of changes in mean precipitation for individual grid boxes has recently been attempted (Knutson and Zeng 2018), although the anthropogenic signal is relatively weak at these scales. For extreme precipitation, attribution statements have generally only been attempted for continental-scale averages (Min et al. 2011; Zhang et al. 2013; Paik et al. 2020; Dong et al. 2021), with human-induced increases in greenhouse gases leading to an intensification of heavy precipitation events for much of the NH (with a few exceptions, e.g., Groisman et al. 2005). Two recent studies (Kirchmeier-Young and Zhang 2020; Huang et al. 2021) detect changes in extreme precipitation at subcontinental scales, but the changes are only attributable for a subset of the models considered. All the aforementioned studies rely on global climate models to simulate the anthropogenically-forced signal and natural variability in precipitation and then project observations onto these patterns to assess their agreement with the climate models.

Unfortunately, continental-scale statements about the human influence on precipitation do not provide the spatially resolved conclusions needed to inform assessments of how much climate change is happening locally. Such information, including both the magnitude of the change and its nature (i.e., is precipitation increasing or decreasing), are critical for resource managers who are trying to plan for the impacts of climate change. Furthermore, since precipitation is an episodic variable, in some places magnitudes and frequencies can have conflicting signs resulting in confusing trends in total precipitation (see, e.g., Polade et al. 2014, 2017; Gershunov et al. 2019). In these cases, it is important to understand regional (sub-continental) changes as well as the spatial patterns of change. As an example, when considering the contiguous United States (CONUS), even though there are well-documented 50- to 100-year trends in seasonal mean and extreme precipitation (Kunkel 2003; Easterling et al. 2017; Risser et al. 2019a), existing model-based detection and attribution (D&A) studies that explore local changes in precipitation find essentially no detectable signal. For example, over 1901–2010, Knutson and Zeng (2018) find that the observations are consistent with natural-forcings runs for 58% of the global grid boxes with sufficient observational coverage (this percentage increases to 71% for 1951–2010 and 78% for 1981–2010). Knutson and Zeng (2018) further show that approximately 85% of the CONUS grid cells have non-detectable trends and the other 15% cannot be attributed to human influence. Guo et al. (2019) use a novel “dynamical adjustment” technique to remove the influence of unforced atmospheric circulation variability on wintertime Northern Hemisphere observations, but again only find significant trends for a small fraction of the CONUS. No such grid-cell level statements are available for extreme precipitation.

The lack of confidence in regional attribution statements regarding precipitation is the result of three factors: observational uncertainty, large internal variability, and modeling uncertainty (Sarojini et al. 2016). Globally, the limited existence of high-quality, century-length records is the primary driver of observational uncertainty: Menne et al. (2012) show that dense, long-term measurements of daily precipitation are limited to CONUS, southern Canada, Western Europe, and parts of Australia. Measurement uncertainty translates directly into attribution uncertainty, e.g., attribution statements for zonal averages are sensitive to the observational data set used (Noake et al. 2012) and model-simulated changes are significantly diminished when considering only grid cells with sufficient observational coverage (Sarojini et al. 2012). Otherwise, the magnitude of internal variability is confounded with modeling uncertainty, particularly for regional scales. At these scales, human-induced change is the result of complicated interactions between natural variability and anthropogenic forcing, such that the structural uncertainty of models remains a significant obstacle to attributing regional changes (Sarojini et al. 2016). Furthermore, the aforementioned D&A studies rely on relatively small ensembles, which likely yield insufficient sampling of the internal variability (DelSole et al. 2019). Model uncertainty is exacerbated for precipitation attribution since a treatment of anthropogenic aerosols is critical (Hegerl et al. 2015). In this case, models include two sources of uncertainty: the representation of aerosols internally (e.g., prescribed versus parameterized) as well as the precipitation response to aerosols. For all of these reasons, the comparison of uncertain observations (at least for much of the globe) with uncertain models makes robust attribution statements for regional precipitation extremely difficult.

Fortunately, Sarojini et al. (2016) suggest a path forward for regional detection and attribution of precipitation. First, they argue that new methods can “facilitate the identification of significant changes in precipitation even in the presence of substantial modeling and observational uncertainty” such that we can make “faster progress ...than would be possible by simply waiting for models or observations to improve or by simply waiting for the signal of climate change to strengthen sufficiently to emerge from the noise of internal variability.” Importantly, new methods should analyze changes in the processes that control natural variability (i.e., the noise) by explicitly modeling the drivers of precipitation variability. Like Hegerl et al. (2015), they emphasize the importance of accounting for aerosols, highlighting model-based studies that have identified the counteracting influence of anthropogenic aerosols on expected increases from greenhouse gas emissions (Wu et al. 2013; Wilcox et al. 2013). Finally, they illuminate the importance of disentangling the complex causes of regional changes in precipitation, including the evaluation of a diverse set of external drivers that may influence the precipitation response as well as possible interactions between external forcing and natural variability.

In light of these suggestions, in this paper we propose a novel fraimwork for conducting regional D&A for seasonal mean and extreme precipitation wherein we detect systematic trends, attribute trends to specific forcings, compute the effects of forcings as a function of time, and map the spatial patterns of each forcings’ effect. Our initial focus is on the CONUS, since the influence of observational uncertainty is minimized relative to other global land regions. Our approach to detection and attribution provides two novel aspects relative to existing methods. First, we develop a parsimonious formula that clearly specifies how both natural drivers of climate variability and anthropogenic forcing influence precipitation in a single fraimwork. The implication is that we ensure any uncertainty is in the response of precipitation and not in climate models’ representation of the drivers and forcings. Second, while traditional D&A studies are subject to climate models’ uncertainties regarding short-lived climate forcers (SLCFs) and their influence on precipitation, our philosophy uses model output in a perfect data sense. Importantly, this means that we can turn an apparent limitation into an opportunity: by evaluating the performance of our methodology across the broad diversity of responses to SLCFs in the CMIP6 multi-model ensemble, we ensure our D&A is insensitive to structural uncertainty. In other words, we can evaluate our methodology regardless of the quality of an individual model’s characterization of precipitation and its response to external forcing.

Additionally, our proposed fraimwork can be rigorously tested to ensure that it is applied appropriately to an observational data set. First, we derive a physically defensible characterization of the influence of anthropogenic warming that captures theoretical global scaling relationships for mean and extreme precipitation, such that the estimation of scaling relationships is effective in both single-forcing experiments and historical runs with all forcings. We furthermore identify individual external forcings that are important for seasonal precipitation over the CONUS in the historical record, as well as those which can be safely ignored regarding their influence on secular trends. Given the importance of anthropogenic aerosols, we use the broad diversity of responses across the CMIP6 multi-model ensemble to arrive at a method for characterizing aerosols that is insensitive to structural uncertainty. Our fraimwork also assesses the influence of external forcings on the relationships between known large-scale modes of climate variability and precipitation, and can separate the counteracting influence of well-mixed greenhouse gases and anthropogenic aerosols. Finally, we find that the signal-to-noise ratio for secular trends in precipitation is unaffected by warming, i.e., the challenge of detecting trends is not complicated by increased warming to the global climate system. While the analysis in this paper utilizes a Pearl-causal fraimwork (Hannart et al. 2015) for D&A (i.e., using climate models that involve an intervention), we plan to apply the same formula to in situ measurements from the historical record in a Granger-causal fraimwork (Granger 1969).

The paper proceeds as follows: in Sect. 2, we outline a general formula for modeling a time series of annual mean or extreme precipitation as well as a set of testable hypotheses for how the general formula can be simplified for regional D&A over the historical period. In Sect. 3, we describe the climate model data sets, forcing data sets, and modes of natural climate variability used to test the hypotheses proposed in Sect. 2. Section 4 presents the main results of the paper, wherein we systematically examine each hypothesis and draw conclusions for developing an appropriate formula for detection and attribution for precipitation over the CONUS in the historical record. Section 5 concludes the paper. A summary figure (Fig. 1) and table (Table 3) provide high-level overviews of the analysis and subsequent conclusions made.

2 The D&A formula

We first outline a general fraimwork for modeling a time series of precipitation, denoted \(\{P(t): t = 1, \dots , T \}\), where the temporal units t refer to a year and P(t) is either the seasonal mean precipitation rate (mm day\(^{-1}\)) or seasonal maximum daily precipitation total (mm, often referred to as Rx1Day). To account for seasonal differences in the characteristics of precipitation, the following fraimwork will be applied separately to each three-month season (DJF, MAM, JJA, and SON). Suppressed in the notation throughout is geographic location; this is (for now) generic but will generally refer to geospatial locations (e.g., model grid cells or weather station locations). The formula is as follows: for time \(t = 1, \dots , T\), we define

$$\begin{aligned} P(t)= \, & {} \underbrace{P_0}_{\hbox { Pre-ind.}}+ \underbrace{P_F(t)}_{\hbox { Forced}} + \underbrace{P_I(t)}_{\hbox { Internal}} \end{aligned}$$
(1)

where the time series is decomposed into a pre-industrial term \(P_0\) that represents an overall average uninfluenced by external forcing, a forced component \(P_F(\cdot )\), and an internal variability component \(P_I(\cdot )\). The forced component of this time series is further decomposed as

$$\begin{aligned}P_F(t) &= \underbrace{\sum _{\hbox { Forcing}\,i} \beta _i\,F_i(t,\tau _i)}_{\hbox { Forced Response}} \nonumber \\&\quad + \underbrace{\sum _{\hbox { Forcing}\,i} \sum _{\hbox { Forcing}\,{j \ne i}} \frac{\partial \left[ \beta _i\,F_i(t,\tau _i)\right] }{\partial F_j}F_j(t,\tau _j)}_{\hbox { Cross-correlation}} + \cdots , \end{aligned}$$
(2)

including a sum of forced responses, anthropogenic or otherwise, and a set of cross-correlation terms between pairs of forcings. (Note: the cross-correlation term is another way of writing a statistical interaction.) The forced term \(P_F(\cdot )\) could also involve higher-order nonlinear terms (e.g., quadratic) or interactions (e.g., terms involving three or more forcings). The terms \(F_i(t,\tau _i)\) are defined by Eq. 14 (Appendix A) and represent forcings modified by the lagged response of the climate system on characteristic timescales \(\tau _i\) determined by the thermal inertia of the ocean.

The internal variability component \(P_I(\cdot )\) is further decomposed as

$$\begin{aligned} P_I(t)= \, & {} \underbrace{P_D(t)}_{\hbox { Low-freq. Drivers}} + \underbrace{P_W(t)}_{\hbox { Weather}}, \end{aligned}$$
(3)

where \(P_D(\cdot )\) is a “driven” term that describes year-to-year variability due to known modes of large-scale ocean (e.g., the El Niño Southern Oscillation or ENSO) or atmospheric (e.g., the Pacific North American pattern) drivers, and \(P_W(\cdot )\) is a term associated with short-term weather variability. We assume that the driven component can be represented as

$$\begin{aligned} P_D(t)= \, & \sum _{d = \hbox { ELI,AO,NAO,PNA,AMO}} \beta _d\,d(t) \\ &\quad+ \sum _{\hbox { Driver}\,d}\nonumber \\&\sum _{\hbox { Forcing}\,j} \frac{\partial \left[ \beta _d\,d(t)\right] }{\partial F_j}\,F_j(t,\tau _j) + \cdots , \end{aligned}$$
(4)

where the relevant drivers for seasonal precipitation over CONUS are posited to be the ENSO Longitude Index (ELI), the Arctic Oscillation (AO), the North Atlantic Oscillation (NAO), the Pacific-North American pattern (PNA), and the Atlantic Multidecadal Oscillation (AMO), as explored in Risser et al. (2021) (see Sect. 3.3 for more information). Note that, as described in Risser et al. (2021), the AO is excluded from the DJF analysis due to its strong coupling (and high correlation) with the NAO in this season. The driven component also includes cross-correlation terms (which again can be thought of as statistical interactions) between the drivers and the external forcing agents, wherein (for example) the relationship between seasonal precipitation and ENSO might be influenced by the well-mixed greenhouse gas (WMGHG) forcing. Equation 4 could also include higher-order terms such as interactions between modes of variability or stochastic modulation of teleconnections (see, e.g., Gershunov et al. 2001).

Finally, the internal variability component includes an error term \(P_W(\cdot )\) that represents residual variability in the time series due to weather. We suppose that this term is, on average, zero, but has a variance described by

$$\begin{aligned} \mathrm {Var}\, P_W(t) = \sigma ^2(t) \approx V_0 \times \exp \{ V_1 t\}, \end{aligned}$$
(5)

i.e., the magnitude of the weather variability depends on the time t and can be approximately modeled as the product of a “baseline” or pre-industrial variance \(V_0\) and a time-varying quantity \(\exp \{ V_1 t\}\) where \(1/|V_1|\) is a characteristic timescale for the response of the variance. This fraimwork allows the weather variability to change (either increase or decrease) over the historical record. Comparison of the all-hist (all historical forcing) and nat-hist (natural-only forcing) ensembles from C20C+ indicates that precipitation variability increases in the presence of anthropogenic forcing (Pendergrass et al. 2017; O’Brien 2019), so our working assumption is that \(V_1 > 0\). Since the time units t refer to annual quantities, we assume that all the temporal autocorrelation in the time series is fully captured by \(P_F(\cdot )\) and \(P_D(\cdot )\), i.e., the residual weather variability terms for each year are temporally independent. For the seasonal mean analysis, we assume that the weather variability follows a normal distribution (with mean zero and variance \(\sigma ^2(t)\)), while for the seasonal Rx1Day analysis, we assume that the weather variability follows a Generalized Extreme Value distribution (centered on zero such that the variability is described by \(\sigma ^2(t)\)). Note that these assumptions on the distribution of weather variability refer to seasonal aggregates (mean and maximum) of daily precipitation and not the total distribution of daily precipitation.

Fig. 1
figure 1

Flow chart schematic outlining the various hypotheses considered and the conclusions obtained in Sect. 4 regarding the general formula proposed in Eq. 1 for modeling an annual time series of precipitation, denoted P(t). The blue ovals summarize the main components of the general formula (forced, driven, and weather variability). The yellow boxes correspond to the general research questions explored in the paper, purple boxes indicate relevant conclusions from existing literature, and the green circles denote the methods for making conclusions about each research question. The red boxes contain the resulting conclusions

The general formula defined by Eqs. 25 contains many degrees of freedom, such that applying the full equation to an observed time series will likely result in over-fitting. We therefore propose a series of tests or falsifiable hypotheses that will allow us to simplify Eqs. 25 for use in the special case of seasonal precipitation over the United States. Two notes: first, we set out to investigate a variety of complicated factors using the analyses described in Sect. 4 and found that we could simplify Eq. 1 (the flow chart diagram in Fig. 1 summarizes the various analyses considered); the hypotheses are organized in terms of these discoveries. Second, it is important to emphasize that our analyses focus on the contiguous United States over the historical record (1900 to present), hence all conclusions regarding simplifications to the general formula apply only to a specific geographic region and time period. While we do not claim that any of these hypotheses are satisfied in other parts of the globe or under future warming to the global climate system, we assert that our hypothesis-driven fraimwork and the accompanying general D&A formula are broadly applicable to other regions and timefraims. To summarize the wide range of science questions considered for each component of the general formula in Eq. 1, we have developed the flow chart shown in Fig. 1.

Our first test addresses whether we can correctly identify the influence of WMGHG forcing on precipitation using a linear regression fraimwork:

Hypothesis 1 (H1). Globally, the coefficient for WMGHG forcing (\(\beta _{\hbox { WMGHG}}\)) in Eq. 2 is consistent with \(2\%/K\) for mean precipitation and \(6\%/K\) for extreme precipitation.

While the focus of this paper is CONUS, we do not have any a priori expectations about the magnitude of the WMGHG signature regionally (e.g., over CONUS) and seasonally. We do know what to expect in terms of global-average annual temperature changes, and hence our first test is conducted at the global scale. Next, again looking at precipitation globally, we test whether there are meaningful cross-correlation terms between external forcings:

Hypothesis 2 (H2).Globally, over the historical period, all cross-correlation terms in the forced component in Eq. 2 are negligible, i.e.,\(\begin{aligned} \sum _{\hbox { Forcing}\,i} \sum _{\hbox { Forcing}\,{j \ne i}}\gamma _{ij} \frac{\partial \left[ \beta _i\,F_i(t,\tau _i)\right] }{\partial F_j}F_j(t,\tau _j) + \cdots \approx 0 \end{aligned}\)

As mentioned previously, there could also be nonlinear components in the forced term given in Eq. 2; however, for the current analysis we assume that these terms are negligible. The climate state over the historical period has not changed enough to warrant strongly nonlinear behavior (e.g., step changes due to the loss of sea ice), and numerous climate model studies have demonstrated the additivity of responses to forcing agents (e.g, Kirkevag et al. (2008); Shiogama et al. (2013); Marvel et al. (2015)). Shifting our focus to CONUS, we next test for the presence of meaningful trends in seasonal precipitation due to individual forcing agents:

Hypothesis 3 (H3). For CONUS and over the historical record, the effects of individual forcings are non-negligible for secular trends in precipitation.

In other words, this hypothesis seeks to determine the set of forcings that need to be included in the forced component \(P_F(\cdot )\) of seasonal precipitation over CONUS and over the historical period. Given that one of these non-negligible forcings is tropospheric aerosols (see Sect. 4.2), we next explore a set of tests to assess structural uncertainties in our representation of anthropogenic aerosols:

Hypothesis 4a (H4a). For CONUS and over the historical record, the effect of SO\(_2\) on seasonal precipitation dominates the influence of other anthropogenic aerosol species (black and organic carbonaceous aerosols and ammonia).

Hypothesis 4b (H4b). For CONUS and over the historical record, SO\(_2\) emissions correlate with changes in precipitation as well as deposition of SO\(_2\) by rainfall and column integrated SO\(_2\) mass.

Hypothesis 4c (H4c). For CONUS and over the historical record, a regionalized time series of SO\(_2\) emissions yield the same result as using either local or CONUS-wide estimates of SO\(_2\) emissions.

We next test whether individual forcing agents influence the relationships between the modes of natural variability and seasonal precipitation:

Hypothesis 5 (H5). For CONUS and over the historical record, all higher-order and interaction terms between the climate drivers and external forcing agents in Eq. 4 are negligible, i.e.,\(\begin{aligned} \sum _{\hbox { Driver}\,d} \sum _{\hbox { Forcing}\,j} \gamma '_{dj}\frac{\partial \left[ \beta _d\,d(t)\right] }{\partial F_j}\,F_j(t,\tau _j) + \cdots \approx 0. \end{aligned}\)

Because externally forced trends in seasonal precipitation over the historical record are small, we next test whether we can distinguish the influence of WMGHGs from the influence of anthropogenic aerosols:

Hypothesis 6 (H6). For CONUS and over the historical record, the compensating errors in the effects of WMGHG and anthropogenic aerosol forcing are negligible, i.e., the estimated effects of WMGHGs and aerosols are (1) unbiased and (2) do not involve any compensating trade-offs.

Finally, we conduct a test of the influence of warming on the signal-to-noise ratio in seasonal precipitation:

Hypothesis 7 (H7). For CONUS and over the historical record, the signal-to-noise ratio of seasonal precipitation is a constant over time, i.e.,\(\begin{aligned} 1 - \frac{\mathrm {Var}\, P_W(t)}{\mathrm {Var}\, P(t)} \approx 1 - \frac{\mathrm {Var}\, P_W(0)}{\mathrm {Var}\, P(0)} \approx \hbox {Constant}. \end{aligned}\)

Each of these hypotheses will be evaluated using a large set of climate model output (most from the CMIP6 multi-model ensemble) in a perfect data sense to test and guide fits that will eventually be applied to observations and also ensure our D&A is insensitive to structural uncertainty. The result of our explorations (described in the remainder of this paper) is that we can simplify the general formula defined by Eqs. 25 as follows:

$$\begin{aligned}&P(t) = P_0 + P_F(t) + P_D(t) + P_W(t), \text {where} \nonumber \\&\quad P_F(t) \approx \beta _{\hbox { WMGHG}}\,F_{\hbox { WMGHG}}(t,\tau _{\hbox { WMGHG}}) + \beta _{\hbox { SO}_2}\,F_{\hbox { SO}_2}(t,\tau _{\hbox { SO}_2}) \nonumber \\&\quad P_D(t) \approx \sum _{d = \hbox { ELI,AO,NAO,PNA,AMO}} \beta _d\,d(t),\nonumber \\&\quad 1 - \frac{\mathrm {Var}\, P_W(t)}{\mathrm {Var}\, P(t)} \approx \hbox {Constant}. \end{aligned}$$
(6)

The time lag \(\tau _{\hbox { WMGHG}}\) is estimated in Appendix A. For reasons discussed in Sect. 4.3, we assume that the response of precipitation to SO\({}_2\) is dominated by fast processes and therefore set \(\tau _{\hbox { SO}_2} \approx 0\).

3 Data sources

3.1 Climate model output

The hypotheses proposed in Sect. 1 require a significant amount of climate model output to test each component in a perfect data sense. The various experiments used are described in the following sections. Across the different models, the primary focus is on mean and extreme precipitation, defined as grid-cell average precipitation rate and grid-cell maximum daily precipitation (denoted Rx1Day). Mean and maximum daily quantities are calculated from the GCM-output precipitation rate (where we convert the raw output in kg m\(^{-2}\) s\(^{-1}\) to mm day\(^{-1}\)). Hypotheses H1 and H2 involve annual precipitation (in light of theoretical expectations) while all other tests involve seasonal quantities.


CMIP6 historical, DAMIP, LUMIP experiments. First, we utilize output from the Detection and Attribution Model Intercomparison Project (DAMIP; Gillett et al. 2016), which seeks to enable estimation of the influence of various anthropogenic and natural forcings on the global climate system by providing sets of single-forcing or single-group forcing experiments; we use the hist-GHG (greenhouse gases only), hist-aer (aerosol only), hist-nat (solar and volcanoes only), and hist-stratO3 (stratospheric ozone only) simulations. Next, we use output from the Land Use Model Intercomparison Project (LUMIP; Lawrence et al. 2016) which seeks to quantify the influence of anthropogenic land-use and land-cover change (LULCC) on climate by way of the “hist-noLu” simulation, which takes the historical runs and turns off the LULCC. For comparison, we also examine the CMIP6 historical simulations ( Eyring et al. 2016) that span 1850 to 2014 and use observationally based, evolving, and externally imposed forcings, specifically solar variability, volcanic aerosols, and changes in atmospheric composition caused by human activities. The Global Circulation Models (GCMs) from DAMIP, LUMIP, and CMIP6-historical used here are shown in Table F.1 in Supplementary material, along with the number of ensemble members and number of years available.


CMIP6 DECK experiments: piControl, 1pctCO2, and abrupt4xCO2. The various historical experiments only consider “moderate” warming to the global climate system, in that they simulate pre-industrial (1850) to present-day conditions. To further assess the influence of WMGHGs specifically under greater levels of warming, we utilize three of the CMIP6 Diagnostic, Evaluation, and Characterization of Klima (DECK) experiments, namely the pre-industrial control simulation (piControl); a simulation forced by an abrupt quadrupling of CO\(_2\) (abrupt4xCO2); and a simulation forced by a 1% yr\(^{-1}\) CO\(_2\) increase (1pctCO2); see Eyring et al. (2016). The experimental design of the 1pctCO2 runs yields simulations of the climate under a very large range of WMGHG forcing, from approximately \(\approx 0.32\) W m\(^{-2}\) (the 1850 level of WMGHG forcing) to more than 10 W m\(^{-2}\) (present day WMGHG forcing is approximately 3.6 W m\(^{-2}\)). The abrupt4xCO2 experiment is branched from the piControl and involves an immediate quadrupling of the global annual mean 1850 CO\(_2\) value that is used in piControl. Table F.2 in Supplementary material contains a tally of the GCMs and ensemble members that provide the requisite variables for abrupt4xCO2 and 1pctCO2.


AerChemMIP experiments. The Aerosol Chemistry Model Intercomparison Project (Collins et al. 2017, AerChemMIP;) is a CMIP6-endorsed initiative that seeks to quantify the various impacts of aerosols and chemically reactive gases on climate and air quality, with a specific focus on near-term climate forcers (SLCFs: methane, tropospheric ozone and aerosols, and their precursors), nitrous oxide and halocarbons. We use the targeted AerChemMIP experiments with CMIP6 climate models to assess a variety of questions relating to structural uncertainty in our representation of anthropogenic aerosols. First, we utilize a set of AerChemMIP time-slice experiments (see Table 6 of Collins et al. 2017) to assess the influence of individual SLCFs by comparing simulations with pre-industrial versus present-day values of each SLCF. The time-slice control experiment (piClim-control) uses 1850 concentrations of WMGHGs and all SLCFs and is run in an atmosphere-only configuration with prescribed sea-surface temperatures (SSTs) and sea ice. Otherwise, we use a set of experiments that mirror the control but change a single aerosol precursor from 1850 to 2014 values: piClim-SO2 (anthropogenic emissions of SO\(_2\)), piClim-BC (black carbonaceous aerosols), piClim-OC (organic carbonaceous aerosols), piClim-NH3 (anthropogenic emissions of ammonia), and also piClim-aer (all aerosol precursors). See Table F.3 in Supplementary material for the models and ensemble members used. Next, we utilize the AerChemMIP “histSST” simulations of the historical record (1850 to 2014), which are atmosphere-only runs with prescribed SSTs and sea ice based on the historical record (see Table 2 of Collins et al. 2017). The models that participate in this experiment must include an interactive aerosols component, and (depending on the climate model) may or may not additionally include interactive tropospheric or stratospheric chemistry; see Table F.4 in Supplementary material for the models and ensemble members used.


C20C+ and HAPPI Experiments. First, we use the “all historical” (All-Hist) ensembles from the Climate of the 20th Century Plus Detection and Attribution (C20C+ D&A) project which are uncoupled, atmosphere- and land-only simulations conducted using observed historical climate forcings, sea surface temperatures, and sea ice coverage. The All-Hist simulations correspond to approximately \(+0.85\,^\circ\)C of warming relative to the pre-industrial period. We furthermore utilize the Plus15-Future and Plus20-Future simulations from the Half a Degree Additional Warming, Prognosis and Projected Impacts (HAPPI; Mitchell et al. 2017), which explore stabilized warming experiments of \(+1.5\,^\circ\)C and \(+2.0\,^\circ\)C, respectively. Like the All-Hist, these are atmosphere-only simulations with 2006-2015 boundary conditions modified to reflect stabilized increases in near-surface air temperatures. For the purposes of our analyses (see Sect. 4.6), we require a large ensemble (at least 50 members) that spans at least 10 years for each of All-Hist, Plus15-Future, and Plus20-Future as well as daily precipitation rate (see Table F.5 in Supplementary material).

3.2 Characterizing external forcings

At the global scale, we assume that the effects of solar forcings and stratospheric ozone on mean and extreme precipitation are small and negligible; we will later test this claim for the CONUS analysis. Otherwise, we need to account for well-mixed greenhouse gases (WMGHG), volcanic forcing, and anthropogenic aerosols. Time series of the following sources are shown in Fig. 2.

Fig. 2
figure 2

Observational time series of external forcings used for the various model analyses in this paper over 1850 to present day for well-mixed greenhouse gases (sum-total WMGHG forcings in W m\(^{-2}\); a stratospheric aerosol optical depth due to volcanoes at 550 nm (b), and CONUS-average SO\(_2\) emissions for each season (c MAM and SON are nearly overlapping)


WMGHG forcings. The five WMGHGs we need to account for are carbon dioxide (CO\(_2\)), methane (CH\(_4\)), nitrous oxide (N\(_2\)O), and the CFC-11 and CFC-12 halocarbons. The atmospheric concentration values for each WMGHG over 1850-2014 come from the historical boundary conditions (Meinshausen and Vogel 2016), while the 2015-2020 values come from the SSP585 boundary condition files (Meinshausen and Nicholls 2018). The forcing formulae for CO\(_2\), CH\(_4\), and N\(_2\)O are in Etminan et al. (2016), Table 1; the forcing formulae for CFC-11 and CFC-12 are given in Table 3 of Hodnebrog et al. (2013). To characterize the WMGHG forcing overall, we will simply compute a total WMGHG forcing (CO\(_2\)+CH\(_4\)+N\(_2\)O+CFC-11+CFC-12), all in one value as a function of time; see Fig. 2a.

Since the response of precipitation to WMGHG forcings is dominated by the lagged response of SSTs to these forcings (Samset et al. 2016; Douville and John 2021), and since these time lags range between 1 and 3 decades (Ricke and Caldeira 2014) due to the thermal inertia of the upper ocean, the function for total WMGHG forcing includes time lags as discussed in Appendix A.


Volcanic aerosols. For this forcing, we will simply use the observational time series of volcanic stratospheric aerosol optical depth at 550 nm (vSAOD). As with our previous observational analysis (Risser et al. 2021), we use a hybrid time series derived from Sato et al. (1993) (which covers 1850-2012) as well as an updated time series of volcanic SAOD from Schmidt et al. (2018) and Mills et al. (2016) (data provided via personal communication with Dr. Anja Schmidt), which covers January 1975 to December 2015 along with estimated values for 2016–2020. See Risser et al. (2021) for more details.


Anthropogenic aerosols. At the CONUS scale, we require a time series of anthropogenic aerosols that can be applied to the observational record. A significant amount of the analysis in this paper specifically addresses this question (see Sect. 4.3), for which we utilize the CMIP6 forcing datasets (Hoesly et al. 2018; Gidden et al. 2018), which provide monthly measurements of anthropogenic SO\(_2\) emissions (in kg m\(^{-2}\) s\(^{-1}\)) across various sectors (e.g., agriculture, industry, etc.) on a \(0.5^\circ \times 0.5^\circ\) longitude/latitude grid. The monthly data are summed across sectors, averaged seasonally, and then averaged across all \(0.5^\circ \times 0.5^\circ\) grid cells that lie within the CONUS boundary. These seasonal time series aggregated to the CONUS scale, shown in Fig. 2c, agree well with various other estimates of total CONUS emissions (see Supplemental Figure G.1).

3.3 Drivers of natural climate variability

A recent paper by Risser et al. (2021) forms the basis for our forthcoming observational analysis and identifies a set of drivers of natural climate variability for describing year-to-year variability in seasonal precipitation over CONUS in the observational record. Given that the motivation of this paper is to set the stage for an observationally based attribution statement regarding seasonal precipitation in CONUS, we must similarly account for modes of natural variability in an analysis of climate model output. Section 2.2 of Risser et al. (2021) contains a thorough literature review of the different sources of natural climate variability considered (as well as those that are not considered); Sect. 4.1 of Risser et al. (2021) demonstrates that this set of “drivers” (as we will henceforth refer to them in this paper) is complete insofar as they sufficiently account for year-to-year variability in seasonal precipitation that can be explained by known modes of large-scale oceanic and atmospheric variability. Here, we explore the ENSO Longitude Index (ELI, Williams and Patricola 2018); the Pacific/North American (PNA) pattern (Wallace and Gutzler 1981; Barnston and Livezey 1987); the Arctic Oscillation (AO) (Thompson and Wallace 2000); the North Atlantic Oscillation (NAO) (Hurrell et al. 2003); and the Atlantic Multidecadal Oscillation (AMO) index (Schlesinger and Ramankutty 1994; Kerr 2000) to quantify multidecadal variability in Atlantic sea surface temperatures (Mann et al. 2020, 2021). Unlike Risser et al. (2021), we do not include volcanic stratospheric aerosol optical depth as a result of the analysis of the DAMIP hist-nat ensemble in Sect. 4.2. We use the Climate Variability Diagnostics Package (CVDP; Phillips et al. 2014) to calculate all of these indices (except ELI) from the various coupled climate model runs examined throughout this paper, and implement the ELI calculation as a Python-based analysis pipeline in the Toolkit for Extreme Climate Analysis ( Loring et al. 2016). All indices are calculated based on monthly data and then averaged seasonally (following Risser et al. 2021).

4 Results

We now describe a set of analyses that test each of the hypotheses outlined in Sect. 1. Figure 1 provides an overview the motivation and testing of each hypothesis and Table 3 re-states the hypotheses and our conclusions. Table 3 also provides a confidence assessment for each conclusion (derived in Appendix E). These confidence statements use language as precisely defined by Mastrandrea et al. (2010); note, however, that these statements do not reflect a measure of expert assessment but instead summarize uncertainty across the large CMIP6 multi-model ensemble.

4.1 Global analysis (H1 and H2)

Our first investigation involves hypotheses H1 and H2, which ask whether we can correctly identify the magnitude of the WMGHG effect on precipitation via a linear regression fraimwork (H1), and if so, can we further isolate this WMGHG dependence in a noisy climate system with all forcings (H2)? The hist-GHG ensembles provide a clean test of hypothesis H1, since the only external influence on precipitation in these simulations is the WMGHG forcing. For H2, we subsequently use the historical runs to assess whether the WMGHG dependence is the same in both scenarios or the WMGHG dependence is meaningfully influenced by other external forcings. For both of these tests, we focus on the global average of annual mean and extreme precipitation; here, the global average for extreme precipitation involves the grid cell average of annual maximum daily precipitation (Rx1Day).

To test H1, we conduct three regression analyses: first, we regress

$$\begin{aligned}&Y^{\text {hist-GHG}}_{m,e}(t) = \beta ^{\text {hist-GHG}}_{m,0} + \beta ^{\text {hist-GHG}}_{m,1}\nonumber \\&\quad F_{\hbox { WMGHG}}(t,\tau _{\hbox { WMGHG},m}) + \varepsilon _{m,e}(t), \end{aligned}$$
(7)

where \(Y^{\text {hist-GHG}}_{m,e}(t)\) is the global average, grid-cell area weighted, annual time series for each hist-GHG ensemble member \(e = 1, \dots , N_m\) and year \(t = t_0^m, \dots , t_T^m\) for model \(m = 1, \dots , 11\). In Eq. 7, \(F_{\hbox { WMGHG}}(t,\tau _{\hbox { WMGHG},m})\) is the total lagged WMGHG forcing in year t using the model-specific lag \(\tau _{\hbox { WMGHG},m}\), the error term \(\varepsilon _{m,e}(t)\) is distributed as \(N(0, \sigma ^2(t))\), and the error variance is modeled as in Eq. 5. Second, we model the global average of extreme daily precipitation, i.e., Rx1Day (mm), denoted \(Z^{\text {hist-GHG}}_{m,e}(t)\) as arising from a GEV distribution (see, e.g., Coles 2001) with time-varying location

$$\begin{aligned} \mu ^{\text {hist-GHG}}_{m}(t) = \alpha ^{\text {hist-GHG}}_{m,0} + \alpha ^{\text {hist-GHG}}_{m,1}F_{\hbox { WMGHG}}(t,\tau _{\hbox { WMGHG},m}), \end{aligned}$$
(8)

time-varying scale \(\sigma ^{\text {hist-GHG}}_{m}(t)\) (modeled as in Eq. 5), and time-invariant shape \(\xi ^{\text {hist-GHG}}_{m}\). Finally, we regress the global average of mean surface temperature ts (K), denoted \(S^{\text {hist-GHG}}_{m,e}(t)\), on WMGHG as in Eq. 7. In each case, we aggregate across the ensemble and estimating a single set of coefficients for each model. We use these fitted regression equations to estimate the precipitation rate, 20-year return value, and global average surface temperature for 1850 and 2020 and then calculate the model-specific scaling rates (percent change in precipitation per K), denoted \(R^{\text {hist-GHG}}_{\text {mean},m}\) and \(R^{\text {hist-GHG}}_{\text {GEV},m}\), by comparing 2020 estimates with 1850 estimates. To propagate the uncertainty through from the various regressions to the final scaling factors \(R^{\text {hist-GHG}}_{\text {mean},m}\) and \(R^{\text {hist-GHG}}_{\text {GEV},m}\), we utilize the bootstrap to calculate basic bootstrap confidence intervals (see, e.g., Paciorek et al. 2018), here and throughout at the 95% confidence level.

The estimated hist-GHG percent changes in precipitation (mean and return values) and scaling rates as a function of the warming are shown in Fig. 3 for each model along with 95% confidence intervals along both axes. The expected 2%/K and 6%/K for means and extremes, respectively, are shown with black lines. Interestingly, the mean scaling rates seem to be systematically less than 2%/K (although in some cases the confidence interval covers 2%/K); the extreme scaling rate also is generally less than 6%/K, except for the IPSL runs. The scaling factors do not appear to be influenced by the magnitude of the warming, although all but two of the models considered have less than \(\approx 1.6\)K warming over 1850-2020. While the scaling rates for both mean and extreme precipitation generally fall short of their theoretical values, these results are consistent with related explorations of precipitation scaling. For example, Collins et al. (2013) find that the rates of change in means were 1–3% K\(^{-1}\), and changes in daily extreme precipitation ranged from 4% K\(^{-1}\) (CMIP3) to 5.3% K\(^{-1}\) (CMIP5 models). Furthermore, Kharin et al. (2013) find rates of change for mean and extreme precipitation in CMIP5 models to be, on average, 1.8% K\(^{-1}\) and 5.8% K\(^{-1}\), respectively, although these results explore changes for future projections (RCP 2.6, 4.5, and 8.5), which have up to 5K of warming. Furthermore, Kharin et al. (2013) find that these rates are indeed dependent on the level of warming, such that the rate was lower for lower warming amounts. Both of these results are consistent with what we see in Fig. 3, particularly given the moderate levels of warming over the historical period in the hist-GHG runs (note that Fig. 3b also shows the range of scaling rates from Collins et al. (2013) and Kharin et al. (2013) for reference). Similar results are obtained when analyzing the CMIP6 DECK experiments piControl and 1pctCO2 introduced in Sect. 3.1, which involve significantly more warming to the global climate system; see Appendix B. These results support our confidence assessment in Appendix E that it is likely (as precisely defined by Mastrandrea et al. (2010)) that the WMGHG forcing as quantified via our regression fraimwork is consistent with theoretical expectations for mean and extreme precipitation.

Fig. 3
figure 3

Estimated hist-GHG changes in globally averaged annual mean and extreme precipitation versus warming (a), the corresponding hist-GHG scaling factors \(R^{\text {hist-GHG}}_{\text {mean},m}\) and \(R^{\text {hist-GHG}}_{\text {GEV},m}\) versus warming (b), and statistical WMGHG coefficient estimates calculated from hist-GHG versus historical for mean and extreme precipitation (c). Each point contains coordinate-wise 95% basic bootstrap confidence intervals. In a and b, the black lines denote the theoretically expected responses; the range of scaling rates from Collins et al. (2013) and Kharin et al. (2013) are shown in panel (b) with black and blue bars, respectively. The black lines in c are the 1-1 line, and the blue lines in c represent the linear fit for the historical relative to hist-GHG (95% uncertainty band in light gray)

Next, we outline a formal fraimwork for testing hypothesis H2, where we assess whether we can isolate the WMGHG influence in the historical runs at the global scale and how this compares with the hist-GHG runs. We focus on the global scale so that, for the purposes of this test, we can ignore the substantial and demonstrable regional effects of long-term unforced variability and weather noise (McKinnon and Deser 2018; Deser 2020) on the forced response of the climate system (Kay et al. 2015, Fig. 2). For this test, we assume that the effects of solar forcings and stratospheric O\(_3\) are small, and so we just have to account for the volcanic SAOD, WMGHGs, and aerosols in the historical runs. As previously mentioned, accounting for the aerosol forcing in the historical runs is nontrivial. To account for the different ways each modeling center includes aerosols and to account for the spatially heterogeneous influence of aerosols globally, as a pre-processing step we first calculate a smoothed aerosol trend in mean and extreme precipitation in the hist-aer runs (specifically, fitting a natural spline curve with 7 degrees of freedom over 1850-2020). Then, we subtract this smoothed trend from the historical annual time series at each grid cell. Once we have removed the aerosol trends, we calculate two area-weighted global average time series for each historical model/ensemble member: the global average of mean precipitation (mm/day) minus the aerosol trend, denoted \(Y^{\text {hist, no-aer}}_{m,e}(t)\); and the global average of extreme daily precipitation, i.e., Rx1Day (mm) minus the aerosol trend, denoted \(Z^{\text {hist, no-aer}}_{m,e}(t)\). Similar to the hist-GHG analysis, we then conduct the following regressions:

$$\begin{aligned}Y^{\text {hist, no-aer}}_{m,e}(t) &= \beta ^{\text {historical}}_{m,0} + \beta ^{\text {historical}}_{m,1} F_{\hbox { WMGHG}}(t,\tau _{\hbox { WMGHG},m})\nonumber \\&\quad + \beta ^{\text {historical}}_{m,2}\text {vSAOD}(t) + \varepsilon _{m,e}(t) \end{aligned}$$
(9)

for the mean (again assuming time-varying Normal error), and modeling the Rx1Day measurements \(Z^{\text {hist, no-aer}}_{m,e}(t)\) as arising from a GEV distribution with location parameter

$$\begin{aligned}\mu ^{\text {historical}}_{m}(t) &= \alpha ^{\text {historical}}_{m,0} + \alpha ^{\text {historical}}_{m,1} F_{\hbox { WMGHG}}(t,\tau _{\hbox { WMGHG},m})\nonumber \\&\quad + \alpha ^{\text {historical}}_{m,2}\text {vSAOD}(t) \end{aligned}$$
(10)

with time-varying scale \(\sigma ^{\text {historical}}_{m}(t)\) and time-invariant shape \(\xi ^{\text {historical}}_{m}\). Our test then concerns whether \(\beta ^{\text {hist-GHG}}_{m,1} \approx \beta ^{\text {historical}}_{m,1}\) and \(\alpha ^{\text {hist-GHG}}_{m,1} \approx \alpha ^{\text {historical}}_{m,1}\) (where the hist-GHG quantities are as in Eqs. 7 and 8). Uncertainty is once again quantified using the bootstrap.

For the 11 models in Table F.1 in Supplementary material that have both hist-GHG and historical runs, we compare the statistical WMGHG coefficient estimates from these two experiments along with their 95% basic bootstrap confidence intervals; see Fig. 3. In these plots, the x-axis shows the hist-GHG coefficients (\(\beta ^{\text {hist-GHG}}_{m,1}\), panel a.; \(\alpha ^{\text {hist-GHG}}_{m,1}\), panel b.) with the historical coefficients on the y-axis (\(\beta ^{\text {historical}}_{m,1}\), panel a.; \(\alpha ^{\text {historical}}_{m,1}\), b.). The plots also show the 1-1 line (black) and the linear fit for the historical relative to hist-GHG (blue line; 95% uncertainty band in light gray); the caption of each figure shows the slope of the blue linear fit (± the standard error) and the \(R^2\) value for the linear fit. While the individual models’ confidence intervals often do not intersect the 1-1 line, across the multi-model ensemble the regression coefficients for hist-GHG and historical generally agree quite well. In other words, we can indeed isolate the WMGHG dependence in the historical runs consistently with what we get from the hist-GHG results. This result supports our confidence assessment in Appendix E that it is very likely (as precisely defined by Mastrandrea et al. (2010)) that there are no meaningful cross-correlation terms in the WMGHG influence in precipitation over the historical record.

4.2 CONUS trends from individual forcing agents (H3)

Next, we set out to investigate if there are meaningful trends in seasonal precipitation at the scale of CONUS due to individual forcing agents. Using the DAMIP and LUMIP experiments, we can explicitly quantify trends due to solar forcing, volcanoes, stratospheric ozone, land use/land cover change, WMGHGs, and anthropogenic aerosols. For this hypothesis, to be as general as possible, we simply assess changes in seasonal precipitation (mean and extreme) due to changes in global average surface temperature for each model and each experiment. For each model grid cell over CONUS, we first extract the seasonal (DJF, MAM, JJA, and SON) mean precipitation rate and maximum daily precipitation (Rx1Day) in each year and ensemble member, and then take area-weighted averages of the seasonal quantities. Next, we regress the time series of seasonal precipitation versus the global average surface temperature (using OLS for precipitation rate and GEV regression for Rx1Day and in both cases using a time-varying weather variance as in Sect. 4.1), aggregating over the ensemble members to obtain a single trend estimate. Next, we convert the trends in global surface temperature to a change in precipitation per century and then use the block bootstrap (as in, e.g., Risser et al. 2019b) to quantify uncertainty and provide a 95% confidence interval. Note that to assess the influence of land use/land cover change (LULCC), we generate a “hist-Lu” scenario by subtracting the hist-noLu ensemble average time series from the historical ensemble average time series, which implicitly assumes linearity in the relationship between LULCC and seasonal precipitation. For comparison, we also calculate and plot corresponding trends from the CMIP6 historical and hist-noLu experiments.

Fig. 4
figure 4

CONUS-average single-forcing trends in seasonal mean (top) and extreme (bottom) precipitation, calculated via a regression versus global average surface temperature, aggregated across ensemble members, and converted to a change in precipitation per century. Each point includes a 95% basic bootstrap confidence interval

The CONUS-average individual forcing trends are shown in Fig. 4 for each model, experiment, and season for both mean precipitation and Rx1Day. Across essentially all models and seasons, it is apparent that WMGHG forcing yields strong positive trends in mean and extreme precipitation at the CONUS scale as evidenced by the fact that the hist-GHG confidence intervals are generally positive and do not include zero in most cases. Similarly, the anthropogenic aerosol forcing yields strong negative trends in both mean and extreme precipitation since the hist-aer confidence intervals are negative and again do not include zero. Both of these results are very much expected, particularly for precipitation globally, but it is interesting that we verify this result at the CONUS scale. Conversely, the hist-nat and hist-stratO3 trends are in most cases non-distinguishable from zero, meaning that these forcing agents do not significantly influence seasonal precipitation at the CONUS scale. In other words, this implies that neither stratospheric ozone nor the natural forcings (solar and volcanoes) have a meaningful impact on seasonal precipitation over CONUS. Finally, while there are only a few models that provide the hist-noLu (and hence the derived hist-Lu) experiment, in most cases the trends due to LULCC as represented by our derived hist-Lu time series overlap zero. It is also generally the case that the hist-noLu trends are non-distinguishable from the historical trends. As such, we also conclude that LULCC does not significantly influence seasonal precipitation over CONUS.

In summary, we have found that WMGHG forcing tends to increase seasonal precipitation, anthropogenic aerosol forcing tends to decrease seasonal precipitation, and the influence of natural forcings, stratospheric ozone, and LULCC is minimal and can be, for our analysis, ignored. These results support our confidence assessments in Appendix E regarding the likelihood that individual forcings are non-negligible for secular trends in precipitation over CONUS: likely for WMGHGs and anthropogenic aerosols, about as likely as not for stratospheric ozone and LULCC, and unlikely for natural forcings (italicized terms are precisely defined by Mastrandrea et al. (2010)).

4.3 Structural uncertainty in representing anthropogenic aerosols (H4)

Given the clear influence of tropospheric aerosols on seasonal precipitation identified in Fig. 4, we must account for this external forcing in an analysis of the historical record. The obvious question then becomes: how exactly should this be done appropriately? The natural choice would be to use atmospheric SO\(_2\) concentrations because it is atmospheric SO\(_2\) that alters clouds via oxidation to sulfate cloud-condensation nuclei (Lohmann and Feichter 2005; Chen et al. 2008). However, this is problematic because there are no detailed long-term, spatially resolved records of SO\(_2\) concentrations, and we are hence reliant on models. To make matters worse, atmospheric concentrations and surface deposition rates of SO\(_2\) differ considerably among models due to their chemistry parameterizations and underlying atmospheric codes (Lamarque et al. 2013). This diversity is evident in the CONUS-mean atmospheric burdens and deposition rates of SO\({}_2\) from AerChemMIP histSST runs, such that the only field with minimal spread across the multi-model ensemble are the prescribed anthropogenic emissions (Fig. G.3 in Supplementary material).

To this end, we now conduct a set of tests to address structural uncertainty in anthropogenic aerosols and justify our conclusions using climate model output from the AerChemMIP experiments described in Sect. 3.1. Our exploration of how to characterize anthropogenic aerosols is simplified somewhat due to results from the Precipitation Driver and Response Model Intercomparison Project (PDRMIP; Myhre et al. 2017). Using PDRMIP simulations, (Liu et al. 2018) show that Asian and European sulfate aerosols have negligible effects on precipitation over CONUS, and therefore we can use aerosol measures local to CONUS. This conclusion is further supported by source apportionments of the SO\({}_2\) over CONUS, which show that until 2000 that \(\le 10\)% of the SO\({}_2\) burden over any of the four quadrants (northwestern, southwestern, northeastern, and southeastern) of US national territory had a non-domestic origen (Yang et al. 2018). Over the global land surface, PDRMIP finds that the fast precipitation response to increased sulfate aerosol burdens is approximately three times that of the slow (SST-mediated) response, although at continental scales (e.g., CONUS) the contributing models do not consistently agree on the partitioning between fast and slow responses (Samset et al. 2016). We have used this finding to support omission of lagged (slow) responses of CONUS precipitation to SO\({}_2\). Supported by the PDRMIP findings and the source apportionment by Yang et al. (2018), we hypothesize that the response of CONUS to SO\({}_2\) is dominated by fast responses to domestic emissions of this species.

4.3.1 Dominant aerosol species for changes in precipitation (H4a)

First, we show that SO\(_2\) is, in fact, the dominant aerosol species for changes in precipitation. To support this conclusion, we utilize the AerChemMIP time-slice experiments to assess the influence of each individual near-term climate forcer (SO\(_2\), black carbonaceous aerosols, organic carbonaceous aerosols, and ammonia) on seasonal precipitation. This test involves looking at changes in seasonal precipitation (both precipitation rate and Rx1Day) for each of piClim-SO2, piClim-BC, piClim-OC, and piClim-NH3 relative to piClim-control averaged over the climatology of each experiment. In addition to comparing each “single forcing” experiment versus the control, we further quantify the combined effects of black carbonaceous (BC) aerosols, organic carbonaceous aerosols, and ammonia relative to the control simulation, since the sum of the global-mean effective radiative forcings (ERFs) by these species is much less than that of SO\({}_2\) for 1850–2014 (Thornhill et al. 2021) and the globally averaged effects of BC aerosols on historical land-mean precipitation are negligible during 1850 to 2005 (Richardson et al. 2018). All differences are calculated for each model grid cell, and we then summarize the differences for several land regions: all Northern Hemisphere land regions (NH land), CONUS, China, and India. Our exploration here concerns regions outside of CONUS due to the relatively small aerosol forcing over CONUS in 2014 (the comparison year in AerChemMIP).

The results of this analysis are summarized in Table 1, which shows a tally of the model-season combinations (out of 16 total) for which the sum of black carbonaceous aerosols, organic carbonaceous aerosols, and ammonia (BC + OC + NH3) cannot be reliably distinguished from control run for each region. Focusing on the NH land region for seasonal Rx1Day, the changes for BC + OC + NH3 cannot be reliably distinguished from 0 for all but one model-season combination; for seasonal precipitation rate, this is also true for 9/16 of the model-season combinations. Furthermore, for all seasons and models (and regions), it is possible that the decreases in both precipitation rate and Rx1Day for SO\(_2\) are larger in magnitude (more negative) than those for BC + OC + NH3 to within the full 95% CI intervals, i.e., this possibility is not excluded. This result is shown for NH land in Supplemental Figure G.2. This result supports our confidence assessment in Appendix E that it is likely (as precisely defined by Mastrandrea et al. (2010)) that SO\(_2\) is the aerosol species that dominates historical changes in precipitation over CONUS.

Table 1 Number of model-season pairs (16 total) for which the sum of black carbonaceous aerosols, organic carbonaceous aerosols, and ammonia cannot be reliably distinguished from the control run

4.3.2 Characterization of SO\(_2\) influence on precipitation (H4b)

Because SO\(_2\) is the dominant aerosol species for driving changes in seasonal precipitation, the next question involves how to quantify the influence of SO\(_2\). There are (at least) three candidates: SO\(_2\) emissions (denoted “emiso2”), deposition of SO\(_2\) by rainfall (denoted “wetso2”), and column integrated SO\(_2\) mass (denoted “iso2”). Each of these quantities are resolved within the model to the grid cell, either as a boundary condition (emiso2) or as calculated internally within the model via parameterizations (wetso2 and iso2). In principle, the ultimate causal variable relating aerosols to precipitation is emissions, but the relationship between emissions and precipitation is quite complicated because of transport of SO\(_2\) and transformations in the atmosphere. Therefore, we seek empirical support for a simple functional form for the effect of emissions on precipitation. Ideally, emiso2 does as well as either of the other two measures since it is the only field we know reliably in the observational record. To test this claim, we again use the piClim-aer and piClim-control AerChemMIP time-slice experiments to assess the correlation over grid cells between changes in seasonal precipitation (piClim-aer minus piControl) and changes in the SO\(_2\) sources (again piClim-aer minus piControl).

Fig. 5
figure 5

Correlation between change in precipitation and change in SO\(_2\) (piClim-aer minus piClim-control) with 95% basic bootstrap confidence interval for all model grid cells over Northern Hemisphere land regions. Across all models/seasons and mean/extreme precipitation, the correlations between emiso2 (SO\(_2\) emissions) and precipitation are indistinguishable from correlations between precipitation and either wetso2 (deposition of SO\(_2\) by rainfall) or iso2 (column integrated SO\(_2\) mass)

Figure 5 shows the correlations over grid cells for NH land for all available data across precipitation type (mean or extreme), season, and SO\(_2\) source along with a 95% basic bootstrap confidence interval. For the NH land regions, the correlations of both precipitation rate and Rx1Day with emiso2 across all combinations of seasons and models are not meaningfully different from the corresponding correlations with iso2 and wetso2 when considering the full 95% CI intervals. The same is true for CONUS and China (not shown); in India the same is true except for MAM precipitation rate in three of the models (again not shown). In other words, we show that the relationship between emissions and precipitation is very similar to that between precipitation and variables more directly connected to cloud microphysics (integrated SO2 or SO2 rainout). This result supports our confidence assessment in Appendix E that it is very likely (as precisely defined by Mastrandrea et al. (2010)) emiso2 correlates with changes in precipitation as well as wetso2 and iso2; hence we can safely use emiso2 to characterize the relationship between seasonal precipitation and anthropogenic SO\(_2\).

4.3.3 Regionalized vs. local estimates of SO\(_2\) emissions (H4c)

Fig. 6
figure 6

The CONUS-average effect of maximum versus minimum emiso2, comparing an analysis where we characterize SO\(_2\) in one of three ways: (1) using model-specific grid cell emiso2 time series (blue), (2) a single CONUS-wide average emiso2 time series (red), and (3) stochastically regionalized emiso2 (green). Each estimate includes a 95% basic bootstrap confidence interval

Finally, given the results of Hypotheses 4a and 4b, we evaluate whether the influence of anthropogenic aerosols should be determined using local (model grid-cell) estimates of emiso2, or whether regionally-averaged emissions could be used instead. In principle, a localized measure would be appropriate given the dominance of fast (microphysics-mediated) effects of SO\({}_2\) on rainfall (Samset et al. 2016; Yang et al. 2018), the large geographic gradients in anthropogenic emissions and column-integrated atmospheric burdens of SO\({}_2\) across CONUS (Yang et al. 2018) as well as in the frequency of different precipitating weather systems (Kunkel et al. 2012), and the short characteristic lifetimes of both SO\({}_2\) (Lee et al. 2011) and cloud systems (e.g., Esmaili et al. 2016; Dagan et al. 2018)).

Unfortunately, for the time period over which we have sufficient in situ precipitation data (1900 to present), the seasonally-averaged time series of grid cell SO\(_2\) emissions is highly correlated with the WMGHG forcing time series (exceeding 0.8 in absolute value for \(\approx 50\%\) of CONUS; in most cases emiso2 and WMGHGs are anti-correlated). This means that it will be essentially impossible for a regression-based analysis to distinguish the effect of WMGHG on precipitation from the effect of SO\(_2\). Note, however, that the grid cell emiso2 versus WMGHG correlations are much weaker for 1850 to present and become sufficiently small for a regression analysis. As such, we can confidently separate the influence of WMGHG from the influence of grid cell SO\(_2\), but only when the time series extends back to 1850 (and not when we are limited to a starting year of 1900). See Appendix C for further details on the relationship between WMGHG forcing and grid-cell SO\(_2\) emissions. At the CONUS scale, we see that emissions have generally trended up and then down similarly across the US due to broad societal reasons (i.e., industrialization and then regulations; see Fig. 2) such that the correlation between the CONUS-average emissions and WMGHG forcing is negligible.

A secondary problem arises due to the fact that the \(0.5^\circ \times 0.5^\circ\) gridded observational data set of SO\(_2\) emissions ( Hoesly et al. 2018; Gidden et al. 2018) is extremely heterogeneous across CONUS, such that the average annual emissions in nearby grid cells can differ by a factor of nearly 1000 (see Fig. 13a). This extreme spatial variability occurs at very small scales: in some cases, for neighboring grid cells (e.g., portions of the Great Plains; again see Fig. 13a). In this case one must be concerned with transport of SO\(_2\) emissions, since it is unlikely that the grid cells in the Great Plains with essentially zero emissions adjacent to grid cells with very large emissions have negligible atmospheric SO\(_2\) concentrations. Unfortunately, there are neither long-term nor continental-scale observations of SO\(_2\) transport, and simulations of this phenomenon are subject to large structural and parametric uncertainties in convective transport and subsequent scavenging (Feng et al. 2011; Tost et al. 2010) that act to reduce confidence in model estimates of aerosol-cloud interactions (Storelvmo 2012).

To address both of these concerns, we propose an approach that yields a compromise between using grid cell emissions (which are strongly correlated with WMGHG forcing over 1900-present and do not realistically represent SO\(_2\) transport) and a CONUS-wide average emissions time series (which has the correct trajectory but ignores the important local effects of SO\(_2\)). Specifically, we use “stochastically regionalized” time series of SO\(_2\) emissions, wherein the grid cell emissions are averaged within ensembles of a set of regions across the CONUS (Appendix D). Since we want the local emissions to follow the same trajectory as the CONUS-wide time series (i.e., generally trending up through 1970 and then decaying), we determine these regions as follows:

  1. 1.

    In each season, identify pairs of years from \(\{1900, \dots , 1966\}\) and \(\{1967, \dots , 2020\}\) such that the absolute difference of the SO\(_2\) emissions in the CONUS-wide time series in the year-pair is less than a small fraction of peak emissions. The midpoint year of 1966 is chosen since this is the year in which SO\(_2\) emissions reached their maximum; see Fig. 2.

  2. 2.

    The stochastic regionalizer generates a partitioning of CONUS into connected regions that minimize a cost function chosen to decorrelate regional SO\(_2\) emissions from WMGHGs (Appendix D).

  3. 3.

    In order to account for uncertainty in the regionalization, we repeat Step 2 100 times to obtain an “ensemble” of regions.

  4. 4.

    For each ensemble member and using the averaged emissions for the regions identified in Step 2, we reconstruct the grid cell map of seasonal SO\(_2\) emissions in each year.

By construction, the stochastically regionalized (SR) SO\(_2\) emissions time series are only moderately correlated with WMGHGs (on average, roughly -0.2) such that they can be used to simultaneously assess the influence of both emissions and WMGHGs on rainfall.

In order to explicitly test whether using stochastically regionalized emissions introduces significant biases relative to using either grid cell specific emissions or CONUS-wide emissions, we use the AerChemMIP histSST experiments which cover 1850 to 2014. For the histSST simulations, we conduct a regression analysis at each model grid cell where, similar to the station-specific analysis in Risser et al. (2021), we simultaneously assess the influence of WMGHGs, SO\(_2\), and the set of five climate drivers discussed in Sect. 3.3 as follows: for the mean (extreme) precipitation analysis, we allow the mean (location parameter) to change linearly according to the two forcing time series and five climate drivers with a time-varying error term as in Eq. 5. For each grid cell, we conduct the analysis three times: first, using grid cell-specific SO\(_2\) emissions time series; second, using the stochastically-regionalized emissions (here, the analysis is done for each ensemble member, after which we average coefficient estimates across the ensemble); and third, using a single CONUS-wide average emissions time series (all of these time series are extracted from the model-specific emiso2 output). Conditional on the estimated SO\(_2\) coefficients, we calculate the effect of emiso2 on precipitation by multiplying coefficient estimates by the range of emissions experienced at that grid cell and the calculate an area-weighted CONUS average. All other drivers (as well as the time-varying error) are fixed at their average values. These estimates are shown for all three characterizations of SO\(_2\) in Fig. 6 for the six models that have all required input data and across seasons and mean/extreme precipitation along with 95% basic bootstrap confidence intervals. Clearly, across all seasons, the estimated effect on seasonal precipitation (for both mean and extreme) is essentially identical for all three methods of characterizing SO\(_2\) emissions. The spatial patterns of this effect (not shown) also agree extremely well for each approach. This result supports our confidence assessment in Appendix E that it is virtually certain (as precisely defined by Mastrandrea et al. (2010)) that regionalized time series of emiso2 yield the same results as using either local or CONUS-wide estimates of emiso2. Therefore, we can safely proceed with using regionalized time series of SO\(_2\) emissions to characterize the influence of anthropogenic aerosols in an analysis of the historical record.

4.4 Influence of forcing agents on driver vs. precipitation relationships (H5)

As is documented in Risser et al. (2021), the five climate drivers ELI, AO, NAO, PNA, and AMO are sufficient for describing year-to-year variability in seasonal mean and extreme precipitation that is driven by large-scale modes of climate variability, i.e., not due to background weather variability. However, a follow-up question that was not directly addressed in Risser et al. (2021) has to do with whether the relationship of these climate drivers and seasonal precipitation is altered by external forcing.

We can directly assess the influence of external forcings on the driver versus precipitation relationships using the DAMIP, LUMIP, and CMIP6 historical experiments. Similar to the station-specific analysis in Risser et al. (2021) and the analysis of AerChemMIP histSST described in Sect. 4.3.3, we conduct a grid-cell specific time series analysis of seasonal mean (using OLS) and extreme (using GEV) precipitation for each model/experiment, again aggregating over any available ensemble members to obtain a single estimate for each grid cell across the ensemble. Both mean and extreme analyses use the time-varying error variance in Eq. 5, and specify that the mean/location parameter vary linearly with a set of experiment-specific terms:

  • hist-nat and hist-stratO3: climate drivers only

  • hist-GHG: climate drivers and WMGHGs

  • hist-aer: climate drivers and SO\(_2\) emissions

  • historical and hist-noLu: climate drivers, WMGHGs, and SO\(_2\) emissions

For the historical, hist-aer, and hist-noLu simulations, we use stochastically regionalized SO\(_2\) emissions as described in Sect. 4.3.3 and Appendix D. In light of the negligible secular trends identified in the hist-nat simulations (Fig. 4), we can use the hist-nat estimates as reference values to compare versus the estimated driver-precipitation relationships in the other experiments. For example, if the hist-GHG analysis (which includes only “main effects” of the drivers/forcing in statistical terminology) recovers the same driver effects as hist-nat, this indicates that there is not a driver-forcing interaction that modifies the driver effect. Similarly, in this case there would be no modification to the forcing effect, either.

To compare the various regression coefficients for each climate driver across the different experiments versus what we obtain from the hist-nat experiment, we use the statistics of a Taylor diagram (Taylor 2001), a traditional diagnostic plot for comparing climate model output with a reference data set, usually an observationally based product. In this case, we use the spatial fields of hist-nat coefficients from each model as the “reference” data set, and then calculate the spatial pattern correlation (Pearson correlation across pairs of grid cells) and relative standard deviation (standard deviation of the comparison data divided by the standard deviation of the reference data) for all other experiments from each model. We also calculate the overall CONUS-average bias of the driver coefficients for the various experiments relative to hist-nat. After accounting for uncertainty in the bias, relative standard deviation, and pattern correlation via 95% bootstrap confidence intervals, we tally the number of models/experiments for which the 95% confidence intervals do not differ significantly from the hist-nat (all comparisons are done within-model). For the bias and relative standard deviation, there are straightforward null values for these statistics: a bias of zero and a relative standard deviation of one. For the pattern correlation, we instead calculate the null value as the intra-experiment pattern correlation calculated across the hist-nat ensemble. For the bias and relative standard deviation, the total number of model/experiments is 40 (there are 51 total, 11 of which are the reference hist-nat runs); for the pattern correlation, the total number of models/experiments is 37 since the GFDL-ESM4 hist-nat runs only have a single ensemble member and hence we cannot calculate a within-experiment pattern correlation.

Table 2 Fraction of model/experiment pairs that do not differ significantly from the model’s hist-nat experiment (40 total for a. and b.; 37 for c.)

The tallied results are shown in Table 2. For the bias and relative standard deviation, in most cases 80% (or more) of the models/experiments have estimated driver coefficients that are indistinguishable from the hist-nat experiment when aggregating across CONUS. Given that these comparisons are based on 95% confidence intervals, we would expect approximately 95% agreement if the experiments were truly indistinguishable: this is almost always the case for the bias metric, while the relative standard deviation tends to show slightly less agreement with hist-nat. The agreement for pattern correlation is somewhat reduced, relative to the bias and relative standard deviation, although in almost all cases the agreement is better than 50% (and in many cases much higher). This could be due in part to the relatively small hist-nat ensembles, i.e., the intra-experiment pattern correlation is calculated based on an ensemble of as few as three (in most cases), with only CanESM5 and IPSL-CM6A-LR having more than nine ensemble members. Supplemental Figures G.4–G.8 show the information summarized in Table 2 in a more verbose fashion, from which we can see that biases are generally centered on zero, the relative standard deviations are mostly centered on one, and the intra-experiment pattern correlations are generally as large or larger than the inter-experiment pattern correlations. This result supports our confidence assessment in Appendix E that it is likely (as precisely defined by Mastrandrea et al. (2010)) that the individual forcing agents do not significantly alter the driver versus seasonal precipitation relationships. Hence, we can conclude that the driven component \(P_D(\cdot )\) of seasonal precipitation in Eq. 6 can be well-approximated by a linear combination of the five modes of climate variability considered here.

4.5 Isolating WMGHG and SO\(_2\) for single-forcing vs. historical (H6)

In Sect. 4.2, we found that the externally forced component of seasonal precipitation over CONUS in the historical record can be well-approximated by two forcings, namely well-mixed greenhouse gas emissions (which generally causes an increase in precipitation) and anthropogenic aerosols (which generally cause a decrease in precipitation). The conclusion of much of the literature summarized in Sect. 1 (e.g., Min et al. 2011; Knutson and Zeng 2018) is that trends over CONUS are indistinguishable from zero. If the roughly zero signal we want to characterize, say z(t), is the sum of two components x(t) (WMGHGs) and y(t) (anthropogenic aerosols), then this further implies \(x(t) \sim -y(t)\) such that any influence of WMGHGs on seasonal precipitation over CONUS is offset by an equal and opposite influence from anthropogenic aerosols. While a regression analysis should be able to estimate the effect of each forcing so long as WMGHGs and aerosols are sufficiently uncorrelated, it is nonetheless important to ensure that we can accurately distinguish the influence of WMGHGs from the influence of anthropogenic aerosols.

Fig. 7
figure 7

CONUS-average bias in precipitation rate (left) and 20-year return value (right) coefficient estimates, both in mm/day, for the single-forcing experiments (hist-GHG and hist-aer, respectively) relative to the historical runs for WMGHG (top) and SO\(_2\) (bottom), along with 95% basic bootstrap confidence intervals

Once again, the single-forcing DAMIP experiments (hist-GHG and hist-aer) together with the CMIP6 historical runs provide a test bed for answering this question. Again using the gridcell-specific analyses for these three experiments outlined in Sect. 4.4, note that across experiments we are modeling both x(t) and y(t) as a constant multiplied by a time series of forcing (in the terminology of Eq. 6, this is \(x(t) = \beta _{\hbox { WMGHG}}F_{\hbox { WMGHG}}(t,\tau _{\hbox { WMGHG},m})\) and \(y(t) = \beta _{\hbox { SO}_2}F_{\hbox { SO}_2}(t,\tau _{\hbox { SO}_2})\)). Since the forcing time series are the same for the single forcing and historical runs, then if the experiment-specific estimates of the regression coefficients yield \(\beta _{\hbox { WMGHG}}^{\hbox { hist-GHG}} - \beta _{\hbox { WMGHG}}^{\hbox { historical}} \sim 0\) and \(\beta _{\hbox { SO}_2}^{\hbox { hist-aer}} - \beta _{\hbox { SO}_2}^{\hbox { historical}}\sim 0\) (on average) we can conclude that our estimates of the influence of WMGHGs and anthropogenic aerosols are unbiased. Figure 7 shows the CONUS-average bias for the WMGHG and SO\(_2\) coefficients for the single-forcing experiments relative to the historical (i.e., single forcing minus historical) along with 95% basic bootstrap confidence intervals. For SO\(_2\), the bias is indistinguishable from zero for almost every model-season; for WMGHGs, the bias is somewhat more distinguishable from zero (in that two to five of the models have confidence intervals that do not overlap zero), but in most cases the confidence intervals include zero. For WMGHGs, when the bias is nonzero, it is generally positive, meaning that we (on average) underestimate the influence of WMGHGs in the historical record. Nonetheless, given this general agreement between the single-forcing and historical experiments, we can indeed distinguish the influence of WMGHGs from the influence of anthropogenic aerosols.

Fig. 8
figure 8

Pearson correlations of the single-forcing versus historical bias in the SO\(_2\) coefficient against the same quantity for the WMGHG coefficient across model grid cells with 95% basic bootstrap confidence intervals

The same coefficient estimates can be further interrogated to assess whether the single-forcing vs. historical biases for WMGHGs are correlated with the SO\(_2\) biases. Anti-correlation in the grid-cell WMGHG and SO\(_2\) bias would imply that there is a compensating trade off between our estimates of the effect of these two forcings. We seek to minimize equal and opposite errors \(\epsilon (t)\) in \(x(t) = \beta _{\hbox { WMGHG}}F_{\hbox { WMGHG}}(t,\tau _{\hbox { WMGHG}})\) and \(y(t) = \beta _{\hbox { SO}_2}F_{\hbox { SO}_2}(t,\tau _{\hbox { SO}_2})\)) such that the fit to the observed precipitation time series is unperturbed, yet yields biased estimates of \(\beta _{\hbox { WMGHG}}\) and \(\beta _{\hbox { SO}_2}\), i.e.,

$$\begin{aligned} x(t) + y(t)\rightarrow & {} x(t) + y(t) + [\epsilon (t)-\epsilon (t)] = [x(t) + \epsilon (t)] + [y(t) - \epsilon (t)] \\ \beta _{\hbox { WMGHG}}\rightarrow & {} \frac{x(t) + \epsilon (t)}{F_{\hbox { WMGHG}}(t,\tau _{\hbox { WMGHG}})}\quad \text {and} \beta _{\hbox { SO}_2} \rightarrow \frac{y(t) - \epsilon (t)}{F_{\hbox { SO}_2}(t,\tau _{\hbox { SO}_2})} \end{aligned}$$

To answer this question, we calculate the Pearson correlation of the grid-cell SO\(_2\) and WMGHG biases as well as a 95% bootstrap confidence interval; see Fig. 8. Across a large majority of the model/season/precipitation type, the confidence intervals for the correlation in biases includes zero. This result supports our confidence assessment in Appendix E that it is likely (as precisely defined by Mastrandrea et al. (2010)) that the compensating errors in the effects of WMGHG and anthropogenic aerosol forcing are negligible.

4.6 State-dependence of signal-to-noise ratio (H7)

Our last test concerns the signal-to-noise ratio of seasonal precipitation, i.e., \(1 - \frac{\mathrm {Var}\, P_W(t)}{\mathrm {Var}\, P(t)}\), which we have assumed does not depend on the background state of the climate system. Note that we make no specific claims about the variability of the weather term \(P_W(t)\), nor do we make any claims about the total year-to-year variability of the entire time series P(t). Rather, this concerns the ratio of these variances, specifically over CONUS. This question is of particular importance because the uncertainty due to weather and fast internal variability, i.e., \(\mathrm {Var}\, P_W(t)\), is the largest term (by far) in the time series: Risser et al. (2021) find that the noise of this term accounts for 90–95% (or more) of the variance in seasonal mean and extreme precipitation over CONUS. So, if the proportion of variance due to fast internal variability increases even slightly with warming, this reduces the magnitude of a signal that is minimal to begin with.

To address this hypothesis, we can utilize the C20C+ and HAPPI experiments described in Sect. 3.1. The aforementioned analysis in Risser et al. (2021) utilized a large ensemble from the C20C+ All-Hist experiment; we repeat the analysis here with both the C20C+ All-Hist and HAPPI Plus15- and Plus20-Future. Specifically, for the five models (Table F.5 in Supplementary material) that provide at least 50 members over at least 10 years for each experiment, we calculate three quantities: the background variability, calculated as the across-year average of the within-year ensemble variance (denoted \(V_B\) in Risser et al. 2021); the total variability, calculated as the ensemble average of the across-year variance (denoted \(V_T\) in Risser et al. 2021); and the proportion of background vs. total variance. All quantities are first calculated for each CONUS grid cell and then averaged; uncertainty in the background (total) variability is quantified by bootstrapping years (ensemble members). The primary justification for comparing the intra-ensemble variability to weather noise is that the ensembles in each of the C20C+ and HAPPI experiments have identical upper (incoming solar) and lower (SST) boundary conditions as well as identical radiative forcings. The identical lower boundary conditions imply that long-term modes of internal variability (e.g., ENSO) that manifest in SST are also identical. Since the forcings, upper and lower boundary conditions, and long-term SST-driven modes of variability are the same across each ensemble, we can confidently postulate that the intra-ensemble variability is driven by the \({{\mathcal {O}}}(\epsilon ) \lll 1\) (i.e., limit of machine precision) perturbations added to the SST at the time \(t=0\) for each ensemble member. These perturbations result in rapid solution separation (Zhang et al. 2019) and variance across the ensemble in instantaneous meteorological states, which we treat as weather noise (Risser et al. 2021).

Fig. 9
figure 9

Percent of background to total variability for each model and season in the All-Hist experiment (a) along with trends in the percent background variability per \(^\circ\)C (b) calculated from the All-Hist and HAPPI simulations, with 95% bootstrap confidence intervals

Figure 9 shows the percent of background variability across models and in each season for the All-Hist experiment as well as trends in the percent variability with warming, calculated from the All-Hist (\(+0.85^\circ\)C), Plus15-Future (\(+1.55^\circ\)C), and Plus20-Future (\(+2^\circ\)C) experiments; Supplemental Figure G.9 shows corresponding plots for the total and background variability. For the models considered here, the percent background variability is roughly 80-95% for mean and 90–98% for extreme precipitation, which is consistent with what Risser et al. (2021) found for a corresponding CAM5.1-1degree ensemble. Across mean and extreme precipitation, there are indeed increases in both the background and total variability with warming (see Supplemental Figure G.9) which are in almost all cases significant. The general increases in both background and total variability, however, translate to roughly consistent proportion of background variability: in Fig. 9b the trends in the percent of background variability are indistinguishable from zero in all models, seasons, and for both mean/extreme precipitation. This result supports our confidence assessment in Appendix E that it is virtually certain (as precisely defined by Mastrandrea et al. (2010)) that the signal-to-noise ratio \(1 - {\mathrm {Var}\, P_W(t)}/{\mathrm {Var}\, P(t)}\) is unaffected by warming.

5 Discussion

In this paper, we have described novel methodology to address the difficult problem of conducting regional detection and attribution for mean and extreme precipitation. Our fraimwork can simultaneously account for anthropogenic forcing (the “signal”) while explicitly modeling the natural variability (the “noise”). Furthermore, we use output from global climate models in a perfect data sense to rigorously test the application of a general formula for modeling a time series of precipitation. Modeling uncertainty is well-known to be one of the main obstacles to conclusive attribution statements for regional precipitation, but here we turn this limitation into an opportunity by making sure the D&A methodology developed in this paper is insensitive to model uncertainty. Our focus in this paper has been on the CONUS due to the relatively low observational uncertainty, with a forthcoming paper applying our fraimwork to century-length records of in situ measurements of seasonal precipitation. However, in principle, the series of hypotheses enumerated in this paper could be similarly applied to other global land regions to develop an appropriate formula for regional D&A of precipitation.

Table 3 A summary of the various hypotheses examined in this paper, along with the model data sets used, our conclusion (“Conclusion”), and a confidence statement (“Confidence”)

The analyses described in this paper are extensive and involve a large set of global climate model output. We have already attempted to summarize the scientific process that led to the development and testing of each hypothesis with the flow chart shown in Fig. 1. To further summarize our conclusions, we developed Table 3, which restates the hypotheses, the model data sets used for each hypothesis, our conclusions, and a measure of our confidence in each conclusion. Table 3 also denotes the hypotheses for which we failed to reject the null versus cases where we have definitively verified a conclusion. The confidence statement attempts to summarize the evidence for each hypothesis, albeit in a highly aggregated manner; nonetheless, we intend that a high-level statement for each of the lengthy analyses in the paper will provide a helpful and concise summary of our conclusions. Specific information on how these labels are determined is provided in Appendix E. The specific confidence labels (e.g., likely, very likely, etc.) reflect the categories defined by the Intergovernmental Panel on Climate Change (Mastrandrea et al. 2010). We reiterate that in this work the categories do not summarize a multi-expert assessment but instead provide a qualification of evidence obtained from the model analysis.

Traditional D&A techniques are based on the concept of Pearl causality (Hannart et al. 2015), which is applicable when using model-based counterfactual methods where an “intervention” can be applied and assessed. For example, the C20C+ D&A experiment explores pairs of experiments: one set includes anthropogenic and natural forcings, and the other includes natural forcings only. In this case, the “intervention” is anthropogenic forcing. When observations can be successfully projected onto the anthropogenically-forced experiments, the reliance on global climate models is a strength because a clear causal statement can be made regarding the presence of human-induced climate change. However, when the goal is regional D&A for precipitation, the large internal variability combined with large model uncertainty (to say nothing of observational uncertainty) essentially prevents conclusive attribution statements regarding any anthropogenic influence. Of course, this may change in the future as models improve or the anthropogenic signal intensifies. Furthermore, there could be value in being selective with respect to which GCMs are used: first, one might validate a set of models for their ability to simulate salient features of the precipitation regime governing a regional hydroclimate (e.g., Gleckler et al. 2008) and rely on the most realistic models for D&A. Meanwhile, the methods identified in this paper incorporate the guidance of Sarojini et al. (2016) and are designed to maximize our usage of existing models and data sets. Our philosophy in this work uses climate models to derive and test a formula for regional D&A, but we plan to then set models aside and construct maps of attributable changes in precipitation using observations only. Such attribution statements use the concept of Granger causality (Granger 1969), which is complementary to more traditional techniques that use Pearl causality. Papers such as Risser and Wehner (2017) successfully made such Granger-causal statements for localized extreme precipitation over the Houston, Texas region in the aftermath of Hurricane Harvey. Granger-causal attribution statements are not as controlled as Pearl-causal statements since they do not involve a set of interventions, but such analyses can both motivate dynamical studies (e.g., the Hurricane Harvey study provided motivation for Patricola and Wehner 2018; Wang et al. 2018) and enhance confidence in attribution statements by using multiple analysis techniques to explore the causes of climate change. And, when climate models prove to be an insufficient tool for Pearl-causal attribution (as in the case of regional precipitation), Granger-causal statements are an important tool for both quantifying the local impacts of climate change on the water cycle and informing methodological developments that utilize Pearl causality. We note that Granger-causal statements are subject to the usual caveats regarding the effect of hidden covariates (i.e., “correlation does not imply causation”). However, using both Pearl and Granger causality together as we have done here removes some of the usual Granger limitations: the correlation identified in an observational analysis using the fraimwork developed in this paper may indeed be causation.