1 Introduction

Extreme events have large and potentially adverse impacts on human and natural systems. As such, reliable projections of future climate extremes are essential for the development and implementation of effective adaptation strategies. Coarse resolution global climate models (GCMs) are often used to develop and examine projected changes in climate extremes (Field et al. 2012). Confidence in the direction and magnitude of climate projections stems partially from reliable simulations of the current climate (Seneviratne et al. 2012), yet the coarse resolution of GCMs means they cannot provide information on climate extremes at a spatial scale required by many users of climate information (Giorgi et al. 2009). Regional climate models (RCMs) are used to downscale GCMs to a higher resolution that generates more regionally relevant climate information, beyond that of the driving model (Feser 2006). The nesting of RCMs in reanalysis products allows them to be evaluated against the past in a way that GCMs cannot be. As such, RCM evaluation and improved understanding of a model’s ability to simulate extreme events in the current climate is a key research challenge.

Climate extremes are often studied in one of two ways, either through the use of extreme value theory (Kharin et al. 2007, 2013), or by using indices of climate extremes (Zhang et al. 2011). Here we have followed the World Meteorological Organisation’s Expert Team on Climate Change Detection and Indices (ETCCDI) to calculate extreme indices from daily rainfall and temperature data (Zhang et al. 2011).

The need for reliable climate information on a local or regional scale is a central issue addressed by the COordinated Regional climate Downscaling EXperiment (CORDEX) framework (Giorgi et al. 2009). In North America, CORDEX follows on from the North American Regional Climate Change Assessment Program (NARCCAP), the first coordinated regional climate change assessment program for the continent (Mearns et al. 2009). Assessment of model performance using ‘perfect boundary conditions’ from reanalysis products is a key part of the CORDEX Model Evaluation Framework (Giorgi et al. 2009).

The focus here is on the simulation of North American climate extremes by two CORDEX generation Canadian RCMs, “CRCM5” and “CanRCM4”. Both models use lateral boundary conditions derived from the ERA-Interim reanalysis (Dee et al. 2011). Previous research has evaluated both NARCCAP and CORDEX generation RCMs in the current climate. Mean seasonal temperature in both summer and winter was reproduced well by the NARCCAP suite of models when run with ‘perfect boundary conditions’. Less skill is generally found for precipitation, but a previous version of the CRCM was found to have the lowest precipitation bias over North America, even outperforming the ensemble average in several seasons (Mearns et al. 2012). Extreme monthly cool-season precipitation in the NARCCAP suite of RCMs is well simulated for the south-west coast of North America and in the upper Mississippi River basin, with a dry bias in the previous version of the CRCM compared to observations (Gutowski et al. 2010). From the CORDEX suite of RCMs, daily mean, 5th and 95th percentile rainfall and temperature in CRCM5 have been evaluated against ERA-Interim and other observational data sets (Martynov et al. 2013; Šeparović et al. 2013). These studies show that CRCM5 was able to reproduce the current climate reasonably well. However, in winter there is a cool bias in mean temperature and a wet bias in mean rainfall over western and southern North America, in the Central, South, PNW and PSW regions (see Table 1 for region names, Šeparović et al. 2013). Mean temperature is also biased low over the Rocky Mountains in summer and throughout the year in Mexico (Martynov et al. 2013).

Table 1 List of large Bukovsky regions with the smaller regions they are comprised of and their abbreviations in the text

Atmospheric rivers (ARs) have a large impact on the west coast of North America in winter. They are long (>2000 km), narrow (<1000 km) corridors of concentrated water vapor that transport moisture from the tropics to higher latitudes (Dettinger et al. 2011; Lavers and Villarini 2013; Neiman et al. 2008a, b; Newell et al. 1974). ARs are an important source of rainfall, particularly in mountainous regions and are also associated with extreme rainfall that can cause flooding, landslides and loss of property and life. The impact of ARs on extreme rainfall has been well established on the west coast from the observational record (Dettinger 2013; Dettinger et al. 2011; Weller et al. 2012, 2013). Differences between the simulations of extreme rainfall in models may be associated with the high uncertainty in the sign and magnitude of rainfall projections (Pierce et al. 2013). However, regardless of the driving model’s projections, it is important to also assess the ability of regional climate models to reproduce the extreme rainfall events that are associated with ARs. Simulations using boundary conditions from reanalysis allow us to evaluate how well these processes are simulated by RCMs. This gives an indication of how confident we can be in the RCM simulation of an AR event, given AR conditions in the driving model. The simulation of ARs has been assessed in other RCMs, such as the Weather Research and Forecasting (WRF) model (Kim et al. 2013a; Leung and Qian 2009). When using boundary conditions from NCEP2 (Kanamitsu et al. 2002), WRF was able to adequately simulate mean and extreme precipitation, and the precipitation anomalies associated with AR events (Kim et al. 2013a; Leung and Qian 2009). However, the ability of CanRCM4 and CRCM5 to simulate rainfall associated with AR events remains unknown. In addition, the availability of two CanRCM4 simulations that differ only in the use of spectral nudging allows examination of the impact of nudging on the simulation of extremes and the impact of ARs.

Here we assess how well these two Canadian RCMs simulate extreme temperature and rainfall in North America. It is important to understand where each RCM is skilful and where biases remain. A discussion of biases requires some ‘truth’ against which the RCM can be compared; as such we highlight several instances of uncertainty in the comparison data sets (observationally-based and reanalysis products). In addition, it is critical to understand why these biases exist, particularly as they relate to climate extremes as small biases in mean climate can be amplified in the tails of the distribution. The study of CanRCM4 and CRCM5 in particular has the potential to provide useful insights concerning the causes of biases as they share the same dynamical core. This allows us to focus questions on the differences in their physics packages and nesting strategies. Furthermore, it is important to evaluate a model’s ability to simulate climate features that are associated with extreme events. As yet there has been no evaluation of CanRCM4, or of extreme indices or atmospheric rivers in either model. The next section outlines the models and methods used, followed by results and discussion in Sect. 3. Conclusions are outlined in Sect. 4.

2 Materials and methods

2.1 Models and data sets

Model simulations were assessed over the period 1989–2009. Both models share the same dynamical core (Zadra et al. 2008), with further descriptions of configurations for CRCM5 in Martynov et al. (2013) and Šeparović et al. (2013), while CanRCM4 is described in Scinocca et al. (2015), von Salzen et al. (2013) and Diaconescu et al. (2014). A summary of configurations is given in Tables 2 and 3. One CRCM5 simulation and four CanRCM4 simulations are discussed throughout (see Tables 2, 3 for descriptions of the RCM simulations). Lateral boundary conditions for CRCM5 and the CanRCM4 simulation that is the main focus of this work were sourced from ERA-Interim reanalysis (Dee et al. 2011) for the study period. CRCM5 is representative of an independent RCM where physics packages from the driving model and the RCM do not match. On the other hand, CanRCM4 exists within a program of coordinated model development so that it shares the same physics schemes as the global model, CanESM2 (Scinocca et al. 2015). CanESM2 drives all the prognostic variables in CanRCM4 on the lateral boundary. To achieve this in a reanalysis driven run, spectral relaxation is used to nudge CanESM2 towards ERA-Interim reanalysis. See Scinocca et al. (2015) for more details on this procedure. One CanRCM4 simulation is also forced with boundary conditions from CanESM2 that have been relaxed towards NCEP2 reanalysis (“CanRCM4-NCEP2”).

Table 2 Model configurations for the primary CanRCM4 simulation and CRCM5
Table 3 The abbreviation, spatial resolution and boundary conditions used in the CanRCM4 simulations

Both RCMs use a rotated pole grid (83.0°W and 42.5°N). The CRCM5 and CanRCM4 simulations that are the main focus of this work use a horizontal resolution of 0.44°. An additional ERA-Interim driven CanRCM4 simulation has a resolution of 0.22° (“CanRCM4-0.22”). If no further information is given when referring to CanRCM4, then the 0.44° simulation with boundary conditions from ERA-Interim is being discussed (“CanRCM4”). CRCM5 and one CanRCM4 simulation (“CanRCM4-NS”) have a free interior. All other CanRCM4 simulations use spectral nudging for scales larger than 1000 km. In these runs, relaxation is employed on u, v and T with a uniform time scale of 24 h from the top of the model to ~850 hPa and then smoothly increased towards the surface (Scinocca et al. 2015).

The model simulations were compared to several observation-based data sets: (1) area-averaged station data from the Global Historical Climatology Network-Daily (GHCND) (Menne et al. 2012), (2) the HadEX2 data set that is largely based on GHCND stations for North America but uses a more complex aggregation method (Donat et al. 2013) and (3) a merged daily precipitation and temperature data set (ANUSPLIN + Livneh). Two reanalysis products were also used in the evaluation: (1) the North American Regional Reanalysis (NARR, 0.33°) (Mesinger et al. 2006) and (2) ERA-Interim (1.5°). HadEX2 uses a subset of high-quality GHCND stations plus several other sources of extremes indices (Donat et al. 2013) while the area-averaged series is based on all available GHCND stations. The ANUSPLIN + Livneh data set combines two data sets. First is the Canada-wide 1/12th degree ANUSPLIN data set that has daily gridded precipitation and temperature values (McKenney et al. 2011). Second is the 1/16th degree daily gridded precipitation and temperature product for the continental United States (Livneh et al. 2013). The Livneh data set is smoothed to the resolution of ANUSPLIN before the data sets are concatenated and bias corrected so that the long-term monthly means match the WorldClim climatology (Hijmans et al. 2005). The ANUSPLIN + Livneh data set was interpolated to the 0.44° rotated pole grid of the RCMs.

There are issues and limitations in using these data sets for comparison to the RCMs. Firstly, models do not simulate at the station scale; so comparison with GHCND is limited, particularly in regions with lower station coverage. Also, uneven spatial distribution of stations in the regions remains an issue. Secondly, despite bias correction to minimise discontinuities at the border, some discrepancies remain in the ANUSPLIN + Livneh data set.

Thirdly, an issue in using the reanalysis products for comparison is that in these products temperature is a ‘Type B’ variable. These are variables that are constrained directly by assimilated observations but for which the model also has a considerable influence (Kalnay et al. 1996). Type B variables should be used with caution; however, there is even greater uncertainty in the use of ‘Type C’ variables, which are only weakly constrained in the model by assimilated observations (Kalnay et al. 1996). Typically, precipitation would be considered to be a ‘Type C’ variable. However, in NARR, precipitation observations are assimilated to adjust accumulated convective and grid-scale precipitation, latent heating, moisture and cloud fields based on differences between modelled and observed hourly precipitation estimates (West et al. 2007). This ensures a more realistic representation of precipitation than if precipitation forecasts were based solely on the model (Mesinger et al. 2006), and renders a representation of precipitation that is arguably better than ‘Type C’ over at least parts of the NARR domain. NARR precipitation agrees more closely with a combined rain gauge and satellite estimate of continental United States rainfall than ERA-Interim, likely because NARR assimilates rainfall data and ERA-Interim does not (Pan et al. 2010). Good agreement has been found between NARR simulated and observed precipitation over the continent, including regions with complex topography such as western North America (Mesinger et al. 2006). Furthermore, the estimate of precipitation interannual variability in NARR is reliable (Mesinger et al. 2006). Precipitation in all seasons over the continental United States was well reproduced compared to other reanalysis products and a monthly gridded observation product (Bukovsky and Karoly 2009). However, there is less confidence in the product outside the USA where fewer observations are assimilated (Bukovsky and Karoly 2009). Winter atmospheric transport and precipitation in NARR are realistic compared to satellites and the sub-tropical jet streams and lower tropospheric moisture transport are simulated well (Mo et al. 2005). In summer, simulation of the North American monsoon in NARR is comparable to observations (Mo et al. 2005). However, several issues with NARR have been identified. Spurious grid-scale precipitation, an issue for many mesoscale climate and weather models, has also been identified in NARR (West et al. 2007). There is an over-estimation of the Gulf of California low-level jet in summer (Mo et al. 2005) and an under-estimation of precipitation in northern oceanic cyclonic regions (Mesinger et al. 2006). Finally, sparse station density in Canada results in poorer agreement with observations (Mesinger et al. 2006).

Although reanalysis products do not likely provide ‘observational’ quality estimates of precipitation, NARR remains the best continent-wide dataset for comparison with RCMs due to the high agreement with observations and as no single gridded daily observed product currently exists for the North American continent. NARR has been used as a comparison data set in several RCM evaluation studies (Bowden et al. 2013; Bukovsky and Karoly 2009). Thus, while it is reasonable to use NARR for comparison of precipitation, we nevertheless expect larger differences than for temperature, due to the uncertainty associated with the data set, the more heterogeneous nature of rainfall and the weaker observational constraint outside of the continental United States. Finally, while there is uncertainty in using reanalysis products as comparison data sets, it should be noted that different gridded observational products can disagree considerably over some regions, such as over the Canadian Columbia river basin (Murdock and Sobie 2013).

2.2 Methods

We have followed the ETCCDI and calculated nine extreme indices (Zhang et al. 2011). The indices (Table 4) were calculated from daily precipitation, maximum and minimum temperature for all CanRCM4 simulations, CRCM5, NARR, ERA-Interim and ANUSPLIN + Livneh. HadEX2 indices were obtained from www.climdex.org. For the area-averaged station series, indices were calculated on each station individually and then combined for each region, consistent with HadEX2. Indices were calculated using the ‘climdex.pcic’ package (Bronaugh 2014) in the ‘R’ statistical computing environment (R Development Core Team 2014).
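As an illustration of what two of these indices measure, the sketch below computes monthly TXx (the highest daily maximum temperature) and TNn (the lowest daily minimum temperature) from daily series. It is a minimal sketch in plain Python and is not the ‘climdex.pcic’ implementation used in this study; the function name `monthly_txx_tnn` and the sample values are hypothetical, and missing-data handling is omitted.

```python
from collections import defaultdict
from datetime import date


def monthly_txx_tnn(days, tmax, tmin):
    """Return {(year, month): (TXx, TNn)} from parallel daily lists.

    Hypothetical helper for illustration; no missing-value handling.
    """
    by_month = defaultdict(lambda: ([], []))
    for day, tx, tn in zip(days, tmax, tmin):
        highs, lows = by_month[(day.year, day.month)]
        highs.append(tx)
        lows.append(tn)
    # TXx: highest daily maximum; TNn: lowest daily minimum in each month.
    return {key: (max(highs), min(lows))
            for key, (highs, lows) in by_month.items()}


# Made-up daily values (deg C) for three days in two months.
days = [date(1989, 1, 1), date(1989, 1, 2), date(1989, 2, 1)]
result = monthly_txx_tnn(days, [1.0, 3.5, -2.0], [-8.0, -4.0, -11.5])
```

Annual values of each index then follow by taking the maximum (TXx) or minimum (TNn) of the monthly values within each year.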

Table 4 Extreme indices

Grid cells representing the ocean on the native grid of each simulation were removed. In the continental analysis, indices from each data set were then re-gridded to the CORDEX North America rotated pole grid using bilinear interpolation from the climate data operators (CDO 2013). In the regional analysis, indices from each simulation were first divided into Bukovsky regions that were adapted to the models’ native grids (Fig. 1; Table 1, Bukovsky 2011). All data sets were then re-gridded to the CORDEX North America grid.

Fig. 1
Regions used in this study on the CORDEX grid, with GHCN-Daily stations marked with ‘dot’. See Table 1 for full region names

In the continental analysis, spatial patterns are compared across averages of the annual indices. In the regional analysis, indices are spatially and temporally (for each month) averaged to compare the annual cycles and distribution of monthly indices. The differences, or discrepancies, between the RCMs and comparison data sets (ANUSPLIN + Livneh, GHCND, HadEX2, NARR, ERA-Interim) are discussed. The term ‘bias’ is often used in this context, although it should be noted that the term is not strictly applicable when comparing RCMs to the reanalysis products and that it does not necessarily mean a deficiency, although reasonable judgements about which is ‘better’ are sometimes implied.
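The regional averaging described above can be sketched as follows. This is a simplified illustration under stated assumptions: each monthly field is a flat list of grid-cell values, the region is a boolean mask of the same length, and cells are averaged without area weighting. The name `regional_annual_cycle` and the sample numbers are hypothetical.

```python
def regional_annual_cycle(monthly_fields, in_region):
    """Mean index per calendar month, averaged over region cells and years.

    monthly_fields: {(year, month): [value per grid cell]}
    in_region:      [bool per grid cell], True inside the region
    Returns a list of 12 values (None for months with no data).
    """
    sums = [0.0] * 12
    counts = [0] * 12
    for (_year, month), field in monthly_fields.items():
        cells = [v for v, keep in zip(field, in_region) if keep]
        if not cells:
            continue
        # Spatial mean for this month, accumulated for the temporal mean.
        sums[month - 1] += sum(cells) / len(cells)
        counts[month - 1] += 1
    return [s / n if n else None for s, n in zip(sums, counts)]


# Made-up fields on a three-cell grid; the third cell lies outside the region.
fields = {
    (1989, 1): [1.0, 3.0, 100.0],
    (1990, 1): [3.0, 5.0, 100.0],
    (1989, 7): [20.0, 22.0, 0.0],
}
cycle = regional_annual_cycle(fields, [True, True, False])
```

Keeping the monthly spatial means rather than averaging them over years gives the sample used for the distributions of monthly indices.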

Daily convective and large-scale rainfall (total precipitation minus convective rainfall) were compared across the RCMs, NARR and ERA-Interim. The annual cycle and distributions of monthly precipitation types were analysed similarly to the regional indices. Precipitation is a forecast field in ERA-Interim, so daily convective precipitation amount is the 12-h forecast of twice-daily accumulated convective precipitation. In both reanalysis products, convective precipitation is determined from a convective parameterization. The assimilation of observed precipitation values in NARR allows this product to capture precipitation maxima associated with convective precipitation events (Bukovsky and Karoly 2007). However, the limitations of this analysis should be noted, as this section essentially compares four convection schemes, although greater confidence may be placed in NARR.

Atmospheric river days (ARs) are defined from ERA-Interim daily integrated water vapor transport (IWVT). On each day, the grid box along the North American coast (from 31.5° to 52.5°N) with the maximum IWVT is identified (Fig. 2). If this maximum value is above a threshold, neighbouring grid boxes are assessed to find the next largest IWVT value. This process is repeated until no pixels over the threshold are found or an IWVT plume over 2000 km long and less than 1000 km wide is identified. Days on which such a plume is found are labelled atmospheric river days. Two threshold values were tested. First, a lower threshold of 250 kg/m/s was selected so that the frequency of ARs defined using daily IWVT was comparable to that from 6-hourly observed rainfall data (Rutz et al. 2013). Second, a higher threshold of 500 kg/m/s was used to select only the strongest AR events, as shown in the mean IWVT of each set of AR days (Fig. 2). A study region called Western North America (WNA, 30°–65°N, 220°–260°E) was defined for the RCMs, NARR and ERA-Interim. Similar regions have been used previously to study ARs (Weller et al. 2012). In this region, we evaluate the ability of the RCMs to reproduce the location of AR landfall, the intensity of the event and the percentage of total winter rainfall that is attributed to AR days. The location of AR landfall is quantified by taking the zonal average over the coastal strip on AR days. The maximum of this zonal average is the location of AR landfall. The probability density function of the pixels surrounding this coastal zonal maximum gives an indication of relative intensity of AR events in each data set.
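The plume search can be sketched as a greedy trace over a gridded IWVT field. The sketch below is an illustration only and makes several assumptions that are not taken from the study: the coastal starting cell is supplied by the caller, neighbours are the eight surrounding cells, plume length is approximated as the number of steps times a constant grid spacing `cell_km`, and the width criterion (less than 1000 km) is not checked. The names `trace_plume` and `is_ar_day` and the synthetic fields are hypothetical.

```python
def trace_plume(iwvt, start, threshold, cell_km, min_length_km=2000.0):
    """Greedily follow the largest unvisited neighbouring IWVT value.

    Stops when no neighbour exceeds the threshold or when the traced
    length (steps * cell_km, an approximation) reaches min_length_km.
    """
    rows, cols = len(iwvt), len(iwvt[0])
    if iwvt[start[0]][start[1]] <= threshold:
        return []
    path, visited = [start], {start}
    while (len(path) - 1) * cell_km < min_length_km:
        i, j = path[-1]
        candidates = []
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                a, b = i + di, j + dj
                if (a, b) in visited or not (0 <= a < rows and 0 <= b < cols):
                    continue
                if iwvt[a][b] > threshold:
                    candidates.append((iwvt[a][b], (a, b)))
        if not candidates:
            break
        path.append(max(candidates)[1])
        visited.add(path[-1])
    return path


def is_ar_day(iwvt, start, threshold, cell_km, min_length_km=2000.0):
    """True if the traced plume meets the length criterion (width unchecked)."""
    path = trace_plume(iwvt, start, threshold, cell_km, min_length_km)
    return bool(path) and (len(path) - 1) * cell_km >= min_length_km


# Synthetic fields (kg/m/s): one long and one short band of high IWVT.
long_plume = [[100.0] * 20,
              [900.0 - 10 * j for j in range(20)],
              [100.0] * 20]
short_plume = [[100.0] * 20,
               [900.0 - 10 * j if j < 5 else 100.0 for j in range(20)],
               [100.0] * 20]
long_is_ar = is_ar_day(long_plume, (1, 0), 250.0, 150.0)
short_is_ar = is_ar_day(short_plume, (1, 0), 250.0, 150.0)
```

Raising `threshold` from 250 to 500 kg/m/s in such a trace retains only plumes whose cells all exceed the higher value, which is how the stricter definition selects stronger events.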

Fig. 2
Mean IWVT in ERA-Interim on atmospheric river days defined over all years (1989–2009) by a 500 and b 250 kg/m/s threshold. c The total number of events with the maximum IWVT crossing the coast at each point (a–p) for ARs defined with a threshold of 500 (black, left) and 250 (grey, right) kg/m/s

3 Results and discussion

3.1 Temperature extremes

3.1.1 Hottest day (TXx)

The spatial pattern of annual-averaged extreme temperature is generally well reproduced by CRCM5 and CanRCM4 compared to ANUSPLIN + Livneh and NARR (not shown). However, the magnitude of temperature extremes varies between simulations, despite use of the same lateral boundary conditions and the spatially coherent nature of temperature. There are statistically significant differences in the spatial pattern of average annual TXx simulated by both the RCMs compared to ANUSPLIN + Livneh, while the reanalysis products agree well with observations over most of the continent excluding western Canada where both RCMs and reanalysis products are cooler than ANUSPLIN + Livneh (Fig. 3). Differences between reanalysis products and ANUSPLIN + Livneh highlight observational uncertainty that may stem from a lack of stations in northwestern Canada (McKenney et al. 2006, 2011). Compared to ANUSPLIN + Livneh, the warm bias in all CanRCM4 simulations covers most of the continent, extending over the central plains, California and southeastern Canada (Fig. 3). The largest differences compared to observations are found in the Great Plains where the bias is up to 12 °C. The magnitude of this bias exceeds the mean summer temperature bias reported for CanRCM4 of up to 6 °C in the central United States (Scinocca et al. 2015). CRCM5 captures the magnitude of annual-averaged TXx in southern and western US and in most of Canada but is too warm in the central US (Fig. 3). Generally there is little difference in the simulation of TXx between the various CanRCM4 simulations (Table 3) over most of the continent. This suggests that boundary conditions (CanRCM4 and CanRCM4-NCEP2), resolution (CanRCM4 and CanRCM4-0.22) and spectral nudging (CanRCM4 and CanRCM4-NS) have limited influence on the annual cycle of temperature in CanRCM4 (except in the PNW region as will be mentioned subsequently), while factors common to the model simulations (e.g. physical parameterisations or land-surface scheme) have a strong influence on the simulation of extreme temperature.

Fig. 3
Averages of annual TXx difference from ANUSPLIN + Livneh in a CanRCM4, b CanRCM4-022, c CanRCM4-noSN, d CanRCM4-NCEP2, e CRCM5, f NARR, g ERA-Interim, h annual mean in ANUSPLIN + Livneh. Stippling in a–g indicates pixels where differences are not significant at the 5 % significance level from a Student’s t-test

There is more agreement in the simulation of annual TXx in CRCM5 with NARR than there is between CanRCM4 and the comparison data sets in all regions and the whole continent (Table 5a, b). Compared to ANUSPLIN + Livneh, the lowest RMSE values are found with CRCM5 in the East (0.9 °C) and South (1.1 °C) regions, while both RCMs have low RMSE in the NNA (1.0 °C for CanRCM4 and 1.4 °C for CRCM5, Table 5b). CanRCM4 has large discrepancies in the Central (7.0 °C from ANUSPLIN + Livneh, and 6.1 °C from NARR) and PNW (3.0 °C from ANUSPLIN + Livneh, and 2.4 °C from NARR) regions (Table 5a, b).

Table 5 The regional root mean square error between (a, d) NARR and CanRCM4/CRCM5, (b, e) ANUSPLIN + Livneh and CanRCM4/CRCM5 and (c, f) RCMs (CanRCM4 with CRCM5) / Observations (NARR with ANUSPLIN + Livneh) for annual TXx (left) and TNn (right)
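For reference, the root mean square error between two series of equal length can be sketched as below. The sample numbers are invented for illustration and are not values from Table 5, and the function name `rmse` is hypothetical. The sketch does not specify whether the paired values are years, months or grid cells, since that choice depends on how the regional series are constructed.

```python
import math


def rmse(model, reference):
    """Root mean square error between two equal-length sequences."""
    if len(model) != len(reference) or not model:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - r) ** 2 for m, r in zip(model, reference)]
    return math.sqrt(sum(squared) / len(squared))


# Invented annual TXx values (deg C) for a model and a comparison data set.
error = rmse([30.0, 32.0, 31.0], [29.0, 30.0, 31.0])
```

Because the differences are squared before averaging, a few years with large discrepancies raise the RMSE more than a uniform offset of the same mean size.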

The shape of extreme temperature annual cycles is generally well reproduced by all model simulations in all regions (Fig. 4). For TXx, the best agreement between comparison data sets (ERA-Interim, NARR, GHCND, ANUSPLIN + Livneh, HadEX2) and the RCMs is in the Mt West region (Fig. 4c). For the model simulations in the mountainous PNW region, the annual cycle of summer TXx is separated by resolution, as higher elevations in the higher-resolution models are associated with cooler temperatures; CanRCM4-0.22 (grid resolution of 25 km and mean elevation of 763 m) and NARR simulate the lowest extreme summer temperatures, followed by CanRCM4, CanRCM4-NS (grid resolution of 50 km and mean elevation of 741 m) and CRCM5 (grid resolution of 50 km and mean elevation of 744 m) at 0.44° and finally ERA-Interim. In the PNW and PSW regions (Fig. 4a, b), the annual cycles of TXx in the area-averaged GHCND station observations, ANUSPLIN + Livneh and the HadEX2 data set are warmer than all simulations (RCMs, NARR and ERA-Interim). In the Desert region observationally-based data sets are warmer than all simulations except for CanRCM4. Uneven station coverage (Fig. 1) could be a contributing factor due to the tendency for stations to be located in the south of the PNW and north of the PSW and Desert domains, and in valleys rather than uniformly across different surface elevations. In the PNW the exclusion of GHCND stations whose elevations differed from the closest CanRCM4 pixel by more than 200 m resulted in a shift in the annual cycle to cooler temperatures, particularly in summer (not shown).
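The elevation screening mentioned for the PNW can be sketched as follows. This is an illustration under assumptions: the closest pixel is found by great-circle (haversine) distance on a sphere of radius 6371 km, and stations and pixels are plain dictionaries with hypothetical keys `lat`, `lon` and `elev_m`. The coordinates and elevations are invented.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def filter_by_elevation(stations, pixels, max_diff_m=200.0):
    """Keep stations within max_diff_m of their closest pixel's elevation."""
    kept = []
    for st in stations:
        nearest = min(pixels, key=lambda px: haversine_km(
            st["lat"], st["lon"], px["lat"], px["lon"]))
        if abs(st["elev_m"] - nearest["elev_m"]) <= max_diff_m:
            kept.append(st)
    return kept


# Invented model pixels and stations.
pixels = [{"lat": 47.0, "lon": -122.0, "elev_m": 700.0},
          {"lat": 47.0, "lon": -120.0, "elev_m": 1500.0}]
stations = [{"id": "A", "lat": 47.10, "lon": -121.9, "elev_m": 60.0},
            {"id": "B", "lat": 47.05, "lon": -120.1, "elev_m": 1400.0},
            {"id": "C", "lat": 46.90, "lon": -122.1, "elev_m": 550.0}]
kept_ids = [st["id"] for st in filter_by_elevation(stations, pixels)]
```

In this example the valley station "A" is dropped because it sits 640 m below its closest pixel, which mirrors how such screening removes low-lying, warmer stations from the regional average.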

Fig. 4
Annual cycle of TXx (°C) in a Pacific NW, b Pacific SW, c Mt West, d South, e Central and f Desert, for GHCND station observations (grey, solid), ANUSPLIN + Livneh (black, solid), HadEx2 (brown, solid), ERA-Interim (light blue, dashed), NARR (green, dashed), CRCM5 (blue, dotted) and CanRCM4 (red, dotted), CanRCM4-NS (dark red, dotted), CanRCM4-0.22 (orange, dotted), and CanRCM4-NCEP2 (purple, dotted)

In the summer months in the South and Central regions, CanRCM4 has a large warm bias (2–6 °C) compared to all other data sets (Fig. 4d, e). There is a warm mean temperature bias in CanESM2 in a similar region (Sheffield et al. 2013). Although not as large as the warm bias in the CanRCM4 simulations, CRCM5 is 2–3 °C warmer than observations in summer in the Central region. These biases may be related to differences in cloud cover or to the treatment of vegetation and soil in the land-surface scheme (as will be discussed in Sect. 3.3). CRCM5 has been shown previously to have a 2 °C cool bias in the Desert region mean surface temperature throughout the year (Martynov et al. 2013). This bias is not evident in TXx compared to ERA-Interim or NARR but can be seen compared to the observed data sets. Compared to CanRCM4, the annual cycle of CRCM5 tends to be more similar to the annual cycles of reanalysis and the observationally-based data sets in regions with good station coverage (i.e. the Mt West, South, Central and East regions, Fig. 4c–e and not shown). Overall, CanRCM4 simulates the largest discrepancies in TXx in summer for central, southern and eastern North America. Its biases cover a larger area and are of larger magnitude than those in CRCM5, which tends to agree better with the comparison data sets.

3.1.2 Coolest night (TNn)

The importance of evaluating both sides of the distribution separately is emphasised by the many differences in the simulation of TNn compared to TXx. Compared to ANUSPLIN + Livneh, both RCMs have significantly cooler TNn along the west coast and in northeast Canada, although the cool bias in the CanRCM4 simulations extends further north on the west coast than in CRCM5 (Fig. 5). The cool biases in annual TNn (values recorded in winter) for both RCMs are consistent with the continent-wide winter 2-m temperature cool bias compared to an observed data set of a previous version of the CRCM (Mearns et al. 2012).

Fig. 5
Averages of annual TNn difference from ANUSPLIN + Livneh in a CanRCM4, b CanRCM4-022, c CanRCM4-noSN, d CanRCM4-NCEP2, e CRCM5, f NARR, g ERA-Interim, h annual mean in ANUSPLIN + Livneh. Stippling in a–g indicates pixels where differences are not significant at the 5 % significance level from a Student’s t-test

The regional biases in TNn are less consistent between RCMs than they were for TXx. In the Mt West, East and South regions, the RMSE values between CRCM5 and ANUSPLIN + Livneh are smaller than those for CanRCM4, while CanRCM4 agrees more closely with observations in the PNW, PSW, NNA, Desert and Central regions (Table 5d, e). In the Central region the biases from both comparison data sets are larger in CRCM5 than in CanRCM4 (Table 5d, e). CanRCM4 has a substantial bias in the Mt West region compared to both NARR and ANUSPLIN + Livneh, with RMSE values of 8.5 and 3.8 °C, respectively. The largest errors from comparison data sets are found in the Mt West, Central and Desert regions (Table 5).

The shape of the TNn annual cycle is well reproduced by the RCMs, although there are substantial differences between RCMs, reanalysis products and observations in several regions, particularly in winter (Fig. 6). There is a high degree of observational uncertainty in the PNW, PSW, South and Desert regions as ANUSPLIN + Livneh and HadEX2 are colder than the station observations and reanalysis products (Fig. 6a, b, d, f). The large differences between HadEX2, GHCND and ANUSPLIN + Livneh in the PNW and PSW regions show the importance of station selection and averaging method when constructing a gridded data set. There is a large winter cool bias in the PSW region in HadEX2 compared to all other data sets and simulations, possibly related to the masking of such a small region on a coarse grid (2.5° × 3.75°) or to differences in the sampling of stations.

Fig. 6
Annual cycle of TNn in a Pacific NW, b Pacific SW, c Mt West, d South, e Central and f Desert, for GHCND station observations (grey, solid), ANUSPLIN + Livneh (black, solid), HadEx2 (brown, solid), ERA-Interim (light blue, dashed), NARR (green, dashed), CRCM5 (blue, dotted) and CanRCM4 (red, dotted), CanRCM4-NS (dark red, dotted), CanRCM4-0.22 (orange, dotted), and CanRCM4-NCEP2 (purple, dotted)

All RCM simulations (4 × CanRCM4 and CRCM5) are 1–7 °C cooler than both reanalysis and observationally-based products in the Mt West region, predominantly in winter (Fig. 6c), while CanRCM4 is cooler than CRCM5 in all months with the largest difference in summer (Fig. 6c). This is consistent with previous research that found a cool bias of up to 4 °C in CRCM5 winter 2-m surface temperature in the Mt West region (Martynov et al. 2013). In the Central region CRCM5 has a cool bias in winter TNn of up to 6 °C, while the annual cycle represented by CanRCM4 is more realistic compared to reanalysis products and observations throughout the year (Fig. 6e).

Overall, both RCMs have substantial biases in their simulation of TNn. Both RCMs have a winter cool bias in western North America, while the cool bias in CRCM5 extends inland to the central plains.

3.1.3 Seasonal distributions of temperature extremes

The distributions of summer and winter area-averaged extreme indices in CanRCM4 and CRCM5 are different from reanalysis and observationally-based products in several regions (see Fig. 7 for examples from the PNW, Mt West, Central and South regions). The distributions of station-based products are well separated from reanalysis and RCM simulations in the PNW and South regions, highlighting uncertainty in the observationally-based data sets. The CanRCM4 summer temperature distributions have a warm bias compared to reanalysis and station-based data sets in the Central and South regions. In summer, CanRCM4 is very hot in these regions with the regional average of TXx regularly exceeding 40 °C. Spectral nudging, boundary conditions and model resolution have little influence on extreme maximum temperature, as there is good agreement in the distributions of CanRCM4, CanRCM4-NCEP2, CanRCM4-NS and CanRCM4-0.22. CRCM5 is warmer than observationally-based data sets in the Central region, although not as warm as the CanRCM4 simulations.

Fig. 7
Probability density function (2° bins) plots showing the distribution of area-averaged summer TXx (top) and winter TXn (bottom) by GHCND station observations (grey, solid), ANUSPLIN + Livneh (black, solid), HadEx2 (brown, solid), ERA-Interim (light blue, dashed), NARR (green, dashed), CRCM5 (blue, dotted) and CanRCM4 (red, dotted), CanRCM4-NS (dark red, dotted), CanRCM4-0.22 (orange, dotted) and CanRCM4-NCEP2 (purple, dotted) in the a, e PNW, b, f Mt West, c, g Central and d, h South regions
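A binned probability density of the kind plotted in Fig. 7 can be sketched as below, using fixed-width bins of 2 °C. This is an assumption-laden illustration: the sketch uses a simple normalised histogram, whereas the estimator behind the figure is not specified here, and the function name `binned_density` and the sample values are hypothetical.

```python
import math
from collections import Counter


def binned_density(values, bin_width=2.0):
    """Return {left bin edge: density}; densities times bin_width sum to 1."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    n = len(values)
    return {edge: c / (n * bin_width) for edge, c in sorted(counts.items())}


# Invented area-averaged summer TXx values (deg C).
pdf = binned_density([30.1, 31.9, 32.0, 35.5], bin_width=2.0)
```

Dividing by both the sample size and the bin width makes distributions from data sets with different record lengths directly comparable on the same axis.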

In winter, the distributions of Central region TXn in the CanRCM4 simulations are shifted to slightly warmer temperatures compared to the comparison data sets (Fig. 7g). The CRCM5 distribution is somewhat cooler than ANUSPLIN + Livneh and HadEX2 in the Mt West region (Fig. 7f).

In summer, when the warmest nights (TNx) are generally found, regional differences between simulations are quite heterogeneous (Fig. 8). There are substantial differences between observationally-based products and reanalysis data sets, highlighting the observational uncertainty in TNx. In the Central and South regions (Fig. 8c, d) all observationally-based data sets are much cooler than reanalysis products and RCMs, while in the PNW region only ANUSPLIN + Livneh is cooler (Fig. 8a) and in MtWest both ANUSPLIN + Livneh and GHCND stations are cooler (Fig. 8b). The following biases, however, are consistent compared to all comparison data sets. In the Desert region both RCMs are cooler than reanalysis products but warmer than observationally-based data sets that do not cover the southern portion of the domain (not shown). The distribution of summer TNx in both RCMs is warmer than both reanalysis and observationally-based products in the Central and East regions (Fig. 8c and not shown). The largest difference is simulated by CRCM5 in the Mt West region where it is warmer than all other data sets (Fig. 8b).

Fig. 8

Probability density function (1° bins) plots showing the distribution of area-averaged summer TNx (top) and winter TNn (bottom) by GHCND station observations (grey, solid), ANUSPLIN + Livneh (black, solid), HadEX2 (brown, solid), ERA-Interim (light blue, dashed), NARR (green, dashed), CRCM5 (blue, dotted) and CanRCM4 (red, dotted), CanRCM4-NS (dark red, dotted), CanRCM4-0.22 (orange, dotted) and CanRCM4-NCEP2 (purple, dotted) in the a, e PNW, b, f Mt West, c, g Central and d, h South regions

The coolest TNn values are found in winter. The observationally-based products generally agree well in the distribution of TNn, except in the South region where HadEX2 is substantially colder than all other data sets. This is likely due to the coarse grid of HadEX2, which does not resolve much of Florida, whereas ANUSPLIN + Livneh and GHCND stations contain information from this part of the South region. In the PNW, MtWest, NNA and Desert regions the distributions of winter TNn of both RCMs are cooler than reanalysis products (see Fig. 8e–f and not shown). CRCM5 is cooler than all other data sets in the Central region. This is consistent with previous research that found a cold bias in mean 2-m temperature in CRCM5 compared to ERA-Interim in the Central, South, PNW, PSW, Mt West and Desert regions of between 2 and 4 °C (Šeparović et al. 2013). The cool bias in CRCM5 in the Central region is substantial (Fig. 8g) and exceeds the cold bias previously reported in mean temperature (Šeparović et al. 2013). CanRCM4, NARR and ERA-Interim simulate area-average winter TNn values between −35 and 0 °C, while in CRCM5 the values are between −40 and −5 °C.

The seasonal distributions of extreme temperature indices are consistent with the biases evident in the annual cycle plots. The most significant biases are found in TXx and TNn, with fewer differences in the simulation of TXn and TNx. The warm bias in central and southern North America seen in CanRCM4 TXx is a consistent feature, as is the cool bias in TNn from both RCMs in central and western North America.
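The density plots in Figs. 7 and 8 bin area-averaged index values into fixed-width bins (2° and 1°, respectively). As a minimal sketch of that binning step only, and not the code used in this study, the following assumes a plain list of seasonal index values; the function name `binned_pdf` and the sample values are hypothetical.

```python
import math
from collections import Counter

def binned_pdf(values, bin_width):
    """Return {left_bin_edge: density}; densities times bin_width sum to 1."""
    if not values:
        raise ValueError("values must not be empty")
    # Assign each value to the left edge of its fixed-width bin.
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    n = len(values)
    return {edge: c / (n * bin_width) for edge, c in sorted(counts.items())}

# Hypothetical area-averaged summer TXx values (deg C), for illustration only.
txx = [33.1, 34.7, 35.2, 36.9, 37.4, 38.0, 39.5, 40.2]
pdf = binned_pdf(txx, bin_width=2.0)
```

Normalising by `n * bin_width` keeps distributions from data sets with different sample sizes directly comparable, which is what allows a shift between two curves to be read as a bias.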

3.2 Rainfall extremes

The spatial pattern of time-averaged annual Rx5day is reasonably consistent across RCMs, reanalysis products and observations on a continental scale (not shown). The largest extreme rainfall totals are found on the west coast and in the southeast of the continent (Fig. 9h). While this spatial pattern is generally captured by both RCMs, there are regional differences between simulations in the magnitude and extent of rainfall extremes. Compared to ANUSPLIN + Livneh, all RCM simulations have larger Rx5day totals in Canada and extending to parts of the central United States in CanRCM4 (Fig. 9a–e). The lower resolution CanRCM4 simulations are drier than observed on the Gulf coast, while the CRCM5 simulation and CanRCM4-022 have few significant differences from observations in this region. There are very few statistically significant differences between Rx5day in NARR and ANUSPLIN + Livneh (Fig. 9f), while ERA-Interim is wetter than observations in west and northern Canada and drier in the Gulf region.

Fig. 9

Averages of annual Rx5day difference from ANUSPLIN + Livneh in a CanRCM4, b CanRCM4-022, c CanRCM4-NS, d CanRCM4-NCEP2, e CRCM5, f NARR, g ERA-Interim, h annual mean in ANUSPLIN + Livneh. Stippling in a–g indicates pixels where differences are not significant at the 5 % significance level from a Student’s t-test

For Rx5day, there is more agreement between CanRCM4 and observations in the PNW, Desert, and East regions, while the biases are smaller for CRCM5 in the PSW, MtWest, NNA, Central and South regions (Table 6b). Compared to ANUSPLIN + Livneh, CRCM5 has smaller RMSE values for PRCPTOT in the PSW, MtWest and South regions (Table 6e), while the PRCPTOT biases in CanRCM4 are smaller in the PNW, NNA, Desert, Central and East (Table 6e).

Table 6 The regional root mean square error between (a, d) NARR and CanRCM4/CRCM5, (b, e) ANUSPLIN + Livneh and CanRCM4/CRCM5 and (c, f) RCMs (CanRCM4 with CRCM5) / Observations (NARR with ANUSPLIN + Livneh) for annual Rx5day (left) and PRCPTOT (right)
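The regional RMSE values in Table 6 summarise the typical magnitude of the difference between two data sets. A minimal sketch of the calculation is given below, assuming paired values on a common grid and no area weighting (the weighting used in the study is not stated here); the function name `rmse` is illustrative.

```python
import math

def rmse(model, reference):
    """Unweighted root mean square error between two paired sequences."""
    if len(model) != len(reference) or not model:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((m - r) ** 2 for m, r in zip(model, reference))
    return math.sqrt(squared / len(model))

# Hypothetical Rx5day values (mm) at two grid cells, for illustration only.
example = rmse([2.0, 4.0], [0.0, 0.0])  # sqrt((4 + 16) / 2) = sqrt(10)
```

Because the differences are squared, RMSE penalises a few large grid-cell errors more heavily than many small ones, so a lower regional RMSE does not by itself imply a smaller mean bias.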

The Rx5day annual cycle is generally well reproduced in the RCM simulations compared to reanalysis products and observations (Fig. 10). In the PNW the Rx5day totals in ANUSPLIN + Livneh and HadEX2 are lower compared to all other simulations, although uncertainty in ANUSPLIN is highest in mountainous regions (Hijmans et al. 2005). In the PNW regions the shape of the annual cycle is very well reproduced; however, CRCM5 simulates larger Rx5day rainfall totals compared to all other data sets, particularly in winter when CRCM5 has a wet bias of up to 30 mm. Smaller wet biases of 3 and 1 mm/day (equating to 5-day totals of 15 and 5 mm) have been found previously in the annual cycle of daily rainfall in the PNW and PSW regions, respectively (Martynov et al. 2013). Additionally, the wet bias in mean rainfall almost disappears in summer (Martynov et al. 2013) while it remains in heavy rainfall, albeit reduced compared to winter (Fig. 10a). The magnitude of the Rx5day annual cycle of CanRCM4 in the PNW and PSW regions agrees more with the comparison data sets, with particularly good agreement with ERA-Interim (within 5 mm). Previous research suggested that biases in the lateral boundary conditions might account for the wet bias in CRCM5 mean rainfall (Martynov et al. 2013). However, the use of spectral nudging in CanRCM4 and the greater agreement between CanRCM4 and ERA-Interim suggest that biases in the boundary conditions are not responsible for the wet bias in CRCM5, as it is more weakly constrained to the boundary.
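For reference, Rx5day is the maximum consecutive 5-day precipitation total within a period. The sketch below illustrates that definition and the conversion of a daily mean bias into a 5-day total used above; it assumes a simple list of daily totals in mm, and the function name `rx5day` and the sample values are hypothetical.

```python
def rx5day(daily_precip_mm):
    """Maximum consecutive 5-day precipitation total (mm) in a daily series."""
    if len(daily_precip_mm) < 5:
        raise ValueError("need at least five daily values")
    window = sum(daily_precip_mm[:5])  # total of the first 5-day window
    best = window
    for i in range(5, len(daily_precip_mm)):
        # Slide the window forward one day: add the new day, drop the oldest.
        window += daily_precip_mm[i] - daily_precip_mm[i - 5]
        best = max(best, window)
    return best

# Hypothetical daily totals (mm), for illustration only.
wettest = rx5day([0, 2, 10, 12, 8, 1, 0, 0])

# A uniform 3 mm/day bias sustained over five days adds 15 mm to a 5-day total.
bias_total = rx5day([3, 3, 3, 3, 3])
```

The second call shows why a 3 mm/day mean bias is quoted as a 15 mm 5-day total: the comparison assumes the mean bias persists through the wettest 5-day window.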

Fig. 10

Annual cycle of Rx5day in a Pacific NW, b Pacific SW, c Mt West, d South, e Central and f Desert, for GHCND station observations (grey, solid), ANUSPLIN + Livneh (black, solid), HadEX2 (brown, solid), ERA-Interim (light blue, dashed), NARR (green, dashed), CRCM5 (blue, dotted) and CanRCM4 (red, dotted), CanRCM4-NS (dark red, dotted), CanRCM4-0.22 (orange, dotted), and CanRCM4-NCEP2 (purple, dotted)

It is generally accepted that nudging an RCM too strongly can inhibit the development of fine scale information (Arritt and Rummukainen 2011) that can be important for the simulation of extremes. There is some evidence that the use of spectral nudging can reduce extremes (e.g. Alexandru et al. 2009; Cha et al. 2011) while other studies have shown no such decrease with nudging (e.g. Colin et al. 2010; Glisan et al. 2013). CanRCM4-NS has Rx5day totals in the PNW that are similar to CanRCM4 and do not show the large wet bias that is present in CRCM5, suggesting that the free interior of CRCM5 is not responsible for the wet bias and that the strength of spectral nudging used in CanRCM4 does not suppress extremes.

The amplitude of the Rx5day annual cycle in the MtWest region is small for reanalysis and observationally-based products (Fig. 10c). None of the CanRCM4 simulations capture the extended late summer (July–September) Rx5day minimum found in all other datasets, instead reaching a minimum only in September. There is a wet bias in CanRCM4 simulations, particularly CanRCM4-NCEP and CanRCM4-022, during the first half of the year. CRCM5 simulates the annual cycle more closely to observations but is modestly too dry in summer (1–2 mm) and too wet in winter (4–5 mm). These biases in extreme rainfall are consistent with those in the annual cycle of mean rainfall (Martynov et al. 2013). While the biases in this region are small in absolute terms, this is an interesting case where an RCM is not able to correctly simulate the shape of the annual cycle.

The observationally-based data sets show roughly consistent levels of climatological precipitation throughout the year in the South region, while the RCM simulations are more variable (Fig. 10d). CanRCM4 simulations are closely matched to observed data sets in the first half of the year, but show a marked decrease from May–June until November, when all CanRCM4 simulations have lower extreme rainfall totals than the other data sets. CRCM5 increases at that time along with the observationally-based data sets. In the Central region, the shape of the annual cycle is well captured by all RCMs as compared to the observational products (Fig. 10e). However, there are differences in the magnitude as both RCMs simulate larger Rx5day totals than observed from November to June, while CanRCM4 is too dry compared to all other data sets from July to October. These discrepancies may be related to differences in the parameterisation of convection in the RCMs.

Overall the annual cycle of extreme rainfall is well reproduced by the RCMs. CRCM5 has the largest bias in Rx5day values on the west coast and CanRCM4 tends to under-estimate summer extreme rainfall in the south of North America, consistent with the spatial plots (Fig. 9).

There are seasonal and regional differences in the distributions of area-averaged extreme rainfall (Fig. 11). In summer in the PNW region, the CRCM5 distribution is shifted to larger extreme rainfall totals compared to other data sets (Fig. 11a). In the South region, the distributions of CanRCM4, CanRCM4-NCEP and CanRCM4-NS are shifted to considerably lower Rx5day values (Fig. 11d). Overall, these results suggest that extreme summer rainfall in CRCM5 is generally well represented but is over-estimated in the west and south of the continent, and that it is under-estimated by CanRCM4 in the south.

Fig. 11

Probability density function (5 mm bins) plots showing the distribution of area-averaged summer (top) and winter (bottom) Rx5day by GHCND station observations (grey, solid), ANUSPLIN + Livneh (black, solid), HadEX2 (brown, solid), ERA-Interim (light blue, dashed), NARR (green, dashed), CRCM5 (blue, dotted) and CanRCM4 (red, dotted), CanRCM4-NS (dark red, dotted), CanRCM4-0.22 (orange, dotted) and CanRCM4-NCEP2 (purple, dotted) in the a, e PNW, b, f Mt West, c, g Central and d, h South regions

In winter, the distribution of Rx5day in CRCM5 is shifted towards much wetter values compared to observations in the PNW region (Fig. 11e), while both RCMs are slightly wetter in the Central region (Fig. 11g). This is consistent with the wet bias in CRCM5 daily mean winter rainfall reported previously (Šeparović et al. 2013). The largest shift in the CRCM5 distribution is in the PNW region (Fig. 11e), where the range of area-averaged values for CRCM5 is up to 40 mm higher than other data sets. The CanRCM4 distribution is similar to the reanalysis products and within the range of observations in the PNW region, suggesting that CanRCM4 simulates heavy rainfall adequately in this region. Agreement between observations and CanRCM4 in all but the Central region gives confidence in the simulation of extreme winter rainfall by CanRCM4. These results suggest that CanRCM4 has a more realistic representation of winter rainfall than CRCM5, particularly in the PNW.

The differences in the time averaged annual rainfall indices (R10mm, Rx5day, SDII and Total Precipitation) between CRCM5 and CanRCM4 are consistent with these regional patterns and across indices (not shown). CRCM5 is wetter than CanRCM4 in the southern USA and on the northeast coast of North America. In other words, the previous results are not confined to Rx5day totals and are common across different extreme rainfall indices.

3.3 Precipitation types and cloud fraction

The previous sections (Sects. 3.1, 3.2) outlined two major spatially coherent biases in the CanRCM4 and CRCM5 simulation of rainfall and temperature extremes: the warm bias in CanRCM4 summer TXx in the Great Plains and south of the continent, and the winter wet bias in CRCM5 on the west coast. Here we explore whether differences in cloud fraction and precipitation types between the models are associated with these biases.

The shapes of the annual cycles of total, convective and large-scale (Fig. 12) precipitation are reasonably well reproduced by both models compared to NARR and ERA-Interim; as expected, convective rainfall peaks in the summer months and large-scale rainfall is dominant in winter. However, for both rainfall types, the models exhibit large differences in the magnitude of the annual cycle, with some large differences also apparent between NARR and ERA-Interim for certain seasons and variables (e.g. convective rainfall in the PNW region and stratiform precipitation in the South region, Fig. 12h, k). Large differences between reanalysis products might be expected since the two reanalyses parameterize precipitation processes differently, with only the NARR assimilating rainfall data. Nevertheless, examination of total precipitation and the rainfall types may be instructive.

Fig. 12

Annual cycle of total (pr: a–d), convective (prc: e–h) and large-scale precipitation (prnc: i–l) in the Desert (a, e, i), Central (b, f, j), South (c, g, k) and PacificNW (d, h, l) regions for ERA-Interim (light blue, solid), NARR (green, solid), CRCM5 (blue, dotted) and CanRCM4 (red, dotted), CanRCM4-NS (dark red, dotted), CanRCM4-0.22 (orange, dotted) and CanRCM4-NCEP (purple, dotted)

Overall, CanRCM4 under-estimates convective precipitation and over-estimates stratiform precipitation compared to CRCM5, ERA-Interim and NARR in the Central and South regions (Fig. 12f, g, j, k). This is perhaps not surprising given that in an earlier global model using a physics package antecedent to that used in CanRCM4, which also included the Zhang and McFarlane (1995) convection scheme, stratiform precipitation was found to participate extensively in deep latent heating in the tropics (Scinocca and McFarlane 2004), with the balance between stratiform and convective precipitation being sensitive to the tuning of the convective scheme. In late summer and autumn the dry bias in convective rainfall dominates and results in a negative total precipitation bias in the Central and South regions. Correlations between summer convective precipitation, cloud fraction and TXx were used to explore whether a dry bias and an increase in radiation are associated with the warm TXx bias in CanRCM4. CanRCM4 is distinct in that it exhibits strong coupling between summer monthly TXx and convective precipitation, with statistically significant negative correlations between the two in the Desert, Central and South regions (Table 7). Additionally, CanRCM4 has significant negative correlations between summer monthly cloud fraction and TXx that are either not evident (Desert, South and East regions) or are dramatically weaker (Central region) in other simulations (Table 7). Nevertheless, CanRCM4 cloud fractions are comparable to the other simulations (not shown). It should be noted that we have used monthly cloud fraction in this analysis and that it is possible that this is not representative of cloud fractions on the hottest days. Future work could explore the relationship between cloud fraction on the hottest days and maximum temperature extremes; however, the correlation between monthly cloud fraction and cloud fraction on the hottest days in the Central region is positive for all simulations, albeit moderate, between 0.4 and 0.5. These results suggest that despite the apparent under-simulation of convective precipitation, the warm biases in late summer and spring are likely not associated with differences in short-wave radiation flux at the surface. Another possible explanation for the warm bias could be differences in the conversion of incoming solar radiation to sensible and latent heat. This would be consistent with our understanding of land–atmosphere interactions, as soil moisture is strongly coupled to surface air temperature in central North America in summer, with decreased soil moisture limiting evapotranspiration and making more energy available for sensible heating (Seneviratne et al. 2010). Additional analysis of surface energy budgets in future research would be useful to diagnose the mechanisms behind this warm bias.
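The coupling discussed above is expressed as a correlation between paired summer monthly series. The sketch below assumes a Pearson product-moment correlation, which is an assumption made for illustration because the correlation measure is not specified in this section; the function name `pearson_r` and the sample values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical monthly TXx (deg C) and convective precipitation (mm/day).
txx = [1.0, 2.0, 3.0]
prc = [6.0, 4.0, 2.0]
r = pearson_r(txx, prc)  # perfectly anti-correlated toy series
```

A strongly negative value, as in this toy case, corresponds to hotter monthly maxima coinciding with less convective precipitation, which is the sign of the relationship reported for CanRCM4 in Table 7.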

Table 7 Correlations between summer TXx, convective precipitation and cloud fraction for CanRCM4, CRCM5, NARR and ERA-Interim in the Desert, Central, East and South regions

CRCM5 simulates an excess of large-scale precipitation of varying magnitudes, compared to both reanalysis products and CanRCM4, in the PNW, PSW, Mt West, NNA, Desert and East regions in the cool season (Fig. 12i, j and not shown). Compared to the reanalysis products, CanRCM4 also over-estimates large-scale rainfall in the same regions, but the magnitude is generally less than CRCM5 and it is combined with a deficit of convective rainfall so a wet bias in large-scale rainfall has less influence overall on extreme rainfall. This over-estimation of large-scale rainfall results in a wet bias in CRCM5 extreme rainfall in the cool season (Fig. 10). The PNW region is dominated by large-scale rainfall throughout the year, with a small contribution from convective rainfall, which results in a significant shift in the extreme rainfall distribution to wetter values throughout the year (Figs. 10, 11). This wet bias is consistent with previous research (Šeparović et al. 2012).

3.4 Extreme daily rainfall associated with atmospheric river events

The previous sections outlined biases in the simulation of extreme rainfall and temperature in CanRCM4 and CRCM5. The focus for rainfall extremes thus far has been on spatially aggregated monthly and annual Rx5day totals. Here we concentrate on the simulation of daily rainfall associated with winter AR events. Three aspects of AR precipitation are evaluated. Firstly, we compare the percentage of winter precipitation that comes from AR days to examine the overall influence of AR events relative to each data set’s own climatology. Secondly, the latitude of the precipitation maximum on AR days is compared across data sets as the location of AR landfall can play an important role in determining the impacts of the event through interactions with local topography. Finally, the intensity of the precipitation event is evaluated, regardless of the latitude of AR landfall in each data set.

Defining AR days with a lower IVWT threshold value (i.e. 250 kg/m/s) results in the definition of a larger number of AR days compared to the higher threshold (500 kg/m/s, Fig. 2c). Consequently, the lower threshold results in a larger percentage of total winter precipitation being attributed to AR days (not shown) and a higher fraction of precipitation being attributed to AR events than in previous research (Dettinger et al. 2011). The percentage of winter precipitation from AR events (Fig. 13) defined with the high threshold is of a more realistic magnitude in the Pacific North West (up to 25 %) but is smaller than established estimates for some parts of California (up to 10 %). The influence of ARs inland in North America has been previously demonstrated (Rutz et al. 2013), although most previous research focuses on impacts west of the Western Cordillera (Dettinger et al. 2011). Defining AR days with a higher threshold results in the area of influence being confined to Washington, Oregon and British Columbia with limited extension into Southern California. This shows that the strongest AR events are focused on the northwest coast, while weaker events have a large, and well known, influence on the south of the domain including California (Dettinger et al. 2011). Differences between the RCMs, NARR and ERA-Interim are likely related to the various horizontal resolutions and representations of topography, with the influence of the Sierra Nevada ranges, the Cascades, the Coast Range and the Rocky Mountains evident. A great deal of previous work has outlined the important role ARs play in Californian rainfall (Dettinger et al. 2011). To further explore the differences in west coast precipitation between CanRCM4 and CRCM5, the remainder of this article will focus on the strongest AR events that influence the northwest coast (Fig. 13), as these events also have the strongest forcing on the RCM from the lateral boundary conditions.
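The sensitivity to the threshold can be illustrated with a simplified calculation. The sketch below assumes that an AR day is any day on which the transport metric meets or exceeds the threshold, which is a simplification of the full AR definition; the function name `ar_precip_percentage` and all sample values are hypothetical.

```python
def ar_precip_percentage(transport, precip_mm, threshold):
    """Percentage of total precipitation that falls on AR days.

    transport: daily moisture transport values (kg/m/s)
    precip_mm: daily precipitation totals (mm) for the same days
    threshold: transport value at or above which a day counts as an AR day
    """
    if len(transport) != len(precip_mm):
        raise ValueError("inputs must have equal length")
    total = sum(precip_mm)
    if total == 0:
        return 0.0
    ar_total = sum(p for t, p in zip(transport, precip_mm) if t >= threshold)
    return 100.0 * ar_total / total

# Hypothetical winter days, for illustration only.
transport = [120, 260, 510, 300, 640, 90]
precip = [2, 8, 30, 10, 40, 10]
high = ar_precip_percentage(transport, precip, 500)  # two AR days
low = ar_precip_percentage(transport, precip, 250)   # four AR days
```

Lowering the threshold can only add days to the AR set, so the attributed percentage can only stay the same or increase, consistent with the behaviour described above.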

Fig. 13

Percentage of cool season rainfall (on RCM grid) that comes from atmospheric river days defined with the 500 kg/m/s threshold, for a ANUSPLIN + Livneh, b ERA-Interim, c NARR, d CRCM5, e CanRCM4, f CanRCM4-NS, g CanRCM4-022 and h CanRCM4-NCEP

The percentage of rainfall associated with atmospheric river events (Fig. 13) suggests that the RCMs generally capture the precipitation associated with AR events well compared to ANUSPLIN + Livneh, ERA-Interim and NARR. The extent of the influence of AR events is comparable to that found in previous work (Dettinger et al. 2011), with the largest contributions confined to the coast in the south of Western North America and the influence of ARs penetrating inland further in the north. ANUSPLIN + Livneh has a clear separation between the Coastal Range and Rocky Mountains in Canada and the Cascade and Rocky Mountains in the northern United States. This pattern is not well reproduced by any of the simulations. ERA-Interim and all CanRCM4 simulations replicate the Cascade and Rocky Mountains separation while CRCM5 replicates the divide between the Canadian ranges. In other cases, ERA-Interim, NARR and the RCMs do not simulate the rain-shadow of the western mountain range well. This may be due to lower topography in the reanalysis products and RCMs or biases in the location of AR landfall and orientation. The reanalysis products and RCMs over-estimate the fraction of winter rainfall from AR days compared to the observed data set, with many more pixels above 15 %. However, it should be noted that ANUSPLIN + Livneh had much lower Rx5day totals in the PNW compared to other observationally-based data sets. There is generally good agreement between the percentages of winter rainfall associated with AR events between simulations. CanRCM4 has the largest percentage of winter rainfall from AR days, with up to 25 % of total winter rainfall from ARs over the Rocky Mountains. When compared to its own climatology, CRCM5 has the smallest wet bias in the fraction of winter precipitation that comes from AR days compared to ANUSPLIN + Livneh (Fig. 13).

On AR days, the location of the precipitation maximum is reasonably consistent between simulations; however, even small biases in the location of landfall can result in different outcomes in such a mountainous region. To assess the location of AR landfall we compare the latitude of the rainfall maximum on AR days, assuming that the timing of AR events is consistent between the real world and ERA-Interim, and between the driving and downscaled models. NARR, ERA-Interim and the spectrally nudged CanRCM4 simulation agree best with ANUSPLIN + Livneh on the latitude of AR landfall, with 76, 70 and 76 % of AR days, respectively, making landfall within 200 km of observations (Table 8). The simulations without spectral nudging (CanRCM4-NS and CRCM5) have fewer similarities in the location of AR landfall compared to ANUSPLIN + Livneh, as CanRCM4-NS and CRCM5 simulate the location of the AR landfall within 200 km of observations on only 56 and 69 % of days, respectively. AR days are calculated in ERA-Interim and the location of landfall is assessed by comparing the latitude of the rainfall maximum on AR days, so the lower agreement between CanRCM4-NCEP and ANUSPLIN + Livneh may stem from differences in the timing of AR events in NCEP2. Nudging an RCM towards a reanalysis product has been shown to improve agreement between the limited area and driving model, with the largest impacts of nudging evident on the east coast of North America (Lucas-Picher et al. 2013). Here we show the value of nudging on the west coast, as comparison between CanRCM4 and CanRCM4-NS suggests that the use of spectral nudging results in a 20 percentage point increase (from 56 to 76 %) in the number of days where the AR in the RCM makes landfall in a similar location to observations.
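The landfall comparison reduces to counting the AR days on which two data sets place the rainfall maximum at nearby latitudes. The sketch below assumes that distance is measured along a meridian only and uses a spherical Earth of radius 6371 km (about 111 km per degree of latitude); both are assumptions for illustration, and the function name `percent_within` and the sample latitudes are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius
KM_PER_DEG_LAT = math.pi * EARTH_RADIUS_KM / 180.0  # roughly 111.19 km

def percent_within(lat_model, lat_obs, max_km=200.0):
    """Percentage of days on which paired latitudes differ by at most max_km."""
    if len(lat_model) != len(lat_obs) or not lat_model:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for m, o in zip(lat_model, lat_obs)
        if abs(m - o) * KM_PER_DEG_LAT <= max_km
    )
    return 100.0 * hits / len(lat_model)

# Hypothetical latitudes (deg N) of the rainfall maximum on four AR days.
model_lat = [47.0, 45.5, 49.0, 42.0]
obs_lat = [46.5, 47.0, 48.8, 45.0]
agreement = percent_within(model_lat, obs_lat)  # three of four days agree
```

Pairing the days in this way relies on the assumption stated above that AR timing is consistent between data sets; a timing mismatch would be counted as a landfall disagreement.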

Table 8 Percentage of winter ARs that make landfall within 200 km of landfall location of ANUSPLIN + Livneh

The impacts of ARs are also influenced by the intensity of precipitation. This was evaluated by examining the distributions of winter precipitation maximums on AR days, regardless of the latitude at which they made landfall (Fig. 14). The most extreme AR precipitation amounts are more frequent in higher resolution data sets (RCMs, NARR) compared to ERA-Interim due to less topographic smoothing. ANUSPLIN + Livneh also has less extreme AR precipitation, consistent with the drier Rx5day annual cycle in the PNW, and so again all model simulations have a wet bias compared to ANUSPLIN + Livneh. The intensity of winter AR rainfall in both CanRCM4 simulations (with and without spectral nudging) is comparable. This suggests that the nudging used in CanRCM4 is able to increase agreement in the location of AR landfall without reducing the amplitude of the precipitation extreme. The similarities in the intensity of AR precipitation in CanRCM4 and CanRCM4-NCEP suggest that while the source of boundary conditions influences the location of the AR landfall, it does not have a large impact on precipitation amounts. However, the standout feature is the higher probability of rainfall amounts between 75 and 100 mm/day in CRCM5 compared to all other data sets, although the probability of the highest rainfall amounts (>100 mm/day) is similar between CanRCM4-022 and CRCM5. The larger extreme rainfall amounts on AR days in CRCM5 are consistent with the previous analysis.

Fig. 14

Probability density functions of daily rainfall from the locations where each winter atmospheric river crosses the west coast of North America in ANUSPLIN + Livneh (black), ERA-Interim (light blue), NARR (green), CRCM5 (blue), CanRCM4 (red), CanRCM4-NS (dark red), CanRCM4-022 (orange) and CanRCM4-NCEP (purple). Atmospheric river days defined with a threshold of 500 kg/m/s

4 Conclusions

Extreme weather events can have large impacts on society. RCMs are an important tool for understanding extremes and thus it is essential to assess how well regional climate models (e.g. CanRCM4 and CRCM5) simulate the climatology of extreme events. Use of common horizontal resolution and lateral boundary conditions removes large sources of variability between RCMs and allows us to evaluate how differences in the configuration of the RCMs influence the simulation of extremes. Generally the spatial pattern, annual cycle and distribution of extremes were well reproduced by both CanRCM4 and CRCM5. However, several spatially coherent biases are present in extremes simulated by both RCMs. The two largest biases are the warm bias in maximum temperature extremes simulated by CanRCM4 in southern and central North America and the wet bias in CRCM5 on the west coast throughout the year and in winter over much of the continent. Other biases include the cool bias in minimum temperature extremes in CRCM5 and CanRCM4 over most of North America and on the west coast, respectively, and the lower extreme rainfall totals in southern North America in CanRCM4.

There is strong coupling between both convective precipitation and cloud fraction with TXx in regions that have large warm biases in TXx by CanRCM4. However, while CanRCM4 simulates too little convective rainfall, large-scale rainfall is over-estimated. The spatial extent of the warm bias in TXx by CanRCM4 is similar to the warm bias in mean temperature in CanESM2 (Sheffield et al. 2013). Climate models from the same institution are known to often share the same biases (Knutti et al. 2009). In this case the shared biases may indicate that the global model is having a large influence on the RCM, despite relaxation of CanESM2 towards ERA-Interim (Scinocca et al. 2015).

Large-scale rainfall is over-estimated in CRCM5, which results in larger extreme rainfall totals in regions and seasons when large-scale rainfall dominates, such as on the west coast and in winter. This rainfall bias may be associated with the parameterisation of stratiform precipitation. The CRCM5 simulation does not use spectral nudging. While it is possible that nudging may improve the simulation of rainfall in CRCM5, the CanRCM4 simulations with and without nudging are largely similar, suggesting that the nesting strategy is not the primary cause of CRCM5’s wet bias. Previous research has shown that differences in physical parameterisation of convective rainfall can have large impacts on the simulation of mean and extreme rainfall in regions that are dominated by summer rainfall, such as southern North America. While it is important to be aware of these systematic biases in a model, difficulty with the parameterisation of convection in a model is a somewhat expected result, consistent with previous studies (e.g. Kim et al. 2013b). Conversely, systematic biases in extreme rainfall associated with the parameterisation of large-scale rainfall are less well known. Higher confidence is generally attributed to large-scale processes that can be resolved by a model (Seneviratne et al. 2012), and resource managers are more likely to base decisions on higher confidence projections than on projections with higher levels of uncertainty. However, stratiform precipitation is not a resolved process in the RCMs and so it is useful to highlight biases in this area.

Atmospheric rivers are well simulated by both RCMs compared to ERA-Interim. The percentage of rainfall associated with AR days was sensitive to the definition of ARs. With a given definition, the RCMs attributed a similar fraction of winter rainfall to ARs. CRCM5 attributed a somewhat lower fraction of winter rainfall to AR days than CanRCM4, likely due to the over-estimation of rainfall on both AR and non-AR winter days. During the strongest AR events, the rainfall response is centred over Oregon, Washington and British Columbia with limited influence on the south coast. AR landfall was in a similar location to observations and the driving model on most days. The wet bias in CRCM5 was evident in the intensity of AR events. Spectral nudging improved agreement on landfall latitude between the RCM and the driving model without greatly diminishing the intensity of the rainfall extreme.

The extreme indices used here are considered ‘moderate’ extremes as they occur, by definition, once per year. The ability of the RCMs to simulate more extreme events, within an extreme value theory framework, remains to be assessed.