Introduction

The extratropical circulation in the Southern Hemisphere (SH) is characterized by a strong storm track related to tracks of cyclones and anticyclones1,2. The intensity of the storm track, hereafter the storminess, is tightly connected to surface weather in the SH3,4 Understanding how storminess will change under anthropogenic forcing is important for mitigating the impacts of climate change2. Climate models project that the SH storm track will intensify by the end of the 21st century under climate change2,5,6,7,8,9, bringing increased precipitation10 and stronger surface winds11.

Recent work has shown that SH storminess has increased significantly in the satellite era (1979 to present) in reanalysis data9,12. However, the trends in reanalysis data were 2–3 times larger than the multi-model mean trends from models participating in the Coupled Model Intercomparison Project Phase 5 and Phase 6 (CMIP5 and CMIP6)13,14. Thus, recent work concluded that climate models significantly underestimate the storminess trend in the reanalysis datasets in the satellite era. This reanalysis-model trend discrepancy in the SH winter storm track calls into question the ability of climate models to predict future weather in the SH.

A discrepancy between the climate model and observed trends can have multiple causes that can be categorized into three factors15,16: (I) The observations are in error, (II) The observation-model comparison is flawed, (III) The models are deficient.

For (I), the trends can differ substantially across observational datasets17 and lead to observational uncertainty. The use of up-to-date observational data can be important in reconciling observation-model trend discrepancies18,19. For storminess, the observed trend is quantified using reanalysis datasets, which involve uncertainties arising from data assimilation techniques, physical parameterizations, and evolution of observational systems20,21. The reanalysis uncertainty can be particularly important in the SH where ground-based observations are limited22,23. For example, recent work showed that trends in SH winter weather-scale temperature variance differ by a factor of 5 across reanalysis datasets24.

For (II), there are two important aspects to consider. First, reanalysis trends involve a single realization of internal variability whereas model simulations reflect a distribution of realizations. Thus, it is important to properly sample the internal variability by using a large number of model simulations25,26. Second, a like-for-like comparison whereby the observations and climate models are compared with the same temporal and spatial sampling has been important for reconciling previous discrepancies27,28. A like-for-like comparison is especially important for storm tracks, which sample specific time and spatial scales29.

For (III), the models can be deficient in either the forced response or the internal variability because they are incapable of simulating the physical mechanism responsible for the observed trend. For example, CMIP6 models fail to simulate recent sea surface temperature (SST) trends in the tropical Pacific and Southern Ocean30,31,32,33,34. The tropical SST trends in CMIP6 models are characterized by an El Nino-like trend in the tropical Pacific, as opposed to a La Nina-like trend in the observations, with the observed trends lying at the edge or outside of the model trend distribution30,31,32. The observed cooling trend in the Southern Ocean is also not well captured by CMIP6 models30,33 and it has been suggested that this is also related to the SST trend difference in the tropical Pacific33,35. Previous work concluded that coupled models exhibit a systematic bias in the representation of SST trends and that differences between observed and modeled trends are very unlikely to occur due to internal variability30.

The reanalysis-model discrepancy in the SH winter storm track trends should be revisited following the three factors described above. (I) It is currently unclear how increasing the number of reanalysis datasets to address reanalysis uncertainty would affect the discrepancy. (II) It is also unknown how expanding the model ensemble size and addressing like-for-like comparison (calculating storminess using the same time and spatial grids) would affect the discrepancy. (III) Lastly, the impact of observation-coupled model SST trend discrepancy on the SH winter storminess trends has not been quantified, although SST trends are related to other large-scale circulation trends in the SH30,36,37.

Here we revisit the reanalysis-model discrepancy in the Southern Hemisphere winter storm track trends with additional reanalyses, expanded model ensembles, and like-for-like calculations. We also quantify the impact of SST trend discrepancy on the storm track trends, including the mechanisms connecting them.

Results

Revisiting the SH winter storminess trend discrepancy

We revisit the SH winter storminess trends in reanalyses and CMIP6 and AMIP6 multi-model ensembles (see Methods and Table S1), quantified using vertically integrated bandpass-filtered eddy kinetic energy (hereafter EKE, see Methods) as in previous work9. To address (I) and (II), we calculate EKE using a larger number of reanalysis data (expanding from 3 to 8) and model ensemble members (expanding from 16 to 26 for CMIP6 and from 13 to 32 for AMIP6, see Methods). We also ensure that EKE is calculated using the same time frequency and spatial grids: daily-mean, 8 pressure levels, and 1.5° by 1.5° horizontal grids (see Methods).

Expanding the number of reanalysis datasets shows the zonal-mean SH storminess trend exhibits large observational uncertainty (Fig. 1a and see Fig. S1 for time series). The reanalysis trend spread is larger than the multi-model ensemble spread, and not all trends are statistically significant. The largest and the smallest trends come from older generation reanalyses (NCEP2 and MERRA1), whereas the newer generation reanalyses are closer together (ERA5 and JRA3Q). The large spread in reanalysis trends is not significantly affected by start and end dates (Fig. S2), and a large spread is also found using a different metric for storminess (e.g., sea-level pressure variance11, Fig S3). Note that storminess climatology does not exhibit similarly large uncertainty, except for NCEP2 which has about 28% weaker climatology than other reanalyses (Fig. S1).

Fig. 1: Revisiting the reanalysis-model storminess trend discrepancy for CMIP6 and AMIP6 ensembles.
figure 1

a Linear trends of SH JJA EKE (40–75°S) in 8 reanalysis datasets (blue colors, 1979–2018) and 26 CMIP6 (1979–2018) and 32 AMIP6 (1979–2014) model simulations (diamonds). Statistically significant trends at the 95% confidence level are filled. The box represents the full spread of reanalysis trends and the 10–90% range of model ensemble trends. The horizontal line inside the box shows the median trend in the model ensemble. b Similar results to (a), but for the South Pacific (40–75°S, 170°E–60°W). Spatial pattern of SH JJA EKE trend for (c) reanalysis mean, (d) CMIP6 and (e) AMIP6 multi-model ensemble mean. Stipples indicate where reanalysis-mean or multi-model ensemble mean trends are significant at the 95% level. The green dashed lines indicate the South Pacific sector (40–75°S, 170°E–60°W).

Six out of eight reanalysis trends, including the newest reanalysis trends, fall outside the 10–90% range of the trend distribution of coupled CMIP6 ensemble which predicts SST (Fig. 1a, Table S2). For five reanalysis trends, no more than one CMIP6 ensemble member simulates a trend as high. Thus, while there is large observational uncertainty, CMIP6 ensemble seems to be underestimating the storminess trends. Although weak, the CMIP6 trend is dominated by greenhouse gas forcing according to the Detection and Attribution Model Intercomparison Project simulations (DAMIP38, see Fig. S4).

Four reanalysis trends fall within the 10–90% range of the trend distribution of prescribed-SST AMIP6 ensemble (Fig. 1a, Table S2). Consistently, comparing the distributions of storminess trends between the two ensembles, the trends in the AMIP6 are significantly greater than those in CMIP6 (p value = 0.01 see Methods), indicating that prescribing observed SST trends significantly increases the storminess trends in the models. Similar results were found in previous work37, where annual-mean atmospheric energy transport trends from transient eddies were greater in the AMIP6 than the CMIP6 ensemble.

The spatial pattern of the storminess trends provides insights into understanding the different trends in CMIP6 and AMIP6 ensembles (Fig. 1c–e). The multi-reanalysis-mean storminess trend is significant across most SH regions including high latitudes of the Indian Ocean, South Pacific, and South Atlantic (Fig. 1c and Fig. S5 for individual reanalysis). However, the CMIP6 ensemble-mean storminess trend in the South Pacific (170°E–60°W) is negligible (Fig. 1d). The AMIP6 ensemble-mean trend better captures the reanalysis trend, especially in the South Pacific, where CMIP6 ensemble shows a near-zero ensemble-mean trend (compare Fig. 1d, e). For more quantitative comparison, storminess trend distributions in the South Pacific are examined for the CMIP6 and AMIP6 ensembles (Fig. 1b). Five reanalysis trends fall outside the 10–90% range of CMIP6 ensemble trend distribution in the South Pacific (Fig. 1b). In contrast, seven reanalysis trends fall within the 10–90% range of AMIP6 ensemble trend distribution (Fig. 1b). Comparing the storminess trend distributions between the two ensembles, the trends in AMIP6 are significantly greater than those in the CMIP6 in the South Pacific (p value < 0.01). Overall, this demonstrates that prescribing observed SST trends significantly increases the South Pacific storminess trends in the models. These results are not sensitive to different years used in calculating trends in CMIP6 and AMIP6, and consistent results are obtained over 1979–2013 (Fig. S6).

To better understand the role of internal variability related to (II) for SH winter storminess trends, we perform similar analyses using the 50-member Community Earth System Model version 2 Large Ensemble (CESM2-LE) simulations (Fig. 2)39,40. We also use the 10-member prescribed (observed) SST CESM2 simulations, namely Global Ocean Global Atmosphere (GOGA) simulations (Fig. 2). Both CESM2-LE and GOGA simulations are forced with the same radiative forcing as CMIP6 and AMIP6 ensembles from 1979 to 2014. Here we focus on trends from 1979 to 2013 (see Fig. S6 for CMIP6 and AMIP6). The CESM2-LE simulations also do not capture the observed SST trends (compare Fig. 3a, b).

Fig. 2: Revisiting the reanalysis-model storminess trend discrepancy using CESM2-LE and GOGA simulations.
figure 2

a–e Similar results to Fig. 1, but using CESM2-LE and GOGA simulations. Note that EKE is defined differently from Fig. 1 and trends are calculated from 1979 to 2013.

Fig. 3: Sea surface temperature trends in CESM2-LE and the impact of pacemaking.
figure 3

a–e Spatial pattern of ensemble-mean JJA SST trends from 1979 to 2013 for (a) CESM2-LE, (b) GOGA, (c) SUM = CESM2-LE + ΔPac + ΔSO, (d) ΔSO = [SOPACE] − [CESM2-LE], and (e) ΔPac = [PacPACE] − [CESM2-LE] simulations. Stipples indicate where ensemble-mean trends are significant at the 95% level. In (d, e), the dashed black lines represent where the SST anomalies are nudged to observation.

For CESM2, EKE is defined from monthly anomalies due to data availability (see Methods). The zonal-mean storminess trend in reanalyses quantified using this definition of EKE also exhibits large observational uncertainty (Fig. 2a). It is also notable that the reanalysis trends are greater (especially the statistically significant ones) compared to Fig. 1a by roughly 4–5 times. This is related to using monthly anomalies instead of 2.5–6 day bandpass-filtered anomalies, which includes a broader spectrum of kinetic energy. Nevertheless, changing the EKE definition does not impact the result that storminess trends are greater in the AMIP6 than CMIP6 ensemble (Fig. S7).

Five reanalysis trends, including the newest reanalyses (ERA5 and JRA3Q), fall outside the 10–90% range of the 50-member CESM2-LE trend distribution (Fig. 2a), similar to CMIP6 ensemble in the zonal mean. The CESM2-LE simulations also show a smaller ensemble-mean storminess trend in the South Pacific compared to reanalyses (compare Fig. 2c, d). Similarly, five reanalysis trends fall outside the 10–90% of the CESM2-LE trend distribution in the South Pacific (Fig. 2b). Thus, CESM2-LE simulations underestimate reanalysis trends similar to CMIP6 ensemble, and internal variability cannot fully account for the reanalysis-model mismatch. We also find similar results for CESM2-LE and GOGA comparison (Fig. 2) as the CMIP6 and AMIP6 comparison (Fig. 1, Figs. S6, S7), where the storminess trends in prescribed SST simulations are significantly greater than those in coupled simulations, particularly in the South Pacific.

The analysis above which accounts for (I) and (II) reveals large uncertainty in reanalysis storminess trends in SH. However, most of the coupled simulations (CMIP6 and CESM2-LE) simulate trends that are smaller than most of the reanalyses, including the newest ones, thus they could be underestimating the reanalysis storminess trends. The simulations with prescribed observed SST (AMIP6 and GOGA), on the other hand, show statistically significantly greater storminess trends than the coupled simulations in the zonal mean and the South Pacific. This indicates that (III) coupled simulation’s deficiency in simulating observed SST trends impacts the prediction of SH storminess trends. In what follows, we show that the important factor impacting the storminess trend underestimation in the coupled simulations is their misrepresentation of the SST trends, particularly in the Southern Ocean and tropical Pacific.

Impact of SST trend discrepancies on storminess trends

SST trend discrepancies can impact the storminess through different mechanisms. Southern Ocean SST trends reflect changes in surface fluxes and equatorward ocean energy transport41. Recent work suggested CMIP6 models underestimate annual-mean SH storminess trends because they do not capture the surface energy flux trends connected to Southern Ocean cooling12. Furthermore, tropical Pacific SST trends likely impact the SH through Rossby wave teleconnections. More specifically, interannually La Nina leads to a stronger South Pacific storminess42,43,44, thus a La-Nina-like SST trend would be expected to strengthen the South Pacific storminess trend.

In order to test the hypothesis that SST trend discrepancies contribute to the underestimation of SH winter storminess trends in coupled simulations, we utilize the Southern Ocean (SOPACE) and tropical Pacific (PacPACE) CESM2 pacemaker simulations (see Methods). The pacemaker simulations nudge the Southern Ocean and tropical east Pacific SST anomalies to observations (Fig. 3d, e). Thus, comparing pacemakers to the free-running CESM2-LE simulations allows us to quantify how SH storminess trends in the coupled simulations would change if they correctly simulated the observed Southern Ocean cooling and La-Nina-like SST trend. Since the SOPACE and PacPACE simulations are forced with the same radiative forcing as CESM2-LE simulations, their ensemble-mean differences with the CESM2-LE simulations (ΔSO and ΔPac) quantify the impact of capturing SST trends in the Southern Ocean and the tropical Pacific, respectively (see Methods).

Impact of Southern Ocean SST trend discrepancy on storminess trends

When CESM2 simulations are forced with historical radiative forcings and SST anomalies are nudged to observations in the Southern Ocean (Fig. 3d), zonal-mean storminess trends increase compared to the CESM2-LE simulations (Fig. 4a). Most of the members in SOPACE simulations exhibit larger trends than the CESM2-LE median trend (compare green box and black dashed line in Fig. 4a). More quantitatively, the SOPACE zonal-mean storminess trends are significantly greater than those from the CESM2-LE simulations (p value = 0.02). The strengthening is noticeable across all longitudes in SH, especially at higher latitudes (compare Figs. 2d, 4c).

Fig. 4: Storminess trends when SST trends are corrected in coupled simulations.
figure 4

a–e Similar results to Fig. 2, but using SOPACE, PacPACE, and SUM simulations. In (a, b), the median trend from the CESM2-LE and GOGA simulations are shown in dashed lines. The reanalysis trends are repeated from Fig. 2.

From an energetic perspective, storminess trends are affected by surface energy flux, top-of-the-atmosphere radiation, and stationary circulation trends (see Eq. 1 in Shaw et al.12). Among these factors, Shaw et al.12 hypothesized the underestimation of annual-mean SH storminess in the CMIP6 models was due to an underestimated surface energy flux trends in CMIP6 models in the SH, which is connected to Southern Ocean cooling trends. The surface energy flux trends reflect ocean heat transport divergence and storage trends45. Thus, if the surface energy flux implies equatorward ocean heat transport, it also strengthens the SH storminess because it steepens the atmospheric equator-to-pole energy gradient (i.e., energy is moved away from the South Pole to the equator by the ocean). This influence of surface energy flux trends on the storminess reflecting ocean heat transport trends can be quantified through the moist static energy budget7,46 with the poleward atmospheric energy transport implied from surface energy flux (FSFC, see Methods).

The poleward atmospheric energy transport (FSFC) induced from surface energy flux shows positive ensemble-mean trends across SH in SOPACE simulations (green, Fig. 5a), which is consistent with strengthening SH storminess, similar to ERA5 (blue, Fig. 5a). In contrast, the ensemble-mean trends in CESM2-LE simulations are negative across SH indicating storminess weakening (black, Fig. 5a). Most of the ensemble members in the SOPACE simulations and ERA5 show positive FSFC trends, whereas most of the members in the CESM2-LE have negative trends (Fig. 5b). Thus, the FSFC trends from the SOPACE simulations are significantly greater than those from CESM2-LE simulations (p value < 0.01, compare green and black in Fig. 5b). Consistent results are found from the CMIP6 ensemble where most of the models simulate negative FSFC trends (black dashed, Fig. 5b).

Fig. 5: Southern Ocean pacemaker improves surface energy flux induced atmospheric energy transport trends.
figure 5

a Linear trends in JJA zonal-mean FSFC from 1979 to 2013 for ERA5 (dark blue), ensemble-mean SOPACE (green) and CESM2-LE (black) simulations. Statistically significant trends at the 95% level are dotted. b Linear trends of SH JJA FSFC (40–75°S) from 1979 to 2013 for ERA5, SOPACE, CESM2-LE, and CMIP6 ensemble (diamonds). Statistically significant trends at the 95% confidence level are filled. The box represents the 10–90% range of model ensemble trends. The horizontal line inside the box shows the median trend in the model ensemble.

Thus, the SOPACE simulations simulate surface energy flux trends consistent with reanalysis trends, which the CESM2-LE simulations struggled to simulate. Consequently, storminess trends are larger in SOPACE than CESM2-LE due to surface energy flux trends that better capture reanalysis trends. The causality is less likely to be the other way around since temperature at the surface is being nudged in the SOPACE simulations.

Note that while surface energy flux and FSFC trends indicate storminess weakening in CESM2-LE and CMIP6, other factors such as top-of-the-atmosphere radiation trends strengthen the storminess (Fig. S8), leading to weak but positive ensemble-mean storminess trends (e.g., Figs. 1, 2).

While nudging SST anomalies in the Southern Ocean significantly strengthens storminess trends in the zonal mean, its impact in the South Pacific is weak. Quantitatively, the SOPACE and CESM2-LE trend distributions in the South Pacific (Fig. 4b) are not significantly different (p value = 0.29). This suggests that other processes are important for storminess trends in the South Pacific.

Impact of tropical Pacific SST trend discrepancy on storminess trends

When CESM2 simulations are forced with historical radiative forcings and SST anomalies are nudged to observations in the tropical east Pacific (Fig. 3e), South Pacific storminess trends increase compared to the CESM2-LE simulations (Fig. 4b, d). Most of the ensemble members in the PacPACE simulations show larger trends than the CESM2-LE simulation median trend (compare red box and black dashed line in 4b). Thus, the South Pacific storminess trends from the PacPACE simulations are significantly greater than those from the CESM2-LE simulations (p value = 0.02). Note that the zonal-mean trend distribution from the PacPACE simulations is not statistically different (p value = 0.35) from the trend distribution from the CESM2-LE simulations (Fig. 4a).

We hypothesized that the La Nina-like SST trend in the tropical Pacific induces a Rossby wave teleconnection trend to the South Pacific, characterized by weaker subtropical jet and strengthened storminess in the South Pacific, similar to what is seen interannually42,43,44. Specifically, during La Nina winters, the colder tropical Pacific temperature leads to weaker subtropical jet directly via thermal wind balance. Weaker subtropical jet then leads to upper tropospheric eddy momentum flux convergence, which induces adiabatic descent to warm the midlatitudes (see Eq. 7 in Seager et al.42). The poleward eddy heat flux in the midlatitudes increases to alleviate the adiabatic warming, strengthening the storminess in the South Pacific42. As such, the subtropical jet weakening can lead to South Pacific storminess strengthening.

The 250-hPa zonal wind and eddy geopotential (deviation from the zonal mean) trends in the reanalyses and PacPACE simulations show a clear La Nina-like teleconnection pattern that is absent in the CESM2-LE simulations (Fig. 6a–c). In particular, all reanalyses and all members in PacPACE simulations have weakening subtropical jet trends across the South Pacific. The subtropical jet trends in PacPACE simulations are significantly different (p value < 0.01) from those in the CESM2-LE simulations where most of the members have strengthening subtropical jet similar to CMIP6 ensemble (dashed box, Fig. 6d). This confirms that PacPACE simulations exhibit stronger South Pacific storminess by capturing La Nina-like teleconnection trends in reanalysis.

Fig. 6: Pacific pacemaker improves La-Nina-like teleconnection trends in the South Pacific.
figure 6

Spatial pattern of 250-hPa zonal wind (colors) and eddy geopotential height (contours) trends for (a) reanalysis mean, (b) PacPACE and (c) CESM2-LE simulation ensemble mean. The positive and negative eddy geopotential height trends are respectively depicted in solid and dashed contours in 0.3 m yr−1 intervals (zero contour is suppressed). Stipples indicate where reanalysis-mean or ensemble-mean trends are significant at the 95% level. The magenta box indicates the South Pacific subtropical jet sector (15–30°S, 170°E–110°W). d Linear trends of South Pacific subtropical jet from 1979 to 2013 for reanalyses (blue colors), PacPACE, CESM2-LE, and CMIP6 ensemble (diamonds). Statistically significant trends at the 95% confidence level are filled. The box represents the full spread of reanalysis trends and the 10–90% range of model ensemble trends. The horizontal line inside the box shows the median trend in the model ensemble.

Combined impact of Southern Ocean and tropical Pacific SST trend discrepancies on storminess trends

To investigate the combined impact of both pacemakers (ΔPac + ΔSO) on the coupled simulations, we create a synthetic large ensemble named SUM with 50 members, which is defined as:

$$\,{\text{SUM}}={\text{CESM2-LE}}\,+{\Delta }_{Pac}+{\Delta }_{SO}$$
(1)

Note that we are adding ensemble-mean impacts (ΔPac + ΔSO) to individual ensemble members of CESM2-LE. This synthetic large ensemble is meant to estimate the results for ensemble simulations that nudge the Southern Ocean and tropical Pacific SST simultaneously. It assumes the ensemble-mean impacts of pacemaker simulations are combined with the forced response and internal variability in the CESM2-LE simulations and that the impacts of pacemakers can be added linearly. A similar approach was taken in Kang et al.33 to create synthetic ensemble simulations using SOPACE simulations.

The SUM ensemble provides valuable insights since it captures the observed SST trend in the ensemble mean (compare Fig. 3b, c). This is due to the remote impacts of the pacemaker simulations on SST trends outside the nudged area (see dashed lines in Fig. 3d, e). The SOPACE simulations affect the SST trend in the Southeast Pacific and around Antarctica33 while the PacPACE simulations reverse the trend in the tropical Pacific and enhance the warming in the Southwest Pacific (Fig. 3e).

The SUM ensemble shows storminess trends are larger than individual pacemaker simulations both in the zonal mean and South Pacific (Fig. 4a, b) and exhibit significant storminess trends across the SH (Fig. 4e). More importantly, the trends in the SUM ensemble are not statistically different from the prescribed-SST GOGA simulations in both zonal mean (p value = 0.41) and South Pacific (p value = 0.26).

Discussion

Our study revisits the reanalysis-model SH winter storminess trend discrepancy and examines factors underlying the discrepancy (observational uncertainty, flawed observation-model comparison, and model deficiency). Expanding the number of reanalysis datasets compared to previous work9 shows that there is significant observational (reanalysis) uncertainty. The results show that simulations with SST prescribed from observations (AMIP) have significantly greater storminess trends, particularly in the South Pacific, than the coupled simulations (CMIP), which do not capture the observed SST trends. Our results further imply that AMIP6 model trends are not significantly different from reanalysis trends, but CMIP6 model trends are still likely discrepant due to SST trend discrepancies. Our conclusion that AMIP6 models capture storm track trends in reanalysis data differs from previous work9 because we used additional reanalysis and model datasets and ensured a like-for-like comparison. This also demonstrates that, among many factors, SST trend discrepancies are an important factor underlying the reanalysis-model SH winter storminess trend discrepancy.

We use the Southern Ocean and tropical Pacific pacemaker simulations to demonstrate the underestimated SH storminess trends in the coupled simulations are connected to SST trend discrepancies through well-understood mechanisms. When the coupled simulation SST trends are corrected across the Southern Ocean, the zonal-mean storminess trends strengthen significantly. The improvement of zonal-mean storminess trends involves simulating surface energy flux trends closer to reanalysis that lead to equatorward ocean energy transport and increased storminess. When the coupled simulation SST trends are corrected across the tropical Pacific, the South Pacific storminess trends strengthen significantly. The improvement of South Pacific storminess trends is a result of capturing a teleconnection trend connecting the La Nina-like SST trend to South Pacific storminess trend, consistent with previous work on interannual timescales42,43,44. Thus, the pacemaker simulations show, when the coupled simulation SST trend discrepancies are corrected, the storminess trends strengthen significantly across the SH and are in better agreement with trends in most reanalyses and the newest reanalyses in particular. This highlights the importance of SST trend discrepancies on the SH winter circulation trends.

Our results show that observational uncertainty is large in the SH circulation trends. Since the SH exhibits a significant observation uncertainty (large spread in reanalyses trends), it is important to use all available reanalysis data as in previous work19,24 until there is good reason to rule out a certain product. The large spread in reanalyses trends, which is comparable to that in the large ensemble simulations, also poses a challenge for reanalysis-model comparison in the SH. Reducing the spread is important and may be helped by looking at direct observations rather than reanalysis products.

The proposed mechanisms connecting SST trends and storminess enhance our understanding of the SH winter storminess trends. Southern Ocean cooling involves equatorward ocean heat transport and storage, which strengthens the SH zonal-mean storminess by changing the equator-to-pole energy gradient. Thus, if the coupled simulations correctly represented Southern Ocean cooling and equatorward ocean heat transport and storage, then storminess trends are stronger12. When the coupled simulations properly represent the tropical SST trends, the subtropical jet weakens, which can affect the barotropic growth rate of storminess making them stronger9.

The pacemaker simulations show that when the SST trend discrepancy is corrected the coupled simulation circulation trend becomes significantly stronger. Thus, it is important to understand the SST trend discrepancy and its underlying mechanisms. For the tropical Pacific, many mechanisms have been proposed31,32,34. For the Southern Ocean, proposed mechanisms involving Antarctic meltwater seem to be important47,48 as well as multidecadal variability49. Understanding the mechanisms underlying the emergent responses is important for having confidence in climate model projections50.

Methods

Storminess trends

We quantify storminess in the SH winter (June–August) using vertically integrated eddy kinetic energy (hereafter EKE), which is defined as

$$EKE=\frac{1}{g}\mathop{\int}\nolimits_{\!{p}_{t}}^{{p}_{s}}{u}^{{\prime} 2}+{v}^{{\prime} 2}dp,$$
(2)

where g is the gravitational acceleration, ps is the surface pressure, pt is the pressure at the highest vertical level (Table S1), and u and v are daily-mean zonal and meridional winds, respectively. Here, the primes denote 2.5–6 day bandpass-filtered anomalies. To produce the bandpass-filtered anomalies, timeseries of u and v with 92 days of SH winter padded with 10 days at both ends are first created. We then apply a first-order Butterworth filter to the time series to obtain 2.5–6 day bandpass-filtered anomalies. We use ps data that has the same time frequency as u and v for most datasets, but monthly-mean ps when high-frequency data is not available. To ensure like-for-like comparison, both reanalysis and model u and v are linearly interpolated onto a common 1.5° × 1.5° grid. Vertical integration is performed using the eight vertical levels available to the CMIP6 models (1000, 850, 700, 500 250, 100, 50, and pt = 10 hPa). While the standard is to multiply 1/2 to the right-hand side of Eq. (2), the factor is not multiplied following Chemke et al.9.

After quantifying storminess each year, the long-term trends are calculated using the least-squares linear regression. The statistical significance of the trend is evaluated as the 95% confidence level using the Mann-Kendall test.

When the like-for-like calculation is not performed and EKE is calculated in unprocessed time and spatial grids in Table S1, the zonal-mean SH winter storminess trends are greater in most reanalyses by about 24% (Fig. S9). This exemplifies the importance of like-for-like calculation.

Atmospheric energy transport from surface energy fluxes

To connect surface energy flux and storminess, we calculate the poleward (in SH) atmospheric energy transport (FSFC) induced from surface energy flux (S, positive upward) trends. They are related as follows7,12,46:

$$\nabla \cdot {f}_{SFC}=S,$$
(3)

where S is the zonal-mean surface energy flux (in W m−2) with the global average removed (defined as positive upward), and \({F}_{SFC}=-2\pi a\cos \phi {f}_{SFC}\) (in PW), represents the poleward atmospheric energy flux induced by surface energy flux gradient at latitude ϕ where a is the Earth’s radius. In FSFC, \(2\pi a\cos \phi\) factor represents zonal integration and a minus sign is multiplied to represent poleward transport in SH. If the ocean heat storage trends are neglected, FSFC also represents equatorward ocean heat transport. The surface energy flux in ERA5 is obtained by subtracting mass-consistent atmospheric total energy flux divergence and energy tendency from the top-of-the-atmosphere radiation51. Note that other reanalyses do not have mass-consistent energy flux datasets available for this calculation.

Similarly, top-of-the-atmosphere radiation and storminess trends can be connected. In Fig. S8, \({F}_{TOA}=-2\pi a\cos \phi {f}_{TOA}\) is the poleward (in SH) atmospheric induced from to-of-atmosphere radiation, where fTOA = RaTOA and RaTOA is the net top-of-atmosphere radiation (positive downward).

Reanalysis datasets

Storminess trends are quantified in 8 reanalysis datasets (observation-based products). The 8 reanalysis datasets consist of the two latest reanalysis datasets from four different reanalysis centers. They are NCEP252, MERRA153, ERA-Interim54, CFSR/CFSv255,56, JRA-5557, MERRA258, ERA559, and JRA3Q60. We focus on the 40 years from 1979 to 2018 as in previous work. MERRA2 starts from 1980 so its trend is calculated from 1980. MERRA1 ends in 2015 (for JJA) so its trend is calculated to 2015. To create daily-mean variables, we use six-hourly instantaneous data, which is the highest frequency common to all reanalysis datasets, although ERA5 and MERRA2 data are available at higher frequencies. The CFSR trend is obtained by merging CFSR (1979–2010) and CFSv2 (2011–2018) datasets.

CMIP6 and AMIP6 model simulations

Storminess trends are quantified in 26 CMIP6 model simulations14 using the historical (1979 to 2014) and SSP5-8.5 (2015 to 2018) scenarios (Table S1). We use the SSP5-8.5 scenario following previous work9,12. Scenario uncertainty is a small contributor since the scenarios begin in 2015. In addition, we quantify storminess in AMIP6 simulations (1979 to 2014) in 32 models (Table S1) with observed SSTs prescribed from 1979 to 2014. We refer to the CMIP6 and AMIP6 model simulations as multi-model ensembles. The difference between the CMIP6 and AMIP6 multi-model ensembles quantifies the impact of discrepancies in SST trends in the CMIP6 models31,32 on storminess trends. We quantify the statistical significance of the difference between trend distributions in the multi-model ensembles using the Mann–Whitney U test61 (hereafter MW test) at the 95% level (p value < 0.05), which is a non-parametric statistical test (similar results are found using Student’s t test). The number of models used in each ensemble is based on the availability of daily-mean zonal and meridional wind data on pressure levels. We use the ‘r1i1p1f1’ ensemble member for all models to equally weight the structural uncertainty across different models.

CESM2 large ensemble and pacemaker simulations

We use the Community Earth System Model version 2 Large Ensemble (CESM2-LE) simulations39,40. The CESM2-LE simulations are an initial condition ensemble with a nominal 1-degree spatial resolution in both atmosphere and ocean. We use the first 50 simulations from this ensemble that are forced with historical radiative forcing and standard biomass burning from 1850 to 2014 consistent with CMIP6 simulations. The SST trends in the CESM2-LE simulations during this period fail to capture the observed SST trends in the tropical Pacific and Southern Ocean30,33 consistent with the CMIP6 multi-model ensemble.

We also use the Southern Ocean pacemaker simulations33 (hereafter called SOPACE) and Pacific pacemaker simulations (hereafter called PacPACE, see https://www.cesm.ucar.edu/working-groups/climate/simulations/cesm2-pacific-pacemakerfor details). The SOPACE and PacPACE simulations have 21 and 10 ensemble members, respectively. The same CMIP6 historical forcing is used for SOPACE (1979–2013) and PacPACE simulations (1880–2014). They have the same horizontal resolution as CESM2-LE. They are fully coupled except in the regions where SST anomalies (relative to observed 1880–2019 climatology) are nudged to observed SST anomalies from ERSSTv562. More specifically, in SOPACE, SST anomalies are nudged to observations poleward of 40°S. In PacPACE, SST anomalies are nudged to observation within a wedge-shaped area of 20°S–20°N from the American coast to the western Pacific. We quantify the impact of pacemaking on the simulated trends as

$$\begin{array}{l}{\Delta }_{SO}=[\,\text{SOPACE}\,]-[\,\text{CESM2-LE}\,],\\ {\Delta }_{Pac}=[\,\text{PacPACE}\,]-[\,\text{CESM2-LE}\,],\end{array}$$
(4)

where the squared brackets denote the ensemble mean33.

Finally, we utilize AMIP-style CESM2 simulations, namely Global Ocean Global Atmosphere (GOGA) simulations, with 10 members. The GOGA simulations are forced with the same CMIP6 historical forcing from 1880 to 2014 and take observed SSTs from ERSSTv5 as boundary conditions. We quantify the impact of SST trend discrepancy by comparing the trend distributions in CESM2-LE and GOGA simulations using the MW test.

For the CESM2 simulations, EKE is calculated using the monthly-mean kinetic energy output following Kang et al.63 due to data availability:

$$EKE=\frac{1}{g}\mathop{\int}\nolimits_{\!{p}_{t}}^{{p}_{s}}\left(\overline{{u}^{2}}+\overline{{v}^{2}}-{\overline{u}}^{2}-{\overline{v}}^{2}\right)dp,$$
(5)

where the \(\overline{{u}^{2}}\) and \(\overline{{v}^{2}}\) are the monthly averages of the square of u and v at each model time step (every 30 min). Since \(\overline{{u}^{{\prime} 2}}=\overline{{u}^{2}}-{\overline{u}}^{2}\) and \(\overline{{v}^{{\prime} 2}}=\overline{{v}^{2}}-{\overline{v}}^{2}\), this method is identical to defining primes in Eq. (2) as deviation from the monthly mean and using model time step (every 30 min) data. As such, this EKE represents the kinetic energy due to sub-monthly variations. For most reanalysis datasets except ERA5, \(\overline{{u}^{2}}\) and \(\overline{{v}^{2}}\) at model time step is not provided and it has to be calculated from six-hourly data. However, for ERA5, we find that the difference between calculating \(\overline{{u}^{2}}+\overline{{v}^{2}}\) at model time step versus six-hourly time step is negligible (about 0.1%, Fig. S10). We extract the 8 CMIP6 pressure levels from all reanalysis datasets. The CESM2 data are interpolated from model levels to the 8 pressure levels. Then, both reanalysis and CESM2 data are linearly interpolated onto a common 1.5° × 1.5° grid and vertically integrated over the 8 pressure levels (pt = 10 hPa). Here we focus on the trend from 1979 to 2013, which is the common period for the CESM2 simulations.