Forced response and internal variability in ensembles of climate simulations: identification and analysis using linear dynamical mode decomposition

  • Original Article
  • Climate Dynamics

Abstract

Estimating climate response to observed and projected increases in atmospheric greenhouse gases usually requires averaging among multiple independent simulations of computationally expensive global climate models to filter out the internal climate variability. Studies have shown that advanced pattern recognition methods allow one to obtain accurate estimates of the forced climate signal from just a handful of such climate realizations. The accuracy of these methods for a fixed ensemble size, however, decreases with an increasing magnitude of the low-frequency, decadal and longer internal climate variability. Here we generalize a previously developed Bayesian methodology of Linear Dynamical Mode (LDM) decomposition for spatially extended time series to enable joint identification and analysis of forced signal and internal variability in ensembles of climate simulations, a methodology dubbed here an ensemble LDM, or ELDM. The new ELDM method is shown to outperform its pattern-recognition competitors by more accurately isolating the forced signal in small ensembles of both toy- and state-of-the-art climate-model simulations. It is able to do so by explicitly recognizing a non-random structure of the internal variability, identified by the ELDM algorithm alongside the optimal forced-signal estimate, which allows one to study possible dynamical connections between the two types of variability. The optimal ELDM filtering provides a unique opportunity for objective intercomparison of decadal and longer climate variability across different global climate models—a task that proved difficult due to uncertainties associated with the noisy character and limited length of historical climate simulations combined with parameter uncertainties of alternative signal-detection methods.

Data availability

CESM Large Ensemble data used in this study is publicly available from Climate Data Gateway at the National Center for Atmospheric Research (https://www.earthsystemgrid.org/dataset/ucar.cgd.ccsm4.cesmLE.html). The 20th Century Reanalysis Project dataset (version 2c) is publicly available at the National Oceanic and Atmospheric Administration (NOAA)/Oceanic and Atmospheric Research (OAR)/Earth System Research Laboratories (ESRL) Physical Sciences Laboratory (PSL) website (https://psl.noaa.gov/).

Notes

  1. Note that in a nonlinear dynamical system the statistics of the internal variability so defined may still undergo changes under variable external conditions.

  2. This effect is conceptually very similar to the degeneracy of eigenrotation of a matrix in the presence of similar eigenvalues, for example, in the Principal Component Analysis.

Acknowledgements

This research was supported by the state assignment of the Institute of Applied Physics of the Russian Academy of Sciences (Project No. FFUF-2022-0008) (ELDM model formulation and training scheme, synthetic example and CESM-LE analysis; Sects. 2, 3, 4 and Appendices C, D). Also, this research was supported by the project #075-02-2023-911 of Program for the Development of the Regional Scientific and Educational Mathematical Center “Mathematics of Future Technologies” (numerical hyperparameter optimization algorithm in Appendices A, B). The authors acknowledge the CESM Large Ensemble Community Project and supercomputing resources provided by NSF/CISL/Yellowstone (Kay et al. 2015). The 20th Century Reanalysis Project dataset (version 2c, Compo et al. 2011) used for comparative estimates in Sect. 3 is provided by the U.S. Department of Energy (DOE), Office of Science Biological and Environmental Research (BER) and by the National Oceanic and Atmospheric Administration Climate Program Office.

Funding

This research was supported by the state assignment of the Institute of Applied Physics of the Russian Academy of Sciences (Project No. FFUF-2022-0008) (ELDM model formulation and training scheme, synthetic example and CESM-LE analysis; Sects. 2, 3, 4 and Appendices C, D). Also, this research was supported by the project #075-02-2023-911 of Program for the Development of the Regional Scientific and Educational Mathematical Center “Mathematics of Future Technologies” (numerical hyperparameter optimization algorithm in Appendices A, B).

Author information

Contributions

AG: elaborated the idea of applying the LDM approach to ensemble climate data. SK, DM and AF: contributed to developing the ELDM method. AG and SK: constructed the synthetic example and designed the analysis framework. AG: implemented the ELDM algorithm and performed the ELDM analysis of the synthetic example, as well as all other supplementary computations. MB: computed ELDMs for CESM-LE sub-ensembles. All authors analysed the results. AG, SK and MB: wrote the first draft of the manuscript, which was then revised by all authors.

Corresponding author

Correspondence to Andrey Gavrilov.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: LDM decomposition: details of the cost functions

1.1 The original LDM method

The prior PDF for the LDM patterns \(\textbf{A},\textbf{c}\) is taken to be Gaussian, with the covariance matrix proportional to the (diagonal) sample covariance matrix of the data \(\textbf{X}\) (see Mukhin et al. 2015; Gavrilov et al. 2016):

$$\begin{aligned} \begin{array}{l} P_{pr}(\textbf{A},\textbf{c}\mid \sigma _A, \sigma _c,d) = \\ ~~~~~~~~= \prod \limits _{k=1}^{K}\left[ \prod \limits _{i=1}^{d}P_\mathcal {N}(A_{ki};\lambda _k\sigma _A^2) P_\mathcal {N}(c_{k};\lambda _k\sigma _c^2) \right] , \end{array} \end{aligned}$$
(A1)

where \(\lambda _k\) are the variances of the input PCs, and the scaling factors \(\sigma _A\) and \(\sigma _c\) are included in the set of the LDM model’s hyperparameters. The prior PDF \(P_{pr}(\sigma )\) for the parameter \(\sigma \) is taken to be constant on the interval \([0, \sigma _{max}]\) and zero outside of it. The value \(\sigma _{max}\) represents a hypothetical maximum possible value of \(\sigma \), which we define as the square root of the mean variance of the input data \(\textbf{x}_n^{(m)}\) across the ensemble, time and space dimensions; it thus corresponds to the ELDM solution with all modes set to zero in Eq. (10). Choosing any larger value of \(\sigma _{max}\) does not affect the optimization results. The prior PDF \(P_{pr}(\textbf{P}\mid \varvec{\tau },\varvec{\sigma _p},d)\) for the LDM time series \(\textbf{P}\) is given by (3) of Sect. 2.1.
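As an illustration of the prior (A1) and of the bound \(\sigma _{max}\), the following minimal NumPy sketch evaluates the log-prior of the patterns and the upper limit of the flat prior for \(\sigma \). It is not the authors' implementation. It assumes that \(P_\mathcal {N}(\cdot ;v)\) denotes a zero-mean normal density with variance \(v\), that the factor for \(c_k\) enters once per \(k\), and that the input data are anomalies (zero mean); the function names are hypothetical.

```python
import numpy as np

def log_normal_pdf(x, var):
    """Log-density of a zero-mean normal with variance `var` (assumed reading of P_N)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + np.square(x) / var)

def log_prior_patterns(A, c, lam, sigma_A, sigma_c):
    """Logarithm of the pattern prior (A1).

    A   : (K, d) array of LDM patterns
    c   : (K,)  array
    lam : (K,)  variances of the input PCs
    Assumption: the c_k factor is counted once per k.
    """
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)
    lam = np.asarray(lam, dtype=float)
    var_A = (lam * sigma_A**2)[:, None]  # same variance for all d modes of a given PC
    var_c = lam * sigma_c**2
    return float(np.sum(log_normal_pdf(A, var_A)) + np.sum(log_normal_pdf(c, var_c)))

def sigma_max(X):
    """Upper limit of the flat prior for sigma.

    X : array of anomalies with any layout (ensemble, time, space).
    Assumption: the data are centered, so the mean variance equals the mean square.
    """
    return float(np.sqrt(np.mean(np.square(X))))
```

Working with logarithms avoids numerical underflow when the product in (A1) runs over many PCs and modes.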

The Bayesian formalism provides two cost functions, described by (6) and (7) of Sect. 2.1, to find the optimal values of the parameters \(\varvec{\mu }=(\textbf{A},\textbf{c},\textbf{P},\sigma )\) of the LDM model (2) given the data \(\textbf{X}\) and the prior PDFs with the fixed hyperparameters \(\textbf{H}=(d,\varvec{\tau },\varvec{\sigma }_p,\sigma _A,\sigma _c)\). The conditional PDFs on the right-hand side of these cost functions are given by

$$\begin{aligned} P\left(\textbf{X}\mid \varvec{\mu },\textbf{H}\right)= \prod \limits _{n=1}^{N_T} \prod \limits _{k=1}^K P_\mathcal {N}\left({x}_{kn}-\sum \limits _{i=1}^{d}{A}_{ki} p_{in};\sigma ^2\right) \end{aligned}$$
(A2)

and

$$\begin{aligned} \begin{array}{l} P\left(\varvec{\mu }\mid \textbf{H}\right)= P_{pr}(\textbf{P}\mid \varvec{\tau },\varvec{\sigma }_p,d) \\ ~~~~~~~~~~~~~~~~ P_{pr}(\textbf{A},\textbf{c}\mid {\sigma }_A,{\sigma }_c,d) P_{pr}(\sigma ). \end{array} \end{aligned}$$
(A3)

Note that the likelihood function (A2) is directly based on the LDM model (2).
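The logarithm of the likelihood (A2) can be sketched in a few lines. This is a hedged illustration rather than the published code: it assumes a zero-mean normal density \(P_\mathcal {N}\) with variance \(\sigma ^2\), and the function name and array layouts are chosen here for convenience.

```python
import numpy as np

def ldm_log_likelihood(X, A, P, sigma):
    """Logarithm of (A2).

    X : (K, N_T) input PCs
    A : (K, d)   LDM patterns
    P : (d, N_T) LDM time series
    sigma : scalar residual standard deviation
    """
    resid = np.asarray(X, dtype=float) - np.asarray(A) @ np.asarray(P)
    n = resid.size  # K * N_T independent Gaussian factors
    return float(-0.5 * n * np.log(2.0 * np.pi * sigma**2)
                 - 0.5 * np.sum(resid**2) / sigma**2)
```

For a perfect fit the second term vanishes and only the normalization term remains, which is why \(\sigma \) has to be estimated together with the patterns and time series.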

To obtain the solution maximizing the functionals (6) and (7), we have used a modified and improved version of the numerical algorithm originally developed in Gavrilov et al. (2016) and Mukhin et al. (2015).

1.2 The new ELDM method

In the present, modified ELDM method, the same Bayesian formalism is applied to find the optimal values of the parameters \(\varvec{\mu }=(\textbf{A},\textbf{B},\textbf{c},\textbf{P},\textbf{F},\sigma )\). The prior PDFs for \(\textbf{A},\textbf{c},\sigma \) are the same as in the original LDM method, see (A1), except that \(\lambda _k\) are now associated with the diagonalized ensemble covariance matrix. The prior PDFs for \(\textbf{P}\) and \(\textbf{F}\) are given by (11) and (12), while the prior PDF for \(\textbf{B}\) is completely analogous to (A1):

$$\begin{aligned} P_{pr}(\textbf{B}\mid \sigma _B, d_f) =\prod \limits _{k=1}^{K} \prod \limits _{i=1}^{d_f}P_\mathcal {N}(B_{ki};\lambda _k\sigma _B^2), \end{aligned}$$
(A4)

where an additional hyperparameter \(\sigma _B\) is introduced to be optimized alongside the others. Thus, the full set of hyperparameters is \(\textbf{H}=(d,d_f,\varvec{\tau },\varvec{\tau _f},\varvec{\sigma }_p,\varvec{\sigma _f},\sigma _A,\sigma _B,\sigma _c)\). The Bayesian cost functions (6) and (7) are now maximized using the following expressions in lieu of (A2) and (A3):

$$\begin{aligned} P\left(\textbf{X}\mid \varvec{\mu },\textbf{H}\right)= \prod \limits _{m=1}^{N_R} \prod \limits _{n=1}^{N_T} \prod \limits _{k=1}^K P_\mathcal {N}\left({x}_{kn}^{(m)}-\sum \limits _{i=1}^{d}{A}_{ki} p_{in}^{(m)} -\sum \limits _{i=1}^{d_f}{B}_{ki} f_{in};\sigma ^2\right) \end{aligned}$$
(A5)

$$\begin{aligned} P(\varvec{\mu }\mid \textbf{H}) = P_{pr}(\textbf{P}\mid \varvec{\tau },\varvec{\sigma }_p,d) \cdot P_{pr}(\textbf{A},\textbf{c}\mid {\sigma }_A,{\sigma }_c,d) \cdot P_{pr}(\textbf{B}\mid {\sigma }_B, d_f) \cdot P_{pr}(\sigma ) \end{aligned}$$
(A6)
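The structure of the ensemble likelihood (A5), in which the internal time series \(p_{in}^{(m)}\) differ between ensemble members while the forced time series \(f_{in}\) are shared by all of them, can be illustrated with the following sketch. It is an assumption-laden illustration, not the authors' code: the array layouts and the function name are hypothetical, and \(P_\mathcal {N}\) is again read as a zero-mean normal density with variance \(\sigma ^2\).

```python
import numpy as np

def eldm_log_likelihood(X, A, P, B, F, sigma):
    """Logarithm of (A5).

    X : (N_R, K, N_T) ensemble of input PCs
    A : (K, d)        internal-variability patterns
    P : (N_R, d, N_T) internal time series, one set per ensemble member
    B : (K, d_f)      forced-response patterns
    F : (d_f, N_T)    forced time series, common to all members
    """
    internal = np.einsum('ki,min->mkn', A, P)  # member-specific part
    forced = (B @ F)[None, :, :]               # broadcast over the N_R members
    resid = np.asarray(X, dtype=float) - internal - forced
    n = resid.size  # N_R * K * N_T Gaussian factors
    return float(-0.5 * n * np.log(2.0 * np.pi * sigma**2)
                 - 0.5 * np.sum(resid**2) / sigma**2)
```

Because `forced` carries no member index, any signal that is common to all realizations can be captured by \(\textbf{B}\) and \(\textbf{F}\), whereas member-specific variability has to be represented by \(\textbf{A}\) and \(\textbf{P}\).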

The modified ELDM procedure provides, among other things, the forced response patterns \(\textbf{B}\) and time series \(\textbf{F}\) that are internally consistent with the other parameters of the LDM decomposition, which is an improvement over the traditional strategy of removing the forced signal from the data using linear regression methods prior to the analysis. Furthermore, and more importantly, the ELDM method assumes no ad hoc spatiotemporal orthogonality constraints among the modes of forced and internal variability, which permits a potentially revealing analysis of the dynamical relationship between these two types of variability.

Appendix B: Optimal dimensions of the ELDM estimated subspaces of forced and internal variability in CESM-LE

The optimal dimensions \(d\) and \(d_f\) of the subspaces associated with the ELDM estimated internal and forced variability are obtained by trial and error. Namely, for a given pair \((d,\, d_f)\) we maximize the Bayesian evidence \(P(\textbf{X}\mid d,d_f,\tilde{\textbf{H}})\) [the cost function (6)] over all the other hyperparameters \(\tilde{\textbf{H}}=(\varvec{\tau },\varvec{\tau _f},\varvec{\sigma }_p,\varvec{\sigma _f},\sigma _A,\sigma _B,\sigma _c)\) using the algorithm described in Gavrilov et al. (2020). In practice, instead of the expression (6) we work with its logarithm divided by \(N_T N_R\) to define the evidence score for every pair \((d,\, d_f)\):

$$\begin{aligned} E(d,d_f)=\max _{\tilde{\textbf{H}}} \left[ \frac{1}{N_T N_R}\log P(\textbf{X}\mid d,d_f,\tilde{\textbf{H}}) \right] \end{aligned}$$
(B1)

At fixed \(d_f\), these evidence scores (Fig. 10) increase with d and reach a plateau at some value of d. The score associated with the latter plateau increases with \(d_f\) as well, but also saturates at some value of \(d_f\). The optimal pair \((d,\,d_f)\) is chosen to correspond to the minimum values at which the maximum (plateaued) evidence score is reached. In Fig. 10 this occurs at \(d=d_f=3\). Further increase of d or \(d_f\) only results in the identification of the null modes of internal variability or forced response, which capture no variance and should be omitted.
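The selection rule described above can be sketched as follows, given a precomputed table of evidence scores \(E(d,d_f)\). This is only one possible formalization: the tolerance `tol` used to decide that a score lies on the plateau, and the tie-breaking by the smallest \(d+d_f\), are assumptions introduced here and are not taken from the paper.

```python
import numpy as np

def select_dimensions(E, tol=1e-3):
    """Pick the smallest (d, d_f) whose evidence score lies on the plateau.

    E   : 2-D array with E[d, d_f] the evidence score (B1)
    tol : hypothetical tolerance defining "on the plateau"
    """
    E = np.asarray(E, dtype=float)
    # all pairs whose score is within tol of the maximum score
    plateau = np.argwhere(E >= E.max() - tol)
    # assumption: prefer the smallest total dimension, then the smallest d
    best = min(plateau, key=lambda p: (p.sum(), p[0], p[1]))
    return int(best[0]), int(best[1])
```

For example, a score table that saturates once both dimensions reach 3 yields the pair (3, 3); larger dimensions reach the same score but would only add null modes.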

Fig. 10

Evidence scores computed for the CESM-LE ensemble C (see Sect. 3.2) for different values of d (on the abscissa) and \(d_f\) (see color legend). Shown are the values \(E(d,d_f)-E(d=0,d_f=0)\) which are dimensionless (independent of the particular data pre-normalization). The minimum values \((d,\,d_f)\) associated with the beginning of saturation are chosen as the optimal pair. Here \(d=d_f=3\)

Appendix C: Comparison of ELDM and S/NP methods performance in estimating the forced signal in CESM-LE

Fig. 11

Mean squared error (MSE; \(^\circ \mathrm{C}^2\)) of the forced-signal reconstruction by different methods (see panel captions). The MSE was computed with respect to the reference forced-signal estimate obtained as the 3-year smoothed 40-member SAT ensemble average (see text for details)

Fig. 12

SAT forced signal reconstructions by different methods (see the legend) averaged in six non-overlapping latitude bands (see panel captions)

We mentioned in Sect. 3.2 that the spread (across the four CESM-LE 10-member sub-ensembles) of the ELDM estimated forced signal based on all three ELDM modes is smaller than that of the signal based on two ELDM modes, which further demonstrates the optimality of the \(d_f=3\) forced-signal dimension returned by the ELDM algorithm. We computed this spread as the (time and space averaged) mean squared error (MSE) of the four estimated forced-signal time series with respect to their ensemble-mean time series, and expressed the total MSE as a fraction of the total variance of the reference forced signal (see below). This gave MSEs of \(4.7\%\) and \(4\%\) for the two-mode and three-mode ELDM based forced-signal estimates, respectively. By comparison, the MSE spreads of the S/NP-method forced-signal estimates based on the two and three leading S/NP modes are \(3.7\%\) and \(6.9\%\), respectively, arguing for the optimality of the two-mode S/NP representation of the forced signal.
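The spread measure described above can be sketched as follows. This is a hedged illustration only: the array layout, the optional spatial weights, and the definition of the reference variance as the variance about the time mean at each grid point are assumptions made here, and the function name is hypothetical.

```python
import numpy as np

def relative_spread(estimates, reference, weights=None):
    """Spread of forced-signal estimates as a fraction of the reference variance.

    estimates : (S, T, G) forced-signal estimates from S sub-ensembles
    reference : (T, G)    reference forced signal
    weights   : (G,)      optional spatial (e.g. area) weights; uniform if None
    """
    estimates = np.asarray(estimates, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n_grid = estimates.shape[-1]
    w = np.ones(n_grid) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    # squared deviations from the mean over sub-ensembles, averaged over S and time
    dev = estimates - estimates.mean(axis=0, keepdims=True)
    mse = np.sum((dev**2).mean(axis=(0, 1)) * w)
    # assumption: reference variance taken about the time mean at each grid point
    ref_anom = reference - reference.mean(axis=0, keepdims=True)
    ref_var = np.sum((ref_anom**2).mean(axis=0) * w)
    return float(mse / ref_var)
```

A value of 0.04, for instance, would correspond to the \(4\%\) spread quoted in the text for the three-mode ELDM estimate.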

The above measure of the forced-signal uncertainty has, however, more to do with the statistical significance of the estimated forced signal than with the accuracy of a given method in reconstructing the (unknown) true signal. To gauge the performance of each method in doing so, we here define the reference estimate of the forced signal as the 40-member ensemble mean further smoothed by the 3-year boxcar running mean. The choice of the 3-year window size is a compromise between increasing the signal-to-noise ratio and retaining the short-term forced signals associated with volcanic eruptions. Other window sizes produce qualitatively similar results (not shown).
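A minimal sketch of the reference forced-signal construction is given below. The window is expressed in time steps, so `window=3` corresponds to 3 years only under the assumption of annual sampling; the treatment of the series ends (dropping incomplete windows) is likewise an assumption, as are the function names.

```python
import numpy as np

def boxcar_smooth(x, window):
    """Boxcar running mean along the time axis (axis 0), complete windows only."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode='valid'), 0, np.asarray(x, dtype=float)
    )

def reference_forced_signal(ensemble, window=3):
    """Ensemble mean followed by a boxcar running mean.

    ensemble : (M, T, G) array of M members, T time steps, G grid points
    window   : window length in time steps (3 years if the data are annual)
    Returns an array of shape (T - window + 1, G).
    """
    ens_mean = np.asarray(ensemble, dtype=float).mean(axis=0)
    return boxcar_smooth(ens_mean, window)
```

A longer window suppresses more of the residual internal variability in the ensemble mean, but also smears out short-lived forced excursions such as volcanic cooling, which is the trade-off mentioned above.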

Figure 11 displays two-dimensional maps of the MSE of the forced signal estimated by each method with respect to the reference forced signal defined above; these MSE values were averaged across the four sub-ensembles considered. The three-mode ELDM estimate of the forced signal performs the best and has the lowest spatially averaged MSE of about \(16\%\) in terms of the total (reference) forced-signal variance. The two-mode S/NP based estimate of the forced signal has a slightly larger, second-best MSE and spatial MSE distribution which is very similar to that of the 3-mode ELDM based estimate. Overall, both methods perform similarly well in reconstructing the reference forced signal in the present example.

We finally speculate that the relatively high MSE of the forced-signal reconstruction (\(16\text{--}18\%\), as per the estimates above) may in fact be dominated by the errors of the reference forced signal with respect to the (unknown) true forced signal, due to the insufficient number (40) of ensemble members used to compute the reference signal. This is particularly apparent in the time series of the reference vs. reconstructed signals in Fig. 12, where the reference signal exhibits interannual undulations (outside of volcanic minima) not easily attributable to a particular forcing, and yet misses a substantial part of the cooling associated with volcanic episodes in some of the latitude bands.

Appendix D: ELDM decomposition of synthetic and CESM data: verifying the reconstructions of time scales and patterns

Fig. 13

The autocorrelation functions (ACFs) of internal modes estimated by the ELDM method applied to the synthetic example of Sect. 3.1 in comparison with the ACFs of the corresponding true modes (marked “System”), based on their time series shown in Figs. 3a,b and 1d,e, respectively, for both slow (left panel) and fast (right panel) modes. The ACFs were computed by averaging the time-lagged autocorrelations across all the ensemble members

In this section, we provide further details on the ELDM time series and patterns in support of conclusions put forward in the main text.

Figure 13 shows the autocorrelation functions (ACFs) of the ELDM estimated slow and fast internal modes for the synthetic example of Sect. 3.1, alongside the ACFs of the actual slow and fast signals in the synthetic system, which illustrates the excellent reconstruction of these time scales by the ELDM procedure. The first zero of the ACF for both the ELDM and System based estimates is approximately 25 time units for the slow mode and 1.3 time units for the fast mode, which roughly corresponds to the quarter-periods of the respective oscillators.
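The ensemble-averaged ACF and its first zero crossing can be computed along the following lines. This is an illustrative sketch, not the procedure used for the figures: the biased-free lag estimator, the linear interpolation of the zero crossing, and the function names are choices made here.

```python
import numpy as np

def ensemble_acf(series, max_lag):
    """ACF for lags 0..max_lag, averaged across ensemble members.

    series : (M, T) array, one time series per ensemble member
    """
    series = np.asarray(series, dtype=float)
    anom = series - series.mean(axis=1, keepdims=True)
    var = (anom**2).mean(axis=1)
    n_time = series.shape[1]
    acf = np.empty(max_lag + 1)
    for lag in range(max_lag + 1):
        # lagged covariance of each member, normalized by that member's variance
        cov = (anom[:, :n_time - lag] * anom[:, lag:]).mean(axis=1)
        acf[lag] = np.mean(cov / var)
    return acf

def first_zero_crossing(acf, dt=1.0):
    """Lag (in time units) at which the ACF first reaches zero, or None."""
    below = np.flatnonzero(acf <= 0.0)
    if below.size == 0:
        return None
    j = below[0]
    if j == 0:
        return 0.0
    # linear interpolation between the last positive and first non-positive value
    frac = acf[j - 1] / (acf[j - 1] - acf[j])
    return float((j - 1 + frac) * dt)
```

For a pure sinusoid with period \(T\) the ACF is \(\cos (2\pi \,\textrm{lag}/T)\), whose first zero is at \(T/4\); this is the sense in which the first zero of the ACF corresponds to a quarter-period.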

Fig. 14

The autocorrelation functions (ACFs) of the three internal modes (from left to right) estimated by the ELDM method applied to the four independent CESM-LE sub-ensembles A–D of SAT simulations (see the legend), based on their time series shown in Fig. 7. These ACFs are computed by averaging the time-lagged autocorrelations across all ensemble members. Their first intersection with zero for each mode in each sub-ensemble is denoted by \(\tau _{ACF}\) in Fig. 7; it corresponds approximately to the quarter-periods of the modes

Fig. 15

The pairwise correlation coefficients between the spatial patterns of the three internal modes estimated by the ELDM method applied to the four independent CESM-LE sub-ensembles A–D of SAT simulations (see the axis labels). Shown from left to right are the correlation matrices for the ELDM modes 1–3. All correlations passed the statistical significance test described in the text

Similarly, Fig. 14 shows the ACFs for the internal modes obtained in the four sub-ensembles of the CESM-LE SAT data to support our conclusion about the robustness of their time scales. In addition, Fig. 15 shows the pairwise correlation coefficients between the spatial patterns of these modes shown in Fig. 8. Each coefficient was computed as the area-weighted dot product of the normalized patterns, with the normalization defined as the square root of the same dot product of the pattern with itself. The high correlations (greater than 0.85) of the patterns for each mode between sub-ensembles signify the robustness of these patterns. The slightly lower correlation coefficients for the third mode can be explained by its very low variance (see the main text) and, therefore, lower detectability. To assess the statistical significance of these correlations, we generated synthetic values of the correlations associated with the null hypothesis that the patterns tested are randomly drawn from the subspace of the 20 leading EOFs of CESM-LE SAT (with the weights corresponding to the EOF standard deviations). All the correlations in Fig. 15 far exceed the 5% significance level, which corresponds to the correlation value of 0.53 so computed.
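The pattern correlation and the Monte Carlo significance test described above can be sketched as follows. Several details are assumptions made for illustration and are not specified in the text: the area weights are taken as the cosine of latitude on a regular latitude-longitude grid, the random coefficients of the EOFs are drawn from a normal distribution scaled by the EOF standard deviations, and the test is one-sided. The function names are hypothetical.

```python
import numpy as np

def pattern_correlation(a, b, lat_deg):
    """Area-weighted (uncentered) correlation of two spatial patterns.

    a, b    : (n_lat, n_lon) patterns
    lat_deg : (n_lat,) latitudes in degrees; weights assumed to be cos(latitude)
    """
    w = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))[:, None]

    def dot(u, v):
        return np.sum(w * u * v)  # area-weighted dot product

    return float(dot(a, b) / np.sqrt(dot(a, a) * dot(b, b)))

def null_correlation_threshold(eofs, stds, lat_deg, n_draws=2000, level=0.95, seed=0):
    """Correlation exceeded by chance with probability 1 - level under the null.

    eofs : (J, n_lat, n_lon) leading EOF patterns
    stds : (J,) EOF standard deviations used as weights
    Assumption: random patterns are Gaussian combinations of the EOFs.
    """
    rng = np.random.default_rng(seed)
    stds = np.asarray(stds, dtype=float)
    coef = rng.normal(size=(2, n_draws, stds.size)) * stds
    pats = np.einsum('sdj,jyx->sdyx', coef, np.asarray(eofs, dtype=float))
    r = [pattern_correlation(pats[0, i], pats[1, i], lat_deg) for i in range(n_draws)]
    return float(np.quantile(r, level))
```

Because the correlation is uncentered, a pattern correlates perfectly with any positive multiple of itself, so the measure is insensitive to the overall amplitude of the modes and compares only their spatial structure.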

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gavrilov, A., Kravtsov, S., Buyanova, M. et al. Forced response and internal variability in ensembles of climate simulations: identification and analysis using linear dynamical mode decomposition. Clim Dyn 62, 1783–1810 (2024). https://doi.org/10.1007/s00382-023-06995-1

  • DOI: https://doi.org/10.1007/s00382-023-06995-1
