Evaluation of metrics for assessing dipolar climate patterns in climate models

Climate Dynamics

Abstract

In climate model assessment, one of the most widely used procedures is to evaluate the large-scale spatial patterns simulated by models. In this study, we evaluated four non-complex metrics for assessing dipolar and multipolar climate patterns, aiming to ascertain their strengths and possible caveats. Three established metrics are employed: the Taylor skill (TS) score, the Arcsin-Mielke measure M (measure M), and the Spatial Efficiency (SPAEF) metric. Additionally, a fourth metric is introduced by adjusting the TS score (TSadj score), where the standard deviation ratio is substituted with the spatial root-mean-square error (RMSE). By applying these metrics to measure and rank the performance of six CMIP6 models in simulating the dipolar patterns of the East Asian Summer Monsoon and Atlantic Meridional Mode, as well as the quadripolar pattern of the Pacific-North American pattern, the results show that metrics considering spatial error (RMSE/MSE), such as the TSadj score and measure M, offer a more accurate assessment compared to metrics relying on variance comparison of the patterns (such as the TS score and SPAEF metric) since they account for the patterns’ spatial variance distribution. Furthermore, the results provided by the established metrics might not effectively assess the quality of the models’ simulation. Therefore, the TSadj score can be quantified using a threshold set at half of its maximum attainable value to identify well-performing models, corresponding to a minimum spatial correlation of 0.4 and a maximum normalized RMSE of 1. This modification of the TSadj score yields a more practical outcome for the assessment of models.

Data availability

The CMIP6 models’ outputs used in this study are publicly available from https://esgf-node.llnl.gov/search/cmip6/. The Extended Reconstructed Sea Surface Temperature (ERSSTv4) data were provided by the NOAA/OAR/ESRL/PSD at https://www.esrl.noaa.gov/psd/data/gridded/data.noaa.ersst.v4.html. The NCEP Reanalysis Derived data were provided by the NOAA/OAR/ESRL/PSL at https://www.psl.noaa.gov/data/gridded/data.ncep.reanalysis.derived.surface.html. The 20th Century Reanalysis V3 data were provided by the NOAA/OAR/ESRL/PSL at https://psl.noaa.gov/data/gridded/data.20thC_ReanV3.html. The GPCP combined precipitation datasets were developed and computed by the NASA/Goddard Space Flight Center’s Mesoscale Atmospheric Processes Laboratory and are available at https://www.psl.noaa.gov/data/gridded/data.gpcp.html.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (42050410322, 42075187), the Ministry of Science and Technology of China (QN2023140001L), the GeoX Interdisciplinary Research Funds (2023300286) for the Frontiers Science Center for Critical Earth Material Cycling, Nanjing University, Jiangsu Collaborative Innovation Center for Climate Change, and High-Performance Computing Center of Nanjing University. The authors acknowledge the World Climate Research Programme’s Working Group on Coupled Modelling, which is responsible for CMIPs, and we thank the climate modeling groups (listed in Table 1) for producing and making available their model output. We thank Dr. Ian Watterson and another anonymous reviewer for their comments that greatly improved the original manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (42050410322, 42075187), the Ministry of Science and Technology of China (QN2023140001L), the GeoX Interdisciplinary Research Funds (2023300286) for the Frontiers Science Center for Critical Earth Material Cycling, Nanjing University, Jiangsu Collaborative Innovation Center for Climate Change, and High-Performance Computing Center of Nanjing University.

Author information

Contributions

Sandro F. Veiga and Huiling Yuan designed the study. Sandro F. Veiga performed the analysis and wrote the main manuscript text. All authors reviewed the manuscript.

Corresponding author

Correspondence to Huiling Yuan.

Ethics declarations

Competing interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 227 KB)

Appendix 1

1.1 Adjusted Taylor skill score

In Fig. 5 the East Asian Summer Monsoon (EASM) spatial patterns simulated by ACCESS-CM2 and BCC-CSM2-MR are compared with the observed pattern. A visual inspection reveals that ACCESS-CM2 simulates the EASM pattern better than BCC-CSM2-MR. This is confirmed by their spatial correlations, which are 0.78 for ACCESS-CM2 and 0.57 for BCC-CSM2-MR. ACCESS-CM2 correctly shows the zonally oriented negative pole in the south and the zonally oriented positive pole in the north, a shape that BCC-CSM2-MR does not simulate. However, the TS score is 0.73 for ACCESS-CM2 and 0.78 for BCC-CSM2-MR, so the TS score indicates that BCC-CSM2-MR simulates the EASM pattern more accurately than ACCESS-CM2. The reason is that ACCESS-CM2 simulates strong magnitudes in both poles, which raises its spatial standard deviation to 0.92 mm day⁻¹, whereas BCC-CSM2-MR simulates a deformed pattern with a spatial standard deviation of 0.56 mm day⁻¹, very close to the observed one (σo = 0.59 mm day⁻¹). Thus, in the TS score, the close match between the BCC-CSM2-MR and observed spatial standard deviations compensates for the lack of dipolar structure in BCC-CSM2-MR.
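The compensation effect described above can be checked numerically. The sketch below assumes that the TS score takes the commonly used Taylor form with a first-power correlation term, TS = 4(1 + R) / [(σ̂ + 1/σ̂)² (1 + R0)], where σ̂ is the ratio of the model to the observed spatial standard deviation and R0 = 0.999. This form is an assumption made here, since the TS equation is defined in the main text and not in this appendix; the function name `taylor_skill` is illustrative.

```python
def taylor_skill(r, sigma_model, sigma_obs, r0=0.999):
    """Taylor skill score in the ASSUMED form 4(1+R) / ((s + 1/s)^2 (1+R0))."""
    s = sigma_model / sigma_obs  # normalized spatial standard deviation
    return 4.0 * (1.0 + r) / ((s + 1.0 / s) ** 2 * (1.0 + r0))


SIGMA_OBS = 0.59  # observed spatial standard deviation (mm/day), from the text

# Correlations and standard deviations as quoted in the paragraph above.
ts_access = taylor_skill(r=0.78, sigma_model=0.92, sigma_obs=SIGMA_OBS)
ts_bcc = taylor_skill(r=0.57, sigma_model=0.56, sigma_obs=SIGMA_OBS)

print(f"ACCESS-CM2  TS = {ts_access:.2f}")
print(f"BCC-CSM2-MR TS = {ts_bcc:.2f}")
```

With the rounded inputs quoted in the text, this gives roughly 0.74 and 0.78, close to the reported 0.73 and 0.78. Under the assumed form, the variance term alone lifts BCC-CSM2-MR above ACCESS-CM2 despite its lower spatial correlation.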

Fig. 5 EASM from observation and simulated by ACCESS-CM2 and BCC-CSM2-MR

We consider that, to evaluate dipolar patterns, a metric should give substantial weight to climate models that simulate the spatial dipolar structure correctly. Therefore, we adjusted the TS score to assess dipolar or multipolar patterns while preserving two important features of the original TS score, namely its simplicity and its single numerical value as a result. This adjusted TS (TSadj) score should not depend on the standard deviations of the spatial patterns. Hence, as in the original TS score, pattern similarity is given by the spatial Pearson correlation coefficient (R, Eq. 12), but the error in the dipolar shape and in the distribution of magnitudes within the pattern is assessed with the normalized RMSE (NRMSE). Following the reasoning underlying the original TS score, these two basic statistics are combined into a single metric by the following equation:

$${TS}_{adj}=\frac{4{(1+R)}^\alpha}{\frac{\left(4\;+\;NRMSE\right)^\beta}{4^{\beta-1}}{(1+R_0)}^\alpha},$$
(12)
$$\left\{\begin{array}{c}\alpha =1,\dots ,\infty \\ \beta =1,\dots ,\infty \end{array}\right.$$

where R, R0, and NRMSE are the spatial Pearson correlation coefficient, the maximum correlation coefficient attainable (defined as 0.999), and the normalized RMSE, respectively. As with TS, a TSadj closer to 1 indicates a better match between the model pattern and the observed pattern, whereas a value closer to 0 indicates a poor match. This formulation ensures the basic conditions for a useful skill score, namely: (1) TSadj \(\in\) [0,1]; (2) R → R0 and NRMSE → 0 implies TSadj → 1; (3) NRMSE → ∞ implies TSadj → 0; and (4) R = ‒1 implies TSadj = 0. The input parameters α and β are introduced to penalize low R and high NRMSE, respectively. They are integer values chosen subjectively by the user according to the purpose of the evaluation. The factor 4^(β‒1) in the denominator ensures that the TSadj score is bounded between 0 and 1, independently of the value of β. For instance, applying the TSadj score to the EASM dipole patterns (Fig. 5) with α = 1 and β = 3 gives TSadj = 0.46 for ACCESS-CM2 and TSadj = 0.44 for BCC-CSM2-MR. The ranking is thereby corrected: ACCESS-CM2, which reproduces the better dipolar structure, now receives the higher score. This specific example shows the utility of the TSadj score for assessing the fidelity of dipole patterns simulated by climate models.
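As a minimal sketch, Eq. 12 can be written as a short function and its limiting conditions checked directly. The function name `ts_adj` is illustrative, and the NRMSE of 0.98 used for ACCESS-CM2 is the rounded value quoted later in this appendix; NRMSE is taken as an input here because its normalization is defined in the main text.

```python
def ts_adj(r, nrmse, alpha=1, beta=1, r0=0.999):
    """Adjusted Taylor skill score, written term by term as in Eq. 12."""
    numerator = 4.0 * (1.0 + r) ** alpha
    denominator = (4.0 + nrmse) ** beta / 4.0 ** (beta - 1) * (1.0 + r0) ** alpha
    return numerator / denominator


# Limiting conditions listed in the text.
perfect = ts_adj(r=0.999, nrmse=0.0, alpha=2, beta=3)    # R = R0, NRMSE = 0
anti = ts_adj(r=-1.0, nrmse=0.5, alpha=1, beta=2)        # R = -1
huge_error = ts_adj(r=0.9, nrmse=1e6, alpha=1, beta=2)   # very large NRMSE

# Worked example from the text: ACCESS-CM2 with alpha = 1 and beta = 3.
access = ts_adj(r=0.78, nrmse=0.98, alpha=1, beta=3)

print(f"R = R0, NRMSE = 0 : {perfect:.3f}")
print(f"R = -1            : {anti:.3f}")
print(f"NRMSE very large  : {huge_error:.3g}")
print(f"ACCESS-CM2        : {access:.2f}")
```

With these rounded inputs the ACCESS-CM2 value prints as 0.46, matching the value quoted above. Algebraically, Eq. 12 is the product of a correlation term, ((1 + R)/(1 + R0))^α, and an error term, (4/(4 + NRMSE))^β, which makes the separate roles of α and β easy to see.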

To illustrate how the TSadj score is influenced by the parameters α and β, Fig. 6 shows how TSadj varies with R and NRMSE (lines in the plots) for different values of α and β (one plot each). The lines, in different colors, correspond to different correlation values (R from ‒1 to 1), and the shape of each line shows how TSadj varies with NRMSE (from 0 to 7). Each of the four plots shows the TSadj values obtained when the input statistics (R and NRMSE) are penalized differently through the values of α and β. The six models simulating the EASM, shown in Fig. 1 of the main article, are also compared in terms of their TSadj scores for the different values of α and β. In Fig. 6a, α is set to 1 and β is set to 2 (i.e., TSadj(R,NRMSE,α = 1,β = 2)), so no penalization is applied to R, whereas models with greater NRMSE are penalized. In Fig. 6b, α is set to 2 while keeping β = 2, thereby penalizing equally models with lower R and models with greater NRMSE. Figure 6c shows a higher level of NRMSE penalization with β = 3 and no penalization of R (α = 1), whereas Fig. 6d shows how TSadj varies when the lack of pattern correlation (R) is penalized by setting α = 2, together with high NRMSE penalization (β = 3). TSadj increases strictly monotonically with increasing spatial correlation, except in the limiting cases NRMSE → ∞ and/or β → ∞.
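The sensitivity to α and β described above can be tabulated without plotting. The sketch below evaluates Eq. 12 for the four (α, β) pairs of the Fig. 6 panels on a small grid; the sampled R and NRMSE values are an illustrative subset chosen here, not the lines drawn in the figure, and the names `ts_adj`, `PANELS`, and `table` are hypothetical.

```python
def ts_adj(r, nrmse, alpha, beta, r0=0.999):
    """Adjusted Taylor skill score, written term by term as in Eq. 12."""
    numerator = 4.0 * (1.0 + r) ** alpha
    denominator = (4.0 + nrmse) ** beta / 4.0 ** (beta - 1) * (1.0 + r0) ** alpha
    return numerator / denominator


# (alpha, beta) for each panel of Fig. 6, as given in the text.
PANELS = {"a": (1, 2), "b": (2, 2), "c": (1, 3), "d": (2, 3)}
R_VALUES = (-1.0, 0.0, 0.4, 0.8, 1.0)  # illustrative subset of correlations
NRMSE_VALUES = (0.0, 1.0, 3.0, 7.0)    # illustrative samples of the 0 to 7 range

table = {}
for panel, (alpha, beta) in PANELS.items():
    print(f"Panel {panel}: alpha = {alpha}, beta = {beta}")
    print("  NRMSE   : " + "  ".join(f"{n:4.1f}" for n in NRMSE_VALUES))
    for r in R_VALUES:
        row = [ts_adj(r, n, alpha, beta) for n in NRMSE_VALUES]
        table[(panel, r)] = row
        print(f"  R = {r:+.1f}: " + "  ".join(f"{v:4.2f}" for v in row))
```

Reading down a column shows TSadj increasing with R at fixed NRMSE; reading along a row shows it decreasing with NRMSE, more steeply for the larger β.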

The diamond symbols represent the TSadj values calculated for the EASM dipolar patterns simulated by each of the six models in Fig. 1 (main text). Note that the R and NRMSE of the models are the same in every plot; only α and β differ. The choice of parameters nevertheless changes the ranking of the models, because the input statistics (R and NRMSE) are penalized with different weights. This can be seen by comparing BCC-CSM2-MR (purple diamond) and IPSL-CM6A-LR (pink diamond) across the plots. Although their TSadj values are very similar, when the lack of spatial correlation is not penalized (α = 1) but NRMSE is penalized (β = 2 or 3), IPSL-CM6A-LR has a slightly higher TSadj than BCC-CSM2-MR (Fig. 6a and c), whereas when both R and NRMSE are penalized (α = 2 and β = 2 or 3), the TSadj values of BCC-CSM2-MR are slightly higher than those of IPSL-CM6A-LR (Fig. 6b and d). With no penalization, or with penalization of NRMSE only (Fig. 6a), TSadj can remain high (e.g., TSadj ~ 0.6) even for models with high NRMSE (i.e., NRMSE > 1). In Fig. 6a (α = 1 and β = 2) and Fig. 6b (α = 2 and β = 2) the TSadj values of ACCESS-CM2 are 0.57 and 0.51, respectively. ACCESS-CM2 simulates a very strong dipolar pattern, as shown by its NRMSE = 0.98 (the worst of all models). In this case, how to penalize the models is up to the user. As a rule of thumb, models that accurately simulate the main features of the dipole structure should have a TSadj above 0.5. For the metric to show a reasonable spread of values, a sensible choice may also be necessary for the basic statistics R and NRMSE. Thus, it seems appropriate to define thresholds for TSadj, R, and NRMSE to indicate well-performing models, although this is a subjective choice. A reasonable choice is to set α = 2 and β = 3. In this case, models with low spatial correlation (R < 0.4) or large error (e.g., NRMSE > 1) return TSadj values lower than 0.5, that is, below half of the highest attainable value. Thus, setting the thresholds for the basic statistics at NRMSE = 1 (which yields TSadj ≈ 0.5 even with the highest value R = 1) and R = 0.4 (which yields TSadj ≈ 0.5 even with the lowest value NRMSE = 0), as shown in Fig. 6d, can be an appropriate strategy. In this kind of assessment, the patterns usually have dozens of grid points, so the number of degrees of freedom allows R = 0.4 to be statistically significant at p < 0.001.
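The threshold reasoning can be verified by evaluating Eq. 12 at the two limiting cases and at the ACCESS-CM2 values quoted above. This is a sketch under the stated parameter choice (α = 2, β = 3); the helper `is_well_performing` and the constant `THRESHOLD` are hypothetical names introduced here to express the rule of thumb, not part of the original method.

```python
def ts_adj(r, nrmse, alpha=2, beta=3, r0=0.999):
    """Adjusted Taylor skill score, written term by term as in Eq. 12."""
    numerator = 4.0 * (1.0 + r) ** alpha
    denominator = (4.0 + nrmse) ** beta / 4.0 ** (beta - 1) * (1.0 + r0) ** alpha
    return numerator / denominator


THRESHOLD = 0.5  # half of the maximum attainable TSadj (rule of thumb)


def is_well_performing(r, nrmse, alpha=2, beta=3, threshold=THRESHOLD):
    """Hypothetical helper: True when TSadj reaches the rule-of-thumb threshold."""
    return ts_adj(r, nrmse, alpha, beta) >= threshold


# Limiting cases for alpha = 2, beta = 3 (Fig. 6d).
low_r_limit = ts_adj(r=0.4, nrmse=0.0)       # weakest accepted correlation, no error
high_error_limit = ts_adj(r=1.0, nrmse=1.0)  # perfect correlation, largest accepted error

# ACCESS-CM2 (R = 0.78, NRMSE = 0.98) for the Fig. 6a and Fig. 6b parameters.
access_a = ts_adj(0.78, 0.98, alpha=1, beta=2)
access_b = ts_adj(0.78, 0.98, alpha=2, beta=2)

print(f"R = 0.4, NRMSE = 0 : {low_r_limit:.2f}")
print(f"R = 1.0, NRMSE = 1 : {high_error_limit:.2f}")
print(f"ACCESS-CM2 (a), (b): {access_a:.2f}, {access_b:.2f}")
```

The two limiting cases evaluate to about 0.49 and 0.51, which is the sense in which R = 0.4 and NRMSE = 1 correspond to TSadj ≈ 0.5, and the ACCESS-CM2 values reproduce the 0.57 and 0.51 quoted in the text. Because the first limiting case falls marginally below 0.5, a strict comparison against the threshold treats R = 0.4 with zero error as just short of well-performing.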

Fig. 6 The variation of TSadj according to different R (different lines) and NRMSE (lines behavior) and different values for the parameters α and β (different plots), with (a) TSadj(R,NRMSE,α = 1,β = 2), (b) TSadj(R,NRMSE,α = 2,β = 2), (c) TSadj(R,NRMSE,α = 1,β = 3), and (d) TSadj(R,NRMSE,α = 2,β = 3). Different R lines are given by respective colors in the legends. The diamond symbols show the TSadj values of each pattern from Fig. 1 (main text). The gray horizontal lines indicate a TSadj value of 0.5. The metrics are dimensionless.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Veiga, S.F., Yuan, H. Evaluation of metrics for assessing dipolar climate patterns in climate models. Clim Dyn 62, 6487–6503 (2024). https://doi.org/10.1007/s00382-024-07220-3

  • DOI: https://doi.org/10.1007/s00382-024-07220-3
