Abstract
Evaluating the large-scale spatial patterns simulated by models is one of the most widely used procedures in climate model assessment. In this study, we evaluate four simple metrics for assessing dipolar and multipolar climate patterns, aiming to ascertain their strengths and possible caveats. Three established metrics are employed: the Taylor skill (TS) score, the Arcsin-Mielke measure M (measure M), and the Spatial Efficiency (SPAEF) metric. Additionally, a fourth metric is introduced by adjusting the TS score (TSadj score), in which the standard deviation ratio is replaced with the spatial root-mean-square error (RMSE). These metrics are applied to measure and rank the performance of six CMIP6 models in simulating the dipolar patterns of the East Asian Summer Monsoon and Atlantic Meridional Mode, as well as the quadripolar Pacific-North American pattern. The results show that metrics considering spatial error (RMSE/MSE), such as the TSadj score and measure M, offer a more accurate assessment than metrics relying on variance comparison of the patterns (such as the TS score and SPAEF metric), since they account for the patterns’ spatial variance distribution. Furthermore, the results provided by the established metrics might not effectively assess the quality of the models’ simulations. Therefore, a threshold set at half of the maximum attainable value of the TSadj score can be used to identify well-performing models, corresponding to a minimum spatial correlation of 0.4 and a maximum normalized RMSE of 1. This modification of the TS score yields a more practical outcome for the assessment of models.
Data availability
The CMIP6 models’ outputs used in this study are publicly available from https://esgf-node.llnl.gov/search/cmip6/. The Extended Reconstructed Sea Surface Temperature (ERSSTv4) data were provided by the NOAA/OAR/ESRL/PSD at https://www.esrl.noaa.gov/psd/data/gridded/data.noaa.ersst.v4.html. The NCEP Reanalysis Derived data were provided by the NOAA/OAR/ESRL/PSL at https://www.psl.noaa.gov/data/gridded/data.ncep.reanalysis.derived.surface.html. The 20th Century Reanalysis V3 data were provided by the NOAA/OAR/ESRL/PSL at https://psl.noaa.gov/data/gridded/data.20thC_ReanV3.html. The GPCP combined precipitation datasets were developed and computed by the NASA/Goddard Space Flight Center’s Mesoscale Atmospheric Processes Laboratory and are available at https://www.psl.noaa.gov/data/gridded/data.gpcp.html.
References
Adler RF, Huffman GJ, Chang A et al (2003) The Version-2 Global Precipitation Climatology Project (GPCP) Monthly Precipitation Analysis (1979–Present). J Hydrometeorol 4:1147–1167. https://doi.org/10.1175/1525-7541(2003)004%3c1147:TVGPCP%3e2.0.CO;2
Bador M, Boé J, Terray L et al (2020) Impact of Higher Spatial Atmospheric Resolution on Precipitation Extremes Over Land in Global Climate Models. J Geophys Res Atmos 125:e2019JD032184. https://doi.org/10.1029/2019JD032184
Bhattacharya B, Mohanty S, Singh C (2022) Assessment of the potential of CMIP6 models in simulating the sea surface temperature variability over the tropical Indian Ocean. Theor Appl Climatol 148:585–602. https://doi.org/10.1007/s00704-022-03952-6
Bi D, Dix M, Marsland S et al (2020) Configuration and spin-up of ACCESS-CM2, the new generation Australian Community Climate and Earth System Simulator Coupled Model. J South Hemisph Earth Syst Sci 70:225–251. https://doi.org/10.1071/ES19040
Boucher O, Servonnat J, Albright AL et al (2020) Presentation and Evaluation of the IPSL‐CM6A‐LR Climate Model. J Adv Model Earth Syst 12:e2019MS002010. https://doi.org/10.1029/2019MS002010
Brands S (2022) A circulation-based performance atlas of the CMIP5 and 6 models for regional climate studies in the Northern Hemisphere mid-to-high latitudes. Geosci Model Dev 15:1375–1411. https://doi.org/10.5194/gmd-15-1375-2022
Calim Costa M, Nobre P, Oke P et al (2022) The Spectral Diagram as a new tool for model assessment in the frequency domain: Application to a global ocean general circulation model with tides. Comput Geosci 159:104977. https://doi.org/10.1016/j.cageo.2021.104977
Chen C-A, Hsu H-H, Liang H-C (2021) Evaluation and comparison of CMIP6 and CMIP5 model performance in simulating the seasonal extreme precipitation in the Western North Pacific and East Asia. Weather Clim Extrem 31:100303. https://doi.org/10.1016/j.wace.2021.100303
Chiang JCH, Vimont DJ (2004) Analogous Pacific and Atlantic Meridional Modes of Tropical Atmosphere-Ocean Variability. J Clim 17:4143–4158. https://doi.org/10.1175/JCLI4953.1
Demirel MC, Mai J, Mendiguren G et al (2018) Combining satellite data and appropriate objective functions for improved spatial pattern performance of a distributed hydrologic model. Hydrol Earth Syst Sci 22:1299–1315. https://doi.org/10.5194/hess-22-1299-2018
Dunne JP, Horowitz LW, Adcroft AJ et al (2020) The GFDL Earth System Model Version 4.1 (GFDL‐ESM 4.1): Overall Coupled Model Description and Simulation Characteristics. J Adv Model Earth Syst 12:e2019MS002015. https://doi.org/10.1029/2019MS002015
Eyring V, Righi M, Lauer A et al (2016) ESMValTool (v1.0) – a community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP. Geosci Model Dev 9:1747–1802. https://doi.org/10.5194/gmd-9-1747-2016
Guo H, Bao A, Chen T et al (2021) Assessment of CMIP6 in simulating precipitation over arid Central Asia. Atmos Res 252:105451. https://doi.org/10.1016/j.atmosres.2021.105451
Hirota N, Takayabu YN, Watanabe M, Kimoto M (2011) Precipitation reproducibility over tropical oceans and its relationship to the double ITCZ problem in CMIP3 and MIROC5 climate models. J Clim 24:4859–4873. https://doi.org/10.1175/2011JCLI4156.1
Huang B, Banzon VF, Freeman E et al (2015) Extended reconstructed sea surface temperature version 4 (ERSST.v4). Part I: Upgrades and intercomparisons. J Clim 28:911–930. https://doi.org/10.1175/JCLI-D-14-00006.1
Jiang D, Hu D, Tian Z, Lang X (2020) Differences between CMIP6 and CMIP5 Models in Simulating Climate over China and the East Asian Monsoon. Adv Atmos Sci 37:1102–1118. https://doi.org/10.1007/s00376-020-2034-y
Kalnay E, Kanamitsu M, Kistler R et al (1996) The NCEP/NCAR 40-Year Reanalysis Project. Bull Am Meteorol Soc 77:437–471. https://doi.org/10.1175/1520-0477(1996)077%3c0437:TNYRP%3e2.0.CO;2
Koch J, Demirel MC, Stisen S (2018) The SPAtial EFficiency metric (SPAEF): multiple-component evaluation of spatial patterns for optimization of hydrological models. Geosci Model Dev 11:1873–1886. https://doi.org/10.5194/gmd-11-1873-2018
Koh T-Y, Wang S, Bhatt BC (2012) A diagnostic suite to assess NWP performance. J Geophys Res Atmos 117. https://doi.org/10.1029/2011JD017103
Meehl GA et al (2007) Global climate projections. In: Solomon S et al (eds) Climate Change 2007: The Physical Science Basis. Cambridge University Press, pp 747–845
Mielke PW (1991) The application of multivariate permutation methods based on distance functions in the earth sciences. Earth-Science Rev 31:55–71. https://doi.org/10.1016/0012-8252(91)90042-E
Nobre P, Shukla J (1996) Variations of Sea Surface Temperature, Wind Stress, and Rainfall over the Tropical Atlantic and South America. J Clim 9:2464–2479. https://doi.org/10.1175/1520-0442(1996)009%3c2464:VOSSTW%3e2.0.CO;2
Olmo ME, Bettolli ML (2021) Extreme daily precipitation in southern South America: statistical characterization and circulation types using observational datasets and regional climate models. Clim Dyn 57:895–916. https://doi.org/10.1007/s00382-021-05748-2
Rajendran K, Surendran S, Varghese SJ, Sathyanath A (2021) Simulation of Indian summer monsoon rainfall, interannual variability and teleconnections: evaluation of CMIP6 models. Clim Dyn 58:2693–2723. https://doi.org/10.1007/s00382-021-06027-w
Shiru MS, Chung E-S (2021) Performance evaluation of CMIP6 global climate models for selecting models for climate projection over Nigeria. Theor Appl Climatol 146:599–615. https://doi.org/10.1007/s00704-021-03746-2
Slivinski LC, Compo GP, Whitaker JS et al (2019) Towards a more reliable historical reanalysis: Improvements for version 3 of the Twentieth Century Reanalysis system. Q J R Meteorol Soc 145:2876–2908. https://doi.org/10.1002/qj.3598
Song F, Zhou T (2014) The climatology and interannual variability of East Asian summer monsoon in CMIP5 coupled models: Does air-sea coupling improve the simulations? J Clim 27:8761–8777. https://doi.org/10.1175/JCLI-D-14-00396.1
Swart NC, Cole JNS, Kharin VV et al (2019) The Canadian Earth System Model version 5 (CanESM5.0.3). Geosci Model Dev 12:4823–4873. https://doi.org/10.5194/gmd-12-4823-2019
Taylor KE (2001) Summarizing multiple aspects of model performance in a single diagram. J Geophys Res Atmos 106:7183–7192. https://doi.org/10.1029/2000JD900719
Tsubono T (2022) Diagram statistically displaying model performance for tides or quasi-periodic oscillations. Deep Sea Res Part I Oceanogr Res Pap 180:103686. https://doi.org/10.1016/j.dsr.2021.103686
Veiga SF, Yuan H (2021) Performance-based projection of precipitation extremes over China based on CMIP5/6 models using integrated quadratic distance. Weather Clim Extrem 34:100398. https://doi.org/10.1016/j.wace.2021.100398
Volodin EM, Mortikov EV, Kostrykin SV et al (2018) Simulation of the modern climate using the INM-CM48 climate model. Russ J Numer Anal Math Model 33:367–374. https://doi.org/10.1515/rnam-2018-0032
Wallace JM, Gutzler DS (1981) Teleconnections in the Geopotential Height Field during the Northern Hemisphere Winter. Mon Weather Rev 109:784–812. https://doi.org/10.1175/1520-0493(1981)109%3c0784:TITGHF%3e2.0.CO;2
Wang B, Fan Z (1999) Choice of South Asian Summer Monsoon Indices. Bull Am Meteorol Soc 80:629–638. https://doi.org/10.1175/1520-0477(1999)080%3c0629:COSASM%3e2.0.CO;2
Watterson IG (1996) Non-dimensional measures of climate model performance. Int J Climatol 16:379–391. https://doi.org/10.1002/(SICI)1097-0088(199604)16:4%3c379::AID-JOC18%3e3.0.CO;2-U
Watterson IG (2015) Improved Simulation of Regional Climate by Global Models with Higher Resolution: Skill Scores Correlated with Grid Length. J Clim 28:5985–6000. https://doi.org/10.1175/JCLI-D-14-00702.1
Watterson IG, Bathols J, Heady C (2014) What Influences the Skill of Climate Models over the Continents? Bull Am Meteorol Soc 95:689–700. https://doi.org/10.1175/BAMS-D-12-00136.1
Wu T, Lu Y, Fang Y et al (2019) The Beijing Climate Center Climate System Model (BCC-CSM): the main progress from CMIP5 to CMIP6. Geosci Model Dev 12:1573–1600. https://doi.org/10.5194/gmd-12-1573-2019
Yool A, Palmiéri J, Jones CG et al (2021) Evaluating the physical and biogeochemical state of the global ocean component of UKESM1 in CMIP6 historical simulations. Geosci Model Dev 14:3437–3472. https://doi.org/10.5194/gmd-14-3437-2021
You Y, Ting M (2023) Improved Performance of High‐Resolution Climate Models in Simulating Asian Monsoon Rainfall Extremes. Geophys Res Lett 50:e2022GL100827. https://doi.org/10.1029/2022GL100827
Zhang M-Z, Xu Z, Han Y, Guo W (2021) An improved multivariable integrated evaluation method and tool (MVIETool) v1.0 for multimodel intercomparison. Geosci Model Dev 14:3079–3094. https://doi.org/10.5194/gmd-14-3079-2021
Acknowledgements
This work is supported by the National Natural Science Foundation of China (42050410322, 42075187), the Ministry of Science and Technology of China (QN2023140001L), the GeoX Interdisciplinary Research Funds (2023300286) for the Frontiers Science Center for Critical Earth Material Cycling, Nanjing University, Jiangsu Collaborative Innovation Center for Climate Change, and High-Performance Computing Center of Nanjing University. The authors acknowledge the World Climate Research Programme’s Working Group on Coupled Modelling, which is responsible for CMIPs, and we thank the climate modeling groups (listed in Table 1) for producing and making available their model output. We thank Dr. Ian Watterson and another anonymous reviewer for their comments that greatly improved the original manuscript.
Funding
This work is supported by the National Natural Science Foundation of China (42050410322, 42075187), the Ministry of Science and Technology of China (QN2023140001L), the GeoX Interdisciplinary Research Funds (2023300286) for the Frontiers Science Center for Critical Earth Material Cycling, Nanjing University, Jiangsu Collaborative Innovation Center for Climate Change, and High-Performance Computing Center of Nanjing University.
Author information
Contributions
Sandro F. Veiga and Huiling Yuan designed the study. Sandro F. Veiga performed the analysis and wrote the main manuscript text. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare that they have no conflict of interest.
Appendix 1
1.1 Adjusted Taylor skill score
In Fig. 5, the East Asian Summer Monsoon (EASM) spatial patterns simulated by ACCESS-CM2 and BCC-CSM2-MR are compared with the observed pattern. A visual inspection reveals that ACCESS-CM2 simulates the EASM pattern better than BCC-CSM2-MR. This is confirmed by their spatial correlations, which are 0.78 for ACCESS-CM2 and 0.57 for BCC-CSM2-MR. ACCESS-CM2 correctly shows the southern zonally oriented negative pole and the northern zonally oriented positive pole, a shape that BCC-CSM2-MR does not simulate. However, the TS score is 0.73 for ACCESS-CM2 and 0.78 for BCC-CSM2-MR; hence, the TS score indicates that BCC-CSM2-MR simulates the EASM pattern more accurately than ACCESS-CM2. The TS score favors BCC-CSM2-MR because ACCESS-CM2 simulates strong magnitudes in both poles, which raises its spatial standard deviation to 0.92 mm day⁻¹, whereas BCC-CSM2-MR simulates a deformed pattern with a spatial standard deviation of 0.56 mm day⁻¹, very close to the observed one (σo = 0.59 mm day⁻¹). Thus, when the TS score is used, the close agreement between the spatial standard deviations of the BCC-CSM2-MR and observed patterns compensates for the lack of dipolar structure in BCC-CSM2-MR.
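The compensation described above can be reproduced from the quoted statistics alone. The sketch below assumes the first-power form of the Taylor (2001) skill score and reuses R0 = 0.999 from the TSadj definition; the TS equation itself is given in the main text rather than in this appendix, so both choices are assumptions, and the function name `taylor_skill` is hypothetical.

```python
def taylor_skill(r, sigma_model, sigma_obs, r0=0.999):
    """Taylor (2001) skill score, first-power form (assumed here).

    r           : spatial Pearson correlation between model and observation
    sigma_model : spatial standard deviation of the model pattern
    sigma_obs   : spatial standard deviation of the observed pattern
    r0          : maximum attainable correlation (assumed 0.999)
    """
    sigma_ratio = sigma_model / sigma_obs  # normalized standard deviation
    # Equals 1 when the two standard deviations match, smaller otherwise.
    variance_term = 4.0 / (sigma_ratio + 1.0 / sigma_ratio) ** 2
    # Grows linearly with the correlation, from 0 at r = -1.
    correlation_term = (1.0 + r) / (1.0 + r0)
    return variance_term * correlation_term


SIGMA_OBS = 0.59  # mm/day, observed EASM pattern (value quoted in the text)

ts_access = taylor_skill(r=0.78, sigma_model=0.92, sigma_obs=SIGMA_OBS)
ts_bcc = taylor_skill(r=0.57, sigma_model=0.56, sigma_obs=SIGMA_OBS)

print(f"ACCESS-CM2  TS = {ts_access:.3f}")
print(f"BCC-CSM2-MR TS = {ts_bcc:.3f}")
```

With the rounded inputs quoted in the text, this form gives values within about 0.01 of the reported TS scores and ranks BCC-CSM2-MR above ACCESS-CM2, because the variance term is close to 1 for BCC-CSM2-MR and noticeably below 1 for ACCESS-CM2.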
We consider that a metric for evaluating dipolar patterns should give substantial weight to climate models that simulate the spatial dipolar structure correctly. Therefore, we adjusted the TS score to assess dipolar or multipolar patterns while preserving two important features of the original TS score, namely its simplicity and a single numerical value as a result. This adjusted TS (TSadj) score should not depend on the spatial patterns’ standard deviations. Hence, as in the original TS score, the pattern similarity is given by the spatial Pearson correlation coefficient (R, Eq. 12), but the error in the dipolar shape and in the distribution of magnitudes in the pattern is assessed with the normalized RMSE (NRMSE). Following the reasoning underlying the original TS score, these two basic statistics are combined in a single metric via the formulation given by the following equation:
where R, R0, and NRMSE are the spatial Pearson correlation coefficient, the maximum attainable correlation coefficient (defined as 0.999), and the normalized RMSE, respectively. As with TS, a TSadj closer to 1 indicates a better match between the model pattern and the observed pattern, whereas a value closer to 0 indicates a very poor match. This formulation ensures the basic conditions for a useful skill score, namely: (1) TSadj \(\in\) [0,1]; (2) R → R0 and NRMSE → 0 implies TSadj → 1; (3) NRMSE → ∞ implies TSadj → 0; and (4) R = ‒1 implies TSadj = 0. The input parameters α and β are introduced to penalize low R and high NRMSE, respectively. They are integer values subjectively chosen by the user according to their evaluation interest. The parameter β-1 ensures that the TSadj score is bounded between 0 and 1, independently of the value of β. For instance, applying the TSadj score to the EASM dipole patterns (Fig. 5) with α = 1 and β = 3 gives TSadj = 0.46 for ACCESS-CM2 and TSadj = 0.44 for BCC-CSM2-MR, thus correcting the assessment: ACCESS-CM2 reproduces the better dipolar structure. This specific example shows the utility of the TSadj score for assessing the fidelity of dipole patterns simulated by climate models.
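As a numerical illustration of conditions (1) to (4), the sketch below uses one functional form that satisfies them and that reproduces the ACCESS-CM2 values quoted in this appendix to within rounding. It is an assumption made for illustration only, not necessarily the authors' equation: in particular, the constant 4 in the error term and the function name `ts_adj_candidate` are hypothetical.

```python
def ts_adj_candidate(r, nrmse, alpha=1, beta=3, r0=0.999):
    """Candidate adjusted Taylor skill score (illustrative assumption).

    r     : spatial Pearson correlation, expected in [-1, r0]
    nrmse : normalized RMSE, >= 0
    alpha : integer exponent penalizing low correlation
    beta  : integer exponent penalizing high NRMSE
    r0    : maximum attainable correlation (0.999 in the text)
    """
    # Equals 0 at r = -1 and 1 at r = r0; alpha > 1 penalizes low r more.
    correlation_term = ((1.0 + r) / (1.0 + r0)) ** alpha
    # Equals 1 at nrmse = 0 and tends to 0 as nrmse grows.
    # The constant 4 is an assumed scaling, chosen to match quoted values.
    error_term = (4.0 / (4.0 + nrmse)) ** beta
    return correlation_term * error_term


# ACCESS-CM2 statistics quoted in this appendix: R = 0.78, NRMSE = 0.98.
for alpha, beta in [(1, 2), (2, 2), (1, 3)]:
    score = ts_adj_candidate(0.78, 0.98, alpha=alpha, beta=beta)
    print(f"alpha={alpha}, beta={beta}: TSadj = {score:.3f}")
```

Under this assumed form, raising α lowers the score of any model with R below R0, and raising β lowers the score of any model with nonzero NRMSE, which is the penalizing behavior the parameters are meant to provide.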
To illustrate how the TSadj score is influenced by the parameters α and β, Fig. 6 shows how TSadj varies with R and NRMSE (lines in the plots) for different values of α and β (one plot per combination). The lines, drawn in different colors, correspond to different correlation values (R from ‒1 to 1), and each line traces the variation of TSadj with NRMSE (from 0 to 7). Each of the four plots shows the TSadj values obtained under the different penalizations applied to the input statistics (R and NRMSE) by the values of α and β. In addition, the TSadj scores of the six models simulating the EASM, shown in Fig. 1 of the main article, are compared for the different values of α and β. In Fig. 6a, α is set to 1 and β is set to 2 (i.e., TSadj(R,NRMSE,α = 1,β = 2)); therefore, no penalization is applied to R, whilst models with greater NRMSE are penalized. In Fig. 6b, α is set to 2 while keeping β = 2, therefore penalizing equally models that have lower R and greater NRMSE. Figure 6c shows a higher level of NRMSE penalization with β = 3 and no penalization on R (α = 1), whereas Fig. 6d shows how TSadj varies when the lack of pattern correlation (R) is penalized by setting α = 2, together with a high NRMSE penalization (i.e., β = 3). TSadj increases strictly monotonically with increasing spatial correlation, except in the extreme conditions of NRMSE → ∞ and/or β → ∞.
The diamond symbols represent the TSadj values calculated for the EASM dipolar patterns simulated by each of the six models in Fig. 1 (main text). Note that the R and NRMSE of the models do not change between plots (different α and β), but the choice of parameters changes the models’ ranking, since the input statistics (R and NRMSE) are penalized with different weights. This can be seen by comparing BCC-CSM2-MR (purple diamond) and IPSL-CM6A-LR (pink diamond) in the different plots. Although their TSadj values are very similar, when the lack of spatial correlation is not penalized (α = 1) but NRMSE is penalized (β = 2 or 3), IPSL-CM6A-LR has a slightly higher TSadj than BCC-CSM2-MR (Fig. 6a and c), whereas when both R and NRMSE are penalized (α = 2 and β = 2 or 3), the TSadj values of BCC-CSM2-MR are slightly higher than those of IPSL-CM6A-LR (Fig. 6b and d). With penalization applied only to NRMSE (Fig. 6a), TSadj remains high (e.g., TSadj ~ 0.6) even for models that may have high NRMSE (i.e., NRMSE > 1). In Fig. 6a (α = 1 and β = 2) and Fig. 6b (α = 2 and β = 2), the TSadj values of ACCESS-CM2 are 0.57 and 0.51, respectively. ACCESS-CM2 simulates a very strong dipolar pattern, as shown by its NRMSE = 0.98 (the worst of all models). In this case, how to penalize the models depends on the user. As a “rule of thumb”, models that accurately simulate the main features of the dipole structure should have a TSadj above 0.5. For the metric to show a reasonable variation in its values, a sensible choice may also be necessary for the basic statistics R and NRMSE. Thus, it seems appropriate to define thresholds for TSadj, R, and NRMSE to indicate well-performing models, although this is a subjective choice. A reasonable choice is to set α = 2 and β = 3.
In this case, models with low spatial correlation (R < 0.4) or large error (e.g., NRMSE > 1) return TSadj values lower than 0.5, i.e., half of the highest attainable value. Thus, setting the thresholds for the basic statistics at NRMSE = 1 (which yields TSadj = 0.5 even with the highest value of R = 1) and R = 0.4 (which yields TSadj = 0.5 even with the lowest value of NRMSE = 0), as shown in Fig. 6d, can be an appropriate strategy. In this kind of assessment, the patterns usually have dozens of grid points; therefore, the number of degrees of freedom allows R = 0.4 to be statistically significant at p < 0.001.
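The significance statement can be checked with a short calculation. The sketch below uses the Fisher z-transform, a normal approximation rather than an exact test, and assumes that the grid points are independent; spatial autocorrelation in real climate fields reduces the effective number of degrees of freedom, so the sample sizes it returns are lower bounds. The function names are hypothetical.

```python
import math


def correlation_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, Fisher z approximation.

    r : sample Pearson correlation, |r| < 1
    n : number of independent samples (grid points), n > 3
    """
    # Under H0, atanh(r) is approximately normal with std 1/sqrt(n - 3).
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2.0))


def min_points_for_significance(r, p_threshold=0.001, n_max=10_000):
    """Smallest n for which r is significant at p_threshold, or None."""
    for n in range(4, n_max + 1):
        if correlation_p_value(r, n) < p_threshold:
            return n
    return None


for n in (30, 60, 100):
    print(f"n = {n:3d}: p = {correlation_p_value(0.4, n):.5f}")

n_min = min_points_for_significance(0.4)
print(f"Smallest n with p < 0.001 for R = 0.4: {n_min}")
```

Under these assumptions, whether R = 0.4 reaches p < 0.001 depends on how many effectively independent grid points the pattern contains, so the threshold is best read together with the size of the evaluation domain.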
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Veiga, S.F., Yuan, H. Evaluation of metrics for assessing dipolar climate patterns in climate models. Clim Dyn 62, 6487–6503 (2024). https://doi.org/10.1007/s00382-024-07220-3
DOI: https://doi.org/10.1007/s00382-024-07220-3