Abstract
We describe history matching, a method currently used to help quantify parametric uncertainty in climate models, and argue for its use in identifying and removing structural biases in climate models at the model development stage. We illustrate the method by investigating the potential to improve upon known ocean circulation biases in a coupled non-flux-adjusted climate model (the third Hadley Centre Climate Model; HadCM3). In particular, we use history matching to investigate whether the behaviour of the Antarctic Circumpolar Current (ACC), which is known to be too strong in HadCM3, represents a structural bias that could be corrected using the model parameters. We find that it is possible to improve the ACC strength using the parameters, and that doing so leads to more realistic representations of the sub-polar and sub-tropical gyres, sea surface salinities (both globally and in the North Atlantic), sea surface temperatures (SSTs) in the sinking regions of the North Atlantic and the Southern Ocean, North Atlantic Deep Water flows, global precipitation, wind fields and sea level pressure. We then use history matching to locate a region of parameter space, around 1 % of the original parameter space, that is predicted not to contain structural biases in the ACC and SSTs. We explore qualitative features of this space and show that certain key ocean and atmosphere parameters must be tuned together carefully in order to locate climates that satisfy our chosen metrics. Our study shows that tuning a climate model by varying only a handful of parameters relevant to a given process at a time will be neither as successful nor as efficient as history matching.
References
Acreman DM, Jeffery CD (2007) The use of Argo for validation and tuning of mixed layer models. Ocean Model 19:53–69
Annan JD, Hargreaves JC, Edwards NR, Marsh R (2005) Parameter estimation in an intermediate complexity earth system model using an ensemble Kalman filter. Ocean Model 8:135–154
Annan JD, Lunt DJ, Hargreaves JC, Valdes PJ (2005) Parameter estimation in an atmospheric GCM using the ensemble Kalman filter. Nonlinear Process Geophys 12:363–371
Annan JD, Hargreaves JC, Ohgaito R, Abe-Ouchi A, Emori S (2005) Efficiently constraining climate sensitivity with ensembles of paleoclimate simulations. SOLA 1:181–184. doi:10.2151/sola.2005-047
Bellprat O, Kotlarski S, Lüthi D, Schär C (2012) Objective calibration of regional climate models. J Geophys Res 117:D23115
Blaker AT, Hirschi JJ-M, McCarthy G, Sinha B, Taws S, Marsh R, de Cuevas BA, Alderson SG, Coward AC (2014) Historical analogues of the recent extreme minima observed in the Atlantic meridional overturning circulation at 26°N. Clim Dyn. doi:10.1007/s00382-014-2274-6
Challenor P, McNeall D, Gattiker J (2009) Assessing the probability of rare climate events. In: O’Hagan A, West M (eds) The handbook of applied Bayesian analysis, chap 10. Oxford University Press, Oxford
Collins M, Brierley CM, MacVean M, Booth BBB, Harris GR (2007) The sensitivity of the rate of transient climate change to ocean physics perturbations. J Clim 20:2315–2320
Craig PS, Goldstein M, Seheult AH, Smith JA, Berger JO (1996) Bayes linear strategies for matching hydrocarbon reservoir history. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds) Bayesian statistics 5. Oxford University Press, Oxford, pp 69–95
Craig PS, Goldstein M, Rougier JC, Seheult AH (2001) Bayesian forecasting for complex systems using computer simulators. J Am Stat Assoc 96:717–729
Cumming JA, Goldstein M (2009) Small sample designs for complex high-dimensional models based on fast approximations. Technometrics 51:377–388
Cunningham SA, Alderson SG, King BA (2003) Transport and variability of the Antarctic circumpolar current in Drake passage. J Geophys Res 108(C5):8084. doi:10.1029/2001JC001147
Daniel C (1973) One at a time plans. J Am Stat Assoc 68:353–360
Draper NR, Smith H (1998) Applied regression analysis, 3rd edn. Wiley, New York
Edwards NR, Cameron D, Rougier JC (2011) Precalibrating an intermediate complexity climate model. Clim Dyn 37:1469–1482
Fisher R (1926) The arrangement of field experiments. J Minist Agric Great Br 33:503–513
Friedman M, Savage LJ (1947) Planning experiments seeking maxima. In: Eisenhart C, Hastay MW, Wallis WA (eds) Techniques of statistical analysis. McGraw-Hill, New York
Gent PR, Danabasoglu G, Donner LJ, Holland MM, Hunke EC, Jayne SR, Lawrence DM, Neale RB, Rasch PJ, Vertenstein M, Worley PH, Yang Z, Zhang M (2011) The community climate system model version 4. J Clim 24:4973–4991
Gladstone RM, Lee V, Rougier JC, Payne AJ, Hellmer H, Le Brocq A, Shepherd A, Edwards TL, Gregory J, Cornford SL (2012) Calibrated prediction of Pine Island Glacier retreat during the 21st and 22nd centuries with a coupled flow line model. Earth Planet Sci Lett 333–334:191–199
Goldstein M, Rougier JC (2009) Reified Bayesian modelling and inference for physical systems. J Stat Plan Inference 139:1221–1239
Gordon C, Cooper C, Senior CA, Banks H, Gregory JM, Johns TC, Mitchell JFB, Wood RA (2000) The simulation of SST, sea ice extents and ocean heat transports in a version of the Hadley Centre coupled model without flux adjustments. Clim Dyn 16:147–168
Hargreaves JC, Annan JD, Edwards NR, Marsh R (2004) An efficient climate forecasting method using an intermediate complexity earth system model and the ensemble Kalman filter. Clim Dyn 23:745–760
Haylock R, O’Hagan A (1996) On inference for outputs of computationally expensive algorithms with uncertainty on the inputs. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM (eds) Bayesian statistics 5. Oxford University Press, Oxford, pp 629–637
Ingleby B, Huddleston M (2007) Quality control of ocean temperature and salinity profiles—historical and real-time data. J Mar Syst 65:158–175. doi:10.1016/j.jmarsys.2005.11.019
Johns et al (2006) The New Hadley Centre Climate Model (HadGEM1): evaluation of coupled simulations. J Clim 19:1327–1353
Johns et al (2008) Variability of shallow and deep Western boundary currents off the Bahamas during 2004–05: results from the 26°N RAPID-MOC array. J Phys Oceanogr 38:605–623
Joshi MM, Webb MJ, Maycock AC, Collins M (2010) Stratospheric water vapour and high climate sensitivity in a version of the HadSM3 climate model. Atmos Chem Phys 10:7161–7167
Kalnay et al (1996) The NCEP/NCAR 40-year reanalysis project. Bull Am Meteorol Soc 77:437–470
Kennedy MC, O’Hagan A (2000) Predicting the output from a complex computer code when fast approximations are available. Biometrika 87:1–13
Kennedy MC, O’Hagan A (2001) Bayesian calibration of computer models. J R Stat Soc Ser B 63:425–464
Kraus EB, Turner J (1967) A one dimensional model of the seasonal thermocline II. The general theory and its consequences. Tellus 19:98–106
Le Gratiet L (2014) Bayesian analysis of hierarchical multifidelity codes. SIAM/ASA J Uncertain Quantif 1:244–269
Lee LA, Carslaw KS, Pringle KJ, Mann GW, Spracklen DV (2011) Emulation of a complex global aerosol model to quantify sensitivity to uncertain parameters. Atmos Chem Phys 11:12253–12273. doi:10.5194/acp-11-12253-2011
Loeppky JL, Sacks J, Welch WJ (2009) Choosing the sample size of a computer experiment: a practical guide. Technometrics 51(4):366–376
Martin GM, Milton SF, Senior CA, Brooks ME, Ineson S, Reichler T, Kim J (2010) Analysis and reduction of systematic errors through a seamless approach to modelling weather and climate. J Clim 23:5933–5957
Martin GM et al (2011) The HadGEM2 family of met office unified model climate configurations. Geosci Model Dev 4:723–757
Mauritsen T, Stevens B, Roeckner E, Crueger T, Esch M, Giorgetta M, Haak H, Jungclaus J, Klocke D, Matei D, Mikolajewicz U, Notz D, Pincus R, Schmidt H, Tomassini L (2012) Tuning the climate of a global model. J Adv Model Earth Syst 4:M00A01. doi:10.1029/2012MS000154
McNeall DJ, Challenor PG, Gattiker JR, Stone EJ (2013) The potential of an observational data set for calibration of a computationally expensive computer model. Geosci Model Dev 6:1715–1728
Meehl GA, Covey C, Delworth T, Latif M, McAvaney B, Mitchell JFB, Stouffer RJ, Taylor KE (2007) The WCRP CMIP3 multi-model dataset: a new era in climate change research. Bull Am Meteorol Soc 88:1383–1394
Megann AP et al (2010) The sensitivity of a coupled climate model to its ocean component. J Clim 23:5126–5150. doi:10.1175/2010JCLI3394.1
Meijers AJS, Shuckburgh E, Bruneau N, Sallee JB, Bracegirdle TJ, Wang Z (2012) Representation of the Antarctic circumpolar current in the CMIP5 climate models and future changes under warming scenarios. J Geophys Res 117:C12008. doi:10.1029/2012JC008412
Meinen CS, Garzoli SL (2014) Attribution of deep western boundary current variability at 26.5°N. Deep Sea Res I 90:81–90. doi:10.1016/j.dsr.2014.04.016
Murphy JM, Sexton DMH, Jenkins GJ, Booth BBB, Brown CC, Clark RT, Collins M, Harris GR, Kendon EJ, Betts RA, Brown SJ, Humphrey KA, McCarthy MP, McDonald RE, Stephens A, Wallace C, Warren R, Wilby R, Wood R (2009) UK climate projections science report: climate change projections. Met Office Hadley Centre, Exeter. http://ukclimateprojections.defra.gov.uk/images/stories/projections_pdfs/UKCP09_Projections_V2
Pope VD, Gallani ML, Rowntree PR, Stratton RA (2000) The impact of new physical parameterizations in the Hadley Centre climate model: HadAM3. Clim Dyn 16:123–146
Pukelsheim F (1994) The three sigma rule. Am Stat 48:88–91
Randall DA et al (2007) Climate models and their evaluation. In: Solomon S et al (eds) Climate change 2007: the physical science basis. Cambridge University Press, Cambridge, pp 589–662
Rougier JC (2007) Probabilistic inference for future climate using an ensemble of climate model evaluations. Clim Change 81:247–264
Rougier JC (2008) Efficient emulators for multivariate deterministic functions. J Comput Graph Stat 17:827–843
Rougier JC, Sexton DMH, Murphy JM, Stainforth D (2009) Emulating the sensitivity of the HadSM3 climate model using ensembles from different but related experiments. J Clim 22:3540–3557
Rougier JC (2013) ‘Intractable and unsolved’: some thoughts on statistical data assimilation with uncertain static parameters. Phil Trans R Soc A 371:20120297. doi:10.1098/rsta.2012.0297
Rowlands DJ, Frame DJ, Ackerley D, Aina T, Booth BBB, Christensen C, Collins M, Faull N, Forest CE, Grandey BS, Gryspeerdt E, Highwood EJ, Ingram WJ, Knight S, Lopez A, Massey N, McNamara F, Meinshausen N, Piani C, Rosier SM, Sanderson BJ, Smith LA, Stone DA, Thurston M, Yamazaki K, Yamazaki YH, Allen MR (2012) Broad range of 2050 warming from an observationally constrained large climate model ensemble. Nat Geosci doi:10.1038/NGEO1430
Russell JL, Stouffer RJ, Dixon KW (2006) Intercomparisons of the Southern Ocean circulations in IPCC coupled model control simulations. J Clim 19:4560–4575
Sacks J, Welch WJ, Mitchell TJ, Wynn HP (1989) Design and analysis of computer experiments. Stat Sci 4:409–435
Santner TJ, Williams BJ, Notz WI (2003) The design and analysis of computer experiments. Springer, New York
Schmittner A, Urban NM, Shakun JD, Mahowald NM, Clark PU, Bartlein PJ, Mix AC, Rosell-Mele A (2011) Climate sensitivity estimated from temperature reconstructions of the last glacial maximum. Science 334:1385
Severijns CA, Hazeleger W (2005) Optimizing parameters in an atmospheric general circulation model. J Clim 18:3527–3535
Sexton DMH, Murphy JM, Collins M (2011) Multivariate probabilistic projections using imperfect climate models part 1: outline of methodology. Clim Dyn. doi:10.1007/s00382-011-1208-9
Shaffrey L et al (2009) UK-HiGEM: the new UK high resolution global environment model. Model description and basic evaluation. J Clim 22(8):1861–1896. doi:10.1175/2008JCLI2508.1
Solomon S, Qin D, Manning M, Chen Z, Marquis M, Averyt KB, Tignor M, Miller HL (eds) (2007) Contribution of working group I to the fourth assessment report of the intergovernmental panel on climate change, 2007. Cambridge University Press, Cambridge
Uppala et al (2005) The ERA-40 re-analysis. Q J R Meteorol Soc 131:2961–3012
Vernon I, Goldstein M, Bower RG (2010) Galaxy formation: a Bayesian uncertainty analysis. Bayesian Anal 5(4):619–846 with Discussion
Watanabe M, Suzuki T, O’Ishi R, Komuro Y, Watanabe S, Emori S, Takemura T, Chikira M, Ogura T, Sekiguchi M, Takata K, Yamazaki D, Yokohata T, Nozawa T, Hasumi H, Tatebe H, Kimoto M (2010) Improved climate simulation by MIROC5: mean states, variability and climate sensitivity. J Clim 23:6312–6335
Williamson D (2010) Policy making using computer simulators for complex physical systems: Bayesian decision support for the development of adaptive strategies, Ph.D. thesis, Durham University
Williamson D, Goldstein M, Blaker A (2012) Fast linked analyses for scenario based hierarchies. J R Stat Soc Ser C 61(5):665–692. doi:10.1111/j.1467-9876.2012.01042.x
Williamson D, Goldstein M, Allison L, Blaker A, Challenor P, Jackson L, Yamazaki K (2013) History matching for exploring and reducing climate model parameter space using observations and a large perturbed physics ensemble. Clim Dyn 41:1703–1729. doi:10.1007/s00382-013-1896-4
Williamson D, Blaker AT (2014) Evolving Bayesian emulators for structurally chaotic time series with application to large climate models. SIAM/ASA J Uncertain Quantif 2(1):1–28. doi:10.1137/120900915
Williamson D, Vernon IR (2013) Implausibility driven evolutionary Monte Carlo for efficient generation of uniform and optimal designs for multi-wave computer experiments. arXiv:1309.3520 (in revision)
Xie P, Arkin PA (1997) Global precipitation: a 17-year monthly analysis based on gauge observations, satellite estimates, and numerical model outputs. Bull Am Meteorol Soc 78:2539–2558
Yamazaki K, Rowlands DJ, Aina T, Blaker A, Bowery A, Massey N, Miller J, Rye C, Tett SFB, Williamson D, Yamazaki YH, Allen MR (2012) Obtaining diverse behaviours in a climate model without the use of flux adjustments. J Geophys Res Atmos 118:2781–2793. doi:10.1002/jgrd.50304
Acknowledgments
This research was funded by the NERC RAPID-RAPIT project (NE/G015368/1). Daniel Williamson was also funded by an EPSRC fellowship, grant number EP/K019112/1. We would like to thank the CPDN team for their work on submitting our ensemble to CPDN users. We would also like to thank the Institute of Advanced Study at Durham University for funding and hosting our workshop on ocean model discrepancy, which formed the motivation for these investigations, and we thank the oceanographers who participated in this workshop. We also thank the CPDN users around the world who contributed their spare computing resources to the generation of our ensemble. NCEP and CMAP Precipitation data were provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their website at http://www.esrl.noaa.gov/psd/.
Appendices
Appendix 1: Building emulators for history matching
What follows is a brief description of the methods we used to construct emulators for the constraints described in this paper. An emulator for element \(i\) of \(f(x)\) might typically be fitted as
\[ f_i(x) = \beta g(x) + \epsilon (x), \qquad (2) \]
where \(g(x)\) is a vector of specified functions of \(x\), \(\beta\) is a matrix of coefficients, and \(\epsilon (x)\) is a stochastic process with a specified covariance function. As discussed in Sect. 2, there are many ways to build emulators, and the way that is chosen will depend on the size of the PPE available, the type of constraint we wish to emulate and the relationships we find between the data and the parameters. In this study we had access to large ensembles, and each of our constraints was a univariate quantity, so it required less sophisticated modelling than a spatial field or time series might. Hence we fit the emulator mean functions, \(\beta g(x)\) in Eq. (2), using the stepwise regression procedure described below.
The functions we considered adding to \(g(x)\) were linear, quadratic and cubic terms in each of the parameters, with interactions of up to third order between all parameters. Switch parameters were treated as factors (variables with a small number of distinct possible “levels”), and interactions between factors and all continuous parameters were permitted. For a list of the parameters varied in the ensemble, see Appendix 2.
Our fitting procedure begins with a “forward selection”, in which each allowed term may be added to \(g(x)\) only in its lowest available form. For example, if the linear term for \(x_1\) is not yet in \(g(x)\), then \(x_1\) is available for selection but \(x_1^{2}\) is not. When \(x_1\) is added to \(g(x)\), all first order interactions between \(x_1\) and the other linear terms already in \(g(x)\) are included with it, and \(x_1^{2}\) then becomes available for selection. So, if \(g(x)\) is \((1, x_2)\), selecting \(x_1\) means that \(g(x)\) becomes \((1,\,x_1,\,x_2,\,x_1*x_2)\). At the next iteration we may select any of the other parameters, but we may also add the quadratic terms \(x_2^{2}\) and \(x_1^{2}\). We add interactions in this way, and do likewise for third order interactions once quadratic terms have been included, so that the resulting emulator will be robust to changes of scale (see Draper and Smith 1998, for discussion). At each iteration, the term added to \(g(x)\) is the available term that most reduces the residual sum of squares after fitting by ordinary least squares.
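A minimal sketch of the forward selection step, in Python with NumPy, may help make the hierarchy rule concrete. It is an illustration, not the authors' code, and rests on several assumptions: it stops at quadratic and pairwise interaction terms (the cubic and third order terms are omitted), treats every parameter as continuous, indexes parameters from 0, and replaces the analyst's judgement about when to stop with a hypothetical relative tolerance `tol`. Terms are encoded as tuples of parameter indices whose product forms a column of \(g(x)\).

```python
import numpy as np


def design_matrix(X, terms):
    """Build g(x) column by column. () is the intercept, (i,) is x_i,
    (i, i) is x_i^2 and (i, j) is the interaction x_i * x_j."""
    cols = [np.prod(X[:, list(t)], axis=1) if t else np.ones(len(X)) for t in terms]
    return np.column_stack(cols)


def rss(X, y, terms):
    """Residual sum of squares after an ordinary least squares fit on g(x)."""
    G = design_matrix(X, terms)
    beta, *_ = np.linalg.lstsq(G, y, rcond=None)
    r = y - G @ beta
    return float(r @ r)


def candidate_moves(terms, p):
    """Each move lists the terms that enter g(x) together under the hierarchy rule."""
    linear = [t[0] for t in terms if len(t) == 1]
    moves = []
    for i in range(p):
        if (i,) not in terms:
            # a new linear term brings its interactions with the linear terms already present
            moves.append([(i,)] + [tuple(sorted((i, j))) for j in linear])
        elif (i, i) not in terms:
            # the quadratic term only becomes available once x_i is in g(x)
            moves.append([(i, i)])
    return moves


def forward_select(X, y, max_steps, tol=1e-3):
    terms = [()]  # start from the intercept alone
    current = rss(X, y, terms)
    for _ in range(max_steps):
        moves = candidate_moves(terms, X.shape[1])
        if not moves:
            break
        scores = [rss(X, y, terms + m) for m in moves]
        best = int(np.argmin(scores))
        # hypothetical stand-in for the analyst's judgement: stop once the gain is negligible
        if current - scores[best] < tol * current:
            break
        terms, current = terms + moves[best], scores[best]
    return terms


# Synthetic example: only x_0 matters, linearly and quadratically.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))  # hypothetical scaled parameters
y = 1 + 2 * X[:, 0] + 3 * X[:, 0] ** 2 + rng.normal(0, 0.05, 200)
selected = forward_select(X, y, max_steps=2)
```

Bundling a linear term with its interactions into one move mirrors the rule in the text that interactions enter alongside the linear term that creates them, rather than competing for selection on their own.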
When it becomes clear that adding more terms is not improving the predictive power of the emulator (a judgement made by the analyst based on the proportion of variability explained by the emulator and on plots of the residuals from the fit), we begin a backward elimination algorithm. This removes terms from \(g(x)\) strictly one at a time, at each step choosing the term with the least contribution to the sum of squares explained by the fit, provided its removal does not compromise the quality of the fit. Lower order terms are not permitted to be removed from \(g(x)\) while higher order terms containing them remain. We stop when removing the next term chosen by the algorithm would lead to a poorer statistical model. For more details on stepwise methods such as these, see Draper and Smith (1998).
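Backward elimination under the same hierarchy rule might look like the following sketch. The text leaves the judgement of a “poorer statistical model” to the analyst; here adjusted \(R^2\) is substituted as a hypothetical criterion, which is our assumption rather than the paper's. Terms are tuples of 0-based parameter indices, so `(0, 1)` stands for \(x_0 * x_1\) and `(1, 1)` for \(x_1^2\).

```python
import numpy as np
from collections import Counter


def design_matrix(X, terms):
    """Columns of g(x); each term is a tuple of parameter indices multiplied together."""
    cols = [np.prod(X[:, list(t)], axis=1) if t else np.ones(len(X)) for t in terms]
    return np.column_stack(cols)


def adjusted_r2(X, y, terms):
    """Adjusted R^2 of the OLS fit, used here as a hypothetical 'quality of fit' criterion."""
    G = design_matrix(X, terms)
    beta, *_ = np.linalg.lstsq(G, y, rcond=None)
    resid = y - G @ beta
    n, k = G.shape
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1 - (float(resid @ resid) / (n - k)) / (ss_tot / (n - 1))


def removable(term, terms):
    """The intercept stays, and a lower order term stays while any higher order
    term containing it (e.g. x_0 inside x_0 * x_1 or x_0^2) is still present."""
    if not term:
        return False
    return not any(len(u) > len(term) and not (Counter(term) - Counter(u)) for u in terms)


def backward_eliminate(X, y, terms):
    terms = list(terms)
    while True:
        current = adjusted_r2(X, y, terms)
        candidates = [t for t in terms if removable(t, terms)]
        if not candidates:
            return terms
        # the candidate whose removal loses the least explained variation
        scores = {t: adjusted_r2(X, y, [u for u in terms if u != t]) for t in candidates}
        best = max(scores, key=scores.get)
        if scores[best] < current:  # removing it would give a poorer model, so stop
            return terms
        terms.remove(best)


# Synthetic example: x_1 is irrelevant, x_0 enters linearly.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 2))
y = 1 + 2 * X[:, 0] + rng.normal(0, 0.1, 200)
start = [(), (0,), (1,), (0, 1), (1, 1)]
kept = backward_eliminate(X, y, start)
```

Because a term can only be dropped once nothing built from it remains, the surviving \(g(x)\) is always hierarchical, whichever criterion is used to rank candidates.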
We allow \(\epsilon (x)\) in Eq. (2) to be a mean-zero error with variance specified by the residual variability from the fits, and with no correlation between \(\epsilon (x)\) and \(\epsilon (x^{\prime })\) for \(x\ne x^{\prime }\). This lack of correlation might not be appropriate if we had smaller ensembles or, perhaps, if we had completed a number of waves of history matching and were focussing on a densely sampled subset of parameter space. Here, however, it is computationally efficient and a reasonable approximation to the data, so we adopt it for pragmatic reasons. Including a more complex correlation structure would reduce our emulator uncertainty and would likely lead to more parameter space being ruled out, though at a computational cost. Note that, with zero correlation between any two points, the emulator does not interpolate the ensemble members and has non-zero variance at each of them. Although the model is deterministic (in the sense that running it twice with the same values of the model inputs returns the same answer), it also displays sensitive dependence on initial conditions; hence the fitted variance at the design points represents the model’s internal variability. This form of emulator effectively assumes that internal variability is constant throughout parameter space, and hence that it can be estimated from the ensemble as part of the fitting procedure.
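The emulator described above, a regression mean \(\beta g(x)\) plus an uncorrelated error whose variance is the residual variance, can be sketched as a small Python class. The basis function `quadratic_basis` and the synthetic data are hypothetical; the point of the sketch is that the predictive variance is the same constant everywhere, including at the design points, so the emulator does not reproduce the ensemble members exactly.

```python
import numpy as np


class RegressionEmulator:
    """Sketch of f_i(x) = beta g(x) + eps(x), with eps(x) mean zero, constant
    variance and uncorrelated between distinct parameter settings."""

    def __init__(self, basis):
        self.basis = basis  # maps an (n, p) array of parameters to the (n, k) matrix of g(x)

    def fit(self, X, y):
        G = self.basis(X)
        self.beta, *_ = np.linalg.lstsq(G, y, rcond=None)
        resid = y - G @ self.beta
        n, k = G.shape
        # the residual variance plays the role of the model's internal variability
        self.sigma2 = float(resid @ resid) / (n - k)
        return self

    def predict(self, X):
        mean = self.basis(X) @ self.beta
        # no correlation term, so the variance is identical everywhere, design points included
        var = np.full(mean.shape, self.sigma2)
        return mean, var


def quadratic_basis(X):
    """Hypothetical g(x) for a single parameter: (1, x, x^2)."""
    return np.column_stack([np.ones(len(X)), X[:, 0], X[:, 0] ** 2])


rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(300, 1))
# the added noise mimics internal variability in a hypothetical ensemble
y = 0.5 - X[:, 0] + 2 * X[:, 0] ** 2 + rng.normal(0, 0.2, 300)
em = RegressionEmulator(quadratic_basis).fit(X, y)
mean_at_design, var_at_design = em.predict(X)
```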
After fitting each emulator, we validate it using a randomly chosen \(10\,\%\) of the ensemble that was withheld from the training data before the fit. This involves checking that the emulator predicts each of the unseen ensemble members to within the accuracy implied by the emulator uncertainty. If the emulators pass this diagnostic check, we then use them in our history matching.
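The hold-out check can be sketched as follows, assuming an ordinary least squares mean function and the constant-variance error described above. The function name `holdout_validation`, the linear basis, the 2 standard deviation threshold (borrowed from the description of the ACC validation plot) and the synthetic 650-run ensemble are all illustrative assumptions, not details of the authors' implementation.

```python
import numpy as np


def holdout_validation(X, y, basis, frac=0.1, n_sd=2.0, seed=0):
    """Withhold a random fraction of runs, fit on the rest by OLS with a constant
    residual variance, and report the share of withheld runs within n_sd sd."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(frac * len(y)))
    test, train = idx[:n_test], idx[n_test:]
    G = basis(X[train])
    beta, *_ = np.linalg.lstsq(G, y[train], rcond=None)
    resid = y[train] - G @ beta
    sigma2 = float(resid @ resid) / (len(train) - G.shape[1])
    # standardised prediction errors for the unseen runs
    z = (y[test] - basis(X[test]) @ beta) / np.sqrt(sigma2)
    return test, float(np.mean(np.abs(z) <= n_sd))


def linear_basis(X):
    """Hypothetical g(x) = (1, x_1, ..., x_p)."""
    return np.column_stack([np.ones(len(X)), X])


rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(650, 4))  # hypothetical ensemble: 650 runs, 4 parameters
y = X @ np.array([1.0, -0.5, 0.2, 0.0]) + rng.normal(0, 0.3, 650)
test_idx, coverage = holdout_validation(X, y, linear_basis)
```

With roughly Gaussian errors, about 95 % of withheld runs should fall within 2 standard deviations; a much lower share would suggest an inadequate mean function or an underestimated variance.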
We give details of our emulator for the ACC strength in HadCM3 to illustrate the complexity of the mean function and the performance of the predictions. The terms selected in \(g(x)\) are displayed in Table 1. Each header corresponds to the label given to each of the parameters in Table 2. Numbers on the diagonal of the table give the order of the parameter included in the emulator; for example, the number 2 implies that both linear and quadratic terms in that parameter were included in \(g(x)\). Numbers in the upper triangle indicate the inclusion (1) or not (0) of the interaction between the two relevant parameters in \(g(x)\). So, reading from the first row of the table, the term (entcoef \(*\) ct) is included in \(g(x)\), but the term (entcoef \(*\) rhcrit) is not. Variables in bold in the lower triangle indicate the inclusion of the given third order interaction. For example, the table indicates that the term (AH1_SI\(^{2}\) \(*\) CWland) is in \(g(x)\). In addition to the terms in the table, the factor r_layers and a linear term in the parameter charnock are included, as is the constant 1, so that an intercept is fitted.
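To make the table encoding concrete, the hypothetical helper below expands such a summary into a list of terms. Only the pairs mentioned in the text (entcoef \(*\) ct included, entcoef \(*\) rhcrit excluded, AH1_SI\(^2\) \(*\) CWland included) come from the paper; the diagonal orders are invented, and reading a bold lower-triangle entry as “bold parameter squared times the other parameter” is our interpretation of the single example given.

```python
def terms_from_table(diag, upper, lower_bold):
    """Hypothetical reading of a Table 1 style summary of g(x).

    diag:       {parameter: order}, e.g. 2 means both linear and quadratic terms
    upper:      set of (a, b) pairs whose interaction a * b is included (upper triangle 1s)
    lower_bold: set of (a, b) pairs read as the third order term a^2 * b (a shown in bold)
    """
    terms = ["1"]  # the intercept
    for name, order in diag.items():
        terms += [name if k == 1 else f"{name}^{k}" for k in range(1, order + 1)]
    terms += [f"{a}*{b}" for a, b in sorted(upper)]
    terms += [f"{a}^2*{b}" for a, b in sorted(lower_bold)]
    return terms


# Made-up orders; only the included/excluded pairs echo the text.
g_terms = terms_from_table(
    diag={"entcoef": 2, "ct": 1, "rhcrit": 1, "AH1_SI": 2, "CWland": 1},
    upper={("entcoef", "ct"), ("AH1_SI", "CWland")},
    lower_bold={("AH1_SI", "CWland")},
)
```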
Figure 13 shows a validation plot for the ACC emulator. For 65 randomly chosen PPE members that were withheld from the emulator at the fitting stage, the data are sorted by ACC strength and plotted in red. We overlay the emulator predictions (black points) and the uncertainty on those predictions (error bars), which represents approximately 2 standard deviations for each prediction. The predictions are generally good, with most unseen PPE members lying within the uncertainty on the prediction. In fact, our uncertainty specification may be too conservative, in that we have allowed for more uncertainty in the predictions than is required. If this is the case, it would lead to less space being ruled out by history matching, not more, and we prefer to remain conservative when ruling out regions of parameter space.
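Why conservative emulator uncertainty rules out less space can be seen from the implausibility measure commonly used in history matching (e.g. Craig et al. 1996; Williamson et al. 2013): the distance between an observation and the emulator mean, standardised by the combined emulator, observation error and model discrepancy variances, with settings above a cutoff (often 3, following the three sigma rule of Pukelsheim 1994) ruled out. This appendix does not restate that form, so the sketch below assumes it, and every number in it is made up. Inflating the emulator variance can only lower the implausibility, so every setting kept under the tighter variance is also kept under the looser one.

```python
import numpy as np


def implausibility(z, em_mean, em_var, obs_var, disc_var):
    """Standardised distance between an observation z and the emulator prediction."""
    return np.abs(z - em_mean) / np.sqrt(em_var + obs_var + disc_var)


def not_ruled_out(z, em_mean, em_var, obs_var, disc_var, cutoff=3.0):
    """Keep parameter settings whose implausibility does not exceed the cutoff."""
    return implausibility(z, em_mean, em_var, obs_var, disc_var) <= cutoff


# Hypothetical emulator means at 201 parameter settings, with made-up variances.
em_mean = np.linspace(-10.0, 10.0, 201)
keep_tight = not_ruled_out(0.0, em_mean, em_var=1.0, obs_var=0.5, disc_var=0.5)
keep_loose = not_ruled_out(0.0, em_mean, em_var=4.0, obs_var=0.5, disc_var=0.5)
```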
Appendix 2: Tables
Tables 2, 3 and 4 give descriptions and ranges for the parameters and the settings of the switches used in our ensemble. Some parameters have relationships with other model parameters, given to us by the Met Office, such that a change in one leads to a derivable value for the other: CWland also determines CWsea, the cloud droplet to rain threshold over sea (\(\hbox {kg/m}^3\)); MinSIA also determines dtice (the ocean ice diffusion coefficient); and k_gwd also determines kay_lee_gwave (the trapped lee wave constant for surface gravity waves, \(\hbox {m}^{3/2}\)).
Appendix 3: Another NROY ACC model
In the main text we present the behaviour of one of the NROY ACC models, arguing that correcting the ACC strength appears to improve the ocean circulation. To save space we do not reproduce all of the figures from the main text, but we show the BSF of another of these models in Fig. 14 to indicate that the chosen model was not a “one-off”. This model has a slightly more physically realistic sub-polar gyre, at the expense of a more diffuse Gulf Stream. The cold bias in the North Atlantic (not shown) was also greater in this model.
Appendix 4: Anomalies from two additional precipitation climatologies
In the main text we present precipitation anomalies relative to the ERA-40 climatology. However, we caution the reader against interpreting the improvements seen in the improved ACC run as robust. To illustrate why, we present in Fig. 15 anomalies relative to two alternative precipitation climatologies: the CPC Merged Analysis of Precipitation (CMAP) (Xie and Arkin 1997) and the NCEP/NCAR reanalysis (Kalnay et al. 1996). These plots indicate that the standard model does tend towards higher than observed precipitation along the ITCZ and over the maritime continent, whereas the improved ACC run we present tends to exhibit lower than observed precipitation over the western equatorial Pacific. Outside the tropics, where the standard run tends towards higher than observed precipitation, the improved ACC run performs better than the standard run everywhere.
Cite this article
Williamson, D., Blaker, A.T., Hampton, C. et al. Identifying and removing structural biases in climate models with history matching. Clim Dyn 45, 1299–1324 (2015). https://doi.org/10.1007/s00382-014-2378-z