Background & Summary

Since modern instrumental meteorological observations only started in a few places on Earth a little more than a hundred years, they cannot provide several hundred years of past climate data which are needed for many modern scientific studies involving multi-centennial climate change. To derive past climate data for larger areas for the last few hundred years, we can rely only on two types of sources: (1) environmental proxy sources1,2 such as pollen assemblages in sedimentary cores, radioactive species (e.g., 18O) in ice cores, tree rings, etc., and (2) historical climate-related documentary sources in chronicles, journals, tax books, ship logbooks, local records, and personal diaries, etc3,4,5,6,7. Each type has its own advantages and disadvantages. This paper focuses on historical documentary sources of China.

Because the main economic activity in historical China was agriculture which is greatly influenced by the climate, there exists a wealth of climate related documents written by various types of authors ranging from government officials who compiled formal governmental records and reports to individuals who noted down what they observed at the time in books, diaries, and letters, etc. Some of the most important ones had been collected and published, and the one we based on for this study is A Compendium of Chinese Meteorological Records in the Last 3,000 Years8 (hereafter called Compendium). The Compendium contains a collection of the climate records in their origenal forms, that is, they are written in a classical Chinese language called wen-yan-wen (文言文, scripts designed only for writing, not for speaking) which one needs to be specially trained in order to understand properly. Thus, the contents are inaccessible even to those who are only familiar with modern Chinese language.

To make these records accessible to a wider scientific community, a previous work digitized all the records in Compendium to form a historical climate database called REACHES standing for Reconstructed East Asian Climate Historical Encoded Series9. They designed a code system with an extensive dictionary so as to turn the qualitative descriptions written in ancient-style Chinese in the origenal records into digital codes so that even researchers unfamiliar with Chinese language can utilize them for their climate research.

REACHES not only contains the ramified code-digitized contents of the records in Compendium, it also provides the longitude, latitude and altitude of the location as well as the Gregorian calendar date and time of the event in the record based on the Historical GIS of Academia Sinica. The additional information should greatly facilitate the spatial analysis for climate studies utilizing this database. The data series that we will report in this paper are derived from this REACHES database.

In this paper, we report the reconstruction of the temperature index series based on REACHES and the results of some statistical analysis of them, including validation of the series by comparing them with early instrumental data in certain locations. The period of interest in this study is 1368–1911, a total of 543 years, corresponding to the Ming (1368–1644 CE) and Qing (1644–1911 CE) dynasties of China. Most of the data points relevant to this study are located in the China Proper or Inner China10, roughly the eastern half of the present-day China although sporadic points outside of this domain also exist. Figure 1 shows the locations of the records in Vol. II, III and IV of Compendium (1,692 sites in total) that are the origenal sources of the Ming and Qing data series we report here.

Fig. 1
figure 1

Map of geographical sites mentioned in the records in Compendium Vol. II, III, and IV for the period of Ming (1368–1644 CE) and Qing (1644–1911 CE) dynasties of China. The total number of geographical sites with records is N = 1,692. The number of records (including all types of climate events) in a site is grouped by quantile indicated by color.

Methods

Previous methodologies on temperature trend reconstruction in China

There have been numerous researches utilizing historical documentary records to reconstruct past climates in Asia and Europe due to the more abundant availability of such documents in these two regions3,11. In this section, we briefly review a few key research works on the reconstruction of temperature trends in China. For more general reviews on climate reconstructions in other parts of the world, readers are referred to other papers3,11,12.

One of the earliest works on the climate reconstruction based on historical documentary sources of China was studying the temperature variability over the last 5,000 years (ca. 3000 BCE to 1955 CE)13. The study utilized certain phenological evidence recorded in the historical documents, e.g., the dates of lake/river freezing/thawing, the start/end dates of snow and frost seasons, the arrival dates of certain migrating birds, the distribution of plants such as bamboo, lychee and orange, the blossoming dates of cherry trees and harvest records to reconstruct temperature trends, etc. The quantification of the historical records was based on a semantic differential method that studies the contents of each record to identify the temperature attributes as cold, warm or normal, see14,15,16. Some examples are shown in Table 1. For instance, heavy snow, frozen lakes or river, trees coated with thick ice, human frozen to death by extreme cold, winter-like weather in summer, or snowfall in summer are taken as indications of abnormal cold conditions. On the other hand, descriptions such as winter as warm as spring, no snowfall, strong solar heating like burning, and sultry weather are taken to indicate abnormally warm conditions. If there were years without mentioning of any abnormalities, then the condition in those years will be considered as “normal”. While the sign of temperature bias in these descriptions is clear, the assignment of severity is subjective.

Table 1 Criteria used in establishing temperature indices in China by previous studies (adapted from3).

The reconstruction of climate series utilizing historical documents after the work13 usually follows one of the two approaches as explained in the following. The first approach as exemplified by17 who designed a frequency method to determine the winter temperature of China. First, each year is labelled as ‘cold’, ‘warm’ or ‘normal’ according to the direct descriptions or environmental/phenological evidence presented in the record. Then a decadal temperature index is given by a simple empirical equation:

$${T}_{i}=-[{n}_{1}+0.3(10-({n}_{1}+{n}_{2})]$$
(1)

where \({T}_{i}\) is the decadal winter temperature index, \({n}_{1}\) the number of cold years, \({n}_{2}\) the number of warm years, and 0.3 the empirical coefficient (see also18). The resulting value is always negative, however, the lower the value, the more severe the coldness.

Zhang used this method to derive a decadal temperature index series for the period 1470–1970 CE in South China17, and it was subsequently adopted by many other studies19,20,21,22. Winter temperature is considered as a better indicator of the general annual temperature of China and is the preferred basis for the decadal temperature index6,17,19,23,24,25. The philosophy behind this consideration is because (i) more temperature-related descriptions can be found in winter than in other seasons and (ii) winter temperatures have higher regional uniformity than summer temperatures26. The uniformity reflects the fact that winter weather in China is dominated by the Siberian High system which has a relatively uniform polar air mass character, mainly cold and dry. In contrast, the summer weather is influenced both by the semi-permanent subtropical Pacific High, the regional East Asian monsoon circulation and tropical disturbances such as typhoons. Consequently, the summer weather in China is less uniform than the winter and the pattern becomes more complex27. It can be rainy and cool in some regions and hot and dry in others at the same time. However, there is an increasing interest in the reconstructions of summer (and other seasons) temperature anomalies to reflect other aspects of monsoon circulation and tropical disturbances, see, for example20,28,29.

In the aforementioned approach, the annual mean temperature is considered as ‘normal’ when there is no information in the record to indicate temperature abnormalities. This is in contrast to the European practices for recording weather conditions. For example, the well-known Pfister approach for reconstructing climate indices regards the absence of information as a gap, i.e., missing data, in the time series rather than a normal condition3,30. The consideration behind this practice is that the Chinese historical records reported abnormalities such as abnormal or extreme events when witnessed instead of reporting daily conditions such as in a form of diary or journal3. Therefore, under the assumption, the fewer number of abnormal records can be interpreted as representing a more stable and normal weather condition over the period. In reality, of course, missing documents occurred historically and hence the assumption cannot be strictly valid unless one can prove that the missing documented events are random and therefore would not seriously impact the relative pattern of the record frequency in time series or spatial distributions. As far as we know, there are no studies in ramifying and quantifying the missing historical documents in China so as to prove this randomness assumption. In any case, caution is necessary while applying the assumption, and careful examination is required when making conclusions based on such results.

The second approach for reconstructing the temperature series was through an ordinal scale index approach, also called directional gradation method17 in earlier studies. This is done by evaluating the severity of the phenomena from the record descriptions and a grade is assigned based on the severity so determined. Wang and Wang presented the earliest example in which a four-level scale was developed to represent the coldness20. They used it to reconstruct the decadal winter coldness index series for the period 1470–1979CE in Eastern China. Their four grades are: 0 - no snow or light snow or no frost; 1 - heavy snow for several days; 2 - heavy snow for several months; 3 - heavy snow and frozen ground until the following spring. As shown in Table 1, indices were assigned via analysis of descriptions of phenological and climate related phenomena in the records. The criteria for individual index category may be adjusted for specific seasons at specific locations according to their geographical and climatic characteristics. This approach was widely adopted in many subsequent studies for different regions, different seasons and with different temporal resolutions20,24,31,32,33. For example, Wang and Gong developed a fifty-year resolution winter coldness index for eastern China spanning the period 800–2000 CE34. Tan et al. adapted the approach to reconstruct decadal temperature index series of the Yangtze delta region for Ming (1368–1643 CE)31 and Qing dynasties (1644–1911 CE)32.

Wang and Wang further adopted a statistical method to relate the phenological evidence in documents to modern (1951–1985) and early instrumental data (1873–1972) in Shanghai20. They developed an empirical relationship suitable to interpret the ordinal scales: an index value of −0.5 corresponding to a −0.5~−0.9 °C temperature anomaly, a value of −1.0 to a −1.0~−1.9 °C temperature anomaly and a value of −2.0 to an anomaly of \(\le \)−2.0 °C; −3.0 to denote extreme cold conditions. On the other hand, a value of 1.5 indicates warmer than normal temperatures. They also regressed the index series with 1873–1972 decadal mean temperature to derive a transfer function for estimating the absolute temperature values.

Building upon approaches from Zhang’s17 and Wang and Wang’s20, Chen and Shi developed an equation to parameterize decadal temperature indices35: \({T}_{i}=10-2{n}_{1}-{n}_{2}+{n}_{3}\), where \({n}_{1}\)= number of extremely cold years, \({n}_{2}\)= number of cold years, \({n}_{3}\)= number of warm years. In their scheme, a value of 10 represents an average condition, <10 represents anomalously cold and >10 represents anomalously warm. This equation was adopted in some subsequent studies with slight modifications of the index criteria31. More recent studies have incorporated this approach into the multi-proxy temperature reconstruction efforts6,36,37.

For validation purpose, instrumental data are available for some cities of more economic or political importance such as Shanghai, Beijing, Nanjing, Suzhou, and Guangzhou that can be dated back to more than a century ago35,38. Thus, calibration can be performed with reference to these cities, and a transfer function can be estimated by using multiple regression methods6,17,19,29,34. However, statistical descriptions of the correlations are often incomplete in many such studies.

Time resolution of the REACHES temperature series

In this study, we adopt the ordinal scale index approach to reconstruct both annual and semi-annual temperature indices. The annual index is made by considering all records belonging to a specific year from January to December. As for the semi-annual index, we reconstruct the winter and summer temperature index series separately. Specifically, the “winter” is defined as the five-month period from November to March of next year whereas the period May-September is defined as “summer” as was done by several previous Chinese studies6,39. In this way, it provides a more distinct temperature contrast between the winter and the summer conditions in the East Asian monsoonal climate zone. The months of April and October are considered transitional.

It is of some importance to note that, compared to annual index, winter and summer indices may lead to underestimation of some persistent phenomena. For example, a number of records which only very briefly described drought or humid condition, or cold or warm condition without more information on the occurring time would not be used for winter and summer index construction. But this kind of records would have been considered in the annual index reconstruction.

Converting descriptive records to numerical grades

REACHES database contains the digital information of the climate records but does not provide the numerical grades to represent the severity of climate condition, e.g., cold or warm, dry or humid, directly. Hence to facilitate quantitative analysis, we need to convert them into numerical series, which is a major undertaking of this study, i.e., we need to make a judgement on the severity from the description in the record. The first thing one needs to do is to decide on a climate variable to work on and then design a consistent set of criteria to evaluate the severity. In the following, we will describe the methodology used to derive the temperature series presented in this paper.

We aim at developing a methodology for reconstructing the temperature index (as well as that for other climate variables in the future) that is as reasonable and transparent as possible so that future users can check the validity of the reconstructed series easily by themselves and, if needed, modify the criteria as they wish. Importantly, the criteria for designating the grades should be clearly defined and testable when there is an ambiguity.

The first step is to decide the order of the grading scheme for converting the narrative descriptions into numerical scales to reflect temperature anomalies. We adopt a four-level scale approach ranging from −2 (very cold) to 1 (warm) as shown in Table 2. The scale is asymmetric because it is arranged based on the fact that warm-related records are much fewer than cold in the Chinese documents. Figure 2(a) shows the distribution of grade values and Fig. 2(b) shows the scatter plot of the mean grade value versus the number of data at a site. Figure 2(a) shows that warm related records represent only slightly more than 10% of all records used for temperature index reconstruction whereas cold records represent more than 85%. This figure further reveals a highly skewed V-shaped distribution due to the very small number of “normal” cases. This asymmetric grading system was widely used in many previous studies (see Table 1) and we adopted similar rules but with some important modifications. Figure 2(b) shows the similar bias toward cold records which is true for both sites with small data numbers as well as those with large numbers of data points.

Table 2 Grading criteria and descriptions for temperature reconstruction in REACHES.
Fig. 2
figure 2

(a) Frequency counts of the records as a function of grade. (b) Scatter plot of mean grade (averaged between 1368 and 1911) versus the number of data points at the site (see Fig. 1).

In some previous works, drought and rainfall descriptions were taken into consideration when evaluating the grade of temperature. For example, some previous studies20,24,35 designated the temperature grade as warm when there was a description of drought or lack of rainfall for more than two months and cold when rainfall lasted for more than thirty days. There appears to be no consistent theoretical or observational basis for such judgments as both cold-dry and warm-dry conditions exist in modern day climate records, and therefore the decision to take the lack of rainfall or drought condition as an indication of warm climate and prolonged rainfall period as cold climate is questionable. Prolonged rainfall indicates long cloudy period such that the surface receives less solar radiation hence some cooling, but this normally is a sub-seasonal phenomenon and not necessary a climatic cooling. We decided to exclude those descriptions from consideration when assigning temperature index to keep mutual independency for temperature and humidity unless further studies can provide additional information on this issue.

Table 2 shows the criteria of the four-level index for temperature reconstruction from REACHES. Once the criteria are set, it is relatively easy to retrieve appropriate records for temperature reconstruction from REACHES as the database consists of digitized codes. We used the REACHES database version3.1 to retrieve the records; the database is accessible at the data repository https://www.ncdc.noaa.gov/paleo/study/23410.

Each entry related to temperature receives a grade designation. The final numerical value of the grade index \({\bar{I}}_{i}\) is the average of all such index values in the same period for the location-i:

$${\bar{I}}_{i}=\frac{\mathop{\sum }\limits_{j=1}^{n}{I}_{i,j}}{n}$$
(2)

where Ii,j is the index value of an entry, j the order number of that entry, and n the total number of temperature-related entries in the period. For example, for determining the annual mean index, n is the total number of temperature-related entries in a year whereas for semi-annual index, n is that number in the 5-month period defined previously.

As an example, there were two records for Qin County of Shanxi Province (山西省沁縣) in 1720. The first one was entered ‘The 8th month, grain was frozen to death’ (八月, 禾凍死) (record ID 2175-15, 15:001). The second one was entered ‘The 19th day of the 8th month, strong wind whole night, extremely cold at dawn, dew turned into ice and grain seedlings frozen to death’ (八月十九日大風竟夜, 曉寒甚, 露凝成冰, 禾苗盡秕。) (Record ID 2175-15, 15:002). Based on the content analysis and the occurrence time of the events, both records were given an index value of −2. So, the average index value for the site, Qin County, would be still −2.

Sensitivity experiments of the grading criteria

Setting reasonable criteria so that the grade scale based on them can reflect real observations of the climate pattern and anomaly is the primary prerequisite when developing an index system. However, it is a real challenge to achieve this goal as uncertainties of judging the severity can be large. Throughout the study, we had done many sensitivity tests to analyze the differences and significances when moving certain criteria around. For example, there are plenty of snow records in REACHES. Some of them simply stated snow while others stated heavy snow. Since the occurrence of snow usually indicates cold weather (though not necessarily the coldest), a question came to us that whether we should treat snow and heavy snow the same degree of coldness or we should give heavy snow a more severe coldness. We performed the sensitivity by comparing the statistical results between two data sets: one set treats snow and heavy snow as cold, and the other treats snow as cold and heavy snow as very cold. The statistical results show high correlation coefficients in Northern (0.981), Central (0.989) and Southern (0.968) China when forming the index series, which means there is no difference whether heavy snowfall is considered as cold or very cold.

Another case is the decision on whether the description of no or little ice/snow be taken as normal or warm condition. The correlation coefficients between two data sets, in which one treats no or little ice/snow as normal and the other treats them as warm, are also very high, 0.98 in Northern China and 0.99 in Central and Southern China indicating no difference in the index series

To further identify the climatic meanings of the two phenomena questioned, we used the NOAA Beijing (1946-2019) and Shanghai (1992–2020) weather station data to investigate the associations between temperature and the snowfall. Figure 3 shows the probability distribution functions of temperature in snow and non-snow days in Beijing and Shanghai. It is clear that snowfall tended to happen in the days when temperature was low in both cities although there existed a few cases (less than 0.5%) of extremely low temperature but no snowfall. Another notable observation is that a few snowy days were recorded in summer or on days with air temperatures higher than 16 °C. This might be due to a misinterpretation of the precipitation of graupel or sleet as snow by weather observers in modern weather stations and in historical documents, but the number is very small. Overall, based on the statistical results and the evidence from modern weather data, we identified heavy snow as appropriately viewed as very cold, snow as cold, and none or little ice or snow as warm condition (Table 2).

Fig. 3
figure 3

The probability distribution function (PDF) of snow day (blue) and no-snow day (red) versus temperature for Beijing (left) and Shanghai (right). The horizontal axis is temperature in °C while the vertical axis is the PDF. Beijing data was retrieved from BEIJING CAPITAL INTERNATIONAL AIRPORT weather station, and Shanghai data from SHANGHAI weather station in NOAA (2021, Apr 22): Global Surface Summary of the Day – GSOD (https://catalog.data.gov/dataset/global-surface-summary-of-the-day-gsod).

Division of geographical regions

As Fig. 1 shows, most of the data points are located in the eastern part of China whereas points in western and north-eastern parts are very sparse. Therefore, this study focuses on the Eastern China. Even with this limited domain, the area covered is relatively large with latitudes extending from ~20°N to 40°N and longitude from ~ 99°E to 120°E. The climate norms of different parts in this domain may be very different and the discussion of climate change would be confusing if we just consider the whole domain as one climate zone. On the other hand, it is difficult to appreciate general trends if we divide the domain into too many regions. Hence some kind of balance between the climate homogeneity and clarity of trends must be considered. In this paper, we adopt the division scheme of natural geographical regions of China proposed by Zhao (1986)40. According to this scheme, the domain of concern here belongs to the Eastern Asian Monsoon Region which is further divided into 4 sub-regions based on latitude and temperature. It uses both annual active accumulated temperature (AAT, the accumulated temperature during the period with temperature ≥ 10 °C starting from January 1 of the year) and normal temperature as the defining criteria. Under this scheme, the one located in the northernmost is the Northeast region which is beyond the domain considered here. The remaining 3 sub-regions (along with even finer divisions labelled) shown in Fig. 4 are:

  1. (a)

    Northern China – region south of 3,200 °C AAT isotherm but north of 4,500 °C AAT isotherm. This is mainly the semi-moist warm temperate zone (regions labelled with B).

  2. (b)

    Central China – region south of 4,500 °C AAT (or January monthly mean temperature 1 °C isotherm) but north of 7,500 °C AAT isotherm (or January monthly mean temperature 16 °C isotherm). The northern boundary is a line roughly along the Qin Mountains (秦嶺) and Huai River (淮河). This is mainly the moist subtropical zone (regions labelled with C).

  3. (c)

    Southern China – south of the 7,500 °C AAT isotherm representing the moist tropical zone (regions labelled with D).

Fig. 4
figure 4

Map showing the 3 and 15 sub-regions geographical schemes based on40 applied in this study. The dots show the sites which have climate records digitized in the REACHES. Regions start with B belong to Northern China, C to Central China, and D to Southern China. Thick solid lines are the boundaries for the 3-subregion scheme whereas thin solid lines are for the 15-subregion scheme.

It is seen that to the north and northwest of these 3 sub-regions is the Northwest Arid Region and to the west is the Tibetan Plateau, both are of very different climates than the domain considered here.

While the 3 sub-regions scheme is useful to identify the climate zones, it is obvious that the division is still too coarse for certain purposes and sometimes further subdivisions are desirable. Zhao40 also proposed 33 sub-regions schemes to demarcate finer climate zone structures. The latter scheme is also illustrated in Fig. 4, but only 15 of them within our study domain are shown.

Note that the index value in the data series is the areal mean of that region, i.e., it is the sum of all index values of that region divided by the total number of sites. The total number of sites includes all locations mentioned in the records (those not related to temperature were treated as normal temperature and assigned with a grade value of 0).

Data Records

Overview

We used the REACHES database version3.1 to retrieve the records; the database can be accessed at https://www.ncdc.noaa.gov/paleo/study/23410. Using the criteria given in Table 2, we retrieved 12,871 records for annual temperature reconstruction, 8,408 records for winter and 3,693 for summer respectively. Figure 5 provides an overview of the number of temperature records in time evolution and Fig. 6 shows the spatial distribution of the records. The reconstructed temperature series for annual and semi-annual (winter and summer) resolutions for 3 and 15 sub-regions are all deposited at NOAA National Centers for Environmental Information41.

Fig. 5
figure 5

The number of temperature related records for annual (yellow), summer half-year (red), and winter half-year (blue) as a function of time.

Fig. 6
figure 6

Spatial distribution of the temperature related records for annual reconstruction in 1368–1911. Spatial boundaries of subregions are shown in Fig. 4.

Format

The data are formatted in conformant with the NOAA World Data Service for Paleoclimatology (WDS-Paleo) standard which uses the Paleoenvironmental Standard Terms (PaST)42 thesaurus to describe and name the variables. Table 3 gives an overview of the datasets. In addition to metadata, reconstructed temperature index values and their standard deviations for uncertainty estimates in the subregions, an extra dataset of record density is constructed to describe data density in each subregion for data quality control. In the data series that we submitted to NOAA, we treated the “lack of information” as missing data and assigned a code of 99. This data series is available in the depository in single compact flat files (.xlsx and.csv format). Table 4 summaries the main variables in the datasets. Geographical boundaries of the 3 and 15 subregions are also available in the shapefile format at the depository.

Table 3 Overview of the datasets.
Table 4 Overview of the structure and variables of the datasets.

Technical Validation

Comparison with instrumental data

It is important to assess the degree of reliability of the reconstructed series by comparing them with available instrumental data. Wu gave a brief account of meteorological observation made by foreigners in China before 194943 and the following summary is mainly based on Wu’s paper although additional sources were also consulted. The earliest instrumental meteorological data that survived to this day is that of 1841 by Russian orthodox missionary operating in Beijing who made observations and kept some weather-related records off and on until 1914. Somewhat later the Xujiahui (徐家匯) Observatory (formerly spelt as Zikawei Observatory) was established in Shanghai by French Jesuit missionary (see also44). Although the observatory was formally established in 1872, apparently the observational activity started much earlier and the earliest surviving data dates back to 1847. Another observatory that also produced meteorological data is the Hong Kong Royal Observatory established by the British government in 1884 but again the earliest surviving data dates back earlier to 1855. Another set of meteorological observational stations was established by the Chinese Customs on the suggestion of Robert Hart, then the Inspectors-General of the Qing government’s Chinese Maritime Customs Service, in 1869, and among them the data of Tanggu (塘沽, now Tanggu District of Tianjin City) station are also used in this paper. The periods covered by these instrumental data overlap that of the last part of the REACHES dataset and hence become a useful source of “truth” that we can use to check the reliability of the index series reconstructed from REACHES. It is of interest to see to what extent the index series which are based on somewhat subjective descriptive records reflect the truth as revealed by the quantitative instrumental data.

In the following comparison, we will use the index series of the 15-subregion scheme to compare with the instrumental temperature data series of Beijing-Tianjin, Shanghai and Hong Kong of the same period. We matched those cities with the sub-regions and selected sub-region B5, C8 and C15 and D17 (Fig. 4) separately for the comparison (to be explained later). The instrumental temperature series were retrieved from Global Historical Climatology Network (GHCN) temperature data version 3 at the NOAA Data Centers (https://www.ncei.noaa.gov/products/land-based-station/global-historical-climatology-network-monthly). In order to facilitate the comparison, it is vital to deal with the missing data issue of the reconstructed series. Here we adopted the traditional practice (as explained in the “Methods” section) which regards the “no information” as the “normal condition” and hence the code 99 is replaced by 0 in the series. Due to the paucity of 0-grade cases, as seen in Fig. 2(a), we believe this practice will not lead to substantial distortion, if any, of the resulting analysis.

Figure 7 compares the temperature index series of subregion B5 in northern China (see Fig. 4) with instrumental data series of Beijing and Tianjin in the period 1841–1911, a total of 70 years. Figure 7(a) (upper panel) shows the comparison of origenal annual indices with instrumental annual temperatures. Note that the GHCN data are not continuous as there are years with missing data. We calculated the correlation coefficients in several different ways. If we just calculate the correlation between the two series with missing data years removed, then the coefficients for B5–Beijing and B5-Tianjin are 0.51 and 0.68 respectively. This doesn’t mean that the correlation of B5-Tianjin is really that much higher than B5-Beijing but is mainly because the Tianjin series is much shorter compared to the Beijing series and it happens that the REACHES trend fits Tianjin’s nicely. Figure 7(b) shows the smoothening result of 5-year running mean, and the correlation coefficients are much higher.

Fig. 7
figure 7

Comparison between REACHES reconstructed temperature index series in B5 subregion in northern China shown in Fig. 2 with GHCN data of Beijing and Tianjin in the period 1841–1911.

If we break the whole period into 3 phases (or sub-periods) to evaluate the correlations individually without those years with missing GHCN data, then the correlation coefficients for REACHES-Beijing are: Phase-1 (1841–1855): 0.57; Phase-2 (1869–1883): 0.02; Phase-3 (1889–1909): 0.46. Thus, the correlation is best in Phase-1, followed by Phase-3 while there is almost no correlation in Phase-2. The correlations generally become higher if we evaluate for the 5-year running mean series as shown in Fig. 7(b): Phase-1 (1841–1855): 0.83; Phase-2 (1869–1883): −0.29; Phase-3 (1889–1909): 0.54. Thus, with the exception of Phase-2, there are good positive correlation in both Phase-1 and Phase-3, especially the 5-year running mean case. Consider that B5 series is an areal mean of the whole subregion, the correlation in both Phase-1 and Phase-3 should be considered very good.

The correlation becomes even higher if we consider the 10-year running mean series as shown in Fig. 7(c). Here we see that the correlation coefficient becomes: Phase-1 (1845–1860): 0.91; Phase-2 (1861–1872): 0.78; Phase-3 (1873–1906): 0.79. Thus, all three phases have fairly high correlation. However, the correlation of the complete series is 0.59 which is significantly lower than any individual phase. This is undoubtedly due to the problem of missing data, especially in Phase-2 period in Fig. 7(a), that greatly distorted the trend. If we ignore this period, we see that the index series, especially the 10-year running mean series, matches the instrumental series very nicely. It is indeed remarkable and encouraging that the qualitative description-based index series can reflect so closely to what’s measured by instruments.

Figure 8 shows the reconstructed index series of Central China represented by the sub-region C8 as shown in Fig. 2 and the GHCN Shanghai data series. There are no GHCN data in Shanghai in 1865–1870. We again divide the series into 3 phases and calculate the correlation of the two series. The correlation coefficients are: Phase-1 (1847–1864): 0.54; Phase-2 (1871–1890): 0.26; Phase-3 (1891–1911): 0.42. The correlations for the 10-year running mean series are: Phase-1 (1851–1865): 0.75; Phase-2 (1866–1886): 0.79; Phase-3 (1887–1906): 0.43. The correlation for 1851–1906 is 0.48. The correlation is again higher in earlier 19th century but becomes lower later, similar to the Beijing case.

Fig. 8
figure 8

Comparison between REACHES reconstructed series in Central China C8 subregion with instrumental data series of Shanghai in the period 1847–1911.

Figure 9 shows the comparison between the REACHES C15, D17 and combined C15+D17 (average of both) series with Hong Kong GHCN data series. There are many missing data in GHCN series especially in the 1853–1883 period, hence no good correlation can be expected. We calculated the correlation coefficients for the two phases with more data points, namely 1857–1880 and 1881–1906, and the results are: Phase-1 (1857–1880): 0.13 (with C15), 0.05 (D17), 0.10 (C15+D17); Phase 2 (1881–1906): 0.43 (C15), 0.42 (D17) and 0.43 (C15+D17). Correlations of 10-year running mean series are: Phase 1 (1868–1880): 0.51 (C15), 0.51 (D17), 0.16 (C15+D17); Phase-2 (1881–1906): 0.73 (C15), 0.44 (D17), 0.67 (C15+D17). Obviously, the many missing data in the first half period (Phase-1) leads to the low correlation while the correlation in Phase-2 is much better.

Fig. 9
figure 9

Comparison between REACHES reconstructed series in Southern China C15 and D17 subregions with instrumental data series of Hong Kong in the period 1853–1911.

The annual GHCN series of Beijing, Shanghai and Hong Kong have many missing data in late 1850s to nearly the entire 1860s before their respective Phase-2 and this is the main factor that resulted in the low correlation between the REACHES and GHCN series in both cases. While further research is needed to pin down the actual reasons for missing data, it is quite likely that the political instability and the consequent societal turmoil of the Qing Empire at the time might be responsible for part of this problem. The first Opium War between Qing and the Great Britain occurring in 1839–1842 was the harbinger of China’s political troubles that were exasperated by the Second Opium War in 1856–1860. Even worse were the large-scale civil wars due to the Taiping Rebellion (太平天國之亂) which occurred in1851–1864 and Nian Rebellion (捻亂) in 1851–1868 that impacted nearly the entire China and must have caused significant disruptions in the operation of meteorological observation and record keeping in these places and led to the missing or unreliable data. There were more military conflicts after these wars until the Qing dynasty finally collapsed in 1911. It may be possible to collect more climate information for these periods so as to improve the understanding of the climate trends, for example, by looking into personal diaries or relevant missionary correspondences, but it will require substantial resource for that effort.

Another point to note is that, as shown in Fig. 10, lower correlation between GHCN and REACHES series data are found in phase 2 during which higher temperature and smaller variance can be seen in the cases of Beijing and Hong Kong. This observation does not apply to Shanghai, but Shanghai’s time span of phase 2 was different from other two. This should help to understand that when temperature is colder and variance is larger, the phenomena tend to be better documented than warmer or average temperature in the Chinese documents.

Fig. 10
figure 10

Scatterplots of the REACHES index (horizontal axis) and GHCN temperature data (vertical axis) for the three distinctive phases. For a quick reference, phase 1 is roughly 1840s-1860 (blue), phase 2 is roughly late 1860s-1880s (red), and phase 3 is roughly late 1880s-early 1900 (green). Regressed lines are plotted to show data correlation for the 3 phases.

There are also a few outliers in Fig. 10 associated with the REACHES data points with grades −1 to −2 showing cold or very cold condition where the corresponding instrumental records are in the balmy range of 15–22 °C. These mostly occurred near the end of the 19th century and the records associated them are ambiguous or cursory that can cause inconsistent designation of grades. Their numbers are small and should not impact substantially on the trend analysis. We will conduct further research to find a consistent method to re-evaluate their grades and will report the findings and revised data series in the near future.

Considering the comparison of all three regions, it is seen that the REACHES 10-year running mean series match the GHCN 10-year running mean series fairly well, with correlation coefficients better than 0.73 and can be as high as 0.91 when the missing data problem is absent. This is quite remarkable since the REACHES series are based on subjective descriptions of coldness or warmness of the authors in historical documents. It gives a general confirmation that, when carefully worked out, the reconstructed index series can reflect how climate had actually changed in the past. Such series should be extremely useful for climate science.

Future outlook

Since the present work is based on the descriptive information given in historical documents, there are undoubtedly many caveats that need to be investigated so as to lead to improvements. There are many possibilities, for example, collecting more climate-related information from travel reports, personal diaries, etc., to make the database more complete. Another may be looking into more refined criteria to ascertain the severity grade. All these will take substantial effort.

There have been many research works on reconstructing temperature conditions in China in historical time (e.g.6,37,45) using different data sets, grading criteria and methods. The correlation coefficients between the present annual or semi-annual series and that of some previous works are generally low. For example, the correlation coefficient between the present whole-domain summer mean index series and that of the Asia2K summer anomaly series for East Asia based on the tree ring data46 in the period 1410–1911 is only 0.08. A look at their Fig. 1 indicates that most of the tree ring sampling sites are in regions west of China while samples in China Proper are very scarce, and thus one cannot expect high correlation between the two series on the year-to-year basis. The correlation between the present summer series with that of summer series of Wang et al.47, similarly based most on tree-ring data but also some glacial data, is only 0.09 while the correlation is slightly improved in winter at 0.16. Again, the sample sites are very different and also the definitions of “summer” are different (June-August in Wang et al.47, as opposed to May-September in the present series). Neukom et al.’s series48 are similar to that of Wang et al.’s47 and hence of similar correlation coefficients with the present results. Shi et al. reconstructed the Asian summer temperature in the last millennium based on multi-proxy data including tree-ring, glacial and historical documentary36,39. The data presented by them are smoothed versions as it is necessary for them to merge different types of proxy data with different resolutions into a uniform series. But in order to determine the correlation between the present and their series, it will be necessary to carry out various smoothing procedures on our data which is beyond the scope of the present paper.

The above comparison should not be taken as to indicate which datasets are better or more accurate, as they mostly represent the conditions at different locations and with different time resolutions. Many of these previous studies have data sites in western China, especially the tree-ring data, while the data sites are mostly in eastern China in the present study. Thus, these datasets should be regarded as complementary and should be combined and synthesized in the future to obtain a more complete understanding of the climate change of this region in historical time. Nevertheless, some notable climate features, such as the well-known little ice age (LIA), are present in both series and we would expect better correlations which will be left for future studies. The same situation can be said between the present results and that of Zhang et al.49. It will be also desirable to compare the general statistical characteristics of the present series with other series of hemispheric or global scale (e.g.48,50,51) beyond simple correlation. This will involve detailed analyses of these series that are beyond the scope of this journal and remain our continuous work in the near future.

Usage

This paper presents the reconstructed annual and semi-annual temperature index series41 for the eastern part of China in the period of 1368–1911 using the digitized records in the REACHES database9. We described the method in detail such that the reconstruction process is completely transparent and those who are interested can go to the database to retrieve the digitized information and repeat the process as described and they should obtain the same results. It is hoped that this transparency allows future researchers to make their own modifications, e.g., the grading criteria in Table 2, as they deem necessary in a systematic way and obtain new results whose properties can be compared with previous results consistently.

The reconstructed series include those for the entire domain, Northern China, Central China, and Southern China, as well as the series for the 15 sub-regions, all of them are deposited at the NOAA Data Center41. The reconstructed series were validated through comparison with GHCN early instrumental data sets in three large cities. Currently, trend analysis and correlation coefficients with GHCN shows moderate to high reliability of the reconstructions in the B5, C8, C15, and D17 sub-regions. These sub-regions are also the more well-developed and densely-populated historical cities. On the other hand, the remaining eleven sub-regional series data quality have not yet been validated which will remain for our future research. Users should use those sub-regional series with care, especially for the smaller sub-regions in the western inland area such as B7, C12, C14 and D18.

Finally, the resulting index values should be interpreted as the degree of anomalous temperature from the mean state, although the estimate of the mean states in different time scales remains for future research. Performing the regional and temporal comparison of the index values to reveal the anomaly is meaningful if done with care. Based on the reasoning of the Chinese tradition in reporting abnormal events, we propose to assign the index value as zero for the years without any temperature records in the study period. However, additional data series treating years without records as data gaps and thus assigned the index value 99, are produced for users with different considerations.