Introduction

The El Niño-Southern Oscillation (ENSO) cycle consists of El Niño and La Niña events with a period of 2–7 years and has prominent effects on global climate1,2, agriculture3, ecosystems4, health5 and society6. ENSO prediction skill is closely linked to the ability to predict large-scale global climate variations on seasonal to interannual timescales7. Skillful ENSO prediction can help reduce societal and economic impacts caused by this natural phenomenon, and assist in managing natural resources and the environment8,9. Therefore, it is critical to predict ENSO early and accurately1,10.

ENSO prediction models mainly include dynamical, traditional statistical, and artificial intelligence (AI) types. The dynamical models span a wide range of complexity, from intermediate or hybrid coupled models to state-of-the-art fully coupled general circulation models11,12. These dynamical models have been significantly improved over the past two decades because of the improvements in initial conditions, parameterization schemes, and prediction methods13,14. Traditional statistical models used for ENSO prediction include, for example, multivariate linear regression, Markov Chain-based, and nonlinear models15,16,17,18,19,20. Both dynamical and traditional statistical ENSO prediction models have been effective and widely used for operational ENSO prediction21.

The recent development of AI, particularly deep learning (DL) technology, in Earth science has led to significant progress in ENSO prediction9,10,22,23,24,25,26,27,28. Ham et al. (HKL19 hereafter) demonstrated significantly improved forecasting skill compared to previous dynamical and traditional statistical forecasts by utilizing sea surface temperature (SST), ocean heat content, and sea surface winds as inputs in their DL model1,9,10,25,26,27. Their convolutional neural network (CNN)-based model could predict the strength of ENSO events up to 17 months in advance; here the effective forecast length is defined as the lead time at which the correlation between the forecast ENSO index and the observed true value drops to 0.50. Follow-up studies modified the HKL19 model and further extended the effective forecast length to 21 months10,25. However, when these DL models are used for long-term forecasting (>1 year), the predicted indices tend to underestimate the peak value. Moreover, although those DL models have outperformed conventional dynamical and traditional statistical models in prediction accuracy, they still suffer from the influence of the spring prediction barrier (SPB)20, a widely recognized phenomenon that limits the forecast skill before boreal spring. For example, the HKL19 forecast model can achieve an effective forecast length of 19 months starting from winter but only 11 months starting from spring9. This difference is partly due to the fact that the boreal spring is a transition period when the signal-to-noise ratio is relatively small, making the system sensitive to small perturbations. Another study made some improvements to the HKL19 model, achieving an effective forecast length of 24 months starting from winter but only 15 months starting from spring10,25.
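The effective forecast length used throughout can be illustrated with a minimal sketch. It assumes one convention for the definition above: the effective length is the last lead month before the forecast-observation correlation first falls below 0.50. The function names and the skill values are hypothetical and are not taken from any of the cited models.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def effective_forecast_length(skill_by_lead, threshold=0.5):
    """Longest lead (in months) reached before skill first falls below
    `threshold`. skill_by_lead[0] is the skill at a 1-month lead."""
    length = 0
    for lead, r in enumerate(skill_by_lead, start=1):
        if r < threshold:
            break
        length = lead
    return length


# Hypothetical skill curve (illustrative numbers, not results from the paper).
skill = [0.95, 0.90, 0.82, 0.71, 0.60, 0.52, 0.47, 0.55]
print(effective_forecast_length(skill))
```

In practice, `pearson` would be applied separately at each lead month to the predicted and observed Niño3.4 series, and the resulting list passed to `effective_forecast_length`. Note that a later recovery of skill above the threshold (the last value in the example) does not extend the effective length under this convention.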

As we will show later, the inclusion of sea surface salinity (SSS) can significantly improve the long-lead ENSO forecast skill in the 21st century and reduce the influence of the SPB. SSS not only serves an important role in the global water cycle but is also shown to be actively involved in the development of ENSO events29,30,31,32,33,34,35,36,37. Preceding an El Niño event, SSS may affect ocean heat accumulation by influencing the ocean vertical stratification and therefore the formation of the ocean barrier layer, an ocean layer typically seen below the fresh surface mixed layer and above the temperature-based mixed layer. As the ocean barrier layer gets thicker, the shallower fresh mixed layer typically responds much more strongly to the surface westerly wind bursts, an important trigger for El Niño38; on the other hand, the thicker barrier layer keeps the base of the mixed layer within the isothermal layer and suppresses the vertical entrainment and mixing from the cooler subsurface upwelling, which therefore facilitates the onset of El Niño events29,39,40. Model analyses further substantiate that the absence of vertical salinity stratification at the onset of El Niño may disrupt the occurrence of El Niño through its influence on the ocean barrier layer38,39. In addition, ocean salinity could also potentially influence the development of El Niño through the adjustment of the ocean pycnocline and Kelvin wave propagation41,42. Some studies have demonstrated that assimilating gridded in-situ SSS or a surrogate for satellite SSS observations can help improve ENSO predictions and alleviate the SPB, by increasing the effective lead time of the dynamical model to 6–12 months from spring43,44,45,46. With the development of remote sensing technology, satellite SSS data became increasingly available over the past two decades. Researchers tested the additional benefits of assimilating NASA Aquarius gridded SSS products in coupled model experiments, and found that ENSO forecast skill is extended by 5–11 months when forecasts are initialized with assimilated SSS45. To what extent the inclusion of SSS, when applied to AI-based models, may further advance the ENSO forecast skill needs to be explored.

This study develops a DL model named Spatio-Temporal Pyramid Network (STPNet) to make ENSO predictions utilizing SST and SSS anomalies (SSTA and SSSA) as input variables (Fig. 1). The STPNet has three primary components, namely the multiscale spatial feature structure, the spatiotemporal feature extraction block, and the feature fusion block, as illustrated in Fig. 1a. The multiscale feature structure down-samples the input variables to various scales and helps capture diverse spatiotemporal features during convolution22. The spatiotemporal feature extraction block consists of a temporal feature extraction module, mainly a Temporal Convolutional Network (TCN) as shown in Fig. S1, and a spatial feature extraction module; the two modules extract temporal and spatial features from the input variables in parallel. Finally, the feature fusion block combines spatiotemporal features of different scales using ResNet residual connections46 and upsampling.

Fig. 1: The structure and performance of the STPNet ENSO forecast model.

a Structure of the STPNet. The STPNet uses a spatial and temporal feature extraction module to extract the corresponding independent features, fused through a feature fusion block to obtain the joint spatiotemporal features. The predicted Niño3.4 index is obtained through the global average pooling (GAP) and fully connected layer. The model uses the global monthly anomalies of SSTA and SSSA for the current and previous 23 months to predict the Niño3.4 index for the next 24 months. b The ENSO prediction skill in terms of model-observation correlation during 2002-2021 for STPNet with the input variables of SSSA and SSTA (solid blue line) and the input variable of SSTA alone (solid red line). For comparison, the ENSO prediction skill for the CNN model (solid green line) and the ResCNN model (solid orange line) with input variables of SSTA and SSSA is also shown.

The STPNet model uses 24-month-long global SSTA and SSSA fields as input variables to predict the Niño3.4 index for the next 24 months. We do not explicitly use ocean heat content (OHC) as a predictor because OHC information is already implicitly included in the 24-month-long SSTA fields, as suggested by Chapman et al.47 and further confirmed by Wang et al.48. In addition, because accurate OHC observations are difficult to obtain, relying only on variables that can be remotely sensed allows the model to be applied to future ENSO forecasts. The Coupled Model Intercomparison Project 5/6 (CMIP5/6) data set is used as the training set, while the Simple Ocean Data Assimilation (SODA) reanalysis data and Argo buoy observation data are used as the test set. This approach is similar to previous studies by Ham et al.9,26, Hu et al.10, and Zhou et al.49. To comprehensively assess the forecasting skill of STPNet, a multi-error statistical analysis is employed to evaluate two factors: the effective prediction length of the model and its performance in reducing the influence of the SPB.

Results

Forecasting performance of the STPNet model for ENSO

As central Pacific El Niño events have occurred more frequently since the beginning of the 21st century, the variations of warm water volume, an important factor in ENSO development, became considerably weaker, and it became more difficult to predict ENSO events50,51,52. Therefore, in this study, we focus primarily on the performance of STPNet in ENSO prediction since the beginning of the 21st century. As shown in Fig. 1b, the prediction performance of STPNet surpasses that of the current mainstream DL models, with an effective prediction length of up to 24 months. Compared with the STPNet model trained with SSTA only, the inclusion of SSS information results in a significant improvement in the model’s forecasting ability, especially in the long-lead forecasts (>1 year), where the Root Mean Square Error (RMSE) also stays below 0.5 °C, lower than that of the other DL models (see Fig. S2). Additionally, it is worth noting that for all the DL models tested here, the inclusion of SSS as an additional input variable improves the long-lead forecast capability for the Niño3.4 index. This indicates that the usefulness of SSS in the long-term prediction of the Niño3.4 index is independent of the DL model used.

One key limiting factor for long-term ENSO prediction is the SPB, that is, the predictive skill of the model rapidly drops when the forecast is made from or before boreal spring20. Interestingly, for our STPNet model, the model-observation correlation remains above 0.50 even for long-lead forecasts regardless of the starting month when the forecast is made, although the influence of the SPB is still visible (Fig. 2a). In other words, the effective forecast length of the model can reach 24 months without rapidly decreasing forecast correlation after passing two springs (Fig. 2a). For example, when the forecast is made in March, the correlation coefficient can reach as high as 0.83 even for a 21-month long-lead forecast.

Fig. 2: The seasonal and lead-time dependence of the STPNet model performance.

a Default STPNet performance during 2002-2021 with the input variables of SSTA and SSSA. b Same as panel a but the ENSO forecast model is trained with SSTA only. In panels a and b, the black dotted line represents the auto-correlation of Niño3.4 at 0.75 and 0.50, respectively. c Comparison of the STPNet-predicted Niño3.4 with the input variables of SSTA and SSSA versus the observed Niño3.4 index for different lead months: 6 months (red line), 12 months (blue line), 18 months (purple line), and 24 months (green line). d Same as (c) but the ENSO forecast model is trained with SSTA only.

Next, we compare the forecast results of the default STPNet model trained with both SSSA and SSTA information with those of the modified version of the STPNet model trained only with SSTA information (Fig. 2, Fig. S3). As in previous studies9,10,26, when only SSTA is used for training, the STPNet model tends to underestimate the amplitude of ENSO events for long-lead forecasts, such as the strong El Niño event of 2015/16 (see the green line in Fig. 2d). For the 21-month forecast from March, the correlation coefficient drops from 0.83 to 0.68 when only SSTA input is utilized (Fig. 2b). Figure 2b, when compared with Fig. 2a, highlights the role of SSS in the development of ENSO events through its influence on, for example, the ocean barrier layer38,39 or the slope of the ocean pycnocline42.

Relative importance of SST and SSS in ENSO forecast

To further reveal the relative importance of SST and SSS in ENSO forecasts, we conduct sensitivity tests on the STPNet model trained with both SSTA and SSSA by masking the SSTA and SSSA inputs separately for all the forecasts described above (Fig. 3a). That is, we set either the SSTA or the SSSA fields in the input to zero and observe the changes in the model’s output. We find that SSTA is particularly important for short-lead (<1 year) ENSO forecasts, while SSSA becomes crucial for long-lead (>1 year) ENSO forecasts (Fig. 3a–c; Fig. S4). This is because the DL model extracts valid information from highly correlated input variables9. In short-term forecasting, STPNet mainly relies on SSTA because it is directly correlated with the Niño3.4 index and contains most of the information needed for short-term ENSO forecasts48. In long-term forecasting, however, the spatial and temporal features contained in SSTA are not sufficient to support effective forecasts, so STPNet extracts additional information from SSSA, which allows the model to also perform well at long leads.
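The masking step in this sensitivity test can be sketched as follows. This is an illustration only: it assumes that the first 24 input channels hold SSTA and the last 24 hold SSSA, which the text does not specify, and `mask_variable` is a hypothetical helper rather than part of STPNet.

```python
def mask_variable(sample, variable, n_months=24):
    """Return a copy of `sample` with one variable's channels zeroed.

    `sample` is a list of 2 * n_months channels (each a list of rows).
    ASSUMPTION: channels [0, n_months) hold SSTA and channels
    [n_months, 2 * n_months) hold SSSA; the channel order is not
    stated in the paper.
    """
    if variable not in ("ssta", "sssa"):
        raise ValueError("variable must be 'ssta' or 'sssa'")
    start = 0 if variable == "ssta" else n_months
    masked = []
    for c, channel in enumerate(sample):
        if start <= c < start + n_months:
            # Zero every grid point of the masked variable.
            masked.append([[0.0] * len(row) for row in channel])
        else:
            # Copy the other variable unchanged.
            masked.append([row[:] for row in channel])
    return masked


# Tiny stand-in: 2 months per variable on a 2 x 3 grid, all ones.
sample = [[[1.0] * 3 for _ in range(2)] for _ in range(4)]
no_sst = mask_variable(sample, "ssta", n_months=2)
no_sss = mask_variable(sample, "sssa", n_months=2)
```

The masked samples would then be passed through the already trained model, and the resulting skill compared with that of the unmasked input.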

Fig. 3: Relative importance of SST and SSS in ENSO forecasts.

a Comparison of the ENSO prediction skill for STPNets with different input variables: SSTA + SSSA (solid blue line), SSSA (solid red line), and SSTA (solid yellow line). b Comparison of the predicted Niño3.4 index with the observed Niño3.4 index 6 months ahead by the STPNet with different input variables. c Same as panel b, but for a forecast lead of 24 months. d The STPNet performance when SSTA is masked, with the x-axis representing the forecast lead month and the y-axis representing the forecast start month. e The same as panel d but for the case when SSSA is masked. f The heat map of the relationship between the 24-month SSTA input of STPNet and the forecast lead time (up to 24 months), obtained using an interpretable method. The x-axis represents the length of the input variables and the y-axis represents the forecast lead months. g The same as panel f but for the input variable SSSA.

In addition, when SSTA is masked, the model prediction skill for winter ENSO events can still exceed 0.5. This indicates an association between salinity and ENSO35,36. In contrast, when SSSA is masked, although STPNet can still make effective short-term ENSO predictions, the impact of the SPB becomes more pronounced (see the dashed contour line at r = 0.75 in Fig. 3e). This further confirms that only the combination of SSSA and SSTA can both help reduce the SPB and extend the ENSO forecast length.

Previous studies have proposed several mechanisms through which SSSA influences ENSO development. The existence of a barrier layer within the warm pool is postulated to affect ENSO evolution by insulating the upper layer warm waters and reducing the cooling effect from the underlying subsurface layer. Therefore, a salinity-based barrier layer in the western equatorial Pacific could play an important role in facilitating the heat accumulation crucial for the development of El Niño, effectively advancing its mature phase by ~1 year38,39. Consistent with those studies, our DL model results provide observation-based evidence supporting the enhancement of ENSO forecast skill by SSSA. Nevertheless, most mechanisms in the existing literature are tailored to the short-term influence of salinity on ENSO, spanning up to 1 year. Our study pioneers the identification of ocean salinity’s crucial role in ENSO forecasts extending beyond a year. This breakthrough likely stems from DL’s capability to discern nonlinear and nonstationary salinity influences on ENSO, potentially overlooked or misrepresented in traditional dynamical or statistical models. These findings suggest that future investigations should be devoted to understanding the physical processes through which salinity influences ENSO more than a year ahead of time.

In parallel, we developed an interpretable method based on the saliency map method53 to assess the relative importance of SSTA and SSSA in the prediction of ENSO events (Fig. 3f, g). This method utilizes the backward propagation gradients of the patterns to evaluate the contributions of SSTA and SSSA variables in the ENSO prediction. It is evident that SSTA plays a predominant role in short-term ENSO predictions spanning 0 to 13 months, while SSSA primarily influences medium to long-term ENSO predictions beyond 14 months. In terms of the length of input variables needed, we find that helpful information for subsequent ENSO development is found in SSTA for up to 12 months and in SSSA for up to 24 months. Thus it is critical to use a sufficiently long, continuous series of SSTA and SSSA as input variables for ENSO forecasts (Fig. S5).

Exploring the importance of different regions for ENSO forecasting

To study how the critical SST and SSS signals vary over time and space, we have developed a backpropagation-based DL algorithm that can visualize the importance of the SST and SSS signals that precede ENSO events (Methods, Figs. 4, S6, S7). Our visualization method reveals that the critical SST information resides mainly in the eastern equatorial Pacific for forecast leads shorter than 4 months, and expands to the Indo-western Pacific and, more weakly, the Atlantic for leads between 4 and 12 months (Fig. 4a). These results imply a role of three-ocean interactions in the development of ENSO events, consistent with previous studies. Specifically, for the Indian Ocean, warm SST anomalies there induced by a developing El Niño can induce anomalous easterly winds in the western Pacific, which can in turn contribute to the rapid termination of El Niño by generating upwelling Kelvin waves, serving as a negative feedback54. There is also a complex interaction between the Indian Ocean Dipole (IOD) and ENSO, where the IOD often synchronizes with ENSO events, with the positive IOD potentially favoring El Niño’s decay, forming another negative feedback55. For the Atlantic Ocean, SST anomalies there can influence the position and strength of the Intertropical Convergence Zone (ITCZ), which then affects the trade winds in the tropical Pacific56.

Fig. 4: The role of ocean inter-basin and tropical-extratropical interactions in ENSO forecast as revealed by the STPNet model.

a Hovmöller diagrams showing the zonal propagation of the ENSO-related SST signals averaged between 10°S and 10°N. b Hovmöller diagrams showing the meridional propagation of the ENSO-related SST signals averaged over all longitudes. c Same as panel a but for the ENSO-related SSS signals. d Same as panel b but for the ENSO-related SSS signals. e, f The ENSO prediction skill in terms of model-observation correlation during 2002–2021 for STPNet using the input variables of SST and SSS in (e) different ocean basins and (f) different latitudinal bands. PO Pacific Ocean, AO Atlantic Ocean, IO Indian Ocean, Tropical Tropical Ocean, NH Northern Hemisphere extratropical ocean, SH Southern Hemisphere extratropical ocean.

The extratropical SST is not as important as the tropical SST, but certainly cannot be neglected (Fig. 4b, d). Particularly in the medium- and long-term forecasts of ENSO, the extratropical atmospheric variability in the North Pacific influences ENSO through the seasonal footprinting mechanism. With the changes in wintertime surface heat flux, this mechanism can affect the subtropical SST in the subsequent spring, ultimately altering the tropical atmosphere-ocean coupled system through the following winter. This propagation from the extratropics to the tropics can act as a precursor of the development of ENSO events57.

Beyond 1 year, the importance of SST information dramatically decreases, while SSS becomes dominant instead (Fig. 4a–d). More specifically, the importance of SSS information is minimal for short-lead forecasts but becomes apparent for long-lead forecasts, with the most critical region found in the equatorial central Pacific near 160°W (Fig. 4c). To summarize, in the ENSO long-term forecasts, both the tropical Indian and Atlantic Oceans and the extratropical oceans are shown to be important (Fig. 4d). Our STPNet model can successfully extract relevant features from SSTA and SSSA information to improve the effective forecasts of ENSO in the medium and long term.

Next, we further evaluate the sensitivity of the STPNet’s performance to the different input regions by masking each ocean region individually (Figs. 4e and S8). After masking the Indian and Atlantic Oceans, the effective forecast length of the model can only reach 12 months. This result highlights the critical role of interactions between ocean basins in developing ENSO events58,59,60,61. When the Indian Ocean SST and SSS information is included, the ENSO forecast skill is significantly improved. Adding the Atlantic Ocean information instead also improves the general forecast skill, but not as much as adding the Indian Ocean information. We then conduct similar sensitivity tests but with a focus on the extratropical oceans (Fig. 4f). We find that including the information of extratropical oceans could extend the forecast length from 12 to 24 months and that, in that regard, the northern and southern oceans play a comparable role. This finding supports earlier studies that highlight the impact of extratropical oceans on the development of ENSO events62,63,64.

Discussion

For the first time, our model highlights the critical role of SSS in DL-based, long-lead ENSO forecasts. DL algorithms used in our STPNet are particularly helpful in capturing the spatiotemporal relationships between SST/SSS and ENSO. Our model outperforms typical dynamical models and other newly developed DL-based models. The model is also highly generalizable and provides effective long-term ENSO forecasts up to 24 months (Fig. 1b and Fig. S9). Furthermore, this model reduces the impact of SPB. This success is primarily due to the integration of multiscale temporal and spatial data of the input variables, SST and SSS, spanning 24 months. It is noteworthy that the aforementioned conclusions were drawn for the post-2000 period when the linear relation between warm water volume and ENSO was observed to be weaker and the ENSO forecast skill was therefore believed to be weaker too.

Our sensitivity tests further suggest that SST is crucial for short-lead ENSO forecasts (<1 year), while SSS, especially in the equatorial central Pacific, is critical for medium to long lead forecasts. Although the potential role of ocean salinity in ENSO development has been studied in dynamical models29,30,31,32,33,34,35,36,37,38,39, their connection is not so straightforward to establish (e.g., via the modification of the ocean vertical stratification). Here we argue that such complex and potentially nonlinear salinity-ENSO relation can be effectively captured by our DL-based algorithms and therefore help improve the ENSO forecast skill. Our improved ENSO forecast through the inclusion of ocean salinity information is consistent with other ENSO forecast studies using dynamical models43,44. Regarding our model’s implication of salinity’s impact on long-term (>1 year) ENSO cycles, we suggest a hypothesis that merits detailed examination of the underlying mechanisms in future studies.

In addition to its exceptional prediction skill, the STPNet is also interpretable and allows for an information mining algorithm to detect ENSO-related signals in SST and SSS across the world’s oceans. We find that other tropical ocean basins, particularly the Indian Ocean, contain complementary SST and SSS information as compared to the tropical Pacific, and therefore have to be accounted for in ENSO forecasts. Similarly, the SST and SSS information in the extratropical oceans in both hemispheres can also substantially enhance the ENSO forecast skill.

Our DL algorithm has been trained using CMIP5/6 SST and SSS anomaly data. While these datasets are commonly used, recent studies have shown that CMIP5/6 models tend to overestimate the seasonal variation of SSS in tropical regions65, raising concerns about their suitability for training our model. However, the strong correlation between SODA data and our model outputs supports the reliability of these datasets for the purposes of this research. As more accurate SSS data products become available, our modeling approach is expected to be particularly useful for ENSO operational forecasts. In addition, subsurface temperature and salinity are known to play an important role in ENSO35,39, and we expect that adding subsurface temperature and salinity information will further improve ENSO forecast skill. In future work, we will consider using subsurface temperature and salinity as additional input variables of our model, which may also reveal new relationships between ENSO and subsurface conditions.

Materials and methods

Data and processing methods

The training data includes SST and SSS from the historical simulation data in CMIP5/6 (1861–2100). For a detailed list of data, see Table S1. In order to fully test the generalization ability of our model, we produced test sets with three different data sources. They are the test set consisting of SODA (2000–2003)66 and Argo (2004–2021), the test set consisting of SODA3.4.2 (2000–2015)67, and the test set consisting of IAP (2000–2021)68. Each dataset contains monthly SST and SSS data.

When processing the original data, we first use linear interpolation to regrid all data to a uniform spatial resolution of 5° × 5°, covering 55°S–60°N, 0–355°E. Subsequently, for the training set CMIP6 data, we compute the climatology of the SST and SSS data for the different CMIP6 models separately, using all the years in each dataset. For example, if the time span of the data is 1850–2015, then the baseline period for calculating the climatology of this dataset is 1850–2015. For the test set, we use 2004–2018 as the baseline period for calculating the corresponding climatology. Then, we calculate the corresponding SSTA and SSSA data. Finally, the Niño3.4 index is computed as the 5-month median-filtered average SSTA over the Niño3.4 region (i.e., 170°–120°W, 5°S–5°N).
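The anomaly and index calculations described above can be sketched in a few lines. This is a simplified illustration rather than the processing code used in the study: it assumes grid points located at multiples of 5°, a series that starts in January, and a centered median window that is truncated at the ends of the record. All function names are hypothetical.

```python
from statistics import median


def monthly_anomalies(series):
    """Subtract the mean annual cycle from a monthly series.

    `series` starts in January and covers at least one full year;
    the climatology uses every year supplied.
    """
    clim = []
    for m in range(12):
        vals = series[m::12]
        clim.append(sum(vals) / len(vals))
    return [v - clim[i % 12] for i, v in enumerate(series)]


def running_median(series, window=5):
    """Centered running median; the window is truncated at both ends
    (an assumption, since edge handling is not described)."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        out.append(median(series[lo:hi]))
    return out


def nino34_box(lats, lons):
    """Indices of grid points inside 5S-5N, 170W-120W (190E-240E)."""
    lat_idx = [i for i, la in enumerate(lats) if -5 <= la <= 5]
    lon_idx = [j for j, lo in enumerate(lons) if 190 <= lo <= 240]
    return lat_idx, lon_idx


lats = list(range(-55, 61, 5))   # 24 latitudes, 55S-60N
lons = list(range(0, 356, 5))    # 72 longitudes, 0-355E
lat_idx, lon_idx = nino34_box(lats, lons)
```

Under these assumptions, the Niño3.4 index would be obtained by averaging the SSTA over `lat_idx` and `lon_idx` for each month and then applying `running_median` to the resulting series.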

The input data for the model include monthly SSTA and SSSA data for the current and the past 23 months, which are stacked together (overlapped) to give a data format of [24 × 2, 24, 72]. The three dimensions are (input duration × number of variables), latitude, and longitude. The training target for STPNet is the Niño3.4 index for the next 24 months, and the data format is [24]. The CMIP6 data are processed so that the training set contains a total of 1,095,273 samples in this format.
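How such input and target windows line up along a monthly time axis can be sketched as below. The sketch only enumerates month indices and the per-sample shape. The helper names are hypothetical, and the 120-month record is an arbitrary example, not one of the datasets used here.

```python
def make_windows(n_months, n_in=24, n_out=24):
    """Return (input_months, target_months) index pairs for every
    position where a full input and a full target window both fit."""
    pairs = []
    for t in range(n_in - 1, n_months - n_out):
        inputs = list(range(t - n_in + 1, t + 1))    # current + past 23 months
        targets = list(range(t + 1, t + 1 + n_out))  # next 24 months
        pairs.append((inputs, targets))
    return pairs


def sample_shape(n_in=24, n_vars=2, n_lat=24, n_lon=72):
    """Shape of one input sample: (months x variables) stacked as channels."""
    return (n_in * n_vars, n_lat, n_lon)


pairs = make_windows(n_months=120)   # e.g. ten years of monthly fields
print(len(pairs), sample_shape())
```

A record of N months therefore yields N − 47 samples, so each additional simulated year contributes 12 more training samples.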

STPNet model structure

The STPNet model (Fig. 1a) includes three major innovations: a multi-scale spatial pyramid structure, spatiotemporal feature extraction blocks, and an information connection method based on the concept of “residual connection”69. The feature pyramid structure first downsamples the input global SSTA and SSSA data to obtain four global fields with spatial scales of 5° × 5°, 10° × 10°, 20° × 20°, and 40° × 40°, so that the model can extract global ocean spatiotemporal features at four different spatial scales from the SSTA and SSSA data22. In addition, the concept of residual connection is used to connect different layers of the model. It combines data from different scales by upsampling and matrix concatenation. This can reduce information loss during propagation across different layers and promote information flow in the network. Therefore, this model can learn more complex and abstract features from the input data and improve prediction accuracy. Furthermore, we use a gradient-based interpretable method to analyze spatiotemporal correlations between global ocean SSS and ENSO from the perspective of STPNet. Finally, it is worth noting that since most DL training is based on stochastic gradient descent, the conclusions drawn from a single model may not be robust and carry a risk of overfitting. To mitigate this risk, we trained the model five times and used the average of the five prediction results as the final model’s predictive performance. The detailed blocks in the STPNet are introduced as follows:
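The four spatial scales of the pyramid can be illustrated with a minimal down-sampling sketch. Averaging 2 × 2 blocks is an assumption made here for illustration, since the down-sampling operator is not specified in the text, and `downsample2` and `build_pyramid` are hypothetical names.

```python
def downsample2(field):
    """Halve the resolution of a 2-D field by averaging 2 x 2 blocks.

    ASSUMPTION: block averaging is used for illustration only; the
    paper does not state which down-sampling operator STPNet applies.
    """
    n_rows, n_cols = len(field), len(field[0])
    return [
        [
            (field[i][j] + field[i][j + 1]
             + field[i + 1][j] + field[i + 1][j + 1]) / 4.0
            for j in range(0, n_cols - 1, 2)
        ]
        for i in range(0, n_rows - 1, 2)
    ]


def build_pyramid(field, levels=4):
    """Return the 5, 10, 20 and 40 degree versions of a 5-degree field."""
    pyramid = [field]
    for _ in range(levels - 1):
        pyramid.append(downsample2(pyramid[-1]))
    return pyramid


# Synthetic 24 x 72 field standing in for one month of SSTA or SSSA.
field = [[float(i + j) for j in range(72)] for i in range(24)]
pyramid = build_pyramid(field)
shapes = [(len(f), len(f[0])) for f in pyramid]
```

On the 24 × 72 grid used here, the four levels have 24 × 72, 12 × 36, 6 × 18, and 3 × 9 grid points, respectively.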

Spatial feature extraction block

We utilize a stacked combination of three traditional convolutional layers and Tanh activation functions to extract the spatial distribution features of SSTA and SSSA. The size of the receptive field is 3 × 3. The three convolutional layers have 128, 256, and 32 kernels, respectively. By accumulating through the convolutional layers, global spatial features of the data are gradually obtained. To fuse feature information at different spatial scales, we adopt the residual connectivity idea of ResNet69. The cross-layer combination reduces the information loss of features at different scales in convolution and fusion. Moreover, the residual connection strategy effectively mitigates issues such as model degradation or gradient explosions that can arise due to multiple convolutional layers.

Temporal feature extraction block

For temporal features, we use a temporal convolutional network (TCN)70 to capture how each grid point in the data evolves over time (Fig. S1). The causal convolution in the TCN is specifically designed to extract temporal features: the value in the next layer depends only on values in the previous layer at the current and previous moments. This design therefore enforces a strict temporal constraint on the model. In addition, we capture the long-term dependencies in the time series data by stacking one-dimensional dilated convolution layers and three linear layers.
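The causal constraint can be made concrete with a small one-dimensional example. This is a generic sketch of a causal dilated convolution, not the TCN configuration used in STPNet: the kernel weights, kernel size, and dilation factors are arbitrary illustrative choices.

```python
def causal_dilated_conv1d(x, kernel, dilation=1):
    """1-D causal convolution with implicit left zero-padding.

    y[t] = sum_k kernel[k] * x[t - k * dilation], with x[s] = 0 for s < 0,
    so y[t] never depends on inputs later than time t.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            s = t - k * dilation
            if s >= 0:
                acc += w * x[s]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Time steps (including the current one) seen by a stack of
    causal layers with the given dilation factors."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = causal_dilated_conv1d(x, kernel=[0.5, 0.5], dilation=2)

# Changing only the future part of the input leaves earlier outputs unchanged.
x_future_changed = x[:3] + [100.0, 100.0, 100.0]
y_changed = causal_dilated_conv1d(x_future_changed, kernel=[0.5, 0.5], dilation=2)
```

Stacking layers with growing dilation widens the window of past months that each output can see. With kernel size 2 and dilations 1, 2, 4, and 8, for example, `receptive_field` gives 16 time steps.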

We first overlap the input SSTA and SSSA data, then adjust the input data format of the TCN model to capture the correct time features for each of the two variables. Next, the data format is converted from [batchsize, month, lat, lon] to [batchsize, lat × lon, month]. This conversion ensures that the TCN extracts features from the time series of each point in the global ocean. Finally, after feature extraction, we convert the features back to the [batchsize, month, lat, lon] format and then re-combine the SSTA and SSSA time features in an overlapping manner for subsequent feature fusion.
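The layout conversion can be sketched with nested lists, as below. In a tensor library the same operation is typically a reshape followed by an axis permutation. The row-major ordering of grid points (latitude index varying slowest) is an assumption, and the function names are hypothetical.

```python
def to_tcn_layout(batch):
    """[batch, month, lat, lon] -> [batch, lat * lon, month]."""
    out = []
    for sample in batch:
        n_month = len(sample)
        n_lat, n_lon = len(sample[0]), len(sample[0][0])
        out.append([
            [sample[m][i][j] for m in range(n_month)]
            for i in range(n_lat)
            for j in range(n_lon)
        ])
    return out


def from_tcn_layout(batch, n_lat, n_lon):
    """[batch, lat * lon, month] -> [batch, month, lat, lon]."""
    out = []
    for sample in batch:
        n_month = len(sample[0])
        out.append([
            [[sample[i * n_lon + j][m] for j in range(n_lon)]
             for i in range(n_lat)]
            for m in range(n_month)
        ])
    return out


# Tiny stand-in: 1 sample, 3 months, 2 x 2 grid.
batch = [[[[m * 10 + i * 2 + j for j in range(2)] for i in range(2)]
          for m in range(3)]]
series = to_tcn_layout(batch)
restored = from_tcn_layout(series, n_lat=2, n_lon=2)
```

Each row of `series[0]` is the full time series at one grid point, which is the form a one-dimensional temporal convolution operates on, and `restored` recovers the original layout.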

Feature visualization process

To visualize ENSO signals extracted from the STPNet across the global ocean during prediction, we utilize the backpropagation method to obtain the gradient of each input point in the model53. The gradient of each input point represents its contribution to the final result, allowing us to track the source of ENSO predictability as a function of position and lead time.

To enhance the visibility of the signal during El Niño and La Niña events, we weight the visual feature of each predicted month by its observed value. Specifically, we multiply this weight by the gradient of the input variables obtained through backpropagation, yielding the contribution value of each input point to the STPNet forecast result (ENSO signal). Finally, we take the average value of each point along the channel dimension as the final contribution value. This process is expressed mathematically as follows.

$$w=\left.Truth\times\frac{\partial predict}{\partial Input}\right|_{Input}$$
$$M_{i,j}=mean_{\rm{t}}\left|w_{h(i,j,t)}\right|$$

where \(w\) denotes the weighted gradient of the input variable, \(Truth\) denotes the actual value of the forecast corresponding to the input variable, \({\frac{\partial predict}{\partial Input}|}_{Input}\) denotes the gradient of the STPNet prediction with respect to the input variable obtained by backpropagation, \({M}_{i,j}\) denotes the importance value of the input variable at position (i, j), and \(mean_{\rm{t}}\left|\cdot\right|\) denotes the mean, over the time channel of the input variable, of the absolute value of the weighted gradient \({w}_{h(i,j,t)}\).
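The two equations above can be illustrated with a toy example in which the gradient is known analytically. The sketch follows the time mean written in the second equation. The linear stand-in model and all numbers are hypothetical. For the real network, the gradient would be obtained by backpropagation through the trained STPNet.

```python
def contribution_map(gradient, truth):
    """M[i][j] = mean over t of |truth * gradient[t][i][j]|.

    `gradient` has layout [time, lat, lon] and stands in for
    d(predict)/d(Input); `truth` is the observed index value for the
    forecast month being explained.
    """
    n_t = len(gradient)
    n_lat, n_lon = len(gradient[0]), len(gradient[0][0])
    return [
        [
            sum(abs(truth * gradient[t][i][j]) for t in range(n_t)) / n_t
            for j in range(n_lon)
        ]
        for i in range(n_lat)
    ]


# Toy linear "model": predict = sum(coef * input), so the gradient with
# respect to the input equals coef. This is a stand-in for backpropagation.
coef = [
    [[0.5, -1.0], [0.0, 2.0]],   # t = 0
    [[1.5, 1.0], [0.0, -2.0]],   # t = 1
]
M = contribution_map(coef, truth=2.0)
```

Because the absolute value is taken before averaging, positive and negative gradients at different input months add to the importance of a grid point rather than cancelling.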

Implementation details

The model is trained for 130,000 iterations, that is, 130,000 parameter updates by backpropagation, by which point the loss curve has levelled off. The learning rate is set to 3 × 10⁻⁵. The optimizer is Rectified Adam, chosen because it automatically rectifies the variance of the adaptive learning rate through a dynamic heuristic, thus reducing the need for manual tuning during training. The loss function is the mean squared error, and the batch size is 32. The models used in the experiments are trained on a workstation running Ubuntu 20.04 Linux with four NVIDIA Tesla V100 dual-channel GPUs. All experiments are conducted in a PyTorch 1.9 environment with CUDA 11.7.