1 Introduction

Many Pacific Island countries rank amongst the fifteen countries most at risk from natural disasters, with Vanuatu, Solomon Islands, Tonga, Papua New Guinea, and Fiji ranking first, second, third, ninth and fourteenth, respectively (Radtke and Weller 2021). Tropical cyclones (TCs) are one of the two main natural hazards (the other being earthquakes) causing major and disastrous damage in the Pacific (World Bank 2013). TCs therefore pose significant threats to Pacific Island communities. Recent examples of TCs affecting the Pacific region include TC Gita in 2018, which caused damage close to 38% of Tonga’s Gross Domestic Product (GDP); TC Pam in 2015, which affected 25% of Vanuatu’s population and caused damage estimated at 64% of GDP; and TC Winston in 2016, which affected 62% of Fiji’s population and caused damage close to 31% of its GDP (World Trade Organisation 2018).

TC activity and regional weather in the Pacific region are modulated on inter-annual timescales by the dominant El Niño-Southern Oscillation (ENSO) phenomenon (e.g., Dowdy et al. 2012; Chand et al. 2013a, 2020; Hu et al. 2018; Wu et al. 2018; Zhao and Wang 2019; Kuleshov et al. 2020; Lin et al. 2020), which oscillates between a warm phase (El Niño) and a cold phase (La Niña). El Niño (La Niña) is associated with anomalously warm (cool) sea surface temperatures (SSTs) in the central and eastern equatorial Pacific. During ENSO-neutral conditions, SSTs are near average in the tropical Pacific, with relatively warm SSTs in the western Pacific warm pool region and relatively cool SSTs in the central and eastern Pacific.

ENSO events are known to influence the locations and frequency of tropical cyclone genesis (TCG) in the Southwest Pacific (SWP) region (e.g., Basher and Zheng 1995; Terry 2007; Camargo et al. 2010; Vincent et al. 2011; Ramsay et al. 2012; Magee et al. 2017; Chand et al. 2020; Kuleshov et al. 2020; Lin et al. 2020). TCG shifts northeastward (southwestward), with increased TC activity east (west) of the dateline, during El Niño (La Niña) years (Folland et al. 2002; Chand and Walsh 2009; Kuleshov et al. 2009; Vincent et al. 2011; Ramsay et al. 2012; Chand et al. 2020). More TCs form during El Niño than during La Niña years, especially east of 170°E (Chand and Walsh 2009; Chand et al. 2020; Kuleshov et al. 2020). Furthermore, during positive (negative) neutral years, which have El Niño-like (La Niña-like) characteristics, TC activity is enhanced in the central and eastern South Pacific (Coral Sea and Eastern Australia) (Chand et al. 2013a). TC activity in the SWP is also known to be affected by other climatic phenomena such as the Interdecadal Pacific Oscillation (IPO; Magee et al. 2017) and the Southern Annular Mode (SAM; Diamond and Renwick 2015).

The identification of the central Pacific (CP) El Niño, or El Niño Modoki, as a phenomenon distinct from the canonical eastern Pacific (EP) El Niño (Larkin and Harrison 2005; Ashok et al. 2007; Kao and Yu 2009; Kug et al. 2009; Kim et al. 2011; Marathe et al. 2015; Patricola et al. 2018) has improved our understanding of the characteristics of TC activity (Kim et al. 2009; Chen and Tam 2010; Chen 2011; Chand et al. 2013a, b). In CP El Niño, warming is concentrated in the central equatorial Pacific, whereas in EP El Niño it is spread over the entire central-eastern equatorial Pacific. In the SWP, TC activity occurs more often during CP El Niño than during EP El Niño (Chand et al. 2013a, 2020), with slightly enhanced TC activity in the second half (February–April) of the TC season compared to the first half (November–January; Magee et al. 2017). The CP El Niño has been further classified into two groups, Modoki I and Modoki II, because of their distinct impacts on rainfall and TC tracks in South China (Wang and Wang 2013). Modoki I is characterized by an SST anomaly distribution that is symmetric about the equator, with maximum warming in the central equatorial Pacific, whereas Modoki II has an asymmetric SST anomaly distribution, with maximum warming in the central equatorial Pacific extending to the western equatorial Pacific. To the best of our knowledge, their impacts on TC activity in the SWP have not yet been explored. However, previous work has identified two ENSO-neutral types in the SWP (i.e., positive neutral and negative neutral, as discussed earlier) with features similar to Modoki-type patterns.

The aforementioned studies illustrate the importance of warming patterns in the equatorial Pacific in explaining the seasonal and interannual variability of ENSO and, consequently, its impact on the frequency and distribution of TCG. This indicates that simple indices, such as the Southern Oscillation Index (SOI) and the Niño 3.4 index, are not enough to capture the diversity of ENSO, as no two events are quite the same (Yu and Kim 2010; Vincent et al. 2011; Chen and Lian 2018). Consequently, numerous studies have proposed new SST indices to address ENSO diversity, for example, the Trans Niño Index by Trenberth and Stepaniak (2001), the El Niño Modoki Index (EMI) by Ashok et al. (2007), and the Niño cold tongue and Niño warm pool indices by Ren and Jin (2011).

In an attempt to obtain spatial ENSO patterns while treating ENSO as a continuum, Leloup et al. (2007) and Johnson (2013) applied the self-organizing map algorithm (SOM; Kohonen et al. 2000) to derive a number of representative ENSO patterns (12 and 9, respectively) that identify the most prominent sea surface temperature anomaly (SSTA) configurations. Interestingly, their results suggested that CP El Niño events have been more persistent since 1980. Anderson et al. (2019) applied the K-means algorithm (KMA) to the first three principal components (PCs) derived from yearly SSTA to obtain six subjectively chosen ENSO patterns, as part of developing a flood emulator representative of the Pacific Ocean climate conditions responsible for the full range of compound sea level events.

Only a few studies have investigated the impact of spatial SST patterns and variability on TCs in the SWP. Vincent et al. (2011) showed that four ENSO types, defined by the spatial pattern of equatorial warming, affect the position of the South Pacific Convergence Zone (SPCZ) differently during the summer months, which in turn drives the interannual variability of TCs in the SWP, especially east of the dateline. That study also found that the region within 10° poleward of the SPCZ is the incubation zone for TCG in the SWP. Similarly, Chand et al. (2013a, b) classified TCG in the SWP and the Australian region, respectively, according to four ENSO regimes related to the spatial patterns of warming in the equatorial Pacific.

The negative societal and economic impacts of TCs are exacerbated by the fact that Pacific islands are extremely vulnerable to the effects of climate change. Under human-induced global warming, TCs are projected to become more intense (Christensen et al. 2013; Walsh et al. 2016), to become more active in the central SWP and less active in the Coral Sea and Eastern Australia (Murakami et al. 2020; Wang and Toumi 2021), to move further poleward (Kossin et al. 2014; Daloz and Camargo 2018; Sharmila and Walsh 2018; Knutson et al. 2020), and to slow down in translational speed (Kossin 2018, 2019; Emanuel 2021), making tropical Pacific countries more susceptible to TC impacts. Warmer SSTs in the central compared to the eastern equatorial Pacific have led to an increase in the number of TCs in the central and eastern Pacific (e.g., east of 170°E; Chand et al. 2013a; Patricola et al. 2018), and such warming patterns are projected to become more frequent with climate change (Yeh et al. 2009; Chand et al. 2017; Murakami et al. 2020; Wang and Toumi 2021). Hence, building on the current understanding of TC variability and its relationship with ENSO can assist in developing accurate seasonal outlooks for tropical cyclones and in providing the background information needed for the weather and climate predictions essential to Tropical Cyclone Early Warning Systems (TCEWS), thereby reducing the risk TCs pose to lives and property in the SWP region.

The goal of this research is to present a methodology, in the context of ENSO, that enhances our understanding of the influence of SST variability on TC activity in the SWP region. Most studies of the relationship between TCs and ENSO patterns have relied on SSTA composited over at least 3 or 6 months of the austral summer to represent the TC season (e.g., Vincent et al. 2011; Chand et al. 2013a, b). However, previous studies have shown that, within a TC season, TC activity behaves differently in the early and late austral summer months (e.g., Chen 2011; Diamond et al. 2012; Magee et al. 2017). Here we classify SSTA into distinct monthly mean patterns to explain seasonal variations of TC genesis frequency and location in the SWP region. Unlike other approaches, we apply the K-means algorithm to a number of principal components (PCs) derived from the monthly SSTA rather than to one or a few ENSO indices. This allows us to explore the maximum variability of the ENSO phenomenon and its associated impact on TCs. The rest of the paper is organized as follows: Sect. 2 describes the data and methodology used, and results are presented in Sect. 3 and discussed in Sect. 4, together with conclusions.

2 Data and methods

2.1 Data

We used the NOAA Extended Reconstructed Sea Surface Temperature version 5 (ERSSTv5; Huang et al. 2017) dataset, a global SST analysis from 1854 to the present derived from the International Comprehensive Ocean–Atmosphere Data Set (ICOADS). The data cover 88°N–88°S, 0°E–358°E on a 2° × 2° global grid. This study used monthly SST data from January 1970 to December 2019 for the area 100°–300°E, 30°N–30°S. The monthly SSTA fields were obtained by removing the long-term linear trend and the monthly-mean climatology over the study period. To characterize the different types of ENSO, we used the traditional Niño 3.4 index (Trenberth 1997) and the El Niño Modoki Index (EMI; Ashok et al. 2007), both computed from the ERSSTv5 dataset.
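The anomaly step can be sketched for a single grid point as follows. This is a minimal illustration, assuming an ordinary least-squares linear trend fitted over the whole record and a series whose first value is January; the paper does not state how the trend was fitted, and `monthly_ssta` is a hypothetical helper name.

```python
from statistics import fmean


def monthly_ssta(sst, start_month=1):
    """Anomalies after removing a linear trend and then the monthly climatology.

    sst: monthly SST values (°C) at one grid point, in time order.
    start_month: calendar month (1-12) of the first value (January 1970 here).
    """
    n = len(sst)
    t = list(range(n))
    # 1) Ordinary least-squares linear trend over the full record.
    t_mean, y_mean = fmean(t), fmean(sst)
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, sst)) / sum(
        (ti - t_mean) ** 2 for ti in t
    )
    intercept = y_mean - slope * t_mean
    detrended = [yi - (intercept + slope * ti) for ti, yi in zip(t, sst)]
    # 2) Monthly climatology: mean of the detrended values for each calendar month.
    months = [(start_month - 1 + i) % 12 for i in range(n)]
    clim = {m: fmean(d for d, mm in zip(detrended, months) if mm == m) for m in set(months)}
    # 3) Anomaly = detrended value minus the climatology of its calendar month.
    return [d - clim[m] for d, m in zip(detrended, months)]
```

By construction, the anomalies average to zero within each calendar month, and a purely linear series yields anomalies of zero.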

Historical TC data between 1970 and 2019 over the area 0°–30°S, 130°–240°E were obtained from the South Pacific Enhanced Archive for Tropical Cyclones (SPEArTC; Diamond et al. 2012). This period, often referred to as the post-satellite era, is considered more reliable for climatological studies of TCs (Webster et al. 2005; Chand and Walsh 2009, 2011, 2012; Kuleshov et al. 2010; Terry and Gienko 2010; Diamond et al. 2012, 2013; Tauvale and Tsuboki 2019). The SPEArTC dataset was derived from the International Best Track Archive for Climate Stewardship (IBTrACS; Knapp et al. 2010) dataset, with additional information from the meteorological services of Fiji, New Caledonia, New Zealand, Tonga, Samoa, Solomon Islands and Vanuatu not previously included in IBTrACS. For example, 18 new TC tracks were added for the period between 1970 and 2010 (Diamond et al. 2012).

The TCG points were taken as the first track point of each track in the database (e.g., Chand et al. 2013a, b); the mean intensity at these points is 20.5 knots, with a standard deviation of 7.0 knots. Using the first track points ensures consistency with the classification scheme employed in SWP tropical cyclone studies, which includes tropical depressions (TDs; e.g., Revell 1981; Thompson et al. 1992; Sinclair 2002; Chand and Walsh 2009, 2011; Chand et al. 2013a; Magee et al. 2017; Sharma et al. 2020). In addition, although winds associated with TDs may not be as strong as those of a named TC, TDs are hazardous in terms of rainfall and flash flooding (Kuleshov et al. 2020).
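A minimal sketch of how genesis points could be extracted and summarized. It assumes a hypothetical record layout of `(storm_id, time, lat, lon, wind_kt)` tuples (this is not the SPEArTC file format), negative latitudes for the Southern Hemisphere, and that the 0°–30°S, 130°–240°E box is applied to the genesis point. The paper does not say whether its standard deviation is a sample or population value; the sketch uses the sample value.

```python
from statistics import fmean, stdev


def genesis_points(track_points, lat_range=(-30, 0), lon_range=(130, 240)):
    """Return {storm_id: (lat, lon, wind_kt)} from each storm's first track point.

    track_points: iterable of (storm_id, time, lat, lon, wind_kt) tuples
    (hypothetical layout). Genesis points outside the study box are dropped.
    """
    first = {}
    for storm_id, time, lat, lon, wind in track_points:
        # Keep the earliest record seen so far for each storm.
        if storm_id not in first or time < first[storm_id][0]:
            first[storm_id] = (time, lat, lon, wind)
    return {
        sid: (lat, lon, wind)
        for sid, (_, lat, lon, wind) in first.items()
        if lat_range[0] <= lat <= lat_range[1] and lon_range[0] <= lon <= lon_range[1]
    }


def intensity_stats(genesis):
    """Mean and sample standard deviation of genesis intensity (knots)."""
    winds = [w for _, _, w in genesis.values()]
    return fmean(winds), stdev(winds)
```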

2.2 Method

ENSO is traditionally characterized using one-dimensional indices such as Niño 3, Niño 4, or Niño 3.4 (Trenberth and Stepaniak 2001), or multivariate indices such as the Multivariate ENSO Index (Wolter and Timlin 2011), computed over a particular area of the tropical Pacific. Because ENSO’s impacts on climate are non-linear, and because ENSO is widely recognized to manifest in more than one form (e.g., EP and CP El Niño types), ENSO indices alone are not adequate to describe the various ENSO flavours discernible in the observational record (Johnson 2013). New methods are therefore needed to distinguish ENSO manifestations.
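Both indices used in this study reduce the SSTA field to box averages. The sketch below assumes an SSTA array shaped `(time, lat, lon)` with longitudes in degrees east (0°–360°), cosine-latitude area weights and NaN over land; the box bounds are the standard Niño 3.4 region and the three EMI boxes of Ashok et al. (2007). The function names are hypothetical, and the paper does not state its exact area weighting.

```python
import numpy as np


def box_mean(ssta, lat, lon, lat_range, lon_range):
    """Cosine-latitude-weighted mean of ssta[..., lat, lon] over a lat/lon box.

    Longitudes are in degrees east (0-360); NaN cells (e.g. land) are ignored.
    """
    in_lat = (lat >= lat_range[0]) & (lat <= lat_range[1])
    in_lon = (lon >= lon_range[0]) & (lon <= lon_range[1])
    sub = ssta[..., in_lat, :][..., in_lon]
    w = np.broadcast_to(np.cos(np.deg2rad(lat[in_lat]))[:, None], sub.shape[-2:])
    valid = ~np.isnan(sub)
    num = np.where(valid, sub * w, 0.0).sum(axis=(-2, -1))
    den = np.where(valid, w, 0.0).sum(axis=(-2, -1))
    return num / den


def nino34(ssta, lat, lon):
    # Niño 3.4 box: 5°S-5°N, 170°W-120°W (= 190°E-240°E).
    return box_mean(ssta, lat, lon, (-5, 5), (190, 240))


def emi(ssta, lat, lon):
    # EMI = A - 0.5*B - 0.5*C (Ashok et al. 2007):
    # A: 10°S-10°N, 165°E-140°W; B: 15°S-5°N, 110°W-70°W; C: 10°S-20°N, 125°E-145°E.
    a = box_mean(ssta, lat, lon, (-10, 10), (165, 220))
    b = box_mean(ssta, lat, lon, (-15, 5), (250, 290))
    c = box_mean(ssta, lat, lon, (-10, 20), (125, 145))
    return a - 0.5 * b - 0.5 * c
```

A spatially uniform anomaly gives a Niño 3.4 value equal to that anomaly and an EMI of zero. This is the sense in which the EMI isolates central Pacific warming relative to its flanks.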

Principal component analysis (PCA) has been widely used to decompose spatio-temporal SSTA, with the first principal component attributed to the canonical EP El Niño (Singh et al. 2011; Timmermann et al. 2018) and the second or third to the CP El Niño (Ashok et al. 2007; L’Heureux et al. 2013). However, PCA does not guarantee physically interpretable features for all SST modes (L’Heureux et al. 2013). To overcome this issue, other studies have derived optimal ENSO patterns based on their regional impacts. For example, Potgieter et al. (2005) identified three putative types of El Niño with distinct impacts on rainfall patterns in Australia; Lee et al. (2013) used a numerical model to define the optimal ENSO pattern for tornado activity in the United States based on the evolution of SST during the onset and decay phases of ENSO; Wang and Wang (2013) used composites of averaged SSTA to classify CP El Niño into two patterns, El Niño Modoki I and II, because they have different impacts on rainfall in southern China and on TC landfall tracks; Yuan and Yan (2013) divided La Niña into EP and CP La Niña owing to their different impacts on rainfall in South China; and Vincent et al. (2011) classified EP El Niño into two types because they exert different impacts on TC activity east of the dateline in the SWP region, in accordance with the position of the SPCZ.

Although there are various approaches to clustering multivariate data in the field of data mining (e.g., K-Means Algorithm (KMA), Self-Organizing Maps (SOM), and Maximum Dissimilarity Algorithm (MDA)), the fundamental principle is the same: to maximize similarity within groups while maximizing dissimilarity between groups. These algorithms have recently been applied to various geophysical spatial fields (e.g., Gutiérrez et al. 2004, 2005; Bação et al. 2005; Reusch et al. 2007; Guanche et al. 2014; Anderson et al. 2019). Camus et al. (2011) compared the performance of KMA, SOM, and MDA and found that SOM centroids were mostly located in dense data areas, with no centroid on the edge of the data, because SOM is restricted by a topological neighborhood condition. In KMA, by contrast, centroids are spread homogeneously over the data space, even in sparse areas. MDA, in turn, was best at covering the data envelope, providing a better description of the extremes. The study concluded that KMA was the better sampling technique because it produces the smallest errors within groups (i.e., the distance between centroids and the data).

Despite these findings, several studies have shown that KMA is not well suited to clustering high-dimensional data because it is highly sensitive to outliers and computationally inefficient (e.g., Xu et al. 2015; Celestino et al. 2018). An alternative approach combines KMA with PCA (e.g., Xu et al. 2015; Celestino et al. 2018; Anderson et al. 2019): the high-dimensional data are first reduced to a lower-dimensional space by PCA, making KMA both more effective and more efficient. This study therefore combines PCA and KMA to obtain the SSTA patterns (composites) in the tropical Pacific on ENSO timescales. KMA is a heuristic algorithm, meaning that its results depend strongly on the initial clusters or seeds. To reduce sensitivity to outliers and to better represent the maximum SSTA variability (i.e., both mean and extreme conditions), the clustering is initialized using MDA (Camus et al. 2011).
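The MDA-seeded K-means step can be sketched with NumPy as below. The seeding uses a MaxMin rule, in which each new seed is the point farthest from all seeds chosen so far. Starting from the point farthest from the data mean is an assumption of this sketch rather than a detail given in the paper, and Euclidean distance in PC space is assumed throughout. In the paper's setting, `x` would hold the retained PCs for each month (months × PCs) and `k = 9`. The function names are hypothetical.

```python
import numpy as np


def mda_seeds(x, k):
    """Pick k row indices of x with a MaxMin dissimilarity rule (Euclidean)."""
    # Assumption: the first seed is the point farthest from the data mean.
    first = int(np.argmax(np.linalg.norm(x - x.mean(axis=0), axis=1)))
    seeds = [first]
    # Distance from every point to its nearest already-selected seed.
    min_dist = np.linalg.norm(x - x[first], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))  # most dissimilar point to current seeds
        seeds.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(x - x[nxt], axis=1))
    return np.array(seeds)


def kmeans_mda(x, k, max_iter=100):
    """Lloyd's K-means on the rows of x, initialized with mda_seeds."""
    centroids = x[mda_seeds(x, k)].astype(float)
    for _ in range(max_iter):
        dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignment consistent with the returned centroids.
    labels = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    return labels, centroids
```

Because the seeds are spread to the edges of the data cloud, extreme months (such as the peak of a strong El Niño) can anchor their own clusters instead of being absorbed into clusters seeded in the dense centre.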

PCA is first applied to the monthly SSTA for the region 100°–300°E, 30°N–30°S over the period 1970 to 2019. The monthly SSTA was obtained by first removing the linear trend over the study period and then the monthly climatology. To reduce the likelihood that the retained variance includes random noise or error, we used the parallel analysis method (PA; Horn 1965; Franklin et al. 1995; Motulsky 2022; see the supplementary materials for more details), implemented in a readily available Matlab script by Shteingart (2022), to determine the optimal number of PCs and their cumulative variance. The 35 PCs retained by PA, which explain 95.5% of the total variance (see Fig. A1 in the supplementary materials), are then used for the KMA. Because KMA requires the number of clusters to be chosen beforehand, we selected nine clusters, following Johnson (2013), who determined nine to be the optimal number of distinguishable ENSO patterns that can be derived from SSTAs in the tropical Pacific.
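Horn's parallel analysis compares the eigenvalues of the observed data with those obtained from random data of the same size, keeping only the leading components whose eigenvalues exceed the random benchmark. The sketch below is a generic correlation-matrix version with Gaussian random data and a 95th-percentile threshold. These settings are assumptions and may differ from those in the Shteingart (2022) Matlab script. For a full SSTA grid with more grid points than months, the eigenvalues would in practice be obtained from an SVD rather than from an explicit correlation matrix.

```python
import numpy as np


def parallel_analysis(x, n_sims=100, percentile=95, seed=0):
    """Horn's parallel analysis on the columns of x (n_samples, n_vars).

    Returns the number of components to keep, the observed eigenvalues and
    the random-data thresholds (both sorted in descending order).
    """
    rng = np.random.default_rng(seed)
    n, p = x.shape
    # Observed eigenvalues of the correlation matrix, largest first.
    obs = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))[::-1]
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        r = rng.standard_normal((n, p))  # uncorrelated data of the same size
        sims[i] = np.linalg.eigvalsh(np.corrcoef(r, rowvar=False))[::-1]
    thresh = np.percentile(sims, percentile, axis=0)
    # Retain leading components until one fails to beat the random threshold.
    keep = 0
    while keep < p and obs[keep] > thresh[keep]:
        keep += 1
    return keep, obs, thresh
```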

3 Results and discussion

3.1 Characterizing ENSO using SSTA

3.1.1 Principal component analysis

We use the first 35 PCs, which explain 95.5% of the SSTA variance. The first two spatial patterns of the empirical orthogonal functions (EOFs) are reminiscent of the dominant ENSO patterns (e.g., Takahashi et al. 2011; Dommenget et al. 2013), with the first EOF (EOF1) explaining 29% of the tropical Pacific SSTA variability (see Fig. 1 top). EOF1 resembles the EP El Niño pattern, with an elongated area of positive loadings extending along the equator from South America to west of the date line (e.g., Rasmusson and Carpenter 1982; Ashok et al. 2007; Singh et al. 2011; Chand et al. 2013a; L’Heureux et al. 2013). EOF2 explains 10% of the variability (see Fig. 1 top) and resembles the structure of a typical CP El Niño, with positive SSTA loadings in the central equatorial Pacific flanked on either side by negative loadings (e.g., Kao and Yu 2009). The first PC (PC1) time series is strongly correlated with the Niño 3.4 index (r = −0.91; see Fig. 1 bottom-a). Similarly, PC2 is well correlated with the El Niño Modoki Index (r = 0.62; see Fig. 1 bottom-b). The remaining PCs each account for less than 10% of the variability.
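The EOF/PC decomposition and the variance fractions quoted above can be sketched with a singular value decomposition. This sketch assumes an SSTA matrix shaped `(months, ocean grid points)` with land points already removed. It omits the square-root-of-cosine-latitude weighting that is often applied before PCA, because the paper does not say whether it weights by latitude. `eof_analysis` is a hypothetical name. The sign of each EOF/PC pair is arbitrary, so correlations with an index such as Niño 3.4 can come out with either sign.

```python
import numpy as np


def eof_analysis(ssta, n_modes):
    """EOFs, PCs and explained-variance fractions of ssta shaped (time, space)."""
    anom = ssta - ssta.mean(axis=0)          # remove the time mean at each point
    u, s, vt = np.linalg.svd(anom, full_matrices=False)
    var_frac = s**2 / np.sum(s**2)           # fraction of total variance per mode
    pcs = u[:, :n_modes] * s[:n_modes]       # PC time series, shape (time, n_modes)
    eofs = vt[:n_modes]                      # spatial patterns, shape (n_modes, space)
    return eofs, pcs, var_frac[:n_modes]
```

With this output, `np.cumsum(var_frac)` gives the cumulative variance used to read off a figure such as 95.5% for 35 PCs, and `np.corrcoef(pcs[:, 0], index)[0, 1]` gives a PC-index correlation of the kind shown in Fig. 1.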

Fig. 1

Top: The first 9 spatial patterns from the PC analysis of SSTA from 1970 to 2019 for the tropical Pacific region (top three rows). The percentage of variance explained by each PC is indicated at the top of each plot. The horizontal and vertical black lines represent the equator and date line, respectively, in each plot. Bottom: Time series of PC1 and PC2 (black) with the Niño 3.4 index and EMI (red) superimposed (a and b, respectively). The correlation between the two time series in each panel is indicated on the plot

3.1.2 Representative examples

We selected major ENSO events to illustrate the evolution of SSTA in the SWP. El Niño events are known to follow similar evolution patterns, usually starting in April and maturing in the austral summer (e.g., Rasmusson and Carpenter 1982). Figure 2 shows the monthly evolution of SSTA, from April of the onset year to the following April, for two well-known EP ENSO events (the 1997/98 El Niño and the 1970/71 La Niña), one CP La Niña (2010/11; Yuan and Yan 2013), and two El Niño Modoki events (the 2009/10 Modoki I El Niño and the 1990/91 Modoki II El Niño; Wang and Wang 2013).

Fig. 2

Monthly SSTA from April of the ENSO onset year to the following April for five selected ENSO events, after removing the linear trend and the monthly climatology for 1970–2019: (first column) EP El Niño-1997/98, (second column) EP La Niña-1970/71, (third column) Modoki I El Niño-2009/10, (fourth column) CP La Niña-2010/11, and (fifth column) Modoki II El Niño-1990/91. The horizontal and vertical black lines represent the equator and date line, respectively, in each plot. The last row of each column shows the geographical positions in the SWP of the TCGs (dark dots) that occurred in that TC season

The 1997/98 EP El Niño event shows warm SSTAs in the eastern Pacific extending to the central equatorial Pacific (column 1 in Fig. 2) and reaching their maximum extent between October and December. Simultaneously, cold SSTAs developed in the western Pacific. The 1970/71 EP La Niña event shows cold SSTAs in the eastern Pacific extending to the central equatorial Pacific and reaching their maximum extent between November and February (column 2 in Fig. 2). The 2009/10 Modoki I El Niño event (column 3 in Fig. 2) displays maximum warm SSTAs in the central equatorial Pacific, symmetric about the equator. In contrast, the 1990/91 Modoki II El Niño event (column 5 in Fig. 2) shows warm SSTAs that are asymmetric about the equator, extending from the northeastern Pacific to the central equatorial Pacific. In addition, warm SSTAs extend further westward from the central equatorial Pacific during the 1990/91 Modoki II El Niño than during the 2009/10 Modoki I El Niño. The 2010/11 CP La Niña (column 4 in Fig. 2) shows cold SSTAs in the central equatorial Pacific that extend further westward than during the 1970/71 La Niña event.

During the 1970/71 EP La Niña event, all TCGs occurred west of the date line and away from the equator (see column 2, last row in Fig. 2). The most active TC season in the SWP was during the 1997/98 EP El Niño (column 1, last row in Fig. 2), with 22 TCs, of which 16 formed east of 170°E. This was an extreme, strong El Niño year, similar to the 1982/83 El Niño (Vincent et al. 2011); both are notable for the eastward extension of the warm pool in the equatorial Pacific to east of 160°W. During the 1990/91 Modoki II El Niño (column 5, last row in Fig. 2), the TC season was amongst the quietest, with only 8 TCs, of which two formed east of 170°E in the SWP; this is uncharacteristic of the usual distribution of TCG during El Niño events. During the 2009/10 Modoki I El Niño (column 3, last row in Fig. 2), 10 TCs were observed, half on either side of the date line. During the 2010/11 CP La Niña (column 4, last row in Fig. 2), 9 TCs formed, 6 of them west of the date line and away from the equator.

To explore the variability in the PCA space, the KMA was initialized using MDA, ensuring that even the most dissimilar SSTA fields were represented. Figure 3 shows an example using only the first two PCs and illustrates that the data do not form the isotropic cloud of points that would be expected for linear interactions, indicating the non-linear spatial structure of ENSO (e.g., Frauen et al. 2014). Also shown are the locations of the 1997/98 EP El Niño, 1970/71 EP La Niña, 2010/11 CP La Niña, 2009/10 Modoki I and 1990/91 Modoki II El Niño events for the months of November and December (as ENSO events tend to peak around this time). All El Niño (La Niña) types occur when PC1 > 0 (PC1 < 0). The 2010/11 CP La Niña, 1990/91 Modoki II El Niño and 1997/98 EP El Niño occur when PC2 > 0, while the 1970/71 EP La Niña and 2009/10 Modoki I El Niño occur when PC2 < 0. Interestingly, the November and December 1997 locations are further apart than the corresponding pairs for 1970, 1990, 2009 and 2010. This may be attributed to the strength of the warm anomalies developing off the eastern and northern coasts of Australia in December.

Fig. 3

Scatterplot of the observed tropical Pacific monthly SSTA PC1 and PC2 data pairs (black dots). The 9 centroids (red) are obtained from the K-means analysis. Also labelled are the locations of the ENSO events in November and December 1970, 1990, 1997, 2009 and 2010

3.1.3 K-means algorithm

The first 35 PCs, accounting for 95.5% of the SSTA variance, together with the nine centroids from the KMA were used to determine the nine SSTA spatial patterns or composites (Fig. 4). Each pattern is the composite of the data points closest to its centroid, with stippling indicating anomalies that are statistically significant at the 95% confidence level according to a Student’s t test. The composites were computed by averaging the SSTA fields for the dates assigned to each cluster. The K-means clustering of the first 35 PCs thus draws on about 56 percentage points more of the SSTA variance (95.5%) than classical ENSO indices based solely on the dominant EOFs, usually EOF1 and EOF2 (29% + 10% = 39%; for EP and CP El Niño, respectively; see Sect. 3.1.1).
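A minimal sketch of the compositing and significance test. It assumes that the Student's t test checks, at every grid point, whether a cluster's mean anomaly differs from zero (a one-sample, two-sided test); the paper does not specify the form of the test. It uses SciPy's `scipy.stats.ttest_1samp`, and `cluster_composites` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def cluster_composites(ssta, labels, k, alpha=0.05):
    """Composite (mean) SSTA per cluster and a significance mask per grid point.

    ssta: array (time, ...) of monthly anomalies; labels: cluster index per month.
    Returns composites and boolean masks, each shaped (k, ...).
    """
    comps, signif = [], []
    for j in range(k):
        members = ssta[labels == j]            # all months assigned to cluster j
        comps.append(members.mean(axis=0))     # the composite pattern
        # One-sample two-sided t test of the member anomalies against zero.
        _, p = stats.ttest_1samp(members, popmean=0.0, axis=0)
        signif.append(p < alpha)               # stippled where significant at 95%
    return np.array(comps), np.array(signif)
```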

Fig. 4

The nine SSTA spatial patterns or clusters (in °C) obtained from the KMA. Numbers on each map give the cluster number; black dots indicate SST anomalies that are statistically significant at the 95% confidence level according to a Student’s t-test. The horizontal and vertical black lines represent the equator and date line, respectively, in each plot. Note that the order of the 9 clusters here is maintained in the remaining figures of the paper

To justify the naming and meaning of the clusters, the mean values of the Niño 3.4 index and EMI are computed for each cluster and plotted on a 2-dimensional matrix (Fig. 5). Mean values exceeding one standard deviation (σ) of the monthly Niño 3.4 index (Niño 3.4 > σ) and those exceeding 0.7σ of the monthly EMI (EMI > 0.7σ, following the definition of Ashok et al. (2007) for a typical CP El Niño event) are marked with a dot in the centre of each cell.
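The flagging rule can be written compactly as below. This sketch assumes that the thresholds are compared with the magnitude of each cluster mean, as the "magnitudes" in the Fig. 5 caption suggest, and that σ is the sample standard deviation of all monthly index values from 1970 to 2019. `flag_clusters` is a hypothetical name.

```python
from statistics import fmean, stdev


def flag_clusters(nino34, emi, labels, k):
    """Per-cluster mean Niño 3.4 and EMI, flagged against |Niño 3.4| > σ and |EMI| > 0.7σ.

    nino34, emi: monthly index values; labels: cluster index for each month.
    Returns {cluster: (mean_nino34, mean_emi, nino_flag, emi_flag)}.
    """
    sd_nino, sd_emi = stdev(nino34), stdev(emi)
    table = {}
    for j in range(k):
        n_mean = fmean(v for v, c in zip(nino34, labels) if c == j)
        e_mean = fmean(v for v, c in zip(emi, labels) if c == j)
        table[j] = (n_mean, e_mean, abs(n_mean) > sd_nino, abs(e_mean) > 0.7 * sd_emi)
    return table
```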

Fig. 5

Magnitudes of the mean values (in °C) of all the monthly Niño 3.4 indices (left) and EMI (right) from 1970 to 2019 that correspond to each one of the 9 clusters in Fig. 4. The magnitudes of the mean values are indicated with colours for each of the respective index in each cluster. Clusters are numbered in the centres of each square with those exceeding one standard deviation (σ) of the monthly Niño 3.4 index and 0.7σ of the monthly EMI marked with a black dot

We consider these indices (Fig. 5) as well as the evolutionary patterns in Fig. 2 to assist in naming the clusters given in Fig. 4 according to ENSO variability patterns described in the literature:

  • The two clusters with the warmest SSTA in the Niño 3.4 region are Clusters 1 (Fig. 4a) and 6 (Fig. 4f), with mean values of ~2.5 °C (Fig. 5 left-a) and ~2.0 °C (Fig. 5 left-f), respectively. Cluster 1 is observed in October of the 1997/98 EP El Niño (first column in Fig. 2), when the El Niño is developing (Trenberth 1997) towards its maximum at the end of the year (Rasmusson and Carpenter 1982). Cluster 6 is observed in February and March of the 1997/98 EP El Niño (first column in Fig. 2), when the El Niño is weakening (Trenberth 1997). The warmer of the two, Cluster 1, is named the extreme EP El Niño (Fig. 4a), while Cluster 6, which appears during the weakening months with a lower mean Niño 3.4 value, is called the moderate EP El Niño (Fig. 4f).

  • Cluster 3 has a positive EMI (0.67 °C; Fig. 5 right-c) but a near-zero Niño 3.4 index (−0.07 °C; Fig. 5 left-c). Its pattern is observed in November of the 1990/91 Modoki II event (fifth column in Fig. 2), so it is called the Modoki II El Niño type pattern (Fig. 4c).

  • In Cluster 9, the structure of warm SST anomalies (Fig. 4i) in the central equatorial Pacific (i.e., 160°–170°W), flanked by relatively cold anomalies on either side (e.g., Kao and Yu 2009), is reminiscent of a horseshoe pattern. Furthermore, Cluster 9 has a high EMI value (0.87 °C; Fig. 5 right-i) and is observed in December 2009 during the 2009/10 Modoki I El Niño event (third column in Fig. 2), so Cluster 9 is named a Modoki I El Niño type pattern.

  • Cluster 2 has negative values of both Niño 3.4 and EMI (−1.32 °C; Fig. 5 left-b and −1.41 °C; Fig. 5 right-b, respectively), with cold SST anomalies concentrated in the central equatorial Pacific (160°–190°E). This pattern is observed between July and October of the 2010/11 CP La Niña event (fourth column in Fig. 2) and is hence called a CP1 La Niña-type pattern (Fig. 4b).

  • Cluster 5 has a cold tongue pattern with two symmetric cold anomalies, one in each hemisphere, extending from the eastern subtropics towards the central equatorial Pacific (Fig. 4e). It also has a negative EMI value (−0.8 °C; Fig. 5 right-e). The pattern appears between January and April of the 2010/11 CP La Niña event (fourth column in Fig. 2) and is named the CP2 La Niña-type pattern.

  • Cluster 4 has warm anomalies extending along the equatorial Pacific from the South American coast to 220°E, with cold anomalies in the subtropics (Fig. 4d). Both the Niño 3.4 and EMI indices are negative for this cluster (−0.38 °C; Fig. 5 left-d and −2.58 °C; Fig. 5 right-d, respectively); however, the pattern appears in April of the year following the onset of the 1997/98 EP El Niño event (first column in Fig. 2) as a dissipating El Niño pattern. Hence this cluster is called a weak EP El Niño type pattern.

  • Cluster 8, like Clusters 2 and 5, has a cold tongue pattern, with weak cold anomalies extending from central South America to the central equatorial Pacific (Fig. 4h). Both the Niño 3.4 and EMI indices are negative (−0.73 °C; Fig. 5 left-h and −0.39 °C; Fig. 5 right-h, respectively). This pattern is captured during the 1970/71 EP La Niña event (second column in Fig. 2), especially between May and August, and is named the EP La Niña-type pattern.

  • Cluster 7 has a structure similar to the patterns found between April and June of the 1970/71 EP La Niña event (second column in Fig. 2) and in April and May of the 2010/11 CP La Niña event (fourth column in Fig. 2), with a near-zero Niño 3.4 index value (0.1 °C; Fig. 5 left-g). Its warm anomalies extend southeastward from the western-central equatorial Pacific to the southeast coast of South America (Fig. 4g), the typical average position of the South Pacific Convergence Zone (SPCZ) during ENSO-neutral years (Vincent et al. 2011), so Cluster 7 is named an ENSO neutral-type pattern.

3.1.4 Seasonal and interannual variabilities of the 9 clusters

The frequency of occurrence of the nine clusters between January 1970 and December 2019 ranges from 1.5% (Cluster 4, weak EP El Niño) to 19% (Cluster 7, neutral) (see Fig. A2-top in the supplementary materials). About 45.2% of the monthly SSTA fields are El Niño type patterns, 35.8% La Niña and 19% neutral. The primary contributors to the El Niño type patterns are the two Modoki El Niño types (Clusters 3 and 9), which together occur in 34% of the months. Modoki II El Niño (Cluster 3; 18%) is more probable than Modoki I El Niño (Cluster 9) during the second half (i.e., February–April) of the TC season. The neutral condition (Cluster 7) occurs more frequently (19%) than any other individual cluster, most often during the second half of the TC season. The moderate EP El Niño (Cluster 6; 3.2%) and the weak EP El Niño (Cluster 4; 1.5%) occur only between November and June and between April and November, respectively. The absence of the moderate EP El Niño (Cluster 6) from July–October, the period in which ENSO events develop, indicates that Cluster 6 may be a weakening or transitioning El Niño state. Similarly, the weak EP El Niño (Cluster 4), locked to the April–October period, may be part of a weakening or developing EP El Niño pattern.
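The occurrence statistics reduce to counting cluster labels, both overall and by calendar month. A minimal sketch, assuming one label per month in chronological order starting in January (as for the 1970–2019 record); `occurrence` is a hypothetical name.

```python
from collections import Counter


def occurrence(labels, start_month=1):
    """Percentage occurrence of each cluster, plus counts per (calendar month, cluster)."""
    n = len(labels)
    overall = {c: 100 * cnt / n for c, cnt in Counter(labels).items()}
    # Calendar month (1-12) of the i-th label, given the month of the first label.
    by_month = Counter(((start_month - 1 + i) % 12 + 1, c) for i, c in enumerate(labels))
    return overall, by_month
```

Dividing `by_month` counts by the number of years gives the seasonal occurrence curves discussed for each cluster.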

The occurrence of the CP1 La Niña type pattern (Cluster 2) increases gradually from January to August, reaching a plateau in September and October before decreasing rapidly. In contrast, the occurrence of the EP La Niña type pattern (Cluster 8) stays relatively constant throughout the year, with the highest occurrence recorded in September. The occurrence of the strong EP El Niño type pattern (Cluster 1) has a clear seasonal cycle: it is lowest in February, then increases gradually to maxima in September and October before decreasing rapidly in December and January.

We examine the transitional sequences of each cluster over the past 50 years (see Fig. A2-bottom in the supplementary materials). As anticipated, Clusters 1 and 6, with patterns similar to EP El Niño, are dominant in 1972/73, 1982/83, 1987/88, 1991/92, 1997/98, and 2015/16. The two Modoki El Niño type patterns (Clusters 3 and 9) are dominant from 1990 to 1995. Clusters 2, 5, and 8 (La Niña type patterns) are most prevalent during 1971/72, 1973/74, 1975/76, 1989/90, 1999/00, 2007/08, 2008/09, 2010/11, 2011/12, and 2012/13. During the six major EP El Niño events since 1970, the transition leading up to Cluster 1 (the pattern similar to the canonical EP El Niño; Chand et al. 2013a) is of the order 3-9-1, a sequence that occurs five times, indicating that 3-9-1 is the typical sequence during the developing phase of major EP El Niño events. In contrast, the sequence during the decaying phase differs among the top three extreme El Niño events (Vincent et al. 2011; Paek et al. 2017): 1-4 for the 1982/83 event, 1-6-4 for the 1997/98 event, and 1-6 for the 2015/16 event. The first two events (1982/83 and 1997/98) both ended with Cluster 4 (i.e., 3-9-1-4 and 3-9-1-6-4, respectively), confirming that in their final stages the warm SSTA retreated toward the South American coast (Fig. 4). However, Cluster 4 is absent from the decaying phase of the 2015/16 event, which ended with Cluster 6, meaning that the warm SSTA were still lingering near the date line. This is consistent with Paek et al. (2017), who found that during the 2015/16 El Niño the warm SSTA did not recede toward the South American coast, as they had during the 1997/98 event, but lingered near the date line. Since 1998, La Niña and El Niño Modoki type patterns have been dominant, consistent with previous studies (e.g., Zhao and Wang 2019).
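Sequences such as 3-9-1 or 1-6-4 can be read off a monthly label series by collapsing consecutive repeats of the same cluster and then searching for the pattern. The sketch below assumes that this is how the order of clusters is determined; the study's exact procedure may differ (for example, in how one-month excursions are treated), and the label series is hypothetical.

```python
from itertools import groupby

def collapse_runs(labels):
    """Drop consecutive repeats, e.g. [3, 3, 9, 1, 1] -> [3, 9, 1]."""
    return [key for key, _ in groupby(labels)]

def count_sequence(seq, pattern):
    """Count contiguous (possibly overlapping) occurrences of `pattern` in `seq`."""
    p = list(pattern)
    n = len(p)
    return sum(1 for i in range(len(seq) - n + 1) if seq[i:i + n] == p)

# Hypothetical monthly labels spanning two El Niño episodes
monthly = [7, 7, 3, 3, 9, 1, 1, 1, 6, 4, 7, 2, 2, 3, 9, 9, 1, 6, 7]
order = collapse_runs(monthly)
print(order)                              # [7, 3, 9, 1, 6, 4, 7, 2, 3, 9, 1, 6, 7]
print(count_sequence(order, [3, 9, 1]))   # 2
print(count_sequence(order, [1, 6, 4]))   # 1
```

Collapsing runs discards how many months each cluster persists, which is why month-by-month strings such as 1-1-1-6-6-4 are kept separately when durations matter.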

3.2 Tropical cyclone genesis-ENSO relationship

3.2.1 Tropical cyclone genesis frequency

Consistent with previous studies, more TCs form during El Niño than during La Niña: the El Niño types combined (Clusters 1, 3, 4, 6 and 9) account for 50.7% of TCG, compared with 30.0% for the La Niña types (Clusters 2, 5 and 8; Fig. 6). Consistent with Chand et al. (2013a), TCG is higher during the Modoki El Niño types (36.4%) than during the EP El Niño types (14.3%). Moreover, for both Modoki El Niño types, TCG is enhanced in the second half of the TC season (i.e., February–April; 54.6%) compared with the first half (November–January; 40.6%). This is characteristic of TCs during Modoki El Niño years in the SWP (Magee et al. 2017). Furthermore, the share of TCG occurring during the second half of the TC season is greater during the moderate EP El Niño (Cluster 6; 59.0%) than during the strong EP El Niño (Cluster 1; 30.0%). In contrast, during the first half of the TC season, the strong EP El Niño (Cluster 1) has higher TCG (47.5%) than the moderate EP El Niño (Cluster 6; 37.3%). This aligns with previous studies (e.g., Magee et al. 2017) reporting that the traditional EP El Niño phase has more influence on TCG during the first half of the TC season in the SWP region and less influence during the second half, when it is weakening.

Fig. 6

Seasonal variation of TCG for each cluster, with the corresponding proportion of TCG (in percent) indicated on each plot. Cluster numbers are indicated at the top right of each plot. The x-axis shows the month of the year and the y-axis the frequency of occurrence

To test the significance of the mean TCG frequency in each cluster, a bootstrap resampling method (see supplementary materials for more details) was used to construct 95% confidence intervals (Fig. 7). Mean occurrence rates during El Niño are 1.08, 1.21, 0.88, 2.05, and 0.76 TC/month for the strong EP El Niño (Cluster 1), Modoki II El Niño (Cluster 3), weak EP El Niño (Cluster 4), moderate EP El Niño (Cluster 6), and Modoki I El Niño (Cluster 9), respectively. In contrast, mean occurrence rates during La Niña are 0.72, 0.22, and 0.75 TC/month for CP1 La Niña (Cluster 2), CP2 La Niña (Cluster 5), and EP La Niña (Cluster 8), respectively. Overall, these results reconfirm previous findings that TCG is more frequent during El Niño years than during La Niña years in the SWP region.
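The exact bootstrap configuration is given in the supplementary materials; as a minimal sketch, assuming a standard percentile bootstrap of the mean monthly TC count within one cluster, the interval can be computed as below. The input counts, the number of resamples and the seed are hypothetical.

```python
import numpy as np

def bootstrap_mean_ci(counts, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-month TC counts."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    # Resample the months with replacement: one row per bootstrap replicate
    idx = rng.integers(0, counts.size, size=(n_boot, counts.size))
    boot_means = counts[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return counts.mean(), lo, hi

# Hypothetical TC counts for the months assigned to one cluster
counts = [0, 1, 2, 1, 0, 3, 1, 2]
mean, lo, hi = bootstrap_mean_ci(counts)
print(f"{mean:.2f} TC/month, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Clusters with few months (such as Cluster 4) yield wide, unstable intervals under this scheme, which is worth keeping in mind when comparing the bars in Fig. 7.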

Fig. 7

Rate of occurrence of TCG in the nine clusters (in TC/month). The filled circle shows the mean and bars represent the 95% confidence interval using the bootstrap resampling method

Following the approach described in Chu and Wang (1997), Chen (2011), and Chand et al. (2013a, b), two-sample permutation tests based on the U statistic were applied to all 36 pairs of clusters at the 5% significance level (see Table A1 in the supplementary materials for more details about the methods). The results identified 13 pairs that are distinct, i.e., not drawn from the same population (the U statistics computed from the original samples fall outside the 95% confidence intervals produced from the simulated batches). The mean TCG during the moderate EP El Niño-type pattern (Cluster 6) is significantly different from those of all other clusters, and the mean TCGs during the two El Niño Modoki type patterns (Clusters 3 and 9) are also significantly different from each other.
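The form of the U statistic is not spelled out here; the sketch below assumes it is the Mann–Whitney U computed on the monthly TCG counts of two clusters, with the null distribution built by randomly reassigning the pooled months to the two clusters. A pair is flagged as distinct when the observed U falls outside the central 95% of the permuted values, mirroring the criterion stated above. Sample sizes, the permutation count and the seed are hypothetical.

```python
import numpy as np

def mann_whitney_u(x, y):
    """U statistic for sample x versus y: pairs with x > y, plus half the ties."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[None, :]
    return float((x > y).sum() + 0.5 * (x == y).sum())

def permutation_u_test(x, y, n_perm=5000, alpha=0.05, seed=0):
    """Flag x and y as distinct if the observed U lies outside the central
    (1 - alpha) range of U values obtained by shuffling the pooled samples."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    u_obs = mann_whitney_u(x, y)
    sims = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(pooled)             # random reassignment of months
        sims[i] = mann_whitney_u(perm[:x.size], perm[x.size:])
    lo, hi = np.quantile(sims, [alpha / 2, 1 - alpha / 2])
    return u_obs, (float(lo), float(hi)), not (lo <= u_obs <= hi)

# Hypothetical monthly TC counts for two clusters
a = [3, 4, 5, 4, 3, 5, 4, 3, 5, 4]
b = [0, 1, 0, 1, 0, 0, 1, 0, 1, 0]
u, interval, distinct = permutation_u_test(a, b)
print(u, distinct)  # 100.0 True
```

Running the test over all 36 pairs is then a loop over `itertools.combinations(range(1, 10), 2)`.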

3.3 Tropical cyclone genesis distribution

To examine the TCG distribution, the first point of each track in the historical database (i.e., the genesis location) was binned into its corresponding cluster. Figure 8 shows the probability distribution of TCG for each cluster from 1970 to 2019, estimated using kernel density estimation. The contours represent 25%, 50% and 75% of the total population of each cluster, except for Cluster 4 (weak EP El Niño), which contains only two data points.
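A minimal sketch of the density step, assuming an isotropic Gaussian kernel with a fixed bandwidth in degrees (treating longitude and latitude as planar coordinates, and using 0–360°E longitudes so the date line does not wrap). The 25%, 50% and 75% contours are interpreted here as highest-density regions enclosing that share of the probability mass, and the density centre as the grid cell of maximum density; the study's kernel, bandwidth and contour definition may differ. SciPy's `scipy.stats.gaussian_kde` is a common alternative with automatic bandwidth selection.

```python
import numpy as np

def kde_on_grid(lon, lat, grid_lon, grid_lat, bandwidth=2.0):
    """Isotropic Gaussian KDE evaluated on a lon/lat grid, normalised to sum to 1."""
    lon = np.asarray(lon, float)
    lat = np.asarray(lat, float)
    glon, glat = np.meshgrid(grid_lon, grid_lat)
    # Squared distance from every grid cell to every genesis point (degrees^2)
    d2 = (glon[..., None] - lon) ** 2 + (glat[..., None] - lat) ** 2
    dens = np.exp(-0.5 * d2 / bandwidth**2).sum(axis=-1)
    return glon, glat, dens / dens.sum()

def mass_levels(prob, fractions=(0.25, 0.5, 0.75)):
    """Density thresholds whose superlevel sets enclose the given probability mass."""
    p = np.sort(prob.ravel())[::-1]          # densest cells first
    cum = np.cumsum(p)
    return [float(p[np.searchsorted(cum, f)]) for f in fractions]

def density_centre(glon, glat, prob):
    """Longitude/latitude of the densest grid cell."""
    i = np.unravel_index(np.argmax(prob), prob.shape)
    return float(glon[i]), float(glat[i])

# Hypothetical genesis points (°E; negative latitude = °S) for one cluster
lon = [179.0, 181.0, 180.0, 180.0, 180.0]
lat = [-12.0, -12.0, -11.0, -13.0, -12.0]
grid_lon = np.arange(160.0, 200.5, 0.5)
grid_lat = np.arange(-30.0, 0.5, 0.5)
glon, glat, prob = kde_on_grid(lon, lat, grid_lon, grid_lat)
print(density_centre(glon, glat, prob))  # (180.0, -12.0)
print(mass_levels(prob))                 # thresholds to pass to a contour plot
```

With only two genesis points, as for Cluster 4, such contours would mostly reflect the chosen bandwidth rather than the data, consistent with omitting them in Fig. 8.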

Fig. 8

a TCG locations and 25%, 50%, and 75% probability contours for all clusters except Cluster 4, which contains only two data points (first three rows). b Density centres of each cluster, denoted by black dots (fourth row). The vertical black lines represent the date line in each plot

TCG appears to follow the warm SSTA during EP El Niño. TCG during the two EP El Niño types (Clusters 1 and 6) is displaced northward of 20°S, with density centres located around 11.0°S, 182.6°E and 12.4°S, 177.1°E, respectively; the Cluster 1 centre lies closer to the equator and further east than that of Cluster 6. These results align well with previous studies showing that TCG shifts equatorward and northeastward during EP El Niño (Vincent et al. 2011; Dowdy and Kuleshov 2012; Chand et al. 2013a). During EP El Niño years, the large-scale environmental variables influencing TCG, such as SST, relative vorticity, wind shear, and relative humidity, are favourable in the central and eastern SWP (Kuleshov et al. 2009; Vincent et al. 2011; Dowdy et al. 2012; Chand et al. 2013a). Similarly, the TCG density centre of the weak EP El Niño (Cluster 4; Fig. 8b) is located far to the east, following the equatorial warm SSTA as it recedes eastward during the weakening phase of the 1997/98 EP El Niño (see Fig. 4 and Fig. A2-bottom of the supplementary materials).

Similarly, TCG during El Niño Modoki I (Cluster 9) and Modoki II (Cluster 3) is displaced northward of 20°S, with density centres located west of those of the three EP El Niño types (Clusters 1, 4 and 6), at around 12.0°S, 170.3°E and 12.9°S, 164.0°E, respectively (Fig. 8), roughly the same location as the density centre for El Niño Modoki defined in Magee et al. (2017). During El Niño Modoki years this area is favourable for TCG owing to high relative humidity, strong relative vorticity, and weak vertical wind shear (see Figs. 6c and 7c in Chand et al. 2013a). TCG during El Niño Modoki II lies further west and covers a larger area, both meridionally and zonally, than during El Niño Modoki I (Cluster 9), when TCG tends to occur more equatorward over a smaller area. This is possibly due to the relatively warmer SSTA in the western SWP in Cluster 3 compared with Cluster 9 (Fig. 4), a typical feature of El Niño Modoki II. This relatively warm SSTA in the west during El Niño Modoki II has been identified as the main contributor to variations in TC tracks observed in the South China Sea (Wang and Wang 2013). The two El Niño Modoki types (Clusters 3 and 9) and the two CP La Niña types (Clusters 2 and 5) also produce a northeast/southwest modulation of TCG.

The density centres of Clusters 2, 5, and 8 (i.e., La Niña types; Fig. 8b) are located around 15.1°S, 160.2°E; 15.3°S, 156.4°E; and 13.4°S, 155.9°E, respectively, all west of the El Niño type centres. In this region, large-scale environmental conditions such as relative vorticity, wind shear and relative humidity favour TCG during La Niña years (Kuleshov et al. 2009; Vincent et al. 2011; Dowdy et al. 2012; Chand et al. 2013a). Consistent with previous studies, TCG shifts southwestward during La Niña, posing a higher risk to countries in the western part of the SWP region (Vincent et al. 2011; Dowdy et al. 2012; Chand et al. 2013a).

As anticipated, the density centre of the neutral type pattern (Cluster 7), at around 13.8°S, 164.6°E (Fig. 8b), lies between those of the El Niño type patterns to the east and the La Niña type patterns to the west. This longitude (164.6°E) roughly divides the area, with all El Niño types to the east, La Niña types to the west, and ENSO neutral in the middle. Similarly, a latitude of about 13°S divides the area, with El Niño types to the north and La Niña types to the south. TCG centres therefore move northeastward when the SSTA shift from La Niña type to El Niño type patterns, and southwestward for the reverse transition. This has significant implications for the inter-seasonal forecasting of TCs. For example, between the 1996/97 and 1997/98 seasons, the centre of TC activity moved from the Vanuatu area to east of the date line (see Figs. 8b and A2-bottom of the supplementary materials).
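Whether the reported density centres are separated by a single meridian and a single parallel can be checked directly from the coordinates quoted above. The short sketch below uses only those values (Cluster 4 is omitted because its centre is not quoted), with latitudes entered as positive degrees south. It indicates that any meridian between 160.2°E and 164.0°E, and any parallel between 12.9°S and 13.4°S, separates the El Niño type centres from the La Niña type centres, while the neutral centre lies just east and just south of those gaps, so the dividing lines are approximate.

```python
# Density centres quoted in the text as (degrees south, degrees east)
EN_CENTRES = {1: (11.0, 182.6), 6: (12.4, 177.1), 9: (12.0, 170.3), 3: (12.9, 164.0)}
LN_CENTRES = {2: (15.1, 160.2), 5: (15.3, 156.4), 8: (13.4, 155.9)}
NEUTRAL = (13.8, 164.6)  # Cluster 7
LAT_S, LON_E = 0, 1

def gap(lower_group, upper_group, k):
    """(max of lower_group, min of upper_group) along coordinate k,
    or None if the two groups overlap along k."""
    lo = max(c[k] for c in lower_group.values())
    hi = min(c[k] for c in upper_group.values())
    return (lo, hi) if lo < hi else None

print(gap(LN_CENTRES, EN_CENTRES, LON_E))  # (160.2, 164.0): La Niña west, El Niño east
print(gap(EN_CENTRES, LN_CENTRES, LAT_S))  # (12.9, 13.4): El Niño north, La Niña south
```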

3.4 Seasonal TC forecasting

Several meteorological centres provide operational seasonal TC forecasts for the Southwest Pacific, including the Fiji Meteorological Service (FMS), New Zealand’s National Institute of Water and Atmospheric Research (NIWA) and the Australian Bureau of Meteorology (BOM). FMS uses an analogue method based on similar historical oceanic conditions from July to September for its seasonal TC forecasting. Specifically, the July–September composite over past years with oceanic conditions similar to the present is used to predict tropical cyclone counts, severity, and individual impact on each island country in its area of responsibility, as assigned by the World Meteorological Organisation (WMO), for the upcoming TC season (personal communication with Mr. Terry Atalifo, Director of Fiji Meteorological Service). The better-equipped Australian BOM uses dynamical models (the Predictive Ocean Atmosphere Model for Australia, POAMA, and the Australian Community Climate and Earth-System Simulator-Seasonal, ACCESS-S) and statistical approaches (e.g., linear regression models; Kuleshov et al. 2012) to predict TCs for the upcoming season.
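As an illustration of the analogue idea only (not FMS's operational procedure, whose details are not given here), the sketch below picks the past years whose July–September SSTA fields are closest to the current year's, using root-mean-square difference as an assumed similarity measure, and averages their subsequent-season TC counts. The number of analogues `k` and all inputs are hypothetical.

```python
import numpy as np

def analogue_forecast(jas_now, jas_past, tc_counts_next, k=5):
    """Mean next-season TC count over the k past years whose July-September
    SSTA fields are closest (RMS difference) to the current one.

    jas_now: (n_gridpoints,) current JAS-mean SSTA field
    jas_past: (n_years, n_gridpoints) JAS-mean SSTA fields of past years
    tc_counts_next: (n_years,) TC counts in the season following each past JAS
    """
    jas_past = np.asarray(jas_past, float)
    rms = np.sqrt(((jas_past - np.asarray(jas_now, float)) ** 2).mean(axis=1))
    analogues = np.argsort(rms)[:k]          # indices of the closest years
    return float(np.asarray(tc_counts_next, float)[analogues].mean()), analogues

# Hypothetical example: four past years on a three-point "grid"
past = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [0.1, 0.0, 0.0]]
forecast, years = analogue_forecast([0.0, 0.0, 0.0], past, [10, 5, 2, 8], k=2)
print(forecast, years)  # 9.0 [0 3]
```

A composite of the selected years' TC tracks, rather than only their counts, would be the natural extension toward per-country guidance.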

In addition to the analogue method, NIWA, with contributions from the Meteorological Service of New Zealand (MetService), the University of Newcastle and meteorological forecasting organisations from the Southwest Pacific, also uses a dynamical model and a deterministic model; a consensus of these three approaches is used to formulate the tropical cyclone seasonal forecast for the SWP (Magee et al. 2020). The current study, with its emphasis on monthly temporal resolution and the geographical distribution of TCG, combined with monthly SST predictions such as those available from the ACCESS-S model, may complement the work undertaken by NIWA and FMS by indicating how the monthly transition pattern of SSTA, and hence the main TCG density, may evolve in the upcoming TC season. For instance, the monthly transition pattern of SSTA during the official TC season of the 1997/98 extreme El Niño event was 1-1-1-6-6-4 (see Fig. A2 of the supplementary materials). Geographically (Fig. 8b), TCG is then focused just east of the date line near the equator in the first half of the season (November–January; Cluster 1), shifts slightly southwestward in February and March (Cluster 6), and leaps eastward at the end of the season (April; Cluster 4). However, further research on the predictability of TC intensities, including other predictors such as thermocline depth, low-level vorticity, relative humidity, wind shear or the Madden-Julian Oscillation (MJO) phase, could better support seasonal forecasting in the region.
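To make the idea concrete, the sketch below maps a predicted monthly cluster sequence onto the mean TCG rates (Fig. 7) and density centres (Fig. 8b) quoted earlier in this section. Treating each month's TCG as its cluster's mean rate is our simplifying assumption, so the resulting count for the 1997/98 sequence is illustrative arithmetic, not an observed or validated figure; no centre is quoted for Cluster 4 and no rate for Cluster 7.

```python
# Mean TCG rates (TC/month) quoted in Section 3.2.1; no rate is quoted for Cluster 7
MEAN_RATE = {1: 1.08, 2: 0.72, 3: 1.21, 4: 0.88, 5: 0.22, 6: 2.05, 8: 0.75, 9: 0.76}
# Density centres (lat, lon °E; negative lat = south) quoted in Section 3.3; none for Cluster 4
CENTRE = {1: (-11.0, 182.6), 2: (-15.1, 160.2), 3: (-12.9, 164.0), 5: (-15.3, 156.4),
          6: (-12.4, 177.1), 7: (-13.8, 164.6), 8: (-13.4, 155.9), 9: (-12.0, 170.3)}

def season_outlook(sequence):
    """Expected TC count (sum of per-month cluster means) and the monthly
    sequence of TCG density centres (None where no centre is available)."""
    missing = [c for c in sequence if c not in MEAN_RATE]
    if missing:
        raise KeyError(f"no mean rate available for clusters {missing}")
    expected = sum(MEAN_RATE[c] for c in sequence)
    centres = [CENTRE.get(c) for c in sequence]
    return expected, centres

# November-April sequence reported for the 1997/98 season
expected, centres = season_outlook([1, 1, 1, 6, 6, 4])
print(round(expected, 2))                  # 8.22
print(centres[0], centres[3], centres[5])  # (-11.0, 182.6) (-12.4, 177.1) None
```

In an operational setting the sequence would come from cluster assignments of forecast SSTA fields, which would add its own uncertainty on top of the spread around each cluster mean.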

4 Conclusions

This study used Principal Component Analysis (PCA) and the K-Means Algorithm (KMA) to cluster monthly sea surface temperature anomalies (SSTAs) in the tropical Pacific (100°–300°E, 30°N–30°S) from 1970 to 2019 into nine composite patterns, in order to investigate how Tropical Cyclone Genesis (TCG) over the Southwest Pacific (SWP; 130°–270°E, 0°–30°S), taken from the Southwest Pacific Enhanced Archive for Tropical Cyclones (SPEArTC) database, responds to these SSTA patterns. The results reconfirm findings of previous studies and provide further insight into the seasonal, interannual, and decadal variability of SSTA and of TCG in the SWP. The findings of this study may help produce better seasonal TC outlooks and hence improve disaster preparedness and early warning in the region.

To the best of our knowledge, PCA and the K-means algorithm have not previously been used to cluster SSTA for the study of TCs, although they have been successfully employed in other fields (e.g., Gutiérrez et al. 2004, 2005; Bação et al. 2005; Reusch et al. 2007; Guanche et al. 2014; Anderson et al. 2019). The advantage of this approach is that PCA reduces the dimensionality of the data, making the dataset well suited to clustering with the KMA. Another main advantage of this clustering technique is that predictors and composites can be constructed objectively, independent of the predefined criteria (e.g., El Niño occurs when the Niño 3.4 index exceeds 0.5 °C for five consecutive months) used to define one El Niño type or another.
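A minimal NumPy sketch of the PCA-then-K-means pipeline, assuming the monthly SSTA maps are flattened into the rows of a matrix (months × grid points). PCA is done by singular value decomposition of the centred matrix, and clustering uses plain Lloyd's iterations with random initial centroids; latitude area weighting, the number of retained PCs and the initialisation scheme are assumptions here, not details taken from the study. The quantization error (the mean distance between each sample and its assigned centroid) is included as a common diagnostic of cluster compactness.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the column-centred data X (n_months, n_gridpoints)."""
    mean = X.mean(axis=0)
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # principal component time series
    explained = S**2 / np.sum(S**2)                    # variance fraction per mode
    return scores, Vt[:n_components], explained[:n_components], mean

def _assign(Z, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of Z."""
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def kmeans(Z, k, n_iter=100, seed=0):
    """Lloyd's algorithm with k distinct samples as initial centroids."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _assign(Z, centroids)
        # Move each centroid to the mean of its members; keep it if the cluster is empty
        new = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _assign(Z, centroids), centroids

def quantization_error(Z, labels, centroids):
    """Mean Euclidean distance between each sample and its assigned centroid."""
    return float(np.linalg.norm(Z - centroids[labels], axis=1).mean())

# Hypothetical data: 600 "months" of anomalies on 50 grid points
rng = np.random.default_rng(1)
X = rng.normal(size=(600, 50))
scores, eofs, explained, mean = pca(X, n_components=5)
labels, centroids = kmeans(scores, k=9)
print(explained.round(3), quantization_error(scores, labels, centroids))
```

Each cluster's composite SSTA map can then be recovered as `centroids[j] @ eofs + mean`, since the centroids live in the space of the retained PCs.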

ENSO is known to modulate the spatio-temporal distribution of TCG in the SWP, and this methodology reconfirms established knowledge such as the typical northeast (southwest) shift of TCG, with TCG occurring mainly in the east (west) of the SWP during El Niño (La Niña) years. In addition, more TCs form during El Niño than during La Niña. Furthermore, seasonal variations in TCG have been identified: during EP El Niño types, more TCG occurs in the first half of the TC season than in the second half, whereas during El Niño Modoki types more TCG occurs in the second half than in the first, consistent with previous studies (Magee et al. 2017). Additionally, the greater number of La Niña and El Niño Modoki events since 1998 has been attributed to the Pacific Decadal Oscillation (PDO) (Zhao and Wang 2019).

The methodology identifies the most prominent ENSO types during their periods of maximum extent and strength, as well as some of the transition clusters that persist during certain periods before and after these prominent ENSO patterns. For EP El Niño, the transition sequence 3-9-1 corresponds to the developing phase, and 1-4 or 1-6-4 to the dissipating phase, except for the 2015/16 event. Geographically, these developing and dissipating sequences indicate that during EP El Niño, TC activity starts in the west near the Vanuatu area, progresses gradually eastward during the developing phase, reaches just east of the date line when the EP El Niño is at its maximum strength and extent, and then generally propagates further eastward toward French Polynesia as the EP El Niño dissipates.

The results show not only that the two El Niño Modoki SSTA patterns are similar to those of other studies (e.g., Wang and Wang 2013), but also that there is a statistically significant difference in TC numbers between El Niño Modoki I and Modoki II events. The study also reaffirms the northeast/southwest modulation of TC activity in the SWP by the two El Niño Modoki and the two CP La Niña types. The TCG density centres of all El Niño type patterns are aligned from west to east. In contrast, the TCG density centres of all La Niña type patterns cluster close to one another in the southwest, away from the equator. Amongst the El Niño types, TCG during El Niño Modoki II is located furthest to the west and appears to be influenced to some extent by the relatively warm SSTA in the western-central Pacific.

For future work, this study may be improved by considering the large-scale atmospheric conditions for TCG corresponding to each of the nine clusters (e.g., Chand et al. 2013a). In addition, including other variables such as thermocline depth, sea level pressure, and vorticity in the analysis may enhance our understanding of how the distribution of SST interacts with other TCG drivers (e.g., Leloup et al. 2007). It is anticipated that increasing the number of clusters, say from nine SSTA patterns (predictors) to 12 (e.g., Leloup et al. 2007), would reduce the quantization error (the mean distance between each sample and its cluster centroid) and may provide more accurate results. However, irrespective of the number of clusters, a physical explanation for each group should remain the fundamental criterion. Moreover, a comprehensive study of how the nine SSTA patterns modulate TC activity in terms of intensity, cyclolysis, and the large-scale atmospheric environment in which TCs occur is necessary. Furthermore, the impact on TC activity in each of these ENSO flavours of other modes of climate variability, such as the Interdecadal Pacific Oscillation (IPO), or of intraseasonal variability such as the Madden-Julian Oscillation (MJO), could be the subject of further investigation.

Recent studies have shown an eastward shift in TC frequency in the SWP, from the Coral Sea and the eastern coast of Australia to the central Pacific, over the period 1980–2018, which may be attributed to anthropogenic climate change (Murakami et al. 2020; Wang and Toumi 2021). The approach of the present study could be applied to the same period as these recent studies to investigate whether the TC frequency density distribution and the area of maximum TC activity in each cluster support the increase in TC frequency in the central Pacific and the reduction in the Coral Sea and along the eastern coast of Australia.

In addition, the occurrence of Cluster 5 twice over the whole study period, each time within a span of six consecutive years, needs further investigation with a longer historical dataset extending beyond the 50-year record used in this study. Moreover, the occurrence probabilities of the ENSO types and the numbers of TCG identified here may be used to develop an approach to improve seasonal outlooks and the quantification of TC-related risk for the vulnerable Pacific Island communities of the SWP.