INTERNATIONAL JOURNAL OF CLIMATOLOGY Int. J. Climatol. 30: 333–346 (2010) Published online 23 March 2009 in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/joc.1888
Interpreting variability in global SST data using independent component analysis and principal component analysis
Seth Westra (a,*), Casey Brown (b), Upmanu Lall (b), Inge Koch (c) and Ashish Sharma (a)
(a) School of Civil and Environmental Engineering, The University of New South Wales, Sydney, NSW, Australia
(b) Department of Earth and Environmental Engineering, Columbia University, New York, NY, USA
(c) School of Mathematics, The University of New South Wales, Sydney, NSW, Australia
ABSTRACT: Component extraction techniques are used widely in the analysis and interpretation of high-dimensional climate datasets such as global sea surface temperatures (SSTs). Principal component analysis (PCA), a frequently used component extraction technique, provides an orthogonal representation of the multivariate dataset and maximizes the variance explained by successive components. A disadvantage of PCA, however, is that the interpretability of the second and higher components may be limited. For this reason, a Varimax rotation is often applied to the PCA solution to enhance the interpretability of the components by maximizing a simple structure. An alternative rotational approach is known as independent component analysis (ICA), which finds a set of underlying ‘source signals’ which drive the multivariate ‘mixed’ dataset. Here we compare the capacity of PCA, the Varimax rotation and ICA in explaining climate variability present in globally distributed SST anomaly (SSTA) data. We find that phenomena which are global in extent, such as the global warming trend and the El Niño-Southern Oscillation (ENSO), are well represented using PCA. In contrast, the Varimax rotation provides distinct advantages in interpreting more localized phenomena such as variability in the tropical Atlantic. Finally, our analysis suggests that the interpretability of independent components (ICs) appears to be low. This does not diminish the statistical advantages of deriving components that are mutually independent, with potential applications including synthetically generating multivariate datasets, developing statistical forecasts, and reconstructing spatial datasets from patchy observations at multiple point locations. Copyright 2009 Royal Meteorological Society
KEY WORDS: sea surface temperature; principal component analysis; independent component analysis; varimax; El Niño-Southern Oscillation; climate variability
Received 5 April 2008; Revised 23 September 2008; Accepted 1 February 2009
* Correspondence to: Seth Westra, School of Civil and Environmental Engineering, The University of New South Wales, Sydney, NSW, Australia 2052. E-mail: [email protected]

1. Introduction

Component extraction techniques have been used extensively in climatological studies, both to reduce the dimension of large datasets such as sea surface temperatures (SSTs) for statistical analysis (Nicholls, 1989; Drosdowsky, 1993; Cordery and Opoku-Ankomah, 1994) and to aid in the identification and interpretation of significant modes of climate variability, such as the Pacific Decadal Oscillation (PDO, Mantua et al., 1997; Zhang et al., 1997; Power et al., 1999a), the Indian Ocean Dipole (IOD, Saji et al., 1999), and the North Atlantic Oscillation (NAO, Walker and Bliss, 1932; Hurrell, 1995). The most frequently used component extraction technique is principal component analysis (PCA), also referred to as empirical orthogonal function (EOF) analysis, which was introduced to meteorology and climate research in early works by Obukhov (1947), Lorenz (1956) and Kutzbach (1967). This technique has enjoyed a high level of popularity largely due to its mathematical simplicity, insensitivity to the number of components retained, and ability to optimize the variance (in a least squares sense) explained by successive principal components (PCs), thereby allowing representations of a large fraction of the data variability using only a relatively small number of PCs (Richman, 1986, 1987; Jolliffe, 1987; Preisendorfer, 1988; Wilks, 2006).

Despite its popularity, PCA imposes some important constraints which can affect the interpretation of the PCs, including (a) the fact that components are spatially and temporally orthogonal; (b) the property that successive components explain the maximum remaining variance; and (c) the influence of the domain used for the analysis (Aires et al., 2000; Mestas-Nunez, 2000; Aires et al., 2002; Dommenget and Latif, 2002). These properties often force the first PC to span the full domain, with the consequence that several distinct regional climate modes may be mixed into a single component (Richman, 1986; Trenberth et al., 2005). Dipole structures that do not exist in reality may also be induced artificially into
subsequent components, with the node typically located at the point of maximum variability of the first component (Richman, 1986; Houghton and Tourre, 1992; Dommenget and Latif, 2002; Jolliffe, 2003). The above concerns are especially valid if PCA is performed using the covariance matrix of the data, with related though different disadvantages if the correlation matrix is used instead.

To address these concerns, numerous authors (e.g., Horel, 1981; Richman, 1986; Houghton and Tourre, 1992; Opoku-Ankomah and Cordery, 1993; Kawamura, 1994; Mestas-Nunez and Enfield, 1999; Dommenget and Latif, 2000; Janicot et al., 2001; Trenberth et al., 2005) have applied a Varimax rotation of the eigenvectors obtained through PCA to increase component interpretability. This approach regionalizes the data by applying a linear transformation to the PCA solution so that the correlations between the rotated components and the original data are either maximized to the upper limit of 1 (or equivalently, minimized to −1) or reduced to zero (Mestas-Nunez and Enfield, 1999; Von Storch and Zwiers, 1999; Jolliffe et al., 2002). It is frequently argued that one of the benefits of this regionalization is that the components are less sensitive to changes in the temporal and spatial domains of the original dataset (Richman, 1986). On the other hand, the loss of temporal orthogonality means that individual components can be highly correlated with each other (Mestas-Nunez, 2000; Trenberth et al., 2005), which may impact upon the performance of statistical regression models.

In recent years, an alternative rotational technique known as independent component analysis (ICA) has been developed to find independent representations of a multivariate dataset. As discussed further in Section 3, the innovation behind ICA is that the ICs can be obtained by finding a transformation of the multivariate data such that the derived components are as far from a normal (Gaussian) distribution as possible.
Thus, finding the ICs is equivalent to finding directions which are maximally non-Gaussian, with derivations and alternative measures of non-Gaussianity provided in Herault and Jutten (1986) and Comon (1994). Applications of ICA to geophysical time series such as SSTs (Aires et al., 1999, 2000, 2002; Basak et al., 2004; Ilin et al., 2006) and seismic signals (Ciaramella et al., 2004) are few and recent. The appeal of ICA as a tool to find dominant modes of climate variability is driven primarily by the link between statistical independence and the solution to the blind source separation (BSS) problem, where one wishes to derive a set of independent ‘source signals’ from a set of observations, having no information about either the nature of these signals or the manner in which they have been mixed (Hyvarinen et al., 2001). The use of independence as the criterion for separating a set of climate signals therefore provides a logical alternative to the concept of simple structure used in the Varimax algorithm as the basis of rotating the PCA solution to enhance interpretability.

Therefore, there are now at least two rotational methods that have been commonly applied in the study of
climatic datasets, both of which enhance the interpretability of components that are derived from PCA, but based on very different mathematical properties. In this article, we present a comparative analysis of the two techniques applied to a global SST dataset and use the results from this analysis to determine (a) the degree of similarity between the results obtained by the two methods; (b) the sensitivity of the methods to factors such as the number of PCs retained before rotation; and (c) the interpretability of extracted components based on what is currently known about the climate system.

The remainder of this article is structured as follows. The datasets used for the analysis are described in Section 2. Section 3 then provides an overview of the mathematical basis of PCA, the Varimax rotation and ICA, including a brief discussion on the optimal degree of dimension reduction prior to rotation. Section 4 provides a preliminary examination of the components extracted by each technique, including similarities and differences between the components derived using the three techniques. Section 5 focuses on the interpretability of the components including a detailed comparison of the results from each technique and linkages between the components and established climate phenomena. Section 6 considers other applications for ICA that focus on the statistical applications of generating ICs, rather than on the physical interpretability of the components themselves. Finally, the conclusions are presented in Section 7.
2. Data

2.1. Global SST anomalies
A global sea surface temperature anomaly (SSTA) dataset was obtained from the reconstruction of raw SST values using an optimal smoother, as described in Kaplan et al. (1998). The data are available on a 5° longitude by 5° latitude grid across the global ocean field, totalling 1207 spatial locations. In the temporal dimension, the data comprise monthly averages from 1900 to 2005, totalling 1272 time steps. Unlike a range of other studies (e.g., Mestas-Nunez and Enfield, 1999), we did not remove the linear trend from the data, since we do not believe this trend to be linear. Rather, the data exhibit two periods of warming, first from 1910 to 1940 and then from 1970 to the present, with the intervening period showing roughly constant temperatures (Smith and Reynolds, 2005).

2.2. Indices
Numerous indices were used in this study to assist with the interpretation of the components derived from PCA, the Varimax rotation and ICA. The indices were selected to represent variability in the Pacific, Atlantic, and Indian Oceans and have been derived using either SST or sea level pressure (SLP) datasets at the global or regional scale. In the Pacific Ocean, the dominant source of variability is typically ascribed to the El Niño-Southern
Oscillation (ENSO) phenomenon. Here, we consider an index of the oceanic component of ENSO known as Niño 3.4, which is defined as the seasonally averaged SSTA over the central Pacific Ocean (5°S–5°N, 170°W–120°W; Trenberth, 1997). The Niño 3.4 index was obtained from the International Research Institute (IRI) for Climate and Society website (http://ingrid.ldgo.columbia.edu/SOURCES/.Indices/.nino/.KAPLAN).

Also in the Pacific Ocean, we consider the PDO, which is derived as the first PC of monthly SST residuals in the North Pacific Ocean poleward of 20°N, with residuals defined as the difference between observed anomalies and the monthly mean global average SSTA to separate this pattern of variability from any global warming signal that may be present in the data (Mantua et al., 1997; Zhang et al., 1997; Mantua and Hare, 2002). Several studies find similar results mirrored in the southern hemisphere and have derived an index that captures variability in both hemispheres known as the Interdecadal Pacific Oscillation (IPO) (Power et al., 1998, 1999b; Folland et al., 2002). Both the PDO and IPO appear to measure the same phenomenon, and we will focus on the original PDO index for this study. The data were obtained from the Department of Atmospheric Sciences, University of Washington website (http://www.atmos.washington.edu/~mantua/abst.PDO.html).

In the Atlantic Ocean, we first consider the NAO, which is a climatic phenomenon in the North Atlantic Ocean described by fluctuations in the difference of SLP between the Icelandic low and Azores high. We consider a monthly index of the NAO based on the difference of normalized SLP between Ponta Delgada, Azores and Stykkisholmur/Reykjavik, Iceland, found in http://www.cgd.ucar.edu/cas/jhurrell/indices.html.
We also use two tropical Atlantic indices: the tropical North Atlantic (TNA) index, defined as the average of the monthly data over the rectangular region 5°N–25°N, 55°W–15°W, and the tropical South Atlantic (TSA) index, defined over the region 20°S–0°, 30°W–10°E (Enfield et al., 1999). We used the Kaplan et al. (1998) global SSTA data described in Section 2.1 to construct both of these tropical indices.

In the Indian Ocean, we consider the IOD, which is a coupled ocean–atmosphere phenomenon characterized by anomalous cooling of SSTs in the south eastern equatorial Indian Ocean and anomalous warming of SSTs in the western equatorial Indian Ocean. Here we use an index used by Saji et al. (1999), which represents the anomalous SST gradient between the western equatorial Indian Ocean (50°E–70°E and 10°S–10°N) and the south eastern equatorial Indian Ocean (90°E–110°E and 10°S–0°). The index can be found in http://www.jamstec.go.jp/frsgc/research/d1/iod/.

Finally we consider a global surface temperature anomaly dataset from 1900 to 2005 based on the study of Smith and Reynolds (2005) blended land
and ocean composite, which is derived by merging the global monthly SSTAs of Smith and Reynolds (2004) with an analysis of land surface temperature anomalies from the Global Historical Climatology Network (GHCN) dataset. This dataset was obtained from the US National Climatic Data Centre website (http://www.ncdc.noaa.gov/oa/climate/research/anomalies/anomalies.html).
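The box-averaged indices defined in this section (TNA, TSA and the IOD gradient) can be sketched as follows. This is a minimal illustration only: the function name `box_mean_index`, the synthetic 5° grid and the random anomalies are assumptions introduced here, an unweighted mean over grid cells is assumed, and longitudes are taken as degrees east with western longitudes negative. The sign convention for the IOD (west minus south east) is also an assumption consistent with the description above.

```python
import numpy as np

def box_mean_index(field, lats, lons, lat_range, lon_range):
    """Unweighted mean of a gridded field over a latitude/longitude box.

    field : array (n_points, n_times); lats, lons : arrays (n_points,)
    lat_range, lon_range : (min, max) in degrees, bounds inclusive.
    """
    in_box = ((lats >= lat_range[0]) & (lats <= lat_range[1]) &
              (lons >= lon_range[0]) & (lons <= lon_range[1]))
    if not in_box.any():
        raise ValueError("no grid points fall inside the requested box")
    return np.nanmean(field[in_box], axis=0)

# Hypothetical 5-degree grid of cell centres (a stand-in, not the Kaplan grid).
lat_c = np.arange(-87.5, 90.0, 5.0)
lon_c = np.arange(-177.5, 180.0, 5.0)
lon_g, lat_g = np.meshgrid(lon_c, lat_c)
lats, lons = lat_g.ravel(), lon_g.ravel()

rng = np.random.default_rng(0)
ssta = rng.standard_normal((lats.size, 24))   # synthetic anomalies, 24 months

# Tropical North and South Atlantic indices (box limits from the text).
tna = box_mean_index(ssta, lats, lons, (5, 25), (-55, -15))
tsa = box_mean_index(ssta, lats, lons, (-20, 0), (-30, 10))

# IOD as a west-minus-south-east gradient between the two boxes.
iod = (box_mean_index(ssta, lats, lons, (-10, 10), (50, 70))
       - box_mean_index(ssta, lats, lons, (-10, 0), (90, 110)))
```

An area-weighted mean (for example, weighting by the cosine of latitude) would be a natural refinement, but the text does not state whether such weighting was used.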
3. Component extraction techniques
3.1. Principal component analysis

Principal component analysis (PCA) is a technique that identifies a representation of a multivariate dataset using the information that is contained within the covariance matrix, so that the PCs are mutually uncorrelated. In addition, the PCs have the important property that successive components explain the maximum residual variance of the data in a least squares sense. As such, an important application of PCA is dimension reduction of the original dataset by retaining only those PCs that explain a significant portion of the data variance.

To describe PCA, we first define $X$ as an $m$ by $l$ data matrix (see Appendix A for the nomenclature used in the text) which has been row-centred, with $m$ representing the number of spatial locations (dimensions) and $l$ representing the temporal points of the data. The solution to the PCA problem is then defined in terms of the unit-norm eigenvectors (also referred to as ‘loading vectors’ or EOFs) $e_1, \ldots, e_m$ of the covariance matrix $C_x = E\{XX^T\}$, which have been ordered so that the corresponding eigenvalues $d_1, \ldots, d_m$ satisfy $d_1 \geq d_2 \geq \ldots \geq d_m$ with $d_j \geq 0$ for all $j$. The first PC of $X$ may now be written as:

$$\mathrm{PC}_1 = e_1^T X \qquad (1)$$
with successive PCs defined in a similar fashion. In the present application, the PCs represent time series and the eigenvectors represent spatial patterns. By construction, the PCs are mutually uncorrelated and the eigenvectors are mutually orthogonal. PCA therefore requires only the use of classic linear algebraic methods to find the eigenvectors and the corresponding eigenvalues of $C_x$. Due to the ordering of the eigenvectors and eigenvalues, reducing the dimension of the data set to dimension $n$, with $n \leq m$, is now trivial and simply involves discarding all higher PCs. Letting $E = (e_1, \ldots, e_n)$ be the matrix whose columns are the unit-norm eigenvectors and $D = \mathrm{diag}(d_1, \ldots, d_n)$ be the diagonal matrix of the eigenvalues of $C_x$, we can then find the whitening transform as:

$$V = D^{-1/2} E^T \qquad (2)$$
This matrix can always be found and is an important pre-processing step for independent component analysis (Hyvarinen et al., 2001).
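Equations (1) and (2) can be illustrated with a short numerical sketch. The function name `pca_whiten`, the synthetic data and the choice of a sample covariance estimate (dividing by $l$) are assumptions made for illustration and are not taken from the article.

```python
import numpy as np

def pca_whiten(X, n):
    """PCA of an (m, l) data matrix, keeping the first n components.

    Returns the PCs, eigenvectors E, eigenvalues d, the fraction of
    variance explained, the whitening matrix V and the whitened data Z.
    """
    Xc = X - X.mean(axis=1, keepdims=True)      # row-centre the data
    C = Xc @ Xc.T / Xc.shape[1]                 # sample estimate of C_x
    d, E = np.linalg.eigh(C)                    # eigh returns ascending eigenvalues
    order = np.argsort(d)[::-1]                 # reorder so that d1 >= d2 >= ...
    d, E = d[order], E[:, order]
    explained = d[:n] / d.sum()                 # variance fraction per retained PC
    E_n, d_n = E[:, :n], d[:n]
    pcs = E_n.T @ Xc                            # Equation (1), first n PCs
    V = np.diag(d_n ** -0.5) @ E_n.T            # Equation (2)
    Z = V @ Xc                                  # whitened data, identity covariance
    return pcs, E_n, d_n, explained, V, Z

# Synthetic example: 8 correlated 'locations', 500 time steps.
rng = np.random.default_rng(1)
X = rng.standard_normal((8, 8)) @ rng.standard_normal((8, 500))
pcs, E_n, d_n, explained, V, Z = pca_whiten(X, n=3)
```

For the SSTA data described in Section 2.1, `X` would have shape (1207, 1272). The whitening step divides by the square roots of the retained eigenvalues, so it is only well defined when those eigenvalues are strictly positive.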
3.2. The Varimax rotation
Varimax is a rotational technique that was first developed by Kaiser (1958). The idea behind the Varimax rotation is that the components are most interpretable when the variance of the rotated squared loadings is maximized; i.e., each loading vector contains loadings that are either maximized in an absolute sense (indicating a perfect linear relationship between the rotated component and the original data) or zero (indicating no linear relationship), thereby relegating the middle-sized loadings, which are often the most difficult to interpret, to values that tend to zero. Thus the Varimax rotation solution can be found by maximizing a measure of variance $v^*$, which is defined as (Kaiser, 1958):

$$v^* = \sum_{i=1}^{n} v_i^* = \sum_{i=1}^{n} \frac{m \sum_{j=1}^{m} \left(a_{ji}^2\right)^2 - \left(\sum_{j=1}^{m} a_{ji}^2\right)^2}{m^2} \qquad (3)$$

where $j = 1, \ldots, m$ indexes the spatial points (the elements of each eigenvector), $i = 1, \ldots, n$ indexes the components retained after dimension reduction, and $a_{ji}$ is the loading of the $j$th spatial point on the $i$th rotated component. Here, PCA is used as an initial solution, with each loading vector defined as the eigenvector multiplied by the square root of the corresponding eigenvalue, such that $a_i^T a_i = d_i$.

One of the main challenges to applying the Varimax algorithm is the decision about the size of the reduced dimension, $n$, as this may have a significant influence on the result. For example, Horel (1981) makes use of the Guttman (1954) criterion which states that only the PCs which contribute more total variance than the typical normalized time series, i.e. one unit of total variance, should be retained. In the case of the SSTA data considered here, this would suggest the retention of 28 components out of a total of 1207 spatial locations (dimensions).
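The criterion in Equation (3) is straightforward to evaluate numerically. The sketch below computes $v^*$ for a small hypothetical loading matrix and, for the two-component case only, locates the maximizing rotation by a brute-force scan over the rotation angle. The scan is an illustrative shortcut introduced here; practical Varimax implementations use iterative algorithms and the article does not specify which one was used. The names `varimax_criterion` and `rotation` are likewise assumptions.

```python
import numpy as np

def varimax_criterion(A):
    """Raw Varimax criterion v* of Equation (3) for an (m, n) loading matrix."""
    m = A.shape[0]
    sq = A ** 2                                  # squared loadings a_ji^2
    per_component = (m * np.sum(sq ** 2, axis=0) - np.sum(sq, axis=0) ** 2) / m ** 2
    return per_component.sum()

def rotation(theta):
    """2 x 2 rotation matrix for an angle theta in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Hypothetical simple structure: three points load only on component 1,
# three only on component 2, then blurred by a 30-degree rotation.
S = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)
A = S @ rotation(np.deg2rad(30.0))

# Scan rotation angles in 1-degree steps and keep the one maximizing v*.
angles = np.deg2rad(np.arange(0, 90))
values = np.array([varimax_criterion(A @ rotation(t)) for t in angles])
best = angles[values.argmax()]
A_rot = A @ rotation(best)                       # loadings after rotation

v_before = varimax_criterion(A)
v_after = values.max()
```

In this toy case the rotated loadings return to the 0/1 pattern (up to a swap and sign change of the columns), which is the 'simple structure' that the criterion rewards.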
Horel (1981) also demonstrated that the results are relatively insensitive to the number of PCs retained unless fewer than 25% of the number suggested by the Guttman criterion are rotated, leading a number of studies (Kawamura, 1994; Mestas-Nunez and Enfield, 1999; Dommenget and Latif, 2000) to retain 10 components for their analysis of global SST variability. We also found that our results are insensitive to the number of PCs retained and therefore considered only the first 10 components for analysis in this article.

3.3. Independent component analysis
ICA, first introduced by Herault and Jutten (1986), may be considered as an extension to PCA (Oja, 2004), except that while PCA focuses on identifying components based only on second-order statistics (covariance), ICA considers higher-order statistics which allows it to search for components that are statistically independent. ICA
has been applied successfully in a wide range of areas, including BSS and feature extraction (see Lee, 1998; Hyvarinen, 1999 and references therein).

The generic form of ICA occurs when the observations matrix, $X$, is derived through the mixing of an $n$-dimensional ‘source’ matrix, $S = (s_1, \ldots, s_n)^T$, with a temporal dimension of $l$, commonly referred to as the ICs (Comon, 1994). These ICs are assumed to be non-Gaussian, mutually statistically independent and zero-mean, with $n \leq m$. Assuming that the mixing is both linear and stationary, the relationship between the data matrix and the ICs is given as:

$$X = AS \qquad (4)$$
where A is called the mixing matrix of dimension m × n. The objective of ICA is to estimate the mixing matrix, A, as well as the ICs, S, knowing only the observations matrix X. This can be achieved up to some scalar multiple of S, since any constant multiplying an independent component in Equation (4) can be cancelled by dividing the corresponding column of the mixing matrix A by the same constant.

Central to the identification of the ICs from the data X is the assumption that all ICs will be non-Gaussian, with the first component being the most non-Gaussian (Hyvarinen et al., 2001). This follows from the logic outlined in the central limit theorem that if one mixes independent random variables through a linear transformation, the result will be a set of variables that tend to be Gaussian. If one reverses this logic, it can be presumed that the original ICs must have a distribution that has minimal similarity to a Gaussian distribution. Consequently, the approach adopted to extract ICs from data containing mixed signals amounts to finding a transformation that results in variables that exhibit maximal non-Gaussianity as defined through an appropriately specified statistic.

The principal advantage of ICA over PCA is that ICA produces components that are statistically independent, whereas PCA produces components that, while being uncorrelated, may not be completely statistically independent. The distinction between removing correlation and achieving independence is discussed at length in Hyvarinen et al. (2001), with the concept of statistical independence being the more stringent criterion, which requires the factorization of the joint probability distribution of a multivariate dataset into the product of its marginal distributions. For this reason, while both PCA and ICA result in a diagonal correlation (or covariance) matrix, ICA also considers higher-order moment information [i.e.
information other than that contained in the covariance matrix of X (Oja, 2004)] to achieve independence. PCA is commonly used as a pre-processing step, however, both as a means of dimension reduction and as a starting point for whitening (or sphering) the data, such that X is linearly transformed into another n-dimensional vector Z that has an identity covariance matrix. For more detail on the statistical differences between uncorrelatedness and
independence, refer to the analysis provided in Westra et al. (2007). As with the Varimax approach described earlier, selection of the reduced dimension by PCA is required before applying ICA. We adopt an approach described by Koch and Naito (2007), who have developed a criterion for choosing the optimal dimension based on bias-adjusted skewness and kurtosis. The basis of the method is that if the dimension of the input data is too small, ICA may not find interesting directions, since valuable information has been lost as a result of dimension reduction. On the contrary, a large dimension requires higher computational effort and may contain noise or irrelevant information. To find a compromise between data complexity and available information, Koch and Naito (2007) find the dimension which results in the most non-Gaussian representation relative to the subspace of that dimension, using a criterion based on the Akaike Information Criterion (AIC) that penalizes higher dimensions. Further details are provided in their paper. We applied this approach using both skewness and kurtosis as measures of non-Gaussianity and found an optimal PCA-reduced dimension size of n = 4 in both cases. A sensitivity analysis shows that, unlike the results from the Varimax rotation, the ICA results are highly sensitive to the choice of n. For example, we compared the components with n = 3 to the three most similar components with n = 4 and found that the mean correlation coefficient between the components was 0.80. Similarly, although n = 5 yielded high correlation coefficients with a mean of 0.98 (compared with the base case of n = 4), increasing to n = 8 yielded correlation coefficients with a mean of just 0.64. These results suggest that the choice of n is a key parameter that must be determined prior to using ICA for interpretive applications.
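The procedure outlined in this section (PCA-based whitening followed by a search for maximally non-Gaussian directions) can be sketched as below. The article does not state which ICA algorithm was used, so the sketch assumes a kurtosis-based fixed-point iteration with symmetric orthogonalization in the style of FastICA (Hyvarinen et al., 2001). All function names and the synthetic sources are hypothetical.

```python
import numpy as np

def whiten(X, n):
    """PCA-reduce X (m, l) to n dimensions and scale to identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    keep = np.argsort(d)[::-1][:n]               # indices of the n largest eigenvalues
    V = np.diag(d[keep] ** -0.5) @ E[:, keep].T
    return V @ Xc

def orthonormalise(W):
    """Closest orthogonal matrix to W (symmetric decorrelation)."""
    U, _, Vt = np.linalg.svd(W)
    return U @ Vt

def ica_kurtosis(Z, max_iter=500, tol=1e-10, seed=0):
    """Rotate whitened data Z (n, l) towards maximal |kurtosis| of each row."""
    n, l = Z.shape
    W = orthonormalise(np.random.default_rng(seed).standard_normal((n, n)))
    for _ in range(max_iter):
        Y = W @ Z
        # Fixed-point update for each row w: E{z (w^T z)^3} - 3 w
        W_new = orthonormalise((Y ** 3) @ Z.T / l - 3.0 * W)
        # Converged when every row is unchanged up to sign
        done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if done:
            break
    return W @ Z, W

# Synthetic test: three non-Gaussian sources mixed into five observed series.
rng = np.random.default_rng(42)
l = 5000
S = np.vstack([rng.uniform(-1, 1, l), rng.laplace(0, 1, l), rng.uniform(-1, 1, l)])
A = rng.standard_normal((5, 3))                  # hypothetical mixing matrix
X = A @ S                                        # Equation (4)
Y, W = ica_kurtosis(whiten(X, 3))

# Each true source should match one estimated IC up to sign and ordering.
corr = np.corrcoef(np.vstack([S, Y]))[:3, 3:]
recovery = np.abs(corr).max(axis=1)
```

Because the estimated ICs are defined only up to sign, scale and ordering, the comparison with the true sources uses absolute correlations. The same idea underlies the sensitivity comparison across different values of n described above.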
4. Similarity between component extraction techniques

Component extraction techniques are frequently applied to large global climatic datasets to identify modes of variability and improve our understanding of the underlying dynamics of the climate system. In this study we compare PCA with two rotational techniques that can increase the interpretability of the extracted components: the Varimax rotation, which finds a rotation that maximizes ‘simple structure’, thereby improving the regionalization of components; and ICA, which by maximizing non-Gaussianity finds the underlying ‘source signals’ which drive the multivariate dataset.

As discussed in Section 3, using Koch and Naito’s (2007) criterion, the optimum number of components extracted from the SSTA dataset using ICA was found to be four. For easier comparison, we also consider only the first four PCs and Varimax-rotated PCs (referred to henceforth as R-PCs for consistency with earlier articles) for analysis in this article. The percentage of variance
Table I. Percentage variance explained by the first 10 unrotated PCs, the first 10 Varimax-rotated PCs (R-PCs), and the first four ICs.

Component number                 PC      R-PC    IC
1                                18.7    12.5    13.0
2                                10.2     7.3    11.2
3                                 4.9     6.2     8.0
4                                 3.9     5.4     5.5
Total (first four components)    37.7    31.4    37.7
5                                 3.4     5.4    –
6                                 3.0     4.6    –
7                                 2.8     4.2    –
8                                 2.4     3.3    –
9                                 2.3     2.5    –
10                                2.1     2.36   –
Total (first 10 components)      53.7    53.7    –
explained for each of these components is shown in Table I. As can be seen, the total variance explained by the first four components is, by construction, exactly the same for the PCA and ICA solutions (37.7%), although the variance is distributed differently among the individual components. Similarly, the variance explained by the first 10 R-PCs is the same as the first 10 PCs, although variance explained by the first four R-PCs is only 31.4%, since a greater fraction of variance is spread over the higher-order R-PCs.

Once the first four components from each technique have been found, we wish to consider how sensitive the components are to the techniques used to extract them. This question is particularly pertinent for the Varimax and ICA results, since both techniques are described in the literature as being capable of enhancing interpretability. To compare techniques, we examine the Pearson correlation coefficients between the components from each technique. The results are shown in Table II and illustrate that although there are some similarities between methods, each method largely delivers components that differ from those of the other methods. For example, when comparing the Varimax rotation with ICA, we see that although IC1 is highly correlated with both R-PC4 and R-PC2 (correlation coefficients of 0.89 and −0.82 respectively), and IC2 is highly correlated with R-PC1 (correlation coefficient of 0.75), neither IC3 nor IC4 shows a strong relationship to any of the Varimax-rotated components. Therefore, despite some apparent overlap, these results suggest that the Varimax rotation and ICA yield different results, with significant implications for interpretation.

Another interesting result in Table II is the high correlation between PC1 and each of the Varimax-rotated components.
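A comparison of the kind summarized in Table II amounts to a cross-correlation matrix between two sets of component time series. The following is a minimal sketch using synthetic series in place of the actual PCs, R-PCs and ICs; the function names `cross_correlation` and `best_matches` are assumptions introduced for illustration. Since the sign of a component is arbitrary, the matching step compares absolute correlations.

```python
import numpy as np

def cross_correlation(P, Q):
    """Pearson correlations between the rows of P (a, l) and the rows of Q (b, l)."""
    Pz = (P - P.mean(axis=1, keepdims=True)) / P.std(axis=1, keepdims=True)
    Qz = (Q - Q.mean(axis=1, keepdims=True)) / Q.std(axis=1, keepdims=True)
    return Pz @ Qz.T / P.shape[1]                # entry (i, k): corr(P[i], Q[k])

def best_matches(R):
    """For each row component, the most similar column component and its |r|."""
    idx = np.abs(R).argmax(axis=1)
    return idx, np.abs(R)[np.arange(R.shape[0]), idx]

# Synthetic stand-ins: Q holds the same series as P, reordered and sign-flipped.
rng = np.random.default_rng(7)
P = rng.standard_normal((4, 1272))               # four series, 1272 time steps
Q = -P[[2, 0, 3, 1]]

R = cross_correlation(P, Q)
idx, strength = best_matches(R)
mean_similarity = strength.mean()                # summary used when comparing sets
```

Averaging the matched absolute correlations, as in `mean_similarity`, gives a single number for the agreement between two sets of components.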
To see why PC1 is highly correlated with each of the R-PCs, we recall that one of the implications of rotation is that it is not possible to maintain orthogonality simultaneously in both time and space (Mestas-Nunez, 2000). While the temporal orthogonality is maintained for the ICA solution at the expense of spatial orthogonality, the opposite
Table II. Temporal correlations between the first four unrotated PCs, Varimax-rotated PCs (R-PC), and the ICs.

        R-PC1   R-PC2   R-PC3   R-PC4
PC1     −0.80   −0.89    0.82    0.84
PC2      0.59   −0.36    0.17    0.41
PC3      0.05    0.11    0.52   −0.06
PC4     −0.02    0.01    0.14    0.23

        IC1     IC2     IC3     IC4
PC1      0.75   −0.63   −0.12   −0.16
PC2      0.33    0.45   −0.75    0.34
PC3     −0.34   −0.11   −0.56   −0.75
PC4      0.46    0.62    0.33   −0.54

        R-PC1   R-PC2   R-PC3   R-PC4
IC1     −0.43   −0.82    0.55    0.89
IC2      0.75    0.40   −0.41    0.20
IC3     −0.39    0.32   −0.47   −0.30
IC4      0.31   −0.07   −0.54   −0.08
Table III. Correlation coefficients between Varimax-rotated PCs (R-PCs).

        R-PC1   R-PC2   R-PC3   R-PC4
R-PC1    1
R-PC2    0.51    1
R-PC3   −0.53   −0.73    1
R-PC4   −0.44   −0.88    0.73    1
is true for most climate applications of the Varimax algorithm (Kawamura, 1994; Mestas-Nunez and Enfield, 1999) since the objective is to find localized regions of climate variability. In consequence, the R-PCs, which are the temporal representations of the data variability, are mutually correlated.

To examine the degree of correlation between the R-PCs, we present the Pearson correlation coefficients in Table III. As can be seen, the correlation between R-PCs is moderate to high, with the absolute value of the correlation coefficients up to 0.88. An analysis by Mestas-Nunez and Enfield (1999) on global SSTAs from 1856 to 1991 with the linear trend removed from each grid point showed correlation coefficients ranging from 0.01 to 0.4. We repeated the Varimax rotation on our global SSTA data with the linear trend similarly removed and found correlation coefficients to be lower than 0.5 in all cases. This suggests that the high coefficients found in Table III are in great part due to the trend which is present in each of the R-PCs, although removing the trend does not completely eliminate temporal correlation.

5. Interpretability of components

We have established that the components derived by PCA, the Varimax rotation, and ICA are significantly different from each other and wish to determine the degree to which each component is physically interpretable. To assist in this assessment, we use two complementary approaches. The first is a correlation analysis between each of the components and a range of climate indices which are used to describe variability in the Pacific, Atlantic, and Indian Oceans. The indices were described in Section 2 and the correlation coefficients between the indices and each of the components are provided in Table IV. The second approach compares the spatial patterns derived as the correlation coefficients between global SSTA time series and our PCs, R-PCs, and ICs (presented in Figures 1, 2, and 3, respectively), with patterns described in the literature. The patterns from this literature review were derived using a range of mathematical techniques (PCA, Varimax, ICA and others), spatial domains (some analyses consider the full global SST dataset, some others consider only a single ocean basin, while some others consider only the global tropics), temporal extents, and pre-processing techniques (some of the analyses use de-trended data and some removed the influence of the ENSO phenomenon before rotation). Many of these analyses also verified their results with an analysis of other datasets, such as global SLP. The variety of approaches allows us to examine

Table IV. Correlation coefficients between components from PCA, Varimax and ICA, with a range of climate indices.

                 Component number
PCA              1       2       3       4
Niño 3.4         0.41   −0.80   −0.09    0.05
PDO              0.33   −0.46    0.08   −0.01
NAO             −0.07    0.06   −0.21   −0.10
TNA              0.39   −0.05    0.69    0.29
TSA              0.23    0.12   −0.10    0.04
IOD              0.06   −0.27   −0.12   −0.21
Temperature      0.87    0.19   −0.09   −0.06

Varimax          1       2       3       4
Niño 3.4        −0.82   −0.10    0.17    0.02
PDO             −0.50   −0.13    0.20    0.14
NAO              0.08    0.02   −0.19   −0.02
TNA             −0.32   −0.23    0.73    0.30
TSA             −0.12   −0.46    0.14    0.20
IOD             −0.20    0.03   −0.07   −0.19
Temperature     −0.59   −0.84    0.73    0.81

ICA              1       2       3       4
Niño 3.4         0.09   −0.58    0.62   −0.30
PDO              0.06   −0.43    0.26   −0.26
NAO             −0.00    0.03    0.04    0.24
TNA              0.17   −0.16   −0.30   −0.76
TSA              0.27   −0.06   −0.05    0.06
IOD             −0.10   −0.27    0.19    0.11
Temperature      0.70   −0.50   −0.25   −0.02
INTERPRETING VARIABILITY IN GLOBAL SST DATA USING ICA AND PCA -1
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
339
1
Principal component:1. Variance explained:18.7% 60°N 40°N 20°N 0° 20°S 40°S Principal component:2. Variance explained:10.2% 60°N 40°N 20°N 0° 20°S 40°S Principal component:3. Variance explained:4.86% 60°N 40°N 20°N 0° 20°S 40°S Principal component:4. Variance explained:3.91% 60°N 40°N 20°N 0° 20°S
0°
40°W
80°W
120°W
160°W
160°E
120°E
80°E
40°E
40°S
Figure 1. Correlation between the first four PCs and gridded global SSTAs. Contours are spaced at correlation coefficient intervals of 0.2. This figure is available in colour online at www.interscience.wiley.com/ijoc
the robustness of each of our components in describing modes of climate variability, which we will examine on a region-by-region basis in the following sections. It should be noted that the results presented here are based on the ability of the extracted components to be related to physically known modes of variability in the climate, such as the ENSO phenomenon, the IOD, the PDO, and the like. Our assumption in using this logic is that these modes can be presumed to represent multiple different or near-independent signals that when combined result in the type of variability we observe in global SSTs. How reasonable the above assumption Copyright 2009 Royal Meteorological Society
is, however, is an open question, with some of the above-mentioned modes often being argued as lowfrequency representations of the more prominent ENSO phenomenon that dominates inter-annual variability in SSTA. Furthermore, the fact that many of the above modes have been formed based on the PCA logic (though implemented across smaller regions or in a manner different to how it has been implemented in our study) makes the comparison, especially with PCA or Varimax rotated PCA (R-PCA) more difficult to interpret. Having said the above, the fact still remains that if the known modes we are comparing with do represent different forcings in the SSTA system, then they are the only Int. J. Climatol. 30: 333–346 (2010)
340
S. WESTRA ET AL. -1
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
Varimax component:1. Variance explained:12.5% 60°N 40°N 20°N 0° 20°S 40°S
Varimax component:2. Variance explained:7.25% 60°N 40°N 20°N 0° 20°S 40°S
Varimax component:3. Variance explained:6.18% 60°N 40°N 20°N 0° 20°S 40°S
Varimax component:4. Variance explained:5.42% 60°N 40°N 20°N 0° 20°S
0°
40°W
80°W
120°W
160°W
160°E
120°E
80°E
40°E
40°S
Figure 2. Correlation between the first four R-PCs and gridded global SSTAs. Contours are spaced at correlation coefficient intervals of 0.2. This figure is available in colour online at www.interscience.wiley.com/ijoc
legitimate basis identified against which the components can be compared. 5.1.
The global temperature trend
We consider two representations of the global temperature trend as a basis for examining the components derived in this article. The first is an index representing global surface temperature anomalies, described in Section 2.2. The correlation coefficients between this index and the components from PCA, the Varimax rotation, and ICA are presented in Table IV. Considering first the PCA results, it is evident that the global temperature trend is captured by PC1, with a correlation coefficient of 0.87. The remaining PCs have correlation coefficients of 0.19 or lower in absolute value, suggesting that PCA is able to provide a clear representation of the global warming signal. The time series of this first component is shown in the lower plot of Figure 4, and shows the classic global warming trend before the 1940s and from the 1970s to the present, with a period of stable temperatures in between.
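The correlation analysis behind Table IV amounts to a Pearson correlation between each component time series and each climate index. The following is a minimal sketch, assuming the two series are already aligned in time; the function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def correlation_table(components, indices):
    """Correlate every component with every index (dicts of name -> series)."""
    return {
        comp_name: {
            index_name: pearson_r(comp, index)
            for index_name, index in indices.items()
        }
        for comp_name, comp in components.items()
    }


# Toy, made-up series purely for illustration (not data from the article).
components = {"PC1": [0.1, 0.4, 0.3, 0.8, 1.0], "PC2": [0.5, -0.2, 0.4, -0.1, 0.0]}
indices = {"Temperature": [0.0, 0.2, 0.2, 0.5, 0.7]}

table = correlation_table(components, indices)
for comp_name, row in table.items():
    for index_name, r in row.items():
        print(f"{comp_name} vs {index_name}: r = {r:+.2f}")
```

Because the sign of an extracted component is arbitrary, multiplying a component by −1 flips the sign of its correlation with every index; it is the magnitude that indicates how closely a component tracks an index.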
[Figure 3: four map panels (60°N to 40°S), colour scale from −1 to 1. Independent component 1: variance explained 13%; independent component 2: 11.2%; independent component 3: 7.97%; independent component 4: 5.58%.]
Figure 3. Correlation between the first four ICs and gridded global SSTAs. Contours are spaced at correlation coefficient intervals of 0.2. This figure is available in colour online at www.interscience.wiley.com/ijoc
In contrast to the results from the PCA analysis, relatively high correlation coefficients can be found between the global temperature trend and each of the R-PCs, with absolute correlation coefficients ranging from 0.59 to 0.84, and the first two ICs, with correlation coefficients of 0.70 and −0.50, respectively. Thus, the effect of rotating the PCs in this case is to diffuse the long-term trend across a larger number of components.

The second representation of the global temperature trend is a spatial representation developed by Cane et al. (1997), who estimated the trend in monthly mean SST anomalies between 1900 and 1991 at each grid point of their SST dataset. Their results suggested warming in most of the global oceans, with the exception of much of the North Pacific, a region south of Greenland, and the eastern equatorial Pacific. Comparing their Figure 3(a) to our PC1 and R-PC2 suggests similar regions with limited or no warming. These regions of limited or no warming could not be observed as clearly in our ICA results, reflecting the lower correlation coefficients between the ICs and the temperature trend described earlier.
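A grid-point trend analysis of this kind can be sketched as an ordinary least-squares slope fitted separately to the anomaly series at each grid point. The example below is only illustrative: the helper names, the grid coordinates and the anomaly values are hypothetical, and the exact procedure used by Cane et al. (1997) may differ. The same slope can be used to remove a linear trend from a series before a component analysis.

```python
def linear_trend(values, times=None):
    """Least-squares slope of values against times (default 0, 1, 2, ...)."""
    n = len(values)
    if times is None:
        times = list(range(n))
    if n < 2 or len(times) != n:
        raise ValueError("need at least two points and matching time stamps")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time stamps must not all be identical")
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return sxy / sxx


def detrend(values):
    """Remove the fitted straight line (slope and mean) from a series."""
    n = len(values)
    slope = linear_trend(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    return [v - (v_mean + slope * (t - t_mean)) for t, v in enumerate(values)]


def trend_map(grid_series):
    """Slope at every grid point; keys are (lat, lon), values are series."""
    return {point: linear_trend(series) for point, series in grid_series.items()}


# Made-up anomaly series at three hypothetical grid points (lat, lon).
grid_series = {
    (0.0, -150.0): [0.0, 0.1, 0.2, 0.3, 0.4],   # warming
    (45.0, -170.0): [0.2, 0.2, 0.1, 0.1, 0.0],  # cooling
    (-20.0, 60.0): [0.1, 0.1, 0.1, 0.1, 0.1],   # no trend
}

for point, slope in trend_map(grid_series).items():
    print(point, f"slope per time step = {slope:+.3f}")
```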
[Figure 4: (a) map panel of principal component 1 (variance explained 18.7%), colour scale from −1 to 1; (b) time series panel covering 1900 to 2000.]
Figure 4. (a) Correlation between PC1 and gridded global SSTAs and (b) time series of PC1 with 10-year smoother. Contours are spaced at intervals of 0.2. The first and last five-year segments of the smoothed time series were constructed using fewer than 10 years of data and, therefore, may not accurately capture the underlying trend for these periods. This figure is available in colour online at www.interscience.wiley.com/ijoc
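The edge effect noted in the caption of Figure 4 can be illustrated with a centred running mean whose window is truncated at both ends of the record. This is a sketch only: the form of the smoother actually used for Figure 4 is not specified here, so a simple running mean is assumed, and the function name and the `half_width` parameter are hypothetical. With `half_width` samples on each side, interior points average `2 * half_width + 1` values, while points near either end average fewer.

```python
def centred_running_mean(values, half_width):
    """Centred running mean with the window truncated at the series ends.

    Returns a list of (smoothed_value, n_points_used) pairs so that the
    less reliable end segments can be identified.
    """
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_width)          # window start, clipped at 0
        hi = min(n, i + half_width + 1)      # window end, clipped at n
        window = values[lo:hi]
        smoothed.append((sum(window) / len(window), len(window)))
    return smoothed


# Toy series: a steady ramp, so the interior smoothed values equal the input.
series = [float(v) for v in range(10)]
result = centred_running_mean(series, half_width=2)

for i, (value, count) in enumerate(result):
    flag = "" if count == 5 else "  (truncated window)"
    print(f"t={i}: {value:.2f} from {count} points{flag}")
```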
5.2. The ENSO phenomenon
The ENSO phenomenon has long been known to be the dominant source of inter-annual climate variations for the Pacific and the global tropics (Rasmussen and Wallace, 1983). Using PCA, Weare et al. (1976) showed that the loading patterns of their first PC represented an SSTA pattern indicative of ENSO, with positive and negative anomalies over the eastern equatorial Pacific and the central North Pacific around 40° N, respectively. In terms of global SST analyses, the loading patterns of the first PC from the global analyses of Hsiung and Newell (1983) and Nitta and Yamada (1989), and from the analysis of tropical SSTs by Dommenget and Latif (2002), were all also shown to be related to the ENSO phenomenon. As discussed in the Introduction, the interpretability of the first PC is generally not called into question; rather, it is the fact that successive PCs must explain the maximum remaining variance that leads to constrained patterns in PC2 and beyond (Richman, 1986). It is therefore not surprising that Kawamura (1994), who performed a Varimax rotation of global SST data from 1955 to 1988, found that the rotated empirical orthogonal function (R-EOF) also showed strong positive loadings in the central and eastern equatorial Pacific. Loadings of this component in the central North Pacific and tropical Indian Ocean were weaker, however, highlighting the tendency for the Varimax rotation to result in components that are localized.

In our examination of the indices in Table IV, we show that it is PC2 which is most closely related to the Niño 3.4 index, with a correlation coefficient of −0.80. The first PC was related to the temperature trend. This result is similar to the analyses by Folland et al. (1991) and Ward et al. (1993), who find that their first PC captures the global warming signal and the second PC captures the ENSO signal. We suggest that, in contrast to the arguments presented earlier, our PC2 is not likely to be an artificially induced mode, since the global warming trend is largely global in nature and as such the tendency for PCA to maximize variance across the whole dataset is legitimate in this case.

Considering the Varimax and ICA results, R-PC1 is highly correlated with Niño 3.4, with a correlation coefficient of −0.82, and the remaining R-PCs have low correlation coefficients. In contrast, ICA appears to represent the ENSO signal as a combination of IC2 and IC3, with correlation coefficients of −0.58 and 0.62, respectively. Considering the correlation between these ICs and global SSTs in Figures 3(b) and (c), it is apparent that IC2 focuses on variability in the extra-tropics, with positive loadings in the central northern Pacific above 20° N and south Pacific below 20° S and with negative loadings in the north eastern part of the Pacific along the North American coast. In contrast, IC3 focuses on the tropical portion of ENSO between 10° S and 10° N, with positive loadings in the central and eastern tropical Pacific and negative loadings in the western tropical Pacific.

5.3. Decadal variability in the Pacific
In addition to the inter-annual variability in the Pacific Ocean resulting from the ENSO phenomenon, numerous studies have described Pacific Ocean variability at decadal and interdecadal time scales, focusing largely on the extra-tropics. The dynamics of this low-frequency variability have been examined using both observational datasets and the output of coupled ocean–atmosphere general circulation models.
Both approaches support the picture that this variability is based on a cycle involving unstable ocean–atmosphere interactions over the North Pacific with a period of the order of a few decades (Latif and Barnett, 1996; Minobe, 1997; Zhang et al., 1997). As discussed in Section 2, the PDO has been put forward to represent the dominant pattern of this long-term variability (Mantua et al., 1997; Mantua and Hare, 2002). A correlation analysis between the PDO and each of the components shows generally low levels of correlation, with coefficients of 0.33 and −0.46 for PC1 and PC2, respectively, −0.50 for R-PC1, and −0.43 for IC2. These are also the components that exhibit high correlations with the Niño 3.4 index, and the correlations are of the same sign, demonstrating the high level of coherence between this decadal-scale variability and the inter-annual variability represented by ENSO. The relationship between ENSO and this lower-frequency extra-tropical variability has been described in various earlier studies (Tanimoto et al., 1993; Mantua et al., 1997; Zhang et al., 1997), suggesting the PDO may be viewed as ENSO-like interdecadal variability. Those studies that do succeed in separating the inter-annual variability associated with ENSO and the decadal to centennial variability use some form of bandpass filtering before applying PCA (Zhang et al., 1997), which is beyond the scope of the present study.

The correlation maps of those components that are related to the PDO all show significant variability in the central North Pacific Ocean, poleward of about 30° N, as well as some symmetry reflected in the southern Pacific Ocean as was found in White and Cayan (1998). Some linkages could also be seen between these components and the Indian and Atlantic Oceans, depending on which component extraction technique is considered; however, given that the correlation coefficients between our components and the PDO are relatively low, it is not possible to use our results to comment on teleconnections between the PDO and variability in other ocean basins.

5.4. Tropical Atlantic Ocean
Large-scale coherent variability in the tropical Atlantic Ocean has been the subject of several decades of research, with rainfall anomalies in the Sahel (Lamb, 1978a,b; Folland et al., 1986; Lamb and Peppler, 1992) and northeast Brazil (Hastenrath and Heller, 1977; Moura and Shukla, 1981; Hastenrath and Greischar, 1993; Filho and Lall, 2003) suggesting a pattern of SSTAs with opposite sign north and south of the inter-tropical convergence zone (ITCZ). This dipolar behaviour across the ITCZ can also be observed by applying PCA to the Atlantic SSTA field, which is used as the basis of a dipole index (Weare, 1977; Hastenrath, 1978; Servain, 1991). A number of more recent papers claim that, owing to the orthogonality that is inherent in PCA, this dipole structure may be induced artificially into the SSTA data.
A reinterpretation based on the Varimax rotation has been performed by numerous authors, suggesting that there are two regions, one to the north and the other to the south of the equator, which show significant variability but are not significantly correlated with each other (Houghton and Tourre, 1992; Enfield and Mayer, 1997; Dommenget and Latif, 2000). This variability forms the basis for two indices, the tropical North Atlantic (TNA) index and the tropical South Atlantic (TSA) index, which can be used to describe tropical Atlantic variability (Enfield, 1996; Enfield et al., 1999).

To see how our analysis contributes to this discussion, we first consider the correlation between each of the components and both the TNA and TSA. PC3 exhibits fairly strong correlation with the TNA (correlation coefficient of 0.69); however, no significant correlation is observed between any of the PCs and the TSA. In contrast, the Varimax rotation results suggest correlation between R-PC3 and the TNA (correlation coefficient of 0.73), and between R-PC2 and the TSA (correlation coefficient of −0.46). Examination of the loading patterns of PC3 suggests the potential of dipolar behaviour, with strongly positive weights north of the equator and slightly negative weights south of the equator. In contrast, the Varimax results suggest that the Atlantic variability north and south of the equator is represented by different modes, confirming the more recent interpretation. Finally, the ICA results support the results from the Varimax analysis, with IC4 correlated with the TNA (correlation coefficient of −0.76) and IC1 more weakly correlated with the TSA (correlation coefficient of 0.27). This, therefore, suggests that the separation of the TNA and TSA as two separate modes is a robust result.

5.5. North Atlantic Oscillation
The NAO is a climatic phenomenon in the North Atlantic Ocean characterized by fluctuations in the difference in SLP between the Icelandic low and the Azores high. A review of the correlation coefficients between each of the components and the NAO found a maximum absolute coefficient of 0.24, suggesting that the techniques reviewed in this article are not capable of distinguishing this mode. A possible reason for this is that global teleconnections associated with the NAO are not sufficiently strong to be elucidated using any of the techniques described in this article, which focus on the extraction of modes that explain a large proportion of the global SST variability.

5.6. Indian Ocean Dipole
The IOD is a coupled ocean–atmosphere phenomenon in the Indian Ocean and is characterized by anomalous cooling of SST in the south eastern equatorial Indian Ocean and anomalous warming of SST in the western equatorial Indian Ocean (Saji et al., 1999; Webster et al., 1999). The mode accounts for about 12% of the SST variability in the Indian Ocean and has been known to cause above-average rainfall in eastern Africa and droughts in Indonesia (Saji et al., 1999). Similar to the NAO, the absolute correlation coefficients between each of the components analysed in this paper and the IOD do not exceed 0.27, which is attributed to the relatively low variability of the IOD as a fraction of total global SST variability.
6. Applications for ICA
Our results suggest that, compared with ICA, the Varimax rotation provides components that are both easier to interpret and more consistent with previous studies described in the literature. Although some of the comparisons in the literature make use of the Varimax rotation, partially explaining the close correspondence with the Varimax rotational results presented in this article, the mathematical approaches that were used in the literature reviewed for this study were sufficiently varied to suggest that the improved interpretability of the Varimax-rotated components compared with ICs is a robust result. The comparative examples described in the literature include the following: using spatial averaging techniques rather than component extraction techniques to develop indices such as Niño 3.4, TNA and TSA; using different spatial domains such as tropical oceans only or using only a single ocean basin; using time series of different durations; and using different pre-processing techniques such as removing the global temperature trend or removing the ENSO phenomenon before analysis.

Applications of ICA are not limited to finding components that are physically interpretable; ICA also has an important advantage over both PCA and the Varimax rotation in the manner in which it maximizes statistical independence. The benefits of this property have been demonstrated in an analysis of a multivariate Colombian streamflow dataset, first in the context of synthetically generating multivariate time series while maintaining the spatial and temporal dependence properties of the historical data (Westra et al., 2007) and secondly in the context of generating multivariate seasonal forecasts using global SSTs as the predictors (Westra et al., 2008). The basis of both these approaches is that ICA is used to transform the multivariate data to a set of univariate time series which are mutually independent, so that the analysis can be performed separately on each component. The final step of the analysis then involves applying the inverse ICA transform to the generated components to transform the data back to the original space, ensuring spatial dependence is maintained. Such applications are equally valid for the SST dataset, in which one may wish to generate seasonal forecasts of global or regional SSTs based on SSTs at previous time steps or by using exogenous variables. A further application would involve applying ICA in a similar way to the PCA-based approach used in Kaplan et al. (1997) to reconstruct a global SST dataset. These applications will be reserved for future research.
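The transform, generate and back-transform workflow described above can be sketched with a two-site toy example. Everything here is an assumption made for illustration: the mixing matrix `mixing` is taken as known (in practice it would be estimated by an ICA algorithm), the sources are made-up random numbers, and each component is modelled by simple bootstrap resampling, which ignores temporal dependence, whereas a real application would fit a time-series or forecasting model to each component.

```python
import math
import random


def matmul(a, b):
    """Multiply matrix a (r x k) by matrix b (k x c), given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def correlation(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


rng = random.Random(0)
n_samples = 5000

# Two independent, non-Gaussian source signals (rows of S).
sources = [
    [rng.uniform(-1.0, 1.0) for _ in range(n_samples)],
    [rng.expovariate(1.0) - 1.0 for _ in range(n_samples)],
]
mixing = [[1.0, 0.5], [0.3, 1.0]]      # hypothetical mixing matrix A
observed = matmul(mixing, sources)      # X = A S, two correlated "sites"

# Step 1: transform the data to independent components.
unmixing = inverse_2x2(mixing)
components = matmul(unmixing, observed)

# Step 2: model each component separately (here: bootstrap resampling).
synthetic_components = [
    [rng.choice(row) for _ in range(n_samples)] for row in components
]

# Step 3: back-transform so that the between-site dependence is restored.
synthetic = matmul(mixing, synthetic_components)

# Contrast: resampling each site directly loses the between-site dependence.
naive = [[rng.choice(row) for _ in range(n_samples)] for row in observed]

corr_observed = correlation(observed[0], observed[1])
corr_synthetic = correlation(synthetic[0], synthetic[1])
corr_naive = correlation(naive[0], naive[1])
print(f"observed: {corr_observed:.2f}")
print(f"synthetic via components: {corr_synthetic:.2f}")
print(f"synthetic via independent sites: {corr_naive:.2f}")
```

The design point is that the dependence between sites lives entirely in the mixing matrix, so the components can be handled one at a time without losing it.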
7. Conclusions
The objective of the research presented in this article is to consider PCA and two rotational techniques, Varimax and ICA, with applications to the analysis of the global SSTA dataset. Both the Varimax rotation and ICA have been cited in the literature as providing solutions that enhance ‘interpretability’ of the extracted components compared with the traditional PCA solution, although the techniques are based on very different mathematics. Specifically, the Varimax rotation seeks to maximize a measure of ‘simple structure’ such that the resultant components have greater regionalization, while ICA seeks to maximize a measure of non-Gaussianity. We show that the ICA and Varimax solutions are significantly different from each other, and both are different from the original PCA solution.

In terms of the physical interpretability of the extracted components, we find that the phenomena which are global in extent, such as a global warming trend and the ENSO phenomenon, are well represented using PCA. In contrast, as expected, the more regional phenomena are less well represented by PCA. For example, unlike PCA, the Varimax algorithm is able to represent the tropical North and South Atlantic indices as two distinct phenomena, thereby providing conclusions consistent with more recent interpretations of the tropical Atlantic climate system. These results confirm the conclusions of numerous previous studies stating that any interpretive analysis of datasets such as global SST benefits from using PCA followed by a Varimax rotation, either on the full reduced-dimension PCA solution or alternatively after removing phenomena that are global in extent, such as the global warming trend and/or the ENSO phenomenon.

In contrast, ICA appears to be less successful in extracting components that are physically interpretable, with ENSO split across several modes. Furthermore, where ICA extracts components which are physically interesting, such as the separation of the TNA and TSA as two distinct modes, this does not add any interpretive value to the solution already provided by the Varimax rotation and other techniques cited in the literature. Finally, the ICA technique is highly sensitive to the number of PCs retained during pre-filtering, and as such presents difficulties both during the analysis (in determining the optimal number of PCs to retain) and in the interpretation (which might vary if the number of PCs retained varies).

Acknowledgements
Funding for this research came from the Australian Research Council and the Sydney Catchment Authority. Their support for this work is gratefully acknowledged.

Appendix A. Notation
X      an m × l matrix of observed data which has been centred
m      number of spatial points (indexed by i)
n      number of reduced dimensions (indexed by i)
l      number of time points (length of the time series) (indexed by j)
Cx     covariance matrix of X, Cx = E{XX^T}
ei     eigenvectors of Cx, i = 1, ..., m
di     eigenvalues of Cx, i = 1, ..., m
E      matrix of eigenvectors
D      matrix of eigenvalues, given as D = diag(d1, ..., dn)
PCi    ith principal component of X
V      whitening transform
v*     measure of variance used for Varimax rotation
ai     loading vectors after Varimax rotation
S      an n by l matrix of independent components
A      an m by n mixing matrix
E{.}   mathematical expectation
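For orientation, the symbols above can be tied together as follows. This is a sketch that assumes the conventional PCA, whitening and ICA definitions (for example, those in Hyvarinen et al., 2001) and that the eigenvector matrix holds the n leading eigenvectors as columns; the form given for V in particular is an assumption rather than something stated in this appendix. The eigenvector matrix is written in bold to distinguish it from the expectation operator.

```latex
\begin{align}
C_x &= \mathrm{E}\{X X^{T}\}, \qquad C_x\, e_i = d_i\, e_i, \\
\mathbf{E} &= [\, e_1, \dots, e_n \,], \qquad D = \mathrm{diag}(d_1, \dots, d_n), \\
PC_i &= e_i^{T} X, \\
V &= D^{-1/2}\, \mathbf{E}^{T}, \qquad
\mathrm{E}\{(V X)(V X)^{T}\} = D^{-1/2}\, \mathbf{E}^{T} C_x\, \mathbf{E}\, D^{-1/2} = I_n, \\
X &= A S .
\end{align}
```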
References
Aires F, Chedin A, Nadal JP. 1999. Analyse de series temporelles geophysiques et theorie de l’information: l’analyse en composants independants. Geophysique Externe, Climat et Environnement/External Geophysics, Climate and Environment 328: 569–575.
Aires F, Chedin A, Nadal JP. 2000. Independent component analysis of multivariate time series: Application to the tropical SST variability. Journal of Geophysical Research-Atmospheres 105(D13): 17437–17455.
Aires F, Rossow WB, Chedin A. 2002. Rotation of EOFs by the independent component analysis: Toward a solution of the mixing problem in the decomposition of geophysical time series. Journal of the Atmospheric Sciences 59(1): 111–123.
Basak J, Sudarshan A, Trivedi D, Santhanam MS. 2004. Weather data mining using independent component analysis. Journal of Machine Learning Research 5: 239–253.
Cane MA, Clement AC, Kaplan A, Kushnir Y, Pozdnyakov D, Seager R, Zebiak SE, Murtugudde R. 1997. Twentieth-century sea surface temperature trends. Science 275(5302): 957–960.
Ciaramella A, Lauro ED, Martino SD, Lieto BD, Falanga M, Tagliaferri R. 2004. Characterisation of Strombolian events by using independent component analysis. Nonlinear Processes in Geophysics 11: 453–461.
Comon P. 1994. Independent component analysis: A new concept? Signal Processing 36: 287–314.
Cordery I, Opoku-Ankomah Y. 1994. Temporal variation of relations between tropical sea-surface temperatures and New South Wales rainfall. Australian Meteorological Magazine 43(2): 73–80.
Dommenget D, Latif M. 2000. Interannual to decadal variability in the tropical Atlantic. Journal of Climate 13: 777–792.
Dommenget D, Latif M. 2002. A cautionary note on the interpretation of EOFs. Journal of Climate 15: 216–225.
Drosdowsky W. 1993. An analysis of Australian seasonal rainfall anomalies: 1950–1987. II: Temporal variability and teleconnection patterns. International Journal of Climatology 13: 111–149.
Enfield DB. 1996. Relationships of inter-American rainfall to tropical Atlantic and Pacific SST variability. Geophysical Research Letters 23: 3305–3308.
Enfield DB, Mayer DA. 1997. Tropical Atlantic sea surface temperature variability and its relation to El Nino Southern Oscillation. Journal of Geophysical Research 102(C1): 929–945.
Enfield DB, Mestas-Nunez AM, Mayer DA, Cid-Serrano L. 1999. How ubiquitous is the dipole relationship in tropical Atlantic sea surface temperatures? Journal of Geophysical Research 104(C4): 7841–7848.
Filho FAS, Lall U. 2003. Seasonal to interannual ensemble streamflow forecasts for Ceara, Brazil: Applications of a multivariate, semiparametric algorithm. Water Resources Research 39(11): 1307.
Folland CK, Owen J, Ward MN, Colman A. 1991. Prediction of seasonal rainfall in the Sahel region using empirical and dynamical statistical methods. Journal of Forecasting 10: 21–56.
Folland CK, Palmer TN, Parker DE. 1986. Sahel rainfall and worldwide sea temperatures. Nature 320: 602–607.
Folland C, Renwick JA, Salinger MJ, Mullan AB. 2002. Relative influences of the Interdecadal Pacific Oscillation and ENSO on the South Pacific Convergence Zone. Geophysical Research Letters 29(13): 21-1–21-4, DOI: 10.1029/2001GL014201.
Guttman L. 1954. Some necessary conditions for common-factor analysis. Psychometrika 19(2): 149–161.
Hastenrath S. 1978. On modes of tropical circulation and climate anomalies. Journal of the Atmospheric Sciences 35: 2222–2231.
Hastenrath S, Greischar L. 1993. Further work on the prediction of northeast Brazil rainfall anomalies. Journal of Climate 6: 743–758.
Hastenrath S, Heller L. 1977. Dynamics of climatic hazards in Northeast Brazil. Quarterly Journal of the Royal Meteorological Society 103: 77–92.
Herault J, Jutten C. 1986. Space or time adaptive signal processing by neural network models. In Neural Networks for Computing: AIP Conference Proceedings, Denker JS (ed.). American Institute of Physics: New York.
Horel JD. 1981. A rotated principal component analysis of the interannual variability of Northern Hemisphere 500 mb height field. Monthly Weather Review 109: 2080–2092.
Houghton RW, Tourre YM. 1992. Characteristics of low-frequency sea surface temperature fluctuations in the tropical Atlantic. Journal of Climate 5: 765–771.
Hsiung J, Newell RE. 1983. The principal nonseasonal modes of variation of global sea surface temperature. Journal of Physical Oceanography 13: 1957–1967.
Hurrell JW. 1995. Decadal trends in the North Atlantic Oscillation: Regional temperature and precipitation. Science 269: 676–679.
Hyvarinen A. 1999. Survey on independent component analysis. Neural Computing Surveys 2: 94–128.
Hyvarinen A, Karhunen J, Oja E. 2001. Independent Component Analysis. John Wiley and Sons: New York; 481.
Ilin A, Valpola H, Oja E. 2006. Exploratory analysis of climate data using source separation methods. Neural Networks 19: 155–167.
Janicot S, Trzaska S, Poccard I. 2001. Summer Sahel-ENSO teleconnection and decadal time scale SST variations. Climate Dynamics 18: 303–320.
Jolliffe IT. 1987. Rotation of principal components: Some comments. Journal of Climatology 7: 507–510.
Jolliffe IT. 2003. A cautionary note on artificial examples of EOFs. Journal of Climate 16(7): 1084–1086.
Jolliffe IT, Uddin M, Vines SK. 2002. Simplified EOFs – three alternatives to rotation. Climate Research 20(3): 271–279.
Kaiser HF. 1958. The Varimax criterion for analytic rotation in factor analysis. Psychometrika 23(3): 187–200.
Kaplan A, Cane MA, Kushnir Y, Clement AC, Blumenthal MB, Rajagopalan B. 1998. Analyses of global sea surface temperature 1856–1991. Journal of Geophysical Research-Oceans 103(C9): 18567–18589.
Kaplan A, Kushnir Y, Cane MA, Blumenthal MB. 1997. Reduced space optimal analysis for historical data sets – 136 years of Atlantic sea surface temperatures. Journal of Geophysical Research-Oceans 102(C13): 27835–27860.
Kawamura R. 1994. A rotated EOF analysis of global sea surface temperature variability with interannual and interdecadal scales. Journal of Physical Oceanography 24: 707–715.
Koch I, Naito K. 2007. Dimension selection for feature selection and dimension reduction with principal and independent component analysis. Neural Computation 19(2): 513–545.
Kutzbach J. 1967. Empirical eigenvectors of sea level pressure, surface temperature and precipitation complexes over North America. Journal of Applied Meteorology 6: 791–802.
Lamb PJ. 1978a. Case studies of tropical Atlantic surface circulation patterns during recent sub-Saharan weather anomalies: 1967 and 1968. Monthly Weather Review 106: 482–491.
Lamb PJ. 1978b. Large-scale tropical Atlantic surface circulation patterns associated with Subsaharan weather anomalies. Tellus A30: 240–251.
Lamb PJ, Peppler RA. 1992. Further case studies of tropical Atlantic surface atmospheric and oceanic patterns associated with sub-Saharan drought. Journal of Climate 5: 476–488.
Latif M, Barnett TP. 1996. Decadal climate variability over the North Pacific and North America: dynamics and predictability. Journal of Climate 9: 2407–2423.
Lee TW. 1998. Independent Component Analysis – Theory and Applications. Kluwer Academic Publishers: Boston, MA.
Lorenz EN. 1956. Empirical orthogonal functions and statistical weather prediction. Technical report, Statistical Forecast Project Report 1, Department of Meteorology, MIT (NTIS AD 110268), 49 pp.
Mantua NJ, Hare SR. 2002. The Pacific decadal oscillation. Journal of Oceanography 58(1): 35–44.
Mantua NJ, Hare SR, Zhang Y, Wallace JM, Francis RC. 1997. A Pacific interdecadal climate oscillation with impacts on salmon production. Bulletin of the American Meteorological Society 78(6): 1069–1079.
Mestas-Nunez AM. 2000. Orthogonality properties of rotated empirical modes. International Journal of Climatology 20: 1509–1516.
Mestas-Nunez AM, Enfield DB. 1999. Rotated global modes of non-ENSO sea surface temperature variability. Journal of Climate 12: 2734–2746.
Minobe S. 1997. A 50–70 year climatic oscillation over the North Pacific and North America. Geophysical Research Letters 24: 683–686.
Moura AD, Shukla J. 1981. On the dynamics of droughts in northeast Brazil: Observations, theory and numerical experiments with a general circulation model. Journal of the Atmospheric Sciences 38: 2653–2675.
Nicholls N. 1989. Sea surface temperatures and Australian winter rainfall. Journal of Climate 2: 965–973.
Nitta T, Yamada S. 1989. Recent warming of tropical SST and its relationship to the Northern hemispheric circulation. Journal of the Meteorological Society of Japan 67: 375–383.
Obukhov AM. 1947. Statistically homogeneous fields on a sphere. Uspekhi Matematicheskikh Nauk 2: 196–198.
Oja E. 2004. Applications of independent component analysis. Neural Information Processing – Lecture Notes in Computer Science. Springer-Verlag: Berlin Heidelberg; 1044–1051.
Opoku-Ankomah Y, Cordery I. 1993. Temporal variation between New South Wales rainfall and the southern oscillation. International Journal of Climatology 13: 51–64.
Power S, Casey T, Folland C, Colman A, Mehta V. 1999a. Interdecadal modulation of the impact of ENSO on Australia. Climate Dynamics 15(5): 319–324.
Power S, Tseitkin F, Mehta V, Lavery B, Torok S, Holbrook N. 1999b. Decadal climate variability in Australia during the twentieth century. International Journal of Climatology 19(2): 169–184.
Power S, Tseitkin F, Torok S, Lavery B, Dahni R, McAvaney B. 1998. Australian temperature, Australian rainfall and the southern oscillation, 1910–1992: coherent variability and recent change. Australian Meteorological Magazine 47: 85–101.
Preisendorfer RW. 1988. Principal Component Analysis in Meteorology and Oceanography. Elsevier: Amsterdam; 425.
Rasmussen EM, Wallace JM. 1983. Meteorological aspects of the El Nino/Southern Oscillation. Science 222: 1195–1202.
Richman MB. 1986. Rotation of principal components. Journal of Climatology 6: 293–335.
Richman MB. 1987. Rotation of principal components: A reply. Journal of Climatology 7: 511–520.
Saji NH, Goswami BN, Vinayachandran PN, Yamagata T. 1999. A dipole mode in the tropical Indian Ocean. Nature 401: 360–363.
Servain J. 1991. Simple climatic indices for the tropical Atlantic Ocean and some applications. Journal of Geophysical Research 96: 15137–15146.
Smith TM, Reynolds RW. 2004. Improved extended reconstruction of SST (1854–1997). Journal of Climate 17: 2466–2477.
Smith TM, Reynolds RW. 2005. A global merged land air and sea surface temperature reconstruction based on historical observations (1880–1997). Journal of Climate 18: 2021–2036.
Tanimoto Y, Iwasaka N, Hanawa K, Toba Y. 1993. Characteristic variations of sea surface temperature with multiple time scales in the North Pacific. Journal of Climate 6: 1153–1160.
Trenberth KE. 1997. The definition of El Nino. Bulletin of the American Meteorological Society 78: 2771–2777.
Trenberth KE, Stepaniak DP, Smith L. 2005. Interannual variability of patterns of atmospheric mass distribution. Journal of Climate 18: 2812–2825.
Von Storch H, Zwiers FW. 1999. Statistical Analysis in Climate Research. Cambridge University Press: Cambridge.
Walker GT, Bliss EW. 1932. World Weather V. Memoirs of the Royal Meteorological Society 4: 53–84.
Ward MN, Folland CK, Maskell K, Colman A, Rowell DP. 1993. Experimental seasonal forecasting of tropical rainfall at the UK Meteorological Office. In Prediction of Interannual Climate Variations, Shukla J (ed.). Springer-Verlag: Berlin; 197–216.
Weare BC. 1977. Empirical orthogonal analysis of Atlantic Ocean surface temperatures. Quarterly Journal of the Royal Meteorological Society 103: 467–478.
Weare BC, Navato A, Newell RE. 1976. Empirical orthogonal analysis of Pacific Ocean sea surface temperature. Journal of Physical Oceanography 6: 671–678.
Webster PJ, Moore A, Loschnigg J, Leben M. 1999. Coupled ocean–atmosphere dynamics in the Indian Ocean during 1997–98. Nature 401: 356–360.
Westra SP, Brown C, Lall U, Sharma A. 2007. Modeling multivariable hydrological series: principal component analysis or independent component analysis? Water Resources Research 43: W06429, DOI: 10.1029/2006WR005617.
Westra SP, Brown C, Lall U, Sharma A. 2008. Multivariate streamflow forecasting using independent component analysis. Water Resources Research 44: W02437, DOI: 10.1029/2007WR006104.
White WB, Cayan DR. 1998. Quasi-periodicity and global symmetries in interdecadal upper ocean temperature variability. Journal of Geophysical Research 103: 21335–21354.
Wilks DS. 2006. Statistical Methods in the Atmospheric Sciences. Elsevier: Amsterdam.
Zhang Y, Wallace JM, Battisti DS. 1997. ENSO-like interdecadal variability: 1900–1993. Journal of Climate 10(5): 1004–1020.