April 27, 2010 14:40 WSPC/1793-5369 244-AADA

00047

Advances in Adaptive Data Analysis Vol. 2, No. 2 (2010) 233–265 © World Scientific Publishing Company DOI: 10.1142/S1793536910000471

THE TIME-DEPENDENT INTRINSIC CORRELATION BASED ON THE EMPIRICAL MODE DECOMPOSITION

XIANYAO CHEN
First Institute of Oceanography, State Oceanic Administration
Qingdao, Shandong 266061, China

ZHAOHUA WU
Department of Meteorology, Florida State University
Tallahassee, FL 32306-4520, USA

NORDEN E. HUANG
Research Center for Adaptive Data Analysis, National Central University
Chungli, Taiwan

A Time-Dependent Intrinsic Correlation (TDIC) method is introduced. This new approach includes both auto- and cross-correlation analysis and is designed especially to analyze, capture and track the local correlations between nonlinear and nonstationary time series pairs. The approach is based on Empirical Mode Decomposition (EMD): it decomposes the nonlinear and nonstationary data into their intrinsic mode functions (IMFs) and uses the instantaneous periods of the IMFs to determine a set of sliding window sizes for computing running correlation coefficients for multi-scale data. The new method treats the selection of the sliding window sizes as an adaptive process determined by the data itself, not as a "tuning" process, and therefore gives an intrinsic correlation analysis of the data. Furthermore, the multi-window approach makes the new method applicable to complicated data from multi-scale phenomena. Synthetic and real-world time series are used to demonstrate conclusively that the new approach is far superior to the traditional method in its ability to reveal detailed and subtle correlations unavailable through any other existing method. Thus, the TDIC represents a major advance in the statistical analysis of data from nonlinear and nonstationary processes.

Keywords: Time-Dependent Intrinsic Correlation; Time-Dependent Intrinsic Cross-correlation; Time-Dependent Intrinsic Auto-correlation; nonlinear and nonstationary time series; Empirical Mode Decomposition.

1. Introduction

The estimation of the correlation coefficient between two data series is the standard method for detecting a physical relationship between them. There are, however,
certain problems with this seemingly simple operation. Because the correlation coefficient is defined globally as the covariance of two variables divided by the product of their standard deviations, there is an underlying assumption that the variables are stationary and linear. Therefore, it can only be applied to data from linear and stationary processes. For real-world data, such as in medical, financial, climate, mechanical, geophysical, and many social science applications, the processes are usually nonstationary and nonlinear. Consequently, the correlation coefficient of two data series over the whole domain cannot reveal the possible relationship between them. This is especially true when the processes are intermittent or contain drifts and trends. Studies on the validity of correlation for nonstationary time series can be traced back to 1925, when Yule [1926] gave his presidential address to the Royal Statistical Society. He stated: “It is fairly familiar knowledge that we sometimes obtain between quantities varying with the time (time-variables) quite high correlations to which we cannot attach any physical significance whatever, although under the ordinary test the correlation would be held to be certainly ‘significant’.” Yule introduced the term “nonsense correlations” for correlation coefficients calculated without checking the stationarity assumption behind their interpretation. Many researchers have tried to address this problem in different ways. Wierwille (1965) used an optimal approximation of the ensemble correlation function to obtain the cross-correlation and auto-correlation functions for a special kind of nonstationary process. Phillips (1986) developed the asymptotic distribution of correlation and regression coefficients, which is applicable to a special class of nonstationary time series.
Hoover (2003) argued that the common measures of statistical association cannot generally reflect probabilistic dependence among nonstationary data, and suggested using the population counterparts of the empirical correlations in the nonstationary case to avoid nonsense correlation. Yang and Shahabi (2005) developed an algorithm to evaluate the correlation coefficient of a time series by estimating the stationarity of the dataset. These methods mainly focused on the validity of a correlation coefficient computed over the full length of the available data, by introducing a special notion of stationarity. For a genuinely nonstationary time series, however, the statistical properties of the variables change over time, i.e. the correlation coefficient for the entire dataset will be statistically different from that for a subset of the data. Therefore, a time-dependent correlation structure, rather than a single static mean coefficient, is needed to give a useful estimate of the relationship between the two variables for various subsets of the data. To implement this idea, Papadimitriou et al. [2006] introduced a time-evolving local similarity score for time series by generalizing the notion of the cross-correlation coefficient, and applied a sliding window to localize the correlation estimates. Rodo and Rodriguez-Arias
[2006] developed the scale-dependent correlation technique to resolve transitory signatures and local processes in the climate system. This moving correlation technique can also be applied to two-dimensional data to reveal spatial relationships; for example, Bolviken [2003] applied it to study geochemical associations, comparing rates of multiple sclerosis with environmental data for indoor radon and the fallout of atmospheric magnesium, and Svanda et al. [2005] used it to map horizontal velocity fields in the solar photosphere. These methods detect the correlation between two nonstationary signals by computing the correlation coefficient in a local sliding window. This can describe the complex time-dependent correlation between nonstationary signals, but several problems remain. One issue is associated with the multiple scales contained in the signals. Real-world data usually consist of an amalgamation of many length and time scales. Certain processes may correlate with each other at one scale but not at others. When components of different scales are amalgamated, a meaningful correlation can be masked by the coexisting signals of other scales, usually resulting in nonsense characterizations even with traditional techniques, including moving correlation analysis. Another issue is that the scales could also be time-varying, which would call for an adaptive sliding window to provide an accurate time-dependent correlation structure. Therefore, two steps are important for obtaining useful information: first, separating the original signal into components according to their scales, each associated with a particular process; and second, determining the window size adaptively so as to keep the data within each window as stationary as possible.
The various Fourier-based band-pass filters do not fit these requirements, because they were designed under linear and stationary assumptions and are not adaptive. The recently developed empirical mode decomposition (EMD) can satisfy these requirements [Huang et al. (1998); Huang and Wu (2008); Wu and Huang (2009)]. EMD is an adaptive method that decomposes any time series into a set of intrinsic mode functions (IMFs), which represent the different scales of the original time series and form an adaptive and physical basis for the data. Because no basis functions are predefined, the IMFs usually have amplitudes and frequencies that vary with time, which makes them ideal for the analysis of nonstationary and nonlinear data. In this paper, we introduce a newly developed time-dependent correlation analysis technique that uses EMD as a pre-conditioning step. This new method resolves the problems associated with the traditional methods mentioned above. Section 2 discusses the problems and limitations of the traditional correlation coefficient with several examples. The new time-dependent correlation method for nonstationary time series is introduced in Sec. 3. Applications of the method to climate and paleoclimate records are given in Sec. 4. Section 5 contains the conclusions and discussion.


2. General Problems of Correlation Coefficient

Three examples are given in this section to show the general problems that arise when traditional correlation analysis is applied. They cover signals with (a) amalgamated multi-scales, (b) time-dependent scaling behavior, and (c) a time-varying relationship on the same scale.

2.1. Correlation of signals with amalgamated multi-scales

Real-world data usually amalgamate all time scales. Correlating such signals without clearly separating the different scales may lead to nonsense results. Therefore, the information at different time scales should be separated before the correlation is estimated. Figure 1 shows two synthetic time series $x_1(t)$ and $x_2(t)$ constructed as follows:

$$x_1(t) = a(t), \qquad x_2(t) = b(t) + c(t),$$

where $a(t)$ and $b(t)$ are two signals with the same mean time scale $L$ and a significant correlation coefficient of 0.78, and $c(t)$ is constructed in the following steps: decomposing a white noise $e(t)$ into several IMFs using EMD, removing the IMFs with time scales close to $L$, and summing all the remaining IMFs. Because $c(t)$ is constructed without any possible correlation at the time scale $L$, one would expect $x_1(t)$ and $x_2(t)$ to retain as much as possible of the relationship between $a(t)$ and $b(t)$. However, the correlation coefficient of the constructed signals $x_1(t)$
and $x_2(t)$ is only 0.23. The correlation of the combined signals is masked by the noisy component $c(t)$ so strongly that the underlying relationship between $a(t)$ and $b(t)$ is no longer detectable. Because $c(t)$ is constructed from white noise, any choice of the size of the "neighborhood" of $t$ will inevitably include white-noise information at every time scale; this is the main reason for the meaningless correlation estimate. Figure 2 shows the correlation coefficients of subsets of $x_1(t)$ and $x_2(t)$ for different window sizes $\Delta t$ (bottom panel), together with those of subsets of the original signals $a(t)$ and $b(t)$ (top panel). This result shows that the different time scales hidden in the signals should be separated before correlation analysis in order to obtain a meaningful correlation estimate.

Fig. 1. (Top) The synthetic signals $a(t)$ (solid line) and $b(t)$ (dashed line). (Bottom) The synthetic signals $x_1(t)$ (solid line) and $x_2(t)$ (dashed line).

2.2. Correlation of signals with time-dependent scaling behavior

Figure 3 shows another case in which the overall correlation analysis fails. One signal is a simple cosine wave with frequency $f_1$ that suddenly switches to another frequency $f_2$ and, after a short period, switches back to $f_1$. The other signal is an amplitude-modulated cosine wave with the same frequency $f_1$ as the first signal. The overall correlation coefficient between the two signals is 0.6, which is significant. Obviously, this value captures neither the strong correlation (0.93) between the two signals in the regions [0, 1] and [2, 3] nor the absence of correlation (0.01) in the region [1, 2]. Such time-dependent scaling behavior is common in real applications, e.g. in climatic time series, where transitory external forcings that do not last long but have a significant local impact are embedded within regular, longer-lasting processes.

2.3. Correlation of signals with time-varying relationship on the same scale

An example similar to Fig. 3 is shown in Fig. 4.
Here the pure cosine wave suddenly switches to a sine wave within one period and, after several cycles, switches back. The two signals have nearly the same scale except during the transitions, but the sudden phase shifts induce a cyclic positive–negative correlation. In this case, the overall correlation coefficient is only 0.01, because the positive and negative correlations cancel each other out. Although existing moving correlation techniques can resolve this feature with a sliding window, how to determine the window size without a priori knowledge of the scale changes remains an open question. For these constructed signals, we know where the frequency shifts happen. In real applications, however, this information is seldom available, and it is usually precisely the answer we are looking for. Under such conditions, the choice of the sliding window size is critical, because $\Delta t$ dictates the answer one obtains.

Fig. 2. (Top) Correlation coefficients between $a(t)$ and $b(t)$ for different sliding window sizes (horizontal axis: time; vertical axis: sliding window size). (Bottom) Same as above but for $x_1(t)$ and $x_2(t)$.

Fig. 3. Simple cosine waves with (solid line) and without (dashed line) frequency shift.

Fig. 4. Two simple cosine waves with the same scale, but with (solid line) and without (dashed line) phase shift.

Fig. 5. Moving correlation coefficients using short ($\Delta t = 1$, black solid line), medium ($\Delta t = 3$, gray solid line), and long ($\Delta t = 6$, black dashed line) sliding window sizes. Blank segments denote correlation coefficients that did not pass the statistical significance test.

Figure 5 compares the results using long ($\Delta t = 6$), medium ($\Delta t = 3$), and short ($\Delta t = 1$) sliding windows for the example given in Fig. 4. For nonstationary time series, a long window smooths the shift from one stage to another and widens the apparent transition period, whereas a short window produces correlation coefficients with strong variability, mainly because of the small number of degrees of freedom, even when they pass the statistical significance test. A medium window gives a reasonable variation of the correlation and the position of the phase shift; however, prior knowledge of this intermediate scale is required. To reduce this limitation, Papadimitriou et al. [2006] suggested tracking the correlation scores at multiple scales if desired. For real data, however, this process is time-consuming and impractical, because real multi-scale data might call for different windows for different time
scale signals. A practical solution, proposed in the next section, is to determine the sliding window size with an adaptive criterion.

3. Time-Dependent Intrinsic Correlation Based on EMD (TDIC)

This section introduces the new time-dependent correlation based on the EMD method. As the analysis above indicates, the first step in avoiding nonsense correlation is to separate the different scales embedded in the time series. Intuitively, we want to extract the intrinsic and local information in the data and measure their physical relationship. Methods built on linear or stationary assumptions, such as the Short-Time Fourier Transform, wavelets, and Singular Spectrum Analysis, are not valid for this purpose. In this section, the EMD method, which was developed specifically for the analysis of nonstationary and nonlinear data, is applied.

3.1. Intrinsic mode functions (IMFs) and instantaneous period

The EMD method separates a signal into a set of IMFs in the form

$$x(t) = \sum_{i=1}^{N} c_i(t) + r(t),$$

where $r(t)$ is the residual, usually a monotonic function representing either the mean trend or a constant, and $c_i(t)$, $i = 1, 2, \ldots, N$, is the $i$th IMF, an amplitude- and frequency-modulated function of time [Huang et al. (1998);
Wu and Huang (2009)]. For a nonstationary process, the frequency changes over time, giving the instantaneous frequency [Huang et al. (2009)] and the instantaneous period, both of which are essential for understanding the physical meaning of the process. Based on the definition of the IMF, the instantaneous frequency and period can be computed by counting the zero-crossing points of the function, by using the Hilbert transform, or by calculating the quadrature of the IMF directly. Detailed comparisons of these methods are given in Huang et al. [2009]. In this paper, for convenience, the generalized zero-crossing method is applied.

The original EMD method has a major drawback: the frequent appearance of mode mixing, defined as a single IMF consisting of widely disparate scales, or a similar scale being split across different IMFs. Wu and Huang [2009] overcame this problem by introducing a noise-assisted method, Ensemble Empirical Mode Decomposition (EEMD). Because the added white noise populates the whole time–frequency space uniformly with components of different scales, and the different white-noise series added to the original signal cancel each other out in the ensemble mean, leaving the signal intact, EEMD separates the scales of the signal naturally and reduces mode mixing significantly. As will be seen, EEMD is very helpful for obtaining a reasonable time-dependent correlation. A detailed introduction to EEMD is given in Wu and Huang (2009); here we give only a brief procedure:

(1) add a white noise series to the analyzed time series;
(2) decompose the time series with added white noise into IMFs;
(3) repeat Steps (1) and (2) enough times, with a different white noise series each time;
(4) take the ensemble mean of the corresponding IMFs obtained in Step (3) as the final result.

3.2. Computation of time-dependent intrinsic correlation

The Time-Dependent Intrinsic Correlation (TDIC) starts with the EEMD of the time series to be studied. The two target time series are each decomposed by EEMD into several IMFs,

$$x_p(t) = \sum_{i=1}^{N} c_{pi}(t) + r_p(t), \qquad p = 1, 2,$$

where $c_{pi}(t)$ are the IMFs of $x_p(t)$ and $r_p(t)$ are the residues. The corresponding instantaneous frequency $F_{pi}(t)$ and instantaneous period $T_{pi}(t)$ of each $c_{pi}(t)$ are then obtained with the generalized zero-crossing method. The time-dependent intrinsic correlation of each pair of IMFs is defined as

$$R_i(t_k^n) = \mathrm{Corr}\left(c_{1i}(t_w^n),\, c_{2i}(t_w^n)\right)$$

at any time $t_k$,
where Corr denotes the ordinary correlation coefficient of two time series. The sliding window is $t_w^n = [t_k - n t_d/2,\; t_k + n t_d/2]$, where the minimum sliding window size for the local correlation computation is $t_d = \max(T_{1i}(t_k), T_{2i}(t_k))$ and $n$ is any positive real number. The following features of this time-dependent correlation should be noted:

• The sliding window used for $R_i(t_k^n)$ is determined adaptively from the instantaneous periods of the IMFs; therefore, the computed correlation coefficient captures the nonstationarity of the underlying physical processes. Because it gives an intrinsic relationship between the time series, $R_i(t_k^n)$ is named the time-dependent intrinsic correlation (TDIC), to distinguish it from existing time-varying correlation methods.
• For a stationary time series, the variance of any subset of the data is statistically equal to that of the whole record; $R_i(t_k^n)$ is then constant and equals the traditional correlation coefficient at the current time scale.
• Choosing a sliding window size equal to or larger than $t_d$ ensures that at least one instantaneous period of data is included when the local correlation coefficient is calculated. Because the window $t_w^n$ spans $n$ times the instantaneous period at time $t_k$, the significance test can simply use $n$ as the number of degrees of freedom for that window. Student's t-test is used in this paper to investigate whether $R_i(t_k^n)$ differs significantly from zero.

3.3. Interpretation of patterns in TDIC plots

In this part, two synthetic time series are used to summarize the steps for computing the TDIC and to illustrate the patterns in TDIC plots. To keep the procedure clear, the synthetic time series are constructed to be intrinsic mode functions already, so the EMD decomposition step is not necessary here.
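The window rule of Sec. 3.2 can be sketched in Python. This is a minimal illustration, not the authors' code, and it rests on assumptions: the IMFs are uniformly sampled with a unit time step (so periods are measured in samples), the instantaneous period is estimated from zero crossings only (a simplification of the generalized zero-crossing method), and the helper names `zero_crossing_period` and `tdic` are hypothetical. Windows that would extend past either end of the record are left as NaN, which corresponds to the blank corners of a TDIC plot.

```python
import numpy as np


def zero_crossing_period(c):
    """Instantaneous period of an IMF from its zero crossings (hypothetical helper).

    Simplification: extrema are not used, unlike the generalized zero-crossing
    method. Unit sample spacing is assumed.
    """
    c = np.asarray(c, dtype=float)
    t = np.arange(c.size, dtype=float)
    neg = c < 0
    idx = np.nonzero(neg[:-1] != neg[1:])[0]       # sign change between idx and idx + 1
    tc = t[idx] - c[idx] / (c[idx + 1] - c[idx])   # linearly interpolated crossing times
    if tc.size < 2:
        raise ValueError("need at least two zero crossings")
    mid = 0.5 * (tc[:-1] + tc[1:])                 # each half-period is assigned to its midpoint
    return np.interp(t, mid, 2.0 * np.diff(tc))    # period = 2 x half-period, held constant at the ends


def tdic(c1, c2, n_values):
    """R[j, k] = Corr(c1, c2) over a window of n_values[j] * t_d centred on sample k.

    t_d = max(T1, T2) at sample k. Windows that would run past either end of the
    record are left as NaN, giving the sloping edges of the triangular TDIC plot.
    """
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    td = np.maximum(zero_crossing_period(c1), zero_crossing_period(c2))
    N = c1.size
    R = np.full((len(n_values), N), np.nan)
    for j, n in enumerate(n_values):
        for k in range(N):
            half = int(round(n * td[k] / 2))
            lo, hi = k - half, k + half
            if lo < 0 or hi >= N:
                continue  # window not centred on t_k: left blank
            R[j, k] = np.corrcoef(c1[lo:hi + 1], c2[lo:hi + 1])[0, 1]
    return R


t = np.arange(600.0)
c1 = np.cos(2 * np.pi * t / 30)
c2 = np.cos(2 * np.pi * t / 30 + 0.3)   # same scale, constant phase lag of 0.3 rad
R = tdic(c1, c2, n_values=[1, 2, 4])
```

For a stationary pair like this one, every finite entry of `R` stays close to cos(0.3) ≈ 0.955, in line with the second feature listed above: the TDIC of a stationary series is constant and equals the ordinary correlation at that scale.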
Two synthetic time series $x_1(t)$ and $x_2(t)$ are given as follows (Fig. 6(a)):

$$x_1(t) = 0.8\left(1 + 0.25\cos\left(2\pi \tfrac{1}{360} t\right)\right)\cos\left(0.25t + 1.25\sin\left(2\pi \tfrac{1}{125} t\right)\right),$$
$$x_2(t) = 0.6\left(1 + 0.50\sin\left(2\pi \tfrac{1}{360} t\right)\right)\sin\left(0.25t + 1.25\cos\left(2\pi \tfrac{1}{125} t\right)\right),$$

where $x_1(t)$ and $x_2(t)$ are amplitude- and frequency-modulated signals that, following the analysis of Rilling and Flandrin (2008), are already IMFs. The second step is to compute the instantaneous periods of the two time series, as shown in Fig. 6(b). The period of the signals varies continuously between 19 and 36 units, which gives the minimum criterion for determining the sliding window size. The third step is to choose fragments of the two signals with different sliding window sizes and to calculate their correlation coefficients. Whether the sample correlation coefficient differs significantly from zero is tested with Student's t-test. The number of degrees of freedom is set to $n$, i.e. the ratio of the sliding window size
$t_w^n$ to the minimum sliding window size $t_d$ at time $t_k$. When the sliding window is shorter than four times $t_d$ at time $t_k$, i.e. when the number of degrees of freedom is less than 4, the traditional significance test given in Numerical Recipes (2007, p. 747) is applied.
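The three steps above can be sketched for the synthetic pair of Sec. 3.3. Several choices here are assumptions, not taken from the paper: unit sampling on 0 ≤ t < 800, instantaneous periods estimated from zero crossings only (a simplification of the generalized zero-crossing method), a single window centred at $t_k = 400$ with $n = 4$, and a two-sided test statistic $t = r\sqrt{\nu/(1-r^2)}$ with $\nu = n$ degrees of freedom as stated in the text (the usual textbook form uses $\nu = N - 2$ for $N$ samples). The small-$n$ test from Numerical Recipes is not reproduced, and the helper names are hypothetical.

```python
import numpy as np
from scipy import stats


def zero_crossing_period(c):
    """Instantaneous period from zero crossings only (simplified; unit sample spacing)."""
    c = np.asarray(c, dtype=float)
    t = np.arange(c.size, dtype=float)
    neg = c < 0
    idx = np.nonzero(neg[:-1] != neg[1:])[0]
    tc = t[idx] - c[idx] / (c[idx + 1] - c[idx])   # interpolated crossing times
    mid = 0.5 * (tc[:-1] + tc[1:])
    return np.interp(t, mid, 2.0 * np.diff(tc))


def correlation_is_significant(r, dof, alpha=0.05):
    """Two-sided Student's t-test of zero correlation with `dof` degrees of freedom."""
    r = np.clip(r, -0.999999, 0.999999)            # avoid division by zero at |r| = 1
    t_stat = r * np.sqrt(dof / (1.0 - r ** 2))
    return bool(2.0 * stats.t.sf(abs(t_stat), dof) < alpha)


# Step 1: the synthetic pair of Sec. 3.3 (already IMFs), sampled at unit steps.
t = np.arange(800.0)
x1 = 0.8 * (1 + 0.25 * np.cos(2 * np.pi * t / 360)) * np.cos(0.25 * t + 1.25 * np.sin(2 * np.pi * t / 125))
x2 = 0.6 * (1 + 0.50 * np.sin(2 * np.pi * t / 360)) * np.sin(0.25 * t + 1.25 * np.cos(2 * np.pi * t / 125))

# Step 2: minimum window t_d(t) = larger of the two instantaneous periods.
td = np.maximum(zero_crossing_period(x1), zero_crossing_period(x2))
print(f"t_d ranges from {td.min():.1f} to {td.max():.1f} time units")

# Step 3: correlation in a window of n * t_d centred at t_k, tested with dof = n.
k, n = 400, 4
half = int(round(n * td[k] / 2))
r = np.corrcoef(x1[k - half:k + half + 1], x2[k - half:k + half + 1])[0, 1]
print(f"r = {r:.2f}, significant at 5%: {correlation_is_significant(r, n)}")
```

With $\nu = 4$, the two-sided 5% critical value of $t$ is about 2.78, so under this rule $|r|$ must exceed roughly 0.81 before a window with $n = 4$ is marked significant.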

Fig. 6. (a) The two synthetic time series (horizontal axis: time). (b) The instantaneous period of the signals versus time. (c) The TDIC plot of the two synthetic time series (horizontal axis: center position of the sliding window; vertical axis: size of the sliding window; color scale from −1.0 to 1.0). White areas in the TDIC plot indicate that the correlation coefficient does not differ significantly from zero at the 95% confidence level.

Finally, the TDIC plot is obtained as shown in Fig. 6(c). The horizontal axis of the TDIC plot is time, corresponding to the center position of the sliding window, and the vertical axis is the size of the sliding window. The minimum window size is the larger of the two signals' instantaneous periods at the current position, and the maximum is the whole domain. When the boundary of the sliding window exceeds the left or right end of the series, the TDIC is no longer computed, because the current position would then not be the center of the window. The TDIC plot is therefore triangular. Its apex corresponds to a window covering the whole domain, where the value is simply the ordinary correlation coefficient of the entire time series. The TDIC plot clearly displays how the relationship between the two time series varies with time and over different time spans. Several features of Fig. 6(c) should be noted. First, neither positive nor negative correlation between $x_1(t)$ and $x_2(t)$ lasts more than 100 time units. Second, the plot highlights the periods when the cycles of the two time series are in phase or out of phase, as well as the times at which the alternation happens. Another interesting feature of Fig. 6(c) is that it shows two similar patterns, in the regions [0–400, 0–400] and [400–800, 0–400], which reflect a group-correlation variability associated with the amplitude modulation of the two signals. As shown in Fig. 6(a), the amplitude modulation of $x_1(t)$ and $x_2(t)$ completes about two cycles over the whole domain, and the segments within each modulation cycle of one signal have a group relationship with those in the other signal. This is one of the special features of the TDIC plot.

3.4. Time-dependent intrinsic cross-correlation

The TDIC analysis can also be applied to track the lead–lag correlation between each pair of fragments of two signals (cross-correlation).
This is particularly necessary for studying complicated signal pairs such as climate processes, medical signals or financial series. Following Sec. 3.2, the time-dependent intrinsic cross-correlation (TDICC) is defined as

$$R_i(t_k^n, \tau) = \mathrm{Corr}\left(c_{1i}(t_w^n),\, c_{2i}(t_w^{n,\tau})\right)$$

at any time $t_k$, where the sliding window is $t_w^n = [t_k - n t_d/2,\; t_k + n t_d/2]$ for the first signal and $t_w^{n,\tau} = [t_k - \tau - n t_d/2,\; t_k - \tau + n t_d/2]$ for the second. The minimum sliding window size for the local correlation computation is $t_d = \max(T_{1i}(t_k), T_{2i}(t_k - \tau))$. By this definition, $R_i(t_k^n, \tau)$ depends on three variables: the time, the sliding window size, and the lead–lag step. The resulting correlation coefficients therefore form a three-dimensional matrix shaped like a prism with triangular ends (Fig. 7).

Fig. 7. Schematic of the TDICC plot (axes: center of the sliding window, time lag, and sliding window size). The cross-section along $\tau = 0$, shown as the shaded triangle, gives the TDIC.

Indexes into the series may fall outside the record, i.e. $t_k - \tau - n t_d/2 < 0$ and/or $t_k - \tau + n t_d/2 \geq N_t$, where $N_t$ is the total number of data points in the original series. There are three common ways to handle this: (1) ignoring these points, (2) assuming the series are zero for indexes outside the range $[0, N_t - 1]$, and (3) assuming the series is circular, in which case out-of-range indexes are "wrapped" back into the range. For nonstationary time series, the latter two options are unphysical. Therefore, we choose the first option and ignore the out of
range indexes. With this treatment of out-of-range data, the traditional cross-correlation between two series has its counterpart in the TDICC plot. Because the TDICC is a three-dimensional matrix, direct visualization requires complex 3-D techniques, but a cross-section along any one of the three axes already gives detailed information about the signals. The cross-section at the sliding window size $t_d = 80$ units, with a maximum lead–lag window of $\tau = 200$ units, for the data given in Fig. 6(a) is shown in Fig. 8. Three bands of repeated high negative–positive correlation coefficients appear at about $\tau = -93$, 31 and 156. At these three lead–lag steps the two signals have a strong time-independent correlation, and the temporal mean of the cross-correlation coefficient agrees well with the structure of the overall cross-correlation of $x_1(t)$ and $x_2(t)$ (Fig. 9). Figure 8 also shows periodic variability of the correlation between the two signals at other lead–lag steps. For complex signals, the TDICC thus gives a clear picture of the nonstationary structure of the data.

Fig. 8. The cross-section of the three-dimensional time-dependent cross-correlation plot at the sliding window size $t_d = 80$ units with the maximum lead–lag window $\tau = 200$ units for the data given in Fig. 6(a) (horizontal axis: time; vertical axis: time lag).

Fig. 9. (Solid line) Cross-correlation of the data shown in Fig. 6(a) and (dashed line) the mean value of the time-dependent cross-correlation at the sliding window size $t_d = 80$ units and the lead–lag step $\tau = 31$ (horizontal axis: time lag).

3.5. Time-dependent intrinsic auto-correlation

Setting $c_1 = c_2$ in the definition of the time-dependent intrinsic cross-correlation $R_i(t_k^n, \tau)$ gives the Time-Dependent Intrinsic Auto-Correlation (TDIAC). The resulting TDIAC plot is similar to the upper part ($\tau > 0$) of the TDICC plot, except that the cross-section along $\tau = 0$ always equals 1.0. Figure 10 shows the cross-section of the TDIAC of the data $x_2(t)$ given in Fig. 6(a)
at the sliding window size $t_d = 80$ units, with a maximum lead–lag window of $\tau = 200$ units.

Fig. 10. The cross-section of the three-dimensional time-dependent auto-correlation plot at the sliding window size $t_d = 80$ units with the maximum lead–lag window $\tau = 200$ units for the data given in Fig. 6(a) (horizontal axis: time; vertical axis: time lag).

With the help of the TDIAC analysis, we can define a degree of stationarity (D.O.S.) from the temporal variability of the correlation coefficients. It is defined
as the standard deviation (std) of TDIAC along the time axis D.O.S. = std(Ri (tnk , τ )). Obviously, the D.O.S. is a function of the sliding window size td , as well as the lead–lag step τ . Notice that the D.O.S. is correlation-based definition, its maximum value will not exceed 1.0. This definition has a similar physical interpretation as the degree of stationarity introduced by Huang et al. [1998], which is based on the Hilbert spectral function. As the standard definition of stationarity is based on the correlation function, the definition given here has a definite affinity with the traditional definition. It should be pointed out that with the introduction of the D.O.S. we can quantify the stationarity rather than relay on the traditional qualification approach. As the definition of D.O.S. is too rigid, we would also introduce a Statistical D.O.S. (S.D.O.S.) as in Huang et al. (1998). Therefore, D.O.S. is actually the statistical degree of stationary and could give an indicator of how statistically nonstationary the signal is by measuring its distance to 1.0. It is defined as S.D.O.S. = std(Ri (tnk , τ )τ ). In which M τ indicates the mean value over the time span of τ . This definition is more general. For example, the white noisy signal is nonstationary strictly speaking, but it could be statistically stationary. All values of D.O.S. whether strictly or statistically should have the values between 0 and 1. For the stationary series, D.O.S. or S.D.O.S. is closed to zero. Figure 11 shows the D.O.S. of the data x2 (t) given in Fig. 6(a) along the axis of the sliding window size td = 80 units. This figure illustrates how often one segment

Fig. 11. The D.O.S. of the data $x_2(t)$ given in Fig. 6(a) at sliding window size $t_d = 80$ units, plotted against time lag (0–200 units).


with window size $t_d = 80$ units of the original signal is correlated or anti-correlated with another segment of the signal of the same window size, and where such a segment can be found. A D.O.S. closer to zero means a higher probability, and hence a more stationary signal.
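The D.O.S. computation can be illustrated with a short NumPy sketch. It assumes that a TDIAC slice at window size $t_d$ and lead–lag step $\tau$ is the Pearson correlation between the length-$t_d$ segment starting at each sample and the equally long segment starting $\tau$ samples later, and it drops windows that would run past the end of the record. These choices, the function names, and the test signal are ours for illustration, not the paper's. Only the strict D.O.S. is shown, not the S.D.O.S.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def running_autocorr(x, td, tau):
    """One TDIAC slice (sketch): Pearson correlation between the length-td window
    starting at each sample t and the window of the same length starting at t + tau.
    Requires tau >= 1; windows running past the end of the record are dropped."""
    w = sliding_window_view(np.asarray(x, dtype=float), td)  # each row is one window
    a = w[:-tau] - w[:-tau].mean(axis=1, keepdims=True)
    b = w[tau:] - w[tau:].mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a * a).sum(axis=1) * (b * b).sum(axis=1))

def degree_of_stationarity(x, td, max_lag):
    """D.O.S.(tau) for tau = 1..max_lag: standard deviation along the time axis
    of the running autocorrelation computed with window size td."""
    return np.array([running_autocorr(x, td, tau).std() for tau in range(1, max_lag + 1)])

# Hypothetical signal whose period jumps from 20 to 40 samples halfway through.
t = np.arange(1000)
x = np.where(t < 500, np.sin(2 * np.pi * t / 20), np.sin(2 * np.pi * t / 40))
dos = degree_of_stationarity(x, td=80, max_lag=100)
print(dos.argmax() + 1, dos.max())  # lag with the largest D.O.S. and its value
```

For a pure sinusoid whose period divides $t_d$, every window yields the same correlation, $\cos(2\pi\tau/T)$, so the D.O.S. is essentially zero at every lag; a jump in period makes the running correlation differ between the two halves of the record and raises the D.O.S.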

3.6. Resolution of the general problems of the correlation coefficient

Section 2 discussed three general problems of traditional correlation analysis, involving signals with (a) amalgamated multi-scales, (b) time-dependent scaling behavior, and (c) a time-varying relationship on the same scale. With the newly developed TDIC analysis technique, these problems can be resolved successfully.

3.6.1. Correlation of signals with amalgamated multi-scales

When only the time series $x_1(t)$ and $x_2(t)$ given in Sec. 2.1 are available, without any prior knowledge of $a(t)$ and $b(t)$, the overall correlation coefficient of $x_1(t)$ and $x_2(t)$ is 0.23, and the traditional time-dependent correlation analysis gives the result shown in Fig. 2(b); both mask the true physical relationship between the two signals. Following the steps suggested in Sec. 3.2, $x_1(t)$ and $x_2(t)$ are first decomposed into 8 IMFs each (not shown). The significance test of the IMFs shows that all IMFs of $x_1(t)$ are significant compared with white noise, but for $x_2(t)$ only the 4th IMF is significant (Fig. 12). In fact, the 4th IMF explains about 64.5% of the variance of the original signal $b(t)$, as shown in Fig. 13. Table 1 gives the overall correlations of $a(t)$ (i.e. $x_1(t)$) with $b(t)$, $x_2(t)$, and the 4th IMF of $x_2(t)$, and the overall correlations of $b(t)$ with $x_2(t)$ and the 4th IMF of $x_2(t)$. Because of contamination by the noise signal $c(t)$, the overall correlations of $x_2(t)$ with $a(t)$ and $b(t)$ are significantly reduced (second column of Table 1). However, when EEMD is used to decompose the contaminated signal $x_2(t)$, the hidden information, i.e. the 4th IMF of $x_2(t)$, is uncovered, and its overall correlations with both $a(t)$ and $b(t)$ increase significantly (third column of Table 1). In particular, the improvement of the correlation coefficient between $a(t)$ and the 4th IMF of $x_2(t)$ over that between $a(t)$ and $x_2(t)$ itself shows that the different scales of the original signal must be separated before the correlation analysis.
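The significance test in Fig. 12 places each IMF by its mean period and mean normalized energy. The sketch below computes those two coordinates under assumptions that are common in EMD practice but are not stated in this section: the mean period is the record length divided by the number of local maxima, and the energy is the mean square of the IMF divided by the variance of the original signal. The white-noise reference line and the 95% line of Fig. 12 are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def mean_period(imf):
    """Mean period in samples: record length / number of local maxima (assumed definition)."""
    c = np.asarray(imf, dtype=float)
    # interior samples strictly larger than both neighbours
    n_max = np.count_nonzero((c[1:-1] > c[:-2]) & (c[1:-1] > c[2:]))
    return len(c) / n_max if n_max else np.inf

def mean_normalized_energy(imf, signal):
    """Mean square of the IMF divided by the variance of the original signal (assumed normalization)."""
    c = np.asarray(imf, dtype=float)
    return np.mean(c ** 2) / np.var(np.asarray(signal, dtype=float))

def significance_coordinates(imfs, signal):
    """(log2 T, log2 E) for each IMF, i.e. the axes of the significance-test plot."""
    return [(np.log2(mean_period(c)), np.log2(mean_normalized_energy(c, signal)))
            for c in imfs]
```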

3.6.2. Correlation of signals with time-dependent scaling behavior

The example given in Sec. 2.2 has a frequency shift from $f_1$ to $f_2$ at $t = 1$ and a shift back from $f_2$ to $f_1$ at $t = 2$. The TDIC plot (Fig. 14) clearly shows the pattern of these two frequency shifts, as well as the variation of the correlation during the transient period. This will be analyzed in the next section.
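Because TDIC sizes its windows from instantaneous periods, a frequency shift such as the one in Sec. 2.2 shows up directly in that quantity. The sketch below estimates the instantaneous period from the unwrapped phase of the analytic signal, built with an FFT-based Hilbert transform in plain NumPy. The frequencies $f_1 = 5$ and $f_2 = 10$, the sampling step, and the function names are hypothetical choices for illustration, not the values of Sec. 2.2; Huang et al. (2009) discuss the limitations of this simple Hilbert-based estimate.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + i*H[x] via the FFT (same construction as scipy.signal.hilbert)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)

def instantaneous_period(x, dt):
    """Instantaneous period from the derivative of the unwrapped analytic phase.
    Values near the record ends and at abrupt changes are unreliable."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    freq = np.gradient(phase, dt) / (2.0 * np.pi)  # cycles per unit time
    return 1.0 / freq

# Hypothetical signal: frequency f1 = 5, switched to f2 = 10 for 1 <= t < 2.
dt = 0.001
t = np.arange(3000) * dt
f = np.where((t >= 1.0) & (t < 2.0), 10.0, 5.0)
x = np.sin(2.0 * np.pi * np.cumsum(f) * dt)  # phase-continuous switching
T = instantaneous_period(x, dt)
print(np.median(T[(t > 0.3) & (t < 0.7)]), np.median(T[(t > 1.3) & (t < 1.7)]))
```

Away from the switching instants the estimate settles at $1/f_1$ and $1/f_2$, so the two shifts appear as steps in the instantaneous period, and hence in the minimum admissible window size.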


Fig. 12. Statistical significance test of the IMFs of (a) $x_1(t)$ and (b) $x_2(t)$. Each panel plots $\log_2 E$ (mean normalized energy) against $\log_2 T$ (mean period) for the numbered IMFs, together with the white-noise reference line and the 95% percentile line.

3.6.3. Correlation of signals with time-varying relationship on the same scale

The time-varying correlation on the same scale (data given in Fig. 4) can be readily resolved by the TDIC plot, as shown in Fig. 15. For comparison with the analysis given in Fig. 5, the correlation in the region where the sliding window is shorter than the instantaneous


Fig. 13. The 4th IMF of the noise-contaminated signal $x_2(t)$ and the original signal $b(t)$.

Table 1. Overall correlation coefficients.

| | $b(t)$ | $x_2(t)$ | 4th IMF of $x_2(t)$ |
|---|---|---|---|
| $a(t)$ | 0.78 | 0.23 | 0.57 |
| $b(t)$ | 1.00 | 0.45 | 0.74 |

Fig. 14. TDIC plot of the signal given in Fig. 3 (horizontal axis: center of sliding window; vertical axis: sliding window size).


Fig. 15. Same as Fig. 14 but for the signal given in Fig. 4. The thick black line denotes the instantaneous period, which can serve as the minimum sliding window size for obtaining a meaningful time-dependent correlation.

period is also shown, as the area below the thick black line. When the sliding window size is smaller than the instantaneous period, the local correlation fluctuates strongly. These fluctuating correlations have no physical meaning, even though they pass the general statistical significance test: when the local window is shorter than the instantaneous period, the signals within it consist of partial waves, so there are not enough degrees of freedom to assure the significance of the correlation coefficient. As soon as the sliding window size exceeds the instantaneous period, the strong fluctuations are greatly reduced, for example in the interval $t = [12, 24]$. This result shows that the instantaneous period can serve as a local guide for choosing the sliding window size when computing the time-dependent correlation. The time (or position) at which the frequency shift occurs in the signals given in Figs. 3 and 4 can be easily identified in their respective TDIC plots, as can the extent to which the phase shift affects the data from the point of view of correlation. This information is extremely useful for the real applications discussed in Sec. 4.

4. Application of TDIC to Climate and Paleoclimatological Data

Two real data sets are analyzed in this section to illustrate the power of the TDIC method proposed here. Both examples are drawn from climate and paleoclimate phenomena. The first example examines the Nino 3.4 index and the Indian Ocean Dipole (IOD) mode index (DMI) to study the relationship between the ENSO
and the Indian Ocean variability. The second example investigates proxy temperature observations from stacked sediment records to identify the relationship between the orbital cycles and paleoclimate variation.
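The windowing rule established in Sec. 3.6.3, which the applications below rely on, can be sketched as follows: at each sample, the correlation of two IMFs is computed over a centred window whose length is an integer multiple $n$ of the local instantaneous period, so that no window contains only a partial wave. Taking the longer of the two IMFs' instantaneous periods as the base window, centring the window on the sample, and the helper name `tdic` are assumptions for illustration rather than a restatement of the exact steps in Sec. 3.2.

```python
import numpy as np

def tdic(c1, c2, period1, period2, n_max=5):
    """Sketch of a TDIC matrix for two IMFs c1, c2 (hypothetical helper).

    period1, period2: instantaneous periods of c1 and c2, in samples.
    Row n-1 holds the Pearson correlation over a window of about
    n * max(period1[k], period2[k]) samples centred on each sample k.
    Entries stay NaN where the window leaves the record or the period is undefined.
    """
    c1 = np.asarray(c1, dtype=float)
    c2 = np.asarray(c2, dtype=float)
    length = len(c1)
    base = np.maximum(np.asarray(period1, dtype=float), np.asarray(period2, dtype=float))
    out = np.full((n_max, length), np.nan)
    for n in range(1, n_max + 1):
        for k in range(length):
            if not np.isfinite(base[k]) or base[k] <= 0:
                continue  # no usable instantaneous period at this sample
            half = max(1, int(round(n * base[k] / 2)))  # window = 2*half + 1 samples
            lo, hi = k - half, k + half + 1
            if lo < 0 or hi > length:
                continue  # window would run past the record edges
            out[n - 1, k] = np.corrcoef(c1[lo:hi], c2[lo:hi])[0, 1]
    return out
```

For two identical sinusoids with a constant 50-sample period, every entry whose window fits inside the record equals 1, and the NaN margin at each edge grows with $n$, which is why the largest windows lose the ends of a short record.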

4.1. ENSO and IOD

ENSO (El Niño–Southern Oscillation) and the IOD are interannual climate fluctuations in the tropical Pacific Ocean and the Indian Ocean, respectively. It is well known that the IOD is an independent mode of variability of the Indian Ocean [Saji et al. (1999)] but tends to occur synchronously with ENSO [Annamalai et al. (2005)]. In simultaneous ENSO and IOD events, for example in late 1997, when ENSO began in the Pacific, the surface wind anomalies were easterly over the central equatorial Indian Ocean, suggesting that a Walker-like circulation cell was established, with uplift over eastern Africa and subsidence over Sumatra, which favors the development of a positive IOD [Behera et al. (2006); Luo et al. (2010)]. However, not every intense ENSO triggers an IOD in the Indian Ocean, and some IOD events have occurred independently of ENSO in the Pacific. A recent result by Izumo et al. [2010] suggests that a negative phase of the IOD may be an efficient predictor of ENSO 14 months before its peak. The relationship and interaction between ENSO and the IOD still need further investigation.

To study this relationship, the Nino 3.4 index and the DMI are analyzed with the TDIC technique. The Nino 3.4 index is the sea surface temperature (SST) anomaly in the box 170°W–120°W, 5°S–5°N, and the DMI is an indicator of the east–west temperature gradient across the tropical Indian Ocean. Both indices are calculated from the Reynolds OIv2 SST analysis and are available through http://ioc-goos-oopc.org/state_of_the_ocean/. The monthly anomalies are calculated relative to a climatological seasonal cycle based on the years 1982–2005, and data through early 2010 are used (Fig. 16). The overall correlation between the Nino 3.4 index and the DMI is no more than 0.4, as Fig. 17 shows. Decomposing the indices into their IMFs yields four significant IMFs, shown in Fig. 18, which represent the interannual and interdecadal oscillations. The mean periods of the 3rd, 4th, 5th, and 6th IMFs of the Nino 3.4 index and the DMI are given in Table 2. For the 6th IMFs of the Nino 3.4 index and the DMI, the cross-correlation (Fig. 19) shows a high correlation between ENSO and the IOD on the decadal time scale, with ENSO leading the IOD by nearly 1.6 years. For the 5th IMF, the mean period of the Nino 3.4 index is 7.1 years, whereas that of the DMI is only 5.3 years. This suggests that, unlike ENSO, the IOD has little variability at periods of about 7 years, so the mechanism controlling ENSO variation at around 7 years has little effect on IOD variability. The overall correlations of the 3rd and 4th IMFs of the Nino 3.4 index and the DMI are 0.31 and 0.39, respectively. Although the overall correlation of each pair of IMFs is not significant, the TDIC analysis (Fig. 20) reveals significant time-dependent variability. First, it is well known that the intense ENSO

Fig. 16. (a) Nino 3.4 index and (b) DMI index (°C), 1982–2010.

Fig. 17. Lag correlation between the Nino 3.4 index and the DMI (time lag in years, from −8 to 8).

in 1982–1983 did not trigger an IOD event in the Indian Ocean, whereas the ENSO in 1997–1998 did. The TDIC plot in Fig. 20 shows that the difference between these two events occurs mainly on the time scale of around 2 years, i.e. the 3rd IMF. Second, on the time scale of around 4 years, i.e. the 4th IMF, ENSO and the IOD have a high correlation

Fig. 18. The 3rd, 4th, 5th, and 6th IMFs of the Nino 3.4 index (solid line) and the DMI (dashed line), in °C, 1982–2010.

Table 2. Mean periods of the 3rd, 4th, 5th, and 6th IMFs of the Nino 3.4 index and the DMI (unit: year).

| | 3rd IMF | 4th IMF | 5th IMF | 6th IMF |
|---|---|---|---|---|
| Nino 3.4 index | 1.63 | 3.78 | 7.1 | 15.3 |
| DMI | 1.48 | 3.32 | 5.3 | 16.1 |


Fig. 19. Lag correlation between the 6th IMF of the Nino 3.4 index and that of the DMI (time lag in years, from −30 to 30).

during the 1990s and from 1985 to 1990, which is probably attributable to the same physical processes. Further analysis of the physical relationship between ENSO and the IOD based on these results will be given in another paper.

4.2. Paleoclimate observations

In this section, the TDIC method is applied to identify the relationship between the orbital parameters of the Earth and paleoclimate variation. Figure 21 shows the proxy temperature data from the global deep-sea oxygen isotope records based on stacked sediment cores from more than 40 DSDP and ODP sites, compiled by Zachos et al. (2001). A casual inspection of the data immediately reveals the highly nonstationary character of the time series. The amplitude changes are obvious. The periodicity also shows a clear shift, for example, between the most recent 1.5 Ma and the earlier period from 1.5 to 4 Ma BP. Longer-period sections can also be identified intermittently throughout the whole time series. Furthermore, the sharp, spiky peaks and troughs suggest that the underlying process has strongly distorted the waveforms from smooth, linear sinusoids into highly distorted ones. Finally, the visible trend in the data violates the fundamental zero-mean requirement of traditional correlation analysis. New tools are clearly needed to analyze these data; here, we use them to demonstrate the prowess of the newly developed TDIC in relating the temperature proxy to the orbital parameters. It is well known from the analysis by Milankovitch (1941) that the gravitational effects of the Solar system can cause orbital perturbations of the Earth, which,

Fig. 20. TDIC plots of the (a) 3rd and (b) 4th IMFs of the Nino 3.4 index and the DMI, with sliding window size (years) plotted against time (1985–2010). The black dashed line in each panel denotes the instantaneous period of the corresponding IMF of the Nino 3.4 index.

in turn, cause subtle quasi-periodic variations in the geographic distribution of incoming solar radiation. These subtle changes have been identified as the main driving force of Earth's climate change on time scales from 10 KY (KY: thousand years) to 100 KY, a subject of great interest to paleoclimatologists and also the subject of a recent Symposium sponsored by the Royal Society of London in

Fig. 21. Global deep-sea oxygen isotope ($\delta^{18}$O) records based on data compiled from more than 40 DSDP and ODP sites, plotted against time (Ma BP, 0–6).

1998 [Shackleton et al. (1999)]. Milankovitch identified three primary orbital parameters: the eccentricity of the orbit, which varies at periods of about 100 KY and 400 KY; the obliquity of the Earth's axis, which varies between about 22° and 25° with a period of about 41 KY; and the precession of the Earth's axis, which varies with periods between 19 KY and 23 KY and changes the Earth–Sun distance at any given season [Crowley and North (1991)]. The variations of the eccentricity, obliquity, and precession during the past 35 Ma have been computed from the planetary solution, with detailed information on the planetary masses and positions in the solar system, by Laskar (1999) and Laskar et al. (2004). The 2004 planetary solution for the most recent 6 Ma is decomposed using EEMD. Figure 22(a) gives the IMFs of the eccentricity. During the last 6 Ma, the eccentricity has three main characteristic periods: 100 KY (2nd IMF), 400 KY (3rd IMF), and 1 Ma (4th IMF). In this paper, we focus only on the 100-KY component, i.e. the 2nd IMF. The obliquity and precession are, by definition, already IMFs, but they contain slight high-frequency noise, which is removed by EEMD. Figures 22(b) and 22(c) show the obliquity and precession signals that remain after removing this noise and the overall mean of the original time series. To identify the relationship between the orbital cycles and the proxy temperature, the observed oxygen isotope proxy temperature data are also decomposed with EEMD into 8 IMFs (Fig. 23). The mean periods of the 2nd, 3rd, and 4th IMFs are 21 KY, 40 KY, and 93 KY, which match the time scales of the precession, obliquity, and eccentricity, respectively. However, it should be noted that the overall correlations between the 2nd IMF of the data and the precession, between the 3rd IMF of the data and the

Fig. 22. (a) IMFs 1–5 and the residual of the eccentricity; (b) and (c) the 2nd IMFs of the obliquity and precession. All are rescaled by multiplying by 1000 and plotted against time (Ma BP, 0–6).

Fig. 23. The IMFs (IMF 1–7 and the residual) of the global deep-sea oxygen isotope records compiled by Zachos et al. (2001), plotted against time (Ma BP).


obliquity, and between the 4th IMF of the data and the eccentricity are only 0.10, −0.30, and −0.33, respectively. With traditional correlation analysis, these seemingly low correlation coefficients could only be used to describe each pair of series as not uncorrelated. The lack of higher correlation here is mainly caused by the strong nonlinearity and nonstationarity of the phenomena, for it is well known that the highly nonlinear climate system of the Earth can have more than one local equilibrium state. The nonlinear and nonstationary character of past climate is further complicated by the fact that the orbital parameters of the Earth are actually chaotic [Laskar (1999)]. Therefore, if correlations between the orbital parameters and the paleoclimatic record exist at all, the relationship would be extremely complicated, so much so that the traditional method could never fully explore it in detail. We show below that, with the new TDIC, we can find not only significant correlations but also important time-varying characteristics of the phenomena. Based on the instantaneous periods of the precession, the obliquity, and the 2nd IMF of the eccentricity, the TDIC plots of the 2nd IMF of the sediment data with the precession, of the 3rd IMF of the data with the obliquity, and of the 4th IMF of the data with the eccentricity cycle are shown in Figs. 24(a), 24(b), and 24(c), respectively. The oxygen isotope ratio is out of phase with temperature; therefore, the negative correlation coefficients in Fig. 24(a) denote a positive correlation between the orbital cycles and the temperature variations. During the

Fig. 24. TDIC plots of the IMFs of the precession (a), the obliquity (b), and the eccentricity (c) with the corresponding IMFs of the sediment data.


period from 6 Ma before present (BP) to 4 Ma BP, the precession is generally negatively correlated with the oxygen isotope data on the roughly 21 KY time scale. This relationship reverses after 4 Ma BP, and the positive correlation lasts until about 2 Ma BP. Figure 24(b) shows an almost reversed time-dependent correlation pattern between the obliquity and the data, compared with that between the precession and the observations, during the period from 5 Ma BP to 2 Ma BP. This suggests that the Earth orbital parameters obliquity and precession have similar impacts on global climate variability during this period. There is no significant correlation between the data and the precession during the most recent 2 Ma. Because the data are of the best quality in this recent period, the lack of correlation cannot be attributed to observational noise; some unknown physical explanation should be sought. The negative correlation between the obliquity and the oxygen isotope data on the roughly 40 KY time scale seems to last for about 3 Ma, until the most recent 60 KY, but since 1.5 Ma BP the correlation has become weak. The TDIC plot for the eccentricity in Fig. 24(c) shows the most complicated variability. Owing to the eccentricity's modulation of the precession, the TDIC plot for the eccentricity has a structure similar to that for the precession: it is generally negatively correlated with the observations during the most recent 3 Ma, but positively correlated during the earlier 3 Ma. However, the reversals between positive and negative correlation of the eccentricity with the observations on the 100 KY time scale are more frequent than those for the obliquity and the precession, occurring at 5 Ma BP, 4 Ma BP, and 3 Ma BP. The strongest correlation between the orbital eccentricity and the observations lies in the most recent 2 Ma, which coincides well with the repeated glacial cycles of around 100 KY.
The TDIC analysis shows a strongly nonstationary relationship between the orbital cycles and paleoclimate variations during the last 6 Ma. In particular, although it is well known that the eccentricity is dominant during the most recent 1 Ma and the obliquity is dominant from 1 to 4 Ma BP [Crowley and North (1991)], the TDIC analysis presented here shows that the 40 KY orbital cycle contributes oppositely to paleoclimate variability before and after 4 Ma BP. This result suggests the following possibilities. First, the Earth's climate system has its own response time scale to the periodic variation of the geographic distribution of incoming solar radiation, which will be analyzed in detail in another paper. Second, the Earth system could indeed have many equilibrium states. As the orbital parameters produce a chaotic time series of orbit changes, the resulting climate would never be a stationary process; therefore, time-dependent correlation analysis should be used. As the results presented here show, the new TDIC has indeed revealed an important and intriguing sequence of events in our recent climate history. Of critical significance are the high, albeit intermittent, local correlation values, which reach beyond ±0.8, compared with overall values of only 0.1–0.3 from the traditional method. These high local correlation values reveal intrinsic correlations that actually exist but are masked by nonstationary processes.
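The contrast between high local and low overall correlation can be checked on a toy case. In the sketch below, a hypothetical signal `c2` follows `c1` during the first half of the record and `-c1` during the second half, with added noise; the fixed centred window of about three periods is an arbitrary choice for illustration, not the adaptive TDIC window. This is a synthetic illustration only, not a reproduction of the sediment or orbital data.

```python
import numpy as np

def running_corr(x, y, window):
    """Pearson correlation over a centred window of odd length `window`; NaN near the edges."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    half = window // 2
    r = np.full(len(x), np.nan)
    for k in range(half, len(x) - half):
        r[k] = np.corrcoef(x[k - half:k + half + 1], y[k - half:k + half + 1])[0, 1]
    return r

rng = np.random.default_rng(0)
t = np.arange(2000)
c1 = np.sin(2 * np.pi * t / 40)
sign = np.where(t < 1000, 1.0, -1.0)  # hypothetical: relationship flips halfway
c2 = sign * c1 + 0.3 * rng.standard_normal(t.size)

overall = np.corrcoef(c1, c2)[0, 1]
local = running_corr(c1, c2, window=121)  # about three periods
print(f"overall r = {overall:.2f}")
print(f"median local r, first half:  {np.nanmedian(local[:1000]):.2f}")
print(f"median local r, second half: {np.nanmedian(local[1000:]):.2f}")
```

Here the overall coefficient is close to zero, while the local coefficients exceed 0.8 in magnitude with opposite signs in the two halves: a small-scale version of the contrast between the overall values and the intermittent local values discussed above.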


5. Discussions and Conclusions

Classic correlation analysis reveals only the global properties of the relationship between data sets; as a result, it is unsuitable for nonstationary processes. To capture the complex relationships between nonstationary processes and to track the time evolution of their correlations, the traditional method has been improved by introducing a sliding window and computing running correlation coefficients. Such an improvement is incomplete, because a complicated system can be a conglomeration of signals from co-existing multi-scale processes. For complicated data with embedded multi-scale signals, from engineering to scientific research, no single sliding window size will be sufficient to cover signals of various scales, which inevitably leads to useless or sometimes misleading estimates of the correlation properties. Consequently, selecting a single sliding window size is a futile exercise, as such an approach represents only a “tuning” process. For complicated data from nonlinear and nonstationary processes, windows determined adaptively by the data and an adaptive decomposition of the data into their intrinsic components are essential to cope with the nonstationarity as well as the multi-scale complications. Based on the state-of-the-art empirical mode decomposition (EMD) and ensemble EMD (EEMD) methods, an adaptive procedure to compute the time-dependent intrinsic correlation (TDIC) coefficient is introduced here. The new concepts introduced here also include using the adaptive instantaneous periods of the various IMF components as criteria for determining the sliding window sizes for the various scales involved in the physical phenomena. Thus, the new approach is truly adaptive.
The analyses of both synthetic and real data presented here demonstrate that TDIC can indeed give the time evolution of the local correlation between two signals, as well as the time evolution of their group relationship, as Fig. 6 shows. This ability makes TDIC a useful tool for studying both the locality and the overall nonstationarity of the data, giving deeper insight into the underlying physics. For real-world data, TDIC successfully captures the transition period between two time series and gives the duration of the transition period and the strength of the correlation between the two signals, as well as its variability. The important improvements of TDIC over traditional methods are the adaptively defined range of sliding window sizes and the detailed multi-scale analysis of complicated data driven by more than one physical parameter, without a priori knowledge of the processes. Therefore, the Time-Dependent Intrinsic Correlation could be viewed as a major advance in the statistical analysis of nonlinear and nonstationary processes.

Acknowledgments

XC was supported by the National Basic Research Program of China 2007CB816002, National Science Foundation of China 40776018, National Key Technology R&D Program 2006BAB18B02, and Chinese Polar Science Strategy
Foundation 20070208. ZW and NEH were supported by a grant from the Federal Highway Administration, DTFH61-08-00028, by grants from NSC (NSC95-2119-M-008031-MY3 and NSC97-2627-B-008-007), and by a grant from NCU (965941), which made the conclusion of this study possible.

References

Annamalai, H., Xie, S.-P., McCreary, J.-P. and Murtugudde, R. (2005). Impact of Indian Ocean sea surface temperature on developing El Niño. J. Clim., 18: 302–319.
Behera, S. K. et al. (2006). A CGCM study on the interaction between IOD and ENSO. J. Clim., 19: 1608–1705.
Bolviken, B. (2003). A method for spatially moving correlation analysis. Norsk Epidemiologi, 13: 229–232.
Crowley, T. J. and North, G. R. (1991). Paleoclimatology. Oxford University Press.
Hoover, K. (2003). Nonstationary time series, cointegration, and the principle of the common cause. British J. Philosophy Sci., 54: 527–551.
Huang, N. E. and Wu, Z. (2008). A review on Hilbert-Huang transform: the method and its applications on geophysical studies. Rev. Geophys., 46: RG2006, doi:10.1029/2007RG000228.
Huang, N. E. et al. (1998). The empirical mode decomposition method and the Hilbert spectrum for nonstationary time series analysis. Proc. Roy. Soc. London, 454A: 903–995.
Huang, N. E. et al. (2009). On instantaneous frequency. Adv. Adapt. Data Anal., 1: 177–229.
Izumo, T. et al. (2010). Influence of the state of the Indian Ocean Dipole on the following year’s El Niño. Nat. Geosci., 3: 168–172.
Laskar, J. (1999). The limits of Earth orbital calculations for geological time-scale use. Phil. Trans. R. Soc. Lond., A 357: 1735–1759.
Laskar, J. et al. (2004). A long-term numerical solution for the insolation quantities of the Earth. Astron. Astrophys., 428: 261–285.
Luo, J.-J. et al. (2010). Interaction between El Niño and extreme Indian Ocean dipole. J. Clim., doi:10.1175/2009JCLI3104.1.
Milankovitch, M. (1941). Canon of insolation and the ice age problem (in Yugoslavian). K. Serb. Acad. Beorg. Spec. Publ. 132. (English translation by Israel Program for Scientific Translations, Jerusalem, 1969.)
Papadimitriou, S., Sun, J. and Yu, P. S. (2006). Local correlation tracking in time series. Proc. Sixth Int. Conf. Data Mining, pp. 456–465, doi:10.1109/ICDM.2006.99.
Press, W. H. et al. (2007). Numerical Recipes: The Art of Scientific Computing (3rd Edition). Cambridge University Press, 1235 pp.
Phillips, P. C. B. (1986). Understanding spurious regression in econometrics. J. Econometrics, 33: 311–340.
Rilling, G. and Flandrin, P. (2008). One or two frequencies? The empirical mode decomposition answers. IEEE Trans. Signal Process., 56: 85–95.
Rodo, X. and Rodriguez-Arias, M. A. (2006). A new method to detect transitory signatures and local time/space variability structures in the climate system: the scale-dependent correlation analysis. Clim. Dyn., 27: 441–458.
Saji, N. H., Goswami, B. N., Vinayachandran, P. N. and Yamagata, T. (1999). A dipole mode in the tropical Indian Ocean. Nature, 401: 360–363.
Shackleton, N. J., McCave, I. N. and Weedon, G. P. (1999). Astronomical (Milankovitch) calibration of the geological time-scale. Phil. Trans. R. Soc. Lond., A 357: 1731–2007.
Svanda, M., Sobotka, M. and Klvana, M. (2005). Experiences with the use of the local correlation tracking method when studying large-scale velocity fields. WDS’05 Proceedings of Contributed Papers, Part III, pp. 457–462.
Wierwille, W. (1965). A theory and method for correlation analysis of nonstationary signals. IEEE Electron. Comput., 14: 909–919, doi:10.1109/PGEC.1965.264087.
Wu, Z. and Huang, N. E. (2009). Ensemble empirical mode decomposition: A noise-assisted data analysis method. Adv. Adapt. Data Anal., 1: 1–41.
Yang, K. and Shahabi, C. (2005). On the stationarity of multivariate time series for correlation-based data analysis. Proc. Fifth IEEE Int. Conf. Data Mining, pp. 805–808.
Yule, G. U. (1926). Why do we sometimes get nonsense-correlations between time-series? A study in sampling and the nature of time-series. J. Royal Stat. Soc., 89: 1–63.
Zachos, J. et al. (2001). Trends, rhythms, and aberrations in global climate 65 Ma to present. Science, 292: 686–693.
