Assessing Nonstationary Time Series Using Wavelets
by Brandon J Whitcher
A dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
University of Washington
1998
Approved by (Chairperson of Supervisory Committee)
Program Authorized to Offer Degree
Date
In presenting this dissertation in partial fulfillment of the requirements for the Doctoral degree at the University of Washington, I agree that the Library shall make its copies freely available for inspection. I further agree that extensive copying of this dissertation is allowable only for scholarly purposes, consistent with "fair use" as prescribed in the U.S. Copyright Law. Requests for copying or reproduction of this dissertation may be referred to University Microfilms, 1490 Eisenhower Place, P.O. Box 975, Ann Arbor, MI 48106, to whom the author has granted "the right to reproduce and sell (a) copies of the manuscript in microfilm and/or (b) printed copies of the manuscript made from microfilm."
Signature
Date
University of Washington Abstract
Assessing Nonstationary Time Series Using Wavelets
by Brandon J Whitcher
Chairperson of Supervisory Committee: Professor Peter Guttorp (Statistics) and Professor Donald B. Percival (Applied Physics Laboratory)

The discrete wavelet transform has been used extensively in the field of Statistics, mostly in the area of "denoising signals" or nonparametric regression. This thesis provides a new application for the discrete wavelet transform: assessing nonstationary events in time series, especially long memory processes. Long memory processes are those which exhibit substantial correlations between events separated by a long period of time. Departures from stationarity in these heavily autocorrelated time series, such as an abrupt change in the variance at an unknown location or "bursts" of increased variability, can be detected and accurately located using discrete wavelet transforms, both orthogonal and overcomplete. A cumulative sum of squares method, utilizing a Kolmogorov-Smirnov-type test statistic, is applied to this problem. By analyzing a time series on a scale by scale basis, each scale corresponding to a range of frequencies, the ability to detect and locate a sudden change in the variance in the time series is introduced. Using this same procedure to detect a change in the long memory parameter, when the process variance remains constant, is also briefly investigated. Applications involve Nile River minimum water levels and vertical ocean shear measurements.

In the atmospheric sciences, broadband features in the spectrum of recorded time series have been hypothesized to be nonstationary events; e.g., the Madden-Julian oscillation. The Madden-Julian oscillation is a result of large-scale circulation cells oriented in the equatorial plane from the Indian Ocean to the central Pacific. The oscillation has been noted to have higher frequencies during warm events in El Niño-Southern Oscillation (ENSO) years. The concepts of wavelet covariance and wavelet correlation are introduced and applied to this problem as an alternative to cross-spectrum analysis. The wavelet covariance is shown to decompose the covariance between two stationary processes on a scale by scale basis. Asymptotic normality of estimators of the wavelet covariance and correlation is shown in order to construct approximate confidence intervals. Both quantities are generalized into the wavelet cross-covariance and cross-correlation in order to investigate possible lead/lag relations in bivariate time series on a scale by scale basis. Atmospheric measurements (such as station pressure and zonal wind speeds) from a single station at Canton Island (2.8°S, 171.7°W) are put through a wavelet analysis of covariance and are shown to provide similar results to those found in Madden and Julian (1971) and multitaper spectral techniques. To investigate the possible interaction between ENSO activity and the Madden-Julian oscillation, a daily "Southern Oscillation Index" and station pressure series collected from Truk Island (7.4°N, 151.8°W) are analyzed. The wavelet cross-covariance nicely decomposes the usual cross-covariance into scales which are more easily associated with atmospheric phenomena. The time-varying wavelet variance and covariance are used to investigate possible seasonal effects and changes due to ENSO activity.
TABLE OF CONTENTS

List of Tables

Chapter 1: Introduction
  1.1 Motivation
    1.1.1 Detecting Nonstationary Events in Long Range Dependence
    1.1.2 Wavelet Analysis of Bivariate Time Series
  1.2 Outline of Thesis
  1.3 Contributions

Chapter 2: Long Memory Processes
  2.1 Fractional Difference Processes
    2.1.1 Definition
    2.1.2 Simulation
  2.2 Generalized Fractional Difference Processes
    2.2.1 Definition
    2.2.2 Simulation

Chapter 3: Discrete Wavelet Transforms and the Wavelet Variance
  3.1 Wavelet Filters
    3.1.1 The Haar Wavelet
    3.1.2 Daubechies Families of Wavelet Filters
  3.2 The Partial Discrete Wavelet Transform
    3.2.1 Definition
    3.2.2 Analysis of Variance
  3.3 The Maximal Overlap Discrete Wavelet Transform
    3.3.1 Comparison with the DWT
    3.3.2 Definition
    3.3.3 Analysis of Variance
  3.4 Wavelet Variance
    3.4.1 Definition
    3.4.2 Equivalent Degrees of Freedom

Chapter 4: Testing Homogeneity of Variance
  4.1 Spectral Analysis of DWT Wavelet Coefficients
    4.1.1 Long Memory Processes
    4.1.2 Short Memory Processes
    4.1.3 Conclusions
  4.2 Normalized Cumulative Sum of Squares Test Statistic
    4.2.1 Definition
    4.2.2 Data Analytic Thresholding
  4.3 Testing Procedure
  4.4 Testing for a Single Variance Change
    4.4.1 Empirical Size
    4.4.2 Empirical Power
    4.4.3 Conclusions
  4.5 Locating a Single Variance Change
    4.5.1 Auxiliary Test
    4.5.2 Simulation Study
    4.5.3 Conclusions
  4.6 Testing for Multiple Variance Changes
    4.6.1 Iterated Algorithm
    4.6.2 Empirical Power
    4.6.3 Locating Multiple Variance Changes
    4.6.4 Conclusions
  4.7 Testing for a Change in the Long Memory Parameter
    4.7.1 Introduction
    4.7.2 Simulation Results
    4.7.3 Conclusions

Chapter 5: Wavelet Analysis of Covariance
  5.1 Definition of the Wavelet Covariance
    5.1.1 Decomposition of Covariance
    5.1.2 Wavelet Correlation
  5.2 Estimating the Wavelet Covariance
    5.2.1 The MODWT Estimator
    5.2.2 The DWT Estimator
    5.2.3 Estimating the Wavelet Cross-Covariance
  5.3 Estimating the Wavelet Correlation and Cross-Correlation
  5.4 Confidence Intervals for the Wavelet Covariance and Correlation
    5.4.1 Wavelet Covariance
    5.4.2 Wavelet Correlation
  5.5 Comparison of Variance Estimators for the Wavelet Covariance
    5.5.1 First Moment Properties of Ve j
    5.5.2 First Moment Properties of Vej
    5.5.3 Empirical Results
    5.5.4 Conclusions

Chapter 6
  6.1 Nile River Minimum Water Levels
    6.1.1 Introduction
    6.1.2 Wavelet Analysis
    6.1.3 Testing for Homogeneity of Variance
  6.2 Vertical Ocean Shear Measurements
  6.3 Wavelet and Multitaper Spectral Analysis of the Madden-Julian Oscillation
    6.3.1 Introduction
    6.3.2 Univariate Spectral Analysis
    6.3.3 Bivariate Spectral Analysis
    6.3.4 Wavelet Analysis
    6.3.5 Conclusions
  6.4 Wavelet Analysis of Covariance Between the Southern Oscillation Index and Madden-Julian Oscillation
    6.4.1 Introduction
    6.4.2 Time-Domain and Spectral Analysis
    6.4.3 Wavelet Analysis
    6.4.4 Investigating Seasonal Variation in the Madden-Julian Oscillation
    6.4.5 Investigating ENSO Variation of the Madden-Julian Oscillation

Chapter 7: Conclusions and Future Directions
  7.1 Distributional Results for Testing Homogeneity of Variance
  7.2 The Schwarz Information Criterion
  7.3 Refinement of the Multiple Variance Change Testing Procedure
  7.4 Testing Homogeneity of Covariance
  7.5 Equivalent Degrees of Freedom for the Wavelet Covariance
  7.6 Assessing Non-Gaussian/Non-Linear Processes
  7.7 Final Comments

Appendix A: Fourier Theory and Filtering
  A.1 The Discrete Fourier Transform
  A.2 Properties of the DFT
  A.3 Filtering of Sequences

Appendix B: Univariate Spectral Analysis
  B.1 Introduction
  B.2 Spectral Estimation
  B.3 Equivalent Degrees of Freedom for a Spectral Estimator

Appendix C: Bivariate Spectral Analysis
  C.1 Introduction
  C.2 Spectral Estimation
LIST OF FIGURES

2.1 Spectral densities for fractional difference processes
2.2 Realizations of fractional difference processes
2.3 Autocovariance sequences for MA(q) approximations to fractional difference processes
2.4 Autocovariance sequences for MA(q) approximations, using the modified innovations variance ^2, to fractional difference processes
2.5 Realizations of generalized fractional difference processes
3.1 The Haar, D(4) and LA(8) wavelet filters
3.2 Squared gain functions for the Haar, D(4) and LA(8) wavelet filters
3.3 Quantile-quantile plots for the MODWT wavelet variance
3.4 Cumulative distribution functions for the MODWT wavelet variance
4.1 Theoretical spectra for the unit scale DWT wavelet coefficients of fractional difference processes
4.2 Theoretical spectra for the unit scale DWT wavelet coefficients of an AR(1) process
4.3 Theoretical spectra for the unit scale DWT wavelet coefficients of an MA(1) process
4.4 Rejection rates for fractional difference processes using white noise critical levels
4.5 Rejection rates for fractional difference processes using asymptotic critical levels
4.6 Rejection rates for fractional difference processes using the MODWT and equivalent degrees of freedom
4.7 Estimated locations of a single variance change at k = 100 for fractional difference processes
4.8 Estimated locations of variance change for fractional difference processes using the iterated cumulative sum of squares procedure
4.9 Estimated locations of multiple variance changes for fractional difference processes
4.10 Spectra of fractional difference processes and octave bands of the discrete wavelet transform
5.1 Estimates of Vej and Ve j, j = 1, ..., 6, for uncorrelated white noise processes
5.2 Estimates of Vej and Ve j, j = 1, ..., 6, minus their true value, for processes which satisfy a linear regression with delay
6.1 Nile River minimum water levels for 622 AD to 1284 AD
6.2 Multiresolution analysis of the Nile River minimum water levels using the D(4) wavelet filter and MODWT
6.3 Estimated D(4) wavelet variances for the Nile River minimum water levels before and after the year 722 AD
6.4 Normalized cumulative sum of squares from the MODWT for the Nile River minimum water levels
6.5 Plot of vertical shear measurements (inverse seconds) versus depth (meters)
6.6 Multiresolution analysis of the vertical ocean shear measurements
6.7 Estimated wavelet variance of the vertical ocean shear measurements
6.8 Estimated locations of variance change for the vertical ocean shear measurements
6.9 Climate stations in the tropical Pacific Ocean
6.10 Atmospheric time series collected from Canton Island (2.8°S, 171.7°W) over the period 1 June 1957 to 31 March 1967
6.11 Univariate spectral analysis of Canton Island data
6.12 Estimated co-spectra for the Canton Island data
6.13 Mean squared coherence of the Canton Island data
6.14 Multiresolution analysis of atmospheric time series collected at Canton Island (2.8°S, 171.7°W)
6.15 MODWT estimated wavelet variance for Canton Island time series
6.16 MODWT estimated wavelet correlation for Canton Island time series
6.17 Station pressure series for Truk Island (7.4°N, 151.8°W) and the Southern Oscillation Index
6.18 Estimated cross-correlation sequence for the Southern Oscillation Index and Truk Island station pressure series
6.19 Multiresolution analysis for the Truk Island station pressure series (1957-1992)
6.20 Multiresolution analysis for the daily Southern Oscillation Index (1957-1992)
6.21 MODWT estimated wavelet variance for the Southern Oscillation Index and Truk Island station pressure series
6.22 MODWT estimated wavelet correlation for the Southern Oscillation Index and Truk Island station pressure series. The transformed confidence intervals were computed using Section 5.4.2.
6.23 MODWT estimated wavelet cross-correlation for the Southern Oscillation Index and Truk Island station pressure series
6.24 Time-varying wavelet variance for the Truk Island station pressure series and SOI
6.25 Indicator of ENSO activity
6.26 Time-varying wavelet quantities, for the scale associated with the MJO
6.27 Time-varying wavelet quantities, for the scale associated with shorter periods than the MJO
7.1 Quantile-quantile plot comparing the Monte Carlo distributions of D and D_XY
LIST OF TABLES

3.1 Scaling coefficients for the Daubechies least asymmetric wavelet filter of length L = 8
3.2 Equivalent degrees of freedom for the MODWT of white noise
3.3 Large sample approximation to the ratio of equivalent degrees of freedom j/N
4.1 Maximum dynamic range for the spectra of DWT wavelet coefficients when applied to fractional difference processes
4.2 Maximum dynamic range for the spectra of DWT wavelet coefficients when applied to AR(1) and MA(1) processes
4.3 Monte Carlo critical values for the test statistic (N/2)^{1/2} D
4.4 Performance of the cumulative sum of squares method for fractional difference processes with one variance change
4.5 Empirical power of iterated CSS algorithm for fractional difference processes with one variance change
4.6 Empirical power of the iterated CSS algorithm for fractional difference processes with two variance changes
4.7 Rejection rates for a change in the long memory parameter of a fractional difference process
5.1 Variance of ^XY(j), j = 1, ..., 6, for two white noise time series associated via linear regression with delay
5.2 Empirical bias and mean squared error of Vej, j = 1, ..., 6, for uncorrelated white noise processes
5.3 Empirical bias and mean squared error of Vej, j = 1, ..., 6, for white noise processes which are related via linear regression with delay
6.1 Results of testing the Nile River minimum water levels for homogeneity of variance
ACKNOWLEDGMENTS

I would like to thank those people most directly involved with this dissertation. My two principal advisors, Professors Peter Guttorp and Don Percival, guided me and never stopped demanding a high level of my understanding and of my work. I would also like to thank the other members of my committee: Professors Paul Sampson, Chris Bretherton, and Stephen Majeski.

I would especially like to thank my parents, Dona Farsdahl and Dennis Whitcher. Their willingness to provide me with every resource possible throughout my life in order to succeed is the reason why I am completing this degree. Finally, I would like to thank my fellow graduate students for good times, stimulating conversations and unparalleled drinking.
Chapter 1

INTRODUCTION

1.1 Motivation

The analysis of time series has often been difficult when data do not conform to well-studied theoretical concepts. One of the most common statistical properties violated by time series data is stationarity. A time series is considered (weakly or second-order) stationary when it has a mean and autocovariance sequence that do not vary with time. It is not uncommon to encounter departures from stationarity in recorded time series from the physical sciences, e.g., atmospheric science. There, seasonal effects are not limited to the mean of a time series, but may also enter into the variance. Some atmospheric variables are known, for instance, to exhibit increased variability in the winter of each year. Other time series exhibit a persistence of correlation much longer than can be explained by short memory (ARIMA) models; they are known as long memory processes. The existence of data such as these, which defy current statistical methods, motivates researchers to develop better theories and better tools with which to analyze them.

In this dissertation I present statistical techniques that can be useful for detecting and evaluating nonstationary events in univariate or bivariate time series. A complicating factor in many situations is the presence of slowly decaying autocorrelations, or long memory, in a time series. The techniques presented here are shown to perform well whether short memory or long memory structure is assumed.

Another concept which arises in the physical sciences is the notion of "multiscale features." That is, an observed time series may contain several phenomena, each occurring on a different time scale (these correspond to ranges of frequencies in the Fourier domain). An example in atmospheric science would be that weather has a very short time scale, around three days, while seasonal patterns recur roughly every 365 days when measured at a single station away from the equator. Wavelet techniques possess a natural ability to decompose time series into several sub-series which may be associated with particular time scales. Hence, interpretation of features in complex atmospheric time series may be made easier by first applying a wavelet transform and subsequently interpreting each individual sub-series.

This dissertation grew out of a project to investigate atmospheric phenomena, such as the Madden-Julian oscillation (MJO) (Madden and Julian 1971), using wavelet techniques. While developing sound statistical quantities and tests, I have also tried to keep in mind their application to relevant scientific questions and interpretability.

1.1.1 Detecting Nonstationary Events in Long Range Dependence
The first topic I consider is the detection and location of nonstationary events in time series which may exhibit long memory structure. Here I fuse two established techniques, wavelet analysis and change-point analysis, in order to extend our ability to test hypotheses concerning the homogeneity of variance for a univariate time series, with somewhat mild restrictions on its underlying spectrum.

First, change point detection is a well-studied field in statistics, although detecting a change in variance has received much less attention in the literature. Techniques include, but are not restricted to, Fourier methods (Nuri and Herbst 1969), cumulative sum of squares methods (Wichern et al. 1976; Hsu 1977), parametric time series methods (Davis 1979; Tsay 1988), and Bayesian methods (Abraham and Wei 1984). The cumulative sum of squares method is closely related to the notion of testing using the empirical distribution function (Stephens 1970; Stephens 1986) and the cumulative periodogram test; see, e.g., Priestley (1981, Sec. 6.1.4). Recently, researchers have investigated detecting and locating not single changes of variance, but multiple changes. Techniques used include a cumulative sum of squares method (Inclan and Tiao 1994) and an information criterion method (Chen and Gupta 1997).

Second, the discrete wavelet transform (DWT) has been shown to approximately decorrelate time series with long memory structure; see, for example, Tewfik and Kim (1992), McCoy and Walden (1996) and Wornell (1996). In fact, the DWT of a long memory process produces several sub-series which are approximately white noise sequences. Features which differ from this long memory structure, such as sudden changes of variance, are retained in certain sub-series of wavelet coefficients.

We take advantage of this approximate "decorrelation" of the DWT and the simplicity of the cumulative sum of squares method to test for homogeneity of variance, on a scale by scale basis, of long memory processes in Chapter 4. This provides a statistically sound technique of testing for nonstationary features without knowing the exact nature of the correlation structure in a given time series. The methodology developed in Chapter 4 is applied to the minimum water levels of the Nile River (Toussoun 1925) in Section 6.1, a time series known to exhibit long-range dependence (Mohr 1981; Graf 1983; Beran 1994, p. 22). I also analyze measurements of vertical ocean shear (Percival and Guttorp 1994) in Section 6.2. While this series does not appear to exhibit long memory structure, it is a good application of the multiple change point testing procedure where exact knowledge of the underlying spectrum is not required. The residual correlation in the wavelet coefficients of both short and long memory processes is investigated in Section 4.1.

1.1.2 Wavelet Analysis of Bivariate Time Series
Atmospheric phenomena are not always discovered using solely univariate techniques. For example, the Madden-Julian oscillation (MJO) (Madden and Julian 1971) was found using bivariate spectral analysis, specifically the co-spectrum and magnitude squared coherence. This oscillation has been documented as having a period anywhere from 30 to 60 days and has appeared in many studies in the Indian Ocean and tropical Pacific Ocean; see, e.g., Madden and Julian (1994) for a review. This apparent broadband nature of the oscillation has been hypothesized as being nonstationary, so the broad peak observed in previous spectral analyses might be attributed to the fading in-and-out of the oscillation over the time series of measurements. El Niño-Southern Oscillation (ENSO) events have also been hypothesized to affect the period of the MJO (Gray 1988; Kuhnel 1989). The ability of the wavelet transform to capture variability in both time and scale may provide insight into the nature of atmospheric phenomena such as the MJO, but first bivariate techniques must be developed.

Wavelet methods for time series analysis have been applied primarily to univariate processes, with the following exceptions. There has been some work in the field of turbulence, in a thesis by Hudgins (1992) and a subsequent paper by Hudgins, Friehe, and Mayer (1993). Hudgins used the output from wavelet transforms to measure association between turbulent velocity components in the atmosphere. A few articles also appear in the engineering literature from Japan. Kawata and Arimoto (1996) were interested in signal matching for pattern recognition problems, and Li and Nozaki (1997) used the wavelet cross-correlation of two velocity signals in order to reveal similar structures on a scale by scale basis at particular delays and times. Recently, Torrence and Compo (1998) discussed the cross-wavelet spectrum, which is complex valued, and the cross-wavelet power, which is simply the magnitude of their cross-wavelet spectrum. They also introduced confidence intervals for their cross-wavelet power and compared the Southern Oscillation Index (SOI) with the Niño3 sea surface temperature (SST). Both time series are measures of ENSO activity; the SOI is defined to be the seasonally averaged pressure difference between Darwin, Australia, and Tahiti, French Polynesia, and the Niño3 SST is the seasonal SST averaged over the central Pacific (5°S-5°N, 90-150°W). The articles discussed above solely utilized the continuous wavelet transform.

Lindsay, Percival, and Rothrock (1996) defined the sample wavelet covariance for the DWT and maximal overlap DWT (a redundant version of the DWT), along with confidence intervals based on large sample results. These methods were applied to the surface temperature and albedo of ice pack in the Beaufort Sea, off the coast of Alaska and the Northwest Territory.

I introduce the wavelet covariance and correlation in Chapter 5, establishing their asymptotic distributions for certain Gaussian processes. The wavelet covariance is shown to decompose the covariance between two stationary processes on a scale by scale basis. The wavelet cross-covariance and cross-correlation are also defined in order to perform a more thorough scale by scale analysis of bivariate time series. The same time series used by Madden and Julian (1971) are analyzed using bivariate wavelet techniques and multitaper spectral methods in Section 6.3. A daily Southern Oscillation Index is used as an indicator of ENSO activity and compared with the station pressure at Truk Island (7.4°N, 151.8°W) in order to investigate the possible relationship between ENSO events and the MJO (Section 6.4).
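The scale by scale decomposition of covariance mentioned above can be illustrated numerically. The following is a minimal sketch and not the estimator developed in Chapter 5: it assumes a circular Haar MODWT computed by a simple pyramid recursion, keeps all boundary coefficients, and uses simulated data. The function name `haar_modwt`, the number of levels, and the simulated series are hypothetical choices made only for illustration.

```python
import numpy as np

def haar_modwt(x, levels):
    """Circular Haar MODWT via a pyramid recursion (illustrative helper).

    Returns a list of wavelet coefficient arrays (levels 1..levels)
    and the final array of scaling coefficients.
    """
    v = np.asarray(x, dtype=float)
    wavelet = []
    for j in range(1, levels + 1):
        shifted = np.roll(v, 2 ** (j - 1))   # v[t - 2^(j-1) mod N]
        wavelet.append((v - shifted) / 2.0)  # Haar MODWT wavelet filter (1/2, -1/2)
        v = (v + shifted) / 2.0              # Haar MODWT scaling filter (1/2, 1/2)
    return wavelet, v

# Two simulated, correlated series (assumed data, for illustration only).
rng = np.random.default_rng(0)
n = 512
x = rng.standard_normal(n)
y = 0.6 * x + 0.8 * rng.standard_normal(n)
x = x - x.mean()
y = y - y.mean()

wx, vx = haar_modwt(x, 4)
wy, vy = haar_modwt(y, 4)

# Contribution of each scale to the sample covariance, plus the
# remainder carried by the scaling (smooth) coefficients.
scale_cov = [float(np.mean(a * b)) for a, b in zip(wx, wy)]
smooth_cov = float(np.mean(vx * vy))
total_cov = float(np.mean(x * y))

print("per-scale contributions:", np.round(scale_cov, 4))
print("sum of contributions   :", round(sum(scale_cov) + smooth_cov, 6))
print("sample covariance      :", round(total_cov, 6))
```

With circular filtering the per-scale contributions and the smooth remainder add up to the sample covariance exactly, which is the sample analogue of the decomposition; estimators that discard boundary coefficients will not satisfy this identity exactly.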
1.2 Outline of Thesis

Fractional difference processes and generalized fractional difference processes, which have a time-varying long memory parameter, are introduced in Chapter 2. Descriptions of simulation methods are given for both types of processes, with a slight modification to the simulation of generalized fractional difference processes as proposed by Wang et al. (1997). Realizations are also provided for both types of processes.

Chapter 3 begins by introducing Daubechies families of compactly supported wavelet filters. Material related to the Fourier transform and filtering is provided in Appendix A. The DWT and maximal overlap DWT (MODWT) are then introduced, with references provided for implementation in practice. A key property of wavelet transforms is their conservation of energy. The wavelet variance is defined to establish the decomposition of variance for a time series. An equivalent degrees of freedom argument for the wavelet variance is also investigated.

Chapter 4 discusses testing for homogeneity of variance in univariate time series. First, the spectral properties of wavelet coefficients from both short memory and long memory processes are investigated. The ability of the DWT to approximately decorrelate long memory processes with respect to a cumulative sum of squares test statistic, on a scale by scale basis, is shown via Monte Carlo simulation. The alternative hypothesis of a single variance change is then investigated. The MODWT is employed to estimate the location of the variance change. Both the detection and location procedures are then applied to multiple variance changes in a time series using a recursive procedure. The ability of this method to detect a bona fide change in the long memory parameter, when the variance of the process remains constant, is also studied. A sudden change in the long memory parameter will produce changes in the autocovariance sequence of the wavelet coefficients, at almost all levels of the DWT, which should affect the cumulative sum of squares test statistic in quite a different way from a simple variance change.

The extension of wavelet methodology to bivariate time series is explored in Chapter 5. By defining the wavelet covariance and wavelet correlation between two processes in a natural way, we can succinctly describe their relationship on a scale by scale basis. The wavelet covariance is shown to decompose the covariance between two stationary processes on a scale by scale basis. Asymptotic normality of the wavelet covariance and correlation is proven, allowing for construction of approximate confidence intervals for their estimators. The wavelet cross-covariance and cross-correlation are also introduced. The DWT estimator of the wavelet covariance is shown to suffer from bias depending on the delay between the two time series.
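The idea of testing for homogeneity of variance on a scale by scale basis can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of Chapter 4: it uses only the unit scale Haar DWT coefficients, simulated white noise with a single variance change, and a centered cumulative sum of squares in the style of Inclan and Tiao (1994). The exact normalization and critical values used in the thesis are not reproduced here, and the names `haar_dwt_unit_scale` and `css_statistic` are hypothetical.

```python
import numpy as np

def haar_dwt_unit_scale(x):
    """Unit scale Haar DWT wavelet coefficients (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    x = x[: 2 * (len(x) // 2)]                 # drop a trailing sample if the length is odd
    return (x[1::2] - x[0::2]) / np.sqrt(2.0)

def css_statistic(w):
    """Centered cumulative sum of squares: returns (max |D_k|, argmax k)."""
    w = np.asarray(w, dtype=float)
    n = len(w)
    p = np.cumsum(w ** 2) / np.sum(w ** 2)     # normalized cumulative sum of squares
    k = np.arange(1, n + 1)
    d = p - k / n                              # near zero for all k under constant variance
    k_hat = int(np.argmax(np.abs(d))) + 1      # 1-indexed location of the largest departure
    return float(np.max(np.abs(d))), k_hat

# Simulated data (assumed): the standard deviation jumps from 1 to 3 at t = 600.
rng = np.random.default_rng(1)
n_obs, change = 1024, 600
sd = np.where(np.arange(n_obs) < change, 1.0, 3.0)
x_changed = sd * rng.standard_normal(n_obs)
x_constant = rng.standard_normal(n_obs)

d_changed, k_hat = css_statistic(haar_dwt_unit_scale(x_changed))
d_constant, _ = css_statistic(haar_dwt_unit_scale(x_constant))
t_hat = 2 * k_hat                              # map the coefficient index back to the time axis

print("statistic with a variance change:", round(d_changed, 3))
print("statistic with constant variance:", round(d_constant, 3))
print("estimated change location       :", t_hat)
```

Because the DWT downsamples by two at the unit scale, the location estimate is only resolved to within a couple of observations here; this is one reason a non-decimated transform such as the MODWT is attractive for locating a change.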
Chapter 6 contains examples of how the methodology introduced in Chapters 4 and 5 performs on real time series. The Nile River minimum water levels (Toussoun 1925) are analyzed to find a sudden change of variance around 720 AD, which corresponds nicely to the construction of an instrument to measure the river levels in 715 AD. Vertical ocean shear measurements (Percival and Guttorp 1994) are analyzed to determine multiple variance changes in the first 5 scales. Atmospheric time series from the tropical Pacific Ocean (provided by Rol Madden at the National Center for Atmospheric Research) are analyzed in order to contrast the results from a wavelet analysis of these data with results obtained from bivariate spectral analysis in Madden and Julian (1971). Two series, one a measure of ENSO and the other for the MJO, are analyzed in order to investigate the possible association between ENSO events and the frequency and/or magnitude of the MJO. Conclusions from the research presented here are given in Chapter 7, along with open questions and future directions.
1.3 Contributions

The following is a list of original contributions in this dissertation:

- Investigation of the spectral properties of the DWT wavelet coefficients when applied to both short memory (ARMA) and long memory (fractional difference) processes (Section 4.1).

- Demonstration, through Monte Carlo simulation, that the DWT of fractional difference processes produces approximately uncorrelated output, on a scale by scale basis, with respect to a Kolmogorov-type test statistic (Chapter 4).

- Proof that the wavelet covariance decomposes the covariance between two stationary processes on a scale by scale basis (Section 5.1.1).

- Proof that the MODWT estimator of the wavelet covariance is asymptotically normally distributed when applied to nonstationary Gaussian processes whose dth order backward differences are short memory stationary (Section 5.2.1). This allows for the construction of confidence intervals when estimating the wavelet covariance.

- Proof that the MODWT estimator of the wavelet correlation is asymptotically normally distributed when applied to nonstationary Gaussian processes whose dth order backward differences are short memory stationary (Section 5.3). This allows for the construction of confidence intervals when estimating the wavelet correlation.

- Demonstration that the lack of shift invariance of the DWT introduces bias into the variance of the DWT estimator of the wavelet covariance (Section 5.2.2).

- Demonstration of evidence for a change in the variance of the Nile River minimum water levels (Toussoun 1925) instead of a change in the long memory parameter, as proposed in Beran and Terrin (1996) (Section 6.1).

- Investigation of the possible interaction between ENSO events and the MJO using a wavelet analysis of covariance developed in this dissertation (Section 6.4).
Chapter 2
LONG MEMORY PROCESSES

Our current understanding, and more importantly awareness, that natural phenomena may exhibit long-range dependence is due to the pioneering work by Hurst (1951). While looking at time series from the physical sciences (e.g., rainfall, tree rings, river levels, etc.) he noticed that his $R/S$ statistic, on a logarithmic scale, was randomly scattered around a line with slope $H > 1/2$ for large sample sizes. The $R/S$ statistic is the rescaled adjusted range and was used to calculate the ideal capacity of a water reservoir from time $t$ to time $t + k$. Loosely, the numerator $R$ (or adjusted range) measures the cumulative inflow to the reservoir and the denominator $S$ is proportional to the standard deviation of all measured inflows. For a stationary process with short-term dependence, $R/S$ should be proportional to $k^{1/2}$ for $k$ large, so that $\log R/S$ plotted against $\log k$ has slope $1/2$. The discovery of growth proportional to $k^H$, with $H > 1/2$, was in direct contradiction to the theory of such processes at the time. This discovery is known as the Hurst effect. Mandelbrot and co-workers (Mandelbrot and van Ness 1968; Mandelbrot and Wallis 1969) showed that the Hurst effect can be modeled by fractional Gaussian noise with self-similarity parameter $0 < H < 1$ ($H$ being for Hurst). More information about the history of long memory processes can be found in Beran (1994). Examples of such behavior can be found in a variety of disciplines, such as geophysics (Percival and Guttorp 1994; Walden 1994), hydrology (Lawrence and Kottegoda 1977; Hosking 1984), economics (Jensen 1994) and engineering (Mehrabi, Rassamdana, and Sahimi 1997; Abry and Veitch 1998). In this dissertation, I look at the Nile River minimum water levels (Toussoun 1925) and vertical shear measurements in the ocean in Chapter 6.

This chapter is divided into two parts, fractional difference processes and generalized fractional difference processes. The latter is a generalization of the former where the difference parameter $d$ is allowed to vary with time. Along with brief descriptions, simulation techniques for both types of processes are also provided.
2.1 Fractional Difference Processes

2.1.1 Definition
In the early 1980s, a family of models was developed to help analyze long memory processes. Granger and Joyeux (1980) and Hosking (1981) introduced fractional ARIMA models, which are a generalization of the standard ARIMA$(p, d, q)$ models defined by Box and Jenkins (1976). Let $\{X_t\}$ be a stationary process whose $d$th order backward difference
\[
(1 - B)^d X_t = \sum_{k=0}^{\infty} \binom{d}{k} (-1)^k X_{t-k} = \epsilon_t
\]
is a stationary process, where $d$ is a real number and
\[
\binom{a}{b} \equiv \frac{a!}{b!\,(a-b)!} = \frac{\Gamma(a+1)}{\Gamma(b+1)\,\Gamma(a-b+1)}.
\]
For example, the first order difference is $Y_t = X_t - X_{t-1}$ when $d = 1$. If $\{\epsilon_t\}$ is a white noise process with variance $\sigma_\epsilon^2$, then $\{X_t\}$ is the simplest case of a fractional ARIMA process, a fractional ARIMA$(0, d, 0)$. We will refer to such processes as fractional difference processes from now on.

Now let $\{X_t\}$ be a zero mean fractional ARIMA$(0, d, 0)$ process with $-1/2 < d < 1/2$ (for simplicity, the sampling interval is $\Delta t = 1$). This process is stationary and invertible (Hosking 1981). The autocovariance sequence (acvs) of $\{X_t\}$ is defined to be
\[
s_\tau \equiv E\{X_t X_{t-\tau}\} = \frac{\sigma_\epsilon^2\,(-1)^\tau\,\Gamma(1-2d)}{\Gamma(1+\tau-d)\,\Gamma(1-\tau-d)}, \tag{2.1}
\]
which means the variance is given by
\[
\mathrm{Var}\{X_t\} = s_0 = \frac{\sigma_\epsilon^2\,\Gamma(1-2d)}{[\Gamma(1-d)]^2}. \tag{2.2}
\]
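As a numerical illustration of the acvs and variance of a fractional difference process, the sketch below computes $s_0$ from the variance formula and the remaining lags from the ratio $s_\tau / s_{\tau-1} = (\tau - 1 + d)/(\tau - d)$, which follows from the gamma-function form of the acvs. The function name `fd_acvs` and its arguments are illustrative choices, not part of the dissertation.

```python
import math


def fd_acvs(d, n_lags, sigma2=1.0):
    """Autocovariances s_0, ..., s_{n_lags} of a fractional ARIMA(0, d, 0) process.

    Assumes -1/2 < d < 1/2. s_0 comes from the variance formula; later lags use
    s_tau = s_{tau-1} * (tau - 1 + d) / (tau - d), which avoids evaluating
    gamma functions at large arguments.
    """
    if not -0.5 < d < 0.5:
        raise ValueError("d must lie in (-1/2, 1/2)")
    s = [sigma2 * math.gamma(1.0 - 2.0 * d) / math.gamma(1.0 - d) ** 2]
    for tau in range(1, n_lags + 1):
        s.append(s[-1] * (tau - 1.0 + d) / (tau - d))
    return s
```

For $d = 0$ the recursion returns a unit variance followed by zeros, i.e. white noise, which is a quick sanity check on the formula.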
The spectral density of $\{X_t\}$ is
\[
S_X(f) = \sigma_\epsilon^2\,|2\sin(\pi f)|^{-2d} \quad \text{for } -\tfrac{1}{2} < f < \tfrac{1}{2}, \tag{2.3}
\]
so that $S_X(f) \propto |f|^{-2d}$ approximately as $f \to 0$ and, thus, the spectral density is approximately linear on a log-log scale. This property can be seen in Figure 2.1 for various fractional difference processes. When $0 < d < 1/2$, this spectral density has a pole at zero, in which case the process exhibits slowly decaying autocovariances and constitutes a simple example of a long memory process.

2.1.2 Simulation
Davies and Harte (1987) describe a method for simulating certain stationary Gaussian time series of length $n$ with autocovariances $s_0, s_1, \ldots, s_{n-1}$. The method is based on the Fourier transform and is as follows (Beran 1994, pp. 216-217):

1. Define
\[
\lambda_k \equiv \frac{2\pi(k-1)}{2n-2}, \quad k = 1, \ldots, 2n-2,
\]
and the discrete Fourier transform $\Gamma_k$ of the two-sided sequence of autocovariances $s_0, s_1, \ldots, s_{n-2}, s_{n-1}, s_{n-2}, \ldots, s_1$,
\[
\Gamma_k \equiv \sum_{j=1}^{n-1} s_{j-1}\, e^{i(j-1)\lambda_k} + \sum_{j=n}^{2n-2} s_{2n-j-1}\, e^{i(j-1)\lambda_k} \tag{2.4}
\]
for $k = 1, \ldots, 2n-2$.

2. Check to see that $\Gamma_k > 0$ for all $k = 1, \ldots, 2n-2$. If this condition does not hold, the Davies-Harte method will not work for this time series (this is not a problem with fractional difference processes).
3. Simulate two independent sequences of zero mean normal random variables, $U_1, U_2, \ldots, U_n$ and $V_1, V_2, \ldots, V_n$, such that $\mathrm{Var}\{U_1\} = \mathrm{Var}\{U_n\} = 2$ and, for $k \neq 1, n$, $\mathrm{Var}\{U_k\} = \mathrm{Var}\{V_k\} = 1$. Define $V_1 = V_n = 0$ and the sequence of complex random variables $\{Z_k\}$ by
\[
Z_k \equiv U_k + iV_k, \quad k = 1, \ldots, n,
\]
and
\[
Z_k \equiv U_{2n-k} - iV_{2n-k}
\]
for $k = n+1, \ldots, 2n-2$.

4. For $t = 1, \ldots, n$ define
\[
X_t \equiv \frac{1}{2\sqrt{n-1}} \sum_{k=1}^{2n-2} \sqrt{\Gamma_k}\; e^{i(t-1)\lambda_k}\, Z_k. \tag{2.5}
\]

The series $\{X_t\}$ has the desired covariance structure. This method has a computational advantage since Equations (2.4) and (2.5) can be calculated using the fast Fourier transform. Percival (1992) compares this method to others in the context of generating a stationary Gaussian process with specified spectrum. S-plus code for the Davies-Harte method, along with documentation provided by Martin Maechler and Jan Beran, can be obtained via the World Wide Web from StatLib at http://lib.stat.cmu.edu/S/ under the title beran. Realizations of length 512 from several fractional difference processes (generated in S-plus) are displayed in Figure 2.2. As the long memory parameter increases in magnitude, the fractional difference process appears to have more and more low frequency content.

Figure 2.1: Spectral densities for fractional difference processes ($d = 0.05$, 0.25, 0.40 and 0.45). The x-axis is displayed on the $\log_2$ scale.
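The four steps of the Davies-Harte method can be sketched in NumPy as follows. This is an illustrative translation, not the S-plus code mentioned above: the function name `davies_harte`, the use of `numpy.random.Generator`, and the small negative tolerance in the step 2 check are all assumptions made for the example.

```python
import numpy as np


def davies_harte(acvs, rng=None):
    """Simulate one Gaussian series of length n = len(acvs) whose
    autocovariances are s_0, ..., s_{n-1} (Davies-Harte, steps 1 to 4)."""
    rng = np.random.default_rng() if rng is None else rng
    s = np.asarray(acvs, dtype=float)
    n = s.size
    if n < 2:
        raise ValueError("need at least two autocovariances")
    m = 2 * n - 2
    # Step 1: DFT of s_0, ..., s_{n-1}, s_{n-2}, ..., s_1. The sequence is
    # symmetric, so the transform is real and the sign of the exponent is moot.
    circ = np.concatenate([s, s[-2:0:-1]])
    gamma = np.fft.fft(circ).real
    # Step 2: the method requires nonnegative Gamma_k (tolerance is a choice).
    if gamma.min() < -1e-8 * np.abs(gamma).max():
        raise ValueError("Davies-Harte condition fails: negative Gamma_k")
    gamma = np.clip(gamma, 0.0, None)
    # Step 3: Z_1 and Z_n are real with variance 2; the remaining Z_k satisfy
    # Z_k = conj(Z_{2n-k}) for k = n+1, ..., 2n-2.
    u = rng.standard_normal(n)
    v = rng.standard_normal(n)
    u[[0, -1]] *= np.sqrt(2.0)
    v[[0, -1]] = 0.0
    z = np.empty(m, dtype=complex)
    z[:n] = u + 1j * v
    z[n:] = np.conj(z[n - 2:0:-1])
    # Step 4: X_t = (2 sqrt(n-1))^{-1} sum_k sqrt(Gamma_k) e^{i(t-1)lambda_k} Z_k.
    # numpy's ifft divides by m, so that factor is multiplied back in.
    x = m * np.fft.ifft(np.sqrt(gamma) * z) / (2.0 * np.sqrt(n - 1.0))
    return x.real[:n]
```

Passing the acvs of a fractional difference process with unit innovations variance yields a realization comparable to those in Figure 2.2; the imaginary part of the inverse transform is zero up to rounding because of the conjugate symmetry imposed in step 3.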
2.2 Generalized Fractional Difference Processes

2.2.1 Definition

A process defined by Equation (2.3) has a long memory parameter which is constant over time. We introduce a related process where the long memory parameter $d_t$ is a discrete function of time, called a generalized fractional difference process (gfdp). This process has recently appeared in a paper by Wang, Cavanaugh, and Song (1997). We will utilize these processes later on when we investigate how the test for homogeneity of variance reacts to a sudden change in the long memory parameter of a generalized fractional difference process.

Figure 2.2: Realizations of fractional difference processes ($N = 512$). Panels: $d = 0.45$, 0.40, 0.25 and 0.05.
2.2.2 Simulation

Hosking (1981) looked at representing a fractional ARIMA process as an infinite autoregressive process or infinite moving average process with coefficients which may be given explicitly; see also Beran (1994, pp. 64-65). We utilize the infinite moving average representation in order to simulate generalized fractional difference processes. Let $\{X_t\}$ be a generalized fractional difference process with long memory parameter $\{d_t\}$; then it has an infinite moving average representation
\[
X_t = \sum_{k=0}^{\infty} a_{t,k}\, \epsilon_{t+N-k}, \quad t = 1, \ldots, N,
\]
where $\epsilon_t$, $t = 1, 2, \ldots$, is a white noise sequence and
\[
a_{t,k} \equiv \frac{\Gamma(k + d_t)}{\Gamma(k+1)\,\Gamma(d_t)}. \tag{2.6}
\]
For $k \to \infty$ we have the following approximation
\[
a_{t,k} \approx \frac{k^{d_t - 1}}{\Gamma(d_t)},
\]
by Stirling's formula.

We now provide an algorithm for simulating such a process. For a realization $X_t$, $t = 1, \ldots, N$, of a portion of a generalized fractional difference process, a white noise sequence $\epsilon_t$, $t = 1, \ldots, mN$, is generated. The parameter $m > 1$ is a positive integer that determines the order of the moving-average model used to generate the realization $X_t$. When simulating generalized fractional difference processes in this dissertation, I used $m = 2$. Once the length of the moving average is specified, each observation $X_t$ is simply a moving average of the previous $(m-1)N$ white noise values; i.e.,
\[
X_t = \sum_{k=0}^{(m-1)N-1} a_{t,k}\, \epsilon_{t-k}, \quad t = 1, \ldots, N. \tag{2.7}
\]
The coefficients $a_{t,k}$ are functions of the time-varying long memory sequence $\{d_t\}$ and can be defined recursively via
\[
a_{t,k} = a_{t,k-1}\, \frac{k - 1 + d_t}{k} \quad \text{for } k = 1, \ldots, N-1,
\]
where $a_{t,0} = 1$. This allows for fast computation since computing the gamma function in Equation (2.6) explicitly is inefficient.

As previously stated, this technique is based on an infinite moving average process but implemented as a finite moving average process. A simple check to see how large $m$ needs to be, in order to reasonably simulate fractional difference processes, is to compare the autocovariance sequence (acvs) between the true process and the moving-average approximation. The acvs of an MA($q$) process, such as the one given in Equation (2.7), is
\[
s_k^{(\mathrm{MA})} =
\begin{cases}
\sigma_\epsilon^2 \sum_{j=0}^{q-|k|} a_{t,j}\, a_{t,j+|k|}, & |k| \le q; \\
0, & |k| > q;
\end{cases} \tag{2.8}
\]
(see, e.g., Brockwell and Davis (1991, p. 79)), where $d_t$ does not vary with time. The exact acvs for a fractional difference process $\{s_k^{(\mathrm{fdp})}\}$ is given in Equation (2.1). The acvs for an MA($q$) process, computed using order $q = 512$, 1024, 2048 and 4096, was compared with $\{s_k^{(\mathrm{fdp})}\}$ for long memory parameters $d = 0.05$, 0.25, 0.4 and 0.45; see Figure 2.3. For fractional difference processes with small long memory parameters, the MA($q$) approximation is very good. However, when heavy amounts of autocorrelation are present even the MA(4096) process does not perform well. This makes the approximation quite crude with respect to fractional difference processes as $d \to 0.5$.

Figure 2.3: Autocovariance sequences for MA($q$) approximations to fractional difference processes, with orders ranging from $q = 512$ to $q = 4096$. Panels: $d = 0.45$, 0.40, 0.25 and 0.05; curves: exact, $q = 512$, 1024, 2048 and 4096.

Figure 2.4: Autocovariance sequences for MA($q$) approximations, using the modified innovations variance $\hat\sigma_\epsilon^2$, to fractional difference processes, with orders ranging from $q = 512$ to $q = 4096$. Panels: $d = 0.45$, 0.40, 0.25 and 0.05; curves: exact, $q = 512$, 1024, 2048 and 4096.

There is one free parameter to help us improve the fit between $\{s_k^{(\mathrm{fdp})}\}$ and $\{s_k^{(\mathrm{MA})}\}$, namely, the innovations variance $\sigma_\epsilon^2$. If we adopt the least-squares approach, with $s_k^{(\mathrm{MA})}$ evaluated at unit innovations variance, then we want to find a $\sigma_\epsilon^2$ such that
\[
\sum_{k=0}^{N-1} \left[ s_k^{(\mathrm{fdp})} - \sigma_\epsilon^2\, s_k^{(\mathrm{MA})} \right]^2
\]
is minimized. Differentiating with respect to $\sigma_\epsilon^2$, setting the equation to zero, and solving for $\sigma_\epsilon^2$ yields the usual estimator from linear regression theory; i.e.,
\[
\hat\sigma_\epsilon^2 = \frac{\sum_{k=0}^{N-1} s_k^{(\mathrm{fdp})}\, s_k^{(\mathrm{MA})}}{\sum_{k=0}^{N-1} \left[ s_k^{(\mathrm{MA})} \right]^2}.
\]
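The least-squares adjustment of the innovations variance can be sketched as below, under the assumption that both autocovariance sequences are computed with unit innovations variance and that the first 30 lags enter the regression. All function names (`ma_coeffs`, `ma_acvs`, `fd_acvs`, `ls_innovations_variance`) are hypothetical and chosen only for this example.

```python
import math

import numpy as np


def ma_coeffs(d, q):
    """Weights a_0, ..., a_q from the recursion a_k = a_{k-1} (k - 1 + d) / k."""
    a = np.empty(q + 1)
    a[0] = 1.0
    for k in range(1, q + 1):
        a[k] = a[k - 1] * (k - 1.0 + d) / k
    return a


def ma_acvs(a, n_lags):
    """MA acvs at lags 0, ..., n_lags - 1 with unit innovations variance."""
    return np.array([np.dot(a[: a.size - k], a[k:]) for k in range(n_lags)])


def fd_acvs(d, n_lags):
    """Exact fractional difference acvs at lags 0, ..., n_lags - 1
    (unit innovations variance), via s_k = s_{k-1} (k - 1 + d) / (k - d)."""
    s = np.empty(n_lags)
    s[0] = math.gamma(1.0 - 2.0 * d) / math.gamma(1.0 - d) ** 2
    for k in range(1, n_lags):
        s[k] = s[k - 1] * (k - 1.0 + d) / (k - d)
    return s


def ls_innovations_variance(d, q, n_lags=30):
    """Least-squares scale factor that matches the MA(q) acvs to the exact acvs."""
    s_fdp = fd_acvs(d, n_lags)
    s_ma = ma_acvs(ma_coeffs(d, q), n_lags)
    return float(np.dot(s_fdp, s_ma) / np.dot(s_ma, s_ma))
```

For positive $d$ every weight is positive, so the truncated acvs lies below the exact one and the fitted factor is at least 1; it should stay close to 1 for small $d$ and grow as $d$ approaches 0.5.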
Hence, inserting $\hat\sigma_\epsilon^2$ into Equation (2.8) will produce an improved, in the least-squares sense, autocovariance sequence. Simulation of fractional difference processes is straightforward, by generating sequences of white noise with variance $\hat\sigma_\epsilon^2$ and utilizing Equation (2.7), where $\{d_t\}$ does not vary with time. For generalized fractional difference processes, sequences of white noise with unit variance are generated and the innovations standard deviation $\hat\sigma_{\epsilon,t}$ enters into Equation (2.7) and varies with $\{d_t\}$.

This procedure was applied to the autocovariance sequences displayed in Figure 2.3, using only the first 30 lags in the regression, with the results shown in Figure 2.4. There is no change in the autocovariance sequences when $d = 0.05$ or 0.25; these approximations were adequate without modification even for $q = 512$. A marked improvement is seen for $d = 0.40$, where the approximation at initial lags is only slightly higher and, for larger lags, is not nearly as low as before. This pattern is even more apparent for $d = 0.45$, where the new autocovariance sequences do not fit the exact acvs well, but fit better than in Figure 2.3. Although modifying the innovations variance improves this method of simulation for a larger interval of $d$, as $d \to 0.5$ any finite moving-average model must have an extremely large order to adequately capture the amount of correlation structure present in a fractional difference process.

Figure 2.5 shows realizations of four generalized fractional difference processes ($N = 512$), where $m = 2$ and the modified innovations variance $\hat\sigma_\epsilon^2$ is utilized, with a sudden shift in the long memory parameter at $t = N/2$. Although we are only interested in generating fractional difference processes with these sudden changes in the difference parameter, this method is able to produce processes with parameters which change linearly or even nonlinearly with time. Examples of such processes can be found in Wang et al. (1997).

Figure 2.5: Realizations of generalized fractional difference processes ($N = 512$). Panels: $d = 0.05/0.45$, $0.05/0.40$, $0.05/0.25$ and $0.05/0.05$.
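The finite moving-average simulation of a generalized fractional difference process can be sketched as follows. This is a minimal illustration rather than the code used for Figure 2.5: the function name `simulate_gfdp` is hypothetical, the innovations have unit variance (the least-squares adjustment is omitted), and the white noise index is shifted so that every $X_t$ draws on the $(m-1)N$ most recent of the $mN$ generated values, which is an assumption about the indexing.

```python
import numpy as np


def simulate_gfdp(d_t, m=2, rng=None):
    """Simulate X_1, ..., X_N with time-varying long memory parameter d_t.

    Each X_t is a moving average of (m - 1) * N white noise values with
    weights a_{t,0} = 1 and a_{t,k} = a_{t,k-1} (k - 1 + d_t) / k.
    """
    if m < 2:
        raise ValueError("m must be an integer greater than 1")
    rng = np.random.default_rng() if rng is None else rng
    d_t = np.asarray(d_t, dtype=float)
    n = d_t.size
    q = (m - 1) * n                      # number of weights per time point
    eps = rng.standard_normal(m * n)     # white noise, unit variance
    k = np.arange(1, q)
    x = np.empty(n)
    for t in range(n):
        # Weights for this time point, built from the recursion as a cumprod.
        a = np.concatenate([[1.0], np.cumprod((k - 1.0 + d_t[t]) / k)])
        # Assumed index shift: the most recent value used is eps[t + q].
        window = eps[t + 1:t + q + 1][::-1]
        x[t] = np.dot(a, window)
    return x
```

A sudden shift at $t = N/2$, as in Figure 2.5, corresponds to an input such as `np.r_[np.full(256, 0.05), np.full(256, 0.40)]`; a linear or nonlinear trend in the parameter only changes the array passed in.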
Chapter 3
DISCRETE WAVELET TRANSFORMS AND THE WAVELET VARIANCE

The wavelet transform is a powerful mathematical tool that is receiving more and more attention from the statistical community. While most work is being done in the engineering and physical sciences, wavelet transforms have already proven useful in well established statistical fields such as nonparametric regression, classification, and time series analysis. The ground-breaking work of Donoho and co-workers (Donoho 1993; Donoho and Johnstone 1994; Donoho 1995; Donoho, Johnstone, Kerkyacharian, and Picard 1995, etc.) introduced statisticians to wavelet transforms in the context of signal estimation and wavelet shrinkage. In the following chapters, I extend wavelet methodology in the area of time series analysis. This chapter introduces some basic concepts of wavelet methodology and briefly investigates how well the equivalent degrees of freedom argument holds with respect to the wavelet variance.

Wavelet methods are best understood when contrasted against classical Fourier methodology. Two texts with introductory material on Fourier theory are Percival and Walden (1993) and Briggs and Henson (1995). Classic texts on spectral analysis of time series include Koopmans (1974), Bloomfield (1976) and Priestley (1981); Anderson (1971) and Fuller (1996) focus on the time-domain analysis of time series. These texts will be referred to when the details of a particular concept are beyond the scope of this dissertation. Pertinent concepts, such as Fourier theory, filtering and spectral analysis, are presented in the first two appendices at the end of this dissertation for direct reference. Introductory texts on wavelet theory abound, and most, if not all, are from an engineering perspective, except for the recent work of Ogden (1997). Chui (1997) provides a basic synopsis of wavelet theory, while Vetterli and Kovacevic (1995) give a more thorough account from an engineering perspective. The mathematically rigorous book by Daubechies (1992) contains a wealth of details on her families of wavelet filters.

By utilizing notation and concepts from the first two appendices, we introduce the Haar wavelet filter and two families of Daubechies wavelets in Section 3.1. The partial discrete wavelet transform (DWT) and maximal overlap discrete wavelet transform (MODWT) are briefly introduced. Algorithms for the DWT abound in the literature; for a detailed computer algorithm of the MODWT see Appendix A of Percival and Mofjeld (1997). The bulk of the background material on wavelets, DWT and MODWT (including notation) is a synopsis derived from Percival and Walden (1999). The wavelet variance is introduced along with the concept of equivalent degrees of freedom for a time series. The distribution of the wavelet variance under the equivalent degrees of freedom argument is compared with exact methods from the theory of quadratic forms of normal random variables.
3.1 Wavelet Filters

3.1.1 The Haar Wavelet
The first wavelet filter, the Haar wavelet (Haar 1910), remained in relative obscurity until the convergence of several disciplines to form what we now know in a broad sense as wavelet methodology. It is a filter of width $L = 2$ which can be succinctly defined by its scaling coefficients
\[
g_0 = g_1 = \frac{1}{\sqrt{2}},
\]
or equivalently by its wavelet coefficients $h_0 = 1/\sqrt{2}$ and $h_1 = -1/\sqrt{2}$ through the quadrature mirror relationship
\[
h_l = (-1)^l g_{L-1-l} \quad \text{for } l = 0, \ldots, L-1 \tag{3.1}
\]
(the convention of $g_l$ corresponding to the low-pass or scaling coefficients and $h_l$ corresponding to the high-pass or wavelet coefficients will be adhered to throughout this dissertation). The Haar wavelet is special since it is the only compactly supported orthonormal wavelet that is symmetric (Daubechies 1992, Ch. 8). It is also useful for presenting the basic properties shared by all Daubechies wavelet filters: orthonormality and orthogonality to even shifts. The former property is seen by
\[
\sum_{l=0}^{L-1} h_l^2 = 1, \tag{3.2}
\]
and the latter through a similar calculation
\[
\sum_{l=0}^{L-1} h_l h_{l+2k} = \sum_{l=-\infty}^{\infty} h_l h_{l+2k} = 0 \tag{3.3}
\]
for all non-zero integers $k$, where by definition $h_l = 0$ for $l < 0$ and $l \ge L$.

Although the Haar wavelet filter is easy to visualize and implement, it is inadequate for most real-world applications in that it is a poor approximation to an ideal band-pass filter. This can be seen, for example, in the analysis of vertical ocean shear measurements (Percival and Guttorp 1994). Other wavelets of even width $L \ge 4$ have been developed in the past decade that yield much better approximations.

3.1.2 Daubechies Families of Wavelet Filters
A wavelet family consists of all wavelet basis vectors, over all scales and translations, derived from a single wavelet filter (or mother wavelet). Two wavelet families which will be used exclusively in later chapters were developed by I. Daubechies. They are the extremal phase and least asymmetric wavelets. When referring to these wavelets in the future, `D(L)' and `LA(L)' will be used to denote Daubechies extremal phase and least asymmetric wavelet filters of length $L$, respectively. The D(2) wavelet is equivalent to the Haar wavelet. In general, let
\[
H(f) \equiv \sum_{l=0}^{L-1} h_l\, e^{-i2\pi f l} \quad \text{and} \quad G(f) \equiv \sum_{l=0}^{L-1} g_l\, e^{-i2\pi f l}
\]
define the transfer functions for the wavelet and scaling coefficients, respectively. Recall that any arbitrary transfer function may be factored into the product of its magnitude component and a complex exponential containing the phase component (cf. Section A.3). The D(L) and LA(L) wavelet filters are identical in the magnitude of their transfer functions, only differing in their phase properties. The manipulation of these phase properties is known as spectral factorization (Percival and Walden 1999, Sec. 4.8). Wavelet filter coefficients for the D(4) wavelet, at unit scale, are defined to be
\[
h_0 = \frac{1 - \sqrt{3}}{4\sqrt{2}}, \quad h_1 = \frac{-3 + \sqrt{3}}{4\sqrt{2}}, \quad h_2 = \frac{3 + \sqrt{3}}{4\sqrt{2}} \quad \text{and} \quad h_3 = \frac{-1 - \sqrt{3}}{4\sqrt{2}},
\]
and the scaling coefficients for the LA(8) are given in Table 3.1. Recall that the scaling filter is related to the wavelet filter via the quadrature mirror filter relationship given by Equation (3.1). The scaling coefficients defining Daubechies families of wavelet filters of varying lengths can be found in Daubechies (1992, Ch. 6). More information about the properties of these wavelets can be seen when comparing the squared gain functions of the wavelet and scaling coefficients
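\[
\mathcal{H}(f) \equiv |H(f)|^2 = H(f)H^*(f) \quad \text{and} \quad \mathcal{G}(f) \equiv |G(f)|^2 = G(f)G^*(f),
\]
respectively. The two transfer functions, and hence the squared gain functions, are related through the quadrature mirror relationship (Equation (3.1)) such that
\[
G(f) = e^{-i2\pi f(L-1)}\, H\!\left(\tfrac{1}{2} - f\right) \quad \text{and hence} \quad \mathcal{G}(f) = \mathcal{H}\!\left(\tfrac{1}{2} - f\right). \tag{3.4}
\]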
Table 3.1: Scaling coefficients for the Daubechies least asymmetric wavelet filter of length L = 8, taken from Percival and Walden (1999, Sec. 4.4).

  g_0 = -0.0757657147893407
  g_1 = -0.0296355276459541
  g_2 =  0.4976186676324578
  g_3 =  0.8037387518052163
  g_4 =  0.2978577956055422
  g_5 = -0.0992195435769354
  g_6 = -0.0126039672622612
  g_7 =  0.0322231006040713
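The orthonormality and even-shift orthogonality of these filters can be checked numerically. The sketch below builds the D(4) wavelet filter from its closed form, derives the LA(8) wavelet filter from the scaling coefficients in Table 3.1 through the quadrature mirror relationship, and evaluates the sums $\sum_l h_l h_{l+2k}$. The helper names are illustrative, and because the tabulated coefficients are rounded, the identities hold only up to roughly that rounding.

```python
import numpy as np


def qmf_wavelet_from_scaling(g):
    """Wavelet filter from a scaling filter via h_l = (-1)^l g_{L-1-l}."""
    g = np.asarray(g, dtype=float)
    signs = (-1.0) ** np.arange(g.size)
    return signs * g[::-1]


def even_shift_products(h):
    """sum_l h_l h_{l+2k} for k = 0, ..., L/2 - 1 (k = 0 gives the energy)."""
    h = np.asarray(h, dtype=float)
    return np.array([np.dot(h[: h.size - 2 * k], h[2 * k:])
                     for k in range(h.size // 2)])


r3 = np.sqrt(3.0)
d4_wavelet = np.array([1 - r3, -3 + r3, 3 + r3, -1 - r3]) / (4 * np.sqrt(2.0))

la8_scaling = np.array([
    -0.0757657147893407, -0.0296355276459541,
    0.4976186676324578, 0.8037387518052163,
    0.2978577956055422, -0.0992195435769354,
    -0.0126039672622612, 0.0322231006040713,
])
la8_wavelet = qmf_wavelet_from_scaling(la8_scaling)
```

Calling `even_shift_products` on either wavelet filter should return a leading entry close to 1 followed by entries close to 0.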
The orthonormality (Equation (3.2)) and orthogonality to its even shifts (Equation (3.3)) seen for the Haar wavelet filter, and shared by both Daubechies families of wavelet filters used here, can be succinctly expressed using the squared gain function of the wavelet filter via
\[
\mathcal{H}(f) + \mathcal{H}\!\left(f + \tfrac{1}{2}\right) = 2 \quad \text{for all } f. \tag{3.5}
\]
To illustrate these properties, we can show the Haar wavelet filter satisfies Equation (3.5) since
\[
H^{(\mathrm{Haar})}(f) = \sum_{l=0}^{1} h_l\, e^{-i2\pi f l} = \frac{1 - e^{-i2\pi f}}{\sqrt{2}} = i\sqrt{2}\, e^{-i\pi f} \sin(\pi f),
\]
and therefore the squared gain function is $\mathcal{H}^{(\mathrm{Haar})}(f) = 2\sin^2(\pi f)$. Using the relationship that $\cos(\pi f) = \sin\!\left(\pi\left(f + \tfrac{1}{2}\right)\right)$, we have
\[
\mathcal{H}^{(\mathrm{Haar})}(f) + \mathcal{H}^{(\mathrm{Haar})}\!\left(f + \tfrac{1}{2}\right) = 2\sin^2(\pi f) + 2\cos^2(\pi f) = 2.
\]
Alternative ways of expressing Equation (3.5), say, using the squared gain function for the scaling coefficients or combinations between the two, are
\[
\mathcal{G}(f) + \mathcal{G}\!\left(f + \tfrac{1}{2}\right) = 2 \quad \text{or} \quad \mathcal{G}(f) + \mathcal{H}(f) = 2 \quad \text{for all } f,
\]
and follow from the fact that they both have unit period and their quadrature mirror relationship (Equation (3.4)).

Now, the discrete wavelet transform (DWT) can be thought of as a sequence of filtering operations which form a cascade of filters (cf. Section A.3). The low-pass output from one filtering operation $\{X_t * g_l\}$ is the input to the next filtering operation, where the filter is an upsampled version of the original filter. Upsampling consists of inserting one zero between each of the elements of $\{h_l\}$ to form $\{h_l^\uparrow\} \equiv \{h_0, 0, h_1, 0, \ldots, h_{L-2}, 0, h_{L-1}\}$; see, e.g., Vetterli and Kovacevic (1995, Sec. 2.5.3) or Percival and Walden (1999, Sec. 4.4). The transfer function for $\{h_l^\uparrow\}$ is
\[
H^\uparrow(f) = \sum_{l=0}^{2L-2} h_l^\uparrow\, e^{-i2\pi f l} = \sum_{l=0}^{L-1} h_{2l}^\uparrow\, e^{-i2\pi f (2l)} = \sum_{l=0}^{L-1} h_l\, e^{-i2\pi (2f) l} = H(2f),
\]
since every other element of $\{h_l^\uparrow\}$ is zero. Using Equation (A.4), the transfer function for the second level wavelet filter $\{h_{2,l}\}$ is $H_2(f) \equiv H(2f)G(f)$. By a similar argument, the transfer function for the second level scaling filter $\{g_{2,l}\}$ is determined by convolving $\{g_l\}$ with $\{g_l^\uparrow\} \equiv \{g_0, 0, g_1, 0, \ldots, g_{L-2}, 0, g_{L-1}\}$ and is therefore $G_2(f) \equiv G(2f)G(f)$. This method can be extended to an arbitrary level $j$, by repeatedly upsampling the filters and applying Equation (A.4), yielding the following expressions for the transfer functions of the wavelet and scaling filters, respectively,
\[
H_j(f) \equiv H(2^{j-1} f) \prod_{l=0}^{j-2} G(2^l f) \quad \text{and} \quad G_j(f) \equiv \prod_{l=0}^{j-1} G(2^l f). \tag{3.6}
\]
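The cascade can be made concrete by building the level $j$ filters in the time domain, through repeated upsampling and convolution, and comparing their transfer functions with the product form in Equation (3.6). The function names below are illustrative, and the filter at level $j$ is upsampled by inserting $2^{j-1} - 1$ zeros between coefficients, the time-domain counterpart of the factor $H(2^{j-1} f)$.

```python
import numpy as np


def upsample(filt, factor):
    """Insert factor - 1 zeros between consecutive filter coefficients."""
    filt = np.asarray(filt, dtype=float)
    out = np.zeros((filt.size - 1) * factor + 1)
    out[::factor] = filt
    return out


def level_j_filters(h, g, j):
    """Equivalent wavelet and scaling filters {h_{j,l}} and {g_{j,l}}."""
    hj = np.asarray(h, dtype=float)
    gj = np.asarray(g, dtype=float)
    for level in range(2, j + 1):
        step = 2 ** (level - 1)
        # Both updates use the scaling filter from the previous level.
        hj = np.convolve(gj, upsample(h, step))
        gj = np.convolve(gj, upsample(g, step))
    return hj, gj


def transfer(filt, f):
    """Transfer function sum_l filt_l exp(-i 2 pi f l) at frequencies f."""
    filt = np.asarray(filt, dtype=float)
    l = np.arange(filt.size)
    return np.exp(-2j * np.pi * np.outer(f, l)) @ filt
```

For the Haar filter at level 3 this yields four coefficients equal to $1/\sqrt{8}$ followed by four equal to $-1/\sqrt{8}$, matching the square-wave shape described for the Haar basis vectors.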
Intuitively, a vector of wavelet coefficients for level $j$ is obtained from $j - 1$ applications of a low-pass filter (or averaging operator) followed by one application of a high-pass filter (or difference operator), and a vector of scaling coefficients is obtained from $j$ applications of the low-pass filter. Figure 3.1 shows some of the common wavelets, or more specifically, wavelet basis vectors taken from the sixth level of the transform. As the length of the wavelet filter increases, the smoothness of the basis function increases. However, the increased length, while improving the filters' approximation to an ideal band-pass filter, amplifies boundary effects encountered whenever finite time series are analyzed. This is an important feature to realize in practical situations where data may be at a premium. From the figure, the Haar wavelet filter is a simple square-wave function, the D(4) is quite jagged with a self-similar or fractal-like appearance, and the LA(8) is reasonably smooth and quite close to symmetric. When selecting a wavelet filter, several factors must be taken into account, such as boundary effects and leakage protection. Most importantly, the wavelet filter should agree with the underlying structure of the physical process it is analyzing.

Figure 3.1: The Haar, D(4) and LA(8) wavelet filters for level 6 ($N = 512$).

The squared gain functions of the wavelet and scaling filter coefficients for the Haar, D(4) and LA(8) wavelets are given in Figure 3.2. For comparison, the vertical dotted lines indicate the pass-band of frequencies for an ideal band-pass filter. The first column in the figure shows the squared gain functions for the unit scale wavelet filters. As the length of the filter increases, from Haar ($L = 2$) to LA(8), the approximation to an ideal high-pass filter for $1/4 < f < 1/2$ by the wavelet coefficients improves, as does the approximation to an ideal low-pass filter by the scaling coefficients. The Haar wavelet filter is seen to be a poor approximation to an ideal band-pass filter for all scales shown.

Another interesting feature to point out is the leakage of the shorter wavelet filters. Because a high portion of low frequencies is being captured in each scale, one may observe a fair amount of low frequency structure at smaller scales; see, e.g., Percival and Guttorp (1994). This is due to the poor approximation to an ideal band-pass filter by the analyzing wavelet. However, unlike spectral analysis, where leakage must be dealt with using tapering or other pre-processing of the data, an easy way to eliminate (or at least suppress) leakage is to increase the length of the wavelet filter. In practice, it is a good idea to perform a wavelet decomposition using filters of varying lengths, in order to determine if leakage is present.
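One way to quantify leakage is to measure how much of a unit scale wavelet filter's squared gain falls below the nominal pass-band $1/4 \le f \le 1/2$. The sketch below does this for the Haar and D(4) filters with a simple Riemann sum; the function names and the leakage measure itself are illustrative choices made for this example, not quantities defined in the dissertation.

```python
import numpy as np


def squared_gain(filt, f):
    """Squared gain |sum_l filt_l exp(-i 2 pi f l)|^2 at frequencies f."""
    filt = np.asarray(filt, dtype=float)
    l = np.arange(filt.size)
    return np.abs(np.exp(-2j * np.pi * np.outer(f, l)) @ filt) ** 2


def leakage_fraction(h, n_grid=4000):
    """Share of the squared gain on 0 <= f < 1/4, out of 0 <= f < 1/2."""
    f = np.linspace(0.0, 0.5, n_grid, endpoint=False)
    gain = squared_gain(h, f)
    return gain[f < 0.25].sum() / gain.sum()


r2, r3 = np.sqrt(2.0), np.sqrt(3.0)
haar = np.array([1.0, -1.0]) / r2
d4 = np.array([1 - r3, -3 + r3, 3 + r3, -1 - r3]) / (4 * r2)
```

For the Haar filter the squared gain is $2\sin^2(\pi f)$, so the share works out analytically to $1/2 - 1/\pi$; the D(4) filter gives a smaller share, in line with the observation that longer filters suppress leakage.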
Figure 3.2: Squared gain functions for the Haar, D(4) and LA(8) wavelet filters associated with scales $2^{j-1}$, $j = 1, \ldots, 4$ (one row of panels per filter, one column per level). The solid line denotes the wavelet filter and the dashed line denotes the scaling filter, while the vertical dotted lines denote the frequency bands for an ideal band-pass filter.
3.2 The Partial Discrete Wavelet Transform

3.2.1 Definition
Here we introduce notation and concepts in order to compute the partial DWT of a vector of observations. This section closely follows previous definitions of the DWT by, for example, Percival and Mofjeld (1997) and McCoy, Walden, and Percival (1998). A much more thorough introduction can soon be found in Percival and Walden (1999, Ch. 4).

Let $\{h_1\} \equiv \{h_{1,0}, \ldots, h_{1,L-1}\}$ denote the wavelet filter coefficients of a Daubechies compactly supported wavelet and let $\{g_1\} \equiv \{g_{1,0}, \ldots, g_{1,L-1}\}$ be the corresponding scaling filter coefficients, defined via an analogous relationship to Equation (3.1), specifically,
\[
g_{1,m} = (-1)^{m+1} h_{1,L-1-m}.
\]
The wavelet filter $\{h_1\}$ is associated with unit scale and we assume it satisfies Equations (3.2) and (3.3). For any dyadic sample size $N \ge L$, let
\[
H_{1,k} = \sum_{m=0}^{N-1} h_{1,m}\, e^{-i2\pi mk/N}, \quad k = 0, \ldots, N-1,
\]
be the discrete Fourier transform (DFT) of $\{h_1\}$ (cf. Equation (A.1)) and let $G_{1,k}$ denote the DFT of $\{g_1\}$. Now define the wavelet filter $\{h_j\}$ for scale $\lambda_j \equiv 2^{j-1}$ as the inverse DFT of
\[
H_{j,k} = H_{1,\,2^{j-1}k \bmod N} \prod_{l=0}^{j-2} G_{1,\,2^l k \bmod N}, \quad k = 0, \ldots, N-1,
\]
(cf. Equation (3.6)). The resulting wavelet filter associated with scale $\lambda_j$ has length $L_j \equiv (2^j - 1)(L - 1) + 1$. Also, define the scaling filter $\{g_J\}$ for scale $\lambda_J$ as the inverse DFT of
\[
G_{J,k} = \prod_{l=0}^{J-1} G_{1,\,2^l k \bmod N}, \quad k = 0, \ldots, N-1,
\]
(cf. Equation (3.6)).

Let $\mathcal{W}$ be an $N \times N$ matrix defining a $J$th order partial orthonormal DWT based upon a Daubechies wavelet filter of even length $L \le N$. The rows of $\mathcal{W}$ consist of circularly shifted (by multiples of 2) versions of the zero-padded wavelet filters for scale $\lambda_j$, defined via
\[
\mathbf{h}_j \equiv [\, h_{j,0}, 0, \ldots, 0, h_{j,L_j-1}, h_{j,L_j-2}, \ldots, h_{j,2}, h_{j,1} \,]^T, \tag{3.7}
\]
where the non-zero wavelet filter coefficients are in reverse order. Constructing a matrix from all possible circular shifts, at a particular scale $\lambda_j$, of Equation (3.7) yields the sub-matrix $\mathcal{W}_j$. This allows us to think of the orthonormal matrix $\mathcal{W}$ as being comprised of several sub-matrices, each one stacked on top of the other, such that $\mathcal{W} = [\mathcal{W}_1, \ldots, \mathcal{W}_J, \mathcal{V}_J]^T$. For example, when $L = 4$ and $N > 4$ we get
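\[
\mathcal{W}_1 = \left[
\begin{array}{ccccccccccccc}
h_{1,1} & h_{1,0} & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0 & h_{1,3} & h_{1,2} \\
h_{1,3} & h_{1,2} & h_{1,1} & h_{1,0} & 0 & 0 & \cdots & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & h_{1,3} & h_{1,2} & h_{1,1} & h_{1,0} & \cdots & 0 & 0 & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & h_{1,3} & h_{1,2} & h_{1,1} & h_{1,0} & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & h_{1,3} & h_{1,2} & h_{1,1} & h_{1,0}
\end{array}
\right], \tag{3.8}
\]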
where $\mathcal{W}_1$ is an $N/2 \times N$ matrix whose rows are $\mathbf{h}_1$ circularly shifted by $2m - 1$ for $m = 1, \ldots, N/2$. The remaining sub-matrices $\mathcal{W}_2, \ldots, \mathcal{W}_J$ are defined similarly to Equation (3.8), being shifted by $2^j m - 1$ for $m = 1, \ldots, N/2^j$, and $\mathcal{V}_J$ is identical in dimension to $\mathcal{W}_J$ but contains circularly shifted versions of $\mathbf{g}_J$, instead of $\mathbf{h}_J$, by $2^J m - 1$ for $m = 1, \ldots, N/2^J$. In practice, the rows of the matrix $\mathcal{W}$ are not explicitly constructed, but instead the DWT is implemented via a pyramid algorithm (Mallat 1989) that applies the wavelet and scaling filters to the input series and subsamples the output one scale at a time.

When applied to a vector of observations $\mathbf{X}$, the DWT yields $N$ wavelet coefficients $\mathbf{W} = \mathcal{W}\mathbf{X}$, which can be organized into $J + 1$ vectors $\mathbf{W} = [\mathbf{W}_1, \ldots, \mathbf{W}_J, \mathbf{V}_J]^T$, similar to $\mathcal{W}$ above, where $\mathbf{W}_j$ is a length $N/2^j$ vector of wavelet coefficients associated with changes on a scale of length $2^{j-1}$ and $\mathbf{V}_J$ is a length $N/2^J$ vector of scaling coefficients associated with averages on a scale of length $2^J$.

3.2.2 Analysis of Variance
Like the DFT, orthonormality of the matrix $\mathcal{W}$ implies that the DWT is an energy preserving transform so that $\|\mathbf{W}\|^2 = \|\mathbf{X}\|^2$. This can be easily proven through basic matrix manipulation via
\[
\|\mathbf{X}\|^2 = \mathbf{X}^T\mathbf{X} = (\mathcal{W}^T\mathbf{W})^T \mathcal{W}^T\mathbf{W} = \mathbf{W}^T \mathcal{W}\mathcal{W}^T \mathbf{W} = \mathbf{W}^T\mathbf{W} = \|\mathbf{W}\|^2.
\]
Given the structure of the wavelet coefficients, the energy in $\mathbf{X}$ is decomposed on a scale by scale basis via
\[
\|\mathbf{X}\|^2 = \sum_{j=1}^{J} \|\mathbf{W}_j\|^2 + \|\mathbf{V}_J\|^2,
\]
where $\|\mathbf{W}_j\|^2$ is the energy of $\mathbf{X}$ due to changes at scale $\lambda_j$ and $\|\mathbf{V}_J\|^2$ is the energy due to changes at scales $\lambda_J$ and higher. This property is exploited in later sections to define the wavelet variance (Section 3.4) and the wavelet covariance and correlation (Chapter 5).
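The pyramid algorithm and the scale by scale energy decomposition can be illustrated with the Haar filter, for which each stage reduces to scaled differences and sums of adjacent pairs. This is a minimal sketch for the Haar case only; the function name `haar_dwt` is an illustrative choice.

```python
import numpy as np


def haar_dwt(x, J):
    """Partial Haar DWT computed by the pyramid algorithm.

    Returns ([W_1, ..., W_J], V_J). The sample size must be a multiple of 2**J.
    """
    v = np.asarray(x, dtype=float)
    if v.size % (2 ** J) != 0:
        raise ValueError("sample size must be a multiple of 2**J")
    details = []
    for _ in range(J):
        even, odd = v[0::2], v[1::2]
        details.append((odd - even) / np.sqrt(2.0))   # wavelet coefficients
        v = (odd + even) / np.sqrt(2.0)               # scaling coefficients
    return details, v
```

Summing the squared entries of every returned vector reproduces the squared norm of the input, and a constant series produces wavelet coefficients that are all zero, with the entire energy carried by the scaling coefficients.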
3.3 The Maximal Overlap Discrete Wavelet Transform

3.3.1 Comparison with the DWT
The DWT is a very useful operation, but does not possess all the attributes which may be desirable for certain applications. In response to this, an alternative wavelet transform has been developed: the maximal overlap discrete wavelet transform (MODWT). The MODWT gives up orthogonality in order to gain other features the DWT does not possess. It does this by not subsampling the filtered output at each scale. A consequence of this is that the wavelet and scaling coefficients must be rescaled in order to retain the energy preserving property of the DWT. The following properties are important in distinguishing the MODWT from the DWT (Percival and Mofjeld 1997):

1. The MODWT can handle any sample size $N$, while the $J$th order partial DWT restricts the sample size to a multiple of $2^J$.

2. The details and smooths of a MODWT multiresolution analysis are associated with zero phase filters. This means events in the original time series may be properly aligned with features in a multiresolution analysis.

3. The MODWT is invariant to circularly shifting the original time series. Hence, shifting the time series by a given amount will simply shift the MODWT wavelet and scaling coefficients the same amount. This property does not hold for the DWT.

4. While both the DWT and MODWT can perform an analysis of variance on a time series, the MODWT wavelet variance estimator is asymptotically more efficient than the same estimator based on the DWT (Percival 1995).

The transform goes by several names in the statistical and engineering literature, such as the stationary DWT and translation-invariant DWT; see Percival and Mofjeld (1997) for more details.

3.3.2 Definition
The brief introduction presented here follows from Percival and Mofjeld (1997). A thorough discussion of the MODWT will appear in Percival and Walden (1999, Ch. 5).
The notation follows from the DWT, with the $J$th order partial MODWT being defined by $\widetilde{\mathbf{W}} = \widetilde{\mathcal{W}}\mathbf{X}$, where $\widetilde{\mathbf{W}}$ is composed of $J + 1$ length $N$ vectors, $\widetilde{\mathbf{W}}_1, \ldots, \widetilde{\mathbf{W}}_J$ and $\widetilde{\mathbf{V}}_J$, which can be arranged in the following manner
\[
\widetilde{\mathbf{W}} = \left[ \widetilde{\mathbf{W}}_1, \widetilde{\mathbf{W}}_2, \ldots, \widetilde{\mathbf{W}}_J, \widetilde{\mathbf{V}}_J \right]^T .
\]
The vector of wavelet coefficients $\widetilde{\mathbf{W}}_j$ is associated with changes of length $\lambda_j = 2^{j-1}$ and $\widetilde{\mathbf{V}}_J$ is associated with averages of lengths $\lambda_J$ and higher. For time series of dyadic length, the MODWT can be subsampled and rescaled to obtain the DWT.

Similar to the matrix $\mathcal{W}$ for the DWT, the matrix $\widetilde{\mathcal{W}}$ is also made up of $J + 1$ sub-matrices, each of them $N \times N$, and may be expressed as $\widetilde{\mathcal{W}} = [\widetilde{\mathcal{W}}_1, \ldots, \widetilde{\mathcal{W}}_J, \widetilde{\mathcal{V}}_J]^T$. In this case, when $L = 4$ and $N > 4$, we have
\[
\widetilde{\mathcal{W}}_1 =
\begin{bmatrix}
\tilde h_{1,0} & 0 & 0 & 0 & 0 & \cdots & 0 & \tilde h_{1,3} & \tilde h_{1,2} & \tilde h_{1,1} \\
\tilde h_{1,1} & \tilde h_{1,0} & 0 & 0 & 0 & \cdots & 0 & 0 & \tilde h_{1,3} & \tilde h_{1,2} \\
\tilde h_{1,2} & \tilde h_{1,1} & \tilde h_{1,0} & 0 & 0 & \cdots & 0 & 0 & 0 & \tilde h_{1,3} \\
\tilde h_{1,3} & \tilde h_{1,2} & \tilde h_{1,1} & \tilde h_{1,0} & 0 & \cdots & 0 & 0 & 0 & 0 \\
0 & \tilde h_{1,3} & \tilde h_{1,2} & \tilde h_{1,1} & \tilde h_{1,0} & \cdots & 0 & 0 & 0 & 0 \\
\vdots & & & & & \ddots & & & & \vdots \\
0 & 0 & 0 & 0 & 0 & \cdots & \tilde h_{1,3} & \tilde h_{1,2} & \tilde h_{1,1} & \tilde h_{1,0}
\end{bmatrix},
\tag{3.9}
\]
where $\widetilde{\mathcal{W}}_1$ is an $N \times N$ matrix, and the rows of the matrix are simply the rescaled wavelet filter coefficients $\tilde h_1 = h_1 / 2^{1/2}$ circularly shifted by $m - 1$ for $m = 1, \ldots, N$. In general, let $\tilde h_j \equiv h_j / 2^{j/2}$ and $\tilde g_J \equiv g_J / 2^{J/2}$ be, respectively, the rescaled wavelet and scaling filter coefficients required to construct $\widetilde{\mathcal{W}}$. The remaining sub-matrices $\widetilde{\mathcal{W}}_2, \ldots, \widetilde{\mathcal{W}}_J$ are constructed similarly to Equation (3.9), and $\widetilde{\mathcal{V}}_J$ has the same structure as $\widetilde{\mathcal{W}}_J$, only using circularly shifted scaling coefficients instead of the wavelet coefficients. Circular shifting for all scales is identical to that of Equation (3.9). In practice, a pyramid scheme is utilized similar to that of the DWT; see, e.g., Percival and Walden (1999, Sec. 5.4).

3.3.3 Analysis of Variance
Percival and Mofjeld (1997) proved that the MODWT is an energy preserving transform and, just as with the DWT, the total energy of a time series can be partitioned using the MODWT wavelet and scaling coefficients; i.e.,
\[
\|\mathbf{X}\|^2 = \sum_{j=1}^{J} \|\widetilde{\mathbf{W}}_j\|^2 + \|\widetilde{\mathbf{V}}_J\|^2 .
\]
This will allow us to construct MODWT versions of the wavelet variance (Section 3.4) and the wavelet covariance and correlation (Chapter 5).
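Both the energy preservation and the shift invariance of the MODWT can be illustrated with a few lines of code. The sketch below assumes the Haar wavelet filter, for which each stage of the pyramid scheme reduces to a difference and an average of circularly lagged values; the helper `haar_modwt` is a name of our own choosing.

```python
import numpy as np

def haar_modwt(x, J):
    """Partial Haar MODWT of level J with circular filtering (illustrative helper)."""
    v = np.asarray(x, dtype=float)
    details = []
    for j in range(1, J + 1):
        lagged = np.roll(v, 2 ** (j - 1))   # element t holds v[t - 2^(j-1) mod N]
        details.append((v - lagged) / 2.0)  # rescaled wavelet coefficients
        v = (v + lagged) / 2.0              # rescaled scaling coefficients
    return details, v

rng = np.random.default_rng(0)
x = rng.standard_normal(13)                 # any sample size is allowed
W, V = haar_modwt(x, J=3)
modwt_energy = sum(float(np.sum(w ** 2)) for w in W) + float(np.sum(V ** 2))

# Circularly shifting the input shifts every coefficient vector by the same amount.
W_shift, V_shift = haar_modwt(np.roll(x, 5), J=3)
shift_invariant = all(np.allclose(ws, np.roll(w, 5)) for ws, w in zip(W_shift, W))
print(modwt_energy, float(np.sum(x ** 2)), shift_invariant)
```

Note that the sample size here is 13, which a partial DWT of level 3 could not handle.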
3.4 Wavelet Variance

3.4.1 Definition
The wavelet transform has been used to decompose the variance of physical processes in many disciplines; see, e.g., Bradshaw and Spies (1992), Hudgins, Friehe, and Mayer (1993), and Wornell (1993). Percival (1995) investigated the concept of wavelet variance and showed that it decomposes the variance of a stationary process on a scale by scale basis. He also showed its asymptotic normality, thus allowing for approximate confidence intervals to be computed. We summarize some of his results and introduce notation which will be useful when defining the wavelet correlation in Section 5.1.2.

Let $\{X_t\}$ be a real valued Gaussian stationary process. The time independent wavelet variance is defined to be the variance of the wavelet coefficients at scale $\lambda_j$; i.e.,
\[
\nu_X^2(\lambda_j) \equiv \frac{1}{2\lambda_j} \mathrm{Var}\{W_{j,t}\}.
\tag{3.10}
\]
Percival (1995) showed that the wavelet variance decomposes the variance of $\{X_t\}$ on a scale by scale basis; i.e.,
\[
\sum_{j=1}^{\infty} \nu_X^2(\lambda_j) = \mathrm{Var}\{X_t\}.
\]
This is analogous to the spectral density function (Equation (B.2)), which decomposes the variance of $\{X_t\}$ on a frequency by frequency basis. We can form an unbiased estimator of the wavelet variance based upon the MODWT using
\[
\tilde\nu_X^2(\lambda_j) \equiv \frac{1}{\widetilde N_j} \sum_{t=L_j-1}^{N-1} \widetilde W_{j,t}^2 ,
\]
where $\widetilde N_j = N - L_j + 1$ and $L_j = (2^j - 1)(L - 1) + 1$. This can be seen by
\[
E\left\{\tilde\nu_X^2(\lambda_j)\right\} = \frac{1}{\widetilde N_j} \sum_{t=L_j-1}^{N-1} E\left\{\widetilde W_{j,t}^2\right\} = \frac{1}{2^j} E\left\{W_{j,t}^2\right\} = \frac{1}{2^j} \mathrm{Var}\{W_{j,t}\},
\]
which yields the result in Equation (3.10). The unbiased estimator based on the DWT is given by
\[
\hat\nu_X^2(\lambda_j) \equiv \frac{1}{2^j \widehat N_j} \sum_{t=L'_j}^{N_j-1} W_{j,t}^2 ,
\]
where $N_j = N/2^j$, $\widehat N_j = N_j - L'_j$ and $L'_j = \lceil (L - 2)(1 - 2^{-j}) \rceil$. We utilize the wavelet variance not only when defining the wavelet correlation between two time series, but also in Chapter 6 when analyzing time series from the physical sciences.

3.4.2 Equivalent Degrees of Freedom
The redundancy involved in the MODWT induces correlation in its wavelet coefficients. A useful concept to define is the equivalent degrees of freedom for a time series, which is based on a $\chi^2$ approximation to the distribution of (smoothed) periodogram ordinates; see, for example, Priestley (1981, pp. 466-468), Brillinger (1981, pp. 145-146) and Percival and Walden (1993, Sec. 6.10). The equivalent degrees of freedom concept has been used, for example, in estimating the statistical bandwidth of a time series (Walden and White 1990) and establishing confidence intervals for the wavelet variance based on the MODWT (Percival 1995).

Let $\{\widetilde W_{j,t}\}$ be a vector of MODWT wavelet coefficients associated with scale $\lambda_j$ for a real valued Gaussian process $\{X_t\}$ whose $d$th order backward difference is stationary, and let $L > 2d$. We define $\{s_{\tau,X}(\lambda_j)\}$ to be the known autocovariance sequence of $\{\widetilde W_{j,t}\}$. If we assume the MODWT estimator of wavelet variance $\tilde\nu_X^2(\lambda_j)$ can be approximated via
\[
\tilde\nu_X^2(\lambda_j) \overset{d}{=} b_j \chi^2_{\eta_j}
\tag{3.13}
\]
(Rice 1945), where "$\overset{d}{=}$" means equal in distribution, we obtain
\[
\eta_j = \frac{\widetilde N_j \, \nu_X^4(\lambda_j)}{\sum_{\tau=-(\widetilde N_j - 1)}^{\widetilde N_j - 1} \left( 1 - \frac{|\tau|}{\widetilde N_j} \right) s_{\tau,X}^2(\lambda_j)}
\tag{3.14}
\]
as an expression for the equivalent degrees of freedom of a vector of length $\widetilde N_j$ MODWT wavelet coefficients for scale $\lambda_j$ (analogous to Equation (B.4)) and
\[
b_j = \frac{\nu_X^2(\lambda_j)}{\eta_j}
\]
for the multiplicative constant. We can think of the equivalent degrees of freedom as a measure of the "sample size" for a time series with unimodal spectrum.

For our purposes, we want to adjust hypothesis testing procedures that assume uncorrelated observations. This is of interest to atmospheric scientists also; see, e.g., Bretherton et al. (1998). Specifically, substituting the equivalent degrees of freedom for a vector of MODWT wavelet coefficients for the true sample size may compensate for their correlation structure and allow the use of critical values based on sequences of independent random variables. This allows us to use the MODWT in testing homogeneity of variance, on a scale by scale basis, in Section 4.4.1.
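As a concrete companion to the MODWT estimator of Section 3.4.1, whose correlated coefficients motivate the equivalent degrees of freedom, the following sketch computes $\tilde\nu_X^2(\lambda_j)$ and $\widetilde N_j$ for simulated white noise. It assumes the Haar wavelet filter ($L = 2$); the function name and the simulation settings are ours and serve only as an illustration.

```python
import numpy as np

def haar_modwt_wavelet_variance(x, J):
    """Return [(N_tilde_j, estimate_j)] for j = 1..J using the Haar filter (L = 2)."""
    v = np.asarray(x, dtype=float)
    N, L = v.size, 2
    results = []
    for j in range(1, J + 1):
        lagged = np.roll(v, 2 ** (j - 1))
        w = (v - lagged) / 2.0               # level-j MODWT wavelet coefficients
        v = (v + lagged) / 2.0               # level-j MODWT scaling coefficients
        L_j = (2 ** j - 1) * (L - 1) + 1     # width of the level-j filter
        if L_j > N:
            raise ValueError("series too short for the requested level")
        interior = w[L_j - 1:]               # coefficients unaffected by the boundary
        results.append((interior.size, float(np.mean(interior ** 2))))
    return results

rng = np.random.default_rng(42)
noise = rng.standard_normal(4096)            # unit-variance Gaussian white noise
estimates = haar_modwt_wavelet_variance(noise, J=3)
for j, (n_tilde, nu2) in enumerate(estimates, start=1):
    print(j, n_tilde, nu2, 2.0 ** (-j))      # estimate versus the theoretical 1/2^j
```

For unit-variance white noise the wavelet variance at scale $\lambda_j$ is $1/2^j$, so the estimates should fall close to 0.5, 0.25 and 0.125.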
Table 3.2: Equivalent degrees of freedom for the MODWT of white noise (N = 512) using the Haar, D(4), LA(8), and LA(16) wavelet filters. The numbers in parentheses are $\widetilde N_j$, the number of MODWT wavelet coefficients unaffected by the boundary conditions.

Wavelet   Level 1       Level 2       Level 3       Level 4      Level 5      Level 6
Haar      341.8 (511)   293.3 (509)   179.0 (505)   95.0 (497)   48.6 (481)   24.8 (449)
D(4)      312.6 (509)   244.1 (503)   129.9 (491)   65.6 (467)   33.2 (419)   16.9 (323)
LA(8)     293.9 (505)   204.0 (491)   103.1 (463)   51.9 (407)   26.3 (295)   13.5 (71)

For a particular scale $\lambda_j$, the equivalent degrees of freedom $\eta_j$ of the wavelet filter can be computed by first obtaining the autocovariance sequence $\{s_{\tau,X}(\lambda_j)\}$. Assuming a MODWT of Gaussian white noise with unit variance, this is simply the inverse DFT of the squared gain function $\mathcal{H}_j(\cdot)$ of the wavelet filter (cf. Section A.3). The number of non-zero terms in $\{s_{\tau,X}(\lambda_j)\}$ is quite small and can be determined from the length of the original wavelet filter. Plugging $\{s_{\tau,X}(\lambda_j)\}$ into Equation (3.14) gives the equivalent degrees of freedom for the wavelet filter associated with scale $\lambda_j$. Table 3.2 gives the equivalent degrees of freedom for the MODWT wavelet coefficients, using several wavelet filters, for scales $\lambda_j$, $j = 1, \ldots, 6$. As the amount of correlation increases at each scale of the MODWT, the autocovariance involves more and more non-zero terms, forcing the equivalent degrees of freedom down.

A large sample approximation for $\eta_j / \widetilde N_j$ is possible by utilizing Cesaro summability (Titchmarsh 1939, p. 411) in the denominator of Equation (3.14) to yield, for large $\widetilde N_j$,
\[
\frac{\eta_j}{\widetilde N_j} = \frac{\nu_X^4(\lambda_j)}{\sum_{\tau=-\infty}^{\infty} s_{\tau,X}^2(\lambda_j)} = \left( 1 + \frac{2 \sum_{\tau=1}^{\infty} s_{\tau,X}^2(\lambda_j)}{\nu_X^4(\lambda_j)} \right)^{-1} .
\]
If $L_j \ge 2$, then the ratio must be smaller than 1 and thus, any application of the MODWT will result in a decrease in the effective sample size. Table 3.3 gives the large sample approximation to the ratio $\eta_j / \widetilde N_j$ for MODWT wavelet coefficients applied to white noise.

Table 3.3: Large sample approximation to the ratio of equivalent degrees of freedom $\eta_j / \widetilde N_j$.

Wavelet   Level 1   Level 2   Level 3   Level 4   Level 5   Level 6
Haar      0.6667    0.5714    0.3478    0.1839    0.0933    0.0468
D(4)      0.6095    0.4752    0.2521    0.1266    0.0633    0.0317
LA(8)     0.5730    0.3968    0.1999    0.0999    0.0500    0.0250

Comparing the results from Table 3.2 for a sample size of N = 512, the large sample approximation gives $\eta_1 = 341.3$ versus 341.8 (a difference of 1/2 df) for the unit scale Haar wavelet filter. The difference between the two methods is less than 1 df for all other scales and wavelet filters. We therefore recommend the large sample approximation for moderate to large sample sizes in practice, keeping in mind the estimate will be slightly conservative.

To see how well Equation (3.13) holds for MODWT wavelet coefficients, a small Monte Carlo study (100 iterations) was performed to simulate the MODWT wavelet variance $\tilde\nu_X^2(\lambda_j)$ and compare the simulated values to appropriate $\chi^2$ distributions. Sequences of Gaussian white noise (N = 512) were simulated, and a partial MODWT (J = 6), using the Haar wavelet filter, was applied to them. Wavelet coefficients affected by the boundary were discarded and the variance for each scale calculated. Figure 3.3 shows quantile-quantile plots for the estimated variances against $\chi^2$ distributions with degrees of freedom given in the first row of Table 3.2. Even for the higher scales, where there are relatively few degrees of freedom involved, the MODWT estimates of wavelet variance $\tilde\nu_X^2(\lambda_j)$ appear to follow the approximation given in Equation (3.13).
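The large sample ratios in Table 3.3 can be recomputed directly from the filter coefficients. The sketch below is one way to do so, under the assumption that the level $j$ wavelet filter is formed by the usual cascade of upsampled scaling filters followed by the upsampled wavelet filter; because the ratio is scale free, the MODWT rescaling by $2^{j/2}$ can be skipped. All function names are ours.

```python
import numpy as np

def upsample(filt, factor):
    """Insert factor - 1 zeros between consecutive filter coefficients."""
    out = np.zeros((len(filt) - 1) * factor + 1)
    out[::factor] = filt
    return out

def level_j_wavelet_filter(h, g, j):
    """Equivalent level-j filter: g, g upsampled by 2, ..., then h upsampled by 2^(j-1)."""
    filt = upsample(h, 2 ** (j - 1))
    for k in range(j - 1):
        filt = np.convolve(filt, upsample(g, 2 ** k))
    return filt

def large_sample_edof_ratio(h, g, j):
    """Squared lag-0 autocovariance over the sum of all squared autocovariances."""
    h_j = level_j_wavelet_filter(np.asarray(h, float), np.asarray(g, float), j)
    acvs = np.correlate(h_j, h_j, mode="full")  # white-noise input, lags -(L_j-1)..(L_j-1)
    return acvs[h_j.size - 1] ** 2 / np.sum(acvs ** 2)

root2, root3 = np.sqrt(2.0), np.sqrt(3.0)
haar_g = np.array([1.0, 1.0]) / root2
haar_h = np.array([1.0, -1.0]) / root2
d4_g = np.array([1 + root3, 3 + root3, 3 - root3, 1 - root3]) / (4 * root2)
d4_h = np.array([d4_g[3], -d4_g[2], d4_g[1], -d4_g[0]])  # quadrature mirror of d4_g

haar_ratios = [large_sample_edof_ratio(haar_h, haar_g, j) for j in range(1, 7)]
d4_ratios = [large_sample_edof_ratio(d4_h, d4_g, j) for j in range(1, 7)]
print(np.round(haar_ratios, 4))
print(np.round(d4_ratios, 4))
```

Multiplying the first Haar ratio, 2/3, by N = 512 gives the value 341.3 quoted above.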
[Figure 3.3: six panels, Level 1 through Level 6; x-axis: Quantiles of the Chi-square Distribution; y-axis: Rescaled Wavelet Variance.]

Figure 3.3: Quantile-quantile plots for the MODWT wavelet variance, using the Haar wavelet filter, against a $\chi^2$ distribution with degrees of freedom taken from the first row of Table 3.2. The wavelet variance has been adjusted at each scale in order to more easily compare it with the $\chi^2$ distribution.
The exact distribution of the MODWT wavelet variance $\tilde\nu_X^2(\lambda_j)$ can be found using the theory of quadratic forms of normal (Gaussian) random variables. Percival (1983, Sec. 2.5) investigated this for the Allan variance (Allan 1966), which is proportional to the wavelet variance using the Haar wavelet filter. Let us define
\[
Q \equiv \frac{\widetilde N_j \, \tilde\nu_X^2(\lambda_j)}{\nu_X^2(\lambda_j)} = \frac{\sum_{t=L_j-1}^{N-1} \widetilde W_{j,t}^2}{\nu_X^2(\lambda_j)}
\tag{3.15}
\]
to be the quadratic form of interest. Now, we may rewrite Equation (3.15) as
\[
Q = \sum_{i=1}^{\widetilde N_j} \alpha_i U_i^2 ,
\]
where $\{U_i^2\}$ are independent $\chi^2$ random variables with 1 degree of freedom each and $\{\alpha_i\}$ are the eigenvalues of the autocorrelation matrix $B$; see, e.g., Johnson and Kotz (1970, Ch. 29). The matrix $B$ is band diagonal and computed by dividing the inverse DFT of the squared gain function for scale $\lambda_j$ (i.e., the autocovariance sequence $\{s_{\tau,X}(\lambda_j)\}$) by the wavelet variance $\nu_X^2(\lambda_j)$. To evaluate the distribution function of $Q$ we utilize the method of Imhof (1961), where the characteristic function of $Q$ is numerically inverted. The exact distribution for $Q$ was obtained from a combination of S-Plus and FORTRAN code graciously provided by Professor R. Lockhart.

We are not interested in the distribution of $Q$ per se, but instead are interested in the distribution of the wavelet variance. This is easily obtained from the software using Equation (3.15) to obtain
\[
P\{Q \le q_p\} = P\left\{ \frac{\widetilde N_j \, \tilde\nu_X^2(\lambda_j)}{\nu_X^2(\lambda_j)} \le q_p \right\} = P\left\{ \tilde\nu_X^2(\lambda_j) \le \frac{\nu_X^2(\lambda_j) \, q_p}{\widetilde N_j} \right\} = p,
\]
where $q_p$ is the $p$th quantile of the distribution of $Q$. For comparison, we note the corresponding distribution of the wavelet variance assuming the equivalent degrees of freedom argument (Equation (3.13)) is given by
\[
P\left\{ \tilde\nu_X^2(\lambda_j) \le b_j \chi_p \right\} = p,
\]
[Figure 3.4a: six panels; x-axis: Wavelet Variance; y-axis: Cumulative Distribution Function. Panel labels: Level 1, edof = 85.6; Level 2, edof = 73.6; Level 3, edof = 45.2; Level 4, edof = 24.3; Level 5, edof = 12.7; Level 6, edof = 6.8.]

Figure 3.4a: Cumulative distribution functions for the MODWT wavelet variance, using the Haar wavelet filter. A sample size of N = 128 was used, with equivalent degrees of freedom given in each panel. The solid line uses the exact method while the dotted line uses the equivalent degrees of freedom method.

[Figure 3.4b: five panels; legend: Exact, EDOF; x-axis: Wavelet Variance; y-axis: Cumulative Distribution Function. Panel labels: Level 1, edof = 156.3; Level 2, edof = 122.2; Level 3, edof = 65.2; Level 4, edof = 33.1; Level 5, edof = 16.9.]

Figure 3.4b: Cumulative distribution functions for the MODWT wavelet variance, using the D(4) wavelet filter. A sample size of N = 256 was used, with equivalent degrees of freedom given in each panel. The solid line uses the exact method while the dotted line uses the equivalent degrees of freedom method.

[Figure 3.4c: four panels; legend: Exact, EDOF; x-axis: Wavelet Variance; y-axis: Cumulative Distribution Function. Panel labels: Level 1, edof = 146.9; Level 2, edof = 102.2; Level 3, edof = 51.8; Level 4, edof = 26.2.]

Figure 3.4c: Cumulative distribution functions for the MODWT wavelet variance, using the LA(8) wavelet filter. A sample size of N = 256 was used, with equivalent degrees of freedom given in each panel. The solid line uses the exact method while the dotted line uses the equivalent degrees of freedom method.
where $\chi_p$ is the $p$th quantile from a $\chi^2$ distribution with $\eta_j$ degrees of freedom. The results for the Haar wavelet filter are given in Figure 3.4a, with the D(4) and LA(8) wavelet filters in Figures 3.4b-c, respectively. We see the distributions agree very well for the smaller scales, where the two curves are virtually on top of one another. However, they begin to diverge slightly for higher scales (small equivalent degrees of freedom). The software limited the maximum sample size that could be analyzed. This is the reason for displaying fewer scales with respect to the D(4) and LA(8) wavelet filters.

Although the distributions based on the equivalent degrees of freedom argument do not follow the true distribution for some scales, it is difficult to determine the impact this would have when using the equivalent degrees of freedom argument to modify hypothesis tests for homogeneity of variance. This point is explicitly investigated in Section 4.4.1, where the cumulative sum of squares test statistic formed with MODWT wavelet coefficients is adjusted using the equivalent degrees of freedom on a scale by scale basis. Simulations are run to compare the empirical size of this hypothesis testing procedure using asymptotic critical values from the DWT applied to white noise.
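The connection between the quadratic form $Q$ and Equation (3.14) can be made explicit by matching moments: a weighted sum of independent $\chi^2_1$ variables has mean equal to the sum of the weights and variance equal to twice the sum of their squares. The sketch below, which assumes Haar MODWT coefficients of white noise at unit scale with $\widetilde N_1 = 127$, builds the autocorrelation matrix $B$, takes its eigenvalues, and compares $2 E\{Q\}^2 / \mathrm{Var}\{Q\}$ with Equation (3.14). The helper names are ours, and the number produced need not coincide exactly with the value printed in Figure 3.4a, which may rest on slightly different conventions.

```python
import numpy as np

def haar_modwt_acvs(j):
    """Autocovariances at lags 0..L_j-1 of level-j Haar MODWT coefficients of unit white noise."""
    half = 2 ** (j - 1)
    filt = np.concatenate([np.ones(half), -np.ones(half)]) / 2.0 ** j
    full = np.correlate(filt, filt, mode="full")
    return full[filt.size - 1:]

def autocorrelation_matrix(acvs, n):
    """Band diagonal Toeplitz matrix B with entries rho_{|s - t|}."""
    rho = np.zeros(n)
    m = min(n, acvs.size)
    rho[:m] = acvs[:m] / acvs[0]
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return rho[lags]

def edof_eq_3_14(acvs, n):
    """Equivalent degrees of freedom computed directly from Equation (3.14)."""
    taus = np.arange(-(n - 1), n)
    s = np.zeros(taus.size)
    inside = np.abs(taus) < acvs.size
    s[inside] = acvs[np.abs(taus[inside])]
    return n * acvs[0] ** 2 / np.sum((1.0 - np.abs(taus) / n) * s ** 2)

n_tilde = 127                                   # N = 128, Haar filter, unit scale
acvs = haar_modwt_acvs(1)
alpha = np.linalg.eigvalsh(autocorrelation_matrix(acvs, n_tilde))
mean_q = alpha.sum()                            # E{Q}
var_q = 2.0 * np.sum(alpha ** 2)                # Var{Q}
edof_moments = 2.0 * mean_q ** 2 / var_q        # chi-square moment matching
edof_formula = edof_eq_3_14(acvs, n_tilde)
print(mean_q, edof_moments, edof_formula)
```

The two values of the equivalent degrees of freedom agree, so the $\chi^2$ approximation in Equation (3.13) reproduces the first two moments of the exact distribution; the Imhof inversion is only needed for the remaining shape differences.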
Chapter 4

TESTING HOMOGENEITY OF VARIANCE

Suppose we have a time series that we are considering modeling as a realization of one portion $Y_1, \ldots, Y_N$ of a stationary Gaussian fractional difference process $\{Y_t\}$ defined by Equation (2.3). An important assumption behind any stationary process is that its variance is a constant independent of the time index $t$. In the context of short memory models, such as stationary autoregressive and moving average (ARMA) processes, a number of tests have been proposed for homogeneity of variance. For a time series consisting of either independent Gaussian random variables with zero mean and possibly time-dependent variances $\sigma_t^2$ or a moving average of such variables, Nuri and Herbst (1969) proposed to test the hypothesis that $\sigma_t^2$ is constant for all $t$ by using the periodogram of the squared random variables. Wichern, Miller, and Hsu (1976) proposed a moving block procedure for detecting a single change of variance at an unknown time point in an autoregressive model of order one. Hsu (1977, 1979) studied the detection of a variance shift at a single unknown point in a sequence of independent observations. Davis (1979) studied tests for a shift in the innovation variance of an autoregressive process at a specified point. Abraham and Wei (1984) used a Bayesian framework to study changes in the innovation variance of an ARMA process. Tsay (1988) looked at detecting several types of disturbances in time series, among them variance changes, by analyzing the residuals from fitting an ARMA model. Srivastava (1993) found the cumulative sum of squares procedure to perform better than the exponentially weighted moving average procedure for detecting an increase in variance in white noise sequences. Inclan and Tiao (1994) investigated the detection of multiple changes of variance in sequences of independent Gaussian random variables by recursively applying a cumulative sum of squares test to pieces of the original series. Using a similar recursive scheme, Chen and Gupta (1997) applied an information criterion test to the problem of multiple variance changes. None of the above tests have been adapted to work with long memory processes.

The DWT has already proven useful for investigating other types of nonstationary events. For example, Wang (1995) tested wavelet coefficients at fine scales to detect jumps and sharp cusps of signals embedded in Gaussian white noise, and Ogden and Parzen (1996) used wavelet coefficients to develop a data-dependent threshold for removing noise from a signal. The key property of the DWT that makes it useful for studying possible nonstationarities is that it transforms a time series into coefficients that reflect changes at various scales and at particular times. For fractional difference and related long memory processes, the DWT wavelet coefficients for a given scale are approximately uncorrelated; see Section 4.1. We show here that this approximation is good enough that a test designed for a null hypothesis of white noise can be used for testing homogeneity of variance in a long memory process on a scale by scale basis. An additional advantage of testing the output from the DWT is that the scale at which the inhomogeneity occurs can be identified. Using a variation of the DWT, the maximal overlap discrete wavelet transform (MODWT) (Section 3.3), we also investigate an auxiliary test statistic that can estimate the time at which the variance of a time series changes on a particular scale.

Here, we demonstrate how the DWT can be used to construct a test for homogeneity of variance in a time series exhibiting long memory characteristics. We begin by performing a spectral analysis of DWT wavelet coefficients, when applied to both short and long memory processes, in order to establish the approximate decorrelation property required for testing the statistical hypothesis of homogeneity of variance. We then introduce the normalized cumulative sum of squares test statistic. Its asymptotic and small sample distributions, along with its relationship to data analytic thresholding (Ogden and Parzen 1996), are investigated. Empirical size and power calculations are presented when detecting single and multiple variance change points. The ability of the cumulative sum of squares test statistic to detect a change in the long memory parameter of a fractional difference process is briefly investigated. Applications can be found in Section 6.1, where we analyze the annual minimum water levels of the Nile River, and Section 6.2, where we investigate a series of measurements related to vertical shear in the ocean.
4.1 Spectral Analysis of DWT Wavelet Coefficients

4.1.1 Long Memory Processes
The ability of the wavelet transform to decorrelate time series, such as fractional difference processes, producing DWT wavelet coefficients for a given scale which are approximately uncorrelated is well known; see, e.g., Tewfik and Kim (1992), McCoy and Walden (1996) and Wornell (1996). Here, we explore the output of the DWT when applied to a fractional difference process. Emphasis will be on the spectral properties of the DWT wavelet coefficients, instead of looking at the pairwise covariances between coefficients as in the references given above. Some information and notational conventions used here are taken from Percival and Walden (1999).

Let $\{X_t\}$ be a fractional difference process with spectrum $S_X(f) = |2\sin(\pi f)|^{-2d}$, for $|f| \le \frac{1}{2}$ (cf. Equation (2.3)). Let us restrict ourselves to looking at processes with difference parameters $0 < d < \frac{1}{2}$; i.e., stationary long memory processes. We know the vector of DWT wavelet coefficients $\mathbf{W}_1$ for unit scale is simply the original time series convolved with the wavelet filter $h_1$ and subsampled by two. Now, the spectrum of a subsampled process, say $Y_t = X_{2t}$, is known to be
\[
S_Y(f) = \frac{1}{2}\left[ S_X\left(\tfrac{1}{2}f\right) + S_X\left(\tfrac{1}{2}f + \tfrac{1}{2}\right) \right]
\tag{4.1}
\]
(Anderson 1971, p. 388), and from Section A.2, we know that filtering a time series corresponds to a multiplication of its spectrum with the squared gain function of the filter. Hence, the filtered coefficients have spectrum $\mathcal{H}_1(f) S(f)$, and
\[
S_1(f) = \frac{1}{2}\left[ \mathcal{H}_1\left(\tfrac{1}{2}f\right) S\left(\tfrac{1}{2}f\right) + \mathcal{H}_1\left(\tfrac{1}{2}f + \tfrac{1}{2}\right) S\left(\tfrac{1}{2}f + \tfrac{1}{2}\right) \right]
\tag{4.2}
\]
is the spectrum of the DWT wavelet coefficients for unit scale. For the class of Daubechies wavelets, we have a closed form expression for $\mathcal{H}_1(\cdot)$, namely,
\[
\mathcal{H}_1^{(D)}(f) \equiv 2\sin^L(\pi f) \sum_{l=0}^{L/2-1} \binom{L/2 - 1 + l}{l} \cos^{2l}(\pi f) = \mathcal{D}^{L/2}(f)\, \mathcal{C}(f)
\tag{4.3}
\]
(Daubechies 1992, Ch. 6.1), where $\mathcal{D}(f) \equiv |2\sin(\pi f)|^2$ corresponds to a first order backward difference filter and
\[
\mathcal{C}(f) \equiv \frac{1}{2^{L-1}} \sum_{l=0}^{L/2-1} \binom{L/2 - 1 + l}{l} \cos^{2l}(\pi f).
\]
Substituting Equations (2.3) and (4.3) into Equation (4.2) gives us
\[
\begin{aligned}
S_1^{(D)}(f) &= \frac{1}{2}\left[ \mathcal{D}^{L/2}\left(\tfrac{1}{2}f\right) \mathcal{C}\left(\tfrac{1}{2}f\right) S\left(\tfrac{1}{2}f\right) + \mathcal{D}^{L/2}\left(\tfrac{1}{2}f + \tfrac{1}{2}\right) \mathcal{C}\left(\tfrac{1}{2}f + \tfrac{1}{2}\right) S\left(\tfrac{1}{2}f + \tfrac{1}{2}\right) \right] \\
&= \frac{1}{2}\left[ \mathcal{D}^{-(d - L/2)}\left(\tfrac{1}{2}f\right) \mathcal{C}\left(\tfrac{1}{2}f\right) + \mathcal{D}^{-(d - L/2)}\left(\tfrac{1}{2}f + \tfrac{1}{2}\right) \mathcal{C}\left(\tfrac{1}{2}f + \tfrac{1}{2}\right) \right].
\end{aligned}
\]
The first filter $\mathcal{D}^{-(d - L/2)}(\cdot)$ corresponds to a fractional difference process with difference parameter $d - \frac{L}{2}$, and the second filter $\mathcal{C}(\cdot)$ has compact support.

Let us look into the characteristics of the spectrum of DWT wavelet coefficients at unit scale for the Haar wavelet. It is relatively simple to calculate, since $\mathcal{H}_1^{(\mathrm{Haar})}(f) = 2\sin^2(\pi f)$. The spectrum of the filtered fractional difference process $\{h_1 * X_t\}$ is therefore
\[
\mathcal{H}_1^{(\mathrm{Haar})}(f) S(f) = \frac{1}{2} |2\sin(\pi f)|^{-2(d-1)}.
\]
That is, the spectrum of the filtered process is proportional to that of a fractional difference process with parameter $d' = d - 1$. Since we were looking at so called "red" processes with $0 < d < \frac{1}{2}$, this means $-1 < d' < -\frac{1}{2}$ and hence the filtered process is "blue."
The colorful terminology comes from optics, where low frequencies of light are seen as red and high frequencies seen as blue. The spectrum of the DWT wavelet coefficients using the Haar wavelet is
\[
S_1^{(\mathrm{Haar})}(f) = \frac{1}{4}\left[ \left| 2\sin\left(\frac{\pi f}{2}\right) \right|^{-2(d-1)} + \left| 2\cos\left(\frac{\pi f}{2}\right) \right|^{-2(d-1)} \right].
\]
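The Haar expression is simple enough to evaluate on a grid of frequencies. The following sketch does so and also reports the ratio of the largest to the smallest spectral value in decibels; the function names and the grid of $d$ values are our own choices for illustration, and the printed numbers are not meant to replace the tabulated values in this chapter.

```python
import numpy as np

def haar_unit_scale_spectrum(f, d):
    """Spectrum of unit scale Haar DWT coefficients of a fractional difference process."""
    f = np.asarray(f, dtype=float)
    power = -2.0 * (d - 1.0)
    low = np.abs(2.0 * np.sin(np.pi * f / 2.0)) ** power
    high = np.abs(2.0 * np.cos(np.pi * f / 2.0)) ** power   # aliased half of the band
    return 0.25 * (low + high)

def range_in_db(spectrum):
    """Ratio of the maximum to the minimum spectral value, in decibels."""
    return 10.0 * np.log10(np.max(spectrum) / np.min(spectrum))

freqs = np.linspace(0.0, 0.5, 501)
white = haar_unit_scale_spectrum(freqs, d=0.0)   # white noise input
ranges = {d: range_in_db(haar_unit_scale_spectrum(freqs, d))
          for d in (-0.49, -0.25, 0.0, 0.25, 0.49)}
print(white.min(), white.max())
print(ranges)
```

For $d = 0$ the two terms sum to $\sin^2 + \cos^2 = 1$ at every frequency, and for the other values of $d$ the spectrum varies by well under 3 dB.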
When $d = 0$, $\{X_t\}$ is a white noise process and the spectrum of the DWT wavelet coefficients for unit scale is constant, as is to be expected. Figure 4.1 shows the theoretical spectra for the unit scale DWT coefficients of fractional difference processes with $-\frac{1}{2} \le d \le \frac{1}{2}$. As the long memory parameter approaches 0.5 or $-0.5$, the wavelet coefficients have a greater amount of correlation. However, the range of the vertical axis in each plot is only from $-3$ to 3 decibels, and the variation for any particular spectrum is less than this range, so the spectrum for any choice of long memory parameter does not have much structure beyond that of white noise.

The formula given in Equation (4.2) can be extended to an arbitrary scale $\lambda_j$ using the notion of a cascading filter (cf. Section A.3). The first step is to separate $\mathcal{H}_j^{(D)}(\cdot)$ into pieces using an equivalent formula to Equation (3.6) for squared gain functions, namely,
\[
\mathcal{H}_j^{(D)}(f) = \mathcal{H}_1^{(D)}(2^{j-1} f)\, \mathcal{G}_{j-1}^{(D)}(f) = \mathcal{D}^{L/2}(2^{j-1} f)\, \mathcal{C}(2^{j-1} f)\, \mathcal{G}_{j-1}^{(D)}(f),
\tag{4.4}
\]
where
\[
\mathcal{G}_1^{(D)}(f) \equiv 2\cos^L(\pi f) \sum_{l=0}^{L/2-1} \binom{L/2 - 1 + l}{l} \sin^{2l}(\pi f)
\]
is the squared gain function for the Daubechies scaling coefficients. Using the trigonometric identity $\sin^2(2\pi f) = 4\sin^2(\pi f)\cos^2(\pi f)$, we can re-express the first term of Equation (4.4) via
\[
\mathcal{D}(2^{j-1} f) = \mathcal{D}(f) \prod_{k=0}^{j-2} 4\cos^2(\pi 2^k f).
\]
[Figure 4.1: surface plots; horizontal axes: d and frequency; vertical axis: dB.]

Figure 4.1: Theoretical spectra for the unit scale DWT wavelet coefficients of fractional difference processes. The z-axis ranges from $-3$ to 3 dB and, for ease of viewing, the x- and y-axes have been reversed: the long memory parameter $d$ goes from $-0.5$ to 0.5 and the frequency goes from 0 to 0.5 in the direction of the arrow.
Since we are downsampling by 2 at each level of the transform, the spectrum for a vector of DWT wavelet coefficients $\mathbf{W}_j$ associated with scale $\lambda_j$ is
\[
S_j^{(D)}(f) = \frac{1}{2^j} \sum_{k=0}^{2^j - 1} \mathcal{H}_j^{(D)}\left( \frac{f + k}{2^j} \right) S\left( \frac{f + k}{2^j} \right),
\]
where
\[
\mathcal{H}_j^{(D)}(f) = \mathcal{D}^{L/2}(f) \left[ \prod_{k=0}^{j-2} 4\cos^2(\pi 2^k f) \right]^{L/2} \mathcal{C}(2^{j-1} f)\, \mathcal{G}_{j-1}^{(D)}(f).
\]
That is, the spectrum is stretched by $2^j$, and then $2^j - 1$ aliased versions are added to it (Vetterli and Kovacevic 1995, p. 66). This can intuitively be seen through successive applications of Equation (4.1) to the filtered spectrum.

Table 4.1: Maximum dynamic range for the spectra of DWT wavelet coefficients, in decibels (dB), when applied to fractional difference processes with long memory parameter $d$.

Level   d in [-1/2, 0]   d in [0, 1/2]   d in [-1/2, 0]   d in [0, 1/2]   d in [-1/2, 0]   d in [0, 1/2]
1       1.43             1.48            1.40             1.56            1.36             1.59
2       1.83             2.26            2.28             2.56            2.51             2.71
3       2.07             2.71            2.64             2.93            2.81             3.00
4       2.21             2.87            2.76             3.07            2.80             3.14
5       2.33             2.93            2.84             3.10            2.82             3.16
6       2.42             2.95            2.88             3.10            2.83             3.16

To summarize the information contained in Figure 4.1, a useful measure to introduce is the dynamic range, defined to be
\[
10 \log_{10}\left( \frac{\max_f S(f)}{\min_f S(f)} \right)
\]
(Percival and Walden 1993, p. 201). Table 4.1 gives the maximum dynamic ranges, in dB, for the spectra of DWT coefficients applied to fractional difference processes with long memory parameter $-\frac{1}{2} \le d \le \frac{1}{2}$. As the level of the DWT increases, where more and more energy is present for red processes, the dynamic range of the spectra is negligible and appears to level off around 3 dB regardless of wavelet filter. This lack of dynamic range, which corresponds to almost uncorrelated observations in the original process, is utilized in the next chapter to test for nonstationary events in the presence of long memory structure.

4.1.2 Short Memory Processes
The frequency-domain analysis of Daubechies wavelet coefficients from long memory processes (specifically, fractional difference processes) is relatively straightforward since their spectra are of a similar form. It is also useful to look at the behavior of these wavelet coefficients from short-memory processes. We focus on two simple processes, the moving average process of order 1 (MA(1)) and autoregressive process of order 1 (AR(1)). The spectra of these two processes are given by
\[
\begin{aligned}
S^{(\mathrm{MA})}(f) &= \sigma^2 \left| 1 - \theta e^{-i 2\pi f} \right|^2 = \sigma^2 \left( 1 - 2\theta\cos(2\pi f) + \theta^2 \right), \\
S^{(\mathrm{AR})}(f) &= \sigma^2 \left| 1 - \phi e^{-i 2\pi f} \right|^{-2} = \sigma^2 \left( 1 - 2\phi\cos(2\pi f) + \phi^2 \right)^{-1},
\end{aligned}
\]
respectively (Percival and Walden 1993, pp. 392, 443). Let us first look at the Haar wavelet filter. The spectrum of the unit scale DWT wavelet coefficients ($\sigma^2 = 1$) applied to the MA(1) process is
\[
S_1^{(\mathrm{Haar,MA})}(f) = \sin^2\left(\frac{\pi f}{2}\right) \left[ 1 - 2\theta\cos(\pi f) + \theta^2 \right] + \cos^2\left(\frac{\pi f}{2}\right) \left[ 1 + 2\theta\cos(\pi f) + \theta^2 \right],
\]
and for the AR(1) process
\[
S_1^{(\mathrm{Haar,AR})}(f) = \sin^2\left(\frac{\pi f}{2}\right) \left[ 1 - 2\phi\cos(\pi f) + \phi^2 \right]^{-1} + \cos^2\left(\frac{\pi f}{2}\right) \left[ 1 + 2\phi\cos(\pi f) + \phi^2 \right]^{-1}.
\]
When $\theta = 0$ for the MA(1) process, or $\phi = 0$ for the AR(1) process, we see that the spectra equal 1 for all frequencies. This is to be expected since the processes themselves are simply white noise.

Table 4.2: Maximum dynamic range for the spectra of DWT wavelet coefficients, in decibels (dB), when applied to AR(1) and MA(1) processes with parameters $\phi$ and $\theta$, respectively.

Level   phi in (-1, 0]   phi in [0, 1)   phi in (-1, 0]   phi in [0, 1)   phi in (-1, 0]   phi in [0, 1)
1       42.85            2.91            42.83            3.03            42.80            3.07
2       0.66             4.64            1.35             5.19            1.96             5.42
3       0.75             5.38            0.95             5.89            0.55             5.98
4       0.77             5.05            0.39             6.04            0.12             6.11
5       0.73             5.71            0.08             6.06            0.17             6.11
6       0.74             5.85            0.10             5.97            0.17             6.02

Level   theta in (-1, 0]   theta in [0, 1)   theta in (-1, 0]   theta in [0, 1)   theta in (-1, 0]   theta in [0, 1)
1       34.54              2.90              42.98              2.88              43.03              2.84
2       1.84               3.01              2.23               4.19              2.47               4.92
3       0.73               3.00              0.73               4.70              0.71               5.50
4       0.35               3.00              0.30               4.99              0.31               5.72
5       0.19               3.01              0.16               5.17              0.20               5.77
6       0.12               3.00              0.12               5.26              0.18               5.72

Table 4.2 gives the maximum dynamic ranges for the spectra of DWT wavelet coefficients applied to short-memory processes. Dynamic ranges are similar to those encountered for fractional difference processes, except for the first scale. As $\phi \to -1$ and $\theta \to -1$, the DWT wavelet coefficient spectra appear to asymptote. This is easy to explain, since the spectra of these AR(1) and MA(1) processes have a large spike of energy around $f = 0.5$. Whereas fractional difference processes are limited in the amount of energy at higher frequencies, these short memory processes are not. The DWT partitions the frequency plane in octave bands and therefore captures most of the spectral energy in the first scale. This may appear to be a fatal flaw in the decorrelating properties of the DWT, but in the following section I discuss how to adapt the transform to overcome this problem in practice.

Figures 4.2 and 4.3 show the spectra of unit scale DWT wavelet coefficients of AR(1) and MA(1) processes, respectively. As already noted in Table 4.2, the spectra are relatively featureless except where $\phi \to -1$ and $\theta \to -1$. At first glance, the location of the asymptote may seem counter-intuitive. The reason it is located at $f = 0$ is because of the folding of the spectrum through subsampling to obtain the DWT wavelet coefficients (cf. Equation (4.1)).

4.1.3 Conclusions
The DWT has been shown, through spectral theory, to approximately decorrelate both short-memory and long memory processes. As seen in Figures 4.2 and 4.3, this attribute appears to fail when the process asymptotes in the high frequency range, say $f = 0.5$, instead of in the low frequency range. If this occurs in practice, there are at least two simple ways to overcome this problem. First, a signal processing trick of multiplying every other value of the time series by $-1$ will reverse the spectrum of the original series. A large amount of energy in high frequencies will therefore be shifted into the lower frequencies, where the DWT has been shown to approximately decorrelate. Instead of preprocessing the time series, a slight adjustment to the wavelet transform has a similar result. Namely, switch the wavelet and scaling filters when performing the DWT. Thus, lower frequency ranges will be filtered out and the fine partitioning of the frequency plane will occur as $f \to 0.5$. If the spectrum of the underlying process asymptotes at a frequency between 0 and 0.5, a generalization of the DWT, the discrete wavelet packet transform (DWPT), may be used to construct an orthonormal transform of the data with an alternative frequency tiling; see, e.g., Wickerhauser (1994) or Percival and Walden (1999, Ch. 9) for more information on the DWPT. Such a process is a generalization of a fractional difference process; see, e.g., Hosking (1981) and Giraitis and Leipus (1995).

[Figure 4.2: surface plots, including a Haar panel; horizontal axes: $\phi$ and frequency; vertical axis: dB.]

Figure 4.2: Theoretical spectra for the unit scale DWT wavelet coefficients of an AR(1) process. The z-axis ranges from $-5$ to 40 dB and, for ease of viewing, the x- and y-axes have been reversed: the parameter $\phi$ goes from $-0.99$ to 0.99 and the frequency goes from 0 to 0.5 in the direction of the arrow.

[Figure 4.3: surface plots, including a Haar panel; horizontal axes: $\theta$ and frequency; vertical axis: dB.]

Figure 4.3: Theoretical spectra for the unit scale DWT wavelet coefficients of an MA(1) process. The z-axis ranges from $-5$ to 40 dB and, for ease of viewing, the x- and y-axes have been reversed: the parameter $\theta$ goes from $-0.99$ to 0.99 and the frequency goes from 0 to 0.5 in the direction of the arrow.
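The first remedy, multiplying every other observation by $-1$, can be verified in a few lines. The sketch below checks two things under assumptions of our own choosing (an AR(1) parameter of $-0.9$ and a simulated series of even length): that the AR(1) spectrum evaluated at $f + \frac{1}{2}$ equals the spectrum of an AR(1) process with the sign of $\phi$ flipped, and that the DFT of the modulated series is the DFT of the original series circularly shifted by half the sample size.

```python
import numpy as np

def ar1_spectrum(f, phi, sigma2=1.0):
    """Spectrum of an AR(1) process, following the expression in Section 4.1.2."""
    return sigma2 / (1.0 - 2.0 * phi * np.cos(2.0 * np.pi * f) + phi ** 2)

# Shifting the frequency axis by 1/2 turns phi = -0.9 into phi = +0.9.
f = np.linspace(0.0, 0.5, 101)
reversed_matches = np.allclose(ar1_spectrum(f + 0.5, -0.9), ar1_spectrum(f, 0.9))

# The same reversal seen on data: modulation by (-1)^t shifts the DFT by N/2 bins.
rng = np.random.default_rng(1)
N = 256
x = rng.standard_normal(N)
modulated = (-1.0) ** np.arange(N) * x
dft_shift_matches = np.allclose(np.fft.fft(modulated),
                                np.roll(np.fft.fft(x), N // 2))
print(reversed_matches, dft_shift_matches)
```

Energy concentrated near $f = 0.5$ is thereby moved to the neighborhood of $f = 0$, which is the band that the DWT partitions most finely.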
4.2 Normalized Cumulative Sum of Squares Test Statistic

4.2.1 Definition
Let $X_1, X_2, \ldots, X_N$ be a sequence of independent Gaussian (normal) random variables with zero means and variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2$. We would like to test the hypothesis

$$H_0 : \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_N^2.$$

A test statistic that can discriminate between this null hypothesis and a variety of alternative hypotheses (such as $H_1 : \sigma_1^2 = \cdots = \sigma_k^2 \neq \sigma_{k+1}^2 = \cdots = \sigma_N^2$, where $k$ is an unknown change point) is the normalized cumulative sum of squares test statistic $D$, defined as follows. Let

$$P_k \equiv \frac{\sum_{j=1}^{k} X_j^2}{\sum_{j=1}^{N} X_j^2}, \qquad k = 1, \ldots, N-1,$$

and define $D \equiv \max(D^+, D^-)$, where

$$D^+ \equiv \max_{1 \le k \le N-1} \left( \frac{k}{N-1} - P_k \right) \quad \text{and} \quad D^- \equiv \max_{1 \le k \le N-1} \left( P_k - \frac{k-1}{N-1} \right).$$
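The definition above translates directly into a few lines of array code. This is a minimal sketch under the definitions of $P_k$, $D^+$ and $D^-$ just given; the function name `css_statistic` and the choice to also return the maximizing index are illustrative assumptions, not part of the original text.

```python
import numpy as np

def css_statistic(x):
    """Return (D, k_hat): the normalized cumulative sum of squares statistic
    and the 1-based index k at which the maximum is attained."""
    x = np.asarray(x, dtype=float)
    n = x.size
    sq = x ** 2
    p = np.cumsum(sq)[:-1] / sq.sum()        # P_k for k = 1, ..., N-1
    k = np.arange(1, n)
    d_plus = k / (n - 1) - p                 # terms of D+
    d_minus = p - (k - 1) / (n - 1)          # terms of D-
    i_plus, i_minus = int(np.argmax(d_plus)), int(np.argmax(d_minus))
    if d_plus[i_plus] >= d_minus[i_minus]:
        return float(d_plus[i_plus]), i_plus + 1
    return float(d_minus[i_minus]), i_minus + 1

# N = 2 example: P_1 = 9/25, so D = max(P_1, 1 - P_1) = 0.64.
d_two, k_two = css_statistic([3.0, 4.0])
```

For $N = 2$ the function reduces to $\max(P_1, 1 - P_1)$, which is the case treated analytically in the text.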
Percentage points for the distribution of $D$ under the null hypothesis can be readily obtained through Monte Carlo simulations. When $N = 2$,

$$P\{D \le x\} = \begin{cases} 0, & x < \tfrac{1}{2}; \\ P\{1 - x \le B(\tfrac{1}{2}, \tfrac{1}{2}) \le x\}, & \tfrac{1}{2} \le x < 1; \\ 1, & x \ge 1, \end{cases} \tag{4.9}$$

where $B(\tfrac{1}{2}, \tfrac{1}{2})$ is a beta random variable with parameters $\tfrac{1}{2}$ and $\tfrac{1}{2}$. The proof of this is straightforward. Let $X_1$ and $X_2$ be two independent Gaussian random variables with zero means and common variance (under $H_0$), then

$$P_1 = \frac{X_1^2}{X_1^2 + X_2^2} \quad \text{and} \quad P_2 = 1.$$

The random variable $P_1$ has a beta distribution with parameters $\tfrac{1}{2}$ and $\tfrac{1}{2}$. Now, the preliminary statistics are given by $D^- = P_1$ and $D^+ = 1 - P_1$, therefore

$$P\{D \le x\} = P\{\max(P_1, 1 - P_1) \le x\} = P\{P_1 \le x,\ 1 - P_1 \le x\} = P\{1 - x \le P_1 \le x\}$$

and Equation (4.9) follows directly.

There is no known tractable closed form expression for $P\{D \le x\}$ with arbitrary $N$. Hsu (1977) commented on this fact and used two methods to obtain small sample critical values: Edgeworth expansions and fitting the first three moments of his statistic, which is equivalent to $D$, to a one-parameter beta distribution. Inclan and Tiao (1994) proved that, for large $N$ and $x > 0$,

$$P\left\{ (N/2)^{1/2} D \le x \right\} \approx P\left\{ \sup_t |W_t^0| \le x \right\} = 1 + 2 \sum_{l=1}^{\infty} (-1)^l e^{-2 l^2 x^2},$$

where $W_t^0$ is a Brownian bridge process, and the right-hand expression is Equation (11.39) of Billingsley (1968). Table 4.3 shows how quickly the Monte Carlo critical values converge to the quantiles of the Brownian bridge process.
Table 4.3: Monte Carlo critical values for the test statistic $(N/2)^{1/2} D$, using the Haar wavelet filter, for a level $\alpha$ test. These values are based upon 10,000 replicates. The standard error (SE) is provided for each estimate, and was computed via $\mathrm{SE} = \{\alpha(1 - \alpha)/(10{,}000 f^2)\}^{1/2}$, where $f$ is the histogram estimate of the density at the $(1 - \alpha)$th quantile using a bandwidth of 0.01 (Inclan and Tiao 1994). Quantiles of a Brownian bridge process are given at the far right (column $\infty$) for comparison.

                                    Sample size
 alpha       8      16      32      64     128     256     512    1024     inf
 0.10    1.109   1.135   1.157   1.182   1.193   1.197   1.206   1.209   1.224
   SE    0.003   0.003   0.003   0.003   0.003   0.003   0.003   0.003
 0.05    1.232   1.265   1.293   1.313   1.326   1.329   1.345   1.341   1.358
   SE    0.004   0.004   0.004   0.004   0.004   0.004   0.004   0.004
 0.01    1.459   1.508   1.553   1.584   1.596   1.596   1.630   1.617   1.628
   SE    0.007   0.008   0.008   0.009   0.008   0.010   0.008   0.007
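The Brownian bridge quantiles quoted for comparison (1.224, 1.358 and 1.628) can be recovered from the series on the right-hand side of the Inclan and Tiao (1994) result. The sketch below is illustrative only: the function names, the truncation at 100 terms, and the bisection bracket are assumptions made for this example.

```python
import math

def bridge_sup_cdf(x, terms=100):
    """P{sup_t |W0_t| <= x} = 1 + 2 * sum_{l>=1} (-1)^l exp(-2 l^2 x^2),
    with the infinite series truncated after `terms` terms."""
    if x <= 0:
        return 0.0
    s = sum((-1) ** l * math.exp(-2.0 * l * l * x * x)
            for l in range(1, terms + 1))
    return 1.0 + 2.0 * s

def bridge_sup_quantile(prob, lo=0.3, hi=3.0, tol=1e-10):
    """Invert the CDF by bisection; the CDF is increasing in x."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bridge_sup_cdf(mid) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Asymptotic critical values for level-alpha tests of (N/2)^(1/2) D.
critical_values = {alpha: bridge_sup_quantile(1.0 - alpha)
                   for alpha in (0.10, 0.05, 0.01)}
```

These values depend only on the significance level, which is why they can stand in for Monte Carlo critical values once the number of coefficients is large enough.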
4.2.2 Data Analytic Thresholding
The idea of using a Kolmogorov-type statistic for wavelet coefficients has been proposed previously by Ogden and Parzen (1996). They were interested in applying change-point methods to the problem of wavelet thresholding. Their result is a threshold which is determined by the data being analyzed on a scale by scale basis. As in Section 4.2.1, let $X_1, \ldots, X_N$ be a sequence of independent Gaussian random variables. The procedure is based on a sample Brownian bridge process

$$\widehat{W}_N^0(t) \equiv \begin{cases} \dfrac{1}{\sigma_g \sqrt{N}} \left[ \sum_{i=1}^{\lfloor Nt \rfloor} g(X_i) - \dfrac{\lfloor Nt \rfloor}{N} \sum_{i=1}^{N} g(X_i) \right], & \tfrac{1}{N} \le t \le 1; \\ 0, & 0 \le t < \tfrac{1}{N}, \end{cases} \tag{4.10}$$

where $g(\cdot)$ is a nonlinear function, $\sigma_g^2 \equiv \mathrm{Var}\{g(X_i)\}$ and $\lfloor x \rfloor$ is the greatest integer less than or equal to $x$. The process $\{\widehat{W}_N^0(t) \mid 0 \le t \le 1\}$ converges in distribution to a Brownian bridge process. In practice, their thresholding technique is implemented on a scale by scale basis to sequences of wavelet coefficients as follows:

[1] Form the sample Brownian bridge process (Equation (4.10)) of the wavelet coefficients and test against the null distribution; see, e.g., Table 4.2 in Stephens (1986) for appropriate critical values.
[2] If the null hypothesis is rejected, remove the wavelet coefficient with the greatest absolute value, reduce $t$ to $t - 1$, and return to [1].
[3] If the null hypothesis is not rejected, set the threshold equal to the absolute value of the largest (in absolute magnitude) wavelet coefficient.

Although several transformations and empirical distribution function tests were available, Ogden and Parzen (1996) used the transformation $g(x) = x^2$ in Equation (4.10) and the Kolmogorov-Smirnov test statistic. It should be pointed out that $\sigma_g$ will not be known in practice and hence must be estimated from the data. This problem was addressed in Ogden (1994, Sec. 5.5) by recommending the median absolute deviation of the finest level of wavelet coefficients as in Donoho and Johnstone (1994). By formulating the test statistic as in Equation (4.8), the $\sigma_g$ term is no longer involved; it is replaced by $\sqrt{2}$, which is scale independent. This is seen through the following argument.

Let $Y_1, \ldots, Y_N$ be a sequence of $j$th level wavelet coefficients, from the DWT, obtained from a white noise process (let $N$ be dyadic for simplicity). Hence, the wavelet coefficients are also distributed as white noise. Let $\sigma_g^2 \equiv \mathrm{Var}\{Y_1^2\}$ and $\mu_g \equiv E\{Y_1^2\}$. We define the statistic

$$V_N(t) \equiv \sqrt{N} \left( \frac{\sum_{k=1}^{\lfloor Nt \rfloor} Y_k^2}{\sum_{k=1}^{N} Y_k^2} - \frac{\lfloor Nt \rfloor}{N} \right), \qquad 0 \le t \le 1,$$

where $V_N(\cdot)$ is related to $D$ via

$$\frac{1}{\sqrt{2}} \sup_t |V_N(t)| \approx \left( \frac{N}{2} \right)^{1/2} D \quad \text{for large } N,$$

and define

$$U_N(t) \equiv \frac{1}{\sqrt{N} \sigma_g} \left\{ \sum_{k=1}^{\lfloor Nt \rfloor} Y_k^2 - \frac{\lfloor Nt \rfloor}{N} \sum_{k=1}^{N} Y_k^2 \right\}, \qquad 0 \le t \le 1,$$

to be Ogden's Brownian bridge statistic, where $\sum_{k=1}^{0} Y_k^2 \equiv 0$. Then

$$V_N(t) = \frac{\sqrt{N}}{\sum_{k=1}^{N} Y_k^2} \left\{ \sum_{k=1}^{\lfloor Nt \rfloor} Y_k^2 - \frac{\lfloor Nt \rfloor}{N} \sum_{k=1}^{N} Y_k^2 \right\}$$
$$= \frac{\sigma_g}{\frac{1}{N} \sum_{k=1}^{N} Y_k^2} \cdot \frac{1}{\sqrt{N} \sigma_g} \left\{ \sum_{k=1}^{\lfloor Nt \rfloor} Y_k^2 - \frac{\lfloor Nt \rfloor}{N} \sum_{k=1}^{N} Y_k^2 \right\}$$
$$= \frac{\sigma_g}{\mu_g} \cdot \frac{\mu_g}{\frac{1}{N} \sum_{k=1}^{N} Y_k^2} \, U_N(t)$$
$$\Rightarrow \frac{\sigma_g}{\mu_g} U \equiv V \quad \text{as } N \to \infty,$$

where $U$ is a Brownian bridge process. Since the squared wavelet coefficients are distributed as $\sigma^2 \chi_1^2$ random variables,

$$\sigma_g^2 = \mathrm{Var}\{Y_1^2\} = 2\sigma^4 \quad \text{and} \quad \mu_g = E\{Y_1^2\} = \sigma^2,$$

and hence

$$\frac{\sigma_g}{\mu_g} = \frac{\sqrt{2}\,\sigma^2}{\sigma^2} = \sqrt{2}.$$

When looking at the boundary crossing probability for a Brownian bridge process, the asymptotic critical values for $V$ are $\sqrt{2}$ times the asymptotic critical values for $U$. Thus, we do not need to estimate the variance of the squared wavelet coefficients when testing for homogeneity of variance. The relationship between Equations (4.8) and (4.10) is similar to the one between the test of periodogram ordinates by Schuster (1898) and Fisher's g-statistic (Fisher 1929), where standardizing by the sum of squares eliminates having to know the variance of the time series.
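The algebra relating $V_N(t)$ and $U_N(t)$ can be verified numerically. The sketch below is an illustration under stated assumptions: Gaussian white noise with known variance `sigma2` stands in for the wavelet coefficients, $\sigma_g = \sqrt{2}\sigma^2$ is plugged in directly, and the function name `bridge_processes` is hypothetical.

```python
import numpy as np

def bridge_processes(y, sigma2):
    """Evaluate V_N(t) and U_N(t) on the grid t = 0, 1/N, ..., 1."""
    y2 = np.asarray(y, dtype=float) ** 2
    n = y2.size
    partial = np.concatenate(([0.0], np.cumsum(y2)))  # sums up to floor(N t)
    frac = np.arange(n + 1) / n                        # floor(N t) / N
    total = y2.sum()
    sigma_g = np.sqrt(2.0) * sigma2                    # sd of Y^2 for Y ~ N(0, sigma2)
    v = np.sqrt(n) * (partial / total - frac)
    u = (partial - frac * total) / (np.sqrt(n) * sigma_g)
    return v, u

rng = np.random.default_rng(0)
y = rng.normal(0.0, 2.0, size=4096)                    # white noise, variance 4
v, u = bridge_processes(y, sigma2=4.0)
ratio = np.max(np.abs(v)) / np.max(np.abs(u))          # close to sqrt(2) for large N
```

For any finite $N$ the two processes differ exactly by the factor $\sigma_g / (\tfrac{1}{N}\sum_k Y_k^2)$, which tends to $\sigma_g / \mu_g = \sqrt{2}$; computing $V_N$ requires no knowledge of $\sigma^2$ at all.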
4.3 Testing Procedure

We present the steps for two testing procedures: the cumulative sum of squares test using the DWT and the cumulative sum of squares test using the MODWT. The procedure presented here is for the DWT-based cumulative sum of squares test statistic using Monte Carlo critical values. It

[1] generates a realization of length $N$ from a fractional difference process with a specified parameter $d$;
[2] computes the partial DWT of order $J$, defined in Section 3.2, using the Haar, D(4) and LA(8) wavelet filters;
[3] discards all coefficients on each scale that make explicit use of the periodic boundary conditions;
[4] computes the test statistic $D$ for all scales based upon the remaining wavelet coefficients; and
[5] rejects the null hypothesis if $(N/2)^{1/2} D$ is greater than the Monte Carlo white noise critical levels.

A slight modification may be made to the DWT-based procedure for large sample sizes, specifically, using asymptotic critical values instead of ones obtained through Monte Carlo experiments (cf. Table 4.3).

The MODWT-based cumulative sum of squares procedure, using Monte Carlo critical values, is similar to the DWT-based procedure already defined: simply substitute the MODWT for the DWT. Asymptotic critical values are not available since the MODWT wavelet coefficients are correlated. However, a slight modification may be made by substituting the equivalent degrees of freedom $\eta_j$ (cf. Section 3.4.2) for the sample size $\tilde{N}_j$ when testing the statistic $D$. If we believe that the equivalent degrees of freedom adequately captures the inherent correlation in the wavelet coefficients, then asymptotic critical values based on the DWT (which assume uncorrelated wavelet coefficients) may be used. Thus, we are freed from Monte Carlo experiments in order to test arbitrary sample sizes.
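Steps [2] to [5] of the DWT-based procedure can be sketched as follows. This is a simplified illustration and not the author's implementation: it uses only the Haar filter (which has no boundary coefficients when the length is divisible by $2^J$, so step [3] is vacuous), it uses the asymptotic critical values from Table 4.3 rather than Monte Carlo ones, and the function names `haar_dwt`, `css_d` and `css_test` are hypothetical.

```python
import numpy as np

# Brownian bridge quantiles (last column of Table 4.3).
ASYMPTOTIC_CRIT = {0.10: 1.224, 0.05: 1.358, 0.01: 1.628}

def haar_dwt(x, levels):
    """Partial Haar DWT: wavelet coefficient vectors for levels 1..levels."""
    v = np.asarray(x, dtype=float)
    out = []
    for _ in range(levels):
        v = v[: v.size - v.size % 2]                  # drop an odd trailing value
        out.append((v[1::2] - v[0::2]) / np.sqrt(2.0))  # wavelet coefficients
        v = (v[1::2] + v[0::2]) / np.sqrt(2.0)          # scaling coefficients
    return out

def css_d(w):
    """Normalized cumulative sum of squares statistic D."""
    n = w.size
    sq = w ** 2
    p = np.cumsum(sq)[:-1] / sq.sum()
    k = np.arange(1, n)
    return max(np.max(k / (n - 1) - p), np.max(p - (k - 1) / (n - 1)))

def css_test(w, alpha=0.05):
    """Return ((N/2)^(1/2) D, reject?) for one scale of coefficients."""
    stat = np.sqrt(w.size / 2.0) * css_d(w)
    return float(stat), bool(stat > ASYMPTOTIC_CRIT[alpha])

# Illustrative input: white noise (not a fractional difference process)
# whose first 256 values have four times the variance of the rest.
rng = np.random.default_rng(42)
x = rng.normal(size=1024)
x[:256] *= 2.0
results = [css_test(w) for w in haar_dwt(x, levels=4)]
```

Each entry of `results` holds the scaled statistic and the decision for one level, mirroring the scale by scale testing described above.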
4.4 Testing for a Single Variance Change

4.4.1 Empirical Size
To study whether the DWT of a fractional difference process in fact behaves as if each sub-series were white noise, as far as the test statistic $D$ is concerned, Monte Carlo methods were employed. We determined the upper 10%, 5% and 1% quantiles for the distribution of $D$ based upon 40,000 realizations of white noise for a range of sample sizes commensurate with time series of sample sizes $N = 128, 256, 512, 1024$ and 2048. Using these quantiles, we then employed the testing procedure outlined in Section 4.3, of order $J = 4$, repeated 10,000 times each for $d = 0.05, 0.25, 0.4$ and 0.45. The percentages of times that $D$ exceeded the white noise critical levels are recorded in Figure 4.4. We see that the percentages are quite close to the rejection rates established from white noise, so the assumption that the DWT decorrelates long memory processes is a good one for the purposes of evaluating $D$. We can thus conduct an approximate level $\alpha$ test for variance homogeneity of a fractional difference process on a scale by scale basis by simply using critical levels determined under the assumption of white noise.

[Figure 4.4 (plot not reproduced; one panel for each combination of level 1 to 4 and d = 0.05, 0.25, 0.40, 0.45; horizontal axis: sample size; vertical axis: rejection rate (%)): Rejection rates for fractional difference processes using white noise critical levels, N = 128, 256, 512, 1024 and 2048. The solid line is the Haar wavelet filter, the dotted line is the D(4) and the dashed line is the LA(8).]

Figure 4.4 indicates that the Haar wavelet performs as well as the D(4) and LA(8) wavelets for a fractional difference process. These latter wavelets are more appropriate for nonstationary processes with stationary differences because the extra implicit differencing operations ensure that the wavelet coefficients form a stationary process with zero mean, which will not necessarily be true for the Haar wavelet. For a fractional difference process, the simple form of the Haar wavelet makes it possible to obtain analytical expressions for the quantiles of $D$, a subject for future study.

One may not want to perform Monte Carlo studies in order to obtain critical values for the test statistic $D$. The simulation study described above was run again, substituting the asymptotic critical values (last column of Table 4.1) for the Monte Carlo critical values. The results are given in Figure 4.5, using a similar vertical axis to Figure 4.4 for comparison. The percentage of times $D$ exceeded the asymptotic critical levels was within 10% of the theoretical quantile when there were at least 128 wavelet coefficients. The Haar wavelet filter was found to be conservative for all sample sizes, that is, the percentage of times $D$ exceeded the asymptotic critical levels was below the theoretical quantile. Hence, using asymptotic critical values will give reasonable results, if Monte Carlo critical values are unavailable, when at least 128 wavelet coefficients are present.

[Figure 4.5 (plot not reproduced; one panel for each combination of level 1 to 4 and d = 0.05, 0.25, 0.40, 0.45; horizontal axis: sample size; vertical axis: rejection rate (%)): Rejection rates for fractional difference processes using asymptotic critical levels, N = 128, 256, 512, 1024 and 2048. The solid line is the Haar wavelet filter, the dotted line is the D(4) and the dashed line is the LA(8).]

To investigate how well this approximation performs for large sample sizes, the procedure from Section 4.3 was performed for fractional difference processes of length $N = 2^{15} = 32{,}768$ using a partial DWT of order $J = 8$. Due to the computational time involved, the procedure was only repeated 1000 times. The percentages of times that $D$ exceeded the white noise critical levels under these conditions were found to be quite close to the rejection rates established from asymptotic critical values, with increased variability due to the reduced number of iterations in the Monte Carlo study. Thus, all the simulation studies we have conducted to date indicate that the DWT adequately decorrelates long memory processes for the purpose of using the test statistic $D$.

Although the MODWT of a fractional difference process exhibits correlation between wavelet coefficients, it does retain a greater number of coefficients per scale.
This may be a useful attribute for testing a wider range of alternative hypotheses, not just a sudden change of variance. To examine the cumulative sum of squares procedure using the MODWT, an investigation similar to that for the DWT was performed. The correlation structure of MODWT wavelet coefficients invalidates our ability to use an asymptotic distribution (like a Brownian bridge process for the DWT) when testing for homogeneity of variance. Although Monte Carlo techniques are relatively easy to implement, they depend on the sample size. When repeatedly testing under unknown sample sizes, e.g., testing multiple variance changes in Section 4.6, this requires considerable computing time. One possible approach is to compensate for the correlation structure by modifying the test statistic $D$, computed via the MODWT, using the equivalent degrees of freedom. The equivalent degrees of freedom argument was investigated in Section 3.4.2 with respect to the wavelet variance. The distribution under this argument was not found to differ too drastically from the true distribution of the MODWT estimator of the wavelet variance. Here, instead of testing $(N/2)^{1/2} D$ we propose to use $(\eta/2)^{1/2} D$, where $\eta$ is the equivalent degrees of freedom given by Equation (3.14). By doing so, we attempt to obviate the need for determining critical levels via Monte Carlo experiments.

To investigate this test, we followed the procedure outlined in Section 4.3, of order $J = 4$, repeated 10,000 times each for $d = 0.05, 0.25, 0.4$ and 0.45. The percentages of times that $(\eta/2)^{1/2} D$ exceeded the asymptotic critical levels are recorded in Figure 4.6. We see that the percentages are quite close to the nominal rejection rates when the long memory parameter is smaller, and the percentages are more and more conservative as $d$ approaches 0.5. Among the three wavelet filters, the LA(8) appears to give the most consistent rejection rates across all sample sizes and long memory parameters. The D(4) also performs reasonably well, but is quite a bit more conservative than the LA(8) wavelet filter, especially as the number of wavelet coefficients decreases. This problem has already been noted when using the DWT, that is, when using asymptotic critical values all wavelet filters suffer as the number of wavelet coefficients decreases.
The equivalent degrees of freedom adjustment is crude; however, by using it the MODWT may be used to conduct an approximate level $\alpha$ test for variance homogeneity of a fractional difference process on a scale by scale basis when there are at least 64 wavelet coefficients.

[Figure 4.6 (plot not reproduced; one panel for each combination of level 1 to 4 and d = 0.05, 0.25, 0.40, 0.45; horizontal axis: sample size; vertical axis: rejection rate (%)): Rejection rates for fractional difference processes using the MODWT and asymptotic critical values, adjusted using equivalent degrees of freedom (N = 128, 256, 512, 1024, 2048). The solid line is the Haar wavelet filter, the dotted line is the D(4) and the dashed line is the LA(8).]

4.4.2 Empirical Power
With the empirical size of the proposed test established, we can look at how powerful the test is against particular alternatives. The procedures outlined in Section 4.3, of order $J = 4$, were repeated for a specific sample size $N = 656$ and long memory parameter $d = 0.40$ (this particular sample size and long memory parameter mimic the attributes of the Nile River minimum water levels introduced in Section 6.1). One modification was made to Step [1], namely, adding a vector of independent Gaussian random variables to the first 100 observations of the fractional difference process. Instead of adjusting the long memory parameter, the variance ratio between the first 100 and remaining observations was adjusted, producing variance ratios of 1.5, 2, 3 and 4.

Table 4.4 gives the rejection rates, at the $\alpha = 0.05$ level of significance, for the cumulative sum of squares (CSS) method when applied to fractional difference processes ($N = 656$, $d = 0.40$) with one variance change at $k = 100$. The octave band variance ratio is defined to be the ratio of variances within each octave band between the fractional difference process with noise and without. For a given overall variance ratio, the octave band variance ratio decreases as the level increases. This is reasonable, since less and less of the spectrum is being captured as the level of the DWT increases; so the influence of adding a constant to all frequencies will gradually diminish. For all variance ratios the CSS method gives reasonable rejection rates. While an increase in variance appears to affect primarily the first two levels of the DWT wavelet coefficients, as seen by the high percentage of rejections, higher levels of wavelet coefficients are also affected as the magnitude of the variance change increases. This is important to consider when deciding between a change in variance and a change in the long memory parameter of a time series (see Section 4.7 for further discussion).

Table 4.4: Performance of the cumulative sum of squares (CSS) method for fractional difference processes (N = 656, d = 0.40) with one change of variance at k = 100. All tests were performed at the $\alpha = 0.05$ level of significance. "Ratio" is the variance ratio between the first 100 and remaining observations, and "Band ratio" is the octave band by octave band variance ratio. The remaining columns are rejection rates (%).

 Ratio  Level  Band ratio    Haar    D(4)   LA(8)
 1.5      1       1.82       89.9    92.0    92.2
          2       1.55       42.5    42.7    40.1
          3       1.33       13.9    14.0    11.7
          4       1.19        8.2     7.5     7.0
 2        1       2.64       99.9    99.9    99.9
          2       2.10       82.8    83.0    80.3
          3       1.65       32.9    31.7    24.8
          4       1.38       12.2    11.1     8.5
 3        1       4.29      100.0   100.0   100.0
          2       3.19       98.9    98.8    98.5
          3       2.30       67.9    63.9    52.7
          4       1.75       25.4    22.3    13.1
 4        1       5.93      100.0   100.0   100.0
          2       4.29       99.9    99.8    99.8
          3       2.95       84.8    82.6    71.6
          4       2.13       40.3    32.9    19.9

4.4.3 Conclusions
Several procedures for testing homogeneity of variance, on a scale by scale basis, have been investigated with respect to fractional difference processes. When using Monte Carlo critical values, the DWT-based cumulative sum of squares procedure is shown to have an adequate empirical size. When using asymptotic critical values, this procedure gives reasonable results when at least 128 wavelet coefficients are present for testing. The MODWT-based CSS procedure using asymptotic critical values is slightly conservative when at least 128 wavelet coefficients are used. I have also shown that the cumulative sum of squares test statistic, based on the DWT, can successfully detect changes of variance in fractional difference processes. Depending on the magnitude of the variance change, the first two scales are primarily affected. This is to be expected since the octave band variance ratio decreases as the scale of the DWT increases.
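The variance-change construction used in the power study of Section 4.4.2 (noise added to the first 100 observations) might be set up along the following lines. This is a sketch under explicit assumptions, not the author's code: the variance ratio is taken to mean $(\sigma_X^2 + \sigma_\varepsilon^2)/\sigma_X^2$, the process variance of a fractional difference process with unit innovation variance is taken from the expression $\Gamma(1-2d)/\Gamma^2(1-d)$ in Hosking (1981), and both function names are hypothetical. A simulator for the fractional difference process itself is not shown; white noise is used as a stand-in.

```python
import math
import numpy as np

def fd_process_variance(d, innovation_var=1.0):
    """Variance of a stationary fractional difference process (|d| < 0.5)."""
    return innovation_var * math.gamma(1.0 - 2.0 * d) / math.gamma(1.0 - d) ** 2

def add_variance_change(x, k, ratio, process_var, rng):
    """Add independent Gaussian noise to the first k observations so that
    Var(first k) / Var(rest) is approximately `ratio` (assumed definition)."""
    noise_sd = math.sqrt((ratio - 1.0) * process_var)
    y = np.array(x, dtype=float)            # copy; the input is left untouched
    y[:k] += rng.normal(0.0, noise_sd, size=k)
    return y

rng = np.random.default_rng(1)
x = rng.normal(size=656)                    # stand-in for a simulated series
y = add_variance_change(x, k=100, ratio=2.0, process_var=1.0, rng=rng)
```

With a genuine fractional difference series one would pass `process_var=fd_process_variance(d)` so that the added noise is scaled to the process variance.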
4.5 Locating a Single Variance Change

4.5.1 Auxiliary Test
We shift our attention to determining the location of a variance change in the original time series. A naive choice of location can be based on the cumulative sum of squares statistic $D$; i.e., on the location of the wavelet coefficient at which the cumulative sum of squares at level $j$ achieves its maximum. Since each wavelet coefficient is simply a linear combination of observations from the original series, this procedure will yield a range of observations which contains the change of variance. The subsampling inherent in the DWT, however, causes a loss of resolution at each scale with respect to the original time series. We thus propose to use the MODWT (Section 3.3) to more accurately determine the location of a variance change after it has been detected by the DWT. This way, the location of a variance change may be associated with a specific observation in the original time series. This is another example of how the MODWT, through a lack of subsampling, is useful over and above the usual DWT.

4.5.2 Simulation Study
A study was conducted to investigate how well the statistic $\tilde{D}$, which is similar to $D$ but computed using the MODWT wavelet coefficients, locates a single variance change in a series with long memory structure when a change has already been detected using the DWT. To do this, we implemented the procedure described in Section 4.3, of order $J = 4$ and for one long memory parameter $d = 0.4$, repeated 10,000 times to test for a single change of variance in time series of length $N = 656$. The sample size and choice of long memory parameter were motivated by the Nile River example (Section 6.1). The following steps were added to the detection procedure in order to simultaneously estimate the location of the variance change:

[2a] computing the partial MODWT of order $J$, defined in Section 3.3, using the Haar, D(4) and LA(8) wavelet filters;
[3a] discarding all MODWT wavelet coefficients, on each scale, affected by the periodic boundary conditions;
[4a] computing the statistic $\tilde{D}$ for all scales based upon the remaining MODWT wavelet coefficients;
[5a] recording the location of the wavelet coefficient at which the statistic $\tilde{D}$, computed using the MODWT, attains its value and adjusting for the phase by shifting the location $L_j/2$ units to the left (see Percival and Mofjeld (1997) for more details).
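Steps [2a] to [5a] can be sketched for the simplest case, the level 1 Haar MODWT, whose filter width is $L_1 = 2$. This is an illustration only: the function names are hypothetical, the indexing convention (returning a 1-based observation number) is an assumption, and the test series is a deterministic toy example rather than a fractional difference process.

```python
import numpy as np

def haar_modwt_level1(x):
    """Level-1 Haar MODWT coefficients (X_t - X_{t-1}) / 2, omitting the single
    coefficient that wraps around the end of the series (step [3a])."""
    x = np.asarray(x, dtype=float)
    return (x[1:] - x[:-1]) / 2.0        # entry i corresponds to time t = i + 1

def css_argmax(w):
    """1-based index k at which the cumulative sum of squares statistic is attained."""
    n = w.size
    sq = w ** 2
    p = np.cumsum(sq)[:-1] / sq.sum()
    k = np.arange(1, n)
    d_plus = k / (n - 1) - p
    d_minus = p - (k - 1) / (n - 1)
    i_plus, i_minus = int(np.argmax(d_plus)), int(np.argmax(d_minus))
    return (i_plus if d_plus[i_plus] >= d_minus[i_minus] else i_minus) + 1

def locate_change(x, filter_width=2):
    """Estimated change point as a 1-based observation number."""
    k = css_argmax(haar_modwt_level1(x))  # position among retained coefficients
    t = k                                 # 0-based time of that coefficient
    t_shifted = t - filter_width // 2     # step [5a]: shift L_j / 2 to the left
    return t_shifted + 1

# Toy series: alternating signs, amplitude 10 for the first 100 values, then 1.
signs = np.where(np.arange(656) % 2 == 0, 1.0, -1.0)
toy = signs * np.where(np.arange(656) < 100, 10.0, 1.0)
k_hat = locate_change(toy)
```

Because the MODWT is not subsampled, the estimate refers directly to an observation in the original series rather than to a block of $2^j$ observations.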
[Figure 4.7 (plot not reproduced; one panel for each combination of variance ratio 1.5:1, 2:1, 3:1, 4:1 and wavelet filter Haar, D(4), LA(8), with boxplots for levels 1 and 2; horizontal axis: wavelet coefficient): Estimated locations of variance change at k = 100 for fractional difference processes (N = 656, d = 0.4) using the MODWT. Each boxplot contains a varying number of estimates corresponding to the associated rejection rate.]
As in Section 4.4.2, Step [1] was modified by adding a sequence of white noise to the first 100 observations, creating variance ratios of 1.5, 2, 3 and 4. The estimated locations of the variance changes are displayed in Figure 4.7. The estimates are roughly centered around the 100th wavelet coefficient for $j = 1, 2$, with the spread narrowing as the variance ratio increases. There is a very slight difference between wavelet filters, the broader spread being associated with the longer wavelet filters. However, for variance ratios of 2 or greater all three wavelets appear to perform equally well. The estimates from the first level of the MODWT have a median value closer to the truth, with much less spread at every combination of variance ratio and wavelet filter, as compared to the second level. The slight positive bias appearing in the first scale, with more bias in the second, appears to be an intrinsic feature of the cumulative sum of squares method. Inclan and Tiao (1994) showed that the average estimated location of change is biased towards the middle of the series when using such a procedure, for sample sizes of 100, 200 and 500, and variance ratios of 2 and 3. This should be kept in mind when interpreting the results from such an analysis.
4.5.3 Conclusions
I have shown that the cumulative sum of squares statistic, using the MODWT, can accurately locate a change of variance in fractional difference processes. When the variance ratio is large enough (at least 2), the estimates from wavelet coefficients at the first scale are distributed very tightly around the true location. Wavelet coefficients at the second scale require larger variance ratios (at least 3) to achieve the same level of accuracy. To reduce bias, I recommend using the estimate associated with the first level when trying to locate a sudden change of variance in a time series.
4.6 Testing for Multiple Variance Changes

4.6.1 Iterated Algorithm
We move on to a natural extension of the previous section: multiple unknown variance change points. In practice, a given time series may exhibit more than one potential change in variance. Inclan and Tiao (1994) and Chen and Gupta (1997) have both recently looked at this problem. They employ a "bisection algorithm" where the entire time series is initially tested. If there is a significant change of variance, then the series is split in two about the estimated change point and each part is tested individually. This process is iterated until no significant changes are found.

Inclan and Tiao (1994) included an additional step when detecting multiple variance change points. After the bisection algorithm had terminated, each "potential" change point was tested again using only those observations between its two adjacent change points. For example, a vector of length 128 containing potential change points at 26, 69, and 108 would again test 26 using only observations $1, \ldots, 69$, test 69 using observations $26, \ldots, 108$ and test 108 using observations $69, \ldots, 128$. This was to compensate for an apparent overestimation of the number of variance change points. Simulations were run both with and without this additional procedure. The rejection rates for the first two scales were found to change by up to 4% for low variance ratios and by up to 1% for larger variance ratios. All tables using the iterated CSS method include the extra step.

For a time series $Y_1, \ldots, Y_N$, the iterated CSS algorithm proceeds as follows:

[1] Determine the test statistic $D$, via the procedure described in Section 4.3, and record the point $k_1$ at which $D$ is attained. If $D$ exceeds its critical value for a given level of significance $\alpha$, then proceed to the next step. If $D$ is less than the critical value, the algorithm terminates.
[2] Determine the test statistic $D$ for the new time series $Y_1, \ldots, Y_{k_1}$. If $D$ exceeds its critical value, then repeat step 2 until $D$ is less than its critical value.
[3] Determine the test statistic $D$ for the new time series $Y_{k_1}, \ldots, Y_N$. Repeat step 3 until $D$ is less than its critical value.
[4] Go through the potential change points as outlined above.

4.6.2 Empirical Power
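Before turning to the simulations, the iterated algorithm of Section 4.6.1 can be sketched as a recursive bisection followed by one re-checking pass. This is a simplified illustration, not the author's implementation: it operates on a single sequence of coefficients, uses the asymptotic 5% critical value 1.358, performs the re-check of step [4] only once, and introduces a hypothetical `min_len` guard for short segments. All function names are assumptions.

```python
import numpy as np

def css_statistic(y):
    """Return (D, k_hat) with k_hat the 1-based index where D is attained."""
    n = y.size
    sq = y ** 2
    p = np.cumsum(sq)[:-1] / sq.sum()
    k = np.arange(1, n)
    d_plus = k / (n - 1) - p
    d_minus = p - (k - 1) / (n - 1)
    i_plus, i_minus = int(np.argmax(d_plus)), int(np.argmax(d_minus))
    if d_plus[i_plus] >= d_minus[i_minus]:
        return float(d_plus[i_plus]), i_plus + 1
    return float(d_minus[i_minus]), i_minus + 1

def significant(seg, crit):
    d, k = css_statistic(seg)
    return np.sqrt(seg.size / 2.0) * d > crit, k

def iterated_css(y, crit=1.358, min_len=8):
    """Steps [1]-[3] by bisection, then a single pass of step [4]."""
    y = np.asarray(y, dtype=float)
    found = []

    def split(lo, hi):                       # examine the segment y[lo:hi]
        if hi - lo < min_len:
            return
        reject, k = significant(y[lo:hi], crit)
        if reject:
            cp = lo + k                      # observations up to cp form the left part
            found.append(cp)
            split(lo, cp)
            split(cp, hi)

    split(0, y.size)
    cps = sorted(found)
    bounds = [0] + cps + [y.size]
    # Step [4]: re-test each candidate between its two neighbouring change points.
    return [cp for i, cp in enumerate(cps, start=1)
            if significant(y[bounds[i - 1]:bounds[i + 1]], crit)[0]]

# Toy series: alternating signs with amplitude 1, 10, 1 on three stretches.
signs = np.where(np.arange(400) % 2 == 0, 1.0, -1.0)
amp = np.concatenate([np.ones(100), 10.0 * np.ones(100), np.ones(200)])
change_points = iterated_css(signs * amp)
```

In the setting of this chapter the function would be applied separately to the DWT coefficients of each scale, with the MODWT used afterwards to refine the locations.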
The procedure outlined in Section 4.6.1 was repeated for a speci c sample size N = 656 and long memory parameter d = 0:40, with a partial DWT of order J = 4. Again, a vector of independent Gaussian random variables was added to the rst 100 observations of the fractional dierence processes in Step [1] of Section 4.3. Instead of adjusting the long memory parameter, the variance ratio between the rst 100 and remaining observations was adjusted { producing variance ratios of = 1:5, 2, 3 and 4. Table 4.5 displays simulation results for the iterated CSS method, employed in this divide-and-conquer scheme, when detecting one unknown variance change point. All tests were performed at the = 0:05 level of signi cance, and the same octave band variance ratios apply as in Table 4.4. We see the CSS procedure does quite well at locating only one variance change point for all variance ratios. With a ratio of = 2 or greater, it errs only towards multiple change points { always indicating at least one change point in the rst scales. For larger variance ratios ( 3) the procedure produces rejection rates around 90% or greater in the rst two scales, and it errs on the side of three or more change-points with greater frequency. Table 4.6 displays simulation results for the iterated CSS when detecting two unknown variance change points. Again, all tests were performed at the = 0:05 level. Gaussian random variables, of length 100, were added to the middle of the series creating two variance changes at k1 = 250 and k2 = 350 for a variety of
Table 4.5: Empirical power of iterated Cumulative Sum of Squares (CSS) algorithm for fractional dierence processes (N = 512; d = 0:4) with one variance change at k = 100.
= 1.5
Level 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
0 9:5 58:9 87:5 95:1 0:1 17:7 69:2 90:8 0:0 1:3 33:8 79:4 0:0 0:1 16:6 66:0
Haar 1
85.2 39.8 12.2 4.9 93.0 79.6 30.1 9.2 92.9 95.4 64.7 20.6 92.9 96.3 81.5 34.0
5:3 1:2 0:3 0:0 6:9 2:8 0:7 0:0 7:1 3:3 1:5 0:0 7:1 3:6 1:9 0:0
0 7:6 58:9 88:2 95:2 0:1 17:9 71:2 91:2 0:0 1:2 38:3 80:9 0:0 0:1 19:1 69:1
D(4) 1
87.8 39.7 11.6 4.8 93.5 79.5 28.2 8.8 93.6 95.5 60.4 19.1 92.7 96.3 79.1 30.9
4:6 1:4 0:2 0:0 6:4 2:6 0:6 0:0 6:4 3:3 1:2 0:0 7:3 3:6 1:8 0:0
0 7:3 60:4 90:3 95:9 0:1 20:6 77:2 93:5 0:0 1:9 49:4 87:2 0:0 0:2 29:1 79:4
LA(8) 1
88.8 38.5 9.5 4.1 94.2 77.1 22.4 6.5 93.8 95.0 49.6 12.3 93.2 96.5 69.4 20.6
3:9 1:2 0:2 0:0 5:7 2:3 0:4 0:0 6:2 3:2 1:0 0:0 6:8 3:3 1:5 0:0
Table 4.6: Empirical power of the iterated Cumulative Sum of Squares (CSS) algorithm for fractional difference processes (N = 512; d = 0.4) with two variance changes at k1 = 250 and k2 = 350. Variance ratios are given in the column headings.

                         Ratio 1.5         |         Ratio 2          |         Ratio 3          |         Ratio 4
Filter  Changes  Level 1     2     3     4 |     1     2     3     4  |     1     2     3     4  |     1     2     3     4
Haar    0           14.5  67.4  88.7  94.6 |   0.1  26.7  75.8  91.5  |   0.0   1.6  48.0  83.5  |   0.0   0.1  26.3  74.0
Haar    1            9.3  20.8  10.1   5.4 |   0.2  17.2  18.5   8.5  |   0.0   2.2  24.0  16.5  |   0.0   0.2  18.8  26.0
Haar    2           71.8  11.7   1.1   0.0 |  91.8  55.4   5.6   0.0  |  90.4  93.8  27.8   0.0  |  89.5  96.5  54.4   0.0
Haar    ≥3           4.3   0.1   0.0   0.0 |   7.9   0.7   0.0   0.0  |   9.6   2.3   0.2   0.0  |  10.5   3.2   0.6   0.0
D(4)    0           11.7  67.3  88.8 100.0 |   0.0  26.2  77.3 100.0  |   0.0   1.6  49.7 100.0  |   0.0   0.1  29.6 100.0
D(4)    1           10.4  22.6  10.6   0.0 |   0.2  22.1  19.0   0.0  |   0.0   4.0  29.8   0.0  |   0.0   0.7  28.0   0.0
D(4)    2           73.6  10.0   0.6   0.0 |  92.4  51.1   3.7   0.0  |  91.4  92.5  20.3   0.0  |  90.5  96.6  42.1   0.0
D(4)    ≥3           4.2   0.1   0.0   0.0 |   7.3   0.6   0.0   0.0  |   8.6   1.9   0.2   0.0  |   9.5   2.6   0.4   0.0
LA(8)   0           11.2  67.7  90.4 100.0 |   0.0  27.2  78.8 100.0  |   0.0   2.2  55.2 100.0  |   0.0   0.1  34.9 100.0
LA(8)   1           11.4  23.8   9.2   0.0 |   0.3  26.4  18.3   0.0  |   0.0   5.5  29.4   0.0  |   0.0   0.9  28.9   0.0
LA(8)   2           74.2   8.5   0.4   0.0 |  94.0  45.8   2.9   0.0  |  92.5  90.4  15.3   0.0  |  91.4  96.3  35.7   0.0
LA(8)   ≥3           3.2   0.1   0.0   0.0 |   5.8   0.5   0.0   0.0  |   7.5   1.9   0.1   0.0  |   8.6   2.8   0.5   0.0
variance ratios. The iterated CSS method once again performs quite well for small variance ratios (1.5), with a slight increase in power as the wavelet filter increases in length. For larger variance ratios, the first scale gives a maximum rejection rate of 94% and then hovers around 90% for very large ratios. All errors in the first scale, for higher variance ratios, are towards overestimating the number of variance changes. The second scale, which exhibits almost no power for smaller variance ratios, rapidly approaches the 90-95% range for ratios of 3 or greater and also errs primarily towards overestimating the number of variance changes. The 100% rejection rates for the D(4) and LA(8) wavelet filters in the fourth scale occur because of a reduction, due to boundary effects, in the number of wavelet coefficients below a minimum established threshold.

4.6.3 Locating Multiple Variance Changes
The MODWT can be utilized again to estimate the location of multiple variance changes. The procedure is slightly more complicated than in the single variance change scenario, but manageable. For each iteration of the algorithm, estimated locations of the variance change for both the DWT and MODWT are recorded. The DWT estimates are used to test for homogeneity of variance and the MODWT estimates are used to determine the time of the variance change, as in Section 4.5. Obviously, the MODWT estimate of the time of the variance change is discarded if the variance change is found not to be significant. Figures 4.8a-d display the estimated location of variance change for various fractional difference processes with one change of variance. We see more and more of the area of the histogram centered at k = 100 as the variance ratio increases. There also appears to be a small percentage of rejections to the right of the main peak across all levels and wavelet filters. This is to be expected, since we are not forcing the testing procedure to stop at only one change of variance. The small percentage of second or third variance changes (cf. Table 4.5) in the same scale appear as an increase in the right tails of these histograms. With this feature in mind, the procedure still performs
[Figure 4.8a: histograms, one panel for each wavelet filter (Haar, D(4), LA(8)) and level (1 to 4), variance ratio 1.5:1; horizontal axis: wavelet coefficient, 0 to 500.]

Figure 4.8a: Estimated locations of variance change at k = 100 for fractional difference processes (N = 656; d = 0.4) using the iterated cumulative sum of squares procedure and maximal overlap discrete wavelet transform. The variance ratio is 1.5.
[Figure 4.8b: histograms, one panel for each wavelet filter (Haar, D(4), LA(8)) and level (1 to 4), variance ratio 2:1; horizontal axis: wavelet coefficient, 0 to 500.]

Figure 4.8b: Estimated locations of variance change at k = 100 for fractional difference processes (N = 656; d = 0.4) using the iterated cumulative sum of squares procedure and maximal overlap discrete wavelet transform. The variance ratio is 2.
[Figure 4.8c: histograms, one panel for each wavelet filter (Haar, D(4), LA(8)) and level (1 to 4), variance ratio 3:1; horizontal axis: wavelet coefficient, 0 to 500.]

Figure 4.8c: Estimated locations of variance change at k = 100 for fractional difference processes (N = 656; d = 0.4) using the iterated cumulative sum of squares procedure and maximal overlap discrete wavelet transform. The variance ratio is 3.
[Figure 4.8d: histograms, one panel for each wavelet filter (Haar, D(4), LA(8)) and level (1 to 4), variance ratio 4:1; horizontal axis: wavelet coefficient, 0 to 500.]

Figure 4.8d: Estimated locations of variance change at k = 100 for fractional difference processes (N = 656; d = 0.4) using the iterated cumulative sum of squares procedure and maximal overlap discrete wavelet transform. The variance ratio is 4.
quite well when the variance ratio is relatively large (2 or greater), especially in the first two scales. The third and fourth scales are quite spread out and not recommended for estimating variance change points in practice. Figures 4.9a-d display the estimated location of variance change for various fractional difference processes with two variance changes. For small variance ratios (1.5) the cumulative sum of squares procedure does a decent job of locating the variance changes in the first scale, with mixed results for the second scale. As before, we do not expect much information to come from looking at higher scales, although as the magnitude of the variance ratio increases the higher scales (j = 3, 4) do exhibit structure similar to the first two scales. Regardless, we shall strictly use the first two scales for inference in the future. With respect to the first two scales, as the variance ratio increases to, say, 3 or 4, the bimodality is readily apparent. As was the case for a single variance change, the longer wavelet filters give a slightly more spread out distribution for the locations of the variance changes. To be more precise, the estimated locations appear to be skewed to the right at k1 = 250 and k2 = 350 for the D(4) and LA(8) wavelet filters, especially in the second scale.
4.6.4 Conclusions
I have presented the iterated cumulative sum of squares (CSS) algorithm for detecting and locating multiple variance changes in time series with long-range dependence. The first scale of wavelet coefficients is quite powerful for variance ratios of 2 or greater, for either one or two variance change-points. The second scale is equally powerful, but only for variance ratios of 3 or greater. This procedure also performs well at locating single or multiple variance changes using the auxiliary test statistic computed via the MODWT.
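The iterated procedure summarized above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the implementation used for the simulations: it assumes the normalized cumulative sum of squares statistic D = max(D+, D-), it uses plain binary segmentation in place of the full iterated scheme, and it replaces tabulated critical values by the asymptotic 5% point 1.358 of the supremum of a Brownian bridge, applied to sqrt(n/2) D. The function names and the `min_len` guard are hypothetical.

```python
import math


def css_statistic(w):
    """Return (D, k) for the normalized cumulative sum of squares of w.

    D = max(D+, D-) and k is the index after which the largest
    departure occurs (1 <= k <= len(w) - 1).
    """
    n = len(w)
    total = sum(x * x for x in w)
    if n < 2 or total == 0.0:
        return 0.0, 0
    d_best, k_best, running = 0.0, 0, 0.0
    for k in range(1, n):
        running += w[k - 1] ** 2
        p_k = running / total                # normalized cumulative energy
        d_plus = k / (n - 1) - p_k
        d_minus = p_k - (k - 1) / (n - 1)
        d_k = max(d_plus, d_minus)
        if d_k > d_best:
            d_best, k_best = d_k, k
    return d_best, k_best


def iterated_css(w, crit=1.358, min_len=16):
    """Binary segmentation: split at every significant change, then recurse.

    crit is an assumed asymptotic 5% critical value for sqrt(n/2) * D.
    """
    changes = []
    stack = [(0, len(w))]
    while stack:
        lo, hi = stack.pop()
        if hi - lo < min_len:                # too few coefficients to test
            continue
        d, k = css_statistic(w[lo:hi])
        if math.sqrt((hi - lo) / 2.0) * d > crit:
            changes.append(lo + k)
            stack.append((lo, lo + k))
            stack.append((lo + k, hi))
    return sorted(changes)


# Toy input: 50 values with variance 9 followed by 150 values with variance 1.
example = [3.0] * 50 + [1.0] * 150
print(iterated_css(example))
```

In the setting of this chapter the input `w` would be the non-boundary wavelet coefficients of a single scale, so that the test is applied scale by scale.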
[Figure 4.9a: histograms, one panel for each wavelet filter (Haar, D(4), LA(8)) and level (1 to 4), variance ratio 1.5:1; horizontal axis: wavelet coefficient, 0 to 500; vertical axis: percent of total.]

Figure 4.9a: Estimated locations of variance change at k1 = 251 and k2 = 350 for fractional difference processes (N = 656; d = 0.4) using the iterated cumulative sum of squares procedure and maximal overlap discrete wavelet transform. The variance ratio is 1.5.
[Figure 4.9b: histograms, one panel for each wavelet filter (Haar, D(4), LA(8)) and level (1 to 4), variance ratio 2:1; horizontal axis: wavelet coefficient, 0 to 500; vertical axis: percent of total.]

Figure 4.9b: Estimated locations of variance change at k1 = 251 and k2 = 350 for fractional difference processes (N = 656; d = 0.4) using the iterated cumulative sum of squares procedure and maximal overlap discrete wavelet transform. The variance ratio is 2.
[Figure 4.9c: histograms, one panel for each wavelet filter (Haar, D(4), LA(8)) and level (1 to 4), variance ratio 3:1; horizontal axis: wavelet coefficient, 0 to 500; vertical axis: percent of total.]

Figure 4.9c: Estimated locations of variance change at k1 = 251 and k2 = 350 for fractional difference processes (N = 656; d = 0.4) using the iterated cumulative sum of squares procedure and maximal overlap discrete wavelet transform. The variance ratio is 3.
[Figure 4.9d: histograms, one panel for each wavelet filter (Haar, D(4), LA(8)) and level (1 to 4), variance ratio 4:1; horizontal axis: wavelet coefficient, 0 to 500; vertical axis: percent of total.]

Figure 4.9d: Estimated locations of variance change at k1 = 251 and k2 = 350 for fractional difference processes (N = 656; d = 0.4) using the iterated cumulative sum of squares procedure and maximal overlap discrete wavelet transform. The variance ratio is 4.
4.7 Testing for a Change in the Long Memory Parameter

4.7.1 Introduction
It is also possible to utilize the test for homogeneity of variance to test, at least indirectly, for a change in the long memory parameter of a fractional difference process. A change in the long memory parameter should manifest itself differently than a sudden change of variance in each scale of wavelet coefficients. To be precise, whereas a change in variance will primarily affect only the first two scales of wavelet coefficients, a change in the long memory parameter should affect much higher scales (corresponding to lower frequencies). We restrict ourselves to the following alternative hypothesis, namely, the long memory parameter makes a sudden change from one value to another at time $t_0$ while the process variance remains constant. How to construct such a process is given below.

Let $\{U_t\}$ be a fractional difference process with long memory parameter $d_1$ and $\{V_t\}$ be a fractional difference process with long memory parameter $d_2$. In both processes the innovations variance is defined to be unity. From Section 2.1.1, we have expressions for the spectral density functions, variances, and autocovariance sequences of these processes. Suppose $X_1, \ldots, X_N$ is a time series where the first $t_0$ observations are a realization of a portion of the fractional difference process $\{U_t\}$. So the variance of $X_1, \ldots, X_{t_0}$ is simply

\[
\operatorname{Var}\{X_k\} = \operatorname{Var}\{U_t\} = \frac{\Gamma(1 - 2d_1)}{[\Gamma(1 - d_1)]^2}, \qquad k = 1, \ldots, t_0
\]

(cf. Equation 2.2). Let the remaining $N - t_0$ observations be a realization of a portion of a fractional difference process with long memory parameter $d_2$ and innovations variance

\[
\frac{\operatorname{Var}\{U_t\}}{\operatorname{Var}\{V_t\}}.
\]

Hence, the variance of $X_{t_0+1}, \ldots, X_N$ is equivalent to the variance of $X_1, \ldots, X_{t_0}$. The only change at time $t_0$ occurs with respect to the long memory parameter.
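The variance matching described above is easy to check numerically. The following is a small sketch, assuming the standard variance formula for a stationary fractional difference process quoted above; the function names are hypothetical and the values d1 = 0.05 and d2 = 0.4 are simply one of the pairs used later in this section.

```python
import math


def fd_variance(d, innov_var=1.0):
    """Variance of a stationary fractional difference process (|d| < 1/2)."""
    return innov_var * math.gamma(1.0 - 2.0 * d) / math.gamma(1.0 - d) ** 2


def matching_innovations_variance(d1, d2):
    """Innovations variance for the d2 segment so that its process
    variance equals that of a unit-innovations process with parameter d1."""
    return fd_variance(d1) / fd_variance(d2)


d1, d2 = 0.05, 0.4
s2 = matching_innovations_variance(d1, d2)
# Both segments now have the same process variance.
print(fd_variance(d1), fd_variance(d2, innov_var=s2))
```

Because the process variance grows with d, the matching innovations variance is smaller than one whenever d2 exceeds d1.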
[Figure 4.10: plot of four spectra, labeled d = 0.05, d = 0.25, d = 0.40 and d = 0.45.]

Figure 4.10: Spectra of fractional difference processes and octave bands of the DWT. Frequencies between the vertical dashed lines correspond to approximate pass-bands of the DWT. The spectra have been normalized in order to produce time series with the variance of a fractional difference process with long memory parameter d = 0.05.

Figure 4.10 shows how the spectra from several different fractional difference processes compare throughout octave bands which approximately correspond to scales of the DWT. All spectra have the same total energy, which is equal to the variance of a fractional difference process with long memory parameter of d = 0.05. Since the wavelet variance is approximately the integral of the spectral density function over an octave band (Percival and Guttorp 1994), we would not expect to detect a change in the variance when the spectra of the two fractional difference processes cross. In fact, the variability of the wavelet coefficients would be greater in the section of the time series with smaller long memory parameter than the section with the larger long memory parameter for the scale before they intersect, with this pattern reversed for all scales after the intersection.

4.7.2 Simulation Results
To explore how the test for homogeneity of variance, as proposed in Section 4.3, responds to a change in the long memory parameter of a fractional difference process we first simulated such processes using the methodology presented in Section 2.2. The simulation was run 1000 times, with the change in the long memory parameter occurring at three different locations (t0 = 200, 512, 824); the rejection rates for testing homogeneity of variance are given in Table 4.7. All tests were performed at the α = 0.05 level of significance. As in Table 4.4, the variance ratio is given octave band by octave band; i.e., it measures how much the underlying spectrum from the beginning of the series differs from the underlying spectrum from the end of the series on an octave band by octave band basis. These variance ratios are greater than one for the first few scales and then less than one for all subsequent scales. This agrees with the relationship between the underlying spectra as seen in Figure 4.10. For a sudden change in the long memory parameter from d1 = 0.05 to d2 = 0.25, the testing procedure does a fair job, with rejection rates reaching 80% when the change occurs in the middle of the series. We also see rejection rates around 5% at the third scale, where the two spectra associated with the first and second portion of the process cross, and slightly higher rejection rates in the subsequent scales. We see much higher rejection rates in the first scale (approximately 100%) when the long memory parameter increases to 0.4 or 0.45. When d2 = 0.4, the spectra cross in the fourth scale with rejection rates hovering around 5% and a slight increase in the rejection rates for larger scales. When d2 = 0.45, the spectra cross essentially on the
Table 4.7: Rejection rates for a change in the long memory parameter of a fractional difference process (N = 1024). For all cases, observations X_1, ..., X_{t0} have long memory parameter d1 = 0.05, while X_{t0+1}, ..., X_N have long memory parameter d2. The Ratio column provides the octave band by octave band variance ratio.

                            t0 = 200              t0 = 512              t0 = 824
d2    Level  Ratio    Haar   D(4)  LA(8)    Haar   D(4)  LA(8)    Haar   D(4)  LA(8)
0.25  1      1.48     44.8   44.6   50.9    69.6   78.1   79.2    25.2   29.8   27.8
0.25  2      1.21      7.6    9.7    9.5    15.0   15.9   16.0     6.8    7.2    5.5
0.25  3      0.93      5.1    6.5    5.7     5.7    6.1    6.2     5.7    4.5    5.5
0.25  4      0.71      5.5    5.1    5.3    10.6   11.5   12.7     9.8   10.9    7.4
0.25  5      0.54      5.1    6.7    4.5    13.3   15.4   14.0    14.7   14.3   10.8
0.25  6      0.41      5.6    5.3    4.9    15.5   15.5   11.2    17.4   14.1    5.4
0.4   1      3.08    100.0  100.0  100.0   100.0  100.0  100.0    99.8  100.0   99.8
0.4   2      2.16     77.0   80.2   82.7    96.2   97.0   96.6    47.1   49.3   42.3
0.4   3      1.37     13.1   15.5   11.4    18.9   21.8   19.7     7.1    7.7    7.0
0.4   4      0.85      5.4    5.6    5.3     6.5    7.4    5.7     5.4    6.7    6.2
0.4   5      0.53      6.5    6.2    5.0    16.9   17.0   14.6    16.0   14.8   11.2
0.4   6      0.32      6.1    6.5    5.0    19.5   15.8    9.6    24.1   20.7    6.6
0.45  1      5.76    100.0  100.0  100.0   100.0  100.0  100.0   100.0  100.0  100.0
0.45  2      3.83     99.6   99.8   99.9   100.0  100.0  100.0    96.2   97.1   96.2
0.45  3      2.28     58.7   61.9   57.5    82.0   81.8   82.6    20.3   21.1   19.2
0.45  4      1.32      9.1   10.7    8.5     8.9    7.9    9.4     5.4    4.1    5.7
0.45  5      0.76      5.4    5.2    5.7     7.2    7.8    7.5     6.0    7.6    7.4
0.45  6      0.44      5.2    5.2    6.0    13.2   10.8    8.4    13.6   13.8    6.1
boundary between the fourth and fifth scales, and hence, both sets of rejection rates are relatively small with an increase seen in the sixth scale.

4.7.3 Conclusions
I have briefly investigated how the test for homogeneity of variance reacts to changes in the long memory parameter of a fractional difference process; such changes constitute a simple example of a generalized fractional difference process (Wang et al. 1997). Specifically, the simulation procedure in Section 2.2 has been modified to produce time series where the long memory parameter changes at a given time, but the variance of the process remains constant. When applying the testing procedure to time series simulated this way, the pattern of rejection rates across scales differs from that found when a simple change in variance occurs. Hence, this method shows promise in addressing a very different alternative hypothesis, one where the variance remains constant but the autocovariance structure of the time series changes abruptly. The current procedure is crude and would benefit greatly from additional research.
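The octave band variance ratios that drive these rejection patterns can be approximated numerically. The sketch below is illustrative only and rests on several assumptions: the spectral density of a fractional difference process is taken to be proportional to |2 sin(πf)|^(-2d), each DWT level j is treated as an ideal band-pass filter over [1/2^(j+1), 1/2^j], both spectra are normalized to the same total variance, and the integral is evaluated with a simple midpoint rule. All function names are hypothetical.

```python
import math


def fd_sdf(f, d):
    """Spectral density of a fractional difference process, unit innovations."""
    return abs(2.0 * math.sin(math.pi * f)) ** (-2.0 * d)


def fd_variance(d):
    """Process variance for unit innovations variance."""
    return math.gamma(1.0 - 2.0 * d) / math.gamma(1.0 - d) ** 2


def band_power(d, j, n_steps=2000):
    """Midpoint-rule integral of the SDF over the level-j octave band
    (doubled to account for negative frequencies)."""
    lo, hi = 1.0 / 2 ** (j + 1), 1.0 / 2 ** j
    h = (hi - lo) / n_steps
    return 2.0 * h * sum(fd_sdf(lo + (i + 0.5) * h, d) for i in range(n_steps))


def octave_band_ratio(d1, d2, j):
    """Band variance of the d1 segment divided by that of the d2 segment,
    after rescaling the d2 spectrum to have the same total variance."""
    scale = fd_variance(d1) / fd_variance(d2)
    return band_power(d1, j) / (scale * band_power(d2, j))


for j in range(1, 7):
    print(j, round(octave_band_ratio(0.05, 0.25, j), 2))
```

Under these assumptions the ratio exceeds one at the first level and falls below one at the coarser levels, which is the qualitative pattern discussed for Table 4.7.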
Chapter 5
WAVELET ANALYSIS OF COVARIANCE

In this chapter, we consider the use of wavelets in the analysis of multivariate time series. In his thesis, Hudgins (1992) introduced the concepts of the wavelet cross spectrum and wavelet cross correlation, both in terms of the continuous wavelet transform. In a subsequent paper, Hudgins, Friehe, and Mayer (1993) applied these concepts to atmospheric turbulence. They found the bivariate wavelet techniques provided a better analysis of the data than traditional Fourier methods, especially at low frequencies. Lindsay, Percival, and Rothrock (1996) defined the sample wavelet covariance for the discrete wavelet transform (DWT) and maximal overlap discrete wavelet transform (MODWT) along with confidence intervals based on large sample results. These methods were applied to the surface temperature and albedo of ice pack in the Beaufort Sea. Kawata and Arimoto (1996) discussed the wavelet correlation and its ability to match features between two signals. They show the estimated wavelet correlation is effective when compared to other measures of "local correlation," such as the Gabor transform (or short-time Fourier transform). Li and Nozaki (1997) also introduced the wavelet cross-correlation in terms of the continuous wavelet transform and related it to the cross spectrum. They went on to use the real portion of the wavelet cross-correlation to analyze both simulated and real signals. Recently, Torrence and Compo (1998) discussed the cross-wavelet spectrum, which is complex valued, and the cross-wavelet power, which is simply the magnitude of their cross-wavelet spectrum. They also introduced confidence intervals for their cross-wavelet power and compared the Southern Oscillation Index with the Niño3 sea surface temperature.

Here, we introduce quantities that measure the association between two stationary processes based on the DWT and MODWT. First, the wavelet covariance for bivariate stationary time series is introduced along with the notion of a decomposition of covariance. That is, the wavelet covariance is shown to decompose the covariance between two stationary processes on a scale by scale basis. The wavelet correlation is also introduced, which is analogous to the usual correlation coefficient but utilizes the wavelet covariance and wavelet variance. Asymptotic normality of the wavelet covariance and correlation is established. Estimation procedures are provided along with approximate confidence intervals for the estimated wavelet covariance and wavelet correlation. Both the wavelet covariance and wavelet correlation are generalized into the wavelet cross-covariance and wavelet cross-correlation. The lack of shift invariance of the DWT is shown to bias the variance of the DWT estimator of the wavelet cross-covariance. This may arise due to misalignment between the two time series. Finally, moment properties of two potential estimators for the variance of the wavelet covariance are investigated. One is shown to be clearly superior to the other.
5.1 Definition of the Wavelet Covariance

Let $\{X_t\}$ and $\{Y_t\}$ be stationary processes with univariate spectra (also known as autospectra) $S_X(\cdot)$ and $S_Y(\cdot)$, respectively. The wavelet covariance of $\{X_t, Y_t\}$ for scale $\lambda_j = 2^{j-1}$ is defined to be

\[
\gamma_{XY}(\lambda_j) \equiv \frac{1}{2\lambda_j} \operatorname{Cov}\left\{ W_{j,t}^{(X)}, W_{j,t}^{(Y)} \right\} = \frac{1}{2\lambda_j} E\left\{ W_{j,t}^{(X)} W_{j,t}^{(Y)} \right\}, \tag{5.1}
\]

where $\{W_{j,t}^{(X)}\}$ and $\{W_{j,t}^{(Y)}\}$ are the scale $\lambda_j$ wavelet coefficients for $\{X_t\}$ and $\{Y_t\}$, respectively.

5.1.1 Decomposition of Covariance

We now show a basic result of the wavelet covariance, namely, it decomposes the covariance between two stationary time series on a scale by scale basis. This argument closely follows the proof of decomposition of variance for the wavelet variance, see Percival and Walden (1999, Sec. 8.1), the major complication here being that the cross spectrum $S_{XY}(\cdot)$ is a complex-valued function (cf. Section C.1). We begin by expressing the covariance between two filtered time series in the Fourier domain.
Proposition 5.1 Suppose that $\{X_t\}$ and $\{Y_t\}$ are zero-mean weakly stationary processes with autospectra $S_X(\cdot)$ and $S_Y(\cdot)$, respectively. If $\{a_l \mid l = 0, \ldots, L-1\}$ is a filter of length $L$ with transfer function defined to be

\[
A(f) \equiv \sum_{l=0}^{L-1} a_l e^{-i2\pi fl},
\]

then the covariance between $\{a_l * X_t\}$ and $\{a_l * Y_t\}$ is given by

\[
\operatorname{Cov}\{a_l * X_t, a_l * Y_t\} = \int_{-1/2}^{1/2} \mathcal{A}(f) S_{XY}(f)\, df,
\]

where $\mathcal{A}(f) \equiv |A(f)|^2$ is the squared gain function for $A(\cdot)$.
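As a concrete instance of the transfer function and squared gain function in Proposition 5.1, consider the Haar filters. This is a worked example under an assumption: the MODWT Haar wavelet and scaling filters are taken to be $\{1/2, -1/2\}$ and $\{1/2, 1/2\}$, and the symbols $\widetilde{H}$, $\widetilde{G}$ (transfer functions) and $\widetilde{\mathcal{H}}$, $\widetilde{\mathcal{G}}$ (squared gain functions) are used only for this illustration.

```latex
\begin{align*}
\widetilde{H}(f) &= \tfrac{1}{2}\left(1 - e^{-i2\pi f}\right),
&
\widetilde{\mathcal{H}}(f) &= |\widetilde{H}(f)|^2
  = \tfrac{1}{4}\left(2 - 2\cos 2\pi f\right) = \sin^2(\pi f), \\
\widetilde{G}(f) &= \tfrac{1}{2}\left(1 + e^{-i2\pi f}\right),
&
\widetilde{\mathcal{G}}(f) &= |\widetilde{G}(f)|^2
  = \tfrac{1}{4}\left(2 + 2\cos 2\pi f\right) = \cos^2(\pi f), \\
&& \widetilde{\mathcal{G}}(f) + \widetilde{\mathcal{H}}(f) &= 1
  \quad \text{for all } f.
\end{align*}
```

For this pair, Proposition 5.1 says that the covariance between the two wavelet-filtered series is $\int_{-1/2}^{1/2} \sin^2(\pi f)\, S_{XY}(f)\, df$, so the filter weights the cross spectrum most heavily near $|f| = 1/2$.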
Proof We can apply the spectral representation theorem (Equation (B.1)) to the stationary processes $\{X_t\}$ and $\{Y_t\}$, giving us

\[
X_t = \int_{-1/2}^{1/2} e^{i2\pi ft}\, dZ_X(f)
\quad \text{and} \quad
Y_t = \int_{-1/2}^{1/2} e^{i2\pi ft}\, dZ_Y(f),
\]

where $\{Z_X(\cdot)\}$ and $\{Z_Y(\cdot)\}$ are not only orthogonal but also cross-orthogonal processes; i.e., $E[dZ_X^*(f)\, dZ_Y(f')] = 0$, $f \neq f'$. Define $\{U_t\}$ and $\{V_t\}$ to be filtered versions of $\{X_t\}$ and $\{Y_t\}$, respectively; i.e.,

\[
U_t \equiv a_l * X_t = \sum_{l=0}^{L-1} a_l X_{t-l}
\quad \text{and} \quad
V_t \equiv a_l * Y_t = \sum_{l=0}^{L-1} a_l Y_{t-l}.
\]

From Section A.2 we know that convolution in the time domain is equivalent to multiplication in the Fourier domain; hence, alternative representations for $\{U_t\}$ and $\{V_t\}$ are given by the spectral representation theorem (Equation (B.1)) via

\[
U_t = \int_{-1/2}^{1/2} A(f) e^{i2\pi ft}\, dZ_X(f)
\quad \text{and} \quad
V_t = \int_{-1/2}^{1/2} A(f) e^{i2\pi ft}\, dZ_Y(f).
\]

The covariance between $\{U_t\}$ and $\{V_t\}$, using Fubini's theorem (Lehmann 1983, p. 15) and the fact that $\{Z_X(\cdot)\}$ and $\{Z_Y(\cdot)\}$ are cross-orthogonal processes, is therefore

\begin{align*}
\operatorname{Cov}\{U_t, V_t\} = E\{U_t V_t\}
&= E\left[ \int_{-1/2}^{1/2} A^*(f') e^{-i2\pi f't}\, dZ_X^*(f') \int_{-1/2}^{1/2} A(f) e^{i2\pi ft}\, dZ_Y(f) \right] \\
&= \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} A^*(f') A(f) e^{i2\pi (f - f')t}\, E\left[ dZ_X^*(f')\, dZ_Y(f) \right] \\
&= \int_{-1/2}^{1/2} \mathcal{A}(f) S_{XY}(f)\, df,
\end{align*}

where $S_{XY}(\cdot)$ is the cross spectrum of $\{X_t, Y_t\}$. □

Using this fact, we can now establish the finite decomposition of covariance between two time series using the wavelet covariance.
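Proposition 5.2 Let $\{X_t\}$ and $\{Y_t\}$ be weakly stationary processes with autospectra $S_X(\cdot)$ and $S_Y(\cdot)$, respectively. Define $\widetilde{V}_{J,t}^{(X)} \equiv \sum_l \tilde{g}_{J,l} X_{t-l}$ and $\widetilde{V}_{J,t}^{(Y)} \equiv \sum_l \tilde{g}_{J,l} Y_{t-l}$, which are stationary processes obtained by filtering $\{X_t\}$ and $\{Y_t\}$ using the MODWT scaling filter $\{\tilde{g}_{J,l}\}$. For any integer $J \geq 1$,

\[
\operatorname{Cov}\{X_t, Y_t\} = \operatorname{Cov}\left\{ \widetilde{V}_{J,t}^{(X)}, \widetilde{V}_{J,t}^{(Y)} \right\} + \sum_{j=1}^{J} \gamma_{XY}(\lambda_j),
\]

where $\gamma_{XY}(\lambda_j)$ is the wavelet covariance for scale $\lambda_j$.

Proof Because $\{\widetilde{W}_{j,t}^{(X)}\}$ and $\{\widetilde{W}_{j,t}^{(Y)}\}$ are obtained by filtering the stationary processes $\{X_t\}$ and $\{Y_t\}$, respectively, we know that $\{\widetilde{W}_{j,t}^{(X)}\}$ and $\{\widetilde{W}_{j,t}^{(Y)}\}$ are stationary processes with autospectra defined by

\[
S_{j,X}(f) = \widetilde{\mathcal{H}}_j(f) S_X(f)
\quad \text{and} \quad
S_{j,Y}(f) = \widetilde{\mathcal{H}}_j(f) S_Y(f),
\]

where $\widetilde{\mathcal{H}}_j(\cdot)$ is the squared gain function for $\{\tilde{h}_{j,l}\}$. Since the wavelet covariance $\gamma_{XY}(\lambda_j)$ is the covariance between $\{\widetilde{W}_{j,t}^{(X)}\}$ and $\{\widetilde{W}_{j,t}^{(Y)}\}$, and since the integral of the cross spectrum $S_{XY}(\cdot)$ is equal to this covariance, we can use Proposition 5.1 to obtain

\[
\gamma_{XY}(\lambda_j) = \int_{-1/2}^{1/2} \widetilde{\mathcal{H}}_j(f) S_{XY}(f)\, df,
\qquad
\operatorname{Cov}\left\{ \widetilde{V}_{J,t}^{(X)}, \widetilde{V}_{J,t}^{(Y)} \right\} = \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f) S_{XY}(f)\, df,
\]

where $\widetilde{\mathcal{G}}_J(\cdot)$ is the squared gain function for $\{\tilde{g}_{J,l}\}$. The squared gain functions for $\{\tilde{h}_{j,l}\}$ and $\{\tilde{g}_{J,l}\}$ are given by a formula equivalent to Equation (3.6) for squared gain functions; i.e.,

\[
\widetilde{\mathcal{H}}_j(f) = \widetilde{\mathcal{H}}(2^{j-1} f) \prod_{l=0}^{j-2} \widetilde{\mathcal{G}}(2^l f)
\quad \text{and} \quad
\widetilde{\mathcal{G}}_J(f) = \prod_{l=0}^{J-1} \widetilde{\mathcal{G}}(2^l f).
\]

Since $\widetilde{\mathcal{H}}(f) = \mathcal{H}(f)/2$ and $\widetilde{\mathcal{G}}(f) = \mathcal{G}(f)/2$, we may use Equation (3.5) to say that $\widetilde{\mathcal{G}}(f) + \widetilde{\mathcal{H}}(f) = 1$ for all $f$. We therefore have

\begin{align*}
\operatorname{Cov}\{X_t, Y_t\} = \int_{-1/2}^{1/2} S_{XY}(f)\, df
&= \int_{-1/2}^{1/2} \left[ \widetilde{\mathcal{G}}(f) + \widetilde{\mathcal{H}}(f) \right] S_{XY}(f)\, df \\
&= \operatorname{Cov}\left\{ \widetilde{V}_{1,t}^{(X)}, \widetilde{V}_{1,t}^{(Y)} \right\} + \gamma_{XY}(\lambda_1),
\end{align*}

and the case when $J = 1$ holds. We now proceed to prove the main assertion by induction. Assume the property holds for $J - 1$; i.e.,

\[
\operatorname{Cov}\{X_t, Y_t\} = \operatorname{Cov}\left\{ \widetilde{V}_{J-1,t}^{(X)}, \widetilde{V}_{J-1,t}^{(Y)} \right\} + \sum_{j=1}^{J-1} \gamma_{XY}(\lambda_j).
\]

So we have

\begin{align*}
\operatorname{Cov}\left\{ \widetilde{V}_{J-1,t}^{(X)}, \widetilde{V}_{J-1,t}^{(Y)} \right\}
&= \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_{J-1}(f) S_{XY}(f)\, df \\
&= \int_{-1/2}^{1/2} \left[ \prod_{l=0}^{J-2} \widetilde{\mathcal{G}}(2^l f) \right] S_{XY}(f)\, df \\
&= \int_{-1/2}^{1/2} \left[ \widetilde{\mathcal{G}}(2^{J-1} f) + \widetilde{\mathcal{H}}(2^{J-1} f) \right] \left[ \prod_{l=0}^{J-2} \widetilde{\mathcal{G}}(2^l f) \right] S_{XY}(f)\, df \\
&= \int_{-1/2}^{1/2} \left[ \widetilde{\mathcal{G}}_J(f) + \widetilde{\mathcal{H}}_J(f) \right] S_{XY}(f)\, df \\
&= \operatorname{Cov}\left\{ \widetilde{V}_{J,t}^{(X)}, \widetilde{V}_{J,t}^{(Y)} \right\} + \gamma_{XY}(\lambda_J).
\end{align*}

Therefore,

\begin{align*}
\operatorname{Cov}\{X_t, Y_t\}
&= \operatorname{Cov}\left\{ \widetilde{V}_{J,t}^{(X)}, \widetilde{V}_{J,t}^{(Y)} \right\} + \gamma_{XY}(\lambda_J) + \sum_{j=1}^{J-1} \gamma_{XY}(\lambda_j) \\
&= \operatorname{Cov}\left\{ \widetilde{V}_{J,t}^{(X)}, \widetilde{V}_{J,t}^{(Y)} \right\} + \sum_{j=1}^{J} \gamma_{XY}(\lambda_j).
\end{align*}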
□

The decomposition of covariance will now be established by allowing $J \to \infty$. This is intuitively plausible since the scaling filter is capturing smaller and smaller portions of the cross spectrum as $J$ gets larger.
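A finite-sample analogue of the finite decomposition can be checked directly. The sketch below is an illustration under stated assumptions rather than the estimator developed later in this chapter: it uses the Haar MODWT with circular (periodic) filtering, raw cross-products in place of covariances (i.e., both series are treated as zero mean), and hypothetical function names. With these choices the identity holds exactly, up to floating-point error.

```python
import random


def haar_modwt_step(v, shift):
    """One level of the Haar MODWT pyramid with circular filtering.

    Returns (wavelet, scaling) coefficients computed from the previous
    level's scaling coefficients v, using a lag of `shift` samples.
    """
    n = len(v)
    w_next = [(v[t] - v[(t - shift) % n]) / 2.0 for t in range(n)]
    v_next = [(v[t] + v[(t - shift) % n]) / 2.0 for t in range(n)]
    return w_next, v_next


def cross_product_decomposition(x, y, J):
    """Return (scaling_term, wavelet_terms) such that
    mean(x * y) = scaling_term + sum(wavelet_terms)."""
    n = len(x)
    vx, vy = list(x), list(y)
    wavelet_terms = []
    for j in range(1, J + 1):
        shift = 2 ** (j - 1)
        wx, vx = haar_modwt_step(vx, shift)
        wy, vy = haar_modwt_step(vy, shift)
        wavelet_terms.append(sum(a * b for a, b in zip(wx, wy)) / n)
    scaling_term = sum(a * b for a, b in zip(vx, vy)) / n
    return scaling_term, wavelet_terms


random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(64)]
y = [0.5 * a + random.gauss(0.0, 1.0) for a in x]
scaling, wavelet = cross_product_decomposition(x, y, J=3)
total = sum(a * b for a, b in zip(x, y)) / len(x)
print(total, scaling + sum(wavelet))
```

The two printed values agree, mirroring the split of the covariance into a scaling-coefficient term plus one wavelet term per scale.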
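Theorem 5.1 Let $\{X_t\}$ and $\{Y_t\}$ be stationary processes with autospectra $S_X(\cdot)$ and $S_Y(\cdot)$, respectively, and let $\gamma_{XY}(\lambda_j)$ be the wavelet covariance associated with scale $\lambda_j$, then

\[
\sum_{j=1}^{\infty} \gamma_{XY}(\lambda_j) = \operatorname{Cov}\{X_t, Y_t\};
\]

that is, the wavelet covariance decomposes the covariance between $\{X_t\}$ and $\{Y_t\}$ on a scale by scale basis.

Lemma 5.1 For all $\epsilon > 0$, there exists a $J_\epsilon$ such that $\left| \operatorname{Cov}\left\{ \widetilde{V}_{J,t}^{(X)}, \widetilde{V}_{J,t}^{(Y)} \right\} \right| < \epsilon$ for $J > J_\epsilon$.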
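Proof Because $\sum_l g_{J,l}^2 = 1$ and $\tilde{g}_{J,l} = g_{J,l}/2^{J/2}$, we have $\sum_l \tilde{g}_{J,l}^2 = 1/2^J$. Parseval's relation (cf. Section A.2) tells us that

\[
\int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f)\, df = \sum_{l=0}^{L_J - 1} \tilde{g}_{J,l}^2 = \frac{1}{2^J}.
\]

Recall, we know the amplitude spectrum $A_{XY}(f) \equiv |S_{XY}(f)|$ is a non-negative real valued function (cf. Section C.1). Hence, if $A_{XY}(\cdot)$ is bounded by some finite number $C$, then for $J > J_\epsilon$,

\begin{align*}
\left| \operatorname{Cov}\left\{ \widetilde{V}_{J,t}^{(X)}, \widetilde{V}_{J,t}^{(Y)} \right\} \right|
&= \left| \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f) S_{XY}(f)\, df \right| \\
&\leq \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f) |S_{XY}(f)|\, df \\
&= \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f) A_{XY}(f)\, df \\
&\leq C \int_{-1/2}^{1/2} \widetilde{\mathcal{G}}_J(f)\, df = \frac{C}{2^J} < \epsilon.
\end{align*}

If $A_{XY}(\cdot)$ cannot be bounded by any finite number $C$, there at least exists a constant $C$ such that

\[
\int_{A_{XY}(f) \geq C} A_{XY}(f)\, df < \frac{\epsilon}{2},
\]

using a Lebesgue integral. A rough bound on the squared gain function of the scaling filter for Daubechies wavelets is $\widetilde{\mathcal{G}}_J(f) \leq 1$, so for all $J > J_\epsilon$,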
GeJ (f )SXY (f ) df + GeJ (f )SXY (f ) df (f )C AZ (f ) > < Ne1 PlN=?L1?1 Wj;l (XW)j;l+(Y;) = 0; : : : ; Nj ? 1; fj;l Wfj;l+ ; = ?1; : : : ; ?(Nej ? 1); (5.11)
~;XY (j ) > Ne l=L ?1? W > : 0; otherwise: The bias is due to the denominator 1=Nej remaining constant for all lags. We are still j
not using wavelet coecients which make use of the periodic boundary conditions. Just as with the wavelet covariance, we can de ne a biased estimator of the wavelet cross-covariance based on the DWT to be 8 1 PN ?0 ?1 W (X )W (Y ) ; = 0; : : : ; N bj ? 1; > > < 21Nb PlN=?L1 j;l(X ) j;l(Y+) (5.12)
^;XY (j ) > 2Nb l=L0 ? Wj;l Wj;l+ ; = ?1; : : : ; ?(Nbj ? 1); > : 0; otherwise: j
115 This estimator is biased for the same reason as Equation (5.11). This quantity is provided for completeness, as stated in Section 5.2.2, the inherent subsampling of the DWT will result in the variance of ^;XY (j ) being 2j -periodic unless the two series are properly aligned.
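The biased MODWT estimator of the wavelet cross-covariance can be written as a short function. The sketch below is one reading of the estimator, assuming the level-j MODWT coefficients of both series are already available as equal-length sequences, that the first L_j - 1 coefficients are boundary coefficients to be skipped, and that the normalizing count is N - L_j + 1 for every lag. The function and argument names are hypothetical.

```python
def modwt_cross_covariance(wx, wy, l_j, tau):
    """Biased cross-covariance at lag tau between two sequences of
    level-j MODWT coefficients, skipping boundary coefficients.

    wx, wy : coefficient sequences of equal length N
    l_j    : width of the level-j wavelet filter (assumed known)
    tau    : integer lag; wy is read at index l + tau
    """
    n = len(wx)
    n_tilde = n - l_j + 1                    # number of non-boundary coefficients
    if 0 <= tau <= n_tilde - 1:
        idx = range(l_j - 1, n - tau)        # l = L_j - 1, ..., N - tau - 1
    elif -(n_tilde - 1) <= tau <= -1:
        idx = range(l_j - 1 - tau, n)        # l = L_j - 1 - tau, ..., N - 1
    else:
        return 0.0
    # The divisor stays fixed at n_tilde for all lags, hence the bias.
    return sum(wx[l] * wy[l + tau] for l in idx) / n_tilde


wx = [0.0, 1.0, 2.0, 3.0]
wy = [0.0, 1.0, 1.0, 1.0]
print([modwt_cross_covariance(wx, wy, 2, tau) for tau in (-1, 0, 1)])
```

Setting `tau = 0` gives the corresponding estimate of the wavelet covariance itself.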
5.3 Estimating the Wavelet Correlation and Cross-Correlation

Since the wavelet correlation is simply made up of the wavelet covariance for $\{X_t, Y_t\}$ and wavelet variances for $\{X_t\}$ and $\{Y_t\}$, the MODWT estimator of the wavelet cross-correlation is simply

\[
\tilde{\rho}_{\tau,XY}(\lambda_j) \equiv \frac{\tilde{\gamma}_{\tau,XY}(\lambda_j)}{\tilde{\nu}_X(\lambda_j)\, \tilde{\nu}_Y(\lambda_j)},
\]

where $\tilde{\gamma}_{\tau,XY}(\lambda_j)$ is given in Equation (5.11), and $\tilde{\nu}_X^2(\lambda_j)$ and $\tilde{\nu}_Y^2(\lambda_j)$ are given in Equation (3.11). When $\tau = 0$ we obtain the MODWT estimator of the wavelet correlation between $\{X_t, Y_t\}$. Large sample theory for the cross-correlation is more difficult to come by than for the cross-covariance. The following result can be found in Fuller (1996, p. 342).
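Proposition 5.3 If $\{\widetilde{W}_{j,t}^{(X)}, \widetilde{W}_{j,t}^{(Y)}\}$ is a bivariate Gaussian weakly stationary process and if all autocovariance and cross-covariance sequences are absolutely summable, then

\begin{align*}
\lim_{\widetilde{N}_j \to \infty} \widetilde{N}_j \operatorname{Cov}\{\tilde{\rho}_{\tau,XY}(\lambda_j), \tilde{\rho}_{\tau',XY}(\lambda_j)\}
= \sum_{t=-\infty}^{\infty} \Big\{ & \rho_{t,X}(\lambda_j) \rho_{t+\tau'-\tau,Y}(\lambda_j) + \rho_{t+\tau',XY}(\lambda_j) \rho_{t-\tau,YX}(\lambda_j) \\
& - \rho_{\tau,XY}(\lambda_j) \left[ \rho_{t,X}(\lambda_j) \rho_{t+\tau',YX}(\lambda_j) + \rho_{t,Y}(\lambda_j) \rho_{t-\tau',YX}(\lambda_j) \right] \\
& - \rho_{\tau',XY}(\lambda_j) \left[ \rho_{t,X}(\lambda_j) \rho_{t+\tau,YX}(\lambda_j) + \rho_{t,Y}(\lambda_j) \rho_{t-\tau,YX}(\lambda_j) \right] \\
& + \rho_{\tau,XY}(\lambda_j) \rho_{\tau',XY}(\lambda_j) \left[ \tfrac{1}{2} \rho_{t,X}^2(\lambda_j) + \rho_{t,XY}^2(\lambda_j) + \tfrac{1}{2} \rho_{t,Y}^2(\lambda_j) \right] \Big\}.
\end{align*}

See the corresponding corollary in Fuller (1996). As established in the proof of Theorem 5.2, we only need square integrability of the autospectra of $\{X_t, Y_t\}$ to ensure absolute summability of the autocovariance and cross-covariance sequences. Therefore, using the assumptions of Theorem 5.2 we may use the conclusion of Proposition 5.3 to determine the large sample variance of the wavelet cross-correlation. Thus, for large $\widetilde{N}_j$, the expression in Proposition 5.3 reduces to

\begin{align*}
\operatorname{Var}\{\tilde{\rho}_{XY}(\lambda_j)\} \approx \frac{1}{\widetilde{N}_j} \sum_{t=-(\widetilde{N}_j-1)}^{\widetilde{N}_j-1} \Big\{ & \rho_{t,X}(\lambda_j) \rho_{t,Y}(\lambda_j) + \rho_{t,XY}(\lambda_j) \rho_{t,YX}(\lambda_j) \\
& - 2\rho_{0,XY}(\lambda_j) \left[ \rho_{t,X}(\lambda_j) \rho_{t,YX}(\lambda_j) + \rho_{t,Y}(\lambda_j) \rho_{t,YX}(\lambda_j) \right] \\
& + \rho_{0,XY}^2(\lambda_j) \left[ \tfrac{1}{2} \rho_{t,X}^2(\lambda_j) + \rho_{t,XY}^2(\lambda_j) + \tfrac{1}{2} \rho_{t,Y}^2(\lambda_j) \right] \Big\}, \tag{5.14}
\end{align*}

where

\[
\rho_{t,X}(\lambda_j) \equiv \frac{\frac{1}{2\lambda_j} E\left\{ W_{j,l}^{(X)} W_{j,l+|t|}^{(X)} \right\}}{\nu_X^2(\lambda_j)} \tag{5.15}
\]

is the lag-$t$ wavelet autocorrelation at scale $\lambda_j$ for the process $\{X_t\}$. Brillinger (1979) constructed approximate confidence intervals for the auto and cross-correlation sequences of bivariate stationary time series. We present a brief outline of his result for the MODWT estimated wavelet cross-correlation coefficients in the form of a theorem.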
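Theorem 5.3 Let $L > 2d$, and suppose $\{\widetilde{W}_{j,t}^{(X)}, \widetilde{W}_{j,t}^{(Y)}\}$ is a bivariate Gaussian weakly stationary process with square integrable autospectra, then the MODWT estimator $\tilde{\rho}_{XY}(\lambda_j)$ of the wavelet correlation is asymptotically normally distributed with mean $\rho_{XY}(\lambda_j)$ and large sample variance given by Equation (5.14).

Proof Since $L > 2d$, we have that both sets of wavelet coefficients $\{\widetilde{W}_{j,t}^{(X)}\}$ and $\{\widetilde{W}_{j,t}^{(Y)}\}$ have mean zero. Let us define

\[
A_{j,l} \equiv \left[ \widetilde{W}_{j,l}^{(X)} \right]^2, \quad
B_{j,l} \equiv \left[ \widetilde{W}_{j,l}^{(Y)} \right]^2, \quad \text{and} \quad
C_{j,l} \equiv \widetilde{W}_{j,l}^{(X)} \widetilde{W}_{j,l}^{(Y)},
\]

and subsequently define their sample means

\begin{align*}
\overline{A}_j &\equiv \frac{1}{\widetilde{N}_j} \sum_{l=L_j-1}^{N-1} A_{j,l} = \tilde{\nu}_X^2(\lambda_j), \\
\overline{B}_j &\equiv \frac{1}{\widetilde{N}_j} \sum_{l=L_j-1}^{N-1} B_{j,l} = \tilde{\nu}_Y^2(\lambda_j), \quad \text{and} \\
\overline{C}_j &\equiv \frac{1}{\widetilde{N}_j} \sum_{l=L_j-1}^{N-1} C_{j,l} = \tilde{\gamma}_{XY}(\lambda_j).
\end{align*}

The vector-valued process $\{A_{j,t}, B_{j,t}, C_{j,t}\}$ has an absolutely summable joint cumulant sequence by Theorem 2.9.1 of Brillinger (1981, p. 38). Hence, from Lemma 5.2 the vector of sample means $\{\overline{A}_j, \overline{B}_j, \overline{C}_j\}$ is asymptotically normally distributed with mean vector $\{\nu_X^2(\lambda_j), \nu_Y^2(\lambda_j), \gamma_{XY}(\lambda_j)\}$, and large sample variance given by $\widetilde{N}_j^{-1} S_{j,ABC}(0)$, where $S_{j,ABC}(\cdot)$ is the $3 \times 3$ spectral matrix for $\{A_{j,t}, B_{j,t}, C_{j,t}\}$ (cf. Section C.1). The MODWT estimator of the wavelet correlation $\tilde{\rho}_{XY}(\lambda_j)$ is essentially a function of these sample means $g(\overline{A}_j, \overline{B}_j, \overline{C}_j)$, where $g(x, y, z) \equiv z/\sqrt{xy}$. Appealing to Mann and Wald (1943), we have that $\tilde{\rho}_{XY}(\lambda_j)$ is asymptotically normally distributed with mean $\rho_{XY}(\lambda_j)$ and large sample variance

\[
\widetilde{N}_j^{-1}\, \dot{g}\left( \nu_X^2(\lambda_j), \nu_Y^2(\lambda_j), \gamma_{XY}(\lambda_j) \right)^T S_{j,ABC}(0)\, \dot{g}\left( \nu_X^2(\lambda_j), \nu_Y^2(\lambda_j), \gamma_{XY}(\lambda_j) \right), \tag{5.16}
\]

where $\dot{g}(\cdot, \cdot, \cdot)$ is the gradient of $g(\cdot, \cdot, \cdot)$. To complete the proof, we must show the equivalence of Equation (5.16) to Equation (5.14). Because we are evaluating $S_{j,ABC}(\cdot)$ at $f = 0$, it is in fact a symmetric matrix of the form

\[
S_{j,ABC}(0) =
\begin{bmatrix}
S_{j,AA}(0) & S_{j,AB}(0) & S_{j,AC}(0) \\
S_{j,AB}(0) & S_{j,BB}(0) & S_{j,BC}(0) \\
S_{j,AC}(0) & S_{j,BC}(0) & S_{j,CC}(0)
\end{bmatrix},
\]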
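where the elements of the matrix are

\begin{align*}
S_{j,AA}(0) &= 2 \int_{-1/2}^{1/2} S_{j,X}^2(f)\, df, \\
S_{j,BB}(0) &= 2 \int_{-1/2}^{1/2} S_{j,Y}^2(f)\, df, \\
S_{j,CC}(0) &= \int_{-1/2}^{1/2} S_{j,X}(f) S_{j,Y}(f)\, df + \int_{-1/2}^{1/2} S_{j,XY}^2(f)\, df, \\
S_{j,AB}(0) &= 2 \int_{-1/2}^{1/2} S_{j,XY}(f) S_{j,YX}(f)\, df, \\
S_{j,AC}(0) &= 2 \int_{-1/2}^{1/2} S_{j,X}(f) S_{j,YX}(f)\, df, \quad \text{and} \\
S_{j,BC}(0) &= 2 \int_{-1/2}^{1/2} S_{j,Y}(f) S_{j,YX}(f)\, df.
\end{align*}

The gradient is given by

\[
\dot{g}\left( \nu_X^2(\lambda_j), \nu_Y^2(\lambda_j), \gamma_{XY}(\lambda_j) \right) =
\left[
-\frac{\gamma_{XY}(\lambda_j)}{2\nu_X^2(\lambda_j) \sqrt{\nu_X^2(\lambda_j) \nu_Y^2(\lambda_j)}}, \;
-\frac{\gamma_{XY}(\lambda_j)}{2\nu_Y^2(\lambda_j) \sqrt{\nu_X^2(\lambda_j) \nu_Y^2(\lambda_j)}}, \;
\frac{1}{\sqrt{\nu_X^2(\lambda_j) \nu_Y^2(\lambda_j)}}
\right]^T
\]

and, therefore, matrix multiplication of Equation (5.16) produces

\begin{align*}
& \frac{\gamma_{XY}^2(\lambda_j)}{4\nu_X^6(\lambda_j) \nu_Y^2(\lambda_j)} S_{j,AA}(0)
+ \frac{\gamma_{XY}^2(\lambda_j)}{2\nu_X^4(\lambda_j) \nu_Y^4(\lambda_j)} S_{j,AB}(0)
+ \frac{\gamma_{XY}^2(\lambda_j)}{4\nu_X^2(\lambda_j) \nu_Y^6(\lambda_j)} S_{j,BB}(0) \\
& \quad + \frac{1}{\nu_X^2(\lambda_j) \nu_Y^2(\lambda_j)} S_{j,CC}(0)
- \frac{\gamma_{XY}(\lambda_j)}{\nu_X^4(\lambda_j) \nu_Y^2(\lambda_j)} S_{j,AC}(0)
- \frac{\gamma_{XY}(\lambda_j)}{\nu_X^2(\lambda_j) \nu_Y^4(\lambda_j)} S_{j,BC}(0).
\end{align*}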
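Utilizing Parseval's relation, each auto and cross spectrum in $S_{j,ABC}(0)$ can be approximated by a sum of squared auto or cross-covariance sequences, respectively. Hence, we may express Equation (5.16) as

\begin{align*}
\frac{1}{\widetilde{N}_j} \sum_{\tau=-(\widetilde{N}_j-1)}^{\widetilde{N}_j-1} \Bigg\{
& \frac{\gamma_{XY}^2(\lambda_j)}{4\nu_X^6(\lambda_j) \nu_Y^2(\lambda_j)} 2 s_{j,\tau,X}^2
+ \frac{\gamma_{XY}^2(\lambda_j)}{2\nu_X^4(\lambda_j) \nu_Y^4(\lambda_j)} 2 C_{j,\tau,XY} C_{j,\tau,YX}
+ \frac{\gamma_{XY}^2(\lambda_j)}{4\nu_X^2(\lambda_j) \nu_Y^6(\lambda_j)} 2 s_{j,\tau,Y}^2 \\
& + \frac{1}{\nu_X^2(\lambda_j) \nu_Y^2(\lambda_j)} \left( s_{j,\tau,X} s_{j,\tau,Y} + C_{j,\tau,XY}^2 \right)
- \frac{\gamma_{XY}(\lambda_j)}{\nu_X^4(\lambda_j) \nu_Y^2(\lambda_j)} 2 s_{j,\tau,X} C_{j,\tau,YX}
- \frac{\gamma_{XY}(\lambda_j)}{\nu_X^2(\lambda_j) \nu_Y^4(\lambda_j)} 2 s_{j,\tau,Y} C_{j,\tau,YX} \Bigg\}.
\end{align*}

Each of the autocovariance terms is equivalent to the wavelet autocovariance for scale $\lambda_j$ (defined by letting both wavelet coefficients come from the same process in Equation (5.1)) and each cross-covariance term is equivalent to the wavelet cross-covariance for scale $\lambda_j$. Using these quantities, Equation (5.16) may finally be expressed as a function of auto and cross-correlations based on the wavelet coefficients

\begin{align*}
\frac{1}{\widetilde{N}_j} \sum_{\tau=-(\widetilde{N}_j-1)}^{\widetilde{N}_j-1} \Big\{
& \rho_{\tau,X}(\lambda_j) \rho_{\tau,Y}(\lambda_j) + \rho_{\tau,XY}^2(\lambda_j)
- 2\rho_{0,XY}(\lambda_j) \left[ \rho_{\tau,X}(\lambda_j) \rho_{\tau,YX}(\lambda_j) + \rho_{\tau,Y}(\lambda_j) \rho_{\tau,YX}(\lambda_j) \right] \\
& + \rho_{0,XY}^2(\lambda_j) \left[ \tfrac{1}{2} \rho_{\tau,X}^2(\lambda_j) + \rho_{\tau,XY}(\lambda_j) \rho_{\tau,YX}(\lambda_j) + \tfrac{1}{2} \rho_{\tau,Y}^2(\lambda_j) \right] \Big\},
\end{align*}

which is (almost) equivalent to Equation (5.14), for large $\widetilde{N}_j$.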
5.4 Confidence Intervals for the Wavelet Covariance and Correlation

5.4.1 Wavelet Covariance
We now discuss how to formulate confidence intervals for the estimators of the wavelet covariance by making use of the large sample result in Equation (5.6). This was previously given in Lindsay, Percival, and Rothrock (1996). We will use the periodogram (Equation (B.3)) and the cross-periodogram (Equation (C.3)) to help estimate the quantities of interest. First, we simply use the periodogram $\hat S^{(p)}_{j,X}(\cdot)$ of $\widetilde W^{(X)}_{j,t}$, $t = L_j - 1, \ldots, N - 1$, as the estimator of $S_{j,X}(\cdot)$, so that
\[
\hat S^{(p)}_{j,X}(f) \equiv \frac{1}{\tilde N_j} \left| \sum_{l=L_j-1}^{N-1} \widetilde W^{(X)}_{j,l}\, e^{-i2\pi f l} \right|^2,
\]
and similarly for $\hat S^{(p)}_{j,Y}(\cdot)$. Next, we define the biased estimator of the autocovariance sequence associated with the scale $\lambda_j$ MODWT wavelet coefficients of $\{X_t\}$ by
\[
\hat s^{(p)}_{j,\tau,X} \equiv \frac{1}{\tilde N_j} \sum_{l=L_j-1}^{N-1-|\tau|} \widetilde W^{(X)}_{j,l}\, \widetilde W^{(X)}_{j,l+|\tau|},
\]
with a similar definition for $\{\hat s^{(p)}_{j,\tau,Y}\}$, the biased estimator of the autocovariance sequence associated with the scale $\lambda_j$ MODWT wavelet coefficients of $\{Y_t\}$. Second, we use the cross-periodogram $\hat S^{(p)}_{j,XY}(\cdot)$ of $\widetilde W^{(X)}_{j,t}, \widetilde W^{(Y)}_{j,t}$, $t = L_j - 1, \ldots, N - 1$, as the estimator of $S_{j,XY}(\cdot)$, so that
\[
\hat S^{(p)}_{j,XY}(f) \equiv \frac{1}{\tilde N_j} \left( \sum_{l=L_j-1}^{N-1} \widetilde W^{(X)}_{j,l}\, e^{-i2\pi f l} \right)^{*} \left( \sum_{l=L_j-1}^{N-1} \widetilde W^{(Y)}_{j,l}\, e^{-i2\pi f l} \right),
\]
and the corresponding biased estimator of the cross-covariance sequence associated with the scale $\lambda_j$ MODWT wavelet coefficients of $\{X_t, Y_t\}$ by
\[
\hat C^{(p)}_{j,\tau,XY} \equiv \frac{1}{\tilde N_j} \sum_{l} \widetilde W^{(X)}_{j,l}\, \widetilde W^{(Y)}_{j,l+\tau},
\]
where the summation goes from $l = L_j - 1$ to $N - 1 - \tau$ for $\tau \ge 0$ and from $l = L_j - 1 - \tau$ to $N - 1$ for $\tau < 0$.

Substituting the periodogram estimates of the autospectra and cross spectrum into Equation (5.6) gives us an estimator $\tilde V_j$ for the large sample variance of the MODWT estimator of the wavelet covariance. We can use Parseval's relation to obtain an alternative representation for $\tilde V_j$ that uses only the autocovariance and cross-covariance sequences instead of the autospectra and cross spectrum. Specifically, the integral of the product of the autospectra is
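determined from the autocovariance sequences of $\{X_t\}$ and $\{Y_t\}$ via
\begin{align*}
\int_{-1/2}^{1/2} \hat S^{(p)}_{j,X}(f)\, \hat S^{(p)}_{j,Y}(f)\, df
&= \sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \hat s^{(p)}_{j,\tau,X}\, \hat s^{(p)}_{j,\tau,Y} \\
&= \hat s^{(p)}_{j,0,X}\, \hat s^{(p)}_{j,0,Y} + 2 \sum_{\tau=1}^{\tilde N_j-1} \hat s^{(p)}_{j,\tau,X}\, \hat s^{(p)}_{j,\tau,Y},
\end{align*}
and the integral of the product of the cross spectra from the cross-covariance sequence of $\{X_t, Y_t\}$
\[
\int_{-1/2}^{1/2} \left[ \hat S^{(p)}_{j,YX}(f) \right]^2 df = \sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \left[ \hat C^{(p)}_{j,\tau,XY} \right]^2.
\]
We may now make an explicit definition for the large sample variance of the MODWT estimator of the wavelet covariance using the autocovariance and cross-covariance sequences obtained from periodogram estimates of the autospectra and cross spectrum; i.e.,
\[
\tilde V_j \equiv \frac{\hat s^{(p)}_{j,0,X}\, \hat s^{(p)}_{j,0,Y}}{2} + \sum_{\tau=1}^{\tilde N_j-1} \hat s^{(p)}_{j,\tau,X}\, \hat s^{(p)}_{j,\tau,Y} + \frac{1}{2} \sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \left[ \hat C^{(p)}_{j,\tau,XY} \right]^2. \tag{5.17}
\]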
Under the assumption that the spectral estimates are close to the true values, an approximate $100(1 - 2p)\%$ confidence interval for $\gamma_{XY}(\lambda_j)$ is
\[
\left[ \tilde\gamma_{XY}(\lambda_j) - \Phi^{-1}(1 - p) \sqrt{\frac{\tilde V_j}{\tilde N_j}},\;\; \tilde\gamma_{XY}(\lambda_j) + \Phi^{-1}(1 - p) \sqrt{\frac{\tilde V_j}{\tilde N_j}} \right],
\]
where $\Phi^{-1}(p)$ is the $p \times 100\%$ percentage point for the standard normal distribution. Replacing the MODWT wavelet coefficients with their DWT counterparts, and adjusting for the number of wavelet coefficients, will lead to an analogous confidence interval for the DWT estimator of the wavelet covariance.

5.4.2 Wavelet Correlation
We now use the large sample theory developed in Section 5.3 to construct approximate confidence intervals for the MODWT estimator of the wavelet correlation. Given the non-normality of the correlation coefficient for small sample sizes, a nonlinear transformation is sometimes required, namely Fisher's z-transformation (Fisher 1915; Kotz, Johnson, and Read 1982, Volume 3). Let
\[
h(\rho) \equiv \frac{1}{2} \log\left( \frac{1 + \rho}{1 - \rho} \right) = \tanh^{-1}(\rho)
\]
define the transformation. For the estimated correlation coefficient $\hat\rho$, based on $n$ independent samples, $\sqrt{n - 3}\,\big(h(\hat\rho) - h(\rho)\big)$ has approximately a $N(0, 1)$ distribution. The factor $\sqrt{n - 3}$ leads to a better approximation of the distribution (David 1966). An approximate $100(1 - 2p)\%$ confidence interval for $\rho_{XY}(\lambda_j)$ based on the MODWT is therefore
\[
\left[ \tanh\left\{ h[\tilde\rho_{XY}(\lambda_j)] - \frac{\Phi^{-1}(1 - p)}{\sqrt{\hat N_j - 3}} \right\},\;\;
\tanh\left\{ h[\tilde\rho_{XY}(\lambda_j)] + \frac{\Phi^{-1}(1 - p)}{\sqrt{\hat N_j - 3}} \right\} \right],
\]
where $\hat N_j$ is the number of DWT wavelet coefficients associated with scale $\lambda_j$. Note that I am using the number of wavelet coefficients as if I had computed the point estimates using the DWT. This is done to provide a "better" estimate of the sample size with respect to the number of approximately uncorrelated observations. The assumption of uncorrelated observations is only valid if we believe no systematic trends or nonstationary features exist at that scale. If an equivalent degrees of freedom argument were available for the wavelet covariance, this could be utilized instead of $\hat N_j$ (cf. Section 7.5). The primary benefit of the variance stabilizing transformation $h(\cdot)$ is that we can avoid estimating the large sample variance for $\tilde\rho_{XY}(\lambda_j)$ in Equation (5.14).
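The interval above can be sketched in a few lines of Python. This is a minimal illustration and not code from the dissertation: the function name `wavelet_correlation_ci`, its argument names, and the example values are assumptions made here, and only the Python standard library is used.

```python
import math
from statistics import NormalDist


def wavelet_correlation_ci(rho_hat, n_hat, p=0.025):
    """Approximate 100(1 - 2p)% interval for a wavelet correlation.

    rho_hat : point estimate of the wavelet correlation at one scale
    n_hat   : number of DWT coefficients at that scale (must exceed 3)
    p       : tail probability on each side of the interval
    """
    if not -1.0 < rho_hat < 1.0:
        raise ValueError("rho_hat must lie strictly between -1 and 1")
    if n_hat <= 3:
        raise ValueError("n_hat must exceed 3")
    z = NormalDist().inv_cdf(1.0 - p)       # standard normal quantile
    half_width = z / math.sqrt(n_hat - 3)   # width on the transformed scale
    h = math.atanh(rho_hat)                 # Fisher's z-transformation
    return math.tanh(h - half_width), math.tanh(h + half_width)


# Hypothetical example: estimated correlation 0.5 with 28 DWT coefficients.
lower, upper = wavelet_correlation_ci(0.5, 28)
print(f"[{lower:.3f}, {upper:.3f}]")
```

Because the interval is built on the transformed scale and mapped back with `tanh`, its end points always stay inside (-1, 1) and it is not symmetric about the point estimate.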
5.5 Comparison of Variance Estimators for the Wavelet Covariance

Looking back at Equation (5.4) we can see an alternative way to estimate the variance of $\tilde\gamma_{XY}(\lambda_j)$ by estimating the autocovariance sequence of the product of the scale $\lambda_j$ MODWT wavelet coefficients for $\{X_t\}$ and $\{Y_t\}$; i.e.,
\[
\hat s^{(p)}_{j,\tau,XY} \equiv \frac{1}{\tilde N_j} \sum_{l=L_j-1}^{N-1-|\tau|} \left( \widetilde W^{(X)}_{j,l} \widetilde W^{(Y)}_{j,l} - \overline{E}_{j,XY} \right) \left( \widetilde W^{(X)}_{j,l+|\tau|} \widetilde W^{(Y)}_{j,l+|\tau|} - \overline{E}_{j,XY} \right),
\]
where
\[
\overline{E}_{j,XY} \equiv \frac{1}{\tilde N_j} \sum_{l=L_j-1}^{N-1} \widetilde W^{(X)}_{j,l} \widetilde W^{(Y)}_{j,l}.
\]
Now define the alternative variance estimate to be
\[
\tilde{\tilde V}_j \equiv \sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \left( 1 - \frac{|\tau|}{\tilde N_j} \right) \hat s^{(p)}_{j,\tau,XY} \tag{5.18}
\]
(cf. Equation (5.4)), and the variance of $\tilde\gamma_{XY}(\lambda_j)$ can be approximated by $\tilde{\tilde V}_j / \tilde N_j$. If we are interested in the performance of one estimator versus the other in practice, then we can look at the moment properties of these estimators. The following sections investigate the bias and mean squared error of $\tilde{\tilde V}_j$ and $\tilde V_j$. For the bias, explicit calculations are made and backed up with simulation results. Due to the complexity of calculating the mean squared error explicitly, only simulation results are provided.

5.5.1 First Moment Properties of $\tilde{\tilde V}_j$
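We start with the first moment (bias) properties of the variance estimator given in Equation (5.18). The expectation of $\tilde{\tilde V}_j$, using Anderson (1971, p. 449), is given by
\begin{align*}
E\{\tilde{\tilde V}_j\}
&= \sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \left( 1 - \frac{|\tau|}{\tilde N_j} \right) E\{\hat s^{(p)}_{j,\tau,XY}\} \\
&= \sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \left( 1 - \frac{|\tau|}{\tilde N_j} \right) \int_{-1/2}^{1/2} \Bigg\{ \cos(2\pi f \tau)
- 2\, \frac{\sin(\pi f \tilde N_j)\, \sin(\pi f (\tilde N_j - |\tau|))\, \cos(\pi f \tau)}{\tilde N_j\, (\tilde N_j - |\tau|)\, \sin^2(\pi f)} \\
&\qquad\qquad + \left( \frac{\sin(\pi f \tilde N_j)}{\tilde N_j \sin(\pi f)} \right)^2 \Bigg\}\, S_{j,(XY)}(f)\, df,
\end{align*}
where $S_{j,(XY)}(\cdot)$ is the spectral density of the product of the wavelet coefficients $\widetilde W^{(X)}_{j,l} \widetilde W^{(Y)}_{j,l}$. The bias for $\tilde{\tilde V}_j$ is therefore given by
\[
\mathrm{bias}\{\tilde{\tilde V}_j\} = E\{\tilde{\tilde V}_j\} - V_j = E\{\tilde{\tilde V}_j\} - S_{j,(XY)}(0). \tag{5.19}
\]
At first glance, it is difficult to determine the bias of $\tilde{\tilde V}_j$ simply from Equation (5.19).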
A surprising result comes from Percival (1993), when we restrict ourselves to the biased estimator of the acvs in practice; i.e., before taking its expectation. Specifically, when the process mean is unknown, the biased estimator of the acvs $\{\hat s^{(p)}_{j,\tau,XY}\}$ obeys
\[
\sum_{\tau=-(\tilde N_j-1)}^{\tilde N_j-1} \hat s^{(p)}_{j,\tau,XY} = 0.
\]
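The identity above is easy to check numerically. The following is a minimal sketch, assuming a generic sequence `z` standing in for the products of wavelet coefficients; the helper name `biased_acvs` and the simulated data are illustrative choices made here, not part of the dissertation.

```python
import random


def biased_acvs(z, tau):
    """Biased sample autocovariance at lag tau, centred on the sample mean."""
    n = len(z)
    zbar = sum(z) / n
    lag = abs(tau)
    return sum((z[l] - zbar) * (z[l + lag] - zbar) for l in range(n - lag)) / n


random.seed(1)
z = [random.gauss(0.0, 1.0) for _ in range(64)]
n = len(z)

# Sum the biased acvs over every available lag, negative and positive.
total = sum(biased_acvs(z, tau) for tau in range(-(n - 1), n))
print(total)  # zero up to floating-point rounding
```

The sum collapses because it equals the squared sum of the mean-corrected values divided by the sample size, and deviations from the sample mean always sum to zero.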
So we have that, for large sample sizes, the quantity $\tilde{\tilde V}_j$ will be approximately zero and the empirical bias will therefore be approximately equal to $-V_j$. This fact is confirmed, through Monte Carlo simulation, below.

5.5.2 First Moment Properties of $\tilde V_j$
To simplify the notation, let $X_t$, $t = 1, \ldots, N$, and $Y_t$, $t = 1, \ldots, N$, denote the wavelet coefficients of interest associated with level $j$. In order to examine the bias properties of $\tilde V_j$, we will make an argument utilizing the equivalent degrees of freedom of spectral estimators. Some preliminary results on second moment properties of spectral estimators, from Priestley (1981, pp. 700–702), will be useful here; specifically
\begin{align*}
\mathrm{Cov}\{\hat S_X(f), \hat S_Y(f)\} &\approx \frac{C_W}{N}\, S_{XY}(f)\, S^{*}_{XY}(f) = \frac{C_W}{N}\, |S_{XY}(f)|^2, \quad f \ne 0, 1/2, \tag{5.20} \\
\mathrm{Var}\{\hat R_{XY}(f)\} &\approx \frac{C_W}{N}\, \frac{1}{2} \left\{ S_X(f) S_Y(f) + R^2_{XY}(f) - Q^2_{XY}(f) \right\}, \quad f \ne 0, 1/2, \tag{5.21} \\
\mathrm{Var}\{\hat Q_{XY}(f)\} &\approx \frac{C_W}{N}\, \frac{1}{2} \left\{ S_X(f) S_Y(f) + Q^2_{XY}(f) - R^2_{XY}(f) \right\}, \quad f \ne 0, 1/2, \tag{5.22}
\end{align*}
where $\hat S_{XY}(\cdot)$, $\hat R_{XY}(\cdot)$ and $\hat Q_{XY}(\cdot)$ are estimates of the cross, co- and quadrature spectra. That is, I can write $S_{XY}(f) = R_{XY}(f) - iQ_{XY}(f)$ (cf. Section C.1). The fraction $C_W/N$ involves the smoothing window $W_m(\cdot)$ applied to the spectral estimator. Using Equation (B.5), we can re-express this fraction as
\[
\frac{C_W}{N} = \frac{1}{N} \int_{-f_{(N)}}^{f_{(N)}} W_m^2(\phi)\, d\phi = \frac{2}{C_h\, \eta},
\]
where $C_h$ is based on the type of data taper used and $\eta$ is the equivalent degrees of freedom for the spectral estimator (cf. Section B.3). Assuming no tapering (i.e., $C_h = 1$) and utilizing Equations (5.20)–(5.22), we can write
\begin{align*}
E\{\hat S_X(f) \hat S_Y(f)\} &= \mathrm{Cov}\{\hat S_X(f), \hat S_Y(f)\} + E\{\hat S_X(f)\}\, E\{\hat S_Y(f)\} \\
&\approx \frac{2}{\eta}\, |S_{XY}(f)|^2 + S_X(f) S_Y(f), \quad f \ne 0, 1/2, \\
E\{\hat R^2_{XY}(f)\} &= \mathrm{Var}\{\hat R_{XY}(f)\} + \left( E\{\hat R_{XY}(f)\} \right)^2 \\
&\approx \frac{1}{\eta} \left\{ S_X(f) S_Y(f) + R^2_{XY}(f) - Q^2_{XY}(f) \right\} + R^2_{XY}(f), \quad f \ne 0, 1/2, \\
E\{\hat Q^2_{XY}(f)\} &= \mathrm{Var}\{\hat Q_{XY}(f)\} + \left( E\{\hat Q_{XY}(f)\} \right)^2 \\
&\approx \frac{1}{\eta} \left\{ S_X(f) S_Y(f) + Q^2_{XY}(f) - R^2_{XY}(f) \right\} + Q^2_{XY}(f), \quad f \ne 0, 1/2.
\end{align*}
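Note, all the above quantities are real-valued random variables. The integrals of the squared cross spectrum and magnitude squared cross spectrum can both be expressed through their co- and quadrature spectra; i.e.,
\begin{align*}
\int_{-1/2}^{1/2} S^2_{XY}(f)\, df &= \int_{-1/2}^{1/2} R^2_{XY}(f)\, df - 2i \int_{-1/2}^{1/2} R_{XY}(f)\, Q_{XY}(f)\, df - \int_{-1/2}^{1/2} Q^2_{XY}(f)\, df, \tag{5.23} \\
\int_{-1/2}^{1/2} |S_{XY}(f)|^2\, df &= \int_{-1/2}^{1/2} R^2_{XY}(f)\, df + \int_{-1/2}^{1/2} Q^2_{XY}(f)\, df.
\end{align*}
The cross-product term in Equation (5.23) disappears because
\[
\int_{-1/2}^{1/2} R_{XY}(f)\, Q_{XY}(f)\, df = \sum_{\tau=-\infty}^{\infty} \sum_{\tau'=-\infty}^{\infty} C^{(e)}_{\tau,XY}\, C^{(o)}_{\tau',XY} \int_{-1/2}^{1/2} \cos(2\pi f \tau\, \Delta t)\, \sin(2\pi f \tau'\, \Delta t)\, df = 0,
\]
where $C^{(e)}_{\tau,XY} \equiv (C_{\tau,XY} + C_{-\tau,XY})/2$ and $C^{(o)}_{\tau,XY} \equiv (C_{\tau,XY} - C_{-\tau,XY})/2$ are even and odd sequences, respectively, based on the cross-covariance sequence. Therefore, Equation (5.23) reduces to
\[
\int_{-1/2}^{1/2} S^2_{XY}(f)\, df = \int_{-1/2}^{1/2} R^2_{XY}(f)\, df - \int_{-1/2}^{1/2} Q^2_{XY}(f)\, df.
\]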
This gives an approximate expectation, since the frequencies $f = 0, 1/2$ are included in the integrals, of
\begin{align*}
E\{\tilde V_j\}
&= \frac{1}{2} \int_{-1/2}^{1/2} E\{\hat S_{j,X}(f) \hat S_{j,Y}(f)\}\, df + \frac{1}{2} \int_{-1/2}^{1/2} E\{\hat S^2_{j,XY}(f)\}\, df \\
&= \frac{1}{2} \int_{-1/2}^{1/2} E\{\hat S_{j,X}(f) \hat S_{j,Y}(f)\}\, df + \frac{1}{2} \int_{-1/2}^{1/2} E\{\hat R^2_{j,XY}(f)\}\, df - \frac{1}{2} \int_{-1/2}^{1/2} E\{\hat Q^2_{j,XY}(f)\}\, df \\
&\approx \frac{1}{\eta} \int_{-1/2}^{1/2} |S_{j,XY}(f)|^2\, df + \frac{1}{2} \int_{-1/2}^{1/2} S_{j,X}(f) S_{j,Y}(f)\, df + \left( \frac{1}{\eta} + \frac{1}{2} \right) \int_{-1/2}^{1/2} S^2_{j,XY}(f)\, df.
\end{align*}
When the magnitude squared coherence between the two processes is unity, then $S_{j,X}(f) S_{j,Y}(f) = |S_{j,XY}(f)|^2$ and therefore
\[
E\{\tilde V_j\} \approx \left( \frac{1}{\eta} + \frac{1}{2} \right) \int_{-1/2}^{1/2} S_{j,X}(f) S_{j,Y}(f)\, df + \left( \frac{1}{\eta} + \frac{1}{2} \right) \int_{-1/2}^{1/2} S^2_{j,XY}(f)\, df.
\]
Hence, $\tilde V_j$ is an unbiased estimator of $V_j$ when the periodogram is used to estimate the spectra; i.e., when $\eta = 2$ as defined in Equation (5.17). If $X_t = Y_t$, then the MODWT estimator of the wavelet covariance $\tilde\gamma_{XY}(\lambda_j)$ is equivalent to the MODWT estimator of the wavelet variance $\tilde\nu^2_X(\lambda_j)$ and Equation (5.24) reduces to the quantity $A_{W_j}$, where $A_{W_j}/\tilde N_j$ is the large sample variance of $\tilde\nu^2_X(\lambda_j)$ in Percival (1995).
5.5.3 Empirical Results
The first moment properties of $\tilde V_j$ and $\tilde{\tilde V}_j$ were nontrivial to obtain. If we are to investigate the second moment properties of these estimators, then we are dealing with even higher moments and, hence, more complicated expressions. Instead of solving this analytically, we refer to empirical results. For the estimator $\tilde V_j$, let us define the empirical bias
\[
\widehat{\mathrm{bias}}\{\tilde V_j\} \equiv \frac{1}{M} \sum_{m=1}^{M} \left( \tilde V_j - V_j \right),
\]
and the empirical mean squared error
\[
\widehat{\mathrm{mse}}\{\tilde V_j\} \equiv \frac{1}{M} \sum_{m=1}^{M} \left( \tilde V_j - V_j \right)^2,
\]
for a given number of iterations $M$. Analogous estimates may be defined for the alternative estimator $\tilde{\tilde V}_j$.

The empirical bias and mean squared error for $\tilde V_j$ when analyzing two uncorrelated white noise processes $\{X_t\}$ and $\{Y_t\}$, with variances $\sigma_X^2 = \sigma_Y^2 = 1$, are given in Table 5.2. This reduces the relevant quantities to something similar to the case of the wavelet variance (if $\sigma_X^2 = \sigma_Y^2$, as in this case, then it is exactly the wavelet variance). We see that the estimates are quite close to their theoretical values with the Haar and D(4) wavelet filters performing similarly. In order to compute $V_j$, we make use of the lack of correlation between these processes to reduce Equation (5.6) to
\[
\int_{-1/2}^{1/2} \widetilde{\mathcal H}_j^2(f)\, S_X(f)\, S_Y(f)\, df = \sigma_X^2 \sigma_Y^2 \int_{-1/2}^{1/2} \widetilde{\mathcal H}_j^2(f)\, df,
\]
which depends upon the squared gain function of the wavelet filter for scale $\lambda_j$. While the squared gain function for the first scale is easy enough to compute analytically, for higher scales this integral was evaluated through numeric integration (Press et al. 1992, Ch. 4).

Table 5.2: Empirical bias and mean squared error (mse) of $\tilde V_j$, $j = 1, \ldots, 6$, for uncorrelated white noise processes ($N = 512$), based on $M = 500$ realizations.

Haar wavelet filter
Level   V_j      ave.     bias       mse
1       0.3750   0.3763    0.00127   0.00317
2       0.1094   0.1086   -0.00076   0.00318
3       0.0449   0.0443   -0.00064   0.00010
4       0.0212   0.0210   -0.00023   0.00005
5       0.0105   0.0105    0.00000   0.00002
6       0.0052   0.0049   -0.00033   0.00001

D(4) wavelet filter
Level   V_j      ave.     bias       mse
1       0.4102   0.4063    0.00383   0.00336
2       0.1315   0.1294   -0.00210   0.00056
3       0.0620   0.0613   -0.00067   0.00024
4       0.0308   0.0309    0.00009   0.00015
5       0.0154   0.0144   -0.00100   0.00005
6       0.0077   0.0067   -0.00104   0.00003

From the table, we see that the estimates have, on average, negligible bias and very small mean squared error for either wavelet filter. Figure 5.1 displays the distributions of the estimated variances $\tilde V_j$, $j = 1, \ldots, 6$, between these uncorrelated white noise processes ($N = 512$). The estimates appear to be distributed symmetrically about their true value at all scales. There appears to be a slight increase in variability when using the D(4) wavelet filter over the Haar across all scales. Also shown are the distributions of the estimated variances $\tilde{\tilde V}_j$, $j = 1, \ldots, 6$. These estimates have skewed distributions and are negatively biased across all scales. As previously stated, the constraint on the acvs with unknown mean appears to be forcing the estimates towards $-V_j$.

[Figure 5.1 appears here. Panels: Haar $\tilde V$, Haar $\tilde{\tilde V}$, D(4) $\tilde V$, D(4) $\tilde{\tilde V}$; rows: Levels 1 to 6; horizontal axis: Estimate - True Value.]

Figure 5.1: Estimates of $\tilde V_j$ (left column) and $\tilde{\tilde V}_j$ (right column), $j = 1, \ldots, 6$, for uncorrelated white noise processes ($N = 512$), based on $M = 500$ iterations.

One of the simplest relationships between bivariate time series is linear regression with delay; see, e.g., Priestley (1981, pp. 663–664). If we have two time series $\{X_t\}$ and $\{Y_t\}$ with autospectra $S_X(\cdot)$ and $S_Y(\cdot)$, respectively, that are related via
\[
Y_t = c X_{t-d} + \epsilon_t,
\]
then (cf. Section A.2) we know their spectra are related via
\[
S_Y(f) = c^2 S_X(f) + \sigma_\epsilon^2\, \Delta t.
\]
Their cross spectrum is given by $S_{XY}(f) = c\, e^{-i2\pi f d \Delta t} S_X(f)$, with co-spectrum taking the form $R_{XY}(f) = c \cos(2\pi f d \Delta t) S_X(f)$ and quadrature spectrum $Q_{XY}(f) = c \sin(2\pi f d \Delta t) S_X(f)$. To simplify these expressions, assume $\{X_t\}$ is a white noise process with $\sigma_X^2 = 1$, let $c = 1$ and $\sigma_\epsilon^2 = 0$. Then Equation (5.6) reduces to
\[
c^2 \sigma_X^4 \int_{-1/2}^{1/2} \widetilde{\mathcal H}_j^2(f)\, df + c^2 \sigma_X^4 \int_{-1/2}^{1/2} \widetilde{\mathcal H}_j^2(f)\, e^{-i4\pi f d}\, df,
\]
and can be evaluated via numeric integration as was the case with uncorrelated processes.

Table 5.3: Empirical bias and mean squared error (mse) of $\tilde V_j$, $j = 1, \ldots, 6$, for white noise processes which are related via linear regression with delay ($N = 512$), based on $M = 500$ iterations.

Haar wavelet filter
Level   V_j      ave.     bias       mse
1       0.7500   0.7430   -0.00697   0.01523
2       0.2188   0.2180   -0.00072   0.00200
3       0.0898   0.0890   -0.00084   0.00060
4       0.0425   0.0422   -0.00029   0.00030
5       0.0209   0.0195   -0.00141   0.00014
6       0.0104   0.0096   -0.00083   0.00005

D(4) wavelet filter
Level   V_j      ave.     bias       mse
1       0.8203   0.8132   -0.00717   0.01858
2       0.2630   0.2612   -0.00105   0.00369
3       0.1240   0.1224   -0.00159   0.00147
4       0.0617   0.0611   -0.00057   0.00085
5       0.0308   0.0282   -0.00266   0.00044
6       0.0154   0.0129   -0.00250   0.00017
Table 5.3 gives the empirical bias and mean squared error for a simulation study based on two processes related via Equation (5.25). Again, we see a slightly larger bias and mean squared error when using the D(4) wavelet filter.

The distributions of the estimated variances $\tilde V_j$, $j = 1, \ldots, 6$, between processes which are related via linear regression with delay are given in Figure 5.2. The estimates appear to be distributed symmetrically about their true value at all scales. There is a slight increase in variability when using the D(4) wavelet filter over the Haar across scales. Also shown are the distributions of the estimated variances $\tilde{\tilde V}_j$, $j = 1, \ldots, 6$. Again, these distributions are negatively biased about their true value.

[Figure 5.2 appears here. Panels: Haar $\tilde V$, Haar $\tilde{\tilde V}$, D(4) $\tilde V$, D(4) $\tilde{\tilde V}$; rows: Levels 1 to 6; horizontal axis: Estimate - True Value.]

Figure 5.2: Estimates of $\tilde V_j$ (left column) and $\tilde{\tilde V}_j$ (right column), $j = 1, \ldots, 6$, minus their true value, for processes which satisfy a linear regression with delay relationship ($N = 512$), based on $M = 500$ iterations.

5.5.4 Conclusions
We have compared two potential estimators, $\tilde V_j$ and $\tilde{\tilde V}_j$, for the variance of the wavelet covariance. The former is based on using periodogram-based estimates of the integrals in Equation (5.6), while the latter uses an estimate of the autocovariance sequence in Equation (5.4). The variance estimate $\tilde V_j$, defined in Equation (5.17), is an unbiased estimator of $V_j$ and has negligible mean squared error when considering uncorrelated or linear regression with delay processes. The alternative variance estimate $\tilde{\tilde V}_j$, defined in Equation (5.18), is a negatively biased estimator of $V_j$ with the bias approaching $-V_j$ for large $\tilde N_j$.
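The behaviour of the periodogram-based estimator can be explored with a small simulation. The sketch below is illustrative only and makes several assumptions: it is restricted to the level 1 Haar MODWT filter (coefficients 1/2 and -1/2), it uses a much shorter series and fewer realizations than the study above to keep pure Python fast, and all function names are invented here. The true value is taken to be the integral of the squared Haar squared gain function, sin^4(pi f), which evaluates to 3/8.

```python
import math
import random


def haar_modwt_level1(x):
    """Level 1 Haar MODWT coefficients that are unaffected by the boundary."""
    return [(x[t] - x[t - 1]) / 2.0 for t in range(1, len(x))]


def cross_cov(u, v, tau):
    """Biased cross-covariance (1/n) * sum_l u[l] * v[l + tau], no mean removal."""
    n = len(u)
    lo, hi = max(0, -tau), min(n, n - tau)
    return sum(u[l] * v[l + tau] for l in range(lo, hi)) / n


def v_tilde(wx, wy):
    """Variance estimate built from biased auto- and cross-covariances."""
    n = len(wx)
    auto = 0.5 * cross_cov(wx, wx, 0) * cross_cov(wy, wy, 0)
    auto += sum(cross_cov(wx, wx, tau) * cross_cov(wy, wy, tau)
                for tau in range(1, n))
    cross = 0.5 * sum(cross_cov(wx, wy, tau) ** 2 for tau in range(-(n - 1), n))
    return auto + cross


def true_v1_haar(steps=2000):
    """Midpoint-rule integral of sin^4(pi f) over [-1/2, 1/2] (unit variances)."""
    h = 1.0 / steps
    return h * sum(math.sin(math.pi * (-0.5 + (k + 0.5) * h)) ** 4
                   for k in range(steps))


random.seed(0)
N, M = 128, 40  # smaller than N = 512, M = 500 used in the text
estimates = []
for _ in range(M):
    x = [random.gauss(0.0, 1.0) for _ in range(N)]
    y = [random.gauss(0.0, 1.0) for _ in range(N)]
    estimates.append(v_tilde(haar_modwt_level1(x), haar_modwt_level1(y)))

v1 = true_v1_haar()
bias = sum(estimates) / M - v1
mse = sum((e - v1) ** 2 for e in estimates) / M
print(f"V_1 = {v1:.4f}, empirical bias = {bias:.4f}, empirical mse = {mse:.4f}")
```

With so few realizations the empirical bias is noisy, so the output should only be read as a rough check that the estimates scatter around the theoretical value rather than as a reproduction of Table 5.2.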
Chapter 6
APPLICATIONS

In this chapter we apply the various techniques previously introduced, such as testing homogeneity of variance in time series and analyzing bivariate time series using wavelet estimators, to real data.

The Nile River minimum water levels (Toussoun 1925) form a time series of yearly measurements starting in 622 AD and continuing, with both large and small gaps of missing values, into the twentieth century. We analyze the first continuous piece of the series from 622 AD to 1284 AD. A key feature of the series is a marked increase in variability during the first century of measurements (Beran 1994, Sec. 10.3). We compare results from our wavelet analysis to those of Beran and Terrin (1996), who utilized a test statistic to detect a change in the long memory parameter in the time series. We find a significant change of variance around 720 AD which coincides with the construction of an instrument, called a nilometer, in 715 AD.

A time series of vertical ocean shear measurements (Percival and Guttorp 1994), where the observations are based on depth not time, is analyzed to detect multiple variance changes in the series. Two bursts of increased variability occur towards the beginning and the end of the series. When comparing the series to 4096 observations in the middle of the series (used in Percival and Guttorp (1994)), there is increased variability in the first 5 scales only. Applying the multiple variance change detection procedure (Section 4.6) to this series yields a variety of significant variance changes in the first five scales. The two obvious bursts of variability (at 450 m and 1000 m) are adequately identified, and a third burst around 800 m appears in the first four scales.

The Madden–Julian oscillation (MJO) (Madden and Julian 1971) was originally discovered using bivariate spectral analysis; i.e., the lag window estimators of co-spectrum and magnitude squared coherence. Since then it has been identified and described by researchers in a variety of physical disciplines; see Madden and Julian (1994) for a review. I reanalyze the data used by Madden and Julian (1971) using multitaper spectral techniques and, more importantly, bivariate wavelet techniques developed in Chapter 5. The multitaper estimates of the co-spectrum and magnitude squared coherence show a much narrower period for the MJO, primarily because the amount of smoothing has been drastically reduced with respect to corresponding lag window estimates. The estimated wavelet correlation and cross-correlation agree with the original findings. A peak in the estimated correlation occurs in the fifth scale, corresponding to changes of 16 days and frequencies $1/64 \le f \le 1/32$ cycles per day, between station pressure and 850 mb zonal wind, and between 150 mb and 850 mb zonal winds with a small lead/lag relationship between the atmospheric variables.

While analyzing atmospheric time series collected at Canton Island (Madden and Julian 1971), we provided an empirical validation of the bivariate wavelet techniques, but did not take advantage of the time-localization properties which the wavelet transform possesses. With this in mind, we turn our attention to investigating the possible interaction between El Niño–Southern Oscillation (ENSO) events and the MJO. Using daily station pressure readings from Darwin, Australia, and Tahiti, French Polynesia, we construct a (daily) Southern Oscillation Index from roughly 1957 to 1992 by simply differencing the observations at these two stations; this is a measure of ENSO activity. Similar readings from Truk Island are used as a proxy for the MJO. A bivariate wavelet analysis is performed between these two atmospheric time series. We find a large peak in the fifth scale of the wavelet correlation corresponding to the MJO. The wavelet cross-correlation nicely "decomposes" the usual cross-correlation into a few distinct patterns. The time-varying structure of the wavelet variance and covariance is also qualitatively analyzed by partitioning them by season and ENSO activity.
6.1 Nile River Minimum Water Levels

6.1.1 Introduction

"In spite of all the changing, uncertain, and erroneous factors that must be considered in connection with records of stages of the Nile River, it is believed that they disclose some important information; and there is a fair prospect that they may yield more data with further study and the cumulation of ideas for various students."

The words of Jarvis (1936) are very prophetic, for in fact data collected from the Nile River have spurred the development of a whole field of mathematics (i.e., fractional Brownian motion and fractional Gaussian noise) along with a field of statistics concerned with the behavior of long memory time series. Gathered by Toussoun (1925), there exists a remarkable hydrological time series of minimum and maximum water levels for the Nile River. Starting in 622 AD, the first missing observation in the annual minima occurs in 1285 AD. This leaves us with a complete record for 663 years to analyze, shown in Figure 6.1.

A reasonable amount of literature has been written about Toussoun's Nile River data. Some notable facts are given here. The minimum water levels for the Nile River are not actually the yearly minima. These values were recorded around the end of June each year whereas the maximum water levels were the actual yearly maxima (Popper 1951; Verner 1972; Leftus 1986) even though Brooks (1949, p. 329) notes that the lowest levels of the Nile occur in April and May with erratic behavior in June and the beginning of July.

[Figure 6.1 appears here. Vertical axis: Minimum Water Level (cm), with ticks from 1000 to 1400.]

Figure 6.1: Nile River minimum water levels for 622 AD to 1284 AD. These data can be obtained via the World Wide Web at http://lib.stat.cmu.edu/S/ under the title `beran'. This is the address for StatLib, a statistical archive maintained by Carnegie–Mellon University.

Various time domain and spectral domain methods have been used to analyze these data. Given the current state of knowledge about this process and its apparent long memory structure, these past results will largely be ignored. Statistical modeling of this time series as a long memory process began with the doctoral works of Mohr (1981) and Graf (1983). Both used Fourier transform (periodogram) analysis for estimating the self-similarity parameter of a fractional Gaussian noise model. Graf (1983) reported estimates of $H = d + \frac{1}{2}$ between 0.83 and 0.85. Beran (1994, p. 118) has reported estimates of $H = 0.84$ for fractional Gaussian noise and $H = 0.90$ for a fractional ARIMA model with 95% confidence intervals of $(0.79, 0.89)$ and $(0.84, 0.96)$, respectively. He also established a goodness-of-fit test for the spectral density of a long memory process. An approximate p-value for the fractional Gaussian noise model of the yearly minimum water levels of the Nile River is 0.70, meaning that fractional Gaussian noise appears to fit the spectral density of the Nile River series well.
[Figure 6.2 appears here. Horizontal axis: Years.]

Figure 6.2: Multiresolution analysis of the Nile River minimum water levels using the D(4) wavelet filter and MODWT. The top plot of the figure is the series itself, while the five time series plotted below it constitute an additive decomposition of the series into components associated with, from top to bottom, variations on scales of 1 year ($\tilde D_1$), 2 years ($\tilde D_2$), 4 years ($\tilde D_3$), 8 years ($\tilde D_4$) and 16 years or longer ($\tilde S_4$). The vertical dotted line splits the series into two parts: the first 100 observations (from 622 to 721 AD) and the remaining 563 observations (722 to 1284 AD).

6.1.2 Wavelet Analysis
Figure 6.2 shows a wavelet-based multiresolution analysis of Nile River minimum water levels. The subseries $\tilde D_j$ and $\tilde S_J$ in a multiresolution analysis form an additive decomposition of the original time series
\[
Y_t = \sum_{j=1}^{J} \tilde D_{j,t} + \tilde S_{J,t}.
\]
Each subseries $\tilde D_j$ is associated with changes at scale $\lambda_j = 2^{j-1}$, while $\tilde S_J$ is associated with weighted averages over scales of $2^J$; see Percival and Mofjeld (1997) for more details. We used the D(4) wavelet in conjunction with the MODWT, extended to $N$ coefficients at each scale by assuming periodic boundary conditions. Visually it appears that there is greater variability in changes on scales of 1 and 2 years prior to 722 AD, but not on longer scales.

Beran (1994, Sec. 10.3) investigated the question of a change in the long memory parameter in this time series by partitioning the first 600 observations into two subseries containing, respectively, the first 100 and the remaining 500 measurements. Estimates of the long memory parameter $d$, using maximum likelihood, were quite different between the two subseries, 0.04 and 0.38 respectively. This analysis suggests a change in $d$, a conclusion that was also drawn in Beran and Terrin (1996) using a procedure designed to test for a change in the long memory parameter.

We can perform a similar analysis using the wavelet variance $\nu_Y^2(\lambda_j)$, which makes use of the DWT or MODWT to decompose the variance of $\{Y_t\}$ on a scale by scale basis (cf. Section 3.4). The estimated MODWT wavelet variances, given a partitioning scheme similar to the one used by Beran, are displayed in Figure 6.3. We see that the 95% confidence intervals for scales of 1 and 2 years do not overlap, which agrees with the apparent change of variance for those same scales in Figure 6.2. For a fractional difference process we have $\nu_Y^2(\lambda_j) \propto \lambda_j^{2d-1}$ approximately, so we can estimate $d$ by regressing $\log \tilde\nu_Y^2(\lambda_j)$ on $\log \lambda_j$ and using the estimated slope $\hat\beta$ to form $\hat d = \frac{1}{2}(\hat\beta + 1)$ (Percival and Walden 1999, Sec. 8.1). This procedure yields estimates of $\hat d = 0.38$, 0.42 and $-0.07$ for, respectively, the whole time series, the last 563 observations and the first 100 observations. These compare favorably with Beran's values of 0.40, 0.38 and 0.04, but it is clear from Figure 6.3 that the smaller value for $\hat d$ in the first 100 years is due to increased variability at scales of 2 years or less. The observed difference in variability at longer scales between the first and last portions of the time series is consistent with sampling variability.

[Figure 6.3 appears here. Vertical axis: Wavelet Variance; horizontal axis: Scale (years); series: 622 - 721 AD and 722 - 1284 AD.]

Figure 6.3: Estimated D(4) wavelet variances for the Nile River minimum water levels before and after the year 722 AD, along with 95% confidence intervals based upon a chi-square approximation given in Percival (1995).

6.1.3 Testing for Homogeneity of Variance
Let us now apply the methodology developed in Chapter 4 to the Nile River minima. Using all $N = 663$ values in the time series, we computed our test statistic $D$ (cf. Section 4.2.1) for scales of 1, 2, 4 and 8 years ($j = 1, 2, 3, 4$) based, respectively, on 331, 115, 57 and 28 wavelet coefficients. The results from the test, shown in Table 6.1, confirm the visual appearance of inhomogeneity of variance at scales of 1 and 2 years, but fail to reject the null hypothesis of variance homogeneity at scales of 4 and 8 years.

Table 6.1: Results of testing the Nile River minimum water levels for homogeneity of variance ($N = 663$) using the Haar wavelet filter with Monte Carlo critical values. As shown in the table, the test statistic at scale 1 is significant at the 1% level, and the test statistic at scale 2 is significant at the 5% level.

Scale   D        10% critical level   5% critical level   1% critical level
1       0.1559   0.0945               0.1051              0.1262
2       0.1754   0.1320               0.1469              0.1765
4       0.1000   0.1855               0.2068              0.2474
8       0.2313   0.2572               0.2864              0.3436
With a change of variance detected in the first and second scales, we can apply the methodology from Section 4.5 to locate these change points. Figure 6.4 displays the normalized cumulative sum of squares as a function of wavelet coefficient for the first two scales. We see a sudden accumulation of variance in the first 100 years and a gradual tapering off of the variance afterwards (by construction the series must begin and end at zero). The maximum is actually attained in 720 AD for the level 1 coefficients and 722 AD for level 2. The subsequent smaller peaks occurring in the ninth century are associated with large observations, as seen in the original series, not changes in the variance of the time series.
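A change point of this kind can be located with a few lines of code. The sketch below is a hedged illustration rather than the procedure of Chapter 4 itself: it assumes the statistic takes the usual cumulative-sum-of-squares form, in which the normalized cumulative sum is compared with a straight line and $D$ is the largest deviation in either direction. The function names and the toy input are invented here.

```python
def normalized_cusum_of_squares(w):
    """P_k = (sum of w[0..k]^2) / (sum of all w^2), for k = 0, ..., len(w) - 2."""
    total = sum(x * x for x in w)
    running, path = 0.0, []
    for x in w[:-1]:
        running += x * x
        path.append(running / total)
    return path


def d_statistic(w):
    """Return (D, k_hat): the largest deviation and the index where it occurs."""
    n = len(w)
    path = normalized_cusum_of_squares(w)
    best, k_hat = -1.0, None
    for k, p_k in enumerate(path):
        d_plus = (k + 1) / (n - 1) - p_k   # cumulative sum falls below the line
        d_minus = p_k - k / (n - 1)        # cumulative sum rises above the line
        deviation = max(d_plus, d_minus)
        if deviation > best:
            best, k_hat = deviation, k
    return best, k_hat


# Toy coefficients: large variance for the first 20 values, small afterwards.
w = [3.0, -3.0] * 10 + [1.0, -1.0] * 40
d, k_hat = d_statistic(w)
print(d, k_hat)
```

In this toy series the largest deviation falls on the last high-variance coefficient, which mirrors how the maximum of the curve in Figure 6.4 is read off as the estimated change point. Critical values for $D$ are not computed here.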
Figure 6.4: Normalized cumulative sum of squares from the first two scales of the MODWT for the Nile River minimum water levels. The vertical dotted line is at 715 AD.

The source document for this series (Toussoun 1925) and subsequent historical studies by Popper (1951) and Balek (1977, Ch. 1) all indicate the construction in 715 AD of a "nilometer" in a mosque on Roda Island in the Nile River near Cairo. The yearly minimum water levels for 715 AD to 1284 AD were measured using this device, or a reconstruction of it done in 861 AD. The precise source of measurements for 622 AD to 714 AD is unknown, but they were most likely made at different locations around Cairo, with possibly different types of measurement devices, of less accuracy than the one in the Roda Island mosque. Our estimated change point at 720 or 722 AD coincides well with the construction of this new instrument in 715 AD, and it is reasonable that this new nilometer led to a reduction in variability at the very smallest scales.

Beran and Terrin (1996) had looked at the Nile River minimum water levels and used a test statistic to argue for a change in the long memory parameter in the time series. The results from our analysis, in conjunction with an examination of the historical record, suggest an alternative interpretation: there is a decrease in variability at scales of 2 years and less after about 720 AD, and this decrease is due to a new measurement instrument rather than to a change in the long term characteristics of the Nile River.
6.2 Vertical Ocean Shear Measurements

Percival and Guttorp (1994) analyzed a set of vertical ocean shear measurements. The data were collected by dropping a probe into the ocean which records the water velocity every 0.1 meter as it descends. Hence, the "time" index is really depth (in meters). The shear measurements (in $\mathrm{s}^{-1}$) are obtained by taking a first difference of the velocity readings over 10 meter intervals and applying a low-pass filter to the difference readings.

Figure 6.5 shows all 6875 observations available for analysis. We see two sections of greater variability, one around 450 m and the other around 1000 m, with a fairly stationary section in between. Percival and Guttorp (1994) commented on this fact and only looked at 4096 observations ranging from 489.5 m to 899.0 m in their paper. Wang, Cavanaugh, and Song (1997) analyzed the full time series in order to estimate a time-varying self-similarity parameter using the DWT. We propose to apply the methodology for detecting and locating multiple variance changes (cf. Section 4.6) to this geophysical series.

[Figure 6.5 appears here. Vertical axis ticks from -6 to 2; horizontal axis: depth (meters).]

Figure 6.5: Plot of vertical shear measurements (inverse seconds) versus depth (meters). The two vertical lines are at 489.5 m and 899.0 m, and denote the roughly stationary series used by Percival and Guttorp (1994). This series can be obtained via the World Wide Web at http://lib.stat.cmu.edu/datasets/ under the title `lmpavw'.

Figure 6.6 gives a multiresolution analysis of the ocean shear time series using the D(4) wavelet. The eight time series plotted constitute a portion of an additive decomposition of the series into components associated with, from top to bottom, variations on scales of 0.1 meters ($\tilde D_1$), 0.2 meters ($\tilde D_2$), 0.4 meters ($\tilde D_3$), 0.8 meters ($\tilde D_4$), 1.6 meters ($\tilde D_5$), 3.2 meters ($\tilde D_6$), 6.4 meters ($\tilde D_7$) and 12.8 meters ($\tilde D_8$). We see a persistence of the increased variability in the first 5 scales around 1000 m, and to a lesser extent, around 450 m.

[Figure 6.6 appears here. Horizontal axis: Depth (meters).]

Figure 6.6: Multiresolution analysis of the vertical ocean shear measurements using the D(4) wavelet filter and maximal overlap discrete wavelet transform. The first eight details $\tilde D_1$ to $\tilde D_8$ are displayed with each series on the same vertical scale. The two vertical lines are at 489.5 m and 899.0 m, and denote the wavelet coefficients used by Percival and Guttorp (1994).
Figure 6.7: Estimated wavelet variance of the vertical ocean shear measurements using the D(4) wavelet filter and MODWT, plotted against scale (in units of 0.1 meters). The light grey confidence intervals correspond to all 6875 observations (N = 6875), while the dark grey confidence intervals correspond to the middle 4096 observations (N = 4096) as analyzed in Percival and Guttorp (1994).

The influence of the ends of the time series (i.e., the observations outside the vertical dotted lines in Figure 6.5) is most evident when comparing the wavelet variance of the full series to that of the middle 4096 observations; see Figure 6.7. The
bursts of increased variability observed in the first 5 scales make a significant contribution to the wavelet variance. For those scales, the confidence intervals for the full and truncated time series do not overlap, whereas they do overlap for all subsequent scales. As with the Nile River minimum water levels, this feature hints at a possible heterogeneity of variance in the first 5 scales.

Whether these data should be classified as having long-range dependence is not obvious. The roll-off of the wavelet variance at the higher scales (lower frequencies) does not fit the general framework of a fractional difference process. Wang et al. (1997) estimated a time-varying long memory parameter for these measurements. The middle of the series has a roughly constant long memory parameter between 0.65 and 0.70, while the ends of the series exhibit much greater long memory parameters. I will not concentrate on modeling this process as a globally or locally self-similar process, but will instead investigate its nonstationary features by testing for homogeneity of variance on a scale by scale basis.

Figure 6.8 shows the MODWT wavelet coefficients for the first five scales of the vertical ocean shear measurements. The vertical dotted lines are the estimated locations of variance change points, using the DWT to test and the MODWT to locate, with asymptotic critical values ($\alpha = 0.05$). The procedure does a good job of isolating the two regions of increased variability at 450 m and 1000 m in each scale, except for the second scale. There, the first burst has been "picked apart" by the procedure into 10 distinct stationary regions. This does not seem appropriate, and it is unclear why it occurred only on the second scale when the variability of the third scale appears to change with depth in a similar way. Besides the two obvious regions of increased variability, there appears to be a third burst around 800 m. It is present, to differing degrees, in the first four scales, whereas most other bursts disappear after the first and second scales. This is a much more subtle type of nonstationarity than the obvious bursts at 450 m and 1000 m, and it is not particularly visible to the naked eye in the original time series.
Figure 6.8: Estimated locations of variance change for the vertical ocean shear measurements using the D(4) wavelet filter, displayed on the MODWT wavelet coefficients for levels 1 to 5 (one panel per level) against depth (meters). Only the first five scales were found to have significant changes of variance. Asymptotic critical values were used for the hypothesis testing at the $\alpha = 0.05$ level of significance.
This algorithm for detecting and locating multiple variance changes via the DWT is in its infancy. More work is needed to refine the procedure and investigate its properties. Given the ability of the DWT to remove heavy amounts of autocorrelation in time series, this method has wide application in many fields. Not only can the test handle high amounts of autocorrelation, as found in stationary long memory processes, but it also makes only limited assumptions about the underlying spectrum of the observed physical process.
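One way several variance changes might be located by repeated testing is sketched below. This is plain binary segmentation under simplifying assumptions, and not necessarily the procedure of Section 4.6: a single coefficient sequence is used both to test and to locate (the text tests with the DWT and locates with the MODWT), the asymptotic 5% critical value 1.358 for $\sqrt{N/2}\,D$ is assumed, and `min_len` is an arbitrary illustrative choice. All names are hypothetical.

```python
import math

CRIT_05 = 1.358  # assumed asymptotic 5% critical value for sqrt(N/2) * D


def css_statistic(w):
    """Return (D, k): max departure of the normalized cumulative sum of
    squares from a straight line, and the 1-based index where it occurs."""
    n = len(w)
    total = sum(x * x for x in w)
    if total == 0.0:
        return 0.0, 0
    running, best_d, best_k = 0.0, -1.0, 0
    for k in range(1, n):
        running += w[k - 1] ** 2
        p_k = running / total
        d = max(k / (n - 1) - p_k, p_k - (k - 1) / (n - 1))
        if d > best_d:
            best_d, best_k = d, k
    return best_d, best_k


def segment(w, lo=0, hi=None, min_len=16, crit=CRIT_05):
    """Recursively split w[lo:hi] wherever the test rejects homogeneity.

    Returns the sorted 0-based indices at which a new variance regime is
    estimated to begin.
    """
    if hi is None:
        hi = len(w)
    n = hi - lo
    if n < min_len:
        return []
    d, k = css_statistic(w[lo:hi])
    if math.sqrt(n / 2.0) * d <= crit:
        return []  # no significant change in this stretch
    cut = lo + k
    left = segment(w, lo, cut, min_len, crit)
    right = segment(w, cut, hi, min_len, crit)
    return left + [cut] + right


# Toy input: a burst of tripled magnitude between indices 200 and 299.
print(segment([1.0] * 200 + [3.0] * 100 + [1.0] * 300))
```

On this toy input the sketch returns the start and end of the burst. On real coefficients the recursion can over-segment, much as the text observes for the second scale in Figure 6.8.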
6.3 Wavelet and Multitaper Spectral Analysis of the Madden-Julian Oscillation

6.3.1 Introduction
The Madden-Julian oscillation (MJO) (Madden and Julian 1971) is an atmospheric phenomenon that has been found in a variety of studies involving data from the tropical Pacific Ocean from 1971 to the present; see Madden and Julian (1994) for a review. In their first paper, the authors utilized bivariate spectral time series analysis in order to detect the oscillation. The specific atmospheric variables used were the station pressure, 850 mb wind speed, and 150 mb wind speed. We propose a univariate and bivariate spectral analysis using multitaper methods (Thomson 1982; Percival and Walden 1993, Ch. 7), followed by a bivariate wavelet analysis, of the same time series originally used from Canton Island (2.8°S, 171.7°W); see Figure 6.9 for the location of Canton Island in the Pacific Ocean.

Figure 6.9: Climate stations in the tropical Pacific Ocean (the labelled stations include Canton and Darwin). The horizontal line is the equator, plotted for reference. The horizontal range is roughly from 110°E to 140°W and the vertical range is 45°.

The data, obtained from NCAR (the National Center for Atmospheric Research), consist of three time series measured at the same location from 1 June 1957 to 31 March 1967. The atmospheric variables of interest are station pressure, wind speed at 150 mb and wind speed at 850 mb. Measurements were taken at 0000 GMT and 1200 GMT. As in the original paper, only the 0000 GMT observations are used. Hence, we can regard the sampling interval of the series as $\Delta t = 1$ day. This gives us a Nyquist frequency of $f_{(N)} \equiv 1/(2\Delta t) = 1/2$ cycles/day. For days with no recorded measurements, an ARIMA(3,1,0) model was fit and one-step-ahead predictions were used to fill in the gaps (Jones 1980). The majority of missing values were isolated observations, except for a week of missing data between 5 January 1965 and 10 January 1965. This gives us three time series of length 3591, which are shown in Figure 6.10. The ragged look of each series arises because no decimal places were kept for any measurement.

Figure 6.10: Atmospheric time series collected from Canton Island (2.8°S, 171.7°W) over the period 1 June 1957 to 31 March 1967. From top to bottom, they are station pressure (in hPa), wind speed at 850 mb and wind speed at 150 mb (both in km/h).

We first analyze each time series separately, starting with the periodogram, and then apply a variety of techniques to investigate any potential sources of bias. Conclusions drawn here are compared with those found in Madden and Julian (1971). The lag window spectral estimates utilized in the original paper are reproduced, as closely as possible, and compared with multitaper spectral estimates. After looking at the three series independently, we perform a bivariate spectral analysis of the three possible pairings of the time series. Lag window estimates of the co-spectrum and magnitude squared coherence are contrasted with their corresponding multitaper estimates. A bivariate wavelet analysis is then performed in order to see how these new techniques compare to classical spectral analysis.

6.3.2 Univariate Spectral Analysis
We begin with a univariate spectral analysis of the station pressure, 150 mb wind speed, and 850 mb wind speed series. The periodogram for each showed no obvious signs of leakage; i.e., the transfer of power from one region of the spectrum to another. Common indications include a change in the variability of the periodogram for specific ranges of frequencies. An approximate dynamic range for all three series is around 30-35 dB. When applying a 10% cosine taper to these data, there are no obvious differences between the periodogram and the direct spectral estimator. A discrete prolate spheroidal sequence (dpss) data taper (cf. Section B.2), with NW = 4, was applied to the three series with no apparent changes in station pressure or 850 mb wind speed, but a marked accentuation of shape is seen for 150 mb wind speed. Hence, it appears that little or no tapering is required for the station pressure and 850 mb wind series, but caution should be exercised when analyzing the 150 mb wind speed.

Madden and Julian (1971) performed the following algorithm to estimate the univariate spectra:

"The algorithm makes use of the modified Fourier periodogram obtained by 1) removing the sample mean of the N members of the time series, 2) `tapering' the first and last 10% of the resulting N members by multiplication by a segment of the cosine curve so that the ends of the series are zero, and 3) performing the FFT to obtain N/2 harmonic coefficients. The squared amplitudes or modified periodogram estimates are then averaged by a running average of length L coefficients; this averaging producing an estimate of the continuous spectra viewed through a rectangular spectral window of bandwidth equal to $(2L/N) f_N$ where $f_N$ is the Nyquist frequency."

They cite Bingham et al. (1967) for their Fourier methodology. In order to reproduce the results of Madden and Julian (1971), some preliminary interpretations and calculations must be performed. From the quotation given above, they appear to have utilized a lag window spectral estimate using the Daniell smoothing window. From p. 703, "The value of L was chosen so that the bandwidth of the spectral window was 0.0081 day$^{-1}$." Using the formula from Table 269 in Percival and Walden (1993), we can compute the window parameter

$$ m = \frac{1}{B_W \, \Delta t} = \frac{1}{0.0081} \approx 123. $$

For the Daniell smoothing window, the parameter $m$ controls the amount of averaging across frequencies of the spectral estimate: the smaller the $m$, the more averaging occurs. Upon comparing our lag window spectral estimators with those in the original work, they do not appear to have the same degree of smoothness. Two possible explanations for this difference are that they did not use lag window spectral estimators as defined in Percival and Walden (1993), or that the lag window spectral estimates were smoothed again, possibly by splines, before publication. Regardless, we can obtain a reasonably smooth spectral estimator by replacing the Daniell smoothing window with the Parzen smoothing window. A recalculation giving $m \approx 228$ is required for the Parzen smoothing window, where $m$ is now a truncation point beyond which the weighted acvs is zero. The left column of plots in Figure 6.11 shows these lag
window spectral estimates. The station pressure spectrum is very close to the one displayed in Madden and Julian (1971).

As an alternative to lag window spectral estimation, we apply multitaper spectral estimation (Thomson 1982) to these series. Several dpss data tapers, which are orthogonal and normalized to have unit energy, are applied to the time series. The modulus squared Fourier transforms of these tapered series (also known as eigenspectra) are then averaged together at each frequency. The right column of plots in Figure 6.11 shows these multitaper spectral estimates. Although these spectral estimates are much less ragged than the periodogram, they are far from the smoothness of a lag window spectral estimate. We see an "annual" peak near zero frequency in the three multitaper spectra, with the peak being largest in the 150 mb wind speed series. This is most likely due to its relatively flat background spectrum. Notice that the multitaper spectral estimate for the 150 mb wind speed series agrees with the periodogram in shape. This is contrary to the result using a direct spectral estimator (dpss, NW = 4) stated at the beginning of this section.

There is an issue in how to handle the annual cycle in these data. From Madden and Julian (1971, p. 703),

"The particular algorithm used to estimate the spectra included an adjustment to eliminate the annual period and higher harmonics thereof from the series. This was accomplished by substituting the average of four adjacent modified periodogram estimates for the estimates at those frequencies nearest to the annual and semiannual frequencies. The object of this substitution was to insert values on the order of the background continuous spectrum for those estimates influenced by the annual component."

Given the degree of smoothing involved in the lag window spectral estimates, this does not appear to be necessary. The plots in the left column of Figure 6.11 were not adjusted this way and show no obvious difference for frequencies close to the annual frequency. The broad-band feature apparent in the lag window spectral estimates is seen in the multitaper spectral estimates as multiple peaks in the frequency band of interest.

Figure 6.11: Univariate spectral analysis of Canton Island data. The left column shows the lag window spectral estimates using the Parzen smoothing window (m = 228) of the three atmospheric series with the dots representing the periodogram estimates. The right column shows the corresponding multitaper spectral estimates using K = 5 dpss data tapers (NW = 4). Each panel plots the estimate in dB against frequency (cycles/day).

6.3.3 Bivariate Spectral Analysis
While the periodogram and direct spectral estimates are useful for data analysis in the univariate case, they are not appropriate for multivariate spectral analysis of time series. This is because important statistical quantities, such as the magnitude squared coherence (msc), are unity over all frequencies when calculated through these methods; see Priestley (1981, p. 708) for an explanation of this result. Hence, we concentrate our efforts on contrasting lag window bivariate spectral estimators (as used in the original study) with multitaper bivariate spectral estimators.

We first look at the co-spectra between station pressure and 850 mb wind speed, and between 150 mb wind speed and 850 mb wind speed. Figure 6.12 shows estimates of the co-spectra using a lag window spectral estimator and a multitaper spectral estimator. The lag window co-spectra are similar in shape to those reported in the original paper, differing only in magnitude. The multitaper co-spectra exhibit multiple peaks in the frequency range where the lag window estimates have broad peaks, with large peaks around $f = 0.025$ being the most dominant feature in that frequency band.

Figure 6.12: Estimated co-spectra for the Canton Island data (SLP / 850 mb and 150 mb / 850 mb), plotted against frequency. The left panel displays the lag window co-spectral estimates using a Parzen smoothing window (m = 228) and the right panel the multitaper co-spectral estimates using K = 5 dpss data tapers (NW = 4).

The left column of Figure 6.13 shows the estimated lag window msc for pairwise comparisons between the three atmospheric time series. We can test, at the $\alpha$ level of significance, the null hypothesis of zero msc by checking the estimated msc, on a frequency by frequency basis, against $1 - \alpha^{2/(\nu - 2)}$ and rejecting if the estimated msc exceeds it (Koopmans 1974, p. 284). The parameter $\nu$ is the number of equivalent degrees of freedom associated with the spectral estimates, which is identical to the univariate case. Using Table 269 in Percival and Walden (1993),

$$ \nu = \frac{3.71 N}{m C_h} = \frac{3.71(3591)}{228(1.12)} \approx 52. $$

We reject the null hypothesis of zero msc in the first two plots around what is most likely an annual frequency or close to it. A broad-band peak in the msc is observed from frequencies $0.0134 \le f \le 0.0297$ between station pressure and 850 mb wind speed, corresponding to a 33-75 day oscillation, and from frequencies $0.0150 \le f \le 0.0304$ between 150 mb wind speed and 850 mb wind speed, corresponding to a 33-66 day oscillation. The right column of Figure 6.13 shows the estimated multitaper msc for pairwise
comparisons between the three atmospheric time series. The first five dpss data tapers (NW = 4) were used to compute these estimates. We can test for non-zero msc as before. Using K = 5 data tapers gives us $\nu = 2K = 10$ degrees of freedom. We reject the null hypothesis of zero msc in all three plots around $f \approx 0.0036$, which corresponds to a period of around 276 days. A second peak is found around $f \approx 0.0058$, which corresponds to a period of around 171 days, between station pressure and 850 mb wind speed. The group of frequencies that peak near the frequencies of the Madden-Julian oscillation covers periods of around 37-40 days, a slightly shorter and much narrower range than the 41-53 days observed in Madden and Julian (1971).

Figure 6.13: Magnitude squared coherence of the Canton Island data, plotted as estimated msc against frequency (cycles/day) for the pairings SLP and W150, SLP and W850, and W150 and W850. The two horizontal lines in each plot are the critical levels of the $\alpha = 0.05$ (dotted) and $\alpha = 0.01$ (dashed) tests for non-zero msc. The left column contains the lag window spectral estimates and the right column the corresponding multitaper spectral estimates.

Several analyses of the estimated multitaper msc were performed between station pressure and 150 mb wind speed in order to determine whether significant bias is introduced by using too many data tapers. Hypothesis testing using 2 data tapers is difficult given only $\nu = 4$ degrees of freedom. For 3 data tapers a small peak occurs around frequency $f = 0.0214$ (approximately a 47 day oscillation), but the msc appears to be contaminated with several spikes from the still high variability of the multitaper estimate. With 4 or more data tapers, nothing except the annual frequency appears to be significant. Hence, while some leakage may be present in this series, the hypothesis of non-zero magnitude squared coherence cannot be tested without a sufficient number of data tapers.

6.3.4 Wavelet Analysis
The spectral analysis performed by Madden and Julian (1971) used "modern" techniques for the late 1960s. The multitaper spectral analysis in Section 6.3.3 utilized time series analysis techniques from the middle 1980s. Here we jump to "modern" methods by using discrete wavelet transforms. The goal of this analysis is to compare and contrast the results obtained using wavelet methodology with those of standard spectral analysis.
Figure 6.14a: Multiresolution analysis of the station pressure series collected at Canton Island (2.8°S, 171.7°W) using the D(4) wavelet and MODWT, plotted against time (days). The wavelet details $\widetilde{D}_1, \widetilde{D}_2, \widetilde{D}_3, \ldots, \widetilde{D}_8$ are associated with variations on scales of 1, 2, 4, ..., 256 days and the wavelet smooth $\widetilde{S}_8$ is associated with variations of 512 days or longer.

Figure 6.14b: Multiresolution analysis of the 150 mb wind speed series collected at Canton Island (2.8°S, 171.7°W) using the D(4) wavelet and MODWT, plotted against time (days). The wavelet details $\widetilde{D}_1, \widetilde{D}_2, \widetilde{D}_3, \ldots, \widetilde{D}_8$ are associated with variations on scales of 1, 2, 4, ..., 256 days and the wavelet smooth $\widetilde{S}_8$ is associated with variations of 512 days or longer.

Figure 6.14c: Multiresolution analysis of the 850 mb wind speed series collected at Canton Island (2.8°S, 171.7°W) using the D(4) wavelet and MODWT, plotted against time (days). The wavelet details $\widetilde{D}_1, \widetilde{D}_2, \widetilde{D}_3, \ldots, \widetilde{D}_8$ are associated with variations on scales of 1, 2, 4, ..., 256 days and the wavelet smooth $\widetilde{S}_8$ is associated with variations of 512 days or longer.
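As a reading aid for these multiresolution analyses, the correspondence between a detail level and its nominal frequency band can be sketched as follows. The sketch assumes the convention used in the discussion of the fifth scale in this section: the level-$j$ detail reflects changes on a scale of $2^{j-1}\Delta t$ and nominally covers $1/(2^{j+1}\Delta t) \le f \le 1/(2^{j}\Delta t)$. The function name and the returned keys are hypothetical.

```python
def level_band(level, dt=1.0):
    """Nominal scale, pass-band and period range of a level-j wavelet detail.

    Assumes the level-j detail reflects changes on a scale of
    2**(j-1) * dt and nominally covers 1/(2**(j+1) * dt) <= f <= 1/(2**j * dt).
    """
    scale = 2 ** (level - 1) * dt
    f_low = 1.0 / (2 ** (level + 1) * dt)
    f_high = 1.0 / (2 ** level * dt)
    return {
        "scale": scale,                 # same units as dt
        "f_low": f_low,                 # cycles per unit of dt
        "f_high": f_high,
        "period_short": 1.0 / f_high,   # shortest period in the band
        "period_long": 1.0 / f_low,     # longest period in the band
    }


band = level_band(5)  # daily sampling, dt = 1 day
print(band["scale"], band["period_short"], band["period_long"])
```

With daily sampling, level 5 corresponds to changes on a scale of 16 days and to periods between 32 and 64 days, which is the band relevant to the Madden-Julian oscillation.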
A partial MODWT of order $J = 8$ was performed on the three time series of interest using the D(4) wavelet filter; the results are displayed in Figures 6.14a-c. We can see how the approximate band-pass nature of the MODWT is able to separate events on different scales. For example, two "spikes" in the station pressure series, in 1959 and 1961, only show up in the first scale wavelet detail $\widetilde{D}_1$ of the multiresolution analysis. We also observe a slight annual oscillation in scale 8, which captures the frequencies $1/512 \le f \le 1/256$, and a gradual peak spanning 1962-1964 in the wavelet smooth $\widetilde{S}_8$. Recall that the wavelet detail at the fifth scale, $\widetilde{D}_5$, captures frequencies $1/64 \le f \le 1/32$ and is associated with changes on a scale of 16 days; i.e., the difference of weighted averages each comprising 16 values. This is therefore the focus of our attention for investigating the Madden-Julian oscillation.

The wavelet variance for each series is shown in Figure 6.15, plotted on a log-log scale. Recall that longer scales correspond to lower frequency bands. The wind speed series show approximately an order of magnitude increase in variability at the lower scales, meaning there is more high frequency noise in their signals. In each series there is an abrupt drop in energy from changes of 16 days ($\lambda_5$) to changes of 32 days ($\lambda_6$), although this is less apparent in the 150 mb wind series. As previously stated, this fifth scale captures the exact frequency range we would expect to be affected by the MJO.

Figure 6.15: MODWT estimated wavelet variance for the Canton Island time series using a D(4) wavelet filter, plotted against scale (days). The station pressure series is plotted in both the upper and lower plots for reference, alongside the 150 mb wind speed in one plot and the 850 mb wind speed in the other. The shaded regions form approximate 95% confidence intervals.

The wavelet correlation for the three pairwise combinations of the time series is shown in Figure 6.16. The confidence intervals for the wavelet correlation between station pressure and 150 mb wind speed include zero for scales between 8 and 64 days. This includes the scale of interest ($\lambda_5$) and agrees with the lack of significant frequencies when analyzing the magnitude squared coherence between the two series. Both the wavelet correlation between station pressure and 850 mb wind speed and that between 150 mb and 850 mb wind speed are significantly different from zero at scales of 4 to 32 days, with a peak (positive and negative, respectively) at scale 5 in both plots.

Figure 6.16: MODWT estimated wavelet correlation for the Canton Island time series using a D(4) wavelet filter, plotted against scale (days). The plots are, from top to bottom, station pressure versus 150 mb wind, station pressure versus 850 mb wind, and 150 mb wind versus 850 mb wind. The shaded regions form approximate 95% confidence intervals.

The wavelet cross-correlation between sea level pressure and 150 mb wind speed showed no strong patterns, with a range of $-0.19$ to $0.20$ for scale 5. When comparing sea level pressure and 850 mb wind speed, they are most positively correlated at a lag of +2 days ($\tilde{\rho}_{2,850/SLP}(\lambda_5) = 0.412$), and when comparing 150 mb and 850 mb wind speed, they are most negatively correlated at a lag of +1 day ($\tilde{\rho}_{1,150/850}(\lambda_5) = -0.309$). These results, which correspond to the 850 mb wind speed trailing the station pressure by 2 days and the 850 mb wind speed leading the 150 mb wind speed by 1 day, agree with the findings in Madden and Julian (1971), where the 850 mb wind was nearly in phase (10°) with station pressure and the two winds were found to be almost out of phase (177°).

6.3.5 Conclusions
It is difficult to be overly critical of the time series analysis techniques of Madden and Julian (1971). Given the year of the discovery, they used the most reasonable techniques available, namely, lag window bivariate spectral estimates. The amount of tapering applied to the series (10% cosine) appears adequate when compared to stronger data tapers. Their results led to the discovery of a broad-band feature in atmospheric readings from the tropical Pacific Ocean.

Utilizing the multitaper techniques of Thomson (1982), we obtain "smoothed versions" of univariate and bivariate spectral estimators. Given that the spectral bandwidth of a multitaper spectral estimator is, in general, smaller than that of a corresponding lag window spectral estimator, we do not over-smooth the spectra and potentially lose interesting features. In the frequency range of interest, we observe several peaks instead of one broad peak in the univariate spectra. This translates into a very choppy estimated co-spectrum and magnitude squared coherence. When testing the magnitude squared coherence, we observe a period of 37-40 days instead of the 41-53 day oscillation reported in Madden and Julian (1971).

We have shown how wavelet analysis techniques capture, and adequately summarize, information about the Madden-Julian oscillation. The ability of the DWT to approximately band-pass filter a time series alleviates some of the preprocessing performed in spectral analysis of atmospheric time series, such as removal of annual, semi-annual, and seasonal trends. These will naturally be partitioned by the DWT. Wavelet techniques also open up the possibility of answering questions about how the time series vary with time.
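The numerical bookkeeping behind the lag window and coherence results of Section 6.3 can be checked with a short sketch. It assumes the relations quoted from Table 269 of Percival and Walden (1993): a window parameter $m = c/(B_W \Delta t)$ with $c = 1$ for the Daniell window, equivalent degrees of freedom $\nu = 3.71N/(mC_h)$ for the Parzen window, and the critical level $1 - \alpha^{2/(\nu-2)}$ for the test of zero msc. The constant $c = 1.85$ for the Parzen window is an assumption made here because it reproduces $m \approx 228$. The function names are hypothetical.

```python
def window_parameter(bandwidth, dt=1.0, constant=1.0):
    """m = constant / (B_W * dt); constant = 1 for the Daniell window.

    constant = 1.85 for the Parzen window is an assumption of this sketch.
    """
    return constant / (bandwidth * dt)


def parzen_edof(n, m, c_h):
    """Equivalent degrees of freedom nu = 3.71 N / (m C_h)."""
    return 3.71 * n / (m * c_h)


def msc_threshold(alpha, nu):
    """Level-alpha critical value for testing zero msc with nu degrees of freedom."""
    return 1.0 - alpha ** (2.0 / (nu - 2.0))


m_daniell = window_parameter(0.0081)                 # about 123
m_parzen = window_parameter(0.0081, constant=1.85)   # about 228
nu_lag = parzen_edof(3591, 228, 1.12)                # about 52
print(round(m_daniell), round(m_parzen), round(nu_lag))

# Critical levels for the lag window (nu = 52) and multitaper (nu = 10) msc.
print(round(msc_threshold(0.05, 52), 3), round(msc_threshold(0.05, 10), 3))
```

The much higher critical level for $\nu = 10$ than for $\nu = 52$ shows why the multitaper msc needs a larger estimated coherence before a peak is declared significant.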
6.4 Wavelet Analysis of Covariance Between the Southern Oscillation Index and Madden-Julian Oscillation

6.4.1 Introduction
Temporal variations in the Madden-Julian Oscillation (MJO) and its relationship with El Niño-Southern Oscillation (ENSO) events have been previously investigated using classical spectral analysis; see, e.g., Madden and Julian (1994) and references therein. Anderson, Stevens, and Julian (1984) filtered two time series, atmospheric relative angular momentum (4 years) and the 850-200 mb shear of the zonal wind at Truk Island (25 years), with a filter designed to pass the frequency band corresponding to periods of 32-64 days. With respect to the Truk Island series, they noted a possible association with increased amplitude of the oscillation during the 1956-57, 1972-73, and 1976-77 ENSO warm events, but also that the duration of these increases was much longer than the ENSO events. Madden (1986) performed a seasonally varying cross-spectral analysis on nearly twenty time series of rawinsonde data from tropical stations around the world. The MJO appeared strongest during December-February and weakest during June-August, and was always stronger in the western Pacific and Indian oceans than elsewhere. Gray (1988) performed a correlation analysis between daily station pressure data from Truk (7°N, 152°W), Balboa (9°N, 80°W), Darwin (12°S, 131°E) and Gan (1°S, 73°E), and seasonal sea surface temperature anomalies on a 5° grid. The data were partitioned into ENSO and non-ENSO years; in non-ENSO years a strong seasonal shift in frequency was found at all sites except Truk Island.

Kuhnel (1989) investigated the characteristics of a 40-50 day oscillation in cloudiness for the Australo-Indonesian region. Using data on a 10° by 5° grid, regions in the eastern Indian Ocean and western Pacific Ocean were found to have a pronounced 40-50 day peak with no obvious seasonal variation. Another region in the Indian Ocean (5-15°S, 95-100°E) showed a stronger oscillation in the March-June period. Regions around 5-15°S over northern Australia and in the Pacific Ocean showed a much stronger 40-50 day oscillation during the Australian monsoon season, from December to March, than during the rest of the year. The 40-50 day cloud amount oscillation did not appear to be affected by warm ENSO events. Madden and Julian (1994) note the broadband nature of the oscillation by comparing the station pressure spectra for Truk Island (7.4°N, 151.8°W) during two time spans, 1967 to 1979 and 1980 to 1985. The MJO appears to have a 26-day period in the early 1980s.

The relationship between ENSO events and the MJO is a topic that could benefit markedly from wavelet techniques. To investigate how these two atmospheric phenomena interact, we analyze two time series. The first is the Southern Oscillation Index (SOI), which is an indicator of ENSO and is usually defined to be the difference between monthly averages of the station pressure series from climate stations at Darwin, Australia (130.8°E, 12.4°S) and Tahiti, French Polynesia (149°W, 14°S); see Figure 6.9 for the locations of these climate stations. It was first introduced by Walker (1928) and came from the observation that pressure in the tropical Pacific Ocean is inversely related to pressure in the Indian Ocean. In our case, we deviate from the usual definition of the SOI by introducing a daily version of it. I obtained daily pressure readings from Darwin, Australia, starting on 1 June 1957 and continuing to 31 December 1992 (N = 12,998) and differenced them; see Figure 6.17. The distance of the stations from the equator is apparent in the strong annual component in the time series. The measurements in the summer and winter of 1983 appear to be higher than those in adjacent years. This approximately corresponds to a large ENSO event in the early 1980s. Any missing values were filled in using one-step-ahead predictions from an ARIMA(3,1,0) model applied to the series (Jones 1980).
I also obtained daily station pressure readings from Truk Island (7.4°N, 151.8°W) as an indicator of the MJO. This series also runs from 1 June 1957 to 31 December 1992; see Figure 6.17. Unlike the SOI, there is no apparent annual trend, since the station is quite close to the equator. Missing values were dealt with in the same manner as described for the SOI.

Figure 6.17: Station pressure series for Truk Island (7.4°N, 151.8°W) and the Southern Oscillation Index. The "staggered" look of the Truk Island series prior to 1971 is the result of rounding to the nearest millibar. [Vertical axis: pressure (mb).]

6.4.2 Time-Domain and Spectral Analysis
We now analyze the SOI and Truk Island station pressure series using standard time-domain (e.g., the cross-correlation sequence) and Fourier (e.g., the cross-spectrum) techniques. The cross-correlation sequence (ccs) is typically estimated by

\[
\hat{\rho}^{(p)}_{\tau,XY} \equiv \frac{\hat{C}^{(p)}_{\tau,XY}}{\left[ \hat{s}^{(p)}_{0,X} \, \hat{s}^{(p)}_{0,Y} \right]^{1/2}}
\]

(see, e.g., Brockwell and Davis (1991, p. 29)), utilizing the periodogram-based estimates of the acvs for $\{X_t\}$ and $\{Y_t\}$, and of the ccvs. The estimated cross-correlation sequence for the SOI and Truk Island series is shown in Figure 6.18. The maximum occurs at a lag of +1 days. We also observe the characteristic broad-band peak commonly found in atmospheric time series from this region, with an approximate range of 35–55 day lags.
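The estimator above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function name `cross_correlation`, the sign convention (the cross-covariance at lag τ pairs X_t with Y_{t+τ}), and the synthetic test series are assumptions made here, not details taken from the analysis of the SOI and Truk Island data.

```python
import numpy as np

def cross_correlation(x, y, max_lag):
    """Biased (divide-by-N) sample cross-correlation for lags -max_lag..max_lag.

    Assumed convention: the lag-tau cross-covariance pairs x[t] with y[t + tau].
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    n = x.size
    s0x = np.sum(x * x) / n          # lag-0 autocovariance of x
    s0y = np.sum(y * y) / n          # lag-0 autocovariance of y
    lags = np.arange(-max_lag, max_lag + 1)
    ccv = np.empty(lags.size)
    for i, tau in enumerate(lags):
        if tau >= 0:
            ccv[i] = np.sum(x[:n - tau] * y[tau:]) / n
        else:
            ccv[i] = np.sum(x[-tau:] * y[:n + tau]) / n
    return lags, ccv / np.sqrt(s0x * s0y)

# Hypothetical example: y is x delayed by 3 samples (circularly).
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.roll(x, 3)
lags, rho = cross_correlation(x, y, max_lag=10)
print(lags[np.argmax(rho)])
```

With this convention a peak at a positive lag means that the second series follows the first; the opposite convention simply mirrors the lag axis.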
Figure 6.18: Estimated cross-correlation sequence for the Southern Oscillation Index and Truk Island station pressure series. [Horizontal axis: lag (days).]

A spectral analysis of these data provides very little insight into the possible relationship between ENSO events and the MJO. The multitaper co-spectrum between the SOI and Truk Island station pressure series exhibits large peaks at annual and inter-annual frequencies, and only a very slight peak in the frequency range of the MJO. With the co-spectrum producing values so close to zero, the multitaper msc is very erratic and gives a large number of significant peaks over $0 \le f \le 0.08$. Hence, classical bivariate spectral estimation does not give any indication of a possible relationship between these two series.

6.4.3 Wavelet Analysis
Daily measurements allow us to apply the MODWT and analyze the sub-series which correspond to filtered series with approximate pass-band $1/2^{j+1} \le |f| \le 1/2^{j}$. Due to the approximate bandpass nature of the MODWT, with the approximation improving as the length of the wavelet filter increases, it is unnecessary to remove any annual or semiannual components (a similar argument is made when bandpass filtering atmospheric time series in Anderson et al. (1984)), which should be roughly captured in the $\lambda_7$ and $\lambda_8$ scales. The MJO is known to occur with periods of around 30–60 days. We therefore expect to see it in scale $\lambda_5$, associated with changes of 16 days and an approximate pass-band of $1/64 \le |f| \le 1/32$.

A partial MODWT (J = 10) was applied to each series using the D(4) wavelet filter. Figures 6.19 and 6.20 give multiresolution analyses of the Truk Island station pressure series and SOI, respectively. For the Truk Island series, we observe only a slight annual trend in $\widetilde{D}_8$, and the obvious disruption in the early 1980s appears to primarily affect scales $\lambda_7$ and $\lambda_8$. The fifth scale appears to fade in and out in magnitude with no apparent pattern. The SOI multiresolution analysis exhibits a strong annual trend, where the disturbance in the early 1980s affects the scales $\lambda_8$ and $\lambda_9$. The scale associated with the MJO, $\lambda_5$, exhibits numerous bursts across time.

Figure 6.19: Multiresolution analysis of station pressure series collected at Truk Island (7.4°N, 151.8°W), from June 1957 through December 1992, using the D(4) wavelet filter and the MODWT. The wavelet details $\widetilde{D}_1, \widetilde{D}_2, \widetilde{D}_3, \ldots, \widetilde{D}_{10}$ are associated with variations on scales of 1, 2, 4, ..., 1024 days and the wavelet smooth $\widetilde{S}_{10}$ is associated with variations of 2048 days or longer. [Panels: D1 through D10 and S10; horizontal axis: time (days).]

Figure 6.20: Multiresolution analysis for the daily Southern Oscillation Index, from June 1957 through December 1992, using the D(4) wavelet filter and the MODWT. The wavelet details $\widetilde{D}_1, \widetilde{D}_2, \widetilde{D}_3, \ldots, \widetilde{D}_{10}$ are associated with variations on scales of 1, 2, 4, ..., 1024 days and the wavelet smooth $\widetilde{S}_{10}$ is associated with variations of 2048 days or longer. [Panels: D1 through D10 and S10; horizontal axis: time (days).]

Figure 6.21 gives the estimated wavelet variance for each series (cf. Section 3.4). We see that the SOI has a higher amount of variability in every scale; i.e., the approximate 95% confidence intervals do not overlap except for scale $\lambda_8$, which corresponds to the annual frequency. Here, the estimated wavelet variance for the SOI is more than an order of magnitude higher than that for Truk, but the associated error is also quite large. The Truk Island estimated wavelet variance appears to follow the SOI estimates in shape, with much less emphasis at the semi-annual and annual scales.

Figure 6.21: MODWT estimated wavelet variance for the Southern Oscillation Index and Truk Island station pressure series. [Horizontal axis: scale (days); vertical axis: wavelet variance; series: Truk station pressure and Southern Oscillation Index.]

Figure 6.22 shows the estimated wavelet correlation between the SOI and Truk station pressure series at a lag of zero days. The wavelet correlation appears to be significantly different from zero for all scales except $\lambda_6$ and $\lambda_7$, giving moderately positive correlations. These results are liberal for scales $\lambda_7$ and $\lambda_8$ because the confidence intervals assume approximately zero correlation between the products of DWT wavelet coefficients. This is not true for scales $\lambda_7$ and $\lambda_8$, since they involve the semi-annual and annual oscillations, and residual autocorrelations in the time-varying wavelet covariance persist. The significant correlation at $\lambda_5$ lends credibility to the hypothesis of an association between the SOI and MJO.

Figure 6.22: MODWT estimated wavelet correlation for the Southern Oscillation Index and Truk Island station pressure series. The transformed confidence intervals were computed using Section 5.4.2. [Horizontal axis: scale (days); vertical axis: wavelet correlation.]
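The correspondence between wavelet scales and physical periods used throughout this subsection follows directly from the nominal pass-band $1/2^{j+1} \le |f| \le 1/2^{j}$. The short sketch below tabulates it for daily data; the helper names `modwt_band` and `level_for_period` are hypothetical and introduced only for illustration, and the octave bands are nominal (a short filter such as the D(4) leaks across neighboring bands).

```python
def modwt_band(j, dt=1.0):
    """Nominal scale and period range (same units as dt) for wavelet level j.

    Scale is 2^(j-1) * dt; the pass-band 1/2^(j+1) <= |f| <= 1/2^j
    corresponds to periods between 2^j * dt and 2^(j+1) * dt.
    """
    scale = 2 ** (j - 1) * dt
    shortest_period = 2 ** j * dt
    longest_period = 2 ** (j + 1) * dt
    return scale, shortest_period, longest_period

def level_for_period(period, dt=1.0, max_level=10):
    """Return the level whose nominal band contains `period`, or None."""
    for j in range(1, max_level + 1):
        _, p_min, p_max = modwt_band(j, dt)
        if p_min <= period < p_max:
            return j
    return None

for j in range(1, 11):
    scale, p_min, p_max = modwt_band(j)
    print(f"level {j:2d}: scale {scale:6.0f} days, periods {p_min:5.0f} to {p_max:5.0f} days")

print(level_for_period(45.0))     # a period inside the 30-60 day MJO range
print(level_for_period(365.25))   # the annual cycle
```

Under these nominal bands a 45-day oscillation falls in level 5 (periods of 32 to 64 days), the semi-annual cycle in level 7, and the annual cycle in level 8, in agreement with the scales singled out in the text.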
Figure 6.23: MODWT estimated wavelet cross-correlation for the Southern Oscillation Index and Truk Island station pressure series for lags up to 240 days. The confidence intervals from Figure 6.22 apply on a point-to-point basis. The positive peak in the fifth scale $\lambda_5$ is at a lag of 0 days. [Panels: d1 through d10; horizontal axis: lag (days).]

If we are to investigate a possible lead/lag relationship between the two series, then the wavelet cross-correlation must be estimated for various lags. Figure 6.23 shows the estimated wavelet cross-correlation between the SOI and Truk Island station pressure series. The large positive peak in the first five scales is at a lag of 1 day for scales $\lambda_1$–$\lambda_2$, a lag of 2 days for scale $\lambda_3$, a lag of 4 days for scale $\lambda_4$ and zero days for scale $\lambda_5$. In the fifth scale, the largest negative value is at a lag of 20 days. The higher scales do not show any apparent trend when looking at lags up to 240 days. A possible interpretation for the first four scales is that of weather patterns as they travel from West to East (cf. Figure 6.9). The abrupt change in the lead/lag relationship of the wavelet cross-correlation at the fifth scale is most likely due to the MJO. Patterns in higher scales (lower frequencies) correspond to semi-annual, annual, and interannual trends. The ability of the wavelet cross-covariance to analyze (decompose) the usual covariance between these two series on a scale by scale basis allows these interpretations to be made.

Although direct comparison between Figure 6.23 and Figure 6.18 is not appropriate, because the wavelet correlation does not decompose the correlation between two stationary processes, the wavelet covariance does decompose the covariance between two time series. Since the wavelet correlation is simply the wavelet covariance standardized at each scale, the shape of each wavelet cross-correlation is the same even though the magnitudes are off. Hence, we may make a rough comparison between the two, keeping in mind the facts just stated. The first obvious difference is the fact that $\hat{\rho}^{(p)}_{\tau,XY}$ is positive for all negative lags. Looking at Figure 6.23, we see that the larger scales ($\lambda_9$ and $\lambda_{10}$) are all positive and contribute to this feature, whereas for positive lags they are close to zero and allow the annual scale ($\lambda_8$) to dominate. The two dips on either side of the peak at lag +1 are the superposition of the first six scales in Figure 6.23.
The subsequent peak around a lag of +40 days is a result of the negative correlations for scales $\lambda_5$ and $\lambda_6$ pushing down the annual correlation ($\lambda_8$). It is most likely not an interesting feature in the association between these two processes. The interaction of the correlation structure on a scale by scale basis results in a quite complex looking cross-correlation sequence. However, when broken up with the wavelet transform, a few simple yet distinct patterns appear which may be more easily interpreted.

6.4.4 Investigating Seasonal Variation in the Madden–Julian Oscillation
So far the analysis has largely ignored an important feature of discrete wavelet transforms: their ability to extract information which is local in time. With respect to these data, the question of whether or not the MJO changes over time is of particular interest. Changing over time could have (at least) two potential meanings: that the strength of the MJO is changing over time or that the frequency of the MJO is changing over time. The first type of change would produce a pattern of increasing and decreasing coefficients for the time-varying wavelet variance; i.e., the squared wavelet coefficients at the same scale (recall that the MJO should be captured in the scale $\lambda_5$ coefficients). The second type of change would not only produce differing magnitudes of the time-varying wavelet variance but, if the change in frequency was large enough, a shift of large coefficients from one scale to another. Madden and Julian (1994) investigated the latter type of change, with respect to the MJO at Truk Island, using univariate spectral methods.

Let us investigate the possibility of a changing MJO over time by plotting the time-varying wavelet variance for the Truk Island station pressure series and SOI, individually. In Figure 6.24, we see the scale $\lambda_5$ time-varying wavelet variance for November through April, what I roughly call "winter," and for May through October, what I roughly call "summer." We notice that the variability is much greater in the winter months when compared with the summer months, especially in the SOI. With respect to the Truk Island station pressure series, the median value of winter wavelet variance is 0.22 versus just 0.15 for the summer. For the SOI, the median value of winter wavelet variance is 0.48 compared with 0.23 in the summer. The most extreme years, for winter wavelet variance, appear to be 1961, 1986 and 1992 for the SOI and, for the Truk Island station pressure series, 1959, 1974, 1978 and 1990. These extreme years for the Truk Island series, before 1990, agree with the results from Anderson, Stevens, and Julian (1984).

Figure 6.24: Time-varying wavelet variance for the Truk Island station pressure series and SOI at the fifth scale ($\lambda_5$), using the MODWT and D(4) wavelet filter. The "winter" period corresponds to November through April and the "summer" period corresponds to May through October. [Panels: Truk station pressure "winter", Truk station pressure "summer", SOI "winter", SOI "summer"; vertical axis: time-varying wavelet variance.]

It is evident, from the analysis presented here, that a seasonal pattern exists in the Madden–Julian oscillation, even in locations close to the equator. This is a feature not easily recognized using classical spectral techniques. This increased knowledge of how MJO variability changes with time is exciting, and will hopefully allow research scientists to better describe similar physical phenomena.

6.4.5 Investigating ENSO Variation of the Madden–Julian Oscillation
We have already seen how the association between the Southern Oscillation Index and station pressure series collected at Truk Island, for scales associated with the MJO, exhibits different characteristics between "summer" and "winter" seasons. Now I propose to qualitatively analyze the same association between differing periods of ENSO activity. The SOI is a measure of the strength of the trade winds, where high SOI (pressure difference from East to West) is associated with La Niña conditions and low SOI (pressure difference from West to East) is associated with El Niño conditions. To construct an interpretable indicator of ENSO activity, the last two wavelet details and wavelet smooth from the multiresolution analysis of the daily SOI (cf. Figure 6.20) were combined. The sample mean was removed and the series was then inverted in order to agree with the conventional SOI; see Figure 6.25. Looking at positive and negative values of this time series, there is good agreement between it and the conventional SOI. We see the large El Niño events of 1981–1982 and 1986–1987, with a subsequent La Niña event in 1988–1989.

Figure 6.25: Indicator of ENSO activity, constructed by combining the last two wavelet details and wavelet smooth from the multiresolution analysis of the Southern Oscillation Index (cf. Figure 6.20); i.e., $\widetilde{D}_9 + \widetilde{D}_{10} + \widetilde{S}_{10}$. The sample mean was removed and the series was inverted in order to agree with the conventional SOI.

The time-varying wavelet variance for scale $\lambda_5$ of the station pressure series collected at Truk Island, defined to be the squared scale $\lambda_5$ wavelet coefficients, is given in the top half of Figure 6.26. The upper plot is the time-varying wavelet variance during La Niña periods, when our measure of ENSO activity is positive, and the lower plot is the time-varying wavelet variance during El Niño periods, when ENSO activity is negative. There is no apparent difference between the two time series: the median value for La Niña periods is 0.19 while the median value for El Niño periods is 0.16. The time-varying wavelet covariance for scale $\lambda_5$, defined to be the product of scale $\lambda_5$ wavelet coefficients computed from the SOI and station pressure series collected at Truk Island, is given in the bottom half of Figure 6.26. Again, the upper plot is the time-varying wavelet covariance during La Niña periods and the lower plot is the time-varying wavelet covariance during El Niño periods. Here, the wavelet covariance during El Niño periods has much higher extreme values in the early 1990s, but it is still difficult to distinguish between La Niña and El Niño periods. The median value for La Niña periods is 0.036 while the median value for El Niño periods is 0.006.

Figure 6.26: Time-varying wavelet quantities, for the scale associated with the Madden–Julian oscillation ($\lambda_5$), partitioned into El Niño and La Niña periods. The upper two plots display the wavelet variance and the lower two display the wavelet covariance. [Panels: variance for ENSO > 0.5, variance for ENSO < -0.5, covariance for ENSO > 0.5, covariance for ENSO < -0.5; vertical axis: time-varying wavelet quantities.]

Figure 6.27 provides similar information to that of Figure 6.26 for scale $\lambda_4$, which is associated with shorter periods than the MJO. The time-varying wavelet variance for the Truk Island station pressure series (top two plots) appears to have a greater number of large coefficients in the La Niña periods. This could indicate a potential frequency shift in the MJO similar to the one discussed in Gray (1988), who looked at station pressure and sea surface temperature anomalies. The time-varying wavelet covariances between the SOI and station pressure series collected at Truk Island (bottom two plots) are quite similar, as was the case with scale $\lambda_5$.

Figure 6.27: Time-varying wavelet quantities, for the scale associated with shorter periods than the Madden–Julian oscillation ($\lambda_4$), partitioned into El Niño and La Niña periods. The upper two plots display the wavelet variance and the lower two display the wavelet covariance. [Panels: variance for ENSO > 0.5, variance for ENSO < -0.5, covariance for ENSO > 0.5, covariance for ENSO < -0.5; vertical axis: time-varying wavelet quantities.]
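The partitioning used in Sections 6.4.4 and 6.4.5 (square the scale-5 MODWT coefficients, split them by a time-based mask, and compare medians) can be sketched as follows. Several things here are assumptions made purely for illustration: the Haar filter replaces the D(4) filter used in the analysis so that the transform fits in a few lines, the filtering is circular, and the input is a synthetic 45-day oscillation whose amplitude is doubled in "winter" rather than the Truk Island or SOI data. The function names are hypothetical.

```python
import numpy as np

def haar_modwt_level(x, level):
    """Level-`level` Haar MODWT wavelet coefficients via the pyramid algorithm.

    Assumption: circular (periodic) filtering with the Haar filter.
    """
    v = np.asarray(x, dtype=float)
    w = np.zeros_like(v)
    for j in range(1, level + 1):
        shifted = np.roll(v, 2 ** (j - 1))   # v[t - 2^(j-1)], taken circularly
        w = 0.5 * (v - shifted)              # wavelet coefficients at level j
        v = 0.5 * (v + shifted)              # scaling coefficients at level j
    return w

def seasonal_medians(x, dates, level=5):
    """Median time-varying wavelet variance for Nov-Apr and May-Oct."""
    tvwv = haar_modwt_level(x, level) ** 2   # squared wavelet coefficients
    month = dates.astype("datetime64[M]").astype(int) % 12 + 1
    winter = (month >= 11) | (month <= 4)
    return float(np.median(tvwv[winter])), float(np.median(tvwv[~winter]))

# Synthetic stand-in for a daily series over the same dates as the data.
dates = np.arange("1957-06-01", "1993-01-01", dtype="datetime64[D]")
month = dates.astype("datetime64[M]").astype(int) % 12 + 1
amplitude = np.where((month >= 11) | (month <= 4), 2.0, 1.0)
x = amplitude * np.sin(2 * np.pi * np.arange(dates.size) / 45.0)

winter_median, summer_median = seasonal_medians(x, dates)
print(winter_median > summer_median)
```

The same masking idea applies to the ENSO partition: replace the month-based mask with one built from the sign (or a threshold) of an ENSO indicator series, and use products of coefficients from two series for the time-varying wavelet covariance.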
Chapter 7
CONCLUSIONS AND FUTURE DIRECTIONS

This chapter contains ideas for future work that would complement the material I have presented in the two major areas of my dissertation. Final comments are also provided.
7.1 Distributional Results for Testing Homogeneity of Variance

Monte Carlo experiments were mainly used to obtain the quantiles for the test statistic D when testing for homogeneity of variance, and the multiple variance change procedure utilized its asymptotic distribution, which is proportional to a Brownian bridge. Starting with the Haar wavelet, it may be possible to obtain analytical expressions for the quantiles of D, with higher order wavelets following. This would allow us to abandon Monte Carlo studies and not rely on "sufficient" sample sizes in order to appeal to the asymptotic distribution of D. These results would be most useful when testing multiple variance changes, where the asymptotic distribution is utilized exclusively. In addition, this may also provide insight into the distribution of D when the MODWT wavelet coefficients are used instead of the DWT coefficients.
7.2 The Schwarz Information Criterion

Chen and Gupta (1997) proposed an information criterion based approach to testing and locating variance change points. Let $X_1, \ldots, X_N$ be a sequence of Gaussian random variables with parameters $(\mu_1, \sigma_1^2), (\mu_2, \sigma_2^2), \ldots, (\mu_N, \sigma_N^2)$. Assume that $\mu_1 = \mu_2 = \cdots = \mu_N = 0$, which is true when testing wavelet coefficients. The Schwarz Information Criterion (SIC), also known as the Bayesian Information Criterion (BIC), is defined to be $-2 \log L(\hat{\theta}) + p \log N$, where $L(\hat{\theta})$ is the maximum likelihood function for the model, $p$ is the number of free parameters in the model, and $N$ is the sample size. Two models are being compared, the null hypothesis $H_0$ of Equation (4.6) and an alternative hypothesis $H_1$ with two distinct variances; i.e.,

\[
H_1 : \sigma_1^2 = \cdots = \sigma_k^2 \neq \sigma_{k+1}^2 = \cdots = \sigma_N^2 .
\]

We reject $H_0$ if $\mathrm{SIC}(N) > \mathrm{SIC}(k)$ for some $k$ and estimate the position of the change point $k_0$ by $\hat{k}$ such that

\[
\mathrm{SIC}(\hat{k}) = \min_{1 \le k < N} \mathrm{SIC}(k),
\]

where $\mathrm{SIC}(N)$ is the SIC under $H_0$ and $\mathrm{SIC}(k)$ is the SIC under $H_1$ for $k = 1, \ldots, N-1$. Here

\[
\mathrm{SIC}(N) \equiv N \log 2\pi + N \log \hat{\sigma}^2 + N + \log N
\]

and

\[
\mathrm{SIC}(k) \equiv N \log 2\pi + k \log \hat{\sigma}_k^2 + (N-k) \log \hat{\sigma}_{>k}^2 + N + 2 \log N,
\]

where

\[
\hat{\sigma}^2 \equiv \frac{1}{N} \sum_{i=1}^{N} X_i^2, \qquad
\hat{\sigma}_k^2 \equiv \frac{1}{k} \sum_{i=1}^{k} X_i^2, \qquad \text{and} \qquad
\hat{\sigma}_{>k}^2 \equiv \frac{1}{N-k} \sum_{i=k+1}^{N} X_i^2 .
\]

Since we require at least one sample in each estimate, we can only detect changes for $2 \le k_0 \le N-1$. To eliminate, or at least suppress, the possibility of random
fluctuations in the data contributing to the difference between the SICs, Chen and Gupta (1997) introduced a significance level $\alpha$ and its associated critical value $c_\alpha$. Hence, $H_0$ is rejected if $\mathrm{SIC}(N) > \mathrm{SIC}(k) + c_\alpha$ for some $2 \le k \le N-2$. Approximate values of $c_\alpha$ can be obtained using the formula

\[
c_\alpha = \left\{ -\frac{1}{a(\log N)} \log \log \left[ 1 - \alpha + \exp\!\left( -2 e^{b(\log N)} \right) \right]^{-1/2} + \frac{b(\log N)}{a(\log N)} \right\}^{2} - \log N,
\]

where $a(\cdot)$ and $b(\cdot)$ are as defined in Chen and Gupta (1997), and are also found in Table 1 of Chen and Gupta (1997). It would be of interest to compare the performance of this method to the cumulative sum of squares testing procedure, proposed in Section 4.2.1, for detecting single and multiple variance changes. A modification, similar to that in Section 4.5 using the MODWT, would also allow this information criterion approach to estimate the location of the variance change.
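The SIC comparison described in this section can be sketched directly from the two expressions for SIC(N) and SIC(k). The code below is a minimal sketch under stated assumptions: the function name `sic_variance_change` is hypothetical, candidate change points are restricted to 2 ≤ k ≤ N − 2, the critical value `c` is passed in by the caller (zero by default) rather than computed from the approximation formula, and the example data are synthetic zero-mean Gaussian noise, not wavelet coefficients from the thesis.

```python
import numpy as np

def sic_variance_change(x, c=0.0):
    """Compare SIC(N) with min_k SIC(k) for a zero-mean sequence x.

    Returns (reject_h0, k_hat, sic_null, sic_min), where k_hat is the number
    of observations assigned to the first (pre-change) segment.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    csum = np.cumsum(x ** 2)
    total = csum[-1]
    # SIC under H0: one variance, one free parameter.
    sic_null = n * np.log(2 * np.pi) + n * np.log(total / n) + n + np.log(n)
    # SIC under H1 for each candidate k: two variances plus the change point.
    k = np.arange(2, n - 1)                       # k = 2, ..., N - 2
    var_left = csum[k - 1] / k                    # mean of x_1^2 .. x_k^2
    var_right = (total - csum[k - 1]) / (n - k)   # mean of x_{k+1}^2 .. x_N^2
    sic_k = (n * np.log(2 * np.pi) + k * np.log(var_left)
             + (n - k) * np.log(var_right) + n + 2 * np.log(n))
    best = int(np.argmin(sic_k))
    reject = bool(sic_null > sic_k[best] + c)
    return reject, int(k[best]), float(sic_null), float(sic_k[best])

# Hypothetical example: standard deviation changes from 1 to 3 after sample 300.
rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(0.0, 3.0, 300)])
print(sic_variance_change(x))
```

Cumulative sums keep the scan over all candidate change points linear in the sample size, which matters if the procedure is applied scale by scale to long series.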
7.3 Refinement of the Multiple Variance Change Testing Procedure

At the present time, the procedure for detecting multiple variance changes in time series is rather crude. The DWT is used solely for testing and the MODWT for locating the variance change points. The information from one set of wavelet coefficients is not used to influence the other procedure. In this sense, I am making a "leap of faith" in that the DWT and MODWT wavelet coefficients will identically represent features in the original time series. This is most likely not true, especially at higher scales. If several peaks occur in the rotated cumulative variance, then it may well be the case that the maximum values at scale $\lambda_j$ from the DWT and MODWT will not correspond to the same location. To ensure correspondence between the DWT and MODWT maxima, I could include a logical statement which determines if the two locations are roughly equivalent. If so, the location is kept as a significant variance change; otherwise the next highest value from the MODWT could be selected and compared with the DWT maximum. This is repeated until an agreement is reached between the two transforms.

A more appealing fix to this problem is to detect and locate variance change points using the MODWT. This is not currently possible, since we lack an asymptotic distribution for the test statistic when using MODWT wavelet coefficients. Monte Carlo studies are possible when testing for a single variance change (and therefore a fixed sample size), but the multiple testing procedure guarantees not knowing the sample size after the first split, making Monte Carlo results difficult to utilize. The use of an equivalent degrees of freedom argument has been shown to reasonably approximate the distribution of the wavelet variance and has also proven sufficient, for certain sample sizes, to modify the test statistic D computed with MODWT wavelet coefficients.
7.4 Testing Homogeneity of Covariance

With the analysis of bivariate time series now possible using wavelet techniques, the question of homogeneity of covariance arises. A cumulative sum of squares test statistic may be useful for investigating departures from a constant association between two time series. More work is needed to formulate specific statistical hypotheses. Following Section 4.2.1, let

\[
Q_k \equiv \frac{\sum_{j=1}^{k} |X_j Y_j|}{\sum_{j=1}^{N} |X_j Y_j|}, \qquad k = 1, \ldots, N-1,
\]

be the partial cumulative absolute covariance between two processes $\{X_t\}$ and $\{Y_t\}$, and define $D_{XY} \equiv \max(D^{+}_{XY}, D^{-}_{XY})$, where

\[
D^{+}_{XY} \equiv \max_{1 \le k \le N-1} \left( \frac{k}{N-1} - Q_k \right)
\qquad \text{and} \qquad
D^{-}_{XY} \equiv \max_{1 \le k \le N-1} \left( Q_k - \frac{k-1}{N-1} \right).
\]

As before, percentage points for the distribution of $D_{XY}$ under the null hypothesis can be readily obtained through Monte Carlo simulations. In fact, the empirical distribution of $D_{XY}$ appears to be similar to that of D; see Figure 7.1. Across all scales, we see that the distribution of D is stochastically greater than that of $D_{XY}$, especially in the right tail, and the difference increases as we go further into that right tail. Plots of the empirical densities of each test statistic indicate that they are skewed to the right and very similar in shape. Initially, Monte Carlo experiments should be reasonable for approximating the distribution of $D_{XY}$. For the time being, single changes of covariance are easily tractable, but a reasonable approximation must be found in order to test multiple changes of covariance on a scale by scale basis.
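The statistic $D_{XY}$ is simple to compute once the two coefficient sequences are available. The sketch below follows the definitions of $Q_k$, $D^{+}_{XY}$ and $D^{-}_{XY}$ given above; the function name `cusum_covariance_statistic` and the small example are hypothetical, and the sketch does not supply critical values, which in the text come from Monte Carlo simulation.

```python
import numpy as np

def cusum_covariance_statistic(x, y):
    """Return (D_XY, D_plus, D_minus) for two equal-length sequences."""
    prod = np.abs(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))
    n = prod.size
    k = np.arange(1, n)                         # k = 1, ..., N - 1
    q = np.cumsum(prod)[:-1] / prod.sum()       # Q_k, normalized partial sums
    d_plus = float(np.max(k / (n - 1) - q))
    d_minus = float(np.max(q - (k - 1) / (n - 1)))
    return max(d_plus, d_minus), d_plus, d_minus

# Hypothetical example: the absolute products are constant, so Q_k = k / N.
x = np.ones(8)
y = np.ones(8)
print(cusum_covariance_statistic(x, y))
```

For a sequence whose absolute products are constant, $Q_k$ grows linearly and the statistic takes its smallest possible value, 1/N; a late onset of large products pulls $Q_k$ below the diagonal and inflates $D^{+}_{XY}$.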
Figure 7.1: Quantile-quantile plot comparing the Monte Carlo distributions of D (horizontal axis) and $D_{XY}$ (vertical axis). The sample size of the original time series was N = 1024, which was decomposed by the LA(8) wavelet filter using the DWT. Each plot contains 5000 observations. [Panels: levels 1 through 6; horizontal axis: variance critical values; vertical axis: covariance critical values.]
Previous work on the distribution of the product of two normally distributed variables may be applicable here; see, e.g., Craig (1936) and Aroian (1947). Torrence and Compo (1998) utilize the distribution of the square root of the product of two $\chi^2$ random variables when computing confidence intervals for their cross-wavelet power.
7.5 Equivalent Degrees of Freedom for the Wavelet Covariance

An equivalent degrees of freedom argument is used by Percival (1995) to specify alternative confidence intervals for the wavelet variance. The idea of alternative confidence intervals for the wavelet covariance may also be of interest. Priestley (1981, pp. 695–696) mentions using the complex Wishart distribution for approximating the distribution of the cross-spectral matrix. It is not immediately apparent if this is useful when considering the wavelet covariance. Let $X_t$ and $Y_t$, $t = 0, \ldots, N-1$, be defined as in Section 5.2.1. Recall that the MODWT estimator of the wavelet covariance, associated with scale $\lambda_j$, is defined to be

\[
\tilde{\gamma}_{XY}(\lambda_j) = \frac{1}{\widetilde{N}_j} \sum_{l=L_j-1}^{N-1} \widetilde{W}^{(X)}_{j,l} \, \widetilde{W}^{(Y)}_{j,l}
\]

(Equation (5.3)). Now define $\Gamma_{XY}(\lambda_j)$ to be the $2 \times 2$ sample wavelet variance/covariance matrix computed using the MODWT; i.e.,

\[
\Gamma_{XY}(\lambda_j) \equiv
\begin{bmatrix}
\tilde{\gamma}_{XX}(\lambda_j) & \tilde{\gamma}_{XY}(\lambda_j) \\
\tilde{\gamma}_{YX}(\lambda_j) & \tilde{\gamma}_{YY}(\lambda_j)
\end{bmatrix}
=
\begin{bmatrix}
\tilde{\nu}^2_{X}(\lambda_j) & \tilde{\gamma}_{XY}(\lambda_j) \\
\tilde{\gamma}_{YX}(\lambda_j) & \tilde{\nu}^2_{Y}(\lambda_j)
\end{bmatrix}.
\]

Hence, the joint distribution of the elements of $\widetilde{N}_j \Gamma_{XY}(\lambda_j)$ is Wishart (Johnson and Kotz 1972, Ch. 38). We are not interested in the distribution of $\Gamma_{XY}(\lambda_j)$ per se, but in the marginal distribution of $\tilde{\gamma}_{XY}(\lambda_j)$ only. The diagonal elements of $\Gamma_{XY}(\lambda_j)$ are quadratic forms of normal variables and may be approximated by a $\chi^2$ distribution (cf. Section 3.4.2). The distribution of the off-diagonal elements of $\Gamma_{XY}(\lambda_j)$ is not of the gamma type (Johnson and Kotz 1972, p. 159). However, Goodman (1957) derived the asymptotic marginal and joint distributions of bivariate spectral estimators. These results may be adapted for use with our bivariate wavelet estimators.
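To make the $2 \times 2$ matrix concrete, the sketch below computes the sample wavelet variance/covariance matrix at one level from MODWT coefficients, discarding the first $L_j - 1$ boundary coefficients as in Equation (5.3). It rests on assumptions made only for illustration: the Haar filter (so that $L_j = 2^j$) stands in for the longer filters used in the thesis, the transform uses circular filtering, and the names `haar_modwt_level` and `wavelet_cov_matrix` are hypothetical.

```python
import numpy as np

def haar_modwt_level(x, level):
    """Level-`level` Haar MODWT wavelet coefficients (circular filtering)."""
    v = np.asarray(x, dtype=float)
    w = np.zeros_like(v)
    for j in range(1, level + 1):
        shifted = np.roll(v, 2 ** (j - 1))   # v[t - 2^(j-1)], taken circularly
        w = 0.5 * (v - shifted)
        v = 0.5 * (v + shifted)
    return w

def wavelet_cov_matrix(x, y, level):
    """2 x 2 sample wavelet variance/covariance matrix at the given level."""
    l_j = 2 ** level                         # (2^j - 1)(L - 1) + 1 with L = 2 (Haar)
    wx = haar_modwt_level(x, level)[l_j - 1:]   # drop boundary coefficients
    wy = haar_modwt_level(y, level)[l_j - 1:]
    # Each mean is (1 / N_j) times a sum over the N_j = N - L_j + 1 retained terms.
    return np.array([[np.mean(wx * wx), np.mean(wx * wy)],
                     [np.mean(wy * wx), np.mean(wy * wy)]])

# Hypothetical example with two correlated white-noise series.
rng = np.random.default_rng(7)
x = rng.standard_normal(1024)
y = 0.5 * x + rng.standard_normal(1024)
gamma = wavelet_cov_matrix(x, y, level=3)
print(gamma)
print(gamma[0, 1] / np.sqrt(gamma[0, 0] * gamma[1, 1]))   # wavelet correlation
```

The diagonal entries are the usual unbiased MODWT wavelet variance estimates, and dividing the off-diagonal entry by the square root of their product gives the wavelet correlation at that scale.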
7.6 Assessing Non-Gaussian/Non-Linear Processes

With respect to bivariate wavelet analysis of time series, I have given results solely for certain Gaussian (normal) processes. This assumption greatly simplifies proofs which require properties such as strict stationarity and/or finite moments. However, time series analysis often does not have the "luxury" of analyzing only Gaussian processes. Working under the assumption of a linear process (Hannan 1970, pp. 9–13) with not necessarily Gaussian errors, or even more general processes, would greatly expand the methodology provided in this dissertation. Recently, Serroukh, Walden, and Percival (1998) have looked at the statistical properties of the MODWT estimator of wavelet variance for non-Gaussian and nonlinear time series. A central limit theorem is established for the centralized wavelet variance under the assumption of strict stationarity and a given number of finite moments. Using the surface albedo measurements of pack ice from Lindsay et al. (1996), they show that confidence intervals based on the Gaussian assumption are much smaller than the ones under more general processes at smaller scales.
7.7 Final Comments

I believe that better statistical methods come from the necessity demanded by real world problems. In this dissertation I have attempted to provide new and useful methodology in order to analyze problems in time series analysis where techniques were previously lacking. The two primary areas are detecting and locating (multiple) changes of variance in time series with long memory structure using the DWT, and extending wavelet analysis of variance for univariate time series to a wavelet analysis of covariance for bivariate time series; i.e., introducing the wavelet cross-covariance and cross-correlation. With respect to the former area, I have tried to provide a more thorough investigation of ideas from the study of long memory processes, change-point detection and the ability of the DWT to approximately decorrelate time series on a scale by scale basis. The concepts of wavelet covariance and correlation are natural extensions of the work by D. B. Percival and others on the wavelet variance.

The DWT is a powerful mathematical tool, enabling statisticians to examine much more complicated processes by separating features on a scale by scale basis. However, it cannot be applied to problems without caution. Which wavelet filter to use is a very important issue. To help select an appropriate wavelet filter, several filters of various lengths should be applied to the data and visually analyzed to help detect the potential leakage of low frequency features throughout the multiresolution analysis. Most importantly, serious thought should be put into how the shape of the underlying wavelet filter matches the physical process from which the data were sampled. Keeping these issues in mind, there should be a great variety of problems where wavelet analysis is beneficial.
BIBLIOGRAPHY

Abraham, B. and W. W. S. Wei (1984). Inferences about the parameters of a time series model with changing variance. Metrika 31, 183–194.
Abry, P. and D. Veitch (1998). Wavelet analysis of long-range-dependent traffic. IEEE Transactions on Information Theory 44 (1), 2–15.
Allan, D. W. (1966). Statistics of atomic frequency standards. Proceedings of the IEEE 31, 221–230.
Anderson, J. R., D. E. Stevens, and P. R. Julian (1984). Temporal variations of the tropical 40–50 day oscillation. Monthly Weather Review 112 (12), 2431–2438.
Anderson, T. W. (1971). The Statistical Analysis of Time Series. New York: John Wiley and Sons, Inc.
Aroian, L. A. (1947). The probability function of the product of two normally distributed variables. The Annals of Mathematical Statistics 18, 265–271.
Balek, J. (1977). Hydrology and Water Resources in Tropical Africa, Volume 8 of Developments in Water Science. New York: Elsevier Scientific Pub. Co.
Beran, J. (1994). Statistics for Long-Memory Processes, Volume 61 of Monographs on Statistics and Applied Probability. New York: Chapman & Hall.
Beran, J. and N. Terrin (1996). Testing for a change of the long-memory parameter. Biometrika 83 (3), 627–638.
Billingsley, P. (1968). Convergence of Probability Measures. New York: John Wiley & Sons.
Bingham, C., M. D. Godfrey, and J. W. Tukey (1967). Modern techniques of power spectrum estimation. IEEE Transactions on Audio and Electroacoustics 15 (2), 56–66.
Bloomfield, P. (1976). Fourier Analysis of Time Series: An Introduction. New York: John Wiley & Sons.
Box, G. E. P. and G. M. Jenkins (1976). Time Series Analysis: Forecasting and Control (2nd ed.). Time Series Analysis and Digital Processing. San Francisco: Holden Day.
Bradshaw, G. A. and T. A. Spies (1992). Characterizing canopy gap structure in forests using wavelet analysis. Journal of Ecology 80 (2), 205–215.
Bretherton, C. S., M. Widmann, V. P. Dymnikov, J. M. Wallace, and I. Bladé (1998). Effective number of degrees of freedom of a spatial field. Submitted to Journal of Climate.
Briggs, W. L. and V. E. Henson (1995). The DFT: An Owner's Manual for the Discrete Fourier Transform. Philadelphia: Society for Industrial and Applied Mathematics.
Brillinger, D. R. (1979). Confidence intervals for the crosscovariance function. In Mathematical Statistics, Volume 5 of Selecta Statistica Canadiana, pp. 1–16. Hamilton, Ontario: McMaster University Printing Services.
Brillinger, D. R. (1981). Time Series: Data Analysis and Theory. Holden-Day Series in Time Series Analysis. San Francisco: Holden-Day. Expanded edition.
Brockwell, P. J. and R. A. Davis (1991). Time Series: Theory and Methods (2nd ed.). New York: Springer-Verlag.
Brooks, C. E. P. (1949). Climate Through the Ages (2nd ed.). New York: McGraw-Hill Book Co.
Chen, J. and A. K. Gupta (1997). Testing and locating variance changepoints with application to stock prices. Journal of the American Statistical Association 92 (438), 739–747.
Chui, C. K. (1997). Wavelets: A Mathematical Tool for Signal Analysis. SIAM Monographs on Mathematical Modeling and Computation. Philadelphia: Society for Industrial and Applied Mathematics.
Craig, C. C. (1936). On the frequency function of xy. The Annals of Mathematical Statistics 7, 1–15.
Daubechies, I. (1992). Ten Lectures on Wavelets, Volume 61 of CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia: Society for Industrial and Applied Mathematics.
David, F. N. (1966). Tables of the correlation coefficient. In E. S. Pearson and H. O. Hartley (Eds.), Biometrika Tables for Statisticians (3rd ed.), Volume 1. Cambridge: Cambridge University Press.
Davies, R. B. and D. S. Harte (1987). Tests for Hurst effect. Biometrika 74, 95–101.
Davis, W. W. (1979). Robust methods for detection of shifts of the innovation variance of a time series. Technometrics 21 (3), 313–320.
Donoho, D. L. (1993). Nonlinear wavelet methods for recovery of signals, densities, and spectra from indirect and noisy data. In Proceedings of Symposia in Applied Mathematics, Volume 47, pp. 173–205. American Mathematical Society.
Donoho, D. L. (1995). De-noising by soft-thresholding. IEEE Transactions on Information Theory 41 (3), 613–627.
Donoho, D. L. and I. M. Johnstone (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 (3), 425–455.
Donoho, D. L., I. M. Johnstone, G. Kerkyacharian, and D. Picard (1995). Wavelet shrinkage: Asymptopia? (with discussion). Journal of the Royal Statistical Society B 57 (2), 301–369.
Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 10, 507–521.
Fisher, R. A. (1929). Tests of significance in harmonic analysis. Proceedings of the Royal Society of London, Series A 125, 54–59.
Fuller, W. A. (1996). Introduction to Statistical Time Series (2 ed.). New York: Wiley-Interscience.
Giraitis, L. and R. Leipus (1995). A generalized fractionally differencing approach in long-memory modeling. Lietuvos Matematikos Rinkinys 35 (1), 65–81.
Goodman, N. R. (1957). On the joint estimation of spectra, cospectrum and quadrature spectrum of a two-dimensional stationary Gaussian process. Sci. Paper No. 10. Engrng. Statist. Lab., New York Univ., New York.
Graf, H.-P. (1983). Long-range correlations and estimation of the self-similarity parameter. Ph. D. thesis, Eidgenössische Technische Hochschule, Zürich.
Granger, C. W. J. and R. Joyeux (1980). An introduction to long-memory time series models and fractional differencing. Journal of Time Series Analysis 1, 15–29.
Gray, B. M. (1988). Seasonal frequency variations of the 40–50 day oscillation. Journal of Climatology 8, 511–519.
Haar, A. (1910). Zur Theorie der orthogonalen Funktionen-Systeme. Mathematische Annalen 69, 331–371. In German.
Hannan, E. J. (1970). Multiple Time Series. New York: John Wiley and Sons, Inc.
Hosking, J. R. M. (1981). Fractional differencing. Biometrika 68 (1), 165–176.
Hosking, J. R. M. (1984). Modeling persistence in hydrological time series using fractional differencing. Water Resources Research 20 (12), 1898–1908.
Hsu, D.-A. (1977). Tests for variance shift at an unknown time point. Applied Statistics 26 (3), 279–284.
Hsu, D.-A. (1979). Detecting shifts of parameter in gamma sequences with applications to stock price and air traffic flow analysis. Journal of the American Statistical Association 74, 31–40.
Hudgins, L., C. A. Friehe, and M. E. Mayer (1993). Wavelet transforms and atmospheric turbulence. Physical Review Letters 71 (20), 3279–3282.
Hudgins, L. H. (1992). Wavelet Analysis of Atmospheric Turbulence. Ph. D. thesis, University of California, Irvine.
Hurst, H. E. (1951). Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers 116, 770–779.
Imhof, J. P. (1961). Computing the distribution of a quadratic form in normal variables. Biometrika 48, 419–426.
Inclán, C. and G. C. Tiao (1994). Use of cumulative sums of squares for retrospective detection of changes of variance. Journal of the American Statistical Association 89 (427), 913–923.
Isserlis, L. (1918). On a formula for the product-moment coefficient of any order of a normal frequency distribution in any number of variables. Biometrika 12, 134–139.
Jarvis, C. S. (1936). Flood-stage records of the river Nile. Transactions of the American Society of Civil Engineers 101, 1012–1071.
Jensen, M. J. (1994). Wavelet analysis of fractionally integrated processes. Technical Report ewp-em/9405001, Department of Economics, Washington University.
Johnson, N. L. and S. Kotz (1970). Continuous Univariate Distributions. New York: Houghton Mifflin.
Johnson, N. L. and S. Kotz (1972). Continuous Multivariate Distributions. New York: John Wiley & Sons, Inc.
Jones, R. H. (1980). Maximum likelihood fitting of ARMA models to time series with missing observations. Technometrics 22 (3), 389–395.
Kawata, K. and S. Arimoto (1996). Signal matching using wavelet correlation. Electronics and Communications in Japan 3 79 (9), 23–34. Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. 78-A, No. 12, December 1995, pp. 1655–1664.
Koopmans, L. H. (1974). The Spectral Analysis of Time Series. New York: Academic Press.
Kotz, S., N. L. Johnson, and C. B. Read (Eds.) (1982). Encyclopedia of Statistical Sciences. New York: Wiley.
Kuhnel, I. (1989). Spatial and temporal variations in Australo-Indonesian region cloudiness. International Journal of Climatology 9 (4), 395–405.
Lawrence, A. J. and N. T. Kottegoda (1977). Stochastic modelling of river flow time series. Journal of the Royal Statistical Society A 140 (1), 1–47.
Leftus, V. (1986). Solar activity variations and climatic changes. Studia Geophysica et Geodaetica 30 (1), 93–110.
Lehmann, E. L. (1983). Theory of Point Estimation. New York: Wiley.
Li, H. and T. Nozaki (1997). Application of wavelet cross-correlation analysis to a plane turbulent jet. Japanese Society of Mechanical Engineers International Journal, Series B 40 (1), 58–66.
Lindsay, R. W., D. B. Percival, and D. A. Rothrock (1996). The discrete wavelet transform and the scale analysis of the surface properties of sea ice. IEEE Transactions on Geoscience and Remote Sensing 34 (3), 771–787.
Madden, R. A. (1986). Seasonal variation of the 40–50 day oscillation in the tropics. Journal of Atmospheric Science 43 (24), 3138–3158.
Madden, R. A. and P. R. Julian (1971). Detection of a 40–50 day oscillation in the zonal wind in the tropical Pacific. Journal of Atmospheric Science 28, 702–708.
Madden, R. A. and P. R. Julian (1994). Observations of the 40–50 day tropical oscillation: A review. Monthly Weather Review 122 (5), 814–837.
Mallat, S. G. (1989). A theory for multiresolution signal decomposition: The wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence 11 (7), 674–693.
Mandelbrot, B. B. and J. W. van Ness (1968). Fractional Brownian motions, fractional noises and applications. SIAM Review 10 (4), 422–437.
Mandelbrot, B. B. and J. R. Wallis (1969). Some long-run properties of geophysical records. Water Resources Research 5 (2), 321–340.
Mann, H. B. and A. Wald (1943). On stochastic limit and order relationships. The Annals of Mathematical Statistics 14, 217–226.
McCoy, E. J. and A. T. Walden (1996). Wavelet analysis and synthesis of stationary long-memory processes. Journal of Computational and Graphical Statistics 5 (1), 26–56.
McCoy, E. J., A. T. Walden, and D. B. Percival (1998). Multitaper spectral estimation of power law processes. IEEE Transactions on Signal Processing 46 (3), 655–668.
Mehrabi, A. R., H. Rassamdana, and M. Sahimi (1997). Characterization of long-range correlations in complex distributions and profiles. Physical Review E 56 (1), 712–722.
Mohr, D. L. (1981). Modeling Data as a Fractional Gaussian Noise. Ph. D. thesis, Princeton University.
Nuri, W. A. and L. J. Herbst (1969). Fourier methods in the study of variance fluctuations in time series analysis. Technometrics 11 (1), 103–113.
Ogden, R. T. (1994). Wavelet Thresholding in Nonparametric Regression with Change-Point Applications. Ph. D. thesis, Texas A&M University.
Ogden, R. T. (1997). Essential Wavelets for Statistical Applications and Data Analysis. Boston: Birkhäuser.
Ogden, R. T. and E. Parzen (1996). Change-point approach to data analytic wavelet thresholding. Statistics and Computing 6 (2), 93–99.
Percival, D. B. (1983). The Statistics of Long Memory Processes. Ph. D. thesis, Department of Statistics, University of Washington.
Percival, D. B. (1992). Simulating Gaussian random processes with a specified spectra. Computing Science and Statistics 24, 534–538.
Percival, D. B. (1993). Three curious properties of the sample variance and autocovariance for stationary processes with unknown mean. The American Statistician 47 (4), 274–276.
Percival, D. B. (1994). Spectral analysis of univariate and bivariate time series. In J. L. Stanford and S. B. Vardeman (Eds.), Statistical Methods for Physical Science, Volume 28 of Methods of Experimental Physics, pp. 313–348. Boston: Academic Press, Inc.
Percival, D. B. (1995). On estimation of the wavelet variance. Biometrika 82 (3), 619–631.
Percival, D. B. and P. Guttorp (1994). Long-memory processes, the Allan variance and wavelets. In E. Foufoula-Georgiou and P. Kumar (Eds.), Wavelets in Geophysics, Volume 4 of Wavelet Analysis and its Applications, pp. 325–344. San Diego: Academic Press, Inc.
Percival, D. B. and H. O. Mofjeld (1997). Analysis of subtidal coastal sea level fluctuations using wavelets. Journal of the American Statistical Association 92 (439), 868–880.
Percival, D. B. and A. T. Walden (1993). Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques. Cambridge: Cambridge University Press.
Percival, D. B. and A. T. Walden (1999). Wavelet Methods for Time Series Analysis. Cambridge: Cambridge University Press. Forthcoming.
Popper, W. (1951). The Cairo Nilometer, Volume 12 of Publications in Semitic Philology. Berkeley: University of California Press.
Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery (1992). Numerical Recipes in C: The Art of Scientific Computing (2 ed.). Cambridge: Cambridge University Press.
Priestley, M. B. (1981). Spectral Analysis and Time Series. London: Academic Press, Inc.
Rice, S. O. (1945). Mathematical analysis of random noise, part III: Statistical properties of random noise currents. Bell Systems Technical Journal 24, 46–156.
Riedel, K. S. and A. Sidorenko (1995). Minimum bias multiple taper spectral estimation. IEEE Transactions on Signal Processing 43 (1), 188–195.
Schuster, A. (1898). On the investigation of hidden periodicities with application to a supposed 26-day period of meteorological phenomena. Terrestrial Magnetism 3, 13–41.
Serroukh, A., A. T. Walden, and D. B. Percival (1998). Statistical properties of the wavelet variance estimator for non-Gaussian/non-linear time series. Technical Report 98-03, Department of Mathematics, Imperial College of Science, Technology & Medicine.
Slepian, D. (1978). Prolate spheroidal wave functions, Fourier analysis, and uncertainty V: The discrete case. Bell System Technical Journal 57, 1371–1430.
Srivastava, M. S. (1993). Comparison of CUMSUM and EWMA procedures for detecting a shift in the mean or an increase in the variance. Journal of Applied Statistical Science 1 (4), 445–468.
Stephens, M. A. (1970). Use of the Kolmogorov–Smirnov, Cramér–von Mises and related statistics without extensive tables. Journal of the Royal Statistical Society B 32 (1), 115–122.
Stephens, M. A. (1986). Tests based on EDF statistics. In R. B. D'Agostino and M. A. Stephens (Eds.), Goodness-of-Fit Techniques, Volume 68 of STATISTICS: Textbooks and Monographs, pp. 97–193. New York: Marcel Dekker.
Tewfik, A. H. and M. Kim (1992). Correlation structure of the discrete wavelet coefficients of fractional Brownian motion. IEEE Transactions on Information Theory 38 (2), 904–909.
Thomson, D. J. (1982). Spectrum estimation and harmonic analysis. IEEE Proceedings 70 (9), 1055–1096.
Titchmarsh, E. C. (1939). The Theory of Functions (2 ed.). Oxford: Oxford University Press.
Torrence, C. and G. P. Compo (1998). A practical guide to wavelet analysis. Bulletin of the American Meteorological Society 79 (1), 61–78.
Toussoun, O. (1925). Mémoire sur l'histoire du Nil. In Mémoires à l'Institut d'Égypte, Volume 18, pp. 366–404.
Tsay, R. S. (1988). Outliers, level shifts, and variance changes in time series. Journal of Forecasting 7, 1–20.
Tukey, J. W. (1949). The sampling theory of power spectrum estimates. In Symposium on Applications of Autocorrelation Analysis to Physical Problems, pp. 47–67. Office of Naval Research, Department of the Navy, Washington, U.S.A.
Verner, M. (1972). Periodical water-volume fluctuations of the Nile. Archiv Orientální 40 (2), 105–123.
Vetterli, M. and J. Kovačević (1995). Wavelets and Subband Coding. New Jersey: Prentice Hall PTR.
Walden, A. T. (1994). Interpretation of geophysical borehole data via interpolation of fractionally differenced white noise. Applied Statistics 43 (2), 335–345.
Walden, A. T. and R. E. White (1990). Estimating the statistical bandwidth of a time series. Biometrika 77, 699–707.
Walker, G. T. (1928). World weather. Monthly Weather Review 56, 167–170.
Wang, Y. (1995). Jump and sharp cusp detection by wavelets. Biometrika 82 (2), 385–397.
Wang, Y., J. E. Cavanaugh, and C. Song (1997). Self-similarity index estimation via wavelets for locally self-similar processes. Department of Statistics, University of Missouri.
Wichern, D. W., R. B. Miller, and D.-A. Hsu (1976). Changes of variance in first-order autoregressive time series models, with an application. Applied Statistics 25 (3), 248–256.
Wickerhauser, M. V. (1994). Adapted Wavelet Analysis from Theory to Software. Wellesley, Massachusetts: A K Peters, Ltd.
Wornell, G. W. (1993). Wavelet-based representations for the 1/f family of fractal processes. Proceedings of the IEEE 81 (10), 1428–1450.
Wornell, G. W. (1996). Signal Processing with Fractals: A Wavelet Based Approach. New Jersey: Prentice Hall.
Appendix A
FOURIER THEORY AND FILTERING
Wavelet methodology shares the basic goal of its Fourier cousin: to transform signals into a different domain so that interesting features may be brought to the surface. This is done by using basis functions that differ from the sines and cosines utilized by the discrete Fourier transform (DFT). Having been developed much later, the notation and concepts of wavelet methodology borrow a great deal from the well-established fields of filtering and Fourier analysis. We now outline the basic concepts and notation that are used throughout this dissertation.
A.1 The Discrete Fourier Transform
Even though the Fourier transform comes in several varieties, we present a synopsis of the discrete time/continuous frequency flavor as given in Percival and Walden (1999, Ch. 2); see Percival and Walden (1993, Ch. 3) for a discussion of all types of the Fourier transform and related issues. This version of the Fourier transform facilitates the analysis of discrete sequences of observations (time series), for example, $\{x_t \mid t = 0, \pm 1, \pm 2, \ldots\}$. Let us further restrict ourselves to square-summable sequences; i.e., $\sum_{t=-\infty}^{\infty} |x_t|^2 < \infty$. Let $X(\cdot)$ be a complex-valued function which defines the DFT (analysis) of $\{x_t\}$ via
$$X(f) \equiv \sum_{t=-\infty}^{\infty} x_t e^{-i2\pi ft}, \tag{A.1}$$
where $-\frac{1}{2} < f < \frac{1}{2}$ are frequencies. The function $X(\cdot)$ measures the association between $\{x_t\}$ and the sequences $\{e^{-i2\pi ft}\}$. This is simply one choice of sequences with which to analyze a time series of observations; the discrete wavelet transform (DWT) uses a formula similar to Equation (A.1), but with sequences that differ fundamentally from the complex exponentials. The inverse DFT (synthesis) of $\{x_t\}$ is given by
$$x_t = \int_{-1/2}^{1/2} X(f) e^{i2\pi ft}\, df, \qquad t = 0, \pm 1, \pm 2, \ldots.$$
This can be shown by substituting Equation (A.1) for $X(\cdot)$ to get
$$\int_{-1/2}^{1/2} X(f) e^{i2\pi ft}\, df = \int_{-1/2}^{1/2} \left( \sum_{t'=-\infty}^{\infty} x_{t'} e^{-i2\pi ft'} \right) e^{i2\pi ft}\, df = \sum_{t'=-\infty}^{\infty} x_{t'} \int_{-1/2}^{1/2} e^{i2\pi f(t-t')}\, df.$$
The integral is non-zero only when $t = t'$, in which case it equals one; thus the inverse DFT is established. The relationship between $\{x_t\}$ and $X(\cdot)$ is summarized by calling them a Fourier transform pair using the notation
$$\{x_t\} \longleftrightarrow X(\cdot).$$
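As a numerical illustration of the analysis and synthesis equations, the sketch below evaluates Equation (A.1) for a short sequence that is taken to be zero outside a finite range, and then recovers the sequence by approximating the synthesis integral with a midpoint rule. The function names `dft` and `inverse_dft`, the grid size and the example values are illustrative assumptions, not part of the dissertation.

```python
import cmath

def dft(x, f, t0=0):
    """Evaluate X(f) = sum_t x_t exp(-i 2 pi f t) for a finite sequence.

    x[k] is taken to be the value at time t0 + k; all other x_t are zero.
    """
    return sum(xk * cmath.exp(-2j * cmath.pi * f * (t0 + k))
               for k, xk in enumerate(x))

def inverse_dft(X, t, n_grid=512):
    """Approximate x_t = integral of X(f) exp(i 2 pi f t) over [-1/2, 1/2]
    with a midpoint rule on n_grid equally spaced frequencies."""
    df = 1.0 / n_grid
    total = 0j
    for m in range(n_grid):
        f = -0.5 + (m + 0.5) * df
        total += X(f) * cmath.exp(2j * cmath.pi * f * t)
    return total * df

x = [1.0, -2.0, 0.5, 3.0]            # x_0, ..., x_3 (hypothetical values)
X = lambda f: dft(x, f)              # analysis
recovered = [inverse_dft(X, t).real for t in range(-1, 5)]  # synthesis
```

For a finite sequence the integrand is a trigonometric polynomial, so the midpoint rule reproduces the integral up to rounding error once the grid is finer than the length of the sequence; `recovered` should therefore match the original values, with zeros at the times outside the support.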
A.2 Properties of the DFT
We state and prove several properties of the discrete Fourier transform that will be useful in later sections, not only for the spectral analysis of time series but also for the discrete wavelet transform.
Linearity
The DFT of a linear combination of sequences is simply the linear combination of their respective DFTs,
$$\alpha x_t + \beta y_t \longleftrightarrow \alpha X(f) + \beta Y(f).$$
This is easily verified by replacing $x_t$ with $\alpha x_t$ in Equation (A.1) and noting that the constant $\alpha$ does not depend on $t$ and can therefore be taken outside the summation. Hence, the DFT of $\{\alpha x_t\}$ is $\alpha X(\cdot)$. The same argument is applied to $\beta y_t$ to obtain its DFT, and linearity follows because the summation in Equation (A.1) splits over a sum of sequences.
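Translation
A shift in time in the original sequence causes its DFT to be multiplied by a phase factor,
$$x_{t-\tau} \longleftrightarrow e^{-i2\pi f\tau} X(f).$$
The proof follows from a direct application of the DFT to $\{x_{t-\tau}\}$ with the change of variable $t' = t - \tau$,
$$\sum_{t=-\infty}^{\infty} x_{t-\tau} e^{-i2\pi ft} = \sum_{t'=-\infty}^{\infty} x_{t'} e^{-i2\pi f(t'+\tau)} = e^{-i2\pi f\tau} \sum_{t'=-\infty}^{\infty} x_{t'} e^{-i2\pi ft'} = e^{-i2\pi f\tau} X(f).$$
The converse, that is, a shift in frequency in the DFT of a sequence is equivalent to multiplying that sequence by a phase factor, follows from a similar argument applied to the inverse DFT,
$$\int_{-1/2}^{1/2} X(f - \phi) e^{i2\pi ft}\, df = \int_{-1/2}^{1/2} X(f') e^{i2\pi (f'+\phi)t}\, df' = e^{i2\pi \phi t} \int_{-1/2}^{1/2} X(f') e^{i2\pi f't}\, df' = e^{i2\pi \phi t} x_t,$$
where the limits of integration are unchanged by the substitution $f' = f - \phi$ because $X(\cdot)$ is periodic with unit period.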
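The two shift properties can be checked numerically for a finite sequence. In the sketch below the helper `dft`, the shift `tau`, the frequency offset `phi` and the test frequency are illustrative assumptions; the code simply compares both sides of each Fourier transform pair at one frequency.

```python
import cmath

def dft(x, f, t0=0):
    """X(f) for a finite sequence whose first element sits at time t0."""
    return sum(xk * cmath.exp(-2j * cmath.pi * f * (t0 + k))
               for k, xk in enumerate(x))

x = [1.0, -2.0, 0.5, 3.0]   # x_0, ..., x_3 (hypothetical values)
f = 0.17                    # an arbitrary test frequency

# Time shift: {x_{t - tau}} has the same values, starting at time tau.
tau = 3
lhs = dft(x, f, t0=tau)
rhs = cmath.exp(-2j * cmath.pi * f * tau) * dft(x, f)
time_shift_error = abs(lhs - rhs)

# Frequency shift: exp(i 2 pi phi t) x_t has DFT X(f - phi).
phi = 0.05
modulated = [xk * cmath.exp(2j * cmath.pi * phi * k) for k, xk in enumerate(x)]
freq_shift_error = abs(dft(modulated, f) - dft(x, f - phi))
```

Both error terms should be at the level of floating-point rounding.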
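Convolution
The convolution of two sequences $\{x_t\}$ and $\{y_t\}$ is defined to be
$$x * y_t \equiv \sum_{u=-\infty}^{\infty} x_u y_{t-u}, \qquad t = 0, \pm 1, \pm 2, \ldots.$$
Convolving two sequences in time results in the multiplication of their respective DFTs,
$$x * y_t \longleftrightarrow X(f) Y(f). \tag{A.2}$$
The proof is seen by applying the definitions to the left-hand side of Equation (A.2) and evaluating the resulting expression
$$\begin{aligned}
\sum_{t=-\infty}^{\infty} x * y_t\, e^{-i2\pi ft}
&= \sum_{t=-\infty}^{\infty} \sum_{u=-\infty}^{\infty} x_u y_{t-u} e^{-i2\pi ft} \\
&= \sum_{u=-\infty}^{\infty} x_u \sum_{t=-\infty}^{\infty} y_{t-u} e^{-i2\pi ft} \\
&= \sum_{u=-\infty}^{\infty} x_u \sum_{\tau=-\infty}^{\infty} y_{\tau} e^{-i2\pi f(\tau+u)} \\
&= \left( \sum_{\tau=-\infty}^{\infty} y_{\tau} e^{-i2\pi f\tau} \right) \sum_{u=-\infty}^{\infty} x_u e^{-i2\pi fu} \\
&= X(f) Y(f).
\end{aligned}$$
A convolution of the DFTs results in the multiplication of the original sequences,
$$x_t y_t \longleftrightarrow X * Y(f).$$
A proof of this statement is similar to that of Equation (A.2).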
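A small numerical check of Equation (A.2) is sketched below for two finite sequences. The helpers `dft` and `convolve`, the example values and the test frequencies are assumptions made for illustration only.

```python
import cmath

def dft(x, f):
    """X(f) for a finite sequence x_0, ..., x_{n-1} (zero elsewhere)."""
    return sum(xt * cmath.exp(-2j * cmath.pi * f * t) for t, xt in enumerate(x))

def convolve(x, y):
    """Full convolution of two finite sequences: out[t] = sum_u x[u] * y[t - u]."""
    out = [0.0] * (len(x) + len(y) - 1)
    for u, xu in enumerate(x):
        for v, yv in enumerate(y):
            out[u + v] += xu * yv
    return out

x = [1.0, -2.0, 0.5, 3.0]      # hypothetical sequences
y = [0.25, 0.5, 0.25]
xy = convolve(x, y)

# Compare the DFT of the convolution with the product of the DFTs.
freqs = [-0.4, -0.1, 0.0, 0.23, 0.5]
max_error = max(abs(dft(xy, f) - dft(x, f) * dft(y, f)) for f in freqs)
```

The value of `max_error` should be at rounding level for any choice of frequencies.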
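Parseval's Relation
The DFT is an energy preserving transform in that
$$\sum_{t=-\infty}^{\infty} x_t y_t^* = \int_{-1/2}^{1/2} X(f) Y^*(f)\, df, \tag{A.3}$$
where $\{x_t\}$ and $\{y_t\}$ are square-summable sequences with $\{x_t\} \longleftrightarrow X(\cdot)$ and $\{y_t\} \longleftrightarrow Y(\cdot)$, and the asterisk denotes complex conjugation. If we let $x_t = y_t$, then we have Parseval's relation
$$\sum_{t=-\infty}^{\infty} |x_t|^2 = \int_{-1/2}^{1/2} |X(f)|^2\, df.$$
The proof of Equation (A.3) is done by substituting the definitions of the DFTs $X(\cdot)$ and $Y(\cdot)$ into the right-hand side of Equation (A.3) and evaluating the resulting integral
$$\begin{aligned}
\int_{-1/2}^{1/2} X(f) Y^*(f)\, df
&= \int_{-1/2}^{1/2} \left( \sum_{t=-\infty}^{\infty} x_t e^{-i2\pi ft} \right) \left( \sum_{t'=-\infty}^{\infty} y_{t'}^* e^{i2\pi ft'} \right) df \\
&= \sum_{t=-\infty}^{\infty} \sum_{t'=-\infty}^{\infty} x_t y_{t'}^* \int_{-1/2}^{1/2} e^{i2\pi f(t'-t)}\, df \\
&= \sum_{t=-\infty}^{\infty} x_t y_t^*,
\end{aligned}$$
since the integral of the complex exponential is one if $t = t'$ and zero otherwise.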
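Equation (A.3) and Parseval's relation can be verified numerically for short complex-valued sequences. The sketch below is illustrative only: the helpers `dft` and `integrate`, the grid size and the sequence values are assumptions, and the frequency-domain integrals are approximated with a midpoint rule, which reproduces the integral of a trigonometric polynomial up to rounding error when the grid is finer than the sequence length.

```python
import cmath

def dft(x, f):
    """X(f) for a finite sequence x_0, ..., x_{n-1} (zero elsewhere)."""
    return sum(xt * cmath.exp(-2j * cmath.pi * f * t) for t, xt in enumerate(x))

def integrate(g, n_grid=64):
    """Midpoint rule for the integral of g over [-1/2, 1/2]."""
    df = 1.0 / n_grid
    return sum(g(-0.5 + (m + 0.5) * df) for m in range(n_grid)) * df

x = [1 + 2j, -0.5j, 3 + 0j, 0.25 - 1j]     # hypothetical sequences
y = [0.5 + 0j, 2 - 1j, -1.5j, 1 + 0j]

# Equation (A.3): inner product in time versus inner product in frequency.
time_side = sum(xt * yt.conjugate() for xt, yt in zip(x, y))
freq_side = integrate(lambda f: dft(x, f) * dft(y, f).conjugate())

# Parseval's relation: energy in time versus energy in frequency.
energy_time = sum(abs(xt) ** 2 for xt in x)
energy_freq = integrate(lambda f: abs(dft(x, f)) ** 2)
```

The pairs `time_side`, `freq_side` and `energy_time`, `energy_freq` should agree up to rounding error.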
A.3 Filtering of Sequences
Here we present some notation and concepts taken from Percival and Walden (1999, Ch. 2). The concept of convolution, presented in the previous section, is identical to that of linear time-invariant filtering. If an input sequence $\{y_t\}$ is filtered by $\{x_t\}$, then the output from the filter is $\{x * y_t\}$. The discrete wavelet transform is implemented efficiently by filtering rather than by a series of matrix operations. Notions from filtering theory appear whenever wavelet methodology is discussed and, therefore, we present some key topics here.
A filter $\{h_t\}$ is also known as an impulse response sequence and its DFT $H(\cdot)$ is called the transfer function. The transfer function characterizes the filter in terms of the frequencies it captures; examples include high-pass, low-pass and band-pass filters. As the names imply, a high-pass filter retains high frequencies and suppresses low frequencies, a low-pass filter does the opposite, and a band-pass filter preserves a specific range (or band) of frequencies while suppressing all other frequencies. Like any complex-valued function, the transfer function for a given filter can be expressed in polar notation as
$$H(f) = |H(f)| e^{i\theta(f)},$$
where $|H(f)|$ is the gain function and $\theta(f)$ is the phase function for the filter. We will see squared gain functions for several wavelet filters in Section 3.1.
It is often the case that not one but several filters are used to analyze a sequence. A cascade of filters is a series of $J$ filters such that the output from the first filter is the input to the second filter, and so on. If $\{h_{j,t} \mid t = 0, \pm 1, \pm 2, \ldots\}$, $j = 1, \ldots, J$, are a series of filters with transfer functions $H_j(\cdot)$, then the output from the cascade of filters applied to an input sequence $\{x_t\}$ can be expressed as
$$y_t = \sum_{u=-\infty}^{\infty} h_u x_{t-u}, \qquad t = 0, \pm 1, \pm 2, \ldots,$$
where $\{h_t\}$ is the equivalent filter for the cascade, whose transfer function is given by
$$H(f) \equiv \prod_{j=1}^{J} H_j(f). \tag{A.4}$$
The output from the first filter $\{h_{1,t}\}$ has DFT $H_1(f) X(f)$. After applying the $J$ filters, the DFT of $\{y_t\}$ is $Y(f) \equiv H_1(f) H_2(f) \cdots H_J(f) X(f)$. Using Equation (A.4) and the convolution property of the Fourier transform, $Y(f) = H(f) X(f)$ and, therefore, $\{y_t\}$ is simply the convolution of $\{x_t\}$ with $\{h_t\}$.
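The equivalence between a cascade and its single equivalent filter can be illustrated with short finite filters. The three filters and the input below are hypothetical (they are not wavelet filters from the dissertation), and `dft` and `convolve` are illustrative helpers.

```python
import cmath

def dft(h, f):
    """Transfer function H(f) of a finite filter h_0, ..., h_{L-1}."""
    return sum(ht * cmath.exp(-2j * cmath.pi * f * t) for t, ht in enumerate(h))

def convolve(x, y):
    """Full convolution of two finite sequences."""
    out = [0.0] * (len(x) + len(y) - 1)
    for u, xu in enumerate(x):
        for v, yv in enumerate(y):
            out[u + v] += xu * yv
    return out

filters = [[0.5, 0.5], [0.5, -0.5], [0.25, 0.5, 0.25]]   # hypothetical cascade, J = 3
x = [2.0, -1.0, 0.0, 3.0, 1.0]                           # hypothetical input

# Pass the input through each filter in turn.
y_cascade = x
for h in filters:
    y_cascade = convolve(h, y_cascade)

# Build the equivalent filter by convolving the individual filters.
h_equiv = [1.0]
for h in filters:
    h_equiv = convolve(h_equiv, h)
y_equiv = convolve(h_equiv, x)

# Transfer function of the equivalent filter versus the product in (A.4).
f = 0.2
H_product = 1.0 + 0j
for h in filters:
    H_product *= dft(h, f)
transfer_error = abs(dft(h_equiv, f) - H_product)
output_error = max(abs(a - b) for a, b in zip(y_cascade, y_equiv))
```

Both `transfer_error` and `output_error` should be at rounding level, reflecting that a single convolution with the equivalent filter produces the same output as the cascade.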
Appendix B
UNIVARIATE SPECTRAL ANALYSIS
B.1 Introduction
As with the Fourier transform, we also require concepts from the spectral analysis of time series in order to better describe and understand wavelet methodology. The topics described here can be found, using similar notation, in Percival and Walden (1993) and, in much greater detail, in Priestley (1981). Let us begin with the spectral representation theorem for a discrete parameter stationary process. There exists an orthogonal process $\{Z(f)\}$ defined on the interval $[-1/2, 1/2]$ such that
$$X_t \overset{\mathrm{ms}}{=} \int_{-1/2}^{1/2} e^{i2\pi ft}\, dZ(f)$$
for all integers $t$, where the equality is in the mean square sense. That is, the squared norm of the difference between the left-hand side and the right-hand side is zero. We define $E\{|dZ(f)|^2\} \equiv dS^{(I)}(f)$ for all $|f| \le 1/2$, and call $S^{(I)}(\cdot)$ the integrated spectrum of $\{X_t\}$. For our purposes here, we will assume the integrated spectrum is differentiable everywhere with derivative $S(\cdot)$, so that
$$E\{|dZ(f)|^2\} = dS^{(I)}(f) = S(f)\, df.$$
The autocovariance sequence (acvs) of a stationary process $\{X_t\}$, with zero mean, can be written as
$$s_\tau \equiv E\{X_t X_{t+\tau}\} = \int_{-1/2}^{1/2} S(f) e^{i2\pi f\tau}\, df.$$
Conversely, if $\{s_\tau\}$ is square-summable, the spectrum of $\{X_t\}$ can be defined in terms of the acvs via
$$S(f) = \sum_{\tau=-\infty}^{\infty} s_\tau e^{-i2\pi f\tau},$$
and therefore the two quantities form a Fourier transform pair $\{s_\tau\} \longleftrightarrow S(\cdot)$. The spectrum $S(\cdot)$ possesses all the properties outlined in Section A.2 and will be useful in proving various results for functions of wavelet coefficients.
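To make the Fourier transform pair between the acvs and the spectrum concrete, the sketch below uses a first-order moving average process $X_t = \epsilon_t + \theta \epsilon_{t-1}$ with unit innovation variance. This process is an assumed example chosen because its acvs has only three non-zero terms ($s_0 = 1 + \theta^2$, $s_{\pm 1} = \theta$); it is not one of the processes studied in the dissertation, and the function names are illustrative.

```python
import cmath
import math

theta = 0.6                                   # hypothetical MA(1) coefficient
acvs = {-1: theta, 0: 1 + theta ** 2, 1: theta}

def spectrum(f):
    """S(f) = sum_tau s_tau exp(-i 2 pi f tau); real because the acvs is symmetric."""
    return sum(s * cmath.exp(-2j * cmath.pi * f * tau) for tau, s in acvs.items()).real

def acvs_from_spectrum(tau, n_grid=64):
    """s_tau = integral of S(f) exp(i 2 pi f tau) over [-1/2, 1/2], midpoint rule."""
    df = 1.0 / n_grid
    total = 0j
    for m in range(n_grid):
        f = -0.5 + (m + 0.5) * df
        total += spectrum(f) * cmath.exp(2j * cmath.pi * f * tau)
    return (total * df).real

# Closed form for comparison: S(f) = 1 + theta^2 + 2 theta cos(2 pi f).
f0 = 0.1
closed_form = 1 + theta ** 2 + 2 * theta * math.cos(2 * math.pi * f0)
recovered = [acvs_from_spectrum(tau) for tau in range(0, 3)]
```

Here `spectrum(f0)` should agree with `closed_form`, and `recovered` should return the acvs at lags 0, 1 and 2 (the last being zero).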
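B.2 Spectral Estimation
Suppose the time series $X_t$, $t = 1, \ldots, N$, is a realization of a portion of a zero mean stationary process with spectral density function (sdf) $S(\cdot)$ and autocovariance sequence $\{s_\tau\}$. Let $\{\hat{s}_\tau^{(p)}\} \longleftrightarrow \hat{S}^{(p)}(\cdot)$, where $\hat{s}_\tau^{(p)}$ is the usual biased estimator of the autocovariance sequence; i.e.,
$$\hat{s}_\tau^{(p)} \equiv \frac{1}{N} \sum_{t=1}^{N-|\tau|} X_t X_{t+|\tau|} \quad \text{for } 0 \le |\tau| \le N - 1,$$
and $\hat{s}_\tau^{(p)} = 0$ for $|\tau| \ge N$. The method of moments spectral estimator is the periodogram
$$\hat{S}^{(p)}(f) = \sum_{\tau=-(N-1)}^{N-1} \hat{s}_\tau^{(p)} e^{-i2\pi f\tau} = \frac{1}{N} \left| \sum_{t=1}^{N} X_t e^{-i2\pi ft} \right|^2.$$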
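The two expressions for the periodogram can be compared numerically. The sketch below computes it once from the biased acvs estimator and once from the squared modulus of the DFT of the data; the function names and the short data vector are illustrative assumptions, and the series is treated as having zero mean (no centering is applied).

```python
import cmath
import math

def biased_acvs(x):
    """Biased acvs estimator at lags 0, ..., N-1 for a zero-mean series."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) / n for k in range(n)]

def periodogram_from_acvs(x, f):
    """Sum of s_hat_tau exp(-i 2 pi f tau) over |tau| <= N-1, using symmetry in tau."""
    s = biased_acvs(x)
    return s[0] + 2 * sum(s[k] * math.cos(2 * math.pi * f * k)
                          for k in range(1, len(x)))

def periodogram_direct(x, f):
    """(1/N) |sum_{t=1}^{N} X_t exp(-i 2 pi f t)|^2."""
    n = len(x)
    dft_value = sum(xt * cmath.exp(-2j * cmath.pi * f * (t + 1))
                    for t, xt in enumerate(x))
    return abs(dft_value) ** 2 / n

x = [0.3, -1.2, 0.8, 2.1, -0.4, -0.9, 1.5, 0.2]   # hypothetical data
freqs = [0.0, 0.1, 0.25, 0.4, 0.5]
max_difference = max(abs(periodogram_from_acvs(x, f) - periodogram_direct(x, f))
                     for f in freqs)
```

The value of `max_difference` should be at rounding level, since the two forms are algebraically identical.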
The disadvantages of the periodogram are well documented; see, for example, Percival and Walden (1993, p. 197). We will not concern ourselves with such matters, except to point out the existence of one of several alternative spectral estimators: the multitaper spectral estimator (Thomson 1982; Percival and Walden 1993, Ch. 7). We introduce a set of $K$ orthonormal data tapers $\{h_{t,k} \mid t = 1, \ldots, N\}$, where $k$ ranges from 0 to $K - 1$; i.e., $\sum_{t=1}^{N} h_{t,j} h_{t,k} = 1$ if $j = k$ and 0 if $j \ne k$. Examples of common data tapers are the sine tapers (Riedel and Sidorenko 1995) and the discrete prolate spheroidal sequence (dpss) data tapers (Slepian 1978; Thomson 1982; Percival and Walden 1993, Ch. 8). Sine tapers were designed to minimize the spectral window bias and can be approximated well using the following closed form expression
$$h_{t,k}^{(\mathrm{sine})} = \left( \frac{2}{N+1} \right)^{1/2} \sin\left( \frac{(k+1)\pi t}{N+1} \right).$$
In contrast, the dpss data tapers minimize the spectral window sidelobes, using a resolution bandwidth parameter $W$, and must be calculated using techniques such as inverse iteration, numerical integration or a tridiagonal formulation (Percival and Walden 1993, Ch. 8). The role of any data taper is to protect against leakage: the sine tapers provide moderate leakage protection, whereas the dpss data tapers offer adjustable leakage protection through the parameter $W$. In practice there is little difference in the multitaper spectral estimators when using either data taper. The typical multitaper spectral estimator is given by
$$\hat{S}^{(mt)}(f) \equiv \frac{1}{K} \sum_{k=0}^{K-1} \hat{S}_k^{(mt)}(f) \quad \text{with} \quad \hat{S}_k^{(mt)}(f) \equiv \left| \sum_{t=1}^{N} h_{t,k} X_t e^{-i2\pi ft} \right|^2.$$
Thus, the multitaper spectral estimator is the average of several direct spectral estimators (more specifically, eigenspectra) using a set of orthonormal data tapers. Multitaper spectral estimators overcome several of the inadequacies of the periodogram and possess reasonable bias, variance and resolution properties.
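A minimal sketch of the multitaper estimator with sine tapers follows. It uses the closed form expression for the sine tapers, checks their orthonormality through the Gram matrix, and averages the eigenspectra on a coarse frequency grid. The function names, the choices $N = 64$ and $K = 5$, and the simulated white noise input are assumptions made for illustration.

```python
import cmath
import math
import random

def sine_tapers(n, k_tapers):
    """Closed-form sine tapers h_{t,k}, t = 1..n, k = 0..k_tapers-1."""
    return [[math.sqrt(2.0 / (n + 1)) * math.sin((k + 1) * math.pi * t / (n + 1))
             for t in range(1, n + 1)]
            for k in range(k_tapers)]

def multitaper(x, f, tapers):
    """Average of the eigenspectra |sum_t h_{t,k} X_t exp(-i 2 pi f t)|^2."""
    eigenspectra = []
    for h in tapers:
        dft_value = sum(ht * xt * cmath.exp(-2j * cmath.pi * f * (t + 1))
                        for t, (ht, xt) in enumerate(zip(h, x)))
        eigenspectra.append(abs(dft_value) ** 2)
    return sum(eigenspectra) / len(tapers)

random.seed(0)
N, K = 64, 5                                   # hypothetical sample size and taper count
x = [random.gauss(0.0, 1.0) for _ in range(N)]
tapers = sine_tapers(N, K)

# Gram matrix of the tapers; orthonormality means it is the identity.
gram = [[sum(a * b for a, b in zip(tapers[j], tapers[k])) for k in range(K)]
        for j in range(K)]

# Multitaper estimate at f = 0, 1/32, ..., 1/2.
estimate = [multitaper(x, m / 32, tapers) for m in range(17)]
```

Because each eigenspectrum is a squared modulus, the estimate is non-negative at every frequency; increasing `K` averages more eigenspectra and so trades resolution for reduced variance.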
B.3 Equivalent Degrees of Freedom for a Spectral Estimator
We want to approximate the asymptotic distribution of our spectral estimate $\hat{S}(f)$ using a distribution of the form $a\chi^2_\nu$ because the true distribution is difficult to determine (see, e.g., Priestley (1981, pp. 466–468)), where the constants $a$ and $\nu$ are found by moment matching (Tukey 1949). Using the properties of the $\chi^2_\nu$ distribution, we know
$$E\{\hat{S}(f)\} = E\{a\chi^2_\nu\} = a\nu \quad \text{and} \quad \mathrm{Var}\{\hat{S}(f)\} = \mathrm{Var}\{a\chi^2_\nu\} = 2a^2\nu.$$
Solving for $a$ and $\nu$, we obtain
$$\nu = \frac{2 \left( E\{\hat{S}(f)\} \right)^2}{\mathrm{Var}\{\hat{S}(f)\}} \quad \text{and} \quad a = \frac{E\{\hat{S}(f)\}}{\nu}. \tag{B.4}$$
If we express the spectral estimate as a lag window spectral estimator (Percival and Walden 1993, Sec. 6.7), then we can rewrite Equation (B.4) as
$$\nu = \frac{2N\,\Delta t}{C_h \int_{-f_{(N)}}^{f_{(N)}} W_m^2(\phi)\, d\phi} \quad \text{and} \quad a = \frac{S(f)}{\nu},$$
where $C_h$ is a constant which depends on the data taper used (see Table 248 in Percival and Walden (1993) for values of $C_h$) and $W_m(\cdot)$ is the smoothing window. Hence, the quantity $\nu$ is the equivalent degrees of freedom of the spectral estimator $\hat{S}(f)$.
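The moment matching step in Equation (B.4) can be written as a two-line function. The sketch below applies it first to exact moments and then to sample moments of simulated $a\chi^2_\nu$ draws; the function name `equivalent_chi_square`, the values $\nu = 6$ and $a = 2$, and the use of a gamma generator (an $a\chi^2_\nu$ variable is gamma distributed with shape $\nu/2$ and scale $2a$) are illustrative assumptions.

```python
import random
import statistics

def equivalent_chi_square(mean, variance):
    """Match the first two moments to a * chi-square_nu, as in Equation (B.4)."""
    nu = 2.0 * mean ** 2 / variance
    a = mean / nu
    return nu, a

# Exact moments of 2 * chi-square_6: mean = a * nu = 12, variance = 2 * a^2 * nu = 48.
nu_exact, a_exact = equivalent_chi_square(12.0, 48.0)

# Sample moments from simulated draws of a * chi-square_nu with nu = 6, a = 2.
random.seed(1)
draws = [random.gammavariate(3.0, 4.0) for _ in range(20000)]
nu_hat, a_hat = equivalent_chi_square(statistics.fmean(draws),
                                      statistics.pvariance(draws))
```

With exact moments the function returns the original pair of constants; with sample moments `nu_hat` and `a_hat` are only approximate and vary with the seed and the number of draws.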
Appendix C
BIVARIATE SPECTRAL ANALYSIS
The material presented in the following sections closely follows the introduction to bivariate spectral analysis in Percival (1994), and is a natural extension of the univariate topics found in Percival and Walden (1993) using similar notation. A more thorough introduction to multivariate spectral analysis can be found in, for example, Koopmans (1974), Priestley (1981) and Brillinger (1981).
C.1 Introduction
Let $\{X_t\}$ and $\{Y_t\}$ be zero-mean weakly stationary processes with spectral density functions (autospectra) $S_X(\cdot)$ and $S_Y(\cdot)$, respectively. The cross spectral density function (csdf) of $\{X_t, Y_t\}$ is defined to be
$$S_{XY}(f) \equiv \sum_{\tau=-\infty}^{\infty} C_{\tau,XY} e^{-i2\pi f\tau}, \qquad -\tfrac{1}{2} \le f \le \tfrac{1}{2},$$
where $C_{\tau,XY}$ is the cross covariance sequence (ccvs) given by
$$C_{\tau,XY} \equiv \mathrm{Cov}\{X_t, Y_{t+\tau}\} = E\{X_t Y_{t+\tau}\}.$$
The complete spectral properties of a bivariate time series at frequency $f$ can be summarized by the spectral matrix
$$\mathbf{S}(f) \equiv \begin{bmatrix} S_X(f) & S_{XY}(f) \\ S_{YX}(f) & S_Y(f) \end{bmatrix}. \tag{C.1}$$
Although this is not a symmetric matrix, there are numerous ways of expressing the off-diagonal terms (Brillinger 1981, p. 23); i.e.,
$$S_{XY}(f) = S_{XY}^*(-f) = S_{YX}(-f) = S_{YX}^*(f).$$
Thus, the spectral matrix can be expressed in terms of three distinct quantities instead of four,
$$\mathbf{S}(f) = \begin{bmatrix} S_X(f) & S_{XY}(f) \\ S_{XY}(-f) & S_Y(f) \end{bmatrix}.$$
Whereas the spectrum of a real valued process is real valued, since the autocovariance sequence is symmetric about 0, the csdf (or cross spectrum) is usually complex valued. This allows us to express $S_{XY}(\cdot)$ in Cartesian form as
$$S_{XY}(f) = R_{XY}(f) - iQ_{XY}(f),$$
where $R_{XY}(\cdot)$ is the co-spectrum and $Q_{XY}(\cdot)$ is the quadrature spectrum. It may also be expressed in polar notation as
$$S_{XY}(f) = A_{XY}(f) e^{i\phi_{XY}(f)},$$
where $A_{XY}(f) \equiv |S_{XY}(f)|$ is the amplitude spectrum and $\phi_{XY}(\cdot)$ is the phase spectrum. These new functions are at least real valued and may be more easily handled than the cross spectrum. The complex coherency
$$w_{XY}(f) \equiv \frac{S_{XY}(f)}{\sqrt{S_X(f) S_Y(f)}} \tag{C.2}$$
depends upon both the cross spectrum and the autospectra for $\{X_t\}$ and $\{Y_t\}$. The complex coherency is a complex valued frequency domain "correlation coefficient." It measures the correlation in the random amplitudes assigned to the complex exponentials with frequency $f$ in the spectral representations of $\{X_t\}$ and $\{Y_t\}$. The quantity $|w_{XY}(f)|^2$ is called the magnitude squared coherence (msc) at the frequency $f$. Thus, we have
$$|w_{XY}(f)|^2 = \frac{|S_{XY}(f)|^2}{S_X(f) S_Y(f)} = \frac{A_{XY}^2(f)}{S_X(f) S_Y(f)};$$
that is, the msc is a normalized version of the square of the cross-amplitude spectrum. The msc captures the "amplitude" part of the cross spectrum, but completely ignores its phase, so the msc and phase spectrum can be used together to summarize the "information" in the complex valued cross spectrum.
C.2 Spectral Estimation
Let $X_t, Y_t$, $t = 1, \ldots, N$, be a realization of a portion of a zero mean stationary process $\{X_t, Y_t\}$ with cross spectrum $S_{XY}(\cdot)$ and autospectra $S_X(\cdot)$ and $S_Y(\cdot)$, respectively. Just as the periodogram was used in the univariate case (Section B.2), the cross periodogram
$$\hat{S}_{XY}^{(p)}(f) = \sum_{\tau=-(N-1)}^{N-1} \hat{C}_{\tau,XY} e^{-i2\pi f\tau}$$
is utilized here to estimate the cross spectrum. The sample cross covariance sequence is defined to be
$$\hat{C}_{\tau,XY} \equiv \frac{1}{N} \sum_{t} X_t Y_{t+\tau},$$
where the summation goes from $t = 1$ to $N - \tau$ for $\tau \ge 0$ and from $t = 1 - \tau$ to $N$ for $\tau < 0$. The cross periodogram can also be written in a more computationally friendly form as
$$\hat{S}_{XY}^{(p)}(f) = \frac{1}{N} \left( \sum_{t=1}^{N} X_t e^{-i2\pi ft} \right)^{*} \left( \sum_{t=1}^{N} Y_t e^{-i2\pi ft} \right),$$
where the asterisk denotes complex conjugation. The multitaper estimator of the cross spectrum is given by
$$\hat{S}_{XY}^{(mt)}(f) = \frac{1}{K} \sum_{k=1}^{K} \left( \sum_{t=1}^{N} h_{k,t} X_t e^{-i2\pi ft} \right)^{*} \left( \sum_{t=1}^{N} h_{k,t} Y_t e^{-i2\pi ft} \right),$$
where $\{h_{k,t}\}$ is the $k$th-order data taper for a sequence of length $N$ normalized such that $\sum_t h_{k,t}^2 = 1$, $k = 1, \ldots, K$ (cf. Section B.2). Thus, the multitaper estimators for the phase spectrum and magnitude squared coherence are given by
$$\hat{\phi}_{XY}^{(mt)}(f) = \arg\left\{ \hat{S}_{XY}^{(mt)}(f) \right\} \quad \text{and} \quad \left| \hat{w}_{XY}^{(mt)}(f) \right|^2 = \frac{\left| \hat{S}_{XY}^{(mt)}(f) \right|^2}{\hat{S}_X^{(mt)}(f)\, \hat{S}_Y^{(mt)}(f)},$$
respectively. The phase spectrum $\hat{\phi}_{XY}^{(mt)}(\cdot)$ takes on values between $-\pi$ and $\pi$ and, hence, is defined modulo $2\pi$. This can lead to discontinuities around $\pm\pi$. Priestley (1981, p. 709) describes a method to avoid these discontinuities by simultaneously plotting the original estimate and translated versions of it.
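The multitaper estimators of the cross spectrum, magnitude squared coherence and phase spectrum can be sketched as follows. Sine tapers are used here as an assumed choice of orthonormal tapers, and the function names, the simulated series (a delayed, noisy copy of white noise) and the values of $N$ and $K$ are illustrative only.

```python
import cmath
import math
import random

def sine_tapers(n, k_tapers):
    """Closed-form sine tapers, one list of length n per taper order."""
    return [[math.sqrt(2.0 / (n + 1)) * math.sin((k + 1) * math.pi * t / (n + 1))
             for t in range(1, n + 1)]
            for k in range(k_tapers)]

def tapered_dft(h, x, f):
    """sum_{t=1}^{N} h_t x_t exp(-i 2 pi f t)."""
    return sum(ht * xt * cmath.exp(-2j * cmath.pi * f * (t + 1))
               for t, (ht, xt) in enumerate(zip(h, x)))

def cross_multitaper(x, y, f, tapers):
    """Return multitaper estimates (S_XY, S_X, S_Y) at frequency f."""
    sxy, sx, sy = 0j, 0.0, 0.0
    for h in tapers:
        jx, jy = tapered_dft(h, x, f), tapered_dft(h, y, f)
        sxy += jx.conjugate() * jy      # conjugate on the X term, as in the text
        sx += abs(jx) ** 2
        sy += abs(jy) ** 2
    k = len(tapers)
    return sxy / k, sx / k, sy / k

def msc_and_phase(x, y, f, tapers):
    """Magnitude squared coherence and phase (in (-pi, pi]) at frequency f."""
    sxy, sx, sy = cross_multitaper(x, y, f, tapers)
    return abs(sxy) ** 2 / (sx * sy), cmath.phase(sxy)

random.seed(2)
N, K, delay = 128, 5, 2                        # hypothetical settings
e = [random.gauss(0.0, 1.0) for _ in range(N + delay)]
x = e[delay:]                                  # X_t
y = [e[t] + 0.5 * random.gauss(0.0, 1.0) for t in range(N)]   # Y_t = X_{t-delay} + noise
tapers = sine_tapers(N, K)

results = [msc_and_phase(x, y, m / 16, tapers) for m in range(1, 8)]
same_series = msc_and_phase(x, x, 0.2, tapers)
```

By the Cauchy–Schwarz inequality the estimated msc lies between zero and one whenever more than one taper is averaged, and a series compared with itself yields an msc of one and zero phase. With a single taper the estimated msc is identically one, which is why averaging over several tapers is essential here.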
VITA
Brandon Whitcher was born in Carmel, California on August 1, 1971, and raised in Yakima, Washington. He graduated from Carnegie Mellon University in 1993 with a Bachelor of Science in Applied Mathematics (Statistics). He commenced studies at the University of Washington in the fall of 1993, where he received an M.S. in Statistics in 1995 and a Ph.D. in 1998. He is now working as a postdoc for EURANDOM, a European research institute for the study of stochastic phenomena, in Eindhoven, the Netherlands.