Journal of Oceanography, Vol. 59, pp. 187 to 200, 2003
Optimal Basis from Empirical Orthogonal Functions and Wavelet Analysis for Data Assimilation: Optimal Basis WADAi

TAKUJI WASEDA1,2*, LELAND JAMESON3, HUMIO MITSUDERA1,2 and MAX YAREMCHUK2

1 Frontier Research System for Global Change, Yokohama 236-0001, Japan
2 International Pacific Research Center†, University of Hawaii, Honolulu, HI 96822, U.S.A.
3 Complex Hydrodynamics, Lawrence Livermore National Laboratory, University of California, CA 94551, U.S.A.
(Received 30 July 2001; in revised form 10 September 2002; accepted 10 September 2002)
Wavelet analysis provides an orthogonal basis set which is localized in both physical space and Fourier transform space. Empirical Orthogonal Functions (EOFs), on the other hand, provide a global representation of data sets. Here we investigate the various ways in which one can combine these basis sets for optimal representation of data. EOFs represent the global, large-scale information, and wavelet analysis is used to supplement this large-scale information with local, fine-scale information. We begin to explore the application of these two basis sets to output from an Ocean General Circulation Model, with various applications in mind, including data assimilation.
Keywords: wavelets, EOF, Kalman filter, Ocean General Circulation Model.
1. Introduction
Physical processes generally are dominated by a few underlying degrees of freedom. When one takes a Fourier transform of a set of data, one expects a decaying spectrum, with the majority of the energy of the process contained in the large-scale modes. The smaller-scale modes are certainly relevant but comprise only a small portion of the total energy of the system. When one chooses a set of natural basis functions such as Empirical Orthogonal Functions (EOFs), one finds a similar phenomenon: the majority of the energy is contained in the modes that represent the largest-scale structures. In many physical processes one has an overall global flow of the system with small localized phenomena, such as eddies or vortices, which are confined to a small portion of the domain and are each moving independently. The thesis of this paper is that a global basis set such as EOFs or Fourier analysis can be combined with a local basis set such as wavelets. The meaning of "global" in this context is that the information that projects onto the basis function spans the entire computational domain in the physical space. "Local", on the other hand, refers to information that spans only a small fraction of the computational domain. By combining the two, the global basis functions will represent the global, slowly varying structures in the flow and the small-scale wavelets will represent the rapidly evolving, small-scale structures such as vortices (Luo and Jameson, 2002). Put another way, one can only have an effective representation of information if the fundamental vectors of representation are "similar" to the underlying information being represented or compressed. For example, if the information is fundamentally local in nature, then the Fourier basis vectors are not as effective as a basis set of local vectors such as wavelets. By contrast, if the fundamental nature of the information is global, periodic and smooth, the Fourier basis will be close to optimal. EOFs, of course, are not exactly Fourier in nature, but they share this global character. So our new idea is to use a global basis set to represent global information and to use a local basis set to augment it. There are two criteria by which one should judge whether or not the new method is superior: (1) for a given number of degrees of freedom, is the error in the representation less than the error in other representations? And (2) is the cost of calculating the next representation expensive or not? Now, our idea of combining
* Corresponding author. E-mail: [email protected]
† International Pacific Research Center is partly sponsored by Frontier Research System for Global Change.
Copyright © The Oceanographic Society of Japan.
EOFs with wavelets is very general, and one could in fact choose any combination of EOFs and wavelets. That is, we might find some data fields that are "optimally" represented in terms of just wavelets and other data fields that are "optimally" represented by just EOFs. Here optimality combines the two criteria defined above: the error produced for a given number of degrees of freedom, and the cost of calculating new representations. It is important to keep the cost in mind because wavelets are very inexpensive to calculate. Although, generally speaking, EOFs are not expensive in absolute terms, they are expensive compared to wavelets. This observation, coupled with the fact that global basis elements such as EOFs are not efficient for representing local features such as eddies, means that combining EOFs with wavelets provides a great deal of flexibility and computational efficiency: if one were to define an "optimality norm" that combines the number of degrees of freedom with cost, then a combination of EOFs and wavelets is better than either EOFs or wavelets alone. The second criterion mentioned above, i.e., whether the cost of calculating the next representation is expensive or not, is crucial when the proposed method is applied to data assimilation. In a typical reduced-rank Kalman filter approach, one reduces the rank of the matrix by approximating the error covariance by representative eigenvectors. If the system under consideration is linear, the optimality of this approximation can be estimated by the percentage of the variance that the reduced state represents. EOFs are optimal for that purpose, and are used to initialize the error covariance matrix. The cost of computing the EOFs is insignificant because one would compute at most O(10) modes. However, if one intends to evolve the initialized error in time, the computational burden is huge.
It becomes even worse if one tries to represent a more localized error that requires higher degrees of freedom. In essence, even with the reduced state, calculating the next representation of the eigenmodes is expensive. These methods can certainly be used in any field, but here we concentrate on applications to data assimilation. The reader should note that the acronym used in the title of this paper, "WADAi", means "topic of conversation" in Japanese. Now let us continue with setting the stage for data assimilation. The synthesis of observational data and numerical model output has long been utilized to improve the forecast skill of meteorological prediction models. The so-called reanalysis products from data assimilation projects in meteorology are now well recognized in the climate study community. Data assimilation efforts have only recently begun to be implemented in the field of oceanography. Such efforts are supported by international collaborations such as GODAE (Global Ocean Data Assimilation Experiments, Smith and Lefebvre, 1998), which aims to combine state-of-the-art numerical models with global ocean observations such as TOPEX/Poseidon satellite altimetry (Fu et al., 1994) and ARGO in situ subsurface hydrography (Wilson, 2000). As the available computational resources become abundant, Ocean General Circulation Models (OGCMs) become more sophisticated. These high-resolution global or regional OGCMs will produce enormous amounts of data, typically of the order of 10 million degrees of freedom. Thus the synthesis of the observational data and the OGCMs requires significant resources if a straightforward implementation of the data assimilation schemes is to be conducted. Data assimilation schemes based on statistical optimal estimation theory come in two flavors: the adjoint method and the Kalman filtering method. The former methods are considered to be more dynamically constrained, whereas the latter are less constrained dynamically but simpler due to their sequential nature. Both methods require some kind of simplification, since otherwise they are too expensive to perform. In the case of a full Kalman filtering scheme, the handling of n by n matrices is required, where n, the degrees of freedom of the model state vector, can easily exceed a million. The Kalman filtering scheme provides a method to take a weighted average of the model and the observation data sequentially as new data become available. The key to the success of this method is a knowledge of the prediction error, or the model statistics, at each time a correction is made. There are two issues here: first, the computation of the evolution of the prediction error, and, second, the initialization of the prediction error. The Kalman filtering algorithm provides us with a formal way to conduct the former, but a straightforward implementation is impossible with computational resources currently available.
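For orientation, the sequential weighted average described above can be sketched for a toy linear system. All operators and dimensions below are invented for illustration; propagating the n-by-n covariance P is exactly the step that becomes intractable when n exceeds a million.

```python
import numpy as np

# Toy linear system: the state size n is kept tiny here; in an OGCM n can
# exceed 1e6, so the n-by-n covariance P below is what becomes intractable.
rng = np.random.default_rng(0)
n, p = 8, 3                       # state and observation dimensions (illustrative)
M = np.eye(n) + 0.01 * rng.standard_normal((n, n))   # linear model propagator
H = rng.standard_normal((p, n))   # observation operator
Q = 0.01 * np.eye(n)              # model error covariance
R = 0.1 * np.eye(p)               # observation error covariance

x = np.zeros(n)                   # forecast state
P = np.eye(n)                     # prediction error covariance (n x n)
y = rng.standard_normal(p)        # a new observation

# Forecast step: evolving P costs two n-by-n matrix products per time step.
x = M @ x
P = M @ P @ M.T + Q

# Analysis step: a weighted average of model and data via the Kalman gain K.
K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
x = x + K @ (y - H @ x)
P = (np.eye(n) - K @ H) @ P
```

The analysis step is cheap; the covariance forecast is what reduced-rank methods approximate.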
On the other hand, the algorithm does not give clues about how we can initialize the prediction error. The most simplified form of the Kalman filtering scheme, then, is to assume that the prediction error is stationary and can be approximated by some analytical function; such simplified schemes are called optimal interpolation and nudging (Malanotte-Rizzoli and Holland, 1986; Mellor and Ezer, 1991; Ezer and Mellor, 1994). An alternative method is to significantly reduce the cost of computing the evolution of the prediction error by approximating the prediction error in a reduced space (Fukumori and Malanotte-Rizzoli, 1995). The prediction error can be approximated by significantly fewer degrees of freedom (typically of order 10), provided one knows the principal directions of the error growth (Fukumori et al., 1999). The so-called SEEK filter was introduced based on this underlying approximation (Pham et al., 1998; Verron et al., 1999). This method computes only the evolution of the selected
singular vectors, initialized by the EOF modes of the model variations. In the case of Verron et al. (1999), about 15 modes were needed in their equatorial model to reproduce long-term events such as El Niño. The dominant eigenmodes of the prediction error that have been used tend to assume a large spatial structure and long time scales in oceanographic applications. This implies that short-term variability inherent in the numerical simulation, such as the numerical errors of the model, is completely ignored in reduced Kalman filtering approaches such as the SEEK filter. Recently, Jameson and Waseda (2000) introduced a new scheme that diagnostically detects the relative magnitudes of the numerical errors, and applied it to assimilating satellite data into an OGCM. The detection of the numerical errors is done by projecting the model output field onto a wavelet basis, the finest-scale wavelet coefficients indicating the numerical errors associated with discretization (Jameson, 1998; Arai and Jameson, 2001). The method was further extended to utilize a coarser-scale wavelet basis to improve the assimilation skill (Jameson et al., 2002). Here we focus on combining wavelet analysis and EOF analysis in order to obtain an efficient representation of data. The example data set we use is SSH data from an OGCM, from which the external mode, such as barotropic tides, has already been filtered out. The first few modes of an EOF analysis contain the large-scale, global and relatively slowly varying information, whereas the wavelet analysis captures the small-scale, local information that changes on a relatively fast time scale. By combining these two approaches we hope to obtain an efficient representation that can reduce the dimension of the underlying space and hence provide the foundation for an efficient scheme for data assimilation.
Our desire is that our new scheme, though not optimal in the least-squares sense, will be optimal when the computational effort is considered. That is, utilizing a large number of global eigenmodes that can represent fine-scale features is computationally expensive if their evolution is to be computed. This is not a problem if one is satisfied with fixed eigenmodes, initialized by EOF modes, for example. However, if following the change in the small-scale features is important, the computational efficiency does matter. On the other hand, recalculating wavelet coefficients is computationally cheap and can be done every time step if necessary, without any significant impact on the performance of the model. Thus, the slowly varying, large-scale EOF modes will be kept, and possibly evolved in time as in the SEEK filter, augmented by rapidly changing fine-scale wavelet modes. The suggested method, therefore, is not an alternative but a complement to the SEEK filter. We begin with a review of the basics of wavelet
analysis. The following section introduces wavelets and sets down the notation that we have chosen to use. Some examples of a wavelet-analysed SSH field from an ocean GCM are also presented. Section 3 introduces EOF analysis of the SSH field of an OGCM. In Section 4, the orthogonality of the EOF modes and the wavelets is tested using the same OGCM output. Finally, Section 5 discusses a possible extension of our proposed method to data assimilation. Conclusions follow.

2. Wavelet Analysis
Possibly the most instructive way to think of wavelets is in contrast to traditional analysis techniques such as Fourier analysis. With Fourier analysis we analyze discrete or continuous data using basis functions which are global, smooth and periodic. This analysis yields a set of coefficients, say a_k, which gives the amount of energy in the data at frequency k. Wavelet analysis, by contrast, analyzes data with basis functions which are local, slightly smooth, not periodic, and which vary with respect to scale and location. Wavelet analysis thereby produces a set of coefficients b_{j,k} which give the amount of energy in the data at scale j and location k. Wavelet analysis can serve as a good complement to Fourier analysis. In fact, data which is efficiently analyzed with Fourier analysis is often not efficiently analyzed with wavelet analysis, and vice versa. For our purposes we confine our discussion to so-called orthogonal wavelets, and specifically the Daubechies family of wavelets. The orthogonality property leads to a clear indication when data deviates from a low-order polynomial, the importance of which will become clear when we discuss numerical methods.

2.1 Theoretical background in the continuous world
To define Daubechies-based wavelets (see Daubechies, 1988; Erlebacher et al., 1996), consider the two functions φ(x), the scaling function, and ψ(x), the wavelet. The scaling function is the solution of the dilation equation,

φ(x) = 2 ∑_{k=0}^{L−1} h_k φ(2x − k),  (1)
which is so called because the independent variable x appears alone on the left-hand side but is multiplied by 2, or dilated, on the right-hand side. One also requires that the scaling function φ(x) be normalized: ∫_{−∞}^{∞} φ(x)dx = 1. The wavelet ψ(x) is defined in terms of the scaling function,

ψ(x) = 2 ∑_{k=0}^{L−1} g_k φ(2x − k).  (2)
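Equation (1) can be solved numerically by iteration. As a rough sketch (our own illustration, not from the paper), the cascade iteration below repeatedly upsamples a delta sequence and filters it; in the normalization of Eq. (1) the filter coefficients sum to one, so the Haar filter is h_0 = h_1 = 1/2 and the Daubechies D4 coefficients are rescaled accordingly.

```python
import numpy as np

def cascade(h, n_iter=8):
    # Iterate the dilation equation (1) in filter-bank form: upsample by 2,
    # convolve with h, and scale by 2 on each pass.  The result approximates
    # samples of phi on the dyadic grid x = i / 2**n_iter.
    v = np.array([1.0])
    for _ in range(n_iter):
        up = np.zeros(2 * len(v))
        up[::2] = v
        v = 2.0 * np.convolve(up, h)
    x = np.arange(len(v)) / 2.0**n_iter
    return x, v

# Haar: h0 = h1 = 1/2 in this (sum h = 1) normalization; phi is the box function.
x, phi = cascade(np.array([0.5, 0.5]))

# Daubechies D4 coefficients, renormalized so they sum to one.
s3 = np.sqrt(3.0)
h_d4 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / 8.0
x4, phi4 = cascade(h_d4)
```

Each pass preserves the discrete integral (sum times grid spacing), so ∫φ dx = 1 is maintained exactly, matching the normalization stated above.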
One builds an orthonormal basis from φ(x) and ψ(x) by dilating and translating to get the following functions:

φ_k^j(x) = 2^{−j/2} φ(2^{−j}x − k),  (3)

and

ψ_k^j(x) = 2^{−j/2} ψ(2^{−j}x − k),  (4)

where j, k ∈ Z; j is the dilation parameter and k is the translation parameter. It is usual to let the spaces spanned by φ_k^j(x) and ψ_k^j(x) over the parameter k, with j fixed, be denoted by V_j and W_j respectively,

V_j = span_{k∈Z} φ_k^j(x),  (5)

W_j = span_{k∈Z} ψ_k^j(x).  (6)

The spaces V_j and W_j are related by

... ⊂ V_1 ⊂ V_0 ⊂ V_{−1} ⊂ ...,  (7)

and

V_j = V_{j+1} ⊕ W_{j+1},  (8)

where the notation V_0 = V_1 ⊕ W_1 indicates that the vectors in V_1 are orthogonal to the vectors in W_1 and the space V_0 is simply decomposed into these two component subspaces.

The coefficients H = {h_k}_{k=0}^{L−1} and G = {g_k}_{k=0}^{L−1} are related by g_k = (−1)^k h_{L−1−k} for k = 0, ..., L − 1. All wavelet properties are specified through the parameters H and G. If the data is defined on a continuous domain, such as f(x) where x ∈ R is a real number, then one uses φ_k^j(x) and ψ_k^j(x) to perform the wavelet analysis. If, on the other hand, the data is defined on a discrete domain, such as f(i) where i ∈ Z is an integer, then the data is analyzed, or filtered, with the coefficients H and G. In either case, the scaling function φ(x) and its defining coefficients H detect localized low-frequency information, i.e., they are low-pass filters (LPF), and the wavelet ψ(x) and its defining coefficients G detect localized high-frequency information, i.e., they are high-pass filters (HPF). Specifically, H and G are chosen so that dilations and translations of the wavelet, ψ_k^j(x), form an orthonormal basis of L²(R) and so that ψ(x) has M vanishing moments, which determines the accuracy. In other words, ψ_k^j(x) will satisfy

δ_{kl} δ_{jm} = ∫_{−∞}^{∞} ψ_k^j(x) ψ_l^m(x) dx,  (9)

where δ_{kl} is the Kronecker delta, and the accuracy is specified by requiring that ψ(x) = ψ_0^0(x) satisfy

∫_{−∞}^{∞} ψ(x) x^m dx = 0,  (10)

for m = 0, ..., M − 1. This statement on accuracy can be explained in an alternative manner. Recall that within the scaling function subspaces V_j one can reconstruct low-order polynomials exactly up to a given order, so that x^0, x^1, ..., x^{M−1} can be represented exactly by choosing appropriate scaling function coefficients. Accordingly, the subspace W_j which is orthogonal to V_j, and consequently the basis elements ψ_k^j(x) of W_j, will be orthogonal to all elements contained in V_j, such as these low-order polynomials. So one can see that, by specifying the number of vanishing moments of the wavelets, one has in effect specified the number of polynomials that can be represented exactly, and hence the numerical accuracy of the method. For representing functions in L²(R), one can see from the above expressions that for any function f(x) ∈ L²(R) there exists a set {d_k^j} such that

f(x) = ∑_{j∈Z} ∑_{k∈Z} d_k^j ψ_k^j(x),  (11)

where

d_k^j = ∫_{−∞}^{∞} f(x) ψ_k^j(x) dx.  (12)

The two sets of coefficients H and G are known as quadrature mirror filters. For Daubechies wavelets the number of coefficients in H and G, or the length of the filters, denoted by L, is related to the number of vanishing moments M by 2M = L. For example, the well-known Haar wavelet is found by defining H through h_0 = h_1 = 1/2, in the normalization of Eq. (1). For this filter H, the solution to the dilation equation (1), φ(x), is the box function: φ(x) = 1 for x ∈ [0, 1] and φ(x) = 0 otherwise. The Haar function is very useful as a learning tool, but because of its low order of approximation accuracy and lack of differentiability it is of limited use as a basis set. The coefficients H needed to define compactly supported wavelets with a greater degree of regularity can be found in Daubechies (1988). As is expected, the regularity increases with the support of the wavelet. The usual notation for a Daubechies-based wavelet defined by coefficients H of length L is D_L.
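As a concrete illustration of the filter pair H and G (our own sketch, not from the paper), the code below performs one level of a discrete Haar analysis and synthesis. It uses the orthonormal convention for discrete data, in which the filter taps sum to √2; the Haar windows do not overlap, which conveniently avoids boundary handling.

```python
import numpy as np

# One level of an orthogonal discrete wavelet transform with the Haar filters.
h = np.array([1.0, 1.0]) / np.sqrt(2.0)                      # low-pass filter H
L = len(h)
g = np.array([(-1) ** k * h[L - 1 - k] for k in range(L)])   # quadrature mirror filter G

def dwt_level(f):
    # Analysis: filter with H (approximation) and G (detail), downsample by two.
    approx = np.array([h @ f[2 * i:2 * i + L] for i in range(len(f) // 2)])
    detail = np.array([g @ f[2 * i:2 * i + L] for i in range(len(f) // 2)])
    return approx, detail

def idwt_level(approx, detail):
    # Synthesis: upsample and filter with the same orthogonal pair.
    f = np.zeros(2 * len(approx))
    for i in range(len(approx)):
        f[2 * i:2 * i + L] += approx[i] * h + detail[i] * g
    return f

f = np.array([4.0, 2.0, 5.0, 5.0, 1.0, 0.0, 3.0, 7.0])
a, d = dwt_level(f)
assert np.allclose(idwt_level(a, d), f)    # perfect reconstruction
```

Because the transform is orthogonal, the energy of the signal is split exactly between the approximation and detail coefficients.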
As mentioned in the introduction, here we explore a combination of EOF basis vectors with wavelets in an effort to reduce the dimension of the approximation space below the dimension obtained from using EOFs or wavelets alone.

2.2 Wavelet analysis in 2D of SSH data
Generally, our fields of interest will be in two or three dimensions, so we must say something about higher-dimensional wavelet transforms. The approach most commonly taken is to use the tensor product of the one-dimensional transform to generate the two-dimensional transform. Recall that in one dimension we can break up the subspace V_0 such that

V_0 = V_1 ⊕ W_1.

So, if one takes the tensor product of this one-dimensional analysis with another one-dimensional analysis, one obtains

V_0 ⊗ V_0 = (V_1 ⊕ W_1) ⊗ (V_1 ⊕ W_1),  (13)

which yields

V_0 ⊗ V_0 = (V_1 ⊗ V_1) ⊕ (V_1 ⊗ W_1) ⊕ (W_1 ⊗ V_1) ⊕ (W_1 ⊗ W_1).  (14)

One can think of the subspace V_1 ⊗ V_1 as representing the horizontal and vertical average, a low-pass filtering, of the information contained in V_0 ⊗ V_0. The subspace V_1 ⊗ W_1 represents a horizontal low-pass filtering process and a vertical high-pass filtering process; such a subspace would capture horizontal edges. On the other hand, the subspace W_1 ⊗ V_1 represents a horizontal high-pass filtering process and a vertical low-pass filtering process, in which one would be able to detect vertical edges. Finally, when the high-pass filter is applied in both the vertical and horizontal directions one arrives at the subspace W_1 ⊗ W_1. It is this subspace that we are primarily interested in, since it will detect variation in both the horizontal and vertical directions. Note that, as in the one-dimensional case, we will perform many levels of wavelet decomposition, and the subspaces we will primarily be interested in will be

W_j ⊗ W_j, for j = 1, 2, 3, 4, ....

Generally, decompositions up to j = 4 will be sufficient. To illustrate, Fig. 1 shows four frames. The upper left frame contains the original flow field of the Pacific Basin with the Kuroshio current (see Section 3 for a description of the regional OGCM). The upper right frame contains the two-dimensional wavelet transform and the decomposition W_1 ⊗ W_1. In the lower left frame one can see the wavelet decomposition after two transformations, W_2 ⊗ W_2, and in the lower right frame the transform after three decompositions, W_3 ⊗ W_3. The three transformed frames provide information on the original field at increasingly large scales. It is this localized small-scale information that we will combine with the global EOF basis functions to obtain a reduced-dimension basis set. To further clarify the definitions introduced in this section, in Fig. 2 we plot the decomposed subspaces defined in Eq. (14). The upper left frame contains the decomposition W_1 ⊗ W_1, the upper right frame V_1 ⊗ W_1, the lower left frame W_1 ⊗ V_1, and the lower right frame V_1 ⊗ V_1. Similar decompositions follow for the second and third scales, but we do not present those here (see Jameson et al., 2002). Our introduction to wavelet analysis is now complete, and we proceed to empirical orthogonal functions.

3. Construction of the Empirical Orthogonal Modes
The empirical orthogonal modes (EOFs) of the model variances are constructed for the two-dimensional SSH data. The error covariance of the model state x can be expressed as

ε = 〈(x(t) − x̄)(x(t) − x̄)^T〉,  (15)

where x̄ is the mean state vector and 〈 〉 represents the time mean, or

ε = (1/s) X X^T,  (16)

where X contains column-wise state vectors of length n, sub-sampled in time for s samples.
Fig. 1. Original SSH data of the Kuroshio model and the wavelet transformed fields of the SSH at different scales (panels: original SSH; first, second and third scales).

Fig. 2. Graphical representation of the decomposition subspaces presented in Eq. (14). (a) High-pass filter in both the vertical and horizontal directions; (b) high-pass filter in the vertical and low-pass filter in the horizontal direction; (c) low-pass filter in the vertical and high-pass filter in the horizontal direction; (d) low-pass filter in both vertical and horizontal directions. Note that vertical and horizontal directions refer to grid lines parallel to and perpendicular to the coast of Japan, respectively.
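The four subspaces of Eq. (14) can be sketched with one level of a 2-D tensor-product Haar transform (our own toy example, not the paper's SSH data): a smooth background with a single point anomaly, which shows up in the W_1 ⊗ W_1 subband.

```python
import numpy as np

def haar2d_level(A):
    # One level of the 2-D tensor-product Haar transform: apply the 1-D
    # low/high-pass pair along rows, then along columns, producing the
    # four subspaces of Eq. (14).
    lo = lambda v: (v[::2] + v[1::2]) / np.sqrt(2.0)
    hi = lambda v: (v[::2] - v[1::2]) / np.sqrt(2.0)
    rows_lo = np.apply_along_axis(lo, 1, A)
    rows_hi = np.apply_along_axis(hi, 1, A)
    vv = np.apply_along_axis(lo, 0, rows_lo)   # V1 (x) V1: two-directional average
    vw = np.apply_along_axis(hi, 0, rows_lo)   # V1 (x) W1: horizontal edges
    wv = np.apply_along_axis(lo, 0, rows_hi)   # W1 (x) V1: vertical edges
    ww = np.apply_along_axis(hi, 0, rows_hi)   # W1 (x) W1: fine-scale detail
    return vv, vw, wv, ww

# A smooth field plus a point anomaly: the anomaly lands in W1 (x) W1.
ssh = np.ones((8, 8))
ssh[3, 3] += 1.0
vv, vw, wv, ww = haar2d_level(ssh)
```

The transform is orthonormal, so the energy of the field is exactly partitioned among the four subbands.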
Fig. 3. Eigenvectors for 6 EOF modes (modes 1, 7, 13, 19, 25 and 31). Data is not normalized by the local variance.
The construction of the full set of eigenmodes from ε requires handling an n by n matrix, but the eigenvalues up to s can be obtained by singular value decomposition,

X = VΛU^T,  (17)

where U is an s by s matrix containing column-wise eigenvectors of length s, and Λ is a diagonal s by s matrix containing the corresponding eigenvalues λ; the matrix Λ is a subset of Λ_n, the square root of the eigenvalues of ε. V is an n by s matrix containing s eigenvectors of length n, and the columns of V are the EOF modes. The EOFs of the SSH were constructed from an output of the regional OGCM that mimics the general circulation of the Kuroshio, the Oyashio and the Kuroshio Extension (Mitsudera et al., 1997). It should be noted that the tides have been removed from the data. The model is a version of the σ-coordinate primitive equation solver (POM; Blumberg and Mellor, 1983) that covers the main Kuroshio stream along the southern Japan coast, the Oyashio current in the north and the Kuroshio Extension region; the approximate domain extends from 125°E to 170°E and 20°N to 52°N, configured with a curvilinear coordinate system in which the horizontal axes follow the mean geometry of the Kuroshio stream. The horizontal resolution of the model varies between 1/6 degree and 1/12 degree within the model domain, and 32 sigma levels are configured in depth. The EOF analysis was conducted for
particular SSH data sets of length one year from this model. The corresponding time scale of the detected eigenmodes depends on the sampling rate ∆t = T/s, where T is the length of the data set. The longest detectable time scale is T and the shortest detectable time scale is 2∆t = 2T/s. In this study T is one year and the number of samples s is 72, so the time scales of the EOFs range between 10 days and 1 year. However, the time scale of each detected mode depends on the data set; in this study the most significant EOF mode v_1 had a time scale of about 6 months, the second mode v_2 had a time scale of about 3 months, and the higher modes had shorter time scales. The length scale of the EOF modes was largest for the most significant mode and decreased with decreasing significance. In Fig. 3 we have plotted the eigenvectors for six modes in our EOF expansion. Beginning with mode 1 in the upper left plot, we have the large-scale global information. As we proceed from left to right and down through the plots, we end with the eigenvector at mode 31, which represents global fine-scale information. In Fig. 4 we plot the spectrum, which shows the decay in energy from the large-scale eigenvectors (the lowest-indexed eigenvalues) to the small-scale eigenvectors (the highest-indexed eigenvalues). In the next section we show how these eigenvector scales appear when projected onto a wavelet basis.
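The construction in Eqs. (16) and (17) can be sketched in a few lines of NumPy. The snapshot matrix here is synthetic random data standing in for model SSH anomalies; the grid size and the truncation rank are illustrative only, while s = 72 matches the sample count quoted above.

```python
import numpy as np

# EOFs from a snapshot matrix X (n grid points by s samples) via the SVD of
# Eq. (17): X = V Lambda U^T.  Synthetic anomalies stand in for model SSH.
rng = np.random.default_rng(1)
n, s = 500, 72
X = rng.standard_normal((n, s))
X = X - X.mean(axis=1, keepdims=True)     # remove the mean state

V, lam, Ut = np.linalg.svd(X, full_matrices=False)
# Columns of V are the EOF modes; lam**2 / s are the eigenvalues of the
# error covariance (1/s) X X^T of Eq. (16), sorted in decreasing order.
var = lam**2 / s

# Reduced-rank reconstruction with the leading r modes, as in a reduced
# Kalman filter initialization:
r = 10
X_r = V[:, :r] @ np.diag(lam[:r]) @ Ut[:r, :]
explained = var[:r].sum() / var.sum()
```

Only the thin SVD of the n-by-s matrix is needed; the full n-by-n covariance of Eq. (15) is never formed.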
4. Projection of Wavelets onto an EOF Basis
As mentioned above, wavelet analysis and EOF analysis provide different and complementary ways to represent information. The basis vectors for EOF analysis come from the eigenvectors of a covariance matrix and represent global information. If the eigenvalues are ordered from largest to smallest, one finds that the eigenvectors represent information of increasingly fine scale, but again, this information is global. Wavelets, on the other hand, are completely local in nature. The relative efficiency of these two basis sets will simply depend on the nature of the information being represented. If the data is completely local in nature, then wavelet analysis provides a very efficient basis set; if the data is global in nature, then the EOF analysis will be the preferred basis. There are, however, many situations in physics and oceanography where the data is primarily composed of large-scale, slowly varying flows with localized small-scale phenomena appearing periodically. Consider the large-scale eigenvectors from the EOF analysis as representing the global average flow in ocean circulation. One would then superimpose upon this the small-scale, locally varying features using wavelet basis vectors. If the small-scale information is global, one would instead opt to use the small-scale eigenvectors from the EOF analysis. Precisely speaking, one must look at the decay rates of the expansions in both the EOF basis vectors and the wavelet basis vectors, and at the projection of the wavelet basis functions onto the EOF basis vectors. This information is sufficient to form an effective combination of EOF analysis and wavelet analysis.

For EOF analysis we adopt a representation of the form

D_eof = ∑_j λ_j V_j V_j^H,  (18)

where λ_j is the eigenvalue for the j-th eigenvector V_j and H denotes the Hermitian operation, or conjugate transpose. Here we have arbitrarily defined the data and are denoting it by D_eof. As mentioned above, we assume the eigenvalues are ordered from largest to smallest, i.e., λ_j > λ_{j+1}; ordered in this manner we have a decaying spectrum (Fig. 4). On the other hand, we can represent the data in a wavelet basis with the notation

D_wavelet = ∑_{j,k} d_{j,k} W_{j,k}.  (19)

Fig. 4. Eigenvalue spectrum of the EOF analysis. Data is not normalized by the local variance.
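The discrete projection p_{j,k,n} = W_{j,k}^T V_n discussed in this section can be illustrated with a toy construction of our own (not the paper's data): exact discrete Haar wavelet vectors at two scales, projected onto a set of stand-in orthonormal "EOF" modes.

```python
import numpy as np

# Discrete projection p_{j,k,n} = W_{j,k}^T V_n of Eq. (28): the similarity
# of each wavelet basis vector to each EOF mode (toy setup, 16 grid points).
n_pts = 16

def haar_vector(j, k):
    # Unit-norm discrete Haar wavelet at scale j (support 2**j), location k.
    w = np.zeros(n_pts)
    half = 2 ** (j - 1)
    w[k * 2**j: k * 2**j + half] = 1.0
    w[k * 2**j + half: (k + 1) * 2**j] = -1.0
    return w / np.linalg.norm(w)

# All wavelet basis vectors at scales j = 1 and j = 2.
W = np.array([haar_vector(j, k) for j in (1, 2) for k in range(n_pts // 2**j)])

# Stand-in "EOF" modes: orthonormal columns from a QR factorization.
rng = np.random.default_rng(2)
V, _ = np.linalg.qr(rng.standard_normal((n_pts, 4)))

P = W @ V    # P[m, n] = projection of wavelet vector m onto mode n
```

Since both sets consist of unit vectors, every entry of P lies in [−1, 1] and directly measures the similarity of a wavelet to an EOF mode, as described in the text.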
We are assuming that the data D wavelet is finite and thus our two representations are complete and the representations as written above are lossless. Lossless is a word used to describe representations that lose no information such that original signal can be perfectly reconstructed. Here for illustration purposes we have let the scaling function subspace have zero coefficients. The key differences between the EOF basis set and the wavelet basis is that the wavelets provide a local representation and the EOF basis vectors provide a global representation. Further, in contrast to the EOF vectors, the wavelet basis vectors Wj,k will not depend on the data. Let us consider a slightly more general scenario with simplified notation and in a continuous domain. If we have two different orthonormal basis sets B1 and B2 with,
PB1 f ( x ) = ∑ a1m Bm1 ( x )
(20)
a1m = ∫ f ( x ) Bm1 ( x )dx,
(21)
m
with
and the second representation in terms of the second basis,
PB 2 f ( x ) = ∑ an2 Bn2 ( x )
(22)
an2 = ∫ f ( x ) Bn2 ( x )dx
(23)
n
with
then if one projects the representation in one basis set onto the second basis set, one obtains,

P_{B^1} P_{B^2} f(x) = \sum_m a_m^1 B_m^1(x),    (24)

where,

a_m^1 = \int \sum_n a_n^2 B_n^2(x) B_m^1(x)\,dx,    (25)

or,

a_m^1 = \sum_n a_n^2 \int B_n^2(x) B_m^1(x)\,dx,    (26)

so that the key to mapping from one basis set to the other basis is the matrix Matrix_{n,m} = \int B_n^2(x) B_m^1(x)\,dx, where this matrix is simply the projection of one basis onto another. For the projection of wavelets onto EOF eigenvectors we would, therefore, be interested in the behaviour of,

p_{j,k,n}^{continuous} = \int W_{j,k}(x) V_n(x)\,dx,    (27)

written in a continuous variable, or

p_{j,k,n}^{discrete} = W_{j,k}^T V_n,    (28)

in discrete notation. In either case, the number p_{j,k,n} is a measure of the similarity of the two basis vectors. As one projects wavelet basis functions of increasing scale onto EOF basis functions of decreasing scale, one thus expects the projection to decrease in amplitude. Further, one expects that there will be a wavelet scale which most closely matches the predominant scale in a given EOF eigenvector. All of these phenomena are observed in the following plots.

4.1 Combining EOF and wavelet analysis
As mentioned above, our overall goal is to combine the strong features of EOF analysis with the strong features of wavelet analysis. We envision a function or data composed of contributions from both basis sets. For now we will not write limits on the integrals and summations. Such limits will be chosen so that various error criteria are satisfied, as outlined below. So, our expansion will be of the form,

f(x,t) = \sum_i \lambda_i V_i(x) + \sum_{j,k} d_{j,k}(t) W_{j,k}(x),    (29)

where we have intentionally written the wavelet coefficients as functions of time and left the eigenvectors and eigenvalues as non-time dependent. The idea is that the eigenanalysis will represent the very slowly varying global features of the flow and the wavelet basis vectors will represent the relatively small scale features that vary quickly in time. Since the eigenvectors are not orthogonal to the wavelets, the above representation has some redundancy or error that is proportional to the projection of the wavelets onto the eigenmodes, as explained above. That is,

d_{j,k}(t) = \int f(x,t) W_{j,k}(x)\,dx - \sum_i \lambda_i \int V_i(x) W_{j,k}(x)\,dx,    (30)
where the term,

E_{j,k}^a = \sum_i \lambda_i \int V_i(x) W_{j,k}(x)\,dx,    (31)

represents the error in the above inner product of the wavelet with the data. This error, as discussed above, can be made as small as desired by appropriately choosing which wavelets and eigenmodes to use in the expansion and by considering the spectrum in both the wavelet and EOF basis sets. Conversely, the eigenvalue represents the projection of the data onto the corresponding eigenvector, and there will be a similar error term which depends ultimately on the projection of the wavelet basis functions on the eigenvectors. That is,

\lambda_i = \int f(x) V_i(x)\,dx - \sum_{j,k} d_{j,k} \int W_{j,k}(x) V_i(x)\,dx,    (32)

and the error term is,

E_i^b = \sum_{j,k} d_{j,k} \int W_{j,k}(x) V_i(x)\,dx.    (33)

So, if we define ε to be,

\varepsilon_{i,j,k} = \int W_{j,k} V_i\,dx,    (34)

then we can see that our two errors are,

E_{j,k}^a = \sum_i \lambda_i \varepsilon_{i,j,k},    (35)

and,
Fig. 5. Projection of the Daubechies 1 wavelet at six different scales onto the EOF basis vectors. Wavelet scales go from fine to coarse as the plots go from upper left to lower right. Data is not normalized by the local variance.
Fig. 6. Projection of the Daubechies 3 wavelet at six different scales onto the EOF basis vectors. Wavelet scales go from fine to coarse as the plots go from upper left to lower right. Data is not normalized by the local variance.
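The projections plotted in Figs. 5 and 6 are simple inner products and are easy to reproduce. The following sketch is illustrative only (Python with numpy; a synthetic ensemble rather than the paper's model output; Haar, i.e. Daubechies-1, wavelets): it builds EOF vectors by SVD and sums the squared Haar detail coefficients of the leading eigenvector scale by scale. Because the transform is orthonormal, the per-scale energies sum to the unit norm of the eigenvector, and the fine-scale projections are small for a smooth eigenvector, as in Fig. 5:

```python
import numpy as np

def haar_levels(v):
    """Orthonormal Haar details of a length-2^J vector, finest scale first."""
    v = np.asarray(v, float).copy()
    details = []
    while v.size > 1:
        s = (v[0::2] + v[1::2]) / np.sqrt(2.0)
        d = (v[0::2] - v[1::2]) / np.sqrt(2.0)
        details.append(d)
        v = s
    return details, v              # detail arrays plus the coarsest scaling coefficient

rng = np.random.default_rng(0)
N, T = 64, 400
x = np.arange(N) / N
# synthetic ensemble: one smooth global mode with random amplitude + weak noise
data = rng.standard_normal(T)[:, None] * np.sin(2 * np.pi * x)[None, :] \
       + 0.05 * rng.standard_normal((T, N))
anom = data - data.mean(axis=0)
_, svals, Vt = np.linalg.svd(anom, full_matrices=False)
V0 = Vt[0]                                        # leading EOF vector, unit norm
details, scaling = haar_levels(V0)
eps = np.array([np.sum(d**2) for d in details])   # squared projections summed over position k
total = eps.sum() + np.sum(scaling**2)            # equals ||V0||^2 = 1 by orthonormality
```

Here the squared-coefficient sum per scale plays the role of the interference summed over the position index k; with real model fields in place of the synthetic ensemble, the same loop traces out curves analogous to Figs. 5 and 6, scale by scale.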
E_i^b = \sum_{j,k} d_{j,k} \varepsilon_{i,j,k}.    (36)

Therefore, we can have very accurate estimates of the errors introduced by combining the two basis sets simply by using the above two expressions, which require that we know the EOF spectrum λ_i, the wavelet spectrum d_{j,k}, and the projection of the basis functions onto each other, ε_{i,j,k}.

4.2 The interference between basis functions
The projection of wavelets onto eigenfunctions, or eigenfunctions onto wavelets, can also be viewed as the interference between the two representations. In principle we would prefer that this interference be zero, but it will not be. Further, the EOF basis functions will depend on the data and one cannot therefore characterize theoretically, once and for all, the interaction of wavelets with some generalized EOF basis functions. We choose instead to actually do projections using real data derived from an ocean model. Whereas wavelets have the ability to detect scales with respect to position, the EOF eigenvectors do not. We therefore need to modify our ε_{i,j,k} so that the position parameter k is eliminated. We can do this in a number of ways, but we keep the presentation in the L^2 norm and simply sum up the square of the interactions over k,

\varepsilon_{i,j} = \sum_k (\varepsilon_{i,j,k})^2,    (37)

and in this manner we obtain a measure of the interaction or interference of the eigenvector V_i with the wavelets at wavelet scale j. From the above section on wavelet analysis we recall that we denote the finest wavelet scale of information as j = 0 and coarser scales as j increases. Further, we denote the subspace spanned by wavelets at scale j as W^j. So our ε_{i,j} is a measure of the orthogonality of the subspace W^j with the subspace spanned by the eigenvector V_i. Though the issue of scale size for eigenvector V_i is not well defined, we do expect for physical reasons that the scales of information in V_i will be larger than the scales of information in V_{i+1}. We expect this due to the previously defined ordering of the eigenvectors. That is, we expect that the overall general flow of the ocean model will have most of its energy in the larger scales, with the smaller scales being less and less energetic as the scale size decreases. And, since the eigenvalues represent the energy in a given eigenvector, we expect that the larger eigenvalues will correspond to larger scales. This physically intuitive argument is borne out in Figs. 5 and 6. The projections in these figures give a direct measure of the scale of the information contained in the eigenvectors. As one proceeds from upper left to lower right through the six plots in Figs. 5 and 6, one obtains, respectively, the projections of W^1 onto V_i, then W^2 onto V_i, ..., and finally W^6 onto V_i. One can see from these figures that the projection of W^1 onto the eigenvectors is quite small, with the largest projection being of the order of 10^{-7}. Further, one can see from the first plot in Fig. 5 that the magnitude of the projection increases as the index of the eigenvector increases. This indicates that the similarity of the wavelets in W^1 with the eigenvectors V_i is increasing as i increases, thereby
indicating that scales are decreasing as i increases. As one proceeds to plots b) and c) in Fig. 5, one can see that the overall magnitude of the projections has increased to order 10^{-6} and 10^{-5}, respectively. This indicates that the scales of all the eigenvectors are more similar, hence the larger projection, to the wavelets in W^2 and W^3 than to the wavelets in W^1. Next, in plots d), e), and f) of Fig. 5, we can see that a peak projection is obtained before the last eigenvector is reached. In plot d), for example, we can see that the peak is roughly near eigenvector V_20, indicating that the scales of information contained in this eigenvector are more similar to the wavelets in W^4 than those of all the other eigenvectors. One can see that this analysis provides a direct measure of the scale of information contained in the various eigenvectors.

4.3 Role of different wavelets
As mentioned above, we limit our study to the family of orthogonal wavelets known as the Daubechies wavelets. So, within this family of wavelets, which one should we choose to work with? This will depend on several factors. First of all, as one goes to higher and higher order wavelets, one finds that the non-zero region of the wavelet grows. That is, the compact support of the wavelets is certainly maintained, but the support grows. Further, when the wavelets become higher and higher order, there is a region of their support which is non-zero but which can be very close to zero. Generally this close-to-zero area of support is not a desirable property and should be avoided in computer computations, where roundoff error can become an issue for quantities close to zero. A second issue is the overall magnitude of the projection of the wavelets onto the eigenvectors. From the approximation theory presented above, one can see that in the wavelet approximation of sine waves the truncation error, and hence the projection size, is governed by traditional polynomial truncation error.
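That claim is easy to check numerically. The sketch below is illustrative (Python with numpy; one periodic DWT level, using the standard Haar filter with p = 1 vanishing moment and the Daubechies-2 filter with p = 2): for a smooth sine, the detail coefficients of the p = 2 wavelet are much smaller, and doubling the mode number scales the details roughly like k^{p+1}:

```python
import numpy as np

def max_detail(v, h):
    """Max |detail coefficient| for one periodic DWT level with low-pass filter h."""
    h = np.asarray(h, float)
    L, N = len(h), len(v)
    # quadrature-mirror high-pass filter: g[n] = (-1)^n h[L-1-n]
    g = np.array([(-1)**n * h[L - 1 - n] for n in range(L)])
    d = [np.dot(g, v[(2 * k + np.arange(L)) % N]) for k in range(N // 2)]
    return np.max(np.abs(d))

N = 256
x = np.arange(N) / N
haar = np.array([1.0, 1.0]) / np.sqrt(2.0)          # Daubechies-1, p = 1
s3 = np.sqrt(3.0)
db2 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))  # Daubechies-2, p = 2

f3, f6 = np.sin(2 * np.pi * 3 * x), np.sin(2 * np.pi * 6 * x)
d_haar = max_detail(f3, haar)
d_db2 = max_detail(f3, db2)
# doubling the mode number k scales the details roughly like k^(p+1)
r_haar = max_detail(f6, haar) / d_haar    # ~ 2 for p = 1
r_db2 = max_detail(f6, db2) / d_db2       # ~ 4 for p = 2
```

This is the discrete counterpart of the decay behaviour quantified in the truncation-error expression that follows.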
From the basic expressions defining the approximation ability of wavelets and the fundamental approximation ability of sine waves, one can easily derive the following expression for the truncation error of a wavelet approximation of sine waves,
Proj = \frac{(2^j \Delta x)^{p+1} k^{p+1}}{(p+1)!},    (38)
where p is the number of vanishing moments of the wavelet and k is the k-th sine mode being approximated. From this expression one can see that as p increases, the magnitude of the projection and hence interference between the wavelet basis vectors and the EOF eigenvectors will decrease. In Fig. 5 our projections are calculated with the
Daubechies wavelet with one vanishing moment, and in Fig. 6 our projections are calculated using the wavelet with 2 vanishing moments. One can see that the magnitude of the projections using the wavelet with more vanishing moments has decreased slightly, as expected. Generally speaking, one should find that the wavelets with either 1 or 2 vanishing moments are sufficient for most applications. 4.4 When the eigenvectors are Fourier modes If, in a simplified and idealized situation, one constructs a domain that is periodic in all dimensions, then one can construct examples such that all matrices are circulant and hence can be diagonalized by the Fourier modes. That is, our eigenvectors are the Fourier vectors, or simply sine waves. Though we do not expect most problems to be periodic, it is important to consider this special case, since much of the character of the interaction of EOF eigenvectors can be understood from the special case where the eigenvectors are simply sine waves. This construction is only to help the reader intuitively understand the interaction of a smooth global basis function, such as an EOF vector, with a wavelet. Of course, in practice in the ocean such Fourier modes will likely not appear as the EOF vectors, but Fourier modes do occur in many systems in nature, particularly those with periodicity. As indicated above, an upper bound on the size of the projection of a wavelet onto a sine wave can be found from truncation error analysis of traditional polynomial interpolation. Therefore, if our eigenvector is now set to be a sine wave, V_n = e^{inx}, then our projection,
\varepsilon_{n,j,k}^p = \int W_{j,k}(x) e^{inx}\,dx,    (39)
will have an upper bound of,
\max_k \varepsilon_{n,j,k}^p \le \frac{(2^j \Delta x)^{p+1} n^{p+1}}{(p+1)!},    (40)
where p is the number of vanishing moments of the wavelet and ∆x is the distance between grid points in the original data sample. Since this expression is an upper bound, it should only be used as a rough guide to the projection magnitude. Now that we have a precise expression for ε, we can assume a form for the Fourier spectrum and thereby obtain a more precise estimate of the error in mixing a Fourier basis with a wavelet basis. If we assume a Fourier spectrum of the form,
\lambda_n = n^{-m} e^{a n / n_0},    (41)
Fig. 7. (a) SSH snapshot from the regional model; (b) variance estimate from EOF modes of the SSH signal from a year long data; (c) wavelet detected variance from the SSH snapshot; (d) combination of the variance estimates from EOFs and wavelets.
then we can see that the error E_{j,k}^a as defined above will be bounded above by,

E_{j,k}^a \le \sum_n n^{p-m+1} \frac{(2^j \Delta x)^{p+1}}{(p+1)!} e^{a n / n_0}.    (42)
This upper bound on error can be used to estimate the interference between the wavelet and the EOF basis approach, and in the context of data assimilation these estimates will provide the off-diagonal elements of matrices which would usually be zero.

5. Preparation for Ocean Data Assimilation
In this paper we have so far provided mathematical justification of how a given signal can be approximated by a series of orthogonal functions, including both global functions (EOFs) and local functions (wavelets). The key for this to be possible was to test the orthogonality of the two types of basis functions, EOFs and wavelets. Although as an example we have analysed an output of an ocean model, the discussion given so far is general and the method can be applied to any given signal. Here we introduce a first step to utilize our technique to construct a prediction error covariance matrix for an ocean model. The essence of this approach is to approximate the
prediction error covariance matrix by,
P = \underbrace{L \Lambda L^T}_{\text{reduced rank}} + \underbrace{Q_h}_{\text{SUgOiWADAi}},    (43)
where L contains eigenvectors of global type (like EOFs), Λ contains the corresponding eigenvalues, and the diagonal elements of Q_h contain the error variance estimate from wavelet detection (SUgOiWADAi; Jameson et al., 2002). Therefore, in addition to the typical reduced-rank Kalman filtering scheme (e.g., the SEEK filter), we have additional information that contains high-frequency (or localized) information such as numerical errors and local variations. The term Q_h can be considered as a combination of the so-called model error (typically represented as Q) and the high-frequency part of the signal that was low-pass filtered out when reducing the rank of the prediction error covariance matrix. In our previous work we approximated Q_h using information obtained from the wavelet transformation, representing it as Q_h = D^{1/2} C D^{1/2}, where diag(D) was estimated from the wavelet-detected model variations of various scales (see Jameson et al., 2002), and C is a Gaussian distance-dependent function. As an example, we have estimated
D = a\,d_1^2 + b\,d_2^2 + c\,d_3^2,    (44)
with a = b = c = 1 from an SSH snapshot, and presented it in Fig. 7(c). The SSH snapshot (Fig. 7(a)) is from the model output from which the EOF modes were constructed. The variance estimate from the first 10 EOF modes is shown in Fig. 7(b). Both variance estimates are scaled to be in units of m². In contrast with the variance estimate from the wavelet diagnosis, the EOF variances tend to show larger spatial structure. In this particular example the EOF modes captured the average location of eddies that results from repeated cyclonic ring generation at the beginning of the Kuroshio extension and its merger with the Kuroshio main current. On the other hand, the wavelet diagnosis captured the exact locations of the eddies themselves: one south of the Kuroshio around 142°E and the other at 153°E. This clearly illustrates the distinct nature of the two analyses; that is, the EOF represents time-averaged variance, whereas the wavelet detects, at each time instant, an estimate of the variance. As mentioned earlier in the paper, we expect that EOF analysis will capture the signal that is slowly varying in time. We therefore expect that EOFs approximate the eigenvectors of the error covariance matrix that rotate at a relatively slow time scale. Perhaps the outputs from the regional model we used were not ideal data to illustrate this, because the time scale of the first EOF mode was only 60 days or so. If we had analysed a basin-wide model, the distinction between the time scales of the physical phenomena that EOFs and wavelets detect would become more evident; consider the El Niño signal vs. eddies. Finally, a strength of the wavelet diagnosis is the detection of numerical error. In the SSH signal (Fig. 7(a)), at around 145°E and 38°N, where the Kuroshio water and the Oyashio water meet, there is a strong front generated by the large temperature gradient. In this neighborhood we have detected a delta-2 noise in the temperature signal.
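The construction Q_h = D^{1/2} C D^{1/2} described above is straightforward to assemble. The sketch below is illustrative only (Python with numpy; the 1-D grid, the correlation length Lc, and the variances d are made-up placeholders, with d standing in for the wavelet-detected combination of Eq. (44)):

```python
import numpy as np

N = 32
x = np.linspace(0.0, 1.0, N)                 # hypothetical 1-D model grid
rng = np.random.default_rng(1)
d = rng.uniform(0.1, 1.0, N)                 # stand-in for wavelet-detected variances, eq. (44)

Lc = 0.1                                     # assumed correlation length
C = np.exp(-0.5 * (x[:, None] - x[None, :])**2 / Lc**2)  # Gaussian distance-dependent correlation
Dh = np.diag(np.sqrt(d))                     # D^{1/2}
Qh = Dh @ C @ Dh                             # Q_h = D^{1/2} C D^{1/2}

# Q_h inherits symmetry and positive semi-definiteness from C, and because
# C has unit diagonal, diag(Q_h) reproduces the detected variances d exactly.
```

Since Q_h is built diagnostically from a single snapshot, it can be refreshed cheaply at each assimilation step while the reduced-rank part LΛL^T evolves slowly.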
The wavelet analysis detected this numerical error and showed large values of variance comparable to the magnitude of the large peaks of the EOF variance shown in Fig. 7(b). This last statement may require some explanation. The EOFs are an optimal representation of the signal in the sense of a time and space average. On the other hand, wavelet-detected variances are local and are not averaged in time. Thus, even if the magnitude of the wavelet-detected variances may be comparable to that of the EOFs, when they are averaged in both time and space the averaged variance will be significantly less than what the EOFs can represent. In other words, the wavelets can only account for the high-frequency energy of the EOF modes. And if one thinks that these high-frequency errors are important, unfortunately, EOFs will simply smear them
out. Whether the inclusion of the wavelet-detected variances is important or not depends on one's application. Our method suggests that such high-frequency local error can be detected diagnostically with wavelet analysis at a relatively cheap computational cost.

6. Conclusion
As outlined in the introduction, our thesis is the idea of combining different basis sets to represent a source of information. The reason that one should be interested in such an approach is that physical information often exists at many scales. Further, it is often the case that the large scale information is global in nature and often slowly varying. In addition, the small scale information is often local in nature and varies on a faster time scale than the large scale information. Thus, our goal is to combine the often used Empirical Orthogonal Functions with the relatively new wavelet basis functions. The EOFs will be used to initialize the large-scale slowly varying modes and the wavelets will be used to represent the small scale modes. We have attempted to show some of the initial theoretical work that would be needed for such a technique to be applied in any practical scientific analysis. We have focused our attention on data assimilation, where we hope to apply this new method to future algorithms with a highly reduced dimension in the underlying modes of the process. By doing so, we expect to obtain a very efficient and practical new data assimilation technique.

Acknowledgements
We thank the two anonymous reviewers for their valuable comments and the editor in charge, Prof. Awaji, for his guidance in improving the paper. We also thank Ms. Diane Henderson for her careful editorial review. This research was supported by the Frontier Research System for Global Change and the International Pacific Research Center. School of Ocean and Earth Science and Technology Contribution Number 6093 and International Pacific Research Center Contribution Number 186.

References
Arai, K. and L. Jameson (2001): Fundamental Applications of Wavelets for Earth Observation Satellite Image Analysis. Morikita Shuppan (in Japanese).
Blumberg, A. F. and G. L. Mellor (1983): Diagnostic and prognostic numerical circulation studies of the South Atlantic Bight. J. Geophys. Res., 88, 4579–4592.
Daubechies, I. (1988): Orthonormal basis of compactly supported wavelets. Comm. Pure Appl. Math., 41, 909–996.
Erlebacher, G., M. Y. Hussaini and L. Jameson (1996): Wavelets: Theory and Applications. Oxford.
Ezer, T. and G. L. Mellor (1994): Continuous assimilation of Geosat altimeter data into a three-dimensional primitive equation Gulf Stream model. J. Phys. Oceanogr., 24, 832–847.
Fu, L.-L., E. J. Christensen, C. A. Yamarone, Jr., M. Lefebvre, Y. Menard, M. Dorrer and P. Escudier (1994): TOPEX/POSEIDON mission overview. J. Geophys. Res., 99, C12, 24,369–24,381.
Fukumori, I. and P. Malanotte-Rizzoli (1995): An approximate Kalman filter for ocean data assimilation: An example with an idealized Gulf Stream. J. Geophys. Res., 100, 4, 6777–6793.
Fukumori, I., R. Raghunath, L. Fu and Y. Chao (1999): Assimilation of TOPEX/Poseidon altimeter data into a global ocean circulation model: How good are the results? J. Geophys. Res., 104, 11, 25,647–25,665.
Jameson, L. (1998): A wavelet-optimized, very high-order adaptive grid and order numerical method. SIAM J. Sci. Comput., 19, 1980–2013.
Jameson, L. and T. Waseda (2000): Error estimation using wavelet analysis for data assimilation: EEWADAi. J. Atmos. Oceanic Technol., 17, 9, 1235–1246.
Jameson, L., T. Waseda and H. Mitsudera (2002): Scale utilization and optimization from wavelet analysis for data assimilation: SUgOiWADAi. J. Atmos. Oceanic Technol., 19, 5, 747–758.
Luo, J. and L. Jameson (2002): A wavelet-based technique for identifying, labeling, and tracking of ocean eddies. J. Atmos. Oceanic Technol., 19, 3, 381–390.
Malanotte-Rizzoli, P. and W. R. Holland (1986): Data constraints applied to models of the ocean general circulation, Part I: The steady case. J. Phys. Oceanogr., 16, 1665–1687.
Mellor, G. L. and T. Ezer (1991): A Gulf stream model and an altimetry assimilation scheme. J. Geophys. Res., 96, C5, 8779–8795.
Mitsudera, H., Y. Yoshikawa, B. Taguchi and H. Nakamura (1997): High-resolution Kuroshio/Oyashio system model: Preliminary results. JAMSTECR, 36, 147–155.
Pham, D. T., J. Verron and M. C. Roubaud (1998): A singular evolutive extended Kalman filter for data assimilation in oceanography. J. Mar. Sys., 16, 323–340.
Smith, N. and M. Lefebvre (1998): The Global Ocean Data Assimilation Experiment. Paper presented at the International Symposium "Monitoring the Oceans in the 2000s: An Integrated Approach", 15–17 October 1998, Biarritz, France.
Verron, J., L. Gourdeau, D. T. Pham, R. Murtugudde and A. J. Busalacchi (1999): An extended Kalman filter to assimilate satellite altimeter data into a nonlinear numerical model of the tropical Pacific Ocean: Method and validation. J. Geophys. Res., 104, C3, 5441–5458.
Wilson, S. (2000): Launching the ARGO armada. Oceanus, 42, 1, 17–19.