Nonparametric Wavelet Methods for Nonstationary Time Series

Rainer von Sachs

October 3, 1998
FB Mathematik, Universität Kaiserslautern, D-67653 Kaiserslautern, Germany
e-mail: [email protected]
and
Institut de Statistique, UCL, Louvain-la-Neuve, Belgium
e-mail: [email protected]
Abstract
This article gives an overview of nonparametric modelling of nonstationary time series and the estimation of their time-changing spectral content by modern denoising (smoothing) methods. For the modelling aspect, localized decompositions such as various local Fourier (spectral) representations are discussed, among which wavelet and local cosine bases are the most prominent ones. For the estimation of the possibly time-varying coefficients of these local representations, wavelet denoising algorithms are applied, and their particular properties in the context of time-frequency and time-scale analysis are discussed. In particular, non-linear wavelet thresholding, including recent developments generalizing its usefulness to non-identically distributed, correlated and even non-stationary noise, is briefly reviewed, as this is the unifying component of the estimation algorithms. The introduced procedures are illustrated by application to various simulated and real data examples from time-frequency and time-scale analysis, respectively.
1 Introduction
1.1 Overview
This article reviews recent developments by the author and his collaborators on the usefulness of wavelet methods for nonparametric estimation and modelling of the time-varying structure of nonstationary random signals. Wavelets, in particular denoising by non-linear wavelet thresholding, have become popular in the field of statistical curve estimation, where the first theoretical results were established for the simple case of Gaussian i.i.d. data. Quite naturally the question arises whether it is possible to transfer these results and use appropriately modified wavelet denoising in more general signal-in-noise situations, such as regression and spectral density estimation with non-Gaussian, dependent, heteroskedastic and even nonstationary errors. This question has been addressed by embedding the respective denoising problem into the paradigm of minimax estimation: by methods described in a general overview in [NvS95], it can be shown that non-linearly thresholded wavelet estimators perform uniformly well over a whole range of function classes, measured by the "worst-case" $L_2$-risk between the estimator and the unknown function, modelled to lie in a ball of one of these function classes. That is, the resulting estimators attain the near-optimal minimax rate for the $L_2$-risk known from idealized Gaussian white noise models. In particular, the most general case of this principle along these lines has been delivered by the work of [vSMG], which treats regression with so-called locally stationary errors, a special case of a restricted
departure from stationarity (see below). The key to transferring the well-known results from the Gaussian i.i.d. situation is to control the (non-Gaussian) error structure by uniform cumulant conditions: these restrict the dependency in the underlying noise process and can be seen as a sort of mixing condition. As they are given in a uniform manner, they are even sufficient to cope with a nonstationary noise structure. Hence, they are well fitted for the treatment of estimation of a possibly time-varying spectral structure ([vSS]; [NvS]), of the development of wavelet-based tests for stationarity of a given time series ([vSNeu]), as well as of estimation of the trend function of a locally stationary time series ([vSMG]). Finally, they allow for a rigorous treatment of the problem of time-varying autoregression ([DNvS]). All these approaches have in common that it is possible to show asymptotic equivalence of the considered $L_2$ estimation risk to the one in an accompanying Gaussian i.i.d. situation. This is achieved by showing a strong form of asymptotic normality of the (standardized) empirical wavelet coefficients which puts more emphasis on moderate and large deviations. Of course, depending on the given specific model situation, the question of choosing an appropriate threshold in practice always needs some extra consideration, which can be found in the various sections on applications of the cited papers. One aspect is to view wavelet methods, i.e. non-linear thresholding, as a simple and powerful methodology for curve denoising. Another important aspect is that they can either provide new local methods or localize existing approaches which try to estimate the time-varying structure of possibly nonstationary time series.
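To make the thresholding idea concrete, the following is a minimal sketch of non-linear wavelet denoising in the simple Gaussian i.i.d. setting: an orthonormal Haar transform, soft thresholding of all detail coefficients at the universal threshold $\sigma\sqrt{2\log T}$, and reconstruction. This is only an illustration of the principle, not the estimators of the cited papers; all function names are ours.

```python
import numpy as np

def haar_dwt(x):
    """Orthonormal Haar transform of a length-2^J signal; returns the detail
    coefficients per level plus the final scaling coefficient(s)."""
    coeffs = []
    s = np.asarray(x, dtype=float)
    while len(s) > 1:
        coeffs.append((s[0::2] - s[1::2]) / np.sqrt(2.0))  # details
        s = (s[0::2] + s[1::2]) / np.sqrt(2.0)             # smooth part
    return coeffs, s

def haar_idwt(coeffs, s):
    """Invert haar_dwt exactly (the transform is orthonormal)."""
    for d in reversed(coeffs):
        out = np.empty(2 * len(s))
        out[0::2] = (s + d) / np.sqrt(2.0)
        out[1::2] = (s - d) / np.sqrt(2.0)
        s = out
    return s

def denoise(y, sigma):
    """Soft-threshold all detail coefficients at the universal threshold
    sigma * sqrt(2 log T) and reconstruct the curve estimate."""
    lam = sigma * np.sqrt(2.0 * np.log(len(y)))
    coeffs, s = haar_dwt(y)
    coeffs = [np.sign(d) * np.maximum(np.abs(d) - lam, 0.0) for d in coeffs]
    return haar_idwt(coeffs, s)
```

The non-linearity lies entirely in the coefficient-wise shrinkage rule; everything else is an orthonormal change of basis, which is why risk results in the wavelet domain transfer directly to the curve estimate.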
Discussing first the role of wavelets as "post-processing" tools: they can serve as adaptive time-frequency smoothers of Fourier-based (periodogram-based) estimators, as shown in [vSS]. This approach, based on segmented periodograms of locally stationary time series, generalizes, in the above-mentioned sense of nonparametric curve estimation, the work of [Gao] and of [Neu], which treat the stationary situation. Full adaptivity, which goes beyond the question of an appropriate local time-frequency smoother, is addressed in [NvS]: here, by applying non-linear thresholding to the empirical 2-d wavelet coefficients of a fully localized periodogram (a kind of Wigner estimator), an automatic adaptive selection results in a proper time-frequency segmentation of the given time series. A complementary approach, which instead starts from a family of given time segmentations, uses a library of local cosine packet bases in [DoMvS] to find the basis which almost diagonalizes the covariance matrix of a given time series. For an actual estimator of these time-varying autocovariances, again principles of thresholding are used, now on the coefficients of a wavelet expansion of the (squared) cosine packet coefficients per time segment. Thus, appropriately smoothed variance estimators can be used on the empirical side to maximize the Best Basis search criterion, which is a sum of squared variances. This allows one to search for the least smeared-out time-frequency representation of the given time series, and hence for a consistent estimator of its time-changing second-order structure. In all approaches, models of local stationarity, as introduced by R. Dahlhaus ([Dahl]), allow rigorous estimation theory based on rescaling in local time, by analogy with nonparametric regression.
However, wavelets as genuine local representations can themselves be used to represent nonstationary time series. In an approach which in this sense competes with Fourier methods, wavelet representations of stochastic processes are introduced in [vSNK]. First ideas are presented on how to use wavelet periodograms (or scalograms) to estimate the time-scale content of a given time series which shows nonstationary behaviour possibly very different from that of Fourier local stationarity. These provide a local autocorrelation analysis where it might be more appropriate to think of a decomposition of the time series into blocks which are localized both in time and in scale (or "wave") length. In a slightly simplified time-scale model, again using ideas of rescaling in time, a theory is derived of how to estimate what is called an evolutionary wavelet spectrum.
This quantity measures the local power in the variance-covariance decomposition of the process in the time-scale plane, i.e. with respect to location and scale. In [vSNK2], finally, a fully developed model of locally stationary wavelet processes is used to derive adaptive estimation of the evolutionary wavelet spectrum: here, a spectral representation with orthogonal increments (generalizing the classical Cramér representation) is given with respect to a non-decimated or stationary wavelet transform (SWT). This allows for a broader class of models, including classical stationary processes with moderately decaying autocovariances, at the price of a more refined estimation theory which takes the redundancy of the SWT appropriately into account. In both approaches, the principle of local smoothing by the thresholding methods discussed above is used to derive consistent estimators based on smoothed wavelet periodograms. Again, rates for the $L_2$-risk between these estimators and the evolutionary wavelet spectrum are given.
1.2 Outline
In Section 2 we will primarily discuss modelling of the time-varying structure of nonstationary time series. We first introduce the problem of nonstationary spectral analysis and give some motivation why this is a relevant problem at all. Then we introduce rigorous statistical models in the time-frequency and time-scale plane and compare them. We define models of local stationarity in both "worlds", whereas in time-frequency we focus on the approach as introduced by Dahlhaus, see [Dahl]. Whereas in general there is no notion of a unique time-frequency representation, this latter approach allows for an (asymptotically) uniquely defined time-dependent spectrum. Hence a framework for rigorous estimation theory is provided. We will also discuss alternative models by Mallat, see [MPZ], by [MHK] and in the work of [DoMvS], which uses ideas of the Dahlhaus model of renormalization in time, given as regularity conditions on the sequence of covariances $(\gamma_{t,s})$ instead. As for time-scale models, we present the model(s) of locally stationary wavelet processes, both with respect to the standard discrete wavelet transform (DWT) and to the non-decimated or "stationary" wavelet transform (SWT). The first provides a slightly narrower model class, but allows for a nice intuitive development of estimation theory with wavelet periodograms. The second is a much broader approach, at the price of some extra work to control the redundancy in the (non-orthogonal) SWT. Again, concepts of rescaling in local time allow for uniqueness of the spectral representation in the same sense as given above. As non-linear wavelet thresholding will play a major role in all approaches, in Section 3 we turn to a compact, but not too detailed, presentation of it: we briefly discuss wavelet curve estimation and the principle of denoising in the simple Gaussian i.i.d. model, as derived by Donoho and various co-authors (see, e.g., [DJKP]).
Then we discuss, in the form of a benchmark theorem, a typical result for wavelet threshold estimators attaining the near-optimal minimax rate for the $L_2$-risk to the unknown function, modelled to lie in a ball of a certain function class (e.g., Hölder, Sobolev, Besov). We show how this theorem can be proven in the more general situations of non-Gaussian, non-i.i.d. errors, even for non-stationary and correlated data. This enables us to use wavelet denoising for the estimation of evolutionary spectra, for time-varying autoregression and for regression with nonstationary errors. Finally, we discuss some new ideas on how to appropriately modify existing threshold rules for the situation of non-homogeneous variability of the empirical wavelet coefficients. In Section 4 we will discuss in detail various estimation procedures, first in time-frequency, including wavelet-based tests of stationarity, then estimation in time-scale models. Both, under the respective model of local stationarity, have some consistency property, often even some optimality, such as a (near-) optimal rate of the $L_2$-risk between the estimator of the evolutionary (wavelet) spectrum and this theoretical quantity. Also, some simulated and some real data examples are shown to emphasize the usefulness of the estimation algorithms.
2 Time-varying structure of nonstationary time series

2.1 Motivation: The problem of nonstationary spectral analysis
Many phenomena in the applied sciences (e.g., biomedicine, electrical and acoustical engineering, geophysics, and economics) show a nonstationary behavior over time. Hence, time series data collected from these signals should not be modelled by a stationary random process, but rather by some approach which tries to capture the time-varying second-order structure of these data, i.e. an autocovariance function which is no longer simply a function of the time lag, or a time-dependent spectrum, respectively. There have been several attempts in the past to introduce various concepts of time-dependent power spectra or, more generally, time-frequency representations of a nonstationary signal. Among those the Wigner-Ville spectrum plays a prominent role (see, e.g., [MaFl]); more generally it can be considered as one particular member of a general class of time-frequency distributions, the so-called Cohen's class. For a short but good overview we refer to [Adak]. Different approaches either generalize from parametric models, like AR- or ARMA-processes, as in the work of [DNvS], and introduce a time dependency in the parameters of the model, or they directly generalize from the spectral representation of a second-order stationary time series, as in [Pr] and [Dahl], to mention the most prominent ones. We will use the latter, and related concepts of quasi- or locally stationary processes, to provide a framework for a rigorous statistical theory of how to estimate the time-dependent second-order structure of a nonstationary time series. For this, some concept of restricting the departure from stationarity of the time series is needed, for which basically the notion of a slow change of its spectral characteristics is introduced. We would like to formalize this a bit now. Let $X_t$, $1 \le t \le T$, denote an observed segment of a zero-mean process $\{X_t\}$. Let $\Gamma = (\gamma_{t,s})$ denote the covariance matrix of $X_t$, i.e. $\gamma_{t,s} = E X_t X_s$.
If the process $\{X_t\}$ is stationary, then we can estimate this covariance $\gamma_{t,s} = \gamma(t-s)$, or its Fourier transform, the spectrum, from one single realization of $X$, a vector of length $T$. In general, however, without further assumptions on the time-varying spectrum of a nonstationary time series this is no longer possible; hence $n > 1$ replications of the set of observed data $X_1^{(i)}, \dots, X_T^{(i)}$, $1 \le i \le n$, are needed. Then one would use the empirical covariance estimate
$$C = n^{-1} \sum_{i=1}^{n} X^{(i)} \, (X^{(i)})' \;,$$
which is fairly good if $n \gg T$ (for asymptotics, $T$ would be fixed and $n$ would tend to infinity). In the setting of the applications described above, however, $T$ is often very large (think of realizations of a voice signal or a digital communications signal, where $T$ can be in the thousands or millions) while $n$ may be quite small: dozens or hundreds, or even just one. In this setting, the empirical covariance is useless, and one has to assume that the process, though nonstationary, is slowly changing in its spectral characteristics. This restriction to locally or quasi-stationary time series allows the estimation of the time-dependent spectrum from a single record of the process. A slowly changing process is, roughly speaking, a time series for which at each local time point an interval of stationarity can be defined, within which the covariance is stationary or at least approximately so. Consequently, the covariance needs to decay sufficiently fast over a region which allows for its faster variation in local time.
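As a quick numerical illustration of why replications help (a toy sketch of ours, not taken from the cited work): with $n \gg T$ i.i.d. replications stored as the columns of a $T \times n$ matrix, the estimator $C = n^{-1}\sum_i X^{(i)}(X^{(i)})'$ recovers a completely general, non-Toeplitz covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 16, 2000                          # short series, many replications
A = rng.standard_normal((T, T))
Gamma = A @ A.T                          # an arbitrary (nonstationary) covariance
L = np.linalg.cholesky(Gamma)
X = L @ rng.standard_normal((T, n))      # column i is one replication X^(i)

C = (X @ X.T) / n                        # C = n^{-1} sum_i X^(i) (X^(i))'
rel_err = np.abs(C - Gamma).max() / np.abs(Gamma).max()
print(rel_err)                           # small, since n >> T
```

With $n$ small relative to $T$ the same estimator degrades badly, which is precisely the single-record situation that the locally stationary models below are designed to handle.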
2.2 Locally stationary time-frequency models
For the later purpose of estimation theory (see Section 4.1), we now introduce the approach of [Dahl], which we explain in the first subsection. In the second subsection, however, we also discuss alternative models, i.e. the ones of [MPZ] and [MHK], with some focus on the recent one of [DoMvS]. Finally, the models of [Adak], motivated by piecewise stationary processes, and the time-varying autoregressive model of [DNvS] are also briefly mentioned.
2.2.1 Locally stationary processes and evolutionary spectra
The approach of [Dahl] is a straightforward generalization of the Cramér representation of a stationary stochastic process (see [Pr2], e.g.). This approach defines a sequence of doubly-indexed processes as follows:
Definition 2.1 (Dahlhaus) A sequence of stochastic processes $\{X_{t,T}\}_{t=1,\dots,T}$ is called locally stationary with transfer function $A^o$ and trend $\mu$ if there exists a representation
$$X_{t,T} = \mu\Big(\frac{t}{T}\Big) + \int_{-\pi}^{\pi} A^o_{t,T}(\omega)\, \exp(i \omega t)\; d\xi(\omega) \;, \qquad (1)$$
where
(i) $\xi(\omega)$ is a stochastic process on $[-\pi, \pi]$ with $\overline{\xi(\omega)} = \xi(-\omega)$, $E\xi(\omega) = 0$ and orthonormal increments, i.e. $\operatorname{cov}(d\xi(\omega),\, d\xi(\omega')) = \delta(\omega - \omega')\, d\omega$; and where
(ii) there exists a positive constant $K$ and a smooth function $A(u, \omega)$ on $[0,1] \times [-\pi, \pi]$ which is $2\pi$-periodic in $\omega$, with $A(u, -\omega) = \overline{A(u, \omega)}$, such that for all $T$,
$$\sup_{t, \omega}\, \big| A^o_{t,T}(\omega) - A(t/T, \omega) \big| \;\le\; K\, T^{-1} \;. \qquad (2)$$
$A(u, \omega)$ and $\mu(u)$ are assumed to be continuous in $u$. The asymptotics are based on rescaling in time location, which allows asymptotic inference starting from a single realization of $\{X_{t,T}\}$. This is possible because the smoothness of $A$ in $u$ controls the variation of $A^o_{t,T}(\omega)$ as a function of $t$, such that it is allowed to change only slowly enough. The idea is essentially that one implicitly assumes some local interval of stationarity which determines this variation. This neighbourhood becomes asymptotically arbitrarily small in the rescaled time $u$; in other words, in actual time $t$ it gets asymptotically larger, but at a slower rate than the length $T$ of the whole time series (cf. Section 4.1.1 below). Estimation theory then parallels that of nonparametric regression with an asymptotically denser and denser design on $(0,1)$. There is a variety of possibilities for exact smoothness assumptions on $A(u, \omega)$: basically, bounded partial derivatives in both time and frequency are needed, but without too much effort these can be relaxed quite a bit, to something slightly stronger than continuity and being of bounded variation, see [NvS]. Though, quite naturally, the locally stationary process is not uniquely determined by the sequence $\{A^o_{t,T}\}$, this model allows one to define a unique underlying time-varying spectrum of $\{X_{t,T}\}$, as follows. Consider first, for $u \in (0,1)$,
$$f_T(u, \omega) = \frac{1}{2\pi} \sum_{s=-\infty}^{\infty} \operatorname{Cov}\big\{ X_{[uT - s/2],\,T},\; X_{[uT + s/2],\,T} \big\} \, \exp(-i \omega s) \;, \qquad (3)$$
where the $X_{t,T}$'s are given by (2.1), with $A^o_{t,T}(\omega) = A(0, \omega)$ for $t < 1$ and $A^o_{t,T}(\omega) = A(1, \omega)$ for $t > T$. This quantity, for fixed $T$, is similar to the so-called Wigner-Ville spectrum (see, e.g., [MaFl]). Then, by the following (4) and (5), $f_T(u, \omega)$ will be related to the smooth amplitude function $A(u, \omega)$, which defines the "evolutionary spectrum":
Definition 2.2 (Dahlhaus) The evolutionary spectrum of $\{X_{t,T}\}$ given in (1) is defined, for $u \in (0,1)$, by
$$f(u, \omega) := | A(u, \omega) |^2 \;. \qquad (4)$$
This $f(u, \omega)$ is the limit of $f_T(u, \omega)$ as $T \to \infty$, in general in some mean-square sense, as shown in [NvS], Theorem 3.1:
$$\int_0^1 \int_{-\pi}^{\pi} \big| f_T(u, \omega) - f(u, \omega) \big|^2 \; d\omega \; du \;=\; o(1) \;. \qquad (5)$$
Some examples of locally stationary processes are the following:

Modulated processes: $X_{t,T} = \sigma^o_{t,T}\, Y_t$, where $\{Y_t\}$ is a stationary process and $|\sigma^o_{t,T} - \sigma(t/T)| = O(T^{-1})$ for some smooth function $\sigma(u)$ on $(0,1)$. The resulting evolutionary spectrum is simply a one-dimensional function of $u$ times a one-dimensional function of frequency, the spectrum of $\{Y_t\}$.

Time-varying moving average processes: Here, the sequence $A^o_{t,T}(\omega)$ coincides with the smooth function $A(t/T, \omega)$, which is a rational function of the frequency $\omega$ (as it is for stationary MA processes). This no longer holds for the next example, due to the additional autoregressive part.

Time-varying ARMA processes: These can again best be characterized by the form of their evolutionary spectrum, which, as a function of frequency, is exactly the same ratio of rational AR and MA parts as in the stationary case. The time dependency simply shows in the dependency of the AR and MA parameters, which are now functions of rescaled time $u$.

In all of these examples one easily observes that the theory of locally stationary processes is a real generalization of classical stationarity: for stationary processes the evolutionary spectrum coincides with the classical spectral density, as should be expected from a concise theory.
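The modulated-process example is easy to simulate. The sketch below (with an AR(1) base process and a sinusoidal modulation, both chosen purely for illustration) checks that the blockwise variance of $X_{t,T} = \sigma(t/T)\, Y_t$ tracks $\sigma^2(u)\,\operatorname{Var}(Y_t)$, i.e. the $u$-dependent factor of the evolutionary spectrum.

```python
import numpy as np

rng = np.random.default_rng(1)
T, phi = 4096, 0.5

# Stationary AR(1) base process: Y_t = phi * Y_{t-1} + eps_t
eps = rng.standard_normal(T)
Y = np.zeros(T)
for t in range(1, T):
    Y[t] = phi * Y[t - 1] + eps[t]

u = np.arange(T) / T                        # rescaled time u = t/T
sigma = 1.0 + 0.8 * np.sin(2 * np.pi * u)   # smooth modulation sigma(u)
X = sigma * Y                               # modulated process X_{t,T}

# Blockwise empirical variance vs. sigma(u)^2 * Var(Y_t), Var(Y_t) = 1/(1-phi^2)
nblk = 16
bl = T // nblk
emp = np.array([X[i*bl:(i+1)*bl].var() for i in range(nblk)])
theo = np.array([(sigma[i*bl:(i+1)*bl] ** 2).mean()
                 for i in range(nblk)]) / (1 - phi ** 2)
print(np.corrcoef(emp, theo)[0, 1])         # close to 1
```

The frequency profile of the spectrum is the same in every block (that of $\{Y_t\}$); only its overall level $\sigma^2(u)$ moves, which is exactly the one-dimensional factorization mentioned above.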
2.2.2 Alternative or related models
An alternative idea to describe the deviation from stationarity is the approach of [MPZ], which tries to explicitly control first and higher order derivatives of a continuum symbol describing a generalized Fourier spectrum of the nonstationary process. In contrast to the Dahlhaus approach, it works with explicit bounds on the variation in time of the spectrum and with explicit local intervals of stationarity within which the covariance behaves approximately like a stationary one. So, the [MPZ] approach might be more appropriate for the approximation of a given time series of finite sample size by quasi-stationary models (if one has sufficient knowledge of the explicit bounds). It does not allow, however, for an asymptotic estimation theory. In the same spirit is the concept of underspread processes of [MHK], where the deviation from stationarity is modelled by the so-called spread. Basically this expresses a control of the trade-off between a slow decay of the autocorrelation and a fast change of the second-order structure as a function of local time. This is accompanied by a definition of, and analysis by, an evolutionary Weyl spectrum of nonstationary random processes. The concept of [DoMvS] goes beyond these ideas and uses renormalization in time as in the Dahlhaus model, by imposing only an inhomogeneous local stationarity on the sequence of covariances $(\gamma_{t,s})$: assume a triangular array of processes $\{X_{t,T}\}_{1,\dots,T}$, in which the $T$-th process obeys a uniform decay of the correlation,
$$\sum_u \big(1 + 2|u|^{\alpha_1}\big)^2\, \gamma^2_{t,t+u} \;\le\; Q_1 \;, \qquad \forall\, t = 1, \dots, T \;, \qquad (6)$$
and a quasi-stationarity of the covariance,
$$\operatorname{Ave}_t\, \big\| \gamma_{t,\cdot} - \gamma_{t+h,\cdot+h} \big\|_{\ell_2} \;\le\; Q_{2,T}\, |h|^{\alpha_2} \;, \qquad \forall\, h \;. \qquad (7)$$
Here $Q_1$ is independent of $T$, but $Q_{2,T} = Q_{2,1}/T^{\alpha_2}$ depends on $T$; this is basically the necessary smoothness assumption on the time-varying behavior of the covariances and, for $\alpha_2 = 1$, would parallel equation (2) of the previous subsection. With this, an explicit bound on the deviation of the given covariance $\gamma_{s,t}$ from a stationary one is given, which asymptotically tends to zero. The presence of the averaging component in (7), i.e. the average over all $t$, says that on average nearby rows of the covariance matrix are similar, but that occasionally nearby rows can be very different. In contrast to the notion of local stationarity of the previous subsection, this allows for processes which are typically almost stationary, but with a few sudden changes of regime. This inclusion of an inhomogeneous change calls for certain inhomogeneous partitionings of the time-frequency plane, and then allows for faster rates of approximation and convergence of appropriately defined autocovariance estimators (see Section 4.1.5). We also like to mention the work of [Adak], which uses a slightly modified version of the Dahlhaus model. This leads to a class of locally stationary processes for which simple piecewise stationary models are a good (mean-square) approximation. Finally, in [DNvS], time-varying autoregressive models are used, which are a special case of the general Dahlhaus class of locally stationary processes. That is, in this model of an again doubly-indexed sequence of AR processes, the autoregressive parameters are now functions of rescaled time $u$ and allow an estimation theory as in traditional nonparametric regression, see Section 4.1.4.
2.3 Locally stationary time-scale models
As an alternative to time-frequency decompositions of nonstationary processes, time-scale models express a different idea of a local autocovariance analysis: here the process is decomposed into atoms which match the typical correlation length of $X_t$ in a local neighborhood of $t$, i.e. this correlation length is matched by the scale (or "wave") length of the building blocks. A very simple example would be the superposition of boxcar functions $\psi(t-k)$ with (uncorrelated) zero-mean random amplitudes $\xi_k$,
$$X_t = \sum_k \xi_k\, \psi(t - k) \;, \qquad \psi(t) = 1_{[0,1]}(t) \;,$$
or, to allow for truly changing scale lengths $a_j$ of the atoms,
$$X_t = \sum_{j,k} \xi_{jk}\, \psi\big(a_j (t - k)\big) \;, \qquad \psi(t) = 1_{[0,1/2]}(t) - 1_{(1/2,1]}(t) \;,$$
which resembles a stochastic (mean-square) wavelet decomposition (with Haar wavelets). In fact, one possibility arises for the specific choice $a_j = 2^j$, which gives a discretely indexed family of wavelets $\{\psi_{jk}\}$, $\psi_{jk}(t) = 2^{j/2}\, \psi(2^j t - k)$, not necessarily restricted to the Haar family, but of course allowing for other orthogonal wavelets. (For a slightly more detailed introduction to wavelets we refer to Section 3.1, given, however, in the context of curve estimation.) This specific set-up and its statistical treatment has been introduced by [vSNK], where a wavelet decomposition of a stochastic process is developed which parallels a time-localized Cramér (Fourier) spectral representation, again to be understood in the mean-square sense. Instead of thinking of scale in terms of "inverse frequency", they start from genuine time-scale building blocks or "atoms". Then, typically, the variance ("power") of a coefficient $\xi_{j,k}$ will be large if at time $t = 2^{-j} k$ there is high correlation of $X_t$ with $X_{t-\tau}$ or $X_{t+\tau}$ for some $\tau$ that is of the order of the scale length of the atom $\psi_{jk}(t)$ (basically the length of the support of this wavelet), which is proportional to $2^{-j}$ (assuming that $\psi(t)$ is localized at time zero, which it nearly always is in practice). The promising aspect of this model is that it allows local (time) variation, in a way that fast "oscillations" are modeled to change quickly and slow "oscillations" to change slowly. Some related approaches should be mentioned, e.g. that of [MoCh], which however covers only the stationary situation, and that of [AGF], which uses a concept similar to the following concept of wavelet spectral analysis in a different context, i.e. the analysis of $1/f$ processes. For second-order analysis, a wavelet spectrum is introduced which is the expected value of the squared modulus of the coefficients $\xi_{j,k}$. This quantity measures the local power in the variance-covariance decomposition of the process $\{X_t\}$ at a certain scale $j$ and a time location $k$, and delivers a time-scale decomposition in as much as (1) represents a time-frequency decomposition. In comparison with so-called "scalegrams", where for fixed $j$ the power in a whole "frequency band" corresponding to the inverse of the scale $j$ is considered via $\sum_k |\xi_{j,k}|^2$, this new approach delivers a truly local analysis (cf. Figure 7 in Section 4.2.1). In continuation of this work, [vSNK2] replace the discrete orthogonal wavelet basis by a non-decimated or "stationary" one (see [NaSi]). This leads to localized representations which seem to provide just the right amount of discretization (of continuously indexed time-scale transforms) needed for a most general but still meaningful two-parameter representation with respect to uncorrelated increments. Hence, they truly generalize the classical Cramér spectral decomposition and include classical stationary processes with moderately decaying autocovariances. In a model of locally stationary wavelet processes (see Definition 2.3 below), a theory is developed which allows the definition and estimation of the evolutionary wavelet spectrum, given in Definition 2.4.
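The distinction between a scalegram and a local wavelet spectrum can be seen in a toy computation (ours, purely illustrative, not the estimator of [vSNK]): build a series from Haar atoms with a short scale in its first half and a long scale in its second half, and inspect the squared Haar coefficients $|d_{j,k}|^2$ at the short scale. A scalegram would sum this power over all $k$, losing exactly the time information that the local quantity retains.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1024

def add_haar_atoms(X, start, stop, length, rng):
    """Superpose non-overlapping, unit-norm Haar atoms of the given support
    length with i.i.d. N(0,1) amplitudes on [start, stop)."""
    h = length // 2
    for k in range(start, stop - length + 1, length):
        a = rng.standard_normal() / np.sqrt(length)  # +/- a, atom has norm 1
        X[k:k + h] += a
        X[k + h:k + length] -= a

X = np.zeros(T)
add_haar_atoms(X, 0, T // 2, 4, rng)      # short-scale atoms, first half
add_haar_atoms(X, T // 2, T, 64, rng)     # long-scale atoms, second half

def sq_haar_coeffs(X, length):
    """Squared Haar coefficients |d_{j,k}|^2 at the scale with the given
    support length, at non-overlapping positions k."""
    h = length // 2
    return np.array([((X[k:k + h].sum() - X[k + h:k + length].sum())
                      / np.sqrt(length)) ** 2
                     for k in range(0, len(X) - length + 1, length)])

I_first = sq_haar_coeffs(X[:T // 2], 4)   # short-scale power, first half
I_second = sq_haar_coeffs(X[T // 2:], 4)  # short-scale power, second half
print(I_first.mean(), I_second.mean())    # large vs. (near) zero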
An appropriately defined local autocovariance possesses a parallel theoretical representation, namely equation (16) below, now with respect to an autocorrelation wavelet family. In order to ensure that this spectral representation is unique, in the same sense as for the locally stationary processes of Definition 2.1, we need to investigate a certain transform: the bounded inverse of the operator $A$ to be introduced in (17) below. This matrix measures the degree of redundancy in the overcomplete non-decimated wavelet family. Its bounded inverse guarantees linear independence of the autocorrelation wavelet family (which can even be orthogonalized). Hence, evolutionary wavelet spectrum and local autocovariance are linked by transformation with respect to this autocorrelation wavelet family. With this family, a variety of possibly time-changing autocovariances can be represented much more sparsely than with the classical Fourier decomposition. For examples, we refer to the illustrating Sections 2.3 and 2.4 of [vSNK2], which discuss how the autocovariance function of specific (piecewise) stationary moving average processes of low order can be represented very parsimoniously by just a few elements of the autocorrelation Haar wavelet family. This idea transfers analogously to more general situations. Sparse representations are generally easier to estimate, as well as being appealing in their own right.
2.3.1 LSW processes and evolutionary wavelet spectra

As for the Dahlhaus model in Definition 2.1, again a concept of local stationarity, now in "time" $k$, is needed to allow for identification and estimation of the model and its coefficients. What is mainly needed for asymptotics is a coupling of the lowest wavelet scale length (which is called $-J$ here) to the (asymptotically growing) length of the process itself. Moreover, one needs to collect a growing amount of local information per time-scale. In order to do so, the control of slow variation over $k$ is introduced by the concept of "locally stationary wavelet (LSW) processes", again a doubly-indexed array of processes $\{X_{t,T}\}_{t=1,\dots,T}$, $T \ge 1$, which obeys the following definition.
Definition 2.3 (von Sachs, Nason, Kroisandt (1997)) Define a class of locally stationary wavelet (LSW) processes, a sequence of doubly-indexed stochastic processes $\{X_{t,T}\}_{t=1,\dots,T}$, $T = 2^J \ge 1$, with the following representation in the mean-square sense with respect to a stationary family of wavelets $\{\psi_{jk}(t)\}_{jk}$, with $\psi_{jk}(t) = 2^{j/2}\, \psi(2^j (t - k))$ from a given wavelet $\psi(t)$ of compact support (generating an ONB of $L^2(\mathbb{R})$), and an orthonormal random sequence of increments $\xi_{jk}$:
$$X_{t,T} = \sum_{j=-J}^{-1} \sum_k w^0_{j,k;T}\; \psi_{jk}(t)\; \xi_{jk} \;, \qquad (8)$$
where $2^J = T$, $j = -1, -2, \dots, -J(T) = -\log_2(T)$, $k = 1, \dots, T$, and possessing the following properties:
1. $E \xi_{jk} = 0$ for all $j, k$. Hence $E X_{t,T} = 0$ for all $t$ and $T$.
2. $\operatorname{Cov}(\xi_{jk}, \xi_{\ell m}) = \delta_{j \ell}\, \delta_{k m}$.
3. There exists, for each $j \le -1$, a Lipschitz-continuous function $W_j(z)$ on $(0,1)$ which fulfills the following properties:
$$\sum_{j=-\infty}^{-1} | W_j(z) |^2 < \infty \quad \text{uniformly in } z \in (0,1) \;; \qquad (9)$$
the Lipschitz constants $L_j$ are uniformly bounded in $j$ and
$$\sum_{j=-\infty}^{-1} 2^{-j} L_j$$