TOPICS ON MAX-STABLE PROCESSES AND THE CENTRAL LIMIT THEOREM
by Yizao Wang
A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Statistics) in The University of Michigan 2012
Doctoral Committee: Associate Professor Stilian A. Stoev, Chair; Professor Tailen Hsing; Professor Robert W. Keener; Professor Roman Vershynin; Professor Emeritus Michael B. Woodroofe
ACKNOWLEDGEMENTS
First of all, I am indebted to my thesis advisor, Professor Stilian A. Stoev, for his help and support since 2008. He has been a great mentor in my research career, and he has also given me much help and advice in daily life. This dissertation would not have been possible without him. In particular, the first part of this dissertation was written under his supervision. Second, I am grateful to Professor Emeritus Michael Woodroofe. He sets a very high standard for scholars, and as a young researcher I have been deeply influenced by him in many respects. The second part of this dissertation was written under his supervision. I would also like to thank Professor Yves Atchadé, Professor Tailen Hsing, Professor Bob Keener and Professor Parthanil Roy (from Michigan State University) for many insightful and inspiring discussions on research. I also thank Professor Tailen Hsing, Professor Bob Keener, Professor Roman Vershynin and Professor Michael Woodroofe for serving on my thesis committee. I owe many thanks to all the faculty members and students in the Department of Statistics at the University of Michigan; I have truly enjoyed my five years as a graduate student in Ann Arbor. Finally, I am greatly indebted to my parents for their unconditional support of my pursuit of an academic career abroad during the past years. Without their support I could have achieved nothing. I am also grateful to my wife, Fei Xu, for her companionship, full of encouragement, support and consideration.
TABLE OF CONTENTS

ACKNOWLEDGEMENTS

LIST OF FIGURES

LIST OF TABLES

CHAPTER

I. Introduction
   1.1 Max-stable Processes
   1.2 Central Limit Theorems for Random Fields

II. Preliminaries on Max-stable Processes
   2.1 Spectral Representation and Extremal Integrals
   2.2 Spectrally Continuous and Discrete α-Fréchet Processes

III. Association of Sum- and Max-stable Processes
   3.1 Preliminaries
   3.2 Identification of Max-linear and Positive-linear Isometries
   3.3 Association of Sum- and Max-stable Processes
   3.4 Association of Classifications
   3.5 Proofs of Auxiliary Results

IV. Decomposability of Sum- and Max-stable Processes
   4.1 SαS Components
   4.2 Stationary SαS Components and Flows
   4.3 Decomposability of Max-stable Processes
   4.4 Proof of Theorem IV.1

V. Conditional Sampling for Max-stable Processes
   5.1 Overview
   5.2 Conditional Probability in Max-linear Models
   5.3 Conditional Sampling: Computational Efficiency
   5.4 MARMA Processes
   5.5 Discrete Smith Model
   5.6 Proofs of Theorems V.4 and V.9

VI. Central Limit Theorems for Stationary Random Fields
   6.1 Main Result
   6.2 m-Dependent Approximation
   6.3 A Central Limit Theorem
   6.4 An Invariance Principle
   6.5 Orthomartingales
   6.6 Stationary Causal Linear Random Fields
   6.7 A Moment Inequality
   6.8 Auxiliary Proofs

VII. Asymptotic Normality of Kernel Density Estimators for Stationary Random Fields
   7.1 Assumptions and Main Result
   7.2 Examples and Discussions
   7.3 A Central Limit Theorem for m-Dependent Random Fields
   7.4 Asymptotic Normality by m-Approximation
   7.5 Proofs

BIBLIOGRAPHY
LIST OF FIGURES

5.1 Four samples from the conditional distribution of the discrete Smith model (see Section 5.5), given the observed values (all equal to 5) at the locations marked by crosses.

5.2 Prediction of a MARMA(3,0) process with φ1 = 0.7, φ2 = 0.5 and φ3 = 0.3, based on the observation of the first 100 values of the process.

5.3 Conditional medians (left) and 0.95-th conditional marginal quantiles (right). Each cross indicates an observed location of the random field, with the observed value at right.
LIST OF TABLES

5.1 Means and standard deviations (in parentheses) of the running times (in seconds) for the decomposition of the hitting matrix H, based on 100 independent observations X = A Z, where A is an (n × p) matrix corresponding to a discretized Smith model.

5.2 Cumulative probabilities that the projection predictors correspond to at time 100 + t, based on 1000 simulations.

5.3 Coverage rates (CR) and the widths of the upper 95% confidence intervals at time 100 + t, based on 1000 simulations.
CHAPTER I
Introduction
This dissertation consists of results in two distinct areas of probability theory: extreme value theory and the central limit theorem.

In extreme value theory, the focus is on max-stable processes. Such processes play an increasingly important role in characterizing and modeling extremal phenomena in finance, environmental sciences and statistical mechanics. Several structural and ergodic properties of max-stable processes are investigated via their spectral representations. In addition, the conditional distributions of max-stable processes are studied, and a computationally efficient sampling algorithm is developed. This algorithm has many potential applications in the prediction of extremal phenomena.

For the central limit theorem, the asymptotic normality of partial sums of stationary random fields is studied, with a focus on projective conditions on the dependence. Such conditions, easy to check for many stochastic processes and random fields, have recently drawn much attention for (one-dimensional) time series models in statistics and econometrics. Here, the focus is on (high-dimensional) stationary random fields. In particular, a general central limit theorem for stationary random fields and orthomartingales is established. The method is then extended to establish the asymptotic normality of the kernel density estimator of linear random fields. Below are overviews of the following chapters of this dissertation.

1.1 Max-stable Processes
Max-stable processes arise in the limit of maxima of independent and identically distributed processes. It is well known that all max-stable processes can be transformed to α-Fr´echet processes. A random variable Y is α-Fr´echet with α > 0, if P(Y ≤ y) = exp(−σ α y −α ), y > 0. A stochastic process {Yt }t∈T is α-Fr´echet, if all its max-linear combinations in form W of maxi=1,...,n ai Yti ≡ ni=1 ai Yti , ai > 0, ti ∈ T, i = 1, . . . , n, n ∈ N are α-Fr´echet. It is known since de Haan [23] that under mild regularity conditions, for every αFr´echet process {Yt }t∈T , there exists a class of non-negative, Lα -integrable functions {ft }t∈T ∈ Lα+ (S, BS , µ), such that P(Yt1 ≤ y1 , . . . , Ytn ≤ yn ) = exp
(1.1)
n
−
Z _ n S
fti (s)/yti
α
o µ(ds) .
i=1
Indeed, every such a process has an extremal integral representation as d
{Yt }t∈T =
(1.2)
n Ze
o ft (s)Mα∨ (ds)
S
where ‘
eR
,
t∈T
’ is the symbol of the extremal integral and Mα∨ is an α-Fr´echet random
sup-measure (see Stoev and Taqqu [101]). Preliminary results on max-stable processes can be found in Chapter II. Then, starting with such representation results, structural properties of max-stable processes are investigated. Besides, a careful investigation of its conditional distributions also yields an exact conditional sampling algorithm, which has potential applications of spatial extremes.
Association of max-stable processes to sum-stable processes

The association of α-Fréchet processes with symmetric α-stable (SαS) processes is established in Chapter III. Namely, under mild assumptions, every α-Fréchet process can be associated with an SαS process via spectral representations. This provides theoretical support for the long-standing folklore that the two classes of processes share many similar structural results. The converse, however, is not true: roughly speaking, the class of SαS processes has richer structure than the class of α-Fréchet processes. The association method has become a convenient tool to translate results on SαS processes (e.g. Rosiński [83] and Samorodnitsky [92]) to α-Fréchet processes. By the association method, many structural results on SαS processes have natural counterparts for α-Fréchet processes. See also Kabluchko [50] for an independent treatment with different tools.

Decomposability of max-stable processes

Decomposability properties have been extensively studied for probability distributions, and the notion of decomposability can be generalized to α-Fréchet processes. Namely, letting Y = {Y_t}_{t∈T} be an α-Fréchet process as in (1.2), a natural question is: when can we write

(1.3)  {Y_t}_{t∈T} =^d { Y_t^{(1)} ∨ Y_t^{(2)} }_{t∈T},

where Y^{(i)} = {Y_t^{(i)}}_{t∈T}, i = 1, 2, are two independent α-Fréchet processes? If such processes Y^{(1)}, Y^{(2)} exist, what kind of α-Fréchet processes can they be? To what extent are their structures determined by Y?

A characterization of all possible α-Fréchet components Y^{(i)} is established in Chapter IV. Furthermore, when Y is stationary, a necessary and sufficient condition
for its α-Fréchet component to be stationary is established. Y may have only trivial stationary α-Fréchet components (scaled copies cY with c ∈ (0, 1)), and such a process is said to be indecomposable. Indecomposable processes can be viewed as the elementary building blocks of all stationary α-Fréchet processes. Therefore, to study stationary α-Fréchet processes, it suffices to focus on the indecomposable ones. The decomposability of stationary α-Fréchet processes also provides a different point of view on the classification problem for stationary α-Fréchet processes.

Similar decomposability results also hold for sum-stable processes, which is clear from the association point of view. In fact, we first establish the decomposability result for sum-stable processes, and then obtain the results for max-stable processes by the association method.

Conditional sampling for max-stable random fields

Given an α-Fréchet random field {Y_t}_{t∈Z^d}, what is the conditional distribution

(1.4)  P((Y_{s_1}, …, Y_{s_m}) ∈ · | Y_{t_1}, …, Y_{t_n}) = ?

The conditional distribution formula is established for a dense class of α-Fréchet random fields (the spectrally discrete ones). For such random fields, an explicit, exact formula for the conditional distribution (1.4) is obtained. The hard part of the problem is to provide an efficient algorithm applicable in practice. Such an algorithm is developed, thanks to a certain conditional independence structure of spectrally discrete max-stable random fields.

As a potential application, such an algorithm would play an important role in prediction problems, which arise in many scenarios from different areas. For example, suppose observations of heavy rainfalls are available in an area
at certain locations. Engineers often need estimates (predictions) of the rainfall over the entire area, and this information is useful for building flood-protection infrastructure. Max-stable random fields are natural models for such problems focusing on extremal phenomena.

Remark I.1. The main results in Chapters II, III, IV and V have already been published in peer-reviewed journals ([111], [110], [112] and [113], respectively).

1.2 Central Limit Theorems for Random Fields
In probability theory, the central limit theorem is one of the problems with the longest history: when does

(1.5)  (1/√n) Σ_{k=1}^n (X_k − E X_k) ⇒ N(0, σ²)

hold? While the case where {X_k}_{k=1,…,n} are independent was completely solved more than half a century ago, establishing central limit theorems in the dependent case is still an active area of research. Such limit results are of fundamental importance in various areas, particularly in statistics, where it is important to characterize the cumulative behavior of a large number of individuals.

This dissertation investigates two problems, focusing on central limit theorems for random fields. Namely, given a stationary random field {X_{i,j}}_{(i,j)∈Z²}, we establish conditions on the dependence such that

(1.6)  (1/n) Σ_{i=1}^n Σ_{j=1}^n (X_{i,j} − E X_{i,j}) ⇒ N(0, σ²).

This problem has been investigated by many researchers. Many results in the literature are based on mixing-type conditions (see e.g. Bradley [8]), which are sometimes difficult to check in applications. Here, the focus is on projective-type conditions that are easy to check. Such conditions have recently drawn much attention in the study of central limit theorems for (one-dimensional) stochastic processes, with applications in statistics and econometrics. See for example Dedecker et al. [30] and Wu [118, 119], among others. The extension of the aforementioned results to high dimensions (i.e. random fields) is not trivial, as the main technical tool used there, the martingale approximation method, is not applicable in the multiparameter setting most of the time. Instead, we take an m-approximation approach.

A general central limit theorem

A central limit theorem for stationary random fields (1.6) is established in Chapter VI. A particular example is functionals of linear random fields

(1.7)  X_{i,j} = g( Σ_{k=0}^∞ Σ_{l=0}^∞ a_{k,l} ε_{i−k,j−l} ), (i, j) ∈ Z²,

where {ε_{i,j}}_{(i,j)∈Z²} are i.i.d. random variables and g is often a Lipschitz function. Such models from statistics have recently attracted much attention (see e.g. [15]). Another example is when {X_{i,j}}_{(i,j)∈Z²} are orthomartingale differences (see e.g. Khoshnevisan [53]). In this case, a new central limit theorem for orthomartingales follows from the previous result, generalizing known results [3, 62, 63, 73] in the literature.

Asymptotic normality of kernel density estimators

Consider a causal linear random field {X_{i,j}}_{(i,j)∈Z²} (as in (1.7) with g(x) = x). The kernel density estimator

f_n(x) = (1/(n² b_n)) Σ_{i=1}^n Σ_{j=1}^n K((x − X_{i,j})/b_n)

is said to be asymptotically normal if

(1.8)  √(n² b_n) (f_n(x) − E f_n(x)) ⇒ N(0, σ_x²),

where σ_x² = p(x) ∫ K²(s) ds and p(x) is the density of X_{0,0} at x. Such estimators were first considered for i.i.d. sequences by Rosenblatt [82] and Parzen [65], and have been widely studied since then (see e.g. Wu and Mielniczuk [120] for a treatment of stationary sequences). Sufficient conditions for (1.8) to hold are provided in Chapter VII.

Remark I.2. The results in Chapters VI and VII can be found in [114] and [115], which had been submitted to peer-reviewed journals at the time of writing this dissertation.
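A minimal numerical sketch of the estimator f_n above (NumPy, Gaussian kernel). The i.i.d. standard normal field is the trivial special case of (1.7) with a_{0,0} = 1 and all other a_{k,l} = 0; the grid size, bandwidth, and evaluation point are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, bn, x = 200, 0.1, 0.0                 # grid size, bandwidth, evaluation point

# Simplest special case of (1.7): X is an i.i.d. standard normal field.
X = rng.standard_normal((n, n))

def K(u):                                # Gaussian kernel, integrates to 1
    return np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)

# f_n(x) = (n^2 b_n)^{-1} sum_{i,j} K((x - X_ij) / b_n)
fn = np.sum(K((x - X) / bn)) / (n ** 2 * bn)

p = 1 / np.sqrt(2 * np.pi)               # true density of X_{0,0} at x = 0
assert abs(fn - p) < 0.03
```

With n² = 40,000 observations and a small bandwidth, f_n(0) is close to p(0); (1.8) describes the fluctuations of f_n(x) around its mean on the scale (n² b_n)^{-1/2}.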
CHAPTER II
Preliminaries on Max-stable Processes
Max-stable processes have been studied extensively in the past 30 years. The works of Balkema and Resnick [2], de Haan [22, 23], de Haan and Pickands [26], Giné et al. [38] and Resnick and Roy [78], among many others, have led to a wealth of knowledge on max-stable processes. The seminal works of de Haan [23] and de Haan and Pickands [26] laid the foundations of the spectral representations of max-stable processes and established important structural results for stationary max-stable processes. Since then, however, while many authors focused on various important aspects of max-stable processes, the general theory of their representation and structural properties had not been thoroughly explored. At the same time, the structure and classification of sum-stable processes have been vigorously studied. Rosiński [83], building on the seminal works of Hardin [45, 46] on minimal representations, developed the important connection between stationary sum-stable processes and flows. This led to a number of important contributions on the structure of sum-stable processes (see, e.g., [86, 84, 70, 71, 92]). There are relatively few results of this nature about the structure of max-stable processes, with the notable exceptions of de Haan and Pickands [26], Davis and Resnick [20] and the very recent works of Kabluchko et al. [51] and Kabluchko [50].
This chapter collects preliminary results on max-stable processes and their (stochastic) extremal integral representation introduced by Stoev and Taqqu [101] (see also Wang and Stoev [111]). This representation, essentially equivalent to the one by de Haan [23], provides a natural connection to sum-stable processes (see e.g. Samorodnitsky and Taqqu [93]). This connection is explored in Chapters III and IV.

2.1 Spectral Representation and Extremal Integrals
It is well known that the univariate marginals of a max-stable process are necessarily extreme value distributions, i.e. up to rescaling and shift they are either Fréchet, Gumbel or negative Fréchet. The extreme value distributions arise as limits of normalized maxima of independent and identically distributed random variables:

⋁_{i=1}^n (X_i − b_n)/a_n ⇒ Z.

If the weak convergence holds and Z is non-degenerate, then it must have one of the above-mentioned distributions (see e.g. [76], Proposition 0.3). Similarly, given independent and identically distributed stochastic processes {X_t^{(i)}}_{t∈T}, i ∈ N, if

{ (⋁_{i=1}^n X_t^{(i)} − b_n(t))/a_n(t) }_{t∈T} ⇒ {Z_t}_{t∈T}

for some {a_n(t)}_{t∈T} ∈ R_+^T, {b_n(t)}_{t∈T} ∈ R^T, then the limiting process is necessarily a max-stable process.

We focus on a special class of max-stable processes: the α-Fréchet processes. Recall that a positive random variable Z ≥ 0 has an α-Fréchet distribution, α > 0, if P(Z ≤ x) = exp{−σ^α x^{−α}}, x ∈ (0, ∞). Here ||Z||_α := σ > 0 stands for the scale coefficient of Z. A stochastic process {X_t}_{t∈T} is α-Fréchet if all max-linear combinations

(2.1)  max_{1≤j≤n} a_j X_{t_j} ≡ ⋁_{j=1}^n a_j X_{t_j}, for all a_j > 0, t_j ∈ T, j = 1, …, n,
are α-Fréchet random variables. Any max-stable process can be transformed into an α-Fréchet process by simply transforming its one-dimensional distributions into α-Fréchet ones (see e.g. [76], Chapter 5.4).

The seminal work of de Haan [23] provides convenient spectral representations for stochastically continuous α-Fréchet processes in terms of functionals of Poisson point processes on (0, 1) × (0, ∞). Here, we adopt the slightly more general, but essentially equivalent, approach of representing max-stable processes through extremal integrals with respect to random sup-measures (see Stoev and Taqqu [101]). We do so in order to emphasize the analogies with the well-developed theory of sum-stable processes (see e.g. Samorodnitsky and Taqqu [93] and Chapter III below).

Given a measure space (S, B_S, μ) and α > 0, {M_α(A)}_{A∈B_S} is said to be an α-Fréchet random sup-measure with control measure μ, if (i) the M_α(A_i)'s are independent random variables for disjoint A_i ∈ B_S, 1 ≤ i ≤ n, (ii) M_α(A) is α-Fréchet with scale coefficient ||M_α(A)||_α = μ(A)^{1/α}, and (iii) for all disjoint A_i's, i ∈ N, we have M_α(⋃_{i∈N} A_i) = ⋁_{i∈N} M_α(A_i), almost surely. One can then define the extremal integral of a non-negative simple function f(u) := Σ_{i=1}^n a_i 1_{A_i}(u) ≥ 0 with disjoint A_1, …, A_n ∈ B_S:

∫^e_S f dM_α ≡ ∫^e_S f(u) M_α(du) := ⋁_{1≤i≤n} a_i M_α(A_i).

One can show that ∫^e_S f dM_α is an α-Fréchet random variable with scale coefficient (∫_S f^α dμ)^{1/α}. The definition of ∫^e_S f dM_α can, by continuity in probability, be extended to integrands f in the space of non-negative, L^α-integrable measurable functions L^α_+(S, μ) := {f ∈ L^α(S, μ) : f ≥ 0}. Here and in the sequel, we may write (S, μ) = (S, B_S, μ) for simplicity. Extremal integrals are sometimes referred to as stochastic extremal integrals to emphasize that they are random variables; we omit the term 'stochastic' for the sake of simplicity.

Extremal integrals parallel the notion of stochastic integrals based on SαS random measures ([101]). In particular, two important properties of extremal integrals are: (i) the random variables ∫^e_S f_j dM_α, j = 1, …, n, are independent if and only if the f_j's have pairwise disjoint supports (mod μ), and (ii) the extremal integral is max-linear:

∫^e_S (af ∨ bg) dM_α = a ∫^e_S f dM_α ∨ b ∫^e_S g dM_α,

for all a, b > 0 and f, g ∈ L^α_+(S, μ). For more details, see Stoev and Taqqu [101].

Now, for any collection of deterministic functions {f_t}_{t∈T} ⊂ L^α_+(S, μ), one can construct the stochastic process

(2.2)  X_t = ∫^e_S f_t(u) M_α(du), for all t ∈ T.
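On a finite space the extremal integral of a simple function is just a finite maximum, and its scale coefficient (∫_S f^α dμ)^{1/α} can be verified by simulation. A NumPy sketch (the control measure, the simple function, and the test point are arbitrary toy choices):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = 2.0
mu = np.array([0.5, 1.0, 2.0])   # mu(A_1), mu(A_2), mu(A_3): toy control measure
f  = np.array([3.0, 1.0, 0.5])   # simple function f = sum_i f_i 1_{A_i}

n = 200_000
# M_alpha(A_i): independent alpha-Frechet variables with scale mu(A_i)**(1/alpha)
M = mu ** (1 / alpha) * (-np.log(rng.uniform(size=(n, 3)))) ** (-1 / alpha)
I = np.max(f * M, axis=1)        # extremal integral of the simple function f

# Its scale coefficient should be (int_S f^alpha dmu)**(1/alpha)
sigma = np.sum(f ** alpha * mu) ** (1 / alpha)
x = 3.0
emp = np.mean(I <= x)
theo = np.exp(-(sigma / x) ** alpha)
assert abs(emp - theo) < 0.01
```

The empirical law of the maximum matches the α-Fréchet law with the claimed scale, illustrating how independence over disjoint cells and max-linearity combine in the definition above.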
In view of the max-linearity of the extremal integrals and (2.1), the resulting process X = {X_t}_{t∈T} is α-Fréchet. Furthermore, for any n ∈ N, x_i > 0, t_i ∈ T, i = 1, …, n:

(2.3)  P(X_{t_1} ≤ x_1, …, X_{t_n} ≤ x_n) = exp{ −∫_S ⋁_{i=1}^n (x_i^{−1} f_{t_i}(u))^α μ(du) }.

This shows that the deterministic functions {f_t}_{t∈T} completely characterize the finite-dimensional distributions of the process X. In general, if

(2.4)  {X_t}_{t∈T} =^d { ∫^e_S f_t dM_α }_{t∈T}

for some {f_t}_{t∈T} ⊂ L^α_+(S, μ), we shall say that the process X has the extremal integral representation, or spectral representation, {f_t}_{t∈T} over the space L^α_+(S, μ). The f_t's in (2.4) are also referred to as spectral functions of X. In this dissertation, we let '=^d' denote 'equal in finite-dimensional distributions'.

Many α-Fréchet processes of practical interest have tractable spectral representations, with (S, B_S, μ) being a standard Lebesgue space. A measurable space (S, S, ν) is a standard Lebesgue space if (S, S) is a standard Borel space and ν is a σ-finite measure. A standard Borel space is a measurable space measurably isomorphic (i.e., there exists a one-to-one, onto and bi-measurable map) to a Borel subset of a Polish space. For example, a Polish space with a σ-finite measure on its Borel sets is standard Lebesgue, and one often chooses (S, B_S, μ) = ([0, 1], B_{[0,1]}, Leb) in (2.4). (For more discussion of standard Lebesgue spaces and stationary sum-stable processes, see Appendix A in [71].) As shown in Proposition 3.2 in [101], an α-Fréchet process X has a representation (2.4) with (S, B_S, μ) standard Lebesgue if and only if X satisfies Condition S.

Condition S. There exists a countable subset T_0 ⊆ T such that for every t ∈ T, we
have that X_{t_n} →^P X_t for some {t_n}_{n∈N} ⊂ T_0.

Note that without Condition S, every max-stable process X still has a spectral representation as in (2.4), but the space (S, μ) may not be standard Lebesgue (see Theorem 1 in [50]).

Remark II.1. The assumption that (S, μ) is a standard Lebesgue space implies that the space of integrands L^α_+(S, μ) is a complete and separable metric space with respect to the metric

(2.5)  ρ_{μ,α}(f, g) = ∫_S |f^α − g^α| dμ.

This metric is natural when handling extremal integrals, since as n → ∞,

(2.6)  ∫^e_S f_n dM_α →^P ξ, if and only if, ρ_{μ,α}(f_n, f) = ∫_S |f_n^α − f^α| dμ → 0,

where ξ = ∫^e_S f dM_α (see e.g. [101]). (Such a metric naturally induces a metric on the space of jointly α-Fréchet random variables.) By default, we equip the space L^α_+(S, μ) with the metric ρ_{μ,α} and often write ||f||_{L^α_+(S,μ)} for (∫_S f^α dμ)^{1/α}. Note that ||·||_{L^α_+(S,μ)} is not a norm unless α ≥ 1.

We focus only on the rich class of α-Fréchet processes that satisfy Condition S. In particular, we want f_t(s) to be jointly measurable as a function from (T, S) to R_+. Here, we suppose T is a σ-algebra on T and the measurability is w.r.t. the product σ-algebra T ⊗ B_S := σ(T × B_S). The following result clarifies the connection between the joint measurability of the spectral functions f_t(s) and the measurability of the corresponding α-Fréchet process. The proof can be found in [111].

Proposition II.2. Let (S, μ) be a standard Lebesgue space and M_α (α > 0) be an α-Fréchet random sup-measure on S with control measure μ. Suppose (T, ρ_T) is a separable metric space and T is the Borel σ-algebra.

(i) Let X = {X_t}_{t∈T} have a spectral representation {f_t}_{t∈T} ⊂ L^α_+(S, μ) as in (2.4). Then X has a measurable modification if and only if {f_t(s)}_{t∈T} has a jointly measurable modification, i.e., there exists a T ⊗ B_S-measurable mapping (s, t) ↦ g_t(s) such that f_t(s) = g_t(s) μ-a.e. for all t ∈ T.

(ii) If an α-Fréchet process {X_t}_{t∈T} has a measurable modification, then it satisfies Condition S, and hence it has a representation as in (2.4).

We always assume (T, ρ_T) is a separable metric space and T is the Borel σ-algebra. By Proposition II.2, any measurable α-Fréchet process {X_t}_{t∈T} always has a jointly measurable spectral representation and satisfies Condition S.
2.2 Spectrally Continuous and Discrete α-Fréchet Processes
Definition II.3. Consider an α-Fréchet process X = {X_t}_{t∈T}. We say X is spectrally discrete if X can be represented as

{X_t}_{t∈T} =^d { ⋁_{i∈Z} f_t(i) Z_i }_{t∈T},

where {Z_i}_{i∈Z} are i.i.d. standard α-Fréchet random variables and, for each t ∈ T, the map f_t : Z → R_+ satisfies Σ_i f_t(i)^α < ∞. The α-Fréchet process X is spectrally continuous if X cannot be represented as

{X_t}_{t∈T} =^d { X_t^{(1)} ∨ X_t^{(2)} }_{t∈T}

with two independent non-degenerate α-Fréchet processes {X_t^{(1)}}_{t∈T}, {X_t^{(2)}}_{t∈T} such that one of them is spectrally discrete.

Theorem II.4. Let {X_t}_{t∈T} be an α-Fréchet process with jointly measurable representation {f_t}_{t∈T} ⊂ L^α_+(S, μ). Then there exist spectrally continuous and spectrally discrete α-Fréchet processes {X_t^{cont}}_{t∈T} and {X_t^{disc}}_{t∈T}, such that the two processes are independent and

{X_t}_{t∈T} =^d { X_t^{cont} ∨ X_t^{disc} }_{t∈T}.

Furthermore, this decomposition is unique in distribution.

The proof can be found in [111]. The processes {X_t^{cont}}_{t∈T} and {X_t^{disc}}_{t∈T} are referred to as the spectrally continuous and spectrally discrete components of X, respectively.

Example II.5. Let Z_i, i ∈ N, be independent standard α-Fréchet variables and let f_t(i) ≥ 0, t ∈ T, be such that Σ_{i∈N} f_t^α(i) < ∞ for all t ∈ T. Spectrally discrete α-Fréchet processes have the stochastic extremal integral representation

X_t := ⋁_{i∈N} f_t(i) Z_i ≡ ∫^e_N f_t dM_α, t ∈ T,
where M_α is an α-Fréchet random sup-measure on N with counting control measure. Spectrally discrete max-stable processes have a simple structure of conditional distributions, which will be explored in Chapter V.

Example II.6. Consider the well-known α-Fréchet extremal process (α > 0):

(2.7)  {X_t}_{t∈R_+} =^d { ∫^e_{R_+} 1_{(0,t]}(u) M_α(du) }_{t∈R_+},

where M_α has Lebesgue control measure on R_+ (see e.g. [76], Chapter 4). The α-Fréchet extremal process X is spectrally continuous.
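A spectrally discrete process as in Example II.5 is easy to simulate, and both the marginal scale and the finite-dimensional formula (2.3) can be checked numerically. A NumPy sketch of a stationary moving-maxima field (the coefficients a_i and all sizes are arbitrary toy choices, not a model from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = 1.0
a = np.array([1.0, 0.5, 0.25])   # toy moving-maxima coefficients

n, T = 100_000, 10
Z = (-np.log(rng.uniform(size=(n, T + a.size)))) ** (-1 / alpha)  # standard alpha-Frechet

# X_t = max(a_0 Z_t, a_1 Z_{t+1}, a_2 Z_{t+2}): stationary, spectrally discrete
X = np.max(np.stack([a[i] * Z[:, i:i + T] for i in range(a.size)]), axis=0)

x = 2.0
# Marginal: X_t is alpha-Frechet with scale (sum_i a_i^alpha)**(1/alpha)
sigma_a = np.sum(a ** alpha)
assert abs(np.mean(X[:, 0] <= x) - np.exp(-sigma_a * x ** (-alpha))) < 0.01

# Bivariate law via (2.3): spectral functions of X_0 and X_1 on {0, 1, 2, 3}
F = np.zeros((2, 4)); F[0, :3] = a; F[1, 1:] = a
joint = np.sum(np.max(F, axis=0) ** alpha)
emp = np.mean((X[:, 0] <= x) & (X[:, 1] <= x))
assert abs(emp - np.exp(-joint * x ** (-alpha))) < 0.01
```

The second check is exactly (2.3) for a two-point index set, with the integral over S replaced by a sum over the four Z-coordinates that X_0 and X_1 share.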
CHAPTER III
Association of Sum- and Max-stable Processes
The deep connection between sum- and max-stable processes has long been suspected. As observed, for example, in [19], moving maxima and moving averages are statistically indistinguishable in the extremes. Also, the maxima of independent copies of a sum-stable process (appropriately rescaled) converge in distribution to a max-stable process, and the two processes have very similar spectral representations (see e.g. [101], Theorem 5.1). In [100], the ergodic properties of max-stable processes were characterized by borrowing ideas and drawing parallels from existing work in the sum-stable domain.

In this chapter, we introduce the notion of association of sum- and max-stable processes by relating their spectral functions. It provides theoretical support for the long-standing folklore that the two classes of processes share similar structures. Furthermore, we will see that the association method also helps 'translate' structural properties of sum-stable processes to max-stable processes.

We focus on infinite variance symmetric α-stable (SαS, α ∈ (0, 2)) sum-stable processes and α-Fréchet max-stable processes. Recall that an infinite variance SαS variable X has characteristic function

φ_X(t) = E exp{−itX} = exp{−σ^α |t|^α}, for all t ∈ R,

where α ∈ (0, 2). On the other hand, Y has an α-Fréchet distribution if

F_Y(y) = P(Y ≤ y) = exp{−σ^α y^{−α}}, for all y ∈ (0, ∞),

where now α is in (0, ∞). The σ's in both cases are positive parameters referred to as scale coefficients.

Recall that X = {X_t}_{t∈T} is an SαS stochastic process if all its finite linear combinations Σ_{i=1}^n a_i X_{t_i}, a_i ∈ R, t_i ∈ T, are SαS. These processes have convenient integral (or spectral) representations:
(3.1)  {X_t}_{t∈T} =^d { ∫_S f_t(s) M_{α,+}(ds) }_{t∈T}.

Here {f_t}_{t∈T} ⊂ L^α(S, μ), '∫' stands for the stable integral, and M_{α,+} is an SαS random measure on the measure space (S, μ) with control measure μ (see [93], Chapters 3 and 13). The representation (3.1) implies that

(3.2)  E exp{ −i Σ_{j=1}^n a_j X_{t_j} } = exp{ −∫_S |Σ_{j=1}^n a_j f_{t_j}(s)|^α μ(ds) }, a_j ∈ R, t_j ∈ T,
which determines the finite-dimensional distributions (f.d.d.) of the SαS process {X_t}_{t∈T}. On the other hand, we have seen in Chapter II that every α-Fréchet process has an extremal integral representation

(3.3)  {Y_t}_{t∈T} =^d { ∫^e_S f_t(s) M_{α,∨}(ds) }_{t∈T},

with {f_t}_{t∈T} ⊂ L^α_+(S, μ), and

(3.4)  P(Y_{t_1} ≤ a_1, …, Y_{t_n} ≤ a_n) = exp{ −∫_S ⋁_{j=1}^n (f_{t_j}(s)/a_j)^α μ(ds) }, a_j ≥ 0, t_j ∈ T.
The f_t's in (3.1) and (3.3) are called the spectral functions of the sum- or max-stable processes, respectively. Based on the spectral representations above, we define association as follows:

Definition III.1 (Associated SαS and α-Fréchet processes). We say that an SαS process {X_t}_{t∈T} and an α-Fréchet process {Y_t}_{t∈T} are associated if there exist {f_t}_{t∈T} ⊂ L^α_+(S, μ) such that

{X_t}_{t∈T} =^d { ∫_S f_t dM_{α,+} }_{t∈T}  and  {Y_t}_{t∈T} =^d { ∫^e_S f_t dM_{α,∨} }_{t∈T}.
In this case, we say that {X_t}_{t∈T} and {Y_t}_{t∈T} are associated by {f_t}_{t∈T}.

We need to show that this definition is consistent. That is, if {f_t^{(1)}}_{t∈T} and {f_t^{(2)}}_{t∈T} are two different spectral representations of a given SαS (α-Fréchet, resp.) process, then the associated α-Fréchet (SαS, resp.) processes are equal in finite-dimensional distributions. This is ensured by the following theorem, which is proved in Section 3.2 below.
Theorem III.2. Consider two arbitrary collections of functions f_1^{(i)}, ..., f_n^{(i)} ∈ L^α_+(S_i, µ_i), i = 1, 2, 0 < α < 2. Then,

(3.5)  ||Σ_{j=1}^n a_j f_j^{(1)}||_{L^α(S_1,µ_1)} = ||Σ_{j=1}^n a_j f_j^{(2)}||_{L^α(S_2,µ_2)}, for all a_j ∈ R,

if and only if

(3.6)  ||⋁_{j=1}^n a_j f_j^{(1)}||_{L^α_+(S_1,µ_1)} = ||⋁_{j=1}^n a_j f_j^{(2)}||_{L^α_+(S_2,µ_2)}, for all a_j ≥ 0.
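On discrete measure spaces both sides of (3.5) and (3.6) are plain finite sums: ||Σ_j a_j f_j||^α = Σ_s µ(s)|Σ_j a_j f_j(s)|^α and ||⋁_j a_j f_j||^α_{L^α_+} = Σ_s µ(s)(max_j a_j f_j(s))^α. The sketch below (a hypothetical example, not from the original text) takes two representations of the same collection of functions, obtained by splitting one atom of the base space into two atoms of half the mass, and checks that (3.5) and (3.6) then hold simultaneously, as Theorem III.2 asserts.

```python
# Two spectral representations of the same collection: (S2, mu2) splits the
# second atom of (S1, mu1) into two atoms of equal mass, same function values.
alpha = 1.2
mu1 = [1.0, 2.0]
F1 = [[1.0, 0.5],         # f_1^{(1)}
      [0.3, 2.0]]         # f_2^{(1)}
mu2 = [1.0, 1.0, 1.0]
F2 = [[1.0, 0.5, 0.5],    # f_1^{(2)}
      [0.3, 2.0, 2.0]]    # f_2^{(2)}

def sum_norm(F, mu, a):
    """||sum_j a_j f_j||_{L^alpha}^alpha on a discrete measure space."""
    return sum(m * abs(sum(aj * F[j][i] for j, aj in enumerate(a))) ** alpha
               for i, m in enumerate(mu))

def max_norm(F, mu, a):
    """||max_j a_j f_j||_{L^alpha_+}^alpha, for nonnegative coefficients a_j."""
    return sum(m * max(aj * F[j][i] for j, aj in enumerate(a)) ** alpha
               for i, m in enumerate(mu))

for a in [(1.0, 1.0), (2.0, -1.0), (-0.5, 0.7)]:
    assert abs(sum_norm(F1, mu1, a) - sum_norm(F2, mu2, a)) < 1e-12   # (3.5)
for a in [(1.0, 1.0), (2.0, 0.5), (0.1, 3.0)]:
    assert abs(max_norm(F1, mu1, a) - max_norm(F2, mu2, a)) < 1e-12   # (3.6)
```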
Furthermore, Theorem III.2 entails that our notion of association is not merely formal. For example, stationary or self-similar max-stable processes are associated with stationary or self-similar sum-stable ones, respectively (see Corollary III.12). We will also see, however, that there are SαS processes that cannot be associated to any α-Fréchet process (see Theorem III.13). In particular, we provide a practical characterization of the max-associable SαS processes {X_t}_{t∈T} with stationary increments characterized by a dissipative flow, indexed by T = R or T = Z (see Proposition III.16).
This chapter is organized as follows. In Section 3.1, some preliminaries are provided. In Section 3.2, we prove Theorem III.2. In Section 3.3, we establish the association of SαS and α-Fréchet processes and give examples of both max-associable and non-max-associable SαS processes. In Section 3.4, we show how the association can serve as a tool to translate available structural results for SαS processes to α-Fréchet processes, and vice versa.

3.1 Preliminaries
We draw a connection between linear isometries and max-linear isometries, which play important roles in relating two representations of a given SαS or α-Fréchet process, respectively. The notion of a linear isometry is well known. To define a max-linear isometry, we say that a subset F ⊂ L^α_+(S, µ) is a max-linear space if ⋁_{i=1}^n a_i f_i ∈ F for all n ∈ N, f_i ∈ F, a_i > 0, and if F is closed w.r.t. the metric ρ_{µ,α} defined by ρ_{µ,α}(f, g) = ∫_S |f^α − g^α| dµ (recall Remark II.1).

Definition III.3 (Max-linear isometry). Let α > 0 and consider two measure spaces (S_1, µ_1) and (S_2, µ_2) with positive and σ-finite measures µ_1 and µ_2. Let F_1 ⊂ L^α_+(S_1, µ_1) be a max-linear space. A mapping U : F_1 → L^α_+(S_2, µ_2) is said to be a max-linear isometry, if:

(i) for all f_1, f_2 ∈ F_1 and a_1, a_2 ≥ 0, U(a_1 f_1 ∨ a_2 f_2) = a_1(U f_1) ∨ a_2(U f_2), µ_2-a.e., and

(ii) for all f ∈ F_1, ||U f||_{L^α_+(S_2,µ_2)} = ||f||_{L^α_+(S_1,µ_1)}.
A linear (max-linear, resp.) isometry may be defined only on a small linear (max-linear, resp.) subspace of L^α(S, µ) (L^α_+(S, µ), resp.). However, this linear (max-linear, resp.) isometry can be extended uniquely to the extended ratio space (extended positive ratio space, resp.), which will turn out to be closed w.r.t. both linear and max-linear combinations.

Definition III.4. Let F be a collection of functions in L^α(S, µ).

(i) The ratio σ-field of F, written ρ(F) := σ({f_1/f_2 : f_1, f_2 ∈ F}), is defined as the σ-field generated by ratios of functions in F, with the conventions ±1/0 = ±∞ and 0/0 = 0;

(ii) The extended ratio space of F, written R_e(F), is defined as:

(3.7)  R_e(F) := {rf : rf ∈ L^α(S, µ), r ∼ ρ(F), f ∈ F}.

Similarly, we define the extended positive ratio space:

(3.8)  R_{e,+}(F) := {rf : rf ∈ L^α_+(S, µ), r ∼ ρ(F), r ≥ 0, f ∈ F}.
The following result is due to [45] and [109].

Theorem III.5. Let F be a linear (max-linear, resp.) subspace of L^α(S_1, µ_1) with 0 < α < 2 (of L^α_+(S_1, µ_1) with 0 < α < ∞, resp.). If U is a linear (max-linear, resp.) isometry from F to U(F), then U can be uniquely extended to a linear (max-linear, resp.) isometry U : R_e(F) → R_e(U(F)) (U : R_{e,+}(F) → R_{e,+}(U(F)), resp.) of the form

(3.9)  U(rf) = T̄(r) U(f),

for all rf ∈ R_e(F) as in (3.7) (rf ∈ R_{e,+}(F) as in (3.8), resp.). Here T̄ is the mapping from L^α(S_1, ρ(F), µ_1) to L^α(S_2, ρ(U(F)), µ_2) induced by a regular set isomorphism T from ρ(F) to ρ(U(F)).

For the precise definition of a regular set isomorphism T and the induced mapping T̄, see [56], [45] or [109]. The following remark provides some intuition. Part (iii) is especially important, since it shows that the two types of isometries can be identified.
Remark III.6.

(i) U is well defined, in the sense that for any r_i f_i ∈ R_e(F), i = 1, 2, as in (3.7), if r_1 f_1 = r_2 f_2, µ_1-a.e., then U(r_1 f_1) = U(r_2 f_2), µ_2-a.e. A similar result holds for r_i f_i ∈ R_{e,+}(F) as in (3.8).

(ii) T maps any two almost disjoint sets to almost disjoint sets. See [56].

(iii) The mapping T̄ is both linear and max-linear, i.e., for a, b ≥ 0,

(3.10)  T̄(af + bg) = aT̄f + bT̄g  and  T̄(af ∨ bg) = aT̄f ∨ bT̄g.

This follows from the definition T̄1_A = 1_{T(A)} for measurable A ⊂ S_1 and the construction of T̄ via simple functions. It is via T̄ that linearity and max-linearity are identified.

To make good use of (iii) in Remark III.6, we introduce the notion of positive-linearity. We say a linear isometry U is positive-linear if U maps all nonnegative functions to nonnegative functions. Accordingly, we say that F ⊂ L^α_+(S, µ) is a positive-linear space if it is closed w.r.t. the metric ρ_{µ,α} and all positive-linear combinations, i.e., for all n ∈ N, f_i ∈ F, a_i ≥ 0, we have g := Σ_{i=1}^n a_i f_i ∈ F. Note that the metric (f, g) ↦ ||f − g||^{1∧α}_{L^α(S,µ)} restricted to L^α_+(S, µ) generates the same topology as the metric ρ_{µ,α}. Clearly, Theorem III.5 holds if F is a positive-linear (instead of a linear) subspace of L^α_+(S, µ). In this case, U is also positive-linear.

We conclude this section with the following refinement of statement (iii) in Remark III.6.

Proposition III.7. Let U be as in Theorem III.5. If F is a positive-linear subspace of L^α_+(S_1, µ_1), then the linear isometry U in (3.9) is also a max-linear isometry from R_{e,+}(F) to R_{e,+}(U(F)). If F is a max-linear subspace of L^α(S_1, µ_1), then the max-linear isometry U in (3.9) is also a positive-linear isometry from R_e(F) to R_e(U(F)).
Proof. Suppose F is max-linear and U is a max-linear isometry. We show that U is also positive-linear. First, if U in (3.9) is max-linear, then the mapping T̄ from L^α_+(S_1, ρ(F), µ_1) to L^α_+(S_2, ρ(U(F)), µ_2) is both max-linear and linear, by Remark III.6 (iii). Moreover, it is easily seen that T̄ is positive-linear. Now, for r_1 f_1, r_2 f_2 ∈ R_{e,+}(F) as in (3.8), we have

U(a_1 r_1 f_1 + a_2 r_2 f_2) = U( (a_1 r_1 f_1/(f_1 ∨ f_2) + a_2 r_2 f_2/(f_1 ∨ f_2)) (f_1 ∨ f_2) )
  = T̄( a_1 r_1 f_1/(f_1 ∨ f_2) + a_2 r_2 f_2/(f_1 ∨ f_2) ) U(f_1 ∨ f_2) = a_1 U(r_1 f_1) + a_2 U(r_2 f_2).

That is, U is positive-linear. The proof of the other case is similar, except that we need the existence of a function f with full support in F, guaranteed by Lemma 3.2 in [45].

3.2 Identification of Max-linear and Positive-linear Isometries
In this section we prove Theorem III.2. It will be used to relate SαS and α-Fréchet processes in the next section. To do so, we need to introduce subspaces of L^α_+(S, µ) which are closed w.r.t. the max-linear and positive-linear combinations. For any F ⊂ L^α_+(S, µ), let

(3.11)  F_+ := span_+{F}  and  F_∨ := ∨-span{F}

denote the smallest positive-linear and the smallest max-linear subspace of L^α_+(S, µ) containing the collection of functions F, respectively. We call them the positive-linear and max-linear spaces generated by F, respectively. (We also write span{F} for the smallest linear subspace of L^α(S, µ) containing F.) In general, we have F_+ ≠ F_∨. This means that both F_+ and F_∨ are too small to be closed w.r.t. both the 'Σ' and the '⋁' operations. However, these two subspaces generate the same extended positive ratio
space, on which the two types of isometries are identical. The following fact is proved in the Appendix.

Proposition III.8. Suppose F ⊂ L^α_+(S, µ). Then R_{e,+}(F_+) = R_{e,+}(F_∨).

Proof of Theorem III.2. Let F^{(i)} := {f_1^{(i)}, ..., f_n^{(i)}} ⊂ L^α_+(S_i, µ_i). We prove the 'if' part. Suppose Relation (3.6) holds; we will show (3.5). Relation (3.6) implies that there exists a unique max-linear isometry U from F_∨^{(1)} onto F_∨^{(2)} such that U f_j^{(1)} = f_j^{(2)}, 1 ≤ j ≤ n. Thus, Theorem III.5 implies that the mapping

U : R_{e,+}(F_∨^{(1)}) → R_{e,+}(U(F_∨^{(1)}))

with the form (3.9) is a max-linear isometry. By Proposition III.7, U is also a positive-linear isometry. By Proposition III.8, U is then a positive-linear isometry defined on R_{e,+}(F_+^{(1)}), which implies (3.5). The proof of the 'only if' part is similar.
To conclude this section, we address the following question: for f_1^{(1)}, ..., f_n^{(1)} ∈ L^α(S_1, µ_1), do there always exist nonnegative f_1^{(2)}, ..., f_n^{(2)} ∈ L^α_+(S_2, µ_2) such that Relation (3.5) holds for all a_j ∈ R? The answer is 'no'. As a consequence, in the next section we will see that there are SαS processes which cannot be associated to any α-Fréchet process.

Proposition III.9. Consider f_j^{(1)} ∈ L^α(S_1, µ_1), 1 ≤ j ≤ n. Then, there exist some f_j^{(2)} ∈ L^α_+(S_2, µ_2), 1 ≤ j ≤ n, such that (3.5) holds, if and only if

(3.12)  f_i^{(1)}(s) f_j^{(1)}(s) ≥ 0, µ_1-a.e., for all 1 ≤ i, j ≤ n.

When (3.12) is true, one can take f_i^{(2)}(s) := |f_i^{(1)}(s)|, 1 ≤ i ≤ n, and (S_2, µ_2) ≡ (S_1, µ_1) for (3.5) to hold.

The proof is given in the Appendix. We will call (3.12) the associability condition.
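On a discrete measure space, the associability condition (3.12) is a finite check: at every atom, no two spectral functions may take strictly opposite signs. A minimal sketch of such a checker (hypothetical helper, not part of the original text):

```python
def is_associable(F):
    """Check condition (3.12) for spectral functions given as rows of values on
    the atoms of a discrete measure space: f_i(s) * f_j(s) >= 0 for all i, j, s."""
    n_atoms = len(F[0])
    return all(F[i][s] * F[j][s] >= 0
               for s in range(n_atoms)
               for i in range(len(F))
               for j in range(len(F)))

# At every atom all nonzero values share one sign: associable.
assert is_associable([[1.0, -2.0, 0.0],
                      [3.0, -0.5, 0.7]])
# f_1 and f_2 take opposite signs at the second atom: not associable.
assert not is_associable([[1.0, 2.0],
                          [0.5, -1.0]])
```

When the check passes, Proposition III.9 says that replacing each f_i by |f_i| yields nonnegative spectral functions with all norms in (3.5) unchanged.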
3.3 Association of Sum- and Max-stable Processes
In this section, by essentially applying Theorem III.2, we associate an SαS process to every α-Fréchet process in the sense of Definition III.1. The associated processes will be shown to have similar properties. However, we will also see that not all SαS processes can be associated to α-Fréchet processes. We conclude with several examples.

Remark III.10. In Definition III.1, the associated SαS and α-Fréchet processes have the same α ∈ (0, 2). It is easy to see that, for any α-Fréchet process {Y_t}_{t∈T} with spectral functions {f_t}_{t∈T}, the process {Y_t^β}_{t∈T} is α/β-Fréchet with spectral functions {f_t^β}_{t∈T}, for all 0 < α, β < ∞. This transformation shows that the parameter α plays essentially no role in characterizing the dependence structure of the α-Fréchet process. Given an SαS process with nonnegative spectral functions, one could associate to it the 1-Fréchet process with spectral functions {f_t^α}_{t∈T}. This leads to no loss of generality. Here, we choose to pair up the two α's for technical convenience.

The following result, a simple application of Theorem III.2, shows the consistency of Definition III.1, i.e., the notion of association is independent of the choice of the spectral functions.

Theorem III.11. Suppose an SαS process {X_t}_{t∈T} and an α-Fréchet process {Y_t}_{t∈T} are associated by {f_t^{(1)}}_{t∈T} ⊂ L^α_+(S_1, µ_1). Then, {f_t^{(2)}}_{t∈T} ⊂ L^α_+(S_2, µ_2) is a spectral representation of {X_t}_{t∈T}, if and only if it is a spectral representation of {Y_t}_{t∈T}. Namely,
{ ∫_{S_1} f_t^{(1)} dM_{α,+}^{(1)} }_{t∈T} =^d { ∫_{S_2} f_t^{(2)} dM_{α,+}^{(2)} }_{t∈T},

if and only if

{ ∫^e_{S_1} f_t^{(1)} dM_{α,∨}^{(1)} }_{t∈T} =^d { ∫^e_{S_2} f_t^{(2)} dM_{α,∨}^{(2)} }_{t∈T},
where M_{α,+}^{(i)} and M_{α,∨}^{(i)} are SαS random measures and α-Fréchet random sup-measures, respectively, on S_i with control measure µ_i, i = 1, 2.

As an immediate consequence, stationarity and self-similarity are preserved under association. Here we assume T = R^d or Z^d.

Corollary III.12. Suppose an SαS process {X_t}_{t∈T} and an α-Fréchet process {Y_t}_{t∈T} are associated. Then,

(i) {X_t}_{t∈T} is stationary if and only if {Y_t}_{t∈T} is stationary.

(ii) {X_t}_{t∈T} is self-similar with exponent H, if and only if {Y_t}_{t∈T} is self-similar with exponent H.

Proof. Suppose {X_t}_{t∈T} and {Y_t}_{t∈T} are associated by {f_t}_{t∈T} ⊂ L^α_+(S, µ). (i) For any h ∈ T, letting g_t = f_{t+h}, ∀t ∈ T, by stationarity of {X_t}_{t∈T} we obtain {g_t}_{t∈T} as another spectral representation. Namely, {∫_S f_t dM_{α,+}}_{t∈T} =^d {∫_S g_t dM_{α,+}}_{t∈T}. By Theorem III.11, the previous statement is equivalent to {∫^e_S f_t dM_{α,∨}}_{t∈T} =^d {∫^e_S g_t dM_{α,∨}}_{t∈T}, which is equivalent to the fact that {Y_t}_{t∈T} is stationary. The proof of part (ii) is similar and thus omitted.

Observe that not all SαS processes can be associated to α-Fréchet processes, since not all SαS processes have nonnegative spectral representations. For an SαS process {X_t}_{t∈T} with spectral representation {f_t}_{t∈T} to have an associated α-Fréchet process, a necessary and sufficient condition is that for all t_1, ..., t_n ∈ T, the functions f_{t_1}, ..., f_{t_n} satisfy the associability condition (3.12). We say such SαS processes are max-associable. Now, Proposition III.9 becomes:

Theorem III.13. An SαS process {X_t}_{t∈T} with representation (3.1) is max-associable, if and only if for all t_1, t_2 ∈ T,

(3.13)  f_{t_1}(s) f_{t_2}(s) ≥ 0, µ-a.e.

Indeed, by Theorem III.13, for any max-associable spectral representation {f_t}_{t∈T}, {|f_t|}_{t∈T} is also a spectral representation of the same process. Clearly, if the spectral functions are nonnegative, then the SαS process is max-associable. We give two simple examples next.

Example III.14 (Association of mixed fractional motions). Consider the self-similar SαS processes {X_t}_{t∈R_+} with the representations

(3.14)  {X_t}_{t∈R_+} =^d { ∫_E ∫_0^∞ t^{H−1/α} g(x, u/t) M_{α,+}(dx, du) }_{t∈R_+},  H ∈ (0, ∞),
where (E, E, ν) is a standard Lebesgue space, M_{α,+} is an SαS random measure on E × R_+ with control measure m(dx, du) = ν(dx)du, and g ∈ L^α(E × R_+, m). Such processes are called mixed fractional motions (see [10]). When g ≥ 0 a.e., the process {X_t}_{t∈R_+} is max-associable, and Corollary III.12 implies that the associated α-Fréchet process is H-self-similar.

Example III.15 (Association of Chentzov SαS random fields). Recall that {X_t}_{t∈R^n} is a Chentzov SαS random field, if

{X_t}_{t∈R^n} ≡ {M_{α,+}(V_t)}_{t∈R^n} =^d { ∫_S 1_{V_t}(u) M_{α,+}(du) }_{t∈R^n}.
Here, 0 < α < 2, (S, µ) is a measure space, and V_t, t ∈ R^n, is a family of measurable sets such that µ(V_t) < ∞ for all t ∈ R^n (see Ch. 8 in [93]). Since 1_{V_t}(u) ≥ 0, all Chentzov SαS random fields are max-associable.

We conclude this section with some examples of SαS processes that are not max-associable. In particular, recall that the SαS processes with stationary increments (zero at t = 0) characterized by dissipative flows were shown in [103] to have representation

(3.15)  {X_t}_{t∈R} =^d { ∫_E ∫_R (G(x, t + u) − G(x, u)) M_{α,+}(dx, du) }_{t∈R}.
Here, (E, E, ν) is a standard Lebesgue space, M_{α,+}, α ∈ (0, 2), is an SαS random measure with control measure m(dx, du) = ν(dx)du, and G : E × R → R is a measurable function such that, for all t ∈ R, G_t(x, u) := G(x, t + u) − G(x, u), x ∈ E, u ∈ R, belongs to L^α(E × R, m). The process {X_t}_{t∈R} in (3.15) is called a mixed moving average with stationary increments. The following result provides a partial characterization of the max-associable SαS processes {X_t}_{t∈T} which have the representation (3.15). We shall suppose that E is equipped with a metric ρ and endow E × R with the product topology.

Proposition III.16. Consider an SαS process {X_t}_{t∈R} with representation (3.15). Suppose there exists a closed set N ⊂ E × R such that m(N) = 0 and the function G is continuous at all (x, u) ∈ N^c := (E × R) \ N, w.r.t. the product topology. Then, {X_t}_{t∈R} is max-associable, if and only if

(3.16)  G(x, u) = f(x) 1_{A_x}(u) + c(x)  on N^c.

Namely, for each x ∈ E, G(x, ·) can take at most two values on N^c.

Proof. By Theorem III.13, {X_t}_{t∈R} is max-associable, if and only if for all t_1, t_2 ∈ R,

(3.17)  G_{t_1}(x, u) G_{t_2}(x, u) = (G(x, t_1 + u) − G(x, u))(G(x, t_2 + u) − G(x, u)) ≥ 0, m-a.e. (x, u) ∈ E × R.
First, we show the 'if' part. Define G̃(x, u) := G(x, u) (given by (3.16)) on N^c, and G̃(x, u) := f(x) 1_{A_x}(u) + c(x) on N (if A_x and c(x) are not defined, then set G̃(x, u) = 0). Set G̃_t(x, u) = G̃(x, u + t) − G̃(x, u). Note that {G̃_t(x, u)} is another spectral representation of {X_t}_{t∈R}, and for all (x, u), {1_{A_x}(u + t) − 1_{A_x}(u)}_{t∈R} can take at most two values, one of which is 0. This observation implies (3.17) with G_t(x, u) replaced by G̃_t(x, u), whence {X_t}_{t∈R} is max-associable.

Next, we prove the 'only if' part. We show that (3.17) is violated if G(x, ·) takes more than two different values on ({x} × R) ∩ N^c for some x ∈ E. Suppose there exist x ∈ E and u_i ∈ R such that (x, u_i) ∈ N^c and g_x^i := G(x, u_i), i = 1, 2, 3, are mutually different. Without loss of generality we may suppose that g_x^1 < g_x^2 < g_x^3. Then, by the continuity of G, there exists ε > 0 such that the sets B_i := B(x, ε) × (u_i − ε, u_i + ε), i = 1, 2, 3, are disjoint, where B(x, ε) := {y ∈ E : ρ(x, y) < ε} and ρ is the metric on E, and

(3.18)  sup_{B_1 ∩ N^c} G(x, u) < inf_{B_2 ∩ N^c} G(x, u) ≤ sup_{B_2 ∩ N^c} G(x, u) < inf_{B_3 ∩ N^c} G(x, u).

Put t_1 = u_1 − u_2 and t_2 = u_3 − u_2. Inequality (3.18) implies that G_{t_1}(x, u) G_{t_2}(x, u) < 0 on B_2 ∩ N^c. This, in view of Theorem III.13, contradicts max-associability. We have thus shown (3.16).

We give two classes of SαS processes which cannot be associated to any α-Fréchet process, according to Proposition III.16.

Example III.17 (Non-associability of linear fractional stable motions). The linear fractional stable motions (see Ch. 7.4 in [93]) have the spectral representations

{X_t}_{t∈R} =^d { ∫_R [ a((t + u)_+^{H−1/α} − u_+^{H−1/α}) + b((t + u)_−^{H−1/α} − u_−^{H−1/α}) ] M_{α,+}(du) }_{t∈R}.
Here H ∈ (0, 1), α ∈ (0, 2), H ≠ 1/α, a, b ∈ R and |a| + |b| > 0. By Proposition III.16, these processes are not max-associable.

Example III.18 (Non-associability of Telecom processes). The Telecom process offers an extension of fractional Brownian motion consistent with heavy-tailed fluctuations. It is a large scale limit of renewal reward processes, and it can be obtained by choosing the distribution of the rewards accordingly (see [57] and [72]). A Telecom process {X_t}_{t∈R} has the representation

{X_t}_{t∈R} =^d { ∫_R ∫_R e^{s(H−1)/α} (F(e^s(t + u)) − F(e^s u)) M_{α,+}(ds, du) }_{t∈R},

where 1 < α < 2, 1/α < H < 1, F(z) = (z ∧ 0 + 1)_+, z ∈ R, and the SαS random measure M_{α,+} has control measure m_α(ds, du) = ds du. By Proposition III.16, the Telecom process is not max-associable.

Remark III.19. It is important that the index set T in Proposition III.16 is the entire real line R. Indeed, in both Examples III.17 and III.18, when the time index is restricted to the half-line T = R_+ (or T = R_−), the processes {X_t}_{t∈T} satisfy condition (3.13) and are therefore max-associable.

3.4 Association of Classifications
In this section, we show how to apply the association technique to relate various classification results for SαS and α-Fréchet processes. Note that many classifications of SαS (as well as of α-Fréchet) processes are induced by suitable decompositions of the measure space (S, µ). The following theorem provides an essential tool for translating classification results for SαS to α-Fréchet processes, and vice versa.

Theorem III.20. Suppose an SαS process {X_t}_{t∈T} and an α-Fréchet process {Y_t}_{t∈T} are associated by two spectral representations {f_t^{(i)}}_{t∈T} ⊂ L^α_+(S_i, µ_i), i = 1, 2. That is,

{X_t}_{t∈T} =^d { ∫_{S_i} f_t^{(i)} dM_{α,+}^{(i)} }_{t∈T}  and  {Y_t}_{t∈T} =^d { ∫^e_{S_i} f_t^{(i)} dM_{α,∨}^{(i)} }_{t∈T},  i = 1, 2.
Then, for any measurable subsets A_i ⊂ S_i, i = 1, 2, we have

{ ∫_{A_1} f_t^{(1)} dM_{α,+}^{(1)} }_{t∈T} =^d { ∫_{A_2} f_t^{(2)} dM_{α,+}^{(2)} }_{t∈T}

if and only if

{ ∫^e_{A_1} f_t^{(1)} dM_{α,∨}^{(1)} }_{t∈T} =^d { ∫^e_{A_2} f_t^{(2)} dM_{α,∨}^{(2)} }_{t∈T}.
The proof follows from Theorem III.2, by restricting the measures onto the sets A_i, i = 1, 2.

For an SαS process {X_t}_{t∈T} with spectral functions {f_t}_{t∈T} ⊂ L^α(S, µ), a decomposition typically takes the form {X_t}_{t∈T} =^d { Σ_{j=1}^n X_t^{(j)} }_{t∈T}, where X_t^{(j)} = ∫_{A^{(j)}} f_t dM_{α,+} for all t ∈ T, and A^{(j)}, 1 ≤ j ≤ n, are disjoint subsets with S = ⋃_{j=1}^n A^{(j)}. The components {X_t^{(j)}}_{t∈T}, 1 ≤ j ≤ n, are independent SαS processes. When {X_t}_{t∈T} is max-associable, Theorem III.20 enables us to define the associated decomposition for the α-Fréchet process {Y_t}_{t∈T} associated with {X_t}_{t∈T}. Namely, we have {Y_t}_{t∈T} =^d { ⋁_{j=1}^n Y_t^{(j)} }_{t∈T}, where Y_t^{(j)} = ∫^e_{A^{(j)}} |f_t| dM_{α,∨} for all t ∈ T. Conversely, given a decomposition for α-Fréchet processes, we can define a corresponding decomposition for the associated SαS processes.

Example III.21 (Conservative-dissipative decomposition). In seminal work, [83] established the conservative-dissipative decomposition for SαS processes. Namely, for any {X_t}_{t∈T} with representation (3.1), one has {X_t}_{t∈T} =^d {X_t^C + X_t^D}_{t∈T}, where X_t^C = ∫_C f_t dM_{α,+} and X_t^D = ∫_D f_t dM_{α,+} for all t ∈ T, with C and D defined by

(3.19)  C := { s : ∫_T |f_t(s)|^α λ(dt) = ∞ }  and  D := S \ C.
When {X_t}_{t∈T} is stationary, the sets C and D correspond to the Hopf decomposition S = C ∪ D of the non-singular flow associated with {X_t}_{t∈T} (see [83] for details). Therefore, {X_t^C}_{t∈T} and {X_t^D}_{t∈T} are referred to as the conservative and dissipative components of {X_t}_{t∈T}, respectively. Theorem III.20 enables us to use (3.19) to establish the parallel decomposition of the associated α-Fréchet process {Y_t}_{t∈T}. Namely, for the associated {Y_t}_{t∈T} we have {Y_t}_{t∈T} =^d {Y_t^C ∨ Y_t^D}_{t∈T}, where Y_t^C = ∫^e_C |f_t| dM_{α,∨} and Y_t^D = ∫^e_D |f_t| dM_{α,∨} for all t ∈ T. This decomposition was established in [109] using different tools.

Remark III.22. Similar associations can be established for other decompositions, including the positive-null decomposition (see [92] and [109]) and the decompositions of the above two types for random fields (T = Z^d or R^d; see [91] and [108]). A more specific decomposition for SαS processes with representation (3.15) was developed in [70], and one can obtain the corresponding decomposition for the associated α-Fréchet process by Theorem III.20.

3.5 Proofs of Auxiliary Results
We first need the following lemma.

Lemma III.23. If F ⊂ L^α_+(S, µ), then

(i) ρ(F) = ρ(span_+(F)) = ρ(∨-span(F)), and

(ii) for any f^{(1)} ∈ span_+(F) and f^{(2)} ∈ ∨-span(F), f^{(1)}/f^{(2)} ∈ ρ(F).

Proof. (i) First, for any f_i, g_i ∈ F, a_i ≥ 0, b_i ≥ 0, i ∈ N, and all x > 0, we have

{ ⋁_{i∈N} a_i f_i / ⋁_{i∈N} b_i g_i ≤ x } = ⋂_{i∈N} { a_i f_i / ⋁_{j∈N} b_j g_j ≤ x } = ⋂_{i∈N} ⋃_{j∈N} { a_i f_i/(b_j g_j) ≤ x } ∈ ρ(F).

For x > 0, we similarly obtain that (a_1 f_1 + a_2 f_2)/(b_1 g_1 + b_2 g_2) is ρ(∨-span(F))-measurable. Similar arguments can be used to show that (Σ_{i=1}^n a_i f_i)/(Σ_{i=1}^n b_i g_i) is ρ(∨-span(F))-measurable for all a_i, b_i ≥ 0, f_i, g_i ∈ F, 1 ≤ i ≤ n. We have thus shown that ρ(span_+(F)) ⊂ ρ(∨-span(F)). If now f, g ∈ span_+(F), then there exist two sequences f_n, g_n ∈ span_+(F) such that f_n → f and g_n → g a.e. Thus, h_n := f_n/g_n → h := f/g as n → ∞, a.e. Since the h_n are ρ(span_+(F))-measurable for all n ∈ N, so is h. Hence ρ(span_+(F)) ⊂ ρ(∨-span(F)).

(ii) By the previous argument, it is enough to focus on finite linear and max-linear combinations. Suppose f^{(1)} = Σ_{i=1}^n a_i f_i and f^{(2)} = ⋁_{j=1}^p b_j g_j for some f_i, g_j ∈ F, a_i, b_j ≥ 0, 1 ≤ i ≤ n, 1 ≤ j ≤ p. Then, for all x > 0,

{ Σ_{i=1}^n a_i f_i / ⋁_{j=1}^p b_j g_j < x } = ⋃_{j=1}^p { Σ_{i=1}^n a_i f_i/g_j < x b_j } ∈ ρ(F).

It follows that f^{(1)}/f^{(2)} ∈ ρ(F).
Proof of Proposition III.8. First we show R_{e,+}(F_∨) ⊃ R_{e,+}(F_+), where F_∨ and F_+ are defined in (3.11). By (3.8), it suffices to show that, for any r_2 ∈ ρ(F_+) and f^{(2)} ∈ F_+, there exist r_1 ∈ ρ(F_∨) and f^{(1)} ∈ F_∨ such that

(3.20)  r_1 f^{(1)} = r_2 f^{(2)}.

To obtain (3.20), we need the concept of full support. We say that a function g has full support in F (an arbitrary collection of functions defined on (S, µ)), if g ∈ F and for all f ∈ F, µ(supp(f) \ supp(g)) = 0. Here supp(f) := {s ∈ S : f(s) ≠ 0}. By Lemma 3.2 in [109], there exists a function f^{(1)} ∈ F_∨ which has full support in F_∨. One can show that f^{(1)} also has full support in F_+. Indeed, let g ∈ F_+ be arbitrary. Then there exist g_n = Σ_{i=1}^{k_n} a_{ni} g_{ni}, a_{ni} ≥ 0, g_{ni} ∈ F ⊂ F_∨, such that g_n →_µ g as n → ∞. Note that µ(supp(g_n) \ supp(f^{(1)})) = 0 for all n. Thus, for all ε > 0, we have µ(|g_n − g| > ε) ≥ µ({|g| > ε} \ supp(f^{(1)})). Since µ(|g_n − g| > ε) → 0 as n → ∞, it follows that µ({|g| > ε} \ supp(f^{(1)})) = 0 for all ε > 0, i.e., µ(supp(g) \ supp(f^{(1)})) = 0. We have thus shown that f^{(1)} has full support in F_+.

Now, setting r_1 := r_2 f^{(2)}/f^{(1)}, we have (3.20). (Note that f^{(2)} = 0, µ-a.e. on S \ supp(f^{(1)}). By setting 0/0 = 0, f^{(2)}/f^{(1)} is well defined.) Lemma III.23 (ii) implies that f^{(2)}/f^{(1)} ∈ ρ(F), whence r_1 ∈ ρ(F) = ρ(F_∨). We have thus shown R_{e,+}(F_∨) ⊃ R_{e,+}(F_+). In a similar way one can show R_{e,+}(F_∨) ⊂ R_{e,+}(F_+).

Proof of Proposition III.9. First, suppose (3.12) does not hold but (3.5) holds. Then,
without loss of generality, we can assume that there exists S_0^{(1)} ⊂ S_1 such that f_1^{(1)}(s) > 0, f_2^{(1)}(s) < 0 for all s ∈ S_0^{(1)}, and µ(S_0^{(1)}) > 0. It follows from (3.5) that there exists a linear isometry U such that, by Theorem III.5, U f_i^{(1)} = f_i^{(2)} = T̄(r_i) U(f), with certain f and r_i = f_i^{(1)}/f, for i = 1, 2. In particular, f can be taken with full support. Note that sign(r_1) ≠ sign(r_2) on S_0^{(1)}. It follows that f_1^{(2)} and f_2^{(2)} have different signs on a set of positive measure (indeed, this set is the image of S_0^{(1)} under the regular set isomorphism T). This contradicts the fact that f_1^{(2)} and f_2^{(2)} are both nonnegative on S_2.

On the other hand, suppose (3.12) is true. Define U f_i^{(1)} := |f_i^{(1)}|. It follows from (3.12) that U can be extended to a positive-linear isometry from L^α(S_1, µ_1) to L^α_+(S_2, µ_2), which implies (3.5).
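The construction U f_i^{(1)} := |f_i^{(1)}| in the proof above can be checked directly on a discrete space: when (3.12) holds, all nonzero values f_i(s) at a fixed atom s share one sign, so |Σ_j a_j f_j(s)| = |Σ_j a_j |f_j(s)|| pointwise and every norm in (3.5) is preserved. A small sketch of this check (hypothetical values, not from the original text):

```python
alpha = 1.7
mu = [1.0, 0.5, 2.0]
# Condition (3.12) holds: at each atom all nonzero values share one sign.
F = [[1.0, -2.0, 0.0],
     [3.0, -0.5, 0.7]]
F_abs = [[abs(v) for v in row] for row in F]

def norm_alpha(F, a):
    """||sum_j a_j f_j||_{L^alpha}^alpha on the discrete space (mu)."""
    return sum(m * abs(sum(aj * F[j][i] for j, aj in enumerate(a))) ** alpha
               for i, m in enumerate(mu))

for a in [(1.0, 1.0), (1.0, -2.0), (-0.3, 0.9)]:
    # Replacing each f_j by |f_j| leaves all L^alpha norms of linear
    # combinations unchanged, as in Proposition III.9.
    assert abs(norm_alpha(F, a) - norm_alpha(F_abs, a)) < 1e-12
```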
CHAPTER IV
Decomposability of Sum- and Max-stable Processes
In this chapter, we investigate the general decomposability problem for both SαS and α-Fr´echet processes with 0 < α < 2. We first focus on SαS processes. Then, by the association method introduced in Chapter III, the counterpart results for α-Fr´echet processes are proved with little extra effort in Section 4.3. Let X = {Xt }t∈T be an SαS process. We are interested in the case when X can be written as d
{Xt }t∈T =
(4.1)
n
(1) Xt
+ ··· +
(n) Xt
o
,
t∈T
d
where ‘=’ stands for ‘equality in finite-dimensional distributions’, and X (k) = d
(k)
{Xt }t∈T , k = 1, . . . , n are independent SαS processes. We will write X = X (1) + · · · + X (n) in short, and each X (k) will be referred to as a component of X. The stad
bility property readily implies that (4.1) holds with X (k) = n−1/α X ≡ {n−1/α Xt }t∈T . The components equal in finite-dimensional distributions to a constant multiple of X will be referred to as trivial. We are interested in the general structure of non-trivial SαS components of X. Many important decompositions (4.1) of SαS processes (with non-trivial components) are already available in the literature: see for example Cambanis et al. [12], Rosi´ nski [83], Rosi´ nski and Samorodnitsky [86], Surgailis et al. [103], Pipiras and 35
36
Taqqu [70, 71], and Samorodnitsky [92], to name a few. These results were motivated by studies of various probabilistic and structural aspects of the underlying SαS processes such as ergodicity, mixing, stationarity, self-similarity, etc. Notably, Rosi´ nski [83] established a fundamental connection between stationary SαS processes and non-singular flows. He developed important tools based on minimal representations of SαS processes and inspired multiple decomposition results motivated by connections to ergodic theory. In this chapter, we adopt a different perspective. Our main goal is to characterize all possible SαS decompositions (4.1). Our results show how the dependence structure of an SαS process determines the structure of its components. Consider SαS processes {Xt }t∈T indexed by a complete separable metric space T with an integral representation d
{Xt }t∈T =
(4.2)
nZ
o ft (s)Mα (ds)
,
t∈T
S
with spectral functions {ft }t∈T ⊂ Lα (S, BS , µ). Recall that for all n ∈ N, tj ∈ T, aj ∈ R, (4.3)
E exp − i
n X j=1
aj Xtj
Z X n α = exp − aj ftj dµ .
S
j=1
Without loss of generality, we always assume that the spectral functions {ft }t∈T ⊂ Lα (S, BS , µ) have full support, i.e., S = supp{ft , t ∈ T }. We first state the main result of this chapter. To this end, we recall that the ratio σ-algebra of a spectral representation F = {ft }t∈T (of {Xt }) is defined as (4.4)
ρ(F ) ≡ ρ{ft , t ∈ T } := σ{ft1 /ft2 , t1 , t2 ∈ T }.
The following result characterizes the structure of all SαS decompositions.
37
Theorem IV.1. Suppose {Xt }t∈T is an SαS process (0 < α < 2) with spectral representation d
{Xt }t∈T =
nZ
ft (s)Mα (ds)
o
S
,
t∈T
(k)
with {ft }t∈T ⊂ Lα (S, BS , µ). Let {Xt }t∈T , k = 1, · · · , n be independent SαS processes. (i) The decomposition {Xt }t∈T
(4.5)
n o (1) (n) = Xt + · · · + X t d
t∈T
holds, if and only if there exist measurable functions rk : S → [−1, 1], k = 1, · · · , n, such that (4.6)
d (k) {Xt }t∈T =
nZ
o rk (s)ft (s)Mα (ds)
In this case, necessarily
Pn
k=1
, k = 1, · · · , n.
t∈T
S
|rk (s)|α = 1, µ-almost everywhere on S.
(ii) If (4.5) holds, then the rk ’s in (4.6) can be chosen to be non-negative and ρ(F )measurable. Such rk ’s are unique modulo µ. The rest of the chapter is structured as follows. In Section 4.1, we provide some consequences of Theorem IV.1 for general SαS processes. The stationary case is discussed in Section 4.2. Parallel results on max-stable processes are presented in Section 4.3. The proof of Theorem IV.1 is given in Section 4.4. 4.1
SαS Components
In this section, we provide a few examples to illustrate the consequences of our main result Theorem IV.1. The first one is about SαS processes with independent increments. Recall that we always assume 0 < α < 2.
38
Corollary IV.2. Let X = {Xt }t∈R+ be an arbitrary SαS process with independent increments and X0 = 0. Then all SαS components of X also have independent increments. Proof. Write m(t) = kXt kαα , where kXt kα denotes the scale coefficient of the SαS random variable Xt . By the independence of the increments of X, it follows that m is a non-decreasing function with m(0) = 0. First, we consider the simple case when m(t) is right-continuous. Consider the Borel measure µ on [0, ∞) determined by µ([0, t]) := m(t). The independence of the increments of X readily implies that X has the representation: d
{Xt }t∈R+ =
(4.7)
nZ 0
∞
o 1[0,t] (s)Mα (ds)
,
t∈R+
where Mα is an SαS random measure with control measure µ. Now, for any SαS component Y (≡ X (k) ) of X, we have that (4.6) holds with ft (s) = 1[0,t] (s) and some function r(s)(≡ rk (s)). This implies that the increments of Y are also independent since, for example, for any 0 ≤ t1 < t2 , the spectral functions r(s)ft1 (s) = r(s)1[0,t1 ] (s) and r(s)ft2 (s) − r(s)ft1 (s) = r(s)1(t1 ,t2 ] (s) have disjoint supports. It remains to prove the general case. The difficulty is that m(t) may have (at most countably many) discontinuities, and a representation as (4.7) is not always possible. Nevertheless, introduce the right-continuous functions t 7→ mi (t), i = 0, 1, X
m0 (t) := m(t+) −
(m(τ ) − m(τ −)) and m1 (t) :=
τ ≤t
X (m(τ ) − m(τ −)) τ ≤t
fα be an SαS random measure on R+ ×{0, 1} with control measure µ([0, t]× and let M {i}) := mi (t), i = 0, 1, t ∈ R+ . In this way, as in (4.7) one can show that d
{Xt }t∈T =
nZ R+ ×{0,1}
o fα (ds, dv) 1[0,t)×{0} (s, v) + 1[0,t]×{1} (s, v)M
t∈T
.
39
The rest of the proof remains similar and is omitted. Remark IV.3. Theorem IV.1 and Corollary IV.2 do not apply to the Gaussian case (α = 2). For the sake of simplicity, take T = {1, 2} and n = 2 (2 SαS components) in (4.1). In this case, all the (in)dependence information of the mean-zero Gaussian process {Xt }t∈T is characterized by the covariance matrix Σ of the Gaussian vector (1)
(2)
(1)
(2)
(X1 , X1 , X2 , X2 ). A counterexample can be easily constructed by choosing appropriately Σ. This reflects the drastic difference of the geometries of Lα spaces for α < 2 and α = 2. The next natural question to ask is whether two SαS processes have common components. Namely, the SαS process Z is a common component of the SαS processes d
d
X and Y , if X = Z + X (1) and Y = Z + Y (1) , where X (1) and Y (1) are both SαS processes independent of Z. To study the common components, the co-spectral point of view introduced in Wang and Stoev [111] is helpful. Consider a measurable SαS process {Xt }t∈T with spectral representation (4.2), where the index set T is equipped with a measure λ defined on the σ-algebra BT . Without loss of generality, we take f (·, ·) : (S × T, BS × BT ) → (R, BR ) to be jointly measurable (see Theorems 9.4.2 and 11.1.1 in [93]). The co-spectral functions, f· (s) ≡ f (s, ·), are elements of L0 (T ) ≡ L0 (T, BT , λ), the space of BT -measurable functions modulo λ-null sets. The co-spectral functions are indexed by s ∈ S, in contrast to the spectral functions ft (·) indexed by t ∈ T . Recall also that a set P ⊂ L0 (T ) is a cone, if cP = P for all c ∈ R \ {0} and {0} ∈ P. We write {f· (s)}s∈S ⊂ P modulo µ, if for µ-almost all s ∈ S, f· (s) ∈ P. (i)
Proposition IV.4. Let X^(i) = {Xt^(i)}t∈T be SαS processes with measurable representations {ft^(i)}t∈T ⊂ Lα(Si, BSi, µi), i = 1, 2. If there exist two cones Pi ⊂ L0(T), i = 1, 2, such that {f·^(i)(s)}s∈Si ⊂ Pi modulo µi, for i = 1, 2, and P1 ∩ P2 = {0}, then the two processes have no common component.

Proof. Suppose Z is a component of X^(1). Then, by Theorem IV.1, Z has a spectral representation {r^(1) ft^(1)}t∈T, for some BS1-measurable function r^(1). By the definition of cones, the co-spectral functions of Z are included in P1, i.e., {r^(1)(s) f·^(1)(s)}s∈S1 ⊂ P1 modulo µ1. If Z is also a component of X^(2), then, by the same argument, {r^(2)(s) f·^(2)(s)}s∈S2 ⊂ P2 modulo µ2, for some BS2-measurable function r^(2)(s). Since P1 ∩ P2 = {0}, it then follows that µi(supp(r^(i))) = 0, i = 1, 2, or equivalently Z = 0, the degenerate case.

We conclude this section with an application to SαS moving averages.

Corollary IV.5. Let X^(1) and X^(2) be two SαS moving averages

{Xt^(i)}t∈Rd =d { ∫_{Rd} f^(i)(t + s) Mα^(i)(ds) }t∈Rd,

with kernel functions f^(i) ∈ Lα(Rd, BRd, λ), i = 1, 2. Then, either

(4.8)  X^(1) =d cX^(2) for some c > 0,

or X^(1) and X^(2) have no common component. Moreover, (4.8) holds if and only if, for some τ ∈ Rd and ε ∈ {±1},

(4.9)  f^(1)(s) = cε f^(2)(s + τ), λ-almost all s ∈ Rd.

Proof. Clearly (4.9) implies (4.8). Conversely, if (4.8) holds, then (4.9) follows as in the proof of Corollary 4.2 in [111], with a slight modification (the proof therein was for positive cones). When (4.8) (or equivalently (4.9)) does not hold, consider the smallest cones containing {f^(i)(s + ·)}s∈Rd, i = 1, 2, respectively. Since these two cones have trivial intersection {0}, Proposition IV.4 implies that X^(1) and X^(2) have no common component.
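On a discrete grid, condition (4.9) can be tested directly: scan over shifts τ and check whether the ratio of f^(1) to the shifted f^(2) is a positive constant on a common support. The following sketch is purely illustrative (hypothetical kernels on Z, circular shifts standing in for translation, and the sign ε omitted by restricting to c > 0):

```python
import numpy as np

def scaled_shift_of(f1, f2):
    """Return (c, tau) with f1[s] = c * f2[s + tau] (circular shift), or None."""
    n = len(f1)
    for tau in range(n):
        g = np.roll(f2, -tau)              # g[s] = f2[s + tau]
        if not np.array_equal(g > 0, f1 > 0):
            continue                       # supports must coincide
        mask = g > 0
        ratios = f1[mask] / g[mask]
        if ratios.size and np.allclose(ratios, ratios[0]) and ratios[0] > 0:
            return float(ratios[0]), tau
    return None

f2 = np.array([0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0])
f1 = 2.0 * np.roll(f2, -2)                 # f1(s) = 2 f2(s + 2)
print(scaled_shift_of(f1, f2))             # -> (2.0, 2)
```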
4.2 Stationary SαS Components and Flows

Let X = {Xt}t∈T be a stationary SαS process with representation (4.2), where now T = Rd or T = Zd, d ∈ N. The seminal work of Rosiński [83] established an important connection between stationary SαS processes and flows. A family of functions {φt}t∈T is said to be a flow on (S, BS, µ) if, for all t1, t2 ∈ T, φt1+t2(s) = φt1(φt2(s)) for all s ∈ S, and φ0(s) = s for all s ∈ S. We say that a flow is non-singular if µ(φt(A)) = 0 is equivalent to µ(A) = 0, for all A ∈ BS and t ∈ T. Given a flow {φt}t∈T, a family {ct}t∈T is said to be a cocycle if ct+τ(s) = ct(s) cτ◦φt(s), µ-almost surely, for all t, τ ∈ T, and ct ∈ {±1} for all t ∈ T.

To understand the relation between the structure of stationary SαS processes and flows, it is necessary to work with minimal representations of SαS processes, introduced by Hardin [45, 46]. The minimality assumption is crucial in many results on the structure of SαS processes, although it is in general difficult to check (see e.g. Rosiński [85] and Pipiras [69]).

Definition IV.6. The spectral functions F ≡ {ft}t∈T (and the corresponding spectral representation (4.2)) are said to be minimal if the ratio σ-algebra ρ(F) in (4.4) is equivalent to BS, i.e., for all A ∈ BS there exists B ∈ ρ(F) such that µ(A ∆ B) = 0, where A ∆ B = (A \ B) ∪ (B \ A).

Rosiński ([83], Theorem 3.1) proved that if {ft}t∈T is minimal, then there exist a unique (modulo µ) non-singular flow {φt}t∈T and a corresponding cocycle {ct}t∈T such that, for all t ∈ T,

(4.10)  ft(s) = ct(s) ((dµ◦φt/dµ)(s))^{1/α} f0◦φt(s), µ-almost everywhere.
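For intuition, both defining identities are easy to verify in the simplest example: the shift flow φt(s) = s + t on Z, for which the counting measure is invariant (so the Radon–Nikodym factor in (4.10) is ≡ 1), together with the sign cocycle ct(s) = (−1)^t. A minimal sketch (the example choices are ours, not from the text):

```python
# Shift flow on Z: phi_t(s) = s + t; the counting measure is shift-invariant,
# so the factor dmu∘phi_t/dmu in (4.10) is identically 1.
def phi(t, s):
    return s + t

# A {±1}-valued cocycle for the shift flow: c_t(s) = (-1)**t
def c(t, s):
    return (-1) ** t

for s in range(-5, 6):
    for t1 in range(-3, 4):
        for t2 in range(-3, 4):
            # flow property: phi_{t1+t2} = phi_{t1} ∘ phi_{t2}
            assert phi(t1 + t2, s) == phi(t1, phi(t2, s))
            # cocycle property: c_{t+tau}(s) = c_t(s) * c_tau(phi_t(s))
            assert c(t1 + t2, s) == c(t1, s) * c(t2, phi(t1, s))
print("flow and cocycle identities hold")
```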
Conversely, suppose that (4.10) holds for some non-singular flow {φt}t∈T, a corresponding cocycle {ct}t∈T, and a function f0 ∈ Lα(S, µ) ({ft}t∈T not necessarily minimal). Then, clearly, the SαS process X in (4.2) is stationary. In this case, we shall say that X is generated by the flow {φt}t∈T.

Consider now an SαS decomposition (4.1) of X, where the independent components {Xt^(k)}t∈T are stationary. This will be referred to as a stationary SαS decomposition, and the {Xt^(k)}t∈T's as stationary components of X. Our goal in this section is to characterize the structure of all possible stationary components. This characterization involves the invariant σ-algebra with respect to the flow {φt}t∈T:

(4.11)  Fφ = {A ∈ BS : µ(φτ(A) ∆ A) = 0, for all τ ∈ T}.
Given a function g and a σ-algebra G, we write g ∈ G if g is measurable with respect to G.

Theorem IV.7. Let {Xt}t∈T be a stationary and measurable SαS process with spectral functions {ft}t∈T given by

ft(s) = ct(s) ((dµ◦φt/dµ)(s))^{1/α} f0◦φt(s), so that Xt = ∫_S ft(s) Mα(ds), t ∈ T.

(i) Suppose that {Xt}t∈T has a stationary SαS decomposition

(4.12)  {Xt}t∈T =d {Xt^(1) + · · · + Xt^(n)}t∈T.

Then each component {Xt^(k)}t∈T has a representation

(4.13)  {Xt^(k)}t∈T =d { ∫_S rk(s) ft(s) Mα(ds) }t∈T, k = 1, · · · , n,

where the rk's can be chosen to be non-negative and ρ(F)-measurable. This choice is unique modulo µ and these rk's are φ-invariant, i.e., rk ∈ Fφ.

(ii) Conversely, for any φ-invariant rk's such that Σ_{k=1}^n |rk(s)|^α = 1, µ-almost everywhere on S, decomposition (4.12) holds with the X^(k)'s as in (4.13).
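When the rk are constants (hence trivially φ-invariant), part (ii) reduces to the α-stability property itself: Σ|rk|^α = 1 guarantees that the independent pieces recombine into a copy of X. For α = 1 and a single time point this is the stability of the Cauchy law, which can be checked by simulation (the tolerances below are heuristic):

```python
import numpy as np

rng = np.random.default_rng(0)
r1, r2 = 0.3, 0.7          # r1 + r2 = 1, i.e. sum |rk|**alpha = 1 for alpha = 1
n = 1_000_000

# Two independent standard Cauchy samples (symmetric 1-stable)
x1 = rng.standard_cauchy(n)
x2 = rng.standard_cauchy(n)
x = r1 * x1 + r2 * x2      # should again be standard Cauchy

# Standard Cauchy quartiles are (-1, 0, 1); compare with the empirical ones
q = np.quantile(x, [0.25, 0.5, 0.75])
print(np.round(q, 2))
assert np.allclose(q, [-1.0, 0.0, 1.0], atol=0.05)
```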
Proof. By using (4.10), a change of variables, and the φ-invariance of the functions rk, one can show that the X^(k)'s in (4.13) are stationary. This fact and Theorem IV.1 yield part (ii).

We now show (i). Suppose that X^(k) is a stationary (SαS) component of X. Theorem IV.1 implies that there exists a unique (modulo µ) non-negative and ρ(F)-measurable function rk for which (4.13) holds. By the stationarity of X^(k), it also follows that, for all τ ∈ T, {rk(s) ft+τ(s)}t∈T is also a spectral representation of X^(k). By the flow representation (4.10), it follows that for all t, τ ∈ T,

(4.14)  ft+τ(s) = cτ(s) ((dµ◦φτ/dµ)(s))^{1/α} ft◦φτ(s), µ-almost everywhere,

and we obtain that, for all τ, tj ∈ T, aj ∈ R, j = 1, · · · , n:

∫_S |Σ_{j=1}^n aj rk(s) ftj+τ(s)|^α µ(ds) = ∫_S |Σ_{j=1}^n aj rk◦φ−τ(s) ftj(s)|^α µ(ds),

which shows that {rk◦φ−τ(s) ft(s)}t∈T is also a representation of X^(k), for all τ ∈ T. Observe that, from (4.14), for all t1, t2, τ ∈ T and λ ∈ R,

{ft1+τ/ft2+τ ≤ λ} = φτ^{−1}({ft1/ft2 ≤ λ}) modulo µ.

It then follows that for all τ ∈ T, the σ-algebra φ−τ(ρ(F)) ≡ (φτ)^{−1}(ρ(F)) is equivalent to ρ(F). This, by the uniqueness of rk ∈ ρ(F) (Theorem IV.1), implies that rk◦φτ = rk modulo µ, for all τ. Then, rk ∈ Fφ follows from a standard measure-theoretic argument. The proof is complete.

Remark IV.8. The structure of the stationary SαS components of stationary SαS processes (including random fields) has attracted much interest since the seminal work of Rosiński [83, 84]. See, for example, Pipiras and Taqqu [71], Samorodnitsky [92], Roy [87, 88], Roy and Samorodnitsky [91], Roy [89, 90], and Wang et al. [108]. In view of Theorem IV.7, the components considered in these works correspond to indicator functions rk(s) = 1Ak(s) of certain disjoint flow-invariant sets Ak arising from ergodic theory (see e.g. Krengel [55] and Aaronson [1]).

Theorem IV.7 can be applied to check the indecomposability of stationary SαS processes. Recall that a stationary SαS process is said to be indecomposable if all its stationary SαS components are trivial (i.e., constant multiples of the original process).

Corollary IV.9. Consider {Xt}t∈T as in Theorem IV.7. If Fφ is trivial, then {Xt}t∈T is indecomposable. The converse is true when, in addition, {ft}t∈T is minimal.

Proof. If Fφ is trivial, the result follows from Theorem IV.7. Conversely, let {ft}t∈T be minimal and X indecomposable, and suppose, on the contrary, that Fφ is not trivial. Then one can choose A ∈ Fφ such that µ(A) > 0 and µ(S \ A) > 0. Consider

{Xt^A}t∈T =d { ∫_S 1A(s) ft(s) Mα(ds) }t∈T.

By Theorem IV.7, X^A is a stationary component of X. It suffices to show that X^A is a non-trivial component of X, which would contradict the indecomposability.

Suppose that X^A is trivial; then cX^A =d X for some c > 0. Thus, by Theorem IV.7, cX^A has a representation as in (4.13), with rk := c1A. On the other hand, since cX^A =d X, we also have the trivial representation with rk := 1. Since A ∈ ρ(F), the uniqueness of rk implies that 1 = c1A modulo µ, which contradicts µ(A^c) > 0. Therefore, X^A is non-trivial.

The indecomposable stationary SαS processes can be seen as the elementary building blocks for the construction of general stationary SαS processes. We conclude this section with two examples.
Example IV.10 (Mixed moving averages). Consider a mixed moving average in the sense of [102]:

(4.15)  {Xt}t∈Rd =d { ∫_{Rd×V} f(t + s, v) Mα(ds, dv) }t∈Rd.

Here, Mα is an SαS random measure on Rd × V with the control measure λ × ν, where λ is the Lebesgue measure on (Rd, BRd), ν is a probability measure on (V, BV), and f(s, v) ∈ Lα(Rd × V, BRd×V, λ × ν). Given a disjoint union V = ∪_{j=1}^n Aj, where the Aj's are measurable subsets of V, the mixed moving average can clearly be decomposed as in (4.12) with

{Xt^(k)}t∈Rd =d { ∫_{Rd×Ak} f(t + s, v) Mα(ds, dv) }t∈Rd, for all k = 1, . . . , n.

Any moving average process

(4.16)  {Xt}t∈Rd =d { ∫_{Rd} f(t + s) Mα(ds) }t∈Rd

trivially has a mixed moving average representation. The next result shows when the converse is true.

Corollary IV.11. The mixed moving average X in (4.15) is indecomposable if and only if it has a moving average representation as in (4.16).

Proof. By Corollary IV.9, the moving average process (4.16) is indecomposable, since in this case φt(s) = t + s, t, s ∈ Rd, and therefore Fφ is trivial. This proves the ‘if’ part.

Suppose now that X in (4.15) is indecomposable. In Section 5 of Pipiras [69], it was shown that SαS processes with mixed moving average representations and stationary increments also have minimal representations of the mixed moving average type. By using similar arguments, one can show that this is also true for the class of stationary mixed moving average processes.
Thus, without loss of generality, we assume that the representation in (4.15) is minimal. Suppose now that there exists a set A ∈ BV with ν(A) > 0 and ν(A^c) > 0. Since Rd × A and Rd × A^c are flow-invariant, we have the stationary decomposition {Xt}t∈Rd =d {Xt^A + Xt^{Ac}}t∈Rd, where

Xt^B := ∫_{Rd×V} 1B(v) f(t + s, v) Mα(ds, dv), B ∈ {A, A^c}.

Note that both components X^A = {Xt^A}t∈Rd and X^{Ac} = {Xt^{Ac}}t∈Rd are non-zero, because the representation of X has full support. Now, since X is indecomposable, there exist positive constants c1 and c2 such that X =d c1 X^A =d c2 X^{Ac}. The minimality of the representation and Theorem IV.7 imply that c1 1A = c2 1_{Ac} modulo ν, which is impossible. This contradiction shows that the set V cannot be partitioned into two disjoint sets of positive measure. That is, V is a singleton and the mixed moving average is in fact a moving average.

Example IV.12 (Doubly stationary processes). Consider a stationary process ξ = {ξt}t∈T (T = Zd) supported on the probability space (E, E, µ) with ξt ∈ Lα(E, E, µ). Without loss of generality, we may suppose that ξt(u) = ξ0◦φt(u), where {φt}t∈T is a µ-measure-preserving flow. Let Mα be an SαS random measure on (E, E) with control measure µ. The stationary SαS process X = {Xt}t∈T given by

(4.17)  Xt := ∫_E ξt(u) Mα(du), t ∈ T,

is said to be doubly stationary (see Cambanis et al. [11]). By Corollary IV.9, if ξ is ergodic, then X is indecomposable.

A natural and interesting question raised by a referee is: what happens when X is decomposable and hence ξ is non-ergodic? Can we have a direct integral decomposition of the process X into indecomposable components? The following remark partly addresses this question.

Remark IV.13. The doubly stationary SαS processes are a special case of stationary SαS processes generated by positively recurrent flows (actions). As shown in Samorodnitsky [92], Remark 2.6, each such stationary SαS process X = {Xt}t∈T can be expressed through a measure-preserving flow (action) on a finite measure space. Namely,

(4.18)  {Xt}t∈T =d { ∫_E ft(u) Mα^(µ)(du) }t∈T, with ft(u) := ct(u) f0◦φt(u),

where Mα^(µ) is an SαS random measure with a finite control measure µ on (E, E), φ = {φt}t∈T is a µ-preserving flow (action), and {ct}t∈T is a cocycle with respect to φ. In the case when the cocycle is trivial (ct ≡ 1) and µ(E) = 1, the process X is doubly stationary.

For simplicity, suppose that T = Zd and, without loss of generality, let (E, E, µ) be a standard Lebesgue space with µ(E) = 1. The ergodic decomposition theorem (see e.g. Keller [52], Theorem 2.3.3) implies that there exist conditional probability distributions {µu}u∈E with respect to the invariant σ-algebra I such that φ is measure-preserving and ergodic with respect to the measures µu, for µ-almost all u ∈ E. Let ν be another φ-invariant measure on (E, E) dominating the conditional probabilities µu, so that the Radon–Nikodym derivatives p(x, u) = (dµu/dν)(x) are jointly measurable on (E × E, E ⊗ E, ν × µ). Consider gt(x, u) = ft(x) p(φt(x), u)^{1/α}. Recall that ν and the µu are φ-invariant, whence

p(φt(x), u) = (dµu/dν)(φt(x)) = (dµu/dν)(x) = p(x, u), modulo ν × µ.
Thus, gt(x, u) = ft(x)(dµu/dν)^{1/α}(x), and for all aj ∈ R, tj ∈ T, j = 1, · · · , n, we have

∫_{E²} |Σ_{j=1}^n aj gtj(x, u)|^α ν(dx)µ(du) = ∫_{E²} |Σ_{j=1}^n aj ftj(x)|^α (dµu/dν)(x) ν(dx)µ(du)
  = ∫_{E²} |Σ_{j=1}^n aj ftj(x)|^α µu(dx)µ(du)
  = ∫_E |Σ_{j=1}^n aj ftj(x)|^α µ(dx),

where the last equality follows from the identity

∫_E h(x)µ(dx) = ∫_{E²} h(x)µu(dx)µ(du), for all h ∈ L¹(E, E, µ).

We have thus shown that {Xt}t∈T defined by (4.18) has another spectral representation

(4.19)  {Xt}t∈T =d { ∫_{E×E} gt(x, u) Mα^(ν×µ)(dx, du) }t∈T,

where Mα^(ν×µ) is an SαS random measure on E × E with control measure ν × µ. It also follows that, for µ-almost all u ∈ E, the process defined by

Xt^(u) := ∫_E gt(x, u) Mα^(ν)(dx), t ∈ T,

is indecomposable, where Mα^(ν) has control measure ν. Indeed, as above, one can show that

{Xt^(u)}t∈T =d { ∫_E ft(x) Mα^(µu)(dx) }t∈T,

where Mα^(µu) has control measure µu. The ergodic decomposition theorem implies that the flow (action) φ is ergodic with respect to µu, which by Corollary IV.9 implies the indecomposability of X^(u) = {Xt^(u)}t∈T. In this way, (4.19) parallels the mixed moving average representation for stationary SαS processes generated by dissipative flows (see e.g. Rosiński [83]).
Remark IV.14. The above construction of the decomposition (4.19) assumes the existence of a φ-invariant measure ν dominating all conditional probabilities µu, u ∈ E. If the measure µ, restricted to the invariant σ-algebra Fφ, is discrete, i.e., Fφ consists of countably many atoms under µ, then one can take ν ≡ µ. In this case, the process X is decomposed into a sum (possibly infinite) of its indecomposable components:

Xt = Σ_k ∫_{Ek} ft(x) Mα^(µ)(dx),

where the Ek's are disjoint φ-invariant measurable sets such that E = ∪k Ek and φ|Ek is ergodic, for each k. In this case, the Ek's are the atoms of Fφ. In general, when µ|Fφ is not discrete, the dominating measure ν, if it exists, may not be σ-finite. Indeed, since the φt's are ergodic for µu, it follows that either µu′ = µu″ or µu′ and µu″ are singular, for µ-almost all u′, u″ ∈ E. Thus, if Fφ is “too rich”, this singularity feature implies that the measure ν may not be chosen to be σ-finite.

4.3 Decomposability of Max-stable Processes
In this section, we state and prove some results on the (max-)decomposability of max-stable processes. Again, we focus on α-Fréchet processes. Let Y = {Yt}t∈T be an α-Fréchet process. If

(4.20)  {Yt}t∈T =d {Yt^(1) ∨ · · · ∨ Yt^(n)}t∈T,

for some independent α-Fréchet processes Y^(k) = {Yt^(k)}t∈T, k = 1, · · · , n, then we say that the Y^(k)'s are components of Y. By the max-stability of Y, (4.20) trivially holds if the Y^(k)'s are independent copies of {n^{−1/α} Yt}t∈T. The constant multiples of Y are referred to as trivial components of Y and, as in the SαS case, we are interested in the structure of the non-trivial ones.
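The remark that (4.20) holds trivially with independent copies of {n^{−1/α} Yt} is easy to confirm by simulation for a single unit α-Fréchet coordinate, using inverse-transform sampling (the tolerance below is heuristic):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n, N = 1.0, 4, 500_000

# Unit alpha-Fréchet samples: P(Y <= y) = exp(-y**(-alpha)),
# via the inverse transform Y = (-log U)**(-1/alpha)
def frechet(size):
    return (-np.log(rng.uniform(size=size))) ** (-1.0 / alpha)

# Maximum of n independent copies, each scaled by n**(-1/alpha) ...
m = np.max(n ** (-1.0 / alpha) * frechet((n, N)), axis=0)

# ... should again be unit alpha-Fréchet; compare medians
med_theory = np.log(2.0) ** (-1.0 / alpha)
assert abs(np.median(m) - med_theory) < 0.02
assert abs(np.median(frechet(N)) - med_theory) < 0.02
```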
The association method can be readily applied to transfer decomposability results for SαS processes to the max-stable setting. Let Y = {Yt}t∈T be an α-Fréchet (α ∈ (0, 2)) process with extremal representation

(4.21)  {Yt}t∈T =d { ∫e_S ft(s) Mα∨(ds) }t∈T,

where {ft}t∈T ⊂ Lα+(S, BS, µ) are spectral functions, and recall that

(4.22)  P(Yti ≤ yi, i = 1, · · · , n) = exp{ −∫_S max_{1≤i≤n} (fti(s)/yi)^α µ(ds) },

for all yi > 0, ti ∈ T, i = 1, · · · , n. Assume 0 < α < 2. Recall that an SαS process X and an α-Fréchet process Y are said to be associated if they have a common spectral representation, that is, if for some non-negative {ft}t∈T ⊂ Lα+(S, BS, µ), Relations (4.2) and (4.21) hold. To illustrate the association method of Chapter III, we prove the max-stable counterpart of our main result, Theorem IV.1. From the proof, we can see that the other results in the sum-stable setting have their natural max-stable counterparts by association. We briefly state some of these results at the end of this section.

Theorem IV.15. Suppose {Yt}t∈T is an α-Fréchet process with spectral representation (4.21), where F ≡ {ft}t∈T ⊂ Lα+(S, BS, µ). Let {Yt^(k)}t∈T, k = 1, · · · , n, be independent α-Fréchet processes. Then the decomposition (4.20) holds if and only if there exist measurable functions rk : S → [0, 1], k = 1, · · · , n, such that

(4.23)  {Yt^(k)}t∈T =d { ∫e_S rk(s) ft(s) Mα∨(ds) }t∈T, k = 1, · · · , n.

In this case, Σ_{k=1}^n rk(s)^α = 1, µ-almost everywhere on S, and the rk's in (4.23) can be chosen to be ρ(F)-measurable, uniquely modulo µ.

Proof. The ‘if’ part follows from a straightforward calculation of the cumulative distribution functions (4.22). To show the ‘only if’ part, suppose (4.20) holds and Y^(k) has spectral functions {gt^(k)}t∈T ⊂ Lα+(Vk, BVk, νk), k = 1, . . . , n. Without loss of generality, assume the {Vk}k=1,...,n to be mutually disjoint and define gt(v) := Σ_{k=1}^n gt^(k)(v) 1Vk(v) ∈ Lα+(V, BV, ν), for an appropriately defined (V, BV, ν) (see the proof of Theorem IV.1). Now, consider the SαS process X associated to Y. It has spectral functions {ft}t∈T and {gt}t∈T. Consider the SαS processes X^(k) associated to Y^(k) via the spectral functions {gt^(k)}t∈T, k = 1, . . . , n. By checking the characteristic functions, one can show that the {X^(k)}k=1,...,n form a decomposition of X as in (4.1). Then, by Theorem IV.1, each SαS component X^(k) has a spectral representation (4.6) with spectral functions {rk ft}t∈T. But we introduced X^(k) as the SαS process associated to Y^(k) via the spectral representation {gt^(k)}t∈T. Hence, X^(k) has spectral functions {gt^(k)}t∈T and {rk ft}t∈T, and so does Y^(k), by Theorem III.11. Therefore, (4.23) holds and the rest of the desired results follow.

Further parallel results can be established by the association method. Consider a stationary α-Fréchet process Y. If Y^(k), k = 1, . . . , n, are independent stationary α-Fréchet processes such that (4.20) holds, then we say each Y^(k) is a stationary α-Fréchet component of Y. The process Y is said to be indecomposable if it has no non-trivial stationary component. The following results on (mixed) moving maxima (see e.g. [101] and [50] for more details) follow from Theorem IV.15 and the association method, in parallel to Corollary IV.11 on (mixed) moving averages in the sum-stable setting.

Corollary IV.16. The mixed moving maxima process

{Yt}t∈Rd =d { ∫e_{Rd×V} f(t + s, v) Mα∨(ds, dv) }t∈Rd

is indecomposable if and only if it has a moving maxima representation

{Yt}t∈Rd =d { ∫e_{Rd} f(t + s) Mα∨(ds) }t∈Rd.
4.4
Proof of Theorem IV.1
We will first show that Theorem IV.1 is true when {ft }t∈T is minimal (Proposition IV.18), and then we complete the proof by relating a general spectral representations to a minimal one. This technique is standard in the literature of representations of SαS processes (see e.g. Rosi´ nski [83], Remark 2.3). We start with a useful lemma. Lemma IV.17. Let {ft }t∈T ⊂ Lα (S, BS , µ) be a minimal representation of an SαS process. For any two bounded BS -measurable functions r(1) and r(2) , we have nZ
(1)
r ft dMα
S
o
d
=
nZ
t∈T
r(2) ft dMα
S
o
,
t∈T
if and only if |r(1) | = |r(2) | modulo µ. Proof. The ’if’ part is trivial. We shall prove now the ’only if’ part. Let S (k) := supp(r(k) ), k = 1, 2 and note that since {ft }t∈T is minimal, then {r(k) ft }t∈T , are minimal representations, restricted to S (k) , k = 1, 2, respectively. Since the latter two representations correspond to the same process, by Theorem 2.2 in [83], there exist a bi-measurable, one-to-one and onto point mapping Ψ : S (1) → S (2) and a function h : S (1) → R \ {0}, such that, for all t ∈ T , (4.24)
r(1) (s)ft (s) = r(2) ◦ Ψ(s)ft ◦ Ψ(s)h(s) , almost all s ∈ S (1) ,
and (4.25)
dµ ◦ Ψ = |h|α , µ-almost everywhere. dµ
It then follows that, for almost all s ∈ S (1) , (4.26)
ft1 (s) r(1) (s)ft1 (s) ft ◦ Ψ(s) = 1 = (1) . ft2 (s) r (s)ft2 (s) ft2 ◦ Ψ(s)
Define Rλ(t1, t2) = {s : ft1(s)/ft2(s) ≤ λ} and note that, by (4.26), for all A ≡ Rλ(t1, t2),

(4.27)  µ(Ψ(A ∩ S^(1)) ∆ (A ∩ S^(2))) = 0.

In fact, one can show that Relation (4.27) is also valid for all A ∈ ρ(F) ≡ σ(Rλ(t1, t2) : λ ∈ R, t1, t2 ∈ T). Then, by minimality, (4.27) holds for all A ∈ BS. In particular, taking A equal to S^(1) and S^(2), respectively, it follows that µ(S^(1) ∆ S^(2)) = 0. Therefore, writing S̃ := S^(1) ∩ S^(2), we have

(4.28)  µ(Ψ(A ∩ S̃) ∆ (A ∩ S̃)) = 0, for all A ∈ BS.

This implies that Ψ(s) = s, for µ-almost all s ∈ S̃. To see this, let B_S̃ = BS ∩ S̃ denote the σ-algebra BS restricted to S̃. Observe that, for all A ∈ B_S̃, we have 1A = 1A◦Ψ, for µ-almost all s ∈ S̃, and trivially σ(1A : A ∈ B_S̃) = B_S̃. Thus, by the second part of Proposition 5.1 in [85], it follows that Ψ(s) = s modulo µ on S̃. This and (4.25) imply that h(s) ∈ {±1}, almost everywhere. Plugging Ψ and h into (4.24) yields the desired result.

Proposition IV.18. Theorem IV.1 is true when {ft}t∈T is minimal.

Proof. We first prove the ‘if’ part. The result follows readily by using characteristic functions. Indeed, suppose that the X^(k) = {Xt^(k)}t∈T, k = 1, . . . , n, are independent and have representations as in (4.6). Then, for all aj ∈ R, tj ∈ T, j = 1, · · · , m, we have

(4.29)  E exp{ i Σ_{j=1}^m aj Xtj } = exp{ −∫_S |Σ_{j=1}^m aj ftj|^α dµ }
        = Π_{k=1}^n exp{ −∫_S |Σ_{j=1}^m aj rk ftj|^α dµ } = Π_{k=1}^n E exp{ i Σ_{j=1}^m aj Xtj^(k) },

where the second equality follows from the fact that Σ_{k=1}^n |rk(s)|^α = 1, for µ-almost all s ∈ S. Relation (4.29) implies the decomposition (4.1).
We now prove the ‘only if’ part. Suppose that (4.1) holds and let {ft^(k)}t∈T ⊂ Lα(Vk, BVk, νk), k = 1, . . . , n, be representations of the independent components {Xt^(k)}t∈T, k = 1, . . . , n, respectively, and without loss of generality assume that the {Vk}k=1,...,n are mutually disjoint. Introduce the measure space (V, BV, ν), where V := ∪_{k=1}^n Vk, BV := {∪_{k=1}^n Ak : Ak ∈ BVk, k = 1, . . . , n}, and ν(A) := Σ_{k=1}^n νk(A ∩ Vk) for all A ∈ BV.

By decomposition (4.1), it follows that {Xt}t∈T =d { ∫_V gt dM̄α }t∈T, with gt(u) := Σ_{k=1}^n ft^(k)(u) 1Vk(u) and M̄α an SαS random measure on (V, BV) with control measure ν. Thus, {ft}t∈T ⊂ Lα(S, BS, µ) and {gt}t∈T ⊂ Lα(V, BV, ν) are two representations of the same process X, and by assumption the former is minimal. Therefore, by Remark 2.5 in [83], there exist modulo ν unique functions Φ : V → S and h : V → R \ {0} such that, for all t ∈ T,

(4.30)  gt(u) = h(u) ft◦Φ(u), for almost all u ∈ V,

where moreover µ = νh◦Φ^{−1} with dνh = |h|^α dν.

Recall that V is the union of the mutually disjoint sets {Vk}k=1,...,n. For each k = 1, . . . , n, let Φk : Vk → Sk := Φ(Vk) be the restriction of Φ to Vk, and define the measure µk(·) := νh,k◦Φk^{−1}(· ∩ Sk) on (S, BS), with dνh,k := |h|^α dνk. Note that µk has support Sk, and the Radon–Nikodym derivative dµk/dµ exists. We claim that (4.6) holds with rk := (dµk/dµ)^{1/α}. To see this, observe that for all m ∈ N, a1, . . . , am ∈ R, t1, . . . , tm ∈ T,

∫_S |Σ_{j=1}^m aj rk ftj|^α dµ = ∫_{Sk} |Σ_{j=1}^m aj ftj|^α dµk = ∫_{Vk} |Σ_{j=1}^m aj h ftj◦Φk|^α dνk,

which, combined with (4.30), yields (4.6), because gt|Vk = ft^(k). Note also that Σ_{k=1}^n µk = µ and thus Σ_{k=1}^n rk^α = 1. This completes the proof of part (i) of Theorem IV.1 in the case when {ft}t∈T is minimal.

To prove part (ii), note that the rk's above are in fact non-negative and BS-measurable. Note also that, by minimality, the rk's have versions r̃k that are ρ(F)-measurable, i.e., rk = r̃k modulo µ. Their uniqueness follows from Lemma IV.17.

Proof of Theorem IV.1. (i) The ‘if’ part follows by using characteristic functions, as in the proof of Proposition IV.18 above.

Now we prove the ‘only if’ part. Let {f̃t}t∈T ⊂ Lα(S̃, B_S̃, µ̃) be a minimal representation of X. As in the proof of Proposition IV.18, by Remark 2.5 in [83], there exist modulo µ unique functions Φ : S → S̃ and h : S → R \ {0} such that, for all t ∈ T,

(4.31)  ft(s) = h(s) f̃t◦Φ(s), for almost all s ∈ S,

where µ̃ = µh◦Φ^{−1} with dµh = |h|^α dµ. Now, by Proposition IV.18, if the decomposition (4.1) holds, then there exist unique non-negative functions r̃k, k = 1, · · · , n, such that

(4.32)  {Xt^(k)}t∈T =d { ∫_S̃ r̃k f̃t dM̃α }t∈T, k = 1, · · · , n,

and Σ_{k=1}^n r̃k^α = 1 modulo µ̃. Here M̃α is an SαS random measure on (S̃, B_S̃) with control measure µ̃. Let rk(s) := r̃k◦Φ(s) and note that, by using (4.31) and a change of variables, for all aj ∈ R, tj ∈ T, j = 1, · · · , m, we obtain

(4.33)  ∫_S |Σ_{j=1}^m aj rk(s) ftj(s)|^α µ(ds) = ∫_S̃ |Σ_{j=1}^m aj r̃k(s) f̃tj(s)|^α µ̃(ds).

This, in view of Relation (4.32), implies (4.6). Further, the fact that Σ_{k=1}^n r̃k^α = 1 implies Σ_{k=1}^n rk^α = 1 modulo µ, because the mapping Φ is non-singular, i.e., µ◦Φ^{−1} ∼ µ̃. This completes the proof of part (i).
We now focus on proving part (ii). Suppose that (4.6) holds for two choices of rk, namely rk′ and rk″. Let also rk′ and rk″ be non-negative and measurable with respect to ρ(F). We claim that

(4.34)  ρ(F) ∼ Φ^{−1}(ρ(F̃)),

and defer the proof to the end. Then, since minimality implies that B_S̃ ∼ ρ(F̃), the functions rk′ and rk″ are measurable with respect to ρ(F) ∼ Φ^{−1}(B_S̃). Now, the Doob–Dynkin lemma (see e.g. Rao [75], p. 30) implies that

(4.35)  rk′(s) = r̃k′◦Φ(s) and rk″(s) = r̃k″◦Φ(s), for µ-almost all s,

where r̃k′ and r̃k″ are two B_S̃-measurable functions. By using the last relation and a change of variables, we obtain that (4.33) holds with (rk, r̃k) replaced by (rk′, r̃k′) and (rk″, r̃k″), respectively. Thus both {r̃k′ f̃t}t∈T and {r̃k″ f̃t}t∈T are representations of the k-th component of X. Since {f̃t}t∈T is a minimal representation of X, Lemma IV.17 implies that r̃k′ = r̃k″ modulo µ̃. This, by (4.35) and the non-singularity of Φ, yields rk′ = rk″ modulo µ.

It remains to prove (4.34). Relation (4.31) and the fact that h(s) ≠ 0 imply that, for all λ and t1, t2 ∈ T, {ft1/ft2 ≤ λ} = Φ^{−1}({f̃t1/f̃t2 ≤ λ}) modulo µ. Thus the classes of sets C := {{ft1/ft2 ≤ λ} : t1, t2 ∈ T, λ ∈ R} and C̃ := {Φ^{−1}({f̃t1/f̃t2 ≤ λ}) : t1, t2 ∈ T, λ ∈ R} are equivalent. That is, for all A ∈ C there exists Ã ∈ C̃ with µ(A ∆ Ã) = 0, and vice versa. Define

G̃ = { Φ^{−1}(A) : A ∈ ρ(F̃) such that µ(Φ^{−1}(A) ∆ B) = 0 for some B ∈ σ(C) }.

Notice that G̃ is a σ-algebra and, since C̃ ⊂ G̃ ⊂ Φ^{−1}(ρ(F̃)), we obtain that σ(C̃) = Φ^{−1}(ρ(F̃)) ≡ G̃. This, in view of the definition of G̃, shows that for every Ã ∈ σ(C̃) there exists A ∈ σ(C) with µ(A ∆ Ã) = 0. In a similar way one can show that each element of σ(C) is equivalent to an element of σ(C̃), which completes the proof of the desired equivalence of the σ-algebras.
CHAPTER V
Conditional Sampling for Max-stable Processes
The modeling and parameter estimation of the univariate marginal distributions of the extremes have been studied extensively (see e.g. Davison and Smith [21], de Haan and Ferreira [24], Resnick [77] and the references therein). Many of the recent developments of statistical inference in extreme value theory focus on the characterization, modeling and estimation of the dependence for multivariate extremes. In this context, building adequate max-stable processes and random fields plays a key role. See for example de Haan and Pereira [25], Buishand et al. [9], Schlather [94], Schlather and Tawn [95], Cooley et al. [17], and Naveau et al. [64].

This chapter is motivated by an important and long-standing challenge, namely, prediction for max-stable random processes and fields. Suppose that one already has a suitable max-stable model for the dependence structure of a random field {Xt}t∈T. The field is observed at several locations t1, . . . , tn ∈ T and one wants to predict the values of the field Xs1, . . . , Xsm at some other locations. The optimal predictors involve the conditional distribution of {Xt}t∈T, given the data. Even if the finite-dimensional distributions of the field {Xt}t∈T are available in analytic form, it is typically impossible to obtain a closed-form solution for the conditional distribution. Naïve Monte Carlo approximations are not practical either, since they
involve conditioning on events of infinitesimal probability, which leads to mounting errors and computational costs. Prior studies of Davis and Resnick [19, 20] and Cooley et al. [17], among others, have shown that the prediction problem in the max-stable context is challenging and does not have an elegant analytical solution. On the other hand, the growing popularity and use of max-stable processes in various applications make this an important problem. This motivated us to seek a computational solution.

5.1 Overview
In this chapter, we develop theory and methodology for sampling from the conditional distributions of spectrally discrete max-stable models. More precisely, we provide an algorithm that can efficiently generate exact independent samples from the regular conditional probability of (Xs1, . . . , Xsm), given the values (Xt1, . . . , Xtn). For the sake of simplicity, we write X = (X1, . . . , Xn) ≡ (Xt1, . . . , Xtn). The algorithm applies to the general max-linear model:

(5.1)  Xi = max_{j=1,...,p} ai,j Zj ≡ ⋁_{j=1}^p ai,j Zj, i = 1, . . . , n,

where the ai,j's are known non-negative constants and the Zj's are independent continuous non-negative random variables. Any multivariate max-stable distribution can be approximated arbitrarily well via a max-linear model with sufficiently large p (see e.g. Remark II.1). The main idea is to first generate samples from the regular conditional probability distribution of Z | X = x, where Z = (Zj)j=1,...,p. Then, the conditional distributions of

Xsk = ⋁_{j=1}^p bk,j Zj, k = 1, . . . , m,
given X = x can be readily obtained, for any given bk,j's. In this chapter, we assume that the model is completely known, i.e., the parameters {ai,j} and {bk,j} are given. The statistical inference for these parameters is beyond the scope of this chapter.

Observe that if X = x, then (5.1) implies natural equality and inequality constraints on the Zj's. More precisely, (5.1) gives rise to a set of so-called hitting scenarios. In each hitting scenario, a subset of the Zj's equal — in other words, hit — their upper bounds, and the rest of the Zj's can take arbitrary values in certain open intervals. We will show that the regular conditional probability of Z | X = x is a weighted mixture of the various distributions of the vector Z, under all possible hitting scenarios corresponding to X = x. The resulting formula, however, involves determining all hitting scenarios, which becomes computationally prohibitive for large and even moderate values of p. This issue is closely related to the NP-hard set-covering problem in computer science (see e.g. [13]).

Fortunately, further detailed analysis of the probabilistic structure of the max-linear models allows us to obtain a different formula for the regular conditional probability (Theorem V.9). It yields an exact and computationally efficient algorithm, which in practice can handle complex max-linear models with p in the order of thousands, on a conventional desktop computer. The algorithm is implemented in the R ([74]) package maxLinear [107], with the core part written in C/C++. We also used the R package fields ([37]) to generate some of the figures in this chapter.

We illustrate the performance of our algorithm over two classes of processes: the max-autoregressive moving average (MARMA) time series (Davis and Resnick [19]), and the Smith model (Smith [98]) for spatial extremes. The MARMA processes are spectrally discrete max-stable processes, and our algorithm applies directly. In
Section 5.4, we demonstrate the prediction of MARMA processes by conditional sampling and compare our results to the projection predictors proposed in [19]. To apply our algorithm to the Smith model, on the other hand, we first need to discretize the (spectrally continuous) model. Section 5.5 is devoted to conditional sampling for the discretized Smith model. Thanks to the computational efficiency of our algorithm, we can choose a mesh fine enough to obtain a satisfactory discretization. Figure 5.1 shows four realizations from such a discretized Smith model, conditioned on only 7 observations (each with assumed value 5). The algorithm applies in the same way to more complex models.
Figure 5.1: Four samples from the conditional distribution of the discrete Smith model (see Section 5.5), given the observed values (all equal to 5) at the locations marked by crosses. Parameters: $\rho = 0$, $\beta_1 = 1$, $\beta_2 = 1$.
Remark V.1. We shall focus on spectrally discrete max-stable processes (see Chapter II):
$$X_t := \bigvee_{j=1}^{p} \phi_j(t) Z_j, \quad t \in T,$$
where the $\phi_j(t)$'s are non-negative deterministic functions. By taking sufficiently large $p$'s and judicious $\phi_j(t)$'s, one can build flexible models that replicate the behavior of an arbitrary max-stable process (recall the metric (2.5) characterizing the convergence of stochastic extremal integrals). From this point of view, a satisfactory computational solution must be able to deal with max-linear models with large $p$'s.

Remark V.2. After our work [112] was published, the exact conditional distributions of spectrally continuous max-stable processes were addressed by Dombry and Eyi-Minko [31] via a different approach. Nevertheless, they also use a notion similar to the hitting scenarios introduced below.

5.2 Conditional Probability in Max-linear Models
Consider the max-linear model in (5.1). We shall denote this model by
$$X = A\,Z, \tag{5.2}$$
where $A = (a_{i,j})_{n \times p}$ is a matrix with non-negative entries, and $X = (X_1, \ldots, X_n)$ and $Z = (Z_1, \ldots, Z_p)$ are column vectors. We assume that the $Z_j$'s, $j = 1, \ldots, p$, are independent non-negative random variables having probability densities. In this section, we provide an explicit formula for the regular conditional probability of $Z$ with respect to $X$ (see Theorem V.4 below). We start with some intuition and notation. Throughout this chapter, we assume that the matrix $A$ has at least one nonzero entry in each of its rows and columns; this will be referred to as Assumption A.
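In plain terms, the product written $X = A\,Z$ in (5.2) replaces the sums of ordinary matrix-vector multiplication by maxima. The following is a minimal sketch of this operation (illustrative only; the chapter's actual implementation is the R package maxLinear):

```python
import numpy as np

def max_matrix_product(A, z):
    # X_i = max_j A[i, j] * z[j]: the max-analogue of matrix-vector product
    A = np.asarray(A, dtype=float)
    z = np.asarray(z, dtype=float)
    return (A * z).max(axis=1)  # broadcasting scales each column j by z[j]

A = np.array([[1., 0., 0.],
              [1., 1., 0.],
              [1., 1., 1.]])
z = np.array([1., 2., 3.])
x = max_matrix_product(A, z)
print(x)   # [1. 2. 3.]
```

With this $A$ and $z$, every row maximum is attained on the diagonal, so $x = z$; this is exactly case (i) of Example V.3 below.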
Observe that if $x = A\,z$ with $x \in \mathbb{R}^n_+$, $z \in \mathbb{R}^p_+$, then
$$0 \le z_j \le \hat z_j \equiv \hat z_j(A, x) := \min_{1 \le i \le n} x_i / a_{i,j}, \quad j = 1, \ldots, p. \tag{5.3}$$
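Computing the upper bounds in (5.3) is cheap: for each column $j$, take the minimum of $x_i/a_{i,j}$ over the rows with $a_{i,j} > 0$ (zero entries impose no constraint). A minimal sketch, using a $3 \times 3$ matrix for illustration:

```python
import numpy as np

def upper_bounds(A, x):
    # zhat_j = min_{i : A[i,j] > 0} x[i] / A[i,j], as in (5.3);
    # entries with A[i,j] == 0 impose no constraint, encoded here as +inf
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    ratios = np.divide(x[:, None], A, out=np.full_like(A, np.inf), where=A > 0)
    return ratios.min(axis=0)

A = np.array([[1., 0., 0.],
              [1., 1., 0.],
              [1., 1., 1.]])
zhat = upper_bounds(A, np.array([1., 1., 3.]))
print(zhat)   # [1. 1. 3.]
```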
That is, the max-linear model (5.2) imposes certain inequality and equality constraints on the $Z_j$'s, given a set of observed $X_i$'s. Namely, some of the upper bounds $\hat z_j(A, x)$ in (5.3) must be attained, or hit, i.e., $z_j = \hat z_j(A, x)$, in such a way that
$$x_i = a_{i, j(i)}\, z_{j(i)}, \quad i = 1, \ldots, n,$$
for judicious $j(i) \in \{1, \ldots, p\}$. The next example helps to understand the inequality and equality constraints.

Example V.3. Suppose that $n = p = 3$ and
$$A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}.$$
Let $x = A\,z$ for some $z \in \mathbb{R}^3_+$. In this case, it necessarily follows that $x_1 \le x_2 \le x_3$. Moreover, (5.3) yields $\hat z = x$.

(i) If $x = (1, 2, 3)$, then it trivially follows that $z = \hat z = (1, 2, 3)$, which is an equality constraint on $z$.

(ii) If $x = (1, 1, 3)$, then it follows that $z_1 = \hat z_1 = 1$, $z_2 \le \hat z_2 = 1$ and $z_3 = \hat z_3 = 3$. Here, the "equality constraints" must hold for $z_1 = \hat z_1$ and $z_3 = \hat z_3$, while $z_2$ only needs to satisfy the "inequality constraint" $0 \le z_2 \le \hat z_2$.

Write $C(A, x) := \{z \in \mathbb{R}^p_+ : x = A\,z\},$
and note that the conditional distribution of $Z \mid X = x$ concentrates on the set $C(A, x)$. The observation in Example V.3 can be generalized and formulated as follows.

• Every $z \in C(A, x)$ corresponds to a set of active (equality) constraints $J \subset \{1, \ldots, p\}$, which we refer to as a hitting scenario of $(A, x)$, such that
$$z_j = \hat z_j(A, x),\ j \in J \quad \text{and} \quad z_j < \hat z_j(A, x),\ j \in J^c := \{1, \ldots, p\} \setminus J. \tag{5.4}$$
Observe that if $j \notin J$, then there are no further constraints and $z_j$ can take any value in $[0, \hat z_j)$, regardless of the values of the other components of the vector $z \in C(A, x)$.

• Every value $x$ may give rise to many different hitting scenarios $J \subset \{1, \ldots, p\}$. Let $\mathcal{J}(A, x)$ denote the collection of all such $J$'s. We refer to $\mathcal{J}(A, x)$ as the hitting distribution of $x$ w.r.t. $A$:
$$\mathcal{J}(A, x) \equiv \big\{ J \subset \{1, \ldots, p\} : \text{there exists } z \in C(A, x) \text{ such that (5.4) holds} \big\}.$$

To illustrate the notions of hitting scenario and hitting distribution, consider again Example V.3. Therein, we have $\mathcal{J}(A, x) = \{\{1, 2, 3\}\}$ in case (i), and $\mathcal{J}(A, x) = \{\{1, 3\}, \{1, 2, 3\}\}$ in case (ii). The hitting distribution $\mathcal{J}(A, x)$ is a finite set and thus can always be identified. However, this identification is the key difficulty in providing an efficient algorithm for conditional sampling in practice. This issue is addressed in Section 5.3. In the rest of this section, suppose that $\mathcal{J}(A, x)$ is given. Then, we can partition $C(A, x)$ as follows:
$$C(A, x) = \bigcup_{J \in \mathcal{J}(A, x)} C_J(A, x),$$
where $C_J(A, x) = \{z \in \mathbb{R}^p_+ : z_j = \hat z_j,\ j \in J \text{ and } z_j < \hat z_j,\ j \notin J\}$. The sets $C_J(A, x)$, $J \in \mathcal{J}(A, x)$, are disjoint since they correspond to different hitting scenarios in $\mathcal{J}(A, x)$. Let
$$r(\mathcal{J}(A, x)) = \min_{J \in \mathcal{J}(A, x)} |J|, \tag{5.5}$$
where $|J|$ is the number of elements in $J$. We call $r(\mathcal{J}(A, x))$ the rank of the hitting distribution $\mathcal{J}(A, x)$. It equals the minimal number of equality constraints among the hitting scenarios in $\mathcal{J}(A, x)$. It will turn out that the hitting scenarios $J \in \mathcal{J}(A, x)$ with $|J| > r(\mathcal{J}(A, x))$ occur with (conditional) probability zero and can be ignored. We therefore focus on the set of all relevant hitting scenarios:
$$\mathcal{J}_r(A, x) = \{J \in \mathcal{J}(A, x) : |J| = r(\mathcal{J}(A, x))\}.$$

Theorem V.4. Consider the max-linear model in (5.2), where the $Z_j$'s are independent random variables with densities $f_{Z_j}$ and distribution functions $F_{Z_j}$, $j = 1, \ldots, p$. Let $A = (a_{i,j})_{n \times p}$ have non-negative entries satisfying Assumption A, and let $\mathcal{R}_{\mathbb{R}^p_+}$ be the class of all rectangles $\{(e, f],\ e, f \in \mathbb{R}^p_+\}$ in $\mathbb{R}^p_+$. For all $J \in \mathcal{J}(A, x)$, $E \in \mathcal{R}_{\mathbb{R}^p_+}$, and $x \in \mathbb{R}^n_+$, define
$$\nu_J(x, E) := \prod_{j \in J} \delta_{\hat z_j}(\pi_j(E)) \prod_{j \in J^c} \mathbb{P}\{Z_j \in \pi_j(E) \mid Z_j < \hat z_j\}, \tag{5.6}$$
where $\pi_j(z_1, \ldots, z_p) = z_j$ and $\delta_a$ is a unit point mass at $a$. Then, the regular conditional probability $\nu(x, E)$ of $Z$ w.r.t. $X$ equals
$$\nu(x, E) = \sum_{J \in \mathcal{J}_r(A, x)} p_J(A, x)\, \nu_J(x, E), \quad E \in \mathcal{R}_{\mathbb{R}^p_+}, \tag{5.7}$$
for $\mathbb{P}^X$-almost all $x \in A(\mathbb{R}^p_+)$, where for all $J \in \mathcal{J}_r(A, x),$
$$p_J(A, x) = \frac{w_J}{\sum_{K \in \mathcal{J}_r(A, x)} w_K} \quad \text{with} \quad w_J = \prod_{j \in J} \hat z_j f_{Z_j}(\hat z_j) \prod_{j \in J^c} F_{Z_j}(\hat z_j). \tag{5.8}$$
In the special case when the $Z_j$'s are $\alpha$-Fréchet with scale coefficient 1, we have $w_J = \prod_{j \in J} (\hat z_j)^{-\alpha}$.

Remark V.5. We state (5.7) only for rectangle sets $E$ because the projections $\pi_j(B)$ of an arbitrary Borel set $B \subset \mathbb{R}^p_+$ are not always Borel (see e.g. [99]). Nevertheless, the extension of measure theorem ensures that Formula (5.7) specifies the regular conditional probability completely. We do not prove Theorem V.4 directly. Instead, we will first provide an equivalent formula for $\nu(x, E)$ in Theorem V.9 in Section 5.3, and then prove that $\nu(x, E)$ is the desired regular conditional probability. All proofs are deferred to Section 5.6. The next example gives the intuition behind Formula (5.7).

Example V.6. Continue with Example V.3.

(i) If $X = x = (1, 2, 3)$, then $\hat z = x$ and $\mathcal{J}(A, x) = \{\{1, 2, 3\}\}$. Therefore, $r(\mathcal{J}(A, x)) = 3$ and Formula (5.7) yields
$$\nu(x, E) = \nu_J(x, E) = \delta_{\hat z_1}(\pi_1(E))\, \delta_{\hat z_2}(\pi_2(E))\, \delta_{\hat z_3}(\pi_3(E)) \equiv \delta_{\hat z}(E),$$
a degenerate distribution with a single unit point mass at $\hat z$.

(ii) If $X = x = (1, 1, 3)$, then $\hat z = x$, $\mathcal{J}(A, x) = \{\{1, 3\}, \{1, 2, 3\}\}$, and $r(\mathcal{J}(A, x)) = 2$. Therefore, $\mathcal{J}_r(A, x) = \{\{1, 3\}\}$ and Formula (5.7) yields
$$\nu(x, E) = \nu_{\{1,3\}}(x, E) = \delta_{\hat z_1}(\pi_1(E))\, \mathbb{P}(Z_2 \in \pi_2(E) \mid Z_2 < \hat z_2)\, \delta_{\hat z_3}(\pi_3(E)).$$
In this case, the conditional distribution concentrates on the one-dimensional set $\{1\} \times (0, 1) \times \{3\}$.

(iii) Finally, if $X = x = (1, 1, 1)$, then $\hat z = x$ and $\mathcal{J}(A, x) = \{\{1\}, \{1, 2\}, \{1, 2, 3\}\}$. Then, $\mathcal{J}_r(A, x) = \{\{1\}\}$ and
$$\nu(x, E) = \nu_{\{1\}}(x, E) = \delta_{\hat z_1}(\pi_1(E)) \prod_{j=2}^{3} \mathbb{P}(Z_j \in \pi_j(E) \mid Z_j < \hat z_j).$$
The conditional distribution concentrates on the set $\{1\} \times (0, 1) \times (0, 1)$.

We conclude this section by showing that the conditional distributions (5.7) arise as suitable limits. This result can be viewed as a heuristic justification of Theorem V.4. Let $\epsilon > 0$ and consider
$$C_J^\epsilon(A, x) := \big\{ z \in \mathbb{R}^p_+ : z_j \in [\hat z_j(1-\epsilon), \hat z_j(1+\epsilon)],\ j \in J,\ z_k < \hat z_k(1-\epsilon),\ k \in J^c \big\}, \tag{5.9}$$
and set
$$C^\epsilon(A, x) := \bigcup_{J \in \mathcal{J}(A, x)} C_J^\epsilon(A, x). \tag{5.10}$$
Note that the sets $A(C^\epsilon(A, x))$ shrink to the point $x$ as $\epsilon \downarrow 0$.

Proposition V.7. Under the assumptions of Theorem V.4, for all $x \in A(\mathbb{R}^p_+)$, we have, as $\epsilon \downarrow 0$,
$$\mathbb{P}(Z \in E \mid Z \in C^\epsilon(A, x)) \longrightarrow \nu(x, E), \quad E \in \mathcal{R}_{\mathbb{R}^p_+}. \tag{5.11}$$
Proof. Recall the definition of $C_J^\epsilon$ in (5.9). Observe that for all $\epsilon > 0$, the sets $\{C_J^\epsilon(A, x)\}_{J \in \mathcal{J}(A, x)}$ are mutually disjoint. Thus, writing $C^\epsilon \equiv C^\epsilon(A, x)$ and $C_J^\epsilon \equiv C_J^\epsilon(A, x)$, by (5.10) we have
$$\mathbb{P}(Z \in E \mid Z \in C^\epsilon) = \sum_{J \in \mathcal{J}} \mathbb{P}(Z \in E \mid Z \in C_J^\epsilon)\, \mathbb{P}(Z \in C_J^\epsilon \mid Z \in C^\epsilon) = \sum_{J \in \mathcal{J}} \mathbb{P}(Z \in E \mid Z \in C_J^\epsilon)\, \frac{\mathbb{P}(Z \in C_J^\epsilon)}{\sum_{K \in \mathcal{J}} \mathbb{P}(Z \in C_K^\epsilon)}, \tag{5.12}$$
where the terms with $\mathbb{P}(Z \in C_J^\epsilon) = 0$ are ignored. One can see that $\mathbb{P}(Z \in E \mid Z \in C_J^\epsilon)$ converges to $\nu_J(x, E)$ in (5.6) as $\epsilon \downarrow 0$. The independence of the $Z_j$'s also implies that
$$\mathbb{P}(Z \in C_J^\epsilon) = \prod_{j \in J} \mathbb{P}\big(Z_j \in [\hat z_j(1-\epsilon), \hat z_j(1+\epsilon)]\big) \prod_{k \in J^c} \mathbb{P}\big(Z_k \le \hat z_k(1-\epsilon)\big) = \prod_{j \in J} \big( \hat z_j f_{Z_j}(\hat z_j) \cdot 2\epsilon + o(\epsilon) \big) \prod_{k \in J^c} \big( F_{Z_k}(\hat z_k) + o(1) \big). \tag{5.13}$$
Observe that for $J \in \mathcal{J}_r(A, x)$, the latter expression equals $(2\epsilon)^{|J|} w_J (1 + o(1))$ as $\epsilon \downarrow 0$, and the terms with $|J| > r$ become negligible since they are of smaller order. Therefore, Relation (5.13) yields (5.7), and the proof is thus complete.

The proof of Proposition V.7 provides insight into the expressions for the weights $w_J$ in (5.8) and the components $\nu_J$ in (5.6). In particular, it explains why only hitting scenarios of rank $r$ are involved in the expression of the conditional probability. The formal proof of Theorem V.4, however, requires a different argument.

5.3 Conditional Sampling: Computational Efficiency
We discuss here important computational issues related to sampling from the regular conditional probability in (5.7). It turns out that identifying all hitting scenarios amounts to solving the set-covering problem, which is NP-hard (see e.g. [13]). The probabilistic structure of the max-linear models, however, will lead us to an alternative efficient solution, valid with probability one. In particular, we will provide a new formula for the regular conditional probability, showing that $Z$ can be decomposed into conditionally independent vectors, given $X = x$. As a consequence, with probability one we are not in the 'bad' situation where the corresponding set-covering problem requires exponential time to solve. Indeed, this will lead us to an efficient and linearly scalable algorithm for conditional sampling, which works well for max-linear models with the large dimensions $n \times p$ arising in applications. To fix ideas, observe that Theorem V.4 implies the following simple algorithm.

Algorithm I:

1. Compute $\hat z_j$ for $j = 1, \ldots, p$.

2. Identify $\mathcal{J}(A, x)$, compute $r = r(\mathcal{J}(A, x))$ and focus on the set of relevant
hitting scenarios $\mathcal{J}_r = \mathcal{J}_r(A, x)$.

3. Compute $\{w_J\}_{J \in \mathcal{J}_r}$ and $\{p_J\}_{J \in \mathcal{J}_r}$.

4. Sample $Z \sim \nu(x, \cdot)$ according to (5.7).

Step 1 is immediate. Provided that Step 2 is done, Step 3 is trivial, and Step 4 can be carried out by first picking a hitting scenario $J \in \mathcal{J}_r(A, x)$ (with probability $p_J(A, x)$), setting $Z_j = \hat z_j$ for $j \in J$, and then resampling independently the remaining $Z_j$'s from the truncated distributions $Z_j \mid \{Z_j < \hat z_j\}$, for all $j \in \{1, \ldots, p\} \setminus J$. The most computationally intensive part of this algorithm is identifying the set of all relevant hitting scenarios $\mathcal{J}_r(A, x)$ in Step 2. This is closely related to the NP-hard set-covering problem in theoretical computer science (see e.g. [13]), which is formulated next.

Let $H = (h_{i,j})_{n \times p}$ be a matrix of 0's and 1's, and let $c = (c_j)_{j=1}^p \in \mathbb{Z}^p_+$ be a $p$-dimensional cost vector. For simplicity, introduce the notation $\langle m \rangle \equiv \{1, 2, \ldots, m\}$, $m \in \mathbb{N}$. For the matrix $H$, we say that the column $j \in \langle p \rangle$ covers the row $i \in \langle n \rangle$ if $h_{i,j} = 1$. The goal of the set-covering problem is to find a minimum-cost subset $J \subset \langle p \rangle$ such that every row is covered by at least one column $j \in J$. This is equivalent to solving
$$\min_{\delta_j \in \{0,1\},\, j \in \langle p \rangle}\ \sum_{j \in \langle p \rangle} c_j \delta_j, \quad \text{subject to} \quad \sum_{j \in \langle p \rangle} h_{i,j} \delta_j \ge 1,\ i \in \langle n \rangle. \tag{5.14}$$
We can relate the problem of identifying $\mathcal{J}_r(A, x)$ to the set-covering problem by defining
$$h_{i,j} = 1_{\{a_{i,j} \hat z_j = x_i\}}, \tag{5.15}$$
where $A = (a_{i,j})_{n \times p}$ and $x = (x_i)_{i=1}^n$ are as in (5.2), and $c_j = 1$, $j \in \langle p \rangle$. It is easy to see that every $J \in \mathcal{J}_r(A, x)$ corresponds to a solution of (5.14), and vice versa. Namely, for $\{\delta_j\}_{j \in \langle p \rangle}$ minimizing (5.14), we have $J = \{j \in \langle p \rangle : \delta_j = 1\} \in \mathcal{J}_r(A, x)$. The set $\mathcal{J}_r(A, x)$ corresponds to the set of all solutions of (5.14), which depends only on the matrix $H$. Therefore, in the sequel we write $\mathcal{J}_r(H)$ for $\mathcal{J}_r(A, x)$, and
$$H = (h_{i,j})_{n \times p} \equiv H(A, x), \tag{5.16}$$
with $h_{i,j}$ as in (5.15), will be referred to as the hitting matrix.

Example V.8. Recall Example V.6. The following hitting matrices correspond to the three cases of $x$ discussed therein:
$$H^{(i)} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad H^{(ii)} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \quad \text{and} \quad H^{(iii)} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}.$$
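To make the connection to set covering concrete, here is a brute-force sketch (zero-based indices, illustrative only) that builds the hitting matrix via (5.15) and enumerates all minimum-cardinality covers; the exponential search over subsets is exactly what the decomposition of this section avoids:

```python
import numpy as np
from itertools import combinations

def hitting_matrix(A, x, zhat):
    # H[i, j] = 1 iff A[i, j] * zhat_j == x_i, as in (5.15)
    return (A * zhat == x[:, None]).astype(int)

def minimal_covers(H):
    # Exhaustively enumerate all minimum-cardinality column subsets that
    # cover every row (the set J_r).  Exponential in p: brute force only.
    n, p = H.shape
    for size in range(1, p + 1):
        covers = [set(J) for J in combinations(range(p), size)
                  if np.all(H[:, list(J)].sum(axis=1) >= 1)]
        if covers:
            return covers
    return []

A = np.array([[1., 0., 0.], [1., 1., 0.], [1., 1., 1.]])
x = np.array([1., 1., 3.])            # case (ii) of Example V.6
zhat = np.array([1., 1., 3.])
H = hitting_matrix(A, x, zhat)
print(H)                      # rows [1 0 0], [1 1 0], [0 0 1]
print(minimal_covers(H))      # [{0, 2}], i.e. J_r = {{1, 3}} in 1-based labels
```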
Observe that solving for $\mathcal{J}_r(H)$ is even more challenging than solving the set-covering problem (5.14), where only one minimum-cost subset $J$ is needed and an approximation of the optimal solution is often acceptable. Here, we need to identify exhaustively all $J$'s for which (5.14) holds. Fortunately, this problem can be substantially simplified thanks to the probabilistic structure of the max-linear model. We first study the distribution of $H$. In view of (5.16), $H = H(A, X)$, with $X = A\,Z$, is a random matrix. It will turn out that, with probability one, $H$ has a nice structure, leading to an efficient conditional sampling algorithm. For any hitting matrix $H$, we will decompose the set $\langle p \rangle \equiv \{1, \ldots, p\}$ into a certain disjoint union $\langle p \rangle = \bigcup_{s=1}^{r} \overline{J}^{(s)}$. The vectors $(Z_j)_{j \in \overline{J}^{(s)}}$, $s = 1, \ldots, r$, will turn out to be conditionally independent (in $s$), given $X = x$. Therefore, $\nu(x, E)$ will be expressed as a product of (conditional) probabilities.
We start by decomposing the set $\langle n \rangle \equiv \{1, \ldots, n\}$. First, for all $i_1, i_2 \in \langle n \rangle$ and $j \in \langle p \rangle$, we write $i_1 \overset{j}{\sim} i_2$ if $h_{i_1,j} = h_{i_2,j} = 1$. Then, we define an equivalence relation on $\langle n \rangle$:
$$i_1 \sim i_2, \quad \text{if} \quad i_1 = \tilde i_0 \overset{j_1}{\sim} \tilde i_1 \overset{j_2}{\sim} \cdots \overset{j_m}{\sim} \tilde i_m = i_2, \tag{5.17}$$
for some $m \le n$, $i_1 = \tilde i_0, \tilde i_1, \ldots, \tilde i_m = i_2 \in \langle n \rangle$ and $j_1, \ldots, j_m \in \langle p \rangle$. That is, '$\sim$' is the transitive closure of '$\overset{j}{\sim}$'. Consequently, we obtain a partition of $\langle n \rangle$, denoted by
$$\langle n \rangle = \bigcup_{s=1}^{r} I_s, \tag{5.18}$$
where $I_s$, $s = 1, \ldots, r$, are the equivalence classes w.r.t. (5.17). Based on (5.18), we define further
$$J^{(s)} = \big\{ j \in \langle p \rangle : h_{i,j} = 1 \text{ for all } i \in I_s \big\}, \tag{5.19}$$
$$\overline{J}^{(s)} = \big\{ j \in \langle p \rangle : h_{i,j} = 1 \text{ for some } i \in I_s \big\}. \tag{5.20}$$
The sets $\{J^{(s)}, \overline{J}^{(s)}\}_{s \in \langle r \rangle}$ will determine the factorization form of $\nu(x, E)$.
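The partition (5.18) and the sets (5.19)-(5.20) can be computed with a union-find pass over the columns of $H$, in time nearly linear in the number of 1's of $H$. A sketch (zero-based indices, illustrative only):

```python
import numpy as np

def decompose(H):
    # Rows sharing a 1 in some column j are ~^j-related; union-find yields the
    # equivalence classes I_s of (5.17), then J^(s) and Jbar^(s) of
    # (5.19)-(5.20) follow by scanning columns.
    n, p = H.shape
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    for j in range(p):
        rows = np.flatnonzero(H[:, j])
        for i in rows[1:]:
            parent[find(i)] = find(rows[0])
    classes = {}
    for i in range(n):
        classes.setdefault(find(i), []).append(i)
    result = []
    for I in classes.values():
        J = {j for j in range(p) if all(H[i, j] for i in I)}
        Jbar = {j for j in range(p) if any(H[i, j] for i in I)}
        result.append((set(I), J, Jbar))
    return result

H = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1]])   # H^(ii) of Example V.8
print(decompose(H))   # [({0, 1}, {0}, {0, 1}), ({2}, {2}, {2})]
```

For $H^{(ii)}$, this recovers $I_1 = \{1, 2\}$, $I_2 = \{3\}$ (1-based), with $J^{(1)} = \{1\}$ and $J^{(2)} = \{3\}$, so that (5.21) below gives the single relevant hitting scenario $\{1, 3\}$.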
Theorem V.9. Let $Z$ be as in Theorem V.4. Let also $H$ be the hitting matrix corresponding to $(A, X)$ with $X = A\,Z$, and let $\{J^{(s)}, \overline{J}^{(s)}\}_{s \in \langle r \rangle}$ be the sets defined in (5.19) and (5.20). Then, with probability one, we have

(i) $r = r(\mathcal{J}(A, X))$,

(ii) for all $J \subset \langle p \rangle$, $J \in \mathcal{J}_r(A, A\,Z)$ if and only if $J$ can be written as
$$J = \{j_1, \ldots, j_r\} \quad \text{with} \quad j_s \in J^{(s)},\ s \in \langle r \rangle, \tag{5.21}$$

(iii) for $\nu(x, E)$ defined in (5.7),
$$\nu(X, E) = \prod_{s=1}^{r} \nu^{(s)}(X, E) \quad \text{with} \quad \nu^{(s)}(X, E) = \frac{\sum_{j \in J^{(s)}} w_j^{(s)}(X)\, \nu_j^{(s)}(X, E)}{\sum_{j \in J^{(s)}} w_j^{(s)}(X)}, \tag{5.22}$$
where for all $j \in J^{(s)}$,
$$w_j^{(s)}(x) := \hat z_j f_{Z_j}(\hat z_j) \prod_{k \in \overline{J}^{(s)} \setminus \{j\}} F_{Z_k}(\hat z_k), \tag{5.23}$$
$$\nu_j^{(s)}(x, E) := \delta_{\hat z_j}(\pi_j(E)) \prod_{k \in \overline{J}^{(s)} \setminus \{j\}} \mathbb{P}(Z_k \in \pi_k(E) \mid Z_k < \hat z_k), \tag{5.24}$$
with $\hat z_j = \hat z_j(x)$ as in (5.3).

The proof of Theorem V.9 is given in Section 5.6.

Remark V.10. Note that this result does not claim that $\nu(x, E)$ in (5.22) is the regular conditional probability. It merely provides an equivalent expression for (5.7), which is valid with probability one. We still need to show that (5.7), or equivalently (5.22), is indeed the regular conditional probability.

From (5.23) and (5.24), one can see that $\nu^{(s)}$ is the conditional distribution of $(Z_j)_{j \in \overline{J}^{(s)}}$. Therefore, Relation (5.22) implies that $\{(Z_j)_{j \in \overline{J}^{(s)}}\}_{s \in \langle r \rangle}$, as vectors indexed by $s$, are conditionally independent, given $X = x$. This leads to the following improved conditional sampling algorithm.

Algorithm II:

1. Compute $\hat z_j$ for $j = 1, \ldots, p$ and the hitting matrix $H = H(A, x)$.

2. Identify $\{J^{(s)}, \overline{J}^{(s)}\}_{s \in \langle r \rangle}$ by (5.19) and (5.20).

3. Compute $\{w_j^{(s)}\}_{j \in J^{(s)}}$ for all $s \in \langle r \rangle$ by (5.23).

4. Sample $(Z_j)_{j \in \overline{J}^{(s)}} \mid X = x \sim \nu^{(s)}(x, \cdot)$ independently for $s = 1, \ldots, r$.

5. Combine the sampled $(Z_j)_{j \in \overline{J}^{(s)}}$, $s = 1, \ldots, r$, to obtain a sample $Z$.

This algorithm identifies all hitting scenarios in an efficient way. To illustrate its efficiency compared to Algorithm I, consider the case $r = 10$ and $|J^{(s)}| = 10$ for all $s \in \langle 10 \rangle$. Then, applying Formula (5.7) in Algorithm I requires storing in memory the weights of all $10^{10}$ hitting scenarios. In contrast, the implementation of (5.22) requires saving only $10 \times 10$ weights. This improvement is critical in practice since it allows us to handle large, realistic models.

Table 5.1 reports the running times of Algorithm II as a function of the dimensions $n \times p$ of the matrix $A$. It is based on a discretized 2-d Smith model (Section 5.5) and measured on an Intel(R) Core(TM)2 Duo CPU E4400 2.00GHz with 2GB RAM. It is remarkable that the times scale linearly in both $n$ and $p$.

Table 5.1: Means and standard deviations (in parentheses) of the running times (in seconds) for the decomposition of the hitting matrix $H$, based on 100 independent observations $X = A\,Z$, where $A$ is an $(n \times p)$ matrix corresponding to a discretized Smith model.

p \ n      1            5            10           50
2500       0.03 (0.02)  0.13 (0.03)  0.24 (0.04)  1.25 (0.09)
10000      0.11 (0.04)  0.50 (0.05)  1.00 (0.08)  4.98 (0.33)

5.4 MARMA Processes
In this section, we apply our result to the max-autoregressive moving average (MARMA) processes studied by Davis and Resnick [19]. A stationary process $\{X_t\}_{t \in \mathbb{Z}}$ is a MARMA($m, q$) process if it satisfies the MARMA recursion
$$X_t = \phi_1 X_{t-1} \vee \cdots \vee \phi_m X_{t-m} \vee Z_t \vee \theta_1 Z_{t-1} \vee \cdots \vee \theta_q Z_{t-q} \tag{5.25}$$
for all $t \in \mathbb{Z}$, where $\phi_i \ge 0$, $\theta_j \ge 0$, $i = 1, \ldots, m$, $j = 1, \ldots, q$, are the parameters, and $\{Z_t\}_{t \in \mathbb{Z}}$ are i.i.d. 1-Fréchet random variables. Proposition 2.2 in [19] shows that (5.25) has a unique solution of the form
$$X_t = \bigvee_{j=0}^{\infty} \psi_j Z_{t-j} < \infty, \quad \text{almost surely}, \tag{5.26}$$
with $\psi_j \ge 0$, $j \ge 0$, and $\sum_{j=0}^{\infty} \psi_j < \infty$, if and only if $\phi^* = \bigvee_{i=1}^{m} \phi_i < 1$. In this case,
$$\psi_j = \bigvee_{k=0}^{j \wedge q} \alpha_{j-k} \theta_k$$
(with the convention $\theta_0 = 1$), where $\{\alpha_j\}_{j \in \mathbb{Z}}$ are determined recursively by $\alpha_j = 0$ for all $j < 0$, $\alpha_0 = 1$ and
$$\alpha_j = \phi_1 \alpha_{j-1} \vee \phi_2 \alpha_{j-2} \vee \cdots \vee \phi_m \alpha_{j-m}, \quad \forall j \ge 1. \tag{5.27}$$
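The coefficients $\alpha_j$ and $\psi_j$ follow directly from the recursion (5.27). A minimal sketch with hypothetical MAR(3) parameters $\phi = (0.7, 0.5, 0.3)$ — the chapter does not restate its exact parameter values here, but this choice gives $\sum_j \psi_j = 3.4$, consistent with the value of $\sigma$ reported later in this section:

```python
def marma_coefficients(phis, thetas, nterms):
    # alpha_j via the max-recursion (5.27) (alpha_j = 0 for j < 0, alpha_0 = 1),
    # then psi_j = max_{0 <= k <= min(j, q)} alpha_{j-k} * theta_k, with the
    # convention theta_0 = 1, yielding the representation (5.26).
    m, q = len(phis), len(thetas)
    theta = [1.0] + list(thetas)
    alpha = [1.0]
    for j in range(1, nterms):
        alpha.append(max(phis[i] * alpha[j - 1 - i]
                         for i in range(m) if j - 1 - i >= 0))
    psi = [max(alpha[j - k] * theta[k] for k in range(min(j, q) + 1))
           for j in range(nterms)]
    return alpha, psi

# hypothetical MAR(3) parameters (an assumption, not the chapter's stated values)
alpha, psi = marma_coefficients([0.7, 0.5, 0.3], [], 200)
print(round(sum(psi), 6))   # 3.4
```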
In the sequel, we will focus on the MARMA process (5.25) with unique stationary solution (5.26). In this case, the MARMA process is a spectrally discrete max-stable process. Without loss of generality, we also assume $\{Z_k\}_{k \in \mathbb{Z}}$ to be standard 1-Fréchet. We consider the prediction of the MARMA process in the following framework: suppose at each time $t \in \{1, \ldots, n\}$ we observe the value $X_t$ of the process, and the goal is to predict $\{X_s\}_{s > n}$. [...] For each $s > 100$,
$$\mathbb{P}\big(X_s \le \widehat X_s \,\big|\, \{X_t\}_{t=1}^{100}\big) \approx \frac{1}{500} \sum_{k=1}^{500} 1_{\{X_s^{(k)} \le \widehat X_s\}}, \tag{5.35}$$
where $\widehat X_s$ is the projection predictor in (5.34). This procedure was repeated 1000 times for independent realizations of $\{X_t\}_{t=1}^{100}$, and the means of the (estimated) probability in (5.35) are reported in Table 5.2. Note that as the time lag increases, the conditional quantiles attained by the projection predictors decrease. In this way, our conditional sampling algorithm helps quantify numerically the underestimation phenomenon observed in Figure 5.2.

Table 5.2: Cumulative probabilities attained by the projection predictors at time $100 + t$, based on 1000 simulations.

t      1      2      3      4      5      10    20    30   40
mean   70.6%  50.3%  35.6%  25.3%  17.8%  2.9%  0.1%  0%   0%
Finally, we compare the generated conditional samples to the true process values at times $s = 101, \ldots, 150$. Our goal is to demonstrate the validity of our conditional sampling algorithm. The idea is that, at each location $s = 101, \ldots, 150$, the true process should lie below the predicted 95% upper confidence bound of $X_s \mid \{X_t\}_{t=1}^{100}$ with probability at least 95%. (Note that due to the presence of atoms in the conditional distributions, the coverage probability may in principle be higher than 95%.) Motivated by this, we repeat the procedure in the previous paragraph and record, for each $s$, the proportion of times that $X_s$ is below the predicted confidence quantile. We refer to these values as the coverage rates. As discussed, the coverage rates should be close to 95%. This is supported by our simulation results, shown in Table 5.3.

Table 5.3: Coverage rates (CR) and the widths of the upper 95% confidence intervals at time $100 + t$, based on 1000 simulations.

t       1      2      3      4      5      10     20     30     40
CR      0.956  0.952  0.954  0.957  0.966  0.947  0.943  0.951  0.955
width   13.06  26.6   37.8   45.6   51.2   62.8   66.0   66.2   65.4
Table 5.3 also shows the widths of the upper 95% confidence intervals. Note that these widths are not equal to the upper confidence bounds, given by the conditional 95%-quantiles, since the left end-points of the conditional distributions are greater than zero. When the time lag is small, the left end-point is large and the widths are small, due to the strong influence of the past of the process $\{X_t\}_{t=1}^{100}$. On the other hand, because of the weak temporal dependence of the MAR(3) process, this influence decreases quickly as the lag increases. Consequently, the conditional distribution converges to the unconditional one, and the conditional quantile to the unconditional quantile. Note that the (unconditional) 95%-quantile of $X_s$ for the MARMA process (5.26) can be calculated via the formula $0.95 = \mathbb{P}(\sigma Z \le u) = \exp(-\sigma u^{-1})$, with $\sigma = \sum_{j=0}^{\infty} \psi_j$. For the MAR(3) process we chose, we have $\sigma = 3.4$ and the 95%-quantile of $X_s$ equals 66.29. This is consistent with the widths in Table 5.3 for large lags.

Remark V.12. As pointed out by an anonymous referee, in this case one can directly generate samples from $\{X_s\}_{s=n+1}^{N} \mid \{X_t\}_{t=1}^{n}$ by generating independent Fréchet random variables and iterating (5.33). We selected this example only for illustrative purposes and to be able to compare with the projection predictors in [19]. One can modify the prediction problem slightly, so that our algorithm still applies after adjusting (5.30) accordingly, while both the projection predictor and the direct method using (5.33) do not apply. For example, consider the prediction problem with respect to the conditional distribution $\mathbb{P}(\{X_s\}_{s=2n+1}^{2n+N} \in \cdot \mid \{X_t : t = 1, 3, \ldots, 2n-1\})$ (prediction with only partial history observed) or $\mathbb{P}(\{X_s\}_{s=2}^{n-1} \in \cdot \mid X_1, X_n)$ (prediction of the middle path with the beginning and the end-point (in the future) given). In other words, our algorithm places no restriction on the locations of the observations. This feature is of great importance in spatial prediction problems.
5.5 Discrete Smith Model

Consider the following moving maxima random field model in $\mathbb{R}^2$:
$$X_t = \int_{\mathbb{R}^2}^{e} \phi(t - u)\, M_\alpha(du), \quad t = (t_1, t_2) \in \mathbb{R}^2, \tag{5.36}$$
where $M_\alpha$ is an $\alpha$-Fréchet random sup-measure on $\mathbb{R}^2$ with the Lebesgue control measure. Smith [98] proposed to use for $\phi$ the bivariate Gaussian density
$$\phi(t_1, t_2) := \frac{\beta_1 \beta_2}{2\pi \sqrt{1 - \rho^2}} \exp\Big\{ -\frac{1}{2(1-\rho^2)} \big( \beta_1^2 t_1^2 - 2\rho \beta_1 \beta_2 t_1 t_2 + \beta_2^2 t_2^2 \big) \Big\}, \tag{5.37}$$
with correlation $\rho \in (-1, 1)$ and variances $\sigma_i^2 = 1/\beta_i^2$, $i = 1, 2$. Consistent and asymptotically normal estimators for the parameters $\rho$, $\beta_1$ and $\beta_2$ were obtained by de Haan and Pereira [25]. Here, we will assume that these parameters are known and will illustrate the conditional sampling methodology on a discretized version of the random field (5.36). Namely, we truncate the extremal integral in (5.36) to the square region $[-M, M]^2$ and consider a uniform mesh of size $h := M/q$, $q \in \mathbb{N}$. We then set
$$X_t := \bigvee_{-q \le j_1, j_2 \le q-1} h^{2/\alpha} \phi(t - u_{j_1 j_2})\, Z_{j_1 j_2}, \tag{5.38}$$
where $u_{j_1 j_2} = ((j_1 + 1/2)h, (j_2 + 1/2)h)$ and $h^{2/\alpha} Z_{j_1 j_2} \overset{d}{=} M_\alpha\big( (j_1 h, (j_1+1)h] \times (j_2 h, (j_2+1)h] \big)$. This discretized model (5.38) can be made arbitrarily close to the spectrally continuous one in (5.36) by taking a fine mesh $h$ and a sufficiently large $M$ (see e.g. [101]). Suppose that the random field $X$ in (5.38) is observed at $n$ locations: $X_{t_i} = x_i$, $t_i \in [-M, M]^2$, $i = 1, \ldots, n$. In view of (5.38), we have the max-linear model $X = A\,Z$, with $X = (X_{t_i})_{i=1}^n$ and $Z = (Z_j)_{j=1}^p$, $p = q^2$. By sampling from the conditional distribution of $Z \mid X = x$, we can predict the random field $X_s$ at arbitrary locations $s \in \mathbb{R}^2$.
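Assembling the max-linear coefficient matrix for the discretized model reduces to evaluating (5.37) at the offsets between observation sites and cell centers, $a_{i,j} = h^{2/\alpha} \phi(t_i - u_j)$. A sketch (the grid conventions — a $q \times q$ mesh over $[-M, M]^2$, so $p = q^2$ — are illustrative assumptions; the chapter's actual implementation is the C/C++ core of maxLinear):

```python
import numpy as np

def gauss2d(t1, t2, rho=0.0, b1=1.0, b2=1.0):
    # bivariate Gaussian density phi of (5.37)
    c = b1 * b2 / (2 * np.pi * np.sqrt(1 - rho ** 2))
    quad = b1**2 * t1**2 - 2 * rho * b1 * b2 * t1 * t2 + b2**2 * t2**2
    return c * np.exp(-quad / (2 * (1 - rho ** 2)))

def smith_A(obs_locs, M=4.0, q=50, alpha=1.0, rho=0.0, b1=1.0, b2=1.0):
    # a_{i,j} = h^(2/alpha) * phi(t_i - u_j), as in (5.38)
    h = 2 * M / q                                  # mesh width (assumed grid)
    centers = -M + (np.arange(q) + 0.5) * h        # cell centers, one axis
    U1, U2 = np.meshgrid(centers, centers, indexing="ij")
    u = np.column_stack([U1.ravel(), U2.ravel()])  # p = q^2 cell centers
    t = np.asarray(obs_locs, dtype=float)
    d1 = t[:, 0, None] - u[None, :, 0]
    d2 = t[:, 1, None] - u[None, :, 1]
    return h ** (2 / alpha) * gauss2d(d1, d2, rho, b1, b2)

A = smith_A([[0.0, 0.0], [1.0, 1.0]])
print(A.shape)    # (2, 2500)
```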
To illustrate our algorithm, we used the model (5.38) with parameter values $\rho = 0$, $\beta_1 = \beta_2 = 1$, $M = 4$, $p = q^2 = 2500$, and $n = 7$ observed locations. We generated $N = 500$ independent samples from the conditional distribution of the random field $\{X_s\}$, where $s$ takes values on a uniform $100 \times 100$ grid in the region $[-2, 2] \times [-2, 2]$. We have already seen four of these realizations in Figure 5.1. Figure 5.3 illustrates the median and the 0.95-th quantile of the conditional distribution. The former provides the optimal predictor of the values of the random field given the observed data, with respect to the absolute deviation loss. The marginal quantiles, on the other hand, provide important confidence regions for the random field, given the data. Certainly, conditional sampling may be used to address more complex functional prediction problems. In particular, given a two-dimensional threshold surface, one can readily obtain the correct probability that the random field exceeds or stays below this surface, conditionally on the observed values. This is much more than what marginal conditional distributions can provide.

Figure 5.3: Conditional medians (left) and 0.95-th conditional marginal quantiles (right). Each cross indicates an observed location of the random field, with the observed value at right. Parameters: $\rho = 0$, $\beta_1 = 1$, $\beta_2 = 1$.
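Given the $N$ conditional samples, maps like those in Figure 5.3 are simply pointwise marginal quantiles across samples. A sketch with synthetic stand-in data (the actual samples would come from the conditional sampling algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for N = 500 conditional samples of the field on a 100 x 100 grid,
# one row per sample; heavy-tailed draws mimic Frechet-type marginals
samples = rng.pareto(1.0, size=(500, 100 * 100)) + 1.0
median_map = np.quantile(samples, 0.5, axis=0).reshape(100, 100)
q95_map = np.quantile(samples, 0.95, axis=0).reshape(100, 100)
print(median_map.shape, bool((q95_map >= median_map).all()))   # (100, 100) True
```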
5.6 Proofs of Theorems V.4 and V.9

In this section, we prove Theorems V.4 and V.9. We will first prove Theorem V.9, which simplifies the regular conditional probability formula (5.7) of Theorem V.4. Then, we show that the simplified new formula is the desired regular conditional probability, which completes the proof of Theorem V.4. The key step in proving Theorem V.9 is the following lemma. Write $H_{\cdot j} = \{i \in \langle n \rangle : h_{i,j} = 1\}$.

Lemma V.13. Under the assumptions of Theorem V.9, with probability one, (i) $J^{(s)}$ is nonempty for all $s \in \langle r \rangle$, and (ii) for all $j \in \overline{J}^{(s)}$, $H_{\cdot j} \cap I_s \ne \emptyset$ implies $H_{\cdot j} \subset I_s$.

Proof. To show part (ii) of Lemma V.13, it suffices to observe that, since $I_s$ is an equivalence class w.r.t. Relation (5.17), $H_{\cdot j} \setminus I_s$ and $H_{\cdot j} \cap I_s$ cannot both be nonempty. Thus, it remains to show part (i). We proceed by excluding several $\mathbb{P}$-measure-zero sets, on which the desired results may not hold. First, observe that for all $i \in \langle n \rangle$, the maximum value of $\{a_{i,j} Z_j\}_{j \in \langle p \rangle}$ is achieved for a unique $j \in \langle p \rangle$ with probability one, since the $Z_j$'s are independent and have continuous distributions. Thus, the set
$$N_1 := \bigcup_{i \in \langle n \rangle,\ j_1, j_2 \in \langle p \rangle,\ j_1 \ne j_2} \Big\{ a_{i,j_1} Z_{j_1} = a_{i,j_2} Z_{j_2} = \max_{j \in \langle p \rangle} a_{i,j} Z_j \Big\}$$
has $\mathbb{P}$-measure zero. From now on, we focus on the event $N_1^c$ and set $j(i) = \operatorname{argmax}_{j \in \langle p \rangle} a_{i,j} Z_j$ for all $i \in \langle n \rangle$.
Next, we show that with probability one, $i_1 \overset{j}{\sim} i_2$ implies $j(i_1) = j(i_2)$. That is, the set
$$N_2 := \bigcup_{j \in \langle p \rangle,\ i_1, i_2 \in \langle n \rangle,\ i_1 \ne i_2} N_{j, i_1, i_2} \quad \text{with} \quad N_{j, i_1, i_2} := \Big\{ j(i_1) \ne j(i_2),\ i_1 \overset{j}{\sim} i_2 \Big\}$$
has $\mathbb{P}$-measure 0. It suffices to show $\mathbb{P}(N_{j, i_1, i_2}) = 0$ for all $i_1 \ne i_2$. If not, since $\langle p \rangle$ and $\langle n \rangle$ are finite sets, there exists $N_0 \subset N_{j, i_1, i_2}$ such that $j(i_1) = j_1 \ne j(i_2) = j_2$ on $N_0$, and $\mathbb{P}(N_0) > 0$. At the same time, however, observe that $i_1 \overset{j}{\sim} i_2$ implies $h_{i_1,j} = h_{i_2,j} = 1$, which yields
$$a_{i_k, j}\, \hat z_j = x_{i_k} = a_{i_k, j(i_k)} Z_{j(i_k)} = a_{i_k, j_k} Z_{j_k}, \quad k = 1, 2.$$
It then follows that on $N_0$, $Z_{j_1}/Z_{j_2} = a_{i_1,j}\, a_{i_2,j_2}/(a_{i_2,j}\, a_{i_1,j_1})$, which is a constant. This constant is strictly positive and finite. Indeed, this is because on $N_1^c$, $a_{i,j(i)} > 0$ by Assumption A, and $h_{i,j} = 1$ implies $a_{i,j} > 0$. Since $Z_{j_1}$ and $Z_{j_2}$ are independent continuous random variables, it then follows that $\mathbb{P}(N_0) = 0$.

Finally, we focus on the event $(N_1 \cup N_2)^c$. Then, for any $i_1, i_2 \in I_s$, we have $i_1 \sim i_2$; let $\tilde i_0, \ldots, \tilde i_m$ be as in (5.17). It then follows that $j(i_1) = j(\tilde i_0) = j(\tilde i_1) = \cdots = j(\tilde i_m) = j(i_2)$. Note that for all $i \in \langle n \rangle$, $h_{i, j(i)} = 1$ by the definition of $j(i)$. Hence, $j(i_1) = j(i_2) \in J^{(s)}$. This completes the proof.

Proof of Theorem V.9. Since the $\{I_s\}_{s \in \langle r \rangle}$ are disjoint with $\bigcup_{s \in \langle r \rangle} I_s = \langle n \rangle$, in the language of the set-covering problem, to cover $\langle n \rangle$ we need to cover each $I_s$. By part (ii) of Lemma V.13, two different sets $I_{s_1}$ and $I_{s_2}$ cannot be covered by a single set $H_{\cdot j}$. Thus we need at least $r$ sets to cover $\langle n \rangle$. On the other hand, with probability one we can select one $j_s$ from each $J^{(s)}$ (by part (i) of Lemma V.13), which yields a valid cover. That is, with probability one, $r = r(\mathcal{J}(H))$ and any valid minimum-cost cover of $\langle n \rangle$ must be as in (5.21), and vice versa. We have thus proved parts (i) and (ii).
To show (iii), by straight-forward calculation, we have, with probability one, X
X
wJ =
J∈Jr (A,x)
j1
∈J (1)
X
=
j1
X
···
jr
" r−1 Y
X
···
∈J (1)
wj1 ,...,jr
∈J (r)
jr−1
s=1
∈J (r−1)
j ∈J / j6=j1 ,...,jr−1
j∈J (r)
(5.39)
=
X
Y k∈J
k∈J
(s)
(r)
# o FZk (b zk )
\{j}
r X Y (s) FZk (b zk ) = wj .
Y
zbj fZj (b zj )
s=1 j∈J (s)
FZj (b zj )
(r)
n X × zbj fZj (b zj ) r Y
Y
zbjs fZjs (b zjs )
\{j}
s=1 j∈J (s)
Similarly, we have (5.40)
X
wJ νJ (x, E) =
J∈Jr (A,x)
r X Y s=1
(s) (s) wj νj (x, E) .
j∈J (s)
By plugging (5.39) and (5.40) into (5.7), we obtain the desired result and complete the proof. Proof of Theorem V.4. To prove that ν in (5.7) yields the regular conditional probability of Z given X, it is enough to show that Z (5.41)
P(X ∈ D, Z ∈ E) =
ν(x, E)PX (dx),
D
for all rectangles D ∈ RRn+ and E ∈ RRp+ . In view of Theorem V.9, it is enough to work with ν(x, E) given by (5.22). We shall prove (5.41) by breaking the integration into a suitable sum of integrals over regions corresponding to all hitting matrices H for the max-linear model X = A Z. We say such a hitting matrix H is nice, if J (s) defined in (5.19) is nonempty for all s ∈ hri. In view of Lemma V.13, it suffices to focus on the set H(A) of nice hitting matrices H. Notice that the set H(A) is finite since the elements of the hitting matrices are 0’s and 1’s.
85
For all rectangles D ∈ RRn+ , let n o DH = x = A z : H(A, x) = H, x ∈ D be the set of all x ∈ Rn+ that give rise to the hitting matrix H. By Lemma V.13 (i), for the random vector X = A Z, with probability one, we have X
X=
X1DH (X)
H∈H(A)
and hence Z (5.42)
X
ν(x, E)P (dx) = D
X Z H∈H(A)
ν(x, E)PX (dx) .
DH
Now fix an arbitrary and non-random nice hitting matrix $H\in\mathcal H(A)$. Let $\{I_s\}_{s\in\langle r\rangle}$ denote the partition of $\langle n\rangle$ determined by (5.17), and let $J^{(s)}$, $\bar J^{(s)}$, $s = 1,\dots,r$, be as in (5.19). Recall that $J^{(s)}\subset\bar J^{(s)}$ and that the sets $\bar J^{(s)}$, $s = 1,\dots,r$, are disjoint.

Focus on the set $D_H\subset\mathbb R_+^n$. Without loss of generality, and for notational convenience, suppose that $s\in I_s$ for all $s = 1,\dots,r$. That is,
$$
I_1 = \{1, i_{1,2},\dots,i_{1,k_1}\},\quad I_2 = \{2, i_{2,2},\dots,i_{2,k_2}\},\quad\cdots,\quad I_r = \{r, i_{r,2},\dots,i_{r,k_r}\}\,.
$$
Define the projection mapping $P_H : D_H\to\mathbb R_+^r$ onto the first $r$ coordinates: $P_H(x_1,\dots,x_n) = (x_1,\dots,x_r)\equiv x_r$. Note that $P_H$, restricted to $D_H$, is one-to-one. Indeed, for all $i\in I_s$, we have $x_i = a_{i,j}\hat z_j$ and $x_s = a_{s,j}\hat z_j$ for all $j\in J^{(s)}$ (recall (5.19)). This implies $x_i = (a_{i,j}/a_{s,j})\,x_s$ for all $i\in I_s$ and all $s = 1,\dots,r$. Hence, $P_H(\tilde x) = P_H(x)$ implies $\tilde x = x$. Consequently, we can write $x = P_H^{-1}(x_r)$, $x_r\in P_H(D_H)$, and
$$
\int_{D_H} \nu(x,E)\,\mathbb P^X(\mathrm dx) = \int_{P_H(D_H)} \nu(x,E)\,Q_H^{X_r}(\mathrm dx_1\cdots\mathrm dx_r)\,,
$$
where $Q_H^{X_r} := \mathbb P^X\circ P_H^{-1}$ is the induced measure on the set $P_H(D_H)$.
Lemma V.14. The measure $Q_H^{X_r}$ has a density with respect to the Lebesgue measure on the set $P_H(D_H)$. The density is given by
$$
(5.43)\qquad Q_H^{X_r}(\mathrm dx_r) = \mathbf 1_{P_H(D_H)}(x_r)\,\prod_{s=1}^r \sum_{j\in J^{(s)}} w_j^{(s)}(x)\ \frac{\mathrm dx_1}{x_1}\cdots\frac{\mathrm dx_r}{x_r}\,.
$$
The proof of this result is given below. In view of (5.43) and (5.22), we obtain
$$
\int_{P_H(D_H)} \nu(x,E)\,Q_H^{X_r}(\mathrm dx_r)
= \int_{P_H(D_H)} \underbrace{\prod_{s=1}^r \frac{\sum_{j\in J^{(s)}} w_j^{(s)}(x)\,\nu_j^{(s)}(x,E)}{\sum_{k\in J^{(s)}} w_k^{(s)}(x)}}_{=\nu(x,E)} \times \underbrace{\prod_{s=1}^r \sum_{j\in J^{(s)}} w_j^{(s)}(x)\ \frac{\mathrm dx_1}{x_1}\cdots\frac{\mathrm dx_r}{x_r}}_{=Q_H^{X_r}(\mathrm dx_r)}
= \int_{P_H(D_H)} \prod_{s=1}^r \sum_{j\in J^{(s)}} w_j^{(s)}(x)\,\nu_j^{(s)}(x,E)\ \frac{\mathrm dx_1}{x_1}\cdots\frac{\mathrm dx_r}{x_r}\,,
$$
which equals
$$
(5.44)\qquad \sum_{j_1\in J^{(1)},\cdots,j_r\in J^{(r)}} \underbrace{\int_{P_H(D_H)} \prod_{s=1}^r w_{j_s}^{(s)}(x)\,\nu_{j_s}^{(s)}(x,E)\ \frac{\mathrm dx_1}{x_1}\cdots\frac{\mathrm dx_r}{x_r}}_{=:I(j_1,\dots,j_r)}\,.
$$
Fix $j_1\in J^{(1)},\cdots,j_r\in J^{(r)}$ and focus on the integral $I(j_1,\cdots,j_r)$. Define
$$
\Omega_H^r(D_H) := \big\{(z_{j_1},\dots,z_{j_r}) : z_{j_s} = x_s/a_{s,j_s},\ s = 1,\dots,r,\ x_r = (x_s)_{s=1}^r\in P_H(D_H)\big\}\,.
$$
We have, by (5.23), (5.24), and replacing $x_s$ with $a_{s,j_s}z_{j_s}$, $s = 1,\dots,r$ (a simple change of variables),
$$
I(j_1,\cdots,j_r)
= \int_{\Omega_H^r(D_H)} \prod_{s=1}^r \Big[\, z_{j_s} f_{Z_{j_s}}(z_{j_s}) \prod_{k\in\bar J^{(s)}\setminus\{j_s\}} F_{Z_k}(\hat z_k) \times \delta_{\pi_{j_s}(E)}(z_{j_s}) \prod_{k\in\bar J^{(s)}\setminus\{j_s\}} \mathbb P(Z_k\in\pi_k(E)\mid Z_k < \hat z_k)\,\Big]\ \frac{\mathrm dz_{j_1}}{z_{j_1}}\cdots\frac{\mathrm dz_{j_r}}{z_{j_r}}
$$
$$
(5.45)\qquad = \int_{\Omega_H^r(D_H)} \prod_{s=1}^r f_{Z_{j_s}}(z_{j_s})\,\delta_{\pi_{j_s}(E)}(z_{j_s}) \times \prod_{k\in\langle p\rangle\setminus\{j_1,\dots,j_r\}} \mathbb P(Z_k\in\pi_k(E),\ Z_k < \hat z_k)\ \mathrm dz_{j_1}\cdots\mathrm dz_{j_r}\,.
$$
Define
$$
\Omega_{H;j_1,\dots,j_r}(D_H) = \big\{ z\in\mathbb R_+^p : x = A\odot z\in D_H,\ z_{j_s} = x_s/a_{s,j_s},\ s = 1,\dots,r,\ z_k < \hat z_k(x),\ k\in\langle p\rangle\setminus\{j_1,\dots,j_r\} \big\}\,.
$$
By the independence of the $Z_k$'s, (5.45) becomes
$$
(5.46)\qquad I(j_1,\dots,j_r) = \mathbb P\big(Z\in\Omega_{H;j_1,\dots,j_r}(D_H)\cap E\big)\,.
$$
By plugging (5.46) into (5.44), we obtain
$$
(5.47)\qquad \int_{D_H} \nu(x,E)\,\mathbb P^X(\mathrm dx) = \int_{P_H(D_H)} \nu(x,E)\,Q_H^{X_r}(\mathrm dx_r) = \sum_{j_1\in J^{(1)},\cdots,j_r\in J^{(r)}} \mathbb P\big(Z\in\Omega_{H;j_1,\cdots,j_r}(D_H)\cap E\big) = \mathbb P(A\odot Z\in D_H,\ Z\in E)\,,
$$
because the summation over $(j_1,\dots,j_r)$ accounts for all relevant hitting scenarios corresponding to the matrix $H$. Plugging (5.47) into (5.42), we have
$$
\int_D \nu(x,E)\,\mathbb P^X(\mathrm dx) = \sum_{H\in\mathcal H(A)} \mathbb P(X\equiv A\odot Z\in D_H,\ Z\in E) = \mathbb P(X\in D,\ Z\in E)\,.
$$
This completes the proof of Theorem V.4.

Proof of Lemma V.14. Consider the random vector $X_r = (X_1,\dots,X_r)$. Observe that by the definition of the set $P_H(D_H)$, on the event $\{X_r\in P_H(D_H)\}$, we have
$$
(5.48)\qquad X_r = \sum_{j_1\in J^{(1)},\cdots,j_r\in J^{(r)}} \begin{pmatrix} a_{1,j_1}Z_{j_1}\\ \vdots\\ a_{r,j_r}Z_{j_r} \end{pmatrix} \prod_{s=1}^r \underbrace{\mathbf 1\Big\{ \bigvee_{k\in\bar J^{(s)}\setminus\{j_s\}} a_{s,k}Z_k < a_{s,j_s}Z_{j_s} \Big\}}_{=:\mathbf 1\{C_{s,j_s}\}}\,.
$$
Note that since $J^{(s)}\subset\bar J^{(s)}$, $s = 1,\dots,r$, the events $\bigcap_{s=1}^r C_{s,j_s}$ are disjoint for all $r$-tuples $(j_1,\dots,j_r)\in J^{(1)}\times\cdots\times J^{(r)}$.
Recall that our goal is to establish (5.43). Since, with probability one, the sum in (5.48) involves only one non-zero term for some $(j_1,\dots,j_r)$, we have that for all measurable sets $\Delta\subset P_H(D_H)$, writing $\xi_{j_s} = a_{s,j_s}Z_{j_s}$,
$$
(5.49)\qquad Q_H^{X_r}(\Delta) \equiv \mathbb P(X_r\in\Delta) = \sum_{j_1\in J^{(1)},\cdots,j_r\in J^{(r)}} \mathbb P\Big( \{(\xi_{j_1},\cdots,\xi_{j_r})\in\Delta\}\cap\bigcap_{s=1}^r C_{s,j_s} \Big)\,.
$$
Now, consider the last probability, for fixed $(j_1,\dots,j_r)$. The random variables $\xi_{j_s}$, $s = 1,\dots,r$, are independent, and they have densities $f_{Z_{j_s}}(x_s/a_{s,j_s})/a_{s,j_s}$, $x_s\in\mathbb R_+$. We also have that the events $C_{s,j_s}$, $s = 1,\dots,r$, are mutually independent, since their definitions involve $Z_k$'s indexed by the disjoint sets $\bar J^{(s)}$, $s = 1,\dots,r$. By conditioning on the $\xi_{j_s}$'s, we obtain that the probability in the right-hand side of (5.49) equals
$$
\int_\Delta \prod_{s=1}^r \frac{1}{a_{s,j_s}}\, f_{Z_{j_s}}(x_s/a_{s,j_s}) \times \prod_{s=1}^r \mathbb P\Big( \bigvee_{k\in\bar J^{(s)}\setminus\{j_s\}} a_{s,k}Z_k < x_s \Big)\,\mathrm dx_1\cdots\mathrm dx_r
= \int_\Delta \prod_{s=1}^r \frac{1}{a_{s,j_s}}\, f_{Z_{j_s}}(x_s/a_{s,j_s}) \prod_{k\in\bar J^{(s)}\setminus\{j_s\}} F_{Z_k}(x_s/a_{s,k})\,\mathrm dx_1\cdots\mathrm dx_r\,.
$$
In view of (5.48) and (5.23), replacing $\sum_{j_1\in J^{(1)},\cdots,j_r\in J^{(r)}}\prod_{s=1}^r$ by $\prod_{s=1}^r\sum_{j\in J^{(s)}}$, we obtain that the measure $Q_H^{X_r}$ has a density on $P_H(D_H)$, given by (5.43).
CHAPTER VI
Central Limit Theorems for Stationary Random Fields
The central limit theorem studies the asymptotic behavior of partial sums of random variables $S_n = X_1 + \cdots + X_n$. In the case that the random variables are independent, it is well understood when the normalized partial sums converge to a normal distribution:
$$
\frac{S_n - \mathbb E S_n}{\sqrt n} \Rightarrow \mathcal N(0,\sigma^2)\,.
$$
The case that $\{X_i\}_{i\in\mathbb N}$ are dependent also has a long history. It dates back to at least 1910, when Markov [58] proved a central limit theorem for a two-state Markov chain. Since then, the central limit theorem for stationary processes has been an active research area in probability theory.

In this chapter, our focus is on establishing central limit theorems for stationary random fields. That is, for stationary random variables $\{X_{i,j}\}_{(i,j)\in\mathbb N^2}$, when do we have
$$
(6.1)\qquad \frac{\sum_{i=1}^n\sum_{j=1}^n (X_{i,j} - \mathbb E X_{i,j})}{n} \Rightarrow \mathcal N(0,\sigma^2)\,?
$$
This problem has already been considered by many researchers. For example, Bolthausen [5], Goldie and Morrow [40], and Bradley [7] studied this problem under suitable mixing conditions. Basu and Dorea [3], Nahapetian [62], and Poghosyan and Roelly [73] considered the problem for multiparameter martingales. Another important result is due to Dedecker [27, 28], whose approach was based on an adaptation of the Lindeberg method. As a particular case, Cheng and Ho [15] established a central limit theorem for functionals of linear random fields, based on a lexicographically ordered martingale approximation.

Here, we aim at establishing so-called projective-type conditions under which the central limit theorem (6.1) holds. Such conditions have recently drawn much attention in central limit theorems for stationary sequences, as they are easy to verify when applying such results to stochastic processes from statistics and econometrics (see e.g. Wu [119]). However, central limit theorems for stationary random fields based on projective conditions have been much less explored.

This problem is not a simple extension of a one-dimensional problem to a high-dimensional one. An important reason is that the main technique for establishing central limit theorems with projective conditions in one dimension, the martingale approximation approach, does not apply to (high-dimensional) random fields as successfully as to (one-dimensional) stochastic processes. This obstacle has been known among researchers for more than 30 years. For example, Bolthausen [5] remarked that 'Gordin uses an approximation by martingales, but his method appears difficult to generalize to dimensions ≥ 2.' (For the literature on martingale approximation, see e.g. Gordin and Lifšic [43], Kipnis and Varadhan [54], Woodroofe [116], Maxwell and Woodroofe [59], Wu and Woodroofe [121], Dedecker et al. [30], Peligrad et al. [68], among others, and Merlevède et al. [60] for a survey.)

In this chapter, we establish a central limit theorem and an invariance principle for stationary multiparameter random fields. We apply an m-dependent approximation approach. We first state the main result in the next section.
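Before turning to the main result, the normalization in (6.1) can be checked by simulation in the simplest possible case. The sketch below is illustrative only (it assumes numpy and uses an i.i.d. exponential field, the most elementary stationary random field): the sums over an $n\times n$ grid, centered and divided by $n = |V_n|^{1/2}$, should have variance close to $\sigma^2 = \operatorname{Var}X_{1,1} = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 500

# i.i.d. Exp(1) entries: E X_{ij} = Var X_{ij} = 1.
# Normalized sum as in (6.1): sum of (X_{ij} - E X_{ij}) over the grid, divided by n.
samples = np.array([(rng.exponential(1.0, size=(n, n)) - 1.0).sum() / n
                    for _ in range(reps)])

# The empirical distribution should be close to N(0, 1).
print(round(samples.mean(), 2), round(samples.var(), 2))
```

The interest of the chapter, of course, lies in how far this behavior survives under dependence.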
6.1 Main Result
We start with some notation. We consider a product probability space $(\Omega,\mathcal A,\mathbb P)$, i.e., a $\mathbb Z^d$-indexed product of i.i.d. probability spaces of the form
$$
(\Omega,\mathcal A,\mathbb P) \equiv (\mathbb R^{\mathbb Z^d},\ \mathcal B^{\mathbb Z^d},\ P^{\mathbb Z^d})\,.
$$
Write $\epsilon_k(\omega) = \omega_k$ for all $\omega\in\mathbb R^{\mathbb Z^d}$ and $k\in\mathbb Z^d$. Then, $\{\epsilon_k\}_{k\in\mathbb Z^d}$ are i.i.d. random variables with distribution $P$. On such a space, we define the natural filtration $\{\mathcal F_k\}_{k\in\mathbb Z^d}$ by
$$
(6.2)\qquad \mathcal F_k := \sigma\{\epsilon_l : l\preceq k,\ l\in\mathbb Z^d\}\,,\ \mbox{ for all } k\in\mathbb Z^d\,.
$$
Here and in the sequel, for all vectors $x\in\mathbb R^d$, we write $x = (x_1,\dots,x_d)$, and for all $l,k\in\mathbb R^d$, we let $l\preceq k$ stand for $l_i\le k_i$, $i = 1,\dots,d$.

We focus on mean-zero stationary random fields, defined on a product probability space. Let $\{T_k\}_{k\in\mathbb Z^d}$ denote the group of shift operators on $\mathbb R^{\mathbb Z^d}$ with $(T_k\omega)_l = \omega_{k+l}$ for all $k,l\in\mathbb Z^d$, $\omega\in\mathbb R^{\mathbb Z^d}$. Then, we consider random fields of the form $\{f\circ T_k\}_{k\in\mathbb Z^d}$, where $f$ is in the class $L_0^p = \{f\in L^p(\mathcal F_\infty) : \int f\,\mathrm d\mathbb P = 0\}$, $p\ge 2$, with $\mathcal F_\infty = \bigvee_{k\in\mathbb Z^d}\mathcal F_k$.

Throughout this chapter, we consider a sequence $\{V_n\}_{n\in\mathbb N}$ of finite rectangular subsets of $\mathbb Z^d$, of the form
$$
(6.3)\qquad V_n = \prod_{i=1}^d \{1,\dots,m_i^{(n)}\}\subset\mathbb N^d\,,\ \mbox{ for all } n\in\mathbb N\,,
$$
with $m_i^{(n)}$ increasing to infinity as $n\to\infty$ for all $i = 1,\dots,d$. Let
$$
(6.4)\qquad S_n(f) \equiv S(V_n,f) = \sum_{k\in V_n} f\circ T_k
$$
denote the partial sums with respect to $V_n$. Moreover, write, for $t\in[0,1]^d$, $V_n(t) = \prod_{i=1}^d [0, m_i^{(n)}t_i]\subset\mathbb R^d$ and $R_k = \prod_{i=1}^d (k_i-1,k_i]\subset\mathbb R^d$ for all $k\in\mathbb Z^d$. We write also
$$
(6.5)\qquad B_{n,t}(f) \equiv B_{V_n,t}(f) = \sum_{k\in\mathbb N^d} \lambda(V_n(t)\cap R_k)\, f\circ T_k\,,
$$
where $\lambda$ is the Lebesgue measure on $\mathbb R^d$, and consider weak convergence in the space $C[0,1]^d$, the space of continuous functions on $[0,1]^d$, equipped with the uniform metric. Recall that the standard $d$-parameter Brownian sheet on $[0,1]^d$, denoted by $\{\mathbb B(t)\}_{t\in[0,1]^d}$, is a mean-zero Gaussian random field with covariance $\mathbb E(\mathbb B(s)\mathbb B(t)) = \prod_{i=1}^d \min(s_i,t_i)$, $s,t\in[0,1]^d$. Write $0 = (0,\dots,0)$, $1 = (1,\dots,1)\in\mathbb Z^d$. Our condition involves the following term:
$$
(6.6)\qquad \widetilde\Delta_{d,p}(f) := \sum_{k\in\mathbb N^d} \frac{\|\mathbb E(f\circ T_k\mid\mathcal F_1)\|_p}{\prod_{i=1}^d k_i^{1/2}}\,.
$$
Our main result is the following.

Theorem VI.1. Consider a product probability space described above. If $f\in L_0^2$, $f\in\mathcal F_0$ and $\widetilde\Delta_{d,2}(f) < \infty$, then $\sigma^2 := \lim_{n\to\infty}\mathbb E(S_n(f)^2)/|V_n| < \infty$ exists and $S_n(f)/|V_n|^{1/2}\Rightarrow\mathcal N(0,\sigma^2)$. In addition, if $f\in L_0^p$ and $\widetilde\Delta_{d,p}(f) < \infty$ for some $p > 2$, then
$$
(6.7)\qquad \frac{B_{n,\cdot}(f)}{|V_n|^{1/2}} \Rightarrow \sigma\,\mathbb B(\cdot)
$$
in $C[0,1]^d$. For the sake of simplicity, we will prove Theorem VI.1 in the case $d = 2$ in Sections 6.3 and 6.4.
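The limiting object $\mathbb B$ in (6.7) can itself be simulated directly: partial sums of i.i.d. noise over rectangles, suitably normalized, give a discrete approximation of the Brownian sheet. The sketch below is illustrative (it assumes numpy; the grid size and repetition count are arbitrary) and checks the variance $\mathbb E\,\mathbb B(1,1)^2 = \min(1,1)\min(1,1) = 1$.

```python
import numpy as np

rng = np.random.default_rng(1)

def brownian_sheet(n, rng):
    # Discrete approximation of the 2-parameter Brownian sheet:
    # B(i/n, j/n) = (double cumulative sum of i.i.d. N(0,1)) / n.
    xi = rng.normal(size=(n, n))
    return xi.cumsum(axis=0).cumsum(axis=1) / n

n, reps = 40, 400
vals = np.array([brownian_sheet(n, rng)[-1, -1] for _ in range(reps)])
print(round(vals.var(), 2))  # Var B(1,1) should be close to 1
```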
Remark VI.2. Conditions involving conditional expectations, like $\widetilde\Delta_{d,p}(f) < \infty$ here, are referred to as projective conditions. Compared to mixing-type conditions (see e.g. Bradley [8]), projective ones are often easy to check in practice. One such example is given in Section 6.6. See for example [30] for comparisons of projective conditions.

The rest of the chapter is organized as follows. In Section 6.2 we provide preliminary results on m-dependent approximation. We establish the central limit theorem in Section 6.3 and then the invariance principle in Section 6.4. Sections 6.5 and 6.6 are devoted to the applications to orthomartingales and to functionals of stationary linear random fields, respectively. In Section 6.7, we prove a moment inequality, which plays a crucial role in proving our limit results. Some other auxiliary proofs are given in Section 6.8.

6.2 m-Dependent Approximation
We describe the general procedure of m-dependent approximation in this section. Here we do not assume any structure on the underlying probability space, nor on the filtration. Instead, we simply assume $f\in L_0^2 = \{f\in L^2(\Omega,\mathcal A,\mathbb P) : \int f\,\mathrm d\mathbb P = 0\}$, and that $\{T_k\}_{k\in\mathbb Z^d}$ is an Abelian group of bimeasurable, measure-preserving, one-to-one and onto maps on $(\Omega,\mathcal A,\mathbb P)$.

The notion of m-dependence was first introduced by Hoeffding and Robbins [49]. We say a random variable $f$ is m-dependent if $f\circ T_k$ and $f\circ T_l$ are independent whenever $|k-l|_\infty := \max_{i=1,\dots,d}|k_i - l_i| > m$. The following result on the asymptotic normality of sums of m-dependent random variables is due to Bolthausen [5] (see also Rosén [81]). Recall $\{V_n\}_{n\in\mathbb N}$ given in (6.3).
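To make m-dependence concrete, here is a small simulation sketch (illustrative only; numpy and the moving-average construction are assumptions, not from the text). A field obtained by summing i.i.d. noise over a moving $(m+1)\times(m+1)$ window is $m$-dependent, and for this linear construction the series of covariances over all shifts sums to $(\text{sum of coefficients})^2 = (m+1)^4$; the empirical variance of the normalized partial sums should be near that value (slightly below it at finite $n$, due to boundary effects).

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, reps = 30, 1, 300

# Moving-window sum over an (m+1) x (m+1) block of i.i.d. N(0,1) noise:
# a 1-dependent stationary field with limit variance (m+1)^4 = 16.
vals = []
for _ in range(reps):
    eps = rng.normal(size=(n + m, n + m))
    f = sum(eps[i:i + n, j:j + n] for i in range(m + 1) for j in range(m + 1))
    vals.append(f.sum() / n)   # S_n(f_m) / |V_n|^{1/2} with |V_n| = n^2
vals = np.array(vals)
print(round(vals.var(), 1))
```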
Theorem VI.3. Suppose $f_m\in L_0^2$ is m-dependent. Write
$$
(6.8)\qquad \sigma_m^2 = \sum_{k\in\mathbb Z^d} \mathbb E[f_m\,(f_m\circ T_k)]\,.
$$
Then,
$$
\frac{S_n(f_m)}{|V_n|^{1/2}} \Rightarrow \mathcal N(0,\sigma_m^2)\,.
$$
Now, consider the function $f\in L_0^2(\mathbb P)$ and define
$$
(6.9)\qquad \|f\|_{V,+} = \limsup_{n\to\infty} \frac{\|S_n(f)\|_2}{|V_n|^{1/2}}\,.
$$
We refer to the pseudo-norm $\|\cdot\|_{V,+}$ as the plus-norm.

Lemma VI.4. Suppose $f, f_1, f_2,\cdots\in L_0^2(\mathbb P)$ and $f_m$ is m-dependent for all $m\in\mathbb N$. If
$$
(6.10)\qquad \lim_{m\to\infty} \|f - f_m\|_{V,+} = 0\,,
$$
then
$$
(6.11)\qquad \lim_{m\to\infty}\sigma_m = \lim_{m\to\infty}\|f_m\|_{V,+} =: \sigma < \infty
$$
exists, and
$$
(6.12)\qquad \frac{S_n(f)}{|V_n|^{1/2}} \Rightarrow \mathcal N(0,\sigma^2)\,.
$$
Proof. It suffices to prove (6.11). We will show that $\{\sigma_m^2\}_{m\in\mathbb N}$ forms a Cauchy sequence in $\mathbb R_+$. Observe that since $f_m$ is m-dependent with zero mean,
$$
\sigma_m = \lim_{n\to\infty} \frac{\|S_n(f_m)\|_2}{|V_n|^{1/2}}\,.
$$
It then follows that
$$
|\sigma_{m_1} - \sigma_{m_2}| \le \limsup_{n\to\infty} \frac{\|S_n(f_{m_1} - f_{m_2})\|_2}{|V_n|^{1/2}} \le \|f_{m_1} - f\|_{V,+} + \|f_{m_2} - f\|_{V,+}\,,
$$
which can be made arbitrarily small by taking $m_1, m_2$ large enough. We have thus shown that $\{\sigma_m^2\}_{m\in\mathbb N}$ is a Cauchy sequence in $\mathbb R_+$.

Remark VI.5. The idea of establishing the central limit theorem by controlling the quantity $\|f - f_m\|_{V,+}$ dates back to Gordin [42], where $f_m$ was selected from a different subspace. In the one-dimensional case, when $V_n = \{1,\dots,n\}$, Zhao and Woodroofe [123] named $\|\cdot\|_{V,+}$ the plus-norm, and established a necessary and sufficient condition for the martingale approximation in terms of the plus-norm. See Gordin and Peligrad [41] and Peligrad [66] for improvements and more discussions of such conditions.

In the next section, we will establish conditions under which (6.10) holds.

6.3 A Central Limit Theorem
From this section on, we will focus on stationary multiparameter random fields defined on product probability spaces. On such a space, any integrable function has a natural $L^2$-approximation by m-dependent functions, and there is a natural commuting filtration. For the sake of simplicity, we consider only 2-parameter random fields in the sequel, and simply say 'random fields' for short. We will prove a central limit theorem here and then an invariance principle in the next section. The argument, however, can be generalized easily to $d$-parameter random fields, and the result has been stated in Theorem VI.1.

We start with a product probability space with i.i.d. random variables $\{\epsilon_{i,j}\}_{(i,j)\in\mathbb Z^2}$. Recall that $\{T_{i,j}\}_{(i,j)\in\mathbb Z^2}$ is the group of shift operators on $\mathbb R^{\mathbb Z^2}$, and write $\mathcal F_{\infty,\infty} = \sigma(\epsilon_{i,j} : (i,j)\in\mathbb Z^2)$. We focus on the class of functions $L_0^p = \{f\in L^p(\mathcal F_{\infty,\infty}) : \mathbb E f = 0\}$, $p\ge 2$. For all measurable functions $f\in L_0^2$, define, for all $m\in\mathbb N$,
$$
(6.13)\qquad f_m := \mathbb E(f\mid\mathcal F_{\langle m\rangle}) \quad\mbox{with}\quad \mathcal F_{\langle m\rangle} = \sigma(\epsilon_j : j\in\{-m,\dots,m\}^2)\,.
$$
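For a concrete feel of this truncation: when $f$ is a linear functional $f = \sum_k a_k\epsilon_k$ of the i.i.d. noise, conditioning on $\mathcal F_{\langle m\rangle}$ simply zeroes the coefficients outside the window $\{-m,\dots,m\}^2$, so $\|f - f_m\|_2^2$ is the sum of the squared coefficients outside the window. The sketch below is illustrative (the coefficient choice is arbitrary and numpy is assumed) and shows this error decreasing to 0 as $m$ grows.

```python
import numpy as np

K = 20
idx = np.arange(-K, K + 1)
I, J = np.meshgrid(idx, idx, indexing="ij")
# Square-summable coefficients (an arbitrary illustrative choice)
a = 1.0 / (1.0 + np.abs(I) + np.abs(J)) ** 3

def approx_error(m):
    # ||f - f_m||_2 for a linear functional: L2 mass of coefficients
    # outside the window {-m, ..., m}^2
    outside = (np.abs(I) > m) | (np.abs(J) > m)
    return float(np.sqrt((a[outside] ** 2).sum()))

errs = [approx_error(m) for m in (0, 2, 5, 10)]
print([round(e, 4) for e in errs])  # monotonically decreasing toward 0
```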
Clearly, $f_m\in L_0^2$, $\|f - f_m\|_2\to 0$ as $m\to\infty$, and $\{f_m\circ T_{i,j}\}_{(i,j)\in\mathbb Z^2}$ are m-dependent functions.

Now, recall the natural filtration $\{\mathcal F_{i,j}\}_{(i,j)\in\mathbb Z^2}$ defined by $\mathcal F_{k,l} = \sigma(\epsilon_{i,j} : i\le k,\ j\le l)$. This is a 2-parameter filtration, i.e.,
$$
(6.14)\qquad \mathcal F_{i,j}\subset\mathcal F_{k,l} \quad\mbox{if}\quad i\le k,\ j\le l\,.
$$
Also,
$$
(6.15)\qquad T_{-i,-j}\mathcal F_{k,l} = \mathcal F_{k+i,l+j}\,,\quad \forall\,(i,j),(k,l)\in\mathbb Z^2\,.
$$
Moreover, the notion of commuting filtration is of importance to us.

Definition VI.6. A filtration $\{\mathcal F_{i,j}\}_{(i,j)\in\mathbb Z^2}$ is commuting if, for all $\mathcal F_{k,l}$-measurable bounded random variables $Y$, $\mathbb E(Y\mid\mathcal F_{i,j}) = \mathbb E(Y\mid\mathcal F_{i\wedge k,\,j\wedge l})$.

Since $\{\epsilon_{k,l}\}_{(k,l)\in\mathbb Z^2}$ are independent random variables, $\{\mathcal F_{i,j}\}_{(i,j)\in\mathbb Z^2}$ is commuting (see Proposition VI.22 in Section 6.8). This implies that the marginal filtrations
$$
(6.16)\qquad \mathcal F_{i,\infty} = \bigvee_{j\ge 0}\mathcal F_{i,j} \quad\mbox{and}\quad \mathcal F_{\infty,j} = \bigvee_{i\ge 0}\mathcal F_{i,j}
$$
are commuting, in the sense that for all $Y\in L^1(\mathbb P)$,
$$
(6.17)\qquad \mathbb E[\mathbb E(Y\mid\mathcal F_{i,\infty})\mid\mathcal F_{\infty,j}] = \mathbb E[\mathbb E(Y\mid\mathcal F_{\infty,j})\mid\mathcal F_{i,\infty}] = \mathbb E(Y\mid\mathcal F_{i,j})\,.
$$
For more details on commuting filtrations, see Khoshnevisan [53]. For all $\mathcal F_{0,0}$-measurable functions $f\in L_0^2$, write
$$
(6.18)\qquad S_{m,n}(f) = \sum_{i=1}^m\sum_{j=1}^n f\circ T_{i,j}\,.
$$
Thanks to the commuting structure of the filtration, applying twice the maximal inequality in [68], we can prove the following moment inequality with $p\ge 2$:
$$
(6.19)\qquad \|S_{m,n}(f)\|_p \le C m^{1/2} n^{1/2}\,\Delta_{(m,n),p}(f)
$$
with
$$
\Delta_{(m,n),p}(f) = \sum_{k=1}^m\sum_{l=1}^n \frac{\|\mathbb E(S_{k,l}(f)\mid\mathcal F_{1,1})\|_p}{k^{3/2}\, l^{3/2}}\,.
$$
In fact, we will prove a stronger inequality without the assumptions of a product probability space and the $\mathcal F_{0,0}$-measurability of $f$. See Section 6.7, Proposition VI.20 and Corollary VI.21. Recall that
$$
(6.20)\qquad \widetilde\Delta_{2,p}(f) = \sum_{k=1}^\infty\sum_{l=1}^\infty \frac{\|\mathbb E(f\circ T_{k,l}\mid\mathcal F_{1,1})\|_p}{k^{1/2}\, l^{1/2}}\,.
$$
Now, we can prove the following central limit theorem for adapted stationary random fields.

Theorem VI.7. Consider the product probability space discussed above. Let $\{V_n\}_{n\in\mathbb N}$ be as in (6.3) with $d = 2$. Suppose $f\in L_0^2$, $f\in\mathcal F_{0,0}$, and define $f_m$ as in (6.13). If $\widetilde\Delta_{2,2}(f) < \infty$, then
$$
\lim_{m\to\infty} \|f - f_m\|_{V,+} = 0\,.
$$
Therefore, $\sigma := \lim_{m\to\infty}\|f_m\|_{V,+} < \infty$ exists and $S_n(f)/|V_n|^{1/2}\Rightarrow\mathcal N(0,\sigma^2)$.

Proof. The second part follows immediately from Lemma VI.4. It suffices to prove $\|f - f_m\|_{V,+}\to 0$ as $m\to\infty$. First, by the fact that
$$
\|\mathbb E(S_{k,l}(f)\mid\mathcal F_{1,1})\|_2 \le \sum_{i=1}^k\sum_{j=1}^l \|\mathbb E(f\circ T_{i,j}\mid\mathcal F_{1,1})\|_2
$$
and Fubini's theorem, we have $\Delta_{(\infty,\infty),2}(f)\le 9\,\widetilde\Delta_{2,2}(f)$. So, by (6.9) and (6.19), it suffices to show
$$
(6.21)\qquad \widetilde\Delta_{2,2}(f - f_m) = \sum_{k=1}^\infty\sum_{l=1}^\infty \frac{\|\mathbb E[(f - f_m)\circ T_{k,l}\mid\mathcal F_{1,1}]\|_2}{k^{1/2}\, l^{1/2}} \to 0
$$
as $m\to\infty$. Clearly, the summand in (6.21) converges to 0 for each fixed $k,l$, since (6.13) implies $\|f - f_m\|_2\to 0$ as $m\to\infty$ and $\|\mathbb E[(f - f_m)\circ T_{k,l}\mid\mathcal F_{1,1}]\|_2\le\|f - f_m\|_2$. Moreover, observe that
$$
\mathbb E(f_m\circ T_{k,l}\mid\mathcal F_{1,1}) = \mathbb E[\mathbb E(f\circ T_{k,l}\mid T_{-k,-l}(\mathcal F_{\langle m\rangle}))\mid\mathcal F_{1,1}] = \mathbb E[\mathbb E(f\circ T_{k,l}\mid\mathcal F_{1,1})\mid T_{-k,-l}(\mathcal F_{\langle m\rangle})]\,,
$$
where in the second equality we can exchange the order of the conditional expectations by the definitions of $\mathcal F_{1,1}$ and $T_{-k,-l}(\mathcal F_{\langle m\rangle})$ (see Proposition VI.22 in Section 6.8 for a detailed treatment). Therefore,
$$
\|\mathbb E[(f - f_m)\circ T_{k,l}\mid\mathcal F_{1,1}]\|_2 \le \|\mathbb E(f\circ T_{k,l}\mid\mathcal F_{1,1})\|_2 + \|\mathbb E(f_m\circ T_{k,l}\mid\mathcal F_{1,1})\|_2 \le 2\,\|\mathbb E(f\circ T_{k,l}\mid\mathcal F_{1,1})\|_2\,.
$$
Then, the condition $\widetilde\Delta_{2,2}(f) < \infty$ combined with the dominated convergence theorem yields (6.21). The proof is thus completed.

6.4 An Invariance Principle
99
where ‘ ⇒’ stands for the weak convergence in C[0, 1]2 . Proof. It suffices to show that the finite-dimensional distributions converge, and {Bn,t (f )/|Vn |1/2 }t∈[0,1]2 is tight. We first show that, for all e t = (t(1) , . . . , t(k) ) ⊂ [0, 1]2 , B
n,t(1) (f ) ,··· |Vn |1/2
(6.22)
,
Bn,t(k) (f ) ee . ⇒ σ(B(t(1) ), · · · , B(t(k) )) =: σ B t |Vn |1/2
Consider the m-dependent function fm defined in (6.13). Then, the convergence of the finite-dimensional distributions (6.22) with f replaced by fm follows from the invariance principle of m-dependent random fields (see e.g. [96]).
Further-
e 2,2 (f ) ≤ ∆ e 2,p (f ) < ∞, so that kf − fm k more, by Theorem VI.7, ∆ V,+ → 0 as e e(f )/|Vn |1/2 denote the left-hand side of (6.22), m → ∞, and therefore, letting B n,t e e(fm − f )/|Vn |1/2 → (0, . . . , 0) ∈ Rk in probability. The convergence of the finiteB n,t dimensional distribution (6.22) follows. Now, we prove the tightness of {Bn,t (f )}t∈[0,1]2 . Fix n and consider Vn = {1, . . . , n1 } × {1, . . . , n2 } . Write Bn,t ≡ Bn,t (f ) and Sm,n ≡ Sm,n (f ) for short. For all 0 ≤ r1 < s1 ≤ 1, 0 ≤ r2 < s2 ≤ 1, set, Bn ((r1 , s1 ] × (r2 , s2 ]) := Bn,(s1 ,s2 ) − Bn,(r1 ,s2 ) − Bn,(s1 ,r2 ) + Bn,(r1 ,r2 ) . We will show that there exists a constant C, independent of n, r1 , r2 , s1 and s2 , such that (6.23)
(n1 n2 )−1/2 kBn ((r1 , s1 ] × (r2 , s2 ])kp ≤ C
p e 2,p (f ) . (s1 − r1 )(s2 − r2 )∆
Inequality (6.23) implies the tightness, by Nagai [61], Theorem 1.
100
Now, we prove (6.23) to complete the proof. From now on, the constant C may change from line to line. Write mi = bni si c − bni ri c , i = 1, 2. If mi ≥ 2, i = 1, 2, then kBn ((r1 , s1 ] × (r2 , s2 ])kp ≤ kSm1 ,m2 kp + 2kSm1 ,1 kp + 2kS1,m2 kp + 4kS1,1 kp (6.24)
e 2,p (f ) ≤ C(m1 m2 )1/2 ∆
for some constant C, by (6.19). Note that mi ≥ 2 also implies ni (si − ri ) > 1. Therefore, mi ≤ ni (si − ri ) + 1 < 2ni (si − ri ), and (6.24) can be bounded by e 2,p (f ), which yields (6.23). C(n1 n2 )1/2 [(s1 − r1 )(s2 − r2 )]1/2 ∆ In the case m1 < 2 or m2 < 2, to obtain (6.23) requires more careful analysis. We only show the case when m1 = 1, m2 ≥ 2, as the proof for the other cases are similar. Observe that m1 = 1 implies n1 r1 < dn1 r1 e = bn1 s1 c ≤ n1 s1 . Then, kBn ((r1 , s1 ] × (r2 , s2 ])kp 1/2 e ≤ n1 (s1 − r1 )(kS1,m2 kp + 2kS1,1 kp ) ≤ Cn1 (s1 − r1 )m2 ∆ 2,p (f ) .
Observe that m1 = 1 also implies n1 (s1 − r1 ) ∈ (0, 2). If n1 (s1 − r1 ) ≤ 1, then n1 (s1 − √ r1 ) ≤ [n1 (s1 − r1 )]1/2 . If n1 (s1 − r1 ) ∈ (1, 2), then n1 (s1 − r1 ) < 2[n1 (s1 − r1 )]1/2 . It then follows that (6.23) still holds. Remark VI.9. To prove the invariance principle of stationary random fields, most of the results require finite moment of order strictly larger than 2. See for example Berkes and Morrow [4], Goldie and Greenwood [39] and Dedecker [28]. This is in contrast to the one-dimensional case, where the invariance principle can be established with finite second moment assumption.
To the best of our knowledge, there are two invariance principles for stationary random fields requiring only finite second moments. One is due to Sashkin [96], who assumed the random field to be BL(θ)-dependent (a class including m-dependent stationary random fields). In general, BL(θ)-dependence is difficult to check. The other is due to Basu and Dorea [3], who proved an invariance principle for martingale-difference random fields under a finite second moment assumption. However, they impose stringent conditions on the filtration (see Remark VI.13 below). In our case, it remains an open problem whether $\widetilde\Delta_{2,2}(f) < \infty$ implies the invariance principle. See also a similar conjecture by Dedecker in [28], Remark 1.

6.5 Orthomartingales
The central limit theorems and invariance principles for multiparameter martingales are more difficult to establish than in the one-dimensional case. This is due to the complex structure of multiparameter martingales. We will focus on orthomartingales first and establish an invariance principle, and then compare the results with those for other types of multiparameter martingales. The idea of orthomartingales is due to R. Cairoli and J. B. Walsh. See e.g. the references in Khoshnevisan [53], which also provides a nice introduction to the material. For the sake of simplicity, we suppose $d = 2$. Consider a probability space $(\Omega,\mathcal A,\mathbb P)$ and recall the definition of a 2-parameter filtration (6.14). We restrict ourselves to filtrations indexed by $\mathbb N^2$.

Definition VI.10. Given a commuting 2-parameter filtration $\{\mathcal F_{i,j}\}_{(i,j)\in\mathbb N^2}$ on $(\Omega,\mathcal A,\mathbb P)$, we say a family of random variables $\{M_{i,j}\}_{(i,j)\in\mathbb N^2}$ is a 2-parameter orthomartingale on $(\Omega,\mathcal A,\mathbb P)$, with respect to $\{\mathcal F_{i,j}\}_{(i,j)\in\mathbb N^2}$, if for all $(i,j)\in\mathbb N^2$, $M_{i,j}$ is $\mathcal F_{i,j}$-measurable, and $\mathbb E(M_{i+1,j}\mid\mathcal F_{i,\infty}) = \mathbb E(M_{i,j+1}\mid\mathcal F_{\infty,j}) = M_{i,j}$, almost surely.
In our case, for an $\mathcal F_{0,0}$-measurable function $f\in L_0^2$, $M_{m,n} = S_{m,n}(f)$ as in (6.18) yields a 2-parameter orthomartingale if
$$
(6.25)\qquad \mathbb E(f\circ T_{i+1,j}\mid\mathcal F_{i,\infty}) = \mathbb E(f\circ T_{i,j+1}\mid\mathcal F_{\infty,j}) = 0\ \mbox{ almost surely,}
$$
for all $(i,j)\in\mathbb N^2$. In this case, we say $\{f\circ T_{i,j}\}_{(i,j)\in\mathbb N^2}$ are 2-parameter orthomartingale differences.

Remark VI.11. In our case, $\{M_{i,j}\}_{(i,j)\in\mathbb N^2}$ is also a 2-parameter martingale in the usual sense, i.e., $\mathbb E(M_{i,j}\mid\mathcal F_{k,l}) = M_{i\wedge k,\,j\wedge l}$, almost surely. Indeed,
$$
\mathbb E(M_{i,j}\mid\mathcal F_{k,l}) = \mathbb E[\mathbb E(M_{i,j}\mid\mathcal F_{k,\infty})\mid\mathcal F_{\infty,l}] = \mathbb E(M_{i\wedge k,j}\mid\mathcal F_{\infty,l}) = M_{i\wedge k,\,j\wedge l}\,.
$$
In general, however, the converse is not true, i.e., multiparameter martingales are not necessarily orthomartingales (see e.g. [53], p. 33). The two notions are equivalent when the filtration is commuting (see e.g. [53], Chapter I, Theorem 3.5.1).

Theorem VI.12. Consider a product probability space $(\Omega,\mathcal A,\mathbb P)$ with the natural filtration $\{\mathcal F_{i,j}\}_{(i,j)\in\mathbb N^2}$. Suppose $f\in L_0^2$ and $f\in\mathcal F_{0,0}$. If $\{f\circ T_{i,j}\}_{(i,j)\in\mathbb N^2}$ are 2-parameter orthomartingale differences, i.e., (6.25) holds, then $\sigma^2 = \lim_{n\to\infty}\mathbb E(S_n(f)^2)/|V_n| < \infty$ exists, and
$$
\frac{S_n(f)}{|V_n|^{1/2}} \Rightarrow \sigma\,\mathcal N(0,1)\,.
$$
In addition, if $f\in L_0^p$ for some $p > 2$, then the invariance principle (6.7) holds.

Proof. Observe that (6.25) implies $\mathbb E(f\circ T_{i,j}\mid\mathcal F_{1,1}) = 0$ if $i > 1$ or $j > 1$. Then, for $f\in L_0^p$, $p\ge 2$,
$$
\widetilde\Delta_{2,p}(f) = \|\mathbb E(f\circ T_{1,1}\mid\mathcal F_{1,1})\|_p = \|f\|_p < \infty\,.
$$
The result then follows immediately from Theorem VI.1. Note that the argument holds for general $d$-parameter orthomartingales ($d\ge 2$) defined in [53].
Remark VI.13. Our result is more general than [3], [62] and [73] in the following way. Let $\{\epsilon_{i,j}\}_{(i,j)\in\mathbb Z^2}$ be i.i.d. random variables. In [62], the central limit theorem was established for so-called martingale-difference random fields $\{M_{i,j}\}_{(i,j)\in\mathbb N^2}$ with $M_{i,j} = \sum_{k=1}^i\sum_{l=1}^j D_{k,l}$, such that
$$
\mathbb E[D_{i,j}\mid\sigma(\epsilon_{k,l} : (k,l)\in\mathbb Z^2,\ (k,l)\ne(i,j))] = 0\,,
$$
for all $(i,j)\in\mathbb N^2$. In [3] and [73], the authors considered multiparameter martingales $\{M_{i,j}\}_{(i,j)\in\mathbb N^2}$ with respect to the filtration defined by $\widetilde{\mathcal F}_{i,j} = \sigma(\epsilon_{k,l} : k\le i\ \mbox{or}\ l\le j)$. It is easy to see that, in both cases above, their assumptions are stronger, in the sense that they imply that $\{M_{i,j}\}_{(i,j)\in\mathbb N^2}$ is an orthomartingale with respect to the natural filtration $\{\mathcal F_{i,j}\}_{(i,j)\in\mathbb N^2}$ in (6.2). On the other hand, however, the results mentioned above only assume that $\{\epsilon_{i,j}\}_{(i,j)\in\mathbb Z^2}$ is a stationary random field, which is weaker than our assumption.

At last, we point out that the product structure of the probability space plays an important role. We provide an example of an orthomartingale with a different underlying probability structure. In this case, the limit behavior is quite different from the case that we have studied so far.

Example VI.14. Suppose $\{\epsilon_k\}_{k\in\mathbb Z}$ and $\{\eta_k\}_{k\in\mathbb Z}$ are two families of i.i.d. random variables. Define $\mathcal G_i = \sigma(\epsilon_j : j\le i)$ and $\mathcal H_i = \sigma(\eta_j : j\le i)$ for all $i\in\mathbb N$. Then, $\mathcal G = \{\mathcal G_n\}_{n\in\mathbb N}$ and $\mathcal H = \{\mathcal H_n\}_{n\in\mathbb N}$ are two filtrations. Now, let $\{Y_n\}_{n\in\mathbb N}$ and $\{Z_n\}_{n\in\mathbb N}$ be two arbitrary martingales with stationary increments with respect to the filtrations $\mathcal G$ and $\mathcal H$, respectively. Suppose $Y_n = \sum_{i=1}^n D_i$, $Z_n = \sum_{i=1}^n E_i$, where $\{D_n\}_{n\in\mathbb N}$ and $\{E_n\}_{n\in\mathbb N}$ are stationary martingale differences. Then, $\{D_iE_j\}_{(i,j)\in\mathbb N^2}$ is a stationary random field and
$$
M_{m,n} := \sum_{i=1}^m\sum_{j=1}^n D_iE_j = Y_mZ_n
$$
is an orthomartingale with respect to the filtration $\{\mathcal G_i\vee\mathcal H_j\}_{(i,j)\in\mathbb N^2}$. Clearly,
$$
\frac{M_{n,n}}{n} = \frac{Y_n}{\sqrt n}\cdot\frac{Z_n}{\sqrt n} \Rightarrow \mathcal N(0,\sigma_Y^2)\times\mathcal N(0,\sigma_Z^2)\,,
$$
where the limit is the distribution of the product of two independent normal random variables (a Gaussian chaos). That is, $S_n(f)/n$ has an asymptotically non-normal distribution.

One can also define $\widetilde M_{m,n} = Y_m + Z_n$, which again gives an orthomartingale, with $\{D_i + E_j\}_{(i,j)\in\mathbb N^2}$ the corresponding stationary random field. This time, one can show that
$$
\frac{\widetilde M_{n,n}}{\sqrt n} = \frac{Y_n}{\sqrt n} + \frac{Z_n}{\sqrt n} \Rightarrow \mathcal N(0,\sigma_Y^2 + \sigma_Z^2)\,.
$$
Here, the limit is a normal distribution, but the normalizing sequence is $\sqrt n$ instead of $n$. This example demonstrates that for general orthomartingales, to obtain a central limit theorem one must assume extra conditions on the structure of the underlying probability space.

For the structure mentioned above, there is no m-dependent approximation for the random fields. Indeed, the example corresponds to the sample space $\Omega = (\mathbb R^{\mathbb Z},\mathbb R^{\mathbb Z})$ with $[T_{k,l}(\epsilon,\eta)]_{i,j} = (\epsilon_{i+k},\eta_{j+l})$, and if we define $f_m$ similarly as in (6.13) with $\mathcal F_{\langle m\rangle} := \sigma(\epsilon_i,\eta_j : -m\le i,j\le m)$, then $f$ and $f\circ T_{k,l}$ are independent if and only if $\min(k,l) > m$. That is, the dependence can be very strong along the horizontal (resp. vertical) direction of the random field.
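The non-normal limit in Example VI.14 is easy to observe numerically. In the sketch below (illustrative only; numpy is assumed, and $D_i, E_j$ are taken i.i.d. standard normal), $M_{n,n}/n$ is the product of two independent, approximately standard normal variables; its kurtosis is close to 9 rather than the Gaussian value 3, confirming non-normality of the limit.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 400, 5000

# M_{n,n}/n = (Y_n / sqrt(n)) * (Z_n / sqrt(n)) with independent random walks Y, Z
vals = (rng.normal(size=(reps, n)).sum(axis=1) / np.sqrt(n)
        * rng.normal(size=(reps, n)).sum(axis=1) / np.sqrt(n))

kurt = ((vals - vals.mean()) ** 4).mean() / vals.var() ** 2
print(round(kurt, 1))  # near 9 for a product of independent normals, not 3
```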
6.6 Stationary Causal Linear Random Fields
We establish a central limit theorem for functionals of stationary causal linear random fields. We focus on d = 2. Consider a stationary linear random field {Zi,j }(i,j)∈Z2 defined by (6.26)
Zi,j =
XX
ar,s i−r,j−s =
r∈Z s∈Z
XX
ai−r,j−s r,s ,
r∈Z s∈Z
with coefficient {ai,j }(i,j)∈Z2 satisfying
P
(i,j)∈Z2
a2i,j < ∞. We restrict ourselves to
causal linear random fields, i.e., ai,j = 0 unless i ≥ 0 and j ≥ 0. They are also referred to be adapted to the filtration {Fi,j }(i,j)∈Z2 . Now, consider the random fields {f ◦ Tk,l }(k,l)∈Z2 with a more specific form f = K({Zi,j }0,0 h ), where h is a fixed strictly positive integer, K is a measurable function 2
from Rh to R and for all (k, l) ∈ Z2 , {Zi,j }k,l h := {Zi,j : k − h + 1 ≤ i ≤ k, l − h + 1 ≤ j ≤ l} 2
is viewed as a random vector in Rh with covariates lexicographically ordered. In the sequel, the same definition applies similarly to {xi,j }k,l h , given {xi,j }(i,j)∈Z2 . Assume that (6.27)
EK({Zi,j }h0,0 ) = 0
and
EK p ({Zi,j }0,0 h ) < ∞
for some p ≥ 2. In this way, (6.28)
f ◦ Tk,l = K({Zi,j }k,l h ).
The model (6.28) is a natural extension of the functionals of causal linear processes considered by Wu [117]. Next, we introduce a few notations similarly as in [48] and [117]. Here, our ultimate goal is to translate Condition (6.20) into a condition on the regularity of K
106
and the summability of {ai,j }(i,j)∈Z2 . For all (i, j) ∈ Z2 , let Γ(i, j) = {(r, s) ∈ Z2 : r ≤ i, s ≤ j} ,
(6.29) and write Zi,j
X
=
ai−r,j−s r,s
(r,s)∈Γ(i,j)
X
=
ai−r,j−s r,s +
(r,s)∈Γ(i,j)\Γ(1,1)
(6.30)
X
ai−r,j−s r,s
(r,s)∈Γ(1,1)
=: Zi,j,+ + Zi,j,− .
2 Write Wk,l,− = {Zi,j,− }k,l h and define, for all (k, l) ∈ Z , k,l Kk,l ({xi,j }k,l h ) = EK({Zi,j,+ + xi,j }h ) .
In this way, (6.31)
E(f ◦ Tk,l | F1,1 ) = Kk,l ({Zi,j,− }k,l h ) =: Kk,l (Wk,l,− ) .
Plugging (6.31) into (6.20), we obtain a central limit theorem for functionals of stationary causal linear random fields. Theorem VI.15. Consider the functionals of stationary causal linear random fields (6.28). If Conditions (6.27) holds and ∞ X ∞ X kKk,l (Wk,l,− )kp
(6.32)
k=1 l=1
k 1/2 l1/2
< ∞,
for p = 2, then σ 2 = limn→∞ E(Sn2 )/n2 < ∞ exists and Sn /|Vn |1/2 ⇒ N (0, σ 2 ). If the two conditions hold with p > 2, then the invariance principle (6.7) holds. Next, we will provide conditions on K and {ai,j }(i,j)∈Z2 such that (6.32) holds. For all Λ ⊂ Z2 , write (6.33)
ZΛ =
X (i,j)∈Λ
ai,j −i,−j
and
AΛ =
X (i,j)∈Λ
a2i,j .
107
In particular, our conditions involves summations of ai,j over the following type of regions: Λ(k, l) := {(i, j) ∈ Z2 : i ≥ k, j ≥ l} , (k, l) ∈ Z2 . For the sake of simplicity, we write Ak,l ≡ AΛ(k,l) . The following lemma is a simple extension of Lemma 2, part (b) in [117]. Lemma VI.16. Suppose that there exist α, β ∈ R such that 0 < α ≤ 1 ≤ β < ∞ and E(||2β ) < ∞. If 2 EMα,β (W1,1 ) < ∞ with Mα,β (x) =
(6.34)
|K(x) − K(y)| , |x − y|α + |x − y|β y6=x
sup 2
y∈Rh ,
then, for all p ≥ 2, α/2
kKk,l (Wk,l,− )kp = O(Ak+1−h,l+1−h ) .
(6.35)
Consequently, Condition (6.32) can be replaced by specific ones on Ak,l . Corollary VI.17. Assume there exist α, β ∈ R as in Lemma VI.16. Consider the functionals of stationary linear random fields in form of (6.28). Suppose Condition (6.34) holds and α/2 ∞ ∞ X X Ak+1−h,l+1−h
(6.36)
k=1 l=1
k 1/2 l1/2
< ∞.
p
If E(|| ) < ∞ and (6.27) hold with p = 2, then Sn /n ⇒ N (0, σ 2 ) with some σ < ∞. If E(||p ) < ∞ and (6.27) holds with p > 2, then the invariance principle (6.7) holds. Next, we compare our Condition (6.36) on the summability of {ai,j }(i,j)∈Z2 , and the one considered by Cheng and Ho [15]. They only established central limit theorems for functionals of stationary linear random fields, so we restrict to the case p = 2. Cheng and Ho [15] assumed (6.37)
∞ X ∞ X i=0 j=0
|ai,j |1/2 < ∞ ,
108
and provided different regularity conditions on K. Namely, sup EK 2 (x + ZΛ ) < ∞
Λ⊂Z2
for all x ∈ R with ZΛ defined in (6.33), and that for any two independent random variables X and Y with E(K 2 (X) + K 2 (Y ) + K 2 (X + Y )) < M < ∞, E[(K(X + Y ) − K(X))2 ] ≤ C[E(Y 2 )]γ
(6.38)
for some γ ≥ 1/2. In general, Cheng and Ho [15]’s condition and ours on the regularity K are not comparable and thus have different range of applications. Below, we focus on the simple case that h = 1 and K is Lipschitz, covered by both conditions. This correspond to α = β = 1 in (6.34) and γ = 1 in (6.38). In the following two examples, our Condition (6.36) turn out to be weaker than Condition (6.37). Example VI.18. Consider ai,j = (i + j + 1)−q for all i, j ≥ 0 and some q > 1. Then, P P∞ 2 A= ∞ i=0 j=0 ai,j < ∞ and Ak,l =
∞ X
j(k + l + j)−2q = O((k + l)2−2q ) .
j=1
Then (6.36) is bounded by, up to a multiplicative constant, ∞ X ∞ X (k + l)1−q k=1 l=1
k 1/2 l1/2
∞ ∞ ∞ X k (1−q)/2 X l(1−q)/2 X −q/2 2 < ≤ k . k 1/2 l=1 l1/2 k=1 k=1
Therefore, Condition (6.36) requires q > 2. In this case, Condition (6.37) requires q > 4. Example VI.19. Consider ai,j = (i + 1)−q (j + 1)−q , for all i, j ≥ 0 for some q > 1. P P∞ 2 Then, A = ∞ i=0 j=0 ai,j < ∞ and (6.39)
Ak,l =
∞ X ∞ X
a2i,j = O(k −(2q−1) l−(2q−1) ) .
i=k j=l
One can thus check that Condition (6.36) requires q > 3/2 while Condition (6.37) requires q > 2.
109
6.7
A Moment Inequality
We establish a moment inequality for stationary 2-parameter random fields on general probability spaces, without assuming the product structure. We first review the Peligrad–Utev inequality, a maximal Lp -inequality in dimension one, with p ≥ 2. Let {Xk }k∈Z be a stationary process with Xk = f ◦ T k for all k ∈ Z, where f is a measurable function from a probability space (Ω, A, P) to R, and T is a bimeasurable, measure-preserving, one-to-one and onto map on (Ω, A, P). Consider (6.40)
Sn (f ) =
n X
f ◦ T k.
k=1
Let {Fk }k∈Z be a filtration on (Ω, A, P) such that T −1 Fk = Fk+1 for all k ∈ Z. R R Suppose f 2 dP < ∞, f dP = 0, f ∈ F0 (i.e., the sequence is adapted) and T W f ∈ L2 (F∞ ) L2 (F−∞ ) with F∞ = k∈Z Fk and F−∞ = k∈Z Fk . Let C denote a constant that may change from line to line. It is known that for all f ∈ Lp (F∞ ), E(f | F−∞ ) = 0,
(6.41) $$\Big\| \max_{1 \le k \le n} |S_k(f)| \Big\|_p \le C n^{1/2} \bigg( \|E(f \mid F_0)\|_p + \|f - E(f \mid F_0)\|_p + \sum_{k=1}^n \frac{\|E(S_k(f) \mid F_0)\|_p}{k^{3/2}} + \sum_{k=1}^n \frac{\|S_k(f) - E(S_k(f) \mid F_k)\|_p}{k^{3/2}} \bigg).$$
The inequality above was first established for adapted stationary sequences by Peligrad and Utev [67], and then extended to an L^p-inequality for p ≥ 2 by Peligrad et al. [68]. The case p ∈ (1, 2) was addressed by Wu and Zhao [122], and the non-adapted case for p ≥ 2 by Volný [105]. For the sake of simplicity, we simplify the bound in (6.41) by regrouping the summations. Observe that \|E(S_k(f) \mid F_0)\|_p \le \|E(S_k(f) \mid F_1)\|_p, \|E(f \mid F_0)\|_p = \|E(S_1(f) \mid F_1)\|_p and \|f - E(f \mid F_0)\|_p = \|S_1(f) - E(S_1(f) \mid F_1)\|_p. Thus, we obtain
(6.42) $$\Big\| \max_{1 \le k \le n} |S_k(f)| \Big\|_p \le C n^{1/2} \bigg( \sum_{k=1}^n \frac{\|E(S_k(f) \mid F_1)\|_p}{k^{3/2}} + \sum_{k=1}^n \frac{\|S_k(f) - E(S_k(f) \mid F_k)\|_p}{k^{3/2}} \bigg).$$
Now, consider a general probability space (Ω, A, P), and suppose there exist a commuting 2-parameter filtration {F_{i,j}}_{(i,j)∈Z^2} and an Abelian group of bimeasurable, measure-preserving, one-to-one and onto maps {T_{i,j}}_{(i,j)∈Z^2} on (Ω, A, P), such that (6.15) holds. Define F_{\infty,\infty} = \bigvee_{(i,j)∈Z^2} F_{i,j}, F_{-\infty,\infty} = \bigcap_{i∈Z} F_{i,\infty} and F_{\infty,-\infty} = \bigcap_{j∈Z} F_{\infty,j}. Note that when (Ω, A, P) is a product probability space, F_{-\infty,\infty} and F_{\infty,-\infty} are trivial, by Kolmogorov's zero–one law. Recall the definition of S_{m,n}(f) in (6.18). Given f, write S_{m,n} ≡ S_{m,n}(f) for the sake of simplicity.

Proposition VI.20. Consider (Ω, A, P), {T_{i,j}}_{(i,j)∈Z^2} and {F_{i,j}}_{(i,j)∈Z^2} as described above. Suppose p ≥ 2, f ∈ L^p(F_{\infty,\infty}) and E(f \mid F_{-\infty,\infty}) = E(f \mid F_{\infty,-\infty}) = 0. Then,
$$\|S_{m,n}\|_p \le C m^{1/2} n^{1/2} \sum_{k=1}^m \sum_{l=1}^n \frac{d_{k,l}(f)}{k^{3/2} l^{3/2}}$$
with
$$d_{k,l}(f) = \|E(S_{k,l} \mid F_{1,1})\|_p + \|E(S_{k,l} \mid F_{1,\infty}) - E(S_{k,l} \mid F_{1,l})\|_p + \|E(S_{k,l} \mid F_{\infty,1}) - E(S_{k,l} \mid F_{k,1})\|_p + \|S_{k,l} - E(S_{k,l} \mid F_{k,\infty}) - E(S_{k,l} \mid F_{\infty,l}) + E(S_{k,l} \mid F_{k,l})\|_p.$$

Corollary VI.21. Suppose the assumptions in Proposition VI.20 hold. (i) If f ∈ F_{0,0}, then
$$\|S_{m,n}(f)\|_p \le C m^{1/2} n^{1/2} \sum_{k=1}^m \sum_{l=1}^n \frac{\|E(S_{k,l}(f) \mid F_{1,1})\|_p}{k^{3/2} l^{3/2}}.$$
(ii) If {f ∘ T_{i,j}}_{(i,j)∈Z^2} are two-dimensional martingale differences, in the sense that f ∈ L^p(F_{0,0}) and E(f \mid F_{0,-1}) = E(f \mid F_{-1,0}) = 0, then \|S_{m,n}(f)\|_p \le C m^{1/2} n^{1/2} \|f\|_p.

The proof of Corollary VI.21 is trivial. We only remark that the second case recovers Burkholder's inequality for multiparameter martingale differences established in [36].

Proof of Proposition VI.20. Fix f. Define \tilde S_{0,n} = \sum_{j=1}^n f \circ T_{0,j}. Clearly,
(6.43) $$S_{m,n} = \sum_{i=1}^m \sum_{j=1}^n f \circ T_{i,j} = \sum_{i=1}^m \Big( \sum_{j=1}^n f \circ T_{0,j} \Big) \circ T_{i,0} = \sum_{i=1}^m \tilde S_{0,n} \circ T_{i,0}.$$
Fix n. Observe that E\tilde S_{0,n} = 0 and that \{\tilde S_{0,n} \circ T_{i,0}\}_{i∈Z} is a stationary sequence. Furthermore, {F_{i,\infty}}_{i∈Z} is a filtration, T_{i,0}^{-1} F_{j,\infty} = T_{-i,0} F_{j,\infty} = F_{i+j,\infty}, and E(\tilde S_{0,n} \mid F_{-\infty,\infty}) = 0.
Therefore, we can apply the Peligrad–Utev inequality (6.42) and obtain
(6.44) $$\|S_{m,n}\|_p \le C m^{1/2} \bigg( \sum_{k=1}^m k^{-3/2} \underbrace{\|E(S_{k,n} \mid F_{1,\infty})\|_p}_{\Lambda_1} + \sum_{k=1}^m k^{-3/2} \underbrace{\|S_{k,n} - E(S_{k,n} \mid F_{k,\infty})\|_p}_{\Lambda_2} \bigg).$$
We first deal with Λ_1. Define \tilde S_{m,0} = \sum_{i=1}^m f \circ T_{i,0}. Similarly as in (6.43), S_{k,n} = \sum_{j=1}^n \tilde S_{k,0} \circ T_{0,j}, and
$$E(S_{k,n} \mid F_{1,\infty}) = \sum_{j=1}^n E(\tilde S_{k,0} \circ T_{0,j} \mid F_{1,\infty}) = \sum_{j=1}^n E\big( \tilde S_{k,0} \circ T_{0,j} \mid T_{0,-j}(F_{1,\infty}) \big),$$
where in the last equality we used the fact that T_{0,j}(F_{i,\infty}) = F_{i,\infty} for all i, j ∈ Z. Now, by the identity E(f \mid F) \circ T = E(f \circ T \mid T^{-1}(F)), we have
(6.45) $$E(S_{k,n} \mid F_{1,\infty}) = \sum_{j=1}^n E(\tilde S_{k,0} \mid F_{1,\infty}) \circ T_{0,j}.$$
Observe that (6.45) is again a summation of the form (6.40). Then, applying the Peligrad–Utev inequality (6.42) again, we obtain
$$\Lambda_1 \le C n^{1/2} \bigg( \sum_{l=1}^n l^{-3/2} \big\| E[E(S_{k,l} \mid F_{1,\infty}) \mid F_{\infty,1}] \big\|_p + \sum_{l=1}^n l^{-3/2} \big\| E(S_{k,l} \mid F_{1,\infty}) - E[E(S_{k,l} \mid F_{1,\infty}) \mid F_{\infty,l}] \big\|_p \bigg).$$
By the commuting property of the marginal filtrations (6.17), the above inequality becomes
(6.46) $$\Lambda_1 \le C n^{1/2} \bigg( \sum_{l=1}^n l^{-3/2} \|E(S_{k,l} \mid F_{1,1})\|_p + \sum_{l=1}^n l^{-3/2} \|E(S_{k,l} \mid F_{1,\infty}) - E(S_{k,l} \mid F_{1,l})\|_p \bigg).$$
Similarly, one can show
(6.47) $$\Lambda_2 = \bigg\| \sum_{j=1}^n \big[ \tilde S_{k,0} - E(\tilde S_{k,0} \mid F_{k,\infty}) \big] \circ T_{0,j} \bigg\|_p \le C n^{1/2} \bigg( \sum_{l=1}^n l^{-3/2} \|E(S_{k,l} \mid F_{\infty,1}) - E(S_{k,l} \mid F_{k,1})\|_p + \sum_{l=1}^n l^{-3/2} \|S_{k,l} - E(S_{k,l} \mid F_{k,\infty}) - E(S_{k,l} \mid F_{\infty,l}) + E(S_{k,l} \mid F_{k,l})\|_p \bigg).$$
Combining (6.44), (6.46) and (6.47), we have thus proved Proposition VI.20.

6.8 Auxiliary Proofs
For arbitrary σ-fields F, G, let F ∨ G denote the smallest σ-field that contains both F and G.

Proposition VI.22. Let (Ω, B, P) be a probability space and let F, G, H be mutually independent sub-σ-fields of B. Then, for every B-measurable random variable X with E|X| < ∞, we have
(6.48) $$E\big[ E(X \mid F \vee G) \mid G \vee H \big] = E(X \mid G) \quad \text{a.s.}$$
Proposition VI.22 is closely related to the notion of conditional independence (see e.g. [16], Chapter 7.3). Namely, given a probability space (Ω, F, P) and sub-σ-fields G_1, G_2 and G_3 of F, G_1 and G_2 are said to be conditionally independent given G_3 if, for all A_1 ∈ G_1 and A_2 ∈ G_2, P(A_1 ∩ A_2 \mid G_3) = P(A_1 \mid G_3) P(A_2 \mid G_3) almost surely.

Proof of Proposition VI.22. First, we show that F ∨ G and G ∨ H are conditionally independent given G. By Theorem 7.3.1 (ii) in [16], it is equivalent to show, for all F ∈ F, G ∈ G, that P(F ∩ G \mid G ∨ H) = P(F ∩ G \mid G) almost surely. This is true since
$$P(F \cap G \mid G \vee H) = 1_G E(1_F \mid G \vee H) = 1_G E(1_F \mid G) = P(F \cap G \mid G) \quad \text{a.s.}$$
Next, by Theorem 7.3.1 (iv) in [16], the conditional independence obtained above yields E(X \mid G ∨ H) = E(X \mid G) almost surely, for all (F ∨ G)-measurable X with E|X| < ∞. Replacing X by E(X \mid F ∨ G), we have thus proved (6.48).

Proof of Lemma VI.16. Write W_{k,l} = \{Z_{i,j}\}_h^{k,l}. Define (and recall that) W_{k,l,±} = \{Z_{i,j,±}\}_h^{k,l}. Let \tilde W_{k,l,-} be a copy of W_{k,l,-}, independent of W_{k,l,±}. Set \tilde W_{k,l} := W_{k,l,+} + \tilde W_{k,l,-}.

Recall that K_{k,l}(W_{k,l,-}) = E(K(W_{k,l}) \mid F_{1,1}) in (6.31). Observe that by (6.30), W_{k,l,-} ∈ F_{1,1}, and W_{k,l,+}, \tilde W_{k,l,-} are independent of F_{1,1}. Therefore, E(K(\tilde W_{k,l}) \mid F_{1,1}) = E(K(\tilde W_{k,l})) = 0, and
$$|K_{k,l}(W_{k,l,-})| = \big| E\big( K(W_{k,l}) - K(\tilde W_{k,l}) \mid F_{1,1} \big) \big| \le E\big( |K(W_{k,l}) - K(\tilde W_{k,l})| \mid F_{1,1} \big).$$
Observe that by (6.34),
$$|K(W_{k,l}) - K(\tilde W_{k,l})| \le M_{\alpha,\beta}(\tilde W_{k,l}) \big( |W_{k,l,-} - \tilde W_{k,l,-}|^\alpha + |W_{k,l,-} - \tilde W_{k,l,-}|^\beta \big).$$
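The identity (6.48) can be verified exactly on a minimal example. Below, F, G, H are generated by three independent fair bits and X is an arbitrary integrable function of them (both choices are illustrative); conditional expectations are computed by exact enumeration, using the independence of the bits.

```python
# Exact check of Proposition VI.22 on three independent fair bits f, g, h
# generating F, G, H; X is an arbitrary illustrative function of them.
from itertools import product
from fractions import Fraction

def X(f, g, h):
    return f + 2 * g + 4 * h + f * g * h

half = Fraction(1, 2)

def cond_exp_given_gh(g, h):
    # E( E(X | F v G) | G v H ) at (g, h); f is independent of (g, h)
    def inner(f, g):
        # E(X | F v G) at (f, g): average over the independent bit h
        return sum(half * X(f, g, hh) for hh in (0, 1))
    return sum(half * inner(ff, g) for ff in (0, 1))

def cond_exp_given_g(g):
    # E(X | G) at g: average over the independent bits f and h
    return sum(half * half * X(ff, g, hh)
               for ff, hh in product((0, 1), repeat=2))

for g, h in product((0, 1), repeat=2):
    assert cond_exp_given_gh(g, h) == cond_exp_given_g(g)  # (6.48), exactly
```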
Write U_{k,l} = W_{k,l,-} - \tilde W_{k,l,-}. By the Cauchy–Schwarz inequality, and noting that E(|M_{\alpha,\beta}(\tilde W_{k,l})|^2 \mid F_{1,1}) = \|M_{\alpha,\beta}(\tilde W_{k,l})\|_2^2 = \|M_{\alpha,\beta}(\tilde W_{1,1})\|_2^2, we have
$$|K_{k,l}(W_{k,l,-})| \le \|M_{\alpha,\beta}(\tilde W_{1,1})\|_2 \big\{ E\big[ (|U_{k,l}|^\alpha + |U_{k,l}|^\beta)^2 \mid F_{1,1} \big] \big\}^{1/2},$$
whence, for p ≥ 2,
(6.49) $$\|K_{k,l}(W_{k,l,-})\|_p \le \|M_{\alpha,\beta}(\tilde W_{1,1})\|_2 \, \big\| |U_{k,l}|^\alpha + |U_{k,l}|^\beta \big\|_p \le \|M_{\alpha,\beta}(\tilde W_{1,1})\|_2 \big( \| |U_{k,l}|^\alpha \|_p + \| |U_{k,l}|^\beta \|_p \big).$$
Finally, since for all γ > 0 and n ∈ N there exists a constant C(γ, n) > 0 such that, for all vectors w = (w_1, \dots, w_n) ∈ R^n,
$$|w|^{2\gamma} = \Big( \sum_{i=1}^n w_i^2 \Big)^{\gamma} \le C(\gamma, n) \sum_{i=1}^n w_i^{2\gamma},$$
it follows that for all γ > 0,
$$E(|U_{k,l}|^{2\gamma}) = E\big( |W_{k,l,-} - \tilde W_{k,l,-}|^{2\gamma} \big) = E\big( |\{Z_{i,j,-} - \tilde Z_{i,j,-}\}_h^{k,l}|^{2\gamma} \big) = O\Big( E\Big[ \sum_{k-h < i \le k,\ l-h < j \le l} (Z_{i,j,-} - \tilde Z_{i,j,-})^{2\gamma} \Big] \Big).$$

Corollary VII.6. Suppose that A_{[n]} \le c_1 n^{-\beta} for some β > d and constants c_1, c_2 > 0, and b_n = c_2 n^{-\gamma}. Then a sufficient condition for Condition VII.3 to hold is
(7.14) $$\gamma < \frac{\beta d}{\beta + d}.$$
Consequently, if E(|\epsilon_0|^\alpha) < ∞ for some α > 2, and Condition VII.1 and (7.14) hold, then the asymptotic normality (7.3) follows.

Proof. Assume that m_n takes the form \lfloor n^\delta \rfloor. Observe that B_{m_n} is of the same order as A_{[m_n]} as n → ∞. Then, the limit conditions (7.9), (7.10) and (7.11) are implied by
$$\lim_{n\to\infty} n^{-\beta\delta+\gamma} + n^{d\delta-\gamma} + n^{\delta-1+\gamma/d} = 0,$$
which is equivalent to γ/β < δ < min{γ/d, 1 − γ/d}. Since β > d implies that Δ_∞ < ∞, the desired result follows.

Remark VII.7. Under the assumptions of Corollary VII.6, Condition (7.14) is very close to necessary for Condition VII.3 to hold. Indeed, if A_{[n]} = l(n) n^{-\beta} with \lim_{n\to\infty} l(n) = c_2 > 0, then the same argument as above yields that Condition VII.3 is equivalent to (7.14).

Below, we provide examples of coefficients for which Condition VII.3 holds. We assume that b_n = n^{-\gamma} for some γ ∈ (0, d).
Example VII.8. We compare our conditions with the ones of Hallin et al. [44]. They considered the case |a_i| \le C |i|_\infty^{-q}, i ≠ 0. Then, they require
(7.15) $$q > \max(d + 3,\ 2d + 1/2) \qquad\text{and}\qquad \lim_{n\to\infty} n^d b_n^{(2q-1+6d)/(2q-1-4d)} = \infty.$$
Our condition (7.14) imposes a weaker assumption in this case (with b_n = n^{-\gamma}). First, observe that
$$A_{n,1,\dots,1}^2 \le B_n^2 \le C \sum_{i=n}^{\infty} i^{d-1} i^{-2q} \le C n^{d-2q}.$$
We can apply Corollary VII.6 with β = q − d/2. Then, (7.14) becomes
(7.16) $$q > \frac{3d}{2} \qquad\text{and}\qquad \gamma < \frac{d(2q-d)}{2q+d}.$$
Indeed, suppose (7.17) holds with some q > 0. Then, to apply Corollary VII.6, it suffices to observe that
$$A_{n,1}^2 = \sum_{i_1=n}^{\infty} \sum_{i_2,\dots,i_d \in \mathbb{N}} |a_i|^2 \le C n^{-2q} \sum_{i_1=n}^{\infty} \sum_{i_2,\dots,i_d \in \mathbb{N}} |i|_\infty^{2q} |a_i|^2 < C n^{-2q},$$
and take β = q.
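The moment-decay thresholds in (7.15) and (7.16) can be compared directly for small dimensions; the helper names below are ours, not from [44].

```python
# Comparison of the q-thresholds: (7.15) requires q > max(d + 3, 2d + 1/2),
# while (7.16) only needs q > 3d/2 (helper names are illustrative).

def q_hallin(d):
    # threshold appearing in (7.15)
    return max(d + 3, 2 * d + 0.5)

def q_ours(d):
    # threshold appearing in (7.16)
    return 1.5 * d

for d in range(1, 11):
    # 3d/2 < 2d + 1/2 for every d >= 1, so our threshold is strictly weaker
    assert q_ours(d) < q_hallin(d)
```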
At the same time, our result requires γ < dq/(q + d) for the bandwidth, in addition to the minimal condition (7.2) assumed in [34]. Recall also that we assume E(|\epsilon_0|^\alpha) < ∞ for some α > 2, while El Machkouri's result needs only a finite-second-moment assumption on \epsilon_0.

Remark VII.10. Finally, we compare our result to Wu and Mielniczuk [120]. In the one-dimensional case, to obtain asymptotic normality they assume only finite variance of \epsilon_0 and the weaker assumption on the coefficients
(7.18) $$\sum_{i=0}^{\infty} |a_i| < \infty.$$
This is weaker than our condition in one dimension (with q > d = 1 in (7.17)). Wu and Mielniczuk followed a martingale approximation approach. It remains an open question whether, in higher dimensions, the condition q > d in (7.17) can be improved to match (7.18) in dimension one.

7.3 A Central Limit Theorem for m-Dependent Random Fields
In this section, we prove a central limit theorem for stationary triangular arrays of m-dependent random fields. Throughout this section, let {Y_{n,i} : i ∈ N^d}_{n∈N} denote stationary zero-mean triangular arrays; that is, for each n, {Y_{n,i}}_{i∈N^d} is stationary and Y_{n,i} has zero mean. Furthermore, we assume that {Y_{n,i}}_{i∈N^d} is m_n-dependent, in the sense that Y_{n,i} and Y_{n,j} are independent if |i − j|_\infty \ge m_n. We provide conditions such that
(7.19) $$\frac{S_n(Y)}{n^{d/2}} \equiv \frac{\sum_{i \in \llbracket 1, n \rrbracket^d} Y_{n,i}}{n^{d/2}} \Rightarrow N(0, \sigma^2) \quad\text{as } n \to \infty.$$
A key condition is the following:
(7.20) $$\bigg\| \sum_{i \in \mathbb{N}^d,\ 1 \preceq i \preceq j} Y_{n,i} \bigg\|_2 \le C (j_1 \cdots j_d)^{1/2} \quad\text{for all } n \in \mathbb{N},\ j \in \mathbb{N}^d.$$
Remark VII.11. Observe that Proposition VI.20 provides conditions under which (7.20) holds. In fact, inequality (7.20) has been established, under various conditions on the dependence of stationary random fields, by Dedecker [28] and El Machkouri et al. [35], among others.

Theorem VII.12. Suppose that there exists a constant C such that (7.20) holds. If there exists a sequence {l_n}_{n∈N} ⊂ N with m_n/l_n → 0 and l_n/n → 0 as n → ∞, such that, for all ε > 0,
(7.21) $$\lim_{n\to\infty} \frac{1}{l_n^d} E\Big[ \Big( \sum_{k \in \llbracket 1, l_n \rrbracket^d} Y_{n,k} \Big)^2 \Big] = \sigma^2,$$
(7.22) $$\lim_{n\to\infty} \frac{1}{l_n^d} E\Big[ \Big( \sum_{k \in \llbracket 1, l_n \rrbracket^d} Y_{n,k} \Big)^2 1\Big\{ \Big| \sum_{k \in \llbracket 1, l_n \rrbracket^d} Y_{n,k} \Big| > \epsilon n^{d/2} \Big\} \Big] = 0,$$
then (7.19) holds.

Proof. Consider partial sums over big blocks of size l_n^d, namely
$$\eta_{n,k} = \sum_{i \in \llbracket 1, l_n \rrbracket^d} Y_{n,\, i + k(l_n + m_n)}, \quad k \in \mathbb{N}^d.$$
In this way, for each n ∈ N, {η_{n,k}}_{k∈N^d} are i.i.d., since neighboring blocks are separated by distance m_n and {Y_{n,i}}_{i∈Z^d} are m_n-dependent. Set
$$S_n(\eta) = \sum_{k \in \llbracket 0,\ \lfloor n/(l_n+m_n) \rfloor - 1 \rrbracket^d} \eta_{n,k}, \quad n \in \mathbb{N}.$$
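The bookkeeping behind the big-block construction can be sketched in dimension d = 1. Here we assume, purely for illustration, that each Y_i is a function of a finite window of m i.i.d. inputs, so that m-dependence and the independence of separated blocks reduce to disjointness of index sets.

```python
# Sketch (d = 1, illustrative) of the big blocks in the proof of Theorem VII.12.
# Assume Y_i = g(eps_i, ..., eps_{i+m-1}) for iid inputs eps, so Y_i and Y_j
# are independent whenever |i - j| >= m.

def block_support(k, l, m):
    """Indices of the iid inputs that eta_k = sum_{i=1}^{l} Y_{i+k(l+m)} uses."""
    start = 1 + k * (l + m)
    # Y_i uses eps_i .. eps_{i+m-1}; the block covers i = start .. start+l-1
    return set(range(start, start + l + m - 1))

n, l, m = 1000, 90, 10
blocks = [block_support(k, l, m) for k in range(n // (l + m))]

# Blocks are separated by gaps of width m, hence use disjoint inputs and are
# genuinely independent, as claimed in the proof.
for a in range(len(blocks)):
    for b in range(a + 1, len(blocks)):
        assert blocks[a].isdisjoint(blocks[b])
```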
Then, (7.20) implies that
$$\Big\| \frac{S_n(Y)}{n^{d/2}} - \frac{S_n(\eta)}{n^{d/2}} \Big\|_2 \to 0 \quad\text{as } n \to \infty.$$
To see this, for the sake of simplicity, consider the case n/(l_n + m_n) = \lfloor n/(l_n + m_n) \rfloor. Indeed, by the triangle inequality, the left-hand side above can be bounded by sums of the form \| \sum_{i \in B} Y_{n,i} \|_2 / n^{d/2}, where B is a rectangle of size n^{d-r} m_n^r with r ∈ {1, \dots, d − 1}. Focusing on the dominant term with r = 1, we then bound the left-hand side above by C (n/(l_n + m_n))^{1/2} (n^{d-1} m_n)^{1/2} / n^{d/2} = C m_n^{1/2} / (l_n + m_n)^{1/2} → 0 as n → ∞, since m_n/l_n → 0. As a consequence, it suffices to show S_n(\eta)/n^{d/2} \Rightarrow N(0, \sigma^2). This, under conditions (7.21) and (7.22), follows from the standard central limit theorem for triangular arrays of independent random variables (see e.g. [32], Chapter 2, Theorem 4.5).

Remark VII.13. Central limit theorems for m_n-dependent random fields have been considered by Heinrich [47]. His result has recently been applied, with m_n = m fixed, by El Machkouri et al. [35] to establish a central limit theorem for stationary random fields. Our application requires us to take m_n → ∞. In this case, our condition in Theorem VII.12 is weaker than Heinrich's. In particular, he assumed
(7.23) $$\lim_{n\to\infty} \frac{m_n^{2d}}{n^d} \sum_{i \in \llbracket 1, n \rrbracket^d} E\Big( Y_{n,i}^2 \, 1\big\{ |Y_{n,i}| > \epsilon n^{d/2} m_n^{-2d} \big\} \Big) = 0, \quad\text{for all } \epsilon > 0.$$
This is stronger than (7.22).

7.4 Asymptotic Normality by m-Approximation
In this section, we prove Theorem VII.4 by an m-approximation argument. Fix x ∈ R and write
$$Z_{n,i} = \frac{1}{\sqrt{b_n}} K\Big( \frac{x - X_i}{b_n} \Big) \quad\text{and}\quad \zeta_{n,i} = \frac{1}{\sqrt{b_n}} K\Big( \frac{x - X_{i,m_n}}{b_n} \Big), \quad i \in \mathbb{Z}^d.$$
In this way, {ζ_{n,i}}_{i∈Z^d} are m_n-dependent. We will use {ζ_{n,i} : i ∈ Z^d}_{n∈N} to approximate {Z_{n,i} : i ∈ Z^d}_{n∈N}. We also write \bar Z_{n,i} = Z_{n,i} − E Z_{n,i} and \bar\zeta_{n,i} = \zeta_{n,i} − E \zeta_{n,i}. Setting
$$S_n(\zeta) = \sum_{i \in \llbracket 1, n \rrbracket^d} \bar\zeta_{n,i} \quad\text{and}\quad S_n(Z - \zeta) = \sum_{i \in \llbracket 1, n \rrbracket^d} (\bar Z_{n,i} - \bar\zeta_{n,i}),$$
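The m-approximation can be illustrated for a one-dimensional linear process. The coefficients a_k = (k+1)^{-2}, the Gaussian kernel, and the bandwidth below are arbitrary illustrative choices; the two asserted bounds are the deterministic estimates |X_i − X_{i,m}| ≤ Σ_{k≥m} a_k (for noise bounded by 1) and |Z_{n,i} − ζ_{n,i}| ≤ Lip(K) |X_i − X_{i,m}| / b_n^{3/2} for a Lipschitz kernel.

```python
# Illustration (d = 1, illustrative coefficients/kernel/bandwidth) of the
# m-approximation: the truncated process X_{i,m} is m-dependent and close
# to X_i, so zeta_{n,i} is close to Z_{n,i}.
import math, random

rng = random.Random(0)
N, m = 400, 20
a = [(k + 1) ** (-2.0) for k in range(N)]         # one-sided coefficients
eps = [rng.uniform(-1, 1) for _ in range(2 * N)]  # bounded iid noise

def X(i, trunc):
    # linear process truncated after `trunc` coefficients
    return sum(a[k] * eps[i - k] for k in range(trunc))

def K(u):
    # Gaussian kernel; its Lipschitz constant is e^{-1/2}/sqrt(2*pi) ~ 0.2420
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

x, bn = 0.0, 0.3
tail = sum(a[k] for k in range(m, N))  # bound on |X_i - X_{i,m}| when |eps|<=1

for i in range(N, N + 10):
    full, approx = X(i, N), X(i, m)
    assert abs(full - approx) <= tail + 1e-12
    Z = K((x - full) / bn) / math.sqrt(bn)
    zeta = K((x - approx) / bn) / math.sqrt(bn)
    assert abs(Z - zeta) <= 0.2421 * abs(full - approx) / bn ** 1.5 + 1e-12
```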
we decompose
(7.24) $$(n^d b_n)^{1/2} \big( f_n(x) - E f_n(x) \big) = \frac{S_n(\zeta)}{n^{d/2}} + \frac{S_n(Z - \zeta)}{n^{d/2}}.$$
To prove Theorem VII.4, it suffices to establish the following two results.

Proposition VII.14. Under Condition VII.1 and (7.8), (7.10), (7.11) of Condition VII.3,
(7.25) $$\frac{S_n(\zeta)}{n^{d/2}} \Rightarrow N(0, \sigma_x^2).$$

Proposition VII.15. Under Condition VII.1 and (7.8), (7.9) of Condition VII.3,
(7.26) $$\frac{S_n(Z - \zeta)}{n^{d/2}} \stackrel{P}{\longrightarrow} 0.$$
To prove the above two propositions, a key step is to establish the following moment inequalities.

Lemma VII.16. There exists a constant C > 0 such that for all n ∈ N,
(7.27) $$\|S_n(Z - \zeta)\|_2 \le C n^{d/2} \big( \|\bar Z_{n,0} - \bar\zeta_{n,0}\|_2 + b_n^{1/2} \Delta_n \big).$$
In addition, if E(|\epsilon_0|^\alpha) < ∞ for some α ≥ 2, then
(7.28) $$\bigg\| \sum_{i \in \mathbb{N}^d,\ 1 \preceq i \preceq j} \bar\zeta_{n,i} \bigg\|_\alpha \le C (j_1 \cdots j_d)^{1/2} \big( \|\bar\zeta_{n,0}\|_\alpha + b_n^{1/2} \Delta_n \big), \quad\text{for all } j \in \mathbb{N}^d.$$
The proof is deferred to Section 7.5.

Proof of Proposition VII.14. Observing that S_n(\zeta)/n^{d/2} is a partial sum of m_n-dependent random fields, we apply Theorem VII.12. Since \|\bar\zeta_{n,0}\|_2 → σ_x as n → ∞, (7.28) with α = 2 and assumption (7.8) entail (7.20). Thus, to prove (7.25), it suffices to show, for l_n = m_n \log n,
(7.29) $$\lim_{n\to\infty} \frac{1}{l_n^d} E\Big[ \Big( \sum_{i \in \llbracket 1, l_n \rrbracket^d} \bar\zeta_{n,i} \Big)^2 \Big] = \sigma_x^2,$$
and, writing \xi_n = \sum_{i \in \llbracket 1, l_n \rrbracket^d} \bar\zeta_{n,i},
(7.30) $$\lim_{n\to\infty} \frac{1}{l_n^d} E\big( \xi_n^2 \, 1_{\{|\xi_n| > \epsilon n^{d/2}\}} \big) = 0, \quad\text{for all } \epsilon > 0.$$
By a standard calculation, under (7.7) of Condition VII.1, for all n ∈ N and i ≠ 0, |E(\bar\zeta_{n,0} \bar\zeta_{n,i})| \le C p_{i,m_n} b_n \le C b_n. Therefore,
$$\Big| \frac{1}{l_n^d} E\Big[ \Big( \sum_{i \in \llbracket 1, l_n \rrbracket^d} \bar\zeta_{n,i} \Big)^2 \Big] - E \bar\zeta_{n,0}^2 \Big| \le 2 \sum_{i \in \llbracket -m_n, m_n \rrbracket^d} |E(\bar\zeta_{n,0} \bar\zeta_{n,i})| \, 1_{\{i \ne 0\}} \le C m_n^d b_n.$$
Thus, assumption (7.10) entails (7.29). To prove (7.30), observe that
$$E\big( \xi_n^2 \, 1_{\{|\xi_n| > \epsilon n^{d/2}\}} \big) \le \|\xi_n\|_\alpha^2 \, P\big( |\xi_n| > \epsilon n^{d/2} \big)^{(\alpha-2)/\alpha} \le \|\xi_n\|_\alpha^2 \Big( \frac{\|\xi_n\|_2^2}{\epsilon^2 n^d} \Big)^{(\alpha-2)/\alpha}.$$
This time, (7.28) and (7.8) yield \|\xi_n\|_2 \le C l_n^{d/2}. For α > 2, observe that, since K is bounded,
$$\|\bar\zeta_{n,0}\|_\alpha = \big( E|\bar\zeta_{n,0}|^\alpha \big)^{1/\alpha} \le \Big( \frac{C}{b_n^{(\alpha-2)/2}} \Big)^{1/\alpha} \|\bar\zeta_{n,0}\|_2^{2/\alpha} \le C b_n^{-(\alpha-2)/(2\alpha)}.$$
So, \|\xi_n\|_\alpha^2 \le C l_n^d b_n^{-(\alpha-2)/\alpha}. To sum up, we have obtained
$$\frac{1}{l_n^d} E\big( \xi_n^2 \, 1_{\{|\xi_n| > \epsilon n^{d/2}\}} \big) \le C \Big( \frac{l_n^d}{n^d b_n} \Big)^{(\alpha-2)/\alpha}.$$
Now, (7.11) entails (7.30).

Proof of Proposition VII.15. In order to obtain the desired result, it suffices to combine (7.27), assumptions (7.8) and (7.9), and Lemma VII.17 below.

Lemma VII.17. Under the assumption of Condition VII.1, there exists a constant C such that for all n ∈ N,
(7.31) $$\|\bar\zeta_{n,0} - \bar Z_{n,0}\|_2 \le C \Big[ \frac{B_{m_n}^{1/2}}{b_n} + b_n^{1/2} \Big].$$
The proof is deferred to Section 7.5.
7.5 Proofs
Proof of Lemma VII.2. (i) The existence and Lipschitz continuity of p and p_m have been proved by Wu and Mielniczuk [120], Lemma 1. To prove (7.6), observe that
(7.32) $$|p_m(y) - p(y)| \le \int |p_m(y) - p_m(y - x)| \, \tilde p_m(x) \, dx \le C \int |x| \, \tilde p_m(x) \, dx = C E|\tilde X_{0,m}|.$$
This entails that p_m(x) → p(x) uniformly in x ∈ R as m → ∞. Therefore, (7.6) holds.

(ii) Fix i ∈ Z^d \setminus \{0\} and let F_i denote the joint distribution function of (X_0, X_i). For the sake of simplicity, we prove the case a_0 = 1. Write R = X_0 − \epsilon_0 and R_i = X_i − \epsilon_i − a_i \epsilon_0. Now, R and R_i are dependent random variables. First, we show that
(7.33) $$p_i(x, y) \equiv \frac{\partial^2}{\partial x \, \partial y} F_i(x, y) = E\big[ p_\epsilon(x - R) \, p_\epsilon(y - R_i - a_i x) \big].$$
Indeed,
(7.34) $$F_i(x, y) = P(X_0 \le x,\ X_i \le y) = P(\epsilon_0 + R \le x,\ \epsilon_i + a_i \epsilon_0 + R_i \le y) = E \, \Phi_i(x - R,\ y - R_i),$$
with, letting F_\epsilon denote the cumulative distribution function of \epsilon_0,
$$\Phi_i(x, y) = \int_{-\infty}^{x} F_\epsilon(y - a_i x') \, F_\epsilon(dx').$$
Differentiating (7.34) yields (7.33) (see e.g. [32], Appendix A.9, on the validity of the exchange of differentiation and expectation).
Next, we prove (7.7) by establishing the following two steps:
(7.35) $$\lim_{|i|_\infty \to \infty} \sup_{x, y} \big| p_i(x, y) - p(x) \, p(y - a_i x) \big| = 0,$$
and
(7.36) $$\lim_{m \to \infty} \sup_{x, y, i} \big| p_i(x, y) - p_{i,m}(x, y) \big| = 0.$$
Then, (7.35) implies the first part of (7.7), and the two limits together imply the second part. To prove (7.35), set
$$\tilde D_i = E\big( R_i \mid \sigma(\epsilon_k : k \preceq 0) \big) \quad\text{and}\quad D_i = R_i - \tilde D_i, \quad i \in \mathbb{Z}^d.$$
By definition, D_i and R are independent. Introducing the intermediate term E[p_\epsilon(x − R) p_\epsilon(y − D_i − a_i x)] = p(x) E p_\epsilon(y − D_i − a_i x), we then bound |p_i(x, y) − p(x) p(y − a_i x)| ≤ Ψ_1 + Ψ_2 with, under the assumption that p_\epsilon is bounded and Lipschitz,
$$\Psi_1 = \big| p_i(x, y) - E[p_\epsilon(x - R) \, p_\epsilon(y - D_i - a_i x)] \big| \le E\big[ p_\epsilon(x - R) \, |R_i - D_i| \big] \le C E|\tilde D_i|,$$
and
$$\Psi_2 = \big| p(x) \, p(y - a_i x) - E[p_\epsilon(x - R) \, p_\epsilon(y - D_i - a_i x)] \big| \le p(x) \, E\big| p_\epsilon(y - a_i x - R_i + a_i \epsilon_0) - p_\epsilon(y - D_i - a_i x) \big| \le C \big( E|\tilde D_i| + |a_i| \big).$$
By (7.13), |p_i(x, y) − p(x) p(y − a_i x)| → 0 as |i|_\infty → ∞. To prove (7.36), define R_m = X_{0,m} − \epsilon_0 and R_{i,m} = X_{i,m} − \epsilon_i − a_i \epsilon_0 1\{|i|_\infty <