Submitted to IIE Transactions.
Folded standardized time series area variance estimators for simulation

CLAUDIA ANTONINI, Departamento de Matemáticas Puras y Aplicadas, Universidad Simón Bolívar, Sartenejas, 1080, Venezuela. Member, Institute of Industrial Engineers. E-mail: [email protected]. Telephone: +58-212-9063294.

CHRISTOS ALEXOPOULOS, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA. Member, Institute of Industrial Engineers. E-mail: [email protected]. Telephone: (404) 894-2361. Fax: (404) 894-2301.

DAVID GOLDSMAN, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA. Member, Institute of Industrial Engineers, and corresponding author. E-mail: [email protected]. Telephone: (404) 894-2365. Fax: (404) 894-2301.

JAMES R. WILSON, Edward P. Fitts Department of Industrial and Systems Engineering, North Carolina State University, Campus Box 7906, Raleigh, NC 27695-7906, USA. Fellow, Institute of Industrial Engineers. E-mail: [email protected]. Telephone: (919) 515-6415. Fax: (919) 515-5281.
Folded standardized time series area variance estimators for simulation

We estimate the variance parameter of a stationary simulation-generated process using "folded" versions of standardized time series area estimators. Asymptotically, different folding levels yield unbiased estimators that are independent scaled chi-squared variates, each with one degree of freedom. We exploit this result to formulate improved variance estimators based on the combination of multiple levels as well as the use of batching; the improved estimators preserve the asymptotic bias properties of their predecessors, but have substantially lower variance. A Monte Carlo example demonstrates the efficacy of the new methodology.
1. Introduction

One of the most important problems in simulation output analysis is the estimation of the mean µ of a steady-state (stationary) simulation-generated process {Y_i : i = 1, 2, . . .}. For instance, we may be interested in determining the steady-state mean transit time in a job shop or the long-run expected profit per period arising from a certain inventory policy. Assuming that the simulation is indeed operating in steady state, the estimation of µ is not itself a particularly difficult problem—simply use the sample mean of the observations, $\bar{Y}_n \equiv \frac{1}{n}\sum_{i=1}^{n} Y_i$, as the point estimator. But point estimation of the mean is usually not enough, since any serious statistical analysis should also include a measure of the variability of the sample mean. One of the most commonly used measures of this variability is the variance parameter, which is defined as the sum of covariances of the process at all lags, and which can often be written in the intuitively pleasing form $\sigma^2 = \lim_{n\to\infty} n\,\mathrm{Var}(\bar{Y}_n)$. With knowledge of such a measure in hand, we could provide, among other benefits, confidence intervals for µ—typically of the form
$$\mu \in \bar{Y}_n \pm t\sqrt{\hat{\sigma}^2/n},$$
where t is a quantile from the appropriate pivot distribution and σ̂² is an estimator of σ². Unfortunately, the problem of estimating the variance of the sample mean is not so straightforward. The trouble is that discrete-event simulation data—consecutive waiting times in a queueing system, for example—are almost always serially correlated as well as non-normal. These characteristics render inappropriate the traditional statistical analysis methods that rely on the assumption of independent and identically distributed (i.i.d.) normal observations.

This article is concerned with providing underlying theory for estimating the variance parameter σ² of a stationary simulation-generated process. Over the years, a number of methodologies for estimating σ² have been proposed in the literature (see Law, 2006), e.g., the techniques referred to as nonoverlapping batch means (NBM), overlapping batch means (OBM), standardized time series (STS), spectral analysis, and regeneration. NBM—conceptually the simplest of these methodologies—divides the data {Y_i : i = 1, 2, . . . , n} into nonoverlapping batches, and uses the sample variance of the sample means from the batches (i.e., the batch means) as a foundation to estimate σ². OBM (Meketon and Schmeiser, 1984), on the other hand, effectively re-uses the data by forming overlapping batches, and then invokes an appropriately scaled sample variance of the resulting sample means from the batches to estimate σ². The result is an OBM variance estimator having about the same bias as, but significantly lower variance than, the benchmark NBM estimator employing the same batch and total sample sizes. STS (Schruben, 1983) uses a functional central limit theorem to standardize a stationary time series, such as output from a steady-state discrete-event simulation, into a process that converges to a limiting Brownian bridge process as the batch or total sample size becomes large. Known properties of the Brownian bridge process are then used to obtain estimators for σ². Similar to OBM, overlapping batched versions of various STS estimators have been shown to have the same bias as, but substantially lower variance than, their nonoverlapping counterparts (Alexopoulos et al., 2007b,c). Additional variance-reducing tricks in which STS re-uses data involve orthonormalizing (Foley and Goldsman, 1999) and linearly combining different STS estimators (Aktaran-Kalaycı et al., 2007; Goldsman et al., 2007).

A recurring theme that emerges in the development of new estimators for σ² is the re-use of data. In the current article, we study the consequences of a "folding" operation on the original STS process (and its limiting Brownian bridge process). The folding operation produces multiple standardized time series processes, which in turn will ultimately allow us to use the original data to produce multiple estimators for σ²—estimators that are often asymptotically independent as the sample size grows. These folded estimators will lead to combined estimators having smaller variance than existing estimators not based on the folding operation.

The article is organized as follows. Section 2 gives some background material on STS. In Section 3, we introduce the notion of folding a Brownian bridge, and we show that each application of folding yields a new Brownian bridge process. We also derive useful expressions for these folded processes in terms of the original Brownian bridge and in terms of the original underlying Brownian motion. Section 4 is concerned with derivations of the expected values, variances, and covariances of certain functionals related to the area under a folded Brownian bridge. In Section 5, we finally show how to apply these results to the problem of estimating the variance parameter of a steady-state simulation process. The idea is to start with a single STS, form folded versions of that original STS (which converge to corresponding folded versions of a Brownian bridge process), calculate an estimator for σ² from each folded STS, and then combine the estimators into one low-variance estimator. We illustrate the efficacy of the folded estimators via analytical and Monte Carlo examples, and we find that the new estimators indeed reduce estimator variance at little cost in bias. Section 6 presents conclusions, while the technical details of some of the proofs are relegated to the Appendix.
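To fix ideas, here is a minimal sketch of the benchmark NBM estimator just described (our own illustration, not from the article; the function and variable names are hypothetical):

```python
import numpy as np

def nbm_estimator(y, b):
    """Nonoverlapping-batch-means estimate of sigma^2 = lim_n n*Var(Ybar_n).

    y: 1-D array of stationary observations; b: number of batches.
    Returns m/(b-1) times the sum of squared deviations of the b batch
    means about their grand mean, where m is the batch size.
    """
    m = len(y) // b                                    # batch size
    batch_means = y[: b * m].reshape(b, m).mean(axis=1)
    return m * batch_means.var(ddof=1)                 # NBM estimator of sigma^2
```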
2. Background

This section lays out preliminaries on the STS methodology. We begin with some standard assumptions that we shall invoke whenever needed in the sequel. In plain English, these assumptions will ensure that our upcoming variance estimators work properly on a wide variety of stationary stochastic processes.

Assumptions A

1. The process {Y_i, i ≥ 1} is stationary and satisfies the following Functional Central Limit Theorem. For n = 1, 2, . . . and t ∈ [0, 1], the process
$$X_n(t) \equiv \frac{\lfloor nt\rfloor\,(\bar{Y}_{\lfloor nt\rfloor} - \mu)}{\sigma\sqrt{n}} \qquad (1)$$
satisfies $X_n \underset{n\to\infty}{\Longrightarrow} W$, where: µ is the steady-state mean; σ² is the variance parameter; ⌊·⌋ denotes the greatest integer function; W is a standard Brownian motion process on [0, 1]; and ⇒ denotes weak convergence (as n → ∞) in D[0, 1], the space of functions on [0, 1] that are right-continuous with left-hand limits. See also Billingsley (1968) and Glynn and Iglehart (1990).

2. $\sum_{i=-\infty}^{\infty} R_i = \sigma^2 \in (0, \infty)$, where R_i ≡ Cov(Y_1, Y_{1+i}), i = 0, 1, 2, . . . .

3. $\sum_{i=1}^{\infty} i^2 |R_i| < \infty$.

4. The function f(·), defined on [0, 1], is twice continuously differentiable and satisfies the normalizing condition $\int_0^1\!\int_0^1 f(s)f(t)[\min\{s,t\} - st]\,ds\,dt = 1$.

Assumptions A.1–A.3 are mild conditions that hold for a variety of stochastic processes encountered in practice (see Glynn and Iglehart, 1990). Assumption A.4 gives conditions on the normalized weight function f(·) that will be used in our estimators for σ². Of fundamental importance to the rest of the paper is the standardized time series of the underlying stochastic process. It is the STS that will form the basis of all of the estimators studied herein.

Definition 1. As in Schruben (1983), the (level-0) standardized time series of the process {Y_i} is
$$T_{0,n}(t) \equiv \frac{\lfloor nt\rfloor\,(\bar{Y}_n - \bar{Y}_{\lfloor nt\rfloor})}{\sigma\sqrt{n}} \qquad (2)$$
for t ∈ [0, 1].
In the next section, we discuss how the STS (2) is related to a Brownian bridge process; and in Section 5.2, we show how to use this process to derive estimators for σ².
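As a concrete illustration (a sketch under our own naming conventions), the level-0 STS of Definition 1 is easy to evaluate on the grid t = j/n. In applications σ is unknown, but the area estimators of Section 5 only ever use the product σT_{0,n}, from which σ cancels:

```python
import numpy as np

def sigma_sts0(y):
    """Return sigma * T_{0,n}(j/n) for j = 1, ..., n, per Eq. (2):
    sigma*T_{0,n}(j/n) = j*(Ybar_n - Ybar_j)/sqrt(n), which is free of sigma.
    """
    n = len(y)
    j = np.arange(1, n + 1)
    running_means = np.cumsum(y) / j                   # Ybar_j, j = 1..n
    return j * (running_means[-1] - running_means) / np.sqrt(n)
```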
3. Folded Brownian bridges

Our development requires some additional nomenclature. First of all, we define a Brownian bridge, which will turn out to be the limiting process of a standardized time series, much as the standard normal distribution is the limiting distribution of a properly standardized sample mean.

Definition 2. Suppose that W(·) is a standard Brownian motion process. The associated level-0 Brownian bridge process is B_0(t) ≡ B(t) ≡ W(t) − tW(1) for t ∈ [0, 1].

In fact, a Brownian bridge {B(t) : t ∈ [0, 1]} is a Gaussian process with E[B(t)] = 0 and Cov(B(s), B(t)) = min{s, t} − st, for s, t ∈ [0, 1]. Brownian bridges are important for our purposes because under Assumptions A.1–A.3, Schruben (1983) shows that $T_{0,n}(\cdot) \underset{n\to\infty}{\Longrightarrow} B(\cdot)$ and that $\sqrt{n}(\bar{Y}_n - \mu)$ and T_{0,n}(·) are asymptotically independent as n → ∞.

The contribution of the current paper is the development and evaluation of folded estimators for σ². We now define precisely what we mean by the folding operation, a map that can be applied either to an STS or a Brownian bridge.
Definition 3. The folding map Ψ : Y ∈ D[0, 1] → Ψ_Y ∈ D[0, 1] is defined by
$$\Psi_Y(t) \equiv Y\!\left(\tfrac{t}{2}\right) - Y\!\left(1 - \tfrac{t}{2}\right)$$
for t ∈ [0, 1]. Moreover, for each nonnegative integer k, we define Ψ^k : Y ∈ D[0, 1] → Ψ^k_Y ∈ D[0, 1], the k-fold composition of the folding map Ψ, so that for every t ∈ [0, 1],
$$\Psi^k_Y(t) \equiv \begin{cases} Y(t), & \text{if } k = 0, \\ \Psi^{k-1} \circ \Psi_Y(t), & \text{if } k = 1, 2, \ldots. \end{cases}$$

The folding operation can be performed multiple times on the Brownian bridge process, as demonstrated by the following definition.

Definition 4. (See Shorack and Wellner, 1986.) For k = 1, 2, . . . , the level-k folded Brownian bridge is $B_k(t) \equiv \Psi_{B_{k-1}}(t) = B_{k-1}\big(\tfrac{t}{2}\big) - B_{k-1}\big(1 - \tfrac{t}{2}\big)$, so that $B_k(t) = \Psi^k_{B_0}(t)$ for t ∈ [0, 1].
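For intuition, the folding map is easy to apply to a discretized path. The following helper is our own sketch, with the grid chosen so that t/2 and 1 − t/2 land exactly on grid points; it is reused in later illustrations:

```python
import numpy as np

def fold(path):
    """One application of the folding map: Psi_Y(t) = Y(t/2) - Y(1 - t/2).

    The input path is sampled at t = i/(2m), i = 0, ..., 2m; the output is
    sampled at t = j/m, j = 0, ..., m, since t/2 = j/(2m) and
    1 - t/2 = (2m - j)/(2m) are then exact grid points.
    """
    two_m = len(path) - 1
    assert two_m % 2 == 0, "need an even number of grid intervals"
    j = np.arange(two_m // 2 + 1)
    return path[j] - path[two_m - j]
```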
Intuitively speaking, when the folding operator Ψ is applied to a Brownian bridge process {B_0(t) : t ∈ [0, 1]}, it does the following: (i) Ψ reflects (folds) the portion of the original process defined on the subinterval [1/2, 1] (shown in the upper right-hand portion of Figure 1a) about the vertical line t = 1/2 (yielding the subprocess shown in the upper left-hand portion of Figure 1a); and (ii) Ψ takes the difference between these two subprocesses defined on [0, 1/2] and stretches that difference over the unit interval [0, 1] (yielding the new process shown in Figure 1b).

Lemma 1 shows that as long as we start with a Brownian bridge, folding it will produce another Brownian bridge as well. The proof simply requires that we verify the necessary covariance structure (see Antonini, 2005).

Lemma 1. For k = 1, 2, . . . , the process {B_k(t) : t ∈ [0, 1]} is a Brownian bridge.

The next lemma gives an equation relating the level-k Brownian bridge with the original (level-0) Brownian bridge and the initial Brownian motion process. These results will be useful later on when we derive properties of certain functionals of B_k(t).
[Figure 1 appears here.] Fig. 1. Geometric Illustration of Folded Brownian Bridges: (a) reflecting about t = 1/2; (b) differencing and stretching.
Lemma 2. For k = 1, 2, . . . ,
$$B_k(t) = \sum_{i=1}^{2^{k-1}} \left[ B\!\left(\frac{i-1}{2^{k-1}} + \frac{t}{2^k}\right) - B\!\left(\frac{i}{2^{k-1}} - \frac{t}{2^k}\right) \right] \qquad (3)$$
$$\phantom{B_k(t)} = (1-t)\,W(1) + \sum_{i=1}^{2^{k-1}} \left[ W\!\left(\frac{i-1}{2^{k-1}} + \frac{t}{2^k}\right) - W\!\left(\frac{i}{2^{k-1}} - \frac{t}{2^k}\right) \right]. \qquad (4)$$

The proof of Lemma 2 is a direct consequence of Definition 4. See Antonini (2005) for the details.
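As a quick Monte Carlo sanity check of Lemma 1 (our own sketch, reusing the fold helper above), a twice-folded bridge should still have the Brownian bridge variance function Var[B_2(t)] = t(1 − t):

```python
import numpy as np

rng = np.random.default_rng(42)
reps, m = 20000, 2 ** 10
t = np.linspace(0.0, 1.0, m + 1)
sum_sq = np.zeros(m // 4 + 1)                    # folding twice quarters the grid
for _ in range(reps):
    w = np.concatenate([[0.0], np.cumsum(rng.normal(size=m)) / np.sqrt(m)])
    sum_sq += fold(fold(w - t * w[-1])) ** 2     # B_2(t)^2 on its own grid
t2 = np.linspace(0.0, 1.0, m // 4 + 1)
print(np.abs(sum_sq / reps - t2 * (1.0 - t2)).max())   # small, e.g. ~0.01
```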
4. Some functionals of folded Brownian bridges

The purpose of this section is to highlight results on the weighted areas under successively higher levels of folded Brownian bridges. Such functionals will be used in Section 5 to construct estimators for the variance parameter σ² arising from a stationary stochastic process.

Definition 5. For k = 0, 1, . . . , the weighted area under the level-k folded Brownian bridge is
$$N_k(f) \equiv \int_0^1 f(t)\,B_k(t)\,dt.$$

Under simple conditions, Theorem 4.1 shows that N_k(f) has a standard normal distribution; its proof is in the Appendix.
Theorem 4.1. For any normalized weight function f(t) and any nonnegative integer k, we have N_k(f) ∼ Nor(0, 1).

Corollary 1. Under the conditions of Theorem 4.1, we have A_k(f) ≡ σ²N_k²(f) ∼ σ²χ²₁.

Of course, the corollary is an immediate consequence of Theorem 4.1. Besides the distributional result, it follows that E[A_k(f)] = σ², a finding that we will revisit in Theorem 5.3 when we develop estimators for σ². Meanwhile, we proceed with several results concerning the joint distribution of the {N_k(f) : k = 0, 1, . . .}. Our first such result, the proof of which is in the Appendix, gives an explicit expression for the covariance between folded area functionals from different levels. Before stating the theorem, for any weight function f(·), we define $F(t) \equiv \int_0^t f(s)\,ds$, F ≡ F(1), $\bar{F}(t) \equiv \int_0^t F(s)\,ds$, and F̄ ≡ F̄(1).

Theorem 4.2. Let f₁(t) and f₂(t) be normalized weight functions. Then for ℓ = 0, 1, . . . and k = 1, 2, . . . , we have
$$\mathrm{Cov}[N_\ell(f_1), N_{\ell+k}(f_2)] = \sum_{i=1}^{2^{k-1}} \int_0^1 f_2(t)\left[\bar{F}_1\!\left(\frac{i}{2^{k-1}} - \frac{t}{2^k}\right) - \bar{F}_1\!\left(\frac{i-1}{2^{k-1}} + \frac{t}{2^k}\right)\right] dt - \bar{F}_1\bar{F}_2. \qquad (5)$$
Lemmas 3–6 give results on the covariance between functionals of Brownian motion from different levels; these will be used later on to establish asymptotic covariances of estimators for σ² from different levels. In particular, Lemmas 5 and 6 give simple conditions under which these functionals are uncorrelated.

Lemma 3. For ℓ, k = 0, 1, . . . and s, t ∈ [0, 1], Cov[B_ℓ(s), B_{ℓ+k}(t)] = Cov[B_0(s), B_k(t)].

Proof. Follows by induction on k. □

Lemma 4. For ℓ, k = 0, 1, . . . , Cov[N_ℓ(f₁), N_{ℓ+k}(f₂)] = Cov[N_0(f₁), N_k(f₂)].

Proof. By Lemma 3,
$$\mathrm{Cov}[N_\ell(f_1), N_{\ell+k}(f_2)] = \int_0^1\!\!\int_0^1 f_1(s)f_2(t)\,\mathrm{Cov}[B_\ell(s), B_{\ell+k}(t)]\,ds\,dt = \int_0^1\!\!\int_0^1 f_1(s)f_2(t)\,\mathrm{Cov}[B_0(s), B_k(t)]\,ds\,dt. \qquad \Box$$

Lemma 5. If the normalized weight function f(t) satisfies f(t) = f(1 − t) for all t ∈ [0, 1], then Cov(N_0(f), N_k(f)) = 0 for all k = 1, 2, . . . .

Proof. Applying integration by parts to Eq. (5) with f₁ = f₂ = f, we obtain
$$\mathrm{Cov}[N_0(f), N_k(f)] = \frac{1}{2^k}\sum_{i=1}^{2^{k-1}}\int_0^1 F(t)\left[F\!\left(\frac{i}{2^{k-1}} - \frac{t}{2^k}\right) + F\!\left(\frac{i-1}{2^{k-1}} + \frac{t}{2^k}\right)\right] dt - \bar{F}^2$$
$$= \frac{1}{2^k}\sum_{i=1}^{2^{k-1}}\int_0^1 F(t)\left[F\!\left(1 - \frac{i-1}{2^{k-1}} - \frac{t}{2^k}\right) + F\!\left(\frac{i-1}{2^{k-1}} + \frac{t}{2^k}\right)\right] dt - \bar{F}^2 = \frac{F\bar{F}}{2} - \bar{F}^2,$$
where the second equality follows by reindexing the sum (i ↦ 2^{k−1} + 1 − i), and the third follows since $F(1-x) = \int_0^{1-x} f(y)\,dy = \int_x^1 f(1-z)\,dz = \int_x^1 f(z)\,dz$, so that F(1 − x) + F(x) = F for all x ∈ [0, 1]. The proof is completed by noting that
$$\bar{F} = \int_0^1 f(x)(1-x)\,dx = \int_0^{1/2} f(x)(1-x)\,dx + \int_0^{1/2} f(1-y)\,y\,dy = \int_0^{1/2} f(x)(1-x)\,dx + \int_0^{1/2} f(y)\,y\,dy = \int_0^{1/2} f(x)\,dx = \frac{F}{2},$$
whence Cov[N_0(f), N_k(f)] = F(F/2)/2 − (F/2)² = 0. □

Lemma 6. If the normalized weight function satisfies f(t) = f(1 − t) for all t ∈ [0, 1], then for ℓ = 0, 1, . . . and k = 1, 2, . . . , Cov[N_ℓ(f), N_{ℓ+k}(f)] = Cov[N_0(f), N_k(f)] = 0.

Proof. Immediate from Lemmas 4 and 5. □
The following lemma, proven in the Appendix, establishes the multivariate normality of the random vector N(f) ≡ [N_0(f), N_1(f), . . . , N_k(f)]. It will be used in Theorem 4.3 to obtain the remarkable result that, under relatively simple conditions, the folded functionals {N_k(f) : k = 0, 1, . . .} are i.i.d. Nor(0, 1).

Lemma 7. If the normalized weight function satisfies f(t) = f(1 − t) for all t ∈ [0, 1], then for each positive integer k the random vector N(f) has a nonsingular multivariate normal distribution.

Theorem 4.3. If the normalized weight function f(t) satisfies f(t) = f(1 − t) for all t ∈ [0, 1], then the random variables {N_k(f) : k = 0, 1, . . .} are i.i.d. Nor(0, 1).

Proof. Lemma 6 implies that Cov(N_k(f), N_j(f)) = 0 for every k ≠ j. Since by Lemma 7 the random vector N(f) has a multivariate normal distribution, we can conclude that the random variables N_0(f), N_1(f), . . . are i.i.d. Nor(0, 1). □

The next corollary, which is immediate from Theorem 4.3, will serve as the basis for our new variance estimators to be derived in Section 5.

Corollary 2. Under the conditions of Theorem 4.3, the random variables {A_k(f) : k = 0, 1, . . .} are i.i.d. σ²χ²₁.

Example 1. The following weight functions arise in simulation output analysis applications (see Foley and Goldsman, 1999, and Section 5 of the current article): $f_0(t) \equiv \sqrt{12}$, $f_2(t) \equiv \sqrt{840}\,(3t^2 - 3t + 1/2)$, and $f_{\cos,j}(t) \equiv \sqrt{8}\,\pi j \cos(2\pi j t)$, j = 1, 2, . . . , all for t ∈ [0, 1]. By Theorem 4.3, {N_k(f) : k ≥ 0} are i.i.d. Nor(0, 1); and by Corollary 2, {A_k(f) : k ≥ 0} are i.i.d. σ²χ²₁ for f = f₀, f₂, or f_{cos,j}, j = 1, 2, . . . .
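The normalizing condition of Assumption A.4 is easy to verify numerically for these weights; here is a sketch (our own helper name), using the midpoint rule to approximate the double integral against the bridge covariance kernel:

```python
import numpy as np

def bridge_norm(f, m=2000):
    """Approximate the double integral of f(s)f(t)[min(s,t) - st] over the
    unit square, which Assumption A.4 requires to equal 1."""
    t = (np.arange(m) + 0.5) / m                       # midpoint grid on [0, 1]
    w = f(t)
    K = np.minimum.outer(t, t) - np.outer(t, t)        # Cov[B(s), B(t)]
    return w @ K @ w / m**2

for f in (lambda t: np.sqrt(12) * np.ones_like(t),              # f0
          lambda t: np.sqrt(840) * (3 * t**2 - 3 * t + 0.5),    # f2
          lambda t: np.sqrt(8) * np.pi * np.cos(2 * np.pi * t)):  # f_cos,1
    print(bridge_norm(f))                              # each close to 1.0
```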
5. Application to variance estimation

We finally show how our work on properties of area functionals of folded Brownian bridges can be used in simulation output analysis. With this application in mind, we apply the folding transformation to Schruben's level-0 STS (Schruben, 1983) in Section 5.1, thereby obtaining several new versions of the STS. These new series are used in Section 5.2 to produce new estimators for σ². Section 5.3 gives obvious methods to improve the estimators, and Section 5.4 presents a simple Monte Carlo example showing that the estimators work as intended.

5.1. Folded standardized time series

Analogous to the level-k folded Brownian bridge from Definition 4, we define the level-k folded STS.

Definition 6. For k = 1, 2, . . . , the level-k folded STS is
$$T_{k,n}(t) \equiv \Psi_{T_{k-1,n}}(t) = T_{k-1,n}\!\left(\tfrac{t}{2}\right) - T_{k-1,n}\!\left(1 - \tfrac{t}{2}\right),$$
so that $T_{k,n}(t) = \Psi^k_{T_{0,n}}(t)$ for t ∈ [0, 1].

The next goal is to examine the convergence of the level-k folded STS to the analogous level-k folded Brownian bridge process. The following result is an immediate consequence of the almost-sure continuity of Ψ^k on D[0, 1] for k = 0, 1, . . . , and the Continuous Mapping Theorem (CMT) (Billingsley, 1968).
Theorem 5.1. If Assumptions A.1–A.3 hold, then for any fixed nonnegative integer k, we have
$$\big[T_{0,n}(\cdot), \ldots, T_{k,n}(\cdot)\big] \underset{n\to\infty}{\Longrightarrow} \big[B_0(\cdot), \ldots, B_k(\cdot)\big].$$
Moreover, $\sqrt{n}\,(\bar{Y}_n - \mu)$ and $[T_{0,n}(\cdot), \ldots, T_{k,n}(\cdot)]$ are asymptotically independent as n → ∞.
5.2. Folded area estimators

We introduce folded versions of the STS area estimator for σ², along with their asymptotic distributions, expected values, and variances. To begin, we define our new estimators, along with their limiting Brownian bridge functionals.

Definition 7. For each nonnegative integer k, the STS level-k folded area estimator for σ² is A_k(f; n) ≡ N_k²(f; n), where
$$N_k(f; n) \equiv \frac{1}{n}\sum_{j=1}^{n} f\!\left(\frac{j}{n}\right)\sigma T_{k,n}\!\left(\frac{j}{n}\right)$$
and f(·) is a normalized weight function (satisfying Assumption A.4). Note that the products σT_{k,n}(j/n) do not require knowledge of σ, since the σ's cancel; cf. Eq. (2). The case k = 0, f = f₀ corresponds to Schruben's original area estimator (Schruben, 1983).

Definition 8. Let A(f; n) ≡ [A_0(f; n), A_1(f; n), . . . , A_k(f; n)] and A(f) ≡ [A_0(f), A_1(f), . . . , A_k(f)].
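A computational sketch of Definition 7 follows. The exact discrete treatment of the folded STS is spelled out in the authors' companion paper (Alexopoulos et al., 2007a); the index-rounding used here to evaluate T_{k,n}(j/n) on the grid is our own approximation:

```python
import numpy as np

def folded_area_estimator(y, f, k):
    """Level-k folded area estimator A_k(f; n) = N_k(f; n)^2 of sigma^2.

    sigma*T_{0,n}(j/n) = j*(Ybar_n - Ybar_j)/sqrt(n) is free of sigma; each
    folding step T_k(t) = T_{k-1}(t/2) - T_{k-1}(1 - t/2) is approximated on
    the grid t = j/n by rounding j/2 to the nearest integer index.
    """
    n = len(y)
    j = np.arange(1, n + 1)
    ybar = np.cumsum(y) / j
    sig_t = np.concatenate([[0.0], j * (ybar[-1] - ybar) / np.sqrt(n)])
    for _ in range(k):                                 # apply the folding map
        half = np.rint(j / 2).astype(int)
        sig_t = np.concatenate([[0.0], sig_t[half] - sig_t[n - half]])
    return np.mean(f(j / n) * sig_t[1:]) ** 2          # N_k^2 = A_k(f; n)
```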
The following definitions provide the necessary set-up to establish in Theorem 5.2 below the asymptotic distribution of the random vector A(f; n) as n → ∞.

Definition 9. Let Λ denote the class of strictly increasing, continuous mappings of [0, 1] onto itself such that for every λ ∈ Λ, we have λ(0) = 0 and λ(1) = 1. If X, Y ∈ D[0, 1], then the Skorohod metric ρ(X, Y) defining the "distance" between X and Y in D[0, 1] is the infimum of those positive ξ for which there exists a λ ∈ Λ such that $\sup_{t\in[0,1]} |\lambda(t) - t| \leq \xi$ and $\sup_{t\in[0,1]} |X(t) - Y[\lambda(t)]| \leq \xi$. (See Billingsley, 1968, for further details.)

Definition 10. For each positive integer n, let Ω^n : Y ∈ D[0, 1] → Ω^n_Y ∈ D[0, 1] be the approximate (discrete) STS map
$$\Omega^n_Y(t) \equiv \frac{\lfloor nt\rfloor\,Y(1)}{n} - Y(t)$$
for t ∈ [0, 1]. Moreover, let Ω : Y ∈ D[0, 1] → Ω_Y ∈ D[0, 1] denote the corresponding asymptotic STS map
$$\Omega_Y(t) \equiv \lim_{n\to\infty} \Omega^n_Y(t) = tY(1) - Y(t)$$
for t ∈ [0, 1].
Note that Ω^n maps the process (1) into the corresponding standardized time series (2), so that we have $\Omega^n_{X_n}(t) = T_{0,n}(t)$ for t ∈ [0, 1] and n = 1, 2, . . . ; moreover, Ω maps a standard Brownian motion process into a standard Brownian bridge process, $\Omega_W(t) = tW(1) - W(t) \sim B_0(t)$ for t ∈ [0, 1].

Definition 11. For a given normalized weight function, for every nonnegative integer k, and for every positive integer n, the approximate (discrete) folded area map Θ^k_n : Y ∈ D[0, 1] → Θ^k_n(Y) ∈ R is defined by
$$\Theta^k_n(Y) \equiv \left[\frac{1}{n}\sum_{i=1}^{n} f\!\left(\frac{i}{n}\right)\sigma\,\Psi^k \circ \Omega^n_Y\!\left(\frac{i}{n}\right)\right]^2.$$
Moreover, the corresponding asymptotic folded area map Θ^k : Y ∈ D[0, 1] → Θ^k(Y) ∈ R is defined by
$$\Theta^k(Y) \equiv \left[\int_0^1 f(t)\,\sigma\,\Psi^k \circ \Omega_Y(t)\,dt\right]^2.$$
In terms of Eq. (1), the definition of A_k(f) from Corollary 1, and Definitions 8–11, we see that Θ^k_n(X_n) = A_k(f; n) and Θ^k(W) = A_k(f) for every nonnegative integer k. We are now ready to proceed with the main convergence theorem, which shows that the folded area estimators converge jointly to their asymptotic counterparts.

Theorem 5.2. If Assumptions A hold, then
$$A(f; n) \underset{n\to\infty}{\Longrightarrow} A(f). \qquad (6)$$

Sketch of Proof. Although the proof of Theorem 5.2 is detailed in the Appendix, it can be summarized as follows. Our goal is to apply the generalized CMT—that is, Theorem 5.5 of Billingsley (1968)—to prove that the (k + 1) × 1 random vector with jth element Θ^j_n(X_n) converges in distribution to the (k + 1) × 1 random vector with jth element Θ^j(W) for j = 0, 1, . . . , k. To establish the hypotheses of the generalized CMT, we show that if {x_n} ⊂ D[0, 1] is any sequence of functions converging to a realization W of a standard Brownian motion process in the Skorohod metric on D[0, 1], then the real-valued sequence Θ^j_n(x_n) converges to Θ^j(W) almost surely. First we exploit the almost-sure continuity of W(u) at every u ∈ [0, 1] and the convergence of {x_n} to W in D[0, 1] to show that for every nonnegative integer j, with probability one we have $\Psi^j \circ \Omega^n_{x_n}(t) - \Psi^j \circ \Omega^n_W(t) \to 0$ uniformly for t ∈ [0, 1] as n → ∞; and it follows that
$$\lim_{n\to\infty} \big|\Theta^j_n(x_n) - \Theta^j_n(W)\big| = 0 \quad \text{with probability one.} \qquad (7)$$
Next we exploit the almost-sure convergence $\Psi^j \circ \Omega^n_W(t) - \Psi^j \circ \Omega_W(t) \to 0$ for all t ∈ [0, 1] as n → ∞, together with the almost-sure continuity and Riemann integrability of $f(t)\Psi^j \circ \Omega_W(t)$ for t ∈ [0, 1], to show that
$$\lim_{n\to\infty} \big|\Theta^j_n(W) - \Theta^j(W)\big| = 0 \quad \text{with probability one.} \qquad (8)$$
Combining (7) and (8) and applying the triangle inequality, we see that the corresponding vector-valued sequence $\{[\Theta^0_n(x_n), \ldots, \Theta^k_n(x_n)] : n = 1, 2, \ldots\}$ converges to $[\Theta^0(W), \ldots, \Theta^k(W)]$ in R^{k+1} with probability one; and thus the desired result follows directly from the generalized CMT. □

Remark 1. Under Assumptions A, and for the weight functions in Example 1, Theorem 5.2 and Corollary 1 imply that A_0(f; n), . . . , A_k(f; n) are asymptotically (as n → ∞) i.i.d. σ²χ²₁ random variables.

Under relatively modest conditions, Theorem 5.3 gives asymptotic expressions for the expected values and variances of the level-k area estimators.

Theorem 5.3. Suppose that Assumptions A hold. Further, for fixed k ≥ 0, suppose that the family of random variables $\{A_k^2(f; n) : n \geq 1\}$ is uniformly integrable (see Billingsley, 1968, for a definition and sufficient conditions). Then we have
$$E[A_k(f; n)] \underset{n\to\infty}{\longrightarrow} E[A_k(f)] = \sigma^2 \quad \text{and} \quad \mathrm{Var}[A_k(f; n)] \underset{n\to\infty}{\longrightarrow} \mathrm{Var}[A_k(f)] = 2\sigma^4.$$
Remark 2. One can obtain finer-tuned results for E[A_0(f; n)] and E[A_1(f; n)]. In particular, under Assumptions A, Foley and Goldsman (1999) and Goldsman et al. (1990) show that
$$E[A_0(f; n)] = \sigma^2 + \frac{[(F - \bar{F})^2 + \bar{F}^2]\,\gamma}{2n} + o(1/n),$$
where $\gamma \equiv -2\sum_{i=1}^{\infty} i R_i$ (Song and Schmeiser, 1995). In a companion paper (Alexopoulos et al., 2007a), we find that if Assumptions A hold and n is even, then
$$E[A_1(f; n)] = \sigma^2 + \frac{\bar{F}^2\gamma}{n} + o(1/n).$$
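As a quick symbolic check of these expansions (our own computation, using Python's sympy), the constant weight f₀ yields a first-order bias term of 3γ/n at both levels, whereas f₂ satisfies F = F̄ = 0, so its first-order bias vanishes at levels 0 and 1:

```python
import sympy as sp

s, t, gamma, n = sp.symbols("s t gamma n")
weights = {"f0": sp.sqrt(12),
           "f2": sp.sqrt(840) * (3 * s**2 - 3 * s + sp.Rational(1, 2))}
for name, f in weights.items():
    F = sp.integrate(f, (s, 0, t))                    # F(t)
    F1, Fbar = F.subs(t, 1), sp.integrate(F, (t, 0, 1))
    bias0 = sp.simplify(((F1 - Fbar) ** 2 + Fbar**2) * gamma / (2 * n))
    bias1 = sp.simplify(Fbar**2 * gamma / n)
    print(name, bias0, bias1)   # f0: 3*gamma/n, 3*gamma/n; f2: 0, 0
```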
5.3. Enhanced estimators

The individual estimators whose properties are given in Theorem 5.3 are all based on a single long run of n observations, and all involve a single level k of folding. This section discusses two natural extensions of the estimators that have improved asymptotic properties—batching and combining levels.

Batching: In actual applications, we often organize the data by breaking the n observations into b contiguous, nonoverlapping batches, each of size m, so that n = bm; we can then compute the folded variance estimators from each batch separately. As the batch size m → ∞, the variance estimators computed from different batches are asymptotically independent under broadly applicable conditions on the original (unbatched) process {Y_i, i ≥ 1}; thus more stable (i.e., more accurate) variance estimators can be obtained by combining the folded variance estimators computed from all available batches. In view of this motivation, suppose that the ith batch of size m consists of the observations Y_{(i−1)m+1}, Y_{(i−1)m+2}, . . . , Y_{im}, for i = 1, 2, . . . , b. Using the obvious minor changes to the appropriate definitions, one can construct the level-k STS from the ith batch of observations, say T_{k,m,i}(t); and from there, one can obtain the resulting level-k area estimator from the ith batch, say A_{k,i}(f; m). Finally, we define the level-k batched folded area estimator for σ² by
$$\tilde{A}_k(f; b, m) \equiv \frac{1}{b}\sum_{i=1}^{b} A_{k,i}(f; m).$$
Under the conditions of Theorem 5.3, we have $\lim_{m\to\infty} E[\tilde{A}_k(f; b, m)] = \sigma^2$ and $\lim_{m\to\infty} \mathrm{Var}[\tilde{A}_k(f; b, m)] = 2\sigma^4/b$, where the latter result follows from the fact that the A_{k,i}(f; m)'s, i = 1, 2, . . . , b, are asymptotically independent as m → ∞. Thus, we obtain batched estimators with approximately the same expected value σ² as a single folded estimator arising from one long run, yet with substantially smaller variance.
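A minimal sketch of the batched estimator (our own helper, reusing folded_area_estimator from Section 5.2):

```python
import numpy as np

def batched_folded_area(y, f, k, b):
    """A-tilde_k(f; b, m): the level-k folded area estimator averaged over
    b contiguous, nonoverlapping batches of size m = len(y) // b."""
    m = len(y) // b
    return np.mean([folded_area_estimator(batch, f, k)
                    for batch in y[: b * m].reshape(b, m)])
```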
Combining levels of folding: Theorem 5.3 shows that, for a particular weight function f(·), all of the estimators from different levels of folding behave about the same asymptotically in terms of their expected value and variance. We can improve upon these individual estimators by combining the different levels. To this end, denote the average of the folded area estimators from levels 0, 1, . . . , k by
$$\bar{A}_k(f; n) \equiv \frac{1}{k+1}\sum_{j=0}^{k} A_j(f; n).$$
Under the conditions of Remark 1 and Theorem 5.3, we have $\lim_{n\to\infty} E[\bar{A}_k(f; n)] = \sigma^2$ and $\lim_{n\to\infty} \mathrm{Var}[\bar{A}_k(f; n)] = 2\sigma^4/(k+1)$. Thus, we obtain combined estimators with approximately the same expected value σ² as a single folded estimator arising from one level, yet with significantly smaller variance.

5.4. Monte Carlo Examples

We illustrate the performance of the new folded estimators with simple Monte Carlo experiments involving a stationary first-order autoregressive [AR(1)] process and a stationary M/M/1 queue-waiting-time process. In both cases, simulation runs are based on 4,096 independent replications using b = 32 batches. We used common random numbers across all variance estimation methods, with random numbers produced by the combined generator given in Figure 1 of L'Ecuyer (1999). We also checked the performance of the batched estimators when we combine levels. In particular, we used the realizations from the individual levels 0 and 1 with quadratic weight function f₂(·) to calculate realizations of the combined estimator
$$\bar{A}_1(f_2; 32, m) \equiv \tfrac{1}{2}\big[\tilde{A}_0(f_2; 32, m) + \tilde{A}_1(f_2; 32, m)\big],$$
and we obtained the estimated performance characteristics for this combined estimator, shown in column 4 of all tables below. We also compare the folded area estimators against the NBM estimator N(b; m), i.e., m/(b − 1) times the sum of squared deviations of the b nonoverlapping batch means about their grand mean; see column 5 of Tables 1 and 3 and column 6 of Tables 2 and 4.

5.4.1. AR(1) Process

An AR(1) process is constructed by setting Y_i = φY_{i−1} + ε_i, i = 1, 2, . . . , where the ε_i's are i.i.d. Nor(0, 1 − φ²), Y_0 is a standard normal random variable that is independent of the ε_i's, and −1 < φ < 1 (to preserve stationarity). It is well known that, for the AR(1) process, R_k = φ^k, k = 0, 1, 2, . . . , and σ² = (1 + φ)/(1 − φ). In the current example,
• We set the parameter φ = 0.9, corresponding to a highly positive autocorrelation structure with variance parameter σ² = 19.

• We calculated the estimated expected values (Table 1) and the estimated standard errors and correlations (Table 2) of the estimators, averaged over the 4,096 replications.

• To demonstrate the convergence of the expected values of the estimators to the variance parameter σ² as the batch size m increases, we show the numerical results in Table 1.

• We compare the folded estimators versus the nonoverlapping batch means variance estimator; results are shown in column 5 of Tables 1 and 3 and column 6 of Tables 2 and 4. (One replication of this experiment is sketched below.)
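The following sketch shows one replication of the AR(1) experiment, using the helpers from Sections 5.2–5.3 (note that numpy's default generator is used here rather than the L'Ecuyer generator behind the tables):

```python
import numpy as np

rng = np.random.default_rng(0)
phi, b, m = 0.9, 32, 1024
f2 = lambda t: np.sqrt(840) * (3 * t**2 - 3 * t + 0.5)

def ar1(n):
    """Stationary AR(1): Y_0 ~ Nor(0, 1), Y_i = phi*Y_{i-1} + eps_i."""
    y = np.empty(n)
    y[0] = rng.normal()
    eps = rng.normal(scale=np.sqrt(1 - phi**2), size=n)
    for i in range(1, n):
        y[i] = phi * y[i - 1] + eps[i]
    return y

y = ar1(b * m)                              # sigma^2 = (1+phi)/(1-phi) = 19
a0 = batched_folded_area(y, f2, 0, b)       # A-tilde_0(f2; 32, m)
a1 = batched_folded_area(y, f2, 1, b)       # A-tilde_1(f2; 32, m)
print(a0, a1, 0.5 * (a0 + a1))              # combined estimator A-bar_1
```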
Table 1. Estimated Expected Values of the Enhanced Folded Area Estimators versus the Non-Overlapping Batch Means for an AR(1) Process with φ = 0.9, σ² = 19, and b = 32

m       Ã0(f2; 32, m)   Ã1(f2; 32, m)   Ā1(f2; 32, m)   N(32; m)
128     16.32           12.75           14.53           17.43
256     18.04           16.48           17.26           18.12
512     18.66           18.06           18.36           18.60
1024    18.90           18.70           18.80           18.87
2048    19.01           18.85           18.93           18.85
4096    18.91           19.00           18.95           18.83
8192    19.12           18.91           19.01           18.70
16384   18.83           19.01           18.92           18.94
32768   18.99           19.03           19.01           18.90
65536   19.06           18.86           18.96           18.88
We summarize our conclusions from Tables 1 and 2 regarding the expected values and standard errors, respectively, of the variance estimators under consideration as follows:

• The estimated expected values of all variance estimators converge to σ² (19 in this case) as m increases, in accordance with our theoretical results.

• For small values of m, the NBM estimator yields expected values that are superior to those of the folded area estimators, but the folded area estimators quickly catch up to the batch-means estimator as m gets larger.
Table 2. Estimated Standard Errors and Correlations of the Enhanced Folded Area Estimators versus the Non-Overlapping Batch Means for an AR(1) Process with φ = 0.9, σ² = 19, and b = 32

m       Ã0(f2; 32, m)   Ã1(f2; 32, m)   Ā1(f2; 32, m)   Corr(Ã0, Ã1; 32, m)   N(32; m)
128     0.0635          0.0495          0.0404          0.0045                0.0696
256     0.0711          0.0643          0.0479          -0.0031               0.0716
512     0.0735          0.0709          0.0509          -0.0075               0.0750
1024    0.0743          0.0740          0.0519          -0.0206               0.0739
2048    0.0757          0.0736          0.0528          0.0018                0.0743
4096    0.0757          0.0743          0.0531          0.0035                0.0748
8192    0.0747          0.0738          0.0523          -0.0084               0.0752
16384   0.0724          0.0748          0.0517          -0.0112               0.0765
32768   0.0767          0.0731          0.0527          -0.0086               0.0753
65536   0.0751          0.0740          0.0527          0.0004                0.0766
• As the batch size m becomes large, the combined estimator Ā1(f2; 32, m) attains comparable expected values, with the bonus of substantially (≈ 50%) reduced variance. Theoretically, the asymptotic variance of each individual level of the batched folded area estimators is 2σ⁴/b, which for the present AR(1) process is ≈ 22.56 (as the batch size m becomes large). With 4,096 replications, this yields a standard error of ≈ √(22.56/4096) ≈ 0.0742. The theoretical asymptotic variance of the NBM estimator, 2σ⁴/(b − 1), is not significantly larger. However, the theoretical asymptotic variance of the combined estimator Ā1(f2; 32, m) is divided by 2 (see Section 5.3), since we are combining levels 0 and 1. As a result, this estimator has half the asymptotic variance of the other three, while preserving rates of convergence to σ² similar to those of the single-level estimators and catching up relatively quickly with NBM. This variance reduction translates into a reduction in the standard error by a factor of √2, yielding in this case a standard error of ≈ 0.0525. We can observe this behavior in column 4 of Table 2.

• As Lemma 6 establishes, the correlation between levels goes to zero as the batch size m increases; we can observe this in column 5 of Table 2.
5.4.2. M/M/1 Process

We also consider the stationary queue-waiting-time process for an M/M/1 queue with arrival rate λ and traffic intensity ρ < 1, i.e., a queueing system with Poisson arrivals and first-in-first-out i.i.d. exponential service times at a single server. For this process, we have σ² = ρ³(2 + 5ρ − 4ρ² + ρ³)/[λ²(1 − ρ)⁴]; see, for example, Steiger and Wilson (2001). In this particular example,

• We set the arrival rate at 0.8 and the service rate at 1.0, so that the server utilization is ρ = 0.8, corresponding to a highly positive autocorrelation structure and variance parameter σ² = 1976.

• To demonstrate the convergence of the expected values of the estimators to the variance parameter σ² as the batch size m increases, we show the numerical results in Table 3. Similarly, we show the standard errors and correlations of the estimators in Table 4.

• In columns 2–4 of Tables 3 and 4 we show the results for the enhanced folded area estimators with quadratic weight function f₂. We also compare the folded estimators versus the nonoverlapping batch means variance estimator; the results of this experiment are shown in column 5 of Table 3 and column 6 of Table 4.
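The waiting times for this experiment can be generated via Lindley's recursion; a minimal sketch (our own warm-up handling, matching the 1,024-step warm-up noted in the table captions):

```python
import numpy as np

rng = np.random.default_rng(1)

def mm1_waits(n, lam=0.8, mu=1.0, warmup=1024):
    """Waiting times in queue for a stationary M/M/1 FIFO queue, via
    Lindley's recursion W_{i+1} = max(0, W_i + S_i - A_{i+1});
    the first `warmup` observations are discarded."""
    total = n + warmup
    a = rng.exponential(1.0 / lam, size=total)    # interarrival times
    s = rng.exponential(1.0 / mu, size=total)     # service times
    w = np.zeros(total)
    for i in range(total - 1):
        w[i + 1] = max(0.0, w[i] + s[i] - a[i + 1])
    return w[warmup:]
```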
Table 3. Estimated Expected Values of the Enhanced Folded Area Estimators versus the Non-Overlapping Batch Means for an M/M/1 Waiting-Time Process with ρ = 0.8, σ² = 1976, and b = 32. 1,024-Step Warm-up for the M/M/1 Process.

m       Ã0(f2; 32, m)   Ã1(f2; 32, m)   Ā1(f2; 32, m)   N(32; m)
128     557             281             419             1,275
256     1,043           643             843             1,593
512     1,515           1,135           1,325           1,769
1024    1,827           1,584           1,705           1,893
2048    1,922           1,831           1,877           1,928
4096    1,975           1,923           1,949           1,958
8192    1,974           1,972           1,973           1,963
16384   1,979           1,970           1,975           1,970
32768   1,982           1,980           1,981           1,969
65536   1,977           1,972           1,975           1,957
Table 4. Estimated Standard Errors and Correlations of the Enhanced Folded Area Estimators versus the Non-Overlapping Batch Means for an M/M/1 Waiting-Time Process with ρ = 0.8, σ² = 1976, and b = 32. 1,024-Step Warm-up for the M/M/1 Process.

m       Ã0(f2; 32, m)   Ã1(f2; 32, m)   Ā1(f2; 32, m)   Corr(Ã0, Ã1; 32, m)   N(32; m)
128     4.318           1.652           2.533           0.3002                19.83
256     8.867           4.410           5.550           0.3217                22.43
512     14.39           8.440           9.480           0.3342                20.94
1024    17.15           12.42           12.62           0.4426                18.09
2048    15.86           13.19           12.37           0.4474                14.16
4096    14.04           12.69           11.04           0.3628                12.22
8192    11.14           11.74           9.339           0.3322                10.15
16384   9.589           9.335           7.343           0.2046                9.068
32768   8.381           8.640           6.369           0.1197                8.350
65536   7.988           8.105           5.803           0.0401                8.146
We summarize our conclusions from Tables 3 and 4 regarding the expected values and standard errors, respectively, of the variance estimators under consideration as follows:

• The estimated expected values of all variance estimators converge to σ² (1976 in this case) as m increases, in accordance with our theoretical results.

• For small values of m, the NBM estimator yields expected values that are superior to those of the folded area estimators, but the folded area estimators quickly catch up to the batch-means estimator as m gets larger.

• As in the AR(1) example, the combined estimator Ā1(f2; 32, m) attains comparable expected values with the bonus of substantially (≈ 50%) reduced variance. The asymptotic variance 2σ⁴/b of each individual-level batched estimator is ≈ 244,036 for the M/M/1 process under study (as the batch size m becomes large), which with 4,096 replications yields a standard error of ≈ 7.719. Again, the asymptotic variance of the NBM estimator is not significantly larger. However, the theoretical asymptotic variance of the combined estimator Ā1(f2; 32, m) is divided by 2 (see Section 5.3), since we are combining levels 0 and 1. As a result, this estimator has half the asymptotic variance of the other three, while preserving rates of convergence to σ² similar to those of the single-level estimators and catching up relatively quickly with NBM. This variance reduction translates into a reduction in the standard error by a factor of √2, yielding in this case a standard error of ≈ 5.458. We can observe this behavior in column 4 of Table 4.

• As Lemma 6 establishes, the correlation between levels goes to zero as the batch size m increases; we can observe this in column 5 of Table 4. The convergence to zero appears to be slower here than in the AR(1) example.
6. Conclusions

The main purpose of this article was to introduce "folded" versions of the standardized time series area estimator for the variance parameter arising from a stationary simulation process. We provided theoretical results showing that the folded estimators converge to appropriate functionals of Brownian motion; and these convergence results allow us to produce asymptotically unbiased and low-variance estimators using multiple folding levels in conjunction with standard batching techniques. At each folding level, and for each weight function in Example 1, the proposed estimators can be computed in O(n) time; the detailed computations will be listed in a forthcoming article.

Ongoing work includes the following. As in Remark 2, we can derive precise expressions for the expected values of the folded estimators—expressions that show just how quickly any estimator bias dies off as the batch size increases. We can also produce analogous folding results for other "primitive" STS variance estimators, e.g., for Cramér–von Mises estimators, as described in Antonini (2005). In addition, whatever type of primitive estimator we choose to use, there is interest in finding the best ways to combine batching and multiple folding levels in order to produce even better estimators for σ², and subsequently, good confidence intervals for the underlying steady-state mean µ. In any case, we have also planned a massive Monte Carlo analysis to examine estimator performance over a variety of benchmark processes. Future work includes the development of overlapping versions of the folded estimators, as in Alexopoulos et al. (2007b,c) and in Meketon and Schmeiser (1984).
Acknowledgements

Partial support for our research was provided by National Science Foundation Grant DMI-0400260.
References

Aktaran-Kalaycı, T., Goldsman, D., and Wilson, J. R. (2007) Linear combinations of overlapping variance estimators for simulation. Operations Research Letters, to appear.

Alexopoulos, C., Antonini, C., Goldsman, D., Meterelliyoz, M., and Wilson, J. R. (2007a) Properties of folded standardized time series variance estimators for simulation. Technical Report, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA.

Alexopoulos, C., Argon, N. T., Goldsman, D., Steiger, N. M., Tokol, G., and Wilson, J. R. (2007b) Efficient computation of overlapping variance estimators for simulation. INFORMS Journal on Computing, to appear.

Alexopoulos, C., Argon, N. T., Goldsman, D., Tokol, G., and Wilson, J. R. (2007c) Overlapping variance estimators for simulation. Operations Research, to appear.

Anderson, T. W. (1984) An Introduction to Multivariate Statistical Analysis, 2nd ed., Wiley, New York.

Antonini, C. (2005) Folded variance estimators for stationary time series. Ph.D. dissertation, H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA.

Apostol, T. M. (1974) Mathematical Analysis, 2nd ed., Addison-Wesley, Reading, MA.

Bartle, R. G. (1976) The Elements of Real Analysis, 2nd ed., Wiley, New York.

Billingsley, P. (1968) Convergence of Probability Measures, Wiley, New York.

Foley, R. D., and Goldsman, D. (1999) Confidence intervals using orthonormally weighted standardized time series. ACM Transactions on Modeling and Computer Simulation, 9, 297–325.

Glynn, P. W., and Iglehart, D. L. (1990) Simulation output analysis using standardized time series. Mathematics of Operations Research, 15, 1–16.

Goldsman, D., Kang, K., Kim, S.-H., Seila, A. F., and Tokol, G. (2007) Combining standardized time series area and Cramér–von Mises variance estimators. Naval Research Logistics, to appear.

Goldsman, D., Meketon, M., and Schruben, L. (1990) Properties of standardized time series weighted area variance estimators. Management Science, 36, 602–612.

Grimmett, G. R., and Stirzaker, D. R. (1992) Probability and Random Processes, Oxford Science Publications, Oxford.

Law, A. M. (2006) Simulation Modeling and Analysis, 4th ed., McGraw-Hill, New York.

L'Ecuyer, P. (1999) Good parameters and implementations for combined multiple recursive random number generators. Operations Research, 47, 159–164.

Loève, M. (1978) Probability Theory II, 4th ed., Springer-Verlag, New York.

Meketon, M. S., and Schmeiser, B. W. (1984) Overlapping batch means: Something for nothing?, in Proceedings of the 1984 Winter Simulation Conference, Sheppard, S., Pooch, U., and Pegden, D. (eds.), Institute of Electrical and Electronics Engineers, Piscataway, NJ, pp. 227–230.

Schruben, L. (1983) Confidence interval estimation using standardized time series. Operations Research, 31, 1090–1108.

Shorack, G. R., and Wellner, J. A. (1986) Empirical Processes with Applications to Statistics, Wiley, New York.

Song, W. T., and Schmeiser, B. W. (1995) Optimal mean-squared-error batch sizes. Management Science, 41, 110–123.

Steiger, N. M., and Wilson, J. R. (2001) Convergence properties of the batch-means method for simulation output analysis. INFORMS Journal on Computing, 13, 277–293.
Appendix

Proof of Theorem 4.1

Since N_k(f) is the integral of a continuous function over the closed interval [0, 1], its Riemann sum satisfies (see p. 229 of Bartle, 1976)
$$N_{k,m}(f) \equiv \frac{1}{m}\sum_{i=1}^{m} f\big(\tfrac{i}{m}\big)\,B_k\big(\tfrac{i}{m}\big) \xrightarrow[m\to\infty]{\mathrm{a.s.}} N_k(f), \qquad (A1)$$
where "a.s." denotes almost-sure convergence as m → ∞. For fixed k and m, N_{k,m}(f) is normal since it is a finite linear combination of jointly normal random variables. Furthermore, N_{k,m}(f) has expected value 0 and variance
$$\mathrm{Var}[N_{k,m}(f)] = \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m} f\big(\tfrac{i}{m}\big) f\big(\tfrac{j}{m}\big)\,\mathrm{Cov}\big[B_k\big(\tfrac{i}{m}\big), B_k\big(\tfrac{j}{m}\big)\big] = \int_0^1\!\!\int_0^1 f(s)f(t)\,[\min\{s,t\} - st]\,ds\,dt + O(1/m) = 1 + O(1/m).$$
Eq. (A1) implies that the characteristic function of N_{k,m}(f) converges to the characteristic function of N_k(f) as m → ∞ (see p. 172 of Grimmett and Stirzaker, 1992). Since N_{k,m}(f) is normal with mean 0 and variance 1 + O(1/m), its characteristic function is given by
$$\varphi_m(t) = E\big[\exp\big(\sqrt{-1}\,t N_{k,m}(f)\big)\big] = \exp\big\{-\tfrac{t^2}{2}\,[1 + O(1/m)]\big\}.$$
It follows immediately that $\lim_{m\to\infty} \varphi_m(t) = \exp(-t^2/2)$, the characteristic function of the standard normal distribution; and thus N_k(f) ∼ Nor(0, 1). □

Proof of Theorem 4.2

By Lemma 4 and Eq. (3),
$$\mathrm{Cov}[N_\ell(f_1), N_{\ell+k}(f_2)] = \mathrm{Cov}[N_0(f_1), N_k(f_2)] = \int_0^1\!\!\int_0^1 f_1(s)f_2(t)\,\mathrm{Cov}[B_0(s), B_k(t)]\,ds\,dt$$
$$= \sum_{i=1}^{2^{k-1}} \int_0^1\!\!\int_0^1 f_1(s)f_2(t)\,\mathrm{Cov}\!\left[B(s),\,B\!\left(\frac{i-1}{2^{k-1}} + \frac{t}{2^k}\right) - B\!\left(\frac{i}{2^{k-1}} - \frac{t}{2^k}\right)\right] ds\,dt$$
$$= \sum_{i=1}^{2^{k-1}} \int_0^1\!\!\int_0^1 f_1(s)f_2(t)\left[\min\{s, u_i\} - \min\{s, v_i\} + \frac{s(1-t)}{2^{k-1}}\right] ds\,dt,$$
where $u_i \equiv \frac{i-1}{2^{k-1}} + \frac{t}{2^k}$, $v_i \equiv \frac{i}{2^{k-1}} - \frac{t}{2^k}$, and the last equality uses Cov(B(s), B(u)) = min{s, u} − su. Since $\int_a^b s f_1(s)\,ds = bF_1(b) - aF_1(a) - \bar{F}_1(b) + \bar{F}_1(a)$, we have $\int_0^1 f_1(s)\min\{s, u\}\,ds = uF_1 - \bar{F}_1(u)$ for u ∈ [0, 1] and $\int_0^1 s f_1(s)\,ds = F_1 - \bar{F}_1$. Carrying out the integration over s therefore yields
$$\mathrm{Cov}[N_0(f_1), N_k(f_2)] = \sum_{i=1}^{2^{k-1}} \int_0^1 f_2(t)\left[\bar{F}_1(v_i) - \bar{F}_1(u_i) - \frac{(1-t)F_1}{2^{k-1}}\right] dt + (F_1 - \bar{F}_1)\bar{F}_2;$$
and since $\sum_{i=1}^{2^{k-1}} (1-t)/2^{k-1} = 1-t$ and $\int_0^1 f_2(t)(1-t)\,dt = \bar{F}_2$ (see the proof of Lemma 5), the last term and the $(1-t)F_1$ terms combine to give $-\bar{F}_1\bar{F}_2$, from which the result (5) follows. □

Proof of Lemma 7

First, we show that every linear combination $\sum_{j=0}^{k} a_j N_j(f)$ has a normal distribution; hence N(f) has a multivariate normal distribution by virtue of Theorem 2.6.2 of Anderson (1984). Indeed,
$$\sum_{j=0}^{k} a_j N_j(f) = \int_0^1 \left[\sum_{j=0}^{k} a_j f(t) B_j(t)\right] dt.$$
Further, by Definition 2 and Eq. (4), applied to each B_j(t),
$$Z(t) \equiv \sum_{j=0}^{k} a_j f(t) B_j(t) = a_0 f(t)\big(W(t) - tW(1)\big) + \sum_{j=1}^{k} a_j \sum_{i=1}^{2^{j-1}} f(t)\left[W\!\left(\frac{i-1}{2^{j-1}} + \frac{t}{2^j}\right) - W\!\left(\frac{i}{2^{j-1}} - \frac{t}{2^j}\right)\right] + \left[\sum_{j=1}^{k} a_j\right](1-t) f(t) W(1).$$
Now, let c₁, c₂, . . . , c_m be real constants and 0 ≤ t₁ < · · · < t_m ≤ 1. Then
$$\sum_{\ell=1}^{m} c_\ell Z(t_\ell) = \sum_{\ell=1}^{m} c_\ell a_0 f(t_\ell)\big(W(t_\ell) - t_\ell W(1)\big) + \sum_{\ell=1}^{m} c_\ell \sum_{j=1}^{k} a_j \sum_{i=1}^{2^{j-1}} f(t_\ell)\left[W\!\left(\frac{i-1}{2^{j-1}} + \frac{t_\ell}{2^j}\right) - W\!\left(\frac{i}{2^{j-1}} - \frac{t_\ell}{2^j}\right)\right] + \sum_{\ell=1}^{m} c_\ell \left[\sum_{j=1}^{k} a_j\right](1 - t_\ell) f(t_\ell) W(1).$$
Let T be the set of all times t_ℓ together with all times of the form $\frac{i-1}{2^{j-1}} + \frac{t_\ell}{2^j}$ or $\frac{i}{2^{j-1}} - \frac{t_\ell}{2^j}$, for some ℓ = 1, . . . , m, j = 1, . . . , k, and i = 1, . . . , 2^{j−1}; and let {τ₁, . . . , τ_L} be an increasing ordering of T ∪ {1}. Clearly, we can write $\sum_{\ell=1}^{m} c_\ell Z(t_\ell)$ as $\sum_{\ell=1}^{L} d_\ell W(\tau_\ell)$ for some real constants d₁, . . . , d_L, since the function f(·) is deterministic. Since W is a Gaussian process, the latter summation is Gaussian; and thus Z is a Gaussian process. Notice also that Z has continuous paths because W has continuous paths. Finally, recall that $\sum_{j=0}^{k} a_j N_j(f) = \int_0^1 Z(t)\,dt$; and the same methodology used in the proof of Theorem 4.1 can be used to show that the latter integral is a normal random variable.

To prove that N(f) = [N_0(f), . . . , N_k(f)] has a nonsingular multivariate normal distribution, we show that the variance-covariance matrix Σ_{N(f)} is positive definite. This follows immediately from Lemma 6, since
$$a^{\mathsf{T}} \Sigma_{N(f)}\,a = \mathrm{Var}\!\left[\sum_{j=0}^{k} a_j N_j(f)\right] = \sum_{j=0}^{k} a_j^2 > 0$$
for all a = (a₀, . . . , a_k) ∈ R^{k+1} − {0}. □

Proof of Theorem 5.2

In terms of the definition (1) and Definitions 8–11, we see that Θ^k_n(X_n) = A_k(f; n) for k = 0, 1, . . . ; and we seek to apply the generalized CMT—that is, Theorem 5.5 of Billingsley (1968)—to prove that the (k + 1) × 1 random vector with jth element Θ^j_n(X_n), j = 0, 1, . . . , k, converges in distribution to the (k + 1) × 1 random vector with jth element Θ^j(W). To verify the hypotheses of the generalized CMT, we establish the following result. In terms of the set of discontinuities
$$D_j \equiv \big\{x \in D[0,1] : \text{for some sequence } \{x_n\} \subset D[0,1] \text{ converging to } x, \text{ the sequence } \{\Theta^j_n(x_n)\} \text{ does not converge to } \Theta^j(x)\big\}$$
for j = 0, . . . , k, we will show that
$$\Pr\Big\{W(\cdot) \in D[0,1] - \bigcup_{j=0}^{k} D_j\Big\} = 1. \qquad (A2)$$

To prove (A2), we will exploit the almost-sure continuity of sample paths of W(·):
With probability 1, the function W(t) is continuous at every t ≥ 0; (A3)
see §41.3.A of Loève (1978) or p. 64 of Billingsley (1968). Thus we may assume without loss of generality that we are restricting our attention to an event H ⊂ D[0, 1] for which (A3) holds, so that
$$\Pr\{W \in H\} = 1. \qquad (A4)$$

Suppose {x_n} ⊂ D[0, 1] converges to W ∈ H and that j ∈ {0, 1, . . . , k} is a fixed integer. Next we seek to prove the key intermediate result: for each W ∈ H and for each sequence {x_n} ⊂ D[0, 1] converging to W, we have
$$\lim_{n\to\infty} \big|\Psi^j \circ \Omega^n_{x_n}(t) - \Psi^j \circ \Omega^n_W(t)\big| = 0 \quad \text{uniformly for } t \in [0, 1]. \qquad (A5)$$

We prove (A5) by induction on j, starting with j = 0. Choose ε > 0 arbitrarily. Throughout the following discussion, W ∈ H and {x_n} are fixed; and thus virtually all the quantities introduced in the rest of the proof depend on W and {x_n}. The sample-path continuity property (A3) and Theorem 4.47 of Apostol (1974) imply that W(t) is uniformly continuous on [0, 1]; and thus we can find ζ > 0 such that
for all t, t′ ∈ [0, 1] with |t − t′| < ζ, we have |W(t) − W(t′)| < ε/4. (A6)
Because {x_n} converges to W in D[0, 1], there is a sufficiently large integer N such that for each n ≥ N, there exists λ_n(·) ∈ Λ satisfying
$$\sup_{t\in[0,1]} |\lambda_n(t) - t| < \min\{\zeta, \varepsilon/4\} \qquad (A7)$$
and
$$\sup_{t\in[0,1]} |x_n(t) - W[\lambda_n(t)]| < \min\{\zeta, \varepsilon/4\}. \qquad (A8)$$
When j = 0, the map Ψ^j is the identity; and in this case for each n ≥ N we have
$$\big|\Psi^j \circ \Omega^n_{x_n}(t) - \Psi^j \circ \Omega^n_W(t)\big| = \big|\Omega^n_{x_n}(t) - \Omega^n_W(t)\big| = \left|\frac{\lfloor nt\rfloor x_n(1)}{n} - x_n(t) - \frac{\lfloor nt\rfloor W(1)}{n} + W(t)\right|$$
$$\leq \frac{\lfloor nt\rfloor}{n}\,|x_n(1) - W(1)| + |x_n(t) - W(t)| \qquad (A9)$$
$$\leq |x_n(1) - W[\lambda_n(1)]| + |W[\lambda_n(1)] - W(1)| + |x_n(t) - W[\lambda_n(t)]| + |W[\lambda_n(t)] - W(t)| \qquad (A10)$$
$$\leq \varepsilon/4 + \varepsilon/4 + \varepsilon/4 + \varepsilon/4 = \varepsilon \quad \text{for } t \in [0, 1], \qquad (A11)$$
where (A9) and (A10) follow from the triangle inequality and (A11) follows from (A6), (A7), and (A8). This establishes (A5) for j = 0.

Now suppose that (A5) holds for some j ≥ 0. Again we choose ε > 0 arbitrarily. The induction hypothesis ensures that there exists N′ sufficiently large such that for each n ≥ N′, we have
$$\big|\Psi^j \circ \Omega^n_{x_n}(t) - \Psi^j \circ \Omega^n_W(t)\big| < \varepsilon/2 \quad \text{for all } t \in [0, 1]. \qquad (A12)$$
We have
$$\big|\Psi^{j+1} \circ \Omega^n_{x_n}(t) - \Psi^{j+1} \circ \Omega^n_W(t)\big| = \Big|\big[\Psi^j \circ \Omega^n_{x_n}\big(\tfrac{t}{2}\big) - \Psi^j \circ \Omega^n_{x_n}\big(1 - \tfrac{t}{2}\big)\big] - \big[\Psi^j \circ \Omega^n_W\big(\tfrac{t}{2}\big) - \Psi^j \circ \Omega^n_W\big(1 - \tfrac{t}{2}\big)\big]\Big| \qquad (A13)$$
$$\leq \big|\Psi^j \circ \Omega^n_{x_n}\big(\tfrac{t}{2}\big) - \Psi^j \circ \Omega^n_W\big(\tfrac{t}{2}\big)\big| + \big|\Psi^j \circ \Omega^n_{x_n}\big(1 - \tfrac{t}{2}\big) - \Psi^j \circ \Omega^n_W\big(1 - \tfrac{t}{2}\big)\big| \qquad (A14)$$
$$< \varepsilon/2 + \varepsilon/2 = \varepsilon \quad \text{for } t \in [0, 1] \text{ and } n \geq N', \qquad (A15)$$
where: (A13) follows from Definition 3 of the folding map; (A14) follows from the triangle inequality; and (A15) follows from (A12). This establishes (A5) for j = 0, 1, . . . .

Since f(·) is continuous on [0, 1] by Assumption A.4, we have
$$f^* \equiv \max_{t\in[0,1]} |f(t)| < \infty; \qquad (A16)$$
thus (A4), (A5), and (A16) imply that
$$\lim_{n\to\infty} \big|\Theta^j_n(x_n) - \Theta^j_n(W)\big| = 0 \quad \text{with probability 1,} \qquad (A17)$$
since the difference between the linear functionals being squared in Θ^j_n(x_n) and Θ^j_n(W) is bounded by $\sigma f^* \cdot \frac{1}{n}\sum_{i=1}^{n} \big|\Psi^j \circ \Omega^n_{x_n}\big(\tfrac{i}{n}\big) - \Psi^j \circ \Omega^n_W\big(\tfrac{i}{n}\big)\big| \to 0$, and the squaring map is continuous.

An argument similar to that justifying (A5) proves that, with probability 1, $\lim_{n\to\infty} \big|\Psi^j \circ \Omega^n_W(t) - \Psi^j \circ \Omega_W(t)\big| = 0$ for all t ∈ [0, 1]. In view of the almost-sure continuity of sample paths of W(·) and the continuity of f(·), it is straightforward to show that, with probability 1, the function $f(t)\Psi^j \circ \Omega_W(t)$ is continuous at every t ∈ [0, 1]; and thus it follows that $f(t)\Psi^j \circ \Omega_W(t)$ is Riemann integrable with probability 1 and that
$$\lim_{n\to\infty} \big|\Theta^j_n(W) - \Theta^j(W)\big| = 0 \quad \text{with probability 1,} \qquad (A18)$$
because the Riemann sums $\frac{1}{n}\sum_{i=1}^{n} f\big(\tfrac{i}{n}\big)\sigma\Psi^j \circ \Omega_W\big(\tfrac{i}{n}\big)$ converge to $\int_0^1 f(t)\,\sigma\Psi^j \circ \Omega_W(t)\,dt$.

Combining (A17) and (A18) and applying the triangle inequality, we see that for each j ∈ {0, 1, . . . , k},
$$\lim_{n\to\infty} \big|\Theta^j_n(x_n) - \Theta^j(W)\big| \leq \lim_{n\to\infty} \big|\Theta^j_n(x_n) - \Theta^j_n(W)\big| + \lim_{n\to\infty} \big|\Theta^j_n(W) - \Theta^j(W)\big| = 0 \quad \text{with probability 1.}$$
It follows that the corresponding vector-valued sequence $\{[\Theta^0_n(x_n), \ldots, \Theta^k_n(x_n)] : n = 1, 2, \ldots\}$ converges to $[\Theta^0(W), \ldots, \Theta^k(W)]$ in R^{k+1} with probability one; and thus the desired result (6) follows directly from the generalized CMT. □