J. Japan Statist. Soc. Vol. 47 No. 2 2017 237–271

PURELY SEQUENTIAL AND TWO-STAGE BOUNDED-LENGTH CONFIDENCE INTERVAL ESTIMATION PROBLEMS IN FISHER'S "NILE" EXAMPLE

Nitis Mukhopadhyay* and Yan Zhuang*

Fisher's "Nile" example is a classic which involves a bivariate random variable (X, Y) having a joint probability density function given by f(x, y; θ) = exp(−θx − θ^{-1}y), 0 < x, y < ∞, where θ > 0 is a single unknown parameter. We develop bounded-length confidence interval estimations for P_θ(X > a) with a preassigned confidence coefficient using both purely sequential and two-stage methodologies. We show: (i) both methodologies enjoy asymptotic first-order efficiency and asymptotic consistency properties; (ii) both methodologies enjoy second-order efficiency properties. After presenting substantial theoretical investigations, we have also implemented extensive sets of computer simulations to empirically validate the theoretical properties.

Key words and phrases: Asymptotic consistency, bounded-length, confidence intervals, first-order asymptotic efficiency, maximum likelihood estimator, "Nile" example, purely sequential sampling, second-order asymptotic efficiency, two-stage sampling.

1. Introduction
Fisher (1973) revisited an interesting bivariate distribution from his previous discourses (Fisher (1934, 1956)) on the classical "Nile" example. That joint distribution was given by the following probability density function (p.d.f.):

(1.1)  f(x, y; θ) = exp{−θx − θ^{-1}y} if x > 0, y > 0, and 0 otherwise,

where θ(> 0) is an unknown parameter. What is special about this distribution may be briefly summarized as follows: the maximum likelihood estimator (MLE) for θ, namely θ̂_MLE, is not sufficient for θ, but there exists an ancillary complement U. In other words, while U is an ancillary statistic, the statistic (θ̂_MLE, U) is jointly sufficient for θ.

The parameter θ may indicate the depth of a river (for example, the Nile) at some location along its path where the river is prone to flooding during a heavy rainy season. In order to estimate θ, one may record two measurements at-a-time: X may be the speed at which water flows whereas Y may be the volume of water that flows by. If θ is large (small), one may expect X to be small (large),

Received May 31, 2017. Revised August 8, 2017. Accepted August 29, 2017.
*Department of Statistics, University of Connecticut, Austin Building U-4120, 215 Glenbrook Road, Storrs, CT 06269-4120, U.S.A. Email: [email protected]; [email protected]


but Y would be accordingly large (small). Thus, estimation of θ is directly linked to the chances of potential flooding of the banks of the river. This distribution (1.1) led Fisher to propose generally that inferences based on some non-sufficient MLE θ̂_MLE ought to be conditioned on the observed value of an ancillary complement U in order to recoup or make up for the information lost due to one's use of θ̂_MLE. This path-breaking direction from Fisher (1934, 1956, 1973) led him to create the foundation of conditional inference on a solid footing within statistical science.

1.1. Brief literature review
Over the years, many researchers have returned to Fisher's example (1.1). One may refer to Basu (1964), Rao (1973), Cox and Hinkley (1974), Lehmann and Casella (1998), Ghosh et al. (2010), Kagan and Malinovsky (2013, 2016), Mukhopadhyay (2000, 2014), Mukhopadhyay and Zhuang (2016, 2017) and other sources. Mukhopadhyay and Zhuang (2017) constructed a number of Nile-like illustrations where the MLE of θ was non-sufficient, had less than full information, but its ancillary complement helped in recovering the full information. Joshi and Shah (1999) handled the unbiased point estimation problem for the parametric function τ(θ) ≡ P_θ(Y < X) = (1 + θ²)^{-1} based on sufficient statistics of θ. They derived a number of crucial expressions for the moments of the MLE for τ(θ), including the bias and mean squared error of the MLE. Mukhopadhyay and Banerjee (2014, 2015a) introduced fixed-accuracy confidence interval estimations for the mean parameter in a negative binomial distribution. Banerjee and Mukhopadhyay (2016) developed a general structure for fixed-accuracy confidence interval estimation methodologies for a positive parameter of an arbitrary distribution, which would enjoy asymptotic consistency and asymptotic first-order efficiency properties.
They handled both implementation and validity of their proposed methodologies precisely, with the help of illustrations involving odds-ratio estimation in a Bernoulli(θ) distribution and mean estimation in the case of Poisson(θ) and Normal(θ, θ) distributions. Mukhopadhyay and Banerjee (2015b) proposed two-stage and purely sequential bounded-length confidence intervals for θ in a Bernoulli(θ) distribution, where 0 < θ < 1 is an unknown parameter.

More specifically, Mukhopadhyay and Zhuang (2016) developed both fixed-width and fixed-accuracy confidence intervals for θ in Fisher's example (1.1). Their fixed-accuracy confidence interval estimation methodology for θ turned out to have some major advantages over the fixed-width confidence interval estimation methodology for θ. An appropriate fixed-sample-size estimation strategy was developed with both exact and approximate properties which could be guaranteed to produce fixed-accuracy confidence intervals for θ. Also, a bounded-accuracy confidence interval for P_θ(Y < X) associated with (1.1), and the requisite fixed-sample-size methodology, were introduced.

In this paper, we aim at estimating the parametric function P_θ(X > a), a > 0, with the help of bounded-length confidence intervals. Here, the parametric function P_θ(X > a) lies between 0 and 1, and so Banerjee and Mukhopadhyay's


(2016) general methodology does not immediately apply. Our goal is to build appropriately upon Banerjee and Mukhopadhyay's (2016) general methodology in order to make it fit the requirements that we must face.

The parametric function P_θ(X > a) may be directly linked to calibrating the chances of flooding of the banks of the (Nile) river. Its accurate estimation may lead to a good warning system for thousands of people whose livelihood relies upon the status of the river and its many estuaries. Early warning of possible severe flooding will save thousands of lives, hopefully allowing families time to move to higher ground and stay out of harm's way.

Now, if one happened to propose sampling strategies utilizing the X-data (or Y-data) alone, many technical details arising from the associated estimation problems might become rather simple in nature, but we do not advocate such a route at all. We emphasize that our data will consist of paired observations on (X, Y), and utilizing the X-data or Y-data alone would amount to a significant loss of information about θ. This requires us to develop new and innovative ideas as well as interesting and productive new formulations, theoretical challenges, and methodologies. Next, we provide a layout of our presentation.

1.2. Layout of the paper
We begin Section 2 by describing the foundation, formulation and motivation. First, we may transform p_θ from (2.1) to another suitable parametric function q_θ defined on the space (0, ∞), where q_θ is a one-to-one function of p_θ. Then, in the spirit of Mukhopadhyay and Banerjee (2015a, b) and Banerjee and Mukhopadhyay (2016), we proceed to build a fixed-accuracy confidence interval J_n ⊂ (0, ∞) for q_θ as in (2.2) with confidence coefficient at least (or approximately) 1 − α, where 0 < α < 1 is preassigned. This will lead to a bounded-length confidence interval K_n for p_θ as illustrated in (2.6).
Theorem 2.1 shows that the length of the finally proposed confidence interval K_n for p_θ is bounded from above by a preassigned positive number. In Section 3, we supply more details surrounding the bounded-length confidence interval estimation problem for p_θ and obtain an expression for n*_d ≡ n*_d(θ), the required optimal fixed sample size, given by (3.4). Lemma 3.1, which may be of some independent interest, helps us to show the existence of a natural positive lower bound n_{0d} for n*_d(θ) in (3.8), where n_{0d} is known and n_{0d}, n*_d have the same order as d → 1+.

Section 4 develops the bounded-length confidence interval estimation problem for p_θ using a properly designed purely sequential sampling strategy along with nonlinear renewal theoretic representations (Subsection 4.1). We prove that the purely sequential estimation methodology enjoys attractive properties (Theorems 4.1–4.3) including asymptotic first-order efficiency and asymptotic consistency (Theorem 4.1). Even though the boundary condition in our purely sequential stopping time (4.1) admittedly appears complicated, we are able to obtain asymptotic second-order properties (Theorem 4.4). In doing so, we have been mindful in addressing the role of the second-order approximate expression κ(θ)


from (4.19) numerically (Table 1). Subsection 4.4 summarizes truly encouraging findings obtained from computer simulations.

In Section 5, we address the bounded-length confidence interval estimation problem for p_θ under a properly designed two-stage sampling strategy. After developing some requisite preliminaries (Subsection 5.1), we move to verify that the two-stage estimation methodology (5.1)–(5.2) enjoys attractive properties (Theorem 5.1) including asymptotic first-order efficiency and asymptotic consistency. We have also strengthened our assertion considerably by obtaining asymptotic second-order properties (Theorems 5.2–5.3) including asymptotic second-order efficiency (Theorem 5.3). In doing so, we have again been especially mindful in addressing the role of second-order approximate expressions numerically (Subsections 5.3.1–5.3.2). Subsection 5.4 summarizes truly encouraging findings obtained from computer simulations.

Based on the summaries of data analyses, we find that both the proposed purely sequential and two-stage estimation methodologies perform remarkably well across the board. This sentiment is clearly validated under all circumstances, whether the sample size happens to be small, moderate, or large. Section 6 lays down a number of concluding thoughts.

2. Foundation-formulation-motivation
We believe that it is important to estimate p_θ ≡ P_θ(X > a), a > 0, by means of fixed-accuracy confidence intervals. In the same spirit, one may argue for estimating P_θ(Y > b), b > 0, instead of p_θ. For brevity, we focus on estimating p_θ. Obviously,

(2.1)  p_θ ≡ P_θ(X > a) = e^{-aθ},

where a(> 0) is a fixed constant. Obviously, p_θ ∈ (0, 1), and we transform p_θ to q_θ where

(2.2)  q_θ ≡ p_θ/(1 − p_θ) = e^{-aθ}/(1 − e^{-aθ}).

Clearly, the space for q_θ is (0, ∞), as Banerjee and Mukhopadhyay (2016) would have required, and q_θ is also a one-to-one function of p_θ. Thus, we will accordingly tie a fixed-accuracy confidence interval for q_θ with an associated bounded-length confidence interval for p_θ. We will explain more as we move forward.

In the light of Mukhopadhyay and Banerjee (2015a, b), Banerjee and Mukhopadhyay (2016), and Mukhopadhyay and Zhuang (2016), we begin with independent and identically distributed (i.i.d.) random samples (X_i, Y_i), i = 1, 2, ..., n, each having the common p.d.f. (1.1). The MLE of θ is given by:

(2.3)  T_n ≡ θ̂_{n,MLE} = (Σ_{i=1}^n Y_i / Σ_{i=1}^n X_i)^{1/2}.

For fixed n, we note that T_n is a biased but consistent estimator of θ.


Now, according to the invariance property of the MLE, in view of (2.2), the MLE of q_θ can be expressed as follows:

(2.4)  U_n = e^{-aT_n}(1 − e^{-aT_n})^{-1}.

We construct a fixed-accuracy confidence interval J_n for q_θ as follows:

(2.5)  J_n = {q_θ : q_θ ∈ [d^{-1}U_n, dU_n]},

involving the MLE U_n for q_θ from (2.4), where d(> 1) is the preassigned fixed-accuracy measure. Once the fixed-accuracy confidence interval for q_θ is constructed, the associated confidence interval K_n for p_θ can be derived as follows in view of (2.2) and (2.4):

(2.6)  K_n = {p_θ : p_θ ∈ [(d + U_n)^{-1}U_n, (dU_n + 1)^{-1}dU_n]}.

Theorem 2.1. For all fixed sample sizes n and for all fixed but otherwise arbitrary d(> 1), the length of the confidence interval K_n from (2.6) proposed for p_θ defined in (2.1) is bounded from above by the expression (d − 1)/(d + 1) w.p.1, where d stands for the preassigned fixed-accuracy measure associated with J_n from (2.5).

Proof. The length of the confidence interval is given by:

(2.7)  Length(K_n) = (dU_n + 1)^{-1}dU_n − (d + U_n)^{-1}U_n = (d² − 1)U_n / {(dU_n + 1)(d + U_n)}.

Now, letting

m(x) ≡ (d² − 1)x / {(dx + 1)(d + x)},  x > 0,

we can easily check that the function m(x) attains its global maximum at x = 1. Thus, we can immediately claim that Length(K_n) ≤ (d − 1)/(d + 1) w.p.1. □

At this point, it is understood that the upper bound (d − 1)/(d + 1) meant for the length of the proposed confidence interval K_n for p_θ is smaller than 1 and it goes to zero as d ↓ 1. That ought to make good sense because, after all, the confidence interval K_n is proposed for p_θ, which is a number between zero and one. In other words, if we alternatively begin with a preassigned number 0 < δ < 1 and we require that the proposed confidence interval K_n has its length ≤ δ, then we should equate (d − 1)/(d + 1) with δ. We will thereby employ the proposed confidence interval procedure with d = (1 + δ)/(1 − δ).

In closing this section, we should record one other important point: we have deliberately not yet tied the proposed confidence interval K_n for p_θ with a given preassigned confidence level 1 − α, 0 < α < 1. Obviously, an expression for an appropriate minimum requisite fixed sample size n associated with (2.5), and hence equivalently with (2.6), must be determined. This is explored explicitly in Section 3.
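As a quick numerical sketch of (2.2), (2.6), and the Theorem 2.1 bound, the Python fragment below (our own helper names; the values a = 2, θ = 1, δ = 0.2 are purely illustrative and, for simplicity, U_n is taken to sit exactly at q_θ) checks the interval construction and the length bound directly:

```python
import math

def q_from_p(p):          # (2.2): q = p / (1 - p)
    return p / (1.0 - p)

def K_n(U_n, d):          # (2.6): bounded-length interval for p
    return (U_n / (d + U_n), d * U_n / (d * U_n + 1.0))

# Hypothetical illustration: a = 2, theta = 1, so p = exp(-2).
a, theta = 2.0, 1.0
p = math.exp(-a * theta)
U = q_from_p(p)           # pretend the MLE U_n landed exactly on q

# Requiring length <= delta dictates d = (1 + delta) / (1 - delta).
delta = 0.2
d = (1.0 + delta) / (1.0 - delta)

lo, hi = K_n(U, d)
length = hi - lo
bound = (d - 1.0) / (d + 1.0)   # Theorem 2.1 upper bound, equal to delta here

assert lo < p < hi
assert length <= bound + 1e-12
assert abs(bound - delta) < 1e-12
```

Note that the bound (d − 1)/(d + 1) is exactly δ under this choice of d, matching the discussion above.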


Remark 2.1. Generally speaking, suppose that one is interested in a bounded-length confidence interval for an unknown parametric function ψ(ξ) of an unknown parameter ξ, where c_L < ψ(ξ) < c_U with known lower and upper bounds c_L, c_U respectively. Then, clearly, much like p_θ from (2.1), the parametric function (c_U − ψ(ξ))/(c_U − c_L) will belong to the space (0, 1). In other words, we should consider estimating a one-to-one transform of ψ(ξ), namely (c_U − ψ(ξ))/(ψ(ξ) − c_L) ∈ (0, ∞), much in the spirit of q_θ from (2.2). We leave out a full-blown general discourse for brevity.

3. Optimal fixed sample size for a fixed preassigned confidence coefficient
The Fisher information about θ in a single pair of observations (X, Y) is given by:

(3.1)  I_{(X,Y)}(θ) = 2/θ².

One may refer to Fisher (1934) and Mukhopadhyay and Zhuang (2016, 2017). Now, having recorded i.i.d. random samples (X_i, Y_i), i = 1, ..., n, each following the p.d.f. (1.1), recall T_n ≡ (Σ_{i=1}^n Y_i / Σ_{i=1}^n X_i)^{1/2} from (2.3), the MLE for θ. Then, we will have:

(3.2)  n^{1/2}(T_n − θ) →_L N(0, (1/2)θ²) as n → ∞.

With q_θ defined in (2.2), we express: b(θ) ≡ log q_θ = −aθ − log(1 − e^{-aθ}), which implies:

∂b(θ)/∂θ = −a − ae^{-aθ}/(1 − e^{-aθ}) = −a/(1 − e^{-aθ}).

Next, applying the Mann-Wald theorem to (3.2), we can conclude:

(3.3)  n^{1/2}(log U_n − log q_θ) →_L N(0, σ²(θ)) as n → ∞, with σ²(θ) ≡ (1/2)a²θ²(1 − e^{-aθ})^{-2}.

One may refer to Rao (1973, pp. 385–386), Mukhopadhyay (2000, pp. 261–262) or another source. Some readers may prefer to appeal to the delta method (Sen and Singer (1993, pp. 131–132)) in order to claim (3.3) from (3.2). Thus, in order to obtain a fixed-accuracy confidence interval J_n defined via (2.5) for the parametric function q_θ defined in (2.2), we utilize (3.3) to claim that P_θ{q_θ ∈ J_n} is approximately 1 − α for large n. Here, we assume that 0 < α < 1 is fixed and preassigned.
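The asymptotic variance in (3.3) can also be checked by a small simulation. Since the joint density (1.1) factorizes, X and Y may be drawn as independent exponentials with rates θ and 1/θ respectively. The Python sketch below (our own code; n and the replication count are kept modest purely for speed, so only a loose agreement is asserted) compares the empirical variance of n^{1/2}(log U_n − log q_θ) against σ²(θ):

```python
import math, random, statistics

random.seed(7)
a, theta, n, reps = 1.0, 1.0, 400, 1500

q = math.exp(-a * theta) / (1.0 - math.exp(-a * theta))          # (2.2)
sigma2 = 0.5 * a * a * theta * theta / (1.0 - math.exp(-a * theta)) ** 2

vals = []
for _ in range(reps):
    sx = sum(random.expovariate(theta) for _ in range(n))        # X ~ Exp(rate theta)
    sy = sum(random.expovariate(1.0 / theta) for _ in range(n))  # Y ~ Exp(rate 1/theta)
    t = math.sqrt(sy / sx)                                       # T_n from (2.3)
    u = math.exp(-a * t) / (1.0 - math.exp(-a * t))              # U_n from (2.4)
    vals.append(math.sqrt(n) * (math.log(u) - math.log(q)))

assert abs(statistics.mean(vals)) < 0.2
assert 0.8 * sigma2 < statistics.variance(vals) < 1.2 * sigma2
```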


Hence, we can claim that P_θ{q_θ ∈ J_n} ≈ 1 − α for large n when the required minimum fixed sample size n satisfies:

(3.4)  n ≥ n*_d ≡ n*_d(θ) = (1/2)(z_{α/2}/log d)² a²θ²(1 − e^{-aθ})^{-2}.

Here, z_{α/2} is the upper 100(α/2)% point of a standard normal distribution. Obviously, the magnitude of n*_d, the optimal fixed sample size, remains unknown since it involves the unknown parameter θ(> 0).

3.1. Further examination of the optimal fixed sample size from (3.4)
We observe that n*_d from (3.4) can be alternatively expressed as follows:

(3.5)  n*_d = (1/2)(z_{α/2}/log d)² {g(aθ)}²,

where we define the function g(·) as:

(3.6)  g(x) = x(1 − e^{-x})^{-1},  x > 0.

Parts of the following lemma may already be known. It will help in arriving at an appropriate lower bound for n*_d and will make it easier to develop our proposed sampling strategies in the sequel. Hence, we state it and sketch its proof for completeness.

Lemma 3.1. For all fixed x > 0, we have the following result: 1 < x(1 − e^{-x})^{-1} < 1 + x.

Proof. We define h(x) ≡ x − (1 − e^{-x}) for x > 0. Then, the first derivative, h'(x) = 1 − e^{-x}, is positive for all x > 0. That is, h(x) is a monotonically increasing function in x for all x > 0. But, since h(0) = 0, we have h(x) > 0 for all x > 0. Thus, the lower bound holds. Also, for all x > 0, we have: e^x > 1 + x ⟺ 1 − e^{-x} > x(1 + x)^{-1}, which shows the upper bound. □

Using the lower limit from Lemma 3.1 and then by appealing to (3.5), we obtain the following lower bound:

(3.7)  n*_d ≡ n*_d(θ) > (1/2)(z_{α/2}/log d)²,

for all θ > 0. Hence, the pilot sample size will be defined as:

(3.8)  n_{0d} ≡ ⌊(1/2)(z_{α/2}/log d)²⌋ + 1,

where ⌊u⌋ denotes the largest integer smaller than u, u > 0. We may emphasize that n_{0d} does not involve the unknown parameter θ.
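A short Python sketch (our own helper names) evaluates (3.5) and (3.8) and checks the Lemma 3.1 bounds; as a sanity check, the design point α = 0.05, d = 1.5, a = 2, θ = 1 reproduces n*_d ≈ 62.506 and n_{0d} = 12, values which also appear later in Table 2:

```python
import math
from statistics import NormalDist

def g(x):                        # (3.6)
    return x / (1.0 - math.exp(-x))

def n_star(alpha, d, a, theta):  # (3.4)/(3.5): optimal fixed sample size
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 0.5 * (z / math.log(d)) ** 2 * g(a * theta) ** 2

def n_0d(alpha, d):              # (3.8): known pilot size, free of theta
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.floor(0.5 * (z / math.log(d)) ** 2) + 1

# Lemma 3.1 bounds: 1 < g(x) < 1 + x
for x in (0.05, 1.0, 3.0, 10.0):
    assert 1.0 < g(x) < 1.0 + x

# Design point used later in Table 2: alpha = 0.05, d = 1.5, a = 2, theta = 1.
assert abs(n_star(0.05, 1.5, 2.0, 1.0) - 62.506) < 0.01
assert n_0d(0.05, 1.5) == 12
```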


4. A purely sequential estimation methodology
We begin with the pilot set of data (X_i, Y_i), i = 1, ..., n_{0d}, with n_{0d} coming from (3.8). After that, we continue by recording one additional pair of observations (X, Y) at-a-time, as needed by an appropriately defined stopping criterion, until it decides to terminate gathering more data. Our stopping time is defined as follows:

(4.1)  N_d ≡ N = inf{n ≥ n_{0d} : n ≥ (1/2)(z_{α/2}/log d)² {g(aT_n)}²}.

One sees readily that N_d estimates the optimal fixed sample size n*_d defined by (3.5). By referring to Chow and Robbins (1965), we can claim that P_θ{N_d < ∞} = 1 for every fixed θ > 0, d > 1, and a > 0. Upon termination, we will have the fully gathered data {(X_i, Y_i), i = 1, 2, ..., n_{0d}, ..., N_d}, and we obtain the randomly stopped version of the MLE from (2.3) for θ:

(4.2)  T_{N_d} ≡ θ̂_{N_d,MLE} = (Σ_{i=1}^{N_d} Y_i / Σ_{i=1}^{N_d} X_i)^{1/2},

and the randomly stopped version of the MLE from (2.4) for q_θ:

(4.3)  U_{N_d} = e^{-aT_{N_d}}(1 − e^{-aT_{N_d}})^{-1}.

Then, we propose the associated fixed-accuracy confidence interval for q_θ as follows:

(4.4)  J_{N_d} = {q_θ : q_θ ∈ [d^{-1}U_{N_d}, dU_{N_d}]},

and the bounded-length confidence interval for p_θ as follows:

(4.5)  K_{N_d} = {p_θ : p_θ ∈ [(d + U_{N_d})^{-1}U_{N_d}, (dU_{N_d} + 1)^{-1}dU_{N_d}]},

in the spirit of (2.5) and (2.6) respectively. In view of Theorem 2.1, one can easily verify that the length of the confidence interval K_{N_d} for p_θ is bounded above by (d − 1)/(d + 1).

4.1. Nonlinear renewal theoretic representation of the stopping time from (4.1)
Observe that the stopping time N_d from (4.1) can be rewritten as follows:

(4.6)  N_d ≡ N = inf{n ≥ n_{0d} : n{g(aT_n)}^{-2} ≥ (1/2)(z_{α/2}/log d)²}.

Recall that n_{0d} comes from (3.8) and g(·) was defined in (3.6). We refer to the nonlinear renewal theoretic representation, originally developed by Woodroofe (1977, 1982) and Lai and Siegmund (1977, 1979), in the case of


our stopping variable N_d. In the light of Ghosh et al. (1997, Section 2.9), we may define:

(4.7)  n_0 = (1/2)(z_{α/2}/log d)²,  W_i = (X_i, Y_i)',  W̄_n = (X̄_n, Ȳ_n)',

which gives:

(4.8)  E_θ[W_i] = (1/θ, θ)',  V_θ[W_i] = diag(1/θ², θ²) = Σ, say,

and we let Z_n = nh(W̄_n), with

h(x, y) = {1 − exp(−a√(y/x))}² / (a²y/x),  (x, y) ∈ R⁺²,

leading to h(W̄_n) ≡ h(X̄_n, Ȳ_n).

Next, we obtain the first- and second-order partial derivatives of h(x, y):

∂h(x, y)/∂x = a^{-2}y^{-1}(1 − e^{-a√(y/x)})² − a^{-1}x^{-1/2}y^{-1/2}e^{-a√(y/x)}(1 − e^{-a√(y/x)});

∂h(x, y)/∂y = −a^{-2}xy^{-2}(1 − e^{-a√(y/x)})² + a^{-1}x^{1/2}y^{-3/2}e^{-a√(y/x)}(1 − e^{-a√(y/x)});

(4.9)  ∂²h(x, y)/∂x² = −(1/2)a^{-1}x^{-3/2}y^{-1/2}e^{-a√(y/x)} − (1/2)x^{-2}e^{-a√(y/x)} + (1/2)a^{-1}x^{-3/2}y^{-1/2}e^{-2a√(y/x)} + x^{-2}e^{-2a√(y/x)};

∂²h(x, y)/∂y² = 2a^{-2}xy^{-3}(1 − e^{-a√(y/x)})² − a^{-1}x^{1/2}y^{-5/2}e^{-a√(y/x)}(1 − e^{-a√(y/x)}) − (3/2)a^{-1}x^{1/2}y^{-5/2}e^{-a√(y/x)} − (1/2)y^{-2}e^{-a√(y/x)} + (3/2)a^{-1}x^{1/2}y^{-5/2}e^{-2a√(y/x)} + y^{-2}e^{-2a√(y/x)}.

The second-order mixed partial derivative of h(x, y) is given by:

(4.10)  ∂²h(x, y)/∂x∂y = −a^{-2}y^{-2}(1 − e^{-a√(y/x)})² + a^{-1}x^{-1/2}y^{-3/2}e^{-a√(y/x)}(1 − e^{-a√(y/x)}) + (1/2)a^{-1}x^{-1/2}y^{-3/2}e^{-a√(y/x)} + (1/2)x^{-1}y^{-1}e^{-a√(y/x)} − (1/2)a^{-1}x^{-1/2}y^{-3/2}e^{-2a√(y/x)} − x^{-1}y^{-1}e^{-2a√(y/x)}.
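The first-order partial derivatives above can be verified numerically against central finite differences. The following Python sketch (our own code, with an arbitrary evaluation point) does so, and also confirms that ∂h/∂x evaluated at (x, y) = (1/θ, θ) reduces to the quantity h₁ used below:

```python
import math

def h(x, y, a):  # h from (4.8)
    t = math.sqrt(y / x)
    return (1.0 - math.exp(-a * t)) ** 2 / (a * a * y / x)

def dh_dx(x, y, a):  # first partial in (4.9)
    e = math.exp(-a * math.sqrt(y / x))
    return (1.0 - e) ** 2 / (a * a * y) - e * (1.0 - e) / (a * math.sqrt(x * y))

def dh_dy(x, y, a):  # second partial in (4.9)
    e = math.exp(-a * math.sqrt(y / x))
    return -x * (1.0 - e) ** 2 / (a * a * y * y) + math.sqrt(x) * e * (1.0 - e) / (a * y ** 1.5)

a, x0, y0, eps = 1.0, 0.5, 2.0, 1e-6
num_dx = (h(x0 + eps, y0, a) - h(x0 - eps, y0, a)) / (2 * eps)
num_dy = (h(x0, y0 + eps, a) - h(x0, y0 - eps, a)) / (2 * eps)

assert abs(num_dx - dh_dx(x0, y0, a)) < 1e-6
assert abs(num_dy - dh_dy(x0, y0, a)) < 1e-6

# At (x, y) = (1/theta, theta) the x-partial reduces to h1 from (4.11).
theta = 2.0
e = math.exp(-a * theta)
h1 = (1.0 - e) ** 2 / (a * a * theta) - e * (1.0 - e) / a
assert abs(dh_dx(1.0 / theta, theta, a) - h1) < 1e-12
```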


Now, we evaluate these derivatives from (4.9)–(4.10) at (x = 1/θ, y = θ) and obtain the following expressions:

∂h(x, y)/∂x |_{(x=1/θ, y=θ)} = a^{-2}θ^{-1}(1 − e^{-aθ})² − a^{-1}e^{-aθ}(1 − e^{-aθ}) ≡ h₁, say;

∂h(x, y)/∂y |_{(x=1/θ, y=θ)} = −a^{-2}θ^{-3}(1 − e^{-aθ})² + a^{-1}θ^{-2}e^{-aθ}(1 − e^{-aθ}) ≡ h₂, say;

(4.11)  ∂²h(x, y)/∂x² |_{(x=1/θ, y=θ)} = −(1/2)a^{-1}θe^{-aθ} − (1/2)θ²e^{-aθ} + (1/2)a^{-1}θe^{-2aθ} + θ²e^{-2aθ};

∂²h(x, y)/∂y² |_{(x=1/θ, y=θ)} = 2a^{-2}θ^{-4}(1 − e^{-aθ})² − (5/2)a^{-1}θ^{-3}e^{-aθ} + (5/2)a^{-1}θ^{-3}e^{-2aθ} − (1/2)θ^{-2}e^{-aθ} + θ^{-2}e^{-2aθ};

∂²h(x, y)/∂x∂y |_{(x=1/θ, y=θ)} = −a^{-2}θ^{-2}(1 − e^{-aθ})² + a^{-1}θ^{-1}e^{-aθ}(1 − e^{-aθ}) + (1/2)a^{-1}θ^{-1}e^{-aθ} + (1/2)e^{-aθ} − (1/2)a^{-1}θ^{-1}e^{-2aθ} − e^{-2aθ}.

Thus, with h₁, h₂ from (4.11) and Σ from (4.8), we obtain the following expressions:

u ≡ h(1/θ, θ) = a^{-2}θ^{-2}(1 − e^{-aθ})²;

σ² ≡ (h₁, h₂)Σ(h₁, h₂)' = 2a^{-4}θ^{-4}(1 − e^{-aθ})⁴ − 4a^{-3}θ^{-3}e^{-aθ}(1 − e^{-aθ})³ + 2a^{-2}θ^{-2}e^{-2aθ}(1 − e^{-aθ})²;

(4.12)  S_n ≡ Σ_{i=1}^n (u + h₁X_i + h₂Y_i) = nu + (1/2)h₁θ^{-1}H_n + (1/2)θh₂L_n, where H_n (≡ 2θΣ_{i=1}^n X_i) and L_n (≡ 2θ^{-1}Σ_{i=1}^n Y_i) are independently distributed as χ²_{2n}; and

ξ_n ≡ Z_n − S_n = nh(W̄_n) − S_n.

We also have the following expression for the trace of a required matrix:

(4.13)  tr( Σ [∂²h/∂x², ∂²h/∂x∂y; ∂²h/∂y∂x, ∂²h/∂y²] ) |_{(x=1/θ, y=θ)} = −3a^{-1}θ^{-1}e^{-aθ} − e^{-aθ} + 3a^{-1}θ^{-1}e^{-2aθ} + 2e^{-2aθ} + 2a^{-2}θ^{-2}(1 − e^{-aθ})² ≡ tr_{a,θ}, say.

From (4.1), recall that N_d ≥ n_{0d} w.p.1 for fixed α and d. We may pick ε = 1 − {g(aθ)}^{-2} so that we have 0 < ε < 1, and then clearly (1 − ε)n*_d = n₀. Thus, the following property holds:

(4.14)  P_θ{N_d ≤ (1 − ε)n*_d} = 0.
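The closed-form quantities u, σ², and tr_{a,θ} from (4.12)–(4.13) are straightforward to evaluate. The Python sketch below (our own code) spot-checks them against entries reported later in Table 1, and notes that each depends on (a, θ) only through the product aθ:

```python
import math

def terms(a, theta):
    e = math.exp(-a * theta)
    u = (1.0 - e) ** 2 / (a * theta) ** 2                       # u in (4.12)
    s2 = (2.0 * (1.0 - e) ** 4 / (a * theta) ** 4
          - 4.0 * e * (1.0 - e) ** 3 / (a * theta) ** 3
          + 2.0 * e * e * (1.0 - e) ** 2 / (a * theta) ** 2)    # sigma^2 in (4.12)
    tr = (2.0 * (1.0 - e) ** 2 / (a * theta) ** 2
          - 3.0 * e / (a * theta) - e
          + 3.0 * e * e / (a * theta) + 2.0 * e * e)            # tr_{a,theta} in (4.13)
    return u, s2, tr

# Spot-checks against (a, theta) entries reported in Table 1.
u, s2, tr = terms(1.0, 0.1)
assert abs(u - 0.906) < 0.001 and abs(s2 - 0.004) < 0.001 and abs(tr - (-0.039)) < 0.001

u, s2, tr = terms(1.0, 2.0)
assert abs(u - 0.187) < 0.001 and abs(s2 - 0.033) < 0.001 and abs(tr - 0.100) < 0.001

# Dependence on (a, theta) only through a*theta: (a=2, theta=1)
# reproduces the (a=1, theta=2) column of Table 1 exactly.
assert terms(2.0, 1.0) == terms(1.0, 2.0)
```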


We see immediately that our stopping rule N_d from (4.1) has the same representation required in Theorem 2.9.6 in Ghosh et al. (1997, p. 64), with the verifiable conditions (A.1)–(A.7) from Ghosh et al. (1997, p. 63) holding. See also Woodroofe (1982, pp. 47–48) and Siegmund (1985, pp. 194–195).

4.2. First-order asymptotics
We first set out to introduce a number of desirable and interesting properties associated with our proposed bounded-length purely sequential confidence interval estimation strategy (N_d, K_{N_d}) defined via (4.1) and (4.5) for the parametric function p_θ from (2.1).

Theorem 4.1. For the purely sequential sampling strategy (N_d, K_{N_d}) defined via (4.1) and (4.5) for the parametric function p_θ from (2.1), with 0 < α < 1 and θ > 0 fixed but otherwise arbitrary, we have the following asymptotic results as d → 1+:
(i) N_d/n*_d → 1 w.p.1 (P_θ);
(ii) E_θ[N_d/n*_d] → 1 [Asymptotic First-Order Efficiency];
(iii) P_θ{p_θ ∈ K_{N_d}} → 1 − α [Asymptotic Consistency];
where n*_d comes from (3.5) and a(> 0) is known.

Proof. In what follows, we sketch an outline of the proof.
Part (i). For the purely sequential stopping time N_d defined in (4.1), it is obvious that N_d → ∞ w.p.1 (P_θ) and both T_{N_d}, T_{N_d−1} converge to θ w.p.1 (P_θ) as d → 1+. Then, using (4.1), we can claim the following basic inequality w.p.1 (P_θ):

(4.15)  (1/2)(z_{α/2}/log d)² {g(aT_{N_d})}² ≤ N_d ≤ 1 + (1/2)(z_{α/2}/log d)² {g(aT_{N_d−1})}²,

since N_d > n_{0d} w.p.1 (P_θ). Then, dividing throughout (4.15) by n*_d, we can claim that N_d/n*_d → 1 w.p.1 (P_θ) as d → 1+.
Part (ii). Given the discussions in Subsection 4.1, this claim follows from Theorem 2.9.3 in Ghosh et al. (1997).
Part (iii). By combining Anscombe's (1952) random central limit theorem (CLT) for the MLE with Slutsky's theorem, we conclude:

(4.16)  n*_d^{1/2}(T_{N_d} − θ) →_L N(0, (1/2)θ²) as d → 1+,

where n*_d is defined in (3.5). Then, applying the Mann-Wald theorem to (4.16), we can alternatively express (4.16) as:

(4.17)  n*_d^{1/2}(log U_{N_d} − log q_θ) →_L N(0, (1/2)a²θ²(1 − e^{-aθ})^{-2}) as d → 1+,

where U_{N_d} is the randomly stopped version of the MLE of q_θ obtained from {(X_i, Y_i), i = 1, 2, ..., N_d}. Recall (3.3). See also Gut (2012) and Mukhopadhyay and Chattopadhyay (2012).


Now, in view of (4.17), let us note that

(4.18)  W_{N_d} ≡ (z_{α/2}/log d)(log U_{N_d} − log q_θ) →_L N(0, 1) as d → 1+.

Then, in view of (4.18), the following will hold:

P_θ{q_θ ∈ J_{N_d}} = P_θ{|log U_{N_d} − log q_θ| < log d} = P_θ{|W_{N_d}| < z_{α/2}},

which converges to 1 − α as d → 1+. Now, the proof is complete. □

The following result shows convergence of the negative moments of N_d.

Theorem 4.2. For the purely sequential sampling strategy (N_d, K_{N_d}) defined via (4.1) and (4.5) for the parametric function p_θ from (2.1), with 0 < α < 1 and θ > 0 fixed but otherwise arbitrary, we have the following asymptotic result as d → 1+:

E_θ[(n*_d/N_d)^ω] → 1 for all fixed ω > 0,

where n*_d comes from (3.5).

Proof. In what follows, we sketch an outline of the proof. We can write w.p.1 (P_θ):

0 < (n*_d/N_d)^ω ≤ (n*_d/n_{0d})^ω = {g(aθ)}^{2ω},

which shows that (n*_d/N_d)^ω remains bounded for all d > 1. Hence, (n*_d/N_d)^ω is uniformly integrable. Next, for the purely sequential stopping variable N_d, we can obviously claim that (n*_d/N_d)^ω → 1 w.p.1 (P_θ) for all fixed ω(> 0) as d → 1+. Hence, the result follows for all fixed ω > 0. □

The following result shows moment convergence for the standardized sample means, X̄_{N_d} and Ȳ_{N_d}. This result may be of independent interest.

Theorem 4.3. For the purely sequential sampling strategy (N_d, K_{N_d}) defined via (4.1) and (4.5) for the parametric function p_θ from (2.1), with 0 < α < 1 and θ > 0 fixed but otherwise arbitrary, we have the uniform integrability of |n*_d^{1/2}(X̄_{N_d} − θ^{-1})|^ω and |n*_d^{1/2}(Ȳ_{N_d} − θ)|^ω so that we have, as d → 1+:

E_θ[|n*_d^{1/2}(X̄_{N_d} − θ^{-1})|^ω] and E_θ[|n*_d^{1/2}(Ȳ_{N_d} − θ)|^ω] are both O(1),

for all fixed ω > 0, where n*_d comes from (3.5).

Proof. From Chow et al. (1979), it is known that |n*_d^{1/2}(X̄_{N_d} − θ^{-1})|^ω is uniformly integrable for all fixed ω > 0 if (n*_d/N_d)^ω is uniformly integrable for all fixed ω > 0. But, our Theorem 4.1 part (i) combined with Theorem 4.2 shows that (n*_d/N_d)^ω is uniformly integrable for all fixed ω > 0. Thus, |n*_d^{1/2}(X̄_{N_d} − θ^{-1})|^ω is uniformly integrable for all fixed ω > 0.


Let Z be a random variable that is distributed as N(0, θ^{-2}). Now, from Anscombe's (1952) random CLT, we know that

n*_d^{1/2}(X̄_{N_d} − θ^{-1}) →_L N(0, θ^{-2}) as d → 1+.

One may also refer to Gut (2012) and Mukhopadhyay and Chattopadhyay (2012). Hence, we have:

lim_{d→1+} E_θ[|n*_d^{1/2}(X̄_{N_d} − θ^{-1})|^ω] = E_θ[|Z|^ω] = O(1),

for all fixed ω > 0. The other result follows similarly. □

4.3. Second-order asymptotics
We recall u, σ², and S_n from (4.12) as well as tr_{a,θ} from (4.13) in order to define two new entities as follows:

(4.19)  ρ ≡ ρ(θ) = (u² + σ²){2u − Σ_{n=1}^∞ n^{-1}E_θ[S_n^-]}^{-1};  κ ≡ κ(θ) = u^{-1}{ρ(θ) − (1/2)tr_{a,θ}};

where S_n^- = min{0, S_n}. Now, we are in a position to significantly strengthen our previous conclusion in part (ii), Theorem 4.1. The next result shows asymptotic second-order efficiency of our proposed purely sequential estimation methodology (4.1) in the sense of Ghosh and Mukhopadhyay (1981).

Theorem 4.4. For the purely sequential sampling strategy (N_d, K_{N_d}) defined via (4.1) and (4.5) for the parametric function p_θ from (2.1), with 0 < α < 1 and θ > 0 fixed but otherwise arbitrary, we have the following asymptotic result under the true θ:

(4.20)  lim_{d→1+} E_θ[N_d − n*_d] = κ(θ),

where n*_d and κ(θ) come from (3.5) and (4.19) respectively.

Proof. Given our discussions in Subsection 4.1, this result follows immediately from Theorem 2.9.6 in Ghosh et al. (1997, p. 64). Thus, with the function h(x, y) defined in (4.8), we claim:

E_θ[N_d] = {h(θ^{-1}, θ)}^{-1}{n₀ + ρ(θ) − (1/2)tr_{a,θ}} + o(1) = n*_d + κ(θ) + o(1).

This completes the proof. □
where n∗d and κ(θ) come from (3.5) and (4.19) respectively. Proof. Given our discussions in Subsection 4.1, this result follows immediately from Theorem 2.9.6 in Ghosh et al. (1997, p. 64). Thus, with the function h(x, y) defined in (4.8), we claim:   1 −1 −1 Eθ [Nd ] = {h(θ , θ)} n0d + ρ(θ) − tra,θ + o(1) 2 ∗ = nd + κ(θ) + o(1). This completes the proof. 


4.3.1. Numerical evaluation of κ(θ) defined in (4.19)
Computing u, σ² from (4.12) and tr_{a,θ} from (4.13) involves straightforward calculations given values of a and θ. But, in order to evaluate κ(θ) from (4.19), we also need to evaluate ρ(θ), requiring a numerical value of Σ_{n=1}^∞ n^{-1}E_θ[S_n^-] with S_n coming from (4.12).

Since exact evaluation of Σ_{n=1}^∞ n^{-1}E_θ[S_n^-] is complicated, we decided to utilize large-scale computer simulations to estimate the expression Σ_{n=1}^∞ n^{-1}E_θ[S_n^-] fairly accurately under a number of fixed combinations of a and θ values. Now, we explain this computer algorithm. First, we fixed n and approximated E_θ[S_n^-] for each fixed n = 1, ..., Q, say. Having fixed n, we generated independent pairs of pseudorandom observations (H_{n,r}, L_{n,r}), r = 1, ..., R, say, on the pair of random variables (H_n, L_n) defined in (4.12), where H_n, L_n are independently distributed as χ²_{2n}. Thus, with n fixed, we calculated R pseudorandom values

(4.21)  S_{n,r} = nu + (1/2)h₁θ^{-1}H_{n,r} + (1/2)θh₂L_{n,r},  r = 1, ..., R,

on the random variable S_n defined via (4.12). Then,

(4.22)  E_θ[S_n^-] was estimated by Ê_θ[S_n^-] ≡ R^{-1}Σ_{r=1}^R min{0, S_{n,r}},

with n fixed, n = 1, ..., Q. Next, we estimated Σ_{n=1}^∞ n^{-1}E_θ[S_n^-] as follows:

(4.23)  Σ_{n=1}^∞ n^{-1}E_θ[S_n^-] was estimated by Σ_{n=1}^Q n^{-1}Ê_θ[S_n^-].

This led us to obtain:

(4.24)  ρ̂(θ) ≡ (u² + σ²){2u − Σ_{n=1}^Q n^{-1}Ê_θ[S_n^-]}^{-1},  κ̂(θ) ≡ u^{-1}{ρ̂(θ) − (1/2)tr_{a,θ}},

in the spirit of (4.19). Table 1 shows the estimated values of ρ̂, κ̂, and other requisite entities corresponding to a number of fixed combinations of a and θ when we took Q = 5,000 and R = 10,000. Some comments are in order:
• R = 10,000 replications are expected to be enough to estimate the true value of E_θ[S_n^-];
• In the case of the given combinations of a and θ values, it may be fairly reasonable to say that it would be nearly impossible to observe negative values of S_n when n is large enough (n > 5,000). That is, we would largely expect S_n^- ≡ min{0, S_n} = 0. Hence, we picked the value Q = 5,000 to estimate Σ_{n=1}^∞ n^{-1}E_θ[S_n^-];
• Estimated standard errors from R = 10,000 replications appeared very small (< 0.001) for each fixed n(= 1, ..., Q) and for each fixed combination of a and θ. Thus, ρ and κ could be estimated very accurately.
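The estimation scheme (4.21)–(4.24) can be sketched in Python. As a hedged illustration, rather than the chi-square form (4.21) we simulate the equivalent mean-u random walk S_n directly from exponential pairs, and we use far smaller Q and R than the paper, so only loose agreement with the κ̂(θ) = 0.642 entry of Table 1 (for a = 1, θ = 2) is asserted:

```python
import math, random

random.seed(3)
a, theta = 1.0, 2.0
e = math.exp(-a * theta)
u = (1.0 - e) ** 2 / (a * theta) ** 2
s2 = (2.0 * (1.0 - e) ** 4 / (a * theta) ** 4
      - 4.0 * e * (1.0 - e) ** 3 / (a * theta) ** 3
      + 2.0 * e * e * (1.0 - e) ** 2 / (a * theta) ** 2)
h1 = (1.0 - e) ** 2 / (a * a * theta) - e * (1.0 - e) / a
h2 = -(1.0 - e) ** 2 / (a * a * theta ** 3) + e * (1.0 - e) / (a * theta ** 2)
tr = 2.0 * u - 3.0 * e / (a * theta) - e + 3.0 * e * e / (a * theta) + 2.0 * e * e

# Estimate sum_{n>=1} n^{-1} E[S_n^-] by simulating the mean-u walk
# S_n = sum (u + h1*X_i + h2*Y_i); negative excursions die out quickly,
# so a short horizon Q suffices for these (a, theta).
Q, R = 60, 8000
total = 0.0
for _ in range(R):
    s = 0.0
    for n in range(1, Q + 1):
        s += u + h1 * random.expovariate(theta) + h2 * random.expovariate(1.0 / theta)
        if s < 0.0:
            total += s / n
sum_hat = total / R                                # ~ -0.025 per Table 1

rho_hat = (u * u + s2) / (2.0 * u - sum_hat)       # (4.24)
kappa_hat = (rho_hat - 0.5 * tr) / u

assert -0.05 < sum_hat < 0.0
assert 0.55 < kappa_hat < 0.75                     # Table 1: kappa_hat = 0.642
```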


Table 1. Estimating κ(θ) from (4.19) together with u, σ², Σ_{n=1}^∞ n^{-1}E_θ[S_n^-] from (4.12) and tr_{a,θ} from (4.13), when R = 10,000, Q = 5,000, and ρ comes from (4.19): a = (1, 2) and θ = (0.1, 1.0, 2.0, 5.0).

a = 1:
terms                                θ = 0.1    θ = 1.0    θ = 2.0    θ = 5.0
u                                     0.906      0.400      0.187      0.039
σ²                                    0.004      0.056      0.033      0.003
Σ_{n=1}^∞ n^{-1}Ê_θ[S_n^-]            0.000     −0.010     −0.025     −0.014
ρ̂                                    0.455      0.267      0.170      0.049
tr_{a,θ}                             −0.039      0.004      0.100      0.068
κ̂                                    0.524      0.663      0.642      0.385

a = 2:
terms                                θ = 0.1    θ = 1.0    θ = 2.0    θ = 5.0
u                                     0.821      0.187      0.060      0.010
σ²                                    0.013      0.033      0.006      0.000
Σ_{n=1}^∞ n^{-1}Ê_θ[S_n^-]            0.000     −0.025     −0.020     −0.004
ρ̂                                    0.418      0.170      0.069      0.004
tr_{a,θ}                             −0.061      0.100      0.089      0.020
κ̂                                    0.546      0.642      0.408     −0.600

• In the context of our Table 1, we additionally estimated ρ and κ when

(4.25)  (a) Q = (1/3)R = 5,000;  (b) Q = R = 10,000;  (c) Q = (2/3)R = 10,000;  (d) Q = (1/2)R = 10,000.

We saw no changes up to 3 decimal places in the estimated values of ρ and κ in comparison with those entries shown in Table 1 in the process of implementing Q, R from (4.25).

4.4. Data illustrations using simulations
In this section, we summarize some interesting features obtained from analyzing simulated data for the purely sequential bounded-length confidence interval estimation methodology (N_d, K_{N_d}) defined via (4.1) and (4.5) for the parametric function p_θ from (2.1). Simulations were carried out under these pre-fixed values: θ = 1, 2, 5, a = 2, and α = 0.10, 0.05, 0.01. Also, we fixed the following choices of values of d:

(4.26)

d = 2.00, 1.65, 1.60, 1.55, 1.50, 1.35, 1.30, 1.20, 1.10, 1.08, 1.05.

The features and performances highlighted here remain nearly the same for many other choices of d and (θ, a, α) values, and so we omit those for brevity. Under each fixed set of values of θ, a, α, and d, we determined n0d from (3.8), the pilot sample size for the purely sequential procedure. Also, we determined n∗d using (3.4), the optimal fixed sample size, but treated n∗d as unknown. We first generated n0d pseudorandom observations {(Xi, Yi), i = 1, ..., n0d}

Table 2. Simulated performances of the purely sequential estimation strategy defined via (4.1) and (4.5) with 10,000 replications: θ = 1, a = 2 along with κ̂(θ) = 0.642 defined via (4.19).

α      d     n0d   n∗d       n         sn     n − n∗d   n/n∗d   Cov b   sb
0.10   2.00    3   15.064    15.717    0.038   0.653    1.043   0.885   0.003
       1.65    6   28.860    29.513    0.053   0.653    1.023   0.895   0.003
       1.60    7   32.763    33.414    0.055   0.651    1.020   0.901   0.003
       1.55    8   37.682    38.293    0.060   0.611    1.016   0.894   0.003
       1.50    9   44.023    44.605    0.065   0.582    1.013   0.896   0.003
       1.35   16   80.361    80.896    0.087   0.536    1.007   0.901   0.003
       1.30   20   105.143   105.807   0.100   0.665    1.006   0.893   0.003
       1.20   41   217.727   218.320   0.143   0.592    1.003   0.900   0.003
       1.10  149   796.729   797.799   0.273   1.070    1.001   0.902   0.003
       1.08  229   1221.932  1222.492  0.340   0.560    1.000   0.901   0.003
       1.05  569   3040.356  3040.186  0.541   −0.170   1.000   0.893   0.003
0.05   2.00    4   21.388    21.910    0.046   0.522    1.024   0.936   0.002
       1.65    8   40.977    41.496    0.063   0.519    1.013   0.945   0.002
       1.60    9   46.519    47.088    0.066   0.569    1.012   0.943   0.002
       1.55   11   53.503    53.998    0.072   0.495    1.009   0.942   0.002
       1.50   12   62.506    63.154    0.076   0.647    1.010   0.948   0.002
       1.35   22   114.100   114.642   0.103   0.542    1.005   0.951   0.002
       1.30   28   149.287   149.801   0.118   0.514    1.003   0.951   0.002
       1.20   58   309.140   309.590   0.171   0.450    1.001   0.949   0.002
       1.10  212   1131.233  1131.517  0.329   0.284    1.000   0.948   0.002
       1.08  325   1734.957  1735.445  0.411   0.488    1.000   0.944   0.002
       1.05  807   4316.842  4316.624  0.639   −0.218   1.000   0.949   0.002
0.01   2.00    7   36.942    37.385    0.059   0.443    1.012   0.985   0.001
       1.65   14   70.775    71.307    0.082   0.532    1.008   0.990   0.001
       1.60   16   80.346    80.920    0.088   0.574    1.007   0.988   0.001
       1.55   18   92.409    92.841    0.095   0.432    1.005   0.987   0.001
       1.50   21   107.960   108.538   0.101   0.579    1.005   0.989   0.001
       1.35   37   197.071   197.721   0.135   0.649    1.003   0.990   0.001
       1.30   49   257.845   258.266   0.157   0.421    1.002   0.989   0.001
       1.20  100   533.940   534.795   0.227   0.855    1.002   0.989   0.001
       1.10  366   1953.845  1954.765  0.426   0.920    1.000   0.990   0.001
       1.08  561   2996.586  2996.685  0.530   0.099    1.000   0.989   0.001

from the p.d.f. (1.1). Then, we generated one new pair of observations (X, Y ) at-a-time until termination according to the purely sequential rule (4.1). Under each configuration, we replicated the purely sequential procedure (4.1) 10,000(= B, say) times. In the ith replication, suppose that we observed terminal values Nd = ni , bi = 1 (or 0) if pθ belonged (or did not belong) to the constructed interval Kni in (4.5), i = 1, . . . , B. From such data observed across

Table 3. Simulated performances of the purely sequential estimation strategy defined via (4.1) and (4.5) with 10,000 replications: θ = 2, a = 2 along with κ̂(θ) = 0.408 defined via (4.19).

α      d     n0d   n∗d       n         sn     n − n∗d   n/n∗d   Cov b   sb
0.10   2.00    3   46.747    47.020    0.090   0.273    1.006   0.891   0.003
       1.65    6   89.560    89.677    0.125   0.117    1.001   0.894   0.003
       1.60    7   101.671   102.175   0.132   0.504    1.005   0.898   0.003
       1.55    8   116.936   117.148   0.142   0.212    1.002   0.897   0.003
       1.50    9   136.614   136.843   0.154   0.229    1.002   0.895   0.003
       1.35   16   249.377   249.642   0.207   0.265    1.001   0.899   0.003
       1.30   20   326.281   326.535   0.236   0.254    1.001   0.900   0.003
       1.20   41   675.655   675.764   0.341   0.109    1.000   0.897   0.003
       1.10  149   2472.420  2473.567  0.656   1.147    1.000   0.899   0.003
       1.08  229   3791.918  3792.201  0.800   0.283    1.000   0.904   0.003
0.05   2.00    4   66.373    66.508    0.107   0.135    1.002   0.945   0.002
       1.65    8   127.162   127.492   0.147   0.330    1.003   0.948   0.002
       1.60    9   144.358   144.648   0.158   0.291    1.002   0.946   0.002
       1.55   11   166.031   166.145   0.169   0.114    1.001   0.948   0.002
       1.50   12   193.970   193.933   0.184   −0.038   1.000   0.945   0.002
       1.35   22   354.077   354.517   0.247   0.441    1.001   0.949   0.002
       1.30   28   463.269   463.880   0.282   0.611    1.001   0.949   0.002
       1.20   58   959.326   960.043   0.401   0.716    1.001   0.955   0.002
       1.10  212   3510.459  3510.788  0.788   0.329    1.000   0.946   0.002
       1.08  325   5383.944  5385.243  0.957   1.299    1.000   0.951   0.002
0.01   2.00    7   114.638   114.920   0.141   0.281    1.002   0.987   0.001
       1.65   14   219.632   220.066   0.195   0.434    1.002   0.989   0.001
       1.60   16   249.332   249.670   0.210   0.338    1.001   0.986   0.001
       1.55   18   286.765   286.968   0.221   0.202    1.001   0.989   0.001
       1.50   21   335.022   335.602   0.242   0.580    1.002   0.988   0.001
       1.35   37   611.554   612.062   0.326   0.508    1.001   0.989   0.001
       1.30   49   800.149   800.932   0.370   0.783    1.001   0.989   0.001
       1.20  100   1656.931  1657.671  0.535   0.740    1.000   0.991   0.001
       1.10  366   6063.199  6063.742  1.016   0.543    1.000   0.990   0.001

B replications, we determined the following entities:

(4.27)
n = B^{-1} Σ_{i=1}^{B} n_i : should estimate n∗d or E[Nd];
sn = {(B² − B)^{-1} Σ_{i=1}^{B} (n_i − n)²}^{1/2} : estimated standard error of n;
b = B^{-1} Σ_{i=1}^{B} b_i : should estimate Pθ{pθ ∈ KNd};
sb = {B^{-1} b(1 − b)}^{1/2} : estimated standard error of b.

Using the notation from (4.27), Tables 2–4 summarize our findings. All n values shown in column 5 are nearly the same as n∗d across the board whether the sample sizes are small (n∗d ≤ 100), moderate (100 < n∗d < 300) or large (n∗d ≥ 300). This is consistent with the notion of asymptotic first-order efficiency property (Theorem 4.1, part (ii)).
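One replication of this exercise can be sketched in a few lines. The exact form of the stopping rule (4.1) lies outside this excerpt, so the boundary used below, namely stop at the first n ≥ n0d with n ≥ H·g(aTn)², H = (zα/2/log d)²/2, is our assumed reading of it, mirroring the two-stage variable (5.2); the interval follows the (4.5) form quoted in (5.3). Under (1.1), X and Y are independent exponentials with rates θ and 1/θ, which is what the sampler uses:

```python
import math
import random
from statistics import NormalDist

def g(x):
    # g(x) = x / (1 - e^{-x}) from (3.6)
    return x / (1.0 - math.exp(-x))

def draw_pair(theta):
    # Under (1.1), X and Y are independent exponentials with rates theta and 1/theta.
    return random.expovariate(theta), random.expovariate(1.0 / theta)

def sequential_interval(theta, a, alpha, d):
    """One replication of a purely sequential rule in the spirit of (4.1)/(4.5);
    the stopping boundary below is an assumption since (4.1) is not in this excerpt."""
    H = 0.5 * (NormalDist().inv_cdf(1.0 - alpha / 2.0) / math.log(d)) ** 2
    n0 = int(H) + 1                       # pilot size n0d, as in (3.8)/(5.1)
    xs = ys = 0.0
    n = 0
    while True:
        x, y = draw_pair(theta)
        xs += x; ys += y; n += 1
        T = math.sqrt(ys / xs)            # running MLE of theta
        if n >= n0 and n >= H * g(a * T) ** 2:
            break
    U = math.exp(-a * T) / (1.0 - math.exp(-a * T))
    return n, U / (d + U), d * U / (d * U + 1.0)   # K_n as in (4.5)

random.seed(2)
theta, a, alpha, d = 1.0, 2.0, 0.10, 1.5
p_theta = math.exp(-a * theta)            # p_theta = P(X > a) = e^{-a*theta}
runs = [sequential_interval(theta, a, alpha, d) for _ in range(200)]
mean_n = sum(n for n, _, _ in runs) / len(runs)
coverage = sum(lo <= p_theta <= hi for _, lo, hi in runs) / len(runs)
```

For this configuration n∗d ≈ 44.0, so mean_n should land nearby and the empirical coverage near 1 − α = 0.90, in line with Table 2's row (α = 0.10, d = 1.50).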

Table 4. Simulated performances of the purely sequential estimation strategy defined via (4.1) and (4.5) with 10,000 replications: θ = 5, a = 2 along with κ̂(θ) = −0.600 defined via (4.19).

α      d     n0d   n∗d       n         sn     n − n∗d   n/n∗d   Cov b   sb
0.10   2.00    3   281.587   281.945   0.238   0.357    1.001   0.897   0.003
       1.65    6   539.484   539.352   0.327   −0.131   1.000   0.901   0.003
       1.60    7   612.437   612.264   0.355   −0.173   1.000   0.894   0.003
       1.55    8   704.385   704.315   0.373   −0.070   1.000   0.903   0.003
       1.50    9   822.919   823.468   0.406   0.549    1.001   0.901   0.003
       1.35   16   1502.169  1503.072  0.550   0.903    1.001   0.902   0.003
       1.30   20   1965.416  1964.865  0.631   −0.551   1.000   0.898   0.003
       1.20   41   4069.939  4070.838  0.899   0.899    1.000   0.897   0.003
0.05   2.00    4   399.811   399.751   0.282   −0.060   1.000   0.950   0.002
       1.65    8   765.984   765.756   0.388   −0.228   1.000   0.953   0.002
       1.60    9   869.567   869.625   0.417   0.057    1.000   0.948   0.002
       1.55   11   1000.120  1000.015  0.448   −0.105   1.000   0.947   0.002
       1.50   12   1168.419  1168.885  0.485   0.466    1.000   0.951   0.002
       1.35   22   2132.850  2133.827  0.646   0.977    1.000   0.948   0.002
       1.30   28   2790.591  2789.685  0.750   −0.906   1.000   0.948   0.002
       1.20   58   5778.692  5780.603  1.096   1.911    1.000   0.943   0.002
0.01   2.00    7   690.546   690.183   0.370   −0.363   0.999   0.991   0.001
       1.65   14   1322.994  1322.005  0.516   −0.989   0.999   0.989   0.001
       1.60   16   1501.901  1503.627  0.553   1.726    1.001   0.989   0.001
       1.55   18   1727.388  1727.584  0.593   0.196    1.000   0.989   0.001
       1.50   21   2018.072  2018.434  0.636   0.362    1.000   0.988   0.001
       1.35   37   3683.819  3683.795  0.848   −0.024   1.000   0.990   0.001
       1.30   49   4819.858  4822.132  0.982   2.274    1.000   0.990   0.001

We also note that the b values (column 9) are very close to the target coverage (Cov), 1 − α. This validates the asymptotic consistency property (Theorem 4.1, part (iii)). All estimated standard error values, namely sn and sb, came out small. Moreover, for each fixed combination of α and d, all 10,000 confidence intervals had lengths smaller than the corresponding (d − 1)/(d + 1), which validates our conclusion from Theorem 2.1. From Theorem 4.4, we know that the purely sequential sampling strategy should enjoy the asymptotic second-order efficiency property (4.20) for large n∗d, that is, as d → 1+. We may reasonably expect the differences between n and n∗d to hover around κ(θ) defined via (4.19). From Table 1, we observe: (i) κ̂ = 0.642 when θ = 1, a = 2; (ii) κ̂ = 0.408 when θ = 2, a = 2; and (iii) κ̂ = −0.600 when θ = 5, a = 2. From column 7 in Tables 2–4, we see that the values of n − n∗d are very close to the corresponding κ̂ values in the sense that the intervals (n − n∗d) ± 2sn include these κ̂ values. This validates the asymptotic second-order efficiency property (4.20) in practice.


5. A two-stage estimation methodology
Recall (i) the expression of n∗d from (3.4), that is, the optimal fixed sample size whose magnitude remains unknown, and (ii) the fact that n∗d > n0d given by (3.7), where n0d is positive and completely known. We had utilized this information in developing our purely sequential bounded-length confidence interval estimation strategy and its interesting properties (Section 4).
It may be more convenient, however, to implement a two-stage bounded-length confidence interval estimation strategy, especially since batch sampling in two steps enjoys significant operational convenience when compared with purely sequential sampling. Thus, we proceed to develop a two-stage estimation methodology and its associated first-order as well as second-order properties, assuming that we may be allowed to gather data in two batches in a given practical scenario. We let the pilot sample size be n0d as in (3.8), gather initial data, and obtain the MLE for θ based on the pilot data, summarized as follows:

(5.1)
Pilot Size: n0d ≡ ⌊(1/2)(zα/2/log d)²⌋ + 1;
Pilot Data: {(Xi, Yi), i = 1, ..., n0d};
MLE for θ: Tn0d ≡ θ̂n0d,MLE = (Σ_{i=1}^{n0d} Yi / Σ_{i=1}^{n0d} Xi)^{1/2}.

Next, we let ⌊u⌋ denote the largest integer < u, with u > 0. Recall the function g(x) = x(1 − e^{-x})^{-1}, x > 0, from (3.6). Now, based on the information laid out in (5.1), we define our two-stage stopping time:

(5.2)  Ld ≡ L = ⌊(1/2)(zα/2/log d)² {g(aTn0d)}²⌋ + 1,

to construct a 100(1 − α)% fixed-accuracy confidence interval for qθ along with the g(·) function coming from (3.6). It should be clear that the stopping variable Ld estimates n∗d in two steps.
In the second stage, we record additional data {(Xi, Yi), i = n0d + 1, ..., Ld} in a single batch. Then, based on the combined set of data, that is, {(Xi, Yi), i = 1, ..., Ld}, we propose the following bounded-length confidence interval for the parametric function defined via (2.1), namely pθ:

(5.3)
TLd ≡ θ̂Ld,MLE = (Σ_{i=1}^{Ld} Yi / Σ_{i=1}^{Ld} Xi)^{1/2} in the spirit of (4.2);
ULd = e^{-aTLd}(1 − e^{-aTLd})^{-1} in the spirit of (4.3);
KLd = {pθ : pθ ∈ [(d + ULd)^{-1}ULd, (dULd + 1)^{-1}dULd]} in the spirit of (4.5).

Again, the length of the proposed confidence interval KLd for pθ is bounded from above by (d − 1)/(d + 1) in view of Theorem 2.1.
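The recipe (5.1)–(5.3) can be sketched directly. The only modeling input beyond this section is that, under (1.1), X and Y are independent exponential variables with rates θ and 1/θ:

```python
import math
import random
from statistics import NormalDist

def g(x):
    # g(x) = x / (1 - e^{-x}) from (3.6)
    return x / (1.0 - math.exp(-x))

def two_stage_interval(theta, a, alpha, d):
    """One replication of the two-stage rule (5.1)-(5.3)."""
    H = 0.5 * (NormalDist().inv_cdf(1.0 - alpha / 2.0) / math.log(d)) ** 2
    n0 = int(H) + 1                                   # pilot size n0d, (5.1)
    xs = sum(random.expovariate(theta) for _ in range(n0))
    ys = sum(random.expovariate(1.0 / theta) for _ in range(n0))
    T0 = math.sqrt(ys / xs)                           # pilot MLE T_{n0d}, (5.1)
    L = int(H * g(a * T0) ** 2) + 1                   # stopping variable L_d, (5.2)
    for _ in range(L - n0):                           # second-stage batch
        xs += random.expovariate(theta)
        ys += random.expovariate(1.0 / theta)
    T = math.sqrt(ys / xs)                            # combined MLE T_{L_d}, (5.3)
    U = math.exp(-a * T) / (1.0 - math.exp(-a * T))   # U_{L_d}, (5.3)
    return n0, L, U / (d + U), d * U / (d * U + 1.0)  # K_{L_d}, (5.3)

random.seed(3)
n0, L, lo, hi = two_stage_interval(theta=1.0, a=2.0, alpha=0.05, d=1.5)
```

Since g(x) > 1 for all x > 0, the second-stage size Ld always exceeds the pilot size n0d, and the interval length never exceeds (d − 1)/(d + 1), as Theorem 2.1 asserts.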


5.1. Moments of the MLE and their expansions
In order to develop asymptotic first-order and second-order properties for the two-stage estimation strategy, we will require the behaviors of the moments of Tn0d, the MLE of θ, up to certain specific orders. These results may be of independent interest, but some of them may be well known. We collect them here for immediate attention and completeness.
We explicitly focus on positive moments of Tn0d since the expressions and expansions of the negative moments of Tn0d will obviously follow from the positive moments once we note that Tn0d ∼ θF_{2n0d,2n0d}^{1/2}, so that Tn0d^{-1} ∼ θ^{-1}F_{2n0d,2n0d}^{1/2} under Pθ. Let us consider small enough d(> 1) so that n0d may be large enough, enabling us to talk about the moments of Tn0d. We begin with the following explicit expressions:

(5.4)
Eθ[Tn0d − θ] = θ( Γ(n0d + 1/2)Γ(n0d − 1/2)/{Γ(n0d)}² − 1 ),
Eθ[(Tn0d − θ)²] = θ²( n0d/(n0d − 1) + 1 − 2Γ(n0d + 1/2)Γ(n0d − 1/2)/{Γ(n0d)}² ).

Now, according to Property 6.1.47 in Abramowitz and Stegun (1972, p. 257), for large n, we know:

(5.5)  n^{b−a} Γ(n + a)/Γ(n + b) = 1 + (1/2)(a − b)(a + b − 1)n^{-1} + O(n^{-2}),

where a, b are any two fixed real numbers. Thus, for large n0d or small d(> 1), by exploiting (5.4)–(5.5) we have the following results:

(5.6)
n0d^{-1/2} Γ(n0d + 1/2)/Γ(n0d) = 1 − (1/8)n0d^{-1} + O(n0d^{-2})  and
n0d^{1/2} Γ(n0d − 1/2)/Γ(n0d) = 1 + (3/8)n0d^{-1} + O(n0d^{-2}).

Using (5.6), we immediately obtain:

(5.7)  Γ(n0d + 1/2)Γ(n0d − 1/2)/{Γ(n0d)}² = 1 + (1/4)n0d^{-1} + O(n0d^{-2}).
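The expansion (5.7) is easy to probe numerically through log-gamma evaluations; the error indeed shrinks at roughly the O(n^{-2}) rate:

```python
import math

def gamma_ratio(n):
    # Exact Gamma(n + 1/2) * Gamma(n - 1/2) / Gamma(n)^2 via log-gamma evaluations.
    return math.exp(math.lgamma(n + 0.5) + math.lgamma(n - 0.5) - 2.0 * math.lgamma(n))

# (5.7): the ratio equals 1 + 1/(4n) + O(n^{-2}); watch the error decay with n.
for n in (10, 50, 250):
    print(n, abs(gamma_ratio(n) - (1.0 + 0.25 / n)))
```
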


Combining (5.4) with (5.7), we obtain the following expressions:

(5.8)
Eθ[Tn0d − θ] = (1/4)n0d^{-1}θ + O(n0d^{-2}),
Eθ[(Tn0d − θ)²] = (1/2)n0d^{-1}θ² + O(n0d^{-2}).

For large enough n0d(> k > 0), we can also express:

(5.9)  Eθ[Tn0d^{2k}] = θ^{2k} Γ(n0d + k)Γ(n0d − k)/{Γ(n0d)}² ⇒ |Eθ[(1 + a(Tn0d + θ))^k]| < ∞.

Using (5.5) and (5.9), we obtain:

(5.10)
Eθ[|Tn0d − θ|⁴] = Eθ[θ⁴ − 4θ³Tn0d + 6θ²Tn0d² − 4θTn0d³ + Tn0d⁴]
  = θ⁴{1 − 4(1 + (1/4)n0d^{-1} + O(n0d^{-2})) + 6(1 + n0d^{-1} + O(n0d^{-2}))
      − 4(1 + (9/4)n0d^{-1} + O(n0d^{-2})) + (1 + 4n0d^{-1} + O(n0d^{-2}))}
  = θ⁴O(n0d^{-2}),

that is,

(5.11)  Eθ[|Tn0d − θ|⁴] = O(n0d^{-2}).

5.2. First-order asymptotics
We first set out to introduce a number of desirable interesting properties associated with our proposed bounded-length two-stage confidence interval estimation strategy (Ld, KLd) defined via (5.2) and (5.3) for the parametric function pθ from (2.1).

Theorem 5.1. For the two-stage sampling strategy (Ld, KLd) defined via (5.2)–(5.3) for the parametric function pθ from (2.1), with 0 < α < 1 and θ > 0 fixed but otherwise arbitrary, we have the following asymptotic results as d → 1+:
(i) Ld/n∗d → 1 w.p.1(Pθ);
(ii) Eθ[Ld/n∗d] → 1 [Asymptotic First-Order Efficiency];
(iii) Pθ{pθ ∈ KLd} → 1 − α [Asymptotic Consistency];
where n∗d comes from (3.5) and a(> 0) is known.

Proof. In what follows, we sketch an outline of the proof.
Part (i). Using (5.2), we will have the following inequality w.p.1(Pθ):

(5.12)  (1/2)(zα/2/log d)²{g(aTn0d)}² ≤ Ld ≤ (1/2)(zα/2/log d)²{g(aTn0d)}² + 1.


Now, since Ld → ∞ w.p.1(Pθ), Tn0d → θ w.p.1(Pθ), and n∗d → ∞ as d → 1+, the result follows by dividing throughout (5.12) by n∗d and then taking limits as d → 1+.
Part (ii). In view of the upper bound from Lemma 3.1, we begin by expressing:

(5.13)  Eθ[{g(aTn0d)}⁴] ≤ Eθ[(1 + aTn0d)⁴] = Eθ[1 + 4aTn0d + 6a²Tn0d² + 4a³Tn0d³ + a⁴Tn0d⁴].

Then, utilizing (5.9), we can show that the upper bound from (5.13) is finite and it converges to 1 + 4aθ + 6a²θ² + 4a³θ³ + a⁴θ⁴ as d → 1+. Thus, clearly, {g(aTn0d)}² is uniformly integrable. Now, part (ii) follows in view of part (i).
Part (iii). This result follows along the lines of Theorem 4.1, part (iii) once we realize:

Ld^{1/2}(TLd − θ) →L N(0, (1/2)θ²)  and  n∗d^{1/2}(TLd − θ) →L N(0, (1/2)θ²)  as d → 1+.

Now, the proof is complete. □

Remark 5.2. The conclusions from Theorems 4.2–4.3 continue to hold when Nd is replaced by Ld. For brevity, the detailed proofs are omitted.

5.3. Second-order asymptotics
Before we get to asymptotic second-order analysis, we define two new entities:

(5.14)
h(x) ≡ a²x²(1 − e^{-ax})^{-2} = {g(ax)}², x > 0;
φ(θ) ≡ (1/4){θh′(θ) + θ²h″(θ)};

where the first three derivatives of the h-function are expressed as:

(5.15)
h′(x) = 2a²x(1 − e^{-ax})^{-2} − 2a³x²e^{-ax}(1 − e^{-ax})^{-3};
h″(x) = 2a²(1 − e^{-ax})^{-2} + 2a³xe^{-ax}(ax − 4)(1 − e^{-ax})^{-3} + 6a⁴x²e^{-2ax}(1 − e^{-ax})^{-4};
h‴(x) = −2a³e^{-ax}(a²x² − 6ax + 6)(1 − e^{-ax})^{-3} + 18a⁴xe^{-2ax}(2 − ax)(1 − e^{-ax})^{-4} − 24a⁵x²e^{-3ax}(1 − e^{-ax})^{-5}.
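As a quick sanity check on (5.14)–(5.15), the sketch below verifies the closed-form derivatives against finite differences of h and evaluates φ(θ); with a = 2 it reproduces φ(1) ≈ 3.624 and φ(2) ≈ 16.122, the values quoted in the captions of Tables 8 and 9:

```python
import math

A = 2.0  # the known a > 0; a = 2 matches the simulation settings of this section

def h(x, a=A):
    # h(x) = {g(ax)}^2 = a^2 x^2 (1 - e^{-ax})^{-2}, from (5.14)
    return (a * x / (1.0 - math.exp(-a * x))) ** 2

def h1(x, a=A):
    # h'(x) from (5.15)
    e = math.exp(-a * x)
    return 2 * a**2 * x / (1 - e)**2 - 2 * a**3 * x**2 * e / (1 - e)**3

def h2(x, a=A):
    # h''(x) from (5.15)
    e = math.exp(-a * x)
    return (2 * a**2 / (1 - e)**2
            + 2 * a**3 * x * e * (a * x - 4) / (1 - e)**3
            + 6 * a**4 * x**2 * e * e / (1 - e)**4)

def phi(theta, a=A):
    # phi(theta) = (1/4){theta h'(theta) + theta^2 h''(theta)}, from (5.14)
    return 0.25 * (theta * h1(theta, a) + theta**2 * h2(theta, a))

# Cross-check the closed forms against central finite differences of h ...
eps = 1e-5
assert abs(h1(1.0) - (h(1.0 + eps) - h(1.0 - eps)) / (2 * eps)) < 1e-6
eps = 1e-4
assert abs(h2(1.0) - (h(1.0 + eps) - 2 * h(1.0) + h(1.0 - eps)) / eps**2) < 1e-3
# ... and reproduce the values quoted in the captions of Tables 8 and 9.
print(round(phi(1.0), 3), round(phi(2.0), 3))   # 3.624 and 16.122
```
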

Now, we proceed to obtain an expansion for Eθ [h(Tn0d )−h(θ)] up to a desired order when n0d is large. This result would help in the sequel in securing asymptotic second-order efficiency property in the sense of Ghosh and Mukhopadhyay (1981) for the two-stage sampling strategy (Ld , KLd ) from (5.2)–(5.3).


Theorem 5.2. With 0 < α < 1 and θ > 0 fixed but otherwise arbitrary, as d → 1+, we have the following expansion:

(5.16)  Eθ[h(Tn0d)] = h(θ) + (1/4){θh′(θ) + θ²h″(θ)}n0d^{-1} + O(n0d^{-3/2}),

where n0d comes from (5.1) and h(·), h′(·) and h″(·) come from (5.14)–(5.15).

Proof. We express h(x) from (5.14) using a Taylor expansion around x = θ:

(5.17)  h(x) = h(θ) + (x − θ)h′(θ) + (1/2)(x − θ)²h″(θ) + (1/6)(x − θ)³h‴(ξ),

with some appropriate ξ that lies between x and θ. From (5.17), we obtain:

(5.18)  Eθ[h(Tn0d)] = h(θ) + h′(θ)Eθ[Tn0d − θ] + (1/2)h″(θ)Eθ[(Tn0d − θ)²] + Rd,

where the remainder term is Rd ≡ (1/6)Eθ[h‴(Wd)(Tn0d − θ)³], with an appropriate random variable Wd lying between Tn0d and θ. Now, using Jensen's inequality, Lemma 3.1, and the expression of h‴(x) from (5.15), we obtain:

(5.19)
|Rd| ≤ Eθ[|h‴(Wd)||Tn0d − θ|³]
  ≤ 2Eθ[|Tn0d − θ|³ (1 + aWd)³(a²Wd² + 6aWd + 6)Wd^{-3}]
    + 18Eθ[|Tn0d − θ|³ (1 + aWd)⁴(2 + aWd)Wd^{-3}]
    + 24Eθ[|Tn0d − θ|³ (1 + aWd)⁵Wd^{-3}]
  = 2Eθ[A1,d] + 18Eθ[A2,d] + 24Eθ[A3,d], say.

Next, we begin to handle each term Eθ[Ai,d] from (5.19) by splitting it as follows:

(5.20)  Eθ[Ai,d] = Eθ[Ai,d I(Wd > θ/2)] + Eθ[Ai,d I(Wd ≤ θ/2)] = Eθ[Ai1,d] + Eθ[Ai2,d], say,

for i = 1, 2, 3. At this point, in view of (5.20), our goal is to show:

(5.21)  Eθ[Ai1,d] = O(n0d^{-3/2}) and Eθ[Ai2,d] = O(n0d^{-3/2}),

for all θ > 0, i = 1, 2, 3.
Let us fix i = 1 and let c ≡ c(a, θ) stand for a generic positive constant. We note that Wd lies between Tn0d and θ, and on the set [Wd > θ/2] we certainly have θ/2 < Wd < Tn0d + θ. Thus, using the triangle inequality, we look at an upper bound


for |Eθ[A11,d]|, which would consist of a sum of terms such as cEθ[|Tn0d − θ|³Tn0d^k] with k = 0, ±1, ±2, ±3. Applying Hölder's inequality, (5.9), and (5.11), we can claim that |Eθ[A11,d]| = O(n0d^{-3/2}). Similarly, we can argue that |Eθ[A12,d]| = O(n0d^{-3/2}). With analogous analyses, we can verify (5.21) when i = 2, 3. We combine these findings with (5.8) and (5.18)–(5.20) to complete the proof. □

Theorem 5.3. For the two-stage sampling strategy (Ld, KLd) defined via (5.2)–(5.3) for the parametric function pθ from (2.1), with 0 < α < 1 and θ > 0 fixed but otherwise arbitrary, we have the following asymptotic second-order result:

(5.22)  φ ≤ lim_{d→1+} Eθ[Ld − n∗d] ≤ φ + 1,

where n∗d and φ ≡ φ(θ) come from (3.5) and (5.14), respectively, and h(·), h′(·) and h″(·) come from (5.14)–(5.15).

Proof. Let us denote Hα,d ≡ (1/2)(zα/2/log d)² and L∗d ≡ Hα,d{g(aTn0d)}² = Hα,d h(Tn0d) with d > 1. We emphasize that, w.p.1(Pθ), L∗d is a positive random variable, but it is not a positive integer valued random variable. We recall that n0d ≡ ⌊Hα,d⌋ + 1, where ⌊u⌋ is the largest integer < u, u > 0, and n∗d = Hα,d h(θ). From Theorem 5.2, we immediately conclude:

(5.23)
Eθ[L∗d − n∗d] ≡ Eθ[Hα,d{h(Tn0d) − h(θ)}] = Hα,d{φ(θ)n0d^{-1} + O(n0d^{-3/2})}
  = φ(θ){1 + o(1)} + Hα,d O(n0d^{-3/2}) = φ(θ) + o(1),

since Hα,d/n0d = 1 + o(1) as d → 1+. Obviously, w.p.1(Pθ), we have:

(5.24)  L∗d ≤ Ld ≤ L∗d + 1.

Now, combining (5.23)–(5.24), the theorem follows. □

5.3.1. Some heuristics to estimate Eθ[Ld − L∗d] tightly
Let us record our heuristic thoughts step-by-step in support of our understanding of the possible validity of this conjecture.
Step 1. Observe that h(x) is a strictly increasing function of x > 0. Thus, the distribution function Pθ{h(Tn0d) ≤ x} of h(Tn0d) can be equivalently expressed as Pθ{Tn0d ≤ x∗} with x∗ = h^{-1}(x) for all x > 0.
Step 2. Conditionally, given X1, ..., Xn0d, which are independent of Y1, ..., Yn0d, we are able to express Pθ{Tn0d ≤ x∗} in the form of a distribution function of a chi-square random variable with 2n0d degrees of freedom.
Step 3. Then appropriately modified versions of Lemma 2 and Lemma 3 from Aoshima and Yata (2010) would be expected to hold, leading to the claim Eθ[Ld − L∗d] ≈ 1/2 + o(1). One could look over other related publications of Aoshima and his collaborators.
Step 4. Next, an appropriately improvised version of Lemma 4 from Aoshima and Yata (2010) would be expected to hold, thereby leading us to our heuristic claim that Ld − L∗d is asymptotically distributed as Uniform(0, 1). This, upon verification, would lead to (5.25).
Our heuristic Steps 1–4 would lead us to propose the following approximation:

(5.25)  Eθ[Ld − L∗d] ≈ 1/2,

as a suggested guideline for use in practice.

5.3.2. Empirical validation of (5.25) via computer simulations
We may look into some empirical characteristics of a random variable defined as follows:

(5.26)  Ud = (1/2)(zα/2/log d)²{g(aTn0d)}² − ⌊(1/2)(zα/2/log d)²{g(aTn0d)}²⌋,

with the help of computer simulations. In order to facilitate that part of our investigation, we first fixed a set of values α, d, θ, a, which led to the associated well-defined pilot size n0d. Then, for such a fixed set of values α, d, θ, a, we went through the following steps:
Step 1. We drew n0d pseudorandom observations on (X, Y) following the distribution (1.1). These led to one observed value ud1 of Ud defined via (5.26). This is only the first iteration out of 100,000(= R, say) iterations. After R iterations under a fixed set of values α, d, θ, a, we would have recorded R independently observed values ud1, ..., udR of Ud.
Step 2. In Tables 5–7, we summarize a number of customary descriptive statistics: mean u (column 4), estimated standard error su (column 5), median umed (column 6), lower quartile QL (column 7), and upper quartile QU (column 8). We saw the minimum umin and maximum umax coincide with zero and one, respectively, throughout this exercise. Thus, in Tables 5–7, we do not show the umin, umax values.
Step 3. Additionally, we ran tests to decide whether or not ud1, ..., udR could reasonably be assumed to arise from a Uniform(0, 1) universe.
We performed both the chi-square goodness-of-fit test as well as the Kolmogorov-Smirnov (KS) test. The chi-square goodness-of-fit test showed P-value = 1.0 throughout this exercise, and thus we keep these out of Tables 5–7. However, these tables show the P-values (column 9) associated with the KS test.
Step 4. We also looked at the histograms and boxplots illustrated in Fig. 1, obtained from our observed dataset ud1, ..., udR. In a very small number of situations, our reported P-values fell under 0.05. However, considering all our histograms and boxplots obtained in the contexts of Tables 5–7, we saw no appreciable departure overall from a Uniform(0, 1) distribution's fit.

Table 5. Summary statistics from 100,000 pseudorandom observations on Ud in (5.26) under the same configuration as in Table 2. Column 9 shows the P-value associated with the KS test: θ = 1, a = 2.

α      d     n0d   u       su      umed    QL      QU      P-value
0.10   2.00    3   0.501   0.001   0.502   0.252   0.751   0.061
       1.65    6   0.501   0.001   0.500   0.250   0.751   0.968
       1.60    7   0.501   0.001   0.503   0.253   0.750   0.056
       1.55    8   0.501   0.001   0.503   0.251   0.751   0.175
       1.50    9   0.500   0.001   0.500   0.249   0.750   0.719
       1.35   16   0.499   0.001   0.498   0.248   0.750   0.473
       1.30   20   0.499   0.001   0.499   0.250   0.749   0.816
       1.20   41   0.500   0.001   0.499   0.250   0.751   0.879
       1.10  149   0.500   0.001   0.501   0.247   0.751   0.250
       1.08  229   0.501   0.001   0.500   0.251   0.751   0.905
       1.05  569   0.502   0.001   0.500   0.252   0.754   0.010
0.05   2.00    4   0.500   0.001   0.500   0.249   0.750   0.508
       1.65    8   0.500   0.001   0.501   0.249   0.752   0.342
       1.60    9   0.502   0.001   0.503   0.252   0.752   0.030
       1.55   11   0.500   0.001   0.499   0.250   0.751   0.694
       1.50   12   0.498   0.001   0.497   0.247   0.748   0.173
       1.35   22   0.501   0.001   0.501   0.251   0.750   0.555
       1.30   28   0.499   0.001   0.499   0.249   0.749   0.453
       1.20   58   0.501   0.001   0.500   0.252   0.752   0.443
       1.10  212   0.500   0.001   0.502   0.250   0.751   0.864
       1.08  325   0.500   0.001   0.500   0.250   0.751   0.820
       1.05  807   0.499   0.001   0.499   0.247   0.748   0.167
0.01   2.00    7   0.501   0.001   0.502   0.250   0.752   0.319
       1.65   14   0.500   0.001   0.500   0.251   0.750   0.926
       1.60   16   0.499   0.001   0.497   0.249   0.748   0.120
       1.55   18   0.500   0.001   0.500   0.251   0.750   0.659
       1.50   21   0.499   0.001   0.499   0.249   0.749   0.331
       1.35   37   0.500   0.001   0.500   0.251   0.750   0.456
       1.30   49   0.499   0.001   0.499   0.250   0.748   0.550
       1.20  100   0.499   0.001   0.497   0.250   0.748   0.184
       1.10  366   0.498   0.001   0.497   0.248   0.749   0.202
       1.08  561   0.500   0.001   0.501   0.250   0.749   0.752

Step 5. Then, we successively fixed other sets of values α, d, θ, a, and ran through Steps 1–4. Our summary findings are highlighted in Tables 5–7 corresponding to several sets of values α, d, θ, a that are consistent with the choices highlighted in Tables 2–4. We also include a selected set of histograms obtained from ud1, ..., udR, overlaid with the Uniform(0, 1) p.d.f. and accompanied by side-by-side boxplots, as illustrations in Fig. 1.
Given that we considered a wide variety of choices of α, d, θ, a (and hence, n0d), these numerical analyses seem to validate reasonably well our strong sen-

Table 6. Summary statistics from 100,000 pseudorandom observations on Ud in (5.26) under the same configuration as in Table 3. Column 9 shows the P-value associated with the KS test: θ = 2, a = 2.

α      d     n0d   u       su      umed    QL      QU      P-value
0.10   2.00    3   0.499   0.001   0.499   0.248   0.749   0.429
       1.65    6   0.501   0.001   0.500   0.249   0.752   0.302
       1.60    7   0.501   0.001   0.501   0.249   0.751   0.611
       1.55    8   0.499   0.001   0.498   0.246   0.751   0.061
       1.50    9   0.498   0.001   0.498   0.247   0.750   0.051
       1.35   16   0.500   0.001   0.500   0.250   0.750   0.681
       1.30   20   0.499   0.001   0.498   0.248   0.749   0.294
       1.20   41   0.499   0.001   0.496   0.250   0.749   0.055
       1.10  149   0.500   0.001   0.500   0.249   0.749   0.959
       1.08  229   0.500   0.001   0.499   0.249   0.751   0.860
0.05   2.00    4   0.499   0.001   0.498   0.247   0.749   0.098
       1.65    8   0.499   0.001   0.499   0.251   0.749   0.608
       1.60    9   0.499   0.001   0.498   0.248   0.749   0.438
       1.55   11   0.499   0.001   0.499   0.248   0.749   0.620
       1.50   12   0.500   0.001   0.501   0.251   0.748   0.659
       1.35   22   0.500   0.001   0.500   0.250   0.750   0.895
       1.30   28   0.501   0.001   0.504   0.250   0.751   0.090
       1.20   58   0.499   0.001   0.499   0.248   0.750   0.413
       1.10  212   0.501   0.001   0.501   0.251   0.750   0.852
       1.08  325   0.501   0.001   0.500   0.252   0.750   0.513
0.01   2.00    7   0.500   0.001   0.499   0.249   0.750   0.333
       1.65   14   0.500   0.001   0.500   0.250   0.749   0.889
       1.60   16   0.500   0.001   0.499   0.249   0.750   0.996
       1.55   18   0.499   0.001   0.498   0.249   0.748   0.479
       1.50   21   0.499   0.001   0.499   0.248   0.749   0.188
       1.35   37   0.499   0.001   0.499   0.247   0.750   0.192
       1.30   49   0.501   0.001   0.500   0.252   0.751   0.447
       1.20  100   0.499   0.001   0.499   0.251   0.749   0.473
       1.10  366   0.500   0.001   0.500   0.250   0.751   0.834

timent that the asymptotic distribution of Ud from (5.26) is empirically approximated rather accurately by the Uniform(0, 1) distribution. In other words, we put forward the approximation suggested via (5.25) as a practical guideline with reasonable confidence.
On a related note, in the light of our extensive sets of discussions combined from Subsections 5.3.1–5.3.2, for all practical purposes, we modify the conclusion from Theorem 5.3 to propose the following reasonable approximation:

(5.27)  Eθ[Ld − n∗d] ≈ φ + 1/2 + o(1).

Instead of the obvious bounds seen in (5.24), now we feel confident enough to

Table 7. Summary statistics from 100,000 pseudorandom observations on Ud in (5.26) under the same configuration as in Table 4. Column 9 shows the P-value associated with the KS test: θ = 5, a = 2.

α      d     n0d   u       su      umed    QL      QU      P-value
0.10   2.00    3   0.498   0.001   0.496   0.248   0.749   0.008
       1.65    6   0.500   0.001   0.500   0.250   0.751   0.969
       1.60    7   0.499   0.001   0.500   0.249   0.750   0.563
       1.55    8   0.502   0.001   0.502   0.253   0.753   0.020
       1.50    9   0.500   0.001   0.501   0.250   0.750   0.830
       1.35   16   0.499   0.001   0.499   0.248   0.749   0.507
       1.30   20   0.499   0.001   0.500   0.248   0.747   0.197
       1.20   41   0.500   0.001   0.500   0.249   0.750   0.879
0.05   2.00    4   0.499   0.001   0.499   0.249   0.748   0.589
       1.65    8   0.499   0.001   0.498   0.248   0.750   0.315
       1.60    9   0.502   0.001   0.503   0.251   0.753   0.022
       1.55   11   0.500   0.001   0.500   0.249   0.749   0.757
       1.50   12   0.500   0.001   0.502   0.251   0.750   0.741
       1.35   22   0.500   0.001   0.501   0.249   0.750   0.839
       1.30   28   0.500   0.001   0.500   0.250   0.750   0.997
       1.20   58   0.500   0.001   0.499   0.249   0.750   0.899
0.01   2.00    7   0.501   0.001   0.499   0.252   0.750   0.219
       1.65   14   0.501   0.001   0.501   0.252   0.751   0.802
       1.60   16   0.499   0.001   0.499   0.249   0.748   0.625
       1.55   18   0.500   0.001   0.501   0.248   0.750   0.387
       1.50   21   0.498   0.001   0.499   0.247   0.749   0.012
       1.35   37   0.499   0.001   0.498   0.249   0.751   0.586
       1.30   49   0.500   0.001   0.500   0.251   0.751   0.850

heuristically proceed and conjecture:

(5.28)  Eθ[Ld − n∗d] = φ + 1/2 + o(1).
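The uniformity claim behind (5.25)–(5.28) is easy to probe with a short stdlib-only simulation: draw pilot samples, form the fractional part in (5.26), and compute the one-sample Kolmogorov-Smirnov distance to Uniform(0, 1) by hand. The configuration below is one of those used in Table 5:

```python
import math
import random
from statistics import NormalDist

def g(x):
    # g(x) = x / (1 - e^{-x}) from (3.6)
    return x / (1.0 - math.exp(-x))

def one_U(theta, a, alpha, d):
    # One observation of U_d from (5.26): the fractional part of H * g(a*T_{n0d})^2.
    H = 0.5 * (NormalDist().inv_cdf(1.0 - alpha / 2.0) / math.log(d)) ** 2
    n0 = int(H) + 1
    xs = sum(random.expovariate(theta) for _ in range(n0))
    ys = sum(random.expovariate(1.0 / theta) for _ in range(n0))
    v = H * g(a * math.sqrt(ys / xs)) ** 2
    return v - math.floor(v)

def ks_uniform(sample):
    # One-sample Kolmogorov-Smirnov distance to the Uniform(0, 1) c.d.f.
    s = sorted(sample)
    n = len(s)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(s))

random.seed(5)
us = [one_U(theta=1.0, a=2.0, alpha=0.05, d=1.5) for _ in range(3000)]
mean_u = sum(us) / len(us)
D = ks_uniform(us)
```

A mean near 0.5 and a small KS distance mirror the behavior reported in Tables 5–7.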

5.4. Data illustrations using simulations In the spirit of Subsection 4.4, we summarize some interesting features obtained from analyzing simulated data for the two-stage bounded-length confidence interval estimation methodology (Ld , KLd ) defined via (5.2)–(5.3) for the parametric function pθ from (2.1). Simulations were analogously carried out under these pre-fixed values: θ = 1, 2, 5, a = 2, α = 0.10, 0.05, 0.01, and d = 2.00, 1.65, 1.60, 1.55, 1.50, 1.35, 1.30, 1.20, 1.10, 1.08, 1.05. The features and performances highlighted here remain nearly the same for many other choices of d and (θ, a, α) values, and so we omit those for brevity. Under each fixed set of values of θ, a, α, and d, we determined n0d from (3.8), the pilot sample size. Also, we determined n∗d using (3.4), the optimal fixed


Figure 1. Empirical sampling distribution of Ud from (5.26) based on 100,000 pseudorandom observations. Each horizontal panel shows two pictures: (i) the histogram (panels a1, b1, c1) on the left and (ii) the boxplot (panels a2, b2, c2) on the right obtained from the same set of 100,000 observations under the following configurations. Panel (a) a = 2, θ = 1, α = 0.10, d = 1.5; Panel (b) a = 2, θ = 5, α = 0.05, d = 1.6; Panel (c) a = 2, θ = 2, α = 0.01, d = 1.6.

sample size, but treated n∗d as unknown. We first generated n0d pseudorandom observations {(Xi, Yi), i = 1, 2, ..., n0d} from the p.d.f. (1.1). Then, we generated the Ld − n0d new pairs of observations (X, Y) in a single batch at the second stage. Under each configuration, we replicated the two-stage procedure (5.2)–(5.3) 10,000(= B, say) times. In the ith replication, suppose that we observed terminal values Ld = li, and bi = 1 (or 0) if pθ belonged (or did not belong) to the constructed interval Kli in (4.5), i = 1, ..., B. From such data observed across B replications,

Table 8. Simulated performances of the two-stage estimation strategy defined via (5.2) and (5.3) with 10,000 replications: θ = 1, a = 2 along with φ(θ) = 3.624 defined via (5.14).

α      d     n0d   n∗d       l         sl     l − n∗d   l/n∗d   Cov b   sb
0.10   2.00    3   15.064    21.172    0.229   6.108    1.405   0.884   0.003
       1.65    6   28.860    33.238    0.162   4.378    1.152   0.900   0.003
       1.60    7   32.763    37.175    0.166   4.412    1.135   0.891   0.003
       1.55    8   37.682    41.873    0.169   4.190    1.111   0.898   0.003
       1.50    9   44.023    47.963    0.172   3.940    1.089   0.892   0.003
       1.35   16   80.361    84.592    0.224   4.231    1.053   0.898   0.003
       1.30   20   105.143   109.408   0.251   4.265    1.041   0.895   0.003
       1.20   41   217.727   221.723   0.343   3.995    1.018   0.901   0.003
       1.10  149   796.729   801.284   0.642   4.555    1.006   0.898   0.003
       1.08  229   1221.932  1228.072  0.794   6.140    1.005   0.898   0.003
       1.05  569   3040.356  3044.282  0.559   3.926    1.001   0.899   0.001
0.05   2.00    4   21.388    26.902    0.195   5.514    1.258   0.938   0.002
       1.65    8   40.977    45.658    0.187   4.680    1.114   0.942   0.002
       1.60    9   46.519    50.981    0.188   4.462    1.096   0.942   0.002
       1.55   11   53.503    57.334    0.187   3.831    1.072   0.946   0.002
       1.50   12   62.506    67.019    0.207   4.512    1.072   0.946   0.002
       1.35   22   114.100   118.193   0.255   4.093    1.036   0.947   0.002
       1.30   28   149.287   153.803   0.296   4.516    1.030   0.948   0.002
       1.20   58   309.140   312.553   0.407   3.413    1.011   0.948   0.002
       1.10  212   1131.233  1136.223  0.753   4.990    1.004   0.950   0.002
       1.08  325   1734.957  1739.885  0.943   4.928    1.003   0.949   0.002
       1.05  807   4316.842  4321.171  0.663   4.329    1.001   0.950   0.001
0.01   2.00    7   36.942    41.932    0.188   4.990    1.135   0.984   0.001
       1.65   14   70.775    74.779    0.211   4.004    1.057   0.985   0.001
       1.60   16   80.346    84.484    0.218   4.137    1.051   0.987   0.001
       1.55   18   92.409    96.586    0.236   4.176    1.045   0.986   0.001
       1.50   21   107.960   112.413   0.254   4.453    1.041   0.987   0.001
       1.35   37   197.071   201.180   0.336   4.109    1.021   0.989   0.001
       1.30   49   257.845   261.663   0.371   3.818    1.015   0.987   0.001
       1.20  100   533.940   537.166   0.527   3.226    1.006   0.990   0.001
       1.10  366   1953.845  1957.258  1.001   3.413    1.002   0.990   0.001
       1.08  561   2996.586  3000.222  1.234   3.636    1.001   0.990   0.001

we determined the following entities:

(5.29)
    l = B^(-1) Σ_{i=1}^{B} li : should estimate n∗d or E[Ld];
    sl = {(B^2 − B)^(-1) Σ_{i=1}^{B} (li − l)^2}^(1/2) : estimated standard error of l;
    b = B^(-1) Σ_{i=1}^{B} bi : should estimate Pθ{pθ ∈ KLd};
    sb = {B^(-1) b(1 − b)}^(1/2) : estimated standard error of b;

in the spirit of (4.26). Using the notation explained in (5.29), Tables 8–10 summarize our findings.
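To make the replication scheme concrete, here is a minimal sketch in Python (the authors' own programs are in R, linked in the list of references). The (X, Y) generator and the MLE θ̂ = (Σ Yi / Σ Xi)^(1/2) follow directly from the p.d.f. (1.1), under which X and Y are independent exponentials with rates θ and 1/θ; the function `final_size` below is only a hypothetical stand-in for the actual two-stage rule (5.2)–(5.3), whose exact form is not reproduced here.

```python
import math
import random

def draw_pair(theta, rng):
    # Under (1.1), X and Y are independent exponentials:
    # X has rate theta, Y has rate 1/theta (mean theta).
    return rng.expovariate(theta), rng.expovariate(1.0 / theta)

def theta_mle(pairs):
    # MLE in the "Nile" model: theta_hat = sqrt(sum(Y) / sum(X)).
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    return math.sqrt(sy / sx)

def one_replication(theta, n0, final_size, rng):
    # One two-stage replication: a pilot of n0 pairs, then a single
    # second-stage batch of L - n0 pairs; returns (L, full sample).
    pilot = [draw_pair(theta, rng) for _ in range(n0)]
    L = max(n0, final_size(theta_mle(pilot)))
    return L, pilot + [draw_pair(theta, rng) for _ in range(L - n0)]

def summaries(l_vals, b_vals):
    # The entities in (5.29), computed across B replications.
    B = len(l_vals)
    l_bar = sum(l_vals) / B
    s_l = math.sqrt(sum((li - l_bar) ** 2 for li in l_vals) / (B * B - B))
    b_bar = sum(b_vals) / B
    s_b = math.sqrt(b_bar * (1.0 - b_bar) / B)
    return l_bar, s_l, b_bar, s_b

rng = random.Random(2017)
theta, n0 = 1.0, 9
# Hypothetical final-sample-size rule, NOT the paper's (5.2)-(5.3):
final_size = lambda th: math.ceil(44.0 * th)
L, sample = one_replication(theta, n0, final_size, rng)
# Toy illustration of the (5.29) summaries over three replications:
l_bar, s_l, b_bar, s_b = summaries([10.0, 12.0, 14.0], [1, 1, 0])
```

Repeating `one_replication` B = 10,000 times, recording (li, bi) each time, and feeding the results to `summaries` yields the raw material behind Tables 8–10.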


Table 9. Simulated performances of the two-stage estimation strategy defined via (5.2) and (5.3) with 10,000 replications: θ = 2, a = 2 along with φ(θ) = 16.122 defined via (5.14).

  α     d    n0d      n∗d         l       sl   l − n∗d   l/n∗d   b (Cov)     sb
 0.10  2.00    3    46.747    69.350   0.363   22.603    1.484    0.869    0.002
       1.65    6    89.560   107.791   0.309   18.231    1.204    0.880    0.001
       1.60    7   101.671   118.883   0.306   17.212    1.169    0.886    0.001
       1.55    8   116.936   133.357   0.315   16.421    1.140    0.885    0.001
       1.50    9   136.614   153.807   0.335   17.193    1.126    0.887    0.001
       1.35   16   249.377   265.963   0.415   16.586    1.067    0.892    0.001
       1.30   20   326.281   343.933   0.469   17.653    1.054    0.893    0.001
       1.20   41   675.655   692.523   0.649   16.868    1.025    0.894    0.001
       1.10  149  2472.420  2488.793   1.194   16.373    1.007    0.899    0.001
       1.08  229  3791.918  3808.575   1.474   16.657    1.004    0.898    0.001
 0.05  2.00    4    66.373    88.046   0.353   21.673    1.327    0.922    0.001
       1.65    8   127.162   145.463   0.342   18.301    1.144    0.934    0.001
       1.60    9   144.358   162.460   0.357   18.102    1.125    0.935    0.001
       1.55   11   166.031   182.527   0.351   16.496    1.099    0.937    0.001
       1.50   12   193.970   211.582   0.387   17.612    1.091    0.940    0.001
       1.35   22   354.077   370.903   0.483   16.827    1.048    0.943    0.001
       1.30   28   463.269   479.813   0.550   16.545    1.036    0.945    0.001
       1.20   58   959.326   977.251   0.763   17.924    1.019    0.947    0.001
       1.10  212  3510.459  3526.178   1.423   15.719    1.004    0.948    0.001
       1.08  325  5383.944  5401.646   1.758   17.702    1.003    0.949    0.001
 0.01  2.00    7   114.638   133.621   0.342   18.982    1.166    0.978    0.001
       1.65   14   219.632   236.506   0.395   16.875    1.077    0.983    0.001
       1.60   16   249.332   265.442   0.411   16.110    1.065    0.985    0.001
       1.55   18   286.765   304.224   0.442   17.458    1.061    0.985    0.001
       1.50   21   335.022   352.218   0.466   17.196    1.051    0.985    0.001
       1.35   37   611.554   628.981   0.619   17.426    1.028    0.988    0.000
       1.30   49   800.149   817.234   0.696   17.085    1.021    0.988    0.000
       1.20  100  1656.931  1675.262   0.988   18.331    1.011    0.989    0.000
       1.10  366  6063.199  6081.013   1.870   17.814    1.003    0.990    0.000
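The φ(θ) values quoted in the captions of Tables 8 and 9 permit a quick numerical spot-check of a few simulated rows. The snippet below, with the rows copied verbatim from those tables, verifies that (i) l/n∗d shrinks toward 1 as d → 1+ and (ii) the intervals (l − n∗d) ± 2sl capture φ(θ) + 1/2.

```python
# Three alpha = 0.10 rows of Table 8 (theta = 1): (d, n_star, l_bar, s_l).
table8_alpha10 = [
    (2.00, 15.064, 21.172, 0.229),
    (1.30, 105.143, 109.408, 0.251),
    (1.05, 3040.356, 3044.282, 0.559),
]

# (i) first-order efficiency: l_bar / n_star decreases toward 1 as d -> 1+.
ratios = [l / n for _, n, l, _ in table8_alpha10]
assert ratios[0] > ratios[1] > ratios[2] > 1.0

# (ii) second-order efficiency: (l_bar - n_star) +/- 2 s_l covers phi + 1/2.
checks = [  # (phi, n_star, l_bar, s_l)
    (3.624, 217.727, 221.723, 0.343),    # Table 8, alpha = 0.10, d = 1.20
    (3.624, 1131.233, 1136.223, 0.753),  # Table 8, alpha = 0.05, d = 1.10
    (16.122, 675.655, 692.523, 0.649),   # Table 9, alpha = 0.10, d = 1.20
]
for phi, n_star, l_bar, s_l in checks:
    diff = l_bar - n_star
    assert diff - 2 * s_l <= phi + 0.5 <= diff + 2 * s_l
```

All three coverage checks pass for the rows shown; the full tables, of course, provide the complete picture.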

All l values shown in column 5 overestimate n∗d across the board, whether the sample sizes are small (n∗d ≤ 100), moderate (100 < n∗d < 300), or large (n∗d ≥ 300), but the extent of overestimation goes down fast as n∗d grows. This is consistent with the asymptotic first-order efficiency property seen from Theorem 5.1, part (ii). We also note that the b values (column 9) are very close to the target coverage (Cov), 1 − α. This validates the asymptotic consistency property (Theorem 5.1, part (iii)). All estimated standard error values, namely sl and sb, came out small. Moreover, for each fixed combination of α and d, all 10,000 confidence intervals had lengths smaller than the corresponding bound (d − 1)/(d + 1), which validates our conclusion from Theorem 2.1. From Theorem 5.3, we know that the two-stage sampling strategy should

Table 10. Simulated performances of the two-stage estimation strategy defined via (5.2) and (5.3) with 10,000 replications: θ = 5, a = 2 along with φ(θ) = 100.123 defined via (5.14).

  α     d    n0d      n∗d         l        sl   l − n∗d   l/n∗d   b (Cov)     sb
 0.10  2.00    3   281.587   429.335    5.468  147.748    1.525    0.855    0.004
       1.65    6   539.484   647.041    4.359  107.558    1.199    0.875    0.003
       1.60    7   612.437   716.128    4.540  103.691    1.169    0.878    0.003
       1.55    8   704.385   799.077    4.461   94.691    1.134    0.878    0.003
       1.50    9   822.919   926.902    4.730  103.983    1.126    0.885    0.003
       1.35   16  1502.169  1604.316    5.985  102.147    1.068    0.889    0.003
       1.30   20  1965.416  2060.453    6.808   95.037    1.048    0.895    0.003
       1.20   41  4069.939  4173.376    9.381  103.437    1.025    0.898    0.003
 0.05  2.00    4   399.811   527.953    4.732  128.142    1.321    0.910    0.003
       1.65    8   765.984   873.174    4.877  107.190    1.140    0.933    0.003
       1.60    9   869.567   983.913    5.146  114.345    1.131    0.933    0.003
       1.55   11  1000.120  1102.890    5.200  102.770    1.103    0.933    0.003
       1.50   12  1168.419  1268.631    5.473  100.212    1.086    0.934    0.002
       1.35   22  2132.850  2246.196    7.046  113.346    1.053    0.944    0.002
       1.30   28  2790.591  2891.366    7.948  100.775    1.036    0.943    0.002
       1.20   58  5778.692  5888.524   11.091  109.832    1.019    0.949    0.002
 0.01  2.00    7   690.546   811.095    4.998  120.549    1.175    0.973    0.002
       1.65   14  1322.994  1425.289    5.682  102.295    1.077    0.983    0.001
       1.60   16  1501.901  1602.945    5.953  101.044    1.067    0.985    0.001
       1.55   18  1727.388  1842.444    6.447  115.056    1.067    0.985    0.001
       1.50   21  2018.072  2117.868    6.763   99.796    1.049    0.984    0.001
       1.35   37  3683.819  3787.650    9.006  103.831    1.028    0.987    0.001
       1.30   49  4819.858  4916.647   10.255   96.789    1.020    0.989    0.001

enjoy the asymptotic second-order efficiency property (5.22) for large n∗d, that is, as d → 1+. We may reasonably expect the differences between l and n∗d to hover approximately around φ(θ) or φ(θ) + 1/2 defined via (5.22) or (5.28), respectively. We may record: (i) φ(θ) = 3.624 when θ = 1, a = 2; (ii) φ(θ) = 16.122 when θ = 2, a = 2; and (iii) φ(θ) = 100.123 when θ = 5, a = 2. From column 7 in Tables 8–10, we see that the values of l − n∗d appear reasonably close to the corresponding φ(θ) + 1/2 values in the sense that the corresponding intervals (l − n∗d) ± 2sl include such values. These findings empirically validate the asymptotic second-order efficiency property seen from (5.22) or (5.28) in practice.

We have made our own R codes available in a "dropbox". These facilitated the data analyses presented in Subsection 4.3.1, Subsection 4.4, Subsection 5.3.2, and Subsection 5.4. We have provided an unrestricted link to access such codes in the list of references. One may additionally refer to R Core Team (2014).

6. Some concluding thoughts

In Section 3, we noted a straightforward and naturally arising positive and known lower bound n0d in (3.7) for the expression of n∗d in (3.4), which led to the


specific choice of a pilot sample size m from (3.8) used in both Sections 4 and 5. Such a choice of pilot size led to asymptotic second-order considerations under the two-stage estimation strategy (5.2)–(5.3).

In a different vein, Mukhopadhyay and Duggan (1997) developed a remarkable two-stage fixed-width confidence interval methodology for the normal mean when it was assumed that the unknown population variance had a known positive lower bound. It was truly remarkable because Mukhopadhyay and Duggan (1997) could develop asymptotic second-order properties for an appropriately modified two-stage estimation methodology. It was a clear vindication of Stein's (1945, 1949) original two-stage fixed-width confidence interval methodology. The proliferation of the core ideas from Mukhopadhyay and Duggan (1997) in many directions has been rather widespread, and it continues to grow in areas including big data problems as well as "small n, large p" problems. For brevity, we mention only some of the important references in order to connect the dots: Mukhopadhyay and Aoshima (1998), Aoshima and Mukhopadhyay (1998, 1999), Mukhopadhyay (1999a, b), Mukhopadhyay and Duggan (2000, 2001), Aoshima and Takada (2000), and Aoshima and Yata (2010).

We recall that our purely sequential estimation strategy (4.1) and (4.5) as well as our two-stage estimation strategy (5.2)–(5.3) used the same pilot size n0d from (3.8). But, when we compare the columns corresponding to the values of n − n∗d from Tables 2–4 with the values of l − n∗d from Tables 8–10, it becomes apparent that the n values are much tighter around n∗d than the l values. On the other hand, the two-stage estimation strategy (5.2)–(5.3) is operationally more convenient than the purely sequential estimation strategy (4.1) and (4.5). So, here is an important issue that we must grapple with: Should one implement the purely sequential estimation strategy or the two-stage estimation strategy?
Assume that in a practical situation one is able to implement either sampling methodology. Then, one should pick the more appropriate methodology by properly balancing the cost due to increased logistics and sampling operations against the intrinsic value of the extent of tightness required between the average sample size and n∗d. A practical problem must take into account all practical considerations as well as restrictions. Nothing less should be acceptable.

Acknowledgements

We received thoughtful and enthusiastic commentaries from two anonymous referees, an associate editor, and the Editor-in-Chief, Professor Makoto Aoshima. We are grateful to them for their most careful reading of the original version, especially for pointing out a number of necessary corrections very graciously. We thank them all.

References

Abramowitz, M. and Stegun, I. A. (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, tenth edition, New York, Dover.
Anscombe, F. J. (1952). Large sample theory of sequential estimation, Proceedings of Cambridge Philosophical Society, 48, 600–607.


Aoshima, M. and Mukhopadhyay, N. (1998). Fixed-width simultaneous confidence intervals for multinormal means in several intraclass correlation models, J. Multivar. Anal., 66, 46–63.
Aoshima, M. and Mukhopadhyay, N. (1999). Second-order properties of a two-stage fixed-size confidence region when the covariance matrix has a structure, Statistical Inference and Data Analysis (Tokyo, 1997), Commun. Stat., Theory and Methods, 28, 839–855.
Aoshima, M. and Takada, Y. (2000). Second order properties of a two stage procedure for comparing several treatments with a control, J. Japan Statist. Soc., 30, 27–41.
Aoshima, M. and Yata, K. (2010). Asymptotically second-order consistency for two-stage estimation methodologies and its applications, Ann. Inst. Stat. Math., 62, 571–600.
Banerjee, S. and Mukhopadhyay, N. (2016). A general sequential fixed-accuracy confidence interval estimation methodology for a positive parameter: Illustrations using health and safety data, Ann. Inst. Stat. Math., 68, 541–570.
Basu, D. (1964). Recovery of ancillary information, Sankhyā, 26, 3–16.
Chow, Y. S., Hsiung, C. and Lai, T. L. (1979). Extended-renewal theory and moment convergence in Anscombe's theorem, Ann. Probab., 7, 304–318.
Chow, Y. S. and Robbins, H. (1965). On the asymptotic theory of fixed-width sequential confidence intervals for the mean, Ann. Math. Stat., 36, 457–462.
Cox, D. R. and Hinkley, D. V. (1974). Theoretical Statistics, New York, Chapman & Hall. Reprinted 2000, Boca Raton, CRC.
Dropbox Link for R Programs. https://www.dropbox.com/s/0g0schj34aoy2px/R-Programs.pdf?dl=0, August 2017.
Fisher, R. A. (1934). Two new properties of mathematical likelihood, Proceedings of Royal Society, Series A, 144, 285–307.
Fisher, R. A. (1956). Statistical Methods and Scientific Inference, Edinburgh and London, Oliver and Boyd.
Fisher, R. A. (1973). Statistical Methods and Scientific Inference, third edition, London, Macmillan.
Ghosh, M., Fraser, D. A. S. and Reid, N. (2010). Ancillary statistics: A review, Statistica Sinica, 20, 1309–1322.
Ghosh, M. and Mukhopadhyay, N. (1981). Consistency and asymptotic efficiency of two-stage and sequential procedures, Sankhyā, Series A, 43, 220–227.
Ghosh, M., Mukhopadhyay, N. and Sen, P. K. (1997). Sequential Estimation, New York, Wiley.
Gut, A. (2012). Anscombe's theorem 60 years later, Sequential Analysis, 31, 368–396.
Joshi, S. N. and Shah, M. N. (1999). Estimation of P(Y < X) in the problem of the Nile, Statistical Inference and Design of Experiments (eds. U. J. Dixit and M. R. Satam), pp. 28–35, New Delhi, Narosa.
Kagan, A. and Malinovsky, Y. (2013). On the Nile problem by Sir Ronald Fisher, Electronic Journal of Statistics, 7, 1968–1982.
Kagan, A. and Malinovsky, Y. (2016). On the structure of UMVUEs, Sankhyā, Series A, 78, 124–132.
Lai, T. L. and Siegmund, D. (1977). A nonlinear renewal theory with applications to sequential analysis I, Annals of Statistics, 5, 946–954.
Lai, T. L. and Siegmund, D. (1979). A nonlinear renewal theory with applications to sequential analysis II, Annals of Statistics, 7, 60–76.
Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation, second edition, New York, Springer.
Mukhopadhyay, N. (1999a). Second-order properties of a two-stage fixed-size confidence region for the mean vector of a multivariate normal distribution, J. Multivar. Anal., 68, 250–263.
Mukhopadhyay, N. (1999b). Higher than second-order approximations via two-stage sampling, Sankhyā, Series A, 61, 254–269.
Mukhopadhyay, N. (2000). Probability and Statistical Inference, New York, Dekker.
Mukhopadhyay, N. (2014). On rereading D. Basu's jointly sufficient statistic example made up of two ancillaries and miscellany, Sankhyā, Series A, 76, 280–287.
Mukhopadhyay, N. and Aoshima, M. (1998). Multivariate multistage methodologies for simultaneous all pairwise comparisons, Metrika, 47, 185–201.


Mukhopadhyay, N. and Banerjee, S. (2014). Purely sequential and two-stage fixed-accuracy confidence interval estimation methods for count data from negative binomial distributions in statistical ecology: One-sample and two-sample problems, Sequential Analysis, 33, 251–285.
Mukhopadhyay, N. and Banerjee, S. (2015a). Sequential negative binomial problems and statistical ecology: A selected review with new directions, Statistical Methodology, 26, 34–60.
Mukhopadhyay, N. and Banerjee, S. (2015b). Purely sequential and two-stage bounded-length confidence intervals for the Bernoulli parameter with illustrations from health studies and ecology, Ordered Data Analysis, Modeling and Health Research Methods, In Honor of H. N. Nagaraja's 60th Birthday (eds. P. Choudhary, C. Nagaraja, and H. K. T. Ng), pp. 211–234, New York, Springer.
Mukhopadhyay, N. and Chattopadhyay, B. (2012). A tribute to Frank Anscombe and random central limit theorem from 1952, Sequential Analysis, 31, 265–277.
Mukhopadhyay, N. and Duggan, W. T. (1997). Can a two-stage procedure enjoy second order properties?, Sankhyā, Series A, 59, 435–448.
Mukhopadhyay, N. and Duggan, W. T. (2000). On a two-stage procedure having second-order properties with applications, Ann. Inst. Stat. Math., 51, 621–636.
Mukhopadhyay, N. and Duggan, W. T. (2001). A two-stage point estimation procedure for the mean of an exponential distribution and second-order results, Statistics & Decisions, 19, 155–171.
Mukhopadhyay, N. and Zhuang, Y. (2016). On fixed-accuracy and bounded-accuracy confidence interval estimation problems in Fisher's "Nile" example, Sequential Analysis, 35, 516–535.
Mukhopadhyay, N. and Zhuang, Y. (2017). MLE, information, ancillary complement, and conditional inference with illustrations, Methodology and Computing in Applied Probability, 19, 615–629.
Rao, C. R. (1973). Linear Statistical Inference, second edition, New York, Wiley.
R Core Team (2014). R: A Language and Environment for Statistical Computing, Vienna, R Foundation for Statistical Computing.
Sen, P. K. and Singer, J. M. (1993). Large Sample Methods in Statistics, New York, Chapman & Hall.
Siegmund, D. (1985). Sequential Analysis, New York, Springer.
Stein, C. (1945). A two sample test for a linear hypothesis whose power is independent of the variance, Ann. Math. Stat., 16, 243–258.
Stein, C. (1949). Some problems in sequential estimation (abstract), Econometrica, 17, 77–78.
Woodroofe, M. (1977). Second-order approximations for sequential point and interval estimation, Annals of Statistics, 5, 984–995.
Woodroofe, M. (1982). Nonlinear Renewal Theory in Sequential Analysis, CBMS 39, Philadelphia, SIAM.
