
Using Auxiliary Information in Statistical Function Estimation

By SERGEY TARIMA
Department of Statistics, University of Kentucky, Lexington, Kentucky, 40506-0027, U.S.A.
[email protected]

DMITRI PAVLOV
Biostatistics and Reporting, Pfizer Inc, 50 Pequot Ave, New London, CT 06320, U.S.A.
dmitri [email protected]

Abstract

In many cases auxiliary information is available from previous experiments. In this paper a general method of using such auxiliary information is considered. It is shown that the properties of standard empirical estimators can be improved by using this additional knowledge. As an example, an improved estimator of the empirical distribution function is given and is shown to have a faster rate of convergence. The method provides substantially more efficient confidence intervals than the standard procedures. It is shown by simulation that the theoretical results remain valid even for moderate sample sizes.

Some key words: Auxiliary information; Non-parametric statistics.

1. Introduction

Auxiliary information can be used in statistical function estimation to improve the properties of standard procedures. Different methods of incorporating auxiliary information have been suggested by a number of researchers (Dmitriev, Gal'chenko, Gurevich, Holt, Kuk, Pugachev, Zhang); in most cases this information is available as prior knowledge about the estimated distributions, parameters, etc. Those methods deal mostly with auxiliary information of a deterministic nature. For example, Chambers and Dunstan (1986) considered quantile estimation in the presence of an auxiliary variable whose values were known. The same "ultimate" knowledge of an auxiliary variable was used by Holt and Elliot (1991) to modify estimators derived from data affected by non-response.

The above results are not applicable when exact auxiliary information cannot be obtained. In most real problems auxiliary information comes from experts or is derived from previous experiments, yet very few papers provide methods for using auxiliary information in the form of statistical estimates from previous experiments. For example, Kuk and Mak (1989) used the median derived from an independent previous sample to improve the standard median estimator. Pugachev (1973) used the linear correlation between auxiliary and empirical information to derive modified estimators; his approach required exact knowledge of the means of some statistical functions correlated with the quantity being estimated. Gal'chenko and Gurevich (1991) suggested a modification of Pugachev's method for the case where the auxiliary information is not known exactly but is present in the form of statistical estimates from previous experiments.

This paper suggests a new methodology for using auxiliary information in the form of statistical estimates derived from previous experiments, generalizing the approach of Gal'chenko and Gurevich. The method is applied to modify the empirical cumulative distribution function and the Nelson cumulative hazard estimator.

2. Methodology

Consider the problem of statistical estimation of $H(t) = \int_{\mathbb{R}} \varphi(t,x)\,dF_X(x)$, where $\varphi(t,x)$ is a known function defined on $\mathbb{R}\times\mathbb{R}$ and $F_X(x)$ is an unknown cumulative distribution function of a random variable $X$ defined on the real line. In order to estimate $H(t)$ a sample of size $n$ was obtained; for simplicity this sample will be called the "current" sample. Let $\hat H_n(t)$ be a consistent and asymptotically normal estimator of $H(t)$ based on this sample.

Auxiliary information in the form of a vector of estimates $\tilde H_m(s) = (\tilde H_m(s_1),\dots,\tilde H_m(s_k))$ was obtained independently of the "current" sample, where the subscript $m$ reflects the size of the sample used to derive these estimates and $-\infty < s_1 < \dots < s_k < +\infty$. Again for simplicity this sample of size $m$ will be referred to as the "previous" sample.

Remark: In the notation for these estimators the "hat" corresponds to one type of estimator (e.g. an empirical estimator) and the "tilde" to a possibly, but not necessarily, different type (e.g. a smoothed estimator).

Taking the squared error as the loss function, the risk becomes the mean square error, which is used throughout the paper as the measure of an estimator's optimality. The estimator with minimal mean square error is sought in the family

$$\hat H^{\Lambda}(t) = \hat H_n(t) + \Lambda\big(\Delta\tilde H_n(s) - \Delta\tilde H_m(s)\big)^T, \qquad (1)$$

where $\Lambda = (\lambda_1,\dots,\lambda_k)$ is the vector of parameters indexing all possible estimators in the family (1), $\Delta\tilde H_n(s) = (\Delta\tilde H_n(s_1),\dots,\Delta\tilde H_n(s_k))$, $\Delta\tilde H_m(s) = (\Delta\tilde H_m(s_1),\dots,\Delta\tilde H_m(s_k))$, $\Delta\tilde H_n(s_i) = \tilde H_n(s_i) - \tilde H_n(s_{i-1})$ and $\Delta\tilde H_m(s_i) = \tilde H_m(s_i) - \tilde H_m(s_{i-1})$ for $i = 1,\dots,k$ with the convention $\tilde H_m(s_0) = \tilde H_n(s_0) = 0$, and $T$ is the transpose operator.

Let also $K_{ts} = \|\mathrm{cov}(\hat H_n(t),\, \Delta\tilde H_n(s_i) - \Delta\tilde H_m(s_i))\|_{i=1,\dots,k}$ and $K_{ss} = \|\mathrm{cov}(\Delta\tilde H_n(s_i) - \Delta\tilde H_m(s_i),\, \Delta\tilde H_n(s_j) - \Delta\tilde H_m(s_j))\|_{i,j=1,\dots,k}$; then the mean square error of $\hat H^{\Lambda}(t)$ is

$$MSE[\hat H^{\Lambda}(t)] = MSE[\hat H_n(t)] + 2\Lambda K_{ts}^T + \Lambda K_{ss}\Lambda^T. \qquad (2)$$

The condition for an extremum is $\nabla_{\Lambda} MSE[\hat H^{\Lambda}(t)] \equiv 0$, where $\nabla_{\Lambda}$ is the gradient vector:

$$\nabla_{\Lambda} MSE[\hat H^{\Lambda}(t)] = 2K_{ts} + 2\Lambda K_{ss} \equiv 0. \qquad (3)$$

It can be shown that

$$\Lambda_0 = -K_{ts}K_{ss}^{-1} \qquad (4)$$

is the "optimal" vector $\Lambda_0$, corresponding to the estimator

$$\hat H^{\Lambda_0}(t) = \hat H_n(t) - K_{ts}K_{ss}^{-1}\big(\Delta\tilde H_n(s) - \Delta\tilde H_m(s)\big)^T \qquad (5)$$

with

$$MSE(\hat H^{\Lambda_0}(t)) = MSE(\hat H_n(t)) - K_{ts}K_{ss}^{-1}K_{ts}^T \qquad (6)$$

minimal in the family (1).

The covariance matrix $K_{ss}$ is non-negative definite, but (4) requires $\det(K_{ss}) \ne 0$, so $K_{ss}$ must in fact be positive definite. In (6) the quantity $K_{ts}K_{ss}^{-1}K_{ts}^T$ is a quadratic form and is therefore always non-negative; moreover, $K_{ts}K_{ss}^{-1}K_{ts}^T = 0$ if and only if $K_{ts} \equiv 0$.

In practice the researcher usually does not know the values of $K_{ts}$ and $K_{ss}$ and has to use their estimators. Consider

$$K_{nts} = \|\mathrm{cov}_n(\hat H_n(t),\, \Delta\tilde H_n(s_i) - \Delta\tilde H_m(s_i))\|_{i=1,\dots,k},$$

an estimator of $K_{ts}$, and

$$K_{nss} = \|\mathrm{cov}_n(\Delta\tilde H_n(s_i) - \Delta\tilde H_m(s_i),\, \Delta\tilde H_n(s_j) - \Delta\tilde H_m(s_j))\|_{i,j=1,\dots,k},$$

an estimator for Kss . Remark: Subscript n in covn defines only that this covariance estimator derived from ”current” sample and does not define type of this estimator. Among desired properties of this estimator consistency should be pointed. Its consistency will be used later in Proposition 2. Applying Knts and Knss instead of unknown Kts and Kss modifies (5) and (6) to the following forms ˆ Λn0 (t) = H ˆ n (t) − Knts K−1 ˜ ˜ (7) H nss (∆Hn (s) − ∆Hm (s)) and

T ˆ Λn0 (t)) = M SEn (H ˆ n (t)) − Knts K−1 M SEn (H nss Knts .

(8)
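To make the plug-in construction (7)-(8) concrete, here is a minimal numerical sketch. It is our own illustration, not code from the paper; the function name and interface are hypothetical, and it assumes the covariance estimates have already been computed.

```python
import numpy as np

def adjusted_estimate(h_n, d_h_n, d_h_m, k_ts, k_ss, mse_n):
    """Minimum-MSE plug-in estimator (7) and its estimated MSE (8).

    h_n   : base estimate H_hat_n(t) from the "current" sample
    d_h_n : (k,) increments Delta H_tilde_n(s_i) on the current sample
    d_h_m : (k,) auxiliary increments Delta H_tilde_m(s_i), previous sample
    k_ts  : (k,) estimated vector K_nts
    k_ss  : (k, k) estimated matrix K_nss (must be positive definite)
    mse_n : estimated MSE of the base estimator
    """
    # Lambda_n0 = -K_nts K_nss^{-1}; K_nss is symmetric, so a linear solve
    # gives the same vector without forming the explicit inverse.
    lam = -np.linalg.solve(k_ss, k_ts)
    h_adj = h_n + lam @ (d_h_n - d_h_m)   # equation (7)
    mse_adj = mse_n + lam @ k_ts          # equation (8): mse_n - K_nts K_nss^{-1} K_nts^T
    return h_adj, mse_adj
```

If $K_{nss}$ fails to be positive definite, one simple amendment is to return the unmodified $\hat H_n(t)$ (that is, take $\Lambda = 0$), in line with the discussion below.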

$MSE_n(\hat H^{\Lambda_{n0}}(t))$ can be used as an estimator of the unknown MSE defined in (6).

A new requirement follows from (7) and (8): $K_{nss}$ must be positive definite. A similar problem has already been noted and resolved by various amendments; see, for example, Dmitriev and Ustinov (1988). Methods of sequential analysis can also be applied here to generate a proper stopping rule for the sample size.

The asymptotic properties of (5) are described by Proposition 1.

Proposition 1. Let
1) $\xi_n(t) = a_n\big(\hat H_n(t) - H(t)\big) \to \xi(t)$ in distribution as $n \to \infty$, where $a_n$ is a sequence of positive real numbers such that $a_n \to +\infty$, $\xi(t) = N\big(0, \sigma_1^2(t)\big)$ in distribution, and $\sigma_1^2(t) < +\infty$ for all $t$;
2) $\zeta_n(s_j) = b_n\big(\tilde H_n(s_j) - H(s_j)\big) \to \zeta(s_j)$ in distribution as $n \to \infty$, where $b_n$ is a sequence of positive real numbers such that $b_n \to +\infty$, $\zeta(s_j) = N\big(0, \sigma_2^2(s_j)\big)$ in distribution, and $\sigma_2^2(s_j) < +\infty$ for all $j$;
3) $K_{ss}$ is positive definite;
4) $m = f(n)$, where $f(n)$ is an increasing function;
5) $\Delta\zeta_n(s_j) = \zeta_n(s_j) - \zeta_n(s_{j-1})$ and $\Delta\zeta_{f(n)}(s_j) = \zeta_{f(n)}(s_j) - \zeta_{f(n)}(s_{j-1})$.

Then $\hat H_n^{\Lambda_0}(t)$ is consistent, and if $b_n/b_m = b_n/b_{f(n)} \to w \in [0, +\infty)$ as $n \to \infty$, then $\eta_n(t) \to \eta(t)$ in distribution as $n \to \infty$, where $\eta_n(t) = a_n\big(\hat H_n^{\Lambda_0}(t) - H(t)\big)$, $\eta(t) = N\big(0, \sigma_1^2(t) - (1+w^2)^{-1}C_{ts}C_{ss}^{-1}C_{ts}^T\big)$ in distribution, $C_{ts} = \|\mathrm{cov}(\xi(t), \Delta\zeta(s_i))\|_{i=1,\dots,k}$, and $C_{ss} = \|\mathrm{cov}(\Delta\zeta(s_i), \Delta\zeta(s_j))\|_{i,j=1,\dots,k}$.

Consider the asymptotic variance of $\eta(t)$. The term $(1+w^2)^{-1}C_{ts}C_{ss}^{-1}C_{ts}^T$ is non-negative, and its value depends on the sample sizes only through $w$. If $w$ is much larger than 1, the decrease in mean square error is very small and the asymptotic properties of the new estimator are almost the same as those of the unmodified one. If $w$ is close to 0, the auxiliary information derived from the "previous" sample is almost exact and the asymptotic properties of the new estimator approach the case of incorporating exactly known information.

Proposition 1 gives the asymptotic properties of estimator (5), but (5) cannot be used in most practical cases because $\Lambda_0$ is usually unknown. In these situations estimator (7) should be applied; its asymptotic properties are described by Proposition 2.

Proposition 2. Suppose the assumptions of Proposition 1 hold and
1) $a_n^2\big(\mathrm{cov}_n(\hat H_n(t), \Delta\tilde H_n(s_i)) - \mathrm{cov}(\hat H_n(t), \Delta\tilde H_n(s_i))\big)$ are asymptotically normal with mean 0 and finite variance (which can be zero), $i = 1,\dots,k$;
2) $a_n^2\big(\mathrm{cov}_n(\Delta\tilde H_n(s_i) - \Delta\tilde H_m(s_i), \Delta\tilde H_n(s_j) - \Delta\tilde H_m(s_j)) - \mathrm{cov}(\Delta\tilde H_n(s_i) - \Delta\tilde H_m(s_i), \Delta\tilde H_n(s_j) - \Delta\tilde H_m(s_j))\big)$ are asymptotically normal with mean 0 and finite variance (which can be zero), $i, j = 1,\dots,k$.

Then $a_n\big(\hat H^{\Lambda_{n0}}(t) - H(t)\big) \to \eta(t)$ as $n \to \infty$, where $\eta(t)$ is defined in Proposition 1.

Remarks: 1) Proposition 2 adds two assumptions on the convergence of the covariance estimators. If these assumptions do not hold, the asymptotic properties of estimator (7) can not only differ from those of (5) but can even be worse (in the sense of asymptotic variance) than those of the unmodified estimator. 2) When $w = 0$, (7) coincides with the estimator derived by the method of correlated processes (Pugachev, 1973) and has the same asymptotic properties as the empirical likelihood estimator in the presence of auxiliary information of the same type (Zhang, 1996).

3. "Memoryless" case

Assume that $H_n(t_2) - H_n(t_1)$ is uncorrelated with $H_n(t_4) - H_n(t_3)$ for arbitrary $t_1 < t_2 < t_3 < t_4$, that is,

$$E\big((H_n(t_2) - H_n(t_1)) \cdot (H_n(t_4) - H_n(t_3))\big) = 0. \qquad (9)$$

Then

$$E\big(\hat H_n(x) \cdot \tilde H_n(y)\big) = E\big(\hat H_n(\min(x,y))\,\tilde H_n(\min(x,y))\big) \qquad (10)$$

for any positive $x$ and $y$. With these assumptions on $H_n(t)$,

$$K_{ts} = \big\|\mathrm{cov}\big(\hat H_n(t),\, \Delta\tilde H_n(s_i) - \Delta\tilde H_m(s_i)\big)\big\|_{i=1,\dots,k} = \big\|\mathrm{cov}\big(\hat H_n(t),\, \Delta\tilde H_n(s_i)\big)\big\|_{i=1,\dots,k} =$$
$$= \big\|\mathrm{cov}\big(\hat H_n(\min(t,s_i)), \tilde H_n(\min(t,s_i))\big) - \mathrm{cov}\big(\hat H_n(\min(t,s_{i-1})), \tilde H_n(\min(t,s_{i-1}))\big)\big\|_{i=1,\dots,k}$$

and $K_{ss}$ becomes a diagonal matrix:

$$K_{ss} = \big\|\mathrm{cov}(\Delta\tilde H_n(s_i) - \Delta\tilde H_m(s_i),\, \Delta\tilde H_n(s_j) - \Delta\tilde H_m(s_j))\big\|_{i,j=1,\dots,k} =$$
$$= \big\|\mathrm{cov}(\Delta\tilde H_n(s_i), \Delta\tilde H_n(s_j)) + \mathrm{cov}(\Delta\tilde H_m(s_i), \Delta\tilde H_m(s_j))\big\|_{i,j=1,\dots,k} =$$
$$= \mathrm{diag}\big\|\mathrm{cov}(\Delta\tilde H_n(s_i), \Delta\tilde H_n(s_i)) + \mathrm{cov}(\Delta\tilde H_m(s_i), \Delta\tilde H_m(s_i))\big\|_{i=1,\dots,k}.$$

In this case estimator (7) may be rewritten as

$$\hat H^{\Lambda_0}(t) = \hat H(t) - \sum_{j=1}^{k}\big[\Delta\tilde H_n(s_j) - \Delta\tilde H_m(s_j)\big] \cdot \frac{\mathrm{cov}\big(\hat H_n(\min(t,s_j)), \tilde H_n(\min(t,s_j))\big) - \mathrm{cov}\big(\hat H_n(\min(t,s_{j-1})), \tilde H_n(\min(t,s_{j-1}))\big)}{\mathrm{cov}(\Delta\tilde H_n(s_j), \Delta\tilde H_n(s_j)) + \mathrm{cov}(\Delta\tilde H_m(s_j), \Delta\tilde H_m(s_j))} \qquad (11)$$

and the mean square error (8) as

$$MSE(\hat H^{\Lambda_0}(t)) = MSE(\hat H(t)) - \sum_{j=1}^{k} \frac{\Big[\mathrm{cov}\big(\hat H_n(\min(t,s_j)), \tilde H_n(\min(t,s_j))\big) - \mathrm{cov}\big(\hat H_n(\min(t,s_{j-1})), \tilde H_n(\min(t,s_{j-1}))\big)\Big]^2}{\mathrm{cov}(\Delta\tilde H_n(s_j), \Delta\tilde H_n(s_j)) + \mathrm{cov}(\Delta\tilde H_m(s_j), \Delta\tilde H_m(s_j))}. \qquad (12)$$

Remarks: 1) The denominators in (11) and (12) can be transformed as follows:

$$\mathrm{cov}(\Delta\tilde H_n(s_i), \Delta\tilde H_n(s_i)) + \mathrm{cov}(\Delta\tilde H_m(s_i), \Delta\tilde H_m(s_i)) = MSE(\Delta\tilde H_n(s_i)) + MSE(\Delta\tilde H_m(s_i)) =$$
$$= MSE(\tilde H_n(s_i)) - MSE(\tilde H_n(s_{i-1})) + MSE(\tilde H_m(s_i)) - MSE(\tilde H_m(s_{i-1})),$$

and if the estimators involved are unbiased, the mean square errors become variances. 2) If the "hat" and "tilde" correspond to estimators of the same type, the covariances in the numerators of (11) and (12) also become mean square errors (variances in the unbiased case).
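Because $K_{ss}$ is diagonal here, the matrix inversion in (7) collapses into $k$ independent scalar ratios. The following sketch, continuing the hypothetical interface of the earlier example, shows (11)-(12) in this scalar form:

```python
import numpy as np

def memoryless_adjustment(h_n, d_h_n, d_h_m, num, den, mse_n):
    """Scalar form (11)-(12) of the adjustment when K_ss is diagonal.

    num : (k,) numerators of (11): covariance at min(t, s_j)
          minus covariance at min(t, s_{j-1})
    den : (k,) denominators of (11): cov(dH_n(s_j)) + cov(dH_m(s_j))
    """
    h_adj = h_n - np.sum((d_h_n - d_h_m) * num / den)  # equation (11)
    mse_adj = mse_n - np.sum(num**2 / den)             # equation (12)
    return h_adj, mse_adj
```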

4. Empirical cumulative distribution function modification

Suppose $X_1,\dots,X_n$ are independent and identically distributed random variables with an unknown cumulative distribution function $F(t)$, $t \in (-\infty, \infty)$. Consider the problem of non-parametric estimation of $F(t)$ in the presence of auxiliary information. In addition to the "current" sample $X_1,\dots,X_n$, the statistic $F_m(s) = m^{-1}\sum_{i=1}^{m} I_{(-\infty,s)}(Y_i)$ was derived from a "previous" sample $Y_1,\dots,Y_m$, where $Y_1,\dots,Y_m$ have the same distribution as $X_1,\dots,X_n$. The indicator function $I_A(X)$ equals 1 if $X \in A$ and 0 otherwise. Note that only the statistic $F_m(s)$ is known, not the whole "previous" sample.

All the methodology of Section 2 applies here with $\hat H_n(t) = F_n(t)$, $\tilde H_m(s) = F_m(s)$, $H(t) = \int_{\mathbb{R}} \varphi(t,x)\,dF_X(x) = \int_{\mathbb{R}} I_{(-\infty,t)}(x)\,dF(x) = F(t)$, $k = 1$, and $\tilde H_n(s) = F_n(s)$.

To modify the empirical cumulative distribution function $F_n(t)$, consider the family

$$F_n^{\lambda}(t) = F_n(t) + \lambda\,(F_n(s) - F_m(s)), \qquad (13)$$

which is a special case of (1). The optimal $\lambda_0$ defined by (4) takes the form

$$\lambda_0 = -\frac{m}{n+m}\,\frac{F(\min(s,t)) - F(s)F(t)}{F(s)(1 - F(s))} \qquad (14)$$

and the estimator with minimal mean square error in (13) is

$$F_n^{\lambda_0}(t) = F_n(t) - \frac{m}{n+m}\,\frac{F(\min(s,t)) - F(s)F(t)}{F(s)(1 - F(s))}\,(F_n(s) - F_m(s)). \qquad (15)$$
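For completeness, here is the short computation behind (14), a step the paper leaves implicit. With $k = 1$ and both estimators empirical,

$$K_{ts} = \mathrm{cov}(F_n(t),\, F_n(s) - F_m(s)) = \mathrm{cov}(F_n(t), F_n(s)) = \frac{F(\min(s,t)) - F(s)F(t)}{n},$$
$$K_{ss} = \mathrm{var}(F_n(s)) + \mathrm{var}(F_m(s)) = F(s)(1 - F(s))\Big(\frac{1}{n} + \frac{1}{m}\Big),$$

so (4) gives

$$\lambda_0 = -\frac{K_{ts}}{K_{ss}} = -\frac{m}{n+m}\,\frac{F(\min(s,t)) - F(s)F(t)}{F(s)(1 - F(s))}.$$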

$F_n(t)$ is an unbiased estimator, so the whole family (13) consists of unbiased estimators and their mean square errors are variances. The variance of (15) is

$$\mathrm{var}\big(F_n^{\lambda_0}(t)\big) = \frac{F(t)(1 - F(t))}{n} - \frac{m}{n(n+m)}\,\frac{\big(F(\min(s,t)) - F(s)F(t)\big)^2}{F(s)(1 - F(s))}. \qquad (16)$$

But $\lambda_0$ in (14) is unknown, so (15) and (16) cannot be used without a proper modification. The adaptive estimator (7) can be used in this situation, which transforms (15) into

$$F_n^{\lambda_{n0}}(t) = F_n(t) - \frac{m}{n+m}\,\frac{F_n(\min(s,t)) - F_n(s)F_n(t)}{F_n(s)(1 - F_n(s))}\,(F_n(s) - F_m(s)). \qquad (17)$$

According to Proposition 2, estimator (17) has the same asymptotic properties as (15). However, $F_n^{\lambda_{n0}}(t)$ cannot be used in all cases: as soon as $F_n(s) = 0$ or $F_n(s) = 1$, the numerator and denominator in (14) both become 0. Different amendments can be used here; for example, setting $F_n^{\lambda_{n0}}(t) = F_n(t)$ whenever $\lambda_{n0} = 0/0$ resolves the problem.

If $t \le s$, then (17) can be transformed into

$$F_n^{\lambda_{n0}}(t) = F_{n+m}(s)\,\frac{F_n(t)}{F_n(s)}, \qquad (18)$$

where $F_{n+m}(s) = \big(n F_n(s) + m F_m(s)\big)/(n+m)$ is the pooled estimate of $F(s)$ from both samples, and if $t > s$, (17) becomes

$$F_n^{\lambda_{n0}}(t) = F_{n+m}(s) + \big[1 - F_{n+m}(s)\big]\,\frac{F_n(t) - F_n(s)}{1 - F_n(s)}. \qquad (19)$$

From (18) and (19) we obtain a single representation of (17):

$$F_n^{\lambda_{n0}}(t) = F_{n+m}(s)\,\frac{F_n(\min(t,s))}{F_n(s)} + \big[1 - F_{n+m}(s)\big]\,\frac{F_n(\max(t,s)) - F_n(s)}{1 - F_n(s)}. \qquad (20)$$
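A direct implementation of (18)-(20) is straightforward. The sketch below is our own code with a hypothetical function name; it uses the pooled estimate $F_{n+m}(s)$ and falls back to $F_n(t)$ in the $0/0$ case:

```python
import numpy as np

def modified_ecdf(x, t, s, f_m_s, m):
    """Modified empirical cdf (20) using an auxiliary estimate F_m(s).

    x     : array, the "current" sample X_1, ..., X_n
    t     : evaluation point
    s     : point where the auxiliary estimate is available
    f_m_s : F_m(s), the only statistic retained from the "previous" sample
    m     : size of the "previous" sample
    """
    n = len(x)
    fn_t, fn_s = np.mean(x < t), np.mean(x < s)   # F_n with the I_(-inf, u) convention
    if fn_s == 0.0 or fn_s == 1.0:                # 0/0 amendment: keep F_n(t)
        return fn_t
    f_nm_s = (n * fn_s + m * f_m_s) / (n + m)     # pooled estimate F_{n+m}(s)
    if t <= s:                                    # equation (18)
        return f_nm_s * fn_t / fn_s
    return f_nm_s + (1 - f_nm_s) * (fn_t - fn_s) / (1 - fn_s)  # equation (19)
```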

Estimator (20) was also derived by Little and Rubin (1987), who applied maximum likelihood estimation to monotone missing data with an ignorable missing-data mechanism. As $m(m+n)^{-1}$ goes to 0, estimator (17) converges to the empirical one. As $m(m+n)^{-1}$ goes to 1, the situation becomes similar to the case when the auxiliary information is known exactly. In the limiting case when $m = +\infty$ and $n$ is finite, so that $m(m+n)^{-1} = 1$, (17) and (20) become

$$F_n^{\lambda_{n0}}(t) = F_n(t) - \frac{F_n(\min(s,t)) - F_n(s)F_n(t)}{F_n(s)(1 - F_n(s))}\,(F_n(s) - F(s)) \qquad (21)$$

and

$$F_n^{\lambda_{n0}}(t) = F(s)\,\frac{F_n(\min(t,s))}{F_n(s)} + \big[1 - F(s)\big]\,\frac{F_n(\max(t,s)) - F_n(s)}{1 - F_n(s)}. \qquad (22)$$

Notice that (22) is also the non-parametric maximum likelihood estimator constructed with prior knowledge of $F(s)$ (Owen, 2001).

To show graphically how the modified estimator (17) differs from the empirical one, consider the following example.

Example 4.1. Suppose the "current" and "previous" samples were obtained from $F(t) = 1 - \exp(-t)$. Auxiliary information is available in the form of the statistic $F_m(1)$ ($s = 1$, $k = 1$) based on the "previous" sample. The following three experiments were conducted:
1) n = 50, m = 150;
2) n = 50, m = 10;
3) n = 50, m = 10000.
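A simulation along the lines of Example 4.1 can be sketched in a few lines (our own illustration; plotting omitted, and it reuses the hypothetical modified_ecdf function above):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, s = 50, 150, 1.0
f_true = lambda t: 1.0 - np.exp(-t)            # F(t) = 1 - exp(-t)

x = rng.exponential(size=n)                    # "current" sample
f_m_s = np.mean(rng.exponential(size=m) < s)   # only F_m(1) is kept from the "previous" sample

grid = np.linspace(0.1, 3.0, 30)
emp = np.array([np.mean(x < t) for t in grid])
mod = np.array([modified_ecdf(x, t, s, f_m_s, m) for t in grid])

# Compare both estimators against the true cdf over the grid
print("max |F_n - F|     :", np.max(np.abs(emp - f_true(grid))))
print("max |F_n^adj - F| :", np.max(np.abs(mod - f_true(grid))))
```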

Pic. 1: Empirical estimator (n = 50)
Pic. 2: Modified estimator (n = 50, m = 150)
Pic. 3: Empirical estimator (n = 50)
Pic. 4: Modified estimator (n = 50, m = 10)
Pic. 5: Empirical estimator (n = 50)
Pic. 6: Modified estimator (n = 50, m = 10000)

Pictures 1, 3 and 5 show empirical estimators with their 95% confidence intervals; Pictures 2, 4 and 6 show the modified estimators (with their confidence intervals) based on the same samples as the corresponding empirical ones. Picture 2 shows that the closer $t$ is to $s$ (and the stronger the correlation between $H_n(t)$ and $H_m(s)$), the narrower the confidence interval becomes; the greatest influence of the auxiliary information appears at the point $s$. Picture 6 describes the situation where the size of the "previous" sample ($m$) greatly exceeds the size of the "current" sample ($n$). Pictures 1-6 show that the best improvement of the confidence interval corresponds to $t = 1$, because at this time point the correlation between $F_m(t)$ and $F_n(t)$ is strongest. Pictures 1 and 2 were plotted on the basis of only one simulation.

The methodology of Section 2 can be applied not only to the empirical cdf but also to other estimators. Consider, for example, a modification of the Nelson cumulative hazard estimator.

5. Modification of the Nelson cumulative hazard estimator

Let $Y_1,\dots,Y_n$ be independent and identically distributed random variables with an unknown $F_Y(t)$, $t \in [0,\infty)$, and $C_1,\dots,C_n$ with an unknown $F_C(t)$, $t \in [0,\infty)$. If $T_j = \min(Y_j, C_j)$ and $\delta_j = I(Y_j \le C_j)$ for $j = 1,\dots,n$, then the paired observations $(T_1,\delta_1),\dots,(T_n,\delta_n)$ are called right-censored data.

One may be interested in estimating the cumulative hazard function $H(t) = \int_0^t (1 - F_Y(x-))^{-1}\,dF_Y(x)$, where $F(x-) = \lim_{\varepsilon \to 0} F(x - \varepsilon)$, $\varepsilon > 0$. Well-known estimators may be used for this purpose, but it is possible to improve them when extra information is known. Consider the Nelson estimator of the cumulative hazard function,

$$\hat H_n(t) = H_{n,NA}(t) = \sum_{x \le t} \frac{d_n(x)}{R_n(x)}, \qquad (23)$$

where $d_n(x) = \sum_{i=1}^{n} I(Y_i = x)\cdot\delta_i$ (the number of uncensored events $Y_i$ registered at $x$) and $R_n(x) = \sum_{i=1}^{n} I(Y_i \ge x)$ (the number of events $Y_i$ registered at and after $x$), and Aalen's estimator of its mean square error,

$$MSE_n(\hat H_n(t)) = MSE_n(\hat H_{n,NA}(t)) = \sum_{x \le t} \frac{d_n(x)}{R_n^2(x)}. \qquad (24)$$

Fleming and Harrington (1991) is a good reference for estimators (23) and (24).

Suppose asymptotically normal statistical estimates

$$\tilde H_m(s_i) = H_{m,NA}(s_i) = \sum_{x \le s_i} \frac{d_m(x)}{R_m(x)}$$

of $H(s_i)$ were derived from the "previous" sample, and covariance estimates (satisfying the assumptions of Proposition 2)

$$\mathrm{cov}_n(H_m(s_i), H_m(s_j)) = MSE_n(\hat H_{m,NA}(\min(s_i,s_j)))$$

and

$$\mathrm{cov}_n(H_n(s_i), H_n(s_j)) = MSE_n(\hat H_{n,NA}(\min(s_i,s_j)))$$

can be calculated from the corresponding samples, where $i, j = 1,\dots,k$ (see equation (31)). It is reasonable to assume that this extra data is independent of $(T_1,\delta_1),\dots,(T_n,\delta_n)$.

It can be shown that the assumptions of the "memoryless" situation (9) hold when $\hat H_n = \hat H_{NA}$. Applying (23) and (24) to (11) gives the modified Nelson estimator

$$H^{*}_{NA}(t) = \sum_{x \le t} \frac{d_n(x)}{R_n(x)} - \frac{m}{n+m} \sum_{j=1}^{k} \Bigg[\sum_{\min(t,s_{j-1}) \le x \le \min(t,s_j)} \frac{d_n(x)}{R_n^2(x)}\Bigg] \Bigg[\sum_{s_{j-1} \le x \le s_j} \frac{d_n(x)}{R_n^2(x)}\Bigg]^{-1} \Bigg[\sum_{s_{j-1} \le x \le s_j} \frac{d_n(x)}{R_n(x)} - \sum_{s_{j-1} \le x \le s_j} \frac{d_m(x)}{R_m(x)}\Bigg] \qquad (25)$$

and applying (23) and (24) to (12) gives

$$MSE_n(H^{*}_{NA}(t)) = \sum_{x \le t} \frac{d_n(x)}{R_n^2(x)} - \frac{m}{n+m} \sum_{j=1}^{k} \Bigg[\sum_{\min(t,s_{j-1}) \le x \le \min(t,s_j)} \frac{d_n(x)}{R_n^2(x)}\Bigg]^2 \Bigg[\sum_{s_{j-1} \le x \le s_j} \frac{d_n(x)}{R_n^2(x)}\Bigg]^{-1}. \qquad (26)$$
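As an illustration of (23)-(25) for a single auxiliary point ($k = 1$, $s_0 = 0$), consider the following sketch. It is our own hypothetical code, not from the paper, and assumes the auxiliary Nelson estimate $H_m(s)$ has been carried over from the "previous" sample:

```python
import numpy as np

def nelson_aalen(times, events, t):
    """Nelson estimator (23) and Aalen's MSE estimator (24) at time t."""
    h = mse = 0.0
    for x in np.unique(times[(events == 1) & (times <= t)]):
        d = np.sum((times == x) & (events == 1))  # d_n(x): uncensored events at x
        r = np.sum(times >= x)                    # R_n(x): subjects at risk at x
        h += d / r
        mse += d / r**2
    return h, mse

def modified_nelson(times, events, t, s, h_m_s, m):
    """Modified Nelson estimator (25) with one auxiliary estimate H_m(s), k = 1."""
    n = len(times)
    h_t, _ = nelson_aalen(times, events, t)
    h_s, mse_s = nelson_aalen(times, events, s)
    _, mse_min = nelson_aalen(times, events, min(t, s))
    if mse_s == 0.0:                              # no events observed up to s: no correction
        return h_t
    return h_t - m / (n + m) * (mse_min / mse_s) * (h_s - h_m_s)
```

With $k = 1$ and $s_0 = 0$, the three bracketed sums in (25) reduce to the ratio $\texttt{mse\_min}/\texttt{mse\_s}$ and the difference $\hat H_n(s) - \tilde H_m(s)$, which is what the last line computes.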


6. Conclusion

The problem of using auxiliary information in statistical function estimation was considered, with the auxiliary information presented in the form of statistics obtained from previous experiments. New estimators have been developed which generalize the results of Gal'chenko and Gurevich to a wider class of auxiliary information (arbitrary convergence rate). The theorems given provide the asymptotic properties of these estimators, such as consistency and normality. The results were used to build modified empirical estimators of the cumulative distribution function and the cumulative hazard function. The suggested estimators are consistent and asymptotically normal, and this way of estimation produces better results in terms of variance. The discussed methodology does not require strong assumptions and can be used to improve the properties of other widely used procedures.

7. Appendix

Proof of Proposition 1. From the first assumption, $\hat H_n(t) \to H(t)$ in probability as $n \to \infty$; from the second, $\tilde H_n(s_j) \to H(s_j)$ as $n \to \infty$ for all $j$; and from the second and fourth, $\tilde H_{f(n)}(s_j) \to H(s_j)$ as $n \to \infty$ for all $j$. So $\hat H_n^{\Lambda_0}(t)$ is consistent.

Notice that

$$b_n\big(\Delta\tilde H_n(s_j) - \Delta H(s_j)\big) = \zeta_n(s_j) - \zeta_n(s_{j-1}) = \Delta\zeta_n(s_j)$$

and

$$b_n\big(\Delta\tilde H_{f(n)}(s_j) - \Delta H(s_j)\big) = \frac{b_n}{b_{f(n)}}\,b_{f(n)}\big(\Delta\tilde H_{f(n)}(s_j) - \Delta H(s_j)\big) = \frac{b_n}{b_{f(n)}}\big(\zeta_{f(n)}(s_j) - \zeta_{f(n)}(s_{j-1})\big) = \frac{b_n}{b_{f(n)}}\,\Delta\zeta_{f(n)}(s_j),$$

so $\eta_n(t)$ may be rewritten as

$$\eta_n(t) = \xi_n(t) - K_{ts}K_{ss}^{-1}\left\|\frac{a_n}{b_n}\Delta\zeta_n(s_j) - \frac{a_n}{b_{f(n)}}\Delta\zeta_{f(n)}(s_j)\right\|^T_{j=1,\dots,k}. \qquad (27)$$

When $K_{ss}$ is positive definite,

$$K_{ts}K_{ss}^{-1} = \big\|\mathrm{cov}(\hat H_n(t), \Delta\tilde H_n(s_i) - \Delta\tilde H_m(s_i))\big\|_{i=1,\dots,k} \times \big\|\mathrm{cov}(\Delta\tilde H_n(s_i) - \Delta\tilde H_m(s_i), \Delta\tilde H_n(s_j) - \Delta\tilde H_m(s_j))\big\|^{-1}_{i,j=1,\dots,k}.$$

First consider the following representation of the elements of the matrix $K_{ts}$:

$$\mathrm{cov}(\hat H_n(t), \Delta\tilde H_n(s_i) - \Delta\tilde H_m(s_i)) = \mathrm{cov}(\hat H_n(t), \tilde H_n(s_i)) - \mathrm{cov}(\hat H_n(t), \tilde H_n(s_{i-1})) - \mathrm{cov}(\hat H_n(t), \tilde H_m(s_i)) + \mathrm{cov}(\hat H_n(t), \tilde H_m(s_{i-1})),$$

where the covariances may be represented as

$$\mathrm{cov}(\hat H_n(t), \tilde H_n(s_i)) = \mathrm{cov}(\hat H_n(t) - H(t), \tilde H_n(s_i) - H(s_i)) = \frac{1}{a_n b_n}\,\mathrm{cov}\big(a_n(\hat H_n(t) - H(t)),\, b_n(\tilde H_n(s_i) - H(s_i))\big) = \frac{1}{a_n b_n}\,\mathrm{cov}(\xi_n(t), \zeta_n(s_i)).$$

So

$$\mathrm{cov}(\hat H_n(t), \tilde H_n(s_i)) = \frac{1}{a_n b_n}\,\mathrm{cov}(\xi_n(t), \zeta_n(s_i)). \qquad (28)$$

By analogy,

$$\mathrm{cov}(\hat H_n(t), \tilde H_n(s_{i-1})) = \frac{1}{a_n b_n}\,\mathrm{cov}(\xi_n(t), \zeta_n(s_{i-1})), \qquad (29)$$

$$\mathrm{cov}(\hat H_n(t), \tilde H_m(s_i)) = \frac{1}{a_n b_m}\,\mathrm{cov}(\xi_n(t), \zeta_m(s_i)) = 0, \qquad (30)$$

$$\mathrm{cov}(\hat H_n(t), \tilde H_m(s_{i-1})) = \frac{1}{a_n b_m}\,\mathrm{cov}(\xi_n(t), \zeta_m(s_{i-1})) = 0. \qquad (31)$$

The zeroes in (30) and (31) come from the independence of the "current" and "previous" samples. Then the elements of $K_{ts}$ may be represented as

$$\mathrm{cov}(\hat H_n(t), \Delta\tilde H_n(s_i) - \Delta\tilde H_m(s_i)) = \frac{1}{a_n b_n}\,\mathrm{cov}(\xi_n(t), \Delta\zeta_n(s_i)).$$

A similar technique gives the following representation of the elements of $K_{ss}$:

$$\mathrm{cov}(\Delta\tilde H_n(s_i) - \Delta\tilde H_m(s_i), \Delta\tilde H_n(s_j) - \Delta\tilde H_m(s_j)) = \frac{1}{b_n^2}\,\mathrm{cov}(\Delta\zeta_n(s_i), \Delta\zeta_n(s_j)) + \frac{1}{b_m^2}\,\mathrm{cov}(\Delta\zeta_m(s_i), \Delta\zeta_m(s_j)).$$

Now consider the asymptotic situation, where $b_n/b_m = b_n/b_{f(n)} \to w \in [0, +\infty)$ as $n \to \infty$. Then

$$\eta_n(t) \to \eta(t) = \xi(t) - \|\mathrm{cov}(\xi(t), \Delta\zeta(s_i))\|_{i=1,\dots,k} \times \|\mathrm{cov}(\Delta\zeta(s_i), \Delta\zeta(s_j)) + w^2\,\mathrm{cov}(\Delta\zeta(s_i), \Delta\zeta(s_j))\|^{-1}_{i,j=1,\dots,k} \times \|\Delta\zeta_1(s_j) - w\,\Delta\zeta_2(s_j)\|^T_{j=1,\dots,k} =$$
$$= \xi(t) - \|\mathrm{cov}(\xi(t), \Delta\zeta(s_i))\|_{i=1,\dots,k} \times \|(1 + w^2)\,\mathrm{cov}(\Delta\zeta(s_i), \Delta\zeta(s_j))\|^{-1}_{i,j=1,\dots,k} \times \|\Delta\zeta_1(s_j) - w\,\Delta\zeta_2(s_j)\|^T_{j=1,\dots,k}.$$

Note that $\Delta\zeta_1(\cdot)$ and $\Delta\zeta_2(\cdot)$ are independent random vectors with the same $k$-dimensional distribution function as $\Delta\zeta(\cdot)$. $\eta(t)$ is a normal random variable because it is a linear combination of normal random variables; $E(\eta(t)) = 0$, and after routine algebraic transformations we find

$$\mathrm{var}(\eta(t)) = \mathrm{var}(\xi(t)) - \|\mathrm{cov}(\xi(t), \Delta\zeta(s_i))\|_{i=1,\dots,k} \times \|(1 + w^2)\,\mathrm{cov}(\Delta\zeta(s_i), \Delta\zeta(s_j))\|^{-1}_{i,j=1,\dots,k} \times \|\mathrm{cov}(\xi(t), \Delta\zeta(s_i))\|^T_{i=1,\dots,k},$$

or

$$\mathrm{var}(\eta(t)) = \mathrm{var}(\xi(t)) - \frac{1}{1 + w^2}\,C_{ts}C_{ss}^{-1}C_{ts}^T. \qquad (32)$$

Q.E.D.

Proof of Proposition 2. Notice that $K_{nts}K_{nss}^{-1}$ is a continuous function of $\mathrm{cov}_n(\hat H_n(t), \Delta\tilde H_n(s_i) - \Delta\tilde H_m(s_i))$ and $\mathrm{cov}_n(\Delta\tilde H_n(s_i) - \Delta\tilde H_m(s_i), \Delta\tilde H_n(s_j) - \Delta\tilde H_m(s_j))$, $i, j = 1,\dots,k$, when $\det(K_{nss}) > 0$. The probability of the event $\det(K_{nss}) = 0$ goes to zero as $n$ goes to infinity and hence does not affect the asymptotic properties.

Consider the Taylor expansion of $K_{nts}K_{nss}^{-1}$ with respect to $\mathrm{cov}_n(\cdot,\cdot)$ in a neighborhood of the true $\mathrm{cov}(\cdot,\cdot)$, where $\mathrm{cov}(\cdot,\cdot)$ is the $(k + k^2)$-dimensional vector with elements $\mathrm{cov}(\hat H_n(t), \Delta\tilde H_n(s_i) - \Delta\tilde H_m(s_i))$, $i = 1,\dots,k$, and $\mathrm{cov}(\Delta\tilde H_n(s_i) - \Delta\tilde H_m(s_i), \Delta\tilde H_n(s_j) - \Delta\tilde H_m(s_j))$, $i, j = 1,\dots,k$. Then $K_{nts}K_{nss}^{-1} = K_{ts}K_{ss}^{-1} + O_n(\mathrm{cov}_n(\cdot,\cdot) - \mathrm{cov}(\cdot,\cdot))$, so $K_{nts}K_{nss}^{-1} - K_{ts}K_{ss}^{-1} = O_n(\mathrm{cov}_n(\cdot,\cdot) - \mathrm{cov}(\cdot,\cdot))$, which converges to a normal random variable with mean 0 and variance 0, that is, to zero in probability.

Now

$$a_n\big(\hat H^{\Lambda_{n0}}(t) - H(t)\big) = a_n\big(\hat H^{\Lambda_{n0}}(t) - \hat H^{\Lambda_0}(t)\big) + a_n\big(\hat H^{\Lambda_0}(t) - H(t)\big) =$$
$$= \big(K_{ts}K_{ss}^{-1} - K_{nts}K_{nss}^{-1}\big)\big(\Delta\tilde H_n(s) - \Delta\tilde H_m(s)\big)^T a_n + a_n\big(\hat H^{\Lambda_0}(t) - H(t)\big). \qquad (33)$$

In (33), $K_{ts}K_{ss}^{-1} - K_{nts}K_{nss}^{-1}$ goes to zero in probability and $\Delta\tilde H_n(s) - \Delta\tilde H_m(s)$ is a consistent estimator of 0. Hence $a_n\big(K_{ts}K_{ss}^{-1} - K_{nts}K_{nss}^{-1}\big)\big(\Delta\tilde H_n(s) - \Delta\tilde H_m(s)\big)^T$ goes to zero in probability, and the asymptotic distributions of $a_n\big(\hat H^{\Lambda_{n0}}(t) - H(t)\big)$ and $a_n\big(\hat H^{\Lambda_0}(t) - H(t)\big)$ are the same. Q.E.D.

8. References

Chambers, R. L. and Dunstan, R. (1986). Estimating distribution functions from survey data. Biometrika 73, 597-604.

Dmitriev, Yu. G. and Ustinov, Yu. K. (1988). Statistical Estimation of Probability Distribution with Auxiliary Information [in Russian]. Tomsk: Tomsk State University, 194 pp.

Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis. New York: Wiley, 429 pp.

Gal'chenko, M. V. and Gurevich, V. A. (1991). Minimum-contrast estimation taking into account additional information. Journal of Soviet Mathematics 53, no. 6, 547-551.

Holt, D. and Elliot, D. (1991). Methods of weighting for unit non-response. The Statistician 40, no. 3, Special Issue: Survey Design, Methodology and Analysis, 333-342.

Kuk, A. Y. C. and Mak, T. K. (1989). Median estimation in the presence of auxiliary information. Journal of the Royal Statistical Society, Series B (Methodological) 51, 261-269.

Little, R. J. A. and Rubin, D. B. (1987). Statistical Analysis with Missing Data. New York: Wiley.

Owen, A. B. (2001). Empirical Likelihood. Chapman and Hall, 320 pp.

Pugachev, V. N. (1973). Mixed Methods of Determining Probabilistic Characteristics [in Russian]. Moscow: Soviet Radio, 256 pp.

Zhang, B. (1996). Confidence intervals for a distribution function in the presence of auxiliary information. Computational Statistics and Data Analysis 21, 327-342.
