Using Products and Powers of Products to Test Uniformity

Maria de Fátima Brilhante
Universidade dos Açores (DM), CEAUL
Rua da Mãe de Deus, Apartado 1422, 9501-801 Ponta Delgada, Portugal
E-mail: [email protected]

Sandra Mendonça
Universidade da Madeira (CCEE), CEAUL
Campus Universitário da Penteada, 9000-390 Funchal, Portugal
E-mail: [email protected]

Dinis Pestana, Fernando Sequeira
Universidade de Lisboa, Faculdade de Ciências (DEIO), CEAUL
Bloco C6, Piso 4, Campo Grande, 1749-016 Lisboa, Portugal
E-mails: [email protected], [email protected]

Abstract. The p-values observed in independent tests of some hypothesis are, under the overall null hypothesis, a sample from the standard uniform distribution. Thus uniformity tests are an important tool in meta-analysis, and the family of probability density functions
$$ \left\{ f_{X_m}(x) = \left( mx - \frac{m-2}{2} \right) I_{(0,1)}(x), \quad m \in [-2, 0] \right\} $$
provides an interesting theoretical framework to assess uniformity. In fact, $H_0: m = 0$ stands for uniformity, and the alternative $H_A: m \in [-2, 0)$ should favor rejection of the overall null hypothesis, since the smaller $m$ is, the more probable it is to observe p-values near 0. But computational sample augmentation techniques using this family have the unexpected result of decreasing power. We investigate the usefulness of products and powers of products of variables from that family in testing uniformity. We show, using simulation, that those transformations also have a negative impact on the power of uniformity goodness-of-fit tests. The Kakutani inner product of the measures under the null hypothesis and under the alternative hypothesis confirms increased collinearity and poorer fit. In fact, the techniques investigated pull the entropy towards 0, which is the entropy of the standard uniform random variable. This sheds light on why testing uniformity can be rather misleading.

Keywords. Uniformity, mixtures, products and powers, Kakutani product, entropy.

1. Introduction

The uniform random model fits exactly the ideal of equiprobability, which is interesting from a philosophical viewpoint, but as such of rather limited interest as a randomness model in the scope of decision procedures. On the other hand, in view of what Rohatgi and Saleh (2001, p. 208) call the probability integral transform theorem — the fact that, under very broad conditions, $F_X(X) \sim \mathrm{Uniform}(0,1)$, and that under even broader conditions $Y = F_X^{\leftarrow}(U) \sim F_X$, where $U$ is the standard uniform random variable and $F_X^{\leftarrow}$ denotes the generalized inverse of $F_X$ — the uniform model became central in all areas using Computational Statistics. An important consequence of the probability integral transform theorem is that, when performing similar independent tests, under $H_0$ the set of p-values is a sample from the standard uniform population. Combining p-values is a standard tool in meta-analysis, but testing uniformity using small samples can be misleading. Gomes et al. (2009) presented several 'recipes' for computational sample augmentation, but with rather unexpected effects on the power of the goodness-of-fit test. Brilhante et al. (2010) developed further results along similar lines.

Gomes et al. (2009) used, namely, the family of random variables $X_m$ with probability density function (pdf)
$$ f_{X_m}(x) = \left( mx - \frac{m-2}{2} \right) I_{(0,1)}(x), \qquad m \in [-2, 0], $$
where $I_A(x) = 1$ if $x \in A$ and $I_A(x) = 0$ if $x \notin A$ is the indicator function. In other words, $X_m$ has a mixture pdf, of the standard uniform, with weight $1 + \frac{m}{2}$, and of a $Beta(1,2)$ random variable, with weight $-\frac{m}{2}$. Those mixtures, and namely the $Beta(1,2)$ model itself, are important in investigating the truthfulness of a null hypothesis in meta-analysis, combining the p-values of independent tests, since for $m \in [-2, 0)$ they are more prone than the uniform model (corresponding to $m = 0$) to generate small values, namely below the usual significance levels. Using the characterization that $\min\left( \frac{X_m}{X_p}, \frac{1-X_m}{1-X_p} \right)$ is uniform and independent of $X_p$ if and only if either $X_m$ or $X_p$ is standard uniform, in case both variables have support $(0,1)$, computational sample augmentation is straightforward. However, if the goal is to test uniformity, sample augmentation has a negative impact, since it actually decreases power.

Herein we exploit an extension of the fact that if $U_1 \stackrel{d}{=} U_2 \stackrel{d}{=} U_3 \sim \mathrm{Uniform}(0,1)$ are independent random variables, then $(U_1 U_2)^{U_3} \sim \mathrm{Uniform}(0,1)$, with the aim of investigating whether power divergence (an idea inspired by Cressie and Read, 1984) enhances power. In Section 2 we prove that $(U_1 U_2)^{X_m}$ is uniform iff $m = 0$, i.e. $X_m \sim \mathrm{Uniform}(0,1)$, and that $(U_1 U_2 U_3)^{X_m}$ is uniform iff $m = -2$, i.e. $X_m \sim Beta(1,2)$. This inspired us to assess whether using products and powers of products is useful in testing uniformity. Using simulation, we shall show that the answer is negative, and the Kakutani inner product of the measures of the test statistic under the null and under the alternative hypothesis confirms increased collinearity when using the transformed variables. The explanation lies in the fact that the transformations used pull the entropy towards 0, which is the entropy of the standard uniform law.

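The mixture representation of $f_{X_m}$ gives a direct sampling recipe: with probability $1 + m/2$ draw from the standard uniform, otherwise from $Beta(1,2)$. The following sketch (our own illustration, not from the paper; assuming numpy is available) samples from $f_{X_m}$ and estimates $P(X_m \le 0.05)$, which grows as $m$ decreases towards $-2$:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_xm(m, size, rng=rng):
    """Sample from f_{X_m}(x) = m*x - (m-2)/2 on (0,1), m in [-2, 0].

    Mixture form: weight 1 + m/2 on Uniform(0,1), weight -m/2 on Beta(1,2)."""
    pick_beta = rng.random(size) < -m / 2.0
    u = rng.random(size)             # Uniform(0,1) component
    b = rng.beta(1.0, 2.0, size)     # Beta(1,2) component
    return np.where(pick_beta, b, u)

for m in (0.0, -1.0, -2.0):
    x = sample_xm(m, 100_000)
    # Exact value is F(0.05) = 0.05 - (m/2)*0.05*(1 - 0.05); estimate it:
    print(f"m = {m:+.1f}: P(X <= 0.05) ~ {np.mean(x <= 0.05):.4f}")
```

For $m = -2$ the estimate should be close to $0.0975$, nearly double the uniform value $0.05$, which is why such alternatives favor rejection of the overall null hypothesis.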

In the sequel we shall need some functions related to the exponential integral; we transcribe their definitions from Abramowitz and Stegun (1972, Chapter 5):
$$ \mathrm{Ei}(x) = -\int_{-x}^{\infty} \frac{e^{-t}}{t}\, dt = \int_{-\infty}^{x} \frac{e^{t}}{t}\, dt, \qquad x > 0, $$
$$ \mathrm{li}(x) = \int_0^x \frac{dt}{\ln(t)} = \mathrm{Ei}(\ln(x)), $$
and, for $n = 0, 1, \dots$ and $\mathrm{Re}(z) > 0$,
$$ E_n(z) = \int_1^\infty \frac{e^{-zt}}{t^n}\, dt, $$
$$ \alpha_n(z) = \int_1^\infty t^n e^{-zt}\, dt = \frac{n!}{z^{n+1}}\, e^{-z} \left( 1 + z + \frac{z^2}{2} + \cdots + \frac{z^n}{n!} \right). $$
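Since these special functions reappear below in the densities of powers of products, numerical versions are handy. A minimal sketch, assuming numpy/scipy (`scipy.special.expn` gives $E_n$ and `scipy.special.expi` gives $\mathrm{Ei}$, hence $\mathrm{li}(x) = \mathrm{Ei}(\ln x)$); the helper names are ours:

```python
import math
import numpy as np
from scipy.special import expi, expn
from scipy.integrate import quad

def li(x):
    """Logarithmic integral li(x) = Ei(ln(x)); for x in (0,1), ln(x) < 0."""
    return expi(np.log(x))

def alpha(n, z):
    """alpha_n(z) = (n!/z^(n+1)) e^{-z} (1 + z + ... + z^n/n!), the closed form above."""
    s = sum(z ** k / math.factorial(k) for k in range(n + 1))
    return math.factorial(n) / z ** (n + 1) * math.exp(-z) * s

# Check the closed form of alpha_n against direct numerical quadrature:
n, z = 3, 0.7
numeric, _ = quad(lambda t: t ** n * math.exp(-z * t), 1, np.inf)
print(alpha(n, z), numeric)           # should agree to quadrature accuracy

# Check the identity E_1(-ln(w)) = -li(w), used repeatedly in Section 2:
w = 0.4
print(expn(1, -math.log(w)), -li(w))  # should coincide
```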

2. On products of random variables

In what follows, $U_1, U_2, \dots$ denote independent and identically distributed (iid) standard uniform random variables. Define also $U_n^* \sim \mathrm{Uniform}(0, U_{n-1}^*)$, for $n = 1, 2, \dots$, where $U_0^* = 1$. The pdf of $U_n^*$ is
$$ f_{U_n^*}(x) = \frac{(-\ln(x))^{n-1}}{(n-1)!}\, I_{(0,1)}(x), $$
and therefore $U_n^* \stackrel{d}{=} U_1 U_2 \cdots U_n$.

Most of our results stem from a very humble and straightforward result:

Theorem. If $T_\alpha$ and $V_{\alpha+1}$ are independent random variables, $T_\alpha \sim Beta(\alpha, 1)$ and $V_{\alpha+1} \sim Gamma(\alpha+1, 1)$, then $T_\alpha V_{\alpha+1} \sim Gamma(\alpha, 1)$. In fact,
$$ f_{T_\alpha V_{\alpha+1}}(x) = \int_x^\infty \alpha \left( \frac{x}{y} \right)^{\alpha-1} \frac{y^\alpha e^{-y}}{\Gamma(\alpha+1)}\, \frac{dy}{y}\; I_{(0,\infty)}(x) = \frac{x^{\alpha-1} e^{-x}}{\Gamma(\alpha)}\, I_{(0,\infty)}(x). $$
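The theorem is easy to corroborate by simulation. The sketch below (our illustration, assuming scipy.stats) draws $T_\alpha V_{\alpha+1}$ and runs a Kolmogorov–Smirnov test against $Gamma(\alpha, 1)$ — not a proof, just a quick sanity check:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha_par, n = 2.5, 50_000

t = rng.beta(alpha_par, 1.0, n)          # T_alpha ~ Beta(alpha, 1)
v = rng.gamma(alpha_par + 1.0, 1.0, n)   # V_{alpha+1} ~ Gamma(alpha+1, 1)

ks = stats.kstest(t * v, "gamma", args=(alpha_par,))  # H0: product ~ Gamma(alpha, 1)
print(ks.statistic, ks.pvalue)   # a large p-value is consistent with the theorem
```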

Observations

1. From $f_{1/X}(x) = \frac{f_X(1/x)}{x^2}$, it follows that
$$ T_\alpha \sim Beta(\alpha, 1) \iff \frac{1}{T_\alpha} \sim Pareto(\alpha). $$

Hence the above result may alternatively be stated as: let $V_{\alpha+1} \sim Gamma(\alpha+1, 1)$ and $W_\alpha \sim Pareto(\alpha)$ be independent random variables; then
$$ \frac{V_{\alpha+1}}{W_\alpha} \stackrel{d}{=} V_\alpha \sim Gamma(\alpha, 1). $$

2. From $Z \sim Beta(p, q) \iff 1 - Z \sim Beta(q, p)$ and the probability integral transform, $\exp(-V_1) \sim \mathrm{Uniform}(0,1)$; therefore, from well-known additive properties of the Gamma family, $\exp(-V_n) \stackrel{d}{=} U_1 U_2 \cdots U_n \stackrel{d}{=} U_n^*$, with the $U_k$'s iid standard uniform random variables, and hence
$$ f_{U_n^*}(x) = f_{\exp(-V_n)}(x) = f_{U_1 U_2 \cdots U_n}(x) = \frac{(-\ln(x))^{n-1}}{(n-1)!}\, I_{(0,1)}(x). $$
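A quick two-sample check of this distributional identity (a sketch, assuming scipy.stats): simulate $\exp(-V_n)$ and $U_1 \cdots U_n$ and compare them with a two-sample Kolmogorov–Smirnov test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, size = 4, 50_000

sample_a = np.exp(-rng.gamma(n, 1.0, size))     # exp(-V_n), V_n ~ Gamma(n, 1)
sample_b = rng.random((size, n)).prod(axis=1)   # U_1 U_2 ... U_n

ks = stats.ks_2samp(sample_a, sample_b)
print(ks.statistic, ks.pvalue)   # a large p-value: the samples are compatible
```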

Consequently,
$$ (U_1 U_2 \cdots U_{n+1})^{T_n} \stackrel{d}{=} U_1 U_2 \cdots U_n \stackrel{d}{=} U_n^*, $$
where $T_n \sim Beta(n, 1)$ is independent of the $U_k$'s (since $-\ln\left[ (U_1 U_2 \cdots U_{n+1})^{T_n} \right] = T_n V_{n+1} \stackrel{d}{=} V_n$, by the Theorem), and in particular,
$$ (U_1 U_2)^{U_3} \stackrel{d}{=} U \sim \mathrm{Uniform}(0,1), $$
and hence
$$ (U_1 U_2)^{U_3} (U_4 U_5)^{U_6} \cdots (U_{3n-2} U_{3n-1})^{U_{3n}} \stackrel{d}{=} U_n^* \stackrel{d}{=} (U_1 U_2 \cdots U_{n+1})^{T_n}. $$

Let us now investigate the distribution of $W_{n,m} = (U_n^*)^{X_m}$. The joint pdf of $(U_n^*, W_{n,m})$ is
$$ f_{U_n^*, W_{n,m}}(u, w) = f_{U_n^*}(u)\, f_{X_m}\!\left( \frac{\ln(w)}{\ln(u)} \right) \frac{1}{|w \ln(u)|}\, I_{\{0 \le u \le w \le 1\}}(u, w), $$
and therefore, for $n \ge 3$, $w \in (0,1)$,
$$ f_{W_{n,m}}(w) = \int_0^w \frac{(-\ln(u))^{n-1}}{(n-1)!} \left( m\, \frac{\ln(w)}{\ln(u)} - \frac{m-2}{2} \right) \frac{du}{w\, |\ln(u)|} = \frac{(-\ln(w))^{n-1}}{(n-1)!\, w} \left[ m\, \alpha_{n-3}(-\ln(w)) - \frac{m-2}{2}\, \alpha_{n-2}(-\ln(w)) \right]. $$

For the two 'extreme' cases $m = -2$ and $m = 0$, for $w \in (0,1)$, we get respectively
$$ f_{W_{n,-2}}(w) = \frac{2\, (-\ln(w))^{n-1}}{(n-1)!\, w} \left[ \alpha_{n-2}(-\ln(w)) - \alpha_{n-3}(-\ln(w)) \right] $$
and
$$ f_{W_{n,0}}(w) = \frac{(-\ln(w))^{n-1}}{(n-1)!\, w}\, \alpha_{n-2}(-\ln(w)). $$

For $n = 1, 2, 3$, $m \in [-2, 0]$, $w \in (0,1)$, we get
$$ f_{W_{1,m}}(w) = \frac{m}{w}\, E_2(-\ln(w)) + \frac{m-2}{2w}\, \mathrm{li}(w), $$
$$ f_{W_{2,m}}(w) = \frac{m \ln(w)\, \mathrm{li}(w)}{w} - \frac{m-2}{2}, $$
$$ f_{W_{3,m}}(w) = -\frac{m \ln(w)}{2} - \frac{m-2}{4}\, (1 - \ln(w)), $$
and thus in the 'extreme' cases the pdf $f_{W_{n,m}}$, for $w \in (0,1)$, has the expression given in the table below:

  n | m = -2                         | m = 0
  --+--------------------------------+-----------------
  1 | -2 [E_2(-ln(w)) + li(w)] / w   | -li(w) / w
  2 | 2 - 2 ln(w) li(w) / w          | 1
  3 | 1                              | (1 - ln(w)) / 2
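The two entries equal to 1 in the table are the uniformity results announced in the introduction: $(U_1 U_2)^{U_3}$ is uniform ($n = 2$, $m = 0$), and $(U_1 U_2 U_3)^{X_{-2}}$, with $X_{-2} \sim Beta(1,2)$, is uniform ($n = 3$, $m = -2$). A simulation sketch (ours, assuming scipy.stats) corroborating both:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
size = 100_000

u = rng.random((size, 6))
w2 = (u[:, 0] * u[:, 1]) ** u[:, 2]                         # (U1 U2)^{U3}: n = 2, m = 0
w3 = (u[:, 3] * u[:, 4] * u[:, 5]) ** rng.beta(1, 2, size)  # (U1 U2 U3)^{X_{-2}}: m = -2

print(stats.kstest(w2, "uniform").pvalue)  # both p-values should be large,
print(stats.kstest(w3, "uniform").pvalue)  # consistent with uniformity
```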

More generally, observe that the pdf of $W_{n,0} = (U_n^*)^{U_{n+1}}$ is, for $n = 2, 3, \dots$, $w \in (0,1)$,
$$ f_{W_{n,0}}(w) = \int_0^w \frac{(-\ln(t))^{n-1}}{(n-1)!\, w\, |\ln(t)|}\, dt = \frac{(-\ln(w))^{n-1}}{(n-1)!\, w}\, \alpha_{n-2}(-\ln(w)) = \frac{1}{n-1} \sum_{k=1}^{n-1} \frac{(-\ln(w))^{k-1}}{(k-1)!}, $$
i.e., $f_{W_{n,0}}(w)$ is the balanced mixture, with equal weights $\pi_k = \frac{1}{n-1}$, $k = 1, \dots, n-1$, of the pdfs $f_k(w) = \frac{(-\ln(w))^{k-1}}{(k-1)!}\, I_{(0,1)}(w)$ of the $U_k^*$.
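The mixture representation can be confirmed numerically: the balanced sum of the $U_k^*$ pdfs matches the $\alpha_{n-2}$ expression for $f_{W_{n,0}}$. A self-contained sketch (our helper names, assuming numpy):

```python
import math
import numpy as np

def f_mixture(w, n):
    """pdf of W_{n,0} as the balanced mixture of the pdfs of U_1^*, ..., U_{n-1}^*."""
    z = -np.log(w)
    return sum(z ** (k - 1) / math.factorial(k - 1) for k in range(1, n)) / (n - 1)

def f_direct(w, n):
    """pdf of W_{n,0} via (-ln w)^{n-1} / ((n-1)! w) * alpha_{n-2}(-ln w)."""
    z = -np.log(w)
    alpha = math.factorial(n - 2) / z ** (n - 1) * math.exp(-z) * \
            sum(z ** k / math.factorial(k) for k in range(n - 1))
    return z ** (n - 1) / (math.factorial(n - 1) * w) * alpha

for w in (0.1, 0.5, 0.9):
    print(f_mixture(w, 4), f_direct(w, 4))   # the two columns should agree
```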

3. Testing uniformity

Observe also that the distribution function (df) of $U_n^* \stackrel{d}{=} U_1 U_2 \cdots U_n$ is
$$ F_{U_n^*}(x) = \begin{cases} 0, & x < 0, \\[2pt] x \displaystyle\sum_{k=0}^{n-1} \frac{(-\ln(x))^k}{k!}, & 0 \le x < 1, \\[2pt] 1, & x \ge 1. \end{cases} $$
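The middle branch follows by differentiation, which recovers the pdf $f_{U_n^*}$ given in Section 2. This df is what allows products of uniforms to be folded back onto the uniform scale: by the probability integral transform, $F_{U_n^*}(U_1 \cdots U_n) \sim \mathrm{Uniform}(0,1)$, so any uniformity test can be applied after the transformation. A minimal sketch (assuming numpy/scipy; the helper name is ours):

```python
import math
import numpy as np
from scipy import stats

def cdf_prod_uniforms(x, n):
    """df of U_n^* = U_1 U_2 ... U_n:  x * sum_{k=0}^{n-1} (-ln x)^k / k!  on (0,1)."""
    z = -np.log(x)
    return x * sum(z ** k / math.factorial(k) for k in range(n))

rng = np.random.default_rng(3)
n, size = 5, 20_000
prod = rng.random((size, n)).prod(axis=1)    # samples of U_n^*
back = cdf_prod_uniforms(prod, n)            # PIT: should be Uniform(0,1)
print(stats.kstest(back, "uniform").pvalue)  # a large p-value is expected
```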