Most powerful conditional tests
Arnold Janssen, Dominik Völker
Received: January 17, 2006; Accepted: February 16, 2007
Summary: The present paper establishes finite sample most powerful tests for certain nonparametric null hypotheses P0 which admit a sufficient statistic S. The underlying alternatives are of semiparametric or nonparametric nature. Optimal one-sided S-conditional tests are offered for families with nonparametric isotone likelihood ratio. Similarly, two-sided optimal locally unbiased S-conditional tests are introduced for alternatives with nonparametric convex likelihood ratio. If in addition S is P0-complete then we arrive at most powerful α-similar tests. Special examples are randomization tests, permutation tests for two-sample problems and symmetry tests for the null hypothesis of 0-symmetry. The results rely on a new conditional Neyman–Pearson Lemma which can be found in the appendix and which is of independent interest. This Lemma is used to solve conditional optimization problems for tests.
1 Introduction

The principle of reducing a composite null hypothesis by sufficiency is rather old and leads to finite sample distribution free tests of exact nominal level α. Conditional procedures of this kind, like permutation tests or, more generally, randomization tests, have seen a revival since modern computers and algorithms allow a serious treatment of conditional critical values, see for example Pagano and Tritchler [15] and Gebhard and Schmitz [6]. In recent years there has been much interest in the finite sample optimality of conditional tests, see Pfanzagl [16] for parametric problems as well as Gebhard and Schmitz [5] and Völker [21] for permutation tests which are optimal against parametric alternatives. Lehmann and Stein [13] derived most powerful permutation tests for one-sided alternatives early on. These results were extended by Bell and Sen [1] to one-sided randomization tests. Within the asymptotic setup these procedures are often asymptotically efficient in a wider context. Under a broad class of local alternatives there is asymptotically no difference between efficient unconditional tests and their conditional counterparts, see Romano [17, 18], Janssen [8] and Janssen and Pauls [11] for two-sample tests, Janssen and Mayer [10] for survival tests and censored data, Strasser

AMS 2000 subject classification: Primary: 62F03; Secondary: 62G10
Key words and phrases: Conditional test, tests with Neyman-structure, conditional Neyman–Pearson Lemma, randomization test, permutation test, symmetry test, similar test, locally conditionally unbiased test, locally most powerful test
This article is protected by German copyright law. You may copy and distribute this article for your personal use only. Other use is only allowed with written permission by the copyright holder.
Statistics & Decisions 25, 41–62 (2007) / DOI 10.1524/stnd.2007.25.1.41 © Oldenbourg Wissenschaftsverlag, München 2007
and Weber [20] for permutation tests and Janssen [9] for conditional symmetry tests. An early conditional central limit theorem for special permutation statistics was proved by Hoeffding [7].

In the present paper the finite sample optimality of conditional tests given a sufficient statistic S is studied in detail. Similarly to the unconditional situation we derive a conditional generalized Neyman–Pearson Lemma, which is presented in a separate appendix. As a consequence, most powerful one-sided conditional tests and locally most powerful conditional tests are established. Under a conditional local unbiasedness condition most powerful two-sided tests of this type are obtained as well. Roughly speaking, optimal parametric procedures given by a real test statistic T can be extended to nonparametric hypotheses (with sufficient statistic S). The point is that tests based on T should then be carried out as distribution free S-conditional tests. The optimality is preserved under a wide class of semiparametric alternatives. In various cases the test statistic T can be modified towards the efficient score function T0 = T − E•(T|S) of T introduced by Bickel et al. [2]. Here E•(T|S) denotes a version of the conditional expectation which is independent of P ∈ P0; the parametric test statistic T is thus projected onto its S-orthogonal part T0. The most difficult parts of the paper are the treatment of arbitrary non-dominated one-sided alternatives and the treatment of two-sided conditional tests under constraints, which requires special efforts. One may argue that asymptotic results, say for permutation tests, are now superfluous. This is certainly not the case for more complex models. Recall for instance that there does not exist a reasonable exact level α test for the famous Behrens–Fisher problem. This problem is a two-sample problem for the means of normal distributions with different variances, see Linnik [14] for further references.
An asymptotic solution based on studentized permutation tests is proposed in Janssen [8]. Throughout, we are concerned with testing nonparametric and semiparametric hypotheses at finite sample size. Special examples are randomization tests, which include permutation tests. Typically these tests are carried out as resampling tests. The present results may be viewed as a finite sample supplement to the asymptotics of conditional tests, especially for the permutation tests mentioned above. For the general background of conditional tests we refer to Lehmann and Romano [12] and Pfanzagl [16].
2 Preliminaries

Let us start with a family of distributions P on some measurable space (Ω, A). Throughout, let P0 ⊂ P denote a non-void null hypothesis; we are going to establish tests for

P0 against P \ P0 at level α ∈ (0, 1).  (2.1)
We will always assume that

$S : (\Omega, \mathcal{A}) \longrightarrow (\Omega_S, \mathcal{A}_S)$ and $T : (\Omega, \mathcal{A}) \longrightarrow ([-\infty, \infty], \mathcal{B}([-\infty, \infty]))$  (2.2)
are statistics with values in some measurable spaces, where B([−∞, ∞]) denotes the Borel sets of the extended real line. Throughout, let

S be P0-sufficient,  (2.3)

and let T denote a suitable test statistic, where the choice of T depends on the underlying alternatives. It is well known that S-conditional tests (or tests with Neyman structure) are powerful tools for reducing a composite null hypothesis P0 by the sufficiency of S. Recall that a test ϕ : Ω → [0, 1] is called an S-conditional test with level α if

$E^\bullet(\varphi \mid S) = \alpha$  (2.4)
holds for a version of the conditional expectation of ϕ given S which is independent of P ∈ P0. Recall that bounded completeness of S together with sufficiency implies that every P0 α-similar test ϕ (i.e. E_P(ϕ) = α for all P ∈ P0) fulfills (2.4). Suppose for a moment that large values of T point to the alternative P \ P0. Then upper T-tests can be carried out as conditional tests by the following procedure. Recall first that there exists a kernel

$K_T(\cdot,\cdot) : \Omega_S \times \mathcal{B}([-\infty,\infty]) \longrightarrow [0,1]$,

see Dudley [3, p. 270], which represents the conditional distribution of T given S = s for all P ∈ P0, i.e. formally by definition

$K_T(s, \cdot) = P^{T|S=s}(\cdot)$, P ∈ P0.  (2.5)
Lemma 2.1 (a) Under the assumptions (2.2) and (2.3) there exist measurable functions

$c(\cdot) : (\Omega_S, \mathcal{A}_S) \longrightarrow [-\infty,\infty]$, $\gamma(\cdot) : (\Omega_S, \mathcal{A}_S) \longrightarrow [0,1]$  (2.6)

such that

$\varphi = 1_{(c(S),\infty]}(T) + \gamma(S)\,1_{\{c(S)\}}(T)$  (2.7)

is an S-conditional test with level α.
(b) For a fixed version K_T(s, ·) of (2.5) each pair of measurable functions (2.6) with

$\int \varphi \, dK_T(s,\cdot) = \alpha$, s ∈ Ω_S,  (2.8)

defines via (2.7) an S-conditional test ϕ with level α.
Proof: Routine arguments prove that measurable solutions (2.6) of (2.7) and (2.8) exist. For instance we may choose

$c(s) := \inf\{x : K_T(s, [-\infty, x]) \ge 1 - \alpha\}$.  (2.9)

Further details are left to the reader.
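For a conditional law K_T(s, ·) with finite support the choice (2.9) can be made concrete. The following sketch is our own illustration, not part of the paper; the function name and the discrete setting are assumptions. It computes c(s) via (2.9) together with the randomization weight γ(s) on {T = c(s)} so that the conditional level (2.8) is exactly α.

```python
import numpy as np

def critical_pair(t_values, probs, alpha):
    """For a discrete conditional law K_T(s, .) with atoms t_values and
    weights probs, return the critical value c(s) of (2.9) together with
    the randomization weight gamma(s) on {T = c(s)}, so that
    K_T(s, (c, inf]) + gamma * K_T(s, {c}) = alpha, i.e. (2.8) holds."""
    order = np.argsort(t_values)
    t = np.asarray(t_values, float)[order]
    p = np.asarray(probs, float)[order]
    cdf = np.cumsum(p)
    # c(s) = inf{x : K_T(s, [-inf, x]) >= 1 - alpha}, see (2.9)
    i = int(np.searchsorted(cdf, 1.0 - alpha))
    c = t[i]
    mass_above = 1.0 - cdf[i]            # K_T(s, (c, inf])
    gamma = (alpha - mass_above) / p[i]  # fills the level up to alpha
    return c, gamma

# toy conditional law: T | S = s uniform on {0, ..., 4}
t = np.arange(5.0)
p = np.full(5, 0.2)
c, gamma = critical_pair(t, p, alpha=0.05)
level = p[t > c].sum() + gamma * p[t == c].sum()   # exact conditional level
```

For α = 0.05 and this uniform law the test randomizes at the largest atom: c(s) = 4 and γ(s) = 0.25, which yields conditional level exactly 0.05.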
Remark 2.2 (a) The construction (2.8) of the conditional test depends on P0 via the kernel K. Different choices of K for different sets P0 may lead to different tests. However, often P0 can be enlarged such that (2.5) holds for a wider class of distributions. It is easy to see that P0 can be enlarged by adding all infinite convex combinations

$P = \sum_{i=1}^{\infty} c_i P_i$, $P_i \in \mathcal{P}_0$, $c_i > 0$, $\sum_{i=1}^{\infty} c_i = 1$.  (2.10)
Recall that then E_P(·|S) = E•(·|S) remains true. When dealing with conditional tests for P0 we may therefore assume without restriction that P0 is closed under infinite convex combinations (2.10).
(b) Suppose that there exists a product measurable function h : [−∞, ∞] × Ω_S → [−∞, ∞], where t → h(t, s) is strictly increasing as a function of the first argument for all s ∈ Ω_S. Then our test statistic T may be replaced by T0 := h(T, S). If we replace s → c(s) by s → c0(s) := h(c(s), s) then the conditional test ϕ of (2.7) remains unchanged as a function of T0, c0(·), γ(·). Two examples where ϕ can be rewritten are of special importance.
(1) If E•(T|S) exists we may choose

$T_0 = T - E^\bullet(T \mid S)$.  (2.11)
Then T0 is the projection of T on its S-orthogonal part.
(2) If T = T0 · g(S) has a positive factor g(S) > 0 which depends only on the sufficient statistic, then the factor g(S) can be dropped.
At this stage let us consider some examples of sufficient statistics S for nonparametric null hypotheses P0. Test statistics and alternatives are specified later on.

Example 2.3 (Randomization tests, permutation tests) Let G = {g : (Ω, A) → (Ω, A)} be a finite group of transformations and let P0 be a family of G-invariant distributions, i.e. P^g = P for all P ∈ P0 and all g ∈ G. We will consider conditional inference given the σ-field F of G-invariant sets given by

$\mathcal{F} := \{A \in \mathcal{A} : g(A) = A \text{ for all } g \in \mathcal{G}\}$,  (2.12)

which can always be described by the statistic (identity id)

$S = \mathrm{id} : (\Omega, \mathcal{A}) \longrightarrow (\Omega, \mathcal{F})$.  (2.13)
Recall that S is P0-sufficient, where the kernel K(·, ·),

$K(\omega, \cdot) = \frac{1}{|\mathcal{G}|} \sum_{g \in \mathcal{G}} \varepsilon_{g(\omega)}$,  (2.14)
yields the conditional expectations

$E^\bullet(f \mid S)(\omega) = \int f(y) \, K(\omega, dy)$  (2.15)

for all integrable functions f : (Ω, A) → R. The kernel K_T given by (2.5) is then

$K_T(\omega, \cdot) = \frac{1}{|\mathcal{G}|} \sum_{g \in \mathcal{G}} \varepsilon_{T(g(\omega))}$.  (2.16)
The conditional test ϕ of (2.7) is then called the randomization test given by T. It can be carried out as follows.
• Keep the data point ω fixed.
• Consider the randomization distribution of

$g \mapsto T(g(\omega))$  (2.17)

under the uniform distribution on G.
The conditional critical value c(ω) is then just the (1 − α)-quantile of the randomization distribution (2.17). According to (2.11) we may replace T by

$T_0(\omega) = T(\omega) - E^\bullet(T \mid S)(\omega) = T(\omega) - \frac{1}{|\mathcal{G}|} \sum_{g \in \mathcal{G}} T(g(\omega))$  (2.18)

and ϕ remains unchanged. For further references about randomization tests we refer to the survey article of Bell and Sen [1]. Two examples of randomization tests are of special interest:
(a) Permutation tests: Consider (Ω^n, A^n) and the group G given by all permutations (x_1, ..., x_n) → (x_{π(1)}, ..., x_{π(n)}) of the coordinates. Then P0 may consist of a subset of exchangeable distributions. The randomization tests ϕ are then called permutation tests. A special case is the subset of product measures, i.e. P0 = {P^n : P ∈ P ⊂ M(Ω, A)}. But, as in many experimental situations, the observations need not be independent.
(b) Conditional symmetry tests: Consider here (Ω, A) = (R^n, B^n) and the group G of reflections (x_1, ..., x_n) → (ε_1 x_1, ..., ε_n x_n) given by all vectors (ε_1, ..., ε_n) ∈ {−1, 1}^n. Then we are testing a null hypothesis of G-symmetric distributions. A special case is the set of 0-symmetric product measures, i.e.

$\mathcal{P}_0 = \{P^n : P \text{ a 0-symmetric distribution on } (\mathbb{R}, \mathcal{B})\}$.  (2.19)
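The two bullet points above can be sketched numerically for a two-sample permutation test. The statistic, the function name and the exhaustive enumeration are our own illustrative choices, not the paper's: for T = mean(x) − mean(y) the randomization distribution over all (n+m)! permutations reduces to the uniform distribution over the regroupings of the pooled sample.

```python
import numpy as np
from itertools import combinations

def permutation_test(x, y, alpha=0.05):
    """Randomization test of Example 2.3(a): keep the pooled data (the
    value of S) fixed, recompute T = mean(x) - mean(y) under all
    regroupings, and compare the observed T with the conditional
    (1 - alpha)-quantile c(omega) of the randomization distribution,
    randomizing on the boundary so the conditional level is exactly
    alpha.  Returns the test value phi(omega) in [0, 1]."""
    pooled = np.concatenate([x, y])
    n, m = len(x), len(y)
    t_obs = np.mean(x) - np.mean(y)
    t_perm = []
    for idx in combinations(range(n + m), n):
        mask = np.zeros(n + m, dtype=bool)
        mask[list(idx)] = True
        t_perm.append(pooled[mask].mean() - pooled[~mask].mean())
    t_perm = np.sort(t_perm)
    k = len(t_perm)
    c = t_perm[int(np.ceil((1 - alpha) * k)) - 1]   # conditional critical value
    gamma = (alpha * k - np.sum(t_perm > c)) / np.sum(t_perm == c)
    if t_obs > c:
        return 1.0
    return gamma if t_obs == c else 0.0

phi = permutation_test(np.array([5.0, 6.0]), np.array([1.0, 2.0, 3.0]))
```

Here the observed statistic hits the boundary of the randomization distribution, so the test randomizes with weight γ rather than rejecting outright.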
Let P1 and P2 denote two probability measures and ν = P1 + P2. The likelihood ratio of P2 with respect to P1 is defined by

$\frac{dP_2}{dP_1} = \frac{dP_2}{d\nu} \Big/ \frac{dP_1}{d\nu}$.

If we use the convention x/0 = ∞ for x > 0 then dP2/dP1 is P1 + P2-a.e. uniquely determined.
3 Conditional tests for one-sided alternatives

Throughout, we will consider S-conditional tests (or conditional tests for short) for (2.1) with optimum power within the class (2.4) for fixed α. We extend previous results of Pfanzagl [16], who treated one-sided most powerful conditional tests for dominated parametric families. As a technical tool a general conditional fundamental Lemma of Neyman–Pearson type will be used, see Lemma A.1 of the appendix.
Consider again (2.3) and let us start with a single alternative P \ P0 = {Q}. Next we would like to specify a single distribution P0 of type (2.10) which is spread out widely enough in P0 and which can be used to construct conditional tests for P0 against {Q}. Recall from Remark 2.2 that without restriction P0 is closed under infinite convex combinations. If P0 is dominated by some σ-finite measure it is well known that there exists a distribution P0 which is equivalent to P0, i.e.

$P \ll P_0 = \sum_{i=1}^{\infty} c_i P_i$, $P_i \in \mathcal{P}_0$,  (3.1)

for all P ∈ P0, where P0 is of type (2.10), see Lehmann and Romano [12, Appendix A.4.2]. All distributions P0 given by (3.1) are then mutually equivalent. In the general case suitable (maximal) elements P0 will be chosen according to the following Lemma. Let P = P_a + P_s denote the decomposition of P into a Q-singular part P_s and a Q-absolutely continuous part P_a.

Lemma 3.1 (a) There exists an infinite convex combination P0 of elements of P0 with P − P_s ≪ P0 for each P ∈ P0. We will call such elements P0 maximal elements of P0 with respect to Q.
(b) Let f := dQ/dP0 denote a fixed version of the likelihood ratio of Q with respect to some maximal element P0. For every P ∈ P0 there exists a measurable function g_P(S) with

$\frac{dQ}{dP} = f \cdot g_P(S)$  P + Q-a.e.  (3.2)
Proof: (a) As in (3.1) we may apply Lehmann and Romano [12, Lemma A.4.2] to the set {P_a : P ∈ P0} of the Q-absolutely continuous parts of P0.
(b) Let us first assume that P0 ≪ P holds. By the sufficiency of S there exists a function g_P(S) with dP0/dP = g_P(S). Obviously, equality (3.2) holds on the set {g_P(S) > 0}, where P0 and P are mutually absolutely continuous. Consider next the set A := {dQ/dP > 0}. Since P0 is a maximal element and P_a = P|_A holds, we have the equivalence P|_A ≈ P0|_A of the restrictions to A. This implies that (3.2) holds on A. Its complement A^c is a Q-null set. In addition f must be zero P0-a.e. and also P-a.e. on the set A^c ∩ {g_P(S) > 0}.
On the remaining set A^c ∩ {g_P(S) = 0} equation (3.2) trivially holds. For an arbitrary P ∈ P0 we may choose a further maximal element P1 with P0 + P ≪ P1. Then (3.2) holds for the pair (P1, Q) and it is easy to see that the same follows for (P, Q).

Definition 3.2 A conditional test ϕ with (nominal) level α, see (2.4), is called a conditional Neyman–Pearson test for P0 against {Q} if for each P ∈ P0 there exists a measurable function c(·) : (Ω_S, A_S) → [0, ∞], depending on P, such that

$1_{(c(S),\infty]}\!\left(\frac{dQ}{dP}\right) \le \varphi \le 1_{[c(S),\infty]}\!\left(\frac{dQ}{dP}\right)$  holds P + Q-a.e.  (3.3)

Lemma 3.3 (a) Let P0 be a maximal element of P0 with respect to Q. Choose T := dQ/dP0. Then each conditional test at level α of the form (2.7),

$\varphi_0 = 1_{(c_0(S),\infty]}(T) + \gamma(S)\,1_{\{c_0(S)\}}(T)$,

is a conditional Neyman–Pearson test for P0 against {Q}.
(b) A conditional test ϕ with level α has maximum power E_Q(ϕ) for testing P0 against {Q} within the class of conditional tests (2.4) iff ϕ is a conditional Neyman–Pearson test.
(c) The maximum power E_Q(ϕ) is equal to α iff S^{−1}(A_S)-measurable versions of dQ/dP exist for each P ∈ P0. In this case S is sufficient for P0 ∪ {Q} and Q can be added to the null hypothesis.
Proof: (a) For P ∈ P0 define c(S) = c0(S) g_P(S), where c0(S) is the critical function of ϕ0 and g_P(S) is as in (3.2). According to Lemma 3.1 (b) we then have

$\left\{\frac{dQ}{dP} > c(S)\right\} \subset \{\varphi_0 = 1\}$ and $\left\{\frac{dQ}{dP} < c(S)\right\} \subset \{\varphi_0 = 0\}$  P + Q-a.e.

Thus ϕ0 is a conditional Neyman–Pearson test.
(b) In a first step we will find the maximum power for testing our maximal element {P0} versus {Q} within the class (2.4). Define ν = (P0 + Q)/2 and f0 = dP0/dν, f1 = dQ/dν, where dQ/dP0 = f1/f0 holds ν-a.e. Recall that E_ν(h f0|S) = E_{P0}(h|S) E_ν(f0|S) holds ν-a.e. for every ν-integrable function h. Thus the condition E_{P0}(ϕ|S) = α implies E_ν(ϕ f0|S) = α E_ν(f0|S). We are now going to maximize the functional

$\psi \mapsto \int \psi f_1 \, d\nu = E_Q(\psi)$
of tests ψ under the condition E_ν(ψ f0|S) = α E_ν(f0|S) =: α1(S). Lemma A.1 of the appendix implies that our conditional test ϕ0 has maximum power. Any other conditional test ψ with optimum power is ν-a.e. equal to ϕ0 outside the set {T = c0(S)}. As in part (a) we see that such a ψ is again a conditional Neyman–Pearson test. In addition, ϕ0 is, up to the choice of the randomization, independent of the special maximal element P0. Conversely, let ϕ denote a conditional Neyman–Pearson test with level α. It is then obvious that ϕ and ϕ0 have the same power at Q; confer again Lemma A.1.
(c) It is trivial that S^{−1}(A_S)-measurable statistics lead to conditional tests with Q-power α. Conversely, if E_Q(ϕ0) = α holds for the test ϕ0 of part (a), then ϕ_α ≡ α is a solution of the optimization problem studied in (b) and the set {T = c0(S)} has ν-probability 1; confer again Lemma A.1.

Remark 3.4 If there exist kernels P0(A|S = s) = K0(s, A) and Q(A|S = s) = K1(s, A) for the pair (P0, Q) of Lemma 3.1 (a), then each conditional test with level α and 0-1 structure

$\varphi = \begin{cases} 1 & \text{if } dK_1(S,\cdot)/dK_0(S,\cdot) > c(S) \\ 0 & \text{if } dK_1(S,\cdot)/dK_0(S,\cdot) < c(S) \end{cases}$

is a conditional Neyman–Pearson test. Observe that the S-measurable part of the likelihood ratio given by dQ^S/dP0^S can be removed, see also Remark 3.5 (b). Thus, for fixed s, we may perform a level α Neyman–Pearson test of K0(s, ·) against K1(s, ·) with critical value c(s).

Remark 3.5 (Discussion about conditional inference) (a) For convenience let P0 be dominated and let it be tested against a simple alternative {Q} with Q ≈ P for some P ∈ P0. The conditional Neyman–Pearson testing procedure can be described as follows.
(1) The distribution P′, given by dP′/dP = E•(dQ/dP|S), can be added to P0 such that S remains sufficient for P0 ∪ {P′} (with unchanged conditional expectation E•(·|S)). Note that the Neyman criterion for sufficiency can be used.
(2) Obviously, P′ and Q coincide on the S-measurable sets S^{−1}(A_S). Thus the S-marginals of P′ and Q cannot be distinguished, and testing {P′} versus {Q} is in some sense the hardest binary testing problem of P0 ∪ {P′} versus {Q}. On the other hand, the statistic T = dQ/dP of the conditional Neyman–Pearson Lemma can be replaced by

$T' = \frac{T}{E^\bullet\!\left(\frac{dQ}{dP} \,\middle|\, S\right)} = \frac{dQ}{dP'}$

without changing the test, see also Remark 2.2 (b). Thus P′ can be viewed as a projection of Q on P0 and the conditional test is actually testing {P′} versus {Q}.
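Remark 3.4 suggests a concrete recipe when both conditional laws are available: for fixed s, run an ordinary level α Neyman–Pearson test of K0(s, ·) against K1(s, ·). A minimal sketch for finite supports follows; it is our own illustration, and the greedy filling in decreasing likelihood-ratio order is just one valid way to randomize at the critical value c(s).

```python
import numpy as np

def conditional_np_test(k0, k1, alpha):
    """For fixed s, a level-alpha Neyman-Pearson test of K0(s, .) against
    K1(s, .) on a common finite support: reject where the likelihood
    ratio dK1/dK0 is large, spending null mass in decreasing order of
    the ratio and randomizing at the critical value.  Returns the
    vector of test values phi over the atoms."""
    k0 = np.asarray(k0, float)
    k1 = np.asarray(k1, float)
    # likelihood ratio, with the convention x/0 = inf for x > 0
    lr = np.where(k0 > 0, k1 / np.where(k0 > 0, k0, 1.0), np.inf)
    phi = np.zeros_like(k0)
    budget = alpha                       # null mass still available
    for i in np.argsort(-lr):
        if k0[i] <= budget:
            phi[i] = 1.0
            budget -= k0[i]
        else:
            phi[i] = budget / k0[i]      # randomization at c(s)
            budget = 0.0
    return phi

k0 = np.array([0.25, 0.25, 0.25, 0.25])
k1 = np.array([0.10, 0.20, 0.30, 0.40])
phi = conditional_np_test(k0, k1, alpha=0.1)
size = float(np.dot(phi, k0))            # conditional level
power = float(np.dot(phi, k1))           # conditional power
```

In the toy example the whole rejection mass is spent on the atom with the largest ratio, randomized so that the conditional level is exactly α = 0.1.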
(b) The present results confirm conditional likelihood inference. Suppose that a pair of statistics (X, S) is sufficient for P, where S is, as usual, P0-sufficient. Let µ be a dominating measure with marginal densities f_P^{(m)}(s) of the distribution of S under P and conditional densities f_P^{(c)}(x|s) of X given S = s, i.e.

$\frac{dP^{(X,S)}}{d\mu}(x, s) = f_P^{(c)}(x \mid s)\, f_P^{(m)}(s)$, P ∈ P.

Again the S-measurable parts of the likelihood ratio can be cancelled, and the conditional likelihood inference based on

$\frac{f_Q^{(c)}(x \mid s)}{f_{P_0}^{(c)}(x \mid s)}$, P0 ∈ P0,

is most powerful for conditional testing of P0 ∈ P0 against Q ∈ P \ P0. Additional information about the marginals, given by the part f_Q^{(m)}(s)/f_{P0}^{(m)}(s) of the likelihood ratio, is not used. The P′'s with the same S-marginal distribution as Q play the role of least favourable distributions for which the testing problem is hardest.

Let T : Ω → R denote a statistic.

Definition 3.6 The family P \ P0 has isotone likelihood ratio in T with respect to P0 if for each Q ∈ P \ P0 there exists a maximal element P0 ∈ P0 with respect to Q with

$\frac{dQ}{dP_0} = h(T, S) \cdot g(S)$  P0 + Q-a.e.,

where h : R × Ω_S → [0, ∞] is product measurable and g : Ω_S → [0, ∞) is a statistic; both may depend on the pair (P0, Q). For each s ∈ Ω_S the function t → h(t, s) is isotone.

Families with isotone likelihood ratio admit uniformly most powerful conditional tests, i.e. tests with maximum power under the restriction (2.4). Under the assumption of Definition 3.6 we have

Theorem 3.7 The S-conditional test ϕ given by (2.7) and (2.8) is a uniformly most powerful conditional test at level α for conditional testing of P0 against P \ P0.

Proof: Consider a fixed alternative Q ∈ P \ P0 with maximal element P0. By Lemma 3.3 it is enough to prove that ϕ is an S-conditional Neyman–Pearson test for testing {P0} against {Q}. Based on the function c(S) specified by (2.9), introduce a new function d(S) = h(c(S), S) g(S). Then the following inclusions hold P0 + Q-a.e.:

$\left\{\frac{dQ}{dP_0} > d(S)\right\} \subset \{\varphi = 1\}$ and $\left\{\frac{dQ}{dP_0} < d(S)\right\} \subset \{\varphi = 0\}$.

This implies the optimality of ϕ.
The present result is a slight extension of Pfanzagl [16], Sections 4.6 and 4.7, about most powerful similar tests for dominated families.

In cases where optimal one-sided tests do not exist, locally most powerful tests (also called score tests) are often proposed, which maximize the slope of the power function at the null hypothesis. It is pointed out below that locally most powerful conditional tests exist here as well. Let (P_ϑ)_{ϑ∈Θ} be L1-differentiable at ϑ = 0 with L1-derivative

$\dot{L} = \frac{d}{d\vartheta} \ln\!\left(\frac{dP_\vartheta}{dP_0}\right)\Big|_{\vartheta=0}$

of the likelihood ratios. For the concept of L1-differentiable families of distributions we refer to Witting [22, Section 1.8.1]. Recall that for each test ψ the function ϑ → E_{P_ϑ}(ψ) is differentiable at ϑ = 0 with derivative

$\frac{d}{d\vartheta} E_{P_\vartheta}(\psi)\Big|_{\vartheta=0} = E_{P_0}(\psi \dot{L})$.

Suppose now that the member P0 of the path belongs to our null hypothesis P0. A conditional test ϕ with (2.4) is called locally most powerful for {P0} against {P_ϑ : ϑ > 0} with level α if

$\frac{d}{d\vartheta} E_{P_\vartheta}(\varphi)\Big|_{\vartheta=0} \ge \frac{d}{d\vartheta} E_{P_\vartheta}(\psi)\Big|_{\vartheta=0}$

holds for all competing conditional tests ψ with level α, see (2.4). It is well known that locally most powerful parametric tests are given by upper L̇-tests, see Witting [22, p. 222]. In the conditional setup the same holds true if the test is carried out as a conditional test with the test statistic T = L̇, see Lemma 2.1.

Theorem 3.8 Let ϕ denote a conditional test with level α, see (2.4).
(a) Then ϕ is a locally most powerful conditional test for {P0} against {P_ϑ : ϑ > 0} iff

$\varphi = \begin{cases} 1 & \text{if } \dot{L} - E_{P_0}(\dot{L} \mid S) > c(S) \\ 0 & \text{if } \dot{L} - E_{P_0}(\dot{L} \mid S) < c(S) \end{cases}$

where c(·) : (Ω_S, A_S) → R is some measurable function.
(b) For ϕ given by (a) we have

$\frac{d}{d\vartheta} E_{P_\vartheta}(\varphi)\Big|_{\vartheta=0} = 0$  iff  $\dot{L} = E_{P_0}(\dot{L} \mid S)$ P0-a.e.

Remark 3.9 The projection L̇ − E_{P0}(L̇|S) of the score function L̇ is known as the efficient score function. It plays a central role in semiparametric models with square integrable score functions, see Bickel et al. [2].

Proof of Theorem 3.8: (a) Let ϕ0 be the conditional test at level α of Lemma 2.1 given by T = L̇. Recall that the derivative of the power function of any test ψ is just E_{P0}(ψ L̇). Maximizing this slope at zero amounts to solving

$E_{P_0}(\psi \dot{L}) = \max$
where E_{P0}(ψ|S) = α holds. If we put g0 = L̇ and g1 ≡ 1 then ϕ0 is a solution, see Lemma A.1. Every other solution coincides with ϕ0 P0-a.e. outside the randomization region. It is evident that the S-measurable function E_{P0}(L̇|S) can be subtracted.
(b) If the slope of the power function is zero then ϕ_α ≡ α is also locally most powerful. Thus L̇ − E_{P0}(L̇|S) = c(S) holds P0-a.e. Taking E_{P0}(·|S) on both sides yields c(S) = 0, i.e. L̇ = E_{P0}(L̇|S) P0-a.e.
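For the reflection group of Example 2.3(b) the locally most powerful conditional test of Theorem 3.8 can be sketched numerically. We assume, as our own illustrative choice, a product path whose score is L̇(x) = Σ_i ℓ(x_i) with an odd function ℓ; the centering E_{P0}(L̇|S) then vanishes by group averaging, so L̇ and the efficient score induce the same conditional test.

```python
import numpy as np
from itertools import product

def lmp_symmetry_test(x, score, alpha=0.05):
    """Locally most powerful conditional symmetry test (sketch): the
    statistic is the score sum(score(x_i)); its conditional law given
    S is the uniform mixture over all 2^n sign flips.  For an odd
    `score`, averaging over the flips gives E(L_dot | S) = 0, so the
    efficient score and L_dot induce the same conditional test."""
    x = np.asarray(x, float)
    t_obs = score(x).sum()
    # randomization distribution of the score under all 2^n reflections
    t_flip = np.sort([score(eps * x).sum()
                      for eps in product([-1.0, 1.0], repeat=len(x))])
    k = len(t_flip)
    c = t_flip[int(np.ceil((1 - alpha) * k)) - 1]   # conditional critical value
    gamma = (alpha * k - np.sum(t_flip > c)) / np.sum(t_flip == c)
    if t_obs > c:
        return 1.0
    return gamma if t_obs == c else 0.0

# score(v) = v is one admissible odd score function; any other works too
phi = lmp_symmetry_test([0.5, 1.0, 2.0], lambda v: np.asarray(v), alpha=0.05)
```

With n = 3 observations the conditional law has only 2^3 = 8 atoms, so the observed score sits at the boundary and the test randomizes with weight γ.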
4 Conditional tests for two-sided alternatives

In this section let P0 be dominated by some measure P0 ∈ P0 as in (3.1). If P \ P0 is a two-sided alternative then locally unbiased tests at P0 have to be considered. This restriction is expressed by a further equation. Let T : Ω → R again be a statistic and let h(T, S) be P0-integrable. Consider then conditional tests ϕ with E•(ϕ|S) = α and

$E_{P_0}\big((\varphi - \alpha)\, h(T, S) \,\big|\, S\big) = 0$.  (4.1)
This condition is the local conditional unbiasedness property of ϕ; confer the discussion below. Since S is sufficient for P0, (4.1) implies that E_P((ϕ − α)h(T, S)|S) = 0 holds for every measure P ∈ P0 for which the conditional expectation exists. If the conditional expectation exists for every P ∈ P0 we arrive at E•((ϕ − α)h(T, S)|S) = 0.
In the case of one-sided tests it is very easy to derive the existence of a test ϕ of the form (2.7) which fulfills the condition E•(ϕ|S) = α. The proof of the existence of a test ϕ which fulfills both E•(ϕ|S) = α and (4.1) is much more involved. The next Lemma extends classical results of Ferguson [4] for exponential families, see also Strasser [19, Chap. 2].

Lemma 4.1 Suppose that h : R × Ω_S → R is product measurable, where t → h(t, s) is isotone for each s ∈ Ω_S. Then there exist measurable functions ci(·) : Ω_S → R and γi(·) : Ω_S → [0, 1], i = 1, 2, such that

$\varphi = \begin{cases} 1 & \text{if } T \notin [c_1(S), c_2(S)] \\ \gamma_i(S) & \text{if } T = c_i(S),\ i = 1, 2 \\ 0 & \text{if } T \in (c_1(S), c_2(S)) \end{cases}$  (4.2)

fulfills E•(ϕ|S) = α and condition (4.1).
Proof: In a first step let S be constant. In this case the proof of Ferguson [4] and Strasser [19, Section 13] can easily be adapted. Let c_β denote the β-quantile of P0^T and let

$\varphi_\beta = 1_{(-\infty, c_\beta)}(T) + \gamma_\beta\, 1_{\{c_\beta\}}(T)$  (4.3)

be fixed by the condition E_{P0}(ϕ_β) = β for β ∈ (0, 1). Define ϕ_0 ≡ 0 and ϕ_1 ≡ 1. Clearly, β1 ≥ β2 implies ϕ_{β1} ≥ ϕ_{β2}, and β → ϕ_β is continuous in L1(P0).
To see this choose β_n ↑ β. Then

$\int (\varphi_\beta - \varphi_{\beta_n}) \, dP_0 = \beta - \beta_n \longrightarrow 0$.  (4.4)

Continuity from above follows by the same arguments. For each β ∈ [0, α] define the new tests

$\psi_\beta := \varphi_\beta + 1 - \varphi_{1+\beta-\alpha}$.  (4.5)

Thus β → ∫(ψ_β − α) h(T, S) dP0 is continuous on [0, α]. The ordinary generalized Neyman–Pearson Lemma implies that ψ_0 is a solution of the optimization problem

$E_{P_0}(\psi\, h(T, S)) = \max$  (4.6)

under the constraint E_{P0}(ψ) = α. Similarly, ψ_α minimizes the functional in (4.6). Thus

$E_{P_0}(\psi_\alpha\, h(T, S)) \le \alpha\, E_{P_0}(h(T, S)) \le E_{P_0}(\psi_0\, h(T, S))$  (4.7)

follows when ψ_0 and ψ_α are compared with the constant test α. By continuity we find a solution β_0 ∈ [0, α] with (4.1). In case β_0 ∈ {0, α} we then have equality in (4.7) in at least one place. The necessary part of the Neyman–Pearson Lemma then implies that h(T, S) is constant P0-a.e., since the constant test α is also a solution. Thus we may choose β_0 = α/2 here, and c_1 and c_2 are again finite.
If S is not constant choose a regular conditional distribution P^{T|S=s} of T given S = s via a kernel K_T(s, ·),

$K_T(s, A) = E^\bullet(1_A(T) \mid S = s)$, A ∈ B.

We will find a solution in the class of tests ψ = ψ(T, S) which depend only on (T, S). On the set where ∫|h(t, s)| K_T(s, dt) < ∞ holds, the constraints read

$\int \psi(t, s)\, K_T(s, dt) = \alpha$ and $\int (\psi(t, s) - \alpha)\, h(t, s)\, K_T(s, dt) = 0$.  (4.8)

We will see that our test ϕ turns out to be a pointwise solution of (4.8) for fixed s. Together with the first part of the proof a measurable solution of (4.8) will then be guaranteed. For these reasons let s → β(s) be a measurable function of conditional levels in [0, 1]. Then the critical values

$s \mapsto c_{\beta(\cdot)}(s) := \inf\{y \in \mathbb{Q} : K_T(s, (-\infty, y]) \ge \beta(s)\}$

given by β(·) are measurable. Let now ϕ_{β(·)}(T, S) denote the test (4.3) with β and γ replaced by functions β(·) and γ(·) on Ω_S, where γ(·) is determined by

$\int \varphi_{\beta(\cdot)}(t, s)\, K_T(s, dt) = \beta(s)$.

Clearly, ϕ_{β(·)} is measurable. Similarly as in (4.5) we now introduce ψ_{β(·)}(T, S) for 0 ≤ β(·) ≤ α, where

$\int \psi_{\beta(\cdot)}(t, s)\, K_T(s, dt) = \alpha$
holds for all s. A measurable solution will now be derived via a suitable measurable function β_0(·). Observe that for β ∈ [0, α]

$f(s, \beta) := \int (\psi_\beta(t, s) - \alpha)\, h(t, s)\, K_T(s, dt)$

is continuous in β and measurable in s. Thus

$\beta_0(s) = \inf\{\beta \in [0, \alpha] \cap \mathbb{Q} : f(s, \beta) \ge 0\}$

is a measurable function with f(s, β_0(s)) = 0. The corresponding test ψ_{β_0(·)} has the desired form. It may happen that A = {s : β_0(s) = 0 or α} is not empty. On this set we may modify the c_i(·) as follows in order to replace them by real functions. Notice that, as above in the unconditional case, we may choose β_0(s) = α/2 for s ∈ A since then f(s, α/2) = 0 holds. Thus the proof is complete.

Before applications are studied we will discuss the local conditional unbiasedness condition (4.1). Let us summarize some details about L1-differentiable paths. The derivative of the likelihood ratio at P belongs to the set of score functions

$L_1^0(P) = \left\{\ell \in L_1(P) : \int \ell \, dP = 0\right\}$.

On the other hand it is easy to see that for each ℓ ∈ L_1^0(P) there exists a path t → P_t, P_0 = P, with derivative ℓ. For instance we may choose

$\frac{dP_t}{dP} = |1 + t\ell|\, \varepsilon^{-1}(t)$, t ∈ R,  (4.9)

with ε(t) = E_P(|1 + tℓ|). The details are left to the reader.

Remark 4.2 (Discussion of the local conditional unbiasedness condition) (a) Let (Q_ϑ)_{ϑ∈Θ} be a one parameter exponential family in ϑ ∈ Θ ⊂ R and h(T, S), with ϑ_0 in the interior of Θ, defined by

$\frac{dQ_\vartheta}{dQ_{\vartheta_0}} = C(\vartheta) \exp\big((\vartheta - \vartheta_0)\, h(T, S)\big)$.

The local unbiasedness condition for tests ϕ of H = {ϑ_0} against K = Θ \ {ϑ_0} is now given by

$\frac{d}{d\vartheta} E_{Q_\vartheta}(\varphi)\Big|_{\vartheta=\vartheta_0} = E_{Q_{\vartheta_0}}(\varphi\, h(T, S)) - E_{Q_{\vartheta_0}}(\varphi)\, E_{Q_{\vartheta_0}}(h(T, S)) = 0$.

If now Q_{ϑ_0} ∈ P0 holds we arrive at the condition E_{Q_{ϑ_0}}((ϕ − α)h(T, S)) = 0, which is an unconditional version of (4.1) with P0 replaced by Q_{ϑ_0}. For a rich enough class P0 we may attach an exponential family in ϑ and h(T, S) with foot point Q_{ϑ_0} = P to every P ∈ P0. Let us further assume that S is P0-complete. Then we arrive at condition (4.1).
(b) On the other hand, let h(T, S) be integrable for each P ∈ P0. Then at each foot point P we may also attach an L1-differentiable path t → P_t with score function

$\ell_P := h(T, S) - E_P(h(T, S))$
at P ∈ P_0. Local unbiasedness of a test ϕ with E_P(ϕ) = α then yields E_P((ϕ − α) h(T, S)) = 0. When the model is rich enough the latter condition holds for each P ∈ P_0. If in addition our statistic S is P_0-complete we then have E_•((ϕ − α) h(T, S) | S) = 0, which explains the notion of conditional unbiasedness of a test ϕ a second time.

Below let T and h be fixed statistics. Suppose that t ↦ h(t, s) is isotone for each s ∈ S. Then the existence of the test ϕ in (4.2) is ensured. In the following we study the optimality properties of this test ϕ. For one-sided hypotheses we considered alternatives with isotone likelihood ratio in T; for two-sided hypotheses we have to consider alternatives with convex likelihood ratio in T.

Definition 4.3 The family P \ P_0 has convex likelihood ratio in T and h with respect to P_0 if P is dominated by P_0 and if for each Q ∈ P \ P_0 the likelihood ratio of Q with respect to P_0 is of the form

  dQ/dP_0 = f(h(T, S)) · g(S)   P_0-a.e.   (4.10)
where f : ℝ → [0, ∞) is strictly convex and g : S → [0, ∞) is measurable; both may depend on Q. Observe that in Definition 4.3 P_0 may be replaced by another maximal element Q_0 ∈ P_0 since dP_0/dQ_0 is a function of S. With the definition of a convex likelihood ratio we get the following theorem.

Theorem 4.4 Let P \ P_0 be a family with convex likelihood ratio in T and h with respect to P_0. Each test ϕ of the form (4.2) which fulfills the conditions (2.4) and (4.1) has optimum power for testing P_0 against P \ P_0 within the class of conditional tests (2.4) under the additional condition (4.1).

Proof: Let Q ∈ P \ P_0 and P_0, f, g as in (4.10) be fixed. Then the test ϕ can be rewritten as

  ϕ = 1 if f(h(T, S)) ∉ [d_1(S), d_2(S)],   ϕ = 0 if f(h(T, S)) ∈ (d_1(S), d_2(S)),   (4.11)

with d_i(S) = f(h(c_i(S), S)), i = 1, 2. We will show that E_Q(ϕ) is maximal in the present class of tests. For this purpose we may define S_1 = S_2 = S for m = r = 2 and

  g_0 = dQ/dP_0,   g_1 = 1,   g_2 = h(T, S),   (4.12)
and we will show that the conditional Neyman–Pearson Lemma, Lemma A.1, can be applied. The restrictions are E_{P_0}(ψ | S) = α and E_{P_0}(ψ h(T, S) | S) = α E_{P_0}(h(T, S) | S). Since f is strictly convex there exist measurable functions a_i(·) with

  {t : f(h(t, S)) < a_1(S) + a_2(S) h(t, S)} = (d_1(S), d_2(S)).

Put k_1(S) = a_1(S)/g(S) and k_2(S) = a_2(S)/g(S) for g(S) ≠ 0, and k_1(S) = k_2(S) = 0 otherwise. Then ϕ has the structure required in Lemma A.1 since by (4.10) and (4.11)

  {dQ/dP_0 > k_1(S) g(S) + k_2(S) h(T, S)} ⊂ {ϕ = 1}  and
  {dQ/dP_0 < k_1(S) g(S) + k_2(S) h(T, S)} ⊂ {ϕ = 0}.

Thus E_Q(ϕ) is maximal. □
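The convexity argument above rests on the elementary fact that a strictly convex function lies strictly below an affine function exactly on an open interval. A small numerical sketch of this fact (the choice f(x) = x² and the grid are ours, purely for illustration):

```python
# The set {x : f(x) < a1 + a2*x} is an open interval when f is strictly
# convex: f(x) - (a1 + a2*x) is strictly convex and hence negative on at
# most one open interval between its roots.
def below_line(f, a1, a2, xs):
    """Grid points where f lies strictly below the line a1 + a2*x."""
    return [x for x in xs if f(x) < a1 + a2 * x]

f = lambda x: x * x                       # strictly convex example
xs = [i / 100 for i in range(-500, 501)]  # grid on [-5, 5]

hits = below_line(f, 1.0, 0.0, xs)        # x*x < 1  <=>  -1 < x < 1
assert hits == [x for x in xs if -1 < x < 1]

# the solution set is a contiguous run of grid points, i.e. an interval
idx = [xs.index(h) for h in hits]
assert idx == list(range(idx[0], idx[0] + len(idx)))
```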
The proof above extends to the following alternatives.

Remark 4.5 Consider again conditional tests with (2.4) and (4.1).
(a) If P is a set of distributions such that (4.10) holds with a strictly concave function f, then the error probability E_P(ϕ) is minimal for P ∈ P. Thus the solution ϕ is of level α on the strictly concave part of distributions P. Roughly speaking, null hypotheses with concave likelihood ratios are tested against strictly convex likelihood ratios.
(b) Suppose that there exist measurable functions k_1, k_2 such that

  dP/dP_0 = k_1(S) + k_2(S) h(T, S)

is a density of some distribution P. Then Lemma A.1 can be applied to the constant test α, which is a solution under (4.1). Thus E_P(ϕ) = α follows.

Remark 4.6 (a) If in addition S is P_0-boundedly complete, then S-conditional optimality of tests implies unconditional optimality of one-sided tests within the class of P_0 α-similar tests.
(b) The same assertion holds for two-sided tests if either the additional function h(T, S) is bounded or S is P_0-complete.
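The differentiation behind the local unbiasedness condition stated at the beginning of this section can be spelled out; the following short computation is our rearrangement, assuming differentiation under the integral sign is justified:

```latex
% local unbiasedness in the exponential family
% dQ_\vartheta / dQ_{\vartheta_0} = C(\vartheta)\exp((\vartheta-\vartheta_0) h(T,S)),
% with C(\vartheta_0) = 1
\begin{align*}
\frac{d}{d\vartheta} E_{Q_\vartheta}(\varphi)\Big|_{\vartheta=\vartheta_0}
  &= \frac{d}{d\vartheta}\Big( C(\vartheta)\int \varphi\,
     e^{(\vartheta-\vartheta_0) h(T,S)}\, dQ_{\vartheta_0} \Big)\Big|_{\vartheta=\vartheta_0} \\
  &= C'(\vartheta_0)\, E_{Q_{\vartheta_0}}(\varphi)
     + E_{Q_{\vartheta_0}}(\varphi\, h(T,S)).
\end{align*}
```

Applying the same identity to the constant test ϕ ≡ 1, whose power is identically 1, gives C′(ϑ_0) = −E_{Q_{ϑ_0}}(h(T, S)) and hence the stated condition E_{Q_{ϑ_0}}(ϕ h(T, S)) − E_{Q_{ϑ_0}}(ϕ) E_{Q_{ϑ_0}}(h(T, S)) = 0.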
5 Examples

5.1 Permutation tests

As in Example 2.3 (a) let G be the group of transformations generated by the permutations of the coordinates, F the G-invariant Borel sets on Ω^n and S the A^n-F-measurable identity. In this set up we are testing the null hypothesis of an i.i.d. situation against various two-sample alternatives.
Example 5.1 (Example 2.3 (a) continued) Let T : Ω → ℝ denote a fixed test statistic and P_0 = {P^n : P ∈ 𝒫}, 𝒫 ⊂ M_1(Ω, A), the null hypothesis.
(a) (One-sided permutation tests) For n_1 + n_2 = n let P \ P_0 be given by all distributions Q^{n_1} ⊗ P^{n_2}, P ∈ 𝒫, where the P_0^{n_1}-density of Q^{n_1} differs from the P_0^{n_1}-density of P^{n_1} only by a multiplicative factor h(T(x_1) + ··· + T(x_{n_1})), where h is an isotone function and P_0 ∈ 𝒫 is some fixed dominating distribution, P + Q ≪ P_0, with

  dQ^{n_1}/dP_0^{n_1}(x_1, …, x_{n_1}) = h(Σ_{i=1}^{n_1} T(x_i)) · dP^{n_1}/dP_0^{n_1}(x_1, …, x_{n_1}).

In this situation the P_0^n-density of Q^{n_1} ⊗ P^{n_2} is given by

  d(Q^{n_1} ⊗ P^{n_2})/dP_0^n(x_1, …, x_n) = h(Σ_{i=1}^{n_1} T(x_i)) · dP^n/dP_0^n(x_1, …, x_n).
According to Theorem 3.7 the test

  ϕ = 1, γ(S) or 0  according as  (1/n_1) Σ_{i=1}^{n_1} T(x_i) − (1/n_2) Σ_{i=n_1+1}^{n} T(x_i)  is  >, = or < c(S)
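For small samples the conditional critical value c(S) and the randomization constant γ(S) can be computed exactly by enumerating the permutation distribution of the statistic given the pooled observations. The following sketch is ours (the function name, data and level are illustrative, not from the paper); it also checks that the conditional size is exactly α:

```python
from itertools import combinations

def perm_test(x, n1, alpha):
    """Exact one-sided permutation test for mean(first n1 obs) - mean(rest).
    Returns phi(x) in [0, 1]; the value gamma(S) is used on the boundary
    c(S) so that the conditional size is exactly alpha."""
    n, n2 = len(x), len(x) - n1
    total = sum(x)
    def stat(idx):
        s1 = sum(x[i] for i in idx)
        return s1 / n1 - (total - s1) / n2
    # conditional (permutation) distribution of the statistic given S
    vals = [stat(c) for c in combinations(range(n), n1)]
    m = len(vals)
    c = sorted(vals, reverse=True)[int(alpha * m)]    # critical value c(S)
    n_above = sum(v > c for v in vals)
    n_eq = sum(v == c for v in vals)
    gamma = (alpha * m - n_above) / n_eq              # randomization gamma(S)
    t_obs = stat(tuple(range(n1)))
    return 1.0 if t_obs > c else (gamma if t_obs == c else 0.0)

x = [3, 1, 7, 0, 2, 5]                                # illustrative data
alpha = 0.1
# exactness check: averaging phi over all equally likely assignments of
# the pooled values to the two groups gives alpha
phis = [perm_test([x[i] for i in grp] + [x[i] for i in range(6) if i not in grp],
                  3, alpha)
        for grp in combinations(range(6), 3)]
assert abs(sum(phis) / len(phis) - alpha) < 1e-12
```

Integer data are used on purpose so that all permutation sums are exact and ties in the statistic are handled consistently.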
0, P ∈ P0 }. Observe that S(x) = |x| is boundedly complete. Remark 5.4 The present finite sample optimality results about symmetry fit into the asymptotic results of Janssen [9]. In that paper it is pointed out that after a suitable studentization the idea of S-conditional tests also works well asymptotically for extended null hypotheses given by functionals (even for non-symmetric distributions under the null hypothesis).
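The same construction applies under the symmetry null hypothesis: conditionally on S(x) = |x| the 2^n sign vectors are equally likely under 0-symmetry, so the conditional critical value of, say, the sum statistic can be enumerated exactly. A minimal sketch under illustrative data (names and level are ours):

```python
from itertools import product

def sign_test(x, alpha):
    """Exact conditional symmetry test: given S(x) = |x| the 2^n sign
    vectors are equally likely under 0-symmetry; reject for large sum(x)."""
    absx = [abs(v) for v in x]
    vals = [sum(s * a for s, a in zip(signs, absx))
            for signs in product((-1, 1), repeat=len(x))]
    m = len(vals)
    c = sorted(vals, reverse=True)[int(alpha * m)]    # critical value c(|x|)
    n_above = sum(v > c for v in vals)
    gamma = (alpha * m - n_above) / sum(v == c for v in vals)
    t = sum(x)
    return 1.0 if t > c else (gamma if t == c else 0.0)

# conditional size is exactly alpha: average phi over all sign vectors
base = [1, 2, 4, 8, 16]                               # illustrative data
avg = sum(sign_test([s * a for s, a in zip(signs, base)], 0.0625)
          for signs in product((-1, 1), repeat=5)) / 32
assert avg == 0.0625
```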
Appendix

In the appendix we prove a generalized conditional Neyman–Pearson Lemma. In the framework of conditional tests this Lemma plays the same role as the generalized Neyman–Pearson Lemma does in the unconditional framework. A prior version of a conditional fundamental Lemma can be found in Pfanzagl [16, p. 145] for densities and m = 1. For the general case with m > 1 a different proof is required.

Let P be a distribution on (Ω, A), let g_i : Ω → ℝ be P-integrable functions, 0 ≤ i ≤ m, and S_i : Ω → Ω_i measurable functions, 1 ≤ i ≤ m. The statistics S_i will describe conditional constraints. Suppose that S_1 = ··· = S_r are the same for some 1 ≤ r ≤ m (with Ω_1 = ··· = Ω_r). We are looking for a test ϕ : Ω → [0, 1] which maximizes the functional

  ∫ ψ g_0 dP   (A.1)

under all tests ψ : Ω → [0, 1] which satisfy either the side conditions

  E(ψ g_i | S_i) = α_i(S_i),   1 ≤ i ≤ m,   (A.2)

or the more general side conditions

  E(ψ g_i | S_i) ≤ α_i(S_i),   1 ≤ i ≤ m,   (A.3)
for some given measurable functions α_i(·) : Ω_i → ℝ. Clearly, the test ϕ has to satisfy the side conditions (A.2) or (A.3) as well.

Lemma A.1 (Conditional Neyman–Pearson Lemma) Let Ψ_= (and Ψ_≤) denote the set of all tests ψ : Ω → [0, 1] which satisfy the side conditions (A.2) (and conditions (A.3), respectively). Then the following holds true:

(a) (Sufficient condition) For some measurable functions c_i(·) : Ω_i → ℝ, 1 ≤ i ≤ m, let ϕ : Ω → [0, 1] be a test of the form

  ϕ = 1 if g_0 > Σ_{i=1}^m c_i(S_i) g_i,   ϕ = 0 if g_0 < Σ_{i=1}^m c_i(S_i) g_i,   (A.4)

which belongs to the set Ψ_=. If the c_i(S_i) g_i are P-integrable for i > r, the test ϕ maximizes the functional (A.1) under all tests which belong to the set Ψ_=:

  ∫ ϕ g_0 dP = sup_{ψ ∈ Ψ_=} ∫ ψ g_0 dP.   (A.5)

If in addition we have c_i(S_i) ≥ 0 for all 1 ≤ i ≤ m, then Ψ_= can be replaced in (A.5) by Ψ_≤.
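The sufficient condition (a) can be sanity-checked numerically in the simplest unconditional special case m = 1 with S_1 constant and g_1 = 1, where (A.2) reduces to E_P(ψ) = α and (A.4) is the classical Neyman–Pearson test. A brute-force sketch over a three-point sample space (distributions and level are our illustrative choices):

```python
from itertools import product

# Unconditional special case of Lemma A.1: m = 1, S_1 constant, g_1 = 1,
# so (A.2) is E_P(psi) = alpha and (A.4) is the classical NP test.
p = [0.5, 0.3, 0.2]                      # null P on a 3-point space
q = [0.2, 0.3, 0.5]                      # alternative Q
g0 = [qi / pi for qi, pi in zip(q, p)]   # likelihood ratio dQ/dP = (0.4, 1.0, 2.5)
alpha = 0.25

# test of form (A.4) with c_1 = 1.0: phi = 1 where g0 > 1, 0 where g0 < 1,
# randomized where g0 == 1 so that E_P(phi) = alpha holds exactly
phi_star = [0.0, 1 / 6, 1.0]
assert abs(sum(f * pi for f, pi in zip(phi_star, p)) - alpha) < 1e-12
power_star = sum(f * qi for f, qi in zip(phi_star, q))   # = 0.55

# brute force: no randomized test on a 0.05-grid satisfying the side
# condition beats the test of form (A.4)
grid = [i / 20 for i in range(21)]
for psi in product(grid, repeat=3):
    if abs(sum(f * pi for f, pi in zip(psi, p)) - alpha) < 1e-9:
        assert sum(f * qi for f, qi in zip(psi, q)) <= power_star + 1e-9
```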
(b) (Uniqueness) Let c_i and ϕ be as in (a) and let ψ ∈ Ψ_= be another test which maximizes the functional (A.1) under all tests in the set Ψ_=. Then ϕ|_A = ψ|_A holds P-a.s. on the set A := {g_0 ≠ Σ_{i=1}^m c_i(S_i) g_i}.

Proof: (a) For K > 0 we define A_K := ∩_{i=1}^r {|c_i(·)| ≤ K} such that the c_i(S_i) g_i 1_{A_K}(S_1) are integrable for 1 ≤ i ≤ m. Due to (A.4) we conclude

  (ϕ − ψ)(g_0 − Σ_{i=1}^m c_i(S_i) g_i) ≥ 0   (A.6)

for every test ψ ∈ Ψ_≤. This yields

  ∫ (ϕ − ψ) 1_{A_K}(S_1) g_0 dP ≥ Σ_{i=1}^m ∫ (ϕ − ψ) 1_{A_K}(S_1) c_i(S_i) g_i dP,   (A.7)

where the left hand side of (A.7) tends to ∫ (ϕ − ψ) g_0 dP as K → ∞. Consider a term on the right hand side of (A.7) with i > r. As K → ∞ this term tends to

  ∫ (ϕ − ψ) c_i(S_i) g_i dP = ∫ E((ϕ − ψ) g_i | S_i) c_i(S_i) dP.

This expression is 0 if ψ ∈ Ψ_=, and ≥ 0 if ψ ∈ Ψ_≤ and c_i ≥ 0. For every term on the right hand side of (A.7) with i ≤ r we have

  ∫ (ϕ − ψ) 1_{A_K}(S_1) c_i(S_1) g_i dP = ∫ E((ϕ − ψ) g_i | S_1) 1_{A_K}(S_1) c_i(S_1) dP.

Again, this expression is 0 if ψ ∈ Ψ_=, and ≥ 0 if ψ ∈ Ψ_≤ and c_i ≥ 0. Putting both cases together yields ∫ (ϕ − ψ) g_0 dP ≥ 0.

(b) Suppose that ∫ ψ g_0 dP = ∫ ϕ g_0 dP holds. According to (A.6) we have f := (ϕ − ψ)(g_0 − Σ_{i=1}^m c_i(S_i) g_i) ≥ 0, and h(S_1) := E(f | S_1) ≥ 0 exists, where

  0 ≤ E(f 1_{A_K}(S_1) | S_1) = h(S_1) 1_{A_K}(S_1) → h(S_1)

holds as K → ∞. We can take conditional expectations of f since f ≥ 0, although f may not be integrable. It is enough to prove E(f) = 0. In the same way as in part (a) we prove

  E((ϕ − ψ) Σ_{i=1}^r c_i(S_i) g_i 1_{A_K}(S_1) | S_1) = 0
which entails

  E(f 1_{A_K}(S_1) | S_1) = E((ϕ − ψ)[g_0 − Σ_{i=r+1}^m c_i(S_i) g_i] 1_{A_K}(S_1) | S_1)
    → h(S_1) = E((ϕ − ψ)[g_0 − Σ_{i=r+1}^m c_i(S_i) g_i] | S_1).
Thus we have

  E(f) = E(h(S_1)) = E((ϕ − ψ)[g_0 − Σ_{i=r+1}^m c_i(S_i) g_i])
       = E((ϕ − ψ)[− Σ_{i=r+1}^m c_i(S_i) g_i])

by our assumption ∫ ϕ g_0 dP = ∫ ψ g_0 dP. On the other hand condition (A.2) also implies E((ϕ − ψ) c_i(S_i) g_i) = E(E((ϕ − ψ) g_i | S_i) c_i(S_i)) = 0 for all i > r, and thus E(f) = 0 follows. This completes the proof. □

In Pfanzagl [16] the prior conditional Neyman–Pearson Lemma was used to derive most powerful one-sided tests for a real parameter in the presence of a nuisance parameter under the assumption of an exponential family. The case of two-sided hypotheses was left open. With Lemma A.1 and Lemma 4.1 it is possible to propose most powerful two-sided tests for the same situation. Observe also Remark 4.2 and Theorem 4.4.
References

[1] C. B. Bell and P. K. Sen. Randomization procedures. In P. R. Krishnaiah and P. K. Sen (eds.), Handbook of Statistics, Vol. 4, pages 1–29. Amsterdam, Elsevier Science Publishers, 1984.
[2] P. J. Bickel, C. A. J. Klaassen, Y. Ritov, and J. A. Wellner. Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins Series in the Mathematical Sciences. Baltimore and London, Johns Hopkins Univ. Press, 1993.
[3] R. M. Dudley. Real Analysis and Probability. Pacific Grove, California, Wadsworth & Brooks/Cole, 1989.
[4] T. S. Ferguson. Mathematical Statistics, A Decision Theoretic Approach. New York and London, Academic Press, 1967.
[5] J. Gebhard and N. Schmitz. Permutation tests – A revival?! I. Optimum properties. Statistical Papers, 39:75–86, 1998.
[6] J. Gebhard and N. Schmitz. Permutation tests – A revival?! II. An efficient algorithm for computing the critical region. Statistical Papers, 39:87–96, 1998.
[7] W. Hoeffding. The large-sample power of tests based on permutations of the observations. Ann. Math. Statist., 23:169–192, 1952.
[8] A. Janssen. Studentized permutation tests for non-i.i.d. hypotheses and the generalized Behrens–Fisher problem. Statist. Probab. Lett., 36:9–21, 1997.
[9] A. Janssen. Nonparametric symmetry tests for statistical functionals. Math. Methods Statist., 8:320–343, 1999.
[10] A. Janssen and C.-D. Mayer. Conditional studentized survival tests for randomly censored models. Scand. J. Statist., 28:283–293, 2001.
[11] A. Janssen and Th. Pauls. How do bootstrap and permutation tests work? Ann. Statist., 31:768–806, 2003.
[12] E. L. Lehmann and J. P. Romano. Testing Statistical Hypotheses, 3rd ed. Springer Texts in Statistics. New York, Springer, 2005.
[13] E. L. Lehmann and C. Stein. On the theory of some non-parametric hypotheses. Ann. Math. Statist., 20:28–45, 1949.
[14] Yu. V. Linnik. Statistical Problems with Nuisance Parameters. Translations of Mathematical Monographs 20. Providence, RI, Amer. Math. Soc., 1968.
[15] M. Pagano and D. Tritchler. On obtaining permutation distributions in polynomial time. J. Amer. Statist. Assoc., 78:435–440, 1983.
[16] J. Pfanzagl. Parametric Statistical Theory. Berlin – New York, de Gruyter, 1994.
[17] J. P. Romano. Bootstrap and randomization tests of some nonparametric hypotheses. Ann. Statist., 17:141–159, 1989.
[18] J. P. Romano. On the behavior of randomization tests without a group invariance assumption. J. Amer. Statist. Assoc., 85:687–692, 1990.
[19] H. Strasser. Mathematical Theory of Statistics. Berlin – New York, de Gruyter, 1985.
[20] H. Strasser and Ch. Weber. The asymptotic theory of permutation statistics. Math. Methods Statist., 8:220–250, 1999.
[21] D. Völker. Finit optimale nichtparametrische Tests für Lebensdauerdaten. PhD thesis, Westfälische Wilhelms-Universität Münster (in German), 2003.
[22] H. Witting. Mathematische Statistik. Stuttgart, B. G. Teubner, 1985.
Arnold Janssen, Mathematisches Institut, Heinrich-Heine-Universität, Universitätsstraße 1, 40225 Düsseldorf, Germany
[email protected]
Dominik Völker, Institut für Mathematische Statistik, Westfälische Wilhelms-Universität, Einsteinstraße 62, 48149 Münster, Germany
[email protected]