
arXiv:1702.03760v1 [math.ST] 13 Feb 2017

Minimax Euclidean Separation Rates for Testing Convex Hypotheses in $\mathbb{R}^d$

Gilles Blanchard, Alexandra Carpentier and Maurilio Gutzeit*
Universität Potsdam, Institut für Mathematik
Karl-Liebknecht-Straße 24-25, 14476 Potsdam, Germany
e-mail: gilles.blanchard/carpentier/[email protected]

Abstract: We consider composite-composite testing problems for the expectation in the Gaussian sequence model where the null hypothesis corresponds to a convex subset $C$ of $\mathbb{R}^d$. We adopt a minimax point of view and our primary objective is to describe the smallest Euclidean distance between the null and alternative hypotheses such that there is a test with small total error probability. In particular, we focus on the dependence of this distance on the dimension $d$ and the sample size/variance parameter $n$, giving rise to the minimax separation rate. In this paper we discuss lower and upper bounds on this rate for different smooth and non-smooth choices of $C$.

Keywords and phrases: composite testing problem, minimax testing.

1. Introduction

In this paper we consider the problem of testing whether a vector $\mu \in \mathbb{R}^d$ belongs to a convex subset $C$ of $\mathbb{R}^d$, with $d \in \mathbb{N}$, based on a noisy observation $X$ obtained from the Gaussian sequence model with variance scaling parameter $n \in \mathbb{N}$, i.e.

$$X = \mu + \frac{1}{\sqrt{n}}\,\epsilon, \tag{1.1}$$

where $\epsilon$ is a standard Gaussian vector. More precisely, in an $L_2$ sense, we aim at finding the order of magnitude of the smallest separation distance $\rho > 0$ from $C$ such that the testing problem

$$H_0: \mu \in C \quad\text{vs.}\quad H_\rho: \inf_{c \in C}\|\mu - c\|_2 \geq \rho, \tag{1.2}$$

where $\|\cdot\|_2$ is the Euclidean distance, can be solved in the following sense: for $\eta \in (0, \frac{1}{2})$, we can construct a uniformly $\eta$-consistent test $\varphi$ for (1.2), i.e.

$$\sup_{\mu \in C} P_\mu(\varphi = 1) + \sup_{\mu \in A_\rho} P_\mu(\varphi = 0) \leq \eta, \tag{1.3}$$

where $A_\rho \subset \mathbb{R}^d$ corresponds to $H_\rho$. We write $\rho^*(C) := \rho^*(C, d, n, \eta)$ for the minimax-optimal separation distance $\rho$ for this problem, i.e. the smallest distance $\rho$ that enables the existence of such a test. See Section 2 for a more precise description of the model and relevant quantities.

* Corresponding author


An instance of this problem that was extensively studied is signal detection, i.e. the case where $C$ is a singleton; see e.g. [11] for an extensive survey of this problem and also [2]. From this literature, we can deduce that the minimax-optimal order of $\rho^*(C)$ in this case is $\frac{d^{1/4}}{\sqrt{n}}$.

In its general form, however, this problem is a composite-composite testing problem (i.e. neither $C$ nor $A_\rho$ is a singleton). A versatile way of solving such testing problems was introduced in [9], where the authors combine signal detection ideas with a covering of the null hypothesis in order to derive minimax-optimal testing procedures for composite-composite testing problems, provided that the null hypothesis is not too large (i.e. that its entropy number is not too large, see Assumption (A3) in [9]). In this case, the authors prove that the minimax-optimal separation rate in this composite-composite testing problem is the same as the signal detection separation rate, namely $\frac{d^{1/4}}{\sqrt{n}}$. This idea can also be generalised to the case where the null hypothesis is "too large" (when Assumption (A3) in [9] is not satisfied), and in this case the approach implies that an upper bound on the minimax rate of separation is the sum of the signal detection rate and the optimal estimation rate in the null hypothesis $C$; see [5] for an illustration of this for a specific convex shape. With this technique, the smaller the entropy of $C$, the smaller the separation rate. This idea has the advantage of generality, but is nevertheless sub-optimal in many simple cases. For instance, in the case where the convex set $C$ is a half-space, the minimax-optimal separation rate is $\frac{1}{\sqrt{n}}$, which is much smaller than the minimax-optimal signal detection rate, even though a half-space has a much larger entropy (it is even infinite), and larger dimension, than a single point. See Section 3 for an extended discussion of this case. This highlights the fact that for such a testing problem it is in many cases not the entropy, or size, of the null hypothesis that drives the rate, but rather other properties of the shape $C$.

In order to overcome the limitations of this approach, other ideas were proposed. A first line of work is the paper [3], where the authors consider the general testing problem (1.2), but for separation in $\|\cdot\|_\infty$-norm instead of $\|\cdot\|_2$-norm. Since any convex set can be written as an intersection of half-spaces, they rewrite the problem as a multiple testing problem. This approach is quite fruitful, but the $\|\cdot\|_\infty$-norm results translate in a non-optimal way into $\|\cdot\|_2$-norm in high dimension, in terms of the dependence on the dimension $d$. A second main direction that was investigated was testing for some specific convex shapes, e.g. the cone of positive, monotone, or convex functions, see e.g. [12], or balls for some metrics [10, 13, 7]. These papers exhibit the minimax-optimal separation distance (or a near-optimal distance, in some cases of [12] and [13]) for the specific convex shapes that are considered. The models in [12, 13, 7] are different from our model, as they consider functional estimation, and the results are therefore not immediately applicable to our setting.


Besides, they consider specific shapes, i.e. cones in the first two papers and a smoothness ball in the third, but they do not provide results for a more general null hypothesis shape. In Sections 3 and 5, we derive results for our model and shapes related to those of these papers, namely the positive orthant and the Euclidean ball, in order to relate our work to these earlier results. Finally, a last type of result related to our problem concerns the case where the null hypothesis can be parametrised, see e.g. [8], where the authors consider shapes that can be parametrised by a quadratic functional. This approach and the corresponding results are very suggestive, as they indicate that the smoothness of the shape has an impact on the testing rate.

In this paper, we take a more general approach toward the testing problem (1.2). In Section 3, we expose the range of possible separation rates by demonstrating that, without any further assumptions on $C$, the statement

$$\frac{1}{\sqrt{n}} \lesssim \rho^*(C) \lesssim \frac{\sqrt{d}}{\sqrt{n}} \tag{1.4}$$

is sharp up to $\ln(d)$-factors. After that, in Sections 4 and 5, we investigate the potential of a geometric smoothness property of the surface of $C$. Despite its simplicity, this property takes us quite far: in particular, given any separation rate satisfying (1.4), it allows for constructing a set $C$ exhibiting this rate up to $\ln(d)$-factors.

2. Setting

Let $d, n \in \mathbb{N}$. We consider the $d$-dimensional statistical model

$$X = \mu + \frac{1}{\sqrt{n}}\,\epsilon, \tag{2.1}$$

where $\mu \in \mathbb{R}^d$ is unknown and $\epsilon$ is a standard Gaussian vector, written $\epsilon \sim \mathcal{N}(O_d, I_d)$ with the null vector $O_d$ and the identity matrix $I_d$. Now, let $C \subsetneq \mathbb{R}^d$ be non-empty and convex, and for $x \in \mathbb{R}^d$ define

$$\mathrm{dist}(x, C) := \inf_{c \in C}\|x - c\|_2,$$

where $\|\cdot\|_2$ denotes the Euclidean norm, i.e. $\|x\|_2 = \sqrt{\sum_{i=1}^d x_i^2}$. The corresponding open Euclidean ball with center $z \in \mathbb{R}^d$ and radius $r > 0$ is denoted $B(z, r)$.

For $\rho > 0$, we are interested in the testing problem

$$H_0: \mu \in C \quad\text{vs.}\quad H_\rho: \mathrm{dist}(\mu, C) \geq \rho \tag{2.2}$$

and we write $A_\rho := \{x \in \mathbb{R}^d \mid \mathrm{dist}(x, C) \geq \rho\}$.


Our aim is to find the smallest value of $\rho$ such that testing (2.2) with prescribed total error probability is possible in a minimax sense, i.e. the quantity

$$\rho^*(C) := \rho^*(C, d, n, \eta) := \inf\Big\{\rho > 0 \;\Big|\; \exists\,\text{test }\varphi:\; \sup_{\mu \in C} P_\mu(\varphi = 1) + \sup_{\mu \in A_\rho} P_\mu(\varphi = 0) \leq \eta\Big\}$$

for $\eta \in (0, \frac{1}{2})$. Here, a test $\varphi$ is a measurable function $\varphi: \mathbb{R}^d \to \{0, 1\}$.
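As an aside, for any fixed pair of parameters the two error terms in this definition are straightforward to estimate by simulation. The following is a minimal sketch, assuming numpy; the function name `error_sum` and all sizes are our illustrative choices, not part of the paper, and the suprema over $C$ and $A_\rho$ would in general require a search over both sets.

```python
import numpy as np

rng = np.random.default_rng(0)

def error_sum(test, mu_null, mu_alt, n, n_mc=10_000):
    """Monte Carlo estimate of P_mu0(phi = 1) + P_mu1(phi = 0) in model (2.1),
    for one fixed mu_null in C and one fixed mu_alt in A_rho."""
    d = len(mu_null)
    # Model (2.1): X = mu + eps / sqrt(n) with eps ~ N(O_d, I_d)
    X0 = mu_null + rng.standard_normal((n_mc, d)) / np.sqrt(n)
    X1 = mu_alt + rng.standard_normal((n_mc, d)) / np.sqrt(n)
    type_i = np.mean([test(x) for x in X0])       # rejection under the null
    type_ii = np.mean([1 - test(x) for x in X1])  # acceptance under the alternative
    return type_i + type_ii
```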

In particular, we focus on the dependence of $\rho^*(C)$ on the dimension $d$ and the sample size/variance parameter $n$. In terms of notation, this is done by using the symbols $\lesssim$, $\gtrsim$ and $\asymp$ as follows: we write $\rho^*(C) \lesssim g_C(d, n)$, where $g_C(d, n)$ depends only on $C$, $n$, $d$, if for some $w(\eta) > 0$ depending only on $\eta$ and a universal constant $c > 0$, $\rho^*(C) \leq c\,w(\eta)\,g_C(d, n)$. We define the symbol $\gtrsim$ in a similar way (other direction). Now, if $g_C(d, n) \lesssim \rho^*(C) \lesssim g_C(d, n)$, we write $\rho^*(C) \asymp g_C(d, n)$ and call $g_C(d, n)$ the minimax Euclidean separation rate for (2.2), or simply the separation rate.

Remark 2.1. In the proofs for upper bounds on $\rho^*(C)$ it is necessary to consider the type I and type II errors $\sup_{\mu \in C} P_\mu(\varphi = 1)$ and $\sup_{\mu \in A_\rho} P_\mu(\varphi = 0)$ separately, leading to parameters $\alpha, \beta \in (0, \frac12)$ rather than $\eta$. However, this does not affect the separation rate. For the sake of consistency in notation, we will state the exact constants $w(\eta)$ in upper bounds with $\alpha = \beta = \frac{\eta}{2}$. In these statements and in the proofs, we use the abbreviation $v_x := \ln\frac1x$, $x > 0$.

3. A General Guarantee and Extreme Cases

The rate $\rho^*(C)$ clearly depends on $C$. Let us first examine a simple, essentially one-dimensional case, namely a half-space.

Theorem 3.1. Let $C = C_{HS} = (-\infty, 0] \times \mathbb{R}^{d-1}$ (if $d = 1$, $C_{HS} = (-\infty, 0]$). Then, in the testing problem (2.2), we have

$$\sqrt{\frac1n\ln(1+4(1-\eta)^2)} \;\leq\; \rho^*(C) \;\leq\; \sqrt{\frac8n\,v_\eta}$$

and therefore

$$\rho^*(C) \asymp \frac1{\sqrt n}.$$

Remark 3.2. Note that the lower bound in the previous theorem is valid for any choice of convex $C$ such that $C$ and $\mathbb{R}^d\setminus C$ are non-empty:

$$\frac1{\sqrt n} \lesssim \rho^*(C).$$

This is easily verified using the fact that, due to convexity, there is a hyperplane in $\mathbb{R}^d$ tangent to $C$ such that one of the two corresponding half-spaces fully contains $C$. In the case of Theorem 3.1, the hyperplane $\{0\}\times\mathbb{R}^{d-1}$ has this role.
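The upper bound in Theorem 3.1 comes from a test that only looks at the coordinate normal to the bounding hyperplane (see Section 6.2.1). A minimal sketch of that test, assuming numpy; the function name is ours, and the threshold uses $\delta = \eta/2$ as in Remark 2.1:

```python
import numpy as np

def halfspace_test(X, n, eta):
    """Test of H_0: mu in (-inf, 0] x R^{d-1}; rejects when the first
    coordinate exceeds tau_delta = sqrt(2 v_delta / n), cf. Section 6.2.1."""
    tau = np.sqrt(2.0 * np.log(2.0 / eta) / n)   # v_{eta/2} = ln(2/eta)
    return int(X[0] >= tau)
```

It separates null from alternative already at $\rho = 2\tau \asymp 1/\sqrt n$, independently of $d$.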

G. Blanchard, A. Carpentier, M. Gutzeit/Minimax Euclidean Separation Rates

5

Now, on the other hand, making no additional assumptions about $C$, a natural choice of $\varphi$ for solving (2.2) is a plug-in test based on confidence balls. This gives rise to the following general upper bound:

Theorem 3.3. Let $C$ be an arbitrary convex subset of $\mathbb{R}^d$ such that $C$ and $\mathbb{R}^d\setminus C$ are non-empty. Then, in the testing problem (2.2), we have

$$\rho^*(C) \leq 2\sqrt{\frac dn + \frac2n\sqrt{d\,v_{\eta/2}} + \frac2n v_{\eta/2}}$$

and therefore

$$\rho^*(C) \lesssim \frac{\sqrt d}{\sqrt n}.$$
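A sketch of this plug-in test, assuming numpy and a user-supplied distance oracle `dist_to_C` (a hypothetical helper of ours; the projection onto $C$ is computable for all sets considered in this paper); the constants follow the proof in Section 6.2.2 with $\delta = \eta/2$:

```python
import numpy as np

def plugin_test(X, n, eta, dist_to_C):
    """Plug-in test of Theorem 3.3: reject iff the Euclidean confidence ball
    B(X, sqrt(tau)) misses C, i.e. iff dist(X, C) >= sqrt(tau)."""
    d = len(X)
    v = np.log(2.0 / eta)                          # v_{eta/2}
    tau = d / n + 2.0 / n * (np.sqrt(d * v) + v)   # chi-square bound (6.2.II)
    return int(dist_to_C(X) >= np.sqrt(tau))
```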

Remark 3.4. Note that this upper bound is the rate of estimation of $\mu$ in $L_2$ norm in the model (2.1) (see Equation (6.2) in Section 6.1.2).

Remark 3.5. From Remark 3.2 and Theorem 3.3, it is clear that for any convex subset $C$ of $\mathbb{R}^d$ such that $C$ and $\mathbb{R}^d\setminus C$ are non-empty, it holds that

$$\frac1{\sqrt n} \lesssim \rho^*(C) \lesssim \frac{\sqrt d}{\sqrt n}.$$

Given Remarks 3.2 and 3.5, it is natural to ask if the upper bound in Theorem 3.3 is also sharp, in the sense that there is a choice of $C$ that requires the separation rate $\frac{\sqrt d}{\sqrt n}$, at least up to logarithmic factors. It turns out that the answer is yes when $C$ is taken to have the shape of an orthant:

Theorem 3.6. Let $C = C_O = (-\infty, 0]^d$, $d \geq 2$. Then, if $\eta \in (0, \frac16)$, for the testing problem (2.2), we have

$$\rho^*(C) \geq \frac{\sqrt d}{288(\lceil\ln(d)\rceil + 1/3)^2\sqrt n}$$

and therefore

$$\frac{\sqrt d}{\ln(d)^2\sqrt n} \lesssim \rho^*(C) \lesssim \frac{\sqrt d}{\sqrt n}.$$

A version of this result was already proven in [12]; however, the influence of $n$ is not considered there (which appears to be due to the fact that the result only serves as a lemma for another theorem), $\eta = \frac18$ is fixed, and the bound holds for $d \geq D$, where $D \in \mathbb{N}$ is fixed but not explicit. We provide a partly alternative or adapted proof in Section 6.2.3 that leads to the statement above.


4. A Simple Smoothness Property

Clearly, the two extreme cases $C_{HS}$ and $C_O$ differ significantly with respect to the smoothness of their surfaces. Based on this observation, in order to be able to handle $\rho^*(C)$ more flexibly, we propose to describe convex sets by their surfaces' degree of smoothness, where the surface of a set $S \subseteq \mathbb{R}^d$ is denoted by $\partial S$. To begin with, we examine the potential of the following very simple and purely geometric smoothness concept:

Definition 4.1. Let $R > 0$ and $S \subseteq \mathbb{R}^d$ be non-empty. $S$ is called $R$-smooth if

$$\forall x \in \partial S \;\exists z \in S:\; x \in B(z, R) \subseteq S. \tag{4.1}$$

Remark 4.2. The definition of $R$-smoothness is closely related to the so-called $R$-rolling condition employed in [1]. In fact, $R$-smoothness of $S$ is equivalent to saying that $\mathbb{R}^d\setminus S$ fulfils the $R$-rolling condition. Another related concept worth mentioning is the radius of curvature, though the connection is more subtle: the radius of curvature at a point $x \in \partial S$ would be the largest radius $R$ such that there is a ball $B(z, R)$ tangent to $\partial S$ at $x$ such that $z - x$ points into $C$. Hence, it is possible that the infimum of these radii with respect to $x \in \partial S$ corresponds to the parameter $R$ in our previous definition. However, note that the condition $B(z, R) \subseteq S$ is then still not guaranteed in general, so the indicated notion of a uniform radius of curvature is not equivalent to $R$-smoothness.

Since smoothness is usually defined as a property of a function, we provide a suggestion for how to cast the above concept in that context for a convex set $C$: given any $x \in \partial C$, without loss of generality (w.l.o.g.) apply a mapping $G$ that is orthogonal (represented by an orthogonal matrix) up to translation such that $x' = G(x) = O_d$ and $C' := G(C) \subseteq \mathbb{R}^{d-1}\times[0,\infty)$. Now, let $f$ be the function describing $\partial C'$ over $B := B(O_{d-1}, R)$. The following lemma states sufficient conditions for $R$-smoothness at $x \in \partial C$:

Lemma 4.3. In the situation described above, if $f$ is twice differentiable on $B$ (i.e. the gradient $\nabla f(\cdot)$ and Hessian matrix $H_f(\cdot)$ exist), the following conditions are sufficient in order that $\partial C'$ and $B((O_{d-1}, R), R)$ (the ball associated to $x$ in (4.1)) do not intersect in $B\times(0, R]$:

$$\nabla f(O_{d-1}) = O_{d-1},$$
$$\forall x \in B\setminus\{O_{d-1}\}:\; 0 \leq \lambda_{\min}(H_f(x)),\quad \lambda_{\max}(H_f(x)) \leq \frac1R,$$

where $\lambda_{\min}(\cdot)$ and $\lambda_{\max}(\cdot)$ are the lowest and highest eigenvalues of a real symmetric matrix, respectively.

Now, let us examine how the additional assumption of $R$-smoothness may affect the general upper bound of Theorem 3.3:


Theorem 4.4. If $C$ is $R$-smooth for $R > 0$, for the testing problem (2.2), we have

$$\rho^*(C) \leq \sqrt{\frac2n v_{\eta/8}} + \frac{d}{2nR} + \frac1{nR}\sqrt{d\,v_{\eta/4}} + \frac{v_{\eta/4}}{nR} + \sqrt{\frac2n v_{\eta/2}}$$

and therefore, taking Theorem 3.3 into account,

$$\rho^*(C) \lesssim \max\left(\frac1{\sqrt n},\; \min\left(\frac{\sqrt d}{\sqrt n},\, \frac{d}{nR}\right)\right).$$
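The test behind Theorem 4.4 is the same distance statistic as in Theorem 3.3, but $R$-smoothness permits a much smaller rejection threshold. A hedged sketch, assuming numpy; `smooth_test` and `dist_to_C` are our names, and the constants are simplified relative to the exact threshold in Section 6.3.2:

```python
import numpy as np

def smooth_test(X, n, eta, R, dist_to_C):
    """Distance test for an R-smooth null: reject iff dist(X, C) >= tau with
    tau of order max(1/sqrt(n), d/(n R)), instead of sqrt(d)/sqrt(n)."""
    d = len(X)
    v = np.log(4.0 / eta)                 # a v_{eta/4}-type constant
    tau = np.sqrt(2.0 * v / n) + d / (2.0 * n * R) + (np.sqrt(d * v) + v) / (n * R)
    return int(dist_to_C(X) >= tau)
```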

The following result confirms that this upper bound can be sharp up to $\ln(d)$-factors, namely in the case where $C$ is taken as the $R$-inflated negative orthant:

Theorem 4.5. Let $d \geq 3$, $\eta \in (0, \frac16)$ and

$$C = C_{IO} = C_O + B(O_d, R) = \bigcup_{z \in C_O} B(z, R).$$

Then, in the testing problem (2.2), we have, with $s = \frac1{144(\lceil\ln(d)\rceil+1/3)^2\sqrt n}$,

$$\rho^*(C) \geq \frac{(d-1)s^2}{8R + 4\sqrt{d-1}\,s} \geq \frac1{16}\min\left(\frac{(d-1)s^2}{2R},\, \sqrt{d-1}\,s\right)$$

and therefore

$$\rho^*(C) \gtrsim \max\left(\frac1{\sqrt n},\; \min\left(\frac1{\ln(d)^4}\cdot\frac d{nR},\;\frac1{\ln(d)^2}\cdot\frac{\sqrt d}{\sqrt n}\right)\right).$$
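Both orthant-type null hypotheses admit closed-form distance functions, which makes the plug-in tests sketched above fully explicit for them. A short sketch, assuming numpy; the function names are ours:

```python
import numpy as np

def dist_orthant(x):
    """dist(x, C_O) for C_O = (-inf, 0]^d: the projection clips x at zero,
    so the distance is the norm of the positive part of x."""
    return np.linalg.norm(np.maximum(x, 0.0))

def dist_inflated_orthant(x, R):
    """dist(x, C_IO) for C_IO = C_O + B(O_d, R): inflating a convex set by a
    ball of radius R reduces the distance by R (truncated at zero)."""
    return max(dist_orthant(x) - R, 0.0)
```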

5. Discussion

The concept of $R$-smoothness allows for the construction of hypotheses $C$ with any separation rate $\frac1{\sqrt n} \lesssim \rho^*(C) \lesssim \frac{\sqrt d}{\sqrt n}$, up to $\ln(d)$-factors. On the other hand, we must acknowledge that $R$-smoothness is too weak a concept to fully describe the difficulty of testing an arbitrary $C$; an examination of the most natural $R$-smooth set, namely a ball of radius $R$, provides clear evidence of this drawback. The following result is a direct generalisation of the known rate $\rho^*(C) \sim \frac{d^{1/4}}{\sqrt n}$ in the signal detection setting, see [2].

Theorem 5.1. Let $d \geq 3$. If $C = C_B = B(z, R)$, $z \in \mathbb{R}^d$ and $R > 0$, for the testing problem (2.2) we have, for $s := \Big(\frac{\sqrt{d-1}}{n}\sqrt{\frac2e\ln(1+4(1-\eta)^2)}\Big)^{1/2}$,

$$\rho^*(C) \geq \frac{s^2}{2\sqrt{s^2+R^2}} \gtrsim \min\left(\frac{d^{1/4}}{\sqrt n},\, \frac{\sqrt d}{nR}\right)$$

and also

$$\rho^*(C) \leq \min\left(2\sqrt2\,\frac{d^{1/4}}{\sqrt n}\sqrt{v_{\eta/2}},\; \frac{2\sqrt2\,\sqrt d\,\sqrt{v_{\eta/2}}}{nR + 2\sqrt{n\,v_{\eta/2}}}\right) + 3\sqrt{\frac2n v_{\eta/2}} \;\lesssim\; \max\left(\frac1{\sqrt n},\, \min\left(\frac{d^{1/4}}{\sqrt n},\, \frac{\sqrt d}{nR}\right)\right).$$

Therefore,

$$\rho^*(C) \asymp \max\left(\frac1{\sqrt n},\, \min\left(\frac{d^{1/4}}{\sqrt n},\, \frac{\sqrt d}{nR}\right)\right). \tag{5.1}$$
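A sketch of the upper-bound test from the proof of Theorem 5.1 (Section 6.4.1), assuming numpy; `ball_test` is our name, and the level is set via $v_{\eta/2}$ in place of the proof's generic $v_\alpha$:

```python
import numpy as np

def ball_test(X, n, R, eta):
    """Test for C = B(O_d, R): reject iff T(X) = ||X||^2 - R^2 >= tau.
    For mu on the sphere of radius R, n ||X||^2 ~ noncentral chi^2_{nR^2}(d)."""
    d = len(X)
    v = np.log(2.0 / eta)                                      # v_{eta/2}
    # threshold from (6.2.II) with lambda = n R^2, cf. Section 6.4.1
    tau = d / n + 2.0 * np.sqrt((d / n**2 + 2.0 * R**2 / n) * v) + 2.0 * v / n
    return int(X @ X - R**2 >= tau)
```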

Clearly, Theorem 4.5 does not capture this case. As a consequence, future work will be concerned with finding a stronger concept, possibly a localised version of $R$-smoothness, that ideally allows for describing $\rho^*(C)$ for any choice of $C$. However, we suspect this to be quite an ambitious goal.

6. Proofs

6.1. General Preparations

6.1.1. Techniques for Obtaining Lower Bounds

We employ a classical Bayesian approach for proving lower bounds; see the references in [2] for its origins. We briefly give the main theoretical ingredients of this approach for our setting: let $\nu_0$ be a distribution with $S_0 := \mathrm{supp}(\nu_0) \subseteq C$ and $\nu_\rho$ be a distribution with $S_\rho := \mathrm{supp}(\nu_\rho) \subseteq \{\mu \in \mathbb{R}^d \mid \mathrm{dist}(\mu, C) = \rho\}$ (priors). Dirac priors on some $x \in \mathbb{R}^d$ will be denoted $\delta_x$. Furthermore, for $i \in \{0, \rho\}$, let $P_{\nu_i}$ be the resulting distribution of $X$ given $\mu \sim \nu_i$. Now, we see that for any test $\varphi = 1_A$, $A \in \mathcal{B}(\mathbb{R}^d)$,

$$\begin{aligned}
\sup_{\mu \in C} P_\mu(\varphi = 1) + \sup_{\mu \in A_\rho} P_\mu(\varphi = 0) &\geq P_{\nu_0}(\varphi = 1) + P_{\nu_\rho}(\varphi = 0) = 1 - \big(P_{\nu_\rho}(A) - P_{\nu_0}(A)\big)\\
&\geq 1 - \sup_A |P_{\nu_\rho}(A) - P_{\nu_0}(A)| = 1 - \frac12\|P_{\nu_\rho} - P_{\nu_0}\|_{TV}\\
&\geq 1 - \frac12\left(\int_{\mathbb{R}^d}\Big(\frac{dP_{\nu_\rho}}{dP_{\nu_0}}\Big)^2 dP_{\nu_0} - 1\right)^{1/2}.
\end{aligned}$$

This justifies the following reasoning, used for each lower bound proof in the present paper: for any $\tilde\rho > 0$ such that either

$$\frac12\|P_{\nu_0} - P_{\nu_{\tilde\rho}}\|_{TV} \leq 1 - \eta \in \Big(\frac12, 1\Big)$$

or

$$\int_{\mathbb{R}^d}\Big(\frac{dP_{\nu_{\tilde\rho}}}{dP_{\nu_0}}\Big)^2 dP_{\nu_0} \leq 1 + 4(1-\eta)^2 \in (2, 5), \tag{6.1}$$

for the testing problem (2.2), we have

$$\rho^*(C) \geq \tilde\rho.$$
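For a concrete instance of criterion (6.1): with the Dirac priors $\nu_0 = \delta_{O_d}$ and $\nu_\rho = \delta_{\rho e_1}$ used in Section 6.2.1, the divergence integral equals $\exp(n\rho^2)$, so the implied lower bound is computable in one line. A minimal sketch, assuming numpy; the values of $n$ and $\eta$ are illustrative:

```python
import numpy as np

n, eta = 100, 0.05
# For nu_0 = delta_0 and nu_rho = delta_{rho e_1} the integral in (6.1) is
# exp(n rho^2); solving exp(n rho^2) = 1 + 4 (1 - eta)^2 for rho gives:
rho_tilde = np.sqrt(np.log(1.0 + 4.0 * (1.0 - eta)**2) / n)
print(rho_tilde)  # ~ 0.12 here, of order 1/sqrt(n) as in Theorem 3.1
```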


6.1.2. Concentration Properties of Gaussian and $\chi^2$ Random Variables

We will repeatedly make use of the following classical properties of $N \sim \mathcal{N}(0, \sigma^2)$ and $Z \sim \chi^2_\lambda(d)$ (that is, a $\chi^2$-distribution with $d$ degrees of freedom and noncentrality parameter $\lambda \geq 0$): for any $\delta \in (0, 1)$, the following concentration inequalities hold:

$$\begin{aligned}
&\text{(I)}\quad P\big(N \geq \sigma\sqrt{2 v_\delta}\big) \leq \delta,\\
&\text{(II)}\quad P\big(Z \geq d + \lambda + 2\sqrt{(d+2\lambda)v_\delta} + 2v_\delta\big) \leq \delta,\\
&\text{(III)}\quad P\big(Z \leq d + \lambda - 2\sqrt{(d+2\lambda)v_\delta}\big) \leq \delta.
\end{aligned} \tag{6.2}$$

See [4] for proofs of (6.2.II) and (6.2.III).
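These bounds are easy to sanity-check by simulation. A minimal sketch, assuming numpy; sizes and $\delta$ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d, delta, n_mc = 50, 0.05, 200_000
v = np.log(1.0 / delta)                    # v_delta = ln(1/delta)
N = rng.standard_normal(n_mc)              # sigma = 1
Z = rng.chisquare(d, size=n_mc)            # central case, lambda = 0
print(np.mean(N >= np.sqrt(2 * v)) <= delta)                  # (6.2.I)
print(np.mean(Z >= d + 2 * np.sqrt(d * v) + 2 * v) <= delta)  # (6.2.II)
print(np.mean(Z <= d - 2 * np.sqrt(d * v)) <= delta)          # (6.2.III)
```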

6.1.3. Lower Bounds for Specific $\sqrt{\,\cdot\,}$-Functions

The following bounds are easily obtained via Taylor expansion; however, we will employ them on several occasions, which makes it convenient to state them here. For any $a > 0$, $b > 0$, we have

$$\frac{a}{2\sqrt{a+b^2}} \leq \sqrt{a+b^2} - b \leq \frac{a}{2b} \tag{6.3}$$

and for any $b > 0$, $a \in (0, b^2)$ we have

$$b - \sqrt{b^2 - a} \geq \frac{a}{2b}. \tag{6.4}$$

6.2. Proofs for Section 3

6.2.1. Proof of Theorem 3.1

Proof. We prove independently that $\rho^*(C)$ is lower and upper bounded in order by $\frac1{\sqrt n}$.

Lower Bound. In accordance with the framework in Section 6.1.1, we verify that the bound holds in the special case $\nu_0 = \delta_{O_d}$ and $\nu_\rho = \delta_{\rho\cdot e_1}$. Since both the null and alternative hypotheses are simple, the corresponding density functions $F_{\nu_0}(x)$ and $F_{\nu_\rho}(x)$ are readily given, and we obtain

$$\begin{aligned}
\int_{\mathbb{R}^d}\frac{F^2_{\nu_\rho}}{F_{\nu_0}}(x)\,dx &= \Big(\frac n{2\pi}\Big)^{d/2}\int_{\mathbb{R}^d}\exp\Big(-n(x_1-\rho)^2 + \frac n2 x_1^2\Big)\cdot\exp\Big(-\frac n2(x_2^2 + x_3^2 + \dots + x_d^2)\Big)\,dx\\
&= \sqrt{\frac n{2\pi}}\int_{\mathbb{R}}\exp\Big(-n(x_1-\rho)^2 + \frac n2 x_1^2\Big)\,dx_1\\
&= \sqrt{\frac n{2\pi}}\,\exp(n\rho^2)\int_{\mathbb{R}}\exp\Big(-\frac n2(x_1-2\rho)^2\Big)\,dx_1 = \exp(n\rho^2).
\end{aligned}$$


Therefore inequality (6.1) is satisfied (with equality) when the latter quantity is equal to $1 + 4(1-\eta)^2$, i.e. equivalently for

$$\rho = \sqrt{\frac1n\ln(1+4(1-\eta)^2)}.$$

This yields the claim.

Upper Bound. Given $\alpha, \beta \in (0, \frac12)$, let $\delta = \min(\alpha, \beta)$ and $\tau_\delta = \sqrt{\frac2n v_\delta}$. Define the test $\varphi(X) = 1_{\{X_1 \geq \tau_\delta\}}$. Then for any $\mu \in C$, we have

$$P_\mu(\varphi(X) = 1) \leq P\Big(\frac1{\sqrt n}\epsilon_1 \geq \tau_\delta\Big) \overset{(6.2.\mathrm{I})}{\leq} \delta \leq \alpha.$$

On the other hand, let now $\rho = 2\tau_\delta$. Then for any $\mu \in A_\rho$, we have

$$P_\mu(\varphi(X) = 0) \leq P\Big(2\tau_\delta + \frac1{\sqrt n}\epsilon_1 \leq \tau_\delta\Big) = P\Big(\frac1{\sqrt n}\epsilon_1 \leq -\tau_\delta\Big) \overset{(6.2.\mathrm{I})}{\leq} \delta \leq \beta.$$

This concludes the proof since $\rho \asymp \frac1{\sqrt n}$. $\Box$

6.2.2. Proof of Theorem 3.3

Proof. Given $\alpha, \beta \in (0, \frac12)$, let $\delta = \min(\alpha, \beta)$ and $\tau_\delta = \frac dn + \frac2n\big(\sqrt{d\,v_\delta} + v_\delta\big)$. Define the test

$$\varphi(X) = 1_{\{B(X,\sqrt{\tau_\delta})\cap C = \emptyset\}} = 1_{\{\mathrm{dist}(X, C) \geq \sqrt{\tau_\delta}\}}.$$

Then for any $\mu \in C$, we have

$$P_\mu(\varphi(X) = 1) \leq P\big(\|X - \mu\| \geq \sqrt{\tau_\delta}\big) \leq P\Big(\frac1n\|\epsilon\|^2 \geq \tau_\delta\Big) \overset{(6.2.\mathrm{II})}{\leq} \delta \leq \alpha.$$

On the other hand, let now $\rho = 2\sqrt{\tau_\delta}$. Then for any $\mu \in A_\rho$ we have, since $\mathrm{dist}(X, C) \geq \mathrm{dist}(\mu, C) - \|X - \mu\|$:

$$P_\mu(\varphi(X) = 0) \leq P\Big(\frac1n\|\epsilon\|^2 \geq \tau_\delta\Big) \overset{(6.2.\mathrm{II})}{\leq} \delta \leq \beta.$$

This concludes the proof since $\sqrt{\tau_\delta} \asymp \sqrt{\frac dn}$. $\Box$


6.2.3. Proof of Theorem 3.6

Proof. The arguments of this proof are related to the ones used in [12] and [6]. We decompose the proof into several steps.

1. Choice of priors. We make use of the following lemma, used and explained in [12]:

Lemma. For any $M \in \mathbb{N}$ and $b > 0$, there are distributions $\tilde\nu_0$ and $\tilde\nu_1$ with the following properties:

$$\begin{aligned}
&\text{(I)}\quad \mathrm{supp}(\tilde\nu_0) \subseteq [-b, 0],\quad \mathrm{supp}(\tilde\nu_1) \subseteq [-b, 0]\cup\Big\{\frac b{4M^2}\Big\},\\
&\text{(II)}\quad \tilde\nu_1\Big(\Big\{\frac b{4M^2}\Big\}\Big) \geq \frac12,\\
&\text{(III)}\quad \forall k \in \{0, 1, \dots, M\}:\; \int x^k\,\tilde\nu_0(dx) = \int x^k\,\tilde\nu_1(dx).
\end{aligned} \tag{6.5}$$

For now, let $\tilde\nu_i$ be such distributions and $\nu_i = \tilde\nu_i^{\otimes d}$, $i \in \{0, 1\}$; $M$, $b$ and $\rho$ will be specified later. Furthermore, writing $\sigma^2 = \frac1n$, let

$$P_i = \big(\tilde\nu_i * \mathcal{N}(0, \sigma^2)\big)^{\otimes d},\quad i \in \{0, 1\},$$

where $*$ denotes convolution. Clearly, the corresponding density function can be written as

$$F_i(x) = \prod_{j=1}^d E_{\mu_j\sim\tilde\nu_i}\big[\phi(x_j; \mu_j, \sigma^2)\big],\quad i \in \{0, 1\},$$

where $\phi(x; \mu, \sigma^2)$ is the density of $\mathcal{N}(\mu, \sigma^2)$. It will be convenient to examine the case $d = 1$, denoted by $\tilde P_i$. Note that $P_0$ is in accordance with $P_{\nu_0}$ from Section 6.1.1, but the construction of $\nu_1$ does not warrant the notion of Euclidean distance we are interested in, hence the slight difference in notation. This technical obstacle is necessary for property (6.5.III), but it can be resolved for a small price, which we explain in the last step of this proof.

2. Controlling the total variation distance. Based on our construction, and writing $u := \frac b{4M^2}$, we have for $i \in \{0, 1\}$

$$\begin{aligned}
E_{\mu\sim\tilde\nu_i}[\phi(x; \mu, \sigma^2)] &= \phi(x; 0, \sigma^2)\int\exp\Big(\frac{2x\mu - \mu^2}{2\sigma^2}\Big)\tilde\nu_i(d\mu)\\
&= \phi(x; 0, \sigma^2)\sum_{k=0}^\infty\frac1{k!(2\sigma^2)^k}\int(2x\mu - \mu^2)^k\,\tilde\nu_i(d\mu)\\
&= \phi(x; 0, \sigma^2)\sum_{k=0}^\infty\sum_{j=0}^k\binom kj\frac{(-1)^{k-j}2^j}{k!(2\sigma^2)^k}\,x^j\int\mu^{2k-j}\,\tilde\nu_i(d\mu).
\end{aligned}$$


As a consequence, by (6.5.III), this yields

$$\begin{aligned}
\|\tilde P_1 - \tilde P_0\|_{TV} &= \int\Big|E_{\mu\sim\tilde\nu_1}[\phi(x;\mu,\sigma^2)] - E_{\mu\sim\tilde\nu_0}[\phi(x;\mu,\sigma^2)]\Big|\,dx\\
&= \int\Bigg|\sum_{k=\lfloor M/2\rfloor+1}^\infty\sum_{j=0}^k\binom kj\frac{(-1)^{k-j}2^j}{k!(2\sigma^2)^k}\,x^j\Big(\int\mu^{2k-j}\tilde\nu_1(d\mu) - \int\mu^{2k-j}\tilde\nu_0(d\mu)\Big)\Bigg|\,\phi(x; 0, \sigma^2)\,dx\\
&\leq \sum_{k=\lfloor M/2\rfloor+1}^\infty\sum_{j=0}^k\binom kj\frac{2^j}{k!(2\sigma^2)^k}\underbrace{\Big|\int\mu^{2k-j}\tilde\nu_1(d\mu) - \int\mu^{2k-j}\tilde\nu_0(d\mu)\Big|}_{=:\,I_{2k-j}}\cdot\int|x|^j\phi(x, 0, \sigma^2)\,dx.
\end{aligned}$$

We now use the bound $I_{2k-j} \leq 2b^{2k-j}$ and the fact that, for $0 \leq j \leq k$,

$$\int_{\mathbb{R}}|x|^j\phi(x, 0, \sigma^2)\,dx = \frac{\sigma^j 2^{j/2}}{\sqrt\pi}\,\Gamma\Big(\frac{j+1}2\Big) \leq \frac{\sigma^j\sqrt{2^j\,k!}}{\sqrt\pi}$$

(see [14]) to obtain

$$\|\tilde P_1 - \tilde P_0\|_{TV} \leq \frac2{\sqrt\pi}\sum_{k=\lfloor M/2\rfloor+1}^\infty\sum_{j=0}^k\binom kj\frac{2^j\,b^{2k-j}\,\sigma^j\sqrt{2^j\,k!}}{k!(2\sigma^2)^k} \leq \frac2{\sqrt\pi}\sum_{k=\lfloor M/2\rfloor+1}^\infty\Big(\frac b{2\sigma}\Big)^k\sum_{j=0}^k\binom kj(2\sqrt2)^j\Big(\frac b\sigma\Big)^{k-j} = \frac2{\sqrt\pi}\sum_{k=\lfloor M/2\rfloor+1}^\infty\Big[\frac s2\big(2\sqrt2 + s\big)\Big]^k,\quad s := \frac b\sigma.$$

Now, if $t := \frac s2(2\sqrt2 + s) \in (0, \frac12]$, i.e. $b \in (0, \sigma(\sqrt3 - \sqrt2)]$, this yields, using $1 - t \geq \frac12$:

$$\|P_1 - P_0\|_{TV} \leq d\,\|\tilde P_1 - \tilde P_0\|_{TV} \leq \frac{2d}{\sqrt\pi}\cdot\frac{t^{\lfloor M/2\rfloor+1}}{1-t} \leq \frac{2d}{\sqrt\pi}\,2^{-\lfloor M/2\rfloor}.$$

Finally, let $\eta' \in (0, \frac12)$ and choose $b = \frac\sigma4 = \frac1{4\sqrt n}$. By straightforward computation we can show that

$$M \geq \frac{2\ln(d)}{\ln(2)} - \frac{\ln(\pi(1-\eta')^2)}{\ln(2)} \;\Longrightarrow\; \frac12\|P_1 - P_0\|_{TV} \leq 1 - \eta'.$$

Here, since $\eta' < \frac12$, less than $\frac12$ is subtracted; together with $\frac2{\ln(2)} \leq 3$, we can therefore fix the solution

$$M = 3\lceil\ln(d)\rceil + 1.$$
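As an aside, the moment-matching measures of the lemma (6.5) can be searched for numerically by a feasibility linear program. The sketch below, assuming numpy and scipy, is our illustration only: the grid, all parameters, and the LP formulation are assumptions, not the analytic construction used in [12], and feasibility on a discrete grid is only approximate.

```python
import numpy as np
from scipy.optimize import linprog

M, b = 4, 1.0
u = b / (4 * M**2)
grid = -np.linspace(0.0, b, 60)                 # candidate support of nu0 in [-b, 0]
pts1 = np.append(grid, u)                       # support of nu1 additionally includes u

V0 = np.vander(grid, M + 1, increasing=True).T  # rows: x^0, ..., x^M on the grid
V1 = np.vander(pts1, M + 1, increasing=True).T
# moment constraints (k = 0..M): sum_i p_i x_i^k - sum_j q_j y_j^k = 0,
# plus total mass one for p (the k = 0 row then forces mass one for q too)
A_eq = np.hstack([V0, -V1])
A_eq = np.vstack([A_eq, np.concatenate([np.ones(len(grid)), np.zeros(len(pts1))])])
b_eq = np.append(np.zeros(M + 1), 1.0)

c = np.zeros(len(grid) + len(pts1))
c[-1] = -1.0                                    # maximise the mass q places on u
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print(res.status, -res.fun)                     # hope for mass >= 1/2 at u, as in (6.5.II)
```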


3. Application. Note that the upper bound $\frac12\|P_1 - P_0\|_{TV} \leq 1 - \eta'$ does not formally allow for determining a lower bound on $\rho^*(C)$ yet, since $H_0$ and $H_1$ are not separated in a Euclidean sense. In a final step, we resolve this by a suitable restriction of $H_1$.

Let $Y = \sum_{i=1}^d 1_{\{\mu_i = u\}}$, i.e. the number of coordinates of $\mu$ taking the value $u$. Obviously, if $\mu \sim \nu_1$, we have $Y \sim \mathrm{Bin}(d, \tilde\nu_1(\{u\}))$. By property (6.5.II), this yields that if $d \geq 2$,

$$P_{\mu\sim\nu_1}\Big(Y \geq \frac d4\Big) \geq \frac34.$$

Now, let $\xi = \{Y \geq \frac d4\}$ and

$$H_1': \mu \sim \nu_1|\xi.$$

Assuming that for some test $\varphi$ the relation $P_{\mu\sim\nu_0}(\varphi = 1) + P_{\mu\sim\nu_1}(\varphi = 0) \geq \eta'$ holds, we can conclude

$$\begin{aligned}
P_{\nu_0}(\varphi = 1) + P_{\nu_1|\xi}(\varphi = 0) &= P_{\nu_0}(\varphi = 1) + 1 - \frac{P_{\nu_1}(\{\varphi = 1\}\cap\xi)}{P(\xi)}\\
&\geq P_{\nu_0}(\varphi = 1) + 1 - \frac43 P_{\nu_1}(\varphi = 1)\\
&= P_{\nu_0}(\varphi = 1) + \frac43 P_{\nu_1}(\varphi = 0) - \frac13\\
&\geq P_{\nu_0}(\varphi = 1) + P_{\nu_1}(\varphi = 0) - \frac13 \;\geq\; \eta' - \frac13.
\end{aligned}$$

Hence, inference from the testing problem discussed in Steps 1 and 2 to the problem

$$H_0: \mu \sim \nu_0 \quad\text{vs.}\quad H_1': \mu \sim \nu_1|\xi$$

is valid as long as $\eta \in (0, \frac16)$ ($\eta$ corresponds to $\eta' - \frac13$ above).

The following observation concludes the proof: clearly, $H_1'$ agrees with $A_\rho$ for

$$\rho = \frac{\sqrt d}2\,u = \frac{\sqrt d\,\sigma}{32M^2} = \frac{\sqrt d}{288(\lceil\ln(d)\rceil + 1/3)^2\sqrt n} \sim \frac{\sqrt d}{\ln(d)^2\sqrt n}. \qquad\Box$$


6.3. Proofs for Section 4

6.3.1. Proof of Lemma 4.3

Proof. Let $x \in B\setminus\{O_{d-1}\}$. By construction, we need to ensure that

$$0 \leq f(x) \leq R - \sqrt{R^2 - \|x\|^2}.$$

Applying Taylor's theorem with Lagrange's remainder yields

$$\exists s \in (0, 1):\; f(x) = \frac12 x^T H_f(sx)\,x.$$

Clearly, for $f \geq 0$ on $B$, it is sufficient to require $\lambda_{\min}(H_f(y)) \geq 0$ for $y \in B\setminus\{O_{d-1}\}$. On the other hand, we can use a classical eigenvalue representation to obtain the desired upper bound: for some $s \in (0, 1)$,

$$f(x) = \frac12\|x\|^2\Big(\frac x{\|x\|}\Big)^T H_f(sx)\Big(\frac x{\|x\|}\Big) \leq \frac12\|x\|^2\max_{\|y\|=1}y^T H_f(sx)\,y = \frac12\|x\|^2\lambda_{\max}(H_f(sx)).$$

This already implies a sufficient condition on $H_f$ for the upper bound:

$$\forall x \in B\setminus\{O_{d-1}\}\;\forall s \in (0, 1):\; \lambda_{\max}(H_f(sx)) \leq \frac{2\big(R - \sqrt{R^2 - \|x\|^2}\big)}{\|x\|^2}.$$

However, (6.4) tells us that for $z \in (0, R^2)$,

$$R - \sqrt{R^2 - z} \geq \frac z{2R},\quad\text{that is}\quad \frac{2\big(R - \sqrt{R^2 - z}\big)}z \geq \frac1R.$$

We obtain the simplified sufficient condition

$$\forall x \in B\setminus\{O_{d-1}\}:\; \lambda_{\max}(H_f(x)) \leq \frac1R,$$

and the proof is concluded. $\Box$
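As a quick numerical companion to Lemma 4.3 (a hypothetical example of ours, not from the paper): for the paraboloid-type surface $f(x) = \|x\|^2/(2R_0)$ one has $\nabla f(O_{d-1}) = O_{d-1}$ and $H_f \equiv I/R_0$, so the eigenvalue conditions hold exactly when $R \leq R_0$. The snippet, assuming numpy, just evaluates them:

```python
import numpy as np

R0, R, d = 2.0, 1.5, 3                     # illustrative values with R <= R0
H = np.eye(d - 1) / R0                     # constant Hessian of f(x) = ||x||^2 / (2 R0)
lam = np.linalg.eigvalsh(H)
print(lam.min() >= 0.0 and lam.max() <= 1.0 / R)   # True: conditions of Lemma 4.3 hold
```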

6.3.2. Proof of Theorem 4.4

Proof. We define the test statistic $T(X) := \mathrm{dist}(X, C)$


and a corresponding test of the form $\varphi(X) = 1_{\{T(X)\geq\tau\}}$.

W.l.o.g. let $O_d \in \partial C$ and $C \subseteq \mathbb{R}^{d-1}\times[0, \infty)$. Now, let $\mu = O_d$ and $z = (O_{d-1}, R)$, so that by construction $\mu \in B(z, R) \subseteq C$. For $\tau > 0$, we have

$$\mathrm{dist}(X, C) \geq \tau \;\Longrightarrow\; \mathrm{dist}(X, B(z, R)) \geq \tau \;\Longrightarrow\; \Big\|\frac1{\sqrt n}\epsilon - z\Big\| - R \geq \tau.$$

Now, using (6.3), we obtain

$$\Big\|\frac1{\sqrt n}\epsilon - z\Big\| - R \leq \sqrt{\frac1n\|\epsilon_{1:(d-1)}\|^2 + \Big(R + \frac1{\sqrt n}|\epsilon_d|\Big)^2} - R \leq \frac1{\sqrt n}|\epsilon_d| + \frac{\|\epsilon_{1:(d-1)}\|^2}{2n\big(R + \frac1{\sqrt n}|\epsilon_d|\big)} \leq \frac1{\sqrt n}|\epsilon_d| + \frac{\|\epsilon_{1:(d-1)}\|^2}{2nR},$$

that is,

$$P_\mu(T(X) \geq \tau) \leq P\Big(\frac1{\sqrt n}|\epsilon_d| + \frac{\|\epsilon_{1:(d-1)}\|^2}{2nR} \geq \tau\Big).$$

Clearly, this bound holds generally in the sense

$$\sup_{\mu\in\partial C} P_\mu(\varphi(X) = 1) \leq P\Big(\frac1{\sqrt n}|\epsilon_d| + \frac{\|\epsilon_{1:(d-1)}\|^2}{2nR} \geq \tau\Big).$$

Now, based on the general property

$$P(A \geq \tau_1) \leq \frac\alpha2 \;\wedge\; P(B \geq \tau_2) \leq \frac\alpha2 \;\Longrightarrow\; P(A + B \geq \tau_1 + \tau_2) \leq \alpha$$

for random variables $A$ and $B$, and by using (6.2.I) and (6.2.II), we obtain the rejection threshold

$$\tau := \sqrt{\frac2n v_{\alpha/4}} + \frac d{2nR} + \frac1{nR}\sqrt{d\,v_{\alpha/2}} + \frac{v_{\alpha/2}}{nR} \;\asymp\; \max\Big(\frac1{\sqrt n},\, \frac d{nR}\Big)$$

for a fixed level $\alpha \in (0, \frac12)$. On the other hand, w.l.o.g., choose $\mu = (O_{d-1}, -\rho)$. Then we have

$$\mathrm{dist}(X, C) \leq \tau \;\Longrightarrow\; X_d \geq -\tau \;\Longleftrightarrow\; \epsilon_d \geq \sqrt n(\rho - \tau),$$

so that it is sufficient to ensure

$$\sup_{\mu\in A_\rho} P_\mu(\varphi(X) = 0) \leq P\big(\epsilon_d \geq \sqrt n(\rho - \tau)\big) \leq \beta \in \Big(0, \frac12\Big),$$

which leads to the condition

$$\rho \geq \tau + \sqrt{\frac2n v_\beta} \sim \tau.$$

This concludes the proof. $\Box$

6.3.3. Proof of Theorem 4.5

Proof. This is a variation on the proof of Theorem 3.6. Using the same construction and notation as previously, and taking $d \geq 3$, let now, for $i \in \{0, 1\}$,

$$\nu_i = \delta_R \otimes \tilde\nu_i^{\otimes(d-1)}.$$

Since the mutual deterministic coordinate $\mu_1 = R$ is irrelevant for the total variation distance between the resulting distributions $P_0$ and $P_1$, the bounds in Step 2 of the proof of Theorem 3.6 also hold here with $d-1$ instead of $d$. The most important modification arises when calculating $\rho$: now, if at least $\frac{d-1}4$ of the coordinates take the value $u$, computing the Euclidean distance of $\mu$ from $C$ and using (6.3) leads to

$$\begin{aligned}
\rho^*(C) &\geq \sqrt{R^2 + \frac{d-1}4 u^2} - R \geq \frac{(d-1)u^2/4}{2\sqrt{R^2 + \frac{d-1}4 u^2}}\\
&\geq \frac{(d-1)u^2}{8R + 4\sqrt{d-1}\,u} \geq \frac1{16}\min\Big(\frac{(d-1)u^2}{2R},\,\sqrt{d-1}\,u\Big)\\
&\sim \min\Big(\frac1{\ln(d)^4}\cdot\frac d{nR},\;\frac1{\ln(d)^2}\cdot\frac{\sqrt d}{\sqrt n}\Big).
\end{aligned}$$

This concludes the proof. $\Box$

6.4. Proofs for Section 5

6.4.1. Proof of Theorem 5.1

Proof. W.l.o.g., let $z = O_d$. We prove independently that $\rho^*(C)$ is lower and upper bounded by the right hand side of (5.1).

Lower Bound.

Let $\nu_0 = \delta_{R e_1}$, giving rise to the density function

$$F_{\nu_0}(x) := \Big(\frac n{2\pi}\Big)^{d/2}\exp\Big(-\frac n2(x_1 - R)^2\Big)\prod_{i=2}^d\exp\Big(-\frac n2 x_i^2\Big).$$

On the other hand, for a suitable $h > 0$ specified in a moment, let $\nu_\rho$ be the uniform distribution on

$$P_h := \{(R, h\cdot v) \mid v \in \{-1, 1\}^{d-1}\}.$$

Since each element of $P_h$ has Euclidean distance $\sqrt{R^2 + (d-1)h^2} - R$ from $C$, which should correspond to $\rho$, we set $h^2 = \frac{(R+\rho)^2 - R^2}{d-1}$. This gives rise to the following density function:

$$\begin{aligned}
F_{\nu_\rho}(x) &:= \Big(\frac n{2\pi}\Big)^{d/2}\exp\Big(-\frac n2(x_1 - R)^2\Big)\frac1{2^{d-1}}\sum_{v_2,\dots,v_d\in\{-1,1\}}\prod_{i=2}^d\exp\Big(-\frac n2(x_i - h\cdot v_i)^2\Big)\\
&= \Big(\frac n{2\pi}\Big)^{d/2}\exp\Big(-\frac n2(x_1 - R)^2\Big)\prod_{i=2}^d\exp\Big(-\frac n2 x_i^2 - \frac n2 h^2\Big)\cosh(nhx_i)\\
&= \Big(\frac n{2\pi}\Big)^{d/2}\exp\Big(-\frac n2(x_1 - R)^2 - (d-1)\frac n2 h^2\Big)\prod_{i=2}^d\exp\Big(-\frac n2 x_i^2\Big)\cosh(nhx_i),
\end{aligned}$$

so that

$$F^2_{\nu_\rho}(x) = \Big(\frac n{2\pi}\Big)^d\exp\big(-n(x_1 - R)^2 - (d-1)nh^2\big)\prod_{i=2}^d\exp(-nx_i^2)\cosh^2(nhx_i).$$

Now, using the fact that $E[\cosh^2(aY)] = \exp(a^2\sigma^2)\cosh(a^2\sigma^2)$ for $Y \sim \mathcal{N}(0, \sigma^2)$, we have

$$\begin{aligned}
\int_{\mathbb{R}^d}\frac{F^2_{\nu_\rho}(x)}{F_{\nu_0}(x)}\,dx &= \Big(\frac n{2\pi}\Big)^{d/2}\int_{\mathbb{R}^d}\exp\Big(-\frac n2(x_1 - R)^2 - (d-1)nh^2\Big)\prod_{i=2}^d\cosh^2(nhx_i)\exp\Big(-\frac n2 x_i^2\Big)\,dx\\
&= \Big(\frac n{2\pi}\Big)^{d/2}\exp\big(-(d-1)nh^2\big)\cdot\int_{\mathbb{R}}\exp\Big(-\frac n2(x_1 - R)^2\Big)dx_1\cdot\Big(\int_{\mathbb{R}}\cosh^2(nhx)\exp\Big(-\frac n2 x^2\Big)dx\Big)^{d-1}\\
&= \exp\big(-(d-1)nh^2\big)\cdot\big(\exp(nh^2)\cosh(nh^2)\big)^{d-1} = \cosh(nh^2)^{d-1}.
\end{aligned}$$

Now, by Taylor expansion we obtain the bound

$$nh^2 \leq 1 \;\Longrightarrow\; \cosh(nh^2) \leq 1 + \frac e2 n^2h^4, \tag{6.6}$$


so that

$$\ln\big(\cosh(nh^2)^{d-1}\big) \leq (d-1)\frac e2 n^2h^4 = \frac e2\cdot\frac{n^2}{d-1}\big((R+\rho)^2 - R^2\big)^2, \tag{6.7}$$

whenever

$$nh^2 \leq 1,\quad\text{i.e.}\quad \rho \leq \sqrt{\frac{d-1}n + R^2} - R. \tag{6.8}$$

Now, provided that (6.8) holds, (6.7) leads to the following sufficient condition for (6.1):

$$\rho \leq \sqrt{\frac{\sqrt{d-1}}n\,s + R^2} - R,\quad\text{where } s := \sqrt{\frac2e\ln(1+4(1-\eta)^2)}. \tag{6.9}$$

It is straightforward to see that (6.9) implies (6.8) as long as

$$d \geq 1 + \frac2e\ln(5),\quad\text{i.e. } d \geq 3.$$

It remains to investigate (6.9) a little closer. Application of (6.3) now yields the following, defining $t > 0$ via the relation $R^2 = t^2\,\frac{\sqrt{d-1}}n\,s$: if on the one hand $R \lesssim \sqrt s\,\frac{(d-1)^{1/4}}{n^{1/2}}$, that is $R \leq t\sqrt s\,\frac{(d-1)^{1/4}}{n^{1/2}}$ for a fixed $t > 0$, we have

$$\sqrt{\frac{\sqrt{d-1}}n\,s + R^2} - R \geq \frac{\sqrt s\,(d-1)^{1/4}}{2\sqrt{1+t^2}\,n^{1/2}} \asymp \frac{(d-1)^{1/4}}{n^{1/2}}\min(t^{-1}, 1) \asymp \min\Big(\frac{(d-1)^{1/4}}{n^{1/2}},\,\frac{(d-1)^{1/2}}{n\cdot R}\Big).$$

Analogously, the case $R \gtrsim \sqrt s\,\frac{(d-1)^{1/4}}{n^{1/2}}$ also yields

$$\sqrt{\frac{\sqrt{d-1}}n\,s + R^2} - R \gtrsim \min\Big(\frac{(d-1)^{1/4}}{n^{1/2}},\,\frac{(d-1)^{1/2}}{n\cdot R}\Big).$$

Upper Bound. We define the test statistic

$$T(X) := \|X\|^2 - R^2$$


and a corresponding test of the form $\varphi(X) = 1_{\{T(X)\geq\tau\}}$. Clearly, for any $\mu \in \partial C$, we have $n\|X\|^2 \sim \chi^2_{nR^2}(d)$, and for any $\mu \in A_\rho$ with $\|\mu\| = R + \rho$, we have $n\|X\|^2 \sim \chi^2_{n(R+\rho)^2}(d)$. Therefore, via

$$\sup_{\mu\in C} P(\varphi(X) = 1) \leq P(Z \geq nR^2 + n\tau),\quad Z \sim \chi^2_{nR^2}(d),$$

and using (6.2.II), a rejection threshold $\tau$ is readily given by

$$\tau = \frac dn + 2\sqrt{\Big(\frac d{n^2} + \frac2n R^2\Big)v_\alpha} + \frac2n v_\alpha.$$

On the other hand, in order to satisfy a prescribed level $\beta \in (0, \frac12)$ for the type II error, that is

$$\sup_{\mu\in A_\rho} P_\mu(\varphi(X) = 0) = P(Z' \leq nR^2 + n\tau) \leq \beta,\quad\text{where } Z' \sim \chi^2_{n(R+\rho)^2}(d),$$

(6.2.III) yields the sufficient condition

$$d + nR^2 + 2\sqrt{(d + 2nR^2)v_\alpha} + 2v_\alpha \leq d + n(R+\rho)^2 - 2\sqrt{(d + 2n(R+\rho)^2)v_\beta}.$$

Using $\sqrt{a+b} \leq \sqrt a + \sqrt b$ (for $a, b > 0$) and (6.3) respectively, we obtain two different sufficient bounds for $\rho$:

$$\rho \geq \sqrt2\,\frac{d^{1/4}}{\sqrt n}\big(\sqrt{v_\alpha} + \sqrt{v_\beta}\big)^{1/2} + \sqrt{\frac2n}\big(\sqrt{v_\alpha} + \sqrt2\,\sqrt{v_\beta}\big);$$

$$\rho \geq \frac{2\sqrt d\big(\sqrt{v_\alpha} + \sqrt{v_\beta}\big)}{nR + 2\sqrt{n\,v_\alpha}} + \sqrt{\frac2n}\big(\sqrt{v_\alpha} + \sqrt2\,\sqrt{v_\beta}\big).$$

Therefore, as claimed, the upper bound

$$\rho^*(C) \lesssim \max\Big(\frac1{\sqrt n},\,\min\Big(\frac{d^{1/4}}{\sqrt n},\,\frac{\sqrt d}{nR}\Big)\Big)$$

holds. This concludes the proof. $\Box$

References

[1] Arias-Castro, E., and Casal, A. R. On estimating the perimeter using the alpha-shape. arXiv preprint arXiv:1507.00065 (2015).
[2] Baraud, Y. Non-asymptotic minimax rates of testing in signal detection. Bernoulli 8, 5 (2002), 577–606.


[3] Baraud, Y., Huet, S., and Laurent, B. Testing convex hypotheses on the mean of a Gaussian vector. Application to testing qualitative hypotheses on a regression function. The Annals of Statistics (2005), 214–257.
[4] Birgé, L. An alternative point of view on Lepski's method. Lecture Notes–Monograph Series (2001), 113–133.
[5] Bull, A., and Nickl, R. Adaptive confidence sets in $L^2$. Probability Theory and Related Fields 156, 3-4 (2013), 889–919.
[6] Cai, T. T., and Low, M. Testing composite hypotheses, Hermite polynomials and optimal estimation of a nonsmooth functional. The Annals of Statistics 39, 2 (2011), 1012–1041.
[7] Carpentier, A. Testing the regularity of a smooth signal. Bernoulli 21, 1 (2015), 465–488.
[8] Comminges, L., and Dalalyan, A. Minimax testing of a composite null hypothesis defined via a quadratic functional in the model of regression. Electronic Journal of Statistics 7 (2013), 146–190.
[9] Gayraud, G., and Pouet, C. Adaptive minimax testing in the discrete regression scheme. Probability Theory and Related Fields 133, 4 (2005), 531–558.
[10] Ingster, Y. Minimax detection of a signal for $\ell^n_q$-balls with $\ell^n_p$-balls removed. WIAS Preprint (1991).
[11] Ingster, Y., and Suslina, I. Minimax detection of a signal for Besov bodies and balls. Problemy Peredachi Informatsii 34, 1 (1998), 56–68.
[12] Juditsky, A., and Nemirovski, A. On nonparametric tests of positivity/monotonicity/convexity. The Annals of Statistics (2002), 498–527.
[13] Lepski, O., Nemirovski, A., and Spokoiny, V. On estimation of the $L_r$ norm of a regression function. Probability Theory and Related Fields 113, 2 (1999), 221–253.
[14] Winkelbauer, A. Moments and absolute moments of the normal distribution. arXiv preprint (2012).
