Estimators of the proportion of false null hypotheses: I “universal construction via Lebesgue-Stieltjes integral equations and uniform consistency under independence”

arXiv:1807.03889v1 [math.PR] 10 Jul 2018

Xiongzhi Chen∗

Abstract

The proportion of false null hypotheses is a very important quantity in statistical modelling and inference based on the two-component mixture model and its extensions, and in control and estimation of the false discovery rate and false non-discovery rate. Most existing estimators of this proportion threshold p-values, deconvolve the mixture model under constraints on the components, or depend heavily on the location-shift property of distributions. Hence, they usually are not consistent, applicable to non-location-shift distributions, or applicable to discrete statistics or p-values. To eliminate these shortcomings, we construct uniformly consistent estimators of the proportion as solutions to Lebesgue-Stieltjes integral equations. In particular, we provide such estimators respectively for random variables whose distributions have separable characteristic functions, form discrete natural exponential families with infinite supports, and form natural exponential families with separable moment sequences. We provide the speed of convergence and uniform consistency class for each such estimator. The constructions use Fourier transform, Mellin transform or probability generating functions, and have connections with Bessel functions. In addition, we provide example distribution families for which a consistent estimator of the proportion cannot be constructed using our techniques.

Keywords: Analytic functions; Bessel functions; concentration inequalities; Fourier transform; Lebesgue-Stieltjes integral equations; Mellin transform; natural exponential family; proportion of false null hypotheses

MSC 2010 subject classifications: Primary 35C05, 60E05; Secondary 60F10.

1 Introduction

The proportion of false null hypotheses and its dual, the proportion of true null hypotheses, play important roles in statistical modelling and multiple hypotheses testing. For example, they are components of the two-component mixture model of Efron et al. (2001), its extensions by Ploner et al. (2006), Cai and Sun (2009) and Liu et al. (2016), and their induced statistics including the “local false discovery rate (local FDR)” of Efron et al. (2001) and “positive FDR (pFDR)” and q-value of Storey (2003). They also form the optimal discovery procedure (ODP) of Storey (2007). However, without information on the proportions, decision rules based on the local FDR cannot be implemented, and none of the pFDR, q-value and ODP decision rule can be computed in practice. Further, they form upper bounds on the FDRs and false non-discovery rates (FNRs) of all FDR procedures including those of Benjamini and Yekutieli (2001), Genovese and Wasserman (2004), Storey et al. (2004), Sarkar (2006) and Blanchard and Roquain (2009), and accurate information on either proportion helps better control and estimate the FDR and FNR, and thus potentially enables an adaptive procedure to be more powerful or have smaller FNR than its non-adaptive counterpart. Since neither proportion is known in practice, it is very important to accurately estimate the proportions.

In this work, we focus on consistently estimating the proportions for the purpose of FDR control and estimation without employing the two-component mixture model or assuming that statistics or p-values have absolutely continuous cumulative distribution functions (CDFs). However, part of our results can be applied to modeling and inference based on this model and its extensions. The main motivation for dealing with discrete p-values or statistics is the wide practice of FDR control in multiple testing based on discrete data in genomics (Auer and Doerge, 2010; Di et al., 2011), genetics (Gilbert, 2005), vaccine efficacy studies (Mehrotra and Heyse, 2004), drug safety monitoring (Chen and Doerge, 2017) and other areas, where Binomial Test, Fisher’s Exact Test and Exact Negative Binomial Test have been routinely used to test individual hypotheses. In the sequel, we refer to an estimator of the proportion of true (or false) null hypotheses as a “null (or alternative) proportion estimator”.

∗ Corresponding author. Department of Mathematics and Statistics, Washington State University, Pullman, WA 99164, USA; E-mail: [email protected]

1.1 A brief review on adaptive FDR control and proportion estimators

Consider an adaptive FDR procedure, e.g., one studied in Section 3 of Blanchard and Roquain (2009), that employs a null proportion estimator, and recall the property of “positive regression dependency on subsets of the index set of true null hypotheses (PRDS)” defined by Benjamini and Yekutieli (2001) or Sarkar (2008). To ensure the non-asymptotic conservativeness of an adaptive FDR procedure under independence, the “reciprocal conservativeness of a null proportion estimator” is usually sufficient. This is the property that “the reciprocal of the proportion of true null hypotheses is an upper bound for the expectation of the reciprocal of the estimate obtained by applying the null proportion estimator to the value 0 for a true null hypothesis together with all other p-values”; see Theorem 13 of Blanchard and Roquain (2009). In contrast, to ensure the non-asymptotic conservativeness of an adaptively weighted FDR procedure under independence, such as those in Hu et al. (2010) and Chen and Doerge (2017), whose weights are induced by the estimated groupwise proportions of true null hypotheses, “reciprocal conservativeness” does not seem to be sufficient. In fact, it remains an open problem to show the non-asymptotic conservativeness of such a procedure. However, the asymptotic conservativeness of these procedures under PRDS is ensured when the null proportion estimators used by them are consistent under PRDS.

There are many estimators of the proportions, and their constructions can be roughly categorized into 4 classes: (1) thresholding p-values (Storey, 2002; Storey et al., 2004); (2) deconvolving the two-component mixture model for the distribution of p-values or statistics, modulo identifiability conditions (Swanepoel, 1999; Genovese and Wasserman, 2004; Kumar and Bodhisattva, 2016); (3) bounding the proportions via the use of uniform empirical process (Meinshausen and Rice, 2006); (4) Fourier transform for Gaussian family or mixtures with a Gaussian component (Jin and Cai, 2007; Jin, 2008; Jin et al., 2010). Further, perhaps the only existing consistent estimators are those mentioned in classes (1), (2) and (3) under the mixture model where statistics or p-values are identically distributed and the proportions are identifiable, and those mentioned in class (4). However, the construction in class (4) based on Fourier transform does not work if the family of distributions of statistics does not contain at least one component from a location-shift family, due to the loss of equivalence between translating the mean parameter of a distribution and convoluting the original and induced distributions. In contrast, consistency of estimators in classes (1), (2) and (3) requires the two-component mixture model and various regularity conditions such as concavity, smoothness, purity (defined by Genovese and Wasserman, 2004) of the alternative component, or the conditions in Lemma 3 of Kumar and Bodhisattva (2016).

Finally, almost all existing proportion estimators were initially designed for p-values or statistics that have continuous distributions, and it is not clear yet whether the null proportion estimators in classes (2) to (4) are reciprocally conservative. Even though the estimator of Storey et al. (2004) is reciprocally conservative for independent p-values whose null distributions are uniform on [0, 1] (see Corollary 13 of Blanchard and Roquain, 2009), Chen et al. (2018) have shown that it can have excessive upward bias when applied to discrete p-values from the Binomial test and Fisher’s Exact Test and proposed an improved estimator accordingly. Note that for p-values of these tests, the two-component mixture model is inappropriate.

1.2 Main contributions

We generalize “Jin’s Strategy” provided in Section 2.1 of Jin (2008), which estimates the proportions by approximating the indicator function of the true status of a null hypothesis and has essentially only been implemented for Gaussian family or mixtures with a Gaussian component, and in much more general settings construct proportion estimators as solutions to a specific type of Lebesgue-Stieltjes integral equation in the complex domain. As such, our methodology (referred to as “the Strategy”) lifts Jin’s strategy to its full generality. In addition to the key advantage of Jin’s strategy, i.e., being free from the two-component mixture model, the Strategy does not require the absolute continuity of the CDFs of involved statistics or the location-shift property of involved distributions.

We implement the Strategy as “Construction I” for random variables whose distributions have separable characteristic functions (CFs) (see Definition 1 and Theorem 1), as “Construction II” for random variables that form discrete natural exponential families (NEFs) with infinite supports (see Theorem 3), and as “Construction III” for random variables that form NEFs with separable moment functions (see Definition 5 and Theorem 4). In particular, these distributions include eight members of the twelve natural exponential families with cubic variance functions (NEF-CVFs) proposed by Letac and Mora (1990), Cauchy family, Hyperbolic Secant family, Laplace family and Logistic family. Specifically, Construction I employs Fourier transform and an extended Riemann-Lebesgue lemma of Costin et al. (2016). Since the set of distributions with separable CFs contains several location-shift families, we give a unified treatment of proportion estimators for location-shift families and reveal the intrinsic mechanism of Fourier transform based construction. In contrast, Construction II mainly uses generating functions (GFs), and Construction III uses Mellin transform, which can be regarded as inducing “multiplication-convolution equivalence”, in contrast to Fourier transform inducing “translation-convolution equivalence”. We show that for Inverse Gaussian family (which does not have a separable moment sequence) and Binomial family (which is discrete but with a finite support), the Strategy is not implementable; see Section 5. Further, we provide evidence that Ressel and Hyperbolic Cosine families may not have separable moment sequences, suggesting that Construction III based on Mellin transform may not be applicable to them; see Section 8.3. The study of the moment sequence of Ressel family utilizes Lambert W functions and the calculus of residues, exemplifying yet another application of Lambert functions and meromorphic functions to statistics and solutions of integro-differential equations.
We provide upper bounds on the variances of the proportion estimators, show their uniform consistency, and provide their speeds of convergence for consistency under independence respectively for Construction I, II and III; see Section 3.2, Section 6 and Section 7. For Construction I, uniform consistency in frequency domain (see Definition 3) can also be achieved due to the global Lipschitz property of the construction, and the uniform consistency classes (see Definition 3) for the proportion estimators can be ordered via set inclusion according to the magnitudes of the moduli of the corresponding CFs; see Theorem 2 and Corollary 3. In contrast, for Constructions II and III, uniform consistency in frequency domain of the estimators is very hard to obtain since the constructions are not Lipschitz transforms of the involved random variables. For Construction I and Construction II where GFs have finite radii of convergence, the speed of convergence of an estimator and its uniform consistency class do not depend on the supremum norm of the parameter vector (see Theorem 2, Corollary 3 and Corollary 4), enabling the corresponding estimators to be fully data-adaptive, whereas for other instances of Construction II and III, they may depend explicitly on the supremum norm of the parameter vector or the infimum of a transform of this vector (see Theorem 6 and Theorem 8). However, they always depend on the minimal magnitude of the differences between parameters of interest and their reference value, and this is the consequence of approximating indicator functions of the status of individual null hypotheses — a universality phenomenon for the Strategy.


1.3 Notations and conventions

Throughout the article, we use the following conventions and notations: $C$ denotes a generic, positive constant whose values may differ at different occurrences; $O(\cdot)$ and $o(\cdot)$ are respectively Landau’s big O and small o notations; $\mathbb{E}$ and $\mathbb{V}$ are respectively the expectation and variance with respect to the probability measure $\Pr$; $\mathbb{R}$ and $\mathbb{C}$ are respectively the set of real and complex numbers; $\Re$, $\Im$ and $\arg$ denote the real part, imaginary part and argument of a complex number, respectively; $\mathbb{N}$ denotes the set of non-negative integers, and $\mathbb{N}_+ = \mathbb{N} \setminus \{0\}$; $\delta_y$ is the Dirac mass at $y \in \mathbb{R}$; $\nu$ is the Lebesgue measure, and when it is clear that an integral is with respect to $\nu$, the usual notation $d\cdot$ for differential will be used in place of $\nu(d\cdot)$; for $1 \le p \le \infty$ and $A \subseteq \mathbb{R}$, $\|f\|_p = \left( \int_A |f(x)|^p \, \nu(dx) \right)^{1/p}$ and $L^p(A) = \{ f : \|f\|_p < \infty \}$, for which $\|f\|_\infty$ is the essential supremum of $f$; for a set $A$ in $\mathbb{R}$ and a scalar $a$, $1_A$ is the indicator of $A$ and $A - a = \{x - a : x \in A\}$; $\emptyset$ is the empty set; for $x \in \mathbb{R}$, $\lfloor x \rfloor$ denotes the integer part of $x$; the extended complex plane $\mathbb{C} \cup \{\infty\}$ is identified with the Riemann sphere so that $\infty$ corresponds to its north pole; for $x \in \mathbb{R}$, $x \to \infty$ means $x \to +\infty$, and $x \to -\infty$ means $x \to -\infty$; for positive functions $a'(x)$ and $b'(x)$, $a'(x) \sim b'(x)$ means that $\lim_{x \to \infty} a'(x) (b'(x))^{-1} = 1$.

1.4 Organization of paper

The rest of the article is organized as follows. In Section 2 we formulate the problem of proportion estimation and state the Strategy for constructing proportion estimators. In Section 3 we develop uniformly consistent proportion estimators when the CDFs of random variables have separable characteristic functions. In Section 4, we construct proportion estimators when the distributions of random variables form discrete NEFs with infinite supports or form NEFs with separable moment sequences. In Section 5, we provide two families of random variables for which the Strategy cannot be applied. Section 6 and Section 7 justify the uniform consistency of the constructed proportion estimators for NEFs with infinite supports and NEFs with separable moment sequences, respectively. We end the article with a discussion in Section 8 on several topics worthy of future investigations, uniform consistency of proportion estimators in frequency domain, and concentration inequalities for non-Lipschitz functions of independent random variables.

2 The estimation problem and strategy

In this section, we formulate the estimation problem and generalize Jin’s strategy. Let $z_i$, $1 \le i \le m$, be $m$ random variables each with mean or median $\mu_i$, such that, for a fixed value $\mu_0$ for the mean or median and some integer $m_0$ between $0$ and $m$, $\mu_i = \mu_0$ for each $i = 1, \ldots, m_0$ and $\mu_i \neq \mu_0$ for each $i = m_0 + 1, \ldots, m$. Consider simultaneously testing the null hypothesis $H_{i0} : \mu_i = \mu_0$ versus the alternative hypothesis $H_{i1} : \mu_i \neq \mu_0$ for $1 \le i \le m$. Let $I_{0,m} = \{1 \le i \le m : \mu_i = \mu_0\}$ and $I_{1,m} = \{1 \le i \le m : \mu_i \neq \mu_0\}$. Then the cardinality of $I_{0,m}$ is $m_0$, the proportion of true null hypotheses (“null proportion” for short) is defined as $\pi_{0,m} = m^{-1} m_0$, and the proportion of false null hypotheses (“alternative proportion” for short) is $\pi_{1,m} = 1 - \pi_{0,m}$. In other words, $\pi_{0,m}$ is the proportion of random variables that have a prespecified mean or median. Our target is to consistently estimate $\pi_{1,m}$ as $m \to \infty$ when $\{z_i\}_{i=1}^m$ are independent.

2.1 The Strategy for proportion estimation

Let $z = (z_1, \ldots, z_m)^T$ and $\mu = (\mu_1, \ldots, \mu_m)^T$. Denote by $F_{\mu_i}(\cdot)$ the CDF of $z_i$ for $1 \le i \le m$ and suppose each $F_{\mu_i}$ with $0 \le i \le m$ is a member of a set $\mathcal{F}$ of CDFs such that $\mathcal{F} = \{F_\mu : \mu \in U\}$ for some non-empty $U$ in $\mathbb{R}$. For the rest of the paper, we assume that each $F_\mu$ is uniquely determined by $\mu$ and that $U$ has a non-empty interior.

We state the Strategy below. For each fixed $\mu \in U$, if we can approximate the indicator function $1_{\{\mu \neq \mu_0\}}$ by $1 - \psi(t, \mu; \mu_0)$ for a function $\psi(t, \mu; \mu_0)$ with $t \in \mathbb{R}$ satisfying

• $\lim_{t \to \infty} \psi(t, \mu_0; \mu_0) = 1$ and $\lim_{t \to \infty} \psi(t, \mu; \mu_0) = 0$ for $\mu \neq \mu_0$,

then the “phase function”

$$\varphi_m(t, \mu) = \frac{1}{m} \sum_{i=1}^{m} \left[ 1 - \psi(t, \mu_i; \mu_0) \right] \tag{1}$$

satisfies $\lim_{t \to \infty} \varphi_m(t, \mu) = \pi_{1,m}$ for any fixed $m$ and $\mu$ and provides the “Oracle” $\Lambda_m(\mu) = \lim_{t \to \infty} \varphi_m(t, \mu)$. Further, if we can find a function $K : \mathbb{R}^2 \to \mathbb{R}$ that does not depend on any $\mu \neq \mu_0$ and satisfies the Lebesgue-Stieltjes integral equation

$$\psi(t, \mu; \mu_0) = \int K(t, x; \mu_0) \, dF_\mu(x), \tag{2}$$

then the “empirical phase function”

$$\hat{\varphi}_m(t, z) = \frac{1}{m} \sum_{i=1}^{m} \left[ 1 - K(t, z_i; \mu_0) \right] \tag{3}$$

satisfies $\mathbb{E}[\hat{\varphi}_m(t, z)] = \varphi_m(t, \mu)$ for any fixed $m$, $t$ and $\mu$. Namely, $\hat{\varphi}_m(t, z)$ is an unbiased estimator of $\varphi_m(t, \mu)$. When the difference

$$e_m(t) = \left| \hat{\varphi}_m(t, z) - \varphi_m(t, \mu) \right| \tag{4}$$

is suitably small for large $t$, $\hat{\varphi}_m(t, z)$ will accurately estimate $\pi_{1,m}$. If $\Pr(e_m(t_m) \to 0) \to 1$ for a monotone increasing sequence $\{t_m\}_{m \ge 1}$ such that $\lim_{m \to \infty} t_m = \infty$, then $\hat{\varphi}_m(t_m, z)$ will be consistent.

By duality, $\varphi_m^*(t, \mu) = 1 - \varphi_m(t, \mu)$ is the oracle for which $\pi_{0,m} = \lim_{t \to \infty} \varphi_m^*(t, \mu)$ for any fixed $m$ and $\mu$, $\hat{\varphi}_m^*(t, z) = 1 - \hat{\varphi}_m(t, z)$ satisfies $\mathbb{E}[\hat{\varphi}_m^*(t, z)] = \varphi_m^*(t, \mu)$ for any fixed $m$, $t$ and $\mu$, and $\hat{\varphi}_m^*(t, z)$ will accurately estimate $\pi_{0,m}$ when $e_m(t)$ is suitably small for large $t$. Further, the stochastic oscillations of $\hat{\varphi}_m^*(t, z)$ and $\hat{\varphi}_m(t, z)$ are the same and are quantified by $e_m(t)$.

We remark on the differences between the Strategy and Jin’s Strategy. The latter in our notations sets $\mu_0 = 0$, requires $\psi(t, 0; 0) = 1$ for all $t$, requires location-shift families and uses Fourier transform to construct $K$ and $\psi$, deals with distributions whose means are equal to their medians, and intends to have

$$1 \ge \psi(t, \mu; 0) \ge 0 \quad \text{for all } \mu \text{ and } t. \tag{5}$$

It is not hard to see from the proof of Lemma 7.1 of Jin (2008) that (5) may not be achievable for non-location-shift families. Further, Jin’s construction of $\psi$ is a special case of solving (2). Finally, it is easier to solve (2) for $K : \mathbb{R}^2 \to \mathbb{C}$ in the complex domain. However, a real-valued $K$ is preferred for applications in statistics.

3 Construction I: separable characteristic functions

In this section, we present the construction, referred to as “Construction I”, when the set of characteristic functions (CFs) of the CDFs is separable (see Definition 1). This construction essentially depends on generalizations of the Riemann-Lebesgue Lemma (“RL Lemma”) and subsumes that by Jin (2008) for Gaussian family.

Recall the family of CDFs $\mathcal{F} = \{F_\mu : \mu \in U\}$ and let $\hat{F}_\mu(t) = \int e^{\iota t x} \, dF_\mu(x)$ be the CF of $F_\mu$, where $\iota = \sqrt{-1}$. Let $r_\mu$ be the modulus of $\hat{F}_\mu$. Then $\hat{F}_\mu = r_\mu e^{\iota h_\mu}$, where $h_\mu : \mathbb{R} \to \mathbb{R}$ is the multivalued phase function whose branches differ from each other by addition of a constant multiple of $2\pi$. However, the multiplicity of $h_\mu$ on $\mathbb{R}$ does not affect our analysis since $\hat{F}_\mu$ is well-defined on $\mathbb{R}$. Further, $h_\mu$ restricted to the non-empty open interval $(-\tau_\mu, \tau_\mu)$ with $h_\mu(0) = 0$ is uniquely defined, continuous and odd, where $\tau_\mu = \inf\{ t > 0 : \hat{F}_\mu(t) = 0 \} > 0$; see Luo and Zhang (2004) for a positive lower bound for $\tau_\mu$.

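Definition 1 If

$$\left\{ t \in \mathbb{R} : \hat{F}_{\mu_0}(t) = 0 \right\} = \emptyset$$

and, for each $\mu \in U \setminus \{\mu_0\}$,

$$\sup_{t \in \mathbb{R}} \frac{r_\mu(t)}{r_{\mu_0}(t)} \; [...]$$

[...] 0, there exists a step function $q_{1,\epsilon}$ on $\mathbb{R}$ with compact support $A_0$ such that

$$q_{1,\epsilon}(y) = \sum_{j=1}^{n_{2,\epsilon}} a_{2j} 1_{A_{2j}}(y) \quad \text{and} \quad \int_{\mathbb{R}} \left| q_{1,\epsilon}(y) - q_1(y) \right| dy < \epsilon,$$

where $n_{2,\epsilon} \in \mathbb{N}$ is finite and the sets $\{A_{2j}\}_{j=1}^{n_{2,\epsilon}}$ are disjoint and $\bigcup_{j=1}^{n_{2,\epsilon}} A_{2j} \subseteq A_0$. Now consider $t$ with $|t| \ge 1$. Then, the boundedness of $\omega$ and $r_\mu / r_{\mu_0}$ implies

$$\int_{[-1,1]} \left| \omega(s) q_1(ts) - \omega(s) q_{1,\epsilon}(ts) \right| ds \le C |t|^{-1} \int_{\mathbb{R}} \left| q_{1,\epsilon}(y) - q_1(y) \right| dy \le 2C\epsilon. \tag{13}$$

Let $\tau(ts) = h_\mu(ts) - h_{\mu_0}(ts)$ and

$$a_t^* = \max_{1 \le j \le n_{2,\epsilon}} \nu\left( \{ s \in [-1, 1] : ts \in A_{2j} \} \right).$$

Since the sets $\{A_{2j}\}_{j=1}^{n_{2,\epsilon}}$ are uniformly bounded, $\lim_{|t| \to \infty} a_t^* = 0$ and

$$\int_{[-1,1]} \left| \omega(s) q_{1,\epsilon}(ts) \exp(\iota \tau(ts)) \right| ds \le C n_{2,\epsilon} a_t^* \to 0 \quad \text{as } |t| \to \infty. \tag{14}$$

Combining (13) and (14) gives (12), which justifies the claim.

When each $F_\mu$ has a density with respect to $\nu$, the condition $r_\mu r_{\mu_0}^{-1} \in L^1(\mathbb{R}) \cap L^\infty(\mathbb{R})$ in Corollary 1 can be fairly strong since it forces $\lim_{t \to \infty} \frac{r_\mu(t)}{r_{\mu_0}(t)} = 0$. Unfortunately, Corollary 1 is not applicable to location-shift families since for these families $\frac{r_\mu(t)}{r_{\mu_0}(t)} \equiv 1$ for all $\mu \in U$ whenever $r_{\mu_0}(t) \neq 0$; see Lemma 1. In contrast, Theorem 1 is, as we explain next.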

Recall the definition of location-shift family, i.e., $\mathcal{F} = \{F_\mu : \mu \in U\}$ is a location-shift family if and only if $z + \mu'$ has CDF $F_{\mu + \mu'}$ whenever $z$ has CDF $F_\mu$ for $\mu, \mu + \mu' \in U$.

Lemma 1 Suppose $\mathcal{F}$ is a location-shift family. Then, $r_\mu$ does not depend on $\mu$ and $h_\mu(t) = t\mu'$, where $\mu' = \mu - \mu_0$. If in addition $\hat{F}_{\mu_0}(t) \neq 0$ for all $t \in \mathbb{R}$, then $\hat{\mathcal{F}}$ is separable.

Proof. Since $\mathcal{F}$ is a location-shift family, if $z$ has CDF $F_\mu$ with $\mu \in U$, then there exists some $\mu_0 \in U$ such that $z = \mu' + z'$, where $z'$ has CDF $F_{\mu_0}$ and $\mu' = \mu - \mu_0$. So,

$$\hat{F}_\mu(t) = \mathbb{E}[\exp(\iota t z)] = \mathbb{E}\left[ \exp\left( \iota t (\mu' + z') \right) \right] = \hat{F}_{\mu_0}(t) \exp(\iota t \mu')$$

for all $t$. In particular, in the representation $\hat{F}_\mu = r_\mu e^{\iota h_\mu}$, the modulus $r_\mu$ does not depend on $\mu$ and $h_\mu(t) = t\mu'$. If $\hat{F}_{\mu_0}(t) \neq 0$ for all $t \in \mathbb{R}$, then (8) holds and $\hat{\mathcal{F}}$ is separable.
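The factorization used in the proof of Lemma 1 can be checked numerically on an empirical CF, for which it holds sample by sample. The following is a minimal sketch, assuming Laplace(0, 1) draws as the base sample and a shift of 1.7; the helper name `ecf` and all numerical values are illustrative choices, not part of the paper.

```python
import cmath
import random

def ecf(sample, t):
    """Empirical characteristic function (1/m) * sum_j exp(i t z_j)."""
    return sum(cmath.exp(1j * t * zj) for zj in sample) / len(sample)

rng = random.Random(0)
# Laplace(0, 1) draws, generated as the difference of two independent Exp(1) draws.
base = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(2000)]
shift = 1.7  # hypothetical value of mu' = mu - mu_0
shifted = [x + shift for x in base]

t = 0.9
lhs = ecf(shifted, t)                           # CF of the shifted sample
rhs = ecf(base, t) * cmath.exp(1j * t * shift)  # base CF times exp(i t mu')

if __name__ == "__main__":
    print("CF factorization error:", abs(lhs - rhs))
    print("moduli:", abs(lhs), abs(ecf(base, t)))
```

The two moduli agree up to floating-point error, which is the sample analogue of the statement that the modulus does not depend on the location parameter.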

Lemma 1 implies that very likely the set of location-shift family distributions is a subset of the set of distributions with separable CFs. However, verifying if $\hat{F}_{\mu_0}$ for a location-shift family has any real zeros is quite difficult in general. This somehow prevents us from giving examples in Section 3.1 that are for separable CFs but not location-shift families. Nonetheless, with Lemma 1, Theorem 1 implies:

Corollary 2 If $\mathcal{F}$ is a location-shift family for which (6) holds, then

$$\psi(t, \mu; \mu_0) = \int K(t, y + (\mu - \mu_0); \mu_0) \, dF_{\mu_0}(y) = \int_{[-1,1]} \omega(s) \cos\left( ts(\mu - \mu_0) \right) ds. \tag{15}$$

If additionally $\omega$ is good, then (5) holds, i.e., $1 \ge \psi(t, \mu; \mu_0) \ge 0$ for all $\mu$ and $t$.

Proof. When $\mathcal{F}$ is a location-shift family,

$$\int_A dF_\mu(x) = \int_{A - (\mu - \mu_0)} dF_{\mu_0}(y)$$

for each $A \subseteq \mathbb{R}$ measurable with respect to $F_{\mu_0}$. Therefore,

$$\int K(t, x; \mu_0) \, dF_\mu(x) = \int K(t, y + (\mu - \mu_0); \mu_0) \, dF_{\mu_0}(y)$$

and the first identity in (15) holds. Further, $\frac{r_\mu}{r_{\mu_0}} \equiv 1$ for all $\mu \in U$. So, (10) reduces to the second identity in (15). Finally, since the cosine function is even on $\mathbb{R}$, it suffices to consider $t$ and $\mu$ such that $t(\mu - \mu_0) > 0$ in the representation (15). When $\omega$ is good, the proof of the third claim of Lemma 7.1 of Jin (2006) remains valid, which implies $1 \ge \psi(t, \mu; \mu_0) \ge 0$ for all $\mu$ and $t$.

The identity (15) in Corollary 2 asserts that, for a location-shift family, $\psi$ in (2) reduces to a “convolution”, and it is the intrinsic mechanism behind the Fourier transform based construction. When $\mu_0 = 0$, from Corollary 2 we directly have

$$\psi(t, \mu; 0) = \int K(t, x + \mu; 0) \, dF_0(x) = \int_{[-1,1]} \omega(s) \cos(ts\mu) \, ds, \tag{16}$$

recovering the construction by Jin (2008) for Gaussian and Laplace families. Examples for which (16) holds are given in Section 3.1.
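To make the behaviour of $\psi$ in (16) concrete, the following is a minimal sketch that evaluates the integral for one particular $\omega$. It assumes the triangular density $\omega(s) = 1 - |s|$ on $[-1, 1]$, which is an illustrative choice and not prescribed by the text above; for this choice the integral has the closed form $2(1 - \cos a)/a^2$ with $a = t\mu$. The function names and the Simpson quadrature are likewise assumptions of the sketch.

```python
import math

def omega(s):
    """Triangular density on [-1, 1] (an assumed, illustrative choice of omega)."""
    return max(0.0, 1.0 - abs(s))

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def psi_numeric(t, mu):
    """psi(t, mu; 0) = int_{[-1,1]} omega(s) cos(t s mu) ds, as in (16)."""
    # omega and the cosine are even in s: integrate over [0, 1] and double.
    return 2.0 * simpson(lambda s: omega(s) * math.cos(t * s * mu), 0.0, 1.0)

def psi_closed(t, mu):
    """Closed form for the triangular omega: 2 (1 - cos a) / a^2 with a = t * mu."""
    a = t * mu
    return 1.0 if a == 0.0 else 2.0 * (1.0 - math.cos(a)) / (a * a)

if __name__ == "__main__":
    for t in (1.0, 5.0, 25.0):
        print(t, psi_numeric(t, 0.0), psi_numeric(t, 2.0), psi_closed(t, 2.0))
```

With this $\omega$, $\psi(t, 0; 0) = 1$ for every $t$, while for $\mu \neq 0$ the value lies in $[0, 1]$ and decays like $(t\mu)^{-2}$, which is the indicator-approximating behaviour the Strategy asks for.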

3.1 Construction I: some examples

We provide some examples from Construction I. They are all formed out of location-shift families by allowing the location parameter $\mu$ to vary but fixing the scale parameter $\sigma$, and are infinitely divisible so that none of their CFs has any real zero; see Pitman and Yor (2003), Lukacs (1970) and Fischer (2014) for details on their CFs and justifications of their infinite divisibility. Note however that the Poisson family as an example given in Section 4.3 is infinitely divisible but does not have separable CFs. There are certainly other examples that Construction I applies to. But we will not attempt to exhaust them.

Example 1 Gaussian family $\mathrm{Normal}(\mu, \sigma^2)$ with mean $\mu$ and standard deviation $\sigma > 0$, for which

$$\frac{dF_\mu}{d\nu}(x) = f_\mu(x) = \left( \sqrt{2\pi} \sigma \right)^{-1} \exp\left( -2^{-1} \sigma^{-2} (x - \mu)^2 \right).$$

The CF of $f_\mu$ is $\hat{f}_\mu(t) = \exp(\iota t \mu) \exp\left( -2^{-1} t^2 \sigma^2 \right)$. Here $r_\mu^{-1}(t) = \exp\left( 2^{-1} t^2 \sigma^2 \right)$. Therefore,

$$K(t, x; \mu_0) = \int_{[-1,1]} r_\mu^{-1}(ts) \, \omega(s) \exp\left( \iota t s (x - \mu_0) \right) ds \tag{17}$$

and (15) holds. When $\mu_0 = 0$ and $\sigma = 1$, this construction reduces to that in Section 2.1 of Jin (2008).

Example 2 Laplace family $\mathrm{Laplace}(\mu, 2\sigma^2)$ with mean $\mu$ and standard deviation $\sqrt{2}\sigma > 0$, for which

$$\frac{dF_\mu}{d\nu}(x) = f_\mu(x) = \frac{1}{2\sigma} \exp\left( -\sigma^{-1} |x - \mu| \right)$$

and the CF is $\hat{f}_\mu(t) = \left( 1 + \sigma^2 t^2 \right)^{-1} \exp(\iota t \mu)$. Therefore, (17) and (15) hold with $r_\mu^{-1}(t) = 1 + \sigma^2 t^2$. When $\mu_0 = 0$ and $\sigma = 1$, this construction reduces to that in Section 7.1 of Jin (2008).

Example 3 Logistic family $\mathrm{Logistic}(\mu, \sigma)$ with mean $\mu$ and scale parameter $\sigma > 0$, for which

$$\frac{dF_\mu}{d\nu}(x) = f_\mu(x) = \frac{1}{4\sigma} \operatorname{sech}^2\left( \frac{x - \mu}{2\sigma} \right)$$

and the CF of $f_\mu$ is $\hat{f}_\mu(t) = \pi \sigma t \left( \sinh(\pi \sigma t) \right)^{-1} \exp(\iota t \mu)$. Therefore, (17) and (15) hold with $r_\mu^{-1}(t) = (\pi \sigma t)^{-1} \sinh(\pi \sigma t) \sim (2\pi \sigma t)^{-1} e^{\pi \sigma t}$ as $t \to \infty$.

Example 4 Cauchy family $\mathrm{Cauchy}(\mu, \sigma)$ with median $\mu$ and scale parameter $\sigma > 0$, for which

$$\frac{dF_\mu}{d\nu}(x) = f_\mu(x) = \frac{1}{\pi \sigma} \frac{\sigma^2}{(x - \mu)^2 + \sigma^2}$$

and the CF is $\hat{f}_\mu(t) = \exp(-\sigma |t|) \exp(\iota t \mu)$. Therefore, (17) and (15) hold with $r_\mu^{-1}(t) = \exp(\sigma |t|)$. Note that the Cauchy family does not have the first absolute moment.

Example 5 The Hyperbolic Secant family $\mathrm{HSecant}(\mu, \sigma)$ with mean $\mu$ and scale parameter $\sigma > 0$, for which

$$\frac{dF_\mu}{d\nu}(x) = f_\mu(x) = \frac{1}{2\sigma} \frac{1}{\cosh\left( \pi \frac{x - \mu}{\sigma} \right)};$$

see, e.g., Chapter 1 of Fischer (2014). The identity

$$\int_{-\infty}^{+\infty} \frac{e^{\iota t x}}{\pi \cosh(x)} \, dx = \frac{1}{\cosh\left( 2^{-1} \pi t \right)}$$

implies $\hat{F}_\mu(t) = \sigma^{-1} \exp\left( -\iota t \mu \sigma^{-1} \right) \operatorname{sech}\left( t \sigma^{-1} \right)$. Therefore, (17) and (15) hold with $r_\mu^{-1}(t) = \sigma \cosh\left( t \sigma^{-1} \right) \sim 2^{-1} \sigma \exp\left( \sigma^{-1} t \right)$ as $t \to \infty$.
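As an illustration of how Construction I can be turned into a computation, the following is a minimal sketch for the Gaussian family. It assumes $\sigma = 1$, $\mu_0 = 0$, the triangular density $\omega(s) = 1 - |s|$, the real (cosine) form of the kernel, and Simpson quadrature in $s$. The names `make_kernel`, `phi_hat` and `demo`, the value $t = 2$, and the simulated data are hypothetical choices made for the sketch, not settings taken from the paper.

```python
import math
import random

def make_kernel(t, n=100):
    """Return x -> K(t, x; 0) for the Gaussian family with sigma = 1.

    K(t, x; 0) = int_{[-1,1]} omega(s) exp(t^2 s^2 / 2) cos(t s x) ds, computed by
    Simpson's rule on [0, 1] and doubled, since the integrand is even in s.
    """
    h = 1.0 / n
    nodes = []
    for k in range(n + 1):
        s = k * h
        coef = 1 if k in (0, n) else (4 if k % 2 else 2)
        # quadrature weight * omega(s) * reciprocal modulus 1 / r(ts)
        weight = 2.0 * coef * h / 3.0 * (1.0 - s) * math.exp(0.5 * (t * s) ** 2)
        nodes.append((t * s, weight))
    return lambda x: sum(w * math.cos(ts * x) for ts, w in nodes)

def phi_hat(t, z):
    """Empirical phase function (3): the average of 1 - K(t, z_i; 0)."""
    kernel = make_kernel(t)
    return sum(1.0 - kernel(zi) for zi in z) / len(z)

def demo(seed=1, m=10000, pi1=0.2, mu_alt=3.0, t=2.0):
    """Simulate m independent Gaussian statistics and estimate pi1."""
    rng = random.Random(seed)
    m1 = int(pi1 * m)
    z = [rng.gauss(0.0, 1.0) for _ in range(m - m1)]
    z += [rng.gauss(mu_alt, 1.0) for _ in range(m1)]
    return phi_hat(t, z)

if __name__ == "__main__":
    print("estimated alternative proportion:", demo())
```

In this setting the estimate should be close to the simulated proportion 0.2. Larger $t$ reduces the bias of the oracle but inflates the factor $\exp(t^2 s^2 / 2)$ in the kernel and hence the variance, which is the trade-off examined in Section 3.2.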

3.2 Construction I: uniform consistency and speed of convergence

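The performance of the estimator $\hat{\varphi}_m$ depends on how accurately it approximates $\varphi_m$, the oracle that knows the true value $\pi_{1,m}$ as $t \to \infty$. Specifically, the smaller $e_m(t)$ defined by (4) is when $t$ is large, the more accurately $\hat{\varphi}_m(t, z)$ estimates $\pi_{1,m}$. Two key factors that affect $e_m$ are: (i) the magnitude of the reciprocal of the modulus, $r_{\mu_0}^{-1}(t)$, which appears as a scaling factor in the integrand in the definition of $K$ in (9), and (ii) the magnitudes of the $\mu_i$’s and the variabilities of $z_i$’s. As will be shown by Lemma 2, for independent $\{z_i\}_{i=1}^m$, the oscillation of $e_m(t)$ depends mainly on $r_{\mu_0}^{-1}(t)$ due to concentration of measure for independent, uniformly bounded random variables and their transforms by Lipschitz functions, whereas the consistency of $\hat{\varphi}_m(t, z)$ depends also on the magnitudes of $\pi_{1,m}$ and $\mu_i$’s that affect how accurate the Oracle is when $t$ is large.

Lemma 2 Assume $\hat{\mathcal{F}}$ to be separable and let

$$a(t; \mu_0) = 1_{\{t \neq 0\}} \frac{1}{t} \int_{[-t,t]} \frac{dy}{r_{\mu_0}(y)} + 2 \times 1_{\{t = 0\}}.$$

If $\{z_i\}_{i=1}^m$ are independent, then, with probability at least $1 - 2\exp\left( -2^{-1} \lambda^2 \right)$,

$$\left| \hat{\varphi}_m(t, z) - \varphi_m(t, \mu) \right| \le \frac{\lambda \|\omega\|_\infty a(t; \mu_0)}{\sqrt{m}}. \tag{18}$$

If there are positive sequences $\{u_m\}_{m \ge 1}$, $\{\lambda_m\}_{m \ge 1}$ and $\{t_m\}_{m \ge 1}$ such that

$$\lim_{m \to \infty} \sup\left\{ \psi(t, \mu; \mu_0) : (t, |\mu|) \in [t_m, \infty) \times [u_m, \infty) \right\} = 0 \tag{19}$$

and

$$\lim_{m \to \infty} \frac{1}{\pi_{1,m}} \frac{\lambda_m \|\omega\|_\infty a(t_m; \mu_0)}{\sqrt{m}} = 0 \quad \text{and} \quad \lim_{m \to \infty} \exp\left( -2^{-1} \lambda_m^2 \right) = 0, \tag{20}$$

then

$$\Pr\left( \left| \pi_{1,m}^{-1} \hat{\varphi}_m(t_m, z) - 1 \right| \to 0 \right) \to 1 \tag{21}$$

whenever $\{\mu_i : i \in I_{1,m}\} \subseteq [u_m, \infty)$.

Proof. Recall

$$K(t, x; \mu_0) = \int_{[-1,1]} \frac{\omega(s) \cos\left( tsx - h_{\mu_0}(ts) \right)}{r_{\mu_0}(ts)} \, ds.$$

Set $w_i(y) = \cos\left( y z_i - h_{\mu_0}(y) \right)$ for each $i$ and $y \in \mathbb{R}$ and define

$$S_m(y) = \frac{1}{m} \sum_{i=1}^{m} \left( w_i(y) - \mathbb{E}[w_i(y)] \right). \tag{22}$$

Then, by (1), (2) and (3),

$$\hat{\varphi}_m(t, z) - \varphi_m(t, \mu) = -\frac{1}{m} \sum_{i=1}^{m} \left( K(t, z_i; \mu_0) - \mathbb{E}[K(t, z_i; \mu_0)] \right) = -\int_{[-1,1]} \frac{\omega(s)}{r_{\mu_0}(ts)} S_m(ts) \, ds.$$

Since $|w_i(ts)| \le 1$ uniformly in $(t, s, z_i, i)$, Hoeffding inequality of Hoeffding (1963) implies

$$\Pr\left( |S_m(ts)| \ge \frac{\lambda}{\sqrt{m}} \right) \le 2 \exp\left( -2^{-1} \lambda^2 \right) \quad \text{for any } \lambda > 0 \tag{23}$$

uniformly in $(t, s, m) \in \mathbb{R} \times [-1, 1] \times \mathbb{N}_+$. Therefore,

$$\left| \hat{\varphi}_m(t, z) - \varphi_m(t, \mu) \right| \le \frac{\lambda \|\omega\|_\infty}{\sqrt{m}} \int_{[-1,1]} \frac{1}{r_{\mu_0}(ts)} \, ds = \frac{\lambda \|\omega\|_\infty}{\sqrt{m}} a(t; \mu_0)$$

with probability at least $1 - 2\exp\left( -2^{-1} \lambda^2 \right)$, i.e., (18) holds.

Consider the second claim. With probability at least $1 - 2\exp\left( -2^{-1} \lambda_m^2 \right)$,

$$\left| \frac{\hat{\varphi}_m(t_m, z)}{\pi_{1,m}} - 1 \right| \le \left| \frac{\hat{\varphi}_m(t_m, z) - \varphi_m(t_m, \mu)}{\pi_{1,m}} \right| + \left| \frac{\varphi_m(t_m, \mu)}{\pi_{1,m}} - 1 \right|$$

$$\le \frac{1}{\pi_{1,m}} \frac{\lambda_m \|\omega\|_\infty a(t_m; \mu_0)}{\sqrt{m}} + \frac{1}{m \pi_{1,m}} \sum_{j \in I_{1,m}} \left| \psi(t_m, \mu_j; \mu_0) \right|$$

$$\le \frac{1}{\pi_{1,m}} \frac{\lambda_m \|\omega\|_\infty a(t_m; \mu_0)}{\sqrt{m}} + \sup_{(t, |\mu|) \in [t_m, \infty) \times [u_m, \infty)} \left| \psi(t, \mu; \mu_0) \right|. \tag{24}$$

However, assumptions (19) and (20) imply that both $\exp\left( -2^{-1} \lambda_m^2 \right)$ and the upper bound in (24) converge to 0. So, (21) holds.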
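The role of the reciprocal modulus in the bound (18) can be seen by evaluating $a(t; \mu_0)$ for a concrete family. The following is a minimal sketch for the Gaussian family, for which $1 / r_{\mu_0}(y) = \exp(\sigma^2 y^2 / 2)$. The function names, the Simpson quadrature and the default $\|\omega\|_\infty = 1$ (the value for a triangular density) are assumptions made for illustration.

```python
import math

def a_gauss(t, sigma=1.0, n=400):
    """a(t; mu0) = (1/t) * int_{[-t,t]} dy / r_{mu0}(y) for the Gaussian family,
    where 1 / r_{mu0}(y) = exp(sigma^2 y^2 / 2); a(0; mu0) = 2 by definition."""
    if t == 0.0:
        return 2.0
    t = abs(t)
    h = t / n
    total = 0.0
    for k in range(n + 1):
        coef = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += coef * math.exp(0.5 * (sigma * k * h) ** 2)
    # The integrand is even: integrate over [0, t], double, then divide by t.
    return 2.0 * (total * h / 3.0) / t

def deviation_bound(t, m, lam, omega_sup=1.0):
    """Right-hand side of (18) and the probability 1 - 2 exp(-lam^2 / 2)."""
    bound = lam * omega_sup * a_gauss(t) / math.sqrt(m)
    prob = 1.0 - 2.0 * math.exp(-0.5 * lam * lam)
    return bound, prob

if __name__ == "__main__":
    for t in (0.0, 1.0, 2.0, 3.0):
        print(t, a_gauss(t), deviation_bound(t, m=10000, lam=3.0))
```

For the Gaussian family $a(t; \mu_0)$ grows roughly like $\exp(t^2/2)$, so for fixed $m$ the bound in (18) deteriorates quickly as $t$ increases; this is why $t_m$ has to grow slowly with $m$.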

Lemma 2 captures the key ingredients needed for and the essence of proving the consistency of a proportion estimator based on the Strategy of approximating indicator functions. However, without further information on the phases, which is true for separable CFs in general, the subtle impact of the modulus and means (or medians) on the consistency of an estimator is largely inaccessible. This is why Lemma 2 says nothing about the uniform consistency of an estimator. In contrast, we will see that conditions (18), (19) and (20) take much less implicit forms for location-shift families, and that for these families uniform consistency can be achieved. Define

$$B_m(\rho) = \left\{ \mu \in \mathbb{R}^m : m^{-1} \sum_{i=1}^{m} |\mu_i - \mu_0| \le \rho \right\} \quad \text{for some } \rho > 0$$

and $u_m = \min\{ |\mu_j - \mu_0| : \mu_j \neq \mu_0 \}$.

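Theorem 2 Assume that $\mathcal{F}$ is a location-shift family for which (6) holds and $\int |x|^2 \, dF_\mu(x) < \infty$ for each $\mu \in U$. If the following hold:

1. The derivative $\frac{d}{dy} h_{\mu_0}(y)$ is defined for each $y \in \mathbb{R}$ and $\sup_{y \in \mathbb{R}} \left| \frac{d}{dy} h_{\mu_0}(y) \right| = C_{\mu_0} < \infty$.

2. There are positive constants $\vartheta$ and $q$ and a positive, monotone increasing sequence $\{\gamma_m\}_{m \ge 1}$ such that

$$\lim_{m \to \infty} \gamma_m = \infty \quad \text{and} \quad \lim_{m \to \infty} \frac{m^{\vartheta} \log \gamma_m}{R_m(\rho, \mu) \sqrt{m} \sqrt{2 q \gamma_m}} = \infty, \tag{25}$$

where $R_m(\rho, \mu) = \int |x| \, dF_{\mu_0}(x) + 2\rho + 2C_{\mu_0}$.

Then, for all sufficiently large $m$,

$$\sup_{\mu \in B_m(\rho)} \sup_{t \in [0, \gamma_m]} \left| \hat{\varphi}_m(t, z) - \varphi_m(t, \mu) \right| \le \Upsilon(q, \gamma_m, r_{\mu_0}) \tag{26}$$

with probability at least $1 - p_m(\vartheta, q, h_{\mu_0}, \gamma_m)$, where

$$p_m(\vartheta, q, h_{\mu_0}, \gamma_m) = 2 m^{\vartheta} \gamma_m^2 \exp(-q \gamma_m) + 4 A_{\mu_0} q \gamma_m m^{-2\vartheta} (\log \gamma_m)^{-2}, \tag{27}$$

$$\Upsilon(q, \gamma_m, r_{\mu_0}) = \frac{2 \|\omega\|_\infty \sqrt{2 q \gamma_m}}{\sqrt{m}} \sup_{t \in [0, \gamma_m]} \int_{[0,1]} \frac{1}{r_{\mu_0}(ts)} \, ds \tag{28}$$

and $A_{\mu_0}$ is the variance of $|X|$ for $X$ that has CDF $F_{\mu_0}$.

Therefore, if $\gamma_m u_m \to \infty$, $p_m(\vartheta, q, h_{\mu_0}, \gamma_m) \to 0$ and $\pi_{1,m}^{-1} \Upsilon(q, \gamma_m, r_{\mu_0}) \to 0$, then

$$\Pr\left( \sup_{\mu \in B_m(\rho)} \left| \pi_{1,m}^{-1} \sup_{t \in [0, \gamma_m]} \hat{\varphi}_m(t, z) - 1 \right| \to 0 \right) \to 1. \tag{29}$$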

Proof. The strategy of proof adapts that for Lemma 7.2 of Jin (2006) for Gaussian family, which can be regarded as an application of the “chaining method” proposed by Talagrand (1996). Since $r_{\mu_0}$ has no real zeros and $\mathcal{F}$ is a location-shift family, $r_\mu$ has no real zeros for each $\mu \neq \mu_0$ and $h_\mu(t)$ is well-defined and continuous in $t$ on $\mathbb{R}$ for each $\mu \in U$. Therefore, $\frac{d}{dy} h_{\mu_0}(y)$ can be defined.

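Recall $w_i(y) = \cos\left( y z_i - h_{\mu_0}(y) \right)$ and $S_m(y)$ defined by (22). Let $\hat{s}_m(y) = \frac{1}{m} \sum_{i=1}^{m} w_i(y)$ and $s_m(y) = \mathbb{E}[\hat{s}_m(y)]$. Define the closed interval $G_m = [0, \gamma_m]$, and let $\mathcal{P} = \{t_1, \ldots, t_{l_*}\}$ for some $l_* \in \mathbb{N}_+$ with $t_j < t_{j+1}$ be a partition of $G_m$ with norm $\Delta = \max_{1 \le j \le l_* - 1} |t_{j+1} - t_j|$ such that $\Delta = m^{-\vartheta}$. For each $y \in G_m$, pick $t_j \in \mathcal{P}$ that is the closest to $y$. By Lagrange mean value theorem,

$$\left| \hat{s}_m(y) - s_m(y) \right| \le \left| \hat{s}_m(t_j) - s_m(t_j) \right| + \left| \left( \hat{s}_m(y) - \hat{s}_m(t_j) \right) - \left( s_m(y) - s_m(t_j) \right) \right|$$

$$\le \left| \hat{s}_m(t_j) - s_m(t_j) \right| + \Delta \sup_{y \in \mathbb{R}} \left| \partial_y \left( \hat{s}_m(y) - s_m(y) \right) \right|,$$

where $\partial_\cdot$ denotes the derivative with respect to the subscript. So,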

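$$B_0 = \Pr\left( \max_{y \in G_m} |S_m(y)| \ge \frac{\sqrt{2 q \gamma_m}}{\sqrt{m}} \right) \le B_1 + B_2, \tag{30}$$

where

$$B_1 = \Pr\left( \max_{1 \le j \le l_*} \left| \hat{s}_m(t_j) - s_m(t_j) \right| \ge \frac{\sqrt{2 q \gamma_m} - (2 q \gamma_m)^{-1/2} \log \gamma_m}{\sqrt{m}} \right)$$

and

$$B_2 = \Pr\left( \sup_{y \in \mathbb{R}} \left| \partial_y \left( \hat{s}_m(y) - s_m(y) \right) \right| \ge \frac{\Delta^{-1} (2 q \gamma_m)^{-1/2} \log \gamma_m}{\sqrt{m}} \right).$$

Applying to $B_1$ the union bound and Hoeffding inequality (23) gives

$$B_1 \le 2 l_* \exp\left( -q \gamma_m + \log \gamma_m \right) \exp\left( -\frac{(\log \gamma_m)^2}{4 q \gamma_m} \right) \le 2 m^{\vartheta} \gamma_m^2 \exp(-q \gamma_m). \tag{31}$$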

(31)

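On the other hand, $\partial_y w_i(y) = -(z_i - \partial_y h_{\mu_0}(y))\sin(yz_i - h_{\mu_0}(y))$, and
\[
\partial_y E[w_i(y)] = E[\partial_y w_i(y)] = -E\left[(z_i - \partial_y h_{\mu_0}(y))\sin(yz_i - h_{\mu_0}(y))\right]
\]
holds since $\int |x|^2\,dF_\mu(x) < \infty$ for each $\mu\in U$ and $\sup_{y\in\mathbb{R}}\left|\frac{d}{dy}h_{\mu_0}(y)\right| = C_{\mu_0} < \infty$. So,
\[
\sup_{y\in\mathbb{R}}\left|\partial_y(\hat s_m(y) - s_m(y))\right| \le \frac{1}{m}\sum_{i=1}^m |z_i| + 2C_{\mu_0} + \frac{1}{m}\sum_{i=1}^m E[|z_i|]. \tag{32}
\]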

However, since $\mathcal{F}$ is a location-shift family, there exist independent and identically distributed (i.i.d.) $\{X_i\}_{i=1}^m$ with common CDF $F_{\mu_0}$ such that $z_i = (\mu_i - \mu_0) + X_i$ for $1\le i\le m$. Therefore, the upper bound in (32) is upper bounded by
\[
\begin{aligned}
&\frac{1}{m}\sum_{i=1}^m |X_i| + \frac{2}{m}\sum_{i=1}^m |\mu_i - \mu_0| + 2C_{\mu_0} + \frac{1}{m}\sum_{i=1}^m E[|X_i|] \\
&\qquad\le \frac{1}{m}\sum_{i=1}^m \left(|X_i| - E[|X_i|]\right) + E[|X_1|] + 2\rho + 2C_{\mu_0}.
\end{aligned}
\]
Namely,
\[
\sup_{y\in\mathbb{R}}\left|\partial_y(\hat s_m(y) - s_m(y))\right| \le \frac{1}{m}\sum_{i=1}^m \left(|X_i| - E[|X_i|]\right) + E[|X_1|] + 2\rho + 2C_{\mu_0}.
\]

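Recall $R_m(\rho,\boldsymbol{\mu}) = E[|X_1|] + 2\rho + 2C_{\mu_0}$. Since
\[
\lim_{m\to\infty}\frac{\Delta^{-1}(2q\gamma_m)^{-1/2}\log\gamma_m}{R_m(\rho,\boldsymbol{\mu})\sqrt{m}} = \infty,
\]
Chebyshev inequality implies
\[
B_{2,1} = \Pr\left(\frac{1}{m}\sum_{i=1}^m\left(|X_i| - E[|X_i|]\right) \ge \frac{\Delta^{-1}(2q\gamma_m)^{-1/2}\log\gamma_m}{\sqrt{m}} - 2R_m(\rho,\boldsymbol{\mu})\right) \le 4A_{\mu_0}q\gamma_m m^{-2\vartheta}(\log\gamma_m)^{-2}
\]
for all $m$ large enough, where $A_{\mu_0}$ is the variance of $|X_1|$. Thus, for all $m$ large enough,
\[
B_2 \le B_{2,1} \le 4A_{\mu_0}q\gamma_m m^{-2\vartheta}(\log\gamma_m)^{-2},
\]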

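which, together with (31) and (30) and the continuity of $\hat s_m(y) - s_m(y)$ in $y$, implies
\[
B_0 = \Pr\left(\sup_{y\in G_m}|\hat s_m(y) - s_m(y)| \ge \frac{\sqrt{2q\gamma_m}}{\sqrt{m}}\right) \le p_m(\vartheta,q,h_{\mu_0},\gamma_m),
\]
where
\[
p_m(\vartheta,q,h_{\mu_0},\gamma_m) = 2m^{\vartheta}\gamma_m^{2}\exp(-q\gamma_m) + 4A_{\mu_0}q\gamma_m m^{-2\vartheta}(\log\gamma_m)^{-2}.
\]
Thus, with probability at least $1 - p_m(\vartheta,q,h_{\mu_0},\gamma_m)$,
\[
\sup_{\boldsymbol{\mu}\in B_m(\rho)}\sup_{t\in G_m}\left|\hat\varphi_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\mu})\right| \le \sup_{\boldsymbol{\mu}\in B_m(\rho)}\sup_{t\in G_m}\int_{[-1,1]}\omega(s)\,\frac{\sup_{t\in G_m}|S_m(ts)|}{r_{\mu_0}(ts)}\,ds \le \Upsilon(q,\gamma_m,r_{\mu_0})
\]
for all sufficiently large $m$, where
\[
\Upsilon(q,\gamma_m,r_{\mu_0}) = \frac{2\|\omega\|_\infty\sqrt{2q\gamma_m}}{\sqrt{m}}\ \sup_{t\in G_m}\int_{[0,1]}\frac{1}{r_{\mu_0}(ts)}\,ds.
\]
Since $\vartheta$, $q$ and $\gamma_m$ are such that $p_m(\vartheta,q,h_{\mu_0},\gamma_m)\to 0$, we have $\lim_{m\to\infty}B_0 = 0$.

Finally, recall
\[
\psi(t_m,\mu_i;\mu_0) = \int_{[-1,1]}\omega(s)\cos(t_m s(\mu_i - \mu_0))\,ds,
\]
for which
\[
\lim_{m\to\infty}\sup\left\{\psi(t,\mu;\mu_0) : (t,|\mu|)\in[\gamma_m,\infty)\times[u_m,\infty)\right\} = 0
\]
since $\gamma_m|\mu_i - \mu_0| \ge \gamma_m|u_m - \mu_0| \to \infty$ uniformly for $i\in I_{1,m}$. So, when $\pi_{1,m}^{-1}\Upsilon(q,\gamma_m,r_{\mu_0})\to 0$, the same reasoning used to prove (21) implies (29).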

Four implications of Theorem 2 are ready to be stated. Firstly, Theorem 2 does not apply to location-shift families, e.g., the Cauchy family, that do not have first-order absolute moments. Secondly, Theorem 2 potentially allows many choices of $\vartheta$, $q$ and $\gamma_m$ for an $\mathcal{F}$, and among all feasible choices we should pick one for which both $\Upsilon(q,\gamma_m,r_{\mu_0})$ and $p_m(\vartheta,q,h_{\mu_0},\gamma_m)$ converge to 0 the fastest and $\gamma_m\to\infty$ the fastest, so that the estimator captures the Oracle the quickest. Thirdly, compared to Theorems 1.4 and 1.5 of Jin (2008), where $B_m(\rho)$ is for a fixed $\rho$, we allow $B_m(\rho)$ to change as $\rho$ does, as long as $R_m(\rho,\boldsymbol{\mu})$ is of smaller order than $m^{\vartheta-1/2}(2q\gamma_m)^{-1/2}\log\gamma_m$; see condition (25). In fact, the proof of Theorem 2 shows that both $\Upsilon(q,\gamma_m,r_{\mu_0})$ and $p_m(\vartheta,q,h_{\mu_0},\gamma_m)$ are non-increasing as $B_m(\rho)$ increases. Since both $B_m(\rho)$ and $R_m(\rho,\boldsymbol{\mu})$ are non-decreasing in $\rho$, we immediately see that it is harder to consistently estimate $\pi_{1,m}$ when $B_m(\rho)$ is for a fixed $\rho$ than for an increasing $\rho$. In other words, consistent estimation becomes easier when on average the $|\mu_i - \mu_0|$'s become larger. Fourthly, the bound (26) together with (28) implies that, when other things are kept fixed, the larger $r_{\mu_0}^{-1}$ is, the slower $e_m$ converges to 0 as $m\to\infty$ and the less stable the estimator is. This has been observed by Jin (2008) for the Gaussian and Laplace families, since $r_{\mu_0}^{-1}(t)$ for the former is much larger than that for the latter when $t$ is large.

To characterize the settings under which an estimator is consistent, we introduce the following definition:

Definition 3 Given a family $\mathcal{F}$, the sequence of sets $Q_m(\boldsymbol{\mu},t;\mathcal{F})\subseteq\mathbb{R}^m\times\mathbb{R}$ for each $m\in\mathbb{N}_+$ is called a “uniform consistency class” for the estimator $\hat\varphi_m(t,\mathbf{z})$ if
\[
\Pr\left(\sup_{\boldsymbol{\mu}\in Q_m(\boldsymbol{\mu},t;\mathcal{F})}\left|\pi_{1,m}^{-1}\sup_{t\in Q_m(\boldsymbol{\mu},t;\mathcal{F})}\hat\varphi_m(t,\mathbf{z}) - 1\right|\to 0\right)\to 1. \tag{33}
\]
If (33) holds and each $Q_m(\boldsymbol{\mu},t;\mathcal{F})$ contains a connected subset $G_m\subseteq\mathbb{R}$ such that $\lim_{m\to\infty}\nu(G_m) = \infty$, then $\hat\varphi_m(t,\mathbf{z})$ is said to be “uniformly consistent in frequency domain”.

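In Theorem 2, we can set $\gamma_m = \log m$ and $u_m = \frac{\log\log m}{\log m}$, pick positive $\vartheta$ and $q$ such that $q > \vartheta > 2^{-1}$, and set $R_m(\boldsymbol{\mu}) = O(m^{\vartheta'})$ for any $0\le\vartheta' < \vartheta - 1/2$, so that $\gamma_m u_m\to\infty$, $p_m(\vartheta,q,h_{\mu_0},\gamma_m)\to 0$, $m^{\vartheta-1/2}\gamma_m^{-1/2}\log\gamma_m\to\infty$ and (25) holds, all as $m\to\infty$. Under these settings, to obtain a uniform consistency class $Q_m(\boldsymbol{\mu},t;\mathcal{F})$ we then only need to look at (28), which becomes
\[
\Upsilon(q,\log m,r_{\mu_0}) = \frac{C\sqrt{q\log m}}{\sqrt{m}}\ \sup_{t\in[0,\log m]}\int_{[0,1]}\frac{1}{r_{\mu_0}(ts)}\,ds. \tag{34}
\]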

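It is immediately clear that $r_{\mu_0}$ determines $Q_m(\boldsymbol{\mu},t;\mathcal{F})$ and the magnitude of $\pi_{1,m}$ for which $\hat\varphi_m(t,\mathbf{z})$ can be consistent. Thus, for location-shift families
\[
Q_m(\boldsymbol{\mu},t;\mathcal{F}) = \left\{\begin{array}{l}
q > \vartheta > 2^{-1},\ 0\le\vartheta' < \vartheta - 1/2,\ \gamma_m = \log m,\ u_m = \frac{\log\log m}{\log m},\\[2pt]
t\in[0,\gamma_m],\ R_m(\boldsymbol{\mu}) = O(m^{\vartheta'}),\ \lim_{m\to\infty}\pi_{1,m}^{-1}\Upsilon(q,\gamma_m,r_{\mu_0}) = 0
\end{array}\right\}, \tag{35}
\]
and $Q_m(\boldsymbol{\mu},t;\mathcal{F})$ is data-adaptive and depends only on $u_m = \min_{i\in I_{1,m}}|\mu_i - \mu_0|$; see also Corollary 3. If $F_\mu$ has a density with respect to $\nu$, then $\lim_{t\to\infty}r_\mu(t) = 0$ must hold, which forces (34) to give
\[
\Upsilon(q,\log m,r_{\mu_0}) \ge \frac{C\sqrt{q\log m}}{\sqrt{m}}\quad\text{as } m\to\infty. \tag{36}
\]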

On the other hand, Hoeffding inequality is optimal for independent, uniformly almost surely bounded random variables. So, Theorem 2, (35) and (36) together imply that when the Strategy is applied to location-shift families with absolutely continuous CDFs, a uniform consistency class is unlikely to contain any $\pi_{1,m}\in(0,m^{-0.5}]$.
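To make the trade-off between the two terms of (27) concrete, the following minimal sketch evaluates $p_m$ and the leading factor of (34) for the choice $\gamma_m = \log m$. The function names and the values $\vartheta = 0.6$, $q = 1$, $A_{\mu_0} = 1$ and $C = 1$ are illustrative assumptions, not quantities fixed by the text.

```python
import math

def p_m(m, vartheta, q, A):
    """Probability bound (27) with gamma_m = log m (illustrative choice)."""
    gamma = math.log(m)
    # Term from the union bound plus Hoeffding inequality, cf. (31).
    chaining = 2.0 * m**vartheta * gamma**2 * math.exp(-q * gamma)
    # Term from Chebyshev inequality applied to the derivative bound.
    chebyshev = 4.0 * A * q * gamma * m**(-2.0 * vartheta) / math.log(gamma)**2
    return chaining + chebyshev

def upsilon_prefactor(m, q, C=1.0):
    """Leading factor C*sqrt(q log m)/sqrt(m) of (34); C is a placeholder."""
    return C * math.sqrt(q * math.log(m)) / math.sqrt(m)

# Hypothetical settings with q > vartheta > 1/2.
values = [p_m(m, 0.6, 1.0, 1.0) for m in (1e4, 1e8, 1e16)]
```

Under these assumed settings the bound decreases in $m$ but only slowly, since the first term behaves like $2m^{\vartheta-q}(\log m)^2$.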

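Recall $\Lambda_m(\boldsymbol{\mu}) = \lim_{t\to\infty}\varphi_m(t,\boldsymbol{\mu}) = \pi_{1,m}$ for all $m$ and $\boldsymbol{\mu}$. So, $\hat\varphi_m(t,\mathbf{z})$ with a finite $t$ can rarely be consistent, and a consistent $\hat\varphi_m(t,\mathbf{z})$ often employs a positive sequence $\{t_m : m\ge 1\}$ such that $\lim_{m\to\infty}t_m = \infty$ and $\hat\varphi_m(t,\mathbf{z}) = \hat\varphi_m(t_m,\mathbf{z})$. Therefore, for a consistent $\hat\varphi_m(t_m,\mathbf{z})$, the intrinsic speed at which $\lim_{m\to\infty}|\hat\varphi_m(t_m,\mathbf{z}) - \varphi_m(t_m,\boldsymbol{\mu})| = 0$ is better represented by $t_m$, and we will use $t_m$ as the “speed of convergence” of $\hat\varphi_m(t_m,\mathbf{z})$. To this end, we introduce: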

Definition 4 A non-negative sequence $\{b'_m : m\ge 1\}$ satisfying $\lim_{m\to\infty}b'_m = \infty$ is called “sub-log” if $\limsup_{m\to\infty}b'_m(\log m)^{-\alpha'} = 0$ for any $\alpha' > 0$, “sup-log” if $\liminf_{m\to\infty}b'_m(\log m)^{-\alpha'} = \infty$ for any $\alpha' > 0$, and “poly-log” if $0 < \inf_{m\ge 1}b'_m(\log m)^{-\alpha'} \le \sup_{m\ge 1}b'_m(\log m)^{-\alpha'} < \infty$ for some $\alpha' > 0$.

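Let $\mathcal{L}$ be the set of location-shift families none of whose CFs has real zeros, and consider two uniform consistency classes $Q_m(\boldsymbol{\mu},t;\mathcal{F})$ and $Q_m(\boldsymbol{\mu},t;\tilde{\mathcal{F}})$ respectively for two $\mathcal{F},\tilde{\mathcal{F}}\in\mathcal{L}$. Then it may hold that
\[
r_{\mu_0}(t)\le\tilde r_{\mu_0}(t)\ \text{for all large } t > 0 \implies Q_m(\boldsymbol{\mu},t;\mathcal{F})\supseteq Q_m(\boldsymbol{\mu},t;\tilde{\mathcal{F}}) \tag{37}
\]
when other aspects are kept fixed, where $r_{\mu_0}$ and $\tilde r_{\mu_0}$ are respectively the moduli of the CFs of $F_{\mu_0}\in\mathcal{F}$ and $\tilde F_{\mu_0}\in\tilde{\mathcal{F}}$. With these preparations, we have: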

Corollary 3 Let $\mathcal{F}$ and $\tilde{\mathcal{F}}$ be two location-shift families that are determined by the same parameters and for which neither $r_{\mu_0}$ nor $\tilde r_{\mu_0}$ has any real zeros. With the same estimator $\hat\varphi_m(t,\mathbf{z})$ under the assumptions stated in Theorem 2, the assertion (37) holds. Further, the following hold:

1. For Gaussian family with $\sigma = 1$: $q > 3/2$, $\vartheta = q/3$, $\gamma_m = \sqrt{2\gamma\log m}$ for a fixed $0 < \gamma\le 0.5$ can be set, and
\[
Q_m(\boldsymbol{\mu},t;\mathcal{F}) = \left\{\boldsymbol{\mu}\in B_m(\rho),\ \min_{i\in I_{1,m}}|\mu_i - \mu_0|\ge\frac{\log\log m}{\sqrt{2\log m}},\ \pi_{1,m}\ge m^{\gamma-0.5},\ t\in[0,\gamma_m]\right\} \tag{38}
\]
for a fixed $\rho > 0$. The fastest speed of convergence is $\gamma_m = \sqrt{2\log m}$ (i.e., “poly-log” speed), achieved when $\liminf_{m\to\infty}\pi_{1,m} > 0$.

2. For Laplace, Hyperbolic Secant and Logistic families with $\sigma = 1$, a uniform consistency class is also (38). However, for Laplace family the fastest speed of convergence is $\gamma_m = o(m^{1/5})$ (i.e., “sup-log” speed), that for Hyperbolic Secant family is $\gamma_m = 2^{-1}\log m$ (i.e., “poly-log” speed), and that for Logistic family is $\gamma_m = (2\pi)^{-1}\log m$ (i.e., “poly-log” speed), all achieved when $\liminf_{m\to\infty}\pi_{1,m} > 0$.

Proof. The first claim is on (37) and is straightforward in view of (26), (28) and (27). Now we show the rest. Recall $p_m(\vartheta,q,h_{\mu_0},\gamma_m)$ from (27) with positive $\vartheta$ and $q$. Then, $\gamma_m = O(m^{2\vartheta})$ ensures $4A_{\mu_0}q\gamma_m m^{-2\vartheta}(\log\gamma_m)^{-2}\to 0$ and $\gamma_m = o(m^{\vartheta/2})$ does $2m^{\vartheta}\gamma_m^{2}\exp(-q\gamma_m)\to 0$, as long as $\gamma_m\to\infty$. So,
\[
p_m(\vartheta,q,h_{\mu_0},\gamma_m)\to 0\quad\text{when } \gamma_m = o(m^{\vartheta/2}) \text{ and } \gamma_m\to\infty.
\]

For the claim for Gaussian family, the uniform consistency class is just Theorem 4 and Theorem 5 of Jin (2008). Further, the proof of Theorem 4 of Jin (2008) shows
\[
\sup_{t\in[0,\gamma_m]}\int_{[0,1]}r_{\mu_0}^{-1}(ts)\,ds \le C\,\frac{\exp(2^{-1}\gamma_m^{2})}{\gamma_m}.
\]
So, the fastest speed of convergence is $\gamma_m = \sqrt{2\log m}$, achieved when $\liminf_{m\to\infty}\pi_{1,m} > 0$.

The first part of the third claim follows from the first claim. Now we show the rest of the third claim. For the Laplace family, $r_{\mu_0}^{-1}(t) = 1 + \sigma^2t^2$. When $\sigma = 1$,
\[
\sup_{t\in[0,\gamma_m]}\int_{[0,1]}r_{\mu_0}^{-1}(ts)\,ds = \int_{[0,1]}\left(1 + \gamma_m^{2}s^{2}\right)ds = 1 + \frac{\gamma_m^{2}}{3}.
\]
So,
\[
\Upsilon(q,\gamma_m,r_{\mu_0}) = \frac{2\|\omega\|_\infty\sqrt{2q\gamma_m}}{\sqrt{m}}\left(1 + \frac{\gamma_m^{2}}{3}\right),
\]
and $\gamma_m = o(m^{1/5})$ ensures $\Upsilon(q,\gamma_m,r_{\mu_0})\to 0$. So, the fastest speed of convergence is $\gamma_m = o(m^{1/5})$, achieved when $\liminf_{m\to\infty}\pi_{1,m} > 0$.

For Hyperbolic Secant family, $r_{\mu_0}^{-1}(t) = \sigma\cosh(t\sigma^{-1})\sim 2^{-1}\sigma\exp(\sigma^{-1}t)$ as $t\to\infty$. When $\sigma = 1$,
\[
\sup_{t\in[0,\gamma_m]}\int_{[0,1]}r_{\mu_0}^{-1}(ts)\,ds \le 2\int_{[0,1]}\exp(s\gamma_m)\,ds \le \frac{2\exp(\gamma_m)}{\gamma_m}
\]
and
\[
\lim_{m\to\infty}\Upsilon(q,\gamma_m,r_{\mu_0}) \le \frac{C\sqrt{2q\gamma_m}}{\sqrt{m}}\,\frac{\exp(\gamma_m)}{\gamma_m}.
\]
Since $\exp(\gamma_m) = O(\sqrt{m})$ ensures $\Upsilon(q,\gamma_m,r_{\mu_0})\to 0$, the fastest speed of convergence is $\gamma_m = 2^{-1}\log m$, achieved when $\liminf_{m\to\infty}\pi_{1,m} > 0$.

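For Logistic family, $r_{\mu_0}^{-1}(t) = (\pi\sigma t)^{-1}\sinh(\pi\sigma t)\sim(2\pi\sigma t)^{-1}e^{\pi\sigma t}$ as $t\to\infty$. When $\sigma = 1$ and $\gamma_m$ is sufficiently large,
\[
\sup_{t\in[0,\gamma_m]}\int_{[0,1]}r_{\mu_0}^{-1}(ts)\,ds \le 2\int_{[0,1]}e^{\pi s\gamma_m}\,ds \le 4\,\frac{\exp(\pi\gamma_m)}{\pi\gamma_m}
\]
and
\[
\lim_{m\to\infty}\Upsilon(q,\gamma_m,r_{\mu_0}) \le \frac{C\sqrt{\gamma_m}}{\sqrt{m}}\,\frac{\exp(\pi\gamma_m)}{\pi\gamma_m}.
\]
Since $\exp(\pi\gamma_m) = O(\sqrt{m})$ ensures $\Upsilon(q,\gamma_m,r_{\mu_0})\to 0$, the fastest speed of convergence is $\gamma_m = (2\pi)^{-1}\log m$, achieved when $\liminf_{m\to\infty}\pi_{1,m} > 0$.

We remark that the conclusions of Corollary 3 hold for Laplace family with any fixed $\sigma > 0$, Hyperbolic Secant family with any fixed $\sigma > 1$, and Logistic family with any fixed $0 < \sigma < 1$.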
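The Laplace computation in the proof of Corollary 3 can be checked numerically. The sketch below is a minimal illustration that approximates the inner integral by a midpoint rule and evaluates $\Upsilon$ along $\gamma_m = m^{0.15}$. The function names, the grid size, the exponent 0.15 and the value 1 used for $\|\omega\|_\infty$ are assumptions made only for this example.

```python
import math

def sup_integral_laplace(gamma, n_grid=2000):
    """Midpoint-rule value of sup_{t in [0, gamma]} int_0^1 (1 + t^2 s^2) ds.

    The integrand is increasing in t, so the supremum is attained at t = gamma.
    """
    h = 1.0 / n_grid
    return sum(1.0 + (gamma * (j + 0.5) * h) ** 2 for j in range(n_grid)) * h

def upsilon_laplace(m, q, gamma, omega_sup=1.0):
    """Upsilon from (28) for the Laplace family with sigma = 1.

    omega_sup stands in for ||omega||_inf (placeholder value).
    """
    prefactor = 2.0 * omega_sup * math.sqrt(2.0 * q * gamma) / math.sqrt(m)
    return prefactor * (1.0 + gamma**2 / 3.0)

# gamma_m = m^0.15 is o(m^(1/5)), so Upsilon should shrink as m grows.
small = upsilon_laplace(1e6, 1.0, 1e6**0.15)
large = upsilon_laplace(1e12, 1.0, 1e12**0.15)
```

Since $\Upsilon$ is of order $\gamma_m^{5/2}m^{-1/2}$ here, any polynomial rate below $m^{1/5}$ drives it to zero, which is the “sup-log” speed stated for the Laplace family.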

4 Construction for non-separable characteristic functions

When the random variables $\{z_i\}_{i=1}^m$ do not have separable CFs, Construction I in Section 3 cannot be used. In particular, outside location-shift families, the translation-convolution equivalence does not hold and Hoeffding inequality is no longer applicable. This makes the construction of a proportion estimator using the Strategy much more challenging. So, we will restrict our attention to $\{z_i\}_{i=1}^m$ whose CDFs do not have separable CFs but belong to NEFs whose mean and variance are functionally related. Specifically, we show that the Strategy is implementable for discrete NEFs with infinite supports or “separable moment functions” (see Definition 5). These include 8 of the total of 12 NEF-CVFs. The techniques of construction mainly use generating functions (GFs) and Mellin transform.

4.1 A brief review on natural exponential families

We provide a very brief review on NEF, whose details can be found in Letac (1992). Let $\beta$ be a positive Radon measure on $\mathbb{R}$ that is not concentrated on one point. Let $L(\theta) = \int e^{x\theta}\beta(dx)$ for $\theta\in\mathbb{R}$ be its Laplace transform and $\Theta$ be the maximal open set containing $\theta$ such that $L(\theta) < \infty$. Suppose $\Theta$ is not empty and let $\kappa(\theta) = \log L(\theta)$ be the cumulant function of $\beta$. Then
\[
\mathcal{F} = \{G_\theta : G_\theta(dx) = \exp(\theta x - \kappa(\theta))\,\beta(dx),\ \theta\in\Theta\}
\]
forms an NEF with respect to the basis $\beta$. Note that $\Theta$ has a non-empty interior if it is not empty and that $L$ is analytic on the strip $A_\Theta = \{z\in\mathbb{C} : \Re(z)\in\Theta\}$.

The NEF $\mathcal{F}$ can be equivalently characterized by its mean domain and variance function. Specifically, the mean function $\mu:\Theta\to U$ with $U = \mu(\Theta)$ is given by $\mu(\theta) = \frac{d}{d\theta}\kappa(\theta)$, and the variance function $V(\theta) = \frac{d^2}{d\theta^2}\kappa(\theta)$ can be parametrized by $\mu$ as
\[
V(\mu) = \int (x - \mu)^2 F_\mu(dx)\quad\text{for } \mu\in U,
\]
where $\theta = \theta(\mu)$ is the inverse function of $\mu$ and $F_\mu = G_{\theta(\mu)}$. Namely, $\mathcal{F} = \{F_\mu : \mu\in U\}$. The pair $(V,U)$ is called the variance function of $\mathcal{F}$, and it characterizes $\mathcal{F}$.
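As a small illustration of the relations $\mu(\theta) = \kappa'(\theta)$ and $V(\theta) = \kappa''(\theta)$, the sketch below differentiates the cumulant function of a Negative Binomial NEF numerically. The cumulant $\kappa(\theta) = -n\log(1 - e^{\theta})$ follows from $L(\theta) = (1 - e^{\theta})^{-n}$; the finite-difference step sizes and the function names are assumptions, and the closed form $V(\mu) = \mu + \mu^2/n$ is the standard variance function of this family rather than a formula quoted from the text.

```python
import math

def kappa(theta, n):
    """Cumulant function log L(theta) for L(theta) = (1 - e^theta)^(-n), theta < 0."""
    return -n * math.log(1.0 - math.exp(theta))

def mean_fd(theta, n, h=1e-5):
    """Central finite difference for mu(theta) = d kappa / d theta."""
    return (kappa(theta + h, n) - kappa(theta - h, n)) / (2.0 * h)

def variance_fd(theta, n, h=1e-4):
    """Second central difference for V(theta) = d^2 kappa / d theta^2."""
    return (kappa(theta + h, n) - 2.0 * kappa(theta, n) + kappa(theta - h, n)) / h**2

theta, n = -1.0, 3
mu_closed = n * math.exp(theta) / (1.0 - math.exp(theta))
var_closed = mu_closed + mu_closed**2 / n  # variance function in terms of mu
```

The agreement between the finite differences and the closed forms shows how the pair $(V,U)$ encodes the family through $\kappa$ alone.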

For the constructions of proportion estimators for NEFs, we will reuse the notation $K$ but take $K$ as a function of $t$ and $\theta\in\Theta$. Note that $K$ depends on $\theta_0$ but not on any $\theta\neq\theta_0$. Further, we will reuse the notation $\psi$ but take it as a function of $t$ and $\theta\in\Theta$. For an NEF, $\psi$ defined by (2) becomes
\[
\psi(t,\theta;\theta_0) = \int K(t,x;\theta_0)\,dG_\theta(x)\quad\text{for } G_\theta\in\mathcal{F}.
\]
Let $\boldsymbol{\theta} = (\theta_1,\ldots,\theta_m)$. Then accordingly $\hat\varphi_m(t,\mathbf{z}) = \frac{1}{m}\sum_{i=1}^m[1 - K(t,z_i;\theta_0)]$ and $\varphi_m(t,\boldsymbol{\theta}) = \frac{1}{m}\sum_{i=1}^m[1 - \psi(t,\theta_i;\theta_0)]$.

4.2 Construction II: discrete distributions with infinite supports

Suppose the basis $\beta$ for $\mathcal{F}$ is discrete with support $\mathbb{N}$, i.e., there exists a positive sequence $\{c_k\}_{k\ge 0}$ such that
\[
\beta = \sum_{k=0}^{\infty}c_k\delta_k. \tag{39}
\]
Then the power series $H(z) = \sum_{k=0}^{\infty}c_kz^k$ with $z\in\mathbb{C}$ must have a positive radius of convergence $R_H$, and $H$ is the generating function (GF) of $\beta$. Further, if $\beta$ is a probability measure, then $(-\infty,0]\subseteq\Theta$ and $R_H\ge 1$, and vice versa. The following approach, which we refer to as “Construction II”, provides the construction for discrete NEFs with support $\mathbb{N}$.

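Theorem 3 Let $\mathcal{F}$ be the NEF generated by $\beta$ in (39) and $\omega$ admissible. For $x\in\mathbb{N}$ and $t\in\mathbb{R}$ set
\[
K(t,x;\theta_0) = H(e^{\theta_0})\int_{[-1,1]}\frac{(ts)^x\cos\left(\frac{\pi x}{2} - tse^{\theta_0}\right)}{H^{(x)}(0)}\,\omega(s)\,ds. \tag{40}
\]
Then
\[
\psi(t,\theta;\theta_0) = \frac{H(e^{\theta_0})}{H(e^{\theta})}\int_{[-1,1]}\cos\left(st\left(e^{\theta} - e^{\theta_0}\right)\right)\omega(s)\,ds,
\]
$\psi(t,\theta_0;\theta_0) = 1$ for any $t$, and $\lim_{t\to\infty}\psi(t,\theta;\theta_0) = 0$ for each $\theta\neq\theta_0$.

Proof. Clearly, $L(\theta) = H(e^{\theta})$ and $c_k = \frac{H^{(k)}(0)}{k!}$, where $H^{(k)}$ is the $k$th order derivative of $H$ and $H^{(0)} = H$. Let
\[
K^{\dagger}(t,x;\theta_0) = H(e^{\theta_0})\int_{[-1,1]}\frac{(\iota ts)^x}{\exp(\iota tse^{\theta_0})\,H^{(x)}(0)}\,\omega(s)\,ds \tag{41}
\]
and $\psi^{\dagger}(t,\theta;\theta_0) = \int K^{\dagger}(t,x;\theta_0)\,dG_\theta(x)$. Then
\[
\begin{aligned}
\psi^{\dagger}(t,\theta;\theta_0) &= \int K^{\dagger}(t,x;\theta_0)\,dG_\theta(x) \\
&= \frac{H(e^{\theta_0})}{H(e^{\theta})}\int_{[-1,1]}\sum_{k=0}^{\infty}\frac{(\iota ts)^k e^{\theta k}c_k}{\exp(\iota tse^{\theta_0})\,H^{(k)}(0)}\,\omega(s)\,ds \\
&= \frac{H(e^{\theta_0})}{H(e^{\theta})}\int_{[-1,1]}\exp\left(\iota st\left(e^{\theta} - e^{\theta_0}\right)\right)\omega(s)\,ds,
\end{aligned}
\]
for which $\psi^{\dagger}(t,\theta_0;\theta_0) = 1$ for any $t$ and $\lim_{t\to\infty}\psi^{\dagger}(t,\theta;\theta_0) = 0$ for each $\theta\neq\theta_0$ by the RL Lemma. Taking the real parts of $K^{\dagger}$ and $\psi^{\dagger}$ yields the claim.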

In Theorem 3, $H^{(k)}(0) = c_kk!$ for $k\in\mathbb{N}$. So, if $\beta$ is known, then we can use $c_kk!$ instead of $H^{(k)}(0)$, whereas if $H$ is known and $H^{(k)}$ is easy to compute, we can use $H^{(k)}(0)$. This will greatly aid the numerical implementation of Construction II.
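A minimal numerical sketch of Construction II for the Poisson family is given below, where $H(z) = e^z$, $H^{(x)}(0) = 1$ and $e^{\theta} = \mu$. It assumes the triangular density $\omega(s) = 1 - |s|$ on $[-1,1]$ as the admissible $\omega$, uses a midpoint rule for the integral in (40), and truncates the Poisson sum at a hypothetical `k_max`. All function names are illustrative.

```python
import math

def omega(s):
    """Triangular density on [-1, 1]; assumed here to be an admissible omega."""
    return max(0.0, 1.0 - abs(s))

def K_poisson(t, x, mu0, n_grid=400):
    """Midpoint-rule evaluation of (40) for the Poisson family with e^{theta_0} = mu0."""
    h = 2.0 / n_grid
    total = 0.0
    for j in range(n_grid):
        s = -1.0 + (j + 0.5) * h
        total += (t * s) ** x * math.cos(0.5 * math.pi * x - t * s * mu0) * omega(s)
    return math.exp(mu0) * total * h

def psi_numeric(t, mu, mu0, k_max=60):
    """psi(t, theta; theta_0): integrate K against the Poisson(mu) probabilities."""
    pmf = math.exp(-mu)  # Poisson probability at k = 0
    total = 0.0
    for k in range(k_max + 1):
        total += K_poisson(t, k, mu0) * pmf
        pmf *= mu / (k + 1)  # recursion to the probability at k + 1
    return total

def psi_closed(t, mu, mu0):
    """Closed form from Theorem 3 for the triangular omega."""
    a = t * (mu - mu0)
    integral = 1.0 if a == 0.0 else 2.0 * (1.0 - math.cos(a)) / a**2
    return math.exp(mu0 - mu) * integral
```

For instance, `psi_numeric(2.0, 2.0, 1.0)` and `psi_closed(2.0, 2.0, 1.0)` should agree up to quadrature error, and `psi_numeric(2.0, 1.0, 1.0)` should be close to 1, in line with $\psi(t,\theta_0;\theta_0) = 1$.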

4.3 Construction II: sample examples

Theorem 3 covers the construction for Abel, Negative Binomial, Poisson, Strict Arcsine, Large Arcsine and Takács families, each of which is an NEF-CVF, has basis $\beta$ with support $\mathbb{N}$, is infinitely divisible such that $H(e^{\theta})\neq 0$ for each $\theta\in\Theta$, and has non-separable CFs; see Letac and Mora (1990) for details on these distributions. However, for each of Abel and Large Arcsine families, the corresponding GF is a composition of two analytic functions, and manually computing $H(e^{\theta_0})$ in the construction of $K$ in the statement of Theorem 3 may be cumbersome.

Example 6 Poisson family $\mathsf{Poisson}(\mu)$ with mean $\mu > 0$, for which $F_\mu(\{k\}) = \Pr(X = k) = \frac{\mu^k}{k!}e^{-\mu}$ for $k\in\mathbb{N}$. Clearly,
\[
\hat F_\mu(t) = \exp\left(\mu\left(e^{\iota t} - 1\right)\right) = \exp(\mu(\cos t - 1))\exp(\iota\mu\sin t).
\]
However, $\{\hat F_\mu : \mu > 0\}$ are not separable since when $\mu = \mu_0 + 1$ and $t > 0$,
\[
\frac{1}{t}\,\Re\left(\int_{[-t,t]}\frac{\hat F_\mu(y)}{\hat F_{\mu_0}(y)}\,dy\right) = \frac{e^{-1}}{t}\int_{[-t,t]}\exp(\cos y)\cos(\sin y)\,dy \ge 2e^{-1}e^{-1}\cos 1 > 0.
\]
The basis is $\beta = \sum_{k=0}^{\infty}\frac{1}{k!}\delta_k$, $L(\theta) = \exp(e^{\theta})$ with $\theta\in\mathbb{R}$, $\mu(\theta) = e^{\theta}$, $H(z) = e^z$ with $R_H = \infty$ and $H^{(k)}(0) = 1$ for all $k\in\mathbb{N}$.

Example 7 Negative Binomial family $\mathsf{NegBinomial}(\theta,n)$ with $\theta < 0$ and $n\in\mathbb{N}_+$ such that
\[
G_\theta(\{k\}) = \Pr(X = k) = \frac{c_k^*}{k!}\,e^{k\theta}\left(1 - e^{\theta}\right)^n
\]
with $c_k^* = \frac{(k+n-1)!}{(n-1)!}$ for $k\in\mathbb{N}$. The basis is $\beta = \sum_{k=0}^{\infty}\frac{c_k^*}{k!}\delta_k$, $L(\theta) = (1 - e^{\theta})^{-n}$ with $\theta < 0$, $\mu(\theta) = ne^{\theta}(1 - e^{\theta})^{-1}$, $H(z) = (1 - z)^{-n}$ with $R_H = 1$, and $H^{(k)}(0) = c_k^*$ for all $k\in\mathbb{N}$.

Example 8 Strict Arcsine family. Its VF is $V(u) = u(1 + u^2)$ and $\beta = \sum_{n=0}^{\infty}\frac{c_n^*(1)}{n!}\delta_n$, where $c_0^*(1) = c_1^*(1) = 1$,
\[
c_{2n}^*(\sigma) = \prod_{k=0}^{n-1}\left(\sigma^2 + 4k^2\right)\quad\text{and}\quad c_{2n+1}^*(\sigma) = \sigma\prod_{k=0}^{n-1}\left(\sigma^2 + (2k+1)^2\right) \tag{42}
\]

for $\sigma > 0$ and $n\in\mathbb{N}_+$. Further, $H(z) = \exp(\arcsin z)$ with $R_H = 1$.

Example 9 Large Arcsine family. Its VF is $V(u) = u(1 + 2u + 2u^2)$ and $\beta = \sum_{n=0}^{\infty}c_n\delta_n$ with $c_n = \frac{c_n^*(1+n)}{(n+1)!}$ for $n\in\mathbb{N}$, where $c_n^*(\sigma)$ is defined in (42) and for which $H(z) = \exp(\arcsin(h(z)))$ with $h(z) = \sum_{k=0}^{\infty}c_kz^{k+1}$. It can be seen that $R_H$ must be finite; otherwise, $\lim_{|z_l|\to\infty}|h(z_l)| = \infty$ for a sequence $\{z_l : l\ge 1\}$, and $H(z_l)$ cannot be expanded into a convergent power series at $z_l$ for $l$ sufficiently large.

Example 10 Abel family. Its VF is $V(u) = u(1 + u)^2$, $\beta = \sum_{k=0}^{\infty}c_k\delta_k$ with $c_k = \frac{(1+k)^{k-1}}{k!}$ for $k\in\mathbb{N}$, and $H(z) = e^{h(z)}$ with $h(z) = \sum_{k=0}^{\infty}c_kz^{k+1}$ with $R_H = e^{-1}$.

Example 11 Takács family. Its VF is $V(u) = u(1 + u)(1 + 2u)$ and $\beta = \delta_0 + \sum_{k=1}^{\infty}c_k\delta_k$ with $c_k = \frac{(2k)!}{k!(k+1)!}$ for $k\in\mathbb{N}_+$, for which $H(z) = \frac{1 - \sqrt{1 - 4z}}{2z}$ with $R_H = 4^{-1}$ and $z = 0$ is a removable singularity of $H$.

The above calculations show that the GFs of Negative Binomial, Strict Arcsine, Large Arcsine, Abel and Takács families all have positive and finite radii of convergence whereas that of Poisson family has infinite radius of convergence. This will be very helpful in determining uniform consistency classes for Construction II and its numerical implementation; see Corollary 4.
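The coefficient formulas in Examples 8 and 11 can be sanity-checked by summing the corresponding power series inside the radius of convergence and comparing with the closed-form GFs. The sketch below is only an illustration; the function names, the truncation lengths and the evaluation points $z = 0.3$ and $z = 0.1$ are assumptions.

```python
import math

def c_star(n, sigma=1.0):
    """Coefficient c*_n(sigma) from (42), with c*_0 = 1 and c*_1 = sigma."""
    half, odd = divmod(n, 2)
    prod = sigma if odd else 1.0
    for k in range(half):
        prod *= sigma**2 + (2 * k + odd) ** 2
    return prod

def strict_arcsine_series(z, terms=60):
    """Partial sum of sum_n c*_n(1) z^n / n!, the GF of the Strict Arcsine basis."""
    return sum(c_star(n) * z**n / math.factorial(n) for n in range(terms))

def takacs_series(z, terms=100):
    """Partial sum of 1 + sum_{k>=1} (2k)!/(k!(k+1)!) z^k, the GF of the Takacs basis."""
    total = 1.0
    for k in range(1, terms):
        total += (math.comb(2 * k, k) // (k + 1)) * z**k
    return total

strict_gap = abs(strict_arcsine_series(0.3) - math.exp(math.asin(0.3)))
takacs_gap = abs(takacs_series(0.1) - (1.0 - math.sqrt(1.0 - 0.4)) / 0.2)
```

Both gaps should be at the level of floating-point error, which is consistent with $R_H = 1$ for the Strict Arcsine family and $R_H = 4^{-1}$ for the Takács family.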

4.4 Construction III: continuous distributions with separable moments

In contrast to NEFs with support $\mathbb{N}$, we consider non-location-shift NEFs whose members are continuous distributions. Assume $0\in\Theta$, so that $\beta$ is a probability measure with finite moments of all orders. Let
\[
\tilde c_n(\theta) = \frac{1}{L(\theta)}\int x^n e^{\theta x}\beta(dx) = \int x^n\,dG_\theta(x)\quad\text{for } n\in\mathbb{N} \tag{43}
\]
be the moment sequence for $G_\theta\in\mathcal{F}$. Note that (43) is the Mellin transform of the measure $G_\theta$.

Definition 5 If there exist two functions $\zeta,\xi:\Theta\to\mathbb{R}$ and a sequence $\{\tilde a_n\}_{n\ge 0}$ that satisfy the following:

• $\xi(\theta)\neq\xi(\theta_0)$ whenever $\theta\neq\theta_0$, $\zeta(\theta)\neq 0$ for all $\theta\in U$, and $\zeta$ does not depend on any $n\in\mathbb{N}$,

• $\tilde c_n(\theta) = \xi^n(\theta)\zeta(\theta)\tilde a_n$ for each $n\in\mathbb{N}$ and $\theta\in\Theta$,

• $\Psi(t,\theta) = \sum_{n=0}^{\infty}\frac{t^n\xi^n(\theta)}{\tilde a_nn!}$ is absolutely convergent pointwise in $(t,\theta)\in\mathbb{R}\times\Theta$,

then the moment sequence $\{\tilde c_n(\theta)\}_{n\ge 0}$ is called “separable” (at $\theta_0$).

The concept of separable moment sequence is an analogy to the structured integrand used in (41) for Construction II, and the condition on $\Psi(t,\theta)$ is usually satisfied since $\sum_{n=0}^{\infty}\frac{t^n\xi^n(\theta)}{n!}$ already is convergent pointwise on $\mathbb{R}\times\Theta$. The next approach, which we refer to as “Construction III”, is based on Mellin transform of $G_\theta$ and applies to NEFs with separable moment sequences.

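Theorem 4 Assume that the NEF $\mathcal{F}$ has a separable moment sequence $\{\tilde c_n(\theta)\}_{n\ge 0}$ at $\theta_0$, and let $\omega$ be admissible. For $t,x\in\mathbb{R}$ set
\[
K(t,x;\mu_0) = \frac{1}{\zeta(\theta_0)}\int_{[-1,1]}\sum_{n=0}^{\infty}\frac{(-tsx)^n\cos\left(\frac{\pi}{2}n + ts\xi(\theta_0)\right)}{\tilde a_nn!}\,\omega(s)\,ds. \tag{44}
\]
Then
\[
\psi(t,\mu;\mu_0) = \frac{\zeta(\theta)}{\zeta(\theta_0)}\int_{[-1,1]}\cos\left(ts\left(\xi(\theta_0) - \xi(\theta)\right)\right)\omega(s)\,ds, \tag{45}
\]
$\psi(t,\theta_0;\theta_0) = 1$ for any $t$ and $\lim_{t\to\infty}\psi(t,\theta;\theta_0) = 0$ for each $\theta\neq\theta_0$.

Proof. Let
\[
K^{\dagger}(t,x;\mu_0) = \int_{[-1,1]}\exp(\iota ts\xi(\theta_0))\sum_{n=0}^{\infty}\frac{(-\iota tsx)^n}{\tilde a_nn!}\,\omega(s)\,ds. \tag{46}
\]
Since $\{\tilde c_n(\theta)\}_{n\ge 0}$ is separable at $\theta_0$, then
\[
\begin{aligned}
\psi^{\dagger}(t,\mu;\mu_0) &= \frac{1}{\zeta(\theta_0)}\int K^{\dagger}(t,x;\theta_0)\,dG_\theta(x) \\
&= \frac{1}{\zeta(\theta_0)}\int dG_\theta(x)\int_{[-1,1]}\exp(\iota ts\xi(\theta_0))\sum_{n=0}^{\infty}\frac{(-\iota tsx)^n}{\tilde a_nn!}\,\omega(s)\,ds \\
&= \frac{1}{\zeta(\theta_0)}\int_{[-1,1]}\exp(\iota ts\xi(\theta_0))\sum_{n=0}^{\infty}\frac{(-\iota ts)^n}{\tilde a_nn!}\,\tilde c_n(\theta)\,\omega(s)\,ds \\
&= \frac{\zeta(\theta)}{\zeta(\theta_0)}\int_{[-1,1]}\exp(\iota ts\xi(\theta_0))\sum_{n=0}^{\infty}\frac{(-\iota ts)^n}{n!}\,\xi^n(\theta)\,\omega(s)\,ds \\
&= \frac{\zeta(\theta)}{\zeta(\theta_0)}\int_{[-1,1]}\exp\left(\iota ts\left(\xi(\theta_0) - \xi(\theta)\right)\right)\omega(s)\,ds.
\end{aligned} \tag{47}
\]
Further, $\psi^{\dagger}(t,\mu;\mu_0) = 1$ when $\mu = \mu_0$ for all $t$, and the RL Lemma implies that $\lim_{t\to\infty}\psi^{\dagger}(t,\mu;\mu_0) = 0$ for each $\theta\neq\theta_0$. Taking the real parts of $K^{\dagger}$ and $\psi^{\dagger}$ gives the claim.

Compared to Constructions I and II, Construction III involves the integral of an infinite series and is more complicated. However, it deals with NEFs that have more complicated structures than the former two.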
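The step from (44) to (45) rests on the series identity $\sum_{n\ge 0}\frac{(-a)^n}{n!}\cos\left(\frac{\pi}{2}n + b\right) = \cos(b - a)$, applied with $a = ts\xi(\theta)$ and $b = ts\xi(\theta_0)$ after the moments $\tilde c_n(\theta)$ have been substituted. The sketch below checks this identity numerically; the function name, the truncation length and the test values of $a$ and $b$ are assumptions for illustration.

```python
import math

def separable_series(a, b, terms=80):
    """Partial sum of sum_n (-a)^n cos(pi*n/2 + b) / n!."""
    total = 0.0
    term = 1.0  # holds (-a)^n / n!, starting at n = 0
    for n in range(terms):
        total += term * math.cos(0.5 * math.pi * n + b)
        term *= -a / (n + 1)
    return total

# Hypothetical values standing in for a = t*s*xi(theta), b = t*s*xi(theta_0).
a, b = 1.7, 0.4
gap = abs(separable_series(a, b) - math.cos(b - a))
```

The identity is simply the real part of $e^{\iota b}e^{-\iota a}$, so the gap should be at the level of rounding error.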

4.5 Construction III: two examples

We provide two examples from Construction III for Exponential and Gamma families, respectively. Note that Gamma family contains Exponential family and Chi-square family.

Example 12 Exponential family $\mathsf{Exponential}(\mu)$ with mean $\mu > 0$ and basis $\beta(dx) = e^{-x}\nu(dx)$, for which $L(\theta) = (1 - \theta)^{-1}$ and $\mu(\theta) = 1 - \theta$ for $\theta < 1$. Further,
\[
\frac{dF_\mu}{d\nu}(x) = f_\mu(x) = \mu e^{-\mu x}1_{[0,\infty)}(x)
\]
and $\tilde c_n(\mu) = \frac{n!}{\mu^n}$. So, $L(\mu) = \mu^{-1}$, $\xi(\mu) = \mu^{-1}$, $\tilde a_n = n!$ and $\zeta\equiv 1$. Setting
\[
K(t,x;\mu_0) = \int_{[-1,1]}\omega(s)\sum_{n=0}^{\infty}\frac{(-tsx)^n\cos\left(\frac{\pi}{2}n + \frac{ts}{\mu_0}\right)}{(n!)^2}\,ds \tag{48}
\]
gives
\[
\psi(t,\mu;\mu_0) = \int K(t,x;\mu_0)\,dF_\mu(x) = \int_{[-1,1]}\cos\left(ts\left(\mu^{-1} - \mu_0^{-1}\right)\right)\omega(s)\,ds.
\]

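Example 13 Gamma family $\mathsf{Gamma}(\theta,\sigma)$ with basis $\beta$ such that
\[
\frac{d\beta}{d\nu}(x) = \frac{x^{\sigma-1}e^{-x}}{\Gamma(\sigma)}1_{(0,\infty)}(x)\quad\text{with } \sigma > 0,
\]
where $\Gamma$ is Euler's Gamma function. So, $L(\theta) = (1 - \theta)^{-\sigma}$,
\[
\frac{dG_\theta}{d\nu}(x) = f_\theta(x) = (1 - \theta)^{\sigma}\,\frac{e^{\theta x}x^{\sigma-1}e^{-x}}{\Gamma(\sigma)}1_{(0,\infty)}(x) \tag{49}
\]
for $\theta < 1$, and $\mu(\theta) = \frac{\sigma}{1-\theta}$. Since
\[
\tilde c_n(\theta) = (1 - \theta)^{\sigma}\int_0^{\infty}\frac{e^{-y}y^{n+\sigma-1}}{\Gamma(\sigma)(1 - \theta)^{n+\sigma-1}}\,\frac{dy}{1 - \theta} = \frac{\Gamma(n + \sigma)}{\Gamma(\sigma)}\,\frac{1}{(1 - \theta)^n},
\]
we see $\xi(\theta) = (1 - \theta)^{-1}$, $\tilde a_n = \frac{\Gamma(n+\sigma)}{\Gamma(\sigma)}$ and $\zeta\equiv 1$. Setting
\[
K(t,x;\theta_0) = \int_{[-1,1]}\omega(s)\sum_{n=0}^{\infty}\frac{(-tsx)^n\,\Gamma(\sigma)\cos\left(\frac{\pi}{2}n + \frac{ts}{1-\theta_0}\right)}{n!\,\Gamma(\sigma + n)}\,ds,
\]
we obtain
\[
\psi(t,\theta;\theta_0) = \int_{[-1,1]}\cos\left(ts\left((1 - \theta_0)^{-1} - (1 - \theta)^{-1}\right)\right)\omega(s)\,ds,
\]
for which $\psi(t,\theta_0;\theta_0) = 1$ for all $t$ and $\lim_{t\to\infty}\psi(t,\theta;\theta_0) = 0$ for each $\theta\neq\theta_0$.

Recall (15) of Construction I based on Fourier transform, i.e.,
\[
\psi(t,\mu;\mu_0) = \int K(t,x;\mu_0)\,dF_\mu(x) = \int K(t,y + (\mu - \mu_0);\mu_0)\,dF_{\mu_0}(y),
\]
where the action of Fourier transform is seen as the translation $K(t,x;\mu_0)\mapsto K(t,y + (\mu - \mu_0);\mu_0)$. In contrast, the action of Mellin transform is seen via
\[
\begin{aligned}
\psi(t,\theta;\theta_0) &= \int K(t,x;\theta_0)\,dG_\theta(x) \\
&= \int_0^{\infty}K(t,x;\theta_0)\,(1 - \theta)^{\sigma}\,\frac{e^{-(1-\theta)x}x^{\sigma-1}}{\Gamma(\sigma)}\,dx && (50)\\
&= \int K\left(t,\frac{y}{1-\theta};\theta_0\right)\beta(dy), && (51)
\end{aligned}
\]
where from (50) to (51) the scaling $K(t,x;\theta_0)\mapsto K\left(t,y(1 - \theta)^{-1};\theta_0\right)$ is induced. This comparison clearly shows the action of Mellin transform as the multiplication-convolution equivalence, in contrast to the action of Fourier transform as the translation-convolution equivalence.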

5 Two non-existence results for the Strategy

In this section, we provide two example families for which the Strategy is not implementable. To state them, we introduce

Definition 6 The proposition “PropK”: there exists a $K:\mathbb{R}^2\to\mathbb{R}$ such that $K$ does not depend on any $\theta\in\Theta_1$ with $\theta\neq\theta_0$ and that
\[
\psi(t,\theta;\theta_0) = \int K(t,x;\theta_0)\,dG_\theta(x)
\]
satisfies $\lim_{t\to\infty}\psi(t,\theta_0;\theta_0) = 1$ and $\lim_{t\to\infty}\psi(t,\theta;\theta_0) = 0$ for $\theta\in\Theta_1$ with $\theta\neq\theta_0$, where $\Theta_1$ is a subset of $\Theta$ that has a non-empty interior and does not contain $\theta_0$.

For the following two families, i.e., Inverse Gaussian and Binomial families, PropK does not hold. Note that the former family is continuous but does not have a separable moment sequence, whereas the latter is discrete but has a finite support.

Example 14 The Inverse Gaussian family $\mathsf{InvGaussian}(\theta,\sigma)$ with scale parameter $\sigma > 0$ and basis
\[
\beta(dx) = x^{-3/2}\exp\left(-\frac{\sigma^2}{2x}\right)\frac{\sigma}{\sqrt{2\pi}}1_{(0,\infty)}(x)\,\nu(dx),
\]
for which $L(\theta) = \exp\left(-\sigma\sqrt{-2\theta}\right)$ with $\theta < 0$. So,
\[
\frac{dG_\theta}{d\nu}(x) = f_\theta(x) = \frac{x^{-3/2}}{L(\theta)}\exp\left(-\frac{\sigma^2}{2x}\right)\frac{\sigma}{\sqrt{2\pi}}1_{(0,\infty)}(x).
\]
By a change of variables $y = \frac{\sigma^2}{2x}$, we obtain
\[
\begin{aligned}
\int K(t,x;\theta_0)\,dG_\theta(x) &= \frac{1}{L(\theta)}\frac{\sigma}{\sqrt{2\pi}}\int_0^{\infty}K(t,x;\theta_0)\,x^{-3/2}\exp\left(-\frac{\sigma^2}{2x}\right)dx \\
&= \frac{1}{\sqrt{\pi}L(\theta)}\int_0^{\infty}K\left(t,\frac{\sigma^2}{2y};\theta_0\right)y^{-1/2}e^{-y}\,dy.
\end{aligned} \tag{52}
\]
Since the integral on the right hand side of (52) is not a function of $\theta$ for $\theta\neq\theta_0$, PropK does not hold and $K$ does not exist. Note that the Gaussian family and Inverse Gaussian family are reciprocal pairs, called so by Letac and Mora (1990).

Example 15 Binomial family $\mathsf{Binomial}(\theta,n)$ such that
\[
P(X = k) = \binom{n}{k}\theta^k(1 - \theta)^{n-k}\quad\text{for } \theta\in(0,1).
\]

We will show that if PropK holds, then $K$ must be a function of $\theta\in\Theta_1$, a contradiction. Assume PropK holds. Then,
\[
\psi(t,\theta;\theta_0) = \int K(t,x;\theta_0)\,dG_\theta(x) = (1 - \theta)^n\sum_{k=0}^{n}a_k(t;\theta_0)\binom{n}{k}\frac{\theta^k}{(1 - \theta)^k},
\]
where $a_x(t;\theta_0) = K(t,x;\theta_0)$ for $x = 0,\ldots,n$. Let $d_k(t;\theta_0) = a_k(t;\theta_0)\binom{n}{k}$ and $q(\theta) = \frac{\theta}{1-\theta}$. Pick $n + 1$ distinct values $\varpi_i$, $i = 0,\ldots,n$ from $(0,1)$ such that $\varpi_0 = \theta_0$ and $\varpi_i\in\Theta_1$ for $i = 1,\ldots,n$. Further, define the $(n+1)\times(n+1)$ Vandermonde matrix $V_{n+1}$ whose $(i,j)$ entry is $q^{j-1}(\varpi_{i-1})$, i.e., $V_{n+1} = \left(q^{j-1}(\varpi_{i-1})\right)$, $d_t = (d_0(t;\theta_0),\ldots,d_n(t;\theta_0))^T$ and $b = \left((1 - \theta_0)^{-n},0,\ldots,0\right)^T$. Then the determinant $|V_{n+1}|\neq 0$, and the properties of $K$ imply
\[
\lim_{t\to\infty}V_{n+1}d_t = b.
\]
However, $V_{n+1}:\mathbb{R}^{n+1}\to\mathbb{R}^{n+1}$ as a bounded linear mapping is a homeomorphism with the bounded inverse $V_{n+1}^{-1}$. So,
\[
\lim_{t\to\infty}d_t = \lim_{t\to\infty}V_{n+1}^{-1}(V_{n+1}(d_t)) = V_{n+1}^{-1}b. \tag{53}
\]
By Cramer's rule, we obtain
\[
\lim_{t\to\infty}d_n(t;\theta_0) = \frac{(-1)^n|V_n|}{|V_{n+1}|}(1 - \theta_0)^{-n} = \frac{(-1)^n}{(1 - \theta_0)^n\prod_{k=1}^{n}\left(q(\varpi_k) - q(\theta_0)\right)}, \tag{54}
\]
where $V_n$ is the submatrix of $V_{n+1}$ obtained by removing the first row and last column of $V_{n+1}$. But (54) is a contradiction since $d_n(t;\theta_0)$ does not depend on any $\theta\in\Theta_1$ for all $t$. For the case $n = 1$, we easily see from (54) the contradiction
\[
\lim_{t\to\infty}a_1(t;\theta_0) = \frac{-(1 - \theta_0)^{-1}}{q(\theta) - q(\theta_0)}\quad\text{for each } \theta\in\Theta_1.
\]
To summarize, PropK does not hold and $K$ does not exist.
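The contradiction for $n = 1$ can be made concrete with a small computation. The sketch below solves the $2\times 2$ system $V_2d = b$ by Cramer's rule for two different choices of $\varpi_1$ and shows that the implied limit of $d_1(t;\theta_0)$ changes with $\varpi_1$. The function names and the values $\theta_0 = 0.2$, $\varpi_1\in\{0.5,0.7\}$ are assumptions chosen for illustration.

```python
def q(theta):
    """Odds function q(theta) = theta / (1 - theta)."""
    return theta / (1.0 - theta)

def limit_solution(theta0, varpi1):
    """Solve [[1, q(theta0)], [1, q(varpi1)]] d = ((1 - theta0)^(-1), 0)^T."""
    q0, q1 = q(theta0), q(varpi1)
    b0 = 1.0 / (1.0 - theta0)
    det = q1 - q0  # determinant of the 2 x 2 Vandermonde matrix
    d0 = b0 * q1 / det
    d1 = -b0 / det
    return d0, d1

# The same theta0 with two different points of Theta_1.
d1_first = limit_solution(0.2, 0.5)[1]
d1_second = limit_solution(0.2, 0.7)[1]
```

Here `d1_first` and `d1_second` differ, although $d_1(t;\theta_0)$ is not allowed to depend on the point chosen from $\Theta_1$; this is the contradiction behind (54).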

6 Construction II: consistency and speed of convergence

Recall $H^{(k)}(0) = c_kk!$ for $k\in\mathbb{N}$. Call the sequence $\left\{(c_kk!)^{-1}, k\in\mathbb{N}\right\}$ the “reciprocal derivative sequence (of $H$ at 0)”. The following lemma gives bounds on the magnitudes of this sequence for the examples given in Section 4.3. It will help derive concentration inequalities for estimators from Construction II.

Lemma 3 Consider the examples given in Section 4.3. Then $c_kk!\equiv 1$ for Poisson family, whereas for Negative Binomial, Abel and Takács families with a fixed $n$ and $\sigma > 0$,
\[
\frac{1}{c_kk!}\le\frac{C}{k!}\quad\text{for all } k\in\mathbb{N}. \tag{55}
\]
However, for Strict Arcsine and Large Arcsine families, both with a fixed $\sigma > 0$, (55) does not hold. On the other hand, for any $\tilde r > 0$ smaller than the radius $R_H$ of convergence of $H$,
\[
\frac{1}{c_kk!} = \frac{1}{H^{(k)}(0)}\ge\frac{1}{H(\tilde r)}\frac{\tilde r^k}{k!}\quad\text{for all } k\in\mathbb{N}. \tag{56}
\]

Proof. By simple calculations, we obtain the following: (1) $c_kk!\equiv 1$ for Poisson family; (2) $(c_kk!)^{-1} = \frac{(n-1)!}{(k+n-1)!}$ for Negative Binomial family with a fixed $n$; (3) $(c_kk!)^{-1} = (1 + k)^{-(k-1)}$ for Abel family; (4) $(c_kk!)^{-1} = (k + 1)!\,((2k)!)^{-1}$ for Takács family. Therefore, (55) holds. Fix a $\sigma > 0$. Then for Strict Arcsine family,
\[
c_kk! = c_k^*(1)\ge 2^{k-2}\left(\left(\lfloor 2^{-1}k\rfloor - 1\right)!\right)^2,
\]
and for Large Arcsine family,
\[
c_kk! = \frac{c_k^*(1 + k)}{k + 1}\ge(1 + k)^{\lfloor 2^{-1}k\rfloor - 1}\,2^{k-2}\left(\left(\lfloor 2^{-1}k\rfloor - 1\right)!\right)^2.
\]
So, (55) does not hold for these two families. Now we show the third claim. Since $H(z) = \sum_{k=0}^{\infty}c_kz^k$ has a positive radius $R_H$ of convergence, there exists $R_H > \tilde r > 0$ such that
\[
H^{(k)}(0) = \frac{k!}{2\pi\iota}\int_{\{z\in\mathbb{C}:|z| = \tilde r\}}\frac{H(z)}{z^{k+1}}\,dz\quad\text{for all } k\in\mathbb{N}.
\]
However, $H$ has all positive coefficients. Therefore, $\max_{\{z\in\mathbb{C}:|z| = \tilde r\}}|H(z)|$ is achieved when $z = \tilde r$, and
\[
\left|H^{(k)}(0)\right|\le\frac{k!}{2\pi}\,\frac{\sup_{|z| = \tilde r}|H(z)|}{\tilde r^{k+1}}\,2\pi\tilde r = H(\tilde r)\,\frac{k!}{\tilde r^k}.
\]
Observing that $H^{(k)}(0)$ is real and $H^{(k)}(0) = c_kk!$ for $k\in\mathbb{N}$ gives (56).

Lemma 3 shows that, among the six discrete NEFs with support $\mathbb{N}$ given in Section 4.3, the reciprocal derivative sequence for Poisson family has the largest magnitude, whereas this sequence for Negative Binomial, Abel and Takács families with a fixed $n$ and $\sigma > 0$ are all dominated by the “reciprocal factorial sequence” $\left\{\frac{1}{k!} : k\in\mathbb{N}\right\}$ approximately. Further, Lemma 3 asserts that the reciprocal derivative sequence dominates the “exponential sequence” $\left\{\frac{\tilde r^k}{H(\tilde r)\,k!} : k\in\mathbb{N}\right\}$ for any $\tilde r > 0$ smaller than the radius of convergence of $H$.
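The two-sided control of the reciprocal derivative sequence in Lemma 3 can be illustrated for the Negative Binomial family, where $(c_kk!)^{-1} = \frac{(n-1)!}{(k+n-1)!}$ and $H(\tilde r) = (1 - \tilde r)^{-n}$. In the sketch below, the function names, the choice $n = 3$, the constant $C = 1$ in (55) and the radii $\tilde r\in\{0.5,0.9\}$ are assumptions made for illustration.

```python
import math

def reciprocal_derivative_negbin(k, n):
    """(c_k k!)^(-1) = (n-1)!/(k+n-1)! for the Negative Binomial family."""
    return math.factorial(n - 1) / math.factorial(k + n - 1)

def exponential_lower_bound(k, n, r):
    """Right-hand side of (56): r^k / (H(r) k!) with H(r) = (1 - r)^(-n)."""
    return r**k * (1.0 - r) ** n / math.factorial(k)

n = 3
# Upper bound (55) with the assumed constant C = 1.
upper_ok = all(
    reciprocal_derivative_negbin(k, n) <= 1.0 / math.factorial(k) for k in range(40)
)
# Lower bound (56) for two radii below R_H = 1.
lower_ok = all(
    reciprocal_derivative_negbin(k, n) >= exponential_lower_bound(k, n, r)
    for k in range(40)
    for r in (0.5, 0.9)
)
```

Both flags should evaluate to `True`: the lower bound restates $c_k\tilde r^k\le H(\tilde r)$, which holds because all coefficients of $H$ are positive.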

6.1 Probabilistic bounds on oscillations of estimators

Let $\eta = e^{\theta}$ for $\theta\in\Theta$, $\eta_i = e^{\theta_i}$ for $0\le i\le m$ and $\boldsymbol{\eta} = (\eta_1,\ldots,\eta_m)$. First, we provide upper bounds on the variance and oscillations of $\hat\varphi_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})$ when $t$ is positive and sufficiently large.

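Theorem 5 Let $\mathcal{F}$ be an NEF generated by $\beta$ in (39), $\{z_i\}_{i=1}^m$ independent with CDFs $\{G_{\theta_i}\}_{i=1}^m$ belonging to $\mathcal{F}$, $\lambda$ a positive constant, and $t$ positive and sufficiently large.

1. Let $\phi_m(L,\boldsymbol{\theta}) = \min_{1\le i\le m}\left\{L(\theta_i)\,\eta_i^{1/4}\right\}$. If (55) holds, then
\[
\mathbb{V}\left[\hat\varphi_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})\right]\le V_{II,1}^{(m)} = \frac{C}{m}\,\frac{\exp\left(2t\|\boldsymbol{\eta}\|_\infty^{1/2}\right)}{\sqrt{t}\,\phi_m(L,\boldsymbol{\theta})}
\]
and
\[
\Pr\left(\left|\hat\varphi_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})\right|\ge\lambda\right)\le\lambda^{-2}V_{II,1}^{(m)}. \tag{57}
\]

2. Let $L_{\min}^{(m)} = \min_{1\le i\le m}L(\theta_i)$. Then for Poisson family,
\[
\mathbb{V}\left[\hat\varphi_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})\right]\le V_{II,2}^{(m)} = \frac{C}{m}\,\frac{\exp\left(t^2\|\boldsymbol{\eta}\|_\infty^{1/2}\right)}{L_{\min}^{(m)}} \tag{58}
\]
and
\[
\Pr\left(\left|\hat\varphi_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})\right|\ge\lambda\right)\le\lambda^{-2}V_{II,2}^{(m)}.
\]

Proof. Recall $H(z) = \sum_{k=0}^{\infty}c_kz^k$ and $H^{(k)}(0) = c_kk!$ for $k\in\mathbb{N}$. First, we show that, if $Z$ has CDF $G_\theta$ and $\eta > 0$, then
\[
E\left[t^{2Z}\left(H^{(Z)}(0)\right)^{-2}\right]\le\frac{C}{L(\theta)}\,\frac{\exp\left(2t\sqrt{\eta}\right)}{\sqrt{t\sqrt{\eta}}} \tag{59}
\]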

for positive and sufficiently large $t$. Let $\chi_Z(t) = E\left[t^{2Z}\left(H^{(Z)}(0)\right)^{-2}\right]$. Since (55) holds,
\[
\chi_Z(t) = \frac{1}{L(\theta)}\sum_{k=0}^{\infty}\frac{t^{2k}c_k\eta^k}{(c_kk!)^2}\le\frac{C}{L(\theta)}\,B_{II}\left(2t\sqrt{\eta}\right),
\]
where $B_{II}(x) = \sum_{k=0}^{\infty}\frac{(2^{-1}x)^{2k}}{(k!)^2}$ for $x > 0$. So, it suffices to bound $B_{II}(x)$. For $\sigma > 0$ and $y\in\mathbb{C}$, let
\[
J_\sigma(y) = \sum_{n=0}^{\infty}\frac{(-1)^n}{n!\,\Gamma(n + \sigma + 1)}\left(\frac{y}{2}\right)^{2n+\sigma}. \tag{60}
\]
Then $J_\sigma$ is the Bessel function of the first kind of order $\sigma$; see definition (1.17.1) in Chapter 1 of Szegő (1975), and $B_{II}(x) = J_0(-\iota x)$. By identity (1.71.8) in Chapter 1 of Szegő (1975) that was derived on page 368 of Whittaker and Watson (1940), we have
\[
J_\sigma(y) = \sqrt{2}\,(\pi y)^{-1/2}\cos(y - c_0)\left(1 + O\left(y^{-2}\right)\right) \tag{61}
\]
as $y\to\infty$ whenever $|\arg y| < \pi$, where $c_0 = 2^{-1}\sigma\pi + 4^{-1}\pi$. So,
\[
B_{II}\left(2t\sqrt{\eta}\right) = \left|J_0\left(-2\iota t\sqrt{\eta}\right)\right| = C\left(t\sqrt{\eta}\right)^{-1/2}\exp\left(2t\sqrt{\eta}\right) \tag{62}
\]
as $0 < t\eta\to\infty$, where we have used the identity $|\cos z|^2 = \cosh^2(\Im(z)) - \sin^2(\Re(z))$. The bound given by (62) is tight up to a multiple of a positive constant, which can be seen from

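inequality (5) of Gronwall (1932). On the other hand, $B_{II}(t\eta) = O(1)$ when $t\eta = O(1)$. Thus, when $\eta > 0$, (59) holds for all positive and sufficiently large $t$.

Now we show the first claim. Recall $\eta_i = e^{\theta_i}$ when $z_i$ has CDF $G_{\theta_i}$ for $1\le i\le m$. Let $V_m(\hat\varphi) = \mathbb{V}\left[\hat\varphi_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})\right]$ and $\tilde p_m(\lambda) = \Pr\left(\left|\hat\varphi_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})\right|\ge\lambda\right)$. Let
\[
w(t,x) = \frac{t^x\cos\left(\frac{\pi}{2}x - t\eta_0\right)}{c_xx!}\quad\text{for } t\ge 0 \text{ and } x\in\mathbb{N}
\]
and
\[
S_m(t) = \frac{1}{m}\sum_{i=1}^m\left(w(t,z_i) - E[w(t,z_i)]\right)\quad\text{for } t\ge 0.
\]
Then (40) is equivalent to $K(t,x;\theta_0) = H(\eta_0)\int_{[-1,1]}w(ts,x)\,\omega(s)\,ds$ and
\[
\hat\varphi_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta}) = H(\eta_0)\int_{[-1,1]}S_m(ts)\,\omega(s)\,ds. \tag{63}
\]
Since $E\left[w^2(t,Z)\right]\le\chi_Z(t)$, inequality (59) implies
\[
V_m(\hat\varphi)\le\frac{C}{m^2}\sum_{i=1}^m\frac{\exp\left(2t\sqrt{\eta_i}\right)}{L(\theta_i)\sqrt{t\,\eta_i^{1/2}}}\le\frac{C}{m}\,\frac{\exp\left(2t\|\boldsymbol{\eta}\|_\infty^{1/2}\right)}{\sqrt{t}\,\phi_m(L,\boldsymbol{\theta})}.
\]
So,
\[
\Pr\left(|S_m(t)|\ge\lambda\right)\le\frac{CV_m(\hat\varphi)}{\lambda^2}\le\frac{C}{\lambda^2m}\,\frac{\exp\left(2t\|\boldsymbol{\eta}\|_\infty^{1/2}\right)}{\sqrt{t}\,\phi_m(L,\boldsymbol{\theta})}.
\]
From
\[
\int_{[-1,1]}S_m(ts)\,\omega(s)\,ds = \frac{1}{t}\int_{[-t,t]}S_m(y)\,\omega\left(yt^{-1}\right)dy
\]
for $t > 0$, we have
\[
\tilde p_m(\lambda)\le\Pr\left(|S_m(t)|\ge\frac{\lambda}{2H(\eta_0)\|\omega\|_\infty}\right)\le\frac{CH^2(\eta_0)\|\omega\|_\infty^2}{\lambda^2m}\,\frac{\exp\left(2t\|\boldsymbol{\eta}\|_\infty^{1/2}\right)}{\sqrt{t}\,\phi_m(L,\boldsymbol{\theta})}.
\]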

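Finally, we show the second claim. For Poisson family, $c_kk! = 1$ for all $k\in\mathbb{N}$. So, from (63) we obtain
\[
E\left[w^2(t,Z)\right]\le\frac{\exp\left(t^2\sqrt{\eta}\right)}{L(\theta)}\quad\text{and}\quad V_m(\hat\varphi)\le\frac{CH^2(\eta_0)}{m}\,\frac{\exp\left(t^2\|\boldsymbol{\eta}\|_\infty^{1/2}\right)}{\min_{1\le i\le m}L(\theta_i)}
\]
and
\[
\tilde p_m(\lambda)\le\Pr\left(|S_m(t)|\ge\frac{\lambda}{2H(\eta_0)\|\omega\|_\infty}\right)\le\frac{C}{\lambda^2m}\,\frac{\exp\left(t^2\|\boldsymbol{\eta}\|_\infty^{1/2}\right)}{\min_{1\le i\le m}L(\theta_i)}
\]
for positive and sufficiently large $t$.

We remark that the assertion in Theorem 5 on Poisson family holds for any NEF with support $\mathbb{N}$ such that $c_kk!\le C$ for all $k\in\mathbb{N}$.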
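The growth rate in (62) can be checked numerically. The series $B_{II}(x) = \sum_{k\ge 0}\frac{(x/2)^{2k}}{(k!)^2}$ is the modified Bessel function $I_0(x)$, whose standard leading asymptotic is $e^x/\sqrt{2\pi x}$; with $x = 2t\sqrt{\eta}$ this matches the form $C(t\sqrt{\eta})^{-1/2}\exp(2t\sqrt{\eta})$. The sketch below sums the series directly; the function names, the truncation length and the evaluation points are assumptions for illustration.

```python
import math

def B_II(x, terms=200):
    """Partial sum of sum_k (x/2)^(2k) / (k!)^2, built by a term recursion."""
    total = 0.0
    term = 1.0  # term for k = 0
    for k in range(terms):
        total += term
        term *= (0.5 * x) ** 2 / (k + 1) ** 2
    return total

def asymptotic_ratio(x):
    """Ratio of B_II(x) to the leading asymptotic e^x / sqrt(2*pi*x)."""
    return B_II(x) * math.sqrt(2.0 * math.pi * x) * math.exp(-x)

ratios = [asymptotic_ratio(x) for x in (20.0, 50.0)]
```

The ratios should be close to 1 and approach it as $x$ grows, which illustrates why the bound in (59) carries the factor $\exp(2t\sqrt{\eta})/\sqrt{t\sqrt{\eta}}$.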

6.2 Uniform consistency and speeds of convergence

With Theorem 5, we derive the uniform consistency classes and speeds of convergence for the estimators from Construction II. Recall $\eta = e^{\theta}$ for $\theta\in\Theta$ and $\boldsymbol{\eta} = \left(e^{\theta_1},\ldots,e^{\theta_m}\right)$.

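Theorem 6 Let $\mathcal{F}$ be the NEF generated by $\beta$ in (39), $\{z_i\}_{i=1}^m$ independent with CDFs $\{G_{\theta_i}\}_{i=1}^m$ belonging to $\mathcal{F}$, and $\rho$ a finite, positive constant.

1. If (55) holds, then a uniform consistency class is
\[
Q_{II,1}(\boldsymbol{\theta},t,\pi_{1,m};\gamma) = \left\{\begin{array}{l}
\|\boldsymbol{\theta}\|_\infty\le\rho,\ \pi_{1,m}\ge m^{(\gamma-1)/2},\ t = 2^{-1}\|\boldsymbol{\eta}\|_\infty^{-1/2}\gamma\log m,\\[2pt]
\lim_{m\to\infty}t\min_{i\in I_{1,m}}|\eta_0 - \eta_i| = \infty
\end{array}\right\}
\]
for any fixed $\gamma\in(0,1]$. The speed of convergence is “poly-log”.

2. For Poisson family, a uniform consistency class is
\[
Q_{II,2}(\boldsymbol{\theta},t,\pi_{1,m};\gamma) = \left\{\begin{array}{l}
\|\boldsymbol{\theta}\|_\infty\le\rho,\ \pi_{1,m}\ge m^{(\gamma'-1)/2},\ t = \sqrt{\|\boldsymbol{\eta}\|_\infty^{-1/2}\gamma\log m},\\[2pt]
\lim_{m\to\infty}t\min_{i\in I_{1,m}}|\eta_0 - \eta_i| = \infty
\end{array}\right\}
\]
for any fixed $\gamma\in(0,1)$ and $\gamma' > \gamma$. The speed of convergence is “poly-log”.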

  

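Proof. Obviously, $\phi_m(L,\boldsymbol{\theta})$ is positive and finite when $\|\boldsymbol{\theta}\|_\infty\le\rho$. First, consider the case when (55) holds. Then (57) implies
\[
\Pr\left(\left|\hat\varphi_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})\right|\ge\lambda\right)\le\frac{C}{\lambda^2m}\,\frac{\exp\left(2t\|\boldsymbol{\eta}\|_\infty^{1/2}\right)}{\sqrt{t}}.
\]
We can set $t = 2^{-1}\|\boldsymbol{\eta}\|_\infty^{-1/2}\gamma\log m$ for $\gamma\in(0,1]$, which induces
\[
\Pr\left(\left|\hat\varphi_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})\right|\ge\lambda\right)\le\frac{C}{\lambda^2}\,\frac{1}{m^{1-\gamma}\sqrt{\gamma\log m}}.
\]
Let $\varepsilon > 0$ be any finite constant. If $\pi_{1,m}\ge Cm^{(\gamma-1)/2}$, then
\[
\Pr\left(\frac{\left|\hat\varphi_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})\right|}{\pi_{1,m}}\ge\varepsilon\right)\le\frac{C\varepsilon^{-2}}{\sqrt{\gamma\log m}}\to 0\quad\text{as } m\to\infty.
\]
Moreover,
\[
\psi(t,\theta;\theta_0) = \frac{H(\eta_0)}{H(\eta)}\int_{[-1,1]}\cos(st(\eta - \eta_0))\,\omega(s)\,ds\to 0\quad\text{as } m\to\infty
\]
whenever $\lim_{m\to\infty}t\min_{1\le i\le m}|\eta_0 - \eta_i| = \infty$.

Secondly, we deal with Poisson family. Clearly, $L_{\min}^{(m)} = \min_{1\le i\le m}L(\theta_i)$ is positive and finite when $\|\boldsymbol{\theta}\|_\infty\le\rho$. So, inequality (58) implies
\[
\Pr\left(\left|\hat\varphi_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})\right|\ge\lambda\right)\le\frac{C}{\lambda^2m}\exp\left(t^2\|\boldsymbol{\eta}\|_\infty^{1/2}\right).
\]
So, we can set $t = \sqrt{\|\boldsymbol{\eta}\|_\infty^{-1/2}\gamma\log m}$ for a fixed $\gamma\in(0,1)$, which induces
\[
\Pr\left(\left|\hat\varphi_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})\right|\ge\lambda\right)\le\frac{C}{\lambda^2m^{1-\gamma}}.
\]
If $\pi_{1,m}\ge Cm^{(\gamma'-1)/2}$ for any $\gamma' > \gamma$, then
\[
\Pr\left(\frac{\left|\hat\varphi_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})\right|}{\pi_{1,m}}\ge\varepsilon\right)\le\frac{C\varepsilon^{-2}}{m^{\gamma'-\gamma}}\to 0\quad\text{as } m\to\infty.
\]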

This completes the proof.

In Theorem 6, the speed of convergence and uniform consistency class depend on $\|\boldsymbol{\eta}\|_{\infty}$ and $\min_{i\in I_{1,m}}|\eta_0 - \eta_i|$, whereas they only depend on $\min_{i\in I_{1,m}}|\mu_i - \mu_0|$ for location-shift families. However, when the GF $H$ of the basis $\beta$ has finite radius of convergence, their dependence on $\|\boldsymbol{\eta}\|_{\infty}$ can be removed, as justified by:

Corollary 4 Let $\mathcal{F}$ be the NEF generated by $\beta$ in (39), $\{z_i\}_{i=1}^{m}$ independent with CDFs $\{G_{\theta_i}\}_{i=1}^{m}$ belonging to $\mathcal{F}$, and $\rho$ a finite, positive constant. If (55) holds and $H$ has a finite radius of convergence $R_H$, then for positive and sufficiently large $t$

$$V\left[\hat{\varphi}_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})\right] \le \frac{C}{m}\,\frac{\exp\left(2tR_H^{1/2}\right)}{\sqrt{t}\,\phi_m(L,\boldsymbol{\theta})},$$

and a uniform consistency class is

$$\tilde{Q}_{II,1}(\boldsymbol{\theta}, t, \pi_{1,m}; \gamma) = \left\{ \begin{array}{l} \|\boldsymbol{\theta}\|_{\infty} \le \rho,\ t = 2^{-1}R_H^{-1/2}\gamma\log m, \\ \pi_{1,m} \ge m^{(\gamma-1)/2},\ \min_{i\in I_{1,m}}|\eta_0 - \eta_i| \ge \dfrac{\log\log m}{\sqrt{2\log m}} \end{array} \right\}$$

for any $\gamma \in (0,1]$.

Since $\sup\Theta \le \log R_H$ and $\|\boldsymbol{\eta}\|_{\infty} \le R_H$, the proof of Corollary 4 follows easily from the first claims of Theorem 5 and Theorem 6 and is omitted. The uniform consistency class $\tilde{Q}_{II,1}(\boldsymbol{\theta}, t, \pi_{1,m}; \gamma)$ is fully data-adaptive and only requires information on $\min_{i\in I_{1,m}}|\eta_0 - \eta_i|$, as do Jin’s estimator of Jin (2008) for Gaussian family and Construction I for location-shift families on $\min_{i\in I_{1,m}}|\mu_i - \mu_0|$.
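As a numerical illustration of the tuning rules in Theorem 6, the following Python sketch evaluates the two choices of $t$ and the right-hand sides of the corresponding tail bounds from the proof. The function names, the default constant `C = 1.0`, and the sample values of $m$, $\gamma$ and $\|\boldsymbol{\eta}\|_{\infty}$ are assumptions made purely for illustration; the sketch only checks that each choice of $t$ turns the exponential factor into $m^{\gamma}$.

```python
import math


def tuning_t_general(eta_sup, gamma, m):
    """t = 2^{-1} * ||eta||_inf^{-1/2} * gamma * log(m), the case where (55) holds."""
    return 0.5 * gamma * math.log(m) / math.sqrt(eta_sup)


def tuning_t_poisson(eta_sup, gamma, m):
    """t = sqrt(||eta||_inf^{-1/2} * gamma * log(m)), the Poisson case."""
    return math.sqrt(gamma * math.log(m) / math.sqrt(eta_sup))


def tail_bound_general(lam, eta_sup, gamma, m, C=1.0):
    """C * exp(2 t ||eta||^{1/2}) / (lam^2 * m * sqrt(t)); C is a placeholder constant."""
    t = tuning_t_general(eta_sup, gamma, m)
    return C * math.exp(2.0 * t * math.sqrt(eta_sup)) / (lam ** 2 * m * math.sqrt(t))


def tail_bound_poisson(lam, eta_sup, gamma, m, C=1.0):
    """C * exp(t^2 ||eta||^{1/2}) / (lam^2 * m); C is a placeholder constant."""
    t = tuning_t_poisson(eta_sup, gamma, m)
    return C * math.exp(t ** 2 * math.sqrt(eta_sup)) / (lam ** 2 * m)


if __name__ == "__main__":
    m, gamma, eta_sup, lam = 10 ** 6, 0.5, 4.0, 0.1
    print("general t:", tuning_t_general(eta_sup, gamma, m))
    print("Poisson t:", tuning_t_poisson(eta_sup, gamma, m))
    print("general bound:", tail_bound_general(lam, eta_sup, gamma, m))
    print("Poisson bound:", tail_bound_poisson(lam, eta_sup, gamma, m))
```

With these choices the Poisson bound reduces to $C/(\lambda^{2}m^{1-\gamma})$ and the general bound carries an additional factor $1/\sqrt{t}$, in line with the displays in the proof of Theorem 6.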

7 Construction III: consistency and speed of convergence

We will focus on Gamma family and show that the corresponding estimators are uniformly consistent.

7.1 Probabilistic bounds on oscillations of estimators

Recall $f_{\theta}$ defined by (49) for Gamma family. Then $f_{\theta}(x) = O\left(x^{\sigma-1}\right)$ as $x \to 0+$, which tends to 0 when $\sigma > 1$. The next result provides upper bounds on the variance and oscillations of $\hat{\varphi}_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})$ when $t$ is positive and large.

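Theorem 7 Consider Gamma family such that $\{z_j\}_{j=1}^{m}$ are independent with parameters $\{(\theta_i,\sigma)\}_{i=1}^{m}$ and a fixed $\sigma > 0$. Assume $t$ is positive and sufficiently large and set $u_{3,m} = \min_{1\le i\le m}\{1-\theta_i\}$. Then

$$V\left[\hat{\varphi}_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})\right] \le V_{III}^{(m)} = \frac{C}{m^{2}}\exp\left(\frac{4t}{u_{3,m}}\right)\sum_{i=1}^{m}\left(\frac{t}{1-\theta_i}\right)^{3/4-\sigma}$$

and

$$\Pr\left(|\hat{\varphi}_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})| \ge \lambda\right) \le \lambda^{-2}V_{III}^{(m)}. \tag{64}$$

Proof. Let

$$\tilde{w}(z,x) = \sum_{n=0}^{\infty}\frac{(zx)^{n}}{n!\,\Gamma(\sigma+n)} \quad\text{for } z, x > 0.$$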

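First, we will show that, if $Z$ has CDF $G_{\theta}$, then

$$E\left[\tilde{w}^{2}(z,Z)\right] \le C\left(\frac{z}{1-\theta}\right)^{3/4-\sigma}\exp\left(\frac{4z}{1-\theta}\right) \tag{65}$$

for positive and sufficiently large $z$. Recall the Bessel function $J_{\sigma}$ defined by (60) and the asymptotic bound (61). Then,

$$\tilde{w}(z,x) = \left(\sqrt{zx}\right)^{1-\sigma}\iota^{1-\sigma}J_{\sigma-1}\left(\iota 2\sqrt{zx}\right) = (zx)^{\frac{1}{4}-\frac{\sigma}{2}}\exp\left(2\sqrt{zx}\right)\left(1 + O\left((zx)^{-1}\right)\right)$$

when $zx \to \infty$. Let $A_{1,z} = \{x \in (0,\infty) : zx = O(1)\}$. Then, on the set $A_{1,z}$, $f_{\theta}(x) = O\left(x^{\sigma-1}\right)$ and $\tilde{w}(z,x) \le Ce^{zx} = O(1)$ when $\theta < 1$. Therefore,

$$\int_{A_{1,z}}\tilde{w}^{2}(z,x)\,dG_{\theta}(x) \le C(1-\theta)^{\sigma}\int_{A_{1,z}}x^{\sigma-1}\,dx \le C(1-\theta)^{\sigma}z^{-\sigma}. \tag{66}$$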

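On the other hand, let $A_{2,z} = \{x \in (0,\infty) : \lim_{z\to\infty} zx = \infty\}$. Then

$$\int_{A_{2,z}}\tilde{w}^{2}(z,x)\,dG_{\theta}(x) \le C\int_{A_{2,z}}(zx)^{\frac{1}{2}-\sigma}\exp\left(4\sqrt{zx}\right)dG_{\theta}(x) \le C\int_{A_{2,z}}(zx)^{\frac{1}{2}-\sigma}\sum_{n=0}^{\infty}\frac{\left(4\sqrt{zx}\right)^{n}}{n!}\,dG_{\theta}(x) \le C z^{\frac{1}{2}-\sigma}B_{III}(z),$$

where

$$B_{III}(z) = \sum_{n=0}^{\infty}\frac{4^{n}z^{n/2}}{n!}\,c^{\dagger}_{n/2} \quad\text{and}\quad c^{\dagger}_{n/2} = \int x^{2^{-1}(n+1)-\sigma}\,dG_{\theta}(x),$$

and $c^{\dagger}_{n/2}$ is referred to as a “half-moment”. However,

$$c^{\dagger}_{n/2} = \frac{(1-\theta)^{\sigma}}{\Gamma(\sigma)}\int_{0}^{\infty}x^{2^{-1}(n+1)-\sigma}e^{\theta x}x^{\sigma-1}e^{-x}\,dx = \frac{\Gamma\left(2^{-1}n + 2^{-1}\right)}{\Gamma(\sigma)}\,\frac{(1-\theta)^{\sigma-1/2}}{(1-\theta)^{n/2}}, \tag{67}$$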

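and by Stirling formula,

$$\frac{\Gamma\left(2^{-1}n + 2^{-1}\right)}{n!} \le C\,\frac{\sqrt{\pi(n-1)}\left(\frac{n-1}{2}\right)^{\frac{n-1}{2}}e^{-\frac{n-1}{2}}}{\sqrt{2\pi n}\,n^{n}e^{-n}} \le Ce^{\frac{n}{2}}2^{-\frac{n}{2}}\,\frac{(n-1)^{n/2}(n-1)^{-1/2}}{n^{n/2}n^{n/2}} \le Ce^{\frac{n}{2}}2^{-\frac{n}{2}}\,\frac{(n-1)^{-1/2}}{n^{n/2}} \le C2^{-\frac{n}{2}}\,\frac{n^{-1/4}}{\sqrt{n!}}.$$

Therefore,

$$B_{III}(z) \le C(1-\theta)^{\sigma-1/2}\sum_{n=0}^{\infty}\frac{4^{n}z^{n/2}}{(1-\theta)^{n/2}}\,\frac{2^{-\frac{n}{2}}}{\sqrt{n!}} = C(1-\theta)^{\sigma-1/2}\,Q^{*}\left(\frac{8z}{1-\theta}\right), \tag{68}$$

where $Q^{*}(z) = \sum_{n=0}^{\infty}\frac{z^{n/2}}{\sqrt{n!}}$. By definition (8.01) and identity (8.07) in Chapter 8 of Olver (1974),

$$Q^{*}(z) = \sqrt{2}\,(2\pi z)^{1/4}\exp\left(2^{-1}z\right)\left(1 + O\left(z^{-1}\right)\right). \tag{69}$$

Combining (67), (68) and (69) gives

$$\int_{A_{2,z}}\tilde{w}^{2}(z,x)\,dG_{\theta}(x) \le C(1-\theta)^{\sigma-1/2}z^{\frac{1}{2}-\sigma}\left(\frac{z}{1-\theta}\right)^{1/4}\exp\left(\frac{4z}{1-\theta}\right)$$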

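for all positive and sufficiently large $z$. Recall (66). Thus, when $1-\theta > 0$, $\sigma > 0$ and $z$ is positive and sufficiently large,

$$E\left[\tilde{w}^{2}(z,Z)\right] \le \int_{A_{1,z}}\tilde{w}^{2}(z,x)\,dG_{\theta}(x) + \int_{A_{2,z}}\tilde{w}^{2}(z,x)\,dG_{\theta}(x) \le C\left((1-\theta)^{\sigma}z^{-\sigma} + \left(\frac{z}{1-\theta}\right)^{3/4-\sigma}\exp\left(\frac{4z}{1-\theta}\right)\right) \le C\left(\frac{z}{1-\theta}\right)^{3/4-\sigma}\exp\left(\frac{4z}{1-\theta}\right).$$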

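Thus, (65) holds. Now we show the claim. Define

$$w(t,x) = \Gamma(\sigma)\sum_{n=0}^{\infty}\frac{(-tx)^{n}\cos\left(2^{-1}\pi n + t\xi(\theta_0)\right)}{n!\,\Gamma(n+\sigma)} \quad\text{for } t \ge 0 \text{ and } x > 0.$$

Set $S_m(t) = m^{-1}\sum_{i=1}^{m}\left(w(t,z_i) - E[w(t,z_i)]\right)$. Then (44) is just

$$K(t,x;\theta_0) = \frac{1}{\zeta(\theta_0)}\int_{[-1,1]}w(ts,x)\,\omega(s)\,ds.$$

Recall $V_m(\hat{\varphi}) = V\left[\hat{\varphi}_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})\right]$. Since $|w(t,x)| \le \Gamma(\sigma)\tilde{w}(t,x)$ uniformly in $(t,x)$, (65) implies, for positive and sufficiently large $t$,

$$V_m(\hat{\varphi}) \le m^{-2}\Gamma^{2}(\sigma)\sum_{i=1}^{m}E\left[\tilde{w}^{2}(t,z_i)\right] \le \frac{C}{m^{2}}\sum_{i=1}^{m}\left(\frac{t}{1-\theta_i}\right)^{3/4-\sigma}\exp\left(\frac{4t}{1-\theta_i}\right) \le V_{III}^{(m)},$$

where

$$V_{III}^{(m)} = \frac{C}{m^{2}}\exp\left(\frac{4t}{u_{3,m}}\right)\sum_{i=1}^{m}\left(\frac{t}{1-\theta_i}\right)^{3/4-\sigma}$$

and $u_{3,m} = \min_{1\le i\le m}\{1-\theta_i\}$. Recall $\tilde{p}_m(\lambda) = \Pr\left(|\hat{\varphi}_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})| \ge \lambda\right)$. So, $\Pr\left(|S_m(t)| \ge \lambda\right) \le \lambda^{-2}V_{III}^{(m)}$ and

$$\tilde{p}_m(\lambda) \le \Pr\left(|S_m(t)| \ge \frac{\zeta(\theta_0)\lambda}{2\|\omega\|_{\infty}}\right) \le \frac{\|\omega\|_{\infty}^{2}}{\zeta^{2}(\theta_0)}\,\frac{1}{\lambda^{2}}\,V_{III}^{(m)},$$

completing the proof.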
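The two asymptotic ingredients used in the proof above, the large-argument behaviour of the series $\tilde{w}(z,x)$ and of $Q^{*}(z) = \sum_{n\ge 0} z^{n/2}/\sqrt{n!}$, can be checked numerically. The Python sketch below is only an illustration under stated assumptions: the function names and truncation lengths are hypothetical, and the leading term for $\tilde{w}$ includes the constant $1/(2\sqrt{\pi})$ from the standard large-argument expansion of the modified Bessel function, a constant that the bound in the proof absorbs into $C$.

```python
import math


def w_tilde_series(z, x, sigma, terms=400):
    """Partial sum of sum_n (z*x)^n / (n! * Gamma(sigma + n)), computed in log space."""
    log_u = math.log(z * x)
    return sum(
        math.exp(n * log_u - math.lgamma(n + 1) - math.lgamma(n + sigma))
        for n in range(terms)
    )


def w_tilde_leading(z, x, sigma):
    """Leading term (zx)^{1/4 - sigma/2} * exp(2 sqrt(zx)) / (2 sqrt(pi))."""
    u = z * x
    return u ** (0.25 - 0.5 * sigma) * math.exp(2.0 * math.sqrt(u)) / (2.0 * math.sqrt(math.pi))


def q_star_series(z, terms=2000):
    """Partial sum of Q*(z) = sum_n z^{n/2} / sqrt(n!), computed in log space."""
    log_z = math.log(z)
    return sum(math.exp(0.5 * (n * log_z - math.lgamma(n + 1))) for n in range(terms))


def q_star_leading(z):
    """Leading term sqrt(2) * (2 pi z)^{1/4} * exp(z / 2)."""
    return math.sqrt(2.0) * (2.0 * math.pi * z) ** 0.25 * math.exp(0.5 * z)


if __name__ == "__main__":
    print("w_tilde ratio:", w_tilde_series(20.0, 20.0, 2.0) / w_tilde_leading(20.0, 20.0, 2.0))
    print("Q* ratio:", q_star_series(200.0) / q_star_leading(200.0))
```

Both ratios should approach 1 as $zx$ and $z$ grow, which is the behaviour the proof relies on when bounding the integral over $A_{2,z}$.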

7.2 Uniform consistency of estimators

Recall $u_{3,m} = \min_{1\le i\le m}\{1-\theta_i\}$ and $\xi(\theta) = (1-\theta)^{-1}$, such that $u_{3,m} = \min_{1\le i\le m}\xi^{-1}(\theta_i)$. Using Theorem 7, we show the uniform consistency and speed of convergence of the estimator for Gamma family.

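Theorem 8 Consider Gamma family such that $\{z_j\}_{j=1}^{m}$ are independent with parameters $\{(\theta_i,\sigma)\}_{i=1}^{m}$ and a fixed $\sigma > 0$. Let $\rho > 0$ be a finite constant. If $\sigma > 3/4$, then a uniform consistency class is

$$Q_{III}(\boldsymbol{\theta}, t, \pi_{1,m}; \gamma) = \left\{ \begin{array}{l} \|\boldsymbol{\theta}\| \le \rho,\ t = 4^{-1}\gamma u_{3,m}\log m,\ \lim_{m\to\infty}u_{3,m}\log m = \infty, \\ \pi_{1,m} \ge m^{(\gamma-1)/2},\ \lim_{m\to\infty} t\min_{i\in I_{1,m}}|\xi(\theta_0) - \xi(\theta_i)| = \infty \end{array} \right\}$$

for any fixed $\gamma \in (0,1]$. On the other hand, if $\sigma \le 3/4$, then a uniform consistency class is

$$Q_{III}(\boldsymbol{\theta}, t, \pi_{1,m}; \gamma) = \left\{ \begin{array}{l} \|\boldsymbol{\theta}\| \le \rho,\ t = 4^{-1}\gamma u_{3,m}\log m, \\ \pi_{1,m} \ge m^{(\gamma'-1)/2},\ \lim_{m\to\infty} t\min_{i\in I_{1,m}}|\xi(\theta_0) - \xi(\theta_i)| = \infty \end{array} \right\}$$

for any fixed $\gamma \in (0,1)$ and $\gamma' > \gamma$.

Proof. Recall

$$V_{III}^{(m)} = \frac{C}{m^{2}}\exp\left(\frac{4t}{u_{3,m}}\right)\sum_{i=1}^{m}\left(\frac{t}{1-\theta_i}\right)^{3/4-\sigma}$$

and (64), i.e.,

$$\Pr\left(|\hat{\varphi}_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})| \ge \lambda\right) \le \frac{1}{\lambda^{2}}V_{III}^{(m)},$$

where $u_{3,m} = \min_{1\le i\le m}\{1-\theta_i\}$ and $\theta_i < 1$. We divide the rest of the proof into two cases: $\sigma > 3/4$ or $\sigma \le 3/4$. If $\sigma > 3/4$ and $\|\boldsymbol{\theta}\| \le \rho$, then

$$V_{III}^{(m)} \le \frac{C}{m}\,t^{3/4-\sigma}\exp\left(\frac{4t}{u_{3,m}}\right)\max_{1\le i\le m}(1-\theta_i)^{\sigma-3/4} \le \frac{C}{m}\,t^{3/4-\sigma}\exp\left(\frac{4t}{u_{3,m}}\right),$$

where the last inequality uses $\|\boldsymbol{\theta}\| \le \rho$ to bound the maximum by a constant. So, we can set $t = 4^{-1}u_{3,m}\gamma\log m$ for any fixed $\gamma \in (0,1]$ to obtain

$$\Pr\left(|\hat{\varphi}_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})| \ge \lambda\right) \le \frac{C}{m^{1-\gamma}\lambda^{2}}\left(4^{-1}u_{3,m}\gamma\log m\right)^{3/4-\sigma}, \tag{70}$$

which implies

$$\Pr\left(\frac{|\hat{\varphi}_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})|}{\pi_{1,m}} \ge \varepsilon\right) \le C\varepsilon^{-2}\left(4^{-1}u_{3,m}\gamma\log m\right)^{3/4-\sigma} \to 0 \quad\text{as } m \to \infty$$

for any fixed $\varepsilon > 0$ whenever $\pi_{1,m} \ge Cm^{(\gamma-1)/2}$ and $u_{3,m}\gamma\log m \to \infty$. In contrast, if $\sigma \le 3/4$, then

$$V_{III}^{(m)} \le \frac{C}{m}\left(\frac{t}{u_{3,m}}\right)^{3/4-\sigma}\exp\left(\frac{4t}{u_{3,m}}\right).$$

So, we can still set $t = 4^{-1}u_{3,m}\gamma\log m$ for any fixed $\gamma \in (0,1)$, which implies

$$\Pr\left(\frac{|\hat{\varphi}_m(t,\mathbf{z}) - \varphi_m(t,\boldsymbol{\theta})|}{\pi_{1,m}} \ge \varepsilon\right) \le \frac{C\varepsilon^{-2}\left(4^{-1}\gamma\log m\right)^{3/4-\sigma}}{m^{\gamma'-\gamma}} \to 0 \quad\text{as } m \to \infty$$

whenever $\pi_{1,m} \ge Cm^{(\gamma'-1)/2}$ for any $\gamma' > \gamma$. Recall for (44)

$$\psi(t,\mu;\mu_0) = \frac{\zeta(\theta)}{\zeta(\theta_0)}\int_{[-1,1]}\cos\left(ts\left(\xi(\theta_0) - \xi(\theta)\right)\right)\omega(s)\,ds,$$

which converges to 0 as $m \to \infty$ when $\lim_{m\to\infty} t\min_{i\in I_{1,m}}|\xi(\theta_0) - \xi(\theta_i)| = \infty$. Noticing $u_{3,m} = \min_{1\le i\le m}\xi^{-1}(\theta_i)$, we have shown the claim.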

In Theorem 8, the speed of convergence and uniform consistency class depend on $u_{3,m} = \min_{1\le i\le m}\{1-\theta_i\}$ and $\xi_{3,m} = \min_{i\in I_{1,m}}|\xi(\theta_0) - \xi(\theta_i)|$. Since $\theta < 1$ for Gamma family, $u_{3,m}$ measures how close a $G_{\theta_i}$ is to the singularity where a Gamma density is undefined, and it is often sensible to assume $\liminf_{m\to\infty}u_{3,m} > 0$. On the other hand, $\sigma\xi(\theta) = \mu(\theta)$ for all $\theta \in \Theta$ for Gamma family. So, $\xi_{3,m}$ measures the minimal difference between the means of $G_{\theta_i}$ for $i \in I_{1,m}$ and $G_{\theta_0}$, and $\xi_{3,m}$ cannot be too small relative to $t^{-1}$ as $t \to \infty$ in order for the estimator to achieve consistency.
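To make the role of the tuning parameter in Theorem 8 concrete, the following Python sketch evaluates $t = 4^{-1}\gamma u_{3,m}\log m$ and the right-hand side of (70). It is an illustration under assumptions: the function names, the default constant `C = 1.0`, and the sample values of $u_{3,m}$, $\sigma$, $\gamma$, $\lambda$ and $m$ are hypothetical and are not taken from the paper.

```python
import math


def tuning_t_gamma(u3m, gamma, m):
    """t = 4^{-1} * gamma * u_{3,m} * log(m), the choice made in Theorem 8."""
    return 0.25 * gamma * u3m * math.log(m)


def bound_70(lam, u3m, sigma, gamma, m, C=1.0):
    """Right-hand side of (70): C * t^{3/4 - sigma} / (m^{1 - gamma} * lam^2).

    C is a placeholder for the unspecified constant in the paper.
    """
    t = tuning_t_gamma(u3m, gamma, m)
    return C * t ** (0.75 - sigma) / (m ** (1.0 - gamma) * lam ** 2)


if __name__ == "__main__":
    u3m, sigma, gamma, lam = 0.5, 1.5, 0.5, 0.1
    for m in (10 ** 4, 10 ** 6, 10 ** 8):
        t = tuning_t_gamma(u3m, gamma, m)
        # exp(4 t / u_{3,m}) equals m^gamma by construction of t.
        print(m, t, math.exp(4.0 * t / u3m), bound_70(lam, u3m, sigma, gamma, m))
```

The printed exponential factor equals $m^{\gamma}$, which is why the bound in (70) decays like $m^{\gamma-1}$ up to the power of $\log m$.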

8 Discussion

We have demonstrated that solutions of Lebesgue-Stieltjes integral equations can serve as a universal construction for proportion estimators, provided proportion estimators for random variables with three types of distributions, and justified under independence the uniform consistency and speed of convergence of such estimators. In order to achieve uniform consistency for estimators from Constructions II and III, we may need to estimate the supremum norm of the parameter vector or the infimum of a transform of the vector itself. In an accompanying article, we will deal with this issue, discuss the numerical implementation of Construction III, and report the non-asymptotic performances of the estimators. For the rest of this section, we will discuss future topics along the line of research presented here (see Section 8.1), uniform consistency in frequency domain for Constructions II and III and its relation to concentration inequalities for non-Lipschitz functions of independent random variables (see Section 8.2), and Construction III for Ressel and Hyperbolic Cosine families (see Section 8.3).

8.1 Topics worthy of future investigations

Our work induces four topics that are worthy of future investigations. Firstly, we have not considered functionals of the parameter space other than a parameter being equal to a fixed value. It would be interesting to construct uniformly consistent proportion estimators for different functionals serving various practical purposes. However, in terms of inducing a consistent estimator of the proportion of parameters that are no larger than a fixed value, none of Constructions I, II and III works since they are all based on the Riemann-Lebesgue Lemma, and the construction in Jin (2008) to estimate the proportion of parameters whose magnitudes are upper bounded by a fixed value for Gaussian family cannot be extended to deal with this case since the indicator function $1_{\{x\le x_0\}}$ with $x \in \mathbb{R}$ for any fixed $x_0$ does not have a Fourier transform. Further, we have only considered independent random variables. Extending the consistency results provided here to the dependent case will greatly enlarge the scope of applications of the estimators, as was done by Chen (2018) and Jin and Cai (2007) for Jin’s estimator of Jin (2008) for Gaussian family and Gaussian mixtures.

Secondly, we have only been able to construct proportion estimators for three families of distributions, and provide Gamma family as an example for Construction III. It is worthwhile to explore other settings for which solutions of Lebesgue-Stieltjes integro-differential equations exist, can be analytically expressed, and serve as consistent proportion estimators. Further, we have not studied optimal properties of the proposed estimators or their reciprocal conservativeness, both of which are important from a decision theoretic perspective and for non-asymptotic conservative FDR estimation and control, and are worth investigation. With regard to studying the optimality properties, the techniques of Cai and Jin (2010) and Carpentier and Verzelen (2017) may be useful.

Thirdly, for eight members of the twelve NEF-CVFs, we are able to construct proportion estimators as solutions of the proposed Lebesgue-Stieltjes integral equation, and it is known from Hassairi and Zarai (2004) that NEF-CVFs all have exponential generating functions and presentations via orthogonal polynomials. Further, Construction III for Gamma family involves the Bessel function of the first kind, and so do concentration inequalities for estimators from Construction II. It is challenging to explore the connection between solutions of Lebesgue-Stieltjes integral equations, exponential generating functions and second order differential equations.

Fourthly, we have introduced the concept of “the family of distributions with separable characteristic functions” (see Definition 1) for which

$$\left\{t \in \mathbb{R} : \hat{F}_{\mu_0}(t) = 0\right\} = \emptyset, \tag{71}$$

and shown that it contains several location-shift families. The requirement (71) precludes the characteristic function $\hat{F}_{\mu_0}$ from having any real zeros. We are aware that Poisson family is infinitely divisible but not a location-shift family, nor does it have separable characteristic functions since (8) does not hold. However, the relationships (with respect to set inclusion) between these families of distributions are unclear to us. So, a better understanding of such relationships will contribute both to the theory of probability distributions and to finding examples different than those given here that Constructions I, II and III apply to. On the other hand, (71) is on the real zeros of the Fourier transform of a sigma-finite measure, and investigation along this line has connections with George Pólya’s treatment of the Riemann Hypothesis via zeros of Fourier transform; see, e.g., Newman (1976).

8.2 On uniform consistency in frequency domain for Constructions II and III

For Construction I, recall (29) for location-shift families with separable CFs, i.e.,

$$\Pr\left(\sup_{\boldsymbol{\mu}\in B_m(\rho)}\left|\pi_{1,m}^{-1}\sup_{t\in[0,\gamma_m]}\hat{\varphi}_m(t,\mathbf{z}) - 1\right| \to 0\right) \to 1,$$

for which the estimator is also consistent uniformly in $t \in [0,\gamma_m]$ for a positive, increasing sequence $\gamma_m \to \infty$. This is referred to as “uniform consistency in frequency domain”. Even though theoretically it provides much flexibility in choosing different sequences of values for $t$ when estimating $\pi_{1,m}$, it does not have much practical value since $t$ needs to be large for $\varphi_m(t,\boldsymbol{\mu})$ to converge to $\pi_{1,m}$ fast so that $\hat{\varphi}_m(t,\mathbf{z})$ can accurately estimate $\pi_{1,m}$.

Such uniform consistency is a consequence of the uniform boundedness and global Lipschitz property of the transform on $\{z_i\}_{i=1}^{m}$ that is used to construct $K(t,x;\mu_0)$. In contrast, for Constructions II and III, the corresponding transform is not necessarily uniformly bounded or globally Lipschitz (see the comparison below), and uniform consistency in frequency domain is hard to achieve. In fact, it is very challenging to derive good concentration inequalities for sums of transformed independent random variables where the transform is neither bounded nor globally Lipschitz. For progress along this line when the transform is a polynomial, we refer readers to Kim and Vu (2000), Vu (2002) and Schudy and Sviridenko (2012).

Now we present the comparison. For location-shift families with separable CFs, recall

$$K(t,x;\mu_0) = \int_{[-1,1]}\frac{\omega(s)\,w(ts,x)}{r_{\mu_0}(ts)}\,ds,$$

where $w(y,x) = \cos\left(yx - h_{\mu_0}(y)\right)$ and $\|\partial_y h_{\mu_0}\|_{\infty} \le C_{\mu_0} < \infty$ is assumed. So, $\|w\|_{\infty} < \infty$ and

$$\|\partial_y w(\cdot,x)\|_{\infty} \le \tilde{C}_0|x| + \tilde{C} \tag{72}$$

for finite, positive constants $\tilde{C}_0$ and $\tilde{C}$ that do not depend on $x$. This, together with the location-shift property, implies uniform consistency in frequency domain for $\hat{\varphi}_m(t,\mathbf{z})$ for an admissible $\omega$.

Let us examine Constructions II and III. First, consider Construction II. For (40), i.e.,

$$K(t,x;\theta_0) = H(\eta_0)\int_{[-1,1]}w(ts,x)\,\omega(s)\,ds \quad\text{with } \eta_0 = e^{\theta_0},$$

where

$$w(y,x) = \frac{y^{x}\cos\left(2^{-1}\pi x - y\eta_0\right)}{c_x\,x!} \quad\text{for } y \ge 0 \text{ and } x \in \mathbb{N}.$$

So, $\|w\|_{\infty} = \infty$, and $\|\partial_y w(\cdot,x)\|_{\infty} \le \tilde{C}_0|x| + \tilde{C}$ does not hold. Secondly, consider Construction III. Recall (44) for Gamma family, i.e.,

$$K(t,x;\mu_0) = \frac{\Gamma(\sigma)}{\zeta(\theta_0)}\int_{[-1,1]}w(ts,x)\,\omega(s)\,ds,$$

where

$$w(y,x) = \sum_{n=0}^{\infty}\frac{(-yx)^{n}\cos\left(2^{-1}\pi n + y\xi(\theta_0)\right)}{n!\,\Gamma(\sigma+n)} \quad\text{for } y \ge 0 \text{ and } x > 0.$$

Decompose $w(y,x)$ into the sum of four series

$$S_{l'}(x,y) = \sum_{l=0}^{\infty}\frac{(-yx)^{4l+l'}\cos\left(2^{-1}\pi(4l+l') + y\xi(\theta_0)\right)}{(4l+l')!\,\Gamma(\sigma+4l+l')} \quad\text{for } l' \in \{0,1,2,3\}.$$

Then the summands in $S_{l'}(x,y)$ for each $l'$ have a fixed sign uniformly in $x$, $y$ and $n$. Further, there exists a sequence of $y \to \infty$ such that $\cos\left(2^{-1}\pi n + y\xi(\theta_0)\right)$ is positive uniformly in $n$. Thus, there exists a sequence of $x$ such that $\|w\|_{\infty} = \infty$.
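For the Construction II transform $w(y,x) = y^{x}\cos\left(2^{-1}\pi x - y\eta_0\right)/(c_x x!)$ recalled above, the Poisson family has $c_k k! = 1$, so $w(y,x) = y^{x}\cos\left(2^{-1}\pi x - y\eta_0\right)$. Summing against the Poisson probabilities $e^{-\eta}\eta^{x}/x!$ and using $\cos\left(2^{-1}\pi x - y\eta_0\right) = \Re\left[\iota^{x}e^{-\iota y\eta_0}\right]$ gives $E[w(y,Z)] = e^{-\eta}\cos\left(y(\eta-\eta_0)\right)$. The Python sketch below checks this identity numerically and shows how quickly $|w(y,x)|$ can grow in $x$; the function names and the truncation length are assumptions made for illustration.

```python
import math


def w_poisson(y, x, eta0):
    """w(y, x) = y^x * cos(pi * x / 2 - y * eta0), the Poisson case (c_x * x! = 1)."""
    return y ** x * math.cos(0.5 * math.pi * x - y * eta0)


def expected_w(y, eta, eta0, terms=150):
    """Truncated E[w(y, Z)] for Z ~ Poisson(eta); pmf evaluated in log space."""
    total = 0.0
    for x in range(terms):
        log_pmf = -eta + x * math.log(eta) - math.lgamma(x + 1)
        total += math.exp(log_pmf) * w_poisson(y, x, eta0)
    return total


if __name__ == "__main__":
    y, eta, eta0 = 1.5, 2.0, 1.0
    print("series     :", expected_w(y, eta, eta0))
    print("closed form:", math.exp(-eta) * math.cos(y * (eta - eta0)))
    # The transform itself is unbounded in x once y > 1.
    print("max |w| over x < 40, y = 3:", max(abs(w_poisson(3.0, x, eta0)) for x in range(40)))
```

Multiplying the closed form by $H(\eta_0) = e^{\eta_0}$ gives $e^{\eta_0-\eta}\cos\left(y(\eta-\eta_0)\right)$, which equals 1 at $\eta = \eta_0$, while the last printed value illustrates why boundedness of $w$ fails for Construction II.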

8.3 On Construction III for Ressel and Hyperbolic Cosine families

For Ressel and Hyperbolic Cosine families, which are non-location-shift NEF-CVFs, we suspect that Construction III cannot be implemented based on the following initial results on their moment sequences.

Example 16 Ressel family with basis

$$\frac{d\beta}{d\nu}(x) = f(x) = \frac{\sigma x^{x+\sigma-1}e^{-x}}{\Gamma(x+\sigma+1)}\,1_{(0,\infty)}(x) \quad\text{for } \sigma > 0 \tag{73}$$

and variance function $V(\mu) = \frac{\mu^{2}}{\sigma}\left(1 + \frac{\mu}{\sigma}\right)$ for $\mu > 0$. Note that $\int_{0}^{\infty}\beta(dx) = 1$ by Proposition 5.5 of Letac and Mora (1990). From Bar-Lev et al. (2016) and references therein, we know the following: the Laplace transform $L_{\sigma}(\theta)$ of $\beta$ cannot be explicitly expressed in $\theta$; $L_1(\theta) = \exp\left(-\tilde{\eta}(-\theta)\right)$, where $\tilde{\eta}(-\theta)$ is the solution to the functional equation $\tilde{\eta}(-\theta) = \log\left(1 + \tilde{\eta}(-\theta) - \theta\right)$ with $\theta \le 0$; $L_{\sigma}(\theta) = \left(L_1(\theta)\right)^{\sigma}$, $\theta(\mu) = \log\frac{1+\mu}{\mu} - \mu^{-1}$ and $L_1(\theta(\mu)) = \frac{\mu}{1+\mu}$; and $\lim_{x\to 0+}\frac{d\beta}{d\nu}(x) = 0$ when $\sigma > 1$.

Let us compute $\tilde{c}^{*}_{n} = \int x^{n}e^{x\theta}\beta(dx)$ for $n \in \mathbb{N}$. Recall Hankel’s formula for the reciprocal Gamma function, i.e.,

$$\frac{1}{\Gamma(z)} = \frac{\iota}{2\pi}\int_{\mathcal{C}}(-t)^{-z}e^{-t}\,dt \quad\text{with } \Re(z) > 0,$$

where $\mathcal{C}$ is the Hankel contour that wraps the non-negative real axis counterclockwise once and the logarithm function $\log$ is such that $\log(-t) \in \mathbb{R}$ for $t < 0$; see, e.g., Section 12.22 of Whittaker and Watson (1940) for details on this. Then, for $n \ge 1$ we obtain

$$\frac{x^{x+\sigma}}{\Gamma(x+\sigma+1)} = \frac{\iota}{2\pi}\int_{\mathcal{C}}(-t)^{-x-\sigma-1}e^{-xt}\,dt \quad\text{for } x > 0$$

and

$$\tilde{c}^{*}_{n} = \int_{0}^{\infty}\frac{\sigma x^{x+\sigma+n-1}}{\Gamma(x+\sigma+1)}e^{-x(1-\theta)}\,dx = \frac{\iota}{2\pi}\int_{\mathcal{C}}(-t)^{-\sigma-1}\,dt\int_{0}^{\infty}\sigma x^{n-1}\exp\left(-x\left(1-\theta+t+\log(-t)\right)\right)dx = \frac{\iota\sigma(n-1)!}{2\pi}\int_{\mathcal{C}}\frac{(-t)^{-\sigma-1}}{\left(1-\theta+t+\log(-t)\right)^{n}}\,dt. \tag{74}$$

Let $b(\theta)$ be the lower real branch of the solutions in $t$ to the functional equation $1-\theta+t+\log(-t) = 0$. Then $b(\theta)$ is the Lambert W function $W_{-1}(\tilde{z})$ with $\tilde{z} = -\exp(\theta-1)$ and domain $\tilde{z} \in [-e^{-1},0)$ such that $W_{-1}(\tilde{z})$ decreases from $W_{-1}\left(-e^{-1}\right) = -1$ to $W_{-1}(0-) = -\infty$; see, e.g., Corless et al. (1996) for details on this. When $\theta < 0$, $b(\theta) > -1$. Since $b(\theta)$ is a pole of order $n$ for the integrand $\tilde{R}(t)$ in (74), the residue theorem implies

$$\tilde{c}^{*}_{n} = -\sigma\lim_{t\to b(\theta)}\frac{d^{n-1}}{dt^{n-1}}\left[\left(t-b(\theta)\right)^{n}\tilde{R}(t)\right].$$

In particular,

$$\tilde{c}^{*}_{1} = -\sigma\,\frac{\left(-b(\theta)\right)^{-\sigma-1}}{1 + b(\theta)^{-1}} = -\sigma\,\frac{(-1)^{-\sigma-1}\,W_{-1}\left(-\exp(\theta-1)\right)^{-\sigma}}{1 + W_{-1}\left(-\exp(\theta-1)\right)},$$

and $\tilde{c}^{*}_{n}$ is a complicated function in $W_{-1}\left(-\exp(\theta-1)\right)$ when $n$ is large. So, Ressel family is unlikely to have a separable moment sequence.
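The link between the functional equation $1-\theta+t+\log(-t) = 0$ and the branch $W_{-1}$ in Example 16 can be verified numerically without a special-function library. The Python sketch below is an illustration only: the helper name `b_lower` and the bisection scheme are assumptions, not part of the paper. Writing $t = -s$ with $s \ge 1$ turns the equation into $s - \log s = 1-\theta$, whose left-hand side is increasing on $[1,\infty)$, so bisection applies; the returned root is then checked against the defining relation $te^{t} = -\exp(\theta-1)$ of the Lambert W function.

```python
import math


def b_lower(theta, iterations=200):
    """Root t <= -1 of 1 - theta + t + log(-t) = 0 for theta <= 0, found by bisection.

    With t = -s the equation reads s - log(s) = 1 - theta, and s - log(s) is
    increasing on [1, inf), so the root in s is unique there.
    """
    if theta > 0:
        raise ValueError("theta must be non-positive")
    target = 1.0 - theta
    lo, hi = 1.0, 2.0
    while hi - math.log(hi) < target:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid - math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    return -0.5 * (lo + hi)


if __name__ == "__main__":
    for theta in (-0.1, -1.0, -3.0):
        t = b_lower(theta)
        residual = 1.0 - theta + t + math.log(-t)
        print(theta, t, residual, t * math.exp(t), -math.exp(theta - 1.0))
```

The last two printed columns agree, and the roots move from $-1$ towards $-\infty$ as $\theta$ decreases, consistent with the range of $W_{-1}$ on $[-e^{-1},0)$ described above.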

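Example 17 Hyperbolic Cosine family with basis

$$\frac{d\beta}{d\nu}(x) = \frac{2^{\sigma-2}}{\pi\Gamma(\sigma)}\left|\Gamma\left(\frac{\sigma}{2} + \iota\frac{x}{2}\right)\right|^{2} \quad\text{for } \sigma > 0 \text{ and } x \in \mathbb{R}$$

and Fourier transform $L(\iota t) = (\cosh t)^{-\sigma}$ as shown on page 28 of Letac and Mora (1990). So, $L(\theta) = 2^{\sigma}\left(e^{\iota\theta} + e^{-\iota\theta}\right)^{-\sigma} = (\cos\theta)^{-\sigma}$ for $|\theta| < 2^{-1}\pi$, $\mu(\theta) = \sigma\tan\theta$ and $V(\mu) = \sigma\left(1 + \frac{\mu^{2}}{\sigma^{2}}\right)$ for $\mu \in \mathbb{R}$. From Theorems 2 and 3 of Morris (1982), we see that $\tilde{c}^{*}_{n} = \int x^{n}e^{\theta x}\beta(dx)$ for $n \ge 3$ is a polynomial of degree $n$ in $\mu$ with at least one nonzero term of order between 1 and $n-1$. So, Hyperbolic Cosine family is unlikely to have a separable moment sequence.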
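The mean and variance function quoted in Example 17 can be recovered from $L(\theta) = (\cos\theta)^{-\sigma}$ with the standard NEF identities $\mu(\theta) = \frac{d}{d\theta}\log L(\theta)$ and $V(\mu(\theta)) = \frac{d}{d\theta}\mu(\theta)$. A short worked derivation, assuming only these two identities:

```latex
\begin{aligned}
\mu(\theta) &= \frac{d}{d\theta}\log L(\theta)
             = -\sigma\,\frac{d}{d\theta}\log\cos\theta
             = \sigma\tan\theta, \\
V(\mu(\theta)) &= \frac{d}{d\theta}\mu(\theta)
             = \sigma\sec^{2}\theta
             = \sigma\left(1+\tan^{2}\theta\right)
             = \sigma\left(1+\frac{\mu^{2}(\theta)}{\sigma^{2}}\right),
\qquad |\theta| < 2^{-1}\pi .
\end{aligned}
```

Since $\tan\theta$ ranges over all of $\mathbb{R}$ on this interval, the variance function is defined for every $\mu \in \mathbb{R}$, as stated in the example.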

Acknowledgements

Part of the research was funded by the New Faculty Seed Grant provided by Washington State University. I am very grateful to Gérard Letac for his guidance on my research involving natural exponential families and constant encouragement, Jiashun Jin for providing the technical report Jin (2006) and warm encouragement, Kevin Vixie, Hong-Ming Yin and Charles N. Moore for discussions on solutions of integro-differential equations, Sheng-Chi Liu for discussions on solutions of algebraic equations, and Ovidiu Costin and Sergey Lapin for help with access to two papers.

References

Auer, P. and Doerge, R. (2010). Statistical design and analysis of RNA-Seq data, Genetics 185: 405–416.

Bar-Lev, S., Boukai, B. and Landsman, Z. (2016). The Kendal-Ressel exponential dispersion model: Some statistical aspects and estimation, International Journal of Statistics and Probability 5(3).

Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency, Ann. Statist. 29(4): 1165–1188.

Blanchard, G. and Roquain, E. (2009). Adaptive false discovery rate control under independence and dependence, J. Mach. Learn. Res. 10: 2837–2871.

Cai, T. T. and Jin, J. (2010). Optimal rates of convergence for estimating the null density and proportion of nonnull effects in large-scale multiple testing, Ann. Statist. 38(1): 100–145.

Cai, T. T. and Sun, W. (2009). Simultaneous testing of grouped hypotheses: Finding needles in multiple haystacks, J. Amer. Statist. Assoc. 104(488): 1467–1481.

Carpentier, A. and Verzelen, N. (2017). Adaptive estimation of the sparsity in the Gaussian vector model, arXiv:1703.00167.

Chen, X. (2018). Consistent FDR estimation for adaptive multiple testing Normal means under principal correlation structure, arXiv:1410.4275v4.

Chen, X. and Doerge, R. W. (2017). A weighted FDR procedure under discrete and heterogeneous null distributions, arXiv:1502.00973v4.

Chen, X., Doerge, R. W. and Heyse, J. F. (2018). Multiple testing with discrete data: proportion of true null hypotheses and two adaptive FDR procedures, Biometrical Journal 60(4): 761–779.

Corless, R. M., Gonnet, G. H., Hare, D. E. G., Jeffrey, D. J. and Knuth, D. E. (1996). On the Lambert W function, Adv. Comput. Math. 5(1): 329–359.

Costin, O., Falkner, N. and McNeal, J. D. (2016). Some generalizations of the Riemann–Lebesgue lemma, Am. Math. Mon. 123(4): 387–391.

Di, Y., Schafer, D. W., Cumbie, J. S. and Chang, J. H. (2011). The NBP negative binomial model for assessing differential gene expression from RNA-Seq, Stat. Appl. Genet. Mol. Biol. 10(1): 24 pages.

Efron, B., Tibshirani, R., Storey, J. D. and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment, J. Amer. Statist. Assoc. 96(456): 1151–1160.

Fischer, M. J. (2014). Generalized Hyperbolic Secant Distributions, Springer.

Genovese, C. and Wasserman, L. (2004). A stochastic process approach to false discovery control, Ann. Statist. 32(3): 1035–1061.

Gilbert, P. B. (2005). A modified false discovery rate multiple-comparisons procedure for discrete data, applied to human immunodeficiency virus genetics, J. R. Statist. Soc. Ser. C 54(1): 143–158.

Gronwall, T. H. (1932). An inequality for the Bessel functions of the first kind with imaginary argument, Ann. Math. 33(2): 275–278.

Hassairi, A. and Zarai, M. (2004). Characterization of the cubic exponential families by orthogonality of polynomials, Ann. Probab. 32(3B): 2463–2476.

Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc. 58(301): 13–30.

Hu, J. X., Zhao, H. and Zhou, H. H. (2010). False discovery rate control with groups, J. Amer. Statist. Assoc. 105(491): 1215–1227.

Jin, J. (2006). Proportions of nonzero normal means: universal oracle equivalence and uniformly consistent estimations, Technical report, Department of Statistics, Purdue University, West Lafayette.

Jin, J. (2008). Proportion of non-zero normal means: universal oracle equivalences and uniformly consistent estimators, J. R. Statist. Soc. Ser. B 70(3): 461–493.

Jin, J. and Cai, T. T. (2007). Estimating the null and the proportion of nonnull effects in large-scale multiple comparisons, J. Amer. Statist. Assoc. 102(478): 495–506.

Jin, J., Peng, J. and Wang, P. (2010). A generalized Fourier approach to estimating the null parameters and proportion of nonnull effects in large-scale multiple testing, J. Stat. Res. 44(1): 103–107.

Kim, J. H. and Vu, V. H. (2000). Concentration of multivariate polynomials and its applications, Combinatorica 20(3): 417–434.

Kumar, P. R. and Bodhisattva, S. (2016). Estimation of a two-component mixture model with applications to multiple testing, J. R. Statist. Soc. Ser. B 78(4): 869–893.

Letac, G. (1992). Lectures on natural exponential families and their variance functions, Monografias de Matemática, 50, IMPA, Rio de Janeiro.

Letac, G. and Mora, M. (1990). Natural real exponential families with cubic variance functions, Ann. Statist. 18(1): 1–37.

Liu, Y., Sarkar, S. K. and Zhao, Z. (2016). A new approach to multiple testing of grouped hypotheses, J. Stat. Plan. Inference 179: 1–14.

Lukacs, E. (1970). Characteristic Functions, Hafner Publishing Company.

Luo, S. and Zhang, Z. (2004). Estimating the first zero of a characteristic function, Comptes Rendus Mathematique 338(3): 203–206.

Mehrotra, D. V. and Heyse, J. F. (2004). Use of the false discovery rate for evaluating clinical safety data, Stat. Methods Med. Res. 13(3): 227–238.

Meinshausen, N. and Rice, J. (2006). Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses, Ann. Statist. 34(1): 373–393.

Morris, C. N. (1982). Natural exponential families with quadratic variance functions, Ann. Statist. 10(1): 65–80.

Newman, C. M. (1976). Fourier transforms with only real zeros, Proc. Amer. Math. Soc. 61(2): 245–251.

Olver, F. W. J. (1974). Asymptotics and special functions, Academic Press, Inc., New York.

Pitman, J. and Yor, M. (2003). Infinitely divisible laws associated with hyperbolic functions, Canad. J. Math. 55(2): 292–330.

Ploner, A., Calza, S., Gusnanto, A. and Pawitan, Y. (2006). Multidimensional local false discovery rate for microarray studies, Bioinformatics 22(5): 556–565.

Sarkar, S. K. (2006). False discovery and false nondiscovery rates in single-step multiple testing procedures, Ann. Statist. 34(1): 394–415.

Sarkar, S. K. (2008). On methods controlling the false discovery rate, Sankhyā: Series A 70(2): 135–168.

Schudy, W. and Sviridenko, M. (2012). Concentration and moment inequalities for polynomials of independent random variables, Proceedings of the Twenty-third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’12, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, pp. 437–446.

Storey, J. (2003). The positive false discovery rate: a Bayesian interpretation and the q-value, Ann. Statist. 31(6): 2013–2035.

Storey, J. D. (2002). A direct approach to false discovery rates, J. R. Statist. Soc. Ser. B 64(3): 479–498.

Storey, J. D. (2007). The optimal discovery procedure: a new approach to simultaneous significance testing, J. R. Statist. Soc. Ser. B 69(3): 347–368.

Storey, J. D., Taylor, J. E. and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach, J. R. Statist. Soc. Ser. B 66(1): 187–205.

Swanepoel, J. W. H. (1999). The limiting behavior of a modified maximal symmetric 2s-spacing with applications, Ann. Statist. 27(1): 24–35.

Szegö, G. (1975). Orthogonal polynomials, American Mathematical Society, New York.

Talagrand, M. (1996). Majorizing measures: the generic chaining, Ann. Probab. 24(3): 1049–1103.

Vu, V. H. (2002). Concentration of non-Lipschitz functions and applications, Random Structures & Algorithms 20(3): 262–316.

Whittaker, E. and Watson, G. (1940). A Course of Modern Analysis, Cambridge University Press.
