
Pure Mathematical Sciences, Vol. 3, 2014, no. 2, 87-95. HIKARI Ltd, www.m-hikari.com, http://dx.doi.org/10.12988/pms.2014.437

Generalized Likelihood Functions and Random Measures

Christos E. Kountzakis
Department of Mathematics, University of the Aegean, Karlovassi GR-83 200, Samos, Greece

Copyright © 2014 Christos E. Kountzakis. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract. The density estimators we construct in this paper, called generalized likelihoods, rely on specific random measures. The key idea is to approximate an unknown density function, given a sample of its observations, by another function built from a family of known density functions. From this point of view we conclude that, under some assumptions, the classes of distributions constituting the maximum domains of attraction can be viewed as two instead of three, and we discuss the question of when two equivalence classes of probability distributions differ.

Mathematics Subject Classification: 62K05; 60G57; 62K20

Keywords: Random measures; Density estimation

1 Introduction

Many methods for the estimation of an unknown density function f by means of functions of i.i.d. random variables X_1, X_2, ..., X_n have been proposed in recent years. These include the kernel methods studied in [4] and the characteristic function approach studied in [5], while the properties of Lévy random measures studied in [3] have shown that in certain cases these estimates are best possible. In [4], the problem of estimating a probability density function was examined, and the problem of determining its mode as well: the construction of a family of estimates of f(x), and of its mode, which are consistent and asymptotically normal, was indicated. The problem of estimating the mode of a probability density function is somewhat similar to the problem of maximum likelihood estimation of a parameter. Also, a random measure is said to be a submeasure of a second random measure if its probability law is absolutely continuous with respect to that of the second. In [3], it was proved that if the second measure is a Lévy random measure, then the submeasure is Lévy if and only if the Radon-Nikodym derivative satisfies a natural factorization condition. In this paper, we show that the Weibull distribution family does, and the Fréchet distribution family does not, belong to the equivalence class of densities defined by the kernel of Gumbel distributions. For this we use a random measure in accordance with the definition given in [3]. These families are important since they are the Maximum Domains of Attraction, but we also discuss other distribution families.

2 Random Measures and Kernel Densities

We consider a probability space (Ω, F, μ) and a measurable space (E, E). A random measure, in the sense of [3], ξ on (E, E) over the probability space (Ω, F, μ) is a map ξ : E × Ω → R₊ such that:

1. the map ω → ξ(A, ω) is a random variable for any A ∈ E;
2. the map A → ξ(A, ω) is a measure on E, μ-almost surely in Ω.

In what follows, the measurable spaces (Ω, F) and (E, E) are identified with (R, B_R), while μ = λ, where λ denotes the Lebesgue measure on R.

In [5] and [4], a function K : R → R is called a kernel if it satisfies the following properties, in the sequel called the kernel conditions:

(i) sup{|K(x)|, x ∈ R} < ∞,
(ii) lim_{x→∞} |xK(x)| = 0,
(iii) ∫_R |K(x)| dx < ∞,
(iv) ∫_R K(x) dx = 1,
(v) ∫_R x^i K(x) dx = 0, i = 1, 2, ..., m − 1,
(vi) ∫_R |x|^{m − 1/p} |K(x)| dx < ∞,

where m ∈ N, p ∈ R₊₊.


The Sobolev space W_p^{(m)}(M), where M > 0, p > 0 and m ∈ N, is the function space over R consisting of the functions
{f : R → R | f^{(k)} is absolutely continuous for k = 1, 2, ..., m − 1, f^{(m)} ∈ L_p(R), ‖f^{(m)}‖_p ≤ M}.

A function f : I → R, where I is an interval of R, is absolutely continuous on I if for any ε > 0 there is a δ > 0 such that whenever a finite family of sub-intervals (x_k, y_k), k = 1, 2, ..., m of I satisfies Σ_{i=1}^m (y_i − x_i) < δ, then Σ_{i=1}^m |f(y_i) − f(x_i)| < ε holds. A function f is absolutely continuous on R if it is absolutely continuous on every interval of R. A criterion for absolute continuity is the existence of the first derivative f′ λ-almost everywhere together with the Lebesgue integrability of f′.

The estimator of the density f ∈ W_p^{(m)}(M), which relies on the ordered sample t_1, t_2, ..., t_n of it and on the kernel K, is
f̂_n(x) = (1/(n h_n)) Σ_{i=1}^n K((x − t_i)/h_n),
where h_n > 0 is selected so that h_n → 0 and n h_n → ∞; for example, h_n = 1/√n, n ∈ N.
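The estimator above is the classical Parzen kernel estimator. A minimal numerical sketch, assuming a Gaussian kernel and the bandwidth h_n = 1/√n mentioned above (the sample law, seed and helper names are our choices):

```python
import numpy as np

def gaussian_kernel(u):
    # Standard normal density; a kernel satisfying the conditions for m = 2, p = 1.
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kernel_density_estimate(x, sample, kernel, h):
    """Parzen estimator: f_hat_n(x) = (1/(n h)) * sum_i K((x - t_i) / h)."""
    t = np.asarray(sample, dtype=float)
    return np.sum(kernel((x - t) / h)) / (t.size * h)

rng = np.random.default_rng(0)
sample = np.sort(rng.normal(size=400))   # ordered sample t_1 <= ... <= t_400
h_n = 1.0 / np.sqrt(sample.size)         # h_n -> 0 while n * h_n = sqrt(n) -> infinity
f_hat = kernel_density_estimate(0.0, sample, gaussian_kernel, h_n)
```

For a standard normal sample, f_hat should lie near the true value f(0) ≈ 0.3989, up to sampling noise.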



If we suppose that m = 2, p = 1 and K(x) ≥ 0, x ∈ R, then the above conditions become
sup{K(x), x ∈ R} < ∞,  lim_{x→∞} |x|K(x) = 0,  ∫_R K(x) dx = 1,  ∫_R xK(x) dx = 0.
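For m = 2, p = 1 these conditions are easy to check numerically for a candidate kernel. A sketch for the standard Gaussian density (the grid and tolerances are our choices):

```python
import numpy as np

def gaussian_kernel(u):
    # Candidate kernel: standard normal density.
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

grid = np.linspace(-50.0, 50.0, 200001)   # fine symmetric grid; tails beyond are negligible
dx = grid[1] - grid[0]
vals = gaussian_kernel(grid)

supremum = vals.max()                     # sup K(x) < infinity
tail = 1e3 * gaussian_kernel(1e3)         # |x| K(x) at a large x, effectively 0
mass = np.sum(vals) * dx                  # Riemann sum for the integral of K, ~ 1
first_moment = np.sum(grid * vals) * dx   # integral of x K(x), ~ 0 by symmetry
```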


According to [5, Theorem 4.1],
∫_R (f(x) − f̂_n(x))² dx → 0
as n → ∞, and for m = 2, p = 1 the above integral is less than c/n^{2/3}, where c > 0, if f ∈ W_1^{(2)}.
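The L²-convergence and its rate can be observed empirically. A sketch that approximates the integrated squared error on a grid for increasing n (the target density, seed, grid and the bandwidth choice h_n = 1/√n are ours):

```python
import numpy as np

def ise(n, rng, grid):
    """Grid approximation of int (f - f_hat_n)^2 dx for f the standard normal density."""
    t = np.sort(rng.normal(size=n))      # ordered sample from f
    h = 1.0 / np.sqrt(n)
    # Gaussian-kernel Parzen estimate evaluated on the whole grid at once
    f_hat = np.exp(-0.5 * ((grid[:, None] - t[None, :]) / h) ** 2).sum(axis=1)
    f_hat /= n * h * np.sqrt(2.0 * np.pi)
    f = np.exp(-0.5 * grid ** 2) / np.sqrt(2.0 * np.pi)
    return np.sum((f - f_hat) ** 2) * (grid[1] - grid[0])

rng = np.random.default_rng(7)
grid = np.linspace(-5.0, 5.0, 2001)
errors = [ise(n, rng, grid) for n in (50, 500, 5000)]   # expected to shrink as n grows
```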

We may use other density functions which satisfy the previous kernel conditions in order to approximate another unknown density function from observations of it. We consider a random measure ξ : B_R × Y → R₊ such that:

1. the map y → ξ(A, y) is a random variable for any A ∈ B_R;
2. the map A → ξ(A, y) is a measure on B_R, λ-almost surely in Y;
3. λ is the Lebesgue measure restricted to the open set Y ⊆ R.

In this way, for a single-parameter family of densities satisfying the kernel conditions, whose parameter space is the set Y, we may define the random measure ξ_K whose measure part is ξ_K(y)(x) = K_y(x), x ∈ R, y ∈ Y. If we take a single-parameter family of density kernels whose parameter takes the value y, or equivalently any measure-value of the random measure ξ_K on Y, then the convergence result [5, Theorem 4.1] takes the following form.

Theorem 2.1 If f(x) is an unknown density in W_1^{(2)} and t_1, t_2, ..., t_n is an ordered sample from it, then
∫_R (f(x) − f̂_n(x))² dx → 0,
and for m = 2, p = 1 the above integral is less than c_1/n^{2/3}, where c_1 > 0 and
f̂_n(x) = (1/(n h_n)) Σ_{i=1}^n K_y((x − t_i)/h_n),
where h_n > 0 is selected so that h_n → 0 and n h_n → ∞, y ∈ Y, and every K_y satisfies the kernel conditions.


Proof: The proof arises from [5, Theorem 4.1].

Definition 2.2 The function
f̂_n(x) = (1/(n h_n)) Σ_{i=1}^n K_y((x − t_i)/h_n)
is called the generalized likelihood function for f(x).
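Definition 2.2 can be sketched directly: the only change relative to the plain Parzen estimator is that the kernel now carries the parameter y ∈ Y. A minimal illustration where {K_y} is, for concreteness, the family of centered Gaussians with scale y (this family choice and the names are our assumptions):

```python
import numpy as np

def K_y(u, y):
    """Single-parameter kernel family on Y = (0, inf): centered Gaussian with scale y."""
    return np.exp(-0.5 * (u / y) ** 2) / (y * np.sqrt(2.0 * np.pi))

def generalized_likelihood(x, sample, y, h):
    """f_hat_n(x) = (1/(n h)) * sum_i K_y((x - t_i) / h)  (Definition 2.2)."""
    t = np.asarray(sample, dtype=float)
    return np.sum(K_y((x - t) / h, y)) / (t.size * h)

rng = np.random.default_rng(1)
t = np.sort(rng.normal(size=500))        # ordered sample from an unknown density
f_hat = generalized_likelihood(0.0, t, y=1.0, h=1.0 / np.sqrt(500))
```

Every K_y here integrates to one and has vanishing first moment, so each parameter value y gives an admissible kernel in the sense of the kernel conditions.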

3 Classes of Equivalence for Densities

The closure of the class K_Y = {K_y, y ∈ Y}, Y ⊆ R, is
F_K = {f(x) ∈ W_{1,+}^{(2)} | ∫_{−∞}^{+∞} (f(x) − f̂_{K_y,n}(x))² dx → 0, n → +∞},
for some ordered sample t_1, t_2, ..., t_n, ... from f(x) and for some sequence (h_n)_{n∈N} such that h_n > 0 and h_n → 0, while n h_n → +∞.

We may define a binary relation ≈ between the densities in W_1^{(2)}, according to the families of kernels which satisfy the kernel conditions and approximate them. This relation is defined as follows: f ≈ g ⇔ f, g ∈ F_K.

Theorem 3.1 The relation ≈ is an equivalence relation, namely reflexive, symmetric and transitive.

Proof: (i) If f ∈ F_K, then f, f ∈ F_K, which implies f ≈ f (reflexivity). (ii) If f, g ∈ F_K, namely f ≈ g, then g, f ∈ F_K, hence g ≈ f (symmetry). (iii) If f, g ∈ F_K, namely f ≈ g, while g, z ∈ F_K, namely g ≈ z, we deduce that f, z ∈ F_K, namely f ≈ z (transitivity).

Theorem 3.2 The relation ≈ divides the set of densities f ∈ W_1^{(2)} into disjoint equivalence classes.

Proof: The proof relies on the fact that if an unknown density f ∈ W_1^{(2)} belonged to a pair of different classes F_K, F_L, then for the same ε > 0 we would obtain
∫_R ((1/(n h_n)) Σ_{i=1}^n K_y((x − t_i)/h_n) − f(x))² dx < ε²/8, n ≥ n_1(ε),
∫_R ((1/(n η_n)) Σ_{i=1}^n L_y((x − τ_i)/η_n) − f(x))² dx < ε²/8, n ≥ n_2(ε),


where L is another kernel, τ_i, i = 1, 2, ..., n are sample points from the unknown density too, and (η_n)_{n∈N} is a sequence of real numbers such that η_n > 0 and η_n → 0, while n η_n → +∞. We consider n_0(ε) = max{n_1(ε), n_2(ε)}. For this n_0(ε), by the Minkowski inequality we obtain
∫_R ((1/(n h_n)) Σ_{i=1}^n K_y((x − t_i)/h_n) − (1/(n η_n)) Σ_{i=1}^n L_y((x − τ_i)/η_n))² dx < ε²/4, n = n_0(ε).
The last inequality implies that
(1/(n h_n)) Σ_{i=1}^n K_y((x − t_i)/h_n) = (1/(n η_n)) Σ_{i=1}^n L_y((x − τ_i)/η_n), n = n_0(ε), x ∈ R,
a contradiction.

Theorem 3.3 The family of normalized Gumbel distributions can be used as kernels for the approximation of a Weibull distribution according to Theorem 2.1.

Proof: This arises from the fact that a normalized Gumbel density f (with ∫_R xf(x) dx = 0) satisfies the kernel conditions, while the abstract Weibull density h lies in W_1^{(2)}. The normalized Gumbel densities K(x) = (1/b) exp(−z − exp(−z)), z = (x − m)/b, b > 0, m ∈ R, x ∈ R, are exactly the ones with ∫_R xK(x) dx = 0 and ∫_R x²K(x) dx = 1; that they satisfy the remaining kernel conditions
sup{K(x), x ∈ R} < ∞, lim_{x→∞} |x|K(x) = 0, ∫_R K(x) dx = 1,
is verified as follows. The third condition holds because K(x) is a density function. The first is verified by the functional form of K(x), specifically from the inequality |K(x)| ≤ exp(−exp(−z)). The second is verified by the inequality |x|K(x) ≤ e^z (1/b) e^{−z} exp(−exp(−z)) = (1/b) exp(−exp(−z)).

Corollary 3.4 There is at least one non-heavy-tailed distribution density approximated by heavy-tailed distributions in the sense of Theorem 2.1.

Proof: Normalized Gumbel distributions belong to the class of heavy-tailed distributions. On the other hand, the Weibull distribution density f(x) = (k/a)(x/a)^{k−1} exp(−(x/a)^k), x ≥ 0, for k = 1 equals an exponential distribution density, which is the density of a non-heavy-tailed distribution. For classes of heavy-tailed distributions see, for example, [2].

Theorem 3.5 The family of Fréchet distributions cannot be approximated by normalized Gumbel distributions according to Theorem 2.1. This holds because a Fréchet density k(x) does not belong to the Sobolev space W_1^{(2)}.
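The normalization in Theorem 3.3 (∫xK dx = 0 and ∫x²K dx = 1) pins down the Gumbel parameters: a Gumbel(m, b) law has mean m + bγ, with γ the Euler-Mascheroni constant, and variance π²b²/6, which forces b = √6/π and m = −bγ (this derivation is ours). A numeric sanity check of the moment conditions (the grid is our choice):

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329          # Euler-Mascheroni constant

b = np.sqrt(6.0) / np.pi                  # makes the variance pi^2 b^2 / 6 equal 1
m = -b * EULER_GAMMA                      # makes the mean m + b * gamma equal 0

def gumbel_kernel(x):
    z = (x - m) / b
    return np.exp(-z - np.exp(-z)) / b

grid = np.linspace(-15.0, 60.0, 750001)   # the right tail decays only exponentially
dx = grid[1] - grid[0]
vals = gumbel_kernel(grid)

mass = np.sum(vals) * dx                  # ~ 1: K is a density
mean = np.sum(grid * vals) * dx           # ~ 0: first moment vanishes
second_moment = np.sum(grid ** 2 * vals) * dx   # ~ 1: unit second moment
```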


Proof: The Fréchet distribution density is
f(x) = (a/s) ((x − m)/s)^{−a−1} exp(−((x − m)/s)^{−a}), x > m, s > 0.
The first derivative of f(x) is
f′(x) = −a(am + sa − ax + s) exp(−((x − m)/s)^{−a}) ((x − m)/s)^{−a} / (s(m − x)²).
The second derivative of f(x) is
f″(x) = −a(a²(m + s − x)² + as(2m + 3s − 2x) + 2s²) exp(−((x − m)/s)^{−a}) ((x − m)/s)^{−a} / (s²(m − x)³).
For x > m and s > 0,
∫_m^∞ [−a(a²(m + s − x)² + as(2m + 3s − 2x) + 2s²) exp(−((x − m)/s)^{−a}) ((x − m)/s)^{−a} / (s²(m − x)³)] dx = ∞.
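The differentiation above can be double-checked symbolically. A sketch with sympy at the illustrative parameter values m = 0, s = 1, a = 2 (our choices), comparing the symbolic second derivative against a central finite difference:

```python
import sympy as sp

x = sp.Symbol('x', positive=True)
m_, s_, a_ = 0, 1, 2                       # illustrative Frechet parameters (our choice)
f = (a_ / s_) * ((x - m_) / s_) ** (-a_ - 1) * sp.exp(-((x - m_) / s_) ** (-a_))

f2 = sp.diff(f, x, 2)                      # symbolic f''(x)

# Cross-check at a point x0 > m via a central finite difference.
g = sp.lambdify(x, f, 'math')
x0, h = 1.5, 1e-4
fd = (g(x0 + h) - 2.0 * g(x0) + g(x0 - h)) / h ** 2
sym = float(f2.subs(x, x0))
```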

Hence f″ is not Lebesgue integrable, the first derivative of the Fréchet density is not absolutely continuous on R, and so the Fréchet density does not belong to the Sobolev space W_1^{(2)}.

Definition 3.6 Two distribution families K, L are called different if there is at least one member (density) f ∈ L such that f ∉ F_K holds.

Corollary 3.7 The (normalized) Gumbel family is different from the Fréchet family.

Proof: See the proof of the previous Theorem.

We may provide some more examples of differing distribution families.

Theorem 3.8 The exponential family is different from the Gaussian family of distributions.

Proof: Let us consider the family of canonical kernels
K_σ(x) = (1/(√(2π) σ)) exp{−x²/(2σ²)},
where σ > 0. The kernel conditions
sup{K_σ(x), x ∈ R} < ∞, lim_{x→∞} |x|K_σ(x) = 0, ∫_R K_σ(x) dx = 1, ∫_R x K_σ(x) dx = 0


hold for the canonical kernels, hence we may examine whether the exponential distribution density f(x) = u · exp{−ux} H_0(x), where H_0(x) is the Heaviside step function relative to 0 and u > 0, can be approximated by them. We notice that f ∈ W_1^{(2)}. We consider the ordered sample points t_1, t_2, ..., t_n from the exponential distribution, where t_1 = min{t_i | i = 1, 2, ..., n} > 0. We are going to show that the following non-convergence result is true:
∫_R ((1/(n h_n)) Σ_{i=1}^n (1/(σ√(2π))) exp{−(x − t_i)²/(2σ²h_n²)} − f(x))² dx → +∞, h_n = 1/√n, n → +∞.
The above integral is greater than or equal to the following sum of integrals:
Σ_{i=1}^n ∫_0^∞ (1/(2nπσ²)) exp{−nx²/σ²} dx = Σ_{i=1}^n (1/(2nπσ²)) · (σ√π)/(2√n) = Σ_{i=1}^n (σ√π)/(4πσ²n^{3/2}) = ∞.
Hence, the family of exponential distributions is different from the family of canonical (Gaussian) ones. We notice that both the canonical and the exponential distributions are not heavy-tailed.

Theorem 3.9 The family of Pareto-type distributions is different from the Gaussian family of distributions.

Proof: Let us consider the family of canonical kernels
K_σ(x) = (1/(√(2π) σ)) exp{−x²/(2σ²)},
where σ > 0. The kernel conditions
sup{K_σ(x), x ∈ R} < ∞, lim_{x→∞} |x|K_σ(x) = 0, ∫_R K_σ(x) dx = 1, ∫_R x K_σ(x) dx = 0
hold for the canonical kernels, hence we may examine whether the Pareto-type distribution density g(x) = (2/x³) H_1(x), where H_1(x) is the Heaviside step function relative to 1, can be approximated by them. We notice that g ∈ W_1^{(2)}. We consider the ordered sample points t_1, t_2, ..., t_n from this Pareto-type distribution, where t_1 = min{t_i | i = 1, 2, ..., n} > 0. We are going to show that the following non-convergence result is true:
∫_R ((1/(n h_n)) Σ_{i=1}^n (1/(σ√(2π))) exp{−(x − t_i)²/(2σ²h_n²)} − g(x))² dx → +∞, h_n = 1/√n, n → +∞.


The above integral is greater than or equal to the following sum of integrals:
Σ_{i=1}^n ∫_1^∞ (4/(σ√(2nπ) x³ exp{n t_i x/σ²})) dx ≥ Σ_{i=1}^n 2/(σ√(2nπ) exp{1/σ²}) = +∞.
We also notice that this Pareto-type distribution is heavy-tailed, while a Gaussian distribution is not a heavy-tailed distribution.
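The tail contrast in the closing remark can be illustrated with the standard heavy-tail criterion that ∫ e^{λx} dF(x) = ∞ for every λ > 0 (see [2]). A sketch (the test points, λ, and the Mills-ratio bound for the Gaussian tail are our choices): for the Pareto-type survival function, e^{λx}(1 − G(x)) grows without bound, while for the Gaussian it vanishes.

```python
import math

def pareto_tail(x):
    # Survival function of g(x) = 2 / x^3, x >= 1:  P(X > x) = 1 / x^2
    return 1.0 / x ** 2

def gaussian_tail_bound(x):
    # Mills-ratio upper bound on the standard normal tail, valid for x > 0
    return math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))

lam = 0.1
xs = (10.0, 50.0, 100.0)
pareto_growth = [math.exp(lam * x) * pareto_tail(x) for x in xs]           # increases
gaussian_decay = [math.exp(lam * x) * gaussian_tail_bound(x) for x in xs]  # collapses to 0
```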

References

[1] Aliprantis, C.D., Border, K.C. (1999). Infinite Dimensional Analysis: A Hitchhiker's Guide, second edition. Springer.

[2] Cai, J., Tang, Q. (2004). On max-sum equivalence and convolution closure of heavy-tailed distributions and their applications. Journal of Applied Probability, 41, 117-130.

[3] Karr, A. (1978). Lévy random measures. Annals of Probability, 6, 57-71.

[4] Parzen, E. (1962). On estimation of a probability density and mode. Annals of Mathematical Statistics, 33, 1065-1076.

[5] Wahba, G. (1975). Optimal convergence properties of variable knot, kernel, and orthogonal series methods for density estimation. Annals of Statistics, 3, 15-29.

Received: March 18, 2014
