Extended inequalities for weighted Renyi entropy involving generalized Gaussian densities

Salimeh Yasaei Sekeh
arXiv:1509.02190v1 [cs.IT] 7 Sep 2015
Abstract In this paper the author analyses the weighted Renyi entropy in order to derive several inequalities in the weighted setting. Furthermore, using the proposed notions of the α-th generalized deviation and the (α,p)-th weighted Fisher information, extended versions of the moment-entropy, Fisher information and Cramér-Rao inequalities, formulated in terms of generalized Gaussian densities, are given.
1  The weighted p-Renyi entropy

In 1960, Renyi was looking for the most general definition of an information measure that would preserve additivity for independent events and remain compatible with the axioms of probability. He started from Cauchy's functional equation and ended up with the general theory of means; the outcome of this investigation was the definition of the Renyi entropy, see [16]. In addition, Stam [19] showed that a continuous random variable with given Fisher information and minimal Shannon entropy must be Gaussian. The moment-entropy inequality establishes a companion result: a continuous random variable with given second moment and maximal Shannon entropy must be Gaussian as well; cf. [10, 25, 14, 6]. Furthermore, the Cramér-Rao inequality shows that the second moment of a continuous random variable is bounded below by the reciprocal of its Fisher information; cf. [7]. Recently, in [13], the notions of relative Renyi entropy and of the (α,p)-th Fisher information, a general form of Fisher information associated with the Renyi entropy, together with the α-th moment and deviation, were introduced. More results and applications appeared in [5, 11, 2, 3]. Later, an interesting generalization of Stam's inequality involving the Renyi entropy was given, again in [13], where it is asserted that the generalized Gaussian densities maximize the Renyi entropy under a constraint on the generalized Fisher information.

The initial concept of the weighted entropy, as another generalization of entropy, was proposed in [1, 8, 4]. Applications of the weighted entropy have been presented in information theory and computer science (see [17, 18, 21, 24, 22]). Let us now give the definition of the weighted entropy. For a given function x ∈ R ↦ ϕ(x) ≥ 0 and an RV X : Ω → R with a PM/DF f, the weighted entropy (WE) of X (or of f) with weight function (WF) ϕ is defined by

h^w_ϕ(X) = h^w_ϕ(f) = −∫_R ϕ(x) f(x) log f(x) dx = −E_X(ϕ log f)   (1.1)
2010 Mathematics Subject Classification: 60A10, 60B05, 60C05. Key words: weight function, weighted Renyi entropy, weighted Renyi entropy-power, weighted Fisher information, generalized Gaussian densities.
whenever the integral ∫_R ϕ(x) f(x) (1 ∨ |log f(x)|) dx < ∞. (The standard convention 0 = 0 · log 0 = 0 · log ∞ is adopted throughout the paper.) Next, for two functions x ∈ R ↦ f(x) ≥ 0 and x ∈ R ↦ g(x) ≥ 0, the relative WE of g relative to f with WF ϕ is defined by

D^w_ϕ(f‖g) = ∫_R ϕ(x) f(x) log( f(x)/g(x) ) dx.   (1.2)

When ϕ ≡ 1 the relative WE reduces to the Kullback-Leibler divergence; we refer the reader to [1, 4, 20].

Following the same line of argument as for the WE, we propose here a generalized form of the WE, named the p-th weighted Renyi entropy (WRE). As stated above, the aim of this work is to extend the results of [13] to the weighted case: the classical moment-entropy, Fisher information and Cramér-Rao inequalities are generalized to the weighted setting built on the generalized Gaussian densities. If ϕ ≡ 1 the results coincide with the standard forms. In the author's opinion, this work parallels the analysis of the Renyi entropy by elaborating the recently established assertions for the WE; it also leads to several interesting bounds stemming from particular choices of α and p. Let us begin with the definition of the p-th WRE.

Definition 1.1 The p-th weighted Renyi entropy (WRE) of an RV X with probability density function (PDF) f in R, given a WF ϕ and p > 0, p ≠ 1, is defined by

h^w_{ϕ,p}(X) := h^w_{ϕ,p}(f) = (1/(1−p)) log ∫_R ϕ(x) f^p(x) dx.   (1.3)

Moreover, we propose the p-th weighted Renyi entropy power (WRP) of a PDF f:

N^w_{ϕ,p}(f) = exp{ h^w_{ϕ,p}(f) }.   (1.4)

Here and below integrals are taken with respect to Lebesgue measure over the real line R and are assumed absolutely convergent. Throughout the paper we also suppose that the entropies are finite, as is the expectation E_f[ϕ]. Observe that

lim_{p→1} N^w_{ϕ,p}(f) = N^w_{ϕ,1}(f) = exp{ h^w_ϕ(f)/E_f[ϕ] }.   (1.5)

In other words, as p → 1 the WRP tends to the weighted entropy power (WEP), see [6, 12, 23]. Note that both h^w_{ϕ,p}(f) and N^w_{ϕ,p}(f) are continuous functions of p.
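The limit (1.5) can be checked numerically. In the sketch below the test case (a standard normal f and the WF ϕ(x) = x², both illustrative assumptions of this sketch) is evaluated by simple trapezoidal quadrature.

```python
import math

# Numerical check of the limit (1.5): N^w_{phi,p}(f) -> exp(h^w_phi(f)/E_f[phi])
# as p -> 1. Assumed test case: f standard normal, WF phi(x) = x^2.

def trapz(fun, a=-12.0, b=12.0, n=4001):
    # composite trapezoidal rule on [a, b]
    h = (b - a) / (n - 1)
    return h * (0.5 * (fun(a) + fun(b)) + sum(fun(a + i * h) for i in range(1, n - 1)))

f = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
phi = lambda x: x * x

def wrp(p):
    # (1.3)-(1.4): N^w_{phi,p}(f) = ( int phi f^p dx )^{1/(1-p)}
    return trapz(lambda x: phi(x) * f(x) ** p) ** (1.0 / (1.0 - p))

def wep():
    # (1.5): exp{ h^w_phi(f) / E_f[phi] }
    h_w = -trapz(lambda x: phi(x) * f(x) * math.log(f(x)))
    return math.exp(h_w / trapz(lambda x: phi(x) * f(x)))

print(wrp(1.001), wep())   # the two values agree to about 0.1%
```

For this particular choice both sides can also be computed by hand and tend to (2π)^{1/2} e^{3/2}.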
Remark 1.1 Recall the definition of the (α,p)-th Fisher information (FI) of a PDF f with derivative f′:

J_{α,p}(f) = ( ∫_R |f^{p−2} f′|^β f dx )^{1/(βp)},  for β ∈ (1, ∞], α ∈ (1, ∞),   (1.6)

with α^{−1} + β^{−1} = 1. In particular, the choice ϕ = |f′|^β f yields the obvious relation

N^w_{ϕ,p}(f) = ( J_{α,r}(f) )^{βr/(1−p)},  where r = (p + 2β)/β,

since then ∫_R ϕ f^p dx = ∫_R |f^{r−2} f′|^β f dx. This emphasizes the practical fact that all assertions derived in terms of the p-th WRE can be applied to the (α,p)-th FI as well. For more details see [2, 3, 11].
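The relation above can be sanity-checked numerically. In the sketch below the reading of (1.6) with outer exponent 1/(βp) and r = (p + 2β)/β is assumed, and f is a standard normal chosen purely for illustration.

```python
import math

# Check of the Remark 1.1 relation N^w_{phi,p}(f) = (J_{alpha,r}(f))^{beta*r/(1-p)}
# for phi = |f'|^beta * f, reading (1.6) as
# J_{alpha,p}(f) = ( int |f^{p-2} f'|^beta f dx )^{1/(beta*p)} and r = (p + 2*beta)/beta.
# Assumed test density: f standard normal.

def trapz(fun, a=-10.0, b=10.0, n=20001):
    h = (b - a) / (n - 1)
    return h * (0.5 * (fun(a) + fun(b)) + sum(fun(a + i * h) for i in range(1, n - 1)))

f = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
df = lambda x: -x * f(x)          # f'(x) for the standard normal

p, beta = 2.0, 2.0                # so alpha = 2 as well
r = (p + 2 * beta) / beta         # r = 3 here

phi = lambda x: abs(df(x)) ** beta * f(x)
N = trapz(lambda x: phi(x) * f(x) ** p) ** (1 / (1 - p))                              # (1.4)
J = trapz(lambda x: abs(f(x) ** (r - 2) * df(x)) ** beta * f(x)) ** (1 / (beta * r))  # (1.6)

print(N, J ** (beta * r / (1 - p)))   # the two quantities coincide
```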
Next, extending the standard notions, we also introduce the relative p-th WRP of f and g: for p > 0, p ≠ 1 and a given WF x ∈ R ↦ ϕ(x) ≥ 0,

N^w_{ϕ,p}(f,g) = ( ∫_R ϕ g^{p−1} f dx )^{1/(1−p)} ( ∫_R ϕ g^p dx )^{1/p} / ( ∫_R ϕ f^p dx )^{1/(p(1−p))}.   (1.7)
Therefore, letting p → 1 one obtains

N^w_{ϕ,1}(f,g) = lim_{p→1} N^w_{ϕ,p}(f,g) = exp{ D^w_ϕ(f‖g)/E_f[ϕ] − h^w_ϕ(g)/E_g[ϕ] }.   (1.8)
More generally, for two given functions f and g we employ the relative p-th WRP and define the relative p-th WRE of f and g by

D^w_{ϕ,p}(f‖g) = log N^w_{ϕ,p}(f,g).   (1.9)
Evidently, as p → 1 the relative p-th WRE does not reproduce the relative WE exactly, although by (1.8) it is a linear function of D^w_ϕ(f‖g). Going back to (1.7), the relative p-th WRE is continuous in p.

Theorem 1.1 Given non-negative PDFs f, g and a WF ϕ, one has:

a) Assuming p > 0, p ≠ 1,

D^w_{ϕ,p}(f‖g) ≥ 0,   (1.10)

with equality iff f ≡ g.

b) For p = 1, given a WF ϕ such that ∫_R ϕ dx exists, suppose the inequality

E_f[ϕ] h^w_ϕ(g) + E_g[ϕ]² ≤ E_f[ϕ] E_g[ϕ]   (1.11)

is fulfilled. Then (1.10) holds true, that is, D^w_{ϕ,1}(f‖g) ≥ 0. Equality occurs when h^w_ϕ(g) ≤ 0 and f ≡ g.
Proof The case p = 1 follows from the Gibbs inequality:

−D^w_{ϕ,1}(f‖g) = (1/E_f[ϕ]) ∫_R ϕ f 1(f > 0) log(g/f) dx + h^w_ϕ(g)/E_g[ϕ]
  ≤ (1/E_f[ϕ]) ∫_R ϕ f 1(f > 0) (g/f − 1) dx + h^w_ϕ(g)/E_g[ϕ]
  ≤ (1/E_f[ϕ]) ∫_R ϕ g dx − (1/E_f[ϕ]) ∫_R ϕ f dx + h^w_ϕ(g)/E_g[ϕ]
  = E_g[ϕ]/E_f[ϕ] − 1 + h^w_ϕ(g)/E_g[ϕ] ≤ 0,

where the last bound is exactly (1.11). If p > 1, owing to the Hölder inequality (see Lemma 1, cf. [13]), one can write

∫_R ϕ g^{p−1} f dx = ∫_R ( ϕ^{(p−1)/p} g^{p−1} )( ϕ^{1/p} f ) dx ≤ ( ∫_R ϕ g^p dx )^{(p−1)/p} ( ∫_R ϕ f^p dx )^{1/p}.

Next, for p < 1 one also derives from the Hölder inequality:

∫_R ϕ f^p dx = ∫_R ( ϕ g^{p−1} f )^p ( ϕ g^p )^{1−p} dx ≤ ( ∫_R ϕ g^{p−1} f dx )^p ( ∫_R ϕ g^p dx )^{1−p}.
In both cases equality takes place exactly when equality occurs in the Hölder inequality.
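Theorem 1.1(a) can be probed numerically. The sketch below assumes the three-integral reading of (1.7) used above; the two Gaussian test densities and the WF ϕ(x) = 1 + x² are illustrative choices, not taken from the paper.

```python
import math

# Numerical probe of Theorem 1.1(a): D^w_{phi,p}(f||g) >= 0, with equality at f = g.
# Assumption: (1.7) read as N = A^{1/(1-p)} C^{1/p} / B^{1/(p(1-p))}, where
# A = int phi g^{p-1} f, B = int phi f^p, C = int phi g^p.

def trapz(fun, a=-15.0, b=15.0, n=6001):
    h = (b - a) / (n - 1)
    return h * (0.5 * (fun(a) + fun(b)) + sum(fun(a + i * h) for i in range(1, n - 1)))

def gauss(m, s):
    return lambda x: math.exp(-((x - m) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def relative_wre(f, g, phi, p):
    A = trapz(lambda x: phi(x) * g(x) ** (p - 1) * f(x))
    B = trapz(lambda x: phi(x) * f(x) ** p)
    C = trapz(lambda x: phi(x) * g(x) ** p)
    return math.log(A) / (1 - p) - math.log(B) / (p * (1 - p)) + math.log(C) / p

phi = lambda x: 1 + x * x
f, g = gauss(0.0, 1.0), gauss(0.5, 1.5)
for p in (0.5, 2.0, 3.0):
    print(p, relative_wre(f, g, phi, p))   # non-negative for every p
```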
Example 1.1 Let the WF be ϕ(x) = exp(γx). Consider the PDFs f ∼ Exp(λ_1), g ∼ Exp(λ_2) with λ_1, λ_2 > 0. Then, for p > 0, p ≠ 1, under the suppositions

λ_2(p − 1) + λ_1 − γ > 0  and  λ_i p − γ > 0, i = 1, 2,
we compute

D^w_{ϕ,p}(f‖g) = (1/(p(1−p))) log(λ_1 p − γ) − (1/(1−p)) log(λ_2(p−1) + λ_1 − γ) − (1/p) log(λ_2 p − γ).   (1.12)
It is easily seen that for p ≠ 1, (1.12) takes non-negative values. Now, for γ < 0, substituting E_f[ϕ] = λ_1/(λ_1 − γ), E_g[ϕ] = λ_2/(λ_2 − γ) and h^w_ϕ(g) = (λ_2/(λ_2 − γ))( λ_2/(λ_2 − γ) − log λ_2 ) into (1.11) yields, after simplification,

(λ_1 λ_2 log λ_2)/((λ_1 − γ)(λ_2 − γ)) ≥ λ_2²/(λ_2 − γ)² + (λ_1 λ_2²)/((λ_1 − γ)(λ_2 − γ)²) − (λ_1 λ_2)/((λ_1 − γ)(λ_2 − γ)).   (1.13)
In particular, choosing p = 1, a straightforward calculation leads to

D^w_{ϕ,1}(f‖g) = log λ_1 + (λ_2 − λ_1)/(λ_1 − γ) − λ_2/(λ_2 − γ).   (1.14)
Here again γ < 0. Performing a numerical simulation, one can display the behaviour of D^w_{ϕ,1}(f‖g) for chosen values λ_1, λ_2 over a selected range of γ. For instance, if γ ∈ (−5, −1), in the special case λ_1 = 0.1, λ_2 = 1, both inequalities (1.11) and (1.10) at p = 1 are violated; whereas for γ ∈ (−3, −1), λ_1 = 0.1 and λ_2 = 10, the assertion (1.10) holds true while the assumption (1.11) is not satisfied. Further, with the same forms of f and g, both bounds (1.10) and (1.11) are fulfilled for λ_1 = 1.5, λ_2 = 3.5 and γ ∈ (−10, −1).
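The closed form (1.12) can be cross-checked against direct quadrature; the parameter choice λ_1 = 1.5, λ_2 = 3.5, γ = −1 below is the last case mentioned above, and the three-integral reading of (1.7) is assumed.

```python
import math

# Cross-check of (1.12) for f ~ Exp(l1), g ~ Exp(l2), phi(x) = exp(gam*x),
# against direct quadrature of (1.7)/(1.9) (reconstructed three-integral form).

def closed_form(l1, l2, gam, p):
    # (1.12)
    return (math.log(l1 * p - gam) / (p * (1 - p))
            - math.log(l2 * (p - 1) + l1 - gam) / (1 - p)
            - math.log(l2 * p - gam) / p)

def trapz(fun, a=0.0, b=80.0, n=20001):
    h = (b - a) / (n - 1)
    return h * (0.5 * (fun(a) + fun(b)) + sum(fun(a + i * h) for i in range(1, n - 1)))

def numeric(l1, l2, gam, p):
    f = lambda x: l1 * math.exp(-l1 * x)
    g = lambda x: l2 * math.exp(-l2 * x)
    phi = lambda x: math.exp(gam * x)
    A = trapz(lambda x: phi(x) * g(x) ** (p - 1) * f(x))
    B = trapz(lambda x: phi(x) * f(x) ** p)
    C = trapz(lambda x: phi(x) * g(x) ** p)
    return math.log(A) / (1 - p) - math.log(B) / (p * (1 - p)) + math.log(C) / p

l1, l2, gamma, p = 1.5, 3.5, -1.0, 2.0
print(closed_form(l1, l2, gamma, p), numeric(l1, l2, gamma, p))
```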
Definition 1.2 Given a WF x ∈ R ↦ ϕ(x) ≥ 0 and α ∈ (0, ∞), we introduce the α-th generalized moment of a PDF f as

μ_{ϕ,α}(f) = ∫_R ϕ(x) |x|^α f(x) dx,   (1.15)

provided the integral exists. Subsequently, define the α-th generalized deviation of the PDF f for α ∈ [0, ∞]:

σ_{ϕ,α}(f) =
  exp{ (1/E_f[ϕ]) ∫_R ϕ(x) f(x) log|x| dx },   α = 0,
  μ_{ϕ,α}(f)^{1/α},   α ∈ (0, ∞),
  ess sup{ ϕ(x)|x| : f(x) > 0 },   α = ∞.   (1.16)

All these definitions are understood to hold when the integrals exist and the resulting expressions are finite. It is also worth mentioning that σ_{ϕ,α}(f) is continuous in α. Observe that taking ϕ ≡ 1, (1.16) coincides with the standard deviation of order α, (6) in [13].
Remark 1.2 Assume for α ∈ (0, ∞) that the function g(x) = f(x)|x|^α/χ, where χ := E_f[|x|^α], is integrable. For any real number ξ, let ϕ(x) = e^{−2πixξ}. Then the α-th generalized moment of f becomes proportional to the Fourier transform of g, i.e. ĝ(ξ). Likewise, in the special case ϕ(x) = e^{itx}, t ∈ R, μ_{ϕ,α}(f) is proportional to the characteristic function of g. Furthermore, suppose ϕ is a non-negative polynomial function of |x|, that is, for constants a_0, …, a_n,

ϕ(x) = a_n|x|^n + a_{n−1}|x|^{n−1} + · · · + a_1|x| + a_0,   (1.17)

with ϕ ≥ 0. Then one obtains

μ_{ϕ,α}(f) = Σ_{i=0}^n a_i μ_{α+i}(f).

Here μ_{α+i}(f) stands for the (α+i)-th moment of f defined in [13].

Example 1.2 Let ϕ(x) take the form (1.17) and X ∼ Exp(λ). Then

σ_{ϕ,α}(f) = ( Σ_{i=0}^n a_i (α+i)!/λ^{α+i} )^{1/α},

where (α+i)! is understood as Γ(α+i+1). In contrast to the α-th moment, this function is not always increasing in α ∈ (0, ∞). For instance, let n = 3, a_0 = 1, a_1 = −2, a_2 = −1, a_3 = 2 and λ ∈ (0.5, 1.2); then σ_{ϕ,α}(f) decreases on the range α ∈ (1, 2).
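Example 1.2 can be evaluated directly, writing (α+i)! as Γ(α+i+1); the sketch below reproduces the claimed decrease of σ_{ϕ,α}(f) on α ∈ (1, 2) at λ = 1.

```python
import math

# Example 1.2: sigma_{phi,alpha}(f) for X ~ Exp(lam) and the polynomial WF
# phi(x) = sum_i a_i |x|^i, via mu_{phi,alpha}(f) = sum_i a_i * Gamma(alpha+i+1)/lam^{alpha+i}.

def sigma(a_coeffs, lam, alpha):
    mu = sum(a * math.gamma(alpha + i + 1) / lam ** (alpha + i)
             for i, a in enumerate(a_coeffs))
    return mu ** (1.0 / alpha)

coeffs = [1, -2, -1, 2]                  # a_0, ..., a_3 from the example
vals = [sigma(coeffs, 1.0, al) for al in (1.0, 1.5, 2.0)]
print(vals)                               # strictly decreasing across (1, 2)
```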
2  Generalized p-Gaussian densities

Let us continue with the weighted information measures of generalized p-Gaussian densities. For this, let X be an RV in R. Then for α ∈ [0, ∞] and p > 1 − α, the generalized p-Gaussian has the PDF

G(x) =
  a_{α,p} ( 1 + (1−p)|x|^α )_+^{1/(p−1)},   p ≠ 1,
  a_{α,1} exp( −|x|^α ),   p = 1,   (2.1)

where we use the notation x_+ = max{x, 0} and

a_{α,p} =
  α(1−p)^{1/α} / ( 2 B(1/α, 1/(1−p) − 1/α) ),   p < 1,
  α / ( 2 Γ(1/α) ),   p = 1,
  α(p−1)^{1/α} / ( 2 B(1/α, p/(p−1)) ),   p > 1,

with B(·,·) the Beta function.
Note that for p > 1 and α = 0, G is defined for almost every x ∈ R as

G(x) = a_{0,p} ( −log|x| )_+^{1/(p−1)},  where a_{0,p} = 1/( 2 Γ(p/(p−1)) ).   (2.2)

In addition, in the case α = ∞ and p > 0, the PDF G is given by

G(x) = 1/2 for |x| ≤ 1,  G(x) = 0 for |x| > 1,   (2.3)

so that a_{∞,p} = 1/2.
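As a consistency check on the normalizing constant a_{α,p}, the sketch below integrates the density (2.1) numerically for several (α, p) pairs; the quadrature window and grid are illustrative choices.

```python
import math

# Consistency check for (2.1): the generalized p-Gaussian with constant a_{alpha,p}
# integrates to 1 in each regime p < 1, p = 1, p > 1.

def beta_fn(a, b):
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def a_const(alpha, p):
    if p < 1:
        return alpha * (1 - p) ** (1 / alpha) / (2 * beta_fn(1 / alpha, 1 / (1 - p) - 1 / alpha))
    if p == 1:
        return alpha / (2 * math.gamma(1 / alpha))
    return alpha * (p - 1) ** (1 / alpha) / (2 * beta_fn(1 / alpha, p / (p - 1)))

def G(x, alpha, p):
    if p == 1:
        return a_const(alpha, 1) * math.exp(-abs(x) ** alpha)
    base = max(1 + (1 - p) * abs(x) ** alpha, 0.0)   # the (.)_+ truncation
    return a_const(alpha, p) * base ** (1 / (p - 1))

def trapz(fun, a, b, n=60001):
    h = (b - a) / (n - 1)
    return h * (0.5 * (fun(a) + fun(b)) + sum(fun(a + i * h) for i in range(1, n - 1)))

for alpha, p in [(2.0, 1.0), (1.0, 2.0), (2.0, 3.0), (2.0, 0.8)]:
    print(alpha, p, trapz(lambda x: G(x, alpha, p), -30.0, 30.0))   # each close to 1
```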
In the context of communication transmission, a model PDF can characterize the statistical behaviour of a signal. For multimedia signals, the generalized Gaussian distribution is often used, cf. [15]. It can be applied to model the distribution of Discrete Cosine Transform (DCT) coefficients, wavelet transform coefficients, pixel values and so on; thus it is used in video and geometry compression. This distribution is also known in economics as the Generalized Error Distribution (GED). We emphasize that the generalized Gaussians are also the one-dimensional versions of the extremal functions for the sharp Sobolev, log-Sobolev and Gagliardo-Nirenberg inequalities, cf. [9].

Now, passing to the generalized weighted Fisher information, we establish the following definition.

Definition 2.1 For given α ∈ [1, ∞] and p ∈ R, the (α,p)-th weighted Fisher information (WFI) of a PDF f, denoted J^{w,ϕ}_{α,p}(f), is defined case by case. If α ∈ (1, ∞), let β ∈ (1, ∞) be the Hölder conjugate of α, α^{−1} + β^{−1} = 1, and set

J^{w,ϕ}_{α,p}(f) = ∫_R ϕ |f^{p−2} f′|^β f dx,   (2.4)

where the PDF f is assumed absolutely continuous. If α = 1, then J^{w,ϕ}_{1,p}(f) is the essential supremum (assumed to exist) of ϕ |f^{p−2} f′| on the support of f. If α = ∞ then, similarly to [13],

J^{w,ϕ}_{∞,p}(f) = p^{−1} ( V(ϕ f^p) − ∫_R ϕ′ f^p dx ),   (2.5)

where V(·) is the total variation (here assuming ϕ f^p has bounded variation) and ϕ′ denotes the derivative of the WF ϕ, ϕ′(x) = (d/dx)ϕ(x). It is easily seen that when ϕ ≡ 1 the integral in (2.5) vanishes. For more details we once more refer the reader to [10, 19, 25, 11, 2, 3].

Remark 2.1 For the special form ϕ = f^k |f′|^m, k ∈ R, m > 1 − β, one obtains the formula

J^{w,ϕ}_{α,p}(f) = ( J_{α′,p′}(f) )^{β′p′},   (2.6)

where α′ is the Hölder conjugate of β′, with

β′ = m + β,  p′ = (k + pβ + 2m)/(m + β).
If the WF ϕ is a polynomial function of f, i.e. ϕ = Σ_{i=0}^n b_i f^i where the b_i are constants and ϕ ≥ 0, then one has

J^{w,ϕ}_{α,p}(f) = Σ_{i=0}^n b_i ( J_{α,p_i}(f) )^{βp_i},  p_i = p + i/β.   (2.7)

Here J_{·,·}(f) stands, as before, for (1.6). Looking at (2.7), it is not difficult to see that the (α,p)-th WFI is not necessarily decreasing in α for given p.

In what follows we also use the notation E(X) for the expectation relative to an RV X with PDF f. For an arbitrary RV Z and a given WF ϕ, set

Λ̃_{ϕ,p}(Z) = E[ ϕ( −((1−Z)/(p−1))^{1/α} ) + ϕ( ((1−Z)/(p−1))^{1/α} ) ],
Λ_{ϕ,p}(Z) = E[ ϕ( −((1−Z)/(Z(1−p)))^{1/α} ) + ϕ( ((1−Z)/(Z(1−p)))^{1/α} ) ].   (2.8)

Assuming all the expectations are finite, the following notation shortens the formulas throughout the paper:

Θ_α(Z) = E[ ϕ(−Z^{1/α}) + ϕ(Z^{1/α}) ],  Υ(Z) = E[ ϕ(−e^{−Z}) + ϕ(e^{−Z}) ].   (2.9)
Next, by virtue of Definitions 1.1, 1.2 and (2.1), a list of weighted information measures for the PDF G in the various cases of α, p is established.

• For α ∈ (0, ∞) and p > 1 we start with an explicit form of N^w_{ϕ,p}(G). After an uncomplicated computation one obtains

N^w_{ϕ,p}(G) = a_{α,p}^{−1} 2^{1/(p−1)} ( pα/(pα+p−1) )^{1/(1−p)} ( Λ̃_{ϕ,p}(Z) )^{1/(1−p)},   (2.10)

where Z ∼ Beta( 1/α, (2p−1)/(p−1) ). Also assume the RV Y has a Beta distribution with parameters p/(p−1), (α+1)/α. Then, owing to (1.16), one can write

σ_{ϕ,α}(G) = { ( 2(pα+p−1) )^{−1} Λ̃_{ϕ,p}(Y) }^{1/α}.   (2.11)

For the subset α ∈ [1, ∞) and p > 1, observe that

J^{w,ϕ}_{α,p}(G) = ( 2(αp+p−1) )^{−1} a_{α,p}^{β(p−1)} α^β Λ̃_{ϕ,p}(Y),   (2.12)

where β is the Hölder conjugate of α. Consequently we have the following identity involving N^w_{ϕ,p}(G), σ_{ϕ,α}(G) and J^{w,ϕ}_{α,p}(G):

( N^w_{ϕ,p}(G) )^{1−p} = p σ_{ϕ,α}(G) ( J^{w,ϕ}_{α,p}(G) )^{1/β} Λ̃_{ϕ,p}(Z)/Λ̃_{ϕ,p}(Y).   (2.13)
• For α ∈ (0, ∞) and p ∈ (1/(α+1), 1), consider two RVs Z and Y having Beta PDFs with parameters ( (p(α+1)−1)/(α(1−p)), 1/α ) and ( (p(α+1)−1)/(α(1−p)), (α+1)/α ) respectively. In an analogous manner one has

N^w_{ϕ,p}(G) = a_{α,p}^{−1} 2^{1/(p−1)} ( pα/(pα+p−1) )^{1/(1−p)} ( Λ_{ϕ,p}(Z) )^{1/(1−p)},   (2.14)

and the expression (1.16) becomes

σ_{ϕ,α}(G) = { ( 2(pα+p−1) )^{−1} Λ_{ϕ,p}(Y) }^{1/α}.   (2.15)

Substituting Λ_{ϕ,p}(Y) for Λ̃_{ϕ,p}(Y) in (2.12), and taking the previous arguments into account, J^{w,ϕ}_{α,p}(G) is derived for α ∈ [1, ∞). Thus

( N^w_{ϕ,p}(G) )^{1−p} = p σ_{ϕ,α}(G) ( J^{w,ϕ}_{α,p}(G) )^{1/β} Λ_{ϕ,p}(Z)/Λ_{ϕ,p}(Y).   (2.16)
• Let α ∈ (0, ∞) and p = 1. Suppose the RVs W, W̄ have Gamma distributions with the same rate parameter 1 but different shape parameters (α+1)/α and 1/α respectively. For the WRE one obtains

h^w_{ϕ,1}(G) = E_G[ϕ] log a_{α,1}^{−1} + (2α)^{−1} Θ_α(W).   (2.17)

Consequently, it can be seen that

N^w_{ϕ,1}(G) = a_{α,1}^{−1} exp{ Θ_α(W)/( α Θ_α(W̄) ) },  σ_{ϕ,α}(G) = (2α)^{−1/α} ( Θ_α(W) )^{1/α}.   (2.18)

Recalling (2.4) once again, one can write a representation for the (α,1)-th WFI:

J^{w,ϕ}_{α,1}(G) = 2^{−1} α^{β−1} Θ_α(W).   (2.19)

Here Θ_α stands for the formula in (2.9). It can also be checked that

2 σ_{ϕ,α}(G) ( J^{w,ϕ}_{α,1}(G) )^{1/β} = Θ_α(W).   (2.20)
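Identity (2.19) can be verified numerically; in the sketch below α = 2 (so β = 2), and ϕ(x) = 1/(1+x²) is an arbitrary test WF assumed for illustration.

```python
import math

# Numerical check of (2.19) at alpha = beta = 2, p = 1:
# J^{w,phi}_{2,1}(G) = 2^{-1} alpha^{beta-1} Theta_alpha(W), W ~ Gamma((alpha+1)/alpha, 1).

def trapz(fun, a, b, n=40001):
    h = (b - a) / (n - 1)
    return h * (0.5 * (fun(a) + fun(b)) + sum(fun(a + i * h) for i in range(1, n - 1)))

alpha = beta = 2.0
phi = lambda x: 1.0 / (1.0 + x * x)
G = lambda x: (alpha / 2) * math.exp(-abs(x) ** alpha) / math.gamma(1 / alpha)

# LHS of (2.19): int phi |f^{-1} f'|^beta f dx, using |G'/G| = alpha |x|^{alpha-1}
lhs = trapz(lambda x: phi(x) * (alpha * abs(x) ** (alpha - 1)) ** beta * G(x), -10.0, 10.0)

# RHS: 2^{-1} alpha^{beta-1} Theta_alpha(W)
shape = (alpha + 1) / alpha
dens = lambda w: w ** (shape - 1) * math.exp(-w) / math.gamma(shape)
theta = trapz(lambda w: (phi(-w ** (1 / alpha)) + phi(w ** (1 / alpha))) * dens(w), 0.0, 60.0)
rhs = 0.5 * alpha ** (beta - 1) * theta

print(lhs, rhs)   # the two sides agree
```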
e having • Case α = 0 and p > 1. In a modified setting Consider three RVs X, X, X Gamma distribution with shape parameters (2p − 1)/(p − 1), 1/(p − 1), p/(p − 1) respectively and rate parameter 1. Then one gives 1/(1−p) n o1/(1−p) p w (G) = a−1 . Υ(X) , Nϕ,p 0,p n 2(p − 1) o e . σϕ,0 (G) = exp − (p − 1).Υ(X) Υ(X) • For α = ∞ and p > 0, set ψ(x) =
Z
(2.21)
x
ϕ(t) dt. According to (1.4) one has the
0
respective formula
1/(1−p) w Nϕ,p (G) = 2p/(p−1) ψ(1) − ψ(−1) .
(2.22)
Finally (1.16) admits the representation σϕ,∞ (G) = esssup Z x ϕ(x). Regarding to conϕ′ (t) dt, then clude this part, for derivative function ϕ′ , set ψ(x) = 0
w,ϕ J∞,p (G) = (p 2p )−1 ψ(1) − ψ(−1) − 2−1−p ψ(1) − ψ(−1) .
(2.23)
Consequently
1−p w,ϕ w = p J∞,p (G) − p 2−1−p ψ(1) − ψ(−1) . Nϕ,p (G)
(2.24)
Eventually, for t > 0 define G_t : R → [0, ∞) by

G_t(x) = (1/t) G(x/t),   (2.25)

which will be used later in Section 3.
3  Main extended inequalities

We start with an extension of the moment-entropy inequality, which is then applied to obtain an extended form of the Cramér-Rao inequality. Further, we discuss a general version of the Fisher information inequality reflecting the properties of WFIs. In essence, Theorems 3.1-3.3 are deployments of their counterparts in [13]. Throughout this section we use a number of properties established in Sections 1 and 2.
Theorem 3.1 (The extended moment-entropy inequality (MEI), cf. [13], Theorem 2.) Let f be any PDF. For a given WF ϕ and G as in Section 2, set

ϕ*(x) = ϕ( (σ_{ϕ,α}(f)/σ_{ϕ,α}(G)) x ),   (3.1)

where σ_{ϕ,α}(·) is the α-th generalized deviation. Let α ∈ [0, ∞], p > 1/(1+α), and assume

E_f[ϕ] ≥ E_G[ϕ],  and in addition E_f[ϕ] ≥ E_G[ϕ*] for p = 1.   (3.2)

Then

N^w_{ϕ,p}(f)/σ_{ϕ,α}(f) ≤ ( N^w_{ϕ*,p}(G) )^{1−p} ( N^w_{ϕ,p}(G) )^p / σ_{ϕ,α}(G).   (3.3)
With equality if and only if f ≡ G.

Proof: Following the arguments of Theorem 2 in [13], we give the proof in several cases. For brevity set a = a_{α,p} and

t_ϕ = σ_{ϕ,α}(f)/σ_{ϕ,α}(G).   (3.4)

Case 1: α ∈ (0, ∞) and p ≠ 1. Owing to (2.1) and (2.25) one can write

∫_R ϕ G_{t_ϕ}^{p−1} f dx ≥ a^{p−1} t_ϕ^{1−p} ∫_R ϕ(x) f(x) dx + (1−p) a^{p−1} t_ϕ^{1−p−α} ∫_R ϕ(x)|x|^α f(x) dx
  = a^{p−1} t_ϕ^{1−p} ( E_f[ϕ] + (1−p) t_ϕ^{−α} μ_{ϕ,α}(f) ).   (3.5)

Going back to (3.2), and since t_ϕ^{−α} μ_{ϕ,α}(f) = μ_{ϕ,α}(G), the RHS of (3.5) is bounded below by

a^{p−1} t_ϕ^{1−p} ( E_G[ϕ] + (1−p) μ_{ϕ,α}(G) ) = t_ϕ^{1−p} ∫_R ϕ G^p dx.

Note that equality holds in (3.5) if p < 1, and equality occurs under equality in (3.2).

Case 2: α = ∞ and p ≠ 1. Observe that when α = ∞, f vanishes outside the interval [−t_ϕ, t_ϕ]. So one derives

∫_R ϕ G_{t_ϕ}^{p−1} f dx = a^{p−1} t_ϕ^{1−p} ∫_{−t_ϕ}^{t_ϕ} ϕ f dx ≥ a^{p−1} t_ϕ^{1−p} E_G[ϕ] = t_ϕ^{1−p} ∫_R ϕ G^p dx.   (3.6)

Here the inequality holds because of (3.2). The scaling identity can also be checked for G_{t_ϕ}:

∫_R ϕ(x) G_{t_ϕ}^p(x) dx = t_ϕ^{1−p} ∫_R ϕ(t_ϕ x) G^p(x) dx.   (3.7)
With ϕ*(x) = ϕ(t_ϕ x), recalling (3.5), (3.6) and (3.7), one gets

1 ≤ ( N^w_{ϕ,p}(f, G_{t_ϕ}) )^p
  = ( ∫_R ϕ G_{t_ϕ}^{p−1} f dx )^{p/(1−p)} ( ∫_R ϕ f^p dx )^{−1/(1−p)} ∫_R ϕ G_{t_ϕ}^p dx
  ≤ ( σ_{ϕ,α}(f)/σ_{ϕ,α}(G) ) ( N^w_{ϕ,p}(G) )^p ( N^w_{ϕ*,p}(G) )^{1−p} ( N^w_{ϕ,p}(f) )^{−1},   (3.8)

implying (3.3).

Case 3: α ∈ (0, ∞) and p = 1. By virtue of (3.2) and the Gibbs inequality in [20] one has

0 ≤ D^w_ϕ(f‖G_{t_ϕ})/E_f[ϕ] = −h^w_ϕ(f)/E_f[ϕ] − log a + log t_ϕ + t_ϕ^{−α} μ_{ϕ,α}(f)/E_f[ϕ].

Using (3.2) and (3.4) we conclude the required result in this case from

0 ≤ −h^w_ϕ(f)/E_f[ϕ] + h^w_ϕ(G)/E_G[ϕ] + log σ_{ϕ,α}(f) − log σ_{ϕ,α}(G).

Case 4: α = ∞ and p = 1. By a method analogous to Case 3, one obtains

0 ≤ D^w_ϕ(f‖G_{t_ϕ})/E_f[ϕ] = −h^w_ϕ(f)/E_f[ϕ] − log a + log t_ϕ
  = −h^w_ϕ(f)/E_f[ϕ] + h^w_ϕ(G)/E_G[ϕ] + log σ_{ϕ,∞}(f) − log σ_{ϕ,∞}(G).

Case 5: α = 0 and p > 1. Owing to (1.16):

∫_R ϕ G^p dx = −a^{p−1} E_G[ϕ] log σ_{ϕ,0}(G).   (3.9)

By virtue of (3.2), (1.16) and (3.4), it turns out that

∫_R ϕ G_{t_ϕ}^{p−1} f dx ≥ E_f[ϕ] ( t_ϕ^{1−p} a^{p−1} log t_ϕ − t_ϕ^{1−p} a^{p−1} log σ_{ϕ,0}(f) )
  = t_ϕ^{1−p} ( E_f[ϕ]/E_G[ϕ] ) ∫_R ϕ G^p dx ≥ t_ϕ^{1−p} ∫_R ϕ G^p dx.   (3.10)

Taking (3.8) into account, the result is verified. According to the Gibbs inequality, equality occurs iff f ≡ G_{t_ϕ} for some t_ϕ ∈ (0, ∞), which here is implied by f ≡ G.

In addition, an immediate application of Theorem 3.1, specialized in its final part to p = α = 1, can be established as the following:
Corollary 3.1 Consider an RV X with PDF f and let ϕ(x) = |x|^c, c ∈ R. For α ∈ [0, ∞], set

t_c(f, G) := ( σ_{c+α}(f)/σ_{c+α}(G) )^{(c+α)/α}.   (3.11)

Suppose that the PDF f obeys

E_f[|X|^c] ≥ E_G[|X|^c],  and for p = 1 also E_f[|X|^c] ≥ E_G[ |t_c(f,G) X|^c ].   (3.12)

Then, for p > 1/(1+α),

σ_{ϕ,α}(f)^{c+1}/N^w_{ϕ,p}(f) ≥ σ_{ϕ,α}(G)^{c+1}/N^w_{ϕ,p}(G),  equivalently  σ_{c+α}(f)^{C_α}/N^w_{ϕ,p}(f) ≥ σ_{c+α}(G)^{C_α}/N^w_{ϕ,p}(G),   (3.13)

where C_α = (c+1)(c+α)/α. Consequently, following [6, 13], the appropriate extremal distribution maximizes

N^w_{ϕ,p}(f) = ( ∫_R |x|^c f^p(x) dx )^{1/(1−p)}

subject to a fixed (c+α)-th moment. On the other hand, a direct assertion can be stated as follows: for any PDF f : R ↦ R satisfying the suppositions

E_f[|X|^c] ≥ c!,  E_f[|X|^c] ≥ c! ( E_f[|X|^{c+1}]/(c+1)! )^{c/(c+1)},  c ∈ R,   (3.14)

one gets

(c+1)! N^w_{|x|^c,1}(f) / ( 2 e^{c+1} ) ≤ ∫_R |x|^{c+1} f(x) dx.   (3.15)

Equality occurs when f ≡ (1/2) e^{−|x|}. Note, for instance, that if f ∼ Exp(λ) then for λ ∈ (3, ∞) and c ∈ (−1, 0) both inequalities in (3.14) are fulfilled; this makes (3.15) one of several possible bounds for the WRP.
At this stage we analyse another particular case of Theorem 3.1, namely p = 2, α = 1, in order to derive one more result.

Corollary 3.2 Define the quantity m(c) by

m(c) = 2^{2−c} / ( (c+3)^{2−c} (c+2)^{1−c} ).

Under the conditions

E_f[|X|^c] ≥ 2/( (c+2)(c+1) ),  c + 2 > 0,

one has

m(c) ( ∫_R |x|^{c+1} f(x) dx )^{c−1} ≤ ∫_R |x|^c f²(x) dx.   (3.16)

Equality holds if and only if f ≡ (1 − |x|)_+.
Next, the reader is referred to the definitions of the (α,p)-th weighted FI with the WF ϕ(x) = x² and of the (α,p)-th FI introduced in (1.6). Taking into account the 2-th WRP and the 2-th generalized deviation for the WF ϕ(x) = (G′(x))², we pass to the following assertion.

Corollary 3.3 Consider the generalized 2-Gaussian PDF G(x) = (3/4)(1 − x²), x ∈ (−1, 1). Let f : R ↦ R be a PDF satisfying

σ_2(f) ≥ (2/3) ( J_{2,2}(G) )².   (3.17)

Define the constant w(G) by

w(G) = 10 J^{w,x²}_{2,5/2}(G) ( J_{2,2}(G) )^{−3/2}.

Then the inequality

( ∫_R x² f²(x) dx )^{−1} ≤ 2 w(G) ( ∫_R x⁴ f(x) dx )^{3/2}   (3.18)

holds true. Equality occurs when f ≡ (3/4)(1 − x²)_+. Let us note that σ_2(f) in (3.17) is the 2-th deviation given by

σ_2(f) = ( ∫_R x² f(x) dx )^{1/2}.
Recently, in [13], it was shown that among all PDFs the unique distribution minimizing the p-th Renyi entropy with given (α,p)-th Fisher information is the generalized Gaussian. Regarding the weighted version, employing the WF ϕ, we establish an extended assertion associated with the generalized Gaussian. The proof is given in the Appendix.

Suppose the RV X has PDF f, and let (a, b), a, b ∈ [−∞, ∞], be the smallest interval containing the support of the absolutely continuous PDF f. Define an increasing absolutely continuous function s : (a, b) ↦ (−k, k), for some k ∈ (0, ∞], such that for all x ∈ (a, b)

∫_a^x f(t) dt = ∫_{−k}^{s(x)} G(t) dt.

Here G represents the generalized Gaussian density. Observe that the RV S := s(X) has density G. Next, consider a WF x ∈ R ↦ ϕ(x) ≥ 0. Given p, α and its Hölder conjugate β, introduce the additional WFs

ρ_1(x) = ϕ(x)^{α/(1−p)},  ρ_2(x) = ϕ(x)^{pβ/(p−1)}.   (3.19)

Further, let x ∈ R ↦ T(x) ∈ S be a differentiable function. Denoting, as before, the derivative of a function ρ by ρ′, define

ρ_s(x) = ϕ̃(x)^{1/(1−p)} ϕ(x)^{p/(p−1)},  η_{ϕ,p}(s) = ∫_S s(x) ρ′_s(x) f^p(x) dx,   (3.20)

where ϕ̃(x) = ϕ(s(x)) and S denotes the support of f. Here

ρ′_s(x) = ρ_s(x) ( (1/(1−p)) s′(x) ϕ′(s(x))/ϕ(s(x)) + (p/(p−1)) ϕ′(x)/ϕ(x) ).

Moreover, involving the p-th WRP of G given in Section 2, set

κ_{ϕ,p}(s) = η_{ϕ,p}(s) ( N^w_{ρ_1,p}(G) )^{p−1}.   (3.21)
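The transport map s introduced above can be constructed numerically. The sketch below assumes f is a standard normal and G the Laplace density (1/2)e^{−|t|} (the α = 1, p = 1 generalized Gaussian), with the inversion done through the explicit Laplace quantile; a Monte Carlo check confirms that S = s(X) has density G.

```python
import math
import random

# Construction of s in the setup above: s = H^{-1} o F, where F is the CDF of f
# (standard normal here) and H the CDF of the Laplace density G(t) = (1/2)e^{-|t|}.
# Then S = s(X) has density G; we check its first two moments by Monte Carlo.

def F(x):                      # standard normal CDF
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def H_inv(u):                  # Laplace quantile function
    return math.log(2.0 * u) if u < 0.5 else -math.log(2.0 * (1.0 - u))

s = lambda x: H_inv(F(x))

random.seed(0)
sample = [s(random.gauss(0.0, 1.0)) for _ in range(200000)]
m = sum(sample) / len(sample)
v = sum((z - m) ** 2 for z in sample) / len(sample)
print(m, v)    # Laplace(0,1) has mean 0 and variance 2
```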
One checks that when ϕ ≡ 1, η_{ϕ,p}(s), and consequently κ_{ϕ,p}(s), vanish.

Theorem 3.2 (The extended Fisher information inequality (FII), cf. [13], Theorem 3.) Assume α < ∞, p > 1, and let 1/α + 1/β = 1. Consider two RVs Z, Y with Beta distributions with parameters ( 1/α, (2p−1)/(p−1) ) and ( p/(p−1), (α+1)/α ) respectively. Then, for a continuous WF ϕ,

( N^w_{ϕ,p}(G)/N^w_{ρ_1,p}(G) ) ( N^w_{ρ_1,p}(G)/N^w_{ϕ,p}(f) )^p + κ_{ϕ,p}(s) ≤ ( J^{w,ρ_2}_{α,p}(f)/J^{w,ρ_1}_{α,p}(G) )^{1/β} Λ̃_{ρ_1,p}(Y)/Λ̃_{ρ_1,p}(Z),   (3.22)

where κ_{ϕ,p}(s) refers to (3.21). If α < ∞ and p ∈ (1/(1+α), 1), substitute Λ_{ρ_1,p}(Y), Λ_{ρ_1,p}(Z) for Λ̃_{ρ_1,p}(Y), Λ̃_{ρ_1,p}(Z). Furthermore, if p = 1 then

( N^w_{ϕ,1}(G) )^{E_G[ϕ]} ≤ ( N^w_{ϕ̃,1}(f) )^{E_G[ϕ]} { 2^{−1} ( J^{w,ϕ̃}_{α,1}(f)/J^{w,ϕ}_{α,1}(G) )^{1/β} Θ_α(W) − E_f[S·(ϕ̃)′] },   (3.23)

providing W ∼ Gamma( (α+1)/α, 1 ) and the reduced WF ϕ̃(x) = ϕ(s(x)), with (ϕ̃)′ = (d/dx)ϕ̃. Finally, in the case α = ∞:

( N^w_{ϕ,p}(G)/N^w_{ϕ,p}(f) )^p ≤ J^{w,ρ_s}_{∞,p}(f)/J^{w,ϕ}_{∞,p}(G) − ∆_{ϕ,p},   (3.24)

where

∆_{ϕ,p} = ( J^{w,ϕ}_{∞,p}(G) )^{−1} ( p^{−1} η_{ϕ,p}(s) − 2^{−1−p} ( ψ(1) − ψ(−1) ) ),   (3.25)

and ψ(x) = ∫_0^x ϕ′(t) dt.

Remark 3.1 Passing to ϕ ≡ 1, the quantities κ_{ϕ,p}(s), ∆_{ϕ,p} and E_f[S·(ϕ̃)′] vanish; thus all of (3.22), (3.23), (3.24) reduce to the same result as (22) in [13]. The next assertion follows directly from Theorem 3.2 in the special case α = 1, p = 1.
Corollary 3.4 Let X be an RV with density f, and let (a, b), a, b ∈ [−∞, ∞], be the support of f. Consider the map s : (a, b) ↦ (−k, k), for some k ∈ (0, ∞], such that for each x ∈ (a, b)

∫_a^x f(t) dt = (1/2) ∫_{−k}^{s(x)} e^{−|t|} dt.   (3.26)

For a constant c, consider the RV S := s(X) and set

B_s(f) = E_f[ S S′ e^{−cS} ],  A_s(f) = E_f[ S² S′ |S|^{c−2} ].   (3.27)

Then for −1 < c one has

c! / ( 2^c exp{2^c} N^w_{|s|^c,1}(f) ) ≤ c! 2^c sup |s|^c |(log f)′| − c A_s(f),   (3.28)

and for −1/2 < c < 1/2 one gets

(1 − 4c²)^{1/(1−c²)} 2 exp{ (1−c²)/(1−4c²) } ( N^w_{e^{−cs},1}(f) )^{1−c²} ≤ sup e^{−cs} |(log f)′| + c(1 − 4c²) B_s(f).   (3.29)
Here s′ denotes the derivative of s with respect to x.

An application of the extended versions of the MEI and FII provides the following developed form of the Cramér-Rao inequality:

Theorem 3.3 (The extended Cramér-Rao inequality, cf. [13], Theorem 5.) Assume the suppositions of Theorems 3.1 and 3.2. For a given WF ϕ, the reduced WFs ϕ*, ρ_1 are as before, (3.1), (3.19). Moreover, set

ϖ_{ϕ,p}(f, G) = N^w_{ϕ*,p}(G) N^w_{ρ_1,p}(G) / ( N^w_{ϕ,p}(G) N^w_{ϕ,p}(f) ).   (3.31)

Then the LHSs of the inequalities (3.22), (3.23) and (3.24) are given by

( σ_{ϕ,α}(G)/σ_{ϕ,α}(f) ) ( ϖ_{ϕ,p}(f, G) )^{p−1},  if α < ∞, p > 1,

and, if p = 1, by

( σ_{ϕ,α}(G)/σ_{ϕ,α}(f) )^{E_G[ϕ]}.

In the particular case α = ∞, the LHS becomes σ_{ϕ,∞}(G)/σ_{ϕ,∞}(f). The RHSs are the same as in the corresponding inequalities, hence omitted.

Acknowledgements – SYS thanks Professors Yuri Suhov and Mark Kelbert for useful discussions. SYS thanks the CAPES PNPD-UFSCAR Foundation for financial support in the years 2014-5, and the Federal University of São Carlos, Department of Statistics, for hospitality during the years 2014-5.
Appendix

Lemma 4, cf. [13]: Let a, b ∈ [−∞, ∞] and let f : (a, b) ↦ R be an absolutely continuous function such that lim_{x→a} f(x) = lim_{x→b} f(x) = 0. Let g : (a, b) ↦ R be an increasing absolutely continuous function such that lim_{t→b} g(t) > 0 and the integral ∫_a^b f g′ dx is absolutely convergent. Then

∫_a^b f g′ dx = −∫_a^b f′ g dx.
Proof of Theorem 3.2: To implement the same steps as in the proof of Theorem 3 in [13], we offer the proof in three cases.

Case 1: p ≠ 1, α < ∞. We begin this case by computing the p-th WRP of the RV S := s(X):

N^w_{ϕ,p}(G) = ( ∫_a^b ϕ(s(x)) f^p (s′)^{1−p} dx )^{1/(1−p)},  p ≠ 1,
N^w_{ϕ,1}(G) = exp{ h^w_ϕ(G)/E_G[ϕ] },  p = 1.   (3.32)

It is easily seen that

h^w_ϕ(G) = h^w_{ϕ̃}(f) + E_f[ ϕ̃ log s′ ],

where ϕ̃(z) = ϕ(s(z)) and s′ is the derivative of s. Now one has

( N^w_{ϕ,p}(f) )^{−p} N^w_{ϕ,p}(G) = ( ∫_a^b ϕ f^p dx )^{−p/(1−p)} ( ∫_a^b ϕ(s(x)) f^p (s′)^{1−p} dx )^{1/(1−p)}
  ≤ ∫_a^b ρ_s(x) s′ f^p dx.   (3.33)
The inequality comes from the Hölder inequality with ρ_s as in (3.20). Recalling Lemma 4, cf. [13], and the notation in (3.20), the RHS of (3.33) takes the form

−p ∫_a^b s(x) ρ_s(x) f^{p−1}(x) f′(x) dx − η_{ϕ,p}(s).

At this stage let us focus on the integral above. By the Hölder inequality,

−p ∫_a^b s(x) ρ_s(x) f^{p−1}(x) f′(x) dx
  ≤ p ( ∫_a^b ϕ̃(x)^{α/(1−p)} |s(x)|^α f(x) dx )^{1/α} ( ∫_a^b ϕ(x)^{pβ/(p−1)} |f^{p−1−1/α} f′|^β dx )^{1/β}
  ≤ p σ_{ρ_1,α}(G) ( J^{w,ρ_2}_{α,p}(f) )^{1/β}.

This leads to

( N^w_{ϕ,p}(f) )^{−p} N^w_{ϕ,p}(G) ≤ p σ_{ρ_1,α}(G) ( J^{w,ρ_2}_{α,p}(f) )^{1/β} − η_{ϕ,p}(s).
Eventually, using the identities (2.13) (for p > 1) and (2.16) (for p < 1) to express p σ_{ρ_1,α}(G) gives the required results.

Case 2: α < ∞, p = 1. To deduce (3.23) we use Jensen's inequality. One gets

h^w_ϕ(G) = h^w_{ϕ̃}(f) + ∫_a^b ϕ̃(x) f(x) log s′(x) dx
  ≤ h^w_{ϕ̃}(f) − E_f[ϕ̃] log E_f[ϕ̃] + E_f[ϕ̃] log ∫_a^b ϕ̃(x) f(x) s′(x) dx.   (3.34)

By virtue of Lemma 4 in [13] and the Hölder inequality once again, one obtains

h^w_ϕ(G) ≤ h^w_{ϕ̃}(f) − E_f[ϕ̃] log E_f[ϕ̃] + E_f[ϕ̃] log{ σ_{ϕ,α}(G) ( J^{w,ϕ̃}_{α,1}(f) )^{1/β} − E_f[S·(ϕ̃)′] },

where β, as in the entire paper, is the Hölder conjugate of α. Therefore applying (2.20), (1.5) and E_f[ϕ̃] = E_G[ϕ] completes the proof in this case.
Case 3: α = ∞. Setting s(x) = −k for all x ∈ (−∞, a] and s(x) = k for all x ∈ [b, ∞), let us go back to (3.33), which gives

( N^w_{ϕ,p}(f) )^{−p} N^w_{ϕ,p}(G) ≤ −∫_a^b s(x) ρ′_s(x) f^p(x) dx − ∫_a^b s(x) ρ_s(x) (f^p(x))′ dx
  ≤ k ∫_R ρ_s(x) |(f^p(x))′| dx − η_{ϕ,p}(s) = p J^{w,ρ_s}_{∞,p}(f) − η_{ϕ,p}(s).

Therefore the claimed bound is provided.
References

[1] M. Belis and S. Guiasu. A quantitative-qualitative measure of information in cybernetic systems. IEEE Trans. on Inf. Theory, 14 (1968), 593–594.
[2] J. F. Bercher. On generalized Cramér-Rao inequalities, generalized Fisher information and characterizations of generalized q-Gaussian distributions. Journal of Physics A: Mathematical and Theoretical, Vol. 45 (2012), No. 25, 255–303.
[3] J. F. Bercher. On a (β,q)-generalized Fisher information and inequalities involving q-Gaussian distributions. J. Math. Phys., Vol. 53 (2012), no. 6, 063–303.
[4] A. Clim. Weighted entropy with application. Analele Universităţii Bucureşti, Matematică, Anul LVII (2008), 223–231.
[5] J. A. Costa, A. O. Hero and C. Vignat. A characterization of the multivariate distributions maximizing Renyi entropy. In Proceedings of the 2002 IEEE International Symposium on Information Theory, (2002), page 263.
[6] T. Cover and J. Thomas. Elements of Information Theory. New York: Wiley, 2006.
[7] H. Cramér. Mathematical Methods of Statistics. Princeton Landmarks in Mathematics. Princeton, N.J.: Princeton University Press, 1999, reprint of the 1946 original.
[8] S. Guiasu. Weighted entropy. Reports on Math. Physics, 2 (1971), 165–179.
[9] M. Del Pino and J. Dolbeault. Best constants for Gagliardo-Nirenberg inequalities and applications to nonlinear diffusions. J. Math. Pures Appl., Vol. 81 (2002), no. 9, 847–875.
[10] R. A. Fisher. Theory of statistical estimation. Philos. Trans. Roy. Soc. London Ser. A, Vol. 222 (1930), 309–368.
[11] S. Furuichi. On generalized Fisher informations and Cramér-Rao type inequalities. Journal of Physics: Conference Series 2, 201 (2010), XXX
[12] M. Kelbert and Y. Suhov. Information Theory and Coding by Example. Cambridge: Cambridge University Press, 2013.
[13] E. Lutwak, D. Yang and G. Zhang. Cramér-Rao and moment-entropy inequalities for Renyi entropy and generalized Fisher information. IEEE Transactions on Information Theory, Vol. 51 (2005), no. 2, 473–478.
[14] E. Lutwak, D. Yang and G. Zhang. Moment-entropy inequalities. Annals of Probability, Vol. 32 (2004), 757–774.
[15] J. R. Ohm. Multimedia Communication Technology. Springer-Verlag, Berlin Heidelberg, 2004.
[16] A. Renyi. On measures of entropy and information. In Proc. Fourth Berkeley Symp. Math. Stat. Prob., Vol. 1 (1960), page 547. Berkeley (1961), University of California Press.
[17] B. D. Sharma, J. Mitter and M. Mohan. On measures of 'useful' information. Inform. Control, 39 (1978), 323–336.
[18] R. P. Singh and J. D. Bhardwaj. On parametric weighted information improvement. Inf. Sci., 59 (1992), 149–163.
[19] A. Stam. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Inform. Contr., Vol. 2 (1959), 101–102.
[20] Y. Suhov and S. Yasaei Sekeh. Simple inequalities for weighted entropies. arXiv:1409.4102.
[21] Y. Suhov and S. Yasaei Sekeh. An extension of the Ky Fan inequality. arXiv:1504.01166.
[22] Y. Suhov, S. Yasaei Sekeh and I. Stuhl. Weighted Gaussian entropy and determinant inequalities. arXiv:1502.02188.
[23] Y. Suhov, S. Yasaei Sekeh and M. Kelbert. Entropy-power inequality for weighted entropy. arXiv:1502.02188.
[24] Y. Suhov, I. Stuhl and M. Kelbert. Weight functions and log-optimal investment portfolios. arXiv:1505.01437.
[25] R. Zamir. A proof of the Fisher information inequality via a data processing argument. IEEE Transactions on Information Theory, 44, No. 3 (1998), 1246–1250.
Salimeh Yasaei Sekeh is with the Statistics Department, DEs, of the Federal University of São Carlos (UFSCar), São Paulo, Brazil.