SARAJEVO JOURNAL OF MATHEMATICS Vol.5 (17) (2009), 3–12
RIEMANN-LIOUVILLE AND CAPUTO FRACTIONAL APPROXIMATION OF CSISZAR’S f –DIVERGENCE GEORGE A. ANASTASSIOU Abstract. Here are established various tight probabilistic inequalities that give nearly best estimates for the Csiszar’s f -divergence. These involve Riemann-Liouville and Caputo fractional derivatives of the directing function f. Also a lower bound is given for the Csiszar’s distance. The Csiszar’s discrimination is the most essential and general measure for the comparison between two probability measures. This is continuation of [4].
1. Preliminaries Throughout this paper we use the following. I) Let f be a convex function from (0, +∞) into R which is strictly convex at 1 with f (1) = 0. Let (X, A, λ) be a measure space, where λ is a finite or a σ-finite measure on (X, A) . And let µ1 , µ2 be two probability measures on (X, A) such that µ1 ¿ λ, µ2 ¿ λ (absolutely continuous), e.g. λ = µ1 + µ2 . dµ2 1 Denote by p = dµ dλ , q = dλ the (densities) Radon-Nikodym derivatives of µ1 , µ2 with respect to λ. Here we assume that p 0 < a ≤ ≤ b, a.e. on X and a ≤ 1 ≤ b. q The quantity ¶ µ Z p (x) Γf (µ1 , µ2 ) = q (x) f dλ (x) , (1) q (x) X was introduced by I. Csiszar in 1967, see [7], and is called f -divergence of the probability measures µ1 and µ2 . By Lemma 1.1 of [7], the integral (1) is well-defined and Γf (µ1 , µ2 ) ≥ 0 with equality only when µ1 = µ2 . In [7] the author without proof mentions that Γf (µ1 , µ2 ) does not depend on the choice of λ. 2000 Mathematics Subject Classification. 26A33, 26D15, 28A25, 60E15. Key words and phrases. Csiszar’s discrimination, Csiszar’s distance, fractional calculus, Riemann-Liouville and Caputo fractional derivatives.
4
GEORGE A. ANASTASSIOU
For a proof of the last see [4], Lemma 1.1. The concept of f -divergence was introduced first in [6] as a generalization of Kullback’s “information for discrimination” or I-divergence (generalized entropy) [11], [12] and of R´enyi’s “information gain” (I-divergence of order α) [13]. In fact the I-divergence of order 1 equals Γu log2 u (µ1 , µ2 ) . 2
The choice f (u) = (u − 1) produces again a known measure of difference of distributions that Ris called κ 2 −divergence, of course the total variation distance |µ1 − µ2 | = X |p (x) − q (x)| dλ (x) equals Γ|u−1| (µ1 , µ2 ) . Here by assuming f (1) = 0 we can consider Γf (µ1 , µ2 ) as a measure of the difference between the probability measures µ1 , µ2 . The f -divergence is in general asymmetric in µ1 and µ2 . But since f is convex and strictly convex at 1 (see Lemma 2, [4]) so is µ ¶ 1 ∗ f (u) = uf (2) u and as in [7] we get Γf (µ2 , µ1 ) = Γf ∗ (µ1 , µ2 ) . (3) In Information Theory and Statistics many other concrete divergences are used which are special cases of the above general Csiszar f -divergence, e.g. Hellinger distance DH , α-divergence Dα , Bhattacharyya distance DB , Harmonic distance DHα , Jeffrey’s distance DJ , triangular discrimination D∆ , for all these see, e.g. [5], [9]. The problem of finding and estimating the proper distance (or difference or discrimination) of two probability distributions is one of the major ones in Probability Theory. The above f -divergence measures in their various forms have been also applied to Anthropology, Genetics, Finance, Economics, Political Science, Biology, Approximation of Probability distributions, Signal Processing and Pattern Recognition. A great inspiration for this article has been the very important monograph on the topic by S. Dragomir [9]. II) Here we follow [8]. We start with Definition 1. Let ν ≥ 0, the operator Jaν , defined on L1 (a, b) by Z x 1 ν Ja f (x) := (x − t)ν−1 f (t) dt Γ (ν) a
(4)
for a ≤ x ≤ b, is called the Riemann-Liouville fractional integral operator of order ν. For ν = 0, we set Ja0 := I, the identity operator. Here Γ stands for the gamma function.
FRACTIONAL APPROXIMATION OF CSISZAR’S f –DIVERGENCE
5
Let α > 0, f ∈ L1 (a, b) , a, b ∈ R, see [8]. Here [·] stands for the integral part of the number. We define the generalized Riemann-Liouville fractional derivative of f of order α by µ ¶m Z s 1 d α Da f (s) := (s − t)m−α−1 f (t) dt, Γ (m − α) ds a where m := [α] + 1, s ∈ [a, b] , see also [1], Remark 46 there. In addition, we set Da0 f := f, Ja−α f := Daα f, if α > 0, Da−α f := Jaα f, if 0 < α ≤ 1, Dan f = f (n) , for n ∈ N.
(5)
We need Definition 2. ([3]) We say that f ∈ L1 (a, b) has an L∞ fractional derivative Daα f (α > 0) in [a, b] , a, b ∈ R, iff Daα−k f ∈ C ([a, b]) , k = 1, . . . , m := [α] + 1, and Daα−1 f ∈ AC ([a, b]) (absolutely continuous functions) and Daα f ∈ L∞ (a, b) . Lemma 3. ([3]) Let β > α ≥ 0, f ∈ L1 (a, b) , a, b ∈ R, have L∞ fractional derivative Daβ f in [a, b] , let Daβ−k f (a) = 0 for k = 1, . . . , [β] + 1. Then Z s 1 α Da f (s) = (s − t)β−α−1 Daβ f (t) dt, ∀s ∈ [a, b] . (6) Γ (β − α) a Here Daα f ∈ AC ([a, b]) for β − α ≥ 1, and Daα f ∈ C ([a, b]) for β − α ∈ (0, 1) . Here AC n ([a, b]) is the space of functions with absolutely continuous (n − 1)-st derivative. We need to mention Definition 4. ([8]) Let ν ≥ 0, n := dνe , d·e is ceiling of the number, f ∈ AC n ([a, b]) . We call Caputo fractional derivative Z x 1 ν D∗a f (x) := (x − t)n−ν−1 f (n) (t) dt, ∀x ∈ [a, b] . (7) Γ (n − ν) a ν f (x) exists almost everywhere for x ∈ [a, b] . The above function D∗a We need ν f Proposition 5. ([8]) Let ν ≥ 0, n := dνe , f ∈ AC n ([a, b]) . Then D∗a exists iff the generalized Riemann-Liouville fractional derivative Daν f exists.
6
GEORGE A. ANASTASSIOU
Proposition 6. ([8]) Let ν ≥ 0, n := dνe . Assume that f is such that both ν f and D ν f exist. Suppose that f (k) (a) = 0 for k = 0, 1, . . . , n − 1. Then D∗a a ν D∗a f = Daν f.
(8)
In conclusion ν f exists or Corollary 7. ([2]) Let ν ≥ 0, n := dνe , f ∈ AC n ([a, b]) , D∗a ν (k) Da f exists, and f (a) = 0, k = 0, 1, . . . , n − 1. Then ν Daν f = D∗a f.
(9)
We need Theorem 8. ([2]) Let ν ≥ 0, n := dνe , f ∈ AC n ([a, b]) and f (k) (a) = 0, k = 0, 1, . . . , n − 1. Then Z x 1 ν f (x) = (x − t)ν−1 D∗a f (t) dt. (10) Γ (ν) a We also need Theorem 9. ([2]) Let ν ≥ γ + 1, γ ≥ 0. Call n := dνe . Assume f ∈ ν f ∈ L (a, b) . AC n ([a, b]) such that f (k) (a) = 0, k = 0, 1, . . . , n−1, and D∗a ∞ γ Then D∗a f ∈ AC ([a, b]) , and Z x 1 γ ν D∗a f (x) = (x − t)ν−γ−1 D∗a f (t) dt, ∀x ∈ [a, b] . (11) Γ (ν − γ) a Theorem 10. ([2])Let ν ≥ γ + 1, γ ≥ 0, n := dνe . Let f ∈ AC n ([a, b]) such that f (k) (a) = 0, k = 0, 1, . . . , n − 1. Assume ∃Daν f (x) ∈ R, ∀x ∈ [a, b] , and Daν f ∈ L∞ (a, b) . Then Daγ f ∈ AC ([a, b]) , and Z x 1 γ Da f (x) = (x − t)ν−γ−1 Daν f (t) dt, ∀x ∈ [a, b] . (12) Γ (ν − γ) a 2. Results Here f and the whole setting is as in 1. Preliminaries (I). We present first results regarding the Riemann-Liouville fractional derivative. Theorem 11. Let β > 0, f ∈ L1 (a, b) , have L∞ fractional derivative Daβ f in [a, b] , let Daβ−k f (a) = 0 for k = 1, . . . , [β] + 1. Also assume 0 < a ≤ p(x) q(x) ≤ b, a.e. on X, a < b. Then ° ° ° β ° Z °Da f ° ∞,[a,b] q (x)1−β (p (x) − aq (x))β dλ (x) . (13) Γf (µ1 , µ2 ) ≤ Γ (β + 1) X
FRACTIONAL APPROXIMATION OF CSISZAR’S f –DIVERGENCE
Proof. By (6) , α = 0, we get Z s 1 f (s) = (s − t)β−1 Daβ f (t) dt, all a ≤ s ≤ b. Γ (β) a Then Z s ¯ ¯ 1 ¯ ¯ |f (s)| ≤ (s − t)β−1 ¯Daβ f (t)¯ dt Γ (β) a ° ° ° β ° Z s °Da f ° ∞,[a,b] (s − t)β−1 dt ≤ Γ (β) a ° ° ° β ° °Da f ° β ∞,[a,b] (s − a) = Γ (β) β ° ° ° β ° °Da f ° ∞,[a,b] = (s − a)β , all a ≤ s ≤ b. Γ (β + 1) I.e. we have that ° ° ° β ° °Da f ° ∞,[a,b] (s − a)β , all a ≤ s ≤ b. |f (s)| ≤ Γ (β + 1) Consequently we obtain µ ¶ Z p (x) Γf (µ1 , µ2 ) = q (x) f dλ (x) q (x) X ° ° ° β ° ¶β µ Z °Da f ° p (x) ∞,[a,b] − a dλ (x) ≤ q (x) Γ (β + 1) q (x) X ° ° ° β ° Z °Da f ° ∞,[a,b] = q (x)1−β (p (x) − aq (x))β dλ (x) , Γ (β + 1) X proving the claim.
7
(14)
(15)
(16)
(17) ¤
Next we give an Lδ result. Theorem 12. Same assumptions as in Theorem 11. Let γ, δ > 1 : γ1 + 1δ = 1 and γ (β − 1) + 1 > 0. Then ° ° ° β ° D f ° a ° δ,[a,b] Γf (µ1 , µ2 ) ≤ Γ (β) (γ (β − 1) + 1)1/γ Z β−1+ γ1 2−β− γ1 (p (x) − aq (x)) dλ (x) . (18) q (x) X
8
GEORGE A. ANASTASSIOU
Proof. By (6) , α = 0, we get again Z s 1 f (s) = (s − t)β−1 Daβ f (t) dt, all a ≤ s ≤ b. Γ (β) a Hence 1 |f (s)| ≤ Γ (β) ≤
≤
Z
s
a
¯ ¯ ¯ ¯ (s − t)β−1 ¯Daβ f (t)¯ dt
µZ
1 Γ (β) ° ° ° β ° °Da f °
s
¶1/γ µZ (s − t)γ(β−1) dt
a β−1+ γ1
(s − a)
(γ (β − 1) + 1)1/γ
Γ (β) ° ° ° β ° °Da f °
|f (s)| ≤
δ,[a,b]
Γ (β)
¯δ ¶1/δ ¯ β ¯ D f (t) ¯ a ¯ dt
s¯
a
δ,[a,b]
That is
(19)
, all a ≤ s ≤ b.
β−1+ γ1
(s − a)
(γ (β − 1) + 1)1/γ
, alla ≤ s ≤ b.
Consequently we obtain ¯ µ ¶¯ Z ¯ p ¯¯ Γf (µ1 , µ2 ) ≤ q ¯¯f dλ q ¯ X ° ° ° β ° ¶β−1+ 1 µ Z °Da f ° γ p δ,[a,b] − a dλ ≤ q q Γ (β) (γ (β − 1) + 1)1/γ X ° ° ° β ° Z °Da f ° δ,[a,b] β−1+ γ1 2−β− γ1 = (p − aq) dλ, q Γ (β) (γ (β − 1) + 1)1/γ X proving the claim.
(20)
(21)
(22) ¤
An L1 estimate follows. Theorem 13. Same assumptions as in Theorem 11. Let β ≥ 1. Then ° ° ° β ° µZ ¶ °Da f ° 1,[a,b] 2−β β−1 (q (x)) (p (x) − aq (x)) dλ (x) (23) Γf (µ1 , µ2 ) ≤ Γ (β) X Proof. By (19) we have Z s ¯ ¯ 1 ¯ β−1 ¯ β (s − t) |f (s)| ≤ ¯Da f (t)¯ dt Γ (β) a Z ¯ ° (s − a)β−1 b ¯¯ β (s − a)β−1 ° ¯ ° β ° ≤ . ¯Da f (t)¯ dt = °Da f ° Γ (β) Γ (β) 1,[a,b] a
(24)
FRACTIONAL APPROXIMATION OF CSISZAR’S f –DIVERGENCE
I.e. |f (s)| ≤
9
° (s − a)β−1 ° ° β ° , °Da f ° Γ (β) 1,[a,b]
(25)
for all s in [a, b] . Therefore
° ° ° β ° ¯ µ ¶¯ µ ¶β−1 Z D f ° a ° ¯ p ¯¯ p 1,[a,b] ¯ q ¯f Γf (µ1 , µ2 ) ≤ dλ ≤ q −a dλ q ¯ Γ (β) q X °X ° ° β ° µZ ¶ °Da f ° 1,[a,b] β−1 2−β q (p − aq) dλ , (26) = Γ (β) X Z
proving the claim.
¤
We continue with results regarding the Caputo fractional derivative. Theorem 14. Let ν > 0, n := dνe , f ∈ AC n ([a, b]) and f (k) (a) = 0, ν f ∈ L (a, b) , 0 < a ≤ p(x) ≤ b, a.e. on X, k = 0, 1, . . . , n − 1. Assume D∗a ∞ q(x) a < b. Then Z ν fk kD∗a ∞,[a,b] Γf (µ1 , µ2 ) ≤ q (x)1−ν (p (x) − aq (x))ν dλ (x) . (27) Γ (ν + 1) X Proof. Similar to Theorem 11, using Theorem 8.
¤
Next we give an Lδ result. Theorem 15. Assume all as in Theorem 14. Let γ, δ > 1 : γ (ν − 1) + 1 > 0 . Then Z ν fk kD∗a δ,[a,b] 2−ν− γ1 Γf (µ1 , µ2 ) ≤ (p (x) q (x) 1/γ Γ (ν) (γ (ν − 1) + 1) X
1 γ
ν−1+ γ1
− aq (x))
+
1 δ
= 1 and
dλ (x) . (28)
Proof. Similar to Theorem 12, using Theorem 8.
¤
It follows an L1 estimate. Theorem 16. Assume all as in Theorem 14. Let ν ≥ 1. Then µZ ¶ ν fk kD∗a 1,[a,b] 2−ν ν−1 Γf (µ1 , µ2 ) ≤ (q (x)) (p (x) − aq (x)) dλ (x) . Γ (ν) X (29) Proof. Similar to Theorem 13, using Theorem 8. Regarding again the Riemann-Liouville fractional derivative we need:
¤
10
GEORGE A. ANASTASSIOU
Corollary 17. Let ν ≥ 0, n := dνe , f ∈ AC n ([a, b]) , ∃Daν f (x) ∈ R, ∀x ∈ [a, b] , f (k) (a) = 0, k = 0, 1, . . . , n − 1. Then Z x 1 f (x) = (x − t)ν−1 Daν f (t) dt. (30) Γ (ν) a Proof. By Corollary 7 and Theorem 8.
¤
We continue with results again regarding the Riemann-Liouville fractional derivative. Theorem 18. Let ν > 0, n := dνe , f ∈ AC n ([a, b]) , ∃Daν f (x) ∈ R, ∀x ∈ [a, b] , f (k) (a) = 0, k = 0, 1, . . . , n − 1. Assume Daν f ∈ L∞ (a, b) , 0 < a ≤ p(x) q(x) ≤ b, a.e. on X, a < b. Then kDaν f k∞,[a,b] Z Γf (µ1 , µ2 ) ≤ q (x)1−ν (p (x) − aq (x))ν dλ (x) . (31) Γ (ν + 1) X Proof. Similar to Theorem 11, using Corollary 17.
¤
Next we give the corresponding Lδ result. Theorem 19. Assume all as in Theorem 18. Let γ, δ > 1 : γ (ν − 1) + 1 > 0 . Then Z kDaν f kδ,[a,b] 2−ν− γ1 (p (x) Γf (µ1 , µ2 ) ≤ q (x) 1/γ Γ (ν) (γ (ν − 1) + 1) X
1 γ
ν−1+ γ1
− aq (x)) Proof. Similar to Theorem 12, using Corollary 17.
+
1 δ
= 1 and
dλ (x) . (32) ¤
It follows the L1 estimate. Theorem 20. Assume all as in Theorem 18. Let ν ≥ 1. Then µ ¶ kDaν f k1,[a,b] Z 2−ν ν−1 Γf (µ1 , µ2 ) ≤ (q (x)) (p (x) − aq (x)) dλ (x) . (33) Γ (ν) X Proof. Similar to Theorem 13, using Corollary 17.
¤
We need Theorem 21. (Taylor expansion for Caputo derivatives, [8], p. 40) Assume ν ≥ 0, n = dνe , and f ∈ AC n ([a, b]) . Then Z x n−1 X f (k) (a) 1 ν (x − a)k + (x − t)ν−1 D∗a f (t) dt, ∀x ∈ [a, b] . f (x) = k! Γ (ν) a k=0 (34)
FRACTIONAL APPROXIMATION OF CSISZAR’S f –DIVERGENCE
11
We make Remark 22. Let ν > 0, n = dνe , and f ∈ AC n ([a, b]) . ν f ≥ 0 over [a, b] , then If D∗a (≤0)
Z
x
a
ν (x − t)ν−1 D∗a f (t) dt ≥ 0 on [a, b] . (≤0)
By (34) then we obtain f (x) ≥ (≤)
n−1 X k=0
f (k) (a) (x − a)k , k!
(35)
∀x ∈ [a, b] . Hence µ ¶ ¶k n−1 X f (k) (a) µ p p ≥ (≤) qf q − a , a.e. on X. q k! q
(36)
k=0
Consequently we get Γf (µ1 , µ2 ) ≥ (≤)
n−1 X k=0
f (k) (a) k!
Z q 1−k (p − aq)k dλ.
(37)
X
We have established ν f ≥ 0 on Theorem 23. Let ν > 0, n = dνe , and f ∈ AC n ([a, b]) . If D∗a (≤0)
[a, b] , then Γf (µ1 , µ2 ) ≥ (≤)
n−1 X k=0
f (k) (a) k!
µZ 1−k
(q (x))
¶ (p (x) − aq (x)) dλ (x) . k
X
(38)
We finish with Remark 24. Using Lemma 3, Theorem 9 and Theorem 10 and in their γ settings, for g any of Daα f, D∗a f, Daγ f, which fulfill the conditions and assumptions of 1. Preliminaries (I), we can find as above similar estimates for Γg (µ1 , µ2 ) .
References [1] G. Anastassiou, Fractional Poincar´e type inequalities, submitted, 2007. [2] G. Anastassiou, Caputo fractional multivariate Opial type inequalities on spherical shells, submitted, 2007.
12
GEORGE A. ANASTASSIOU
[3] G. Anastassiou, Riemann-Liouville fractional multivariate Opial type inequalities on spherical shells, Bull. Allahabad Math. Soc., 23 (2008), 65–140. [4] G. Anastassiou, Fractional and other approximation of Csiszar’s f -divergence, Rend. Circ. Mat. Palermo, Serie II, Suppl. 99 (2005), 5–20. [5] N. S. Barnett, P. Cerone, S. S. Dragomir, and A. Sofo, Approximating Csiszar’s f divergence by the use of Taylor’s formula with integral remainder, (paper #10, pp. 16), in Inequalities for Csiszar f -Divergence in Information Theory, S. S. Dragomir (ed.), Victoria University, Melbourne, Australia, 2000. On line: http://rgmia.vu.edu.au [6] I. Csiszar, Eine Informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizit¨ at von Markoffschen Ketten, Magyar Trud. Akad. Mat. Kutato Int. K¨ ozl. 8 (1963), 85–108. [7] I. Csiszar, Information-type measures of difference of probability distributions and indirect observations, Stud. Sci. Math. Hung., 2 (1967), 299–318. [8] Kai Diethelm, Fractional Differential Equations, on line: http://www.tu-bs.de/˜ diethelm/lehre/f-dgl02/fde-skript.ps.gz [9] S. S. Dragomir (Ed.), Inequalities for Csiszar f -Divergence in Information Theory, Victoria University, Melbourne, Australia, 2000. On line: http://rgmia.vu.edu.au [10] S. S. Dragomir and C. E. M. Pearce, Selected Topics on Hermite-Hadamard Inequalities and Applications, Victoria University, Melbourne, Adelaide, Australia, on line:http://rgmia.vu.edu.au [11] S. Kullback and R. Leibler, On information and sufficiency, Ann. Math. Stat., 22 (1951), 79–86. [12] S. Kullback, Information Theory and Statistics, Wiley, New York, 1959. [13] A. R´enyi, On measures of entropy and information, in Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, I, Berkeley, CA, 1960, 547– 561. (Received: January 14, 2008)
Department of Mathematical Sciences University of Memphis Memphis, TN 38152, USA E–mail:
[email protected]