Applied Mathematical Sciences, Vol. 2, 2008, no. 57, 2851 - 2858

Determination of Maximum Entropy Probability Distribution via Burg's Measure of Entropy

S. Mansoury
Power and Water University of Technology, Tehran, Iran
mansoury [email protected]

E. Pasha
Dept. of Statistics, Tarbiat Moallem University

Abstract

In this paper, we consider methods of obtaining maximum entropy multivariate distributions via Burg's measure of entropy, under the constraints that the marginal distributions, or the marginals and the covariance matrix, are prescribed. Next, a numerical method is considered for the cases where no closed-form solution is available. Finally, the method is illustrated via a numerical example.

Keywords: Burg’s measure of entropy, Maximum entropy principle, Wavelet, Curve fitting

1 Introduction

Burg's measure of entropy of a probability mass function (pmf) $P = (p_1, \dots, p_n)$ is defined as $H(P) = \sum_{i=1}^{n} \ln p_i$, and for a continuous variate distribution with probability density function (pdf) $f$ it was later generalized to the measure $H(f) = \int_R \ln f(x)\,dx$. Maximum entropy probability distributions (MEPD) were studied by a number of researchers, such as Jaynes (1968), Meeuwissen and Bedford (1997) and Miller and Wei-han (2002). Mansoury et al (2005) determined the MEMPD under similar constraints via the Bayesian measure of entropy. In section 2 of this paper, the determination of the MEMPD with given marginal distributions, or marginals and covariance matrix, via Burg's measure of entropy is considered. In section 3, a numerical method using smooth curve fitting and wavelets is given. Results and discussions are given in the final section.

2 Determination of MEMPD

In this section, to determine the MEMPD with given marginals and covariance matrix via Burg's measure of entropy, the following theorems are stated and proved.

Theorem 1: Let $L^2$ denote the collection of all measurable and square-integrable functions, and let $g_1, \dots, g_n \in L^2$ be the prescribed marginal pdfs of the multivariate distribution of $X = (X_1, \dots, X_n)$, where $x = (x_1, \dots, x_n) \in E \subseteq R^n$. Then the MEMPD over the region $E$ is uniquely given by

$$f_M(x) = \frac{1}{\sum_{i=1}^{n} f_i(x_i)}, \qquad x \in E,$$

where $f_1, \dots, f_n$ can be obtained by solving the following system of functional equations

$$\int_{R^{n-1}} \frac{1}{\sum_{j=1}^{n} f_j(x_j)}\, dx_{-i} = g_i(x_i); \qquad i = 1, \dots, n, \tag{1}$$

where $dx_{-i}$ denotes $dx_1 \cdots dx_{i-1}\, dx_{i+1} \cdots dx_n$.

Proof: Since the marginal pdfs of $f(x)$ are prescribed, for any function $K_i(X_i)$ the means

$$\mu_i = E(K_i(X_i)); \qquad i = 1, \dots, n \tag{2}$$

are known. By lemma 1 in [4], prescription of $g_1, \dots, g_n$ is equivalent to prescription of $E K_i(X_i)$ for any function $K_i(X_i)$; hence, to find the MEMPD, we have to maximize Burg's measure of entropy subject to the constraints (2). Now, by the Euler-Lagrange method (Kapur, 1989), the Lagrangian is given by

$$L = \int_E \ln f\, dx + \sum_{i=1}^{n} \lambda_i \left( \int_E K_i(x_i) f\, dx - \mu_i \right) = \int_E \left\{ \ln f + \sum_{i=1}^{n} \lambda_i K_i(x_i) f \right\} dx - \sum_{i=1}^{n} \lambda_i \mu_i,$$

where $dx$ denotes $dx_1 \cdots dx_n$. Since $\ln f + \sum_{i=1}^{n} \lambda_i K_i(x_i) f$ is a concave and continuous function of $f$, the unique extremal solution maximizes entropy. Thus

$$\frac{\partial}{\partial f} \left\{ \ln f + \sum_{i=1}^{n} \lambda_i K_i(x_i) f \right\} = 0,$$

therefore the MEMPD is given by $f_M(x) = -\dfrac{1}{\sum_{i=1}^{n} \lambda_i K_i(x_i)}$. It is clear that $f_M$ is the inverse of a sum of $n$ separate functions $f_1(x_1), \dots, f_n(x_n)$, i.e. $f_M(x) = \dfrac{1}{\sum_{i=1}^{n} f_i(x_i)}$, $x \in E$. Since $f_M$ has the marginals $g_1, \dots, g_n$, $f_1, \dots, f_n$ can be obtained by solving the system of equations (1).
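As a concrete sketch of Theorem 1's solution form (the choices of f1 and f2 below are illustrative assumptions, not taken from the paper), one can check numerically that a bivariate density of the form 1/(f1(x) + f2(y)) has marginals given by the integrals in system (1):

```python
import math

# Toy check of Theorem 1's form: on E = (0,1)^2 take f1(x) = 1 + x and
# f2(y) = y (assumptions), so f_M(x,y) = 1/(1 + x + y).  System (1) then
# prescribes the x-marginal
#   g(x) = \int_0^1 dy / (1 + x + y) = ln(2 + x) - ln(1 + x).
def marginal_numeric(x, n=20000):
    """Midpoint-rule quadrature of the integral over y in (0, 1)."""
    h = 1.0 / n
    return sum(h / (1 + x + (k + 0.5) * h) for k in range(n))

for x in (0.1, 0.5, 0.9):
    exact = math.log(2 + x) - math.log(1 + x)
    assert abs(marginal_numeric(x) - exact) < 1e-6
```

Here f_M is not normalized to total mass one; the check only illustrates the structural relation between the joint form and its marginals.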


Theorem 2: If $g_1, \dots, g_n \in L^2$ are the marginal pdfs of the multivariate distribution of $X$ on the region $E \subseteq R^n$, with prescribed covariance matrix $\Sigma$, then the MEMPD over the region $E$ is uniquely given by

$$f_M(x) = \frac{1}{\sum_{i=1}^{n} f_i(x_i) + (x - \mu)' \Lambda (x - \mu)},$$

where $\mu = EX$, and $f_1, \dots, f_n$ and $\Lambda$ can be obtained by solving the following system of equations:

$$\begin{cases} g_i(x_i) = \displaystyle\int_{R^{n-1}} \frac{1}{\sum_{j=1}^{n} f_j(x_j) + (x - \mu)' \Lambda (x - \mu)}\, dx_{-i}; & i = 1, \dots, n \\[2mm] \Sigma = \mathrm{Cov}(X). \end{cases} \tag{3}$$

Proof: Since the marginal pdfs of $f(x)$ are prescribed, for any function $K_i(X_i)$ the means

$$\mu_{K_i} = E(K_i(X_i)); \qquad i = 1, \dots, n \tag{4}$$

are known. Prescription of $g_1, \dots, g_n$ and of the covariance matrix is equivalent to prescription of $E K_i(X_i)$ for any function $K_i(X_i)$, $i = 1, \dots, n$, and of

$$\sigma_{i,j} = E(X_i - \mu_i)(X_j - \mu_j); \qquad i, j = 1, \dots, n, \tag{5}$$

where $\mu_i = E X_i$, $i = 1, \dots, n$. To find the MEMPD, we have to maximize Burg's measure of entropy subject to the constraints (4) and (5). Now, by the Euler-Lagrange method, the Lagrangian is given by

$$L = \int_E \ln f\, dx + \sum_{i=1}^{n} \lambda_i \left( \int_E K_i(x_i) f\, dx - \mu_{K_i} \right) + \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_{i,j} \left( \int_E (x_i - \mu_i)(x_j - \mu_j) f\, dx - \sigma_{i,j} \right)$$

$$= \int_E \left\{ \ln f + \sum_{i=1}^{n} \lambda_i K_i(x_i) f + \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_{i,j} (x_i - \mu_i)(x_j - \mu_j) f \right\} dx - \sum_{i=1}^{n} \lambda_i \mu_{K_i} - \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_{i,j} \sigma_{i,j}.$$

Since $\ln f + \sum_{i=1}^{n} \lambda_i K_i(x_i) f + \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_{i,j} (x_i - \mu_i)(x_j - \mu_j) f$ is a concave and continuous function of $f$, the unique extremal solution maximizes entropy. Then

$$\frac{\partial}{\partial f} \left\{ \ln f + \sum_{i=1}^{n} \lambda_i K_i(x_i) f + \sum_{i=1}^{n} \sum_{j=1}^{n} \lambda_{i,j} (x_i - \mu_i)(x_j - \mu_j) f \right\} = 0,$$

therefore the MEMPD is given by

$$f_M(x) = \frac{1}{\sum_{i=1}^{n} f_i(x_i) + (x - \mu)' \Lambda (x - \mu)},$$

where $\Lambda = [\lambda_{i,j}]_{n,n}$. Since $f_M$ has the marginals $g_1, \dots, g_n$ and covariance $\Sigma$, then $f_1, \dots, f_n$ and $\Lambda$ can be obtained by solving the system of equations (3).

Theorems 1 and 2 can easily be extended to discrete multivariate random variables by replacing the integrals with summations.
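The discrete analogue can be sketched in a few lines (the values of a and b below are arbitrary assumptions used only for illustration): a bivariate pmf of the Theorem 1 form p_ij = 1/(a_i + b_j), with the marginals obtained by replacing the integrals in (1) with sums over the other index.

```python
# Discrete sketch: pmf of the form p_ij = 1/(a_i + b_j); normalization keeps
# the form, since c*(a_i + b_j) is again a sum of two separate functions.
a = [2.0, 3.0, 4.0]   # assumed values of f1 on the support of X1
b = [1.0, 2.0]        # assumed values of f2 on the support of X2
raw = [[1.0 / (ai + bj) for bj in b] for ai in a]
total = sum(sum(row) for row in raw)
p = [[v / total for v in row] for row in raw]        # normalize to a pmf

# marginals via summation, the discrete counterpart of system (1)
row_marg = [sum(row) for row in p]                   # P(X1 = x_i)
col_marg = [sum(p[i][j] for i in range(len(a))) for j in range(len(b))]
assert abs(sum(row_marg) - 1.0) < 1e-12
assert abs(sum(col_marg) - 1.0) < 1e-12
```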

3 Numerical Method

Since the integral and functional equations (1) and (3) cannot always be solved analytically, we consider a numerical method for obtaining the MEMPD in a special case; it can easily be extended to approximate the MEMPD in general. Suppose we want to determine the maximum entropy bivariate probability distribution (MEBPD) over the region $E$, given by

$$E = \{(x, y) \in R^2 \mid a < x < b,\ C_1(x) < y < C_2(x)\} = \{(x, y) \in R^2 \mid c < y < d,\ D_1(y) < x < D_2(y)\},$$

with given marginal pdfs $g$ and $h$ on $(a, b)$ and $(c, d)$ respectively. By Theorem 1, the general form of the MEBPD is given by $f_M(x, y) = \dfrac{1}{f_1(x) + f_2(y)}$, $(x, y) \in E$, and the marginal pdfs are

$$g(x) = \int_{C_1(x)}^{C_2(x)} \frac{1}{f_1(x) + f_2(y)}\, dy; \qquad a < x < b,$$
$$h(y) = \int_{D_1(y)}^{D_2(y)} \frac{1}{f_1(x) + f_2(y)}\, dx; \qquad c < y < d. \tag{6}$$

Now, a numerical method for the approximation of $f_1$ and $f_2$ is considered. We subdivide $E$ by the grid

$$a = x_0 < x_1 < \cdots < x_{n_1+1} = b, \qquad \Delta x_i = x_i - x_{i-1} = \Delta, \quad i = 2, 3, \dots, n_1,$$
$$c = y_0 < y_1 < \cdots < y_{n_2+1} = d, \qquad \Delta y_j = y_j - y_{j-1} = \Delta, \quad j = 2, 3, \dots, n_2.$$


From (6), for each $1 \le i \le n_1$ and $1 \le j \le n_2$ (where $y_{i1} < \cdots < y_{in_i}$ denote the grid points lying in $(C_1(x_i), C_2(x_i))$, and $x_{j1} < \cdots < x_{jn_j}$ those in $(D_1(y_j), D_2(y_j))$), the trapezoidal rule gives

$$g(x_i) = \frac{1}{2}\left[\frac{1}{f_1(x_i) + f_2(C_1(x_i))} + \frac{1}{f_1(x_i) + f_2(y_{i1})}\right](y_{i1} - C_1(x_i)) + \frac{1}{2}\sum_{k=2}^{n_i}\left[\frac{1}{f_1(x_i) + f_2(y_{ik})} + \frac{1}{f_1(x_i) + f_2(y_{i,k-1})}\right](y_{ik} - y_{i,k-1}) + \frac{1}{2}\left[\frac{1}{f_1(x_i) + f_2(y_{in_i})} + \frac{1}{f_1(x_i) + f_2(C_2(x_i))}\right](C_2(x_i) - y_{in_i}),$$

$$h(y_j) = \frac{1}{2}\left[\frac{1}{f_1(D_1(y_j)) + f_2(y_j)} + \frac{1}{f_1(x_{j1}) + f_2(y_j)}\right](x_{j1} - D_1(y_j)) + \frac{1}{2}\sum_{k=2}^{n_j}\left[\frac{1}{f_1(x_{jk}) + f_2(y_j)} + \frac{1}{f_1(x_{j,k-1}) + f_2(y_j)}\right](x_{jk} - x_{j,k-1}) + \frac{1}{2}\left[\frac{1}{f_1(x_{jn_j}) + f_2(y_j)} + \frac{1}{f_1(D_2(y_j)) + f_2(y_j)}\right](D_2(y_j) - x_{jn_j}).$$

By applying the approximations

$$f_2(C_1(x_i)) = f_2(y_{i1}), \quad f_2(C_2(x_i)) = f_2(y_{in_i}), \qquad 1 \le i \le n_1,$$
$$f_1(D_1(y_j)) = f_1(x_{j1}), \quad f_1(D_2(y_j)) = f_1(x_{jn_j}), \qquad 1 \le j \le n_2,$$

we have

$$g(x_i) = \frac{1}{f_1(x_i) + f_2(y_{i1})}(y_{i1} - C_1(x_i)) + \frac{\Delta}{2}\left[\frac{1}{f_1(x_i) + f_2(y_{i1})} + 2\left(\frac{1}{f_1(x_i) + f_2(y_{i2})} + \cdots + \frac{1}{f_1(x_i) + f_2(y_{i,n_i-1})}\right) + \frac{1}{f_1(x_i) + f_2(y_{in_i})}\right] + \frac{1}{f_1(x_i) + f_2(y_{in_i})}(C_2(x_i) - y_{in_i}), \tag{7}$$

$$h(y_j) = \frac{1}{f_1(x_{j1}) + f_2(y_j)}(x_{j1} - D_1(y_j)) + \frac{\Delta}{2}\left[\frac{1}{f_1(x_{j1}) + f_2(y_j)} + 2\left(\frac{1}{f_1(x_{j2}) + f_2(y_j)} + \cdots + \frac{1}{f_1(x_{j,n_j-1}) + f_2(y_j)}\right) + \frac{1}{f_1(x_{jn_j}) + f_2(y_j)}\right] + \frac{1}{f_1(x_{jn_j}) + f_2(y_j)}(D_2(y_j) - x_{jn_j}). \tag{8}$$
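A quick numerical check of formula (7) (with an illustrative triangular region and assumed f1, f2, neither taken from the paper) confirms that the composite trapezoidal sum, with the boundary replacements above, approximates the marginal integral in (6):

```python
import math

x = 0.335                       # a sample abscissa (assumption)
f1 = lambda t: 1 + t            # toy f1, f2 (assumptions)
f2 = lambda t: t
C1, C2 = 0.0, 1 - x             # region 0 < y < 1 - x at this x
delta = 0.01

# interior grid nodes y_{i1} < ... < y_{i n_i} strictly inside (C1, C2)
ys = [k * delta for k in range(1, int(C2 / delta) + 2) if C1 < k * delta < C2]
inv = [1.0 / (f1(x) + f2(y)) for y in ys]

# formula (7): boundary pieces use f2(C1) = f2(y_1) and f2(C2) = f2(y_n)
approx = inv[0] * (ys[0] - C1) \
       + (delta / 2) * (inv[0] + 2 * sum(inv[1:-1]) + inv[-1]) \
       + inv[-1] * (C2 - ys[-1])

# exact marginal integral from (6): \int_0^{1-x} dy / (1 + x + y)
exact = math.log(2.0) - math.log(1 + x)
assert abs(approx - exact) < 1e-3
```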

Since equations (7) and (8) are coupled, and $f_M$ depends on $f_1$ and $f_2$ only through their sum (so the solution is determined up to an additive shift), we set one of the unknowns $f_1(x_i)$ or $f_2(y_j)$ to zero and obtain the other unknowns. We fit two smooth curves $\hat{f}_1$ and $\hat{f}_2$ over the intervals $(a, b)$ and $(c, d)$; then an approximate MEBPD over $E$ is given by

$$\hat{f}_M(x, y) = \frac{1}{\hat{f}_1(x) + \hat{f}_2(y)}, \qquad (x, y) \in E. \tag{9}$$
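The paper does not spell out how the discretized system (7)-(8) is solved once one unknown is pinned to zero. The following is a minimal fixed-point sketch under stated assumptions, not the authors' procedure: a rectangular region (so both equations reduce to plain composite trapezoidal sums), synthetic marginals generated from a known f1 and f2, and alternating scalar bisection solves with the gauge f2(y_0) = 0.

```python
# Fixed-point sketch for the discretized system on E = [0,1] x [0,1].
# Assumptions: synthetic data from f1*(x) = 1 + x, f2*(y) = y; alternating
# sweeps; gauge f2(y_0) = 0 (the paper's "set one unknown to zero").
m = 12
xs = [i / (m + 1) for i in range(m + 2)]
ys = [j / (m + 1) for j in range(m + 2)]
d = 1.0 / (m + 1)
w = [d / 2] + [d] * m + [d / 2]           # trapezoid weights

u_true = [1 + x for x in xs]              # f1* on the grid
v_true = [y for y in ys]                  # f2* on the grid
g = [sum(wj / (u_true[i] + vj) for wj, vj in zip(w, v_true)) for i in range(m + 2)]
h = [sum(wi / (ui + v_true[j]) for wi, ui in zip(w, u_true)) for j in range(m + 2)]

def solve_scalar(target, other, w):
    """Bisection for t in: sum_k w_k / (t + other_k) = target (decreasing in t)."""
    lo, hi = -min(other) + 1e-12, 1e6
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if sum(wk / (mid + ok) for wk, ok in zip(w, other)) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

u = [1.0] * (m + 2)                       # initial guesses
v = [0.0] * (m + 2)
for _ in range(40):
    v = [solve_scalar(h[j], u, w) for j in range(m + 2)]
    shift = v[0]                          # enforce the gauge f2(y_0) = 0
    v = [vj - shift for vj in v]
    u = [solve_scalar(g[i], v, w) for i in range(m + 2)]

# residuals of the two marginal equations after the sweeps
gres = max(abs(sum(wj / (u[i] + vj) for wj, vj in zip(w, v)) - g[i]) for i in range(m + 2))
hres = max(abs(sum(wi / (ui + v[j]) for wi, ui in zip(w, u)) - h[j]) for j in range(m + 2))
```

The recovered u and v then play the role of the grid values that the smooth curves in (9) are fitted to.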

Now, we consider the wavelet method to fit smooth curves to $f_1$ and $f_2$. Consider a function $f(x)$. The objective is to approximate it by wavelets $\psi_{j,k}(x)$, so that

$$f(x) = \sum_{j \in Z} \sum_{k \in Z} c_{j,k}\, \psi_{j,k}(x), \tag{10}$$

where the wavelets $\psi_{j,k}(x)$ are defined by $\psi_{j,k}(x) = 2^{j/2} \psi(2^j x - k)$, $j, k \in Z$, and we assume that $\{\psi_{j,k}(x) \mid j, k \in Z\}$ is orthonormal, i.e.

$$\int_R \psi_{j,k}(x) \psi_{n,i}(x)\, dx = \begin{cases} 0 & j \ne n \text{ or } k \ne i \\ 1 & j = n,\ k = i. \end{cases} \tag{11}$$

Utilizing (10) and (11), it is observed that $c_{j,k} = \int_R f(x) \psi_{j,k}(x)\, dx$. Now suppose we want to fit a wavelet expansion to $f(x)$, $a < x < b$ (and zero outside this interval), using the set of data $\{(x_i, f(x_i)) \mid i = 0, 1, \dots, n_1 + 1\}$, where the abscissas $\{x_k;\ k = 0, 1, \dots, n_1 + 1\}$ are distinct, $a = x_0 < x_1 < \cdots < x_{n_1+1} = b$ and $\Delta x_i = x_i - x_{i-1} = \Delta$, $i = 2, 3, \dots, n_1$. The coefficients can be approximated by

$$\hat{c}_{j,k} = \Delta \sum_{i=0}^{n_1} f(x_i) \psi_{j,k}(x_i).$$
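The paper does not fix a wavelet family, so the following sketch uses the Haar wavelet as an assumption, together with a constant scaling function on [0, 1) to carry the mean; the target curve f(x) = sin(pi x) is likewise illustrative. Coefficients are approximated by the Riemann sum above.

```python
import math

# Haar wavelet: psi = 1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere.
def psi(x):
    return 1.0 if 0 <= x < 0.5 else (-1.0 if 0.5 <= x < 1 else 0.0)

def psi_jk(j, k, x):
    return 2 ** (j / 2) * psi(2 ** j * x - k)

f = lambda x: math.sin(math.pi * x)         # curve to fit (assumption)
n = 1024
delta = 1.0 / n
xs = [(i + 0.5) * delta for i in range(n)]  # midpoints of [0, 1]

# scaling (mean) coefficient plus wavelet coefficients for levels j = 0..5;
# each c_hat is the Riemann-sum approximation Delta * sum f(x_i) psi_jk(x_i)
c0 = delta * sum(f(x) for x in xs)
coeffs = {(j, k): delta * sum(f(x) * psi_jk(j, k, x) for x in xs)
          for j in range(6) for k in range(2 ** j)}

def f_hat(x):
    """Truncated expansion (10) with the approximated coefficients."""
    return c0 + sum(c * psi_jk(j, k, x) for (j, k), c in coeffs.items())

err = max(abs(f_hat(x) - f(x)) for x in xs)
```

With six levels, f_hat is piecewise constant on dyadic intervals of width 1/64, so the fit is coarse but systematic; adding levels halves the resolution each time.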

Example: Let $E = \{(x, y) \in R^2 \mid 0 < x + y < 1,\ 0 < x < 1\}$ and let the marginal pdfs be

$$g(x) = \frac{1}{1 - \ln 2}\,(\ln 2 - \ln(1 + x)), \qquad 0 < x < 1.$$
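Reading the example's marginal as g(x) = (ln 2 - ln(1 + x)) / (1 - ln 2) on (0, 1) (an assumed reconstruction of the garbled print), a midpoint-rule quadrature confirms that it integrates to one, i.e. that it is a valid pdf:

```python
import math

# Assumed reconstruction of the example's marginal pdf:
#   g(x) = (ln 2 - ln(1 + x)) / (1 - ln 2),  0 < x < 1.
g = lambda x: (math.log(2) - math.log(1 + x)) / (1 - math.log(2))

n = 4000                     # midpoint rule on (0, 1)
h = 1.0 / n
total = sum(g((k + 0.5) * h) * h for k in range(n))
assert abs(total - 1.0) < 1e-5
```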