
The moment problem and maximum entropy: numerical investigation

M. Frontini, A. Tagliani
Dipartimento di Matematica, Politecnico di Milano, Piazza L. da Vinci 32, 20133 Milano, Italy

Abstract

The finite moment problem is investigated numerically within the maximum entropy approach in order to recover a positive density function. Numerical experiments suggest that only a few moments can be used effectively, because of the moment space geometry. If a high machine precision is adopted, an estimate of the information content of the recovered function can be obtained.

1. Introduction

The moment problem by the Maximum Entropy (M.E.) approach has been widely studied [1]. Only recently has the existence problem been completely solved [2,3]. The recent theoretical results on the existence and uniqueness of the M.E. solution have shed new light on the ill-posedness and on the difficulties of the numerical solvability of the finite moment problem when a positive solution is required. Some results concerning the convergence problem have been provided in [3,4].

Many problems lead to the generalized moment problem, which consists in recovering a positive function $f(x)$, defined in a domain $R$, knowing only the first $M+1$ moments [5,6,7]. By varying the definition interval of $f(x)$ we obtain the following classical moment problems: Hausdorff ($R = [0,1]$), Stieltjes ($R = [0,+\infty)$), Hamburger ($R = (-\infty,+\infty)$). If only the first $M+1$ moments are assigned, the recovery of $f(x)$ is not uniquely defined. In order to guarantee uniqueness, the M.E. principle of Jaynes [8] is widely used, thanks to its theoretical basis in statistical mechanics and information theory. This principle chooses, among the infinitely many approximating solutions $f_M(x)$ sharing the first $M+1$ given moments $\mu_j$, $j = 0, 1, \ldots, M$, the one with maximum entropy. Here the entropy $H(f)$ of a function $f(x)$ is defined as $H(f) = -\int_R f(x) \ln f(x)\,dx$.

The problem leads to the maximization of the following functional

$$H(f) = -\int_R f(x) \ln f(x)\,dx + \sum_{j=0}^{M} \lambda_j \left( \int_R x^j f(x)\,dx - \mu_j \right), \tag{1.1}$$

from which, through standard techniques, one gets the analytical form of the approximating function $f_M(x)$

$$f_M(x) = \exp\left( -\sum_{j=0}^{M} \lambda_j x^j \right), \tag{1.2}$$

where $\lambda_j$ are the Lagrange multipliers satisfying the $M+1$ constraints

$$\int_R x^j f_M(x)\,dx = \mu_j, \qquad j = 0, \ldots, M. \tag{1.3}$$
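As a small illustration of (1.2) and (1.3), the sketch below evaluates $f_M(x)$ on $R = [0,1]$ for a given set of multipliers and approximates the moment integrals with a midpoint rule. The function names, the midpoint rule and the test case ($M = 1$, $\lambda_1 = 1$, with $\lambda_0$ chosen so that $\mu_0 = 1$) are assumptions made for this example only; they are not taken from the paper.

```python
import math

def maxent_density(lams, x):
    """Evaluate f_M(x) = exp(-sum_j lams[j] * x**j), as in (1.2)."""
    return math.exp(-sum(l * x**j for j, l in enumerate(lams)))

def moments_on_unit_interval(lams, n_cells=20000):
    """Approximate the integrals in (1.3) on [0, 1] with the midpoint rule."""
    h = 1.0 / n_cells
    mus = [0.0] * len(lams)
    for i in range(n_cells):
        x = (i + 0.5) * h
        fx = maxent_density(lams, x)
        for j in range(len(lams)):
            mus[j] += h * x**j * fx
    return mus

# Illustrative choice (an assumption, not from the paper):
# M = 1, lambda_1 = 1, lambda_0 fixed by the normalization mu_0 = 1.
lam1 = 1.0
lam0 = math.log(1.0 - math.exp(-1.0))
mu0, mu1 = moments_on_unit_interval([lam0, lam1])

# Closed form of the first moment of exp(-x) / (1 - 1/e) on [0, 1].
mu1_exact = (1.0 - 2.0 / math.e) / (1.0 - 1.0 / math.e)
print(mu0, mu1, mu1_exact)
```

In the actual moment problem the direction is reversed: the $\mu_j$ are given and the $\lambda_j$ are unknown, which is what makes the system (1.3) nonlinear.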

This paper is devoted to highlighting the numerical difficulties that arise in the numerical solution of the finite moment problem by the M.E. approach as the number of assigned moments increases. A new algorithm [9] for the numerical solution of the Hausdorff finite moment problem is used.

2. Theoretical results in the literature

a) The existence problem.

Define the following Hankel matrices for even and odd $M$:

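For $M = 2N$:

$$l_0 = \mu_0, \quad l_2 = \begin{bmatrix} \mu_0 & \mu_1 \\ \mu_1 & \mu_2 \end{bmatrix}, \quad l_4 = \begin{bmatrix} \mu_0 & \mu_1 & \mu_2 \\ \mu_1 & \mu_2 & \mu_3 \\ \mu_2 & \mu_3 & \mu_4 \end{bmatrix}, \quad \ldots, \quad l_{2N} = \begin{bmatrix} \mu_0 & \cdots & \mu_N \\ \vdots & & \vdots \\ \mu_N & \cdots & \mu_{2N} \end{bmatrix}, \tag{2.1a}$$

$$u_2 = \mu_1 - \mu_2, \quad u_4 = \begin{bmatrix} \mu_1 - \mu_2 & \mu_2 - \mu_3 \\ \mu_2 - \mu_3 & \mu_3 - \mu_4 \end{bmatrix}, \quad u_6 = \begin{bmatrix} \mu_1 - \mu_2 & \mu_2 - \mu_3 & \mu_3 - \mu_4 \\ \mu_2 - \mu_3 & \mu_3 - \mu_4 & \mu_4 - \mu_5 \\ \mu_3 - \mu_4 & \mu_4 - \mu_5 & \mu_5 - \mu_6 \end{bmatrix}, \quad \ldots, \quad u_{2N} = \begin{bmatrix} \mu_1 - \mu_2 & \cdots & \mu_N - \mu_{N+1} \\ \vdots & & \vdots \\ \mu_N - \mu_{N+1} & \cdots & \mu_{2N-1} - \mu_{2N} \end{bmatrix}. \tag{2.1b}$$

For $M = 2N + 1$:

$$l_1 = \mu_1, \quad l_3 = \begin{bmatrix} \mu_1 & \mu_2 \\ \mu_2 & \mu_3 \end{bmatrix}, \quad l_5 = \begin{bmatrix} \mu_1 & \mu_2 & \mu_3 \\ \mu_2 & \mu_3 & \mu_4 \\ \mu_3 & \mu_4 & \mu_5 \end{bmatrix}, \quad \ldots, \quad l_{2N+1} = \begin{bmatrix} \mu_1 & \cdots & \mu_{N+1} \\ \vdots & & \vdots \\ \mu_{N+1} & \cdots & \mu_{2N+1} \end{bmatrix}, \tag{2.1c}$$

$$u_1 = \mu_0 - \mu_1, \quad u_3 = \begin{bmatrix} \mu_0 - \mu_1 & \mu_1 - \mu_2 \\ \mu_1 - \mu_2 & \mu_2 - \mu_3 \end{bmatrix}, \quad u_5 = \begin{bmatrix} \mu_0 - \mu_1 & \mu_1 - \mu_2 & \mu_2 - \mu_3 \\ \mu_1 - \mu_2 & \mu_2 - \mu_3 & \mu_3 - \mu_4 \\ \mu_2 - \mu_3 & \mu_3 - \mu_4 & \mu_4 - \mu_5 \end{bmatrix}, \quad \ldots, \quad u_{2N+1} = \begin{bmatrix} \mu_0 - \mu_1 & \cdots & \mu_N - \mu_{N+1} \\ \vdots & & \vdots \\ \mu_N - \mu_{N+1} & \cdots & \mu_{2N} - \mu_{2N+1} \end{bmatrix}. \tag{2.1d}$$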
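The matrices in (2.1a) to (2.1d) can be assembled mechanically from a moment vector, and the sign of their determinants can then be inspected. The sketch below does this in exact rational arithmetic; the helper names `det`, `hausdorff_determinants` and `is_interior`, as well as the two test vectors, are hypothetical choices for this illustration and do not come from the paper.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by exact Gaussian elimination (the empty matrix gives 1)."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return result

def hausdorff_determinants(mu):
    """Return (det l_M, det u_M) for mu = [mu_0, ..., mu_M]."""
    M = len(mu) - 1
    N = M // 2
    if M % 2 == 0:
        # (2.1a): entries mu_{i+j}; (2.1b): entries mu_{i+j+1} - mu_{i+j+2}
        lower = [[mu[i + j] for j in range(N + 1)] for i in range(N + 1)]
        upper = [[mu[i + j + 1] - mu[i + j + 2] for j in range(N)] for i in range(N)]
    else:
        # (2.1c): entries mu_{i+j+1}; (2.1d): entries mu_{i+j} - mu_{i+j+1}
        lower = [[mu[i + j + 1] for j in range(N + 1)] for i in range(N + 1)]
        upper = [[mu[i + j] - mu[i + j + 1] for j in range(N + 1)] for i in range(N + 1)]
    return det(lower), det(upper)

def is_interior(mu):
    """True if all determinants of order 0, ..., M are strictly positive."""
    return all(d > 0
               for m in range(len(mu))
               for d in hausdorff_determinants(mu[: m + 1]))

uniform = [Fraction(1, j + 1) for j in range(7)]     # moments of f(x) = 1 on [0, 1]
bad = [Fraction(1), Fraction(1, 2), Fraction(1, 5)]  # mu_2 < mu_1**2
print(is_interior(uniform), is_interior(bad))
```

Exact fractions are used here because, as the moment vector approaches the boundary of the moment space, the determinants become very small and floating-point sign checks become unreliable.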

The following results hold.

i) Hausdorff case [2,10]

Theorem 1. The necessary and sufficient condition for the existence of the M.E. solution $f_M(x)$, when the first $M+1$ moments are given, is the strict positivity of the determinants (2.1a), ..., (2.1d).

Remark 1. The strict positivity of the determinants (2.1) implies that the moment vector lies inside the convex set defined by the "moments' function" [9,10]. By the nature of the Hausdorff moment problem, the diameter of the admissible set of the $M$-th moment decreases as $M$ grows. For this reason even small fluctuations of the moments can destroy the existence of the solution. If we define the diameter of the moment set, $\mathrm{diam}(\mu_M)$, as the maximum range of variability of the last moment $\mu_M$, then for $M = 2$ the value $\mathrm{diam}(\mu_2) = 1/4$ is obtained when $\mu_1 = 1/2$. As an example, the region of admissible values of the first two moments $\mu_1, \mu_2$ is represented in figure 1. In the general case, when $M+1$ moments are assigned, $\mathrm{diam}(\mu_M)$ is given by [10]

$$\mathrm{diam}(\mu_M) = \frac{l_M}{l_{M-2}} + \frac{u_M}{u_{M-2}},$$

where $l_M$ and $u_M$ stand for the determinants of the corresponding matrices in (2.1). In the optimal case $\mathrm{diam}(\mu_M) = 1/2^{2(M-1)}$; this value is generated by the sequence of moments of the function $1/(\pi\sqrt{t(1-t)})$, whose moments $\mu_j = \binom{2j}{j} 2^{-2j}$, $j = 0, 1, \ldots, M$, are such that each $\mu_j$ is the midpoint of its interval of admissible values.

Remark 2. When $\mu_M$ is varied within the range of its admissible values, the coefficient $\lambda_M$ varies monotonically in $(-\infty, +\infty)$. Thus, as $M$ increases, Remark 1 suggests that small variations of $\mu_M$ imply large variations of $\lambda_M$: the numerical computation of the M.E. solution becomes highly unstable.

ii) Stieltjes and Hamburger cases [3].

Theorem 2. The necessary and sufficient condition for the existence of the M.E. solution

$f_M(x)$ ($M \geq 4$), when the first $M+1$ moments are given, is the strict positivity of the determinants (2.1a) in the Hamburger case, and of (2.1a) and (2.1d) in the Stieltjes case.

b) The convergence problem

The following results have been cited in the literature.

i) Define the directed divergence $I(f, f_M)$ of $f_M(x)$ from $f(x)$ as

$$I(f, f_M) = \int_R f \log \frac{f}{f_M}\,dx. \tag{2.2}$$

If $\mu_j(f) = \mu_j(f_M)$, $j = 0, 1, \ldots, M$, then [4]

$$I(f, f_M) = H(f_M) - H(f). \tag{2.3}$$

ii) The following inequality holds [11]

$$I(f, f_M) \geq \frac{1}{4} \left( \int_R |f - f_M|\,dx \right)^2. \tag{2.4}$$
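Relations (2.3) and (2.4) can be checked by hand on a simple case. The sketch below uses a two-level step density on $[0,1]$ together with the constant density $f_0(x) = 1$, which is the M.E. solution when only $\mu_0 = 1$ is assigned. The piecewise representation and the function names are assumptions made for this illustration.

```python
import math

# Piecewise-constant densities on [0, 1], stored as (interval length, value).
f_pieces = [(0.5, 4.0 / 3.0), (0.5, 2.0 / 3.0)]  # step density, integrates to 1
g_pieces = [(0.5, 1.0), (0.5, 1.0)]              # f_0(x) = 1 (M.E. density for M = 0)

def entropy(pieces):
    """H(f) = -int f ln f dx for a piecewise-constant density."""
    return -sum(length * v * math.log(v) for length, v in pieces)

def divergence(p, q):
    """I(p, q) = int p log(p / q) dx, as in (2.2)."""
    return sum(length * a * math.log(a / b) for (length, a), (_, b) in zip(p, q))

def l1_distance(p, q):
    """int |p - q| dx."""
    return sum(length * abs(a - b) for (length, a), (_, b) in zip(p, q))

I = divergence(f_pieces, g_pieces)
entropy_gap = entropy(g_pieces) - entropy(f_pieces)  # right-hand side of (2.3)
bound = 0.25 * l1_distance(f_pieces, g_pieces) ** 2  # right-hand side of (2.4)
print(I, entropy_gap, bound)
```

Here the two densities share $\mu_0$, so the divergence coincides with the entropy gap, and both exceed the $L_1$ bound, which is why entropy convergence forces $L_1$ convergence.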

The entropy convergence together with (2.3) and (2.4) implies the $L_1$ norm convergence of $f_M$ to $f$. The following results on the family of functions (1.2) are known.

i) Hausdorff case [4,12]

Theorem 3. The existence of $f_M(x)$ for all $M$ implies convergence in relative entropy, in entropy and in $L_1$ norm as $M \to \infty$.

ii) Stieltjes and Hamburger cases [4]

Theorem 4. If $f_M(x)$ exists for all $M$, the set of moments uniquely defines $f(x)$, and if

$$\lim_{M \to \infty} \frac{\Delta_M}{\Delta_{M-2}} = 0, \tag{2.6}$$

where $\Delta_j$ denotes the determinant of the matrix $l_j$ defined in (2.1a) and (2.1c), then $f_M(x)$ converges to $f(x)$ in relative entropy, in entropy and in $L_1$ norm as $M \to \infty$.

Remark 3. The sufficient condition for entropy convergence given by (2.6) can be interpreted as follows. Let $\mu_M^-$ be the value of $\mu_M$ corresponding to $\Delta_M = 0$; then the following relation holds [10]: $\mu_M = \mu_M^- + \Delta_M / \Delta_{M-2}$. Condition (2.6) implies that $\lim_{M \to \infty} (\mu_M - \mu_M^-) = 0$, from which one can suspect instability in the numerical computation of the $\lambda_j$.

c) A discretization of (1.3) by the collocation method (Hausdorff case)

By discretizing (1.3) through a Gaussian quadrature formula with $n$ knots $x_i^{(n)}$ and weights $p_i^{(n)}$, $i = 1, \ldots, n$, and denoting by $\lambda$ the set $\lambda_0, \ldots, \lambda_M$, one gets

$$\mu_k^{(n)}(\lambda) = \sum_{i=1}^{n} p_i^{(n)} \left( x_i^{(n)} \right)^k \exp\left( -\sum_{j=0}^{M} \lambda_j \left( x_i^{(n)} \right)^j \right) = \mu_k, \qquad k = 0, 1, \ldots, M. \tag{2.7}$$
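To make the left-hand side of (2.7) concrete, the following sketch evaluates it with the classical 3-point Gauss-Legendre rule mapped to $[0,1]$. The choice $n = 3$, the hard-coded knots and weights, and the function name `discrete_moments` are assumptions for this example; the paper does not state which Gaussian rule or how many knots were used.

```python
import math

# 3-point Gauss-Legendre rule mapped from [-1, 1] to [0, 1].
_r = math.sqrt(3.0 / 5.0)
NODES = [0.5 * (1.0 - _r), 0.5, 0.5 * (1.0 + _r)]
WEIGHTS = [5.0 / 18.0, 8.0 / 18.0, 5.0 / 18.0]

def discrete_moments(lams, nodes=NODES, weights=WEIGHTS):
    """Left-hand side of (2.7) for k = 0, ..., M, with lams = [lambda_0, ..., lambda_M]."""
    M = len(lams) - 1
    f_vals = [math.exp(-sum(l * x**j for j, l in enumerate(lams))) for x in nodes]
    return [sum(p * x**k * fx for p, x, fx in zip(weights, nodes, f_vals))
            for k in range(M + 1)]

# With all multipliers equal to zero, f_M = 1 and the rule reproduces the
# exact moments 1/(k+1), because it is exact for polynomials up to degree 5.
print(discrete_moments([0.0, 0.0, 0.0]))
```

For non-zero multipliers the integrand is no longer polynomial, so the number of knots needed for a given accuracy grows, in line with the dependence of $n$ on the assigned moments.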

System (2.7) admits an approximate solution $\lambda^{(n)}$ iff Theorem 1 holds and $n = n(\mu_0, \ldots, \mu_M)$, where $n$ can take large values if at least one of the assigned moments is near the boundary of the moments' set [9]. Moreover $\lim_{n \to \infty} \lambda^{(n)} = \lambda$, where $\lambda$ satisfies (1.3).

3. Numerical computation of the $\lambda_j$ multipliers.

The solution of (1.3) leads to the computation of the minimum of the potential function $\Gamma$ defined as

$$\Gamma = \Gamma(\lambda_1, \ldots, \lambda_M) = \sum_{j=1}^{M} \lambda_j \mu_j + \ln\left( \frac{1}{\mu_0} \int_R \exp\left( -\sum_{j=1}^{M} \lambda_j x^j \right) dx \right). \tag{3.1}$$
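A minimal sketch of the minimization of (3.1) on $[0,1]$ is given below. It assumes $\mu_0 = 1$, replaces the integral by a midpoint rule, and uses a damped Newton iteration; this is a stand-in chosen for brevity, not the Powell method used for the reported results. The gradient of the potential is $\mu_j - E[x^j]$ and its Hessian is the covariance matrix of the powers $x^j$ under the normalized density. All function names are hypothetical.

```python
import math

def quad_nodes(n=400):
    """Midpoint nodes and weights on [0, 1] (a simple stand-in for Gauss knots)."""
    h = 1.0 / n
    return [(i + 0.5) * h for i in range(n)], [h] * n

def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            t = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= t * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def normalized_moments(lams, max_power, xs, ws):
    """Return ([E[x^k], k = 0..max_power], ln Z) for exp(-sum_j lams[j-1] x^j)."""
    g = [w * math.exp(-sum(l * x ** (j + 1) for j, l in enumerate(lams)))
         for x, w in zip(xs, ws)]
    z = sum(g)
    return [sum(gi * x**k for gi, x in zip(g, xs)) / z
            for k in range(max_power + 1)], math.log(z)

def potential(lams, mu, xs, ws):
    """Discretized potential (3.1) with mu_0 = 1; mu = [mu_1, ..., mu_M]."""
    _, log_z = normalized_moments(lams, 0, xs, ws)
    return sum(l * m for l, m in zip(lams, mu)) + log_z

def minimize_potential(mu, iters=50, tol=1e-12):
    """Damped Newton iteration; returns [lambda_1, ..., lambda_M]."""
    M = len(mu)
    xs, ws = quad_nodes()
    lams = [0.0] * M
    for _ in range(iters):
        e, _ = normalized_moments(lams, 2 * M, xs, ws)
        grad = [mu[j] - e[j + 1] for j in range(M)]
        if max(abs(gj) for gj in grad) < tol:
            break
        hess = [[e[j + k + 2] - e[j + 1] * e[k + 1] for k in range(M)]
                for j in range(M)]
        step = solve(hess, grad)
        t, g0 = 1.0, potential(lams, mu, xs, ws)
        # Halve the step while the potential increases (beyond rounding noise).
        while t > 1e-8 and potential([l - t * s for l, s in zip(lams, step)],
                                     mu, xs, ws) > g0 + 1e-12:
            t *= 0.5
        lams = [l - t * s for l, s in zip(lams, step)]
    return lams

# Round trip on an illustrative target: generate moments from known
# multipliers, then recover the multipliers from those moments.
xs, ws = quad_nodes()
target = [1.0, -0.5]
e_target, _ = normalized_moments(target, 2, xs, ws)
found = minimize_potential(e_target[1:3])
print(found)
```

With only two multipliers the Hessian is well conditioned; for larger $M$ its conditioning deteriorates quickly, which is the instability discussed in Remark 2.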

i) An accurate evaluation $\lambda_j^{(in)}$ of the starting point for $\lambda_j$.

The minimization of (3.1) can start from an accurate value $\lambda_j^{(in)}$, $j = 1, \ldots, M$, obtained from the solution of (2.7) with a sufficiently large number of knots. Alternatively, the system (1.3) can be interpreted as a nonlinear system of equations in the variables $\lambda_1, \ldots, \lambda_M$ and can be solved by a Newton-like method that uses the accurate approximate solution $\lambda_j^{(in)}$ as starting point. In such a case the Jacobian matrix is exactly given by the Hankel matrix $l_{2M}$ in (2.1a), where $\mu_{M+j}$, $j = 1, \ldots, M$, are obtained from the recursive relationship

$$\mu_{M+j} = \mu_M + \frac{(j+1)\mu_j - 1 + \sum_{k=1}^{M-1} k \lambda_k (\mu_k - \mu_{k+j})}{M \lambda_M}, \qquad j \geq 1, \tag{3.2}$$

for the Hausdorff case. A similar procedure can be extended to the Stieltjes and Hamburger cases, even though no theoretical results on the convergence of $\lambda_j^{(n)}$ to $\lambda_j$ are available.

4. Numerical results

a) Hausdorff case.

In this section, in order to show the power of the M.E. method, we present, among the various cases examined, only the numerical results obtained for two $C^0[0,1]$ functions $f^{(1)}(x)$, $f^{(2)}(x)$ and for a discontinuous step function $f^{(3)}(x)$.

Example 1.

$$f^{(1)}(x) = \begin{cases} 8(1/2 - x) & 0 \leq x \leq 1/2 \\ 0 & x > 1/2 \end{cases}$$

with moments $\mu_k = \frac{2^{1-k}}{(1+k)(2+k)}$, $k = 0, 1, \ldots$ and entropy $H(f^{(1)}) \simeq -0.886294$.

Example 2.

$$f^{(2)}(x) = \begin{cases} 0 & 0 < x < 1/2 \\ 8(x - 1/2) & x \geq 1/2 \end{cases}$$

with moments $\mu_k = 8\left[ \frac{1}{k+2}\left(1 - \frac{1}{2^{k+2}}\right) - \frac{1}{2}\,\frac{1}{k+1}\left(1 - \frac{1}{2^{k+1}}\right) \right]$, $k = 0, 1, \ldots$ and entropy $H(f^{(2)}) \simeq -0.886294$.
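The closed-form moments and entropies of Examples 1 and 2 can be cross-checked by direct quadrature. The sketch below does so with a midpoint rule restricted to the support of each function; the helper names and the number of cells are assumptions for this illustration, and the closed form used for Example 2 carries the factor 8 implied by the normalization $\mu_0 = 1$.

```python
import math

def midpoint(func, a, b, n=20000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def f1(x):  # Example 1, supported on [0, 1/2]
    return 8.0 * (0.5 - x) if x <= 0.5 else 0.0

def f2(x):  # Example 2, supported on [1/2, 1]
    return 8.0 * (x - 0.5) if x >= 0.5 else 0.0

def mu1_closed(k):
    return 2.0 ** (1 - k) / ((1 + k) * (2 + k))

def mu2_closed(k):
    return 8.0 * ((1 - 2.0 ** -(k + 2)) / (k + 2)
                  - 0.5 * (1 - 2.0 ** -(k + 1)) / (k + 1))

def moment(f, k, a, b):
    return midpoint(lambda x: x**k * f(x), a, b)

def entropy(f, a, b):
    # Midpoints never coincide with the zero of f, so the log is defined.
    return -midpoint(lambda x: f(x) * math.log(f(x)), a, b)

err1 = max(abs(moment(f1, k, 0.0, 0.5) - mu1_closed(k)) for k in range(5))
err2 = max(abs(moment(f2, k, 0.5, 1.0) - mu2_closed(k)) for k in range(5))
h1 = entropy(f1, 0.0, 0.5)
h2 = entropy(f2, 0.5, 1.0)
print(err1, err2, h1, h2)
```

Both entropies equal $1/2 - \ln 4$, as expected for two functions that are mirror images of each other about $x = 1/2$.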

Example 3.

$$f^{(3)}(x) = \begin{cases} 4/3 & 0 < x < 1/2 \\ 2/3 & x > 1/2 \end{cases}$$

with moments $\mu_k = \frac{2}{3}\,\frac{1}{k+1}\left(1 + \frac{1}{2^{k+1}}\right)$, $k = 0, 1, \ldots$ and entropy $H(f^{(3)}) \simeq -0.056633$.

The approximate solutions $f_M(x)$ corresponding to increasing values of $M$ are determined by the numerical procedure illustrated in section 3, using different machine precisions. The quantities $H(f_M) - H(f)$ and the $L_1$ norm are reported in tables 1, 2, 3. In figs. 1, 2, 3 a graphical comparison between $f(x)$ and $f_M(x)$ for increasing values of $M$ is reported. Some comments are in order.

1) The value of $H(f_M)$ is sensitive to machine precision, since it depends on how accurately the minimum of the potential $\Gamma$ is computed. This minimum has been computed by the Powell method (different algorithms have also been tried, giving qualitatively similar results).

2) Entropy convergence is evident only if a high machine precision is adopted, as a consequence of the moment space geometry and of the reduction of $\mathrm{diam}(\mu_M)$.

3) An optimal number of moments, corresponding to the minimum entropy value, is found; it depends on the set of given moments and on the machine precision.

b) Stieltjes case

As an example of a Stieltjes moment problem we consider the numerical inversion of the Laplace transform. Let $F(s)$ be the Laplace transform of $f(t)$; the inversion of $F(s)$ is related to the moment problem by the following relationship

$$(-1)^j \frac{d^j F(0)}{ds^j} = \mu_j = \int_0^{\infty} t^j f(t)\,dt, \qquad j \geq 0. \tag{4.1}$$
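Relation (4.1) can be illustrated with a transform whose moments are known in closed form. The sketch below obtains the derivatives at the origin from the Cauchy integral formula, discretized by the trapezoidal rule on a circle $|s| = r$. It is a simplified contour-integration sketch and should not be read as an implementation of any specific published algorithm; the radius, the number of sample points and the test transform $F(s) = 1/(1+s)$ (that is, $f(t) = e^{-t}$, with $\mu_j = j!$) are assumptions for this example.

```python
import cmath
import math

def moments_from_transform(F, n_moments, radius=0.5, n_points=64):
    """mu_j = (-1)^j F^(j)(0), with F^(j)(0) from a discretized Cauchy integral."""
    samples = [F(radius * cmath.exp(2j * math.pi * k / n_points))
               for k in range(n_points)]
    mus = []
    for order in range(n_moments):
        # Taylor coefficient a = F^(order)(0) / order!
        a = sum(s * cmath.exp(-2j * math.pi * order * k / n_points)
                for k, s in enumerate(samples)) / (n_points * radius**order)
        mus.append(((-1) ** order * math.factorial(order) * a).real)
    return mus

def F(s):
    """Laplace transform of exp(-t); analytic inside |s| < 1."""
    return 1.0 / (1.0 + s)

mus = moments_from_transform(F, 6)
print(mus)
```

The circle must stay inside the region where $F(s)$ is analytic (here the pole at $s = -1$ limits the radius), and the division by $r^j$ amplifies rounding errors for high-order derivatives.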

If we know $F(s)$, the first step of our algorithm is the computation of the quantities $\mu_j$, related to $\frac{d^j F(0)}{ds^j}$ by (4.1). When the derivatives at the origin cannot be obtained through an analytical procedure, we can resort to the Lyness algorithm [13]. This efficient algorithm computes $\frac{d^j F(0)}{ds^j}$ when $F(s)$ is given analytically or is known numerically on a closed path containing the origin. The Lyness method, based on the Cauchy theorem, reduces the computation of the integral along a closed path in the complex plane to the computation of a real integral over $[-\pi, \pi]$ by the trapezoidal rule.

Some numerical examples are considered, where all the calculations are carried out in double precision. For all the examples we will report:
i) a graphical comparison between $f(t)$ and $f_M(t)$;
ii) the difference $H(f_M) - H(f)$;
iii) the $L_1$ norm.

Example 4.

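$$f^{(4)}(t) = \begin{cases} \sin(\pi t) & 0 \leq t \leq 1 \\ 0 & t > 1 \end{cases}$$

with moments $\mu_k = \frac{1}{\pi} - \frac{k(k-1)}{\pi^2}\,\mu_{k-2}$, $k \geq 2$, and $\mu_0 = \frac{2}{\pi}$, $\mu_1 = \frac{1}{\pi}$, and $H(f^{(4)}) \simeq 0.19535$.

Example 5.

$$f^{(5)}(t) = \begin{cases} 1 & 0 \leq t \leq 1 \\ 0 & t > 1 \end{cases}$$

with moments $\mu_k = \frac{1}{k+1}$, $k = 0, 1, \ldots$, and $H(f^{(5)}) = 0$.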
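The moment recurrence of Example 4 follows from two integrations by parts of $\int_0^1 t^k \sin(\pi t)\,dt$, which give $\mu_k = \frac{1}{\pi} - \frac{k(k-1)}{\pi^2}\mu_{k-2}$. The sketch below compares the recurrence with direct quadrature and recomputes the entropy; the midpoint rule and the function names are assumptions for this illustration.

```python
import math

def mu_recurrence(k_max):
    """Moments of sin(pi t) on [0, 1] from mu_k = 1/pi - k(k-1)/pi^2 * mu_{k-2}."""
    mu = [2.0 / math.pi, 1.0 / math.pi]
    for k in range(2, k_max + 1):
        mu.append(1.0 / math.pi - k * (k - 1) / math.pi**2 * mu[k - 2])
    return mu[: k_max + 1]

def midpoint(func, n=20000):
    """Composite midpoint rule on [0, 1]."""
    h = 1.0 / n
    return h * sum(func((i + 0.5) * h) for i in range(n))

mu_rec = mu_recurrence(6)
mu_quad = [midpoint(lambda t, k=k: t**k * math.sin(math.pi * t)) for k in range(7)]

# Entropy of f(t) = sin(pi t); the exact value is 2 (1 - ln 2) / pi.
h4 = -midpoint(lambda t: math.sin(math.pi * t) * math.log(math.sin(math.pi * t)))
print(mu_rec, h4)
```

The forward recurrence multiplies earlier errors by $k(k-1)/\pi^2$ at each step, so it is only used here for small $k$.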

In tables 4, 5 and in figures 4, 5 we report the same quantities as in the Hausdorff case. The numerical results lead to considerations similar to those drawn in the Hausdorff case; for this reason only double precision results are illustrated.

6. Conclusions

By resorting to known analytical results, the M.E. criterion has been used to recover a positive function whose $M+1$ moments are assigned. The M.E. criterion is effective when the positivity of the unknown function is more crucial than matching the moments. Numerical experiments suggest that

1) Only a few moments can be used to recover the M.E. solution, because of the moment space geometry.

2) A good entropy estimate is obtained for functions that are well characterized by their first moments; nevertheless it is impossible to establish, from the assigned set of moments, whether that condition is verified.

3) According to the M.E. principle and the convergence results, the only available information given by the moments' set is $H(f_M) \geq H(f)$.

4) A practical criterion to establish the optimal maximum number of moments, for a given machine precision, is the failure of the monotonically non-increasing (Hausdorff case) or monotonically decreasing (Stieltjes and Hamburger cases) behaviour of the sequence $H(f_j)$.

5) The numerical instability arising from the particular choice (1.2) is compensated by the positivity of the approximating functions $f_M(x)$.
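The stopping criterion in point 4 amounts to scanning the computed entropies $H(f_j)$ and stopping at the first violation of monotonicity. A minimal sketch follows; the function name and the entropy values in the usage example are hypothetical and are not results from the paper.

```python
def last_reliable_index(entropies, strict=False):
    """Index of the last entry before the entropy sequence stops decreasing.

    strict=False checks a non-increasing sequence (Hausdorff case);
    strict=True checks a strictly decreasing one (Stieltjes, Hamburger).
    """
    for i in range(1, len(entropies)):
        if strict:
            broke = entropies[i] >= entropies[i - 1]
        else:
            broke = entropies[i] > entropies[i - 1]
        if broke:
            return i - 1
    return len(entropies) - 1

# Hypothetical computed entropies for M = 2, 4, 6, 8, 10.
orders = [2, 4, 6, 8, 10]
computed = [-0.80, -0.85, -0.87, -0.86, -0.88]
best = orders[last_reliable_index(computed)]
print(best)
```

In this made-up sequence the entropy rises between $M = 6$ and $M = 8$, so $M = 6$ would be taken as the largest usable number of moments at that precision.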

Table 1. Difference H(f_M) - H(f) and L1 norm (example 1)

       Single precision                 Double precision
M      H(f_M) - H(f)    L1 norm         H(f_M) - H(f)    L1 norm
2      0.1901(-01)      0.1186(-00)     0.2019(-01)      0.1187(-00)
4      0.5514(-02)      0.5398(-01)     0.3074(-02)      0.2700(-01)
6      0.1834(-02)      0.3521(-01)     0.1455(-02)      0.1824(-01)
8      0.1554(-02)      0.1808(-01)     0.8630(-03)      0.1533(-01)
10     0.1184(-02)      0.1991(-01)     0.7010(-03)      0.1635(-01)
12     -                -               0.4670(-03)      0.7115(-02)
14     -                -               0.3390(-03)      0.1001(-01)

Table 2. Difference H(f_M) - H(f) and L1 norm (example 2)

       Single precision                 Double precision
M      H(f_M) - H(f)    L1 norm         H(f_M) - H(f)    L1 norm
2      0.1901(-01)      0.1186(-00)     -                -
4      0.7174(-02)      0.5398(-01)     0.6974(-02)      0.7093(-01)
6      -                -               0.5258(-02)      0.3991(-01)
8      -                -               0.3850(-02)      0.5115(-01)

Table 3. Difference H(f_M) - H(f) and L1 norm (example 3), double precision

M      H(f_M) - H(f)    L1 norm
2      0.1386(-01)      0.1337(-00)
4      0.7921(-02)      0.8763(-01)
6      0.5642(-02)      0.7322(-01)
8      0.5466(-02)      0.7098(-01)
10     0.5261(-02)      0.6975(-01)
12     0.5088(-02)      0.6645(-01)
14     0.4916(-02)      0.6648(-01)
16     0.4723(-02)      0.6363(-01)

Table 4. Difference H(f_M) - H(f) and L1 norm (example 4), double precision

M      H(f_M) - H(f)    L1 norm
2      0.1129(-00)      0.2996(-00)
4      0.5919(-01)      0.1950(-00)
6      0.2587(-01)      0.1124(-00)
8      0.1677(-01)      0.8458(-01)
10     0.1568(-01)      0.7126(-01)
12     0.1159(-01)      0.6289(-01)
14     0.1099(-01)      0.5439(-01)

Table 5. Difference H(f_M) - H(f) and L1 norm (example 5), double precision

M      H(f_M) - H(f)    L1 norm
2      0.2223(-01)      0.8377(-01)
4      0.1201(-01)      0.7021(-01)
6      0.2834(-02)      0.2992(-01)
8      0.1942(-02)      0.2111(-01)
10     0.1139(-02)      0.1050(-01)
12     0.8990(-03)      0.1826(-01)

Figure 1. Moment space with two moments (vertical axis: second moment).