A SEMIDEFINITE PROGRAMMING APPROACH TO THE GENERALIZED PROBLEM OF MOMENTS

JEAN B. LASSERRE

Abstract. We consider the generalized problem of moments (GPM) from a computational point of view and provide a hierarchy of semidefinite programming relaxations whose sequence of optimal values converges to the optimal value of the GPM. We then investigate in detail various examples of applications in optimization, probability, financial economics and optimal control, which all can be viewed as particular instances of the GPM.

1. Introduction

Originally, the classical problem of moments is that of deriving the existence of a measure (and, more generally, bounds on existing measures) with given specified moments. It traces back to early works by famous mathematicians like Chebyshev, Markov and Stieltjes, to cite a few. After further extensive investigations, there is now a rather satisfactory theory for the univariate case; for more details the interested reader is referred to e.g. Curto and Fialkow [8], Shohat and Tamarkin [33], Simon [34], and the many references therein. On the other hand, despite relatively recent progress, the multivariate case still has only partial answers, the compact case being the most notable; see in particular the important results stated in Berg and Maserick [6], Jacobi and Prestel [15], Putinar [30], Schmüdgen [31]. Roughly speaking, and as an extension of the classical problem of moments, the Generalized Problem of Moments (GPM) is the infinite-dimensional optimization problem

(1.1)    P:  min_{µ∈M(K)} { ∫ f0 dµ  |  ∫ fj dµ = bj,  j ∈ I },

where I is an index set, not necessarily finite, and:
• K ⊆ R^n and M(K) is a convex set of measures on K,
• fj : K → R are measurable functions for every j ∈ I.

From a theoretical viewpoint, the GPM has developments and impact in various areas of mathematics like algebra, Fourier analysis, functional analysis, operator theory, probability and statistics, to cite a few. In addition, and despite its rather simple and short formulation, the GPM has a large number of important applications in various fields like optimization, probability, finance, control, signal processing, chemistry, crystallography, tomography, etc. For an account of various methodologies, as well as some of their potential applications, the interested reader is referred to e.g. Akhiezer [1] and the nice collection of papers [18].

1991 Mathematics Subject Classification. 90C22, 90C25.
Key words and phrases. measures; moments.


The GPM (1.1) can be viewed as an infinite-dimensional linear program on a space of measures, and although discretization schemes can be defined to approximate the GPM by finite-dimensional LPs of increasing size (as in e.g. Hernández-Lerma and Lasserre [12]), it is essentially used as a theoretical modeling tool from which existence and characterizations of optimal solutions are often possible. But in its full generality, the GPM is numerically intractable, except in small dimension n ≤ 2, 3, for which a nice and elegant geometric approach is possible; see Anastassiou [2] and Kemperman [17] for a nice account of this latter approach, based on the geometry of the moment space Y = conv(fj(K), j ∈ I) ⊂ R^|I|.

Contribution. The purpose of the present paper is to show that for the GPM with polynomial data, the situation is much nicer, as one can define a systematic numerical scheme in which one obtains a monotone nondecreasing sequence of lower bounds that converges to the optimal value of the GPM; sometimes finite convergence may even occur. The proof of convergence is obtained by using relatively recent results on the classical (multivariate) moment problem on a compact set (or its dual facet, the representation of positive polynomials) due to Jacobi and Prestel [15] and Putinar [30], a refinement of Schmüdgen's Positivstellensatz [31]. By polynomial data, we mean a GPM (1.1) in which the set K ⊂ R^n is a basic closed semi-algebraic set (hence defined by finitely many polynomial inequality constraints), and the data {fj} are polynomials. In fact, as we will see later, f0 is also allowed to be a piecewise polynomial, or a rational function.

(a) We first extend to the abstract GPM (1.1) the methodology we used in [19] for approximating (and sometimes solving) nonconvex global optimization problems with polynomial data, which encompass such important classes of optimization problems as nonconvex quadratic and 0-1 programming problems.
In [19] one defines a hierarchy of semidefinite programming (SDP) relaxations of the original problem whose associated monotone nondecreasing sequence of optimal values converges to the global optimum. For more details the reader is referred to Lasserre [19]. In particular, we show that the sequence of SDP-relaxations converges to the optimal value of the GPM without assuming the absence of a duality gap between the GPM and its dual. That is, convergence is still compatible with a possible duality gap between the GPM and its dual. (In most works dealing with the GPM from an algorithmic viewpoint, it is assumed that the vector b = {bj}_{j∈I} in (1.1) is in the interior of the moment space, in which case there is no duality gap between the GPM and its dual; see e.g. Bertsimas and Popescu [7, Theor. 2.2], Isii [14].) This is an important feature because checking whether some given b lies in the interior of the moment space may not be a trivial task.

(b) In a second part we illustrate the power of the GPM as a modeling tool in several different applications, ranging from global optimization of a rational function to some applications in probability and finance, as well as in optimal control. In each case, the problem can be viewed as an instance of the GPM (1.1) (or a variant of it), either directly from the statement of the original problem to solve, or after some analysis and transformation. We then particularize in detail to each case the hierarchy of convergent SDP-relaxations defined for the abstract GPM (1.1). It is worth emphasizing that in some of the above applications, one ends up with a variant of the GPM in which one has two unknown measures µ and ν (instead of a
single measure µ in (1.1)), and/or the index set I is countable. However, one still obtains a converging sequence of SDP-relaxations; see Sections 4.2, 4.3, 4.4 and 4.5.

The paper is organized as follows. We first describe the GPM in Section 2. In Section 3 we then define our hierarchy of SDP-relaxations, along with convergence results and a sufficient condition to detect finite convergence (hence global optimality). For clarity of exposition, the proof of the main theorem is postponed to an appendix. Finally, in Section 4 we explain and particularize the approach for some applications in global optimization, probability, financial economics, and optimal control as well.

2. The GPM

In R^n we always consider the usual Borel σ-algebra B, and so a finite measure on R^n is always understood as a finite Borel measure on B. With K ∈ B, let M(K) be the Banach space of finite signed Borel measures on K, endowed with the norm of total variation, and let M(K)+ be its positive cone, i.e., the space of finite Borel measures on K. Let C(K) be the Banach space of bounded continuous functions on K, endowed with the sup-norm. In the weak topology σ(M(K), C(K)) (with respect to bounded continuous functions on K), a bounded sequence {µn} ⊂ M(K) converges to µ ∈ M(K) if

(2.1)    ∫ h dµn → ∫ h dµ    ∀ h ∈ C(K).

Notice that when K is compact then M(K) ≃ C(K)∗ (i.e. M(K) is the topological dual of C(K)) and the weak topology σ(M(K), C(K)) is the weak∗ topology of M(K). In that topology, the important and celebrated Banach-Alaoglu theorem states that the unit ball of M(K) is compact (and even sequentially compact in our setting); see e.g. Ash [3] and Hernández-Lerma and Lasserre [11, p. 5].

Typically, the convex set of measures M(K) ⊂ M(K)+ in the GPM (1.1) is defined by some linear inequality constraints ∫ hj dµ ≤ cj, for some given measurable functions hj : K → R, scalars cj ∈ R, j ∈ J, and some index set J. Therefore, in this case it is equivalent to define the GPM (1.1) as the infinite-dimensional linear program

(2.2)    P:  min_{µ∈M(K)+} { ∫ f0 dµ  |  ∫ fj dµ = bj, j ∈ I;  ∫ fj dµ ≤ bj, j ∈ J }.
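As a quick numerical illustration of the weak convergence (2.1) (our own toy example, not from the paper): the Dirac measures µn := δ_{1/n} converge weakly to δ0, since integrating any bounded continuous h against µn is simply evaluating h(1/n), which tends to h(0).

```python
import numpy as np

# Weak convergence (2.1) for mu_n = delta_{1/n} -> delta_0:
# integrating a test function h against mu_n is just h(1/n).
h = np.cos                      # an arbitrary bounded continuous test function
vals = [h(1.0 / n) for n in (1, 10, 100, 1000)]
print(abs(vals[-1] - h(0.0)))   # already tiny at n = 1000
```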

The GPM (2.2) is said to be solvable if there exists an optimal solution µ∗ ∈ M(K)+.

Lemma 2.1. Consider the GPM (2.2), where K ⊂ R^n is compact, fj is lower semicontinuous on K for every j ∈ J ∪ {0}, fj is continuous for all j ∈ I, and fj∗ is strictly positive on K for some j∗ ∈ I ∪ J ∪ {0}. Assume that the GPM has a feasible solution. Then:
(a) The GPM is solvable.
(b) If I ∪ J is finite then there exists an optimal solution µ∗ ∈ M(K)+ which is supported on at most s := 1 + card(I ∪ J) points {xi}_{i=1}^s ⊂ K.

Proof. (a) Let ρ∗ denote the optimal value of the GPM (2.2). Then ρ∗ > −∞ because the GPM has a feasible solution µ0 ∈ M(K)+ and f0 is lower semicontinuous, hence bounded below on K. Therefore, consider a minimizing sequence {µn} ⊂ M(K)+ with ∫ f0 dµn ↓ ρ∗ > −∞.


The sequence {µn} ⊂ M(K)+ is bounded. Indeed, as K is compact and fj∗ is lower semicontinuous (continuous if j∗ ∈ I) and strictly positive on K, one has fj∗ ≥ δ > 0 on K for some δ > 0. If j∗ ∈ I ∪ J then bj∗ ≥ ∫ fj∗ dµn ≥ δ µn(K), which shows that {µn} is norm-bounded. Similarly, if j∗ = 0 then f0 ≥ δ > 0 on K because f0 (being lower semicontinuous) attains its minimum on K; hence, in the minimizing sequence,

δ µn(K) ≤ ∫ f0 dµn ≤ ∫ f0 dµ0,    n = 1, 2, . . . ,

which also shows that {µn} is norm-bounded. So consider a subsequence (still denoted {µn} for convenience) for which µn(K) (> 0) → ρ.

Case ρ > 0. Consider the sequence of probability measures {νn} := {µn/µn(K)} on K. The set K being compact implies that {νn} is tight and thus, by Prohorov's theorem, relatively compact for the weak topology σ(M(K), C(K)); see e.g. Billingsley [4], or Hernández-Lerma and Lasserre [11]. So let {νnk} be a converging subsequence, i.e.,

lim_{k→∞} ∫ h dνnk = ∫ h dν    ∀ h ∈ C(K),

for some probability measure ν ∈ M(K)+. Next, as fj ∈ C(K) for all j ∈ I, we immediately obtain ∫ fj dν = bj/ρ for all j ∈ I. On the other hand, as fj is lower semicontinuous on K for all j ∈ J ∪ {0} (hence bounded below), we also obtain

bj/ρ ≥ lim inf_{k→∞} ∫ fj dνnk ≥ ∫ fj dν    ∀ j ∈ J,

and

ρ∗/ρ = lim inf_{k→∞} ∫ f0 dνnk ≥ ∫ f0 dν.

See e.g. Hernández-Lerma and Lasserre [11]. And so, the measure µ := ρ ν ∈ M(K)+ is a feasible solution of the GPM (2.2) with value ∫ f0 dµ ≤ ρ∗, which shows that µ is an optimal solution of P.

Case ρ = 0. Then µn → µ = 0 in total variation. In particular, bj = lim_{n→∞} ∫ fj dµn = 0 for all j ∈ I. In addition, as we have again lim inf_{n→∞} ∫ fj dµn ≥ ∫ fj dµ = 0 for all j ∈ J ∪ {0}, we get that the trivial measure µ = 0 is feasible for P with value 0 ≤ ρ∗, and so µ = 0 is an optimal solution of P. This completes the proof of (a).

(b) Let µ ∈ M(K)+ be an optimal solution of the GPM. If I ∪ J is finite then, by Carathéodory's theorem, there exists a measure ν ∈ M(K)+ such that

∫ fj dν = ∫ fj dµ    ∀ j ∈ I ∪ J ∪ {0},

with ν supported on at most s := 1 + card(I ∪ J) points {xi}_{i=1}^s ⊂ K. (See e.g. Anastassiou [2].)  □

Let I be finite with cardinality p. Convexity of M(K) in (1.1) yields that the moment space Y ⊂ R^p defined by

Y := { y ∈ R^p  |  yj = ∫ fj dµ  ∀ j ∈ I, for some µ ∈ M(K) }


is also convex. If we let g : Y → R ∪ {+∞} be the value function, then Fenchel duality yields

(2.3)    g(y) ≥ sup_{λ∈R^p} { L(λ) + ⟨λ, y⟩ }    ∀ y ∈ Y,

with

λ ↦ L(λ) := inf_{µ∈M(K)} ∫ ( f0 − Σ_{j=1}^{p} λj fj ) dµ.

Equality holds in (2.3) if g(y) is finite throughout int(Y), in which case for every y ∈ int(Y) one also has g(y) = L(λ) + ⟨λ, y⟩ for some λ ∈ R^p. When the GPM is described as the infinite-dimensional LP (2.2), then its dual is the LP

(2.4)    P∗:  max_λ { Σ_{j∈I∪J} λj bj  |  Σ_{j∈I∪J} λj fj ≤ f0 on K;  λj ≤ 0 ∀ j ∈ J }.

For a detailed discussion of the GPM, the interested reader is referred to e.g. Isii [13, 14], Kemperman [17] and the many references therein.

3. SDP-relaxations of the GPM

As already mentioned, except in the case of small dimension n ≤ 3, in full generality the GPM (1.1) is not solvable numerically. In this section we are concerned with the effective computation (or approximation) of the optimal value ρ∗ of the GPM (2.2) with polynomial data, that is, when the data {fj} are polynomials and the set K is a basic closed semi-algebraic set.

3.1. Notation. For a real symmetric matrix A, the notation A ⪰ 0 (resp. A ≻ 0) means that A is positive semidefinite (resp. positive definite), whereas u^t denotes the transpose of a vector u. Let N be the set of natural numbers, and let R[x] (= R[x1, . . . , xn]) be the ring of real polynomials in the n variables x1, . . . , xn. Let Σ² ⊂ R[x] be the set of polynomials that are sums of squares (s.o.s.).

With r ∈ N, let s(r) := \binom{n+r}{n}, and let ur(x) ∈ R^{s(r)} be the column vector

ur(x) = (1, x1, . . . , xn, x1², x1x2, . . . , xn^r)^t,

whose components form the usual canonical basis of the vector space Ar (of dimension s(r)) of real polynomials of degree at most r. Given an infinite sequence y := {yα}_{α∈N^n}, indexed in the canonical basis u∞(x), let Ly : R[x] → R be the linear mapping

(3.1)    f ∈ R[x]  ( = Σ_{α∈N^n} fα x^α )   ↦   Ly(f) := Σ_{α∈N^n} fα yα.

Moment matrix. Let Mr(y) be the s(r) × s(r) real matrix with rows and columns indexed in the basis ur(x), and defined by

(3.2)    Mr(y)(α, β) = y_{α+β},    α, β ∈ N^n,  |α|, |β| ≤ r,

where for every α ∈ N^n, the notation |α| stands for Σ_{i=1}^{n} αi. Equivalently, Mr(y) = Ly(ur(x)ur(x)^t), meaning that Ly is applied entrywise to the polynomial matrix ur(x)ur(x)^t. The matrix Mr(y) is called the moment matrix associated with the sequence y (see e.g. Curto and Fialkow [9] and Lasserre [19]).
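To make the definition (3.2) concrete, here is a small numpy sketch (our own illustration; the helper names are hypothetical, not from any library). It builds Mr(y) from a dictionary of moments indexed by multi-indices, and checks on the moments of a Dirac measure that the resulting matrix is positive semidefinite of rank one, as expected for a single-atom measure.

```python
import itertools
import numpy as np

def basis(n, r):
    """All multi-indices alpha in N^n with |alpha| <= r (the basis u_r(x))."""
    return [a for d in range(r + 1)
            for a in itertools.product(range(d + 1), repeat=n) if sum(a) == d]

def moment_matrix(y, n, r):
    """Moment matrix M_r(y)(alpha, beta) = y_{alpha+beta}, as in (3.2)."""
    b = basis(n, r)
    return np.array([[y[tuple(ai + bi for ai, bi in zip(al, be))]
                      for be in b] for al in b])

# Moments of the Dirac measure at x = (1, 2): y_alpha = 1**a1 * 2**a2.
x = (1.0, 2.0)
y = {a: np.prod([xi**ai for xi, ai in zip(x, a)]) for a in basis(2, 4)}
M = moment_matrix(y, 2, 2)                  # needs moments up to degree 2r = 4
print(np.linalg.eigvalsh(M).min())          # >= -1e-9: M_r(y) is PSD
```

With n = 2 and r = 2 the matrix is 6 × 6, since s(2) = \binom{4}{2} = 6; its rank is 1 because the underlying measure has a single atom.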


If y has a representing measure µy (i.e., if y is the moment sequence of a measure µy on R^n) then, identifying a polynomial f ∈ Ar with its vector f = {fα} ∈ R^{s(r)} of coefficients in the basis ur(x), one has

(3.3)    ⟨f, Mr(y)f⟩ = ∫ f² dµy ≥ 0,    f ∈ Ar,

so that Mr(y) ⪰ 0.

A measure µ is said to be moment determinate if there is no other measure with the same moments. In particular, and as an easy consequence of the Stone-Weierstrass theorem, every measure with compact support is moment determinate; see e.g. Berg [5], or Berg and Maserick [6]. There is a nice sufficient condition to ensure that a sequence y has a unique representing measure. It is due to Nussbaum [27] and is an extension to the multivariate case of Carleman's condition for the univariate case. Namely, if

(3.4)    Σ_{k=1}^{∞} Ly(xi^{2k})^{−1/2k} = +∞,    i = 1, . . . , n,

then y has a unique representing measure; see e.g. Berg [5, Theor. 5]. Similarly, if for some a, c > 0, |yα| ≤ c a^{|α|} for all α ∈ N^n, then y has a unique representing measure, with support contained in the ball [−a, a]^n; again see Berg [5, Theor. 9]. Finally, if the marginal distributions of a measure µ are determinate, then so is µ; see Petersen [29].

Localizing matrix. Similarly, given y = {yα} and θ ∈ R[x], let Mr(θy) be the s(r) × s(r) matrix defined by Mr(θy) := Ly(θ(x) ur(x) ur(x)^t), meaning again that Ly is applied entrywise to the polynomial matrix θ(x) ur(x) ur(x)^t. The matrix Mr(θy) is called the localizing matrix associated with the sequence y and the polynomial θ (see again Lasserre [19]). If y has a representing measure µy with support contained in the level set {x ∈ R^n : θ(x) ≥ 0} (where θ ∈ R[x]), then

(3.5)    ⟨f, Mr(θy)f⟩ = ∫ f² θ dµy ≥ 0    ∀ f ∈ Ar,

so that Mr(θy) ⪰ 0.
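A minimal univariate sketch of the localizing matrix (our own toy data, not from the paper): for µ = δ_{1/2} and θ(x) = 1 − x², the support of µ lies in the level set {θ ≥ 0} = [−1, 1], so Mr(θy) ⪰ 0 as in (3.5); moving the atom to x0 = 2, outside the level set, makes the localizing matrix indefinite.

```python
import numpy as np

def dirac_moments(x0, top):
    """Moments y_k = x0**k of the Dirac measure at x0, for k = 0..top."""
    return [x0**k for k in range(top + 1)]

def loc_matrix(y, r):
    """Localizing matrix for theta(x) = 1 - x^2: entries L_y(theta * x^(i+j))
    = y_{i+j} - y_{i+j+2}, indices up to r (so moments up to 2r + 2 needed)."""
    return np.array([[y[i + j] - y[i + j + 2] for j in range(r + 1)]
                     for i in range(r + 1)])

r = 1
y_in = dirac_moments(0.5, 2 * (r + 1))    # atom inside [-1, 1]
y_out = dirac_moments(2.0, 2 * (r + 1))   # atom outside [-1, 1]
print(np.linalg.eigvalsh(loc_matrix(y_in, r)).min())   # >= 0, as in (3.5)
print(np.linalg.eigvalsh(loc_matrix(y_out, r)).min())  # negative: support violated
```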

3.2. SDP-relaxations. We now consider the GPM (2.2) where I and J are finite index sets, and K ⊂ R^n is the basic closed semi-algebraic set

(3.6)    K := { x ∈ R^n  |  gj(x) ≥ 0,  j = 1, . . . , m }

for some family {gj}_{j=1}^{m} ⊂ R[x].

Assumption 3.1. (i) {fj}_{j∈I∪J∪{0}} ⊂ R[x], and fj∗ > 0 on K for some j∗ ∈ I ∪ J ∪ {0}.
(ii) The set K in (3.6) is compact and there exists u ∈ R[x] such that u = u0 + Σ_{j=1}^{m} uj gj for some s.o.s. polynomials {uj}_{j=0}^{m} ⊂ Σ², and the level set {x ∈ R^n | u(x) ≥ 0} is compact.


Remark 3.1. Notice that from Assumption 3.1, one may and will assume that every feasible solution of P satisfies µ(K) ≤ M for some M. Indeed, let j∗ be as in Assumption 3.1(i). As K is compact, fj∗ ≥ δ on K for some δ > 0. Next, if j∗ ∈ I ∪ J then bj∗ = ∫ fj∗ dµ ≥ δ µ(K); hence take M ≥ δ⁻¹ bj∗. On the other hand, if j∗ = 0 then, from a feasible solution µ0 of P, one may just consider the feasible solutions µ that satisfy ∫ f0 dµ ≤ ∫ f0 dµ0, and so take M ≥ δ⁻¹ ∫ f0 dµ0.

Depending on parity, let deg fj = 2wj or 2wj − 1, j ∈ I ∪ J ∪ {0}; similarly, let deg gj = 2vj or 2vj − 1, for all j = 1, . . . , m, and for r ≥ r0 := max[ max_{j=1,...,m} vj , max_{j∈{0}∪I∪J} wj ], consider the SDP-relaxations:

(3.7)    Qr:   inf_y    Ly(f0)
               s.t.     Mr(y) ⪰ 0,
                        M_{r−vj}(gj y) ⪰ 0,   j = 1, . . . , m,
                        Ly(fj) = bj,   j ∈ I,
                        Ly(fj) ≤ bj,   j ∈ J,
                        y0 ≤ M,

with M as in Remark 3.1, and denote by inf Qr its optimal value (and min Qr if the infimum is achieved). Notice that by construction, Qr contains only moment variables yα with |α| ≤ 2r.

Qr is an obvious relaxation of P. Indeed, let µ be a feasible solution of P with µ(K) ≤ M (see Remark 3.1), and let y = {yα} be the sequence of its moments (well defined because K is compact). Then y0 ≤ M and

bj = ∫ fj dµ = Ly(fj),  j ∈ I;    bj ≥ ∫ fj dµ = Ly(fj),  j ∈ J.

In addition, as µ has its support contained in K, one has

Mr(y) ⪰ 0;    Mr(gj y) ⪰ 0,   j = 1, . . . , m,

for every r = 0, 1, . . . Hence, y is a feasible solution of Qr with value Ly(f0) = ∫ f0 dµ, which proves that inf Qr ≤ min P for every r ≥ r0. Finally, it is also easy to see that {inf Qr} is a monotone nondecreasing sequence.

The dual Q∗r of Qr is the SDP

(3.8)    Q∗r:   sup_{δ,λ,{hj}}   δM + Σ_{j∈I∪J} λj bj
                s.t.    f0 − δ − Σ_{j∈I∪J} λj fj = h0 + Σ_{j=1}^{m} hj gj,
                        {hj} ⊂ Σ²,  deg h0 ≤ 2r,  deg hj gj ≤ 2r  ∀ j = 1, . . . , m,
                        δ ≤ 0;  λj ≤ 0  ∀ j ∈ J,

whose optimal value is denoted by sup Q∗r, or max Q∗r if the supremum is achieved at some optimal solution (δ, λ, {hj}). Of course, by weak duality between dual pairs of SDPs,

(3.9)    sup Q∗r ≤ inf Qr ≤ inf P    ∀ r ≥ r0

holds true.

Q∗r is related to P∗ in (2.4) in the following way. Let λ be a feasible solution of P∗ in (2.4), i.e., f := f0 − Σ_{j∈I∪J} λj fj ≥ 0 on K, with value ρ := Σ_{j∈I∪J} λj bj.


If f (≥ 0 on K) has Putinar's representation (always valid if f > 0 on K)

(3.10)    f = h0 + Σ_{j=1}^{m} hj gj

for some {hj} ⊂ Σ², then (0, λ, {hj}) is feasible in Q∗r as soon as 2r ≥ deg h0 and 2r ≥ deg hj gj for all j = 1, . . . , m, with the same associated value ρ. On the other hand, if f ≥ 0 on K does not have Putinar's representation (3.10), then one may still find a feasible solution of Q∗r with value as close to ρ as desired. Indeed, let ε > 0 be fixed, arbitrary, and let 0 > δ ≥ −εM⁻¹. Then f − δ > 0 on K and, in view of Assumption 3.1(ii), Putinar's representation (3.10) holds with f − δ in lieu of f, provided r is sufficiently large. Hence (δ, λ, {hj}) is a feasible solution of Q∗r with value ρ + δM ≥ ρ − ε. Therefore, in particular,

(3.11)    ∀ ε > 0, ∃ r(ε)  s.t.  sup Q∗r ≥ sup P∗ − ε    ∀ r ≥ r(ε),

where sup P∗ denotes the optimal value of P∗ in (2.4).

Theorem 3.2. Let K be as in (3.6), and let Assumption 3.1 hold. Assume that the GPM (2.2) has a feasible solution and let inf P be its optimal value. For every r0 ≤ r ∈ N, let Qr be the SDP-relaxation defined in (3.7). Then inf Qr ↑ inf P = min P as r → ∞.

For a proof, see the Appendix.

Theorem 3.2 states that the hierarchy of SDP-relaxations {Qr} approximates the optimal value inf P of the GPM (2.2) as closely as desired. It is worth emphasizing that to be well defined, the SDP-relaxations {Qr} only need that fj ∈ R[x] for all j ∈ I ∪ J ∪ {0}, and that K be a basic closed semi-algebraic set as in (3.6), but not necessarily compact; in fact K is even allowed to be the whole space R^n. So even if Assumption 3.1 does not hold, the hierarchy of SDP-relaxations {Qr} still provides a monotone nondecreasing sequence of lower bounds on inf P; but the convergence inf Qr ↑ ρ = inf P may be lost, and one may have ρ < inf P.

Notice that Theorem 3.2 holds without the usual Slater-type conditions that ensure sup P∗ = max P∗ = inf P (where P and P∗ are defined in (2.2) and (2.4) respectively). Therefore, the convergence inf Qr ↑ inf P in Theorem 3.2 is still compatible with a possible duality gap between P and its dual P∗. On the other hand, if sup P∗ = inf P, an alternative (and simpler) proof of Theorem 3.2 using P∗ is possible. Indeed, let ε > 0 be fixed, arbitrary. Combining sup P∗ = inf P with (3.9) and (3.11) yields

inf P − ε = sup P∗ − ε ≤ sup Q∗r ≤ inf Qr ≤ inf P

(where the first inequality on the left holds provided r is sufficiently large). As ε > 0 was arbitrary, one obtains inf Qr ↑ inf P, and sup Q∗r ↑ inf P as well.  □

Remark 3.3. If sup P∗ = inf P holds, then one may even discard the constraint y0 ≤ M in the SDP-relaxation (3.7), and consequently discard the variable δ in the SDP-relaxation Q∗r.
Indeed, let λ be a nearly optimal solution of P∗, i.e., with ε > 0 arbitrarily small, f0 − Σ_{j∈I∪J} λj fj ≥ 0 on K and Σ_{j∈I∪J} λj bj ≥ sup P∗ − ε = inf P − ε. Recall that by Assumption 3.1(i), fj∗ > 0 on K for some j∗ ∈ {0} ∪ I ∪ J. Assume first that j∗ ∈ I ∪ J, so that necessarily bj∗ ≥ 0. Then f0 − Σ_{j∈I∪J} λj fj + ε fj∗ > 0
on K, and so, under Assumption 3.1(ii),

f0 − Σ_{j∈I∪J} λj fj + ε fj∗ = h0 + Σ_{j=1}^{m} hj gj

for some s.o.s. polynomials {hj}_{j=0}^{m} ⊂ Σ²; see Putinar [30]. Let 2r ≥ max[2r0, deg h0, max_j deg hj gj], and let

λ′j = λj  ∀ j ≠ j∗;    λ′j∗ = λj∗ − ε.

Then ({hj}, λ′) is feasible for Q∗r with value

Σ_{j∈I∪J} λ′j bj = Σ_{j∈I∪J} λj bj − ε bj∗ ≥ inf P − ε(1 + bj∗),

and so inf P ≥ inf Qr ≥ sup Q∗r ≥ inf P − ε(1 + bj∗). As ε > 0 was arbitrary, one obtains inf Qr ↑ inf P (and sup Q∗r ↑ inf P as well).

If j∗ = 0 then let λ′j := λj/(1 + ε) for all j ∈ I ∪ J, and notice that f0 − Σ_{j∈I∪J} λ′j(1 + ε) fj + ε f0 > 0 on K, and so, under Assumption 3.1(ii), Putinar's representation (3.10) holds, i.e.,

f0(1 + ε) − Σ_{j∈I∪J} λ′j(1 + ε) fj = h0 + Σ_{j=1}^{m} hj gj

for some s.o.s. polynomials {hj}_{j=0}^{m} ⊂ Σ². Dividing by 1 + ε yields

f0 − Σ_{j∈I∪J} λ′j fj = (1/(1 + ε)) [ h0 + Σ_{j=1}^{m} hj gj ],

and letting 2r ≥ max[2r0, deg h0, max_j deg hj gj], one obtains that (λ′, {hj}/(1 + ε)) is feasible in Q∗r with value Σ_j λ′j bj ≥ (inf P − ε)/(1 + ε), and so inf P ≥ inf Qr ≥ sup Q∗r ≥ (inf P − ε)/(1 + ε). Again, as ε > 0 was arbitrary, one obtains inf Qr ↑ inf P (and sup Q∗r ↑ inf P as well).
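Stepping back from duality, the basic relaxation property of (3.7) — the moments of any feasible measure give a feasible point of Qr whose objective equals ∫ f0 dµ — is easy to check numerically. A toy instance of ours (hypothetical data, not from the paper): K = [0, 1] = {x : g1(x) = x(1 − x) ≥ 0}, f0 = x², I = {1}, f1 ≡ 1, b1 = 1, and the feasible probability measure µ = δ_{0.3}.

```python
import numpy as np

r, x0 = 2, 0.3
y = [x0**k for k in range(2 * r + 1)]     # moments of mu = delta_{x0} on [0, 1]

# moment matrix M_r(y) and localizing matrix M_{r-1}(g1 y), g1(x) = x - x^2,
# with entries L_y(g1 * x^(i+j)) = y_{i+j+1} - y_{i+j+2}
M = np.array([[y[i + j] for j in range(r + 1)] for i in range(r + 1)])
Mg = np.array([[y[i + j + 1] - y[i + j + 2] for j in range(r)] for i in range(r)])

assert np.linalg.eigvalsh(M).min() >= -1e-9    # M_r(y) >= 0
assert np.linalg.eigvalsh(Mg).min() >= -1e-9   # M_{r-1}(g1 y) >= 0
assert abs(y[0] - 1.0) < 1e-12                 # L_y(f1) = b1 = 1
print(y[2])   # L_y(f0) = f0(x0) = 0.09, an upper bound on inf Q_r
```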

3.3. Detecting optimality. Theorem 3.2 guarantees that the SDP-relaxations {Qr} in (3.7) converge to the desired optimal value of P. However, the convergence proved in Theorem 3.2 is only asymptotic as r → ∞. In some cases, finite convergence takes place, and below we present a sufficient condition to detect whether it has occurred at some SDP-relaxation Qr.

Theorem 3.4. Let v := max_{j=1,...,m} vj, and let Qr be the SDP-relaxation defined in (3.7). Assume that Qr has an optimal solution y that satisfies

(3.12)    rank Mr(y) = rank M_{r−v}(y) =: s.

Then y is the vector of moments of some s-atomic measure µ with support contained in K, and µ is an optimal solution of P.

Proof. From a result of Curto and Fialkow [9, Theor. 1.6], also proved in Laurent [26], (3.12) implies that y is the vector of moments of some measure µ finitely supported on exactly s points {x(i)}_{i=1}^s ⊂ K. In addition, we also have

bj = Ly(fj) = ∫ fj dµ,  j ∈ I,    and    bj ≥ Ly(fj) = ∫ fj dµ,  j ∈ J,


which proves that µ is feasible for P. But this fact, combined with

min P ≥ min Qr = Ly(f0) = ∫ f0 dµ,

yields that µ must be an optimal solution of P, the desired result.  □
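The rank condition (3.12) is straightforward to test numerically. A univariate sketch with our own toy data (not from the paper): for the 2-atomic probability measure µ = ½(δ_{−1/2} + δ_{1/2}) and v = 1, the Hankel moment matrices satisfy rank M2(y) = rank M1(y) = 2, the number of atoms, so finite convergence would be detected at this relaxation order.

```python
import numpy as np

atoms, weights = [-0.5, 0.5], [0.5, 0.5]
y = [sum(w * a**k for a, w in zip(atoms, weights)) for k in range(5)]

# Hankel moment matrices M_1(y) and M_2(y) of the 2-atomic measure
M1 = np.array([[y[i + j] for j in range(2)] for i in range(2)])
M2 = np.array([[y[i + j] for j in range(3)] for i in range(3)])
print(np.linalg.matrix_rank(M1), np.linalg.matrix_rank(M2))  # both 2: (3.12) holds with s = 2
```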

It is worth noticing that Theorem 3.4 does not require Assumption 3.1 to hold. As already mentioned just after Theorem 3.2, the hierarchy of SDP-relaxations (3.7) can always be defined provided only that K is a basic closed semi-algebraic set and fj ∈ R[x] for all j ∈ I ∪ J ∪ {0}. Even though the convergence inf Qr ↑ inf P is not guaranteed any more, finite convergence may still happen if condition (3.12) holds true at some optimal solution of Qr (whenever Qr is solvable).

4. Examples

In this section we provide several examples of problems that can be viewed as particular instances of the GPM (2.2). Some of them are already stated in the form of the GPM considered in §3.2, while others need some transformation to be put in that format.

4.1. Global optimization with rational functions. Let f : R^n → R be given and let K ⊂ R^n. Suppose that we want to compute or approximate

(4.1)    f∗ := min_x { f(x) | x ∈ K }.

Then f∗ is the optimal value of the GPM (2.2) where f0 ≡ f, J = ∅, I = {1}, f1 ≡ 1 and b1 = 1. Indeed, inf P ≤ f∗ because with any feasible solution x ∈ K is associated the feasible measure µ := δx ∈ M(K)+ (with δx the Dirac at x), with associated value ∫ f dµ = f(x). On the other hand, if f ≥ f∗ on K then for any feasible solution µ ∈ M(K)+ it holds that ∫ f dµ ≥ f∗ µ(K) = f∗, and so inf P ≥ f∗. Therefore f∗ = inf P, the desired result.

Notice that inf P = f∗ holds for an arbitrary measurable function f and, as such, P is not of practical use in general. On the other hand, if K ⊂ R^n is a basic closed semi-algebraic set as defined in (3.6) and f ∈ R[x], then P is directly in the form considered in §3.2. This formulation encompasses a large class of global optimization problems, which includes in particular nonconvex quadratic optimization problems as well as 0-1 discrete optimization problems. Under Assumption 3.1, Theorem 3.2 applies and the SDP-relaxations (3.7) converge, that is, inf Qr ↑ f∗. In this case, one does not need the constraint y0 ≤ M in the SDP-relaxation Qr in (3.7) because the constraint ∫ f1 dµ = 1 reads y0 = 1. The resulting convergent SDP-relaxations have been proposed in Lasserre [19] and, for more details, the interested reader is referred to [19, 20]. See also the paper [10] that describes the GloptiPoly software package, which implements the SDP-relaxations {Qr} with a rank test to detect global optimality as described in Theorem 3.4, and displays results on some numerical experiments. Tests on a significant sample of examples from the literature show fast, and often finite, convergence. In the case of a unique global minimizer x∗ ∈ K, convergence of the subvector of moments of order 1 to x∗ has also been proved in Schweighofer [32].

Next, suppose that f is now a rational function, i.e., f = p/q with p, q ∈ R[x]. We assume that q does not change sign on K, otherwise f∗ = −∞ (unless perhaps


if p and q have common zeros). Therefore, let us assume that q > 0 on K. Then

(4.2)    f∗ = min_{µ∈M(K)+} { ∫ p dµ  |  ∫ q dµ = 1 },

which is an instance of the GPM with f0 ≡ p, J = ∅, I = {1} with f1 ≡ q, and b1 = 1. Indeed, if f ≥ f∗ on K then, equivalently, p ≥ f∗ q, and so for every feasible solution of the GPM (4.2), ∫ p dµ ≥ f∗ ∫ q dµ = f∗, which proves inf P ≥ f∗. For the converse statement, observe that f∗ > −∞ because, with q > 0 on K (compact), f is bounded on K. In addition, f (being continuous) attains its minimum on K at some global minimizer x∗ ∈ K, and so the measure µ := q(x∗)⁻¹ δx∗ ∈ M(K)+ is a feasible solution of the GPM (4.2) with value ∫ p dµ = p(x∗)/q(x∗) = f∗, which proves that inf P ≤ f∗.

Therefore, when K is as in (3.6), the GPM (4.2) is in the format considered in §3.2. Again, under Assumption 3.1, Theorem 3.2 applies and the SDP-relaxations {Qr} defined in (3.7) converge to f∗. In the case where K has nonempty interior, one may prove that there is no duality gap between P and P∗, and so there is no need for the constraint y0 ≤ M in the SDP-relaxation Qr; see Remark 3.3. Jibetean and De Klerk [16] were the first to propose approximating f∗ by a hierarchy of SDP-relaxations, namely the hierarchy of SDP-relaxations Q∗r (without the variable δ).

4.2. Probability. Let K and S ⊂ K be Borel subsets of R^n, and consider the problem of computing an upper bound (if possible sharp) on Prob(X ∈ S), the probability that a K-valued random variable X belongs to S, given that some of its moments {γα}_{α∈Γ} are known, for some Γ ⊂ N^n. Notice that 0 ∈ Γ and γ0 = 1, as µ is required to be a probability measure. Equivalently, one wants to solve the GPM

(4.3)    max_{µ∈M(K)+} { µ(S)  |  ∫ x^α dµ = γα  ∀ α ∈ Γ },

where f0 = I_S (the indicator function of S), J = ∅, I = Γ with fα = x^α, and bα = γα for all α ∈ Γ. If K is compact and S is closed then f0 is upper semicontinuous and, from Lemma 2.1, P is solvable. However, even if K and S are basic closed semi-algebraic sets, the GPM (4.3) is not in the format described in §3.2 because f0 = I_S is not a polynomial.

There is an easy way to overcome this technical problem. Just write µ as the sum ϕ + ν of two measures ϕ and ν with support contained in S and K respectively. With this trick, ϕ(S) is just ∫ f̂0 dϕ with f̂0 ≡ 1, and the GPM (4.3) reduces to a variant of a GPM with polynomial data, but now with two unknown measures ϕ and ν instead of the single measure µ in (4.3). Namely, consider the GPM

(4.4)    P:  max_{ϕ∈M(S)+, ν∈M(K)+} { ∫ f̂0 dϕ  |  ∫ fα dϕ + ∫ fα dν = γα  ∀ α ∈ Γ }.

In principle one should also have imposed ν(S) = 0, but observe that from a feasible solution (ϕ, ν) of P with ν(S) > 0, one easily constructs a feasible solution (ϕ′, ν′) with ν′(S) = 0 and a value at least as good. Indeed, write ν as the sum ν1 + ν2 of two mutually singular measures ν1, ν2, with ν1(B) = ν(B ∩ S) and ν2(B) = ν(B ∩ (K \ S)) for all Borel sets B of K. Then (ϕ′, ν′) := (ϕ + ν1, ν2) is feasible for P with value ϕ′(S) = ϕ(S) + ν1(S) ≥ ϕ(S), and ν′(S) = 0.
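As a sanity check of the two-measure formulation (4.4) on a classical instance (our own illustration, ignoring the compactness of K): take K = [0, +∞), S = [a, +∞), Γ = {0, 1}, γ0 = 1 and γ1 = m with m ≤ a. The pair ϕ = (m/a)δ_a, ν = (1 − m/a)δ_0 satisfies both moment constraints, and ϕ(S) = m/a recovers the classical (tight) Markov bound on Prob(X ≥ a).

```python
a, m = 2.0, 0.5      # threshold a and prescribed mean gamma_1 = m (m <= a)

# candidate optimal pair for (4.4): phi = (m/a) * delta_a   on S = [a, +inf),
#                                   nu  = (1 - m/a) * delta_0 on K \ S
w_phi, w_nu = m / a, 1.0 - m / a

# moment constraints of (4.4): total mass gamma_0 = 1, total mean gamma_1 = m
assert abs((w_phi + w_nu) - 1.0) < 1e-12
assert abs((w_phi * a + w_nu * 0.0) - m) < 1e-12
print(w_phi)   # phi(S) = m/a = 0.25, the Markov bound on Prob(X >= a)
```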


We obtain the following analogue of Lemma 2.1 for this variant of the GPM.

Proposition 4.1. Let P be the GPM defined in (4.4), with K and S ⊂ K compact semi-algebraic subsets of R^n. Then P is solvable.

The proof is very similar to that of Lemma 2.1. The crucial property is that in a maximizing sequence {(ϕn, νn)}, both sequences {ϕn} and {νn} are bounded, so that the sequences of probability measures {ϕn/ϕn(S)} and {νn/νn(K)} on S and K respectively are tight. Hence, by Prohorov's theorem, they are relatively compact in the weak topologies σ(M(S), C(S)) and σ(M(K), C(K)) respectively (with C(S) being the space of continuous functions on S). One may thus proceed as in the proof of Lemma 2.1.

Now, the GPM (4.4) is in nice format since K and S are both compact basic semi-algebraic sets, and fα ∈ R[x] for all α ∈ I ∪ {0} (where I = Γ). To properly define the SDP-relaxation Qr, we assume that K is defined as in (3.6) and, similarly, that the set S ⊂ K is defined by

(4.5)    S = { x ∈ R^n  |  hj(x) ≥ 0  ∀ j = 1, . . . , m′ }

for some polynomials {hj}_{j=1}^{m′} ⊂ R[x]. Depending on parity, let deg hj = 2wj or 2wj − 1, for all j = 1, . . . , m′.

If we let y = {yα} and z = {zα} be the moment sequences associated with ϕ and ν respectively, one obtains the following SDP-relaxation Qr, the analogue for the GPM (4.4) of the SDP-relaxation (3.7) for the GPM (2.2). With

2r ≥ max[ max_{α∈Γ} |α|, max_{j=1,...,m} deg gj, max_{j=1,...,m′} deg hj ]

(and with the notation |α| := Σ_{j=1}^{n} αj for α ∈ N^n), Qr reads

(4.6)    Qr:   sup_{y,z}   y0 (= Ly(f̂0))
               s.t.    Mr(y) ⪰ 0,  Mr(z) ⪰ 0,
                       M_{r−vj}(gj z) ⪰ 0,   j = 1, . . . , m,
                       M_{r−wj}(hj y) ⪰ 0,   j = 1, . . . , m′,
                       yα + zα = γα,   α ∈ I.

As already mentioned, we do not need the constraints y0 ≤ M and z0 ≤ M because we already have y0 + z0 = γ0 = 1. If we assume that Assumption 3.1(ii) holds for K and for S as well (with the obvious modifications in the statement for S), then one obtains sup Qr ↓ sup P as r → ∞. The proof is quite similar to that of Theorem 3.2, except that we now have two moment sequences y and z. Finally, observe that the dual P∗ of P reads

(4.7)    P∗:   min_λ   Σ_{α∈Γ} λα γα
               s.t.    Σ_{α∈Γ} λα x^α ≥ 1 on S,
                       Σ_{α∈Γ} λα x^α ≥ 0 on K,



with optimal value denoted inf P* (and min P* if the infimum is attained). As already mentioned, under some interior point condition there is no duality gap between P and its dual P*. Namely, if the vector γ = {γ_α} is in the interior of the moment space (i.e., the space of vectors for which there exists a feasible solution of P), then sup P = min P*; see e.g. [7, Theor. 2.2]. So again the convergence of the hierarchy of SDP-relaxations (4.6) is compatible with a possible duality gap between P and P*. For a historical background and a more detailed discussion of problem P, the interested reader is referred to e.g. Bertsimas and Popescu [7], Lasserre [21] and the many references therein.

4.3. Probability (continued). Moment problems in financial economics. A central question in financial economics is to find the price of a derivative security given information on the underlying asset. Under no arbitrage, the price of a European call option with strike k is given by E[(X − k)^+], where E is the expectation operator with respect to the distribution of the underlying asset X, and the notation x^+ stands for max[0, x]. Hence, finding an optimal upper bound on the price of a European call option with strike k, given the first p + 1 moments {γ_j}_{j=0}^p of the price of the underlying asset, reduces to solving the GPM (2.2) with K = R, J = ∅, I = {1, ..., p + 1}, with f_j = x^{j−1} and b_j = γ_{j−1} for all j ∈ I, and f_0 = (x − k)^+. Problem P is similar to that in §4.2 except that now we are in dimension n = 1. Observe that f_0 is not a polynomial and, as in §4.2, we replace µ with the sum of two measures ϕ and ν with support on S := [k, +∞) and K = R respectively. That is, we again want to solve a variant of the GPM with two unknown measures ϕ and ν; namely, P now reads

(4.8)    max_{ϕ∈M(S)^+, ν∈M(K)^+}  { ∫ f_0 dϕ  |  ∫ f_j dϕ + ∫ f_j dν = γ_j,  ∀ j = 1, ..., p + 1 }

where f_0 = x − k ∈ R[x] and f_j = x^{j−1} for all j = 1, ..., p + 1. Again we should have imposed that ν(S) = 0. However, if a feasible solution (ϕ, ν) of P is such that ν(S) > 0, then one easily constructs a feasible solution (ϕ', ν') with ν'(S) = 0 and with value at least as good. Write ν as the sum ν_1 + ν_2 of two mutually singular measures ν_1, ν_2, with ν_1(B) = ν(B ∩ S) and ν_2(B) = ν(B ∩ (K \ S)) for all Borel sets B of K. Then (ϕ', ν') := (ϕ + ν_1, ν_2) is feasible for P with value ∫ f_0 dϕ' = ∫ f_0 dϕ + ∫ f_0 dν_1 ≥ ∫ f_0 dϕ, because f_0 ≥ 0 on S. The dual P* of P reads

P*:  min_{λ∈R^{p+1}}  Σ_{j=1}^{p+1} λ_j b_j
     s.t.  Σ_{j=1}^{p+1} λ_j x^{j−1} − (x − k) ≥ 0  on S,
           Σ_{j=1}^{p+1} λ_j x^{j−1} ≥ 0  on R.

Then P* has a complete description as an SDP because one has an appropriate description of polynomials nonnegative on an interval, in terms of a weighted sum of squares. Indeed, a nonnegative univariate polynomial is a s.o.s. and, by a theorem of Pólya and Szegő, a polynomial f nonnegative on [k, +∞) can be written as q_0 + (x − k) q_1 for two s.o.s. polynomials q_0, q_1 ∈ R[x] such that deg q_0, deg (x − k) q_1 ≤ deg f; see e.g. Powers and Reznick [28]. Therefore P* is the SDP

(4.9)    P*:  min_{λ∈R^{p+1}}  Σ_{j=1}^{p+1} λ_j b_j
              s.t.  Σ_{j=1}^{p+1} λ_j x^{j−1} − (x − k) = q_0 + (x − k) q_1
                    Σ_{j=1}^{p+1} λ_j x^{j−1} = q_2
                    q_j s.o.s., j = 0, 1, 2;  deg q_0, deg (x − k) q_1, deg q_2 ≤ p.
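For p = 2 (given mean μ and variance σ²) this moment problem even has a classical closed-form solution, often attributed to Lo: the optimal upper bound is E[(X − k)^+] ≤ ½[(μ − k) + √((μ − k)² + σ²)]; closed-form bounds of this type are discussed in [7]. A minimal numerical sketch, in which the Gaussian test law and all names are illustrative assumptions:

```python
import math
import random

def call_upper_bound(mu, sigma, k):
    # Optimal upper bound on E[(X - k)+] over all laws with mean mu and
    # standard deviation sigma (the classical p = 2 closed form).
    return 0.5 * ((mu - k) + math.sqrt((mu - k) ** 2 + sigma ** 2))

random.seed(0)
mu, sigma, k = 100.0, 10.0, 105.0
bound = call_upper_bound(mu, sigma, k)

# Monte Carlo price under one particular law with these two moments
# (a Gaussian); the bound must dominate it, whatever the law.
n = 200_000
mc_price = sum(max(random.gauss(mu, sigma) - k, 0.0) for _ in range(n)) / n
assert mc_price <= bound
print(round(bound, 4))  # 0.5 * (-5 + sqrt(125)) ≈ 3.0902
```

The gap between the Monte Carlo price and the bound is the price of knowing only two moments of the distribution.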


Hence, if there is no duality gap between P and P*, the optimal value sup P of the GPM (4.8) is computed by solving the single SDP (4.9). For instance, this is the case if the vector γ = {γ_j} is in the interior of the moment space; again see [7, Theor. 2.2]. For a detailed discussion of P, the interested reader is referred to Bertsimas and Popescu [7], where various optimal bounds in closed form are also obtained in a number of particular cases.

4.4. Transportation (or mass transfer) problem. Let K := K_1 × K_2 with K_1, K_2 being Borel subsets of R^{n_1} and R^{n_2} respectively. Let π_i : M(K)^+ → M(K_i)^+, i = 1, 2, be the projection mappings. That is, for every measure µ in M(K)^+,

(π_1 µ)(B) := µ(B × K_2)    and    (π_2 µ)(B) := µ(K_1 × B)

for every Borel subset B of K_1 and K_2 respectively. Let f_0 : K → R be a measurable function, and let ν_1 ∈ M(K_1)^+, ν_2 ∈ M(K_2)^+ be two given probability measures on K_1 and K_2 respectively. Then the linear program

MT:  min_{µ∈M(K)^+}  ∫_K f_0 dµ := ∫_{K_1×K_2} f_0(x, y) µ(d(x, y))
     s.t.  π_i µ = ν_i,  i = 1, 2,

is the transportation (or mass-transfer) problem, sometimes also called the Monge–Kantorovich problem. It is a very old problem, stated by the French geometer Monge in the 18th century for military applications, and is a special case of moment problems involving measures with given marginals. For instance, with K_1 = K_2 and a specific distance function f_0, its optimal value also measures the distance between the two probability measures ν_1 and ν_2 on K_1, and so induces a metric on the space of probability measures on K_1. For the interested reader, a nice discussion of moment problems involving measures with given marginals can be found in Kemperman [17, §5].

To put MT in the format of a GPM it suffices to notice that if K_i is compact for i = 1, 2, then every finite measure on K_i is moment determinate, i.e., completely determined by its moments. One may thus replace the constraint π_i µ = ν_i with the countably many linear equality constraints

∫ x^α dµ = ∫ x^α dν_i,   ∀ α ∈ N^{n_i};  i = 1, 2.

Therefore, the mass-transfer problem MT is also the GPM (2.2) with J = ∅, I = N^{n_1} ∪ N^{n_2}, f_α = x^α whenever α ∈ I, and b_α = ∫ x^α dν_i if α ∈ N^{n_i}, i = 1, 2. If f_0 is lower semicontinuous then, from Lemma 2.1, P is solvable. Next, if f_0 ∈ R[x] then the GPM (2.2) has polynomial data but the index set I is not finite. However, in view of the particular form of the constraints, and under Assumption 3.1, the SDP-relaxation (3.7) has the natural extension

(4.10)    Q_r:  inf_y  L_y(f_0)
                s.t.  M_r(y) ⪰ 0
                      M_{r−v_j}(g_j y) ⪰ 0,   j = 1, ..., m
                      L_y(x^α) = b_α,   α ∈ N^{n_1},  |α| ≤ 2r
                      L_y(x^α) = b_α,   α ∈ N^{n_2},  |α| ≤ 2r.
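When ν_1, ν_2 are finitely supported, MT is an ordinary finite linear program, and in dimension n_1 = n_2 = 1 with cost f_0(x, y) = |x − y| its optimal value is even available in closed form through the distribution functions of the marginals, the monotone coupling being optimal in that case. A small self-contained sketch; the discrete data are an arbitrary illustration:

```python
def mt_value_cdf(xs, p, q):
    # 1-D closed form for cost |x - y|: integrate |F1 - F2| between
    # consecutive support points (xs sorted; p, q probability weights).
    total, fp, fq = 0.0, 0.0, 0.0
    for i in range(len(xs) - 1):
        fp += p[i]
        fq += q[i]
        total += abs(fp - fq) * (xs[i + 1] - xs[i])
    return total

def mt_value_monotone(xs, p, q):
    # The monotone (north-west corner) coupling, optimal in 1-D for
    # cost |x - y|: match mass of the two marginals in increasing order.
    p, q = p[:], q[:]
    i = j = 0
    cost = 0.0
    while i < len(xs) and j < len(xs):
        m = min(p[i], q[j])
        cost += m * abs(xs[i] - xs[j])
        p[i] -= m
        q[j] -= m
        if p[i] <= 1e-12:
            i += 1
        if q[j] <= 1e-12:
            j += 1
    return cost

xs = [0.0, 1.0, 2.0, 4.0]   # common support of nu_1 and nu_2
p = [0.5, 0.2, 0.2, 0.1]    # weights of nu_1
q = [0.1, 0.3, 0.4, 0.2]    # weights of nu_2
assert abs(mt_value_cdf(xs, p, q) - mt_value_monotone(xs, p, q)) < 1e-9
print(round(mt_value_cdf(xs, p, q), 6))  # 0.9
```

In higher dimension no such closed form exists, which is precisely where the SDP-relaxation (4.10) becomes relevant.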


That is, in the SDP-relaxation Q_r one only considers the constraints L_y(f_α) = b_α with indices α ∈ I such that |α| ≤ 2r. In addition we do not need y_0 ≤ M because y_0 = 1 follows from the constraints. Theorem 3.2 remains valid; indeed, its proof easily adapts to the present setting with a countable index set I. Therefore, one still obtains the monotone convergence inf Q_r ↑ inf P as r → ∞.

4.5. Optimal control. We here present an application of the GPM to a generic deterministic optimal control problem (OCP). With T > 0, let:
- X, K ⊂ R^n and U ⊂ R^m be basic compact semi-algebraic sets;
- U be the space of measurable functions u : [0, T] → U;
- h ∈ R[t, x, u], H ∈ R[x];
- f : R × R^n × R^m → R^n be a polynomial map, i.e., f_k ∈ R[t, x, u] for all k = 1, ..., n.

Let x_0 ∈ X and consider the following OCP:

(4.11)    J*(0, x_0) := inf_{u∈U} J(0, x_0, u),

where

(4.12)    J(0, x_0, u) = ∫_0^T h(s, x(s), u(s)) ds + H(x(T))
          ẋ(s) = f(s, x(s), u(s)),   s ∈ [0, T)
          (x(s), u(s)) ∈ X × U,      s ∈ [0, T)
          x(T) ∈ K,

and with initial condition x(0) = x_0 ∈ X.

Before rewriting the OCP as a GPM we need to introduce some additional notation and definitions. For a compact space X ⊂ R^n, let C(X) denote the Banach space of continuous functions on X, equipped with the sup-norm, so that C(X)* ≃ M(X), the Banach space of finite signed measures on X, equipped with the total variation norm. Let Σ := [0, T] × X, S := Σ × U, and let C_1(Σ) be the Banach space of functions v ∈ C(Σ) with partial derivatives ∂v/∂x_j in C(Σ) for all j = 1, ..., n. With u ∈ U, let A : C_1(Σ) → C(S) be the mapping

(4.13)    v ↦ A v(t, x, u) := ∂v/∂t (t, x) + ⟨f(t, x, u), ∇_x v(t, x)⟩,

and let L : C_1(Σ) → C(S) × C(K) be the mapping

v ↦ L v := (−A v, v_T),

where v_T(x) := v(T, x) for all x ∈ X. Notice that, for an arbitrary trajectory (s, x(s), u(s)) and v ∈ C_1(Σ), one has

(4.14)    v_T(x(T)) = v(0, x_0) + ∫_0^T A v(s, x(s), u(s)) ds.

Let (µ, ν) ∈ M(S)^+ × M(K)^+ be defined by

ν(H) := I[x(T) ∈ H];    µ(A × B × C) := ∫_{[0,T]∩A} I[(x(s), u(s)) ∈ B × C] ds


for all Borel hyperrectangles H of K, A of [0, T], B of X, and C of U (and where x ↦ I[x ∈ •] stands for the indicator function of the set •). The measure µ is called the state-action occupation measure up to time T, whereas ν is the state occupation measure at time T. Then the time integration (4.14) is the same as the spatial integration

∫ v_T dν = v(0, x_0) + ∫ A v dµ,

and so, for an arbitrary trajectory (s, x(s), u(s)), one has

(4.15)    ⟨L v, (µ, ν)⟩ = ⟨v, δ_{(0,x_0)}⟩,   ∀ v ∈ C_1(Σ).

Similarly, the cost of this trajectory can be expressed via µ and ν by

(4.16)    J(0, x_0, u) = ∫_0^T h(s, x(s), u(s)) ds + H(x(T)) = ∫ h dµ + ∫ H dν.
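The trajectory identity (4.14) behind these two formulas is easy to check numerically along a toy trajectory. The sketch below uses the illustrative system ẋ = u with a constant control and the test function v(t, x) = t x²; all these choices are arbitrary assumptions for the illustration:

```python
T, x0, u = 1.0, 1.0, -0.5       # horizon, initial state, constant control

def x(t):
    # closed-form trajectory of xdot = u with x(0) = x0
    return x0 + u * t

def v(t, xx):
    # a smooth test function v in C1(Sigma)
    return t * xx * xx

def Av(t, xx, uu):
    # A v = dv/dt + <f, grad_x v>, here with f(t, x, u) = u
    return xx * xx + t * 2.0 * xx * uu

# midpoint-rule quadrature of the time integral in (4.14)
N = 100_000
h = T / N
integral = sum(Av((i + 0.5) * h, x((i + 0.5) * h), u) for i in range(N)) * h

lhs = v(T, x(T))                # v_T(x(T))
rhs = v(0.0, x0) + integral     # v(0, x0) + integral of A v along the path
assert abs(lhs - rhs) < 1e-8
```

The same identity, integrated against the occupation measures instead of time, is exactly (4.15)-(4.16).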

(4.15)-(4.16) will be the basis of the SDP-relaxation for the OCP (4.11)-(4.12). Next, let L* : M(S) × M(K) → C_1(Σ)* be the adjoint mapping of L, defined by

⟨(µ, ν), L v⟩ = ⟨L*(µ, ν), v⟩   for all ((µ, ν), v) ∈ M(S) × M(K) × C_1(Σ).

A function v : [0, T] × R^n → R is a solution of the Hamilton-Jacobi-Bellman optimality equation if

(4.17)    inf_{u∈U} { A v(s, x, u) + h(s, x, u) } = 0,   (s, x) ∈ [0, T) × X,

with boundary condition v_T(x) (= v(T, x)) = H(x) for all x ∈ K. And a function v ∈ C_1(Σ) is said to be a smooth subsolution of the Hamilton-Jacobi-Bellman equation (4.17) if

A v + h ≥ 0 on [0, T) × X × U,   and   v(T, x) ≤ H(x) for all x ∈ K.

Then consider the infinite-dimensional linear program

(4.18)    P:  inf_{(µ,ν)∈∆}  { ⟨(µ, ν), (h, H)⟩  |  L*(µ, ν) = δ_{(0,x_0)} }

(where ∆ := M(S)^+ × M(K)^+), with optimal value denoted inf P (min P if the infimum is attained). Its dual reads

(4.19)    P*:  sup_{v∈C_1(Σ)}  { ⟨δ_{(0,x_0)}, v⟩  |  L v ≤ (h, H) },

with optimal value denoted sup P* (max P* if the supremum is attained). Note that the feasible solutions v of P* are precisely the smooth subsolutions of (4.17). Under some conditions there is no duality gap between P and its dual P*. For instance, this is the case if h, H are convex and the set f(s, x, U) is convex for all (s, x) ∈ Σ; see e.g. Vinter [35]. To read P as an instance of the GPM with polynomial data, it suffices to notice that the constraint L*(µ, ν) = δ_{(0,x_0)} is equivalent to (4.15) for all v in R[t, x], and so is equivalent to the countably many equality constraints

(4.20)    ⟨L(x^α t^k), (µ, ν)⟩ = ⟨x^α t^k, δ_{(0,x_0)}⟩,   ∀ (α, k) ∈ N^n × N,

which in turn read

(4.21)    T^k ∫ x^α dν − ∫ [ k t^{k−1} x^α + ⟨∇_x x^α, f⟩ ] dµ = { x_0^α  if k = 0
                                                                 { 0      otherwise


for all α ∈ N^n and k ∈ N. Equivalently, (4.21) reads

∫ f¹_{αk} dν − ∫ f²_{αk} dµ = b_{αk},   (α, k) ∈ N^n × N,

with f¹_{αk} = T^k x^α ∈ R[x], f²_{αk} = k t^{k−1} x^α + ⟨∇_x x^α, f⟩ ∈ R[t, x, u], and b_{αk} = x_0^α if k = 0 and 0 otherwise.

Hence the OCP (4.11)-(4.12) has the same optimal value as the infinite-dimensional linear program

(4.22)    P:  min_{µ∈M(S)^+, ν∈M(K)^+}  { ∫ h dµ + ∫ H dν  |  (4.20) holds },

a variant of the GPM (2.2) with two unknown measures µ, ν, and with J = ∅, I = N^n × N, f_{α,k} = x^α t^k, and b_{α,k} = x_0^α if k = 0 and 0 otherwise, for all (α, k) ∈ N^n × N.

SDP-relaxations. So let y = {y_γ} (with γ ∈ N^n) and z = {z_β} (with β ∈ N × N^n × N^m) be the moment variables associated with ν and µ respectively. For every (α, k) ∈ N^n × N, let d(α, k) := max[|γ|, |β|] over the moment variables y_γ and z_β appearing in the constraint

(4.23)    T^k y_α − L_z[ k t^{k−1} x^α + ⟨∇_x x^α, f⟩ ] = { x_0^α  if k = 0
                                                          { 0      otherwise.

Let K and S be compact semi-algebraic sets with K defined as in (3.6) and

(4.24)    S := {(t, x, u) ∈ R × R^n × R^m | h_j(t, x, u) ≥ 0, ∀ j = 1, ..., m'}

for some polynomials {h_j}_{j=1}^{m'} ⊂ R[t, x, u]. Depending on parity, let deg g_j = 2v_j or 2v_j − 1 and, similarly, deg h_j = 2w_j or 2w_j − 1. For every 2r ≥ r_0 := max[deg h, deg H, max_{i,j}[v_i, w_j]], the SDP-relaxation of the GPM (4.22) reads

(4.25)    Q_r:  inf_{y,z}  L_z(h) + L_y(H)
                s.t.  M_r(y), M_r(z) ⪰ 0
                      M_{r−v_j}(g_j y) ⪰ 0,   j = 1, ..., m
                      M_{r−w_j}(h_j z) ⪰ 0,   j = 1, ..., m'
                      (4.23),   (α, k) ∈ N^n × N,  d(α, k) ≤ 2r

(the localizing constraints of the g_j apply to y, associated with ν on K, and those of the h_j to z, associated with µ on S). Notice that from (4.21) one has ν(K) = 1 and µ(S) = T, so that y_0 = 1 and z_0 = T, and there is no need to impose bounds on y_0 and z_0.

Proposition 4.2. Let K ⊂ R^n and S ⊂ R × R^n × R^m be compact semi-algebraic sets defined as in (3.6) and (4.24) respectively. Let Assumption 3.1(ii) hold for K and for S as well (with the obvious appropriate statement for S). Let P be as in (4.22) with optimal value inf P, and let Q_r be the SDP-relaxation of P defined in (4.25). Then inf Q_r ↑ inf P as r → ∞.

Sketch of the proof. The proof mimics that of Theorem 3.2. For every sufficiently large r ≥ r_0, one has inf Q_r > −∞. Next, let (y^r, z^r) be a nearly optimal solution of Q_r with value ρ_r ≤ inf Q_r + 1/r (and of course inf Q_r ≤ inf P). For fixed α ∈ N^n and β ∈ N × N^n × N^m, both sequences {|y^r_α|} and {|z^r_β|} are bounded in r. Hence, proceeding as in the proof of Theorem 3.2, there is a subsequence {r_k} such that

(4.26)    y^{r_k} → y   and   z^{r_k} → z   pointwise,


for some y and z which are the moments of two measures ν and µ supported on K and S respectively. Then, with (α, k) ∈ N^n × N fixed but arbitrary, the feasibility of (y^r, z^r) in Q_r and the pointwise convergence (4.26) imply that (4.23) holds for (α, k). Therefore (ν, µ) is feasible for P in (4.22). Finally, from ρ_r ≤ inf P + 1/r and again (4.26), it follows that ∫ h dµ + ∫ H dν = lim_{k→∞} ρ_{r_k} ≤ inf P. This fact, combined with the feasibility of the pair (ν, µ) in P, proves that (ν, µ) is an optimal solution of P and so inf Q_r ↑ inf P. □

This approach has been tested on two examples of minimum-time OCPs, i.e., OCPs where T is not fixed and is in fact the quantity to minimize. Preliminary results with different values of the initial state x_0 ∈ X are very encouraging, as one often gets very close to the optimal value with relatively few moments. For more details see Lasserre et al. [22]. Of course, so far one only gets convergence to the optimal value of P and so, to the optimal value of the OCP (under appropriate conditions). Therefore, this semidefinite programming approach should be viewed as a complement to existing methods for solving the OCP; in particular, as it may provide good approximations of the optimal value, it could be used to evaluate the efficiency of other approaches that also compute feasible controls u ∈ U. Using the SDP-relaxations Q_r to obtain approximations of an optimal control u* ∈ U is a topic of further research.

A very similar approach has also been applied to option pricing in mathematical finance. The dynamics of the system now obey an Itô stochastic differential equation instead of an o.d.e., and the model is either a geometric Brownian motion, a Fleming process, or an Ornstein-Uhlenbeck process. One defines the expected occupation measures ν of x(T) at time T and µ of {x(t)} up to time T. By a standard martingale property, one obtains a linear constraint linking ν and µ, the analogue of (4.14) (with no control u) for diffusions. The only difference is that the infinitesimal generator A now also contains second-order partial derivatives. For more details see Lasserre et al. [24] and Lasserre and Prieto-Rumeau [23].

5. Conclusion

We have presented a semidefinite programming approach to the GPM (2.2) with polynomial data (and some variants). We hope to have convinced the reader that the examples detailed in Section 4 show the potential of the approach to approximate (and sometimes solve exactly) nontrivial problems in various fields. Concerning the application of this approach in global optimization, practice on a significant sample of problems has revealed fast and often finite convergence; see [10]. However, in the hierarchy of SDP-relaxations (3.7) that one has to solve, the size of the SDPs grows fast with the problem dimension n (typically Q_r involves O(n^{2r}) variables), and even if convergence is expected to be fast, the present status of available SDP solvers limits the applicability of the approach to small and medium size problems. A research direction under investigation, with promising preliminary results, is to take advantage of sparsity in the problem data, as is often present in large scale problems; see for instance the recent works of Waki et al. [36] and Lasserre [25].

Acknowledgments

This work was supported by French ANR grant NT05-3-41612, and part of it was completed in January 2006 while the author was a member of IMS, the Institute for Mathematical Sciences of NUS (the National University of Singapore).


The author wishes to acknowledge financial support from IMS during that visit and to express his gratitude to the IMS staff for material support.

Appendix

Proof of Theorem 3.2. That inf P = min P follows from Lemma 2.1. Next, we already know that inf Q_r ≤ inf P for all r ≥ r_0, and that inf P > −∞ whenever P has a feasible solution. We also need to prove that inf Q_r > −∞ for sufficiently large r. Let Q ⊂ R[x] be the quadratic module generated by the polynomials {g_j} ⊂ R[x] that define K, i.e.,

Q := { σ ∈ R[x] | σ = σ_0 + Σ_{j=1}^m σ_j g_j  with {σ_j}_{j=0}^m ⊂ Σ² }.

In addition, let Q(t) ⊂ Q be the set of elements σ of Q which have a representation σ_0 + Σ_{j=1}^m σ_j g_j for some s.o.s. family {σ_j} ⊂ Σ² with deg σ_0 ≤ 2t and deg σ_j g_j ≤ 2t for all j = 1, ..., m.

Let r ∈ N be fixed. As K is compact, there exists N such that N ± x^α > 0 on K, for all α ∈ N^n with |α| ≤ 2r. Therefore, under Assumption 3.1(iii), the polynomial x ↦ N ± x^α belongs to Q; see Putinar [30]. Moreover, there is some l(r) such that x ↦ N ± x^α ∈ Q(l(r)) for every |α| ≤ 2r. Of course we also have x ↦ N ± x^α ∈ Q(l) for every |α| ≤ 2r, whenever l ≥ l(r). Therefore, let us take l(r) ≥ r_0, with r_0 ≥ max_{j=0,...,m} v_j. For every feasible solution y of Q_{l(r)} one has

|y_α| = |L_y(x^α)| ≤ N y_0 ≤ N M,   |α| ≤ 2r.

This follows from y_0 ≤ M, M_{l(r)}(y) ⪰ 0 and M_{l(r)−v_j}(g_j y) ⪰ 0, which imply

N y_0 ± y_α = L_y(N ± x^α) = L_y(σ_0) + Σ_{j=1}^m L_y(σ_j g_j) ≥ 0.

Therefore, in particular, L_y(f_0) ≥ −N M Σ_α |f_{0α}|, which proves that inf Q_{l(r)} > −∞, and so inf Q_r > −∞ for sufficiently large r.

Next, let ρ := inf P = min P, where the latter equality follows from Lemma 2.1. From what precedes, and with k ∈ N arbitrary, let l(k) ≥ k be such that

(5.1)    N_k ± x^α ∈ Q(l(k)),   ∀ α ∈ N^n with |α| ≤ 2k,

for some N_k. Let r ≥ l(r_0), and let y^r be a nearly optimal solution of Q_r with value

(5.2)    inf Q_r ≤ L_{y^r}(f_0) ≤ inf Q_r + 1/r ≤ ρ + 1/r.

Fix k ∈ N. Notice that from (5.1) one has

|L_{y^r}(x^α)| ≤ N_k y_0 ≤ M N_k,   ∀ α ∈ N^n with |α| ≤ 2k,  ∀ r ≥ l(k).

Therefore,

(5.3)    |y^r_α| = |L_{y^r}(x^α)| ≤ N'_k,   ∀ α ∈ N^n with |α| ≤ 2k,  ∀ r ≥ r_0,

where N'_k = max[M N_k, V_k], with

V_k := max_{α,r} { |y^r_α| :  |α| ≤ 2k;  r_0 ≤ r ≤ l(k) }.

Complete each vector y^r with zeros to make it an infinite bounded sequence in l_∞, indexed in the canonical basis u_∞(x) of R[x]. In view of (5.3), one has

(5.4)    y^r_0 ≤ M   and   |y^r_α| ≤ N'_k,   ∀ α ∈ N^n with 2k − 1 ≤ |α| ≤ 2k,

and for all k = 1, 2, .... Hence let ŷ^r ∈ l_∞ be a new sequence defined by ŷ^r_0 := y^r_0/M and

ŷ^r_α := y^r_α / N'_k,   ∀ α ∈ N^n with 2k − 1 ≤ |α| ≤ 2k,  ∀ k = 1, 2, ...,

and in l_∞ consider the sequence {ŷ^r}_r as r → ∞. Obviously the sequence {ŷ^r}_r is in the unit ball B_1 of l_∞, and so, by the Banach-Alaoglu theorem (see e.g. Ash [3]), there exist ŷ ∈ B_1 and a subsequence {r_i} such that ŷ^{r_i} → ŷ as i → ∞ in the weak-star topology σ(l_∞, l_1) of l_∞. In particular, pointwise convergence holds, that is,

lim_{i→∞} ŷ^{r_i}_α = ŷ_α,   ∀ α ∈ N^n.

Next, define y_0 := M ŷ_0 and

y_α := ŷ_α × N'_k,   ∀ α ∈ N^n with 2k − 1 ≤ |α| ≤ 2k,  ∀ k = 1, 2, ....

The pointwise convergence ŷ^{r_i} → ŷ implies the pointwise convergence y^{r_i} → y, i.e.,

(5.5)    lim_{i→∞} y^{r_i}_α = y_α,   ∀ α ∈ N^n.

Next, let r ∈ N be fixed. From the pointwise convergence (5.5) we deduce that

lim_{i→∞} M_r(y^{r_i}) = M_r(y) ⪰ 0,

and similarly

lim_{i→∞} M_r(g_j y^{r_i}) = M_r(g_j y) ⪰ 0,   j = 1, ..., m.

As r was arbitrary, we obtain

(5.6)    M_r(y) ⪰ 0;   M_r(g_j y) ⪰ 0,  j = 1, ..., m;   r = 1, 2, ....

By Putinar's Positivstellensatz [30], (5.6) implies that y is the sequence of moments of some finite measure µ with support contained in K. Next, from the pointwise convergence (5.5) and the constraints of Q_r, one has

L_y(f_j) = lim_{i→∞} L_{y^{r_i}}(f_j) = b_j,   j ∈ I,

and

L_y(f_j) = lim_{i→∞} L_{y^{r_i}}(f_j) ≤ b_j,   j ∈ J.

That is, µ is a feasible solution of P. Finally, the pointwise convergence (5.5) implies L_{y^{r_i}}(f_0) → L_y(f_0) = ∫ f_0 dµ, and so, from (5.2), we deduce that inf Q_{r_i} → ρ = ∫ f_0 dµ, and in fact the desired result inf Q_r ↑ ρ, because the sequence {inf Q_r} is monotone nondecreasing. □
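For intuition, the positivity conditions (5.6) can be checked on concrete data. The sketch below (all names and the data are illustrative) builds, in the univariate case n = 1, the Hankel moment matrix M_r(y) and the localizing matrix M_r(g y) for g = 1 − x² from the moments of the uniform probability measure on K = [−1, 1], and tests positive semidefiniteness by exact symmetric elimination:

```python
from fractions import Fraction

def moment(j):
    # y_j = (1/2) * integral_{-1}^{1} x^j dx: 1/(j+1) if j even, else 0
    return Fraction(1, j + 1) if j % 2 == 0 else Fraction(0)

def moment_matrix(r, y):
    # univariate M_r(y) is Hankel: M_r(y)[i][j] = y_{i+j}
    return [[y[i + j] for j in range(r + 1)] for i in range(r + 1)]

def localizing_matrix(r, y):
    # M_r(g y) for g(x) = 1 - x^2: entries L_y(g x^{i+j}) = y_{i+j} - y_{i+j+2}
    return [[y[i + j] - y[i + j + 2] for j in range(r + 1)] for i in range(r + 1)]

def is_psd(a):
    # exact PSD test via symmetric Gaussian elimination: all pivots must be
    # nonnegative, and a zero pivot forces its whole remaining row to vanish
    a = [row[:] for row in a]
    n = len(a)
    for k in range(n):
        if a[k][k] < 0:
            return False
        if a[k][k] == 0:
            if any(a[k][j] != 0 for j in range(k, n)):
                return False
            continue
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
    return True

r = 3
y = [moment(j) for j in range(2 * r + 3)]
assert is_psd(moment_matrix(r, y))        # M_r(y) >= 0
assert is_psd(localizing_matrix(r, y))    # M_r(g y) >= 0 since supp(mu) = K

y_bad = y[:]
y_bad[2] = Fraction(2)                    # impossible 2nd moment on [-1, 1]
assert not is_psd(localizing_matrix(r, y_bad))
```

The failed test on y_bad illustrates how the localizing matrix detects sequences that cannot be moments of a measure supported on K.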


References

[1] N.I. Akhiezer. The Classical Moment Problem, Hafner, New York, 1965.
[2] G.A. Anastassiou. Moments in Probability Theory and Approximation Theory, Pitman Research Notes in Mathematics Series, Longman Scientific & Technical, 1993.
[3] R. Ash. Real Analysis and Probability, Academic Press, San Diego, 1972.
[4] P. Billingsley. Convergence of Probability Measures, Wiley, New York, 1999.
[5] C. Berg. The multidimensional moment problem and semi-groups, in: Moments in Mathematics, H.J. Landau (Ed.), Proc. Symp. Appl. Math. 37, American Mathematical Society, Providence, 1980, 110-124.
[6] P.H. Maserick, C. Berg. Exponentially bounded positive definite functions, Illinois J. Math. 28 (1984), 162-179.
[7] D. Bertsimas, I. Popescu. Optimal inequalities in probability theory: a convex optimization approach, SIAM J. Optim. 15 (2005), 780-804.
[8] R.E. Curto, L. Fialkow. Recursiveness, positivity, and truncated moment problems, Houston J. Math. 17 (1991), 603-635.
[9] R.E. Curto, L.A. Fialkow. The truncated complex K-moment problem, Trans. Amer. Math. Soc. 352 (2000), 2825-2855.
[10] D. Henrion, J.B. Lasserre. GloptiPoly: global optimization over polynomials with Matlab and SeDuMi, ACM Trans. Math. Soft. 29 (2003), 165-194.
[11] O. Hernández-Lerma, J.B. Lasserre. Markov Chains and Invariant Probabilities, Birkhäuser Verlag, Basel, 2003.
[12] O. Hernández-Lerma, J.B. Lasserre. Approximation schemes for infinite linear programs, SIAM J. Optim. 8 (1998), 973-988.
[13] K. Isii. The extrema of probability determined by generalized moments (I): bounded random variables, Ann. Inst. Stat. Math. 12 (1960), 119-133.
[14] K. Isii. On sharpness of Tchebycheff-type inequalities, Ann. Inst. Stat. Math. 14 (1963), 185-197.
[15] T. Jacobi, A. Prestel. Distinguished representations of strictly positive polynomials, J. Reine Angew. Math. 532 (2001), 223-235.
[16] D. Jibetean, E. de Klerk. Global optimization of rational functions: a semidefinite programming approach, Math. Prog. A 106 (2006), 93-109.
[17] J.H.B. Kemperman. Geometry of the moment problem, in: Moments in Mathematics, H.J. Landau (Ed.), Proc. Symp. Appl. Math. 37, American Mathematical Society, Providence, 1980, 16-53.
[18] H.J. Landau (Ed.). Moments in Mathematics, Proc. Symp. Appl. Math. 37, American Mathematical Society, Providence, 1980.
[19] J.B. Lasserre. Global optimization with polynomials and the problem of moments, SIAM J. Optim. 11 (2001), 796-817.
[20] J.B. Lasserre. An explicit equivalent positive semidefinite program for nonlinear 0-1 programs, SIAM J. Optim. 12 (2002), 756-769.
[21] J.B. Lasserre. Bounds on measures satisfying moment conditions, Ann. Appl. Prob. 12 (2002), 1114-1137.
[22] J.B. Lasserre, C. Prieur, D. Henrion. Nonlinear optimal control: numerical approximations via moments and LMI relaxations, Proc. 44th IEEE Conference on Decision and Control, Sevilla, Spain, December 2005, 1648-1653.
[23] J.B. Lasserre, T. Prieto-Rumeau. SDP vs. LP relaxations for the moment approach in some performance evaluation problems, Stoch. Models 20 (2004), 439-456.
[24] J.B. Lasserre, T. Prieto-Rumeau, M. Zervos. Pricing exotic options via moments and semidefinite relaxations, Math. Finance, to appear.
[25] J.B. Lasserre. Convergent semidefinite relaxations in polynomial optimization with sparsity, Technical report #05612, LAAS-CNRS, Toulouse, France, 2005. Submitted.
[26] M. Laurent. Revisiting two theorems of Curto and Fialkow on moment matrices, Proc. Amer. Math. Soc. 133 (2005), 2965-2976.
[27] A.E. Nussbaum. Quasi-analytic vectors, Ark. Mat. 6 (1966), 179-191.
[28] V. Powers, B. Reznick. Polynomials that are positive on an interval, Trans. Amer. Math. Soc. 352 (2000), 4677-4692.
[29] L.C. Petersen. On the relation between the multidimensional moment problem and the one-dimensional moment problem, Math. Scand. 51 (1982), 361-366.
[30] M. Putinar. Positive polynomials on compact semi-algebraic sets, Indiana Univ. Math. J. 42 (1993), 969-984.
[31] K. Schmüdgen. The K-moment problem for compact semi-algebraic sets, Math. Ann. 289 (1991), 203-206.
[32] M. Schweighofer. Optimization of polynomials on compact semialgebraic sets, SIAM J. Optim. 15 (2005), 805-825.
[33] J.A. Shohat, J.D. Tamarkin. The Problem of Moments, AMS, New York, 1943.
[34] B. Simon. The classical moment problem as a self-adjoint finite difference operator, Adv. Math. 137 (1998), 82-203.
[35] R. Vinter. Convex duality and nonlinear optimal control, SIAM J. Control Optim. 31 (1993), 518-538.
[36] H. Waki, S. Kim, M. Kojima, M. Muramatsu. Sums of squares and semidefinite programming relaxations for polynomial optimization problems with structured sparsity, Dept. of Mathematical and Computing Sciences, Tokyo Institute of Technology, Tokyo, 2004.

LAAS-CNRS and Institute of Mathematics, LAAS, 7 Avenue du Colonel Roche, 31077 Toulouse Cédex 4, France.
E-mail address: [email protected]
