LMI approximations for cones of positive semidefinite forms∗ Luis F. Zuluaga†
Javier Peña
Tepper School of Business Carnegie Mellon University Pittsburgh, PA 15213-3890, USA e-mail:
[email protected]
Tepper School of Business Carnegie Mellon University Pittsburgh, PA 15213-3890, USA e-mail:
[email protected]
Juan Vera‡ Department of Mathematical Sciences Carnegie Mellon University Pittsburgh, PA 15213, USA email:
[email protected]
October 28, 2003, Revised February 14, 2005
Abstract An interesting recent trend in optimization is the application of semidefinite programming techniques to new classes of optimization problems. In particular, this trend has been successful in showing that under suitable circumstances, polynomial optimization problems can be approximated via a sequence of semidefinite programs. Similar ideas apply to conic optimization over the cone of copositive matrices, and to certain optimization problems involving random variables with some known moment information. We bring together several of these approximation results by studying the approximability of cones of positive semidefinite forms (homogeneous polynomials). Our approach enables us to extend the existing methodology to new approximation schemes. In particular, we derive a novel approximation to the cone of copositive forms, that is, the cone of forms that are positive semidefinite over the non-negative orthant. The format of our construction can be extended to forms that are positive semidefinite over more general conic domains. We also construct polyhedral approximations to cones of positive semidefinite forms over a polyhedral domain. This opens the possibility of using linear programming technology in optimization problems over these cones.

∗ Supported by NSF grants CCF-0092655.
† Partially supported by NSF grant DMI-0098427.
‡ This paper was written while the second author was visiting the Universidad de los Andes in Bogotá, Colombia.
1 Introduction
An interesting recent trend in optimization is the use of semidefinite programming techniques for solving or approximating new classes of optimization problems. In particular, Lasserre [17] proposed a general solution approach for polynomial optimization problems via semidefinite programming. Independently, Parrilo [23, 24] developed semidefinite programming techniques to address semialgebraic problems in control theory. In addition to the work by Lasserre and Parrilo, the idea of approximating a set of positive semidefinite polynomials is also present in the work by Bertsimas and Popescu [1], de Klerk and Pasechnik [4], Laurent [19, 20], Popescu [26], and Kojima et al. [16]. A fundamental ingredient underlying most of these approaches, as well as earlier related work by Shor [34, 35] and Nesterov [22], is to recast the feasibility of a finite system of polynomial equations and inequalities in terms of an alternative polynomial identity involving squares of (unknown) polynomials. Computable relaxations (via semidefinite programming) of the feasibility problem can then be obtained by solving a degree-restricted version of the alternative polynomial identity. For instance, the Markov-Lukács Theorem states that a single-variable polynomial is non-negative if and only if it is a sum of squares of two single-variable polynomials. The latter can then be recast in terms of positive semidefinite matrices as discussed by Nesterov [22]. In general, for a system of finitely many polynomial equations and inequalities, the powerful Positivstellensatz from real algebraic geometry (see, e.g., [2, 27]) ensures the existence of such an alternative polynomial identity. When the problem possesses additional structure, more specialized versions of the Positivstellensatz can be applied. Such results include the theorems of Schmüdgen [33], Putinar [28], Pólya [9], Reznick [31], and Handelman [6].
These results provide a fundamental step in the development of solution techniques for various classes of polynomial optimization problems via semidefinite programming [1, 4, 16, 17, 18, 19, 20, 23, 24]. In this paper we study the approximability of the cone $P_{n,m}(D)$ of positive semidefinite forms (homogeneous polynomials) of degree $m$ over a semialgebraic conic domain $D \subseteq \mathbb{R}^n$. This approach allows us to bring together a number of previously known approximation results for polynomial optimization problems. By considering the cone of positive semidefinite forms $P_{n,m}(D)$, we can systematically apply homogenized versions of representation theorems from algebraic geometry to show that a given cone of positive semidefinite forms can be approximated by a sequence of cones, where each cone in the sequence has a description in terms of linear matrix inequalities (LMI). This generic approximation format is an extension of the ones presented by De Klerk and Pasechnik [4], and by Lasserre [17]. De Klerk and Pasechnik show that Parrilo's hierarchy of sufficient criteria for copositivity can be seen as a sequence of cones that converge to the copositive cone, where each cone in the sequence has an LMI description. Lasserre's approximation approach for polynomial optimization problems can also be phrased, after a suitable homogenization, in a similar fashion.
In addition to gathering several previously known approximation results for polynomial optimization problems, our approach to cones of positive semidefinite forms enables us to develop some new approximation results. In particular, we give a generalization of the (sufficient) criterion for copositivity proposed by Parrilo in [23] (Section 4). In the approximation format above, this corresponds to a sequence of cones converging to $P_{n,m}(\mathbb{R}^n_+)$, each of which has an LMI description. The two key ideas of our construction are to approximate $P_{n,m}(\mathbb{R}^n_+)$ with simply described sets $E_{n,m}(\mathbb{R}^n_+)$ and to embed $P_{n,m}(\mathbb{R}^n_+)$ in a higher dimensional cone $P_{n,m+r}(\mathbb{R}^n_+)$. We initially introduce $E_{n,m}(\mathbb{R}^n_+)$ as the set of $m$-degree forms $\theta$ such that $\theta(x_1^2, \ldots, x_n^2)$ is a sum of squares; subsequently, we show (Proposition 4) that $E_{n,m}(\mathbb{R}^n_+)$ has an alternative simpler description. The latter yields an interesting new description of the successive LMI approximations to the cone of copositive matrices proposed by Parrilo [23]. In addition, it allows us to extend our ideas further: first to $P_{n,m}(D)$ for a pointed polyhedral domain $D$ (Section 5), and then to $P_{n,m}(D)$ for a pointed semialgebraic conic domain $D$ (Section 6). In Sections 4 and 5, the fundamental representation theorem that ensures the convergence of the constructed approximation is Pólya's Theorem. In Section 6, the representation theorem ensuring the convergence of the approximation sequence is Schmüdgen's Theorem. Section 3, which serves as a preamble to the main three sections, discusses the conceptually simpler case of approximating the cone $P_{n,m}(\mathbb{R}^n)$ of positive semidefinite forms over $\mathbb{R}^n$. In this case the representation theorem underlying the construction is due to Reznick [31]. When the domain $D$ is polyhedral, in addition to semidefinite approximations we provide polyhedral approximations, that is, approximations via linear inequalities only, for $P_{n,m}(D)$.
This construction is an extension of the polyhedral approximations for the copositive cone proposed by De Klerk and Pasechnik [4]. Although polyhedral approximations for $P_{n,m}(D)$ are in general weaker (inclusion-wise) than semidefinite approximations, they make it possible to use highly developed linear programming technology. Given the limitations of current semidefinite programming solvers in handling large-scale problems, the availability of polyhedral approximations can potentially yield enhancements in the solution techniques for problems involving cones of positive semidefinite forms.

The rest of the paper is organized as follows. In Section 2 we introduce some key definitions and present Theorem 1, which formally defines the format of the approximation results discussed in the sequel. In Sections 3 and 4 we construct inner approximations for the cones $P_{n,m}(\mathbb{R}^n)$ and $P_{n,m}(\mathbb{R}^n_+)$ respectively. In the latter case, which generalizes the cone of copositive matrices, along with a sequence of semidefinite approximations, we present a sequence of polyhedral approximations. In Section 5 we generalize the construction and key results from Section 4 to the cone $P_{n,m}(D)$ when $D$ is a pointed polyhedral cone. Section 6 discusses similar results for the more general cone of positive semidefinite forms over pointed semialgebraic cones.
2 Preliminaries

2.1 Monomials, polynomials and forms
We begin by recalling some standard multinomial notation and terminology. Given $\alpha := (\alpha_1, \ldots, \alpha_n) \in \mathbb{N}^n$ and a vector of variables $x := (x_1, \ldots, x_n)$, the expression $x^\alpha$ denotes the monomial
$$x_1^{\alpha_1} \cdots x_n^{\alpha_n}.$$
We also write $|\alpha|$ for $\alpha_1 + \cdots + \alpha_n$. Let $H_{n,m}$ denote the set of forms (homogeneous polynomials) of degree $m$ in $n$ variables with real coefficients. A form $\theta(x)$ in $H_{n,m}$ can be written as
$$\theta(x) = \sum_{|\alpha| = m} \theta_\alpha x^\alpha.$$
We shall identify the form $\theta(x)$ with the vector of its coefficients $\theta := (\theta_\alpha)_{|\alpha| = m}$. Formally speaking, $\theta$ denotes the vector $(\theta_\alpha)_{|\alpha| = m}$, and for a given $x \in \mathbb{R}^n$, $\theta(x)$ denotes the value $\sum_{|\alpha| = m} \theta_\alpha x^\alpha$, i.e., the value of the form $\theta$ evaluated at $x$. Via this identification, the set $H_{n,m}$ can in turn be identified with the Euclidean space $\mathbb{R}^{n_m}$, where
$$n_m := |\{\alpha \in \mathbb{N}^n : |\alpha| = m\}| = \binom{n+m-1}{n-1}.$$
We will make extensive use of this identification. In particular we endow $H_{n,m}$ with the dot inner product. In other words, define the inner product of $\theta, \phi \in H_{n,m}$ as
$$\langle \theta, \phi \rangle := \sum_{|\alpha| = m} \theta_\alpha \phi_\alpha.$$
We shall also frequently use the vector-valued function $\sigma^m : \mathbb{R}^n \to \mathbb{R}^{n_m}$ defined by $x \mapsto (x^\alpha)_{|\alpha| = m}$. Notice that by construction, the following identity holds for all $\theta \in H_{n,m}$ and $x \in \mathbb{R}^n$:
$$\theta(x) = \langle \theta, \sigma^m(x) \rangle.$$
For the special case $m = 2$, the space of 2-forms $H_{n,2} = \mathbb{R}^{n_2}$ can also be identified with the space $S^n$ of $n \times n$ symmetric matrices. The identification is via the one-to-one correspondence between symmetric matrices and quadratic forms
$$Q \in S^n \mapsto q \in H_{n,2}, \quad \text{where } q(x) := x^T Q x.$$
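The identification of a form with its coefficient vector can be made concrete with a few lines of code. The following is a minimal sketch, not part of the paper's development: the helper names `exponents`, `sigma`, and `evaluate` are hypothetical, and the ordering of the exponents is an arbitrary choice made only for illustration. It checks the count $n_m = \binom{n+m-1}{n-1}$ and evaluates a form through the identity $\theta(x) = \langle \theta, \sigma^m(x) \rangle$.

```python
from itertools import product
from math import comb, prod

def exponents(n, m):
    """All alpha in N^n with |alpha| = m, in a fixed (illustrative) order."""
    return [a for a in product(range(m + 1), repeat=n) if sum(a) == m]

def sigma(x, m):
    """The monomial map sigma^m(x) = (x^alpha)_{|alpha| = m}."""
    return [prod(xi ** ai for xi, ai in zip(x, a)) for a in exponents(len(x), m)]

def evaluate(theta, x, m):
    """theta(x) computed as the inner product <theta, sigma^m(x)>."""
    return sum(t * s for t, s in zip(theta, sigma(x, m)))

n, m = 3, 2
alphas = exponents(n, m)
count_matches = len(alphas) == comb(n + m - 1, n - 1)

# Coefficient vector of q(x) = (x1 + x2 + x3)^2:
# coefficient 1 on each square, coefficient 2 on each mixed monomial.
theta = [1 if max(a) == 2 else 2 for a in alphas]
value = evaluate(theta, (1, 2, 3), m)
```

Here `value` should agree with the direct computation $(1 + 2 + 3)^2$.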
2.2 Positive semidefinite forms
Definition 1 Given a cone $D \subseteq \mathbb{R}^n$, let $P_{n,m}(D)$ be the cone of $m$-degree forms that are positive semidefinite in $D$ (psd in $D$), i.e.,
$$P_{n,m}(D) := \{\theta \in H_{n,m} : \theta(x) \ge 0 \text{ for all } x \in D\}.$$
When $m = 2$, $D = \mathbb{R}^n$, and identifying $H_{n,2}$ with $S^n$, the cone $P_{n,2}(\mathbb{R}^n)$ corresponds precisely to the cone of positive semidefinite matrices, usually denoted $S^n_+$. We shall write $A \succeq 0$ for $A \in S^n_+$ following the usual notation in the semidefinite programming literature (see, e.g., [37]). If $\theta \in P_{n,m}(D)$ satisfies $\theta(x) > 0$ for all $x \in D$, $x \ne 0$, then $\theta$ is said to be positive definite in $D$ (pd in $D$). Throughout our presentation we will frequently rely on the following straightforward characterization of the interior of $P_{n,m}(D)$.

Observation 1 Assume $D \subseteq \mathbb{R}^n$ is a closed cone. Then $\theta$ is pd in $D$ if and only if $\theta \in \mathrm{int}(P_{n,m}(D))$.

Notice that the class of cones of positive semidefinite polynomials can be seen as a subclass of cones of positive semidefinite forms via homogenization. Consequently, our presentation focuses on the latter class of cones. An advantage of working with cones of forms is the algebraic characterization of the interior of $P_{n,m}(D)$ stated in Observation 1. The analogous statement in general fails for cones of polynomials due to the possibility of "zeros at infinity" (see [30]). For example, the polynomial $g(x_1, x_2) = x_1^2 + (1 - x_1 x_2)^2$ is positive definite in $\mathbb{R}^2$, but for all $\epsilon > 0$, $g(x_1, x_2) - \epsilon$ is not psd in $\mathbb{R}^2$ because $\lim_{x_1 \to 0} g(x_1, 1/x_1) = 0$ (along this curve $x_2 = 1/x_1$ is unbounded, so the infimum is approached only at infinity).
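The "zeros at infinity" phenomenon can be observed numerically. The sketch below is only an illustration: it evaluates $g(x_1, x_2) = x_1^2 + (1 - x_1 x_2)^2$ exactly, using rational arithmetic, along the curve $x_2 = 1/x_1$ with $x_1 = 10^{-k}$. The choice of sample points is an assumption made for the example. On this curve $g$ reduces to $x_1^2$, so the values stay positive while shrinking toward zero as $x_2$ grows without bound.

```python
from fractions import Fraction

def g(x1, x2):
    """The polynomial g(x1, x2) = x1^2 + (1 - x1*x2)^2 from the text."""
    return x1 ** 2 + (1 - x1 * x2) ** 2

# Sample points (x1, x2) = (10^-k, 10^k) on the curve x1 * x2 = 1.
values = [g(Fraction(1, 10 ** k), Fraction(10 ** k)) for k in range(1, 5)]
all_positive = all(v > 0 for v in values)
strictly_decreasing = all(a > b for a, b in zip(values, values[1:]))
```

No positive constant can be subtracted from $g$ while keeping it non-negative, even though $g$ never vanishes.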
2.3 LMI-approximations of $P_{n,m}(D)$
Theorem 1 below summarizes the main results discussed in this paper: for several important classes of conic domains $D$, the cone $P_{n,m}(D)$ can be innerly approximated by a sequence of cones definable in terms of linear matrix inequalities (LMI). Theorem 1 can be seen as a compilation and extension of previous approximation results for polynomial optimization problems [1, 4, 16, 17, 18, 19, 20, 23, 24].

Theorem 1 Suppose $D = \mathbb{R}^n$, $D = \mathbb{R}^n_+$ or $D$ is a pointed semialgebraic cone. Then we can construct a sequence of cones $K^r$, $r = 0, 1, \ldots$ such that

(i) $K^r \subseteq K^{r+1} \subseteq P_{n,m}(D)$, $r = 0, 1, \ldots$

(ii) $\mathrm{int}(P_{n,m}(D)) \subseteq \bigcup_{r=0}^{\infty} K^r$

(iii) Each $K^r$ has an LMI description. In other words,
$$K^r = \{\theta \in H_{n,m} : \exists\, \Phi \succeq 0 \text{ s.t. } L\theta = T\Phi\} \tag{1}$$
for some suitable linear mappings $L$, $T$.

(iv) If $D$ is a polyhedral cone, then we can also construct a polyhedral sequence of cones approximating $P_{n,m}(D)$ as above (i.e., with $\Phi \ge 0$ in (1)).

Proof. See Propositions 1, 7, and 10 in Sections 3, 5, and 6 respectively. □
Throughout the sequel, we shall write $K^r \uparrow P_{n,m}(D)$ as a shorthand for conditions (i) and (ii) in Theorem 1. Also, whenever we say that $\{K^r, r = 0, 1, \ldots\}$ is a sequence of inner approximations for $P_{n,m}(D)$ it will be implicitly assumed that $\{K^r, r = 0, 1, \ldots\}$ satisfies conditions (i), (ii), and (iii) in Theorem 1.

Theorem 1 in combination with Theorem 2 below readily yields SDP-based numerical schemes for computing arbitrarily close approximate solutions to primal-dual pairs of conic programs of the form
$$(P) \qquad z_P = \inf\ \langle c, \theta \rangle \quad \text{s.t.} \quad A\theta = b,\ \ \theta \in P_{n,m}(D),$$
$$(D) \qquad z_D = \sup\ \langle b, v \rangle \quad \text{s.t.} \quad c - A^* v \in P_{n,m}(D)^*,$$
for the classes of conic domains $D$ in Theorem 1. Here $A^*$ is the adjoint of $A$ and $P_{n,m}(D)^*$ is the dual cone of $P_{n,m}(D)$. This general format underlies several of the ideas and results in [17, 18, 23]. Consider the primal-dual pair of conic programs obtained when $P_{n,m}(D)$ is replaced by $K^r$ in (P) and (D):
$$(P_r) \qquad z_{P_r} = \inf\ \langle c, \theta \rangle \quad \text{s.t.} \quad A\theta = b,\ \ \theta \in K^r,$$
$$(D_r) \qquad z_{D_r} = \sup\ \langle b, v \rangle \quad \text{s.t.} \quad c - A^* v \in (K^r)^*.$$
Theorem 2 below formalizes the intuitively natural fact that (P) and (D) are approximated when the cone $P_{n,m}(D)$ is suitably approximated by a sequence of cones. This result can be seen as a strengthening of the classical strong conic duality theorem in convex analysis (cf. [29, 32]).

Theorem 2 Assume (D) is feasible, $A$ is surjective, and (P) is strictly feasible (i.e., there exists $\theta \in \mathrm{int}(P_{n,m}(D))$ such that $A\theta = b$). Let $K^r \uparrow P_{n,m}(D)$ and (P), $(P_r)$, (D), $(D_r)$ be as above. Then

(i) $(D_r)$ is feasible and $z_{D_r} \ge z_{D_{r+1}} \ge z_D$ for $r = 0, 1, \ldots$.

(ii) For $r$ sufficiently large, $z_{P_r} = z_{D_r}$ and $(D_r)$ has an optimal solution $v^r$.

(iii) $\lim_{r \to \infty} z_{D_r} = z_D = z_P$, the set $\{v^r : r = 0, 1, \ldots\}$ is bounded, and every limit point of $\{v^r : r = 0, 1, \ldots\}$ is an optimal solution of (D).

Proof. This is a direct consequence of conic duality. It is a dual version of [39, Thm. 2] and can be proven by a similar argument. □
3 Positive semidefinite forms in $\mathbb{R}^n$

In this section we concentrate on the cone $P_{n,2m}(\mathbb{R}^n)$, which we shall abbreviate as $P_{n,2m}$.

3.1 First approximation: sums of squares
Let $\Sigma_{n,2m}$ denote the cone of forms in $H_{n,2m}$ that are sums of squares (sos), that is,
$$\Sigma_{n,2m} := \mathrm{conv}\{\phi(x)^2 : \phi \in H_{n,m}\}.$$
(Here $\mathrm{conv}(S)$ denotes the convex hull of the set $S$.) Notice that $\Sigma_{n,2m} \subseteq P_{n,2m}(\mathbb{R}^n)$ for all $m, n$. This inclusion is proper except for some special cases. This is a classical result due to Hilbert [10].

Theorem 3 (Hilbert) $\Sigma_{n,2m} = P_{n,2m}(\mathbb{R}^n)$ if and only if $n \le 2$, or $m \le 1$, or $(n, m) = (3, 2)$.

The inclusion $\Sigma_{n,2m} \subseteq P_{n,2m}$ gives an inner approximation of $P_{n,2m}$ and hence a sufficient condition for positive semidefiniteness: a form is psd if it is sos. Notice that for $\phi \in H_{n,m}$ we have
$$\phi(x)^2 = \langle \phi, \sigma^m(x) \rangle^2 = \sigma^m(x)^T (\phi\phi^T) \sigma^m(x).$$
This yields the following observation.

Observation 2 Let $\theta \in H_{n,2m}$. Then $\theta \in \Sigma_{n,2m}$ if and only if there exists $\Phi \in S^{n_m}$, $\Phi \succeq 0$ such that $\theta(x) = \sigma^m(x)^T \Phi\, \sigma^m(x)$.

Notice that the identity $\theta(x) = \sigma^m(x)^T \Phi\, \sigma^m(x)$ corresponds to a linear system of equations in $\Phi$ and $\theta$, and therefore optimizing a linear function subject to linear constraints over $\Sigma_{n,2m}$ can be cast as an SDP problem. The study of the relationship between psd forms and sos has a long history. The search for such connections is closely tied to Hilbert's 17th problem [11] and to advances in real algebra over the last century [27, 31]. Our work relies on some of these developments. For a detailed account of the rich history of this subject, we refer the reader to the excellent references [27, 30, 31].
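The linear relation between a Gram matrix $\Phi$ and the coefficients of $\sigma^m(x)^T \Phi\, \sigma^m(x)$ in Observation 2 can be written out explicitly: the coefficient of $x^\gamma$ is the sum of the entries $\Phi_{\alpha\beta}$ with $\alpha + \beta = \gamma$. The following is a minimal sketch under that reading. The helper names `exponents` and `gram_to_form` and the dictionary representation of a form are illustrative assumptions. It recovers $(x_1 - x_2)^2$ from the rank-one Gram matrix $\phi\phi^T$.

```python
from itertools import product

def exponents(n, m):
    """All alpha in N^n with |alpha| = m, in a fixed (illustrative) order."""
    return [a for a in product(range(m + 1), repeat=n) if sum(a) == m]

def gram_to_form(Phi, n, m):
    """Coefficients of sigma^m(x)^T Phi sigma^m(x), keyed by exponent gamma."""
    basis = exponents(n, m)
    coeffs = {}
    for i, a in enumerate(basis):
        for j, b in enumerate(basis):
            gamma = tuple(ai + bi for ai, bi in zip(a, b))
            coeffs[gamma] = coeffs.get(gamma, 0) + Phi[i][j]
    return coeffs

# phi(x) = x1 - x2, written in the basis order exponents(2, 1) = [(0, 1), (1, 0)],
# i.e. first the coefficient of x2, then the coefficient of x1.
n, m = 2, 1
phi = [-1, 1]
Phi = [[p * q for q in phi] for p in phi]   # rank-one psd matrix phi phi^T
theta = gram_to_form(Phi, n, m)
```

In an SDP formulation, the same coefficient matching would appear as linear equality constraints linking $\theta$ and the matrix variable $\Phi \succeq 0$.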
3.2 Inner approximations for $P_{n,2m}$
Using $\Sigma_{n,2m}$ as a starting point, we next construct a sequence of inner approximations for $P_{n,2m}$. The construction is based on the following key representation theorem for positive definite forms due to Reznick [31]:

Theorem 4 (Reznick) Let $\theta \in H_{n,2m}$. If $\theta$ is pd in $\mathbb{R}^n$ then there exists $r \in \mathbb{N}$ such that
$$\Big(\sum_{j=1}^{n} x_j^2\Big)^{r} \theta(x) \in \Sigma_{n,2(m+r)}.$$

Proof. See [31, Thm. 3.12]. □
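Theorem 4 naturally suggests the following sequence of inner approximations for $P_{n,2m}$: for $r = 0, 1, \ldots$ let
$$K^r_{n,2m}(\mathbb{R}^n) := \Big\{\theta \in H_{n,2m} : \Big(\sum_{j=1}^{n} x_j^2\Big)^{r} \theta(x) \in \Sigma_{n,2(m+r)}\Big\}$$
$$= \Big\{\theta \in H_{n,2m} : \exists\, \Phi \in S^{n_{m+r}},\ \Phi \succeq 0 \text{ s.t. } \Big(\sum_{j=1}^{n} x_j^2\Big)^{r} \theta(x) = \sigma^{m+r}(x)^T \Phi\, \sigma^{m+r}(x)\Big\}.$$
The last identity holds by Observation 2 and automatically gives an LMI description of $K^r_{n,2m}(\mathbb{R}^n)$:
$$K^r_{n,2m}(\mathbb{R}^n) = \{\theta \in H_{n,2m} : \exists\, \Phi \in S^{n_{m+r}},\ \Phi \succeq 0 \text{ s.t. } L\theta = T\Phi\}, \tag{2}$$
where $L : H_{n,2m} \to H_{n,2(m+r)}$ and $T : S^{n_{m+r}} \to H_{n,2(m+r)}$ are the linear maps defined by $(L\theta)(x) := \big(\sum_{j=1}^{n} x_j^2\big)^{r} \theta(x)$ and $(T\Phi)(x) := \sigma^{m+r}(x)^T \Phi\, \sigma^{m+r}(x)$.

Proposition 1 $K^r_{n,2m}(\mathbb{R}^n) \uparrow P_{n,2m}$.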
Proof. Let $\theta \in H_{n,2m}$. If $\theta(x) \in \Sigma_{n,2m}$ then $\big(\sum_{j=1}^{n} x_j^2\big)\theta(x) \in \Sigma_{n,2m+2}$. Also, if $\big(\sum_{j=1}^{n} x_j^2\big)\theta(x) \in \Sigma_{n,2m+2}$ then $\theta(x) \in P_{n,2m}$. These two facts imply $K^r_{n,2m}(\mathbb{R}^n) \subseteq K^{r+1}_{n,2m}(\mathbb{R}^n) \subseteq P_{n,2m}$ for all $r$. Finally, from Observation 1 and Theorem 4, $\mathrm{int}(P_{n,2m}) \subseteq \bigcup_{r=0}^{\infty} K^r_{n,2m}(\mathbb{R}^n)$. □

Remark 1 In general the inclusion $\bigcup_{r=0}^{\infty} K^r_{n,2m}(\mathbb{R}^n) \subseteq P_{n,2m}$ in Proposition 1 is strict. For example, it is known (see, e.g., [31]) that the form
$$\theta(x_1, x_2, x_3, x_4) = x_1^2 (x_2^2 x_3^4 + x_3^2 x_4^4 + x_4^2 x_2^4 - 3 x_2^2 x_3^2 x_4^2) + x_4^8$$
satisfies $\theta \in P_{4,8}$ but for all $r$ we have $\big(\sum_{j=1}^{4} x_j^2\big)^r \theta(x) \notin \Sigma_{4,8+2r}$. Thus $\theta \notin \bigcup_{r=0}^{\infty} K^r_{4,8}(\mathbb{R}^4)$.

As shown in Example 1 below, Proposition 1 yields an approximation scheme for unconstrained polynomial optimization. This approximation scheme is similar to the one proposed by Lasserre in [17]. However, notice that Proposition 1 and Theorem 2 guarantee convergence regardless of the availability of any a priori bound on the size of the minimizer of the polynomial.

Example 1 (Unconstrained polynomial optimization) Let $g(x)$ be a given $(2m)$-degree polynomial in $n$ variables (not necessarily homogeneous) and consider the problem of finding $g^* := \min\{g(x) : x \in \mathbb{R}^n\}$.
Notice that this problem is equivalent to
$$\max\ l \quad \text{s.t.} \quad g(x) \ge l \ \ \forall x \in \mathbb{R}^n. \tag{3}$$
Without loss of generality assume the constant term of $g(x)$ is zero, i.e., $g_{\vec{0}} = 0$. Thus, by homogenizing, we can recast (3) as
$$\max\ -\theta_{(\vec{0},2m)} \quad \text{s.t.} \quad \theta_{(\alpha,\alpha_{n+1})} = g_\alpha \ \ \forall\, |(\alpha, \alpha_{n+1})| = 2m,\ \alpha \ne \vec{0}, \qquad \theta \in P_{n+1,2m}(\mathbb{R}^n \times \mathbb{R}_+) = P_{n+1,2m}(\mathbb{R}^{n+1}).$$
For each nonnegative integer $r$, it is natural to consider the approximation
$$\max\ -\theta_{(\vec{0},2m)} \quad \text{s.t.} \quad \theta_{(\alpha,\alpha_{n+1})} = g_\alpha \ \ \forall\, |(\alpha, \alpha_{n+1})| = 2m,\ \alpha \ne \vec{0}, \qquad \theta \in K^r_{n+1,2m}(\mathbb{R}^{n+1}).$$
However, if the homogenization of $g(x) - g^*$ fails to be in the interior of $P_{n+1,2m}(\mathbb{R}^{n+1})$, then Theorem 2 cannot be applied. This can be fixed by slightly perturbing the problem: let $h(x) := x_1^{2m} + x_2^{2m} + \cdots + x_n^{2m}$ and consider
$$g^r := \max\ -\theta_{(\vec{0},2m)} \quad \text{s.t.} \quad \theta_{(\alpha,\alpha_{n+1})} = g_\alpha + \tfrac{1}{r+1} h_\alpha \ \ \forall\, |(\alpha, \alpha_{n+1})| = 2m,\ \alpha \ne \vec{0}, \qquad \theta \in K^r_{n+1,2m}(\mathbb{R}^{n+1}).$$
By construction, this is a semidefinite program. Also by Proposition 1 and Theorem 2, $g^r \uparrow g^*$.

We note that the perturbation $\tfrac{1}{r+1} h(x)$ in Example 1 resembles one of the ideas in the construction of Hanzon and Jibetean [8], and Jibetean [13].
4 Copositive forms

In this section we concentrate on the cone $P_{n,m}(\mathbb{R}^n_+)$, which we call the cone of $m$-degree copositive forms in $n$ variables. We describe two families of inner approximations of $P_{n,m}(\mathbb{R}^n_+)$. The first one is analogous to that of Section 3. The second one is a sequence of polyhedral cones.

4.1 Inner approximations for $P_{n,m}(\mathbb{R}^n_+)$
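Let $\mathcal{S} : H_{n,m} \to H_{n,2m}$ be the mapping $\theta(x) \mapsto \theta(x^2) := \theta(x_1^2, \ldots, x_n^2)$. In other words,
$$[\mathcal{S}\theta]_\alpha = \begin{cases} \theta_{\frac{1}{2}\alpha} & \text{if every } \alpha_i \text{ is even,} \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$
Since $\theta(x) \ge 0$ for all $x \in \mathbb{R}^n_+$ if and only if $\theta(x^2) \ge 0$ for all $x \in \mathbb{R}^n$, it follows that
$$\theta \in P_{n,m}(\mathbb{R}^n_+) \iff \mathcal{S}\theta \in P_{n,2m}. \tag{5}$$
Inspired by (5) we define (for $r = 0, 1, \ldots$)
$$K^r_{n,m}(\mathbb{R}^n_+) := \{\theta \in H_{n,m} : \mathcal{S}\theta \in K^r_{n,2m}(\mathbb{R}^n)\}.$$
From Proposition 1 and (5) it follows that the sequence of cones $K^r_{n,m}(\mathbb{R}^n_+)$ is a sequence of inner approximations of $P_{n,m}(\mathbb{R}^n_+)$.

Proposition 2 $K^r_{n,m}(\mathbb{R}^n_+) \uparrow P_{n,m}(\mathbb{R}^n_+)$.

The LMI description (2) for $K^r_{n,2m}(\mathbb{R}^n)$ yields an LMI description for $K^r_{n,m}(\mathbb{R}^n_+)$. However, a more concise description can be obtained via the cone of elementary copositive forms $E_{n,m}(\mathbb{R}^n_+) \subseteq P_{n,m}(\mathbb{R}^n_+)$, defined as follows:
$$E_{n,m}(\mathbb{R}^n_+) := \{\theta \in H_{n,m} : \mathcal{S}\theta \in \Sigma_{n,2m}\}. \tag{6}$$
This gives an alternative definition of $K^r_{n,m}(\mathbb{R}^n_+)$; namely
$$K^r_{n,m}(\mathbb{R}^n_+) = \Big\{\theta \in H_{n,m} : \Big(\sum_{j=1}^{n} x_j\Big)^{r} \theta(x) \in E_{n,m+r}(\mathbb{R}^n_+)\Big\}.$$
In Section 4.3 we give an alternative and more concise description of $E_{n,m}(\mathbb{R}^n_+)$ without relying on the operator $\mathcal{S}$.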
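The squaring operator from (4) has a very simple effect on coefficients: every exponent vector is doubled. The following sketch illustrates this. The dictionary representation of a form and the helper names `apply_S` and `evaluate` are assumptions for the example. It checks the defining identity $(\mathcal{S}\theta)(x) = \theta(x_1^2, \ldots, x_n^2)$ at a sample point.

```python
def apply_S(theta):
    """Coefficients of theta(x_1^2, ..., x_n^2): exponent alpha becomes 2*alpha."""
    return {tuple(2 * a for a in alpha): c for alpha, c in theta.items()}

def evaluate(form, x):
    """Value at x of a form stored as {exponent tuple: coefficient}."""
    total = 0
    for alpha, c in form.items():
        term = c
        for xi, ai in zip(x, alpha):
            term *= xi ** ai
        total += term
    return total

# theta(x) = x1^2 - 3*x1*x2 + x2^2
theta = {(2, 0): 1, (1, 1): -3, (0, 2): 1}
S_theta = apply_S(theta)

x = (2, -1)
lhs = evaluate(S_theta, x)                           # (S theta)(x)
rhs = evaluate(theta, tuple(xi ** 2 for xi in x))    # theta(x1^2, x2^2)
```

Exponents with an odd entry simply do not appear in `S_theta`, which corresponds to the zero case in (4).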
4.2 Polyhedral approximations
Now we construct a sequence of polyhedral approximations for $P_{n,m}(\mathbb{R}^n_+)$. This construction is based on the representation theorem due to Pólya [9]:

Theorem 5 (Pólya) Let $\theta \in H_{n,m}$. If $\theta$ is pd in $\mathbb{R}^n_+$ then there exists $r \in \mathbb{N}$ such that
$$\Big(\sum_{j=1}^{n} x_j\Big)^{r} \theta(x) \text{ has non-negative coefficients.}$$

Proof. See [9, Thm. 56]. □

It is now natural to define
$$C^r_{n,m}(\mathbb{R}^n_+) := \Big\{\theta \in H_{n,m} : \Big(\sum_{j=1}^{n} x_j\Big)^{r} \theta(x) \text{ has non-negative coefficients}\Big\}.$$
Notice that $C^r_{n,m}(\mathbb{R}^n_+)$ can also be viewed as a modification of the construction of $K^r_{n,m}(\mathbb{R}^n_+)$. Clearly $C^r_{n,m}(\mathbb{R}^n_+) \subseteq K^r_{n,m}(\mathbb{R}^n_+)$, as any form of degree $d$ with non-negative coefficients is in $E_{n,d}(\mathbb{R}^n_+)$. By construction, $C^r_{n,m}(\mathbb{R}^n_+)$ has an LMI description. Indeed, $C^r_{n,m}(\mathbb{R}^n_+)$ is a polyhedral cone.

The sequence $C^r_{n,m}(\mathbb{R}^n_+)$, $r = 0, 1, \ldots$ also yields a sequence of inner approximations for $P_{n,m}(\mathbb{R}^n_+)$.

Proposition 3 $C^r_{n,m}(\mathbb{R}^n_+) \uparrow P_{n,m}(\mathbb{R}^n_+)$.
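Proof. This follows from Proposition 2 and Theorem 5. □

Again, as in Proposition 1, the inclusion $\bigcup_{r=0}^{\infty} C^r_{n,m}(\mathbb{R}^n_+) \subseteq P_{n,m}(\mathbb{R}^n_+)$ is strict in general. For example, the form
$$\theta(x_1, x_2, x_3, x_4) = x_1 (x_2 x_3^2 + x_3 x_4^2 + x_4 x_2^2 - 3 x_2 x_3 x_4) + x_4^4$$
satisfies $\theta \in P_{4,4}(\mathbb{R}^4_+)$ but $\theta \notin \bigcup_{r=0}^{\infty} C^4_{4,4}(\mathbb{R}^4_+)$ with $C^r_{4,4}$ in place of $C^4_{4,4}$ for each $r$, i.e., $\theta \notin C^r_{4,4}(\mathbb{R}^4_+)$ for every $r$.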
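Membership in the polyhedral cones defined via Pólya's Theorem only requires polynomial multiplication and a sign check on coefficients. The following is a minimal sketch of that test. The dictionary representation and the helper names `multiply` and `polya_level`, as well as the cap `max_r`, are assumptions made for illustration. It searches for the smallest $r$ such that $(x_1 + \cdots + x_n)^r \theta(x)$ has non-negative coefficients.

```python
def multiply(p, q):
    """Product of two polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for a, ca in p.items():
        for b, cb in q.items():
            gamma = tuple(x + y for x, y in zip(a, b))
            out[gamma] = out.get(gamma, 0) + ca * cb
    return out

def polya_level(theta, n, max_r):
    """Smallest r <= max_r such that (x_1 + ... + x_n)^r * theta has only
    non-negative coefficients, or None if no such r is found up to max_r."""
    ones = {tuple(1 if i == j else 0 for i in range(n)): 1 for j in range(n)}
    current = dict(theta)
    for r in range(max_r + 1):
        if all(c >= 0 for c in current.values()):
            return r
        current = multiply(current, ones)
    return None

# x1^2 - x1*x2 + x2^2 is pd on the non-negative orthant:
# one multiplication by (x1 + x2) gives x1^3 + x2^3.
pd_form = {(2, 0): 1, (1, 1): -1, (0, 2): 1}
level = polya_level(pd_form, n=2, max_r=10)

# (x1 - x2)^2 is psd but vanishes at x1 = x2, so its multiples keep a negative coefficient.
psd_not_pd = {(2, 0): 1, (1, 1): -2, (0, 2): 1}
no_level = polya_level(psd_not_pd, n=2, max_r=5)
```

The second form shows why positive definiteness matters: at $x = (1, 1)$ every multiple evaluates to zero, so its coefficients sum to zero and cannot all be non-negative while the leading coefficient is positive.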
4.3 Alternative characterization of $E_{n,m}(\mathbb{R}^n_+)$
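The next proposition yields a characterization of the elementary copositive forms $E_{n,m}(\mathbb{R}^n_+)$ without relying on the operator $\mathcal{S}$. Aside from being more concise, this alternative description for $E_{n,m}(\mathbb{R}^n_+)$ has natural extensions to pointed polyhedral and semialgebraic cones (cf. Sections 5 and 6).

Proposition 4 The sets $E_{n,m}(\mathbb{R}^n_+)$ defined above satisfy the following identity:
$$E_{n,m}(\mathbb{R}^n_+) = \mathrm{conv}\{x_{i_1} x_{i_2} \cdots x_{i_k} \psi(x)^2 : m - k \text{ is even},\ \psi \in H_{n,(m-k)/2},\ \text{and } i_1, \ldots, i_k \in \{1, \ldots, n\}\}.$$

Proof. The "$\supseteq$" inclusion is immediate. For the reverse inclusion, assume $\theta \in E_{n,m}(\mathbb{R}^n_+)$. Hence $\theta(x^2) = \sum_{i=1}^{k} \phi_i(x)^2$, for some $\phi_i \in H_{n,m}$, $i = 1, \ldots, k$. Writing each $\phi_i$ in terms of its monomial expansion we get
$$\phi_i(x)^2 = \sum_{|\alpha| = m} \sum_{|\alpha'| = m} \phi_{i,\alpha}\, \phi_{i,\alpha'}\, x^{\alpha} x^{\alpha'}.$$
Now let $\mathrm{par}(\alpha) \in \{0, 1\}^n$ be the vector of parities of $\alpha$, defined by: $\mathrm{par}(\alpha)_i = 0$ if $\alpha_i$ is even, and $\mathrm{par}(\alpha)_i = 1$ otherwise. Since all monomials in $\theta(x^2)$ only contain variables with even powers, it follows that
$$\theta(x^2) = \sum_{i=1}^{k} \phi_i(x)^2 = \sum_{i=1}^{k} \sum_{P \in \{0,1\}^n} \sum \{\phi_{i,\alpha}\, \phi_{i,\alpha'}\, x^{\alpha} x^{\alpha'} : \mathrm{par}(\alpha) = \mathrm{par}(\alpha') = P,\ |\alpha| = |\alpha'| = m\}.$$
(All other terms cancel out.) Thus,
$$\theta(x^2) = \sum_{i=1}^{k} \phi_i(x)^2 = \sum_{i=1}^{k} \sum_{P \in \{0,1\}^n} \Big(\sum \{\phi_{i,\alpha}\, x^{\alpha} : \mathrm{par}(\alpha) = P,\ |\alpha| = m\}\Big)^2.$$
Since $\alpha \ge \mathrm{par}(\alpha)$, we can rewrite the previous expression as
$$\theta(x^2) = \sum_{i=1}^{k} \phi_i(x)^2 = \sum_{i=1}^{k} \sum_{P \in \{0,1\}^n} x^{2P} \Big(\sum \{\phi_{i,\alpha}\, x^{\alpha - P} : \mathrm{par}(\alpha) = P,\ |\alpha| = m\}\Big)^2.$$
Finally, since all entries in $\alpha - \mathrm{par}(\alpha)$ are even, we get
$$\theta(x) = \sum_{i=1}^{k} \sum_{P \in \{0,1\}^n} x^{P} \Big(\sum \{\phi_{i,\alpha}\, x^{(\alpha - P)/2} : \mathrm{par}(\alpha) = P,\ |\alpha| = m\}\Big)^2.$$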
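□

The following inductive and LMI descriptions of the sets $E_{n,m}(\mathbb{R}^n_+)$ readily follow from Proposition 4 and Observation 2.

Corollary 1 (i) The sets $E_{n,m}(\mathbb{R}^n_+)$ satisfy the following recursive relationships:
$$E_{n,1}(\mathbb{R}^n_+) = \mathrm{conv}\{x_j : j = 1, \ldots, n\},$$
$$E_{n,2}(\mathbb{R}^n_+) = \mathrm{conv}(\{(a^T x)^2 : a \in \mathbb{R}^n\} \cup \{x_i x_j : 1 \le i < j \le n\}),$$
$$E_{n,2k+1}(\mathbb{R}^n_+) = \mathrm{conv}\{x_j \theta(x) : \theta \in E_{n,2k},\ j = 1, \ldots, n\},$$
$$E_{n,2k+2}(\mathbb{R}^n_+) = \mathrm{conv}(\Sigma_{n,2(k+1)} \cup \{x_j \theta(x) : \theta \in E_{n,2k+1},\ j = 1, \ldots, n\}).$$

(ii) The sets $E_{n,m}(\mathbb{R}^n_+)$ can be defined via the following LMI identities:
$$E_{n,1}(\mathbb{R}^n_+) := \{\theta \in H_{n,1} : \exists\, a \in \mathbb{R}^n_+ \text{ s.t. } \theta(x) = a^T x\},$$
$$E_{n,2}(\mathbb{R}^n_+) := \{\theta \in H_{n,2} : \exists\, M, N \in S^n,\ M \succeq 0,\ N \ge 0 \text{ s.t. } \theta(x) = x^T (M + N) x\},$$
$$E_{n,3}(\mathbb{R}^n_+) := \Big\{\theta \in H_{n,3} : \exists\, M^i, N^i \in S^n,\ M^i \succeq 0,\ N^i \ge 0,\ i = 1, \ldots, n \text{ s.t. } \theta(x) = \sum_i x_i \big(x^T (M^i + N^i) x\big)\Big\},$$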
$$\vdots$$

The map $\mathcal{S}$ defined in (4) establishes a nice parallel between the pairs $(\Sigma_{n,2m}, P_{n,2m})$ and $(E_{n,m}(\mathbb{R}^n_+), P_{n,m}(\mathbb{R}^n_+))$. Extending this parallel, we note that the inclusion $E_{n,m}(\mathbb{R}^n_+) \subseteq P_{n,m}(\mathbb{R}^n_+)$ is proper except for the special cases described in Proposition 5. This result follows from Theorem 3 (Hilbert's Theorem) and a classical result on copositive forms due to Diananda [5, Thm. 2]. For details see [38].

Proposition 5 $E_{n,m}(\mathbb{R}^n_+) = P_{n,m}(\mathbb{R}^n_+)$ if and only if $n \le 2$, or $m = 1$, or $(n, m) = (3, 2)$, or $(n, m) = (4, 2)$.

The particular case $m = 2$ in Proposition 2 yields the hierarchy of sufficient conditions for copositivity of matrices proposed by Parrilo [23]; namely, a symmetric matrix $A \in S^n$ is copositive (i.e., $a(x) := x^T A x \in P_{n,2}(\mathbb{R}^n_+)$) if the following $r$-criterion holds:
$$\Big(\sum_{j=1}^{n} x_j^2\Big)^{r} (x^2)^T A\, x^2 \in \Sigma_{n,4+2r}. \tag{7}$$
The LMI description of the sets $E_{n,m}(\mathbb{R}^n_+)$ in Corollary 1 yields an alternative LMI formulation of Parrilo's $r$-criterion for $r = 0, 1, 2, 3, \ldots$ In particular, a new succinct derivation of the criterion for copositivity proposed by Parrilo in [23] can be obtained [38].
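The degree-two case of the LMI description of the elementary copositive forms says that a quadratic form $x^T A x$ belongs to $E_{n,2}(\mathbb{R}^n_+)$ whenever $A = M + N$ with $M \succeq 0$ and $N \ge 0$ entrywise. The sketch below checks such a certificate for a $2 \times 2$ matrix that is copositive but not positive semidefinite. The specific matrices, the grid of test points, and the helper names `is_psd_2x2` and `quad` are assumptions chosen for illustration; the grid check is only a sanity check, not a proof of copositivity.

```python
def is_psd_2x2(M):
    """A symmetric 2x2 matrix is psd iff its diagonal entries and determinant are non-negative."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return M[0][0] >= 0 and M[1][1] >= 0 and det >= 0

def quad(A, x):
    """The quadratic form x^T A x for a 2x2 matrix A."""
    return sum(A[i][j] * x[i] * x[j] for i in range(2) for j in range(2))

M = [[1, -1], [-1, 1]]   # psd part: x^T M x = (x1 - x2)^2
N = [[0, 3], [3, 0]]     # entrywise non-negative part
A = [[M[i][j] + N[i][j] for j in range(2)] for i in range(2)]   # [[1, 2], [2, 1]]

certificate_ok = is_psd_2x2(M) and all(v >= 0 for row in N for v in row)
A_is_psd = is_psd_2x2(A)

# Sanity check on a grid of non-negative points.
grid = [(s, t) for s in range(6) for t in range(6)]
min_on_grid = min(quad(A, x) for x in grid)
```

In this instance the certificate holds even though `A` itself is indefinite, which is exactly the gap between the psd cone and the cone of copositive matrices.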
5 Psd forms over pointed polyhedral cones
We next construct approximation schemes for the cone $P_{n,m}(D)$ in the case when $D$ is a pointed polyhedral cone. Throughout this section we shall assume that the domain $D$ is the polyhedral cone
$$D = \{x : a_i^T x \ge 0,\ i = 1, \ldots, q\},$$
for some matrix $[a_1 \cdots a_q]^T \in \mathbb{R}^{q \times n}$. We shall also assume that $D$ is pointed, i.e., it contains no lines. It is easy to see that the latter condition is equivalent to $\mathrm{rank}\,[a_1 \cdots a_q]^T = n$. Notice that $\mathbb{R}^n_+ = \{x : e_i^T x \ge 0,\ i = 1, \ldots, n\}$, and therefore this section is an extension of the previous one.

To get inner approximating sequences of cones for $P_{n,m}(D)$ we first extend Pólya's theorem to this context. This generalization can be obtained as a consequence of a representation theorem for polynomials positive on compact polyhedra due to Handelman [6] (a constructive proof of this theorem is presented in [31, Thm. 2]). At the end of this section we give a proof of this result that relies exclusively on elementary tools and Pólya's Theorem.

Proposition 6 Assume $D = \{x : a_i^T x \ge 0,\ i = 1, \ldots, q\}$ is pointed and $\theta \in \mathrm{int}(P_{n,m}(D))$. Then for $N$ sufficiently large
$$(a_1^T x + \cdots + a_q^T x)^N \theta(x) = \phi(a_1^T x, \ldots, a_q^T x)$$
for some $\phi \in H_{q,m+N}$ with $\phi \ge 0$.

Here $\phi \ge 0$ means that the form $\phi$ has non-negative coefficients. Now, we can extend the ideas of Section 4 in a natural fashion. Let
$$E_{n,m}(D) := \mathrm{conv}\{(a_{i_1}^T x) \cdots (a_{i_k}^T x)\, \psi(x)^2 : m - k \text{ is even},\ \psi \in H_{n,(m-k)/2},\ \text{and } i_1, \ldots, i_k \in \{1, \ldots, q\}\},$$
$$K^r_{n,m}(D) := \Big\{\theta \in H_{n,m} : \Big(\sum_{i=1}^{q} a_i^T x\Big)^{r} \theta(x) \in E_{n,m+r}(D)\Big\},$$
and
$$C^r_{n,m}(D) := \Big\{\theta \in H_{n,m} : \exists\, \phi \in H_{q,m+r},\ \phi \ge 0 \text{ s.t. } \Big(\sum_{i=1}^{q} a_i^T x\Big)^{r} \theta(x) = \phi(a_1^T x, \ldots, a_q^T x)\Big\}.$$
By construction, both $K^r_{n,m}(D)$ and $C^r_{n,m}(D)$ have LMI descriptions. Indeed, $C^r_{n,m}(D)$ is a polyhedral cone. Notice also that $C^r_{n,m}(D) \subseteq K^r_{n,m}(D)$.
The natural extensions of Propositions 2 and 3 hold.

Proposition 7 $C^r_{n,m}(D) \uparrow P_{n,m}(D)$ and $K^r_{n,m}(D) \uparrow P_{n,m}(D)$.

Proof. The first claim follows from Proposition 6. The second claim follows from the first one and the inclusions $C^r_{n,m}(D) \subseteq K^r_{n,m}(D) \subseteq P_{n,m}(D)$. □

Proposition 6 also yields Proposition 8 below, which is a natural analog of Theorem 4. Notice that both the second part of Proposition 7 and Proposition 2 can be obtained as a consequence of Proposition 8.

Proposition 8 Assume $D = \{x : a_i^T x \ge 0,\ i = 1, \ldots, q\}$ is pointed. Let $\theta \in H_{n,m}$. If $\theta$ is pd in $D$ then there exists $r \in \mathbb{N}$ such that
$$\Big(\sum_{i=1}^{q} a_i^T x\Big)^{r} \theta(x) \in E_{n,m+r}(D).$$

Proof. This readily follows from Proposition 6. □
Proof of Proposition 6. Write $A := [a_1 \cdots a_q]^T \in \mathbb{R}^{q \times n}$ and let $A = QR$ be the full QR-factorization of $A$ (see, e.g., [7, 36]), i.e., $Q \in \mathbb{R}^{q \times q}$ is orthogonal and $R \in \mathbb{R}^{q \times n}$ is upper triangular. Since $\mathrm{rank}(A) = n$ (as $D$ is pointed), the matrix $R$ is of the form $\begin{bmatrix} U \\ 0 \end{bmatrix}$ where $U \in \mathbb{R}^{n \times n}$ is upper triangular and non-singular. Put $Q = [Q_1 \ Q_2]$ where $Q_1$ is the block of the first $n$ columns of $Q$. Now let $\gamma \in H_{q,m}$ be defined as $\gamma(y) := \theta(U^{-1} Q_1^T y)$. Notice that $U^{-1} Q_1^T A = I$, so in particular $\gamma(Ax) = \theta(x)$.

Claim: For $c \ge 0$ sufficiently large, $\gamma(y) + c\,(y^T Q_2 Q_2^T y) > 0$ for all $y \in \Delta_q := \{y \in \mathbb{R}^q_+ : \sum y_i = 1\}$.

Here is a proof of the claim: let $F := \{y \in \Delta_q : \gamma(y) \le 0\}$. If $F = \emptyset$ then take $c = 0$. Otherwise, observe that any given $y \in F$ cannot be of the form $Ax$ (such an $x$ would be a nonzero element of $D$, giving $\gamma(y) = \theta(x) > 0$), and hence it is not of the form $Q_1 z$ either. Therefore $Q_2^T y \ne 0$ for all $y \in F$. Let $m_1 := \min\{\gamma(y) : y \in F\} \le 0$, and $m_2 := \min\{y^T Q_2 Q_2^T y : y \in F\} > 0$ (these minima are attained because $F$ is compact). The claim then follows by taking $c > -m_1/m_2$.

Let $c > 0$ be sufficiently large so that $\gamma(y) + c\,(y^T Q_2 Q_2^T y) > 0$ for all $y \in \Delta_q$. Applying Pólya's Theorem to $\gamma(y) + c\,(y_1 + \cdots + y_q)^{m-2} (y^T Q_2 Q_2^T y)$ we conclude that for $N$ sufficiently large there exists $\phi \in H_{q,m+N}$, $\phi \ge 0$ such that
$$(y_1 + \cdots + y_q)^N \big(\gamma(y) + c\,(y_1 + \cdots + y_q)^{m-2} (y^T Q_2 Q_2^T y)\big) = \phi(y_1, \ldots, y_q).$$
Thus, since $\gamma(Ax) = \theta(x)$ and $Q_2^T A x = 0$, plugging in $y = Ax$ we get
$$(a_1^T x + \cdots + a_q^T x)^N \theta(x) = \phi(a_1^T x, \ldots, a_q^T x).$$
□
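The pointedness assumption of this section is stated in terms of the rank of the matrix whose rows are $a_1^T, \ldots, a_q^T$, and this condition is easy to test. The sketch below is a minimal illustration using exact Gaussian elimination over the rationals. The helper names `rank` and `is_pointed` and the three example cones are assumptions made for the example.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    n_cols = len(M[0]) if M else 0
    for col in range(n_cols):
        pivot = next((i for i in range(rk, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for i in range(rk + 1, len(M)):
            factor = M[i][col] / M[rk][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[rk])]
        rk += 1
    return rk

def is_pointed(A):
    """D = {x : A x >= 0} contains no lines iff rank(A) equals the number of columns n."""
    return rank(A) == len(A[0])

orthant = [[1, 0], [0, 1]]               # the non-negative orthant in R^2
half_plane = [[1, 1]]                    # {x : x1 + x2 >= 0}, contains the line x1 = -x2
wedge = [[1, 0], [-1, 1], [1, 1]]        # three inequalities, q = 3 > n = 2
```

The half-plane fails the test because the direction $(1, -1)$ and its negative both satisfy the single inequality with equality.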
6 Psd forms over pointed semialgebraic cones
Consider a domain of the form $D = \{x \in \mathbb{R}^n : \phi_i(x) \ge 0,\ i = 1, \ldots, q\}$, where $\phi_i \in H_{n,m_i}$, $i = 1, \ldots, q$. We shall restrict our attention to domains that are pointed, i.e., we shall assume that $0$ cannot be obtained as a sum of nonzero elements of $D$. Since $D$ is semialgebraic, it is closed. Thus, $D$ is pointed if and only if $\{0\}$ is an exposed face of $D$ (see, e.g., [12, 32]), i.e., if and only if there exists a nonzero vector $a \in \mathbb{R}^n$ such that
$$D \subseteq \{x \in \mathbb{R}^n : a^T x \ge 0\} \tag{8}$$
and
$$D \cap \{x \in \mathbb{R}^n : a^T x = 0\} = \{0\}. \tag{9}$$
Furthermore, (8) and (9) imply that $D \cap \{x \in \mathbb{R}^n : a^T x = 1\}$ is compact.

Assumption 1 (a) Assume that a vector $a \in \mathbb{R}^n$ satisfying (8) and (9) is available.

(b) Assume also that $a^T x \ge 0$ is included in the definition of $D$. In other words, assume $\phi_i(x) = a^T x$ for some $i \in \{1, \ldots, q\}$. (This can be assumed without loss of generality because we can always include the redundant inequality $a^T x \ge 0$ in the definition of $D$.)

We now extend the constructions in Sections 4 and 5 in a natural fashion. Let
$$E_{n,m}(D) := \mathrm{conv}\{\phi_{i_1}(x) \cdots \phi_{i_k}(x)\, \psi(x)^2 : m - (m_{i_1} + \cdots + m_{i_k}) \text{ is even},\ \psi \in H_{n,(m-(m_{i_1}+\cdots+m_{i_k}))/2},\ \text{and } i_1, \ldots, i_k \in \{1, \ldots, q\}\}, \tag{10}$$
and
$$K^r_{n,m}(D) := \{\theta \in H_{n,m} : (a^T x)^r \theta(x) \in E_{n,m+r}(D)\}. \tag{11}$$
The heart of our construction is the following natural extension of Proposition 8.

Proposition 9 Assume $D = \{x \in \mathbb{R}^n : \phi_i(x) \ge 0,\ i = 1, \ldots, q\}$ is such that Assumption 1 holds. Let $\theta \in H_{n,m}$. If $\theta$ is pd in $D$ then there exists $r \in \mathbb{N}$ such that
$$(a^T x)^r \theta(x) \in E_{n,m+r}(D).$$

The proof of Proposition 9 relies on the following fundamental representation theorem due to Schmüdgen [33]. In the following statement $\Sigma_n$ denotes the set of polynomials in $n$ variables that are sums of squares.

Theorem 6 (Schmüdgen) Let $f, h_1, \ldots, h_s$ be polynomials in $n$ variables such that $D = \{x \in \mathbb{R}^n : h_i(x) \ge 0,\ i = 1, \ldots, s\}$ is compact and $f(x) > 0$ for all $x \in D$. Then there exist $g_\nu \in \Sigma_n$, $\nu \in \{0, 1\}^s$ such that
$$f(x) = \sum_{\nu \in \{0,1\}^s} h_1(x)^{\nu_1} \cdots h_s(x)^{\nu_s}\, g_\nu(x).$$

Proof. See [33, Cor. 3]. □
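Proof of Proposition 9. First assume $a = e_n := \begin{bmatrix} 0 & \cdots & 0 & 1 \end{bmatrix}^T \in \mathbb{R}^n$. To simplify notation, we shall let $\bar{x}$ denote a generic vector $(x_1, \dots, x_{n-1}) \in \mathbb{R}^{n-1}$. Assume $\theta \in H_{n,m}$ is pd in $D$. Let $f(\bar{x}) = \theta(\bar{x}, 1)$, and $\bar{D} = \{\bar{x} \in \mathbb{R}^{n-1} : (\bar{x}, 1) \in D\} = \{\bar{x} \in \mathbb{R}^{n-1} : \phi_i(\bar{x}, 1) \ge 0,\ i = 1, \dots, q\}$. By Assumption 1 and since $\theta$ is pd in $D$, the set $\bar{D}$ is compact and $f(\bar{x}) > 0$ for all $\bar{x} \in \bar{D}$. Thus by Theorem 6 there exist $g_\nu \in \Sigma_{n-1}$, $\nu \in \{0,1\}^q$, such that

$$f(\bar{x}) = \sum_{\nu \in \{0,1\}^q} \phi_1(\bar{x}, 1)^{\nu_1} \cdots \phi_q(\bar{x}, 1)^{\nu_q}\, g_\nu(\bar{x}). \tag{12}$$

Let $m_\nu = \sum_{\nu_i = 1} \deg(\phi_i)$ for each $\nu \in \{0,1\}^q$, and let $N = \max_\nu (m_\nu + \deg(g_\nu))$. From (12) we get

$$\begin{aligned}
x_n^{N-m}\,\theta(x) &= x_n^N f(\bar{x}/x_n) \\
&= \sum_{\nu \in \{0,1\}^q} \phi_1(\bar{x}/x_n, 1)^{\nu_1} \cdots \phi_q(\bar{x}/x_n, 1)^{\nu_q}\, g_\nu(\bar{x}/x_n)\, x_n^N \\
&= \sum_{\nu \in \{0,1\}^q} \phi_1(x)^{\nu_1} \cdots \phi_q(x)^{\nu_q}\, \breve{g}_\nu(x)\, x_n^{N - m_\nu - \deg(g_\nu)},
\end{aligned}$$

where $\breve{g}_\nu$ is the homogenization of $g_\nu$. It thus follows that $\theta \in K^{N-m}_{n,m}(D)$.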
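The homogenization step can be traced on a small hypothetical instance, chosen here only for illustration. Take $n = 2$, $a = e_2$, and $D = \{x \in \mathbb{R}^2 : \phi_1(x) = x_1 \ge 0,\ \phi_2(x) = x_2 - x_1 \ge 0,\ \phi_3(x) = x_2 \ge 0\}$, and assume this choice satisfies Assumption 1. The form $\theta(x) = x_1^2 - x_1 x_2 + x_2^2$ is pd in $D$, and $f(\bar{x}) = \bar{x}^2 - \bar{x} + 1$ is positive on $\bar{D} = [0, 1]$. One certificate of the form (12) is

$$f(\bar{x}) = \phi_2(\bar{x}, 1) \cdot 1 + \bar{x}^2 = (1 - \bar{x}) + \bar{x}^2,$$

that is, $g_{(0,1,0)} = 1$, $g_{(0,0,0)} = \bar{x}^2$, and all other $g_\nu = 0$. Over the two nonzero terms, $N = \max(1 + 0,\ 0 + 2) = 2 = m$, and rehomogenizing gives

$$x_2^{0}\,\theta(x) = \phi_2(x) \cdot 1 \cdot x_2^{2 - 1 - 0} + x_1^2 \cdot x_2^{2 - 0 - 2} = \phi_2(x)\,\phi_3(x) + x_1^2 .$$

Both terms are generators of $E_{2,2}(D)$: the leftover power of $x_2$ is itself the constraint $\phi_3(x) = a^T x$, which is the role Assumption 1 (b) plays in the argument. In this instance $\theta \in K^{0}_{2,2}(D)$.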
The general case can be reduced to this special case via a “change of coordinates” as follows. Without loss of generality assume $a^T a = 1$. Let $B \in \mathbb{R}^{n \times n}$ be an orthogonal matrix whose last column is $a$. Putting $\tilde{D} := \{y : By \in D\}$, $\tilde{\phi}_i(y) := \phi_i(By)$, $\tilde{a} := e_n$ we are in the previous case and hence the statement above holds for $\tilde{D}$, $\tilde{\phi}_i$, $\tilde{a}$. The general result then follows for $D$, $\phi_i$, $a$, after changing back to the original coordinates by putting $D = \{By : y \in \tilde{D}\}$, $\phi_i(x) := \tilde{\phi}_i(B^T x)$, $a = B\tilde{a}$. □

We can now extend the second part of Proposition 7, which yields our most general result.

Proposition 10 $K^r_{n,m}(D) \uparrow P_{n,m}(D)$.
Proof. This follows from Proposition 9 and the following two facts:

$$\theta(x) \in E_{n,m}(D) \;\Rightarrow\; (a^T x)\,\theta(x) \in E_{n,m+1}(D),$$
$$(a^T x)\,\theta(x) \in E_{n,m+1}(D) \;\Rightarrow\; \theta(x) \in P_{n,m}(D).$$

□

Remark 2 Notice that the definitions of $E_{n,m}(\cdot)$ and $K^r_{n,m}(\cdot)$ in Sections 4, 5, and 6 are consistent, i.e., if we apply (10) and (11) to the special cases $D = \mathbb{R}^n_+ = \{x : x_j \ge 0\}$, $a = e := \begin{bmatrix} 1 & \cdots & 1 \end{bmatrix}^T$ and to $D = \{x \in \mathbb{R}^n : a_i^T x \ge 0,\ i = 1, \dots, q\}$, $a = a_1 + \cdots + a_q$, we recover the sets defined in Sections 4 and 5. Indeed, Proposition 2 and the second part of Proposition 7 are special cases of Proposition 10 above.
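The first implication used in the proof of Proposition 10 can be checked directly on the generators in (10). The following is a sketch of that check, assuming (as the notation in (10) suggests) that the indices $i_1, \dots, i_k$ are allowed to repeat. By Assumption 1 (b), $a^T x = \phi_j(x)$ for some $j$, so $m_j = 1$. For any generator $\phi_{i_1} \cdots \phi_{i_k} \psi^2$ of $E_{n,m}(D)$,

$$(a^T x)\,\phi_{i_1}(x) \cdots \phi_{i_k}(x)\,\psi(x)^2 = \phi_j(x)\,\phi_{i_1}(x) \cdots \phi_{i_k}(x)\,\psi(x)^2,$$
$$(m + 1) - (m_j + m_{i_1} + \cdots + m_{i_k}) = m - (m_{i_1} + \cdots + m_{i_k}).$$

The parity condition and the degree of $\psi$ are therefore unchanged, so the product is a generator of $E_{n,m+1}(D)$. The implication extends to all of $E_{n,m}(D)$ because multiplication by $a^T x$ is linear and so preserves convex combinations.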
Using a stronger representation theorem for positive polynomials due to Jacobi and Prestel (see [27, Thm. 6.3.4]), another sequence of inner approximations can be constructed. Assume $m_1, \dots, m_q$ have the same parity. Since Assumption 1 (b) guarantees that some $\phi_i$ is the linear form $a^T x$, all $m_1, \dots, m_q$ must then be odd. In this setting, define $E_{n,m}(D)$ as follows: for $m$ odd,

$$E_{n,m}(D) := \operatorname{conv}\{\phi_i(x)\,\psi(x)^2 : i \in \{1, \dots, q\},\ \psi \in H_{n,(m-m_i)/2}\},$$

and for $m$ even, $E_{n,m}(D) := \Sigma_{n,m}$. Finally, let

$$K^r_{n,m}(D) := \{\theta \in H_{n,m} : (a^T x)^r \theta(x) \in E_{n,m+r}(D)\}.$$

Proposition 11 $K^r_{n,m}(D) \uparrow P_{n,m}(D)$.
Proof. Under the assumptions made above, the representation theorem due to Jacobi and Prestel [27, Thm. 6.3.4(i)] implies that for any given $\theta \in H_{n,m}$ pd in $D$ there exists $r \in \mathbb{N}$ such that $(a^T x)^r \theta(x) \in E_{n,m+r}(D)$. Now proceeding as in the proof of Proposition 10 the result follows. □
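The practical appeal of the Jacobi and Prestel based construction is that it uses at most $q$ generator types instead of up to $2^q$ products. The short Python sketch below lists those generator types for the case in which every $m_i$ is odd. The function name `jacobi_prestel_shapes` and the degree-list representation are illustrative assumptions, and the sketch further assumes that $\Sigma_{n,m}$ denotes sums of squares of forms of degree $m/2$.

```python
def jacobi_prestel_shapes(degrees, m):
    """Describe generators of the reduced cone when all degrees m_i are odd.

    Returns a list of (index_or_None, psi_degree):
      m odd  -> one entry per phi_i with m_i <= m, psi of degree (m - m_i) / 2;
      m even -> a single entry (None, m // 2) standing for Sigma_{n,m}.
    """
    if any(d % 2 == 0 for d in degrees):
        raise ValueError("this sketch assumes every m_i is odd")
    if m % 2 == 0:
        return [(None, m // 2)]
    return [(i, (m - d) // 2) for i, d in enumerate(degrees) if d <= m]

# Two linear constraints and one cubic constraint, forms of degree 3.
shapes = jacobi_prestel_shapes([1, 1, 3], 3)
```

Each returned entry corresponds to one semidefinite block, so the size of the resulting LMI grows linearly in $q$ rather than exponentially.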
Remark 3 When the $m_i$'s do not have the same parity, the construction above still works provided we take

$$E_{n,m}(D) := \operatorname{conv}\big(F_{n,m}(D) \cup G_{n,m}(D) \cup \Sigma_{n,m}\big),$$

where

$$F_{n,m}(D) := \{\phi_i(x)\,\psi(x)^2 : i \in \{1, \dots, q\},\ m - m_i \text{ is even},\ \psi \in H_{n,(m-m_i)/2}\},$$
$$G_{n,m}(D) := \{\phi_i(x)\,\phi_j(x)\,\psi(x)^2 : i < j \in \{1, \dots, q\},\ m - m_i - m_j \text{ is even, and } \psi \in H_{n,(m-m_i-m_j)/2}\}.$$

In this case [27, Thm. 6.3.4(ii)] applies.

Example 2 (Constrained polynomial optimization) Let $g(x)$ and $g_i(x)$, $i = 1, \dots, q$, be given polynomials in $n$ variables (not necessarily homogeneous) and consider the problem of finding

$$g^* := \min\{g(x) : g_i(x) \ge 0,\ i = 1, \dots, q\}. \tag{13}$$

We shall assume that the following technical condition holds:

$$\text{For all } x \in \mathbb{R}^n \setminus \{0\} \text{ there exists } i \in \{1, \dots, q\} \text{ s.t. } \tilde{g}_i(x) < 0, \tag{14}$$
where $\tilde{g}_i(x)$ is the homogeneous component of $g_i(x)$ of highest total degree. Without loss of generality assume the constant term of $g(x)$ is zero, i.e., $g_{\vec{0}} = 0$. Thus, by homogenizing, it can be shown that (13) is equivalent to

$$\begin{array}{ll}
\max & -\theta_{(\vec{0}, 2m)} \\
\text{s.t.} & \theta_{(\alpha, \alpha_{n+1})} = g_\alpha, \quad \forall\, |(\alpha, \alpha_{n+1})| = 2m,\ \alpha \ne \vec{0}, \\
& \theta \in P_{n+1,m}(D),
\end{array}$$

where $D = \{(x, x_{n+1}) : x_{n+1} \ge 0 \text{ and } \breve{g}_i(x, x_{n+1}) \ge 0,\ i = 1, \dots, q\}$. It is easy to see that the domain $D \subseteq \mathbb{R}^{n+1}$ satisfies Assumption 1(a) for $a = e_{n+1}$ if and only if condition (14) holds. For each nonnegative integer $r$, consider

$$\begin{array}{lll}
g^r := & \max & -\theta_{(\vec{0}, 2m)} \\
& \text{s.t.} & \theta_{(\alpha, \alpha_{n+1})} = g_\alpha, \quad \forall\, |(\alpha, \alpha_{n+1})| = 2m,\ \alpha \ne \vec{0}, \\
& & \theta \in K^r_{n+1,m}(D).
\end{array}$$

By the construction of $K^r_{n+1,m}(D)$ above, this is a semidefinite program. Furthermore, by Proposition 10 and Theorem 2, $g^r \uparrow g^*$. This also holds if $K^r_{n+1,m}(D)$ is replaced by the Jacobi and Prestel based approximation of Proposition 11.

Remark 4 The sequence of inner approximations $K^r_{n,m}(D)$ is closely related to Lasserre's construction in [17], which relies on a theorem of Putinar [28, Thm. 1.4]. However, as pointed out in [27, page 159], the proof of Putinar's theorem only works in the case when all $m_i$'s are even and requires the hypothesis (14), which is slightly stronger than the hypothesis made in [28] and in [17].
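Example 2 relies on two elementary operations on polynomials: homogenization, which produces $\breve{g}_i(x, x_{n+1})$, and extraction of the highest-degree component $\tilde{g}_i(x)$ used in condition (14). A minimal Python sketch of both is given below. The sparse dictionary representation (exponent tuple mapped to coefficient) and the function names `homogenize` and `leading_form` are assumptions made for illustration, not part of the paper.

```python
def homogenize(poly, degree=None):
    """Homogenize a polynomial stored as {exponent_tuple: coefficient}.

    Each monomial x^alpha is multiplied by x_{n+1}^(degree - |alpha|), so the
    result is a form of the given degree in n + 1 variables.
    """
    total = max(sum(alpha) for alpha in poly)
    if degree is None:
        degree = total
    if degree < total:
        raise ValueError("degree must be at least the total degree of poly")
    return {alpha + (degree - sum(alpha),): c for alpha, c in poly.items()}

def leading_form(poly):
    """Return the homogeneous component of highest total degree."""
    total = max(sum(alpha) for alpha in poly)
    return {alpha: c for alpha, c in poly.items() if sum(alpha) == total}

# g(x1, x2) = x1^2 - x1*x2 + 3*x2
g = {(2, 0): 1.0, (1, 1): -1.0, (0, 1): 3.0}
g_breve = homogenize(g)    # x1^2 - x1*x2 + 3*x2*x3
g_tilde = leading_form(g)  # x1^2 - x1*x2
```

In this representation the equality constraints $\theta_{(\alpha, \alpha_{n+1})} = g_\alpha$ simply fix every coefficient of the homogenized objective except the one attached to $x_{n+1}^{2m}$, which is the free quantity being optimized.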
Acknowledgments. We thank Monique Laurent for her careful reading and comments on a preliminary version of this paper. In particular, we are indebted to her for pointing out a gap in a previous proof of Proposition 10.
References

[1] D. Bertsimas and I. Popescu, “Optimal Inequalities in Probability Theory: A Convex Optimization Approach,” INSEAD Working Paper, 2003.
[2] J. Bochnak, M. Coste, M-F. Roy, Real Algebraic Geometry, Springer Verlag, 1998.
[3] I. Bomze, E. de Klerk, “Solving standard quadratic optimization problems via semidefinite and copositive programming,” Journal of Global Optimization 24 (2002) 163–185.
[4] E. de Klerk and D. Pasechnik, “Approximating the Stability Number of a Graph via Copositive Programming,” SIAM Journal on Optimization 12 (2002) 875–892.
[5] P. Diananda, “On Non-negative Forms in Real Variables some or all of which are Non-negative,” Math. Proc. Cambridge Philos. Soc., 58 (1962) 17–25.
[6] D. Handelman, “Representing polynomials by positive linear functions on compact convex polyhedra,” Pacific Journal of Mathematics 132 (1988) 35–62.
[7] G. Golub and C. Van Loan, Matrix Computations, Third Edition, Johns Hopkins University Press, 1996.
[8] B. Hanzon and D. Jibetean, “Global minimization of a multivariate polynomial using matrix methods,” To appear in Journal of Global Optimization.
[9] G. Hardy, J. Littlewood, and G. Pólya, Inequalities, Cambridge University Press, 1954.
[10] D. Hilbert, “Über Die Darstellung Definiter Formen als Summe von Formenquadraten,” Math. Annalen 32 (1888) 342–350.
[11] D. Hilbert, “Mathematical Problems,” Bulletin of American Mathematical Society 8 (1902) 437–479.
[12] J. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms, Springer Verlag, 1993.
[13] D. Jibetean, Algebraic optimization with applications to system theory, Ph.D. Dissertation, Vrije Universiteit, Amsterdam, 2003.
[14] D. Jibetean and M. Laurent, “Converging semidefinite bounds for global unconstrained optimization,” Preprint, CWI, Amsterdam, December 2004.
[15] S. Karlin and W. Studden, Tchebycheff Systems: with Applications in Analysis and Statistics, Pure and Applied Mathematics Vol. XV, A Series of Texts and Monographs. Interscience Publishers, John Wiley and Sons, 1966.
[16] M. Kojima, S. Kim and H. Waki, “A General Framework for Convex Relaxation of Polynomial Optimization Problems over Cones,” J. Oper. Res. Soc. Japan 46 (2003) 125–144.
[17] J. Lasserre, “Global Optimization Problems with Polynomials and the Problem of Moments,” SIAM Journal on Optimization 11 (2001) 796–817.
[18] J. Lasserre, “Bounds on Measures Satisfying Moment Conditions,” Annals of Applied Probability 12 (2002) 1114–1137.
[19] M. Laurent, “A comparison of the Sherali-Adams, Lovász-Schrijver, and Lasserre relaxations for 0-1 integer programming,” Mathematics of Operations Research 28 (2003) 470–496.
[20] M. Laurent, “Semidefinite representations for finite varieties,” To Appear in Mathematical Programming.
[21] K. Murty and S. Kabadi, “Some NP-complete problems in quadratic and linear programming,” Mathematical Programming 39 (1987) 117–129.
[22] Y. Nesterov, “Structure of Non-negative Polynomials and Optimization Problems,” CORE Discussion Paper No. 9749, 1997.
[23] P. Parrilo, “Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization,” Ph.D. Dissertation, California Institute of Technology, Pasadena, CA, 2000.
[24] P. Parrilo, “Semidefinite programming relaxations for semialgebraic problems,” Mathematical Programming 96 (2003) 293–320.
[25] V. Powers and B. Reznick, “A New Bound for Pólya's Theorem with Applications to Polynomials Positive on Polyhedra,” Journal of Pure and Applied Algebra 164 (2001) 221–229.
[26] I. Popescu, “A Semidefinite Programming Approach to Optimal Moment Bounds for Distributions with Convex Properties,” INSEAD Working Paper, 2001.
[27] A. Prestel and C. Delzell, Positive Polynomials: From Hilbert's 17th Problem to Real Algebra, Springer Verlag, 2001.
[28] M. Putinar, “Positive polynomials on compact sets,” Indiana Univ. Math. J. 42 (1993) 969–984.
[29] J. Renegar, A Mathematical View of Interior-Point Methods in Convex Optimization, SIAM, Philadelphia, 2001.
[30] B. Reznick, “Some concrete aspects of Hilbert's 17th problem,” in Real Algebraic Geometry and Ordered Structures, (C. N. Delzell, J.J. Madden eds.) Cont. Math., 253 (2000), 251–272.
[31] B. Reznick, “Uniform Denominators in Hilbert's Seventeenth Problem,” Math. Z. 220 (1995) 75–97.
[32] T. Rockafellar, Convex Analysis, Princeton University Press, 1970.
[33] K. Schmüdgen, “The K-moment Problem for Compact Semi-algebraic Sets,” Mathematische Annalen 289 (1991) 203–206.
[34] N. Shor, “Class of global minimum bounds of polynomial functions,” Cybernetics 23 (1987) 731–734.
[35] N. Shor and P. Stetsyuk, “The use of a modification of the r-algorithm for finding the global minimum of polynomial functions,” Cybernetics and System Analysis 33 (1997) 482–497.
[36] L. Trefethen and D. Bau, Numerical Linear Algebra, SIAM, 1997.
[37] H. Wolkowicz, R. Saigal, and L. Vandenberghe, Handbook of Semidefinite Programming: Theory, Algorithms, and Applications, Kluwer Academic Publishers, 2000.
[38] L. Zuluaga, Ph.D. Dissertation, Carnegie Mellon University (2004).
[39] L. Zuluaga and J. Peña, “A Conic Programming Approach to Generalized Tchebycheff Inequalities,” To Appear in Mathematics of Operations Research.