Hierarchical Overlapping Coordination for Large-Scale Optimization by Decomposition

Nestor Michelena∗   Panos Papalambros∗   Hyungju A. Park†   Devadatta Kulkarni†
Abstract

Decomposition of large engineering design problems into smaller design subproblems enhances robustness and speed of numerical solution algorithms. Design subproblems can be solved in parallel, using the optimization technique most suitable for the underlying subproblem. This also reflects the typical multidisciplinary nature of system design problems and allows better interpretation of results. Hierarchical overlapping coordination (HOC) simultaneously uses two or more problem decompositions, each of them associated with different partitions of the design variables and constraints. Coordination is achieved by the exchange of information between decompositions. This article presents the HOC algorithm and a sufficient condition for global convergence of the algorithm to the solution of a convex optimization problem. The convergence condition involves the rank of a matrix derived from the Jacobian of the constraints. Computational results obtained by applying the HOC algorithm to problems of various sizes are also presented.
∗ Department of Mechanical Engineering and Applied Mechanics, University of Michigan, Ann Arbor, MI 48109. e-mail: {nestorm, pyp}@umich.edu
† Department of Mathematics and Statistics, Oakland University, Rochester, MI 48309. e-mail: {park, kulkarni}@oakland.edu
Nomenclature

Aαi = block of the FDT associated with the i-th subproblem of the α-decomposition
A^E, A^I = constraint matrices corresponding to linear equalities, inequalities
Ā^I = submatrix of A^I consisting of the active inequality constraints
A := [A^I; A^E] = stacked matrix of linear inequality and equality constraints
Â = full row rank submatrix of A
C(A) = cone defined by the row vectors of A^E and Ā^I
c^E, c^I = constant vectors for linear equality, inequality constraints
FDT = functional dependence table
fα0 = summand in objective function involving only linking variables
fαi = summand in objective function involving xαi
f(x) = objective function
g(x) = vector of inequality constraint functions
HOC = hierarchical overlapping coordination
Hα = indicator matrix such that Hα x = yα
hαi, gαi = vectors of constraint functions involving xαi
h(x) = vector of equality constraint (affine) functions
J(x) := [J^I(x); J^E(x)] = Jacobian of the inequality and equality constraint functions
Ĵ(x) = full row rank submatrix of J(x)
J̄^I(p) = submatrix of J^I(p) consisting of the active inequality constraints at p
Kα = matrix obtained by augmenting A with Hα
Kα(x) = matrix obtained by augmenting J(x) with Hα
K̂αβ = matrix obtained by augmenting Â with Hα and Hβ
K̂αβ(p) = matrix obtained by augmenting Ĵ(p) with Hα and Hβ
λ^E, λ^I = Lagrange multipliers for equality, inequality constraints
mE, mI = numbers of linear equality, inequality constraints
nα = number of linking variables for the α-decomposition
pα = number of subproblems in the α-decomposition
R = set of real numbers
R^n = n-dimensional Euclidean space
Ta, Ta(p) = set of indices corresponding to active inequality constraints, at p
X = set constraint
x = vector of optimization variables
xαi = vector of local variables associated with the i-th subproblem of the α-decomposition
yα = vector of linking variables for the α-decomposition
1 Introduction
A typical approach to engineering design consists of formulating an optimization problem using models to estimate design criteria and constraint functions, and applying formal optimization methods to search the design space for an optimum. The resulting optimization problem may be, in general, a mixed-discrete nonlinear problem whose model contains noncontinuous mathematical functions. In this article, we address design problems that can be formulated (or reformulated) as smooth, nonlinear optimization problems and solved for a local solution using a gradient-based optimization algorithm. It is expected that a local solution provides a reasonable improvement over a baseline design or is sufficiently close to the global solution because of engineering constraints on the feasible region. For the mathematical proofs we consider a convex optimization problem of the form
find x ∈ X such that h(x) = 0, g(x) ≤ 0, and f(x) is minimized,   (1.1)

where X ⊂ R^n is a nonempty open convex set, f : X → R and the g_i : X → R are convex and differentiable functions on X, and the h_i : X → R are affine functions on X. This problem is the foundation for most nonlinear programming solution algorithms.

Although optimization methods have been applied with practical success to individual system components, difficulties arise for system-level design—a system being a collection of connected components or processes. Exploiting the structure of the design problem by decomposition of the problem into smaller subproblems may be necessary in the case of systems that involve hundreds of variables and constraints. The subproblems are then solved in parallel, using the optimization technique most suitable for the underlying submodel, gaining in robustness, speed,
and interpretation of results. Moreover, system design problems are typically multidisciplinary and/or involve subsystem design teams that effect one or more explicit problem decompositions. Thus, coordinated solution of design subproblems may be the only way to address the overall system problem in a practical and robust manner.

Hierarchical overlapping coordination (HOC) simultaneously uses two or more design problem decompositions, each of them associated with different partitions of the design variables and constraints. This kind of problem decomposition may reflect, for example, matrix-type organizations structured according to product lines or subsystems and the disciplines involved in the design process. At the product level, for example, a powertrain system design model could be partitioned by subsystems (e.g., engine, torque converter, transmission, differential, wheel, and vehicle submodels) or by aspects (e.g., thermodynamics, combustion, heat transfer, multibody and fluid dynamics submodels). Coordination of the design subproblems is achieved by the exchange of information between decompositions.

The mathematical formulation of HOC was first proposed by Macko and Haimes [1], and several criteria for convergence of the coordination algorithm under linear equality and inequality constraints were developed in Macko and Haimes [1] and Shima and Haimes [2]. The convergence criteria developed in those articles are computationally difficult to check and possibly incorrect (see Remark 4.4 in Park et al. [3]). To remedy the situation, new computationally efficient criteria for convergence of the HOC algorithm under linear constraints were developed in Park et al. [3]. In the present article, we present an HOC algorithm and a condition that ensures the convergence of the algorithm for nonlinear convex problems. This condition also guarantees convergence to a stationary point in the case of more general smooth, nonlinear optimization problems.

Several researchers have proposed coordination strategies to exploit the structure of a problem associated with its decomposition. Reviews of optimization procedures that use decomposition are presented by Wagner and Papalambros [4] and Sobieszczanski-Sobieski and Haftka [5]. Recently, Nelson and Papalambros [6] presented Sequentially Decomposed Programming as a globally convergent coordination scheme for hierarchic systems. Other promising coordination algorithms, including concurrent subspace optimization (CSSO) [7] and collaborative optimization (CO) [8] for non-hierarchic systems, require further in-depth study of their robustness and convergence properties.
2 Hierarchical Overlapping Coordination
Dependence of design functions on variables can be represented by a Boolean matrix termed the functional dependence table (FDT). The (i, j)-th entry of the FDT is one if the i-th function depends on the j-th variable, and zero otherwise. Hypergraph-based model decomposition [9] can be applied to the constraint functions of problem (1.1) to obtain two or more distinct (say, α-, β-, γ-, . . . ) decompositions. This involves generating a decomposition of the FDT by reordering the variables and constraints, as shown in Figure 1.

Figure 1: Block α-decomposition of the functional dependence table. (Rows hα1, gα1, . . . , hαpα, gαpα; columns yα, xα1, . . . , xαpα; blocks Bα1, . . . , Bαpα under the linking variables yα and diagonal blocks Aα1, . . . , Aαpα.)

In Figure 1, xαi is the vector of local variables associated with block Aαi, i.e., with subproblem αi; yα is the vector of nα linking variables for the α-decomposition; and pα denotes the number of subproblems in the α-decomposition (the diagonal blocks in the figure). Provided that the objective function f is α-additive separable∗, problem (1.1) takes the following form:
Min_x  fα0(yα) + Σ_{i=1}^{pα} fαi(yα, xαi)   subject to   (2.1)
       hαi(yα, xαi) = 0,  gαi(yα, xαi) ≤ 0,  i = 1, . . . , pα.

∗ In general, HOC can be used if the objective function can be written as monotonic functions of local objective functions derived from the α- and β-decompositions.
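To make the FDT machinery of Figure 1 concrete, the following is a minimal sketch (not from the authors' implementation) of representing an FDT as a Boolean matrix and identifying the linking variables of a given block decomposition; the tiny FDT and the block partition are invented for illustration.

```python
import numpy as np

# FDT: rows = constraint functions, columns = variables; entry (i, j) is
# True when the i-th function depends on the j-th variable.
fdt = np.array([
    [1, 1, 0, 0, 1],   # h1 depends on x0, x1, x4
    [0, 1, 1, 0, 1],   # h2 depends on x1, x2, x4
    [0, 0, 0, 1, 1],   # h3 depends on x3, x4
], dtype=bool)

# A hypothetical alpha-decomposition: constraint blocks {h1, h2} and {h3}.
blocks = [[0, 1], [2]]

# A variable is local to a subproblem if only that block's constraints use
# it; variables shared by two or more blocks are the linking variables y_alpha.
used_by_block = np.array([fdt[rows].any(axis=0) for rows in blocks])
linking = used_by_block.sum(axis=0) > 1
print("alpha-linking variables:", np.flatnonzero(linking))  # -> [4]
```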
For a given vector dα ∈ R^nα, fixing the α-linking variables yα = dα in (2.1) results in the following Problem α:

Problem α: For each i = 1, . . . , pα,   (2.2)
    Min_{xαi}  fαi(dα, xαi)   subject to   hαi(dα, xαi) = 0,  gαi(dα, xαi) ≤ 0.

Problem α can be solved by solving pα independent uncoupled subproblems. Similarly, Problem β can be defined and solved for a β-decomposition after fixing the β-linking variables. The hierarchical overlapping coordination algorithm can be described as follows, for the case of two decompositions (α and β):
2.1 Generic HOC Algorithm
Step 1. Fix linking variables yα, and solve Problem α by solving the pα independent subproblems given in (2.2).

Step 2. Fix linking variables yβ to their values determined in Step 1, and solve Problem β by solving pβ independent subproblems.

Step 3. Go to Step 1 with the fixed values of the α-linking variables determined in Step 2.

Step 4. Repeat these steps until convergence is achieved.

Thus, in the HOC algorithm, the linking variables for one of the decompositions are fixed at values that result from the solution of a number of independent subproblems associated with the previous decomposition.
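A minimal sketch of this loop follows, assuming hypothetical helpers that are not from the authors' code: solve_alpha(x) returns a minimizer of Problem α with yα fixed at its current value (solving the pα subproblems internally, possibly in parallel), and solve_beta(x) does the same for Problem β; f is the overall objective. The stopping test mirrors the relative-difference rule used later in Section 5.2.

```python
def hoc(x0, f, solve_alpha, solve_beta, tol=1e-5, max_iter=100):
    """Generic HOC loop: alternate between Problem alpha and Problem beta
    until the relative change in the objective falls below tol."""
    x = x0
    f_prev = f(x)
    for _ in range(max_iter):
        x = solve_alpha(x)   # Step 1: fix y_alpha, solve the p_alpha subproblems
        x = solve_beta(x)    # Step 2: fix y_beta at the new values, solve Problem beta
        f_new = f(x)         # Steps 3-4: iterate until convergence
        if abs(f_prev - f_new) <= tol * max(1.0, abs(f_prev)):
            break
        f_prev = f_new
    return x
```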
2.2 Properties of HOC
The following properties of HOC were observed in Macko and Haimes [1], and can be proved in a straightforward manner:

1. If the HOC algorithm is started with a feasible point x0, then at each stage of the process, Problem α and Problem β will have nonempty feasible domains.

2. If the sequences {xαi}_{i=1}^∞ and {xβi}_{i=1}^∞ result from solving Problem α and Problem β, respectively, and f_min := min{f(x) | h(x) = 0, g(x) ≤ 0}, then
   (a) f(xαi) ≥ f(xβi) ≥ f(xα(i+1));
   (b) lim_{i→∞} f(xαi) = lim_{i→∞} f(xβi) = f* ≥ f_min.

3. Any accumulation point x* of either {xαi}_{i=1}^∞ or {xβi}_{i=1}^∞ solves both Problem α and Problem β.
3 Convergence Under Linear Constraints
In general, the accumulation point achieved in Step 4 of the Generic HOC Algorithm in Section 2.1 is not necessarily an optimal solution of problem (1.1). A condition that guarantees convergence of the HOC algorithm to an optimal solution will be referred to as an HOC convergence condition. For linearly constrained problems, several equivalent HOC convergence conditions were developed in an earlier work [3], one of them being notably efficient in a computational sense. This result for the case of linear constraints is reviewed in this section.

Consider the following optimization problem under linear equality and inequality constraints:

Min_x  f(x)   subject to   A^I x ≤ c^I  and  A^E x = c^E,   (3.1)

where f : R^n → R is convex and differentiable, A^I (A^E, resp.) is an mI × n (mE × n, resp.) constraint matrix with real entries, x ∈ R^n is the vector of optimization variables, and c^I ∈ R^mI (c^E ∈ R^mE, resp.) is a constant vector. Let A be the stacked matrix [A^I; A^E]. The problem is assumed to have a nonempty solution set.
The Generic HOC Algorithm described in Section 2.1 applied to problem (3.1) results in two sequences {xαi}_{i=1}^∞ and {xβi}_{i=1}^∞. For an accumulation point x* of {xαi}_{i=1}^∞ or {xβi}_{i=1}^∞, define Ta to be the set of the indices corresponding to the active inequality constraints, i.e.,

Ta := {i | (a^I_i1, . . . , a^I_in) x* = c^I_i},

where a^I_ij denotes the (i, j)-entry of the matrix A^I. Let Ā^I be the submatrix of A^I consisting of the active inequality constraints. Define the cone C(A) by

C(A) := { x | x = Σ_{i∈Ta} a_i v_i^I + Σ_{i=1}^{mE} b_i v_i^E,  a_i ≥ 0 },
where v_i^I (v_i^E, resp.) denotes the i-th row vector of A^I (A^E, resp.). Let Hα (Hβ, resp.) be the unique matrix such that Hα x = yα (Hβ x = yβ, resp.). That is, Hα and Hβ are the indicator matrices identifying the linking variables in the α- and β-decompositions, respectively. Also, define Kα, Kβ and Kαβ as follows:

Kα := [A; Hα],   Kβ := [A; Hβ],   Kαβ := [A; Hα; Hβ].
We now define the induced cones C(Kα) and C(Kβ) as follows:

C(Kα) := { x | x = Σ_{i∈Ta} a_i v_i^I + Σ_{i=1}^{mE} b_i v_i^E + Σ_{i=1}^{nα} s_i eα(i),  a_i ≥ 0 },

C(Kβ) := { x | x = Σ_{i∈Ta} a_i v_i^I + Σ_{i=1}^{mE} b_i v_i^E + Σ_{i=1}^{nβ} t_i eβ(i),  a_i ≥ 0 },

where e_i ∈ R^n is the i-th standard row vector, and α(i) (β(i), resp.) is the index for the i-th α-linking (β-linking, resp.) variable.

The Lagrange multiplier theorem for linear constraints [10, Proposition 3.4.1] states that x* ∈ R^n is a solution to problem (3.1) if and only if there exists a nonnegative vector λ^I and a vector λ^E such that

∇f^t(x*) + (Ā^I)^t λ^I + (A^E)^t λ^E = 0.   (3.2)

As in the case of only equality constraints, this result is valid even when x* is not a regular point [10, page 292].
Condition (3.2) is equivalent to

−∇f^t(x*) = (Ā^I)^t λ^I + (A^E)^t λ^E,  λ^I ≥ 0,   (3.3)

which can be rephrased as "−∇f^t(x*) belongs to the cone C(A)."

For fixed values of the α-linking variables yα = dα, Problem α can be defined as

Min_x  f(x)   subject to   A^I x ≤ c^I,  A^E x = c^E,  and  Hα x = dα.   (3.4)
Based on the above reasoning, xα* is a solution to Problem α if and only if "−∇f^t(xα*) belongs to the cone C(Kα)." Analogously, xβ* is a solution to Problem β if and only if "−∇f^t(xβ*) belongs to the cone C(Kβ)."

Theorem 3.1 Let x* be an accumulation point of {xαi}_{i=1}^∞ or {xβi}_{i=1}^∞. If

C(A) = C(Kα) ∩ C(Kβ),

then x* is a solution to the optimization problem in (3.1).

Proof: As explained in Section 2.2, x* solves both Problem α and Problem β. Therefore,

−∇f^t(x*) ∈ C(Kα)  and  −∇f^t(x*) ∈ C(Kβ).

Since C(A) = C(Kα) ∩ C(Kβ), one gets −∇f^t(x*) ∈ C(A), which implies that x* is a solution to the original optimization problem in (3.1). ∎
The HOC convergence condition stated in Theorem 3.1 cannot be used in practice, because one has to know a priori the accumulation point x* and the set Ta of active constraints in order to compute the cones C(A), C(Kα) and C(Kβ). Theorem 3.2 below fixes this problem and provides a new sufficient condition for the convergence of HOC that does not rely on the accumulation point x*.

Theorem 3.2 Let r be the rank of A = [A^I; A^E], and let Â be an r × n submatrix of A with full row rank. If the matrix

K̂αβ := [Â; Hα; Hβ]

has full row rank, then C(A) = C(Kα) ∩ C(Kβ).
Proof: Clearly, C(A) ⊂ C(Kα) and C(A) ⊂ C(Kβ). Therefore, C(A) ⊂ C(Kα) ∩ C(Kβ).

To show the reverse inclusion, choose an arbitrary v ∈ C(Kα) ∩ C(Kβ). Let v_1, . . . , v_r be the row vectors of Â, and let e_i ∈ R^n be the i-th standard row vector. Since v ∈ C(Kα) and v ∈ C(Kβ),

v = Σ_{i∈Ta} a_i v_i^I + Σ_{i=1}^{mE} b_i v_i^E + Σ_{i=1}^{nα} s_i eα(i),  a_i ≥ 0,

and

v = Σ_{i∈Ta} d_i v_i^I + Σ_{i=1}^{mE} e_i v_i^E + Σ_{i=1}^{nβ} t_i eβ(i),  d_i ≥ 0.

Therefore,

Σ_{i∈Ta} (a_i − d_i) v_i^I + Σ_{i=1}^{mE} (b_i − e_i) v_i^E + Σ_{i=1}^{nα} s_i eα(i) − Σ_{i=1}^{nβ} t_i eβ(i) = 0.   (3.5)

Since v_1, . . . , v_r form a basis for the row space of [A^I; A^E], there exist γ_1, . . . , γ_r ∈ R such that

Σ_{i∈Ta} (a_i − d_i) v_i^I + Σ_{i=1}^{mE} (b_i − e_i) v_i^E = Σ_{i=1}^{r} γ_i v_i.

Hence, (3.5) becomes

Σ_{i=1}^{r} γ_i v_i + Σ_{i=1}^{nα} s_i eα(i) − Σ_{i=1}^{nβ} t_i eβ(i) = 0.

Since

K̂αβ = (v_1, . . . , v_r, eα(1), . . . , eα(nα), eβ(1), . . . , eβ(nβ))^t

has full row rank, the vectors v_1, . . . , v_r, eα(1), . . . , eα(nα), eβ(1), . . . , eβ(nβ) are linearly independent. Therefore,

γ_i = 0,  s_j = 0,  t_k = 0   for all i = 1, . . . , r,  j = 1, . . . , nα,  k = 1, . . . , nβ,

and thus

v = Σ_{i∈Ta} a_i v_i^I + Σ_{i=1}^{mE} b_i v_i^E,  a_i ≥ 0.

This implies that v ∈ C(A). ∎

Theorem 3.2 combined with Theorem 3.1 immediately implies the following corollary.
Corollary 3.3 For problem (3.1), suppose α- and β-decompositions are given. Let {xαi}_{i=1}^∞ and {xβi}_{i=1}^∞ be the two sequences obtained by applying HOC to these decompositions, and let x* be an accumulation point of {xαi}_{i=1}^∞ or {xβi}_{i=1}^∞. If K̂αβ has full row rank, then x* is a solution to the optimization problem in (3.1).
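For linear constraints, the condition of Theorem 3.2 and Corollary 3.3 reduces to a rank computation. Below is a minimal numpy sketch (matrices invented for illustration); stacking all of A instead of a full-row-rank submatrix Â changes no ranks, since the rows of Â span the row space of A.

```python
import numpy as np

def hoc_condition_holds(A, H_alpha, H_beta):
    """True when K_hat_ab = [A_hat; H_alpha; H_beta] has full row rank,
    i.e., rank([A; H_alpha; H_beta]) = rank(A) + n_alpha + n_beta."""
    r = np.linalg.matrix_rank(A)
    K = np.vstack([A, H_alpha, H_beta])
    return np.linalg.matrix_rank(K) == r + len(H_alpha) + len(H_beta)

# Toy data: two constraints on four variables, one linking variable per
# decomposition (x1 for alpha, x2 for beta).
A = np.array([[1., 1., 0., 0.],
              [0., 0., 1., 1.]])
H_alpha = np.array([[0., 1., 0., 0.]])
H_beta = np.array([[0., 0., 1., 0.]])
print(hoc_condition_holds(A, H_alpha, H_beta))  # True
```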
4 Convergence Under Nonlinear Constraints
Consider a design problem that can be formulated as a convex optimization problem of the form

Min_x  f(x)   subject to   h(x) = 0,  g(x) ≤ 0,   (4.1)

where X ⊂ R^n is a nonempty open convex set, f : X → R and the g_i : X → R are convex and differentiable functions on X, and the h_i : X → R are affine functions on X.

The Jacobian of the constraint functions in problem (4.1) plays a role similar to that of the matrix [A^I; A^E] in the linear problem (3.1). Let J(x) be the stacked matrix [J^I(x); J^E(x)], where J^I(x) and J^E(x) are the Jacobians of g(x) and h(x), respectively. This matrix function J(x) will be referred to simply as the Jacobian of problem (4.1). Define the matrices Kα(x), Kβ(x) and Kαβ(x) by

Kα(x) := [J(x); Hα],   Kβ(x) := [J(x); Hβ],   Kαβ(x) := [J(x); Hα; Hβ].
For a fixed point p ∈ R^n, define Ta(p) to be the set of the indices corresponding to the active inequality constraints at p, i.e.,

Ta(p) := {i | g_i(p) = 0},

where g_i denotes the i-th inequality constraint. Let J̄^I(p) be the submatrix of J^I(p) consisting of the active inequality constraints at p. The cones C(J)(p), C(Kα)(p) and C(Kβ)(p) can be defined analogously to the linear case. Define the cone C(J)(p) by

C(J)(p) := { y ∈ R^n | y = Σ_{i∈Ta(p)} a_i v_i^I(p) + Σ_{i=1}^{mE} b_i v_i^E(p),  a_i ≥ 0 },
where v_i^I(p) (v_i^E(p), resp.) denotes the i-th row vector of J^I(p) (J^E(p), resp.). Also, define the induced cones C(Kα)(p) and C(Kβ)(p) as follows:

C(Kα)(p) := { x ∈ R^n | x = Σ_{i∈Ta(p)} a_i v_i^I(p) + Σ_{i=1}^{mE} b_i v_i^E(p) + Σ_{i=1}^{nα} s_i eα(i),  a_i ≥ 0 },

C(Kβ)(p) := { x ∈ R^n | x = Σ_{i∈Ta(p)} a_i v_i^I(p) + Σ_{i=1}^{mE} b_i v_i^E(p) + Σ_{i=1}^{nβ} t_i eβ(i),  a_i ≥ 0 }.
The Generic HOC Algorithm described in Section 2.1 applied to problem (4.1) results in two sequences {xαi}_{i=1}^∞ and {xβi}_{i=1}^∞. The Lagrange multiplier theorem for nonlinear constraints [10] states that a regular point x* ∈ R^n is a solution to problem (4.1) if and only if there exists a nonnegative vector λ^I and a vector λ^E such that

∇f^t(x*) + J̄^I(x*)^t λ^I + J^E(x*)^t λ^E = 0.   (4.2)

Condition (4.2) is equivalent to

−∇f^t(x*) = J̄^I(x*)^t λ^I + J^E(x*)^t λ^E,  λ^I ≥ 0,   (4.3)

which can be rephrased as "−∇f^t(x*) belongs to the cone C(J)(x*)."

For fixed values of the α-linking variables yα = dα, Problem α can be defined as

Min_x  f(x)   subject to   h(x) = 0,  g(x) ≤ 0,  and  Hα x = dα.   (4.4)
Based on the above reasoning, xα* is a solution to Problem α if and only if "−∇f^t(xα*) belongs to the cone C(Kα)(xα*)." Analogously, xβ* is a solution to Problem β if and only if "−∇f^t(xβ*) belongs to the cone C(Kβ)(xβ*)." Now the following theorem is immediate.

Theorem 4.1 Let x* be an accumulation point of {xαi}_{i=1}^∞ or {xβi}_{i=1}^∞. If x* is a regular point and

C(J)(x*) = C(Kα)(x*) ∩ C(Kβ)(x*),

then x* is a solution to the optimization problem in (4.1).

Proof: As explained in Section 2.2, x* solves both Problem α and Problem β. Therefore,

−∇f^t(x*) ∈ C(Kα)(x*)  and  −∇f^t(x*) ∈ C(Kβ)(x*).

Since C(J)(x*) = C(Kα)(x*) ∩ C(Kβ)(x*), one gets −∇f^t(x*) ∈ C(J)(x*), which implies that x* is a solution to the original optimization problem in (4.1). ∎

Theorem 4.2 Let r be the rank of J(x*) = [J^I(x*); J^E(x*)], and let Ĵ(x*) be an r × n submatrix of J(x*) with full row rank. If the matrix

K̂αβ(x*) := [Ĵ(x*); Hα; Hβ]

has full row rank, then C(J)(x*) = C(Kα)(x*) ∩ C(Kβ)(x*).

Proof: Clearly, C(J)(x*) ⊂ C(Kα)(x*) and C(J)(x*) ⊂ C(Kβ)(x*). Therefore, C(J)(x*) ⊂ C(Kα)(x*) ∩ C(Kβ)(x*).

To show the reverse inclusion, choose an arbitrary v ∈ C(Kα)(x*) ∩ C(Kβ)(x*). Let v_1, . . . , v_r be the row vectors of Ĵ(x*), and let e_i ∈ R^n be the i-th standard row vector. Since v ∈ C(Kα)(x*) and v ∈ C(Kβ)(x*),

v = Σ_{i∈Ta(x*)} a_i v_i^I + Σ_{i=1}^{mE} b_i v_i^E + Σ_{i=1}^{nα} s_i eα(i),  a_i ≥ 0,

and

v = Σ_{i∈Ta(x*)} d_i v_i^I + Σ_{i=1}^{mE} e_i v_i^E + Σ_{i=1}^{nβ} t_i eβ(i),  d_i ≥ 0.

Therefore,

Σ_{i∈Ta(x*)} (a_i − d_i) v_i^I + Σ_{i=1}^{mE} (b_i − e_i) v_i^E + Σ_{i=1}^{nα} s_i eα(i) − Σ_{i=1}^{nβ} t_i eβ(i) = 0.   (4.5)

Since v_1, . . . , v_r form a basis for the row space of Ĵ(x*), there exist γ_1, . . . , γ_r ∈ R such that

Σ_{i∈Ta(x*)} (a_i − d_i) v_i^I + Σ_{i=1}^{mE} (b_i − e_i) v_i^E = Σ_{i=1}^{r} γ_i v_i.

Hence, (4.5) becomes

Σ_{i=1}^{r} γ_i v_i + Σ_{i=1}^{nα} s_i eα(i) − Σ_{i=1}^{nβ} t_i eβ(i) = 0.

Since the matrix

K̂αβ(x*) = (v_1, . . . , v_r, eα(1), . . . , eα(nα), eβ(1), . . . , eβ(nβ))^t

has full row rank, its row vectors v_1, . . . , v_r, eα(1), . . . , eα(nα), eβ(1), . . . , eβ(nβ) are linearly independent. Therefore,

γ_i = 0,  s_j = 0,  t_k = 0   for all i = 1, . . . , r,  j = 1, . . . , nα,  k = 1, . . . , nβ,

and thus

v = Σ_{i∈Ta(x*)} a_i v_i^I + Σ_{i=1}^{mE} b_i v_i^E,  a_i ≥ 0.

This implies that v ∈ C(J)(x*). ∎

Theorem 4.2 combined with Theorem 4.1 immediately implies the following corollary.
Corollary 4.3 For problem (4.1), suppose α- and β-decompositions are given. Let {xαi}_{i=1}^∞ and {xβi}_{i=1}^∞ be the two sequences obtained by applying HOC to these decompositions, and let x* be an accumulation point of {xαi}_{i=1}^∞ or {xβi}_{i=1}^∞. If x* is a regular point and K̂αβ(x*) has full row rank, then x* is a solution to the optimization problem in (4.1).

Remark 4.4 K̂αβ(x*) has full row rank only if the sets of α- and β-linking variables are disjoint, and only if the rank of J(x*) plus the total number of linking variables is less than or equal to the total number of variables.
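A minimal sketch of verifying this condition numerically at a candidate point follows, with the constraint Jacobian formed by forward differences; the helper names and the step size are illustrative assumptions, not from the paper's implementation.

```python
import numpy as np

def numerical_jacobian(funcs, x, eps=1e-7):
    """Stack forward-difference gradients of the scalar functions in funcs."""
    f0 = np.array([f(x) for f in funcs])
    J = np.empty((len(funcs), x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (np.array([f(xp) for f in funcs]) - f0) / eps
    return J

def corollary_43_condition(x_star, g_funcs, h_funcs, H_alpha, H_beta):
    """True when K_hat_ab(x*) = [J_hat(x*); H_alpha; H_beta] has full row
    rank; as in the linear case, stacking all of J(x*) instead of a
    full-row-rank submatrix J_hat(x*) changes no ranks."""
    x_star = np.asarray(x_star, dtype=float)
    J = numerical_jacobian(list(g_funcs) + list(h_funcs), x_star)
    r = np.linalg.matrix_rank(J)
    K = np.vstack([J, H_alpha, H_beta])
    return np.linalg.matrix_rank(K) == r + len(H_alpha) + len(H_beta)
```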
HOC Algorithm for Nonlinear Convex Problems

Step 1. Apply the Generic HOC Algorithm starting at a point x0, for α- and β-decompositions that make K̂αβ(x0) have full row rank. Let x* be a resulting accumulation point.

Step 2. If K̂αβ(x*) has full row rank, then (by Corollary 4.3) conclude that x* is a solution to the original optimization problem in (4.1) and exit.

Step 3. If K̂αβ(x*) fails to have full row rank, then find new α- and β-decompositions that make K̂αβ(x*) have full row rank, and go to Step 1 with x* as the new starting point. If it is not possible to find appropriate α- and β-decompositions, then assume another starting point and go to Step 1, or exit.

This process is repeated until we reach a point x† such that K̂αβ(x†) has full row rank (or we reach a maximum number of iterations). Such a point x† is, by Corollary 4.3, a solution to the original optimization problem in (4.1). Valid decompositions are generated using the hypergraph-based model decomposition techniques [3, 9] described in Section 5.1.

Remark 4.5 The HOC convergence condition stated in Corollary 4.3 requires that one know the accumulation point x* in order to compute K̂αβ(x*). However, if the matrix function

K̂αβ(x) := [Ĵ(x); Hα; Hβ]

has full row rank for every x in the feasible domain, then K̂αβ(x*) also has full row rank, and HOC is convergent. The method used in the proof of [3, Theorem 3.5] can be used to show that K̂αβ(x) has full row rank everywhere if and only if there is no functional dependency exclusively among the α- and β-linking variables. This situation can be described in terms of elimination theory: in the case of polynomial constraints, let R[x1, . . . , xn] be the ring of polynomials in n variables, and let I be the ideal generated by the given constraints. Then the projection of I onto the subring R[xα(1), . . . , xα(nα), xβ(1), . . . , xβ(nβ)], consisting of polynomials in the α- and β-linking variables, should be trivial, i.e., contain no nonzero polynomial. The same argument can be made for the case of analytic constraints.
In the case of polynomial constraints, it can be checked computationally whether K̂αβ(x) has full row rank everywhere. Some symbolic computer algebra systems, e.g., Macaulay [11] and Singular [12], are capable of computing the projection of an ideal onto a subring.
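For instance, the following sympy sketch (with invented constraints; sympy stands in for Macaulay or Singular here) checks whether the constraint ideal contains a nonzero polynomial exclusively in the linking variables, using a lex Gröbner basis in which the local variables are eliminated first.

```python
import sympy as sp

x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')  # y1, y2: linking variables

# Hypothetical polynomial equality constraints h(x) = 0.
constraints = [x1 + y1 - 1, x1 - y1 + y2]

# Lex order with the local variables first: basis elements free of x1, x2
# generate the elimination (projection) ideal in the linking variables.
G = sp.groebner(constraints, x1, x2, y1, y2, order='lex')
dependencies = [g for g in G.exprs if g.free_symbols <= {y1, y2}]
print(dependencies)  # nonempty -> the HOC convergence condition is violated
```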
5 Computational Results
5.1 Obtaining Two Distinct Decompositions
Recall that the functional dependence table (FDT) is a Boolean matrix representing the dependence of design functions on variables. The (i, j)-th entry of the FDT is one if the i-th function depends on the j-th variable and zero otherwise. A decomposition of the given optimization problem can be achieved by reordering rows and columns of the FDT corresponding to the constraints and variables, respectively. The decomposition algorithm proposed in Michelena and Papalambros [9] uses a hypergraph representation of the design model, which is then optimally partitioned into weakly-connected subgraphs that can be identified with subproblems. Design variables are represented by the hypergraph edges, whereas design constraints interrelating these variables are represented by the nodes. These constraints may be given as algebraic equations, response surfaces, or look-up tables, or evaluated using simulation or analysis "black boxes." The formulation can account for computational demands and resources, by means of design constraint weighting coefficients and partition sizes, respectively, and for the strength of interdependencies between the analysis modules contained in the model, by means of design variable weighting coefficients.

The sufficient condition for convergence of HOC cannot easily be (and does not need to be) enforced at every point of the feasible space. As explained before, the condition will be enforced at the initial point and verified at the accumulation point(s). To solve a nonlinear optimization problem P by the HOC algorithm, two distinct (α-, β-) decompositions of P satisfying the sufficient condition for convergence of HOC at the initial point x0 (Corollary 4.3) can be found by the following heuristic (a sketch of the loop appears after the list):

1. Apply the hypergraph-based model decomposition algorithm to problem P to obtain an α-decomposition. (An implementation of this decomposition algorithm can be found at http://arc.engin.umich.edu/graph_part.html.)

2. In the process of obtaining a β-decomposition, penalize the α-linking variables so that the set of α-linking variables and the set of β-linking variables are disjoint, as required by the convergence condition in Corollary 4.3. (A variable is penalized when it is not desirable to have it as a linking variable; this can be achieved by assigning a high weight to the corresponding hyperedge in the model decomposition algorithm described in Michelena and Papalambros [9].) If the two sets of linking variables are not disjoint, then go back to Step 1 and obtain a new α-decomposition after penalizing the common linking variables.

3. Check whether the resulting α- and β-decompositions satisfy the convergence condition in Corollary 4.3. If the convergence condition is not satisfied, then go back to Step 1 and obtain a new α-decomposition after penalizing one of the interdependent linking variables.
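A minimal sketch of this penalize-and-repartition loop follows. Here partition(fdt, weights) stands in for the hypergraph-based decomposition algorithm [9] (returning constraint blocks and the set of linking-variable indices), and condition_holds stands in for a Corollary 4.3 check such as the one sketched in Section 4; both are assumptions, passed in as callables rather than implemented.

```python
import numpy as np

def two_decompositions(fdt, partition, condition_holds,
                       max_tries=10, penalty=1e3):
    """Search for alpha- and beta-decompositions with disjoint linking
    variables that satisfy the HOC convergence condition."""
    weights = np.ones(fdt.shape[1])            # hyperedge (variable) weights
    for _ in range(max_tries):
        alpha_blocks, alpha_link = partition(fdt, weights)         # Step 1
        beta_weights = weights.copy()
        beta_weights[list(alpha_link)] = penalty                   # Step 2
        beta_blocks, beta_link = partition(fdt, beta_weights)
        common = alpha_link & beta_link
        if common:                              # not disjoint: re-partition
            weights[list(common)] = penalty
            continue
        if condition_holds(alpha_link, beta_link):                 # Step 3
            return (alpha_blocks, alpha_link), (beta_blocks, beta_link)
        # Penalize one of the interdependent linking variables and retry.
        weights[list(alpha_link | beta_link)[:1]] = penalty
    raise RuntimeError("no valid pair of decompositions found")
```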
5.2 Example Problems
The HOC algorithm developed in this article has been applied to a family of nonlinear optimization problems of various sizes. The smallest problem, P1, has 25 variables and 21 constraints (19 linear equalities and 2 nonlinear inequalities) with a strictly convex, additive separable objective function. (Thus, the FDT of the P1 constraints is a 21 × 25 table.) The largest problem, P9, has 500 variables and 420 constraints (380 linear equalities and 40 nonlinear inequalities).

Figure 2 shows the reordered FDTs for the α- and β-decompositions obtained by applying the above decomposition heuristic to example problem P1. Maple [13] was used to verify that these two decompositions do satisfy the convergence condition in Corollary 4.3 for the initial point x0. The α-decomposition in Figure 2 has two subproblems and one linking variable (x13), whereas the β-decomposition has two subproblems and two linking variables (x3 and x9). While each of the two decompositions for P1 has two subproblems, each of the two decompositions for P9 has 40 subproblems.

Figure 2: Decompositions of example problem P1: (a) α-decomposition and (b) β-decomposition. (Each panel shows the reordered 21 × 25 Boolean FDT of design relations versus design variables.)

Once the α- and β-decompositions are determined, the subproblems have to be solved repeatedly. constr, the MATLAB [14] implementation of the Sequential Quadratic Programming (SQP) algorithm, was used for this purpose. The HOC iteration process stops when the relative difference between the values of the objective function for two consecutive iterations is less than a preset tolerance value; the tolerance used for the computations was 10^−5.

To compare the effectiveness of the HOC algorithm with the ordinary All-At-Once (AAO) approach (i.e., one not using decompositions), the problems were solved in both ways. Even though the original problem yields a sparse matrix, the individual subproblems in HOC need not be sparse. A sparse optimizer was not used with either approach, to ensure a fair comparison of performance. An AAO approach with sparse optimizers may turn out to be comparable in performance with HOC; nevertheless, HOC may increase the computational efficiency of any general-purpose optimizer in the case of sparse problems.

The results for P1 and the other problems of larger sizes are shown in Tables 1 and 2 for two different initial points. Runtime was measured in CPU seconds on a Sun UltraSparc 1. Runtimes only include constr function calls, excluding I/O and data transfer between the α- and β-decompositions. Each runtime represents the average over five separate runs of the algorithm; the times of the five runs were consistently close. Serial-runtime is measured for the HOC computation in which the subproblems are solved sequentially, whereas parallel-runtime is measured for the HOC computation in which the subproblems are simulated to be solved in parallel.
Table 1: Comparison of CPU-runtimes for nonlinear optimization problems of various sizes. Initial point x0 = −0.1.

                                 All-At-Once           Hierarchical Overlapping Coordination
prob   no.   no.     no.     objective  runtime†   objective  serial-    parallel-  no. HOC
       var   constr  subpr                                    runtime†   runtime†   iterations
P1     25    21      2       3.34092    1.847      3.34092    3.045      2.820      3
P2     50    42      4       6.68185    6.067      6.68185    5.997      2.790      3
P3     75    63      6       10.0227    16.85      10.0227    9.032      2.800      3
P4     100   84      8       13.3637    29.59      13.3637    12.03      2.812      3
P5     125   105     10      16.7046    59.03      16.7046    15.07      2.827      3
P6     200   168     16      26.7274    333        26.7274    24.13      2.832      3
P7     250   210     20      33.4092    1446       33.4092    30.18      2.835      3
P8     375   315     30      50.1139    6904       50.1139    45.22      2.832      3
P9     500   420     40      66.8185    20539      66.8185    60.77      2.890      3

†Runtime is measured in CPU seconds on a Sun UltraSparc 1.
Table 2: Comparison of CPU-runtimes for nonlinear optimization problems of various sizes. Initial point x0 = 0.

                                 All-At-Once           Hierarchical Overlapping Coordination
prob   no.   no.     no.     objective  runtime†   objective  serial-    parallel-  no. HOC
       var   constr  subpr                                    runtime†   runtime†   iterations
P1     25    21      2       3.34092    2.860      3.34092    1.942      1.867      1
P2     50    42      4       6.68185    11.41      6.68185    3.867      1.877      1
P3     75    63      6       10.0227    25.57      10.0227    5.782      1.870      1
P4     100   84      8       13.3637    52.46      13.3637    7.697      1.867      1
P5     125   105     10      16.7046    96.85      16.7046    9.662      1.872      1
P6     200   168     16      26.7274    818        26.7274    15.47      1.887      1
P7     250   210     20      33.4092    1701       33.4092    19.32      1.885      1
P8     375   315     30      50.1139    10869      50.1139    28.85      1.877      1
P9     500   420     40      66.8187    27662      66.8185    38.99      1.910      1

†Runtime is measured in CPU seconds on a Sun UltraSparc 1.
5.3 Discussion
HOC has shorter parallel and serial runtimes than AAO for all problems and initial points, except for problem P1 with initial point x0 = −0.1 := −(0.1, 0.1, . . . , 0.1). Moreover, for all problem sizes and initial points, the HOC algorithm did not require re-partitioning of the model as dictated by Step 3 of the algorithm; that is, the condition for convergence was satisfied both at the initial points and at the first-generated accumulation points. AAO converges faster when started with x0 = −0.1 as the initial point, whereas HOC converges faster when started with x0 = 0 as the initial point.

Note that HOC for P9 has parallel- and serial-runtimes that are 7100–14500 and 340–710 times shorter than the AAO runtimes, respectively, depending on the initial point. Surprisingly, HOC terminates after three iterations (for x0 = −0.1) and one iteration (for x0 = 0), regardless of problem size. HOC serial-runtimes vary linearly with the size of the problem, whereas parallel-runtimes remain about constant. This is expected, since subproblem sizes are similar for all nine problems. AAO runtimes seem to vary polynomially (3rd or 4th order) with the size of the problem.

HOC serial wall-clock times, which include data transfer between the α- and β-decompositions, were significantly shorter than their AAO counterparts. For example, for x0 = −0.1, P9 was solved in less than 2 minutes by HOC, while its AAO solution took over 7 1/2 hours on the same workstation. This is also true for the smaller problems: for example, HOC took 11.9 and 35.6 seconds to solve P3 and P6, respectively, while AAO took 17.3 and 363 seconds to solve the same problems. This clearly demonstrates the advantage of using HOC for large-scale problems with many loosely-linked subproblems.
6 Conclusions
Hierarchical overlapping coordination takes advantage of two or more decompositions of an optimization model to coordinate the solution of the underlying optimization problem. Each decomposition effects smaller subproblems whose separate solutions result in the solution of the original problem with possible runtime savings. Additional savings could be achieved by implementing the algorithm on a parallel computer. Convergence of the algorithm depends on the way the model decompositions interact with each other. In this article, we present a sufficient condition for convergence that can be computationally verified for decompositions of a nonlinear convex problem. This condition also guarantees convergence to a stationary point in the case of more general smooth, nonlinear optimization problems. Moreover, the condition together with model partitioning methods can help in generating appropriate problem decompositions.
Acknowledgment

This research was supported by the U.S. Army Automotive Research Center for Modeling and Simulation of Ground Vehicles under contract No. DAAE07-94-C-R094, and by the Oakland University Foundation. This support is gratefully acknowledged.
References

[1] Macko, D. and Haimes, Y., "Overlapping coordination of hierarchical structures," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-8, pp. 745–751, 1978.

[2] Shima, T. and Haimes, Y., "The convergence properties of hierarchical overlapping coordination," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-14, pp. 74–87, February 1984.

[3] Park, H., Michelena, N., Kulkarni, D., and Papalambros, P., "Convergence criteria for hierarchical overlapping coordination under linear constraints," Computational Optimization and Applications, 1998 (submitted).

[4] Wagner, T. and Papalambros, P., "A general framework for decomposition analysis in optimal design," in Advances in Design Automation (Gilmore, B., ed.), vol. 2, (New York), pp. 315–322, ASME, 1993.

[5] Sobieszczanski-Sobieski, J. and Haftka, R. T., "Multidisciplinary aerospace design optimization: Survey of recent developments," Structural Optimization, vol. 14, pp. 1–23, 1997.

[6] Nelson, S. and Papalambros, P., "Sequentially Decomposed Programming," AIAA Journal, vol. 35, pp. 1209–1216, July 1997.

[7] Sobieszczanski-Sobieski, J., "Optimization by decomposition: A step from hierarchic to non-hierarchic systems," presented at the Second NASA/Air Force Symposium on Recent Advances in Multidisciplinary Analysis and Optimization, (Hampton, VA), September 1988. NASA CP-3031, Part 1. Also NASA TM-101494.

[8] Braun, R. and Kroo, I., "Development and application of the collaborative optimization architecture in a multidisciplinary design environment," in Multidisciplinary Design Optimization: State of the Art (Alexandrov, N. and Hussaini, M., eds.), pp. 98–116, SIAM, 1997.

[9] Michelena, N. and Papalambros, P., "A hypergraph framework to optimal model-based decomposition of design problems," Computational Optimization and Applications, vol. 8, pp. 173–196, September 1997.

[10] Bertsekas, D., Nonlinear Programming. Belmont, Massachusetts: Athena Scientific, 1995.

[11] Bayer, D. and Stillman, M., Macaulay: A computer algebra system for algebraic geometry, 1995. Available by anonymous ftp from ftp.math.harvard.edu.

[12] Greuel, G., Pfister, G., and Schönemann, H., "Singular Reference Manual," in Reports On Computer Algebra, no. 12, Centre for Computer Algebra, University of Kaiserslautern, May 1997. http://www.mathematik.uni-kl.de/~zca/Singular.

[13] Nicolaides, R. and Walkington, N., Maple: A Comprehensive Introduction. New York: Cambridge University Press, 1996.

[14] The MathWorks, Inc., Natick, MA, Using MATLAB, version 5.1 ed., 1997.