Solving Linear and Quadratic Matrix Equations on Distributed Memory Parallel Computers

Peter Benner
Zentrum für Technomathematik, Fachbereich 3 - Mathematik und Informatik, Universität Bremen, D-28334 Bremen (Germany)
[email protected]

Enrique S. Quintana-Ortí, Gregorio Quintana-Ortí
Departamento de Informática, Universidad Jaime I, 12080 Castellón (Spain)
[email protected]

Abstract

We discuss the parallel implementation of numerical solution methods for linear and quadratic matrix equations occurring frequently in control theory. In particular, we consider equations related to the analysis and synthesis of continuous-time, linear time-invariant control systems: the Sylvester equation, the Lyapunov equation, and the continuous-time algebraic Riccati equation. We assume the coefficient matrices to be dense and the state-space dimension to be roughly of order $10^3$-$10^4$. For such problem classes, methods based on the sign function and related methods prove to be very efficient and usually outperform methods based on the QR or QZ algorithms even in sequential computing environments. We discuss the implementation of these methods on distributed memory parallel computers employing MPI and ScaLAPACK.

1 Introduction

In recent years, many new and reliable numerical methods have been developed for the analysis and synthesis of moderate-size (continuous-time) linear time-invariant control systems (LCS), given here in state-space form

$$\begin{aligned} E\dot{x}(t) &= Ax(t) + Bu(t), \quad t > t_0, \quad x(t_0) = x_0, \\ y(t) &= Cx(t) + Du(t), \quad t \ge t_0, \end{aligned} \tag{1}$$

where $A, E \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{p \times n}$, and $D \in \mathbb{R}^{p \times m}$. Here we assume that $E$ is nonsingular. Descriptor systems with singular $E$ also lead, after appropriate transformations, to the above problem


formulation with a nonsingular (and usually diagonal or triangular) matrix $E$; see, e.g., [20]. Nevertheless, problems arising from modeling large flexible space structures, interconnected networks of power systems, chemical engineering, etc., lead to large-scale LCS, and new methods and tools are in demand for these.

Linear and quadratic matrix equations are the basic computational tools used in most analysis, design, and synthesis methods for LCS, such as linear-quadratic Gaussian optimal control, robust control with $H_2$ and $H_\infty$ performance criteria, Kalman filtering, model reduction, etc.; see, e.g., [1, 18, 20, 14, 23] and many more. The most frequently solved equations in this area are

• Sylvester equations
$$A^T X F + E^T X G + Q = 0, \tag{2}$$
where $A, E \in \mathbb{R}^{n \times n}$, $F, G \in \mathbb{R}^{m \times m}$, $Q, X \in \mathbb{R}^{n \times m}$;

• Lyapunov equations
$$A^T X E + E^T X A + Q = 0, \tag{3}$$
where $A, E$ are as in (1) and $Q, X \in \mathbb{R}^{n \times n}$ are symmetric;

• algebraic Riccati equations (ARE)
$$A^T X E + E^T X A - E^T X R X E + Q = 0, \tag{4}$$
where $A, E, Q, X$ are as in (3) and $R \in \mathbb{R}^{n \times n}$ is symmetric.

Lyapunov equations are special instances of both Sylvester and algebraic Riccati equations (set $F = E$ and $G = A$ in (2), or $R = 0$ in (4)). We focus here on the equations in the above form as they arise in continuous-time problems; similar ideas apply to the linear and quadratic matrix equations that arise in discrete-time problems.

The need for parallel computing in this area can be seen from the fact that already for a system with state-space dimension $n = 1000$, (3) or (4) represents a set of linear or quadratic equations with $n(n+1)/2 = 500500$ unknowns (having already exploited the symmetry of $X$). Systems of such a dimension driven by ordinary differential(-algebraic) equations are not uncommon in chemical engineering applications, are standard for second order systems arising from modeling mechanical multibody systems or large flexible space structures, and represent rather coarse grids when derived from the discretization of a partial differential equation. We assume here that the coefficient matrices are dense; if sparsity of the matrices is to be exploited, other computational techniques have to be employed.

The traditional approach for solving the above matrix equations relies on the computation of invariant/deflating subspaces by means of the QR/QZ algorithms; see, e.g., [13, 24]. In order to solve the above linear matrix equations, the QR algorithm can only be used if, in (2), either $F = I_m$ and $E = I_n$ or $A = I_n$ and $G = I_m$, or if $E = I_n$ in (3). (Of course, the roles of $A, E$ and $G, F$ in (2), or of $A$ and $E$ in (3), may be swapped.) In all other cases, the QZ algorithm has to be employed as the initial stage when solving these equations via the most frequently used Hessenberg-Schur and Bartels-Stewart methods; see [11, 21] and the references therein. When solving (4) by the (generalized) Schur vector method [19, 3], the QR (QZ) algorithm is again applied, namely to the corresponding Hamiltonian matrix (Hamiltonian/skew-Hamiltonian matrix pencil). When iteratively refining an approximate solution, or even when solving (4) directly using Newton's method [3, 17, 18, 20], a (generalized) Lyapunov equation of the form (3) has to be solved in each iteration step. Thus a parallel implementation of Newton's method also depends heavily on the parallel performance of the Lyapunov solver employed, i.e., if the Bartels-Stewart method is to be used, once more on the efficiency of the parallelized QR/QZ algorithms.

From the above considerations we conclude that in order to use the traditional algorithms for solving linear and quadratic matrix equations, efficient parallelizations of the QR and QZ algorithms are necessary. However, several experimental studies report difficulties in parallelizing the double implicit shifted QR algorithm on parallel distributed multiprocessors. These are often based on block scattered distributions of the

matrices involved; attempts to increase the granularity employ multishift techniques. A different approach relies on a block Hankel distribution, which improves the balancing of the computational load. Nevertheless, the parallelism and scalability of these parallel QR algorithms are still far from those of matrix multiplications, matrix factorizations, triangular linear system solvers, etc.; see, e.g., [9] and the references given therein. Still, the parallelization of the QR algorithm has already been studied thoroughly, in contrast to that of the QZ algorithm: we are not aware of any parallel implementation of the latter so far, probably due to its higher complexity. Moreover, since both the QR and the QZ algorithm are composed of the same type of fine-grain computations, similar or even worse parallelism and scalability results are to be expected for the QZ algorithm.

In order to avoid the problems arising from the difficult parallelization of the QR and QZ algorithms, we use a different computational approach which is more suitable for parallel computation. It is well known that, under suitable assumptions, the above matrix equations can be solved via the sign function method, and it has long been acknowledged that algorithms based on the sign function are relatively easy to parallelize; for an overview of different aspects of such algorithms and further references, see [16]. Therefore, we build our algorithms for solving (2)-(3) on matrix sign function computations. We will see that these algorithms are easily implemented using simple building blocks for linear systems, and benefit from the recent development of the parallel linear algebra libraries PBLAS and ScaLAPACK [9]. Preliminary results show that, using the matrix sign function and the ScaLAPACK library, it is possible to solve linear and quadratic matrix equations of order up to $O(10^4)$ efficiently on symmetric multiprocessors (SMP). Furthermore, the scalability of the solvers based on the sign function approach ensures that, by increasing the number of processors, we can deal with equations associated with very large-scale LCS.

2 Basic Computational Tools

The sign function method was first introduced in 1971 by Roberts [22] for solving algebraic Riccati equations of the form (4) with $E = I_n$. Roberts also showed how to solve stable Sylvester and Lyapunov equations via the matrix sign function. The application to generalized algebraic Riccati equations with $E \ne I_n$ is investigated in [10], while the application to (3) with $E \ne I_n$ is examined in [6].

The computation of the sign function is based on basic numerical linear algebra tools such as matrix multiplication, matrix inversion, and the solution of linear systems. These computations can be implemented efficiently on most parallel architectures; in particular, ScaLAPACK [9] provides easy-to-use and portable computational kernels for these operations. Hence, the sign function method is an appropriate tool for designing and implementing efficient and portable numerical software for distributed memory parallel computers. The application of the matrix sign function method to a matrix pencil $Z - \lambda Y$ with $Z$ and $Y$ nonsingular, as given in [10], can be presented as

$$Z_0 := Z; \qquad Z_{k+1} := \frac{1}{2c_k}\left(Z_k + c_k^2\, Y Z_k^{-1} Y\right), \quad k = 0, 1, 2, \ldots, \tag{5}$$

where $c_k$ is a scaling parameter. For determinantal scaling, $c_k = \left(|\det(Z_k)| / |\det(Y)|\right)^{1/n}$ [10]. This iteration is equivalent to computing the sign function of the matrix $Y^{-1}Z$ via the standard Newton iteration as proposed in [22]. The property needed here is that if $Z_\infty := \lim_{j\to\infty} Z_j$, then $Z_\infty - Y$ (or $Z_\infty + Y$) defines the skew projection onto the stable (or anti-stable) right deflating subspace of $Z - \lambda Y$ parallel to the anti-stable (or stable) deflating subspace.
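To make the scheme concrete, a minimal serial sketch of iteration (5) in NumPy could look as follows. It is an illustration of the scheme from [10], not the parallel ScaLAPACK implementation discussed later, and it evaluates the determinantal scaling via log-determinants to avoid over/underflow:

```python
import numpy as np

def sign_pencil(Z, Y, tol=1e-12, maxit=50):
    """Generalized Newton iteration (5) for the pencil Z - lambda*Y [10].

    Serial NumPy sketch: returns Z_inf = lim Z_k, so that Z_inf - Y
    (resp. Z_inf + Y) defines the skew projection onto the stable
    (resp. anti-stable) right deflating subspace of the pencil.
    """
    n = Z.shape[0]
    _, logdet_Y = np.linalg.slogdet(Y)
    Zk = Z.copy()
    for _ in range(maxit):
        # determinantal scaling c_k = (|det Z_k| / |det Y|)^(1/n),
        # computed from log-determinants to avoid over/underflow
        _, logdet_Zk = np.linalg.slogdet(Zk)
        ck = np.exp((logdet_Zk - logdet_Y) / n)
        Znext = (Zk / ck + ck * (Y @ np.linalg.solve(Zk, Y))) / 2.0
        if np.linalg.norm(Znext - Zk, 1) <= tol * np.linalg.norm(Znext, 1):
            return Znext
        Zk = Znext
    return Zk
```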

In [10] the iteration (5) is used to compute a particular solution of the ARE (4). One is usually interested in the stabilizing solution $X$, i.e., the unique solution for which $\Lambda(A - RXE, E) \subset \mathbb{C}^-$. It is known [10, 18, 20] that if such a solution exists, then it is unique and the columns of $[I_n, -E^T X]^T$ span the stable deflating subspace of the matrix pencil

$$H - \lambda K := \begin{bmatrix} A & R \\ Q & -A^T \end{bmatrix} - \lambda \begin{bmatrix} E & 0 \\ 0 & E^T \end{bmatrix}. \tag{6}$$

Therefore, (4) can be solved by applying (5) to $H - \lambda K$ and then forming the resulting projector $H_\infty - K$ onto the stable deflating subspace of $H - \lambda K$. A basis of this subspace is then given by the range of that projector. If this basis is given by the columns of $[U^T, V^T]^T$ with $U, V \in \mathbb{R}^{n \times n}$, then $U$ is invertible and the solution of (4) is obtained by solving $XEU = -V$ for $X$. (Note that in practice, $X$ is usually obtained from $H_\infty - K$ directly without computing $U, V$; see, e.g., [10, 18] for details.)
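Assuming the sign conventions of (6) as reconstructed above, the whole procedure can be sketched as follows. The sketch takes the explicit-basis route for clarity, builds on sign_pencil from the previous sketch, and is not the authors' implementation:

```python
import numpy as np

def solve_care_sign(A, E, R, Q):
    """Sketch: solve ARE (4) by applying (5) to the pencil (6) and
    reading X off a basis of the stable deflating subspace.

    Illustrative only; in practice X is obtained from H_inf - K
    directly [10, 18]. Sign conventions as reconstructed in (6).
    """
    n = A.shape[0]
    Zn = np.zeros((n, n))
    H = np.block([[A, R], [Q, -A.T]])
    K = np.block([[E, Zn], [Zn, E.T]])
    H_inf = sign_pencil(H, K)          # from the previous sketch
    # skew projector onto the stable right deflating subspace:
    # P = (I - K^{-1} H_inf) / 2 has rank n
    P = 0.5 * (np.eye(2 * n) - np.linalg.solve(K, H_inf))
    B = np.linalg.svd(P)[0][:, :n]     # orthonormal basis of range(P)
    U, V = B[:n, :], B[n:, :]
    # basis columns satisfy V = -X E U, i.e. X (E U) = -V
    X = np.linalg.solve((E @ U).T, -V.T).T
    return 0.5 * (X + X.T)             # enforce symmetry
```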

The Lyapunov equation (3) is a special case of the ARE (4). This implies that one can solve (3) by means of the sign function method applied to the matrix pencil in (6), which then takes the form

$$H - \lambda K = \begin{bmatrix} A & 0 \\ Q & -A^T \end{bmatrix} - \lambda \begin{bmatrix} E & 0 \\ 0 & E^T \end{bmatrix}. \tag{7}$$

For stable systems, i.e., $\Lambda(A, E) \subset \mathbb{C}^-$, the pencil $H - \lambda K$ in (7) is regular and has an $n$-dimensional stable deflating subspace, so that the solution of (3) can be obtained analogously to that of (4). In [6] it is observed that, applying the generalized Newton iteration (5) to the matrix pencil $H - \lambda K$ in (7) and exploiting the block-triangular structure of all matrices involved, (5) boils down to

$$A_0 := A, \quad Q_0 := Q; \qquad \begin{aligned} A_{k+1} &:= \frac{1}{2c_k}\left(A_k + c_k^2\, E A_k^{-1} E\right), \\ Q_{k+1} &:= \frac{1}{2c_k}\left(Q_k + c_k^2\, E^T A_k^{-T} Q_k A_k^{-1} E\right), \end{aligned} \quad k = 0, 1, 2, \ldots, \tag{8}$$

and that $X = \frac{1}{2} E^{-T} \left(\lim_{k\to\infty} Q_k\right) E^{-1}$. In case $E = I_n$, the iteration (8) had already been derived by Roberts [22].

When refining a computed solution of the ARE, or when solving (4) directly using Newton's method, a Lyapunov equation of the form (3) has to be solved in each iteration step (see, e.g., [18, 20, 24] and the references therein). Hence the above approach to solving Lyapunov equations can also be used for implementing Newton's method on parallel computers. Note that, apart from solving Lyapunov equations, all other computations required by Newton's method basically consist of matrix multiplications and can therefore be implemented efficiently on parallel computers.

In case $\Lambda(A, E) \subset \mathbb{C}^-$ and $\Lambda(G, F) \subset \mathbb{C}^-$, the Sylvester equation (2) can also be solved using the sign function method, applied to

$$H - \lambda K := \begin{bmatrix} G & 0 \\ Q & -A^T \end{bmatrix} - \lambda \begin{bmatrix} F & 0 \\ 0 & E^T \end{bmatrix}. \tag{9}$$

Using again the block-triangular structure of the matrix pencil $H - \lambda K$, the iteration can be performed on the blocks as follows:

$$A_0 := A, \quad G_0 := G, \quad Q_0 := Q; \qquad \begin{aligned} A_{k+1} &:= \frac{1}{2c_k}\left(A_k + c_k^2\, E A_k^{-1} E\right), \\ G_{k+1} &:= \frac{1}{2c_k}\left(G_k + c_k^2\, F G_k^{-1} F\right), \\ Q_{k+1} &:= \frac{1}{2c_k}\left(Q_k + c_k^2\, E^T A_k^{-T} Q_k G_k^{-1} F\right), \end{aligned} \quad k = 0, 1, 2, \ldots \tag{10}$$

The solution of (2) is then given by the solution of the linear system of equations $E^T X F = \frac{1}{2} \lim_{k\to\infty} Q_k$. In case $E = I_n$ (and $F = I_m$ for (2)), other iterative schemes for computing the sign function, such as the Newton-Schulz iteration or Halley's method, can also be implemented efficiently to solve Lyapunov and Sylvester equations.
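For illustration, a serial sketch of the coupled iteration (8) follows. The specialization of the determinantal scaling to the blocks, $c_k = (|\det A_k| / |\det E|)^{1/n}$, and the stopping test based on $A_k \to -E$ for a stable pencil are assumptions consistent with (5):

```python
import numpy as np

def solve_glyap_sign(A, E, Q, tol=1e-12, maxit=50):
    """Iteration (8) for the stable generalized Lyapunov equation
    A^T X E + E^T X A + Q = 0 with Lambda(A, E) in the open left
    half plane. Serial NumPy sketch of the scheme from [6] that
    underlies PDGGCLNW; the Sylvester iteration (10) is analogous,
    adding a G_k recurrence and solving E^T X F = Q_inf / 2.
    """
    n = A.shape[0]
    Ak, Qk = A.copy(), Q.copy()
    _, logdet_E = np.linalg.slogdet(E)
    for _ in range(maxit):
        _, logdet_A = np.linalg.slogdet(Ak)
        ck = np.exp((logdet_A - logdet_E) / n)   # determinantal scaling
        AinvE = np.linalg.solve(Ak, E)           # A_k^{-1} E
        Qk = (Qk / ck + ck * (AinvE.T @ Qk @ AinvE)) / 2.0
        Ak = (Ak / ck + ck * (E @ AinvE)) / 2.0
        if np.linalg.norm(Ak + E, 1) <= tol * np.linalg.norm(E, 1):
            break                                # A_k converges to -E
    T = np.linalg.solve(E.T, Qk)                 # E^{-T} Q_inf
    X = 0.5 * np.linalg.solve(E.T, T.T).T        # (E^{-T} Q_inf) E^{-1}
    return 0.5 * (X + X.T)
```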

3 Implementation and Performance Results

Our parallel algorithms are implemented using ScaLAPACK (Scalable Linear Algebra PACKage) [9], a public-domain parallel library for MIMD computers which provides scalable parallel distributed versions of many of the matrix algebra kernels available in LAPACK [2]. The ScaLAPACK library employs BLAS and LAPACK for serial computations, PB-BLAS (parallel block BLAS) for parallel basic matrix algebra computations, and BLACS (basic linear algebra communication subprograms) for communication. The efficiency of the ScaLAPACK kernels depends on the efficiency of the underlying computational BLAS/PB-BLAS routines and of the BLACS communication library. The BLACS can be used on any machine which supports either PVM [12] or MPI [15], thus providing a highly portable environment. In ScaLAPACK, the user is responsible for distributing the data among the processes; access to data stored in a different process must be explicitly requested and provided via message-passing. ScaLAPACK employs a block-cyclic distribution scheme [9], which is mapped onto a logical $p_r \times p_c$ grid of processes. Each process owns a collection of $MB \times NB$ blocks, which are stored locally and contiguously in a two-dimensional array in "column-major" order (see Figure 1).


Figure 1: Block-cyclic data storage scheme of an $8 \times 9$ matrix, $p_r \times p_c = 2 \times 3$ and $MB \times NB = 2 \times 2$.
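The ownership rule underlying Figure 1 is easy to state in code. The following helper is hypothetical (it is not a ScaLAPACK routine) and merely illustrates the block-cyclic mapping:

```python
def block_cyclic_owner(i, j, mb, nb, pr, pc):
    """Return the (process row, process column) owning global entry
    (i, j), 0-based, of a matrix distributed in MB x NB blocks over a
    pr x pc grid, following the block-cyclic convention of Figure 1.
    Illustrative helper, not part of ScaLAPACK."""
    return (i // mb) % pr, (j // nb) % pc

# Entry a_31 of Figure 1, i.e. (i, j) = (2, 0) in 0-based indexing,
# lands on process P10 for MB = NB = 2 and a 2 x 3 grid:
assert block_cyclic_owner(2, 0, 2, 2, 2, 3) == (1, 0)
```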

For scalability purposes, we only employ square topologies ($p_r = p_c =: p$), with each process mapped onto a different processor. The experiments were performed using Fortran 77 and IEEE double-precision arithmetic ($\varepsilon \approx 2.2 \times 10^{-16}$). We tested our routines on an IBM SP2 and a Cray T3E; on both platforms we made use of the vendor-supplied BLAS and the LAPACK library [2], with the BLACS installed on top of MPI. The IBM SP2 we used consists of 80 RS/6000 nodes at 120 MHz, with 256 MBytes of RAM per processor; internally, the nodes are connected by a TB3 high-performance switch. The Cray T3E-600 has 60 DEC Alpha EV5 nodes at 300 MHz, with 128 MBytes of RAM per processor; its communication network has a two-dimensional torus topology.

Here we report performance results for the following codes:

• PDGECLNW: the standard Newton iteration ((8) with $E = I_n$) for the Lyapunov equation (3) with $E = I_n$.
• PDGGCLNW: the generalized Newton iteration (8) for the generalized Lyapunov equation (3).
• PDGECRNW: Newton's method for the ARE (4) with $E = I_n$, using PDGECLNW for solving the Lyapunov equations.
• PDGGCRNW: Newton's method for the ARE (4), using PDGGCLNW for solving the generalized Lyapunov equations.

For more details on accuracy, algorithms, and implementation, see [4, 5, 6, 7]. For the naming convention and a survey of other subroutines implemented in this area, see [8].
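At a high level, the ARE solvers PDGECRNW and PDGGCRNW follow the Newton scheme sketched below. This serial outline uses the standard one-step formulation of the Newton step as a Lyapunov equation [17, 18, 20] with a pluggable Lyapunov solver; it is a sketch of the structure, not the parallel code:

```python
import numpy as np

def newton_care(A, E, R, Q, solve_glyap, X0=None, tol=1e-10, maxit=30):
    """Outline of Newton's method for the ARE (4): every step solves
    one (generalized) Lyapunov equation.

    `solve_glyap(A, E, Q)` may be any solver for
    A^T X E + E^T X A + Q = 0, e.g. solve_glyap_sign from Section 2.
    """
    n = A.shape[0]
    X = np.zeros((n, n)) if X0 is None else X0.copy()
    for _ in range(maxit):
        Ak = A - R @ X @ E                 # closed-loop matrix A - R X_k E
        Qk = Q + E.T @ X @ R @ X @ E       # constant term of the Newton step
        Xnew = solve_glyap(Ak, E, Qk)      # A_k^T X E + E^T X A_k + Q_k = 0
        if np.linalg.norm(Xnew - X, 1) <= tol * max(np.linalg.norm(Xnew, 1), 1.0):
            return Xnew
        X = Xnew
    return X
```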

Example 1. We generated the coefficient matrices

$$A = V_n\, \mathrm{diag}(\alpha_1, \ldots, \alpha_n)\, W_n, \qquad E = V_n W_n,$$

where the scalars $\alpha_1, \ldots, \alpha_n$ are uniformly distributed in $[-10, 0)$, $W_n$ is an $n \times n$ lower triangular matrix with all unit entries, and $V_n$ is an $n \times n$ matrix with unit entries on and below the anti-diagonal and all other entries equal to zero. For the standard case ($E = I_n$), we choose $V_n := W_n^{-1}$. Then, $C$ was constructed as a random $r \times n$ matrix and $Q$ was set to $C^T C$; we only used integer entries for $C$ to avoid a loss of numerical accuracy when $Q$ is constructed.

The accuracy for this example, as measured by residuals, is as good as can be expected from the conditioning of the problem and equals that of the generalized Bartels-Stewart method as described in [11, 21].

Figure 2 reports the execution times of PDGGCLNW for different numbers of processors and varying problem size. The figure shows a good speed-up; e.g., the size of the problems that can be solved in a fixed time increases almost as expected from theoretical considerations when the number of processors is increased.

Figure 3 reports the Mflop ratio per node for PDGECLNW on the Cray T3E and for PDGGCLNW on the IBM SP2 when the number of nodes is increased and the ratio $n/p$ is kept constant; here, $n$ is the problem size and $p$ is the number of rows/columns in the square grid of processors. Thus, we are measuring the scalability of our parallel routines. The results in the figures are averaged over five executions on different randomly generated matrices. In these figures, the solid line indicates the maximum attainable real performance (that of DGEMM) and the dashed line represents the performance of the corresponding linear matrix equation solver.
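The construction of Example 1 can be sketched as follows; the integer range used for $C$ is not stated in the text and is an assumption:

```python
import numpy as np

def example1_matrices(n, r, seed=0):
    """Test data of Example 1 (illustrative reconstruction; the integer
    range used for C is assumed)."""
    rng = np.random.default_rng(seed)
    alpha = rng.uniform(-10.0, 0.0, size=n)   # spectrum of (A, E) in [-10, 0)
    W = np.tril(np.ones((n, n)))              # lower triangular, unit entries
    V = np.fliplr(W)                          # ones on/below the anti-diagonal
    A = V @ np.diag(alpha) @ W
    E = V @ W
    C = rng.integers(-10, 11, size=(r, n)).astype(float)  # integer entries
    Q = C.T @ C                               # symmetric positive semidefinite
    return A, E, Q
```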



Figure 2: Execution times for PDGGCLNW on the IBM SP2. Legend: "–x–" for $1 \times 1$, "- -o- -" for $2 \times 2$, "–*–" for $3 \times 3$, and "- -+- -" for $4 \times 4$ processor grids.

Figure 3: Mflop ratio per node for PDGECLNW on the Cray T3E with $n/p = 750$ (left), and for PDGGCLNW on the IBM SP2 with $n/p = 1000$ (right).




Both figures show similar results. The performance per node of the algorithms decreases when the number of processors is increased from 1 to 4, due to the communication overhead of the parallel algorithm. However, as the number of processors is increased further, the performance decreases only slightly, showing the scalability of the solvers.


Example 2. The second example shows the performance of the ARE solvers based on Newton's method. We use the same coefficient matrices $A$ and $E$ as in Example 1. Then, we construct two random $n \times n$ symmetric positive semidefinite matrices, $X$ and $R$, with $\|X\|_F = \|R\|_F = 1$. The matrix $Q$ is then chosen such that the ARE (4) is satisfied; note that $X$ is then the stabilizing solution of the ARE. As the coefficient matrix $A$ or the matrix pencil $A - \lambda E$ is stable, we start Newton's method with $X_0 = 0$.

First, we compare implementations of Newton's method using PDGECLNW or PDGGCLNW for solving the Lyapunov equations with implementations using the (generalized) Bartels-Stewart method. The latter will be denoted by DGECRBS and DGGCRBS, respectively.
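The test data of Example 2 can be generated along the following lines; the shape of the random factors is an assumption, while $Q$ is defined exactly so that (4) holds for the chosen $X$:

```python
import numpy as np

def example2_data(A, E, seed=0):
    """Example 2 test data: random symmetric positive semidefinite X
    and R with unit Frobenius norm, and Q chosen so that X satisfies
    the ARE (4). Sketch; the random factors are an assumption."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)

    def spsd_unit(M):
        S = M @ M.T                            # symmetric pos. semidefinite
        return S / np.linalg.norm(S, 'fro')    # normalize to ||.||_F = 1

    X = spsd_unit(rng.standard_normal((n, n)))
    R = spsd_unit(rng.standard_normal((n, n)))
    Q = -(A.T @ X @ E + E.T @ X @ A - E.T @ X @ R @ X @ E)  # (4) holds
    return X, R, Q
```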

Figure 4 shows the execution time of the ARE solvers for $n$ = 250, 500, 750, and 1000 on one processor of the IBM SP2. In this example we only report the results of one iteration of Newton's method; the number of iterations is the same regardless of the Lyapunov solver employed. The figure shows that even the serial performance of the implementations using the sign function based Lyapunov solvers is better than that of the codes using the (generalized) Bartels-Stewart method.

Figure 4: Execution time of one iteration of Newton's method for the standard (left) and generalized (right) case for Example 2 on the IBM SP2. Legend: "- -x- -" for DGECRBS and DGGCRBS, "–o–" for DGECRNW and DGGCRNW.

To analyze the scalability of the solvers, we use the same data and fix the dimension of the problem per node as above. Figure 5 reports the high scalability of our solvers: there is only a slight decrease in the Mflop ratio when the number of processors is increased from 4 to 16. This result agrees with the scalability of the basic building blocks involved in the ARE solvers (matrix product, LU factorization, triangular linear systems, etc.). In some cases, increasing the ratio $n/p$ from 750 to 1000 reduces the performance of the algorithms; further experiments showed that this is due to the cache size of this architecture.




Figure 5: Mflop ratio per node of PDGECRNW (left) and PDGGCRNW (right) on the IBM SP2. Legend: "–x–" for $n/p = 500$, "–+–" for $n/p = 750$, and "–o–" for $n/p = 1000$.

4 Conclusions

We have presented approaches to the parallel solution of linear and quadratic matrix equations as they arise in continuous-time control problems. Based on sign function methods, very efficient and scalable codes have been implemented. A larger set of routines will also be made available which, together with the ones presented here, will form a subroutine library for solving linear control problems on distributed memory parallel computers.

References

[1] B.D.O. Anderson and J.B. Moore. Optimal Control – Linear Quadratic Methods. Prentice-Hall, Englewood Cliffs, NJ, 1990.
[2] E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen. LAPACK Users' Guide. SIAM, Philadelphia, PA, second edition, 1995.
[3] W.F. Arnold, III and A.J. Laub. Generalized eigenproblem algorithms and software for algebraic Riccati equations. Proc. IEEE, 72:1746–1754, 1984.
[4] P. Benner, R. Byers, E.S. Quintana-Ortí, and G. Quintana-Ortí. Solving algebraic Riccati equations on parallel computers using Newton's method with exact line search. Berichte aus der Technomathematik, Report 98-05, Universität Bremen, August 1998. Available from www.math.uni-bremen.de/zetem/berichte.html.
[5] P. Benner, J.M. Claver, and E.S. Quintana-Ortí. Parallel distributed solvers for large stable generalized Lyapunov equations. Parallel Processing Letters, to appear.
[6] P. Benner and E.S. Quintana-Ortí. Solving stable generalized Lyapunov equations with the matrix sign function. Numer. Algorithms, 20(1):75–100, 1999.
[7] P. Benner, E.S. Quintana-Ortí, and G. Quintana-Ortí. Solving linear matrix equations via rational iterative schemes. In preparation.
[8] P. Benner, E.S. Quintana-Ortí, and G. Quintana-Ortí. A portable subroutine library for solving linear control problems on distributed memory computers. NICONET Report 1999-1, WGS – The Working Group on Software, January 1999. Available from www.win.tue.nl/niconet/NIC2/reports.html.
[9] L.S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R.C. Whaley. ScaLAPACK Users' Guide. SIAM, Philadelphia, PA, 1997.

[10] J.D. Gardiner and A.J. Laub. A generalization of the matrix-sign-function solution for algebraic Riccati equations. Internat. J. Control, 44:823–832, 1986.
[11] J.D. Gardiner, A.J. Laub, J.J. Amato, and C.B. Moler. Solution of the Sylvester matrix equation AXB + CXD = E. ACM Trans. Math. Software, 18:223–231, 1992.
[12] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, B. Manchek, and V. Sunderam. PVM: Parallel Virtual Machine – A Users' Guide and Tutorial for Network Parallel Computing. MIT Press, Cambridge, MA, 1994.
[13] G.H. Golub and C.F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, third edition, 1996.
[14] M. Green and D.J.N. Limebeer. Linear Robust Control. Prentice-Hall, Englewood Cliffs, NJ, 1995.
[15] W. Gropp, E. Lusk, and A. Skjellum. Using MPI: Portable Parallel Programming with the Message-Passing Interface. MIT Press, Cambridge, MA, 1994.
[16] C. Kenney and A.J. Laub. The matrix sign function. IEEE Trans. Automat. Control, 40(8):1330–1348, 1995.
[17] D.L. Kleinman. On an iterative technique for Riccati equation computations. IEEE Trans. Automat. Control, AC-13:114–115, 1968.
[18] P. Lancaster and L. Rodman. The Algebraic Riccati Equation. Oxford University Press, Oxford, 1995.
[19] A.J. Laub. A Schur method for solving algebraic Riccati equations. IEEE Trans. Automat. Control, AC-24:913–921, 1979.
[20] V. Mehrmann. The Autonomous Linear Quadratic Control Problem, Theory and Numerical Solution. Number 163 in Lecture Notes in Control and Information Sciences. Springer-Verlag, Heidelberg, July 1991.
[21] T. Penzl. Numerical solution of generalized Lyapunov equations. Adv. Comp. Math., 8:33–48, 1997.
[22] J.D. Roberts. Linear model reduction and solution of the algebraic Riccati equation by use of the sign function. Internat. J. Control, 32:677–687, 1980. (Reprint of Technical Report No. TR-13, CUED/B-Control, Cambridge University, Engineering Department, 1971.)
[23] G. Schelfhout. Model Reduction for Control Design. PhD thesis, Dept. Electrical Engineering, KU Leuven, 3001 Leuven-Heverlee, Belgium, 1996.
[24] V. Sima. Algorithms for Linear-Quadratic Optimization, volume 200 of Pure and Applied Mathematics. Marcel Dekker, Inc., New York, NY, 1996.