
Computers Chem. Engng Vol. 22, No. 12, pp. 1829-1835, 1998. © 1998 Elsevier Science Ltd. All rights reserved. Printed in Great Britain. PII: S0098-1354(98)00238-5. 0098-1354/98 $ - see front matter

Comparison of certain MINLP algorithms when applied to a model structure determination and parameter estimation problem

Hans Skrifvars, Sven Leyffer and Tapio Westerlund*

Department of Chemical Engineering, Åbo Akademi University, Biskopsgatan 8, FIN-20500 Åbo, Finland
Department of Mathematics and Computer Science, University of Dundee, Dundee DD1 4HN, Scotland, U.K.

(Received 5 May 1997; revised 27 April 1998)

Abstract

The maximum likelihood method is frequently used in parameter estimation. If the structure of the model is unknown, the maximization of the likelihood function can be replaced by minimizing an information criterion. One criterion that allows this to be done is Akaike's information criterion (AIC). Minimizing the AIC is a mixed integer non-linear programming (MINLP) problem. In this paper, three different MINLP algorithms are compared in the solution of a simultaneous model structure determination and parameter estimation problem by minimizing the AIC criterion. The problem considered appears in quantitative Fourier transformed infra red (FTIR) spectroscopy where concentration estimates of certain gas components are to be obtained from measured absorbances at different wave numbers. The resulting problem is a large MINLP problem containing several hundreds, or even thousands, of variables including a huge number of possible model structures. It is, however, found that the studied algorithms solve the considered problem in quite a small number of iterations and a reasonable CPU-time. © 1998 Elsevier Science Ltd. All rights reserved.

Keywords: identification; estimation; maximum likelihood; optimization; mixed-integer non-linear programming

Introduction

Parameter estimation problems appear in many different applications. Several methods have therefore been proposed for the solution of estimation problems for different assumptions (Goodwin and Payne, 1977; Ljung, 1987; Söderström and Stoica, 1989).
One of the most commonly used parameter estimation methods is the maximum likelihood method. In many applications, not only the numerical value of certain parameters but also the number of parameters and the parameter structure may be unknown. In such cases there is no unique solution to the estimation problem. However, some solution to the problem of simultaneously estimating the parameters and determining the model structure may be obtained by minimizing an information criterion. Akaike's information criterion (AIC), Hannan and Quinn's criterion (HQ) and the Bayesian information criterion (BIC) all belong to this category of criteria (Kotz and Johnson, 1986).

*Author to whom correspondence should be directed.

Minimizing an information criterion results in an MINLP problem. In this paper, some MINLP algorithms are compared when solving a simultaneous model structure determination and parameter estimation problem by minimizing the AIC criterion. The problem considered is from quantitative Fourier transformed infra red (FTIR) spectroscopy where concentration estimates of certain gas components are to be obtained. The concentration estimates are linear functions of measured absorbances (optical densities) at different wave numbers. Depending on the resolution of the analyzer, each spectrum may contain several thousand absorbances. The model includes parameters for each absorbance to be used. The number of possible model structures may therefore be very large. The model parameters and the model structure should be obtained from a set of calibration spectra. In this paper, the numerical solution of the resulting parameter estimation and model structure determination problem is considered. Three different algorithms for convex MINLP problems are applied to a specific FTIR problem in order to demonstrate the performance of the methods.


For linear models with known residual covariance matrices, the problem is convex and it is possible to obtain the global optimal solution to the considered problem by means of the algorithms. If the residual covariance matrix is unknown, the problem is generally non-convex, even for linear models. In this case a relaxation procedure, frequently used in maximum likelihood estimation, is used.

The MINLP problem

Akaike's information criterion, AIC, is given by (Akaike, 1974),

$$\mathrm{AIC} = -2 \ln L + 2p, \tag{1}$$

where $\ln L$ is the logarithm of the likelihood function and $p$ is the number of parameters in the model. The logarithm of the likelihood function for $N$ normally distributed, independent, random vectors, $e_k$, with the dimension $\dim(e) = n$, and a common covariance matrix, $\Sigma$, is given by (Goodwin and Payne, 1977),

$$\ln L = -\frac{N}{2}\, n \ln(2\pi) + \frac{N}{2} \ln\left(|\Sigma^{-1}|\right) - \frac{1}{2} \sum_{k=1}^{N} e_k^{T} \Sigma^{-1} e_k. \tag{2}$$

When solving parameter estimation problems the vectors, $e_k$, are model residuals that are functions of various model parameters $\theta_i$, $i = 1, 2, \ldots, I$. If the covariance matrix, $\Sigma$, is known, the maximization of $L$, or $\ln L$, (with respect to the parameters $\theta_i$) is equivalent to minimizing the last term in equation (2). If, on the other hand, the covariance matrix, $\Sigma$, is unknown, the likelihood function must be maximized with respect to both the covariance matrix and the parameters (Goodwin and Payne, 1977; Ljung, 1987; Söderström and Stoica, 1989). In the latter case it can be shown (Karrila and Westerlund, 1990) that the unique maximum likelihood estimate of the covariance matrix is

$$\hat{\Sigma} = \frac{1}{N} \sum_{k=1}^{N} e_k e_k^{T}. \tag{3}$$

Combining Akaike's information criterion with the logarithm of the likelihood function, equation (2), we obtain

$$\mathrm{AIC} = N n \ln(2\pi) + N \ln(|\Sigma|) + \sum_{k=1}^{N} e_k^{T} \Sigma^{-1} e_k + 2p. \tag{4}$$

In order to incorporate the structure of the model into equation (4), binary variables, $b_i$, can be introduced to define the existence of each parameter, $\theta_i$, $i = 1, \ldots, I$. The actual number of parameters, $p$, is then given by the sum of the binary variables, $b_i$. The optimization problem can now be written as an MINLP problem in the binary variables, $b_i$, and the continuous variables, $\theta_i$.
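As an illustration of equations (3) and (4), the following minimal sketch evaluates the AIC for a set of residual vectors, once with a fixed covariance matrix and once with the maximum likelihood estimate inserted. The function names `gaussian_aic` and `ml_covariance`, the random residuals and the parameter count of 5 are assumptions made for the example only; they do not come from the paper.

```python
import numpy as np

def gaussian_aic(residuals, sigma, n_params):
    """AIC of equation (4); residuals holds one vector e_k per row."""
    N, n = residuals.shape
    _, logdet = np.linalg.slogdet(sigma)
    # Sum over k of e_k^T sigma^{-1} e_k, without forming the inverse explicitly.
    quad = np.sum(residuals * np.linalg.solve(sigma, residuals.T).T)
    return N * n * np.log(2 * np.pi) + N * logdet + quad + 2 * n_params

def ml_covariance(residuals):
    """Maximum likelihood covariance estimate of equation (3)."""
    N = residuals.shape[0]
    return residuals.T @ residuals / N

rng = np.random.default_rng(0)
E = rng.normal(size=(35, 3))          # hypothetical residuals: N = 35, n = 3
sigma_hat = ml_covariance(E)

aic_known = gaussian_aic(E, np.eye(3), n_params=5)   # covariance assumed known
aic_ml = gaussian_aic(E, sigma_hat, n_params=5)      # covariance estimated
print(aic_known, aic_ml)
```

With the estimate of equation (3) inserted, the quadratic sum collapses to $Nn$, which is the step that turns equation (4) into a criterion depending only on $|\hat{\Sigma}|$ and the number of parameters.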

As in the maximum likelihood problem, the minimization of AIC will be dependent on the assumption that is made about the covariance matrix, $\Sigma$. Let us consider, first, the case when the covariance matrix is known. In this case, we find from equation (4), that the criterion to be minimized can be written as

$$J_1 = \sum_{k=1}^{N} e_k^{T} \Sigma^{-1} e_k + 2 \sum_{i=1}^{I} b_i. \tag{5}$$

On the other hand, if the covariance matrix, $\Sigma$, is not known, the maximum likelihood estimate of the covariance matrix can be inserted in equation (4) resulting in

$$\mathrm{AIC} = N n \ln(2\pi) + N \ln(|\hat{\Sigma}|) + N n + 2 \sum_{i=1}^{I} b_i. \tag{6}$$

After some manipulations of equation (6), the criterion to be minimized can be written as

$$J_2 = |\hat{\Sigma}| \prod_{i=1}^{I} \mathrm{e}^{2 b_i / N}. \tag{7}$$

Now, introducing some upper and lower limits, $\theta_{i,\mathrm{U}}$ and $\theta_{i,\mathrm{L}}$, for the parameters, $\theta_i$, the optimization problem can be formulated as follows:

$$\min_{\theta_i,\, b_i} \{J\} \tag{8}$$

subject to

$$\theta_i - \theta_{i,\mathrm{U}}\, b_i \le 0, \qquad -\theta_i + \theta_{i,\mathrm{L}}\, b_i \le 0, \qquad i \in \{1, 2, \ldots, I\}, \tag{9}$$

$$b_i \in \{0, 1\}, \qquad \theta = (\theta_1, \ldots, \theta_I) \in \mathbb{R}^{I}. \tag{10}$$

If the covariance matrix $\Sigma$ is known, then $J = J_1$, and if the covariance matrix is not known, $J = J_2$. The second problem is generally non-convex, even for linear models (Westerlund et al., 1996), and will not be given further explicit treatment in this paper. However, a solution to the latter problem may also be found by the formulation $J_1$ for a given covariance matrix $\Sigma$. The solution from problem $J_1$ may then be used in a relaxation algorithm where a new maximum likelihood estimate of the covariance matrix is applied at each iteration. The relaxation procedure is given by

$$\min_{\theta_i,\, b_i} \{J_1(\hat{\Sigma}_r)\} \tag{11}$$

subject to

$$\theta_i - \theta_{i,\mathrm{U}}\, b_i \le 0, \qquad -\theta_i + \theta_{i,\mathrm{L}}\, b_i \le 0, \qquad i \in \{1, 2, \ldots, I\}, \tag{12}$$

$$b_i \in \{0, 1\}, \qquad \theta = (\theta_1, \ldots, \theta_I) \in \mathbb{R}^{I}, \qquad r \in \{0, 1, 2, \ldots\}. \tag{13}$$

In the above formulation, $\hat{\Sigma}_r$ is estimated from equation (3) after each iteration, $r$. The procedure begins with an initial estimate, $\hat{\Sigma}_0$, of the covariance matrix. In the numerical example considered, $\hat{\Sigma}_0 = I$ has been used. A similar relaxation procedure is frequently used in the maximum likelihood method (Goodwin and Payne, 1977). The procedure has been found to give good performance in practical calculations; however, no proof of convergence has been given so far, not even for the maximum likelihood problem. From a practical point of view, the relaxation procedure has certain merits. The resulting sub-problems are quadratic and thus convex, and it is possible to obtain the global optimal solution to these sub-problems. The only non-rigorous step in the relaxation procedure is the approximation of the covariance matrix in each iteration. However, this can be expected not to be a very critical step in the algorithm, since already after the first iteration the following covariance matrix estimate corresponds to the residual covariance matrix resulting from least squares parameter estimation.

Numerical solution

Numerically, the MINLP problem being considered can be solved by, for example, the Generalized Benders Decomposition (Geoffrion, 1972), the Outer Approximation (OA) method (Duran and Grossmann, 1986), the Extended Cutting Plane (ECP) method (Westerlund and Pettersson, 1995), non-linear branch-and-bound (B&B) (Dakin, 1965) or other MINLP algorithms that ensure global optimality for convex problems. In the following, the solutions by means of the OA method, the ECP method and the B&B method are compared.

In the OA method, the objective function may be written in non-linear form in the continuous variables and in linear form in the integer variables. The constraints may be linear and non-linear. In order to ensure global optimality the objective function as well as the non-linear constraints should be convex. In the algorithm, the non-linear inequality constraints are linearized at each iteration and then added to the linear and linearized constraints from the previous iterations.
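The linearization step can be illustrated with a small sketch. For a convex function $f$, the first-order expansion at a point $x_k$ never overestimates $f$, so the pointwise maximum of the accumulated linearizations is a lower model of $f$ that tightens as cuts are added. The code below builds such cuts for a hypothetical least-squares objective; the data, the helper names `make_cut` and `lower_model`, and the gradient step used to pick the next linearization point are illustrative assumptions and not the master problem used by the authors.

```python
import numpy as np

def make_cut(f, grad, x_k):
    """Return (g, c) such that g @ x + c is the linearization of f at x_k."""
    g = grad(x_k)
    c = f(x_k) - g @ x_k
    return g, c

# Hypothetical convex quadratic: sum of squared residuals of a linear model.
rng = np.random.default_rng(1)
A = rng.normal(size=(35, 4))
y = rng.normal(size=35)

def f(x):
    return float(np.sum((y - A @ x) ** 2))

def grad(x):
    return -2.0 * A.T @ (y - A @ x)

cuts = []
x_k = np.zeros(4)                      # first linearization at the origin
for _ in range(3):
    cuts.append(make_cut(f, grad, x_k))
    x_k = x_k - 0.01 * grad(x_k)       # stand-in for the next master solution

def lower_model(x):
    """Pointwise maximum of all cuts: a lower bound on f when f is convex."""
    return max(g @ x + c for g, c in cuts)

x_test = rng.normal(size=4)
print(lower_model(x_test), f(x_test))
```

In an actual OA or ECP iteration the next linearization point would come from the solution of the MILP master problem rather than from a gradient step.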
A new solution is obtained by solving both an MILP and an NLP problem. The two-step optimization procedure is performed such that the MILP master problem is solved for both the continuous and discrete variables but generates the new linearization point only for the discrete variables, while a subsequent NLP problem generates the new linearization point for the continuous variables. The NLP problem is given by the original MINLP problem where the discrete variables are treated as constants, with the values obtained from the previous MILP solution. The procedure has global convergence properties for convex problems, and it has been shown that the procedure converges to the global optimal solution in a finite number of steps (Duran and Grossmann, 1986). The Branch and Bound method (Land and Doig, 1960; Dakin, 1965) can be used to solve the MILP master problem. The present NLP sub-problems can be solved using suitable NLP-solvers. Several different NLP-solvers were tested on the present problem, and the Davidon-Fletcher-Powell method was found to be best suited for the problem.

The ECP method is an extension of Kelley's cutting plane method for convex NLP problems (Kelley, 1960). In the ECP algorithm (Westerlund and Pettersson, 1995), the objective function is written in linear form, while the constraints are linear and non-linear. In problems such as the one being considered, where the objective function is non-linear, the objective function is written as a non-linear constraint together with an additional variable, which is then minimized. In order to ensure global optimality the constraints should be convex, in the version of the algorithm given in Westerlund and Pettersson (1995). In Westerlund et al. (1998) it is shown that global convergence is ensured with the ECP-algorithm also for quasi-convex problems. In the algorithm, the non-linear inequality constraints are linearized at each iteration and then added to the linear and linearized constraints from the previous iterations. The solution is obtained by solving a sequence of MILP master problems and it is shown in Westerlund and Pettersson (1995) that the procedure converges to an epsilon-optimal solution in a finite number of iterations. The solution of each MILP problem generates the linearization point for the non-linear constraints to be added to the old linearized constraints at the subsequent iteration. The MILP problem can be solved, for example, by the Branch and Bound method. In the ECP method, no NLP sub-problems need to be solved.

Non-linear branch-and-bound solves a sequence of non-linear programming (NLP) problems with relaxed integer restrictions, branching on fractional integer solutions until all integer restrictions are satisfied. This particular solution constitutes an upper bound to the optimal value and B&B proceeds by back-tracking through the search-tree, exploring new branches, using the bounding properties of the tree until no open, unexplored node remains on the tree.

Since the MINLP problem considered is a quadratic mixed integer problem, the B&B solver that has been used in the experiment is a special-purpose MIQP B&B solver. The MIQP solver has several features which make it efficient in the present setting. It exploits the sparsity of the constraints (12) and (13) and uses an outer product formulation of the Hessian of equation (11) to compute Hessian vector products. Thus the dense Hessian of equation (11) need never be formed explicitly. The MIQP B&B solver also makes use of warm start facilities of the QP solver to solve the nodes in the tree. Finally, a best bound rule is used to select the fractional integer variable to branch on next. The lower bounds needed for this rule can be computed efficiently and include a curvature term. This branching strategy has proved to be the most
successful for this class of problem (Fletcher and Leyffer, 1998).

The FTIR example

This numerical example is based on a problem in quantitative FTIR-spectroscopy (FTIR = Fourier Transformed Infra Red). A more detailed presentation of the problem can be found in Brink and Westerlund (1995). Numerically, however, only a problem of limited size was solved with the OA method in Brink and Westerlund (1995). Infra-red spectroscopy is based on the principle that most molecules absorb electro-magnetic radiation in a unique pattern. The quantitative relation between concentration and absorbance can be expressed with Beer's law,

$$A(\nu) = a(\nu)\, b\, c, \tag{14}$$

where $A(\nu)$ is the absorbance at wave number $\nu$, $a(\nu)$ is the absorptivity of the component, $b$ is the optical path length and $c$ is the concentration. In a multicomponent system, the absorbance at wave number $\nu$ can be written as the sum of the individual absorbances related to each component,

$$A_{\mathrm{tot}}(\nu) = \sum_{i=1}^{I} a_i(\nu)\, b\, c_i. \tag{15}$$

Usually the optical path length, $b$, is held constant during the calibration stage and the prediction stage. The first two factors on the right hand side can thus be written as a single constant. A spectrum for a multicomponent system is usually recorded over thousands of wave numbers. In Fig. 1, a set of spectra is illustrated. All spectra are recorded over the wave numbers 800 to 2200 $\mathrm{cm}^{-1}$ with a resolution of 0.5 $\mathrm{cm}^{-1}$. The spectra presented in Fig. 1 are recorded from 35 different experiments where different combinations of concentrations of the components have been used.

Fig. 1. Some FTIR spectra.

If a spectrum (i.e. the absorbance at different wave numbers, $A(\nu_i)$) for the multicomponent system is stored in a vector, $a$, the multicomponent system can be written in matrix notation as $a = Kc$. In order to estimate the concentrations from measured absorbance spectra, however, we can use an inverse model of the form,

$$c = Pa \tag{16}$$

which relates the absorbance to the various concentrations. The main difficulty with this model is that only a restricted number of parameters, corresponding to absorbances at certain wave numbers, can be included in the model, due to the lack of degrees of freedom in multiple linear regression analysis. One problem is, thus, how to select the "best" wave numbers for the multicomponent system being considered. Since we do not have an estimate of the residual covariance matrix, the problem should be formulated as given by equation (7). However, we have used the convex formulation given by equations (11)-(13) (assuming the residual covariance matrix to be
known), and then used the relaxation procedure in updating the covariance matrix. In the following, we will illustrate the determination of the model structure and the parameter estimates for a three-component system containing CO, NO and CO$_2$. Spectra obtained for different concentrations of the components are given in Fig. 1. All 35 spectra in Fig. 1 together with the corresponding concentration data are used in the calculations. Since the resolution was 0.5 $\mathrm{cm}^{-1}$ and the spectra were recorded from 800 to 2200 $\mathrm{cm}^{-1}$, the spectra include absorbances at 2800 wave numbers. The problem with three chemical components would thus contain a total of 16 800 variables: 8400 continuous and 8400 binary variables. In order to reduce the problem somewhat, two example problems with average absorbances in 100 and 200 sub-intervals (each covering 28 and 14 recorded wave numbers, i.e. about 14 and 7 $\mathrm{cm}^{-1}$, respectively) were used in the calculations. The problem with average absorbances in 100 sub-intervals contains 600 variables and the problem with 200 sub-intervals has 1200 variables that should be obtained from the 35 spectra. The problems are formulated as in equations (11)-(13). In both cases the upper and lower limits for the parameters have been given the values $\theta_{i,\mathrm{U}} = 1000$ and $\theta_{i,\mathrm{L}} = 0$. The residuals are calculated from

$$e_k = c_k - P a_k, \tag{17}$$

where $c_k$ and $a_k$, $k = 1, \ldots, 35$, are the concentration and absorbance vectors, respectively. The parameters, $\theta_i$, are the row-wise elements in the parameter matrix $P$. The parameter matrix $P$ includes 100 columns in the first problem and 200 columns in the latter problem. The relaxation procedure starts with an initial estimate of the residual covariance matrix, $\hat{\Sigma}_0$, equal to an identity matrix, $I$.

It should be noted that in a linear model with a maximum of $I$ parameters, the total number of model structures is given by $S = 2^{I}$. In the latter case with 200 absorbances, $I = 600$. The total number of possible model structures is thus, in this case,
$S = 2^{600} \approx 10^{180}$. It is thus practically impossible to examine all the structures even in a simple linear model like the one being considered. In the following it is, however, shown that the global optimal solution to the problem can be obtained in only a few iteration steps, and in quite a reasonable CPU-time, by the different MINLP methods. All methods start with the first linearization at $\theta_i = 0$. The convergence properties of the numerical methods, for the first problem, are illustrated in Figs 2-4. The convergence properties for the latter problem are illustrated in Figs 5 and 6.

Fig. 2. Upper and lower bounds of the QP-B&B, OA and ECP algorithms versus the CPU time needed. Problem with 100 absorbances.

Fig. 3. Upper and lower bounds of the ECP, OA and QP-B&B algorithms versus the number of function evaluations needed. Problem with 100 absorbances.

Figures 2 and 3 illustrate the solution of the first problem (the problem with 100 absorbances) where an identity matrix has been used as the covariance matrix. The top curve illustrates the upper bound of the objective function and the bottom curve the lower bound of the objective function for each
method. In Fig. 2, the convergence properties of the methods are plotted against the CPU-time, while in Fig. 3 the corresponding convergence properties are plotted against the number of non-linear function evaluations. From Fig. 2 it can be observed that the QP-based B&B method converges to the optimal solution in about 15 s of CPU-time (solving 346 QP problems), while the OA and ECP methods take approximately 60 and 480 s, respectively.

Fig. 4. Upper and lower bounds versus the number of iterations obtained with the OA algorithm for each covariance matrix in the relaxation procedure. Problem with 100 absorbances.

Fig. 5. Upper and lower bounds of the QP-B&B, ECP and OA algorithms versus the CPU time needed. Problem with 200 absorbances.

Fig. 6. Upper and lower bounds of the ECP, OA and QP-B&B algorithms versus the number of function evaluations needed. Problem with 200 absorbances.

The CPU-time illustrated in Fig. 2 is the total CPU-time. Since the total CPU-time does not separate the effort spent on "internal" calculations of the methods from the effort spent on evaluating the non-linear functions, it is also of interest to illustrate the convergence properties of the methods in relation to the number of non-linear function evaluations. This is done in Fig. 3 where the lower and the upper bounds of the objective function, as in Fig. 2, are plotted against the number of non-linear function evaluations. When examining Fig. 3, it can be observed that the convergence properties of the different methods are
quite the opposite of those obtained in Fig. 2. In this case, the ECP method converges after 65 evaluations, while the OA and QP-based B&B methods need approximately 2000 and 6000 function evaluations, respectively, in order to converge. It may be noted that 65 master iterations were needed with the ECP method. Since only one non-linear function evaluation is needed at each iteration, the total number of non-linear function evaluations equals the number of master iterations for the ECP method in this example. Using the OA method, 24 master iterations were needed and thus, on average, the number of function evaluations for the linearizations plus the function evaluations in the NLPs is approximately 80 per iteration. In the case of the QP-based B&B method it is not possible to count function evaluations separately. Instead, we count the number of times a Hessian vector product is computed. As a consequence, this number includes both function evaluations and manipulations of the complementarity tableau (effectively 1 product per pivot). On average about 20 Hessian vector products are evaluated at each node of the branch-and-bound tree.

In Fig. 4, the upper and lower bounds when applying the OA method for each new covariance matrix in the relaxation procedure are plotted against the number of iterations for the problem with 100 absorbances. From this figure it can be seen that the number of iterations increases when solving for new estimates of the covariance matrix. A similar behavior was observed when solving the problem with all of the methods considered. The same also holds for the CPU-time used in the optimizations.

Figures 5 and 6 illustrate the solution to the problem with 200 absorbances where an identity matrix has been used for the covariance matrix. The top curve illustrates the upper bound of the objective function and the bottom curve the lower bound of the objective function for each method. Figure 5 illustrates the convergence in CPU-time while Fig. 6 illustrates the convergence in terms of the number of non-linear function evaluations required. From Figs 5 and 6 a behavior similar to that observed for the smaller problem can be noted. The QP-based B&B method converged in 130 s of CPU-time and needed 24 000 function evaluations. The corresponding figures for the OA method were 4700 s and 7700 function evaluations, and for the ECP method, 4400 s and 72 function evaluations.

The CPU-times for the numerical solutions are obtained on a DECstation 3000/900 m AXP workstation. The OA and ECP codes have been written in C and in these algorithms the MILP master problems have been solved with the CPLEX linear optimizer. In the OA algorithm, the Davidon-Fletcher-Powell method was used to solve the NLP sub-problems. In the B&B case a code written in FORTRAN has been used. The code as well as the QP-B&B algorithm are described in Fletcher and Leyffer (1998).
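The relaxation procedure of equations (11)-(13) can be sketched on a toy problem as follows. This is only an illustration under stated assumptions: the data are synthetic, the sizes (two components, three absorbances, 20 spectra) are far smaller than in the FTIR example, and the inner problem is solved by exhaustive enumeration of all $2^{I}$ structures instead of the OA, ECP or B&B codes discussed above. The bound constraints are represented only by fixing excluded parameters to zero. All function names are hypothetical.

```python
import itertools
import numpy as np

def fit_structure(A, C, mask, W):
    """Weighted least squares for P in c = P a; entries outside mask stay 0."""
    n, m = C.shape[1], A.shape[1]
    active = np.flatnonzero(mask.ravel())          # row-wise parameter indices
    P = np.zeros((n, m))
    if active.size:
        H = np.zeros((active.size, active.size))
        g = np.zeros(active.size)
        for a_k, c_k in zip(A, C):
            X = np.kron(np.eye(n), a_k)[:, active]  # P a_k = X @ theta
            H += X.T @ W @ X
            g += X.T @ W @ c_k
        P.flat[active] = np.linalg.solve(H, g)
    E = C - A @ P.T                                 # residuals, equation (17)
    return P, E

def j1(E, W, mask):
    """Criterion J1 of equation (5) with W the inverse covariance matrix."""
    return float(np.sum((E @ W) * E)) + 2.0 * mask.sum()

def best_structure(A, C, W):
    """Exhaustive search over all structures (stand-in for an MINLP solver)."""
    n, m = C.shape[1], A.shape[1]
    best = None
    for bits in itertools.product([0, 1], repeat=n * m):
        mask = np.array(bits).reshape(n, m)
        P, E = fit_structure(A, C, mask, W)
        value = j1(E, W, mask)
        if best is None or value < best[0]:
            best = (value, mask, P, E)
    return best

def relaxation(A, C, iterations=4):
    """Alternate between minimizing J1 and re-estimating the covariance."""
    sigma = np.eye(C.shape[1])                      # initial estimate: identity
    for _ in range(iterations):
        _, mask, P, E = best_structure(A, C, np.linalg.inv(sigma))
        sigma = E.T @ E / len(E)                    # equation (3)
    return mask, P, sigma

rng = np.random.default_rng(2)
A = rng.normal(size=(20, 3))                        # hypothetical absorbances
P_true = np.array([[1.5, 0.0, 0.0], [0.0, 0.0, -2.0]])
C = A @ P_true.T + 0.1 * rng.normal(size=(20, 2))   # hypothetical concentrations
mask, P_hat, sigma_hat = relaxation(A, C)
print(mask)
print(P_hat)
```

Enumeration is feasible here only because $I = 6$; for the problems in this paper, with $I = 300$ or $600$, a dedicated MINLP algorithm is required.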

When the relaxation procedure (equations (11)-(13)) was applied, only four iterations (updates of the covariance matrix) were needed for both problems. After the fourth iteration, the parameter estimates for the first problem (with 100 absorbances) were

$\hat{P}$ = [3 × 5 array of the retained columns; non-zero entries 315.57, 22.47, 25.01, 714.78 and 65.07, all other entries 0]

Since the parameter matrix $\hat{P}$ contains 100 columns, only those columns with non-zero elements have been given. The above columns (with non-zero elements) are the columns 10, 20, 83, 87 and 97 of the original parameter matrix, corresponding to the average absorbance in the wave number intervals 927.3-941.4, 1068.7-1082.8, 1959.6-1973.7, 2016.2-2030.3 and 2157.6-2171.7 $\mathrm{cm}^{-1}$, respectively. One interesting point in the selected columns of the parameter matrix, above, is that they correspond to wave numbers which do not contain the highest peaks of absorbance. The residual covariance matrix obtained from these estimates is given by

$$\hat{\Sigma} = \begin{bmatrix} 1.1489 & 0.1993 & 0.0029 \\ 0.1993 & 0.1113 & 0.0002 \\ 0.0029 & 0.0002 & 0.0008 \end{bmatrix}.$$

Further updating of the covariance matrix did not alter the structure of the model. The changes in the numerical values of the parameter estimates as well as the elements in the residual covariance matrix were also small.

Summary

A problem of joint model structure determination and parameter estimation in FTIR analysis was considered. It was found that the problem can be solved by minimizing Akaike's information criterion as an MINLP problem. Three different MINLP algorithms were compared for the numerical solution of the resulting problem. It was found that the methods performed extremely well on the considered problem. It could be observed from the comparison that the methods have different bottlenecks. The ECP method spends most of the time in solving the MILP, while the B&B method spends most of the time in evaluating the non-linear functions. For this particular FTIR MINLP problem, the OA and ECP methods used an order of magnitude more CPU-time to achieve convergence, compared to the QP-based B&B method. It should be noted, though, that the QP-based B&B method is ideally suited for this application, since the convex parameter estimation problem considered is indeed an MIQP. On the other hand, the number of non-linear function evaluations needed to achieve convergence for the ECP method was an order of magnitude lower than for the OA and QP-based B&B methods. Some other minor differences in the efficiency of the numerical methods were found but all methods can be recommended for the solution of the considered FTIR problem.

References

Akaike, H. (1974) A new look at statistical model identification. IEEE Trans. Automat. Control 19, 716-722.
Brink, A. and Westerlund, T. (1995) The joint problem of model structure determination and parameter estimation in quantitative IR spectroscopy. Chemometrics and Intelligent Laboratory Systems 29, 29-35.
CPLEX Optimization Inc. (1995) Using the CPLEX Callable Library.
Dakin, R.J. (1965) A tree-search algorithm for mixed integer programming problems. Comp. J. 8, 250-255.
Duran, M.A. and Grossmann, I.E. (1986) An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Mathematical Programming 36, 307-339.
Fletcher, R. (1993) Resolving degeneracy in quadratic programming. Ann. Oper. Res. 47, 307-334.
Fletcher, R. and Leyffer, S. (1998) Numerical experience with lower bounds for MIQP branch-and-bound. University of Dundee, Dept. of Mathematics & Computer Science Numerical Analysis Report, NA/151, SIAM J. Optim. 8(2), (May 1998).
Geoffrion, A.M. (1972) Generalized Benders decomposition. J. Optim. Theory Appl. 10(4), 237-260.
Goodwin, G.C. and Payne, R.L. (1977) Dynamic System Identification. Experimental Design and Data Analysis. Academic Press, New York.
Karrila, S. and Westerlund, T. (1990) An elementary derivation of the maximum likelihood estimator of the covariance matrix, and an illustrative determinant inequality. Automatica 27(2), 425-426.
Kelley, J.E., Jr. (1960) The cutting-plane method for solving convex programs. J. Soc. Indust. Appl. Math. 8(4), 703-712.
Kotz, S. and Johnson (1986) Encyclopedia of Statistical Science (Wiley) 7, 709-714.
Land, A.H. and Doig, A. (1960) An automatic method for solving discrete programming problems. Econometrica 28, 497-520.
Ljung, L. (1987) System Identification. Theory for the User. Prentice-Hall, Englewood Cliffs, NJ.
Söderström, T. and Stoica, P. (1989) System Identification. Prentice-Hall, London.
Westerlund, T. and Pettersson, F. (1995) An extended cutting plane method for solving convex MINLP problems. Comput. Chem. Engng 19 (Suppl.), 131-136.
Westerlund, T., Karrila, S., Mäkilä, P.M. and Brink, A. (1996) In: Leondes, C.T. (Ed.), Control and Dynamic Systems, Volume: Digital Design and Control Systems Techniques and Applications, Academic Press, New York.
Westerlund, T., Skrifvars, H., Harjunkoski, I. and Pörn, R. (1998) An extended cutting plane method for a class of non-convex MINLP problems. Comput. Chem. Engng 22, 357-365.
