Response Surface Approximation using Gradient Information

S. Lauridsen†, R. Vitali‡, F. van Keulen§, R. T. Haftka‡, J. I. Madsen¶
† Institute of Mechanical Engineering, Aalborg University, Aalborg, Denmark. e-mail: [email protected]
‡ Department of Aerospace Engineering, Mechanics and Engineering Sciences, University of Florida, Gainesville, FL, USA. e-mail: [email protected] & [email protected]
§ Technische Universität Delft, Delft, The Netherlands. e-mail: [email protected]
¶ Vanderplaats R&D, Colorado Springs, CO, USA. e-mail: [email protected]

1. Abstract
Response surface techniques were originally developed for the construction of approximations on the basis of function values only. In many cases, however, derivative information is available inexpensively and can be used to improve the accuracy and reduce the cost of constructing the response surface. This is, for example, the case when efficient procedures for design sensitivity analysis are available. In the present paper, a framework is given for the construction of response surfaces (RS) using both function values and derivatives. The basis is a weighted least squares formulation which includes derivatives. Of particular interest is the estimation of the prediction-error-covariance matrix used to establish the weight matrix in an iterative procedure of weighted least squares regression. Examples of response surface approximations of simple polynomials through the use of derivatives are presented.

2. Keywords
Response surfaces, iteratively weighted least squares, derivatives, error-covariance.

3. Introduction
Response surface (RS) techniques were originally developed for fitting experimentally obtained, hence noisy, data. These methods historically focused on fitting function data without using derivative data, which is rarely available from physical experiments.
With the growing use of RS techniques in computer simulations, where derivatives are often available at low cost, there is interest in RS that incorporate both function and derivative data. To establish a clear picture of the potential of using derivatives in the construction of response surfaces, consider the total number of observations no as the sum of nf function evaluations and nd directional derivatives:

no = nf + nd    (1)
If gradient information (computation of all directional derivatives) is available for each function evaluation and all variables, the total number of observations is

no = nf + nf k    (2)

for k variables. For quadratic approximations in k variables, where the number of parameters to be estimated is (k+1)(k+2)/2, sufficient information is available even if nf = k, since

(k+1)(k+2)/2 ≤ k + k² ,  k > 1    (3)

Thus, it is theoretically possible to construct quadratic approximations in k variables where the number of required points, at which to evaluate function and design sensitivity values, is of order O(k).

Response surfaces benefit from the inclusion of sufficiently accurate gradient information when it can be obtained with less effort than is needed to obtain function values. If approximations of function values are of primary interest and the computational cost of obtaining a derivative is comparable to the cost of a function evaluation (as, for example, when derivatives are obtained from a finite difference procedure), then derivative information should not be used and all the computational effort should be put into obtaining function values only. When derivatives are computationally inexpensive, one might want to use the information for one of the following purposes:
1. Increase the quality of an approximation for a given number of design points.
2. Increase the order of the response surface approximation for a given number of design points.
3. Reduce the number of design points necessary to achieve a desired response surface approximation.

4. Methodology
The classical linear model is left unchanged:

y = Xβ + ε    (4)

where y is the vector of no observations, β is the vector of p 'true' parameter values, X is the [no × p] design matrix, and ε is the vector of no errors, which are assumed to satisfy:

E(ε) = 0 ,  E(εεᵀ) = Cov(ε) = C    (5)

By expressing the errors in function values and derivatives at some point xi as

εi = yi − ŷ(xi, b)    (6)
εij = ∂yi/∂xj − dŷ(xi, b)/dxj    (7)

the system of equations takes the following form for n design points:

    [ y1        ]   [ 1  ξ2(x1)        ...  ξp(x1)        ]            [ ε1  ]
    [ ∂y1/∂x1   ]   [ 0  dξ2(x1)/dx1   ...  dξp(x1)/dx1   ]            [ ε11 ]
    [ ...       ]   [ ...                   ...           ]   [ b1  ]  [ ... ]
    [ ∂y1/∂xk   ]   [ 0  dξ2(x1)/dxk   ...  dξp(x1)/dxk   ]   [ ... ]  [ ε1k ]
    [ ...       ] = [ ...                   ...           ]   [ bp  ] +[ ... ]
    [ yn        ]   [ 1  ξ2(xn)        ...  ξp(xn)        ]            [ εn  ]
    [ ∂yn/∂x1   ]   [ 0  dξ2(xn)/dx1   ...  dξp(xn)/dx1   ]            [ εn1 ]
    [ ...       ]   [ ...                   ...           ]            [ ... ]
    [ ∂yn/∂xk   ]   [ 0  dξ2(xn)/dxk   ...  dξp(xn)/dxk   ]            [ εnk ]
         y                        X                             b         ε        (8)
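As an illustration of the structure of (8), the following sketch assembles the gradient-augmented design matrix for a quadratic basis in two variables, ξ = [1, x1, x2, x1², x1x2, x2²]. The helper names are illustrative, not from the paper.

```python
import numpy as np

# Sketch of the gradient-augmented design matrix of Eq. (8) for a
# quadratic basis in two variables.  Each design point contributes one
# function-value row and one row per partial derivative.
def point_rows(x1, x2):
    return np.array([
        [1.0, x1,  x2,  x1**2, x1*x2, x2**2],  # function-value row
        [0.0, 1.0, 0.0, 2*x1,  x2,    0.0  ],  # d/dx1 of each basis function
        [0.0, 0.0, 1.0, 0.0,   x1,    2*x2 ],  # d/dx2 of each basis function
    ])

def design_matrix(points):
    return np.vstack([point_rows(*p) for p in points])

# 3-level factorial design on [-1, 1]^2: nf = 9 points, no = 9 * 3 = 27 data
pts = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
X = design_matrix(pts)
print(X.shape)   # (27, 6)
```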

The method of Weighted Least Squares (WLS) is used when the construction of a response surface is based on observations of differing importance, which are then weighted accordingly. The WLS estimator b of β is then:

b = (XᵀWX)⁻¹XᵀWy    (9)

where W is the [no × no] matrix of weights. It can be shown [1, p. 69] that the WLS estimator is also the Minimum Variance Unbiased linear Estimator (MVUE) for the estimable parametric functions if:

E(ε) = 0 and Cov(ε) = W⁻¹    (10)
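The estimator (9) can be sketched in a few lines; this is a minimal illustration, not the authors' implementation, and the synthetic data are hypothetical.

```python
import numpy as np

# The WLS estimator of Eq. (9), b = (X^T W X)^{-1} X^T W y.  Solving the
# normal equations with np.linalg.solve avoids forming an explicit inverse.
def wls(X, y, W):
    XtW = X.T @ W
    return np.linalg.solve(XtW @ X, XtW @ y)

# Sanity check on synthetic noise-free data: any symmetric positive
# definite weighting reproduces the exact coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta                                # exact observations, no error
W = np.diag(rng.uniform(0.5, 2.0, size=12))
print(np.allclose(wls(X, y, W), beta))      # True
```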

That is, if the weight matrix is chosen as the inverse of the covariance matrix, b is the Best Linear Unbiased Estimate (BLUE) of β (Gauss-Markov theorem). When only function values are used to construct an approximation, it is customary to assume that the errors are uncorrelated and that they all have the same variance. This hypothesis translates into the familiar assumption Cov(ε) = σ²I. However, when analyses are based on numerical simulations, the errors in the derivatives at a design point are likely to be correlated with one another and with the error in the function value at that design point. Mathematically this translates into Cov(ε) = C, where C = In ⊗ C̄ and C̄ is the covariance matrix of function-value and gradient prediction errors, a symmetric [(1+k) × (1+k)] matrix. Since the covariance matrix is usually unknown when dealing with numerical analyses, an approximate one must be obtained. From the prediction errors, an estimate of the covariance matrix can be computed as [2]:

Ĉ = [ε0 : ε1 : ... : εk]ᵀ [ε0 : ε1 : ... : εk] / n    (11)

where ε0 is an [n × 1] vector of prediction errors in function values and εj is a vector of prediction errors in the j-th derivative:

ε0 = [ε1, ..., εn]ᵀ ,  εj = [ε1j, ..., εnj]ᵀ ,  j = 1..k    (12)
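A sketch of (11)-(12) and of the corresponding weight matrix, with illustrative residuals; the helper names are hypothetical.

```python
import numpy as np

# Eqs. (11)-(12): stack the per-point prediction errors as the columns
# eps_0 (function values) and eps_1..eps_k (derivatives) of an
# [n x (1 + k)] matrix E, and estimate C_hat = E^T E / n.  The full
# weight matrix of the WLS fit is then W = I_n kron inv(C_hat).
def covariance_estimate(E):
    n = E.shape[0]
    return E.T @ E / n

def weight_matrix(C_hat, n):
    return np.kron(np.eye(n), np.linalg.inv(C_hat))

# Illustrative residuals for n = 4 design points and k = 2 variables.
rng = np.random.default_rng(1)
E = rng.normal(size=(4, 3))
C_hat = covariance_estimate(E)
W = weight_matrix(C_hat, 4)
print(C_hat.shape, W.shape)   # (3, 3) (12, 12)
```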

An Iteratively Weighted Least Squares (IWLS) scheme was proposed by [3], which starts by determining the unweighted coefficient estimates b (corresponding to W = I). Based upon these, the covariance matrix Ĉ may be estimated and the weight matrix updated accordingly (W = In ⊗ Ĉ⁻¹). Using the improved weight matrix, new coefficients may then be computed, and so on. This iterative procedure is carried out until the coefficient estimates b converge to constant values.

5. Results
Assume that function values and gradients are obtained at a fixed number of design points. If only function values are considered, it is possible to obtain a polynomial of order P. By inclusion of gradient information, the number of observations increases from n to n(k+1). Therefore, it might be possible to achieve an approximation of the same order P with higher accuracy. The following example demonstrates the advantage of using gradient information. Consider the cubic and quartic polynomials c and q in two variables:

c(x1, x2) = 1 + x1 + x2 + x1² + x1x2 + x2² + S(x1³ + x2x1² + x2²x1 + x2³)    (13)

q(x1, x2) = 1 + x1 + x2 + x1² + x1x2 + x2² + S(x1³ + x2x1² + x2²x1 + x2³ + x1⁴ + x2x1³ + x2²x1² + x2³x1 + x2⁴)    (14)

where S is a scaling factor acting on monomials of order higher than two. The quadratic polynomial

ŷ(x1, x2) = b0 + b1x1 + b2x2 + b3x1² + b4x1x2 + b5x2² ,  x1 ∈ [−1; 1] ,  x2 ∈ [−1; 1]    (15)

is used as the approximate model to fit data extracted from c and q on a 3-level factorial design (3 by 3 grid). Two approximations are made on both c and q: one using only function values in a LS regression, and one using both function values and derivatives in an IWLS procedure. The approximations are compared for different scaling factors using the analytic root mean square error obtained through integration over the design space:

Erms = [ (1/A) ∫₋₁¹ ∫₋₁¹ (ŷ(x1, x2) − fc/q(x1, x2))² dx1 dx2 ]^(1/2)    (16)

Erms,i = [ (1/A) ∫₋₁¹ ∫₋₁¹ (dŷ(x1, x2)/dxi − dfc/q(x1, x2)/dxi)² dx1 dx2 ]^(1/2)    (17)

where A is the area covered by the design space. Figure 1 shows the RMS-error versus the scaling factor. The accuracy with and without gradients is very similar for the cubic polynomial, but the gradients reduce the error substantially for the quartic polynomial. Figure 2 shows that the accuracy of the derivatives is not affected much by the inclusion of gradients (note that Erms,1 = Erms,2 due to the symmetry of y(x1, x2) with respect to x1 = x2).

Figure 1. RMS-error, (16), in function values versus scaling factor.

Figure 2. RMS-error, (17), in derivative values versus scaling factor.
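The experiment behind Figures 1 and 2 can be reproduced in outline with a short script. This is a sketch under stated assumptions (the cubic c of (13) with S = 1, a fixed number of iterations rather than a convergence test, and a pseudo-inverse to guard the covariance inversion); the helper names are hypothetical, not from the paper.

```python
import numpy as np

# Fit the quadratic model (15) to function values and gradients of the
# cubic polynomial c of Eq. (13) (here S = 1) on a 3-level factorial
# design, iterating the weighted fit as in the IWLS scheme.
def c_val(x1, x2, S=1.0):
    return (1 + x1 + x2 + x1**2 + x1*x2 + x2**2
            + S * (x1**3 + x2 * x1**2 + x2**2 * x1 + x2**3))

def c_grad(x1, x2, S=1.0):
    return np.array([
        1 + 2*x1 + x2 + S * (3*x1**2 + 2*x1*x2 + x2**2),
        1 + x1 + 2*x2 + S * (x1**2 + 2*x1*x2 + 3*x2**2),
    ])

def rows(x1, x2):
    # one function-value row plus one row per partial derivative, as in (8)
    return np.array([
        [1.0, x1,  x2,  x1**2, x1*x2, x2**2],
        [0.0, 1.0, 0.0, 2*x1,  x2,    0.0  ],
        [0.0, 0.0, 1.0, 0.0,   x1,    2*x2 ],
    ])

pts = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
X = np.vstack([rows(*p) for p in pts])
y = np.concatenate([[c_val(*p), *c_grad(*p)] for p in pts])

W = np.eye(len(y))                          # start unweighted, W = I
for _ in range(10):                         # fixed number of IWLS sweeps
    b = np.linalg.lstsq(X.T @ W @ X, X.T @ W @ y, rcond=None)[0]
    E = (y - X @ b).reshape(len(pts), 3)    # per-point [eps_0, eps_1, eps_2]
    C_hat = E.T @ E / len(pts)              # covariance estimate, Eq. (11)
    W = np.kron(np.eye(len(pts)), np.linalg.pinv(C_hat))
print(np.round(b, 3))                       # gradient-enhanced coefficients
```

The pseudo-inverse is a defensive choice: the paper notes (Section 5.4) that the estimated covariance can become singular, in which case a plain inverse would fail.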

5.1. Accuracy of the Estimated Covariance in the Presence of Bias Error
In the presence of bias error, we would like the estimated covariance matrix to be close to the covariance matrix that corresponds to the approximation that minimizes the analytic RMS-error, (16). Therefore, we will compare the results of the IWLS procedure with the results of a WLS procedure with the weight based on that best approximation. To reduce bias error, it is customary to use designs in the interior of the region instead of on the boundary, and so we scale the design by a factor α:

x1 = (−α, −α)   x2 = (0, −α)   x3 = (α, −α)
x4 = (−α, 0)    x5 = (0, 0)    x6 = (α, 0)
x7 = (−α, α)    x8 = (0, α)    x9 = (α, α)    (18)
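The scaled design of (18) is a plain 3 by 3 grid; a minimal sketch (`factorial_design` is a hypothetical helper name):

```python
import numpy as np

# The scaled 3-level factorial design of Eq. (18): a 3 x 3 grid on
# [-alpha, alpha]^2, enumerated with x1 varying fastest, as above.
def factorial_design(alpha):
    levels = (-alpha, 0.0, alpha)
    return np.array([(x1, x2) for x2 in levels for x1 in levels])

pts = factorial_design(0.7)
print(pts[:3])   # first row of the grid: (-0.7, -0.7), (0, -0.7), (0.7, -0.7)
```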

The approximation that minimizes the exact integrated RMS-error ((16) with ranges −α, α) is found to be

ŷqb(x1, x2, α) = 1 + ((14α² + 15)/15)(x1 + x2) + x1² + x1x2 + x2²    (19)

where subscript 'qb' denotes the best quadratic approximation based on the analytic error. If we denote the errors in function values and derivatives for this best quadratic approximation as

E0 = ŷqb − y ,  E1 = dŷqb/dx1 − dy/dx1 ,  E2 = dŷqb/dx2 − dy/dx2    (20)

then the ideal covariance matrix Cα will have the components

Cα,ij = (1/(4α²)) ∫₋α^α ∫₋α^α Ei Ej dx1 dx2    (21)

Figure 3 shows the analytic RMS-error over the original region (−1, 1) based on data in the reduced region. The crosses and circles overlap, which indicates that the IWLS covariance matrix indeed converges closely to the ideal matrix. For comparison, Figure 3 also shows the results for WLS with α = 1 and for the gradient-based normal Least Squares (LS) procedure (W = I). It is interesting to note that the minimum-bias value of α increases substantially, from 0.6 to 0.7.


Figure 3. Analytic RMS-errors as functions of α.

5.2. Accuracy of the Estimated Covariance in the Presence of Correlated Random Error
The following example is based on the approximation of a quadratic polynomial in two dimensions with correlated random errors in function values and gradients. All errors are generated using normally distributed random errors with zero mean and variance σ² = 0.05. To achieve correlation between the function value and its derivatives, the following correlation matrix is used:

    [ 1.0  0.5  0.5 ]
    [ 0.5  1.0  0.5 ]    (22)
    [ 0.5  0.5  1.0 ]

Thus, at any point the true covariance matrix and its inverse are

    C = [ 1.0  0.5  0.5 ] [ 0.05  0     0    ] [ 1.0  0.5  0.5 ]   [ 75.0  62.5  62.5 ]
        [ 0.5  1.0  0.5 ] [ 0     0.05  0    ] [ 0.5  1.0  0.5 ] = [ 62.5  75.0  62.5 ] · 10⁻³    (23)
        [ 0.5  0.5  1.0 ] [ 0     0     0.05 ] [ 0.5  0.5  1.0 ]   [ 62.5  62.5  75.0 ]

    C⁻¹ = [  55.0  −25.0  −25.0 ]
          [ −25.0   55.0  −25.0 ]    (24)
          [ −25.0  −25.0   55.0 ]

For each of a 3-, 6- and 9-level factorial design in the range [−1; 1], a set of correlated random errors is generated. In each of the three cases the IWLS procedure is applied, and the covariance matrices obtained at convergence are as follows:

    Ĉ3⁻¹ = [  423  −132  −260 ]   Ĉ6⁻¹ = [  82.9  −20.2  −53.1 ]   Ĉ9⁻¹ = [  56.8  −22.8  −32.1 ]
           [ −132   424  −290 ] ,        [ −20.2   57.5  −24.9 ] ,        [ −22.8   48.5  −19.6 ]    (25)
           [ −260  −290   532 ]          [ −53.1  −24.9   83.2 ]          [ −32.1  −19.6   56.2 ]

It is evident that the accuracy of the estimated covariance matrix increases with the number of data, which is in agreement with [4], who demonstrated that the prediction-error-covariance (11) is biased of order 1/n. The estimated covariance matrices in (25) are from a single realization, however, and will change for different realizations. In the following, we will demonstrate how gradient-based approximations compare to ordinary ones when the errors are correlated and random. As the true function we will use

f(x) = 1 + x1 + x2 + x1² + x1x2 + x2² ,  x = [x1, x2]    (26)

Correlated random errors are added so that

y(xi) = f(xi) + εi    (27)

∂y(xi)/∂xj = ∂f(xi)/∂xj + εij ,  j = 1..2    (28)

where

    [ εi  ]   [ 1.0  0.5  0.5 ] [ N(0, 0.5)  ]
    [ εi1 ] = [ 0.5  1.0  0.5 ] [ N(0, σg²) ]    (29)
    [ εi2 ]   [ 0.5  0.5  1.0 ] [ N(0, σg²) ]
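The error generation of (29) can be sketched as follows: independent normal draws are mixed through the correlation matrix, so the resulting covariance has the sandwich form of (23). The sample size and seed are illustrative assumptions.

```python
import numpy as np

# Correlated error generation as in Eq. (29): each error triple is
# R @ z_i with z_i independent normals, so Cov(eps) = R D R (compare
# Eq. (23), where D = 0.05 I).
rng = np.random.default_rng(2)
R = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
sg2 = 0.5
D = np.diag([0.5, sg2, sg2])                 # variances of the raw draws
z = rng.normal(0.0, np.sqrt(np.diag(D)), size=(100_000, 3))
eps = z @ R                                   # R is symmetric: rows are R @ z_i
C_emp = eps.T @ eps / len(eps)                # empirical covariance
print(np.round(C_emp, 3))                     # approaches R @ D @ R
```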

Ten samples of errors are generated for four different values of σg². Based on a 3-level factorial design, gradient-based approximations are constructed through use of IWLS and compared to LS approximations based on function values. Figure 4 shows the analytic RMS-error as an average taken over the ten different approximations obtained in each case. It is seen that the use of IWLS produces equally good approximations of function values and derivatives and is only slightly disturbed by an increasing level of error in the derivatives. The LS approximations of function values, on the other hand, exhibit major difficulty in predicting the true function for increasing levels of derivative error, since the errors in function values are correlated with these.

Figure 4. Analytic RMS-errors for varying levels of errors in derivatives.

5.3. Increasing the Order of an Approximation Through Inclusion of Gradients
Consider the case where the available computational resources allow a quadratic approximation ((k+1)(k+2)/2 parameters). If there are k derivatives for each design point, the total number of observations no satisfies

no > (k+1)(k+2)/2 · (k+1)    (30)

and the number of entries in the covariance matrix is Nc = (k+1)(k+2)/2. While the number of parameters in a cubic approximation is (k+1)(k+2)(k+3)/6, it is possible to fit a cubic approximation since

(k+1)(k+2)/2 · (k+1)  >  (k+1)(k+2)(k+3)/6 + (k+1)(k+2)/2 ,  for k > 1    (31)

where the left-hand side is the data supplied by the gradient-enhanced quadratic design, and the right-hand side counts the cubic parameters plus the Nc covariance entries.
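The counting in (31) can be checked numerically; a quick sketch:

```python
# Numerical check of inequality (31): the data supplied by a
# gradient-enhanced quadratic design exceeds the number of cubic
# parameters plus the Nc covariance entries for every k > 1.
def n_quad(k):
    return (k + 1) * (k + 2) // 2            # quadratic parameters (= Nc)

def n_cubic(k):
    return (k + 1) * (k + 2) * (k + 3) // 6  # cubic parameters

for k in range(2, 21):
    assert n_quad(k) * (k + 1) > n_cubic(k) + n_quad(k)

# the "k > 1" restriction is tight: the inequality fails at k = 1
assert not (n_quad(1) * 2 > n_cubic(1) + n_quad(1))
```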

To demonstrate the idea of using gradients to increase the order of an approximation, consider the case where the true function is the quartic polynomial in (14) with S = 1. Based on a 3-level factorial design, a gradient-based cubic approximation obtained by the use of IWLS is compared to a quadratic approximation based on function values only. For the gradient-based cubic approximation, the analytic errors in the function values and derivatives are Erms = 0.490 and Erms,g = 1.089, respectively. Comparing these values to Erms = 0.646 and Erms,g = 1.471 obtained for the quadratic approximation based on function values, it is evident that the cubic approximation exhibits better predictive capabilities. However, for the gradient-based quadratic approximation of the quartic function, the analytic errors are Erms = 0.388 and Erms,g = 1.573 (Figures 1 and 2 for S = 1). Thus, when the IWLS procedure is used, the gradient-based quadratic approximation predicts the function values better, at the expense of the derivatives, compared to the gradient-based cubic approximation. On the other hand, the use of a gradient-based LS procedure (W = I) produces Erms = 0.928 and Erms,g = 1.644 for a cubic approximation, and Erms = 1.467 and Erms,g = 2.215 for a quadratic approximation. The summary of these results in Table 1 shows that the IWLS procedure provides better results than a LS approach for both the cubic and the quadratic approximations. Comparing the LS quadratic approximation to the IWLS cubic approximation, the relative weighting of function values and gradients shifts towards the gradients, resulting in a considerable improvement in the prediction of gradients and a small deterioration in the prediction of function values. Therefore, if the accuracy of the prediction of function values is crucial, one may benefit from a more intuitive selection of the weighting matrix (see for example [5]).

Table 1. Analytic RMS-errors for different approximations to the quartic function.

           Quadratic                               Cubic
           function values only   IWLS     LS      IWLS     LS
Erms       0.646                  0.388    1.467   0.490    0.928
Erms,g     1.471                  1.573    2.215   1.089    1.644

5.4. Performance of the IWLS Procedure on a Sparse Data Set
In general, when gradient information is included, the total number of unknowns can be written as p + Nc, where p is the number of parameters of the response surface and Nc is the number of terms Ĉij to be estimated. For a quadratic approximation with full correlation, we have p + Nc = (k+1)(k+2). Assuming that gradients are available at every design point (k+1 data per point), the number of design points can be reduced to a minimum of k+2. However, as the number of design points decreases, there is an increased risk of XᵀWX becoming singular. This section serves the purpose of addressing the singularity issue for a sparse number of design points when the error is biased.

Consider the cubic polynomial in k dimensions as the true function, and an approximation of quadratic form. The objective is to examine the use of gradients and function values for approximations based on a sparse number of design points. Although it has been demonstrated that the required number of design points is a minimum of k+2, a problem that remains unsolved is how to design the experiments in a reasonable way for such a small number of design points. For simplicity, the experimental designs will therefore consist of the centres of the faces of the hyper-cube plus the central point (also known as Star Designs). Thus, the number of design points n and the number of data no have the following dependence on k:

n(k) = 2k + 1 ,  no(k) = (2k + 1)(k + 1)    (32)
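The Star Design of (32) is straightforward to enumerate; a minimal sketch (`star_design` is a hypothetical helper name):

```python
import numpy as np

# Star Design of Eq. (32): the centre of the hyper-cube plus the two
# face centres on each axis, n = 2k + 1 points; with gradients each
# point supplies k + 1 data, so no = (2k + 1)(k + 1).
def star_design(k, alpha=1.0):
    pts = [np.zeros(k)]
    for i in range(k):
        for s in (alpha, -alpha):
            p = np.zeros(k)
            p[i] = s
            pts.append(p)
    return np.array(pts)

for k in (2, 5, 7, 10):
    pts = star_design(k)
    assert pts.shape == (2 * k + 1, k)                # n = 2k + 1
    assert (pts != 0).sum(axis=1).max() <= 1          # axis points only
print(star_design(2))
```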

Approximations ŷg based on gradients and function values are compared to approximations ŷf based solely on function values according to Table 2. The Box-Behnken designs used for five, seven and ten design variables are those published by [6].

Table 2. The relation between the number of parameters and the number of data for the considered cases.

             ŷg                ŷf
k      p     n     no          n      Experimental Design
2      6     5     15          9      Central Composite
5      21    11    66          46     Box-Behnken
7      36    15    120         62     Box-Behnken
10     66    21    231         170    Box-Behnken

In all cases the iterative procedure encountered problems with singularity of the estimated covariance matrix. After a number of iterations, the weighting of the function values exceeded that of the derivatives by several orders of magnitude, eventually causing the system of equations to become very ill-conditioned. In order to avoid this, the identity matrix was used as the weight matrix. In Table 3 the resulting analytic RMS-errors for all approximations are listed (since the true function is symmetric with respect to x1 = x2, Erms,1 = Erms,2).

Table 3. Analytic RMS-errors in function values and derivatives for the considered cases.

       ŷg                        ŷf
k      Erms        Erms,1        Erms        Erms,1
2      0.407309    1.351507      0.680803    1.201851
5      1.378372    2.194459      1.041468    1.926424
7      2.725635    2.998654      1.753304    2.380476
10     5.260119    4.223080      2.805889    3.098387

The results in Table 3 clearly demonstrate that the use of gradients for reducing the number of analyses is a trade-off between accuracy and cost-efficiency. Although additional information is available, as seen from Table 2, the predictive capabilities of each approximation are reduced. The reason for this is the emergence of an insurmountable problem: when the observations are very sparse, the sampled data are concentrated at a small number of design points, thus enlarging the regions of extrapolation. This problem becomes more severe with an increased number of design variables. However, the approach of using function values and gradients in conjunction with the Star Design for higher dimensions could prove very efficient in the context of preliminary design optimization.

6. Discussion
The basic conclusion that can be drawn from the present work is that inexpensive derivative data can be employed efficiently in RS building. The examples presented here show that by including derivatives and using the IWLS procedure described, a considerable improvement is gained over traditional approximations based on function values only. However, the use of derivative information for approximations on very sparse data sets (as in high-dimensional applications) can be less accurately predictive than approximations based on a similar number of observations of function values, due to the enlarged regions of extrapolation. Although the IWLS procedure is a very useful method of estimating a sound weighting scheme, it requires the function value and gradient to be available for each design point considered. When this is not the case, e.g., if some observations are found to be erroneous and are removed, other weighting schemes must be sought or another procedure must be followed.

7. References
1. Marvin H. J. Gruber. Regression Estimators: A Comparative Study. Statistical Modeling and Decision Science. Academic Press, 1990.
2. André I. Khuri and John A. Cornell. Response Surfaces: Designs and Analyses, volume 81 of Statistics. Marcel Dekker, Inc., 1987.
3. L. F. P. Etman. Some global and mid-range approximation concepts for optimum structural design. Master's thesis, Eindhoven University of Technology, Department of Mechanical Engineering, August 1992. WfW Report 92.099.
4. Arnold Zellner. An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. Journal of the American Statistical Association, 57(298):348–368, 1962.
5. F. van Keulen, B. Liu, and R. T. Haftka. Noise and discontinuity issues in response surfaces based on functions and derivatives. In Proc. of the 41st AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, Atlanta, Georgia, AIAA Paper 2000-1363, April 3-6, 2000.
6. G. E. P. Box and D. W. Behnken. Some new three level designs for the study of quantitative variables. Technometrics, 2(4):455–475, 1960.