Journal of Applied Statistics, Vol. 27, No. 6, 2000, 779-802
A general methodology for bootstrapping in non-parametric frontier models
LÉOPOLD SIMAR & PAUL W. WILSON, Institut de Statistique, Université Catholique de Louvain, Belgium, and Department of Economics, University of Texas, Austin, USA
ABSTRACT The Data Envelopment Analysis method has been extensively used in the literature to provide measures of firms' technical efficiency. These measures allow rankings of firms by their apparent performance. The underlying frontier model is non-parametric since no particular functional form is assumed for the frontier. Since the observations result from some data-generating process, the statistical properties of the estimated efficiency measures are essential for their interpretation. In the general multi-output multi-input framework, the bootstrap seems to offer the only means of inferring these properties (i.e. to estimate the bias and variance, and to construct confidence intervals). This paper proposes a general methodology for bootstrapping in frontier models, extending the more restrictive method proposed in Simar & Wilson (1998) by allowing for heterogeneity in the structure of efficiency. A numerical illustration with real data is provided to illustrate the methodology.

1 Introduction

An extensive literature concerning the measurement of efficiency in production has developed since Debreu (1951) and Farrell (1957) provided basic definitions for technical and allocative efficiency in production. One large section of this literature focuses on linear-programming based measures of efficiency along the lines of Charnes et al. (1978, 1979), Deprins et al. (1984), and Färe et al. (1985).$^1$ Among this part of the literature, those approaches that rely on convexity assumptions are known as Data Envelopment Analysis (DEA). DEA models measure efficiency relative to a non-parametric, maximum likelihood estimate of an unobserved true frontier, conditional on observed data resulting

Correspondence: Léopold Simar, Institut de Statistique, Université Catholique de Louvain, Voie du Roman Pays 20, Louvain-la-Neuve, Belgium.
from an underlying (and unobserved) data-generating process (DGP).$^2$ These methods have been widely applied to examining technical and allocative efficiency in a variety of industries; Lovell (1993) and Seiford (1996, 1997) provide lengthy bibliographies of these applications.

Aside from the production setting, the problem of estimating monotone concave boundaries also naturally occurs in portfolio management. In capital asset pricing models (CAPM), the objective is to analyse the performance of investment portfolios. Risk and average return on a portfolio are analogous to inputs and outputs in models of production; in CAPM, the attainable set of portfolios is naturally convex, and the boundary of this set gives a benchmark relative to which the efficiency of a portfolio can be measured. These models were developed by Markowitz (1959) and others; Sengupta (1991) and Sengupta & Park (1993) provide links between CAPM and non-parametric estimation of frontiers as in DEA.

Lovell (1993) and others have labelled DEA and similar approaches to efficiency measurement as deterministic, as if to suggest that DEA models have no statistical underpinnings. Typical DEA applications invariably present point estimates of inefficiency, with no measure and only slight or no discussion of the uncertainty surrounding these estimates. Yet, since efficiency is measured relative to an estimate of the frontier, estimates of efficiency from DEA models are subject to uncertainty due to sampling variation. Banker (1993) proved the consistency of the output-oriented efficiency scores in the case of a single output, but gave no indication of the achieved rate of convergence. Korostelev et al. (1995a, 1995b) also analysed the single-output problem and derived the speed of convergence for the estimated attainable production set (using either the Hausdorff metric or the Lebesgue measure of symmetric differences between the true and the estimated production sets), but not for the estimated measures of efficiency. The theory of statistical consistency in DEA models has been extended to the general multi-input and multi-output case, for both input- and output-oriented efficiency measures, in Kneip et al. (1998), where the rates of convergence are also derived.

Due to the complexity and multidimensional nature of DEA estimators, the sampling distributions of the estimators are not easily available. In the very particular case of one input and one output, Gijbels et al. (1999) derived the asymptotic sampling distribution of the DEA estimator, with an expression for its asymptotic bias and variance. However, in the more useful multi-output and multi-input case, the bootstrap methodology seems, so far, to be the only way to investigate sampling properties of DEA estimators. Simar & Wilson (1998) proposed a bootstrap strategy for analysing the sensitivity of the efficiency measures to sampling variation, providing confidence intervals and corrections for the bias inherent in the DEA procedure. The method has been applied to time dependence structures for estimating Malmquist indices (Simar & Wilson, 1999). However, the methodology in both these papers relies on some restrictive homogeneity assumptions on the distribution of efficiency among firms. This paper presents an extension of the method to allow for more general DGPs; in particular, for less restrictive efficiency structures.

Section 2 introduces some basic concepts and notation along with a brief introduction to DEA estimators.
In Section 3 we analyse the underlying statistical model and explicitly state our assumptions regarding the DGP. In Section 4 we propose a general bootstrap procedure. An empirical illustration is given in Section 5, and conclusions are discussed in Section 6. Although the discussion throughout is in terms of the Shephard (1970) input distance functions, it is straightforward to adapt the techniques presented below to output distance functions or to other non-parametric models (e.g. FDH models), including DEA measures of allocative efficiency and overall efficiency such as those described by Färe et al. (1985).
2 Frontier analysis and DEA estimators

2.1 The frontier model

Given column vectors of $p$ inputs (denoted by $x \in \mathbb{R}^p_+$) and of $q$ outputs (denoted by $y \in \mathbb{R}^q_+$), the activity of a productive organization can be described by means of the production set $\Psi$ of physically attainable points $(x, y)$:

$$\Psi = \{(x, y) \in \mathbb{R}^{p+q}_+ \mid x \text{ can produce } y\} \qquad (1)$$
This set can be described by its sections, either an input requirement set defined, for all $y$, by

$$X(y) = \{x \in \mathbb{R}^p_+ \mid (x, y) \in \Psi\} \qquad (2)$$

or an output correspondence set defined, for all $x$, by

$$Y(x) = \{y \in \mathbb{R}^q_+ \mid (x, y) \in \Psi\} \qquad (3)$$

Clearly,

$$x \in X(y) \Leftrightarrow y \in Y(x) \qquad (4)$$
The relations between these two sets, along with standard assumptions one may reasonably make on them, are discussed in Section 9.1 of Shephard (1970). The convexity of $X(y)$ for all $y$ (and of $Y(x)$ for all $x$) and the disposability of inputs and outputs are the most usual assumptions. The disposability assumptions correspond to monotonicity of the frontier; i.e. $y_1 \leq y_2$ implies $X(y_2) \subseteq X(y_1)$. The input-efficient boundary of $X(y)$ is defined by

$$\partial X(y) = \{x \in \mathbb{R}^p_+ \mid x \in X(y),\ \theta x \notin X(y)\ \forall\ 0 < \theta < 1\} \qquad (7)$$

For a point $(x, y)$, the Shephard (1970) input distance function is defined as

$$\delta(x, y) = \sup\{\delta > 0 \mid x/\delta \in X(y)\} \qquad (11)$$

Note that $\delta(x, y) \geq 1$ if and only if $x \in X(y)$; if $\delta(x, y) = 1$, then $x \in \partial X(y)$ and the point $(x, y)$ is said to be input-efficient. It will be useful later to denote the efficient level of input, corresponding to the output level $y$ and the input vector direction determined by $x$, as

$$x^{\partial}(y) = \frac{x}{\delta(x, y)} \qquad (12)$$
Note that $x^{\partial}(y)$ is the intersection of $\partial X(y)$ and the ray $(\theta x, y)$, $\theta \in [0, \infty]$. Since radial distances are used, we will often refer to the polar coordinates of $x$, defined by its modulus

$$\omega = \omega(x) \in \mathbb{R}_+, \quad \text{where } \omega(x) = \sqrt{x'x},$$

and its angle $\eta = \eta(x) \in [0, \pi/2]^{p-1}$, where, for $j = 1, \ldots, p-1$,

$$\eta_j = \arctan\Big(\frac{x_{j+1}}{x_1}\Big) \ \text{ if } x_1 > 0, \quad \text{or} \quad \eta_j = \frac{\pi}{2} \ \text{ if } x_1 = 0.$$
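For concreteness, this change of coordinates is easy to implement. The sketch below is ours, not the paper's (the helper names `to_polar` and `from_polar` are hypothetical), and assumes input vectors stored as NumPy arrays:

```python
import numpy as np

def to_polar(x):
    """Modulus omega(x) = sqrt(x'x) and angles eta_j = arctan(x_{j+1}/x_1),
    j = 1, ..., p-1, with eta_j = pi/2 when x_1 = 0, as defined above."""
    omega = np.sqrt(x @ x)
    with np.errstate(divide="ignore", invalid="ignore"):
        eta = np.where(x[0] > 0, np.arctan(x[1:] / x[0]), np.pi / 2)
    return omega, eta

def from_polar(omega, eta):
    """Inverse map: the direction (1, tan(eta_1), ..., tan(eta_{p-1})),
    normalized to unit length and then scaled by the modulus omega."""
    direction = np.r_[1.0, np.tan(eta)]
    return omega * direction / np.sqrt(direction @ direction)
```

Composing the two maps recovers the original vector whenever $x_1 > 0$, which is the case handled explicitly in the definition above.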
Typically, $\Psi$, $X(y)$ and $\partial X(y)$ are unknown; hence, for a firm producing at $(x, y)$, $\delta(x, y)$ is also unknown. The DEA technique provides a consistent estimator of $\delta(x, y)$ from a random sample $\mathcal{X}_n = \{(x_i, y_i) \mid i = 1, \ldots, n\}$.
2.2 The DEA approach

The DEA approach involves measurement of efficiency for a given firm at $(x, y)$, relative to the boundary of either the convex or conical hulls of the data $\mathcal{X}_n = \{(x_i, y_i),\ i = 1, \ldots, n\}$ intersected with the free-disposal hull. The intersection of the convex and free-disposal hulls is given by$^3$

$$\hat\Psi = \Big\{(x, y) \in \mathbb{R}^{p+q}_+ \ \Big|\ y \leq \sum_{i=1}^n \gamma_i y_i,\ x \geq \sum_{i=1}^n \gamma_i x_i,\ \sum_{i=1}^n \gamma_i = 1,\ \gamma_i \geq 0,\ i = 1, \ldots, n\Big\} \qquad (13)$$
Replacing $\Psi$ in (2) and (7) with $\hat\Psi$ yields estimates of the input requirement set and the input-efficient boundary for the output level $y$, respectively:

$$\hat X(y) = \{x \in \mathbb{R}^p_+ \mid (x, y) \in \hat\Psi\} \qquad (14)$$

$$\partial\hat X(y) = \{x \in \mathbb{R}^p_+ \mid x \in \hat X(y),\ \theta x \notin \hat X(y)\ \forall\ 0 < \theta < 1\} \qquad (15)$$
Finally, for any given point $(x, y)$, the estimator $\hat\delta(x, y)$ of $\delta(x, y)$ is obtained by substituting $\hat X(y)$ from (14) for $X(y)$ in (11), yielding

$$\hat\delta(x, y) = \sup\{\delta \mid x/\delta \in \hat X(y)\} \qquad (16)$$
To make this operational, we rewrite (16) as a linear program:

$$(\hat\delta(x, y))^{-1} = \min\Big\{\theta > 0 \ \Big|\ y \leq \sum_{i=1}^n \gamma_i y_i,\ \theta x \geq \sum_{i=1}^n \gamma_i x_i,\ \sum_{i=1}^n \gamma_i = 1,\ \gamma_i \geq 0,\ i = 1, \ldots, n\Big\} \qquad (17)$$
Analogous to equation (12), we have an estimator of the input-efficient level of inputs,

$$\hat x^{\partial}(y) = \frac{x}{\hat\delta(x, y)} \qquad (18)$$
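The linear program (17) can be solved with any off-the-shelf LP code. A minimal sketch using SciPy follows; the function name `dea_input_distance` is ours, and no special handling of degenerate data is attempted:

```python
import numpy as np
from scipy.optimize import linprog

def dea_input_distance(x0, y0, X, Y):
    """Input-oriented VRS DEA estimate: returns delta_hat(x0, y0) = 1/theta,
    where theta solves the linear program (17). X is the (n, p) input matrix,
    Y the (n, q) output matrix; (x0, y0) is the point being evaluated."""
    n = X.shape[0]
    # decision vector: [theta, gamma_1, ..., gamma_n]
    c = np.r_[1.0, np.zeros(n)]                        # minimize theta
    # y0 <= Y' gamma   <=>   -Y' gamma <= -y0
    A_out = np.hstack([np.zeros((Y.shape[1], 1)), -Y.T])
    # X' gamma <= theta * x0   <=>   -theta * x0 + X' gamma <= 0
    A_in = np.hstack([-x0.reshape(-1, 1), X.T])
    A_ub = np.vstack([A_out, A_in])
    b_ub = np.r_[-y0, np.zeros(X.shape[1])]
    A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)       # sum(gamma) = 1 (VRS)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (n + 1))
    return 1.0 / res.x[0]                               # delta_hat >= 1
```

Applied to each sample point in turn, this reproduces Step 1 of the algorithm described in Section 4.2 below.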
Note that $\hat\Psi \subseteq \Psi$, and so $\hat X(y)$ is an upward-biased estimator of $X(y)$, and $\hat\delta(x, y)$ is a downward-biased estimator of $\delta(x, y)$, so that

$$\hat\delta(x, y) \leq \delta(x, y) \qquad (19)$$

Moreover,

$$\hat\delta(x, y) \geq 1 \quad \forall\ (x, y) \in \hat\Psi \qquad (20)$$

Thus, for each of the sample points in $\mathcal{X}_n$, we have

$$\hat\delta(x_i, y_i) \geq 1 \quad \forall\ i = 1, \ldots, n \qquad (21)$$
In order to perform statistical inference on the estimated input distances, we must analyse the behaviour of the difference $(\hat\delta(x, y) - \delta(x, y))$ for a given $(x, y)$ by investigating its sampling distribution.
3 The statistical model and consistency of DEA

Our statistical model is defined through the following assumptions, which were used by Kneip et al. (1998). These assumptions serve to characterize the DGP, and augment Shephard's (1970) assumptions, which are more concerned with the nature of the underlying production set.

Assumption A1: $\{(x_i, y_i),\ i = 1, \ldots, n\}$ are i.i.d. random variables on the convex production set $\Psi$.

For the input-oriented case presented here, consider a point $(x, y) \in \Psi$. Then for a given output level $y$, the corresponding input level $x$ falls on the ray $(\delta x^{\partial}(y), y)$, $\delta \in [1, \infty]$; the deviation of $(x, y)$ away from $\partial X(y)$ is assumed to result from technical inefficiency.

Assumption A2: Outputs $y$ possess a density $f(\cdot)$ whose bounded support $\mathcal{Y} \subseteq \mathbb{R}^q_+$ is compact.

Due to the radial nature of the inefficiency, the conditional p.d.f. of $x$ for a given $y$ is more naturally introduced through the polar coordinates of $x$.

Assumption A3: For all $y \in \mathcal{Y}$, $\eta = (\eta_1, \ldots, \eta_{p-1})$ has a conditional p.d.f. $f(\eta \mid y)$ on $[0, \pi/2]^{p-1}$, and, conditional on $(y, \eta)$, the modulus $\omega$ has a density $f(\omega \mid y, \eta)$.
Note that, for a given $(y, \eta)$, the efficient input level $x^{\partial}(y)$ defined by (12) has modulus

$$\omega(x^{\partial}(y)) = \inf\{\omega \in \mathbb{R}_+ \mid f(\omega \mid y, \eta) > 0\} \qquad (22)$$

The relation between $\omega(x)$ and the input distance function $\delta(x, y)$ is given by

$$\delta(x, y) = \frac{\omega(x)}{\omega(x^{\partial}(y))} \qquad (23)$$

Assumption A3 induces, by (23), a conditional p.d.f. for $\delta(x, y)$ given $(y, \eta)$, namely$^4$ $f(\delta \mid y, \eta)$, with support $[1, \infty)$. In order to achieve consistency in any non-parametric estimation of spatial boundaries, the DGP must ensure that points will be observed near the frontier when $n$ is sufficiently large. This imposes, in our case, an assumption on the conditional p.d.f. of $\omega$ given $(y, \eta)$.

Assumption A4: For all $y \in \mathcal{Y}$ and all $\eta \in [0, \pi/2]^{p-1}$, there exist constants $\varepsilon_1 > 0$ and $\varepsilon_2 > 0$ such that $f(\omega \mid y, \eta) \geq \varepsilon_1$ for all $\omega \in [\omega(x^{\partial}(y)),\ \omega(x^{\partial}(y)) + \varepsilon_2]$.
Again, this implies by (23) that $f(\delta \mid y, \eta) \geq \varepsilon_1$ for $\delta \in [1, 1 + \varepsilon_2]$. In addition, Kneip et al. (1998) make two additional assumptions regarding the smoothness of the frontier in order to derive the rates of convergence; these may be written in terms of $\delta(x, y)$. For the sake of simplicity, we assume the following here.$^5$

Assumption A5: The distance function $\delta(x, y)$ is differentiable in both its arguments.

Consider now a fixed point $(x, y)$ and a sample $\mathcal{X}_n = \{(x_i, y_i) \mid i = 1, \ldots, n\}$ generated by the DGP defined by A1-A5 in terms of polar coordinates, where the modulus is defined through the distance function with respect to the frontier by equation (23). An observation $(x_i, y_i) \in \mathbb{R}^{p+q}_+$ has as its polar coordinate representation

$$(y_i, \eta_i, \delta_i) \in \mathcal{U} = \mathbb{R}^q_+ \times [0, \pi/2]^{p-1} \times [1, \infty) \qquad (24)$$

where

$$\delta_i = \frac{\omega(x_i)}{\omega(x^{\partial}(y_i))} \qquad (25)$$

The DGP $\mathcal{P}$ is completely defined through the density of $(y_i, \eta_i, \delta_i)$ on $\mathcal{U}$:

$$f(y_i, \eta_i, \delta_i) = f(\delta_i \mid y_i, \eta_i)\, f(\eta_i \mid y_i)\, f(y_i) \qquad (26)$$

Kneip et al. (1998) proved that, for a fixed point $(x, y)$,

$$\hat\delta(x, y) - \delta(x, y) = O_P\big(n^{-2/(p+q+1)}\big) \qquad (27)$$
Thus, $\hat\delta(x, y)$ is a consistent estimator of $\delta(x, y)$, but the rate of convergence is low; furthermore, by construction, $\hat\delta(x, y)$ is downward-biased. Unfortunately, very few results exist on the sampling distribution of $\hat\delta(x, y)$. Gijbels et al. (1999) derived the asymptotic distribution of $\hat\delta(x, y)$ in the special case of one input and one output ($p = q = 1$), along with an analytic expression for its large-sample bias and variance. This would allow one to construct a bias-corrected estimator and confidence intervals for this special case. Unfortunately, in the more general multivariate setting, the radial nature of the distance function and the complexity of the estimated frontier complicate the derivations; so far, the bootstrap appears to offer the only way to approximate this asymptotic distribution. The next section proposes a general methodology for bootstrapping the distribution of $\hat\delta(x, y) - \delta(x, y)$.

4 The bootstrap

4.1 The principles

Let $\mathcal{P}$ denote the DGP defined by assumptions A1-A5, from which the random sample $\mathcal{X}_n = \{(x_i, y_i) \mid i = 1, \ldots, n\}$ is obtained. Consider again a fixed point $(x, y)$. From (16) we can obtain an estimate $\hat\delta(x, y)$ of $\delta(x, y)$. Typically, we would choose this fixed point to correspond to one of the points in $\mathcal{X}_n$, but this is not necessary.

Given a consistent estimator $\hat{\mathcal{P}}$ of $\mathcal{P}$ estimated from the data $\mathcal{X}_n$, consider now a new data set $\mathcal{X}_n^* = \{(x_i^*, y_i^*),\ i = 1, \ldots, n\}$ drawn from $\hat{\mathcal{P}}$. The convex hull of $\mathcal{X}_n^*$ gives an estimator $\hat\Psi^*$ of $\hat\Psi$, which, from the perspective of $\mathcal{X}_n^*$, is the true set of possible values, and which in our original setting was an estimate of $\Psi$ defined in (1). Specifically, we have

$$\hat\Psi^* = \Big\{(x, y) \in \mathbb{R}^{p+q}_+ \ \Big|\ y \leq \sum_{i=1}^n \gamma_i y_i^*,\ x \geq \sum_{i=1}^n \gamma_i x_i^*,\ \sum_{i=1}^n \gamma_i = 1,\ \gamma_i \geq 0,\ i = 1, \ldots, n\Big\} \qquad (28)$$
Analogous to (14)-(15), corresponding to $\hat\Psi^*$ we have

$$\hat X^*(y) = \{x \in \mathbb{R}^p_+ \mid (x, y) \in \hat\Psi^*\} \qquad (29)$$

and

$$\partial\hat X^*(y) = \{x \in \mathbb{R}^p_+ \mid x \in \hat X^*(y),\ \theta x \notin \hat X^*(y)\ \forall\ 0 < \theta < 1\} \qquad (30)$$

Replacing $\hat X(y)$ with $\hat X^*(y)$ in (16), we have

$$\hat\delta^*(x, y) = \sup\{\delta \mid x/\delta \in \hat X^*(y)\} \qquad (31)$$
which may be evaluated by solving the linear program

$$(\hat\delta^*(x, y))^{-1} = \min\Big\{\theta > 0 \ \Big|\ y \leq \sum_{i=1}^n \gamma_i y_i^*,\ \theta x \geq \sum_{i=1}^n \gamma_i x_i^*,\ \sum_{i=1}^n \gamma_i = 1,\ \gamma_i \geq 0,\ i = 1, \ldots, n\Big\} \qquad (32)$$
Note that, conditional on $\mathcal{X}_n$, the sampling distribution of $\hat\delta^*(x, y)$ is (in principle) completely known, since $\hat{\mathcal{P}}$ is known, although it may be difficult to compute analytically. However, the sampling distributions are easily approximated by Monte Carlo methods. Using $\hat{\mathcal{P}}$ to generate $B$ samples $\mathcal{X}_b^*$, $b = 1, \ldots, B$, yields a set of pseudo-estimates $\hat\delta_b^*(x, y)$, $b = 1, \ldots, B$. The empirical density function of these bootstrap values gives a Monte Carlo approximation of the sampling distribution of $\hat\delta^*(x, y)$, conditional on $\hat{\mathcal{P}}$.

The bootstrap method introduced by Efron (1979) (see also Efron, 1982, or Efron & Tibshirani, 1993) is based on the idea that, if $\hat{\mathcal{P}}$ is a consistent estimator of $\mathcal{P}$, the known bootstrap distributions will mimic the original unknown sampling distributions of the estimators of interest. More specifically,

$$\big(\hat\delta^*(x, y) - \hat\delta(x, y)\big)\,\big|\,\hat{\mathcal{P}} \ \overset{\text{approx}}{\sim}\ \big(\hat\delta(x, y) - \delta(x, y)\big)\,\big|\,\mathcal{P} \qquad (33)$$
A naive bootstrap would consist of sampling the pairs $(x_i^*, y_i^*)$ with replacement from the original pairs in $\mathcal{X}_n$. In boundary problems such as this, the naive bootstrap yields inconsistent estimates (see Bickel & Freedman, 1981, or Beran & Ducharme, 1991, for discussions of this problem in the context of estimating the support of a univariate density; see Gijbels et al., 1999, for a two-dimensional case). To illustrate, consider a fixed point $(x, y)$. With non-zero probability, the naive bootstrap estimate $\hat\delta^*(x, y)$ will equal $\hat\delta(x, y)$. This can be proven as follows. In $\mathbb{R}^{p+q}_+$, $(\partial\hat X(y), y)$ is a polyhedron defined by all the dominating facets of $(x_i, y_i)$, $i = 1, \ldots, n$, where the dominating facets are determined by $(p+q)$ observed efficient points. The probability that the bootstrap sample $\mathcal{X}_n^*$ contains the original dominating facet of $(x, y)$ is then

$$\sum_{j=0}^{p+q} (-1)^j \binom{p+q}{j} \Big(1 - \frac{j}{n}\Big)^n$$

which has the limit $(1 - e^{-1})^{p+q} > 0$ as $n \to \infty$. Consequently, in the naive bootstrap, $\hat\delta^*(x, y) = \hat\delta(x, y)$ with non-zero probability; moreover, this problem does not go away as $n \to \infty$, and so the naive bootstrap is inconsistent.

One way to overcome this difficulty is to use a smoothed bootstrap, i.e. to draw i.i.d. bootstrap samples $(x_i^*, y_i^*)$, $i = 1, \ldots, n$, from a density $\hat f(x, y)$ on $\hat\Psi$, where $\hat f(x, y)$ is a smooth, consistent estimator of the joint density of $(x, y)$ on $\Psi$. In terms of the polar coordinates introduced in equations (24)-(25), this is equivalent to estimating the density $f(y, \eta, \delta)$ and drawing bootstrap samples $(y_i^*, \eta_i^*, \delta_i^*)$ from the estimated density. Unfortunately, $f(y, \eta, \delta)$ has bounded support as described by (24), and ordinary kernel density estimates are known to be biased and inconsistent near boundaries. Scott (1992) proposes the use of boundary kernels to deal with this problem in the univariate case, but it is not clear how this method could be extended to higher-dimensional spaces such as ours. Instead, we adopt the reflection method proposed by Silverman (1986) to estimate $f(y, \eta, \delta)$.$^6$

Since the frontier is unknown, $\delta$ is not directly observable in our sample. Therefore, we will estimate $f(y, \eta, \delta)$ from the set of points $\{(y_i, \eta_i, \hat\delta_i),\ i = 1, \ldots, n\}$, where $\hat\delta_i$ is the consistent DEA estimator of $\delta_i$. To estimate this density consistently, recall from equation (24) that $\hat\delta_i \in [1, \infty]$. As noted earlier, in a given sample of size $n$ we will likely observe many values $\hat\delta_i = 1$. To ensure consistency of our estimator $\hat f$, we reflect each point $(y_i, \eta_i, \hat\delta_i)$ through the boundary at unity for $\hat\delta_i$.$^7$ Let $\mathcal{Z}$ denote the $n \times (p+q)$ matrix

$$\mathcal{Z} = [\,y_i \quad \eta_i \quad \hat\delta_i\,] \qquad (34)$$

so that the $i$th row of $\mathcal{Z}$ contains the observation for the $i$th firm expressed in polar coordinates. Then the matrix of points reflected about the boundary $\hat\delta_i = 1$ is

$$\mathcal{Z}^R = [\,y_i \quad \eta_i \quad (2 - \hat\delta_i)\,] \qquad (35)$$

Let $z_i$ denote the $i$th row of $\mathcal{Z}$, and $z_i^R$ denote the $i$th row of $\mathcal{Z}^R$. The 'augmented' set of points is now given by the $2n \times (p+q)$ matrix

$$\tilde{\mathcal{Z}} = \begin{bmatrix} \mathcal{Z} \\ \mathcal{Z}^R \end{bmatrix} \qquad (36)$$
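In code, the reflection amounts to flipping the last column about unity; a minimal sketch (the helper name `augment_by_reflection` is ours, with rows assumed ordered as $(y_i, \eta_i, \hat\delta_i)$):

```python
import numpy as np

def augment_by_reflection(Z):
    """Stack Z (rows (y_i, eta_i, delta_hat_i)) on its reflection about
    delta = 1, giving the 2n x (p+q) matrix Z-tilde of (36)."""
    Z_R = Z.copy()
    Z_R[:, -1] = 2.0 - Z_R[:, -1]   # (35): delta_hat_i -> 2 - delta_hat_i
    return np.vstack([Z, Z_R])
```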
We wish to estimate the unknown density of the $2n$ unbounded points represented by $\tilde{\mathcal{Z}}$. We use a multivariate Gaussian kernel function scaled to have the same shape as the cloud of points in $(p+q)$-space represented in $\tilde{\mathcal{Z}}$. Specifically, let $\hat\Sigma_1$ be an estimate of the covariance matrix of $\mathcal{Z}$, which can be written in partitioned form as

$$\hat\Sigma_1 = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \qquad (37)$$

where $\Sigma_{11}$ is $(p+q-1) \times (p+q-1)$, $\Sigma_{12} = \Sigma_{21}'$ is $(p+q-1) \times 1$, and $\Sigma_{22}$ is scalar.$^8$ Accordingly, the corresponding estimate of the covariance matrix of the reflected data is

$$\hat\Sigma_2 = \begin{bmatrix} \Sigma_{11} & -\Sigma_{12} \\ -\Sigma_{21} & \Sigma_{22} \end{bmatrix} \qquad (38)$$

Finally, let $K_\ell(\cdot)$ be a $(p+q)$-variate Gaussian density with zero mean and shape $\hat\Sigma_\ell$ for $\ell = 1, 2$; i.e.

$$K_\ell(x) = (2\pi)^{-(p+q)/2}\big(\det \hat\Sigma_\ell\big)^{-1/2} \exp\Big(-\tfrac{1}{2}\, x' \hat\Sigma_\ell^{-1} x\Big), \quad \ell = 1, 2 \qquad (39)$$

A consistent kernel estimator of the density of $z = (y, \eta, \delta)$ over the $2n$ observations in $\tilde{\mathcal{Z}}$ is given by

$$\tilde f_h(z) = \frac{1}{2nh^{p+q}} \sum_{i=1}^n \Big[ K_1\Big(\frac{z - z_i}{h}\Big) + K_2\Big(\frac{z - z_i^R}{h}\Big) \Big] \qquad (40)$$

where $h$ is the bandwidth. Then a consistent estimator $\hat f_h$ of $f$, with the bounded support $\mathcal{U}$ defined in equation (24), is obtained by setting

$$\hat f_h(z) = \begin{cases} 2\tilde f_h(z) & \text{if } z \in \mathcal{U} \\ 0 & \text{otherwise} \end{cases} \qquad (41)$$
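As an illustration, (40)-(41) can be evaluated directly at a point. The sketch below is ours and deliberately unoptimized; the function name `f_hat` is hypothetical, and sample covariances stand in for $\hat\Sigma_1$ and $\hat\Sigma_2$:

```python
import numpy as np
from scipy.stats import multivariate_normal

def f_hat(z, Z, h):
    """Reflected-kernel density estimate (40)-(41) at z = (y, eta, delta).
    Z holds the rows (y_i, eta_i, delta_hat_i); returns 0 for delta < 1."""
    if z[-1] < 1.0:
        return 0.0                       # outside the support U of (24)
    n, m = Z.shape
    Z_R = Z.copy()
    Z_R[:, -1] = 2.0 - Z_R[:, -1]        # reflected points, as in (35)
    K1 = multivariate_normal(mean=np.zeros(m), cov=np.cov(Z, rowvar=False))
    K2 = multivariate_normal(mean=np.zeros(m), cov=np.cov(Z_R, rowvar=False))
    f_tilde = (K1.pdf((z - Z) / h).sum()
               + K2.pdf((z - Z_R) / h).sum()) / (2 * n * h**m)
    return 2.0 * f_tilde                 # the factor 2 in (41)
```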
The remaining question regards the proper choice of the bandwidth $h$. Consistency of $\tilde f_h(z)$, and hence of $\hat f_h(z)$, requires $h \to 0$ as $n \to \infty$, but not too quickly; in particular, we need $h = O(n^{-1/(p+q+4)})$. One possibility is to use the normal reference rule (e.g. see Scott, 1992), which assigns

$$\hat h = \Big(\frac{4}{p+q+2}\Big)^{1/(p+q+4)} n^{-1/(p+q+4)} \qquad (42)$$
This bandwidth minimizes the asymptotic mean integrated square error (AMISE) when the data are normally distributed and have been prewhitened to have unit variance and zero covariance. Unfortunately, the data in $\tilde{\mathcal{Z}}$ will almost certainly be highly non-normal, and so using equation (42) to determine the bandwidth will almost certainly fail to minimize the AMISE. A better approach is to use the data to choose a value $\hat h$ that minimizes an estimate of the mean integrated square error (MISE); we provide such a method in the appendix.
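For reference, the rule (42) itself is a one-liner; a sketch (the function name is ours):

```python
def normal_reference_bandwidth(n, p, q):
    """Normal reference rule (42); appropriate only for prewhitened,
    roughly normal data."""
    d = p + q
    return (4.0 / (d + 2)) ** (1.0 / (d + 4)) * n ** (-1.0 / (d + 4))
```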
Generating a random sample of size $n$ from $\hat f_h(z)$ in equation (41) is straightforward; an algorithm is given in the next section. This provides pseudo-observations $\{(y_i^*, \eta_i^*, \delta_i^*) \in \mathcal{U},\ i = 1, \ldots, n\}$, which can be translated back to the rectangular coordinates $\{(x_i^*, y_i^*),\ i = 1, \ldots, n\}$. To demonstrate this, consider the draw $(y_i^*, \eta_i^*, \delta_i^*)$ from $\hat f_h(z)$ in equation (41). Given the output level $y_i^*$ and the angle $\eta_i^*$, we must determine the location of the original estimated frontier $\partial\hat X(y_i^*)$ in order to compute $x_i^*$. This is accomplished by solving an additional linear program. Let $\tilde x$ be any point in the $x$-space on the ray with angle $\eta_i^*$ (for instance, take $\tilde x_1 = 1$ and $\tilde x_{j+1} = \tan(\eta_{ij}^*)$ for $j = 1, \ldots, p-1$). For this point $(\tilde x, y_i^*)$, use equation (17) to compute $\hat\delta(\tilde x, y_i^*)$. Given the output level $y_i^*$ and the angle $\eta_i^*$ contained in our draw from $\hat f_h(z)$, we can replace $\hat\delta(x, y)$ with $\hat\delta(\tilde x, y_i^*)$ in equation (18), and the $x$ appearing in the numerator of (18) with $\tilde x$, to obtain

$$\hat x^{\partial *}(y_i^*) = \frac{\tilde x}{\hat\delta(\tilde x, y_i^*)} \qquad (43)$$

which gives the (true) input-efficient level of inputs in the bootstrap world.$^9$ In other words, $\hat x^{\partial *}(y_i^*)$ gives the value of the input vector with angle $\eta_i^*$, corresponding to output level $y_i^*$, and lying on the estimated frontier $\partial\hat X(y_i^*)$. Finally, we can define the Cartesian coordinates $x_i^*$ by computing

$$x_i^* = \delta_i^*\, \hat x^{\partial *}(y_i^*) \qquad (44)$$
Thus, $x_i^*$ is formed by taking a random deviation away from the input vector $\hat x^{\partial *}(y_i^*)$ lying on the estimated frontier $\partial\hat X(y_i^*)$, consistent with our assumptions A1-A5 regarding the underlying DGP.

Now, having constructed the pseudo-sample $\mathcal{X}_n^* = \{(x_i^*, y_i^*),\ i = 1, \ldots, n\}$, we can compute the bootstrap value $\hat\delta^*(x, y)$ by solving the linear program (32).$^{10}$ To clarify, we detail the various steps we have described above in algorithmic form in the next section.

4.2 The algorithm

As seen from the previous section, the critical element of our bootstrap procedure involves generation of pseudo-data $(y_i^*, \eta_i^*, \delta_i^*) \in \mathcal{U}$ from the density estimate $\hat f_h(\cdot)$ given by equation (41). Our procedure consists of eleven steps, as detailed below.

Step 1. From the original data set $\mathcal{X}_n$, compute $\hat\delta_i = \hat\delta(x_i, y_i)$, $i = 1, \ldots, n$, using equation (16).

Step 2. Translate the data into polar coordinates $(y_i, \eta_i, \hat\delta_i)$, $i = 1, \ldots, n$, and form the augmented matrix $\tilde{\mathcal{Z}}$ as in (34)-(36).

Step 3. Compute the estimated covariance matrices $\hat\Sigma_1$, $\hat\Sigma_2$ as in (37)-(38), and the lower triangular matrices $L_1$ and $L_2$ such that $\hat\Sigma_1 = L_1 L_1'$ and $\hat\Sigma_2 = L_2 L_2'$, via the Cholesky decomposition.

Step 4. Choose an appropriate bandwidth $h$ as described in the appendix, using the information in $\tilde{\mathcal{Z}}$, $\hat\Sigma_1$ and $\hat\Sigma_2$.

Step 5. Draw $n$ rows randomly, with replacement, from the augmented matrix $\tilde{\mathcal{Z}}$, and denote the result by the $n \times (p+q)$ matrix $\tilde{\mathcal{Z}}^*$; compute $\bar z^*$, the $1 \times (p+q)$ row vector containing the means of each column of $\tilde{\mathcal{Z}}^*$.

Step 6. Use a random number generator to generate an $n \times (p+q)$ matrix $\varepsilon$ of i.i.d. standard normal pseudo-random variates; let $\varepsilon_{i\cdot}$ denote the $i$th row of this matrix. Then compute the $n \times (p+q)$ matrix $\varepsilon^*$ with $i$th row $\varepsilon_{i\cdot}^*$ given by
$$\varepsilon_{i\cdot}^* = \varepsilon_{i\cdot} L_\ell' \qquad (45)$$

so that $\varepsilon_{i\cdot}^* \sim N_{p+q}(0, \hat\Sigma_\ell)$, where $\ell = 1$ if the $i$th row of $\tilde{\mathcal{Z}}^*$ was drawn from rows $1, \ldots, n$ of $\tilde{\mathcal{Z}}$, or $\ell = 2$ if it was drawn from rows $(n+1), \ldots, 2n$ of $\tilde{\mathcal{Z}}$.

Step 7. Compute the $n \times (p+q)$ matrix

$$\Gamma^* = (1 + h^2)^{-1/2}\big(M\tilde{\mathcal{Z}}^* + h\varepsilon^*\big) + i_n \otimes \bar z^* \qquad (46)$$

where $M = I_n - (1/n)\, i_n i_n'$ is the usual $n \times n$ centring matrix, with $I_n$ denoting an identity matrix of order $n$, $i_n$ an $n \times 1$ vector of ones, and $\otimes$ denoting the Kronecker product.$^{11}$

Step 8. Partition $\Gamma^*$ so that $\Gamma^* = [\gamma_{i1}, \gamma_{i2}, \gamma_{i3}]$, where $\gamma_{i1} \in \mathbb{R}^q_+$, $\gamma_{i2} \in [0, \pi/2]^{p-1}$, and $\gamma_{i3} \in (-\infty, \infty)$, $i = 1, \ldots, n$. Define the $n \times (p+q)$ matrix of bootstrap pseudo-data $\mathcal{Z}^*$ such that the $i$th row $z_i^*$ of $\mathcal{Z}^*$ is given by

$$z_i^* = \begin{cases} (\gamma_{i1}, \gamma_{i2}, \gamma_{i3}) & \text{if } \gamma_{i3} \geq 1 \\ (\gamma_{i1}, \gamma_{i2}, 2 - \gamma_{i3}) & \text{otherwise} \end{cases} \qquad (47)$$

Thus, values $\gamma_{i3} < 1$ are reflected back across the boundary $\hat\delta_i = 1$, ensuring that $z_i^* \in \mathcal{U}$ for all $i = 1, \ldots, n$.
Step 9. Translate the polar coordinates in $\mathcal{Z}^*$ to Cartesian coordinates using equations (43) and (44); note that this requires solving $n$ linear programs similar to (17), as discussed in the previous section and in footnote 9. This yields the bootstrap sample $\mathcal{X}_n^* = \{(x_i^*, y_i^*),\ i = 1, \ldots, n\}$. For observations where this results in linear programs with infeasible solutions, repeat Steps 5-8.$^{12}$

Step 10. For the given point $(x, y)$, compute $\hat\delta^*(x, y)$ by solving the DEA program (32).

Step 11. Repeat Steps 5-10 $B$ times to obtain a set of bootstrap estimates $\{\hat\delta_b^*(x, y) \mid b = 1, \ldots, B\}$.

Step 11 amounts to a Monte Carlo simulation. Unfortunately, the computational burden is not negligible, particularly for large $n$. Step 1 requires that $n$ linear programs be solved; Step 9 requires solution of an additional $n$ linear programs, and Step 10 requires solution of a linear program for each point $(x, y)$ being considered. Moreover, Steps 9 and 10 are repeated $B$ times. If one were to consider the efficiency of each of the $n$ points in the original sample $\mathcal{X}_n$, the total number of linear programs for such an application would be $n(2B + 1)$, which is of order $O(nB)$.

For very large values of $n$, however, using the bootstrap to estimate confidence intervals, variance, bias, etc., for every observation might yield an overwhelming amount of information. One might gain more insight by identifying groups of similar observations, perhaps through a cluster analysis, and then bootstrapping from a representative point for each of these groups, perhaps located at the centre of each group. Alternatively, the researcher might be interested in the performance of only a few firms within a large sample, in which case the computational costs will be much lower than indicated above.

Once the bootstrap values $\hat\delta_b^*(x, y)$, $b = 1, \ldots, B$, have been obtained, we can compute the bootstrap bias estimate for the original estimator $\hat\delta(x, y)$ as

$$\widehat{\mathrm{bias}}_B[\hat\delta(x, y)] = B^{-1} \sum_{b=1}^B \hat\delta_b^*(x, y) - \hat\delta(x, y) \qquad (48)$$
which is merely the empirical bootstrap analogue of the definition of bias, $E[\hat\delta(x, y)] - \delta(x, y)$. Then a bias-corrected estimator of $\delta(x, y)$ can be computed as

$$\hat{\hat\delta}(x, y) = \hat\delta(x, y) - \widehat{\mathrm{bias}}_B[\hat\delta(x, y)] = 2\hat\delta(x, y) - B^{-1} \sum_{b=1}^B \hat\delta_b^*(x, y) \qquad (49)$$

Of course, one should avoid using $\hat{\hat\delta}(x, y)$ if it has a higher mean-square error than $\hat\delta(x, y)$. The variance of the summation term on the right-hand side (r.h.s.) of the second line of equation (49) can be made arbitrarily small by increasing $B$. Yet, even if $B \to \infty$, the bias-corrected estimator $\hat{\hat\delta}(x, y)$ will have variance equal to four times that of the original, uncorrected estimator $\hat\delta(x, y)$. The sample variance of the bootstrap values $\hat\delta_b^*(x, y)$ gives an estimate of the variance of $\hat\delta(x, y)$; call this estimate $\hat\sigma^2$. Then the estimated mean square error of $\hat{\hat\delta}(x, y)$ is $4\hat\sigma^2$ if $B \to \infty$, compared with $[\hat\sigma^2 + (\widehat{\mathrm{bias}}_B[\hat\delta(x, y)])^2]$ for $\hat\delta(x, y)$. Hence, the bias-correction should not be used unless

$$\hat\sigma^2 < \tfrac{1}{3}\big(\widehat{\mathrm{bias}}_B[\hat\delta(x, y)]\big)^2 \qquad (50)$$
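In code, the bias estimate (48), the corrected estimator (49) and the rule (50) reduce to a few lines; a sketch, with names of our own choosing:

```python
import numpy as np

def bias_corrected(delta_hat, delta_boot):
    """delta_boot: array of B bootstrap values delta*_b(x, y)."""
    bias = delta_boot.mean() - delta_hat             # (48)
    sigma2 = delta_boot.var(ddof=1)                  # estimate of var(delta_hat)
    if sigma2 < bias**2 / 3.0:                       # condition (50)
        return 2.0 * delta_hat - delta_boot.mean()   # (49)
    return delta_hat                                 # correction not worthwhile
```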
Moreover, since equation (50) contains only estimates of the variance and bias rather than the true values, caution suggests that the bias correction in equation (49) be used only if the ratio $\tfrac{1}{3}(\widehat{\mathrm{bias}}_B[\hat\delta(x, y)])^2 / \hat\sigma^2$ is well above unity.

The bootstrap values $\hat\delta_b^*(x, y)$ can also be used to construct confidence intervals for the true value $\delta(x, y)$, along the lines of Simar & Wilson (1999). Recall that the idea behind the bootstrap is to approximate the unknown distribution of $(\hat\delta(x, y) - \delta(x, y))$ by the distribution of $(\hat\delta^*(x, y) - \hat\delta(x, y))$ conditioned on the original data $\mathcal{X}_n$. The bootstrap values $\hat\delta_b^*(x, y)$, $b = 1, \ldots, B$, together with the original estimate $\hat\delta(x, y)$, can be used to obtain an empirical approximation to the second distribution. If we knew the distribution of $(\hat\delta(x, y) - \delta(x, y))$, then it would be trivial to find values $a_\alpha$, $b_\alpha$ such that

$$\mathrm{Prob}\big(-b_\alpha \leq \hat\delta(x, y) - \delta(x, y) \leq -a_\alpha\big) = 1 - \alpha \qquad (51)$$
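For completeness, the eleven steps can be assembled into a single routine for one fixed point. This is a sketch built from the hypothetical helpers introduced earlier (`dea_input_distance`, `to_polar`, `from_polar`, `augment_by_reflection`, `draw_pseudo_polar`); the redraw rule of footnote 12 is omitted for brevity, and the bandwidth $h$ is taken as given:

```python
import numpy as np

def bootstrap_distances(x0, y0, X, Y, h, B=2000):
    """Returns B bootstrap values delta*_b(x0, y0) for a fixed point (x0, y0),
    following Steps 1-11 of Section 4.2."""
    n, q = Y.shape
    # Steps 1-2: DEA distances and polar coordinates of the original sample
    deltas = np.array([dea_input_distance(X[i], Y[i], X, Y) for i in range(n)])
    polar = np.array([np.r_[Y[i], to_polar(X[i])[1], deltas[i]]
                      for i in range(n)])
    Z_tilde = augment_by_reflection(polar)
    out = np.empty(B)
    for b in range(B):
        Gamma = draw_pseudo_polar(Z_tilde, h, n)            # Steps 5-8
        Yb = Gamma[:, :q]
        Xb = np.empty_like(X)
        for i in range(n):                                  # Step 9
            eta, delta = Gamma[i, q:-1], Gamma[i, -1]
            x_ray = from_polar(1.0, eta)                    # any point on the ray
            d_ray = dea_input_distance(x_ray, Yb[i], X, Y)  # frontier via (17), (43)
            Xb[i] = delta * x_ray / d_ray                   # (44)
        out[b] = dea_input_distance(x0, y0, Xb, Yb)         # Step 10: LP (32)
    return out                                              # Step 11: B draws
```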
Notes

10. ... $\max_j(y_j^*)$; the solution is infeasible in this case since $(x, y)$ lies above the bootstrap frontier. This is a problem of finite samples.

11. The rescaling factor $(1 + h^2)^{-1/2}$ is required for the rows of $\Gamma^*$ to have approximately the covariance structure $\hat\Sigma_1$ of the original data in $\mathcal{Z}$; the centring correction is to keep the correct mean of $\tilde{\mathcal{Z}}$.

12. In some cases, the linear program described in footnote 9 may have infeasible solutions; this will happen when $y_i^* > \max_j(y_j)$. One approach would be to impose an additional boundary condition, namely $y_i^* \leq \max_j(y_j)$, but this would result in a great increase in complexity and computational burden. We adopt the simpler, innocuous approach of deleting bootstrap values for which $y_i^* > \max_j(y_j)$, and then redrawing. Note that the inequalities here are understood to be element-wise. In our empirical example in Section 5, this problem arises in approximately 2.3% of bootstrap draws.

13. Note that we have omitted the subscript $b$ from $\hat\delta^*$ in (52), to signify the random variable $\hat\delta^*$, as opposed to one of its realizations, $\hat\delta_b^*$.

14. In Simar & Wilson (1998), we constructed confidence intervals for distance functions using an estimate of bias analogous to (48). The approach in this paper avoids introducing the extra noise contained in this estimate. The bias inherent in the distance function estimates is implicitly accounted for here, since we use the bootstrap values to construct an empirical distribution of differences as in equation (53).

15. See Charnes et al. (1981) for the actual data and a detailed discussion, including definitions of the input and output variables. Charnes et al. indicate that schools with similar circumstances were chosen for their data, to allow comparison of those implementing PFT and those not doing so.
REFERENCES

Banker, R. D. (1993) Maximum likelihood, consistency and data envelopment analysis: a statistical foundation, Management Science, 39(10), pp. 1265-1273.
Beran, R. & Ducharme, G. (1991) Asymptotic Theory for Bootstrap Methods in Statistics (Montreal, Centre de Recherches Mathématiques, University of Montreal).
Bickel, P. J. & Freedman, D. A. (1981) Some asymptotic theory for the bootstrap, Annals of Statistics, 9, pp. 1196-1217.
Campbell, N. A. (1980) Robust procedures in multivariate analysis I: robust covariance estimation, Applied Statistics, 29, pp. 231-237.
Charnes, A., Cooper, W. W. & Rhodes, E. (1978) Measuring the efficiency of decision making units, European Journal of Operational Research, 2, pp. 429-444.
Charnes, A., Cooper, W. W. & Rhodes, E. (1979) Measuring the efficiency of decision making units, European Journal of Operational Research, 3, p. 339.
Charnes, A., Cooper, W. W. & Rhodes, E. (1981) Evaluating program and managerial efficiency: an application of data envelopment analysis to program follow through, Management Science, 27, pp. 668-697.
Debreu, G. (1951) The coefficient of resource utilization, Econometrica, 19, pp. 273-292.
Deprins, D., Simar, L. & Tulkens, H. (1984) Measuring labor inefficiency in post offices. In: M. Marchand, P. Pestieau & H. Tulkens (Eds) The Performance of Public Enterprises: Concepts and Measurements (Amsterdam, North-Holland), pp. 243-267.
Efron, B. (1979) Bootstrap methods: another look at the jackknife, Annals of Statistics, 7, pp. 1-16.
Efron, B. (1982) The Jackknife, the Bootstrap, and Other Resampling Plans, CBMS-NSF Regional Conference Series in Applied Mathematics, Monograph 38 (Philadelphia, Society for Industrial and Applied Mathematics).
Efron, B. & Tibshirani, R. J. (1993) An Introduction to the Bootstrap (New York, Chapman and Hall).
Fan, J. & Gijbels, I. (1996) Local Polynomial Modelling and its Applications (New York, Chapman and Hall).
Färe, R., Grosskopf, S. & Lovell, C. A. K. (1985) The Measurement of Efficiency of Production (Boston, Kluwer-Nijhoff Publishing).
Farrell, M. J. (1957) The measurement of productive efficiency, Journal of the Royal Statistical Society, A(120), pp. 253-281.
Gijbels, I., Mammen, E., Park, B. U. & Simar, L. (1999) On estimation of monotone and concave frontier functions, Journal of the American Statistical Association, 94, pp. 220-228.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. & Stahel, W. A. (1986) Robust Statistics: The Approach Based on Influence Functions (New York, Wiley).
Härdle, W. (1990) Applied Nonparametric Regression (Cambridge, Cambridge University Press).
Kneip, A., Park, B. & Simar, L. (1998) A note on the convergence of nonparametric DEA efficiency measures, Econometric Theory, 14, pp. 783-793.
Kneip, A. & Simar, L. (1996) A general framework for frontier estimation with panel data, Journal of Productivity Analysis, 7(2/3), pp. 187-212.
Korostelev, A., Simar, L. & Tsybakov, A. (1995a) Efficient estimation of monotone boundaries, The Annals of Statistics, 23, pp. 476-489.
Korostelev, A., Simar, L. & Tsybakov, A. (1995b) On estimation of monotone and convex boundaries, Publications de l'Institut de Statistique de l'Université de Paris XXXIX, 1, pp. 3-18.
Lovell, C. A. K. (1993) Production frontiers and productive efficiency. In: H. Fried, C. A. K. Lovell & S. S. Schmidt (Eds) The Measurement of Productive Efficiency: Techniques and Applications (Oxford, Oxford University Press), pp. 3-67.
Markowitz, H. M. (1959) Portfolio Selection: Efficient Diversification of Investments (New York, Wiley).
Rousseeuw, P. J. (1985) Multivariate estimation with high breakdown point. In: W. Grossmann, G. Pflug, I. Vincze & W. Wertz (Eds) Mathematical Statistics and Applications, Vol. B (Dordrecht, Reidel Publishing), pp. 283-297.
Scott, D. W. (1992) Multivariate Density Estimation (New York, Wiley).
Seiford, L. M. (1996) Data envelopment analysis: the evolution of the state-of-the-art (1978-1995), Journal of Productivity Analysis, 7(2/3), pp. 99-138.
Seiford, L. M. (1997) A bibliography for data envelopment analysis (1978-1996), Annals of Operations Research, 73, pp. 393-438.
Sengupta, J. K. (1991) Maximum probability dominance and portfolio theory, Journal of Optimization Theory and Applications, 71, pp. 341-357.
Sengupta, J. K. & Park, H. S. (1993) Portfolio efficiency tests based on stochastic dominance and cointegration, International Journal of Systems Science, 24, pp. 2135-2158.
Shephard, R. W. (1970) Theory of Cost and Production Functions (Princeton, New Jersey, Princeton University Press).
Silverman, B. W. (1986) Density Estimation for Statistics and Data Analysis (London, Chapman and Hall).
Simar, L. (1996) Aspects of statistical analysis in DEA-type frontier models, Journal of Productivity Analysis, 7, pp. 177-185.
Simar, L. & Wilson, P. (1998) Sensitivity analysis of efficiency scores: how to bootstrap in nonparametric frontier models, Management Science, 44, pp. 49-61.
Simar, L. & Wilson, P. (1999) Estimating and bootstrapping Malmquist indices, European Journal of Operational Research, 115, pp. 459-471.
Wilson, P. W. (1993) Detecting outliers in deterministic nonparametric frontier models with multiple outputs, Journal of Business and Economic Statistics, 11, pp. 319-323.
Appendix. Choice of bandwidth

Least-squares cross-validation (e.g. see Silverman, 1986) provides a data-driven criterion for choosing the bandwidth $h$ in kernel density estimation. As Silverman (1986) demonstrates, the usual cross-validation function gives an approximation to the MISE; the goal is to choose $h$ to minimize this approximation. As noted previously, the usual cross-validation function suffers from degenerate behaviour when data have been discretized, allowing the cross-validation function to approach $-\infty$ as $h \to 0$. We deal with this problem by minimizing an approximation to the mean weighted integrated square error (MWISE) (see, for example, Härdle, 1990). The MWISE is defined by

$$E\Big[\int w(z)\big(f(z) - \hat f_h(z)\big)^2\, dz\Big] = E\Big[\int w(z)\hat f_h^2(z)\, dz - 2\int w(z) f(z) \hat f_h(z)\, dz + \int w(z) f^2(z)\, dz\Big] \qquad (A1)$$

where $w(z)$ is some predefined weight function. The last term of equation (A1) does not depend on $h$, so the optimal choice of the bandwidth (in the sense of minimizing the MWISE) minimizes the following criterion:

$$C(h) = E\Big[\int w(z)\hat f_h^2(z)\, dz - 2\int w(z) f(z) \hat f_h(z)\, dz\Big] \qquad (A2)$$

The idea of cross-validation is to construct an estimate of $C(h)$ from the data $\{z_i,\ i = 1, \ldots, n\}$. Following Silverman, an unbiased estimator of $C(h)$ is given by

$$CV(h) = \int w(z)\hat f_h^2(z)\, dz - \frac{2}{n} \sum_{i=1}^n w(z_i)\, \hat f_{h,-i}(z_i) \qquad (A3)$$
where $\hat f_{h,-i}(z_i)$ is the leave-one-out estimator of $f(z_i)$ based on all the original observations except $z_i$, with bandwidth $h$. In order to avoid the discretization problem, we define the weight function $w(z)$ as

$$w(z) = \begin{cases} 0 & \text{if } \delta \in [1, 1 + \epsilon] \\ 1 & \text{otherwise} \end{cases} \qquad (A4)$$

where $\epsilon$ is small, say $10^{-6}$. By choosing $\epsilon > 0$ but very small, we avoid the discretization problem by preventing observations where $\hat\delta_i = 1$ from influencing $CV(h)$ in (A3). The value of $\epsilon$ should be chosen small enough so that only observations where $\hat\delta_i = 1$ are eliminated.

In minimizing $CV(h)$ with respect to $h$, clearly there is no problem in computing the second term on the r.h.s. of (A3), but the first term is more difficult. We can write this first term as

$$\int w(z)\hat f_h^2(z)\, dz = \int_{z \in \mathcal{U}} \hat f_h^2(z)\, dz - \int_{z \in \mathcal{U}_\epsilon} \hat f_h^2(z)\, dz \qquad (A5)$$

where $\mathcal{U}_\epsilon = \mathbb{R}^q_+ \times [0, \pi/2]^{p-1} \times [1, 1+\epsilon]$. Consider the first term on the r.h.s. of (A5):

$$\begin{aligned}
\int_{z \in \mathcal{U}} \hat f_h^2(z)\, dz &= \int_{z \in \mathcal{U}} \Big[\frac{1}{nh^{p+q}} \sum_{i=1}^n \Big( K_1\Big(\frac{z - z_i}{h}\Big) + K_2\Big(\frac{z - z_i^R}{h}\Big) \Big)\Big] \Big[\frac{1}{nh^{p+q}} \sum_{j=1}^n \Big( K_1\Big(\frac{z - z_j}{h}\Big) + K_2\Big(\frac{z - z_j^R}{h}\Big) \Big)\Big] dz \\
&= \frac{1}{n^2 h^{p+q}} \sum_{i=1}^n \sum_{j=1}^n \int_{hu \in \mathcal{U}} \big[ K_1(h^{-1}z_i - u)K_1(u - h^{-1}z_j) + K_1(h^{-1}z_i - u)K_2(u - h^{-1}z_j^R) \\
&\qquad\qquad + K_2(h^{-1}z_i^R - u)K_1(u - h^{-1}z_j) + K_2(h^{-1}z_i^R - u)K_2(u - h^{-1}z_j^R) \big]\, du \\
&= \frac{1}{2n^2 h^{p+q}} \sum_{i=1}^n \sum_{j=1}^n \Big[ K_{11}\Big(\frac{z_i - z_j}{h}\Big) + 2K_{12}\Big(\frac{z_i - z_j^R}{h}\Big) + K_{22}\Big(\frac{z_i^R - z_j^R}{h}\Big) \Big]
\end{aligned} \qquad (A6)$$

where $u = h^{-1}z$ (implying $dz = h^{p+q}\, du$), and the last equality uses the symmetry of the integrand about the reflecting boundary $\delta = 1$. Here $K_{11}(\cdot)$ is the convolution of $K_1(\cdot)$ with itself and hence is $N(0, 2\hat\Sigma_1)$, $K_{12}(\cdot)$ is the convolution of $K_1(\cdot)$ with $K_2(\cdot)$ and hence is $N(0, \hat\Sigma_1 + \hat\Sigma_2)$, and $K_{22}(\cdot)$ is the convolution of $K_2(\cdot)$ with itself and hence is $N(0, 2\hat\Sigma_2)$.
Consider now the second term on the r.h.s. of (A5). Let $\beta = (y, \eta)$, $V = \mathbb{R}^q_+ \times [0, \pi/2]^{p-1}$, and $\Omega = [1, 1+\epsilon]$. Then

$$\int_{z \in \mathcal{U}_\epsilon} \hat f_h^2(z)\, dz = \int_{\delta \in \Omega} \int_{\beta \in V} \hat f_h^2(\beta \mid \delta)\, \hat f_h^2(\delta)\, d\beta\, d\delta = \int_{\delta \in \Omega} \Big[\int_{\beta \in V} \hat f_h^2(\beta \mid \delta)\, d\beta\Big]\, \hat f_h^2(\delta)\, d\delta \qquad (A7)$$

Now consider the bracketed term in the last part of (A7). Partition $z_i$ and $z_i^R$ so that $z_i = (\beta_i, \hat\delta_i)$ and $z_i^R = (\beta_i, 2 - \hat\delta_i)$. Then conditioning on $\delta$ in our definition of $\hat f_h(z)$ in equation (41) yields

$$\hat f_h(\beta \mid \delta) = \frac{1}{nh^{p+q}} \sum_{i=1}^n \Big[ K_1\Big(\frac{\beta - \beta_i}{h}\,\Big|\,\frac{\delta - \hat\delta_i}{h}\Big) + K_2\Big(\frac{\beta - \beta_i}{h}\,\Big|\,\frac{\delta - (2 - \hat\delta_i)}{h}\Big) \Big] \qquad (A8)$$

Squaring both sides of this yields

$$\begin{aligned}
\hat f_h^2(\beta \mid \delta) = \frac{1}{n^2 h^{2(p+q)}} \sum_{i=1}^n \sum_{j=1}^n \Big[ & K_1\Big(\frac{\beta - \beta_i}{h}\,\Big|\,\frac{\delta - \hat\delta_i}{h}\Big) K_1\Big(\frac{\beta - \beta_j}{h}\,\Big|\,\frac{\delta - \hat\delta_j}{h}\Big) \\
& + 2K_1\Big(\frac{\beta - \beta_i}{h}\,\Big|\,\frac{\delta - \hat\delta_i}{h}\Big) K_2\Big(\frac{\beta - \beta_j}{h}\,\Big|\,\frac{\delta - (2 - \hat\delta_j)}{h}\Big) \\
& + K_2\Big(\frac{\beta - \beta_i}{h}\,\Big|\,\frac{\delta - (2 - \hat\delta_i)}{h}\Big) K_2\Big(\frac{\beta - \beta_j}{h}\,\Big|\,\frac{\delta - (2 - \hat\delta_j)}{h}\Big) \Big]
\end{aligned} \qquad (A9)$$
We must integrate both sides of this expression to obtain the bracketed term in (A7). Since $K_1(\cdot)$ is multivariate $N(0, \hat\Sigma_1)$, $K_1\big(\frac{\beta - \beta_i}{h} \mid \frac{\delta - \hat\delta_i}{h}\big)$ must be

$$N\Big( \Sigma_{12}\Sigma_{22}^{-1}\,\frac{\delta - \hat\delta_i}{h},\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \Big)$$

Similarly, $K_2(\cdot)$ is multivariate $N(0, \hat\Sigma_2)$, and hence $K_2\big(\frac{\beta - \beta_i}{h} \mid \frac{\delta - (2 - \hat\delta_i)}{h}\big)$ must be

$$N\Big( -\Sigma_{12}\Sigma_{22}^{-1}\,\frac{\delta - (2 - \hat\delta_i)}{h},\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \Big)$$

Let

$$\Sigma_{11.2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}, \quad \lambda_{1i} = \Sigma_{12}\Sigma_{22}^{-1}\,\frac{\delta - \hat\delta_i}{h} \quad \text{and} \quad \lambda_{2i} = -\Sigma_{12}\Sigma_{22}^{-1}\,\frac{\delta - (2 - \hat\delta_i)}{h}$$

Then we can rewrite $K_1\big(\frac{\beta - \beta_i}{h} \mid \frac{\delta - \hat\delta_i}{h}\big)$ as $\varphi_{1i}\big(\frac{\beta - \beta_i}{h}\big)$, where $\varphi_{1i}(\cdot)$ is $N(\lambda_{1i}, \Sigma_{11.2})$. Similarly, $K_2\big(\frac{\beta - \beta_i}{h} \mid \frac{\delta - (2 - \hat\delta_i)}{h}\big)$ can be rewritten as $\varphi_{2i}\big(\frac{\beta - \beta_i}{h}\big)$, where $\varphi_{2i}(\cdot)$ is $N(\lambda_{2i}, \Sigma_{11.2})$. Then, integrating (A9), we have

$$\int_{\beta \in V} \hat f_h^2(\beta \mid \delta)\, d\beta = \frac{1}{n^2 h^{2(p+q)}} \sum_{i=1}^n \sum_{j=1}^n \int_{\beta \in V} \Big[ \varphi_{1i}\Big(\frac{\beta - \beta_i}{h}\Big)\varphi_{1j}\Big(\frac{\beta - \beta_j}{h}\Big) + 2\varphi_{1i}\Big(\frac{\beta - \beta_i}{h}\Big)\varphi_{2j}\Big(\frac{\beta - \beta_j}{h}\Big) + \varphi_{2i}\Big(\frac{\beta - \beta_i}{h}\Big)\varphi_{2j}\Big(\frac{\beta - \beta_j}{h}\Big) \Big] d\beta \qquad (A10)$$

Using the same convolution argument as in (A6) (the convolution over the $(p+q-1)$-dimensional $\beta$-space contributes a factor $h^{p+q-1}$), we obtain the bracketed term of (A7):

$$\int_{\beta \in V} \hat f_h^2(\beta \mid \delta)\, d\beta = \frac{1}{n^2 h^{p+q+1}} \sum_{i=1}^n \sum_{j=1}^n \Big[ \varphi_{11ij}\Big(\frac{\beta_i - \beta_j}{h}\Big) + 2\varphi_{12ij}\Big(\frac{\beta_i - \beta_j}{h}\Big) + \varphi_{22ij}\Big(\frac{\beta_i - \beta_j}{h}\Big) \Big] \qquad (A11)$$

where, for $k, \ell = 1, 2$, $\varphi_{k\ell ij}(\cdot)$ is the convolution of $\varphi_{ki}(\cdot)$ and $\varphi_{\ell j}(\cdot)$. Hence $\varphi_{11ij}$ is $N(\lambda_{1i} + \lambda_{1j}, 2\Sigma_{11.2})$, $\varphi_{12ij}$ is $N(\lambda_{1i} + \lambda_{2j}, 2\Sigma_{11.2})$, and $\varphi_{22ij}$ is $N(\lambda_{2i} + \lambda_{2j}, 2\Sigma_{11.2})$. Substituting (A11) into (A7), we have, for the second term on the r.h.s. of (A5),
$$\int_{z \in \mathcal{U}_\epsilon} \hat f_h^2(z)\, dz = \frac{1}{n^2 h^{p+q+1}} \int_{\delta \in \Omega} \sum_{i=1}^n \sum_{j=1}^n \Big[ \varphi_{11ij}\Big(\frac{\beta_i - \beta_j}{h}\Big) + 2\varphi_{12ij}\Big(\frac{\beta_i - \beta_j}{h}\Big) + \varphi_{22ij}\Big(\frac{\beta_i - \beta_j}{h}\Big) \Big]\, \hat f_h^2(\delta)\, d\delta \qquad (A12)$$

We now need $\hat f_h^2(\delta)$. Note that, from the definition of our multivariate density estimator (equation (41)) and the fact that the marginal of a multivariate normal is also normal, we have

$$\hat f_h(\delta) = \begin{cases} \dfrac{1}{nh} \displaystyle\sum_{i=1}^n \Big[ K_*\Big(\frac{\delta - \hat\delta_i}{h}\Big) + K_*\Big(\frac{\delta - (2 - \hat\delta_i)}{h}\Big) \Big] & \text{if } \delta \geq 1 \\ 0 & \text{otherwise} \end{cases} \qquad (A13)$$

where $K_*(\cdot)$ is the univariate normal density with mean 0 and variance $\Sigma_{22}$. Squaring both sides of (A13) yields

$$\begin{aligned}
\hat f_h^2(\delta) = \frac{1}{n^2 h^2} \sum_{k=1}^n \sum_{l=1}^n \Big[ & K_*\Big(\frac{\delta - \hat\delta_k}{h}\Big) K_*\Big(\frac{\delta - \hat\delta_l}{h}\Big) + 2K_*\Big(\frac{\delta - \hat\delta_k}{h}\Big) K_*\Big(\frac{\delta - (2 - \hat\delta_l)}{h}\Big) \\
& + K_*\Big(\frac{\delta - (2 - \hat\delta_k)}{h}\Big) K_*\Big(\frac{\delta - (2 - \hat\delta_l)}{h}\Big) \Big]
\end{aligned} \qquad (A14)$$
Finally, substituting (A14) into (A12) yields a form for the second term on the r.h.s. of (A5) that we can compute:

$$\begin{aligned}
\int_{z \in \mathcal{U}_\epsilon} \hat f_h^2(z)\, dz = \frac{1}{n^4 h^{p+q+3}} \int_{\delta \in \Omega} & \Big\{ \sum_{i=1}^n \sum_{j=1}^n \Big[ \varphi_{11ij}\Big(\frac{\beta_i - \beta_j}{h}\Big) + 2\varphi_{12ij}\Big(\frac{\beta_i - \beta_j}{h}\Big) + \varphi_{22ij}\Big(\frac{\beta_i - \beta_j}{h}\Big) \Big] \Big\} \\
\times & \Big\{ \sum_{k=1}^n \sum_{l=1}^n \Big[ K_*\Big(\frac{\delta - \hat\delta_k}{h}\Big) K_*\Big(\frac{\delta - \hat\delta_l}{h}\Big) + 2K_*\Big(\frac{\delta - \hat\delta_k}{h}\Big) K_*\Big(\frac{\delta - (2 - \hat\delta_l)}{h}\Big) \\
& \qquad + K_*\Big(\frac{\delta - (2 - \hat\delta_k)}{h}\Big) K_*\Big(\frac{\delta - (2 - \hat\delta_l)}{h}\Big) \Big] \Big\}\, d\delta
\end{aligned} \qquad (A15)$$

This can be solved by any one of several one-dimensional numerical integration techniques on $\delta$ over $\Omega = [1, 1+\epsilon]$. We use ten-point Gauss-Legendre quadrature in our empirical example discussed in Section 5. The computational cost of solving (A15) should not be too great, since the integral is over a short interval in one dimension. In addition, the integral must be evaluated only once each time $CV(h)$ is evaluated. Note that $\varphi_{11ij}$, $\varphi_{12ij}$ and $\varphi_{22ij}$ depend on $\delta$ through their mean terms, and so remain under the integral sign.

Finally, the second term in $CV(h)$ in (A3) can be written

$$\begin{aligned}
\frac{2}{n} \sum_{i=1}^n w(z_i)\, \hat f_{h,-i}(z_i) &= \frac{2}{n} \sum_{i=1}^n w(z_i)\, \frac{1}{(n-1)h^{p+q}} \sum_{\substack{j=1 \\ j \neq i}}^n \Big[ K_1\Big(\frac{z_i - z_j}{h}\Big) + K_2\Big(\frac{z_i - z_j^R}{h}\Big) \Big] \\
&= \frac{2}{n(n-1)h^{p+q}} \sum_{i=1}^n w(z_i) \sum_{\substack{j=1 \\ j \neq i}}^n \Big[ K_1\Big(\frac{z_i - z_j}{h}\Big) + K_2\Big(\frac{z_i - z_j^R}{h}\Big) \Big]
\end{aligned} \qquad (A16)$$

which involves straightforward computations. Substituting (A6) and (A15) into (A5), and then substituting the resulting expression together with (A16) into (A3), yields an expression for $CV(h)$ that can be computed numerically. Of course, the resulting expression for $CV(h)$ will be rather complicated; consequently, minimization with respect to $h$ will typically require a grid search over a range of values $h \in [h_{\min}, h_{\max}]$. The range of values could be determined by taking factors of, say, 0.25 and 2.0 of the bandwidth suggested by the normal reference rule in equation (42). Note that, due to the nature of kernel functions, the expression comprising $CV(h)$ will be easier to compute for small values of $h$ than for large values. In a similar problem, Fan & Gijbels (1996) suggest evaluating $CV(h = h_{\min})$, then successively increasing $h$ by small increments, evaluating $CV(h)$ each time. The process terminates after some number of successive increases in $CV(h)$ have been observed, and the value of $h$ that yielded the smallest value of $CV(h)$ is then chosen as the optimum.
(A16) which involves straightforward computations. Substituting (A6) and (A15) into (A5), and then substituting the resulting expression together with (A16) into (A3) yields an expression for CV(h) that can be computed numerically. Of course, the resulting expression of CV(h) will be rather complicated; consequently, minimization with respect to h will typically require a grid search over a range of values h Î [hmin , hmax ]. The range of values could be determined by taking factors of, say 0.25 and 2.0, of the bandwidth suggested by the normal reference rule in equation (42). Note that due to the nature of kernel functions, the expression comprising CV(h) will be easier to compute for small values of h than for large values. In a similar problem, Fan & Gijbels (1996) suggest evaluating CV(h 5 hmin ), then successively increasing h by small increments, evaluating CV(h) each time. The process terminated after some number of successive increases in CV(h) have been observed, and the value of h that yielded the smallest value for CV(h) is then chosen as the optimum.