Computational Statistics (1994) 9: 233-247

Fitting Monotonic Polynomials to Data

Douglas M. Hawkins
Department of Applied Statistics, University of Minnesota, St. Paul, MN 55108, USA
Keywords: Constraints, Least squares, L1 norm

Summary. Many statistical applications require fitting a general monotonic function to data. These applications range from regular nonlinear regression modeling where the response is known to be monotonic, through transformations of data to normality, to work-horse routines in nonmetric scaling procedures. Monotonic polynomials comprise a class of possible monotonic functions for this. Offsetting their well-known drawbacks of poor extrapolatory properties and proclivity to tax the numerical accuracy of fitting algorithms, they have many advantages. They are parsimonious, provide predictions that vary smoothly with the argument, and are able to approximate any smooth function to arbitrary accuracy. This paper presents a procedure for fitting monotonic polynomials and illustrates their use in conventional regression modeling and in data transformation.
Introduction

There are many situations in which one wishes to fit a monotonic function to data. Examples of widely different application areas include:

1. General regression modeling situations in which a response function does not have any known parametric form, but is known from physical considerations to be monotonic in its argument.

2. Non-metric multidimensional scaling and unfolding, in which a general monotonic transformation is required to transform observed dissimilarities onto a scale with metric properties (see for example Schiffman, Reynolds and Young 1981, p 394).

3. Transformation of data to normality. Any continuous random variable may be transformed to normality (or to any other continuous 'target distribution'; see Elphinstone 1983). The transformation is in general unknown but must be strictly monotonic.
4. The transformation of a dependent variable to obtain a more linear regression relationship (for a variant on this widely-used general technique, see Ramsey 1977).
In all of these applications, the required fitted function is monotonic increasing, and we will assume upward monotonicity for the remainder of the paper. There are several approaches in the literature for fitting such monotonic functions. The best-known is isotonic regression, which provides a nonparametric fit. As commonly used it has the drawback that it provides fitted values only for the values of the predictor observed and for no other predictor values. Another approach is monotonic splines (Winsberg and Ramsey 1983). Both these approaches are very flexible, being able in principle to accommodate monotonic fits of arbitrary complexity. Monotonic splines have the advantage over isotonic regression that they can be evaluated for any argument and not just those observed in the calibration data. Making an analogy with general regression methodology, these approaches may be thought of as monotonic specializations of regression smoothers. They are also able in principle to handle relationships of arbitrary form, though not parsimoniously, by fitting the data locally.

In general regression modeling, smoothers are not the only way of going beyond a simple linear regression; also widely used are global fitting methods such as power transformations and polynomial regression. Most serious analysts would agree that there is room for both local and global approaches. A global fit capability is desirable because, where it works, it can be fitted quickly and provides a parsimonious model that can be evaluated easily for all arguments. Local nonparametric smoothers are valuable because they may work well in data sets where global fitting does not. While opinions may differ on the correct boundary lines between the areas of applicability of each, neither technology can completely substitute for the other. This thinking motivates the development of a global-fitting methodology for models that are necessarily monotonic, and one approach that merits consideration is that of monotonic polynomials.
Polynomials have many useful properties: they are smooth and many-times smoothly differentiable, easily fitted and evaluated for all values of their argument, and capable of approximating any smooth function to arbitrary accuracy. Their main drawback as models for smooth functions is that, while they may work well within the range of predictor values observed, their predictions are poor outside this range (in which deficiency they are far from unique), and their fitting is taxing from the viewpoint of numerical analysis.

Modeling using monotonic polynomials has been used in the past in various forms. For example, in multidimensional scaling, Winsberg and Ramsey's monotonic spline functions are splines fitted to the data under a constraint of non-negative coefficients. A related approach (Muller 1984) involves the fitting of an ordinary (odd-degree) polynomial whose coefficients are all constrained to be non-negative. It is a defect of both of these approaches that the polynomials they span represent only a subset of the class of monotonic polynomials - positivity of coefficients is by no means necessary for monotonicity.
A different approach is that sketched in Ramsey (1977) and used in Elphinstone (1983). This approach uses the most general form of a polynomial of degree 2K+1 which is monotonic everywhere. Recognizing that a polynomial is monotonic if and only if its first derivative never changes sign, this most general form is

G(x) = d + a \int_0^x \prod_{j=1}^{K} { 1 + 2 b_j t + (b_j^2 + c_j^2) t^2 } dt,

where the b_j and c_j are arbitrary real values. A polynomial in this most general representation can be fitted to data by finding best-fitting values of the parameters {b_j}, {c_j}, a and d by some suitable criterion. Unfortunately, the polynomial is highly nonlinear in the parameters b_j and c_j, so that for criteria of practical interest the fitting of the polynomial involves a difficult nonlinear optimization.

Here, we propose a direct constrained optimization modeling procedure which is linear in the coefficients of the fitted polynomial. We will concentrate on fitting by ordinary or generalized least squares, but will sketch the extension of the approach to the L1 and L-infinity norms.
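To see why this representation is monotonic, differentiate under the integral sign; each quadratic factor below has non-positive discriminant and so cannot change sign:

G'(x) = a \prod_{j=1}^{K} { 1 + 2 b_j x + (b_j^2 + c_j^2) x^2 },   with   (2 b_j)^2 - 4 (b_j^2 + c_j^2) = -4 c_j^2 <= 0.

Since each factor equals 1 at x = 0 and never changes sign, the product is non-negative for all real x, so G is monotonic increasing whenever a >= 0.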
The algorithm proposed for this constrained fitting is fast and, provided it is coded with the care always needed for polynomial regression, is numerically quite accurate for polynomials of the degrees likely to be used.

The least squares case

Suppose that the data consist of the triples (Y_i, x_i, w_i), i = 1, 2, ..., n, where w_i is an optional vector of covariates (if there are no covariates, then w may simply be omitted in the equations that follow). Write the model to be fitted as
Y = P_k(x) + w^T v,

where P_k(x) is the polynomial \sum_{j=0}^{k} a_j x^j. Let P' and P'' denote the first and second derivatives of P with respect to x. Monotonicity requires that P_k'(x) ≥ 0 for all x. We have already noted that this requires that k be odd.

We will write this model in the traditional and more compact regression format Y = Xb, where the design matrix X has as its columns the powers of x and the values of the covariates, and the vector b is the partitioned form b^T = (a^T : v^T), consisting of the polynomial coefficients a_j and the covariate regression coefficients v_j. Fitting this model by generalized least squares leads to the optimization problem
Minimize   S(b) = (Y - Xb)^T Γ (Y - Xb)
subject to   P_k'(x) ≥ 0 for all real x,   (1)
where Γ is a general symmetric matrix for generalized least squares, a diagonal matrix for weighted least squares, and the identity for unweighted least squares. This is a quadratic program with infinitely many linear constraints. Some thought about the form of its solution is instructive: the requirement that P_k(x) be monotonic at a point x implies that the first derivative of P_k(x) is non-negative there. In addition to non-negativity though, at any point at which P_k'(x) is zero, so also must be the second derivative P_k''(x), a fact easily shown by a Taylor expansion of P_k'(x) about the point x: since P_k'(x + δ) = P_k'(x) + δ P_k''(x) + O(δ^2), a nonzero P_k''(x) would make the derivative negative for small δ of one sign. The form of the monotonic polynomial P_k(x) is therefore that its derivative is strictly positive, with the possible exception of one or more points at which the first and second derivatives are both zero. These points (if there are any) are points of inflection with horizontal tangents.
We will call them 'hips' (a contraction of 'horizontal inflection points'). It will be shown that the hips are central to fitting the polynomial. Define the vector

h(x) = (0, 1, 2x, 3x^2, 4x^3, 5x^4, ..., k x^{k-1}, 0, 0, ..., 0).
The trailing zeros in h(x) reflect the covariates w and will be absent if there are no covariates. The constraint that P_k(x) is everywhere monotonic reduces to the requirement that h(x)b ≥ 0 for all real x. The objective of the fitting therefore is

Minimize   S(b) = (Y - Xb)^T Γ (Y - Xb)
subject to   h(x)b ≥ 0 for all real x.   (2)
There are many ways of solving this minimization problem. The most obvious is a direct application of general-purpose constrained least squares procedures like those of Hanson and Haskell (1982), or (less desirably) general-purpose quadratic programming codes. These methods are applied by creating a close-spaced mesh of x values and enforcing the constraint h(x)b ≥ 0 only for x values on this mesh rather than for all real x, thereby leading to a problem with a finite number of specified linear constraints. This approach is less than ideal. The discretization does not truly enforce monotonicity at all values of x, but only at the x values on the mesh. Particularly with high-degree polynomials it may be necessary to use a very close mesh to achieve some assurance that monotonicity on the mesh translates acceptably closely into monotonicity for all x. Since the number of linear constraints is inversely proportional to the mesh spacing, this leads to slow execution. Closely-spaced mesh values also create potential problems of numerical accuracy.
For this reason, we prefer a different approach - the primal-dual active set algorithm of Goldfarb and Idnani (1983), which is also discussed in Fletcher (1987, p 244). Its basis is straightforward, relying directly on standard Kuhn-Tucker theory (see for example Hadley 1964, p 185). This shows that the optimal solution to (2) will be the saddle point of the Lagrangian

F(b, λ) = (Y - Xb)^T Γ (Y - Xb) - \sum_{m=1}^{M} λ_m h(x*_m) b,

where x*_1, x*_2, ..., x*_M is the (possibly empty) set of hips at which h(x)b must equal 0. The solution has the form: (Y - Xb)^T Γ (Y - Xb) is a minimum subject to the constraints

h(x*_m) b = 0,   m = 1, 2, ..., M.

The true optimum will automatically satisfy the inequality h(x)b > 0 for all other x, but if the wrong number of hips or the wrong values are specified, this inequality will be violated at some x value. The constraints of zero slope at the hips are called 'active constraints'. It is characteristic of these hips that their Lagrange multipliers will be positive. Other x values not in the list of hips are 'inactive'; if they were included in the Lagrangian they would have negative Lagrange multipliers. Solving the fitting problem then reduces to finding the set of hip values that, when made active, will give positive Lagrange multipliers and a fitted polynomial whose slope is strictly positive for all non-hip x values. The primal-dual active set algorithm is well-suited to solving this quadratic program.
It operates by successively finding whether there are any currently inactive constraints that need to be made active, or any currently active constraints that would be satisfied as strict inequalities even if they were allowed to become inactive. The algorithm operates as follows:

1. Set M = 0 and solve the unconstrained generalized least squares problem (2).

2. Check whether the first derivative P_k'(x) of the fitted polynomial is non-negative everywhere. If it is, then the fitted polynomial is monotonic already, and therefore solves the problem. If the resulting polynomial is not monotonic, however, then it will be necessary to find an additional candidate hip at which to force the polynomial's slope to be zero. To do this,

3. Increase M by 1, adding an additional candidate hip and forcing a zero derivative there. Solve the resulting equality-constrained problem (details of doing this are given below). The additional constraint at the new candidate hip may make one of the existing constraints redundant, a fact that will be detected by its Lagrange multiplier λ_m becoming negative. If this occurs, remove that candidate hip from the active set. Repeat the algorithm from step 2.

Goldfarb and Idnani (1983) prove the algorithm's convergence.
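As a rough illustration only, the loop can be sketched in Python as follows; unweighted least squares is assumed, and the helper names solve_restricted_gls and worst_slope_location are illustrative placeholders that are fleshed out in the fragments below, not part of any published code.

    import numpy as np

    def fit_monotone_poly(x, y, k, max_iter=50):
        # Active-set loop sketched from steps 1-3 above (unweighted least squares).
        X = np.vander(x, k + 1, increasing=True)      # columns 1, x, ..., x^k
        hips = []                                      # currently active hips
        for _ in range(max_iter):
            # Steps 1 and 3: fit with zero slope forced at each active hip.
            b, lam = solve_restricted_gls(X, y, hips, k)
            # Drop a hip whose Lagrange multiplier has become negative.
            if len(hips) > 0 and lam.size > 0 and lam.min() < 0:
                hips.pop(int(np.argmin(lam)))
                continue
            # Step 2: locate the most negative slope of the fitted polynomial.
            x_new, slope = worst_slope_location(b[:k + 1])
            if slope >= -1e-10:                        # monotonic up to roundoff
                return b, hips
            hips.append(x_new)                         # activate the new constraint
        raise RuntimeError("active-set iteration did not converge")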
The algorithm is generally very fast, frequently terminating with the simple least squares fit with which it started. Care is needed in its coding since, with finite- (even double-) precision arithmetic, it is not always trivial to decide whether a polynomial is really non-monotonic, or just seems so because of roundoff noise. The test for monotonicity can be made quickly and efficiently, finding the minimum of P_k'(x) by solving P_k''(x) = 0 - that is, solving

P_k''(x) = \sum_{j=2}^{k} j (j-1) a_j x^{j-2} = 0,

which is easily done using standard packages for manipulating polynomials. The test for monotonicity is made by evaluating P_k'(x) at each real root of this equation - by definition the x values at which P_k'(x) takes on its most extreme values. If any of these derivatives is negative, the polynomial is not monotonic. The converse is not true: P_k'(x) may have positive extreme values at these roots but be negative for all sufficiently large |x|; for this reason, while the primary test for monotonicity is based on the values of P_k'(x) at the roots of P_k''(x) = 0, we supplement it with a direct evaluation of P_k'(x) at some relatively extreme values of x. Where the current polynomial is not monotonic, the algorithm needs to select an additional candidate hip. A good choice is the x value at which the largest negative slope occurs, a value which is a byproduct of the suggested method of checking monotonicity. This value is then used as the candidate additional hip in step 3.
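A minimal numpy sketch of this check; the coefficients are assumed stored in increasing powers, and the pair of 'relatively extreme' evaluation points is an arbitrary illustrative choice, not a value prescribed by the paper.

    import numpy as np
    from numpy.polynomial import polynomial as npoly

    def worst_slope_location(a, x_extreme=(-1e3, 1e3)):
        # Return (x, slope) where the derivative of the polynomial with
        # coefficients a (increasing powers) is smallest.
        d1 = npoly.polyder(a)                          # P'(x)
        if len(a) <= 2:                                # linear fit: constant slope
            return 0.0, float(npoly.polyval(0.0, d1))
        d2 = npoly.polyder(a, 2)                       # P''(x)
        roots = npoly.polyroots(d2)                    # stationary points of P'
        real = roots[np.abs(roots.imag) < 1e-8].real
        candidates = np.concatenate([real, np.asarray(x_extreme, dtype=float)])
        slopes = npoly.polyval(candidates, d1)
        i = int(np.argmin(slopes))
        return float(candidates[i]), float(slopes[i])

A non-negative returned slope (up to roundoff) indicates that the fitted polynomial has passed the monotonicity test.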
Solution of the fitting equation for a given set of hips

Assume the set of hips is fixed. Define the matrix H, which has M rows of the form h(x*_m), where x*_m represents one of the hips. Solving for the given set of hips then requires the minimization of (Y - Xb)^T Γ (Y - Xb) subject to the equality constraints

h(x*_m) b = 0,   m = 1, 2, ..., M.
This gives the Lagrangian

T(b, λ) = (Y - Xb)^T Γ (Y - Xb) - 2 λ^T H b,   (3)

where λ is the vector of Lagrange multipliers associated with the constraints. The necessary b and λ (see for example Campbell and Meyer, 1979, p 63) are given by the linear equations

[ X^T Γ X   -H^T ] [ b ]   [ X^T Γ Y ]
[   -H        0  ] [ λ ] = [    0    ]   (4)
While this system of equations could be solved directly by Gaussian elimination, there are two good reasons not to do so. One is numerical accuracy and the well-known dramatic subtractive cancellation that occurs when fitting polynomials using X^T X technology. The other is the gain in computational efficiency resulting from the fact that all the required calculations can be expressed as modifications of the unrestricted fit, leading to operations on much smaller matrices than (4) would suggest. Instead of (4), use the QR decomposition to write

X^T Γ X = R R^T, with R a square lower triangular matrix, and set V = R^{-1} and G = H V^T.

Let b_u be the solution to the unrestricted GLS problem (as sketched for example in Thisted 1988, chapter 3). Then the restricted equations have solution

λ = -(G G^T)^{-1} H b_u,
b = b_u + V^T G^T λ,

and the change in residual sum of squares over the unrestricted case is

C = -λ^T H b_u.

These equations are modest extensions of those of Liew (1976). They are also algebraic rather than computational formulas - the G G^T-based calculations are replaced for computational purposes by equivalents arising from a QR factorization of G, again to avoid the loss of precision inherent in X^T X calculations. The major portions of these calculations are independent of the position or number of the hips. Only those relating to the modestly-sized matrix G (of M rows) are hip-dependent quantities involving recalculation as candidate hips are added and removed. The consequence of this is that the algorithm is generally very fast.
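The restricted solve can be sketched as follows, again only as an illustration under simplifying assumptions: unweighted least squares (Γ = I), numpy's QR factorization (whose triangular factor is upper rather than lower triangular), and a direct G^T G solve where the paper recommends a further QR factorization of G. The function name matches the placeholder used in the loop sketched earlier.

    import numpy as np
    from scipy.linalg import solve_triangular

    def solve_restricted_gls(X, y, hips, k):
        # Unrestricted fit via QR (avoids forming X'X explicitly); Gamma = I assumed.
        Q, R = np.linalg.qr(X)                         # X = QR, R upper triangular
        b_u = solve_triangular(R, Q.T @ y)
        if not hips:
            return b_u, np.empty(0)
        # One row h(x_m) = (0, 1, 2x, ..., k x^(k-1), 0, ..., 0) per hip.
        p = X.shape[1]
        H = np.zeros((len(hips), p))
        for m, xm in enumerate(hips):
            H[m, 1:k + 1] = np.arange(1, k + 1) * xm ** np.arange(k)
        # Liew-type update of the unrestricted solution:
        #   lam = -[H (X'X)^-1 H']^-1 H b_u,   b = b_u + (X'X)^-1 H' lam.
        G = solve_triangular(R, H.T, trans='T')        # G = R^{-T} H'
        lam = -np.linalg.solve(G.T @ G, H @ b_u)
        b = b_u + solve_triangular(R, G @ lam)
        return b, lam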
Example - use for general regression modeling

It may be instructive to illustrate this general regression modeling approach with a simple example. A sample of size 50 was simulated by the formula

Y = 4x (x - 2)^2 (x + 0.5)^2 (x^2 + 2) + e

with e ~ N(0, 1), and the x's equally spaced from -1 to 1. The following table gives the y values, reading from left to right by rows.

Table 1. Simulated polynomial regression data

-27.232  -20.879  -15.980  -11.942   -7.325
 -6.851   -5.177   -2.122   -2.820   -0.480
 -1.057   -0.886   -0.995    1.638    0.403
  1.134    0.438    0.791   -0.006   -1.397
  1.810   -0.994   -1.561   -0.247   -1.030
 -0.912    1.443    2.107    3.372    1.852
  3.693    3.583    5.936    5.990    6.950
 10.157    8.405    9.477   12.441   14.752
 14.771   16.657   18.560   20.798   20.324
 21.284   24.138   24.533   26.305   28.571
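The data-generating step can be reproduced along the following lines (the random seed here is arbitrary, so the draws will not match Table 1 exactly):

    import numpy as np

    rng = np.random.default_rng(0)                     # arbitrary seed
    x = np.linspace(-1.0, 1.0, 50)                     # 50 equally spaced x values
    y = 4 * x * (x - 2)**2 * (x + 0.5)**2 * (x**2 + 2) + rng.normal(0.0, 1.0, size=50)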
The true model is a polynomial of degree 7 with hips at -0.5 and 2. We used the algorithm described here to fit monotonic polynomials of degrees 1, 3, 5, 7 and 9. The results are summarized in Table 2. This shows the initial unrestricted polynomial and its residual sum of squares, the hips, and the restricted polynomial and its residual sum of squares.
Table 2. Fitting monotonic polynomials to the regression data

Degree 1.
Unrestricted polynomial: Y = 17.93 x + 4.05
Residual ssd = 973.22

Degree 3.
Unrestricted polynomial: Y = 21.28 x^3 + 3.77 x^2 + 4.65 x + 2.74
Residual ssd = 323.69

Degree 5.
Unrestricted polynomial: Y = 13.17 x^5 - 28.11 x^4 + 6.09 x^3 + 28.82 x^2 + 8.02 x + 0.14
Residual ssd = 42.08
Monotonic polynomial: Y = 10.99 x^5 - 21.42 x^4 + 7.29 x^3 + 22.18 x^2 + 8.59 x + 0.99
Residual ssd = 63.57
5 iterations required; hip at -0.31 with Lagrange multiplier 5.17

Degree 7.
Unrestricted polynomial: Y = 18.36 x^7 - 4.28 x^6 - 17.53 x^5 - 22.06 x^4 + 20.53 x^3 + 26.73 x^2 + 6.36 x + 0.24
Residual ssd = 39.70
Monotonic polynomial: Y = 6.55 x^7 - 11.39 x^6 - 31.06 x^5 - 10.78 x^4 + 26.47 x^3 + 22.04 x^2 + 5.92 x + 0.57
Residual ssd = 42.00
5 iterations required; hip at -0.28 with Lagrange multiplier 1.01

Degree 9.
Unrestricted polynomial: Y = 71.38 x^9 - 16.60 x^8 - 137.51 x^7 + 27.74 x^6 + 94.90 x^5 - 41.14 x^4 + 9.16 x^3 + 30.31 x^2 + 8.45 x + 0.14
Residual ssd = 37.39
Monotonic polynomial: Y = 0.20 x^9 + 6.14 x^8 + 25.65 x^7 - 22.44 x^6 - 30.05 x^5 - 4.96 x^4 + 26.15 x^3 + 21.22 x^2 + 5.91 x + 0.57
Residual ssd = 41.91
7 iterations required; hips at ?.00 and -0.28 with Lagrange multipliers 0.00 and 0.90
The fits of degree 1 and 3 were monotonic and so were obtained at once by a single least squares computation. Those of degree 5 and 7 required the fitting of a single hip, at a location of approximately -0.3. At degree 9, two hips were required - one in the same location of approximately -0.3, and one out at a large x value to prevent the polynomial from turning down outside the range of the data. The hip fitted at approximately x = -0.3 in the polynomials of degree 5, 7 and 9 may be regarded as an estimate of the true hip at x = -0.5. This hip seems to have been recovered quite well by all these fits. None of the fits recovered the true hip at x = 2, a fact not too surprising since this value was far outside the range of x values covered in the data. The total computer time needed to perform these fits in double precision on a DECStation 3100 was 0.5 seconds.

We performed two checks on the accuracy of the results. In the first, as a very stringent check on the numerical accuracy of the procedure, we repeated the calculations in single precision. This gave results effectively identical for degrees up to 7, with errors of 2 in the third significant figure of the coefficients creeping in for the 9th degree fit. In view of the severe computational difficulty in fitting high degree polynomials, these results are a most encouraging reflection on the numerical accuracy of the algorithm. As a second check, we fitted the polynomials using Hanson and Haskell's (1982) general purpose constrained least squares code LSEI, also running (primarily for logistical reasons) in single precision and with the monotonicity enforced only for x values on a selected mesh. This approach did not fare well. It gave errors in the first significant digits of one or more coefficients for all degrees greater than 3, and gave a fitted nonic whose leading term had a negative coefficient. Its execution time was also not attractive - it was similar to that of our dual active set algorithm running in double precision.

As a second example, we studied the 'cars' data set used in the 1983 ASA Data Exposition and downloaded from Carnegie Mellon University's statlib. The portion of the data set used was the power output and fuel consumption of the 392 cars for which both were provided. Clearly, the fuel consumption should increase with the engine's power output, but the relationship is likely to be nonlinear. We therefore tried predicting Y = fuel consumption in miles per US gallon from x = engine power output in horsepower. In these units, Y should be a monotonic decreasing function of x. As the procedures are expressed in terms of monotonic increasing polynomials, we reversed the sign of Y, fitted monotonic increasing polynomials, and then reversed the sign of the results. Polynomials of degree 1, 3, 5, 7 and 9 were fitted - the last primarily to test the ability of the code and the method when using a polynomial of degree far above what most people would normally consider using for data modeling. The total execution time on the DECStation 3100 was 3.6 seconds, two-thirds of which was used in fitting the nonic.

One of the issues to be resolved in application of the method is the degree of polynomial required. On the face of it, this is an easy matter which can be resolved by simple analysis-of-variance hypothesis testing of whether the high-degree powers could have zero true coefficients. However the problem is not as simple as this, since even if the assumptions of normality and homoscedasticity are true, the hypotheses do not satisfy the needed regularity conditions for conventional F distributions (see for example Wolak 1987). Nevertheless, it is informative to carry out the analysis of variance for adding powers to the model; this gives the following results:
Source of variation    Sum of squares    Degrees of freedom    Mean square       F
Linear                 215487            1                     215487
go to cubic            1949              2                     974            51.3
go to quintic          24                2                     13              0.7
go to septic           163               2                     82              4.3
go to nonic            139               2                     70              3.7
Residual               7278              382                   19
Even absent any formal testing, it seems clear that a cubic is essential, but anything of greater degree produces only marginal improvement. Figure 1 shows the results. The solid line is the fitted cubic. The dotted curve is the fitted nonic. Most experienced users would expect a fitted ninth-degree polynomial to be truly awful, with excessive wiggles and turning points; this is seen not to be the case at all. The nonic agrees closely with the cubic - its only questionable feature is the curvature at the extreme left end of the range. Also shown on Figure 1 as a dashed curve is the default lowess provided by S. Lowess is a popular and widely-available nonparametric smoother, and might be the first analysis for users without access to more specialized software. There is no need for lowess' fit to be monotonic, but in this data set it is and would therefore be acceptable. If it turned out to be non-monotonic, it could still provide the basis for a monotonic fit by using the idea of Friedman and Tibshirani (1984) of applying the isotonic regression algorithm to its predicted values. Here lowess agrees with the polynomials over most of the range, diverging only in predicting rather poorer fuel economy for engines with the highest power output. The simulated data set and this real one together show that monotonic polynomials can be fitted quickly, and can provide good fits to data without the bad behavior commonly expected from unconstrained polynomials.

Example - transformation to normality

A quite different application of monotonic functions is that of Elphinstone (1983), who discussed the use of monotonic polynomials as a general-purpose method of transforming data from an unknown distribution to normality. Here the object is, given a random sample X_1, ..., X_n from some unknown continuous distribution, to find a strictly monotonic transformation G(.) so that Y = G(X) has a normal distribution. There are many ways of measuring the closeness of the transformed data Y_i to normality. One such is the Shapiro-Wilk statistic (Shapiro and Wilk 1965). Seeking the transformation that maximizes the Shapiro-Wilk statistic of the transformed sample brings this problem within the present framework, for then the fitting becomes a matter of choosing the polynomial minimizing the residual sum of squares of the regression of the order statistics on the Shapiro-Wilk constants.
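A sketch of this fitting step, reusing the fit_monotone_poly sketch given earlier and substituting normal scores computed from Blom plotting positions for the exact Shapiro-Wilk constants (an approximation for illustration, not the paper's exact recipe):

    import numpy as np
    from scipy import stats
    from numpy.polynomial import polynomial as npoly

    def normality_transform(x, k):
        # Fit a monotone polynomial G of degree k so that G(x_(i)) tracks the
        # expected normal order statistics in the least squares sense.
        xs = np.sort(np.asarray(x, dtype=float))
        n = xs.size
        # Blom plotting positions: an approximation to the Shapiro-Wilk constants.
        scores = stats.norm.ppf((np.arange(1, n + 1) - 0.375) / (n + 0.25))
        b, _ = fit_monotone_poly(xs, scores, k)        # routine sketched earlier
        return b

    # The transformed sample can then be checked with, e.g.,
    # w_stat, p_value = stats.shapiro(npoly.polyval(x, b))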
To illustrate this use, we simulated a sample of size 100 from a gamma distribution with shape parameter 2 and looked for the monotonic polynomials of degree 1, 3, 5, 7 and 9 that would yield the maximum Shapiro-Wilk statistic for the transform. This required a total execution time on the DECStation of 1 second. Given the terminal at x = 0 and the skewness of this distribution, transforming it to normality would seem to be no simple matter. Figure 2 shows the resulting normal probability plot for the polynomials of degree 1 and 5. The untransformed data have strong curvature, showing that they are quite clearly not normal, so it is striking how well the quintic polynomial straightens out the probability plot, leaving only a slight flattening at the left edge, where the hard lower limit of 0 on x strains the ability of a polynomial to cover the whole real line.

A natural concern with this use of a polynomial would be the suspicion that the polynomial is overfitted, and would not stand up to validation using a fresh sample. To investigate this issue, we drew another five samples of the same size from the same distribution, and applied to them the polynomial transformation fitted from the original calibration sample. Table 3 gives the Shapiro-Wilk statistics for the original and the fresh samples. For a sample of size 100, the 90% point of the Shapiro-Wilk statistic is 0.980. The table therefore shows that, while the values of the Shapiro-Wilk statistic in the validation samples are lower than in the calibration sample (as one would expect), they remain high. In particular, the polynomials of degree 5, 7 and 9 are generally successful in attaining normality in all these fresh samples, in that they all give Shapiro-Wilk statistics that are insignificant at the 10% level.

Table 3. Shapiro-Wilk statistics for gamma-distributed samples

Degree   Calibration   Validation sample
         sample        1       2       3       4       5
1        0.856         0.859   0.932   0.897   0.905   0.960
3        0.993         0.983   0.981   0.984   0.988   0.979
5        0.994         0.985   0.980   0.982   0.986   0.981
7        0.995         0.984   0.981   0.985   0.987   0.982
9        0.994         0.979   0.984   0.987   0.989   0.982
Figure 3 corresponds to Figure 2, and shows the normal probability plot for the first fresh sample. Like Figure 2, it shows the clear curvature for the untransformed and very non-normal data, but is close to linear for the quintic transform. This provides visual confirmation of the normality suggested by the Shapiro-Wilk statistic and the success in finding a transformation to normality.
The L1 criterion

We have concentrated on the least squares criterion, but monotonic polynomials are, if anything, even easier to fit using the L1 norm.
[Figure 2. Normal probability plots for the calibration sample (100 cases) against expected normal scores: untransformed data (approximate Wilk-Shapiro statistic 0.8564) and the quintic transform of the calibration data (approximate Wilk-Shapiro statistic 0.9938).]

[Figure 3. Normal probability plots for the first validation sample (100 cases) against expected normal scores: untransformed data (approximate Wilk-Shapiro statistic 0.8586) and the quintic transform of the validation data (approximate Wilk-Shapiro statistic 0.9851).]
Without the constraint of monotonicity, this norm involves the objective function and constraints

Minimize   \sum_{i=1}^{n} (P_i + N_i)
subject to   a_0 + a_1 x_i + a_2 x_i^2 + ... + a_k x_i^k + w_{i1} v_1 + ... + w_{ir} v_r + P_i - N_i = Y_i,   P_i, N_i ≥ 0,   i = 1, 2, ..., n,

and may be solved by either the primal or the dual simplex method.
The constraint of monotonicity adds to this tableau the constraints

a_1 + 2 a_2 x + ... + k a_k x^{k-1} ≥ 0   for all x.

Either conventional linear programming theory or a direct application of Kuhn-Tucker results shows that the optimum monotonic L1 fit will have exactly the same form as in the least squares case, namely

Minimize   \sum_{i=1}^{n} (P_i + N_i)
subject to the constraints
a_0 + a_1 x_i + a_2 x_i^2 + ... + a_k x_i^k + w_{i1} v_1 + ... + w_{ir} v_r + P_i - N_i = Y_i,   i = 1, 2, ..., n,
a_1 + 2 a_2 x + ... + k a_k x^{k-1} = 0   for each hip x,
a_1 + 2 a_2 x + ... + k a_k x^{k-1} ≥ 0   for all other x.
In other words, exactly as with the least squares case, the solution consists of an active set of exact equality constraints for the hips grafted onto the conventional unconstrained L1-norm fit. This problem can therefore also be handled effectively by an active set algorithm. In this, the unconstrained fit is found. If this happens to be monotonic, then the problem is solved. If not, then the constraint of zero slope at some hip must be enforced. If the linear program is solved by the dual simplex algorithm, then adding the constraints of new hips is particularly straightforward, consisting of generating additional columns of the simplex tableau. Enforcing monotonicity of the fitted polynomial therefore becomes a computationally quite minor addition to the basic L1 fit. The L1 norm is potentially attractive in data analysis since it provides a way of fitting a monotonic polynomial robustly.
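For concreteness, a sketch of the unconstrained L1 polynomial fit as a linear program in scipy (no covariates); the zero-slope equalities at any hips would simply be appended as further rows acting on the coefficient block.

    import numpy as np
    from scipy.optimize import linprog

    def l1_poly_fit(x, y, k):
        # Minimise sum(P_i + N_i) subject to X a + P - N = y, with P, N >= 0
        # and the polynomial coefficients a free in sign.
        n = len(y)
        X = np.vander(x, k + 1, increasing=True)
        c = np.r_[np.zeros(k + 1), np.ones(2 * n)]     # objective: sum of P and N
        A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
        bounds = [(None, None)] * (k + 1) + [(0, None)] * (2 * n)
        res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
        return res.x[:k + 1]                           # fitted coefficients a_0..a_k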
The L-infinity (Chebyshev) norm can also be implemented by adapting a conventional unconstrained approach using linear programming - this is

Minimize   V
subject to the constraints
Y_i - (a_0 + a_1 x_i + a_2 x_i^2 + ... + a_k x_i^k + w_{i1} v_1 + ... + w_{ir} v_r) ≤ V,
-Y_i + (a_0 + a_1 x_i + a_2 x_i^2 + ... + a_k x_i^k + w_{i1} v_1 + ... + w_{ir} v_r) ≤ V,   i = 1, 2, ..., n,
a_1 + 2 a_2 x + ... + k a_k x^{k-1} ≥ 0   for all x.
Once again, to fit a monotonic polynomial all that is necessary is to add to the standard linear programming model the additional linear constraints h(x)b ≥ 0 for all x. This too can be very effectively solved by the dual active set algorithm, creating columns to accommodate any hips whose existence emerges during the solution. The L-infinity norm is not of much interest for general-purpose data modeling or transformation. It does have some potential, however, for applications such as the approximation of tabled functions.
Conclusion

The method proposed here provides a way of fitting monotonic polynomials. It has a variety of potential uses - for data modeling where a response is known to be monotonic, for general-purpose monotonic transformations, for modeling tabled functions, and as an adjunct to analytical methods such as multidimensional scaling in which a general smooth but flexible monotonic transformation is required. Coding requires care, but little more than is appropriate in any application involving polynomials, and the resulting algorithm appears to be fast.
Acknowledgements

This work was supported by the National Science Foundation under grant number DMS 9208819. Helpful discussions with D. H. Martin are gratefully acknowledged, as is the polynomial root finding code supplied by C. Bingham.
References

Campbell, S. L., and Meyer, C. D. (1979). Generalized Inverses of Linear Transformations. Pitman, London.
Elphinstone, C. D. (1983). 'A target distribution model for nonparametric density estimation', Communications in Statistics, A12, 2, 161-198.
Fletcher, R. (1987). Practical Methods of Optimization. Wiley, New York.
Friedman, J., and Tibshirani, R. (1984). 'The monotone smoothing of scatterplots', Technometrics, 26, 243-250.
Goldfarb, D., and Idnani, A. (1983). 'A numerically stable dual method for solving strictly convex quadratic programs', Mathematical Programming, 27, 1-33.
Hadley, G. (1964). Nonlinear and Dynamic Programming. Addison Wesley, Reading, Mass.
Hanson, R. J., and Haskell, K. H. (1982). 'Algorithm 587: Two algorithms for the linearly constrained least squares problem', ACM Transactions on Mathematical Software, 8, 323-333.
Lawson, C. L., and Hanson, R. J. (1974). Solving Least Squares Problems. Prentice Hall, Englewood Cliffs.
Liew, C. K. (1976). 'Inequality constrained least-squares estimation', Journal of the American Statistical Association, 71, 746-751.
Muller, M. W. (1984). Multidimensional Unfolding of Preference Data by Maximum Likelihood. PhD thesis, University of South Africa, Pretoria.
Ramsey, J. O. (1977). 'Monotonic weighted power transformations to additivity', Psychometrika, 42, 83-109.
Schiffman, S. S., Reynolds, M. L., and Young, F. W. (1981). Introduction to Multidimensional Scaling: Theory, Practice and Methods. Academic Press, New York.
Shapiro, S. S., and Wilk, M. (1965). 'An analysis of variance test for normality', Biometrika, 52, 591-611.
Thisted, R. (1988). Elements of Statistical Computing. Chapman and Hall, London.
Winsberg, S., and Ramsey, J. O. (1981). 'Analysis of pairwise preference data using integrated B-splines', Psychometrika, 46, 171-186.
Wolak, F. A. (1987). 'An exact test for multiple inequality and equality constraints in the linear regression model', Journal of the American Statistical Association, 82, 782-793.