COMPSTAT’2004 Symposium © Physica-Verlag/Springer 2004

A GENERALISED PAV ALGORITHM FOR MONOTONIC REGRESSION IN SEVERAL VARIABLES

Oleg Burdakov, Anders Grimvall and M. Hussian

Key words: Statistical computing, numerical algorithms, monotonic regression, nonparametric regression, pool-adjacent-violators algorithm.

COMPSTAT 2004 section: Nonparametrical statistics.

Abstract: We present a new algorithm for monotonic regression in one or more explanatory variables. Formally, our method generalises the well-known PAV (pool-adjacent-violators) algorithm from fully to partially ordered data. The computational complexity of our algorithm is O(n²). The goodness-of-fit to observed data is much closer to optimal than for simple averaging techniques.

1 Introduction

Monotonic regression is a nonparametric method that is appropriate when a response variable (y) is increasing or decreasing in one or more explanatory variables (x1, . . . , xp). Over the past decades, several numerical algorithms have been developed to facilitate practical application of this method. However, all the algorithms that are currently in use have considerable drawbacks.

The most widespread computational method for monotonic regression is the so-called pool-adjacent-violators (PAV) algorithm [1], [2], [8]. When p = 1, this algorithm is computationally efficient and provides solutions that are optimal in the sense that the mean square error is minimised. However, if p > 1, the PAV algorithm is less useful. Special cases in which the values of the explanatory variables can be grouped into a moderate number of classes can be handled by repeatedly applying the PAV algorithm to different subsets of the data [4], [6], [15], [16]. Other approaches are required for typical multiple regression data in which at least one of the explanatory variables is continuous.

Simple averaging techniques constitute another widespread group of methods. The main idea is to form a weighted mean of two monotonic functions that embrace all observed values of the response variable [11], [12], [16]. In contrast to the PAV algorithm, simple averaging techniques can easily accommodate several explanatory variables. On the other hand, they are sensitive to outliers, and the goodness-of-fit can be far from optimal.

Quadratic programming provides yet another approach to monotonic regression [3], [7]. Given a set of observations {(x1i, . . . , xpi, yi), i = 1, . . . , n}, the aim is to find a set of fitted values {zi, i = 1, . . . , n} such that

    S = \sum_{i=1}^{n} (z_i - y_i)^2

is minimised under the constraints induced by the partially ordered data, namely zi ≤ zj whenever xki ≤ xkj for k = 1, . . . , p. All available algorithms for computing such solutions entail a considerable computational burden, even for moderately large data sets. The best known computational complexity is O(n⁴), and it refers to an algorithm introduced in [10]. Development of more efficient algorithms remains an open problem.

Here, we generalise the PAV algorithm from fully to partially ordered data, and we show how this algorithm can be used for monotonic regression in one or more explanatory variables. In addition, we examine the performance of this algorithm with respect to computational burden and goodness-of-fit to observed data.
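To make the quadratic programming formulation above concrete, the following Python sketch poses it with the cvxpy modelling library. This is purely our illustration (the paper does not use cvxpy, and the function name is ours); a generic QP solver of this kind does not attain the complexity bounds discussed above.

    # Illustration only: monotonic regression posed as the quadratic
    # program described above, using the cvxpy modelling library.
    import numpy as np
    import cvxpy as cp

    def monotonic_regression_qp(X, y):
        """Minimise sum_i (z_i - y_i)^2 subject to z_i <= z_j whenever
        x_i <= x_j componentwise. X: (n, p) array, y: length-n array."""
        X = np.asarray(X, dtype=float)
        n = X.shape[0]
        z = cp.Variable(n)
        constraints = [z[i] <= z[j]
                       for i in range(n) for j in range(n)
                       if i != j and np.all(X[i] <= X[j])]
        problem = cp.Problem(cp.Minimize(cp.sum_squares(z - y)), constraints)
        problem.solve()
        return z.value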

2 Main characteristics of the PAV algorithm

Let Mn = {(xi, yi), i = 1, . . . , n} denote a set of n observations of one explanatory variable (x) and one response variable (y), and assume that the x-values are sorted in increasing order. Then the PAV algorithm computes a non-decreasing sequence of values {zi, i = 1, . . . , n} such that

    S = \sum_{i=1}^{n} (z_i - y_i)^2

is minimised. The algorithm is recursive in the sense that the optimal solution for the data set Mn is constructed by starting from the solution for M1, which is subsequently modified into the solution for M2, and so on. Moreover, it has the following characteristics (a Python sketch illustrating them is given after the list):

(i) Mr+1 is formed by extending Mr with a data point (xr+1, yr+1) such that xr+1 ≤ xi for all i > r + 1.

(ii) If the values z1, . . . , zr denote the solution obtained for Mr, then a preliminary solution for Mr+1 is formed by setting zr+1 = yr+1. Thereafter, the final solution for Mr+1 is derived by pooling adjacent z-values that violate the monotonicity constraints. To be more precise, zr+1−k, . . . , zr+1 are assigned the common value (zr+1−k + . . . + zr+1)/(k + 1), where k is the smallest non-negative integer such that the new values of z1, . . . , zr+1 form a non-decreasing sequence.

(iii) The optimal solution {zi, i = 1, . . . , n} for Mn is composed of clusters of identical z-values.
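The recursion (i)–(iii) can be implemented in a few lines. The sketch below is our own Python illustration rather than code from the paper; it assumes the observations are already sorted by the explanatory variable and represents each cluster by the sum and the number of its pooled y-values.

    def pav(y):
        """Pool-adjacent-violators: least-squares non-decreasing fit to y
        (assumed to be sorted by the explanatory variable)."""
        # Each block stores [sum of y-values, number of points]; its fitted
        # value is the block mean.
        blocks = []
        for value in y:
            blocks.append([value, 1])
            # Pool the new block with its left neighbours while the block
            # means violate the non-decreasing order.
            while len(blocks) > 1 and \
                    blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
                total, count = blocks.pop()
                blocks[-1][0] += total
                blocks[-1][1] += count
        # Expand the block means back to one fitted value per observation.
        z = []
        for total, count in blocks:
            z.extend([total / count] * count)
        return z

For example, pav([1.0, 3.0, 2.0, 4.0]) returns [1.0, 2.5, 2.5, 4.0]: the second and third observations are pooled because they violate monotonicity.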

3 An alternative formulation of the PAV algorithm

The removal of monotonicity violators implies that adjacent clusters of identical z-values are joined to form new and larger clusters. To achieve a more precise reformulation of the PAV algorithm, we introduce the notation I = {i1, . . . , im} for a cluster consisting of a set of adjacent indices i1, . . . , im. Furthermore, we use |I| for the number of elements in I, and the symbol z(I) for the common value of all zi, i ∈ I. When two adjacent clusters I1 and I2 are joined to form a new cluster I1 ∪ I2, the associated z-value is given by the expression

    z(I_1 \cup I_2) = \frac{|I_1| z(I_1) + |I_2| z(I_2)}{|I_1| + |I_2|}.

If the clusters I1 , . . . , Iq and their associated values z(I1 ), . . . , z(Iq ) compose the optimal solution for Mr , then a preliminary solution for Mr+1 is formed by introducing the cluster Iq+1 consisting of the integer r + 1, and setting z(Iq+1 ) = yr+1 . Thereafter, the final solution for Mr+1 is obtained by joining Iq+1 with adjacent left-neighbour clusters, one by one, until the z-values violating the monotonicity constraints have been removed.
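As a small illustration of this cluster-based formulation (again our own sketch, not code from the paper), a cluster can be stored as its index set together with the common value z(I), and joining two clusters then reduces to the weighted mean above.

    class Cluster:
        """A cluster of adjacent indices sharing one fitted value z."""
        def __init__(self, indices, z):
            self.indices = list(indices)
            self.z = z

        def join(self, other):
            """Join two clusters; the new common value is the weighted mean."""
            n1, n2 = len(self.indices), len(other.indices)
            z_new = (n1 * self.z + n2 * other.z) / (n1 + n2)
            return Cluster(self.indices + other.indices, z_new)

Joining, say, I1 = {1, 2} with z(I1) = 3 and I2 = {3} with z(I2) = 1 gives the common value (2·3 + 1·1)/3 = 7/3.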

4 A generalised PAV algorithm for partially ordered data

In our generalisation of the PAV algorithm to partially ordered data, we use the notions introduced in the previous sections. Let Mn = {(x1i, . . . , xpi, yi), i = 1, . . . , n} denote a set of n observations of p explanatory variables and one response variable. Also, let xi = (x1i, . . . , xpi), i = 1, . . . , n, denote the elements in Rp that are defined by the explanatory variables. Then we can define a partial order on the set Un = {xi, i = 1, . . . , n} by setting xi ≼ xj if xki ≤ xkj for k = 1, . . . , p, and subsequently sort the elements of Un (and Mn) in such a way that, for each i, xi is a minimal element of the set Vi = {xj, j = i, . . . , n}. Furthermore, we can compute the lower cover Li of each xi. The latter set consists of all elements xj, xj ≠ xi, such that xj ≼ xi and the inequalities xj ≼ xk ≼ xi are satisfied only if xk = xj or xk = xi.

Like the original PAV algorithm, our generalisation is recursive. Furthermore, it has the following features (a Python sketch is given at the end of this section):

(i) Mr+1 is formed by extending Mr with a data point (xr+1, yr+1) such that, for all i > r + 1, either xr+1 ≼ xi or xr+1 is incomparable with xi.

(ii) If the clusters I1, . . . , Iq and their associated values z(I1), . . . , z(Iq) denote a solution for Mr, then a preliminary solution for Mr+1 is formed by introducing the cluster Iq+1 consisting of the integer r + 1, and setting z(Iq+1) = yr+1. Thereafter, the final solution for Mr+1 is obtained by joining Iq+1 with left-neighbour clusters, one by one, until the z-values violating the monotonicity constraints have been removed. A cluster Ij is called a left neighbour of Il if there exists an i ∈ Ij and a k ∈ Il such that xi belongs to the lower cover of xk.

(iii) The solution {zi, i = 1, . . . , n} obtained for Mn is composed of clusters of identical z-values.

Due to their construction, the solutions obtained by using the generalised PAV algorithm are monotonic in the explanatory variables. However, two ambiguities should be noted: (i) a cluster may have several different left neighbours; (ii) the pre-sorting which ensures that, for each i, xi is a minimal element of the set Vi = {xj, j = i, . . . , n} may be done in different ways. The first ambiguity is easy to remove: the goodness-of-fit is improved if, whenever a cluster has several left neighbours, the largest violator of monotonicity is removed first. The second ambiguity is more intricate, and our generalised PAV algorithm will not necessarily attain the minimal value of the mean square error.
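The following Python sketch puts features (i)–(iii) together with the "largest violator first" rule noted above. It is our reading of the description, not the authors' implementation: the pre-calculations are written for clarity rather than for the O(n²) bound discussed in Section 5, lexicographic sorting is used as one admissible pre-sorting, and ties in the explanatory variables are not treated specially.

    import numpy as np

    def dominance_matrix(X):
        """less[i, j] is True iff x_i strictly precedes x_j in the
        componentwise partial order."""
        n = X.shape[0]
        less = np.zeros((n, n), dtype=bool)
        for i in range(n):
            for j in range(n):
                if i != j and np.all(X[i] <= X[j]) and np.any(X[i] < X[j]):
                    less[i, j] = True
        return less

    def lower_covers(less):
        """Lower cover of each element: predecessors with no third element
        strictly between them."""
        n = less.shape[0]
        covers = [[] for _ in range(n)]
        for j in range(n):
            for i in range(n):
                if less[i, j] and not any(less[i, k] and less[k, j] for k in range(n)):
                    covers[j].append(i)
        return covers

    def gpav(X, y):
        """Sketch of the generalised PAV recursion for partially ordered data."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n = len(y)
        # Pre-sorting: lexicographic order is one way to ensure that each x_i
        # is a minimal element of the remaining set V_i.
        order = np.lexsort(X.T[::-1])
        X, y = X[order], y[order]
        covers = lower_covers(dominance_matrix(X))

        cluster_of = list(range(n))   # observation index -> cluster id
        members = {}                  # cluster id -> member indices
        z = {}                        # cluster id -> common fitted value

        for r in range(n):
            cid = r
            members[cid] = [r]
            z[cid] = y[r]
            while True:
                # Left-neighbour clusters: clusters containing a lower-cover
                # element of some member of the current cluster.
                neighbours = {cluster_of[i] for k in members[cid] for i in covers[k]}
                neighbours.discard(cid)
                violators = [c for c in neighbours if z[c] > z[cid]]
                if not violators:
                    break
                # Join with the largest violator first (weighted mean of values).
                c = max(violators, key=lambda c: z[c])
                n1, n2 = len(members[c]), len(members[cid])
                z[cid] = (n1 * z[c] + n2 * z[cid]) / (n1 + n2)
                members[cid].extend(members[c])
                for i in members[c]:
                    cluster_of[i] = cid
                del members[c], z[c]

        fitted = np.empty(n)
        for cid, idx in members.items():
            fitted[idx] = z[cid]
        # Return fitted values in the original observation order.
        result = np.empty(n)
        result[order] = fitted
        return result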

5 Computational burden

The computations can be divided into pre-calculations and a recursive establishment of solutions for the data sets Mr, r = 1, . . . , n. The pre-calculations have three major components: (i) establishment of a partial order on the set {xi, i = 1, . . . , n}; (ii) sorting of the observations to ensure that, for each i, xi is a minimal element of the set Vi = {xj, j = i, . . . , n}; (iii) calculation of the lower cover of each xi.

Test runs of a Visual Basic implementation of the algorithm showed that data sets consisting of several hundred observations can be processed in less than a second on an ordinary PC. Most of the computer time was usually spent on the calculation of lower covers, followed by the establishment of the partial order and the removal of monotonicity violators. It is also noteworthy that, after the partial order has been established, the number of explanatory variables does not influence the computational burden of our generalised algorithm.

A theoretical analysis of the proposed algorithm showed that its complexity is O(n²). This result will be proved in a separate paper. The proof is based on the following observations. The pre-calculations can be carried out in O(n²) elementary arithmetic operations. Each operation of joining clusters is preceded by a search for the largest violator among the left neighbours.

Such a search requires at most n comparisons. The complexity of joining clusters is also O(n). Since the total number of joinings cannot exceed n, the overall complexity related to joining is O(n²). The number of cases in which the search for the largest violator among the left neighbours does not result in a joining of clusters is below n. Thus, the contribution of these cases to the complexity does not exceed O(n²). When the structure of the partially ordered data is a tree, the algorithm is guaranteed to produce the optimal solution, and it has the same complexity as the algorithm introduced in [13], namely O(n log n).

    Correlation   Normally distr. errors   Exponentially distr. errors
                  GPAV        SA           GPAV          SA
     0            0.72        0.80         0.77          1.05
     0.9          0.75        0.90         0.75          1.15
    -0.9          0.75        0.88         0.75          1.12

Table 1: Mean square error for monotonic regression in two explanatory variables using the generalised PAV algorithm (GPAV) and a simple averaging technique (SA). The table shows mean values for 100 data sets, each consisting of 400 observations.

6 Goodness-of-fit

A simulation study was undertaken to compare the goodness-of-fit that could be achieved by applying (i) the generalised PAV algorithm and (ii) the simple averaging technique described by Mukerjee in [11]. All of the analysed data sets were generated according to the equation y = x1 + x2 + ε, where (x1, x2) was bivariate normally distributed with mean zero, variance one, and correlation ρ. The error terms ε were either normally or exponentially distributed with variance 1. Table 1 shows that, regardless of the distribution of the error terms or the correlation between the two explanatory variables, the generalised PAV algorithm performed better than the simple averaging technique. Furthermore, the difference in goodness-of-fit was particularly large for heavy-tailed (exponentially distributed) error terms.
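The data-generating step can be sketched as follows (our own code; the centring of the exponential errors to mean zero and the random-number seed are our assumptions, since the paper does not spell them out).

    import numpy as np

    def simulate(n=400, rho=0.9, errors="normal", seed=0):
        """Generate one data set of the form y = x1 + x2 + eps, with (x1, x2)
        bivariate normal (mean 0, variance 1, correlation rho)."""
        rng = np.random.default_rng(seed)
        cov = [[1.0, rho], [rho, 1.0]]
        x = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        if errors == "normal":
            eps = rng.standard_normal(n)
        else:
            # Exponential errors, centred and scaled to mean 0 and variance 1.
            eps = rng.exponential(1.0, size=n) - 1.0
        y = x[:, 0] + x[:, 1] + eps
        return x, y

Each replicate behind a cell of Table 1 would then correspond to a call such as simulate(n=400, rho=0.9, errors="exponential", seed=k).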

7 Discussion

Models of monotonic responses in two or more explanatory variables have a large number of applications. Thus far, use of monotonic regression has been hampered by the lack of algorithms suitable for typical multiple regression data. With the generalised PAV algorithm, it is feasible to handle data sets that include hundreds or even thousands of observations of one response variable and an arbitrary number of explanatory variables. For moderately large data sets, it is also possible to combine monotonic regression with general model selection techniques, such as cross-validation.

The test runs described in this article show that our generalisation of the PAV algorithm is efficient with respect to both accuracy (goodness-of-fit) and computing time. Two recent conference contributions [5], [9] involving further numerical experiments and applications in environmental science confirm the main results. The present version of the generalised PAV algorithm is superior to simple averaging techniques, but it may not always provide least-squares solutions. Further work is needed to determine the extent to which the goodness-of-fit can be improved by removing or reducing this sub-optimality.

References

[1] Ayer M., Brunk H.D., Ewing G.M., Reid W.T., Silverman E. (1955). An empirical distribution function for sampling with incomplete information. The Annals of Mathematical Statistics 26, 641 – 647.
[2] Barlow R.E., Bartholomew D.J., Bremner J.M., Brunk H.D. (1972). Statistical inference under order restrictions. Wiley, New York.
[3] Best M.J., Chakravarti N. (1990). Active set algorithms for isotonic regression: a unifying framework. Mathematical Programming 47, 425 – 439.
[4] Bril G., Dykstra R., Pillers C., Robertson T. (1984). Algorithm AS 206: isotonic regression in two independent variables. Applied Statistics 33, 352 – 357.
[5] Burdakov O., Sysoev O., Grimvall A., Hussian M. (2004). An algorithm for isotonic regression problems. To appear in the Proceedings of the 4th European Congress of Computational Methods in Applied Science and Engineering ‘ECCOMAS 2004’.
[6] Dykstra R., Robertson T. (1982). An algorithm for isotonic regression for two or more independent variables. The Annals of Statistics 10, 708 – 716.
[7] Gamarnik D. (1998). Efficient learning of monotone concepts via quadratic optimisation. Proceedings of the Eleventh Annual Conference on Computational Learning Theory, July 24-26, 1998, Madison, Wisconsin, USA, 134 – 143.
[8] Hanson D.L., Pledger G., Wright F.T. (1973). On consistency in monotonic regression. The Annals of Statistics 1, 401 – 421.
[9] Hussian M., Grimvall A., Burdakov O., Sysoev O. (2004). Monotonic regression for trend assessment of environmental quality data. To appear in the Proceedings of the 4th European Congress of Computational Methods in Applied Science and Engineering ‘ECCOMAS 2004’.
[10] Maxwell W.L., Muckstadt J.A. (1983). Establishing consistent and realistic reorder intervals in production-distribution systems. Operations Research 33, 1316 – 1341.
[11] Mukerjee H. (1988). Monotone nonparametric regression. The Annals of Statistics 16, 741 – 750.
[12] Mukerjee H., Stern H. (1994). Feasible nonparametric estimation of multiargument monotone functions. Journal of the American Statistical Association 425, 77 – 80.
[13] Pardalos P.M., Xue G. (1999). Algorithms for a class of isotonic regression problems. Algorithmica 23, 211 – 222.
[14] Salanti G., Ulm K. (2001). The multidimensional isotonic regression. Proceedings Book, International Society of Clinical Biostatistics, 19-23 August, Stockholm, Sweden, 162.
[15] Schell M.J., Singh B. (1997). The reduced monotonic regression method. Journal of the American Statistical Association 92, 128 – 135.
[16] Strand M. (2003). Comparison of methods for monotone nonparametric multiple regression. Communications in Statistics - Simulation and Computation 32, 165 – 178.

Acknowledgement: The authors are grateful for financial support from the Swedish Environmental Protection Board and the Swedish Research Council.

Address: O. Burdakov, A. Grimvall, M. Hussian, Department of Mathematics, Linköping University, Sweden

E-mail: [email protected]