A Learning Algorithm for Piecewise Linear Regression

Giancarlo Ferrari-Trecate (1), Marco Muselli (2), Diego Liberati (3), Manfred Morari (1)

(1) Institut für Automatik, ETHZ - ETL, CH-8092 Zürich, Switzerland

(2) Istituto per i Circuiti Elettronici - CNR, via De Marini 6, 16149 Genova, Italy

(3) Ce.S.T.I.A. - CNR, c/o Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy

Abstract. A new learning algorithm for solving piecewise linear regression problems is proposed. It is able to train a proper multilayer feedforward neural network so as to reconstruct a target function that assumes a different linear behavior on each set of a polyhedral partition of the input domain. The proposed method combines local estimation, clustering in weight space, classification and regression in order to achieve the desired result. A simulation on a benchmark problem shows the good properties of this new learning algorithm.

1 Introduction

Real-world problems to be solved by artificial neural networks are normally subdivided into two groups according to the range of values assumed by the output. If the output is Boolean or nominal, we speak of classification problems; otherwise, when the output is coded by a continuous variable, we are facing a regression problem. In most cases, the techniques employed to train a connectionist model depend on the kind of problem we are dealing with.

However, there are applications that lie on the borderline between classification and regression; these occur when the input space can be subdivided into disjoint regions X_i characterized by different behaviors of the function f to be reconstructed. The target of the learning problem is consequently twofold: by analyzing a set of samples of f, possibly affected by noise, it has to generate both the collection of regions X_i and the behavior of the unknown function f in each of them. If the region X_i corresponding to each sample in the training set were known, we could add the index i of the region as an output, thus obtaining a classification problem whose target is to find the effective form of each X_i. On the other hand, if the actual partition X_i were known, we could solve several regression problems to find the behavior of the function f within each X_i.

Because of this mixed nature, classical techniques for neural network training cannot be directly applied; specific methods are necessary to deal with this kind of problem. Perhaps the simplest situation one can think of is piecewise linear regression: in this case the regions X_i are polyhedra and the behavior of the function f in each X_i can be modeled by a linear expression. Several authors have treated this kind of problem [2, 3, 4, 8], providing algorithms for reaching the desired result. Unfortunately, most of them are difficult to extend beyond two dimensions [2], whereas others consider only local approximations [3, 4], thus missing the effective extension of the regions X_i.

In this contribution a new training algorithm for neural networks solving piecewise linear regression problems is proposed. It combines clustering and supervised learning to obtain the correct values for the weights of a proper multilayer feedforward architecture.

2 The piecewise linear regression problem

Let X be a polyhedron in the n-dimensional space R^n and X_i, i = 1, ..., s, a polyhedral partition of X, i.e. X_i ∩ X_j = ∅ for every i ≠ j and ∪_{i=1}^{s} X_i = X. The target of a Piecewise Linear Regression (PLR) problem is to reconstruct an unknown function f : X → R having a linear behavior in each region X_i,

$$f(x) = z_i = w_{i0} + \sum_{j=1}^{n} w_{ij} x_j \, ,$$

when only a training set S containing m samples (x_k, y_k), k = 1, ..., m, is available. The output y_k gives an evaluation of f(x_k) subject to noise, with x_k ∈ X; the region X_i to which x_k belongs is not known in advance. The scalars w_{i0}, w_{i1}, ..., w_{in}, for i = 1, ..., s, characterize the function f univocally, and their estimate is a target of the PLR problem; for notational purposes they will be collected in a vector w_i. Since the regions X_i are polyhedral, each of them can be defined by a set of l_i linear inequalities of the following kind:

$$a_{ij0} + \sum_{k=1}^{n} a_{ijk} x_k \leq 0 \qquad (1)$$

The scalars a_{ijk}, for j = 1, ..., l_i and k = 0, 1, ..., n, can be collected in a matrix A_i, whose estimate is a further target of the reconstruction process for every i = 1, ..., s. Discontinuities may be present in the function f at the boundaries between two regions X_i.

Following the general idea presented in [8], a neural network realizing a piecewise linear function f of this kind can be modeled as in Fig. 1. It contains a gate layer that verifies the inequalities (1) and decides which of the terms z_i must be used as the output y of the whole network. Thus, the i-th unit in the gate layer has output equal to its input z_i if all the constraints (1) are satisfied for j = 1, ..., l_i, and equal to 0 otherwise. All the other units perform a weighted sum of their inputs; the weights of the output neuron, which has no bias, are always set to 1.

Figure 1: General neural network realizing a piecewise linear function (input layer, hidden layer of linear units with weights w_1, ..., w_s, gate layer with matrices A_1, ..., A_s, output layer).
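To fix ideas, the following minimal sketch evaluates such a network at a point x. It is our own illustration, not the authors' implementation, and the array layouts chosen for W and A are assumptions.

```python
import numpy as np

def pwl_network(x, W, A):
    """Forward pass of the network in Fig. 1 (a sketch, not the authors' code).

    W: (s, n+1) array; row i holds [w_i0, w_i1, ..., w_in].
    A: list of s matrices; row j of A[i] holds [a_ij0, ..., a_ijn],
       one row per inequality (1) describing the region X_i.
    """
    x1 = np.concatenate(([1.0], x))         # prepend 1 for the bias terms
    z = W @ x1                              # hidden layer: z_i = w_i0 + sum_j w_ij x_j
    # Gate layer: unit i forwards z_i only if every inequality (1) holds,
    # and outputs 0 otherwise; the output neuron sums with unit weights.
    gates = np.array([float(np.all(Ai @ x1 <= 0.0)) for Ai in A])
    return float(z @ gates)

# Example with s = 1: the piece f(x) = -x on X_1 = [-4, 0] of the benchmark
# in Section 4, encoded as x <= 0 -> [0, 1] and -4 - x <= 0 -> [-4, -1].
W = np.array([[0.0, -1.0]])
A = [np.array([[0.0, 1.0], [-4.0, -1.0]])]
print(pwl_network(np.array([-2.0]), W, A))  # prints 2.0
```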

3 The proposed learning algorithm

As previously noted, the solution of a PLR problem requires a technique that combines classification and regression: the former has the aim of finding the matrices A_i to be inserted in the gate layer of the neural network (Fig. 1), whereas the latter provides the weight vectors w_i for the input-to-hidden-layer connections. A method of this kind is reported in Fig. 2; it is composed of four steps, each of which is devoted to a specific task.

ALGORITHM FOR PIECEWISE LINEAR REGRESSION

1. (Local regression) For every k = 1, ..., m do:
   1a. Form the set C_k containing the pair (x_k, y_k) and the samples (x, y) ∈ S associated with the c - 1 nearest neighbors x to x_k.
   1b. Perform a linear regression to obtain the weight vector v_k of a linear unit fitting the samples in C_k.
2. (Clustering) Perform a clustering process in the space R^{n+1} to subdivide the set of weight vectors v_k into s groups V_i.
3. (Classification) Build a new training set S' containing the m pairs (x_k, i_k), where V_{i_k} is the cluster including v_k. Train a multicategory classification method to produce the matrices A_i for the regions X_i.
4. (Regression) For every i = 1, ..., s perform a linear regression on the samples (x, y) ∈ S with x ∈ X_i to obtain the weight vector w_i for the i-th unit in the hidden layer.

Figure 2: Proposed learning method for piecewise linear regression.

The first step (Step 1) obtains an initial estimate of the weight vectors w_i by performing local linear regressions on small subsets of the whole training set S. In fact, points x_k that are close to each other are likely to belong to the same region X_i. For each sample (x_k, y_k), with k = 1, ..., m, we build a set C_k containing (x_k, y_k) and the c - 1 distinct pairs (x, y) ∈ S that score the lowest values of the distance ||x_k - x||. The parameter c can be freely chosen, though the inequality c ≥ n must be respected to perform the linear regression. It can easily be seen that some sets C_k, called mixed, will contain input patterns belonging to different regions X_i. These lead to wrong estimates for w_i, and consequently their number must be kept to a minimum; this can be achieved by lowering the value of c. However, the quality of the estimate improves when the size c of the sets C_k increases; a tradeoff must therefore be attained in selecting a reasonable value for c.

Denote by v_k the weight vector of the linear unit produced by the linear regression on the samples in C_k. If the generation of the samples in the training set is not affected by noise, most of the v_k coincide with the desired weight vectors w_i; only mixed sets C_k yield spurious vectors v_k, which can be considered outliers. Nevertheless, even in the presence of noise, a clustering algorithm (Step 2) can be used to determine the sets V_i of vectors v_k associated with the same w_i. A proper version of the K-means algorithm [6] can be adopted to this aim if the number s of regions is fixed beforehand; otherwise, adaptive techniques, such as the Growing Neural Gas [7], can be employed to determine the value of s at the same time.

The sets V_i generated by the clustering process induce a classification of the input patterns x_k belonging to the training set S. As a matter of fact, if v_k ∈ V_i for a given i, the set C_k is fitted by the linear neuron with weight vector w_i, and consequently x_k is located in the region X_i. The effective extension of this region can be determined by solving a linear multicategory classification problem (Step 3), whose training set S' is built by adding to each input pattern x_k, as output, the index i_k of the set V_{i_k} to which the corresponding vector v_k belongs. To avoid the presence of multiply classified points or of unclassified patterns in the input space, proper techniques [1] based on linear and quadratic programming can be employed. In this way the s matrices A_i for the gate layer are generated; they may include redundant rows that are not necessary in the determination of the polyhedral regions X_i. These rows can be removed by applying standard linear programming techniques.
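Steps 1 and 2 lend themselves to a compact implementation. The sketch below is ours and makes two simplifying assumptions: the data are stored as an (m, n) array X and a length-m vector y, and plain Euclidean K-means stands in for the modified version of [6].

```python
import numpy as np

def local_weight_vectors(X, y, c):
    """Step 1 of Fig. 2: one local linear regression per sample.
    X is the (m, n) array of input patterns, y the m outputs; the paper
    requires c >= n, and c >= n + 1 gives a unique affine fit."""
    m, n = X.shape
    X1 = np.hstack([np.ones((m, 1)), X])      # constant column for the bias
    V = np.empty((m, n + 1))
    for k in range(m):
        # C_k: sample k together with its c - 1 nearest neighbours in S
        idx = np.argsort(np.linalg.norm(X - X[k], axis=1))[:c]
        V[k] = np.linalg.lstsq(X1[idx], y[idx], rcond=None)[0]
    return V

def cluster_weight_vectors(V, s, iters=100, seed=0):
    """Step 2 of Fig. 2: group the v_k into s clusters with standard
    K-means (a stand-in for the modified K-means of [6])."""
    rng = np.random.default_rng(seed)
    centers = V[rng.choice(len(V), size=s, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(V[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for i in range(s):                    # keep a center if its cluster empties
            if np.any(labels == i):
                centers[i] = V[labels == i].mean(axis=0)
    return labels
```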


Figure 3: Simulation results for a benchmark problem: a) unknown piecewise linear function f and training set S, b) function realized by the trained neural network (dashed line).

Finally, the weight vectors w_i for the neural network in Fig. 1 can be directly obtained by solving s linear regression problems (Step 4), having as training sets the samples (x, y) ∈ S with x ∈ X_i, where X_1, ..., X_s are the regions built by the classification process.
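Steps 3 and 4 can be sketched in the same spirit. Since the multicategory method of [1] requires linear or quadratic programming, a simple least-squares linear classifier is used here as a stand-in; its argmax decision rule still partitions the input space into polyhedral regions, with no unclassified or multiply classified points.

```python
import numpy as np

def fit_regions_and_weights(X, y, labels, s):
    """Steps 3 and 4 of Fig. 2, with a least-squares linear classifier
    standing in for the LP/QP-based method of [1]. Assumes every
    recovered region retains at least n + 1 samples."""
    m, n = X.shape
    X1 = np.hstack([np.ones((m, 1)), X])
    # Step 3: fit one linear score per region to one-hot cluster labels;
    # the argmax of s linear scores carves the input space into
    # polyhedral regions, leaving no point unclassified.
    T = np.eye(s)[labels]                     # (m, s) one-hot targets
    C = np.linalg.lstsq(X1, T, rcond=None)[0] # (n + 1, s) score weights
    region = (X1 @ C).argmax(axis=1)          # reassign every sample to a region
    # Step 4: one ordinary least-squares regression per region X_i.
    W = np.vstack([np.linalg.lstsq(X1[region == i], y[region == i],
                                   rcond=None)[0] for i in range(s)])
    return C, W                               # region scores and hidden-layer weights
```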

4 Simulation results

The proposed algorithm for piecewise linear regression has been tested on a one-dimensional benchmark problem, in order to analyze the quality of the resulting neural network. The unknown function to be reconstructed is the following:

$$f(x) = \begin{cases} -x & \text{if } -4 \le x \le 0 \\ x & \text{if } 0 < x < 2 \\ 2 + 3x & \text{if } 2 \le x \le 4 \end{cases} \qquad (2)$$

with X = [-4, 4] and s = 3. A training set S containing m = 100 samples (x, y) has been generated, where y = f(x) + ε and ε is a normal random variable with zero mean and variance σ² = 0.05. The behavior of f(x), together with the elements of S, is depicted in Fig. 3a.

The method described in Fig. 2 has been applied by choosing the value c = 6 at Step 1. At Step 2 the number s of regions has been assumed to be known, thus allowing the application of the K-means clustering algorithm [5]; a proper definition of the norm has been employed to improve the convergence of the clustering process [6]. Multicategory classification (Step 3) has then been performed by using the method described in [1], which can easily be extended to realize nonlinear boundaries among the X_i when treating a multidimensional problem. Finally, least-squares estimation has been adopted to generate the vectors w_i for the piecewise linear regression. The resulting neural network realizes the following function, represented as a dashed line in Fig. 3b:

$$f(x) = \begin{cases} -0.0043 - 0.9787x & \text{if } -4 \le x \le -0.24 \\ 0.0899 + 0.9597x & \text{if } -0.24 < x < 2.12 \\ 1.8208 + 3.0608x & \text{if } 2.12 \le x \le 4 \end{cases}$$

As one can note, this is a good approximation of the unknown function (2). Errors can be detected only at the boundaries between two adjacent regions X_i; they are mainly due to the effect of mixed sets C_k on the classification process.
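For illustration only, the following script reuses the sketches given in Section 3 to reproduce the setup of this experiment (m = 100, σ² = 0.05, c = 6, s = 3). Being based on stand-in components and an arbitrary random seed, the coefficients it recovers will differ slightly from those reported above.

```python
import numpy as np

# Benchmark (2): three linear pieces on X = [-4, 4]
rng = np.random.default_rng(1)
x = rng.uniform(-4.0, 4.0, 100)
f = np.where(x <= 0, -x, np.where(x < 2, x, 2 + 3 * x))
y = f + rng.normal(0.0, np.sqrt(0.05), 100)       # noise variance 0.05

V = local_weight_vectors(x[:, None], y, c=6)      # Step 1, c = 6
labels = cluster_weight_vectors(V, s=3)           # Step 2, s known in advance
C, W = fit_regions_and_weights(x[:, None], y, labels, s=3)
print(W)   # each row estimates [w_i0, w_i1] of one linear piece
```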

References

[1] E. J. Bredensteiner and K. P. Bennett, Multicategory classification by support vector machines. Computational Optimization and Applications, 12 (1999) 53-79.

[2] V. Cherkassky and H. Lari-Najafi, Constrained topological mapping for nonparametric regression analysis. Neural Networks, 4 (1991) 27-40.

[3] C.-H. Choi and J. Y. Choi, Constructive neural networks with piecewise interpolation capabilities for function approximation. IEEE Transactions on Neural Networks, 5 (1994) 936-944.

[4] J. Y. Choi and J. A. Farrell, Nonlinear adaptive control using networks of piecewise linear approximators. IEEE Transactions on Neural Networks, 11 (2000) 390-401.

[5] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: John Wiley and Sons (1973).

[6] G. Ferrari-Trecate, M. Muselli, D. Liberati, and M. Morari, A clustering technique for the identification of piecewise affine systems. Accepted at the Fourth International Workshop on Hybrid Systems: Computation and Control, Roma, Italy, March 28-30, 2001.

[7] B. Fritzke, A growing neural gas network learns topologies. In Advances in Neural Information Processing Systems 7, Cambridge, MA: MIT Press (1995) 625-632.

[8] K. Nakayama, A. Hirano, and A. Kanbe, A structure trainable neural network with embedded gating units and its learning algorithm. In Proceedings of the International Joint Conference on Neural Networks, Como, Italy (2000) III-253-258.
