Approximating I/O Data Using Radial Basis Functions: A New Clustering-Based Approach

Mohammed Awad, Héctor Pomares, Luis Javier Herrera, Jesús González, Alberto Guillén, and Fernando Rojas

Dept. of Computer Architecture and Computer Technology, University of Granada, Granada, Spain
[email protected]
Abstract. In this paper, we deal with the problem of function approximation from a given set of input/output data. This problem consists of analyzing training examples so that we can predict the output of the model when presented with new inputs. We present a new method for function approximation of the I/O data using radial basis functions (RBFs). This approach is based on a new efficient method for clustering the centres of the RBF Network (RBFN); it uses the target output of the RBFN to move the clusters, instead of relying only on the input values of the I/O data. This clustering method, especially designed for function approximation problems, improves the performance of the resulting approximator system compared with models derived from traditional algorithms.
1 Introduction

Function approximation is the name given to a computational task that is of interest to many science and engineering communities [1]. Function approximation consists of synthesizing a complete model from samples of the function and its independent variables [2]. In supervised learning, the task is that of learning a mapping from one vector space to another, with the learning based on a set of instances of such mappings. We assume that a function F does exist, and we endeavour to synthesize a computational model of that function. As a general mathematical problem, function approximation has been studied for centuries. However, some knowledge of the function to be approximated is usually assumed, depending on the specific problem. For example, in pattern recognition, a function mapping is built whose objective is to assign each pattern in a feature space to a specific label in a class space. When one makes no assumptions about a model of the function to be approximated, mathematical theory can only provide interpolation techniques such as splines, Taylor expansions, Fourier series, etc. Under this assumption, we can also make use of the so-called model-free systems. These systems include neural networks and fuzzy systems, among others. Radial Basis Function Networks (RBFNs) can be seen as a particular class of Artificial Neural Networks (ANNs). They are characterized by a transfer function in the hidden-unit layer having radial symmetry with respect to a centre. The basic architecture of an RBFN is a 3-layer network, and the output of the net is given by the following expression:
F(\vec{x}, \Phi, \vec{w}) = \sum_{i=1}^{m} \phi_i(\vec{x}) \cdot w_i        (1)
where \Phi = \{\phi_i : i = 1, ..., m\} is the set of basis functions and w_i is the weight associated with each RBF. Each basis function \phi can be computed as a Gaussian function using the following expression:

\phi(\vec{x}, \vec{c}, r) = \exp\left( -\frac{\|\vec{x} - \vec{c}\|^2}{r^2} \right)        (2)

where \vec{c} is the central point of the function \phi and r is its radius. RBFNs are universal approximators and are thus well suited for function approximation problems. In general, an approximator is said to be universal if it can approximate any continuous function on a compact set to any desired degree of precision. Finding the suitable number of radial functions is a complex task, since we must be careful not to produce excessively large networks, which are inefficient, sensitive to over-fitting, and exhibit poor performance.

When the values of \vec{c} and r of the basis functions are known, it is possible to use a linear optimization method to find the values of w_i that minimize the cost function computed on the sample set. This method relies on the computation of the pseudoinverse matrix. Other methods proposed in the literature also try to optimize the centre values of the RBFs. For instance, in [3] Chen et al. propose an alternative learning procedure based on the orthogonal least-squares method. The procedure chooses radial basis function centres one by one in a rational way until an adequate network has been constructed; each selected centre maximizes the increment to the explained variance, or energy, of the desired output, and the procedure does not suffer from numerical ill-conditioning problems. Orr [4] selects the centres among the samples, choosing those basis functions that contribute most to explaining the output variance. Another solution to this problem is to cluster similar samples of the input data together. Every cluster has a centroid, which can then be chosen as the centre of a new RBF. We can find in the literature some unsupervised clustering algorithms, such as k-means [5], fuzzy c-means [6], and enhanced LBG [7], and also some supervised clustering algorithms, such as the Clustering for Function Approximation method (CFA) [8], the Conditional Fuzzy Clustering algorithm (CFC) [9], and the Alternating Cluster Estimation method (ACE) [10].

In this paper we present a new method for function approximation from a set of I/O data using radial basis functions (RBFs). This approach is based on a new efficient method for clustering the centres of the RBF Network; it uses the target output of the RBFN to migrate and fine-tune the clusters, instead of relying only on the input values of the I/O data. This clustering method, especially designed for function approximation problems, calculates the error committed in every cluster using the real output of the RBFN, trying to concentrate more clusters in those input regions where the approximation error is bigger, thus attempting to homogenize the contribution of every cluster to the error.

After this introduction, the rest of this paper is organized as follows. Section 2 presents an overview of the proposed algorithm. In Section 3, we present in detail the proposed algorithm for the determination of the pseudo-optimal RBF parameters. Then, in Section 4 we show some results that confirm the goodness of the proposed methodology. Some final conclusions are drawn in Section 5.
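As an illustration of Eqs. (1) and (2) and of the pseudoinverse-based weight computation described above, the following minimal sketch (in Python with NumPy; the function names are ours, not the paper's) evaluates the activation matrix of a Gaussian RBFN and solves for the weights in the least-squares sense:

```python
import numpy as np

def rbf_outputs(X, centres, radii):
    """Activation matrix of the Gaussian basis functions of Eq. (2).

    X: (P, n) input samples; centres: (m, n); radii: (m,).
    Returns Phi with Phi[i, j] = phi_j(x_i).
    """
    # Squared Euclidean distances between every sample and every centre.
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / radii[None, :] ** 2)

def solve_weights(Phi, y):
    """Least-squares weights of Eq. (1), computed via the pseudoinverse."""
    # np.linalg.lstsq obtains the minimum-norm least-squares solution
    # through an SVD factorization of Phi.
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w
```

Since the output of Eq. (1) is linear in the weights once the centres and radii are fixed, this single linear solve is exact, which is why the remaining difficulty of RBFN training lies in placing the centres.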
2 Overview of the Proposed Algorithm

As mentioned before, the problem of function approximation consists of synthesizing a complete model from samples of the function and its independent variables. Consider a function y = f(\vec{x}), where \vec{x} = (x_1, ..., x_n) is a vector in n-dimensional space, for which a set of input/output data pairs is available. The idea is to approximate these data with another function F(\vec{x}). The accuracy of the approximation is generally measured by a cost function which takes into account the error between the output of the RBFN and the target output. In this paper, the cost function we are going to use is the so-called Normalized Root Mean Squared Error (NRMSE). This performance index is defined as:

NRMSE = \sqrt{ \sum_{i=1}^{P} (y_i - F(\vec{x}_i))^2 / \sum_{i=1}^{P} (y_i - \bar{y})^2 }        (3)
where \bar{y} is the mean of the target output and P is the number of data points. The objective of our algorithm, which is inspired by the CFA algorithm, is to increase the density of clusters in the areas of the input domain where the target function is less accurately approximated, rather than just in the zones where there are more input examples, as most unsupervised clustering algorithms would do, or in the zones where more variability of the output is found, as is the case with CFA. The universal approximation property of RBFNs states that an optimal solution to the approximation problem, one that minimizes the NRMSE, can be found. In order to find the minimum of the error function, the RBFN must be completely specified by choosing the following parameters: the number m of radial basis functions, the centre \vec{c} of every RBF, the radius r, and the weights w. The number of RBFs is a critical choice. In our algorithm we have used a simple incremental method to determine the number of RBFs: we stop adding new RBFs when the approximation error falls below a certain target error, in our case NRMSE_TARGET = 0.1. As to the rest of the parameters of the RBFN, in Section 3 we present a new clustering technique especially suited for function approximation problems. The basic idea we have developed is to calculate the error committed in every cluster, using the real output of the RBFN to compute the error for each training datum belonging to the cluster, and to concentrate more clusters in those input regions where the cluster error is bigger. Fig. 1 presents a flow chart with the general description of the complete incremental algorithm.
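A rough sketch of this incremental scheme follows. Here nrmse implements Eq. (3), rbf_outputs is taken from the sketch in Section 1, and fit_rbfn is a hypothetical placeholder standing for the complete parameter-adjustment pipeline of Section 3 (clustering, knn radii, SVD weights):

```python
def nrmse(y_true, y_pred):
    """Normalized root mean squared error, Eq. (3)."""
    return np.sqrt(((y_true - y_pred) ** 2).sum()
                   / ((y_true - y_true.mean()) ** 2).sum())

def incremental_fit(X, y, target=0.1, max_rbfs=50):
    """Add one RBF at a time until the training NRMSE falls below `target`.

    `fit_rbfn` is a hypothetical helper, not defined in the paper: it
    stands for the whole inner loop (clustering of the centres, knn
    radius initialization, SVD weight computation).
    """
    for m in range(1, max_rbfs + 1):
        centres, radii, w = fit_rbfn(X, y, m)   # hypothetical helper
        err = nrmse(y, rbf_outputs(X, centres, radii) @ w)
        if err < target:
            break
    return centres, radii, w, err
```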
3 Parameter Adjustment of the RBF Network

The locality property inherent to radial basis functions allows us to use a clustering algorithm to obtain the RBF centres. Clustering algorithms, however, may get trapped in a local minimum of the distortion, ignoring a better placement for some of the clusters. For this reason we need a clustering algorithm capable of escaping such local minima, and to this end we endow our supervised algorithm with a migration technique. This modification allows the algorithm to escape from local minima and to obtain a prototype allocation independent of the initial configuration. To optimize the other parameters of the RBFN (the radius r and the weights w), we use well-known heuristics, such as the k-nearest-neighbour technique (knn) for the initialization of the radius of each RBF (a sketch of this initialization is given below), and conventional techniques, such as singular value decomposition (SVD), to directly optimize the weights. Finally, local minimization routines, such as the Levenberg-Marquardt algorithm, are used to fine-tune the obtained RBFN. Therefore, in this section we concentrate on the proposed clustering algorithm. In Fig. 2, we show a flowchart with the general description of our clustering algorithm.
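A minimal sketch of the knn radius initialization follows; several variants of this heuristic exist, and the exact rule used by the authors (including the value of k) is an assumption here:

```python
def knn_radii(centres, k=2):
    """Initialize each RBF radius as the mean distance from its centre to
    the k nearest neighbouring centres -- one common variant of the knn
    heuristic; the paper does not spell out the exact rule."""
    # Pairwise Euclidean distances between all centres.
    d = np.sqrt(((centres[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2))
    radii = np.empty(len(centres))
    for j in range(len(centres)):
        radii[j] = np.sort(d[j])[1:k + 1].mean()   # skip the zero self-distance
    return radii
```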
Fig. 1. General flow chart of the complete incremental algorithm

Fig. 2. Flow chart of the proposed clustering algorithm

Fig. 3. Local Displacement of the Clusters

Fig. 4. The Migration Process
The local displacement of the clusters is carried out by an iterative process that updates each cluster centre as the error-weighted mean of the training data belonging to that cluster; this process is repeated until the total distortion of the net reaches a minimum:

\vec{c}_m = \sum_{\vec{x}_i \in C_m} E_i^m \cdot \vec{x}_i / \sum_{\vec{x}_i \in C_m} E_i^m        (6)

where E_i^m is the error committed by the current RBFN on the training point \vec{x}_i belonging to cluster C_m. The migration process migrates clusters from the better approximated zones toward those zones where the approximation error is worse, thus attempting to equalize their contributions to the total distortion. Our main hypothesis is that the best initial cluster configuration is the one that equalizes the approximation error committed by every cluster. To avoid local minima, the migration process uses a pseudorandom selection of the cluster to migrate, with the probability of choosing a given cluster inversely proportional to what we call the "utility" of that cluster, defined as

U_j = D_j / \bar{D},   j = 1, ..., m        (7)

where D_j is the distortion of cluster j and \bar{D} is the mean distortion over all clusters. In this way, the proposed algorithm selects one cluster whose utility is less than one and moves it to the zone near another selected cluster whose utility is greater than one (see Fig. 4). This migration step is necessary because the local displacement of the clusters only moves them in a local manner. It should be noted that using the k-means algorithm to divide the data belonging to the zone that receives the new cluster is less complex and needs less execution time than other migration algorithms, such as those of ELBG and CFA.
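These two steps can be sketched as follows (Python/NumPy): displace_centres implements the update of Eq. (6), and select_migration the utility-based choice of Eq. (7). The exact selection rule for the destination cluster is our assumption, since the text only requires its utility to exceed one:

```python
def displace_centres(X, errors, labels, m):
    """Error-weighted mean of Eq. (6): each centre moves toward the points
    of its cluster, weighted by the current RBFN error on each point."""
    centres = np.empty((m, X.shape[1]))
    for j in range(m):
        e = errors[labels == j]                 # E_i^m for cluster j
        centres[j] = (e[:, None] * X[labels == j]).sum(axis=0) / e.sum()
    return centres

def select_migration(distortions, rng):
    """Roulette-wheel choice of a source cluster with U_j < 1 (probability
    inversely proportional to U_j) and a destination cluster with U_j > 1
    (probability proportional to U_j -- our assumption). Assumes both
    sets are non-empty, i.e. the distortions are not all equal."""
    U = distortions / distortions.mean()        # utilities, Eq. (7)
    low, high = np.where(U < 1)[0], np.where(U > 1)[0]
    p_low = (1.0 / U[low]) / (1.0 / U[low]).sum()
    p_high = U[high] / U[high].sum()
    return rng.choice(low, p=p_low), rng.choice(high, p=p_high)
```

After a migration, the training data of the receiving zone would be repartitioned (e.g., with k-means, as the flow chart of Fig. 4 indicates) and the move confirmed only if the total distortion improves.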
4 Example of the Proposed Procedure

Let us consider the function [8]

f_1(x) = \sin(2\pi x) / e^x,   x \in [0, 10]        (8)
This function has been chosen to demonstrate the importance of equidistributing the approximation errors throughout the clusters: as shown in Fig. 5a, it has a highly variable output when the input x is near zero. To test the effect of the proposed algorithm on the initialization of the clusters and its ability to avoid local minima in their placement, a training set of 2000 samples of the function was generated by evaluating inputs taken uniformly from the interval [0, 10], from which we removed 1000 points for validation. The results of the proposed algorithm, compared with those of the CFA algorithm (which proved to be the best algorithm for this example in [8], with µmin = 0.001), are presented in Table 1. In the table, NRMSEC is the NRMSE of the training data after the clustering process has concluded, and NRMSET is the final error index (for 10000 test data) obtained after the application of the Levenberg-Marquardt method. It must be noted that both clustering algorithms were designed to provide an initial RBF configuration to be subsequently optimized using a local optimization method in order to find the global minimum. Std is the standard deviation of the error indices over 5 executions of each algorithm. Finally, TimeC is the clustering process execution time (in seconds). As can be seen from the table, the proposed algorithm reaches better approximations using less time than the CFA algorithm in all cases.
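Assuming the inputs were drawn uniformly at random ("taken uniformly" could equally mean a regular grid; this is an assumption), the data set for this experiment could be generated as:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=2000)     # 2000 inputs taken uniformly from [0, 10]
y = np.sin(2 * np.pi * x) / np.exp(x)     # target function of Eq. (8)
x_train, y_train = x[:1000], y[:1000]     # 1000 points kept for training
x_val, y_val = x[1000:], y[1000:]         # 1000 points removed for validation
```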
Fig. 5. a) Objective function. b) Approximation with 6 RBFs
Table 1. Comparison between CFA and the proposed approach

               Proposed Approach                        CFA
 m   NRMSEC  Std    NRMSET  Std    TimeC    NRMSEC  Std    NRMSET  Std    TimeC
 2   1.0     0.0    0.77    1E-1   0.27     1.01    2E-3   1.45    2E-8   8.1
 3   0.99    1E-6   0.67    1E-1   0.38     1.00    6E-4   0.70    3E-2   8.2
 4   1.00    1E-4   0.17    6E-2   1.11     1.00    3E-3   0.18    6E-2   13.5
 5   0.90    7E-5   0.15    8E-2   1.17     0.98    1E-2   0.23    5E-3   30.4
 6   0.88    7E-3   0.07    9E-3   1.39     0.96    8E-3   0.09    1E-3   50.2
As an example of the learning process, Fig. 6a shows the initial distortion distribution for the case of 6 equally distributed RBFs, which is the first configuration whose approximation error falls under the target error. Fig. 6b represents the same information once the clustering process has ended. Here we can see the advantage obtained from making each cluster contribute equally to the total distortion, which is the objective of the proposed clustering algorithm. Finally, Fig. 5b shows the approximation obtained by the net using 6 RBFs. We can see that the net is capable of producing a practically perfect approximation.
Fig. 6. a) The distortion before the migration. b) The distortion after the migration
5 Conclusions

In this paper we have proposed a clustering algorithm especially suited for function approximation problems. This method calculates the error committed in every cluster using the real output of the RBFN, and not just an approximate value of that output, trying to concentrate more clusters in those input regions where the approximation error is bigger, thus attempting to homogenize the contribution of every cluster to the error. This algorithm is easy to implement and is superior in both performance and computation time to other algorithms such as the CFA method. We have also shown how this algorithm can be used to find the minimal number of RBFs that satisfies a given error target for a given function approximation problem.
Acknowledgements

This work has been partially supported by the Spanish CICYT Project TIN2004-01419.
References

1. Pomares, H., Rojas, I., Ortega, J., González, J., Prieto, A.: "A Systematic Approach to a Self-Generating Fuzzy Rule-Table for Function Approximation", IEEE Trans. Syst., Man, Cybern., Part B, 30(3) (2000) 431-447.
2. Higgins, Ch.: Classification and Approximation with Rule-Based Networks. Ph.D. thesis (1993), 33-36.
3. Chen, S., Cowan, C. F. N., Grant, P. M.: "Orthogonal least squares learning algorithm for radial basis function networks", IEEE Trans. Neural Networks, 2(2) (1991) 302-309.
4. Orr, M. J. L.: "Regularization in the selection of radial basis function centers", Neural Computation, 7(3) (1995) 606-623.
5. Duda, R. O., Hart, P. E.: Pattern Classification and Scene Analysis. Wiley, New York (1973).
6. Bezdek, J. C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New York (1981).
7. Russo, M., Patanè, G.: "Improving the LBG Algorithm", Lecture Notes in Computer Science, vol. 1606. Springer-Verlag, New York (1999) 621-630.
8. González, J., Rojas, I., Pomares, H., Ortega, J., Prieto, A.: "A new clustering technique for function approximation", IEEE Trans. Neural Networks, 13(1) (2002) 132-142.
9. Pedrycz, W.: "Conditional fuzzy c-means", Pattern Recognition Lett., 17 (1996) 625-632.
10. Runkler, T. A., Bezdek, J. C.: "Alternating cluster estimation: A new tool for clustering and function approximation", IEEE Trans. Fuzzy Syst., 7 (1999) 377-393.