Helsinki University of Technology Control Engineering Laboratory Espoo 2001
Report 124
CLUSTERING-BASED ALGORITHMS FOR RADIAL BASIS FUNCTION AND SIGMOID PERCEPTRON NETWORKS Zekeriya Uykan
TEKNILLINEN KORKEAKOULU TEKNISKA HÖGSKOLAN HELSINKI UNIVERSITY OF TECHNOLOGY TECHNISCHE UNIVERSITÄT HELSINKI UNIVERSITÉ DE TECHNOLOGIE D'HELSINKI
Helsinki University of Technology Control Engineering Laboratory Espoo June 2001
Report 124
CLUSTERING-BASED ALGORITHMS FOR RADIAL BASIS FUNCTION AND SIGMOID PERCEPTRON NETWORKS Zekeriya Uykan Dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the Department of Automation and Systems Engineering, for public examination and debate in Auditorium S4 at Helsinki University of Technology (Espoo, Finland) on the 27th of June, 2001, at 12 noon.
Helsinki University of Technology Department of Automation and Systems Technology Control Engineering Laboratory
Distribution: Helsinki University of Technology Control Engineering Laboratory P.O. Box 5400 FIN-02015 HUT, Finland Tel. +358-9-451 5201 Fax. +358-9-451 5208 E-mail:
[email protected]
ISBN 951-22-5529-4 ISSN 0356-0872
Picaset Oy Helsinki 2001
Abstract

This thesis is concerned with learning methods for the Radial Basis Function Network (RBFN) and the standard single-hidden-layer Sigmoid Perceptron Network (SPN). The main research question is to develop and analyse new learning methods for the design of RBFN and SPN. In the thesis, which is based on seven publications, three learning methods for RBFN, two for SPN, and one (SPN-based) power control algorithm for cellular radio systems are developed, analysed and compared with corresponding algorithms in the literature.

The key point in the design of an RBFN is to specify the number and the locations of the centers. All the proposed algorithms, which will be called Input Output Clustering (IOC), Augmented-Input-Layer (AIL) RBFN and Clustering of Regressors (CR), are based on a clustering process applied to either the input-output samples or the outputs of the hidden neurons for determining the centers of the RBFN. The idea of concatenating the output vector to the input vector in the clustering process has independently been proposed by several authors in the literature, although none of them presented an analysis of such hybrid methods but rather demonstrated their effectiveness in several applications.

The most challenging problem of this thesis is to construct a new upper bound on the output error of RBFN and SPN in terms of input-output quantization errors: a new approach for investigating the relationship between the clustering process on input-output training samples and the mean squared output error is presented. The main result is: i) a weighted mean squared input-output quantization error, which is minimized by IOC as well as by Input Clustering (IC), gives an upper bound on the mean squared output error, and ii) this upper bound, and consequently the output error, can be made arbitrarily small (zero in the limit case) by increasing the number of hidden neurons, i.e., by decreasing the quantization error.

Gradient-descent-type supervised learning algorithms are the most commonly used ones for the design of the standard SPN. In this thesis, unsupervised learning is proposed for determining the SPN input-layer (synaptic) weights. The proposed hierarchical methods minimize an upper bound on the SPN output error by a clustering algorithm (IC or IOC) for determining the input-layer (synaptic) weights, in contrast to supervised learning algorithms, which minimize the output error itself. The simulation results indicate that the proposed hierarchical methods yield performance comparable to the RBFN case.

The last part of the thesis is concerned with power control (PC) in cellular radio systems. Simple RBFN and SPN models are applied to the prediction of Rayleigh fading. In the thesis, the PC problem is viewed as a controller design problem. The proposed algorithm, which will be called SPN PC, is a first-order, fully distributed (nonlinear) PC algorithm. The analysis can readily be applied to other nonlinear (e.g. RBFN) or linear (e.g. PI) controller cases, each of which would yield new fully distributed PC algorithms.
To my beloved ones, my mother and father, Hayriye and Sayit Uykan. Mojoj majki i mome babu. (To my mother and my father.)
Preface

The work for this thesis was carried out at the Control Engineering Laboratory of Helsinki University of Technology from September 1996 to April 2001.

I would like to express my sincere gratitude to my supervisor, Professor Heikki N. Koivo, for his guidance, support, encouragement, and positive attitude throughout my research. I am also grateful to Professor Cuneyt Guzelis for his guidance during my time at Istanbul Technical University, Turkey, and for numerous fruitful discussions. I also thank the whole personnel of the Control Engineering Laboratory for the creative and scientific atmosphere.

The financial support from the Centre for International Mobility (CIMO), Finland, the Finnish Academy Graduate School on Electronics, Telecommunication and Automation (GETA), Imatran Voiman Säätiö, the Sonera Foundation and the Elisa Foundation is gratefully acknowledged.

Finally, but most importantly, my warmest thanks go to my beloved ones, my mother Hayriye and my father Sayit Uykan, to whom this thesis is dedicated.
April 27, 2001, Espoo.
Zekeriya Uykan
List of Publications
This thesis is based on the following seven publications:
publication 1:
Z. Uykan and C. Guzelis, Input-Output Clustering for Determining the Centers of Radial Basis Function Network, ECCTD-97, vol. 2, pp. 435-439, Budapest, Hungary, 1997.
publication 2:
Z. Uykan, C. Guzelis, M.E. Celebi and H.N. Koivo, Analysis of Input-Output Clustering for Determining Centers of Radial Basis Function Networks, IEEE Trans. on Neural Networks, vol. 11, no. 4, pp. 851-858, July 2000.
publication 3:
Z. Uykan and H.N. Koivo, Upper Bounds on RBFN Designed by Input Clustering, Proc. of IEEE-ACC (American Control Conference)-2000, vol.2, pp. 1440-1444, Chicago, June 2000. (extended version submitted to IEEE Trans. on Neural Networks).
publication 4:
Z. Uykan and H.N. Koivo, Augmented-Input-Layer Radial Basis Function Networks, ICSC/IFAC Symposium on Neural Computation / NC'98, pp. 989-994, Vienna, Austria, 1998.
publication 5:
Z. Uykan and H.N. Koivo, Clustering of Regressors for Constructing Radial Basis Function Networks, WMSCI'98 (World Multiconference on Systemics, Cybernetics and Informatics), pp. 741-748, Orlando, Florida, July 1998.
publication 6:
Z. Uykan and H.N. Koivo, Unsupervised Learning of Sigmoid Perceptron, Proc. of IEEE-ICASSP2000 (International Conference on Acoustics, Speech, and Signal Processing), vol.6, pp. 3486 - 3489, Istanbul, Turkey, June 2000.
publication 7:
Z. Uykan and H.N. Koivo, A Sigmoid-Basis Nonlinear Power Control Algorithm for Mobile Radio Systems, IEEE-VTC2000 (Vehicular Technology Conference), vol. 4, pp. 1556-1560, Boston, USA, 2000. (Submitted to IEEE Trans. on Vehicular Technology.)
List of Mathematical Notations

x_s        s'th input vector
d_s        s'th (desired) output vector
F(·)       output of RBFN or SPN
φ(·), σ(·) activation function of RBFN or SPN
λ          linear output weight vector of RBFN or SPN network
C          matrix whose columns are centers of RBFN
W          matrix whose columns are input (synaptic) weights of SPN
β          smoothing factor of the sigmoid function
L          number of training vectors
p          dimension of input vector
q          dimension of output vector
N          number of neurons
Z          size of look-up table in AIL RBFN
m_j        cluster (codebook) vector of the j'th cluster in input/output space
γ          scaling factor in IOC
k, K       Lipschitz constant
r_i        regressor vector in RBFN model corresponding to the i'th neuron
R          regressor matrix in RBFN regression model
e          error vector in RBFN model
d          desired output vector
q          quantization error vector
M          matrix whose columns are the cluster (codebook) vectors in input-output space
M_x        matrix whose columns are the cluster (codebook) vectors in input space
D          quantization error
E          output error function of RBFN or SPN
p_i        transmit power of mobile i
p          transmit power vector
g_ij       link gain from transmitter j to receiver i
s_ij       shadow fading term from transmitter j to receiver i
l_ij       propagation loss from transmitter j to receiver i
G          link gain matrix
H          normalized link gain matrix
ρ(H)       spectral radius of matrix H
ω          eigenvalue
M          number of mobiles sharing the same channel
α          free parameter of SPN algorithm
List of Abbreviations

RBFN       Radial Basis Function Neural Network
SPN        Sigmoid Perceptron Network
OLS        Orthogonal Least Squares
AIL RBFN   Augmented-Input-Layer RBFN
CR         Clustering of Regressors
IOC        Input Output Clustering
IC         Input Clustering
RS         Random Selection
VQ         Vector Quantization
LMS        Least Mean Squares
OE         Output Error
UB         Upper Bound
PC         Power Control
SgmPC      Sigmoid Power Control
SPN PC     SPN (Sigmoid Perceptron Network) Power Control
C-SPN PC   Constrained SPN (Sigmoid Perceptron Network) Power Control
CDMA       Code Division Multiple Access
DCPC       Distributed Constrained Power Control
CIR        Carrier-to-Interference+noise Ratio
Contents

1 Introduction
2 Radial Basis Function Neural Network
  2.1 Radial Basis Function Network
  2.2 Learning Algorithms for RBFN
3 Clustering-Based Algorithms for RBFN
  3.1 IOC for Center Determination of RBFN
  3.2 Analysis of IOC
    3.2.1 Computer Simulation Results
    3.2.2 Conclusions
  3.3 Upper Bounds on RBFN Designed by IC
    3.3.1 Computer Simulation Results
    3.3.2 Conclusions
  3.4 Augmented-Input-Layer (AIL) RBFN
    3.4.1 Upper Bound on AIL RBFN output error
    3.4.2 Computer Simulation Results
    3.4.3 Conclusions
  3.5 Clustering of Regressors (CR)
    3.5.1 The effect of CR on the output error
    3.5.2 Computer Simulations
    3.5.3 Conclusions
4 Hierarchical Learning of SPN
  4.1 IOC for design of SPN
    4.1.1 Upper Bound on SPN Output Error
    4.1.2 Computer Simulation Results
    4.1.3 Conclusions
  4.2 IC for design of SPN
    4.2.1 Upper Bound on SPN by IC
    4.2.2 Computer Simulation Results
    4.2.3 Conclusions
5 SPN Power Control Algorithm
  5.1 Introduction
  5.2 Power Control Problem
  5.3 SPN PC Algorithm
  5.4 Concluding Remarks
6 Conclusions
  6.1 Topics for Future Research
7 Summary of the publications
  7.1 Author's contributions to the publications
Chapter 1

Introduction

Artificial Neural Networks, commonly referred to as "Neural Networks", have been the focus of a number of different disciplines, such as neuroscience, mathematics, electrical and computer engineering, and psychology. Accordingly, they have numerous definitions in the literature. Kohonen defines neural networks as follows [35]: "Artificial Neural Networks are massively parallel interconnected networks of simple (usually adaptive) elements and their hierarchical organizations which are intended to interact with the objects of real world in the same way as the biological nervous systems do." Almost 50 different neural network architectures have been developed in the literature, although only a part of them is in common use [37]. The Radial Basis Function neural network and the standard single-hidden-layer Sigmoid Perceptron network are two of the commonly used ones (e.g. [59]), and they are the focus of this thesis.

The thesis is concerned with learning methods for the Radial Basis Function Network (RBFN) and the standard single-hidden-layer Sigmoid Perceptron Network (SPN). The main research question is to develop and analyse new learning methods for the design of RBFN and SPN. The thesis is based on the seven publications given on page iv, in which three learning methods for RBFN, two for SPN, and two (fully distributed) power control algorithms for cellular radio systems are developed, analysed and compared with corresponding algorithms in the literature.

The proposed algorithms for the design of RBFN, which will be called Input-Output Clustering (IOC), Augmented-Input-Layer (AIL) RBFN and Clustering of Regressors (CR), are based on a clustering process applied to either input-output samples or outputs of hidden neurons for determining the centers of the RBFN. In the IOC and AIL methods, a clustering method is applied to augmented vectors which are obtained by concatenating the weighted input vector to the output vector. Unlike the AIL method, IOC projects the augmented codebook vectors into the input space (after rescaling the input part). AIL RBFN augments the input of the RBFN with the desired output values in the training phase and with a computed (a priori) average in the generalization phase. CR in publication 5, on the other hand, applies clustering to the outputs of the hidden neurons (i.e., regressors) instead of the input-output vectors.
The idea of concatenating the output vector to the input vector in the clustering process has independently been proposed by several authors, e.g. [34], [6], [71], publication 1, [51], although none of them presented an analysis of such hybrid methods but rather demonstrated their effectiveness in several applications. The most challenging problem of this thesis is to construct a new upper bound on the output error of RBFN and SPN in terms of input-output quantization errors: a new approach for investigating the relationship between the clustering process on input-output training samples and the mean squared output error is presented. The main result is: i) a weighted mean squared input-output quantization error, which is minimized by Input Output Clustering (IOC) as well as by Input Clustering (IC), gives an upper bound on the mean squared output error, and ii) this upper bound, and consequently the output error, can be made arbitrarily small (zero in the limit case) by increasing the number of hidden neurons, i.e., by decreasing the quantization error. The analysis is carried out for the IOC, IC and AIL methods. For the analysis of CR, on the other hand, the error vector of the RBFN regression model is obtained as a linear combination of the quantization errors in the regressor space.

The standard single-hidden-layer Sigmoid Perceptron Network (SPN) has been the focus of a large number of studies in the literature. Standard back-propagation (gradient-descent) type supervised learning algorithms minimizing the output error are the most commonly used ones (e.g. [28]) for the design of SPN. However, one of their disadvantages is that learning is generally slow (e.g. [15]). Apart from the long computation time, this method suffers from the local-minima problem when applied to a nonconvex error function, thus providing suboptimal solutions (see e.g. [15], [28], [27]). In the context of the standard SPN, unsupervised learning is proposed in this thesis for determining the input-layer (synaptic) weights, in contrast to traditional gradient-descent-type supervised learning. The proposed hierarchical methods minimize an upper bound on the SPN output error by a clustering algorithm (IC or IOC) for determining the input-layer (synaptic) weights, in contrast to supervised learning algorithms minimizing the output error itself. The simulation results concerning the SPN show that i) the proposed hierarchical learning of SPN yields performance comparable to the RBFN case, and ii) the upper bounds minimized during the clustering are relatively tight with respect to the SPN output error function.

The derivations concerning the above-mentioned upper bound analysis are made for RBFN and SPN in this thesis. Nevertheless, the analysis can readily be applied to a single-hidden-layer network with any bounded nonlinear activation function which is Lipschitz continuous and has a limit at one infinity [30] (e.g. the ramp function). A function $\sigma(\cdot): \mathbb{R} \to \mathbb{R}$ is Lipschitz continuous if there exists $k > 0$ such that $|\sigma(a) - \sigma(b)| \le k |a - b|$ for all $a, b \in \mathbb{R}$ (see e.g. [11]). The fact that the upper bound can be made arbitrarily small (zero in the limit case) eventually means that RBFN and SPN designed by IOC as well as by IC are universal approximators.
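As a quick numerical illustration of the Lipschitz condition just stated, the short Python sketch below estimates the constant $k$ for a bounded shifted sigmoid; the particular nonlinearity and the value of $\beta$ are arbitrary illustrative choices, not taken from the thesis.

```python
import numpy as np

beta = 2.0                                     # illustrative smoothing factor
a = np.linspace(-10.0, 10.0, 2001)
sigma = 0.5 - 1.0 / (1.0 + np.exp(-beta * a))  # bounded, Lipschitz, limits at +/- infinity

# The largest slope between grid points approximates the smallest Lipschitz
# constant k; for this sigmoid the slope is bounded by beta / 4.
k_est = np.max(np.abs(np.diff(sigma) / np.diff(a)))
print(k_est, "<=", beta / 4.0)
```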
The last part of the thesis is concerned with power control in cellular radio systems. The PC problem has drawn much attention since Zander's works on centralized [80] and distributed [79] CIR balancing in the early 1990s, and the number of papers in this direction has recently been growing remarkably. In this thesis, the same problem is viewed as a controller design problem, which results in a framework suitable for developing and analysing new (fully distributed) PC algorithms. The algorithms, which will be called the sigmoid and SPN PC algorithms, are proposed and analysed in this thesis. The analysis can readily be applied to other nonlinear (e.g. RBFN) or linear (e.g. PI) controller cases, each of which would yield new fully distributed PC algorithms. Computer simulation results indicate that the sigmoid PC algorithms significantly enhance the convergence speed of PC compared with the linear distributed constrained power control [24] as a reference algorithm.

Note that some of the results presented in the above-mentioned publications have been studied further, and these studies, together with new simulation results, are presented throughout the chapters. For example, the upper bound analysis in publication 3 has been applied to the SPN case and the results are presented in Section 4.2. Therefore, all the topics are summarized again throughout Chapters 3 to 5 with these further studies, using the same notation for all. Accordingly, the notation used in Chapters 3 to 5 may differ slightly from that in the corresponding publications.

The rest of the thesis is organized as follows: Chapter 2 summarizes the structure of the RBFN and the learning algorithms used in the literature for the design of RBFN. The clustering-based algorithms for RBFN (which will be called IOC, AIL and CR), together with their analysis, are presented in Chapter 3. Chapter 4 introduces the clustering-based algorithms, basically IC and IOC, for learning the input (synaptic) weights of the standard SPN, together with their upper bound calculations. Chapter 5 is concerned with power control for cellular radio systems, in which the so-called SPN power control algorithm is presented. The main conclusions of the thesis are given in Chapter 6 and the publications are summarized in Chapter 7. The seven publications follow the chapters.
Chapter 2

Radial Basis Function Neural Network

In this chapter, the structure of the RBFN and the learning algorithms used in the literature for the design of RBFN are summarized.
2.1 Radial Basis Function Network

The Radial Basis Function neural Network (RBFN), with its simple locally tunable structure, is a good alternative to the sigmoidal multilayer perceptron. The linear output layer and radial basis hidden layer structure of the RBFN make it possible to learn the connection weights efficiently, without the local-minima problem, in a hierarchical procedure in which the linear weights are learned after the centers have been determined by a clustering process.

The radial basis function method was traditionally used for strict interpolation in multidimensional space by Powell [53] and Micchelli [43], requiring as many centers as data points (assigning all input data points as centers). Broomhead and Lowe [3] then removed the "strict" restriction and used fewer centers than data samples, thus allowing many practical RBFN applications in which the number of data samples is very large. Today the RBFN is a focus of studies not only in numerical analysis but also in the neural networks area.

The RBFN has also been of interest in Neuro-Fuzzy systems in the last decade, where the aim is to combine the advantages of neural networks and fuzzy systems. It is shown in [32] that the functional behaviour of the RBFN is, under some restrictions, equal to that of a fuzzy system. This functional equivalence makes it possible to apply what has been discovered (learning rules, representational power, etc.) for one of the methods to the other and vice versa [73], [68].

Using the Gaussian kernel function, the RBFN is capable of forming an arbitrarily close approximation to any continuous function on a compact set [26], [39]. Chen and Chen [10] presented a general result on approximating nonlinear functionals
and operators by the RBFN using sample data either in the frequency or in the time domain.

Figure 2.1: An RBFN with one output.

The construction of an RBFN involves three different layers: an input layer which consists of source nodes; a hidden layer in which each neuron computes its output using a radial basis function; and an output layer which builds a linear weighted sum of the hidden neuron outputs to supply the response of the network. An RBFN with one output neuron implements the input-output relation in (2.1), which is indeed a composition of the nonlinear mapping realized by the hidden layer and the linear mapping realized by the output layer:
$$F(\lambda, C, x) = \sum_{j=1}^{N} \lambda_j \, \phi(\|x - c_j\|^2) \qquad (2.1)$$

where $\phi(a) = e^{-a}$ is the radial basis function, $N$ is the number of hidden neurons, and $x \in \mathbb{R}^p$ is the input vector.
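To make (2.1) concrete, the following minimal Python sketch implements the RBFN forward pass with the Gaussian kernel above; the function and variable names and the toy dimensions are illustrative assumptions, not part of the thesis.

```python
import numpy as np

def rbfn_forward(x, centers, lam):
    """RBFN output per (2.1): F(lam, C, x) = sum_j lam_j * exp(-||x - c_j||^2).

    x       : (p,)   input vector
    centers : (p, N) matrix C whose columns are the centers c_j
    lam     : (N,)   linear output weights lambda_j
    """
    sq_dist = np.sum((centers - x[:, None]) ** 2, axis=0)  # ||x - c_j||^2
    phi = np.exp(-sq_dist)        # Gaussian radial basis, phi(a) = e^{-a}
    return phi @ lam              # linear output layer

# Toy usage: p = 2 inputs, N = 3 hidden neurons (illustrative values)
C = np.array([[0.0, 1.0, -1.0],
              [0.0, 1.0,  1.0]])  # columns are the centers
lam = np.array([0.5, -0.2, 0.8])
print(rbfn_forward(np.array([0.2, -0.1]), C, lam))
```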
3.5 Clustering of Regressors (CR)

In the CR method, a clustering algorithm is applied to the regressor vectors in (3.26) and the regressors nearest to the cluster codebook vectors are selected, whereas in OLS the regressors are orthogonalized at each step to find the one which gives the biggest contribution to the output energy.
3.5.1 The effect of CR on the output error

The output error function is the same as in the previous chapters, i.e., the average sum of output errors over a finite set of $L$ training patterns:
$$E(\lambda, C) = \frac{1}{L} \sum_{s=1}^{L} \left\| d_s - F(\lambda, C, x_s) \right\|_2^2 \qquad (3.28)$$

where $\lambda = [\lambda_1 \cdots \lambda_L]^T$ is the linear weight vector, $C = [c_1, \cdots, c_N]$ is a matrix whose columns are the centers of the RBFN, $\{x_s, d_s\}_{s=1}^{L}$ is the set of input-(desired) output pairs, and $F(\cdot)$ is the RBFN of (2.1). For determining the centers of the RBFN, a clustering algorithm is applied to the regressors in (3.26), and the error vector of the RBFN regression model is finally obtained as a linear combination of the quantization error vectors produced by the clustering algorithm, as shown in (3.29). For details, see publication 5.

$$e = \begin{bmatrix} q_1 & \ldots & q_L \end{bmatrix} \begin{bmatrix} \tilde{\lambda}_1 \\ \vdots \\ \tilde{\lambda}_L \end{bmatrix} = \tilde{\lambda}_1 q_1 + \cdots + \tilde{\lambda}_L q_L \qquad (3.29)$$

where $\{q_i\}_{i=1}^{L}$ are the quantization error vectors obtained by the VQ [36] and $\{\tilde{\lambda}_i\}_{i=1}^{L}$ are the corresponding linear weights.

Figure 3.11: Regressors and the codebook vector for the cluster $j(i)$. $t_{j(i)}$ is the codebook vector of cluster $j(i)$; $r_l$, $r_i$, $r_k$ are the regressors, $r_{j(i)}$ is the regressor nearest to the codebook vector in the sense of the Euclidean norm, and $q_i$, $q_k$ are the quantization error vectors for $r_i$ and $r_k$, respectively.

It can be shown that the squared $l_2$ norm of the error vector is equal to the output error function in (3.28) up to a multiplication, i.e.,

$$E(\lambda, C) = \frac{1}{L} e^T e = \frac{1}{L} \sum_{k=1}^{L} \sum_{l=1}^{L} \tilde{\lambda}_k \tilde{\lambda}_l \, q_k^T q_l \qquad (3.30)$$

In fact, (3.29) and (3.30) do not show that smaller quantization error vectors necessarily give a smaller error vector, because the optimal linear weights depend on the desired outputs too. Nevertheless, the assumption that achieving a good clustering (i.e., obtaining small quantization error vectors) yields a small error vector is not far from reality. The regressors within a cluster are quite similar to
each other (e.g., see Fig. 3 in publication 5), and all are represented by the regressor nearest to the codebook vector. Increasing the number of clusters decreases the quantization errors and hence decreases the output error function. Taking as many clusters as there are regressors would give zero quantization error and hence zero output error. On the other hand, if a sufficient number of clusters in the regressor space cannot be obtained, it is proposed to produce new regressors (for example, by changing the variance of the Gaussians or by taking new centers different from the input samples). A clustering algorithm is then applied to all the regressors, in which case the number of regressors is much higher than the dimension. As before, the regressors nearest to the codebook vectors are selected, although they may no longer be linearly independent. After removing the linearly dependent selected regressors, an analysis similar to that in Section 3.5.1 can be carried out again. Nevertheless, in the simulations in publication 5, there was no need to produce new regressors.

Table 3.5: Average prediction output square errors during the training and generalization phases using the RBFN designed by the CR method, as compared to Random Selection (RS) of centers, for different numbers of neurons (N). Mobile speed is 30 km/h.

         Training             Generalization
  N      RS       CR         RS       CR
  2    1.6146   1.3629     1.8567   1.5562
  3    0.2770   0.0672     0.3692   0.0965
  4    0.0545   0.0378     0.0978   0.0650
  5    0.0344   0.0375     0.0573   0.0619
  6    0.0234   0.0212     0.0409   0.0436
3.5.2 Computer Simulations
In publication 5, two examples were presented: function approximation under noisy conditions and system identification, compared with the OLS method. In the following, the performance of the CR method is compared with that of the Random Selection (RS) method of centers in the Rayleigh [31] fading prediction of Example 3 in Section 3.2.1. For a description of the problem, see Example 3 in Section 3.2.1; the only difference is that here the applied algorithms are CR and RS. Table 3.5 presents the average prediction output square errors in the received Rayleigh fading signal during the training and generalization phases using the RBFN designed by CR with different numbers of neurons (N), as compared to Random Selection (RS) of centers (mobile speed is 30 km/h). The results show that CR gave better performance (smaller output error) for N ≤ 4 and comparable performance for N ≥ 5, as compared to the RS method.
3.5.3 Conclusions
This section shows that a well-performing set of radial basis functions can emerge from a simple clustering process applied to the regressors. Simulation results show that the CR method is 1) independent of the ordering among the regressors, and 2) more robust under severely noisy conditions in comparison with OLS (see the simulation results in publication 5). The disadvantage of the algorithm is that the dimension of the regressor vectors might be as high as the number of samples, which may deteriorate the clustering performance. This subchapter shows that the error vector of the RBFN regression model is obtained as a linear combination of the quantization error vectors. (Up to the multiplicative factor $1/L$, the squared $l_2$ norm of the error vector equals the output error function; see (3.30).)
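As a rough illustration of the CR procedure described in this section, the following Python sketch clusters the regressor columns with a small k-means loop, selects the regressor nearest to each codebook vector as a hidden unit, and fits the output weights by linear least squares. The function names, the k-means details and the toy data are assumptions for illustration, not the exact procedure of publication 5.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans_cols(X, N, iters=50):
    """Plain k-means on the columns of X; returns N centroid columns."""
    centroids = X[:, rng.choice(X.shape[1], N, replace=False)].copy()
    for _ in range(iters):
        d2 = ((X[:, None, :] - centroids[:, :, None]) ** 2).sum(axis=0)  # (N, L)
        labels = d2.argmin(axis=0)            # nearest centroid per column
        for j in range(N):
            if np.any(labels == j):
                centroids[:, j] = X[:, labels == j].mean(axis=1)
    return centroids

def cr_design(R, d, N):
    """Cluster the regressor columns of R, select the regressor nearest to each
    codebook vector, and fit the output weights by linear least squares."""
    M = kmeans_cols(R, N)
    d2 = ((R[:, None, :] - M[:, :, None]) ** 2).sum(axis=0)   # (N, L)
    sel = np.unique(d2.argmin(axis=1))        # selected regressor indices
    lam, *_ = np.linalg.lstsq(R[:, sel], d, rcond=None)
    return sel, lam

# Toy data: 40 samples, 40 candidate Gaussian regressors (centers = samples)
x = np.linspace(-2.0, 2.0, 40)
R = np.exp(-(x[:, None] - x[None, :]) ** 2)   # regressor matrix, columns r_i
d = np.sin(2.0 * x)                           # desired outputs
sel, lam = cr_design(R, d, N=6)
print("selected regressors:", sel,
      "training MSE:", np.mean((d - R[:, sel] @ lam) ** 2))
```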
Chapter 4

Hierarchical Learning of SPN

The standard single-hidden-layer Sigmoid Perceptron Network (SPN) and the RBFN, two examples of nonlinear feedforward networks, have been studied intensively in the literature (e.g. [59]). Both are universal approximators [27]. While the SPN constructs global approximations to a nonlinear input-output mapping, the RBFN has a locally tunable property. The main difference between the two structures is as follows: whereas the argument of the activation function in the RBFN is the norm (distance) between the input vector and the center vector of the unit, the activation function in the SPN operates on the inner product of the input vector and the synaptic weight vector of the neuron. A comparison of the RBFN and the multilayer perceptron in general is given in [27].

The standard single-hidden-layer SPN model involves three different layers (Fig. 4.1): an input layer, where the inner product of the input vector and the weight vector of each unit is calculated; a hidden layer, in which each neuron computes its output using the sigmoid function; and an output layer, which builds a linear weighted sum of the hidden layer outputs to supply the response of the network. A sigmoid perceptron with one output neuron implements the input-output relation in (4.1):
$$F(\lambda, W, x) = \sum_{j=1}^{N} \lambda_j \, \sigma(x^T w_j) \qquad (4.1)$$

where $\sigma(a) = 0.5 - \frac{1}{1 + \exp(-\beta a)}$, $\beta > 0$, is the sigmoid function and $x \in \mathbb{R}^p$ is the input vector.
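For contrast with the RBFN sketch in Chapter 2, here is a minimal Python sketch of the SPN forward pass (4.1); the exact shifted-sigmoid form follows the reconstruction above, and all names and values are illustrative assumptions.

```python
import numpy as np

BETA = 2.0   # smoothing factor beta of the sigmoid (illustrative value)

def spn_forward(x, W, lam, beta=BETA):
    """SPN output per (4.1): F(lam, W, x) = sum_j lam_j * sigma(x^T w_j).

    x   : (p,)   input vector
    W   : (p, N) matrix whose columns are the input (synaptic) weights w_j
    lam : (N,)   linear output weights lambda_j
    """
    a = W.T @ x                                    # inner products x^T w_j
    sigma = 0.5 - 1.0 / (1.0 + np.exp(-beta * a))  # bounded sigmoid in (-0.5, 0.5)
    return sigma @ lam

# Toy usage with p = 2 inputs and N = 3 hidden neurons (illustrative values)
W = np.array([[1.0, -0.5, 0.3],
              [0.2,  0.8, -1.0]])
lam = np.array([0.7, 0.1, -0.4])
print(spn_forward(np.array([0.5, -0.2]), W, lam))
```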
According to the universal approximation theorem, given any continuous function $f(\cdot)$ on the $p$-dimensional unit hypercube $I_p$ and any $\varepsilon > 0$, there exist an integer $N$ and sets of real vectors $w_i \in \mathbb{R}^p$, $i = 1, \ldots, N$, and $\lambda \in \mathbb{R}^N$ such that one can define

$$F(x) = \sum_{i=1}^{N} \lambda_i \, \sigma(x^T w_i) \qquad (4.2)$$

as an approximate realization of the function $f(\cdot)$; that is,

$$|F(x) - f(x)| < \varepsilon \qquad (4.3)$$

for all $x \in I_p$. Notice that the universal approximation theorem is an existence theorem: it says that a single hidden layer is sufficient for a multilayer perceptron to compute a uniform approximation to a given training set represented by the set of inputs $x$ and desired outputs $f(x)$ [27].

The rest of the chapter is organized as follows: Sections 4.1 and 4.2 present SPNs designed by IOC and IC, respectively, together with their upper bound analysis.
4.1 IOC for design of SPN

The proposed IOC algorithm for the design of the standard SPN applies a clustering algorithm to the augmented vectors in (4.4):

$$t_s^{I/O} = [\gamma x_s^T \;\; d_s^T]^T, \qquad \{x_s\}_{s=1}^{L} \subset \mathbb{R}^p, \quad \{d_s\}_{s=1}^{L} \subset \mathbb{R}^q \qquad (4.4)$$

where $L$ is the number of training vectors and $\gamma$ is a weighting factor. Here, the following weighted $l_2$ quantization error is considered:

$$D(M) = \frac{1}{L} \sum_{s=1}^{L} \left\| t_s^{I/O} - m_{j(s)} \right\|_2^2 = \frac{1}{L} \sum_{s=1}^{L} \left\| \begin{bmatrix} \gamma (x_s - m_{j(s)}^x) \\ d_s - m_{j(s)}^y \end{bmatrix} \right\|_2^2 \qquad (4.5)$$

where $M = [m_1, \cdots, m_N]$ is a matrix whose columns are the cluster center vectors in the input-output space, and $j(s): \{1, 2, \ldots, L\} \to \{1, 2, \ldots, N\}$ is the index identifying the cluster to which the sample $t_s^{I/O}$ belongs. Without loss of generality, only the input part of the augmented vector in (4.4) is weighted.

The proposed hierarchical learning for the SPN can be summarized as follows (publication 6): Step 1: Apply a clustering algorithm to the set of augmented vectors in (4.4) and obtain the cluster vectors $m_j = [\gamma (m_j^x)^T \;\; (m_j^y)^T]^T \in \mathbb{R}^{p+q}$.
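As an illustration of Step 1 above, the following Python sketch forms the augmented vectors of (4.4), clusters them with a small k-means loop, and splits the codebook vectors back into their input and output parts; the k-means routine, the choice of γ, and the closing remark about fitting the output-layer weights by least squares (suggested by the hierarchical scheme outlined in the abstract, not spelled out here) are illustrative assumptions, not the exact procedure of publication 6.

```python
import numpy as np

rng = np.random.default_rng(1)

def ioc_cluster(X, D, N, gamma=1.0, iters=50):
    """Step 1 of the IOC scheme: cluster augmented vectors t_s = [gamma*x_s; d_s].

    X : (L, p) input vectors; D : (L, q) desired outputs.
    Returns the input parts m^x_j (rescaled by 1/gamma) and output parts m^y_j.
    """
    T = np.hstack([gamma * X, D])                 # augmented samples, (L, p+q)
    M = T[rng.choice(len(T), N, replace=False)]   # init codebooks from samples
    for _ in range(iters):
        labels = ((T[:, None, :] - M[None]) ** 2).sum(-1).argmin(1)
        for j in range(N):
            if np.any(labels == j):
                M[j] = T[labels == j].mean(axis=0)
    p = X.shape[1]
    return M[:, :p] / gamma, M[:, p:]

# Toy usage: the input parts m^x_j would serve as SPN input-layer weights,
# after which the linear output-layer weights could be fitted by least squares.
X = rng.normal(size=(100, 2))
D = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(100, 1))
mx, my = ioc_cluster(X, D, N=5, gamma=0.5)
print(mx.shape, my.shape)   # (5, 2) (5, 1)
```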