Helsinki University of Technology Control Engineering Laboratory Espoo 2001

Report 124

CLUSTERING-BASED ALGORITHMS FOR RADIAL BASIS FUNCTION AND SIGMOID PERCEPTRON NETWORKS Zekeriya Uykan

TEKNILLINEN KORKEAKOULU TEKNISKA HÖGSKOLAN HELSINKI UNIVERSITY OF TECHNOLOGY TECHNISCHE UNIVERSITÄT HELSINKI UNIVERSITÉ DE TECHNOLOGIE D'HELSINKI

Helsinki University of Technology Control Engineering Laboratory Espoo June 2001

Report 124

CLUSTERING-BASED ALGORITHMS FOR RADIAL BASIS FUNCTION AND SIGMOID PERCEPTRON NETWORKS Zekeriya Uykan Dissertation for the degree of Doctor of Science in Technology to be presented with due permission of the Department of Automation and Systems Engineering, for public examination and debate in Auditorium S4 at Helsinki University of Technology (Espoo, Finland) on the 27th of June, 2001, at 12 noon.

Helsinki University of Technology Department of Automation and Systems Technology Control Engineering Laboratory

Distribution: Helsinki University of Technology Control Engineering Laboratory P.O. Box 5400 FIN-02015 HUT, Finland Tel. +358-9-451 5201 Fax. +358-9-451 5208 E-mail: [email protected]

ISBN 951-22-5529-4 ISSN 0356-0872

Picaset Oy Helsinki 2001


Abstract

This thesis is concerned with learning methods for the Radial Basis Function Network (RBFN) and the standard single-hidden-layer Sigmoid Perceptron Network (SPN). The main research question is to develop and analyse new learning methods for the design of RBFN and SPN. In the thesis, which is based on seven publications, three learning methods for RBFN, two for SPN, and one (SPN-based) power control algorithm for cellular radio systems are developed, analysed and compared with corresponding algorithms in the literature.

The key point in the design of an RBFN is to specify the number and locations of the centers. All the proposed algorithms, which will be called Input Output Clustering (IOC), Augmented-Input-Layer (AIL) RBFN and Clustering of Regressors (CR), are based on a clustering process applied to either input-output samples or outputs of hidden neurons for determining the centers of the RBFN. The idea of concatenating the output vector to the input vector in the clustering process has independently been proposed by several authors in the literature, although none of them presented an analysis of such hybrid methods but rather demonstrated their effectiveness in several applications. The most challenging problem of this thesis is to construct a new upper bound on the output error of RBFN and SPN in terms of input-output quantization errors: a new approach for investigating the relationship between the clustering process on input-output training samples and the mean squared output error is presented. The main result is: i) a weighted mean squared input-output quantization error, which is minimized by IOC as well as by Input Clustering (IC), gives an upper bound on the mean squared output error, and ii) this upper bound, and consequently the output error, can be made arbitrarily small (zero in the limit) by increasing the number of hidden neurons, i.e., by decreasing the quantization error.

Gradient-descent-type supervised learning algorithms are the most commonly used ones for the design of the standard SPN. In this thesis, unsupervised learning is proposed for determining the SPN input-layer (synaptic) weights. The proposed hierarchical methods determine the input-layer (synaptic) weights by a clustering algorithm (IC or IOC) that minimizes an upper bound on the SPN output error, in contrast to supervised learning algorithms, which minimize the output error itself. The simulation results indicate that the proposed hierarchical methods yield performance comparable to the RBFN case.

The last part of the thesis is concerned with power control (PC) in cellular radio systems. Simple RBFN and SPN models are applied to the prediction of Rayleigh fading. In the thesis, the PC problem is viewed as a controller design problem. The proposed algorithm, which will be called SPN PC, is a first-order, fully distributed (nonlinear) PC algorithm. The analysis can readily be applied to other nonlinear (e.g. RBFN) or linear (e.g. PI) controller cases, each of which would yield new fully distributed PC algorithms.


To my beloved ones, my mother and father, Hayriye and Sayit Uykan. Mojoj majki i mome babu. (To my mother and to my father.)


Preface

This thesis was carried out at the Control Engineering Laboratory of Helsinki University of Technology from September 1996 to April 2001.

I would like to express my sincere gratitude to my supervisor, Professor Heikki N. Koivo, for his guidance, support, encouragement, and positive attitude throughout my research. I am also grateful to Professor Cuneyt Guzelis for his guidance during my time at Istanbul Technical University, Turkey, and for numerous fruitful discussions. I also thank the whole personnel of the Control Engineering Laboratory for the creative and scientific atmosphere.

The financial support from the Centre for International Mobility (CIMO), Finland, the Finnish Academy Graduate School on Electronics, Telecommunication and Automation (GETA), Imatran Voiman Säätiö, the Sonera Foundation and the Elisa Foundation is gratefully acknowledged.

Finally, but most importantly, my warmest thanks go to my beloved ones, my mother Hayriye and my father Sayit Uykan, to whom this thesis is dedicated.

April 27, 2001, Espoo.

Zekeriya Uykan


List of Publications

This thesis is based on the following seven publications:

publication 1:

Z. Uykan and C. Guzelis, Input-Output Clustering for Determining the Centers of Radial Basis Function Network, ECCTD-97, vol. 2, pp. 435-439, Budapest, Hungary, 1997.

publication 2:

Z. Uykan, C. Guzelis, M.E. Celebi and H.N. Koivo, Analysis of Input-Output Clustering for Determining Centers of Radial Basis Function Networks, IEEE Trans. on Neural Networks, vol. 11, no. 4, pp. 851-858, July 2000.

publication 3:

Z. Uykan and H.N. Koivo, Upper Bounds on RBFN Designed by Input Clustering, Proc. of IEEE-ACC (American Control Conference) 2000, vol. 2, pp. 1440-1444, Chicago, June 2000. (Extended version submitted to IEEE Trans. on Neural Networks.)

publication 4:

Z. Uykan and H.N. Koivo, Augmented-Input-Layer Radial Basis Function Networks, ICSC/IFAC Symposium on Neural Computation / NC'98, pp. 989-994, Vienna, Austria, 1998.

publication 5:

Z. Uykan and H.N. Koivo, Clustering of Regressors for Constructing Radial Basis Function Networks, WMSCI'98 (World Multiconference on Systemics, Cybernetics and Informatics), pp. 741-748, Orlando, Florida, July 1998.

publication 6:

Z. Uykan and H.N. Koivo, Unsupervised Learning of Sigmoid Perceptron, Proc. of IEEE-ICASSP2000 (International Conference on Acoustics, Speech, and Signal Processing), vol. 6, pp. 3486-3489, Istanbul, Turkey, June 2000.

publication 7:

Z. Uykan and H.N. Koivo, A Sigmoid-Basis Nonlinear Power Control Algorithm for Mobile Radio Systems, IEEE-VTC2000 (Vehicular Technology Conference), vol. 4, pp. 1556-1560, Boston, USA, 2000. (Submitted to IEEE Trans. on Vehicular Technology.)


List of Mathematical Notations

x_s      s'th input vector
d_s      s'th (desired) output vector
F(·)     output of RBFN or SPN
φ(·)     activation function of RBFN or SPN
θ        linear output weight vector of RBFN or SPN network
C        matrix whose columns are the centers of RBFN
W        matrix whose columns are the input (synaptic) weights of RBFN
β        smoothing factor of the sigmoid function
L        number of training vectors
p        dimension of the input vector
q        dimension of the output vector
N        number of neurons
Z        size of the look-up table in AIL RBFN
m_j      cluster (codebook) vector of the j'th cluster in input/output space
k        scaling factor in IOC
K        Lipschitz constant
r_i      regressor vector in the RBFN model corresponding to the i'th neuron
R        regressor matrix in the RBFN regression model
e        error vector in the RBFN model
d        desired output vector
q        quantization error vector
M        matrix whose columns are the cluster (codebook) vectors in input-output space
M_x      matrix whose columns are the cluster (codebook) vectors in input space
D        quantization error
E        output error function of RBFN or SPN
p_i      transmit power of mobile i
p        transmit power vector
g_ij     link gain from transmitter j to receiver i
s_ij     shadow fading term from transmitter j to receiver i
l_ij     propagation loss from transmitter j to receiver i
G        link gain matrix
H        normalized link gain matrix
ρ(H)     spectral radius of matrix H
λ        eigenvalue
M        number of mobiles sharing the same channel
ω        free parameter of the SPN algorithm


List of Abbreviations

RBFN       Radial Basis Function Neural Network
SPN        Sigmoid Perceptron Network
OLS        Orthogonal Least Squares
AIL RBFN   Augmented-Input-Layer RBFN
CR         Clustering of Regressors
IOC        Input Output Clustering
IC         Input Clustering
RS         Random Selection
VQ         Vector Quantization
LMS        Least Mean Squares
OE         Output Error
UB         Upper Bound
PC         Power Control
SgmPC      Sigmoid Power Control
SPN PC     SPN (Sigmoid Perceptron Network) Power Control
C-SPN PC   Constrained SPN (Sigmoid Perceptron Network) Power Control
CDMA       Code Division Multiple Access
DCPC       Distributed Constrained Power Control
CIR        Carrier-to-Interference+noise Ratio

Contents

1 Introduction
2 Radial Basis Function Neural Network
  2.1 Radial Basis Function Network
  2.2 Learning Algorithms for RBFN
3 Clustering-Based Algorithms for RBFN
  3.1 IOC for Center Determination of RBFN
  3.2 Analysis of IOC
    3.2.1 Computer Simulation Results
    3.2.2 Conclusions
  3.3 Upper Bounds on RBFN Designed by IC
    3.3.1 Computer Simulation Results
    3.3.2 Conclusions
  3.4 Augmented-Input-Layer (AIL) RBFN
    3.4.1 Upper Bound on AIL RBFN output error
    3.4.2 Computer Simulation Results
    3.4.3 Conclusions
  3.5 Clustering of Regressors (CR)
    3.5.1 The effect of CR on the output error
    3.5.2 Computer Simulations
    3.5.3 Conclusions
4 Hierarchical Learning of SPN
  4.1 IOC for design of SPN
    4.1.1 Upper Bound on SPN Output Error
    4.1.2 Computer Simulation Results
    4.1.3 Conclusions
  4.2 IC for design of SPN
    4.2.1 Upper Bound on SPN by IC
    4.2.2 Computer Simulation Results
    4.2.3 Conclusions
5 SPN Power Control Algorithm
  5.1 Introduction
  5.2 Power Control Problem
  5.3 SPN PC Algorithm
  5.4 Concluding Remarks
6 Conclusions
  6.1 Topics for Future Research
7 Summary of the publications
  7.1 Author's contributions to the publications

Chapter 1

Introduction

Artificial Neural Networks, commonly referred to as "Neural Networks", have been the focus of a number of different disciplines, such as neuroscience, mathematics, electrical and computer engineering, and psychology. Accordingly, they have numerous definitions in the literature. Kohonen defines neural networks as [35]: "Artificial Neural Networks are massively parallel interconnected networks of simple (usually adaptive) elements and their hierarchical organizations which are intended to interact with the objects of the real world in the same way as the biological nervous systems do". Almost 50 different neural network architectures have been developed in the literature, although only a part of them is in common use [37]. The radial basis function neural network and the standard single-hidden-layer sigmoid perceptron network, the two architectures on which this thesis focuses, are among the most commonly used (e.g. [59]).

The thesis is concerned with learning methods for the Radial Basis Function Network (RBFN) and the standard single-hidden-layer Sigmoid Perceptron Network (SPN). The main research question is to develop and analyse new learning methods for the design of RBFN and SPN. The thesis is based on the seven publications listed above, in which three learning methods for RBFN, two for SPN, and two (fully distributed) power control algorithms for cellular radio systems are developed, analysed and compared with corresponding algorithms in the literature.

The proposed algorithms for the design of RBFN, which will be called Input-Output Clustering (IOC), Augmented-Input-Layer (AIL) RBFN and Clustering of Regressors (CR), are based on a clustering process applied to either input-output samples or outputs of hidden neurons for determining the centers of the RBFN. In the IOC and AIL methods, a clustering method is applied to augmented vectors, which are obtained by concatenating the weighted input vector to the output vector. Unlike the AIL method, IOC projects the augmented codebook vectors into the input space (after rescaling the input part). AIL RBFN augments the input of the RBFN with the desired output values in the training phase and with a computed (a priori) average in the generalization phase. CR in publication 5, on the other hand, applies clustering to the outputs of the hidden neurons (i.e., the regressors) instead of the input-output vectors.
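To make the IOC idea concrete, the following sketch shows one plausible realization: k-means clustering on concatenated, scaled input-output vectors, followed by projection of the codebooks back onto the input part to obtain the centers, and a least-squares fit of the linear output weights. The use of scikit-learn's KMeans, the choice of scaling factor k, and the toy target function are illustrative assumptions, not the exact formulation analysed in the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans

def ioc_centers(X, D, n_centers, k=1.0):
    """IOC sketch: cluster the augmented vectors [k*x; d] and project
    the codebook vectors back to input space to obtain RBFN centers.

    X : (L, p) input samples, D : (L, q) desired outputs.
    """
    Z = np.hstack([k * X, D])                      # augmented input-output vectors
    km = KMeans(n_clusters=n_centers, n_init=10).fit(Z)
    M = km.cluster_centers_                        # codebooks in input-output space
    return M[:, :X.shape[1]] / k                   # rescale and keep the input part

def rbf_design_matrix(X, centers):
    """Gaussian regressors phi(||x - c_j||^2) with phi(a) = exp(-a)."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist)

# Hierarchical design: centers by IOC, then linear weights by least squares.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
d = np.sin(np.pi * X[:, 0])                        # arbitrary toy target
C = ioc_centers(X, d[:, None], n_centers=10, k=1.0)
R = rbf_design_matrix(X, C)
theta, *_ = np.linalg.lstsq(R, d, rcond=None)
print("training MSE:", np.mean((R @ theta - d) ** 2))
```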

The idea of concatenating the output vector to the input vector in the clustering process has independently been proposed by several authors, e.g. [34], [6], [71], publication 1, [51], although none of them presented an analysis of such hybrid methods but rather demonstrated their effectiveness in several applications. The most challenging problem of this thesis is to construct a new upper bound on the output error of RBFN and SPN in terms of input-output quantization errors: a new approach for investigating the relationship between the clustering process on input-output training samples and the mean squared output error is presented. The main result is: i) a weighted mean squared input-output quantization error, which is minimized by Input Output Clustering (IOC) as well as by Input Clustering (IC), gives an upper bound on the mean squared output error, and ii) this upper bound, and consequently the output error, can be made arbitrarily small (zero in the limit) by increasing the number of hidden neurons, i.e., by decreasing the quantization error. The analysis is carried out for the IOC, IC and AIL methods. For the analysis of CR, on the other hand, the error vector of the RBFN regression model is obtained as a linear combination of the quantization errors in regressor space.

The standard single-hidden-layer Sigmoid Perceptron Network (SPN) has been the focus of a large number of studies in the literature. Standard back-propagation, gradient-descent-type supervised learning algorithms minimizing the output error are the most commonly used (e.g. [28]) for the design of SPN. One of their disadvantages, however, is that learning is generally slow (e.g. [15]). Apart from the long computation time, this approach suffers from the local minima problem when applied to a nonconvex error function, thus providing suboptimal solutions (see e.g. [15], [28], [27]). In the context of the standard SPN, this thesis proposes unsupervised learning for determining the input-layer (synaptic) weights, in contrast to traditional gradient-descent-type supervised learning. The proposed hierarchical methods determine the input-layer (synaptic) weights by a clustering algorithm (IC or IOC) that minimizes an upper bound on the SPN output error, in contrast to supervised learning algorithms, which minimize the output error itself. The simulation results concerning the SPN show that i) the proposed hierarchical learning of SPN yields performance comparable to the RBFN case, and ii) the upper bounds minimized during the clustering are relatively tight with respect to the SPN output error function.

The derivations concerning the above-mentioned upper bound analysis are made for RBFN and SPN in this thesis. Nevertheless, the analysis can readily be applied to a single-hidden-layer network with any bounded nonlinear activation function which is Lipschitz continuous and has a limit at one infinity [30] (e.g. the ramp function). A function $\phi(\cdot): \mathbb{R} \to \mathbb{R}$ is Lipschitz continuous if there exists $K > 0$ such that $|\phi(a) - \phi(b)| \le K|a - b|$ for all $a, b \in \mathbb{R}$ (see e.g. [11]). The fact that the upper bound can be made arbitrarily small (zero in the limit) eventually refers to the fact that RBFN and SPN designed by IOC as well as by IC are universal approximators.
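As an illustration of the hierarchical SPN design just described, the sketch below uses input clustering (IC) to set the hidden-layer (synaptic) weights from cluster codebooks and then solves the linear output weights by least squares. The mapping from codebook vectors to sigmoid weights and biases, and the smoothing factor beta, are illustrative assumptions rather than the thesis's exact construction.

```python
import numpy as np

def kmeans(X, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's algorithm (input clustering, IC)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def spn_hidden(X, W, b, beta=4.0):
    """Hidden layer of a single-hidden-layer sigmoid perceptron."""
    return 1.0 / (1.0 + np.exp(-beta * (X @ W.T + b)))

# Hierarchical learning: unsupervised hidden weights, then least squares.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(300, 2))
d = np.sin(np.pi * X[:, 0]) * X[:, 1]              # arbitrary toy target
centers = kmeans(X, n_clusters=12)
W = centers                                        # assumed mapping: weight vector = codebook
b = -0.5 * (centers ** 2).sum(axis=1)              # assumed bias centring each sigmoid
Phi = spn_hidden(X, W, b)
theta, *_ = np.linalg.lstsq(Phi, d, rcond=None)
print("training MSE:", np.mean((Phi @ theta - d) ** 2))
```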

The last part of the thesis is concerned with power control in cellular radio systems. The PC problem has drawn much attention since Zander's works on centralized [80] and distributed [79] CIR balancing in the early 1990s, and the number of papers in the same direction has recently been growing remarkably. In this thesis, the same problem is viewed as a controller design problem, which results in a framework suitable for developing and analysing new (fully distributed) PC algorithms. The algorithms, which will be called the sigmoid and SPN PC algorithms, are proposed and analysed in this thesis. The analysis can readily be applied to other nonlinear (e.g. RBFN) or linear (e.g. PI) controller cases, each of which would yield new fully distributed PC algorithms. Computer simulation results indicate that the sigmoid PC algorithms significantly enhance the convergence speed of PC compared with the linear distributed constrained power control [24] used as a reference algorithm.

Note that some of the results presented in the above-mentioned publications have been studied further, and these studies, together with new simulation results, are presented throughout the chapters. For example, the upper bound analysis in publication 3 has been applied to the SPN case, and the results are presented in Chapter 4.2. Therefore, all the topics are re-summarized throughout chapters 3 to 5, together with these further studies, using the same notation throughout. Accordingly, the notation used in chapters 3 to 5 may differ slightly from that of the corresponding publications.

The rest of the thesis is organized as follows: Chapter 2 summarizes the structure of the RBFN and the learning algorithms used for the design of RBFN in the literature. The clustering-based algorithms for RBFN (which will be called IOC, AIL and CR), together with their analysis, are presented in Chapter 3. Chapter 4 introduces the clustering-based algorithms, basically IC and IOC, for learning the input (synaptic) weights of the standard SPN, together with their upper bound calculations. Chapter 5 is concerned with power control for cellular radio systems, in which the so-called SPN power control algorithm is presented. The main conclusions of the thesis are given in Chapter 6, and the publications are summarized in Chapter 7. The seven publications follow the chapters.
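For reference, the linear distributed constrained power control (DCPC) baseline mentioned above can be sketched as the standard iteration in which each mobile scales its power toward a target CIR, clipped at a maximum power. The link gains, noise powers, CIR target and power limit below are arbitrary illustrative values, not taken from the thesis.

```python
import numpy as np

def cir(p, G, noise):
    """Carrier-to-interference+noise ratio of each link for power vector p."""
    signal = np.diag(G) * p
    interference = G @ p - signal + noise
    return signal / interference

def dcpc(G, noise, gamma_target, p_max, n_iter=50):
    """Standard DCPC iteration: p_i <- min(p_max, gamma_target / CIR_i(p) * p_i)."""
    p = np.full(G.shape[0], 1e-3)                  # small initial powers
    for _ in range(n_iter):
        p = np.minimum(p_max, gamma_target / cir(p, G, noise) * p)
    return p

# Illustrative 3-mobile example with arbitrary link gains and noise.
G = np.array([[1.0, 0.05, 0.02],
              [0.04, 1.0, 0.06],
              [0.03, 0.05, 1.0]])
noise = 1e-2 * np.ones(3)
p = dcpc(G, noise, gamma_target=5.0, p_max=1.0)
print("powers:", p, "CIRs:", cir(p, G, noise))
```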

Chapter 2

Radial Basis Function Neural Network

In this chapter, the structure of the RBFN and the learning algorithms used for the design of RBFN in the literature are summarized.

2.1 Radial Basis Function Network

The Radial Basis Function neural Network (RBFN), with the simplicity of its locally tunable property, is a good alternative to the sigmoidal multilayer perceptron. The linear output layer and radial-basis hidden layer structure of the RBFN make it possible to learn the connection weights efficiently, without the local minima problem, in a hierarchical procedure in which the linear weights are learned after the centers have been determined by a clustering process.

The radial basis function method was traditionally used for strict interpolation in multidimensional space by Powell [53] and Micchelli [43], requiring as many centers as data points (assigning all input data points as centers). Broomhead and Lowe [3] then removed the "strict" restriction and used fewer centers than data samples, allowing many practical RBFN applications in which the number of data samples is very large. Today the RBFN is a focus of studies not only in numerical analysis but also in the neural networks area. The RBFN has also been of interest in neuro-fuzzy systems in the last decade, where the aim is to combine the advantages of neural networks and fuzzy systems. It is shown in [32] that the functional behaviour of the RBFN is equal to that of a fuzzy system, under some restrictions. This functional equivalence makes it possible to apply what has been discovered (learning rules, representational power, etc.) for one of the methods to the other and vice versa [73], [68]. Using the Gaussian kernel function, the RBFN is capable of forming an arbitrarily close approximation to any continuous function [26], [39] on a compact set. Chen and Chen [10] presented a general result on approximating nonlinear functionals and operators by RBFN using sample data either in the frequency or in the time domain.

Figure 2.1: An RBFN with one output.

The construction of an RBFN involves three different layers: the input layer, which consists of source nodes; the hidden layer, in which each neuron computes its output using a radial basis function; and the output layer, which builds a linear weighted sum of the hidden-neuron outputs to supply the response of the network. An RBFN with one output neuron implements the input-output relation in (2.1), which is indeed a composition of the nonlinear mapping realized by the hidden layer and the linear mapping realized by the output layer.

$$F(\theta; C; x) = \sum_{j=1}^{N} \theta_j \, \phi\left(\|x - c_j\|^2\right) \qquad (2.1)$$

where $\phi(a) = e^{-a}$ is the radial basis function, $N$ is the number of hidden neurons, and $x \in \mathbb{R}^p$ is the input vector.
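A direct transcription of (2.1) into code may help fix the notation; the shapes and the Gaussian basis follow the equation, while the example weights, centers and input are arbitrary.

```python
import numpy as np

def rbfn_output(theta, C, x):
    """Evaluate F(theta; C; x) = sum_j theta_j * phi(||x - c_j||^2),
    with phi(a) = exp(-a), as in (2.1).

    theta : (N,) output weights, C : (p, N) centers as columns, x : (p,) input.
    """
    sq_dist = ((C - x[:, None]) ** 2).sum(axis=0)  # ||x - c_j||^2 for each center
    return theta @ np.exp(-sq_dist)

# Arbitrary example: N = 3 centers in R^2.
theta = np.array([0.5, -1.0, 2.0])
C = np.array([[0.0, 1.0, -1.0],
              [0.0, 1.0,  1.0]])
print(rbfn_output(theta, C, np.array([0.2, -0.1])))
```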
