Optimization of EBFN Architecture by an Improved RPCL Algorithm with Application to Speaker Verification

Xin Li and Man Wai Mak
Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hong Kong.

Abstract: Determining an appropriate number of clusters is a difficult but important problem. The rival penalized competitive learning (RPCL) algorithm was proposed to solve this problem, but its performance is not satisfactory when the input vectors contain dependent components. This paper addresses this problem by incorporating full covariance matrices into the original RPCL algorithm. The resulting algorithm, referred to as the improved RPCL algorithm, progressively eliminates the units whose clusters contain only a small portion of the training data. The improved algorithm is applied to determine the number of clusters of a Gaussian mixture distribution. It is also applied to optimize the architecture of elliptical basis function networks for speaker verification. The results show that the covariance matrices obtained by the improved RPCL algorithm provide a better representation of the clusters than those obtained by the original RPCL algorithm, resulting in a lower verification error rate.

1 Introduction

Clustering is a process in which a collection of data is partitioned into meaningful subgroups. It has important applications in many problem domains. Over the years, a number of clustering methods have been proposed [1]. One of the well-known methods is the K-means algorithm [2], which iteratively reassigns each data point to the cluster with the closest center and then recalculates the cluster centers. This method is simple and computationally efficient. However, its disadvantage is that the number of clusters must be pre-specified, and in many applications this number is usually unknown.

To tackle this problem, Xu et al. [3] proposed a new form of competitive learning called rival penalized competitive learning (RPCL) in an effort to determine an appropriate number of clusters automatically. The idea is that for each input vector, not only are the weights of the winner unit modified to adapt to the vector, but the weights of its rival (the second winner) are also "de-learned" by a small amount. Although promising results have been demonstrated in [3], we show in this study that the performance of RPCL becomes poor when the input vectors contain dependent components. In this situation, the algorithm is found to be very sensitive to the learning rates of the winner and the rival units. This is because the original RPCL algorithm uses a Euclidean distance measure, which can give a poor representation of the data set when the input vectors contain dependent components. To solve this problem, this paper proposes to use full covariance matrices instead of diagonal ones with equal variances as in the RPCL algorithm. We denote this as the improved RPCL algorithm. Its advantage is that it is able to identify potential clusters even though the clusters may overlap and the input vectors may have dependent components.

In this work, the performance of the RPCL algorithm and the improved one was compared on two problems: (1) identification of Gaussian clusters and (2) speaker verification. The results show that the improved RPCL algorithm is superior to the original one in determining the optimal number of cluster centers.

The remainder of this paper is organized as follows. In Section 2, the improved RPCL algorithm is introduced. Its performance in identifying Gaussian clusters and in optimizing the EBF network architecture is compared with that of the original RPCL algorithm in Section 3. Finally, conclusions are drawn in Section 4.

2 The Improved RPCL Algorithm

The improved RPCL algorithm starts with a large number of units in the competitive learning network. This number is larger than the expected number of clusters in the input data set. After every iteration, the number of samples in each cluster is determined. Then, the units whose clusters contain only a few samples are discarded. We denote these units as "abundant units". This step ensures that sufficient training samples are used to compute the covariance matrices. The original RPCL algorithm is modified as follows in order to obtain the improved RPCL algorithm:

Step 1) Initially, set the number of cluster centers, $k$, to a value that is larger than the possible number of clusters in the data set. Initialize the covariance matrices $\{\Sigma_j\}_{j=1}^{k}$ by using the K-nearest neighbor algorithm with $K$ set to 2, and define a threshold $\tau$ for discarding abundant units. Set a counter $p$ to zero.

Step 2) Set the counter $t = 0$.

(a) Randomly select a sample $\vec{x}$ from the data set $D$. For $i = 1, \ldots, k$, let
$$
u_i = \begin{cases} 1 & \text{if } i = c, \\ -1 & \text{if } i = r, \\ 0 & \text{otherwise,} \end{cases} \qquad (1)
$$
where
$$
\gamma_c (\vec{x} - \vec{\omega}_c)^T \Sigma_c^{-1} (\vec{x} - \vec{\omega}_c) = \min_j \gamma_j (\vec{x} - \vec{\omega}_j)^T \Sigma_j^{-1} (\vec{x} - \vec{\omega}_j) \qquad (2)
$$
$$
\gamma_r (\vec{x} - \vec{\omega}_r)^T \Sigma_r^{-1} (\vec{x} - \vec{\omega}_r) = \min_{j \neq c} \gamma_j (\vec{x} - \vec{\omega}_j)^T \Sigma_j^{-1} (\vec{x} - \vec{\omega}_j). \qquad (3)
$$
Here, $\gamma_i = n_i / \sum_{i=1}^{k} n_i$, where $n_i$ is the number of occurrences of $u_i = 1$. Note that $c$ and $r$ represent the winner and rival units, respectively.

(b) Update the weights $\{\vec{\omega}_i\}_{i=1}^{k}$ by
$$
\Delta\vec{\omega}_i(t) = \begin{cases} \alpha_c(t)(\vec{x} - \vec{\omega}_i) & \text{if } u_i = 1, \\ -\alpha_r(t)(\vec{x} - \vec{\omega}_i) & \text{if } u_i = -1, \\ 0 & \text{otherwise,} \end{cases} \qquad (4)
$$
where $0 \le \alpha_c(t), \alpha_r(t) \le 1$ are the learning rates for the winner and rival units, respectively. Note that only the winner and the rival units will be updated.

(c) Increment $t$ by 1. If $t < T$, go to Step 2(a); otherwise, go to Step 3.

Step 3) For each pattern $\vec{x} \in D$, find its closest center and assign it to the associated cluster.

Step 4) For each cluster, determine the number of samples it contains. If the ratio of this number to the total number of patterns falls below the threshold $\tau$, discard the associated center (unit).

Step 5) Modify the number of centers $k$ according to the number of units discarded. If $k$ remains unchanged (i.e. no unit has been discarded), set $p = p + 1$; otherwise set $p = 0$.

Step 6) If $p = 2$, stop; otherwise, continue from Step 2.

The original RPCL algorithm is found to be very sensitive to the learning rates of the winner and rival units. This is because some "abundant units" (which should have been discarded) remain intact when the algorithm finishes. To solve this problem, we repeat Step 2 to Step 5 and keep on discarding abundant units until $p = 2$. In other words, we stop the algorithm after detecting that no more units can be discarded for two consecutive iterations. This scheme makes the improved algorithm less sensitive to the learning rates.
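The paper gives the procedure in prose and equations only; the following Python sketch is our own minimal rendering of it. The random initialization, the per-iteration covariance re-estimation, and names such as `improved_rpcl`, `wins` and `tau` are assumptions rather than details taken from the paper (Step 1, for instance, initializes the covariances with a K-nearest-neighbor procedure that is only stubbed out here).

```python
import numpy as np

def improved_rpcl(X, k, alpha_c=0.06, alpha_r=0.0004, tau=0.1,
                  T=None, max_outer=100, seed=None):
    """Sketch of the improved RPCL algorithm with full covariance matrices.

    X   : (N, d) data matrix
    k   : initial (deliberately over-estimated) number of centers
    tau : minimum fraction of the data a cluster must own to survive Step 4
    T   : number of competitive-learning updates per outer iteration (Step 2)
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    T = N if T is None else T

    # Step 1: initialize centers, covariances and the win counters n_i.
    centers = X[rng.choice(N, size=k, replace=False)].astype(float)
    covs = np.stack([np.eye(d)] * k)      # stand-in for the K-NN initialization
    wins = np.ones(k)
    p = 0

    while p < 2 and max_outer > 0 and len(centers) > 1:
        max_outer -= 1
        inv_covs = np.linalg.inv(covs)

        # Step 2: winner/rival updates, Eqs. (1)-(4).
        for _ in range(T):
            x = X[rng.integers(N)]
            gamma = wins / wins.sum()                    # conscience factors gamma_i
            diff = x - centers                           # (k, d)
            dist = gamma * np.einsum('kd,kde,ke->k', diff, inv_covs, diff)
            c = int(np.argmin(dist))                     # winner, Eq. (2)
            dist[c] = np.inf
            r = int(np.argmin(dist))                     # rival, Eq. (3)
            centers[c] += alpha_c * (x - centers[c])     # learn, u_c = 1
            centers[r] -= alpha_r * (x - centers[r])     # de-learn, u_r = -1
            wins[c] += 1

        # Step 3: assign every pattern to its closest center.
        diff = X[:, None, :] - centers[None, :, :]       # (N, k, d)
        d2 = np.einsum('nkd,kde,nke->nk', diff, inv_covs, diff)
        labels = d2.argmin(axis=1)

        # Re-estimate each cluster's full covariance matrix
        # (the paper leaves the exact update schedule implicit; this is one choice).
        for j in range(len(centers)):
            pts = X[labels == j]
            if len(pts) > d:
                covs[j] = np.cov(pts, rowvar=False) + 1e-6 * np.eye(d)

        # Steps 4-5: discard "abundant units" and update the counter p.
        frac = np.bincount(labels, minlength=len(centers)) / N
        keep = frac >= tau
        if keep.all():
            p += 1                                       # Step 6 stops once p == 2
        else:
            p = 0
            centers, covs, wins = centers[keep], covs[keep], wins[keep]

    return centers, covs
```

The batched Mahalanobis distances are computed with `np.einsum`, which keeps the winner/rival search of Eqs. (2)-(3) to a few lines.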

3 Experiments and Results

3.1 Identification of Gaussian clusters

In this experiment, we used three Gaussian-distributed clusters, each containing 500 samples. The covariance matrices of the three Gaussian distributions are
$$
\begin{bmatrix} 0.5 & 0.2 \\ 0.2 & 0.3 \end{bmatrix}, \qquad
\begin{bmatrix} 0.3 & 0.0 \\ 0.0 & 0.3 \end{bmatrix}, \qquad
\begin{bmatrix} 0.5 & -0.2 \\ -0.2 & 1.0 \end{bmatrix},
$$
and their mean vectors are $(-1.0, 1.0)^T$, $(0.0, -1.0)^T$ and $(1.0, -1.0)^T$. Fig. 1(a) illustrates the three distributions. We fixed the learning rates $\alpha_c$ and $\alpha_r$ at 0.06 and 0.0004 respectively, and the threshold $\tau$ was set to 0.1. The original and the improved RPCL algorithms were started with various numbers of centers, and one hundred simulation runs were performed for each starting condition. Tables 1 and 2 show the number of times (out of 100) that a particular number of clusters was identified by each algorithm.
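For concreteness, here is a minimal NumPy sketch of this synthetic data set (the seed and array names are our own choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

means = [(-1.0, 1.0), (0.0, -1.0), (1.0, -1.0)]
covs = [[[0.5, 0.2], [0.2, 0.3]],
        [[0.3, 0.0], [0.0, 0.3]],
        [[0.5, -0.2], [-0.2, 1.0]]]

# 500 samples per cluster, as in the experiment (1500 samples in total).
X = np.vstack([rng.multivariate_normal(m, c, size=500)
               for m, c in zip(means, covs)])
labels = np.repeat(np.arange(3), 500)
```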

Table 1: Frequency distributions (based on 100 runs) of the numbers of clusters that have been identified by the original RPCL algorithm.

Initial No.                  Final No. of centers
of centers     1    2    3    4    5    6    7    8    9   10
     3         0   83   17    /    /    /    /    /    /    /
     5         0    1   88   11    0    /    /    /    /    /
    10         0    0    8   92    0    0    0    0    0    0


Table 2: Frequency distributions (based on 100 runs) of the numbers of clusters that have been identified by the improved RPCL algorithm.

Initial No.                  Final No. of centers
of centers     1    2    3    4    5    6    7    8    9   10
     3         7   40   53    /    /    /    /    /    /    /
     5        13   24   60    1    2    /    /    /    /    /
    10         8   21   63    5    3    0    0    0    0    0

Figure 1: (a) The training samples; (b) clusters identified by the original RPCL algorithm; (c) clusters identified by the improved RPCL algorithm.

It is evident that the improved RPCL algorithm can identify the three clusters at a higher success rate than the original RPCL algorithm. Figures 1(b) and 1(c) show the centers identified by the original and the improved RPCL algorithms when the initial number of centers was set to 10. Note that these figures are based on two of the simulation runs in which both algorithms succeeded in identifying the three clusters. The centers found by the original RPCL algorithm are $(0.021, -1.013)^T$, $(-1.004, 0.986)^T$ and $(1.161, -0.599)^T$, whereas those found by the improved RPCL are $(1.111, -1.009)^T$, $(-1.114, 1.001)^T$ and $(-0.079, -1.081)^T$. Therefore, the average distance between the cluster centers found by the improved RPCL algorithm and the true centers is 0.112, while that for the original RPCL algorithm is 0.157. This result shows that the original RPCL algorithm not only has a lower success rate (8%) in identifying the clusters, but also gives a poorer estimate of the center locations.
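The average distances quoted above can be checked with a few lines of NumPy; matching each estimated center to its nearest true center is our own (obvious) assumption:

```python
import numpy as np

true_centers = np.array([[-1.0, 1.0], [0.0, -1.0], [1.0, -1.0]])
original = np.array([[0.021, -1.013], [-1.004, 0.986], [1.161, -0.599]])
improved = np.array([[1.111, -1.009], [-1.114, 1.001], [-0.079, -1.081]])

def mean_center_error(found, true):
    # Distance from each found center to its nearest true center, averaged.
    d = np.linalg.norm(found[:, None, :] - true[None, :, :], axis=-1)
    return d.min(axis=1).mean()

print(mean_center_error(original, true_centers))  # approx. 0.157
print(mean_center_error(improved, true_centers))  # approx. 0.113
```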

3.2 Speaker Verification


We have also applied the RPCL and the improved RPCL algorithms to determine the number of centers of elliptical basis function (EBF) networks [4] in an attempt to solve a speaker verification task.

3.2.1 EBF networks

EBF networks are an extension of radial basis function (RBF) networks [5]. Instead of having an equal function width along different input dimensions, each basis function of an EBF network incorporates a full covariance matrix. These matrices enable EBF networks to model complex distributions without the need for a large number of function centers. The $k$-th output of an EBF network has the form
$$
y_k(\vec{x}) = w_{k0} + \sum_{j=1}^{J} w_{kj} \phi_j(\vec{x}) \qquad (5)
$$
where
$$
\phi_j(\vec{x}) = \exp\left\{ -\frac{1}{2\sigma_j} (\vec{x} - \vec{\mu}_j)^T \Sigma_j^{-1} (\vec{x} - \vec{\mu}_j) \right\}. \qquad (6)
$$
In (5) and (6), $\vec{x}$ is the input vector, $\vec{\mu}_j$ and $\Sigma_j$ are the mean vector and covariance matrix of the $j$-th basis function respectively, $w_{k0}$ is a bias term, and $\sigma_j$ is a smoothing parameter controlling the spread of the $j$-th basis function. The basis function parameters $\{\vec{\mu}_j\}_{j=1}^{J}$ and $\{\Sigma_j\}_{j=1}^{J}$ can be estimated by the K-means algorithm and the sample covariance [2], i.e.,
$$
\vec{\mu}_j = \frac{1}{|X_j|} \sum_{\vec{x} \in X_j} \vec{x} \qquad (7)
$$
$$
\Sigma_j = \frac{1}{|X_j|} \sum_{\vec{x} \in X_j} (\vec{x} - \vec{\mu}_j)(\vec{x} - \vec{\mu}_j)^T \qquad (8)
$$
where $X_j$ denotes the $j$-th cluster and $|X_j|$ is the number of training samples in $X_j$. Once $\vec{\mu}_j$ and $\Sigma_j$ are known, the output weights $w_{kj}$ can be found by the technique of singular value decomposition [6].
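As an illustration of (5)-(8), the sketch below builds the matrix of basis-function outputs and solves for the output weights by a least-squares fit (`np.linalg.lstsq`, which uses an SVD internally, in the spirit of the SVD-based computation cited above). The function names and the default smoothing parameter are our own assumptions, not the paper's.

```python
import numpy as np

def ebf_design_matrix(X, mus, sigmas_cov, sigma_smooth=1.0):
    """Basis-function outputs phi_j(x) of Eq. (6), plus a bias column."""
    N, J = X.shape[0], len(mus)
    Phi = np.ones((N, J + 1))                       # column 0 carries the bias term w_k0
    for j in range(J):
        diff = X - mus[j]                           # (N, d)
        inv = np.linalg.inv(sigmas_cov[j])
        maha = np.einsum('nd,de,ne->n', diff, inv, diff)
        Phi[:, j + 1] = np.exp(-maha / (2.0 * sigma_smooth))
    return Phi

def fit_output_weights(X, targets, mus, sigmas_cov):
    """Least-squares (SVD-based) solution for the output weights of Eq. (5)."""
    Phi = ebf_design_matrix(X, mus, sigmas_cov)
    W, *_ = np.linalg.lstsq(Phi, targets, rcond=None)   # shape (J+1, K)
    return W

def ebf_forward(X, W, mus, sigmas_cov):
    """Network outputs y_k(x) for every input row of X."""
    return ebf_design_matrix(X, mus, sigmas_cov) @ W
```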

3.2.2 Speech data

In the experiments, we used 76 speakers (23 female and 53 male) from dialect region 2 of the TIMIT and NTIMIT corpora. TIMIT is a phonetically balanced, continuous speech corpus, and NTIMIT was obtained by passing the speech in the TIMIT corpus through a telephone network, resulting in a telephone-bandwidth corpus. Both corpora contain 630 speakers separated into eight dialect regions. Each speaker spoke two dialectal sentences (the SA sentence set), five phonetically compact sentences (the SX sentence set) and three phonetically diverse sentences (the SI sentence set). Every speaker spoke the same set of sentences in the SA set. In the SX sentence set, some speakers spoke the same sentences. However, all sentences in the SI sentence set are different. The speakers were divided into three sets: a speaker set (20 speakers), an anti-speaker set (20 speakers), and an impostor set (36 speakers). The SA and SX sentence sets were used as the training set, and the SI sentence set was the test set. This arrangement allows us to perform text-independent speaker verification experiments.

The feature vectors that characterize the voice of the speakers were derived from an LPC analysis procedure [7]. For each sentence, the silent regions were removed by using the information provided by the .phn files of the corpus. The remaining signals were pre-emphasized by a filter with transfer function $1 - 0.95z^{-1}$. For every 14 ms, 12th-order LP-derived cepstral coefficients were computed using a 28 ms Hamming window.
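This front end can be approximated with a short NumPy/SciPy sketch. The autocorrelation-method LPC and the LPC-to-cepstrum recursion below follow standard formulations (e.g., [7]); all function names, and any parameter values beyond those stated above, are our assumptions rather than the authors' code.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_cepstra(signal, fs, order=12, frame_ms=28, shift_ms=14, pre=0.95):
    """12th-order LP-derived cepstral coefficients from a pre-emphasized,
    Hamming-windowed signal (28 ms frames, 14 ms shift)."""
    # Pre-emphasis filter 1 - 0.95 z^{-1}
    x = np.append(signal[0], signal[1:] - pre * signal[:-1])

    frame_len = int(fs * frame_ms / 1000)
    shift = int(fs * shift_ms / 1000)
    window = np.hamming(frame_len)

    feats = []
    for start in range(0, len(x) - frame_len + 1, shift):
        frame = x[start:start + frame_len] * window

        # Autocorrelation-method LPC: solve the Toeplitz normal equations R a = r.
        r = np.correlate(frame, frame, mode='full')[frame_len - 1:frame_len + order]
        r[0] += 1e-8                                   # avoid a singular system on silence
        a = solve_toeplitz(r[:order], r[1:order + 1])  # predictor coefficients a_1..a_p

        # LPC -> cepstrum recursion: c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}.
        c = np.zeros(order)
        for n in range(1, order + 1):
            c[n - 1] = a[n - 1] + sum((k / n) * c[k - 1] * a[n - k - 1]
                                      for k in range(1, n))
        feats.append(c)
    return np.array(feats)
```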

3.2.3 Verification procedure and results

Each speaker in the speaker set was assigned a network characterizing his/her own voice. Each network is composed of 12 inputs, 2 outputs and function centers contributed by the corresponding speaker and the speakers in the anti-speaker set. The networks were trained to distinguish the voice of the speakers in the speaker set from that of the anti-speaker set. RPCL and improved RPCL were applied to find the optimal number of function centers $J$ for each network. Each network was started with 10, 15 and 20 centers, and the learning rates were set to $\alpha_c = 0.006$ and $\alpha_r = 0.0004$. The threshold $\tau$ was set to the reciprocal of the number of starting centers, an arrangement found to be effective in avoiding the removal of useful units. The results are shown in Table 3, from which we can see that most networks need fewer than ten centers to model the speaker's voice. A verification procedure similar to that of [4] was applied to evaluate the performance of the EBF networks.
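The scoring details are deferred to [4]; purely as an illustration of how a two-output network of this kind could be used, the fragment below (reusing `ebf_forward` from the sketch in Section 3.2.1) applies one plausible frame-averaged decision rule, which is not necessarily the procedure of [4]:

```python
def verify(frames, W, mus, covs, threshold=0.0):
    """Frame-averaged decision with a 2-output EBF network: output 0 is taken
    as the speaker class and output 1 as the anti-speaker class (assumed)."""
    scores = ebf_forward(frames, W, mus, covs)       # (N_frames, 2)
    margin = scores[:, 0].mean() - scores[:, 1].mean()
    return margin > threshold                        # accept the claimed identity if True
```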

Table 3: Optimal numbers of centers obtained by the original (org.) and the improved (imp.) RPCL algorithms.

speaker     faem0  fajw0  fcaj0  fcmm0  fcyl0
org. rpcl     4      6      6      6      5
imp. rpcl     8      8     10     10     10

speaker     fdas1  fdnc0  fdxw0  marc0  mbjv0
org. rpcl     6      6      5      8      5
imp. rpcl    10      9     10     10     10

speaker     mcew0  mctm0  mdbp0  mdem0  mdlb0
org. rpcl     7      6      9      7      5
imp. rpcl    10      8     12      9     10

speaker     mdlc2  mdmt0  mdps0  mdss0  mdwd0
org. rpcl     7      5      6      6      7
imp. rpcl     8     11     10     10      9

Table 4: Error rates (%) based on the TIMIT corpus.

Initial No.      original RPCL          improved RPCL
of centers    FAR    FRR    AVE      FAR    FRR    AVE
    10        0.96   0.36   0.58     0.64   0.36   0.48
    15        0.68   0.27   0.43     0.36   0.28   0.32
    20        0.38   0.27   0.32     0.43   0.18   0.28

Table 4 and Table 5 show the false acceptance rates (FARs) and the false rejection rates (FRRs) achieved by the EBF networks. The average error rate (AVE) is the square root of the product of the FAR and FRR, which has been shown to be a good performance measure [8]. The results demonstrate the superiority of the improved RPCL algorithm over the original one, suggesting that the full covariance matrices of the improved RPCL algorithm give a better representation of the speaker features. This is because feature vectors with dependent components can be modeled by full covariance matrices, but not by diagonal ones.
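As a check on the tables, the AVE entries follow directly from this definition; for instance, taking the rows with 20 initial centers for the improved RPCL algorithm:
$$
\mathrm{AVE} = \sqrt{\mathrm{FAR} \times \mathrm{FRR}}, \qquad
\sqrt{0.43 \times 0.18} \approx 0.28 \ (\text{Table 4}), \qquad
\sqrt{17.25 \times 11.49} \approx 14.08 \ (\text{Table 5}).
$$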

Table 5: Error rates (%) based on the NTIMIT corpus.

Initial No.      original RPCL          improved RPCL
of centers    FAR    FRR    AVE      FAR    FRR    AVE
    10       32.77   8.91  17.09    23.29  11.49  16.36
    15       23.05  10.56  15.60    19.43  11.80  15.14
    20       21.86  10.49  15.14    17.25  11.49  14.08

4 Conclusion

Selecting an appropriate number of clusters is a difficult problem in the classical K-means algorithm. The purpose of the rival penalized competitive learning (RPCL) algorithm is to address this problem, but its performance is poor when the input vectors contain dependent components. We proposed an improved RPCL algorithm by incorporating full covariance matrices in the distance computation and by discarding the units with only a few training samples. Our results show that the improved RPCL with full covariance matrices gives a better representation of the clusters than the original RPCL algorithm, where only diagonal covariance matrices with equal variances are used. In the speaker verification experiment, the lowest error rate attained by the RPCL algorithm is 0.32% on the TIMIT corpus and 15.14% on the NTIMIT corpus; these were reduced to 0.28% and 14.08% respectively when the improved RPCL was used.

Acknowledgement

This project was supported by The Hong Kong Polytechnic University Grant No. 0351/501.

References

[1] P. A. Devijver and J. Kittler. Pattern Recognition: A Statistical Approach. Prentice Hall, 1982.
[2] R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. Wiley, 1973.
[3] L. Xu, A. Krzyzak, and E. Oja. Rival penalized competitive learning for clustering analysis, RBF net, and curve detection. IEEE Transactions on Neural Networks, 4(4):636-649, 1993.
[4] M. W. Mak. Text-independent speaker verification over a telephone network by radial basis function networks. In International Symposium on Multi-Technology Information Processing, pages 145-150, Taiwan, 1996. National Tsing Hua University.
[5] J. Moody and C. J. Darken. Fast learning in networks of locally tuned processing units. Neural Computation, 1:281-294, 1989.
[6] W. H. Press et al. Numerical Recipes in C. Cambridge University Press, 1994.
[7] J. Makhoul. Linear prediction: A tutorial review. Proceedings of the IEEE, 63:561-580, 1975.
[8] G. R. Doddington. Voice authentication gets the go-ahead for security systems. Speech Technology, 2:14-23, 1983.
