An Adaptive Learning Algorithm for Supervised Neural Network with Contour Preserving Classification

Piyabute Fuangkhon, Thitipong Tanprasert
Distributed and Parallel Computing Research Laboratory
Faculty of Science and Technology, Assumption University
592 Soi Ramkamhang 24, Ramkamhang Road, Huamak, Bangkapi, Bangkok, Thailand
[email protected], [email protected]

Abstract. A study of the noise tolerance characteristics of an adaptive learning algorithm for a supervised neural network is presented in this paper. The algorithm allows the existing knowledge to age out at a slow rate as the supervised neural network is gradually retrained with consecutive sets of new samples, resembling the change of application locality under a consistent environment. The algorithm utilizes the contour preserving classification algorithm to preprocess the training data in order to improve both classification accuracy and noise tolerance. The experimental results convincingly confirm the effectiveness of the algorithm and the improvement of noise tolerance.

Keywords: supervised neural network; contour preserving classification; noise tolerance; outpost vector

1 Introduction

It is known that repetitive feeding of training samples is required for allowing a supervised learning algorithm to converge. If the training samples effectively represent the population of the targeted data, the classifier can be regarded as generalized. However, it is often impractical to obtain such a truly representative training set, and many classification applications are acceptable with convergence to a local optimum. For example, a voice recognition system may be customized for effectively recognizing the voices of a limited group of users, so the system may not be practical for recognizing an arbitrary speaker. As a consequence, this kind of application needs occasional retraining when the actual context locality slides.

Our focus is on the case where only part of the context changes, thereby establishing some new cases while inhibiting some old cases, assuming a constant system complexity. The classifier is then required to handle some old cases as well as the new cases effectively. Assuming that this kind of situation occurs occasionally, it is expected that the oldest cases will age out, the medium-old cases will be handled accurately to a certain degree, and the new cases will be handled most accurately. Since the existing knowledge is lost while retraining on new samples, an approach to maintaining old knowledge is required.

The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-05253-8_43


While the typical solution uses both prior samples and new samples on retraining, the major drawback of this approach is that all the prior samples used for training must be maintained.

Research works related to the proposed algorithm are in the fields of adaptive learning [1], [2], incremental learning [16], and contour preserving classification [3]. The first of these can be categorized into three strategies [3]. The first strategy (increasing neurons) [4], [5], [6], [7] increases the number of hidden nodes when the error is excessively high. These algorithms adapt only the weights of the neuron that is closest to the input sample and of its neighbors. However, the increasing size of the neural network causes an accuracy tradeoff. The second strategy (rule extraction) [8], [9], [10], [11] translates back and forth between rules and neuron weights. The accuracy of the network depends on the discovered rules, and the translation of weight vectors into rules also partly suppresses certain inherited statistical information. The last strategy (aggregation) [12], [13], [14], [15] allows existing weights to change within a bounded range and always adds new neurons for learning samples of the new context. This method requires the two contexts to be similar, and the network's size grows incrementally.

In this paper, an alternative algorithm is proposed for solving the adaptive learning problem for a supervised neural network. The algorithm, improved from [17], [18], is able to learn new knowledge while maintaining old knowledge through the decay rate, while allowing the adjustment of the number of new samples. In addition, the improvement of classification and noise tolerance is achieved by utilizing the contour preserving classification algorithm, which helps expand the territory of both classes while maintaining the shape of both classes.

Following this section, Section 2 summarizes the outpost vector model. Section 3 describes the methodology. Section 4 demonstrates the experimental results of a 2-dimensional partition problem. Section 5 discusses the conclusion of the paper.

2 The Outpost Vector Model

This section summarizes the outpost vector method as originally published in [3]. Fig. 1 illustrates the concept of the outpost vector. Each input vector is modeled to span its territory as a circle (a sphere in 3-dimensional space, or a hypersphere in higher-dimensional space) until the territories collide against one another. In Fig. 1, the territory of input vector k of class A (denoted by Ak) is established by locating the input vector in class B which is nearest to Ak (denoted by B*(Ak)) and declaring the territory boundary at half way between Ak and B*(Ak). Accordingly, the radius of Ak's territory is set at half of the distance between Ak and B*(Ak). This guarantees that if B*(Ak) sets its territory using the same radius, then the distance from the separating hyperplane to either Ak or B*(Ak) will be at maximum.

For B*(Ak), the nearest input vector of class A to B*(Ak) is not necessarily Ak, so B*(Ak) may not place its outpost vector against Ak although Ak has placed an outpost vector against it. Instead, B*(Ak) places its outpost vector against Aj because Aj is nearest to B*(Ak). Since the aim is to carve the channel between the two classes in such a way that its concavity or convexity can influence the learning process, it is desirable to generate an additional set of outpost vectors. As illustrated in Fig. 1, an additional outpost vector of B*(Ak) (designated as class B) is placed on B*(Ak)'s territory in the direction of Ak, in response to the existence of an outpost vector of Ak in that direction. In [3], a theorem was provided proving that the territories created will not overlap across different classes. All the outpost vectors are combined with the original input vectors for training the network.

Fig. 1. Territory and Outpost Vectors
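To make the construction concrete, the following is a minimal NumPy sketch of outpost vector generation as summarized above. The function name, the array-based interface, and the choice to place each outpost vector exactly on the territory boundary are assumptions made for illustration; this is not the reference implementation of [3].

import numpy as np

def outpost_vectors(own_class, other_class):
    """Generate outpost vectors for `own_class` against `other_class`.

    For every vector a, the nearest vector of the other class defines a
    territory of radius half their distance, and an outpost vector
    (labelled with a's class) is placed on that territory boundary in
    the direction of the nearest enemy vector.
    """
    outposts = []
    for a in own_class:
        dists = np.linalg.norm(other_class - a, axis=1)   # distances to the other class
        b = other_class[np.argmin(dists)]                 # nearest enemy vector, B*(a)
        radius = dists.min() / 2.0                        # territory radius of a
        direction = (b - a) / np.linalg.norm(b - a)       # unit vector toward B*(a)
        outposts.append(a + radius * direction)           # outpost on the territory boundary
    return np.array(outposts)

# The outposts of both classes are merged with the original vectors, e.g.:
# train_A = np.vstack([A, outpost_vectors(A, B)])
# train_B = np.vstack([B, outpost_vectors(B, A)])

Calling the same routine with the classes swapped produces the additional outpost vectors described above, so the channel between the two classes is carved from both sides.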

3 Methodology

The algorithm [17], [18] utilizes the concepts of adaptive learning and outpost vectors in modeling the new training samples. The adaptive learning algorithm maintains a limited number of training samples from the previous training session (decayed prior samples) to be used in the next training session, while the outpost vectors help expand the territory of both classes and maintain the shape of the boundary between them. There are three parameters in the algorithm: the new sample rate, the outpost vector rate, and the decay rate.

Firstly, the new sample rate is the ratio of the number of selected new samples to the number of new samples. It determines the number of selected new samples to be included in the final training set. A larger new sample rate causes the network to learn new knowledge more accurately. The number of selected new samples is calculated by

    nss = nw × ns                                            (1)

where nss is the number of selected new samples, nw is the new sample rate [0, ∞), and ns is the number of new samples.

Secondly, the outpost vector rate is the ratio of the number of generated outpost vectors to the number of new samples. It determines the number of outpost vectors to be included in the final training set. Using a larger outpost vector rate generates more outpost vectors at the boundary between the two classes. When the outpost vector rate is equal to 1.0, the number of outpost vectors is at its maximum. The number of outpost vectors is calculated by

    nov = ov × ns                                            (2)

where nov is the number of outpost vectors, ov is the outpost vector rate [0, 1], and ns is the number of new samples.

Lastly, the decay rate is the ratio of the number of decayed prior samples to the number of new samples. It determines the number of decayed prior samples to be included in the final training set. A larger decay rate causes the network to forget old knowledge more slowly. When the decay rate is greater than 1.0, more than one instance of some prior samples will be included in the decayed prior sample set. There is an exception for the first training session, where the new samples are also used as the decayed prior samples. The number of decayed prior samples is calculated by

    ndc = dc × ps                                            (3)

where ndc is the number of decayed prior samples, dc is the decay rate [0, ∞), and ps is the number of prior samples.

After the final training set is available, it is used to train a supervised neural network. The whole process is repeated when new samples arrive.

Algorithm Sub_Training_with_Outpost_Vectors
  Set new sample set as prior sample set for the first training session
  for each training session
    Construct selected new sample set:
      if new sample rate is equal to 1.0
        Select all new samples from new sample set
      else
        Calculate the number of selected new samples (1)
        Randomly select samples from new sample set
      end if
    Construct outpost vector set:
      Calculate the number of outpost vectors (2)
      Generate outpost vectors from new sample set
    Construct decayed prior sample set:
      Calculate the number of decayed prior samples (3)
      Randomly select samples from prior sample set
    Construct final sample set:
      UNION (selected new sample set, outpost vector set, decayed prior sample set)
    Train the network with the final sample set
    Set final sample set as prior sample set for the next training session
  end for

In order to study the noise tolerance of the algorithm, a random noise radius is used to move the samples to a nearby location. Samples at the boundary can even be moved across the boundary. However, the new location must also be outside the territory of the other class.
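As a concrete illustration of how the three rates combine in one training session, the sketch below assembles a final sub training set from a new sample set and a prior sample set. The function names, the array layout, and the use of sampling with replacement whenever a rate produces more samples than are available are assumptions made for illustration; generate_outposts stands for the contour preserving step of Section 2.

import numpy as np

rng = np.random.default_rng(0)

def pick(samples, count):
    # Randomly select `count` rows; sample with replacement when the pool is smaller.
    idx = rng.choice(len(samples), size=count, replace=count > len(samples))
    return samples[idx]

def build_final_set(new_set, prior_set, nw, ov, dc, generate_outposts):
    ns = len(new_set)                      # number of new samples
    ps = len(prior_set)                    # number of prior samples
    nss = int(nw * ns)                     # Eq. (1): selected new samples
    nov = int(ov * ns)                     # Eq. (2): outpost vectors
    ndc = int(dc * ps)                     # Eq. (3): decayed prior samples

    selected_new = new_set if nw == 1.0 else pick(new_set, nss)
    outposts = pick(generate_outposts(new_set), nov)
    decayed = pick(prior_set, ndc)

    # The union of the three parts is both the training set for this
    # session and the prior sample set for the next session.
    return np.vstack([selected_new, outposts, decayed])

For example, with 200 new samples and nw = 1.0, ov = 0.5, dc = 1.0, the call returns 200 + 100 + 200 vectors, which is the composition shown later in Table 1.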

4 Experiment

The experiment was conducted on a machine with an Intel® Pentium D™ 820 2.4 GHz processor and 2.0 GB of main memory running Microsoft® Windows™ XP SP3. A feed-forward backpropagation neural network running under MATLAB 2007a was used as the classifier. The proposed algorithm was tested on the 2-dimensional partition problem. The distribution of samples was created in a limited location of a 2-dimensional donut ring as shown in Fig. 2. The partition had three parameters: Inner Radius (R1), Middle Radius (R2) and Outer Radius (R3). The class of a sample depended on its geometric position. There were two classes, designated as one and zero.

    Inner Radius     0 < Radius ≤ R1      Class 0
    Middle Radius    R1 < Radius ≤ R2     Class 1
    Outer Radius     R2 < Radius ≤ R3     Class 0

The context of the problem was assumed to shift from one angular location to another while maintaining some overlapping area between consecutive contexts, as shown in Fig. 3. The set numbers shown in Fig. 3 identify the sequence of training and testing sessions. In the experiment, each training and testing set consisted of eight sub sets of samples generated from eight problem contexts. Each sub set consisted of 400 new samples (200 samples from class 0 and 200 samples from class 1). The samples from class 0 were placed in the outer and inner radius bands; the samples from class 1 were placed in the middle radius band. In the sample generation process, the radius of the donut ring was set to 100. Placement of the samples from both classes was restricted by a gap, or empty space, introduced between the two classes to test the noise tolerance of the network. There were two training sets, having a gap of size 5 (Set A) and 10 (Set B), as shown in Fig. 4. A gap of size 0 was not used to generate a training set because it leaves no space at the boundary between the two classes in which to generate outpost vectors. There were six main testing sets having gap and noise radius 5:0 (Set M), 5:5 (Set N), 5:10 (Set O), 10:0 (Set P), 10:10 (Set Q), and 10:20 (Set R), as shown in Fig. 5. To test the noise tolerance of the training sets, four testing sets having gap and noise radius 5:5 (Set S), 5:10 (Set T), 10:10 (Set U), and 10:20 (Set V) were generated by adding the noise radius to all samples in the two training sets.
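The following sketch shows one way such a sub set could be generated. The uniform sampling over an angular slice, the particular values of R1 and R2, how the gap is split between the two classes, and the omission of the check that a noisy sample stays outside the other class's territory are all assumptions made for illustration; only R3 = 100 and the class layout are taken from the text.

import numpy as np

rng = np.random.default_rng(0)

R1, R2, R3 = 33.0, 66.0, 100.0          # assumed band radii; the paper only fixes R3 = 100

def generate_subset(theta_lo, theta_hi, gap, noise, n_per_class=200):
    # One problem context: class 1 in the middle band, class 0 in the
    # inner and outer bands, with an empty gap around the class-1 band.
    def band(r_lo, r_hi, n):
        theta = rng.uniform(theta_lo, theta_hi, n)
        r = rng.uniform(r_lo, r_hi, n)
        return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

    def jitter(points):
        # Move each sample to a nearby location within the noise radius.
        ang = rng.uniform(0.0, 2 * np.pi, len(points))
        rad = rng.uniform(0.0, noise, len(points))
        return points + np.column_stack([rad * np.cos(ang), rad * np.sin(ang)])

    half = gap / 2.0                     # assumed: the gap is split evenly between the classes
    class0 = np.vstack([band(0.0, R1 - half, n_per_class // 2),
                        band(R2 + half, R3, n_per_class - n_per_class // 2)])
    class1 = band(R1 + half, R2 - half, n_per_class)

    if noise > 0:
        class0, class1 = jitter(class0), jitter(class1)
    return class0, class1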


The noise samples introduced into the testing sets (Sets N, O, Q, R, S, T, U, V) were intended to test the noise tolerance of the network when outpost vectors were applied.

In the training process, sub training (training on a sub set of the training set) was conducted eight times with eight sub training sets to cover the eight problem contexts in a training set. Each final sub training set was composed of three components:

1. Selected new samples taken from the sub training set
2. Outpost vectors generated from the sub training set
3. Decayed prior samples randomly selected from the final sub training set of the previous training session

The number of vectors in each part of the final sub training set was determined by the new sample rate, the outpost vector rate, and the decay rate. Table 1 shows the number of vectors in a sample final sub training set when the new sample set consisted of 200 samples, the new sample rate was 1.0, the outpost vector rate was 0.5 and the decay rate was 1.0. Fig. 6 shows samples of the final training set for the last training session when the decay rate is equal to 1.0 and 2.0 respectively. Because the prior sample set is constructed at the end of the algorithm, there is no prior sample set from which to make the decayed prior sample set for the first sub training session. The additional step for solving this problem is to also use the new sample set as the prior sample set.

The procedure for the experiment started from the feed-forward backpropagation neural network being trained with the following parameters (a rough non-MATLAB analogue of this configuration is sketched at the end of this section):

1. network size = [10 1]
2. transfer function for hidden layer = "logsig"
3. transfer function for output layer = "logsig"
4. max epochs = 500
5. goal = 0.001

After the first training session (S1), seven sub training sessions followed. At the end of the eighth sub training session (S8), the supervised neural network was tested with the sub testing samples from every context to evaluate its performance on testing data from each context. The testing results are shown in Tables 2, 3, 4, 5, 6, 7, and 8. For the testing sets without noise samples (Sets M, P), applying outpost vectors lowers the mean square error (MSE) effectively. For the testing sets with noise samples and a small gap (Sets N, O), a medium outpost vector rate (OV 0.5) gives better results. For the testing sets with noise samples and a large gap (Sets Q, R), a large outpost vector rate (OV 1.0) gives better results. For the testing sets with noise samples generated from the training sets (Sets S, T, U, V), a large outpost vector rate (OV 1.0) generally gives better results.
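For readers who want to reproduce the setup outside MATLAB, the following is a rough scikit-learn analogue of the stated network parameters. It is an approximation only: the assumed solver, the tolerance-based stopping criterion standing in for the MSE goal, and the use of warm_start to mimic retraining the same network across sessions are choices made for this sketch, not details from the paper.

from sklearn.neural_network import MLPClassifier

# One hidden layer of 10 logistic ("logsig") units, up to 500 epochs,
# and an error goal of 0.001 approximated by scikit-learn's tolerance.
clf = MLPClassifier(hidden_layer_sizes=(10,),
                    activation="logistic",
                    solver="sgd",
                    max_iter=500,
                    tol=1e-3,
                    warm_start=True,      # successive fits continue from the current weights
                    random_state=0)

# Per-session retraining and per-context evaluation (sketch):
# for final_x, final_y in sub_training_sets:
#     clf.fit(final_x, final_y)
# mse = ((clf.predict_proba(test_x)[:, 1] - test_y) ** 2).mean()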

Fig. 2. Shape of Partition Area

Fig. 3. Shifting of Problem Context


The testing results show that the proposed algorithm can classify samples in the newer contexts (S1, S7, S8) accurately, while the accuracy of classifying samples from the older contexts (S2, S3, S4, S5, S6) is lower because the old knowledge is decaying. The proposed algorithm presents some level of noise tolerance because the difference between the mean square errors (MSEs) of the classification of testing sets with and without noise samples is insignificant.

Fig. 4. Sample Sub Training Sets in S8: (a) Set A, Gap 5; (b) Set B, Gap 10

Fig. 5. Sample Sub Testing Sets in S8: (a) Set M, Gap 5, Noise 0; (b) Set N, Gap 5, Noise 5; (c) Set O, Gap 5, Noise 10; (d) Set P, Gap 10, Noise 0; (e) Set Q, Gap 10, Noise 10; (f) Set R, Gap 10, Noise 20

Table 1. The Number of Vectors in a Final Sub Training Set

    Type                     Rate    Vectors
    Selected New Samples     1.0     200 (200 × 1.0)
    Outpost Vectors          0.5     100 (200 × 0.5)
    Decayed Prior Samples    1.0     200 (200 × 1.0)


5 Conclusion

A study of the noise tolerance characteristics of an adaptive learning algorithm for a supervised neural network is presented. Noise samples are used to test the noise tolerance of the algorithm. The overall result shows that combining the adaptive learning algorithm with contour preserving classification yields effective noise tolerance, better learning capability, and higher accuracy.

Fig. 6. Sample Last Sub Training Set: (a) 1200 Samples, Gap 5, Decay 1; (b) 1600 Samples, Gap 5, Decay 2

Table 2. MSEs from Training Set A with Testing Set M, N, O and Decay Rate 1

    SET  OV   NS   S1    S2    S3    S4    S5    S6    S7    S8
    M    0.0  00   0.13  0.21  0.17  0.16  0.25  0.14  0.01  0.03
    N    0.0  05   0.14  0.19  0.12  0.17  0.26  0.14  0.01  0.04
    O    0.0  10   0.14  0.21  0.16  0.19  0.25  0.14  0.03  0.04
    M    0.5  00   0.02  0.14  0.15  0.02  0.00  0.00  0.00  0.00
    N    0.5  05   0.02  0.14  0.15  0.03  0.00  0.00  0.00  0.00
    O    0.5  10   0.02  0.15  0.14  0.03  0.01  0.00  0.00  0.00
    M    1.0  00   0.20  0.38  0.27  0.14  0.06  0.02  0.01  0.01
    N    1.0  05   0.21  0.38  0.27  0.13  0.06  0.03  0.01  0.01
    O    1.0  10   0.21  0.37  0.27  0.14  0.07  0.03  0.02  0.01

Table 3. MSEs from Training Set A with Testing Set M, N, O and Decay Rate 2

    SET  OV   NS   S1    S2    S3    S4    S5    S6    S7    S8
    M    0.0  00   0.27  0.30  0.28  0.28  0.28  0.23  0.21  0.21
    N    0.0  05   0.28  0.30  0.28  0.27  0.28  0.25  0.20  0.22
    O    0.0  10   0.27  0.30  0.27  0.28  0.27  0.23  0.21  0.21
    M    0.5  00   0.12  0.23  0.13  0.02  0.01  0.00  0.00  0.00
    N    0.5  05   0.12  0.23  0.13  0.03  0.01  0.00  0.00  0.00
    O    0.5  10   0.12  0.24  0.13  0.03  0.02  0.01  0.01  0.01
    M    1.0  00   0.18  0.32  0.31  0.20  0.11  0.08  0.07  0.05
    N    1.0  05   0.18  0.32  0.31  0.20  0.10  0.09  0.07  0.06
    O    1.0  10   0.18  0.32  0.31  0.20  0.11  0.09  0.08  0.06

Table 4. MSEs from Training Set B with Testing Set P, Q, R and Decay Rate 1

    SET  OV   NS   S1    S2    S3    S4    S5    S6    S7    S8
    P    0.0  00   0.17  0.30  0.26  0.19  0.21  0.16  0.08  0.06
    Q    0.0  10   0.18  0.31  0.27  0.20  0.21  0.16  0.08  0.07
    R    0.0  20   0.17  0.30  0.27  0.20  0.21  0.17  0.10  0.09
    P    0.5  00   0.27  0.45  0.38  0.39  0.35  0.19  0.08  0.04
    Q    0.5  10   0.27  0.45  0.38  0.39  0.34  0.20  0.08  0.04
    R    0.5  20   0.27  0.45  0.37  0.38  0.33  0.21  0.09  0.07
    P    1.0  00   0.22  0.31  0.22  0.26  0.25  0.23  0.12  0.01
    Q    1.0  10   0.25  0.34  0.22  0.27  0.25  0.23  0.13  0.01
    R    1.0  20   0.24  0.34  0.25  0.29  0.24  0.25  0.14  0.05

Table 5. MSEs from Training Set B with Testing Set P, Q, R and Decay Rate 2

    SET  OV   NS   S1    S2    S3    S4    S5    S6    S7    S8
    P    0.0  00   0.22  0.32  0.35  0.33  0.28  0.25  0.20  0.17
    Q    0.0  10   0.24  0.32  0.35  0.33  0.29  0.25  0.21  0.18
    R    0.0  20   0.25  0.32  0.35  0.34  0.29  0.25  0.21  0.20
    P    0.5  00   0.16  0.31  0.34  0.29  0.19  0.08  0.03  0.03
    Q    0.5  10   0.16  0.31  0.34  0.29  0.19  0.08  0.03  0.04
    R    0.5  20   0.16  0.31  0.33  0.29  0.20  0.10  0.05  0.06
    P    1.0  00   0.17  0.28  0.15  0.03  0.09  0.10  0.01  0.00
    Q    1.0  10   0.17  0.30  0.16  0.04  0.09  0.09  0.01  0.01
    R    1.0  20   0.18  0.30  0.17  0.08  0.11  0.11  0.02  0.02

Table 6. MSEs from Training Set A with Testing Set S, T and Decay Rate 1

    SET  OV   NS   S1    S2    S3    S4    S5    S6    S7    S8
    S    0.0  05   0.13  0.19  0.14  0.15  0.24  0.14  0.01  0.02
    T    0.0  10   0.13  0.25  0.18  0.14  0.24  0.14  0.08  0.02
    S    0.5  05   0.01  0.14  0.15  0.02  0.00  0.00  0.00  0.00
    T    0.5  10   0.02  0.15  0.14  0.02  0.00  0.00  0.02  0.00
    S    1.0  05   0.19  0.37  0.27  0.14  0.06  0.03  0.01  0.01
    T    1.0  10   0.18  0.39  0.26  0.14  0.07  0.02  0.02  0.01

Table 7. MSEs from Training Set B with Testing Set U, V and Decay Rate 1

    SET  OV   NS   S1    S2    S3    S4    S5    S6    S7    S8
    U    0.0  10   0.18  0.30  0.27  0.20  0.21  0.16  0.07  0.08
    V    0.0  20   0.20  0.30  0.26  0.21  0.21  0.16  0.09  0.07
    U    0.5  10   0.29  0.43  0.41  0.39  0.35  0.17  0.07  0.05
    V    0.5  20   0.31  0.44  0.37  0.42  0.35  0.17  0.08  0.06
    U    1.0  10   0.24  0.32  0.22  0.27  0.24  0.21  0.11  0.01
    V    1.0  20   0.28  0.29  0.23  0.27  0.25  0.21  0.14  0.03

Table 8. MSEs from Training Set B with Testing Set U, V and Decay Rate 2

    SET  OV   NS   S1    S2    S3    S4    S5    S6    S7    S8
    U    0.0  10   0.23  0.37  0.30  0.33  0.29  0.24  0.20  0.19
    V    0.0  20   0.28  0.26  0.37  0.29  0.28  0.24  0.21  0.18
    U    0.5  10   0.18  0.33  0.34  0.29  0.19  0.07  0.03  0.04
    V    0.5  20   0.20  0.30  0.33  0.29  0.19  0.07  0.03  0.05
    U    1.0  10   0.18  0.31  0.17  0.03  0.08  0.10  0.01  0.01
    V    1.0  20   0.21  0.26  0.15  0.08  0.09  0.10  0.02  0.01


References

[1] T. Tanprasert and T. Kripruksawan, "An approach to control aging rate of neural networks under adaptation to gradually changing context", ICONIP'02, 2002.
[2] T. Tanprasert and S. Kaitikunkajorn, "Improving synthesis process of decayed prior sampling technique", InTech'05, 2005.
[3] T. Tanprasert, C. Tanprasert, and C. Lursinsap, "Contour preserving classification for maximal reliability", IJCNN'98, 1998.
[4] V. Burzevski and C. K. Mohan, "Hierarchical growing cell structures", ICNN'96, 1996.
[5] B. Fritzke, "Vector quantization with a growing and splitting elastic net", ICANN'93, 1993.
[6] B. Fritzke, "Incremental learning of local linear mappings", ICANN'95, 1995.
[7] T. M. Martinez, S. G. Berkovich, and K. J. Schulten, "Neural-gas network for vector quantization and its application to time-series prediction", IEEE Transactions on Neural Networks, 1993.
[8] S. Chalup, R. Hayward, and D. Joachi, "Rule extraction from artificial neural networks trained on elementary number classification tasks", Proceedings of the 9th Australian Conference on Neural Networks, 1998.
[9] M. W. Craven and J. W. Shavlik, "Using sampling and queries to extract rules from trained neural networks", ICML'94, 1994.
[10] R. Setiono, "Extracting rules from neural networks by pruning and hidden-unit splitting", Neural Computation, 1997.
[11] R. Sun, "Beyond simple rule extraction: Acquiring planning knowledge from neural networks", ICONIP'01, 2001.
[12] S. Thrun and T. M. Mitchell, "Integrating inductive neural network learning and explanation-based learning", IJCAI'93, 1993.
[13] G. G. Towell and J. W. Shavlik, "Knowledge-based artificial neural networks", Artificial Intelligence, 1994.
[14] T. Mitchell and S. B. Thrun, "Learning analytically and inductively", Mind Matters: A Tribute to Allen Newell, 1996.
[15] P. Fasconi, M. Gori, M. Maggini, and G. Soda, "Unified integration of explicit knowledge and learning by example in recurrent networks", IEEE Transactions on Knowledge and Data Engineering, 1995.
[16] R. Polikar, L. Udpa, S. S. Udpa, and V. Honavar, "Learn++: An incremental learning algorithm for supervised neural networks", IEEE Transactions on Systems, Man, and Cybernetics, 2001.
[17] T. Tanprasert, P. Fuangkhon, and C. Tanprasert, "An Improved Technique for Retraining Neural Networks in Adaptive Environment", INTECH'08, 2008.
[18] P. Fuangkhon and T. Tanprasert, "An Incremental Learning Algorithm for Supervised Neural Network with Contour Preserving Classification", ECTI-CON'09, 2009.
