A Dual Network Adaptive Learning Algorithm for Supervised Neural Network with Contour Preserving Classification for Soft Real Time Applications

Piyabute Fuangkhon, Thitipong Tanprasert

Distributed and Parallel Computing Research Laboratory, Assumption University
592 Soi Ramkamhang 24, Ramkamhang Road, Hua Mak, Bangkapi, Bangkok 10240, Thailand
[email protected], [email protected]

Abstract. A framework presenting a basic conceptual structure for solving adaptive learning problems in soft real time applications is proposed. Its design consists of two supervised neural networks running simultaneously: one is used for training data and the other for testing data. The accuracy of the classification is improved over the previous works by adding outpost vectors generated from prior samples. The testing function can test data continuously without being interrupted while the training function is being executed. The framework is designed for a parallel and/or distributed processing environment because of the high processing power demanded by the repetitive training of the neural network.

Keywords: Supervised Neural Network; Outpost Vector; Contour Preserving Classification; Feed-forward Back-propagation; Soft Real Time Applications

1 Introduction

It is known that repetitive feeding of training samples is required for a supervised learning algorithm to converge. If the training samples effectively represent the population of the targeted data, the classifier can be regarded as approximately generalized. However, it is often impractical to obtain such a truly representative training set, and many classification applications accept convergence to a local optimum. As a consequence, this kind of application needs occasional retraining when the actual context locality drifts. Assuming a constant system complexity, when the context is partially changed, some new cases are introduced while some old cases are inhibited. The classifier is then required to handle some old cases effectively as well as the new cases. Assuming that this kind of situation occurs occasionally, it is expected that the oldest cases age out, medium-old cases are handled with reasonable accuracy, and new cases are handled most accurately. Since the existing knowledge is lost while retraining on new samples, an approach to maintaining old knowledge is required.


While the typical solution retrains on both prior samples and new samples, the major drawback of this approach is that all the prior training samples must be maintained. In soft real time applications, where new samples keep arriving for both the training and testing processes and deadlines are desirable rather than strict, the adaptive learning algorithm must be able to train and test data simultaneously. With only one supervised neural network [18], [19], the training and testing processes cannot be executed at the same time.

In this paper, a framework for solving adaptive learning problems with a supervised neural network is proposed for soft real time applications. The framework, based on [18], [19], improves the accuracy of the classification by adding outpost vectors generated from prior samples and uses two neural networks to train and test data simultaneously. Research related to this paper falls in the fields of adaptive learning [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], incremental learning [16], contour preserving classification [17], and distributed and parallel neural networks [20], [21], [22]. Following this section, Section 2 describes the proposed framework, Section 3 presents the experimental results, and Section 4 concludes the paper.

2 Framework

The proposed framework consists of two supervised neural networks which are processed simultaneously to solve adaptive learning problems in soft real time applications. The first supervised neural network, called the "train-network", is used in the training process, and the second, called the "test-network", is used in the testing process. Figure 1 presents the proposed framework. It shows how the final training set is constructed and how the dual network adaptive learning algorithm works. There are three parameters in [18], [19], but there are four parameters in the proposed framework: the new sample rate (α), the outpost vector rate for new samples (β), the decay rate (γ), and the outpost vector rate for prior samples (δ). The outpost vector rate for prior samples is added to sharpen the boundary between classes of prior data. The function of each parameter is as follows.

Firstly, the new sample rate (α) is the ratio of the number of selected new samples to the number of arriving new samples. It determines the number of selected new samples to be included in the final training set. Using a larger new sample rate causes the network to learn new knowledge more accurately. When the new sample rate is greater than 1.0, some new samples will be included in the final training set more than once. The number of selected new samples is calculated by:

    nns = α × ns .    (1)

where nns is the number of selected new samples, α is the new sample rate [0, ∞), and ns is the number of new samples.


Secondly, the outpost vector rate for new samples (β) is the ratio of the number of generated outpost vectors to the number of selected new samples. It determines the number of outpost vectors generated from new samples to be included in the final training set. Using a larger outpost vector rate for new samples sharpens the boundary between classes of new data. When the outpost vector rate for new samples is greater than 1.0, duplicated locations in the problem space will be assigned to outpost vectors. The number of outpost vectors generated from new samples is calculated by:

    novns = β × ns .    (2)

where novns is the number of outpost vectors generated from new samples, β is the outpost vector rate for new samples [0, ∞), and ns is the number of new samples.

Thirdly, the decay rate (γ) is the ratio of the number of decayed prior samples to the number of selected new samples. It determines the number of decayed prior samples to be included in the final training set. A larger decay rate causes the network to forget old knowledge more slowly. When the decay rate is greater than 1.0, more than one instance of some prior samples will be included in the decayed prior sample set. The number of decayed prior samples is calculated by:

    ndc = γ × ps .    (3)

where ndc is the number of decayed prior samples, γ is the decay rate [0, ∞), and ps is the number of prior samples.

Lastly, the outpost vector rate for prior samples (δ) is the ratio of the number of generated outpost vectors to the number of prior samples. It determines the number of outpost vectors generated from prior samples to be included in the final training set. Using a larger outpost vector rate for prior samples sharpens the boundary between classes of prior data. When the outpost vector rate for prior samples is greater than 1.0, duplicated locations in the problem space will be assigned to the outpost vectors. The number of outpost vectors generated from prior samples is calculated by:

    novps = δ × ps .    (4)

where novps is the number of outpost vectors generated from prior samples, δ is the outpost vector rate for prior samples [0, ∞), and ps is the number of prior samples.

After the selected new sample set, the outpost vector set generated from new samples, the decayed prior sample set, and the outpost vector set generated from prior samples have been generated, these sets are combined (UNION) to form the final training set for the training process. After the final training set is constructed, the train-network is trained with it. The training is repeated if the performance of the train-network is lower than the goal. When the performance of the train-network meets the goal, the train-network is turned into the test-network, which is used for the testing process. The previous test-network, if any, is discarded. The training starts again when a specified number of new samples have been collected. Both the train-network and the test-network run simultaneously to solve adaptive learning problems in soft real time applications. Figure 2 presents the algorithm of the training process. The statement in [18], [19] that sets the new sample set as a dummy decayed prior sample set in the first training session is removed.
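
To make the construction concrete, the following sketch (in Python, not from the paper) computes the four counts from formulas (1) to (4) and unions the resulting subsets. Sampling with replacement when a rate exceeds 1.0 reflects the note that samples may then appear more than once; make_outpost_vectors is only a stand-in for the contour preserving construction of [17], [18], [19], and all names are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def make_outpost_vectors(samples, count):
    # Stand-in for the contour preserving outpost vector construction of
    # [17], [18], [19]: here, randomly chosen samples are simply jittered.
    picks = rng.choice(len(samples), count, replace=True)
    return [samples[i] + rng.normal(scale=0.01, size=samples[i].shape) for i in picks]

def build_final_training_set(new_samples, prior_samples, alpha, beta, gamma, delta):
    # new_samples and prior_samples are NumPy arrays of shape (n, d); labels omitted for brevity.
    ns, ps = len(new_samples), len(prior_samples)
    nns   = int(alpha * ns)   # (1) selected new samples
    novns = int(beta  * ns)   # (2) outpost vectors from new samples
    ndc   = int(gamma * ps)   # (3) decayed prior samples
    novps = int(delta * ps)   # (4) outpost vectors from prior samples

    # Rates above 1.0 repeat some samples, hence sampling with replacement.
    selected_new  = [new_samples[i]   for i in rng.choice(ns, nns, replace=True)]
    decayed_prior = [prior_samples[i] for i in rng.choice(ps, ndc, replace=True)]
    ov_new   = make_outpost_vectors(new_samples,   novns)
    ov_prior = make_outpost_vectors(prior_samples, novps)

    # UNION of the four component sets to form the final training set.
    return selected_new + ov_new + decayed_prior + ov_prior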


Fig. 1. Proposed Framework

for each training session
    Construct selected new sample set by
        Calculate nns by (1)
        Randomly select samples from new sample set
    Construct outpost vector set for new samples by
        Calculate novns by (2)
        Generate outpost vectors from new sample set
    Construct decayed prior sample set by
        Calculate ndc by (3)
        Randomly select samples from prior sample set
    Construct outpost vector set for prior samples by
        Calculate novps by (4)
        Generate outpost vectors from prior sample set
    Construct final training set by
        UNION ( selected new sample set, outpost vector set for new samples,
                decayed prior sample set, outpost vector set for prior samples )
    Initialize train-network
    repeat
        Train train-network with final training set
    until (performance of train-network meets the goal)
    Set train-network as test-network
    Set final training set as prior sample set for the next training session
end for

Fig. 2. Algorithm for the Training Process
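
The dual-network behaviour described above, where the test-network keeps servicing requests while the train-network is retrained and the two are swapped once the goal is met, can be sketched with two threads sharing a reference. This is only an illustrative sketch; the class, its methods, and the trivial stand-in networks are not from the paper.

import threading

class DualNetworkLearner:
    # Sketch of the dual-network scheme: training and testing run concurrently.

    def __init__(self, train_fn, initial_network):
        self._train_fn = train_fn              # trains a network until it meets the goal
        self._test_network = initial_network   # network currently used by the testing process
        self._lock = threading.Lock()

    def classify(self, sample):
        # Testing is never blocked by training; it always uses the current test-network.
        with self._lock:
            network = self._test_network
        return network(sample)

    def retrain(self, final_training_set):
        # Intended to run in its own thread (or on another processor), so that
        # classify() keeps servicing requests while training is in progress.
        new_network = self._train_fn(final_training_set)
        with self._lock:
            self._test_network = new_network   # the train-network becomes the test-network

# Illustrative usage with trivial stand-ins for the neural networks.
learner = DualNetworkLearner(train_fn=lambda data: (lambda x: x > len(data)),
                             initial_network=lambda x: x > 0)
worker = threading.Thread(target=learner.retrain, args=([1, 2, 3],))
worker.start()
print(learner.classify(5))   # testing proceeds while retraining runs concurrently
worker.join()

On a multiprocessor system the two threads can be scheduled on different processors, which is the situation targeted by the framework.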

The time complexity of the training process includes the time complexity of the final training set generation and the time complexity of the repetitive training.


The time complexity of the final training set generation is computed from the time complexity of the selected new sample set generation, the outpost vector set (new samples) generation, the decayed prior sample set generation, and the outpost vector set (prior samples) generation. Given that nA is the number of new samples of class A, nB is the number of new samples of class B, nAps is the number of prior samples of class A, and nBps is the number of prior samples of class B:

· The time complexity of the selected new sample set generation is O(nA + nB).
· The time complexity of the outpost vector set (new samples) generation is O(nAnB).
· The time complexity of the decayed prior sample set generation is O(nAps + nBps).
· The time complexity of the outpost vector set (prior samples) generation is O(nApsnBps).

Therefore, the total time complexity of the final training set construction is O(nAnB) + O(nApsnBps). The time complexity of the repetitive training is computed from the time complexity of the selected training function, O(x), and the number of times the training function is repeated, O(1); the total time complexity of the repetitive training is therefore O(x). In total, the time complexity of the training process, including the final training set generation and the repetitive training, is O(nAnB) + O(nApsnBps) + O(x). A larger new sample set and larger parameter values increase the size of the final training set and hence the time needed to construct it. However, the time spent on final training set generation is generally insignificant compared to the massive computation of the repetitive training of the supervised neural network. By using two neural networks, the capability of training and testing data simultaneously becomes available for solving adaptive learning problems in soft real time applications.
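
The O(nAnB) and O(nApsnBps) terms arise from pairing samples of opposite classes when outpost vectors are generated. The sketch below only illustrates where that quadratic cost comes from; it assumes outpost vectors are placed near each sample's nearest opposite-class neighbour, which is a simplification of the contour preserving construction in [17], and the function name is hypothetical.

import numpy as np

def outpost_vectors_between(class_a, class_b):
    # class_a: (nA, d) array, class_b: (nB, d) array.
    # For every sample, find its nearest opposite-class sample; the nested
    # distance computation is what makes this step O(nA * nB).
    outposts = []
    for src, other in ((class_a, class_b), (class_b, class_a)):
        for x in src:
            d = np.linalg.norm(other - x, axis=1)   # distances to all opposite-class samples
            nearest = other[np.argmin(d)]
            outposts.append((x + nearest) / 2.0)    # simplified placement near the class boundary
    return np.array(outposts)

# Example with random data for the two classes.
a = np.random.rand(100, 2)
b = np.random.rand(80, 2) + 1.0
print(outpost_vectors_between(a, b).shape)   # (180, 2): one outpost vector per sample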

3 Experimental Results

The experiments were conducted on a two-dimensional partition problem. The samples were distributed over a limited region of a two-dimensional donut ring, as shown in Figure 3. The partition had three parameters: Inner Radius (R1), Middle Radius (R2), and Outer Radius (R3). The class of a sample depended on its geometric position. There were two classes of data, designated as one (1) and zero (0). Class 0 covered the areas from 0 to the Inner Radius (R1) and from the Middle Radius (R2) to the Outer Radius (R3), and Class 1 covered the area from the Inner Radius (R1) to the Middle Radius (R2). The context of the problem was assumed to shift from one angular location to another while maintaining some overlapping area between consecutive contexts, as shown in Figure 4. The set numbers identify the sequence of training and testing sessions.
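
A minimal data-generation sketch for this geometry is given below. The 400 samples per set follow the experiment description; the radius values, the angular window, and the amount of overlap between consecutive contexts are illustrative assumptions, as are all names.

import numpy as np

def generate_partition_set(r1, r2, r3, angle_start, angle_end, n=400, seed=0):
    # One training/testing set for the donut-ring partition problem.
    # Class 1 lies between the inner radius r1 and the middle radius r2;
    # class 0 lies between 0 and r1 or between r2 and the outer radius r3.
    rng = np.random.default_rng(seed)
    theta = rng.uniform(angle_start, angle_end, n)     # limited angular location = problem context
    radius = rng.uniform(0.0, r3, n)
    x = np.c_[radius * np.cos(theta), radius * np.sin(theta)]
    y = ((radius >= r1) & (radius < r2)).astype(int)   # class depends on geometric position
    return x, y

# Eight contexts shifted by 30 degrees with 30 degrees of overlap (values are illustrative).
sets = [generate_partition_set(1.0, 2.0, 3.0,
                               np.radians(30 * k), np.radians(30 * k + 60), seed=k)
        for k in range(8)]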


Fig. 3. Shape of Partition Area

Fig. 4. Shifting of Problem Context

In the experiment, the data set was separated into 8 training sets (Set 1 to 8) and 8 testing sets (with ratios of Gap and Random of 5:0, 5:5, 5:10, 10:0, 10:5, 10:10, 10:15, and 10:20). Each set consisted of 400 samples. Each training set was used to generate a new training set consisting of selected new samples, outpost vectors generated from new samples, selected decayed prior samples, and outpost vectors generated from prior samples. The size of the new training set was determined by the number of current samples and the decay rate.

Figure 5a and Figure 5b present sample final training sets of the old algorithm [18], [19] and of the proposed framework, with α = 1.0, β = 0.5, γ = 4.0, δ = 0.0 and α = 1.0, β = 0.5, γ = 2.0, δ = 1.0, respectively. The final training set in Figure 5a consisted of 2,200 samples: 400 selected new samples (1.0×400), 200 outpost vectors from new samples (0.5×400), 1,600 selected decayed prior samples (4.0×400), and no outpost vectors from prior samples (0.0×1,600). The final training set in Figure 5b consisted of 2,200 samples: 400 selected new samples (1.0×400), 200 outpost vectors from new samples (0.5×400), 800 selected decayed prior samples (2.0×400), and 800 outpost vectors from prior samples (1.0×800).

The first final training set was used to train the feed-forward back-propagation neural network (train-network) with the following parameters: network size = [10 1]; transfer function for the hidden layer = "logsig"; transfer function for the output layer = "logsig"; max epochs = 500; goal = 0.001. After the train-network was turned into the test-network, the test-network started its testing service while the second final training set was being generated. The second final training set was then trained and turned into the test-network, and the previous test-network was discarded. This training process was repeated until all final training sets had been trained. Figure 6 presents the testing results of the old algorithm and the proposed framework on the 8 testing sets. The results show that the proposed framework yields a lower classification error rate than the old algorithm. When the experiment was conducted on a multi-programmed uni-processor system, the time spent on training and testing was the same for the old algorithm and the proposed framework because of thread switching. However, on a multiprocessor system, one processor executed the training function as one thread while another processor simultaneously executed the testing function as another thread, so the testing service ran without delay while the training function was being executed.
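
The training parameters read like a MATLAB Neural Network Toolbox configuration. Purely as a hedged illustration of a comparable setup in Python (not the toolbox used in the paper), a network with ten logistic hidden units, a logistic output, at most 500 epochs, and an error goal of roughly 0.001 could be configured as follows; the mapping of "goal" onto scikit-learn's tol parameter is approximate.

from sklearn.neural_network import MLPClassifier

# Approximate analogue of the paper's settings: network size [10 1] with
# "logsig" transfer functions, max epochs = 500, goal = 0.001.
# For binary classification, scikit-learn applies a logistic output automatically.
train_network = MLPClassifier(hidden_layer_sizes=(10,),   # one hidden layer with 10 units
                              activation='logistic',      # logistic ("logsig") hidden units
                              max_iter=500,               # at most 500 training epochs
                              tol=1e-3)                   # rough stand-in for the error goal

# x_train, y_train would come from the final training set construction step:
# train_network.fit(x_train, y_train)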


(a) α = 1.0, β = 0.5, γ = 4.0, δ = 0.0        (b) α = 1.0, β = 0.5, γ = 2.0, δ = 1.0

Fig. 5. Final Training Sets of Old Algorithm [18], [19] and Proposed Framework Respectively

Fig. 6. Testing Results of Old Algorithm [18], [19] and Proposed Framework Respectively

4 Conclusion

A framework for solving adaptive learning problems in soft real time applications is proposed. The framework, based on [18], [19], consists of two supervised neural networks running simultaneously: one is used for training data and the other for testing data. The accuracy of the classification is improved over the previous works by adding outpost vectors generated from prior samples. On a multi-programmed uni-processor system, where the training function and the testing function share the same processor, the time spent on training and testing is the same for the old algorithm and the proposed framework. However, on a multiprocessor system, one processor can execute the training function while another processor simultaneously executes the testing function, so testing can be serviced without delay while the training function is being executed. The longer the training function takes, the more clearly the proposed framework shows its benefit.


References

1. Tanprasert, T., Kripruksawan, T.: An approach to control aging rate of neural networks under adaptation to gradually changing context. In: ICONIP 2002, pp. 174–178 (2002)
2. Tanprasert, T., Kaitikunkajorn, S.: Improving synthesis process of decayed prior sampling technique. In: INTECH 2005, pp. 240–244 (2005)
3. Burzevski, V., Mohan, C.K.: Hierarchical growing cell structures. In: ICNN 1996, pp. 1658–1663 (1996)
4. Fritzke, B.: Vector quantization with a growing and splitting elastic network. In: ICANN 1993, pp. 580–585 (1993)
5. Fritzke, B.: Incremental learning of locally linear mappings. In: ICANN 1995, pp. 217–222 (1995)
6. Martinetz, T.M., Berkovich, S.G., Schulten, K.J.: Neural gas network for vector quantization and its application to time-series prediction. IEEE Transactions on Neural Networks 4(4), 558–569 (1993)
7. Chalup, S., Hayward, R., Joachi, D.: Rule extraction from artificial neural networks trained on elementary number classification tasks. In: Proceedings of the 9th Australian Conference on Neural Networks, pp. 265–270 (1998)
8. Craven, M.W., Shavlik, J.W.: Using sampling and queries to extract rules from trained neural networks. In: ICML 1994, pp. 37–45 (1994)
9. Setiono, R.: Extracting rules from neural networks by pruning and hidden-unit splitting. Neural Computation 9(1), 205–225 (1997)
10. Sun, R., Peterson, T., Sessions, C.: Beyond simple rule extraction: Acquiring planning knowledge from neural networks. In: WIRN Vietri 2001, pp. 288–300 (2001)
11. Thrun, S., Mitchell, T.M.: Integrating inductive neural network learning and explanation based learning. In: IJCAI 1993, pp. 930–936 (1993)
12. Towell, G.G., Shavlik, J.W.: Knowledge based artificial neural networks. Artificial Intelligence 70(1–2), 119–165 (1994)
13. Mitchell, T., Thrun, S.B.: Learning analytically and inductively. In: Mind Matters: A Tribute to Allen Newell, pp. 85–110 (1996)
14. Frasconi, P., Gori, M., Maggini, M., Soda, G.: Unified integration of explicit knowledge and learning by example in recurrent networks. IEEE Transactions on Knowledge and Data Engineering 7(2), 340–346 (1995)
15. Tanprasert, T., Fuangkhon, P., Tanprasert, C.: An improved technique for retraining neural networks in adaptive environment. In: INTECH 2008, pp. 77–80 (2008)
16. Polikar, R., Udpa, L., Udpa, S.S., Honavar, V.: Learn++: An incremental learning algorithm for supervised neural networks. IEEE Transactions on Systems, Man, and Cybernetics 31(4), 497–508 (2001)
17. Tanprasert, T., Tanprasert, C., Lursinsap, C.: Contour preserving classification for maximal reliability. In: IJCNN 1998, pp. 1125–1130 (1998)
18. Fuangkhon, P., Tanprasert, T.: An incremental learning algorithm for supervised neural network with contour preserving classification. In: ECTI-CON 2009, pp. 470–473 (2009)
19. Fuangkhon, P., Tanprasert, T.: An adaptive learning algorithm for supervised neural network with contour preserving classification. In: Deng, H., Wang, L., Wang, F.L., Lei, J. (eds.) Artificial Intelligence and Computational Intelligence. LNCS (LNAI), vol. 5855, pp. 389–398. Springer, Heidelberg (2009)
20. Calvert, D., Guan, J.: Distributed artificial neural network architectures. In: HPCS 2005, pp. 2–10 (2005)
21. Seiffert, U.: Artificial neural networks on massively parallel computer hardware. In: ESANN 2002, pp. 319–330 (2002)
22. Yang, B., Wang, Y., Su, X.: Research and design of distributed training algorithm for neural networks. In: ICMLC 2005, pp. 4044–4049 (2005)
