Recursive Non-Linear Models for On-Line Traffic Prediction of VBR MPEG Coded Video Sources

Anastasios D. Doulamis, Nikolaos D. Doulamis and Stefanos D. Kollias
National Technical University of Athens, Department of Electrical and Computer Engineering
Heroon Polytechniou 9, 157 73 Zografou, Greece
E-mail: [email protected]

Abstract

Any performance evaluation of broadband networks requires modeling of the actual network traffic. Since multimedia services, and especially MPEG coded video streams, are expected to be a major traffic component over these networks, modeling of such services and traffic prediction are useful for the reliable operation of broadband networks based on the Asynchronous Transfer Mode (ATM). In this paper, a recursive implementation of a Non-linear AutoRegressive model (RNAR) is presented for on-line traffic prediction of Variable Bit Rate (VBR) MPEG-2 video sources. This is accomplished by using an efficient weight adaptation algorithm, so that the network provides good performance even in the case of highly fluctuating traffic rates. In particular, the network weights are adapted so that the output is approximately equal to the current data while preserving the former knowledge of the network. Experimental results are presented to show the good performance of the proposed scheme. Furthermore, a comparison with other linear and non-linear techniques is presented to show that the adopted method yields better results.

1. Introduction

The success of broadband Asynchronous Transfer Mode (ATM) networks is forecasted to depend on efficient transmission of digital video, since demands for video services will rapidly increase in the forthcoming years [1]. Examples include high definition TV (HDTV), videophone or video conferencing applications, and content-based retrieval from large video databases [2]-[6]. Since all the above applications impose very large bandwidth requirements [6], several coding algorithms have been proposed in the literature to accomplish efficient video compression. Recent efforts towards this goal have stimulated the generation of various international standards, each of which is related to different applications. Among the most popular is the MPEG-2 standard, mainly due to its generic structure, able to support a broad range of applications, such as distribution of digital cable TV, networked database services via ATM, and satellite and terrestrial digital broadcasting distribution [6]. Variable Bit Rate (VBR) video transmission (variable rate, constant quality) presents many advantages compared to Constant Bit Rate (CBR) transmission (constant rate, variable quality). This is due to the fact that more video calls can be carried in VBR mode than in CBR mode, while simultaneously better video quality and shorter delay can be accomplished for the same average bit rate [7]. The output rate of VBR encoders fluctuates according to video activity and therefore, if the allocated bandwidth is set equal to the source peak rate, the network is under-utilized [8], [9]. For this reason, several independent video sources are multiplexed in a common buffer with constant output rate, so that the aggregate bit rate tends to smooth out around the average, as the law of large numbers indicates.
Losses then occur when the incoming amount of data exceeds the buffer size (buffer overflow), resulting in degradation of picture quality, since in real-time video transmission the lost information cannot be retransmitted [8]. Since, for a given video quality, the loss probability should not exceed a certain limit, it is necessary to build some protection against losses so that an acceptable Quality of Service (QoS) level can be guaranteed to the users. Consequently, statistical characterization and modeling of VBR video sources is necessary for predicting the transmitted video traffic and for estimating the network resources. Traffic management algorithms and congestion schemes, which prevent possible overload of the network or violation of the negotiated QoS parameters, can benefit from such traffic modeling. For example, a rate regulation algorithm can be applied to reduce losses when high video activity periods are predicted. Several works have been proposed in the literature dealing with on-line traffic prediction using either linear or non-linear models. Linear approaches are mainly implemented in a recursive framework and are suitable for simple traffic traces [10]. Instead, prediction of video streams is accomplished using non-linear models based on neural networks [11], [12]. However, the algorithms adopted have been applied to "smooth" video traffic, obtained for example from videoconferencing streams or video sources consisting of one shot. In this paper, a novel recursive implementation of non-linear autoregressive (NAR) models is proposed for on-line traffic prediction of VBR MPEG-2 coded video sources. This is accomplished by optimally adapting the network weights (i.e., the model parameters) so that the network output is approximately equal to the current traffic rate, while simultaneously a minimal degradation over the previous network knowledge is provided.
We denote the proposed recursive model RNAR, similarly to the notation used for linear models. Experimental results and a comparative study with other linear and non-linear models used for on-line traffic prediction of real coded MPEG-2 video sequences indicate that the proposed scheme outperforms the previous ones.

2. MPEG-2 Encoder and Source Characteristics

Since the adopted coding algorithm affects the statistical properties of the actual video traffic, it is useful first to briefly describe the general structure of the MPEG-2 standard and the associated VBR coding control, and to present some basic source characteristics.

MPEG-2 Encoder: In the MPEG-2 standard, three different coding modes are supported: Intraframe (I), Predictive (P) and Bi-directionally predictive (B). In intraframe mode, only compression in the spatial direction is used, while in predictive mode (P frames), compression is applied in both the spatial and temporal directions. B frames are coded similarly to P frames, apart from the fact that the motion vectors can be estimated with respect to the previous and/or the following I or P frame [6]. These three types of frames are deterministically merged to form a group, called a Group Of Pictures (GOP), defined by the distance L between I frames and the distance M between P frames. In practice, M = 3 and L = 12, resulting in the GOP pattern ...IBBPBBPBBPBBI...

Source Characteristics: In our simulations, four long-duration VBR MPEG-2 coded video sequences (each approximately 30 min long) have been used: two films (Jurassic Park and James Bond) and two TV series. For clarity of presentation, in the rest of this paper we call the first film and the first TV series Source1 and Source2, and the second film and the second TV series Source3 and Source4. All video sources contain scene changes, high variation of motion activity, camera zooming and panning, and changes of luminosity conditions. Figure 1(a) depicts the rates of the first 400 frames of Source1 (Jurassic Park). As can be seen, the three types of frames (I, P and B) composing the MPEG-2 traffic present different statistical properties and are periodically repeated, according to the GOP pattern, to form the aggregate MPEG-2 sequence. This is also illustrated in Figure 1(b), where the autocorrelation function of the aggregate MPEG-2 sequence of Source1 is presented. The large positive peaks are due to I frames, the negative peaks to B frames, and the intermediate ones to P frames.

Figure 1: (a) Traffic rates of the Source1 sequence. (b) The autocorrelation function of Source1.

The average rate of Intra frames is much higher than the respective rate of Inter ones, due to the fact that the former are coded only in the spatial direction. B frames present the lowest rate, since they are predicted with respect to both the previous and the following I or P frame. However, the fluctuation of Inter frames is higher than that of Intra ones, especially during high-motion periods, where the motion compensation algorithm usually fails [6]. For this reason, although I frames have on average the highest rate and B frames the lowest, models which ignore B (and sometimes P) frames and take into account only I frames severely underestimate the network resources [13]. Furthermore, there is correlation between the rates of I, P and B frames, mainly due to the motion estimation algorithm and the continuity of the actual video traffic. Examining the four aforementioned sources, a correlation coefficient from 0.2 to 0.35 is obtained between Intra and Inter frames, while between Inter frames it ranges from 0.75 to 0.85. As observed, the correlation between Inter frames (P-B) is stronger than that between Inter and Intra frames (P-I, B-I), since the method used for coding Intra frames is different from that used for coding Inter frames. Almost the same degree of correlation is observed for all examined video sequences. Similar results concerning the correlation of I, P and B frames have also been presented in [13].
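To make the deterministic GOP structure concrete, the following sketch (illustrative Python, not from the paper; the toy rate values and noise model are assumptions) splits an aggregate frame-rate trace into its I, P and B substreams using the pattern for M = 3, L = 12 described above:

```python
import numpy as np

# GOP pattern for M = 3, L = 12, as described in Section 2.
GOP = "IBBPBBPBBPBB"

def split_streams(rates):
    """Split an aggregate MPEG-2 frame-rate trace into I, P and B substreams
    by repeating the deterministic GOP pattern over the trace."""
    streams = {"I": [], "P": [], "B": []}
    for n, r in enumerate(rates):
        streams[GOP[n % len(GOP)]].append(r)
    return {c: np.array(v) for c, v in streams.items()}

# Toy trace: I frames largest, B frames smallest, as in Figure 1(a).
rng = np.random.default_rng(0)
trace = np.array([{"I": 30.0, "P": 12.0, "B": 6.0}[GOP[n % 12]]
                  + rng.normal(0, 1) for n in range(1200)])
s = split_streams(trace)
# 1200 frames = 100 GOPs -> 100 I, 300 P and 800 B frames
print(len(s["I"]), len(s["P"]), len(s["B"]))  # → 100 300 800
```

Each substream is then modeled separately, which matches the per-frame-type treatment used in the rest of the paper.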

3. Non-Linear Modeling of VBR MPEG-2 Video Sources

In the following, let us denote by $x^c(n)$, $c \in \{I, P, B\}$, the rate of the $n$th frame of type $c$ of an MPEG coded sequence. It should be mentioned that the variable $n$ of $x^c(n)$ refers to the $n$th sample of the $c$-stream and not to the $n$th sample of the aggregate sequence. Since, in general, $x^c(n)$ depends on the previous $p_c$ samples, we model the rate of $c$-frames using a non-linear autoregressive model of order $p_c$, NAR($p_c$):

$$x^c(n) = g^c(x^c(n-1), x^c(n-2), \ldots, x^c(n-p_c)) + e^c(n), \quad c \in \{I, P, B\} \tag{1}$$

where $g^c(\cdot)$ is a non-linear function and $e^c(n)$ an independent and identically distributed (i.i.d.) error. In the following analysis, we omit the superscript $c$ for simplicity, since it appears in all equations.

The main difficulty in implementing a NAR model is that the function $g(\cdot)$ is actually unknown. However, in [14] it has been shown that a feedforward neural network, with a Tapped Delay Line (TDL) filter as its input, is able to implement a NAR model. Let us consider a network architecture consisting of one hidden layer of $l$ neurons, one output neuron and a TDL filter of $p$ delay input elements, equal to the order of the model. Let us denote by $\mathbf{w}_i = [w_{i,1} \cdots w_{i,p+1}]^T$, $i = 1, 2, \ldots, l$, the $(p+1) \times 1$ vectors containing the weights $w_{i,k}$, $k = 1, \ldots, p$, which connect the $i$th hidden neuron to the $k$th input element, together with the bias $w_{i,p+1}$ of the $i$th neuron. Let us also define $\mathbf{v} = [v_1\ v_2 \cdots v_l]^T$ as an $l \times 1$ vector containing the weights $v_i$ connecting the $i$th hidden neuron to the output neuron, and $\theta$ as the respective bias. Then, the vector $\mathbf{w} = [\mathbf{w}_1^T\ \mathbf{w}_2^T \cdots \mathbf{w}_l^T\ \mathbf{v}^T\ \theta]^T$ represents all network weights and biases. In this case, the network output $y_{\mathbf{w}}(\cdot)$, which provides a non-linear approximation $\hat{g}(\cdot)$ of the function $g(\cdot)$ and thus an estimate $\hat{x}(n)$ of $x(n)$, is given by

$$y_{\mathbf{w}}(\mathbf{x}(n-1)) \equiv \hat{x}(n) = \hat{g}(x(n-1), x(n-2), \ldots, x(n-p)) = \mathbf{v}^T \cdot \mathbf{u}(\mathbf{x}(n-1)) + \theta \tag{2a}$$

with

$$\mathbf{u}(\mathbf{x}(n-1)) = \mathbf{f}(\mathbf{W}^T \cdot \mathbf{x}(n-1)) \tag{2b}$$

where $\mathbf{W} = [\mathbf{w}_1\ \mathbf{w}_2 \cdots \mathbf{w}_l]$ is a $(p+1) \times l$ matrix whose columns correspond to the weight vectors $\mathbf{w}_i$, and $\mathbf{f}(\cdot)$ is a vector-valued function whose elements correspond to the activation functions $f(\cdot)$ of the hidden neurons. In our case, the function $f(\cdot)$ is selected to be the sigmoid. The vector $\mathbf{x}(n-1) = [x(n-1) \cdots x(n-p)\ 1]^T$ is a $(p+1) \times 1$ input vector, which contains the $p$ previous samples $x(n-1), \ldots, x(n-p)$ and an element equal to one to accommodate the bias effect. In equation (2a), a linear activation function has been used for the output neuron, since the network output approximates a continuous-valued signal, i.e., the frame rate of I, P and B frames.

Initially, a training set of $N$ samples is used to estimate the network weights $\mathbf{w}$. Without loss of generality, we can assume that the training set consists of the pairs $\{\mathbf{x}(n-1), x(n)\}_{n=p+1}^{K+p}$; the index $n$ begins from $p+1$ so that all elements of the vector $\mathbf{x}(p)$ can be properly defined. The network is trained to minimize the mean squared error over all samples of the training set. A modification of the Marquardt-Levenberg algorithm [15] is used to train the network due to its efficiency and fast convergence. To further increase the generalization performance of the network, the cross-validation method has been used [16]; in our case, 10% of the available data are used as the validation set.
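A minimal sketch of the predictor of equations (2a)-(2b) may help fix the notation (illustrative Python/NumPy; the class name, the random initialization and the sizes are assumptions, and the network is shown untrained):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class NARPredictor:
    """One-step NAR(p) predictor: a single-hidden-layer feedforward network
    fed by a tapped delay line of the p previous samples (eqs. (2a)-(2b))."""
    def __init__(self, p, l, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = rng.normal(0, 0.1, size=(p + 1, l))  # hidden weights + biases
        self.v = rng.normal(0, 0.1, size=l)           # output weights
        self.theta = 0.0                              # output bias
        self.p = p

    def predict(self, history):
        """history = [x(n-1), ..., x(n-p)]; returns the estimate x_hat(n)."""
        x = np.append(history, 1.0)            # append the bias element of eq. (2b)
        u = sigmoid(self.W.T @ x)              # hidden-layer activations
        return float(self.v @ u + self.theta)  # linear output neuron, eq. (2a)

# Order p = 8 (as used later for I frames) and l = 10 hidden neurons (assumed).
net = NARPredictor(p=8, l=10)
print(net.predict(np.zeros(8)))
```

In the paper the weights are fitted off-line with a Marquardt-Levenberg variant [15]; any batch least-squares trainer would fill that role in this sketch.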

4. Recursive Non-Linear AutoRegressive Model (RNAR)

In this section, a novel recursive implementation of the NAR model is presented for on-line weight adaptation. We call this model Recursive Non-linear AutoRegressive (RNAR), similarly to the notation of the recursive linear models.

Recursive Non-Linear AutoRegressive Model (RNAR): The proposed recursive implementation of the NAR model is based on a modification of the network weights so that the network output responds appropriately to new data, while also providing a minimal degradation over the previous network knowledge. Training the network using only the new data would instead result in a highly fluctuating (unstable) network output [17], [18].

Let us denote by $\mathbf{w}_b$ the network weights before the adaptation. Let us assume that these weights have been estimated using a training set $S_b = \{(\mathbf{t}_1, d_1), \ldots, (\mathbf{t}_{N_b}, d_{N_b})\}$ of $N_b$ pairs, which actually represents the previous network knowledge. The vectors $\mathbf{t}_i$ are network inputs of the same form as $\mathbf{x}(n-1)$, while the $d_i$ are the target outputs. Similarly, let $\mathbf{w}_a$ denote the network weights after the adaptation. Without loss of generality, we can consider that the weight adaptation algorithm is activated at the $k$th sample of $x(n)$. This means that the $(k+1)$th sample will be estimated using the new weights $\mathbf{w}_a$, while the $k$th sample of $x(n)$ has been predicted based on the previous weights $\mathbf{w}_b$. Then, the network weights $\mathbf{w}_a$ are estimated by minimizing

$$\mathbf{w}_a = \arg\min_{\mathbf{w}} D_b = \frac{1}{2} \sum_{i=1}^{N_b} (y_{\mathbf{w}}(\mathbf{t}_i) - d_i)^2 = \frac{1}{2} \sum_{i=1}^{N_b} D_{i,b} \tag{3a}$$

subject to

$$y_{\mathbf{w}_a}(\mathbf{x}(k-1)) \equiv \hat{x}(k) \approx x(k) \tag{3b}$$

where $y_{\mathbf{w}_a}(\cdot)$ is the network output using the new weights $\mathbf{w}_a$ and $D_{i,b} = (y_{\mathbf{w}}(\mathbf{t}_i) - d_i)^2$. Equation (3) means that the network output, after weight adaptation, is fitted to the current bit rate $x(k)$ in such a way that a minimal distortion over the previous network knowledge (the set $S_b$) is accomplished. Assuming that a small weight perturbation is sufficient for the adaptation, we have

$$\mathbf{w}_a = \mathbf{w}_b + \Delta\mathbf{w} \tag{4}$$

where $\Delta\mathbf{w}$ represents small increments of the network weights. The effect of $\Delta\mathbf{w}$ in equation (3) can be expressed by estimating the sensitivity of the errors $D_{i,b}$ with respect to the network weights [equation (3a)] and by linearizing (3b) using a first-order Taylor series expansion. In this case, (3) yields the following constrained minimization:

$$\min_{\Delta\mathbf{w}} \frac{1}{2} \Delta\mathbf{w}^T \cdot \mathbf{J}^T \cdot \mathbf{J} \cdot \Delta\mathbf{w} \tag{5a}$$

subject to

$$b = \mathbf{a}^T \cdot \Delta\mathbf{w} \tag{5b}$$

where the matrix $\mathbf{J}$ corresponds to the Jacobian matrix of the errors $D_{i,b}$ with respect to the network weights, while the scalar $b$ and the vector $\mathbf{a}$ are expressed in terms of the previous network weights $\mathbf{w}_b$. Since the cost function of (5a) is of quadratic form (a convex function), it has only one minimum, the global one. The gradient projection method has been used for the optimization. In particular, the method is an iterative process, starting from an initial weight increment $\Delta\mathbf{w}(0)$ which satisfies equation (5b). In our case, $\Delta\mathbf{w}(0)$ is calculated as the point of the constraint hyperplane (5b) at minimal distance from the origin,

$$\Delta\mathbf{w}(0) = \frac{b \cdot \mathbf{a}}{\mathbf{a}^T \cdot \mathbf{a}} \tag{6}$$

At subsequent iterations, the weight increments are updated as

$$\Delta\mathbf{w}(m+1) = \Delta\mathbf{w}(m) + \gamma(m)\,\mathbf{h}(m) \tag{7}$$

where $\Delta\mathbf{w}(m)$ is the weight increment at the $m$th iteration, $\gamma(m)$ a scalar regulating the convergence rate and $\mathbf{h}(m)$ the projection of the negative gradient of (5a) onto the subspace tangent to the surface defined by the constraint,

$$\mathbf{h}(m) = -\mathbf{P} \cdot \mathbf{J}^T \mathbf{J} \cdot \Delta\mathbf{w}(m), \quad \text{with } \mathbf{P} = \mathbf{I} - \mathbf{a} \cdot (\mathbf{a}^T \cdot \mathbf{a})^{-1} \mathbf{a}^T \tag{8}$$

The scalar $\gamma(m)$ of (7) is appropriately selected to guarantee fast convergence. The number of iterations of the method is restricted by the maximum permitted time $T_D$ required for rate prediction. For example, for one-step prediction of I frames, $T_D = 12 \times 40$ ms for $L = 12$ and a 40 ms interframe interval (PAL system). Then, if $T_s$ denotes the computational time for one iteration, the adaptation is terminated after at most $T_D / T_s$ iterations.

After a weight adaptation, which has been activated at the $k$th sample of $x(n)$, the pair $(\mathbf{x}(k-1), x(k))$, consisting of the sample $x(k)$ as desired output and its $p$ previous values $\mathbf{x}(k-1)$ as network input, enters the set $S_b$ to be used in future adaptations. However, this yields a continuous increase of the size of $S_b$, since more and more pairs are included. A more efficient implementation is accomplished by keeping the number of pairs in $S_b$ constant: each time a new pair enters the set $S_b$, the oldest one is removed from it.

Activation of the Weight Adaptation Procedure: While, in theory, the recursive implementation of the NAR model can be performed at every new incoming sample, in practice there is no reason to adapt the network weights (model parameters) when future samples are predicted with high accuracy. A measure for estimating the prediction accuracy is

$$A = \begin{cases} 1 & \text{if } D = |\hat{x}(n) - x(n)| > T \\ 0 & \text{if } D = |\hat{x}(n) - x(n)| \le T \end{cases} \tag{9}$$

If the difference $D$ exceeds a certain threshold $T$, the model parameters are updated using the recursive algorithm described above ($A = 1$); otherwise, the same weights (model parameters) are kept ($A = 0$). The value of the threshold $T$ is calculated based on the average validation error $E_v$, since this expresses the network performance during its operation phase. In particular, $T = \lambda \cdot E_v$ with $\lambda$ around 1.1-1.2.
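The gradient projection iteration of equations (5)-(8) can be sketched as follows (illustrative Python; a fixed step derived from the largest eigenvalue of $\mathbf{J}^T\mathbf{J}$ stands in for the paper's schedule $\gamma(m)$, a fixed iteration count stands in for the $T_D/T_s$ bound, and the toy Jacobian and constraint are made-up data):

```python
import numpy as np

def adapt_weights(J, a, b, iters=50):
    """Gradient projection for eq. (5): minimise 0.5 * dw' J'J dw
    subject to the linearised constraint a' dw = b of eq. (5b)."""
    dw = b * a / (a @ a)                            # eq. (6): start on the hyperplane
    P = np.eye(len(a)) - np.outer(a, a) / (a @ a)   # projector onto a' dw = 0
    G = J.T @ J
    step = 1.0 / np.linalg.eigvalsh(G).max()        # fixed stand-in for gamma(m)
    for _ in range(iters):                          # in the paper, bounded by T_D/T_s
        h = -P @ (G @ dw)                           # projected negative gradient, eq. (8)
        dw = dw + step * h                          # eq. (7)
    return dw

rng = np.random.default_rng(1)
J = rng.normal(size=(20, 5))    # toy Jacobian of the errors D_{i,b}
a = rng.normal(size=5)
dw = adapt_weights(J, a, b=0.3)
print(abs(a @ dw - 0.3) < 1e-8)   # → True
```

Because the update direction is projected onto the null space of $\mathbf{a}^T$, the constraint (5b) holds at every iteration, which is exactly why the method can be stopped early when the prediction deadline $T_D$ is reached.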

5. Experimental Results

The good performance of the proposed neural network model is evaluated using several experiments on Sources 1, 2, 3 and 4. 20% of the data of Source1 and Source2 have been used for training, while a validation set of 10% of the training data has been formed to improve the generalization ability. The model accuracy is evaluated using data of Source3 and Source4, which are not included in the training set. The network inputs have been normalized to the interval [-1, 1] so that all incoming inputs fall within the specified range. The order of the model is selected to be 8, 10 and 12 for I, P and B frames respectively. In general, the order for Inter frames is greater than the order for Intra frames, since the rates of Inter frames fluctuate more rapidly than those of Intra frames.

Figures 3(a,d) illustrate a time window (250 frames) of the traffic rate of I frames versus frame number for Source3 and Source4 respectively. The solid line corresponds to the actual data, while the dotted one refers to the predicted data. In this case, the recursive implementation of the NAR model has been applied. In particular, the network weights are adapted each time a prediction error exceeding the average validation error by more than 10% is encountered. The size of the former network knowledge is selected to be 100, i.e., $N_b = 100$. Similarly, Figures 3(b,e) and 3(c,f) present the traffic prediction for P and B frames of the same sequences. As is observed, high prediction accuracy is achieved, even at time instances of highly fluctuating frame rates, due to the adaptation of the model parameters. A measure for evaluating the prediction accuracy is the relative prediction error over all samples of the examined sequences with respect to the actual data,

$$E = \frac{1}{N} \sum_{k=1}^{N} \frac{|\hat{x}(k) - x(k)|}{x(k)} \tag{10}$$
where N is the total size of the signal. Table I presents the results obtained for the I, P and B frame streams and the total MPEG stream over all samples of Source3 and Source4. As is observed, the prediction performance is very satisfactory, since even in the worst case the relative error does not exceed 3.73%. Moreover, the proposed model presents similar performance on all sequences, due to its adaptive nature. However, Intra frames are predicted more accurately than Inter frames, due to the fact that they present a smoother traffic behavior. The performance of the proposed recursive model is also depicted in Figure 4, where the predicted traffic rates of I, P and B frames are plotted versus the actual rates; in these figures, the solid line corresponds to perfect prediction.
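Assuming the relative error of equation (10) denotes the mean absolute relative deviation, it can be computed as follows (illustrative Python; the sample rates are made-up values, not from the paper):

```python
import numpy as np

def relative_error(x_hat, x):
    """Mean relative prediction error over N samples, as in eq. (10)."""
    x_hat, x = np.asarray(x_hat, float), np.asarray(x, float)
    return float(np.mean(np.abs(x_hat - x) / x))

# Toy actual vs. predicted I-frame rates in Mbits/sec (assumed values).
actual    = np.array([30.0, 28.0, 32.0, 29.0])
predicted = np.array([29.4, 28.7, 31.2, 29.0])
print(f"{100 * relative_error(predicted, actual):.2f}%")  # → 1.75%
```

Normalizing by the actual rate $x(k)$ makes errors on the large I-frame rates and the small B-frame rates directly comparable, which is why the table reports percentages per frame type.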

Figure 3: The prediction performance of the proposed recursive non-linear scheme. (a,d) I frames of Source3 and Source4, respectively. (b,e) P frames of Source3 and Source4, respectively. (c,f) B frames of Source3 and Source4, respectively.

Figure 4: The actual data of Source3 versus the predicted data obtained by the neural network model. The solid line indicates a perfect fit. (a) I frames. (b) P frames. (c) B frames.

Table I also presents the relative error obtained for the rates of I, P and B frames using two other (conventional) methods for on-line prediction of VBR video sources. The first method relies on a recurrent neural network, as described in [12], while the second on a recursive implementation of a linear AR model [10]. In all cases, the proposed model provides better results than the other methods. This is particularly evident for Inter frames, since they present a highly fluctuating traffic rate. A comparison between the proposed method and the recurrent model of [12] is also presented in Figure 5 for the I, P and B frames of Source3. From these figures, it can be seen that the recurrent model cannot follow the traffic rates of the MPEG stream with high accuracy, especially where high fluctuation is encountered. Another drawback of the recurrent model is that its performance depends strongly on the adopted learning rate. Small learning rates generate very smooth signals, which cannot predict the frame rates accurately. On the contrary, large learning rates often saturate the network output, producing unstable solutions, especially after a short time period. Furthermore, the appropriate value of the learning rate depends strongly on the signal to be predicted and the network size; thus, it is difficult to tune the learning rate parameter. In our case, however, the results reported in Table I and Figure 5 have been generated using a learning rate that provides a minimal relative prediction error. Based on the above comparisons, we conclude that the proposed adaptively trained neural network yields better prediction results than the other methods, regardless of the fluctuation of the traffic rates.

Figure 5: Comparison of the prediction performance of the proposed recursive non-linear scheme with the method of [12]. (a) I frames of Source3. (b) P frames of Source3. (c) B frames of Source3.

          | Proposed Method         | Recursive Linear Method  | Nonlinear Method of [12]
Seq.      | I      P       B        | I       P       B        | I       P       B
----------|-------------------------|--------------------------|-------------------------
Source 3  | 1.12%  2.44%   3.73%    | 7.63%   13.22%  21.02%   | 5.14%   10.89%  15.01%
Source 4  | 0.96%  2.17%   3.31%    | 6.30%   12.83%  20.74%   | 4.43%   9.31%   13.74%

Table I: Average relative error over different sources using the proposed adaptively trained method and two other techniques.

6. References

[1] N. Ohta, Packet Video: Modeling and Signal Processing. Boston-London: Artech House, 1994.
[2] N. Doulamis, A. Doulamis, D. Kalogeras and S. Kollias, "Low Bit Rate Coding of Image Sequences Using Adaptive Regions of Interest," IEEE Trans. on Circuits and Systems for Video Technology, vol. 8, no. 8, pp. 928-934, December 1998.
[3] S.-F. Chang, A. Eleftheriadis and D. Anastassiou, "Development of Columbia's Video on Demand Testbed," Signal Processing: Image Communication, vol. 8, pp. 191-208, 1994.
[4] Y. Avrithis, A. Doulamis, N. Doulamis and S. Kollias, "A Stochastic Framework for Optimal Key Frame Extraction from MPEG Video Databases," Computer Vision and Image Understanding, vol. 75, nos. 1/2, pp. 3-24, August 1999.
[5] N. Doulamis, A. Doulamis, Y. Avrithis, K. Ntalianis and S. Kollias, "Efficient Summarization of Stereoscopic Video Sequences," IEEE Trans. on Circuits & Systems for Video Technology, to appear, June 2000.
[6] T. Sikora, "MPEG Digital Video Coding Standards," in Digital Consumer Electronics Handbook. McGraw-Hill.
[7] T. V. Lakshman, A. Ortega and A. Reibman, "VBR Video: Tradeoffs and Potentials," Proceedings of the IEEE, vol. 86, no. 5, pp. 952-973, 1998.
[8] B. Maglaris, D. Anastassiou, P. Sen, G. Karlsson and J. D. Robbins, "Performance Models of Statistical Multiplexing in Packet Video Communications," IEEE Trans. on Communications, vol. 36, pp. 834-843, 1988.
[9] N. Doulamis, A. Doulamis, G. Konstantoulakis and G. Stassinopoulos, "Efficient Modeling of VBR MPEG-1 Coded Video Sources," IEEE Trans. on Circuits & Systems for Video Technology, to appear, 2000.
[10] S. Haykin, Adaptive Filter Theory. Prentice Hall, 1996.
[11] J. Chong, S. Q. Li and J. Ghosh, "Predictive Dynamic Bandwidth Allocation for Efficient Transport of Real-Time VBR Video over ATM," IEEE Journal on Selected Areas in Communications, vol. 13, pp. 12-33, Jan. 1995.
[12] P.-R. Chang and J.-T. Hu, "Optimal Nonlinear Adaptive Prediction and Modeling of MPEG Video in ATM Networks Using Pipelined Recurrent Neural Networks," IEEE Journal on Selected Areas in Communications, vol. 15, no. 6, Aug. 1997.
[13] D. P. Heyman, A. Tabatabai and T. V. Lakshman, "Statistical Analysis of MPEG-2 Coded VBR Video Traffic," Packet Video '94, B2.1.
[14] J. Connor, D. Martin and L. Atlas, "Recurrent Neural Networks and Robust Time Series Prediction," IEEE Trans. on Neural Networks, vol. 5, no. 2, pp. 240-254, 1994.
[15] S. Kollias and D. Anastassiou, "An Adaptive Least Squares Algorithm for the Efficient Training of Artificial Neural Networks," IEEE Trans. on Circuits and Systems, vol. 36, pp. 1092-1101, 1989.
[16] S. Haykin, Neural Networks: A Comprehensive Foundation. New York: Macmillan, 1994.
[17] D. C. Park, M. A. El-Sharkawi and R. J. Marks II, "An Adaptively Trained Neural Network," IEEE Trans. on Neural Networks, vol. 2, pp. 334-345, 1991.
[18] A. Doulamis, N. Doulamis and S. Kollias, "On-Line Retrainable Neural Networks: Improving the Performance of Neural Networks in Image Analysis Problems," IEEE Trans. on Neural Networks, vol. 11, no. 1, January 2000.
