Multiple Neural Networks in Flexible Link Control Using Feedback-Error-Learning Areolino de Almeida Neto Universidade Federal do Maranhão, CEP 65.085-580 São Luís-MA, Brazil
[email protected]
Luiz Carlos S. Góes Instituto Tecnológico de Aeronáutica, CEP 12228-900 São José dos Campos-SP, Brazil
[email protected]
Cairo L. Nascimento Jr. Instituto Tecnológico de Aeronáutica, CEP 12228-900 São José dos Campos-SP, Brazil
[email protected] Abstract: In this paper two different control approaches using Feedback-Error-Learning are combined. Both are used for neural control of a flexible link in order to acquire the inverse dynamic model of the plant. Such a system is a non-minimum phase system, which is difficult to control with most techniques and with a single neural network. This difficulty is overcome by the two approaches discussed here: one redefines the output of the plant and the other modifies the reference input. Furthermore, the approaches are combined in order to obtain a better trained neural ensemble, and hence a better control performance. Keywords: multiple neural networks, combining neural networks, neural control, flexible link control, feedback-error-learning
1. Introduction Flexible link systems are used in many areas, for instance the aerospace industry, mainly due to the use of light-weight materials in large space structures and flexible space robots. Lightweight structures have advantages that result in an overall system with lower energy consumption, faster operation and lower cost. However, these structures also have disadvantages, the principal one being that the control of flexible systems is more difficult to implement than the control of rigid bodies, because of the greater influence of unmodelled dynamics, coupling effects, structural non-linearities, and errors in the estimation of physical parameters. In this paper, we present the combination of two approaches that use neural network (NN) techniques to control a flexible link system. The use of the first approach for controlling a flexible link was proposed by Talebi et al. (1998) and the use of the second was proposed by Rios Neto et al. (1999)-1. These structures were studied, tested and compared in Almeida Neto et al. (2000)-1, Almeida Neto et al. (2000)-2 and Almeida Neto et al. (2001). All structures use a NN in a control strategy known as Feedback-Error-Learning (FEL). This strategy, described in Miyamoto et al. (1988) and Gomi et al. (1993), uses a conventional feedback controller (CFC) concurrently with a NN adapted online using the output of the CFC as its error signal. The CFC should at least be able to stabilize the system under control when used without the neural network. The Feedback-Error-Learning strategy aims to acquire the inverse dynamics model of the plant under control. One great difficulty for most control strategies when dealing with flexible systems is the fact that they are in general non-minimum phase plants. In the first approach this problem is overcome by feeding back a redefined output of the flexible system, which is also used to generate the input signal for the NN controller.
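The FEL idea described above can be sketched in a few lines of code. This is a minimal illustration, not the paper's flexible-link setup: the toy second-order plant, the PD gains, the learning rate and the reference features are all assumed values chosen only to show the structure of the strategy (a CFC stabilizes the plant while a linear-in-weights network is adapted online using the CFC output as its error signal).

```python
import numpy as np

dt, gamma = 0.01, 1e-3
Kp, Kd = 5.0, 1.0                      # CFC (PD) gains, assumed
w = np.zeros(3)                        # NN weights start at zero output

def plant_step(x, u):
    # x = [position, velocity]; a stable linear plant stands in for the link
    pos, vel = x
    acc = u - 0.5 * vel - 2.0 * pos
    return np.array([pos + dt * vel, vel + dt * acc])

x = np.zeros(2)
u_hist = []
for k in range(5000):
    t = k * dt
    ref, dref, ddref = np.sin(t), np.cos(t), -np.sin(t)
    u_cfc = Kp * (ref - x[0]) + Kd * (dref - x[1])   # feedback controller
    phi = np.array([ref, dref, ddref])               # NN input features
    u_nn = w @ phi                                   # NN output
    w += gamma * u_cfc * phi                         # FEL rule: dw = gamma*u_cfc*dun/dw
    x = plant_step(x, u_cfc + u_nn)
    u_hist.append(u_cfc)

# as the NN acquires the inverse dynamics, the CFC effort should shrink
print(np.mean(np.abs(u_hist[:628])), np.mean(np.abs(u_hist[-628:])))
```

The key property of the strategy is visible at the end: the mean feedback effort over the last reference cycle is smaller than over the first, because the NN progressively takes over the control signal.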
In the second approach the non-minimum phase characteristic of the plant is dealt with by using as the input for the CFC the comparison between a delayed reference input signal and the actual output of the flexible system, while the NN controller receives the reference signal after it has been fed through a tapped-delay line, Nascimento Jr. et al. (1994). Because it is difficult to train a single NN well, i.e., for one NN to find a good minimum of the output error signal, both approaches are combined in order to exploit the strengths of each. How to combine several NNs is the object of recent research, principally in the control of dynamical systems. In this paper we propose a new way to combine the neural nets, similar to a growing-topology scheme: one NN is incorporated into the control system after the others have already been incorporated. This procedure tries to find a global minimum by keeping the contribution of the previously used neural nets and adding the improvement of the most recently incorporated one. This paper is structured as follows: in the second section the nonlinear mathematical model of the flexible link system is introduced; in the third section the output redefinition approach for tip position control is discussed; in the fourth section the input delayed approach is investigated; in the fifth section the combination of the two approaches is presented; in the sixth section the random search procedure used is detailed and some experimental results are presented; and in the last section conclusions are drawn. 2. Mathematical Modeling Figure 1 shows a structure composed of a flexible beam where one end is fixed on a rigid rotating hub and the other
end is free. The flexible structure is excited by an actuator at the hub with a torque τ. The structure has a rigid body angular displacement θ and an elastic deformation y(x,t), where x is the position of a point along the beam, such that 0 ≤ x ≤ l, and l is the length of the beam. The variable to be controlled is the total angular displacement of the beam tip, νtip, written as νtip = θ + y(l,t)/l.
Figure 1 – Schema of the Flexible Link
Considering I_h the hub inertia, r the radius of the hub, ρ the mass per unit length of the flexible appendage, l the length of the appendage, m_t the mass of the accelerometer at the tip of the flexible link, and the prime (′) and dot (˙) the partial derivatives with respect to space and time, respectively, the kinetic energy of the flexible system can be expressed as:

T = \frac{1}{2}\rho\int_0^l (r+x)^2\,dx\,\dot{\theta}^2 + \rho\int_0^l (r+x)\dot{y}\,dx\,\dot{\theta} + \frac{1}{2}\rho\int_0^l \dot{y}^2\,dx + \frac{1}{2}m_t(r+l)^2\dot{\theta}^2 + \frac{1}{2}I_h\dot{\theta}^2 + m_t(r+l)\dot{y}(l)\dot{\theta} + \frac{1}{2}m_t\dot{y}^2(l) + \frac{1}{2}m_t y^2(l)\dot{\theta}^2 + I_h\dot{\theta}\,\dot{y}'(0) + \frac{1}{2}I_h\dot{y}'^2(0) + \frac{1}{2}\rho\int_0^l y^2\,dx\,\dot{\theta}^2   (1)
The elastic potential energy is:

V = \frac{1}{2}\int_0^l EI\,y''^2\,dx   (2)
The parameter E is the modulus of elasticity and I is the cross-sectional area moment of inertia. The assumed modes method is used to transform the distributed system into a discrete one. The mode shape function φj(x) and the characteristic equation adopted are described in Rios Neto (1998). Applying the generalized Lagrange equation, Eq. (3), where the Lagrangian is L = T − V, q = [θ η1 η2]T (one rigid body angular displacement and two elastic modes), and Q = [τ 0 0]T, results in the matrix equation, Eq. (4), for the discrete flexible system:
\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}}\right) - \frac{\partial L}{\partial q} = Q   (3)
M\ddot{q} + Kq + H(\eta,\dot{\theta}) = F   (4)

F = [\tau \;\; 0 \;\; 0]^T   (5)

H = \begin{bmatrix} 2\,\eta^T [MS]\,\dot{\eta}\,\dot{\theta} \\ -[MS]\,\eta\,\dot{\theta}^2 \end{bmatrix}_{(3\times 1)}   (6)

M = \begin{bmatrix} I_t + \eta^T[MS]\eta & [\mu s]_1 & [\mu s]_2 \\ [\mu s]_1 & [MS]_{11} & [MS]_{12} \\ [\mu s]_2 & [MS]_{21} & [MS]_{22} \end{bmatrix}_{(3\times 3)}   (7)

K = \begin{bmatrix} 0 & 0 & 0 \\ 0 & K_{\eta 11} & K_{\eta 12} \\ 0 & K_{\eta 21} & K_{\eta 22} \end{bmatrix}_{(3\times 3)}   (8)

\eta = [\eta_1 \;\; \eta_2]^T
[MS] = \int_0^l \rho\,\phi(x)\phi^T(x)\,dx + m_t\,\phi(l)\phi^T(l)

[\mu s] = \int_0^l \rho\,x\,\phi^T(x)\,dx + m_t\,l\,\phi^T(l)

[K] = \int_0^l EI\,\phi''(x)\phi''^T(x)\,dx   (9)
I_t = I_h + m_t l^2

3. The Output Redefinition Approach The output redefinition approach can be performed by two different schemes. Wang et al. (1989) proposed to use as output of the flexible system the so-called reflected tip position (RTP), νa, defined as νa = h(θ, y(l,t)) = θ − y(l,t)/l. Later on, Madhavan et al. (1991) proposed a generalized version where the output is redefined as νa = θ + α y(l,t)/l, where −1 < α < 1. In this more general version, they showed that there is a critical value 0 < α* < 1 such that for α > α* the zero dynamics related to this redefined output are unstable, and for −1 < α < α* the zero dynamics are stable. Hence, an inverse dynamics controller can be designed to control the tip position. The value of α* depends on the payload, and it takes its smallest value when the payload is zero. In Talebi et al. (1998), the use of two possible NN structures to control the tip position of the flexible link system using the redefined output νa was proposed. The first structure was called Inverse Dynamic Model Learning (IDML) and is not used in this work. The second structure was called Nonlinear Regulator Learning (NRL) and is shown in Fig. 2. These structures are not new, but their use in flexible link control is. In both structures the CFC control law is u_{cfc} = K_2(\ddot{\nu}_r - \ddot{\nu}_a) + K_1(\dot{\nu}_r - \dot{\nu}_a) + K_0(\nu_r - \nu_a), and the learning rule for the NN is \dot{w} = \gamma\,u_{cfc}\,\partial u_n/\partial w, where γ is the learning rate. The NN output is u_n = \Phi(r_n, w), where r_n is the NN input and w is the NN weight matrix. In the NRL structure r_n = f_n(\nu_r, \dot{\nu}_r, \ddot{\nu}_r, \nu_r - \nu_a, \dot{\nu}_r - \dot{\nu}_a, \dot{\theta}, y, \dot{y}). In both structures the term
νr is the reference input signal for the tip position and the term νa is the redefined output. A conventional feedback controller is used as an ordinary controller to guarantee asymptotic stability during the learning period, and its output is fed to the NN as its output error signal. After the learning phase, the NN should have acquired an arbitrarily close model of the inverse dynamics of the plant. Then the error equation is given by K_2\ddot{e} + K_1\dot{e} + K_0 e \cong 0, where e = νr − νa, which means that the tracking error e should converge to zero, Talebi et al. (1998).
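The redefined output and the CFC law above can be sketched directly. In this illustration the value of α and the gains K0, K1, K2 are assumed placeholders; the paper only requires −1 < α < α* for stable zero dynamics.

```python
def redefined_output(theta, y_tip, ell, alpha=-1.0):
    # nu_a = theta + alpha*y(l,t)/l ; alpha = -1 gives the reflected tip position
    return theta + alpha * y_tip / ell

def cfc_law(e, de, dde, K0=1.0, K1=2.0, K2=0.5):
    # u_cfc = K2*(dd nu_r - dd nu_a) + K1*(d nu_r - d nu_a) + K0*(nu_r - nu_a)
    return K2 * dde + K1 * de + K0 * e

# hub at 0.5 rad, 0.02 m tip deflection on a 1 m beam -> reflected tip position
print(redefined_output(0.5, 0.02, 1.0))
```

With α = −1 the elastic deflection is subtracted rather than added, which is what moves the zeros of the redefined output into the stable region.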
Figure 2 – Structure of the Nonlinear Regulator Learning (NRL) 4. The Input Delayed Approach The structure of the adaptive control proposed by Rios Neto et al. (1999)-1 is shown in Fig. 3. This structure, originally proposed by Miyamoto et al. (1988), is modified by the introduction of two basic modifications: a) the high-order differentiators for the reference signal were substituted by a tapped delay line of length L; and b) the input reference signal is delayed by M sampling periods, Nascimento Jr. et al. (1994), before it is compared with the real tip position of the beam, given by νtip = θ + y(l,t)/l.
Figure 3 – Structure of Input Delayed Approach Assuming that the plant is a stable SISO linear dynamic system with a transfer function G(z), the linear NN transfer function is given by:
G_n(z) = \frac{u_n(z)}{\nu_r(z)} = \beta_0 + \beta_1 z^{-1} + \cdots + \beta_L z^{-L}   (10)
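Eq. (10) says the linear NN is simply an FIR filter acting on the tapped-delay-line contents of the reference signal: u_n[k] = β0 νr[k] + β1 νr[k−1] + … + βL νr[k−L]. A minimal sketch, with arbitrary placeholder β values:

```python
import numpy as np

def nn_output(beta, nu_r_hist):
    # nu_r_hist[i] holds nu_r[k-i], i.e. the tapped-delay-line contents
    return float(np.dot(beta, nu_r_hist))

beta = np.array([0.5, 0.3, 0.2])     # L = 2 taps
taps = np.array([1.0, 0.0, 0.0])     # a unit impulse entering the delay line
print(nn_output(beta, taps))         # the impulse picks out beta_0
```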
Assuming that the closed loop control without the NN is stable, the polynomials δ(z) and ψ(z) converge:
\delta(z) = \frac{G^{cfc}(z)}{1 + G(z)\,G^{cfc}(z)} = \sum_{j=0}^{\infty}\delta_j z^{-j}   (11)

\psi(z) = \frac{G(z)\,G^{cfc}(z)}{1 + G(z)\,G^{cfc}(z)} = \sum_{j=0}^{\infty}\psi_j z^{-j}   (12)
From Fig. 3, one finds the equation below, such that if G_n(z) = z^{-M}/G(z), then \nu_{tip}(z)/\nu_r(z) = z^{-M} and u_{cfc}(z)/\nu_r(z) = 0:
u_{cfc}(z) = \left[ z^{-M}\delta(z) - \psi(z)\,G_n(z) \right]\nu_r(z)   (13)
The NN training problem can be seen as an optimization problem, where the objective is to obtain a NN which minimizes the cost function J as defined below:
J = \frac{1}{2}\,E\left\{ \left[ z^{-M}\delta(z)\,\nu_r(z) - \psi(z)\,(\beta^*)^T \nu_r^*(z) \right]^2 \right\}   (14)

\beta^* = [\beta_0 \;\; \beta_1 \;\; \cdots \;\; \beta_L]^T   (15)

\nu_r^*(z) = [\nu_r(z) \;\; z^{-1}\nu_r(z) \;\; \cdots \;\; z^{-L}\nu_r(z)]^T   (16)
The cost function J will have a unique minimum if the matrix \partial^2 J/\partial\beta^{*2} is positive-definite, where:

\frac{\partial^2 J}{\partial\beta^{*2}} = E\left\{ \left[\psi(z)\,\nu_r^*(z)\right]\left[\psi(z)\,\nu_r^*(z)\right]^T \right\} \triangleq F_l   (17)
The βi parameters are found by solving the equations:
F_l\,\beta^* = F_r = E\left\{ \left( z^{-M}\delta(z)\,\nu_r^*(z) \right)\left( \psi(z)\,\nu_r^*(z) \right) \right\}   (18)
These parameters will have a unique solution if the input reference signal νr has enough exciting properties, such that F_l is a positive-definite matrix. However, to calculate F_r and F_l it is necessary to know the plant transfer function G(z) and its associated polynomials δ(z) and ψ(z). Hence, as G(z) is unknown, a learning algorithm with a simple gradient descent search can be used. In this case, the following learning rule for the parameters βi is derived:

\Delta\beta_i = -\gamma\,\frac{\partial J}{\partial\beta_i} = \gamma\,u_{cfc,k}\sum_{j=0}^{\infty}\psi_j\,\nu_{r,k-i-j}   (19)
Since the polynomial ψ(z) is unknown, Rios Neto et al. (1999)-2 proposed to use another (possibly guessed) polynomial as its approximation, such that Eq. (19) becomes:

\sigma(z) = \sum_{j=0}^{NL}\sigma_j z^{-j} \;\Rightarrow\; \Delta\beta_i = \gamma\,u_{cfc,k}\sum_{j=0}^{NL}\sigma_j\,\nu_{r,k-i-j}   (20)
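The practical update of Eq. (20) can be sketched as follows. The guessed polynomial σ(z) filters the delayed reference samples before they are correlated with the feedback signal; the σ values, learning rate and signals below are illustrative assumptions, not identified quantities.

```python
import numpy as np

def update_beta(beta, u_cfc_k, nu_r_hist, sigma, gamma):
    # delta beta_i = gamma * u_cfc[k] * sum_j sigma_j * nu_r[k - i - j]
    # nu_r_hist[m] holds nu_r[k - m]; it must cover len(beta)+len(sigma)-1 samples
    new_beta = beta.copy()
    for i in range(len(beta)):
        filt = sum(sigma[j] * nu_r_hist[i + j] for j in range(len(sigma)))
        new_beta[i] += gamma * u_cfc_k * filt
    return new_beta

beta = np.zeros(3)
nu_r_hist = np.array([1.0, 0.5, 0.25, 0.125, 0.0])  # nu_r[k], nu_r[k-1], ...
sigma = np.array([1.0, 0.5])                        # guessed psi(z) approximation
print(update_beta(beta, 2.0, nu_r_hist, sigma, 0.1))
```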
5. The Combination of Two Approaches It is not easy for a single NN to produce an excellent neural system, that is, a neural system able to acquire the inverse model of the plant. One way forward is to change the NN used, either by changing its parameters, such as the number of units or of layers, or by changing the way the NN is applied in the control system. Another way to solve the problem is to apply more than one NN. The use of several NNs has been explored by many researchers, but not so much in control systems and even less in flexible link control. In this work we combine the two approaches above in order to obtain a better performance than if only one NN, and hence only one approach, were used. Figure 4 shows the scheme of the two approaches combined.
Figure 4 – Structure of Two NNs Combined Using Input Delayed and NRL Approaches The main problem in using more than one NN is how to mix their outputs in order to produce the single output of the neural system. Some problems permit several outputs, so each output can be provided by one NN; in problems where this is not possible, the outputs of the NNs must be mixed, and the result of this mixing is taken as the output of the neural system. Important works that treat this problem are Hashem et al. (1994), Hashem and Schmeiser (1995), Hashem (1997) and Jacobs and Jordan (1993). Hashem's works present a procedure to find the weights in a weighted sum of the NN outputs, while in the work of Jacobs another NN is used to provide the weighting of the NN outputs. In essence these and other works are very similar: the outputs of the NNs are weighted before producing the output of the neural system. Here the outputs of the NNs are not weighted, because the training of one NN does not happen simultaneously with the training of another. In fact, in the beginning only one NN, say Gn1 in Fig. 4, is used and trained to acquire the inverse model of the plant. The complete learning of the inverse model will probably not occur, so this NN, Gn1, is no longer trained, but it continues producing its output and feeding the plant with it. Then another NN, Gn2, is incorporated into the system in such a way that its output is zero at the beginning and step by step it produces an output, which is added to the output of the previous NN. The reason no weight is necessary for each NN is that during the training of Gn2 the output of the previously trained NN continues to feed the plant, and after this output is processed by the plant and by the CFC, the resulting value is used in the learning of Gn2.
The equations below show the error used by the first NN and the error used by the second NN, respectively Eqs. (21) and (22), where E(z) is the error, R(z) is the reference and G(z) is the transfer function of the plant.
E(z) = z^{-M} R(z) - G(z)\left( U_{cfc}(z) + U_{n1}(z) \right)
(21)
E(z) = z^{-M} R(z) - G(z)\left( U_{cfc}(z) + U_{n1}(z) + U_{n2}(z) \right)
(22)
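The incremental combination can be sketched as follows: Gn1 is trained first and then frozen, Gn2 starts with zero output and is trained while Gn1 keeps feeding the plant, and no mixing weights are needed because the two outputs are simply summed. The feature vector, errors and learning rate below are illustrative assumptions.

```python
import numpy as np

class LinearNN:
    def __init__(self, n):
        self.w = np.zeros(n)           # zero weights -> zero initial output
    def output(self, phi):
        return float(self.w @ phi)
    def fel_update(self, u_cfc, phi, gamma):
        self.w += gamma * u_cfc * phi  # feedback-error-learning rule

gn1, gn2 = LinearNN(3), LinearNN(3)
phi = np.array([1.0, 0.5, -0.5])

# phase 1: only gn1 adapts (one illustrative step)
gn1.fel_update(u_cfc=1.0, phi=phi, gamma=0.1)
# phase 2: gn1 is frozen; gn2 adapts while the control uses the summed outputs
gn2.fel_update(u_cfc=0.4, phi=phi, gamma=0.1)
u_total = gn1.output(phi) + gn2.output(phi)
print(u_total)
```

Because the residual error seen by Gn2 already contains the effect of Gn1's output on the plant, the sum of the two outputs plays the role that a single fully trained NN would play.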
The first advantage of this scheme is that the learning of the inverse model is divided between two NNs, so each NN faces a much simpler task than learning the complete inverse model. The second advantage is that it dispenses with the procedure to find the weights for each NN, which makes this scheme smaller than if some weighting arrangement were necessary. The computational cost associated with training more than one NN could make the control infeasible in a real implementation, because processing the training algorithm is much heavier than processing the output of a trained NN. One disadvantage of this scheme is the slow learning of the inverse model; but which is better, a slow and right learning or a fast and wrong one? Of course, the best would be a fast and right learning, but that is not yet attainable. 6. Experimental Results The results presented here were obtained from simulations. Even so, when two NNs were combined using the approaches described above, the performance was better. The simulations consisted of tests using the approaches above with the NNs trained by the backpropagation (BP) algorithm. Due to the nonlinear characteristics of the flexible system, a linear NN was not able to converge to a good approximation of the delayed inverse dynamics. So in this work we used a 3-layer Perceptron (input, hidden and output layers). The weights between the input and hidden layers are initialized randomly and the other weights are initialized with zero, in such a way that the output of each NN starts at zero: this procedure avoids destabilizing the system at the beginning of the training session. Overall, the training follows the well-known procedure: the weights are initialized, the BP algorithm is run and the results are obtained; if a good result is not obtained, the procedure is repeated with some changes in the first NN.
However, after some experiments, the result no longer improves by only changing some parameter of the first NN; when this occurs, it is time to incorporate another NN and run the BP algorithm again, but only over the second NN. The variable controlled in these tests was only the hub position, with a sine wave reference of 90º amplitude and a period of 6 s. For the input delayed approach, Linput and Lhidden, the numbers of units in the input and hidden layers, were 20 and 50, respectively, while in the NRL structure Linput = 9 and Lhidden = 50 as well. The learning rate parameters γinput and γhidden used by the BP algorithm were both 0.000001 for all trainings. The delay for the inverse model of the system in the input delayed approach (M) was 1. For both approaches, the CFC was a single PD controller with proportional and derivative gains 45 and 22, respectively. Figure 5 shows the output of the plant at the end of the training of the second NN. It is possible to see that the actual response practically follows the reference signal.
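The test conditions reported above can be written out as a small sketch: a 90-degree amplitude sine reference with a 6 s period, and the PD feedback controller with the reported gains (Kp = 45, Kd = 22). Only these numbers come from the text; the function names are illustrative.

```python
import numpy as np

def reference(t):
    # hub position reference in radians: 90 deg amplitude, 6 s period
    return np.deg2rad(90.0) * np.sin(2.0 * np.pi * t / 6.0)

def pd_control(e, de, Kp=45.0, Kd=22.0):
    # the CFC used for both approaches: a single PD controller
    return Kp * e + Kd * de

print(reference(1.5))   # quarter period -> peak of the sine, pi/2 rad
```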
Figure 5 – Output of the System at the End of Training Using Both Structures Figure 6 shows the output error during the complete training. From 0 s up to 120 s, only one NN is used, with the Input Delayed Approach. As can be seen, after 50 s the error stabilized and the NN could not improve the performance. At this point the maximum absolute error was 2.5º, that is, a maximum relative error of 2.8 %. When the second NN was incorporated into the system, the first NN continued to produce its output, so the second NN only had to improve the performance by learning a part of the plant, it not being necessary to learn everything about the plant. As a result, the second NN increased the performance, bringing the maximum absolute error to 1.25º, or the equivalent relative error of 1.4 %.
Figure 6 – Output Error for the Complete Training Using First the Input Delayed Approach and then the NRL Approach Figure 7 shows the quadratic mean of the hub position error during the training session. This graph also shows the improvement due to the addition of the second NN.
Figure 7 – Quadratic Mean Output Error for Hub Position during the Training Session Figure 8 shows the quadratic mean value for the outputs of the controllers: the output of the CFC, the sum of the outputs of the NNs, and the sum of these signals, which is the true control signal sent to the plant. In this figure the quadratic mean value (QMV) of the output of the CFC (Ucfc) decreases, while the QMV of the sum of the NN outputs (Un1+Un2) increases, and the control law (U = Ucfc+Un1+Un2) is more influenced by the NNs at the end of the training session than by the CFC. This is evidence that in this simulation the inverse model obtained with the combination of the NRL and Input Delayed structures was close to the desired model.
Figure 8 – Quadratic Mean Value for the CFC (Ucfc), NNs (Un1+Un2) and for the Control Law (U=Ucfc+Un1+Un2) Figure 9 shows the outputs of the controllers at the end of the training session. One can see that the signal of the first NN is the principal part of the total control law, the signal of the second NN is smaller than the output of the first NN but greater than the output of the CFC, and the output of the CFC is bounded around zero volts.
Figure 9 – Control Signals U, Ucfc, Un1 and Un2
7. Conclusions Two different structures for neural control of a flexible link were combined in such a way that only one feedback controller was used and two NNs, each belonging to one approach, had the task of learning the inverse model of the flexible plant. In simulations this combination was successful, and it was shown that learning the inverse model of a plant becomes much easier with more than one NN. The performance achieved with the combination was very good. Moreover, a way to combine the networks was presented that requires no weighting procedure or weighting element, such as the extra NN used in many other works. 8. References Almeida Neto, A., Rios Neto, W., Góes, L.C.S. and Nascimento Jr., C.L., 2000, “Neural Control of a Flexible Link Using Feedback-Error-Learning”, Proceedings of the 3rd International Conference on Nonlinear Dynamics, Chaos, Control and Their Applications in Engineering Sciences, Vol. 6, Campos do Jordão, Brazil, pp. 242-251. Almeida Neto, A., Rios Neto, W., Góes, L.C.S. and Nascimento Jr., C.L., 2000, “Feedback-Error-Learning for Controlling a Flexible Link”, Proceedings of the 4th Brazilian Symposium on Neural Networks, Vol. 1, Rio de Janeiro, Brazil, pp. 273-278. Almeida Neto, A., Góes, L.C.S. and Nascimento Jr., C.L., 2001, “Flexible Link Control with Three Different Structures Using Feedback-Error-Learning”, Proceedings of the 5th Brazilian Congress of Neural Networks, Vol. 1, Rio de Janeiro, Brazil, pp. 637-642. Gomi, H. and Kawato, M., 1993, “Neural Network Control for a Closed-loop System using Feedback-Error-Learning”, Neural Networks, Vol. 6, pp. 933-946. Hashem, S. et al., 1994, “Optimal Linear Combinations of Neural Networks: an Overview”, Proceedings of the IEEE Conference on Neural Networks, Vol. 3, pp. 1507-1512. Hashem, S. and Schmeiser, B., 1995, “Improving Model Accuracy Using Optimal Linear Combinations of Trained Neural Networks”, IEEE Transactions on Neural Networks, Vol. 6, N.
3, pp. 792-794. Hashem, S., 1997, “Algorithms for Optimal Linear Combinations of Neural Networks”, Proceedings of the International Conference on Neural Networks, Vol. 1, pp. 242-247. Jacobs, R. A. and Jordan, M. I., 1993, “Learning Piecewise Control Strategies in a Modular Network Architecture”, IEEE Transactions on Systems, Man and Cybernetics, Vol. 23, N. 2, pp. 337-345. Madhavan, S.K. and Singh, S.N., 1991, “Inverse Trajectory Control and Zero Dynamic Sensitivity of an Elastic Manipulator”, Int. Journal of Robotics and Automation, pp. 179-191. Miyamoto, H., Kawato, M., Setoyama, T. and Suzuki, R., 1988, “Feedback Error-Learning Neural Network for Trajectory Control of a Robotic Manipulator”, Neural Networks, Vol. 1, pp. 251-265. Nascimento Jr., C.L., 1994, “Artificial Neural Networks in Control and Optimization”, PhD Thesis, Control Systems Centre, UMIST, UK. Rios Neto, W., 1998, “Control of a System with Flexible Appendages using Neural Networks”, MSc Thesis, Technological Institute of Aeronautics (in Portuguese). Rios Neto, W., Góes, L.C.S. and Nascimento Jr., C.L., 1999, “Experimental Neural Control of an Unconstrained Multibody System with Flexible Appendages”, Proceedings of the 15th Brazilian Congress of Mechanical Engineering, Águas de Lindóia-SP, Brazil. Rios Neto, W., Nascimento Jr., C.L. and Góes, L.C.S., 1999, “Positional Control of a Flexible Structure Using Neural Networks”, Proceedings of the 4th Brazilian Congress of Neural Networks, Vol. 1, S. José dos Campos-SP, Brazil, pp. 378-383. Talebi, H.A., Khorasani, K. and Patel, R.V., 1998, “Neural Network Based Control Schemes for Flexible Link Manipulators: Simulations and Experiments”, Neural Networks, Vol. 11, pp. 1357-1377. Wang, D. and Vidyasagar, M., 1989, “Transfer Function for a Single Flexible Link”, Proceedings of the IEEE International Conference on Robotics and Automation, pp. 1042-1047.