Proceedings of the 42nd IEEE Conference on Decision and Control Maui, Hawaii USA, December 2003
FrE03-6
Inverse Optimal Design for Trajectory Tracking with Input Saturations via Adaptive Recurrent Neural Control

Luis J. Ricalde and Edgar N. Sanchez
CINVESTAV, Unidad Guadalajara, Apartado Postal 31-430, Plaza La Luna, Guadalajara, Jalisco C.P. 45091, Mexico, e-mail: lricalde, [email protected]

Abstract – This paper addresses the trajectory tracking problem for nonlinear systems with unknown parameters, unmodelled dynamics and input saturations. A high order recurrent neural network is used to identify the unknown system, and a learning law is obtained using the Lyapunov methodology. Then a control law, which stabilizes the tracking error dynamics, is developed using the inverse optimal control approach, recently introduced to nonlinear systems theory. Tracking error boundedness is established as a function of a design parameter. The applicability of the approach is illustrated via simulations, by synchronization of nonlinear oscillators.
Keywords - Recurrent neural networks, trajectory tracking, adaptive control, input saturation, inverse optimal design.

0-7803-7924-1/03/$17.00 ©2003 IEEE

I. Introduction

Since the seminal paper [8], there has been continuously increasing interest in applying neural networks to identification and control of nonlinear systems. More recently, recurrent neural networks have been developed for this purpose, since they allow more efficient modeling of the underlying dynamical systems [9]. Two recent books [14], [10] have reviewed the application of recurrent neural networks to nonlinear system identification and control. In [14], in particular, off-line learning is used, while [10] analyzes adaptive identification and control by means of on-line learning, where stability of the closed-loop system is established based on the Lyapunov function method. In [10], the trajectory tracking problem is reduced to a linear model following problem, with an application to DC electric motors.

The problem of designing robust controllers for nonlinear systems with uncertainties, which guarantee stability and trajectory tracking, has received increasing attention lately. The presence of input saturations limits the ability to freely modify the system behavior by high gain feedback in order to compensate for the effects of unmodelled dynamics and external disturbances. These limitations show up as loss of stability, undesired oscillations and other adverse effects. In [2], a control law based on the Sontag formula with input saturations [7] is developed and applied to a chemical reactor.

On the other hand, it is worth mentioning that the H∞ control approach [1] minimizes the control effort and achieves robust stabilization. One major difficulty with this approach, alongside its possible system structural instability, seems to be the requirement of solving some resulting partial differential equation system. In order to alleviate this computational problem, the so-called inverse optimal control technique was recently developed, based on the input-to-state stability concept [6]. This approach has the advantage of not requiring the solution of a Hamilton-Jacobi-Bellman equation. In [13], the inverse optimal control approach is applied to control robotic manipulators with unmodeled dynamics.

In this paper, we extend our previous results [13] to trajectory tracking for nonlinear systems in the presence of input saturation and uncertainties. The proposed adaptive control scheme is composed of a recurrent neural identifier and a controller, where the former is used to build an on-line model for the unknown plant, and the latter to force the unknown plant to track the reference trajectory. An update law for the high order recurrent neural network is proposed via the Lyapunov methodology. A robust learning law to avoid parameter drift in the presence of modelling error is also proposed. The control law is synthesized using the Lyapunov methodology and the Sontag control law for stabilizing systems with input saturations and uncertain terms. This control law explicitly depends on the input saturations. Boundedness of the tracking error is proven, and an estimate of the closed loop stability region is given in order to determine the admissible bounds of the uncertainties and the desired tracking error. Finally, we demonstrate that the developed control law minimizes a cost functional related to the solution of a Hamilton-Jacobi-Bellman partial differential equation. The proposed algorithm is tested via simulations by synchronizing two nonlinear oscillators.

II. Recurrent neural network model

The dynamic neural network to be used is first introduced.

A. Recurrent Higher-Order Neural Networks

In [5], recurrent higher-order neural networks (RHONN) are defined as

$$\dot{x}_i = -a_i x_i + \sum_{k=1}^{L} w_{ik} \prod_{j \in I_k} y_j^{d_j(k)}, \quad i = 1, ..., n \qquad (1)$$
where $x_i$ is the $i$th neuron state, $L$ is the number of higher-order connections, $\{I_1, I_2, ..., I_L\}$ is a collection of non-ordered subsets of $\{1, 2, ..., m+n\}$, $a_i > 0$, $w_{ik}$ are the adjustable weights of the neural network, $d_j(k)$ are nonnegative integers, and $y$ is a vector defined by

$$y = [y_1, ..., y_n, y_{n+1}, ..., y_{n+m}]^\top = [S(x_1), ..., S(x_n), S(u_1), ..., S(u_m)]^\top,$$

with $u = [u_1, u_2, ..., u_m]^\top$ being the input to the neural network, and $S(\cdot)$ a smooth sigmoid function formulated by

$$S(x) = \frac{1}{1 + \exp(-\beta x)} + \varepsilon.$$

For the sigmoid function, $\beta$ is a positive constant and $\varepsilon$ is a small positive real number. Hence, $S(x) \in [\varepsilon, \varepsilon + 1]$. As can be seen, (1) allows the inclusion of higher-order terms. By defining a vector

$$z(x, u) = [z_1(x, u), ..., z_L(x, u)]^\top = \Big[\prod_{j \in I_1} y_j^{d_j(1)}, ..., \prod_{j \in I_L} y_j^{d_j(L)}\Big]^\top,$$

(1) can be rewritten as

$$\dot{x}_i = -a_i x_i + w_i^\top z_i(x, u), \quad i = 1, ..., n \qquad (2)$$

where $w_i = [w_{i,1} ... w_{i,L}]^\top$. In this paper, terms of the form $y = [y_1, ..., y_n, y_{n+1}, ..., y_{n+m}]^\top = [S(x_1), ..., S(x_n), u_1, ..., u_n]^\top$ are considered. This means that the same number of inputs and states is used. It is also assumed that the input to the neural network enters directly, so that (2) can be rewritten as

$$\dot{x}_i = -a_i x_i + w_i^\top z_i(x) + u_i, \quad i = 1, ..., n \qquad (3)$$

Reformulating (2) in the matrix form yields

$$\dot{x} = Ax + W z(x) + u \qquad (4)$$

where $x \in \mathbb{R}^n$ [...]
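As an illustration of the model (1)-(4), the following is a minimal sketch of how the sigmoid, the high-order vector z(x) and the right-hand side of (4) could be evaluated numerically. The function names, the index sets, the powers, and the values of A, W, β and ε are hypothetical choices made only for this example; they are not taken from the paper.

```python
import numpy as np

def sigmoid(x, beta=1.0, eps=0.01):
    """S(x) = 1 / (1 + exp(-beta x)) + eps, as in the definition of y."""
    return 1.0 / (1.0 + np.exp(-beta * np.asarray(x, dtype=float))) + eps

def high_order_terms(x, index_sets, powers, beta=1.0, eps=0.01):
    """z_k(x) = prod_{j in I_k} S(x_j)^{d_j(k)} for k = 1, ..., L."""
    y = sigmoid(x, beta, eps)
    return np.array([
        np.prod([y[j] ** d for j, d in zip(idx, pw)])
        for idx, pw in zip(index_sets, powers)
    ])

def rhonn_rhs(x, u, A, W, index_sets, powers):
    """Right-hand side of (4): dx/dt = A x + W z(x) + u."""
    return A @ x + W @ high_order_terms(x, index_sets, powers) + u

# Hypothetical 2-state network with L = 3 terms: S(x1), S(x2), S(x1) S(x2).
index_sets = [(0,), (1,), (0, 1)]
powers = [(1,), (1,), (1, 1)]
A = -2.0 * np.eye(2)        # assumed stable diagonal matrix
W = np.zeros((2, 3))        # untrained weights (assumption)
x0 = np.array([0.3, -0.4])
u0 = np.array([0.1, 0.0])
z0 = high_order_terms(x0, index_sets, powers)
dx0 = rhonn_rhs(x0, u0, A, W, index_sets, powers)
```

With zero weights the network reduces to the linear part plus the input, which gives a quick sanity check before any learning law is attached.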
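III. Plant Identification

Let us consider the unknown nonlinear plant

$$\dot{x}_p = F_p(x_p, u) \triangleq f_p(x_p) + \mathrm{sat}(u) \qquad (5)$$

where $x_p, f_p \in \mathbb{R}^n$ [...]

$$V = \frac{1}{2}\|e_i\|^2 + \frac{1}{2\gamma}\,\mathrm{tr}\{\tilde{W}^\top \tilde{W}\}, \quad \gamma > 0 \qquad (10)$$

[...]

$$\dot{\hat{w}}_{i,j} = -\gamma e_i z(x_{pj}), \quad i = 1, 2, ..., n, \quad j = 1, 2, ..., L \qquad (12)$$

Substituting the adaptation law in (11) gives $\dot{V} = -\lambda \|e_i\|^2 \le 0$.

[...]

$$\dot{\hat{w}}_{i,j} = \begin{cases} -\gamma e_i z(x_{pj}) & \text{if } |w_i| < w_m \\ -\gamma e_i z(x_{pj}) - \sigma \gamma w_i & \text{if } |w_i| \ge w_m \end{cases} \qquad (13)$$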
where $\sigma$ is a positive constant and $w_m$ is the upper bound for the neural network weights. It can be shown that the robust learning law does not affect the stability of the identification error but improves it, making the Lyapunov function time derivative more negative. For a detailed demonstration, see [10]. Once the identification error is minimized, we consider trajectory tracking for the neural network (and the unknown system) with input saturation and including the modelling error.

IV. Trajectory Tracking Analysis

Considering the nonlinear system with input saturation (5), which we model by the neural network identifier described in the previous section, including the modelling error term, we have

$$\dot{x} = Ax + W z(x_p) + w_{per} + \mathrm{sat}(u) \qquad (14)$$

where we assume that the modelling error is bounded. In the following, for simplicity, we will use $u$ instead of $\mathrm{sat}(u)$. We will design a robust controller which satisfies $|u| \le u_{\max}$ and guarantees boundedness of the tracking error between the plant and the reference signal given as:

$$\dot{x}_r = f_r(x_r, u_r), \quad x_r \in \mathbb{R}^n$$

[...]

$$V = \frac{1}{2}\|e_i\|^2 + \frac{1}{2}\|e\|^2 + \frac{1}{2\gamma}\,\mathrm{tr}\{\tilde{W}^\top \tilde{W}\} \qquad (19)$$

Its time derivative, along the trajectories of (9) and (18), is

$$\dot{V} = -\lambda\|e_i\|^2 + e_i^\top \tilde{W} z(x_p) + \frac{1}{\gamma}\,\mathrm{tr}\{\dot{\tilde{W}}^\top \tilde{W}\} - \lambda\|e\|^2 + e^\top W z(x_p) + e^\top(\alpha_r(x_r) + w_{per}) + e^\top u \qquad (20)$$

Replacing the learning law (12) in (20) we obtain

$$\dot{V} = -\lambda\|e_i\|^2 - \lambda\|e\|^2 + e^\top W z(x_p) + e^\top(\alpha_r(x_r) + w_{per}) + e^\top u \triangleq -\lambda\left(\|e_i\|^2 + \|e\|^2\right) + L_{f_t}V + (L_w V)\,w_{per} + (L_g V)\,u \qquad (21)$$

with

$$L_{f_t}V = e^\top W z(x_p) + e^\top \alpha_r(x_r), \qquad L_w V = e^\top, \qquad L_g V = e^\top.$$

Furthermore, similar to [2], we assume that the uncertain term $(L_w V)\,w_{per}$ is bounded from above as

$$(L_w V)\,w_{per} \le |e|\,w_b$$

[...]

where $\chi$, $\phi$ are adjustable parameters. Replacing the control law (22) in (21) and taking into account the bound for $(L_w V)\,w_{per}$, we obtain

$$\dot{V} \le -\lambda\left(\|e_i\|^2 + \|e\|^2\right) + \frac{(L_f V + w_b|e|)\sqrt{1 + (u_{\max} L_g V)^2}}{1 + \sqrt{1 + (u_{\max} L_g V)^2}} + \frac{|e|\,w_b\,(\phi - (\chi - 1)|e|)\,\frac{|e|}{|e| + \phi}}{1 + \sqrt{1 + (u_{\max} L_g V)^2}} - \frac{\sqrt{(L_f V + \chi w_b|e|)^2 + (u_{\max} L_g V)^4}}{1 + \sqrt{1 + (u_{\max} L_g V)^2}} \qquad (23)$$

It is easy to verify that when $|e| > \frac{\phi}{\chi - 1}$, the third term is strictly negative; hence, we proceed to study the case when $|e| \le \frac{\phi}{\chi - 1}$. First, we consider that the modelling error is a disturbance which satisfies a growth bound of the form

$$|L_w V|\,w_{per} \le |e|\,w_b$$

Then, we can obtain the following bound

$$\frac{w_b\,|L_w V|\,|e|\,(\phi - (\chi - 1)|e|)}{(|e| + \phi)\left[1 + \sqrt{1 + (u_{\max} L_g V)^2}\right]} \le w_b\,|L_w V|\,|e| \le w_b\,|e|^2 \le \frac{w_b\,\phi^2}{(\chi - 1)^2} \triangleq \beta(\phi), \quad \forall\; |e| \le \frac{\phi}{\chi - 1}$$

where we assumed that the following inequality holds for a suitable $\phi$:

$$(|e| + \phi)\left[1 + \sqrt{1 + (u_{\max} L_g V)^2}\right] \ge \phi$$
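The bound β(φ) can be checked numerically. The sketch below evaluates the uncertainty term of (23) on the interval |e| ≤ φ/(χ − 1) and compares it with β(φ). It assumes |L_w V| = |L_g V| = |e| (from L_w V = L_g V = eᵀ), uses χ = 2.1, φ = 0.1 and u_max = 5 as in the example section, and takes w_b = 1 as a purely hypothetical value; the function names are illustrative.

```python
import numpy as np

def uncertainty_term(e_norm, w_b, u_max, chi, phi):
    """Third term of (23), written with |L_w V| = |L_g V| = |e| (assumption)."""
    c = u_max * e_norm                       # u_max |L_g V|
    D = 1.0 + np.sqrt(1.0 + c**2)            # common denominator
    return w_b * e_norm**2 * (phi - (chi - 1.0) * e_norm) / ((e_norm + phi) * D)

def beta_bound(w_b, chi, phi):
    """beta(phi) = w_b phi^2 / (chi - 1)^2."""
    return w_b * phi**2 / (chi - 1.0) ** 2

chi, phi = 2.1, 0.1          # design parameters used in the example section
w_b, u_max = 1.0, 5.0        # w_b is a hypothetical bound; u_max as in the example
e_grid = np.linspace(0.0, phi / (chi - 1.0), 201)
term = uncertainty_term(e_grid, w_b, u_max, chi, phi)
beta = beta_bound(w_b, chi, phi)
```

On this grid the term stays between zero and β(φ), which is consistent with treating β(φ) as a constant upper bound inside the ball |e| ≤ φ/(χ − 1).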
Substituting $\beta(\phi)$ in (23) we obtain

$$\dot{V} \le -\lambda\left(\|e_i\|^2 + \|e\|^2\right) + \beta(\phi) + \frac{(L_f V + w_b|e|)\sqrt{1 + (u_{\max} L_g V)^2}}{1 + \sqrt{1 + (u_{\max} L_g V)^2}} - \frac{\sqrt{(L_f V + \chi w_b|e|)^2 + (u_{\max} L_g V)^4}}{1 + \sqrt{1 + (u_{\max} L_g V)^2}} \qquad (24)$$

Now, to determine the sign of the last two terms, we consider two cases.

Case 1. $L_f V + w_b|e| \le 0$. Replacing this inequality in (24) yields

$$\dot{V} \le -\lambda\left(\|e_i\|^2 + \|e\|^2\right) + \beta(\phi) - \frac{\left|L_f V + w_b|e|\right|\sqrt{1 + (u_{\max} L_g V)^2}}{1 + \sqrt{1 + (u_{\max} L_g V)^2}} - \frac{\sqrt{(L_f V + \chi w_b|e|)^2 + (u_{\max} L_g V)^4}}{1 + \sqrt{1 + (u_{\max} L_g V)^2}}$$

Case 2. $0 \le L_f V + w_b|e| \le u_{\max}$. For this case, we consider the inequality

$$(L_f V + \chi w_b|e|)^2 \le (u_{\max} L_g V)^2 \qquad (25)$$

Then

$$-\sqrt{(L_f V + \chi w_b|e|)^2 + (u_{\max} L_g V)^4} \le -(L_f V + \chi w_b|e|)\sqrt{1 + (u_{\max} L_g V)^2}$$

Replacing this bound in (24), we obtain

$$\dot{V} \le -\lambda\left(\|e_i\|^2 + \|e\|^2\right) + \beta(\phi) + \frac{(1 - \chi)\,w_b|e|\sqrt{1 + (u_{\max} L_g V)^2}}{1 + \sqrt{1 + (u_{\max} L_g V)^2}} \qquad (26)$$

where we select $\chi > 1$ such that the last term on the right-hand side is strictly negative. From (26), we deduce that there exists a class K function $\alpha$ such that $\dot{V} \le -\alpha(|e|) + \beta(\phi)$. Then, by an appropriate selection of $\phi$ we can make $\beta(\phi)$ small enough such that $\beta(\phi) \le \frac{1}{2}\alpha(|e|)$ in order to obtain

$$\dot{V} \le -\frac{1}{2}\alpha(|e|) \qquad (27)$$

Hence, the trajectories of $V$ will approach a ball of radius $|e| = \alpha^{-1}(2\beta)$. Therefore, whenever the inequality (25) holds, the trajectories of (18) will approach an ultimate bound given by $\alpha^{-1}(2\beta)$. Then, the designer can choose the parameters $\phi$ and $\chi$ in order to obtain a suitable tracking error. The design parameter $\phi$ should be selected small relative to the desired ultimate bound for the tracking error.

B. Inverse Optimality Analysis.

Once the problem of finding the control law (22), which stabilizes (18), is solved, we can proceed to demonstrate that this control law minimizes a cost functional defined by

$$J(u) = \lim_{t \to \infty}\left\{ 2V + \int_0^t \left( l(e, \hat{W}) + u^\top R(e, \hat{W})\,u \right) d\tau \right\} \qquad (28)$$

where the Lyapunov function solves the Hamilton-Jacobi-Bellman family of partial differential equations:

$$l(e, \hat{W}) + L_f V - \frac{1}{4} L_g V\, R(e, \hat{W})^{-1} (L_g V)^\top + |L_w V|\,w_b = 0 \qquad (29)$$

Note that $2\beta V$ in (29) is bounded when $t \to \infty$, since by (27) $V$ is decreasing and bounded from below by $V(0)$. Therefore, $\lim_{t \to \infty} V(t)$ exists and is finite.

In [6], $l(e, \hat{W})$ is required to be positive definite and radially unbounded with respect to $e$. Here, from (29) we have

$$l(e, \hat{W}) = -L_f V + \frac{1}{4} L_g V\, R(e, \hat{W})^{-1} (L_g V)^\top - |L_w V|\,w_b$$

Substituting $R(e, \hat{W})^{-1}$ into (29), we obtain after some algebraic manipulations

$$l(e, \hat{W}) = \frac{-\frac{1}{2} L_f V - (L_f V + |e|\,w_b)\sqrt{1 + (u_{\max} L_g V)^2}}{1 + \sqrt{1 + (u_{\max} L_g V)^2}} + \frac{|e|\,w_b\left(\left(\frac{1}{2}\chi - 1\right)|e| - \phi\right)\frac{|e|}{|e| + \phi}}{1 + \sqrt{1 + (u_{\max} L_g V)^2}} + \frac{\frac{1}{2}\sqrt{(L_f V + \chi w_b|e|)^2 + (u_{\max} L_g V)^4}}{1 + \sqrt{1 + (u_{\max} L_g V)^2}} \qquad (30)$$

$$l(e, \hat{W}) \ge \frac{-(L_f V + |e|\,w_b)\left(\frac{1}{2} + \sqrt{1 + (u_{\max} L_g V)^2}\right)}{1 + \sqrt{1 + (u_{\max} L_g V)^2}} + \frac{\frac{1}{2}\sqrt{(L_f V + \chi w_b|e|)^2 + (u_{\max} L_g V)^4}}{1 + \sqrt{1 + (u_{\max} L_g V)^2}} + \frac{|e|\,w_b\left(\left(\frac{1}{2}\chi - 1\right)|e| - \phi\right)\frac{|e|}{|e| + \phi}}{1 + \sqrt{1 + (u_{\max} L_g V)^2}} \qquad (31)$$
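The step from (30) to (31) only discards a nonnegative quantity: expanding the first numerator of (31) shows that the two right-hand sides differ by ½|e|w_b divided by the common denominator. The following sketch checks this numerically on random samples. It treats L_f V as a free scalar, assumes |L_g V| = |e| (from L_g V = eᵀ), and uses a hypothetical w_b = 0.7 together with u_max = 5, χ = 2.1 and φ = 0.1 from the example section; the function name is illustrative.

```python
import numpy as np

def cost_weight_bounds(lfv, e_norm, w_b, u_max, chi, phi):
    """Return (right-hand side of (30), right-hand side of (31), denominator D)."""
    w = w_b * e_norm                          # |e| w_b
    c = u_max * e_norm                        # u_max |L_g V| with |L_g V| = |e|
    s = np.sqrt(1.0 + c**2)
    D = 1.0 + s
    root = np.sqrt((lfv + chi * w) ** 2 + c**4)
    # Terms shared by (30) and (31): the phi-dependent term and the square-root term.
    shared = (w * ((0.5 * chi - 1.0) * e_norm - phi) * e_norm / (e_norm + phi)
              + 0.5 * root) / D
    l30 = (-0.5 * lfv - (lfv + w) * s) / D + shared
    l31 = -(lfv + w) * (0.5 + s) / D + shared
    return l30, l31, D

rng = np.random.default_rng(0)
lfv = rng.uniform(-5.0, 5.0, 1000)            # sampled values standing in for L_f V
e_norm = rng.uniform(0.0, 3.0, 1000)          # sampled tracking error norms
w_b = 0.7                                     # hypothetical uncertainty bound
l30, l31, D = cost_weight_bounds(lfv, e_norm, w_b, 5.0, 2.1, 0.1)
gap = l30 - l31                               # expected: 0.5 * w_b * |e| / D
```

The gap is nonnegative for every sample, so (31) is indeed a lower bound on the expression in (30) under these assumptions.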
Since $L_f V + \chi w_b|e| \le u_{\max}|L_g V|$, if the trajectories satisfy

$$(L_f V + w_b|e|)\left(\frac{1}{2} + \sqrt{1 + (u_{\max} L_g V)^2}\right) \le \frac{1}{4}\sqrt{(L_f V + \chi w_b|e|)^2 + (u_{\max} L_g V)^4}$$

then (30) is simplified as

$$l(e, \hat{W}) \ge \frac{\frac{1}{4}\sqrt{(L_f V + \chi w_b|e|)^2 + (u_{\max} L_g V)^4}}{1 + \sqrt{1 + (u_{\max} L_g V)^2}} + \frac{w_b|e|\left(\left(\frac{1}{2}\chi - 1\right)|e| - \phi\right)\frac{|e|}{|e| + \phi}}{1 + \sqrt{1 + (u_{\max} L_g V)^2}}$$

Considering the relation $|e| > \frac{\phi}{\frac{1}{2}\chi - 1}$, which makes the second term strictly positive, we obtain

$$l(e, \hat{W}) \ge \frac{\frac{1}{4}\sqrt{(L_f V + \chi w_b|e|)^2 + (u_{\max} L_g V)^4}}{1 + \sqrt{1 + (u_{\max} L_g V)^2}} \ge \alpha_1(|e|)$$

where $\alpha_1(|e|)$ is a class K function. To analyze the case when $|e| \le \frac{\phi}{\frac{1}{2}\chi - 1}$, we consider the relation

$$l(e, \hat{W}) \ge \alpha_1(|e|) + \frac{|e|\,w_b\left(\left(\frac{1}{2}\chi - 1\right)|e| - \phi\right)\frac{|e|}{|e| + \phi}}{1 + \sqrt{1 + (u_{\max} L_g V)^2}} \ge \alpha_1(|e|) - \frac{w_b\,\phi^2}{\left(\frac{1}{2}\chi - 1\right)^2} \ge \alpha_1(|e|) - \beta'(\phi) = \frac{1}{2}\alpha_1(|e|)$$

Selecting $\chi > 2$ and $\phi$ sufficiently small, $l(e, \hat{W})$ satisfies the condition of being positive definite and radially unbounded if the trajectories evolve outside the ball of radius $|e| = \alpha_1^{-1}(2\beta')$. Hence, (28) is a cost functional. Now we proceed to prove that the control law (22) minimizes (28). Replacing $R(e, \hat{W})$ in (28) we obtain

$$J(u) = \lim_{t \to \infty}\left\{ V + \int_0^t \left( l(e, \hat{W}) + u^\top R(e, \hat{W})\,u \right) d\tau \right\}$$

$$J(u) = \lim_{t \to \infty}\left\{ V - \int_0^t \left( L_f V - \frac{1}{2} L_g V\, R^{-1}(e, \hat{W})\,(L_g V)^\top + |L_w V|\,w_b \right) d\tau \right\}$$

$$J(u) = \lim_{t \to \infty}\left\{ V - \int_0^t \frac{\partial V}{\partial e}\left( f_t + g_t u + w_{per} \right) d\tau \right\}$$

$$J(u) = \lim_{t \to \infty}\left\{ V - \int_0^t \dot{V}\, d\tau \right\} = \lim_{t \to \infty} V(0) = V(0)$$

where $f_t$ and $g_t$ are described in (18). It is clear that the optimal value is $J^*(u) = V(0)$.

V. Example

In order to illustrate the applicability of the proposed adaptive control scheme with input saturation, the following example is tested. The unknown plant considered is the following modified Van der Pol nonlinear oscillator

$$\dot{x}_{p1} = x_{p2} + \mathrm{sat}(u_1)$$
$$\dot{x}_{p2} = \left(0.5 - x_{p1}^2\right)x_{p2} - x_{p1} + 0.5\cos(1.1t) + \mathrm{sat}(u_2) \qquad (32)$$

with $[x_{p1}(0) \;\; x_{p2}(0)] = [0 \;\; 0.50]$, $u_{\max 1} = u_{\max 2} = 5$.

The goal is to force (32) to track the reference given as the Duffing equation

$$\dot{x}_{r1} = x_{r2}$$
$$\dot{x}_{r2} = \left(1 - x_{r1}^2\right)x_{r1} - 0.15\,x_{r2} + 0.3\cos(t) \qquad (33)$$

with $[x_{r1}(0) \;\; x_{r2}(0)] = [1 \;\; 0]$.

For the simulations, the following recurrent neural network is used:

$$\dot{x} = Ax + W z(x_p) + u \qquad (34)$$

with $A = -5I$, and the high order sigmoid vector is defined as

$$z(x_p) = [\tanh(x_{p1}),\ \tanh(x_{p2}),\ \tanh(x_{p1})\tanh(x_{p2}),\ \tanh^2(x_{p1}),\ \tanh^2(x_{p2}),\ \tanh^2(x_{p1})\tanh^2(x_{p2}),\ \tanh^3(x_{p1}),\ \tanh^3(x_{p2}),\ \tanh^3(x_{p1})\tanh^3(x_{p2}),\ \tanh^4(x_{p1}),\ \tanh^4(x_{p2}),\ \tanh^4(x_{p1})\tanh^4(x_{p2})]$$

Fig. 1. Time evolution for the reference and plant states.

To avoid parameter drift we use the robust adaptation law (13) with $\gamma = 25$, $w_{\max} = 5$, $\sigma = 5$. For the control law, we select $\chi = 2.1$ and $\phi = 0.1$. We let (32) evolve during 5 seconds; then we apply the control law in order to obtain the trajectory tracking. The simulation results
are shown in Figs. 1-4, where the time evolution of the states and the respective phase portraits are presented. As shown, the unknown system tracks the reference trajectory with a very small error.

Fig. 2. Phase portrait of the reference system

Fig. 3. Phase portrait of the plant ( Duffing oscillator )

Fig. 4. Applied inputs to neural network and plant.

VI. Conclusions

An adaptive control structure based on a recurrent neural network for trajectory tracking of unknown nonlinear systems with input saturations was presented. This structure is composed of a neural network identifier, which builds an on-line model of the unknown plant, and a control law for trajectory tracking with input saturations developed using the inverse optimal control approach. Stability of the identification and tracking errors is analyzed via the Lyapunov methodology. Tracking error boundedness is established as a function of a design parameter, with an estimate of the region of stability. The applicability of the proposed structure was tested via simulations, by an example of synchronization of two nonlinear oscillators; the results are quite encouraging. Research along this line proceeds to relax the required condition of having the same number of inputs and states.

Acknowledgements.- The first author thanks the support of Centro de Enseñanza Tecnica Industrial (CETI), Guadalajara, Jalisco, Mexico.

References

[1] T. Basar and P. Bernhard, H-Infinity Optimal Control and Related Minimax Design Problems, Birkhauser, Boston, USA, 1995.
[2] N. H. El-Farra, P. D. Christofides, "Integrating robustness, optimality and constraints in control of nonlinear processes", Chemical Engineering Science, 56, 1841-1868, 2001.
[3] K. Hunt, G. Irwin and K. Warwick (Eds.), Neural Networks Engineering in Dynamic Control Systems, Springer Verlag, New York, USA, 1995.
[4] H. Khalil, Nonlinear Systems, 2nd Ed., Prentice Hall, Upper Saddle River, NJ, USA, 1996.
[5] E. B. Kosmatopoulos et al., "Dynamical neural networks that ensure exponential identification error convergence", Neural Networks, vol. 1, no. 2, pp 299-314, 1997.
[6] M. Krstic and H. Deng, Stabilization of Nonlinear Uncertain Systems, Springer Verlag, New York, USA, 1998.
[7] Y. Lin and E. Sontag, "A universal formula for stabilization with bounded controls", Systems and Control Letters, 16, 393-397, 1991.
[8] K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks", IEEE Trans. on Neural Networks, vol. 1, no. 1, pp 4-27, 1990.
[9] A. S. Poznyak, W. Yu, E. N. Sanchez, and J. P. Perez, "Nonlinear adaptive trajectory tracking using dynamic neural networks", IEEE Trans. on Neural Networks, vol. 10, no 6, pp 1402-1411, Nov. 1999.
[10] G. A. Rovithakis and M. A. Christodoulou, Adaptive Control with Recurrent High-Order Neural Networks, Springer Verlag, New York, USA, 2000.
[11] E. N. Sanchez and J. P. Perez, "Input-to-state stability analysis for dynamic neural networks", IEEE Trans. Circuits Syst. I, vol. 46, pp 1395-1398, 1999.
[12] E. N. Sanchez, J. P. Perez and G. Chen, "Using dynamic neural control to generate chaos: An inverse optimal control approach", Int. J. of Bifurcation and Chaos, 2001.
[13] E. N. Sanchez, J. P. Perez and L. Ricalde, "Recurrent neural control for robot trajectory tracking", Proceedings of the 15th World Congress International Federation of Automatic Control, Barcelona, Spain, July, 2002.
[14] K. Suykens, L. Vandewalle and R. de Moor, Artificial Neural Networks for Modelling and Control of Nonlinear Systems, Kluwer Academic Publishers, Boston, USA, 1996.