Multiple Model Adaptive Tracking Control Based on Adaptive Dynamic

Hindawi Publishing Corporation Discrete Dynamics in Nature and Society Volume 2016, Article ID 6023892, 12 pages http://dx.doi.org/10.1155/2016/6023892

Research Article

Multiple Model Adaptive Tracking Control Based on Adaptive Dynamic Programming

Kang Wang,1 Xiaoli Li,2 and Yang Li3

1 School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
2 College of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100124, China
3 School of International Studies, Communication University of China (CUC), Beijing 100024, China

Correspondence should be addressed to Xiaoli Li; [email protected]

Received 25 December 2015; Accepted 17 February 2016

Academic Editor: Filippo Cacace

Copyright © 2016 Kang Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Adaptive dynamic programming (ADP) has been shown to be an effective method for the optimal control of nonlinear systems. However, because the structure of ADP requires the control input to satisfy the initial admissible control condition, control performance may deteriorate after an abrupt parameter change or a system failure. In this paper, we introduce the multiple model idea into ADP: multiple subcontrollers run in parallel to supply initial conditions for different environments, and a switching index is set up to select the appropriate initial conditions for the current system. With this strategy, the proposed multiple model ADP achieves optimal control for systems with jumping parameters. The convergence of multiple model adaptive control based on ADP is proved, and simulation shows that the proposed method effectively improves the transient response of the system.

1. Introduction

In recent years, multiple model adaptive control (MMAC) has been a research focus for improving the transient response of nonlinear systems. In practical control processes, system dynamics may change abruptly due to system failure or parameter change. Traditional adaptive control methods cannot deal with this kind of change, which results in poor transient response or even instability. According to multiple model adaptive control theory, multiple models are established to cover the system uncertainty, and corresponding multiple controllers are constructed [1]. Based on the switching mechanism, at every moment the controller corresponding to the model closest to the current system is selected as the current controller. Thus, the transient response and the control performance are greatly improved. Since the 1990s, multiple model adaptive control based on an index switching function has obtained satisfactory results for linear systems, linear time-variant systems with jumping parameters, and stochastic systems with stochastic disturbance. However, for nonlinear systems, there is still no unified research method or satisfactory result.

Among the main MMAC studies for nonlinear systems, multiple model adaptive control based on neural networks has attracted more and more attention [2–4]. Because a neural network performs well in approximating nonlinear systems, it can turn the system uncertainty into uncertainty of the weights and structure of the network. Thus, multiple model adaptive control for nonlinear systems can be designed based on the change of weights and structure of neural networks.

In recent years, neural networks (NNs) and fuzzy logic have been widely used to handle the control problem of nonlinear systems owing to their fast adaptability and excellent approximation ability. For systems without complete model information, or systems regarded as a “black box,” neural networks show great advantage. For uncertain nonlinear discrete-time systems with dead-zone input, [5] introduces NNs to approximate the unknown functions in the transformed systems, so that the tracking error converges with the dead zone handled by an adaptive compensative term. Fuzzy logic


systems are used to approximate the unknown functions to achieve control for discrete-time systems with backlash [6, 7] or input constraint [8].

Combining dynamic programming, neural networks, and reinforcement learning [9], adaptive dynamic programming (ADP) addresses the “curse of dimensionality” of traditional dynamic programming and provides a practical scheme for the optimal control of nonlinear systems. ADP adopts two neural networks, one critic neural network to approximate the cost function and one actor neural network to approximate the control strategy, so that the optimality principle can be satisfied [10, 11]. In 2002, Murray first proposed the iterative ADP algorithm for continuous-time systems. Iterative ADP updates the policy equation and value function by policy iteration and value iteration [12, 13]. However, iterative ADP can only be computed offline because the number of iterations is uncertain and the computation time is long. In recent years, online ADP strategies have been widely proposed [14–17]. They obtain the optimal solution adaptively rather than by offline calculation. Paper [18] proposed an ADP tracking strategy which does not require any knowledge of the drift dynamics of the system, which means it has the adaptivity to deal with model uncertainty. However, as in most existing online ADP methods, the controller needs the initial control to satisfy the admissible condition for the corresponding system [15, 19, 20]. Thus, once the system undergoes an abrupt parameter change and the control signal at the moment of change does not satisfy the initial admissible condition for the changed parameters, the ADP controller can no longer make the state track the desired trajectory.

In this paper, we introduce MMAC into ADP. Multiple models are established to cover the uncertainty of the system; correspondingly, multiple subcontrollers are constructed and run in parallel. A switching index function is introduced to decide which model describes the current system most accurately. Once a model switch occurs, the corresponding controller is selected to provide its current state and control signal as the initial condition of the system. Based on this idea, we design multiple fixed models for the case in which the submodels are precisely known. For imprecise estimated models, multiple fixed models and one adaptive model are combined to obtain an improved transient response.

This paper is organized as follows. The system with jumping parameters is described in Section 2. Then, a transformed ADP tracking control scheme is introduced and proved convergent in Section 3. In Section 4, the main structure of MMAC based on ADP is described and two MMAC strategies are introduced, one for precisely known submodels and one for imprecise models. Simulation experiments are shown in Section 5, and Section 6 concludes this paper.

2. Problem Description

Consider the following nonlinear discrete-time system with jumping parameters:

\[ x(k+1) = f(x(k), \theta(k)) + g(x(k))\,u(k), \tag{1} \]

where $x(k) \in R^n$ represents the system state and the constrained control input is denoted by $u(k) \in \Omega_u = \{u(k) \mid u(k) = [u_1(k), u_2(k), \ldots, u_m(k)]^T,\ |u_i(k)| \le \bar{u}_i,\ i = 1, 2, \ldots, m\}$, where $\bar{u}_i$ is the constraint bound of the $i$th actuator. $\theta(k)$ is a time-varying parameter satisfying the following assumption.

Assumption 1. $\theta(k)$ is a piecewise constant function of $k$, $\theta(k) \in \{\theta_i \mid i = 1, 2, \ldots, d_\theta\}$, where $d_\theta$ is a finite integer. $\theta(k)$ does not change frequently; that is, the time between two successive changes is long enough. Moreover, $\theta(k)$ finally settles at one constant.

The objective of the tracking problem is to design an optimal controller with constrained control signal so that the output state tracks the following desired trajectory in an optimal way:

\[ x_d(k+1) = \psi(x_d(k)). \tag{2} \]

As shown in [21], (2) can generate a large class of trajectories satisfying the requirements of most applications, including unit steps, sinusoidal waveforms, and damped sinusoids.

3. Trajectory Tracking Based on ADP

For the following nonlinear discrete-time system without jumping parameters,

\[ x(k+1) = f(x(k)) + g(x(k))\,u(k), \tag{3} \]

define the following tracking error:

\[ e(k) = x(k) - x_d(k). \tag{4} \]

We have $x(k) = e(k) + x_d(k)$. Combining (2), (3), and (4), the following dynamic equation with respect to $e(k)$, $x_d(k)$, and $u(k)$ is obtained:

\[ e(k+1) = x(k+1) - x_d(k+1) = f(e(k) + x_d(k)) + g(e(k) + x_d(k))\,u(k) - \psi(x_d(k)). \tag{5} \]

Rewrite (2) and (5) in the following matrix form [21]:

\[ \begin{bmatrix} e(k+1) \\ x_d(k+1) \end{bmatrix} = \begin{bmatrix} f(e(k) + x_d(k)) - \psi(x_d(k)) \\ \psi(x_d(k)) \end{bmatrix} + \begin{bmatrix} g(e(k) + x_d(k)) \\ 0 \end{bmatrix} u(k). \tag{6} \]

Further, the system can be rewritten as the following transformed dynamics in terms of the control input $u(k)$:

\[ X(k+1) = F(X(k)) + G(X(k))\,u(k), \tag{7} \]

where

\[ X(k) = \begin{bmatrix} e(k) \\ x_d(k) \end{bmatrix} \in R^{2n} \tag{8} \]

and $G(X(k))$ satisfies $\|G(X(k))\| \le G_M$.
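The construction of the augmented dynamics (6)–(8) can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the helper name `make_augmented_dynamics` is hypothetical, and the concrete `f`, `g`, and `psi` are borrowed from the simulation example of Section 5 with the parameter fixed at b = 1.1 purely for demonstration.

```python
import numpy as np

def make_augmented_dynamics(f, g, psi, n):
    """Build F and G of (7) for the augmented state X = [e; x_d] in (8).

    f, g, psi are user-supplied callables (assumptions of this sketch):
    f(x) -> (n,), g(x) -> (n, m), psi(x_d) -> (n,).
    """
    def F(X):
        e, xd = X[:n], X[n:]
        # Upper block: f(e + x_d) - psi(x_d); lower block: psi(x_d), as in (6).
        return np.concatenate([f(e + xd) - psi(xd), psi(xd)])

    def G(X):
        e, xd = X[:n], X[n:]
        gx = np.asarray(g(e + xd), dtype=float).reshape(n, -1)
        # The reference trajectory is not driven by u, hence the zero block.
        return np.vstack([gx, np.zeros_like(gx)])

    return F, G

# Illustrative system taken from Section 5, with b held constant.
b = 1.1
f = lambda x: np.array([x[1], -x[0] ** 2 + b * x[1]])
g = lambda x: np.array([[0.0], [1.0 + x[0] ** 2]])
psi = lambda xd: 0.6 + np.exp(-0.2) * (xd - 0.6)

F, G = make_augmented_dynamics(f, g, psi, n=2)

x = np.array([1.0, 0.5])
xd = np.array([0.0, 0.6 * (1.0 - np.exp(-0.2))])
u = np.array([0.1])

X = np.concatenate([x - xd, xd])   # X(k) = [e(k); x_d(k)]
X_next = F(X) + G(X) @ u           # X(k+1) from (7)
```

The upper half of `X_next` is the next tracking error and the lower half is the next reference point, so one step of (7) reproduces one step of (3) and (2) taken together.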


The infinite-horizon scalar cost function can be defined as

\[ J(X(k)) = \sum_{i=k}^{\infty} \gamma^{i-k} \left[ X(i)^T Q X(i) + W(u(i)) \right], \tag{9} \]

where $0 < \gamma \le 1$ and $Q \in R^{2n \times 2n}$ is defined as

\[ Q = \begin{bmatrix} Q_0 & 0 \\ 0 & 0 \end{bmatrix}, \tag{10} \]

where $Q_0 \in R^{n \times n}$ is positive definite. To deal with the constrained control input, we employ the following function [22]:

\[ W(u(i)) = 2 \int_0^{u(i)} \xi^{-T}(U^{-1}\mu)\, U R \, d\mu, \tag{11} \]

where $U$ is a diagonal matrix defined as $U = \mathrm{diag}(\bar{u}_1, \bar{u}_2, \ldots, \bar{u}_m)$, $\mu \in R^m$, $R \in R^{m \times m}$ is a positive definite diagonal matrix, and $\xi(\cdot)$ is a one-to-one function satisfying $|\xi(\cdot)| \le 1$ whose first derivative is bounded by a constant. It should also be a monotonically increasing odd function. Consider

\[ \xi^{-1}(U^{-1}u(i)) = \left[ \xi^{-1}\!\left(\frac{u_1(i)}{\bar{u}_1}\right)\ \xi^{-1}\!\left(\frac{u_2(i)}{\bar{u}_2}\right)\ \cdots\ \xi^{-1}\!\left(\frac{u_m(i)}{\bar{u}_m}\right) \right]^T. \tag{12} \]

Definition 2 (see [23]). A control policy $u(X)$ is said to be admissible if $u(X)$ is continuous, $u(0) = 0$, $u(X)$ stabilizes (3), and, for every initial state $X(0)$, $J(X(0))$ is finite.

According to the Bellman optimality principle and the first-order necessary condition, the theoretical optimal control law can be calculated as

\[ u^*(k) = U \xi(\eta^*(k)), \tag{13} \]

where

\[ \eta^*(k) = -\frac{\gamma}{2} (UR)^{-T} G(X(k))^T \frac{\partial J^*(X(k+1))}{\partial X(k+1)}, \tag{14} \]

and the theoretical HJB equation is derived as

\[ J^*(X(k)) = r(X_k, u_k) + \gamma J^*(X(k+1)), \tag{15} \]

where $r(X_k, u_k) = X(k)^T Q X(k) + W(u(k))$.

In the remainder of this section, an online actor-critic structure is introduced to solve the optimal tracking problem. The critic neural network (NN) is designed to approximate the value function, and the actor NN is designed to approximate the optimal control signal.

(1) Critic NN. A two-layer NN is utilized as the critic NN to approximate the value function

\[ J^*(X(k)) = W_c \phi_c(V_c X(k)) + \varepsilon_{ck}, \tag{16} \]

where $V_c \in R^{N_c \times 2n}$ and $W_c \in R^{1 \times N_c}$ are the constant target weights of the hidden layer and output layer, respectively, $\phi_c(\cdot) \in R^{2n \times 1}$ is the activation function, $\varepsilon_{ck} \in R$ is the bounded approximation error, and $N_c$ is the number of neurons in the hidden layer. $\varepsilon_{ck}$, $\phi_c(\cdot)$, and the gradient of $\phi_c(\cdot)$ are assumed to be bounded as $\|\varepsilon_{ck}\| \le \varepsilon_{cM}$, $\|\phi_c(\cdot)\| \le \phi_{cM}$, and $\|\partial \phi_c(X(k)) / \partial X(k)\| \le \phi'_M$, respectively.

The actual output of the critic NN is given as

\[ \hat{J}(X(k)) = \hat{W}_c(k)\, \phi_c(\hat{V}_c(k) X(k)), \tag{17} \]

where $\hat{W}_c$ and $\hat{V}_c$ are the estimates of $W_c$ and $V_c$. Then, the approximate HJB function error can be derived as follows:

\[ e_c(k) = \gamma \hat{J}(X(k+1)) + r(X_k, u_k) - \hat{J}(X(k)). \tag{18} \]

The goal of the critic NN is to minimize the following function:

\[ E_c(k) = \frac{1}{2} e_c^2(k). \tag{19} \]

Using the gradient-descent method, the update law of the critic NN is given as

\[ \begin{aligned} \hat{V}_{c_{sj}}(k+1) &= \hat{V}_{c_{sj}}(k) - \alpha_c \frac{\partial E_c(k)}{\partial \hat{V}_{c_{sj}}(k)}, \\ \hat{W}_{c_s}(k+1) &= \hat{W}_{c_s}(k) - \alpha_c \frac{\partial E_c(k)}{\partial \hat{W}_{c_s}(k)}. \end{aligned} \tag{20} \]

In this paper, we select the activation function of the critic NN as $\phi_c(\cdot) = \tanh(\cdot)$, so we have

\[ \frac{\partial E_c(k)}{\partial \hat{V}_{c_{sj}}(k)} = e_c(k)\, \gamma\, \hat{W}_{c_s}(k) \left(1 - \tanh^2(Z_{c_s}(k+1))\right) X_j(k+1) - e_c(k)\, \hat{W}_{c_s}(k) \left(1 - \tanh^2(Z_{c_s}(k))\right) X_j(k), \tag{21} \]

where $Z_{c_s}(k) = \sum_{j=1}^{2n} \hat{V}_{c_{sj}}(k) X_j(k)$ and $Z_{c_s}(k+1) = \sum_{j=1}^{2n} \hat{V}_{c_{sj}}(k) X_j(k+1)$. Then

\[ \frac{\partial E_c(k)}{\partial \hat{W}_{c_s}(k)} = e_c(k)\, \gamma\, \phi_c(Z_{c_s}(k+1)) - e_c(k)\, \phi_c(Z_{c_s}(k)). \tag{22} \]

(2) Actor NN. To obtain the optimal control input, a two-layer NN is utilized as the actor NN to approximate $\eta^*(k)$:

\[ \eta^*(k) = W_a \phi_a(V_a X(k)) + \varepsilon_{ak}, \tag{23} \]

where $V_a \in R^{N_a \times 2n}$ and $W_a \in R^{m \times N_a}$ are the constant target weights of the hidden layer and output layer, respectively, $\phi_a(\cdot) \in R^{2n}$ is the corresponding activation function, $\varepsilon_{ak} \in R$ is the bounded approximation error, and $N_a$ is the number of neurons in the hidden layer. $\phi_a(\cdot)$ and $\varepsilon_{ak}$ are assumed to be bounded as $\|\phi_a(\cdot)\| \le \phi_{aM}$ and $\|\varepsilon_{ak}\| \le \varepsilon_{aM}$.
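As a concrete reading of the critic update (17)–(22), the sketch below computes the HJB error and the two gradients for a tanh critic and takes one gradient-descent step. It is only an illustration under stated assumptions: the function names, the random weights, and the placeholder stage cost `r` are hypothetical and not taken from the paper; only the formulas follow (18), (21), and (22).

```python
import numpy as np

def critic_output(W, V, X):
    """J_hat(X) = W * tanh(V X), cf. (17). W: (N_c,), V: (N_c, 2n), X: (2n,)."""
    return W @ np.tanh(V @ X)

def critic_gradients(W, V, X, X_next, r, gamma):
    """Return e_c from (18) and the gradients of E_c = e_c^2 / 2, cf. (21)-(22)."""
    Z, Z_next = V @ X, V @ X_next
    e_c = gamma * W @ np.tanh(Z_next) + r - W @ np.tanh(Z)
    # (22): derivative with respect to the output-layer weights.
    grad_W = e_c * (gamma * np.tanh(Z_next) - np.tanh(Z))
    # (21): derivative with respect to the hidden-layer weights.
    grad_V = e_c * (
        gamma * np.outer(W * (1.0 - np.tanh(Z_next) ** 2), X_next)
        - np.outer(W * (1.0 - np.tanh(Z) ** 2), X)
    )
    return e_c, grad_W, grad_V

def critic_step(W, V, X, X_next, r, gamma, alpha_c):
    """One gradient-descent update of both layers, cf. (20)."""
    e_c, grad_W, grad_V = critic_gradients(W, V, X, X_next, r, gamma)
    return W - alpha_c * grad_W, V - alpha_c * grad_V, e_c

# Hypothetical data: five hidden neurons, augmented state of dimension 4.
rng = np.random.default_rng(0)
W = rng.uniform(-1.0, 1.0, size=5)
V = rng.uniform(-1.0, 1.0, size=(5, 4))
X = rng.uniform(-1.0, 1.0, size=4)
X_next = rng.uniform(-1.0, 1.0, size=4)
r, gamma, alpha_c = 0.3, 0.2, 0.01   # r stands in for r(X_k, u_k)

e_c, grad_W, grad_V = critic_gradients(W, V, X, X_next, r, gamma)
W_new, V_new, _ = critic_step(W, V, X, X_next, r, gamma, alpha_c)
```

A finite-difference comparison against $E_c$ is a cheap way to confirm that the analytic gradients agree with the definitions before they are used in an online loop.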

The actual output of the actor NN is given as

\[ \hat{\eta}(k) = \hat{W}_a(k)\, \phi_a(\hat{V}_a(k) X(k)) = [\hat{\eta}_1(k), \hat{\eta}_2(k), \ldots, \hat{\eta}_m(k)]^T, \qquad \hat{\eta}_i(k) = \sum_{l=1}^{N_a} \hat{W}_{a_{il}}(k)\, \phi_a\!\left( \sum_{j=1}^{2n} \hat{V}_{a_{lj}}(k) X_j(k) \right), \tag{24} \]

where $\hat{W}_a$ and $\hat{V}_a$ are the estimates of $W_a$ and $V_a$, respectively. Using (17), the actual approximation target is

\[ \tilde{\eta}(X(k)) = -\frac{\gamma}{2} (UR)^{-1} G(X(k))^T \frac{\partial \hat{J}(X(k+1))}{\partial X(k+1)}. \tag{25} \]

The goal of the actor NN is to minimize the following function:

\[ E_a(k) = \sum_{i=1}^{m} \frac{1}{2} e_{a_i}^2(k), \tag{26} \]

where the actor NN approximation error is defined as

\[ e_{a_i}(k) = \tilde{\eta}_i(X(k)) - \hat{\eta}_i(k). \tag{27} \]

Using the gradient-descent method, the update law of the actor NN is given as

\[ \begin{aligned} \hat{V}_{a_{lj}}(k+1) &= \hat{V}_{a_{lj}}(k) - \alpha_a \frac{\partial E_a(k)}{\partial \hat{V}_{a_{lj}}(k)}, \\ \hat{W}_{a_{il}}(k+1) &= \hat{W}_{a_{il}}(k) - \alpha_a \frac{\partial E_a(k)}{\partial \hat{W}_{a_{il}}(k)}. \end{aligned} \tag{28} \]

In this paper, the activation function of the actor NN is selected as $\phi_a(\cdot) = \tanh(\cdot)$. Define $Z_{a_l}(k) = \sum_{j=1}^{2n} \hat{V}_{a_{lj}}(k) X_j(k)$; we have

\[ \begin{aligned} \frac{\partial E_a(k)}{\partial \hat{V}_{a_{lj}}(k)} &= -\sum_{i=1}^{m} e_{a_i}(k)\, \hat{W}_{a_{il}}(k) \left(1 - \tanh^2(Z_{a_l}(k))\right) X_j(k), \\ \frac{\partial E_a(k)}{\partial \hat{W}_{a_{il}}(k)} &= -e_{a_i}(k) \tanh(Z_{a_l}(k)). \end{aligned} \tag{29} \]

Finally, the optimal control signal is obtained as follows:

\[ \hat{u}(k) = U \xi(\hat{\eta}(k)). \tag{30} \]

Remark 3. To obtain the optimal control policy, the actor NN is designed to approximate $\eta^*(k)$ so that the control signal is strictly restricted to the given constraints by the $\xi(\cdot)$ function as in (30). In contrast, when the actor NN approximates $u^*(k)$ directly, as is done in some cases, the control signal may violate the constraints because of unsuitable weights in the initial period.

Theorem 4. For the nonlinear discrete-time system given by (7), let the weight tuning laws of the critic NN and actor NN be given by (20) and (28), respectively, and let the initial weight of the actor NN reflect the initial admissible control of system (7). There exist positive constants $\alpha_c$ and $\alpha_a$ such that the system state and the estimation errors of the two networks are all uniformly ultimately bounded (UUB).

The proof of Theorem 4 is given in the Appendix.

In contrast with traditional ADP tracking strategies, the above method does not require knowledge of the system drift dynamics. It therefore provides some adaptability: for systems with different drift dynamics, the method can still make the system state track the desired trajectory. However, for each different system, an initial admissible control condition is still required.

4. Multiple Model Control Scheme Based on ADP

In this section, we first propose the multiple model ADP for systems with accurately known submodels. Second, an adaptive ADP main controller is introduced so that the new multiple model ADP can deal with estimated submodels.

4.1. Multiple Model ADP with Accurately Known Submodels. In this section, we consider the case in which known submodels reflect the system dynamics at every working point precisely, as follows:

\[ M_l : X_l(k+1) = F_l(X_l(k)) + G(X_l(k))\, u_l(k), \tag{31} \]

where $l \in \{1, 2, \ldots, M\}$. According to the idea of multiple model adaptive control, it is natural to design multiple independent subcontrollers that track the target trajectory in parallel and to use a switching index function to decide which controller should control the current system. The main structure of the multiple model ADP controller for accurately known submodels is shown in Figure 1.

For every submodel $M_l$, according to Theorem 4, if the initial weights of the actor NN $W_a(1)$ reflecting the initial admissible control are given and the weights of the two NNs are tuned online according to (20) and (28), respectively, with appropriate learning rates, then the output states can track the desired trajectory in the optimal manner. Thus, multiple subcontrollers can be constructed as follows:

\[ C_l = \{ u_l \mid W_a(1), \alpha_{al}, \alpha_{cl},\ l = 1, 2, \ldots, M \}, \tag{32} \]

where $\alpha_{al}$ and $\alpha_{cl}$ are the learning rates of the two NNs for model $M_l$. At every moment, the following index function is calculated to show the degree of matching between the current system and every model:

\[ I_l(k) = \sum_{m=1}^{k} \beta^{k-m} e_l^2(k), \tag{33} \]

Figure 1: Structure of multiple model ADP with accurate submodels. (Block diagram: plant, delay, models $M_1, \ldots, M_M$, index functions, switch function, and controllers $C_1, \ldots, C_M$.)

where $0 < \beta < 1$ is the forgetting factor and the model error is given as $e_l(k) = X_l'(k) - X(k)$, where

\[ X_l'(k+1) = F_l(X(k)) + G(X(k))\, u(k). \tag{34} \]

At every moment, the model $M_{L(k)}$ that describes the current system most accurately is selected,

\[ L(k) = \arg\min_{l \in \Omega} I_l(k), \tag{35} \]

and, at the switching point, $X(k) = X_L(k)$ and the controller $C_{L(k)}$ is selected to control the system.

4.2. Multiple Model ADP with Estimated Submodels. In Section 4.1, we discussed multiple model ADP for accurately known submodels. Because the submodels are precisely known, a subcontroller can control the system directly if the corresponding submodel matches the current system. In practice, however, submodels are estimated and somewhat imprecise. In this case, the control scheme of Section 4.1 cannot give a satisfactory control result because it lacks adaptivity.

A system with jumping parameters as in (1) can be viewed as exhibiting different dynamic characteristics described by different fixed parameters. The ADP tracking controller discussed in Section 3 does not require any knowledge of the drift dynamics of the system, which means it has the adaptivity to deal with model uncertainty. However, the controller needs the initial control to satisfy the admissible condition for the corresponding system. Thus, once the system undergoes an abrupt parameter change and the control signal at the moment of change does not satisfy the initial admissible condition for the changed parameters, the ADP controller cannot make the state track the desired trajectory accurately.

The main idea of multiple model adaptive ADP for estimated submodels is to design a main controller that provides adaptivity and to use multiple submodels and subcontrollers to guarantee the initial admissible control condition after the system changes. The structure of multiple model ADP is shown in Figure 3.

The main procedure of multiple model adaptive ADP tracking control is as follows:

(1) Multiple models are established to cover the uncertainty of the system.
(2) Multiple ADP subcontrollers are set up according to the multiple models.
(3) The independent subcontrollers run in parallel to track the same reference trajectory.
(4) At every moment, a switching index function is calculated to decide which model is closest to the current system.
(5) Once the switching index function indicates a model switch, the state and control of the corresponding subcontroller are selected as the initial condition of the main ADP controller for the new stage.

4.2.1. Design of Main Controller. For a system with jumping parameters as in (1), we adopt the ADP controller of Section 3 as the main controller. However, different initial parameters, including the initial admissible control and initial state,

Figure 2: Structure of multiple model ADP with estimated submodels. (Block diagram: model $M_l$, delay, subcontroller $C_l$, and initial parameters $P_l$ formed from $\hat{X}_l(k)$, $\hat{u}_l(k)$, $\hat{W}_{al}(k)$, and $\hat{V}_{al}(k)$.)

should be given according to the control and state of the most accurate model, as shown in Figure 3.

4.2.2. Establishment of Multiple Models and Subcontrollers. According to different working conditions, multiple estimation models are constructed to cover the system uncertainty. It has to be ensured that, for every working condition, at least one model is close enough to the corresponding plant. The multiple models are set up as

\[ M_l : \hat{X}_l(k+1) = \hat{F}_l(\hat{X}_l(k)) + G(\hat{X}_l(k))\, \hat{u}_l(k), \tag{36} \]

where $l \in \{1, 2, \ldots, M\}$. Theoretically, for a given system with jumping parameters, control performance improves with more submodels. However, too many submodels also increase the computation and may cause frequent model switching. A suitable number of submodels depends on experience and simulation.

According to Theorem 4, for model $M_l$, we can find the corresponding initial weight of the actor NN $\hat{W}_a^l(1)$ reflecting the initial admissible control $\hat{u}_l(1)$, and appropriate learning rates $\alpha_{al}$ and $\alpha_{cl}$, so that the output state of model $M_l$ tracks the desired trajectory $x_d$ if the weights of the two NNs are tuned according to (20) and (28), respectively. Thus, multiple ADP subcontrollers corresponding to the multiple models can be designed as follows:

\[ C_l = \{ \hat{u}_l \mid \hat{u}_l(1), \alpha_{al}, \alpha_{cl},\ l = 1, 2, \ldots, M \}, \tag{37} \]

where $\hat{u}_l(1)$ is the initial admissible control, $\hat{X}_l(1)$ is the initial state, and $\alpha_{cl}$ and $\alpha_{al}$ are the selected learning rates of the critic and actor NNs, respectively, for the $l$th model. Figure 2 shows the structure of the subcontrollers.

The role of the ADP subcontrollers is to supply the initial parameters $\mathrm{IP}_l(k)$ to the main ADP controller. These include the initial state $\hat{X}_l(k)$ and the initial weights of the actor NN, $\hat{W}_{al}(k)$ and $\hat{V}_{al}(k)$, which reflect the initial control $\hat{u}_l(k)$:

\[ \mathrm{IP}_l(k) = [\hat{X}_l(k), \hat{W}_{al}(k), \hat{V}_{al}(k)]. \tag{38} \]

As the parameter change and model switch can happen at any moment, the multiple independent subcontrollers run in parallel at all times.

4.2.3. Choice of Switching Mechanism. At every moment, the switching function determines which model is closest to the system and which group of initial parameters should be given to the main controller. To avoid an incorrect switch caused by the performance at a single point, we employ the following accumulation of model error as the index function:

\[ I_l(k) = \sum_{m=1}^{k} \beta^{k-m} e_l^2(k), \tag{39} \]

where $0 < \beta < 1$ is the forgetting factor, the model estimation error of the $l$th model is defined as $e_l(k) = \hat{X}_l'(k) - X(k)$, and the model state used for comparison, $\hat{X}_l'(k)$, is defined as follows:

\[ \hat{X}_l'(k+1) = \hat{F}_l(X(k)) + G(X(k))\, u(k). \tag{40} \]

At every moment, the model $M_{L(k)}$ that best describes the current system is selected,

\[ L(k) = \arg\min_{l \in \Omega} I_l(k), \tag{41} \]

and the state and actor NN weights $\mathrm{IP}_{L(k)}(k)$ of the subcontroller $C_{L(k)}$ are selected as the initial parameters of the main controller for the new stage of the system.

Remark 5. Theoretically, the multiple model ADP with estimated submodels of Section 4.2 is still effective if the estimated submodels describe the system precisely. In fact, the multiple model ADP with accurate submodels of Section 4.1 can be viewed as a special case of the multiple model ADP with estimated submodels.

Remark 6. If the submodels are precisely known, the model error $e_l(k)$ becomes zero when the submodel matches the current system, and the model indicated by the switching index function (33) is normally the most accurate. However, in a few cases, such as when there is disturbance, (33) may indicate the wrong model. Moreover, in most cases, precise jumping parameters are hard to obtain. Therefore, we prefer the latter multiple model scheme of Section 4.2, with the main ADP controller, as the final optimal control strategy.

Remark 7. For the multiple model adaptive ADP described in Section 4.2, convergence can hardly be proved if there are infinitely many model switches. To facilitate convergence and stability, the following assumption is made. Assume that model switching starts at time step $k_0$; we set up a period $\Delta k$ and a tracking error limit $\varepsilon_0$ to improve the transient response. If $k > k_0 + \Delta k$ or $e(k) < \varepsilon_0$, the switching between multiple submodels is stopped and the main ADP controller keeps working. Thus, the optimal tracking control problem can be viewed as the same as in Section 3, and convergence is guaranteed by Theorem 4.
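The switching mechanism of Section 4.2.3 can be sketched as a running index per model followed by an arg-min. The code below is an illustration under assumptions, not the authors' implementation: the class name `SwitchingIndex` is hypothetical, the summand of (39) is read as the squared Euclidean norm of the model error at step $m$ (which gives the recursion $I_l(k) = \beta I_l(k-1) + \|e_l(k)\|^2$), and the demonstration feeds synthetic one-step data from the example system of Section 5 rather than running a closed-loop ADP simulation.

```python
import numpy as np

class SwitchingIndex:
    """Forgetting-factor index per model and arg-min selection, cf. (39) and (41)."""

    def __init__(self, num_models, beta):
        self.beta = beta
        self.I = np.zeros(num_models)

    def update(self, model_predictions, x_measured):
        # Model errors e_l(k): one-step prediction of each model minus the measurement.
        errors = np.asarray(model_predictions) - np.asarray(x_measured)
        # Assumed recursive form of the accumulated index.
        self.I = self.beta * self.I + np.sum(errors ** 2, axis=1)
        return int(np.argmin(self.I))

def one_step(x, u, b):
    """Example plant of Section 5 for a given parameter value b."""
    return np.array([x[1], -x[0] ** 2 + b * x[1] + (1.0 + x[0] ** 2) * u])

rng = np.random.default_rng(0)
b_models = [1.0, 2.0]                 # estimated submodel parameters
index = SwitchingIndex(num_models=2, beta=0.4)

selected = []
for k in range(150):
    b_true = 1.1 if k < 50 else (1.95 if k < 100 else 1.1)
    x = rng.uniform(0.2, 1.0, size=2)     # synthetic state sample (assumption)
    u = rng.uniform(-0.4, 0.4)            # synthetic bounded input (assumption)
    x_next = one_step(x, u, b_true)
    predictions = [one_step(x, u, b) for b in b_models]
    selected.append(index.update(predictions, x_next))
```

Because the index accumulates past errors, the selected model changes a few steps after the parameter jump rather than immediately; a smaller $\beta$ shortens this lag at the price of more sensitivity to single-step disturbances.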

5. Simulation

In this section, an experiment is constructed to show the effectiveness of the proposed method.

Figure 3: ADP subcontroller. (Block diagram: plant, delay, models $M_1, \ldots, M_M$, index functions, switch function, subcontrollers $C_1, \ldots, C_M$ supplying initial parameters $\mathrm{IP}_k^l = [\hat{X}_k^l, \hat{W}_{ak}^l, \hat{V}_{ak}^l]$, and the main ADP controller.)

Consider the following nonlinear system:

\[ \begin{aligned} x_1(k+1) &= x_2(k), \\ x_2(k+1) &= -x_1^2(k) + b\,x_2(k) + (1 + x_1^2(k))\,u(k), \end{aligned} \tag{42} \]

where $x = [x_1\ x_2]^T$ is the state vector with initial state $x(0) = [1\ 0.5]^T$ and the control input $u \in R$ is bounded by $|u(k)| \le 0.4$. Compared with system (1), the drift dynamics and the input dynamics can be denoted as

\[ f(x(k)) = \begin{bmatrix} x_2(k) \\ -x_1^2(k) + b\,x_2(k) \end{bmatrix}, \qquad g(x(k)) = \begin{bmatrix} 0 \\ 1 + x_1^2(k) \end{bmatrix}, \tag{43} \]

where the jumping parameter $b$ satisfies

\[ b = \begin{cases} 1.1, & \text{if } k < 50, \\ 1.95, & \text{else if } k < 100, \\ 1.1, & \text{otherwise.} \end{cases} \tag{44} \]

The control objective is to force the system state to track the following target trajectory in an optimal manner:

\[ \begin{aligned} x_{d1}(k+1) &= 0.6 + e^{-0.2}(x_{d1}(k) - 0.6), \\ x_{d2}(k+1) &= 0.6 + e^{-0.2}(x_{d2}(k) - 0.6), \end{aligned} \tag{45} \]

where $x_d(0) = [0\ \ 0.6(1 - e^{-0.2})]$. In this example, the cost function (9) is designed with $\gamma = 0.2$, $R = 1$, and $Q_0 = 5I_2$, where $I_2$ is the $2 \times 2$ identity matrix.

Two known estimated submodels consist of the same input dynamics as in (42) and the following drift dynamics:

\[ \tilde{f}_l(k) = \begin{bmatrix} x_2(k) \\ -x_1^2(k) + \tilde{b}_l\,x_2(k) \end{bmatrix}, \tag{46} \]

where $l = 1, 2$, $\tilde{b}_1 = 1$, and $\tilde{b}_2 = 2$.

The proposed ADP control algorithm is applied to construct the subcontrollers and the main controller. The two-layer critic and actor NNs are designed with five neurons in the hidden layers, and both activation functions are the $\tanh(\cdot)$ function. For the subcontrollers, the initial weights of the critic NN are set as random values in $[-1, 1]$; the initial weights of the actor NN are selected to reflect the initial admissible control for the corresponding submodels.

(1) Single Model ADP. For the nonlinear system with jumping parameters (42), only the first submodel and subcontroller are adopted. From Figure 4, we can see that the system state tracks the desired trajectory closely during the first stage. However, after the parameter change at the 50th time step, the adopted submodel no longer matches the system, and the controller cannot track the desired trajectory any more. In Figure 4, the output state after step 55 is omitted because the state diverges far away from the desired trajectory.
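For readers who want to reproduce the setting, the plant (42), the jumping parameter (44), and the reference generator (45) can be written down directly. This is a minimal sketch under assumptions: the function names are hypothetical, and the ADP controllers themselves are not reproduced, so the snippet only provides the simulation ingredients.

```python
import numpy as np

def jumping_b(k):
    """Jumping parameter schedule of (44)."""
    if k < 50:
        return 1.1
    if k < 100:
        return 1.95
    return 1.1

def plant_step(x, u, k):
    """One step of the plant (42) at time k for a scalar input u."""
    b = jumping_b(k)
    return np.array([x[1], -x[0] ** 2 + b * x[1] + (1.0 + x[0] ** 2) * u])

def reference_step(xd):
    """One step of the desired trajectory (45), applied to both components."""
    return 0.6 + np.exp(-0.2) * (xd - 0.6)

def reference_trajectory(steps):
    """Desired trajectory starting from x_d(0) as given after (45)."""
    xd = np.array([0.0, 0.6 * (1.0 - np.exp(-0.2))])
    traj = [xd]
    for _ in range(steps):
        xd = reference_step(xd)
        traj.append(xd)
    return np.array(traj)

ref = reference_trajectory(200)
x1 = plant_step(np.array([1.0, 0.5]), 0.4, k=0)   # first step at the input bound
```

With this initial condition the reference satisfies $x_{d1}(k+1) = x_{d2}(k)$, which matches the first state equation of (42), and both components approach 0.6.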

Figure 4: The state trajectory $x_1$ and desired trajectory $x_{d1}$. (Plot against time steps, 0 to 50.)

Figure 5: The switching sequence. (Selected model index against time steps, 0 to 200.)

(2) Multiple Model ADP. The proposed multiple model ADP strategy is adopted to control the system with jumping parameters. The forgetting factor $\beta$ in the switching index function (39) is set to 0.4. Figure 5 shows that the model switching process matches the system change. For example, after the parameter changes from 1.1 to 1.95 at the 50th time step, the selected model switches from model $M_1$ with $\tilde{b}_1 = 1$ to model $M_2$ with $\tilde{b}_2 = 2$.

The control result obtained with the proposed multiple model ADP method is shown in Figures 6 and 7. The states deviate from the desired trajectory when the parameter changes. However, because the most precise model is selected and the corresponding initial parameters are given to the main controller, the system states track the desired trajectory again after a transient. Figure 8 shows that the control input is bounded in $[-0.4, 0.4]$.

Figure 6: The state trajectory $x_1$ and desired trajectory $x_{d1}$. (Plot against time steps, 0 to 200.)

6. Conclusion

This paper proposes an ADP-based multiple model adaptive control scheme for nonlinear systems with jumping parameters. System uncertainty is covered by multiple submodels; corresponding subcontrollers are constructed and run in parallel. A switching mechanism is introduced to decide the most precise model and the corresponding initial parameters, so that the initial admissible condition is satisfied at all times. The proposed method realizes optimal tracking control for nonlinear systems with jumping parameters and improves the transient response and control quality. Because the model switch occurs only after some period of model error accumulation, the transient response may not be fully satisfactory. Future work will focus on designing a scheme that further improves the control quality by introducing an upper limit on the state or by adopting multiple set-points.

Appendix Proof of Theorem 4. For simplification of proof process, we consider the weights of hidden layer of two NNs that keep fixed after some tuning time. Define weight estimation error of the action and critic ̂ 𝑎 (𝑘) − 𝑊𝑎 and 𝑊 ̃ 𝑐 (𝑘) = 𝑊 ̂ 𝑐 (𝑘) − 𝑊𝑐 , ̃ 𝑎 (𝑘) = 𝑊 network as 𝑊 ̂𝑐 𝑋(𝑘)) respectively. For simplification, denote 𝜙𝑐 (𝑘) = 𝜙𝑐 (𝑉

Discrete Dynamics in Nature and Society

9 denotes the maximum singular value of 𝑅. The first difference of 𝑉 is given as

0.8 0.6

Δ𝑉 =

0.4

Figure 7: The state trajectory $x_2$ and desired trajectory $x_{d2}$ (horizontal axis: time steps).

Figure 8: Control input $u$ of jumping-parameters system with multiple ADP controllers (horizontal axis: time steps).

Here $\phi_a(k) = \phi_a(\hat{V}_a X(k))$, where $\hat{V}_c$ and $\hat{V}_a$ are fixed estimation weights after tuning. Denote $\Lambda\phi_c(k) = \gamma\phi_c(X(k+1)) - \phi_c(X(k))$ and $\Lambda\varepsilon_{ck} = \gamma\varepsilon_{c(k+1)} - \varepsilon_{ck}$. Assume $\Lambda\phi_c(k)$ is bounded as $\Lambda\phi_{cm} < \|\Lambda\phi_c(k)\| < \Lambda\phi_{cM}$. Consider the following positive definite Lyapunov candidate:

$$V = \frac{\alpha_a\alpha_c\Lambda\phi_{cm}^2}{4G_M^2\Pi_a(1+\phi_{aM}^2)(1+\Lambda\phi_{cM}^2)}V_1(k) + \frac{\alpha_c\Lambda\phi_{cm}^2}{\alpha_a\Pi_a(1+\Lambda\phi_{cM}^2)}V_a(k) + \frac{1}{\alpha_c}V_c(k), \tag{A.1}$$

where $V_1(k) = X^T(k)X(k)$, $V_c(k) = \tilde{W}_c(k)\tilde{W}_c^T(k)$, $V_a(k) = \tilde{W}_a(k)\tilde{W}_a^T(k)$, $\Pi_a = (5\alpha_a+2)(\gamma\phi'_M G_M\lambda_{RM})^2$, and $\lambda_{RM}$ …

The first difference of $V$ is

$$\Delta V = \frac{\alpha_a\alpha_c\Lambda\phi_{cm}^2}{4G_M^2\Pi_a(1+\phi_{aM}^2)(1+\Lambda\phi_{cM}^2)}\Delta V_1 + \frac{\alpha_c\Lambda\phi_{cm}^2}{\alpha_a\Pi_a(1+\Lambda\phi_{cM}^2)}\Delta V_a + \frac{1}{\alpha_c}\Delta V_c. \tag{A.2}$$

First, considering $\Delta V_1$, substituting (7) and (13), and then applying the Cauchy-Schwarz inequality [24], we obtain

$$\begin{aligned}
\Delta V_1 &= X^T(k+1)X(k+1) - X^T(k)X(k) \\
&= \left\|F(X(k)) + G(X(k))\left(\tilde{W}_a(k)\phi_a(k) + u^*(k) - \varepsilon_{ak}\right)\right\|^2 - X^T(k)X(k) \\
&\le 2\left\|F(X(k)) + G(X(k))u^*(k)\right\|^2 + 4\left\|G(X(k))\varepsilon_{ak}\right\|^2 + 4\left\|G(X(k))\tilde{W}_a(k)\phi_a(k)\right\|^2 - \|X(k)\|^2.
\end{aligned} \tag{A.3}$$

The optimal closed loop system is upper bounded as

$$\left\|F(X(k)) + G(X(k))u^*(k)\right\|^2 \le k^*\|X(k)\|^2, \tag{A.4}$$

where $k^*$ is a positive constant. Define $\Xi_{ak} = \tilde{W}_a(k)\phi_a(k)$; we get

$$\Delta V_1 \le -(1-2k^*)\|X(k)\|^2 + 4G_M^2\|\Xi_{ak}\|^2 + 4G_M^2\|\varepsilon_{ak}\|^2. \tag{A.5}$$

Next, consider

$$\Delta V_c(k) = \tilde{W}_c(k+1)\tilde{W}_c^T(k+1) - \tilde{W}_c(k)\tilde{W}_c^T(k). \tag{A.6}$$

Substituting (15) and (16) reveals

$$r(X(k),u(k)) = -W_c\Lambda\phi_c(k) - \Lambda\varepsilon_{ck}. \tag{A.7}$$

Further, combining (17) and (18), we have

$$\begin{aligned}
e_c(k) &= r(X(k),u(k)) + \gamma\hat{W}_c(k)\phi_c(k+1) - \hat{W}_c(k)\phi_c(k) \\
&= r(X(k),u(k)) + \hat{W}_c(k)\Lambda\phi_c(k) \\
&= \tilde{W}_c(k)\Lambda\phi_c(k) - \Lambda\varepsilon_{ck}, \\
\hat{W}_c(k+1) &= \hat{W}_c(k) - \alpha_c\frac{\partial E_c(k)}{\partial\hat{W}_c(k)} \\
&= \hat{W}_c(k) - \frac{\alpha_c e_c(k)\Lambda\phi_c^T(k)}{1+\Lambda\phi_c^T(k)\Lambda\phi_c(k)}.
\end{aligned} \tag{A.8}$$
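As a numerical illustration of the critic update in (A.8), the following is a minimal NumPy sketch and not the authors' implementation. The function names `critic_td_error` and `critic_update`, the treatment of the critic weights as a 1-D array, and all numerical values are assumptions made purely for illustration.

```python
import numpy as np

def critic_td_error(w_hat, phi_k, phi_next, r, gamma):
    """Return e_c(k) = r + W_hat * (gamma*phi_c(k+1) - phi_c(k)) and the
    feature difference (written Lambda phi_c(k) in the text)."""
    dphi = gamma * phi_next - phi_k
    return r + w_hat @ dphi, dphi

def critic_update(w_hat, phi_k, phi_next, r, gamma, alpha_c):
    """One normalized-gradient step on the critic weights, as in (A.8)."""
    e_c, dphi = critic_td_error(w_hat, phi_k, phi_next, r, gamma)
    return w_hat - alpha_c * e_c * dphi / (1.0 + dphi @ dphi)

# Hypothetical toy data (not taken from the paper's simulation).
w_hat = np.array([0.5, -0.2, 0.1])      # current critic weight estimate
phi_k = np.array([1.0, 0.5, -0.3])      # phi_c(X(k))
phi_next = np.array([0.8, 0.4, -0.1])   # phi_c(X(k+1))
r, gamma, alpha_c = 0.7, 0.95, 0.3      # utility, discount, learning rate

e_before, dphi = critic_td_error(w_hat, phi_k, phi_next, r, gamma)
w_new = critic_update(w_hat, phi_k, phi_next, r, gamma, alpha_c)
e_after, _ = critic_td_error(w_new, phi_k, phi_next, r, gamma)
print(abs(e_after) < abs(e_before))
```

With the features held fixed, one such step scales the error by $1 - \alpha_c s/(1+s)$, where $s = \|\Lambda\phi_c(k)\|^2$, so the error magnitude shrinks in this toy setting. The normalization by $1 + \Lambda\phi_c^T\Lambda\phi_c$ keeps the effective step size bounded regardless of the feature magnitude.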


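Thus, we have the dynamics with respect to the weight estimation error of the critic NN as

$$\tilde{W}_c(k+1) = \tilde{W}_c(k) - \frac{\alpha_c\left(\tilde{W}_c(k)\Lambda\phi_c(k) - \Lambda\varepsilon_{ck}\right)\Lambda\phi_c^T(k)}{1+\Lambda\phi_c^T(k)\Lambda\phi_c(k)}. \tag{A.9}$$

Denote $\Xi_{ck} = \tilde{W}_c(k)\Lambda\phi_c(k)$; we have

$$\begin{aligned}
\Delta V_c(k) &= \tilde{W}_c(k+1)\tilde{W}_c^T(k+1) - \tilde{W}_c(k)\tilde{W}_c^T(k) \\
&\le -\frac{2\alpha_c\|\Xi_{ck}\|^2}{1+\Lambda\phi_c^T(k)\Lambda\phi_c(k)} + \frac{\alpha_c\tilde{W}_c(k)\Lambda\phi_c(k)\Lambda\varepsilon_{ck}^T}{1+\Lambda\phi_c^T(k)\Lambda\phi_c(k)} + \frac{\alpha_c\Lambda\varepsilon_{ck}\Lambda\phi_c^T(k)\tilde{W}_c^T(k)}{1+\Lambda\phi_c^T(k)\Lambda\phi_c(k)} \\
&\quad + \frac{\alpha_c^2\Lambda\varepsilon_{ck}\Lambda\varepsilon_{ck}^T}{1+\Lambda\phi_c^T(k)\Lambda\phi_c(k)} - \frac{\alpha_c^2\Lambda\varepsilon_{ck}\Lambda\phi_c^T(k)\tilde{W}_c^T(k)}{1+\Lambda\phi_c^T(k)\Lambda\phi_c(k)} - \frac{\alpha_c^2\tilde{W}_c(k)\Lambda\phi_c(k)\Lambda\varepsilon_{ck}^T}{1+\Lambda\phi_c^T(k)\Lambda\phi_c(k)} \\
&\quad + \frac{\alpha_c^2\tilde{W}_c(k)\Lambda\phi_c(k)\Lambda\phi_c^T(k)\tilde{W}_c^T(k)}{1+\Lambda\phi_c^T(k)\Lambda\phi_c(k)} \\
&\le -\frac{\alpha_c(1-2\alpha_c)\|\Xi_{ck}\|^2}{1+\Lambda\phi_c^T(k)\Lambda\phi_c(k)} + \alpha_c(1+2\alpha_c)\|\Lambda\varepsilon_{ck}\|^2 \\
&\le -\frac{\alpha_c(1-2\alpha_c)\Lambda\phi_{cm}^2}{1+\Lambda\phi_{cM}^2}\|\tilde{W}_c(k)\|^2 + \alpha_c(1+2\alpha_c)\|\Lambda\varepsilon_{ck}\|^2.
\end{aligned} \tag{A.10}$$

Third, consider $\Delta V_a(k) = \tilde{W}_a(k+1)\tilde{W}_a^T(k+1) - \tilde{W}_a(k)\tilde{W}_a^T(k)$, where the actor NN weights are updated as

$$\hat{W}_a(k+1) = \hat{W}_a(k) - \alpha_a\frac{\phi_a(k)e_a^T(k)}{1+\phi_a^T(k)\phi_a(k)}. \tag{A.11}$$

Considering (13) and substituting (16) and (23) reveal

$$W_a\phi_a(k) + \frac{\gamma}{2}R^{-1}G^T(X(k))\frac{\partial\phi_c^T(k+1)}{\partial X(k+1)}W_c^T(k) = \tilde{\varepsilon}_{ak}, \tag{A.12}$$

where $\tilde{\varepsilon}_{ak} = -\varepsilon_{ak} - (\gamma/2)R^{-1}G^T(X(k))(\partial\varepsilon_{c(k+1)}/\partial X(k+1))$.

According to (17) and (A.12), $e_a(k)$ can be rewritten as

$$\begin{aligned}
e_a(k) &= \hat{W}_a(k)\phi_a(k) + \frac{\gamma}{2}R^{-1}G^T(X(k))\frac{\partial\hat{J}(X(k+1))}{\partial X(k+1)} \\
&= \tilde{W}_a(k)\phi_a(k) + \frac{\gamma}{2}R^{-1}G^T(X(k))\frac{\partial\phi_c^T(k+1)}{\partial X(k+1)}\tilde{W}_c^T(k) + \tilde{\varepsilon}_{ak}.
\end{aligned} \tag{A.13}$$

Since $\tilde{W}_a(k+1) = \tilde{W}_a(k) - \alpha_a\left(\phi_a(k)e_a^T(k)/(1+\phi_a^T(k)\phi_a(k))\right)$, substituting (A.13), we get

$$\begin{aligned}
\Delta V_a(k) &= \tilde{W}_a(k+1)\tilde{W}_a^T(k+1) - \tilde{W}_a(k)\tilde{W}_a^T(k) \\
&= \alpha_a^2\frac{e_a(k)\phi_a^T(k)\phi_a(k)e_a^T(k)}{\left(1+\phi_a^T(k)\phi_a(k)\right)^2} - 2\alpha_a\frac{\tilde{W}_a(k)\phi_a(k)e_a^T(k)}{1+\phi_a^T(k)\phi_a(k)} \\
&\le -\frac{\alpha_a(2-\alpha_a)}{1+\phi_a^T(k)\phi_a(k)}\|\Xi_{ak}\|^2 + \frac{\alpha_a(\alpha_a+1)}{1+\phi_a^T(k)\phi_a(k)}\left(\beta_M\|\tilde{W}_c(k)\| + 2\tilde{\varepsilon}_{ak}\right)\|\Xi_{ak}\| \\
&\quad + \frac{\alpha_a^2}{1+\phi_a^T(k)\phi_a(k)}\,\frac{1}{4}\beta_M^2\|\tilde{W}_c(k)\|^2 + \alpha_a^2\tilde{\varepsilon}_{ak}^2 + \frac{\alpha_a^2}{1+\phi_a^T(k)\phi_a(k)}\beta_M\|\tilde{W}_c(k)\|\tilde{\varepsilon}_{ak} \\
&\le -\frac{\alpha_a(1-5\alpha_a)}{2(1+\phi_{aM}^2)}\|\Xi_{ak}\|^2 + \frac{\alpha_a(5\alpha_a+2)}{2}\|\tilde{\varepsilon}_{ak}\|^2 + \frac{\alpha_a}{4}(5\alpha_a+2)\beta_M^2\|\tilde{W}_c(k)\|^2,
\end{aligned} \tag{A.14}$$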

where $\beta_M = \gamma\phi'_M G_M\lambda_{RM}$. Then, substituting (A.5), (A.10), and (A.14) into (A.2) yields

$$\begin{aligned}
\Delta V &= \frac{\alpha_a\alpha_c\Lambda\phi_{cm}^2}{4G_M^2\Pi_a(1+\phi_{aM}^2)(1+\Lambda\phi_{cM}^2)}\Delta V_1 + \frac{\alpha_c\Lambda\phi_{cm}^2}{\alpha_a\Pi_a(1+\Lambda\phi_{cM}^2)}\Delta V_a + \frac{1}{\alpha_c}\Delta V_c \\
&\le -\frac{(1-2k^*)\alpha_a\alpha_c\Lambda\phi_{cm}^2}{4G_M^2\Pi_a(1+\phi_{aM}^2)(1+\Lambda\phi_{cM}^2)}\|X(k)\|^2 - \frac{(4-9\alpha_c)\Lambda\phi_{cm}^2}{4(1+\Lambda\phi_{cM}^2)}\|\tilde{W}_c(k)\|^2 \\
&\quad - \frac{(1-7\alpha_a)\alpha_c\Lambda\phi_{cm}^2}{2\Pi_a(1+\phi_{aM}^2)(1+\Lambda\phi_{cM}^2)}\|\Xi_{ak}\|^2 + \varepsilon_{SM},
\end{aligned} \tag{A.15}$$

where

$$\varepsilon_{SM} = \frac{\alpha_a\alpha_c\Lambda\phi_{cm}^2}{\Pi_a(1+\phi_{aM}^2)(1+\Lambda\phi_{cM}^2)}\|\varepsilon_{ak}\|^2 + \frac{(5\alpha_a+2)\alpha_c\Lambda\phi_{cm}^2}{2\Pi_a(1+\Lambda\phi_{cM}^2)}\|\tilde{\varepsilon}_{ak}\|^2 + (1+2\alpha_c)\|\Lambda\varepsilon_{ck}\|^2. \tag{A.16}$$

Therefore, $\Delta V < 0$ if any of the following inequalities holds:

$$\begin{aligned}
\|X(k)\| &> \sqrt{\frac{4G_M^2\Pi_a(1+\phi_{aM}^2)(1+\Lambda\phi_{cM}^2)\varepsilon_{SM}}{(1-2k^*)\alpha_a\alpha_c\Lambda\phi_{cm}^2}} \\
\text{or}\quad \|\tilde{W}_c(k)\| &> \sqrt{\frac{4(1+\Lambda\phi_{cM}^2)\varepsilon_{SM}}{(4-9\alpha_c)\Lambda\phi_{cm}^2}} \\
\text{or}\quad \|\Xi_{ak}\| &> \sqrt{\frac{2\Pi_a(1+\phi_{aM}^2)(1+\Lambda\phi_{cM}^2)\varepsilon_{SM}}{(1-7\alpha_a)\alpha_c\Lambda\phi_{cm}^2}} \equiv b_{\Xi a},
\end{aligned} \tag{A.17}$$

provided that the learning rates of the two networks are selected as $\alpha_a < 1/7$ and $\alpha_c < 4/9$, for nonlinear systems whose optimal closed loop bound satisfies $0 < k^* < 1/2$. Therefore, according to the Lyapunov extensions [25], the system states and the weight estimation errors of the critic and actor NNs are UUB. Finally, using (23) and (24),

$$\hat{\eta}(k) - \eta^*(k) = \tilde{W}_a(k)\phi_a(k) - \varepsilon_{ak}. \tag{A.18}$$

Thus, we have

$$\|\hat{\eta}(k) - \eta^*(k)\| \le \|\Xi_{ak}\| + \|\varepsilon_{ak}\| \le b_{\Xi a} + \|\varepsilon_{ak}\| \equiv \varepsilon_u. \tag{A.19}$$

This completes the proof.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.

Acknowledgments

This work was supported by the Specialized Research Fund for the Doctoral Program of Higher Education (SRFDP) (Grant no. 20130006110008) and National Natural Science Foundation of China (Grant no. 61473034).

References

[1] K. S. Narendra and C. Xiang, “Adaptive control of discrete-time systems using multiple models,” IEEE Transactions on Automatic Control, vol. 45, no. 9, pp. 1669–1686, 2000.
[2] X.-L. Li, C. Jia, D.-X. Liu, and D.-W. Ding, “Nonlinear adaptive control using multiple models and dynamic neural networks,” Neurocomputing, vol. 136, pp. 190–200, 2014.
[3] X.-L. Li, D.-X. Liu, C. Jia, and X.-Z. Chen, “Multi-model control of blast furnace burden surface based on fuzzy SVM,” Neurocomputing, vol. 148, pp. 209–215, 2015.
[4] X.-L. Li, C. Jia, D.-X. Liu, and D.-W. Ding, “Adaptive control of nonlinear discrete-time systems by using os-elm neural networks,” Abstract and Applied Analysis, vol. 2014, Article ID 267609, 11 pages, 2014.
[5] Y.-J. Liu and S. C. Tong, “Adaptive NN tracking control of uncertain nonlinear discrete-time systems with nonaffine deadzone input,” IEEE Transactions on Cybernetics, vol. 45, no. 3, pp. 497–505, 2015.
[6] Y.-J. Liu and S. Tong, “Adaptive fuzzy control for a class of nonlinear discrete-time systems with backlash,” IEEE Transactions on Fuzzy Systems, vol. 22, no. 5, pp. 1359–1365, 2014.
[7] J. Campos, F. L. Lewis, and R. Selmic, “Backlash compensation with filtered prediction in discrete time nonlinear systems by dynamic inversion using neural networks,” Asian Journal of Control, vol. 6, no. 3, pp. 362–375, 2004.
[8] Y. J. Liu, S. C. Tong, D. J. Li, and Y. Gao, “Fuzzy adaptive control with state observer for a class of nonlinear discrete-time systems with input constraint,” IEEE Transactions on Fuzzy Systems, 2015.
[9] Y. J. Liu, Y. Gao, S. C. Tong, and Y. M. Li, “Fuzzy approximation-based adaptive backstepping optimal control for a class of nonlinear discrete-time systems with deadzone,” IEEE Transactions on Fuzzy Systems, vol. 24, no. 1, pp. 16–28, 2016.
[10] H. G. Zhang, X. Zhang, Y. H. Luo, and J. Yang, “An overview of research on adaptive dynamic programming,” Acta Automatica Sinica, vol. 39, no. 4, pp. 303–311, 2013.
[11] D. Liu, X. Yang, D. Wang, and Q. Wei, “Reinforcement-learning-based robust controller design for continuous-time uncertain nonlinear systems subject to input constraints,” IEEE Transactions on Cybernetics, vol. 45, no. 7, pp. 1372–1385, 2015.
[12] H. Zhang, L. Cui, X. Zhang, and Y. Luo, “Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method,” IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2226–2236, 2011.
[13] H. Modares, F. L. Lewis, and M.-B. Naghibi-Sistani, “Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems,” Automatica, vol. 50, no. 1, pp. 193–202, 2014.
[14] H. Zargarzadeh, Q. Yang, and S. Jagannathan, “Online optimal control of nonaffine nonlinear discrete-time systems without using value and policy iterations,” in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, pp. 221–257, Wiley-IEEE Press, 2013.
[15] Y. Jiang and Z.-P. Jiang, “Global adaptive dynamic programming for continuous-time nonlinear systems,” IEEE Transactions on Automatic Control, vol. 60, no. 11, pp. 2917–2929, 2015.
[16] C. Qin, H. Zhang, and Y. Luo, “Near-optimal control for continuous-time nonlinear systems with control constraints using on-line ADP,” in Proceedings of the 4th International Conference on Intelligent Control and Information Processing (ICICIP ’13), pp. 754–759, IEEE, Beijing, China, June 2013.
[17] Y. Gao and Y.-J. Liu, “Adaptive fuzzy optimal control using direct heuristic dynamic programming for chaotic discrete-time system,” Journal of Vibration and Control, vol. 22, no. 2, pp. 595–603, 2016.
[18] B. Kiumarsi, F. L. Lewis, M.-B. Naghibi-Sistani, and A. Karimpour, “Optimal tracking control of unknown discrete-time linear systems using input-output measured data,” IEEE Transactions on Cybernetics, vol. 45, no. 12, pp. 2770–2779, 2015.
[19] T. Dierks and S. Jagannathan, “Online optimal control of nonlinear discrete-time systems using approximate dynamic programming,” Journal of Control Theory and Applications, vol. 9, no. 3, pp. 361–369, 2011.
[20] C. Qin, H. Zhang, and Y. Luo, “Optimal tracking control of a class of nonlinear discrete-time switched systems using adaptive dynamic programming,” Neural Computing and Applications, vol. 24, no. 3-4, pp. 531–538, 2014.
[21] B. Kiumarsi and F. L. Lewis, “Actor-critic-based optimal tracking for partially unknown nonlinear discrete-time systems,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 1, pp. 140–151, 2015.
[22] X. Yang, D. Liu, and Y. Huang, “Neural-network-based online optimal control for uncertain non-linear continuous-time systems with control constraints,” IET Control Theory and Applications, vol. 7, no. 17, pp. 2037–2047, 2013.
[23] C. Qin, H. Zhang, and Y. Luo, “Adaptive optimal control for nonlinear discrete-time systems,” in Proceedings of the 4th IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL ’13), pp. 13–18, Singapore, April 2013.
[24] X. Luo and J. Si, “Stability of direct heuristic dynamic programming for nonlinear tracking control using PID neural network,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN ’13), pp. 1–7, IEEE, Dallas, Tex, USA, August 2013.
[25] K. G. Vamvoudakis and F. L. Lewis, “Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem,” Automatica, vol. 46, no. 5, pp. 878–888, 2010.
