Finite Horizon Stochastic Optimal Control of Uncertain Linear Networked Control System

Hao Xu
Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, USA
[email protected]

S. Jagannathan
Department of Electrical and Computer Engineering, Missouri University of Science and Technology, Rolla, MO, USA
[email protected]

Research supported in part by NSF ECCS #1128281 and the Intelligent Systems Center.
Abstract— In this paper, the finite horizon stochastic optimal control problem is studied for a linear networked control system (LNCS) in the presence of network imperfections, such as network-induced delays and packet losses, by using the adaptive dynamic programming (ADP) approach. Because the network imperfections render the system dynamics uncertain, the stochastic optimal control design uses a novel adaptive estimator (AE) to solve the optimal regulation of the uncertain LNCS in a forward-in-time manner, in contrast with the backward-in-time Riccati-equation-based optimal control that requires known system dynamics. A tuning law for the unknown parameters of the AE is derived. Lyapunov theory is used to show that all signals are uniformly ultimately bounded (UUB), with ultimate bounds being a function of the initial values and the final time, and that the estimated control input converges to the optimal control input within the finite horizon. Simulation results are included to show the effectiveness of the proposed scheme.

Keywords—Networked Control System; Adaptive Dynamic Programming and Reinforcement Learning; Finite Horizon; Stochastic Optimal Control; Adaptive Estimator
I. INTRODUCTION
A Networked Control System (NCS) [1], which closes the feedback control loop through a real-time communication network, is considered a next-generation control system because of advantages such as low installation cost, high flexibility, and efficiency. However, inserting a network into the feedback loop brings many challenging issues due to network imperfections, such as network-induced delays and packet losses, which occur while data are exchanged among devices. These network imperfections can degrade the control system performance significantly and even cause instability. Therefore, the authors in [1] and [2] analyzed and characterized the stability region of NCS with network-induced delays and packet losses, respectively. Moreover, by using stochastic optimal control theory [3], the authors in [4] derived the infinite horizon stochastic optimal control of NCS with network imperfections. The optimal design in [3] is solved backward-in-time with known NCS system dynamics by assuming that the network imperfections are known a priori. However, the NCS system dynamics resulting from network imperfections are not known beforehand. Further, current NCS designs [1-2][4] do not consider the finite horizon optimal control scheme, which is preferred in practical NCS.

The adaptive dynamic programming (ADP) techniques proposed by Werbos [5] and Barto [6] aim to obtain the optimal controller design for uncertain linear/nonlinear systems in a forward-in-time manner instead of the traditional backward-in-time optimal design, which requires known system dynamics. In ADP, by using policy or value iterations, a reinforcement learning scheme is combined with dynamic programming to solve the optimal control problem. Recently, using ADP techniques, the efforts in [7][18] generated finite horizon optimal control inputs for nonlinear systems with unknown system dynamics. However, to achieve optimality, these iteration-based ADP methods require a significant number of iterations within each sampling interval, which may not be possible. Therefore, Dierks and Jagannathan [8] proposed a time-based ADP approach to derive the infinite horizon optimal control of nonlinear affine discrete-time systems forward-in-time. In this scheme, the past history of system states and cost function estimates is utilized instead of an iteration-based approximate optimal design. However, existing ADP approaches (e.g. [5-8]) for linear and nonlinear systems are not applicable to NCS since (a) most of them address infinite horizon optimal control [3]; and (b) the network imperfections resulting from the communication network are ignored. Network imperfections cause uncertainty in the system dynamics since they are not known beforehand. Given uncertain system dynamics, the finite horizon optimal adaptive control design for linear NCS is more challenging due to the terminal state constraint.

In our previous work [9], a novel infinite horizon stochastic optimal control was proposed for LNCS in the presence of unknown system dynamics and network imperfections. The finite horizon case has not been addressed so far in the literature. Compared with the infinite horizon case, a finite horizon optimal control design must optimize the linear system while satisfying the terminal constraint [3]. Therefore, an optimal adaptive control scheme using ADP is developed in this paper to generate the finite horizon stochastic optimal regulation of LNCS with uncertain system dynamics due to unknown network imperfections such as network-induced delays and packet losses. First, with an initial admissible control, a novel adaptive estimator (AE) [10], which is tuned online, is proposed and updated forward-in-time to learn the stochastic value function by using the Bellman equation [3] given the terminal state constraint. Next, the proposed stochastic optimal adaptive control is generated by optimizing the value function while satisfying the terminal constraint via the AE. In contrast with the traditional finite horizon stochastic optimal regulator, which needs full knowledge of the system dynamics to solve the Stochastic Riccati Equation (SRE),
our proposed AE-based adaptive optimal scheme works for the LNCS by introducing an augmented state, and it relaxes the requirement for knowledge of the system dynamics and network imperfections without using value or policy iterations. The infinite horizon case is also deduced. The control of LNCS with network imperfections differs from control techniques for time-delay systems with known deterministic delays [15-16], since in NCS the network imperfections result in random delays and packet losses, which are not common in time-delay systems. Therefore, the approaches in [15-16] are not suitable for LNCS.

The contributions of this paper include the development of an adaptive optimal control of uncertain LNCS. The infinite horizon case is also included. Lyapunov stability, in terms of the boundedness of the regulation errors and parameter estimates, is demonstrated. A simulation example is utilized to show the effectiveness of the approach.
II. BACKGROUND
A. Linear Networked Control System (LNCS)

[Fig. 1. Linear Networked Control System: the plant, sensor, controller, and actuator exchange data over a communication network with sampling interval $T_s$; the network introduces the sensor-to-controller delay $\tau_{sc}(t)$, the controller-to-actuator delay $\tau_{ca}(t)$, and packet losses indicated by $\gamma(t)$.]
The basic structure of the LNCS is shown in Figure 1, where a communication network is used to close the feedback control loop. Due to the communication network, the LNCS in this paper incorporates the following network imperfections: the sensor-to-controller delay $\tau_{sc}(t)$, the controller-to-actuator delay $\tau_{ca}(t)$, and the indicator of network-induced packet losses $\gamma(t)$. According to recent NCS studies [4][9] and standard communication network protocols, the following assumptions [11-12] are needed for the stochastic optimal design.

Assumption: (a) For a wide area network, the two types of network-induced delays are considered independent, ergodic and unknown, whereas their probability distribution functions are considered known. The sensor-to-controller delay is kept less than one sampling interval. (b) The sum of the two delays is bounded, while the initial state of the system is deterministic.

Considering the network-induced delays and packet losses, the original time-invariant plant, $\dot{x}(t) = Ax(t) + Bu(t)$, can be represented as

$$\dot{x}(t) = Ax(t) + B\gamma(t)u\big(t - \tau(t)\big) \qquad (1)$$

where $\tau(t)$ denotes the total network-induced delay,

$$\gamma(t) = \begin{cases} I_{n\times n} & \text{if the packet has been received at time } t \\ 0_{n\times n} & \text{if the packet has been lost at time } t \end{cases}$$

$x(t)\in\mathbb{R}^{n}$ and $u(t)\in\mathbb{R}^{m}$ represent the system state and control input of the LNCS, respectively, $A\in\mathbb{R}^{n\times n}$ and $B\in\mathbb{R}^{n\times m}$ denote the system matrices, and $n$, $m$ represent the dimensions of the LNCS.
Similar to [9], after integrating (1) over a sampling interval $[kT_s, (k+1)T_s)$, the LNCS can be expressed as

$$x_{k+1} = A_s x_k + B_k^1\gamma_{k-1}u_{k-1} + \cdots + B_k^d\gamma_{k-d}u_{k-d} + B_k^0\gamma_k u_k \qquad (2)$$

where $dT_s$ is the upper bound of the network-induced delay, and $A_s$, $B_k^0$, $B_k^1,\ldots,B_k^d$ and $\gamma_k$ are defined similarly to [9].
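Although the exact definitions of $A_s$ and $B_k^i$ are those of [9], the following sketch illustrates one standard zero-order-hold construction for the special case of a single total delay $\tau_k < T_s$, so that only $u_k$ and $u_{k-1}$ act during the interval. The augmented-matrix-exponential trick is a common way to evaluate the required integrals; the plant matrices and delay value are placeholders, and the packet-loss indicators are omitted.

```python
import numpy as np
from scipy.linalg import expm

def delayed_zoh_matrices(A, B, Ts, tau_k):
    """One common discretization of x_dot = A x + B u(t - tau) over [k*Ts, (k+1)*Ts)
    when 0 <= tau_k < Ts (a sketch of a construction similar to the one referenced in [9]).

    Returns A_s, B0_k (weight of u_k) and B1_k (weight of u_{k-1}) such that
        x_{k+1} = A_s x_k + B1_k u_{k-1} + B0_k u_k   (loss indicators omitted).
    """
    n, m = B.shape

    def int_expA_B(h):
        # Computes int_0^h exp(A s) ds @ B via the augmented-matrix exponential.
        M = np.zeros((n + m, n + m))
        M[:n, :n] = A
        M[:n, n:] = B
        return expm(M * h)[:n, n:]

    A_s = expm(A * Ts)
    # u_{k-1} is applied during [0, tau_k) and then propagated by exp(A*(Ts - tau_k)).
    B1_k = expm(A * (Ts - tau_k)) @ int_expA_B(tau_k)
    # u_k is applied during [tau_k, Ts).
    B0_k = int_expA_B(Ts - tau_k)
    return A_s, B0_k, B1_k

# Example with placeholder plant matrices.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
A_s, B0, B1 = delayed_zoh_matrices(A, B, Ts=0.1, tau_k=0.04)
print(A_s, B0, B1, sep="\n")
```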
To simplify the LNCS representation (2), define a new augmented state $z_k = [x_k^T\ u_{k-1}^T\ u_{k-2}^T\ \cdots\ u_{k-d}^T]^T \in \mathbb{R}^{n+dm}$ incorporating the present state and past control inputs. Then, (2) can be expressed as a stochastic linear time-varying system given by

$$z_{k+1} = A_{zk} z_k + B_{zk} u_k, \qquad k = 0,1,2,\ldots \qquad (3)$$

with time-varying system matrices expressed as

$$A_{zk} = \begin{bmatrix} A_s & \gamma_{k-1}B_k^1 & \cdots & \gamma_{k-d+1}B_k^{d-1} & \gamma_{k-d}B_k^d \\ 0 & 0 & \cdots & 0 & 0 \\ 0 & I_m & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & I_m & 0 \end{bmatrix}, \qquad B_{zk} = \begin{bmatrix} \gamma_k B_k^0 \\ I_m \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$

Due to the network imperfections, the system dynamics become uncertain, time-varying and stochastic. Before designing the controller, we need to ensure that (3) is controllable. It was shown in [9] that if the original time-invariant system is controllable, then the LNCS (3) is also controllable.

Remark 1: For the proposed stochastic scheme, there are two important points to note: 1) compared with a traditional linear time-invariant system, the network imperfections not only make the LNCS (3) time-varying but also introduce augmented system states due to the incorporation of past control inputs; 2) traditional control designs developed for time-delay systems [15], which eliminate the delay randomness, are not suitable for LNCS since the network imperfections are not deterministic. In this paper, stochastic analysis will be utilized for LNCS with random network imperfections. A construction of the augmented matrices is sketched below.
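The following is a minimal sketch of how $A_{zk}$ and $B_{zk}$ in (3) can be assembled from $A_s$, $B_k^1,\ldots,B_k^d$, $B_k^0$ and the loss indicators, assuming each $\gamma$ is represented by a scalar 0/1 value and using the block structure shown above; the example blocks are placeholders rather than quantities from the paper.

```python
import numpy as np

def build_augmented(A_s, B_list, B0, gammas, gamma_k):
    """Assemble A_zk and B_zk of (3) for z_k = [x_k, u_{k-1}, ..., u_{k-d}].

    A_s     : (n, n) matrix
    B_list  : [B_k^1, ..., B_k^d], each (n, m)
    B0      : B_k^0, (n, m)
    gammas  : [gamma_{k-1}, ..., gamma_{k-d}] as 0/1 scalars
    gamma_k : 0/1 scalar for the current packet
    """
    n, m = B0.shape
    d = len(B_list)
    dim = n + d * m

    A_zk = np.zeros((dim, dim))
    A_zk[:n, :n] = A_s
    for i, (Bi, gi) in enumerate(zip(B_list, gammas)):
        # block column acting on the past input u_{k-1-i}
        A_zk[:n, n + i * m : n + (i + 1) * m] = gi * Bi
    for i in range(1, d):
        # shift register: the stored u_{k-i} becomes u_{k-(i+1)} at the next step
        A_zk[n + i * m : n + (i + 1) * m, n + (i - 1) * m : n + i * m] = np.eye(m)

    B_zk = np.zeros((dim, m))
    B_zk[:n, :] = gamma_k * B0
    B_zk[n : n + m, :] = np.eye(m)   # u_k enters the augmented state directly
    return A_zk, B_zk

# Example with placeholder blocks (n = 2, m = 1, d = 2).
A_s = np.array([[1.0, 0.1], [-0.2, 0.7]])
B1 = np.array([[0.00], [0.05]])
B2 = np.array([[0.00], [0.02]])
B0 = np.array([[0.00], [0.08]])
A_zk, B_zk = build_augmented(A_s, [B1, B2], B0, gammas=[1, 0], gamma_k=1)
print(A_zk.shape, B_zk.shape)
```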
Next, according to the LNCS representation (3), the stochastic optimal control can be developed by minimizing the value function given by

$$V(z_k, k) = E_{\tau,\gamma}\Big[ z_N^T S_{z,N} z_N + \sum_{i=k}^{N-1}\big(z_i^T Q_z z_i + u_i^T R_z u_i\big)\Big], \qquad k = 0,\ldots,N \qquad (4)$$

where $NT_s$ is the final time, $S_{z,N}$, $Q_z$ and $R_z$ are symmetric positive semi-definite and positive definite constant matrices, respectively, and $E_{\tau,\gamma}(\cdot)$ is the expectation operator (i.e., mean value) of $z_N^T S_{z,N} z_N + \sum_{i=k}^{N-1}(z_i^T Q_z z_i + u_i^T R_z u_i)$ taken over the network-induced delays $\tau$ and packet losses $\gamma$.
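Since the delays and losses are random, the expectation in (4) is generally not available in closed form. The following is a minimal sketch of how the finite horizon cost can be estimated by averaging sample trajectories of (3) under a fixed admissible gain sequence; the trajectory generator `sample_system`, the gain sequence, and the weighting matrices are assumed placeholders for illustration.

```python
import numpy as np

def finite_horizon_cost(z_traj, u_traj, S_zN, Q_z, R_z):
    """Cost inside the expectation of (4) for one sampled trajectory:
    z_N^T S_zN z_N + sum_i (z_i^T Q_z z_i + u_i^T R_z u_i)."""
    cost = z_traj[-1] @ S_zN @ z_traj[-1]
    for z_i, u_i in zip(z_traj[:-1], u_traj):
        cost += z_i @ Q_z @ z_i + u_i @ R_z @ u_i
    return cost

def expected_cost(sample_system, z0, gains, S_zN, Q_z, R_z, n_samples=500, seed=0):
    """Monte Carlo estimate of the value function (4) at k = 0: the cost is averaged
    over random delay/packet-loss realizations. `sample_system(rng)` is assumed to
    return one random realization (A_zk, B_zk) of the time-varying matrices in (3)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        z, z_traj, u_traj = z0.copy(), [z0.copy()], []
        for L_k in gains:                 # fixed admissible gain sequence of length N
            A_zk, B_zk = sample_system(rng)
            u = L_k @ z                   # feedback u_k = L_k z_k (sign absorbed in L_k)
            z = A_zk @ z + B_zk @ u       # propagate the augmented dynamics (3)
            z_traj.append(z.copy())
            u_traj.append(u)
        total += finite_horizon_cost(z_traj, u_traj, S_zN, Q_z, R_z)
    return total / n_samples
```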
III. FINITE HORIZON STOCHASTIC OPTIMAL CONTROL
In this section, novel ADP techniques are used to obtain the stochastic optimal adaptive control of LNCS in the presence of uncertain system dynamics due to network imperfections.

A. Stochastic Value Function Definition

Consider a LNCS with network imperfections represented by equation (3). Given the LNCS with a unique equilibrium point, $z = 0$, on a set $S$, the stochastic optimal control signal $u_k^* = L_k z_k$,
$k = 0,1,\ldots,N-1$ (note that $L_k$ is the optimal gain), can be obtained by minimizing the stochastic value function $V(z_k)$ given by (4). According to [3], the stochastic value function can also be represented in quadratic form as

$$V(z_k, k) = \begin{cases} E_{\tau,\gamma}\big(z_k^T P_{z,k} z_k\big), & k = 0,1,\ldots,N-1 \\ E_{\tau,\gamma}\big(z_N^T S_{z,N} z_N\big), & k = N \end{cases} \qquad (5)$$

where $P_{z,k} \geq 0$ is the solution to the SRE [13]. Then, the Bellman equation can be expressed in terms of expected values as

$$V^*(z_k, k) = E_{\tau,\gamma}\big[r(z_k, u_k) + V^*(z_{k+1}, k+1)\big] = E_{\tau,\gamma}\big\{[z_k^T\ u_k^T]\,\cdots \qquad (6)$$