
On the Relation Between the Minimum Principle and Dynamic Programming for Classical and Hybrid Control Systems

Ali Pakniyat, Member, IEEE, and Peter E. Caines, Life Fellow, IEEE

Abstract—Hybrid optimal control problems are studied for a general class of hybrid systems, where autonomous and controlled state jumps are allowed at the switching instants and, in addition to terminal and running costs, switching between discrete states incurs costs. The statements of the Hybrid Minimum Principle and Hybrid Dynamic Programming are presented in this framework, and it is shown that under certain assumptions, the adjoint process in the Hybrid Minimum Principle and the gradient of the value function in Hybrid Dynamic Programming are governed by the same set of differential equations and have the same boundary conditions, and hence are almost everywhere identical to each other along optimal trajectories. Analytic examples are provided to illustrate the results.

Index Terms—Dynamic programming (DP), Hamilton–Jacobi–Bellman equation, hybrid systems, nonlinear control system, optimal control, Pontryagin minimum principle (MP).

Manuscript received August 28, 2016; accepted January 9, 2017. Date of publication February 9, 2017; date of current version August 28, 2017. This work was supported by the Natural Sciences and Engineering Research Council of Canada and the Automotive Partnership Canada. Recommended by Associate Editor S. Andersson. The authors are with the Centre for Intelligent Machines and the Department of Electrical and Computer Engineering, McGill University, Montreal, QC H3A 0G4, Canada (e-mail: [email protected]; [email protected]). Digital Object Identifier 10.1109/TAC.2017.2667043

I. INTRODUCTION

PONTRYAGIN's minimum principle (MP) [1] and Bellman's dynamic programming (DP) [2] serve as the two key tools in optimal control theory. The MP can be considered as a generalization of Hamilton's canonical system in classical mechanics as well as an extension of Weierstrass's necessary conditions in the calculus of variations [3], [4], while DP theory, including the Hamilton–Jacobi–Bellman (HJB) equation, may be considered as an extension of the Hamilton–Jacobi theory in classical mechanics and of Carathéodory's theorem in the calculus of variations [5]. Both the MP and DP constitute necessary conditions for optimality, which under certain assumptions become sufficient (see, e.g., [5]–[9]). However, DP is widely used as a set of sufficient optimality conditions after the optimal control extension of Carathéodory's sufficient conditions in the calculus of variations (see, e.g., [5]–[7]). The relationship between the MP and DP, which were developed independently in the 1950s, was addressed as early as the

formal announcement of the Pontryagin MP [1]. In the classical optimal control framework, this relationship has been elaborated by many others since then (see, e.g., [5]–[16]). The result states that, under certain assumptions (see, e.g., [6] and [8]), the adjoint process in the MP and the gradient of the value function in DP are equal, a property which we shall sometimes refer to as the adjoint–gradient relationship. While this relationship has been proved in various forms, the majority of arguments are based on the following two key elements: (i) the assumption of the openness of the set of all points, from which an optimal transition to the reference trajectory is possible [1, p. 70]; and (ii) the inference of the extremality of the reference optimal state for the corresponding optimal control [1, p. 72]. Then, with the assumption of twice continuous differentiability of the value function, the method of characteristics (see, e.g., [6] and [8]) can be employed to obtain the aforementioned relationship, which is analogous to the derivation of the equivalence of the Hamiltonian system and the Hamilton–Jacobi equation. For certain classes of optimal control problems, the assumption of twice differentiability is intrinsically satisfied since the total cost can become arbitrarily large and negative if the second partial derivative ceases to exist (see, e.g., [16]). But, in general, even once differentiability of the value function is violated at certain points for numerous problems (see, e.g., [6], [7], and [15]–[18]). Consequently, the adjoint–gradient relationship is usually expressed within the general framework of nonsmooth analysis that declares the inclusion of the adjoint process in the set of generalized gradients of the value function [10]–[15]. However, the general expression of the adjoint–gradient relationship in the framework of nonsmooth analysis is unnecessary for optimal control problems with appropriately smooth vector fields and costs, when the optimal feedback control possesses an admissible set of discontinuities [8]. In contrast to classical optimal control theory, the relation between the MP and DP in the hybrid system framework has been the subject of limited number of studies (see, e.g., [19]– [21]). One of the main difficulties in this discussion is that the domains of definition of hybrid systems employed for the derivation of the results of hybrid optimal control theory (see, e.g., [22]–[50]) do not necessarily intersect in a general class of systems. This is especially due to the difference in the approach and the assumptions required for the derivation of necessary and sufficient optimality conditions in the two key formulations. Hence, in order to establish the relationship between the Hybrid Minimum Principle (HMP) and Hybrid Dynamic Programming (HDP), there is the preliminary task of choosing an appropriately general class of hybrid systems, within which the desired HMP



and HDP results are valid. This is one of the tasks addressed in this paper before establishing the relation between the MP and DP for classical and hybrid control systems. The organization of this paper is as follows. In Section II, a definition of hybrid systems is presented that covers a general class of nonlinear systems on Euclidean spaces with autonomous and controlled switchings and jumps allowed at the switching states and times. Further generalizations such as the lying of the system’s vector fields in Riemannian spaces [29], [30], nonsmooth assumptions [22]–[25], [38], [39], and state dependence of the control value sets [31], as well as restrictions to certain subclasses such as those with regional dynamics [43], [44], specified families of jumps [38]–[41], and systems with locally controllable trajectories [28], which are imposed for the sufficiency of the results, are avoided so that for the selected class of hybrid systems, the associated HMP and HDP theory, together with their relationships, are derived in a unified general framework. Similarly, the selected class of hybrid optimal control problems in Section III covers a general class of hybrid optimal control problems with a large range of running, terminal, and switching costs. With the exception of the infinite horizon problems considered in [37]–[41], this framework is in accordance with the majority of the work on the HMP (see [19]–[21] and [24]–[34]) and a number of publications on HDP (see, e.g., [42]–[45]) defined on finite horizons. The statement of the HMP is presented in Section IV, where necessary conditions are provided for the optimality of the trajectory and the controls of a hybrid system with fixed initial conditions and a sequence of autonomous or controlled switchings. These conditions are expressed in terms of the minimization of the distinct Hamiltonians indexed by the discrete-state sequence of the hybrid trajectory. A feature of special interest is the set of boundary conditions on the adjoint processes and the Hamiltonian functions at autonomous and controlled switching times and states; these boundary conditions may be viewed as a generalization to the optimal control case of the Weierstrass– Erdmann conditions of the calculus of variations [51]. In Section V, the principle of optimality is employed to introduce the cost-to-go and the value functions. It is proved that on a bounded set in the state space, the cost-to-go functions are Lipschitz with a common Lipschitz constant, which is independent of the control, and hence, their infimum, i.e., the value function, is Lipschitz with the same Lipschitz constant. The necessary conditions of HDP are then established in the form of the HJB equation and the corresponding boundary conditions. In Section VI, the main result is given describing the relationship of the MP and DP for classical and hybrid systems. The proof is different in the approach from the classical arguments discussed earlier, and in particular, the sequence of proof steps appears in a different order. To be specific, the optimality condition, i.e., the Hamiltonian minimization property (ii) discussed earlier, appears in the last step in order to emphasize the independence of the dynamics of the cost gradient process from the optimality of the control input. 
Consequently, assumption (i) is used differently here from the classical proof methods, and in particular, the optimality of the transitions back to the reference trajectory is relaxed to the existence of (not necessarily optimal) neighboring trajectories. After the derivation of the dynamics and boundary conditions for the cost gradient, or sensitivity, corresponding to an arbitrary control input, it is shown that an optimal control leads to the same dynamics and boundary conditions for the optimal cost gradient process as those

for the adjoint process. Thus, by the existence and uniqueness of the solutions to the governing ordinary differential equations (ODEs), it is concluded that the optimal cost gradient, i.e., the gradient of the value function generated by the HJB equation, is equal to the adjoint process in the corresponding HMP formulation. Illustrative examples are provided in Section VII.

II. HYBRID SYSTEMS

Definition 1: A hybrid system (structure) H is a septuple

H = {H, I, Γ, A, F, Ξ, M}    (1)

where the symbols in the expression and their governing assumptions are defined as below.

A0: H := Q × M is called the (hybrid) state space of the hybrid system H, where Q = {1, 2, ..., |Q|} ≡ {q_1, q_2, ..., q_{|Q|}}, |Q| < ∞, is a finite set of discrete states (components), and M = {R^{n_q}}_{q ∈ Q} is a family of finite-dimensional continuous valued state spaces, where n_q ≤ n < ∞ for all q ∈ Q.

I := Σ × U is the set of system input values, where Σ with |Σ| < ∞ is the set of discrete-state transition and continuous-state jump events extended with the identity element, and U = {U_q}_{q ∈ Q} is the set of admissible input control values, where each U_q ⊂ R^{m_q} is a compact set in R^{m_q}. The set of admissible (continuous) control inputs U(U) := L_∞([t_0, T_*), U) is defined to be the set of all measurable functions that are bounded up to a set of measure zero on [t_0, T_*), T_* < ∞. The boundedness property necessarily holds since admissible input functions take values in the compact set U.

Γ : H × Σ → H is a time-independent (partially defined) discrete-state transition map.

Ξ : H × Σ → H is a time-independent (partially defined) continuous-state jump transition map. All ξ_σ ∈ Ξ, ξ_σ : R^{n_q} → R^{n_p}, p ∈ A(q, σ), are assumed to be continuously differentiable in the continuous state x ∈ R^{n_q}.

A : Q × Σ → Q denotes both a deterministic finite automaton and the automaton's associated transition function on the state space Q and event set Σ, such that for a discrete state q ∈ Q, only the discrete controlled and uncontrolled transitions into the q-dependent subset {A(q, σ), σ ∈ Σ} ⊂ Q occur under the projection of Γ on its Q components: Γ : H × Σ → H|_Q. In other words, Γ can only make a discrete-state transition in a hybrid state (q, x) if the automaton A can make the corresponding transition in q.

F is an indexed collection of vector fields {f_q}_{q ∈ Q} such that f_q ∈ C^{k_{f_q}}(R^{n_q} × U_q → R^{n_q}), k_{f_q} ≥ 1, satisfies a joint uniform Lipschitz condition, i.e., there exists L_f < ∞ such that ‖f_q(x_1, u_1) − f_q(x_2, u_2)‖ ≤ L_f(‖x_1 − x_2‖ + ‖u_1 − u_2‖) for all x_1, x_2 ∈ R^{n_q}, u_1, u_2 ∈ U_q, q ∈ Q.

M = {m_α : α ∈ Q × Q} denotes a collection of switching manifolds such that, for any ordered pair α ≡ (α_1, α_2) = (p, q), m_α is a smooth, i.e., C^∞, codimension-1 submanifold of R^{n_{α_1}}, described locally by m_α = {x : m_α(x) = 0}. It is assumed that m_α ∩ m_β = ∅ whenever α_1 = β_1 but α_2 ≠ β_2, for all α, β ∈ Q × Q. □

We note that the case where m_α is identified with its reverse-ordered version m_ᾱ, giving m_α = m_ᾱ, is not ruled out by this definition, even in the nontrivial case m_{p,p} where α_1 = α_2 = p. The former case corresponds to the common situation where the switching of vector fields at the passage of the continuous trajectory in one direction through a switching manifold is reversed if a reverse passage is performed by the continuous trajectory, while the latter case corresponds to the standard example of the bouncing ball.


Switching manifolds will function in such a way that whenever a trajectory governed by the controlled vector field meets the switching manifold transversally, there is an autonomous switching to another controlled vector field or there is a jump transition in the continuous state component, or both. A transversal arrival on a switching manifold m_{q,r} at a state x_q ∈ m_{q,r} = {x ∈ R^{n_q} : m_{q,r}(x) = 0} occurs whenever

∇m_{q,r}(x_q)^T f_q(x_q, u_q) ≠ 0    (2)

for u_q ∈ U_q and q, r ∈ Q. We further assume the following:

A1: The initial state h_0 := (q_0, x(t_0)) ∈ H is such that m_{q_0, q_j}(x_0) ≠ 0 for all q_j ∈ Q. □

Definition 2: A hybrid input process is a pair I_L ≡ I_L^{[t_0, t_f)} := (S_L, u) defined on a half-open interval [t_0, t_f), t_f < ∞, where u ∈ U and S_L = ((t_0, σ_0), (t_1, σ_1), ..., (t_L, σ_L)), L < ∞, is a finite hybrid sequence of switching events consisting of a strictly increasing sequence of times τ_L := {t_0, t_1, t_2, ..., t_L} and a discrete event sequence σ with σ_0 = id and σ_i ∈ Σ, i ∈ {1, 2, ..., L}. □

Definition 3: A hybrid state process (or trajectory) is a triple (τ_L, q, x) consisting of the sequence of switching times τ_L = {t_0, t_1, ..., t_L}, L < ∞, the associated sequence of discrete states q = {q_0, q_1, ..., q_L}, and the sequence x(·) = {x_{q_0}(·), x_{q_1}(·), ..., x_{q_L}(·)} of piecewise differentiable functions x_{q_i}(·) : [t_i, t_{i+1}) → R^{n_{q_i}}. □

Definition 4: The input-state trajectory for the hybrid system H satisfying A0 and A1 is a hybrid input I_L = (S_L, u) together with its corresponding hybrid state trajectory (τ_L, q, x) defined over [t_0, t_f), t_f < ∞, such that it satisfies the following:

i) Continuous state dynamics: The continuous state component x(·) = {x_{q_0}(·), x_{q_1}(·), ..., x_{q_L}(·)} is a piecewise continuous function, which is almost everywhere differentiable and on each time segment specified by τ_L satisfies the dynamics equation

ẋ_{q_i}(t) = f_{q_i}(x_{q_i}(t), u_{q_i}(t)),  a.e. t ∈ [t_i, t_{i+1})    (3)

with the initial conditions

x_{q_0}(t_0) = x_0    (4)

x_{q_i}(t_i) = ξ_{σ_i}(x_{q_{i-1}}(t_i−)) := ξ_{σ_i}(lim_{t↑t_i} x_{q_{i-1}}(t))    (5)

for (t_i, σ_i) ∈ S_L. In other words, x(·) = {x_{q_0}(·), x_{q_1}(·), ..., x_{q_L}(·)} is a piecewise continuous function, which is almost everywhere differentiable and is such that each x_{q_i}(·) satisfies

x_{q_i}(t) = x_{q_i}(t_i) + ∫_{t_i}^{t} f_{q_i}(x_{q_i}(s), u_{q_i}(s)) ds    (6)

for t ∈ [t_i, t_{i+1}).

ii) Autonomous discrete transition dynamics: An autonomous (uncontrolled) discrete-state transition from q_{i−1} to q_i together with a continuous state jump ξ_{σ_i} occurs at the autonomous switching time t_i if x_{q_{i-1}}(t_i−) := lim_{t↑t_i} x_{q_{i-1}}(t) satisfies a switching manifold condition of the form

m_{q_{i-1} q_i}(x_{q_{i-1}}(t_i−)) = 0    (7)

for q_{i−1}, q_i ∈ Q, where m_{q_{i-1} q_i}(x) = 0 defines a (q_{i−1}, q_i) switching manifold and it is not the case that either (i) x_{q_{i-1}}(t_i−) ∈ ∂m_{q_{i-1} q_i} or (ii) f_{q_{i-1}}(x_{q_{i-1}}(t_i−), u_{q_{i-1}}(t_i−)) ⊥ ∇m_{q_{i-1} q_i}(x_{q_{i-1}}(t_i−)), i.e., t_i is not a manifold termination instant (see [52]). With Assumptions A0 and A1 in force, such a transition is well defined and labels the event σ_{q_{i-1} q_i} ∈ Σ that corresponds to the hybrid state transition

h(t_i) ≡ (q_i, x_{q_i}(t_i)) = ( Γ(q_{i-1}, σ_{q_{i-1} q_i}), ξ_{σ_{q_{i-1} q_i}}(x_{q_{i-1}}(t_i−)) ).    (8)

iii) Controlled discrete transition dynamics: A controlled discrete-state transition together with a controlled continuous state jump ξ_σ occurs at the controlled discrete event time t_i if t_i is not an autonomous discrete event time and if there exists a controlled discrete input event σ_{q_{i-1} q_i} ∈ Σ for which

h(t_i) ≡ (q_i, x_{q_i}(t_i)) = ( Γ(q_{i-1}, σ_{q_{i-1} q_i}), ξ_{σ_{q_{i-1} q_i}}(x_{q_{i-1}}(t_i−)) )    (9)

with (t_i, σ_{q_{i-1} q_i}) ∈ S_L and q_i ∈ A(q_{i-1}). □

Theorem 2.1 (Existence and uniqueness of solution trajectories for hybrid systems [52]): A hybrid system H with an initial hybrid state (q_0, x_0) satisfying Assumptions A0 and A1 possesses a unique hybrid input-state trajectory on [t_0, T_{**}), where T_{**} is the least of

i) T_* ≤ ∞, where [t_0, T_*) is the temporal domain of the definition of the hybrid system;

ii) a manifold termination instant T_* of the trajectory h(t) = h(t, (q_0, x_0), (S_L, u)), t ≥ t_0, at which either x(T_*−) ∈ ∂m_{q(T_*−) q(T_*)} or f_{q(T_*−)}(x(T_*−), u(T_*−)) ⊥ ∇m_{q(T_*−) q(T_*)}(x(T_*−)). □

We note that Zeno times, i.e., accumulation points of discrete transition times, are ruled out by Definitions 2–4.
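To make Definition 4 and Theorem 2.1 concrete, the following is a minimal simulation sketch, not taken from the paper: it integrates a two-mode scalar hybrid trajectory with the vector fields and jump map of Example 1 (Section VII), to which a hypothetical autonomous switching manifold m(x) = x − 0.5 has been attached purely for illustration; the step size and the constant input are likewise illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not from the paper): forward simulation of a two-mode hybrid
# trajectory in the sense of Definition 4, with one autonomous switching
# detected on the manifold m(x) = 0 (cf. eq. (7)) and a continuous-state jump.

def f(q, x, u):
    return x + x * u if q == 1 else -x + x * u   # vector fields of Example 1

def m(x):
    return x - 0.5          # hypothetical switching manifold m(x) = x - 0.5

def xi(x):
    return -x               # jump map of Example 1: x(t_i) = -x(t_i-)

def simulate(x0, u_of_t, t0=0.0, tf=1.0, dt=1e-4):
    t, x, q = t0, x0, 1
    times, states, modes = [t], [x], [q]
    while t < tf:
        u = u_of_t(t)
        x_next = x + dt * f(q, x, u)              # Euler step of eq. (3)
        # autonomous switching: transversal crossing of m(x) = 0 in mode 1
        if q == 1 and m(x) * m(x_next) < 0:
            x_next = xi(x_next)                   # continuous-state jump, eq. (5)
            q = 2                                 # discrete transition Gamma(q1, sigma)
        t, x = t + dt, x_next
        times.append(t); states.append(x); modes.append(q)
    return np.array(times), np.array(states), np.array(modes)

if __name__ == "__main__":
    ts, xs, qs = simulate(x0=0.2, u_of_t=lambda t: 0.5)
    print("switching detected at t ≈", ts[np.argmax(qs == 2)])
```

The event detection by sign change of m along the discretized trajectory is the discrete counterpart of the transversality condition (2); a tangential approach would not trigger the switch.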


III. HYBRID OPTIMAL CONTROL PROBLEMS

A2: Let {l_q}_{q ∈ Q}, l_q ∈ C^{n_l}(R^{n_q} × U_q → R_+), n_l ≥ 1, be a family of running cost functions, with n_l = 2 unless otherwise stated; {c_{σ_j}}_{σ_j ∈ Σ} ∈ C^{n_c}(R^{n_{q_{j-1}}} × Σ → R_+), n_c ≥ 1, be a family of switching cost functions; and g ∈ C^{n_g}(R^{n_{q_f}} → R_+), n_g ≥ 1, be a terminal cost function satisfying the following assumptions:

i) There exist K_l < ∞ and 1 ≤ γ_l < ∞ such that |l_q(x, u)| ≤ K_l(1 + ‖x‖^{γ_l}) and |l_q(x_1, u_1) − l_q(x_2, u_2)| ≤ K_l(‖x_1 − x_2‖ + ‖u_1 − u_2‖) for all x, x_1, x_2 ∈ R^{n_q}, u, u_1, u_2 ∈ U_q, q ∈ Q.

ii) There exist K_c < ∞ and 1 ≤ γ_c < ∞ such that |c_{σ_j}(x)| ≤ K_c(1 + ‖x‖^{γ_c}), x ∈ R^{n_{q_{j-1}}}, σ_j ∈ Σ.

iii) There exist K_g < ∞ and 1 ≤ γ_g < ∞ such that |g(x)| ≤ K_g(1 + ‖x‖^{γ_g}), x ∈ R^{n_{q_f}}. □

Consider the initial time t_0, final time t_f < ∞, and initial hybrid state h_0 = (q_0, x_0). With the number of switchings L held fixed, the set of all hybrid input trajectories in Definition 2 with exactly L switchings is denoted by I_L, and for all I_L := (S_L, u) ∈ I_L, the hybrid switching sequences take the form S_L = {(t_0, id), (t_1, σ_{q_0 q_1}), ..., (t_L, σ_{q_{L-1} q_L})} ≡ {(t_0, q_0), (t_1, q_1), ..., (t_L, q_L)} and the corresponding continuous control inputs are of the form u ∈ U = ⊕_{i=0}^{L} L_∞([t_i, t_{i+1}), U_{q_i}), where t_{L+1} = t_f. Let I_L be a hybrid input trajectory that by Theorem 2.1 results in a unique hybrid state process. Then, hybrid performance functions for the corresponding hybrid input-state trajectory are defined as

J(t_0, t_f, h_0, L; I_L) := Σ_{i=0}^{L} ∫_{t_i}^{t_{i+1}} l_{q_i}(x_{q_i}(s), u_{q_i}(s)) ds + Σ_{j=1}^{L} c_{σ_j}(t_j, x_{q_{j-1}}(t_j−)) + g(x_{q_L}(t_f)).    (10)

Definition 5: The Bolza hybrid optimal control problem (BHOCP) is defined as the infimization of the hybrid cost (10) over the family of hybrid input trajectories I_L, i.e.,

J^o(t_0, t_f, h_0, L) = inf_{I_L ∈ I_L} J(t_0, t_f, h_0, L; I_L). □    (11)

Definition 6: The Mayer hybrid optimal control problem (MHOCP) is defined as a special case of the BHOCP where l_q(x_q, u_q) = 0 for all q ∈ Q and c_{σ_j}(t_j, x_{q_{j-1}}(t_j−)) = 0 for all σ_j ∈ Σ, 1 ≤ j ≤ L. □

Remark 3.1 (The relationship between the BHOCP and the MHOCP): In general, a BHOCP can be converted into an MHOCP with the introduction of an auxiliary state component z and the extension of the continuous valued state to

x̂_q := [z_q, x_q^T]^T.    (12)

With the definition of the augmented vector fields

dx̂_q/dt = f̂_q(x̂_q, u_q) := [l_q(x_q, u_q), f_q(x_q, u_q)^T]^T    (13)

subject to the initial condition

ĥ_0 = (q_0, x̂_{q_0}(t_0)) = (q_0, [0, x_0^T]^T)    (14)

and with the switching boundary conditions governed by the extended jump function defined as

x̂_{q_j}(t_j) = ξ̂_{σ_j}(x̂_{q_{j-1}}(t_j−)) := [z_{q_{j-1}}(t_j−) + c_{σ_j}(x_{q_{j-1}}(t_j−)), ξ_{σ_j}(x_{q_{j-1}}(t_j−))^T]^T    (15)

the cost (10) of the BHOCP turns into the Mayer form with

J(t_0, t_f, ĥ_0, L; I_L) := ĝ(x̂_{q_L}(t_f))    (16)

where ĝ(x̂_{q_L}(t_f)) = z_{q_L}(t_f) + g(x_{q_L}(t_f)). A numerical illustration of this augmentation is sketched at the end of this section. □

IV. HYBRID MINIMUM PRINCIPLE

Theorem 4.1 ([53]): Consider the hybrid system H subject to Assumptions A0–A2 and the HOCP (11) for the hybrid performance function (10). Define the family of system Hamiltonians by

H_q(x_q, λ_q, u_q) = λ_q^T f_q(x_q, u_q) + l_q(x_q, u_q)    (17)

for x_q, λ_q ∈ R^{n_q}, u_q ∈ U_q, q ∈ Q. Then, for a given switching sequence {q_i}_{i=0}^{L} and along the corresponding optimal trajectory x^o, there exists an adjoint process λ^o such that

ẋ^o_q = ∂H_q/∂λ_q (x^o_q, λ^o_q, u^o_q)    (18)

λ̇^o_q = −∂H_q/∂x_q (x^o_q, λ^o_q, u^o_q)    (19)

almost everywhere t ∈ [t_0, t_f], with

x^o_{q_0}(t_0) = x_0    (20)

x^o_{q_j}(t_j) = ξ_{σ_j}(x^o_{q_{j-1}}(t_j−))    (21)

λ^o_{q_L}(t_f) = ∇g(x^o_{q_L}(t_f))    (22)

λ^o_{q_{j-1}}(t_j−) ≡ λ^o_{q_{j-1}}(t_j) = ∇ξ_{σ_j}^T λ^o_{q_j}(t_j+) + ∇c_{σ_j} + p ∇m    (23)

where p ∈ R when t_j indicates the time of an autonomous switching, and p = 0 when t_j indicates the time of a controlled switching. Moreover,

H_q(x^o, λ^o, u^o) ≤ H_q(x^o, λ^o, u)    (24)

for all u ∈ U_{q^o}, that is to say the Hamiltonian is minimized with respect to the control input, and at a switching time t_j the Hamiltonian satisfies

H_{q_{j-1}}(x^o, λ^o, u^o)|_{t_j−} ≡ H_{q_{j-1}}(t_j) = H_{q_j}(t_j) ≡ H_{q_j}(x^o, λ^o, u^o)|_{t_j+}. □    (25)

We note that the gradient of the state transition jump map ∇ξ_{σ_j} = ∂ξ_{σ_j}(x_{q_{j-1}})/∂x_{q_{j-1}} is not necessarily square due to the possibility of changes in the state dimension, but the boundary conditions (23) are well defined for hybrid optimal control problems satisfying A0–A2.

Remark 4.2: For an HOCP represented in the Bolza form (as in Definition 5) and the corresponding Mayer representation (as in Definition 6 and through Remark 3.1), the corresponding HMP results are related by (see also [20])

Ĥ_q(x̂_q, λ̂_q, u_q) ≡ λ̂_q^T f̂_q(x̂_q, u_q) = H_q(x_q, λ_q, u_q) ≡ λ_q^T f_q(x_q, u_q) + l_q(x_q, u_q)    (26)

λ̂_q(t) = [1, λ_q(t)^T]^T. □    (27)
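Returning to Remark 3.1, the following is a minimal numerical sketch, not taken from the paper, of the Bolza-to-Mayer state augmentation (12)–(16) applied to the data of Example 1 in Section VII: the auxiliary state z accumulates the running cost and jumps by the switching cost, so that the Bolza cost is recovered as the terminal value ĝ = z(t_f) + g(x(t_f)). The fixed control, fixed switching time, and step size are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not from the paper): Bolza-to-Mayer augmentation of
# Remark 3.1 for the data of Example 1, with a fixed (not optimized) control
# and a fixed controlled switching time ts.

def f(q, x, u):            # vector fields (97)-(98)
    return x + x * u if q == 1 else -x + x * u

def l(q, x, u):            # running cost in (99)
    return 0.5 * u ** 2

def c(x_minus):            # switching cost in (99)
    return 1.0 / (1.0 + x_minus ** 2)

def g(x_f):                # terminal cost in (99)
    return 0.5 * x_f ** 2

def mayer_cost(x0, u_of_t, ts, tf=1.0, dt=1e-4):
    """Integrate the augmented state (z, x) of eqs. (12)-(13); z jumps by the
    switching cost per eq. (15), so the Bolza cost equals z(tf) + g(x(tf))."""
    z, x, q, t = 0.0, x0, 1, 0.0
    while t < tf:
        u = u_of_t(t)
        if q == 1 and t + dt >= ts:              # controlled switching at ts
            z, x, q = z + c(x), -x, 2            # extended jump, eq. (15)
        z += dt * l(q, x, u)                     # dz/dt = l_q(x, u), eq. (13)
        x += dt * f(q, x, u)                     # dx/dt = f_q(x, u), eq. (3)
        t += dt
    return z + g(x)                              # Mayer cost, eq. (16)

if __name__ == "__main__":
    print(mayer_cost(x0=0.5, u_of_t=lambda t: -0.2, ts=0.4))
```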


V. HYBRID DYNAMIC PROGRAMMING

Definition 7: Consider the hybrid system (1) and the class of HOCPs (11) with the hybrid cost (10). At an arbitrary instant t ∈ [t_0, t_f], an initial hybrid state h = (q, x) ∈ Q × R^{n_q}, and a specified number of remaining switchings L − j + 1 corresponding to t ∈ (t_{j-1}, t_j], the cost to go subject to the hybrid input process I_{L-j+1} ≡ I_{L-j+1}^{[t, t_f]} is given by

J(t, t_f, q, x, L − j + 1; I_{L-j+1}) = ∫_{t}^{t_j} l_q(x_q, u_q) ds + Σ_{i=j}^{L} c_{σ_{q_{i-1} q_i}}(t_i, x_{q_{i-1}}(t_i−)) + Σ_{i=j}^{L} ∫_{t_i}^{t_{i+1}} l_{q_i}(x_{q_i}, u_{q_i}) ds + g(x_{q_L}(t_f)).    (28)

The value function is defined as the optimal cost to go over the corresponding family of hybrid control inputs, i.e.,

V(t, q, x, L − j + 1) := inf_{I_{L-j+1}} J(t, t_f, q, x, L − j + 1; I_{L-j+1}). □    (29)

Theorem 5.1: With Assumptions A0–A2 in force, the value function (29) is Lipschitz in x uniformly in t for all t ∈ ∪_{i=0}^{L}(t_i, t_{i+1}), i.e., for B_r = {x ∈ R^n : ‖x‖ < r} and for all t ∈ (t_i, t_{i+1}), x ≡ x_t ∈ B_r, there exist a neighborhood N_{r_x}(x_t) and a constant 0 < K < ∞ such that

|V(t, q, x_t, L − j + 1) − V(s, q, x_s, L − j + 1)| < K (‖x_t − x_s‖^2 + |s − t|^2)^{1/2}    (30)

for s ∈ (t_i, t_{i+1}) and x_s ∈ N_{r_x}(x_t). □

Proof: See the Appendix. □

Definition 8: Let M^{(i)} denote the set of all (t, x) ∈ ∪_{i=0}^{L} R × R^{n_{q_i}} for which the i-th derivatives of V exist and are continuous. □

Note that from Theorem 5.1, it is concluded that M^{(0)} ⊇ ∪_{i=0}^{L}(t_i, t_{i+1}) × R^{n_{q_i}}, i.e., the value function is at most discontinuous at the switching instants with nonzero switching costs and nonidentity jump maps.

Definition 9: A hybrid feedback input I_{L-j+1}^{[t, t_f]}(t, q, x) = (S_{L-j+1}, u_{q(τ)}(τ, x)), τ ∈ [t, t_f], is said to have an admissible set of discontinuities if, for each q ∈ Q, the discontinuities of the continuous valued feedback control u_q(t, x) and the discrete valued feedback input σ(t, q, x) are located on lower dimensional manifolds in the time and state space R × R^{n_q}. □

Remark 5.2: By A0, an autonomous discrete-valued control input σ necessarily satisfies the lower dimensional manifold switching set condition of Definition 9, where the sets constitute C^∞ submanifolds.

Remark 5.3: For classical (i.e., nonhybrid) systems, a more detailed definition of a feedback control law with an admissible set of discontinuities can be found in [8, pp. 90–97]. The necessary conditions for the Lipschitz continuity of the optimal feedback control are discussed in [54] and [55], and sufficient conditions for continuity with respect to initial conditions are given in [56].

Corollary 5.4: From Theorem 5.1 and Rademacher's theorem (see, e.g., [8], [57], and [58]), the Lipschitz property implies differentiability of the value function almost everywhere. Furthermore, if there exists an optimal input with an admissible set of discontinuities, M^{(1)} is open dense in ∪_{i=0}^{L}[t_i, t_{i+1}] × R^{n_{q_i}}. □

Theorem 5.5 ([53]): Consider the hybrid system H and the HOCP (11) together with Assumptions A0–A2 as above. Then, for all (t, x_{q_i}) ∈ M^{(1)}, q_i ∈ Q, the HJB equation holds, i.e.,

−∂V/∂t = inf_{u_{q_i}} { l_{q_i}(x_{q_i}, u_{q_i}) + ⟨∇V, f_{q_i}(x_{q_i}, u_{q_i})⟩ }    (31)

a.e. t ∈ (t_i, t_{i+1}), 0 ≤ i ≤ L, subject to the terminal condition

V(t_f, q_L, x, 0) = g(x_{q_L})    (32)

and, at the switching times t_j ∈ τ_L = {t_1, ..., t_L}, subject to the boundary conditions

V(t_j, q, x, L − j + 1) = min_{σ_j ∈ Σ_j} { V(t_j, Γ(q, σ_j), ξ_{σ_j}(x), L − j) + c_{σ_j}(x) }    (33)

−∂V/∂t (t_j−, q, x, L − j + 1) ≡ l_q(x, u^o(t_j−, x)) + ⟨∇V, f_q(x, u^o(t_j−, x))⟩
= −∂V/∂t (t_j, Γ(q, σ_j), ξ_{σ_j}(x), L − j) ≡ l_{Γ(q,σ_j)}(ξ_{σ_j}(x), u^o(t_j, ξ_{σ_j}(x))) + ⟨∇V, f_{Γ(q,σ_j)}(ξ_{σ_j}(x), u^o(t_j, ξ_{σ_j}(x)))⟩    (34)

where, if (t_j, x) ∈ R × R^{n_{q_{j-1}}} belongs to a controlled switching set, then Σ_j = Σ subject to the automaton constraint that Γ(q, σ_j) is defined, and in the case of an autonomous switching, the set Σ_j is reduced to the subset of discrete inputs which are consistent with the switching manifold condition m_{q, Γ(q,σ_j)}(x_q) = 0. In the above equation, the notation u^o(t, x) indicates the (continuous valued) optimal input corresponding to x at the instant t. □
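As a numerical counterpart of Theorem 5.5, the following is a minimal sketch, not taken from the paper, of a semi-Lagrangian approximation of the HJB equation (31) for the scalar data of Example 1 in Section VII, with the switching boundary condition (33) applied at a fixed switching time. The state and control grids, the step size, and the choice t_s = 0.5 are illustrative assumptions; in the paper the switching time is itself an optimization variable.

```python
import numpy as np

# Minimal sketch (not from the paper): backward semi-Lagrangian marching of the
# HJB equation (31) for Example 1, with the jump/switching-cost boundary
# condition (33) at a fixed ts.

t0, tf, ts = 0.0, 1.0, 0.5
xs = np.linspace(-3.0, 3.0, 601)          # state grid
us = np.linspace(-4.0, 4.0, 161)          # control grid for the inf in (31)
dt = 1e-3

def f(q, x, u):  return x * (1.0 + u) if q == 1 else x * (-1.0 + u)
def l(x, u):     return 0.5 * u ** 2
def c(x):        return 1.0 / (1.0 + x ** 2)      # switching cost
def g(x):        return 0.5 * x ** 2              # terminal cost

def backward_sweep(V, q, t_end, t_start):
    """March V(t, q, .) backward from t_end to t_start on the grid."""
    t = t_end
    while t > t_start + 1e-12:
        # V(t-dt, x) = min_u [ dt*l(x,u) + V(t, x + dt*f_q(x,u)) ]
        cand = [dt * l(xs, u) + np.interp(xs + dt * f(q, xs, u), xs, V) for u in us]
        V = np.min(np.array(cand), axis=0)
        t -= dt
    return V

V2 = backward_sweep(g(xs), q=2, t_end=tf, t_start=ts)     # mode q2 on [ts, tf]
V1_ts = np.interp(-xs, xs, V2) + c(xs)                    # boundary cond. (33): jump x -> -x
V1 = backward_sweep(V1_ts, q=1, t_end=ts, t_start=t0)     # mode q1 on [t0, ts]
print("V(t0, q1, x0=0.5, 1) ≈", np.interp(0.5, xs, V1))
```

The grid gradient of V1 and V2 can afterwards be compared with the adjoint process of Section IV, which is the content of Theorem 6.2 below.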


VI. RELATIONSHIP BETWEEN THE HMP AND HDP

Theorem 6.1 (Evolution of the cost sensitivity along a general trajectory): Consider the hybrid system H together with Assumptions A0–A2 and the hybrid cost to go (28). Then, for a given hybrid feedback control I_{L-j+1}^{[t,t_f]}(t, q, x) with an admissible set of discontinuities, the sensitivity function ∇J ≡ ∂/∂x J(t, t_f, q, x, L − j + 1; I_{L-j+1}), 1 ≤ j < L + 1, satisfies

d/dt ∇J = −[ (∂f_q(x, u)/∂x)^T ∇J + ∂l_q(x, u)/∂x ]    (35)

subject to the terminal condition

∇J(t_f, q_L, x, 0; I_0) = ∇g(x)    (36)

and the boundary conditions

∇J(t_j−, q_{j-1}, x, L − j + 1; I_{L-j+1}) ≡ ∇J(t_j, q_{j-1}, x, L − j + 1; I_{L-j+1}) = ∇ξ_{σ_j}^T ∇J(t_j+, q_j, ξ_{σ_j}(x), L − j; I_{L-j}) + p ∇m|_x + ∇c|_x    (37)

with p = 0 when (t_j, x) ∈ R × R^{n_{q_{j-1}}} belongs to a controlled switching set, and

p = [ ∇J(t_j+, q_j, ξ_{σ_j}(x), L − j; I_{L-j})^T f_{q_j,ξ}^{q_{j-1}} + l_{q_j,ξ}^{q_{j-1}} ] / [ ∇m^T f_{q_{j-1}}(x, u(t_j−)) ]    (38)

when (t_j, x) ∈ R × R^{n_{q_{j-1}}} belongs to an autonomous switching set, where in the above equation f_{q_j,ξ}^{q_{j-1}} := f_{q_j}(ξ_{σ_j}(x_{q_{j-1}}(t_j−)), u_{q_j}(t_j)) − ∇ξ f_{q_{j-1}}(x_{q_{j-1}}(t_j−), u_{q_{j-1}}(t_j−)) and l_{q_j,ξ}^{q_{j-1}} := l_{q_j}(ξ_{σ_j}(x_{q_{j-1}}), u_{q_j}(t_j)) − l_{q_{j-1}}(x_{q_{j-1}}, u_{q_{j-1}}(t_j−)). □

Proof: We first prove that (35) holds for t ∈ (t_L, t_{L+1}] ≡ (t_L, t_f] with the terminal condition (36). Then, by assuming that (35) holds for t ∈ (t_j, t_{j+1}], j ≤ L, we show that it also holds for t ∈ (t_{j-1}, t_j] with the boundary condition (37), with p = 0 when t_j indicates the time of a controlled switching, and with p given by (38) when t_j ∈ τ_L indicates the time of an autonomous switching. Hence, by mathematical induction, the relation is proved for all t ∈ [t_0, t_f].

i) No switching ahead: First, consider a Lebesgue time t ∈ [t_L, t_{L+1}] ≡ [t_L, t_f] and the hybrid trajectory passing through (q_L, x), and consider the cost to go (28) for I_0, which is

J(t, q_L, x, 0; I_0) = ∫_{t}^{t_f} l_{q_L}(x_s, u_s) ds + g(x_f).    (39)

Since by Definition 9 the discontinuities in x of I_0^{[t,t_f]} ≡ u^{[t,t_f]} lie on lower dimensional sets, which are closed in the induced topology of the space, the partial derivative of J with respect to x exists in an open neighborhood of (t, x) and is derived as

∂J(t, q_L, x, 0; I_0)/∂x = ∂/∂x ∫_{t}^{t_f} l_{q_L}(x_s, u_s) ds + ∂/∂x g(x_f)    (40)

which is equivalent to

∂J(t, q_L, x, 0; I_0)/∂x = ∫_{t}^{t_f} (∂x_s/∂x)^T ∂l_{q_L}(x_s, u_s)/∂x_s ds + (∂x_f/∂x)^T ∂g(x_f)/∂x_f.    (41)

Taking t = t_f, the terminal condition for ∂J/∂x is seen to be

∂J(t_f, q_L, x, 0; I_0)/∂x = ∇_{x_f} g(x_f) ≡ ∇g(x)    (42)

because x_f = x when t = t_f. Hence, (36) is proved. With the notation x_s = φ_{q_L}(s, t, x) and with the smoothness provided by Assumptions A0–A2 for the given control input with an admissible set of discontinuities, we have

d/ds (∂x_s/∂x) = d/ds (∂/∂x φ_{q_L}(s, t, x)) = ∂/∂x (d/ds φ_{q_L}(s, t, x)) = ∂/∂x ( f_{q_L}(φ_{q_L}(s, t, x), u) )    (43)

from which we obtain

d/ds (∂x_s/∂x) = (∂f_{q_L}/∂x_s) (∂φ_{q_L}(s, t, x)/∂x)    (44)

with ∂φ_{q_L}(t, t, x)/∂x = I_{n_{q_L}×n_{q_L}}, since φ_{q_L}(t, t, x) = x. Let Φ^{q_L}_{s,t} ∈ R^{n_{q_L}×n_{q_L}} denote the solution of

Φ̇^{q_L}_{s,t} = ∇_{x_s} f_{q_L}(x_s, u_s)^T Φ^{q_L}_{s,t} ≡ (∂f_{q_L}(x_s, u_s)/∂x_s) Φ^{q_L}_{s,t}    (45)

with Φ^{q_L}_{t,t} = I_{n_{q_L}×n_{q_L}}. By the uniqueness of the solutions to (44) and (45),

∂/∂x φ_{q_L}(s, t, x) = Φ^{q_L}_{s,t}    (46)

for all x ∈ R^{n_{q_L}}. Also, by the semigroup property,

x = φ_{q_L}(s, t, x_t) = φ_{q_L}(s, t, φ_{q_L}(t, s, x)).    (47)

Hence, by taking the derivative with respect to x, we have

I_{n_{q_L}×n_{q_L}} = (∂φ_{q_L}(s, t, z)/∂z)|_{z = φ_{q_L}(t, s, x_s)} (∂φ_{q_L}(t, s, x)/∂x)    (48)

which by (46) is equivalent to

I_{n_{q_L}×n_{q_L}} = Φ^{q_L}_{s,t} Φ^{q_L}_{t,s}.    (49)

For all r, s, t ∈ (t_L, t_f], it is the case that

d/ds Φ^{q_L}_{s,r} = (∂f_{q_L}(x_s, u_s)/∂x_s) Φ^{q_L}_{s,r},  Φ^{q_L}_{r,r} = I_{n_{q_L}×n_{q_L}}    (50)

d/ds (Φ^{q_L}_{s,t} Φ^{q_L}_{t,r}) = (∂f_{q_L}(x_s, u_s)/∂x_s) Φ^{q_L}_{s,t} Φ^{q_L}_{t,r}    (51)

where for (51) at s = r the condition Φ^{q_L}_{r,t} Φ^{q_L}_{t,r} = I_{n_{q_L}×n_{q_L}} holds. Hence, from the uniqueness of the solutions to the ODEs (50) and (51), we obtain Φ^{q_L}_{s,t} Φ^{q_L}_{t,r} = Φ^{q_L}_{s,r}. Furthermore, (49) gives

0 = d/dt (Φ^{q_L}_{s,t} Φ^{q_L}_{t,s}) = (dΦ^{q_L}_{s,t}/dt) Φ^{q_L}_{t,s} + Φ^{q_L}_{s,t} (dΦ^{q_L}_{t,s}/dt)    (52)

and hence


dΦ^{q_L}_{s,t}/dt = −Φ^{q_L}_{s,t} (dΦ^{q_L}_{t,s}/dt)(Φ^{q_L}_{t,s})^{-1} = −Φ^{q_L}_{s,t} (∂f_{q_L}(x_t, u_t)/∂x_t) Φ^{q_L}_{t,s} (Φ^{q_L}_{t,s})^{-1} = −Φ^{q_L}_{s,t} (∂f_{q_L}(x_t, u_t)/∂x_t).    (53)

Differentiating (41) with respect to t along a trajectory (q_L, x) gives

d/dt ∂J(t, q_L, x, 0; I_0)/∂x
= d/dt ∫_{t}^{t_f} (∂φ_{q_L}(s,t,x)/∂x)^T ∇_z l_{q_L}(z, u_s)|_{z=φ_{q_L}(s,t,x)} ds + d/dt [ (∂φ_{q_L}(t_f,t,x)/∂x)^T ∇_z g(z)|_{z=φ_{q_L}(t_f,t,x)} ]
= −[ (∂φ_{q_L}(s,t,x)/∂x)^T ∇_z l_{q_L}(z, u_s)|_{z=φ_{q_L}(s,t,x)} ]_{s=t} + ∫_{t}^{t_f} [ (d/dt ∂φ_{q_L}(s,t,x)/∂x)^T ∇_z l_{q_L}(z, u_s)|_{z=φ_{q_L}(s,t,x)} + (∂φ_{q_L}(s,t,x)/∂x)^T (d/dt ∇_z l_{q_L}(z, u_s)|_{z=φ_{q_L}(s,t,x)}) ] ds + (d/dt ∂φ_{q_L}(t_f,t,x)/∂x)^T ∇_z g(z)|_{z=φ_{q_L}(t_f,t,x)} + (∂φ_{q_L}(t_f,t,x)/∂x)^T (d/dt ∇_z g(z)|_{z=φ_{q_L}(t_f,t,x)})
= −I_{n×n} ∂l_{q_L}(x_t, u_t)/∂x_t + ∫_{t}^{t_f} [ −(∂f_{q_L}(x_t, u_t)/∂x_t)^T (∂φ_{q_L}(s,t,x)/∂x)^T ∇_z l_{q_L}(z, u_s)|_{z=φ_{q_L}(s,t,x)} + 0 ] ds − (∂f_{q_L}(x_t, u_t)/∂x_t)^T (∂φ_{q_L}(t_f,t,x)/∂x)^T ∇_z g(z)|_{z=φ_{q_L}(t_f,t,x)} + 0    (54)

where the zero terms arise from

d/dt ∇_z l_{q_L}(z, u_s)|_{z=φ_{q_L}(s,t,x)} = d/dt ∇_{x_s} l_{q_L}(x_s, u_s) = 0    (55)

d/dt ∇_z g(z)|_{z=φ_{q_L}(t_f,t,x)} = d/dt ∇_{x_f} g(x_f) = 0.    (56)

Hence

d/dt ∂J(t, q_L, x, 0; I_0)/∂x = −∂l_{q_L}(x_t, u_t)/∂x_t − (∂f_{q_L}(x_t, u_t)/∂x_t)^T [ ∫_{t}^{t_f} (∂x_s/∂x)^T ∂l_{q_L}(x_s, u_s)/∂x_s ds + (∂x_f/∂x)^T ∂g(x_f)/∂x_f ]    (57)

which gives

d/dt ∂J(t, q_L, x, 0; I_0)/∂x = −(∂f_{q_L}(x, u)/∂x)^T ∂J(t, q_L, x, 0; I_0)/∂x − ∂l_{q_L}(x, u)/∂x    (58)

with

∂J(t_f, q_L, x, 0; I_0^{[t_f,t_f]})/∂x = ∇_{x_f} g(x_f) ≡ ∇_x g(x).    (59)

ii) A controlled switching ahead: Now assume that (35) holds for θ ∈ (t_j, t_{j+1}], j ≤ L, when t_j ∈ τ_L indicates the time of a controlled switching. Then, for t_{j-1} < t ≤ t_j < θ ≤ t_{j+1},

J(t, q_{j-1}, x, L − j + 1; I_{L-j+1}^{[t,t_f]}) = ∫_{t}^{t_j} l_{q_{j-1}}(x_s, u_s) ds + c_{σ_j}(x(t_j−)) + ∫_{t_j}^{θ} l_{q_j}(x_ω, u_ω) dω + J(θ, q_j, x_θ, L − j; I_{L-j}^{[θ,t_f]})    (60)

where

x_θ = ξ( x_t + ∫_{t}^{t_j−} f_{q_{j-1}}(x_s, u_s) ds ) + ∫_{t_j}^{θ} f_{q_j}(x_ω, u_ω) dω.    (61)

This gives

∂J(t, q_{j-1}, x, L − j + 1; I_{L-j+1})/∂x = ∂/∂x ∫_{t}^{t_j} l_{q_{j-1}}(x_s, u_s) ds + ∂c_{σ_j}(x(t_j−))/∂x + ∂/∂x ∫_{t_j}^{θ} l_{q_j}(x_ω, u_ω) dω + ∂J(θ, q_j, x_θ, L − j; I_{L-j})/∂x    (62)

which is equivalent to

∂J(t, q_{j-1}, x, L − j + 1; I_{L-j+1})/∂x = ∫_{t}^{t_j} (∂x_s/∂x)^T ∂l_{q_{j-1}}(x_s, u_s)/∂x_s ds + (∂x_{t_j−}/∂x)^T ∂c(x_{t_j−})/∂x_{t_j−} + ∫_{t_j}^{θ} (∂x_ω/∂x)^T ∂l_{q_j}(x_ω, u_ω)/∂x_ω dω + (∂x_θ/∂x)^T ∂J(θ, q_j, x_θ, L − j; I_{L-j})/∂x_θ    (63)

with the differentiation of (61) giving

∂x_θ/∂x = ∂/∂x ξ( x_t + ∫_{t}^{t_j} f_{q_{j-1}}(x_s, u_s) ds ) + ∫_{t_j}^{θ} ∂f_{q_j}(x_ω, u_ω)/∂x dω = ∂ξ(x_{t_j−})/∂x + ∫_{t_j}^{θ} ∂f_{q_j}(x_ω, u_ω)/∂x dω    (64)


from which we obtain

∂x_θ/∂x = ∫_{t_j}^{θ} (∂f_{q_j}(x_ω, u_ω)/∂x_ω)(∂x_ω/∂x) dω + (∂ξ(x_{t_j−})/∂x_{t_j−})(∂x_{t_j−}/∂x).    (65)

In particular, for x = x_t as t ↑ t_j and for x_θ as θ ↓ t_j, equation (63) becomes

∂J(t_j−, q_{j-1}, x_{t_j−}, L − j + 1; I_{L-j+1})/∂x_{t_j−} = ∫_{t_j−}^{t_j} (∂x_s/∂x)^T ∂l_{q_{j-1}}(x_s, u_s)/∂x_s ds + (∂x_{t_j−}/∂x_{t_j−})^T ∂c(x_{t_j−})/∂x_{t_j−} + ∫_{t_j}^{t_j+} (∂x_s/∂x)^T ∂l_{q_j}(x_s, u_s)/∂x_s ds + (∂x_{t_j+}/∂x_{t_j−})^T ∂J(t_j+, q_j, x_{t_j+}, L − j; I_{L-j})/∂x_{t_j+}    (66)

which is equivalent to

∂J(t_j−, q_{j-1}, x_{t_j−}, L − j + 1; I_{L-j+1})/∂x_{t_j−} = ∂c(x_{t_j−})/∂x_{t_j−} + (∂x_{t_j+}/∂x_{t_j−})^T ∂J(t_j+, q_j, x_{t_j+}, L − j; I_{L-j})/∂x_{t_j+}    (67)

and also (65) turns into

∂x_{t_j+}/∂x_{t_j−} = ∫_{t_j}^{t_j+} (∂f_{q_j}(x_ω, u_ω)/∂x_ω)(∂x_ω/∂x) dω + (∂ξ(x_{t_j−})/∂x_{t_j−})(∂x_{t_j−}/∂x_{t_j−})    (68)

which gives

∂x_{t_j+}/∂x_{t_j−} = ∂ξ(x_{t_j−})/∂x_{t_j−} = ∇ξ|_{x_{t_j−}}.    (69)

Hence

∂J(t_j−, q_{j-1}, x_{t_j−}, L − j + 1; I_{L-j+1}^{[t_j,t_f]})/∂x_{t_j−} = ∂c(x_{t_j−})/∂x_{t_j−} + ∇ξ|^T_{x_{t_j−}} ∂J(t_j+, q_j, x_{t_j+}, L − j; I_{L-j}^{[t_j+,t_f]})/∂x_{t_j+}.    (70)

Therefore, (37) is shown to hold with p = 0 for the controlled switching case. Writing

J(t, q_{j-1}, x, L − j + 1; I_{L-j+1}) = ∫_{t}^{t_j} l_{q_{j-1}}(x_s, u_s) ds + J(t_j−, q_{j-1}, x(t_j−), L − j + 1; I_{L-j+1})    (71)

and following a similar procedure as in part (i) of the proof, (35) is derived for t ∈ (t_{j-1}, t_j].

iii) An autonomous switching ahead: Now assume that (35) holds for all θ ∈ (t_j, t_{j+1}], j ≤ L, when t_j ∈ τ_L indicates the time of an autonomous switching. Then, taking the derivative of both sides of the equality (60) with respect to x at t ∈ (t_{j-1}, t_j], with t_{j-1} < t ≤ t_j < θ ≤ t_{j+1}, yields

∂J(t, q_{j-1}, x, L − j + 1; I_{L-j+1})/∂x = ∂/∂x ∫_{t}^{t_j} l_{q_{j-1}}(x_s, u_s) ds + ∂c(x(t_j−))/∂x + ∂/∂x ∫_{t_j}^{θ} l_{q_j}(x_ω, u_ω) dω + ∂J(θ, q_j, x_θ, L − j; I_{L-j})/∂x    (72)

which gives

∂J(t, q_{j-1}, x, L − j + 1; I_{L-j+1})/∂x = (∂t_j/∂x) l_{q_{j-1}}(x_s, u_s)|_{s=t_j−} + ∫_{t}^{t_j} (∂x_s/∂x)^T ∂l_{q_{j-1}}(x_s, u_s)/∂x_s ds + (∂x_{t_j−}/∂x)^T ∂c(x_{t_j−})/∂x_{t_j−} − (∂t_j/∂x) l_{q_j}(x_ω, u_ω)|_{ω=t_j+} + ∫_{t_j}^{θ} (∂x_ω/∂x)^T ∂l_{q_j}(x_ω, u_ω)/∂x_ω dω + (∂x_θ/∂x)^T ∂J(θ, q_j, x_θ, L − j; I_{L-j})/∂x_θ    (73)

with the derivative of (61) derived as

∂x_θ/∂x = ∂/∂x ξ( x_t + ∫_{t}^{t_j} f_{q_{j-1}}(x_s, u_s) ds ) − f_{q_j}(x_ω, u_ω)|_{ω=t_j+} (∂t_j/∂x_{t_j−}) + ∫_{t_j}^{θ} ∂f_{q_j}(x_ω, u_ω)/∂x dω
= (∂ξ(z)/∂z)|_{z=x_{t_j−}} ∂/∂x [ x_t + ∫_{t}^{t_j} f_{q_{j-1}}(x_s, u_s) ds ] − f_{q_j}(x_ω, u_ω)|_{ω=t_j+} (∂t_j/∂x_{t_j−}) + ∫_{t_j}^{θ} ∂f_{q_j}(x_ω, u_ω)/∂x dω    (74)

which gives

∂x_θ/∂x = −f_{q_j}(x_{t_j}, u_{t_j}) (∂t_j/∂x_{t_j−}) + ∫_{t_j}^{θ} ∂f_{q_j}(x_ω, u_ω)/∂x dω + ∇ξ|_{x_{t_j−}} [ I_{n×n} + f_{q_{j-1}}(x_{t_j−}, u_{t_j−}) (∂t_j/∂x) + ∫_{t}^{t_j} (∂f_{q_{j-1}}(x_s, u_s)/∂x_s)(∂x_s/∂x) ds ].    (75)

Note that in the above equations, the partial derivative ∂t_j/∂x_{t_j−} is not necessarily zero because, for δx_t ∈ R^n, the perturbed trajectory x_s + δx_s arrives on the switching manifold m at a different time t_j− = (t_j + δt_j)−, δt_j ∈ R. This requires an arbitrary modification of the input, which we take to be

I'_{L-j+1} = ((t_j + δt, σ_j), u')    (76)

with

u'_s = u_s for s ∈ [t, t_j),  u'_s = u(t_j−) for s ∈ [t_j, t_j + δt),  u'_s = u_s for s ∈ [t_j + δt, t_{j+1})    (77)


if δt ≥ 0, and

u'_s = u_s for s ∈ [t, t_j + δt),  u'_s = u(t_j) ≡ u(t_j+) for s ∈ [t_j + δt, t_j),  u'_s = u_s for s ∈ [t_j, t_{j+1})    (78)

if δt < 0. Since I'_{L-j+1} = I_{L-j+1} holds everywhere except only on [t_j, t_j + δt) (or [t_j + δt, t_j) if δt < 0), the measure of the set of modified controls is of the order |δt|. Evidently, the perturbed trajectory arrives on the switching manifold when

m(x_{t_j+δt_j−} + δx_{t_j+δt_j−}) = 0.    (79)

For δt ≥ 0, we may write

m( x_{t_j−} + δx_{t_j−} + ∫_{t_j}^{t_j+δt} f_{q_{j-1}}(x_s + δx_s, u_{t_j−}) ds ) = m(x_{t_j−}) = 0    (80)

which results in

∇m(x_{t_j−})^T [ δx_{t_j−} + f_{q_{j-1}}(x_{t_j−}, u_{t_j−}) δt ] + O(δt^2) = 0    (81)

or

δt = −∇m^T δx_{t_j−} / [ ∇m^T f_{q_{j-1}}(x_{t_j−}, u_{t_j−}) ] + O(δt^2).    (82)

Equations (81) and (82) are derived similarly for δt < 0. In particular, as t ↑ t_j and θ ↓ t_j, (73) becomes

∂J(t_j−, q_{j-1}, x_{t_j−}, L − j + 1; I_{L-j+1})/∂x_{t_j−} = (∂t_j/∂x_{t_j−}) l_{q_{j-1}}(x_{t_j−}, u_{t_j−}) + ∫_{t_j−}^{t_j} (∂x_s/∂x)^T ∂l_{q_{j-1}}(x_s, u_s)/∂x_s ds + (∂x_{t_j−}/∂x_{t_j−})^T ∂c(x_{t_j−})/∂x_{t_j−} − (∂t_j/∂x_{t_j−}) l_{q_j}(x_{t_j}, u_{t_j}) + ∫_{t_j}^{t_j+} (∂x_s/∂x)^T ∂l_{q_j}(x_s, u_s)/∂x_s ds + (∂x_{t_j+}/∂x_{t_j−})^T ∂J(t_j+, q_j, x_{t_j+}, L − j; I_{L-j})/∂x_{t_j+}    (83)

or

∂J(t_j−, q_{j-1}, x_{t_j−}, L − j + 1; I_{L-j+1})/∂x_{t_j−} = (∂t_j/∂x_{t_j−}) [ l_{q_{j-1}}(x_{t_j−}, u_{t_j−}) − l_{q_j}(x_{t_j}, u_{t_j}) ] + ∂c(x_{t_j−})/∂x_{t_j−} + (∂x_{t_j+}/∂x_{t_j−})^T ∂J(t_j+, q_j, x_{t_j+}, L − j; I_{L-j})/∂x_{t_j+}    (84)

and (75) turns into

∂x_{t_j+}/∂x_{t_j−} = ∇ξ|_{x_{t_j−}} − [ f_{q_j}(x_{t_j}, u_{t_j}) − ∇ξ|_{x_{t_j−}} f_{q_{j-1}}(x_{t_j−}, u_{t_j−}) ] (∂t_j/∂x_{t_j−}).    (85)

Therefore

∂J(t_j−, q_{j-1}, x_{t_j−}, L − j + 1; I_{L-j+1})/∂x_{t_j−} = −(∂t_j/∂x_{t_j−}) [ l_{q_j}(x_{t_j}, u_{t_j}) − l_{q_{j-1}}(x_{t_j−}, u_{t_j−}) + (f_{q_j} − ∇ξ f_{q_{j-1}})^T ∂J(t_j+, q_j, x_{t_j+}, L − j; I_{L-j})/∂x_{t_j+} ] + ∂c(x_{t_j−})/∂x_{t_j−} + ∇ξ^T ∂J(t_j+, q_j, x_{t_j+}, L − j; I_{L-j})/∂x_{t_j+}.    (86)

But in the limit as δx_{t_j−} ∈ R^n tends to zero, it is concluded from (82) that

∂t_j/∂x_{t_j−} = −∇m / [ ∇m^T f_{q_{j-1}}(x_{t_j−}, u_{t_j−}) ].    (87)

Therefore

∂J(t_j−, q_{j-1}, x_{t_j−}, L − j + 1; I_{L-j+1})/∂x_{t_j−} = ∇ξ^T ∂J(t_j+, q_j, x_{t_j+}, L − j; I_{L-j})/∂x_{t_j+} + ∇c + [ ( l_{q_j} − l_{q_{j-1}} + (f_{q_j} − ∇ξ f_{q_{j-1}})^T ∂J(t_j+, q_j, x_{t_j+}, L − j; I_{L-j})/∂x_{t_j+} ) / ( ∇m^T f_{q_{j-1}}(x_{t_j−}, u_{t_j−}) ) ] ∇m.    (88)

This proves (37) with

p = [ l_{q_j} − l_{q_{j-1}} + (f_{q_j} − ∇ξ f_{q_{j-1}})^T ∇J(t_j+, q_j, x_{t_j+}, L − j; I_{L-j}^{[t_j,t_f]}) ] / [ ∇m^T f_{q_{j-1}}(x_{t_j−}, u_{t_j−}) ]    (89)

which is the same equation for p as in (38). Taking account of (71), and following a similar procedure as in part (i) of the proof, (35) is derived for t ∈ (t_{j-1}, t_j] and, as shown above, it is subject to the terminal and boundary conditions (36) and (37), respectively. This completes the proof. □

Theorem 6.2: Consider the hybrid system H together with Assumptions A0–A2 and the HOCP (11) for the hybrid cost (10). If there exists an optimal hybrid input with an admissible set of discontinuities, then along each optimal trajectory the adjoint process λ in the HMP and the gradient of the value function ∇V in the corresponding HDP satisfy the same family of differential equations almost everywhere, i.e.,

d/dt ∇V = −(∂f_{q^o}(x^o, u^o)/∂x)^T ∇V − ∂l_{q^o}(x^o, u^o)/∂x    (90)

d/dt λ^o = −(∂f_{q^o}(x^o, u^o)/∂x)^T λ^o − ∂l_{q^o}(x^o, u^o)/∂x    (91)


and satisfy the same terminal and boundary conditions, i.e.,

∇V(t_f, q^o, x^o_{q_L}(t_f), 0) = ∇g(x^o_{q_L}(t_f))    (92)

∇V(t_j−, q^o_{j-1}, x^o_{q_{j-1}}(t_j−), L − j + 1) = ∇ξ|^T_{x^o_{q_{j-1}}(t_j−)} ∇V(t_j+, q^o_j, x^o_{q_j}(t_j+), L − j) + p ∇m|_{x^o_{q_{j-1}}(t_j−)} + ∇c|_{x^o_{q_{j-1}}(t_j−)}    (93)

for the gradient of the value function, and

λ^o(t_f) = ∇g(x^o_{q_L}(t_f))    (94)

λ^o(t_j−) = ∇ξ|^T_{x^o_{q_{j-1}}(t_j−)} λ^o(t_j+) + p ∇m|_{x^o_{q_{j-1}}(t_j−)} + ∇c|_{x^o_{q_{j-1}}(t_j−)}    (95)

for the adjoint process. Hence, the adjoint process and the gradient of the value function are equal almost everywhere, i.e.,

λ^o = ∇_x V    (96)

almost everywhere in the Lebesgue sense on ∪_{i=0}^{L}[t_i, t_{i+1}] × R^{n_{q_i}}. □

Proof: Equations (91), (94), and (95) are direct results of the HMP in Theorem 4.1, and (90), (92), and (93) hold for the optimal feedback control having an admissible set of discontinuities because (35)–(37) hold for all feedback controls with admissible sets of discontinuities, including u^o corresponding to x^o. Hence, from Theorem 2.1 and the resulting uniqueness of the solutions of (90) and (91), which are identical almost everywhere on t ∈ [t_0, t_f], it is concluded that (96) holds almost everywhere in the Lebesgue sense on ∪_{i=0}^{L}[t_i, t_{i+1}] × R^{n_{q_i}}. □

VII. EXAMPLES

Example 1: Consider a hybrid system with the indexed vector fields

ẋ = f_1(x, u) = x + x u    (97)

ẋ = f_2(x, u) = −x + x u    (98)

and the hybrid optimal control problem

J(t_0, t_f, h_0, 1; I_1) = ∫_{t_0}^{t_s} (1/2) u^2 dt + 1/(1 + [x(t_s−)]^2) + ∫_{t_s}^{t_f} (1/2) u^2 dt + (1/2) [x(t_f)]^2    (99)

subject to the initial condition h_0 = (q(t_0), x(t_0)) = (q_1, x_0) provided at the initial time t_0 = 0. At the controlled switching instant t_s, the boundary condition for the continuous state is provided by the jump map x(t_s) = ξ(x(t_s−)) = −x(t_s−).

The HMP formulation and results: Writing down the HMP results for the above HOCP, the Hamiltonians are formed as

H_{q_1} = (1/2) u^2 + λ x (u + 1)    (100)

H_{q_2} = (1/2) u^2 + λ x (u − 1)    (101)

from which the minimizing control input for both Hamiltonian functions is determined as

u^o = −λ x.    (102)

Therefore, the adjoint process dynamics, determined from (19) and with the replacement of the optimal control input from (102), are written as

λ̇ = −∂H_{q_1}/∂x = −λ (u^o + 1) = λ (λ x − 1),  t ∈ (t_0, t_s)    (103)

λ̇ = −∂H_{q_2}/∂x = −λ (u^o − 1) = λ (λ x + 1),  t ∈ (t_s, t_f)    (104)

which are subject to the terminal and boundary conditions

λ(t_f) = ∇g|_{x(t_f)} = x(t_f)    (105)

λ(t_s−) ≡ λ(t_s) = ∇ξ|_{x(t_s−)} λ(t_s+) + ∇c|_{x(t_s−)} = −λ(t_s+) − 2 x(t_s−)/(1 + [x(t_s−)]^2)^2.    (106)

The replacement of the optimal control input (102) in the continuous state dynamics (18) gives

ẋ = ∂H_{q_1}/∂λ = x (1 + u^o) = −x (λ x − 1),  t ∈ (t_0, t_s)    (107)

ẋ = ∂H_{q_2}/∂λ = x (−1 + u^o) = −x (λ x + 1),  t ∈ (t_s, t_f)    (108)

which are subject to the initial and boundary conditions

x(t_0) = x(0) = x_0    (109)

x(t_s) = ξ(x(t_s−)) = −x(t_s−).    (110)

The Hamiltonian continuity condition (25) states that

H_{q_1}(t_s−) = (1/2)[u^o(t_s−)]^2 + λ(t_s−) x(t_s−) [u^o(t_s−) + 1] = (1/2)[−λ(t_s−) x(t_s−)]^2 + λ(t_s−) x(t_s−) [−λ(t_s−) x(t_s−) + 1]
= H_{q_2}(t_s+) = (1/2)[u^o(t_s+)]^2 + λ(t_s+) x(t_s+) [u^o(t_s+) − 1] = (1/2)[−λ(t_s+) x(t_s+)]^2 + λ(t_s+) x(t_s+) [−λ(t_s+) x(t_s+) − 1]    (111)

which can be written, using (110), as

x(t_s−) [λ(t_s−) − λ(t_s+)] = (1/2) [x(t_s−)]^2 { [λ(t_s−)]^2 − [λ(t_s+)]^2 }.    (112)

The solution to the set of ODEs (103), (104), (107), and (108), together with the initial condition (109) expressed at t_0, the terminal condition (105) determined at t_f, and the boundary conditions (110) and (106) provided at t_s, which is not a priori fixed but determined by the Hamiltonian continuity condition (112), determines the optimal control input and its corresponding optimal trajectory that minimize the cost J(t_0, t_f, h_0, 1; I_1) over I_1, the family of hybrid inputs with one switching. Interested readers are referred to [19], in which further steps are taken in order to reduce the above boundary value ODE problem into a set of algebraic equations using the special forms of the differential equations under study.
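The boundary value problem just described can also be attacked numerically. The following is a minimal single-shooting sketch, not taken from the paper: the unknowns are the initial adjoint λ(t_0) and the switching time t_s, and the residuals enforce the terminal condition (105) and the Hamiltonian continuity condition (112). The initial state, the integration step, and the initial guess passed to the root finder are illustrative assumptions and may need tuning.

```python
import numpy as np
from scipy.optimize import fsolve

# Minimal sketch (not from the paper): single shooting for the two-point
# boundary value problem of Example 1, eqs. (103)-(112).

t0, tf, x0 = 0.0, 1.0, 0.5

def flow(x, lam, q, t_start, t_end, n=2000):
    """Integrate the state/adjoint ODEs (107)/(103) for q=1 or (108)/(104)
    for q=2, with the optimal control u = -lam*x from (102)."""
    dt = (t_end - t_start) / n
    for _ in range(n):
        u = -lam * x
        dx = x * (1.0 + u) if q == 1 else x * (-1.0 + u)
        dlam = -lam * (u + 1.0) if q == 1 else -lam * (u - 1.0)
        x, lam = x + dt * dx, lam + dt * dlam
    return x, lam

def residuals(unknowns):
    lam0, ts = unknowns
    x_m, lam_m = flow(x0, lam0, q=1, t_start=t0, t_end=ts)      # up to ts-
    x_p = -x_m                                                  # jump (110)
    lam_p = -lam_m - 2.0 * x_m / (1.0 + x_m ** 2) ** 2          # invert (106)
    x_f, lam_f = flow(x_p, lam_p, q=2, t_start=ts, t_end=tf)    # ts+ to tf
    r1 = lam_f - x_f                                            # terminal cond. (105)
    r2 = x_m * (lam_m - lam_p) - 0.5 * x_m**2 * (lam_m**2 - lam_p**2)  # (112)
    return [r1, r2]

lam0_opt, ts_opt = fsolve(residuals, [0.1, 0.5])
print("lambda(t0) ≈", lam0_opt, ", ts ≈", ts_opt)
```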


The HDP formulation and results: Theorem 5.5 states that the value function satisfies the HJB equation (31) almost everywhere. In particular,

−∂V(t, q_2, x, 0)/∂t = inf_u H_{q_2}(x, ∂V/∂x, u) = inf_u { l_{q_2}(x, u) + (∂V/∂x) f_{q_2}(x, u) } = inf_u { (1/2) u^2 + (∂V/∂x)[−x + x u] } = [ (1/2) u^2 + (∂V/∂x)(−x + x u) ]|_{u = −x ∂V/∂x} = −(1/2) x^2 (∂V/∂x)^2 − x (∂V/∂x)    (113)

and similarly

−∂V(t, q_1, x, 1)/∂t = −(1/2) x^2 (∂V/∂x)^2 + x (∂V/∂x)    (114)

with the boundary conditions

V(t_f, q_2, x, 0) = g(x(t_f)) = (1/2) x^2    (115)

for V(t, q_2, x, 0), together with

V(t_s, q_1, x, 1) = min_{σ ∈ {σ_{q_1 q_2}}} { V(t_s, q_2, −x, 0) + 1/(1 + x^2) }    (116)

and

−(1/2) x^2 (∂V_{q_1}/∂x)^2 + x (∂V_{q_1}/∂x) = −(1/2) (−x)^2 (∂V_{q_2}/∂x)^2 − (−x)(∂V_{q_2}/∂x)    (117)

which determine V(t, q_1, x, 1) and t_s.

The HMP–HDP relationship: In order to illustrate the results of Theorem 6.2, we first take the partial derivative of (113) with respect to x to write

∂/∂x [ ∂V/∂t − (1/2) x^2 (∂V/∂x)^2 − x (∂V/∂x) ] = 0    (118)

or

∂^2V/∂x∂t − x (∂V/∂x)^2 − x^2 (∂V/∂x)(∂^2V/∂x^2) − ∂V/∂x − x (∂^2V/∂x^2) = 0.    (119)

It can easily be verified that the set of states with twice differentiability of V(t, q_2, x, 0) is M^{(2)} = (t_s, t_f) × (R − {0}), which is open dense in R × R, and therefore

∂^2V/∂t∂x − x^2 (∂V/∂x)(∂^2V/∂x^2) − x (∂^2V/∂x^2) = x (∂V/∂x)^2 + ∂V/∂x.    (120)

But from the definition of the total derivative, we have

d/dt (∂V/∂x) = ∂^2V/∂t∂x + (∂^2V/∂x^2) f_q(x, u^o) = ∂^2V/∂t∂x + (∂^2V/∂x^2)(−x^2 ∂V/∂x − x) = ∂^2V/∂t∂x − x^2 (∂V/∂x)(∂^2V/∂x^2) − x (∂^2V/∂x^2).    (121)

Therefore, from (120) and (121), the governing equation for ∇V(t, q_2, x, 0) is derived as

d/dt (∂V/∂x) = x (∂V/∂x)^2 + ∂V/∂x = (∂V/∂x) [ x (∂V/∂x) + 1 ]    (122)

which is the same as the dynamics (104) for λ(t), t ∈ (t_s, t_f). Similarly, the differentiation of (114) results in

d/dt (∂V/∂x) = (∂V/∂x) [ x (∂V/∂x) − 1 ]    (123)

which is the same as the dynamics (103) for λ(t), t ∈ (t_0, t_s). The equality of the terminal conditions for ∇V(t_f, q_2, x, 0) and λ(t_f) becomes obvious by taking the gradient of (115), i.e.,

∂V(t_f, q_2, x, 0)/∂x = ∂g(x)/∂x = x    (124)

which is equivalent to (105). Moreover, the equality of the boundary conditions for ∇V(t_s, q_1, x, 1) and λ(t_s) can be illustrated by taking the gradient of (116) and writing

∂V(t_s, q_1, x, 1)/∂x = ∂/∂x [ V(t_s, q_2, −x, 0) + 1/(1 + x^2) ]    (125)

that gives

∂V(t_s, q_1, x, 1)/∂x = −∂V(t_s, q_2, y, 0)/∂y |_{y=−x} − 2x/(1 + x^2)^2    (126)

which is the same boundary condition as the boundary condition (106) for λ. Therefore, by the uniqueness of the results of the set of differential equations (122) and (123) for ∇V (or equivalently (104) and (103) for λ) with the terminal and boundary conditions (124) and (126) for ∇V (or equivalently (105) and (106) for λ), the gradient of the value function evaluated along every optimal trajectory is equal to the adjoint process corresponding to the same trajectory. Interested readers are referred to [19] for further discussion on this example. □
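The identity λ = ∇V can also be checked numerically. The following is a minimal sketch, not taken from the paper, restricted to the final mode q_2 of Example 1 on [t_s, t_f], where no switchings remain: the value V(t_s, q_2, x, 0) is approximated by direct optimization over a piecewise-constant control, its gradient by a central difference, and the adjoint λ(t_s) by backward integration of (104) with the terminal condition (105) along the computed optimal trajectory. The horizon split, grid size, test point, and difference step are illustrative assumptions; the two printed numbers should agree to a few digits up to discretization and optimization error.

```python
import numpy as np
from scipy.optimize import minimize

# Minimal sketch (not from the paper): numerical comparison of the adjoint
# process and the finite-difference gradient of the value function, for the
# no-switching tail of Example 1 (mode q2 on [ts, tf]).

ts, tf, N = 0.5, 1.0, 100
dt = (tf - ts) / N

def rollout(x0, u_seq):
    """Integrate dx/dt = x(u - 1), accumulate 0.5*u^2 dt, add terminal cost."""
    x, J, path = x0, 0.0, [x0]
    for u in u_seq:
        J += dt * 0.5 * u ** 2
        x += dt * x * (u - 1.0)
        path.append(x)
    return J + 0.5 * x ** 2, np.array(path)

def value(x0):
    res = minimize(lambda u: rollout(x0, u)[0], np.zeros(N), method="BFGS")
    return res.fun, res.x

def adjoint_at_ts(x0):
    _, u_opt = value(x0)
    _, path = rollout(x0, u_opt)
    lam = path[-1]                              # lambda(tf) = x(tf), eq. (105)
    for k in reversed(range(N)):                # backward marching of eq. (104)
        lam += dt * lam * (u_opt[k] - 1.0)      # lambda-dot = -lam*(u - 1)
    return lam

x_test, eps = 0.8, 1e-2
dV = (value(x_test + eps)[0] - value(x_test - eps)[0]) / (2 * eps)
print("finite-difference dV/dx ≈", dV)
print("adjoint lambda(ts)      ≈", adjoint_at_ts(x_test))
```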

Example 2: Consider the hybrid system with the indexed vector fields

ẋ = [ẋ_1, ẋ_2]^T = f_1(x, u) = [x_2, −x_1 + u]^T    (127)

ẋ = [ẋ_1, ẋ_2]^T = f_2(x, u) = [x_2, u]^T    (128)

where autonomous switchings occur on the switching manifold described by

m(x_1(t_s−), x_2(t_s−)) ≡ x_2(t_s−) = 0    (129)


with the continuity of the trajectories at the switching instant. Consider the hybrid optimal control problem defined as the minimization of the total cost functional

J = ∫_{t_0}^{t_f} (1/2) u^2 dt + (1/2) (x_1(t_s−))^2 + (1/2) (x_2(t_f) − v_{ref})^2.    (130)

The HMP formulation and results: Employing the HMP, the corresponding Hamiltonians are defined as

H_1 = λ_1 x_2 + λ_2 (−x_1 + u) + (1/2) u^2    (131)

H_2 = λ_1 x_2 + λ_2 u + (1/2) u^2.    (132)

The Hamiltonian minimization with respect to u [see (24)] gives

u^o = −λ_2    (133)

for both q = 1 and q = 2. Therefore, the state dynamics (18) and the adjoint process dynamics (19) for q = 1 are derived as

ẋ_1 = ∂H_1/∂λ_1 = x_2    (134)

ẋ_2 = ∂H_1/∂λ_2 = −x_1 + u^o = −x_1 − λ_2    (135)

λ̇_1 = −∂H_1/∂x_1 = λ_2    (136)

λ̇_2 = −∂H_1/∂x_2 = −λ_1    (137)

and for q = 2 are derived as

ẋ_1 = ∂H_2/∂λ_1 = x_2    (138)

ẋ_2 = ∂H_2/∂λ_2 = u^o = −λ_2    (139)

λ̇_1 = −∂H_2/∂x_1 = 0    (140)

λ̇_2 = −∂H_2/∂x_2 = −λ_1.    (141)

At the initial time t = t_0, the continuous valued states are specified by the initial conditions

x_1(t_0) = x_{10}    (142)

x_2(t_0) = x_{20}.    (143)

At the switching instant t = t_s, the boundary conditions for the states and adjoint processes are determined as

x_1(t_s) = x_1(t_s−) ≡ lim_{t↑t_s} x_1(t)    (144)

x_2(t_s) = x_2(t_s−) = 0    (145)

λ_1(t_s) = λ_1(t_s+) + ∂c/∂x_1 + p ∂m/∂x_1 = λ_1(t_s+) + x_1(t_s)    (146)

λ_2(t_s) = λ_2(t_s+) + ∂c/∂x_2 + p ∂m/∂x_2 = λ_2(t_s+) + p.    (147)

And at the terminal time t = t_f, the adjoint processes are determined by (22) as

λ_1(t_f) = ∂g/∂x_1 = 0    (148)

λ_2(t_f) = ∂g/∂x_2 = x_2(t_f) − v_{ref}.    (149)

Note that unlike t_0 and t_f, which are a priori determined, t_s is not fixed and needs to be determined by the Hamiltonian continuity condition (25) as

H_1(t_s−) = λ_1(t_s−) x_2(t_s−) − λ_2(t_s−) x_1(t_s−) − (1/2) λ_2(t_s−)^2 = −λ_2(t_s) x_1(t_s−) − (1/2) λ_2(t_s)^2 = H_2(t_s+) = λ_1(t_s+) x_2(t_s+) − (1/2) λ_2(t_s+)^2 = −(1/2) λ_2(t_s+)^2    (150)

i.e.,

λ_2(t_s) x_1(t_s−) + (1/2) λ_2(t_s)^2 = (1/2) λ_2(t_s+)^2    (151)

which, with the substitution of (147), becomes

(λ_2(t_s+) + p) x_1(t_s−) + (1/2) (λ_2(t_s+) + p)^2 = (1/2) λ_2(t_s+)^2.    (152)

The set of ODEs (134) to (141), together with the initial conditions (142) and (143) expressed at t_0, the boundary conditions (144)–(147) provided at t_s, and the terminal conditions (148) and (149) determined at t_f, with the two unknowns t_s and p determined by the Hamiltonian continuity condition (152) and the switching manifold condition (129), form an ODE boundary value problem whose solution results in the determination of the optimal control input and its corresponding optimal trajectory that minimize the cost J(t_0, t_f, h_0, 1; I_1) over I_1, the family of hybrid inputs with one switching on the switching manifold (129). Interested readers are referred to [21] for further steps taken in order to reduce the above boundary value ODE problem into a set of algebraic equations using the special forms of the differential equations under study.
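As with Example 1, this boundary value problem admits a direct numerical treatment. The following is a minimal single-shooting sketch, not taken from the paper: the unknowns are the initial adjoint (λ_1(t_0), λ_2(t_0)), the switching time t_s, and the multiplier p, and the residuals enforce the switching manifold condition (129), the terminal conditions (148)–(149), and the Hamiltonian continuity condition (151). The initial state, v_ref, horizon, step count, and initial guess are illustrative assumptions and may require tuning for convergence.

```python
import numpy as np
from scipy.optimize import fsolve

# Minimal sketch (not from the paper): single shooting for the boundary value
# problem of Example 2, eqs. (134)-(152).

t0, tf, vref = 0.0, 2.0, 1.0
x10, x20 = 1.0, -0.5

def flow(z, q, t_start, t_end, n=2000):
    """Integrate the state/adjoint ODEs with u = -lambda2, eq. (133)."""
    x1, x2, l1, l2 = z
    dt = (t_end - t_start) / n
    for _ in range(n):
        if q == 1:                                  # eqs. (134)-(137)
            dx1, dx2, dl1, dl2 = x2, -x1 - l2, l2, -l1
        else:                                       # eqs. (138)-(141)
            dx1, dx2, dl1, dl2 = x2, -l2, 0.0, -l1
        x1, x2, l1, l2 = x1 + dt*dx1, x2 + dt*dx2, l1 + dt*dl1, l2 + dt*dl2
    return np.array([x1, x2, l1, l2])

def residuals(unknowns):
    l10, l20, ts, p = unknowns
    x1m, x2m, l1m, l2m = flow(np.array([x10, x20, l10, l20]), 1, t0, ts)
    l1p = l1m - x1m                                 # invert (146): dc/dx1 = x1(ts)
    l2p = l2m - p                                   # invert (147)
    x1f, x2f, l1f, l2f = flow(np.array([x1m, x2m, l1p, l2p]), 2, ts, tf)
    return [x2m,                                    # switching manifold (129)
            l1f,                                    # terminal condition (148)
            l2f - (x2f - vref),                     # terminal condition (149)
            l2m * x1m + 0.5*l2m**2 - 0.5*l2p**2]    # Hamiltonian continuity (151)

sol = fsolve(residuals, [0.0, -0.5, 0.8, 0.1])
print("lambda(t0) ≈", sol[:2], ", ts ≈", sol[2], ", p ≈", sol[3])
```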

The HDP formulation and results: For the linear differential equations (127) and (128), the Hamiltonians for the HJB equation are formed as

H_1(x, ∇V, u) = (1/2) u^2 + (∂V/∂x_1) x_2 + (∂V/∂x_2)(−x_1 + u)    (153)

H_2(x, ∇V, u) = (1/2) u^2 + (∂V/∂x_1) x_2 + (∂V/∂x_2) u    (154)

which have the minimizing control input

u^o = −∂V/∂x_2.    (155)

Therefore, the HJB equations are expressed as

−∂V(t, q_2, x, 0)/∂t = −(1/2)(∂V/∂x_2)^2 + x_2 (∂V/∂x_1)    (156)

−∂V(t, q_1, x, 1)/∂t = −(1/2)(∂V/∂x_2)^2 + x_2 (∂V/∂x_1) − x_1 (∂V/∂x_2).    (157)


The terminal condition at t = t_f is specified as

V(t_f, q_2, x, 0) = (1/2)(x_2 − v_{ref})^2    (158)

for V(t, q_2, x, 0), and the boundary condition for V(t, q_1, x, 1) and the switching instant t = t_s are determined by

V(t_s, q_1, x, 1) = V(t_s, q_2, x, 0) + (1/2) x_1^2    (159)

and

−(1/2)(∂V_{q_1}/∂x_2)^2 + x_2 (∂V_{q_1}/∂x_1) − x_1 (∂V_{q_1}/∂x_2) = −(1/2)(∂V_{q_2}/∂x_2)^2 + x_2 (∂V_{q_2}/∂x_1)    (160)

subject to the switching manifold condition (129).

The HMP–HDP relationship: Similar to Example 1, in order to illustrate the result in Theorem 6.2, we shall take partial derivatives of (156) and (157) with respect to x. We note that by the definition of the total derivative

d/dt ∇V(t, q_i, x, 2 − i) = ∂/∂x (∂V/∂t) + (∂^2V/∂x^2) f_{q_i}(x, −∂V/∂x_2)    (161)

which for ∇V(t, q_2, x, 0) is equivalent to

d/dt [∂V(t,q_2,x,0)/∂x_1 ; ∂V(t,q_2,x,0)/∂x_2] = [∂^2V/∂x_1∂t ; ∂^2V/∂x_2∂t] + [∂^2V/∂x_1^2, ∂^2V/∂x_1∂x_2 ; ∂^2V/∂x_2∂x_1, ∂^2V/∂x_2^2] [x_2 ; −∂V/∂x_2]
= [∂^2V/∂x_1∂t + x_2 ∂^2V/∂x_1^2 − (∂^2V/∂x_1∂x_2)(∂V/∂x_2) ; ∂^2V/∂x_2∂t + x_2 ∂^2V/∂x_2∂x_1 − (∂^2V/∂x_2^2)(∂V/∂x_2)]    (162)

and for ∇V(t, q_1, x, 1) is

d/dt [∂V(t,q_1,x,1)/∂x_1 ; ∂V(t,q_1,x,1)/∂x_2] = [∂^2V/∂x_1∂t ; ∂^2V/∂x_2∂t] + [∂^2V/∂x_1^2, ∂^2V/∂x_1∂x_2 ; ∂^2V/∂x_2∂x_1, ∂^2V/∂x_2^2] [x_2 ; −x_1 − ∂V/∂x_2].    (163)

Taking the partial derivative of (156) with respect to x and making a substitution using the resulting equation in (162) yields

d/dt (∂V(t, q_2, x, 0)/∂x_1) = 0    (164)

d/dt (∂V(t, q_2, x, 0)/∂x_2) = −∂V(t, q_2, x, 0)/∂x_1    (165)

which are equivalent to the differential equations (140) and (141) for λ(t), t ∈ (t_s, t_f]. Similarly, taking the partial derivative of (157) with respect to x and making a substitution in (163) gives

d/dt (∂V(t, q_1, x, 1)/∂x_1) = ∂V(t, q_1, x, 1)/∂x_2    (166)

d/dt (∂V(t, q_1, x, 1)/∂x_2) = −∂V(t, q_1, x, 1)/∂x_1    (167)

which are equivalent to the differential equations (136) and (137) for λ(t), t ∈ (t_0, t_s]. Moreover, it can easily be verified that the optimal sensitivity process ∇V satisfies the terminal condition

∇V(t_f, q_2, x, 0) = [∂V(t_f,q_2,x,0)/∂x_1 ; ∂V(t_f,q_2,x,0)/∂x_2] = [0 ; x_2(t_f) − v_{ref}]    (168)

and the boundary condition

[∂V(t_s,q_1,x,1)/∂x_1 ; ∂V(t_s,q_1,x,1)/∂x_2] = [∂V(t_s,q_2,x,0)/∂x_1 ; ∂V(t_s,q_2,x,0)/∂x_2] + [∂c(x(t_s−))/∂x_1 ; ∂c(x(t_s−))/∂x_2] + p [∂m(x(t_s−))/∂x_1 ; ∂m(x(t_s−))/∂x_2] = [∂V(t_s,q_2,x,0)/∂x_1 + x_1(t_s−) ; ∂V(t_s,q_2,x,0)/∂x_2 + p]    (169)

subject to x_2(t_s−) = 0. Therefore, by the uniqueness of the results of the set of governing differential equations for ∇V and λ, which are subject to the same terminal and boundary conditions, along any optimal trajectory the gradient of the value function is equal to the adjoint process corresponding to the same optimal trajectory. Interested readers are referred to [21] for further discussion on this example. □

VIII. CONCLUDING REMARKS

In this paper, it is proved in the context of deterministic hybrid optimal control theory that the adjoint process in the HMP and the gradient of the value function in HDP are identical to each other almost everywhere along optimal trajectories; this is because they both satisfy the same Hamiltonian equations with the same boundary conditions. Since a similar adjoint process–gradient process relationship holds for continuous parameter (i.e., nonhybrid) stochastic optimal control problems via the so-called Feynman–Kac formula (see, e.g., [6]), it is natural to expect the adjoint process in the stochastic HMP [61] and the gradient of the value function in stochastic HDP to be identical almost everywhere. Indeed, the formulation of stochastic HDP and the investigation of its relationship to the stochastic HMP is the subject of another study to be presented in a consecutive paper.

APPENDIX
A. Proof of Theorem 5.1

For simplicity of notation, we use x, x̂, u, etc. instead of x_q, x̂_q, u_q, etc. whenever the spaces R^{n_q}, R × R^{n_q}, U_q, etc., can easily be distinguished. For a given hybrid control input I_{L-j+1} = (S_{L-j+1}, u), we use x̂_τ ≡ x̂(τ; t, x̂_t) to denote the extended continuous valued state as in (12) at the instant τ passing through x̂_t, where t ≤ τ ≤ t_f. We also define

K_1 = sup { ‖f̂_q(x̂, u)‖ : (q, x̂, u) ∈ Q × B̂_r × U }    (170)

where B̂_r := {x̂ = [z, x^T]^T : |z|^2 + ‖x‖^2 < r^2}.


Proof: First, consider the stage where no remaining switching is available and hence t ∈ (t_L, t_{L+1}) = (t_L, t_f). In this case

x̂(t_f; t, x̂_t) = x̂_t + ∫_{t}^{t_f} f̂_{q_L}(x̂_τ, u_τ) dτ    (171)

which gives

‖x̂(t_f; t, x̂_t) − x̂_t‖ ≤ K_1 |t_f − t| + ∫_{t}^{t_f} K̂_f ‖x̂(τ; t, x̂_t) − x̂_t‖ dτ    (172)

where K̂_f depends only on K_f and K_l, which are defined in Assumptions A0 and A2, respectively. By the Gronwall–Bellman inequality, this results in

‖x̂(t_f; t, x̂_t) − x̂_t‖ ≤ K_1 |t_f − t| + ∫_{t}^{t_f} K̂_f K_1 (τ − t) e^{K̂_f (t_f − τ)} dτ ≤ K_2 |t_f − t| ≤ K_2 |t_f − t_L|    (173)

where K_2 = max{K_1, K̂_f K_1 (t_f − t_L) e^{K̂_f (t_f − t_L)}}. Hence, by the semigroup properties of ODE solutions and by use of (171), for s ≥ t and x̂_s ∈ N_{r_x̂}(x̂_t) we have

‖x̂(t_f; t, x̂_t) − x̂(t_f; s, x̂_s)‖ ≤ ‖x̂_t − x̂_s‖ + ‖x̂(s; t, x̂_t) − x̂_t‖ + ∫_{s}^{t_f} K̂_f ‖x̂(τ; t, x̂_t) − x̂(τ; s, x̂_s)‖ dτ ≤ ‖x̂_t − x̂_s‖ + K_2 |s − t| + ∫_{s}^{t_f} K̂_f ‖x̂(τ; t, x̂_t) − x̂(τ; s, x̂_s)‖ dτ    (174)

and therefore, by the Gronwall inequality, we have

‖x̂(t_f; t, x̂_t) − x̂(t_f; s, x̂_s)‖ ≤ (‖x̂_t − x̂_s‖ + K_2 |s − t|) e^{K̂_f (t_f − s)} ≤ (‖x̂_t − x̂_s‖ + K_2 |s − t|) e^{K̂_f (t_f − t_L)} ≤ K (‖x̂_t − x̂_s‖^2 + |s − t|^2)^{1/2}    (175)

for some K < ∞ which depends only on t_f − t_L, K_1, and K̂_f, and not on the control input. Since ĝ is Lipschitz in x̂ and x̂(t_f; t, x̂_t) is Lipschitz in (t, x̂_t) ≡ (t, [z_t, x_t^T]^T), the performance function

J(t, t_f, q, x, 0; I_0) = ĝ(x̂(t_f; t, x̂_t)) ≡ ∫_{t}^{t_f} l_q(x, u) ds + g(x_{q_L}(t_f))    (176)

is Lipschitz in x ∈ B_r (and x̂ ∈ B̂_r) uniformly in t ∈ (t_L, t_f), with a Lipschitz constant independent of the control. Furthermore, since the infimum of a family of Lipschitz functions with a common Lipschitz constant is also Lipschitz with the same Lipschitz constant, the value function V(t, t_f, q, x, 0) with no switchings remaining is Lipschitz in x ∈ B_r uniformly in t ∈ (t_L, t_f).

Now, consider t, s ∈ (t_j, t_{j+1}), where t_{j+1} indicates a time of an autonomous switching for the trajectory x̂(τ; t, x̂_t), and consider for definiteness the case where x̂(τ; s, x̂_s) arrives on the switching manifold described locally by m(x) = 0 at a later time t_{j+1} + δt (the case with an earlier arrival time can be handled similarly by considering δt < 0). It directly follows, by replacing f̂_{q_L} and t_f by f̂_{q_j} and t_{j+1}− in the above arguments, that

‖x̂(t_{j+1}−; t, x̂_t) − x̂(t_{j+1}−; s, x̂_s)‖ ≤ K' (‖x̂_t − x̂_s‖^2 + |s − t|^2)^{1/2}.    (177)

Now, since

‖x̂(t_{j+1}+δt−; s, x̂_s) − x̂(t_{j+1}−; s, x̂_s)‖ ≤ K_2 |t_{j+1} + δt − t_{j+1}| = K_2 |δt|    (178)

and

‖x̂(t_{j+1}+δt−; s, x̂_s) − x̂(t_{j+1}−; t, x̂_t)‖^2 ≤ ‖x̂(t_{j+1}+δt−; s, x̂_s) − x̂(t_{j+1}−; s, x̂_s)‖^2 + ‖x̂(t_{j+1}−; t, x̂_t) − x̂(t_{j+1}−; s, x̂_s)‖^2    (179)

it is sufficient to show that the upper bound for |δt| is proportional to (‖x̂_t − x̂_s‖^2 + |s − t|^2)^{1/2}. This can be shown to hold by considering the fact that

m(x(t_{j+1}+δt−; s, x_s)) = m( x(t_{j+1}−; s, x_s) + ∫_{t_{j+1}}^{t_{j+1}+δt} f_{q_j}(x(τ; s, x_s), u_{t_{j+1}−}) dτ ) = m( x(t_{j+1}−; t, x_t) + δx(t_{j+1}−) + ∫_{t_{j+1}}^{t_{j+1}+δt} f_{q_j}(x(τ; s, x_s), u_{t_{j+1}−}) dτ ) = m(x(t_{j+1}−; t, x_t)) = 0.    (180)

For ‖δx(t_{j+1}−)‖ < ε_{j+1} sufficiently small,

∇m^T ( δx_{t_{j+1}−} + ∫_{t_{j+1}}^{t_{j+1}+δt} f_{q_j}(x(τ; s, x_s), u_{t_{j+1}−}) dτ ) + O(ε_{j+1}^2) = 0    (181)

which is equivalent to

∇m^T δx(t_{j+1}−) + ∫_{t_{j+1}}^{t_{j+1}+δt} ∇m^T f_{q_j}(x(τ; s, x_s), u_{t_{j+1}−}) dτ + O(ε_{j+1}^2) = 0.    (182)

Due to the transversal arrival of the trajectories with respect to the smooth switching manifold, |∇m^T f_{q_j}| is lower bounded by a strictly positive number k_{m,f} [see (2)], and hence

| ∇m^T δx(t_{j+1}−) + O(ε_{j+1}^2) | = | ∫_{t_{j+1}}^{t_{j+1}+δt} ∇m^T f_{q_j}(x(τ; s, x_s), u_{t_{j+1}−}) dτ | ≥ | ∫_{t_{j+1}}^{t_{j+1}+δt} |∇m^T f_{q_j}(x(τ; s, x_s), u_{t_{j+1}−})| dτ | ≥ k_{m,f} |δt|    (183)

PAKNIYAT AND CAINES: ON THE RELATION BETWEEN THE MP AND DP FOR CLASSICAL AND HYBRID CONTROL SYSTEMS

which gives |δt| ≤ ≤

1 km ,f

1 km ,f





∇m δx (tj +1 −) + O 2j +1 

∇m j +1 + j +1

 

∇m

≤ + 1 j +1 = Kj +1 j +1 . km ,f (184)

Hence, for t ∈ (tj , tj +1 ) and xt ∈ Br , there exist a neighborhood Nr x (xt ) such that for s ∈ (tj , tj +1 ) and xs ∈ Nr x (xt ), 1 we have δx(tj +1 −) ≤ K  ( ˆ xt − x ˆs 2 + |s − t|2 ) 2 < j +1 in order to ensure that δt ≤ Kj +1 j +1 , and consequently

ˆ x (tj +1 + δt−; s, x ˆs ) − x ˆ (tj +1 −; t, x ˆt )

 12  ˆs 2 + |s − t|2 ≤ K ˆ xt − x

(185)

for K independent of the control. Since ξˆ is smooth and time invariant, it is therefore Lipschitz in x ˆ uniformly in time. Also since c(x(tJ +1 )) is embedded in ξˆ [see (15)], at the switching time tj +1 , we have J (tj +1 −, qj , x ˆ, L − j; IL −j )  x) , L − j − 1; IL −j −1 ) = J tj +1 , qj +1 , ξˆ (ˆ

(186)
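The two quantitative ingredients of the proof, the Gronwall-type continuity estimate (175) and the transversality-based arrival-time bound (184), can be illustrated numerically. The sketch below uses an assumed smooth planar vector field, a frozen control value, Euler integration, and the switching manifold m(x) = x2 = 0; all of these choices are hypothetical and serve only to make the scaling of the estimates concrete.

```python
# Minimal numerical sketch (hypothetical data) of the estimates used in the proof:
# (i) continuity of the flow with respect to the initial pair (t, x_t), as in (175);
# (ii) the perturbation |dt| of the arrival time on m(x) = x_2 = 0, as in (184).
import numpy as np

def f(x, u=0.5):
    # assumed smooth, Lipschitz vector field (not the system of the paper)
    return np.array([x[1], -0.5 * x[0] + u])

def flow(x0, t0, t1, h=1e-4):
    """Euler integration of dx/dt = f(x, u) from time t0 to time t1."""
    x, t = np.array(x0, dtype=float), t0
    while t < t1 - 1e-12:
        x, t = x + h * f(x), t + h
    return x

def arrival_on_manifold(x0, t0=0.0, h=1e-4, t_max=10.0):
    """First time after t0 at which the trajectory crosses m(x) = x_2 = 0."""
    x, t = np.array(x0, dtype=float), t0
    while t < t_max:
        x_next = x + h * f(x)
        if x[1] * x_next[1] <= 0.0:          # sign change: transversal crossing detected
            return t + h, x_next
        x, t = x_next, t + h
    raise RuntimeError("no crossing detected before t_max")

# (i) continuity of the flow in (t, x_t)
t, s, tf = 0.00, 0.02, 1.5
x_t, x_s = np.array([1.0, 1.0]), np.array([1.01, 0.99])
lhs = np.linalg.norm(flow(x_t, t, tf) - flow(x_s, s, tf))
rhs = np.sqrt(np.linalg.norm(x_t - x_s) ** 2 + (s - t) ** 2)
print("flow continuity ratio K:", lhs / rhs)

# (ii) arrival-time perturbation on m(x) = x_2 = 0
x_nom = np.array([0.0, 1.0])                 # crosses x_2 = 0 transversally
x_pert = x_nom + np.array([0.01, 0.005])
t_nom, _ = arrival_on_manifold(x_nom)
t_pert, _ = arrival_on_manifold(x_pert)
print("|delta t| =", abs(t_pert - t_nom), "  ||delta x0|| =", np.linalg.norm(x_pert - x_nom))
```

The printed ratios should remain bounded as the perturbations shrink, which is the qualitative content of (175) and (184).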

REFERENCES

[1] L. Pontryagin, V. Boltyanskii, R. Gamkrelidze, and E. Mishchenko, The Mathematical Theory of Optimal Processes, vol. 4. New York, NY, USA: Wiley, 1962.
[2] R. Bellman, “Dynamic programming,” Science, vol. 153, no. 3731, pp. 34–37, 1966.
[3] H. Sussmann and J. Willems, “300 years of optimal control: From the brachystochrone to the maximum principle,” IEEE Control Syst., vol. 17, no. 3, pp. 32–44, Jun. 1997.
[4] V. Jurdjevic, Optimal Control and Geometry: Integrable Systems. Cambridge, U.K.: Cambridge Univ. Press, 2016.
[5] D. Jacobson and D. Mayne, Differential Dynamic Programming. New York, NY, USA: American Elsevier, 1970.
[6] J. Yong and X. Y. Zhou, Stochastic Controls: Hamiltonian Systems and HJB Equations. New York, NY, USA: Springer-Verlag, 1999.
[7] D. Liberzon, Calculus of Variations and Optimal Control Theory: A Concise Introduction. Princeton, NJ, USA: Princeton Univ. Press, 2011.
[8] W. Fleming and R. Rishel, Deterministic and Stochastic Optimal Control. New York, NY, USA: Springer, 1975.
[9] S. E. Dreyfus, “Dynamic programming and the calculus of variations,” J. Math. Anal. Appl., vol. 1, no. 2, pp. 228–239, 1960.
[10] F. H. Clarke and R. B. Vinter, “The relationship between the maximum principle and dynamic programming,” SIAM J. Control Optim., vol. 25, no. 5, pp. 1291–1311, 1987.


[11] R. B. Vinter, “New results on the relationship between dynamic programming and the maximum principle,” J. Math. Control, Signals, Syst., vol. 1, no. 1, pp. 97–105, 1988.
[12] K. E. Kim, “Relationship between dynamic programming and the maximum principle under state constraints,” J. Convex Anal., vol. 6, no. 2, pp. 335–348, 1999.
[13] P. Cannarsa and H. Frankowska, “Some characterizations of optimal trajectories in control theory,” SIAM J. Control Optim., vol. 29, no. 6, pp. 1322–1347, 1991.
[14] A. Cernea and H. Frankowska, “A connection between the maximum principle and dynamic programming for constrained control problems,” SIAM J. Control Optim., vol. 44, no. 2, pp. 673–703, 2005.
[15] X. Y. Zhou, “Maximum principle, dynamic programming, and their connection in deterministic control,” J. Optim. Theory Appl., vol. 65, no. 2, pp. 363–373, 1990.
[16] J. Speyer and D. Jacobson, Primer on Optimal Control Theory. Philadelphia, PA, USA: SIAM, 2010.
[17] R. Vinter, Optimal Control (ser. Systems and Control). Boston, MA, USA: Birkhäuser, 2000.
[18] L. I. Rozonoer, “Certain hypotheses in optimal control theory and the relationship of the maximum principle with the dynamic programming method,” J. Autom. Remote Control, vol. 64, no. 8, pp. 1237–1240, 2003.
[19] A. Pakniyat and P. E. Caines, “On the minimum principle and dynamic programming for hybrid systems,” in Proc. 19th Int. Federation Autom. Control World Congr., 2014, pp. 9629–9634.
[20] A. Pakniyat and P. E. Caines, “On the relation between the minimum principle and dynamic programming for hybrid systems,” in Proc. 53rd IEEE Conf. Decision Control, 2014, pp. 19–24.
[21] A. Pakniyat and P. E. Caines, “On the relation between the hybrid minimum principle and hybrid dynamic programming: A linear quadratic example,” in Proc. 5th IFAC Conf. Anal. Des. Hybrid Syst., Atlanta, GA, USA, 2015, pp. 169–174.
[22] F. H. Clarke and R. B. Vinter, “Applications of optimal multiprocesses,” SIAM J. Control Optim., vol. 27, no. 5, pp. 1048–1071, 1989.
[23] F. H. Clarke and R. B. Vinter, “Optimal multiprocesses,” SIAM J. Control Optim., vol. 27, no. 5, pp. 1072–1091, 1989.
[24] H. J. Sussmann, A Nonsmooth Hybrid Maximum Principle (ser. Lecture Notes in Control and Information Sciences), vol. 246. London, U.K.: Springer, 1999, pp. 325–354.
[25] H. J. Sussmann, “Maximum principle for hybrid optimal control problems,” in Proc. 38th IEEE Conf. Decision Control, 1999, pp. 425–430.
[26] P. Riedinger, F. R. Kratz, C. Iung, and C. Zanne, “Linear quadratic optimization for hybrid systems,” in Proc. 38th IEEE Conf. Decision Control, 1999, pp. 3059–3064.
[27] X. Xu and P. J. Antsaklis, “Optimal control of switched systems based on parameterization of the switching instants,” IEEE Trans. Autom. Control, vol. 49, no. 1, pp. 2–16, Jan. 2004.
[28] M. S. Shaikh and P. E. Caines, “On the hybrid optimal control problem: Theory and algorithms,” IEEE Trans. Autom. Control, vol. 52, no. 9, pp. 1587–1603, Sep. 2007.
[29] F. Taringoo and P. E. Caines, “On the optimal control of impulsive hybrid systems on Riemannian manifolds,” SIAM J. Control Optim., vol. 51, no. 4, pp. 3127–3153, 2013.
[30] F. Taringoo and P. E. Caines, “Gradient-geodesic HMP algorithms for the optimization of hybrid systems based on the geometry of switching manifolds,” in Proc. 49th IEEE Conf. Decision Control, 2010, pp. 1534–1539.
[31] M. Garavello and B. Piccoli, “Hybrid necessary principle,” SIAM J. Control Optim., vol. 43, no. 5, pp. 1867–1887, 2005.
[32] B. Passenberg, M. Leibold, O. Stursberg, and M. Buss, “The minimum principle for time-varying hybrid systems with state switching and jumps,” in Proc. 50th IEEE Conf. Decision Control Eur. Control Conf., 2011, pp. 6723–6729.
[33] A. Pakniyat and P. E. Caines, “Hybrid optimal control of an electric vehicle with a dual-planetary transmission,” Nonlinear Anal.: Hybrid Syst., vol. 25, pp. 263–282, 2017.
[34] A. Pakniyat and P. E. Caines, “The hybrid minimum principle in the presence of switching costs,” in Proc. 52nd IEEE Conf. Decision Control, 2013, pp. 3831–3836.
[35] A. Pakniyat and P. E. Caines, “Time optimal hybrid minimum principle and the gear changing problem for electric vehicles,” in Proc. 5th IFAC Conf. Anal. Des. Hybrid Syst., Atlanta, GA, USA, 2015, pp. 187–192.
[36] A. V. Dmitruk and A. M. Kaganovich, “The hybrid maximum principle is a consequence of Pontryagin maximum principle,” Syst. Control Lett., vol. 57, no. 11, pp. 964–970, 2008.


[37] I. Capuzzo-Dolcetta and L. C. Evans, “Optimal switching for ordinary differential equations,” SIAM J. Control Optim., vol. 22, no. 1, pp. 143–161, 1984.
[38] A. Bensoussan and J. L. Menaldi, “Hybrid control and dynamic programming,” Dyn. Continuous, Discrete Impulsive Syst. Ser. B, Appl. Algorithm, vol. 3, no. 4, pp. 395–442, 1997.
[39] S. Dharmatti and M. Ramaswamy, “Hybrid control systems and viscosity solutions,” SIAM J. Control Optim., vol. 44, no. 4, pp. 1259–1288, 2005.
[40] G. Barles, S. Dharmatti, and M. Ramaswamy, “Unbounded viscosity solutions of hybrid control systems,” ESAIM: Control, Optim. Calculus Variations, vol. 16, no. 1, pp. 176–193, 2010.
[41] M. S. Branicky, V. S. Borkar, and S. K. Mitter, “A unified framework for hybrid control: Model and optimal control theory,” IEEE Trans. Autom. Control, vol. 43, no. 1, pp. 31–45, Jan. 1998.
[42] H. Zhang and M. R. James, “Optimal control of hybrid systems and a system of quasi-variational inequalities,” SIAM J. Control Optim., vol. 45, no. 2, pp. 722–761, 2007.
[43] P. E. Caines, M. Egerstedt, R. Malhamé, and A. Schöllig, “A hybrid Bellman equation for bimodal systems,” in Proc. 10th Int. Conf. Hybrid Syst.: Comput. Control, vol. 4416, 2007, pp. 656–659.
[44] A. Schöllig, P. E. Caines, M. Egerstedt, and R. Malhamé, “A hybrid Bellman equation for systems with regional dynamics,” in Proc. 46th IEEE Conf. Decision Control, 2007, pp. 3393–3398.
[45] M. S. Shaikh and P. E. Caines, “A verification theorem for hybrid optimal control problem,” in Proc. IEEE 13th Int. Multitopic Conf., 2009, pp. 1–3.
[46] V. Azhmyakov, V. Boltyanski, and A. Poznyak, “Optimal control of impulsive hybrid systems,” Nonlinear Anal.: Hybrid Syst., vol. 2, no. 4, pp. 1089–1097, 2008.
[47] J. Lygeros, “On reachability and minimum cost optimal control,” Automatica, vol. 40, no. 6, pp. 917–927, 2004.
[48] J. Lygeros, C. Tomlin, and S. Sastry, “Multiobjective hybrid controller synthesis,” in Proc. Int. Workshop Hybrid Real-Time Syst., 1997, pp. 109–123.
[49] C. Tomlin, J. Lygeros, and S. Sastry, “A game theoretic approach to controller design for hybrid systems,” Proc. IEEE, vol. 88, no. 7, pp. 949–970, Jul. 2000.
[50] F. Zhu and P. J. Antsaklis, “Optimal control of hybrid switched systems: A brief survey,” Discrete Event Dyn. Syst., vol. 25, no. 3, pp. 345–364, 2015.
[51] M. S. Shaikh and P. E. Caines, “On relationships between Weierstrass-Erdmann corner condition, Snell’s law and the hybrid minimum principle,” in Proc. Int. Bhurban Conf. Appl. Sci. Technol., 2007, pp. 117–122.
[52] P. E. Caines, “Lecture notes on nonlinear and hybrid control systems: Dynamics, stabilization and optimal control,” Dept. Elect. Comput. Eng., McGill Univ., Montreal, QC, Canada, 2013.
[53] A. Pakniyat, “Optimal control of deterministic and stochastic hybrid systems: Theory and applications,” Ph.D. dissertation, Dept. Elect. Comput. Eng., McGill Univ., Montreal, QC, Canada, 2016.
[54] V. N. Kolokoltsov, J. Li, and W. Yang, “Mean field games and nonlinear Markov processes,” ArXiv e-prints, Dec. 2011.
[55] G. N. Galbraith and R. B. Vinter, “Lipschitz continuity of optimal controls for state constrained problems,” SIAM J. Control Optim., vol. 42, no. 5, pp. 1727–1744, 2003.
[56] S. N. Y. Thow and P. E. Caines, “On the continuity of optimal controls and value functions with respect to initial conditions,” Syst. Control Lett., vol. 61, no. 12, pp. 1294–1298, 2012.
[57] S. Jafarpour and A. D. Lewis, Time-Varying Vector Fields and Their Flows (ser. SpringerBriefs in Mathematics). New York, NY, USA: Springer, 2014.
[58] H. Federer, Geometric Measure Theory (ser. Grundlehren der mathematischen Wissenschaften). New York, NY, USA: Springer, 1969.

[59] R. E. Kalman, “Contributions to the theory of optimal control,” Bol. Soc. Matematica Mexicana, vol. 5, pp. 102–119, 1960.
[60] A. Pakniyat and P. E. Caines, “On the minimum principle and dynamic programming for hybrid systems with low dimensional switching manifolds,” in Proc. 54th IEEE Conf. Decision Control, Osaka, Japan, 2015, pp. 2567–2573.
[61] A. Pakniyat and P. E. Caines, “On the stochastic minimum principle for hybrid systems,” in Proc. 55th IEEE Conf. Decision Control, Las Vegas, NV, USA, 2016, pp. 1139–1144.

Ali Pakniyat (M’14) received the B.Sc. degree in mechanical engineering from Shiraz University, Shiraz, Iran, in 2008, the M.Sc. degree in mechanical engineering (applied mechanics and design) from Sharif University of Technology, Tehran, Iran, in 2010, and the Ph.D. degree in electrical engineering from McGill University, Montreal, QC, Canada, in September 2016, under the supervision of P. E. Caines. He is a member of the McGill Centre for Intelligent Machines and the Groupe d’Études et de Recherche en Analyse des Décisions. His research interests include deterministic and stochastic optimal control, nonlinear and hybrid systems, and analytical mechanics and chaos, with applications in the automotive industry, sensors and actuators, and robotics.

Peter E. Caines (LF’11) received the B.A. degree in mathematics from the University of Oxford, Oxford, U.K., in 1967, and the Ph.D. degree in systems and control theory from Imperial College, University of London, London, U.K., in 1970, under the supervision of D. Q. Mayne. After periods as a Postdoctoral Researcher and Faculty Member at the University of Manchester Institute of Science and Technology, Stanford University, the University of California, Berkeley, the University of Toronto, and Harvard University, he joined McGill University, Montreal, QC, Canada, in 1980, where he is the James McGill Professor and the Macdonald Chair in the Department of Electrical and Computer Engineering. He is the author of Linear Stochastic Systems (Hoboken, NJ, USA: Wiley, 1988) and a Senior Editor of Nonlinear Analysis: Hybrid Systems. His research interests include stochastic, mean field game, and decentralized and hybrid systems theory, together with their applications in a range of fields. Dr. Caines is a Fellow of the Society for Industrial and Applied Mathematics, the Institute of Mathematics and its Applications (U.K.), and the Canadian Institute for Advanced Research. He is a member of Professional Engineers Ontario. He was elected to the Royal Society of Canada in 2003. In 2000, the adaptive control paper he coauthored with G. C. Goodwin and P. J. Ramadge (IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1980) was recognized by the IEEE Control Systems Society as one of the 25 seminal control theory papers of the 20th century. In 2009, he received the IEEE Control Systems Society Bode Lecture Prize and, in 2012, a Queen Elizabeth II Diamond Jubilee Medal.
