Optimization Techniques for State-Constrained Control and Obstacle Problems¹

A.B. Kurzhanski², I.M. Mitchell³, P. Varaiya⁴

Communicated by G. Leitmann

¹ The first author was supported by the Russian Foundation for Basic Research (Grant 03-01-00663), the program “Universities of Russia” (Grant 03.03.007), and the program of the President of the Russian Federation for the support of the scientific research of the leading scientific schools (Grant NSh-1889.2003.1). The second author was supported by the National Science and Engineering Research Council of Canada and ONR MURI contract 79846-23800-44-NDSAS. The third and first authors were supported by NSF Grants ECS-0099824 and ECS-0424445.
² Professor, Moscow State (Lomonosov) University, Department of Computational Mathematics and Cybernetics, Moscow, Russia.
³ Assistant Professor, University of British Columbia, Department of Computer Science, Vancouver, British Columbia, Canada.
⁴ Professor, University of California at Berkeley, Department of Electrical Engineering and Computer Science, Electronics Research Laboratory, Berkeley, California, USA.
Abstract. The design of control laws for systems subject to complex state constraints still presents a significant challenge. This paper explores a dynamic programming approach to a specific class of such problems, that of reachability under state constraints. The problems are formulated in terms of nonstandard minmax and maxmin cost functionals, and the corresponding value functions are given in terms of Hamilton-Jacobi-Bellman (HJB) equations or variational inequalities. The solution of such relations is complicated in general; however, for linear systems, the value functions may also be described in terms of duality relations of convex analysis and minmax theory. Consequently, solution techniques specific to systems with linear structure may be designed independent of HJB theory. These techniques are illustrated through two examples.

Key Words. Nonlinear systems, control synthesis, state constraints, obstacle problems, dynamic programming, variational inequalities, convex analysis.
1 Introduction

State-constrained control is challenging even for systems with linear dynamics. In this paper we examine the state-constrained reachability question: from what states can trajectories start that stay within a given set of constraints and reach a specified target set at a particular time? We examine three scenarios that differ in their constraint sets. The first uses a convex constraint set, while the second uses the complement of a convex set as the constraint. The final scenario combines the two previous ones. In every scenario the target set is convex.

For the case of general dynamics and under differentiability assumptions, we show that the value functions for these three scenarios are given in terms of Hamilton-Jacobi-Bellman (HJB) equations or variational inequalities. The form of these equations proves important in later sections, and it is natural to derive it under differentiability assumptions before studying the problem in its general nondifferentiable form. Instead of solving the HJB equations, we examine alternative solution methods available through duality relations of convex analysis and minmax theory. While these latter results apply only to systems with linear dynamics, we believe it is not generally known that they can be used to analyze the nonconvex situations of the last two scenarios.

The paper thus deals with dynamic programming problems for nonstandard, nonintegral cost functionals (Refs. 1-3). It also serves to bring to the attention of the English-reading audience some basic results from Refs. 4-7. Other applications of dynamic programming to reachability-related issues were indicated in Refs. 8, 9.

In the final two sections of the paper we examine specific examples of the first and third scenarios, and lay out the steps necessary to construct the value functions through the methods discussed previously. These value functions are shown to satisfy the appropriate HJB equations or variational inequalities, and level set methods are used to numerically approximate the solutions and generate the figures. Although we do not pursue it further here, this final connection between solution methods could be used to validate the accuracy of level set implementations, since the alternative schemes examined here lead to analytic solutions without intermediate recourse to HJB theory. Parts of this paper, including a different version of the first example, were announced without details in Ref. 10.
2 System

Consider a controlled system described by an ordinary differential equation,

ẋ = f(t, x, u),     (1)

which in particular may be linear,

ẋ = A(t)x + B(t)u + C(t)v(t),   t0 ≤ t ≤ τ.     (2)

Here x ∈ IRn is the state, u ∈ IRm is the control, and v(t) is a known disturbance; f(t, x, u) is continuous in all variables and satisfies conditions of uniqueness and extendibility of solutions for all starting points, all t ≥ t0, and for any control u(t) restricted by the hard bound

u(t) ∈ P(t),   t ≥ t0.     (3)

Here P(t) is a set-valued function, with values in comp IRn, the variety of compact sets, continuous in t in the Hausdorff metric. We also require the set f(t, x, P(t)) = F(t, x) to be convex and compact, and the differential inclusion (DI) ẋ ∈ F(t, x) to have a Carathéodory solution extendable within the intervals under consideration (see Refs. 11, 12). The tube of solutions to the DI which start in the set X* at time τ is denoted by X[t] = X(t; τ, X*). This is the “reach set” of system (1). For linear systems we require the matrix functions A(t), B(t), C(t) to be continuous and P(t) to be convex (P(t) ∈ conv IRn). We present now the problems discussed in this paper.
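The dynamics (2)-(3) can be exercised numerically. The following minimal sketch (the matrices, the constant control, and the explicit Euler discretization are illustrative assumptions, not part of the paper) integrates the linear system (2):

```python
import numpy as np

def simulate(A, B, C, u, v, x0, t0, tau, steps=1000):
    """Explicit Euler integration of x' = A x + B u(t) + C v(t) on [t0, tau]."""
    x = np.array(x0, dtype=float)
    t, dt = t0, (tau - t0) / steps
    for _ in range(steps):
        x = x + dt * (A @ x + B @ u(t) + C @ v(t))
        t += dt
    return x

# Double integrator with constant control u = 1 and no disturbance;
# the exact solution from (0, 0) over [0, 1] is x1 = 1/2, x2 = 1.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.zeros((2, 1))
x = simulate(A, B, C, u=lambda t: np.array([1.0]),
             v=lambda t: np.array([0.0]), x0=[0.0, 0.0], t0=0.0, tau=1.0)
```

A finer step size would tighten the first-order Euler error in x1.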
3 Backward Reachability and Target Problems

We introduce three target problems and associated reach sets for systems with state constraints. Let x[t] = x(t; τ, x) denote the system trajectory starting from position {τ, x}, x = x[τ], x ∈ IRn; let the level set M = {x ∈ IRn : ϕM(x) ≤ 1} denote the target set, and the level sets Yi(t) = {x ∈ IRn : ϕi(t, x) ≤ 1}, i = 1, 2, denote the state constraints. The functions ϕi(t, x), ϕM(x) are assumed to satisfy

ϕi(t, ·) ∈ Φ,  t ∈ [t0, ϑ],  i = 1, 2;   ϕM(·) ∈ Φ,     (4)

where Φ = {ϕ(·)} is the class of proper closed convex functions ϕ(x), x ∈ IRn, whose Fenchel conjugates ϕ* are such that 0 ∈ int dom ϕ*. (Here dom ϕ = {x : ϕ(x) < ∞} and int P is the interior of the set P.) The functions ϕi(t, x) are assumed continuous in both variables and to satisfy inclusion (4) in the second variable for each t. Class Φ ensures that the level sets M, Yi of the functions ϕM(x), ϕi(t, x) are convex and compact when they are nonempty (Ref. 13).
Problem 3.1. Given the time interval [τ, ϑ] and functions ϕ1(t, x), ϕM(x), find the set

W1[τ] = {x : ∃u(·), ∀t ∈ [τ, ϑ], ϕ1(t, x[t]) ≤ 1, ϕM(x[ϑ]) ≤ 1; x[τ] = x}.

Problem 3.1 has two variants, 3.1A and 3.1B.

Problem 3.1A. Find W1[τ] = {x : V1(τ, x) ≤ 1} as a level set of the value function

V1(τ, x) = min_u max{ max_t {ϕ1(t, x[t]) | t ∈ [τ, ϑ]}, ϕM(x[ϑ]) | x[τ] = x }.

W1[τ] is the backward reach set relative to M under the state constraint Y1(t), i.e., the set of points x for which there exists some control u(t) that steers the trajectory x[t] = x(t; τ, x) to M under the state constraint Y1(t). A similar set W1^0[τ] is introduced in the next problem.

Problem 3.1B. Find the set W1^0[τ] = {x : V1^0(τ, x) ≤ 1} as a level set of the value function

V1^0(τ, x) = min_u { ϕM(x[ϑ]) | max_t {ϕ1(t, x[t]) | t ∈ [τ, ϑ]} ≤ 1; x[τ] = x }.
The next problem has a state constraint of type opposite to the previous one. Consider a function ϕ2(t, x) with the same properties as ϕ1(t, x) and let Y2(t) = {x : ϕ2(t, x) ≤ 1}, Z2(t) = {x : ϕ2(t, x) ≥ 1}.

Problem 3.2. Given the time interval [τ, ϑ] and functions ϕ2(t, x), ϕM(x), find the set

W2[τ] = {x : ∃u(·), ∀t ∈ [τ, ϑ], ϕ2(t, x[t]) ≥ 1, ϕM(x[ϑ]) ≤ 1; x[τ] = x}.

Here W2[τ] = {x : V2(τ, x) ≤ 1} is the level set of the value function

V2(τ, x) = min_u max{ max_t {−ϕ2(t, x[t]) + 2 | t ∈ [τ, ϑ]}, ϕM(x[ϑ]) | x[τ] = x }.

This is the set of points x from which some controlled trajectory x[t] = x(t; τ, x), starting at time τ, reaches M at time t = ϑ under the state constraint ϕ2(t, x[t]) ≥ 1 or, equivalently, x[t] ∈ Z2(t), ∀t ∈ [τ, ϑ]. For general constraint sets problems 3.1 and 3.2 are interchangeable, but for the restricted class of convex sets Yi(t), problem 3.1 corresponds to staying within the convex set Y1(t), while problem 3.2 corresponds to staying within the nonconvex set Z2(t), the closure of the complement of the convex set Y2(t).

Problems 3.1 and 3.2 reflect the property of weak invariance of the backward reach sets Wi[τ], i = 1, 2, relative to equation (1) and the state constraints Y1(t), Z2(t), respectively. Therefore these sets Wi[τ] are also called invariant sets (Ref. 12). The third problem combines the previous two.

Problem 3.3. Given the time interval [τ, ϑ] and functions ϕ1(t, x), ϕ2(t, x), ϕM(x), find the set

W3[τ] = {x ∈ IRn : ∃u(·), ∀t ∈ [τ, ϑ], ϕ1(t, x[t]) ≤ 1, ϕ2(t, x[t]) ≥ 1, ϕM(x[ϑ]) ≤ 1; x[τ] = x}.

Here W3[τ] = {x : V3(τ, x) ≤ 1} is a level set of the value function

V3(τ, x) = min_u max{ max_t {ϕ1(t, x[t]), −ϕ2(t, x[t]) + 2 | t ∈ [τ, ϑ]}, ϕM(x[ϑ]) | x[τ] = x }.

This is the set of points x from which some controlled trajectory x[t] = x(t; τ, x), starting at time τ, reaches M at time t = ϑ and also satisfies the state constraints x[t] ∈ Y1(t), x[t] ∈ Z2(t), ∀t ∈ [τ, ϑ]. The set W3[τ] is called the reach-evasion set (Ref. 14). We indicate some approaches to the solution of these problems.
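The minmax cost functionals above can be made concrete on a toy example. The sketch below (a hedged illustration, not the paper's method) approximates the Problem 3.1A value function for the scalar system ẋ = u, |u| ≤ 1, by brute-force enumeration of piecewise-constant controls; the horizon and the choices ϕM(x) = |x|, ϕ1(x) = |x| are illustrative assumptions:

```python
from itertools import product

# Toy data (assumptions for illustration): scalar system x' = u, |u| <= 1,
# horizon [0, theta], target cost phi_M(x) = |x| (m = 0), state cost |x|.
theta = 1.0
phi_M = abs
phi_1 = abs

def cost_31A(x0, us, substeps=50):
    """max{ max_t phi_1(x[t]), phi_M(x[theta]) } along an Euler trajectory
    driven by the piecewise-constant control values in `us`."""
    dt = theta / (len(us) * substeps)
    x, worst = x0, phi_1(x0)
    for u in us:
        for _ in range(substeps):
            x += dt * u
            worst = max(worst, phi_1(x))
    return max(worst, phi_M(x))

def V1(x0, segments=4):
    # Minimize over a finite family of candidate controls; Problems 3.2 and
    # 3.3 would instead use -phi_2 + 2 (and the maximum of both running
    # costs) inside cost_31A.
    return min(cost_31A(x0, us)
               for us in product((-1.0, 0.0, 1.0), repeat=segments))
```

For instance, V1(0.0) is 0 (the control u ≡ 0 holds the state at the target center), while V1(1.5) is 1.5, since no control can reduce the initial constraint cost ϕ1(x[τ]).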
4 Solution Methods. HJB Equations

In the general nonlinear case the various value functions may be calculated through a generalized HJB equation. We shall indicate such equations for Problems 3.1–3.3. We suppose that ϕM(x) = d²(x, M) + 1, ϕi(t, x) = d²(x, Yi(t)) + 1, i = 1, 2, where d²(x, Q) = min{(x − q, x − q) | q ∈ Q} is the square of the Euclidean distance of the point x from the compact set Q. These functions ϕ1, ϕ2, ϕM satisfy the inclusion (4). Of course other choices are possible.

Starting with Problem 3.1, denote V1(t, x) = V1(t, x | V1(ϑ, ·)), emphasizing the dependence of V1(t, x) on the boundary condition V1(ϑ, x) = max{ϕ1(ϑ, x), ϕM(x)}.

Theorem 4.1. The value function V1(t, x) satisfies the principle of optimality, which has the semigroup form

V1(τ, x | V1(ϑ, ·)) = V1(τ, x | V1(t, · | V1(ϑ, ·))).     (5)

This property is established through a conventional argument (Ref. 15) and implies a similar property for the corresponding reach sets. Namely, if we denote W1[τ] = W1(τ; ϑ, M), we have W1(τ; ϑ, M) = W1(τ; t, W1(t; ϑ, M)). Relation (5) yields for the value function V1(t, x) the “backward” relation

V1(t, x) ≤ max{ max_s {ϕ1(s, x[s]) | s ∈ [t, t + σ]}, V1(t + σ, x[t + σ]) | x[t] = x },

wherein u(s), s ∈ [t, t + σ], is any piecewise continuous control which satisfies (3), with equality along the optimal trajectory. This gives, for σ ≥ 0,

max{ max_s {ϕ1(s, x[s]) | s ∈ [t, t + σ]} − V1(t, x), V1(t + σ, x[t + σ]) − V1(t, x) | x[t] = x } ≥ 0,     (6)

and ϕ1(t, x) ≤ V1(t, x), with V1(ϑ, x) = max{ϕ1(ϑ, x), ϕM(x)}. We assume that V1(t, x) and ϕ1(t, x) are differentiable (see Remark 4.1 below).

(Case 1A-1) Assume V1(t, x) > ϕ1(t, x). Then relation (6) yields

V1t(t, x) + min_u (V1x, f(t, x, u)) = 0,  u ∈ P(t).     (7)
To proceed further, denote H(t, x, V1x, u) = (V1x(t, x), f(t, x, u)), so that under control u the total derivative along the trajectory is dV1/dt|u = V1t(t, x) + H(t, x, V1x, u).

(Case 1A-2) Assume V1(t, x) = ϕ1(t, x), and let u0(t), x0(t) be the optimal solution of Problem 3.1A for V1(t, x). (Under our assumptions the optimal control u0 does exist.) Taking u = u0(t), we observe that there exists a δ > 0 such that

ϕmax(t + σ, x0[t + σ]) = max{ϕ1(s, x0[s]) | s ∈ [t, t + σ]} ≡ V1(t, x) ≡ ϕ1(t, x) ≥ ϕ1(t + σ, x0[t + σ]),  ∀σ ∈ (0, δ].     (8)

Therefore

0 = dϕmax(t, x)/dt|u0 ≥ dϕ1(t, x)/dt|u0.     (9)

Thus, with V1(t, x) = ϕ1(t, x), we may consider the range of the control u as being

u ∈ P(t) ∩ {u : dϕ1(t, x)/dt|u ≤ 0} = P0(t, x).

Repeating the proof for Case 1A-1, we arrive at the equation

V1t(t, x) + min_u (V1x, f(t, x, u)) = 0,  u ∈ P0(t, x),     (10)

which implies in this case that dϕ1(t, x)/dt|u0 ≤ dV1(t, x)/dt|u0 = 0. To proceed further we must compare dV1(t, x)/dt|u0 and dϕ1(t, x)/dt|u0. We have two options:

dV1(t, x)/dt|u0 = dϕ1(t, x)/dt|u0 = 0,     (11)

or

dV1(t, x)/dt|u0 > dϕ1(t, x)/dt|u0 = 0.     (12)

In case (11),

V1(t + σ, x0[t + σ]) = ϕ1(t + σ, x0[t + σ]) + o(σ),

with lim{o(σ)/σ | σ → 0} = 0, which implies that the equality V1(t, x0[t]) = ϕ1(t, x0[t]) = V1(s, x0[s]) = ϕ1(s, x0[s]) = const holds for some interval s ∈ [t, t + σ], σ > 0, relative to terms of order higher than σ. This property is similar to x0(s) moving along the boundary of the state constraint

ϕ1(s, x0[s]) ≤ ϕ1(t, x0[t]).     (13)

In case (12), {t, x(t)} is the point of departure of the trajectory x0[s] from Case 1A-2 to Case 1A-1 (or, in other words, from the boundary (13)). The boundary condition for both Cases 1A-1 and 1A-2 is

V1(ϑ, x) = max{ϕ1(ϑ, x), ϕM(x)}.     (14)

In other words, due to (6), (10), we have

max{H(t, x, V1x, u), H(t, x, ϕ1x, u)} ≥ H(t, x0, V1x, u0) = 0 ≥ H(t, x0, ϕ1x, u0).     (15)
Theorem 4.2. The solution V1(t, x) to Problem 3.1A is given as follows. If V1(t, x) > ϕ1(t, x), it should satisfy equations (7), (14). If V1(t, x) = ϕ1(t, x), the optimal control u0(t, x) should be selected among controls u ∈ P(t) that satisfy

V1t(t, x) + max{H(t, x, V1x, u), H(t, x, ϕ1x, u)} ≥ 0,

0 = V1t(t, x(0)[t]) + H(t, x(0)[t], V1x(t, x(0)[t]), u0) ≥ V1t(t, x(0)[t]) + H(t, x(0)[t], ϕ1x(t, x(0)[t]), u0),

with boundary condition (14).

Theorem 4.3. The solution to Problem 3.1 is given by W1[τ] = {x : V1(τ, x) ≤ 1}, in which V1(τ, x) is the solution to Problem 3.1A.

If we look for W1[τ] through Problem 3.1B, a reasoning similar to the above gives the next result (V1^0 denotes the value function).

Lemma 4.1. The value function V1^0(τ, x) satisfies the HJB equation (7) when ϕ1(τ, x[τ]) < 1 and the HJB equation

V1t^0(t, x) + min_u (V1x^0, f(t, x, u)) = 0,  u ∈ P0(t),     (16)

wherein P0(t) = P(t) ∩ {u : dϕ1(t, x)/dt|u ≤ 0}, when ϕ1(τ, x[τ]) = 1. The boundary condition is V1^0(ϑ, x) = ϕM(x).

Thus we have an alternative solution to Problem 3.1.

Theorem 4.4. The solution to Problem 3.1 is given by W1[τ] = {x : V1^0(τ, x) ≤ 1}, wherein V1^0(τ, x) is the solution to Problem 3.1B.

For Problem 3.2 the value function V2(t, x) also satisfies a semigroup property similar to (5). This gives, for any u(s) ∈ P(s), s ∈ [t, t + σ], σ ≥ 0,

max{ max_s {−ϕ2(s, x[s]) + 2 | s ∈ [t, t + σ]} − V2(t, x), V2(t + σ, x[t + σ]) − V2(t, x) | x[t] = x } ≥ 0,
with equality along the optimal trajectory, and −ϕ2(t, x) + 2 ≤ V2(t, x). We assume that V2(t, x) and ϕ2(t, x) are differentiable.

(Case 2A-1) Assuming −ϕ2(t, x) + 2 < V2(t, x), we have

V2t(t, x) + min_u (V2x, f(t, x, u)) = 0,  u ∈ P(t).     (17)

(Case 2A-2) Assuming −ϕ2(t, x) + 2 = V2(t, x), if u0(t), x0(t) is the optimal solution of Problem 3.2, we have, through reasoning similar to the above,

max{H(t, x[t], V2x(t, x[t]), u), H(t, x[t], −ϕ2x(t, x[t]), u)} ≥ 0,

0 = H(t, x0[t], V2x(t, x0[t]), u0) ≥ H(t, x0[t], −ϕ2x(t, x0[t]), u0).     (18)

The boundary condition is

V2(ϑ, x) = max{−ϕ2(ϑ, x) + 2, ϕM(x)}.     (19)

Theorem 4.5. The solution to Problem 3.2 is given by W2[τ] = {x : V2(τ, x) ≤ 1}, in which V2(τ, x) is the solution to (17)–(19).

Finally, in Problem 3.3, the value function V3(t, x) satisfies an analog of Theorem 4.1. We have, for any u(s) ∈ P(s), s ∈ [t, t + σ], σ ≥ 0, the relations

max{ max_s {ϕ1(s, x[s]), −ϕ2(s, x[s]) + 2 | s ∈ [t, t + σ]} − V3(t, x), V3(t + σ, x[t + σ]) − V3(t, x) | x[t] = x } ≥ 0,     (20)
with equality along the optimal trajectory, and ϕ1(t, x) ≤ V3(t, x), −ϕ2(t, x) + 2 ≤ V3(t, x).

(Case 3A-1) Assuming V3(t, x) > ϕ1(t, x) and V3(t, x) > −ϕ2(t, x) + 2, we have the HJB equation

V3t(t, x) + min_u (V3x, f(t, x, u)) = 0,  u ∈ P(t).     (21)

(Case 3A-2) Assuming V3(t, x) = ϕ1(t, x), we have

max{H(t, x, V3x, u), H(t, x, ϕ1x, u)} ≥ 0,

0 = H(t, x(0)[t], V3x(t, x(0)[t]), u0) ≥ H(t, x(0)[t], ϕ1x(t, x(0)[t]), u0),     (22)

and lastly,

(Case 3A-3) Assuming V3(t, x) = −ϕ2(t, x) + 2, we have

max{H(t, x, V3x, u), H(t, x, −ϕ2x, u)} ≥ 0,

0 = H(t, x, V3x, u0) ≥ H(t, x, −ϕ2x, u0).     (23)

The boundary condition is

V3(ϑ, x) = max{ϕ1(ϑ, x), −ϕ2(ϑ, x) + 2, ϕM(x)}.     (24)

Theorem 4.6. The solution to Problem 3.3 is given by W3[τ] = {x : V3(τ, x) ≤ 1}, in which V3(τ, x) is the solution to (21)–(24). We have presumed Z1(t) ∩ Y2(t) = ∅, where Z1(t) = {x : ϕ1(t, x[t]) ≥ 1}.

Remark 4.1. The common situation is that the value functions V1^0, V1, V2, V3 are not differentiable. In this case, the argument above may be written in terms of the upper and lower Dini derivatives DV+, DV− in place of dV/dt. Solutions to the HJB equations should then be treated in a generalized (“viscosity” or “minmax”) sense (Refs. 15-19). The numerical calculation of solutions to these equations for the general, nonlinear case is not simple and requires additional investigation. One promising emerging approach uses level set methods (Refs. 20-23); these numerical techniques are used to calculate the forthcoming First and Second Examples. In the case of linear systems, however, the value functions V1^0, V1, V2, V3 may be described through duality relations of convex analysis and related branches of minmax theory. The value functions then turn out to be differentiable, which allows direct interpretation of the variational equalities above.
5 Solution Techniques through Duality Methods

We now find the value functions for linear systems (2) using techniques of convex analysis, semidefinite programming and minmax theory (Refs. 4, 6, 24-26). We develop formulas for Vi(τ, x), i = 1, 2, 3, which can be used to design numerical procedures. Note that in this section the state constraints are expressed as inequalities in the variables y = Ki x.

Assume that ϕ1(t, x) = (y, N1(t)y), ϕM(x) = (x − m, M(x − m)), with N1(t) = N1′(t) > 0, M = M′ > 0. Let E(p(t), P(t)) = {x : (x − p, P⁻¹(t)(x − p)) ≤ 1} be the ellipsoid with center p(t) and shape matrix P(t) = P′(t) > 0. All constraints are ellipsoidal: the control constraint is

u(t) ∈ P(t) = E(0, P(t)),     (25)

the target set is M = E(m, M) = {x : (x − m, M⁻¹(x − m)) ≤ 1}, and the state constraint is Y1(t) = {y : (y, N1(t)y) ≤ 1}, y = K1 x, with N1(t) continuously differentiable. Let ρ(l | X) = max{(l, x) | x ∈ X} denote the support function of the convex compact set X. The support function of an ellipsoid is ρ²(d | E(0, P)) = (d, Pd).

To find V1(τ, x) we first characterize the class of controls (25) whose trajectories satisfy the inequalities

(y[t], N1(t)y[t]) ≤ µ²,  t ∈ [τ, ϑ];   (x[ϑ] − m, M(x[ϑ] − m)) ≤ ε²,     (26)

in which y[t] = K1 x[t], x[t] = x(t; τ, x), and µ, ε are positive parameters. The first inequality in (26) is equivalent to (see Ref. 4)

(q(t), y[t]) ≤ µ,  t ∈ [τ, ϑ],  ∀q(·) ∈ Q,     (27)

in which Q = {q : ‖q(·)‖_N ≤ 1} is the unit ball of continuous functions on [τ, ϑ] with norm

‖q(·)‖_N = max_t {(q(t), N⁻¹(t)q(t))^{1/2} | t ∈ [τ, ϑ]},  N(t) ≡ N1(t).
Hence (27) is equivalent to

∫_τ^ϑ (q(t), y[t]) dΛ(t) ≤ µ ∫_τ^ϑ dΛ(t),  ∀q(·) ∈ Q,     (28)

for all Λ(·) ∈ Var+[τ, ϑ], the space of scalar, nondecreasing functions of bounded variation on [τ, ϑ]. The second inequality in (26) is equivalent to

(l, x[ϑ] − m) ≤ ε(l, M⁻¹l)^{1/2},  ∀l ∈ IRn.     (29)

Combining (28), (29), we find that (26) is solvable iff, for all l ∈ IRn, q ∈ Q, Λ(·) ∈ Var+[τ, ϑ],

(l, x[ϑ] − m) + ∫_τ^ϑ (q(t), y[t]) dΛ(t) ≤ ε(l, M⁻¹l)^{1/2} + µ ∫_τ^ϑ dΛ(t).
In view of (2), this relation may be rewritten as

max_{q(·)} max_{Λ(·)} max_l { (s[τ], x) + ∫_τ^ϑ (s[t], B(t)u(t) + C(t)v(t)) dt − (l, m) − ε(l, M⁻¹l)^{1/2} − µ ∫_τ^ϑ dΛ(t) } ≤ 0,     (30)

in which s[t] is the row-vector solution to the adjoint equation

ds = −sA(t)dt − q′(t)K1 dΛ(t),   s[ϑ] = l′.     (31)
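The adjoint equation (31) is easy to integrate numerically under a simplifying assumption. The sketch below (an illustrative discretization, not the paper's algorithm) assumes Λ is absolutely continuous with density λ(t), so that dΛ(t) = λ(t)dt, and steps the row vector s backward from s[ϑ] = l′:

```python
import numpy as np

def adjoint(A, K, q, lam, l, tau, theta, steps=2000):
    """Backward integration of ds = -s A(t) dt - q'(t) K dLambda(t),
    s[theta] = l', assuming dLambda(t) = lam(t) dt (a simplification;
    in general Lambda is only of bounded variation)."""
    dt = (theta - tau) / steps
    s = np.array(l, dtype=float)
    t = theta
    for _ in range(steps):
        # stepping from t to t - dt:  s(t - dt) = s(t) + dt*(s A + lam q' K)
        s = s + dt * (s @ A(t) + lam(t) * (q(t) @ K))
        t -= dt
    return s            # approximation of s[tau]

# With lam = 0 and the double-integrator matrix A, s[t] = l' exp(A (theta-t)),
# so from l = (1, 0) over [0, 1] the exact answer is s[0] = (1, 1).
A_di = np.array([[0.0, 1.0], [0.0, 0.0]])
s_tau = adjoint(lambda t: A_di, K=np.eye(2), q=lambda t: np.zeros(2),
                lam=lambda t: 0.0, l=[1.0, 0.0], tau=0.0, theta=1.0)
```

Since A is nilpotent here, the Euler recursion reproduces the matrix exponential exactly up to rounding.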
To find the value function V1(t, x) we have to minimize the left-hand side of (30) with respect to u(·). Taking ε = µ, applying a minmax theorem (Refs. 25, 26), and then substituting for the minimizing u, we obtain the next result.

Theorem 5.1. The value function V1(τ, x) is given by the formula

V1(τ, x) = max_{q(·)} max_{Λ(·)} max_l { (s[τ], x) − (l, m) + ∫_τ^ϑ [ −(s[t]B(t)P(t)B′(t)s′[t])^{1/2} + s[t]C(t)v(t) ] dt },

in which the maximum is to be taken over q(·) ∈ Q and {l, Λ(·)} ∈ D. Here

D = { {l, Λ(·)} : (l, M⁻¹l)^{1/2} + ∫_τ^ϑ dΛ(t) = 1,  Λ(·) ∈ Var+[τ, ϑ] }.
V1(τ, x) is the smallest number µ = ε for which the inequalities (26) are solvable. This leads to the next conclusion.

Corollary 5.1. The backward reach set is convex and compact, and is given by W1[τ] = {x : V1(τ, x) ≤ 1}.

We now indicate the solution to Problem 3.1B.

Theorem 5.2. The solution to Problem 3.1B is given by the value function

V1^0(τ, x) = max_{q(·)} max_{Λ(·)} max_l { (s[τ], x) + ∫_τ^ϑ [ −(s[t]B(t)P(t)B′(t)s′[t])^{1/2} + s[t]C(t)v(t) ] dt − (l, m) − ∫_τ^ϑ dΛ(s) },

where the maximum is taken over q(·) ∈ Q, (l, M⁻¹l) ≤ 1, Λ(·) ∈ Var+[τ, ϑ].
For calculating V2(τ, x), consider the two inequalities

ϕ2(t, y) = (y[t], N2(t)y[t])^{1/2} ≥ µ > 0,  t ∈ [τ, ϑ];   (x[ϑ] − m, M(x[ϑ] − m))^{1/2} ≤ 2 − µ,     (32)

in which y[t] = K2 x[t], x[t] = x(t; τ, x). The first inequality in (32) is equivalent to the following (see Ref. 4):

∃q(·) ∈ Q0 :  (q(t), y[t]) ≥ µ,  t ∈ [τ, ϑ].     (33)

Here Q0 is the compact subset of Q = {q(·) : ‖q(·)‖_N ≤ 1} given by

Q0 = { q(·) : q(t) = N(t)y[t](y[t], N(t)y[t])^{−1/2}, t ∈ [τ, ϑ] },     (34)

with N(t) = N2(t), y[t] = K2 x[t], and x[t] = x(t; τ, x) any trajectory of (2) generated by any u(t) ∈ E(0, P(t)) and any x with (x, x) ≤ r², r² sufficiently large. Relation (33) in its turn is equivalent to

∃q(·) ∈ Q0 :  ∫_τ^ϑ (q(t), y[t]) dΛ(t) ≥ µ ∫_τ^ϑ dΛ(t),  ∀Λ(·) ∈ Var+[τ, ϑ].     (35)
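The extremal selection (34) can be verified numerically. The sketch below (with illustrative N and y at one fixed time instant) checks that q = Ny(y, Ny)^{−1/2} has unit ‖·‖_N norm, i.e. (q, N⁻¹q) = 1, and attains (q, y) = (y, Ny)^{1/2}:

```python
import numpy as np

N = np.array([[3.0, 1.0], [1.0, 2.0]])   # illustrative SPD weight N(t), t fixed
y = np.array([0.7, -1.2])                # illustrative value of y[t]

g = np.sqrt(y @ N @ y)                   # (y, N y)^{1/2}
q = (N @ y) / g                          # extremal selection q(t) of (34)

unit_norm = q @ np.linalg.solve(N, q)    # (q, N^{-1} q) -- should equal 1
attained = q @ y                         # (q, y)        -- should equal g
```

This is exactly the algebra behind the equivalence of the pointwise bound (33) with the first inequality in (32).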
The second inequality in (32) is equivalent to the following:

(l, x[ϑ] − m) ≤ 2 − µ,  ∀l : (l, M⁻¹l) ≤ 1,

or

−(l, x[ϑ] − m) + 2(l, M⁻¹l)^{1/2} ≥ µ(l, M⁻¹l)^{1/2},  ∀l ∈ IRn.     (36)

Combining (35), (36), we come to an equivalent system, observing that (32) is solvable iff there exists a function q(·) ∈ Q0 such that

−(l, x[ϑ] − m) + 2(l, M⁻¹l)^{1/2} + ∫_τ^ϑ (q(t), y[t]) dΛ(t) ≥ µ [ (l, M⁻¹l)^{1/2} + ∫_τ^ϑ dΛ(t) ],  ∀l ∈ IRn, ∀Λ(·) ∈ Var+[τ, ϑ].

This inequality may be rewritten as

min_{Λ(·)} min_l { −(s[τ], x) + (l, m) − ∫_τ^ϑ (s[t], B(t)u(t) + C(t)v(t)) dt + 2(l, M⁻¹l)^{1/2} } ≥ µ,     (37)

under the condition {l, Λ(·)} ∈ D. Here s[t] is the row-vector solution to the adjoint equation

ds = −sA(t)dt − q′(t)K2 dΛ(t),   s[ϑ] = l′.     (38)

We further have to minimize the left-hand side of (37) over u(·) with q(·) given. After that we maximize over q(·). This leads to the next result.
Theorem 5.3. The value function V2(τ, x) is given by the formula

V2(τ, x) = max_{q(·)} min_{Λ(·)} min_l { −(s[τ], x) − ∫_τ^ϑ [ (s[t]B(t)P(t)B′(t)s′[t])^{1/2} + s[t]C(t)v(t) ] dt + (l, m) + 2(l, M⁻¹l)^{1/2} },     (39)

wherein the maximum is over q(·) ∈ Q0 and the minimum is over {l, Λ(·)} ∈ D.

Corollary 5.2. In general the reach set W2[τ] is nonconvex and is given by W2[τ] = {x : V2(τ, x) ≥ 1}.

This is the set of all points from which it is possible to avoid the interior int Y2(t) while reaching the target set M at the prescribed time ϑ.

Lastly, for the value function V3 of Problem 3.3, we start with the solvability conditions for the system of inequalities

(a)  (y(1)[t], N1(t)y(1)[t])^{1/2} ≤ µ,
(b)  (y(2)[t], N2(t)y(2)[t])^{1/2} ≥ 2 − µ > 0,   t ∈ [τ, ϑ],     (40)
(c)  (x[ϑ] − m, M(x[ϑ] − m))^{1/2} ≤ µ,
with y(1)[t] = K1 x[t], y(2)[t] = K2 x[t], x[t] = x(t; τ, x). The first inequality in (40) is equivalent to

(q(1)(t), y(1)[t]) ≤ µ,  t ∈ [τ, ϑ],  ∀q(1)(·) ∈ Q1,     (41)

while the second is equivalent to

∃q(2)(·) ∈ Q2 :  (q(2)(t), y(2)[t]) ≥ 2 − µ,  t ∈ [τ, ϑ].     (42)
The compact sets Q1, Q2 above are of the form Q0 in (34), with the matrix N substituted by N1, N2, respectively. Relations (41), (42) are in turn equivalent to

∫_τ^ϑ (q(1)(t), y(1)[t]) dΛ1(t) ≤ µ ∫_τ^ϑ dΛ1(t),  ∀q(1)(·) ∈ Q1,     (43)

and

∃q(2)(·) ∈ Q2 :  2 ∫_τ^ϑ dΛ2(t) − ∫_τ^ϑ (q(2)(t), y(2)[t]) dΛ2(t) ≤ µ ∫_τ^ϑ dΛ2(t),     (44)

for all Λi(·) ∈ Var+[τ, ϑ], i = 1, 2. The third inequality in (40) is equivalent to

(l, x[ϑ] − m) ≤ µ,  ∀l : (l, M⁻¹l) ≤ 1.     (45)
Combining (43)–(45), we come to an equivalent system, observing that (40) is solvable iff there exists a function q(2)(·) ∈ Q2 such that

(l, x[ϑ] − m) + ∫_τ^ϑ (q(1)(t), y(1)[t]) dΛ1(t) − ∫_τ^ϑ (q(2)(t), y(2)[t]) dΛ2(t) + 2 ∫_τ^ϑ dΛ2(t)
  ≤ µ [ (l, M⁻¹l)^{1/2} + ∫_τ^ϑ dΛ1(t) + ∫_τ^ϑ dΛ2(t) ],
∀q(1)(·) ∈ Q1,  ∀l ∈ IRn,  ∀Λi(·) ∈ Var+[τ, ϑ], i = 1, 2.

The last relation may be rewritten as

max_{q(1)(·)} max_{Λ1(·)} max_{Λ2(·)} max_l { (s[τ], x) − (l, m) + ∫_τ^ϑ [ (s[t], B(t)u(t)) + (s[t], C(t)v(t)) ] dt + 2 ∫_τ^ϑ dΛ2(s) } ≤ µ,     (46)

in which the maximizations are over

q(1)(·) ∈ Q1,   {l, Λ1(·), Λ2(·)} ∈ D0 = { {l, Λ1(·), Λ2(·)} : (l, M⁻¹l)^{1/2} + ∫_τ^ϑ dΛ1(t) + ∫_τ^ϑ dΛ2(t) = 1 },

with q(2)(·) ∈ Q2 given. Here s[t] is the row-vector solution to the adjoint equation

ds = −sA(t)dt − q(1)′(t)K1 dΛ1(t) + q(2)′(t)K2 dΛ2(t),   s[ϑ] = l′.     (47)
We further have to minimize the left-hand side of (46) over u(·) with q(2)(·) given. As in the previous cases, applying a minmax theorem, we come to the next assertion.

Lemma 5.1. Inequalities (40) are solvable iff there exists q(2)(·) ∈ Q2 such that the inequality

max_{q(1)(·)} max_{Λ1(·)} max_{Λ2(·)} max_l { (s[τ], x) − (l, m) + ∫_τ^ϑ [ −(s[t]B(t)P(t)B′(t)s′[t])^{1/2} + (s[t], C(t)v(t)) ] dt + 2 ∫_τ^ϑ dΛ2(t) } ≤ µ

holds, where the maxima are over q(1)(·) ∈ Q1 and {l, Λ1(·), Λ2(·)} ∈ D0.

Theorem 5.4. The value function V3(τ, x) is given by the formula

V3(τ, x) = min_{q(2)} max_{q(1)} max_{Λ1,Λ2} max_l { (s[τ], x) − (l, m) + ∫_τ^ϑ [ −(s[t]B(t)P(t)B′(t)s′[t])^{1/2} + (s[t], C(t)v(t)) ] dt + 2 ∫_τ^ϑ dΛ2(t) },     (48)

wherein the minimum is over q(2)(·) ∈ Q2 and the maximum is over q(1)(·) ∈ Q1, {l, Λ1(·), Λ2(·)} ∈ D0.

Corollary 5.3. In general the reach set W3[τ] is nonconvex and is given by W3[τ] = {x : V3(τ, x) ≤ 1}.

W3[τ] is the set of points from which it is possible to ensure the state constraints ϕ1(t, y(1)[t]) ≤ 1, ϕ2(t, y(2)[t]) ≥ 1, t ∈ [τ, ϑ], while reaching the target set M at the prescribed time ϑ. The formulas of this section allow us to develop algorithms for systems with linear structure, as we illustrate next.
6 First Example

Consider the double integrator system on the time interval t ∈ [0, 2],

ẋ1 = x2,   ẋ2 = u,     (49)

with control |u| ≤ k. Let m = (m1, 0), ϕM(x) = (x − m, M(x − m))^{1/2}, M = M′ > 0, and ϕ1(x2) = |x2|. The objective is to calculate at time τ ∈ [0, 2) the backward reach set W[τ] = W(τ; 2, M) from the target set M = {x : (x − m, M(x − m))^{1/2} ≤ 1} under the state constraint |x2[t]| ≤ 1, t ∈ [τ, ϑ]. One way to proceed is to follow Problem 3.1A and find the value function

V(τ, x) = min_u max{ max_t {|x2[t]| | t ∈ [τ, ϑ]}, (x[ϑ] − m, M(x[ϑ] − m))^{1/2} | x[τ] = x }.     (50)

Then W[τ] = {x : V(τ, x) ≤ 1}. Instead we solve the problem through convex analysis along the lines of Ref. 4. We do this in several steps.

Step 1. Indicate conditions for the existence of a control u(t) that ensures the solvability of the two inequalities

(x[ϑ] − m, M(x[ϑ] − m))^{1/2} ≤ ε,   |x2[t]| ≤ µ,  t ∈ [τ, ϑ],     (51)

for specified ε, µ > 0.
Remark 6.1. Because the state constraint here is of dimension one, the function q(t) of (27) is constant, with q(t) ≡ ±1. This makes the scheme simpler: we may omit q(t) in (28)–(31) and subsequent formulas, if we replace the nondecreasing functions Var+[τ, ϑ] by arbitrary functions of bounded variation, Var[τ, ϑ], with the integral ∫_τ^ϑ dΛ(t) substituted by ∫_τ^ϑ |dΛ(t)|, as necessary.

Since

x1[ϑ] = x1 + (ϑ − τ)x2 + ∫_τ^ϑ (ϑ − s)u(s)ds,   x2[t] = x2 + ∫_τ^t u(s)ds,   x1 = x1[τ], x2 = x2[τ],

the inequalities (51) are equivalent to

l1(x1 + (ϑ − τ)x2) + l2 x2 − (m, l) + ∫_τ^ϑ (l1(ϑ − s) + l2)u(s)ds ≤ ε(l, M⁻¹l)^{1/2},  ∀l ∈ IR²,

∫_τ^ϑ dΛ(s) x2 + ∫_τ^ϑ ( ∫_s^ϑ dΛ(t) ) u(s)ds − µ ∫_τ^ϑ |dΛ(s)| ≤ 0,  ∀Λ(·) ∈ Var[τ, ϑ].

Adding the last two inequalities, we come to an equivalent relation

max_{l,Λ} { l1((x1 − m1) + (ϑ − τ)x2) + ( l2 + ∫_τ^ϑ dΛ(s) )x2 + ∫_τ^ϑ ( l1(ϑ − s) + l2 + ∫_s^ϑ dΛ(t) )u(s)ds − µ ∫_τ^ϑ |dΛ(s)| − ε(l, M⁻¹l)^{1/2} } ≤ 0,     (52)
in which the maximum is over l ∈ IR², Λ(·) ∈ Var[τ, ϑ]. Minimizing over u and applying a minimax theorem (Refs. 25, 26) gives us the next result.

Theorem 6.1. Inequalities (51) are solvable with some control u(t) iff

ε0(τ, x) = max_{l,Λ} Ψ(l, Λ, µ; τ, x) ≤ ε,

in which the maximum is over (l, M⁻¹l)^{1/2} ≤ 1, Λ(·) ∈ Var[τ, ϑ], and

Ψ(l, Λ, µ; τ, x) = l1((x1 − m1) + (ϑ − τ)x2) + ( l2 + ∫_τ^ϑ dΛ(s) )x2 − k ∫_τ^ϑ | l1(ϑ − s) + l2 + ∫_s^ϑ dΛ(t) | ds − µ ∫_τ^ϑ |dΛ(s)|.

Corollary 6.1. With µ = 1, the smallest ε that ensures solvability of inequalities (51) is

ε0 = ε0(τ, x) = min_u { ϕM(x[ϑ]) | x[τ] = x; |x2[t]| ≤ 1, t ∈ [τ, ϑ] }.
That is, ε0(τ, x) is the minimum terminal cost ϕM(x[ϑ]) under the state constraint |x2[t]| ≤ 1, ∀t ∈ [τ, ϑ]. The maximum in (52) is attained at some {l0, Λ0(·)}; the corresponding optimal trajectory, denoted x0[t], achieves ε0(τ, x).

Step 2. Calculate the value function ε0(τ, x) for Problem 3.1B: minimize (with respect to u) the terminal cost ϕM(x[ϑ]) under the state constraint |x2[t]| ≤ µ, ∀t. To do this we need properties of the maximizer {l0, Λ0(·)}. These were studied in Refs. 6, 7. We present them here in a form suited to our example.

Properties of the Maximizer.

(i) The function Λ0(t) ≡ const whenever |x02[t]| < 1.

(ii) The function

h0(t) = l01(ϑ − t) + l02 + ∫_t^ϑ dΛ0(s) ≡ 0

whenever |x02[t]| ≡ 1. Property (ii) holds because in our example the motion |x02[t]| ≡ 1 along the state constraint is not possible with the extreme values u(t) = ±k of the control, and therefore the maximum principle degenerates along this constraint: h0(t) ≡ 0 when |x02[t]| ≡ 1.

(iii) The equality |x02[t]| = 1 holds at most within one interval [τ1, τ2]. That is, depending upon the starting point {τ, x}, the optimal trajectory x0[t] either totally avoids the state constraint boundary or allows one interval [τ1, τ2] of motion along the boundary, |x02[t]| = 1, t ∈ [τ1, τ2]. This may be proved by direct calculation for our example.

(iv) The function Λ0(t) is continuous, which means that the generalized derivative dΛ0/dt does not include Dirac δ functions. This is true since the necessary condition for a discontinuity of Λ(·), namely (s0[t], ẋ0[t − 0]) ≠ (s0[t], ẋ0[t + 0]), is not fulfilled here at any point of the boundary of the state constraint. Here s0[t] is the solution to the adjoint equation (31),

ds1 = 0,   ds2 = −s1 dt − dΛ0(t),   s0[ϑ] = l0.

For the further calculation we have to consider several cases:

(a) τ < τ1 ≤ τ2 < ϑ;
(b) τ = τ1 ≤ τ2 < ϑ;
(c) τ < τ1 ≤ τ2 = ϑ;
(d) τ2 ≤ τ ≤ ϑ.

For each of these cases (a)–(d) we also have to consider the subcases

l01 ≤ 0, l02 ≤ 0;   l01 ≥ 0, l02 ≥ 0;   l01 ≤ 0, l02 ≥ 0;   l01 ≥ 0, l02 ≤ 0.
Case (a). First consider l10 ≤ 0, l20 ≤ 0, presuming x1 < m1 and |ϑ − τ | is sufficiently large. For t ∈ [τ1 , τ2 ] = Tc , Property (2) implies 0
h (t) =
l10 (ϑ
− t) +
l20
+
ϑ t
dΛ0(s) ≡ 0.
By virtue of Property (4) Λ0 (t) has no discontinuities and Λ 0 (t) ≡ Const for {t : |x2 [t]| < 1} = [τ, τ1 ) ∪ (τ2 , ϑ] = T c ; so the function Λ0 (t) is differentiable almost everywhere. Therefore, we may introduce λ0 (t) = dΛ0(t)/dt, observing that λ0 (t) ≡ −l10 , t ∈ Tc ; λ0 (t) ≡ 0, t ∈ T c . Hence, while maximizing Ψ(l, Λ, 1; τ, x) in (52), we need only consider functions of the type dΛ(·)/dt = {λ(·) : λ(t) ≡ −l1 , t ∈ Tc ; λ(t) ≡ 0, t ∈ T c }. Since τ < τ1 , we have x2 < 1, h(τ20 ) = l1 (2 − τ ) + l2 = 0 and at position {τ, x} we should also seek for the maximum in Ψ(l, Λ, 1; τ, x) among l 1 < 0, l2 < 0. Direct calculation gives (recall ϑ = 2 in (52)) Ψ(l, Λ, 1; τ, x) = Ψ0 (l, τ1 , τ2 , 1; τ, x) = l1 (x1 + (2 − τ )x2 ) + (l2 − l1 (τ2 − τ1 ))x2 − l1 m1 − −0.5k(τ1 − τ )2 |l1 | − 0.5k(2 − τ2 )|l2 | − |l1 |(τ2 − τ1 ), so that
ε⁰(τ, x) = max{Ψ⁰(l, τ1, τ2, µ; τ, x) | (l, M⁻¹l)^{1/2} ≤ 1, τ1, τ2}.   (53)
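The multiplier structure behind (53) can be checked numerically: with dΛ⁰/dt ≡ −l1 on Tc = [τ1, τ2] (zero elsewhere) and l2 fixed by h(τ2) = l1(2 − τ2) + l2 = 0, the function h(t) vanishes identically on Tc. A minimal sketch; all numeric values below are illustration choices, not data from the paper:

```python
import numpy as np

# Check of the Case (a) multiplier structure: with dLambda0/dt = -l1 on
# Tc = [tau1, tau2] (zero outside) and l2 fixed by h(tau2) = l1*(theta - tau2) + l2 = 0,
# the function h(t) = l1*(theta - t) + l2 + int_t^theta dLambda0(s) vanishes on Tc.
theta, tau1, tau2 = 2.0, 0.6, 1.4     # illustration values only
l1 = -0.7                             # Case (a) assumes l1 <= 0
l2 = -l1 * (theta - tau2)             # from h(tau2) = 0

def h(t):
    # integral of dLambda0 over [t, theta]; lambda0 = -l1 on [tau1, tau2]
    lam_int = -l1 * (tau2 - np.clip(t, tau1, tau2))
    return l1 * (theta - t) + l2 + lam_int

ts = np.linspace(tau1, tau2, 101)
residual = np.max(np.abs(h(ts)))   # should vanish identically on Tc
```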
Step 3. Check that the HJB equation holds for V⁰(τ, x) = ε⁰(τ, x). We simply substitute the value function V⁰(τ, x) above, found through duality techniques, into the HJB partial differential equation. Denoting the maximizer in (53) by {l1⁰, l2⁰, τ1⁰, τ2⁰}, applying the theorem on differentiation of max-type functions (Ref. 26) and noting that the maximizer is unique, we obtain the formulas

∂V⁰(τ, x)/∂x1 = l1⁰,  ∂V⁰(τ, x)/∂x2 = l1⁰(τ1⁰ − τ) + l1⁰(2 − τ2⁰) + l2⁰ = l1⁰(τ1⁰ − τ)
(since h⁰(τ2⁰) = l1⁰(2 − τ2⁰) + l2⁰ = 0), and ∂V⁰(τ, x)/∂τ = kl1⁰(τ1⁰ − τ) − l1⁰x2. Substituting the partials into the equation

∂V(τ, x)/∂τ + min_u {(∂V(τ, x)/∂x1)x2 + (∂V(τ, x)/∂x2)u} = 0,   (54)

we observe that

kl1⁰(τ1⁰ − τ) − l1⁰x2 + min_u {l1⁰x2 + l1⁰(τ1⁰ − τ)u} = 0,
so that the HJB equation (54) is indeed satisfied in the domain under consideration.

Step 4. Calculate the function V⁰(τ, x) itself. We first maximize Ψ⁰(l, τ1, τ2, 1; τ, x) with respect to l, rewriting

V⁰(τ, x; τ1, τ2) = max{Ψ⁰(l, τ1, τ2, 1; τ, x) | (l, M⁻¹l)^{1/2} ≤ 1} = max{l1f1(τ1, τ2, 1; τ, x) + l2f2(τ1, τ2; τ, x) | (l, M⁻¹l)^{1/2} ≤ 1},

in which f = (f1, f2),

f1(τ1, τ2, 1; τ, x) = (x1 − m1) + (τ1 − τ)x2 − (k/2)(τ1 − τ)² − (τ2 − τ1),  f2(τ1, τ2; τ, x) = 0.5k(2 − τ2).

Then
V⁰(τ, x; τ1, τ2) = (f(τ1, τ2, 1; τ, x), Mf(τ1, τ2, 1; τ, x))^{1/2}

and

V⁰(τ, x) = max_{τ1, τ2} {V⁰(τ, x; τ1, τ2) | τ ≤ τ1 ≤ τ2 ≤ ϑ}.   (55)
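For a fixed position {τ, x}, formula (55) can be evaluated by brute force, maximizing (f, Mf)^{1/2} over the triangle τ ≤ τ1 ≤ τ2 ≤ ϑ. A sketch using the example's parameters k = 1, m1 = 0.5, M = diag(8, 8), ϑ = 2; the test point and grid resolution are assumptions for illustration:

```python
import numpy as np

# Brute-force evaluation of (55): V0(tau, x) = max over tau <= tau1 <= tau2 <= theta
# of (f, M f)^{1/2}, with f1, f2 as in Step 4, Case (a).
k, m1, theta = 1.0, 0.5, 2.0
M = np.diag([8.0, 8.0])

def f_vec(tau1, tau2, tau, x1, x2):
    f1 = (x1 - m1) + (tau1 - tau) * x2 - 0.5 * k * (tau1 - tau) ** 2 - (tau2 - tau1)
    f2 = 0.5 * k * (2.0 - tau2)
    return np.array([f1, f2])

def V0(tau, x1, x2, n=60):
    # grid over the triangle tau <= tau1 <= tau2 <= theta
    ts = np.linspace(tau, theta, n)
    best = -np.inf
    for t1 in ts:
        for t2 in ts[ts >= t1]:
            f = f_vec(t1, t2, tau, x1, x2)
            best = max(best, float(np.sqrt(f @ M @ f)))
    return best

val = V0(0.0, -1.0, 0.5)
```

A finer grid, or a smooth optimizer started from the best grid point, sharpens the answer; the coarse grid already exhibits the max structure of (55).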
Case (b). This is calculated for the initial position {τ, x} similarly to Case (a) above. Namely, taking x1 < m1 and assuming l1⁰ ≥ 0, l2⁰ ≥ 0, we have x2 = 1. Thus the state is on the constraint boundary (x2(t) ≡ 1, t ∈ Tc), except for the case τ1 = τ2 when the boundary is visited at an isolated point (this case may be treated separately). Direct calculation gives

Ψ⁰(l, τ1, τ2, µ; τ, x) = (x1 − m1)l1 − (k/2)(2 − τ2)|l2| − |l1|(τ2 − τ),

so that now ∂V⁰(τ, x)/∂x1 = l1⁰, ∂V⁰(τ, x)/∂x2 = 0, ∂V⁰(τ, x)/∂τ = l1⁰, and the total derivative is

dV⁰(τ, x)/dt|_{u=u⁰} = l1⁰ + l1⁰x2⁰[τ] = 0,
with x2⁰[τ] = 1 and l1⁰ = 0. In this case the formula for the total derivative carries no information on what the control u⁰(t) should be when moving along the constraint boundary. What helps is the condition along the state constraint boundary (see (9)): for t ∈ [τ1, τ2],

min_u max{dV(t, x[t])/dt|_u, dϕ1(t, x[t])/dt|_u} = dV⁰(t, x⁰[t])/dt|_{u=u⁰} = dϕ1(x2⁰[t])/dt|_{u=u⁰} ≡ 0.
Here dϕ1(x2⁰[t])/dt|_u = u ≡ 0, which means that along the boundary u⁰(t) ≡ 0, t ∈ Tc. Indeed, with x2⁰[τ] = 1, we have dx2⁰/dt = u = 0 and x2⁰[t] ≡ 1, t ∈ Tc.

Case (d). As τ2 ≤ τ, the optimal trajectory avoids the state constraint boundary (x2⁰[t] < 1, t ∈ (τ, ϑ]). Here we assume l1 ≥ 0, l2 ≥ 0. Our problem is then to minimize ϕM(x[ϑ]) without state constraints. (This case occurs when ϑ − τ is fairly small.) Then

V⁰(τ, x) = ε⁰(τ, x) = max{Ψ(l; τ, x) | (l, M⁻¹l)^{1/2} ≤ 1},

Ψ(l; τ, x) = l1((x1 − m1) + (ϑ − τ)x2) + l2x2 − k∫_τ^ϑ |l1(ϑ − s) + l2| ds,

and l1⁰(ϑ − t) + l2⁰ > 0. This gives ∂V⁰(t, x)/∂x1 = l1⁰, ∂V⁰(t, x)/∂x2 = l1⁰(ϑ − t) + l2⁰, ∂V⁰(t, x)/∂t = k|l1⁰(ϑ − t) + l2⁰| − l1⁰x2, and

∂V⁰(t, x)/∂τ + min_u {(∂V⁰(t, x)/∂x1)x2 + (∂V⁰(t, x)/∂x2)u} = k|l1⁰(ϑ − τ) + l2⁰| + min_u (l1⁰(ϑ − τ) + l2⁰)u = 0.
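The same cancellation can be confirmed by finite differences: taking the Case (d) closed form V⁰ = (f, Mf)^{1/2}, with f1 = (x1 − m1) + (ϑ − τ)x2 − 0.5k(ϑ − τ)² and f2 = x2 − k(ϑ − τ), numerical partials substituted into (54) give a residual near machine precision wherever f1, f2 > 0. The test point is an illustrative assumption:

```python
import numpy as np

# Finite-difference check that the Case (d) value function
#   V0(tau, x) = (f, M f)^{1/2},
#   f1 = (x1 - m1) + (theta - tau)*x2 - 0.5*k*(theta - tau)**2,
#   f2 = x2 - k*(theta - tau),
# satisfies the HJB equation (54) wherever f1, f2 > 0.
k, m1, theta = 1.0, 0.5, 2.0
M = np.diag([8.0, 8.0])

def V(tau, x1, x2):
    dt = theta - tau
    f = np.array([(x1 - m1) + dt * x2 - 0.5 * k * dt ** 2, x2 - k * dt])
    return np.sqrt(f @ M @ f)

tau, x1, x2 = 1.5, 1.0, 0.8   # here f = (0.775, 0.3), both positive
eps = 1e-5                    # central-difference step
dV_tau = (V(tau + eps, x1, x2) - V(tau - eps, x1, x2)) / (2 * eps)
dV_x1 = (V(tau, x1 + eps, x2) - V(tau, x1 - eps, x2)) / (2 * eps)
dV_x2 = (V(tau, x1, x2 + eps) - V(tau, x1, x2 - eps)) / (2 * eps)

# HJB (54): V_tau + V_x1*x2 + min_{|u|<=k} V_x2*u = V_tau + V_x1*x2 - k*|V_x2|
residual = dV_tau + dV_x1 * x2 - k * abs(dV_x2)
```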
Direct calculation also gives

V⁰(τ, x) = ε⁰(τ, x) = max{(l, f) | (l, M⁻¹l)^{1/2} ≤ 1} = (f, Mf)^{1/2},

in which f1 = (x1 − m1) + (ϑ − τ)x2 − 0.5k(ϑ − τ)², f2 = x2 − k(ϑ − τ).

Case (c). This follows from the formulas for Case (b).

We have thus looked through the main cases (a)-(d) and found the value functions for the relevant subcases of l1, l2, which can be traced from Figure 1. The other subcases are treated similarly.

Remark 6.2. In the example we observed that
• in the interior of the state constraint (|x2| < 1) the value function V⁰(τ, x) satisfies the HJB equation (7), and
• on the boundary (|x2| ≡ 1) it satisfies the boundary condition

max{dV(t, x[t])/dt|_u, dϕ1(t, x[t])/dt|_u} ≥ dV⁰(t, x⁰[t])/dt|_{u=u⁰} = dϕ1(x2⁰[t])/dt|_{u=u⁰} ≡ 0, ∀u ∈ P(t),

as indicated in (9).

In order to get the total value function V⁰(τ, x) we have to go through all cases (a)-(d) with their subcases, that is, all combinations li ≥ 0, li ≤ 0, i = 1, 2. The procedure is somewhat lengthy, but one avoids solving the HJB equation with a complex boundary condition.

To solve Problem 3.1A we may follow the same approach. Namely, after reaching relation (52) we set ε = µ, which leads to the next result.

Theorem 6.2. The solution to Problem 3.1A is given by

µ⁰(τ, x) = max_{l,Λ} {Φ(l, Λ; τ, x)},
in which the maximum is over all (l, M⁻¹l)^{1/2} + ∫_τ^ϑ |dΛ(t)| ≤ 1, l ∈ IRⁿ, Λ(·) ∈ Var[τ, ϑ]. Here

Φ(l, Λ; τ, x) = l1((x1 − m1) + (ϑ − τ)x2) + l2(x2 + ∫_τ^ϑ dΛ(s)) − k∫_τ^ϑ |l1(ϑ − s) + l2 + ∫_s^ϑ dΛ(t)| ds.
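Both theorems rest on the same elementary duality evaluation used throughout this section: max{(l, f) | (l, M⁻¹l)^{1/2} ≤ 1} = (f, Mf)^{1/2}. A quick numerical confirmation; M matches the example, while f is an arbitrary illustration vector:

```python
import numpy as np

# Check: max{(l, f) | (l, M^{-1} l)^{1/2} <= 1} = (f, M f)^{1/2}.
# Boundary points of the ellipsoid can be written l = M^{1/2} e with ||e|| = 1.
M = np.diag([8.0, 8.0])
f = np.array([-0.9, 0.35])

phi = np.linspace(0.0, 2.0 * np.pi, 200001)
e = np.stack([np.cos(phi), np.sin(phi)], axis=1)   # unit vectors e
ls = e @ np.sqrt(M)                                # ellipsoid boundary points
brute = float(np.max(ls @ f))                      # maximize (l, f) over the boundary
closed = float(np.sqrt(f @ M @ f))                 # the closed form (f, M f)^{1/2}
```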
Thus, in order to solve Problem 3.1A, we first solve Problem 3.1B and then, by equating ε = µ, pass to 3.1A. Further analysis of Problem 3.1A is then similar to 3.1B.

Figure 1 shows a particular version of this example approximated numerically by level set methods (Refs. 20-22). The parameters chosen are

k = 1,  m = (0.5, 0),  M = diag(8, 8).
The figure shows the final reach set at ϑ = 2 without constraint (dashed line) and subject to the convex constraints |x2 (t)| ≤ 1 (thick solid line). The constrained reach set is much smaller than the intersection of the state constraints and the unconstrained reach set.
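The qualitative picture can be reproduced with a very crude grid-based dynamic programming scheme, a simple stand-in for the level set computations of Refs. 20-22. The circular target of radius 0.3 around (0.5, 0), the grid, and the time step below are illustrative assumptions, not the parameters behind Figure 1:

```python
import numpy as np

# Crude grid dynamic programming for the backward reach set of the double
# integrator x1' = x2, x2' = u, |u| <= k, under the state constraint |x2| <= 1.
k, theta, dt = 1.0, 2.0, 0.05
xs = np.linspace(-2.0, 2.0, 81)    # x1 grid
ys = np.linspace(-1.5, 1.5, 61)    # x2 grid
X1, X2 = np.meshgrid(xs, ys, indexing="ij")

phi_M = np.sqrt((X1 - 0.5) ** 2 + X2 ** 2) - 0.3   # target level function
phi_1 = np.abs(X2) - 1.0                           # constraint |x2| <= 1

def interp(V, p1, p2):
    # bilinear interpolation of V at (p1, p2), clamped to the grid
    i = np.clip((p1 - xs[0]) / (xs[1] - xs[0]), 0, len(xs) - 1.001)
    j = np.clip((p2 - ys[0]) / (ys[1] - ys[0]), 0, len(ys) - 1.001)
    i0, j0 = i.astype(int), j.astype(int)
    di, dj = i - i0, j - j0
    return ((1 - di) * (1 - dj) * V[i0, j0] + di * (1 - dj) * V[i0 + 1, j0]
            + (1 - di) * dj * V[i0, j0 + 1] + di * dj * V[i0 + 1, j0 + 1])

V = phi_M.copy()
for _ in range(round(theta / dt)):
    # the Hamiltonian is linear in u, so only the extreme controls u = +-k matter
    Vnext = np.minimum(interp(V, X1 + dt * X2, X2 + dt * k),
                       interp(V, X1 + dt * X2, X2 - dt * k))
    V = np.maximum(phi_1, Vnext)   # enforce the state constraint pointwise

reach = V <= 0.0   # grid approximation of the constrained backward reach set
```

The pointwise maximum against phi_1 at every step is the discrete counterpart of the variational-inequality treatment of the state constraint.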
7 Second Example

In this section we consider an example with a nonconvex constraint (see Problem 3.2) and then add a convex constraint as well (see Problem 3.3). On the finite interval t ∈ [0, ϑ] consider the system

ẋ1 = u1,  ẋ2 = u2,   (56)
with controls |ui| ≤ k, i = 1, 2, k ≥ 1. Let m = (0, m2) be a terminal point, with m2 > 1, and

ϕM(x) = (x − m, x − m)^{1/2},  ϕ1(x) = x2,  ϕ2(x) = (x1² + x2²)^{1/2}.

Our objective is to calculate at time τ ∈ [0, ϑ) the backward reach set W[τ] = W(τ; ϑ, M) from the terminal set M = {x : (x − m, x − m)^{1/2} ≤ 1} under the state constraints ϕ1(x[t]) ≤ 2, ϕ2(x[t]) ≥ 1, t ∈ [τ, ϑ]. We could calculate the value function for system (56),
V(τ, x) = min_u max{ max{ϕ1(x[t]) − 1, −ϕ2(x[t]) + 2 | t ∈ [τ, ϑ]}, ϕM(x[ϑ]) | x[τ] = x }.   (57)

We would then have W[τ] = {x : V(τ, x) ≤ 1}. To calculate V we use techniques of convex analysis and duality theory along the lines of Refs. 4, 6. We first presume that there is only one state constraint, namely ϕ2(x[t]) ≥ 1. (The second constraint, ϕ1(x[t]) ≤ 2, will be added later.) Therefore we first solve Problem 3.2, proceeding in several steps.

Step 1. Indicate conditions for the existence of a control u(t) that ensures solvability of the inequalities

(x[ϑ] − m, x[ϑ] − m)^{1/2} ≤ ε,  ϕ2(x[t]) ≥ r2, ∀t ∈ [τ, ϑ],   (58)

with ε, r2 > 0 given. Since

x[t] = x(t, τ, x) = x + ∫_τ^t u(s) ds,  x = x[τ],  x, u ∈ IRⁿ,
the inequalities (58) are respectively equivalent to the two relations

(l, x − m) + ∫_τ^ϑ (l, u(s)) ds ≤ ε(l, l)^{1/2}, ∀l ∈ IR²,

and: there exists q∗(2)(·) ∈ Q0 with

2∫_τ^ϑ dΛ(t) − ∫_τ^ϑ ( q∗(2)(t), x + ∫_τ^t u(s) ds ) dΛ(t) ≤ r2∫_τ^ϑ dΛ(t), ∀Λ(·) ∈ Var+[τ, ϑ].

First consider q∗(2)(·) to be a fixed element of Q0. Adding the last two inequalities, we then come to the equivalent relation

2∫_τ^ϑ dΛ(t) − (l, m) + ( l − ∫_τ^ϑ q∗(2)(s) dΛ(s), x ) + ∫_τ^ϑ ( l − ∫_t^ϑ q∗(2)(s) dΛ(s), u(t) ) dt ≤ ε(l, l)^{1/2} + r2∫_τ^ϑ dΛ(t), ∀l ∈ IRⁿ, ∀Λ(·) ∈ Var+[τ, ϑ].   (59)
Minimizing over u and applying a minimax theorem (Refs. 25, 26), we observe that there exists a control u which ensures that (59) is solvable for fixed q∗(2)(·) iff the inequality

2∫_τ^ϑ dΛ(t) + (s[τ], x) − (l, m) − k∫_τ^ϑ (|s1[t]| + |s2[t]|) dt ≤ ε(l, l)^{1/2} + r2∫_τ^ϑ dΛ(t)   (60)

holds for all l ∈ IR² and all Λ(·) ∈ Var[τ, ϑ]. Here

s[t] = l − ∫_t^ϑ q∗(2)(s) dΛ(s)

is the solution, for q(2)(t) = q∗(2)(t), of the adjoint equation ds = q(2)(t) dΛ(t), s[ϑ] = l.

Step 2. To obtain solvability conditions for (58), we recall that there should exist a function q∗(2)(·) ∈ Q0 which ensures solvability of (59). We therefore remove at this stage the assumption that q∗(2) is fixed and, omitting the asterisk, minimize the left-hand side of (60) over q(2). This yields the next proposition.

Theorem 7.1. In order for (58) to be solvable it is necessary and sufficient that the inequality

min_{q(2)} max_{l,Λ} {2∫_τ^ϑ dΛ(t) + (s[τ], x) − (l, m) − k∫_τ^ϑ (|s1[t]| + |s2[t]|) dt} ≤ 1

holds, in which the maximum and minimum are over {l, Λ(·) : ε(l, l)^{1/2} + r2∫_τ^ϑ dΛ(t) = 1} and q(2)(·) ∈ Q0.
To find the value function V(τ, x) of (57) we set µ = r2 = ε.

Theorem 7.2. The value function is

V(τ, x) = min_{q(2)(·)} max_{l,Λ(·)} {2∫_τ^ϑ dΛ(t) + (s[τ], x) − (l, m) − k∫_τ^ϑ (|s1[t]| + |s2[t]|) dt},

in which the maximum and minimum are over {l, Λ(·) : (l, l)^{1/2} + ∫_τ^ϑ dΛ(t) = 1} and q(2)(·) ∈ Q0.
V(τ, x) is the smallest µ for which (58) is solvable, with r2 = ε = µ. Then the reach set is W2[τ] = {x : V(τ, x) ≥ 1}, and Problem 3.2 is solved.

We now add the convex constraint ϕ1(x[t]) ≤ r1, find the smallest ε for which the inequalities x1²[t] + x2²[t] ≥ r2² and (x1[ϑ])² + (x2[ϑ] − m2)² ≤ ε² hold, and at the end substitute r1 = 2, r2 = 1. We denote Λ by Λ2.

Step 1. Calculate the value function ε⁰(τ, x). Adding the new constraint and repeating along the lines above, we note that (58) will be augmented by ϕ1(t, x[t]) ≤ r1, which may be represented as

∫_τ^ϑ ( q∗(1)(t), x + ∫_τ^t u(s) ds ) dΛ1(t) ≤ r1∫_τ^ϑ dΛ1(t), ∀Λ1(·) ∈ Var+[τ, ϑ].
Following the previous scheme and adding the last inequality, we observe that (60) will be transformed into the inequality

2∫_τ^ϑ dΛ2(t) + (s[τ], x) − (l, m) − k∫_τ^ϑ (|s1[t]| + |s2[t]|) dt ≤ ε(l, l)^{1/2} + r1∫_τ^ϑ dΛ1(t) − r2∫_τ^ϑ dΛ2(t),   (61)
which must hold for all l ∈ IR² and all Λi(·) ∈ Var+[τ, ϑ], i = 1, 2. Let the matrix N1 = {n1_{ij}} be of dimension 2 × 2 with the only nonzero element n1_{22} = 1, and let N2 be the unit matrix of dimension 2. Then

s[t] = l − ∫_t^ϑ ( −N1q∗(1)(s) dΛ1(s) + N2q∗(2)(s) dΛ2(s) )

is the solution to the new adjoint equation

ds = −N1q(1)(t) dΛ1(t) + N2q(2)(t) dΛ2(t),  s[ϑ] = l,

with 2-dimensional vectors q(i)(t) = q∗(i)(t), i = 1, 2.

Assuming the q∗(i) no longer fixed, we omit the asterisks, maximize over q(1) and minimize over q(2), to get
ε⁰(τ, x) = min_{q(2)} max_{q(1)} max_{l,Λ1,Λ2} { (s[τ], x) − (l, m) − k∫_τ^ϑ (|s1[t]| + |s2[t]|) dt − ∫_τ^ϑ (r1 dΛ1(t) − r2 dΛ2(t)) }.

Here the minimum is over q(2)(·) ∈ Q0², and the maximum is over q(1)(·) ∈ Q0¹, Λi ∈ Var+[τ, ϑ], i = 1, 2, (l, l)^{1/2} ≤ 1. Then ε⁰(τ, x) = V3⁰(τ, x),
V3⁰(τ, x) = min_u { ϕM(x[ϑ]) | ϕ1(t, x[t]) ≤ r1, ϕ2(t, x[t]) ≥ r2, t ∈ [τ, ϑ]; x(τ) = x }.
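V3⁰ can likewise be approximated by a crude grid recursion that alternates an Euler step over the extreme controls with a pointwise maximum against a constraint-violation function. Grid sizes, the time step, and the target radius 1 around m = (0, 1.3) are illustrative assumptions:

```python
import numpy as np

# Crude grid dynamic programming sketch of V3_0 for system (56):
# x1' = u1, x2' = u2, |u_i| <= k, target ball of radius 1 around m = (0, 1.3),
# constraints phi_1 = x2 <= r1 = 2 and phi_2 = (x1^2 + x2^2)^{1/2} >= r2 = 1.
k, theta, dt = 1.0, 1.0, 0.05
r1, r2 = 2.0, 1.0
m = np.array([0.0, 1.3])
xs = np.linspace(-2.5, 2.5, 101)
ys = np.linspace(-2.5, 2.5, 101)
X1, X2 = np.meshgrid(xs, ys, indexing="ij")

phi_M = np.sqrt((X1 - m[0]) ** 2 + (X2 - m[1]) ** 2) - 1.0   # <= 0 inside target
g = np.maximum(X2 - r1, r2 - np.sqrt(X1 ** 2 + X2 ** 2))     # > 0 where constraints fail

def interp(V, p1, p2):
    # bilinear interpolation of V at (p1, p2), clamped to the grid
    i = np.clip((p1 - xs[0]) / (xs[1] - xs[0]), 0, len(xs) - 1.001)
    j = np.clip((p2 - ys[0]) / (ys[1] - ys[0]), 0, len(ys) - 1.001)
    i0, j0 = i.astype(int), j.astype(int)
    di, dj = i - i0, j - j0
    return ((1 - di) * (1 - dj) * V[i0, j0] + di * (1 - dj) * V[i0 + 1, j0]
            + (1 - di) * dj * V[i0, j0 + 1] + di * dj * V[i0 + 1, j0 + 1])

V = np.maximum(phi_M, g)
for _ in range(round(theta / dt)):
    # the Hamiltonian is linear in each u_i, so the 4 corner controls suffice
    Vnext = np.min([interp(V, X1 + dt * u1 * k, X2 + dt * u2 * k)
                    for u1 in (-1.0, 1.0) for u2 in (-1.0, 1.0)], axis=0)
    V = np.maximum(g, Vnext)

reach = V <= 0.0   # grid approximation of the constrained backward reach set
```

The nonconvex obstacle enters only through the pointwise maximum with g, which is what makes this recursion a convenient cross-check for the duality formulas.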
We now study the optimizers. We note the following properties of trajectories on the time interval [0, 1].
• The trajectories of (56) divide into three classes: (i) those that avoid the state constraint boundary, (ii) those that meet the constraint boundary ϕ2 = r2, and (iii) those that meet the constraint boundary ϕ1 = r1.
• No trajectory meets the boundary of both state constraints.
• A trajectory in class (ii) or (iii) visits the state constraint boundary within only one interval of time.
We can thus distinguish three types of formulas for the value function V3⁰(τ, x).
(I). If position {τ, x} generates a trajectory of class (i), which does not meet the constraint boundary, then dΛi(t) ≡ 0, s(t) ≡ l, and

V3⁰(τ, x) = max_l {(l, x − m) − k(ϑ − τ) | (l, l) ≤ 1},
so that ‖x − m‖ ≤ k(ϑ − τ).

(II). If position {τ, x} generates a trajectory of class (ii), we have dΛ1(t) ≡ 0, and proceed as follows. Denote the optimizers in the previous relation by q(2)0(·), Λ2⁰(·), l⁰. Note that here Q0 comprises functions q(t) with coordinates q1 = π sin(t − α), q2 = π cos(t − α), π ∈ [0, ρ], ρ > 0. The functions q(i)0(t) are nonzero only when the trajectory is on the boundary of the state constraint. In our example they turn out to be collinear with the trajectory x[t] when the latter moves along a circle of radius r2, namely, q(2)0(t) = r2⁻¹x[t]. We now need other properties of the optimizers, which we present in a form suited to our example.
(i) The function Λ2⁰(t) ≡ const whenever ‖x[t]‖ > r2.
(ii) The function

h⁰(t) = −l⁰ + ∫_t^ϑ q(2)0(s) dΛ2⁰(s) ≡ 0, whenever ‖x[t]‖ ≡ r2.
This property holds because in our example the motion ‖x[t]‖ ≡ r2 along the state constraint is not possible with extreme values ui(t) = ±k of the control, so that the maximum principle degenerates along this constraint: h⁰(t) ≡ 0 when ‖x[t]‖ ≡ r2. Indeed, the last equality is possible iff

d(x[t], x[t])/dt = 2(ẋ1[t]x1[t] + ẋ2[t]x2[t]) = 2(x[t], u[t]) ≡ 0.

This indicates that the control u is tangent to the state constraint.
(iii) The equality ‖x⁰[t]‖ = r2 holds within at most one interval [τ1, τ2]. That is, depending upon the starting point {τ, x}, the optimal trajectory x⁰[t] either totally avoids the state constraint boundary or allows just one interval [τ1, τ2] of motion along the boundary: ‖x[t]‖ ≡ r2, t ∈ [τ1, τ2]. This may be proved directly.
(iv) The function Λ2⁰(t) is discontinuous, so that the generalized derivative dΛ2⁰/dt may include δ functions. The necessary condition for discontinuity of Λ2⁰(·), namely the condition (s⁰[t], ẋ⁰[t − 0]) = (s⁰[t], ẋ⁰[t + 0]), is now fulfilled: the optimal trajectory x⁰(t) is tangent to the circle ‖x‖ = r2 at the time instants of arrival and departure. This may be observed directly.
For further calculation we consider the cases: (a) τ < τ1 ≤ τ2 ≤ ϑ; (b) τ = τ1 ≤ τ2 ≤ ϑ; (c) τ2 ≤ τ ≤ ϑ. We shall look for the backward reach set from position {ϑ, M} with target set M = {x : ‖x − m‖ ≤ ε}, assuming x1[τ] = x1 ≤ m1. (For x1 > m1 the trajectory does not reach the complementary convex constraint; that is, there is no class (ii) trajectory.) Take case (a) with l1 > 0, l2 < 0. Then the condition h⁰(t) ≡ 0 gives λ⁰(t) ≡ 0, t ∈ (τ1, τ2), so

q(2)0(t)λ⁰(t) = q(2)0(τ1)δ(t − τ1) + q(2)0(τ2)δ(t − τ2),

wherein q(2)0(τ1) = x[τ1]/r2 and q(2)0(τ2) = x[τ2]/r2 are as indicated above, and τ1, τ2 are to be found. The final optimization problem is over the vector l ∈ IR² and the points τ1, τ2. Then
V3⁰(τ, x) = (s⁰(τ), x) − (l⁰, m) − k( (|l1⁰| + |l2⁰|)(τ1 − τ) + (|q1(2)0(τ2)| + |q2(2)0(τ2)|)(ϑ − τ2) ),

in which

ṡ = q(2)0(τ1)δ(t − τ1) + q(2)0(τ2)δ(t − τ2),  s(ϑ) = l,

so that
s⁰(t) = l⁰ for t ∈ [τ, τ1), s⁰(t) ≡ 0 for t ∈ (τ1, τ2), and s⁰(t) = q(2)0(τ2) for t ∈ [τ2, ϑ]. The controls along the state constraint boundary (t ∈ [τ1, τ2]) are ui⁰(t) = r2⁻¹xi⁰(t). For t < τ1 we have ui⁰(t) = k sign li⁰, and for t > τ2 we again have ui⁰(t) = k sign li⁰. Recall that l⁰ = l(t, x⁰(t)). Cases (b), (c) follow from (a).

(III). If position {τ, x} generates a class (iii) trajectory, we have dΛ2(t) ≡ 0, and then ds = −q(1)(t) dΛ1(t), s(ϑ) = l. Further details are similar in nature to the previous case. The optimal trajectory consists of one or two intervals during which x⁰(t) lies away from the boundary, with ui⁰(t) = k sign li⁰, and one interval during which x2⁰[t] ≡ r1.

Figure 2 shows a particular version of this example approximated numerically in the same manner as for Figure 1. The parameters are
k = 1,  m = (0, 1.3).
The figure shows the final reach set at ϑ = 1 with no constraint (dashed line) and subject to the combination of convex constraint |x 2 [t]| ≤ 2 and nonconvex constraint x21 + x22 ≥ 1 (thick solid line).
8 Conclusions This paper presents some basic solution schemes for nonstandard dynamic programming problems with state constraints, motivated by requirements in control design for automation and navigation. The solutions are given in the form of generalized HJB-type relations or, in the linear case, through duality relations of convex analysis and minimax theory. Illustrative examples are worked out.
References

1. KURZHANSKI, A.B., and VARAIYA, P., Optimization Techniques for Reachability Analysis, Journal of Optimization Theory and Applications, Vol. 108, No. 2, pp. 227-251, 2001.
2. KURZHANSKI, A.B., and VARAIYA, P., Reachability under State Constraints: the Ellipsoidal Technique, Proceedings of the IFAC-2002 World Congress, Barcelona, Spain, Elsevier, 2002.
3. KURZHANSKI, A.B., and VARAIYA, P., On Some Nonstandard Dynamic Programming Problems of Control Theory, Variational Methods and Applications, Edited by F. Giannessi and A. Maugeri, Kluwer Academic Publishers, New York, New York, pp. 613-627, 2004.
4. GUSEV, M.I., and KURZHANSKI, A.B., Optimization of Controlled Systems with Bounds on the Controls and State Coordinates, Differential Equations, Vol. 7, No. 9, pp. 1591-1602, 1971, and Vol. 7, No. 10, pp. 1789-1800, 1971.
5. KRASOVSKII, N.N., Game-Theoretic Problems on the Encounter of Motions, Nauka, Moscow, 1970 (in Russian); English translation: Rendezvous Game Problems, National Technical Information Service, Springfield, Virginia, 1971.
6. KURZHANSKI, A.B., Control and Observation under Uncertainty, Nauka, Moscow, 1977 (in Russian).
7. KURZHANSKI, A.B., and OSIPOV, Yu.S., On Optimal Control under Restricted Coordinates, PMM (Applied Mathematics and Mechanics), Vol. 33, No. 4, 1969.
8. LEITMANN, G., Optimality and Reachability via Feedback Controls, Dynamical Systems and Microphysics, Edited by A. Blaquiere and G. Leitmann, Academic Press, New York, New York, pp. 119-141, 1982.
9. LYGEROS, J., On the Relation of Reachability to Minimum-Cost Control, Proceedings of the 41st IEEE Conference on Decision and Control, Las Vegas, Nevada, 2002.
10. KURZHANSKI, A.B., MITCHELL, I.M., and VARAIYA, P., Control Synthesis for State Constrained and Obstacle Problems, Proceedings of the IFAC Symposium NOLCOS-2004, Stuttgart, Germany, Elsevier, 2004.
11. FILIPPOV, A.F., Differential Equations with Discontinuous Righthand Sides, Kluwer Academic Publishers, Dordrecht, Netherlands, 1988.
12. AUBIN, J.-P., Viability Theory, Birkhauser, Boston, Massachusetts, 1991.
13. ROCKAFELLAR, R.T., Convex Analysis, 2nd Edition, Princeton University Press, Princeton, New Jersey, 1999.
14. LYGEROS, J., TOMLIN, C., and SASTRY, S., Controllers for Reachability Specifications for Hybrid Systems, Automatica, Vol. 35, No. 3, pp. 349-370, 1999.
15. FLEMING, W.H., and SONER, H.M., Controlled Markov Processes and Viscosity Solutions, Springer-Verlag, New York, New York, 1993.
16. CRANDALL, M.G., EVANS, L.C., and LIONS, P.-L., Some Properties of Viscosity Solutions of Hamilton-Jacobi Equations, Transactions of the American Mathematical Society, Vol. 282, No. 2, pp. 487-502, 1984.
17. LIONS, P.-L., Viscosity Solutions and Optimal Control, Proceedings of the International Congress on Industrial and Applied Mathematics, ICIAM 91, SIAM Publications, Philadelphia, Pennsylvania, pp. 182-195, 1992.
18. SUBBOTIN, A.I., Generalized Solutions of First-Order PDEs: The Dynamical Optimization Perspective, Birkhauser, Boston, Massachusetts, 1995.
19. BARDI, M., and CAPUZZO-DOLCETTA, I., Optimal Control and Viscosity Solutions of Hamilton-Jacobi-Bellman Equations, Birkhauser, Boston, Massachusetts, 1997.
20. SETHIAN, J.A., Level Set Methods and Fast Marching Methods, Cambridge University Press, Cambridge, UK, 1999.
21. OSHER, S., and FEDKIW, R., Level Set Methods and Dynamic Implicit Surfaces, Springer-Verlag, New York, New York, 2002.
22. MITCHELL, I., A Toolbox of Level Set Methods, Technical Report TR-2004-09, Department of Computer Science, University of British Columbia, Vancouver, British Columbia, Canada, 2004; available at http://www.cs.ubc.ca/~mitchell/ToolboxLS.
23. VARAIYA, P., Reach Set Computation Using Optimal Control, Proceedings of the KIT Workshop on Verification of Hybrid Systems, Verimag, Grenoble, France, 1998.
24. ROCKAFELLAR, R.T., and WETS, R.J.B., Variational Analysis, Springer-Verlag, New York, New York, 1998.
25. FAN, K., Minimax Theorems, Proceedings of the National Academy of Sciences of the USA, Vol. 39, No. 1, pp. 42-47, 1953.
26. DEMIANOV, V.F., and MALOZEMOV, V.N., Introduction to Minimax, Halsted Press, New York, New York, 1974.
27. BOYD, S., EL GHAOUI, L., FERON, E., and BALAKRISHNAN, V., Linear Matrix Inequalities in System and Control Theory, SIAM Publications, Philadelphia, Pennsylvania, 1994.
Captions for figures. Figure 1. Growth of the reach set subject to convex constraints for a double integrator (solid lines; final time is shown thicker). The target set (dotted line) and constraints (shaded region) are shown as well. The unconstrained reach set at the same final time is shown for comparison (dashed line). Figure 2. Growth of the reach set subject to convex and nonconvex constraints (solid lines; final time is shown thicker). The target set (dotted circle) and constraints (shaded regions) are shown as well. The unconstrained reach set at the same final time is shown for comparison (dashed line).
[Figure 1 plot: axes x1 (horizontal) and x2 (vertical); plot title: k = 1, m = [ 1 0 ], M = [ 12 0; 0 12 ].]
Figure 1. Authors: A.B.Kurzhanski, I.M.Mitchell, P.Varaiya.
Figure 2. Authors: A.B.Kurzhanski, I.M.Mitchell, P.Varaiya.