Hamilton-Jacobi-Bellman equations for Quantum Optimal Feedback Control

J. Gough$^a$, V.P. Belavkin$^b$, and O.G. Smolyanov$^c$

$^a$ School of Computing & Informatics, Nottingham Trent University, NG1 4BU, UK
$^b$ School of Mathematics, Nottingham University, NG7 2RD, UK
$^c$ Faculty of Mathematics and Mechanics, Moscow State University, Russia

J. Opt. B: Quantum Semiclass. Opt. 7 (2005) S237-S244

Abstract

We exploit the separation of the filtering and control aspects of quantum feedback control to consider the optimal control as a classical stochastic problem on the space of quantum states. We derive the corresponding Hamilton-Jacobi-Bellman equations using the elementary arguments of classical control theory and show that this is equivalent, in the Stratonovich calculus, to a stochastic Hamilton-Pontryagin setup. We show that, for cost functionals that are linear in the state, the theory yields the traditional Bellman equations treated so far in quantum feedback. A controlled qubit with feedback is considered as an example.
1 Introduction
When engineers set about to control a classical system with incomplete data, they can invoke the celebrated Separation Theorem, which allows them to treat the problem of estimating the state of the system (based on typically partial observations) separately from the problem of how to optimally control the system (through feedback of these observations into the system dynamics); see for instance [17]. Remarkably, this approach may also be carried over to the quantum world, which cannot in principle be completely observed: this was first pointed out by Belavkin in [3],[5], see also the later [12],[15]. Quantum measurement, by its very nature, always leads to partial information about a system, in the sense that some quantities always remain uncertain, and because of this the measurement typically alters the prior state to a posterior state in the process. The Belavkin nondemolition principle [4, 6] states that this state reduction can be effectively treated within a non-demolition scheme [6],[7] when measuring the system over time.
Hence we may apply a quantum filter for either discrete [2] or time-continuous [4] non-demolition state estimation, and then consider feedback control based on the results of this filtering. The general theory of continuous-time nondemolition estimation developed in [7],[9],[10],[11] derives for quantum posterior states a stochastic filtering evolution equation not only for diffusive but also for counting measurements; however, we will consider here the special case of the Belavkin quantum state filtering equation based on a diffusion model described by a single white noise innovation, see e.g. [8],[33],[16]. We should also emphasize that the continuous-time filtering equation can be obtained as the limit of a discrete-time state reduction based on von Neumann measurements [23],[24],[29],[30]; this time-continuous limit, however, goes beyond the standard von Neumann projection postulate, replacing it by the quantum filtering equation as a stochastic Master equation. Once the filtered dynamics is known, the optimal feedback control of the system may then be formulated as a distinct problem. Modern experimental physics has opened up unprecedented opportunities to manipulate the quantum world, and feedback control has already been successfully implemented for real physical systems [1],[22]. These activities have recently attracted interest in the related mathematical issues such as stability, observability, etc., [15],[25],[19],[26]. The separation of the classical world from the quantum world is, of course, the most notoriously troublesome task faced in modern physics. At the very heart of this issue are the very different meanings we attach to the word state. What we want to exploit is the fact that the separation of the control from the filtering problem gives us just the required separation of classical from quantum features.
By the quantum state we mean the von Neumann density matrix, which yields all the (stochastic) information available about the system at the current time: this we also take to be the state in the sense used in control engineering. All the quantum features are contained in this state, and the filtering equation it satisfies may then be understood as a classical stochastic differential equation which just happens to have solutions that are von Neumann density-matrix-valued stochastic processes. The ensuing problem of determining optimal control may then be viewed as a classical problem, albeit on the unfamiliar state space of von Neumann density matrices rather than the Euclidean spaces to which we are usually accustomed. Once we adjust to this setting, the problem of dynamical programming, Bellman's optimality principle, etc., can be formulated in much the same spirit as before. We shall consider optimization for cost functions that are non-linear functionals of the state. Traditionally quantum control has been restricted to linear functionals, where, given the physical meaning attached to a quantum state, the cost functions are therefore expectations of certain observables. In this situation, which we consider as a special case, we see that the distinction between classical and quantum features may be blurred: that is, the classical information about the measurement observations can be incorporated as additional randomness into the quantum state. This is the likely reason why the separation does not seem to have been taken up before.
2 Notations and Facts
The Hilbert space for our fixed quantum system will be a complex, separable Hilbert space $h$. We shall use the following spaces of operators:

$\mathcal{A} = \mathcal{B}(h)$ - the Banach algebra of bounded operators on $h$;
$\mathcal{A}_\star = \mathcal{T}(h)$ - the predual space of trace-class operators on $h$;
$\mathcal{S} = \mathcal{S}(h)$ - the positive, unit-trace operators (states) on $h$;
$\mathcal{T}_0 = \mathcal{T}_0(h)$ - the tangent space of zero-trace operators on $h$;
$\mathcal{T}_0^\star$ - the cotangent space (see below).
The space $\mathcal{A}_\star$ equipped with the trace norm $\|\varrho\|_1 = \mathrm{tr}\,|\varrho|$ is a complex Banach space, the dual of which is identified with the algebra $\mathcal{A}$ with the usual operator norm. The natural duality between the spaces $\mathcal{A}_\star$ and $\mathcal{A}$ is indicated by
$$\langle \varrho, A\rangle := \mathrm{tr}\{\varrho A\} \qquad (1)$$
for each $\varrho \in \mathcal{A}_\star$, $A \in \mathcal{A}$. The positive elements, in the sense of positive definiteness $\varrho \ge 0$, form a cone $\mathcal{T}_+$ of the real subspace $\mathcal{T} \subset \mathcal{A}_\star$ of all Hermitian elements $\varrho = \varrho^\dagger$, and the unit-trace elements $\varrho \in \mathcal{T}_+$ normalized as $\|\varrho\|_1 = 1$ are called normal states. Thus $\mathcal{S} = \mathcal{T}_+ \cap \mathcal{T}_1$, where $\mathcal{T}_1 = \{\varsigma \in \mathcal{T} : \mathrm{tr}\,\varsigma = 1\}$, and the extremal elements $\varrho \in \mathcal{S}$ of the convex set $\mathcal{S} \subset \mathcal{T}_+$ correspond to pure quantum states. Every state $\varrho$ can be parametrized as $\varrho(q) = \varrho_0 - q$ by a tangent element $q \in \mathcal{T}_0$ with respect to a given state $\varrho_0 \in \mathcal{S}$. We may use the duality (1) to introduce cotangent elements $p \in \mathcal{T}_0^\star$. Knowledge of $\langle q, p\rangle$ for each $q \in \mathcal{T}_0$ will only serve to determine $p \in \mathcal{A}$ up to an additive constant (as the $q$'s are trace-free): for this reason we should think of cotangent elements $p$ as equivalence classes
$$p[X] = \{A \in \mathcal{A} : A = X + \lambda I, \text{ for some } \lambda \in \mathbb{R}\}. \qquad (2)$$
The symmetric tensor power $\mathcal{A}^{\otimes 2}_{\mathrm{sym}} = \mathcal{A} \otimes_{\mathrm{sym}} \mathcal{A}$ of the algebra $\mathcal{A}$ is the subalgebra of $\mathcal{B}(h^{\otimes 2})$ of all bounded operators on the Hilbert product space $h^{\otimes 2} = h \otimes h$ commuting with the unitary involutive operator $S = S^\dagger$ of permutations $\psi_1 \otimes \psi_2 \mapsto \psi_2 \otimes \psi_1$ for any $\psi_i \in h$. A map $\mathcal{L}(t, \cdot)$ from $\mathcal{A} = \mathcal{B}(h)$ to itself is said to be a Lindblad generator if it takes the form
$$\mathcal{L}(t, X) = i[H(t), X] + \sum_\alpha \mathcal{L}_{R_\alpha}(X), \qquad (3)$$
$$\mathcal{L}_R(X) = R^\dagger X R - \frac{1}{2} R^\dagger R\, X - \frac{1}{2} X\, R^\dagger R, \qquad (4)$$
with $H$ self-adjoint and the $R_\alpha \in \mathcal{A}$ (the summations in (3) understood to be ultraweakly convergent [27] for an infinite set $\{R_\alpha\}$). The generator is Hamiltonian if it just takes the form $i[H(t), \cdot]$. The pre-adjoint $\mathcal{L}_0 = \mathcal{L}_\star$ of a generator $\mathcal{L}$ is defined on the pre-adjoint space $\mathcal{A}_\star$ through the relation $\langle \mathcal{L}_0(\varrho), X\rangle = \langle \varrho, \mathcal{L}(X)\rangle$. We note that Lindblad generators have the property $\mathcal{L}(I) = 0$, corresponding to conservation of the identity operator $I \in \mathcal{A}$ or, equivalently, $\mathrm{tr}\{\mathcal{L}_0(\varrho)\} = 0$ for all $\varrho \in \mathcal{A}_\star$.
In quantum control theory it is necessary to consider time-dependent generators $\mathcal{L}(t)$, through an integrable time dependence of the controlled Hamiltonian $H(t)$ and, more generally, through a square-integrable time dependence of the coupling operators $R_\alpha(t)$. We shall always assume that these integrability conditions, ensuring existence and uniqueness of the solution $\varrho(t)$ to the quantum state Master equation
$$\frac{d}{dt}\varrho(t) = \mathcal{L}_0(t, \varrho(t)) \equiv \vartheta(t, \varrho(t)), \qquad (5)$$
for all $t \ge t_0$ with given initial condition $\varrho(t_0) = \varrho_0 \in \mathcal{S}$, are fulfilled. Let $F = F[\cdot]$ be a (nonlinear) functional $\varrho \mapsto F[\varrho]$ on $\mathcal{A}_\star$ (or on $\mathcal{S} \subset \mathcal{A}_\star$); then we say it admits a (Fréchet) derivative if there exists an $\mathcal{A}$-valued function $\nabla_\varrho F[\cdot]$ on $\mathcal{A}_\star$ (a $\mathcal{T}_0^\star$-valued functional on $\mathcal{T}_0$) such that
$$\lim_{h\to 0} \frac{1}{h}\{F[\varrho + h\varsigma] - F[\varrho]\} = \langle \varsigma, \nabla_\varrho F[\varrho]\rangle \qquad (6)$$
for each $\varsigma \in \mathcal{A}_\star$ (for each $\varsigma \in \mathcal{T}_0$). In the same spirit, a Hessian $\nabla_\varrho \otimes \nabla_\varrho$ can be defined as a mapping from the functionals on $\mathcal{S}$ to the $\mathcal{A}^{\otimes 2}_{\mathrm{sym}}$-valued functionals, via
$$\lim_{h,h'\to 0} \frac{1}{hh'}\{F[\varrho + h\varsigma + h'\varsigma'] - F[\varrho + h\varsigma] - F[\varrho + h'\varsigma'] + F[\varrho]\} = \langle \varsigma \otimes \varsigma', \nabla_\varrho \otimes \nabla_\varrho F[\varrho]\rangle, \qquad (7)$$
and we say that the functional is twice continuously differentiable whenever $\nabla_\varrho^{\otimes 2} F[\cdot]$ exists and is continuous in the trace-norm topology. Likewise, a functional $f : X \mapsto f[X]$ on $\mathcal{A}$ is said to admit an $\mathcal{A}_\star$-derivative if there exists an $\mathcal{A}_\star$-valued function $\nabla_X f[\cdot]$ on $\mathcal{A}$ such that
$$\lim_{h\to 0} \frac{1}{h}\{f[X + hA] - f[X]\} = \langle \nabla_X f[X], A\rangle \qquad (8)$$
for each $A \in \mathcal{B}(h)$. The derivative $\nabla_X f[\cdot]$ has zero trace, $\nabla_X f[A] \in \mathcal{T}_0$ for each $A \in \mathcal{A}$, if and only if the functional $f[X + \lambda I]$ does not depend on $\lambda$, i.e. is essentially a function $f(p)$ of the class $p[X] \in \mathcal{T}_0^\star$. With the customary abuses of differential notation, we have for instance
$$\nabla_\varrho f(\langle\varrho, X\rangle) = f'(\langle\varrho, X\rangle)\, X, \qquad \nabla_X f(\langle\varrho, X\rangle) = f'(\langle\varrho, X\rangle)\, \varrho,$$
for any differentiable function $f$ of the scalar $x = \langle\varrho, X\rangle$. Typically we shall use $\nabla_\varrho$ more often, and denote it by just $\nabla$.
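As a concrete illustration of the derivative convention (6), the directional derivative of a simple nonlinear functional can be checked numerically. The sketch below is our own example, not from the paper: it uses $F[\varrho] = \langle\varrho, A\rangle^2$, whose Fréchet derivative is $2\langle\varrho, A\rangle A$ by the chain rule just stated.

```python
import numpy as np

# A minimal numerical illustration (our own example, not from the paper)
# of the derivative convention (6): for F[rho] = <rho, A>^2 the Frechet
# derivative is the A-valued function 2<rho, A> A, so the directional
# derivative along a trace-free perturbation sigma is <sigma, grad F>.
rng = np.random.default_rng(0)
Z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
A = (Z + Z.conj().T) / 2                      # a fixed observable
rho = np.diag([0.7, 0.3]).astype(complex)     # a state
sigma = np.array([[0.1, 0.2], [0.2, -0.1]], dtype=complex)  # trace-free

pair = lambda r, a: np.trace(r @ a).real      # duality <rho, A> = tr{rho A}
F = lambda r: pair(r, A) ** 2

h = 1e-6
numeric = (F(rho + h * sigma) - F(rho)) / h
exact = pair(sigma, 2 * pair(rho, A) * A)     # <sigma, grad F[rho]>
print(abs(numeric - exact) < 1e-5)
```

The finite-difference quotient agrees with the pairing $\langle\varsigma, \nabla_\varrho F[\varrho]\rangle$ to within the $O(h)$ truncation error.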
3 Quantum Filtering Equation
The state of an individual continuously measured quantum system does not coincide with the solution of the deterministic master equation (5) but instead
depends on the random measurement output $\omega$ in a causal manner. We take the output to constitute a white noise process $\{\xi(t) : t \ge 0\}$, in which case it is mathematically more convenient to work with the integrated process $\{w(t) : t \ge 0\}$, given formally by $w(t) = \int_0^t \xi(s)\, ds$. It is then natural to model $w(t)$ as a Wiener process, and here we take $(\Omega, \mathcal{F}, \mathsf{P})$ to be the canonical probability space: that is, $\Omega$ is the space of all continuous paths $\omega = \{\omega(t) : t \ge 0\}$ with $\omega(0) = 0$, and $w(t)$ is the coordinate process $w(t)_\omega \equiv \omega(t)$ for each outcome $\omega$. The process $\{w(t) : t \ge 0\}$ is then the innovations process. We then view the state as an $\mathcal{S}$-valued stochastic process $\varrho_\bullet(t) : \omega \mapsto \varrho_\omega(t)$, depending on the particular observations $\omega = \{\omega(t)\} \in \Omega$. (Here we shall use the symbol $\bullet$ as subscript to indicate that the kernel symbol describes a random variable when we do not want to display $\omega$.) Causality is reflected through the requirement that the state process be adapted: that is, $\varrho_\bullet(t)$ is a measurable variable with respect to the sigma-algebra generated by the Wiener output up to and including time $t$, for each $t \ge 0$. The Belavkin quantum filtering equation giving the evolution of the filtered state in this case is [8],[11],[33],[16]
$$d\varrho_\bullet(t) = \vartheta(t, \varrho_\bullet(t))\, dt + \sigma(\varrho_\bullet(t))\, dw(t), \qquad (9)$$
where $dw(t) = w(t+dt) - w(t)$, the time coefficient is
$$\vartheta(t, \varrho) = i[\varrho, H(t)] + \mathcal{L}^0_R(\varrho) + \mathcal{L}^0_L(\varrho), \qquad (10)$$
with $\mathcal{L}^0_L(\varrho)$ of the form
$$\mathcal{L}^0_L(\varrho) = L\varrho L^\dagger - \frac{1}{2}\varrho L^\dagger L - \frac{1}{2} L^\dagger L \varrho,$$
and the fluctuation coefficient is
$$\sigma(\varrho) = L\varrho + \varrho L^\dagger - \langle \varrho, L + L^\dagger\rangle\, \varrho. \qquad (11)$$
Here $L$ is a bounded operator describing the coupling of the system to the measurement apparatus. The time coefficient $\vartheta$ consists of three separate terms: the first term is Hamiltonian and depends on time through the dependence of $H$ on a steering parameter $u(t)$ (belonging to some parameter space $\mathbf{U}$) which we must specify at each time; the second term is the adjoint of a general Lindblad generator $\mathcal{L}_R$ due to a reservoir coupling, which describes the uncontrolled, typically dissipative, effect of the environment; the final term is adjoint to the time-independent Lindblad generator $\mathcal{L}_L$, which is related to the coupling operator $L$ with the measurement apparatus. The maps $\vartheta$ and $\sigma$ are required to be Lipschitz continuous in all their components: for $L$ constant and bounded, this will be automatic for the $\varrho$-variable in the trace-norm topology. We remark that $\mathrm{tr}\{\sigma(\varrho)\} = 0$ if $\mathrm{tr}\,\varrho = 1$ and, by conservativity, $\mathrm{tr}\{\vartheta(t,\varrho)\} = 0$ for all $\varrho \in \mathcal{A}_\star$. This implies that the normalization $\mathrm{tr}\,\varrho = 1$ is conserved under the stochastic evolution (9), so that $q_\omega(t) = \varrho_0 - \varrho_\omega(t) \in \mathcal{T}_0$ for all $t \ge t_0$ if $\varrho_\omega(t_0) = \varrho_0$.
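The filtering equation (9) can be integrated numerically by an Euler-Maruyama scheme. The sketch below uses illustrative operator choices of our own ($H = \frac{1}{2}\sigma_y$, i.e. fixed steering $u = 1$, measurement coupling $L = \frac{1}{2}\sigma_z$, and no reservoir, $\mathcal{L}_R = 0$; none of these are taken from the paper) and checks the trace-conservation property just noted.

```python
import numpy as np

# A sketch (our illustrative choices, not from the paper) integrating the
# Belavkin filtering equation (9) by Euler-Maruyama for a qubit, with
# H = sigma_y/2 (fixed steering u = 1), coupling L = sigma_z/2 and no
# reservoir (L_R = 0). The normalization tr rho = 1 should be preserved.
rng = np.random.default_rng(1)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
H = 0.5 * sy
L = 0.5 * sz
Ld = L.conj().T

def theta(rho):                   # time coefficient (10) with L_R = 0
    return 1j * (rho @ H - H @ rho) \
        + L @ rho @ Ld - 0.5 * (Ld @ L @ rho + rho @ Ld @ L)

def fluct(rho):                   # fluctuation coefficient (11)
    M = L @ rho + rho @ Ld
    return M - np.trace(M).real * rho

rho = np.array([[1, 0], [0, 0]], dtype=complex)   # start in |0><0|
dt = 1e-4
for _ in range(10_000):           # one sample trajectory on [0, 1]
    dw = rng.normal(scale=np.sqrt(dt))
    rho = rho + theta(rho) * dt + fluct(rho) * dw
print(abs(np.trace(rho).real - 1.0) < 1e-6)
```

Since $\mathrm{tr}\{\vartheta\} = 0$ and $\mathrm{tr}\{\sigma(\varrho)\} = 0$ whenever $\mathrm{tr}\,\varrho = 1$, the scheme preserves the trace at every step up to rounding.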
Let $\{\varrho_{r,\omega}(t, \varrho) : r \ge t, \omega \in \Omega\}$ be the solution to (9) for $r \ge t$, starting in state $\varrho$ at time $t$, for all $\omega \in \Omega$. This will be a Markov process in $\mathcal{S}$ (embedded in the Banach space $\mathcal{A}_\star$), see for instance [18], and we remark that, for twice continuously differentiable functionals $F$ on $\mathcal{A}_\star$, we will have
$$\lim_{h\to 0^+} \frac{1}{h}\,\mathsf{E}\left[ F[\varrho_{t+h,\bullet}(t, \varrho)] - F[\varrho]\right] = \mathcal{D}(t, \varrho)\, F[\varrho],$$
where $\mathcal{D}(t, \varrho)$ is the elliptic operator defined by
$$\mathcal{D}(t, \varrho) = \langle \vartheta(t, \varrho), \nabla\,\cdot\,\rangle + \frac{1}{2}\langle \sigma(\varrho)\otimes\sigma(\varrho), (\nabla\otimes\nabla)\,\cdot\,\rangle. \qquad (12)$$
For the classical analogue of stochastic flows on manifolds, see for instance [14].
3.1 Stratonovich Version
We convert to the Stratonovich picture [31] by means of the identity [20]
$$\sigma(\varrho_\bullet) \circ dw = \sigma(\varrho_\bullet)\, dw + \frac{1}{2}\, d\sigma(\varrho_\bullet)\cdot dw,$$
and from (11) we get
$$d\sigma(\varrho_\bullet) = L\, d\varrho_\bullet + d\varrho_\bullet\, L^\dagger - \langle d\varrho_\bullet, L + L^\dagger\rangle\varrho_\bullet - \langle \varrho_\bullet, L + L^\dagger\rangle\, d\varrho_\bullet - \langle d\varrho_\bullet, L + L^\dagger\rangle\, d\varrho_\bullet.$$
After a little algebra, we obtain the Stratonovich form of the Belavkin filtering equation
$$d\varrho_\bullet = \mu(t, \varrho_\bullet)\, dt + \sigma(\varrho_\bullet) \circ dw, \qquad (13)$$
which we may equate with the more formal white noise equation $\dot\varrho_\omega = \mu(t, \varrho_\omega) + \sigma(\varrho_\omega)\, \xi(t)$. The time coefficient is given by
$$\mu(t, \varrho) = \vartheta(t, \varrho) - \frac{1}{2}\left[ L\sigma(\varrho) + \sigma(\varrho) L^\dagger - \langle \sigma(\varrho), L + L^\dagger\rangle\varrho - \langle \varrho, L + L^\dagger\rangle\sigma(\varrho)\right] = i[\varrho, H(t)] + \mathcal{L}^0_R(\varrho) + K(\varrho)\varrho + \varrho K(\varrho)^\dagger + F(\varrho)\varrho, \qquad (14)$$
where we introduce the operator-valued function
$$K(\varrho) := -\frac{1}{2}(L + L^\dagger) L + \langle \varrho, L + L^\dagger\rangle L \qquad (15)$$
and the scalar-valued function
$$F(\varrho) := \frac{1}{2}\langle \varrho, L^2 + 2 L^\dagger L + L^{\dagger 2}\rangle - \langle \varrho, L + L^\dagger\rangle^2. \qquad (16)$$
We refer to $\vartheta$ in (10) and $\mu$ in (14) as the Itô and Stratonovich state velocities, respectively. We note that the decoherent component $L\varrho L^\dagger$ appearing in $\mathcal{L}^0_L$, and present in $\vartheta(t, \varrho)$, is now absent in $\mu(t, \varrho)$. The elliptic operator $\mathcal{D}(t, \varrho)$ can then be put into Hörmander form as
$$\mathcal{D}(t, \varrho)(\cdot) := \langle \mu(t, \varrho), \nabla\,\cdot\,\rangle + \frac{1}{2}\langle \sigma(\varrho), \nabla\langle \sigma(\varrho), \nabla\,\cdot\,\rangle\rangle, \qquad (17)$$
by using the equality (14) in the definition (12).
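The algebra leading to (14)-(16) can be cross-checked numerically. The sketch below is an illustrative test of our own, with a randomly chosen coupling $L$: it compares $\vartheta(\varrho) - \frac{1}{2} D\sigma[\sigma(\varrho)]$, where $D\sigma[\tau]$ is the derivative of $\sigma$ at $\varrho$ in direction $\tau$ (estimated by a finite difference), against the explicit form $K(\varrho)\varrho + \varrho K(\varrho)^\dagger + F(\varrho)\varrho$.

```python
import numpy as np

# An illustrative numerical cross-check (not from the paper) of (14)-(16):
# the Stratonovich velocity written via K(rho) and F(rho) should equal
# theta(rho) - (1/2) D_sigma[sigma(rho)], where D_sigma[tau] is the
# derivative of sigma at rho in direction tau, estimated here by a
# finite difference. We assume H = 0, L_R = 0 and a random coupling L.
rng = np.random.default_rng(2)
Z = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
L = 0.3 * Z
Ld = L.conj().T
rho = np.diag([0.6, 0.4]).astype(complex)

def sigma(r):                        # fluctuation coefficient (11)
    M = L @ r + r @ Ld
    return M - np.trace(M) * r

def theta(r):                        # time coefficient (10), H = 0, L_R = 0
    return L @ r @ Ld - 0.5 * (Ld @ L @ r + r @ Ld @ L)

h = 1e-7
dsig = (sigma(rho + h * sigma(rho)) - sigma(rho)) / h
mu_fd = theta(rho) - 0.5 * dsig      # theta - (1/2) D_sigma[sigma]

m = np.trace(rho @ (L + Ld))         # <rho, L + L^dagger>
K = -0.5 * (L + Ld) @ L + m * L                                   # (15)
F = 0.5 * np.trace(rho @ (L @ L + 2 * Ld @ L + Ld @ Ld)) - m**2   # (16)
mu_explicit = K @ rho + rho @ K.conj().T + F * rho                # (14)

print(np.allclose(mu_fd, mu_explicit, atol=1e-6))
```

Note how the decoherent term $L\varrho L^\dagger$ cancels in the difference, as stated above.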
4 Optimal Control
From now on we will assume that the Hamiltonian $H$, and therefore $\vartheta$ (and $\mu$), are functions of a control parameter $u \in \mathbf{U}$ depending on $t$, such that the time dependence of the generator $\mathcal{L}$ is of the form $\mathcal{L}(u(t))$. Moreover, we do not require at this stage linearity of $\vartheta(u, \varrho)$ with respect to $\varrho$, nor the quadratic dependence of $\sigma(\varrho)$, which means that what follows below is also applicable to more general quantum stochastic kinetic equations
$$d\varrho_\bullet(t) = \vartheta(u(t), \varrho_\bullet(t))\, dt + \sigma(\varrho_\bullet(t))\, dw(t)$$
of Vlassov and Boltzmann type, with only the positivity and trace-preservation requirements $\mathrm{tr}\{\vartheta(u, \varrho)\} = 0 = \mathrm{tr}\{\sigma(\varrho)\}$. A choice of the control function $\{u(r) : r \in [t_0, t]\}$ is required before we can solve the filtering equation (9) at time $t$ for a given initial state $\varrho_0$ at time $t_0$. From what we have said above, this is required to be a $\mathbf{U}$-valued function, which we take to be continuous for the moment. The cost for a control function $\{u(r)\}$ over any time interval $[t, T]$ is random and is taken to have the integral form
$$J_\omega[\{u(r)\}; t, \varrho] = \int_t^T C(u(r), \varrho_\omega(r))\, dr + G(\varrho_\omega(T)), \qquad (18)$$
where $\{\varrho_\bullet(r) : r \in [t, T]\}$ is the solution to the filtering equation with initial condition $\varrho(t) = \varrho$. We assume that the cost density $C$ and the terminal cost, or bequest function, $G$ are continuously differentiable in each of their arguments. In fact, due to the statistical interpretation of quantum states, we should consider only the linear dependence
$$C(u, \varrho) = \langle \varrho, C(u)\rangle, \qquad G(\varrho) = \langle \varrho, G\rangle \qquad (19)$$
of $C$ and $G$ on the state $\varrho$, as was already suggested in [5],[7],[12]. We will explicitly consider this case later, but for the moment we will not use the linearity of $C$ and $G$. We refer to $C(u) \in \mathcal{A}$ as the cost observable for $u \in \mathbf{U}$ and to $G \in \mathcal{A}$ as the bequest observable. The feedback control $u(t)$ is to be considered a random variable $u_\omega(t)$ adapted with respect to the innovation process $w(t)$, in line with our causality requirement, and so we therefore consider the problem of minimizing its average cost value with respect to $\{u_\bullet(t)\}$. To this end, we define the optimal average cost on the interval $[t, T]$ to be
$$S(t, \varrho) := \inf_{\{u_\bullet(r)\}} \mathsf{E}\left[ J_\bullet[\{u_\bullet(r)\}; t, \varrho]\right], \qquad (20)$$
where the infimum is taken over all measurable adapted control strategies $\{u_\bullet(r) : r \ge t\}$. The aim of feedback control theory is then to find an optimal control strategy $\{u_\bullet(t)\}$ and evaluate $S(t, \varrho)$ on a fixed time interval $[t_0, T]$. Obviously the cost $S(t, \varrho)$ of the optimal feedback control is in general smaller than the minimum of $\mathsf{E}[J[\{u\}; t, \varrho]]$ over nonstochastic strategies $\{u(r)\}$ only, which gives the solution of the open-loop (without feedback) quantum control problem. In the case of the linear costs (19), this open-loop problem is equivalent to the following quantum deterministic optimization problem, which can be tackled by the classical theory of optimal deterministic control in the corresponding Banach spaces.
4.1 Bellman & Hamilton-Pontryagin Optimality
Let us first consider nonstochastic quantum optimal control theory, assuming that the state $\varrho(t) \in \mathcal{S}$ obeys the master equation (5), where $\vartheta(u, \varrho)$ is the adjoint $\mathcal{L}^0(u)$ of some Lindblad generator for each $u$, with, say, the control being exercised in the Hamiltonian component $i[\,\cdot\,, H(u)]$ as before. (More generally, we could equally well consider a nonlinear quantum kinetic equation.) The control strategy $\{u(t)\}$ will here be non-random, as will be any specific cost $J[\{u\}; t_0, \varrho_0]$. As for $S(t, \varrho) = \inf J[\{u\}; t, \varrho]$ at the times $t < t + \varepsilon < T$, one has
$$S(t, \varrho) = \inf_{\{u\}}\left\{\int_t^{t+\varepsilon} C(u(r), \varrho(r))\, dr + \int_{t+\varepsilon}^T C(u(r), \varrho(r))\, dr + G(\varrho(T))\right\}.$$
Suppose that $\{u^\circ(r) : r \in [t, T]\}$ is an optimal control when starting in state $\varrho$ at time $t$, and denote by $\{\varrho^\circ(r) : r \in [t, T]\}$ the corresponding state trajectory starting at state $\varrho$ at time $t$. Bellman's optimality principle [13],[17] observes that the control $\{u^\circ(r) : r \in [t+\varepsilon, T]\}$ will then be optimal when starting from $\varrho^\circ(t+\varepsilon)$ at the later time $t+\varepsilon$. It therefore follows that
$$S(t, \varrho) = \inf_{\{u(r)\}}\left\{\int_t^{t+\varepsilon} C(u(r), \varrho(r))\, dr + S(t+\varepsilon, \varrho(t+\varepsilon))\right\}.$$
For " small we expect that % (t + ") = % + # (u (t) ; %) " + o (") and provided that S is su¢ ciently smooth we may make the Taylor expansion S (t + "; % (t + ")) = 1 + "
@ + " h# (u (t) ; %) ; ri S (t; %) + o (") : @t
(21)
In addition, we approximate Z t+" C (u (r) ; % (r)) dr = "C (u (t) ; %) + o (") t
and conclude that (note the convective derivative!)
$$S(t, \varrho) = \inf_{u \in \mathbf{U}}\left\{ \varepsilon\, C(u, \varrho) + \left(1 + \varepsilon\frac{\partial}{\partial t} + \varepsilon\,\langle \vartheta(u, \varrho), \nabla\rangle\right) S(t, \varrho)\right\} + o(\varepsilon),$$
where now the infimum is taken over the point-value of $u(t) = u \in \mathbf{U}$. In the limit $\varepsilon \to 0$, one obtains the equation
$$-\frac{\partial}{\partial t} S(t, \varrho) = \inf_{u \in \mathbf{U}}\left\{ C(u, \varrho) + \langle \vartheta(u, \varrho), \nabla S(t, \varrho)\rangle\right\}, \qquad (22)$$
where r = r% . The equation is then to be solved subject to the terminal condition S (T; %) = G (%) : (23) We may introduce the Pontryagin Hamiltonian function on T0 by the Legendre-Fenchel transform H# (q; p [X]) := sup fh# (u; % (q)) ; I
C (u; % (q))g :
Xi
T0? de…ned (24)
u2U
Here we use a parametrization % (q) = %0 q, q 2 T0 and the fact that the supremum does not depend on 2 R since h# (u; %) ; Ii = 0. Therefore H depends on X only through the equivalence class p [X] 2 T0? which is referred to as the co-state. It should be emphasized that these Hamiltonians are purely classical devices which may be called super-Hamiltonians to be distinguished from H. We may then rewrite (22) as the (backward) Hamilton-Jacobi equation @ S (t; % (q)) + H# (q; p [rS (t; %)] (q)) = 0: @t
(25)
Applying the derivative $\nabla_q = -\nabla$ to this equation at $q = \varrho_0 - \varrho$ in the tangent space $\mathcal{T}_0$, we obtain the dynamical equation $\dot p = -\nabla_q H_\vartheta(q, p)$ for the co-state $p_t = Q_t(T, s)$ of the operator-valued function $X(t, \varrho) = \nabla S(t, \varrho)$, where $Q_t(T, s) = p[X(t, \varrho)]$ is the solution of this equation satisfying the terminal condition $Q_T(T, s) = s := p[G]$, with $p[G(\varrho)](q) = p[G(\varrho(q))]$ for $G = \nabla G$. We remark that, if $u^\circ(q, p(X))$ is an optimal control maximizing
$$K_\vartheta(u; q, p(X)) = \langle \vartheta(u, \varrho(q)), \lambda I - X\rangle - C(u, \varrho(q)),$$
then the corresponding state dynamical equation $\frac{d}{dt}\varrho = \vartheta(u^\circ(\varrho, X), \varrho)$, in terms of its optimal solution $q_t = \varrho_0 - \varrho_t(t_0, \varrho_0)$ corresponding to $\varrho_{t_0} = \varrho_0$, can be written as $\dot q = \nabla_p H_\vartheta(q, p)$, noting that
$$\nabla_p H_\vartheta(q, p) = \nabla_p K_\vartheta(u^\circ(q, p); q, p) = -\vartheta(u^\circ(q, p), \varrho(q)) \qquad (26)$$
due to the stationarity condition $\frac{\partial}{\partial u} K_\vartheta(u; q, p) = 0$ at $u = u^\circ$. This forward equation with $q_0 = 0$ for $\varrho(t_0) = \varrho_0$, together with the co-state backward equation with $p_T = p[G] \equiv s$, is the canonical Hamiltonian system. Thus we may equivalently consider the Hamiltonian boundary value problem
$$\dot q_t - \nabla_p H_\vartheta(q_t, p_t) = 0, \quad q_0 = 0; \qquad \dot p_t + \nabla_q H_\vartheta(q_t, p_t) = 0, \quad p_T = s, \qquad (27)$$
which we refer to as the Hamilton-Pontryagin problem, in direct analogy with the classical case [28]. The solution to this problem defines the minimal cost as the path integral
$$S(t_0, \varrho_0) = \int_{t_0}^T \left[ \langle \dot q_r, p_r\rangle - H_\vartheta(q_r, p_r)\right] dr + G(\varrho(q_T)).$$
Thus the Pontryagin maximum principle for the quantum dynamical system is the observation that the optimal quantum control problem is equivalent to the Hamiltonian problem for the state and co-state $\{q\}$ and $\{p\}$ respectively, leading to the optimality condition $K_\vartheta(u; q, p) \le H_\vartheta(q, p)$, with equality for $u = u^\circ(q, p)$ maximizing $K_\vartheta(u; q, p)$.
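Bellman's optimality principle invoked above can be sketched in a discrete toy model: a scalar state standing in for the density matrix, a finite control set, and backward induction of the value function. Everything below (the grid, the costs, the dynamics) is an illustrative assumption of ours, not taken from the paper.

```python
from itertools import product

# A toy finite-horizon dynamic program (purely illustrative: a scalar state
# standing in for the density matrix) showing the Bellman recursion used
# above: S(t, x) = min_u [ eps*C(u, x) + S(t+eps, x + eps*f(u, x)) ].
eps = 0.25
steps = 4                           # horizon T = steps * eps = 1
controls = [-1.0, 0.0, 1.0]
f = lambda u, x: u                  # controlled drift (stands in for theta)
C = lambda u, x: x**2 + 0.1 * u**2  # running cost density
G = lambda x: x**2                  # bequest function

grid = [round(-2 + 0.25 * k, 6) for k in range(17)]   # states -2 .. 2
S = {x: G(x) for x in grid}         # terminal condition S(T, x) = G(x)
for _ in range(steps):              # backward induction (Bellman principle)
    S = {x: min(eps * C(u, x) + S[round(x + eps * f(u, x), 6)]
                for u in controls if round(x + eps * f(u, x), 6) in S)
         for x in grid}

# Sanity check: the feedback value at x = 0 is no worse than any open-loop
# strategy, enumerated by brute force over all control sequences.
def open_loop_cost(us):
    x, total = 0.0, 0.0
    for u in us:
        total += eps * C(u, x)
        x = round(x + eps * f(u, x), 6)
    return total + G(x)

best_open_loop = min(open_loop_cost(us) for us in product(controls, repeat=steps))
print(S[0.0] <= best_open_loop + 1e-9)
```

For this deterministic toy system the feedback and open-loop optima coincide, mirroring the remark at the end of Section 4 that the strict advantage of feedback appears only in the stochastic (filtered) problem.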
4.2 Bellman Equation for Filtered Dynamics
We now consider the stochastic differential equation (9) for the filtered state in place of the master equation (5). This time the cost is random, and we consider the problem of computing the minimum average cost as in (20). The Bellman principle can however be applied once more. As before, we let $\{u_\bullet(t)\}$ be a stochastic adapted control leading to optimality and let $\varrho_\bullet(r)$ be the corresponding state trajectory (now a stochastic process) starting from $\varrho$ at time $t$. Again choosing $t < t + \varepsilon < T$, we have by the Bellman principle
$$S(t, \varrho) = \inf_{u \in \mathbf{U}}\left\{ \varepsilon\, C(u, \varrho) + \mathsf{E}\left[ S(t+\varepsilon, \varrho_\bullet(t+\varepsilon))\right]\right\} + o(\varepsilon) = S(t, \varrho) + \inf_{u \in \mathbf{U}}\left\{ \frac{\partial S(t, \varrho)}{\partial t} + C(u, \varrho) + \mathcal{D}(u, \varrho)\, S(t, \varrho)\right\}\varepsilon + o(\varepsilon).$$
Taking the limit $\varepsilon \to 0$ yields the diffusive backward Bellman equation
$$-\frac{\partial S}{\partial t} = \inf_{u \in \mathbf{U}}\left\{ C(u, \varrho) + \mathcal{D}(u, \varrho)\, S(t, \varrho)\right\}. \qquad (28)$$
This equation is to be solved backward with the terminal condition $S(T, \varrho) = G(\varrho)$. Using the Hamiltonian function (24), this can be written in the Hamilton-Jacobi form as
$$-\frac{\partial S}{\partial t} + H_\vartheta(q, \nabla S(\varrho)) = \frac{1}{2}\langle \sigma(\varrho)\otimes\sigma(\varrho), (\nabla\otimes\nabla)\, S(\varrho)\rangle, \qquad (29)$$
where we have replaced the co-state $p[\nabla S(\varrho)]$ by its representative $\nabla S(\varrho) \in \mathcal{A}$ and omitted, with the customary abuse of notation, the argument $q$ in $\varrho(q)$, but not in $H_\vartheta(q, p)$. Note that since the difference
$$\mathcal{D}(u, \varrho)\, S(t, \varrho) - \langle \vartheta(u, \varrho), \nabla S(t, \varrho)\rangle = \frac{1}{2}\left\langle \sigma(\varrho)^{\otimes 2}, \nabla^{\otimes 2} S(t, \varrho)\right\rangle$$
is independent of $u$, the solution $u^\circ$ to the minimization problem
$$\inf_{u \in \mathbf{U}}\left\{ C(u, \varrho) + \mathcal{D}(u, \varrho)\, S(t, \varrho)\right\} = \frac{1}{2}\left\langle \sigma(\varrho)^{\otimes 2}, \nabla^{\otimes 2} S(t, \varrho)\right\rangle - H_\vartheta(q, \nabla S(t, \varrho))$$
in (28) coincides with the solution $u^\circ(q, p)$ of the corresponding nonstochastic problem (26) for $q = \varrho_0 - \varrho$ and $p = p[\nabla S(t, \varrho)]$.
5 Stochastic Hamilton-Jacobi-Bellman Equation
An alternative approach to deriving equation (29) will now be formulated, in which the control strategy is a priori not assumed to be nonanticipating. First of all we make a Wong-Zakai approximation [34] to the Stratonovich filtering equation (13). This is achieved by introducing a differentiable process $w^{(\lambda)}_\omega(t) = \int_0^t \xi^{(\lambda)}_\omega(r)\, dr$ converging to the Wiener noise $w(t)$ as $\lambda \to 0$, almost surely and uniformly for $t \in [0, T]$. (For instance, we may take $\{\xi^{(\lambda)}(t)\}$ to be Gaussian with a finite auto-correlation time which vanishes as $\lambda \to 0$.) We may then expect the same type of convergence for the solution, $\{\varrho^{(\lambda)}_\omega(t)\}$, to the random ODE
$$\frac{d}{dt}\varrho^{(\lambda)}_\omega(t) = \mu(u(t), \varrho^{(\lambda)}_\omega(t)) + \sigma(\varrho^{(\lambda)}_\omega(t))\, \xi^{(\lambda)}_\omega(t)$$
with non-random initial condition $\varrho^{(\lambda)}_\omega(t_0) = \varrho_0$, as for the solution $\{\varrho_\omega(t)\}$ with the same initial data $\varrho_\omega(t_0) = \varrho_0$. If we fix the output $\omega \in \Omega$, then we have an equivalent non-random dynamical system, for which we will have a minimal cost function, and we denote this as $S^{(\lambda)}_\omega(t_0, \varrho_0)$. Note that this depends on the assumed realization of the measurement output process and on the approximation parameter $\lambda$. The HJB equation for $S^{(\lambda)}_\omega(t, \varrho)$ will be (25), with $\vartheta(u)$ now replaced by $\vartheta^{(\lambda)}(t, u) = \mu(u) + \sigma\, \xi^{(\lambda)}(t)$ (so that $H$ below denotes $H_\mu$, defined as in (24) with $\vartheta$ replaced by $\mu$):
$$-\frac{\partial}{\partial t} S^{(\lambda)}_\bullet(t) + H\left(q, \nabla S^{(\lambda)}_\bullet(t)\right) = \left\langle \sigma, \nabla S^{(\lambda)}_\bullet(t)\right\rangle \xi^{(\lambda)}_\bullet(t),$$
where we omitted the argument $\varrho$ in $S$ and $\sigma$, and in $q = \varrho_0 - \varrho \equiv q(\varrho)$. In the limit $\lambda \to 0$ we obtain the Stratonovich SDE
$$-dS_\bullet(\varrho) + H(\varrho_0 - \varrho, \nabla S_\bullet(\varrho))\, dt = \langle \sigma(\varrho), \nabla S_\bullet(\varrho)\rangle \circ dw, \qquad (30)$$
which may be called a stochastic Hamilton-Jacobi-Bellman equation. Note that since $\sigma(\varrho)\,\xi^{(\lambda)}(t)$ does not depend on $u$, the corresponding optimal strategy $u^{(\lambda)}_\bullet(t)$, as the solution of the optimization problem
$$\inf_{u \in \mathbf{U}}\left\{ C(u) + \left\langle \mu(u) + \sigma\,\xi^{(\lambda)}_\bullet(t), X\right\rangle(q)\right\} = \left\langle \sigma\,\xi^{(\lambda)}_\bullet(t), X\right\rangle(q) - H(q, p[X]),$$
is the same function $u^\circ(q, p)$ of $q = \varrho_0 - \varrho^{(\lambda)}_\bullet(t)$ and $p = p\left[\nabla S^{(\lambda)}_\bullet(t)\right]$ as in (26), independent of $\sigma(\varrho)\,\xi^{(\lambda)}(t)$. Moreover, due to the independence of the difference
$$\vartheta(\varrho) - \mu(\varrho) = \frac{1}{2}\left[ L\sigma(\varrho) + \sigma(\varrho) L^\dagger - \langle \sigma(\varrho), L + L^\dagger\rangle\varrho - \langle \varrho, L + L^\dagger\rangle\sigma(\varrho)\right]$$
on $u$, the function $\vartheta(u)$ in (26) may technically even be replaced by the function $\mu(u)$.
5.1 Interpretation of the Stochastic HJB equation
The expression $S_\omega(t_0, \varrho_0)$ gives the optimal cost from start time $t_0$ to terminal time $T$ when we begin in state $\varrho_0$ and have measurement output $\omega \in \Omega$. It evidently depends on the information $\{\omega(r) : r \in [t_0, T]\}$ only and is statistically independent of the noise $w(t) = \omega_t$ prior to time $t_0$. In this sense, the stochastic action $S_\omega(t, \varrho)$ is backward-adapted, and the optimal control strategy may not be nonanticipating. This point is of crucial importance: it means that the stochastic Hamilton-Jacobi-Bellman theory is not related directly to the stochastic Hamilton-Jacobi theory [32], where the action is always taken to be forward-adapted as the non-stochastic functional $S(t, \varrho)$ of the adapted state $\varrho = \varrho_\omega(t)$; it also means that we need to be careful when converting (30) to Itô form. This is a direct consequence of the fact that Bellman's principle works by backward induction. Let us introduce the following time-reversed notations:
$$\tau := T - t, \qquad \tilde w(\tau) := w(T - \tau) = w(t), \qquad \tilde S_\omega(\tau, \varrho) := S_\omega(T - \tau, \varrho) = S_\omega(t, \varrho).$$
The process $\tau \mapsto \tilde S(\tau, \varrho)$ is forward-adapted to the filtration generated by $\tilde w$: that is, $\tilde S(\tau, \varrho)$ is measurable with respect to the sigma-algebra generated by $\{\tilde w(\tau') : \tau' \in [0, \tau]\}$. Note that the Itô differential $d\tilde w(\tau) = \tilde w(\tau + \varepsilon) - \tilde w(\tau)$ coincides with $w(t - \varepsilon) - w(t) \equiv -\tilde dw(t)$ for $t = T - \tau$ and positive $dt = \varepsilon = d\tau$.

Theorem 1. The stochastic process $\{S_\bullet(t, \varrho) : t \in [0, T]\}$ satisfies the backward Itô SDE
$$-dS_\bullet + H(q, \nabla S_\bullet)\, dt = \frac{1}{2}\langle \sigma, \nabla\langle \sigma, \nabla S_\bullet\rangle\rangle\, dt + \langle \sigma, \nabla S_\bullet\rangle\, \tilde dw, \qquad (31)$$
where $\tilde dw(t) := w(t) - w(t - dt)$ is the past-pointing Itô differential.
Proof. For simplicity, we suppress the $\varrho$ and $q$ dependences. We shall take $\varepsilon > 0$ to be infinitesimal and recast (30) in the midpoint form
$$-\left[ S(t+\tfrac{1}{2}\varepsilon) - S(t-\tfrac{1}{2}\varepsilon)\right] + H(\nabla S(t))\,\varepsilon = \langle \sigma, \nabla S(t)\rangle\left[ w(t+\tfrac{1}{2}\varepsilon) - w(t-\tfrac{1}{2}\varepsilon)\right] + o(\varepsilon).$$
In time-reversed notations this becomes
$$-\left[ \tilde S(\tau-\tfrac{1}{2}\varepsilon) - \tilde S(\tau+\tfrac{1}{2}\varepsilon)\right] + \tilde H(-\nabla\tilde S(\tau))\,\varepsilon = \langle \sigma, \nabla\tilde S(\tau)\rangle\left[ \tilde w(\tau-\tfrac{1}{2}\varepsilon) - \tilde w(\tau+\tfrac{1}{2}\varepsilon)\right] + o(\varepsilon),$$
where $\tilde H_v(p) = H_v(-p)$. We then have the forward-time equation
$$\tilde S(\tau+\tfrac{1}{2}\varepsilon) - \tilde S(\tau-\tfrac{1}{2}\varepsilon) + \tilde H(-\nabla\tilde S(\tau))\,\varepsilon + \langle \sigma, \nabla\tilde S(\tau)\rangle\left[ \tilde w(\tau+\tfrac{1}{2}\varepsilon) - \tilde w(\tau-\tfrac{1}{2}\varepsilon)\right] = o(\varepsilon),$$
and using the Itô-Stratonovich transformation
$$\langle \sigma, \nabla\tilde S(\tau)\rangle\left[ \tilde w(\tau+\tfrac{1}{2}\varepsilon) - \tilde w(\tau-\tfrac{1}{2}\varepsilon)\right] = \langle \sigma, \nabla\tilde S(\tau)\rangle\left[ \tilde w(\tau+\varepsilon) - \tilde w(\tau)\right] - \frac{1}{2}\langle \sigma, \nabla\langle \sigma, \nabla\tilde S(\tau)\rangle\rangle\,\varepsilon + o(\varepsilon),$$
we get by substitution
$$\tilde S(\tau+\tfrac{1}{2}\varepsilon) - \tilde S(\tau-\tfrac{1}{2}\varepsilon) + \tilde H(-\nabla\tilde S(\tau))\,\varepsilon - \frac{1}{2}\langle \sigma, \nabla\langle \sigma, \nabla\tilde S(\tau)\rangle\rangle\,\varepsilon + \langle \sigma, \nabla\tilde S(\tau)\rangle\left[ \tilde w(\tau+\varepsilon) - \tilde w(\tau)\right] = o(\varepsilon),$$
or, in the backward form for the original $S(t, \varrho) = \tilde S(T - t, \varrho)$,
$$S(t+\tfrac{1}{2}\varepsilon) - S(t-\tfrac{1}{2}\varepsilon) + \frac{1}{2}\langle \sigma, \nabla\langle \sigma, \nabla S(t)\rangle\rangle\,\varepsilon - H(\nabla S(t))\,\varepsilon + \langle \sigma, \nabla S(t)\rangle\left[ w(t) - w(t-\varepsilon)\right] = o(\varepsilon).$$
In differential form this is clearly the same as (31).

If we denote by $\mathsf{E}$ the expectation on $\Omega$, then $\mathsf{E}[\langle \sigma, \nabla S_\bullet(t)\rangle\, \tilde dw(t)] = 0$, since the backward solution $S_\bullet(t)$, and its derivatives, with nonstochastic terminal condition $S(T) = G$, are independent of the mean-zero past-pointing Itô differentials $\tilde dw(t)$. We then have as a corollary that the averaged cost $\bar S(t, \varrho) := \mathsf{E}[S_\bullet(t, \varrho)]$ satisfies the equivalent diffusive Hamilton-Jacobi equation
$$-\frac{\partial \bar S}{\partial t} + H(q, \nabla\bar S) = \frac{1}{2}\langle \sigma, \nabla\langle \sigma, \nabla\bar S\rangle\rangle, \qquad (32)$$
which is the Hörmander form of the Bellman equation (29) for the optimal cost $\bar S(t, \varrho)$. This proves that the optimal control strategy realizing the stochastic Hamilton-Jacobi equation (30) is on average no better than the nonanticipating strategy realizing equation (28). (In fact, it coincides with the optimal nonanticipating strategy in this case.)
6 Linear-quadratic State Cost
A tractable special case, applicable to quantum mechanics, occurs when $C(u, \varrho)$ and $G(\varrho)$ are both linear (19) in the state $\varrho$, with quadratic dependence of $C$ on $u$.
Let us specify a cost observable with control parameter $u = (u^1, \ldots, u^n) \in \mathbb{R}^n$ and having a quadratic dependence of the form (Einstein index convention!)
$$C(u) = \frac{1}{2} g_{\alpha\beta}\, u^\alpha u^\beta + u^\alpha F_\alpha + C_0,$$
where $(g_{\alpha\beta})$ are the components of a symmetric positive-definite metric with inverse denoted $g^{\alpha\beta}$, and $F_1, \ldots, F_n, C_0$ are fixed bounded operators. We take the control Hamiltonian operator to be $H(u) = u^\alpha Q_\alpha$, where $Q_1, \ldots, Q_n$ are fixed controlled coordinates (bounded observables). Our aim is to find the optimal value $u^\circ$ for each pair $(q, p)$ giving a minimum to $\langle \varrho, C(u)\rangle + \langle \vartheta(u, \varrho), X\rangle$ for $\varrho = \varrho(q)$: we will have
$$0 = \frac{\partial}{\partial u^\alpha}\left\{ \langle \varrho, C(u)\rangle + \langle \vartheta(t, u, \varrho), X\rangle\right\} = g_{\alpha\beta}\, u^\beta + \langle \varrho, F_\alpha\rangle + \langle i[\varrho, Q_\alpha], X\rangle.$$
Thus the optimal control $u^\circ(q, p(X))$ is given by the components
$$u^{\circ\alpha} = -g^{\alpha\beta}\,\langle \varrho(q), F_\beta + i[Q_\beta, p[X]]\rangle,$$
where $p[X]$ is any operator $X + \lambda I$. This yields a unique point of infimum, and on substituting we determine that
$$H_\vartheta(q, p) = \frac{1}{2} g^{\alpha\beta}\,\langle \varrho(q), F_\alpha + i[Q_\alpha, p]\rangle\,\langle \varrho(q), F_\beta + i[Q_\beta, p]\rangle - \langle \varrho(q), C_0 + \mathcal{L}_R(p) + \mathcal{L}_L(p)\rangle.$$
As a result, the Hamilton-Jacobi-Bellman equation takes the form
$$-\frac{\partial S}{\partial t} + \frac{1}{2} g^{\alpha\beta}\,\langle \varrho, F_\alpha + i[Q_\alpha, \nabla S]\rangle\,\langle \varrho, F_\beta + i[Q_\beta, \nabla S]\rangle = \langle \varrho, C_0 + \mathcal{L}_R(\nabla S) + \mathcal{L}_L(\nabla S)\rangle + \frac{1}{2}\langle \sigma(\varrho)\otimes\sigma(\varrho), (\nabla\otimes\nabla)\, S\rangle,$$
the terminal condition being that $S(T, \varrho) = \langle \varrho, G\rangle$.
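The closed form for $u^\circ$ can be sanity-checked numerically. In the sketch below all operators $F_\alpha$, $Q_\alpha$, $C_0$ and the representative $X$ are random Hermitian matrices of our own choosing (not from the paper), and the claimed minimizer of $\langle\varrho, C(u)\rangle + \langle\vartheta(u,\varrho), X\rangle$ is compared against random perturbations of itself; only the $u$-dependent (Hamiltonian) part of $\vartheta$ matters for the minimization.

```python
import numpy as np

# Illustrative check of the optimal-control components derived above:
# for C(u) = (1/2) g_ab u^a u^b + u^a F_a + C_0 and H(u) = u^a Q_a, the
# minimizer of <rho, C(u)> + <theta(u, rho), X> should be
# u^a = -g^{ab} <rho, F_b + i[Q_b, X]>. All operators are random choices.
rng = np.random.default_rng(3)

def herm(n=2):
    Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (Z + Z.conj().T) / 2

n_ctl = 2
g = np.eye(n_ctl)                       # metric g_ab (identity for simplicity)
F = [herm() for _ in range(n_ctl)]
Q = [herm() for _ in range(n_ctl)]
C0, X = herm(), herm()
rho = np.diag([0.8, 0.2]).astype(complex)
pair = lambda a, b: np.trace(a @ b).real

def objective(u):                       # <rho, C(u)> + <i[rho, H(u)], X>
    Cu = 0.5 * np.einsum('a,ab,b->', u, g, u) * np.eye(2) \
         + sum(u[a] * F[a] for a in range(n_ctl)) + C0
    H = sum(u[a] * Q[a] for a in range(n_ctl))
    theta_H = 1j * (rho @ H - H @ rho)  # u-dependent part of (10)
    return pair(rho, Cu) + pair(theta_H, X)

comm = lambda A, B: A @ B - B @ A
u_opt = -np.linalg.solve(g, np.array(
    [pair(rho, F[a] + 1j * comm(Q[a], X)) for a in range(n_ctl)]))

# the closed form should beat random perturbations of itself
ok = all(objective(u_opt) <= objective(u_opt + 0.1 * rng.normal(size=n_ctl)) + 1e-12
         for _ in range(50))
print(ok)
```

Since the objective is a strictly convex quadratic in $u$ (the metric $g$ is positive definite), the stationary point is the unique global minimum.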
6.1 Controlled Qubit
Let us illustrate the above for the case of a qubit (two-state system). The feedback control problem we consider is similar to the one formulated in [15], with the qubit filtering equation derived in [10]. Choosing a basis $\{\sigma_\alpha\}$ in the tangent space $\mathcal{T}_0$ of zero-trace $2\times 2$ matrices, say given by the Pauli spin vector $\vec\sigma = (\sigma_x, \sigma_y, \sigma_z)$ with
$$\sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$
we may represent each state by the polarization vector $\vec r = -\vec q \in \mathbb{R}^3$ as
$$\varrho = 1 - \vec q\cdot\vec\sigma = \varrho_0 - q,$$
where $|\vec q| \le 1$ and $\varrho_0 = 1$. Here for convenience we have taken the coordinatization of quantum bit states with respect to the tracial state $\varrho_0 = 1$, normalized with respect to the unital trace $\mathrm{tr}$, which is half of the conventional trace $\mathrm{Tr}$, that is, $\mathrm{tr}\,\varrho := \frac{1}{2}\mathrm{Tr}\,\varrho = 1$. Any operator $X$ takes the form
$$X = \vec p\cdot\vec\sigma + \lambda 1,$$
and we have the duality $\langle \varrho, X\rangle = \lambda - \vec q\cdot\vec p$. The state coordinate vector $\vec q = (x, y, z)$ represents the tangent elements $q = \vec q\cdot\vec\sigma \in \mathcal{T}_0$, while the co-states $p(A) \in \mathcal{T}_0^\star$ are canonically identified with $p = \vec p\cdot\vec\sigma \in \mathcal{T}_0$, given by the momentum vectors $\vec p = (p_x, p_y, p_z)$ such that $\langle q, A\rangle = \langle q, p(A)\rangle$ is $\mathrm{tr}\{qp\} = \vec q\cdot\vec p$. Let us suppose that we have maximal spin control $\vec u \in \mathbb{R}^3$ of the Hamiltonian component of the dynamics; that is, we set
$$Q_\alpha = \frac{1}{2}\sigma_\alpha, \qquad H(\vec u) = \frac{1}{2}\vec u\cdot\vec\sigma.$$
We also ignore the effect of the environment and take $\mathcal{L}_R \equiv 0$. For simplicity, we shall take the cost to have the form
$$C(u, \varrho) = \frac{1}{2}|\vec u|^2,$$
and we take the coupling of the system to the measurement apparatus to be determined by the operator
$$L = \frac{1}{2}\sigma_z = L^\dagger.$$
Explicitly we have
$$\langle \vartheta(u, \varrho), p\rangle = \vec u\cdot(\vec p\times\vec q) + \frac{1}{2}(x p_x + y p_y),$$
from which we see that the minimizing control is $\vec u^\circ = \vec q\times\vec p$, leading to the Hamiltonian function
$$H_\vartheta(\vec p, \vec q) = \frac{1}{2}|\vec q\times\vec p|^2 - \frac{1}{2}(x p_x + y p_y).$$
Meanwhile,
$$\sigma(\varrho) = \frac{1}{2}(\varrho\sigma_z + \sigma_z\varrho) - \langle \varrho, \sigma_z\rangle\varrho,$$
and so
$$\langle \sigma(\varrho), p\rangle = (1 - z^2)\, p_z - z\,(x p_x + y p_y).$$
With the customary abuse of notation, we write S(t, \varrho(\vec q)) as S(t, x, y, z). The Itô correction term, \frac{1}{2}\langle \sigma(\varrho)^{\otimes 2}, \nabla^2 S \rangle, in the HJB equation is then given by (with S_{xy} = \frac{\partial^2 S}{\partial x \partial y}, etc.)

\frac{1}{2}
\begin{pmatrix} -zx & -zy & 1 - z^2 \end{pmatrix}
\begin{pmatrix}
S_{xx} & S_{xy} & S_{xz} \\
S_{yx} & S_{yy} & S_{yz} \\
S_{zx} & S_{zy} & S_{zz}
\end{pmatrix}
\begin{pmatrix} -zx \\ -zy \\ 1 - z^2 \end{pmatrix}.
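Both the Bloch components (−zx, −zy, 1−z²) of σ(ϱ) and the quadratic form can be verified numerically. The sketch below (our own check, with an arbitrary symmetric matrix standing in for the Hessian of S) confirms that the matrix expression expands into six second-derivative terms:

```python
import numpy as np

# Pauli matrices and the unital trace tr = (1/2) Tr
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
tr = lambda A: 0.5 * np.trace(A)

x, y, z = 0.2, -0.3, 0.4
rho = I2 + x * sx + y * sy + z * sz

# Fluctuation coefficient sigma(rho) = (rho sz + sz rho)/2 - <rho, sz> rho
sig = 0.5 * (rho @ sz + sz @ rho) - tr(rho @ sz) * rho

# Its Bloch components should be (-zx, -zy, 1 - z^2)
comp = np.array([tr(sig @ s).real for s in (sx, sy, sz)])
assert np.allclose(comp, [-z * x, -z * y, 1 - z * z])

# Quadratic form (1/2) comp^T H comp versus its term-by-term expansion,
# with an arbitrary symmetric H standing in for the Hessian of S
H = np.array([[1., 2., 3.], [2., 5., 6.], [3., 6., 9.]])
ito = 0.5 * comp @ H @ comp
expanded = (0.5 * x * x * z * z * H[0, 0] + 0.5 * y * y * z * z * H[1, 1]
            + 0.5 * (1 - z * z) ** 2 * H[2, 2] + x * y * z * z * H[0, 1]
            - x * z * (1 - z * z) * H[0, 2] - y * z * (1 - z * z) * H[1, 2])
assert abs(ito - expanded) < 1e-12
```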
Putting everything together, we find that the Hamilton-Jacobi-Bellman equation is

\frac{\partial S}{\partial t}
+ \frac{1}{2}\left| \vec q \times \nabla S \right|^2
+ \frac{1}{2}\left( x \frac{\partial S}{\partial x} + y \frac{\partial S}{\partial y} \right)
= \frac{x^2 z^2}{2} \frac{\partial^2 S}{\partial x^2}
+ \frac{y^2 z^2}{2} \frac{\partial^2 S}{\partial y^2}
+ \frac{(1 - z^2)^2}{2} \frac{\partial^2 S}{\partial z^2}
+ x y z^2 \frac{\partial^2 S}{\partial x \partial y}
- x z (1 - z^2) \frac{\partial^2 S}{\partial x \partial z}
- y z (1 - z^2) \frac{\partial^2 S}{\partial y \partial z}.
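Although solving this HJB equation requires numerical PDE methods, the underlying posterior dynamics is easy to simulate. The Euler-Maruyama sketch below (our own illustration, with the control set to u⃗ = 0 and an arbitrarily chosen initial state, step size, and seed) integrates the Bloch-vector diffusion with drift (−x/2, −y/2, 0) and diffusion vector (−zx, −zy, 1−z²); under continuous measurement of σ_z the posterior should localize near one of the eigenstates z = ±1:

```python
import numpy as np

# Euler-Maruyama integration of the uncontrolled (u = 0) qubit filtering
# dynamics on the Bloch vector q = (x, y, z):
#     dq = (-x/2, -y/2, 0) dt + (-zx, -zy, 1 - z^2) dw
rng = np.random.default_rng(1)
dt, n_steps = 1e-3, 200_000
q = np.array([0.4, 0.3, 0.1])   # arbitrary initial (mixed) state
sqdt = np.sqrt(dt)
for _ in range(n_steps):
    x, y, z = q
    drift = np.array([-0.5 * x, -0.5 * y, 0.0])
    diffusion = np.array([-z * x, -z * y, 1.0 - z * z])
    q = q + drift * dt + diffusion * rng.normal() * sqdt

assert abs(q[2]) > 0.7          # z has localized towards +1 or -1
assert np.linalg.norm(q) > 0.7  # the posterior state has purified
```

Here z evolves as a bounded martingale, so it converges to one of the eigenvalues ±1 of the measured observable, while the coherences x and y decay.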
7
Discussion
In our analysis we have sought to think of the quantum state of a controlled system (that is, its von Neumann density matrix) in the same spirit as classical control engineers think about the state of the system. This is possible since the quantum (mixed) state is normally a sufficient coordinate for closed, as well as open, quantum systems under the Markov approximation, and this remains true in the appropriate stochastic sense even if the open system is under continuous nondemolition observation. The advantage of this is that all the quantum features of the problem are essentially tied up in the state: once the measurements have been performed, the information obtained can be treated as essentially classical, as can the problem of using this information to control the system in an optimal manner. The disadvantage is that we have to deal with a stochastic differential equation on the infinite-dimensional space of quantum states.

Nevertheless, the Bellman principle can then be applied in much the same spirit as for classical states, and we are able to derive the corresponding Hamilton-Jacobi-Bellman theory for a wider class of cost functionals than traditionally considered in the literature. When restricted to a finite-dimensional representation of the state (on the Bloch sphere for the qubit) with the cost being a quantum expectation, we recover the class of Bellman equations encountered as standard in quantum feedback control. Another quantum optimal feedback control problem, in the finite-dimensional space of the sufficient coordinates of Gaussian Bosonic states, was considered in [12].

Acknowledgment

We would like to thank Luc Bouten, Ramon van Handel, Hideo Mabuchi, and Aubrey Truman for useful discussions. J.G. would like to acknowledge the support of EPSRC research grant GR/R78404/01, and V.P.B. acknowledges EEC support through the ATESIT project IST-2000-29681 and the RTN network QP&Applications.
References

[1] M. Armen, J. Au, J. Stockton, A. Doherty, and H. Mabuchi, Adaptive homodyne measurement of optical phase, Phys. Rev. Lett. 89: 133602 (2002)
[2] V.P. Belavkin, Optimal quantum filtration of Markovian signals, Problems Control Inform. Theory, 7: no. 5, 345–360 (1978)
[3] V.P. Belavkin, Optimal measurement and control in quantum dynamical systems, Preprint No. 411, Inst. of Phys., Nicolaus Copernicus University, Toruń, February 1979
[4] V.P. Belavkin, Quantum filtering of Markov signals with white quantum noise, Radiotechnika and Electronika, 25: 1445–1453 (1980). English translation in: Quantum Communications and Measurement, V.P. Belavkin et al., eds., 381–392 (Plenum Press, 1994)
[5] V.P. Belavkin, Theory of the control of observable quantum systems, Autom. Remote Control, 44: 178–188 (1983)
[6] V.P. Belavkin, Nondemolition measurement and control in quantum dynamical systems, in: Information Complexity and Control in Quantum Physics (Udine, 1985), 311–329, CISM Courses and Lectures 294, Springer, Vienna (1987)
[7] V.P. Belavkin, Nondemolition measurements, nonlinear filtering and dynamical programming of quantum stochastic processes, in: Modelling and Control of Systems (Lecture Notes in Control and Information Sciences), ed. A. Blaquière, 121: 381–392 (Berlin: Springer, 1988)
[8] V.P. Belavkin, A new wave equation for continuous nondemolition measurement, Phys. Lett. A, 140: 355–358 (1989)
[9] V.P. Belavkin, Stochastic posterior equations for quantum nonlinear filtering, in: Probability Theory and Mathematical Statistics, ed. B. Grigelionis, 1: 91–109 (Vilnius: VSP/Mokslas, 1990)
[10] V.P. Belavkin, Quantum stochastic calculus and quantum nonlinear filtering, Journal of Multivariate Analysis, 42: 171–201 (1992)
[11] V.P. Belavkin, Quantum continual measurements and a posteriori collapse on CCR, Commun. Math. Phys., 146: 611–635 (1992)
[12] V.P. Belavkin, Measurement, filtering and control in quantum open dynamical systems, Rep. Math. Phys., 43: 405–425 (1999)
[13] R. Bellman, Dynamic Programming, Princeton University Press (1957)
[14] J.M. Bismut, Mécanique Aléatoire, Lecture Notes in Mathematics 866 (Berlin: Springer, 1981)
[15] L. Bouten, S. Edwards, V.P. Belavkin, Bellman equations for optimal feedback control of qubit states, arXiv:quant-ph/0407192 (2004)
[16] L. Bouten, M. Guţă, H. Maassen, Stochastic Schrödinger equations, J. Phys. A: Math. Gen., 37: 3189–3209 (2004)
[17] M.H.A. Davis, Linear Estimation and Stochastic Control, Chapman and Hall (1977)
[18] G. Da Prato, J. Zabczyk, Stochastic Equations in Infinite Dimensions, Encyclopedia of Mathematics and its Applications, Cambridge University Press (1992)
[19] A. Doherty, S. Habib, K. Jacobs, H. Mabuchi, and S. Tan, Quantum feedback and classical control theory, Phys. Rev. A, 62: 012105 (2000)
[20] C.W. Gardiner, Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences, Springer, Berlin (2004)
[21] C.W. Gardiner and P. Zoller, Quantum Noise, Springer, Berlin (2000)
[22] J. Geremia, J. Stockton, A. Doherty, and H. Mabuchi, Quantum Kalman filtering and the Heisenberg limit in atomic magnetometry, Phys. Rev. Lett., 91: 250801 (2003)
[23] J. Gough, A. Sobolev, Stochastic Schrödinger equations as limit of discrete filtering, Open Sys. & Information Dyn., 11: 235–255 (2004)
[24] J. Gough, A. Sobolev, Continuous measurement of canonical observables and limit stochastic Schrödinger equations, Phys. Rev. A, 69: 032107 (2004)
[25] R. van Handel, J. Stockton, and H. Mabuchi, Feedback control of quantum state reduction, arXiv:quant-ph/0402136 (2004)
[26] M.R. James, Risk-sensitive optimal control of quantum systems, Phys. Rev. A, 69: 032108 (2004)
[27] K.R. Parthasarathy, An Introduction to Quantum Stochastic Calculus, Birkhäuser, Berlin (1992)
[28] L.S. Pontryagin, V.G. Boltyanskii, R.V. Gamkrelidze, E.F. Mishchenko, The Mathematical Theory of Optimal Processes, John Wiley & Sons (1962)
[29] A.J. Scott, G.J. Milburn, Quantum nonlinear dynamics of continuously measured systems, Phys. Rev. A, 63: 042101 (2001); also arXiv:quant-ph/0008108
[30] O.G. Smolyanov, A. Truman, Schrödinger-Belavkin equations and associated Kolmogorov and Lindblad equations, Theor. Math. Physics, 120: no. 2, 973–984 (1999)
[31] R.L. Stratonovich, A new representation of stochastic integrals and equations, SIAM J. Control, 4: 362–371 (1966)
[32] A. Truman, H.Z. Zhao, The stochastic Hamilton-Jacobi equation, stochastic heat equations and Schrödinger equations, in: Stochastic Analysis and Applications, D. Elworthy, I.M. Davies, A. Truman (eds.), World Scientific Press, 441–464 (1996)
[33] H.M. Wiseman and G.J. Milburn, Quantum theory of optical feedback via homodyne detection, Phys. Rev. Lett., 70(5): 548–551 (1993)
[34] E. Wong, M. Zakai, On the relationship between ordinary and stochastic differential equations, Int. J. Eng. Sci., 3: 213–229 (1965)