A note on a possible connection between the Pontryagin Maximum Principle and Jaynes Maximum Entropy

Contact e-mail: [email protected]
Copyright © 2011-2015 by Javier Rivera. Registration TXu001878887 / 2013-09-15

Abstract

A possible connection between the Pontryagin Maximum Principle and Jaynes Maximum Entropy is developed in this note. A manufacturing facility, or plant, wants to optimally reduce the sum of the variances of its observed production process. The plant's models consist of known controllable state models along with known observational models of production. The selection of the plant's control vector can be described as an optimal control problem, so the Pontryagin Maximum Principle is used to uncover a possible connection to Jaynes Maximum Entropy. Each principle can be derived from the other under certain transformations relating the phase-space generalized momentum, as defined in a Hamiltonian system, to a probability space, as used in Shannon's entropy. The generalized momentum is used to define a set of probabilities, which associates the Hamiltonian of the system to Jaynes Maximum Entropy via the Legendre transformation.
Introduction

In this communication, the plant models are based mostly on statistical predictor-corrector models that can be written as known state and observation nonlinear models. Consider a simple example: a plant wants to reduce errors in production by minimizing the sum of observed production variances through time. Treating this nonlinear optimal control process with the Pontryagin Maximum Principle requires the definition of the plant state nonlinear models and of the plant sensor observations, both given by known nonlinear models. This problem will not be solved here; rather, the aim of this note is to show the connection between the Pontryagin Maximum Principle and Jaynes Maximum Entropy through a mapping between the probability space and the generalized momentum. The Legendre transformation is used to map the Hamiltonian of the maximum principle to a Lagrangian, and it follows that the Lagrangian is just the definition of entropy as used in Jaynes Maximum Entropy. The definition of a plant model is as found in Smirnov's book, Oscillation Theory of Optimal Processes.
Known Plant Models

A manufacturing plant wants to reduce errors in production by minimizing the sum of variances through time, subject to the known state and observation model constraints. The following nonlinear system of differential equations characterizes the known plant state models:

  dx_n(t)/dt = f_n(x(t), u(t)),   n = 1 to N

Note x(t) = (y_11(t), y_12(t), y_21(t), y_22(t), …, y_N1(t), y_N2(t))^T and x_n = (y_n1(t), y_n2(t))^T, where T denotes the transpose operator; y_n1(t) and y_n2(t) represent the nth plant state position and velocity row vectors, and u(t) is a parametric feature control column vector. The function f_n is the nth plant state's known nonlinear model. The sensor systems in the plant sample K (> N) measurements, or observations, of the plant states with some additive environmental noise:

  z_k(t) = M_k(x(t), u(t)) + w_k(t),   k = 1 to K

with z_k = (z_k1(t), z_k2(t))^T. Here z_k1(t) is the kth measured plant production "position" vector with respect to the common origin, z_k2(t) is the kth measured plant production rate or "velocity" vector, x(t) is the state vector representing the N plant states relative to a common origin, u(t) is the plant system's known control or feature parameter vector, and w_k(t) is an additive position and velocity zero-mean noise process due to plant sensor limitations and other environmental conditions. Thus M_k is a nonlinear sensor measurement function of the plant state vector and the input sensor control parameters. In this example, the purpose of the control process is to move the initial plant state x(t_0) = x_0 to a final state x(t_1) = x_1 with minimum observational model variance. Typically, in the Pontryagin Maximum Principle method, a functional to be maximized is given a priori (typically this functional is the negative of an absolute value, so its maximum is known to be at most zero). The steps to set this problem up for the Pontryagin Maximum Principle are shown in L.S. Pontryagin et al., The Mathematical Theory of Optimal Processes:

• Differentiate the observation models with respect to time, without differentiating the control parameter vector u:

  dz_kj/dt = Σ_{n=1}^{N} Σ_{i=1}^{6} f_ni(x(t), u(t)) ∂M_kj(x(t), u(t))/∂x_ni + dw_kj/dt

or

  dz_kj/dt = G_kj(x(t), u(t)) + dw_kj/dt

with

  G_kj(x(t), u(t)) = Σ_{n=1}^{N} Σ_{i=1}^{6} f_ni(x(t), u(t)) ∂M_kj(x(t), u(t))/∂x_ni

where the inner sum runs over the six components of x_n.

• Rename the observation variables and the hidden state variables into a new expanded state vector Y(t):

  Y(t) = (x^T(t), Z^T(t))^T

where the vector Z(t) is the aggregate of the K sensor observations, (z_1^T(t), …, z_K^T(t))^T.
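As a concrete illustration of the state and observation models above, the following is a minimal numerical sketch. All dynamics, gains, and noise levels here are hypothetical choices for illustration only, not quantities from the note; it simulates one plant state pair with two noisy sensors and stacks the result into the expanded state vector Y.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance of the models above: N = 1 plant state
# (position, velocity) and K = 2 sensors, all in one dimension.

def f(x, u):
    """State model dx/dt = f(x, u): damped motion driven by the control u."""
    pos, vel = x
    return np.array([vel, -0.5 * vel + u])

def M(x, u, k):
    """Sensor k's noiseless observation of (position, velocity)."""
    gain = 1.0 + 0.1 * k          # hypothetical per-sensor gain
    return gain * x

# Euler integration of the state, with noisy observations z_k = M_k(x,u) + w_k.
dt, steps, u = 0.01, 1000, 0.3
x = np.array([0.0, 0.0])
for _ in range(steps):
    x = x + dt * f(x, u)
    z = [M(x, u, k) + 0.05 * rng.standard_normal(2) for k in range(2)]

# Expanded state vector Y = (x^T, Z^T)^T as in the note.
Y = np.concatenate([x] + z)
print(Y.shape)
```

The stacked vector has one state pair plus two observation pairs, matching the 6N + 6K bookkeeping of the note in this one-dimensional toy case.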
• Define the minimum error variance process for obtaining an optimal control which minimizes the sum of observational noise variances, defined by the following equation assuming zero-mean noise processes, E(w_k) = 0:

  Σ_{k=1}^{K} [ E(|w_k|²) − |E(w_k)|² ] = Σ_{k=1}^{K} E(|w_k|²) = Σ_{k=1}^{K} Σ_{j=1}^{6} ∫ (w_kj)² D_kj(w_kj) dw_kj

In the above equation, the D_kj are known (or assumed) probability densities of the w_kj observation noise processes. Since the noise processes have zero mean, the measurements of state are unbiased. The problem can be rewritten as the maximization of a functional, J[Y(τ), u(τ)], in terms of the new state variable Y and the sensor systems' observational feature-enhanced control parameters u, by the change of variables dw_kj = (dw_kj/dτ) dτ:

  J = J[Y(τ), u(τ)] = − ∫_{t_0}^{t_1} |I(Y(τ), u(τ))| dτ ≤ 0
Note that the maximum of J is zero. The integrand, I(Y, u), is given as

  I(Y, u) = Σ_{r=6N+1}^{6N+6K} (Y_r − M_{r−6N}(Y, u))² D_r(Y_r − M_{r−6N}(Y, u)) (dY_r/dτ − G_{r−6N}(Y, u))

Note the nomenclature has changed to (M_{6N+1}, …, M_{6N+6}) = M_1, …, (M_{6N+6K−5}, …, M_{6N+6K}) = M_K, and (G_{6N+1}, …, G_{6N+6}) = G_1, …, (G_{6N+6K−5}, …, G_{6N+6K}) = G_K. This is subject to the state model system of differential equations:

  dY_r/dt = F_r(Y(t), u(t)),   where r = 1 to 6N+6K

Here F is given as the vector (f_1, …, f_N, G_1 + dw_1/dt, …, G_K + dw_K/dt) of length 6N+6K. Note the control process is to move the initial plant state x(t_0) = x_0 to a final state x(t_1) = x_1 with minimum sum of observational model variances, yet other cost functions could be used, such as minimum time.
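The expanded system dY_r/dt = F_r(Y, u) can be sketched numerically. The sketch below uses a hypothetical scalar state and measurement model (not from the note) with one state component and one noiseless observation, and checks that the observation component of Y tracks M(x, u) when integrated, since dz/dt = G = f ∂M/∂x by the chain rule.

```python
import numpy as np

# Hypothetical scalar toy version of the expanded system dY_r/dt = F_r(Y, u):
# one state component x with dx/dt = f(x,u), one observation z = M(x,u) + w,
# so dz/dt = G(x,u) + dw/dt with G = f * dM/dx (chain rule, as in the note).

def f(x, u):
    return -x + u                 # hypothetical state model

def M(x, u):
    return 2.0 * x                # hypothetical measurement model

def dM_dx(x, u):
    return 2.0

def F(Y, u, dw_dt=0.0):
    """Right-hand side of the expanded system Y = (x, z)."""
    x, z = Y
    G = f(x, u) * dM_dx(x, u)     # G_k = sum over n of f_n * dM_k/dx_n
    return np.array([f(x, u), G + dw_dt])

# Euler integration; with zero noise, z(t) stays consistent with M(x(t), u).
dt, u = 0.001, 1.0
Y = np.array([0.0, 0.0])
for _ in range(5000):
    Y = Y + dt * F(Y, u)

assert abs(Y[1] - M(Y[0], u)) < 1e-9   # observation consistent with the state
print(Y)
```

With zero noise the identity z = M(x, u) is preserved exactly by the Euler step here, which is a small sanity check that the G_k construction reproduces the observation model.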
• In order to use the Pontryagin Maximum Principle, we need to define an additional variable:

  Y_0(t) = − ∫_{t_0}^{t} |I(Y(τ), u(τ))| dτ

By differentiating the above equation we obtain

  dY_0(t)/dt = −|I(Y(t), u(t))| = F_0(Y(t), u(t))

Note that the boundary conditions are Y_0(t_0) = 0 and Y_0(t_1) = J. Now the methods of the Pontryagin Maximum Principle can be applied to solve this problem if needed. For the Pontryagin Maximum Principle definitions and methods, please refer to Smirnov's book, Oscillation Theory of Optimal Processes, pages 7-8, or L.S. Pontryagin et al., The Mathematical Theory of Optimal Processes, pages 79-80. In a nutshell, what the Pontryagin Maximum Principle does is convert a very difficult observational/control problem into an operations research problem in which we know the optimal results lie at the edges or vertices of the boundary of the control parameter volume intercepted by the "cost" or Hamiltonian function. By translating the problem from an intractable dynamic problem (finding the minimum observational noise variance given the known plant state and observation models as constraints), this maps the control process problem into the theory of optimal control processes, which has the support of a principle such as the Pontryagin Maximum Principle, an extension of the Calculus of Variations and of Hamiltonian dynamics. A simple check that we are on the right track is to define the Hamiltonian as

  H(Ψ(t), Y(t), u(t)) = Σ_{r=0}^{6N+6K} Ψ_r(t) F_r(Y(t), u(t))

so that

  dY_r/dt = ∂H/∂Ψ_r = F_r(Y(t), u(t)),   for r = 0 to 6N+6K,
where the equality holds almost everywhere, and from which the known state and observational models are recovered. The generalized momentum equations are

  dΨ_r/dt = − ∂H/∂Y_r = − Σ_{s=0}^{6N+6K} Ψ_s(t) ∂F_s(Y(t), u(t))/∂Y_r,   r = 0, …, 6N+6K.

Hopefully, the dependence of the characteristic functions F_s on the state variables Y_r will be sparse, and therefore the matrix F_{s,r} = ∂F_s/∂Y_r will be sparse.
The usual necessary and sufficient conditions of calculus are used to perform the maximization of the Hamiltonian. The least upper bound value of the Hamiltonian, M(Y(t), u(t)), will typically lie on the boundary of the control space. There are numerical methods, such as steepest descent, which could be used to find the optimal observational control parameters u(t) that move the initial plant state to a final state with minimum observational model variance. But that is not the aim of this note; the aim is to show a possible connection with Jaynes Maximum Entropy.
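The remark that the maximum typically lies on the boundary of the control space can be illustrated numerically. The sketch below uses a hypothetical control-affine system (matrices A, B and vectors Y, Ψ are invented for illustration): since H is linear in u, a brute-force search over a box of admissible controls finds its maximum at a vertex.

```python
import itertools
import numpy as np

# Hypothetical control-affine toy: F(Y, u) = A @ Y + B @ u, so the Hamiltonian
# H = Psi . F is linear in u and its maximum over a box of admissible controls
# is attained at a vertex, as the "edges or vertices" remark above suggests.

A = np.array([[0.0, 1.0], [-1.0, -0.2]])
B = np.array([[0.0, 0.5], [1.0, 0.0]])
Y = np.array([0.3, -0.7])
Psi = np.array([1.0, 2.0])

def H(Psi, Y, u):
    return Psi @ (A @ Y + B @ u)

# Brute-force the maximum over a fine grid of the box [-1, 1]^2 ...
grid = np.linspace(-1.0, 1.0, 21)
best_grid = max(H(Psi, Y, np.array([a, b])) for a in grid for b in grid)

# ... and over the four vertices only.
vertices = [np.array(v) for v in itertools.product([-1.0, 1.0], repeat=2)]
best_vertex = max(H(Psi, Y, v) for v in vertices)

assert np.isclose(best_grid, best_vertex)   # the grid maximum is at a vertex
print(best_vertex)
```

For a Hamiltonian nonlinear in u the maximizer need not be a vertex, which is why the note says the optimum "typically" lies on the boundary.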
Jaynes Maximum Entropy Connection

A possible connection of the Pontryagin Maximum Principle to Jaynes Maximum Entropy can be achieved by mapping the generalized momentum, Ψ_r, to a probability space. For example, define the scalar weight ρ(t) as

  ρ(t) = Σ_{r=0}^{6N+6K} |Ψ_r(t)| > 0, and finite, for every time t in [t_0, t_1].

Given random disjoint events E_α in a known probability space, where S = ∪_{α=0}^{6N+6K} E_α, a probability P_α(t) is defined as follows (defining the probability space rigorously needs more work):

  P_α(t) = P(E_α, t) = |Ψ_α(t)| / Σ_{r=0}^{6N+6K} |Ψ_r(t)| = |Ψ_α(t)| / ρ(t) ≥ 0.

The probability of S, due to the disjoint property of its members, is given by

  P(S) = Σ_{α=0}^{6N+6K} P(E_α) = Σ_{α=0}^{6N+6K} P_α(t) = 1.

The above constraint equation plays the role of the sum of the probabilities of all events adding to one. Another mapping could have been used, for example a normal weighting rather than this uniform normalization; this type of transformation might be regarded as a contact or canonical transformation. Then, by setting P_α(t) as a generalized momentum, a new Hamiltonian formulation can be defined as

  H(P(t), Y(t), u(t)) = Σ_{r=0}^{6N+6K} P_r(t) F_r(Y(t), u(t)) = E{F},

with the constraint

  Σ_{r=0}^{6N+6K} P_r(t) = 1.
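The momentum-to-probability mapping above can be sketched with a few hypothetical numbers (the values of Ψ and F below are invented for illustration): normalize |Ψ_r| by its sum ρ, verify the probability constraint, and form the new Hamiltonian as an expectation of F.

```python
import numpy as np

# Sketch of the momentum-to-probability mapping: Psi is a generalized momentum
# vector, rho its l1 norm, and P_r = |Psi_r| / rho the induced probabilities.
# Psi and F are hypothetical illustration values, not quantities from the note.

Psi = np.array([0.5, -1.5, 2.0, -1.0])       # hypothetical momenta
F = np.array([1.0, 2.0, -0.5, 3.0])          # hypothetical F_r(Y, u) values

rho = np.sum(np.abs(Psi))
P = np.abs(Psi) / rho

assert rho > 0
assert np.all(P >= 0)
assert np.isclose(P.sum(), 1.0)              # the constraint sum_r P_r = 1

# The new Hamiltonian is an expectation of F under P: H(P, Y, u) = E{F}.
H_P = P @ F
print(rho, H_P)
```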
Since the following relation holds for all generalized momenta by inspection (using the triangle inequality),

  H(P(t), Y(t), u(t)) ≥ H(Ψ(t), Y(t), u(t)) / ρ(t),

the least upper bound of the Hamiltonian as a function of the control parameter u, holding the state vector Y(t) and the conjugate momentum Ψ(t) constant (so that ρ(t) and the probabilities P_α are also held constant), satisfies

  M_P(Y(t), u(t)) = sup_{u∈U} ρ(t) H(P(t), Y(t), u) ≥ M(Y(t), u(t)) = sup_{u∈U} H(Ψ(t), Y(t), u).
Since the Hamiltonian is defined as a linear function of the generalized momentum in the Pontryagin Maximum Principle (note that the Maximum Principle Hamiltonian may be considered the negative of the Hamiltonian used in Hamilton's least action, with unit mass, i.e., probability space), we can use the Legendre transformation to find the equivalent "Lagrangian" function, L, where we interpret the Legendre transformation as a partial differential equation with respect to the generalized probabilities, P_r(t):

  −H = Σ_{r=0}^{6N+6K} P_r(t) ∂L/∂P_r(t) − L = − Σ_{r=0}^{6N+6K} P_r(t) F_r(Y(t), u(t)).
This equation can be solved for a particular solution of the Lagrangian by induction, disregarding the homogeneous solutions.

Case 0:

  P_0(t) dL/dP_0(t) − L = −P_0(t) F_0(Y(t), u(t))

This is a linear differential equation for the Lagrangian L; its elementary solution is

  L = −F_0(Y(t), u(t)) P_0(t) Log(P_0(t)),

where Log denotes the natural logarithm to the base e.

Assume Case n holds, that is,

  L = Σ_{r=0}^{n} −F_r(Y(t), u(t)) P_r(t) Log(P_r(t))

is a particular solution of the Legendre transformation (PDE). Now Case n+1 needs to be proven.

Case n+1:

  Σ_{r=0}^{n+1} P_r(t) ∂L/∂P_r(t) − L = − Σ_{r=0}^{n+1} P_r(t) F_r(Y(t), u(t)) = −H = −E{F}

By construction,

  L = − Σ_{r=0}^{n+1} F_r(Y(t), u(t)) P_r(t) Log(P_r(t)).

Differentiate the Lagrangian with respect to the known probabilities to obtain

  ∂L/∂P_α(t) = −F_α(Y(t), u(t)) (1 + Log(P_α(t))).

Substituting into the Legendre transformation proves the n+1 case:

  Σ_{α=0}^{n+1} P_α(t) ∂L/∂P_α(t) − L
    = Σ_{α=0}^{n+1} −P_α(t) F_α(Y(t), u(t)) (1 + Log(P_α(t))) + Σ_{α=0}^{n+1} F_α(Y(t), u(t)) P_α(t) Log(P_α(t))
    = − Σ_{α=0}^{n+1} P_α(t) F_α(Y(t), u(t)) = −E{F} = −H.
Therefore, by induction, the result holds for all integers n, and the Lagrangian has the form

  L = − Σ_{r=0}^{6N+6K} F_r(Y(t), u(t)) P_r(t) Log(P_r(t)).
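The induction above can be checked symbolically for a small instance. The sketch below (with a hypothetical size n = 3, treating the F_r as free symbols) verifies that L = −Σ F_r P_r Log(P_r) satisfies the Legendre-transform PDE Σ P_r ∂L/∂P_r − L = −H.

```python
import sympy as sp

# Symbolic check (hypothetical small size n = 3) that the Lagrangian derived
# by induction, L = -sum_r F_r P_r log(P_r), satisfies the Legendre PDE
# sum_r P_r * dL/dP_r - L = -sum_r P_r F_r = -H.

n = 3
P = sp.symbols('P0:%d' % n, positive=True)
F = sp.symbols('F0:%d' % n)

L = -sum(F[r] * P[r] * sp.log(P[r]) for r in range(n))
lhs = sum(P[r] * sp.diff(L, P[r]) for r in range(n)) - L
H = sum(P[r] * F[r] for r in range(n))

# The difference between the left-hand side and -H should simplify to zero.
assert sp.simplify(lhs + H) == 0
print(sp.simplify(lhs))
```

Because ∂L/∂P_r = −F_r(1 + Log P_r), the sum Σ P_r ∂L/∂P_r equals L − H, and subtracting L leaves exactly −H, which is what the assertion confirms.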
The time integral of the Lagrangian is given as

  J_L = ∫_{t_0}^{t_1} L dτ = ∫_{t_0}^{t_1} Σ_{r=0}^{6N+6K} −F_r(Y(t), u(t)) P_r(t) Log(P_r(t)) dt = Σ_{r=0}^{6N+6K} ∫ −P_r Log(P_r) dY_r

due to the change of integration variables F_r dt = (dY_r/dt) dt = dY_r. Now, replacing F by its equivalent vector (dY_0/dt, dx_1/dt, …, dx_6N/dt, dz_1/dt, …, dz_6K/dt) of length 6N+6K+1,

  J_L = Σ_{r=1}^{6N} ∫_{t_0}^{t_1} −P_r(t) Log(P_r(t)) (dx_r/dt) dt
      + Σ_{r=6N+1}^{6N+6K} ∫_{t_0}^{t_1} −P_r(t) Log(P_r(t)) (dz_{r−6N}/dt) dt
      + ∫_{t_0}^{t_1} −P_0(t) Log(P_0(t)) (dY_0/dt) dt.

With a change of variables (t = f_i^{-1}(x_i(t)), etc.), the Lagrangian action integral can be seen as the maximization of entropy in terms of the state or phase space, the observational probability space, and the sum of the variances in the observation space. (This is a stretch, or a sketch, probably requiring the use of an ergodic theorem.)

  J_L = Σ_{r=1}^{6N} ∫ −P_r Log(P_r) dx_r + Σ_{r=6N+1}^{6N+6K} ∫ −P_r Log(P_r) dz_{r−6N} + ∫ −P_0 Log(P_0) dY_0.
Since the above mathematical method is reversible, the steps can be run in reverse: maximization of the entropy in the action functional of the Lagrangian

  L = Σ_{r=0}^{6N+6K} −F_r(Y(t), u(t)) P_r(t) Log(P_r(t)),

via the Legendre transformation, is equivalent to the Pontryagin Maximum Principle. Additionally, any other constraints not mentioned here can be dealt with through Lagrange multipliers. For the methods of Jaynes Maximum Entropy, please refer to his 1957 and 1982 papers listed in the references.
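The Lagrange-multiplier route mentioned above is exactly the standard Jaynes maximum-entropy recipe, which can be sketched numerically. The values of F and the target mean below are hypothetical illustration data: maximizing −Σ p_r Log(p_r) subject to Σ p_r = 1 and a moment constraint E{F} = μ yields the Gibbs form p_r ∝ exp(−λ F_r), with the multiplier λ fixed by the constraint.

```python
import numpy as np
from scipy.optimize import brentq

# Sketch of the Jaynes Maximum Entropy recipe with hypothetical numbers:
# maximize -sum_r p_r log(p_r) subject to sum_r p_r = 1 and the moment
# constraint E{F} = sum_r p_r F_r = mu.  Lagrange multipliers give the
# Gibbs form p_r = exp(-lam * F_r) / Z(lam).

F = np.array([0.0, 1.0, 2.0, 3.0])   # hypothetical values of F_r
mu = 1.2                             # hypothetical target expectation

def moment(lam):
    w = np.exp(-lam * F)
    p = w / w.sum()
    return p @ F - mu

lam = brentq(moment, -50.0, 50.0)    # solve E{F} = mu for the multiplier
w = np.exp(-lam * F)
p = w / w.sum()

assert np.isclose(p.sum(), 1.0)      # normalization constraint
assert np.isclose(p @ F, mu)         # moment constraint
print(p)
```

Any additional linear constraints would contribute one multiplier each, with the distribution becoming p_r ∝ exp(−Σ_m λ_m F_r^(m)), as in Jaynes' 1957 treatment.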
Conclusion

A connection between the Pontryagin Maximum Principle and Jaynes Maximum Entropy, in which each can be derived from the other under certain transformations relating the phase-space generalized momentum of a Hamiltonian system to the probability space of Shannon's entropy, was shown in this note for a manufacturing facility, or plant, that wants to optimally reduce the sum of the variances of its observed production process. The generalized momentum was used to define a set of probabilities, which associates the Hamiltonian of the system to Jaynes Maximum Entropy via the Legendre transformation. Yet further work needs to be done in explaining the physical meaning and implications of the probability-space mapping.
References

1) L.S. Pontryagin, V.G. Boltyanskii, R.V. Gamkrelidze, E.F. Mishchenko, The Mathematical Theory of Optimal Processes, L.S. Pontryagin Selected Works Volume 4, Gordon and Breach Science Publishers, NY, 1986.
2) G.M. Smirnov, Oscillation Theory of Optimal Processes, Wiley-Interscience Publication, John Wiley and Sons, NY, 1984.
3) E.T. Jaynes, "Information Theory and Statistical Mechanics", Physical Review, Volume 106, Number 4, May 15, 1957, pp. 620-630.
4) E.T. Jaynes, "On the Rationale of Maximum-Entropy Methods", Proceedings of the IEEE, Vol. 70, No. 9, September 1982, pp. 939-952.
5) I.M. Gelfand & S.V. Fomin, Calculus of Variations, Prentice Hall, NJ, 1963.