Minimum Time Trajectory Optimization and Learning

N. Sadegh and B. Driessen
School of Mechanical Engineering
Georgia Institute of Technology, Atlanta, GA 30332

Abstract

This paper presents a numerical algorithm for finding the bang-bang control input associated with the time optimal solution of a class of nonlinear dynamic systems. The proposed algorithm directly searches for the optimal switching instants based on a projected gradient optimization method. It is shown that this algorithm can be made into a learning algorithm by using on-line measurements of the state trajectory. The learning is shown to have the potential for significant robustness to mismatch between the model and the system. It learns a nearly optimal input through repeated trials in which it utilizes the measured terminal state error of the actual system and gradients based on the theoretical state equation of the system but evaluated along the actual state trajectory. The success of the method is demonstrated on an under-actuated double pendulum system called the acrobot.

Introduction

One of the primary attributes of an intelligent control system is its ability to automatically generate the desired state and/or output trajectory that the plant or plants under its control must follow. The desired trajectory is often determined so as to minimize a certain performance index while satisfying the required state and/or input constraints. A frequently used performance index in many motion control applications is the time required to perform a certain task. Most of the work in minimum-time trajectory planning has been for rigid, non-redundant, completely actuated manipulators, where non-redundant means that the end effector path completely determines the path in joint space (to within a finite number of possibilities). Exemplary is the work of Bobrow [2] and Shin and McKay [13, 12, 15, 14]. Here, the trajectory of the system along the path can be completely specified using a single path parameter $s(t)$. Bobrow [2] first constructed an infeasible region in the state space $(s, \dot{s})$ (the region in which the input torques required to keep the system on the path are not available) and then utilized a trial-and-error procedure for finding switching curves in the state space. Shin and McKay extended Bobrow's work, and later removed trial and error from the method by applying dynamic programming to a discretization of the state space $(s, \dot{s})$ [14]. Since there are only two state variables, Bellman's curse of dimensionality (see Larson and Casti [9]) was not computationally burdensome. There has been some work for the case where there are no path constraints and only the terminal state is specified. This scenario allows the use of only switching times as parameters if the state equation of the dynamic system is linear in the inputs and the inputs have explicit upper and lower bounds. (This can be proved with the Pontryagin Minimum Principle; see Bryson and Ho [3] and Lewis [10].) Exemplary is the work of Meier and Bryson [11] and Byers and Vadali [5]. Eisler, Robinett, Segalman, and Feddema [8, 7] applied sequential quadratic programming to a two-link flexible (under-actuated) robotic arm whose tip was specified to stay on a straight-line path. The objective was to get from one configuration at rest to another at rest in minimum time while maintaining the tip path. The optimization variables used were a discretized input torque history and the final time. This work represented progress toward solving more general minimum time problems than those discussed above. An interesting contrast to the above methods was presented by Zimmerman and Layton [16]. To generate a sub-optimal minimum time trajectory, they used a genetic algorithm with the actual system in the optimization loop. Since they were dealing with a system with noisy sensors (a flexible system with strain gauge sensors), a gradient-based approach presented difficulty. They reportedly achieved good results.
Unfortunately, this approach does not easily handle general constraints. Byers et al. [4] investigated near minimum time closed loop slewing maneuvers of flexible spacecraft. These researchers first neglected the flexibility of the craft to obtain the switching (min/max) input history for a rigid model of the craft, and smoothed the history by replacing the sign function with a hyperbolic tangent approximation. Their work was based on the observation that such an approach yielded a final state very close to the desired final state for large-angle maneuvers of craft with fairly small flexibility. More standard control methods were then used to remove the residual error in the final state.

In terms of parametrization, the work herein is similar to the switching time approach of Meier and Bryson [11], and similar in spirit to the learning approach of Zimmerman and Layton [16]. It formulates a projected gradient method to directly search for the optimal switching times. The main contributions of the paper are the analytical and numerical methods for computing the various gradients required by the optimization algorithm, together with a procedure for testing the conditions of Pontryagin's Minimum Principle on the candidate solution. Moreover, the algorithm automatically adds or deletes switching intervals during the search if the number of switching intervals is not known exactly. Another key feature of the algorithm is that it is capable of learning a nearly optimal control input by placing the actual system in the loop, thereby improving its robustness to modeling error. By means of a simulation example, the learning algorithm is shown to have potential for extreme robustness to mismatch between the model and the system. It is a projected gradient method that uses the measured terminal state error of the actual system and gradients based on the theoretical state equation of the system but evaluated along the actual state trajectory. Alternatively, if appropriate, numerical derivatives of the actual final state with respect to the optimization parameters (herein, switching time intervals) could be obtained. In the absence of measurement noise, such derivatives would be exact.

To demonstrate the usefulness of the method, it is successfully applied to control an under-actuated mechanical system. The system, referred to as the "acrobot," is a double pendulum with only one actuator torque, at the second joint. The optimal input swings the pendulum up from a given initial configuration, namely the "straight down" position, to an arbitrary final position, in particular the "straight up" position, in minimum time subject to motor torque limits.

Problem Statement and Formulation

We are given a nonlinear time-invariant dynamic system whose model, or theoretical state equation, is

\dot{x} = f(x) + \sum_{j=1}^{r} g_j(x)\,u_j \quad (1)

where $x \in R^m$ is the state vector, $u_1, \ldots, u_r \in R$ are the inputs, and $f(x)$ and $g_j(x)$ are smooth functions of $x$. The initial condition

x(0) = x_0 \quad (2)

the required terminal state $x_f$,

x(t_f) = x_f \quad (3)

and the input bounds

u_{j,\min} \le u_j \le u_{j,\max} \quad (4)

are given. The goal is to find $u(t)$ that minimizes the final time $t_f$ while satisfying equations (3) and (4). The necessary conditions for optimality of this problem can be derived using Pontryagin's Minimum Principle (see Bryson and Ho [3] and Lewis [10]). Defining the Hamiltonian as

H = 1 + \lambda^T \Big( f(x) + \sum_{j=1}^{r} g_j(x)\,u_j \Big),

each optimal $u_j$ must minimize $H$, where the co-state $\lambda$ satisfies the differential equation

\dot{\lambda} = -\frac{\partial H}{\partial x} = -\Big( \frac{\partial f(x)}{\partial x} + \sum_{j=1}^{r} \frac{\partial g_j(x)}{\partial x}\,u_j \Big)^{T} \lambda \quad (5)

subject to $H(x(t_f), \lambda(t_f)) = 0$ and the two-point boundary condition on $x$. The optimality condition for each $u_j$ implies that

u_j(t) = \begin{cases} u_{j,\max} & \text{if } \lambda^T g_j(x) < 0 \\ u_{j,\min} & \text{if } \lambda^T g_j(x) > 0 \\ \text{undetermined} & \text{if } \lambda^T g_j(x) = 0 \end{cases} \quad (6)

In spite of the simple dependency of $u_j$ on $\lambda$ and $x$, determining $\lambda$ itself is a difficult task: the resulting two-point boundary value problem is nonsmooth and hence difficult to solve numerically. A useful relationship, however, which can be used to determine $\lambda(0)$, and subsequently $\lambda(t)$, if a candidate solution were known, is that $\lambda(0)$ is the sensitivity of the optimal cost with respect to changes in $x(0)$ (see Bryson and Ho [3]); that is,

\lambda^T(0) = \frac{\partial t_f^{\,\mathrm{optimal}}}{\partial x(0)} \quad (7)
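Condition (6) reduces input selection to a sign test on the switching function $\lambda^T g_j(x)$. The following Python sketch (not from the paper; the numerical values are illustrative) shows that test for a system with two inputs:

```python
import numpy as np

def bang_bang_input(lam, g_cols, u_min, u_max, tol=1e-12):
    """Select each input per condition (6), minimizing H = 1 + lam^T (f + sum g_j u_j).

    lam    : co-state vector, shape (m,)
    g_cols : list of input vector fields g_j(x) evaluated at the current x, each (m,)
    u_min, u_max : per-input bounds, arrays of shape (r,)
    Entries with |lam^T g_j| < tol are singular (undetermined by (6)) -> NaN.
    """
    u = np.empty(len(g_cols))
    for j, gj in enumerate(g_cols):
        s = float(lam @ gj)      # switching function lam^T g_j(x)
        if s < -tol:
            u[j] = u_max[j]      # negative coefficient on u_j: push it to its upper bound
        elif s > tol:
            u[j] = u_min[j]      # positive coefficient: push it to its lower bound
        else:
            u[j] = np.nan        # singular arc: condition (6) is silent here
    return u

# Illustrative example: r = 2 inputs, bounds [-3, 3] as in the acrobot torque limit
lam = np.array([1.0, -2.0])
u = bang_bang_input(lam,
                    [np.array([0.0, 1.0]), np.array([1.0, 0.0])],
                    u_min=np.array([-3.0, -3.0]), u_max=np.array([3.0, 3.0]))
# lam^T g_1 = -2 < 0 -> u_1 = u_max = 3 ;  lam^T g_2 = 1 > 0 -> u_2 = u_min = -3
```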

The Numerical Optimization Algorithm

Based on Pontryagin's Minimum Principle, a time optimal solution of the $j$-th input, $u_j(t)$, if determined, switches exclusively between its lower and upper bounds, $u_{j,\min}$ and $u_{j,\max}$. In this paper we opt to use switching time intervals as optimization variables, assuming there exists a switching (bang-bang) control that transfers the initial state of the system to its final state in finite time. To simplify the presentation, the single-input case is considered first; the extension to the multi-input case is discussed later in this section. For convenience, the subscript $j$ is dropped from $u_j(t)$ and $g_j(x)$ in the single-input case.

To formulate the problem, let $t_i$, $0 = t_0 < t_1 < \cdots < t_n = t_f$, denote the $i$-th switching instant and define the $i$-th (signed) switching interval

\Delta t_i = \begin{cases} t_i - t_{i-1} & \text{if } u(t) = u_{\max} \\ t_{i-1} - t_i & \text{if } u(t) = u_{\min} \end{cases}, \qquad t_{i-1} < t < t_i

Clearly, for a fixed initial state, the control input, the terminal constraint, and the cost function $t_f$ are all functions of the switching intervals. The optimization problem is to seek $\Delta t = [\Delta t_1 \ \cdots \ \Delta t_n] \in R^n$ in order to minimize

w(\Delta t) \equiv t_f = \sum_{l=1}^{n} |\Delta t_l|

subject to

e_f(\Delta t) \equiv x(t_f) - x_f = 0.

Starting with an initial guess, the following gradient optimization algorithm can be used to numerically solve for the switching interval vector and the resulting state trajectory:

\Delta t_{\mathrm{new}} = \Delta t - k_r A^{+} e_f - k_n (I - A^{+} A) \nabla w \quad (8)

where $1 \ge k_r \ge k_n > 0$ are constant learning gains, $I$ denotes the identity matrix, $A = \partial x(t_f)/\partial \Delta t$ is the Jacobean matrix of partial derivatives of $x(t_f)$ with respect to $\Delta t$, $\nabla w$ is the gradient of $w$ with respect to $\Delta t$, i.e., $\nabla w = (\partial t_f/\partial \Delta t)^T = [\mathrm{sgn}(\Delta t_1) \ \cdots \ \mathrm{sgn}(\Delta t_n)]^T$, $\mathrm{sgn}(\cdot)$ is the signum function, and $A^{+}$ is the Moore-Penrose pseudo-inverse of $A$. For algebraically independent constraints, where the Jacobean matrix $A$ has full rank $m \le n$ (the normal case for the algorithm), $A^{+} = A^T (A A^T)^{-1}$. One notes that $(I - A^{+}A)\nabla w$ is the projection of the cost gradient onto the null space of the constraint matrix $A$, and $A^{+} e_f$ is the minimum 2-norm solution to $A z = e_f$. The convergence theory of projected gradient methods can be found in Bazaraa et al. [1]. From this theory we can conclude that, for small learning gains and a full-rank Jacobean $A$, convergence is guaranteed; that is, the constraint error $e_f$ and the projection $(I - A^{+}A)\nabla w$ converge to zero.

The derivative of $x(t_f)$ with respect to each component $\Delta t_i$ of the switching interval vector $\Delta t$ can be obtained as

\frac{\partial x(t_f)}{\partial \Delta t_i} = \frac{\partial x(t_f)}{\partial x_i}\frac{\partial x_i}{\partial t_i}\frac{\partial t_i}{\partial \Delta t_i} = \frac{\partial x(t_f)}{\partial x_i}\,\dot{x}_i\,\mathrm{sgn}(\Delta t_i) \quad (9)

where $x_i \equiv x(t_i)$, and $\dot{x}_i$ is the right-hand side of equation (1) evaluated at $x_i$ and $u = u_{\max}$ or $u = u_{\min}$, depending on the sign of $\Delta t_i$. The validity of equation (9) arises from the fact that $t_i = \sum_{l=1}^{i} |\Delta t_l|$ and that $u(t)$ is fixed over $[t_{i-1}, t_i]$. Letting $A_i = \partial x_i/\partial x_{i-1}$, by the chain rule $\partial x(t_f)/\partial x_i = A_n A_{n-1} \cdots A_{i+1}$. The matrices $A_i$ themselves can be computed by integrating the first variation of equation (1). Letting $A_i(t) = \partial x(t)/\partial x_{i-1}$, $t_{i-1} \le t \le t_i$, then $A_i(t)$ satisfies

\frac{dA_i(t)}{dt} = \left[ \frac{\partial f(x(t))}{\partial x} + \frac{\partial g(x(t))}{\partial x}\,u(t_i) \right] A_i(t), \qquad A_i(t_{i-1}) = I \quad (10)

and $A_i = A_i(t_i)$, where $x(t)$ is the trajectory resulting from $\Delta t$, and $u(t_i)$ is either $u_{\max}$ or $u_{\min}$.

To account for an incorrect number of initial switching instants, the gradient optimization algorithm removes from the set of search variables switching intervals whose absolute values pass through zero (i.e., if the interval $\Delta t_i$ shrinks past zero and changes sign). It also allows for a switching interval to be added by always including an "extra" switching interval of length $\varepsilon$ near zero with sign opposite to its preceding interval. That is, it lets $\Delta t_{n+1} = -\varepsilon\,\mathrm{sgn}(\Delta t_n)$ and allows this interval to grow into a finite-length switching interval, hence increasing the length of $\Delta t$, if so dictated by the gradient projection algorithm. This modification has proven very effective in arriving at the optimal solution in the simulation case studies.

Finally, to ensure that the solution found by the gradient optimization algorithm is a true optimal candidate, it must be tested against Pontryagin's Minimum Principle. This is particularly important in cases where it is possible for the algorithm to converge to a solution with a non-optimal number of switching intervals. In these cases a check is needed to assure that the solution is in fact a stationary one with respect to the full variable set, which includes not just the widths of the switching intervals but also the values of the inputs. If these optimality conditions are not satisfied, i.e., if the quantity $\lambda^T g(x)$ associated with $u(t)$ has the wrong sign, the algorithm is restarted with a new $\Delta t$ that satisfies equation (6) using the current (non-optimal) values of $\lambda$ and $x$. The process is continued until a stationary solution satisfying equation (6) is obtained.

As pointed out earlier, to verify these optimality conditions, the co-state $\lambda(t)$ needs to be computed at $t = 0$ using equation (7) and then integrated according to equation (5) over the trajectory resulting from the stationary solution. To this end, let $x_0 = x(0)$ and let $\Delta t(x_0)$ denote the switching interval vector corresponding to $x_0$. Then, by the chain rule and stationarity of the solution (i.e., $(\partial t_f/\partial \Delta t)(I - A^{+}A) = 0$),

\frac{dt_f(x_0)}{dx_0} = \frac{\partial t_f}{\partial \Delta t}\frac{\partial \Delta t}{\partial x_0} = \frac{\partial t_f}{\partial \Delta t} A^{+} A \frac{\partial \Delta t}{\partial x_0} \quad (11)

Differentiating the constraint equation $x(t_f) - x_f = 0$, which is now a function of both $x_0$ and $\Delta t$, with respect to $x_0$, we have

\frac{dx(t_f)}{dx_0} = \underbrace{\frac{\partial x(t_f)}{\partial x_0}}_{A_n \cdots A_1} + \underbrace{\frac{\partial x(t_f)}{\partial \Delta t}}_{A}\,\frac{\partial \Delta t}{\partial x_0} = 0

Using the preceding equation in equation (11) gives

\lambda^T(0) = \frac{dt_f(x_0)}{dx_0} = -\frac{\partial t_f}{\partial \Delta t}\,A^{+} A_n \cdots A_1

where, as before, $\partial t_f/\partial \Delta t_i = \mathrm{sgn}(\Delta t_i)$.
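The mechanics of the update (8) can be sketched on a toy stand-in (not the paper's acrobot): here the "terminal state" is taken as a fixed linear map $A\,\Delta t$ with target $b$, so $e_f(\Delta t) = A\,\Delta t - b$ and the constraint Jacobean is constant; the matrix, target, gains, and starting point are all illustrative. The null-space term then drives the cost $\sum |\Delta t_i|$ down while the pseudo-inverse term holds the constraint, and one interval is visibly shrunk through zero, as the interval-deletion rule above anticipates.

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0]])      # illustrative constraint Jacobean (m=1, n=3)
b = np.array([2.0])                  # illustrative required terminal value
k_r, k_n = 0.5, 0.01                 # learning gains, 1 >= k_r >= k_n > 0

dt = np.array([3.0, -2.0, 1.0])      # initial signed switching intervals
A_pinv = np.linalg.pinv(A)           # A^+ = A^T (A A^T)^{-1} for full-rank A
P = np.eye(3) - A_pinv @ A           # projector onto the null space of A

for _ in range(2000):
    e_f = A @ dt - b                 # terminal constraint error
    grad_w = np.sign(dt)             # gradient of t_f = sum |dt_i|
    dt = dt - k_r * (A_pinv @ e_f) - k_n * (P @ grad_w)   # update (8)

t_f = np.abs(dt).sum()
# the constraint sum(dt) = 2 is held throughout, while the middle interval is
# driven through zero and t_f descends toward its constrained minimum of 2
```

Note the subgradient chatter once an interval reaches zero; in the paper's algorithm that interval would be removed from the search variables instead.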

To extend the preceding algorithm to systems with more than one input, one can let each input have its own switching time interval vector by defining $\Delta t = [\Delta t_{1,1}, \ldots, \Delta t_{1,n_1}, \ldots, \Delta t_{r,1}, \ldots, \Delta t_{r,n_r}]$, where $\Delta t_{j,i}$ is the $i$-th signed switching interval and $n_j$ is the number of switching intervals for the $j$-th input. Obviously, this composite switching interval vector must satisfy

t_f = \sum_{i=1}^{n_j} |\Delta t_{j,i}| = \sum_{i=1}^{n_k} |\Delta t_{k,i}|, \qquad j \ne k

which can then be appended to the terminal state constraint. The remaining details of the algorithm are very similar to those of the single-input case and are therefore omitted.

The Learning Algorithm

The gradient projection algorithm of the preceding section requires the computation of the state trajectory (including the final state error) for computing the required Jacobeans and the final constraint error. Unfortunately, if this algorithm (or one similar to it) is applied to control an actual system, a slight modeling error may render a solution that is far from optimal. In particular, if the system model is unstable or only marginally stable, the propagation of the modeling error through the system model may result in an unacceptably large state trajectory error, especially at the final time. To increase the robustness of the optimization algorithm, one may use the state trajectory feedback from the actual system, instead of its computed trajectory, in a series of experimental trials. This is accomplished by first selecting an initial switching interval vector and applying the corresponding switching control law to the actual system. After completion of the initial trial, the resulting state trajectory (including the final state error) is recorded. Then, based on the gradient algorithm, an improved switching vector is calculated for the next trial. The trials continue until convergence to a feasible (satisfying the terminal state requirement) and near optimal solution is obtained. The learning control law that updates the switching interval vector is derived from equation (8) by using the actual state trajectory in computing the terminal state error and the required Jacobeans:

\Delta t^{c+1} = \Delta t^{c} - k_r (A^{c})^{+} \big( x_m^{c}(t_f) - x_f \big) - k_n \big( I - (A^{c})^{+} A^{c} \big) \nabla w^{c} \quad (12)

where the superscript $c$ represents the cycle number, $x_m^{c}$ is the measured terminal state during cycle $c$, and $A^{c}$ is the Jacobean matrix $A$ evaluated along $x_m^{c}$.

The Jacobeans $A_i = \partial x_i/\partial x_{i-1}$ in equation (12) can be approximated by first differentiating a discrete-time Runge-Kutta approximation of equation (1), denoted symbolically by

x_{i,k+1} = \tilde{f}(x_{i,k}, u(t_{i-1})) \quad (13)

where $x_{i,k}$ is the value of $x$ at the $k$-th discretization step over the switching interval $[t_{i-1}, t_i]$, to obtain $\partial x_{i,k+1}/\partial x_{i,k}$, $k = 0, \ldots, N_i - 1$, where $N_i$ is the number of Runge-Kutta integration steps over $[t_{i-1}, t_i]$. These derivatives can then be propagated forward by the chain rule to give

A_i \cong \frac{\partial x_{i,N_i}}{\partial x_{i,0}} = \frac{\partial x_{i,N_i}}{\partial x_{i,N_i-1}}\,\frac{\partial x_{i,N_i-1}}{\partial x_{i,N_i-2}} \cdots \frac{\partial x_{i,1}}{\partial x_{i,0}} \quad (14)

The effectiveness of this approach is illustrated in the following simulation case study.
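The chain-rule propagation (14) can be sketched in Python on a toy single-input system $\dot{x} = [x_2,\ -\sin x_1]^T + [0,\ 1]^T u$ (an illustrative pendulum, not the acrobot); a fixed-step RK4 map plays the role of $\tilde{f}$ in (13), each step is differentiated by central finite differences, and the product of the per-step Jacobeans is checked against a direct sensitivity of the whole interval map:

```python
import numpy as np

def rk4_step(x, u, h):
    """One fixed-step RK4 map, standing in for f~ in (13)."""
    f = lambda x: np.array([x[1], -np.sin(x[0]) + u])
    k1 = f(x); k2 = f(x + 0.5*h*k1); k3 = f(x + 0.5*h*k2); k4 = f(x + h*k3)
    return x + (h/6.0) * (k1 + 2*k2 + 2*k3 + k4)

def step_jacobian(x, u, h, eps=1e-6):
    """Central-difference Jacobean of one RK4 step, d x_{k+1} / d x_k."""
    n = len(x); J = np.empty((n, n))
    for c in range(n):
        d = np.zeros(n); d[c] = eps
        J[:, c] = (rk4_step(x + d, u, h) - rk4_step(x - d, u, h)) / (2*eps)
    return J

u, h, N = 1.0, 0.01, 50            # constant (bang) input over the interval
x = np.array([0.3, 0.0])
A_i = np.eye(2)
for _ in range(N):                 # chain rule (14): left-multiply step Jacobeans
    A_i = step_jacobian(x, u, h) @ A_i
    x = rk4_step(x, u, h)

# Cross-check: direct finite difference of the composed interval map
def propagate(x0):
    for _ in range(N):
        x0 = rk4_step(x0, u, h)
    return x0

eps = 1e-6; A_fd = np.empty((2, 2))
for c in range(2):
    d = np.zeros(2); d[c] = eps
    A_fd[:, c] = (propagate(np.array([0.3, 0.0]) + d)
                  - propagate(np.array([0.3, 0.0]) - d)) / (2*eps)
# the chained product agrees with the direct sensitivity of the interval map
```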

Simulation Case Study

The learning algorithm was applied with success to the problem of the acrobot, illustrated below in Figure 1.

[Figure 1: Acrobot schematic, showing the joint angles θ_1 and θ_2]

The initial state of the system is the "straight down" position (i.e., $\theta_1 = 0$, $\theta_2 = 0$) at rest, and the desired terminal state is the "straight up" position (i.e., $\theta_1 = \pi$, $\theta_2 = 0$), also at rest. We denote the masses and centroidal moments of inertia of the two links by $m_1$, $m_2$, $I_1$, $I_2$ (which take into account the motor inertia at the second joint), their center of mass positions by $l_{1c}$, $l_{2c}$, measured from the joint of each respective link, the link lengths by $l_1$ and $l_2$, and the gravitational acceleration by $g$. The system has its link mass distributed such that $l_{2c} = 0$. The complete set of nominal parameter values is: $m_1 = 1$, $m_2 = 1$, $l_1 = 1$, $l_2 = 1$, $g = 9.81$, $I_1 = 1/6$, $I_2 = 1/2$, $l_{1c} = 1/2$, $l_{2c} = 0$ (all units SI). The motor torque $u$ at the second joint is restricted as follows: $-u_{\max} \le u \le u_{\max}$, where $u_{\max} = 3$. The equations of motion can be found in DeJong and Spong [6]; they were integrated forward using an adaptive step-size Runge-Kutta discretization of the form of equation (13).

To test the learning algorithm by simulation, the values of the masses of the actual system were increased by 40% from those of the theoretical system. The initial guess for the switching time interval vector was $\Delta t = 1.95\,[-0.5\ 1\ -1\ 1\ -1\ 1\ -1\ 1\ -1\ 1\ -1\ 1\ -0.5]^T / u_{\max}$. The learning gains were set to $k_r = 0.05$ and $k_n = 0.0125$. The results are shown below in Table 1.

Table 1: Simulation Results

Percent mass error | Actual minimum time (s) | Learned minimum time (s) | Percent error in minimum final time
40%                | 7.5512                  | 7.5930                   | 0.5544%

From Table 1, one observes an error in the minimum final time of only 0.5544%, which demonstrates the robustness of the learning algorithm to model mismatch. Moreover, it should be noted that when the model-based switching interval vector is applied to the actual system, the resulting 1-norm of the terminal state error vector is about 6, which illustrates the great sensitivity of this highly nonlinear under-actuated pendulum system. Despite this sensitivity, the learning algorithm performed very well, reducing the terminal state error to zero and achieving a nearly minimum final time. The optimal switching vectors were $\Delta t = [-0.4093\ 0.9676\ -0.9648\ 1.0291\ -1.0267\ 1.2080\ -1.3748\ 0.5709]^T$ for the true solution and $\Delta t = [-0.3697\ 0.9805\ -1.0187\ 1.0428\ -1.0806\ 1.2033\ -1.3276\ 0.5700]^T$ for the learned solution.

The true solution was verified using Pontryagin's Minimum Principle. As the Hamiltonian $H$ is linear in $u$, $u$ must satisfy equation (6). The co-state was found and $\partial H/\partial u = \lambda^T g(x)$ is plotted versus time in Figure 2 below. The circles indicate the switching times. We see that the switches do in fact occur when $\partial H/\partial u$ passes through zero, and equation (6) was satisfied.
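A deliberately simple stand-in for this experiment (not the acrobot, whose equations of motion are in [6]) can be sketched in Python: a double integrator $\ddot{x} = u/m$ with $|u| \le 1$ performing a rest-to-rest transfer, where the "actual" plant mass is 40% larger than the model's, mirroring the paper's mismatch. The terminal error is measured on the plant while the Jacobean is computed from the model, as in update (12); the gain and trial count are illustrative. (Here $n = m = 2$, so $A$ is square and the null-space term of (12) vanishes.)

```python
import numpy as np

def simulate(dt_signed, m):
    """Integrate the bang phases in closed form; the sign of dt_i selects u = +/-1."""
    p, v = 0.0, 0.0
    for dti in dt_signed:
        u = 1.0 if dti > 0 else -1.0
        tau = abs(dti)
        p += v*tau + 0.5*(u/m)*tau**2
        v += (u/m)*tau
    return np.array([p, v])

def model_jacobian(dt_signed, m_model=1.0, eps=1e-6):
    """Finite-difference stand-in for A^c: terminal-state sensitivity from the *model*."""
    J = np.empty((2, len(dt_signed)))
    for c in range(len(dt_signed)):
        d = np.zeros(len(dt_signed)); d[c] = eps
        J[:, c] = (simulate(dt_signed + d, m_model)
                   - simulate(dt_signed - d, m_model)) / (2*eps)
    return J

x_f = np.array([1.0, 0.0])         # terminal target: position 1, at rest
m_plant = 1.4                      # "actual" system: +40% mass
dt = np.array([1.0, -1.0])         # model-optimal guess (accelerate, then brake)

for _ in range(300):               # learning trials, update (12)
    e_f = simulate(dt, m_plant) - x_f          # measured terminal error (plant)
    A = model_jacobian(dt)                     # model-based Jacobean
    dt = dt - 0.5 * (np.linalg.pinv(A) @ e_f)  # null-space term is zero here

t_f = np.abs(dt).sum()
# the learned intervals zero the *plant's* terminal error, and t_f approaches the
# plant's true minimum 2*sqrt(1.4) despite the model mismatch
```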

[Figure 2: ∂H/∂u versus time (sec), showing that the exact solution satisfies Pontryagin's Minimum Principle]

The resulting learned system joint angle trajectories are shown in Figure 3 below (recall $\theta_1(t_f) = \pi$ and $\theta_2(t_f) = 0$), and Figure 4 plots the learned input torque history. The true minimum time joint angle trajectory differed very little from Figure 3.

[Figure 3: Learned joint angles (rad) versus time (sec)]

[Figure 4: Learned input torque (N·m) versus time (sec)]

Conclusion

A self-contained numerical algorithm for finding the minimum time solution for a class of nonlinear systems was presented. The algorithm also checks the validity of the resulting solution against Pontryagin's Minimum Principle. It was shown that, by placing the actual system in the feedback loop, the algorithm can learn a near time optimal control input for the actual system. The learning algorithm is a gradient-based one that uses the measured terminal state error and the measured state trajectory of the real system in the optimization loop, with the goal of robustness to model mismatch. The potential robustness was demonstrated by applying the trajectory learning algorithm to a highly nonlinear under-actuated double pendulum system called the acrobot.

References

[1] Bazaraa, M.S. and Shetty, C.M., Nonlinear Programming, New York: John Wiley and Sons, 1979.
[2] Bobrow, J.E. et al., "On the Optimal Control of Robotic Manipulators with Actuator Constraints," Proceedings of the 1983 American Control Conference, vol. 2, pp. 782-787, 1983.
[3] Bryson, A.E. and Ho, Y., Applied Optimal Control, New York: Hemisphere Publishing Corporation, 1975.
[4] Byers et al., "Near-Minimum Time, Closed Loop Slewing of Flexible Spacecraft," Journal of Guidance, Control, and Dynamics, vol. 13, no. 1, Jan.-Feb. 1990.
[5] Byers, R.M. and Vadali, S.R., "Quasi-Closed-Form Solution to the Time-Optimal Rigid Spacecraft Reorientation Problem," Journal of Guidance, Control, and Dynamics, vol. 16, no. 3, May-June 1993.
[6] DeJong, G. and Spong, M.W., "Swinging Up the Acrobot: An Example of Intelligent Control," Proceedings of the American Control Conference, Baltimore, MD, June 1994, pp. 2158-2162.
[7] Eisler, G.R., Robinett, R.D., Segalman, D.J., and Feddema, J.D., "Approximate Optimal Trajectories for Flexible-Link Manipulator Slewing Using Recursive Quadratic Programming," Journal of Dynamic Systems, Measurement, and Control, vol. 115, September 1993, pp. 405-410.
[8] Eisler, G.R., Segalman, D.J., and Robinett, R.D., "Approximate Minimum-Time Trajectories for Two-Link Flexible Manipulators," Proceedings of the American Control Conference, pp. 870-875, 1990.
[9] Larson, R.E. and Casti, J.L., Principles of Dynamic Programming, Part II: Advanced Theory and Applications, New York: Marcel Dekker, Inc., 1982.
[10] Lewis, F.L., Optimal Control, New York: John Wiley and Sons, Inc., 1986.
[11] Meier, E. and Bryson, A.E., "Efficient Algorithm for Time-Optimal Control of a Two-Link Manipulator," Journal of Guidance, Control, and Dynamics, vol. 13, no. 5, Sept.-Oct. 1990.
[12] Shin, K.G. and McKay, N.D., "Minimum-Time Control of Robotic Manipulators with Geometric Paths," IEEE Transactions on Automatic Control, vol. AC-30, no. 6, June 1985.
[13] Shin, K.G. and McKay, N.D., "Open-Loop Minimum-Time Control of Mechanical Manipulators and its Application," Proceedings of the 1984 American Control Conference, vol. 3, pp. 1231-1236.
[14] Shin, K.G. and McKay, N.D., "A Dynamic Programming Approach to Trajectory Planning of Robotic Manipulators," IEEE Transactions on Automatic Control, vol. AC-31, no. 6, June 1986.
[15] Shin, K.G. and McKay, N.D., "Selection of Near-Minimum Time Geometric Paths for Robotic Manipulators," IEEE Transactions on Automatic Control, vol. AC-31, no. 6, June 1986.
[16] Zimmerman, D.C. and Layton, D.S., "Large Angle Slewing Maneuvers Using Performance Driven Darwinian Learning Controllers: Theory and Experiment," 34th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference, AIAA/ASME Adaptive Structures Forum, Part 6, April 19-22, 1993, pp. 3540-3550.