Designing Iterative Learning Control Subject to Actuator Limitations Using QP Methods

Richard W. Longman (Columbia University, New York, NY 10027, USA), Katja D. Mombaur (LAAS-CNRS, Toulouse 31077, France), and Benjamas Panomruttanarug (King Mongkut's University of Technology Thonburi, Bangkok 10140, Thailand)
Iterative learning control (ILC) applies to situations in which a system performs a tracking maneuver repeatedly. The error observed in each run is used to adjust the command in the next run, aiming to converge to zero error. Spacecraft applications include making fine pointing equipment follow a precise scan pattern in spite of flexibility effects. Effective ILC design methods are available for linear systems. An important issue is that ILC may reach actuator saturation limits during the iterations with hardware, and then there is no guarantee of convergence. This paper develops ILC methods based on quadratic programming that guarantee convergence under such conditions, starting from a feasible trajectory.
I. Introduction

Iterative learning control (ILC) iterates on the command given to a feedback control system, aiming to converge to the command that produces zero tracking error [1,2]. In manufacturing, such software modifications can make existing hardware follow a desired finite-time trajectory much more accurately. The benefit is much higher precision that can translate into higher product quality, or higher speed motions that can translate into higher manufacturing productivity, or both. In spacecraft it can be used to learn to make high precision scans using high precision sensors.

Classical feedback control systems always make deterministic errors in executing a tracking command. The command determines a forcing function in a differential equation, and what the system does is a particular solution of that differential equation; there is no reason for these to be equal in general. The faster the desired trajectory compared to the time constants of the control system, the bigger the error. In addition, there are often disturbances that repeat every time one executes the same command, for example the gravity torque disturbance on a robot link moving along a desired trajectory. ILC applies the desired trajectory as a command, observes the error, and according to an ILC law adjusts the command to converge to whatever command is needed, and hence whatever forcing function is needed, to produce the desired response as a particular solution in the presence of the disturbance.

One class of control problems where learning can be particularly important is in executing optimal trajectory solutions. References [3,4] study time optimal control of ground based robots with the aim of increased productivity. Such trajectories ride the boundary of the actuator limits for at least one of the robot links. References [5,6] study optimal satellite mounted robot maneuvers for various cost functionals, including aiming to minimize disturbances from robot motion to the satellite. Whenever one has a desired trajectory determined by some optimization, it is an important question how to make the hardware execute this trajectory, since normal feedback controllers make deterministic errors. And the deterministic errors are often big enough to seriously deteriorate the optimality of the
trajectory. In any situation where one can repeat the trajectory for learning purposes, ILC offers a solution. Reference [7] investigates the use of ILC in implementing optimal robot trajectories.

Most of the research in ILC does not include any limit on the actuator capabilities. Presumably the chosen trajectory is feasible with respect to the actuator limits, but the desired trajectory may be riding these limits. Although the desired trajectory is feasible, the iterations ILC uses to converge to executing this trajectory may violate the limits. In numerical iterations one can come up with examples of saturation limits causing failure to converge, and this could happen in hardware application of the common ILC approaches. What is needed is an ILC method that acknowledges the actuator hardware limits throughout the iterative process. It is the purpose of this paper to supply such methods, generalizing three different general classes of ILC laws: the ILC law based on a Euclidean norm contraction mapping (see, for example, [8,9,10]), the law based on a partial isometry [8,11], and the law based on a quadratic cost controlling the size of the change in the control action with iterations [12,13]. The approach makes use of quadratic programming (QP) methods, which can very efficiently minimize a quadratic expression subject to linear equality and inequality constraints. In ILC the computations can be done between runs, and can also be started for the next run during the current run, so there is often no serious computation time constraint. If needed, one can keep running the system with the previous iteration's command until the updated command is available, and then apply it on the next run.

The paper is organized as follows. First some mathematical background is presented on ILC and QP. Then the very immediate generalization of the quadratic cost ILC to handle inequality constraints on the input is presented. This can apply to the nonstandard situation where one applies ILC to a system directly from actuator to output, without any feedback control running. Then it is shown how to reformulate the problem to handle limiting the actuator output when this signal is in the middle of a feedback loop and the ILC is adjusting the command input to such a loop. This generalization is first done for typical classical feedback controllers with unity feedback. Then rate feedback is introduced, as well as feedback that employs some dynamics, e.g. including a low pass filter for attenuation of noise. Finally, the formulation is generalized to handle modern state variable feedback control designs that make use of a Kalman filter or Luenberger observer to reconstruct the state from measurements. Each of these formulations is done for the quadratic cost ILC law, but it is then shown how one can make use of QP methods to produce Euclidean norm contraction mapping and partial isometry ILC that acknowledge the actuator limits throughout the iteration process. The reader is referred to [14] for related work dealing with other types of model nonlinearities in ILC design.
II. ILC Background

A. The ILC Problem Statement

Consider a general linear state variable difference equation model
$$x_j(k+1) = A x_j(k) + B u_j(k) \ ; \qquad y_j(k+1) = C x_j(k+1) + v(k+1) \ ; \qquad k = 0, 1, 2, \ldots, p-1 \tag{1}$$
In most applications of ILC this represents the closed loop dynamics of a feedback control system. We consider the input (the command, for the case of a closed loop control system) $u_j(k)$ and output $y_j(k)$ to be scalars, but generalization to a q-dimensional input and output is immediate. The aim of the learning control system is to iteratively adjust the input until the output time history $y_j(k)$ matches the desired output $y^*(k)$, $k = 1, 2, \ldots, p$. One first applies some input history, perhaps equal to the desired output history, and observes the resulting error. Based on this information the input is adjusted in the next run (repetition, or iteration) according to an ILC law, aiming to converge to zero tracking error. The subscript j refers to the run number. The system initial condition is reset to the same value before each run, so that $x(0)$ is independent of j.

Except in the case of a constant command going into an integral control system, which at least succeeds in giving zero error after the transients have decayed, a feedback control system will not produce an output equal to the commanded output. ILC aims to converge to some different input that does in fact produce the desired output. In addition, feedback control systems are very often subject to some external disturbance, and the disturbance can be the same every time the trajectory is run. ILC also aims to eliminate the effects of such repeating deterministic disturbances. Physically such disturbances often enter the feedback control loop just before the plant transfer function, but they can appear in other locations as well. Wherever such a disturbance appears, there is an equivalent output disturbance, and this is represented by $v(k+1)$ in equation (1).
B. The Repetition Domain Model

By repeatedly writing equation (1) for successive values of time step k, and substituting previous equations into the current equation, one can get the convolution sum solution to the difference equation

$$x_j(k+1) = A^{k+1} x(0) + \sum_{i=0}^{k} A^{k-i} B u_j(i) \ ; \qquad y_j(k+1) = C x_j(k+1) \tag{2}$$
Define column vectors containing the complete history of variables during iteration j

$$y_j = \begin{bmatrix} y_j(1) \\ y_j(2) \\ \vdots \\ y_j(p) \end{bmatrix} \ ; \qquad u_j = \begin{bmatrix} u_j(0) \\ u_j(1) \\ \vdots \\ u_j(p-1) \end{bmatrix} \tag{3}$$
Note the one time step shift in the time arguments of the input vectors versus the output vectors, reflecting the one time step delay between input and output in (1). It is assumed that the product $CB \neq 0$, so that a change in input at one time step always influences the output one time step later. Analogous vectors are defined for the disturbance, the desired trajectory, and the tracking error, $v$, $y^*$, $e_j = y^* - y_j$, all of which have the same time arguments as $y_j$. In case $CB$ is zero, one can make appropriate adjustments to the definitions of these history vectors, and to the resulting matrices developed below. We can now package the result for each repetition [15] into the following equations
$$y_j = \bar{A} x(0) + P u_j + v \ ; \qquad e_j = y^* - \bar{A} x(0) - P u_j - v \tag{4}$$

where

$$\bar{A} = \begin{bmatrix} CA \\ CA^2 \\ \vdots \\ CA^p \end{bmatrix} \ ; \qquad P = \begin{bmatrix} CB & 0 & \cdots & 0 \\ CAB & CB & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ CA^{p-1}B & CA^{p-2}B & \cdots & CB \end{bmatrix} \tag{5}$$
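For concreteness, the lifted matrices in (5) are easy to assemble numerically. Below is a minimal Python/NumPy sketch; the function name and calling convention are ours, not from the paper, and a scalar input and output are assumed so that each Markov parameter $CA^kB$ is a 1×1 block.

```python
import numpy as np

def lifted_matrices(A, B, C, p):
    """Assemble Abar and P of equation (5) for a SISO system (A, B, C).
    Abar stacks C A^k for k = 1..p; P is lower-triangular Toeplitz with
    the Markov parameters CB, CAB, ..., CA^(p-1)B down the first column."""
    Abar = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(1, p + 1)])
    markov = [(C @ np.linalg.matrix_power(A, k) @ B).item() for k in range(p)]
    P = np.zeros((p, p))
    for row in range(p):
        for col in range(row + 1):
            P[row, col] = markov[row - col]
    return Abar, P
```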
It is convenient to define a difference operator in repetitions such that $\delta_{j+1} y = y_{j+1} - y_j$, and similarly for the other column vectors [15]. Applying this operator to the second of equations (4) produces

$$\delta_{j+1} e = -P \, \delta_{j+1} u \tag{6}$$

because the difference operator operating on the desired trajectory, on the repeating initial condition, and on the repeating disturbance all produce zero.

C. General Linear ILC and the Quadratic Cost ILC Law

The form of a general (first order) linear iterative learning control law is
$$u_{j+1} = u_j + L e_j \tag{7}$$
where L is a matrix of control gains chosen by the specific ILC design. Thus, the history of the inputs in the next run is the history used in the most recent run plus this matrix of gains times the errors observed in the most recent run. Recognizing that (7) can be written as $\delta_{j+1} u = L e_j$ establishes the error propagation equation from one iteration to the next

$$e_{j+1} = [I - PL] e_j \tag{8}$$
where I is the identity matrix. One concludes from this that the tracking error at every time step of the p time step desired trajectory will go to zero as j tends to infinity, for all initial error histories, if and only if all eigenvalues of the matrix $[I - PL]$ are less than one in magnitude [15]

$$\max_i \left| \lambda_i (I - PL) \right| < 1 \tag{9}$$
A quadratic cost optimization criterion can be used to design the input updates from iteration to iteration in such a way as to control the behavior of the learning transients [12,13]. The cost and the corresponding repetition domain dynamic equations are

$$J_{j+1} = \tfrac{1}{2} e_{j+1}^T Q e_{j+1} + \tfrac{1}{2} \delta_{j+1} u^T R \, \delta_{j+1} u \ ; \qquad e_{j+1} = e_j - P \, \delta_{j+1} u \tag{10}$$

where Q and R are symmetric positive definite weighting matrices chosen by the designer. Substituting the second equation into the first, differentiating with respect to the input update, and setting the result to zero produces the control law

$$u_{j+1} = u_j + (P^T Q P + R)^{-1} P^T Q e_j \tag{11}$$
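Given P, one iteration of (11) is a single linear solve. A minimal sketch (the weight choices in the comment are illustrative, not prescribed by the paper):

```python
import numpy as np

def quadratic_cost_ilc_update(P, u_j, e_j, Q, R):
    """Quadratic cost ILC law, equation (11):
    u_{j+1} = u_j + (P^T Q P + R)^{-1} P^T Q e_j."""
    return u_j + np.linalg.solve(P.T @ Q @ P + R, P.T @ Q @ e_j)

# Example weights: Q = np.eye(p), R = r * np.eye(p) for some r > 0.
# Between runs, the new error follows the repetition domain dynamics (10):
# e_next = e_j - P @ (u_next - u_j)
```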
D. Addressing Internal Instability of the Inverse Problem

When a linear constant coefficient ordinary differential equation system is fed by a zero order hold, there is an equivalent linear constant coefficient difference equation whose solution is identical to the solution of the differential equation at the sample times. As shown in [16,17,18], if the pole excess of the differential equation is 3 or more, and the sample time is fast enough, then a zero is introduced in the digital model that is outside the unit circle. This means the inverse problem, finding the input needed to produce a desired output, has an unstable transfer function, and generically has an unstable solution that grows with time step over the p time steps of the finite time trajectory. This phenomenon is manifested in the matrix formulation given here by one very small singular value of matrix P for every zero outside the unit circle. The implication is that the control effort needed to produce zero error at the sample times, when using a zero order hold input, grows exponentially with time step (and alternates in sign). The control action can very quickly become infeasible, and in addition the error between time steps can grow very quickly, although the error at the time steps is zero after convergence. Note that this implies that any ILC law that aims for zero error at every time step, as in the above quadratic cost ILC law, is aiming for a solution that is not feasible, and impractical in many applications.

In [17,18], a solution to this problem was developed by allowing the ILC law to update the input every time step, but only asking for zero error at every other time step. The above ILC laws can be modified to accomplish this. Suppose p is an even number, define $e_D$ as the p/2 dimensional error vector obtained by deleting all odd numbered time step entries in $e$, and define $P_D$ as the p/2 by p matrix obtained from P by deleting all odd numbered rows. Then equations (6) and (7) become $\delta_{j+1} e_D = -P_D \delta_{j+1} u$ and $\delta_{j+1} u = L e_{D,j}$, where the learning gain matrix L is now p by p/2. The quadratic cost ILC law for this case is obtained by replacing $e_j, P$ by $e_{D,j}, P_D$ in equations (10) and (11).
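The row deletion is one line of indexing. Continuing from the lifted_matrices sketch above (note the zero-based indexing: entry 0 of e holds e(1), so the retained even time steps k = 2, 4, ..., p sit at odd row indices):

```python
# e[0] holds e(1), so keeping time steps k = 2, 4, ..., p means rows 1, 3, 5, ...
e_D = e[1::2]          # p/2 dimensional error vector
P_D = P[1::2, :]       # p/2 by p matrix: odd numbered time step rows deleted
# The quadratic cost law (11) then uses e_D and P_D in place of e_j and P;
# the input is still updated at every one of the p time steps.
```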
III. The Quadratic Programming Problem

To convert ILC laws into ones that make use of the inequality constraints on the actuator output, we convert each law into a quadratic programming problem. The routine quadratic programming problem asks to minimize an objective function over column vector z

$$J = \tfrac{1}{2} z^T H z + g^T z \tag{12}$$

subject to linear inequality constraints and equality constraints

$$A_1 z \le Z_1 \ ; \qquad A_2 z = Z_2 \tag{13}$$
Here H is a given square symmetric matrix, g is a given column vector, $Z_1, Z_2$ are given column vectors, and $A_1, A_2$ are given matrices of appropriate dimension. Note that the inequality in equation (13) is applied component by component to the vectors on each side of the inequality.

The numerical solution of this quadratic programming problem can be quite efficient. There are many numerical algorithms that appear rather different but have very strong similarities [19]. An outline of an effective approach, used by Matlab, is as follows. One employs an active set strategy. First one needs a feasible solution. To obtain it, one defines a linear programming problem that subtracts the same slack variable from the left hand side of all inequality constraints, and then seeks to minimize the slack variable; the Simplex algorithm can solve such a problem. One makes the slack variable large enough that all inequalities are satisfied, and then one minimizes it. The result is that one (or more) of the inequality constraints is now an equality constraint, and the rest of the constraints are satisfied as inequalities.

Now that one has a feasible starting solution, one can start addressing the QP problem. Take the minimization subject to the real equality constraints, plus whichever of the inequality constraints became equalities above (this is the initial active set). Get a solution to the linear equations that result from the Karush-Kuhn-Tucker (KKT) conditions. Plug the solution into the other inequality conditions that are not on the boundary, examine which ones are violated and by how much, and go the minimum distance in that direction to the first of these boundaries. Add this new inequality to the active set. Repeat, as long as the new direction that you calculate does not cross any new inequality constraints (or until the number of equality conditions equals the number of variables to adjust, in which case there is no minimization possible). Then check whether one of the inequalities that is active, currently being treated as an equality, can be omitted: check the adjoint variables, all of which must be of the right sign according to the KKT conditions. If all fulfill the sign condition, you have found the solution. If not, pick one that has the wrong sign and restart the QP computation above without this equality. Only take out one at a time; there are special strategies to decide in what order to take them out.

One can prove that under normal conditions this produces a finite procedure, but it can be NP-hard, meaning the number of steps may grow exponentially with some measure of the information contained in the problem, e.g. the number of inequalities and number of variables. However, the solution is very fast in most cases, and only in extreme cases does one see the exponential dependence. To handle such cases there are interior point methods whose effort grows polynomially with the size of the problem.
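In practice one calls a library QP solver rather than coding the active set loop by hand. As an illustration, here is a sketch of (12)-(13) posed through SciPy's general-purpose SLSQP interface (our choice of solver; the paper itself refers to Matlab and, later, the NAG library):

```python
import numpy as np
from scipy.optimize import minimize

def solve_qp(H, g, A1=None, Z1=None, A2=None, Z2=None, z0=None):
    """Minimize 0.5 z^T H z + g^T z subject to A1 z <= Z1 and A2 z = Z2,
    i.e. the standard QP of equations (12)-(13)."""
    n = len(g)
    z0 = np.zeros(n) if z0 is None else z0
    cons = []
    if A1 is not None:  # SciPy writes inequality constraints as fun(z) >= 0
        cons.append({'type': 'ineq', 'fun': lambda z: Z1 - A1 @ z,
                     'jac': lambda z: -A1})
    if A2 is not None:
        cons.append({'type': 'eq', 'fun': lambda z: A2 @ z - Z2,
                     'jac': lambda z: A2})
    res = minimize(lambda z: 0.5 * z @ H @ z + g @ z, z0,
                   jac=lambda z: H @ z + g, constraints=cons, method='SLSQP')
    return res.x
```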
IV. Quadratic Cost ILC with Inequality Constraints on the Input

Most often the ILC problem adjusts the input to a feedback control system, and in this case the actuator constraint is on some output of the actuator, which is a signal somewhere inside the control loop. But ILC can be applied, for example, to the torque-to-response dynamics of a robot link, in which case the inequality constraint can be on the torque input variable. This section treats this simple case, as a prelude to the more complicated situation. The ILC quadratic cost function with the error at odd time steps ignored, used to compute the input $u_{j+1}$, is

$$J_{j+1} = \tfrac{1}{2} e_{D,j+1}^T e_{D,j+1} + \tfrac{1}{2} s \, (\delta_{j+1} u)^T \delta_{j+1} u \tag{14}$$
where for simplicity we have set the weight matrices to the identity matrix and the identity matrix times a gain s. The equation giving the dynamics from one repetition to the next is

$$e_{D,j+1} = e_{D,j} - P_D \, \delta_{j+1} u \tag{15}$$
Combining these equations produces the standard QP problem

$$J_{j+1} = \tfrac{1}{2} (\delta_{j+1} u)^T [\, sI + P_D^T P_D \,] \delta_{j+1} u - e_{D,j}^T P_D \, \delta_{j+1} u + \tfrac{1}{2} e_{D,j}^T e_{D,j} \tag{16}$$
where $H = [\, sI + P_D^T P_D \,]$ and $g = -P_D^T e_{D,j}$. Note that in choosing this control update, $e_{D,j}$ is a predetermined constant. This minimization is to be done subject to the inequality constraints

$$u_{j+1} = u_j + \delta_{j+1} u \le u_{max} \ ; \qquad u_{j+1} = u_j + \delta_{j+1} u \ge u_{min} \tag{17}$$

As before, the inequalities are to be taken component by component. Thus, instead of using the quadratic cost ILC law of equation (11), as modified for deletion, for the update each iteration, one solves the above quadratic programming problem each iteration to determine the command update. Such computations are made between runs, so there is often no time constraint. In the event that the next run must be made before the computations are completed, one simply makes the update at the first run after the computations are completed.
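Because the constraints (17) are simple bounds on the new input, this particular QP can also be posed as a bound-constrained linear least squares problem in $\delta_{j+1} u$. A sketch using scipy.optimize.lsq_linear (an illustrative solver choice, not the paper's):

```python
import numpy as np
from scipy.optimize import lsq_linear

def ilc_update_input_bounds(P_D, e_D, u_j, u_min, u_max, s):
    """One update of the QP (16)-(17). Up to a constant, (16) equals
    0.5*||e_D - P_D du||^2 + 0.5*s*||du||^2, minimized subject to
    u_min <= u_j + du <= u_max, a bounded least squares problem in du."""
    p = len(u_j)
    A = np.vstack([P_D, np.sqrt(s) * np.eye(p)])
    b = np.concatenate([e_D, np.zeros(p)])
    res = lsq_linear(A, b, bounds=(u_min - u_j, u_max - u_j))
    return u_j + res.x
```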
V. Quadratic Cost ILC With Actuator Inequality Constraints

Now consider that the input adjusted by the ILC is a command to a feedback control system, and the inequality constraints are on the actuator output. Three different situations will be considered. The first is the case of a classical feedback control system with unity feedback. The second generalizes this to include such things as rate feedback or feedback including some filter, for example a noise filter. The third considers the case of state variable feedback control design using a Kalman filter or Luenberger observer.

E. Classical Feedback Control

Consider a standard feedback control system structure, having a controller, a plant, and a feedback element. For this section we consider unity feedback, so that the feedback element can be ignored or set to unity. The command given the feedback system is the reference input $y_r(k)$; let $a(k)$ denote the actuating signal to the controller (equal to the reference input minus the measured output when one has unity feedback), $u(k)$ is the output of the controller/actuator and has saturation limits, and $v(k)$ is a repeating disturbance. The usual location for a disturbance is before the plant, but one can make an equivalent disturbance that enters immediately after the plant, which we do here. For simplicity, consider that all blocks in the feedback loop block diagram have digital form. This is sufficient for our purpose here, to illustrate how one sets up the mathematical model needed. The actual situation would likely be slightly more complicated, because the digital to analog converter would come after the controller and before the actuator, requiring two different discretizations: one from the input of the actuator to its output, used to monitor the actuator output at sample times, and another from the input of the actuator to the sampled system output, which is also the feedback signal. The state space form of the plant difference equation is
$$x_p(k+1) = A_p x_p(k) + B_p u(k) \ ; \qquad y(k) = C_p x_p(k) + v(k) \tag{18}$$
Since this corresponds to a differential equation fed by a zero order hold, it is natural to leave out any direct feedthrough term. In order to make a general mathematical formulation, we model the controller in the form of a state space representation
$$x_c(k+1) = A_c x_c(k) + B_c a(k) \ ; \qquad u(k) = C_c x_c(k) + D_c a(k) \tag{19}$$
To show the generality of the formulation, consider several typical classical feedback control laws that we might want to represent by this equation (with unity feedback the actuating signal $a(k)$ is the error, written $e(k)$ below).

(1) Proportional Control: The simplest controller is a proportional controller using $u(k) = K_p e(k)$. Generally one creates digital control systems in which the analog to digital (A/D) converter and the digital to analog (D/A) converter using a zero order hold are synchronized. In this case the above control law would require instantaneous computation of $u(k)$, so that no time step is lost in the controller. This could not be a precise model of the hardware situation. But one could decide that the computation time needed is negligible compared to one time step, and instead of synchronizing the A/D and D/A, let the D/A use an interrupt mode so that the controller output is applied immediately after the result arrives from the controller. Then, if multiplying the error by $K_p$ and sending the result to the D/A takes a small fraction of a sample time interval, we can use a mathematical model that has no time delay. In order to make equation (19) represent this control law, set $A_c, B_c, C_c$ all to zero, and let $D_c = K_p$.
(2) Proportional Control with Time for Computation: If one must allow a time step for computation when using a proportional controller, then let $D_c = 0$, $A_c = 0$, and pick $B_c, C_c$ so that $C_c B_c = K_p$, giving $u(k) = K_p e(k-1)$.

(3) Proportional Plus Derivative Controller: If the desired control law were $u(t) = K_p e(t) + K_d \, de(t)/dt$ in continuous time, then one can allow one time step of duration T for completing the computation, and approximate the derivative by a backward difference, $[e(k-1) - e(k-2)]/T$, to write the discrete time version of the control law in the form $u(k) = K_1 e(k-2) + K_2 e(k-1)$. This can be converted to the state space form in (19) by creating a state equation that has memory for the needed number of steps

$$x_c(k+1) = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} x_c(k) + \begin{bmatrix} 0 \\ 1 \end{bmatrix} e(k) \ ; \qquad u(k) = [\, K_1 \ \ K_2 \,] \, x_c(k) \tag{20}$$
Alternatively, one could consider the version that does not lose a time step for computation, where $u(k) = K_1 e(k) + K_2 e(k-1)$, and have $A_c = 0$, $B_c = 1$, $C_c = K_2$, and $D_c = K_1$. Only the product $K_1 e(k)$ and its addition to the second term must be made in what is considered negligible time compared to a time step.

(4) Integral Control: If the desired control law in continuous time were $u(t) = K_i \int_0^t e(\tau) \, d\tau$, i.e. $du(t)/dt = K_i e(t)$, we can replace the derivative by a backward difference, $[u(k) - u(k-1)]/T$, to obtain the recursive update equation $u(k+1) = u(k) + K_i T e(k)$. Then to use (19) to represent this equation, set $A_c = C_c = 1$, $D_c = 0$, and $B_c = K_i T$.

(5) PID Controller: To obtain a PID controller one simply appends the integral control state equation to the PD control state equation

$$x_c(k+1) = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} x_c(k) + \begin{bmatrix} 0 \\ 1 \\ T \end{bmatrix} e(k) \ ; \qquad u(k) = [\, K_1 \ \ K_2 \ \ K_i \,] \, x_c(k) \tag{21}$$
(6) Compensator: If one wants to run the resulting signal through a compensator having a zero and a pole, one can include this in an obvious way in the matrices of equation (19).

Since the ILC is adjusting the command $y_r(k)$ to the feedback control system in order to make the output $y(k)$ track the desired output, we need to generate the closed loop input-output state variable model and form its Toeplitz matrix $P_{CL}$ for use in the quadratic cost function. We also need to generate the relationship between the command and $u(k)$, since $u(k)$ is subject to the saturation limit. For simplicity, we will consider that the controller has a nonzero $D_c$. One can make appropriate modifications if there is a time step lost in the controller update.

Closed Loop Toeplitz Matrix: Define the state variables for the closed loop model as $x_{CL}(k) = [\, x_c^T(k) \ \ x_p^T(k) \,]^T$. In the first of equations (18) use the second of equations (19) to eliminate $u(k)$ in terms of $a(k)$, use $a(k) = y_r(k) - y(k)$, and eliminate $y(k)$ using the second of equations (18). Do similar operations on the first of equations (19) and then combine the results to obtain

$$x_{CL}(k+1) = A_{CL} x_{CL}(k) + B_{CL} [\, y_r(k) - v(k) \,] \ ; \qquad y(k) = C_{CL} x_{CL}(k) + v(k) \tag{22}$$
where

$$A_{CL} = \begin{bmatrix} A_c & -B_c C_p \\ B_p C_c & A_p - B_p D_c C_p \end{bmatrix} \ ; \qquad B_{CL} = \begin{bmatrix} B_c \\ B_p D_c \end{bmatrix} \ ; \qquad C_{CL} = [\, 0 \ \ C_p \,] \tag{23}$$
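A sketch of assembling (23) numerically, assuming all arguments are 2-D NumPy arrays (with $D_c$ a 1×1 array for the scalar loop signal; the function name is ours):

```python
import numpy as np

def closed_loop_matrices(Ac, Bc, Cc, Dc, Ap, Bp, Cp):
    """Unity-feedback closed loop of equation (23), state [x_c; x_p]."""
    A_CL = np.block([[Ac,       -Bc @ Cp],
                     [Bp @ Cc,  Ap - Bp @ Dc @ Cp]])
    B_CL = np.vstack([Bc, Bp @ Dc])
    C_CL = np.hstack([np.zeros((Cp.shape[0], Ac.shape[0])), Cp])
    return A_CL, B_CL, C_CL
```

From these, $P_{CL}$ and $\bar{A}_{CL}$ follow from the same construction as (5), e.g. via the lifted_matrices sketch of Section II.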
The equivalent of equation (4) becomes

$$y = P_{CL} y_r + \bar{A}_{CL} x_{CL}(0) + [\, v_1 - P_{CL} v_0 \,] \tag{24}$$

where $P_{CL}$ and $\bar{A}_{CL}$ are obtained from equation (5) by substituting the system matrices of (23) for A, B, C. The $v_1$ is a column vector of $v(k)$ for k going from 1 to p down the column, and $v_0$ is a vector of $v(k)$ for k going from 0 to p-1. Note that if $D_c = 0$, then $C_{CL} B_{CL} = 0$ on the diagonal of $P_{CL}$, and one can drop $y(1)$ in $y$ and drop $y_r(p-1)$ in $y_r$.

Matrix from Command to Actuator Output: The input-output Toeplitz relationship of the controller, equation (19), can be written as
$$u_j = P_c a_j + \bar{A}_c x_c(0) \tag{25}$$

where

$$u_j = \begin{bmatrix} u_j(0) \\ u_j(1) \\ \vdots \\ u_j(p) \end{bmatrix} \ ; \quad a_j = \begin{bmatrix} a_j(0) \\ a_j(1) \\ \vdots \\ a_j(p) \end{bmatrix} \ ; \quad P_c = \begin{bmatrix} D_c & 0 & 0 & \cdots & 0 \\ C_c B_c & D_c & 0 & \cdots & 0 \\ C_c A_c B_c & C_c B_c & D_c & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ C_c A_c^{p-1} B_c & C_c A_c^{p-2} B_c & C_c A_c^{p-3} B_c & \cdots & D_c \end{bmatrix} \ ; \quad \bar{A}_c = \begin{bmatrix} C_c \\ C_c A_c \\ C_c A_c^2 \\ \vdots \\ C_c A_c^p \end{bmatrix}$$
Substituting $a_j = y_{r,j} - y_j$ and applying the delta difference operator gives

$$\delta_{j+1} u = P_c [\, \delta_{j+1} y_r - \delta_{j+1} y \,] \tag{26}$$

$$\delta_{j+1} u = P_c (I - P_{CL}) \, \delta_{j+1} y_r \tag{27}$$
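The constraint map in (27) reuses the same Toeplitz construction, now with a direct feedthrough term on the diagonal. A sketch (the helper name is ours; for simplicity it builds the p×p version aligned with $P_{CL}$, ignoring the extra boundary samples discussed after (25)):

```python
import numpy as np

def toeplitz_with_feedthrough(A, B, C, D, p):
    """Lower-triangular Toeplitz map of a SISO system with direct
    feedthrough: scalar D on the diagonal, C A^(k-1) B on the k-th
    subdiagonal, as in the controller matrix P_c of equation (25)."""
    mats = [D] + [(C @ np.linalg.matrix_power(A, k) @ B).item()
                  for k in range(p - 1)]
    T = np.zeros((p, p))
    for row in range(p):
        for col in range(row + 1):
            T[row, col] = mats[row - col]
    return T

# Constraint map of equation (27), with P_CL built as in (5):
# G = toeplitz_with_feedthrough(Ac, Bc, Cc, Dc, p) @ (np.eye(p) - P_CL)
```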
The QP Problem for the Actuator Saturation Constraint: The quadratic programming problem for actuator constraints can now be stated: minimize the cost function

$$J_{j+1} = \tfrac{1}{2} (\delta_{j+1} y_r)^T [\, sI + P_{D,CL}^T P_{D,CL} \,] \delta_{j+1} y_r - e_{D,j}^T P_{D,CL} \, \delta_{j+1} y_r + \tfrac{1}{2} e_{D,j}^T e_{D,j} \tag{28}$$

subject to the inequality constraints

$$u_{j+1} = u_j + \delta_{j+1} u = u_j + [\, P_c (I - P_{CL}) \,] \delta_{j+1} y_r \le u_{max} \ ; \qquad u_{j+1} = u_j + [\, P_c (I - P_{CL}) \,] \delta_{j+1} y_r \ge u_{min} \tag{29}$$
Here the first time step that would normally be adjusted by choice of $y_{r,j}$ is deleted in $e_{D,j}$, and every other row thereafter, and the same is done in $P_{D,CL}$.

F. Classical Control with Rate or Filtered Feedback

Unity feedback is the most common situation in classical control, but there are times when one wants a transfer function $H_f(z)$ in the feedback loop. The most common case is the use of rate feedback. Also, sometimes low pass filters are inserted to attenuate measurement noise. Various situations can apply depending on where the analog to digital conversion takes place. But for purposes of illustrating that one can handle such situations in a manner similar to the unity feedback case, consider that $H_f(z)$ is written in state space form
$$x_f(k+1) = A_f x_f(k) + B_f y(k) \ ; \qquad f(k) = C_f x_f(k) + D_f y(k) \tag{30}$$

As in the case of PD control without any time delay discussed above, this can represent unity feedback plus a derived rate feedback term with a gain. Of course it can also model a noise filter. In order to convert the problem to a QP problem one can parallel the above development. One can develop the new closed loop system model with a new $P_{CL}$ matrix. Then, to get the input to actuator output result, note that $a_j = y_{r,j} - f_j$, and use (30) following (4) to write $f_j = \bar{A}_f x_f(0) + P_f y_j$. Applying the delta difference operation to equation (25) produces
$$\delta_{j+1} u = P_c \, \delta_{j+1} a = P_c [\, \delta_{j+1} y_r - \delta_{j+1} f \,] = P_c [\, \delta_{j+1} y_r - P_f \delta_{j+1} y \,] = P_c [\, \delta_{j+1} y_r - P_f P_{CL} \delta_{j+1} y_r \,]$$

$$\delta_{j+1} u = P_c (I - P_f P_{CL}) \, \delta_{j+1} y_r \tag{31}$$
This final expression is substituted into equation (29), replacing the $\delta_{j+1} u$ used there, in order to produce the linear inequality constraints needed for the quadratic programming problem.

G. State Variable Feedback Design Using a Kalman Filter

Now consider the other large class of feedback control design methods, those that use a steady state Kalman filter or a Luenberger observer and use the estimated state for feedback. Consider the same plant equations (18), and form a feedback control law

$$u(k) = K_1 \hat{x}(k) + K_2 y_r(k) \tag{32}$$

The $\hat{x}(k)$ is an estimate of the state $x_p(k)$ produced by the filter or observer

$$\hat{x}(k+1) = A_p \hat{x}(k) + B_p u(k) + K_3 [\, y(k) - C_p \hat{x}(k) \,] \tag{33}$$
The gain $K_3$ comes from the filter design method, such as pole placement or the Kalman gain. Gain $K_1$ can come from LQ design or from pole placement, and $K_2$ is adjusted by the designer, perhaps for zero error to DC signals. Define the closed loop state variable as $x_{CL}(k) = [\, x_p^T(k) \ \ \varepsilon^T(k) \,]^T$, where $\varepsilon(k) = x_p(k) - \hat{x}(k)$ is the estimation error. Then equations (18), (32), and (33) can be combined to create the closed loop system equations

$$x_{CL}(k+1) = A_{CL} x_{CL}(k) + B_1 y_r(k) + B_2 v(k) \ ; \qquad y(k) = C_{CL} x_{CL}(k) + v(k) \tag{34}$$
where

$$A_{CL} = \begin{bmatrix} A_p + B_p K_1 & -B_p K_1 \\ 0 & A_p - K_3 C_p \end{bmatrix} \ ; \quad B_1 = \begin{bmatrix} B_p K_2 \\ 0 \end{bmatrix} \ ; \quad B_2 = -\begin{bmatrix} 0 \\ K_3 \end{bmatrix} \ ; \quad C_{CL} = [\, C_p \ \ 0 \,] \tag{35}$$
Analogous to equation (24), we can form the output vector for a complete p time step run

$$y = P_{CL1} y_r + P_{CL2} v_0 + v_1 + \bar{A}_{CL} x_{CL}(0) \tag{36}$$

Here $P_{CL1}$ uses input matrix $B_1$, while $P_{CL2}$ uses input matrix $B_2$, which has a minus sign in its definition. Then the change in input to change in error equation is $\delta_{j+1} e = -P_{CL1} \delta_{j+1} y_r$, which is used in forming the quadratic cost as before. It remains to obtain the relationship between $\delta_{j+1} y_r$ and $\delta_{j+1} u$ to use in the linear constraint on the actuator output. Apply the difference operation to (32) and then package the whole history for a run
$$\delta_{j+1} u = \mathrm{diag}(K_1, K_1, \ldots, K_1) \, \delta_{j+1} \hat{x} + K_2 \, \delta_{j+1} y_r \tag{37}$$
Note that

$$\delta_{j+1} \hat{x}(k) = \delta_{j+1} x_p(k) - \delta_{j+1} \varepsilon(k) = [\, I \ \ -I \,] \, \delta_{j+1} x_{CL}(k) = \hat{C} \, \delta_{j+1} x_{CL}(k) \tag{38}$$

$$\delta_{j+1} x_{CL}(k+1) = A_{CL} \, \delta_{j+1} x_{CL}(k) + B_1 \, \delta_{j+1} y_r(k) \tag{39}$$
Equation (39), with (38) as the output equation, defines a Toeplitz matrix $\hat{P}$ with system matrices $A_{CL}, B_1, \hat{C}$, so that $\delta_{j+1} \hat{x} = \hat{P} \, \delta_{j+1} y_r$. Use in (37) produces

$$\delta_{j+1} u = P_{sfb} \, \delta_{j+1} y_r \ ; \qquad P_{sfb} = \begin{bmatrix} K_1 \hat{C} B_1 + K_2 & 0 & \cdots & 0 \\ K_1 \hat{C} A_{CL} B_1 & K_1 \hat{C} B_1 + K_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ K_1 \hat{C} A_{CL}^{p-1} B_1 & K_1 \hat{C} A_{CL}^{p-2} B_1 & \cdots & K_1 \hat{C} B_1 + K_2 \end{bmatrix} \tag{40}$$

This is used in forming the linear inequality constraints on the actuator output as before, completing the formulation of the quadratic programming problem for this type of feedback system.
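All three feedback structures lead to the same computational pattern: each ILC iteration solves one QP in $\delta_{j+1} y_r$ with cost (28), where the map from $\delta_{j+1} y_r$ to $\delta_{j+1} u$ is $P_c(I - P_{CL})$, $P_c(I - P_f P_{CL})$, or $P_{sfb}$ as appropriate. A schematic iteration, reusing the solve_qp sketch from Section III (all names are ours):

```python
import numpy as np

def constrained_ilc_iteration(P_DCL, G, e_D, u_prev, yr_prev, u_min, u_max, s):
    """One ILC iteration: minimize (28) over dyr subject to (29), where G is
    the map from dyr to du for the feedback structure at hand."""
    p = len(yr_prev)
    H = s * np.eye(p) + P_DCL.T @ P_DCL
    g = -P_DCL.T @ e_D
    # Stack u_prev + G dyr <= u_max and -(u_prev + G dyr) <= -u_min as A1 dyr <= Z1
    A1 = np.vstack([G, -G])
    Z1 = np.concatenate([u_max - u_prev, u_prev - u_min])
    dyr = solve_qp(H, g, A1=A1, Z1=Z1)      # solve_qp as sketched in Section III
    return yr_prev + dyr, u_prev + G @ dyr  # updated command and actuator output
```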
VI. Inequality Constraints in the Euclidean Norm Contraction Mapping and Partial Isometry ILC Laws
The quadratic cost ILC law converts somewhat more naturally to a quadratic programming problem with inequality constraints than other laws, such as the Euclidean norm contraction mapping law [8,9,10] and the partial isometry law [11]. But these can also be converted. The Euclidean norm contraction mapping ILC law adjusts the input history at each time step of a repetition by taking a step in the steepest descent direction for the sum of the squares of all future errors. To pick the control $u_{j+1}(k)$ at time step k for the next repetition j+1, one computes the cost
$$J_j(k) = \tfrac{1}{2} e^T(k+1) e(k+1) = \tfrac{1}{2} \left[ e^T(k+1) e(k+1) + e^T(k+2) e(k+2) + \cdots + e^T(p) e(p) \right] \tag{41}$$

where $e(k)$ with a time argument denotes the vector of possible future errors from time step k to the end of the trajectory. One does not minimize this cost, which would correspond to solving the inverse problem in one step; rather, one finds the steepest descent direction for the future errors that would result from changing the control $u_j(k)$, and takes a step in this direction. This direction is given by
$$\frac{\partial J_j(k)}{\partial u_j(k)} = \left( \frac{\partial e(k)}{\partial u_j(k)} \right)^{\!T} \frac{\partial J_j(k)}{\partial e(k)} = \left( \frac{\partial e(k)}{\partial u_j(k)} \right)^{\!T} e(k) = -(P)_{k+1}^T \, e(k) \tag{42}$$
We have used the derivative $\partial e / \partial u(k) = -(P)_{k+1}$, where the subscript denotes the indicated column of the matrix, as follows from $\delta_{j+1} e = -P \delta_{j+1} u$, i.e. equation (6). Since this derivative is to be evaluated around the current situation at repetition j, the $e(k)$ is now set to the associated error history in repetition j. Then taking a step of length $\alpha$ in the steepest descent direction in the next repetition creates the learning law update $u_{j+1}(k) = u_j(k) - \alpha \, (\partial J_j / \partial u_j(k))$. Writing this for each time step produces the Euclidean norm contraction mapping learning law

$$u_{j+1} = u_j + \alpha P^T e_j \tag{43}$$
If the singular value decomposition of P is given by $P = U S V^T$, then $P^T = V S U^T$. The partial isometry law, which learns faster at high frequency, is given by setting S to the identity matrix in this law, producing

$$u_{j+1} = u_j + \alpha V U^T e_j \tag{44}$$
Working backwards, we can find a weight factor to include in the cost (41) such that one would obtain this learning law. The result is to modify (41) to read

$$J_j(0) = \tfrac{1}{2} e^T(1) [\, V^T S^{-1} V \,] e(1) \tag{45}$$
with appropriate truncation for later time steps.

We now ask: how can the actuator inequality constraints be incorporated into these learning laws by conversion to quadratic programming problems? For both laws, there is a quadratic cost for each time step k to determine the control action $u(k)$, where the cost is only a function of this one variable, and one takes a step that makes this cost descend. One could consider this to be a quadratic programming problem, and include the inequality constraints in the computation. In place of taking a full step to the minimum of this problem, one takes a step of length $\alpha$ toward the minimum. One would start by determining $u(0)$ for the next iteration. The computation of $u(1)$ is a new QP problem, and should start from the result of the $u(0)$ computation. After doing this for all p time steps, one is ready to apply the control action for the next iteration. Thus, to implement the actuator limitations in these two ILC laws requires solving multiple QP problems for each iteration, whereas the quadratic cost ILC law only requires the solution of one QP problem. Each learning law has its own advantages, so one would have to evaluate whether the advantages of the two laws considered in this section are sufficiently important to justify the extra computational effort.
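A sketch of the per-time-step procedure just described, for the contraction mapping law with simple actuator bounds. For a one-variable QP with simple bounds, clipping the stepped value to the bounds is the exact constrained minimizer; the sequential propagation of each change to the error vector is our reading of "start from the result of the u(0) computation":

```python
import numpy as np

def contraction_mapping_ilc_constrained(P, u_j, e_j, alpha, u_min, u_max):
    """Per-time-step constrained steepest descent for the contraction
    mapping law (43). Each u(k) is stepped along -dJ/du(k) and clipped to
    its bounds; the effect of each accepted change is propagated to the
    error before the next time step is treated."""
    p = len(u_j)
    u_next = u_j.astype(float).copy()
    e = e_j.astype(float).copy()
    for k in range(p):
        grad = -P[:, k] @ e                   # dJ_j(k)/du(k), cf. eq. (42)
        u_k = np.clip(u_next[k] - alpha * grad, u_min[k], u_max[k])
        e -= P[:, k] * (u_k - u_next[k])      # delta e = -P delta u, eq. (6)
        u_next[k] = u_k
    return u_next
# For the partial isometry law (44), replace P above by V @ U.T from P's SVD.
```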
VII. Numerical Examples of ILC Laws Based on QP

The ILC law proposed in Section V, the quadratic cost ILC with actuator inequality constraints, is applied to a linear single degree of freedom robot example. Consider a one-link robot described by the second-order equation

$$I \ddot{q} + b \dot{q} = u \tag{46}$$
with torque input u and angle q. The robot is equipped with a proportional controller and rate feedback; a block diagram is shown in Fig. 1. Actuators often have dynamics that must be considered, in which case the control saturation is on the output of an analog device that follows the digital controller output. Here we choose to place a first order transfer function in this position, before the actuator output signal $u(t)$ which is to obey the saturation limits. Note that since the treatment is digital, we are able to impose the saturation limits on the actuator only at the sample times. With an appropriate sample rate this should be sufficient for practical purposes.

The quadratic cost ILC law in this case consists of iteratively solving problem (28)/(29), replacing the computation of $\delta_{j+1} u$ in (29) by equation (31). We investigate trajectories of duration 1 s and a sampling frequency of 100 Hz, resulting in 100 time steps. The size of the QP variable vector is therefore 100, which is small for typical QP solvers. We have applied a standard constrained QP solver from the NAG numerical library. The torque constraints (29)/(31) result in 200 inequality constraints; in addition, 200 bounds on the steps $\delta_{j+1} y_r$ have to be included for technical reasons, leading to a total of 400 constraints in the QP. The torque limits are assumed to be ±115 Nm.

In Fig. 2, we show the effect of including the torque limits in the QP iterations. The target trajectory in this case is feasible, but during a substantial interval of the trajectory, application of the maximum torque is required. The trajectory can be seen as the solution of an optimal control problem maximizing rotation angle in a given time, fixing the initial angle, velocity, and torque, and setting the final velocity and torque to zero. The top row shows QP-based ILC iterations ignoring the torque limits, i.e. just solving the unconstrained QP. In the bottom row, torque limits are included as constraints in the QP. In the left plots of both cases, the blue curves show the desired trajectory, and the commanded trajectory (in green) is initialized with these values. The red curves show the output of the iterations, which initially lags behind the desired solution but, as the QP ILC iterations proceed, finally converges to the desired solution. The lower plots show that convergence is possible while satisfying the constraints.

In a second set of computations we prescribe an infeasible trajectory which is similar in shape to the previous trajectory but would require a maximum torque of about 140 Nm. In this case the iterations obviously cannot converge to the desired trajectory, since it is infeasible, but they will converge to a trajectory as close as possible to the desired one (in terms of the error $e_D$). We are also more likely to encounter the problem that the initial guess for the command already produces infeasible torques and therefore requires adjustment, since in the QP constraints used above, infeasibility of torques is only addressed after the first step, but not for the initial value. In Fig. 3, we show that the initial command (i.e., the desired trajectory), shown in green, leads to infeasible torques which have to be adjusted to stay within the constraints (red torque plot). Using equation (31), the adjusted command producing these torques can be computed (red). The QP ILC iterations described above can now be started using the modified initial command. Results of these iterations are shown in Fig. 4. The left plot shows the infeasible desired trajectory (blue) as well as the feasible input (green) and output iterations (red). It can be seen that the output converges to a feasible trajectory close to the desired one. The right plot shows the resulting torques of the QP ILC iterations, which always remain within the limits.
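For completeness, the plant (46) can be put in the form (18) by a zero-order-hold discretization. The inertia and damping values below are placeholders, since the paper does not list them; the 100 Hz rate and 1 s duration follow the text:

```python
import numpy as np
from scipy.signal import cont2discrete

I_link, b = 1.0, 0.5   # placeholder inertia and damping (not given in the paper)
T = 0.01               # 100 Hz sampling; a 1 s trajectory gives p = 100 steps
A = np.array([[0.0, 1.0],
              [0.0, -b / I_link]])     # states [q, qdot] for I qddot + b qdot = u
B = np.array([[0.0], [1.0 / I_link]])
C = np.array([[1.0, 0.0]])
Ad, Bd, Cd, Dd, _ = cont2discrete((A, B, C, np.zeros((1, 1))), T, method='zoh')
# (Ad, Bd, Cd) is the plant of equation (18); the proportional controller, rate
# feedback, and first order actuator model of Fig. 1 are appended as in Section V.
```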
VIII. Conclusions

The theory of iterative learning control has generated several very effective learning laws that can significantly improve the performance of an existing feedback control system within a few iterations. Reference [8] and the references therein present experimental results on a commercial robot that improved the tracking accuracy by a factor of about 1000 in 12 iterations for learning. These methods are based on linear theory, and do not acknowledge possible saturation nonlinearities in the actuators. This paper shows how one can convert three major classes of ILC laws into quadratic programming problems, and as a result one can introduce the actuator inequality constraints. The minimization of a quadratic expression subject to linear inequality constraints, the QP problem, is a standard problem in numerical analysis with very effective and quick solution methods. The paper develops how to formulate the QP problem for feedback control systems using classical controller designs, classical controllers with rate feedback or filtered feedback, or modern control designs using observers or Kalman filters. Future work will perform detailed numerical studies to investigate the properties of the approach.
References

1. Z. Bien and J.-X. Xu, editors, Iterative Learning Control – Analysis, Design, Integration, and Applications, Kluwer Academic Publishers, Boston, 1998.
2. K. L. Moore and J.-X. Xu, editors, International Journal of Control, Special Issue on Iterative Learning Control, Vol. 73, No. 10, July 2000.
3. G. Giese, R. W. Longman, and H. G. Bock, "Understanding how Minimum-Time Control Optimizes Link Interaction Forces in Polar Coordinate Robots," Advances in the Astronautical Sciences, Vol. 102, 1999, pp. 223-242.
4. G. Giese, R. W. Longman, and H. G. Bock, "Time-Optimal Elbow Robot Motion," International Journal of Robotics and Automation, IASTED, Vol. 20, No. 4, 2005.
5. R. W. Longman, V. H. Schulz, and H. G. Bock, "Path Planning for Satellite Mounted Robots," Keynote Lecture, Proceedings of the 3rd International Conference on Dynamics and Control of Structures in Space, London, England, 27-31 May 1996.
6. V. H. Schulz, R. W. Longman, and H. G. Bock, "Computer Aided Motion Planning for Satellite Mounted Robots," Mathematical Methods in the Applied Sciences (MMAS), Vol. 21, 1998.
7. J. Li, R. W. Longman, V. H. Schulz, and H. G. Bock, "Implementing Time Optimal Robot Maneuvers using Realistic Actuator Constraints and Learning Control," Advances in the Astronautical Sciences, Vol. 99, 1998, pp. 355-374.
8. R. W. Longman, "Iterative Learning Control and Repetitive Control for Engineering Practice," International Journal of Control, Special Issue on Iterative Learning Control, Vol. 73, No. 10, July 2000, pp. 930-954.
9. H. S. Jang and R. W. Longman, "A New Learning Control Law with Monotonic Decay of the Tracking Error Norm," Proceedings of the Thirty-Second Annual Allerton Conference on Communication, Control, and Computing, Monticello, Illinois, Sept. 1994, pp. 314-323.
10. H. S. Jang and R. W. Longman, "An Update on a Monotonic Learning Control Law and Some Fuzzy Logic Learning Gain Adjustment Techniques," Advances in the Astronautical Sciences, Vol. 90, 1996, pp. 301-318.
11. H. S. Jang and R. W. Longman, "Design of Digital Learning Controllers using a Partial Isometry," Advances in the Astronautical Sciences, Vol. 93, 1996, pp. 137-152.
12. M. Q. Phan and J. A. Frueh, "System Identification and Learning Control," Chapter 15, Iterative Learning Control: Analysis, Design, Integration, and Applications, Z. Bien and J. Xu, editors, Kluwer Academic Publishing, Norwell, MA, 1998, pp. 285-306.
13. D. H. Owens and N. Amann, "Norm-Optimal Iterative Learning Control," Internal Report Series of the Centre for Systems and Control Engineering, University of Exeter, 1994.
14. R. W. Longman and K. D. Mombaur, "Implementing Linear Iterative Learning Control Laws in Nonlinear Systems," Advances in the Astronautical Sciences, to appear.
15. M. Phan and R. W. Longman, "A Mathematical Theory of Learning Control for Linear Discrete Multivariable Systems," Proceedings of the AIAA/AAS Astrodynamics Conference, Minneapolis, Minnesota, August 1988, pp. 740-746.
16. K. Åström, P. Hagander, and J. Sternby, "Zeros of Sampled Systems," Proceedings of the 19th IEEE Conference on Decision and Control, 1980, pp. 1077-1081.
17. R. W. Longman, T. J. Kwon, and P. A. LeVoci, "Making the Learning Control Problem Well Posed – Stabilizing Intersample Error," Advances in the Astronautical Sciences, Vol. 123, 2006, pp. 1143-1162.
18. Y. Li and R. W. Longman, "Addressing Problems of Instability in Intersample Error in Iterative Learning Control," Advances in the Astronautical Sciences, Vol. 129, 2008, pp. 1571-1591.
19. J. Nocedal and S. Wright, Numerical Optimization, Springer Verlag, New York, 1999.
Figure 1: Block diagram of the one-link robot example considered for numerical computations
Figure 2: ILC iteration loops for the one-link robot example based on QP solution; top row ("ILC-QP without torque constraints"): ignoring torque constraints; bottom row ("ILC-QP with torque constraints"): taking torque constraints into account. Left plots show the desired trajectory (blue), input iterations (green), and output iterations (red), the latter finally converging to the desired trajectory. Right plots show the resulting torques during iterations, with limits violated in the first case and satisfied in the second.
Figure 3: Re-initialisation of the command in case of infeasible initial torques. Right plot: the torque corresponding to the original initial command (green) violates the constraints and is adjusted (red). The violation in this case is related to the fact that the desired solution would require infeasible torques (blue). Left plot: adjustment of the initial command (green to red) corresponding to the described adjustment of torques.
Figure 4: ILC iteration loops for the one-link robot example based on QP solution for the infeasible trajectory, starting with the adjusted initial command shown in Figure 3. Left plot: infeasible desired trajectory (blue), feasible input (green) and output iterations (red); the latter in this case converge as close as possible to, but not exactly to, the desired trajectory. Right plot: resulting torques of the ILC iterations, respecting the limits.