IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 46, NO. 6, JUNE 2001


A Discrete-Time Stochastic Learning Control Algorithm

Samer S. Saab, Senior Member, IEEE

Abstract—One of the problems associated with iterative learning control algorithms is the selection of a "proper" learning gain matrix for every discrete-time sample and for all successive iterations. This problem becomes more difficult in the presence of random disturbances such as measurement noise, reinitialization errors, and state disturbance. In this paper, the learning gain, for a selected learning algorithm, is derived based on minimizing the trace of the input error covariance matrix for linear time-varying systems. It is shown that, if the product of the input/output coupling matrices is full-column rank, then the input error covariance matrix converges uniformly to zero in the presence of uncorrelated random disturbances, whereas the state error covariance matrix converges uniformly to zero in the presence of measurement noise. Moreover, it is shown that, if a certain condition is met, then knowledge of the state coupling matrix is not needed to apply the proposed stochastic algorithm. The proposed algorithm is shown to suppress a class of nonlinear and repetitive state disturbances. The application of this algorithm to a class of nonlinear systems is also considered. A numerical example is included to illustrate the performance of the algorithm.

Index Terms—Iterative learning control, optimal control, stochastic control.

I. INTRODUCTION

ITERATIVE learning control algorithms have been reported and applied in many fields, most notably robotics [4]–[7], [23]–[36]; examples include precision speed control of servomotors and its application to a VCR [1], cyclic production processes with application to extruders [2], and coil-to-coil control in rolling [3]. Several approaches for updating the control law over repeated trials on identical tasks, making use of stored preceding inputs and output errors, have been proposed and analyzed [4]–[38]. Several algorithms assume that the initial condition is fixed; that is, at each iteration the state is always initialized at the same point. For a learning procedure that automatically accomplishes this task, the reader may refer to [28] and [17]. For algorithms that employ more than one iteration of past data, refer to [15] and [16]. A majority of these methodologies are based on contraction-mapping requirements to develop sufficient conditions for convergence and/or boundedness of trajectories through repeated iterations. Consequently, a condition on the learning gain matrix $K$ is assumed in order to satisfy a necessary condition for contraction. Based on the nature of the control update law, conditions on $K$, such as $\|I - K\Gamma\| < 1$ or $\rho(I - K\Gamma) < 1$, are imposed (where the matrix $\Gamma$ is a function of the input/output coupling matrices of the system)

Manuscript received September 1, 1999; revised April 25, 2000. Recommended by Associate Editor M. Polycarpou. This work was supported by the University Research Council at the Lebanese American University. The author is with Lebanese American University, Byblos 48328, Lebanon (e-mail: [email protected]). Publisher Item Identifier S 0018-9286(01)05129-7.

[4]–[20]. Several methods have been proposed to find an "appropriate" value of the gain matrix $K$. For example, in one method the value of $K$ is found by gradient methods that minimize a quadratic cost of the error between successive trials [35]; another provides a steepest-descent minimization of the error at each iteration [37]; a third chooses larger values of the gain matrix until a convergence property is attained [38]. Another methodology, based on a frequency-response technique, is presented in [25]. On the other hand, for task-level learning control, an online numerical multivariate parametric nonlinear least-squares optimization problem is formulated and applied to a two-link flexible arm [24]. Then again, these methods are based on a deterministic approach. To the best of the author's knowledge, this paper presents the first attempt to formulate a computational algorithm for the learning gain matrix which stochastically minimizes, in a least-squares sense, trajectory errors in the presence of random disturbances.

This paper presents a novel stochastic optimization approach, for a selected learning control algorithm, in which the learning gain minimizes the trace of the input error covariance matrix. A linear discrete time-varying system, with random state reinitialization, additive random state disturbance, and biased measurement noise, is initially considered. The selected control update law, at a sampled time $t$ and iterative index $k$, consists of the previous stored control input added to a weighted difference of the stored output error, that is, $u_{k+1}(t) = u_k(t) + K_k(t)[y_d(t+1) - y_k(t+1)]$. Although this update law possesses a derivative action, the proposed algorithm is shown to become insensitive to measurement noise throughout the iterations. The problem can be interpreted as follows: assume that the random disturbances are white Gaussian noise and uncorrelated; then, at time $t$ and iterative index $k$, find a learning gain matrix $K_k(t)$ such that the trace of the input error covariance matrix is minimized.

It is shown in [11] that if the learning gain is chosen such that $\|I - K C_{t+1}B_t\| < 1$, then boundedness of trajectories of this control law is guaranteed. A choice of this learning gain matrix requires the matrix $C_{t+1}B_t$ to be full-column rank. In addition, in the absence of all disturbances, uniform convergence of all trajectories is also guaranteed if and only if $C_{t+1}B_t$ is full-column rank. This latter scenario is not realistic because there are always random disturbances in the system in addition to measurement noise. Unfortunately, deterministic learning algorithms cannot learn random behaviors. That is, after a finite number of iterations, the repetitive errors are attenuated and the dominating errors will be due to those random disturbances and measurement noise, which cannot be learned or mastered in a deterministic approach. In this paper, we show that if $C_{t+1}B_t$ is
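As a concrete illustration of this update law, the following sketch applies $u_{k+1}(t) = u_k(t) + K e_k(t+1)$ with a fixed gain to a hypothetical noise-free LTI system; all matrices and the gain are made up, and the paper's algorithm would instead compute an iteration-varying gain $K_k(t)$:

```python
import numpy as np

# Illustrative sketch of the update law u_{k+1}(t) = u_k(t) + K e_k(t+1)
# on a hypothetical noise-free LTI system. The matrices and the fixed
# gain are made up; here K = (C B)^{-1}, so I - K C B = 0 and the
# repeatable error is annihilated after at most T trials.

A = np.array([[0.8, 0.0], [0.0, 0.5]])
B = np.array([[0.0], [1.0]])
C = np.array([[0.0, 1.0]])            # C B = 1 (full-column rank)

T = 20
y_d = np.sin(np.linspace(0.0, 2.0 * np.pi, T + 1))  # desired output
u = np.zeros(T)                       # initial control guess
K = 1.0 / (C @ B).item()              # fixed learning gain

def run_trial(u):
    """Simulate one trial of the system and return its output."""
    x = np.zeros(2)
    y = [(C @ x).item()]
    for t in range(T):
        x = A @ x + B[:, 0] * u[t]
        y.append((C @ x).item())
    return np.array(y)

for k in range(30):
    e = y_d - run_trial(u)            # output error of trial k
    u = u + K * e[1:]                 # D-type update uses e_k(t+1)

print(np.max(np.abs(y_d - run_trial(u))))  # ~0 after learning
```

With $K = (CB)^{-1}$ the lifted error iteration is nilpotent, so the repeatable error vanishes in at most $T$ trials; a fixed gain, however, feeds any measurement noise back into the input undiminished, which motivates the stochastic gain studied in this paper.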

0018–9286/01$10.00 © 2001 IEEE

full-column rank, then the algorithm that generates the learning gain matrix which minimizes the trace of the input error covariance matrix automatically assures the contraction condition, that is, $\|I - K_k(t)C_{t+1}B_t\| < 1$, which ensures the suppression of repeatable or deterministic errors [11], [14]. Moreover, it is shown that the input error covariance matrix converges uniformly to zero in the presence of random disturbances, whereas the state error covariance matrix converges uniformly to zero in the presence of biased measurement noise. Obviously, if the input error is zero and the reinitialization errors are nonzero, then one should not expect this type of learning control to suppress such errors. Furthermore, if zero convergence of the state error covariance matrix is desired in the presence of state disturbance, then the controller (if any) is expected to "heavily" employ full knowledge of the system parameters, which makes the use of learning algorithms unattractive. On the other hand, it is shown that if a certain condition is met, then knowledge of the state matrix is not required. Application of this algorithm to a class of nonlinear systems is also examined. In one case, where the nonlinear state function is not required to satisfy a Lipschitz condition, unlike the results in [5]–[9], [12]–[15], and [24], it is shown that the input error covariance matrix is nonincreasing. In addition, if a certain condition is met, then the results for this class of systems become similar to the case of linear time-varying systems. A class of nonlinear systems satisfying a Lipschitz condition is also considered.

The outline of the paper is as follows. In Section II, we formulate the problem and present the assumptions on the disturbance characteristics. In Section III, we develop the learning gain that minimizes the trace of the input error covariance matrix, and present the propagation formula of this covariance matrix through iterations. In Section IV, we assume that $C_{t+1}B_t$ is full-column rank and present the convergence results. In Section V, we present a pseudocode for computer application and give a numerical example. We conclude in Section VI. In Appendix A, we show that if a certain condition is met, then knowledge of the state matrix is not required. In Appendix B, we apply this type of stochastic algorithm to a class of nonlinear systems.
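The contrast between a deterministic gain and a stochastically minimized one can be previewed numerically. The sketch below simulates the scalar input-error recursion $\delta u_{k+1} = (1 - Kb)\,\delta u_k + K v_k$, a simplified, hypothetical analogue of the input-error dynamics derived in Section III (state coupling dropped), under a fixed gain and under the variance-minimizing gain; all numerical values are made up:

```python
import numpy as np

# Monte Carlo preview: fixed "deterministic" gain versus a
# variance-minimizing, iteration-varying gain on the hypothetical
# scalar input-error recursion
#     du[k+1] = (1 - K*b) * du[k] + K * v[k],
# where v[k] is measurement noise. All numbers are made up.

rng = np.random.default_rng(3)
b, R = 1.0, 0.01                 # scalar C_{t+1}B_t and noise variance
iters, runs = 200, 2000

du_fix = np.full(runs, 1.0)      # Monte Carlo ensembles of input errors
du_opt = np.full(runs, 1.0)
p = 1.0                          # input error variance estimate P_0
K_fix = 0.5                      # fixed gain, |1 - K*b| < 1

for k in range(iters):
    v = rng.normal(0.0, np.sqrt(R), runs)
    du_fix = (1.0 - K_fix * b) * du_fix + K_fix * v
    K = p * b / (b * p * b + R)              # variance-minimizing gain
    du_opt = (1.0 - K * b) * du_opt + K * v
    p = (1.0 - K * b) * p                    # variance recursion

print(du_fix.var())   # settles at a nonzero noise-induced floor
print(du_opt.var())   # driven toward zero
```

The fixed gain settles at a nonzero noise-induced variance floor, while the decreasing gain drives the input-error variance toward zero, which is the behavior established for the full algorithm in Section IV.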

II. PROBLEM FORMULATION

Consider a discrete-time-varying linear system described by the following difference equation:

$x_k(t+1) = A_t x_k(t) + B_t u_k(t) + w_k(t)$
$y_k(t) = C_t x_k(t) + v_k(t)$ (1)

where $t$ is the discrete-time sample, $k$ is the iteration index, $x_k(t) \in \mathbb{R}^n$ is the state, $u_k(t) \in \mathbb{R}^r$ is the control input, $y_k(t) \in \mathbb{R}^m$ is the output, $w_k(t)$ is the state disturbance, and $v_k(t)$ is the measurement noise. The learning update law is given by

$u_{k+1}(t) = u_k(t) + K_k(t)\,e_k(t+1)$ (2)

where $K_k(t)$ is the learning control gain matrix, and $e_k(t) = y_d(t) - y_k(t)$ is the output error, where $y_d(t)$ is a realizable desired output trajectory. It is assumed that for any realizable output trajectory and an appropriate initial condition $x_d(0)$, there exists a unique control input $u_d(t)$ generating the trajectory for the nominal plant. That is, the following difference equation is satisfied:

$x_d(t+1) = A_t x_d(t) + B_t u_d(t)$
$y_d(t) = C_t x_d(t).$ (3)

Note that if $C_{t+1}B_t$ is full-column rank and $y_d(t)$ is a realizable output trajectory, then a unique input generating the output trajectory is given by

$u_d(t) = [(C_{t+1}B_t)^T C_{t+1}B_t]^{-1}(C_{t+1}B_t)^T[\,y_d(t+1) - C_{t+1}A_t x_d(t)\,].$

Define the state and the input error vectors as $\delta x_k(t) = x_d(t) - x_k(t)$ and $\delta u_k(t) = u_d(t) - u_k(t)$, respectively.

Assumptions: Denote $E_t[\cdot]$ and $E_k[\cdot]$ to be the expectation operators with respect to the time domain and the iteration domain, respectively. Let $\{w_k(t)\}$ and $\{v_k(t)\}$ be sequences of zero-mean white Gaussian noise such that $E_k[w_k(t)w_k^T(t)] = Q_t$ is a positive–semidefinite matrix, $E_k[v_k(t)v_k^T(t)] = R_t$ is a positive–definite matrix for all $t$, and $E[w_k(t)v_j^T(s)] = 0$ for all indices $t$, $s$, $k$, and $j$; i.e., the measurement error sequence has zero cross-correlation with the state disturbance at all times. The initial state error $\delta x_k(0)$ and the initial input error $\delta u_0(t)$ are also assumed to be zero-mean white noise such that $E_k[\delta x_k(0)\delta x_k^T(0)] = P^x(0)$ is a positive–semidefinite matrix, and $E_k[\delta u_0(t)\delta u_0^T(t)] = P_0(t)$ is a symmetrical positive–definite matrix. One simple scenario is to set $P_0(t) = cI$ with $c > 0$. Moreover, $\delta x_k(0)$ is uncorrelated with $\delta u_0(t)$, $w_k(t)$, and $v_k(t)$. The main target is to find a proper learning gain matrix $K_k(t)$ such that the input error variance is minimized. Note that since the disturbances are assumed to be Gaussian processes, uncorrelated disturbances are equivalent to independent disturbances.

III. STOCHASTIC LEARNING CONTROL ALGORITHM DEVELOPMENT

In this section, we develop the appropriate learning gain matrix, which minimizes the trace of the input/state error covariance matrix. In the following, we combine the system given by (1) and (2) in a regressor form. We first write expressions for the state and input errors. The state error at $t+1$ is given by

$\delta x_k(t+1) = A_t\,\delta x_k(t) + B_t\,\delta u_k(t) - w_k(t)$ (4)

and the input error for iterate $k+1$ is given by

$\delta u_{k+1}(t) = \delta u_k(t) - K_k(t)[\,C_{t+1}\delta x_k(t+1) - v_k(t+1)\,].$


Substituting the value of the state error from (4) into the previous equation, we have

$\delta u_{k+1}(t) = \delta u_k(t) - K_k(t)C_{t+1}[\,A_t\delta x_k(t) + B_t\delta u_k(t) - w_k(t)\,] + K_k(t)v_k(t+1).$

Collecting terms, the input error yields

$\delta u_{k+1}(t) = [\,I - K_k(t)C_{t+1}B_t\,]\delta u_k(t) - K_k(t)C_{t+1}A_t\delta x_k(t) + K_k(t)C_{t+1}w_k(t) + K_k(t)v_k(t+1).$ (5)

Combining (4) and (5) in the two-dimensional Roesser model [39], we obtain

$\begin{bmatrix}\delta x_k(t+1)\\ \delta u_{k+1}(t)\end{bmatrix} = \begin{bmatrix}A_t & B_t\\ -K_k(t)C_{t+1}A_t & I - K_k(t)C_{t+1}B_t\end{bmatrix}\begin{bmatrix}\delta x_k(t)\\ \delta u_k(t)\end{bmatrix} + \begin{bmatrix}-I & 0\\ K_k(t)C_{t+1} & K_k(t)\end{bmatrix}\begin{bmatrix}w_k(t)\\ v_k(t+1)\end{bmatrix}.$ (6)

The following development is justified because the input vector of difference equation (6) has zero mean. Note that if the disturbances are nonstationary or colored noise, then the system can easily be augmented to cover this class of disturbances. Writing (6) in a compact form, we have

$z_k^+(t) = F_k(t)\,z_k(t) + G_k(t)\,\xi_k(t)$ (7)

where $z_k(t) = [\,\delta x_k^T(t)\;\;\delta u_k^T(t)\,]^T$, $z_k^+(t) = [\,\delta x_k^T(t+1)\;\;\delta u_{k+1}^T(t)\,]^T$, and $\xi_k(t) = [\,w_k^T(t)\;\;v_k^T(t+1)\,]^T$. Let $\bar{P}_k(t) = E[z_k(t)z_k^T(t)]$ and

$W_t = E[\xi_k(t)\xi_k^T(t)] = \begin{bmatrix}Q_t & 0\\ 0 & R_{t+1}\end{bmatrix}$

where the zero submatrices are due to the zero cross-correlation between $w_k(t)$ and $v_k(t+1)$. As a consequence, $\bar{P}_k(t)$ and $W_t$ are symmetrical and positive–semidefinite matrices. Here $w_k(t)$ is an $n \times 1$ vector, $v_k(t+1)$ is an $m \times 1$ vector, $F_k(t)$ is an $(n+r)\times(n+r)$ matrix, and $G_k(t)$ is an $(n+r)\times(n+m)$ matrix.

Next, we attempt to find a learning gain matrix $K_k(t)$ such that the trace of the input error covariance matrix is minimized. It is implicitly assumed that the input and state errors have zero mean, so it is proper to refer to this quantity as a covariance matrix. Note that the mean-square error is used as the performance criterion because the trace is the summation of the variances of the elements of the error vector. Toward this end, we first form

$P_{k+1}(t) = E_k[\,\delta u_{k+1}(t)\,\delta u_{k+1}^T(t)\,].$ (8)

Since the initial state error and the disturbances $w_k(t)$ and $v_k(t+1)$ are uncorrelated with $\delta u_k(t)$ and $\delta x_k(t)$, we have $E_k[\delta u_k(t)w_k^T(t)] = 0$, $E_k[\delta x_k(t)w_k^T(t)] = 0$, and $E_k[\delta u_k(t)v_k^T(t+1)] = 0$. Expanding the left-hand term of (8) by substituting (5), and denoting for compactness $K = K_k(t)$, $A = A_t$, $B = B_t$, $C = C_{t+1}$, $P_k = E_k[\delta u_k(t)\delta u_k^T(t)]$, $P_k^x = E_k[\delta x_k(t)\delta x_k^T(t)]$, and $P_k^{xu} = E_k[\delta x_k(t)\delta u_k^T(t)] = (P_k^{ux})^T$, the last equation is reduced to

$P_{k+1} = (I - KCB)P_k(I - KCB)^T - (I - KCB)P_k^{ux}A^TC^TK^T - KCAP_k^{xu}(I - KCB)^T + KCAP_k^xA^TC^TK^T + KCQ_tC^TK^T + KR_{t+1}K^T.$


Expanding and rearranging the terms on the right-hand side of the last equation, we get

trace$(P_{k+1})$ = trace$[(I - KCB)P_k(I - KCB)^T]$ − 2 trace$[KCAP_k^{xu}(I - KCB)^T]$ + trace$[K(\,CAP_k^xA^TC^T + CQ_tC^T + R_{t+1}\,)K^T]$. (9)

Since the learning gain $K_k(t)$ is employed to update the control law at iterate $k+1$, and in addition this gain does not affect the state error at iterate $k$, it is reasonable to find a proper gain that minimizes the trace of $P_{k+1}(t)$ instead of the trace of $\bar{P}_k(t)$. Setting

$\Theta_k(t) = C_{t+1}[\,A_tP_k^x(t)A_t^T + Q_t\,]C_{t+1}^T + R_{t+1}$ (10)

we next show that the cross-covariance $P_k^{xu}(t)$ is zero. This is accomplished by expressing the state error $\delta x_k(t)$ as a function of the initial state error $\delta x_k(0)$, the disturbances $w_k(j)$, and the inputs $\delta u_k(j)$ for $j < t$, and similarly the input error $\delta u_k(t)$ as a function of $\delta u_0(t)$, $\delta x_j(t)$, $w_j(t)$, and $v_j(t+1)$ for $j < k$, and then correlating $\delta x_k(t)$ with $\delta u_k(t)$. Iterating the $t$ argument of (4), we get

$\delta x_k(t) = \Phi(t,0)\,\delta x_k(0) + \sum_{j=0}^{t-1}\Phi(t,j+1)\,[\,B_j\,\delta u_k(j) - w_k(j)\,]$ (11)

where $\Phi(t,j) = A_{t-1}A_{t-2}\cdots A_j$, with $\Phi(t,t) = I$, is the state transition matrix

and, similarly, iterating the $k$ argument of (5), we get for $k \ge 1$

$\delta u_k(t) = \prod_{j=0}^{k-1}[\,I - K_j(t)CB\,]\,\delta u_0(t) + \sum_{j=0}^{k-1}\Big(\prod_{i=j+1}^{k-1}[\,I - K_i(t)CB\,]\Big)K_j(t)\,[\,-CA\,\delta x_j(t) + Cw_j(t) + v_j(t+1)\,].$ (12)

Correlating the right-hand terms of (11) and (12), it can be readily concluded that $\delta x_k(0)$, $w_k(j)$, $w_j(t)$, $v_j(t+1)$, and $\delta u_0(t)$ are all uncorrelated. In addition, these terms are also uncorrelated with $\delta x_j(t)$, $j < k$, and $\delta u_k(j)$, $j < t$. At this point, the correlation of (11) and (12) is equivalent to correlating $\delta u_k(j)$, $j < t$, with $\delta u_0(t)$ and with $\delta x_j(t)$, $j < k$. Since these terms cannot be represented as functions of one another, they are considered uncorrelated. Therefore, $P_k^{xu}(t) = 0$, and consequently the trace in (9) is reduced to

trace$(P_{k+1})$ = trace$[(I - KCB)P_k(I - KCB)^T]$ + trace$[K\Theta_k(t)K^T]$.

In order to find $K_k(t)$ such that this trace is minimized, we set its derivative with respect to $K_k(t)$ to zero:

$-2(I - KCB)P_k(CB)^T + 2K\Theta_k(t) = 0.$

Note that since $P_k$ and $P_k^x$ are positive–semidefinite matrices and $R_{t+1}$ is positive definite, $\Theta_k(t)$ is positive definite, and $CBP_k(CB)^T + \Theta_k(t)$ is nonsingular. The solution of the above equation is the optimal learning gain which minimizes the trace of $P_{k+1}$:

$K_k(t) = P_k\,(CB)^T\,[\,CBP_k(CB)^T + \Theta_k(t)\,]^{-1}.$ (13)

The input error covariance matrix associated with the learning gain (13) may now be derived. With $P_k^{xu}(t) = 0$, the input error covariance matrix becomes

$P_{k+1} = [\,I - K_k(t)CB\,]P_k[\,I - K_k(t)CB\,]^T + K_k(t)\Theta_k(t)K_k^T(t).$ (14)

Note that if the initial condition $P_0(t)$ is symmetric, then, by examining (14), $P_k$ is symmetric for all $k$. Expanding (14), we have

$P_{k+1} = P_k - K_kCBP_k - P_k(CB)^TK_k^T + K_k[\,CBP_k(CB)^T + \Theta_k(t)\,]K_k^T.$ (15)

Substitution of (13) into (15) yields

$P_{k+1} = [\,I - K_k(t)CB\,]P_k.$ (16)

Claim 1: Assuming that $P_k$ and $\Theta_k(t)$ are symmetric positive–definite, then

$[\,I - K_k(t)CB\,]P_k = [\,P_k^{-1} + (CB)^T\Theta_k^{-1}(t)CB\,]^{-1}.$ (17)

Proof: Substituting (13) in $[I - K_k(t)CB]P_k$, we get

$[\,I - K_k(t)CB\,]P_k = P_k - P_k(CB)^T[\,CBP_k(CB)^T + \Theta_k(t)\,]^{-1}CBP_k.$ (18)

One way to show the claimed equality is to multiply the left-hand side of (17) by the inverse of the right-hand side and show that the result is nothing but the identity matrix. Alternatively, using a well-known matrix inversion lemma [40], we get

$[\,P_k^{-1} + (CB)^T\Theta_k^{-1}(t)CB\,]^{-1} = P_k - P_k(CB)^T[\,CBP_k(CB)^T + \Theta_k(t)\,]^{-1}CBP_k.$ (19)

Substituting (19) into the right-hand side of (18) shows that the two sides of (17) are equal, which completes the proof.

IV. CONVERGENCE

In this section, we show that "stochastic" convergence is guaranteed in the presence of random disturbances if $C_{t+1}B_t$ is full-column rank. In particular, it is shown that the input error covariance matrix converges uniformly to zero in the presence of random disturbances (Theorem 3), whereas the state error covariance matrix converges uniformly to zero in the presence of biased measurement noise (Theorem 5). In the following, some useful results are first derived.

Proposition 1: If $C_{t+1}B_t$ is a full-column rank matrix, then $K_k(t) = 0$ if and only if $P_k(t) = 0$.

Proof: The sufficient condition is the trivial case [set $P_k(t) = 0$ in (13)]. Necessary condition: since $CB$ is full-column rank, $(CB)^T$ is full-row rank. Multiplying both sides of (13), from the right-hand side, by $CBP_k(CB)^T + \Theta_k(t)$ and letting $K_k(t) = 0$ implies that $P_k(t)(CB)^T = 0$; since $(CB)^T$ is full-row rank, it has a right inverse, and therefore $P_k(t) = 0$.


Lemma 2: If $C_{t+1}B_t$ is full-column rank, then the learning algorithm, presented by (2), (13), and (16), guarantees that $P_k(t)$ is a symmetric positive-definite matrix for all $k$ and $t$. Moreover, the eigenvalues of $I - K_k(t)CB$ are positive and strictly less than one.

Proof: The proof proceeds by induction with respect to the iteration index $k$. $P_0(t)$ is assumed to be a symmetric positive–definite matrix, and, by examining (10), $\Theta_0(t)$ is symmetric and positive–definite. Since $CB$ is full-column rank and $\Theta_0(t)$ is symmetric positive–definite, $(CB)^T\Theta_0^{-1}(t)CB$ is symmetric positive–definite. Equation (17) then implies that $[I - K_0(t)CB]P_0 = [P_0^{-1} + (CB)^T\Theta_0^{-1}CB]^{-1}$ is symmetric positive–definite. In addition, having $(CB)^T\Theta_0^{-1}CB$ positive–definite implies that all eigenvalues of $P_0[P_0^{-1} + (CB)^T\Theta_0^{-1}CB]$ are strictly greater than one, which is equivalent to concluding that the eigenvalues of $I - K_0(t)CB$ are positive and strictly less than one. This implies that $I - K_0(t)CB$ is nonsingular; thus, by (16), $P_1(t)$ is a symmetric positive-definite matrix. We may now assume that $P_k(t)$ is symmetric positive definite; a similar argument, with $k$ in place of $0$, implies that $I - K_k(t)CB$ has strictly positive eigenvalues less than one and is nonsingular, and consequently $P_{k+1}(t)$ is symmetric and positive–definite.

Remark: Since all the eigenvalues of $I - K_k(t)CB$ are strictly positive and strictly less than one (Lemma 2), there exists a consistent norm $\|\cdot\|$ and a scalar $0 < \alpha_k(t) < 1$ such that

$\|I - K_k(t)C_{t+1}B_t\| \le \alpha_k(t) < 1.$ (20)

Theorem 3: If $C_{t+1}B_t$ is full-column rank, then the learning algorithm, presented by (2), (13), and (16), guarantees that $\lim_{k\to\infty}\|P_k(t)\| = 0$. In addition, $K_k(t) \to 0$ and $P_k(t) \to 0$ uniformly in $t$ as $k \to \infty$.

Proof: Equation (16) along with (20) imply

$\|P_{k+1}(t)\| \le \alpha_k(t)\,\|P_k(t)\| < \|P_k(t)\|.$

We now show that $\lim_{k\to\infty}\|P_k(t)\| = 0$ by a contradiction argument. Since $\|P_k(t)\|$ is a strictly decreasing sequence and bounded from below by zero, the limit exists. Assume that $\lim_{k\to\infty}\|P_k(t)\| = c > 0$, and consider the ratio test for the sequence $\|P_k(t)\|$. Note that $P_k^x(t)$ is bounded in $k$, hence $\Theta_k(t)$ is also bounded in $k$; this can be seen by employing (11). Since $\Theta_k(t)$ is symmetric positive–definite and bounded, $CBP_k(CB)^T + \Theta_k(t)$ is symmetric, positive–definite, and bounded. The latter, together with the assumption that $c > 0$, implies that

$\limsup_{k\to\infty}\alpha_k(t) < 1.$ (21)

Then the ratio test implies that $\lim_{k\to\infty}\|P_k(t)\| = 0$, which contradicts $c > 0$; therefore $P_k(t) \to 0$. Employing Proposition 1, we get $K_k(t) \to 0$. Since $t$ takes on finitely many values ($0 \le t \le T$), uniform convergence is equivalent to pointwise convergence.

Lemma 4: If $M_k$, $k \ge 0$, is a sequence of positive–semidefinite matrices such that trace$(M_k) \to 0$ as $k \to \infty$, then $M_k \to 0$.

Proof: Let $m_{ij}^{(k)}$ denote the $(i,j)$ entry of $M_k$. Then $m_{ii}^{(k)} \ge 0$ (positive–semidefinite matrix) and $0 \le m_{ii}^{(k)} \le$ trace$(M_k)$. Taking the limit as $k \to \infty$, we have $m_{ii}^{(k)} \to 0$. Similarly, $|m_{ij}^{(k)}| \le \sqrt{m_{ii}^{(k)}m_{jj}^{(k)}} \to 0$. Therefore, $M_k \to 0$.

Theorem 5: If $C_{t+1}B_t$ is a full-column rank matrix, then, in the absence of state disturbance and reinitialization errors (excluding biased measurement noise, with $R_t$ positive definite), the learning algorithm, presented by (2), (13), and (16), guarantees that $P_k(t) \to 0$ and $K_k(t) \to 0$, and the state error covariance matrix converges to zero uniformly in $t$ as $k \to \infty$.

Proof: Obviously, the results of the previous theorem still apply. Setting $w_k(t) = 0$ and $\delta x_k(0) = 0$, (11) is reduced to

$\delta x_k(t) = \sum_{j=0}^{t-1}\Phi(t,j+1)B_j\,\delta u_k(j).$

Consequently, taking the expected value of $\delta x_k(t)\delta x_k^T(t)$, we have

$P_k^x(t) = \sum_{i=0}^{t-1}\sum_{j=0}^{t-1}\Phi(t,i+1)B_i\,E_k[\delta u_k(i)\delta u_k^T(j)]\,B_j^T\Phi^T(t,j+1).$

From previous results, trace$(E_k[\delta u_k(i)\delta u_k^T(j)])$ is bounded by $\sqrt{\mathrm{trace}(P_k(i))\,\mathrm{trace}(P_k(j))}$, which tends to zero as $k \to \infty$. Therefore, taking the limit as $k \to \infty$ and employing Lemma 4, we get $P_k^x(t) \to 0$. Again, uniform convergence for $0 \le t \le T$ is straightforward.

TABLE I
PERFORMANCE OF PROPOSED ALGORITHM

V. APPLICATION

In this section, we go over the steps involved in the application of the proposed algorithm. A pseudocode is presented for both cases, where the state-coupling term is considered and where it is neglected. In addition, a numerical example is presented to illustrate the performance of the proposed algorithm.

A. Computer Algorithm

In order to apply the proposed learning control algorithm, the state error covariance matrix needs to be available. From Section III, one can extract the following:

$P_k^x(t+1) = A_tP_k^x(t)A_t^T + B_tP_k(t)B_t^T + Q_t.$ (22)

We assume that $A_t$, $B_t$, $C_t$, $Q_t$, $R_t$, $P^x(0)$, and $P_0(t)$ are all available. Starting with $k = 0$, the pseudocode is as follows:

Step 1) For $t = 0, 1, \ldots, T$:
  a) apply $u_k(t)$ to the system described by (1), and find the output error $e_k(t+1)$;
  b) employing (22), compute $P_k^x(t+1)$;
  c) using (13), compute the learning gain $K_k(t)$;
  d) using (2), update the control $u_{k+1}(t)$;
  e) using (16), update $P_{k+1}(t)$.
Step 2) Set $k \leftarrow k+1$ and go to Step 1).

In the case where the state-coupling term may be neglected, the pseudocode is the same as the previous one excluding item b), and using (24) instead of (13) to compute the learning gain $K_k(t)$.

B. Numerical Example

System Description: In this section, we apply the proposed learning algorithm to a discrete-time-varying linear system and compare its performance with a similar algorithm where the gain is computed in a deterministic approach. The latter is referred to as the "generic" algorithm. The system considered is given by (23), with an "integration period" of one second, where the reinitialization error, state disturbances, and measurement noise are normally distributed white random processes. All the disturbances are assumed to be unbiased except for the measurement noise, which has a nonzero mean. The desired trajectories, with domain $t \in [0, 100]$ (1-second interval), are the trajectories associated with the nominal plant. The initial input for the generic and proposed algorithms is identical, and so is the initial input error covariance scalar (single input). Next, the generic algorithm requires the control gain $K$ to satisfy $|1 - KC_{t+1}B_t| < 1$. It is noted that if $K$ is chosen such that $|1 - KC_{t+1}B_t|$ is close to zero, then, in the iterative domain, a fast transient response is noticed but at the cost of larger "steady-state" errors. However, if $K$ is chosen such that $|1 - KC_{t+1}B_t|$


Fig. 1. State errors and input error at k = 100, for t ∈ [0, 100].

is close to one, then smaller steady-state errors are noticed at the cost of a very slow transient response. Consequently, the "generic" gain $K$ is chosen, for all $t$, as a compromise between these two effects.

Performance: In order to quantify the statistical and deterministic size of the input and state errors after a fixed number of iterations, we use three different measures. For statistical evaluation, we measure the variance and the mean-square error of each variable. For deterministic examination, the infinity norm is measured. Since the convergence rate as the number of iterations increases is also of importance, the accumulation of each of these measures (variance, mean square, and infinity norm) is also computed. The superiority of the proposed algorithm can be detected by examining both Table I and Fig. 1. In Fig. 1, the solid and dashed lines represent the employment of the proposed and generic gains, respectively. The top plot of Fig. 2 shows the evolution of the learning gain, whereas the bottom plot shows the conformation of Proposition 1. In addition, an examination of Table I and Fig. 2 (top) shows the close conformity between the computed input error variance and the estimate generated by the proposed algorithm.

VI. CONCLUSION

This paper has formulated a computational algorithm for the learning gain matrix which stochastically minimizes, in a least-squares sense, trajectory errors in the presence of random disturbances. It is shown that, if $C_{t+1}B_t$ is full-column rank,

then the algorithm automatically assures the contraction condition, that is, $\|I - K_k(t)C_{t+1}B_t\| < 1$, which ensures the suppression of repeatable errors and boundedness of trajectories in the presence of bounded disturbances. The proposed algorithm is shown to stochastically reject uncorrelated disturbances from the control end. Consequently, state trajectory tracking becomes stochastically insensitive to measurement noise. In particular, it is shown that, if the matrix $C_{t+1}B_t$ is full-column rank, then the input error covariance matrix converges uniformly to zero in the presence of uncorrelated random state disturbance, reinitialization errors, and measurement noise. Consequently, the state error covariance matrix converges uniformly to zero in the presence of measurement noise. Moreover, it is shown that, if a certain condition is met, then knowledge of the state coupling matrix is not needed to apply the proposed stochastic algorithm. Application of this algorithm to a class of nonlinear systems is also examined. A numerical example is added to conform with the theoretical results.

APPENDIX A
MODEL UNCERTAINTY

Unfortunately, the determination of the proposed learning gain matrix, specified by (13), requires the update of the state error covariance matrix, which in turn requires full knowledge of the system model. This makes the application of the proposed learning control algorithm somewhat unattractive. Note that in many applications, only the continuous-time statistical and deterministic models are available; furthermore, the discrete system matrices and the system and measurement noise


Fig. 2. Evolution of learning gain function K(t), t ∈ [0, 100], for k = 1, 2, 5, and 10 (top); avg K and avg P versus k, 1 ≤ k ≤ 100 (bottom).

covariance matrices depend on the size of the sampling period used in discretization. Suppose that a continuous-time linear time-invariant system is given by $\dot{x}(\tau) = Ax(\tau) + Bu(\tau)$, $y(\tau) = Cx(\tau)$, where $\tau$ is the time argument in the continuous domain. In order to evaluate the discrete state matrix $A_t$, we may use $A_t = e^{AT}$

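The first-order approximation of the discretized state matrix used in this appendix, $A_t \approx I + AT$ for small $T$, can be checked numerically; the continuous-time matrix below is hypothetical, and the matrix exponential is evaluated with a truncated Taylor series:

```python
import numpy as np

# Check of the first-order discretization A_t ≈ I + A*T against the
# matrix exponential e^{AT}. The continuous-time matrix A is
# hypothetical; the exponential is computed with a truncated Taylor
# series, which is adequate for the small-norm arguments used here.

def expm_taylor(M, terms=30):
    """Matrix exponential of M via a truncated Taylor series."""
    out = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for i in range(1, terms):
        term = term @ M / i
        out = out + term
    return out

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
for T in (0.1, 0.01, 0.001):
    err = np.max(np.abs(expm_taylor(A * T) - (np.eye(2) + A * T)))
    print(T, err)    # the error shrinks roughly like T**2
```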

($T$ being the sampling period). For a small sampling period, one may approximate $A_t \approx I + AT$. There are also practical situations in which the sampling period does not have to be small, e.g., if the rows of the state matrix corresponding to the outputs are equal to the associated rows of the output coupling matrix $C$, so that the product $C_{t+1}A_t$ can be formed without explicit knowledge of $A_t$. When the state-coupling contribution can be neglected, the learning gain matrix is reduced to

$K_k(t) = P_k(t)\,(C_{t+1}B_t)^T\,[\,C_{t+1}B_t\,P_k(t)\,(C_{t+1}B_t)^T + C_{t+1}Q_tC_{t+1}^T + R_{t+1}\,]^{-1}.$ (24)

Note that, by using the learning gain of (24) [instead of (13)], all the results in Sections IV and V still apply as presented. Consequently, the computer algorithm does not require the update of the state error covariance matrix; hence, knowledge of the state matrix is no longer needed. In the case where this condition is not met and plant knowledge is absent, a parameter identification scheme, similar to the one presented in [34], may be integrated with the proposed algorithm.

APPENDIX B
APPLICATION TO NONLINEAR SYSTEMS

A. Application to a Class of Nonlinear Systems

Consider the state-space description for a class of nonlinear discrete-time varying systems given by the following difference equation:

$x_k(t+1) = f(x_k(t), t) + B_tu_k(t) + w_k(t)$
$y_k(t) = C_tx_k(t) + v_k(t)$ (25)

where $f(\cdot,\cdot)$ is a vector-valued function with range in $\mathbb{R}^n$. We assume that the initial state $x_d(0)$ and a desired trajectory $y_d(t)$ are given and assumed to be realizable; that is, there exist bounded $x_d(t)$ and $u_d(t)$ such that $x_d(t+1) = f(x_d(t), t) + B_tu_d(t)$ and $y_d(t) = C_tx_d(t)$. The restrictions on the noise inputs $w_k(t)$ and $v_k(t)$ and on the initial errors are as assumed in Section II. Again, we define the state error vector $\delta x_k(t) = x_d(t) - x_k(t)$ and the input error vector


$\delta u_k(t) = u_d(t) - u_k(t)$. Using (25) and the learning algorithm presented in (2), we obtain

$\delta u_{k+1}(t) = [\,I - K_k(t)C_{t+1}B_t\,]\,\delta u_k(t) - K_k(t)C_{t+1}[\,f(x_d(t), t) - f(x_k(t), t)\,] + K_k(t)C_{t+1}w_k(t) + K_k(t)v_k(t+1).$

Consequently, the input error covariance matrix becomes

$P_{k+1}(t) = [\,I - K_k(t)C_{t+1}B_t\,]P_k(t)[\,I - K_k(t)C_{t+1}B_t\,]^T + K_k(t)\,\Theta_k^f(t)\,K_k^T(t)$ (26)

where $\Theta_k^f(t) = C_{t+1}\{\,E_k[\Delta f_k(t)\Delta f_k^T(t)] + Q_t\,\}C_{t+1}^T + R_{t+1}$ and $\Delta f_k(t) = f(x_d(t),t) - f(x_k(t),t)$. Therefore, borrowing some of the results presented in Sections IV and V, one can conclude the following.

Theorem 6: Consider the system presented in (25). If $C_{t+1}B_t$ is full-column rank and $\Theta_k^f(t)$ is positive definite, then the learning algorithm, presented by (2), (26), and (16), guarantees that $P_{k+1}(t) \le P_k(t)$; that is, the input error covariance matrix is nonincreasing. In addition, if $\Theta_k^f(t)$ is also bounded in $k$, then $K_k(t) \to 0$ and $P_k(t) \to 0$ uniformly in $t$ as $k \to \infty$. A Lipschitz requirement on $f$ is not necessary for the first conclusion. If the function $f(x, t)$ is uniformly globally Lipschitz in $x$ and for $t \in [0, T]$, then there exists a positive constant $c_f$ such that $\|f(x_d(t), t) - f(x_k(t), t)\| \le c_f\|\delta x_k(t)\|$.

Theorem 7: Consider the system presented in (25). If $C_{t+1}B_t$ is full-column rank and $f(x, t)$ is uniformly globally Lipschitz in $x$ and for $t \in [0, T]$, then the learning algorithm, presented by (2), (26), and (16), guarantees that $\lim_{k\to\infty}P_k(t) = 0$. In addition, $K_k(t) \to 0$ and $P_k(t) \to 0$ uniformly in $t$ as $k \to \infty$.

Proof: The proof is the same as the corresponding part of the proof of Theorem 3 except for the nonsingularity of $\Theta_k^f(t)$. Since $f$ is Lipschitz, there exists a positive constant $c_f$ such that $\|\Delta f_k(t)\| \le c_f\|\delta x_k(t)\|$. Borrowing the results given in [14, Th. 1], the infinite norms of the state errors are bounded for all $k$; consequently, the associated covariances are bounded. Therefore, $\Theta_k^f(t)$ is nonsingular.

B. Nonlinear Deterministic Disturbance

In this section, a nonlinear repetitive state disturbance is added to the linear system as follows:

$x_k(t+1) = A_tx_k(t) + B_tu_k(t) + g(x_k(t), t) + w_k(t)$ (27)

where $g(x_k(t), t)$ may represent the repetitive state disturbance. It is assumed that the function $g(x, t)$ is uniformly globally Lipschitz in $x$ and for $t \in [0, T]$. Let $h(x_k(t), t) = A_tx_k(t) + g(x_k(t), t)$; then $h$ is also uniformly globally Lipschitz in $x$ and for $t \in [0, T]$. The error equation corresponding to (27) is given by

$\delta u_{k+1}(t) = [\,I - K_k(t)C_{t+1}B_t\,]\delta u_k(t) - K_k(t)C_{t+1}[\,h(x_d(t), t) - h(x_k(t), t)\,] + K_k(t)C_{t+1}w_k(t) + K_k(t)v_k(t+1)$ (28)

where, since $h(x_k(t), t)$ is a function of $x_k(t)$, the correlation arguments of Section III carry over. Borrowing the results from Section III, we find that the learning gain that minimizes the trace of $P_{k+1}(t)$ is similar to the one presented by (13). In particular, substituting $h$ in place of $f$ in the definition of the matrix $\Theta_k^f(t)$ of (26), the following results are concluded.

Theorem 8: If $C_{t+1}B_t$ is a full-column rank matrix, then the learning algorithm, presented by (2), (26), and (16), will generate a sequence of inputs such that the infinite norms of the input, output, and state errors given by (28) decrease exponentially in the iteration domain.

Proof: By Lemma 2, the eigenvalues of $I - K_k(t)C_{t+1}B_t$ are positive and strictly less than one; in particular, $\|I - K_k(t)C_{t+1}B_t\| < 1$ for all $k$ and $t$. Borrowing the results of [14, Corollary 2], the infinite norms of the input, output, and state errors given by (28) decrease exponentially in the iteration domain. The results related to the input error covariance matrix can be concluded from the previous subsection.

REFERENCES

[1] Y.-H. Kim and I.-J. Ha, "A learning approach to precision speed control of servomotors and its application to a VCR," IEEE Trans. Contr. Syst. Technol., vol. 7, pp. 466–477, July 1999.
[2] M. Pandit and K.-H. Buchheit, "Optimizing iterative learning control of cyclic production process with application to extruders," IEEE Trans. Contr. Syst. Technol., vol. 7, pp. 382–390, May 1999.
[3] S. S. Garimella and K. Srinivasan, "Application of iterative learning control to coil-to-coil control in rolling," IEEE Trans. Contr. Syst. Technol., vol. 6, pp. 281–293, Mar. 1998.
[4] S. Arimoto, S. Kawamura, and F. Miyazaki, "Bettering operation of robots by learning," J. Robot. Syst., vol. 1, pp. 123–140, 1984.
[5] S. Arimoto, "Learning control theory for robotic motion," Int. J. Adaptive Control Signal Processing, vol. 4, pp. 544–564, 1990.
[6] S. Arimoto, T. Naniwa, and H. Suziki, "Robustness of P-type learning control theory with a forgetting factor for robotic motions," in Proc. 29th IEEE Conf. Decision Control, Honolulu, HI, Dec. 1990, pp. 2640–2645.
[7] S. Arimoto, S. Kawamura, and F. Miyazaki, "Convergence, stability, and robustness of learning control schemes for robot manipulators," in Int. Symp. Robot Manipulators: Modeling, Control, Education, Albuquerque, NM, 1986, pp. 307–316.
[8] G. Heinzinger, D. Fenwick, B. Paden, and F. Miyaziki, "Stability of learning control with disturbances and uncertain initial conditions," IEEE Trans. Automat. Contr., vol. 37, pp. 110–114, Jan. 1992.
[9] J. Hauser, "Learning control for a class of nonlinear systems," in Proc. 26th Conf. Decision Control, Los Angeles, CA, Dec. 1987, pp. 859–860.
[10] A. Hac, "Learning control in the presence of measurement noise," in Proc. American Control Conf., 1990, pp. 2846–2851.


[11] S. S. Saab, "A discrete-time learning control algorithm for a class of linear time-invariant systems," IEEE Trans. Automat. Contr., vol. 40, pp. 1138–1142, June 1995.
[12] S. S. Saab, "On the P-type learning control," IEEE Trans. Automat. Contr., vol. 39, pp. 2298–2302, Nov. 1994.
[13] S. S. Saab, W. Vogt, and M. Mickle, "Learning control algorithms for tracking slowly varying trajectories," IEEE Trans. Syst., Man, Cybern., vol. 27, pp. 657–670, Aug. 1997.
[14] S. S. Saab, "Robustness and convergence rate of a discrete-time learning control algorithm for a class of nonlinear systems," Int. J. Robust Nonlinear Control, vol. 9, no. 9, July 1999.
[15] C.-J. Chien, "A discrete iterative learning control for a class of nonlinear time-varying systems," IEEE Trans. Automat. Contr., vol. 43, pp. 748–752, May 1998.
[16] Z. Bien and K. M. Huh, "Higher-order iterative learning control algorithm," in Proc. Inst. Elec. Eng., vol. 136, 1989, pp. 105–112.
[17] Y. Chen, C. Wen, Z. Gong, and M. Sun, "An iterative learning controller with initial state learning," IEEE Trans. Automat. Contr., vol. 44, pp. 371–376, Feb. 1999.
[18] Z. Geng, R. Carroll, and J. Xie, "Two-dimensional model and algorithm analysis for a class of iterative learning control systems," Int. J. Control, vol. 52, pp. 833–862, 1990.
[19] Z. Geng, M. Jamshidi, and R. Carroll, "An adaptive learning control approach," in Proc. 30th Conf. Decision Control, Dec. 1991, pp. 1221–1222.
[20] J. Kurek and M. Zaremba, "Iterative learning control synthesis based on 2-D system theory," IEEE Trans. Automat. Contr., vol. 38, pp. 121–125, Jan. 1993.
[21] T. Y. Kuc, J. S. Lee, and K. Nam, "An iterative learning control theory for a class of nonlinear dynamic systems," Automatica, vol. 28, no. 6, pp. 1215–1221, 1992.
[22] K. L. Moore, "Iterative learning control for deterministic systems," in Advances in Industrial Control. London, U.K.: Springer-Verlag, 1993.
[23] P. Bondi, G. Casalino, and L. Gambardella, "On the iterative learning control schemes for robot manipulators," IEEE J. Robot. Automat., vol. 4, pp. 14–22, Aug. 1988.
[24] D. Gorinevsky, "An approach to parametric nonlinear least square optimization and application to task-level learning control," IEEE Trans. Automat. Contr., vol. 42, pp. 912–927, July 1997.
[25] R. Longman and S. L. Wirkander, "Automated tuning concepts for iterative learning and repetitive control laws," in Proc. 37th Conf. Decision Control, Dec. 1998, pp. 192–198.
[26] J. Craig, "Adaptive control of manipulators through repeated trials," in Proc. Amer. Control Conf., San Diego, CA, June 1984, pp. 1566–1573.
[27] C. Atkeson and J. McInntyre, "Robot trajectory learning through practice," in Proc. IEEE Int. Conf. Robotics Automation, San Francisco, CA, 1986, pp. 1737–1742.
[28] P. Lucibello, "Output zeroing with internal stability by learning," Automatica, vol. 31, no. 11, pp. 1665–1672, 1995.
[29] P. Lucibello, "On the role of high-gain feedback in P-type learning control of robot arms," IEEE Trans. Syst., Man, Cybern., vol. 12, pp. 602–605, Aug. 1996.


[30] A. De Luca and S. Panzieri, "End-effector regulation of robots with elastic elements by an iterative scheme," Int. J. Adapt. Control Signal Processing, vol. 10, no. 4–5, pp. 379–393, 1996.
[31] R. Horowitz, W. Messner, and J. B. Moore, "Exponential convergence of a learning controller for robot manipulators," IEEE Trans. Automat. Contr., vol. 36, pp. 890–894, July 1991.
[32] N. Amann, D. Owens, E. Rogers, and A. Wahl, "An H∞ approach to linear iterative learning control design," Int. J. Adaptive Control Signal Processing, vol. 10, no. 6, pp. 767–781, 1996.
[33] S. Hara, Y. Yamamoto, T. Omata, and M. Nakano, "Repetitive control systems: A new type servo system for periodic exogenous signals," IEEE Trans. Automat. Contr., vol. 33, pp. 659–667, 1988.
[34] S. R. Oh, Z. Bien, and I. H. Suh, "An iterative learning control method with application for the robot manipulators," IEEE J. Robot. Automat., vol. 4, pp. 508–514, Oct. 1988.
[35] M. Togai and O. Yamano, "Analysis and design of an optimal learning control scheme for industrial robots: A discrete system approach," in Proc. 24th Conf. Decision Control, Ft. Lauderdale, FL, Dec. 1985, pp. 1399–1404.
[36] N. Sadegh and K. Guglielmo, "Design and implementation of adaptive and repetitive controllers for mechanical manipulators," IEEE Trans. Robot. Automat., vol. 8, pp. 395–400, June 1992.
[37] K. Furuta and M. Yamakita, "The design of a learning control system for multivariable systems," in Proc. IEEE Int. Symp. Intelligent Control, Philadelphia, PA, Jan. 1987, pp. 371–376.
[38] K. L. Moore, M. Dahleh, and S. P. Bhattacharyya, "Adaptive gain adjustment for a learning control method for robotics," in Proc. 1990 IEEE Int. Conf. Robotics Automation, Cincinnati, OH, May 1990, pp. 2095–2099.
[39] R. Roesser, "A discrete state-space model for linear image processing," IEEE Trans. Automat. Contr., vol. AC-21, pp. 1–10, Jan. 1975.
[40] F. L. Lewis, Applied Optimal Control and Estimation: Digital Design and Implementation. Upper Saddle River, NJ: Prentice-Hall, 1992.


Samer S. Saab (S'92–M'93–SM'98) received the B.S., M.S., and Ph.D. degrees in electrical engineering in 1988, 1989, and 1992, respectively, and the M.A. degree in applied mathematics in 1990, all from the University of Pittsburgh, Pittsburgh, PA.

He is an Assistant Professor of Electrical and Computer Engineering at the Lebanese American University, Byblos, Lebanon. Since 1993, he has been a Consulting Engineer at Union Switch and Signal, Pittsburgh, PA. From 1995 to 1996, he was a System Engineer at ABB Daimler-Benz Transportation, Inc., Pittsburgh, PA, where he was involved in the design of automatic train control and positioning systems. His research interests include iterative learning control, Kalman filtering, inertial navigation systems, nonlinear control, and the development of map matching algorithms.