A Recursive Dual Minimum Algorithm

Qi Zhu¹ and Shaohua Tan²
¹ Department of Computer Science, University of Houston - Victoria, Victoria, Texas, US 77901
² Department of Electrical Engineering, National University of Singapore, Singapore 119260

Abstract— The various RDM learning algorithms are developed by choosing different Λ(θ) for LIP models, including Projection, Recursive Dual Minimum, Recursive Dual Mean Minimum, λ-weighted Dual Minimum, Instantaneous RDM, and Batch RDM.

J_t = (1/2) θ_{t-1}^T P_t θ_{t-1} − θ_{t-1}^T Q_t + (1/2) R_t    (11)

Furthermore, we have the general form of the RDM learning algorithm:

θ_t = θ_{t-1} + α J_t (Q_t − P_t θ_{t-1}) / (β + (Q_t − P_t θ_{t-1})^T (Q_t − P_t θ_{t-1}))    (12)

where β > 0, 0 < α < 4, t = 1, 2, ..., provided that

tr(Λ_t) = Σ_{i=1}^{ρ(t)} λ_i(t) ≤ M_λ    (13)

holds for all t, where λ_i(t) is the ith diagonal entry of Λ_t and M_λ is a positive constant.

Theorem 1. For any given initial value θ_0, the vector sequence θ_t generated by (12) has the following properties:
(i) ‖θ_t − θ‖ ≤ ‖θ_0 − θ‖    (14)
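To make the general form concrete, here is a minimal NumPy sketch of one iteration of (12) with the cost (11). It assumes P_t, Q_t and R_t have already been formed from the data as in (8)-(10); the function name and the default step parameters are our illustrative choices, not from the paper:

import numpy as np

def rdm_step(theta, P, Q, R, alpha=1.0, beta=1.0):
    """One update of the general RDM form (12), using the cost J_t from (11)."""
    g = Q - P @ theta                                   # Q_t - P_t theta_{t-1}
    J = 0.5 * theta @ P @ theta - theta @ Q + 0.5 * R   # cost (11)
    return theta + alpha * J * g / (beta + g @ g)       # update (12)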
3. Various Specialized RDM Learning Algorithms

From (4)-(12), we can see that Λ_t is computed recursively. We can also obtain specialized RDM learning algorithms through various choices of Λ_t.

3.1 Projection Learning Algorithm

Choose ρ(t) = t and Λ_t = diag(0_{t-1}, 1), where 0_{t-1} is the (t−1)th-order square zero matrix. In this case, the RDM learning algorithm in (12) reduces to the 1st-order learning algorithm

θ_t = θ_{t-1} + α e_t^2 ϕ_t e_t / (2(β + e_t^2 ϕ_t^T ϕ_t))    (19)

where β > 0, 0 < α/2 < 2, and e_t = y_t − ϕ_t^T θ_{t-1}. Deleting e_t^2 from both the numerator and the denominator of (19) (β is an arbitrary positive number), we obtain the Projection learning algorithm [1], [6] of adaptive control:

θ_t = θ_{t-1} + α ϕ_t (y_t − ϕ_t^T θ_{t-1}) / (β + ϕ_t^T ϕ_t)    (20)

where t = 1, 2, ⋯, β > 0, and 0 < α < 2 (the α in (20) absorbs the factor 1/2, so it is half of the α in (12)). Thus the Projection learning algorithm is a special case of the RDM algorithm obtained by a specific choice of parameters. The Projection learning algorithm minimizes the cost function J_t = (1/2) e_t^2.
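For illustration, a minimal NumPy sketch of one Projection update (20); the function name and the default values of α and β are our own choices:

import numpy as np

def projection_step(theta, phi, y, alpha=1.0, beta=1.0):
    """One Projection update (20); requires beta > 0 and 0 < alpha < 2."""
    e = y - phi @ theta                     # prediction error e_t
    return theta + alpha * phi * e / (beta + phi @ phi)

# Example: one step toward fitting y = 2x with regressor phi = [x].
theta = projection_step(np.array([0.0]), np.array([1.5]), 3.0)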
3.2 Recursive Dual Minimum Learning Algorithm

The conventional recursive least squares (RLS) algorithm is a powerful learning algorithm in adaptive control. However, it requires that Φ be of full rank, its convergence is very slow, and its computation is quite intensive. In this subsection, we propose the Recursive Dual Minimum learning algorithm, which is free of the full-rank condition and requires much less computation. Choosing ρ(t) = t and Λ_t = I_t, where I_t is the tth-order identity matrix, the equations for θ_t and J_t become

θ_t = θ_{t-1} + α J_t T(Q_t − P_t θ_{t-1}) / (β + (Q_t − P_t θ_{t-1})^T T (Q_t − P_t θ_{t-1}))    (21)

J_t = (1/2) θ_{t-1}^T P_t θ_{t-1} − θ_{t-1}^T Q_t + (1/2) R_t    (22)

where β > 0, 0 < α < 4, t = 1, 2, ..., and T is chosen to be a symmetric positive definite matrix.
P_t, Q_t and R_t defined in (8)-(10) have the recursive computation formulas

P_t = P_{t-1} + ϕ_t ϕ_t^T,   with P_0 = 0
Q_t = Q_{t-1} + y_t ϕ_t,     with Q_0 = 0    (23)
R_t = R_{t-1} + y_t^2,       with R_0 = 0
This new Recursive Dual Minimum learning algorithm does not need to assume that Φ is of full rank, and it minimizes the cost function J_t = (1/2) Σ_{i=1}^{t} e_i^2, which is the same as the cost function of the conventional recursive least squares [11]. Note that the matrix Λ_t = I_t does not satisfy the condition in (13), so we must choose a small symmetric positive definite matrix T to avoid the burst phenomenon. This technique makes the Recursive Dual Minimum algorithm effective in the practice of identification and control.
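A minimal NumPy sketch of one Recursive Dual Minimum step, combining the recursions (23) with the update (21); the defaults for α and β and the explicit T argument are our own illustration:

import numpy as np

def rdm_minimum_step(theta, P, Q, R, phi, y, T, alpha=1.0, beta=1.0):
    """One Recursive Dual Minimum step: recursions (23) plus update (21).
    T is a small symmetric positive definite matrix guarding against bursts."""
    P = P + np.outer(phi, phi)          # (23): P_t = P_{t-1} + phi_t phi_t^T
    Q = Q + y * phi                     # (23): Q_t = Q_{t-1} + y_t phi_t
    R = R + y * y                       # (23): R_t = R_{t-1} + y_t^2
    g = Q - P @ theta
    J = 0.5 * theta @ P @ theta - theta @ Q + 0.5 * R          # cost (22)
    theta = theta + alpha * J * (T @ g) / (beta + g @ T @ g)   # update (21)
    return theta, P, Q, R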
3.3 Recursive Dual Mean Minimum Learning Algorithm

In the preceding subsection, we discussed the Recursive Dual Minimum learning algorithm, which is a powerful learning algorithm in identification. However, since the trace of Λ_t = I_t is not uniformly bounded, P_t, Q_t and R_t may become very large as t increases. In this subsection, we propose the Recursive Dual Mean Minimum learning algorithm. Choosing ρ(t) = t and Λ_t = (1/t) I_t, the Recursive Dual Mean Minimum learning algorithm is

θ_t = θ_{t-1} + α J_t (Q_t − P_t θ_{t-1}) / (β + (Q_t − P_t θ_{t-1})^T (Q_t − P_t θ_{t-1}))    (24)

J_t = (1/2) θ_{t-1}^T P_t θ_{t-1} − θ_{t-1}^T Q_t + (1/2) R_t    (25)

where β > 0, 0 < α < 4, t = 1, 2, .... P_t, Q_t and R_t defined in (8)-(10) have the recursive computation formulas

P_t = ((t−1)/t) P_{t-1} + (1/t) ϕ_t ϕ_t^T,   with P_0 = 0
Q_t = ((t−1)/t) Q_{t-1} + (1/t) y_t ϕ_t,     with Q_0 = 0    (26)
R_t = ((t−1)/t) R_{t-1} + (1/t) y_t^2,       with R_0 = 0

The Recursive Dual Mean Minimum learning algorithm in (24)-(26) minimizes the cost function of the mean of squared errors, J_t = (1/(2t)) Σ_{i=1}^{t} e_i^2 → 0.
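In the same illustrative style, one Dual Mean Minimum step combines the averaged recursions (26) with the update (24); parameter defaults are assumed values:

import numpy as np

def rdm_mean_step(theta, P, Q, R, phi, y, t, alpha=1.0, beta=1.0):
    """One Recursive Dual Mean Minimum step (24)-(26); t = 1, 2, ..."""
    w = (t - 1) / t                     # running-mean weight from (26)
    P = w * P + np.outer(phi, phi) / t
    Q = w * Q + y * phi / t
    R = w * R + y * y / t
    g = Q - P @ theta
    J = 0.5 * theta @ P @ theta - theta @ Q + 0.5 * R   # cost (25)
    theta = theta + alpha * J * g / (beta + g @ g)      # update (24)
    return theta, P, Q, R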
3.4 Recursive λ-weighted Dual Minimum Learning Algorithm

In adaptive control [1], [8], the conventional recursive least squares algorithm with forgetting factor λ (the λ-RLS algorithm) is another powerful learning algorithm. However, it requires that Φ be of full rank and its computation is quite intensive. In this subsection, we propose a new Recursive λ-weighted Dual Minimum learning algorithm which is free of the full-rank condition on Φ and has less computation.

Choose ρ(t) = t and Λ_t to be

Λ_t = diag(λ^{t-1}, λ^{t-2}, ⋯, λ, 1)    (27)

where 0 < λ < 1. Then the Recursive λ-weighted Dual Minimum learning algorithm consists of (11) and (12), and P_t, Q_t and R_t defined in (8)-(10) have the recursive computation formulas

P_t = λ P_{t-1} + ϕ_t ϕ_t^T,   with P_0 = 0
Q_t = λ Q_{t-1} + y_t ϕ_t,     with Q_0 = 0    (28)
R_t = λ R_{t-1} + y_t^2,       with R_0 = 0

It is easy to prove that the matrix Λ_t defined in Eq. (27) satisfies the condition in (13), because

tr(Λ_t) = Σ_{i=1}^{t} λ^{t-i} = (1 − λ^t)/(1 − λ) ≤ 1/(1 − λ)    (29)

is uniformly bounded for all t. Thus, the Recursive λ-weighted Dual Minimum learning algorithm globally minimizes the cost function J_t = (1/2) Σ_{i=1}^{t} λ^{t-i} e_i^2.
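A corresponding sketch of the λ-weighted variant, where the recursions (28) apply exponential forgetting before the general update (12); the default forgetting factor is an assumed value:

import numpy as np

def rdm_lambda_step(theta, P, Q, R, phi, y, lam=0.95, alpha=1.0, beta=1.0):
    """One Recursive lambda-weighted Dual Minimum step: (28) plus update (12)."""
    P = lam * P + np.outer(phi, phi)    # (28): exponential forgetting, 0 < lam < 1
    Q = lam * Q + y * phi
    R = lam * R + y * y
    g = Q - P @ theta
    J = 0.5 * theta @ P @ theta - theta @ Q + 0.5 * R
    theta = theta + alpha * J * g / (beta + g @ g)
    return theta, P, Q, R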
3.5 Instantaneous k-order RDM Learning Algorithm

In this subsection, we derive a powerful instantaneous k-order dynamic RDM learning algorithm from Eq. (12) for LIP models by choosing a specific Λ_t. This learning algorithm updates at every step, whenever the system receives one more new data sample. Choose ρ(t) = t and

Λ_t = diag(0_{t-k}, Λ(t, k))    (30)

where 0_{t-k} is the (t−k)th-order zero matrix and Λ(t, k) is a k-order symmetric non-negative matrix satisfying

tr(Λ(t, k)) ≤ M_λ    (31)

for all t, for a positive constant M_λ. One important choice for Λ_t is Λ(t, k) = diag(λ_{t-k+1}, λ_{t-k+2}, ⋯, λ_t), which gives

Λ_t = diag(0_{t-k}, λ_{t-k+1}, λ_{t-k+2}, ⋯, λ_t)    (32)
Our input and output matrices consist of the k rows from time t−k+1 to t:

Φ_t = Φ(t, k) = [ϕ_{t-k+1}, ϕ_{t-k+2}, ⋯, ϕ_t]^T    (33)
Y_t = Y(t, k) = [y_{t-k+1}, y_{t-k+2}, ⋯, y_t]^T    (34)

When t < k, the values λ_{t-k+1} through λ_0, ϕ_{t-k+1} through ϕ_0, and y_{t-k+1} through y_0 are arbitrarily set as initial values.
P_t, Q_t and R_t defined in (8)-(10) have the recursive computation formulas

P_t = P(t, k) = Φ_t^T Λ(t, k) Φ_t = Φ^T(t, k) Λ(t, k) Φ(t, k)    (35)
Q_t = Q(t, k) = Φ_t^T Λ(t, k) Y_t = Φ^T(t, k) Λ(t, k) Y(t, k)    (36)
R_t = R(t, k) = Y_t^T Λ(t, k) Y_t = Y^T(t, k) Λ(t, k) Y(t, k)    (37)
J_t = J(t, k) = (1/2) R_t − θ_{t-1}^T Q_t + (1/2) θ_{t-1}^T P_t θ_{t-1}
    = (1/2) R(t, k) − θ_{t-1}^T Q(t, k) + (1/2) θ_{t-1}^T P(t, k) θ_{t-1}    (38)
Then, the instantaneous k-order Recursive Dual Minimum learning algorithm is

θ_t = θ_{t-1} + α J_t (Q_t − P_t θ_{t-1}) / (β + (Q_t − P_t θ_{t-1})^T (Q_t − P_t θ_{t-1}))    (39)

where β > 0, 0 < α < 4, t = 1, 2, ..., and it minimizes the cost function J_t = J(t, k) = (1/2) Σ_{i=t-k+1}^{t} λ_i e_i^2 to its global minimum.
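A minimal NumPy sketch of one instantaneous k-order step, forming (35)-(38) from the current window and applying (39); argument names and defaults are illustrative:

import numpy as np

def rdm_k_step(theta, Phi, Y, lam, alpha=1.0, beta=1.0):
    """One instantaneous k-order RDM step (35)-(39).
    Phi: (k, n) regressor rows from t-k+1 to t; Y: (k,) outputs;
    lam: (k,) diagonal of Lambda(t, k)."""
    L = np.diag(lam)
    P = Phi.T @ L @ Phi                 # (35)
    Q = Phi.T @ L @ Y                   # (36)
    R = Y @ L @ Y                       # (37)
    g = Q - P @ theta
    J = 0.5 * R - theta @ Q + 0.5 * theta @ P @ theta   # cost (38)
    return theta + alpha * J * g / (beta + g @ g)       # update (39)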
3.6 Batch k-order RDM Learning Algorithm

The instantaneous k-order RDM learning algorithm introduced in the preceding subsection updates the parameter vector at every step, based on the current input-output data and the previous k−1 samples. In contrast, the batch k-order RDM learning algorithm developed in this subsection updates the parameters once every k steps, based on the last k data samples. Choose ρ(t) = kt and
Λ_t = diag(0_{kt-k}, Λ(kt, k))    (40)

where 0_{kt-k} is the (kt−k)th-order zero matrix and Λ(kt, k) is a k-order symmetric non-negative matrix satisfying

tr(Λ(kt, k)) ≤ M_λ    (41)
where M_λ is a positive constant. When Λ(kt, k) = diag(λ_{kt-k+1}, λ_{kt-k+2}, ⋯, λ_{kt}), the matrix Λ_t becomes

Λ_t = diag(0_{kt-k}, λ_{kt-k+1}, λ_{kt-k+2}, ⋯, λ_{kt})    (42)
We further introduce the following notations:

Φ_t = Φ(kt, k) = [ϕ_{kt-k+1}, ϕ_{kt-k+2}, ⋯, ϕ_{kt}]^T    (43)
Y_t = Y(kt, k) = [y_{kt-k+1}, y_{kt-k+2}, ⋯, y_{kt}]^T    (44)
P_t = P(kt, k) = Φ_t^T Λ(kt, k) Φ_t = Φ^T(kt, k) Λ(kt, k) Φ(kt, k)    (45)
Q_t = Q(kt, k) = Φ_t^T Λ(kt, k) Y_t = Φ^T(kt, k) Λ(kt, k) Y(kt, k)    (46)
R_t = R(kt, k) = Y_t^T Λ(kt, k) Y_t = Y^T(kt, k) Λ(kt, k) Y(kt, k)    (47)
J_t = J(kt, k) = (1/2) R_t − θ_{t-1}^T Q_t + (1/2) θ_{t-1}^T P_t θ_{t-1}
    = (1/2) R(kt, k) − θ_{t-1}^T Q(kt, k) + (1/2) θ_{t-1}^T P(kt, k) θ_{t-1}    (48)

Then, the Batch k-order Recursive Dual Minimum learning algorithm is

θ_t = θ_{t-1} + α J_t (Q_t − P_t θ_{t-1}) / (β + (Q_t − P_t θ_{t-1})^T (Q_t − P_t θ_{t-1}))    (49)
where β > 0, 0 < α < 4, t = 1, 2, ..., and it minimizes the cost function J_t = J(kt, k) = (1/2) Σ_{i=kt-k+1}^{kt} λ_i e_i^2 to its global minimum.
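The batch variant uses the same arithmetic as the instantaneous sketch above but fires only once per block of k fresh samples; this illustrative driver (our own construction) makes that cadence explicit:

import numpy as np

def rdm_batch_run(theta, phis, ys, k, lam, alpha=1.0, beta=1.0):
    """Batch k-order RDM: one update (49) per block of k samples, via (43)-(48)."""
    L = np.diag(lam)
    for b in range(len(ys) // k):
        Phi = np.asarray(phis[b*k:(b+1)*k])   # Phi(kt, k), eq. (43)
        Y = np.asarray(ys[b*k:(b+1)*k])       # Y(kt, k), eq. (44)
        P, Q, R = Phi.T @ L @ Phi, Phi.T @ L @ Y, Y @ L @ Y   # (45)-(47)
        g = Q - P @ theta
        J = 0.5 * R - theta @ Q + 0.5 * theta @ P @ theta     # cost (48)
        theta = theta + alpha * J * g / (beta + g @ g)        # update (49)
    return theta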
4. Case Study

In this section, we present an example of an industrial application of the specialized RDM learning algorithms to show that they are effective.

Example. Suppose that we are analysts in the management services division of an accounting firm. One of the firm's clients is American Manufacturing Company, a major manufacturer of a wide variety of commercial and industrial products. American Manufacturing owns a large nine-building complex in Central City and heats this complex with a modern coal-fueled heating system. In the past, American Manufacturing has encountered problems in determining the proper amount of coal to order each week to heat the complex adequately. Because of this, it has asked our firm to develop an accurate way to predict the amount of fuel (in tons of coal) that will be used to heat the nine-building complex in future weeks. Experience indicates that (1) weekly fuel consumption depends substantially on the average hourly temperature (in degrees Fahrenheit) during the week, and (2) weekly fuel consumption also depends on factors other than average hourly temperature that contribute to an overall "chill factor". Some of these factors are:
1) wind velocity (in miles per hour) during the week;
2) "cloud cover" during the week;
3) variations in temperature, wind velocity, and cloud cover during the week (perhaps caused by the movement of weather fronts).
In this example we use regression analysis to predict the dependent variable, weekly fuel consumption y, on the basis of the independent variable, average hourly temperature x. Later we will use additional independent variables, which measure the effects of factors such as wind velocity and cloud cover, to help us predict weekly fuel consumption. Suppose that we have gathered data concerning y and x for the n = 8 weeks prior to the current week; these data are given in Table 1. Here the letter i denotes the time order of a previously observed week, x_i denotes the average hourly temperature, and y_i denotes the fuel consumption observed in week i. It would, of course, be better to have more than eight weeks of data; however, data availability is sometimes initially limited, and we have purposely limited the amount of data to simplify the subsequent discussion.

Table 1. Fuel consumption data of Example 1

Week i   Average hourly temperature x_i (°F)   Fuel consumption y_i (tons)
1        28.0                                  12.4
2        28.0                                  11.7
3        32.5                                  12.4
4        39.0                                  10.8
5        45.9                                   9.4
6        57.8                                   9.5
7        58.1                                   8.0
8        62.5                                   7.5
To develop a regression model describing the fuel consumption data, we first consider the fifth week in Table 1 (for the purposes of our discussion we could consider any particular week). In the fifth week the average hourly temperature was x_5 = 45.9 and the fuel consumption was y_5 = 9.4. If we were to observe another week having the same average hourly temperature of 45.9, we might well observe a fuel consumption different from 9.4. This is because factors other than average hourly temperature, such as average hourly wind velocity and average hourly thermostat setting, affect weekly fuel consumption. Therefore, although two weeks might have the same average hourly temperature x_5 = 45.9, one such week could have a lower average hourly wind velocity, and thus a smaller fuel consumption, than the other. It follows that there is an infinite population of potential weekly fuel consumptions that could be observed when the average hourly temperature is x_5 = 45.9. To generalize this discussion, consider all eight fuel consumptions in Table 1. For i = 1, 2, ..., 8 we may express y_i in the form

y_i = θ_0 + θ_1 x_i + e_i    (50)

Here, e_i is the error term that describes the effect on y_i of all factors other than the average hourly temperature x_i that have occurred in the ith week.

When we plot the eight fuel consumptions against the eight average hourly temperatures, the fuel consumptions tend to decrease in a straight-line fashion as the temperatures increase. Here θ_0 is the y-intercept of the straight line and θ_1 is its slope. To interpret the meaning of the y-intercept θ_0, assume that x_i = 0. Then

θ_0 + θ_1 x_i = θ_0 + θ_1 × 0 = θ_0

so θ_0 is the mean weekly fuel consumption for all potential weeks having an average hourly temperature of 0°F. To interpret the meaning of the slope θ_1, consider two different weeks. Suppose that for the first week the average hourly temperature is c; the mean weekly fuel consumption for all such potential weeks is θ_0 + θ_1 × c. For the second week, suppose that the average hourly temperature is c + 1; the mean weekly fuel consumption for all such potential weeks is θ_0 + θ_1 × (c + 1). The difference between these mean weekly fuel consumptions is

[θ_0 + θ_1 × (c + 1)] − [θ_0 + θ_1 × c] = θ_1

Thus the slope θ_1 is the change in mean weekly fuel consumption associated with a 1-degree increase in average hourly temperature. We refer to equation (50) as the simple linear (or straight-line) regression model relating y_i to x_i in this example, with θ_0 and θ_1 the parameters of the model. Note that y_i is assumed to be randomly selected from the infinite population of potential values of the dependent variable that could be observed when the independent variable equals x_i.

We now apply the RDM learning algorithm. Because the data set is finite, we reuse the data repeatedly in the recursive algorithm. Choosing k = 4, the initial parameter values [θ_0 θ_1] = [1 1], and the error bound 0.3, we obtain θ_0 = 15.6631 and θ_1 = −0.1243. Fig. 1 - Fig. 3 show the results of the simulations using the RDM learning algorithm. Using these values of θ_0 and θ_1, we can predict the amount of fuel needed for the nine-building complex of American Manufacturing Company in future weeks.
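As a quick sanity check of our own (not part of the paper's procedure), ordinary least squares minimizes the same cost J_t = (1/2) Σ e_i^2 that the Recursive Dual Minimum algorithm targets, so its closed-form solution on the Table 1 data should land near the reported RDM estimates; the small gap reflects the early stop at the 0.3 error bound:

import numpy as np

# Table 1 data: average hourly temperature x_i and fuel consumption y_i.
x = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])
Phi = np.column_stack([np.ones_like(x), x])     # regressors [1, x_i]
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(theta)   # approx [15.84, -0.128]; the RDM run reports [15.6631, -0.1243]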
[Fig. 1: Parameter convergence of Example 1 using the Recursive Dual Minimum learning algorithm when k = 4.]

[Fig. 2: Error convergence of Example 1 using the Recursive Dual Minimum learning algorithm; the error drops below 0.3 after 101 steps.]

[Fig. 3: Fitting curve of Example 1 using the Recursive Dual Minimum learning algorithm (k = 4).]

5. Conclusion

In this paper, we have developed the various RDM learning algorithms by choosing different Λ(θ), and obtained several useful recursive learning algorithms for LIP models, such as Projection, Recursive Dual Minimum, Recursive Dual Mean Minimum, λ-weighted Dual Minimum, Instantaneous RDM, and Batch RDM. We have shown their effectiveness with a simulation example from the adaptive identification and control fields. Compared with other training methods, the RDM learning method has several distinct features. It avoids the windup and burst phenomena, which are crucial drawbacks of the corresponding conventional learning algorithms. Moreover, the k-order RDM trains faster than the conventional algorithms when k is chosen appropriately, and the various new RDM algorithms can be successfully applied in adaptive controller design.
References

[1] K. J. Astrom and B. Wittenmark, Adaptive Control. Addison-Wesley, Mass., 1989.
[2] A. Bjorck, Numerical Methods for Least Squares Problems. Society for Industrial and Applied Mathematics (SIAM), 1996.
[3] S. Chu and S. Lo, "Application of the On-line Recursive Least-squares Method to Perform Structural Damage Assessment," Structural Control and Health Monitoring, doi: 10.1002/stc.362, 2009.
[4] A. A. Giordano and F. M. Hsu, Least Square Estimation with Applications to Digital Signal Processing. John Wiley & Sons, 1985.
[5] H. K. Fathy, K. Dongsoo, and J. L. Stein, "Online Vehicle Mass Estimation Using Recursive Least Squares and Supervisory Data Extraction," American Control Conference, pp. 1842-1848, 2008.
[6] G. C. Goodwin and K. S. Sin, Adaptive Filtering Prediction and Control. Prentice-Hall, Englewood Cliffs, N.J., 1984.
[7] A. Maddi, A. Guessoum, and D. Berkani, "Application of the Recursive Extended Least Squares Method for Modelling a Speech Signal," Proc. of the 2nd ISCCP, 2006.
[8] K. S. Narendra and A. M. Annaswamy, Stable Adaptive Systems. Prentice-Hall, 1989.
[9] Y. Nievergelt, "A Tutorial History of Least Squares with Applications to Astronomy and Geodesy," Journal of Computational and Applied Mathematics, 121(1), pp. 37-72, 2000.
[10] Q. Zhu, S. Tan, and Y. Qiao, "A High-order Recursive Quadratic Algorithm of Linear-in-the-Parameter Models," Journal of Circuits, Systems, and Computers, Vol. 17, No. 2, pp. 1-28, 2008.
[11] Q. Zhu, Y. Qiao, and S. Tan, "A Robust High-order Mixed L2-L∞ Estimation for Linear-in-the-Parameters Models," Journal of Scientific Computing, Vol. 38, No. 2, pp. 185-206, 2009.