Applied Economics Letters, 2007, 14, 529–531
An iterative method for flattening the ridge in the Ridge regression

Miyoung Lee
Department of MIS, School of Business, Konkuk University, Seoul, Korea
E-mail: [email protected]
In many cases of multiple regression in an ill-conditioned system, some independent variables may not be orthogonal to each other. In such cases, the system either is not solvable or induces incorrect results that vary with the data used. These problems are often overcome by using the Ridge regression method. This article proposes an alternative way of obtaining an exact least squares estimator by using an iterative method. We prove the solvability of the proposed algorithm and demonstrate that our method outperforms the traditional approach.
I. Introduction
There has been a surge in studies on Ridge regression since the seminal work of Hoerl and Kennard (1968). Hoerl and Kennard (1970) point out that the ordinary least squares estimator based on data exhibiting near-extreme multicollinearity is subject to a number of errors, and that this problem can yield one or a few small eigenvalues in the correlation matrix. In fact, the difficulty of obtaining analytical results for this kind of estimator is well documented in the extant literature. Riley (1955) explores such a linear system with a symmetric but almost singular matrix. James and Stein (1961) adopt a method of permitting a small amount of error to overcome these kinds of difficulties. Hoerl (1964) also explores the potential drawbacks of the Ridge method. In a standard regression model of multiple linear form, $Y = X\beta + \varepsilon$, where $Y$ is $k \times 1$, $X$ is $k \times n$ with $\mathrm{rank}(X) = n$, $\beta$ is $n \times 1$ and the random vector $\varepsilon$ is normal with $E(\varepsilon) = 0$, the Ridge regression estimator evaluates $\beta^* = (X'X + kI)^{-1} X'Y$, using the slightly biased matrix $X'X + kI$ for a positive value $k$, instead of using $\beta = (X'X)^{-1} X'Y$ from the unbiased original linear system. The Ridge regression method has also been applied in other regression studies, as in Seo (1999) for dynamic plots for a higher-order regression model. However, this Ridge regression estimator is still subject to some error due to the biased model of the matrix.
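To make the bias referred to above explicit, a short derivation (ours, not the article's, though it is standard) shows that the Ridge estimator deviates from $\beta$ in expectation whenever $k > 0$:

```latex
% Expected value of the Ridge estimator \beta^* = (X'X + kI)^{-1} X'Y
% under the model Y = X\beta + \varepsilon with E(\varepsilon) = 0.
\begin{align*}
E(\beta^*) &= (X'X + kI)^{-1} X' E(Y) \\
           &= (X'X + kI)^{-1} X'X \beta \neq \beta \quad \text{for } k > 0,
\end{align*}
% so the bias is E(\beta^*) - \beta = -k (X'X + kI)^{-1} \beta.
```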
By eliminating the aforementioned errors prevailing in the Ridge regression estimator, this article suggests an alternative way of achieving exact least squares solutions for the system $Y = X\beta + \varepsilon$, even for an ill-conditioned correlation matrix system $X'X$. The remainder of the article proceeds as follows. In Section II, we derive an iterative method of solving for an exact least squares estimator. Section III explores a numerical example to highlight the difference between our method and the simple Ridge regression model. Concluding remarks are made in Section IV.
II. An Iterative Method

Consider a standard regression model of multiple linear form. The ordinary least squares estimator $\beta$, which minimizes the sum of squared residuals, is equivalent to the maximum likelihood estimator:

$$\beta = (X'X)^{-1} X'Y \qquad (1)$$
We suggest a computational algorithm to obtain an unbiased solution for $Y = X\beta + \varepsilon$, as denoted in Equation 1, when $X'X$ is almost singular. At the first step of the iteration, we use a Ridge regression estimator $B_0$ as an initial value:

$$B_0 = (X'X + kI)^{-1} X'Y \qquad (2)$$

Then, at the $m$th step of the iteration, derive a solution $B_m$ by the recursive relation

$$B_m = (X'X + kI)^{-1} (k B_{m-1} + X'Y) \qquad (3)$$
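As an illustration, here is a minimal sketch of this iteration in Python with NumPy (the article's implementation was in MATLAB 5.3; the function name, tolerance and iteration cap below are our own assumptions, not from the article):

```python
import numpy as np

def ridge_iteration(X, Y, k=1.0, tol=1e-12, max_iter=10_000):
    """Iterate B_m = (X'X + kI)^{-1} (k B_{m-1} + X'Y), starting from
    the Ridge estimator B_0 = (X'X + kI)^{-1} X'Y (Equations 2 and 3).
    tol and max_iter are illustrative choices, not from the article."""
    n = X.shape[1]
    XtX = X.T @ X
    XtY = X.T @ Y
    M = XtX + k * np.eye(n)          # M = X'X + kI is nonsingular for any k > 0
    B = np.linalg.solve(M, XtY)      # B_0: the ordinary Ridge estimator
    for _ in range(max_iter):
        B_next = np.linalg.solve(M, k * B + XtY)   # Equation 3
        if np.linalg.norm(B_next - B) < tol:       # stop once the iterates settle
            return B_next
        B = B_next
    return B
```

By Theorems 1 and 2 below, these iterates converge to the exact least squares estimator $(X'X)^{-1} X'Y$ for any $k > 0$.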
As shown in Golub and Van Loan (1989, Theorem 10.1.1, p. 508), the spectral radius of any $n \times n$ matrix $G$ is defined as $\rho(G) = \max\{|\lambda| : \lambda \in \lambda(G)\}$, where $\lambda(G)$ stands for the set of eigenvalues of the matrix $G$.

Theorem 1: Suppose that $C$ is $n \times 1$ and $A = M - N$ is an $n \times n$ nonsingular matrix. If $M$ is nonsingular and the spectral radius of $M^{-1}N$ satisfies the inequality $\rho(M^{-1}N) < 1$, then the iterates $B_m$ defined by $M B_m = N B_{m-1} + C$ converge to $B = A^{-1} C$ for any starting $B_0$, which is $n \times 1$.

Proof: See Golub and Van Loan (1989, Theorem 10.1.1, p. 508). □

Theorem 2: For a $k \times 1$ matrix $Y$ and a $k \times n$ matrix $X$ with $\mathrm{rank}(X) = n$, the iterative method specified in Equation 3, with the initial guess given in Equation 2, converges to the solution of the system of equations $(X'X)B = X'Y$, and thus enables us to obtain the unique least squares estimator $\beta = (X'X)^{-1} X'Y$ for $Y = X\beta + \varepsilon$, where $\varepsilon$ is a $k \times 1$ random vector.
Proof: For any $k > 0$, the $m$th step of the iteration in Equation 3 can be written as

$$(X'X + kI) B_m = k B_{m-1} + X'Y \qquad (4)$$

Then denote $M = X'X + kI$ and $N = kI$, and arrange the eigenvalues of the matrix $X'X$ as follows:

$$\lambda_{\max} = \lambda_n \geq \lambda_{n-1} \geq \cdots \geq \lambda_2 \geq \lambda_1 = \lambda_{\min} > 0 \qquad (5)$$

Since $X$ has full column rank, the eigenvalues of the correlation matrix $X'X$ can be assumed positive, using the definition of positive definiteness as in Strang (1976, Definition 6B, p. 331). Strang (1988, p. 258) then shows that the eigenvalues of $M^{-1}N$ are

$$\frac{k}{\lambda_i + k}, \quad i = 1, 2, \ldots, n \qquad (6)$$

From Equations 5 and 6, the spectral radius of $M^{-1}N$ is $k/(\lambda_{\min} + k) < 1$ for any $k > 0$. Therefore, by Theorem 1, the iterates defined by Equation 4 converge to $B = (X'X)^{-1} X'Y$ for any $k \times 1$ matrix $Y$. □
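A one-line sanity check (ours, not the article's) confirms that the limit of the iteration is indeed the exact least squares solution: any fixed point $B$ of Equation 3 satisfies

```latex
% If B = (X'X + kI)^{-1}(kB + X'Y), multiply both sides by (X'X + kI):
\begin{align*}
(X'X + kI)B &= kB + X'Y \\
(X'X)B      &= X'Y,
\end{align*}
% which is exactly the normal-equation system of Theorem 2,
% independent of the ridge parameter k.
```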
III. Numerical Illustration
Our proposed iteration method is coded in MATLAB 5.3. As an example, the random variables $x_1$, $x_2$ and $x_3$ are used, with 50 observations generated according to the model

$$y = x_1 - 2x_2 + x_3 + N(0, 0.1^2)$$

The corresponding correlation matrix system $X'X$ has eigenvalues $\lambda_3 = 12.0$, $\lambda_2 = 10$ and $\lambda_1 > 0$ (but close to zero). Figures 1 and 2 illustrate the regression estimator of the Ridge regression and of the proposed iteration method, respectively. Figure 1 evidently demonstrates that the Ridge regression estimate suffers from bias, which occurs due to the bias in the correlation matrix of the regressors.

[Fig. 1. The Ridge regression estimator. Plot of the values of the functions at each data index (0 to 50): the data given, the Ridge regression result and the exact least squares estimator.]

[Fig. 2. The estimator from the iteration method. Plot of the values of the functions at each data index (0 to 50): the data given, the iteration result and the exact least squares estimator.]
In Fig. 2, however, the bias is shown to be eliminated completely when the iteration scheme is added.
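For readers who wish to reproduce a comparable experiment, the following is a hedged sketch (ours, not the article's MATLAB code; the collinearity construction, random seed and the choice $k = 1.0$ are our own assumptions) that generates near-collinear regressors and compares the plain Ridge estimate with the iterated one, using the `ridge_iteration` function sketched in Section II:

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 observations; x3 is made nearly collinear with x1 and x2 so that
# X'X has one eigenvalue close to zero, in the spirit of the article's example.
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
x3 = x1 + x2 + rng.normal(scale=1e-4, size=50)   # near-collinear regressor
X = np.column_stack([x1, x2, x3])
y = x1 - 2 * x2 + x3 + rng.normal(scale=0.1, size=50)

B_ridge = np.linalg.solve(X.T @ X + 1.0 * np.eye(3), X.T @ y)  # plain Ridge, k = 1
B_iter = ridge_iteration(X, y, k=1.0)                          # iterated estimator
B_ls = np.linalg.lstsq(X, y, rcond=None)[0]                    # reference least squares

print("Ridge:    ", B_ridge)
print("Iterated: ", B_iter)
print("Lstsq:    ", B_ls)
```

On such data, the iterated estimate should track the reference least squares solution, while the plain Ridge estimate remains shrunken toward zero.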
IV. Concluding Remark

This article demonstrates that adding an iteration scheme to the Ridge regression method successfully eliminates the bias inherent in the standard Ridge regression, by arriving at an exact least squares solution to the correlation matrix system instead of the biased Ridge regression estimator. In principle, the iterative method proposed in this article removes the ridge on the matrix in the Ridge regression through iteration. Put differently, the matrix that we solve with at each iterative step is nonsingular, unlike the ill-conditioned matrix $X'X$ of the exact least squares system.
References

Golub, G. H. and Van Loan, C. F. (1989) Matrix Computations, 2nd edn, The Johns Hopkins University Press, Baltimore and London.

Hoerl, A. E. (1964) Ridge analysis, Chemical Engineering Progress Symposium Series, 60, 67–77.

Hoerl, A. E. and Kennard, R. W. (1968) On regression analysis and biased estimation, Technometrics, 10, 422–3.

Hoerl, A. E. and Kennard, R. W. (1970) Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12, 55–67.

James, W. and Stein, C. M. (1961) Estimation with quadratic loss, Proceedings of the 4th Berkeley Symposium, 1, 361–79.

Riley, J. D. (1955) Solving systems of linear equations with a positive definite, symmetric, but possibly ill-conditioned matrix, Mathematics of Computation, 9, 96–101.

Seo, H. S. (1999) A dynamic plot for the specification of curvature in linear regression, Computational Statistics and Data Analysis, 30, 221–8.

Strang, G. (1988) Linear Algebra and Its Applications, 3rd edn, Harcourt Brace Jovanovich.