A Natural Decomposition of $R^2$ in Multiple Linear Regression

Anusar Farooqui*

August 22, 2016

*Indian Institute of Management, Udaipur 313001, India. Email: [email protected].
Abstract

We show how to decompose $R^2$ into components that capture the percentage of variation explained by each predictor in a multiple linear regression.
We have $n$ predictors and $K$ observations in the sample,
$$y_k, x_{1k}, \ldots, x_{nk}, \qquad \text{for } k = 1, \ldots, K, \tag{1}$$
where all variables have been standardized to have mean zero and variance one. That is, we have centered and rescaled the observations such that, for $i = 1, \ldots, n$,
$$\sum_{k=1}^{K} y_k = 0 = \sum_{k=1}^{K} x_{ik}, \tag{2}$$
$$\frac{1}{K-1} \sum_{k=1}^{K} y_k^2 = 1 = \frac{1}{K-1} \sum_{k=1}^{K} x_{ik}^2. \tag{3}$$
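In code, this standardization is a one-liner per variable. The following is a minimal sketch (our own illustration, using NumPy; the helper name `standardize` is not from the note), where `ddof=1` matches the $1/(K-1)$ convention in (3):

```python
import numpy as np

def standardize(v):
    """Center to mean zero and rescale to unit sample variance,
    as in (2)-(3); ddof=1 gives the 1/(K-1) normalization."""
    return (v - v.mean()) / v.std(ddof=1)
```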
Standardizing all variables in this manner is without loss of generality since $R^2$ is manifestly invariant to centering and rescaling of variables. Our multiple linear model is then given by
$$y_k = \beta_1 x_{1k} + \cdots + \beta_n x_{nk} + \varepsilon_k, \qquad \text{for } k = 1, \ldots, K, \tag{4}$$
since standardization ensures that the intercept in the regression (4) is identically zero. We estimate the slope coefficients and obtain the fitted values,
$$\hat{y}_k := \hat{\beta}_1 x_{1k} + \cdots + \hat{\beta}_n x_{nk}, \tag{5}$$
where $\hat{\beta}_i$ are the estimated slope coefficients for predictors $i = 1, \ldots, n$. Let $\mathrm{COV}(\cdot, \cdot)$ denote the sample covariance operator, defined for centered vectors $x$ and $y$ by
$$\mathrm{COV}(x, y) := \frac{1}{K-1} \sum_{k=1}^{K} x_k y_k. \tag{6}$$
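The operator (6) is just an inner product scaled by $1/(K-1)$; a minimal Python sketch (the function name `cov` is our own choice, not part of the note):

```python
import numpy as np

def cov(x, y):
    """Sample covariance (6) for centered vectors x and y."""
    return np.dot(x, y) / (len(x) - 1)
```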
Then,
$$R^2 := \frac{\sum_{k=1}^{K} \hat{y}_k^2}{\sum_{k=1}^{K} y_k^2} \tag{7}$$
$$= \frac{1}{K-1} \sum_{k=1}^{K} \hat{y}_k^2 \tag{8}$$
$$= \mathrm{COV}(\hat{y}, \hat{y}) \tag{9}$$
$$= \mathrm{COV}\!\left(\sum_{i=1}^{n} \hat{\beta}_i x_i, \; \hat{y}\right) \tag{10}$$
$$= \sum_{i=1}^{n} \hat{\beta}_i \, \mathrm{COV}(x_i, \hat{y}). \tag{11}$$
Here (8) uses the normalization (3), which makes the denominator in (7) equal to $K-1$; (9) holds because the fitted values are centered (each $x_i$ is); (10) substitutes the fitted values (5); and (11) follows from the bilinearity of the sample covariance.
That is, we have the decomposition,
$$R^2 = \hat{\beta}_1 \, \mathrm{COV}(x_1, \hat{y}) + \cdots + \hat{\beta}_n \, \mathrm{COV}(x_n, \hat{y}). \tag{12}$$
We can therefore define the percentage of variation explained by predictor $i$, denoted by $R_i^2$, by
$$R_i^2 := \hat{\beta}_i \, \mathrm{COV}(x_i, \hat{y}). \tag{13}$$
We have ignored statistical considerations altogether to focus entirely on the algebra, since, given the sample data and the estimated slope coefficients, $R^2$ is a determinate quantity. Nor have we mentioned the estimation technique used to obtain the slope coefficients, since the decomposition does not depend on it. Suppose we use ordinary least squares to obtain the slope coefficients. The normal equations for ordinary least squares,
$$\sum_{k=1}^{K} \sum_{j=1}^{n} x_{ik} x_{jk} \hat{\beta}_j^{\mathrm{ols}} = \sum_{k=1}^{K} x_{ik} y_k, \qquad \text{for } i = 1, \ldots, n, \tag{14}$$
imply that the least squares errors are orthogonal to the predictors,
$$\sum_{k=1}^{K} x_{ik} \hat{\varepsilon}_k^{\mathrm{ols}} = 0, \qquad \text{for } i = 1, \ldots, n. \tag{15}$$
Thus, since $\hat{y}_k^{\mathrm{ols}} = y_k - \hat{\varepsilon}_k^{\mathrm{ols}}$, we have
$$\mathrm{COV}(x_i, \hat{y}^{\mathrm{ols}}) = \mathrm{COV}(x_i, y), \tag{16}$$
and
$$R_i^2 = \hat{\beta}_i^{\mathrm{ols}} \, \mathrm{COV}(x_i, y). \tag{17}$$
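To close, here is a minimal numerical sketch (our own illustration, not part of the derivation) that simulates data, standardizes it, fits OLS, and checks the decomposition (12) and the identity (17). The sample size, coefficients, seed, and variable names are all arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 200, 3

# Simulate n predictors and a response (coefficients are arbitrary).
X = rng.normal(size=(K, n))
X[:, 1] += 0.5 * X[:, 0]  # make two predictors correlated
y = X @ np.array([0.5, -0.3, 0.8]) + rng.normal(size=K)

# Standardize every variable to mean zero and unit sample variance,
# as in (2)-(3).
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
y = (y - y.mean()) / y.std(ddof=1)

# OLS slopes; no intercept is needed since all variables are centered.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

def cov(a, b):
    """Sample covariance (6) for centered vectors."""
    return np.dot(a, b) / (K - 1)

# Per-predictor components (13), and the OLS shortcut (17).
R_i = np.array([beta_hat[i] * cov(X[:, i], y_hat) for i in range(n)])
R_i_ols = np.array([beta_hat[i] * cov(X[:, i], y) for i in range(n)])

R2 = np.sum(y_hat**2) / np.sum(y**2)  # definition (7)
assert np.isclose(R_i.sum(), R2)      # decomposition (12)
assert np.allclose(R_i, R_i_ols)      # identity (16)-(17)
```

Note that, by (12), the components sum exactly to $R^2$ even when the predictors are correlated, as they are in this simulation.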