Int. J. Agricult. Stat. Sci. Vol. 11, No. 1, pp. 151-154, 2015
ISSN : 0973-1903
ORIGINAL ARTICLE
COMPARISON OF PENALIZED AND MULTIPLE LINEAR REGRESSION FOR PREDICTION OF MILK YIELD IN CROSSBRED CATTLE

Hemant Kumar* and B. K. Hooda1

Division of Social Sciences, IIPR (ICAR), Kanpur - 208 024, India.
1Ch. Charan Singh Haryana Agriculture University, Hisar - 125 004, India.
*Author for correspondence. E-mail : [email protected]

Received December 27, 2014; Revised April 12, 2015; Accepted April 28, 2015
Abstract : Ordinary least squares (OLS) estimates of the regression parameters in general have low bias but large variance, and often have poor performance in both prediction and interpretation. The prediction accuracy can sometimes be improved either by shrinking some coefficients towards zero or by allowing a little bias to reduce the variance of the parameter estimates and predicted values. Shrinkage and penalization techniques have been proposed to improve OLS estimates. Ridge regression minimizes the residual sum of squares subject to a penalty on the L2-norm of the regression coefficients. A promising technique called the LASSO (least absolute shrinkage and selection operator), proposed by Tibshirani (1996), is a penalized least squares method imposing an L1-penalty on the regression coefficients. In order to strengthen the power of prediction, a method similar to the LASSO, the elastic net, which combines L1 and L2 penalties, was proposed by Zou and Hastie (2005). We assessed the relative performance of these three penalized models and compared them with multiple linear regression. Based on R2, RMSE, MAE, MAPE and U statistics, it was found that the elastic net technique performed best.

Key words : OLS, Penalized regression, Ridge regression, LASSO and Elastic net.
1. Introduction

The overall productivity of dairy animals depends upon their lifetime performance rather than on a single lactation performance. Lifetime milk production or yield is therefore an important economic trait. Generation interval and the expense involved in maintaining less productive animals can be reduced if animals are selected for lifetime productivity on the basis of traits expressed early in their life. Multiple linear regression models are widely used in various disciplines, including dairy science, to predict the lifetime milk production of dairy animals. There are good reasons to use linear regression models: they are simple and often provide an adequate and interpretable description of how the inputs affect the output. The linearity assumption may not be satisfied in many practical applications, however, if the data have a lot of noise or unexplained factors from the environment. Typically the predictions are made using some type of multiple regression model, with cause and effect relationships hypothesized between the independent and dependent variables. However, multiple regression models have various problems associated with them. First and foremost
is the choice of the underlying functional form of the model. If the researcher incorrectly formulates the initial model, the model will be much less likely to perform well as a predictive tool. Other problems with regression are the assumptions that must be made in order for it to be a valid prediction technique. Normality and independence of the error term and constancy of the error variance are assumptions which are often made while using regression models. Ordinary least squares (OLS) estimates of the regression parameters in general have low bias but large variance, and often have poor performance in both prediction and interpretation [Tibshirani (1996)]. The prediction accuracy can sometimes be improved either by shrinking some coefficients towards zero or by allowing a little bias to reduce the variance of the parameter estimates and predicted values. Further, with a large number of predictors, the interpretation can be improved if we use a smaller subset that exhibits the strongest effects. Shrinkage and penalization techniques have been proposed to improve OLS estimates. In particular, ridge regression [Hoerl and Kennard (1970a, 1970b)] minimizes
the residual sum of squares subject to a penalty on the L2-norm of the regression coefficients. A promising technique called the LASSO (least absolute shrinkage and selection operator), proposed by Tibshirani (1996), is a penalized least squares method imposing an L1-penalty on the regression coefficients. The LASSO is a shrinkage method which selects the variables and estimates the parameters in a regression set-up simultaneously. It minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant. Owing to the nature of the L1-penalty, the LASSO does both continuous shrinkage and automatic variable selection simultaneously. Although the LASSO has shown success in many situations, it tends to select only one variable from a group of variables having very high pair-wise correlations. In such situations, it has been empirically observed that the prediction performance of the LASSO is dominated by ridge regression [Tibshirani (1996)]. In order to strengthen the power of prediction, a method similar to the LASSO, the elastic net, which combines L1 and L2 penalties, was proposed by Zou and Hastie (2005). The elastic net generalizes the LASSO to overcome this drawback while enjoying similar optimality properties. The elastic net simultaneously does automatic variable selection and continuous shrinkage, and it can select groups of correlated variables. It is like a stretchable fishing net that retains all the "big fish". Simulation studies have indicated that the elastic net often outperforms the LASSO in terms of prediction accuracy. Keeping in mind the limitations of multiple linear regression and the better performance of penalized and shrinkage regression techniques, especially in the presence of multicollinearity, in this article we evaluate the performance of multiple regression and penalized regression techniques for prediction of milk production in cattle.

Data

Data pertaining to 158 crossbred cattle from the history-cum-pedigree sheets maintained in the Department of Animal Genetics and Breeding, Lala Lajpat Rai University of Veterinary and Animal Sciences, Hisar, Haryana for a period of 24 years (1985 to 2009) were used for the present study. Since the animals varied considerably due to their different physiology, care was taken to include data only for those cattle whose lactation was normal. Only those animals were included
in this study that had completed three lactations. The lifetime milk production, defined as the total amount of milk produced by a cow from the initiation of the first lactation till the completion of the third lactation, was used as the dependent variable. Since lifetime milk production depends on many production and reproduction traits, the following lactation traits were considered for the present study and used as independent variables (a sketch of the resulting data layout follows the list):
(i) Age at first calving
(ii) First lactation milk yield
(iii) Second lactation milk yield
(iv) First lactation length
(v) Second lactation length
(vi) First service period
(vii) Second service period
(viii) First dry period
(ix) Second dry period
(x) First calving interval
(xi) Second calving interval
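For concreteness, a minimal sketch of how such a data set might be laid out in R; every column name here is an illustrative assumption rather than a name from the original records:

```r
# One row per cow; lifetime_yield is the dependent variable and the
# eleven lactation traits are the candidate predictors.
cattle <- data.frame(
  lifetime_yield = numeric(0),  # total milk over first three lactations
  afc = numeric(0),  # (i)    age at first calving
  fly = numeric(0),  # (ii)   first lactation milk yield
  sly = numeric(0),  # (iii)  second lactation milk yield
  fll = numeric(0),  # (iv)   first lactation length
  sll = numeric(0),  # (v)    second lactation length
  fsp = numeric(0),  # (vi)   first service period
  ssp = numeric(0),  # (vii)  second service period
  fdp = numeric(0),  # (viii) first dry period
  sdp = numeric(0),  # (ix)   second dry period
  fci = numeric(0),  # (x)    first calving interval
  sci = numeric(0)   # (xi)   second calving interval
)
```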
2. Methodology

The multiple regression coefficients minimize the residual sum of squares

$$\hat{\beta} = \underset{\beta}{\arg\min}\ \sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j\right)^{2}.$$
The ridge regression coefficients minimize a penalized residual sum of squares
$$\hat{\beta}^{\text{ridge}} = \underset{\beta}{\arg\min}\left\{\sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j\right)^{2} + \lambda \sum_{j=1}^{p}\beta_j^{2}\right\}.$$
Here, $\lambda \ge 0$ is a complexity parameter that controls the amount of shrinkage: the larger the value of $\lambda$, the greater the amount of shrinkage, and the coefficients are shrunk towards zero. As a continuous shrinkage method, ridge regression provides better prediction performance through a bias-variance trade-off. However, ridge regression cannot produce a parsimonious model, as it always keeps all the predictors in the model. The LASSO minimizes the residual sum of squares subject to the sum of the absolute values of the coefficients being less than a constant.
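The shrinkage effect is easy to see in R with the MASS implementation of ridge regression; a small illustration on the built-in mtcars data (purely illustrative, not the study data):

```r
library(MASS)

# Ridge coefficient estimates shrink towards zero as the complexity
# parameter lambda grows.
fit <- lm.ridge(mpg ~ ., data = mtcars, lambda = c(0, 1, 10, 100))
round(coef(fit), 3)  # one row of coefficients per value of lambda
```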
The LASSO estimate is defined by

$$\hat{\beta}^{\text{lasso}} = \underset{\beta}{\arg\min}\left\{\sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j\right)^{2} + \lambda \sum_{j=1}^{p}\lvert\beta_j\rvert\right\}.$$

Here, $\lambda \ge 0$, as defined earlier, is a complexity parameter that controls the amount of shrinkage. For any non-negative values $\lambda_1$ and $\lambda_2$, the elastic net estimates are given by

$$\hat{\beta}^{\text{en}} = \underset{\beta}{\arg\min}\left\{\sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j\right)^{2} + \lambda_2 \sum_{j=1}^{p}\beta_j^{2} + \lambda_1 \sum_{j=1}^{p}\lvert\beta_j\rvert\right\}.$$

Where, $y_i$ : $i$th observation on the variable to be explained; $x_{ij}$ : $i$th observation on the $j$th independent variable; $\beta_0$ : intercept coefficient; $\beta_j$ : $j$th regression coefficient.

Table 1 : Observed and predicted milk yield of test set by different models and methods.

Observation No.   Observed      Predicted by
                  milk yield    MLR      Ridge    LASSO    EN
 1                 8345          8312     8270     8233     8220
 2                 9029          7520     7504     7484     7487
 3                 8350          8440     8429     8403     8398
 4                 6572          6869     6869     6852     6847
 5                 6077          6202     6191     6166     6166
 6                 7587          6767     6769     6808     6817
 7                 5701          5042     5118     5168     5188
 8                 8409          8526     8501     8466     8462
 9                 6635          7149     7144     7137     7148
10                 3071          3639     3644     3611     3631
11                 4807          3869     3862     3880     3909
12                 4548          5132     5134     5115     5122
13                 4730          5420     5425     5442     5445
14                 7917          7505     7450     7407     7400
15                 6095          5299     5305     5310     5335
16                12554         11958    11931    11943    11926

Table 2 : Comparative performance of models and methods over the test data.

Model\Criteria   MAE      MAPE   RMSE     U (x10^-8)   R2
MLR              546.69   9.04   659.14   1296         0.9093
Ridge            548.19   9.04   661.09   1302         0.9058
LASSO            537.53   8.83   654.78   1292         0.9105
EN               536.31   8.81   652.03   1286         0.9113
The data under study were subjected to multiple linear regression analysis using the SPSS package, ridge regression analysis using the MASS package of R, and LASSO and elastic net analysis using the glmnet package of R.
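The following R sketch mirrors that workflow for the penalized fits, with lm() standing in for the SPSS-based MLR. All object and column names, the lambda grid and the elastic net mixing weight are illustrative assumptions; note that glmnet parameterizes the elastic net by a mixing weight alpha and a single lambda rather than by separate λ1 and λ2:

```r
library(MASS)    # lm.ridge() for ridge regression
library(glmnet)  # glmnet()/cv.glmnet() for LASSO and elastic net

# `train` is the 90 per cent training subset (its construction is
# sketched in the next section).
mlr_fit <- lm(lifetime_yield ~ ., data = train)

# Ridge over a grid of complexity parameters.
ridge_fit <- lm.ridge(lifetime_yield ~ ., data = train,
                      lambda = seq(0, 100, by = 0.5))

# glmnet takes a numeric predictor matrix and a response vector.
x <- as.matrix(train[, setdiff(names(train), "lifetime_yield")])
y <- train$lifetime_yield

lasso_cv <- cv.glmnet(x, y, alpha = 1)    # L1 penalty only
enet_cv  <- cv.glmnet(x, y, alpha = 0.5)  # L1/L2 mix; 0.5 is an assumption
```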
3. Results

Research workers often use models to approximate an unknown relationship between a set of predictor variables and the response variable. They try different types of models which explain the variability in the data in a better way. The main objective of model building is to predict the response variable using the predictor variables. Much of the researcher's effort is devoted to the estimation of model parameters; however, it is desirable to have a model that is reasonably easy to construct and interpret and that predicts well. Thus, the assessment of the prediction performance of a model is critical and has practical importance. This is especially true for models with prediction as their primary objective. To assess the relative performance of the models, we used a training set, which is a part of the whole data with known information that is used to build a model. The goal of the training phase is to estimate the parameters of a model so that it predicts the response variable with good predictive performance in real use of the model.
In our study, we evaluated the prediction performance of the following models for prediction of milk production in cattle: (i) ordinary multiple linear regression (MLR), (ii) ridge regression, (iii) LASSO and (iv) elastic net (EN). For this purpose, the whole data set was divided at random into two sets, viz. a training set consisting of 90 per cent of the observations and a testing set comprising the remaining 10 per cent. The training set was used to estimate the parameters and the testing set was used to validate the models. Since there are 156 observations under study, there were 140 observations in the training set to estimate the model parameters and 16 observations in the test set to validate the models. The observed and predicted lifetime milk yield by the different methods for the test set are given in Table 1.
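A hedged sketch of how the 90:10 split and the test-set predictions might be produced in R, continuing the illustrative names used above (the random seed and lambda choice are also assumptions):

```r
set.seed(2015)  # arbitrary seed, for reproducibility of the split

n <- nrow(cattle)                       # data frame sketched earlier
test_idx <- sample(n, round(0.10 * n))  # 10 per cent held out
train <- cattle[-test_idx, ]
test  <- cattle[ test_idx, ]

# Test-set predictions for the comparison in Table 1.
pred_mlr   <- predict(mlr_fit, newdata = test)
x_test     <- as.matrix(test[, setdiff(names(test), "lifetime_yield")])
pred_lasso <- predict(lasso_cv, newx = x_test, s = "lambda.min")
pred_enet  <- predict(enet_cv,  newx = x_test, s = "lambda.min")
# lm.ridge() has no predict() method; ridge predictions are formed
# manually from coef(ridge_fit) at the chosen lambda.
```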
Criteria for predicting performance of models

The test set observations are used to verify how accurately the fitted model performs in forecasting these values. In order to judge the forecasting accuracy of a particular model, or to evaluate and compare different models, their relative performance on the test data set is considered. For this purpose, the following performance measures proposed in the literature for estimating forecast accuracy have been used.

1. Coefficient of Determination (R2)

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}$$

2. Root Mean Square Error (RMSE)

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}$$

3. Mean Absolute Error (MAE)

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert y_i - \hat{y}_i\right\rvert$$

4. Mean Absolute Percentage Error (MAPE)

$$\text{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert\frac{y_i - \hat{y}_i}{y_i}\right\rvert \times 100$$

5. Theil's U-statistic (U)

$$U = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}y_i^{2}} + \sqrt{\frac{1}{n}\sum_{i=1}^{n}\hat{y}_i^{2}}}$$

Where, $y_i$ is the actual or observed value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the arithmetic mean of the observed values and $n$ is the number of observations. For assessing forecast accuracy, it is desirable that the RMSE, MAE, MAPE and U-statistic be close to zero and that R2 be close to unity.
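All five criteria are simple to compute in R; a minimal sketch, assuming vectors obs and pred of observed and predicted test-set yields for one model:

```r
# obs: observed test-set yields; pred: predictions from one model.
accuracy <- function(obs, pred) {
  c(MAE  = mean(abs(obs - pred)),
    MAPE = 100 * mean(abs((obs - pred) / obs)),
    RMSE = sqrt(mean((obs - pred)^2)),
    U    = sqrt(mean((obs - pred)^2)) /
           (sqrt(mean(obs^2)) + sqrt(mean(pred^2))),  # Theil's U
    R2   = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2))
}
```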
Based on the above five criteria of prediction performance, it can be seen from Table 2 that the elastic net technique outperformed the other methods, namely ordinary multiple linear regression, ridge regression and the LASSO, for prediction of lifetime milk yield using lactation traits in crossbred cattle.

Acknowledgement
The authors thankfully acknowledge the constructive comments of the anonymous reviewers on an earlier version of this paper.
References
Hoerl, A. E. and R. W. Kennard (1970a). Ridge regression : biased estimation for non-orthogonal problems. Technometrics, 12, 55-67.

Hoerl, A. E. and R. W. Kennard (1970b). Ridge regression : applications to non-orthogonal problems. Technometrics, 12, 69-82.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267-288.

Zou, H. and T. Hastie (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67(2), 301-320.