
Comparing Model Performances Graphically
James R. Knaub, Jr.
Retired in Reston, Virginia, USA
February 23, 2017

Introduction: Many researchers on ResearchGate have asked questions regarding model selection and/or validation, concerned with whether to delete or include a particular regressor or regressors (i.e., predictors/independent variables). Model validation with test data may be used to compare models based on results, but one might also want a comparative graphical residual analysis, especially if the data available for validation are sparse. Scatterplots with estimated residuals on the y-axis and predicted y on the x-axis are a well-known, useful diagnostic. (For a great many good ideas, see PSU(2017).) Here it is illustrated that plotting information for more than one model on the same scatterplot can provide a good, clearly interpretable comparison between models. Indications of both variance and model-failure/bias can be seen.

For one regressor, residual plots are typically drawn with x on the x-axis rather than predicted y, because for one regressor, for regression through the origin, x is a fine measure of "size." However, predicted y = y* = bx is also a good measure of size for the x-axis, and it admits the entire world of possible regression equation types to obtain y*. This leaves a common, equally scaled x-axis for comparing any two or more different models in the performance of estimating y, as measured by estimated residuals, no matter the difference in number of regressors, nonlinearity, or any other concern. (See the restriction on this noted in the addendum/correction/clarification at the end.) By using absolute values of the estimated residuals, we can easily compare two or more models on one graph.

In the two Excel files also uploaded to ResearchGate to support the graphics, a few test data points were left out to see how well they were predicted in each case. (For one classical ratio estimator (CRE)¹ in the case of the hydroelectric generation data obtained, the sum of the estimated residuals for those test data was twice that of another such model.) However, this was not a rigorous validation by any means; the focus here is on graphical comparison of model performances. The examples, as shown in the accompanying uploaded Excel files, use only the CRE, but any relevant model can be used, with any sets of regressor data for each set of points, on any scatterplot graphic where predicted y is on the x-axis.
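To make the mechanics concrete, here is a minimal sketch in Python (NumPy/Matplotlib) of this kind of comparison plot. The data arrays are hypothetical stand-ins, and the CRE defined in footnote 1 is used for both candidate models only for simplicity; any models producing predictions y* for the same y would do.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample: one dependent variable y, two candidate regressors.
y  = np.array([120., 340., 560., 800., 1500., 2100.])
x1 = np.array([100., 300., 500., 700., 1400., 2000.])  # e.g., adjacent month
x2 = np.array([ 90., 250., 610., 650., 1600., 1800.])  # e.g., different season

def cre_predict(y, x):
    """Classical ratio estimator: b = sum(y)/sum(x); predictions y* = b*x."""
    return (y.sum() / x.sum()) * x

# Any model could supply these predictions; the CRE is just the simplest case.
ystar1 = cre_predict(y, x1)
ystar2 = cre_predict(y, x2)

# Absolute estimated residuals against predicted y, both models on one plot.
plt.scatter(ystar1, np.abs(y - ystar1), color="blue", label="model #1")
plt.scatter(ystar2, np.abs(y - ystar2), color="brown", label="model #2")
plt.xlabel("predicted y (y*)")
plt.ylabel("|estimated residual|")
plt.legend()
plt.show()
```

Because the x-axis is y* rather than any particular regressor, the two point clouds share a common scale even when the models differ in form or in number of regressors.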

First, below is an optional section, on related uses of graphics, to compare model performance for groups of data which may or may not best be modeled together. That section also looks at graphics for one regressor, x, i.e., simple regression through the origin, where the same regressor might be used to estimate multiple dependent variables. (Thus, this is a simple-regression case of multivariate regression.) This is the case shown in Knaub(2014a), "Quasi-Cutoff Sampling and Simple Small Area Estimation …," in Figure 2 on page 5, where we have the same x and different y, instead of different x and the same y.

¹ CRE: $y_i = b x_i + e_{0i} x_i^{0.5}$, which means that $b = \sum_{i=1}^{n} y_i / \sum_{i=1}^{n} x_i$. Remember to pay attention to representativeness, by category, of the dependent variable data collected.

Related Uses of Graphics (Optional Section):

1) First let us consider the combination or separation of two portions of a finite population, i.e., subpopulations, which may or may not be well modeled by the same regression with the same parameters. That is, should the data be modeled together? The example here is for very simple modeling, simple regression through the origin, but it can be extended. One may examine prediction error bounds to compare models for given data, as illustrated in Knaub(2014b), "CRE Prediction Bounds …." That is for the classical ratio estimator, so the regression weight is 1/x.

Below is a graph excerpted from that Excel file, which illustrates how a breakdown of data into two groups, perhaps strata, might be considered, or how two groups might be combined for purposes of small area estimation (SAE). That is, if we suspect that data should be divided into two estimation groups ("EGs," as suggested in Knaub(1999) and discussed in Knaub(2010), "On Model-Failure …," page 9) for purposes of stratification within a category to be published, or, conversely, that those groups may be modeled well enough together for purposes of "borrowing strength" for small area estimation (Knaub(1999, 2014a)), we can examine this graphically. The following graph (as noted, from Knaub(2014b)) shows that if two potential subsets of the data of interest, with corresponding regressor data, are plotted together, then the prediction lines for y given x in each subset, in this simple linear example (through the origin), may be compared. The 95% upper and lower prediction bounds on predicted y are shown in each case, as are the slope estimates with their estimated standard errors.

[Figure 1: y vs x1 (cropped view). 95% prediction bounds on predicted y. First data set in black, with dashed prediction line: b = 6.2, se(b) = 0.2, n = 30. Second data set in brown-red: b = 5.6, se(b) = 0.3, n = 30. (Note: the lower prediction limit is negative near the origin because the normality assumption is problematic there.)]


One may then decide whether the situation for a particular application warrants stratification, or whether the groups may be combined for small area estimation. This may be extended to complex models, when we want to see whether to separate data sets or combine them, by graphing y on the y-axis and predicted y on the x-axis for the two data sets on the same scatterplot.
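For readers who want to reproduce bounds like those in Figure 1, here is a rough sketch under the model of footnote 1, assuming the random factors e_0i are iid with variance sigma². The variance expressions are the standard ones for WLS regression through the origin with weight 1/x; the function name and data are hypothetical, not the exact computation behind Figure 1.

```python
import numpy as np
from scipy import stats

def cre_prediction_bounds(y, x, x_new, level=0.95):
    """Approximate prediction bounds for the CRE, y_i = b*x_i + e_0i*x_i**0.5,
    assuming iid random factors e_0i with variance sigma**2 (a sketch only)."""
    n = len(y)
    b = y.sum() / x.sum()                 # CRE slope (WLS slope, weight 1/x)
    e = y - b * x                         # estimated residuals
    sigma2 = np.sum(e**2 / x) / (n - 1)   # weighted residual variance
    # Prediction-error variance at x_new: sigma^2 * x_new for the new unit's
    # own variability, plus x_new^2 * Var(b), with Var(b) = sigma^2 / sum(x).
    var_pred = sigma2 * (x_new + x_new**2 / x.sum())
    t = stats.t.ppf(0.5 + level / 2, df=n - 1)
    y_new = b * x_new
    half = t * np.sqrt(var_pred)
    return y_new - half, y_new, y_new + half
```

As in Figure 1, the lower bound computed this way can dip below zero near the origin, where the normality assumption is strained.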

2) Now let us consider a scatterplot showing y-values for energy data collected in two different months, with the same x-value used in each regression. The independent variable here is from a previous annual census of the same data element collected in those two monthly samples. The following is extracted from page 5 of Knaub(2014a), "Quasi-Cutoff Sampling and Simple Small Area Estimation …":

[Figure 2a: y vs x2 (renumbered from Knaub(2014a)). Figure 2b: enlargement near the origin.]


Method: Here, however, let us examine the case of a single y data set for which we wish to compare models. Instead of the last figure, where there are multiple y-values shown for the same x-value, we now want multiple predictions for the same dependent variable (y). Thus we are looking at one data set/population of interest, and want to see which model predicts better for such an application. To start, we again consider only one predictor, x, so that we have the reverse of the previous graph: more than one regressor, used one at a time in one model each, predicting for the same y, instead of multiple y for the same x. This extends to any number of model predictions for the same y, where those models can be as complex, and have as many regressors (predictors, x), as you like. (Note the restrictions in the addendum below.) That is because we do not have to plot y or residuals against a single predictor, x; we can plot them against predicted y. (In the simple case, that would be bx, rather than x, on the x-axis.)

Two Excel files have been uploaded in conjunction with this methods paper. They show how all of the following graphics were obtained. The graphics and other information produced there illustrate variance, prediction intervals, and example "error" due to variance and model-failure bias. Here the classical ratio estimator is used, with regression weight 1/x and y* = bx, but any model can be used to generate the prediction y*, so the graphics can be used for any model types. For example, a multiple linear regression model with four predictors could be compared to a nonlinear regression model with two predictors. What matters is using y* on the x-axis of the graphics.
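As a sketch of that flexibility, the following hypothetical Python fragment fits two models of entirely different forms, a two-regressor linear fit and a one-regressor power-law fit, and reduces each to predictions y* for the same y, after which their absolute residuals can be plotted exactly as in the figures below. The data and model forms are illustrative assumptions only.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)
x1 = rng.uniform(1, 10, 40)
x2 = rng.uniform(1, 10, 40)
y = 3.0 * x1 + 1.5 * x2 + rng.normal(0, 2, 40)   # hypothetical data

# Model A: multiple linear regression on (x1, x2), with intercept.
X = np.column_stack([np.ones_like(x1), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
ystar_a = X @ beta

# Model B: nonlinear (power-law) regression on x1 alone.
f = lambda x, a, p: a * x**p
(a, p), _ = curve_fit(f, x1, y, p0=(1.0, 1.0))
ystar_b = f(x1, a, p)

# Each set of |residuals| can now be plotted against its own y* on one
# scatterplot, despite the models' different forms and regressor counts.
abs_resid_a = np.abs(y - ystar_a)
abs_resid_b = np.abs(y - ystar_b)
```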

Example 1


Here we consider hydroelectric generation data for 23 hydroelectric plants from a monthly collection of these establishment survey data. In practice, a past census is used as regressor data. From the sample, the regression coefficient for the classical ratio estimator (the CRE: one regressor, regression through the origin, with regression weight w = 1/x) is estimated, $b = \sum_{i=1}^{n} y_i / \sum_{i=1}^{n} x_i$. This is used to predict (note this is not forecasting, but estimation) for data not collected in the sample. For purposes of this methodology paper, we will use one month of sample data as the dependent variable and two other months as regressor data, one month each for two different one-regressor CRE predictions, to compare results. For this illustration, any month could supply the dependent, i.e., y-value, data, and any other month the independent, i.e., x-value, data. In one case, the x-value data are from the adjacent month, and in the second CRE model, the x-value data are from a month in a different season of the year. Here, as preliminary graphics, we first see the plot of a line representing the predictions, and curves showing the upper and lower 95% prediction bounds, vertically about the predicted values of y on the prediction line, with x on the x-axis. Figures 3 and 4 are for these two CRE models, with one regressor each, but a different one in Figure 3 (the adjacent month) than in Figure 4 (regressor data from a different season). Such plots would be desirable for any regression model.


The Excel file of hydroelectric data used here comes from the website of the U.S. Department of Energy, Energy Information Administration (EIA); see US EIA(2017). The source used to provide these data is the EIA-923 survey, "Power Plant Operations Report." (We will not consider here the slight possible influence of grouping data from different power companies, although about half of the data came from the same company; we will consider the data to be iid.) Note that the Excel file containing these data and the graphics generated has been uploaded to ResearchGate simultaneously with this methodology paper, along with another Excel data file for Example 2 below.


[Figure 3: y vs x1. Prediction line #1 with upper and lower 95% prediction bounds and the (x1, y) data points. (Note: the lower prediction limit is negative near the origin because the normality assumption is problematic there.)]

[Figure 4: y vs x2. Prediction line #2 with upper and lower 95% prediction bounds and the (x2, y) data points.]


Now we show residual analysis graphics for the two models on the same scatterplot for comparison. As noted, these are from the two CRE models above, but because we plot only the absolute values of the estimated residuals on the y-axis and the predicted values of y on the x-axis, any regression model could be compared with any other regression model with which we are trying to predict y. Thus model prediction performance for y is directly comparable for the given data for these models, regardless of linearity or number of regressors in either model. Note that these performance considerations are for prediction; they may possibly help when using regression for explanation, by indicating important regressors, but prediction performance and explanatory value are not the same thing (see Shmueli(2010)). Figure 5 shows a residual analysis graphical example for comparing two model performances:

[Figure 5: |e_i| vs y*. Absolute values of the estimated residuals for both models, with Excel trendlines. Model #1 (b′x1, blue): y = 0.0535x + 32259. Model #2 (b″x2, brown-red): y = 0.1997x + 72827.]

In Figure 5, the y-axis is the absolute value of the estimated residual and the x-axis is the predicted y value, in each case, so comparisons are made directly between model results, regardless of model, as long as both are predicting (estimating) the same y-values. The blue dots are generated by model #1 and the brown-red ones by model #2. Note that Excel "trendlines" through these points do indeed indicate heteroscedasticity. It appears that model #1 is performing better than model #2 in important respects, including the four worst cases. (Model #1 also seems to perform generally better than model #2 in Figures 3 and 4. A closer look at the data near the origin may be needed, but the graphics here show a particular distinction in performance if we want to estimate totals.)
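Excel's "trendline" here is simply an ordinary least-squares line through the (y*, |e|) points. A one-function sketch of the same diagnostic (the function name is hypothetical); a clearly positive slope suggests heteroscedasticity:

```python
import numpy as np

def abs_residual_trend(ystar, abs_resid):
    """Least-squares line through the (y*, |e|) points, like an Excel
    trendline; returns (slope, intercept)."""
    slope, intercept = np.polyfit(ystar, abs_resid, 1)
    return slope, intercept
```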



As an aside, the CRE is used with these data because, although the actual heteroscedasticity is probably greater than the CRE assumes (Brewer(2002), page 111), underestimating the degree of heteroscedasticity in this way may be more robust against nonsampling error among the smaller respondents (see Knaub(2009)). Adjusting the sigmas for the degree of heteroscedasticity for which the CRE accounts, we obtain Figure 6 below, which plots the absolute values of the CRE estimated random factors of the estimated residuals, $|e_{0i}| = |e_i| / x_i^{0.5}$, on the y-axis, against predicted y on the x-axis.
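Recovering the random factors is a one-line transformation; a sketch continuing the earlier fragments (the function name is hypothetical):

```python
import numpy as np

def random_factors(e, x):
    """Estimated random factors under footnote 1: e_i = e_0i * x_i**0.5,
    so e_0i = e_i / x_i**0.5."""
    return e / np.sqrt(x)
```

Plotting |random_factors(e, x)| against y* should give a roughly horizontal trend, as in Figure 6, if the degree of heteroscedasticity the CRE assumes is about right.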

[Figure 6: |e_0i| vs y*. Absolute values of the estimated random factors for both models, with Excel trendlines. Model #1 (b′x1, blue): y = -9E-06x + 134.7. Model #2 (b″x2, brown-red): y = 7E-05x + 406.05.]

Note that the Excel "trendlines" are now more nearly horizontal. Some data near the origin may be questionable, and perhaps should have been investigated for possible measurement error. (The respondents may need to have been asked whether they were certain of their data. Perhaps they were asked, and confirmed their reports; but sometimes errors are found. However, asking selected respondents to confirm results, without asking everyone, might bias results. Further, even asking a few respondents to confirm can delay production of official statistics and be labor/resource intensive, so perhaps only the more flagrant cases, such as someone likely reporting in incorrect units, might be investigated.)


Note the relationship of the Excel “trendlines” to the methodology used in Knaub(1993) to estimate the degree of heteroscedasticity.
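Knaub(1993) gives a specific estimator for the coefficient of heteroscedasticity; the fragment below is only a crude stand-in for the idea, not that method. If the residual scale grows like x^gamma, the slope of log|e| on log x gives a rough estimate of gamma, and gamma of about 0.5 corresponds to the CRE.

```python
import numpy as np

def rough_heteroscedasticity_exponent(e, x):
    """Crude illustration only (not the Knaub(1993) estimator): if the
    residual scale grows like x**gamma, the slope of log|e| on log(x)
    roughly estimates gamma; gamma ~ 0.5 matches the CRE."""
    mask = np.abs(e) > 0                     # avoid log(0)
    slope, _ = np.polyfit(np.log(x[mask]), np.log(np.abs(e[mask])), 1)
    return slope
```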


Example 2

The data in the Excel file uploaded for the Erie, Pennsylvania housing market, for 1977, come from Weisberg(1980), pages 219 and 220, who found them in another source². Note that this Excel file containing these data and the graphics generated has been uploaded to ResearchGate simultaneously with this methodology paper, as noted above. Weisberg shows one dependent variable, sales price, and nine independent variables, such as current taxes, lot size, living space, and number of rooms, and he states on page 220 that there are additional independent variables, used in property tax assessment, that are actually used to predict house sale prices. In the simple application used here, and in the data file for these housing data also uploaded to ResearchGate, we can therefore expect omitted-variable bias, and perhaps a substantial intercept term resulting from the independent variables missing from such models. In this case, we note that the use of the one-regressor CRE, with lot size in thousands of square feet, is of course inadequate.

[Figure 7: y vs x. Prediction line and the (x, y) data points.]


Here we look at only 28 data points, representing 28 real estate sales, and, as noted, only lot size as the independent variable, in regression through the origin with regression weight w = 1/x. Note that there are two large reported sales; if these were set aside, as likely more dependent than the others on some variable or variables not used here, the remaining data might be fit with an improved heteroscedastic model using this regressor, lot size, but with a relatively large positive intercept and/or nonlinearity. Prediction bounds about the inferior model used here are shown in Figure 8.

² Narula, S.C. and J.F. Wellington (1977), "Prediction, linear regression and minimum sum of relative errors," Technometrics, 19, 185-190.


[Figure 8: y vs x. Prediction line with upper and lower prediction bounds and the (x, y) data points; the lower bound dips below zero.]

Each model using all the data will appear model-unbiased (Cochran(1977), page 158 and elsewhere), as the expected sum of the estimated residuals is zero, and for the CRE the sum of the estimated residuals is always exactly zero (Särndal, Swensson, and Wretman(1992), pages 231 and 232). This is shown in the accompanying Excel spreadsheet. It is also shown what happens when some data are withheld as test data: as in the first example, the sum of the estimated residuals is no longer zero when the parameters are estimated from the smaller sample and applied to the full data set.
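A small check of this property, with hypothetical data: fit the CRE on part of the sample, confirm that the in-sample estimated residuals sum to zero, then apply the same b to withheld points, where the sum is generally nonzero.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 28)
y = 5.0 * x + rng.normal(0, 1, 28) * np.sqrt(x)  # hypothetical CRE-like data

fit = np.arange(28) < 23          # fitting sample
test = ~fit                       # 5 withheld test points

b = y[fit].sum() / x[fit].sum()   # CRE slope from the fitting sample
print(np.sum(y[fit] - b * x[fit]))    # exactly 0 (up to rounding): sum of e_i
print(np.sum(y[test] - b * x[test]))  # generally nonzero on withheld data
```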

Conclusion:


Please note: There are a number of good resources online for statistics, including PSU(2017), which is relevant here. That reference is noted because it provides a wealth of information on graphical residual analyses and related graphics, which is very useful for understanding this topic. If you want to compare the performance of one model to another, Figures 5 and 6 above are examples of such a comparison.


Remember that predicted y (i.e., y*) and the absolute values of the estimated residuals for any candidate model can be shown on the same graph as those for any other candidate model, with predicted y on the x-axis. This provides a direct comparison between model performances for various models for the same dependent variable (y). This is illustrated in Figure 5 and, accounting for a degree of heteroscedasticity, in Figure 6. Other information is shown in the two Excel files also uploaded to ResearchGate, such as estimated variances of the prediction errors. Note that an estimated variance of the prediction error is designed to estimate variance, but because of the way sigma is estimated, it is also accordingly inflated by model-failure/bias. (See Knaub(2016), "Predictions for Finite Populations ….")


Addendum on limitations of this graphical methodology: Note that Figures 5 and 6 are useful for comparing models with different regressors and/or numbers of regressors. This would not help if looking, say, at the number of coefficients for polynomial regression, where one might use too many such coefficients, as shown in the example at the beginning of Stansbury, D.(2013): https://theclevermachine.wordpress.com/2013/04/21/model-selection-underfitting-overfitting-and-the-bias-variance-tradeoff/. That is, one could "overfit." Residual analyses can tell us a great deal, but "model validation," to see whether 'new' data are predicted well, can still be very important.

References:

Brewer, K.R.W. (2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, Arnold: London and Oxford University Press.

Cochran, W.G. (1977), Sampling Techniques, 3rd ed., John Wiley & Sons.

Knaub, J.R., Jr. (1993), "Alternative to the Iterated Reweighted Least Squares Method: Apparent Heteroscedasticity and Linear Regression Model Sampling," Proceedings of the International Conference on Establishment Surveys (ICES), Buffalo, NY, USA, American Statistical Association, pp. 520-525, DOI: 10.13140/RG.2.1.4439.4085, https://www.researchgate.net/publication/263809034_Alternative_to_the_Iterated_Reweighted_Least_Squares_Method_-_Apparent_Heteroscedasticity_and_Linear_Regression_Model_Sampling, http://ww2.amstat.org/meetings/ices/1993/contributed/VII_EconomicData-Analysis.pdf

Knaub, J.R., Jr. (1999), "Using Prediction-Oriented Software for Survey Estimation," InterStat, August 1999, http://interstat.statjournals.net/; partially covered in "Using Prediction-Oriented Software for Model-Based and Small Area Estimation," Proceedings of the ASA Survey Research Methods Section, 1999, and in "Using Prediction-Oriented Software for Estimation in the Presence of Nonresponse," presented at the International Conference on Survey Nonresponse, 1999, https://www.researchgate.net/publication/261586154_Using_Prediction-Oriented_Software_for_Survey_Estimation

Knaub, J.R., Jr. (2009), "Properties of Weighted Least Squares Regression for Cutoff Sampling in Establishment Surveys," InterStat, December 2009, http://interstat.statjournals.net/, https://www.researchgate.net/publication/263036348_Properties_of_Weighted_Least_Squares_Regression_for_Cutoff_Sampling_in_Establishment_Surveys

Knaub, J.R., Jr. (2010), "On Model-Failure When Estimating from Cutoff Samples," InterStat, June 2010, http://interstat.statjournals.net/, https://www.researchgate.net/publication/261474154_On_Model-Failure_When_Estimating_from_Cutoff_Samples

Knaub, J.R., Jr. (2014a), "Quasi-Cutoff Sampling and Simple Small Area Estimation with Nonresponse," InterStat, May 2014, http://interstat.statjournals.net/, https://www.researchgate.net/publication/262066356_Quasi-Cutoff_Sampling_and_Simple_Small_Area_Estimation_with_Nonresponse

Knaub, J.R., Jr. (2014b), "CRE Prediction 'Bounds' and Graphs Example for Section 4 of Properties of WLS Article," ResearchGate, https://www.researchgate.net/publication/263265199_CRE_Prediction_%27Bounds%27_and_Graphs_Example_for_Section_4_of_Properties_of_WLS_article

PSU(2017), Pennsylvania State University, Eberly College of Science, STAT 501, "Regression Methods," online lessons, downloaded February 24, 2017, https://onlinecourses.science.psu.edu/stat501/node/2. See in particular: "4.2 - Residuals vs. Fits Plot," https://onlinecourses.science.psu.edu/stat501/node/277; "4.3 - Residuals vs. Predictor Plot," https://onlinecourses.science.psu.edu/stat501/node/278; "4.4 - Identifying Specific Problems Using Residual Plots," https://onlinecourses.science.psu.edu/stat501/node/279; "7.4 - Assessing the Model Assumptions," https://onlinecourses.science.psu.edu/stat501/node/317

Särndal, C.-E., Swensson, B., and Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag.

Shmueli, G. (2010), "To Explain or to Predict?," Statistical Science, Vol. 25, No. 3, pp. 289-310, Institute of Mathematical Statistics, DOI: 10.1214/10-STS330, https://www.researchgate.net/publication/48178170_To_Explain_or_to_Predict

Stansbury, D. (2013), "Model Selection: Underfitting, Overfitting, and the Bias-Variance Tradeoff," Topics in Computational Neuroscience & Machine Learning, downloaded 5 April 2017, https://theclevermachine.wordpress.com/2013/04/21/model-selection-underfitting-overfitting-and-the-bias-variance-tradeoff/

US EIA(2017), Power Plant Operations Report, Form EIA-923 detailed data, downloaded February 22, 2017, http://www.eia.gov/electricity/data/eia923/

Weisberg, S. (1980), Applied Linear Regression, 1st ed., John Wiley & Sons.