OCTOBER 1998 #1: Filling in the Gaps for A Partially Discontinued Data Series by James R. Knaub, Jr.
Editor: Jan Beran
Filling in the Gaps for A Partially Discontinued Data Series
James R. Knaub, Jr.
Energy Information Administration, EI-53.1, US Dept. of Energy, 1000 Independence Ave. SW, Washington DC, 20585

Key Words: linear regression; heteroscedasticity; regression weights; model; prediction; combining estimators; standard error of the prediction error; SAS PROC REG; missing ‘time series’ data
Abstract: Data on US coal production, imports, producer and distributor stocks, consumption, exports, consumer stocks, and, by default, losses and unaccounted for coal, have been collected, and, considering changes in stock levels, all data have been ‘balanced’ on a quarterly basis. That is, the relationship between these variates is written as a single equation. These data have been published in the Energy Information Administration (EIA) Quarterly Coal Report for more than sixteen years. Producer and distributor stocks (p/d stocks) will no longer be collected on a quarterly basis, due to budgetary constraints, but will be observed annually. The EIA still wishes to publish these data quarterly, with ‘estimates’ given for p/d stocks for the first, second and third quarters of each year, and the observed value given for the fourth quarter. (Note that these “estimates” are actually “predictions.” However, unlike the usual case with predictions, no values will ever be observed for first, second or third quarter p/d stocks after 1997. Also, we are only interested in a value to approximate the current conditions for each of these publications, not forecasts of the future.) This paper explores the use of weighted linear regression modeling, prediction and the variance of the prediction error, and the combination of ‘predictions’ from different models, to help fill in these unobserved p/d stock quarterly values in a reasonable manner, and provide estimated standard errors. Note, however, as indicated by substantial data revisions and the inherently imprecise nature of data collection for coal, nonsampling error is an enormous consideration whether prediction is used or not, and no matter what procedures are employed to predict data. The procedures found here have substantial potential for use whenever one might consider reducing the frequency of periodic data collection for an established data series.
1. Background: There are 65 quarters of data (back to 1981) available for modeling p/d stock predictions; data on covariates will continue to be collected, and p/d stocks will continue to be observed annually. (See Table 1 in Quarterly Coal Report (1996) (96/4Q) and other issues of that
publication.) We will no longer break this down by State, but only have one national number. A quick examination of past revisions shows that there may have been enough nonsampling error to justify this decision, even if all quarterly observations were continued. The conventional notion of a "time series" was considered, but the series is to be broken. It is not that there is a need to project ahead to each quarter until the data are later obtained, because data for three out of each four future quarters will never be observed. Another statistician reminded the author that there could be benefit in separating the data into four groups, depending upon the quarter of the year in which the p/d stocks were collected. However, the author had already considered that, and in the case of these data, the benefit of what might be considered the virtual addition of another regressor (i.e., time-of-year), was not very great, and was offset by the resulting dependence upon smaller data sets, which are made less stable by the presence of possibly substantial nonsampling error. An attempt was made to use regression against covariates that did not include the previous quarter for p/d stocks. One would then assume that the relationships between these covariates and p/d stocks that were present within the 65 quarters of previous data will not soon change over time. The quality of this assumption could be judged to some extent in the future by compiling annual comparisons of fourth quarter observed data and the predictions that would have resulted were this methodology to be used to predict fourth quarter results. Without using the previous p/d stocks as a regressor, variance was too high to make results very useful, and the team leader needing the data was not happy with those results. She had emphasized that p/d stocks may be calculated from covariates, except for a losses term. 
However, this assumes that the previous quarter's p/d stock level was actually observed, which will only be the case in the first quarter of each year. This led back to addressing the problem of using the previous p/d stock level when only a predicted previous value will be available.
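As a sketch of the balance relationship referred to above, the commented LOSS statement in the SAS listing suggests the form below. Only $y$ (p/d stocks) follows the paper's notation; the symbols $P$ (production), $M$ (imports), $C$ (consumption), $E$ (exports), $s$ (consumer stocks) and $\ell$ (losses and unaccounted for, with the sign convention of the listing) are labels assumed here for illustration. The numbers are the second record of COAL.PDSTOCKS.DATAQCR, in thousands of short tons.

```latex
y_i = y_{i-1} + P_i + M_i - C_i - E_i - (s_i - s_{i-1}) + \ell_i

% Second record of the data file:
% 22265 + 234578 + 232 - 193011 - 33978 - (164970 - 158274) = 23390
% observed y_i = 23417, so \ell_i = 23417 - 23390 = 27
```

Everything on the right side except $\ell_i$ is either a covariate that will still be collected or the previous p/d stock level, which is why the previous level matters so much once it is no longer observed.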
2. The Procedure: A solution lay in using regression weights to account for the fact that the previous p/d stock value was predicted, when that is the case. These regression weights are described below and used in Eq(4.16). They are decomposed into a part that deals with the usual heteroscedasticity, and a part that accounts for the number of time periods since data for the variate of interest were last collected. The inverse square root of a regression weight is the ‘nonrandom factor’ when residuals are factored into random and nonrandom parts, as in Eq(4.16). This can be implemented easily using STDI under SAS PROC REG, where STDI is an estimate of the standard error of the prediction error. (This is not an endorsement of products for the SAS Institute Inc., but merely reflects the software available to the author.)

Three predictors were then developed, one of which does not use the previous p/d stocks, and the other two do. Both multiple and simple linear regression were used. SAS PROC REG can be used to produce diagnostics and may indicate if there is a problem with multicollinearity. It automatically makes a correction for a matrix singularity. (See SAS Institute Inc. (1985), pages 671 and 672, and Maddala (1977), pages 183-194.)

The three predictors, and a combined predictor, were applied to 17 cases using test data from the previous 65 quarters. (See Table 1.) Some of the data were used as the data base, and some were temporarily deleted in order to make ‘predictions’ and compare results to the values observed. For example, the second test case in Table 1 was obtained by using data through the last quarter of 1996 for coal p/d stocks, plus data for the other regressors through the first quarter of 1997, to estimate coal p/d stocks for the first quarter of 1997. That ‘prediction’ was then compared to the number observed. (Note that there have been revisions in the data since Table 1 was compiled, so exact individual results may not be replicated.)

[...] 1, then at least one previous value of w is needed to construct the more complicated form for the residual shown in Eq(4.16), which is used for only one y value, the current one, when predicting that value.
5. Application: When Q > 1, $x_{\Sigma i}$ will be estimated using a predicted value for $y_{i-1}$. Assuming results are not as sensitive to moderate changes in weights as they are to changes in previous values of coal p/d stocks, let $w_i^{-1} = w_{i-1}^{-1} = \cdots = w_k^{-1} = x_{\Sigma}$, or $\hat{x}_{\Sigma}$ (approximately). Not concerning ourselves with the ‘hat’ in such cases, neither here nor further below, this yields

$$y_i = X_i'^{T}\beta' + b_y\,\hat{y}_{i-1}^{(Q)} + e_{0i}\,x_{\Sigma i}^{1/2}\left[\sum_{q=1}^{Q} b_y^{2(q-1)}\right]^{1/2},$$
Eq(5.1)
a special case of Eq(4.16). The test results in Table 1, and the programming code listed, make use of Eq(5.1). It would be interesting to make use of Eq(4.16) in the future, and see if performance could be improved appreciably. Note that if there are no previous values of y being used as a regressor, then Eq(5.1) has no y term on the right side of the equation, and then $\beta' = \beta$ and $X_i' = X_i$, so then

$$y_i = X_i^{T}\beta + e_{0i}\,x_{\Sigma i}^{1/2}$$
Eq(5.2)

which is a special case of Eq(4.5). This is the case for one of the three models chosen. This was done so that even if Q were large enough to make the associated variance and reasonableness of this procedure cause for concern in the other models, there would still be one model in use that only employs those other regressors, thus not being dependent upon Q, and also adding a new data point each year to the historical data used to estimate this model.
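The weights implied by Eq(5.1) can be sketched in a few lines of Python. This is only an illustration: the function and argument names are assumptions, and the value 0.63 in the usage lines is the constant that appears in the second SAS step of the program listing, used here simply as a sample value of the coefficient on the previous p/d stock level.

```python
def regression_weight(x_sigma, b_y, q):
    """Weight w = 1 / (x_sigma * sum_{k=1..q} b_y**(2*(k-1))).

    x_sigma : the size measure whose inverse is the heteroscedasticity part
    b_y     : coefficient on the previous p/d stock level
    q       : quarters since p/d stocks were last observed (1, 2 or 3)
    """
    if q < 1:
        raise ValueError("q must be at least 1")
    if x_sigma <= 0:
        raise ValueError("x_sigma must be positive")
    # Part of the weight that grows with the number of predicted quarters.
    growth = sum(b_y ** (2 * (k - 1)) for k in range(1, q + 1))
    return 1.0 / (x_sigma * growth)


def nonrandom_factor(x_sigma, b_y, q):
    """Inverse square root of the weight, i.e. the factor multiplying e0."""
    return regression_weight(x_sigma, b_y, q) ** -0.5


if __name__ == "__main__":
    for q in (1, 2, 3):
        print(q, regression_weight(30000.0, 0.63, q))
```

With q = 1 the sum reduces to 1 and the weight is simply 1/x_sigma, which is the Eq(5.2) case.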
6. Description of Models Used: The models used in this application will now be considered in increasing order of test performance (see Table 1), and therefore in order of apparent increasing accuracy, using the data currently available. In short, the first model used in the table is the one that is multivariate, but does not consider previous y values; the second is the simple linear regression; and the third model (apparently the most accurate at this point in the data collection) is multivariate, as in the first model, but the third model includes a term involving a previous p/d stock variate. Results from these models are given in this order in Table 1, followed by combined predictor results. The first model below will operate with a data base that increases over time, but this is not true for the other two models. In this model, there are six regressors, none involving previous coal p/d stock (i.e., “y”) values. An intercept term (i.e., not set equal to zero) is included. Here one has
$$y_{M1,i} = X_i^{T}\beta + e_{0i}\,x_{\Sigma i}^{1/2},$$
Eq(6.1)

which corresponds to Eq(5.2), where “M1” simply indicates that it is associated with the first predictor (i.e., “model 1”) results in Table 1.
For the second model in the order taken for Table 1 (the order of apparent test accuracy), there is one regressor and an intercept term. (Call this “model 2.”) The previous coal p/d stock level, $y_{i-1}$, or an estimate of it, is a part of the regressor. Here one has

$$y_{M2,i} = X_i'^{T}\beta_s' + b_1\,\hat{y}_{i-1}^{(Q)} + e_{0i}\,x_{\Sigma i}^{1/2}\left[\sum_{q=1}^{Q} b_1^{2(q-1)}\right]^{1/2},$$
Eq(6.2)

which is a form of Eq(5.1), where “M2” indicates the second model associated with Table 1, $\beta_s'$ involves only $b_0$ and $b_1$, and this is the same “$b_1$” that is the coefficient for the $y_{i-1}$ related portion of the single regressor. (Refer to Eq(4.3).)
In the third model (in order of use in Table 1), seven regressors were used, including a previous y value (or an estimate of it), but, as stated earlier, the intercept was found to be very nearly zero, and was set equal to zero. Call this “model 3.” Here one has

$$y_{M3,i} = X_i'^{T}\beta' + b_7\,\hat{y}_{i-1}^{(Q)} + e_{0i}\,x_{\Sigma i}^{1/2}\left[\sum_{q=1}^{Q} b_7^{2(q-1)}\right]^{1/2},$$
Eq(6.3)

another form of Eq(5.1). Here $b_7$ is used to indicate that one of the regressors ($y_{i-1}$) is being considered separately from the rest of $X_i^{T}\beta$, so we use prime marks on the first term, as before.
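To make the role of the weight in the prediction error concrete, here is a minimal pure-Python sketch of weighted least squares for a one-regressor model with an intercept, the shape of model 2. It uses the textbook formula for the variance of an individual prediction under weights. That formula is assumed here to be analogous to what STDI reports; this has not been checked against SAS output, and all function names are illustrative.

```python
import math


def weighted_simple_regression(x, y, w):
    """Fit y = b0 + b1*x by weighted least squares.

    Records with a nonpositive weight are left out of the fit, which is
    how records flagged for exclusion are treated in this sketch.
    """
    rows = [(xi, yi, wi) for xi, yi, wi in zip(x, y, w) if wi > 0]
    n = len(rows)
    sw = sum(wi for _, _, wi in rows)
    xbar = sum(wi * xi for xi, _, wi in rows) / sw
    ybar = sum(wi * yi for _, yi, wi in rows) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for xi, _, wi in rows)
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for xi, yi, wi in rows)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum(wi * (yi - b0 - b1 * xi) ** 2 for xi, yi, wi in rows)
    s2 = sse / (n - 2)  # estimated variance of the random factor
    return {"b0": b0, "b1": b1, "s2": s2, "sw": sw, "xbar": xbar, "sxx": sxx}


def predict_with_se(fit, x0, w0):
    """Prediction at x0 and the standard error of its prediction error.

    w0 is the weight of the new record; a smaller w0 (larger Q) inflates
    the s2 / w0 term and therefore the standard error.
    """
    y_hat = fit["b0"] + fit["b1"] * x0
    var = fit["s2"] * (1.0 / w0 + 1.0 / fit["sw"]
                       + (x0 - fit["xbar"]) ** 2 / fit["sxx"])
    return y_hat, math.sqrt(var)


if __name__ == "__main__":
    fit = weighted_simple_regression([0, 1, 2, 3], [1, 2, 2, 3], [1, 1, 1, 1])
    print(predict_with_se(fit, 4, 1.0))
    print(predict_with_se(fit, 4, 0.25))
```

The second call shows the intended effect: the same point prediction with a larger standard error when the weight of the new record is reduced.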
7. Combining Results in Practice: The “STDI,” (“standard error of the individual predicted value,” page 663 in SAS Institute Inc. (1985)), an option in SAS PROC REG, was used for each of the three models above. (Note that this is not an official endorsement of that product. Other vendors may supply this feature. The author is simply stating what was used with regard to the software that was available to the author.)
From STDI, and the above models, one has $(\hat{y}_{M1}, \hat{\sigma}_{M1})$, $(\hat{y}_{M2}, \hat{\sigma}_{M2})$, and $(\hat{y}_{M3}, \hat{\sigma}_{M3})$, or this could be written as $(\hat{y}_{Mj}, \hat{\sigma}_{Mj})$, representing the predicted new quarter producers’/distributors’ stocks and corresponding estimated standard error of the prediction error, using model j. Using STDI and section 9.2.4 in Granger and Newbold (1986), $\hat{\rho}_{(M2,M3)}$ was about 0.48. However, using techniques displayed in Granger and Newbold (1986), weighting factors used for combining predictions which employed this value of rho were, in effect, not so very different from those found using $\rho_{(M2,M3)} = 0$. Considering this, and considering the volatility of results, the overall empirical results found in Granger and Newbold (1986), pages 266 through 272, those in Knaub (1992), and extending these results from two to three models, we have

$$\hat{y}_c = \frac{\hat{\sigma}_{M2}^2\hat{\sigma}_{M3}^2\,\hat{y}_{M1} + \hat{\sigma}_{M1}^2\hat{\sigma}_{M3}^2\,\hat{y}_{M2} + \hat{\sigma}_{M1}^2\hat{\sigma}_{M2}^2\,\hat{y}_{M3}}{\hat{\sigma}_{M1}^2\hat{\sigma}_{M2}^2 + \hat{\sigma}_{M1}^2\hat{\sigma}_{M3}^2 + \hat{\sigma}_{M2}^2\hat{\sigma}_{M3}^2}$$
Eq(7.1)

and

$$\hat{\sigma}_c^2 = \frac{\hat{\sigma}_{M1}^2\hat{\sigma}_{M2}^2\hat{\sigma}_{M3}^2}{\hat{\sigma}_{M1}^2\hat{\sigma}_{M2}^2 + \hat{\sigma}_{M1}^2\hat{\sigma}_{M3}^2 + \hat{\sigma}_{M2}^2\hat{\sigma}_{M3}^2}$$
Eq(7.2)

Thus, the “bottom line” information we are seeking is $(\hat{y}_c, \hat{\sigma}_c)$. The first value in this pair is a prediction of the current (missing) quarterly coal p/d stock level, and the latter value is an estimate of the standard error of that prediction error. That estimated standard error is impacted by every kind of error, with the exception of a consistent bias, and so it is an excellent overall measure of the quality of the estimate of p/d stocks data. Further, because it is the result of combining predictors, biases for these models are likely to counteract each other to some extent, thus preventing a relatively large bias in any one of them from being likely to be extremely influential.
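Eq(7.1) and Eq(7.2) amount to combining the three predictions with weights proportional to the inverse of each estimated variance. A small Python sketch of the two equations follows. The function name is an assumption; the inputs in the usage line are the rounded model 1, 2 and 3 entries of the first test case in Table 1, so the result only approximates what unrounded inputs would give.

```python
import math


def combine_predictions(m1, m2, m3):
    """Combine three (prediction, standard error) pairs per Eq(7.1)-(7.2).

    Returns (y_c, sigma_c): the combined prediction and the estimated
    standard error of its prediction error.
    """
    (y1, s1), (y2, s2), (y3, s3) = m1, m2, m3
    v1, v2, v3 = s1 ** 2, s2 ** 2, s3 ** 2
    denom = v1 * v2 + v1 * v3 + v2 * v3
    y_c = (v2 * v3 * y1 + v1 * v3 * y2 + v1 * v2 * y3) / denom
    sigma_c = math.sqrt(v1 * v2 * v3 / denom)
    return y_c, sigma_c


if __name__ == "__main__":
    y_c, sigma_c = combine_predictions((31, 4), (31, 3), (29, 2))
    print(round(y_c), round(sigma_c))
```

Because the combined variance is never larger than the smallest of the three input variances, the model with the smallest standard error dominates the combined value.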
8. Testing: As for application, 17 examples using test data from within the 65 sets of quarterly data already observed, and employing these equations, performed well. (This conclusion is based upon the relationship of predicted values to estimated standard errors of the prediction errors and the corresponding number actually observed, and considering the magnitude of revisions made.) Using constant coefficients within each model appears to be reasonable. Under several test data set conditions examined, corresponding b values seemed fairly stable.
Examining the test results, it is clear that the accuracy of the three models seems generally subject to ranking as indicated earlier. Also, as must necessarily be the case, larger “Q values” mean larger variances. However, Q did not appear to be as influential here as the model chosen. Since a combined prediction was then developed, and, as just indicated, Q = 3 was not a lot worse than Q = 1, these were positive developments. However, it would be interesting to see what differences would occur in such test data, should Eq(4.16) be used instead of Eq(5.1).

Note in Table 1 that when considering these results (rounded to millions of short tons), in 12 out of 17 tests, the absolute difference between the predicted and observed values is approximately less than or equal to the estimated standard error. For this number of test cases, this is a good result. However, the extreme cases (numbers 4 and 14) are cause for some concern. Note also that it appears that this method may tend to underestimate. However, if one were to plot predicted values on the x-axis of a graph, and the corresponding observed values on the y-axis, it would appear that this method overestimates. By eliminating test case number 14, however, a regression through the remaining data points nearly extends through the origin.
Table 1 - Summary of some testing done as of January 5, 1998: (Predictions, estimated standard errors, and collected values are in millions of short tons.)

                       M1              M2              M3         Combined
obs. #  Q Value  Pred 1  s.e. 1  Pred 2  s.e. 2  Pred 3  s.e. 3  Pred.  s.e.  Collected Value
     1        3      31       4      31       3      29       2     30     2               31
     2        1      36       5      40       3      36       2     37     2               38
     3        1      38       5      42       3      40       2     40     2               42
     4        2      33       4      38       3      38       3     37     2               42
     5        3      32       4      35       3      34       3     34     2               36
     6        1      34       4      38       2      35       2     36     1               37
     7        2      32       4      36       3      35       2     35     2               37
     8        3      32       4      31       3      31       2     31     2               34
     9        1      33       5      36       2      31       2     33     2               34
    10        2      30       5      35       3      33       2     33     2               36
    11        3      31       5      35       3      33       3     33     2               33
    12        1      40       5      39       3      38       2     38     2               38
    13        2      36       4      35       3      36       2     36     2               35
    14        3      40       6      31       3      32       3     32     2               27
    15        1      37       5      40       3      38       2     38     2               40
    16        2      38       5      38       3      39       3     38     2               41
    17        3      34       4      35       3      36       3     35     2               35

All estimates are for quarters in 1992, or more recent years, using only data from previous periods (except that the $b_y$ estimates are made using all data). So ... the worst case is N=44, n=43.
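The coverage statement in the testing section can be checked directly against the combined-predictor columns of Table 1. The short Python sketch below does that; the tuple layout and function name are assumptions made for illustration, and the inputs are the rounded table entries, so the comparison is approximate in the same sense as in the text.

```python
# (combined prediction, combined s.e., collected value) for test cases 1-17,
# in millions of short tons, taken from the last three columns of Table 1.
TABLE_1 = [
    (30, 2, 31), (37, 2, 38), (40, 2, 42), (37, 2, 42), (34, 2, 36),
    (36, 1, 37), (35, 2, 37), (31, 2, 34), (33, 2, 34), (33, 2, 36),
    (33, 2, 33), (38, 2, 38), (36, 2, 35), (32, 2, 27), (38, 2, 40),
    (38, 2, 41), (35, 2, 35),
]


def coverage_summary(rows):
    """Return (cases within one s.e., cases outside, mean signed error)."""
    within, outside = [], []
    for case, (pred, se, observed) in enumerate(rows, start=1):
        (within if abs(pred - observed) <= se else outside).append(case)
    mean_error = sum(pred - obs for pred, _, obs in rows) / len(rows)
    return within, outside, mean_error


if __name__ == "__main__":
    within, outside, mean_error = coverage_summary(TABLE_1)
    print(len(within), "of", len(TABLE_1), "within one standard error")
    print("outside:", outside)
    print("mean of (predicted - observed):", mean_error)
```

A negative mean of predicted minus observed is consistent with the remark that the method may tend to underestimate on these test cases.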
9. File maintenance for future prediction of producer/distributor coal stocks: Note that nonsampling error is a large influence here. (“Nonsampling error” usually refers to processing and other errors that are part of a survey whether it is a sample survey or a census survey (enumeration). Here it is being used to cover the same types of errors as “nonsampling error” would usually cover in a survey.) Revisions of more than a million short tons are typical for production. However, one change noted of more than 2 million short tons for producer/distributor stocks was far more substantial. That is, a change of 2 million out of hundreds of millions for a regressor is relatively minor, except that a great deal of precision is typically used. However, a change of 2 million out of tens of millions, for the variate of interest, is of much greater impact.
The purpose of this effort was to be able to estimate producer/distributor stocks (p/d stocks) for the first, second and third quarter of each calendar year. The fourth quarter data will be observed, so that a prediction for the fourth quarter will not be necessary. (Future prediction of the fourth quarters may be compared to collected (observed) values for purposes of evaluating the continued efficacy of the models, but the observed p/d value will continue to be the value reported for each of the fourth quarter periods in the future.) As described below, the size of the data file being used to predict the p/d stocks value each of the first three quarters in the future will not be increased over the 67 data points collected through the end of 1997, except that the first of the three models will be able to use an additional data point observed each year. SAS and FORTRAN source code found in the file QCR.PDSTOCKS.PROGRAM (a hardcopy is seen below) may be used as a module to be inserted into the Quarterly Coal Report (QCR) production software. Most of the work is done by SAS PROC REG, and as long as results from that SAS PROC may be used, there will not be a tremendous amount of programming to be done. QCR.PDSTOCKS.PROGRAM outputs the prediction for p/d stocks, and the estimated standard error of the prediction error. This standard error is typically 1.5 to 2 million short tons for the first quarter when in the previous quarter we actually observed data for p/d stocks. It is typically closer to 2.5 million short tons for the third quarter, when the previous two quarters have estimated p/d stocks. The second quarter results generally fall between these extremes. Therefore, the typical standard error of the prediction error will normally be about 2 million short tons. It is suggested that a footnote in the QCR should state this, and provide the standard error estimate for the current quarter as well. The input file for QCR.PDSTOCKS.PROGRAM is COAL.PDSTOCKS.DATAQCR. 
(See a hardcopy of this file below.) This input file will need to be updated for each new quarter.
The records in COAL.PDSTOCKS.DATAQCR are fixed with nine fields each. The first field contains the “Q value.” If that value is 1, 2 or 3, then that indicates the number of quarters since the last collection/observation of the p/d stocks volume datum. For the historical data, that value is a “1,” indicating that the previous p/d stock value was a collected value. If we are estimating for the first quarter of a new year, then all Q values will be “1,” except as noted here. Also, in this case, all p/d values will be actually collected/observed values, except as noted. An exception (to the value “1”) for the Q values is that once a predicted value is inserted for the p/d value, that record is to be flagged for exclusion in any future quarterly predictions for models M2 and M3. To that end, Q will be shown as a “0,” or a “4,” as will be explained.

Observed data will be available quarterly through 1997. In the first quarter of 1998, all Q values will be “1.” In the second quarter of 1998, all Q values will be “1,” except the first quarter of 1998, which will be changed to a “0,” and the second quarter of 1998, which will be a “2.” In the third quarter of 1998, Q values will all be “1,” except the first two quarters of 1998 which will both be “0,” and the third will have a “3” in that field. For the fourth quarter of 1998, a new p/d value will be collected. The Q value for the fourth quarter of this and subsequent years will always be “4,” (another ‘flag’ value, like “0”), because the previous quarter p/d stock value will never be reported. In the first quarter of 1999, all Q values will be “1,” except for “0” in each of the first three quarters of 1998, and a “4” in the fourth quarter 1998 record.
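The flagging schedule just described can be written as a small rule. The Python sketch below is one reading of that schedule; the function name, its arguments and the constant name are assumptions made for illustration and are not part of QCR.PDSTOCKS.PROGRAM.

```python
# Quarterly p/d stocks were observed through the end of this year.
LAST_FULLY_OBSERVED_YEAR = 1997


def q_flag(record_year, record_quarter, current_year, current_quarter):
    """Q value to store in a record when predicting the current quarter."""
    record = (record_year, record_quarter)
    current = (current_year, current_quarter)
    if record > current:
        raise ValueError("record lies after the current quarter")
    if record == current:
        # Quarters since p/d stocks were last observed; 4 flags a quarter
        # whose own p/d value is collected.
        return current_quarter
    if record_year <= LAST_FULLY_OBSERVED_YEAR:
        return 1  # historical record with an observed previous value
    # Later records: fourth quarters are flagged 4, all others 0.
    return 4 if record_quarter == 4 else 0


if __name__ == "__main__":
    # Flags for the 1998 records when predicting the first quarter of 1999.
    print([q_flag(1998, q, 1999, 1) for q in (1, 2, 3, 4)])
```

Under this reading, a record's flag depends only on its own date and on which quarter is current, so the whole first field of the file can be regenerated each quarter rather than edited by hand.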
Notice that from that point forward, the Q value for the current quarter will be a “1,” “2,” “3,” or “4,” depending on the quarter of the year; the Q value of every record up through the fourth quarter of 1997 will be a “1;” and all other Q values will be “0,” except other fourth quarter values that will each be a “4.” The reason for using the flag “4” is that unlike the records marked “0,” which are ignored in all of the predictions, a record with a Q value of “4” will be used in one of the three models (M1) being combined with the other two to provide the final prediction. (See the SAS and FORTRAN code under EXAMPLE COMPUTER CODE below.)

The second of the nine fields in COAL.PDSTOCKS.DATAQCR is the production value observed for the given quarter. The third field is for the volume of imports. (Note that all values in the record, other than the “Q value,” are in thousands of short tons.) The fourth field is for the current producer/distributor stocks. The fifth and sixth fields are for consumption and exports. The seventh and eighth fields are for current and previous quarter consumer stocks, respectively. In the last field, we find the previous quarter value for producer/distributor stocks. It may be either observed or predicted, as indicated by the value of Q.

After the 67th record in COAL.PDSTOCKS.DATAQCR is completed, using data for the fourth quarter of 1997, the 68th record will need to be filled with production and import volume values, etc. There will be one blank field, the fourth one, for the first quarter 1998 p/d stocks value. At that point, we may begin using QCR.PDSTOCKS.PROGRAM or its equivalent, embedded in the QCR production software, to ‘predict’ the ‘missing’ p/d stock values.
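For readers without SAS, the record layout can be illustrated with a short Python sketch. The column ranges and variable names are taken from the INPUT statement in the program listing; the function names, and the helper that builds a test line, are hypothetical. The usage lines apply it to the values of the second record of the data file.

```python
# (name, first column, last column), 1-based, as in the SAS INPUT statement.
FIELDS = [
    ("Q", 1, 1), ("X1", 5, 10), ("X2", 15, 20), ("Y", 25, 30),
    ("X3", 35, 40), ("X4", 45, 50), ("X5", 55, 60), ("X6", 65, 70),
    ("YLAST", 75, 80),
]


def parse_record(line):
    """Split one fixed-column record into a dict; blank fields become None."""
    record = {}
    for name, first, last in FIELDS:
        text = line[first - 1:last].strip()
        if not text:
            record[name] = None  # e.g. the p/d stocks field of a new quarter
        elif name == "Q":
            record[name] = int(text)
        else:
            record[name] = float(text)
    return record


def balance_total(r):
    """XT from the listing: everything in the balance except the loss term."""
    return (r["X1"] + r["X2"] - r["X3"] - r["X4"]
            + r["YLAST"] - r["X5"] + r["X6"])


def format_record(q, values):
    """Hypothetical helper: lay out Q plus the eight other fields in columns."""
    cells = ["" if v is None else str(v) for v in values]
    return str(q) + "   " + "    ".join(c.rjust(6) for c in cells)


if __name__ == "__main__":
    line = format_record(
        1, [234578, 232, 23417, 193011, 33978, 164970, 158274, 22265])
    rec = parse_record(line)
    print(balance_total(rec), rec["Y"] - balance_total(rec))
```

A record for a quarter still to be predicted would carry a blank fourth field, which this sketch returns as None rather than as zero.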
Acknowledgments: Many thanks to Richard S. Sigman whose comments on corrections to an early draft were essential. Also, others have brought the author’s attention to a technical correction and to points of interest that otherwise would not have been addressed. Thanks to all.
References:
Energy Information Administration, Office of Coal, Nuclear, Electric and Alternate Fuels (1996), Quarterly Coal Report, DOE/EIA-0121(96/4Q) (Washington DC).
Granger, C.W.J., and Newbold, P. (1986), Forecasting Economic Time Series, 2nd edition, Academic Press.
Hansen, M.H., Hurwitz, W.N., and Madow, W.G. (1953), Sample Survey Methods and Theory, Volume II (Theory), John Wiley & Sons.
Knaub, J.R., Jr. (1992), “More Model Sampling and Analyses Applied to Electric Power Data,” Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 876-881.
Knaub, J.R., Jr. (1997), “Weighting in Regression for Use in Survey Methodology,” InterStat, April 1997, http://interstat.stat.vt.edu/InterStat. (Note shorter, more recent version in ASA Proceedings of the Section on Survey Research Methods, 1997.)
Maddala, G.S. (1977), Econometrics, McGraw-Hill, Inc.
SAS Institute Inc. (1985), SAS User’s Guide: Statistics, Version 5 Edition, Cary, NC: SAS Institute Inc.
Graph 1: Loss as a function of X_SIGMA (Series 1). [Graph not reproduced.]
Graph 2: P/D Stocks (Y) as a function of X_SIGMA (Series 2). [Graph not reproduced.]
Note: Heteroscedasticity is not always apparent from a graph. When the number of observations is small, it often appears to change substantially with minor changes to the data, as it does in this application. Also, calculated regression weights can be very volatile. (Further, when there are many observations, some points on a graph may be obscured by others.)
EXAMPLE COMPUTER CODE:
Following is some simple SAS code to implement equations 6.1, 6.2 and 6.3. That is followed by very simple FORTRAN code for implementing equations 7.1 and 7.2. This program may be improved upon, but did generate Table 1, when parts of the data file shown below this program file were removed for testing purposes. (When testing was done, the Q value of the data record for the quarter treated as “current” was set according to the information supplied above, in section 9, “File maintenance for future prediction of producer/distributor coal stocks.”)
QCR.PDSTOCKS.PROGRAM

//JK7UPRED JOB (6944,LEN,,30),'PREDICT',TIME=(1,55),REGION=6000K
/*ROUTE PRINT HST
/*PHOLD
//SASINY EXEC SAS,REGION=5000K,WORK='80,40',SORT=100
//IN DD DSN=JK76944.COAL.PDSTOCKS.DATAQCR,DISP=SHR
//OUT DD DSN=JK76944.ASA.PRELIM.RESULTS1,DISP=SHR
//SASLOG DD SYSOUT=A
//SASLIST DD SYSOUT=A
//SYSIN DD *
OPTIONS NOSOURCE NONOTES REPLACE LINESIZE=132;
DATA SASREG;
INFILE IN;
INPUT Q 1-1 X1 5-10 X2 15-20 Y 25-30 X3 35-40 X4 45-50 X5 55-60 X6 65-70 YLAST 75-80;
*LOSS = Y - (YLAST + X1 + X2 - X3 - X4 - X5 + X6);
XT = X1 + X2 - X3 - X4 + YLAST - X5 + X6;
W = 1/XT;
IF Q EQ 0 THEN W = 0;
*PROC PLOT; *PLOT LOSS*XT / HZERO VZERO;
*PROC PLOT; *PLOT Y*XT / HZERO VZERO;
PROC REG NOPRINT;
MODEL Y=X1 X2 X3 X4 X5 X6 / P CLI;
WEIGHT W;
OUTPUT OUT=ODW P=YHPW STDI=YHSTPW;
DATA _NULL_; SET ODW; FILE OUT;
PUT @11 YHPW 10.2 @31 YHSTPW 10.2;
/*
//SASIXT EXEC SAS,REGION=5000K,WORK='80,40',SORT=100
//IN DD DSN=JK76944.COAL.PDSTOCKS.DATAQCR,DISP=SHR
//OUT DD DSN=JK76944.ASA.PRELIM.RESULTS2,DISP=SHR
//SASLOG DD SYSOUT=A
//SASLIST DD SYSOUT=A
//SYSIN DD *
OPTIONS NOSOURCE NONOTES REPLACE LINESIZE=132;
DATA SASREG;
INFILE IN;
INPUT Q 1-1 X1 5-10 X2 15-20 Y 25-30 X3 35-40 X4 45-50 X5 55-60 X6 65-70 YLAST 75-80;
LOSS = Y - (YLAST + X1 + X2 - X3 - X4 - X5 + X6);
XT = X1 + X2 - X3 - X4 + YLAST - X5 + X6;
IF Q EQ 1 THEN W = 1/XT;
IF Q EQ 2 THEN W = (1/XT)*(1/(1+(0.63**2)));
IF Q EQ 3 THEN W = (1/XT)*(1/(1+(0.63**2)+(0.63**4)));
IF Q EQ 0 THEN W = 0;
IF Q EQ 4 THEN W = 0;
*PROC PLOT; *PLOT LOSS*XT / HZERO VZERO;
*PROC PLOT; *PLOT Y*XT / HZERO VZERO;
PROC REG;
MODEL Y=XT / P CLI;
WEIGHT W;
*RESTRICT X1 = 1;
OUTPUT OUT=ODW P=YHPW STDI=YHSTPW;
*PROC PRINT;
*DATA ODW;*VAR YHSTPW;
*PROC PRINT;
DATA _NULL_; SET ODW; FILE OUT;
PUT @11 YHPW 10.2 @31 YHSTPW 10.2;
/*
//SAS7NI EXEC SAS,REGION=5000K,WORK='80,40',SORT=100
//IN DD DSN=JK76944.COAL.PDSTOCKS.DATAQCR,DISP=SHR
//OUT DD DSN=JK76944.ASA.PRELIM.RESULTS3,DISP=SHR
//SASLOG DD SYSOUT=A
//SASLIST DD SYSOUT=A
//SYSIN DD *
OPTIONS NOSOURCE NONOTES REPLACE LINESIZE=132;
DATA SASREG;
INFILE IN;
INPUT Q 1-1 X1 5-10 X2 15-20 Y 25-30 X3 35-40 X4 45-50 X5 55-60 X6 65-70 YLAST 75-80;
LOSS = Y - (YLAST + X1 + X2 - X3 - X4 - X5 + X6);
XT = X1 + X2 - X3 - X4 + YLAST - X5 + X6;
IF Q EQ 1 THEN W = 1/XT;
IF Q EQ 2 THEN W = (1/XT)*(1/(1+(0.76**2)));
IF Q EQ 3 THEN W = (1/XT)*(1/(1+(0.76**2)+(0.76**4)));
IF Q EQ 0 THEN W = 0;
IF Q EQ 4 THEN W = 0;
*PROC PLOT; *PLOT LOSS*XT / HZERO VZERO;
*PROC PLOT; *PLOT Y*XT / HZERO VZERO;
PROC REG;
MODEL Y=X1 X2 X3 X4 X5 X6 YLAST / P NOINT CLI;
WEIGHT W;
OUTPUT OUT=ODW P=YHPW STDI=YHSTPW;
DATA _NULL_; SET ODW; FILE OUT;
PUT @11 YHPW 10.2 @31 YHSTPW 10.2;
/*
//FORSTR EXEC FORT1CLG
//FORT.SYSIN DD *
      REAL*16 SPM1,SPM2,SPM3,YPM1,YPM2,YPM3,SC,YC,D
      INTEGER Y1,S1,Y2,S2,Y3,S3,YL,SL,Y,S
C
   11 READ(21,*,END=12)YPM1,SPM1
      GO TO 11
   12 READ(22,*,END=13)YPM2,SPM2
      GO TO 12
   13 READ(23,*,END=14)YPM3,SPM3
      GO TO 13
   14 D=(SPM1**2)*(SPM2**2)+(SPM1**2)*(SPM3**2)+(SPM2**2)*(SPM3**2)
      SC = SQRT(((SPM1**2)*(SPM2**2)*(SPM3**2))/D)
      YC = (SPM2**2)*(SPM3**2)*YPM1
      YC = YC +(SPM1**2)*(SPM3**2)*YPM2
      YC = YC + (SPM1**2)*(SPM2**2)*YPM3
      YC = YC/D
C
C     WRITE(6,*)YC,SC
      Y1 = YPM1 + 0.5
      Y2 = YPM2 + 0.5
      Y3 = YPM3 + 0.5
      S1 = SPM1 + 0.5
      S2 = SPM2 + 0.5
      S3 = SPM3 + 0.5
      YL = YC + 0.5
      SL = SC + 0.5
      Y = (YC/1000.) + 0.5
      S = (SC/1000.) + 0.5
      WRITE(6,123)Y1,S1,Y2,S2,Y3,S3
  123 FORMAT(3(9X,I5,5X,I4),//)
      WRITE(6,234)YL,SL
  234 FORMAT(13X,I5,5X,I4,////)
      WRITE(6,345)Y,S
  345 FORMAT(13X,I5,2X,I4)
      STOP
      END
//GO.FT21F001 DD DSN=JK76944.ASA.PRELIM.RESULTS1,
// DISP=SHR
//GO.FT22F001 DD DSN=JK76944.ASA.PRELIM.RESULTS2,
// DISP=SHR
//GO.FT23F001 DD DSN=JK76944.ASA.PRELIM.RESULTS3,
// DISP=SHR
//* DISP=SHR,LABEL=(,,,IN)
/*
DATA: Here are the data, with revisions, as available in January 1998, as described in section 9 above. Data were being revised at the same time testing was being done, and more revisions are likely to be found for the data below in future publications of the Quarterly Coal Report.
COAL.PDSTOCKS.DATAQCR:
(Column names follow the SAS INPUT statement; all values other than Q are in thousands of short tons.)

Rec.  Q      X1     X2       Y      X3      X4      X5      X6   YLAST
  1.  1  135908    298   22265  169358   20514  158274  207340   23859
  2.  1  234578    232   23417  193011   33978  164970  158274   22265
  3.  1  238178    297   24149  182874   35773  185274  164970   23417
  4.  1  222250    114   32995  186675   25564  179484  185274   24149
  5.  1  216434    128   36876  164067   31621  198377  179484   32995
  6.  1  201384    271   39678  184091   25249  189967  198377   36876
  7.  1  198043    230   36784  172078   23842  195254  189967   39678
  8.  1  192282    269   39867  173144   15144  192315  195254   36784
  9.  1  186844    379   37761  165381   20345  197033  192315   39867
 10.  1  198816    299   35249  205741   22045  173743  197033   37761
 11.  1  204150    323   33931  192406   20238  168654  173743   35249
 12.  1  223115    276   34265  199942   15125  174283  168654   33931
 13.  1  230226    269   30841  185817   23737  194065  174283   34265
 14.  1  244592    435   29701  208113   25263  208019  194065   30841
 15.  1  197988    306   34090  197424   17357  197211  208019   29701
 16.  1  213238    330   35371  205484   18544  179454  197211   34090
 17.  1  228160    500   35197  193529   24208  188013  179454   35371
 18.  1  222827    623   32632  213937   25831  176195  188013   35197
 19.  1  219414    500   33133  205099   24097  170234  176195   32632
 20.  1  227974    485   38024  206314   17245  166398  170234   33133
 21.  1  220001    576   38148  188673   24170  176018  166398   38024
 22.  1  218681    537   33804  212671   23687  164885  176018   38148
 23.  1  223659    614   32093  196654   20416  175226  164885   33804
 24.  1  222199    331   36560  199523   16576  173173  175226   32093
 25.  1  218823    483   33939  199627   20113  176037  173173   36560
 26.  1  232958    475   28775  229397   21033  165598  176037   33939
 27.  1  244782    459   28321  208394   21885  185459  165598   28775
 28.  1  236889    542   36764  220787   16061  175279  185459   28321
 29.  1  226645    587   36079  205735   24900  173308  175279   36764
 30.  1  241622    437   31360  238672   27691  154331  173308   36079
 31.  1  245109    567   30418  218448   26371  158413  154331   31360
 32.  1  247179    531   35508  223486   21429  149238  158413   30418
 33.  1  239022    687   30598  208025   28445  159013  149238   35508
 34.  1  243060    925   28848  232026   23991  147165  159013   30598
 35.  1  251468    708   29000  226163   26949  146087  147165   28848
36.  1  264184   735  35099  217014  22383  160782  146087  29000
37.  1  254279   674  36895  211666  27733  173061  160782  35099
38.  1  254760   514  33659  240821  29497  161639  173061  36895
39.  1  255853   776  33418  225978  26191  168210  161639  33659
40.  1  254746   938  42162  219208  22318  171485  168210  33418
41.  1  237006   730  41054  208757  26214  173663  171485  42162
42.  1  251438   984  33628  236093  31197  163860  173663  41054
43.  1  252794   738  32971  223562  29239  167711  163860  33628
44.  1  255956   679  39853  220594  24731  168632  167711  32971
45.  1  242735  1043  40513  210037  27010  173270  168632  39853
46.  1  249055   882  35198  237698  26481  161878  173270  40513
47.  1  249799  1199  33993  224093  24294  163692  161878  35198
48.  1  243417  1213  38453  229165  18870  152619  163692  33993
49.  1  233750  1093  34827  214820  19946  154842  152619  38453
50.  1  227131  2142  27183  249872  18522  121909  154842  34827
51.  1  241127  2861  25284  232087  17181  120458  121909  27183
52.  1  255153  1850  34139  237596  14877  112278  120458  25284
53.  1  256964  1577  35758  223145  17940  126694  112278  34139
54.  1  260853  2304  32955  245820  19704  121225  126694  35758
55.  1  260535  1853  33219  223640  18838  136139  121225  32955
56.  1  266244  1795  42460  227695  18988  144004  136139  33219
57.  1  248613  1609  42104  217496  23184  151657  144004  42460
58.  1  257097  1725  36193  259415  22175  131739  151657  42104
59.  1  257782  2071  34444  236274  24201  134639  131739  36193
60.  1  259756  1713  36851  243360  20516  124760  134639  34444
61.  1  263397  1552  37344  229264  23039  134267  124760  36851
62.  1  272118  2071  33780  259657  23504  127595  134267  37344
63.  1  268585  1790  28648  251053  23414  123024  127595  33780
64.  1  273927  1331  37544  245813  20011  119847  123024  28648
65.  1  269701  1708  42529  232945  20603  128087  119847  37544
66.  1                                              128087  42529
67.  1
68.  1
.
(The last part of this file is to be completed as data become available.)
Appendix: Alternative Development of Equation (4.16)

Consider the general equation in the form

$$ y_i = T' X_i' + b_y\, y_{i-1}^{(Q)} + w_i^{-1/2} e_{0_i} , \qquad \text{Eq(A.1)} $$

a generalized version of Eq.(4.6), where $Q$ is the number of periods since $y$ was last observed, and therefore $y_{i-1}^{(Q)}$ is given the previous observed value of $y_{i-1}$ if $Q = 1$, or is described by one or more regression equations when $Q > 1$.
If $Q = 1$, then

$$ y_{i-1}^{(Q)} = y_{i-1} , \qquad \text{Eq(A.2)} $$

and

$$ \sigma^2_{y_i (Q=1)} = w_i^{-1} \sigma_e^2 \qquad \text{Eq(A.3) and Eq(4.13)} $$
If $Q = 2$, then

$$ y_{i-1}^{(Q)} = T' X_{i-1}' + b_y\, y_{i-2} + w_{i-1}^{-1/2} e_{0_{i-1}} \qquad \text{Eq(A.4)} $$
Substituting Eq(A.4) into Eq(A.1), for $Q = 2$ one has

$$ y_i = T' X_i' + b_y \left( T' X_{i-1}' + b_y\, y_{i-2} + w_{i-1}^{-1/2} e_{0_{i-1}} \right) + w_i^{-1/2} e_{0_i} \qquad \text{Eq(A.5)} $$

This is not a very useful format. However, since the component variances may be added, we have

$$ \sigma^2_{y_i (Q=2)} = w_i^{-1} \sigma_e^2 + b_y^2\, w_{i-1}^{-1} \sigma_e^2 \qquad \text{Eq(A.6) and Eq(4.14)} $$
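The step from Eq(A.5) to Eq(A.6) can be spelled out as a short worked derivation. It is a sketch under assumptions that are stated here for illustration and are not quoted from the paper: the two error terms are taken to be uncorrelated, each with variance $\sigma_e^2$, and every other quantity in Eq(A.5) is treated as fixed.

```latex
\begin{aligned}
\operatorname{Var}\!\left( b_y\, w_{i-1}^{-1/2} e_{0_{i-1}} + w_i^{-1/2} e_{0_i} \right)
  &= b_y^2\, w_{i-1}^{-1} \operatorname{Var}\!\left( e_{0_{i-1}} \right)
     + w_i^{-1} \operatorname{Var}\!\left( e_{0_i} \right) \\
  &= \sigma_e^2 \left( w_i^{-1} + b_y^2\, w_{i-1}^{-1} \right)
\end{aligned}
```

Only the two random terms of Eq(A.5) contribute, which is why the fixed part drops out of the variance.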
and therefore (for $Q = 2$)

$$ y_i = T' X_i' + b_y\, \hat{y}_{i-1}^{(Q=2)} + e_{0_i} \sqrt{ b_y^2\, w_{i-1}^{-1} + w_i^{-1} } \qquad \text{Eq(A.7)} $$
Further, for $Q = 3$, one has

$$ \sigma^2_{y_i (Q=3)} = w_i^{-1} \sigma_e^2 + b_y^2\, w_{i-1}^{-1} \sigma_e^2 + b_y^4\, w_{i-2}^{-1} \sigma_e^2 \qquad \text{Eq(A.8) and Eq(4.15)} $$
So, in general,

$$ \sigma^2_{y_i (Q)} = \sigma_e^2 \sum_{q=1}^{Q} w_{i+1-q}^{-1}\, b_y^{2(q-1)} \qquad \text{Eq(A.9)} $$
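To make the general sum in Eq(A.9) concrete, the following Python sketch evaluates it directly. The function name, argument layout, and numerical inputs are hypothetical choices made for illustration; only the formula itself comes from the text.

```python
def prediction_error_variance(weights, b_y, sigma_e2, Q):
    """Evaluate Eq(A.9): sigma_e^2 * sum_{q=1..Q} w_{i+1-q}^{-1} * b_y^(2(q-1)).

    weights  -- regression weights, most recent first: [w_i, w_{i-1}, ...]
    b_y      -- coefficient on the lagged y term
    sigma_e2 -- the variance sigma_e^2
    Q        -- number of periods since y was last observed (Q >= 1)
    """
    if Q < 1 or len(weights) < Q:
        raise ValueError("need Q >= 1 and at least Q weights")
    total = 0.0
    for q in range(1, Q + 1):
        # weights[q - 1] is w_{i+1-q} because the list is most recent first
        total += b_y ** (2 * (q - 1)) / weights[q - 1]
    return sigma_e2 * total


# Hypothetical inputs, chosen only so the arithmetic is easy to follow.
w = [4.0, 2.0, 1.0]  # w_i, w_{i-1}, w_{i-2}
for Q in (1, 2, 3):
    print(Q, prediction_error_variance(w, b_y=0.5, sigma_e2=8.0, Q=Q))
```

With these inputs the values for $Q = 1, 2, 3$ correspond term by term to Eq(A.3), Eq(A.6) and Eq(A.8).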
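Therefore, Eq(A.1) can be rewritten as

$$ y_i = T' X_i' + b_y\, \hat{y}_{i-1}^{(Q)} + e_{0_i} \sqrt{ \sum_{q=1}^{Q} w_{i+1-q}^{-1}\, b_y^{2(q-1)} } \qquad \text{Eq(A.10) and Eq(4.16)} $$

where

$$ \hat{y}_{i-1}^{(Q=1)} = y_{i-1} , $$

$$ \hat{y}_{i-1}^{(Q=2)} = T' X_{i-1}' + b_y\, y_{i-2} , $$

$$ \hat{y}_{i-1}^{(Q=3)} = T' X_{i-1}' + b_y\, \hat{y}_{i-2}^{(Q=2)} , \ \ldots , $$

$$ \hat{y}_{i-1}^{(Q)} = T' X_{i-1}' + b_y\, \hat{y}_{i-2}^{(Q-1)} . $$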
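The recursion behind Eq(A.10) can be illustrated with a short Python sketch that chains predictions forward from the last observed value and carries the standard error along. This is a minimal illustration, not the paper's SAS or Fortran processing: the function name is hypothetical, the non-lag part of the model is assumed to be supplied as one precomputed number per period, and the inputs are made up.

```python
import math


def chain_predictions(fixed_terms, weights, b_y, sigma_e, y_last):
    """Predict successive unobserved periods from the last observed y.

    fixed_terms -- non-lag part of the model for each unobserved period,
                   oldest first (assumed to be precomputed)
    weights     -- regression weights w for the same periods, oldest first
    b_y         -- coefficient on the lagged y term
    sigma_e     -- the standard deviation sigma_e
    y_last      -- last observed value of y
    Returns a list of (prediction, standard error); entry Q-1 is the
    period that lies Q periods after the last observation.
    """
    results = []
    y_prev = y_last
    var_sum = 0.0  # running value of sum_q w^{-1} * b_y^(2(q-1))
    for fixed, w in zip(fixed_terms, weights):
        y_hat = fixed + b_y * y_prev
        # Older error terms pick up one more factor of b_y^2, and the
        # current period adds its own 1/w term.
        var_sum = b_y ** 2 * var_sum + 1.0 / w
        results.append((y_hat, sigma_e * math.sqrt(var_sum)))
        y_prev = y_hat
    return results


# Hypothetical inputs for three consecutive unobserved periods.
for y_hat, se in chain_predictions([10.0, 12.0, 14.0], [1.0, 2.0, 4.0],
                                   b_y=0.5, sigma_e=2.0, y_last=20.0):
    print(y_hat, round(se, 4))
```

Updating the running sum one period at a time gives the same value as evaluating the sum in Eq(A.9) from scratch for each $Q$, and it avoids recomputing earlier terms.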