
Econometrica, Vol. 39, No. 3 (May, 1971)

REGRESSION WITH NON-GAUSSIAN STABLE DISTURBANCES: SOME SAMPLING RESULTS¹

BY ROBERT BLATTBERG AND THOMAS SARGENT

1. INTRODUCTION

ECONOMISTS OFTEN use the basic regression model

(1)   Y_j = BX_j + U_j   (j = 1, ..., T),

where Y_j is the dependent variable at j, X_j is a nonstochastic independent variable at j, B is a parameter, and the U_j's are independent, identically distributed random variables with mean zero. Economists usually estimate the parameter B by using the method of least squares. Aside from its computational simplicity, this method derives much of its popularity from the existence of the Gauss-Markov theorem and the central limit theorem. The Gauss-Markov theorem states that if the U_j's in (1) follow a distribution with finite variance, then the least squares estimator has the minimum variance of all linear unbiased estimators of B. The central limit theorem is important because it establishes a foundation for hypothesis testing. It states that the sum of a large number of independently distributed variates, each of which follows a distribution with finite variance, will tend to be normally distributed. The relevance of this theorem is seen when it is recalled that the U_j's in (1) are usually said to represent the impact on the Y_j's of the sum of a very large number of omitted independent variables, each of which is itself insufficiently important to include in the regression. If these omitted variables follow distributions with finite variances and if they enter additively, the U_j's will be normally distributed. This permits us to make the usual t and F tests. In addition, it means that least squares is the maximum likelihood estimator.

The assumption that the U_j's or the variates underlying them have a finite variance thus occupies an important role in justifying the use of least squares. In view of this fact, the accumulating body of evidence which suggests that many economic variables are best characterized as having infinite variances acquires special relevance.² If the omitted variables whose impacts are summarized by the U_j's have infinite variances, the assumptions of the central limit theorem fail to hold and the U_j's will not be normally distributed. Moreover, it is likely that the U_j's will be characterized by an infinite variance. Hence, that part of the Gauss-Markov theorem which demonstrates the minimum variance or efficiency property of least squares is no longer applicable. Thus in a world of infinite variances, both the efficiency of least squares and the applicability of normal distribution theory for hypothesis testing are lost.

¹ The authors thank John R. Meyer, Richard Roll, and the participants in seminars at the University of Toronto, the University of Chicago, and the National Bureau of Economic Research for their helpful comments. A version of this paper was presented at the December, 1968 meetings of the Econometric Society.
² For example, see [2, 7, 8, and 11].


This paper investigates the performance of various estimators of B where the U_j's in (1) follow distributions which have fatter extreme tails than does the normal distribution. More precisely, we begin by investigating the consequences of the assumption that the U_j's follow a member of the class of symmetric stable Paretian distributions, a class which contains distributions with infinite variances.³ The symmetric stable Paretian distributions are defined by the log characteristic function

(2)   log φ(t) = iδt − |σt|^α,

where t is a real number, α is the characteristic exponent, δ the location parameter, and σ^α a scale parameter. Inspection of (2) reveals that where α = 2, the distribution is normal with mean δ and variance 2σ². If α = 1, the distribution is Cauchy with central tendency δ and semi-interquartile range σ. Differentiation of (2) demonstrates that where α is less than two, only absolute moments of order less than α exist.⁴ Hence, for α less than two, the variance is infinite, although the mean exists provided that α is strictly greater than one. An important characteristic of this class of distributions is that they are stable or invariant under addition. That is, a sum of independent symmetric stable variates with characteristic exponent α will also be distributed according to a symmetric stable distribution with the same parameter α.⁵ In addition, the stable Paretian distributions possess a domain-of-attraction property which generalizes the classical central limit theorem. In view of the body of evidence which suggests that many economic variables have infinite variances, the standard use of the central limit theorem to support the assumption that the U_j's in (1) are normally distributed thus actually suggests the weaker condition that the U_j's follow a stable distribution.

In this paper we investigate the performance of two types of estimators which have been suggested for use in the context of stable disturbances. The first type is the class of best (minimum dispersion) linear unbiased estimators which has been proposed by Wise [13]. The second is the estimator which minimizes the sum of absolute errors in (1). These estimators are discussed in Section 2. Section 3 then presents the results of some sampling experiments designed to assess the performances of these estimators. Our conclusions are stated in Section 4.

³ For a description of the properties of stable distributions, see Feller [4].
⁴ Recall that by differentiating the log characteristic function n times with respect to t and evaluating the result at t = 0, the nth cumulant of the distribution is obtained. The moments can then be obtained from the cumulants in the standard way.
⁵ This can be proved easily by using the theorem that the log characteristic function of a sum of independently distributed variables is the sum of their log characteristic functions.
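The tail and stability properties just described can be illustrated with a short simulation. The sketch below is illustrative rather than part of the original study; it assumes SciPy's levy_stable generator, standing in for the Fama and Roll [3] method that the paper's own experiments use in Section 3.

```python
# Illustrative sketch: simulate symmetric stable draws and check two properties
# used in the text -- fat tails (infinite variance) and stability under addition.
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(0)
alpha = 1.5                      # characteristic exponent, 1 < alpha < 2
n = 100_000

# Symmetric case: beta = 0 (these draws play the role of the U_j's).
u = levy_stable.rvs(alpha, 0.0, size=n, random_state=rng)

# Fat tails: for alpha < 2 the sample second moment does not settle down as n grows.
for m in (1_000, 10_000, 100_000):
    print(m, np.mean(u[:m] ** 2))

# Stability under addition: a sum of k independent stable(alpha) variates,
# rescaled by k**(1/alpha), is again stable(alpha) with the original scale;
# comparing interquartile ranges gives a rough check.
k = 10
sums = levy_stable.rvs(alpha, 0.0, size=(n, k), random_state=rng).sum(axis=1)
rescaled = sums / k ** (1.0 / alpha)
print(np.percentile(u, [25, 75]), np.percentile(rescaled, [25, 75]))
```

In such a simulation the sample variance keeps drifting as more observations are added when α < 2, which is exactly the infinite-variance behavior emphasized above.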

2. ALTERNATIVE ESTIMATORS

Wise [13] has suggested generalizing the concept of best linear unbiased estimator to include the case where disturbances are stable Paretian. As far as we know, however, Wise has not written down an analytic expression for such estimators. For this reason, we present a derivation of the best linear unbiased estimator of B in equation (1) where the disturbances are distributed according to a symmetric stable distribution. Following Wise, we seek that estimator of B in (1) which is a linear function of the Y_j's (b(α) = Σ_j C_j Y_j), which minimizes the dispersion parameter of b(α) (which is proportional to Σ_j |C_j|^α), and which is unbiased, requiring that Σ_j C_j X_j = 1. To see that the dispersion parameter of b(α) is proportional to Σ_j |C_j|^α, recall that the U_j's have log characteristic functions given by log φ_U(t) = −|σt|^α. The estimator b(α) is given by the following linear combination of the U_j's:

b(α) = B Σ_j C_j X_j + Σ_j C_j U_j,

where Σ_j C_j X_j = 1. Then the log characteristic function of b(α) is given by

(3)   log φ_{b(α)}(t) = iBt − |σt|^α Σ_j |C_j|^α.

Since σ is a constant, by minimizing Σ_j |C_j|^α we minimize the scale parameter or dispersion of the distribution of b(α). Notice that for the normal distribution (α = 2), this leads to minimizing the variance of the estimator. Where α > 1 the conditional minimization is performed by minimizing the Lagrangian expression

J = Σ_j |C_j|^α − λ(Σ_j C_j X_j − 1),

where λ is the undetermined Lagrangian multiplier. The first order conditions are

(4)   ∂J/∂C_j = α|C_j|^(α−1) sign C_j − λX_j = 0   (j = 1, ..., T),

      ∂J/∂λ = Σ_j C_j X_j − 1 = 0.

Rearranging (4) we have

(5)   α|C_j|^(α−1) sign C_j = λX_j   (j = 1, ..., T).

Notice that α|C_j|^(α−1) > 0 (j = 1, ..., T), so that sign C_j agrees with the sign of λX_j. If λ is negative, sign C_j = −sign X_j for all j. But since Σ_j C_j X_j = 1, sign C_j must equal sign X_j for all j. Thus, multiplying (5) by sign X_j, we have

(6)   α|C_j|^(α−1) = λ|X_j|,

      |C_j|^(α−1) = (λ/α)|X_j|,

      |C_j| = (λ/α)^(1/(α−1)) |X_j|^(1/(α−1)).


Hence

(7)   C_j = (λ/α)^(1/(α−1)) |X_j|^(1/(α−1)) sign X_j.

We also have that

Σ_j C_j X_j = 1.

Then

(λ/α)^(1/(α−1)) Σ_j |X_j|^(α/(α−1)) = 1,

and

(8)   (λ/α)^(1/(α−1)) = 1 / Σ_j |X_j|^(α/(α−1)).

Substituting (8) into (7) we have

C_j = |X_j|^(1/(α−1)) sign X_j / Σ_j |X_j|^(α/(α−1)).

Thus for α > 1, b(α) is given by

(9)   b(α) = Σ_j |X_j|^(1/(α−1)) Y_j sign X_j / Σ_j |X_j|^(α/(α−1)).

Notice that for α = 2 we have

b(2) = Σ_j X_j Y_j / Σ_j X_j²,

which is the familiar ordinary least squares estimator. For α = 1, the Cauchy case, the best linear unbiased estimator has the form

b(1) = Y_r / X_r,

where r corresponds to the observation which satisfies X_r = max_j X_j.
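As a sketch of how (9) can be applied in practice, the following function (illustrative names, not taken from the paper) computes b(α) for a given α; for α = 2 the weights collapse to the ordinary least squares weights, and as α approaches one they concentrate on the observation with the largest |X_j|.

```python
# Sketch of the best linear unbiased estimator b(alpha) of equation (9).
import numpy as np

def blue_stable(y, x, alpha):
    """b(alpha) = sum_j |X_j|^(1/(alpha-1)) sign(X_j) Y_j / sum_j |X_j|^(alpha/(alpha-1)), alpha > 1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p = 1.0 / (alpha - 1.0)
    weights = np.abs(x) ** p * np.sign(x)                   # numerator weights on Y_j
    return weights @ y / np.sum(np.abs(x) ** (alpha * p))   # alpha * p = alpha/(alpha-1)

# Quick check that b(2) equals the least squares estimator through the origin.
rng = np.random.default_rng(1)
x = rng.uniform(40, 120, size=50)
y = 3.0 * x + rng.standard_normal(50)
print(blue_stable(y, x, 2.0), (x @ y) / (x @ x))
```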

In addition to several members of the class of best linear unbiased estimators (BLUE), we have included in this study the minimum sum of absolute errors (MSAE) estimator which Mandelbrot [8] and Fama [2], among others, have suggested might be used in the context of Paretian disturbances where α is less than two. The MSAE estimator was initially developed by Charnes, Cooper, and Ferguson [1] and was studied further by Wagner [12] and Fisher [5]. The estimator minimizes the criterion Σ_j |û_j|, where û_j = Y_j − b̂X_j, j = 1, ..., T, and where b̂ is the MSAE estimate. The estimator is calculated using a linear programming algorithm. The solution has the form b̂ = Y_l / X_l, where l is the observation in the optimal basis. Since it places less weight on extreme observations than does least squares, MSAE seems a natural estimator to apply in cases where extreme observations occur more frequently than where the disturbances are normally distributed. To make this a bit more precise, suppose that the U_j's in (1) are independently and identically distributed according to the second law of Laplace (two-tailed exponential distribution) with density function

f(U_j) = (1/(2λ)) exp(−|U_j|/λ),

where the variance of U_j is given by 2λ². This distribution is of some interest in the context of this paper, since it shares with the non-Gaussian stable distributions the property that, relative to the normal distribution, it is both more peaked and also denser in the extreme tails. If the U_j's follow this distribution, the likelihood function associated with (1) is given by

L = (2λ)^(−T) exp(−(1/λ) Σ_j |Y_j − BX_j|).

The log of the likelihood function is then

log L = −T log(2λ) − (1/λ) Σ_j |Y_j − BX_j|,

which is maximized with respect to B by minimizing Σ_j |Y_j − BX_j|. This establishes that MSAE is the maximum likelihood estimator where the disturbances follow the second law of Laplace. The maximum likelihood property of the MSAE estimator in the context of such a fat-tailed distribution of disturbances suggests that it may be a good estimator in the context of other fat-tailed distributions like the non-Gaussian stable distributions.

To summarize, where the U_j's in (1) are assumed to be distributed according to a stable symmetric Paretian distribution with α > 1 and mean zero, we propose to investigate the performance of the following estimators: (i) ordinary least squares (OLS); (ii) a class of best linear unbiased estimators which includes OLS as a special case; (iii) minimum sum of absolute errors (MSAE). Under the assumed conditions, all three estimators are consistent, least squares and the best linear unbiased estimators also being known to be unbiased. Although we know of no proof of its lack of bias, the Monte Carlo studies summarized below convinced us that MSAE also is probably an unbiased estimator. Hence, our main interest centers on the relative efficiencies of the estimators. In the next section, we present the results of some sampling studies designed to assess this dimension of the estimators' performance.
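The linear programming calculation of the MSAE estimate can be sketched as follows, assuming SciPy's linprog; the paper does not tie the estimator to any particular LP code, so this is only one possible implementation. The residuals are written as differences of nonnegative parts whose sum is minimized.

```python
# Sketch of the MSAE estimator: minimize sum_j |Y_j - b X_j| as a linear program.
# Variables are (b, e_1^+, ..., e_T^+, e_1^-, ..., e_T^-) with Y_j - b X_j = e_j^+ - e_j^-.
import numpy as np
from scipy.optimize import linprog

def msae(y, x):
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    T = len(y)
    c = np.concatenate(([0.0], np.ones(2 * T)))           # minimize the sum of e^+ and e^-
    A_eq = np.hstack([x.reshape(-1, 1), np.eye(T), -np.eye(T)])
    bounds = [(None, None)] + [(0, None)] * (2 * T)       # b is free, error parts nonnegative
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds)
    return res.x[0]                                       # the MSAE estimate of B

rng = np.random.default_rng(2)
x = rng.uniform(40, 120, size=50)
y = 3.0 * x + rng.standard_cauchy(50)                     # fat-tailed disturbances
print(msae(y, x))
```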


3. SAMPLING RESULTS

Inclusion of MSAE among the estimators studied here introduces problems. Unlike the BLUE estimators, no analytic expression can be written down for the MSAE estimator. MSAE can only be calculated by applying a linear programming algorithm to the data at hand. This inability to write down a closed expression for the estimator makes it very difficult to determine its distribution. We are unaware of any analytic results on the sampling properties of the estimator, aside from the discussion of MSAE's consistency in Charnes, Cooper, and Ferguson [1]. In the face of the analytical intractability of MSAE, we propose to use sampling studies to throw some light on the properties of the various estimators.

In addition to OLS and MSAE, the performances of the b(α) estimators b(1.1), b(1.3), b(1.5), b(1.7), and b(1.9) were studied in sampling or "Monte Carlo" experiments. For each experiment one hundred replications of fifty observations each were generated. The X_j's were generated according to the scheme X_j = .3X_{j−1} + e_j, where the e_j's were independent, identically distributed random variables distributed uniformly on the interval [40, 120]. The same X_j's were used throughout the study since this corresponds to the standard specification about the fixity of the independent variables used in regression analysis.⁶ The U_j's, distributed according to a stable symmetric Paretian distribution with expected value zero, were produced by the method of Fama and Roll [3]. The Y_j's were then generated from the model Y_j = BX_j + U_j (j = 1, ..., 50) for each replication. For each experiment, B equaled three. Separate experiments were run for U_j's with the following six values of α: 1.1, 1.3, 1.5, 1.7, 1.9, and 2. Since one has to specify a value of α prior to calculating the b(α) estimator, the fact that we include such estimators for several α's in each experiment will provide us information on the loss associated with calculating the estimator b(α₁) when the value of α characterizing the disturbances is some α₂ ≠ α₁.

For each estimator, equation (1) was estimated for the one hundred replications of each experiment. The resulting sample of one hundred estimates of B was viewed as an estimate of the population distribution of the estimator. Two measures of the dispersion of the estimates were calculated for each such sample distribution. The first is the mean absolute deviation (MAD) of the estimated values from the true value. The MAD associated with the estimator b̂ is defined as

MAD(b̂) = (1/100) Σ_i |b̂_i − B|,

where i is an index running over replications.

⁶ Sets of experiments were also carried out for X's generated by several other schemes. The results agreed in all important respects with those reported in the text.
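A condensed sketch of one such experiment follows. SciPy's levy_stable generator stands in for the Fama and Roll [3] procedure actually used, and OLS stands in for the full set of estimators; the design (fixed X's, B = 3, fifty observations, one hundred replications) follows the description above.

```python
# Sketch of one sampling experiment: 100 replications of 50 observations,
# X_j = .3 X_{j-1} + e_j with e_j uniform on [40, 120], B = 3, stable U_j.
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(3)
B, T, reps, alpha = 3.0, 50, 100, 1.5

# Fixed regressors, generated once and reused across replications.
x = np.empty(T)
x_prev = 0.0
for j in range(T):
    x_prev = 0.3 * x_prev + rng.uniform(40, 120)
    x[j] = x_prev

estimates = np.empty(reps)
for i in range(reps):
    u = levy_stable.rvs(alpha, 0.0, size=T, random_state=rng)
    y = B * x + u
    estimates[i] = (x @ y) / (x @ x)        # OLS; b(alpha) or MSAE would be used here instead

print(estimates.mean(), np.mean(np.abs(estimates - B)))   # rough unbiasedness check and MAD
```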

The second measure is an estimate of the dispersion parameter σ of the characteristic function of the estimator. This seems an interesting measure since the BLUE estimators, being linear combinations of the U_j's, are distributed according to symmetric stable Paretian distributions with characteristic exponent α equal to that of the U_j's. The estimator of σ was

S(b̂) = (b̂_.72 − b̂_.28) / 1.654,

where b̂_f is the fth fractile of the sample distribution of the estimates b̂. Fama and Roll [3] have shown that this estimator possesses an asymptotic bias of less than four-tenths of one per cent. However, its distribution in finite samples is not known. Of course, for the b(α) estimators, population values of the dispersion parameter can be calculated analytically using relation (3) given above. We report the population values below, but in addition it seems worthwhile to report S(b̂) for the linear estimators, both to serve as a rough check on the character of our sample and to make possible the comparison of the MSAE and b(α) estimators on the basis of the same set of data.

The results of our experiments are recorded in Tables I and II. We refer first to Table I, which reports S(b̂) for each estimator and, in parentheses, the corresponding population value of the dispersion parameter for the linear estimators only. It is seen that S(b̂) generally understates the population value, although the amount of the discrepancy varies both with α and b(α). Because of this bias in the empirical values of S(b̂), it seems best to compare the estimated S(b̂) for the linear estimators with the S(b̂) associated with the MSAE estimator. Actually, however, the differences between the S(b̂)'s and their population values are quite small relative to the differences which exist between S(b̂) for MSAE and the linear estimators, so that none of our conclusions would be altered by comparing S(b̂) for MSAE with the population values of the dispersion parameter of the linear estimators.

TABLE Iᵃ

VALUES OF S(b̂) = (b̂_.72 − b̂_.28)/1.654

α      MSAE     b(1.1)          b(1.3)          b(1.5)          b(1.7)          b(1.9)          OLS
1.1    .0132    .0466 (.0550)   .0521 (.0578)   .0533 (.0596)   .0559 (.0607)   .0575 (.0613)   .0587 (.0616)
1.3    .0151    .0374 (.0382)   .0333 (.0348)   .0367 (.0350)   .0395 (.0354)   .0412 (.0357)   .0425 (.0358)
1.5    .0151    .0276 (.0299)   .0241 (.0241)   .0211 (.0238)   .0218 (.0239)   .0223 (.0240)   .0223 (.0240)
1.7    .0151    .0223 (.0252)   .0176 (.0184)   .0161 (.0178)   .0139 (.0177)   .0141 (.0177)   .0144 (.0177)
1.9    .0157    .0201 (.0222)   .0147 (.0149)   .0137 (.0141)   .0132 (.0140)   .0115 (.0140)   .0115 (.0140)
2.0    .0157    .0210 (.0212)   .0117 (.0137)   .0116 (.0129)   .0116 (.0127)   .0116 (.0127)   .0115 (.0126)

ᵃ Based on one hundred replications for each experiment. Sample size was fifty for each replication. True values of σ are enclosed in parentheses following the values of S(b̂) for the linear estimators.
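For reference, the two dispersion measures reported in Tables I and II can be computed from the one hundred replication estimates along the following lines (a minimal sketch using the definitions above; names are illustrative).

```python
# Sketch: the two dispersion measures used in Tables I and II,
# computed from a vector of replication estimates b_hat and the true B.
import numpy as np

def s_hat(b_hat):
    """Fama-Roll style scale estimate: (.72 fractile - .28 fractile) / 1.654."""
    q72, q28 = np.percentile(b_hat, [72, 28])
    return (q72 - q28) / 1.654

def mad(b_hat, B):
    """Mean absolute deviation of the estimates from the true parameter value."""
    return np.mean(np.abs(np.asarray(b_hat) - B))

# Example with a dummy sample of estimates centered on B = 3.
b_hat = 3.0 + np.random.default_rng(4).standard_normal(100) * 0.02
print(s_hat(b_hat), mad(b_hat, 3.0))
```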


Several features stand out in the empirical results in Table I. First, MSAE outperforms OLS for α less than 1.7, while OLS does better for α exceeding 1.7. The performance of OLS relative to MSAE diminishes steadily as α is decreased from two toward one. This outcome provides support for Mandelbrot's and Fama's proposal that MSAE be used where Paretian disturbances seem likely. For α less than 1.5, the relative superiority of MSAE is quite large, while for higher α's the relative margin in favor of OLS is much smaller. MSAE thus appears to be more robust than OLS over a range of α's. Second, in both the empirical and theoretical data, the minimum dispersion BLUE estimator is always associated with the α actually characterizing the disturbances. In addition, the dispersion increases as the α assumed in calculating the estimator increasingly departs from the α characterizing the disturbances. However, the costs in efficiency of misspecifying α appear to be fairly small for all but very large mistakes, e.g., specifying α to be 1.1 when it is in fact 2. Third, notice that for small α, MSAE out-performs even the best of the BLUE estimators by a sizable margin. Again, this outcome supports Mandelbrot's and Fama's advocacy of MSAE estimation. It suggests that in the face of Paretian disturbances, development of MSAE regression is a more promising path than developing the linear estimators proposed by Wise.

The results in Table II, which reports the MAD statistics, are very similar to those based on S(b̂), and there is no need to comment on them in detail. The pattern of these results seems entirely sensible. As α diminishes from two toward one, the tails of the distribution of disturbances become fatter and fatter, and the performance of MSAE relative to OLS improves. What is perhaps more interesting is the relative robustness of MSAE across a variety of α's generating the U_j's.

On the basis of our results, a two-stage estimator may seem appropriate. Suppose we estimate equation (1), Y_j = BX_j + U_j, by any of the estimators discussed above, say least squares. Each of these estimators is consistent, and hence Û_j = Y_j − B̂X_j, where B̂ is the estimate of B, is a consistent estimator of U_j. By using these residuals, it is possible to estimate the α characterizing the disturbances. Then, on the basis of Table I and this estimate of α, a more efficient estimator of B can be obtained, say by using MSAE if the estimated α is less than 1.5. Preliminary experimentation with this procedure has convinced us that one can get quite a good estimate of α using the residuals as estimates of the disturbances. This suggests that such a two-stage procedure may be promising.
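A sketch of this two-stage logic is given below. The α estimate here comes from fitting a symmetric stable distribution to the first-stage residuals with SciPy's levy_stable.fit, which is only an illustrative (and potentially slow) choice rather than the authors' procedure; the 1.5 cutoff follows the discussion of Table I, and the second stage would re-estimate B by MSAE as sketched in Section 2 whenever MSAE is indicated.

```python
# Sketch of the two-stage idea: estimate B, estimate alpha from the residuals,
# then switch to MSAE when the estimated alpha is small (here, below 1.5).
import numpy as np
from scipy.stats import levy_stable

def two_stage_choice(y, x, cutoff=1.5):
    """Return the first-stage OLS estimate, the fitted alpha, and the suggested estimator."""
    b_ols = (x @ y) / (x @ x)                    # first stage: any consistent estimator
    resid = y - b_ols * x
    alpha_hat, _beta, _loc, _scale = levy_stable.fit(resid)   # illustrative alpha estimator
    chosen = "MSAE" if alpha_hat < cutoff else "OLS"
    return b_ols, alpha_hat, chosen
```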

TABLE IIᵃ

MEAN ABSOLUTE DEVIATION

α      MSAE     b(1.1)   b(1.3)   b(1.5)   b(1.7)   b(1.9)   OLS
1.1    .0162    .0996    .1249    .1436    .1537    .1601    .1623
1.3    .0169    .0534    .0527    .0572    .0600    .0618    .0625
1.5    .0172    .0355    .0299    .0307    .0317    .0324    .0326
1.7    .0174    .0270    .0199    .0199    .0202    .0204    .0205
1.9    .0174    .0226    .0151    .0146    .0147    .0148    .0148
2.0    .0174    .0213    .0136    .0131    .0131    .0132    .0132

ᵃ Based on one hundred replications for each experiment. Sample size equals fifty.


4. CONCLUSIONS

The results of previous sections suggest several conclusions. First, the MSAE estimator performs sufficiently well that it deserves further study and elaboration. In particular, the sampling theory of the estimator needs to be developed. In conjunction with the Monte Carlo results presented in Section 3, the large body of evidence suggesting that many economic variables are best described as being generated by non-Gaussian stable distributions makes the case in favor of MSAE estimation very strong. At the very least, the method deserves further study.

Second, the relative superiority of MSAE over OLS appears to be sufficiently large for small α (say less than 1.5) that it may be profitable for economists to begin to estimate the α's characterizing the residuals in their models. Then the application of a two-stage procedure such as that described in Section 3 may be useful.

Third, it seems fairly clear that the relative superiority of MSAE over OLS will hold for many fat-tailed distributions of disturbances that are not stable. For example, it is possible to show that MSAE out-performs OLS for various fat-tailed distributions of disturbances that are formed by blending normal and uniform distributions.⁷ Although these distributions have finite variances, they are denser over portions of the extreme tails than is the normal distribution. These results suggest that one need not rely on acceptance of the stable Paretian model as a description of the disturbances in order to justify the use of the MSAE estimator. Of course, this point is also suggested by the maximum likelihood property of MSAE in the context of disturbances with a Laplace distribution.

Finally, it is interesting to speculate whether our results might provide an explanation of the rather remarkable outcome of an important experiment involving MSAE and several least-squares-based estimators recently reported by Meyer and Glauber [9]. Meyer and Glauber estimated a model of the determination of investment expenditures by several techniques including MSAE and OLS.⁸ Then the estimates produced by each estimator were used to forecast investment for periods extending beyond the data employed in estimating the model. These forecast values were then compared with the actual values in order to assess the accuracy of the forecasts produced by the various estimators. On virtually every criterion (e.g., squared error, mean absolute deviation) MSAE out-performed the other estimators in the forecast period. Jorgenson [6] has cited these results as evidence that the equation employed to forecast was misspecified. However, while this is a possibility, it is not clear that MSAE is any less sensitive to specification errors than the other techniques used by Meyer and Glauber. An alternative explanation of the results is that they flow from the property of the MSAE estimator established above, its relatively good performance in the presence of disturbances with very fat tails. Taken together with the evidence on the widespread existence of Paretian variables in economics, this provides a possible reason for the better performance of the MSAE predictor in Meyer and Glauber's study.

⁷ Demonstration of this proposition is contained in a manuscript by the authors which is available on request. Also, see Press [10] for a discussion of some related points.
⁸ The others were a least squares estimator with a priori constraints imposed on some parameters and an estimator derived from a non-symmetrical error-cost function.

University of Chicago
and
University of Pennsylvania and NBER

Manuscript received February, 1969; revision received June, 1969.

REFERENCES

[1] CHARNES, A., W. W. COOPER, AND R. O. FERGUSON: "Optimal Estimation of Executive Compensation by Linear Programming," Management Science (1955).
[2] FAMA, E. F.: "The Behavior of Stock-Market Prices," Journal of Business (January, 1965).
[3] FAMA, E. F., AND R. ROLL: "Some Properties of Symmetric Stable Distributions," Journal of the American Statistical Association (September, 1968).
[4] FELLER, W.: An Introduction to Probability Theory and Its Applications, Vol. II. John Wiley and Sons, New York, 1966.
[5] FISHER, W. D.: "A Note on Curve Fitting with Minimum Deviations by Linear Programming," Journal of the American Statistical Association (June, 1961).
[6] JORGENSON, D.: Book Review in Journal of Political Economy (February, 1966), 99-100.
[7] MANDELBROT, B.: "New Methods in Statistical Economics," Journal of Political Economy (October, 1963).
[8] ———: "The Variation of Certain Speculative Prices," Journal of Business (1963).
[9] MEYER, J. R., AND R. R. GLAUBER: Investment Decisions, Economic Forecasting, and Public Policy. Harvard Business School, Boston, 1964.
[10] PRESS, S. J.: "A Compound Events Model for Security Prices," Journal of Business (1967).
[11] ROLL, R.: "The Efficient Market Model Applied to U.S. Treasury Bill Rates," unpublished doctoral dissertation, Graduate School of Business, University of Chicago, 1967.
[12] WAGNER, H. M.: "Linear Programming Techniques for Regression Analysis," Journal of the American Statistical Association (March, 1959).
[13] WISE, J.: "Linear Estimators for Linear Regression Systems Having Infinite Residual Variances," manuscript, 1966.
