International Conference On Applied Economics – ICOAE 2010
INVESTMENT FORECASTING WITH PLANS COLLECTED IN BUSINESS SURVEYS
LEANDRO D'AURIZIO - S. IEZZI 1
Abstract. Investment in capital goods and intangibles is one of the main drivers of economic growth, and the increasing availability of firm-level micro data from business surveys has provided material for a wide body of empirical studies on the topic. It is relevant for economic policy to attempt to predict investment growth for a given year at the end of the previous one. We do so by exploiting data derived from a collection of regular yearly surveys on Italian manufacturing firms. We first show some descriptive sample evidence on both realized and planned investment and then propose a forecasting model for investment variations, selected among various specifications of dynamic panel data models.
JEL codes: C500, C520, C530
Keywords: Investment plans, Dynamic models, Forecasting.
1 Introduction
Capital expenditures represent an engine of growth: jobs and productivity depend on them, as does the growth of production capacity. The volatility of investment expenditures, especially that of durable goods by the manufacturing sector, is a major factor in the explanation of aggregate output fluctuations (Bernanke, 1983). Low investment levels are routinely targeted as leading causes of a host of economic ills, such as low long-run growth and stagnant employment (Chirinko, 1993). Investment plans are a barometer of the general business climate. They also play a crucial role for economic policy, and a considerable literature has consequently developed on their interplay with the other relevant economic fundamentals (see Guiso and Parigi, 1999, for an interpretation of how uncertainty on investment decisions may slow down capital accumulation). In most business outlook surveys, plans of some kind are collected through categorical variables. In general terms, qualitative forecasts collected in business surveys are routinely fed into forecasting models of industrial production, orders, etc., together with an array of other indicators. By themselves, they are a leading indicator of the business community's mood and a proxy for the quantitative short-term behavior of agents (Nardo, 2003).
The present paper examines investment microdata collected from a representative sample of Italian manufacturing firms over an extended time span. Every firm reports its investment plans for the following year in qualitative form in a short-term business outlook survey in September. An extended survey carried out in the first four months of the year collects investment levels for the previous two years, together with the level for the current one in the form of a partial forecast.
Plans collected in September are the most relevant for the economic analysis of the Italian business cycle, which is carefully assessed in this period of the year, when economic activity gets back to its usual rhythm after the seasonal summer slowdown. The importance of investment justifies the effort to develop an econometric model that can be estimated with all the data stacked from repeated surveys, the short-term survey of September being the most recent one available. The model aims to predict the investment variation for the following year in the last months of the previous year. This forecast overcomes the limits of sample estimates of plans expressed as an ordered categorical variable and is available 7-8 months before the sample estimate of the partial forecast collected in the extended survey.
The paper is organized as follows. Section 2 describes the survey data used and provides some descriptive evidence. Sections 3 and 4 sketch the modeling strategy. Some more complex econometric issues are described in section 5, whereas section 6 shows the use of survey weights in the modeling. Results are presented and interpreted in section 7. Section 8 concludes.
2 The data
The Bank of Italy's yearly survey on industrial and service firms collects data on the most relevant variables concerning company activities (turnover, employment, investment, productive capacity, etc.). Interviews are carried out in the first four months of the year. The survey has been carried out since 1972 (from now on, we will refer to it as the "long survey"). The sample is a continuously updated panel, revised to take into account the attrition process.2 Over the years both the sample size and the reference population have been considerably broadened: starting from 1999 the industrial sector of firms with at least 50 employees was covered, with the introduction of energy and mining firms. In 2001 industrial firms with between 20 and 49 employees were enrolled for the first time. Since 2002, firms belonging to the non-financial private sectors with at least 20 employees have also been covered.
The survey design is stratified with a single stage. The design strata are combinations of branches of activity, size classes and geographical areas (referring to the firms' head offices). The sample size is determined by first using the optimum allocation to strata that minimizes the variance of the means and variations of the main variables (employment, turnover and investment) and then allocating the numbers so obtained among regions and branches of activity according to the population size. The weighting procedure assigns each firm an initial weight, given by the ratio of the number of firms in a stratum to the number of firms in the sample (strata are formed by combinations of branches of activity and size classes). These weights are adjusted by post-stratification in order to align them to the geographical distribution of the firm population. Quantitative investment data cover the year of interest, together with the previous one and the following year (as a forecast): in this way, variations can be computed by using a single cross-section.
Estimating trends from single surveys has proved much more stable than estimates obtained from adjacent surveys, often made unreliable by firms' structural changes and by classification and measurement problems. Such confounding factors are more easily kept under control within a single questionnaire.
1 Economic and Financial Statistics Department, Bank of Italy.
2 For further details on the design of the yearly survey, see Bank of Italy, 2008.
In the rest of the paper we will always refer to manufacturing firms with 50 employees or more, for which data have been continuously available since the beginning of the survey. The historical pattern of the sample estimates of investment variation for these firms is shown in fig. 1.
Figure 1. Italian manufacturing firms with 50 employees or more. Percent variation of realized investments, 1985-2008 (at 2008 constant prices)
The estimate is simply obtained as:
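One form consistent with the symbols defined in the next sentence is sketched here. The symbol $\hat{\Delta}_t$ and the ratio-of-weighted-totals structure are assumptions made for illustration, not a transcription of the published formula:

```latex
\hat{\Delta}_t = 100 \left( \frac{\sum_i W_{i,t}\, I_{i,t}}{\sum_i W_{i,t}\, I_{i,t-1}} - 1 \right)
```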
where, for the generic i-th firm belonging to the sample of the survey carried out in the first months of the year t+1, $W_{i,t}$ is the final weight and $I_{i,t-1}$ and $I_{i,t}$ are the investment levels for the years t-1 and t respectively. Investments for the year t have been trimmed according to the method known as 'type II Winsorization', used in the official dissemination of the survey results. The method (Kocic and Bell, 1994; Smith et al., 2003) prevents the values of smaller firms that are outliers in terms of per capita investment from influencing the estimates too much. The trends shown in fig. 1 are, however, very close to those obtained without this data treatment. We can see that the series is extremely volatile and therefore difficult to forecast.
Since 1993 a short-term business outlook survey has been carried out between September 20th and October 10th, on the same sample as the quantitative survey. Forecasts on the specific activity of the firms are collected. Data are mostly qualitative, since they are recorded during a quick telephone interview. Discrete answers may be less precise than quantitative ones, but are also less affected by non-sampling error.3 Firms report their investment plans as the investment variation for the following year compared to the current one. Five ordered categories are available: "strong decrease" (less than -10%), "slight decrease" (-10% to -3%), "stable" (-3% to +3%), "slight increase" (+3% to +10%), "strong increase" (more than +10%). For example, the September survey that takes place in 2005 collects categorical data about the planned investment variation between 2005 and 2006. The corresponding realized investment levels for 2005 and 2006 are collected only a year and a half later, in the long survey of the January-April 2007 period. On the same occasion, planned investment levels for 2007 are also collected. Together with the 2006 level, they update in quantitative form the categorical forecast collected in September. Figure 2 shows a graphical representation of the information flow within the various survey occasions.
3 Two trade-offs emerge: 1) between the loss of information generated by qualitative answers and the risk of getting low response rates and possible bias by asking quantitative questions (Pesaran and Weale, 2006); 2) between the lower informative power of discrete answers and the higher measurement error generated by quantitative questions.
Figure 2. Bank of Italy's panel business surveys - Information flow within consecutive survey occasions
Data from the short-term survey can be summarized by frequency tables of firms signaling respectively a decrease, stability and an increase of investment activity for the following year. A more concise representation of this distribution neglects the stability forecasts and computes the balance between the frequencies of increase and decrease. If we indicate with $INC_t$ the percentage of answers that at time t signaled "slight increase" or "strong increase", and with $DEC_t$ the symmetrical percentage of those reporting at time t "slight decrease" or "strong decrease", the balance statistic is simply $BAL_t = INC_t - DEC_t$. It is traditionally used to present business surveys that attempt to forecast the short-term economic outlook (IFO, 2007) by simply measuring whether firms planning an increase exceed those planning a decrease. Going beyond this simple yet useful meaning, a rationale for its use is that the balance statistic can be thought of as the expected value of a discrete aggregate probability distribution which locates answers at three points: -100, 0, 100 (expressing respectively decrease, stability and increase in percent variation). The transformation assumes, a priori, the symmetry of answers: the distance between "decrease" and "stable" is the same as that between "stable" and "increase".
We can put together the historical series of the balances and the corresponding quantitative realizations. The two plots are however not directly comparable, since balances are differences of two frequencies and represent a quantification of planned investment variation only under strong hypotheses. They can just provide a rough idea of the direction of the current investment trend. The only way to get two comparable plots is to first categorize the quantitative realized investment variations through the same numerical intervals as the categorical investment plans, compute their balance and compare it with the balance of the categorical plans.
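As an illustration, the balance statistic and the categorization of quantitative variations can be sketched in a few lines of Python. The function names are hypothetical, the treatment of values falling exactly on a class boundary is an assumption, and the percentages are unweighted, whereas the survey estimates may use weights.

```python
from collections import Counter


def categorize(pct_change):
    """Map a percent variation to one of the five survey categories.

    Which side the boundaries -10, -3, 3 and 10 fall on is an assumption.
    """
    if pct_change < -10:
        return "strong decrease"
    if pct_change < -3:
        return "slight decrease"
    if pct_change <= 3:
        return "stable"
    if pct_change <= 10:
        return "slight increase"
    return "strong increase"


def balance(answers):
    """BAL = INC - DEC, both expressed as percentages of all answers."""
    counts = Counter(answers)
    n = len(answers)
    inc = 100 * (counts["slight increase"] + counts["strong increase"]) / n
    dec = 100 * (counts["slight decrease"] + counts["strong decrease"]) / n
    return inc - dec


def balance_as_expectation(answers):
    """Mean of the three-point distribution located at -100, 0 and 100."""
    score = {"strong decrease": -100, "slight decrease": -100, "stable": 0,
             "slight increase": 100, "strong increase": 100}
    return sum(score[a] for a in answers) / len(answers)


# Toy data: four categorical plans and four quantitative realizations.
plans = ["strong increase", "slight increase", "stable", "slight decrease"]
realized = [categorize(x) for x in (12.0, -1.0, -5.0, -20.0)]
print(balance(plans), balance(realized))
```

Both functions return the same number by construction, which is the expected-value reading of the balance; categorizing the realizations first is what makes the two balances comparable.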
Figure 3 shows the three series. The predictive capability of the balances of the categorical plans can be assessed by looking at the coincidence of their turning points with those of the series of the quantitative realizations: as we can see, discrepancies take place only in years of sharp and unforeseeable recessions, such as 2001 and 2008. If we compare instead the two series of balances, we see that plans systematically overestimate realizations.
Figure 3. Italian manufacturing firms with 50 employees or more. Quantitative investment variations, balances of categorical investment plans and of categorized investment variations, 1994-2008
3 The forecasting model
We explore an econometric model for realized investment variation, with the qualitative planned investment variation from the short-term survey of the previous year among the covariates. The model's quantitative forecasts are more informative than simple frequency distributions of the ordered categorical variable representing plans, are available at the same time, well in advance of quantitative plans, and, if the model is correctly specified, remove the upward bias present in investment plans. From now on, the investment variation will be the ratio between the investment levels for years t and t-1:
$$y_{it} = \frac{I_{it}}{I_{it-1}}$$
Our starting point is a dynamic model of order p for panel data over the periods 1, ..., T, with $y_{it}$ as dependent variable:
Since the planned investment variations are collected in discrete form according to 5 categories (see section 2), $y^{ed}_{it-1}$ is a four-dimensional vector of binary variables, $y^{ed}_{it-1} = (y^{--}_{it-1}, y^{-}_{it-1}, y^{+}_{it-1}, y^{++}_{it-1})$, whose components stand for "strong decrease", "slight decrease", "slight increase" and "strong increase" respectively, with "stable" as reference category. The first lag of the dependent variable on the right-hand side, $y_{it-1}$, is however not available at the end of the year T, when the model is needed for forecasting: the term is therefore replaced by the corresponding quantitative plan $y^{eq}_{it-1}$, collected in the extended survey in the first months of the year. The two variables are highly correlated (as shown in fig. 4, which plots the trends of forecast and realized investment variations, together with the correlation coefficient between the two variables for every cross-section), because $y^{eq}_{it-1}$ is formulated during the year of interest, when previous plans have been substantially revised to take account of unexpected events. Revisions of past plans in the light of actual events make these updated plans quite close to their realizations.
Figure 4. Italian manufacturing firms with 50 employees or more. Quantitative realized and planned investment variations, cross-sectional correlation coefficient between the two variables (variations measured at 2008 constant prices), 1985-2008
For the previous reasons we will use the following alternative specification:
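A sketch of the two specifications is given here under the assumption of a linear form without further terms. The coefficient symbols $\alpha_j$ and $\gamma$ and the error symbols $\varepsilon_{it}$ and $u_{it}$ are introduced for illustration only and are not taken from the published displays:

```latex
\begin{align*}
% generic dynamic model of order p with the categorical plans as covariates
y_{it} &= \sum_{j=1}^{p} \alpha_j \, y_{it-j} + \gamma' y^{ed}_{it-1} + \varepsilon_{it} \\
% same model with the first lag replaced by the quantitative plan
y_{it} &= \alpha_1 \, y^{eq}_{it-1} + \sum_{j=2}^{p} \alpha_j \, y_{it-j} + \gamma' y^{ed}_{it-1} + u_{it}
\end{align*}
```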
We have chosen to model yit directly, instead of its logarithm. This choice was supported by the results of an exploratory analysis in which we estimated eq. (3) with log(yit) instead of yit then computed predictions expressed as:
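In symbols, and assuming a normally distributed error on the log scale, the back-transformed prediction would read as below. The hats and the symbol $\hat{\sigma}^{2}$ for the 1-step-ahead forecast error variance are notation assumed here:

```latex
\hat{y}_{it} = \exp\!\left( \widehat{\log y}_{it} + \tfrac{1}{2}\,\hat{\sigma}^{2} \right)
```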
where the second term inside the exp operator is half the error variance of a 1-step-ahead forecast. These predictions were considerably less stable than those obtained without the transformation (see also Lutkepohl and Xu, 2009, for evidence supporting these findings in the modeling of monthly inflation data series). Equation (3) can also be regarded as the reduced form of a two-equation system, where the first component is equation (2) and the second one models the relationship between realizations and quantitative plans collected in the extended survey:
The system falls within the classic two-stage least squares framework, which is unsuitable for forecasting exercises, since the dependent variable of the second equation is not available for the last year T; for this reason we have turned to eq. 3. The error term of eq. 3 is therefore the sum of two components, easily derived by replacing $y_{it-1}$ in the first equation of eq. (4) with its equivalent in the second equation:
Eq. 3 is our baseline specification (model M0) for the forecasting procedure, which we progressively enrich according to table 1.
Table 1. Model specifications
M0: Lagged dependent variable (investment growth rate (a)). Categorical investment plans: 1. Strong decrease; 2. Slight decrease; 3. Slight increase; 4. Strong increase.
M1 (*): All the regressors in M0, plus Branch: 1. Textile, clothing, leather, shoes; 2. Engineering. Area of location (+): 1. Northern Italy; 2. Central Italy. Number of employees: 1. Between 50 and 199; 2. Between 200 and 499.
M2 (*): All the regressors in M0, plus Prediction error of investment plans in the previous year: 1. Over-planning of realized investment; 2. Under-planning of realized investment.
M3 (*): M1+M2.
M4 (*): All the regressors in M3, plus Growth rate of turnover from sales.
(a): Investment plan for the first lag. (*): All the variables added in the models M1-M4 refer to the year immediately preceding that of the dependent variable. (+): Geographical area is defined by the location of the firm's headquarters.
Model M1 adds to M0 some time-invariant sector-specific, geographic and size effects, in order to control for heterogeneity in the means of the $y_{it}$ series across sectors and geographical areas and to capture the investment behaviors and financial constraints that differentiate small and large firms. As an alternative to M1, M2 adds to M0 two binary variables indicating whether in the previous year the investment plans were above or below the realized investment variations: they capture the prediction performance of past qualitative plans. Model M3 simply combines the regressors of M1 and M2 and, finally, M4 adds to the set of M3 regressors the real growth rate of turnover from sales.
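The way the regressor sets of table 1 are built can be illustrated with a small sketch covering M0 and M2. It assumes that the categorical plan is stored as one of the five labels and that over- or under-planning is judged by comparing the rank of the planned category with the rank of the categorized realization. That comparison rule, the field names and the function names are hypothetical and serve only as an illustration.

```python
# Ordered answer categories; "stable" is the omitted reference level.
LEVELS = ["strong decrease", "slight decrease", "stable",
          "slight increase", "strong increase"]
REFERENCE = "stable"


def plan_dummies(plan):
    """Return the four 0/1 indicators of a categorical plan."""
    return {f"plan_{lvl.replace(' ', '_')}": int(plan == lvl)
            for lvl in LEVELS if lvl != REFERENCE}


def planning_error_dummies(prev_plan, prev_realized):
    """Flag over- and under-planning in the previous year.

    Assumed rule: compare the category rank of the plan with the rank
    of the categorized realization.
    """
    gap = LEVELS.index(prev_plan) - LEVELS.index(prev_realized)
    return {"over_planning": int(gap > 0), "under_planning": int(gap < 0)}


def regressors(model, firm):
    """Build one row of regressors for M0 or M2 (hypothetical field names)."""
    if model not in ("M0", "M2"):
        raise ValueError("only M0 and M2 are sketched here")
    row = {"lagged_growth": firm["quantitative_plan"]}
    row.update(plan_dummies(firm["plan"]))
    if model == "M2":
        row.update(planning_error_dummies(firm["prev_plan"],
                                          firm["prev_realized"]))
    return row


firm = {"quantitative_plan": 1.04, "plan": "slight increase",
        "prev_plan": "strong increase", "prev_realized": "stable"}
row_m0 = regressors("M0", firm)
row_m2 = regressors("M2", firm)
```

Leaving out the "stable" indicator avoids perfect collinearity with a constant term, which is the usual reason for choosing a reference category.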
4 From data to models and the problems of panel attrition
We need to model the relation between realized investment and a set of covariates including investment plans over an extended time span. Since the attrition process affecting the two surveys leaves incomplete information for the sample units over the years, two strategies can be followed for model estimation. According to the first one, the model is estimated on the pooled sample of all firms for which a plan can be matched with its corresponding realization (only units not satisfying this condition are discarded). This sample is only slightly unbalanced, since the condition holds for around 90 percent of the units of every cross-section. The alternative solution uses a balanced sample of firms for which plans and realizations can be found without gaps over a set of contiguous cross-sections covering a specific time span. The advantage of this approach is that traditional methods for dynamic panel data model estimation can be easily applied, at the cost of a potential loss of information. On the other hand, estimation methods for dynamic panel data models on unbalanced panels are uncommon in the econometric literature (Bruno, 2005), as well as quite complex and based on strong assumptions (Lokshin, 2008). Since our model has to be used for generating one-year-ahead forecasts of aggregate investment variation, we prefer to employ more robust traditional estimation methods suitable only for balanced panels.
As for panel length, there is a clear trade-off between short panels with a large cross-sectional dimension and long panels with a relatively small number of units. Short panels might also lack the regularities needed to produce good forecasts and might represent behaviors that are too idiosyncratic to single years. Longer panels could however be distorted by sample selection mechanisms. A balance must therefore be struck between these two extremes.
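The selection of balanced samples over contiguous cross-sections can be sketched as follows. The data layout (a mapping from a firm identifier to the set of years in which a plan can be matched with its realization) and the function names are assumptions made for illustration.

```python
def balanced_panel(matched_years, first_year, length):
    """Return the firms observed in every year of a window.

    matched_years: dict mapping a firm id to the set of years for which
    a plan can be matched with its realization (hypothetical layout).
    """
    window = set(range(first_year, first_year + length))
    return sorted(firm for firm, years in matched_years.items()
                  if window <= set(years))


def rolling_panels(matched_years, start, stop, length):
    """Balanced panels for every window of `length` years in [start, stop]."""
    return {(y, y + length - 1): balanced_panel(matched_years, y, length)
            for y in range(start, stop - length + 2)}


# Toy data: firm "B" has a gap in 1996, firm "C" enters in 1995.
matched = {
    "A": {1994, 1995, 1996, 1997},
    "B": {1994, 1995, 1997},
    "C": {1995, 1996, 1997},
}
panels = rolling_panels(matched, start=1994, stop=1997, length=3)
print(panels)
```

Lengthening the window can only shrink the set of retained firms, which is the trade-off between panel length and cross-sectional size discussed above.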
We need to measure whether the cross-sectional estimates are biased if they are computed only on the observations belonging to the panels we use. An easy way to do this is to run a simple dummy regression with realized investment variation as dependent variable and, as regressors, a dummy indicating whether the unit belongs to the panel together with the complete set of sample design variables, acting as control covariates.4 More formally, we estimate independently for every cross-sectional survey the following equation:
The subscript i indicates the generic unit belonging to the cross-section for the year t and $d_{it}$ is simply a dummy equal to 1 if the unit belongs to the panel and 0 otherwise.
$Z_{it}$ is the vector of dummies representing the survey-design variables for the unit i.5 This procedure is simpler and more intuitive than the classical Heckman two-step procedure, which produces similar results, not shown here for brevity.6 Balancing the panel would therefore be a source of bias if the coefficient $\pi_1$ were significant: in such a case the selection mechanism would not be well explained and controlled by the survey design variables. The number of cases in which this coefficient is significant across the sample cross-sections is quite limited (results are not shown for brevity). Significant panel attrition is present for the years 1996, 1999 and 2001. The year most affected is 2001, an unavoidable consequence of the 20 percent increase in sample size that took place in that year. The analysis also shows that panels of 8 years are a reasonable compromise: we use the first seven years for model estimation, whereas the last year of each panel is set aside for the evaluation of out-of-sample forecasting (in the exploratory phase we also analyzed slightly shorter panels, with quite similar results).
4 They are: geographical area of the firm's administrative headquarters (North-West, North-East, Center, South and Isles), size class (50-99, 100-199, 200-499, 500-999, 1000 and more, in number of employees), sector of economic activity (Food products, beverages and tobacco; Textiles, clothing, hides and leather; Chemicals, rubber and plastic; Non-metal minerals; Engineering; Other manufacturing). Following the survey design, the size classes and the sectors of activity are interacted.
5 Every unit is weighted by the product of the design weight and the investment level at time t-1: in this way firms are scaled according to the size of their contribution to the estimated total investment level at time t-1.
6 With the Heckman approach we would model a first equation for the inclusion in one of the panel samples and a second one having the realized investment variation as dependent variable.
5 Some econometric issues
Given the high degree of similarity between $y_{it-1}$ and $y^{eq}_{it-1}$, discussed at length in section 3, equation (3) is approximately a classical dynamic model with the first lag of the dependent variable replaced by its quantitative forecast. We explicitly write the disturbance term of equation (3) as the sum $\mu_i + \zeta_{it}$ of an individual-specific time-invariant effect $\mu_i$ and a pure disturbance term $\zeta_{it}$. If individual effects exist, the use of GMM would be necessary, after first-differencing the equation. Should they instead follow a degenerate distribution, the OLS estimator on the original values is consistent and more efficient than the GMM estimator, under the assumption that the error term $\zeta_{it}$ is serially uncorrelated. Testing for the presence of individual effects is therefore a necessary step. A good clue that the model might be characterized by the absence of individual effects is that the dependent variable is a variation, instead of a level. Holtz-Eakin (1988) proposed a very simple Sargan-difference test for the presence of individual effects in the purely first-order autoregressive model, which can be generalized to account for the presence of additional lagged values of the dependent variable and of endogenous, predetermined and time-invariant regressors. Through Monte Carlo simulations, Jimenez-Martin shows that the test lacks power when the coefficient of the lagged dependent variable tends to unity, whereas additional regressors sharply increase the power of the test (Jimenez, 1998). Since the Holtz-Eakin test is based on the assumption that the error terms $\zeta_{it}$ are serially uncorrelated, a test must be provided for this assumption, prior to testing for the existence of individual effects. This caution is all the more needed because of the complex error term dealt with (eq. 7), entailed by the linear combination of two disturbance terms separated by a time lag.
Arellano and Bond (1991) propose for the purpose a simple direct test, based on the error term of the model expressed in first differences: the consistency of GMM estimators thus relies upon the assumption that $E(\Delta\zeta_{it}\,\Delta\zeta_{it-2}) = 0$. A test for lack of second-order serial correlation in the first-difference residuals can be implemented in two ways: 1) by using the residuals coming from the model in differences; 2) by exploiting the residuals coming from the equations in differences of the system model. We prefer the first solution, since the second one, although more efficient, is conditional on the assumption of absence of individual effects. In order to implement the test on an appropriate number of lags and avoid committing a type II error, the test is carried out for all the combinations of lags of the dependent variable and model specifications. Given the limited panel length, the number of lags p of the dependent variable can be 1, 2 or 3. As shown in table 2, the presence of individual effects can be ruled out for all the model specifications and all the lags of the autoregressive component. The lack of second-order serial correlation in the first-difference residuals is however not always supported by the data. This hypothesis is rejected for most of the model specifications, especially for the middle panels (1997-2003 and 1998-2004). This result can however be explained by the sudden deliberate increase of the sample size in the years 1999-2001, which brought about some instability. In any case, no model specification rejects the hypothesis of lack of second-order serial correlation in the first-difference residuals for all the panels, and this fact is the strongest element supporting the validity of the Holtz-Eakin test.
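The idea behind a check of second-order serial correlation in first-differenced residuals can be illustrated with a deliberately simplified statistic. This is not the Arellano-Bond statistic itself: the correction for the estimation error of the coefficients is left out, the residuals are simulated rather than estimated, and the function names are hypothetical.

```python
import math
import random


def second_order_z(diff_resid):
    """Simplified check for second-order correlation in differenced residuals.

    diff_resid: one list per firm with the first-differenced residuals in
    time order. Returns sum_i s_i / sqrt(sum_i s_i^2), where s_i is the sum
    over t of e_it * e_i,t-2. No correction for estimated coefficients.
    """
    s = [sum(e[t] * e[t - 2] for t in range(2, len(e))) for e in diff_resid]
    return sum(s) / math.sqrt(sum(v * v for v in s))


def simulate(n_firms, n_years, ma2, rng):
    """First differences of zeta_t = eps_t + ma2 * eps_{t-2} for each firm."""
    out = []
    for _ in range(n_firms):
        eps = [rng.gauss(0, 1) for _ in range(n_years + 2)]
        zeta = [eps[t] + ma2 * eps[t - 2] for t in range(2, n_years + 2)]
        out.append([zeta[t] - zeta[t - 1] for t in range(1, n_years)])
    return out


rng = random.Random(0)
z_white = second_order_z(simulate(300, 8, 0.0, rng))  # serially uncorrelated zeta
z_ma2 = second_order_z(simulate(300, 8, 0.9, rng))    # zeta correlated at lag 2
print(z_white, z_ma2)
```

With serially uncorrelated disturbances the differenced residuals are correlated at lag 1 but not at lag 2, so the statistic should stay close to zero; in the second simulation the lag-2 dependence should push it well away from zero.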
Table 2. Results from Holtz-Eakin test for the presence of unobserved individual effects
Panel       M0                            M1                            M2                            M3                            M4
            p=1       p=2       p=3       p=1       p=2       p=3       p=1       p=2       p=3       p=1       p=2       p=3       p=1       p=2       p=3
1994-2000   0.115     0.022     -0.007    0.006     -0.188    0.021     0.100     0.069     0.044     0.006     -0.188    0.021     0.101     0.070     0.045
            (-1.473)  (0.974)   (-1.815)  (-0.394)  (-1.379)  (-3.006)  (-1.210)  (0.550)   (-0.125)  (-0.394)  (-1.379)  (-3.006)  (-1.210)  (0.551)   (-0.124)
1995-2001   0.012     0.147     0.072     0.095     0.010     0.061     0.206     0.052     -0.042    0.096     0.010     0.061     0.206     0.053     -0.042
            (0.249)   (2.280)   (1.672)   (2.834)   (2.450)   (1.881)   (0.328)   (2.920)   (1.614)   (2.833)   (2.450)   (1.880)   (0.328)   (2.092)   (1.614)
1996-2002   0.000     0.000     0.146     0.152     0.002     0.000     0.010     0.203     0.000     0.153     0.003     0.001     0.011     0.204     0.001
            (-1.362)  (-1.013)  (-0.250)  (-2.573)  (-0.119)  (-1.642)  (-2.162)  (-1.395)  (-0.693)  (-2.572)  (-0.118)  (-1.643)  (-2.162)  (-1.394)  (-0.692)
1997-2003   0.051     0.097     0.092     0.004     -0.088    -0.030    0.052     0.113     0.117     0.004     -0.088    -0.029    0.053     0.113     0.118
            (-2.466)  (-2.106)  (-2.453)  (-4.517)  (-2.525)  (-1.842)  (-8.112)  (-2.804)  (-3.723)  (-4.517)  (-2.525)  (-1.842)  (-8.112)  (-2.803)  (-3.723)
1998-2004   0.135     0.034     0.002     0.334     0.011     -0.016    0.011     0.379     0.243     0.335     0.012     -0.015    0.011     0.380     0.244
            (-2.155)  (-2.336)  (-2.140)  (-2.505)  (-3.212)  (-2.792)  (-3.285)  (-2.694)  (-4.402)  (-2.504)  (-3.211)  (-2.792)  (-3.284)  (-2.693)  (-4.401)
1999-2005   0.061     0.940     0.071     0.016     0.007     0.002     0.090     0.036     0.000     0.017     0.007     0.003     0.090     0.037     0.000
            (-0.379)  (0.030)   (-0.787)  (-1.473)  (-1.406)  (-1.341)  (-2.707)  (-1.838)  (-2.161)  (-1.473)  (-1.405)  (-1.340)  (-2.707)  (-1.837)  (-2.161)
2000-2006   0.140     0.045     0.004     0.178     0.077     0.108     0.041     0.077     0.098     0.179     0.078     0.108     0.041     0.077     0.098
            (0.594)   (1.311)   (1.845)   (-0.898)  (-0.257)  (1.050)   (1.742)   (0.811)   (2.753)   (-0.898)  (-0.256)  (1.050)   (-1.742)  (0.811)   (2.753)
d.f.        7         6         5         7         6         5         7         6         5         7         6         5         7         6         5
χ2 at 5%    14.067    12.592    11.070    14.067    12.592    11.070    14.067    12.592    11.070    14.067    12.592    11.070    14.067    12.592    11.070
Sample size: 124 (1994-2000), 131 (1995-2001), 137 (1996-2002), 138 (1997-2003), 127 (1999-2005), 131 (2000-2006).
6 The weighted model estimation and the forecasting procedure
Given the absence of individual effects, the model specifications outlined in section 3 (M0, M1, M2, M3 and M4, each with p=1, 2, 3) are estimated by OLS on a collection of rolling balanced panels. We use weights in order to fully control for the survey design. The design weights are multiplied by the investment level at time t-1, in order to rank the units' contributions to the estimate according to their investment size. Formally, the weight of unit i is the product $W^*_i I_{it-1}$, where $W^*_i$ is the design weight adjusted for panel attrition and $I_{it-1}$ is the investment level for the year t-1, as collected in the cross-section t. Let us indicate with:
and with:
respectively the forecasting model in compact matrix form and the corresponding estimated coefficients. The standard deviation of the coefficients that we use in our computations duly takes into account the survey design (DuMouchel and Duncan, 1983). More specifically, let W be a diagonal matrix containing the weights on the main diagonal, I the identity matrix, k the number of regressors and n the number of observations used in the model estimation. We define:
and:
By using (11) and (12), an estimator for the variance of the residuals is:
and the variance/covariance matrix Σ of the model coefficients can be estimated accordingly as:
By using the Cholesky decomposition, Σ=TT', we generate 5,000 draws from the distribution of β. The individual prediction for unit i of the investment variation between times t and t+1 can then be expressed as:
where Zit indicates the model regressors. A consistent predictor for the aggregate investment variation can then be obtained as the weighted average of eq. (15) over all the units:
The estimator (16) can now be compared with the realized investment variation that the model attempts to forecast:7
that can also be written as:
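The step from one form to the other can be sketched as follows, writing $W_i$ for the attrition-adjusted design weight (taken as constant across the two years, as footnote 7 allows) and $y_{i,t+1}$ for the individual investment variation. The exact notation of the published expressions is an assumption here:

```latex
\frac{\sum_i W_i \, I_{i,t+1}}{\sum_i W_i \, I_{i,t}}
  = \frac{\sum_i W_i \, I_{i,t} \; y_{i,t+1}}{\sum_i W_i \, I_{i,t}},
\qquad y_{i,t+1} = \frac{I_{i,t+1}}{I_{i,t}}
```

The aggregate variation is thus a weighted average of the individual variations, with weights given by the design weight times the investment level at time t.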
The similarity between expressions (18) and (16) is now evident: expression (16) estimates (17) by replacing individual planned investment variations with the corresponding individual estimates (15).
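A minimal sketch of the procedure described in this section follows, assuming NumPy is available. The covariance matrix used here, a plain weighted-least-squares one, is a simplified stand-in for the design-based estimator of eqs. (11)-(14); the aggregation weights are taken to be the design weight times the investment level at time t; all data are simulated; function and variable names are hypothetical.

```python
import numpy as np


def weighted_ols(Z, y, w):
    """Weighted least squares: beta = (Z'WZ)^-1 Z'Wy, plus a plain covariance.

    The covariance sigma2 * (Z'WZ)^-1 is a simplified stand-in, not the
    design-based estimator referred to in the text.
    """
    w = w * len(w) / w.sum()            # rescale weights to sum to n
    ZtW = Z.T * w                       # same as Z' @ diag(w)
    ZtWZ = ZtW @ Z
    beta = np.linalg.solve(ZtWZ, ZtW @ y)
    resid = y - Z @ beta
    n, k = Z.shape
    sigma2 = (w * resid**2).sum() / (n - k)
    return beta, sigma2 * np.linalg.inv(ZtWZ)


def aggregate_forecast(Z_new, agg_w, beta, cov, n_draws=5000, seed=0):
    """Draw coefficients through the Cholesky factor of cov and aggregate.

    Returns the draws of the aggregate forecast, i.e. the weighted average
    of the individual predictions Z_new @ beta_draw with weights agg_w.
    """
    rng = np.random.default_rng(seed)
    T = np.linalg.cholesky(cov)                      # cov = T T'
    draws = beta + rng.standard_normal((n_draws, len(beta))) @ T.T
    individual = draws @ Z_new.T                     # (n_draws, n_units)
    return individual @ agg_w / agg_w.sum()


# Simulated toy data (not survey data).
rng = np.random.default_rng(1)
n = 200
plan = rng.uniform(0.8, 1.2, n)                  # quantitative plan (ratio)
Z = np.column_stack([np.ones(n), plan])
y = 0.1 + 0.9 * plan + rng.normal(0, 0.02, n)    # realized ratio
design_w = rng.uniform(1, 5, n)                  # attrition-adjusted weight
inv_prev = rng.uniform(10, 100, n)               # investment level at t-1
beta, cov = weighted_ols(Z, y, design_w * inv_prev)
inv_curr = inv_prev * y                          # investment level at t
agg = aggregate_forecast(Z, design_w * inv_curr, beta, cov)
print(agg.mean(), agg.std())
```

The spread of the draws gives a rough idea of the forecast variance; for brevity the toy example reuses the estimation regressors as forecasting regressors, whereas in practice the latter would refer to the following year.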
7 Analysis of the forecasting performance
We compare the out-of-sample forecasting performance for the 8-th and last year of each panel in terms of bias, variance and mean squared error (MSE) of one-year-ahead forecasts derived from OLS estimates for the 15 model specifications outlined in the previous sections (M0, M1, M2, M3 and M4, each with p=1, 2 or 3 lags of the dependent variable). The forecasting period covers the years 2001-2007. Results are reported in table 3.
Table 3. Squared bias, variance and mean squared error of one-year-ahead forecasts
Year        Sample size  M0                M1                M2                M3                M4
                         p=1   p=2   p=3   p=1   p=2   p=3   p=1   p=2   p=3   p=1   p=2   p=3   p=1   p=2   p=3
Squared bias
2001        124          0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.01  0.00  0.00  0.01  0.00
2002        131          0.02  0.02  0.03  0.02  0.02  0.03  0.03  0.04  0.04  0.03  0.04  0.04  0.03  0.04  0.04
2003        137          0.01  0.01  0.01  0.00  0.00  0.01  0.01  0.00  0.01  0.00  0.01  0.01  0.00  0.00  0.01
2004        138          0.01  0.01  0.01  0.01  0.00  0.00  0.01  0.01  0.01  0.01  0.00  0.00  0.01  0.00  0.01
2005        127          0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
2006        126          0.00  0.00  0.01  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
2007        131          0.03  0.04  0.04  0.02  0.03  0.03  0.02  0.02  0.03  0.01  0.02  0.02  0.01  0.01  0.02
Average squared bias     0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01  0.01
Variance
2001        124          0.13  0.12  0.10  0.23  0.26  0.24  0.10  0.16  0.24  0.20  0.28  0.35  0.20  0.26  0.75
2002        131          0.11  0.10  0.10  0.14  0.17  0.23  0.07  0.13  0.23  0.15  0.19  0.28  0.15  0.21  0.29
2003        137          0.15  0.13  0.10  0.18  0.20  0.23  0.12  0.15  0.23  0.18  0.22  0.28  0.18  0.22  0.42
2004        138          0.13  0.10  0.07  0.12  0.14  0.16  0.12  0.14  0.16  0.17  0.20  0.25  0.17  0.19  0.22
2005        127          0.16  0.13  0.10  0.14  0.17  0.21  0.14  0.17  0.19  0.20  0.24  0.30  0.20  0.25  0.29
2006        126          0.16  0.14  0.11  0.12  0.14  0.20  0.12  0.15  0.20  0.18  0.21  0.26  0.17  0.20  0.27
2007        131          0.15  0.12  0.10  0.11  0.13  0.16  0.11  0.14  0.20  0.15  0.17  0.23  0.43  0.52  0.33
Average variance         0.14  0.12  0.10  0.15  0.17  0.20  0.11  0.15  0.21  0.18  0.22  0.28  0.21  0.26  0.37
Mean squared error (MSE)
2001        124          0.13  0.12  0.10  0.23  0.26  0.24  0.10  0.17  0.24  0.20  0.28  0.35  0.20  0.27  0.75
2002        131          0.13  0.12  0.13  0.16  0.19  0.26  0.10  0.17  0.27  0.18  0.23  0.32  0.18  0.25  0.33
2003        137          0.16  0.14  0.12  0.18  0.21  0.24  0.13  0.15  0.24  0.18  0.22  0.30  0.18  0.22  0.43
2004        138          0.14  0.11  0.08  0.13  0.14  0.16  0.13  0.14  0.17  0.19  0.20  0.25  0.18  0.19  0.23
2005        127          0.16  0.13  0.10  0.14  0.17  0.21  0.14  0.17  0.19  0.20  0.24  0.30  0.20  0.25  0.29
2006        126          0.16  0.14  0.12  0.12  0.14  0.20  0.12  0.15  0.20  0.18  0.21  0.26  0.18  0.20  0.28
2007        131          0.18  0.16  0.14  0.13  0.15  0.19  0.13  0.16  0.24  0.16  0.19  0.25  0.44  0.53  0.35
Average MSE              0.14  0.15  0.13  0.11  0.16  0.18  0.22  0.12  0.16  0.22  0.19  0.23  0.29  0.22  0.27
The last row of each of the three sections of the table shows the overall forecasting performance of the 15 model specifications across all the panels. In terms of bias, if we consider only models with p=1, the best specification turns out to be M1, which achieves a reduction in squared bias of 9 to 22 per cent with respect to the other specifications. Adding more lags never produces any improvement in terms of bias, and model specification M1 remains the best performer. This indicates that the model bias is controlled for by simply using the proxy of the first lag of the dependent variable, together with the survey design variables. In terms of forecast variance, if we consider only models with p=1, the best specification turns out to be M2. However, increasing the order of the autoregressive component degrades the precision of the estimate for all the specifications, except for the baseline M0:
7
The differences between the design weights corrected for the sample attrition for the years t and t+1 are negligible, since the population is stable across the two years.
the best combination in terms of variance is (M0, p=3). Unfortunately, M0 generates biases that are uniformly larger than those of the other specifications with the same p. In appraising a model's forecasting power, however, unbiasedness matters more than efficiency, and our preferred specification is accordingly (M1, p=1). It is a good compromise, since it is the best combination in terms of forecast bias and one of the best in terms of forecast variance: the resulting MSE is not far from the minimum attained (Table 3). For brevity, Table 4 presents the parameter estimates for this model only, for all the rolling panels beginning in the years 1994-2000. The most important variables for predicting realized investment variations are the one-year lagged dependent variable and, in particular, the investment plans. The results reveal significant heterogeneity in the estimated parameter values across the panels. The first-order autoregressive coefficient varies between -0.19 and -0.01 and shrinks in absolute value over time (i.e. across the panels). At the same time, the magnitude of the coefficients on investment plans becomes slightly larger over time. This result might be explained by recent economic turmoil making investment dynamics more erratic, so that investment plans have greater predictive power than lagged investment variations (this is also consistent with the decreasing values of the adjusted R2 over time).
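The three criteria compared in Table 3 can be illustrated with a minimal sketch. It assumes that bias is the mean forecast error and that variance is the dispersion of the errors around that mean, so that MSE equals squared bias plus variance; the function name and the data are hypothetical and are not taken from the survey.

```python
import numpy as np

def forecast_error_summary(forecast, realized):
    """Return (squared bias, variance, MSE) of one-year-ahead forecast errors.

    Assumption: bias is the mean forecast error and variance is computed
    around that mean with ddof=0, so that MSE = squared bias + variance.
    """
    errors = np.asarray(forecast, dtype=float) - np.asarray(realized, dtype=float)
    bias = errors.mean()
    variance = errors.var()          # population variance (ddof=0)
    mse = np.mean(errors ** 2)
    return bias ** 2, variance, mse

# Hypothetical firm-level forecasts and realized investment variations
forecast = np.array([1.10, 0.95, 1.20, 0.80])
realized = np.array([1.00, 1.05, 1.00, 0.90])

sq_bias, variance, mse = forecast_error_summary(forecast, realized)
print(f"squared bias={sq_bias:.6f} variance={variance:.6f} MSE={mse:.6f}")
```

Under this definition a specification can have a low MSE mainly because of a low variance, which is why the text ranks the specifications on bias first and on variance second.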
Table 4 Parameter estimates of model M1, p=1

Panel                           1994-2000  1995-2001  1996-2002  1997-2003  1998-2004  1999-2005  2000-2006  1994-2000
Intercept                         1.2279*    1.1245*    0.9309*    0.9119*    0.7504*    0.9563*    0.9358*    1.2279*
                                 (0.1972)   (0.1723)   (0.2243)   (0.1852)   (0.1983)   (0.1882)    (0.181)   (0.1972)
Quantitative plan for y_{t-1}    -0.1951*   -0.0997    -0.0717    -0.0118    -0.0197    -0.0236    -0.0173    -0.1951*
                                 (0.0812)   (0.0689)   (0.0930)   (0.0803)   (0.0882)   (0.0933)   (0.0965)   (0.0812)
Categorical plans:
  Strong decrease                -0.2586    -0.1709    -0.3928*   -0.3404*   -0.3761*   -0.3942*   -0.4063*   -0.2586
                                 (0.2104)   (0.1418)   (0.1614)   (0.1575)   (0.1606)   (0.1618)   (0.1608)   (0.2104)
  Slight decrease                -0.0761    -0.1080    -0.1131    -0.1325    -0.0767     0.0430     0.0148    -0.0761
                                 (0.3817)   (0.2125)   (0.2430)   (0.1797)   (0.1685)   (0.1639)   (0.1527)   (0.3817)
  Slight increase                 0.2222     0.1399     0.1051     0.1533     0.2089     0.1796     0.1367     0.2222
                                 (0.2648)   (0.2388)   (0.2612)   (0.2412)   (0.2297)   (0.2300)   (0.2236)   (0.2648)
  Strong increase                 0.5063*    0.4960*    0.5666*    0.5621*    0.7357*    0.6039*    0.5912*    0.5063*
                                 (0.2024)   (0.1722)   (0.1902)   (0.1717)   (0.1428)   (0.1473)   (0.1358)   (0.2024)
Branch:
  Textile, clothing, etc.        -0.1026     0.0417     0.0825     0.1382    -0.0088     0.0170     0.0576    -0.1026
                                 (0.1367)   (0.1123)   (0.1223)   (0.1150)   (0.1098)   (0.1017)   (0.1033)   (0.1367)
  Engineering                    -0.1928    -0.0896    -0.0592    -0.0503    -0.0475    -0.1033    -0.0861    -0.1928
                                 (0.1291)   (0.1104)   (0.1129)   (0.1061)   (0.1024)   (0.1011)   (0.0995)   (0.1291)
Area of location:
  North                           0.1150     0.0076     0.0938     0.0218     0.1844     0.0011    -0.0099     0.1150
                                 (0.1293)   (0.1168)   (0.1090)   (0.1028)   (0.1025)   (0.1039)   (0.0917)   (0.1293)
  Centre                          0.0474     0.0393     0.1182     0.1343     0.2172     0.1036     0.0721     0.0474
                                 (0.1536)   (0.1369)   (0.1488)   (0.1356)   (0.1307)   (0.1226)   (0.1119)   (0.1536)
Firm size:
  Between 50 and 199              0.0203     0.0905     0.3395*    0.2373*    0.1904     0.1594     0.1438     0.0203
                                 (0.1443)   (0.1123)   (0.1188)   (0.1012)   (0.1138)   (0.1063)   (0.1034)   (0.1443)
  Between 200 and 499             0.2834     0.1597     0.2084     0.0966     0.2033     0.0790     0.1418     0.2834
                                 (0.1490)   (0.1197)   (0.1361)   (0.1314)   (0.1292)   (0.1128)   (0.1107)   (0.1490)
Adj. R2                           0.3825     0.4721     0.3583     0.3105     0.3994     0.1657     0.0451     0.3825
Durbin-Watson Test                1.6228     1.9254     1.6578     1.9789     1.8491     1.9654     1.9451     1.6228

Among the time-invariant fixed effects, the number of employees dominates the sector-specific and geographical effects, even though its parameters are not always statistically significant. The disturbance term of the equation we estimate is composite, as shown in section 3, so the absence of serial correlation in the residuals, which OLS requires, must be carefully tested. The hypothesis tends to be confirmed by the Durbin-Watson statistic adapted for panel data (see the last row of Table 4). It is rejected for only two panels (1994-2000 and 1996-2002), for which the Durbin-Watson statistic is well below the threshold level of 1.86 (Bhargava et al., 1982). This should not matter too much for the forecasting properties of the model, since these two panels are among the least recent ones we use. Finally, we can use the parameter estimates of (M1, p=1) to predict the one-year-ahead investment variations at firm level. A consistent predictor of the aggregate investment variation, obtained as the weighted average of the individual predictions over all the units (see equation 16), can now be compared with the aggregate realized investment variation (see equation 17) computed on the same panel sets. Figure 5 plots the two series, together with the forecast confidence interval at 68 per cent.
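As an illustration of the serial correlation check, the sketch below computes a Durbin-Watson statistic generalized to a balanced panel, in the spirit of Bhargava et al. (1982): the sum over firms of squared first differences of the residuals, divided by the sum of squared residuals. The function name, the panel dimensions and the simulated residuals are assumptions made for the example; the exact variant used for Table 4 is not reproduced here.

```python
import numpy as np

def panel_durbin_watson(residuals):
    """Durbin-Watson statistic pooled over a balanced panel.

    residuals: 2-D array-like with one row per firm and one column per year.
    First differences are taken within each firm only, never across firms.
    """
    e = np.asarray(residuals, dtype=float)
    numerator = np.sum(np.diff(e, axis=1) ** 2)
    denominator = np.sum(e ** 2)
    return numerator / denominator

# Hypothetical residuals: 130 firms observed for 7 years, serially uncorrelated
rng = np.random.default_rng(0)
raw = rng.normal(size=(130, 7))
within = raw - raw.mean(axis=1, keepdims=True)   # remove firm-specific means
dw = panel_durbin_watson(within)
print(f"panel Durbin-Watson statistic: {dw:.4f}")
```

Values close to 2 are consistent with no first-order serial correlation, while values well below 2 point to positive serial correlation; the statistic is then compared with a tabulated bound such as the 1.86 threshold quoted in the text.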
Figure 4 Forecast and realized investment variation (variation index on 8-year panel data)
The model generates a one-year-ahead forecast that tracks the dynamics of the aggregate investment variation quite faithfully and eliminates the bias of investment plans.
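The aggregation step behind the plotted series can be sketched as follows. The sketch assumes that the aggregate forecast is a weighted mean of the firm-level predictions and that the 68 per cent band is the point forecast plus or minus one standard error, which holds under approximately normal errors. The standard error formula further assumes independent forecast errors with a common standard deviation and ignores parameter uncertainty; this is a simplification introduced for the example, not the paper's derivation, and all names and numbers are hypothetical.

```python
import numpy as np

def aggregate_forecast(predictions, weights, resid_std):
    """Weighted mean of firm-level predictions with a +/- one standard error band.

    Assumptions (illustrative only): forecast errors are independent across
    firms with common standard deviation `resid_std`, and the uncertainty of
    the estimated parameters is ignored.
    """
    yhat = np.asarray(predictions, dtype=float)
    w = np.asarray(weights, dtype=float)
    share = w / w.sum()                        # normalized weights
    point = np.sum(share * yhat)               # aggregate point forecast
    se = resid_std * np.sqrt(np.sum(share ** 2))
    return point, point - se, point + se

# Hypothetical firm-level predictions, weights and residual standard deviation
point, lower, upper = aggregate_forecast(
    predictions=[1.0, 1.2, 0.8],
    weights=[1.0, 1.0, 2.0],
    resid_std=0.4,
)
print(f"forecast={point:.3f}, 68% band=({lower:.3f}, {upper:.3f})")
```

A band of one standard error is narrower than the more common 95 per cent interval, so realized values are expected to fall outside it in roughly one year out of three.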
8
Conclusions
The present paper examines a sample of Italian manufacturing firms for which qualitative investment plans and investment levels are collected in two separate surveys carried out every year. This panel sample is unique in that qualitative investment plans and quantitative investment measures are collected from the same firm on two different occasions. By relying on exactly matched data on plans and realizations, we estimate a quantitative model that significantly enriches the information obtained from simple qualitative variables. A straightforward future step will be to extend the model to the whole target population covered by the survey, including the industrial sectors outside manufacturing as well as private non-financial services. A more complex development will be the specification of a model that explicitly takes the attrition process into account through a suitable subset of regressors.
9
References
Arellano, M. - Bond, S. (1991), ‘Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations’, The Review of Economic Studies, 58: 277-297.
Bank of Italy (2008), ‘Survey of Industrial and Service Firms (Year 2007)’, Supplements to the Statistical Bulletin, Sample Surveys, XVIII, n. 42.
Bernanke, B. (1983), ‘Irreversibility, Uncertainty and Cyclical Investments’, The Quarterly Journal of Economics, 98: 85-106.
Bhargava, A. - Franzini, L. - Narendranathan, W. (1982), ‘Serial Correlation and the Fixed Effects Model’, Review of Economic Studies, XLIX: 533-549.
Bruno, G. S. F. (2005), ‘Approximating the bias of the LSDV estimator for dynamic unbalanced panel data models’, Economics Letters, 87: 361-366.
Chirinko, R. (1993), ‘Business Fixed Investment Spending: Modeling Strategies, Empirical Results, and Policy Implications’, Journal of Economic Literature, XXXI: 1875-1911.
DuMouchel, W. H. - Duncan, G. (1983), ‘Using Sample Survey Weights in Multiple Regression Analyses of Stratified Samples’, Journal of the American Statistical Association, 78: 535-543.
Goldrian, G. (ed.) (2007), Handbook of survey-based business cycle analysis, Edward Elgar Publishing Limited.
Guiso, L. - Parigi, G. (1999), ‘Investment And Demand Uncertainty’, The Quarterly Journal of Economics, 114: 185-227.
Holtz-Eakin, D. (1988), ‘Testing for individual effects in autoregressive models’, Journal of Econometrics, 39: 297-307.
Jimenez-Martin, S. (1998), ‘On the testing of heterogeneity effects in dynamic unbalanced panel data models’, Economics Letters, 58: 157-163.
Kocic, P.N. - Bell, P.A. (1994), ‘Optimal Winsorized Cutoffs for a Stratified Finite Population’, Journal of Official Statistics, 10: 419-435.
Lokshin, B. (2008), ‘Further results on bias in dynamic unbalanced panel data models with an application to firm R&D investment’, UNU-MERIT Working Papers, ISSN 1871-9872.
Lütkepohl, H. - Xu, F. (2009), ‘Forecasting Annual Inflation with Seasonal Monthly Data: Using Levels versus Logs of the Underlying Price Index’, EUI Working Papers, MWP 2009/37.
Nardo, M. (2003), ‘The Quantification of Qualitative Survey Data: A Critical Assessment’, Journal of Economic Surveys, 17: 645-668.
Pesaran, M.H. - Weale, M. (2006), ‘Survey expectations’, in Handbook of Economic Forecasting, North Holland.
Smith, P. - Pont, M. - Jones, T. (2003), ‘Developments in business survey methodology in the Office for National Statistics, 1994-2000’, Journal of the Royal Statistical Society: Series D (The Statistician), 52: 257-286.