GENERALIZED MAXIMUM ENTROPY ESTIMATION OF SPATIAL AUTOREGRESSIVE MODELS
by Thomas L. Marsh and Ron C. Mittelhammer
Draft August 2003
Associate Professor, Department of Agricultural Economics, 342 Waters Hall, Kansas State University, Manhattan, KS, 66506-4011; Phone 785-532-4913, Fax 785-532-6925
[email protected]. Professor, Department of Agricultural Economics and Professor, Department of Statistics, Washington State University, Pullman, Washington 99164.
Abstract: We formulate generalized maximum entropy estimators for the general linear model and the censored regression model when there is first order spatial autoregression in the dependent variable and residuals. Monte Carlo experiments are provided to compare the performance of spatial entropy estimators in small and medium sized samples relative to classical estimators. Finally, the estimators are applied to a model allocating agricultural disaster payments across regions.
1.0 Introduction

In this paper we examine the use of generalized maximum entropy estimators for linear and censored regression models when the data generating process is afflicted by first order spatial autoregression in either the dependent variable or error term. Generalized maximum entropy (GME) estimators of regression models in the presence of spatial autocorrelation are of interest because they 1) offer a systematic way of incorporating prior information on parameters of the model, 2) are straightforwardly applicable to nonnormal error distributions, and 3) are robust for ill-posed problems (Golan, Judge, and Miller 1996). Prior information in the form of parameter restrictions arises naturally in the context of spatial models because spatial correlation coefficients are themselves inherently bounded. The development of estimators with finite sample justification across a wide range of sampling distributions, and an investigation of their performance relative to established asymptotically justified estimators, provide important insight and guidance to applied economists regarding model and estimator choice.1

Various econometric approaches have been proposed for accommodating spatial autocorrelation in linear regression models and in limited dependent variable models. In the case of the linear regression model, Cliff and Ord (1981) provide a useful introduction to spatial statistics. Anselin (1988) provides foundations for spatial effects in econometrics, discussing least squares, maximum likelihood, instrumental variable, and method of moment estimators to account for spatial correlation issues in the linear regression model. In the case of the limited dependent variable model, most research has focused on the binary regression model and to a lesser extent the censored regression model. Besag (1972) introduced the auto-logistic model and motivated its use on plant
diseases. The auto-logistic model incorporated spatial correlation into the logistic model by conditioning the probability of occurrence of disease on its presence in neighboring quadrants (see also Cressie 1991). Poirier and Ruud (1988) investigate a probit model with dependent observations and prove consistency and asymptotic normality of maximum likelihood estimates. McMillen (1992) illustrated the use of a spatial autoregressive probit model on urban crime data with an Expectation-Maximization (EM) algorithm. At the same time, Case (1992) examined regional influence on the adoption of agricultural technology by applying a variance normalizing transformation in a maximum likelihood estimator to correct for spatial autocorrelation in a probit model. Marsh, Mittelhammer, and Huffaker (2000) also applied this approach to correct for spatial autocorrelation in a probit model by geographic region while examining an extensive data set pertaining to disease management in agriculture.

Bayesian estimation has also played an important role in spatial econometrics. LeSage (1997) proposed a Bayesian approach using Gibbs sampling to accommodate outliers and nonconstant variance within linear models. LeSage (2000) extended this to limited dependent variable models with spatial dependencies, while Smith and LeSage (2002) applied a Bayesian probit model with spatial dependencies to the 1996 presidential election results.

The principle of maximum entropy has been applied in a variety of modeling contexts, including applications to limited dependent variable models. However, to date, GME or other information theoretic estimators have not been applied to spatial regression models.2 Golan, Judge, and Miller (1996, 1997) proposed estimation of both the general linear model and the censored regression model based on the principle of generalized
maximum entropy in order to deal with small samples or ill-posed problems. Adkins (1997) investigated properties of a GME estimator of the binary choice model using Monte Carlo analysis. Golan, Judge, and Perloff (1996) applied maximum entropy to recover information from multinomial response data, while Golan, Judge, and Perloff (1997) recovered information with censored and ordered multinomial response data using generalized maximum entropy. Golan, Judge, and Zen (2001) proposed entropy estimators for a censored demand system with nonnegativity constraints, provided asymptotic results, and estimated a Mexican meat demand system.

The current paper proceeds in the following manner. First, we provide a brief review of Golan, Judge, and Miller’s (1996) GME estimator of the general linear model (GLM) and then investigate a generalization to spatial GME-GLM estimators. Second, Monte Carlo experiments are provided to benchmark the mean squared error loss of the spatial GME estimators relative to ordinary least squares (OLS) and maximum likelihood (ML) estimators. We also examine the sensitivity of the spatial GME estimators to user-supplied supports, proximity matrices, non-normal errors, and other model assumptions. Third, Golan, Judge, and Perloff’s (1997) GME estimator of the censored regression model (i.e., Tobit) is extended to spatial GME-Tobit estimators, and additional Monte Carlo experiments are presented to investigate the sampling properties of the method. Finally, the spatial entropy GLM and Tobit approaches are applied empirically to the estimation of a simultaneous Tobit model of agricultural disaster payment allocations across regions.
2.0 Spatial GME-GLM Estimator

2.1 Data Constrained GME-GLM

Consider the general linear model (GLM)

$$Y = X\beta + \varepsilon \tag{1}$$

with $Y$ an $N \times 1$ dependent variable vector, $X$ a fixed $N \times K$ matrix of explanatory variables, $\beta$ a $K \times 1$ vector of parameters, and $\varepsilon$ an $N \times 1$ vector of disturbance terms. The data constrained GME estimator of the general linear model (hereafter GME-D) is defined by the following constrained maximum entropy problem (Golan, Judge, and Miller 1996):3

$$\max_{p} \left\{ -(p)' \ln(p) \right\} \tag{2a}$$

subject to

$$Y = X(S^{\beta} p^{\beta}) + (S^{\varepsilon} p^{\varepsilon}) \tag{2b}$$

$$1' p^{\beta}_k = 1 \;\; \forall k, \qquad 1' p^{\varepsilon}_i = 1 \;\; \forall i \tag{2c}$$

$$p = \mathrm{vec}(p^{\beta}, p^{\varepsilon}) > [0] \tag{2d}$$

In (2) the unknown parameter vector $\beta$ and error vector $\varepsilon$ are reparameterized as $\beta = S^{\beta} p^{\beta}$ and $\varepsilon = S^{\varepsilon} p^{\varepsilon}$ from known matrices of support points $S^{\beta}$, $S^{\varepsilon}$ and an unknown $(KJ + NM) \times 1$ vector of weights $p = \mathrm{vec}(p^{\beta}, p^{\varepsilon})$.4 The $KJ \times 1$ vector $p^{\beta} = \mathrm{vec}(p^{\beta}_1, \ldots, p^{\beta}_K)$ and the $NM \times 1$ vector $p^{\varepsilon} = \mathrm{vec}(p^{\varepsilon}_1, \ldots, p^{\varepsilon}_N)$ consist of $J \times 1$ vectors $p^{\beta}_k$ and $M \times 1$ vectors $p^{\varepsilon}_i$, each having nonnegative elements summing to unity. The matrices $S^{\beta}$ and $S^{\varepsilon}$ are $K \times KJ$ and $N \times NM$ block-diagonal matrices of user supplied support points for the unknown $\beta$ and $\varepsilon$ vectors. For example, consider the support
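matrix for the $\beta$ vector,

$$S^{\beta} = \begin{bmatrix} (s^{\beta}_1)' & 0 & \cdots & 0 \\ 0 & (s^{\beta}_2)' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & (s^{\beta}_K)' \end{bmatrix} \tag{3}$$

Here, $s^{\beta}_k = (s^{\beta}_{k1}, \ldots, s^{\beta}_{kJ})'$ is a $J \times 1$ vector such that $s^{\beta}_{k1} \le s^{\beta}_{k2} \le \cdots \le s^{\beta}_{kJ}$ and $\beta_k = \sum_{j=1}^{J} s^{\beta}_{kj} p^{\beta}_{kj} \in [s^{\beta}_{k1}, s^{\beta}_{kJ}]$ $\forall k = 1, \ldots, K$. Similarly the matrix $S^{\varepsilon}$ is defined such that $s^{\varepsilon}_i = (s^{\varepsilon}_{i1}, \ldots, s^{\varepsilon}_{iM})'$ are $M \times 1$ vectors such that $s^{\varepsilon}_{i1} \le s^{\varepsilon}_{i2} \le \cdots \le s^{\varepsilon}_{iM}$ and $\varepsilon_i = \sum_{m=1}^{M} s^{\varepsilon}_{im} p^{\varepsilon}_{im} \in [s^{\varepsilon}_{i1}, s^{\varepsilon}_{iM}]$ $\forall i = 1, \ldots, N$. In effect the user-supplied supports used in the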
entropy approach provide a means of formalizing more ad hoc methods researchers have commonly employed to impose a priori restrictions on regression parameters and ranges of disturbance outcomes.

The generalized maximum entropy formulation in (2) inherently incorporates a dual entropy loss function that balances estimation precision in coefficient estimates and predictive accuracy.5 Mittelhammer and Cardell (1998) have derived asymptotic properties and test statistics for the data-constrained GME-GLM estimator. They also identified a more computationally efficient approach with which to solve entropy based problems, one that does not expand with sample size and that provides the basis for the optimization algorithms used in the current study.

2.2 Spatial GME Estimators

To account for direct influences of spatial neighbors, the classical regression model in (1)
can be reformulated as a first order spatial autoregressive model (see Cliff and Ord 1981; Anselin 1988)

$$Y = \rho W_1 Y + X\beta + \varepsilon \tag{4}$$

In (4), $W_1$ is an $N \times N$ proximity matrix and $\rho$ is a scalar interpreted as the spatial correlation coefficient of the spatially lagged dependent variable. For instance, the elements of the proximity matrix $W_1 = \{w^*_{ij}\}$ may be defined as a standardized joins matrix where $w^*_{ij} = w_{ij} / \sum_j w_{ij}$, with $w_{ij} = 1$ if observations $i$ and $j$ are from an adjoining spatial region (for $i \neq j$) and $w_{ij} = 0$ otherwise. The spatial correlation coefficient implies positive spatial correlation if $\rho > 0$, negative spatial correlation if $\rho < 0$ [...] $> [0]$

where $p = \mathrm{vec}(p^{\beta}, p^{\pi}, p^{\rho}, p^{\lambda}, p^{u^*}, p^{v})$ is a $(KJ + LJ + 2J + 2NM) \times 1$ vector of unknown parameters. This specification is based on a GME estimator of the simultaneous equations model introduced by Marsh, Mittelhammer, and Cardell (1998), who demonstrated properties of consistency and asymptotic normality for the estimator.8
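
The standardized joins matrix described for $W_1$ can be illustrated with a minimal sketch. The four-region map, the function names `row_standardize` and `spatial_lag`, and the data values are hypothetical and are not taken from the paper; the sketch only shows how $w^*_{ij} = w_{ij} / \sum_j w_{ij}$ turns the spatial lag $W_1 Y$ into an average over adjoining regions.

```python
def row_standardize(adjacency):
    """Return W* with w*_ij = w_ij / sum_j w_ij (rows without neighbours stay zero)."""
    standardized = []
    for row in adjacency:
        total = sum(row)
        standardized.append([w / total if total else 0.0 for w in row])
    return standardized


def spatial_lag(w, y):
    """Compute the spatially lagged vector W y."""
    return [sum(w_ij * y_j for w_ij, y_j in zip(row, y)) for row in w]


# Hypothetical map of four regions on a line, 0-1-2-3:
# w_ij = 1 if regions i and j adjoin (i != j), and 0 otherwise.
joins = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
w1 = row_standardize(joins)

y = [10.0, 20.0, 30.0, 40.0]   # made-up outcomes, one per region
lagged = spatial_lag(w1, y)    # each entry averages the neighbours' outcomes
```

Because every row of the standardized matrix sums to one, each element of `lagged` is a weighted average of neighbouring outcomes, which is one reason the spatial correlation coefficient is naturally bounded.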
2.3 Monte Carlo Experiment I - Spatial GLM

The Monte Carlo experiments for the general linear model follow those in Mittelhammer and Cardell (1997). The linear component $X\beta$ is specified as

$$X\beta = [1, x_2, x_3, x_4] \begin{bmatrix} 2 \\ 1 \\ -1 \\ 3 \end{bmatrix} \tag{14}$$

where the values of $x_{i2}$, $i = 1, \ldots, N$ are iid outcomes from Bernoulli(0.5), and the values of the pair of explanatory variables $x_{i3}$ and $x_{i4}$ are generated as iid outcomes from

$$N\left( \begin{bmatrix} \mu_3 \\ \mu_4 \end{bmatrix}, \begin{bmatrix} \sigma^2_{x_3} & \sigma_{x_3,x_4} \\ \sigma_{x_3,x_4} & \sigma^2_{x_4} \end{bmatrix} \right) = N\left( \begin{bmatrix} 2 \\ 5 \end{bmatrix}, \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix} \right) \tag{15}$$

which are then truncated at ±3 standard deviations. In the experiments below, estimator performance is also examined when the correlation between $x_3$ and $x_4$ is increased to $\sigma_{x_3,x_4} = 0.85$. The disturbance terms $u_i$ are drawn iid from a $N(0, \sigma^2)$ distribution that is truncated at ±3 standard deviations, with $\sigma^2 = 1$. Thus, the true disturbance distribution in this Monte Carlo experiment is truncated normal, with lower and upper truncation points located at -3 and +3, respectively.

To investigate the impact of coefficient restrictions on the choice of error truncation points, two different sets of supports for each of the four parameters are defined by $s^{\beta}_k = (-5, 0, 5)'$ and $s^{\beta}_k = (-20, 0, 20)'$, $\forall k$. Supports for each of the reduced form parameters are defined by $s^{\pi}_l = (-20, 0, 20)'$, $\forall l$, for each experiment. Alternatively, if the researcher has strong priors about a particular coefficient then
supports can be adjusted to accommodate those desired restrictions. Error supports for the GME estimator are defined following Golan, Judge, and Miller (1996), using Pukelsheim’s (1994) 3-sigma rule. Specifically, in all of the experiments for the general linear model the supports are defined as $s^{u^*}_i = s^{v}_i = (-3\hat{\sigma}_y, 0, 3\hat{\sigma}_y)'$ $\forall i$, where $\hat{\sigma}_y$ is the sample standard deviation of the dependent variable.

Next we specify the model assumptions for the spatial components (i.e., proximity matrices and spatial correlation coefficients with supports) of the regression model. The true spatial correlation coefficients are selected as $\rho = 0$ and $\lambda = 0.5$ ($\rho = 0.5$ and $\lambda = 0$) for spatial autocorrelation in the residuals (dependent variable). Supports for the spatial correlation coefficients are specified by using eigenvalue bounds as $s^{\rho} = s^{\lambda} = (1/\xi_{\min}, 0, 1/\xi_{\max})'$. Two spatial patterns were used to define the proximity matrices in the Monte Carlo experiments, these being the star lattice (all regions are joined at a single common point) and a pairwise join (wherein only individual pairs of regions are joined). These are defined, respectively, by the bordered matrix ($W_b$) and Toeplitz matrix ($W_t$) as:

$$W_b = \begin{bmatrix} 0 & 1 & 1 & \cdots & 1 \\ 1 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \cdots & \vdots \\ 1 & 0 & 0 & \cdots & 0 \\ 1 & 0 & 0 & \cdots & 0 \end{bmatrix} \quad \text{and} \quad W_t = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \cdots & \vdots \\ 0 & \cdots & 1 & 0 & 1 \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix} \tag{16}$$

While there is an infinite number of weighting matrices from which to choose, these two specifications provide substantially different spatial representations with which to test robustness of the econometric estimators.9
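
As a rough illustration of the two proximity patterns and the eigenvalue-based supports, the sketch below builds both matrices and forms $(1/\xi_{\min}, 0, 1/\xi_{\max})'$. It assumes the unstandardized 0/1 matrices exactly as displayed in (16) and uses well-known closed-form extreme eigenvalues for these two matrices rather than a numerical eigenvalue routine; the function names are hypothetical, and the paper does not state how the eigenvalues were computed.

```python
import math


def bordered_matrix(n):
    """Star pattern: region 0 is joined to every other region."""
    w = [[0] * n for _ in range(n)]
    for j in range(1, n):
        w[0][j] = 1
        w[j][0] = 1
    return w


def toeplitz_matrix(n):
    """Pairwise pattern: ones on the first super- and sub-diagonal."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


def correlation_support(xi_min, xi_max):
    """Three-point support (1/xi_min, 0, 1/xi_max) from eigenvalue bounds."""
    return (1.0 / xi_min, 0.0, 1.0 / xi_max)


n = 25
w_b = bordered_matrix(n)
w_t = toeplitz_matrix(n)

# Closed-form extreme eigenvalues of the symmetric 0/1 matrices above
# (both spectra are symmetric about zero, so xi_min = -xi_max).
xi_max_b = math.sqrt(n - 1)                    # star: +/- sqrt(n - 1)
xi_max_t = 2.0 * math.cos(math.pi / (n + 1))   # tridiagonal: 2 cos(k pi / (n + 1))

support_b = correlation_support(-xi_max_b, xi_max_b)
support_t = correlation_support(-xi_max_t, xi_max_t)
```

If the proximity matrices were row-standardized before estimation, the eigenvalue bounds and hence the supports would differ from those produced here.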
Table 1 presents the assumptions underlying twenty experiments that compare the performance of GME to the maximum likelihood (ML) estimator under normality of the GLM with first order spatial autocorrelation in either the dependent variable or the residuals.10 Experiments 1-6 deal with spatial autocorrelation in the residuals, while experiments 7-12 deal with spatial autocorrelation in the dependent variable. The proximity matrix used in experiments 1-12 is the Toeplitz matrix. Experiments 13-18 employ the bordered proximity matrix for both spatial autocorrelation in the residuals and the dependent variable, concentrating on the case characterized by the widest supports for the parameters given by s ßk = ( −20,0,20 )′ . Finally, experiments 19 and 20 compare performance of estimators in the presence of non-normal errors. The error term is contaminated by outcomes from an asymmetric distribution (Huber 1981; Hampel et al. 1986). Specifically, for a given percentage level ϕ , the errors for the structural equations are defined by (1 − ϕ) F1 + ϕ F2 where F1 is a N(0,1) distribution truncated at ± 3 standard deviations and F2 is a Beta (2,3) − 6 distribution that is skewed with mean zero. We examine the robustness of ML and GME-NLP for ϕ = 0.15, 0.25. All of the Monte Carlo results are based on 1000 replications with N=25 and 50 observations. For completeness, we also include performance measures of OLS.

It is worth noting that convergence of the GME spatial estimator [using the general objective function optimizer OPTMUM in GAUSS (Aptech Systems Inc.)] occurred within a matter of seconds for each replication across all the Monte Carlo experiments. Appendix A discusses computational issues and provides derivations of the gradient and Hessian for the GME-NLP estimator in (13).
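
The role of the three-point supports used in these experiments can be sketched as follows. The snippet shows how a weight vector on a support maps to an implied coefficient value, how the entropy of those weights is computed, and how a 3-sigma error support is formed from the sample standard deviation. The weights, the data values, and the function names are hypothetical; this is an illustration of the reparameterization in (2), not the estimation code used in the paper.

```python
import math
import statistics


def three_sigma_support(y):
    """Error support (-3*sd, 0, 3*sd) using the sample standard deviation of y."""
    sd = statistics.stdev(y)
    return (-3.0 * sd, 0.0, 3.0 * sd)


def implied_value(support, weights):
    """Value recovered from a support vector and its weights: sum_j s_j * p_j."""
    return sum(s * p for s, p in zip(support, weights))


def entropy(weights):
    """Shannon entropy -sum p ln p, treating 0 * ln 0 as 0."""
    return -sum(p * math.log(p) for p in weights if p > 0.0)


beta_support = (-20.0, 0.0, 20.0)   # wide parameter support from the experiments
uniform = (1 / 3, 1 / 3, 1 / 3)     # weights that maximize entropy
tilted = (0.10, 0.75, 0.15)         # hypothetical weights, for illustration only

beta_uniform = implied_value(beta_support, uniform)   # centre of the support
beta_tilted = implied_value(beta_support, tilted)     # shifted toward +20

y_sample = [1.2, 3.4, 2.2, 5.1, 4.0]                  # made-up data
error_support = three_sigma_support(y_sample)
```

With no information from the data, the entropy criterion favors uniform weights, which place the implied coefficient at the centre of its support; the data constraints pull the weights away from uniformity at the cost of lower entropy.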
2.3.1 Experiment I - Results

Table 2 contains the mean squared error loss (MSEL) from experiments 1-12 with the Toeplitz proximity matrix for: (a) the spatial correlation coefficients ρ and λ estimated by the spatial ML and GME estimators and (b) the regression coefficients ß for the spatial ML and GME estimators and the OLS estimator. As expected, MSEL values predominantly decreased when sample size increased from 25 to 50 observations. Increasing the correlation between x 3 and x 4 from 0.50 to 0.85 increased the MSEL for the regression coefficients ß in OLS and ML, influencing these estimators to a notably larger extent than GME-NLP and GME-N. Restricting the supports from s ßk = ( −20,0,20 )′ to s ßk = ( −5,0,5)′ led to a decrease in MSEL for the GME estimators.

For the regression coefficients ß , both GME-NLP and ML exhibited smaller MSEL than either OLS or GME-N. In experiments 1-6 with spatial error correlation, the relative MSEL ranking of ML and GME-NLP was mixed, while GME-N had the largest MSEL for each experiment. In experiments 7 and 9-12, with a spatially lagged dependent variable, GME-NLP outperformed ML. Again, GME-NLP and ML had lower MSEL than GME-N. OLS had the largest MSEL for each experiment, displaying its inherent simultaneity bias. Regarding the spatial correlation coefficients ρ and λ , ML had the lowest MSEL across experiments 1-12.
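
The MSEL comparisons above can be read with the following sketch in mind. It assumes that MSEL is the average, over Monte Carlo replications, of the squared Euclidean distance between the estimated and true parameter vectors; this definition, the function name `msel`, and the simulated "estimates" are assumptions made for illustration and are not results from the paper.

```python
import random


def msel(estimates, truth):
    """Average over replications of the squared Euclidean distance to the truth."""
    total = 0.0
    for est in estimates:
        total += sum((e - t) ** 2 for e, t in zip(est, truth))
    return total / len(estimates)


# Toy illustration: each "estimate" is the truth plus N(0, 0.1^2) noise in every
# coordinate, so the MSEL should be close to K * 0.01 with K = 4 coefficients.
rng = random.Random(0)
beta_true = [2.0, 1.0, -1.0, 3.0]   # coefficient vector from equation (14)
replications = [
    [b + rng.gauss(0.0, 0.1) for b in beta_true] for _ in range(1000)
]
loss = msel(replications, beta_true)
```

In an actual experiment, `replications` would hold the OLS, ML, or GME estimates from each of the 1000 simulated samples, and the resulting losses would be compared across estimators.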
Table 3 reports additional Monte Carlo results from experiments 13-18 that use the bordered proximity matrix and experiments 19-20 that use the Toeplitz proximity matrix with contaminated errors. Except for the different proximity matrices, experiments 13-15 (16-18) correspond identically to experiments 1-3 (7-9), respectively.
While OLS and GME-N exhibited performance similar to that in experiments 1-3 and 7-9, ML exhibited dramatically larger MSEL in experiments 16-18 in the presence of a spatially lagged dependent variable. Across experiments 13-18, with the bordered proximity matrix, GME-NLP dominated the other estimators in MSEL. In experiments 19 and 20 (with a spatially lagged dependent variable), where the percentage of error contamination was 15% and 25%, respectively, the GME estimators dominated ML in MSEL and demonstrated their robustness to non-normal residuals.

Overall, the Monte Carlo results for the spatial regression model indicate that the data constrained GME-NLP estimator dominated the normalized moment constrained GME-N estimator in MSEL. Neither ML nor GME-NLP dominated the other in the presence of spatial error correlation. However, relative to ML and GME-N, GME-NLP was the more robust estimator in the presence of a spatially lagged dependent variable, non-normal errors, and across the Toeplitz and bordered proximity matrices. As a result, we focus on the data constrained GME-NLP estimator when investigating the censored Tobit model.

3.0 Spatial GME Tobit Estimator

3.1 GME-Tobit Model

Consider a Tobit model with censoring of the dependent variable at zero

$$Y_i = \begin{cases} Y_i^* & \text{if } Y_i^* = X_{i\cdot}\beta + \varepsilon_i > 0 \\ 0 & \text{if } Y_i^* = X_{i\cdot}\beta + \varepsilon_i \le 0 \end{cases} \tag{17}$$

Golan, Judge, and Perloff (1997) reorder the observations and rewrite (17) in matrix form as

$$Y = \begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} Y_1 \\ 0 \end{bmatrix}, \qquad Y_1 = X_1\beta + \varepsilon_1 > 0, \quad X_2\beta + \varepsilon_2 \le 0 \tag{18}$$
where the subscript 1 indexes the observations associated with $N_1$ positive elements, the subscript 2 indexes the observations associated with $N_2$ zero elements of $Y$, and $N_1 + N_2 = N$. In the GME method, the estimator of the unknown $\beta$ in the Tobit model formulation is given by $\beta = S^{\beta} p^{\beta}$, where $p^{\beta} = \mathrm{vec}(p^{\beta}_1, \ldots, p^{\beta}_K)$ is a $KJ \times 1$ vector, and $\varepsilon = S^{\varepsilon} p^{\varepsilon}$, where $p^{\varepsilon} = \mathrm{vec}(p^{\varepsilon_1}, p^{\varepsilon_2})$ is an $NM \times 1$ vector, with both $p^{\beta}$ and $p^{\varepsilon}$ being derived from the following constrained maximum entropy problem:

$$\max_{p} \left\{ -(p)' \ln(p) \right\} \tag{19a}$$

subject to

$$\begin{aligned} Y_1 &= X_1 (S^{\beta} p^{\beta}) + S^{\varepsilon}_1 p^{\varepsilon_1} \\ 0 &\ge X_2 (S^{\beta} p^{\beta}) + S^{\varepsilon}_2 p^{\varepsilon_2} \end{aligned} \tag{19b}$$

$$1' p^{\beta}_k = 1, \qquad 1' p^{\varepsilon_1}_i = 1, \qquad 1' p^{\varepsilon_2}_i = 1 \tag{19c}$$

$$p = \mathrm{vec}(p^{\beta}, p^{\varepsilon_1}, p^{\varepsilon_2}) > [0] \tag{19d}$$

where $p = \mathrm{vec}(p^{\beta}, p^{\varepsilon_1}, p^{\varepsilon_2})$ is a $(KJ + NM) \times 1$ vector of unknown support weights. Under general regularity conditions, Golan, Judge, and Perloff (Proposition 4.1, 1997) demonstrate that the GME-Tobit model is a consistent estimator.

3.2 Spatial GME-Tobit Model

Assume a censored Tobit model with censoring of the dependent variable at zero and first order spatial autoregressive process in both the dependent and error variables as (20)
$$Y_i = \begin{cases} Y_i^* & \text{if } Y_i^* = \tilde{Y}_i^* + \tilde{X}_{i\cdot}\beta + u_i > 0 \\ 0 & \text{if } Y_i^* = \tilde{Y}_i^* + \tilde{X}_{i\cdot}\beta + u_i \le 0 \end{cases}$$

where $\tilde{X}_{i\cdot}$ denotes the ith row of the matrix $\tilde{X} = (I - \lambda W_2) X$ and $\tilde{Y}_i^*$ denotes the ith row of $\tilde{Y}^* = (\rho W_1 + \lambda W_2 - (\lambda W_2)(\rho W_1)) Y$. If $\lambda = 0$ and $\rho \neq 0$ ($\lambda \neq 0$ and $\rho = 0$) then this is a Tobit model with a first order spatial autoregressive process in the dependent (error) variable. Equation (20) reduces to the standard Tobit formulation in (17) if $\rho = \lambda = 0$. The spatial GME rule for defining the spatial Tobit estimator of the unknown parameters $\beta, \pi, \rho, \lambda$ in the combined spatial autoregressive models (20) is represented by the constrained maximum entropy problem:
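$$\max_{p} \left\{ -(p)' \ln(p) \right\} \tag{21a}$$

subject to

$$\begin{aligned} Y_1 &= \tilde{Z}_1 (S^{\pi} p^{\pi}) + \tilde{X}_1 (S^{\beta} p^{\beta}) + S^{u^*}_1 p^{u^*}_1 \\ 0 &\ge \tilde{X}_2 (S^{\beta} p^{\beta}) + S^{u^*}_2 p^{u^*}_2 \end{aligned} \tag{21b}$$

$$\tilde{Z} = \left( (S^{\rho} p^{\rho}) W_1 + (S^{\lambda} p^{\lambda}) W_2 - (S^{\lambda} p^{\lambda}) W_2 (S^{\rho} p^{\rho}) W_1 \right) Z \tag{21c}$$

$$\tilde{X} = \left( I - (S^{\lambda} p^{\lambda}) W_2 \right) X \tag{21d}$$

$$Y = Z (S^{\pi} p^{\pi}) + (S^{v} p^{v}) \tag{21e}$$

$$1' p^{\beta}_k = 1 \;\forall k, \quad 1' p^{\pi}_k = 1 \;\forall k, \quad 1' p^{u^*}_i = 1 \;\forall i, \quad 1' p^{\rho} = 1, \quad 1' p^{\lambda} = 1 \tag{21f}$$

$$p = \mathrm{vec}(p^{\beta}, p^{\pi}, p^{\rho}, p^{\lambda}, p^{u^*}, p^{v}) > [0] \tag{21g}$$

where $\tilde{X}_1$ and $\tilde{X}_2$ are partitioned submatrices of $\tilde{X} = (I - (S^{\lambda} p^{\lambda}) W_2) X$, $\tilde{Z}_1$ and $\tilde{Z}_2$ are partitioned submatrices of $\tilde{Z} = ((S^{\rho} p^{\rho}) W_1 + (S^{\lambda} p^{\lambda}) W_2 - (S^{\lambda} p^{\lambda}) W_2 (S^{\rho} p^{\rho}) W_1) Z$, and $u^* = \mathrm{vec}(u^*_1, u^*_2)$ (corresponding to the ordering of $Y$ discussed in (18)). Apart from reordering the data, the spatial Tobit estimator in (21) has the same structural components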
as the estimator in (13).

3.3 Iterative Spatial GME-Tobit Model

Breiman et al. (1993) discussed iterative least squares estimation of the censored regression model that coincides with ML and the EM algorithm under normality, but that does not necessarily coincide with ML or EM with nonnormal errors. Each iteration involves two steps: an expectation step and a re-estimation step.11 Following this approach, Golan, Judge, and Perloff (1997) suggest using the initial estimates from optimizing (19), defined by $\hat{\beta}^{(0)} = S^{\beta} \hat{p}^{\beta(0)}$, to predict $\hat{Y}_2^{(1)}$ and then redefine $Y^{(1)} = \mathrm{vec}(Y_1, \hat{Y}_2^{(1)})$ in re-estimating and updating $\hat{\beta}^{(1)} = S^{\beta} \hat{p}^{\beta(1)}$.12 In the empirical exercises below, we follow this process to obtain the ith iterated estimate $\hat{\beta}^{(i)} = S^{\beta} \hat{p}^{\beta(i)}$ of the spatial GME-Tobit model in equation (21).

3.4 Monte Carlo Experiment II - Spatial Tobit Model

The sampling experiments for the Tobit model follow closely those in Golan, Judge, and Perloff (1997) and Paarsch (1984). The explanatory variables and coefficients of the model are defined as

$$X\beta = [x_1, x_2, x_3, x_4] \begin{bmatrix} 2 \\ 1 \\ -3 \\ 2 \end{bmatrix} \tag{22}$$

where the $x_{il}$, $i = 1, \ldots, N$, $l = 1, \ldots, 4$ are generated iid from N(0,1) and are orthonormal. The disturbance terms $u_i$ are drawn iid from a N(0,1) distribution. The percent of censored observations was approximately 50% across the sampling experiments.
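
The non-spatial core of this design can be sketched in a few lines. The snippet draws regressors and disturbances as iid N(0,1), applies the coefficients in (22), censors at zero as in (17), and reorders the sample into uncensored and censored blocks as in (18). It deliberately omits the spatial lag and spatial error components and the orthonormalization of the regressors; the function names, sample size, and seed are assumptions chosen for illustration.

```python
import random


def simulate_tobit(n, beta, rng):
    """Draw (X, y) with y = max(0, x'beta + u), where x_il and u_i are iid N(0, 1)."""
    x = [[rng.gauss(0.0, 1.0) for _ in beta] for _ in range(n)]
    y = []
    for row in x:
        latent = sum(b * v for b, v in zip(beta, row)) + rng.gauss(0.0, 1.0)
        y.append(latent if latent > 0.0 else 0.0)
    return x, y


def reorder(x, y):
    """Stack uncensored observations first and censored ones second, as in (18)."""
    positive = [i for i, v in enumerate(y) if v > 0.0]
    censored = [i for i, v in enumerate(y) if v == 0.0]
    order = positive + censored
    return [x[i] for i in order], [y[i] for i in order], len(positive)


rng = random.Random(12345)
beta = [2.0, 1.0, -3.0, 2.0]    # coefficients from equation (22)
x, y = simulate_tobit(2000, beta, rng)
x_sorted, y_sorted, n1 = reorder(x, y)
share_censored = 1.0 - n1 / len(y)
```

Because the latent variable has no intercept and is symmetric about zero, roughly half of the observations are censored, in line with the approximately 50% censoring reported for the experiments.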
The structural and reduced form error supports for the GME estimator are defined using a variation of Pukelsheim’s 3-sigma rule. Here, the supports are defined as $(-3\hat{\sigma}_y, 0, 3\hat{\sigma}_y)'$ where $\hat{\sigma}_y = (y_{\max} - \hat{y}_{\min})/12$ (Golan, Judge, and Perloff 1997). The spatial correlation coefficients are set to $\rho = 0$ and $\lambda = 0.5$ ($\rho = 0.5$ and $\lambda = 0$) for spatial autocorrelation in the residuals (dependent variable) with supports $s^{\rho} = s^{\lambda} = (1/\xi_{\min}, 0, 1/\xi_{\max})'$. Supports for the structural and reduced form coefficients are specified as $s^{\beta}_k = s^{\pi}_l = (-20, 0, 20)'$.

Table 4 presents the assumptions underlying four experiments comparing the spatial GME Tobit model with the ML estimator of the Tobit model. Experiments 1 and 2 deal with spatial autocorrelation in the residuals, while experiments 3 and 4 deal with spatial autocorrelation in the dependent variable. Experiments 5 and 6 introduce the contaminated error model with $\phi = 0.25$ with spatial autocorrelation in the dependent variable. For each experiment the proximity matrix is the Toeplitz matrix. The Monte Carlo results are based on 1000 replications with N=25 and 50 observations.

3.4.1 Experiment II - Results

Table 5 reports the MSEL for the regression coefficients $\beta$ and the spatial correlation coefficients $\rho$, $\lambda$ from the non-iterative and iterative spatial GME estimators.13 Table 5 also contains the MSEL of the regression coefficients $\beta$ for the standard ML estimator of the Tobit model, or ML-Tobit. The evidence from the Monte Carlo simulations indicates that ML-Tobit outperformed both the non-iterative and iterative GME-NLP in MSEL in the presence of a spatially lagged residual (experiments 1 and 2). Alternatively, the iterative GME-NLP
outperformed ML-Tobit in the presence of a spatially lagged dependent variable (experiments 3 and 4) and for the case of contaminated residuals (experiments 5 and 6). Overall, the iterative GME-NLP estimator appears to perform best in MSEL in the presence of a spatially lagged dependent variable.14

4.0 Illustrative Application: Allocating Agricultural Disaster Payments

Agricultural disaster relief in the U.S. has commonly taken one of three forms: emergency loans, crop insurance, and direct disaster payments (U.S. GAO). Of these, direct disaster payments are considered the least efficient form of disaster relief (Goodwin and Smith, 1995). Direct disaster payments from the government provide cash payments to producers who suffer catastrophic losses, and are managed through the USDA’s Farm Service Agency (FSA). The bulk of direct disaster funding is used to reimburse producers for crop and feed losses rather than livestock losses. Direct disaster payments approached $30 billion during the 1990s (FSA), by far the largest of the three disaster relief programs.

Unlike the crop insurance program, which farmers use to manage their risk, it is usually legislators who decide whether or not a direct payment should be made to individual farmers after a disaster occurs. The amount of disaster relief available through emergency loans and crop insurance is determined by contract, whereas direct disaster relief is determined solely by legislators, and only after a disaster occurs. Politics thus plays a much larger role in determining the amount of direct disaster relief than it does with emergency loans and crop insurance. Direct payments are also blamed for low participation in the crop insurance program. The ‘free’ disaster relief available through direct payments gives
little incentive for producers to pay for crop insurance coverage. Furthermore, legislators from a specific state find it politically harmful not to subsidize farmers who experienced a disaster, given the presence of organized agriculture interest groups within that state (Becker, 1983; Gardner 1987).

4.1 Modeling Disaster Relief

Several important econometric issues arise in allocating agricultural disaster payments. First, there is potential simultaneity between disaster relief and crop insurance payments. Second, and more importantly for current purposes, regional influences of natural disasters and subsequent political allocations may have persistent spatial correlation effects across states. Ignoring either econometric issue can lead to biased and inconsistent estimates and faulty policy recommendations. Consider the following simultaneous equations model with spatial components:15

$$Y_d = (\rho W_1 + \lambda W_2 - (\lambda W_2)(\rho W_1)) Y_d + \delta Y_c + (I - \lambda W_2) X_d \beta + u \tag{23}$$

$$Y_c = Z \pi_c + v_c \tag{24}$$

$$Y_d = Z \pi_d + v_d \tag{25}$$

where the dependent variable $Y_d$ denotes disaster payments, and is censored because some states do not receive any direct agriculture disaster relief in certain years (in the context of the Tobit model, $Y_d = 0$ if $Y_d^* \le 0$ and $Y_d = Y_d^*$ if $Y_d^* > 0$). In (23), $Y_c$ denotes crop insurance payments (non-censored) and $X_d$ are exogenous variables including measures of precipitation, political interest group variables, year specific dummy variables, and number of farms. In the reduced form models (24) and (25), $Z$ includes per capita personal income, farm income, the number of farm acres, total crop values, geographical census region, income measures, year specific dummy variables, and
number of farms and political factors. The parameters to be estimated are the structural coefficients $\delta$ and $\beta$, as well as the reduced form coefficients $\pi_c$, $\pi_d$.

The data were collected from the FSA and other sources. A complete list and description of all direct disaster relief programs are available through the FSA. The FSA data set maintains individual farmer transactions of all agricultural disaster payments in the U.S. For the purposes of the current study, FSA aggregated the transactions across programs and individuals to obtain annual levels of disaster payments for each of the 48 contiguous states from 1992 to 1999. A list of selected variables and definitions is provided in Table 6.16

For this application the elements of the proximity matrices for each time period, $W = \{w^*_{ij}\}$, are defined as a standardized joins matrix where $w^*_{ij} = w_{ij} / \sum_j w_{ij}$, with $w_{ij} = 1$ if observations $i$ and $j$ (for $i \neq j$) are from adjoining states (e.g., Kansas and Nebraska) and $w_{ij} = 0$ otherwise (e.g., Kansas and Washington). To account for the time series cross-sectional nature of the data, the full proximity matrix used in modeling all of the observed data was defined as a block diagonal matrix such that $W_1 = W_2 = (I_T \otimes W)$, where $W$ is the joins matrix defined above and $I_T$ is a $T \times T$ identity matrix with $T = 8$ representing the 8 years of available data.

The analysis proceeded in several steps. First, to simplify the estimation process and focus on the spatial components of the model, predicted crop insurance values were obtained from the reduced form model in (24) with the GME-D estimator in (13). The predicted crop insurance values were then used in the disaster relief model.17 Second, equations (23) and (25) were jointly estimated with non-iterative GME-NLP in (21). Supports were specified as $s^{\beta}_i = s^{\pi}_j = (-1000, 0, 1000)'$ for the structural and reduced form
parameters, $s^{u^*}_i = s^{v}_i = (-j\hat{\sigma}_{Y_d}, 0, j\hat{\sigma}_{Y_d})'$ $\forall i$ for the structural and reduced form residuals with $j = 5$ and the standard deviation of disaster payments $\hat{\sigma}_{Y_d} = 137$, and $s^{\rho} = s^{\lambda} = (1/\hat{\xi}_{\min}, 0, 1/\hat{\xi}_{\max})'$ for the spatial autocorrelation parameters with estimated eigenvalues $\hat{\xi}_{\min} = -0.7$ and $\hat{\xi}_{\max} = 1.1$. Effectively, structural coefficient supports were selected to allow political coefficients to range from -\$1 billion to \$1 billion. Because of the inherent unpredictability and political nature of disaster allocations, circumstances arose in the data that yielded relatively large residuals for selected states in specific years. Rather than removing outlying observations, we chose to expand the error supports to a 5-sigma rule as opposed to the 3-sigma rule used in the Monte Carlo analysis.

Table 7 presents structural and spatial coefficient estimates and asymptotic t-values for the non-iterative GME-NLP estimates of the disaster relief model for both spatial autocorrelation in the residuals (Model 1) and the dependent variable (Model 2).18 Results from Model 1 demonstrate that the spatial autocorrelation coefficient is significant and positive in the regression residuals, indicating potential spatial autocorrelation in the form of a lagged dependent variable or other misspecification. Findings from Model 2 indicate a significant and positive autocorrelation coefficient for the lagged dependent variable. Interpreting selected significant structural coefficients in Model 2, positive (negative) percentage change in precipitation positively (negatively) influences disaster payments. Relative to year 1996, which is excluded from the model and coincides with a change in farm legislation to “Freedom to Farm”, coefficients were significant and positive (except for 1995 and 1997). It is interesting that under the “Freedom to Farm” program, with intentions of smaller outlays to farmers (i.e., in direct
subsidies), disaster relief allocations were increasing. Membership on the Senate Appropriations Committee is the only political variable that is not significant. The result for the Secretary of Agriculture variable indicates that the level of direct disaster relief is some $82 million higher per year in the home state of the Secretary of Agriculture.19

5.0 Conclusions

In this paper we specified generalized maximum entropy (GME) estimators for the general linear model (GLM) and the censored Tobit model in the presence of first order spatial correlation. We generalized both the GME-GLM (Golan, Judge, and Miller 1996; Marsh, Mittelhammer, and Cardell 1998) and GME-Tobit (Golan, Judge, and Perloff 1997) estimators to include first order spatial autoregressive processes in either the dependent variable or residuals. Monte Carlo experiments were conducted for small and medium sized samples, and estimators were compared using the mean squared error loss (MSEL) of the regression coefficients. Relative to maximum likelihood, the GME estimators exhibited smaller MSEL in the presence of spatial correlation in the dependent variable of the regression model. The data constrained GME estimator also outperformed the normalized moment constrained GME estimator in MSEL. However, both GME estimators were robust in the presence of non-normal error assumptions and across selected proximity matrices.

Finally, we provided an illustrative application of the spatial GME estimators in an analysis of a model allocating agricultural disaster payments using a simultaneous Tobit framework. We found evidence of significant spatial correlation in the disaster relief
25
model and recovered parameter estimates indicating significant statistical and economic influence of political interests in allocating agricultural disaster payments. The GME estimators provided in this paper provides a conceptually new approach to estimating spatial regression models with parameter restrictions imposed. The method is computationally efficient and robust. Overall, the results suggest that further investigation of GME estimators for spatial autoregressive models could yield additional findings and insight useful to applied economists.
References

Adkins, L. 1997. “A Monte Carlo Study of a Generalized Maximum Entropy Estimator of the Binary Choice Model.” In Advances in Econometrics: Applying Maximum Entropy to Econometric Problems, Volume 12, edited by T. B. Fomby and R. C. Hill. Greenwich, Connecticut: JAI Press Inc., 183-200.
Anselin, L. 1988. Spatial Econometrics: Methods and Models. Dordrecht: Kluwer Academic Publishers.
Aptech Systems, Inc. 1996. GAUSS: Optimization Application Module. Maple Valley, Washington.
Becker, G. S. 1983. “A Theory of Competition Among Pressure Groups for Political Influence.” Quarterly Journal of Economics 98: 371-400.
Besag, J. E. 1972. “Nearest-Neighbour Systems and the Auto-logistic Model for Binary Data.” Journal of the Royal Statistical Society, Ser. B 34: 75-83.
Breiman, L., Y. Tsur, and A. Zemel. 1993. “On a Simple Estimation Procedure for Censored Regression Models with Known Error Distributions.” Annals of Statistics 21: 1711-1720.
Case, A. C. 1992. “Neighborhood Influence and Technological Change.” Regional Science and Urban Economics 22: 491-508.
Cliff, A. and K. Ord. 1981. Spatial Processes, Models and Applications. London: Pion.
Cressie, N. A. C. 1991. Statistics for Spatial Data. New York: John Wiley & Sons.
Gardner, B. L. 1987. “Causes of Farm Commodity Programs.” Journal of Political Economy 95: 290-310.
Garrett, T. A., T. L. Marsh, and M. I. Marshall. 2003. “Political Allocation of Agriculture Disaster Payments in the 1990s.” Working Paper 2003-005A, Federal Reserve Bank, St. Louis.
Garrett, T. A. and R. S. Sobel. 2003. “The Political Economy of FEMA Disaster Payments.” Economic Inquiry.
Golan, A., G. G. Judge, and D. Miller. 1996. Maximum Entropy Econometrics: Robust Estimation with Limited Data. New York: John Wiley & Sons.
Golan, A., G. G. Judge, and D. Miller. 1997. “The Maximum Entropy Approach to Estimation and Inference: An Overview.” In Advances in Econometrics: Applying Maximum Entropy to Econometric Problems, Volume 12, edited by T. B. Fomby and R. C. Hill. Greenwich, Connecticut: JAI Press Inc., 3-24.
Golan, A., G. G. Judge, and J. Perloff. 1996. “A Maximum Entropy Approach to Recovering Information from Multinomial Response Data.” Journal of the American Statistical Association 91: 841-853.
Golan, A., G. G. Judge, and J. Perloff. 1997. “Estimation and Inference with Censored and Ordered Multinomial Data.” Journal of Econometrics 79: 23-51.
Golan, A., J. M. Perloff, and E. Z. Shen. 2001. “Estimating a Demand System with Nonnegativity Constraints: Mexican Meat Demand.” The Review of Economics and Statistics 83: 541-550.
Goodwin, B. K. and V. H. Smith. 1995. The Economics of Crop Insurance and Disaster Aid. Washington, D.C.: The AEI Press.
Hampel, F. R., E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel. 1986. Robust Statistics: The Approach Based on Influence Functions. New York: John Wiley & Sons.
Huber, P. J. 1981. Robust Statistics. New York: John Wiley & Sons.
Kelejian, H. H. and I. R. Prucha. 1998. “A Generalized Spatial Two-Stage Least Squares Procedure for Estimating a Spatial Autoregressive Model with Autoregressive Disturbances.” Journal of Real Estate Finance and Economics 17: 99-121.
Kelejian, H. H. and I. R. Prucha. 1999. “A Generalized Moments Estimator for the Autoregressive Parameter in a Spatial Model.” International Economic Review 40(2): 509-533.
LeSage, J. P. 1997. “Bayesian Estimation of Spatial Autoregressive Models.” International Regional Science Review 20: 113-129.
LeSage, J. P. 2000. “Bayesian Estimation of Limited Dependent Variable Spatial Autoregressive Models.” Geographical Analysis 32: 19-35.
Marsh, T. L., R. C. Mittelhammer, and N. S. Cardell. 1998. “A Generalized Maximum Entropy Estimator of the Simultaneous Linear Statistical Model.” Working paper, Kansas State University.
Marsh, T. L., R. C. Mittelhammer, and R. G. Huffaker. 2000. “Probit with Spatial Correlation by Plot: PLRV Net Necrosis.” Journal of Agricultural, Biological, and Environmental Statistics 5: 22-36.
McLachlan, G. J. and T. Krishnan. 1997. The EM Algorithm and Extensions. New York: John Wiley & Sons.
McMillen, D. P. 1992. “Probit with Spatial Correlation.” Journal of Regional Science 32: 335-348.
Mittelhammer, R. C. and N. S. Cardell. 1998. “The Data-Constrained GME Estimator of the GLM: Asymptotic Theory and Inference.” Mimeo, Washington State University.
Mittelhammer, R., G. Judge, and D. Miller. 2000. Econometric Foundations. New York: Cambridge University Press.
Paarsch, H. J. 1984. “A Monte Carlo Comparison of Estimators for Censored Regression Models.” Journal of Econometrics 24: 197-213.
Poirier, D. J. and P. A. Ruud. 1988. “Probit with Dependent Observations.” The Review of Economic Studies 55: 593-614.
Pukelsheim, F. 1994. “The Three Sigma Rule.” The American Statistician 48: 88-91.
Smith, R. J. and R. W. Blundell. 1986. “An Exogeneity Test for a Simultaneous Equation Tobit Model with an Application to Labor Supply.” Econometrica 54: 679-685.
Smith, T. E. and J. P. LeSage. 2002. “A Bayesian Probit Model with Spatial Dependencies.” Working Paper.
Theil, H. 1971. Principles of Econometrics. New York: John Wiley & Sons.
U.S. General Accounting Office. 1989. “Disaster Assistance: Crop Insurance Can Provide Assistance More Effectively Than Other Programs.” Report to the Chairman, Committee on Agriculture, House of Representatives. GAO/RCED-89-211, Washington, D.C.
Zellner, A. 1994. “Bayesian and Non-Bayesian Estimation Using Balanced Loss Functions.” In Statistical Decision Theory and Related Topics, edited by S. Gupta and J. Berger. New York: Springer Verlag.
Zellner, A. 1998. “The Finite Sample Properties of Simultaneous Equations’ Estimates and Estimators: Bayesian and Non-Bayesian Approaches.” Journal of Econometrics 83: 185-212.
Table 1. Monte Carlo Experiments for Spatial Regression Model.

Experiment   N    σ_x3,x4 (a)   S_β′ (b)     ρ     λ     ϕ (c)   W
1            25   0.5           (-20,0,20)   0     0.5   0       Toeplitz
2            50   0.5           (-20,0,20)   0     0.5   0       Toeplitz
3            50   0.85          (-20,0,20)   0     0.5   0       Toeplitz
4            25   0.5           (-5,0,5)     0     0.5   0       Toeplitz
5            50   0.5           (-5,0,5)     0     0.5   0       Toeplitz
6            50   0.85          (-5,0,5)     0     0.5   0       Toeplitz
7            25   0.5           (-20,0,20)   0.5   0     0       Toeplitz
8            50   0.5           (-20,0,20)   0.5   0     0       Toeplitz
9            50   0.85          (-20,0,20)   0.5   0     0       Toeplitz
10           25   0.5           (-5,0,5)     0.5   0     0       Toeplitz
11           50   0.5           (-5,0,5)     0.5   0     0       Toeplitz
12           50   0.85          (-5,0,5)     0.5   0     0       Toeplitz
13           25   0.5           (-20,0,20)   0     0.5   0       Bordered
14           50   0.5           (-20,0,20)   0     0.5   0       Bordered
15           50   0.85          (-20,0,20)   0     0.5   0       Bordered
16           25   0.5           (-20,0,20)   0.5   0     0       Bordered
17           50   0.5           (-20,0,20)   0.5   0     0       Bordered
18           50   0.85          (-20,0,20)   0.5   0     0       Bordered
19           25   0.5           (-20,0,20)   0.5   0     0.15    Toeplitz
20           25   0.5           (-20,0,20)   0.5   0     0.25    Toeplitz

a σ_x3,x4 denotes the correlation between x3 and x4.
b Supports on the structural coefficients β in equation (15). Supports on the reduced form coefficients and spatial correlation coefficients are given by \( s_{\pi l} = (-20, 0, 20)' \;\forall l \) and \( s_\lambda = s_\rho = (1/\xi_{\min},\, 0,\, 1/\xi_{\max})' \).
c Contaminated error model.
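The spatial supports in footnote b can be illustrated with a minimal Python sketch. It assumes the extreme eigenvalues of the proximity matrix are already known and satisfy ξ_min < 0 < ξ_max; the function names `spatial_supports` and `in_support` are hypothetical, and the eigenvalues -0.7 and 1.1 are the estimates quoted for the empirical application, used here only as an example.

```python
def spatial_supports(xi_min: float, xi_max: float) -> tuple[float, float, float]:
    """Three-point support (1/xi_min, 0, 1/xi_max) for a spatial coefficient.

    Assumption: xi_min < 0 < xi_max, so the support brackets zero.
    """
    if not (xi_min < 0.0 < xi_max):
        raise ValueError("expected xi_min < 0 < xi_max")
    return (1.0 / xi_min, 0.0, 1.0 / xi_max)


def in_support(value: float, supports: tuple[float, float, float]) -> bool:
    """True when a candidate coefficient lies strictly inside the support range."""
    return supports[0] < value < supports[-1]


if __name__ == "__main__":
    supports = spatial_supports(-0.7, 1.1)
    print("support points:", supports)
    print("0.5 admissible:", in_support(0.5, supports))
```

Because a GME estimate is a convex combination of its support points, a coefficient recovered with these supports stays between 1/ξ_min and 1/ξ_max by construction.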
Table 2. Monte Carlo Results for OLS and the Spatial ML and GME Estimators with Toeplitz Proximity Matrix.

              OLS          ML                       GME-NLP                  GME-N
Experiment    MSEL(β̂)      MSEL(λ̂)    MSEL(β̂)      MSEL(λ̂)    MSEL(β̂)      MSEL(λ̂)    MSEL(β̂)
1             2.53754      0.03832    1.81793      0.24919    1.82611      0.24933    7.06534
2             1.08799      0.01585    0.70009      0.24770    0.93598      0.24933    7.09038
3             1.88644      0.01395    1.20932      0.24690    1.45700      0.24922    7.25905
4             2.70499      0.04135    1.80259      0.23231    1.36466      0.18127    8.07194
5             1.18626      0.01517    0.75482      0.23350    0.87995      0.18128    8.06731
6             1.88689      0.01475    1.16964      0.23264    0.66711      0.18193    8.15037

Experiment    MSEL(β̂)      MSEL(ρ̂)    MSEL(β̂)      MSEL(ρ̂)    MSEL(β̂)      MSEL(ρ̂)    MSEL(β̂)
7             198.62701    0.00798    8.28527      0.01307    5.13046      0.18939    10.93174
8             184.90385    0.00302    2.97651      0.00609    3.15864      0.18927    10.92151
9             187.56518    0.00430    4.72289      0.00672    3.92809      0.18978    11.03139
10            195.31599    0.00756    7.86067      0.00633    2.84756      0.03077    8.88099
11            186.77447    0.00312    3.12193      0.00529    2.48208      0.03063    8.87556
12            190.04912    0.00423    4.60866      0.00851    2.60334      0.03076    8.90722
Table 3. Monte Carlo Results for OLS and the Spatial ML and GME Estimators with Bordered Proximity Matrix and with Contaminated Error Model.

              OLS          ML                       GME-NLP                  GME-N
Experiment    MSEL(β̂)      MSEL(λ̂)    MSEL(β̂)      MSEL(λ̂)    MSEL(β̂)      MSEL(λ̂)    MSEL(β̂)
13 a          2.03865      0.46891    2.30312      0.25068    1.50706      0.24924    7.09594
14 a          1.14160      0.44420    1.35981      0.25061    0.96532      0.24928    7.09796
15 a          1.61746      0.46271    1.89322      0.25093    1.28911      0.24919    7.28808

Experiment    MSEL(β̂)      MSEL(ρ̂)    MSEL(β̂)      MSEL(ρ̂)    MSEL(β̂)      MSEL(ρ̂)    MSEL(β̂)
16 a          248.19208    0.20811    202.69262    0.01749    8.89938      0.18982    10.86247
17 a          244.67450    0.20600    199.54758    0.01465    9.56696      0.19006    10.86612
18 a          244.78677    0.23995    235.05793    0.01486    9.70598      0.19050    10.98457
19 b          198.21838    0.00807    8.65881      0.01340    5.21386      0.18928    10.91162
20 c          197.79736    0.01247    13.94196     0.01459    6.31885      0.18948    10.92980

a Bordered proximity matrix.
b Toeplitz proximity matrix with contaminated error ϕ = 0.15.
c Toeplitz proximity matrix with contaminated error ϕ = 0.25.
Table 4. Monte Carlo Experiments for the Censored Regression Model.

Experiment   N    S_β′         ρ     λ     ϕ      W
1            25   (-20,0,20)   0     0.5   0      Toeplitz
2            50   (-20,0,20)   0     0.5   0      Toeplitz
3            25   (-20,0,20)   0.5   0     0      Toeplitz
4            50   (-20,0,20)   0.5   0     0      Toeplitz
5            25   (-20,0,20)   0.5   0     0.25   Toeplitz
6            50   (-20,0,20)   0.5   0     0.25   Toeplitz
Table 5. Monte Carlo Results for GME-NLP Estimators and the ML-Tobit Estimator for the Censored Regression Model.

              ML-Tobit     Non-Iterative GME-NLP    Iterative GME-NLP
Experiment    MSEL(β̂)      MSEL(λ̂)    MSEL(β̂)      MSEL(λ̂)    MSEL(β̂)
1             9.13729      0.10992    10.99066     0.27351    13.99312
2             8.37342      0.04531    15.24482     0.20843    16.61894

Experiment    MSEL(β̂)      MSEL(ρ̂)    MSEL(β̂)      MSEL(ρ̂)    MSEL(β̂)
3             10.92192     0.05837    9.25544      0.07633    9.70641
4             9.30010      0.01842    10.46130     0.02733    8.27657
5             20.10622     0.05950    11.22403     0.06907    17.06522
6             17.13849     0.03483    11.95290     0.02931    14.27675
Table 6. Definitions of Selected Variables for Disaster Relief Model (N=384).

Variable
    Definition

(+) Percent change in precipitation (a)
    To capture periods of increased wetness, one variable contains positive percent changes in precipitation; 0 otherwise.
(-) Percent change in precipitation (a)
    Periods of relatively drier weather are reflected in another variable containing negative percent changes in precipitation; 0 otherwise.
Percent change in low temperature (a)
    For extreme or severe freezes, the annual percent change in low temperature.
Crop Insurance
    These payments include both government and private insurance payments from the Crop Insurance program, and are computed by subtracting total farmer payments (which equal total insurance premiums plus a federal subsidy) from total indemnity payments.
Secretary of Agriculture (b)
    1 if the secretary of agriculture is from a specific state; 0 otherwise.
House Agriculture Subcommittee (b)
    1 if state represented on House Agriculture Committee, subcommittee on General Farm Commodities, Resource Conservation, and Credit; 0 otherwise.
Senate Agriculture Subcommittee (b)
    1 if state represented on Senate Agriculture Committee, subcommittee on Research, Nutrition, and General Legislation; 0 otherwise.
House Appropriations Subcommittee (b)
    1 if state represented on House Appropriations Committee, subcommittee on Agriculture, Rural Development, Food and Drug Administration, and Related Agencies; 0 otherwise.
Senate Appropriations Subcommittee (b)
    1 if state represented on Senate Appropriations Committee, subcommittee on Agriculture, Rural Development, and Related Agencies; 0 otherwise.
Income Measures, Farm Acres, Number of Farms
    U.S. Bureau of the Census’ Bureau of Economic Analysis.
Crop Values
    USDA’s National Agricultural Statistics Service.
Electoral (c)
    Represents a measure of electoral importance.
Census Regions (d)
    1 if state in a specific Census Region; 0 otherwise.
Year Dummies
    1 if a specific year from 1992 to 1999; 0 otherwise.

a For each state, average annual precipitation data were gathered over the period 1991 to 1999 from the National Oceanic and Atmospheric Administration’s (NOAA) National Climatic Data Center.
b From the Almanac of American Politics.
c Garrett and Sobel (2003).
d New England: Connecticut, Vermont, Massachusetts, Maine, Rhode Island, New Hampshire; Mid Atlantic: New Jersey, New York, Pennsylvania; East North Central: Michigan, Indiana, Illinois, Wisconsin, Ohio; West North Central: North Dakota, Minnesota, Nebraska, South Dakota, Iowa, Missouri, Kansas; South Atlantic: West Virginia, Delaware, South Carolina, North Carolina, Maryland, Florida, Virginia, Georgia; East South Central: Kentucky, Mississippi, Alabama, Tennessee; West South Central: Arkansas, Oklahoma, Texas, Louisiana; Mountain: Montana, Colorado, New Mexico, Arizona, Wyoming, Nevada, Idaho, Utah; Pacific: Oregon, Washington, California.
Table 7. Results for Disaster Payment Model using Spatial GME-Tobit Estimators.

                                      Model 1                       Model 2
                                      Spatially Lagged Residual     Spatially Lagged Dependent Variable
Variable                              Coefficients   T-values       Coefficients   T-values
Constant                              20.843         0.985          -84.926        -4.401
(+) Percent change in precipitation   0.429          0.968          0.961          2.322
(-) Percent change in precipitation   -0.880         -1.348         -1.434         -2.310
Percent change in low temperature     2.488          0.991          2.174          0.931
Crop Insurance                        -0.424         -4.136         -0.335         -2.731
Number of Farms                       0.001          6.061          0.001          5.584
Secretary of Agriculture              72.488         1.904          81.665         2.185
House Agriculture Subcommittee        20.244         1.508          44.717         3.545
Senate Agriculture Subcommittee       16.601         0.999          42.244         2.647
House Appropriations Subcommittee     36.005         2.681          36.164         2.781
Senate Appropriations Subcommittee    4.826          0.348          15.374         1.159
1992                                  125.289        5.108          72.544         3.179
1993                                  136.660        5.579          73.513         3.231
1994                                  38.872         1.630          43.570         1.969
1995                                  -24.432        -1.083         23.333         1.115
1997                                  -82.897        -3.598         -22.220        -1.026
1998                                  -14.539        -0.636         31.185         1.474
1999                                  77.262         2.998          85.207         3.530
λ                                     0.783          9.981          ---            ---
ρ                                     ---            ---            0.270          3.153
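As a reading aid for Table 7, the following Python sketch converts the reported asymptotic t-values for the Model 2 political variables into two-sided p-values and implied standard errors. It assumes a standard normal reference distribution and a 5% level, neither of which is stated in the table; the helper names `two_sided_p_value` and `summarize` are hypothetical.

```python
import math

# Model 2 political variables from Table 7: (coefficient, asymptotic t-value).
MODEL_2_POLITICAL = {
    "Secretary of Agriculture": (81.665, 2.185),
    "House Agriculture Subcommittee": (44.717, 3.545),
    "Senate Agriculture Subcommittee": (42.244, 2.647),
    "House Appropriations Subcommittee": (36.164, 2.781),
    "Senate Appropriations Subcommittee": (15.374, 1.159),
}


def two_sided_p_value(t_value: float) -> float:
    """Two-sided p-value under a standard normal reference (an assumption)."""
    return math.erfc(abs(t_value) / math.sqrt(2.0))


def summarize(rows: dict, alpha: float = 0.05) -> dict:
    """Implied standard error, p-value, and significance flag per variable."""
    out = {}
    for name, (coef, t_value) in rows.items():
        p = two_sided_p_value(t_value)
        out[name] = {
            "std_error": abs(coef / t_value),  # t = coefficient / standard error
            "p_value": p,
            "significant": p < alpha,
        }
    return out


if __name__ == "__main__":
    for name, stats in summarize(MODEL_2_POLITICAL).items():
        flag = "significant" if stats["significant"] else "not significant"
        print(f"{name}: se={stats['std_error']:.2f}, p={stats['p_value']:.4f}, {flag}")
```

Under these assumptions, the Senate Appropriations Subcommittee is the only political variable in Model 2 that is not flagged as significant.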
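Appendix A.

Conditional Maximum Value Function

Define the conditional entropy function by conditioning on the \( (L + K + 2) \times 1 \) vector \( \theta = t = \mathrm{vec}(t_\pi, t_\beta, t_\rho, t_\lambda) \), with elements \( \tau_k^{\pi}, \tau_k^{\beta}, \tau^{\rho}, \tau^{\lambda} \), yielding

(A.1)
\[
F(t) = \max_{p:\,\theta = t} \Big\{ -\sum_{k,j} p_{kj}^{\beta} \ln p_{kj}^{\beta} - \sum_{k,j} p_{kj}^{\pi} \ln p_{kj}^{\pi} - \sum_{j} p_{j}^{\rho} \ln p_{j}^{\rho} - \sum_{j} p_{j}^{\lambda} \ln p_{j}^{\lambda} - \sum_{i,m} p_{im}^{u^*} \ln p_{im}^{u^*} - \sum_{i,m} p_{im}^{v} \ln p_{im}^{v} \Big\}.
\]

The optimal value of \( p_i^{u^*} = (p_{i1}^{u^*}, \ldots, p_{iM}^{u^*})' \) in the conditionally-maximized entropy function is given by

(A.2)
\[
p_i^{u^*}(t) = \operatorname*{argmax}_{p_{il}^{u^*}:\; \sum_{l=1}^{M} p_{il}^{u^*} = 1,\; \sum_{l=1}^{M} s_{il}^{u^*} p_{il}^{u^*} = u_i^*(t)} \Big\{ -\sum_{l=1}^{M} p_{il}^{u^*} \ln\big(p_{il}^{u^*}\big) \Big\},
\]

which is the maximizing solution to the Lagrangian

(A.3)
\[
L_{p_i^{u^*}} = -\sum_{l=1}^{M} p_{il}^{u^*} \ln\big(p_{il}^{u^*}\big) + \phi_i^{u^*} \Big( \sum_{l=1}^{M} p_{il}^{u^*} - 1 \Big) + \gamma_i^{u^*} \Big( \sum_{l=1}^{M} s_{il}^{u^*} p_{il}^{u^*} - u_i^*(t) \Big).
\]

The optimal value of \( p_{il}^{u^*} \) is

(A.4)
\[
p_{il}^{u^*}\big(\gamma_i^{u^*}(u_i^*(t))\big) = \frac{\exp\big(\gamma_i^{u^*}(u_i^*(t))\, s_{il}^{u^*}\big)}{\sum_{m=1}^{M} \exp\big(\gamma_i^{u^*}(u_i^*(t))\, s_{im}^{u^*}\big)}, \quad l = 1, \ldots, M,
\]

where \( \gamma_i^{u^*}(u_i^*(t)) \) is the optimal value of the Lagrangian multiplier \( \gamma_i^{u^*} \).

The optimal value of \( p_k^{\beta} = (p_{k1}^{\beta}, \ldots, p_{kJ}^{\beta})' \) in the conditionally-maximized entropy function is given by

(A.5)
\[
p_k^{\beta}(\tau_k^{\beta}) = \operatorname*{argmax}_{p_{kl}^{\beta}:\; \sum_{l=1}^{J} p_{kl}^{\beta} = 1,\; \sum_{l=1}^{J} s_{kl}^{\beta} p_{kl}^{\beta} = \tau_k^{\beta}} \Big\{ -\sum_{l=1}^{J} p_{kl}^{\beta} \ln\big(p_{kl}^{\beta}\big) \Big\},
\]

which is the maximizing solution to the Lagrangian

(A.6)
\[
L_{p_k^{\beta}} = -\sum_{l=1}^{J} p_{kl}^{\beta} \ln\big(p_{kl}^{\beta}\big) + \phi_k^{\beta} \Big( \sum_{l=1}^{J} p_{kl}^{\beta} - 1 \Big) + \eta_k^{\beta} \Big( \sum_{l=1}^{J} s_{kl}^{\beta} p_{kl}^{\beta} - \tau_k^{\beta} \Big).
\]

The optimal value of \( p_{kl}^{\beta} \) is then

(A.7)
\[
p_{kl}^{\beta}(\tau_k^{\beta}) = \frac{\exp\big(\eta_k^{\beta}(\tau_k^{\beta})\, s_{kl}^{\beta}\big)}{\sum_{m=1}^{J} \exp\big(\eta_k^{\beta}(\tau_k^{\beta})\, s_{km}^{\beta}\big)}, \quad l = 1, \ldots, J,
\]

where \( \eta_k^{\beta}(\tau_k^{\beta}) \) is the optimal value of the Lagrangian multiplier \( \eta_k^{\beta} \). Likewise the optimal values of \( p^{\pi}, p^{v}, p^{\rho}, p^{\lambda} \) can be derived. Substituting optimal values for \( p = \mathrm{vec}(p^{\beta}, p^{\pi}, p^{\rho}, p^{\lambda}, p^{u}, p^{v}) \) into the conditional entropy function (A.1) yields

(A.8)
\[
\begin{aligned}
F(t) ={} & -\sum_{k} \Big[ \eta_k^{\pi}(\tau_k^{\pi})\, \tau_k^{\pi} - \ln \sum_{j} \exp\big( \eta_k^{\pi}(\tau_k^{\pi})\, s_{kj}^{\pi} \big) \Big]
 - \sum_{k} \Big[ \eta_k^{\beta}(\tau_k^{\beta})\, \tau_k^{\beta} - \ln \sum_{j} \exp\big( \eta_k^{\beta}(\tau_k^{\beta})\, s_{kj}^{\beta} \big) \Big] \\
& - \Big[ \eta^{\rho}(\tau^{\rho})\, \tau^{\rho} - \ln \sum_{j} \exp\big( \eta^{\rho}(\tau^{\rho})\, s_{j}^{\rho} \big) \Big]
 - \Big[ \eta^{\lambda}(\tau^{\lambda})\, \tau^{\lambda} - \ln \sum_{j} \exp\big( \eta^{\lambda}(\tau^{\lambda})\, s_{j}^{\lambda} \big) \Big] \\
& - \sum_{i} \Big[ \gamma_i^{u^*}(u_i^*(t))\, u_i^* - \ln \sum_{m} \exp\big( \gamma_i^{u^*}(u_i^*(t))\, s_{im}^{u^*} \big) \Big]
 - \sum_{i} \Big[ \gamma_i^{v}(v_i(t))\, v_i - \ln \sum_{m} \exp\big( \gamma_i^{v}(v_i(t))\, s_{im}^{v} \big) \Big].
\end{aligned}
\]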
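The weights in (A.4) and (A.7) have an exponential (softmax) form, and each bracketed term in (A.8) is the entropy of those weights once the multiplier has been solved for. The Python sketch below illustrates this for a single coefficient. It is not the authors' GAUSS code: the bisection solver and the names `me_weights`, `solve_multiplier`, `entropy`, and `dual_term` are assumptions chosen for illustration.

```python
import math


def me_weights(eta: float, supports) -> list:
    """Exponential weights exp(eta*s_j) / sum_m exp(eta*s_m), as in (A.7)."""
    shift = max(eta * s for s in supports)  # subtract the max for numerical stability
    raw = [math.exp(eta * s - shift) for s in supports]
    total = sum(raw)
    return [r / total for r in raw]


def implied_mean(eta: float, supports) -> float:
    """Coefficient value sum_j s_j p_j implied by a multiplier eta."""
    return sum(p * s for p, s in zip(me_weights(eta, supports), supports))


def solve_multiplier(tau: float, supports, iterations: int = 200) -> float:
    """Find eta with implied_mean(eta) == tau by bisection (mean is increasing in eta)."""
    if not min(supports) < tau < max(supports):
        raise ValueError("tau must lie strictly inside the support range")
    lo, hi = -1.0, 1.0
    while implied_mean(lo, supports) > tau:
        lo *= 2.0
    while implied_mean(hi, supports) < tau:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if implied_mean(mid, supports) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def entropy(weights) -> float:
    """Shannon entropy -sum p ln p of a weight vector."""
    return -sum(p * math.log(p) for p in weights if p > 0.0)


def dual_term(tau: float, supports) -> float:
    """One bracketed term of (A.8) with its leading sign: -eta*tau + ln sum exp(eta*s_j)."""
    eta = solve_multiplier(tau, supports)
    shift = max(eta * s for s in supports)
    log_norm = shift + math.log(sum(math.exp(eta * s - shift) for s in supports))
    return -eta * tau + log_norm


if __name__ == "__main__":
    s_beta = (-20.0, 0.0, 20.0)  # support used in most Monte Carlo experiments
    tau = 2.0                    # hypothetical coefficient value
    eta = solve_multiplier(tau, s_beta)
    weights = me_weights(eta, s_beta)
    print("multiplier:", eta)
    print("weights:", weights)
    print("entropy:", entropy(weights), "dual term:", dual_term(tau, s_beta))
```

The two printed quantities agree up to numerical tolerance, which is the identity that lets (A.8) be written in terms of the coefficients and multipliers alone rather than the individual probabilities.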
Computational Issues

Following the computationally efficient approach of Mittelhammer and Cardell (1998), the conditional entropy function [equation (A.8)] was maximized. Note that the constrained maximization problem in (13) requires estimation of \( (KJ + LJ + 2J + 2NM) \times 1 \) unknown parameters. Solving (13) for \( (KJ + LJ + 2J + 2NM) \) unknowns is not computationally practical as the sample size, N, grows larger. In contrast, maximizing (A.8) requires estimation of only \( (L + K + 2) \) unknown coefficients for any positive value of N.

The GME-NLP estimator uses the reduced and structural form models as data constraints with a dual objective function as part of its information set. To completely specify the GME-NLP model, support points (upper and lower truncation and intermediate) for the individual parameters, support points for each error term, and \( (L + K + 2) \) starting values for the parameter coefficients are supplied by the user.

In the Monte Carlo analysis and empirical application, the model was estimated using the unconstrained optimizer OPTIMUM in the econometric software GAUSS. We used 3 support points for each parameter and error term. To increase the efficiency of the estimation process, the analytical gradient and Hessian were coded in GAUSS and called in the optimization routine. This also offered an opportunity to empirically validate the derivation of the gradient and Hessian (provided below). Given suitable starting values, the optimization routine generally converged within seconds for the empirical examples discussed above. Moreover, solutions were robust to alternative starting values.

Gradient

The gradient vector \( \nabla = \mathrm{vec}(\nabla_\pi, \nabla_\beta, \nabla_\rho, \nabla_\lambda) \) of \( F(t) \) is

(A.9)
\[
\nabla = -\begin{bmatrix} \eta^{\pi}(t_\pi) \\ \eta^{\beta}(t_\beta) \\ \eta^{\rho}(\tau^{\rho}) \\ \eta^{\lambda}(\tau^{\lambda}) \end{bmatrix}
+ \begin{bmatrix} Z' \\ [0] \\ 0 \\ 0 \end{bmatrix} \gamma^{v}
+ \begin{bmatrix}
\big[ (\rho W_1 + \lambda W_2 - (\lambda W_2)(\rho W_1)) Z \big]' \\
\big[ (I - \lambda W_2) X \big]' \\
\big[ (W_1 - (\lambda W_2)(W_1)) Z\pi \big]' \\
\big[ (W_2 - (W_2)(\rho W_1)) Z\pi - W_2 X\beta \big]'
\end{bmatrix} \gamma^{u}.
\]
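The empirical validation of the analytical gradient mentioned above can be sketched with a central finite-difference check. The Python example below does this for a deliberately reduced case, assumed here for brevity: no spatial terms (ρ = λ = 0) and no reduced-form block, so that only the β block of (A.9), \( -\eta^{\beta} + X'\gamma^{u} \), remains. The data, supports, and function names are hypothetical, and the bisection solver is an illustrative choice rather than the routine used in the paper.

```python
import math


def implied_mean(eta, supports):
    """Mean of the support points under weights proportional to exp(eta*s)."""
    shift = max(eta * s for s in supports)
    raw = [math.exp(eta * s - shift) for s in supports]
    return sum(r * s for r, s in zip(raw, supports)) / sum(raw)


def multiplier(tau, supports, iterations=200):
    """Lagrange multiplier solving implied_mean(eta) == tau, found by bisection."""
    if not min(supports) < tau < max(supports):
        raise ValueError("tau must lie strictly inside the support range")
    lo, hi = -1.0, 1.0
    while implied_mean(lo, supports) > tau:
        lo *= 2.0
    while implied_mean(hi, supports) < tau:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if implied_mean(mid, supports) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def entropy_term(tau, supports):
    """-eta*tau + ln sum_j exp(eta*s_j), one term of the dual objective."""
    eta = multiplier(tau, supports)
    shift = max(eta * s for s in supports)
    return -eta * tau + shift + math.log(sum(math.exp(eta * s - shift) for s in supports))


def residuals(t, X, y):
    """u(t) = y - X t for the reduced (non-spatial) case."""
    return [yi - sum(x * tk for x, tk in zip(row, t)) for row, yi in zip(X, y)]


def objective(t, X, y, s_beta, s_u):
    """Dual objective: coefficient entropy terms plus error entropy terms."""
    u = residuals(t, X, y)
    return sum(entropy_term(tk, s_beta) for tk in t) + sum(entropy_term(ui, s_u) for ui in u)


def analytic_gradient(t, X, y, s_beta, s_u):
    """Reduced beta block of (A.9): -eta_k + sum_i x_ik * gamma_i."""
    gamma = [multiplier(ui, s_u) for ui in residuals(t, X, y)]
    return [-multiplier(tk, s_beta) + sum(row[k] * g for row, g in zip(X, gamma))
            for k, tk in enumerate(t)]


def numeric_gradient(t, X, y, s_beta, s_u, step=1e-5):
    """Central finite differences of the objective, one coordinate at a time."""
    grad = []
    for k in range(len(t)):
        up = list(t); up[k] += step
        down = list(t); down[k] -= step
        grad.append((objective(up, X, y, s_beta, s_u)
                     - objective(down, X, y, s_beta, s_u)) / (2.0 * step))
    return grad


if __name__ == "__main__":
    X = [[1.0, 0.5], [1.0, -1.2], [1.0, 2.0], [1.0, 0.3]]  # hypothetical design matrix
    y = [1.0, -0.5, 2.5, 0.7]                              # hypothetical observations
    t = [0.4, 0.9]                                         # hypothetical trial coefficients
    print("analytic:", analytic_gradient(t, X, y, (-20.0, 0.0, 20.0), (-6.0, 0.0, 6.0)))
    print("numeric: ", numeric_gradient(t, X, y, (-20.0, 0.0, 20.0), (-6.0, 0.0, 6.0)))
```

Agreement between the two vectors to several decimal places is the kind of check that can catch sign or transposition errors before a gradient is passed to an optimizer. The full spatial gradient would add the π, ρ, and λ blocks with the matrix factors shown in (A.9).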
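Hessian

The Hessian matrix \( H(t) = \dfrac{\partial^2 F(t)}{\partial t\, \partial t'} \) is composed of the submatrices

(A.10.1)
\[
\frac{\partial^2 F(t)}{\partial t_\pi\, \partial t_\pi'} = -\big[ (\rho W_1 + \lambda W_2 - (\lambda W_2)(\rho W_1)) Z \big]' \Big[ \frac{\partial \gamma^{u}(u(t))}{\partial u(t)} \odot (\rho W_1 + \lambda W_2 - (\lambda W_2)(\rho W_1)) Z \Big] - Z' \Big[ \frac{\partial \gamma^{v}(v(t))}{\partial v(t)} \odot Z \Big] - \frac{\partial \eta^{\pi}(t_\pi)}{\partial t_\pi'}
\]

(A.10.2)
\[
\frac{\partial^2 F(t)}{\partial t_\pi\, \partial t_\beta'} = -\big[ (\rho W_1 + \lambda W_2 - (\lambda W_2)(\rho W_1)) Z \big]' \Big[ \frac{\partial \gamma^{u}(u(t))}{\partial u(t)} \odot (I - \lambda W_2) X \Big]
\]

(A.10.3)
\[
\frac{\partial^2 F(t)}{\partial t_\pi\, \partial t_\rho'} = -\big[ (\rho W_1 + \lambda W_2 - (\lambda W_2)(\rho W_1)) Z \big]' \Big[ \frac{\partial \gamma^{u}(u(t))}{\partial u(t)} \odot (W_1 - (\lambda W_2)(W_1)) Z\pi \Big] + \big[ (W_1 - (\lambda W_2)(W_1)) Z \big]' \gamma^{u}(u(t))
\]

(A.10.4)
\[
\frac{\partial^2 F(t)}{\partial t_\pi\, \partial t_\lambda'} = -\big[ (\rho W_1 + \lambda W_2 - (\lambda W_2)(\rho W_1)) Z \big]' \Big[ \frac{\partial \gamma^{u}(u(t))}{\partial u(t)} \odot \big( (W_2 - (W_2)(\rho W_1)) Z\pi - W_2 X\beta \big) \Big] + \big[ (W_2 - (W_2)(\rho W_1)) Z \big]' \gamma^{u}(u(t))
\]

(A.10.5)
\[
\frac{\partial^2 F(t)}{\partial t_\beta\, \partial t_\beta'} = -\frac{\partial \eta^{\beta}(t_\beta)}{\partial t_\beta'} - \big[ (I - \lambda W_2) X \big]' \Big[ \frac{\partial \gamma^{u}(u(t))}{\partial u(t)} \odot (I - \lambda W_2) X \Big]
\]

(A.10.6)
\[
\frac{\partial^2 F(t)}{\partial t_\beta\, \partial t_\rho'} = -\big[ (I - \lambda W_2) X \big]' \Big[ \frac{\partial \gamma^{u}(u(t))}{\partial u(t)} \odot (W_1 - (\lambda W_2)(W_1)) Z\pi \Big]
\]

(A.10.7)
\[
\frac{\partial^2 F(t)}{\partial t_\beta\, \partial t_\lambda'} = \big[ (-W_2) X \big]' \gamma^{u}(u(t)) - \big[ (I - \lambda W_2) X \big]' \Big[ \frac{\partial \gamma^{u}(u(t))}{\partial u(t)} \odot \big( (W_2 - (W_2)(\rho W_1)) Z\pi - W_2 X\beta \big) \Big]
\]

(A.10.8)
\[
\frac{\partial^2 F(t)}{\partial t_\rho\, \partial t_\rho} = -\frac{\partial \eta^{\rho}(\tau^{\rho})}{\partial \tau^{\rho}} - \big[ (W_1 - (\lambda W_2)(W_1)) Z\pi \big]' \Big[ \frac{\partial \gamma^{u}(u(t))}{\partial u(t)} \odot (W_1 - (\lambda W_2)(W_1)) Z\pi \Big]
\]

(A.10.9)
\[
\frac{\partial^2 F(t)}{\partial t_\lambda\, \partial t_\lambda} = -\big[ (W_2 - (W_2)(\rho W_1)) Z\pi - W_2 X\beta \big]' \Big[ \frac{\partial \gamma^{u}(u(t))}{\partial u(t)} \odot \big( (W_2 - (W_2)(\rho W_1)) Z\pi - W_2 X\beta \big) \Big] - \frac{\partial \eta^{\lambda}(\tau^{\lambda})}{\partial \tau^{\lambda}}
\]

(A.10.10)
\[
\frac{\partial^2 F(t)}{\partial t_\rho\, \partial t_\lambda} = -\big[ (W_1 - (\lambda W_2)(W_1)) Z\pi \big]' \Big[ \frac{\partial \gamma^{u}(u(t))}{\partial u(t)} \odot \big( (W_2 - (W_2)(\rho W_1)) Z\pi - W_2 X\beta \big) \Big] + \big[ (-W_2 W_1) Z\pi \big]' \gamma^{u}(u(t))
\]

In the above equations, the notation \( \odot \) denotes a Hadamard product (element-by-element multiplication), and the derivatives of the Lagrangian multipliers are defined as

(A.11.1)
\[
\frac{\partial \eta_k^{l}(\tau_k^{l})}{\partial \tau_k^{l}} = \Big[ \sum_{j=1}^{J} \big( s_{kj}^{l} \big)^2 p_{kj}^{l} - \big( \tau_k^{l} \big)^2 \Big]^{-1} \quad \text{for } l \in \{\pi, \beta, \rho, \lambda\}
\]

(A.11.2)
\[
\frac{\partial \gamma_i^{u}(u(\tau))}{\partial u_i(\tau)} = \Big[ \sum_{l=1}^{M} \big( s_{il}^{u} \big)^2 p_{il}^{u}\big( \gamma_i^{u}(u_i(\tau)) \big) - u_i^2(\tau) \Big]^{-1}
\]

(A.11.3)
\[
\frac{\partial \gamma_i^{v}(v(\tau))}{\partial v_i(\tau)} = \Big[ \sum_{l=1}^{M} \big( s_{il}^{v} \big)^2 p_{il}^{v}\big( \gamma_i^{v}(v_i(\tau)) \big) - v_i^2(\tau) \Big]^{-1}
\]

In equations (A.9)-(A.11) the superscript * was dropped from the notation for the structural equation residuals for convenience.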
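The multiplier derivative in (A.11.1) is the reciprocal of the variance of the support points under the maximum entropy weights. The Python sketch below checks this numerically: since the implied coefficient value τ(η) has slope equal to that variance, the product of the finite-difference slope and the formula in (A.11.1) should be close to one. The function names and the trial multiplier are hypothetical and used only for illustration.

```python
import math


def me_weights(eta: float, supports) -> list:
    """Weights proportional to exp(eta * s_j), normalised to sum to one."""
    shift = max(eta * s for s in supports)
    raw = [math.exp(eta * s - shift) for s in supports]
    total = sum(raw)
    return [r / total for r in raw]


def support_moments(eta: float, supports):
    """First and second moments of the support points under the weights."""
    weights = me_weights(eta, supports)
    first = sum(p * s for p, s in zip(weights, supports))
    second = sum(p * s * s for p, s in zip(weights, supports))
    return first, second


def multiplier_derivative(eta: float, supports) -> float:
    """d eta / d tau as in (A.11.1): [sum_j s_j^2 p_j - tau^2]^(-1)."""
    first, second = support_moments(eta, supports)
    return 1.0 / (second - first ** 2)


def numerical_dtau_deta(eta: float, supports, step: float = 1e-6) -> float:
    """Central finite-difference slope of tau(eta), the inverse of d eta / d tau."""
    up, _ = support_moments(eta + step, supports)
    down, _ = support_moments(eta - step, supports)
    return (up - down) / (2.0 * step)


if __name__ == "__main__":
    supports = (-20.0, 0.0, 20.0)
    eta = 0.05  # hypothetical multiplier value
    product = multiplier_derivative(eta, supports) * numerical_dtau_deta(eta, supports)
    print("formula times finite-difference slope:", product)
```

Because the variance is strictly positive whenever at least two support points carry weight, the derivative in (A.11.1) is positive and the corresponding Hessian terms enter with a definite sign.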
Asymptotic Covariance Matrix

Given the above derivations (A.1)-(A.11), we briefly characterize the asymptotic properties of the GME-NLP estimator based on results from Marsh, Mittelhammer, and Cardell (1998). This allows us to construct an estimator for the asymptotic covariance matrix used to calculate t-values in the illustrative application.

Assuming \( (I - \lambda W_2) \) is nonsingular, \( (I - \rho W_1) \) is nonsingular, and other standard regularity conditions for the simultaneous equations model hold, then (13) provides an estimator for \( \beta, \pi, \rho, \lambda \) that is consistent and asymptotically normally distributed. Intuitively, consistency of the reduced form parameters in (13) comes directly from results for the general linear model in Mittelhammer and Cardell (1998). Then, consistency of the spatial and structural parameters in (13) (conditional on the reduced form parameters) follows by applying Theorem XIV in Rao (p. 124). With consistency of \( \beta, \rho, \lambda \), asymptotic normality is then established.

In the case of a spatially lagged dependent variable (\( \lambda = 0, \rho \neq 0 \)), the asymptotic covariance matrix for \( (\rho, \beta')' \) can be estimated by

(A.12)
\[
\hat{\Omega}_\rho = \hat{\sigma}_\gamma^2 \big( \hat{M}_\rho' \hat{M}_\rho \big)^{-1},
\]

where \( \hat{\sigma}_\gamma^2 = \frac{1}{N}\, \gamma^{u}\big( u^*(\hat{\beta}, \hat{\rho}) \big)' \big( u^*(\hat{\beta}, \hat{\rho}) \big) \) and \( \hat{M}_\rho = [\, W_1 Z\hat{\pi} \;\; X \,] \).

In the case of a spatially lagged residual (\( \rho = 0, \lambda \neq 0 \)), the covariance matrix for \( (\lambda, \beta')' \) can be estimated by

(A.13)
\[
\hat{\Omega}_\lambda = \hat{\sigma}_\gamma^2 \big( \hat{M}_\lambda' \hat{M}_\lambda \big)^{-1},
\]

where \( \hat{\sigma}_\gamma^2 = \frac{1}{N}\, \gamma^{u}\big( u^*(\hat{\beta}, \hat{\lambda}) \big)' \big( u^*(\hat{\beta}, \hat{\lambda}) \big) \) and \( \hat{M}_\lambda = [\, W_2 Z\hat{\pi} \;\; (I - \hat{\lambda} W_2) X \,] \).
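The step from a covariance estimate of the form \( \hat{\sigma}_\gamma^2 (\hat{M}'\hat{M})^{-1} \) to t-values can be sketched as follows. This Python example is illustrative only: it restricts \( \hat{M} \) to two columns (one spatial term and one regressor) so that a closed-form 2x2 inverse suffices, takes the scale estimate as a given number rather than computing it, and uses hypothetical names and toy inputs throughout.

```python
import math


def gram_matrix(M):
    """Compute M'M for a matrix stored as a list of rows."""
    cols = len(M[0])
    return [[sum(row[a] * row[b] for row in M) for b in range(cols)] for a in range(cols)]


def invert_2x2(A):
    """Closed-form inverse of a 2x2 matrix (sufficient for this two-column sketch)."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    if abs(det) < 1e-12:
        raise ValueError("M'M is numerically singular")
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]


def covariance(M, sigma2):
    """sigma2 * (M'M)^(-1), the form of the estimators in (A.12) and (A.13)."""
    inverse = invert_2x2(gram_matrix(M))
    return [[sigma2 * value for value in row] for row in inverse]


def t_values(estimates, cov):
    """Estimate divided by the square root of the matching diagonal element."""
    return [est / math.sqrt(cov[i][i]) for i, est in enumerate(estimates)]


if __name__ == "__main__":
    # Toy inputs: column 1 stands in for W1*Z*pi_hat, column 2 for a single regressor.
    M_hat = [[0.8, 1.0], [0.1, 2.0], [-0.4, 0.5], [0.3, -1.5]]
    sigma2_hat = 1.7          # hypothetical scale estimate
    estimates = [0.27, 0.96]  # hypothetical (rho_hat, beta_hat)
    cov = covariance(M_hat, sigma2_hat)
    print("covariance:", cov)
    print("t-values:", t_values(estimates, cov))
```

With more than one regressor the same steps apply, but the 2x2 inverse would need to be replaced by a general linear solver.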
Endnotes

1 See Anselin (1988) for further motivation and discussion regarding finite sample justified estimators.
2 Mittelhammer, Judge, and Miller (2000) provide an introduction to information theoretic estimators and their connection to maximum entropy estimators.
3 In contrast to the pure data constraint in (2b), the GME estimator could have been specified with the moment constraint \( X'Y = X'X\beta + X'e \). Data and moment constrained GME estimators are further discussed below and Monte Carlo results are provided for both.
4 For notational convenience it is assumed that each coefficient has J support points and each error has M support points.
5 A dual loss function combined with the flexibility of parameter restrictions and user-supplied supports can provide an estimator that is robust in small samples or in ill-posed problems. For further discussion of dual loss functions see Zellner (1994).
6 Generalized two stage least squares and generalized moment simultaneous equations estimators are discussed in Kelejian and Prucha (1998, 1999).
7 An alternative specification of the moment constraint would be to replace equation (10b) with \( X' \big[ \big( I - (S^{\lambda} p^{\lambda}) W_2 \big) \big( I - (S^{\rho} p^{\rho}) W_1 \big) Y - \big( I - (S^{\lambda} p^{\lambda}) W_2 \big) X (S^{\beta} p^{\beta}) \big] / N = X' (S^{u} p^{u}) / N \).
8 Including the reduced form model in (13c) is necessary to identify the reduced form parameters. In effect this is a one-step estimator in which the reduced and structural form parameters, as well as the spatial correlation coefficients, are estimated concurrently.
9 For example, the Moran I statistic, which is commonly used to test for spatial autocorrelation, is asymptotically beta distributed for \( W_b \) and normally distributed for \( W_t \) (Cliff and Ord 1981).
10 The log-likelihood function of the ML estimator for the spatial model with \( A = I - \rho W_1 \) and \( B = I - \lambda W_2 \) is \( \ln L = -(N/2) \ln(\pi) - (1/2) \ln|\Omega| + \ln|B| + \ln|A| - (1/2) v'v \) (Anselin 1988).
11 For further information on EM approaches see McLachlan and Krishnan (1997).
12 The process of updating can continue iteratively to construct \( \hat{Y}_2^{(i)} \) and then \( \hat{\beta}^{(i)} \). The notation (i) in the superscript of the variables indicates the ith iteration, with i=0 representing initial values.
13 For the iterative GME-NLP estimator, convergence was assumed when the absolute value of the sum of the differences between the estimated parameter vectors from the ith and (i-1)st iterations was less than 0.0005.
14 Interestingly, there is an apparent tradeoff between bias and variance for the non-iterative and iterative GME-NLP estimators. While the iterative estimator dramatically reduced bias, it increased variance, and the magnitude of MSEL depended on which effect dominated.
15 See Smith and Blundell (1986) regarding inference for maximum likelihood estimators of the simultaneous Tobit model.
16 For further information see Garrett, Marsh, and Marshall (2003).
17 Garrett, Marsh, and Marshall (2003) explicitly investigated the simultaneous effects between disaster relief and crop payments. Alternatively, we chose to proxy crop insurance payments with predicted values and then focus on spatial effects of the disaster relief model.
18 Asymptotic covariance matrices for structural and spatial parameters of the GME-NLP estimator are provided in Appendix A.
19 Garrett, Marsh, and Marshall (2003) found that the mean level of direct disaster relief was $54 million higher per year in the home state of the secretary of agriculture. Those states having legislators on the Senate (House) Appropriations subcommittee received an average of $27 ($42) million more per year.