Stata output

Heckman selection model

Tuesday April 26 12:41:44 2016

Page 1

Title

[R] heckman -- Heckman selection model

Syntax

Basic syntax

    heckman depvar [indepvars], select(varlist_s) [twostep]

or

    heckman depvar [indepvars], select(depvar_s = varlist_s) [twostep]

Full syntax for maximum likelihood estimates only

    heckman depvar [indepvars] [if] [in] [weight], select([depvar_s =] varlist_s [, noconstant offset(varname_o)]) [heckman_ml_options]

Full syntax for Heckman's two-step consistent estimates only

    heckman depvar [indepvars] [if] [in], twostep select([depvar_s =] varlist_s [, noconstant]) [heckman_ts_options]

heckman_ml_options           Description
-------------------------------------------------------------------------
Model
* select()                   specify selection equation: dependent and independent
                               variables; whether to have constant term and offset
                               variable
  noconstant                 suppress constant term
  offset(varname)            include varname in model with coefficient constrained to 1
  constraints(constraints)   apply specified linear constraints
  collinear                  keep collinear variables

SE/Robust
  vce(vcetype)               vcetype may be oim, robust, cluster clustvar, opg,
                               bootstrap, or jackknife

Reporting
  level(#)                   set confidence level; default is level(95)
  first                      report first-step probit estimates
  noskip                     perform likelihood-ratio test
  nshazard(newvar)           generate nonselection hazard variable
  mills(newvar)              synonym for nshazard()
  nocnsreport                do not display constraints
  display_options            control columns and column formats, row spacing, line
                               width, display of omitted variables and base and
                               empty cells, and factor-variable labeling

Maximization
  maximize_options           control the maximization process; seldom used

  coeflegend                 display legend instead of statistics
-------------------------------------------------------------------------
* select() is required. The full specification is
  select([depvar_s =] varlist_s [, noconstant offset(varname_o)]).

heckman_ts_options           Description
-------------------------------------------------------------------------
Model
* select()                   specify selection equation: dependent and independent
                               variables; whether to have constant term
* twostep                    produce two-step consistent estimate
  noconstant                 suppress constant term
  rhosigma                   truncate rho to [-1,1] with consistent sigma
  rhotrunc                   truncate rho to [-1,1]
  rholimited                 truncate rho in limited cases
  rhoforce                   do not truncate rho

SE
  vce(vcetype)               vcetype may be conventional, bootstrap, or jackknife

Reporting
  level(#)                   set confidence level; default is level(95)
  first                      report first-step probit estimates
  nshazard(newvar)           generate nonselection hazard variable
  mills(newvar)              synonym for nshazard()
  display_options            control columns and column formats, row spacing, line
                               width, display of omitted variables and base and
                               empty cells, and factor-variable labeling

  coeflegend                 display legend instead of statistics
-------------------------------------------------------------------------

* select() and twostep are required. The full specification is
  select([depvar_s =] varlist_s [, noconstant]).

indepvars and varlist_s may contain factor variables; see fvvarlist.
depvar, indepvars, varlist_s, and depvar_s may contain time-series operators; see tsvarlist.
bootstrap, by, fp, jackknife, rolling, statsby, and svy are allowed; see prefix.
Weights are not allowed with the bootstrap prefix.
aweights are not allowed with the jackknife prefix.
twostep, vce(), first, noskip, and weights are not allowed with the svy prefix.
pweights, aweights, fweights, and iweights are allowed with maximum likelihood estimation; see weight.
coeflegend does not appear in the dialog box.
See [R] heckman postestimation for features available after estimation.

Menu

heckman for maximum likelihood estimates

    Statistics > Sample-selection models > Heckman selection model (ML)

heckman for two-step consistent estimates

    Statistics > Sample-selection models > Heckman selection model (two-step)

Description

heckman fits regression models with selection by using either Heckman's two-step consistent estimator or full maximum likelihood.

Options for Heckman selection model (ML)

Model

select([depvar_s =] varlist_s [, noconstant offset(varname_o)]) specifies the variables and options for the selection equation and is required. The selection equation should contain at least one variable that is not in the outcome equation.

    If depvar_s is specified, it should be coded as 0 or 1, with 0 indicating an observation not selected and 1 indicating a selected observation. If depvar_s is not specified, observations for which depvar is not missing are assumed selected, and those for which depvar is missing are assumed not selected.

    noconstant suppresses the selection constant term (intercept).

    offset(varname_o) specifies that selection offset varname_o be included in the model with the coefficient constrained to be 1.
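As a rough illustration of the selection rule just described, the hypothetical Python helper below (not part of Stata) marks an observation as selected when an explicit depvar_s is nonzero or, when no depvar_s is supplied, when depvar is not missing. None stands in for Stata's missing value; the default branch plays the same role as generate wageseen = (wage < .) in the examples.

```python
from typing import List, Optional, Sequence


def selection_flags(depvar: Sequence[Optional[float]],
                    depvar_s: Optional[Sequence[float]] = None) -> List[int]:
    """Hypothetical helper: 1 = selected (outcome observed), 0 = not selected.

    depvar   -- outcome values; None plays the role of Stata's missing value (.)
    depvar_s -- optional selection indicator; when supplied it decides selection
    """
    if depvar_s is not None:
        # Explicit indicator: any nonzero code counts as selected.
        return [0 if s == 0 else 1 for s in depvar_s]
    # Default rule: a nonmissing outcome means the observation was selected.
    return [0 if y is None else 1 for y in depvar]


# Usage: the third wage is missing, so that observation is treated as not selected.
print(selection_flags([12.5, 20.1, None, 9.8]))  # [1, 1, 0, 1]
```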

noconstant, offset(varname), constraints(constraints), collinear; see [R] estimation options.

SE/Robust

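vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory (oim, opg), that are robust to some kinds of misspecification (robust), that allow for intragroup correlation (cluster clustvar), and that use bootstrap or jackknife methods (bootstrap, jackknife); see [R] vce_option.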

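Reporting

level(#); see [R] estimation options.

first specifies that the first-step probit estimates of the selection equation be displayed before estimation.

noskip specifies that a full maximum-likelihood model with only a constant for the regression equation be fit. This model is not displayed but is used as the base model to compute a likelihood-ratio test for the model test statistic displayed in the estimation header. By default, the overall model test statistic is an asymptotically equivalent Wald test that all the parameters in the regression equation are zero (except the constant). For many models, this option can substantially increase estimation time.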
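The likelihood-ratio statistic that noskip reports compares the full model with the constant-only model. A minimal Python sketch of that arithmetic, assuming you already have both log likelihoods (stored as e(ll) and e(ll_0)) and that the degrees of freedom equal the number of restricted regression coefficients (e(df_m) is the natural candidate). The chi-squared tail probability uses the series for the regularized incomplete gamma function, which is adequate for moderate statistics; both function names are hypothetical.

```python
import math


def chi2_upper_tail(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-squared(df), via the series for P(a, z).

    P(a, z) = z^a e^(-z) / Gamma(a) * sum_{n>=0} z^n / (a (a+1) ... (a+n)),
    with a = df/2 and z = x/2.  Adequate for moderate x; very large statistics
    (roughly x > 1400) would overflow the partial sums.
    """
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total and n < 100000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def lr_test(ll: float, ll_0: float, df: int):
    """LR statistic 2*(ll - ll_0) and its chi-squared(df) p-value."""
    stat = 2.0 * (ll - ll_0)
    return stat, chi2_upper_tail(stat, df)


# Usage with made-up log likelihoods (not output from any real fit):
stat, p = lr_test(ll=-1052.9, ll_0=-1060.0, df=2)
print(round(stat, 4), p)
```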


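nshazard(newvar) and mills(newvar) are synonyms; either will create a new variable containing the nonselection hazard -- what Heckman (1979) referred to as the inverse of the Mills's ratio -- from the selection equation. The nonselection hazard is computed from the estimated parameters of the selection equation.

nocnsreport; see [R] estimation options.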

display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, ..., and nolstretch; see [R] estimation options.

Maximization

maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, and from(init_specs); see [R] maximize. These options are seldom used.

Setting the optimization type to technique(bhhh) resets the default vcetype to vce(opg).

The following option is available with heckman but is not shown in the dialog box:

coeflegend; see [R] estimation options.

Options for Heckman selection model (two-step)

Model

select([depvar_s =] varlist_s [, noconstant]) specifies the variables and options for the selection equation and is required. The selection equation should contain at least one variable that is not in the outcome equation.

    If depvar_s is specified, it should be coded as 0 or 1, with 0 indicating an observation not selected and 1 indicating a selected observation. If depvar_s is not specified, observations for which depvar is not missing are assumed selected, and those for which depvar is missing are assumed not selected.

    noconstant suppresses the selection constant term (intercept).

twostep specifies that Heckman's (1979) two-step consistent estimates of the parameters, standard errors, and covariance matrix be produced.
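The two steps can be sketched as follows. This is a textbook version of the estimator written in plain Python for illustration, not Stata's code: a probit of the selection indicator on Z (first step), then least squares of y on x plus the nonselection hazard over the selected observations (second step). The coefficient on the hazard estimates rho*sigma; sigma is recovered from the residual variance plus a correction term, and rho as their ratio, with no truncation to [-1,1]. Standard errors, weights, and offsets are omitted, and the regressor rows X and Z are assumed to already include a constant column. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def _pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z: float) -> float:
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def _solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - _dot(m[r][r + 1:n], x[r + 1:n])) / m[r][r]
    return x


def probit(d: Sequence[int], Z: Sequence[Sequence[float]], iters: int = 50) -> List[float]:
    """First step: probit ML by Newton-Raphson (the probit log likelihood is concave)."""
    k = len(Z[0])
    g = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for di, zi in zip(d, Z):
            q = 2 * di - 1                      # +1 if selected, -1 otherwise
            xb = _dot(zi, g)
            lam = q * _pdf(q * xb) / _cdf(q * xb)
            w = lam * (lam + xb)                # weight of minus the Hessian
            for j in range(k):
                grad[j] += lam * zi[j]
                for h in range(k):
                    info[j][h] += w * zi[j] * zi[h]
        step = _solve(info, grad)
        g = [gj + sj for gj, sj in zip(g, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return g


def heckman_twostep(y, X, d, Z) -> Tuple[List[float], float, float, float]:
    """Return (beta, beta_m, sigma, rho); y is only read where d == 1."""
    gamma = probit(d, Z)
    rows, ys, deltas = [], [], []
    for yi, xi, di, zi in zip(y, X, d, Z):
        if di == 1:
            zg = _dot(zi, gamma)
            lam = _pdf(zg) / _cdf(zg)           # nonselection hazard at Zg
            rows.append(list(xi) + [lam])
            ys.append(yi)
            deltas.append(lam * (lam + zg))
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, ys)) for i in range(k)]
    coef = _solve(xtx, xty)                     # second-step least squares
    beta, beta_m = coef[:-1], coef[-1]
    n = len(ys)
    ssr = sum((v - _dot(r, coef)) ** 2 for r, v in zip(rows, ys))
    sigma = math.sqrt(ssr / n + beta_m ** 2 * sum(deltas) / n)
    rho = beta_m / sigma                        # may fall outside [-1, 1]
    return beta, beta_m, sigma, rho
```

On data simulated with a known rho and an exclusion restriction (a variable in Z that is not in x), the estimates should approach the true values as the sample grows.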

noconstant; see [R] estimation options.

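rhosigma, rhotrunc, rholimited, and rhoforce are rarely used options that specify how the two-step estimator handles unusual cases in which the two-step estimate of rho is outside the admissible range for a correlation, [-1,1]. When rho is outside this range, the two-step estimate of the coefficient variance-covariance matrix may not be positive definite and thus may be unusable for testing. The default is rhosigma.

    rhosigma specifies that rho be truncated, as with the rhotrunc option, and that the estimate of sigma be made consistent with rho_hat, the truncated estimate of rho. So, sigma_hat = B_m * rho_hat; see Methods and formulas in [R] heckman for the definition of B_m. Both the truncated rho and the new estimate of sigma are used in all computations to estimate the two-step covariance matrix.

    rhotrunc specifies that rho be truncated to lie in the range [-1,1]. If the two-step estimate is less than -1, rho is set to -1; if it is greater than 1, rho is set to 1. This truncated value of rho is used in all computations to estimate the two-step covariance matrix.

    rholimited specifies that rho be truncated only in computing the diagonal matrix D (see Methods and formulas in [R] heckman); in all other computations, the untruncated estimate of rho is used.

    rhoforce specifies that the two-step estimate of rho be retained, even if it is outside the admissible range for a correlation. This can lead to a non-positive-definite covariance matrix.

These options have no effect when estimation is by maximum likelihood, the default. They also have no effect when asymptotic standard errors are not required.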
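The four rules differ only in which value of rho each one feeds into the two-step covariance computations. The hypothetical Python function below summarizes that choice; it is a sketch of the rules as stated, not Stata code, and it does not reproduce the accompanying sigma adjustment that rhosigma makes.

```python
def rho_for_covariance(rho_hat: float, method: str = "rhosigma",
                       computing_D: bool = False) -> float:
    """Value of rho used in the two-step covariance computations.

    rho_hat     -- the raw two-step estimate, possibly outside [-1, 1]
    method      -- "rhosigma" (the default), "rhotrunc", "rholimited", or "rhoforce"
    computing_D -- True when the value is needed for the diagonal matrix D
    """
    truncated = max(-1.0, min(1.0, rho_hat))
    if method in ("rhosigma", "rhotrunc"):
        # Truncated everywhere; rhosigma also re-derives sigma (not shown here).
        return truncated
    if method == "rholimited":
        # Truncated only inside D; untruncated everywhere else.
        return truncated if computing_D else rho_hat
    if method == "rhoforce":
        # Never truncated, at the risk of a non-positive-definite matrix.
        return rho_hat
    raise ValueError(f"unknown method: {method}")


print(rho_for_covariance(1.17), rho_for_covariance(1.17, "rhoforce"))  # 1.0 1.17
```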

SE

vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory (conventional) and that use bootstrap or jackknife methods (bootstrap, jackknife); see [R] vce_option.

vce(conventional), the default, uses the two-step variance estimator derived by Heckman.

Reporting

level(#); see [R] estimation options.

first specifies that the first-step probit estimates of the selection equation be displayed before estimation.


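nshazard(newvar) and mills(newvar) are synonyms; either will create a new variable containing the nonselection hazard -- what Heckman (1979) referred to as the inverse of the Mills's ratio -- from the selection equation. The nonselection hazard is computed from the estimated parameters of the selection equation.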
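The nonselection hazard is the standard normal density divided by the standard normal distribution function, both evaluated at the selection-equation linear prediction Zg. A small Python sketch of that formula (function names are illustrative, not Stata's):

```python
import math


def std_normal_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    """Standard normal distribution function Phi(z); erfc keeps the lower tail accurate."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def nonselection_hazard(zg: float) -> float:
    """phi(Zg) / Phi(Zg), the inverse of Mills's ratio at the selection index Zg.

    Far in the lower tail (around Zg = -37 and below) Phi underflows to zero,
    so a production version would switch to an asymptotic expansion there.
    """
    return std_normal_pdf(zg) / std_normal_cdf(zg)


for zg in (-2.0, 0.0, 2.0):
    print(zg, round(nonselection_hazard(zg), 4))
```

The hazard is large for observations that were unlikely to be selected (small Zg) and shrinks toward zero as selection becomes nearly certain.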

display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, ..., and nolstretch; see [R] estimation options.

The following option is available with heckman but is not shown in the dialog box:

coeflegend; see [R] estimation options.

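Remarks

heckman estimates all the parameters in the model

    (regression equation: y is depvar, x is indepvars)
        y = xb + u_1

    (selection equation: Z is varlist_s)
        y observed if Zg + u_2 > 0

    where:
        u_1 ~ N(0, sigma)
        u_2 ~ N(0, 1)
        corr(u_1, u_2) = rho

Here sigma is the standard deviation of u_1; the variance of u_2 is normalized to 1.

In the syntax for heckman, depvar and indepvars are the dependent variable and regressors for the underlying regression model to be fit (y = xb), and varlist_s are the variables (Z) thought to determine whether depvar is selected or observed (selected or not selected). By default, heckman assumes that missing values of depvar imply that the dependent variable is unobserved (not selected). With some datasets, it is more convenient to specify a binary variable (depvar_s) that identifies the observations for which the dependent variable is observed/selected (depvar_s != 0) or not observed (depvar_s = 0); heckman will accommodate either type of data.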
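Under this model, each observation's contribution to the log likelihood that the ML estimator maximizes has a closed form. The Python sketch below writes it out (the function name is hypothetical; it treats sigma as the standard deviation of u_1 and requires -1 < rho < 1). A nonselected observation contributes only ln Phi(-Zg); a selected one contributes the normal log density of its residual plus the log probability of selection conditional on that residual.

```python
import math


def _log_norm_cdf(z: float) -> float:
    """ln Phi(z); erfc keeps the lower tail accurate (fails only for z below about -37)."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def loglik_obs(selected: bool, zg: float, y: float = 0.0, xb: float = 0.0,
               sigma: float = 1.0, rho: float = 0.0) -> float:
    """Log-likelihood contribution of one observation.

    zg    -- selection index Zg
    y, xb -- outcome and regression index (ignored when not selected)
    sigma -- standard deviation of u_1 (must be > 0)
    rho   -- corr(u_1, u_2), strictly inside (-1, 1)
    """
    if not selected:
        # Not selected: Pr(Zg + u_2 <= 0) = Phi(-Zg).
        return _log_norm_cdf(-zg)
    e = (y - xb) / sigma                    # standardized residual u_1 / sigma
    # Given u_1, u_2 ~ N(rho * e, 1 - rho^2), so Pr(Zg + u_2 > 0 | u_1) is:
    log_sel = _log_norm_cdf((zg + rho * e) / math.sqrt(1.0 - rho * rho))
    # Normal log density of y: ln phi(e) - ln sigma.
    log_dens = -0.5 * e * e - 0.5 * math.log(2.0 * math.pi) - math.log(sigma)
    return log_sel + log_dens


# Usage: sum the contributions over a tiny made-up sample (selected, zg, y, xb).
sample = [(True, 0.4, 2.1, 1.8), (False, -0.3, 0.0, 0.0), (True, 1.2, 3.0, 3.5)]
ll = sum(loglik_obs(s, zg, y, xb, sigma=1.2, rho=0.3) for s, zg, y, xb in sample)
print(ll)
```

With rho = 0 the selected-observation term separates into ln Phi(Zg) plus an ordinary normal log density, which is why a nonzero rho is what links the two equations.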

Examples

    Setup
        . webuse womenwk

    Obtain full ML estimates
        . heckman wage educ age, select(married children educ age)

    Obtain Heckman's two-step consistent estimates
        . heckman wage educ age, select(married children educ age) twostep

    Define and use each equation separately
        . global wage_eqn wage educ age
        . global seleqn married children age
        . heckman $wage_eqn, select($seleqn)

    Use a variable to identify selection
        . generate wageseen = (wage < .)
        . heckman wage educ age, select(wageseen = married children educ age)

    Specify robust variance
        . heckman wage educ age, select(married children educ age) vce(robust)

    Specify clustering on county
        . heckman $wage_eqn, select($seleqn) vce(cluster county)

    Report first-step probit estimates
        . heckman wage educ age, select(married children educ age) first

    Create mymills containing nonselection hazard
        . heckman $wage_eqn, select($seleqn) mills(mymills)

    No constant in model
        . heckman wage educ age, noconstant select(married children educ age)

    No constant in selection equation
        . heckman wage educ age, select(married children educ age, noconstant)

Stored results

heckman (maximum likelihood) stores the following in e():

Scalars
  e(N)              number of observations
  e(N_cens)         number of censored observations
  e(k)              number of parameters
  e(k_eq)           number of equations in e(b)
  e(k_eq_model)     number of equations in overall model test
  e(k_aux)          number of auxiliary parameters
  e(k_dv)           number of dependent variables
  e(df_m)           model degrees of freedom
  e(ll)             log likelihood
  e(ll_0)           log likelihood, constant-only model
  e(N_clust)        number of clusters
  e(lambda)         lambda
  e(selambda)       standard error of lambda
  e(sigma)          sigma
  e(chi2)           chi-squared
  e(chi2_c)         chi-squared for comparison test
  e(p_c)            p-value for comparison test
  e(p)              significance of comparison test
  e(rho)            rho
  e(rank)           rank of e(V)
  e(rank0)          rank of e(V) for constant-only model
  e(ic)             number of iterations
  e(rc)             return code
  e(converged)      1 if converged, 0 otherwise

Macros
  e(cmd)            heckman
  e(cmdline)        command as typed
  e(depvar)         names of dependent variables
  e(wtype)          weight type
  e(wexp)           weight expression
  e(title)          title in estimation output
  e(title2)         secondary title in estimation output
  e(clustvar)       name of cluster variable
  e(offset1)        offset for regression equation
  e(offset2)        offset for selection equation
  e(mills)          variable containing nonselection hazard (inverse of Mills's ratio)
  e(chi2type)       Wald or LR; type of model chi-squared test
  e(chi2_ct)        Wald or LR; type of model chi-squared test corresponding to e(chi2_c)
  e(vce)            vcetype specified in vce()
  e(vcetype)        title used to label Std. Err.
  e(opt)            type of optimization
  e(which)          max or min; whether optimizer is to perform maximization or minimization
  e(method)         ml
  e(ml_method)      type of ml method
  e(user)           name of likelihood-evaluator program
  e(technique)      maximization technique
  e(properties)     b V
  e(predict)        program used to implement predict
  e(marginsok)      predictions allowed by margins
  e(asbalanced)     factor variables fvset as asbalanced
  e(asobserved)     factor variables fvset as asobserved

Matrices
  e(b)              coefficient vector
  e(Cns)            constraints matrix
  e(ilog)           iteration log (up to 20 iterations)
  e(gradient)       gradient vector
  e(V)              variance-covariance matrix of the estimators
  e(V_modelbased)   model-based variance

Functions
  e(sample)         marks estimation sample
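Among the stored scalars, e(rho), e(sigma), and e(lambda) are linked: lambda is the product rho*sigma (in the two-step estimator it is the coefficient on the nonselection hazard). ML routines often work with unrestricted transforms instead, athrho = atanh(rho) and lnsigma = ln(sigma), so that rho stays inside (-1,1) and sigma stays positive. The Python sketch below assumes that parameterization and maps the transforms back; the function name is hypothetical.

```python
import math
from typing import Tuple


def ancillary_from_transforms(athrho: float, lnsigma: float) -> Tuple[float, float, float]:
    """Map unrestricted (athrho, lnsigma) back to (rho, sigma, lambda)."""
    rho = math.tanh(athrho)        # any real athrho gives -1 < rho < 1
    sigma = math.exp(lnsigma)      # any real lnsigma gives sigma > 0
    lam = rho * sigma              # lambda = rho * sigma
    return rho, sigma, lam


rho, sigma, lam = ancillary_from_transforms(math.atanh(0.5), math.log(2.0))
print(round(rho, 6), round(sigma, 6), round(lam, 6))  # 0.5 2.0 1.0
```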

heckman (two-step) stores the following in e():

Scalars
  e(N)              number of observations
  e(N_cens)         number of censored observations
  e(df_m)           model degrees of freedom
  e(lambda)         lambda
  e(selambda)       standard error of lambda
  e(sigma)          sigma
  e(chi2)           chi-squared
  e(p)              significance of comparison test
  e(rho)            rho
  e(rank)           rank of e(V)

Macros
  e(cmd)            heckman
  e(cmdline)        command as typed
  e(depvar)         names of dependent variables
  e(title)          title in estimation output
  e(title2)         secondary title in estimation output
  e(mills)          variable containing nonselection hazard (inverse of Mills's ratio)
  e(chi2type)       Wald or LR; type of model chi-squared test
  e(vce)            vcetype specified in vce()
  e(vcetype)        title used to label Std. Err.
  e(rhometh)        rhosigma, rhotrunc, rholimited, or rhoforce
  e(method)         twostep
  e(properties)     b V
  e(predict)        program used to implement predict
  e(marginsok)      predictions allowed by margins
  e(marginsnotok)   predictions disallowed by margins
  e(asbalanced)     factor variables fvset as asbalanced
  e(asobserved)     factor variables fvset as asobserved

Matrices
  e(b)              coefficient vector
  e(Cns)            constraints matrix
  e(V)              variance-covariance matrix of the estimators

Functions
  e(sample)         marks estimation sample

Reference

Heckman, J. 1979. Sample selection bias as a specification error. Econometrica 47: 153–161.