Journal of Econometrics 32 (1986) 5-34. North-Holland
A SEMI-PARAMETRIC CENSORED REGRESSION ESTIMATOR

Gregory M. DUNCAN*

Washington State University, Pullman, WA 99164-4860, USA
This paper introduces a semi-parametric method for estimating regression coefficients when the underlying parent population of errors is censored. The method is an example of the method of sieves, and it provides simultaneous estimates of the regression coefficients and the density of the underlying parent population. In the very simplest terms, the underlying unknown density is approximated by a spline with mesh size approaching zero with the sample size. The values of the density at the knots are then added to the list of the usual unknown parameters in a censored regression model, e.g., the regression coefficients and scale parameter. A quasi-likelihood function using the approximate spline density is then maximized over all the parameters mentioned above. The method is shown to result in strongly consistent parameter estimates.
1. Introduction

This paper introduces a semi-parametric method for estimating regression coefficients when the underlying parent population of errors is censored. The method is an example of the method of sieves, and it provides simultaneous estimates of the regression coefficients and the density of the underlying parent population. In the very simplest terms, the underlying unknown density is approximated by a spline with mesh size approaching zero with the sample size. The values of the density at the knots are then added to the list of the usual unknown parameters in a censored regression model, e.g., the regression coefficients and scale parameter. A quasi-likelihood function using the approximate spline density is then maximized over all the parameters mentioned above. The method is shown to result in strongly consistent parameter estimates.

2. Background

In a left (right) censored regression model, defined more formally below, the values of the dependent variable falling below (above) a given value are

* I am indebted to Jim Heckman, Hal White, Alan Marcus, Chris Sims, Steve Cosslett, David Spencer, Rob Engle, and Dale Poirier for helpful comments. Ib Hansen, Cathleen Leue-Roney, and Mark Thoma are thanked for research and programming assistance. This research was funded under NSF grant SES-8109274. Participants at seminars at Minnesota, Wisconsin, Chicago, Northwestern, Bell Labs, Princeton, Columbia, UC San Diego, UC Berkeley, UC Santa Barbara and UC Riverside are also thanked for their comments.
0304-4076/86/$3.50 (c) 1986, Elsevier Science Publishers B.V. (North-Holland)
replaced by that value. The regression errors are typically assumed to be independent and identically distributed, with zero expectation and with distribution function F(·); almost without exception, F(·) is assumed to be normal. Examples of normally censored regressions can be found in Tobin (1958), Amemiya (1973), Heckman (1976, 1979), Nelson (1977), and Lee and Trost (1978). They appear as a case of a much larger class of so-called selectivity problems which include mixed continuous/discrete models, Heckman (1978), Schmidt (1978), and Duncan (1980a, 1982). All of the above references explicitly assume normality.

The problem is that normally censored regression estimation methods are not robust against non-normality. For example, in the uncensored regression case, normality is a convenience, and the fact that the errors are, say, uniform has no effect on consistency. In the censored regression case, falsely assuming normality or generally misspecifying the distribution will lead to inconsistency. The obvious alternative is to model the true parent distribution, but economic theory, as yet, gives us no guide. The second alternative is to use a common flexible family of parametric distributions, such as the beta or Pearson family. This is unattractive since we still have no guides to the construction or justification of such families in practical situations. The final method, and the one we choose here, is to jointly estimate the regression coefficients and (to a high degree of accuracy in approximation) the parent population distribution and density function. In particular, we shall assume that the density can be uniformly approximated by a spline with exponential tails, and we find and estimate this approximation, together with the usual parameters.

3. A censored regression model

Let

    y_i = X_i'β + ε_i    if ε_i > −X_i'β,
        = 0              if ε_i ≤ −X_i'β,        i = 1, ..., N.
X_i and β are K×1 vectors, while the ε_i are independent identically distributed random errors with an absolutely continuous distribution, expectation zero and variance σ². It will be necessary to assume that the density of y_i | X_i can be written as

    (1/σ) f((y_i − X_i'β)/σ),

where f(·) does not depend explicitly on σ, β or X_i. This is a usual assumption and results in only a small loss in generality. Let ψ = {i: y_i > 0} and ψᶜ = {i: y_i = 0}; then the log-likelihood function can be
written as

    ℓ(β, σ) = Σ_{i∈ψ} ln f((y_i − X_i'β)/σ) + Σ_{i∈ψᶜ} ln F(−X_i'β/σ) + Σ_{i∈ψ} ln(1/σ),

where F(t) is ∫_{−∞}^t f(s) ds. Under the conditions listed in Amemiya (1973) or those listed in Hoadley (1971), a maximum-likelihood estimator of β and σ exists which is consistent and asymptotically normal.

4. Splines

Most simply, splines are continuous functions whose graphs are piecewise polynomials joined together at points in the domain called knots. A linear spline is a continuous piecewise linear function. The value of the spline at its knots defines the whole function. So if a linear spline is defined over [a, b] and the knots are at t_0 < t_1 < ... < t_m with t_0 = a, t_m = b, and if f_i = f(t_i), i = 0, ..., m, then the function f is completely determined by
    f(t) = f_i + ((t − t_i)/(t_{i+1} − t_i))(f_{i+1} − f_i),    t_i ≤ t ≤ t_{i+1}.    (1)
If the knots are equally spaced then the mesh size, h, is defined by
    h = (b − a)/m,    (2)

    t_i = t_0 + ih,    i = 0, ..., m,    (3)

and clearly t_{i+1} − t_i = h for all i.
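The interpolation formula (1) on the uniform mesh of (2)-(3) is direct to implement. The following sketch (in Python, with helper names of my own choosing, not from the paper) evaluates such a spline from its knot values:

```python
import numpy as np

def linear_spline(t, a, b, knot_values):
    """Evaluate the linear spline of eq. (1) on the uniform mesh of
    eqs. (2)-(3): t_i = a + i*h with h = (b - a)/m, where
    knot_values[i] = f_i.  Names are mine, not the paper's."""
    knot_values = np.asarray(knot_values, dtype=float)
    m = len(knot_values) - 1                  # number of mesh intervals
    h = (b - a) / m                           # mesh size, eq. (2)
    t = np.clip(t, a, b)
    i = np.minimum(((t - a) // h).astype(int), m - 1)   # interval index
    t_i = a + i * h                           # left knot, eq. (3)
    # eq. (1): f(t) = f_i + ((t - t_i)/h)(f_{i+1} - f_i)
    return knot_values[i] + ((t - t_i) / h) * (knot_values[i + 1] - knot_values[i])

# Example: interpolate g(t) = exp(-t^2) on [-1, 1] with m = 8 intervals.
ts = np.linspace(-1.0, 1.0, 201)
fs = np.exp(-np.linspace(-1.0, 1.0, 9) ** 2)            # f_i = g(t_i)
vals = linear_spline(ts, -1.0, 1.0, fs)
```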
Another, more general, way of writing a spline is as

    f(t) = Σ_i α_i φ_i(t),    (4)

where the α_i and φ_i(t) depend upon the spline being chosen. I shall refer to the α_i (or the f_i) as spline weights.
To write (1) in the form of (4) we define a set of basis functions φ_i(t); following Prenter (1976), define, for f_i and h given in (2) and (3), and for 1 ≤ i ≤ m − 1,

    L_i^m(t) = (t − t_{i−1})/h,    t_{i−1} ≤ t ≤ t_i,
             = (t_{i+1} − t)/h,    t_i ≤ t ≤ t_{i+1},
             = 0,                  otherwise.    (5)
L_i^m(t) is an isosceles triangle centered at t_i with height 1 and base 2h. For i = 0 and i = m, we use right triangles

    L_0^m(t) = (t_1 − t)/h,        t_0 ≤ t ≤ t_1,
             = 0,                  otherwise,    (6)

and

    L_m^m(t) = (t − t_{m−1})/h,    t_{m−1} ≤ t ≤ t_m,
             = 0,                  otherwise.    (7)
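A sketch of the basis (5)-(7), again with my own naming; the min-of-two-edges form below reproduces the interior tents and the boundary right triangles with a single formula:

```python
import numpy as np

def hat_basis(i, t, a, b, m):
    """The tent basis L_i^m(t) of eqs. (5)-(7) on the uniform mesh
    t_j = a + j*h, h = (b - a)/m: isosceles triangles of height 1 and
    base 2h at interior knots, right triangles at i = 0 and i = m."""
    h = (b - a) / m
    t = np.asarray(t, dtype=float)
    t_i = a + i * h
    rising = (t - (t_i - h)) / h              # edge on [t_{i-1}, t_i]
    falling = ((t_i + h) - t) / h             # edge on [t_i, t_{i+1}]
    # For i = 0 (resp. i = m) the rising (resp. falling) edge lies outside
    # [a, b] and never binds, which yields the right triangles of (6)-(7).
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)

# L_i^m(t_j) = 1 if i = j and 0 otherwise, so the spline of eq. (8)
# below interpolates its knot values.
a, b, m = -1.0, 1.0, 8
knots = np.linspace(a, b, m + 1)
assert np.allclose(hat_basis(3, knots, a, b, m), np.eye(m + 1)[3])
```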
Now for f_i as in (1) we have

    f(t) = Σ_{i=0}^m f_i L_i^m(t).    (8)

So (8) represents a linear spline with knots at t_i, i = 0, ..., m, constant mesh size h, domain [t_0, t_m], and spline weights {f_i}_{i=0}^m.

We need some facts about spline approximation. Let g(t) be an arbitrary function in C²[a, b], let t_i be as above, and let

    f_i = g(t_i),    i = 0, ..., m.

Then f(t) = Σ f_i L_i^m(t) is the linear spline approximation to g(t), and we have the error estimate

    ‖f − g‖_∞ ≤ (‖g″‖_∞/4) h².    (9)
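The O(h²) rate in (9) is easy to verify numerically. The snippet below, reusing the linear_spline sketch above with a test function of my own choosing, compares the sup-norm error with the bound as the mesh is refined:

```python
import numpy as np

a, b = -1.0, 1.0
g = lambda t: np.exp(-t ** 2)                       # test function, my choice
tt = np.linspace(a, b, 5001)
g2_sup = np.max(np.abs((4 * tt ** 2 - 2) * np.exp(-tt ** 2)))   # sup |g''|

for m in (4, 8, 16, 32):
    h = (b - a) / m
    f_i = g(np.linspace(a, b, m + 1))               # f_i = g(t_i)
    err = np.max(np.abs(linear_spline(tt, a, b, f_i) - g(tt)))
    bound = (g2_sup / 4) * h ** 2                   # right side of (9)
    print(m, err, bound)                            # err stays below bound
```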
We mention in passing that if ω(f, h) is the modulus of continuity of f with respect to h, then

    ‖f − g‖_∞ ≤ ω(f, h),    (10)

so that if g ∈ C¹(a, b),

    ‖f − g‖_∞ ≤ (‖f′‖/α) h,    (11)

where α is a constant. And if g is simply continuous, then (10) defines the level
of approximation. We also need to approximate integrals; an obvious approximation to

    G(t) = ∫_a^t g(s) ds

is F(t) = ∫_a^t f(s) ds, where g and f are as above. Define

    M_i^m(t) = ∫_a^t L_i^m(s) ds;    (12)

then

    F(t) = Σ_{i=0}^m f_i M_i^m(t).

Note that M_i^m(t) is piecewise quadratic and an ogive; indeed, M_i^m(t)/h is a probability distribution function. Again using Prenter (1976, p. 46) we have

    |F(t) − G(t)| ≤ (‖g″‖_∞/4)(t − a) h²,

or

    ‖F(t) − G(t)‖_∞ ≤ (‖g″‖_∞/4)(b − a) h².

It is useful to note that

    ∫_{−∞}^∞ L_i^m(t) L_j^m(t) dt = 1    if i = j,
                                  = 0    otherwise.
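The ogive M_i^m of (12) can be tabulated from the hat functions; a minimal sketch by quadrature (the closed piecewise-quadratic form would do as well), reusing the hat_basis helper above:

```python
import numpy as np

def integrated_hat(i, t, a, b, m, grid_size=2001):
    """M_i^m(t) of eq. (12): the integral of L_i^m from a to t, computed
    here by numerical quadrature.  M_i^m is an ogive and M_i^m(t)/h is a
    probability distribution function."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    out = np.empty_like(t)
    for j, tj in enumerate(t):
        s = np.linspace(a, max(tj, a), grid_size)
        out[j] = np.trapz(hat_basis(i, s, a, b, m), s)
    return out

# An interior hat integrates to h, so M_i^m(b)/h = 1: a proper c.d.f.
a, b, m = -1.0, 1.0, 8
h = (b - a) / m
print(integrated_hat(3, b, a, b, m) / h)            # ~ 1.0
```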
Also, the two spaces

    𝔉* = {(f_0, ..., f_m): f_i ∈ A}    and    𝔉 = {Σ_{i=0}^m f_i L_i^m(t): f_i ∈ A}

are isomorphic. Moreover, if A is taken to be ℝ then 𝔉* and 𝔉 are linear. Consequently, 𝔉 has dimension m + 1 and basis {L_i^m(t)}_{i=0}^m.
5. Sieves

Consider now a space S(m + 1) of all linear splines over [a, b] with m + 1 knots and fixed mesh size h. Construct a finer space S(2m + 1) from S(m + 1) by halving h, maintaining the original knots and locating a new knot halfway between each pair of old knots. We may thus generate a sequence of sets {S(2ᵏm + 1)}_{k=0}^∞. The sequence is increasing,

    S(2ᵏm + 1) ⊆ S(2^{k+1}m + 1),    k = 0, 1, ...,    (14)

and has a limit that is dense in C[a, b]. That is,

    lim_{k→∞} sup_{g∈C[a,b]} inf {‖f − g‖_∞: f ∈ S(2ᵏm + 1)} = 0.    (15)
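The dyadic refinement that generates {S(2ᵏm + 1)} amounts to nothing more than halving the mesh while retaining the old knots; a minimal sketch:

```python
import numpy as np

def sieve_knots(a, b, m, k):
    """Knot set of S(2**k * m + 1): m initial intervals on [a, b], with
    the mesh halved k times while every old knot is retained."""
    return np.linspace(a, b, (2 ** k) * m + 1)

# Each refinement contains the previous knots, so the spline spaces are
# nested as in (14):
k0 = sieve_knots(0.0, 1.0, 4, 0)
k1 = sieve_knots(0.0, 1.0, 4, 1)
assert set(np.round(k0, 12)) <= set(np.round(k1, 12))
```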
As a further refinement let

    𝔉̂(k) = {f ∈ S(2ᵏm + 1):    (16)
             ∫_a^b f(t) dt = 1,  f ≥ 0}.    (17)

Then one may show that 𝔉̂(k) is isomorphic to the set of all vectors of knot values (f_0, ..., f_{2ᵏm+1}) satisfying the same constraints; since this is a closed and bounded subset of a finite-dimensional vector space, 𝔉̂(k) is compact.
The density of an observation (y, x) in the censored regression model is then

    g(y, x; θ) = (1/θ_1) α((y − x'θ_2)/θ_1) h_λ(x),        y > 0,  x ∈ X,
               = (∫_{−∞}^{−x'θ_2/θ_1} α(t) dt) h_λ(x),     y = 0,  x ∈ X,    (20)

where α ∈ 𝔄, (θ_1, θ_2) ∈ Θ and h_λ(x) is a density on X.
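For concreteness, here is a sketch of how (20) is evaluated when α is a linear spline: the uncensored branch evaluates the rescaled spline density, and the censored branch integrates it up to the index cutoff. All function and parameter names are mine, and the marginal h_λ(x) is omitted since it does not involve θ; the linear_spline helper is the one sketched in section 4.

```python
import numpy as np

def censored_density(y, x, theta1, theta2, alpha_knots, a, b):
    """Sketch of the observation density (20) when alpha is the linear
    spline with knot values alpha_knots on [a, b]."""
    index = float(np.dot(x, theta2))
    if y > 0:
        z = (y - index) / theta1
        if z < a or z > b:
            return 0.0                        # spline density vanishes off [a, b]
        return float(linear_spline(z, a, b, alpha_knots)) / theta1
    # censored cell: mass of the spline density below -x'theta2/theta1
    cut = min(max(-index / theta1, a), b)
    s = np.linspace(a, cut, 2001)
    return float(np.trapz(linear_spline(s, a, b, alpha_knots), s))

def log_quasi_likelihood(ys, xs, theta1, theta2, alpha_knots, a, b):
    """The quasi-likelihood that the method of this paper maximizes."""
    total = 0.0
    for y, x in zip(ys, xs):
        d = censored_density(y, x, theta1, theta2, alpha_knots, a, b)
        total += np.log(max(d, 1e-300))       # floor avoids log(0) in the sketch
    return total
```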
Finally, we need the following assumptions for the 'true' α.

Assumption 1. Let

    g_0(y, x) = g(y, x; θ_0),

where α(·) and h_0(·) are the 'true' densities; then for all α and θ' = (θ_1, θ_2') we have

    |ln g(y, x; θ)/g_0(y, x)| ≤ M(y, x),
and M(y, x) is integrable. This condition would be common in consistency proofs for maximum likelihood in the case that α were known.

Assumption 2. For g(y, x; θ) as above and for all α and θ' = (θ_1, θ_2') ∈ Θ, E(|y| | x) and E|x| exist and are finite.

Assumption 3. The sets {x: −1 ≤ x'θ_2/θ_1 ≤ 1} have positive probability.
(i) For C_m ⊂ A, C_m → a means sup_{b∈C_m} ‖a − b‖_A → 0.
Theorem 1 below also refers to the following conditions:

C.1. For every m and every n, Â_n^m is almost surely (dP_{a_0}) non-empty.

C.2. If, for some sequence a_m ∈ 𝒮_m, K(a_0, a_m) → 0, then a_m → a_0.

C.3. There exists a sequence a_m ∈ 𝒮_m such that K(a_0, a_m) → 0.

Finally, for each δ > 0 and each m, define

    D_δ = {a ∈ 𝒮_m: H(a_0, a) ≤ H(a_0, a_m) − δ},

where a_m is the sequence in C.3. Given l sets O_1, ..., O_l in 𝒮_m such that g(·, O_k) is measurable for each k, define

    g(·, O_k) = sup_{a∈O_k} g(·; a).
The following theorem holds generally, not just in the censored regression case considered here.

Theorem 1 [Geman and Hwang (1982)]. Assume {𝒮_m} is chosen so that C.1-C.3 are in force and let m(n) be a sequence diverging to ∞ with n. If for each δ > 0 there are sets O_1^m, O_2^m, ..., O_{l_m}^m in 𝒮_m, m = 1, 2, ..., such that

(i) D_δ ⊆ ∪_{k=1}^{l_m} O_k^m,

(ii) g(·, O_k^m) is measurable,

(iii) Σ_{n=1}^∞ l_{m(n)} (ρ_{m(n)})^n < ∞,

then Â_n^{m(n)} → a_0 almost surely.
For our application we need a relationship between convergence of Kullback-Leibler information and L_1 convergence. In this regard we have the following lemma from Geman (1981).

Lemma 1. Let f_0(x) be a density function satisfying

    |∫_{−∞}^∞ f_0(x) ln f_0(x) dx| < ∞.

If for each n, T_n is a collection of density functions, and if

    lim_{n→∞} sup_{f∈T_n} ∫_{−∞}^∞ f_0(x) ln{f_0(x)/f(x)} dx = 0,

then also

    lim_{n→∞} sup_{f∈T_n} ∫_{−∞}^∞ |f(x) − f_0(x)| dx = 0.
We now state the major result of this paper. Let δ_i = 1 if y_i > 0 and let δ_i = 0 otherwise; then we may write the log-likelihood as

    ln L_n(a) = Σ_{i=1}^n δ_i [ln α((y_i − x_i'β)/σ) − ln σ] + Σ_{i=1}^n (1 − δ_i) ln ∫_{−∞}^{−x_i'β/σ} α(t) dt,

and â_{m(n)} = (α̂_n, β̂_n, σ̂_{m(n)}) is the modified spline maximum likelihood estimator. Note that we abuse notation and identify α, the vector of knot values of the density, with α(·), the spline density. Employing Geman and Hwang (1982) we have the following theorem, which is the main result of this paper.

Theorem 2. Let θ̂_{m(n)} be the modified spline maximum likelihood estimator; then under the assumptions of the model θ̂_{m(n)} → θ_0 almost surely, provided that for some ε > 0, m = m(n) = O(n^{1−ε}) or k = O((1 − ε) log n).
Proof. We check the conditions of Geman and Hwang (1982) and Theorem 1 in a series of lemmas and propositions. The proofs are in the appendix.

8. Distribution theory

A complete asymptotic distribution theory is unavailable. This section sketches a partial theory. The problem in applying the usual theorems is the
fact that the number of parameters grows with the sample size. However, if one is willing to accept a level of asymptotic bias, then, using White (1982), one can achieve a sort of distribution theory by fixing the mesh size at a suitably small value. The estimated parameters converge to the parameters that minimize the Kullback-Leibler norm. Suitably normalized, the distribution of the estimated parameters may be approximated by a multivariate normal distribution. If one assumes the world may be characterized by a fixed-knot spline then, using Duncan (1980b), one has both consistency and normality. The deficiency here is that one has made a distributional assumption, and we are trying to avoid that. The deficiency in applying White (1982) is that I have been unable to develop a relationship between the 'true' parameters and their Kullback-Leibler approximations. In particular, I have been unable to develop a relationship between the Kullback-Leibler 'approximation' errors and the usual L_1 or L_2 approximation errors.

9. Empirical results

Lacking asymptotic distribution results is a problem; since the worth of a technique must be judged not on how well it ought to work in huge samples, but on how well it actually performs in moderate samples, I present here the results of some limited simulations. These simulations are meant to be indicative of the promise of the method rather than to provide a definitive justification of its use. There are two sets of results; the first gives point estimate comparisons between various estimates, including the ordinary least squares estimates for the complete (uncensored) sample. Monte Carlo simulations generated the second set of results, which include (Monte Carlo) standard error estimates. Currently, I bootstrap to obtain confidence intervals.

The first set of results was generated in the following manner. The independent variables were generated independent U(−20, 20), parameter values were chosen, and an error distribution was chosen. A sample size was chosen, the inner product of the true parameters and randomly generated independent variables was calculated for each observation, and an error was generated from the chosen error distribution and added to the 'true' value of the dependent variable. Ordinary least squares (OLS) was performed on these data. Next the dependent variable was replaced by a censored version; that is, negative values were replaced by zeros. Ordinary least squares (OLSC) and a TOBIT were run. Also, the method of this paper was applied with varying numbers of knots [C(K)]. Finally, the observations with zero values for the dependent variable were tossed out, and ordinary least squares (OLST) was performed on the resulting truncated data. My experience has been that the sieve method gives extraordinarily good estimates of the underlying regression coefficients, excluding the intercept, where by good I mean in terms of agreement with the ordinary least squares estimates that used the complete and uncensored data.
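The experimental design just described is easy to replicate. A sketch follows, using one of the mixture designs of Table 1d below; the helper names are mine:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_censored_sample(n, beta, error_draw):
    """x ~ U(-20, 20) independently, y* = x'beta + eps, and the observed
    y is y* censored below at zero."""
    X = rng.uniform(-20.0, 20.0, size=(n, len(beta)))
    y_star = X @ np.asarray(beta) + error_draw(n)
    y = np.maximum(y_star, 0.0)               # negative values replaced by zeros
    return X, y_star, y

def mixture_errors(n):                        # cf. the design of Table 1d
    pick = rng.random(n) < 0.5
    return np.where(pick, rng.normal(5.0, 1.0, n), rng.normal(0.0, 1.0, n))

X, y_star, y = make_censored_sample(600, [-4.0, 1.0, 5.0], mixture_errors)
beta_ols = np.linalg.lstsq(X, y_star, rcond=None)[0]          # OLS, uncensored
beta_olsc = np.linalg.lstsq(X, y, rcond=None)[0]              # OLSC, censored
keep = y > 0
beta_olst = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]  # OLST, truncated
```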
Table 1a. Parameters and estimates: ε ∼ t(10), n = 400.ᵃ

            θ₁
TRUE        20
OLS         20.08
C(3)        20
C(6)        20
C(12)       19.9
C(24)       19.9
OLSC        16.9
OLST        19.9
TOBIT       19.9

ᵃ C(i) = sieve/spline estimator with i knots. TRUE = true value used in estimation.

Table 1b. ε ∼ t(20), n = 400.

            θ₁         θ₂         θ₃         θ₄
TRUE        -10        0.02       -0.05      0.001
OLS         -9.8       0.339      -0.053     -0.103
C(3)        -10        0          0          0
C(6)        -10        0.066      -0.046     -0.066
C(12)       -10        0.1        -0.01      -0.08
C(24)       -10        0.117      0.005      -0.08
OLSC        -1.5       0.014      0.004      -0.06
OLST        -9.63      -0.016     -0.04      -0.18
TOBIT       -9.66      0.185      0.009      -0.13

Table 1c. ε ∼ 0.5N(5,100) + 0.5N(10,25) ⊕ 0.1, n = 600.ᵇ

            θ₁         θ₂         θ₃
TRUE        -4         1          5
OLS         -4.1       0.87       5.9
C(3)        -2.91      -2.53      -2.09
C(6)        -2.95      -2.51      -2.08
OLSC        -3.6       0.69       5.3
TOBIT       -3.99      0.98       4.98

ᵇ ⊕ = mixture.

Table 1d. ε ∼ 0.5N(5,1) + 0.5N(0,1) ⊕ 0.1, n = 600.

            θ₁         θ₂         θ₃
TRUE        -4         1          5
OLS         -4         0.99       4.99
C(3)        -4.0       1.0        5.0
C(6)        -4.02      1.00       5.00
C(12)       -4.03      0.992      4.99
OLSC        -2.86      0.66       3.69
TOBIT       -4.03      0.99       5.03

Table 1e. ε ∼ 0.5N(5,1) + 0.5N(0,1) ⊕ 0.0, n = 500.

            θ₁         θ₂         θ₃
TRUE        -4         1          5
OLS         -4.07      1.3        4.94
C(10)       -4         0.81       3.25
C(20)       -3.99      0.82       3.25
OLSC        -3.04      0.93       3.48
TOBIT       -5.81      2.6        5.53

Table 1f. ε ∼ 0.8N(0,1) ⊕ 0.2N(0,10), n = 200.

            θ₁         θ₂
TRUE        2          -2
OLS         1.86       -1.87
C(3)        2.00       -2.00
C(6)        2.04       -1.91
C(12)       2.00       -1.86
C(24)       1.99       -1.84
OLSC        1.24       -0.646
OLST        1.58       -1.13
TOBIT       1.74       -1.64
The model is y_t = Σ_{i=1}^K β_i x_{ti} + ε_t, t = 1, ..., N, with x_{ti} ∼ U(−20, 20) independently. In this first set of results, intercepts, the scale term and the spline weights are unreported.

During this first set of runs, which I made basically to test the program, I found that except where the signal-to-noise ratio was low, the sieve performed
almost as well as the ordinary least squares estimator (OLS). The number of knots didn't seem to matter.

Initially, I imposed the zero-median constraint, the unity constraint, and the interquartile-range constraint using penalty functions. I found that the penalty functions were never sufficiently severe to make the constraints bind exactly and still allow convergence. And I found that the resulting 'density' estimates were strange: highly oscillating if a large number of knots were used. I also found a kind of identification problem: the scale factor was 'off' proportionally to the same extent the density was unnormalized. That is, σ̂ = 4 and ∫f̂ = 4 when σ = 1 and ∫f = 1. In short, even though f̂ and σ̂ were inconsistent, f̂/σ̂ was a density and its interquartile range was approximately the right one. I found a similar problem with the intercept. The intercept was off by approximately the same amount that f̂/σ̂ was shifted off zero. That is, if β̂_0 should have been 10, I might find β̂_0 = 5 and median(f̂/σ̂) = 5; the sum of the two was the correct value. As a result I reprogrammed and imposed the unity constraint directly. Then I deviated from the model of the paper: I did not estimate the intercept or the scale; instead I let the domain of the density that was approximated by a spline be [a, b] and estimated a and b. These results are presented in the next tables. The scale parameter is the IQR calculated from the estimated density, and the intercept is the calculated median; a sketch of this reparameterization follows. Also, for these estimates I calculated the Monte Carlo standard errors based on 100 replications each. Clearly, more work must be done, but I found these results encouraging. Particularly troublesome are the oscillation of the estimated density and the difficulty in obtaining intercept and scale estimates. The former problem is common in density estimation; I don't know why the intercept and scale are so difficult to estimate.
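A sketch of the reparameterization just described, reusing the linear_spline helper from section 4: the density is estimated on [a, b], and the intercept and scale are read off as the median and interquartile range of the estimated density.

```python
import numpy as np

def median_and_iqr(alpha_knots, a, b, grid_size=4001):
    """Median and IQR of the estimated spline density on [a, b]."""
    t = np.linspace(a, b, grid_size)
    dens = linear_spline(t, a, b, alpha_knots)
    cdf = np.cumsum(dens)
    cdf = cdf / cdf[-1]                       # numerical normalization
    quantile = lambda p: t[np.searchsorted(cdf, p)]
    return quantile(0.5), quantile(0.75) - quantile(0.25)

med, iqr = median_and_iqr(np.ones(9), -1.0, 1.0)   # uniform: med ~ 0, IQR ~ 1
```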
Table 2a. ε ∼ N(0, 1), n = 1000.ᵃ

            θ₁         θ₂         Median     IQR
TRUE        -3         4          0          0.68
C(17)       -3.00      4.00       1.56       1.07
SI          (0.028)    (0.059)    (1.08)     (0.235)

ᵃ SI = estimate of the sampling variance of the estimates based on 100 replications.

Table 2b. ε ∼ N(0, 4), E(ε) = 1.5, n = 1000.

            θ₁         θ₂         Median     IQR
TRUE        -2         3          1.5        2.8
C(17)       -2.36      3.52       1.34       2.21
SI          (0.86)     (0.74)     (1.44)     (1.77)
By 'difficult' I mean that I need much more data than is required to obtain respectable coefficient estimates. I suspect that it derives from the fact that simple differentiability of the density is sufficient to identify the coefficients, whereas the precise density is required to identify the scale and intercept (cf. Lemma 4 of the appendix).

Appendix: Proof of Theorem 2

Lemma 2. Under the assumptions of this paper, C.1 holds.

Proof. Write
    ln L_n(a) = Σ_{i=1}^n δ_i ln[(1/τ) f((y_i − x_i'β)/τ)] + Σ_{i=1}^n (ln ∫_{−∞}^{−x_i'β/τ} f(t) dt)(1 − δ_i),

where a = {f, τ, β'} and f ∈ S(m); then

    ln L_n(a) = −Σ_{i=1}^n δ_i ln τ + Σ_{i=1}^n δ_i ln[Σ_{j=1}^{2ᵏm+1} a_j L_j^{2ᵏm+1}((y_i − x_i'β)/τ)]
                + Σ_{i=1}^n (1 − δ_i) ln[Σ_{j=1}^{2ᵏm+1} a_j M_j^{2ᵏm+1}(−x_i'β/τ)].
Now, ln L_n(a) is a continuous function of (a, τ, β) which, for each m, lies in a compact set 𝒮_m; hence ln L_n(a) achieves its maximum on 𝒮_m, and Â_n^m is non-empty for each n, m.    Q.E.D.

Lemma 3. Under the assumptions of this paper, C.3 is satisfied.

Proof. Write the Kullback-Leibler information as

    K(a_0, a) = ∫∫ ln[g(y, x; a_0)/g(y, x; a)] g(y, x; a_0) dy dx;
use the change of variable ξ = (y − x'β)/τ, also define

    λ = τ/θ_0,    Δ = (β − θ_2)/θ_0,

and write

    K(a_0, a) = ∫∫ [ln ᾱ(ξ, x) − ln f(ξ)] ᾱ(ξ, x) dξ h(x) dx
                + ∫ [ln G(−x'θ_2/θ_0) − ln F(−x'β/τ)] G(−x'θ_2/θ_0) h(x) dx,

where ᾱ(ξ, x) = λα(λξ + x'Δ),

    G(t) = ∫_{−∞}^t α(s) ds    and    F(t) = ∫_{−∞}^t f(s) ds.
Now in each 𝒮_m there are (α, θ) arbitrarily close to (θ_0, θ_2); so also there are λ arbitrarily close to one and Δ arbitrarily close to zero. Pick a sequence (τ_m, β_m) → (θ_0, θ_2); then, since α(·) is continuous,

    λα(λξ + x'Δ) → α(ξ),

and similarly

    ln G(−x'θ_2/τ + x'Δ) → ln G(−x'θ_2/θ_0),

and so on. Now f is a linear spline and F is its integral, so both are continuous; hence

    ln F(−x'β/τ) → ln F(−x'θ_2/θ_0).
Also, f may be chosen so that sup|α − f| = o(h²) and also sup|F − G| = o(h²); this follows from the theory of spline approximations and their integrals [see again Prenter (1976, p. 47)]. Now write

    K(a_0, a) = ∫∫ [ln α(t) − ln f(t)] α(t) dt h(x) dx
                + ∫ [ln G(−x'θ_2/θ_0) − ln F(−x'θ_2/θ_0)] G(−x'θ_2/θ_0) h(x) dx
                + ∫∫ [ln ᾱ(t, x) − ln α(t)] ᾱ(t, x) dt h(x) dx
                + ∫ [ln G(−x'θ_2/τ + x'Δ) − ln G(−x'θ_2/θ_0)] G(−x'θ_2/τ + x'Δ) h(x) dx
                + ∫∫ [ln α(t) − ln f(t)] (ᾱ(t, x) − α(t)) dt h(x) dx
                + ∫ [ln G(−x'θ_2/θ_0) − ln F(−x'β/τ)] [G(−x'θ_2/τ + x'Δ) − G(−x'θ_2/θ_0)] h(x) dx

              = A_1 + A_2 + A_3 + A_4 + A_5 + A_6,    (A.1)

where ᾱ(t, x) = λα(λt + x'Δ).
Now by Assumption 1, |ln α(t)| ᾱ(t, x) is dominated by an integrable function. Similarly, ln f and ln F are linear in the tails, so

    |ln f(z)| λα(λz + x'Δ) ≤ K|z| λα(λz + x'Δ) h(x),

and the corresponding bound holds for |ln F(−x'β/τ)| G(−x'θ_2/τ + x'Δ), where the right-hand sides are integrable by Assumption 2. Hence choose f, τ, and β as above and, using dominated convergence, we have term by term A_i = o(h²) and

    |A_i| ≤ c(x, y),

where c(x, y) is integrable; hence we have shown the existence of a sequence such that K(a_0, a_m) → 0.    Q.E.D.

Lemma 4 (Identification). Under the assumptions of this paper, g(z; a) = g(z; b), ∀z, implies a = b.

Proof. Assume g(z; a) = g(z; b); then

    (1/τ) g((y − x'γ)/τ) = (1/σ) f((y − x'β)/σ),    y > 0,
    G(−x'γ/τ) = F(−x'β/σ),    y = 0,    (A.2)

where a = (g, τ, γ), b = (f, σ, β),

    G(t) = ∫_{−∞}^t g(s) ds,    F(t) = ∫_{−∞}^t f(s) ds.
(a) γ = β. Under our assumptions g and f are in C²(−1, 1), hence are everywhere differentiable. Differentiating (A.2) above with respect to x and y gives

    −(1/τ²) γ g′((y − x'γ)/τ) = −(1/σ²) β f′((y − x'β)/σ),

    (1/τ²) g′((y − x'γ)/τ) = (1/σ²) f′((y − x'β)/σ),

hence γ = β. Note that here differentiation with respect to a 'constant' x makes no sense, so the intercept is, so far, undetermined.
(b) σ = τ. Take x = x_0 with

    x_0'β = x_0'γ = a;

then
    (1/τ) g((y − a)/τ) = (1/σ) f((y − a)/σ).    (A.3)

Let z = (y − a)/τ; then (A.3) becomes

    (1/τ) g(z) = (1/σ) f((τ/σ) z),    ∀z > −a/τ.    (A.4)

Using Assumption 3, let a/τ ≥ ½. Since

    ∫_{−1/2}^{1/2} α(z) dz = ½,    ∀α ∈ 𝔄,

integrating both sides of (A.4) with respect to z over (−½, ½) and changing variables gives

    ∫_{−σ/2τ}^{σ/2τ} g(t) dt = ½.    (A.5)

Now g(t) is a density with respect to dt, so μ(a, b) = ∫_a^b g(t) dt is a measure that is absolutely continuous with respect to dt, and (A.4) implies μ(−½, ½) = ½.
Likewise, (A.5) implies

    μ(−σ/2τ, σ/2τ) = ½.

Since either (−½, ½) ⊆ (−σ/2τ, σ/2τ) or vice versa, we have

    μ[(−σ/2τ, σ/2τ) Δ (−½, ½)] = 0,

where Δ is the symmetric difference [see Halmos (1950)]; hence

    σ/2τ = ½,    or    σ/τ = 1.
(c) β_0 = γ_0 (where the 0 indicates the constant term). Utilizing (a) and (b) and taking a = x_0'β, where now x_0 does not contain an intercept, we may write

    (1/σ) f((y − β_0 − a)/σ) = (1/σ) g((y − γ_0 − a)/σ)

for all a in the range of x'β. Using a change of variable, we may write

    f(z) = g(z − (γ_0 − β_0)/σ),    ∀z > −(a + β_0)/σ.

Using Assumption 3, x_0 can be chosen so that −(a + β_0)/σ ≤ −½. Then integrating both sides with respect to z gives
    μ(−½ − (γ_0 − β_0)/σ, ½ − (γ_0 − β_0)/σ) = μ(−½, ½)

from the median constraint. Since measures are shift-invariant, this implies

    (γ_0 − β_0)/σ = 0,    or    γ_0 = β_0.

(d) f = g. Finally, since γ = β and τ = σ, we have

    g(t) = f(t),    ∀t ≥ −x'β/σ,

    G(t) = F(t),    ∀t ≤ −x'β/σ.

Since g, f ∈ C²(−1, 1), the latter equality implies, by differentiation,

    g(t) = f(t),    ∀t ≤ −x'β/σ,

or f(t) = g(t).    Q.E.D.

Lemma 5. Under the assumptions of this paper, C.2 is satisfied.

Proof.
We need to show that K(a, a_0) → 0 implies ‖a − a_0‖_A → 0. Referring to (A.1) of Lemma 3, we write

    K(a, a_0) = (A_1 + A_2) + (A_3 + A_4) + (A_5 + A_6).

The sum (A_1 + A_2) is the Kullback-Leibler information for known α and unknown f; we denote it

    K_f(α, f).

The sum (A_3 + A_4) is the Kullback-Leibler information for known α, γ, τ and unknown λ, Δ; we denote it

    K_λ(1, 0; λ, Δ).

Note

    ∫_{−∞}^{−x'β/σ} = ∫_{−∞}^{(τ/σ)[−x'β/τ − x'Δ]}.

Finally, the sum (A_5 + A_6) has no particular interpretation, but will be denoted D. Thus

    K(a, a_0) = K_f(α, f_m) + K_λ(1, 0; λ_m, Δ_m) + D.

The fundamental information inequality tells us that

    K(a, a_0) ≥ 0,    K_f(α, f) ≥ 0,    K_λ(1, 0; λ, Δ) ≥ 0.

We first show that

    K(a, a_0) → 0  implies  K_f → 0,  K_λ → 0.

Say that K(a, a_0) → 0 but K_f or K_λ does not converge to 0. Then K_f + K_λ + D → 0, and D = K(a, a_0) − (K_f + K_λ) would be negative and bounded away from zero along a subsequence, while the dominated convergence argument of Lemma 3 shows D → 0, a contradiction. Hence K_f → 0 and K_λ → 0; applying Lemma 1 and the identification Lemma 4, we have

    sup{|α − f_m|, |λ_m − 1|, |Δ_m|} → 0.    Q.E.D.
Finally, we need a covering for the epigraph of H(a_0, a) for a ∈ 𝒮_m. That is, we need to construct a suitable covering O_k. Recall for each k that

    f(t) = Σ_{i=1}^{2ᵏ+1} a_i L_i^{2ᵏ+1}(t),

where a_i ≤ 2ᵏ + 1 for each i. Let m = 2ᵏ + 1. Now take

    a_i = p/m²    for some    p = 0, ..., m³,

    a_0 = a_{2ᵏ+1} = 1/m²,

and associate with each m-tuple a the set

    A_m(a) = {b ∈ S(2ᵏ + 1): |a − b| ≤ 1/m²}.
0 such that r(t) < 1, or that 3t > 0 such that and is negative, then we are done. The former case
r(t) is convex in t, it is If we can show either that r(t) exists and r’(0) exists is obvious, in the latter case
G.
Duncan, A semi-parametric censored regression estimator
31
convexity and r’(0) < 0 implies inf, t 0 r(t) < r(0) = 1. Following Example 1 in Geman and Hwang (1982), we show the latter. Fix 0$, and b, = (a,, fij) ~9’(k). Then (r’(O))=E[lng(t;a)-lng(z;b,)]
+E(lng(z;b,)-lng(z;a,)).
By definition, the latter expectation is ≤ −δ < 0; hence

    r′(0) ≤ E[ln g(z; a) − ln g(z; b_k)] − δ.

If we can show

    |E[ln g(z; a) − ln g(z; b_k)]| → 0    as    m → ∞,

then we are done, since then, for m large enough, r′(0) < 0. Now

    E[ln g(z; a) − ln g(z; b_k)] = ∫∫ [ln α((y − x'β)/σ) − ln α_k((y − x'β_k)/σ_k)] g(z; a_0) dy dx
                                 + ∫ [ln G(−x'β/σ) − ln G_k(−x'β_k/σ_k)] h(x) dx.
(a) Since σ and τ are in Θ_m, we have |σ − τ| ≤ 1/√m and the spline weights differ by at most 2/m, so, for z ≥ −1, each of the terms above is of order

    O(1/√m).

Hence
    r′(0) = O(1/√m) − δ.

Finally, we need to show that, for some t > 0, r(t) exists; for if r(t_0) exists, then r(t) exists for all t < t_0. But convexity and r′(0) < 0 imply that, for some t > 0, r(t) < 1. Thus

    ρ_m ≤ p < 1

for m large enough. So
    Σ_{m=1}^∞ l_m (ρ_m)^n ≤ Σ_{m=1}^∞ l_m (c_m)^{1/2} (p)^n = Σ_{m=1}^∞ q_m.

Using the root test we find that, with p < 1, Σ q_m is finite if m = O(n^{1−ε}), ε > 0. Finally, since m = 2ᵏ + 1,

    k = ln(m − 1)/ln 2 = O((1 − ε) log n).¹
¹ The key feature in the above proof is that, if f(t) − g(t) = ah², then ln f(t) − ln g(t) ≤ ah² if ah² is small enough, since the logarithm is a concave function. Also implied is F(t) − G(t) = bh², so that ln F(t) − ln G(t) ≤ bh²; dominated convergence does the rest.
References

Amemiya, T., 1973, Regression analysis when the dependent variable is truncated normal, Econometrica 41, 997-1016.
Dieudonné, J., 1960, Foundations of modern analysis (Wiley, New York).
Duncan, G.M., 1980a, Formulation and statistical analysis of the mixed continuous/discrete choice model in classical production theory, Econometrica 48, 839-852.
Duncan, G.M., 1980b, A relatively distribution robust censored regression estimator, Working paper (Washington State University, Pullman, WA).
Duncan, G.M., 1983, On the use and misuse of Gaussian selectivity corrections: Selection bias as a proxy variable problem, Research in Labor Economics 6, 333-345.
Geman, S. and C.R. Hwang, 1982, Nonparametric maximum likelihood estimation by the method of sieves, Annals of Statistics 10, 401-414.
Grenander, U., 1981, Abstract inference (Wiley, New York).
Halmos, P., 1950, Measure theory (Van Nostrand, Princeton, NJ).
Heckman, J., 1976, The common structure of statistical models of truncation, sample selection, and limited dependent variables and a simple estimator for such models, Annals of Economic and Social Measurement 5, 475-492.
Heckman, J., 1978, Dummy endogenous variables in a simultaneous equation system, Econometrica 46, 931-961.
Hoadley, B., 1971, Asymptotic properties of maximum likelihood estimators for the independent non-identically distributed case, Annals of Mathematical Statistics 42, 1977-1991.
Huber, P., 1967, The behavior of maximum likelihood estimates under non-standard conditions, in: Fifth Berkeley symposium on mathematical statistics and probability, Vol. 1 (University of California Press, Berkeley, CA).
Lee, L.F. and R.P. Trost, 1978, Estimation of some limited dependent variable models with applications to housing demand, Journal of Econometrics 8, 357-382.
Nelson, F., 1977, Censored regression models with unobserved, stochastic censoring thresholds, Journal of Econometrics 6, 309-328.
Prenter, P.M., 1976, Splines and variational methods (Wiley, New York).
Schmidt, P., 1978, Estimation of a simultaneous equations model with jointly dependent continuous and qualitative variables: The union-earnings question revisited, International Economic Review 19, 453-465.
Tobin, J., 1958, Estimation of relationships for limited dependent variables, Econometrica 26, 24-36.
White, H., 1982, Maximum likelihood estimation of misspecified models, Econometrica 50, 1-25.