International Statistical Institute, 56th Session 2007
Modelling Transition Probabilities from Irregular Aggregate Longitudinal Data with Application to Working Life Processes
Davis, Brett
Australian Government Department of Employment and Workplace Relations
Canberra, ACT 0200, Australia
[email protected]
Heathcote, Christopher
The Australian National University, Centre for Mathematics and its Applications
Canberra, ACT 0200, Australia
[email protected]
Nurminen, Markku
The Finnish Institute of Occupational Health, Centre of Expertise for Good Practices and Competence
FI-00250 Helsinki, Finland
markku.nurminen@ttl.fi
1. Introduction There are two sets of probabilities of interest: marginal probabilities $p_i(t, y)$ that an individual is in state $i$ at year $t$ and age $y$, and one-step transition probabilities $p_{ij}((t, y), (t+1, y+1))$ of being in state $j$ at $(t+1, y+1)$, conditional on having been in state $i$ at the preceding (time, age) point $(t, y)$. Official agencies often use cross-sectional data, in which case only the marginal probabilities can be estimated. Longitudinal irregular data are required to model and estimate the conditional probabilities, expectancies and first passage times. The statistical methodology for estimating one-step transition probabilities under a Markov condition was presented by Davis et al. (2002), with an application in Nurminen et al. (2004). In the present paper, we develop and apply material from these two sources. Related work treating marginal probabilities by similar methods can be found in Davis et al. (2001), Heathcote et al. (2003), Millimet et al. (2003) and Nurminen et al. (2005). Multistate life table methods could be invoked, but our preference is for the large-sample logistic regression technique of Davis et al. (2002). Our approach is an alternative to the traditional route followed by demographers and social scientists (e.g. Richards and Abele, 1999; see Nurminen and Nurminen, 2005).
2. The Finnish Work Ability Data The data on work ability used for illustrative purposes derive from the cross-sectional surveys of 1981, 1985 and 1992 on ageing and work ability of employees in the municipal sector, carried out by the Finnish Institute of Occupational Health; the workers considered were those aged 45 years or over in 1981 (Tuomi, 1999). At any particular age, an individual was classified as belonging to one of the following four mutually exclusive and exhaustive states: having excellent or good work ability (state 1), having fair or poor work ability (state 2), being disabled or deceased (state 3), or being retired on an old-age or similar pension (state 4). States 1 and 2 are clearly transient (i.e., non-absorbing), whereas states 3 and 4 are assumed to be absorbing. Our interest focuses on the one-step transition probabilities $p_{ij}(t, y)$, $i = 1, 2$ and $j = 1, \ldots, 4$. The sample consisted of 6,257 active workers in states 1 and 2 in 1981, when the age range was from 45 to 51, giving 7 age cohorts. At the time of the third survey in 1992, with an age span from 56 to 69, 23% of the members were still active in the workforce. Status counts were taken at $7 \times 3 = 21$ points of the (age, year) Lexis plane.
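The four-state structure described above can be made concrete with a minimal Python sketch. It builds a one-step transition matrix for a single (year, age) point, with states 1 and 2 transient and states 3 and 4 absorbing, and propagates a marginal distribution one step using $p_j(t+1, y+1) = \sum_i p_i(t, y)\, p_{ij}$. All numerical values are hypothetical illustrations rather than estimates from the Finnish data, and the function names `is_valid_transition_matrix` and `step_marginals` are assumptions introduced here.

```python
# Hypothetical one-step transition probabilities at one (year, age) point.
# Rows: state occupied at (t, y); columns: state occupied at (t+1, y+1).
# The numbers are illustrative only, not estimates from the survey data.
P = [
    [0.80, 0.12, 0.03, 0.05],  # state 1: excellent or good work ability
    [0.10, 0.70, 0.10, 0.10],  # state 2: fair or poor work ability
    [0.00, 0.00, 1.00, 0.00],  # state 3: disabled or deceased (absorbing)
    [0.00, 0.00, 0.00, 1.00],  # state 4: retired (absorbing)
]


def is_valid_transition_matrix(matrix, absorbing=(2, 3), tol=1e-12):
    """Check that rows sum to one, entries lie in [0, 1], and the
    absorbing states (0-based indices) keep all mass on themselves."""
    rows_ok = all(abs(sum(row) - 1.0) < tol for row in matrix)
    range_ok = all(0.0 <= p <= 1.0 for row in matrix for p in row)
    absorbing_ok = all(matrix[s][s] == 1.0 for s in absorbing)
    return rows_ok and range_ok and absorbing_ok


def step_marginals(marginals, matrix):
    """Marginal probabilities at (t+1, y+1) from those at (t, y):
    p_j(t+1, y+1) = sum_i p_i(t, y) * p_ij."""
    n = len(matrix)
    return [sum(marginals[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Hypothetical marginal distribution over the four states at (t, y).
p_now = [0.6, 0.4, 0.0, 0.0]
p_next = step_marginals(p_now, P)
```

Only the first two rows of such a matrix need to be estimated, since the rows for the absorbing states are fixed by assumption.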
3. Stochastic Analysis and Inference The multistate logistic regression technique to be described follows the work of Davis et al. (2002). Suppose that a large sample of like individuals is observed at discrete times $y = (y_0, y_1, \ldots, y_n)$. The time variable $y$ will denote the age of the cohort members along a diagonal of the Lexis plane. For a non-absorbing state $i$, let the log-odds be defined as
\[
\theta_{ij}(y_r, y_{r+1}) = \log \left\{ \frac{p_{ij}(y_r, y_{r+1})}{p_{ii}(y_r, y_{r+1})} \right\}, \quad j \neq i.
\]
Inverting the transform gives the transition probabilities:
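\[
\begin{aligned}
p_{ij}(y_r, y_{r+1}) &= p_{ii}(y_r, y_{r+1}) \exp\{\theta_{ij}(y_r, y_{r+1})\}, \quad j \neq i, \\
p_{ii}(y_r, y_{r+1}) &= \Bigl[ 1 + \sum_{j \neq i} \exp\{\theta_{ij}(y_r, y_{r+1})\} \Bigr]^{-1}.
\end{aligned}
\tag{3.1}
\]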
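The log-odds transform and its inverse (3.1) can be checked numerically with a short Python sketch. The row of probabilities is hypothetical, the function names are assumptions introduced for illustration, and every probability must be strictly positive for the logarithm to be defined.

```python
import math


def log_odds_from_probs(probs, i):
    """Log-odds theta_ij = log(p_ij / p_ii) for every j != i, keyed by j.
    Requires all probabilities in the row to be strictly positive."""
    return {j: math.log(p / probs[i]) for j, p in enumerate(probs) if j != i}


def probs_from_log_odds(theta, i, n_states):
    """Invert the transform as in (3.1):
    p_ii = 1 / (1 + sum_{j != i} exp(theta_ij)),  p_ij = p_ii * exp(theta_ij)."""
    p_ii = 1.0 / (1.0 + sum(math.exp(t) for t in theta.values()))
    probs = [0.0] * n_states
    probs[i] = p_ii
    for j, t in theta.items():
        probs[j] = p_ii * math.exp(t)
    return probs


# Hypothetical transition probabilities out of the state with index 0
# (state 1 in the text); illustrative values only.
row = [0.80, 0.12, 0.03, 0.05]
theta = log_odds_from_probs(row, i=0)
recovered = probs_from_log_odds(theta, i=0, n_states=len(row))
```

Because the diagonal probability $p_{ii}$ serves as the reference category, any real-valued log-odds map back to a valid probability row that sums to one, which is what makes an unconstrained linear model for the log-odds workable.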
A non-parametric estimate of $p_{ij}(y_r, y_{r+1})$ is $\tilde{p}_{ij}(y_r, y_{r+1}) = \tilde{l}_{ij}(y_r, y_{r+1}) / \tilde{l}_i(y_r)$, where the random variables $\tilde{l}_{ij}$ and $\tilde{l}_i$ denote, respectively, the observable number of lives (individuals) in state $i$ at $y_r$ and state $j$ at $y_{r+1}$, and the observable number in state $i$ at $y_r$. Assuming that the Markov condition holds and that the $l(0)$ individuals evolve independently, for fixed $i$ the estimates $\tilde{\theta}_{ij}$, $j \neq i$, are, for large $l(0)$, normally distributed with means $\theta_{ij}$ and covariance matrix $V$. Suppose the one-step log-odds are modelled by a linear function of the parameters specified by
\[
\theta_{ij}(y, y+1) = \theta_{ij}(\beta_j(i), y, y+1) = z_j(i)' \beta_j(i),
\tag{3.4}
\]
where $z_j(i)$ is a vector of explanatory (e.g. occupational) variables and $\beta_j(i)$ is a parameter vector to be estimated. The weighted least squares loss function, with weights specified by the inverse of $V$, is
\[
L(\beta) = \sum_{i=1}^{a} \sum_{r} \bigl( \tilde{\theta}(-i, y_r, y_{r+1}) - \theta(\beta, y_r, y_{r+1}) \bigr)' \, V^{-1}(-i, y_r, y_{r+1}) \, \bigl( \tilde{\theta}(-i, y_r, y_{r+1}) - \theta(\beta, y_r, y_{r+1}) \bigr),
\]
where $a$ is the number of non-absorbing states, $\tilde{\theta}(-i, y_r, y_{r+1})$ is the vector of observable log-odds, and $\theta(\beta, y_r, y_{r+1})$ is an appropriate function of the one-step parameter vector $\beta$. Let $\hat{\beta}_j(i)$ denote the estimate of the vector $\beta_j(i)$. Then $\hat{\theta}_{ij}(y, y+1) = z_j(i)' \hat{\beta}_j(i)$ is a consistent estimate of $\theta_{ij}(\beta_j(i); y, y+1)$, with corresponding estimates $\hat{p}_{ij}(y, y+1)$ of the one-step transition probabilities. The result follows: For large $l(0)$ and $y$