Uncertainty in mortality forecasting: an extension to the classical Lee-Carter approach

Siu-Hang Li, Mary R. Hardy and Ken Seng Tan
The University of Waterloo, Ontario, Canada
May, 2006

Abstract

Traditionally, actuaries have modeled mortality improvement using deterministic reduction factors, with little consideration of the associated uncertainty. As mortality improvement has become an increasingly significant source of financial risk, it has become important to measure the uncertainty in the forecasts. Probabilistic confidence intervals provided by the widely accepted Lee-Carter model are known to be excessively narrow, due primarily to the rigid structure of the model. In this paper, we relax the model structure by incorporating heterogeneity in each age-period cell. The proposed extension not only provides a better goodness-of-fit based on standard model selection criteria, but also ensures more conservative interval forecasts of central death rates and hence can better reflect the uncertainty in the forecasts. We illustrate the results using Canadian population mortality data.
Keywords: Confidence interval; Heterogeneity; Negative-binomial distribution; Maximum likelihood estimation.
1 Introduction
In recent years, mortality has improved considerably faster than had been predicted, resulting in unforeseen mortality losses for annuity liabilities (see, for example, Continuous Mortality Investigation Bureau (CMIB), 2004). As a result, there has been a growing consensus about the need for reliable, scientific models for mortality that can be used to measure not only the expected improvement in mortality over the next few decades, but also the associated uncertainty in the projections. Several solutions have been proposed,
including the Lee-Carter model (Lee and Carter, 1992), the P-splines regression (Currie et al., 2004) and the parameterized time-series approach (McNown and Rogers, 1989).

The Lee-Carter model may be considered the current gold standard. It has been used as the basis of stochastic forecasts of the finances of the U.S. social security system and other aspects of the U.S. federal budget (see Congressional Budget Office of the United States, 1998). It has also been successfully applied to populations in Australia (Booth et al., 2002), Scandinavia (Li and Chan, 2005) and the seven most economically developed countries (G7) (Tuljapurkar et al., 2000). The model has several appealing features. First, the number of parameters is small relative to other stochastic mortality models. Second, the model parameters can be interpreted rather easily. Third, the parsimonious model structure constrains the behavior of future death rates, resulting in a stable age pattern of mortality in the projections. This effectively prevents mortality crossovers and other counter-intuitive behaviors that may be encountered in some other stochastic approaches.

Nevertheless, the stringent model structure has been seen to generate overly narrow confidence intervals (see, for example, Lee, 2000). This narrowness may result in underestimation of the risk of more extreme outcomes, which may defeat the original purpose of moving to a stochastic framework. This phenomenon is an example of model risk (see, e.g., Cairns, 2000). Model risk can be reduced by relaxing the model structure to obtain a more general class of models. So far, relaxation of the Lee-Carter structure has been confined to the inclusion of additional bilinear terms, from two (Renshaw and Haberman, 2003) to five (Booth et al., 2002). Although these augmented versions can explain a higher proportion of temporal variance than the original model, the time-varying component in each of the additional bilinear terms is typically highly non-linear, which makes forecasting harder. This may explain why the impact of the additional bilinear terms on the width of the interval forecast has not been studied extensively.

Our objective in this paper is to explore the feasibility of extending the model from a different angle, by considering individual heterogeneity at the cell level. In more detail, the implementation of any mortality model requires the division of the Lexis plane into cells. For example, the forecaster may divide the Lexis plane into 5,000 cells for ages 0 to 99 and periods 1951 to 2000. The original Lee-Carter and other conventional methods allow death rates to vary between cells, but not within each cell. That is, individuals within a single cell are assumed to be homogeneous. However, researchers have noted that individuals may differ substantially in their endowment for longevity, and that individual differences are important in population-based mortality studies (see, e.g., Hougaard, 1984; Vaupel et al., 1979). The primary contribution of this paper is the incorporation of heterogeneity into the Lee-Carter model by introducing an unobserved variate for individual differences in each age-period cell. To justify our contribution, the proposed extension must satisfy the following criteria.
1. Provision of wider confidence intervals. Taking account of individual variations, the interval forecasts should encompass a broader range of probable outcomes.

2. Improvement of goodness-of-fit. The improvement of fit should be significant enough that the introduction of additional parameters is worthwhile. We base our conclusions on standard model selection criteria.

3. Retention of the appealing features of the original version. The introduction of additional parameters should be done carefully so that criteria 1 and 2 can be satisfied without making any undesirable distortion in the projected age patterns. In particular, we aim to retain linearity in the time-varying component of the model.

The rest of the paper is organized as follows. In Section 2, we provide a brief review of current methodologies for fitting the Lee-Carter model, with emphasis on the properties of the confidence intervals obtained under each method. In Section 3, we present our proposed extension; technical details on model parameter estimation and forecasting are also given. In Section 4, we compare the performance of our proposed generalization with the older models, using mortality data of the Canadian population. Section 5 concludes the paper.
2 The Lee-Carter model
Since its introduction in 1992, the Lee-Carter model has been widely used in diverse demographic applications. We refer interested readers to Tabeau (2001) for a comprehensive review of the model. The model describes the central rate of death at age $x$ and time $t$ ($m_{x,t}$) by three series of parameters, $\{a_x\}$, $\{b_x\}$ and $\{k_t\}$, in the following way:

$$\ln(m_{x,t}) = a_x + b_x k_t + \epsilon_{x,t}, \qquad (1)$$
where the age-specific parameters $a_x$ give the average level of mortality at each age; the time-varying component $k_t$, sometimes referred to as the mortality index, signifies the general speed of mortality improvement; the age-specific component $b_x$ characterizes the sensitivity to $k_t$ at different ages; and the error term $\epsilon_{x,t}$ captures all the remaining variation. The variance of $\epsilon_{x,t}$ in Lee and Carter (1992) is assumed to be constant for all $x$ and $t$. This assumption is relaxed in some of the model's variants, which will be discussed in later sections. Mortality forecasting in the Lee-Carter framework is performed in two stages. In the first stage, we estimate the parameters $a_x$, $b_x$ and $k_t$ using historical mortality data. In the second stage,
fitted values of $k_t$ are modeled by an autoregressive integrated moving average (ARIMA) process, determined by the Box and Jenkins (1976) approach. Finally, we extrapolate $k_t$ through the fitted ARIMA model to obtain a forecast of future death rates (a minimal sketch of this extrapolation step is given below). Note that the parameters on the right-hand side of equation (1) are unobservable. Hence, we are not able to fit the model by simple methods like ordinary least squares. To solve the problem, researchers have proposed a few alternative approaches, which may be roughly categorized as non-likelihood-based and likelihood-based. An in-depth understanding of methods in both categories is important, as the differences in their underlying assumptions often lead to dissimilar forecasting results. In the following we give detailed descriptions of model parameter estimation and interval forecasting under each approach.
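As a concrete illustration of the second stage, the sketch below extrapolates a fitted mortality index under a random walk with drift, the ARIMA(0,1,0)-with-constant special case that Lee and Carter (1992) found adequate for U.S. data. The function name and the restriction to this ARIMA order are our own simplifications; a full Box-Jenkins identification would also entertain other orders.

```python
import numpy as np

def forecast_kt(kt, horizon):
    """Extrapolate the mortality index k_t with a random walk with drift
    (ARIMA(0,1,0) plus a constant).  Returns the point forecasts k_{T+s}
    and the standard error of the s-step ahead stochastic error."""
    diffs = np.diff(kt)
    drift = diffs.mean()               # estimated drift of the random walk
    sigma = diffs.std(ddof=1)          # innovation standard deviation
    s = np.arange(1, horizon + 1)
    point = kt[-1] + drift * s         # point forecast of k_{T+s}
    se = sigma * np.sqrt(s)            # stochastic error grows like sqrt(s)
    return point, se
```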
2.1 Non-likelihood-based methods
In non-likelihood-based methods, we are not required to specify any probability distribution during model parameter estimation. Examples in this category include the method of Singular Value Decomposition (SVD) proposed in Lee and Carter (1992) and the method of weighted least squares (WLS) suggested later in Wilmoth (1993).

Using SVD, we set $a_x = \frac{1}{T}\sum_{t=1}^{T} \ln(m_{x,t})$, where $T$ is the number of periods in the time-series mortality data, and then we apply SVD to the matrix of $\{\ln(m_{x,t}) - a_x\}$. The first left and right singular vectors give initial estimates of $b_x$ and $k_t$, respectively. To satisfy the constraints for parameter uniqueness, the estimates of $b_x$ and $k_t$ are normalized so that they sum to one and zero, respectively. As the method of SVD is purely a mathematical approximation, the fitted and actual numbers of deaths may not be the same. To reconcile the fitted and the observed numbers of deaths, we are required to make an ad hoc adjustment to $k_t$. Having performed the re-estimation, we can model $k_t$ by an appropriate ARIMA process and then obtain the mortality forecast by extrapolation. Let $T$ be the length of the data series and $\hat k_{T+s}$ be the $s$-period ahead forecast of $k_t$. Then, the $s$-period ahead forecast of $\ln(m_{x,t})$ is given by

$$\ln(\hat m_{x,T+s}) = \hat a_x + \hat b_x \hat k_{T+s}. \qquad (2)$$
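The first-stage SVD fit can be sketched in a few lines. The routine below is a minimal illustration, assuming $\ln(m_{x,t})$ is held in a NumPy array with ages in rows and years in columns; it omits the ad hoc re-estimation of $k_t$ described above.

```python
import numpy as np

def fit_lee_carter_svd(log_m):
    """Rank-1 SVD fit of ln(m_{x,t}).  Rows index ages, columns index years.
    Returns (a_x, b_x, k_t) under the constraints sum(b)=1, sum(k)=0."""
    a = log_m.mean(axis=1)                        # a_x: row averages
    U, s, Vt = np.linalg.svd(log_m - a[:, None], full_matrices=False)
    b, k = U[:, 0], s[0] * Vt[0]                  # first singular vectors
    if b.sum() < 0:                               # fix the SVD sign ambiguity
        b, k = -b, -k
    a = a + b * k.mean()                          # absorb the mean of k into a_x
    k = (k - k.mean()) * b.sum()                  # enforce sum(k) = 0 ...
    b = b / b.sum()                               # ... and sum(b) = 1
    return a, b, k
```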
Assuming that the model specification is correct, the true value of $\ln(m_{x,T+s})$ can be expressed as

$$\ln(m_{x,T+s}) = (\hat a_x + \alpha_x) + (\hat k_{T+s} + u_{T+s})(\hat b_x + \beta_x) + \epsilon_{x,T+s}, \qquad (3)$$
where $\alpha_x$ and $\beta_x$ are the errors in estimating parameters $a_x$ and $b_x$ respectively, and $u_{T+s}$ is the stochastic error in the $s$-period ahead forecast of $k_t$. By combining equations (2) and (3), we have the following expression for the error in the forecast of $\ln(m_{x,t})$:

$$E_{x,T+s} = \alpha_x + \epsilon_{x,T+s} + (\hat b_x + \beta_x) u_{T+s} + \beta_x \hat k_{T+s}. \qquad (4)$$
Unfortunately, there are no analytic expressions for the covariances between the error terms and the parameter estimates. Hence, in the computation of the variance of the forecast error, we have to assume independence between all the terms on the right-hand side of equation (4). Under this assumption, the variance of the forecast error can be expressed as
$$\sigma_{E,x,T+s}^2 = \sigma_{\alpha,x}^2 + \sigma_{\epsilon,x,T+s}^2 + \big(\hat b_x^2 + \sigma_{\beta,x}^2\big)\,\sigma_{k,T+s}^2 + \sigma_{\beta,x}^2\,\hat k_{T+s}^2, \qquad (5)$$
where $\sigma_{\alpha,x}^2$ and $\sigma_{\beta,x}^2$ are the variances of the errors in estimating $a_x$ and $b_x$, respectively; $\sigma_{\epsilon,x,T+s}^2$ is the variance of $\epsilon_{x,T+s}$; and $\sigma_{k,T+s}^2$ is the variance of the stochastic error in the $s$-step ahead forecast of $k_t$. The quantity $\sigma_{\alpha,x}^2$ can be estimated by dividing the variance of $\ln(m_{x,t})$ over time by $T$, while $\sigma_{\epsilon,x,T+s}^2$ can be estimated by the variance of the error in fitting age group $x$. The quantity $\sigma_{\beta,x}^2$ can be estimated by a small-scale bootstrap (see Lee and Carter, 1992). The form of $\sigma_{k,T+s}^2$ depends on the order of the ARIMA process.
To generate the interval forecast, we make the additional assumption that normality holds. This gives the following expression for the approximate 95% point-wise interval forecast of $\ln(m_{x,t})$:

$$\ln(m_{x,T+s}) = \ln(\hat m_{x,T+s}) \pm 1.96\,\sigma_{E,x,T+s}. \qquad (6)$$
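Assembling equations (5) and (6) in code is mechanical; the sketch below takes the variance components described above as inputs (all argument names are ours) and returns the approximate 95% limits.

```python
import numpy as np

def interval_forecast(log_m_hat, var_a, var_eps, b_hat, var_b, var_k, k_hat):
    """Approximate 95% interval for ln(m_{x,T+s}) per equations (5)-(6),
    assuming independent error components and normality.  Inputs are the
    point forecast and the variance components sigma^2_{alpha,x},
    sigma^2_{eps,x,T+s}, sigma^2_{beta,x} and sigma^2_{k,T+s}."""
    var_E = var_a + var_eps + (b_hat**2 + var_b) * var_k + var_b * k_hat**2
    half = 1.96 * np.sqrt(var_E)              # normal 97.5th percentile
    return log_m_hat - half, log_m_hat + half
```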
In most cases, the original Lee-Carter (rank-1 SVD approximation) gives a good fit in terms of the explained proportion of temporal variance, measured by $s_1^2 \big/ \sum_{i=1}^{n} s_i^2$, where $s_i$ denotes the $i$th singular value. To further improve the fit, researchers later considered using a rank-$p$ SVD approximation. For example, Renshaw and Haberman (2003) considered $p = 2$ and Booth et al. (2002) considered $p = 5$. Under the rank-$p$ SVD approximation, equation (1) is generalized to
$$\ln(m_{x,t}) = a_x + \sum_{i=1}^{p} b_x^{(i)} k_t^{(i)} + \epsilon_{x,t}. \qquad (7)$$
Undoubtedly, the rank-$p$ ($p > 1$) SVD approximation can explain a higher proportion of temporal variance, $\sum_{i=1}^{p} s_i^2 \big/ \sum_{i=1}^{n} s_i^2$. However, it makes forecasting more complicated because (1) the time-varying components in the higher order terms are often non-linear, which may not be well handled by an ARIMA process, and (2) analytic solutions for the covariances between the errors in estimating the additional parameters are not available, which means stronger assumptions are required in computing the interval forecast.

In the weighted least squares (WLS) method, the model parameters $a_x$, $b_x$ and $k_t$ are derived by minimizing

$$\sum_{x,t} w_{x,t} \big(\ln(m_{x,t}) - a_x - b_x k_t\big)^2, \qquad (8)$$
where the weight $w_{x,t}$ can be taken as the number of deaths at age $x$ and in time period $t$ (the approximate reciprocal of the variance of $\ln(m_{x,t})$). Interval forecasting in WLS is similar to that in SVD, and therefore we do not restate the mathematical details. Note that interval forecasting under WLS also requires the normality assumption.
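For completeness, the WLS objective of equation (8) can be written as a function suitable for a general-purpose optimizer. This is a sketch only: the parameter packing and the use of death counts as weights are our own choices, and the normalization constraints on $b_x$ and $k_t$ would still have to be imposed after convergence.

```python
import numpy as np
from scipy.optimize import minimize

def wls_objective(params, log_m, w):
    """Weighted least-squares objective of equation (8).  `params` packs
    (a_x, b_x, k_t); `w` holds the weights w_{x,t}, e.g. the death counts."""
    X, T = log_m.shape
    a, b, k = params[:X], params[X:2*X], params[2*X:]
    resid = log_m - a[:, None] - np.outer(b, k)
    return np.sum(w * resid**2)

# usage sketch: res = minimize(wls_objective, x0, args=(log_m, D), method="L-BFGS-B")
```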
2.2 Likelihood-based methods
In likelihood-based methods, we specify a probability distribution for the death counts. Examples in this category include the method of maximum likelihood estimation (MLE) considered by Wilmoth (1993) and implemented later by Brouhns et al. (2002), and the method of generalized linear models (GLM) employed by Renshaw and Haberman (2006). All these methods assume that the observed number of deaths is a realization of a Poisson distribution with mean equal to the expected number of deaths under the Lee-Carter model, that is,
$$D_{x,t} \sim \text{Poisson}\big(E_{x,t} \exp(a_x + b_x k_t)\big), \qquad (9)$$
where Dx,t and Ex,t are the number of deaths and exposures-to-risk at age x and time t respectively. In the method of MLE, we obtain estimates of the model parameters by maximizing the log-likelihood function, which is given by
$$l(\mathbf{a}, \mathbf{b}, \mathbf{k}) = \sum_{x,t} \big( D_{x,t}(a_x + b_x k_t) - E_{x,t} \exp(a_x + b_x k_t) \big) + c, \qquad (10)$$
where $\mathbf{a}$, $\mathbf{b}$ and $\mathbf{k}$ are vectors of the parameters $a_x$, $b_x$ and $k_t$; and $c$ is a constant that is free of the model parameters. The maximization can be accomplished via a standard Newton's method. By differentiating both sides of equation (10), we can immediately see that the observed and fitted numbers of deaths over time are equal when the algorithm converges. This implies that the ad hoc re-estimation of $k_t$ required for the SVD method is not required here.

Having estimated the model parameters, we can project future values of $k_t$ by a properly identified ARIMA process. The $s$-period ahead forecast of $\ln(m_{x,t})$ is again given by equation (2). The interval forecast of $\ln(m_{x,t})$ cannot be computed analytically, but it may be obtained by using the parametric bootstrap (Brouhns et al., 2005), which we now summarize (a code sketch is given at the end of this subsection).

1. Simulate $N$ realizations from the Poisson distribution with mean equal to the fitted number of deaths under the Lee-Carter model. In the illustration presented in later sections, we use the transformed rejection method (Hörmann, 1993) to generate the Poisson random numbers.

2. For each of these $N$ realizations:

(a) Re-estimate the model parameters $a_x$, $b_x$ and $k_t$ using MLE.

(b) Specify a new ARIMA process for the re-estimated $k_t$.

(c) Compute future values of $\ln(m_{x,t})$ using the re-estimated $a_x$ and $b_x$, and the simulated future values of $k_t$ under the newly specified ARIMA process.

3. Step 2 gives an empirical distribution of $\ln(m_{x,T+s})$ for all $x$ and $s$. The 2.5th and the 97.5th percentiles of the empirical distribution give the lower and upper limits of the 95% interval forecast of $\ln(m_{x,T+s})$.

It is noteworthy that the above algorithm allows both the sampling fluctuation in the model parameters and the stochastic error in the forecast of $k_t$ to be included in the interval forecast. Furthermore, the algorithm does not require the assumption of normality, and this makes an asymmetric confidence interval possible. Koissi et al. (2005) proposed a very similar method, known as residual bootstrapping, to compute the interval forecast of $\ln(m_{x,t})$. The details of this method are omitted in this review.

In the method of GLM, we use the log-link in modeling the "responses" $D_{x,t}$. The linear model can be written as
$$\ln(D_{x,t}) = \ln(E_{x,t}) + a_x + b_x k_t, \qquad (11)$$
where $\ln(E_{x,t})$ is the offset. As usual, parameters $a_x$, $b_x$ and $k_t$ are unobservable and have to be estimated by the following algorithm.

1. Set arbitrary starting values of $b_x$.

2. Treat $b_x$ as a known covariate; estimate $a_x$ and $k_t$.

3. Treat $a_x$ and $k_t$ as known covariates; estimate $b_x$.

4. Repeat steps 2 and 3 until the deviance converges.

Steps 2 and 3 can be performed easily using S-plus, for example. Methods of GLM and MLE give the same parameter estimates if the same constraints for parameter uniqueness are chosen. Mean forecasting of $\ln(m_{x,t})$ is identical to that in MLE, and interval forecasting can also be achieved by bootstrapping.
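The parametric bootstrap summarized above can be sketched as follows. The two callables `fit_mle` and `simulate_arima_path` stand in for the estimation routines described in the text; they are assumptions of this sketch, not part of the original algorithm's specification.

```python
import numpy as np

rng = np.random.default_rng(2006)

def bootstrap_intervals(D_fit, E, fit_mle, simulate_arima_path,
                        horizon, N=1000):
    """Parametric bootstrap of Brouhns et al. (2005).  `D_fit` holds the
    fitted deaths (ages x years); `fit_mle(D, E)` returns (a, b, k); and
    `simulate_arima_path(k, horizon)` returns one simulated future path
    of k_t.  Returns the 2.5th/97.5th percentiles of ln(m_{x,T+s})."""
    sims = []
    for _ in range(N):
        D_star = rng.poisson(D_fit)                   # step 1: resample deaths
        a, b, k = fit_mle(D_star, E)                  # step 2(a): refit
        k_path = simulate_arima_path(k, horizon)      # steps 2(b)-(c)
        sims.append(a[:, None] + np.outer(b, k_path)) # ln(m) forecasts
    return np.percentile(np.array(sims), [2.5, 97.5], axis=0)  # step 3
```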
2.3 Problems associated with the existing approaches
The non-likelihood-based methods are purely mathematical. In particular, in SVD, values of $k_t$ obtained from the first right singular vector are often illegitimate and thus require further adjustment through an ad hoc procedure. Interval forecasting is quite straightforward, but it requires very strong assumptions. First, even though an analytic formula for the variance of the forecast error of $\ln(m_{x,t})$ is available, the covariances between the error terms and the model parameter estimates are ignored to avoid mathematical complexities. This inevitably deflates the variance of the forecast error and makes the interval forecast narrower than it should be. Second, as these methods specify no probability distribution during model parameter estimation, normality has to be assumed in the computation of the interval forecast. Inter alia, the assumption of normality rules out the possibility of an asymmetric interval forecast.

The existing likelihood-based methods assume that the observed number of deaths is a realization of a Poisson distribution. In this setting, interval forecasts and standard errors of parameters can be estimated by bootstrapping, although it is computationally intensive. The bootstrap explicitly reflects the statistical dependency between the model parameters in the interval forecast, and it also allows a simultaneous consideration of parameter uncertainty and the stochastic uncertainty arising from the process of $k_t$. Moreover, we can obtain asymmetric interval forecasts, as they are based on the percentiles of the simulated empirical distribution rather than on normality. The interval widths are, however, close to those obtained by non-likelihood-based methods, since there is no change in the number of parameters. Finally, the likelihood-based methods may give a marginally better fit, as the ad hoc re-estimation of $k_t$ is not required.

Nevertheless, there are several drawbacks associated with the existing likelihood-based methods. In assuming Poisson models, we are imposing the mean-variance equality restriction on the variable $D_{x,t}$. In practice, however, the variance can be greater than the mean. McCullagh and Nelder (1989) pointed out that overdispersion is commonplace and therefore should be assumed to be present to some extent unless it is shown to be absent. If overdispersion exists, the analysis of data using a single-parameter distribution such as the Poisson will lead us to overestimate the degree of precision. In other words, in mortality forecasting, the interval forecasts of central death rates under the assumption of Poisson death counts will be overly narrow. Cox (1983) also pointed out that there is a possible loss of efficiency if a single-parameter distribution is used when overdispersion exists.
3 The proposed extension
In this section we attempt to loosen the original model structure carefully, so that restrictions on the movement of future death rates can be reduced without sacrificing the excellent properties of the original version. Motivated by the mean-variance equality restriction in the original likelihood-based estimation methods, we begin with the probability distribution assumed for the number of deaths. Recall that in the original likelihood-based methods, we assume that $D_{x,t}$ follows a Poisson distribution and that individuals within each age-period cell are homogeneous. However, other than age and time, there are various factors affecting human mortality, for example, ethnicity, education, occupation, marital status and obesity (Brown and McDaid, 2003), and these factors may divide the exposures-to-risk in each age-period cell into clusters that differ in their likelihood of death. The presence of clustering will not only violate the assumption of homogeneity, but also induce extra variation that is not reflected in the interval forecasts computed by the previous methods.

To account for the possibility of clustering, we segregate each age-period cell into $N_x$ clusters of equal size, where $N_x$ is assumed to be non-random. This implies that the $i$th cluster will have $E_{x,t}/N_x$ exposures-to-risk and $D_{x,t}(i)$ deaths, where $i = 1, \ldots, N_x$. The total number of deaths, $D_{x,t}$, can therefore be expressed as

$$D_{x,t} = D_{x,t}(1) + D_{x,t}(2) + \cdots + D_{x,t}(N_x). \qquad (12)$$
We assume that the $D_{x,t}(i)$s are independently distributed, and

$$D_{x,t}(i)\,\big|\,z_x(i) \sim \text{Poisson}\!\left( z_x(i)\,\frac{E_{x,t}}{N_x}\exp(a_x + b_x k_t) \right), \qquad (13)$$
where $z_x(i)$ is a random variable accounting for the heterogeneity of individuals. Note that it is age-specific and that it varies from cluster to cluster. When $z_x(i) > 1$, individuals in cluster $i$ are more frail than average, and similarly, when $0 < z_x(i) < 1$, individuals in cluster $i$ are less frail. Although any distribution with positive support can be a candidate for $z_x(i)$, here we assume that $z_x(i)$ follows a Gamma distribution for tractability, that is,

$$z_x(i) \sim \text{Gamma}(\iota_x^{-1}, \iota_x). \qquad (14)$$
In fact, the Gamma distribution is often utilized in modeling heterogeneity; see, for example, Hougaard (1984), Vaupel et al. (1979), and Wang and Brown (1998). We follow the usual parameterization of the Gamma distribution with $E[z_x(i)] = 1$ and $\text{Var}[z_x(i)] = \iota_x$, where $\iota_x > 0$. It can be easily shown that
$$E[D_{x,t}(i)] = \frac{E_{x,t}}{N_x}\exp(a_x + b_x k_t), \qquad (15)$$

which will be denoted as $\mu_{x,t}$. Also,

$$E[D_{x,t}] = E_{x,t}\exp(a_x + b_x k_t), \qquad (16)$$
which means our extension still complies with the Lee-Carter specification. Furthermore, we can show that the unconditional distribution of Dx,t (i) can be written as
$$\Pr[D_{x,t}(i) = y] = \frac{\Gamma(y + \iota_x^{-1})}{y!\,\Gamma(\iota_x^{-1})}\left(\frac{\mu_{x,t}}{\mu_{x,t} + \iota_x^{-1}}\right)^{\!y}\left(\frac{\iota_x^{-1}}{\mu_{x,t} + \iota_x^{-1}}\right)^{\!\iota_x^{-1}}. \qquad (17)$$

By letting $\alpha_x = \iota_x / N_x$, equation (17) immediately leads to

$$\Pr[D_{x,t} = y] = \frac{\Gamma(y + \alpha_x^{-1})}{y!\,\Gamma(\alpha_x^{-1})}\left(\frac{N_x \mu_{x,t}}{N_x \mu_{x,t} + \alpha_x^{-1}}\right)^{\!y}\left(\frac{\alpha_x^{-1}}{N_x \mu_{x,t} + \alpha_x^{-1}}\right)^{\!\alpha_x^{-1}}. \qquad (18)$$
This shows that $D_{x,t}$ follows a Negative Binomial distribution unconditionally. It is interesting to note that the Negative Binomial distribution has another statistical interpretation. In more detail, we may assume that

$$D_{x,t}\,\big|\,w_x \sim \text{Poisson}\big(w_x E_{x,t}\exp(a_x + b_x k_t)\big), \qquad (19)$$
where $w_x$ is a random variable that captures part of the variation not explained by the Poisson distribution, such as the random error in recording the age at death. If we assume further that

$$w_x \sim \text{Gamma}(\alpha_x^{-1}, \alpha_x), \qquad (20)$$
then the unconditional distribution of $D_{x,t}$ is the same as that specified by equation (18). Summing up, the introduction of Gamma-distributed heterogeneity in age-period cells is equivalent to the assumption that $D_{x,t}$ follows a Negative Binomial distribution instead of a Poisson one. More importantly, we have

$$\text{Var}[D_{x,t}] = E[D_{x,t}] + \alpha_x \{E[D_{x,t}]\}^2, \qquad (21)$$
which means that the extension explicitly allows for overdispersion via the dispersion parameters $\alpha_x$. In other words, the assumption of mean-variance equality is not required, and consequently measures of uncertainty under the proposed extension can capture a large part of the variation that is ignored in the original model. Note that the limiting case $\alpha_x \to 0$ yields a Poisson distribution. Furthermore, the proposed extension adds a parameter vector $\{\alpha_x\}$ to the model without altering the structure specified by equation (1). This introduces more flexibility while still preserving the desirable features, such as the linearity in $k_t$ and the stability of the age-pattern of mortality over time. The mixture representation and the overdispersion relationship in equation (21) are illustrated numerically below.
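As a quick numerical check (with illustrative values of $\alpha_x$ and $E[D_{x,t}]$ chosen by us, not fitted ones), the sketch below draws Gamma-mixed Poisson death counts per equations (19)-(20) and compares their moments with the Negative Binomial of equation (18), which corresponds to SciPy's parameterization with $n = \alpha^{-1}$ and $p = 1/(1+\alpha\mu)$.

```python
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(1)

alpha, mu = 0.05, 500.0            # illustrative dispersion and E[D_{x,t}]

# Gamma mixing variable of equation (20): E[w] = 1, Var[w] = alpha.
w = rng.gamma(shape=1/alpha, scale=alpha, size=1_000_000)
D = rng.poisson(w * mu)            # mixed Poisson draws, equation (19)

# Negative Binomial of equation (18): size n = 1/alpha, p = 1/(1+alpha*mu).
n, p = 1/alpha, 1/(1 + alpha*mu)
print(D.mean(), nbinom.mean(n, p))   # both close to mu = 500
print(D.var(), nbinom.var(n, p))     # both close to mu + alpha*mu^2 = 13,000
```

We may use maximum likelihood to estimate the model parameters. The log-likelihood function is given by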
$$l(\mathbf{a}, \mathbf{b}, \mathbf{k}, \boldsymbol{\alpha}) = \sum_{x,t}\left\{ \sum_{i=0}^{D_{x,t}-1}\ln\!\left(\frac{1+\alpha_x i}{\alpha_x}\right) + D_{x,t}\ln(\alpha_x \tilde D_{x,t}) - \big(D_{x,t} + \alpha_x^{-1}\big)\ln\big(1 + \alpha_x \tilde D_{x,t}\big) \right\} + c, \qquad (22)$$
where $\tilde D_{x,t} = E_{x,t}\exp(a_x + b_x k_t)$, and $c$ is a constant that is free of $a_x$, $b_x$, $k_t$ and $\alpha_x$. We may use a standard Newton's procedure, updating one set of parameters at a time while the others are held fixed, as shown below:

$$a_x^{(v+1)} = a_x^{(v)} - \frac{\sum_t \big(D_{x,t} - \alpha_x^{(v)} g_{x,t}^{(v)}\big)}{\sum_t \big(-\alpha_x^{(v)} g_{x,t}^{(v)}\big)\big/\big(1 + \alpha_x^{(v)} \hat D_{x,t}^{(v)}\big)}, \quad b_x^{(v+1)} = b_x^{(v)}, \quad k_t^{(v+1)} = k_t^{(v)}, \quad \alpha_x^{(v+1)} = \alpha_x^{(v)}; \qquad (23)$$

$$b_x^{(v+2)} = b_x^{(v+1)} - \frac{\sum_t \big(D_{x,t} k_t^{(v+1)} - \alpha_x^{(v+1)} k_t^{(v+1)} g_{x,t}^{(v+1)}\big)}{\sum_t \big(-\alpha_x^{(v+1)} k_t^{(v+1)2} g_{x,t}^{(v+1)}\big)\big/\big(1 + \alpha_x^{(v+1)} \hat D_{x,t}^{(v+1)}\big)}, \quad a_x^{(v+2)} = a_x^{(v+1)}, \quad k_t^{(v+2)} = k_t^{(v+1)}, \quad \alpha_x^{(v+2)} = \alpha_x^{(v+1)}; \qquad (24)$$

$$k_t^{(v+3)} = k_t^{(v+2)} - \frac{\sum_x \big(D_{x,t} b_x^{(v+2)} - \alpha_x^{(v+2)} b_x^{(v+2)} g_{x,t}^{(v+2)}\big)}{\sum_x \big(-\alpha_x^{(v+2)} b_x^{(v+2)2} g_{x,t}^{(v+2)}\big)\big/\big(1 + \alpha_x^{(v+2)} \hat D_{x,t}^{(v+2)}\big)}, \quad a_x^{(v+3)} = a_x^{(v+2)}, \quad b_x^{(v+3)} = b_x^{(v+2)}, \quad \alpha_x^{(v+3)} = \alpha_x^{(v+2)}; \qquad (25)$$

$$\alpha_x^{(v+4)} = \alpha_x^{(v+3)} - \frac{\sum_t \Big(f_{x,t}^{(v+3)} + \dfrac{D_{x,t}}{\alpha_x^{(v+3)}} + r_{x,t}^{(v+3)} - g_{x,t}^{(v+3)}\Big)}{\sum_t \Big(h_{x,t}^{(v+3)} - \dfrac{D_{x,t} + 2\alpha_x^{(v+3)} r_{x,t}^{(v+3)}}{\alpha_x^{(v+3)2}} + \Big(\dfrac{2}{\alpha_x^{(v+3)2}} + g_{x,t}^{(v+3)}\Big)\dfrac{\hat D_{x,t}^{(v+3)}}{1 + \alpha_x^{(v+3)} \hat D_{x,t}^{(v+3)}}\Big)}, \quad a_x^{(v+4)} = a_x^{(v+3)}, \quad b_x^{(v+4)} = b_x^{(v+3)}, \quad k_t^{(v+4)} = k_t^{(v+3)}; \qquad (26)$$

where $\hat D_{x,t}^{(v)} = E_{x,t}\exp(a_x^{(v)} + b_x^{(v)} k_t^{(v)})$, and $f_{x,t}^{(v)}$, $g_{x,t}^{(v)}$, $h_{x,t}^{(v)}$ and $r_{x,t}^{(v)}$ are defined by:

$$f_{x,t}^{(v)} = \sum_{i=0}^{D_{x,t}-1}\left(\frac{i}{1+\alpha_x^{(v)} i} - \frac{1}{\alpha_x^{(v)}}\right), \qquad (27)$$

$$g_{x,t}^{(v)} = \hat D_{x,t}^{(v)}\left(D_{x,t} + \frac{1}{\alpha_x^{(v)}}\right)\left(1 + \alpha_x^{(v)}\hat D_{x,t}^{(v)}\right)^{-1}, \qquad (28)$$

$$h_{x,t}^{(v)} = \sum_{i=0}^{D_{x,t}-1}\left(\frac{1}{\alpha_x^{(v)2}} - \frac{i^2}{(1+\alpha_x^{(v)} i)^2}\right), \qquad (29)$$

$$r_{x,t}^{(v)} = \frac{1}{\alpha_x^{(v)2}}\ln\!\big(1 + \alpha_x^{(v)}\hat D_{x,t}^{(v)}\big). \qquad (30)$$
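To make the sweep concrete, below is a minimal NumPy rendering of updates (23) and (26) under our own array conventions (ages in rows, years in columns, `D` an integer matrix of observed deaths); the $b_x$ and $k_t$ updates of (24)-(25) are omitted for brevity, since they mirror the $a_x$ update with the extra factors shown there.

```python
import numpy as np

def nb_newton_sweep(a, b, k, alpha, D, E):
    """One sweep of the univariate Newton updates for the Negative Binomial
    Lee-Carter: the a_x step of (23) and the alpha_x step of (26).  D and E
    hold deaths (integers) and exposures, ages in rows and years in columns."""
    al = alpha[:, None]
    D_hat = E * np.exp(a[:, None] + np.outer(b, k))        # fitted deaths
    g = D_hat * (D + 1/al) / (1 + al * D_hat)              # equation (28)

    # a_x update, equation (23)
    num = (D - al * g).sum(axis=1)
    den = (-al * g / (1 + al * D_hat)).sum(axis=1)
    a = a - num / den

    # ... b_x and k_t updates, equations (24)-(25), go here ...

    D_hat = E * np.exp(a[:, None] + np.outer(b, k))        # refresh the fit
    g = D_hat * (D + 1/al) / (1 + al * D_hat)
    r = np.log1p(al * D_hat) / al**2                       # equation (30)

    # f_{x,t} and h_{x,t}: per-cell sums over i = 0,...,D_{x,t}-1,
    # equations (27) and (29)
    f, h = np.zeros_like(D_hat), np.zeros_like(D_hat)
    for x in range(D.shape[0]):
        for t in range(D.shape[1]):
            i = np.arange(D[x, t])
            f[x, t] = (i / (1 + alpha[x]*i) - 1/alpha[x]).sum()
            h[x, t] = (1/alpha[x]**2 - i**2 / (1 + alpha[x]*i)**2).sum()

    # alpha_x update, equation (26)
    num = (f + D/al + r - g).sum(axis=1)
    den = (h - (D + 2*r*al)/al**2
           + (2/al**2 + g) * D_hat / (1 + al * D_hat)).sum(axis=1)
    alpha = alpha - num / den
    return a, b, k, alpha
```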
As with the standard Lee-Carter, we normalize $b_x$ and $k_t$ to sum to unity and zero, respectively. The iteration stops when the change in the log-likelihood function is sufficiently small, say $10^{-6}$. Usually, the rate of convergence is lower than that in the Poisson-based MLE. A simple approach for achieving faster convergence is to use the SVD or Poisson ML estimates as the starting values. Finally, we can use the parametric bootstrap to obtain interval forecasts of $\ln(m_{x,t})$. The required generation of Negative Binomial random numbers is fairly easy (see Ross, 2003).

Renshaw and Haberman (2005) proposed another method to handle over-dispersion in the Lee-Carter model. They assume that $D_{x,t}$ follows the Poisson distribution given by equation (9). They keep $E[D_{x,t}] = E_{x,t}\exp(a_x + b_x k_t)$ as before, but they set $\text{Var}[D_{x,t}] = \phi E[D_{x,t}]$ to relax the mean-variance equality. The scale parameter $\phi$ can be estimated by the ratio of the residual deviance to its degrees of freedom, given by $(X-1)(T-2)$, where $X$ is the total number of age groups. The advantage of this method is that we can circumvent the assumption of a probability distribution for the unobserved heterogeneity. Nevertheless, there are several problems associated with this approach. First, the relationship between $E[D_{x,t}]$, $\text{Var}[D_{x,t}]$ and the probability function of $D_{x,t}$ is internally inconsistent. Second, as the dispersion parameter is not involved in the estimating equations, the ML estimates of the model parameters remain unchanged; in other words, the goodness-of-fit is not necessarily improved. Third, as only a single dispersion parameter is used, the percentage increase in $\text{Var}[D_{x,t}]$ is the same for all ages. This is unrealistic, as the increase should be age-specific. For instance, variations at the advanced ages tend to be higher due to the inaccuracy of reported ages at death and the sampling error when the numbers of deaths and exposures-to-risk are small. Fourth, the dispersion parameter may not be integrated into the bootstrapping of interval forecasts, which is the primary concern in this study.
4 The Canadian population example
This section evaluates the performance of our proposed extension using the mortality data of the Canadian population, provided by the Human Mortality Database (HMD). The required data, that is, $E_{x,t}$ and $D_{x,t}$, are given for $x = 0, 1, \ldots, 99$ and $t = 1921, 1922, \ldots, 2001$. Through a graphical analysis of residuals, Koissi et al. (2005) have shown that the method of Poisson-based MLE provides a better fit than the methods of SVD and WLS. We shall therefore focus on the comparison between the Poisson-based MLE and our proposed generalization.
4.1 Parameter estimates
Figures 1 and 2 show that the influence of the extension on the base age pattern $a_x$ and the relative speed of improvement $b_x$ is minor. Figure 3 suggests that the linearity in the time-varying component is preserved. Figure 4 shows that the estimates of $\alpha_x$ are non-negative, justifying the presence of overdispersion. It is also noteworthy that the values of $\alpha_x$ are especially high at the advanced ages, indicating a high level of heterogeneity within cells. For instance, the elderly may be classified by their activity of daily living (ADL) limitations (Kassner and Jackson, 1998), and their mortality is strongly associated with the number of ADL limitations they have. From another point of view, the high values of $\alpha_x$ are also consistent with the empirical fact that vital statistics at the extreme ages are subject to significant recording and sampling errors (see, e.g., Bourbeau and Desjardins, 2002).
[Figure 1: Estimate of parameter $a_x$, Poisson-based MLE and the proposed extension; male and female panels, ages 0-90.]

[Figure 2: Estimate of parameter $b_x$, Poisson-based MLE and the proposed extension; male and female panels, ages 0-90.]

[Figure 3: Estimate of parameter $k_t$, Poisson-based MLE and the proposed extension; male and female panels, 1921-2001.]

[Figure 4: Estimate of parameter $\alpha_x$; male and female panels, ages 0-90.]
4.2 Goodness-of-fit

                          Poisson-based MLE    The proposed extension
Males
  Number of parameters                  279                       379
  Log-likelihood                    -66,021                   -43,517
  AIC                               -66,300                   -43,896
  SBC                               -67,276                   -45,222
Females
  Number of parameters                  279                       379
  Log-likelihood                    -52,643                   -40,748
  AIC                               -52,922                   -41,127
  SBC                               -53,898                   -42,453

Table 1: Comparison of selection information for the Poisson-based MLE and the proposed extension.
Table 1 gives a formal evaluation of the goodness-of-fit. The significantly higher log-likelihood suggests that our proposed extension provides a better fit. Nevertheless, under the principle of parsimony, we should use the smallest number of parameters that gives an adequate representation, and it is therefore inappropriate to base the conclusion only on the change in the log-likelihood, as we have introduced additional parameters ($\alpha_x$). To account for the extra parameters, we can use the following model selection criteria (a computational sketch follows this list).

1. The Akaike Information Criterion (AIC) (Akaike, 1974), defined by $l - j$, where $l$ is the log-likelihood and $j$ is the number of parameters. The AIC rewards an increase in log-likelihood, but penalizes the introduction of additional parameters. Models with a higher value of AIC are preferred.

2. The Schwarz-Bayes Criterion (SBC) (Schwarz, 1978), defined by $l - 0.5 j \ln(n)$, where $n$ is the number of observations. The intuition behind the SBC is similar to that of the AIC. Again, we prefer models with a higher SBC.

3. The likelihood-ratio test (LRT) (see, e.g., Klugman et al., 2004). The null hypothesis of the LRT is that there is no significant improvement in the more complex model. Let $l_1$ be the log-likelihood under the Poisson MLE, and $l_2$ be the log-likelihood under our proposed extension. The test statistic is $2(l_2 - l_1)$. Under the null hypothesis, the test statistic has a chi-square distribution, with degrees of freedom equal to the number of additional parameters.
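The three criteria are straightforward to compute; below is a minimal helper (our own function name and packaging) that applies the definitions above to the two fitted models.

```python
import numpy as np
from scipy.stats import chi2

def compare_models(l1, j1, l2, j2, n):
    """AIC, SBC and LRT for nested models 1 (Poisson) and 2 (extension):
    l = log-likelihood, j = number of parameters, n = observations."""
    aic = (l1 - j1, l2 - j2)                        # higher is better
    sbc = (l1 - 0.5*j1*np.log(n), l2 - 0.5*j2*np.log(n))
    p_value = chi2.sf(2*(l2 - l1), df=j2 - j1)      # likelihood-ratio test
    return aic, sbc, p_value

# e.g. males, 100 ages x 81 years: compare_models(-66021, 279, -43517, 379, 8100)
```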
The values of the AIC and the SBC are presented in Table 1. The p-values of the LRTs are less than $10^{-6}$. All three criteria support the use of our proposed extension. Furthermore, as shown in Figures 5 and 6, the increased flexibility seems capable of correcting the under-fit manifested near the forecast origin, particularly at the young ages. At the same time, the relaxed model structure does no discernible harm to the projected age pattern of mortality, as shown in Figure 9.
4.3 Interval forecasts
Finally, we obtain the interval forecasts by parametric bootstrapping, and the results are presented graphically in Figures 5 to 8. Notice that the percentage increase in the interval width is proportional to $\alpha_x$. On average, the increase in width at the younger ages (0-40) is around 6 percent, while that at the higher ages (75 and over) is more than 100 percent. This agrees with our previous assertion that the mean-variance equality restriction in the Poisson MLE has led us to understate the variation, mostly at the higher ages.

[Figure 5: The fit (1921-2001) and the projection (2002-2051) of $\ln(m_{x,t})$ at ages 0 and 20, Poisson-based MLE and the proposed extension, male.]

[Figure 6: The fit (1921-2001) and the projection (2002-2051) of $\ln(m_{x,t})$ at ages 40 and 95, Poisson-based MLE and the proposed extension, male.]

[Figure 7: The fit (1921-2001) and the projection (2002-2051) of $\ln(m_{x,t})$ at ages 0 and 20, Poisson-based MLE and the proposed extension, female.]

[Figure 8: The fit (1921-2001) and the projection (2002-2051) of $\ln(m_{x,t})$ at ages 40 and 95, Poisson-based MLE and the proposed extension, female.]

[Figure 9: Projected age pattern of mortality in 2051, Poisson-based MLE and the proposed extension; male and female panels.]
5 Concluding remarks
In this paper, we relaxed the original Lee-Carter model by incorporating heterogeneity in each age-period cell. By assuming that the unobserved heterogeneity follows a Gamma distribution, we demonstrated that the distribution of the number of deaths is Negative Binomial. This generalization implicitly introduced an extra parameter vector (the age-specific dispersion parameters) to the model. As the main specification of the model was unaltered, the extension retains all the appealing features of the original version, such as the linearity in the time-varying component and the stability of age-patterns. Given the more conservative confidence intervals and the improved goodness-of-fit, the proposed generalization seems to be an attractive alternative to the original version.

For mathematical convenience, we assumed a single-parameter Gamma distribution for the unobserved heterogeneity. Although the Gamma distribution may be replaced by any continuous distribution with positive support, the mixture may not be obtainable analytically. In that case, the ML estimates of the model parameters may have to be determined by a combination of numerical integration and the EM algorithm (see Brillinger, 1986).

In all versions of the Lee-Carter model we mentioned, the modeling proceeds in two steps: we first estimate the model parameters $a_x$, $b_x$, $k_t$ and $\alpha_x$ (if applicable), and then the estimate of $k_t$ is further modeled by an ARIMA process for extrapolation. Czado et al. (2005) pointed out that this two-step procedure may give rise to incoherence, and they proposed using the Markov Chain Monte Carlo (MCMC) method, which could combine the two steps and
might lead to desirably smooth variations over the Lexis plane. It would be interesting to combine our proposed extension with the MCMC method for a deeper understanding of the uncertainty associated with the parameter estimates and forecasts.
References

Akaike, H. (1974). A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control, AC-19, 716-723.

Booth, H., Maindonald, J. and Smith, L. (2002). Applying Lee-Carter under Conditions of Variable Mortality Decline. Population Studies, 56, 325-336.

Bourbeau, R. and Desjardins, B. (2002). Dealing with Problems in Data Quality for the Measurement of Mortality at Advanced Ages in Canada. North American Actuarial Journal, 1, 1-25.

Box, G.E.P. and Jenkins, G.M. (1976). Time Series Analysis: Forecasting and Control. 2nd ed. San Francisco: Holden-Day.

Brillinger, D.R. (1986). A Biometrics Invited Paper with Discussion: The Natural Variability of Vital Rates and Associated Statistics. Biometrics, 42, 693-734.

Brouhns, N., Denuit, M. and Vermunt, J.K. (2002). A Poisson Log-bilinear Regression Approach to the Construction of Projected Lifetables. Insurance: Mathematics and Economics, 31, 373-393.

Brouhns, N., Denuit, M. and Van Keilegom, I. (2005). Bootstrapping the Poisson Log-bilinear Model for Mortality Forecasting. Scandinavian Actuarial Journal, 3, 212-224.

Brown, R.L. and McDaid, J. (2003). Factors Affecting Retirement Mortality. North American Actuarial Journal, 2, 24-43.

Cairns, A.J.G. (2000). A Discussion of Parameter and Model Uncertainty in Insurance. Insurance: Mathematics and Economics, 27, 313-330.

Congressional Budget Office of the United States (1998). Long-Term Budgetary Pressures and Policy Options. Washington, D.C.: U.S. Government Printing Office.

Continuous Mortality Investigation Bureau (2004). Projecting Future Mortality: A Discussion Paper. CMI Working Paper no. 3. London: Institute of Actuaries and Faculty of Actuaries.

Cox, D.R. (1983). Some Remarks on Overdispersion. Biometrika, 70, 269-274.

Currie, I.D., Durban, M. and Eilers, P.H.C. (2004). Smoothing and Forecasting Mortality Rates. Statistical Modelling, 4, 279-298.

Czado, C., Delwarde, A. and Denuit, M. (2005). Bayesian Poisson Log-bilinear Mortality Projections. Insurance: Mathematics and Economics, 36, 260-284.

Hörmann, W. (1993). The Transformed Rejection Method for Generating Poisson Random Variables. Insurance: Mathematics and Economics, 12, 39-45.

Hougaard, P. (1984). Life Table Methods for Heterogeneous Populations: Distributions Describing the Heterogeneity. Biometrika, 71, 75-83.

Human Mortality Database. University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Available at www.mortality.org or www.humanmortality.de (data downloaded on 25 May 2005).

Kassner, E. and Jackson, B. (1998). Determining Comparable Levels of Functional Disability. Washington, D.C.: American Association of Retired Persons.

Klugman, S.A., Panjer, H.H. and Willmot, G.E. (2004). Loss Models: From Data to Decisions. 2nd ed. New York: Wiley.

Koissi, M.C., Shapiro, A.F. and Högnäs, G. (2005). Evaluating and Extending the Lee-Carter Model for Mortality Forecasting: Bootstrap Confidence Interval. Insurance: Mathematics and Economics, in press.

Lee, R. and Carter, L. (1992). Modeling and Forecasting U.S. Mortality. Journal of the American Statistical Association, 87, 659-671.

Lee, R. (2000). The Lee-Carter Method for Forecasting Mortality. North American Actuarial Journal, 4, 80-93.

Li, S.H. and Chan, W.S. (2005). Outlier Analysis and Mortality Forecasting: the United Kingdom and Scandinavian Countries. Scandinavian Actuarial Journal, 3, 187-211.

McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models. 2nd ed. London: Chapman and Hall.

McNown, R. and Rogers, A. (1989). Forecasting Mortality: A Parameterized Time Series Approach. Demography, 26, 645-650.

Renshaw, A.E. and Haberman, S. (2003). Lee-Carter Mortality Forecasting with Age-specific Enhancement. Insurance: Mathematics and Economics, 33, 255-272.

Renshaw, A.E. and Haberman, S. (2006). A Cohort-based Extension to the Lee-Carter Model for Mortality Reduction Factors. Insurance: Mathematics and Economics, in press.

Ross, S.M. (2003). Simulation. 3rd ed. San Diego, California: Academic Press.

Schwarz, G. (1978). Estimating the Dimension of a Model. Annals of Statistics, 6, 461-464.

Tabeau, E. (2001). A Review of Demographic Forecasting Models for Mortality. In Tabeau, E., Jeths, A.V.D.B. and Heathcote, C. (eds.), Forecasting Mortality in Developed Countries: Insights from a Statistical, Demographic and Epidemiological Perspective. Dordrecht/Boston/London: Kluwer Academic Publishers.

Tuljapurkar, S., Li, N. and Boe, C. (2000). A Universal Pattern of Mortality Decline in the G7 Countries. Nature, 405, 789-792.

Vaupel, J.W., Manton, K.G. and Stallard, E. (1979). The Impact of Heterogeneity in Individual Frailty on the Dynamics of Mortality. Demography, 16, 439-454.

Wang, S.S. and Brown, R.L. (1998). A Frailty Model for Projection of Human Mortality Improvement. Journal of Actuarial Practice, 6, 221-241.

Wilmoth, J.R. (1993). Computational Methods for Fitting and Extrapolating the Lee-Carter Model of Mortality Change. Technical report, Department of Demography, University of California, Berkeley.