Journal of Forecasting, J. Forecast. 36, 357–367 (2017)

Published online 16 August 2016 in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/for.2437

A Flexible Functional Form Approach To Mortality Modeling: Do We Need Additional Cohort Dummies?

HAN LI, COLIN O'HARE AND FARSHID VAHID

ABSTRACT

The increasing amount of attention paid to longevity risk and funding for old age has created the need for precise mortality models and accurate future mortality forecasts. Orthogonal polynomials have been widely used in technical fields and there have also been applications in mortality modeling. In this paper we adopt a flexible functional form approach using two-dimensional Legendre orthogonal polynomials to fit and forecast mortality rates. Unlike some of the existing mortality models in the literature, the model we propose does not impose any restrictions on the age, time or cohort structure of the data and thus allows for different model designs for different countries' mortality experience. We conduct an empirical study using male mortality data from a range of developed countries and explore the possibility of using age–time effects to capture cohort effects in the underlying mortality data. It is found that, for some countries, cohort dummies still need to be incorporated into the model. Moreover, when comparing the proposed model with well-known mortality models in the literature, we find that our model provides comparable fitting but with a much smaller number of parameters. Based on 5-year-ahead mortality forecasts, it can be concluded that the proposed model improves the overall accuracy of future mortality projection. Copyright © 2016 John Wiley & Sons, Ltd.

KEY WORDS

mortality; orthogonal polynomials; cohort effects; forecasting

INTRODUCTION

Owing to the rapid growth in life expectancy during the past few decades, longevity risk has become one of the most significant risks faced by governments, insurance companies and superannuation funds. Moreover, the increased focus on life as a risk that can be commoditized and traded through mortality-linked financial markets has created the desire to understand and forecast mortality rates more accurately. Therefore, increasing attention has been given to mortality modeling in recent years.

According to Booth and Tickle (2008), recent attempts at mortality modeling normally treat age and time effects as the most significant factors affecting mortality experience (see, for example, Lee and Carter, 1992; Renshaw et al., 1996; Cairns et al., 2006; Plat, 2009). More recently, an additional perceived pattern in mortality data, the 'cohort effect' (Renshaw and Haberman, 2006), has been brought to our attention. The cohort effect has been identified in several countries. For example, according to the Government Actuary's Department (2002) in the UK, the generations born between 1925 and 1945 experienced a more rapid decline in mortality rates than other generations. Adding cohort dummies to mortality models has been shown to improve fitting (Renshaw and Haberman, 2006; Cairns et al., 2009) in some but not all countries. The presence of cohort effects is therefore debatable and its justification is limited. Cairns et al. (2011) discussed the possibility that cohort effects could be fully or partially captured by a more complex structure of age and time effects. This is the primary motivation of our analysis, which is to gain a better understanding of cohort effects. In this paper, we propose a two-dimensional (2-D) Legendre orthogonal polynomials (LOP) model to investigate the feasibility of using age–time effects to capture cohort effects.

Furthermore, there have been many studies comparing different mortality models, and the conclusion is that no single model dominates for all countries (Cairns et al., 2011; Dowd et al., 2010). The result is not surprising, since different countries have different characteristics in their mortality experience, and thus it is unlikely that one model will provide the best fit for all countries. One of the strengths of our flexible functional form approach is that the model we propose is data driven, without prior beliefs about the age, time or cohort structure of the underlying data. Cross-validation is used to select age–time effects based on the features of a specific country's mortality experience. Fitting male mortality data for ages 50–89 from a range of developed countries over the period 1950–2009, we conclude that our model achieves better or at least comparable fit quality with a much smaller number of parameters when compared to the well-known Lee–Carter model and Plat model. A check of the residual plots at this stage identifies whether cohort additions are necessary and, if so, for which generations and countries.

Correspondence to: Han Li, Department of Econometrics and Business Statistics, Monash University, Melbourne, VIC, Australia. E-mail: [email protected]


The residual plots suggest that cohort effects can be partially captured by age–time effects, and in some cases additional cohort dummies need to be included in the model. After making sure that the residuals look sufficiently random, we conduct backtesting to assess the forecasting performance of the 2-D LOP model. The 5-year-ahead forecasting results show that our model outperforms the Lee–Carter model and the Plat model in the majority of scenarios.

The rest of the paper is organized as follows. The next section provides some background information on mortality modeling and further explains the motivations of our research. We introduce the 2-D LOP model in the third section, together with the model selection algorithm. The fourth section gives the fitting and forecasting results of the proposed model from empirical studies. Conclusions are drawn in the fifth section.

BACKGROUND

Notation

We start the section by defining several actuarial notations and terms that we use throughout the paper:

- Force of mortality, denoted by $\mu_{x,s}$ and also known as the hazard rate, is the instantaneous death probability for a person aged exactly x at time s.
- Central mortality rate, denoted by $m_{x,t}$, reflects the death probability for age x last birthday in the middle of calendar year t.

Under the assumption that the force of mortality is constant over calendar year t for each age x, that is, $\mu_{x,s} = \mu_{x,t}$ for $t \le s < t+1$, the two types of mortality rate are equivalent to each other.[1]

Review of existing mortality models

The history of mortality modeling can be traced back to the 1800s. Early attempts tried to understand mortality by using a mathematical formula (see, for example, Gompertz, 1825; Makeham, 1860; Perks, 1932).[2] However, the main focus of those models was to provide good fitting rather than accurate forecasting results. All these earlier models fit mortality data for a particular calendar year and thus are one-dimensional deterministic models. None of them takes future mortality improvements into account.

Over the past few decades, the unanticipated growth in life expectancy has created the desire and need to understand mortality, in particular from the forecasting perspective. Renshaw et al. (1996) developed a model that incorporates age variations as well as underlying time trends in mortality rates. The model is defined as follows:

$$\log(\mu_{x,t}) = \beta_0 + \sum_{j=1}^{s} \beta_j L_j(x') + \sum_{i=1}^{r} \alpha_i t'^i + \sum_{i=1}^{r} \sum_{j=1}^{s} \gamma_{ij} L_j(x') t'^i \quad (1)$$

where $x'$ and $t'$ are normalized age and time indexes mapped onto the interval $[-1, 1]$, and $L_j(x')$ denotes the jth-order Legendre orthogonal polynomial. Renshaw et al.'s model can be seen as a two-dimensional version of the Gompertz–Makeham model (Forfar et al., 1988), and it has been used to project continuous mortality investigation (CMI) immediate annuitants' and life office pensioners' mortality experience (Sithole et al., 2000). In fact, the incorporation of both age and time dimensions into mortality models has become a commonly adopted practice in the current literature.

The introduction of stochastic mortality models in the early 1990s began a new era of mortality modeling. The Lee–Carter model (1992) is one of the earliest stochastic models and is often considered the baseline stochastic mortality model for comparison. The model is of the form

$$\log(m_{x,t}) = a_x + b_x \kappa_t + \epsilon_{x,t} \quad (2)$$

where $a_x$ and $b_x$ represent age effects and $\kappa_t$ represents the time effect. Thus the model incorporates both age and time dimensions of the mortality data and, through $\kappa_t$, allows for future mortality improvement. It has been widely used for mortality projection. In order to forecast future mortality rates, a suitable ARIMA process is fitted to the time series $\kappa_t$; Lee and Carter (1992) suggest that a random walk with drift fits $\kappa_t$ very well.

Following the Lee–Carter model, several extensions and variants have been proposed in the last two decades (see, for example, Renshaw and Haberman, 2006; Cairns et al., 2006; Plat, 2009). Plat (2009) proposed a mortality model that maintains the good characteristics of the existing time series models before it. The model also incorporates cohort effects. It is of the form

[1] For readers who are interested in the detailed derivation of the formula, see Dickson et al. (2009).
[2] For a review of historical mortality models, see Olshansky and Carnes (1997).


$$\log(m_{x,t}) = a_x + \kappa_t^1 + \kappa_t^2 (\bar{x} - x) + \kappa_t^3 (\bar{x} - x)^+ + \gamma_{t-x} + \epsilon_{x,t} \quad (3)$$

where $a_x$ captures the overall shape of the mortality curve as in the Lee–Carter model, $\kappa_t^1$ and $\kappa_t^2$ are time effects, and $\gamma_{t-x}$ represents the cohort effect for birth year $t-x$. The term $(\bar{x} - x)^+$ takes the value zero if x is above the average age and the value $(\bar{x} - x)$ if x is below the average age; therefore $\kappa_t^3$ represents a time effect applying to younger ages only. A suitable multivariate stochastic process is fitted to the time series $(\kappa_t^1, \kappa_t^2, \kappa_t^3)$ for forecasting purposes.

A recent study by Li et al. (2015) argued that the time effects in stochastic mortality models could actually be smooth functions of time, since the nonparametric smoothers of those time effects show clear parametric patterns. Stochastic mortality models can therefore be considered a special case of the Renshaw et al. (1996) model. However, most of the stochastic models impose restrictions on the functional form of mortality surfaces, and this may explain why a model usually performs better for certain countries. It is not difficult to see that the shapes of the mortality surface are likely to differ across countries, so such restrictions may not be appropriate under all circumstances. We have therefore followed Renshaw et al.'s flexible functional form approach and, to allow for greater flexibility, have used two-dimensional Legendre orthogonal polynomials for both age and time effects. This enables us to provide a general framework for mortality modeling, as our model is a generalization of various stochastic and deterministic mortality models. Because we propose a flexible functional form approach and make no assumptions about the structure of the model, we are able to tailor the model design to different countries. This is a major strength of our approach.

Relation of age, time and cohort

The cohort effect has become an essential element in recent mortality models. However, to understand it better, we first need to think about the relationship between age, time and cohort. There is an obvious interplay between the three: only two of them can be controlled at the same time, as cohort can be expressed as a linear function of time and age. Defining i as a cohort group, we have

$$i = t - x \quad (4)$$

where $t \in [b+1, b+T]$ and $x \in [a+1, a+N]$; a, b, T and N are non-negative integers. Theoretically, this interplay between age, time and cohort suggests that cohort effects could be adequately captured by functions of x and t, and therefore the addition of cohort dummies might not be necessary in mortality models. However, as we model mortality rates using smooth polynomial functions, any non-differentiable part of the mortality surface would not be adequately captured by the model, since systematically higher or lower mortality rates along a diagonal are outside the domain of smooth polynomials. Therefore, after incorporating all the relevant effects into the model, we look at the residual plot to see whether all the information in the mortality shape has been adequately captured without adding cohort dummies, or whether there are still some diagonal patterns suggesting the need for cohort dummies.

This is not the first paper to raise concerns about the necessity of cohort dummies in mortality models (see Cairns et al., 2011). Strong cohort trends only appear in certain countries and, moreover, it has been found that in the Cairns–Blake–Dowd model (Cairns et al., 2006) part of the cohort effects can be substituted by an additional quadratic age–time effect (Cairns et al., 2009). Cairns et al. (2011) claimed that cohort dummies might not be needed at all in a model with 'well-chosen' age and time effects. The question then arises as to how to choose those additional effects. According to Cairns et al. (2011), there are two ways of doing this: we could either start with a simple model and try to expand it, or start with a general model and attempt to simplify it. In this paper we choose the latter. The model selection criteria discussed later enable us to find the significant factors that affect mortality rates while ensuring the relative parsimony of the model.

METHODOLOGY

Legendre orthogonal polynomials

Orthogonal polynomials have the advantage of stability over conventional polynomials when higher-order terms are included in the model (Raich et al., 2004). The classical orthogonal polynomials are the most widely used family, including the Hermite, Laguerre, Jacobi, Chebyshev and Legendre polynomials (Suetin, 2001). Without loss of generality,[3] we choose Legendre orthogonal polynomials as the basis functions in this paper. The Legendre orthogonal polynomials were introduced by the French mathematician Adrien-Marie Legendre in 1782 (Refaat, 2009). They have been frequently applied in technical fields (see, for example, Silva et al., 2013) as well as the social sciences, including mortality modeling (see, for example, Renshaw et al., 1996; Sithole et al., 2000).

[3] We have also used other orthogonal polynomials such as Chebyshev polynomials and Laguerre polynomials; the difference in results is negligible. These results are available upon request.


Defining an nth-order Legendre polynomial as $\varphi_n(z)$, on its interval of orthogonality $[-1, 1]$ we have the following property:

$$\int_{-1}^{1} \varphi_j(z) \varphi_k(z) \, dz = \begin{cases} 0, & \text{if } j \ne k \\ 1, & \text{if } j = k \end{cases} \quad (5)$$

Legendre polynomials also satisfy the following three-term recursive relationship:

$$(n+1)\varphi_{n+1}(z) = (2n+1) z \varphi_n(z) - n \varphi_{n-1}(z), \quad \text{for } n = 1, 2, 3, \ldots \quad (6)$$

Thus the first seven Legendre orthogonal polynomials are as follows:

$$\varphi_0(z) = 1$$
$$\varphi_1(z) = z$$
$$\varphi_2(z) = \tfrac{1}{2}(3z^2 - 1)$$
$$\varphi_3(z) = \tfrac{1}{2}(5z^3 - 3z)$$
$$\varphi_4(z) = \tfrac{1}{8}(35z^4 - 30z^2 + 3)$$
$$\varphi_5(z) = \tfrac{1}{8}(63z^5 - 70z^3 + 15z)$$
$$\varphi_6(z) = \tfrac{1}{16}(231z^6 - 315z^4 + 105z^2 - 5)$$

In order to normalize the age and time indexes into the range $[-1, 1]$, for $x \in [a+1, a+N]$ and $t \in [b+1, b+T]$ we define:

- $x' = \frac{x - \bar{x}}{x_0 - \bar{x}}$, where $\bar{x}$ is the average age and $x_0 = a+1$ is the minimum age;
- $t' = \frac{t - \bar{t}}{t_0 - \bar{t}}$, where $\bar{t}$ is the average time and $t_0 = b+1$ is the starting time of the investigation.
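As a concrete illustration of equation (6) and the normalization above, the following minimal Python sketch (ours, not the authors' code) evaluates the Legendre polynomials by the three-term recurrence and maps the age and time indexes of the empirical study below (ages 50–89, years 1950–2009) onto $[-1, 1]$; the function name `legendre` is our own choice.

```python
import numpy as np

def legendre(z, order):
    """Evaluate the Legendre polynomials phi_0, ..., phi_order at the points z,
    using the recurrence (n+1)phi_{n+1}(z) = (2n+1)z phi_n(z) - n phi_{n-1}(z)."""
    z = np.asarray(z, dtype=float)
    P = np.ones((order + 1, z.size))   # row n holds phi_n evaluated at z
    if order >= 1:
        P[1] = z
    for n in range(1, order):
        P[n + 1] = ((2 * n + 1) * z * P[n] - n * P[n - 1]) / (n + 1)
    return P

# Normalization: x' = (x - x_bar)/(x_0 - x_bar), t' = (t - t_bar)/(t_0 - t_bar),
# with x_0 and t_0 the minimum age and starting year. Note that the minimum age
# maps to +1 and the maximum age to -1, which is harmless for an orthogonal basis.
ages, years = np.arange(50, 90), np.arange(1950, 2010)
x_prime = (ages - ages.mean()) / (ages[0] - ages.mean())
t_prime = (years - years.mean()) / (years[0] - years.mean())
```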

The parameters of each of these polynomials are given a priori and are not estimated from the underlying mortality data. In the Renshaw et al. (1996) study, the highest degree of the Legendre orthogonal polynomials of age is restricted to 6. In another study on mortality modeling, Li et al. (2015) found that the optimal order of the polynomials of time is at most 6 for a range of developed countries. Therefore, in this paper we consider Legendre orthogonal polynomials up to 6th order for both $x'$ and $t'$. As we want to include the interactions between age and time effects in the model, 2-D orthogonal polynomials should be considered. Following the multivariate generalization of univariate orthogonal polynomials proposed by Mádi-Nagy (2012), we define the 2-D LOP for $x'$ and $t'$ as

$$\varphi_{m,n}(x', t') = \varphi_m(x') \varphi_n(t') \quad (7)$$

Thus we introduce the 2-D LOP model for mortality rates as

$$\log(m_{x,t}) = \sum_{m=0}^{6} \sum_{n=0}^{6} \beta_{m,n} \varphi_{m,n}(x', t') + \epsilon_{x,t} \quad (8)$$

where $\beta_{m,n}$ is the coefficient of the $(m,n)$th polynomial. Since the linear span of Legendre orthogonal polynomials is dense in the space of smooth functions on $[-1, 1]$, our method is an example of sieve regression.[4]

Estimation and model selection

It is well known that non-parsimonious models lead to bad forecasting performance (Tibshirani, 1996). Therefore, it is important to include only polynomial terms that improve the forecasting ability of our model. Modern statistical learning methods use shrinkage to improve overall predictive accuracy, a procedure normally referred to as 'regularization'. Regularization has become a useful tool in modern statistical modeling and is often used for predictive analysis (Hastie et al., 2009; Zhao and Yu, 2006). In our analysis, a regularized version of model (8) is preferred for two reasons. Firstly, a good mortality model should not only focus on the quality of fitting: forecast ability also needs to be taken into account. Secondly, the parsimony of mortality models is crucial for the purpose of interpretation. We want interpretable models so that we can determine which factors have the strongest effects on mortality rates.

[4] For a recent review of applications of sieve methods, see Hansen (2014).


Tibshirani (1996) introduced the least absolute shrinkage and selection operator (Lasso). The method has become increasingly popular over other regularization methods because it automatically sets the parameters of insignificant predictors to zero. In this paper, we use the Lasso method and cross-validation to 'shrink' the parameters of our model and set some of the coefficients in model (8) to zero. For $x \in [a+1, a+N]$ and $t \in [b+1, b+T]$, let

- $y_{x,t} = \log(m_{x,t})$;
- $\alpha = \beta_{0,0}$;
- $\beta = (\beta_{0,1}, \beta_{0,2}, \ldots, \beta_{0,n}, \beta_{1,0}, \beta_{1,1}, \ldots, \beta_{1,n}, \ldots, \beta_{m,0}, \beta_{m,1}, \ldots, \beta_{m,n})^T$;
- $X = (\varphi_{0,1}(x', t'), \ldots, \varphi_{0,n}(x', t'), \varphi_{1,0}(x', t'), \ldots, \varphi_{1,n}(x', t'), \ldots, \varphi_{m,0}(x', t'), \ldots, \varphi_{m,n}(x', t'))$.

The Lasso coefficients $(\hat{\alpha}, \hat{\beta})$ are defined as

$$(\hat{\alpha}, \hat{\beta}) = \arg\min \left\{ \sum_{x=a+1}^{a+N} \sum_{t=b+1}^{b+T} (y_{x,t} - \alpha - X\beta)^2 + \lambda \|\beta\| \right\} \quad (9)$$

where $\|\cdot\|$ equals the sum of the absolute values of the vector's elements. $\lambda$ is called the 'tuning parameter' and controls the amount of regularization imposed on the model. As $\lambda$ increases, the number of non-zero coefficients in model (8) decreases. Hence there needs to be a disciplined way of choosing the tuning parameter. According to Tibshirani (1996), cross-validation is the common standard. There are other methods to select the optimal tuning parameter, such as the Akaike information criterion and the Bayes information criterion; however, these information-criterion-based methods rely on a proper estimate of the degrees of freedom of the model, which is not always easy to obtain in Lasso estimation (Zou et al., 2007). Therefore, in this paper we choose cross-validation to select the optimal tuning parameter.

Tenfold cross-validation is a widely used algorithm to select the tuning parameter for Lasso estimation (see, for example, Yu and Feng, 2014). The method divides the data into 10 disjoint, equal-sized 'folds'. One fold is treated as validation data and the remaining nine folds as training data; the process is then repeated with each of the other folds as validation data in turn, so that every data point is used for both training and validation. The validation results are averaged across folds to obtain the optimum $\lambda$ for the Lasso estimation. Putting the optimum $\lambda$ into equation (9), we get a set of Lasso coefficients and thus a 2-D LOP model for mortality rates.

Residual check

Before using the proposed method to project future mortality rates, we want to make sure that the residuals look sufficiently random. A check of the residual plots is done to see if additional cohort dummies are needed in the model. The cleanness of the residual plot is important because it indicates how well the model captures the patterns in the mortality data, and a more precise model will provide more accurate forecasting results. Therefore, if apparent cohort patterns can still be seen in the residual plots, we include cohort dummies in the model, and model (8) becomes

$$\log(m_{x,t}) = \sum_{m=0}^{6} \sum_{n=0}^{6} \beta_{m,n} \varphi_{m,n}(x', t') + \gamma_{t-x} + \epsilon_{x,t} \quad (10)$$

where $\gamma_{t-x}$ is as defined in the Plat model. We then estimate the model and find Lasso estimators for both the coefficients $\beta_{m,n}$ and $\gamma_{t-x}$.

Forecasting future mortality rates

The forecasting of future mortality rates is of fundamental importance, as adequate pricing of life insurance products relies on the accuracy of mortality projection. To produce an n-year-ahead mortality forecast, we first need to ensure that the values of $t'$ remain in the range $[-1, 1]$. Thus, for $t \in [b+1, b+T+n]$, we redefine $t'$ as

$$t' = \frac{t - \tilde{t}}{t_0 - \tilde{t}}, \quad \text{where } \tilde{t} = \frac{1}{T+n} \sum_{t=b+1}^{b+T+n} t \quad (11)$$

Repeating the previous steps (under 'Legendre orthogonal polynomials' and 'Estimation and model selection', above) to determine the Lasso estimators for the parameters in the 2-D LOP model, we then substitute the values of $t'$ corresponding to $t = b+T+1, b+T+2, \ldots, b+T+n$ into equation (8) or (10) and obtain the n-year-ahead mortality forecast.
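To make the estimation step concrete, here is a minimal sketch of the Lasso fit with tenfold cross-validation, reusing the `legendre`, `x_prime` and `t_prime` objects from the earlier sketch and assuming an N x T array `log_m` of observed log central mortality rates (rows are ages, columns are years). It relies on scikit-learn's `LassoCV`, whose penalized objective is scaled slightly differently from equation (9); the sketch illustrates the procedure rather than reproducing the authors' exact implementation.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def design_matrix(x_prime, t_prime, order=6):
    """Columns are the 2-D LOP terms phi_{m,n}(x', t') = phi_m(x') phi_n(t')
    of equation (7), evaluated on the full age-time grid and vectorized."""
    Px = legendre(x_prime, order)                  # shape (order+1, N)
    Pt = legendre(t_prime, order)                  # shape (order+1, T)
    cols = [np.outer(Px[m], Pt[n]).ravel()
            for m in range(order + 1) for n in range(order + 1)
            if (m, n) != (0, 0)]                   # the (0,0) term is the intercept
    return np.column_stack(cols)

X = design_matrix(x_prime, t_prime)
y = log_m.ravel()                                  # log m_{x,t}, stacked over the grid

# Tenfold cross-validation chooses the tuning parameter; the Lasso then sets
# insignificant coefficients exactly to zero, as in equation (9).
fit = LassoCV(cv=10).fit(X, y)
print("non-zero coefficients:", np.count_nonzero(fit.coef_) + 1)  # +1: intercept

# For equation (10), dummy columns (one indicator per birth year i = t - x with
# enough observations) could be appended to X before the same Lasso fit. For an
# n-year-ahead forecast, t' is renormalized over [b+1, b+T+n] as in equation (11)
# and the fitted model is evaluated at the future values of t'.
```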


Table I. Fitting results of the 2-D LOP model, Lee–Carter model and Plat model for age 50–89 from 1950 to 2009

            2-D LOP model              Lee–Carter model          Plat model
       p   E1(%)  E2(%)  E3(%)    E1(%)   E2(%)  E3(%)    E1(%)   E2(%)  E3(%)
GB    30   0.055   2.54   3.32    0.088    3.04   4.16   -0.010    1.43   2.04
USA   33   0.034   2.02   2.62    0.093    2.64   3.62    0.071    1.43   2.07
AUS   26   0.096   3.44   4.39    0.195    3.36   4.60    0.144    2.62   3.78
NL    33   0.065   2.83   3.61   -0.045    4.32   5.62    0.011    2.11   2.87
JPN   42   0.068   2.86   3.67    0.620    4.08   5.31    0.206    1.44   2.55
FR    43   0.068   2.84   3.72    0.328    2.93   3.96    0.225    1.47   3.13
SP    20   0.163   3.91   5.72    0.191    2.85   4.01    0.335    2.64   4.71

Note: p denotes the number of non-zero coefficients in the model.

EMPIRICAL STUDY

In this section, we fit the proposed 2-D LOP model to male mortality data from a range of developed countries: Great Britain (GB), the United States (USA), Australia (AUS), the Netherlands (NL), Japan (JPN), France (FR) and Spain (SP).[5] We have chosen these countries because they are geographically widespread and, furthermore, they provide a good indication of the mortality experience in countries with similar economies. To avoid data quality issues, we only consider post-war mortality data from 1950 to 2009, even though longer historical series are available. As the primary interest of the paper is to provide accurate mortality forecasts for older ages, to which longevity risk is more exposed, the age range used in our analysis is 50 to 89, in line with other papers in the area (see, for example, Cairns et al., 2009, 2011; Dowd et al., 2010). The deaths and exposure data used in this paper are downloaded from the Human Mortality Database (HMD).[6]

We first present a comparison of fit quality among the proposed 2-D LOP model, the Lee–Carter model and the Plat model. We choose these models as comparators because they are well recognized and widely used in the literature, and they provide both a small-factor (Lee and Carter, 1992) and a large-factor (Plat, 2009) example. We then look at the residual plot for each country to see whether the proposed method can capture a cohort effect by introducing higher-order age–time effects into the model.

Fitting results

Following the investigations of O'Hare and Li (2012) and Li et al. (2015), to statistically assess the fit quality of the proposed model we define the following measures:

- The average error E1, which measures the overall bias, is given by

$$E1 = \frac{1}{NT} \sum_x \sum_t \frac{\hat{m}_{x,t} - m_{x,t}}{m_{x,t}} \quad (12)$$

- The absolute average error E2 (also known as the mean absolute percentage error), which measures the magnitude of the deviance, is given by

$$E2 = \frac{1}{NT} \sum_x \sum_t \frac{|\hat{m}_{x,t} - m_{x,t}|}{m_{x,t}} \quad (13)$$

- The standard deviation of error E3, which is a measure designed to detect large deviance, is given by

$$E3 = \sqrt{\frac{1}{NT} \sum_x \sum_t \left( \frac{\hat{m}_{x,t} - m_{x,t}}{m_{x,t}} \right)^2} \quad (14)$$
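For completeness, the three measures can be computed in a few lines; a sketch under the same assumptions as before, with `m_hat` and `m` denoting fitted and observed central mortality rates on the N x T age-time grid (names of our choosing):

```python
import numpy as np

def error_measures(m_hat, m):
    """E1, E2 and E3 of equations (12)-(14): the mean, the mean absolute value
    and the root mean square of the relative errors over the age-time grid."""
    r = (m_hat - m) / m          # relative errors (m_hat fitted, m observed)
    return r.mean(), np.abs(r).mean(), np.sqrt((r ** 2).mean())

# Usage (the values reported in Tables I and II are these quantities in %):
# E1, E2, E3 = error_measures(m_hat, m)
```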

From Table I it can be seen that, in most scenarios, the 2-D LOP model gives fitting results comparable to the Lee–Carter model and the Plat model. Compared with the Lee–Carter model, the 2-D LOP model gives better fitting results on all three measures in almost all countries; the only exception is NL, where our method performs slightly worse on the E1 measure. As neither the 2-D LOP model nor the Lee–Carter model incorporates a cohort effect, it is worth noting that the better fit is achieved with a much smaller number of parameters: the number of non-zero coefficients in the 2-D LOP model is only around 30, whereas the Lee–Carter model has 138 parameters for its age and time effects.

[5] Empirical results for female mortality data are available upon request.
[6] The HMD mortality database can be found at http://www.mortality.org


Figure 1. Residual plots of the 2-D LOP models (without cohort dummies) for GB, USA, AUS, NL, JPN, FR and SP.


It is not surprising that the fitting of the Plat model is generally better than that of the Lee–Carter model and the 2-D LOP model, as it has more terms and it also incorporates the cohort effect. The total number of parameters in the Plat model is about 10 times the number of parameters in the 2-D LOP model; even so, the overall fitting of the 2-D LOP model is still comparable with the Plat model. Detailed forms of the selected model for each country are available upon request.

As mentioned in previous sections, our analysis not only assesses fit quality based on statistical measures but also takes the cleanness of the residual plots into account. Before moving on to future mortality projection, we want to make sure that the residuals look sufficiently random and do not show any clear patterns. Figure 1 plots the residuals for the seven countries included in the analysis. The plots for AUS, NL and SP appear to be free of diagonal patterns. Cohort trends can be seen on several diagonals in the plots for the USA and JPN, and strong clustering of positives and negatives appears on certain diagonals in the plots for GB and FR. Such diagonal patterns after model fitting indicate that there are non-differentiable parts on those countries' mortality surfaces.

To explore possible reasons behind this, we first consider how such diagonal patterns could arise. Cohort effects can be seen as dependency in the mortality experience of a group of individuals born in the same year. Therefore, when a historical event significantly affected the mortality experience of certain generations, these patterns will persist into the future if there is strong dependency in mortality experience within each cohort group. For example, in the residual plot for GB, strong diagonal patterns can be seen for the generations born around 1920, indicating that survivors of the World War II period seem to have higher mortality rates relative to other generations. Other possible causes of such diagonal patterns include influenza outbreaks and natural disasters across a country. It would be worthwhile to investigate further the relationship between such historical events and cohort effects. As the strong cohort trends found in certain generations can easily be captured by cohort dummies, in the next section we add cohort dummies to the mortality models for GB, USA, JPN and FR.
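This diagnostic is straightforward to reproduce: plotting the residual matrix as an age-time heatmap makes cohort effects visible as bands along the diagonals t - x = constant. A minimal sketch, assuming an N x T residual array `resid` from the fit above (the plotting choices are ours):

```python
import numpy as np
import matplotlib.pyplot as plt

# resid: N x T array of residuals; rows are ages 50..89, columns are years 1950..2009
lim = np.abs(resid).max()
fig, ax = plt.subplots(figsize=(8, 5))
im = ax.pcolormesh(years, ages, resid, cmap='RdBu_r', vmin=-lim, vmax=lim)
ax.set_xlabel('Year')
ax.set_ylabel('Age')
fig.colorbar(im, label='residual')
# Cohort effects run along diagonals of constant birth year t - x: for example,
# the GB generation born around 1920 runs from (1970, age 50) to (2009, age 89).
plt.show()
```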

Figure 2. Residual plots of the 2-D LOP models (with cohort dummies) for GB, USA, JPN and FR.


Cohort dummies addition

In this study, cohort effects are most apparent in GB, USA, JPN and FR. Therefore, to make sure that the model is precise and no useful information is left unexplained, in this section we incorporate cohort dummies into the 2-D LOP model for these countries. Following the study of Cairns et al. (2009), we include cohorts with more than five observations and allow the Lasso model selection method to determine the significant cohort effects to be included in the model. Detailed forms of the selected model for each country are available upon request. It can be seen in Figure 2 that there is a significant improvement in the fit quality of the model for all four countries. The magnitudes of the residuals are considerably reduced and, more importantly, after the incorporation of cohort dummies the residual plots are free of obvious diagonal patterns. Thus, in the next section, we can produce mortality forecasts for GB, USA, JPN and FR based on the modified models.

Future mortality projection

This section presents the forecasting results of the proposed 2-D LOP model along with the Lee–Carter model and the Plat model. Following the study of Dowd et al. (2010), we use backtesting to assess the forecasting performance of all three models. As the main focus of this paper is on short- to medium-term forecast horizons, we first fit the mortality data of the seven countries from 1950 to 2004 and then produce 5-year-ahead forecasts. The E1, E2 and E3 measures are used to compare the forecasts with the actual observations for the period 2005–2009. Forecasting results for the 2-D LOP model, the Lee–Carter model and the Plat model are presented in Table II.

As mentioned above ('Fitting results'), E1, E2 and E3 look at different aspects of goodness of fit. Since in this paper we mainly assess the overall accuracy of the future mortality forecast, the E2 measure is the one we focus on. We see from Table II that, overall, the Lee–Carter model is marginally better on the E1 measure than the 2-D LOP model and the Plat model. The 2-D LOP model gives a more accurate forecast on the E2 measure in five out of seven countries when compared to the Lee–Carter model and in six out of seven countries when compared to the Plat model. For example, the 2-D LOP mortality forecast for GB is around 3 percentage points better on the E2 measure than the forecasts from the Lee–Carter model and the Plat model. Moreover, the 2-D LOP model gives the best forecasting performance of the three models on the E3 measure in four out of seven countries, generally by 2 to 3 percentage points. Only in the case of JPN does the 2-D LOP model perform slightly worse on both the E2 and E3 measures.

The appearance of structural breaks in some countries' mortality data is a possible explanation for the mixed results found in the empirical study. Stochastic models such as the Lee–Carter model and the Plat model normally have a unit root process in the time effects, which leads to faster adaptation to structural changes in mortality experience. Therefore, if structural breaks occurred during the investigation period, the 2-D LOP model may not be able to outperform some stochastic mortality models. It would be worthwhile to investigate further the reasons for the relatively poorer performance in some scenarios.
However, it is still very clear that the 2-D LOP model outperforms the Lee–Carter model and the Plat model overall and substantially improves the accuracy of mortality forecasting across a wide range of countries. The consistently good performance of the model may be due to the fact that we are using a flexible functional form approach: the 2-D LOP model does not impose any restrictions on the structure of the model, and we only include cohort dummies if the nonlinear function of age and time cannot adequately fit the cohort patterns in the mortality surface. As a result, the 2-D LOP model allows for different model structures for different countries. This country-specific model design ensures that the model is data driven and captures the features of the underlying mortality experience sufficiently.

A comparison of the 5-year-ahead mortality forecasts from the three models with the actual mortality experience for GB males aged 50, 60, 70 and 80 is given in Figure 3. It can be seen from these plots that the 2-D LOP forecast is more accurate than the Lee–Carter forecast and the Plat forecast in most cases, which is consistent with our conclusion on forecasting performance based on the three statistical measures.

Table II. Five-year forecasting results of male mortality rates for age 50–89 from 2005 to 2009

            2-D LOP model             Lee–Carter model           Plat model
       E1(%)   E2(%)   E3(%)     E1(%)   E2(%)   E3(%)     E1(%)   E2(%)   E3(%)
GB*     3.91    5.80    6.88      1.83    8.72   10.31      4.71    8.21    9.60
USA*    2.12    4.26    5.30     -2.08    8.37   10.40      2.39    6.03    7.03
AUS    -3.58    5.82    7.90      1.12    7.95    9.81      3.65    7.82    9.29
NL      5.14    7.84   10.69      8.13   11.04   13.14      8.16    9.50   11.61
JPN*   -1.73    5.22    6.54     -2.18    3.50    4.61      1.02    4.40    5.44
FR*    -1.43    3.47    5.07      0.63    4.37    5.38      2.28    4.15    4.98
SP      2.24    5.10    6.22      1.60    4.54    5.55      3.65    5.13    6.43

*Countries with cohort dummies in the 2-D LOP model.


Figure 3. Mortality rates from 1990 with 5-year-ahead forecasts from 2005 to 2009 for GB males aged (a) 50, (b) 60, (c) 70 and (d) 80.

Other countries' equivalent forecasting plots are available upon request.

CONCLUSION

In this paper we introduce a flexible functional form approach using 2-D LOP to model mortality rates. The method is data driven and allows a specific model design for each country's mortality experience. We demonstrate the strengths of the model using mortality data from a range of developed countries. After fitting the proposed model to the mortality data, we argue from the residual plots that additional cohort dummies are required for some countries, as in those cases cohort effects can only be partially captured by the interactions of age and time effects in the 2-D LOP model. Based on the empirical study, it can be concluded that the proposed 2-D LOP model achieves comparable fit quality and better forecasting performance with a much smaller number of parameters. This makes the 2-D LOP model more attractive than the existing mortality models in the literature.

ACKNOWLEDGEMENTS

We are indebted to colleagues from the Department of Econometrics and Business Statistics, Monash University, for valuable feedback received.

REFERENCES

Booth H, Tickle L. 2008. Mortality modelling and forecasting: a review of methods. Annals of Actuarial Science 3: 3–43.
Cairns AJG, Blake D, Dowd K. 2006. A two-factor model for stochastic mortality with parameter uncertainty: theory and calibration. Journal of Risk and Insurance 73: 687–718.
Cairns AJG, Blake D, Dowd K, Coughlan GD, Epstein D, Ong A, Balevich I. 2009. A quantitative comparison of stochastic mortality models using data from England & Wales and the United States. North American Actuarial Journal 13(1): 1–35.
Cairns AJG, Blake D, Dowd K, Coughlan GD, Epstein D, Khalaf-Allah M. 2011. Mortality density forecasts: an analysis of six stochastic mortality models. Insurance: Mathematics and Economics 48: 355–367.
Dickson DCM, Hardy MR, Waters HR. 2009. Actuarial Mathematics for Life Contingent Risks. Cambridge University Press: London.
Dowd K, Cairns AJG, Blake D, Coughlan GD, Epstein D, Khalaf-Allah M. 2010. Evaluating the goodness of fit of stochastic mortality models. Insurance: Mathematics and Economics 47: 255–265.


Forfar DO, McCutcheon JJ, Wilkie AD. 1988. On graduation by mathematical formula. Journal of the Institute of Actuaries 115: 1–49; Transactions of the Faculty of Actuaries 41: 96–245.
Gompertz B. 1825. On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. Philosophical Transactions of the Royal Society 115: 513–585.
Government Actuary's Department. 2002. National Population Projections 2000-Based. HMSO: London.
Hansen B. 2014. Nonparametric sieve regression: least squares, averaging least squares, and cross-validation. In The Oxford Handbook of Applied Nonparametric and Semiparametric Econometrics and Statistics, Racine J, Su L, Ullah A (eds). Oxford University Press: Oxford; 215–248.
Hastie T, Tibshirani R, Friedman JH. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd edn). Springer: New York.
Lee RD, Carter LR. 1992. Modeling and forecasting U.S. mortality. Journal of the American Statistical Association 87: 659–675.
Li H, O'Hare C, Zhang X. 2015. A semiparametric panel approach to mortality modeling. Insurance: Mathematics and Economics 61: 264–270.
Mádi-Nagy G. 2012. Polynomial bases on the numerical solution of the multivariate discrete moment problem. Annals of Operations Research 200: 75–92.
Makeham WM. 1860. On the law of mortality and the construction of annuity tables. Journal of the Institute of Actuaries 6: 301–310.
O'Hare C, Li Y. 2012. Explaining young mortality. Insurance: Mathematics and Economics 50(1): 12–25.
Olshansky SJ, Carnes BA. 1997. Ever since Gompertz. Demography 34: 1–15.
Perks W. 1932. On some experiments in the graduation of mortality statistics. Journal of the Institute of Actuaries 63: 12–57.
Plat R. 2009. On stochastic mortality modeling. Insurance: Mathematics and Economics 45(3): 393–404.
Raich R, Qian H, Zhou GT. 2004. Orthogonal polynomials for power amplifier modeling and predistorter design. IEEE Transactions on Vehicular Technology 53(5): 1468–1479.
Refaat EA. 2009. Legendre Polynomials and Functions. CreateSpace: USA.
Renshaw AE, Haberman S. 2006. A cohort-based extension to the Lee–Carter model for mortality reduction factors. Insurance: Mathematics and Economics 38: 556–570.
Renshaw AE, Haberman S, Hatzopoulos P. 1996. The modeling of recent mortality trends in United Kingdom male assured lives. British Actuarial Journal 2: 449–477.
Silva FG, Torres RA, Brito LF, Euclydes RF, Melo ALP, Souza NO, Ribeiro JI Jr, Rodrigues MT. 2013. Random regression models using Legendre orthogonal polynomials to evaluate the milk production of Alpine goats. Genetics and Molecular Research 12(4): 6502–6511.
Sithole TZ, Haberman S, Verrall RJ. 2000. An investigation into parametric models for mortality projection, with applications to immediate annuitants' and life office pensioners' data. Insurance: Mathematics and Economics 27: 285–312.
Suetin PK. 2001. Classical orthogonal polynomials. In Encyclopedia of Mathematics, Hazewinkel M (ed). Springer; 643.
Tibshirani R. 1996. Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B 58(1): 267–288.
Yu Y, Feng Y. 2014. Modified cross-validation for penalized high-dimensional linear regression models. Journal of Computational and Graphical Statistics 23(4): 1009–1027.
Zhao P, Yu B. 2006. On model selection consistency of Lasso. Journal of Machine Learning Research 7: 2541–2563.
Zou H, Hastie T, Tibshirani R. 2007. On the 'degrees of freedom' of the Lasso. Annals of Statistics 35: 2173–2192.

Authors’ biographies:

Han Li has been a PhD candidate at the Department of Econometrics and Business Statistics at Monash University since 2013. Her research interest is in the field of mortality modeling and forecasting.

Colin O'Hare is an Associate Professor and actuarial program director at the Department of Econometrics and Business Statistics at Monash University. He is a qualified actuary with 15 years' experience working in the pensions consulting arena in the UK. His research interests include the modeling and forecasting of mortality rates, the adequacy of superannuation systems and questions around building capacity for the aging population.

Farshid Vahid is a Professor at the Department of Econometrics and Business Statistics at Monash University. He is a Fellow of the Academy of the Social Sciences in Australia. His research interests include applied econometrics, time series analysis and forecasting.

Authors' addresses:

Han Li, Colin O’Hare and Farshid Vahid, Department of Econometrics and Business Statistics, Monash University, Melbourne, VIC 3800, Australia.
