ISBA 2000, Proceedings, pp. 000–000 ISBA and Eurostat, 2001
Bayesian Estimation of Joint Survival Functions in Life Insurance
ARKADY SHEMYAKIN and HEEKYUNG YOUN
University of St. Thomas, Saint Paul, MN, USA
Abstract: The insurance industry has recently experienced a high demand for life insurance policies issued to married couples, with the payoff due at the second spouse's death. Fair pricing of such policies is an example of an insurance problem that requires the construction of two- or higher-dimensional survival functions. Unfortunately, while a lot of information is available on univariate survival functions, little data are available for estimating the association between them. The assumption that the spouses' univariate survival functions are independent is not supported by empirical data. The parametric copula models currently used for bivariate constructions are based on the spouses' physical ages only. They do not explicitly address such factors as "common disaster" and "broken heart", which are related to real (chronological) time. We suggest a modification of existing copula models using a Bayesian approach, which allows us to incorporate existing information on univariate survival functions. Numerical results are presented for MLE and Bayesian estimation, and a direction for further research is suggested.
Keywords: WEIBULL SURVIVAL FUNCTION; HOUGAARD COPULA; JOINT LAST SURVIVOR INSURANCE.
1. JOINT LAST SURVIVOR INSURANCE
In recent years, the insurance industry has experienced increased demand for life insurance policies issued to female-male pairs (mostly married couples), with the benefit payable at the second death in the couple (joint last survivor policies). These policies are generally used by older couples for estate tax purposes and carry large amounts of insurance. Fair pricing of such policies requires the evaluation of a two-dimensional survival function for the couple. To price such insurance, we can use the following formula for the last survivor insurance premium $A_{\overline{x_1x_2}}$, derived through the annuity values $\ddot{a}_{\overline{x_1x_2}}$:

$$A_{\overline{x_1x_2}} = 1 - \frac{i}{1+i}\,\ddot{a}_{\overline{x_1x_2}}, \qquad \ddot{a}_{\overline{x_1x_2}} = \sum_{k=0}^{\infty}\left(\frac{1}{1+i}\right)^{k} {}_{k}p_{\overline{x_1x_2}}, \qquad (1)$$
where $x_1$ and $x_2$ are the female and male ages, respectively, at the policy issue date (entry ages), ${}_kp_{\overline{x_1x_2}}$ is the probability that the "last survivor status" survives for $k$ years after the issue date, and $i$ is the interest rate.

2. DEFINITIONS AND NOTATION
Let us denote by
$$S(t_1, t_2) = P(X_1 > t_1,\, X_2 > t_2)$$
the joint survival function of the spouses, where $X_1$ is the wife's lifelength (age at death) and $X_2$ is the husband's lifelength (age at death). Let
$${}_tp_{x_j} = P(X_j > x_j + t \mid X_j > x_j), \qquad j = 1, 2,$$
be the female and male conditional survival probabilities for given entry ages $x_j$. The premium computations for joint wife-husband policies (see, e.g., (1) or Bowers et al. (1997)) require the estimation of survival probabilities for the joint last survivor status:
$${}_tp_{\overline{x_1x_2}} = P\big(\{X_1 > x_1 + t\} \cup \{X_2 > x_2 + t\} \,\big|\, X_1 > x_1,\, X_2 > x_2\big). \qquad (2)$$
If we assume that the spouses' lifelengths are independent (Model I in Section 6), then
$${}_tp_{\overline{x_1x_2}} = {}_tp_{x_1} + {}_tp_{x_2} - {}_tp_{x_1}\,{}_tp_{x_2}.$$
In general, however, the lifelengths are not independent. We can instead obtain the following representation of the probabilities in (2) in terms of the joint survival function:
$${}_tp_{\overline{x_1x_2}} = \frac{S(x_1 + t, x_2) + S(x_1, x_2 + t) - S(x_1 + t, x_2 + t)}{S(x_1, x_2)}. \qquad (3)$$
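As a quick check of formula (3), the short Python sketch below evaluates it for an arbitrary joint survival function. It also confirms that, when the joint survival function factorizes, formula (3) reduces to the independence formula given earlier. The Weibull margins are hypothetical values chosen only for illustration, and `last_survivor_prob`, `weibull_surv` and `S_indep` are illustrative helper names.

```python
import math

def last_survivor_prob(S, x1, x2, t):
    """Formula (3): t-year survival probability of the last-survivor status.

    S(t1, t2) = P(X1 > t1, X2 > t2) is any joint survival function;
    x1, x2 are the entry ages of the wife and the husband.
    """
    numerator = S(x1 + t, x2) + S(x1, x2 + t) - S(x1 + t, x2 + t)
    return numerator / S(x1, x2)

def weibull_surv(t, shape, scale):
    """Univariate Weibull survival function P(X > t)."""
    return math.exp(-((t / scale) ** shape))

# Hypothetical margins, for illustration only (not estimates from the paper).
F_SHAPE, F_SCALE = 9.0, 90.0
M_SHAPE, M_SCALE = 7.5, 86.0

def S_indep(t1, t2):
    """Joint survival function under independence (product of the margins)."""
    return weibull_surv(t1, F_SHAPE, F_SCALE) * weibull_surv(t2, M_SHAPE, M_SCALE)

x1, x2, t = 65, 68, 10
p_last = last_survivor_prob(S_indep, x1, x2, t)

# Under independence, (3) must reduce to tp_x1 + tp_x2 - tp_x1 * tp_x2.
p1 = weibull_surv(x1 + t, F_SHAPE, F_SCALE) / weibull_surv(x1, F_SHAPE, F_SCALE)
p2 = weibull_surv(x2 + t, M_SHAPE, M_SCALE) / weibull_surv(x2, M_SHAPE, M_SCALE)
print(p_last, p1 + p2 - p1 * p2)
```

Any dependent model for $S$ can be passed in place of `S_indep` without changing `last_survivor_prob`.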
Bowers et al. (1997) recommend a simpler formula, which is practically applicable under partial independence between $X_1$ and $X_2$:
$${}_tp_{\overline{x_1x_2}} = {}_tp_{x_1} + {}_tp_{x_2} - {}_tp_{x_1x_2}, \qquad (4)$$
where
$${}_tp_{x_1x_2} = P(X_1 > x_1 + t,\, X_2 > x_2 + t \mid X_1 > x_1,\, X_2 > x_2). \qquad (5)$$
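The Python sketch below combines (1), (4) and (5) to price a last survivor policy. For illustration only, it assumes independent Weibull lifelengths with hypothetical parameters, so that (5) is the product of the margins as in Model I. It also truncates the infinite sum in (1) after a fixed horizon. The helper names `last_survivor_premium`, `weibull_cond` and `kp_last` are hypothetical and are not part of the original study.

```python
import math

def last_survivor_premium(kp_last, i, horizon=120):
    """Formula (1): net single premium A for a unit benefit paid at the end of
    the year of the second death, via the annuity-due value
        a_due = sum_{k>=0} (1/(1+i))**k * kp_last(k),   A = 1 - i/(1+i) * a_due.
    The infinite sum is truncated after `horizon` years (tail assumed negligible).
    """
    v = 1.0 / (1.0 + i)
    a_due = sum(v ** k * kp_last(k) for k in range(horizon + 1))
    return 1.0 - i / (1.0 + i) * a_due, a_due

def weibull_cond(t, x, shape, scale):
    """tp_x = P(X > x + t | X > x) for a Weibull(shape, scale) lifelength."""
    return math.exp(-(((x + t) / scale) ** shape - (x / scale) ** shape))

# Hypothetical entry ages and interest rate, for illustration only.
X1, X2, RATE = 70, 73, 0.05

def kp_last(k):
    p1 = weibull_cond(k, X1, 9.0, 90.0)   # wife (hypothetical parameters)
    p2 = weibull_cond(k, X2, 7.5, 86.0)   # husband (hypothetical parameters)
    p12 = p1 * p2                         # (5) under independence (Model I)
    return p1 + p2 - p12                  # (4)

A, a_due = last_survivor_premium(kp_last, RATE)
print(round(A, 4), round(a_due, 4))
```

A simple sanity check is a geometric survival curve ${}_kp = q^k$. For that curve, (1) gives $A = (1-q)/(1+i-q)$ exactly, which equals 0.5 for $q = 0.95$ and $i = 0.05$.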
3. DATA DESCRIPTION
In the following sections we suggest two approaches to the estimation of the probabilities (2) and use a numerical example for illustration. The data set comes from 14,947 joint last survivor annuity contracts of a large Canadian insurer. The contracts were in payoff status over the observation period from December 29, 1988 through December 31, 1993. For each contract, we have information on:
- the date of birth,
- the date of death (if applicable),
- the date of contract initiation (entry age),
- the sex of each annuitant (paired data).
Each couple was included in the data set only once; multiple contracts for the same couple were eliminated. Information from 11,457 pairs (mostly married couples) was used in the study. Following insurance industry practice, we rounded the entry ages $x_1$ and $x_2$ and the ages at death $X_1$ and $X_2$ to the nearest integer. Note that our data are left truncated and right censored, which causes some additional difficulties. Additionally, current female and male mortality tables were provided courtesy of a major Minnesota insurance company. They were used for elicitation of the hyperparameters of the Bayesian models.

4. MLE IN COPULA MODEL
In some recent studies (see Hougaard (1986), Hougaard et al. (1992), Frees et al. (1996)), the method of copula functions was suggested for the construction of joint survival functions. According to this method, the joint survival function of $X_1$ and $X_2$ is represented as
$$S(t_1, t_2) = C\big(P(X_1 > t_1),\, P(X_2 > t_2)\big),$$
where $C(u, v)$ is a copula, a function with special properties that mixes the univariate survival functions $u$ and $v$ through an association parameter. Frees et al. (1996) suggest using two-parameter Gompertz or Weibull univariate female and male survival functions and Frank's copula
$$C(u, v) = u + v - 1 - \frac{1}{\alpha}\ln\left(1 + \frac{\left(e^{-\alpha(1-u)} - 1\right)\left(e^{-\alpha(1-v)} - 1\right)}{e^{-\alpha} - 1}\right)$$
with the association parameter $\alpha \ge 0$ ($\alpha = 0$, understood as the limit $\alpha \to 0$, corresponds to independent univariate lifelengths). The maximum likelihood estimator is then constructed for the five-dimensional vector parameter. The first four components are the parameters of the Gompertz or Weibull univariate survival functions, and the last one is the association parameter $\alpha$. Another alternative is Hougaard's copula
$$C(u, v) = \exp\left\{-\left[(-\ln u)^{\alpha} + (-\ln v)^{\alpha}\right]^{1/\alpha}\right\}$$
with the association parameter $\alpha \ge 1$ ($\alpha = 1$ in the case of independence). Following the general approach of Frees et al. (1996), we construct a Weibull-Hougaard copula model. We assume that the joint survival function $S(t_1, t_2)$ has the form of Hougaard's copula mixing two-parameter Weibull univariate female and male survival functions with scale parameters $\theta_j$ and shape parameters $\tau_j$:
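$$S(t_1, t_2) = \exp\left\{-\left[\left(\frac{t_1}{\theta_1}\right)^{\tau_1\alpha} + \left(\frac{t_2}{\theta_2}\right)^{\tau_2\alpha}\right]^{1/\alpha}\right\}. \qquad (6)$$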
We then compute the maximum likelihood estimates of the five parameters of this model, taking into account right censoring and left truncation. The estimates are presented in Table 3. Finally, we can evaluate the probabilities (2) directly from formula (3).
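A minimal Python sketch of the Weibull-Hougaard joint survival function (6), plugged into formula (3). It uses the Model ML estimates reported in Table 3 and hypothetical entry ages (70 and 73). It only evaluates the fitted model and does not reproduce the censored, left-truncated MLE fit. The function names are illustrative.

```python
import math

def hougaard_weibull_S(t1, t2, tau1, theta1, tau2, theta2, alpha):
    """Joint survival function (6): Hougaard copula with Weibull margins.

    tau_j: shape, theta_j: scale, alpha >= 1: association (1 = independence).
    """
    s = (t1 / theta1) ** (tau1 * alpha) + (t2 / theta2) ** (tau2 * alpha)
    return math.exp(-(s ** (1.0 / alpha)))

def last_survivor_prob(S, x1, x2, t):
    """Formula (3) for an arbitrary joint survival function S."""
    return (S(x1 + t, x2) + S(x1, x2 + t) - S(x1 + t, x2 + t)) / S(x1, x2)

# Model ML estimates from Table 3, ordered (tau1, theta1, tau2, theta2, alpha).
ML = (9.96, 89.51, 7.65, 85.98, 1.64)

def S_ml(t1, t2):
    return hougaard_weibull_S(t1, t2, *ML)

# Hypothetical entry ages 70 (wife) and 73 (husband), 10-year horizon.
print(last_survivor_prob(S_ml, 70, 73, 10))
```

Setting $t_2 = 0$ recovers the female Weibull margin for every $\alpha$, which is the copula property $C(u, 1) = u$ discussed in subsection 4.2.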
Unfortunately, some issues raise the question of whether maximum likelihood estimation, or the copula approach as used in this situation, is adequate. We illustrate these issues with our data set in the following two subsections.

4.1. Age and Chronology: Shape of the Copula Surface
The graphs below depict the surfaces $(t_1, t_2, S(t_1, t_2))$ and $(t_1, t_2, \partial^2 S(t_1, t_2)/\partial t_1 \partial t_2)$ built according to model (6), with the scale $\theta_j$ and shape $\tau_j$ values estimated by MLE (see Table 3). The association $\alpha$ is allowed to vary.
Figure 1. Shapes of Copula Surfaces
[Figure 1 panels, with axes $t_1$ and $t_2$ ranging from 50 to 100: 1.1 surfaces $(t_1, t_2, S(t_1, t_2))$ and $(t_1, t_2, \partial^2 S/\partial t_1 \partial t_2)$ with $\alpha = 1$; 1.2 the same surfaces with $\alpha = 2$; 1.3 the same surfaces with $\alpha = 10$.]
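The density surfaces in Figure 1 can be approximated numerically. The sketch below assumes that the Model ML margins from Table 3 are representative of the MLE values used for the figure. It approximates the joint density $\partial^2 S/\partial t_1 \partial t_2$ of model (6) by central differences and, for a few fixed $t_1$, reports where the density peaks along $t_2$. The step size `h`, the grid and the helper names are arbitrary illustrative choices.

```python
import math

def hougaard_weibull_S(t1, t2, tau1, theta1, tau2, theta2, alpha):
    """Joint survival function (6)."""
    s = (t1 / theta1) ** (tau1 * alpha) + (t2 / theta2) ** (tau2 * alpha)
    return math.exp(-(s ** (1.0 / alpha)))

def joint_density(t1, t2, params, h=0.05):
    """Central-difference approximation of d^2 S / (dt1 dt2), the joint density."""
    S = lambda a, b: hougaard_weibull_S(a, b, *params)
    return (S(t1 + h, t2 + h) - S(t1 + h, t2 - h)
            - S(t1 - h, t2 + h) + S(t1 - h, t2 - h)) / (4.0 * h * h)

def weibull_pdf(t, shape, scale):
    """Univariate Weibull density, used to check the alpha = 1 case."""
    z = (t / scale) ** shape
    return shape / t * z * math.exp(-z)

# Margins (tau1, theta1, tau2, theta2) from the Model ML row of Table 3.
MARGINS = (9.96, 89.51, 7.65, 85.98)
grid = [50 + 0.5 * k for k in range(101)]  # t2 from 50 to 100
for alpha in (1.0, 2.0, 10.0):             # the alpha values of Figure 1
    params = MARGINS + (alpha,)
    for t1 in (75, 85, 95):
        # location of the density maximum along t2 for this fixed t1
        t2_peak = max(grid, key=lambda t2: joint_density(t1, t2, params))
        print(alpha, t1, t2_peak)
```

For $\alpha = 1$ the approximation should match the product of the two Weibull densities, which is a convenient check of the difference scheme.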
In Figure 1 one can observe a "ridge" of higher values of the joint density, approximately along the diagonal $(t_1, t_1 - 10)$. The higher the association, the steeper the slope of the ridge. According to copula model (6), the pairs $(t_1, t_2)$ under this ridge are the ages of higher life hazard for a couple. Thus a higher association means a higher life hazard for a woman when she is approximately 10 years older than her husband was when he died. However, at least part of the association between the times of the spouses' deaths is due to common-disaster or broken-heart factors. Therefore, one would expect an increased number of cases in which the spouses' deaths occur close to each other in real (chronological) time. This increased mortality corresponds to ages $(t_1, t_1 + d)$, where $d$ is the actual age difference between the spouses. These points, depending on $d$, lie along different diagonals of the $(t_1, t_2)$ plane, not directly under the ridge seen in the graphs above. This casts some doubt on whether common-disaster and broken-heart factors are adequately represented by copula model (6). Following this argument, we may suspect that the copula model misrepresents the association, which leads to its underestimation by MLE. This should be relatively easy to trace, because there is a direct relationship between the association parameter $\alpha$ in Hougaard's model and Kendall's nonparametric correlation (see, e.g., Frees and Valdez (1998) and Youn and Shemyakin (1999)).

4.2 Drifting Univariate Parameters
How would underestimation of $\alpha$ in model (6) affect the MLEs of the shape and scale parameters of the underlying two-parameter Weibull distributions? A copula function $C(u, v)$ satisfies $C(u, 1) = u$ and $C(1, v) = v$, so it preserves the marginal survival functions. Therefore one would expect that, whatever the value of $\alpha$, model (6) gives accurate estimates of $\tau_j$ and $\theta_j$.
However, the following table shows the result of fixing a value of $\alpha$ and then obtaining the MLEs of $\tau_j$ and $\theta_j$.
Table 1. MLE of Weibull Parameters for Given $\alpha$
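$\alpha$ | 1 | 1.2 | 1.4 | 1.6 | 1.8 | 2 | 3
FEMALE
shape $\tau_1$ | 9.95 | 10.27 | 10.25 | 10.02 | 9.68 | 9.30 | 7.36
scale $\theta_1$ | 92.68 | 91.17 | 90.22 | 89.6 | 89.19 | 88.91 | 88.27
mean lifelength | 88.15 | 86.83 | 85.92 | 85.25 | 84.73 | 84.32 | 82.78
standard deviation of lifelength | 10.65 | 10.19 | 10.10 | 10.23 | 10.51 | 10.86 | 13.28
MALE
shape $\tau_2$ | 7.95 | 7.98 | 7.87 | 7.69 | 7.46 | 7.21 | 5.91
scale $\theta_2$ | 86.34 | 86.15 | 86.04 | 85.98 | 85.96 | 85.96 | 85.99
mean lifelength | 81.29 | 81.12 | 80.96 | 80.81 | 80.67 | 80.53 | 79.71
standard deviation of lifelength | 12.13 | 12.06 | 12.20 | 12.44 | 12.77 | 13.16 | 15.66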
There is a clear downward trend in the scale parameter values as $\alpha$ increases. The trend is even easier to see in the mean lifelengths (third row for each sex). This casts doubt on the assumption that underestimating $\alpha$ has no effect on the estimation of $\tau_j$ and $\theta_j$.
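The mean lifelengths in Table 1 can be recomputed from the fitted shape and scale. For a Weibull lifelength, the mean is scale·Γ(1 + 1/shape) and the standard deviation is scale·sqrt(Γ(1 + 2/shape) − Γ(1 + 1/shape)²). The sketch below applies these textbook formulas to the α = 1 and α = 3 columns of Table 1. By our own arithmetic, the results appear to match the third and fourth rows to about the printed precision, which is why we read the fourth row as the standard deviation of lifelength.

```python
import math

def weibull_mean_sd(shape, scale):
    """Mean and standard deviation of a Weibull(shape, scale) lifelength."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return scale * g1, scale * math.sqrt(g2 - g1 * g1)

# (shape, scale) pairs from Table 1, columns alpha = 1 and alpha = 3.
columns = [("female", 1, 9.95, 92.68), ("female", 3, 7.36, 88.27),
           ("male", 1, 7.95, 86.34), ("male", 3, 5.91, 85.99)]
for sex, alpha, shape, scale in columns:
    m, s = weibull_mean_sd(shape, scale)
    print(sex, alpha, round(m, 2), round(s, 2))
```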
5. BAYESIAN MODELS
There are at least three reasons for considering a Bayesian approach instead of direct MLE in a full copula model. First, there is a substantial amount of prior knowledge concerning the univariate survival functions (insurance industry experience, census data, etc.). Second, Bayesian methods might help to overcome the issues discussed in subsections 4.1 and 4.2. Third, Bayesian methods prove to be less sensitive than MLE to left truncation of the data and to possible underreporting of the first death, which seems to be a common problem for last-survivor insurance data. If formula (4) is used, then the evaluation of ${}_tp_{\overline{x_1x_2}}$ requires only knowledge of the univariate survival functions and of the conditional probability
$${}_tp_{x_1x_2} = \frac{P(X_1 > x_1 + t,\, X_2 > x_2 + t)}{P(X_1 > x_1,\, X_2 > x_2)}.$$
Consider a vector observation $y = (t, c, x_1, x_2)$, where $x_1$ and $x_2$ are the entry ages of the female and male partners respectively, $c$ is the censoring indicator, and $t$ is the termination time. Termination is defined as the first death in the couple, i.e., failure of the joint-life status. If termination is observed, then $c = 0$ (no censoring). Otherwise, $t$ is the end of the observation period ($c = 1$). Let us assume that
$${}_tp_{x_1x_2} = \exp\left\{-\left(w_1^{\alpha} + w_2^{\alpha}\right)^{1/\alpha}\right\}, \qquad (7)$$
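where, for $j = 1, 2$,
$$w_j = w(t, x_j, \tau_j, \theta_j), \qquad w(t, x, \tau, \theta) = \left(\frac{x+t}{\theta}\right)^{\tau} - \left(\frac{x}{\theta}\right)^{\tau}.$$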
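A minimal Python sketch of the conditional cumulative hazard $w$ and of formula (7). The parameter values are the Model B1 estimates from Table 3, and the entry ages (70 and 73) are hypothetical. With $\alpha = 1$ the formula collapses to the product of the two conditional Weibull survival probabilities. For $\alpha > 1$ the result lies between that product and the smaller of the two probabilities, because the Hougaard copula is positively quadrant dependent.

```python
import math

def w(t, x, tau, theta):
    """Conditional Weibull cumulative hazard between ages x and x + t."""
    return ((x + t) / theta) ** tau - (x / theta) ** tau

def joint_life_prob(t, x1, x2, alpha, tau1, theta1, tau2, theta2):
    """Formula (7): P(both spouses survive t more years | alive at entry ages)."""
    w1 = w(t, x1, tau1, theta1)
    w2 = w(t, x2, tau2, theta2)
    return math.exp(-((w1 ** alpha + w2 ** alpha) ** (1.0 / alpha)))

# Model B1 estimates from Table 3; the entry ages 70 and 73 are hypothetical.
B1 = dict(alpha=1.81, tau1=8.61, theta1=89.60, tau2=7.20, theta2=87.06)
for t in (0, 5, 10, 20):
    print(t, round(joint_life_prob(t, 70, 73, **B1), 4))
```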
For each value of $(x_1, x_2)$, (7) is a Hougaard copula function built on Weibull univariate survival functions conditioned on the entry ages. In this way we resolve the issue discussed in subsection 4.1: conditioning on the entry ages $(x_1, x_2)$ eliminates the conflict between physical age and chronological time. The vector of parameters is therefore $(\alpha, \tau_1, \theta_1, \tau_2, \theta_2)$. Here $\alpha$ is the association parameter, and the conditional Weibull survival functions have shape parameters $\tau_j$ and scale parameters $\theta_j$. The following informative priors reflect the substantial information on the univariate survival functions (cf. mortality tables) and the limited knowledge of $\alpha$. Although $\tau_j$ and $\theta_j$ cannot take negative values, normal priors still seem reasonable, since the prior means $\mu_j$ and $\varphi_j$ are more than ten standard deviations away from zero.
Model B1: $\tau_j \sim N(\mu_j, \sigma_j)$; $\theta_j \sim N(\varphi_j, \psi_j)$; $\alpha - 1 \sim G(a, b)$.
Model B2: $\tau_j \sim N(\mu_j, \sigma_j)$; $\theta_j \sim N(\varphi_j, \psi_j)$; $\pi(\alpha) \propto 1$, $\alpha \ge 1$.
Values of the hyperparameters are determined by resampling from industry mortality tables. We use the following values:
Table 2. Hyperparameter Values
 | $\mu_j$ | $\sigma_j$ | $\varphi_j$ | $\psi_j$
Female ($j = 1$) | 8.535 | 0.454 | 89.62 | 0.412
Male ($j = 2$) | 7.097 | 0.485 | 86.96 | 0.422
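A sketch of the log prior densities of Models B1 and B2 with the Table 2 hyperparameters. Two readings here are our own assumptions, not statements from the paper. First, the second argument of N(·, ·) is taken as a standard deviation; WinBUGS's `dnorm`, by contrast, is parametrized by a precision. Second, G(a, b) is taken as a gamma density with shape a and rate b, and a and b are left as inputs. The B2 prior is improper, so its log density is defined only up to an additive constant. All function names are hypothetical.

```python
import math

def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def gamma_logpdf(x, shape, rate):
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)

# Table 2: (mu_j, sigma_j) for the shape tau_j, (phi_j, psi_j) for the scale theta_j.
HYPER = {1: (8.535, 0.454, 89.62, 0.412), 2: (7.097, 0.485, 86.96, 0.422)}

def _log_prior_margins(params):
    """Normal priors on the Weibull parameters (second argument read as an SD)."""
    _, tau1, theta1, tau2, theta2 = params
    total = 0.0
    for j, (tau, theta) in ((1, (tau1, theta1)), (2, (tau2, theta2))):
        mu, sigma, phi, psi = HYPER[j]
        total += normal_logpdf(tau, mu, sigma) + normal_logpdf(theta, phi, psi)
    return total

def log_prior_b1(params, a, b):
    """Model B1: alpha - 1 ~ Gamma(shape a, rate b); params = (alpha, tau1, theta1, tau2, theta2)."""
    alpha = params[0]
    if alpha <= 1.0:
        return -math.inf
    return _log_prior_margins(params) + gamma_logpdf(alpha - 1.0, a, b)

def log_prior_b2(params):
    """Model B2: flat improper prior on alpha >= 1 (log density up to a constant)."""
    if params[0] < 1.0:
        return -math.inf
    return _log_prior_margins(params)

p = (2.0, 8.535, 89.62, 7.097, 86.96)
print(log_prior_b1(p, 1.0, 1.0), log_prior_b2(p))
```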
In Model B1 we take $a = 1$ and $b = 1$. If we consider $c_i = 0$ for $i = 1, \ldots, k$ and $c_i = 1$ for $i = k + 1, \ldots, n$, the likelihood function for the sample $(y_1, \ldots, y_n)$ of size $n$has the form
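$$l(\alpha, \tau_j, \theta_j \mid y_1, \ldots, y_n) = \exp\left\{-\sum_{i=1}^{n}\left(\sum_{j=1}^{2} w_{ji}^{\alpha}\right)^{1/\alpha}\right\} \prod_{i=1}^{k}\left[\left(\sum_{j=1}^{2} w_{ji}^{\alpha}\right)^{1/\alpha - 1} \sum_{j=1}^{2} \frac{\tau_j}{\theta_j^{\tau_j}}\,(x_{ji} + t_i)^{\tau_j - 1}\, w_{ji}^{\alpha - 1}\right],$$
where $w_{ji} = w(t_i, x_{ji}, \tau_j, \theta_j)$.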
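The likelihood above can be coded directly. In the Python sketch below, each observation is a tuple `(t, c, x1, x2)`. Censored observations contribute the joint-life survival probability (7), and observed first deaths contribute its negative time derivative, which is the bracketed factor times the exponential term. The function names and the toy observations are hypothetical and serve only for illustration. The parameter values are the Model B1 estimates from Table 3.

```python
import math

def w(t, x, tau, theta):
    """Conditional Weibull cumulative hazard between ages x and x + t."""
    return ((x + t) / theta) ** tau - (x / theta) ** tau

def joint_life_surv(t, x1, x2, params):
    """Formula (7) with params = (alpha, tau1, theta1, tau2, theta2)."""
    alpha, tau1, theta1, tau2, theta2 = params
    s = w(t, x1, tau1, theta1) ** alpha + w(t, x2, tau2, theta2) ** alpha
    return math.exp(-(s ** (1.0 / alpha)))

def log_likelihood(params, data):
    """Log-likelihood for tuples (t, c, x1, x2).

    c = 1: censored at t, contributes the survival probability (7);
    c = 0: first death observed at t, contributes the density -d(7)/dt.
    Observed deaths are assumed to have t > 0.
    """
    alpha, tau1, theta1, tau2, theta2 = params
    ll = 0.0
    for t, c, x1, x2 in data:
        terms = [(x1, tau1, theta1), (x2, tau2, theta2)]
        ws = [w(t, x, tau, th) for x, tau, th in terms]
        s = sum(wj ** alpha for wj in ws)
        ll -= s ** (1.0 / alpha)  # log of (7), for every observation
        if c == 0:
            dsum = sum(tau / th ** tau * (x + t) ** (tau - 1.0) * wj ** (alpha - 1.0)
                       for (x, tau, th), wj in zip(terms, ws))
            ll += (1.0 / alpha - 1.0) * math.log(s) + math.log(dsum)
    return ll

# Model B1 estimates (Table 3); the observations are hypothetical.
B1 = (1.81, 8.61, 89.60, 7.20, 87.06)
toy_data = [(3.5, 0, 72, 75), (5.0, 1, 68, 70), (1.2, 0, 80, 79)]
print(log_likelihood(B1, toy_data))
```

In a sampler, this function would simply be added to a log prior. The "ones" trick mentioned below is only a WinBUGS device for supplying such a custom likelihood.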
Estimates of the posterior means for Model B1 were obtained by MCMC in WinBUGS 1.3, using the popular "ones" trick. However, this trick failed for the improper prior in Model B2. In that case an optimization routine was used to estimate the posterior modes of the parameters of interest.

6. COMPARISON OF MODELS
The results of applying the two Bayesian models to the data set described above are presented in Table 3 below. We also include the results from two further models. Model ML is maximum likelihood estimation in the full Weibull-Hougaard copula model (6) (see Section 4). Model I assumes no association: the marginal survival functions are estimated separately by maximum likelihood for a Weibull parametric model under the assumption of independence.
Table 3. Parameter Estimates
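Model | Female shape $\tau_1$ | Female scale $\theta_1$ | Male shape $\tau_2$ | Male scale $\theta_2$ | Association $\alpha$
Model B1 | 8.61 | 89.60 | 7.20 | 87.06 | 1.81
Model B2 | 8.83 | 89.68 | 8.12 | 87.10 | 1.82
Model ML | 9.96 | 89.51 | 7.65 | 85.98 | 1.64
Model I | 9.98 | 92.62 | 7.94 | 86.32 | 1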
Table 3 shows that both Bayesian models suggest a higher value of $\alpha$ than Model ML, which agrees with the discussion in Section 4. Table 4 below gives a more detailed account of the WinBUGS output for Model B1.
Table 4. MCMC Parameter Estimates
node | mean | sd | MC error | 2.5% | median | 97.5% | start | sample
$\alpha$ | 1.812 | 0.7293 | 0.0148 | 1.049 | 1.612 | 3.723 | 4001 | 5999
$\tau_1$ | 8.614 | 0.655 | 0.0185 | 7.379 | 8.61 | 9.846 | 4001 | 5999
$\theta_1$ | 89.59 | 0.6332 | 0.0182 | 88.34 | 89.59 | 90.82 | 4001 | 5999
$\tau_2$ | 7.202 | 0.68 | 0.0167 | 5.847 | 7.224 | 8.523 | 4001 | 5999
$\theta_2$ | 87.06 | 0.6324 | 0.0170 | 85.81 | 87.07 | 88.29 | 4001 | 5999
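Section 4.1 mentions a direct relationship between Hougaard's $\alpha$ and Kendall's correlation, so the estimates can be restated on the more familiar Kendall scale. For the Hougaard (Gumbel) copula, Kendall's tau equals $1 - 1/\alpha$. The sketch below applies this to the point estimates of Table 3 and to the posterior quantiles of Table 4. The function name is illustrative.

```python
def kendall_tau_hougaard(alpha):
    """Kendall's tau for the Hougaard (Gumbel) copula: 1 - 1/alpha."""
    if alpha < 1.0:
        raise ValueError("Hougaard association requires alpha >= 1")
    return 1.0 - 1.0 / alpha

# Point estimates of alpha from Table 3.
estimates = {"B1": 1.81, "B2": 1.82, "ML": 1.64, "I": 1.0}
for model, alpha in estimates.items():
    print(model, round(kendall_tau_hougaard(alpha), 3))

# The map is increasing, so the posterior quantiles of alpha (Table 4) map
# directly to quantiles of Kendall's tau; the posterior mean does not.
for label, q in (("2.5%", 1.049), ("median", 1.612), ("97.5%", 3.723)):
    print(label, round(kendall_tau_hougaard(q), 3))
```

The wide interval that results for Kendall's tau reflects the large posterior standard deviation of $\alpha$ in Table 4.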
A direction for further research is to use the more general formula (3) instead of (4). This will require a more complicated model than (7). An alternative approach, which emphasizes the age difference between the spouses, is developed in Youn and Shemyakin (1999).

The authors gratefully acknowledge the support of the Society of Actuaries. Partial support for this work was also provided by an internal research grant of the University of St. Thomas. We also thank Tom Louis for valuable comments and the many participants in the ISBA 2000 poster session for fruitful discussions.
REFERENCES
Anderson, J.E., Louis, T.A., Holm, N.V. and Harvald, B. (1992). Time Dependent Association Measures for Bivariate Survival Functions. J. Amer. Statist. Assoc. 87, 419.
Berger, J.O. and Sun, D. (1993). Bayesian Analysis for the Poly-Weibull Distribution. J. Amer. Statist. Assoc. 88, 1412–1418.
Bogdanoff, D.A. and Pierce, D.A. (1973). Bayes Fiducial Inference for the Weibull Distribution. J. Amer. Statist. Assoc. 68, 659–664.
Bowers, N., Gerber, H., Hickman, J., Jones, D. and Nesbitt, C. (1997). Actuarial Mathematics. Schaumburg, Ill.: Society of Actuaries.
Dellaportas, P. and Wright, D.E. (1991). Numerical Prediction for the Two-parameter Weibull Distribution. The Statistician 40, 365–372.
Frees, E., Carriere, J. and Valdez, E. (1996). Annuity Valuation with Dependent Mortality. Journal of Risk and Insurance 63, 229.
Frees, E. and Valdez, E. (1998). Understanding Relationships Using Copulas. North American Actuarial Journal 2, 1–25.
Hougaard, P. (1986). A Class of Multivariate Failure Time Distributions. Biometrika 73, 671–678.
Hougaard, P., Harvald, B. and Holm, N.V. (1992). Measuring the Similarities Between the Lifetimes of Adult Twins Born 1881–1930. J. Amer. Statist. Assoc. 87, 17.
Smith, R.L. and Naylor, J.C. (1987). A Comparison of Maximum Likelihood and Bayesian Estimators for the Three-parameter Weibull Distribution. Applied Statistics 36, 358–369.
Youn, H. and Shemyakin, A. (1999). Statistical Aspects of Joint Life Insurance Pricing. ASA 1999 Proceedings, Business and Economic Statistics Section, 34–38.