Introduction to Hierarchical Modeling
CAS RPM Seminar, Las Vegas, March 2009
Jim Guszcza and Bill Stergiou, Deloitte Consulting LLP
Antitrust Notice The Casualty Actuarial Society is committed to adhering strictly to the letter and spirit of the antitrust laws. Seminars conducted under the auspices of the CAS are designed solely to provide a forum for the expression of various points of view on topics described in the programs or agendas for such meetings. Under no circumstances shall CAS seminars be used as a means for competing companies or firms to reach any understanding – expressed or implied – that restricts competition or in any way impairs the ability of members to exercise independent business judgment regarding matters affecting competition. It is the responsibility of all seminar participants to be aware of antitrust regulations, to prevent any written or verbal discussions that appear to violate these laws, and to adhere in every respect to the CAS antitrust compliance policy.
Copyright © 2009 Deloitte Development LLC. All rights reserved.
Topics
• Hierarchical Modeling Theory
• Sample Hierarchical Model
• Hierarchical Models and Credibility Theory
• Case Study #1: Poisson Regression
• Case Study #2: Loss Reserving
Hierarchical Model Theory
What is Hierarchical Modeling?
• Hierarchical modeling is used when one's data is grouped in some important way:
  – Claim experience by state or territory
  – Workers Comp claim experience by class code
  – Income by profession
  – Claim severity by injury type
  – Churn rate by agency
  – Multiple years of loss experience by policyholder
  – …
• Often grouped data is modeled either by:
  – Pooling the data and introducing dummy variables to reflect the groups
  – Building separate models by group
• Hierarchical modeling offers a "third way".
  – Parameters reflecting group membership enter one's model through appropriately specified probability sub-models.
What's in a Name?
• Hierarchical models go by many different names:
  – Mixed effects models
  – Random effects models
  – Multilevel models
  – Longitudinal models
  – Panel data models
• We prefer the "hierarchical model" terminology because it evokes the way models-within-models are used to reflect levels-within-levels of one's data.
• An important special case of hierarchical models involves multiple observations through time of each unit.
  – Here group membership is the set of repeated observations belonging to each individual.
  – Time is the covariate.
Varying Slopes and Intercepts
(In the diagrams, each line represents a different group.)
• Random Intercept Model: the intercept varies by group; the slope stays constant.
• Random Slope Model: the intercept stays constant; the slope varies by group.
• Random Intercept / Random Slope Model: both the intercept and the slope vary by group.
Common Hierarchical Models
• Notation:
  – Data points (X_i, Y_i), i = 1…N
  – j[i]: data point i belongs to group j.

• Classical Linear Model
  Y_i = α + βX_i + ε_i
  – Equivalently: Y_i ~ N(α + βX_i, σ²)
  – Same α and β for every data point

• Random Intercept Model
  Y_i = α_j[i] + βX_i + ε_i
  – Where α_j ~ N(μ_α, σ_α²) and ε_i ~ N(0, σ²)
  – Same β for every data point; but α varies by group

• Random Intercept and Slope Model
  Y_i = α_j[i] + β_j[i]·X_i + ε_i
  – Where (α_j, β_j) ~ N(μ, Σ) and ε_i ~ N(0, σ²)
  – Both α and β vary by group
  – Equivalently: Y_i ~ N(α_j[i] + β_j[i]·X_i, σ²),
    where (α_j, β_j)ᵀ ~ N((μ_α, μ_β)ᵀ, Σ) and Σ = [[σ_α², σ_αβ], [σ_αβ, σ_β²]]
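To make the random intercept specification concrete, here is a short simulation sketch. All numbers (8 groups, 4 points each, the β and variance values) are illustrative assumptions loosely echoing the PIF example, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 8 groups, 4 observations each (like the PIF example)
J, n_per = 8, 4
mu_alpha, sigma_alpha = 2000.0, 150.0   # alpha_j ~ N(mu_alpha, sigma_alpha^2)
beta, sigma = 100.0, 25.0               # common slope; eps_i ~ N(0, sigma^2)

alpha = rng.normal(mu_alpha, sigma_alpha, size=J)   # one intercept per group
group = np.repeat(np.arange(J), n_per)              # j[i]: group of point i
x = np.tile(np.arange(n_per), J).astype(float)      # covariate (e.g., year index)

# Random intercept model: Y_i = alpha_j[i] + beta * X_i + eps_i
y = alpha[group] + beta * x + rng.normal(0.0, sigma, size=J * n_per)

for j in range(J):
    print(f"group {j}: true alpha = {alpha[j]:7.1f}")
```

Every data point shares the same β, but each group is shifted by its own α_j drawn from the N(μ_α, σ_α²) sub-model.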
Parameters and Hyperparameters
• We can rewrite the random intercept model this way:
  Y_i ~ N(α_j[i] + βX_i, σ²)
  α_j ~ N(μ_α, σ_α²)
• Suppose there are 100 levels: j = 1, 2, …, 100 (e.g. SIC bytes 1–2)
• This model contains 101 parameters: {α₁, α₂, …, α₁₀₀, β}.
• And it contains 4 hyperparameters: {μ_α, σ_α, β, σ}.
• Here is how the hyperparameters relate to the parameters:
  α̂_j = Z_j·(ȳ_j − β·x̄_j) + (1 − Z_j)·μ̂_α
  where Z_j = n_j / (n_j + σ²/σ_α²)
• Does this formula look familiar?
Sample Hierarchical Model
Example
• Suppose we wish to model a company's policies in force (PIF), by region, for the years 2005–08.
• 8 regions * 4 years = 32 data points.
• One way to visualize the data: plot all of the data points on the same graph, using different colors/symbols to represent region.
  [Figure: "Policies in Force by Year and Region" – pif (roughly 2000–2700) vs. year (2005–2008), with symbols 1–8 marking the eight regions.]
• Alternate way: use a trellis-style display, with one plot per region.
  – A more immediate representation of the data's hierarchical structure.
  – (see next slide)
Trellis-Style Data Display
• We wish to build a model that captures the change in PIF over time.
• We must reflect the fact that PIF varies by region.
  [Figure: trellis plot – one panel per region (region1–region8), each showing pif (2000–2600) vs. year (2005–2008).]
Option 1: Simple Regression
• The easiest thing to do is to pool the data across groups – i.e., simply ignore region.
• Fit a simple linear model:
  PIF = α + βt
• Alas, this model is not appropriate for all regions.
  [Figure: the same trellis plot (region1–region8), with the single pooled regression line overlaid on every panel.]
Option 2: Separate Models by Region
• At the other extreme, we can fit a separate simple linear model for each region:
  PIF = α_k + β_k·t,  k = 1, 2, …, 8
• Each model is fit with only 4 data points.
• Introduces the danger of over-fitting the data.
  [Figure: trellis plot (region1–region8), with a separately fitted regression line in each panel.]
Option 3: Random Intercept Hierarchical Model
• Compromise: reflect the region group structure using a hierarchical model.
  PIF_i ~ N(α_j[i] + βt_i, σ²)
  α_j ~ N(μ_α, σ_α²)
  [Figure: trellis plot (region1–region8), with the random-intercept fit in each panel – parallel lines with region-specific intercepts.]
Compromise Between Complete Pooling & No Pooling
• Complete pooling: PIF = α + βt
  – Ignores group structure altogether.
• No pooling: PIF = α_k + β_k·t,  k = 1, 2, …, 8
  – Estimates one model for each group.
• Hierarchical model: PIF_i ~ N(α_j[i] + βt_i, σ²), α_j ~ N(μ_α, σ_α²)
  – Estimates parameters using a compromise between the complete pooling and no pooling methodologies.
Option 1b: Adding Dummy Variables
• Question: of course it'd be crazy to fit a separate SLR for each region. But what about adding 8 region dummy variables to the SLR?
  PIF = α₁R₁ + α₂R₂ + … + α₈R₈ + βt
• If we do this, we need to estimate 9 parameters instead of 2.
• In contrast, the random intercept model contains 4 hyperparameters: {μ_α, σ_α, β, σ}.
• Now suppose our example contained 800 regions. If we use dummy variables, our SLR potentially requires that we estimate 801 parameters.
• But the random intercept model will still contain the same 4 hyperparameters.
Varying Slopes
• The random intercept model is a compromise between a "pooled" SLR and a separate SLR by region:
  PIF_i ~ N(α_j[i] + βt_i, σ²)
  α_j ~ N(μ_α, σ_α²)
• But there is nothing sacred about the intercept term: we can also allow the slopes to vary by region:
  Y_i ~ N(α_j[i] + β_j[i]·X_i, σ²)
  where (α_j, β_j)ᵀ ~ N((μ_α, μ_β)ᵀ, Σ) and Σ = [[σ_α², σ_αβ], [σ_αβ, σ_β²]]
• In the dummy variable option (1b) this would require us to interact region with the time variable t… i.e., it would return us to option 2.
  – Great danger of overparameterization.
• Adding random slopes adds considerable flexibility at the cost of only two additional hyperparameters:
  – Random slope only: {α, μ_β, σ_β, σ}
  – Random slope & intercept: {μ_α, σ_α, μ_β, σ_β, σ_αβ, σ}
Option 4: Random Slope & Intercept Hierarchical Model
• We can similarly include a sub-model for the slope β:
  PIF_i ~ N(α_j[i] + β_j[i]·t_i, σ²)
  where (α_j, β_j)ᵀ ~ N((μ_α, μ_β)ᵀ, Σ) and Σ = [[σ_α², σ_αβ], [σ_αβ, σ_β²]]
  [Figure: trellis plot (region1–region8), with the random slope & intercept fit in each panel – each region gets its own intercept and slope.]
Does Adding Random Slopes Improve the Model?
• How do we determine whether adding the random slope term improves the model?
1. Graphical analysis and judgment:
  – The random slopes arguably yield an improved fit for Region 5.
  – But it looks like the random slope model might be overfitting Region 3.
  – The other regions are a wash.
2. Out-of-sample lift analysis.
3. Akaike Information Criterion [AIC]: −2·LL + 2·d.f.
  – Random intercept AIC: 380.40
  – Random intercept & slope AIC: 380.64
  – Slight deterioration → better to select the random intercept model.
• Random slopes don't help in this example, but they are a very powerful form of variable interaction to consider in one's modeling projects.
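The AIC comparison above can be reproduced in a few lines. The log-likelihoods below are backed out from the slide's AIC values under the assumption that the degrees of freedom equal the hyperparameter counts (4 and 6); the slides do not state the d.f. actually used:

```python
def aic(log_lik: float, df: int) -> float:
    """Akaike Information Criterion: -2*LL + 2*df."""
    return -2.0 * log_lik + 2.0 * df

# Log-likelihoods implied by the slide's AIC values (assumed df = 4 and 6)
ll_intercept = (2 * 4 - 380.40) / 2.0   # random intercept model
ll_slope = (2 * 6 - 380.64) / 2.0       # random intercept & slope model

aic_int = aic(ll_intercept, df=4)
aic_both = aic(ll_slope, df=6)

# Lower AIC wins: the extra slope flexibility is not worth 2 more hyperparameters
best = "random intercept" if aic_int < aic_both else "random intercept & slope"
print(aic_int, aic_both, best)
```

The penalty term 2·d.f. is what lets AIC trade goodness-of-fit against model complexity.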
Parameter Comparison
• It is important to distinguish between each model's parameters and hyperparameters:
  – SLR: 2 parameters and 2 hyperparameters ({α, β})
  – Random intercept: 9 parameters and 4 hyperparameters ({μ_α, σ_α, β, σ})
  – Random intercept & slope: 16 parameters and 6 hyperparameters ({μ_α, σ_α, μ_β, σ_β, σ_αβ, σ})

            SLR                 random intercept      random intercept & slope
  region    intercept  slope    intercept  slope      intercept  slope
  1         2068.0     100.1    1911.3     100.1      1999.3      70.3
  2         2068.0     100.1    2087.8     100.1      2070.2     111.2
  3         2068.0     100.1    2236.1     100.1      2137.0     137.4
  4         2068.0     100.1    2267.3     100.1      2159.6     133.2
  5         2068.0     100.1    1980.3     100.1      2033.1      79.3
  6         2068.0     100.1    1932.3     100.1      2008.9      73.8
  7         2068.0     100.1    2066.8     100.1      2066.3     101.2
  8         2068.0     100.1    2061.8     100.1      2069.5      94.1

• How do the hyperparameters relate to the parameters?
Connection with Credibility Theory
Hierarchical Models and Credibility Theory
• Let's revisit the random intercept model:
  PIF_i ~ N(α_j[i] + βt_i, σ²)
  α_j ~ N(μ_α, σ_α²)
• This is how we calculate the random intercepts {α₁, α₂, …, α₈}:
  α̂_j = Z_j·(ȳ_j − β·t̄_j) + (1 − Z_j)·μ̂_α
  where Z_j = n_j / (n_j + σ²/σ_α²)
• Therefore: each random intercept is a credibility-weighted average between:
  – The intercept from the pooled model (option 1)
  – The intercept from the region-specific model (option 2)
Within Variance and Between Variance
  α̂_j = Z_j·(ȳ_j − β·t̄_j) + (1 − Z_j)·μ̂_α
  where Z_j = n_j / (n_j + σ²/σ_α²)
• Low within variance (σ) and high between variance (σ_α) → high credibility (Z_j) → more like separate models (option 2).
• High within variance (σ) and low between variance (σ_α) → low credibility (Z_j) → more like complete pooling (option 1).
Hierarchical Models and Credibility Theory
• This makes precise the sense in which the random intercept model is a compromise between the pooled-data model (option 1) and the separate models for each region (option 2):
  α̂_j = Z_j·(ȳ_j − β·t̄_j) + (1 − Z_j)·μ̂_α
  where Z_j = n_j / (n_j + σ²/σ_α²)
• As σ_α → 0, the random intercept model → option 1 (complete pooling).
• As σ_α → ∞, the random intercept model → option 2 (separate models).
• Aside: what happens to the above formula if we remove the covariate t from our random intercept model?
Bühlmann's Credibility and Random Intercepts
• If we remove the time covariate t from the random intercept model, we are left with a very familiar formula:
  α̂_j = Z_j·ȳ_j + (1 − Z_j)·μ̂_α
  where Z_j = n_j / (n_j + σ²/σ_α²)
• Therefore: Bühlmann's credibility model is a specific instance of hierarchical models.
• The theory of hierarchical models gives one a practical way to integrate credibility theory into one's GLM modeling activities.
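The Bühlmann formula is easy to sketch in code. The numbers below (grand mean 100, raw group mean 130, the variances) are made-up illustrative values, chosen only to show how the credibility weight Z_j responds to group size:

```python
def buhlmann_estimate(y_bar_j, n_j, mu_hat, sigma, sigma_alpha):
    """Credibility-weighted group estimate:
       alpha_hat_j = Z_j * y_bar_j + (1 - Z_j) * mu_hat,
       where Z_j = n_j / (n_j + sigma^2 / sigma_alpha^2)."""
    k = (sigma / sigma_alpha) ** 2        # within / between variance ratio
    z = n_j / (n_j + k)
    return z * y_bar_j + (1.0 - z) * mu_hat, z

# Two groups with the same raw mean (130) but very different sizes
est_small, z_small = buhlmann_estimate(130.0, n_j=4, mu_hat=100.0,
                                       sigma=20.0, sigma_alpha=10.0)
est_large, z_large = buhlmann_estimate(130.0, n_j=100, mu_hat=100.0,
                                       sigma=20.0, sigma_alpha=10.0)

print(z_small, est_small)   # small group: heavy shrinkage toward 100
print(z_large, est_large)   # large group: estimate stays near 130
```

With n_j = 4 and σ²/σ_α² = 4, Z_j = 0.5 and the estimate lands halfway between the raw group mean and the grand mean; with n_j = 100, the group's own data dominate.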
Sample Applications
• Territorial ratemaking, or including territory in a GLM analysis.
  – The large number of territories typically presents a problem.
• Vehicle symbol analysis.
• WC or BOP business class analysis.
• Repeated observations by policyholder:
  – Experience rating
  – Loss reserving (short introduction to follow)
Summing Up
• Hierarchical models are applicable when one's data comes grouped in one or more important ways.
• A grouping variable with a large number of levels might be regarded as a "massively categorical variable"…
  – Building separate models by level, or including one dummy variable per level, is often impractical or unwise from a credibility point of view.
• Hierarchical models offer a compromise between complete pooling and separate models per level.
  – This compromise captures the essential idea of credibility theory.
• Therefore hierarchical models enable a practical unification of two pillars of actuarial modeling:
  – Generalized Linear Models
  – Credibility theory
Other Thoughts
• The "credibility weighting" reflected in the calculation of the random effects represents a "shrinkage" of the group-level parameters (α_j, β_j) toward their means (μ_α, μ_β).
• The lower the "between variance" (σ_α²), the greater the amount of "shrinkage" or "pooling".
• There is more shrinkage for groups with fewer observations (n_j).
• Panel data analysis is a type of hierarchical modeling: it is a natural framework for analyzing longitudinal datasets.
  – Multiple observations of the same policyholder
  – Loss reserving: loss development is multiple observations of the same AY's claims
Case Study #1: Hierarchical Poisson Regression
Modeling Claim Frequency
• Personal auto dataset: 67K observations.
• Build Poisson claim frequency models.
• AREA and BODY_TYPE are highly categorical variables.
  – We can treat these as dummy variables or as random intercepts.
  – Note: several levels of BODY_TYPE have few exposures.
Model #1: Standard Poisson Regression
• We build a 4-factor model:
  – Vehicle value
  – Driver age
  – Area (territory)
  – Vehicle body type
• Many levels of AREA and BODY_TYPE are not statistically significant.
• Note: levels of BODY_TYPE with few exposures have large GLM parameters.
• Dilemma: should we exclude these levels, judgmentally temper them, or keep them as-is?
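The dilemma above can be illustrated with a small synthetic simulation (not the 67K-row auto dataset): when a level has few exposures, its raw claim frequency can be wildly extreme purely by chance, which is what drives the large GLM parameters. A credibility-style tempering toward the overall mean, with an illustrative (not estimated) constant k, tames the sparse level:

```python
import numpy as np

rng = np.random.default_rng(42)
true_freq = 0.10                       # same underlying frequency everywhere

# Two body-type levels: one with many exposures, one with few
claims_big = rng.poisson(true_freq, size=10_000)   # e.g., Sedans
claims_small = rng.poisson(true_freq, size=30)     # e.g., a sparse level

raw_big = claims_big.mean()    # close to 0.10
raw_small = claims_small.mean()  # can easily be 0.0 or 0.2+ by chance

# Buhlmann-style shrinkage toward the overall mean (k is illustrative)
overall = (claims_big.sum() + claims_small.sum()) / (10_000 + 30)
k = 200.0
z_small = 30 / (30 + k)
shrunk_small = z_small * raw_small + (1 - z_small) * overall

print(raw_big, raw_small, shrunk_small)
```

This is exactly what the random-intercept treatment of BODY_TYPE does automatically, with the shrinkage constant estimated from the data rather than assumed.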
Model #2: Random Intercepts for Area and Body Type
• Rather than use dummy variables for AREA and BODY_TYPE, we can introduce "random effects".
• The methodology is equally applicable even with many more levels.
Model #3: Add Vehicle Value Random Slope
• Intuition: the relationship between vehicle value and claim frequency might vary by type of vehicle.
• Response: introduce random slopes for VEH_VALUE.
Model Comparison
• Shrinkage: the hierarchical model estimates (random intercept; random intercept & slope) are less extreme than the standard GLM estimates.
• Different stories: all models agree for (e.g.) Sedans (10K+ exposures) but tell much different stories for (e.g.) Coupes (300 exposures).
  [Figure: fitted claim frequency (0.0–0.5) vs. veh_value for Area = A, Age Group = 3; one panel per body type (BUS, CONVT, COUPE, RDSTR, SEDAN, UTE), each comparing the GLM, random intercept, and random intercept & slope fits.]
Case Study #2: Hierarchical Growth Curve Loss Reserving Model
Hierarchical Modeling for Loss Reserving
• Here is a garden-variety loss triangle (Dave Clark, CAS Forum 2003):

Cumulative Losses in 1000's
AY    |    12     24     36     48     60     72     84     96    108    120 | reported  est ult  reserve
1991  |   358  1,125  1,735  2,183  2,746  3,320  3,466  3,606  3,834  3,901 |   3,901    3,901        0
1992  |   352  1,236  2,170  3,353  3,799  4,120  4,648  4,914  5,339        |   5,339    5,434       95
1993  |   291  1,292  2,219  3,235  3,986  4,133  4,629  4,909               |   4,909    5,379      470
1994  |   311  1,419  2,195  3,757  4,030  4,382  4,588                      |   4,588    5,298      710
1995  |   443  1,136  2,128  2,898  3,403  3,873                             |   3,873    4,858      985
1996  |   396  1,333  2,181  2,986  3,692                                    |   3,692    5,111    1,419
1997  |   441  1,288  2,420  3,483                                           |   3,483    5,672    2,189
1998  |   359  1,421  2,864                                                  |   2,864    6,787    3,922
1999  |   377  1,363                                                         |   1,363    5,644    4,281
2000  |   344                                                                |     344    4,971    4,627
total                                                                        |  34,358   53,055   18,697

chain link   |  3.491  1.747  1.455  1.176  1.104  1.086  1.054  1.077  1.018  1.000
chain ldf    | 14.451  4.140  2.369  1.628  1.384  1.254  1.155  1.096  1.018  1.000
growth curve |   6.9%  24.2%  42.2%  61.4%  72.2%  79.7%  86.6%  91.3%  98.3% 100.0%

• We can regard this as a longitudinal dataset.
  – Grouping dimension: Accident Year (AY)
• We can build a parsimonious non-linear model that uses random effects to allow the model parameters to vary by accident year.
Growth Curves
• Let's build a non-linear model of the loss triangle.
• GLM shows up a lot in the stochastic loss reserving literature. But… are GLMs natural models for loss triangles?
• Alternative: use a growth curve G(x | ω, θ) to model the loss development process.
• 2-parameter curves:
  – θ = scale
  – ω = shape
  – Weibull: G(x | ω, θ) = 1 − exp(−(x/θ)^ω)
  – Loglogistic: G(x | ω, θ) = x^ω / (x^ω + θ^ω)
• Basic idea: we fit these curves to the LDFs and add random effects to ω and/or θ to allow the curves to vary by year.
  [Figure: Weibull and loglogistic growth curves – cumulative percent of ultimate (0.0–1.0) vs. age in months (12–180).]
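The two growth curves are simple enough to sketch directly. The parameter values in the loop are illustrative placeholders, not fitted values:

```python
import math

def weibull_growth(x: float, omega: float, theta: float) -> float:
    """Weibull growth curve: G(x | omega, theta) = 1 - exp(-(x/theta)^omega)."""
    return 1.0 - math.exp(-((x / theta) ** omega))

def loglogistic_growth(x: float, omega: float, theta: float) -> float:
    """Loglogistic growth curve: G(x | omega, theta) = x^omega / (x^omega + theta^omega)."""
    return x ** omega / (x ** omega + theta ** omega)

# Illustrative parameters: omega = shape, theta = scale, x = age in months.
# Both curves rise from 0 toward 1 as the age x grows.
for x in (12, 36, 60, 84, 108):
    print(x,
          round(weibull_growth(x, 1.3, 47.0), 3),
          round(loglogistic_growth(x, 1.3, 47.0), 3))
```

Note the interpretations of θ: at x = θ the loglogistic curve is exactly at 50% of ultimate, while the Weibull curve is at 1 − e⁻¹ ≈ 63.2%.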
Baseline Model: Heuristics
• The basic intuition is familiar: (CL_{AY,t}) × (LDF_t) = ultimate loss
  – Equivalently: CL_{AY,t} = (Ult loss_AY) × (1 / LDF_t)
  – Which suggests: CL_{AY,t} = (Ult loss_AY) × G_{ω,θ}(t) + error
• The model:
  CumLoss_{AY,dev} = ULT_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
  ULT_AY ~ N(μ_ULT, σ_ULT²)
  Var(ε_{AY,dev}) = σ² · CL̂_{AY,dev}
• The "growth curve" part comes in by using G(t) instead of LDFs.
  – Think of LDFs as a rough piecewise linear approximation to a G(t).
• The "hierarchical" part comes in because we can let ULT_AY, ω, and/or θ vary by AY (using sub-models).
Including Exposures in the Model
• Our model so far:
  CumLoss_{AY,dev} = ULT_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
  ULT_AY ~ N(μ_ULT, σ_ULT²)
  Var(ε_{AY,dev}) = σ² · CL̂_{AY,dev}
• What if we wish to include an exposure measure in the model? It's easily done:
  CumLoss_{AY,dev} = prem_AY · LR_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
  LR_AY ~ N(μ_LR, σ_LR²)
  Var(ε_{AY,dev}) = σ² · CL̂_{AY,dev}
• prem_AY is given; the LR_AY are parameters; μ_LR and σ_LR are hyperparameters.
• μ_LR and σ_LR replace μ_ULT and σ_ULT.
• μ_LR is essentially a "Cape Cod"-style LR estimate for all years combined.
• {LR₁₉₉₁, LR₁₉₉₂, …, LR₂₀₀₀} are "credibility weighted" LR estimates for each of the individual accident years.
Other "Random Effects"
• Our model so far:
  CumLoss_{AY,dev} = ULT_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
  ULT_AY ~ N(μ_ULT, σ_ULT²)
  Var(ε_{AY,dev}) = σ² · CL̂_{AY,dev}
• What if we want to include other random effects in the model? It's easily done:
  CumLoss_{AY,dev} = ULT_AY · [1 − exp(−(dev/θ)^{ω_AY})] + ε_{AY,dev}
  (ULT_AY, ω_AY)ᵀ ~ N((μ_ULT, μ_ω)ᵀ, Σ)
  Var(ε_{AY,dev}) = σ² · CL̂_{AY,dev}
• Here we add a "random warp" effect to let ω vary by AY.
• Can also add a "random scale" (θ) effect if we want.
• We can compare AIC and diagnostic plots to judge whether this improves the model.
Baseline Model Performance
• Cumulative losses @ dev = (ultimate losses) × (modeled growth):
  CumLoss_{AY,dev} = ULT_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
  ULT_AY ~ N(μ_ULT, σ_ULT²)
  Var(ε_{AY,dev}) = σ² · CL̂_{AY,dev}
• We must estimate the hyperparameters: {μ_ULT, ω, θ, σ_ULT, σ}.
• Random effects are added to the ultimate loss (ULT) parameter.
  – Analogous to random intercepts.
• Random shape (ω) and scale (θ) effects were tested and found not to be significant.
  [Figure: "Weibull Growth Curve Loss Development Model" – one panel per AY (1991–2000), cumulative losses (1000–6000) vs. months of development (20–100), with the fitted curve overlaid on the observed points.]
Baseline Model Performance (continued)
  CumLoss_{AY,dev} = ULT_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
  ULT_AY ~ N(μ_ULT, σ_ULT²)
  Var(ε_{AY,dev}) = σ² · CL̂_{AY,dev}
• The random effects allow a "custom fit" growth curve for each AY while maintaining parsimony.
• The model contains only 5 hyperparameters, but fits the loss triangle very well.
  [Figure: the same trellis of fitted Weibull growth curves by AY (1991–2000).]
Residual Diagnostics
• An advantage of stochastic reserving in general – and this method in particular – is that it enables us to use residual diagnostic analysis.
  [Figure: six diagnostic plots – residual histogram, normal Q-Q plot, actual vs. predicted, residuals vs. predicted, residuals vs. development time, and residuals vs. accident year.]
Model Results
• The overall o/s reserve estimate is close to that of the chain ladder ($18.7M).

Chain ladder method (per the earlier triangle):
AY      reported  est ult  reserve
1991       3,901    3,901        0
1992       5,339    5,434       95
1993       4,909    5,379      470
1994       4,588    5,298      710
1995       3,873    4,858      985
1996       3,692    5,111    1,419
1997       3,483    5,672    2,189
1998       2,864    6,787    3,922
1999       1,363    5,644    4,281
2000         344    4,971    4,627
total     34,358   53,055   18,697

Parameters and Estimated Reserves – Baseline Model
AY     dev   omega   theta   growth  reported  eval120  eval240     ULT  reserves
1991   114   1.306  46.638    96.0%     3,901    3,943    4,073   4,074       172
1992   102   1.306  46.638    93.8%     5,339    5,239    5,412   5,413        74
1993    90   1.306  46.638    90.6%     4,909    5,207    5,379   5,380       470
1994    78   1.306  46.638    85.9%     4,588    5,423    5,602   5,603     1,015
1995    66   1.306  46.638    79.3%     3,873    4,777    4,935   4,936     1,062
1996    54   1.306  46.638    70.2%     3,692    5,052    5,219   5,220     1,528
1997    42   1.306  46.638    58.2%     3,483    5,512    5,694   5,695     2,212
1998    30   1.306  46.638    43.0%     2,864    5,850    6,043   6,044     3,180
1999    18   1.306  46.638    25.0%     1,363    5,255    5,429   5,430     4,067
2000     6   1.306  46.638     6.6%       344    5,101    5,270   5,271     4,927
total                                                            53,066    18,708

• These are the 12 hierarchical model parameters (the 10 ULT_AY plus ω and θ); μ_ULT = 5,306.6.
• ULT₁₉₉₈ is lower in the hierarchical model than in the chain ladder: cumulative losses @ 36 months are disproportionately high for 1998, and this data point has more leverage in the chain ladder method.
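As a sanity check on the results table, the fitted Weibull growth curve with ω = 1.306 and θ = 46.638 reproduces the tabulated growth percentages, and each AY's reserve is approximately its eval240 value minus reported losses:

```python
import math

omega, theta = 1.306, 46.638   # fitted hyperparameters from the baseline model

def growth(dev: float) -> float:
    """Fitted Weibull growth curve: fraction of ultimate reported at age dev."""
    return 1.0 - math.exp(-((dev / theta) ** omega))

# dev (months) -> growth % from the results table
table = {114: 96.0, 102: 93.8, 90: 90.6, 78: 85.9, 66: 79.3,
         54: 70.2, 42: 58.2, 30: 43.0, 18: 25.0, 6: 6.6}

for dev, pct in table.items():
    print(dev, round(100 * growth(dev), 1), pct)   # the two columns agree

# Reserve check for AY 1991: eval240 - reported = 4,073 - 3,901 = 172
reserve_1991 = 4073 - 3901
print(reserve_1991)
```

This is also a quick way to see where the 18,708 total comes from: each AY's reserve is its projected losses at a late evaluation age minus losses reported to date.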
References

Clark, David R. (2003). "LDF Curve Fitting and Stochastic Loss Reserving: A Maximum Likelihood Approach," CAS Forum.

de Jong, Piet and Heller, Gillian (2008). Generalized Linear Models for Insurance Data. New York: Cambridge University Press.

Frees, Edward (2006). Longitudinal and Panel Data: Analysis and Applications in the Social Sciences. New York: Cambridge University Press.

Gelman, Andrew and Hill, Jennifer (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. New York: Cambridge University Press.

Guszcza, James (2008). "Hierarchical Growth Curve Models for Loss Reserving," CAS Forum.

Pinheiro, José and Bates, Douglas (2000). Mixed-Effects Models in S and S-PLUS. New York: Springer-Verlag.