Introduction to Hierarchical Modeling
CAS RPM Seminar, Las Vegas, March 2009
Jim Guszcza and Bill Stergiou, Deloitte Consulting LLP
Antitrust Notice The Casualty Actuarial Society is committed to adhering strictly to the letter and spirit of the antitrust laws. Seminars conducted under the auspices of the CAS are designed solely to provide a forum for the expression of various points of view on topics described in the programs or agendas for such meetings. Under no circumstances shall CAS seminars be used as a means for competing companies or firms to reach any understanding – expressed or implied – that restricts competition or in any way impairs the ability of members to exercise independent business judgment regarding matters affecting competition. It is the responsibility of all seminar participants to be aware of antitrust regulations, to prevent any written or verbal discussions that appear to violate these laws, and to adhere in every respect to the CAS antitrust compliance policy.
Copyright © 2009 Deloitte Development LLC. All rights reserved.
Topics
• Hierarchical Modeling Theory
• Sample Hierarchical Model
• Hierarchical Models and Credibility Theory
• Case Study #1: Poisson Regression
• Case Study #2: Loss Reserving
Hierarchical Model Theory
What is Hierarchical Modeling?
• Hierarchical modeling is used when one's data is grouped in some important way:
  – Claim experience by state or territory
  – Workers Comp claim experience by class code
  – Income by profession
  – Claim severity by injury type
  – Churn rate by agency
  – Multiple years of loss experience by policyholder
  – …
• Often grouped data is modeled either by:
  – Pooling the data and introducing dummy variables to reflect the groups
  – Building separate models by group
• Hierarchical modeling offers a "third way".
  – Parameters reflecting group membership enter one's model through appropriately specified probability sub-models.
What's in a Name?
• Hierarchical models go by many different names:
  – Mixed effects models
  – Random effects models
  – Multilevel models
  – Longitudinal models
  – Panel data models
• We prefer the "hierarchical model" terminology because it evokes the way models-within-models are used to reflect levels-within-levels of one's data.
• An important special case of hierarchical models involves multiple observations through time of each unit.
  – Here group membership is the set of repeated observations belonging to each individual.
  – Time is the covariate.
Varying Slopes and Intercepts
(In the diagrams, each line represents a different group.)
• Random Intercept Model: the intercept varies by group; the slope stays constant.
• Random Slope Model: the intercept stays constant; the slope varies by group.
• Random Intercept / Random Slope Model: both the intercept and the slope vary by group.
Common Hierarchical Models
• Notation:
  – Data points (X_i, Y_i), i = 1…N
  – j[i]: data point i belongs to group j.

• Classical Linear Model
  Y_i = α + βX_i + ε_i
  – Equivalently: Y_i ~ N(α + βX_i, σ²)
  – Same α and β for every data point

• Random Intercept Model
  Y_i = α_j[i] + βX_i + ε_i
  – Where α_j ~ N(μ_α, σ_α²) and ε_i ~ N(0, σ²)
  – Same β for every data point; but α varies by group

• Random Intercept and Slope Model
  Y_i = α_j[i] + β_j[i]·X_i + ε_i
  – Where (α_j, β_j) ~ N(μ, Σ) and ε_i ~ N(0, σ²)
  – Both α and β vary by group
  – Equivalently: Y_i ~ N(α_j[i] + β_j[i]·X_i, σ²),
    where (α_j, β_j)ᵀ ~ N((μ_α, μ_β)ᵀ, Σ) and Σ = [[σ_α², σ_αβ], [σ_αβ, σ_β²]]
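To make the random intercept specification concrete, here is a short simulation sketch. All numbers (8 groups, 4 points each, the β and variance values) are illustrative assumptions loosely echoing the PIF example, not values from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 8 groups, 4 observations each (like the PIF example)
J, n_per = 8, 4
mu_alpha, sigma_alpha = 2000.0, 150.0   # alpha_j ~ N(mu_alpha, sigma_alpha^2)
beta, sigma = 100.0, 25.0               # common slope; eps_i ~ N(0, sigma^2)

alpha = rng.normal(mu_alpha, sigma_alpha, size=J)   # one intercept per group
group = np.repeat(np.arange(J), n_per)              # j[i]: group of point i
x = np.tile(np.arange(n_per), J).astype(float)      # covariate (e.g., year index)

# Random intercept model: Y_i = alpha_j[i] + beta * X_i + eps_i
y = alpha[group] + beta * x + rng.normal(0.0, sigma, size=J * n_per)

for j in range(J):
    print(f"group {j}: true alpha = {alpha[j]:7.1f}")
```

Every data point shares the same β, but each group is shifted by its own α_j drawn from the N(μ_α, σ_α²) sub-model.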
Parameters and Hyperparameters
• We can rewrite the random intercept model this way:
  Y_i ~ N(α_j[i] + βX_i, σ²)
  α_j ~ N(μ_α, σ_α²)
• Suppose there are 100 levels: j = 1, 2, …, 100 (e.g. SIC bytes 1–2)
• This model contains 101 parameters: {α₁, α₂, …, α₁₀₀, β}.
• And it contains 4 hyperparameters: {μ_α, σ_α, β, σ}.
• Here is how the hyperparameters relate to the parameters:
  α̂_j = Z_j·(ȳ_j − β·x̄_j) + (1 − Z_j)·μ̂_α
  where Z_j = n_j / (n_j + σ²/σ_α²)
• Does this formula look familiar?
Sample Hierarchical Model
Example
• Suppose we wish to model a company's policies in force (PIF), by region, for the years 2005–08.
• 8 regions * 4 years = 32 data points.
• One way to visualize the data: plot all of the data points on the same graph, using different colors/symbols to represent region.
  [Figure: "Policies in Force by Year and Region" – pif (roughly 2000–2700) vs. year (2005–2008), with symbols 1–8 marking the eight regions.]
• Alternate way: use a trellis-style display, with one plot per region.
  – A more immediate representation of the data's hierarchical structure.
  – (see next slide)
Trellis-Style Data Display
• We wish to build a model that captures the change in PIF over time.
• We must reflect the fact that PIF varies by region.
  [Figure: trellis plot – one panel per region (region1–region8), each showing pif (2000–2600) vs. year (2005–2008).]
Option 1: Simple Regression
• The easiest thing to do is to pool the data across groups – i.e., simply ignore region.
• Fit a simple linear model:
  PIF = α + βt
• Alas, this model is not appropriate for all regions.
  [Figure: the same trellis plot (region1–region8), with the single pooled regression line overlaid on every panel.]
Option 2: Separate Models by Region
• At the other extreme, we can fit a separate simple linear model for each region:
  PIF = α_k + β_k·t,  k = 1, 2, …, 8
• Each model is fit with only 4 data points.
• Introduces the danger of over-fitting the data.
  [Figure: trellis plot (region1–region8), with a separately fitted regression line in each panel.]
Option 3: Random Intercept Hierarchical Model
• Compromise: reflect the region group structure using a hierarchical model.
  PIF_i ~ N(α_j[i] + βt_i, σ²)
  α_j ~ N(μ_α, σ_α²)
  [Figure: trellis plot (region1–region8), with the random-intercept fit in each panel – parallel lines with region-specific intercepts.]
Compromise Between Complete Pooling & No Pooling
• Complete pooling: PIF = α + βt
  – Ignores group structure altogether.
• No pooling: PIF = α_k + β_k·t,  k = 1, 2, …, 8
  – Estimates one model for each group.
• Hierarchical model: PIF_i ~ N(α_j[i] + βt_i, σ²), α_j ~ N(μ_α, σ_α²)
  – Estimates parameters using a compromise between the complete pooling and no pooling methodologies.
Option 1b: Adding Dummy Variables
• Question: of course it'd be crazy to fit a separate SLR for each region. But what about adding 8 region dummy variables to the SLR?
  PIF = α₁R₁ + α₂R₂ + … + α₈R₈ + βt
• If we do this, we need to estimate 9 parameters instead of 2.
• In contrast, the random intercept model contains 4 hyperparameters: {μ_α, σ_α, β, σ}.
• Now suppose our example contained 800 regions. If we use dummy variables, our SLR potentially requires that we estimate 801 parameters.
• But the random intercept model will still contain the same 4 hyperparameters.
Varying Slopes
• The random intercept model is a compromise between a "pooled" SLR and a separate SLR by region:
  PIF_i ~ N(α_j[i] + βt_i, σ²)
  α_j ~ N(μ_α, σ_α²)
• But there is nothing sacred about the intercept term: we can also allow the slopes to vary by region:
  Y_i ~ N(α_j[i] + β_j[i]·X_i, σ²)
  where (α_j, β_j)ᵀ ~ N((μ_α, μ_β)ᵀ, Σ) and Σ = [[σ_α², σ_αβ], [σ_αβ, σ_β²]]
• In the dummy variable option (1b) this would require us to interact region with the time variable t… i.e., it would return us to option 2.
  – Great danger of overparameterization.
• Adding random slopes adds considerable flexibility at the cost of only two additional hyperparameters:
  – Random slope only: {α, μ_β, σ_β, σ}
  – Random slope & intercept: {μ_α, σ_α, μ_β, σ_β, σ_αβ, σ}
Option 4: Random Slope & Intercept Hierarchical Model
• We can similarly include a sub-model for the slope β:
  PIF_i ~ N(α_j[i] + β_j[i]·t_i, σ²)
  where (α_j, β_j)ᵀ ~ N((μ_α, μ_β)ᵀ, Σ) and Σ = [[σ_α², σ_αβ], [σ_αβ, σ_β²]]
  [Figure: trellis plot (region1–region8), with the random slope & intercept fit in each panel – each region gets its own intercept and slope.]
Does Adding Random Slopes Improve the Model?
• How do we determine whether adding the random slope term improves the model?
1. Graphical analysis and judgment:
  – The random slopes arguably yield an improved fit for Region 5.
  – But it looks like the random slope model might be overfitting Region 3.
  – The other regions are a wash.
2. Out-of-sample lift analysis.
3. Akaike Information Criterion [AIC]: −2·LL + 2·d.f.
  – Random intercept AIC: 380.40
  – Random intercept & slope AIC: 380.64
  – Slight deterioration → better to select the random intercept model.
• Random slopes don't help in this example, but they are a very powerful form of variable interaction to consider in one's modeling projects.
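The AIC comparison above can be reproduced in a few lines. The log-likelihoods below are backed out from the slide's AIC values under the assumption that the degrees of freedom equal the hyperparameter counts (4 and 6); the slides do not state the d.f. actually used:

```python
def aic(log_lik: float, df: int) -> float:
    """Akaike Information Criterion: -2*LL + 2*df."""
    return -2.0 * log_lik + 2.0 * df

# Log-likelihoods implied by the slide's AIC values (assumed df = 4 and 6)
ll_intercept = (2 * 4 - 380.40) / 2.0   # random intercept model
ll_slope = (2 * 6 - 380.64) / 2.0       # random intercept & slope model

aic_int = aic(ll_intercept, df=4)
aic_both = aic(ll_slope, df=6)

# Lower AIC wins: the extra slope flexibility is not worth 2 more hyperparameters
best = "random intercept" if aic_int < aic_both else "random intercept & slope"
print(aic_int, aic_both, best)
```

The penalty term 2·d.f. is what lets AIC trade goodness-of-fit against model complexity.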
Parameter Comparison
• It is important to distinguish between each model's parameters and hyperparameters:
  – SLR: 2 parameters and 2 hyperparameters ({α, β})
  – Random intercept: 9 parameters and 4 hyperparameters ({μ_α, σ_α, β, σ})
  – Random intercept & slope: 16 parameters and 6 hyperparameters ({μ_α, σ_α, μ_β, σ_β, σ_αβ, σ})

            SLR                 random intercept      random intercept & slope
  region    intercept  slope    intercept  slope      intercept  slope
  1         2068.0     100.1    1911.3     100.1      1999.3      70.3
  2         2068.0     100.1    2087.8     100.1      2070.2     111.2
  3         2068.0     100.1    2236.1     100.1      2137.0     137.4
  4         2068.0     100.1    2267.3     100.1      2159.6     133.2
  5         2068.0     100.1    1980.3     100.1      2033.1      79.3
  6         2068.0     100.1    1932.3     100.1      2008.9      73.8
  7         2068.0     100.1    2066.8     100.1      2066.3     101.2
  8         2068.0     100.1    2061.8     100.1      2069.5      94.1

• How do the hyperparameters relate to the parameters?
Connection with Credibility Theory
Hierarchical Models and Credibility Theory
• Let's revisit the random intercept model:
  PIF_i ~ N(α_j[i] + βt_i, σ²)
  α_j ~ N(μ_α, σ_α²)
• This is how we calculate the random intercepts {α₁, α₂, …, α₈}:
  α̂_j = Z_j·(ȳ_j − β·t̄_j) + (1 − Z_j)·μ̂_α
  where Z_j = n_j / (n_j + σ²/σ_α²)
• Therefore: each random intercept is a credibility-weighted average between:
  – The intercept from the pooled model (option 1)
  – The intercept from the region-specific model (option 2)
Within Variance and Between Variance
  α̂_j = Z_j·(ȳ_j − β·t̄_j) + (1 − Z_j)·μ̂_α
  where Z_j = n_j / (n_j + σ²/σ_α²)
• Low within variance (σ) and high between variance (σ_α) → high credibility (Z_j) → more like separate models (option 2).
• High within variance (σ) and low between variance (σ_α) → low credibility (Z_j) → more like complete pooling (option 1).
Hierarchical Models and Credibility Theory
• This makes precise the sense in which the random intercept model is a compromise between the pooled-data model (option 1) and the separate models for each region (option 2):
  α̂_j = Z_j·(ȳ_j − β·t̄_j) + (1 − Z_j)·μ̂_α
  where Z_j = n_j / (n_j + σ²/σ_α²)
• As σ_α → 0, the random intercept model → option 1 (complete pooling).
• As σ_α → ∞, the random intercept model → option 2 (separate models).
• Aside: what happens to the above formula if we remove the covariate t from our random intercept model?
Bühlmann's Credibility and Random Intercepts
• If we remove the time covariate t from the random intercept model, we are left with a very familiar formula:
  α̂_j = Z_j·ȳ_j + (1 − Z_j)·μ̂_α
  where Z_j = n_j / (n_j + σ²/σ_α²)
• Therefore: Bühlmann's credibility model is a specific instance of hierarchical models.
• The theory of hierarchical models gives one a practical way to integrate credibility theory into one's GLM modeling activities.
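The Bühlmann formula is easy to sketch in code. The numbers below (grand mean 100, raw group mean 130, the variances) are made-up illustrative values, chosen only to show how the credibility weight Z_j responds to group size:

```python
def buhlmann_estimate(y_bar_j, n_j, mu_hat, sigma, sigma_alpha):
    """Credibility-weighted group estimate:
       alpha_hat_j = Z_j * y_bar_j + (1 - Z_j) * mu_hat,
       where Z_j = n_j / (n_j + sigma^2 / sigma_alpha^2)."""
    k = (sigma / sigma_alpha) ** 2        # within / between variance ratio
    z = n_j / (n_j + k)
    return z * y_bar_j + (1.0 - z) * mu_hat, z

# Two groups with the same raw mean (130) but very different sizes
est_small, z_small = buhlmann_estimate(130.0, n_j=4, mu_hat=100.0,
                                       sigma=20.0, sigma_alpha=10.0)
est_large, z_large = buhlmann_estimate(130.0, n_j=100, mu_hat=100.0,
                                       sigma=20.0, sigma_alpha=10.0)

print(z_small, est_small)   # small group: heavy shrinkage toward 100
print(z_large, est_large)   # large group: estimate stays near 130
```

With n_j = 4 and σ²/σ_α² = 4, Z_j = 0.5 and the estimate lands halfway between the raw group mean and the grand mean; with n_j = 100, the group's own data dominate.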
Sample Applications
• Territorial ratemaking, or including territory in a GLM analysis.
  – The large number of territories typically presents a problem.
• Vehicle symbol analysis.
• WC or BOP business class analysis.
• Repeated observations by policyholder:
  – Experience rating
  – Loss reserving (short introduction to follow)
Summing Up
• Hierarchical models are applicable when one's data comes grouped in one or more important ways.
• A grouping variable with a large number of levels might be regarded as a "massively categorical variable"…
  – Building separate models by level, or including one dummy variable per level, is often impractical or unwise from a credibility point of view.
• Hierarchical models offer a compromise between complete pooling and separate models per level.
  – This compromise captures the essential idea of credibility theory.
• Therefore hierarchical models enable a practical unification of two pillars of actuarial modeling:
  – Generalized Linear Models
  – Credibility theory
Other Thoughts
• The "credibility weighting" reflected in the calculation of the random effects represents a "shrinkage" of the group-level parameters (α_j, β_j) toward their means (μ_α, μ_β).
• The lower the "between variance" (σ_α²), the greater the amount of "shrinkage" or "pooling".
• There is more shrinkage for groups with fewer observations (n_j).
• Panel data analysis is a type of hierarchical modeling: it is a natural framework for analyzing longitudinal datasets.
  – Multiple observations of the same policyholder
  – Loss reserving: loss development is multiple observations of the same AY's claims
Case Study #1: Hierarchical Poisson Regression
Modeling Claim Frequency
• Personal auto dataset: 67K observations.
• Build Poisson claim frequency models.
• AREA and BODY_TYPE are highly categorical variables.
  – We can treat these as dummy variables or as random intercepts.
  – Note: several levels of BODY_TYPE have few exposures.
Model #1: Standard Poisson Regression
• We build a 4-factor model:
  – Vehicle value
  – Driver age
  – Area (territory)
  – Vehicle body type
• Many levels of AREA and BODY_TYPE are not statistically significant.
• Note: levels of BODY_TYPE with few exposures have large GLM parameters.
• Dilemma: should we exclude these levels, judgmentally temper them, or keep them as-is?
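The dilemma above can be illustrated with a small synthetic simulation (not the 67K-row auto dataset): when a level has few exposures, its raw claim frequency can be wildly extreme purely by chance, which is what drives the large GLM parameters. A credibility-style tempering toward the overall mean, with an illustrative (not estimated) constant k, tames the sparse level:

```python
import numpy as np

rng = np.random.default_rng(42)
true_freq = 0.10                       # same underlying frequency everywhere

# Two body-type levels: one with many exposures, one with few
claims_big = rng.poisson(true_freq, size=10_000)   # e.g., Sedans
claims_small = rng.poisson(true_freq, size=30)     # e.g., a sparse level

raw_big = claims_big.mean()    # close to 0.10
raw_small = claims_small.mean()  # can easily be 0.0 or 0.2+ by chance

# Buhlmann-style shrinkage toward the overall mean (k is illustrative)
overall = (claims_big.sum() + claims_small.sum()) / (10_000 + 30)
k = 200.0
z_small = 30 / (30 + k)
shrunk_small = z_small * raw_small + (1 - z_small) * overall

print(raw_big, raw_small, shrunk_small)
```

This is exactly what the random-intercept treatment of BODY_TYPE does automatically, with the shrinkage constant estimated from the data rather than assumed.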
Model #2: Random Intercepts for Area and Body Type
• Rather than use dummy variables for AREA and BODY_TYPE, we can introduce "random effects".
• The methodology is equally applicable even with many more levels.
Model #3: Add Vehicle Value Random Slope
• Intuition: the relationship between vehicle value and claim frequency might vary by type of vehicle.
• Response: introduce random slopes for VEH_VALUE.
Model Comparison
• Shrinkage: the hierarchical model estimates (random intercept; random intercept & slope) are less extreme than the standard GLM estimates.
• Different stories: all models agree for (e.g.) Sedans (10K+ exposures) but tell much different stories for (e.g.) Coupes (300 exposures).
  [Figure: fitted claim frequency (0.0–0.5) vs. veh_value for Area = A, Age Group = 3; one panel per body type (BUS, CONVT, COUPE, RDSTR, SEDAN, UTE), each comparing the GLM, random intercept, and random intercept & slope fits.]
Case Study #2: Hierarchical Growth Curve Loss Reserving Model
Hierarchical Modeling for Loss Reserving
• Here is a garden-variety loss triangle (Dave Clark, CAS Forum 2003):

Cumulative Losses in 1000's
AY    |    12     24     36     48     60     72     84     96    108    120 | reported  est ult  reserve
1991  |   358  1,125  1,735  2,183  2,746  3,320  3,466  3,606  3,834  3,901 |   3,901    3,901        0
1992  |   352  1,236  2,170  3,353  3,799  4,120  4,648  4,914  5,339        |   5,339    5,434       95
1993  |   291  1,292  2,219  3,235  3,986  4,133  4,629  4,909               |   4,909    5,379      470
1994  |   311  1,419  2,195  3,757  4,030  4,382  4,588                      |   4,588    5,298      710
1995  |   443  1,136  2,128  2,898  3,403  3,873                             |   3,873    4,858      985
1996  |   396  1,333  2,181  2,986  3,692                                    |   3,692    5,111    1,419
1997  |   441  1,288  2,420  3,483                                           |   3,483    5,672    2,189
1998  |   359  1,421  2,864                                                  |   2,864    6,787    3,922
1999  |   377  1,363                                                         |   1,363    5,644    4,281
2000  |   344                                                                |     344    4,971    4,627
total                                                                        |  34,358   53,055   18,697

chain link   |  3.491  1.747  1.455  1.176  1.104  1.086  1.054  1.077  1.018  1.000
chain ldf    | 14.451  4.140  2.369  1.628  1.384  1.254  1.155  1.096  1.018  1.000
growth curve |   6.9%  24.2%  42.2%  61.4%  72.2%  79.7%  86.6%  91.3%  98.3% 100.0%

• We can regard this as a longitudinal dataset.
  – Grouping dimension: Accident Year (AY)
• We can build a parsimonious non-linear model that uses random effects to allow the model parameters to vary by accident year.
Growth Curves
• Let's build a non-linear model of the loss triangle.
• GLM shows up a lot in the stochastic loss reserving literature. But… are GLMs natural models for loss triangles?
• Alternative: use a growth curve G(x | ω, θ) to model the loss development process.
• 2-parameter curves:
  – θ = scale
  – ω = shape
  – Weibull: G(x | ω, θ) = 1 − exp(−(x/θ)^ω)
  – Loglogistic: G(x | ω, θ) = x^ω / (x^ω + θ^ω)
• Basic idea: we fit these curves to the LDFs and add random effects to ω and/or θ to allow the curves to vary by year.
  [Figure: Weibull and loglogistic growth curves – cumulative percent of ultimate (0.0–1.0) vs. age in months (12–180).]
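The two growth curves are simple enough to sketch directly. The parameter values in the loop are illustrative placeholders, not fitted values:

```python
import math

def weibull_growth(x: float, omega: float, theta: float) -> float:
    """Weibull growth curve: G(x | omega, theta) = 1 - exp(-(x/theta)^omega)."""
    return 1.0 - math.exp(-((x / theta) ** omega))

def loglogistic_growth(x: float, omega: float, theta: float) -> float:
    """Loglogistic growth curve: G(x | omega, theta) = x^omega / (x^omega + theta^omega)."""
    return x ** omega / (x ** omega + theta ** omega)

# Illustrative parameters: omega = shape, theta = scale, x = age in months.
# Both curves rise from 0 toward 1 as the age x grows.
for x in (12, 36, 60, 84, 108):
    print(x,
          round(weibull_growth(x, 1.3, 47.0), 3),
          round(loglogistic_growth(x, 1.3, 47.0), 3))
```

Note the interpretations of θ: at x = θ the loglogistic curve is exactly at 50% of ultimate, while the Weibull curve is at 1 − e⁻¹ ≈ 63.2%.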
Baseline Model: Heuristics
• The basic intuition is familiar: (CL_{AY,t}) × (LDF_t) = ultimate loss
  – Equivalently: CL_{AY,t} = (Ult loss_AY) × (1 / LDF_t)
  – Which suggests: CL_{AY,t} = (Ult loss_AY) × G_{ω,θ}(t) + error
• The model:
  CumLoss_{AY,dev} = ULT_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
  ULT_AY ~ N(μ_ULT, σ_ULT²)
  Var(ε_{AY,dev}) = σ² · CL̂_{AY,dev}
• The "growth curve" part comes in by using G(t) instead of LDFs.
  – Think of LDFs as a rough piecewise linear approximation to a G(t).
• The "hierarchical" part comes in because we can let ULT_AY, ω, and/or θ vary by AY (using sub-models).
Including Exposures in the Model
• Our model so far:
  CumLoss_{AY,dev} = ULT_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
  ULT_AY ~ N(μ_ULT, σ_ULT²)
  Var(ε_{AY,dev}) = σ² · CL̂_{AY,dev}
• What if we wish to include an exposure measure in the model? It's easily done:
  CumLoss_{AY,dev} = prem_AY · LR_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
  LR_AY ~ N(μ_LR, σ_LR²)
  Var(ε_{AY,dev}) = σ² · CL̂_{AY,dev}
• prem_AY is given; the LR_AY are parameters; μ_LR and σ_LR are hyperparameters.
• μ_LR and σ_LR replace μ_ULT and σ_ULT.
• μ_LR is essentially a "Cape Cod"-style LR estimate for all years combined.
• {LR₁₉₉₁, LR₁₉₉₂, …, LR₂₀₀₀} are "credibility weighted" LR estimates for each of the individual accident years.
Other "Random Effects"
• Our model so far:
  CumLoss_{AY,dev} = ULT_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
  ULT_AY ~ N(μ_ULT, σ_ULT²)
  Var(ε_{AY,dev}) = σ² · CL̂_{AY,dev}
• What if we want to include other random effects in the model? It's easily done:
  CumLoss_{AY,dev} = ULT_AY · [1 − exp(−(dev/θ)^{ω_AY})] + ε_{AY,dev}
  (ULT_AY, ω_AY)ᵀ ~ N((μ_ULT, μ_ω)ᵀ, Σ)
  Var(ε_{AY,dev}) = σ² · CL̂_{AY,dev}
• Here we add a "random warp" effect to let ω vary by AY.
• Can also add a "random scale" (θ) effect if we want.
• We can compare AIC and diagnostic plots to judge whether this improves the model.
Baseline Model Performance
• Cumulative losses @ dev = (ultimate losses) × (modeled growth):
  CumLoss_{AY,dev} = ULT_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
  ULT_AY ~ N(μ_ULT, σ_ULT²)
  Var(ε_{AY,dev}) = σ² · CL̂_{AY,dev}
• We must estimate the hyperparameters: {μ_ULT, ω, θ, σ_ULT, σ}.
• Random effects are added to the ultimate loss (ULT) parameter.
  – Analogous to random intercepts.
• Random shape (ω) and scale (θ) effects were tested and found not to be significant.
  [Figure: "Weibull Growth Curve Loss Development Model" – one panel per AY (1991–2000), cumulative losses (1000–6000) vs. months of development (20–100), with the fitted curve overlaid on the observed points.]
Baseline Model Performance (continued)
  CumLoss_{AY,dev} = ULT_AY · [1 − exp(−(dev/θ)^ω)] + ε_{AY,dev}
  ULT_AY ~ N(μ_ULT, σ_ULT²)
  Var(ε_{AY,dev}) = σ² · CL̂_{AY,dev}
• The random effects allow a "custom fit" growth curve for each AY while maintaining parsimony.
• The model contains only 5 hyperparameters, but fits the loss triangle very well.
  [Figure: the same trellis of fitted Weibull growth curves by AY (1991–2000).]
Residual Diagnostics
• An advantage of stochastic reserving in general – and this method in particular – is that it enables us to use residual diagnostic analysis.
  [Figure: six diagnostic plots – residual histogram, normal Q-Q plot, actual vs. predicted, residuals vs. predicted, residuals vs. development time, and residuals vs. accident year.]
Model Results
• The overall o/s reserve estimate is close to that of the chain ladder ($18.7M).

Chain ladder method (per the earlier triangle):
AY      reported  est ult  reserve
1991       3,901    3,901        0
1992       5,339    5,434       95
1993       4,909    5,379      470
1994       4,588    5,298      710
1995       3,873    4,858      985
1996       3,692    5,111    1,419
1997       3,483    5,672    2,189
1998       2,864    6,787    3,922
1999       1,363    5,644    4,281
2000         344    4,971    4,627
total     34,358   53,055   18,697

Parameters and Estimated Reserves – Baseline Model
AY     dev   omega   theta   growth  reported  eval120  eval240     ULT  reserves
1991   114   1.306  46.638    96.0%     3,901    3,943    4,073   4,074       172
1992   102   1.306  46.638    93.8%     5,339    5,239    5,412   5,413        74
1993    90   1.306  46.638    90.6%     4,909    5,207    5,379   5,380       470
1994    78   1.306  46.638    85.9%     4,588    5,423    5,602   5,603     1,015
1995    66   1.306  46.638    79.3%     3,873    4,777    4,935   4,936     1,062
1996    54   1.306  46.638    70.2%     3,692    5,052    5,219   5,220     1,528
1997    42   1.306  46.638    58.2%     3,483    5,512    5,694   5,695     2,212
1998    30   1.306  46.638    43.0%     2,864    5,850    6,043   6,044     3,180
1999    18   1.306  46.638    25.0%     1,363    5,255    5,429   5,430     4,067
2000     6   1.306  46.638     6.6%       344    5,101    5,270   5,271     4,927
total                                                            53,066    18,708

• These are the 12 hierarchical model parameters (the 10 ULT_AY plus ω and θ); μ_ULT = 5,306.6.
• ULT₁₉₉₈ is lower in the hierarchical model than in the chain ladder: cumulative losses @ 36 months are disproportionately high for 1998, and this data point has more leverage in the chain ladder method.
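As a sanity check on the results table, the fitted Weibull growth curve with ω = 1.306 and θ = 46.638 reproduces the tabulated growth percentages, and each AY's reserve is approximately its eval240 value minus reported losses:

```python
import math

omega, theta = 1.306, 46.638   # fitted hyperparameters from the baseline model

def growth(dev: float) -> float:
    """Fitted Weibull growth curve: fraction of ultimate reported at age dev."""
    return 1.0 - math.exp(-((dev / theta) ** omega))

# dev (months) -> growth % from the results table
table = {114: 96.0, 102: 93.8, 90: 90.6, 78: 85.9, 66: 79.3,
         54: 70.2, 42: 58.2, 30: 43.0, 18: 25.0, 6: 6.6}

for dev, pct in table.items():
    print(dev, round(100 * growth(dev), 1), pct)   # the two columns agree

# Reserve check for AY 1991: eval240 - reported = 4,073 - 3,901 = 172
reserve_1991 = 4073 - 3901
print(reserve_1991)
```

This is also a quick way to see where the 18,708 total comes from: each AY's reserve is its projected losses at a late evaluation age minus losses reported to date.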
References

Clark, David R. (2003). "LDF Curve Fitting and Stochastic Loss Reserving: A Maximum Likelihood Approach," CAS Forum.

de Jong, Piet and Heller, Gillian (2008). Generalized Linear Models for Insurance Data. New York: Cambridge University Press.

Frees, Edward (2006). Longitudinal and Panel Data: Analysis and Applications in the Social Sciences. New York: Cambridge University Press.

Gelman, Andrew and Hill, Jennifer (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. New York: Cambridge University Press.

Guszcza, James (2008). "Hierarchical Growth Curve Models for Loss Reserving," CAS Forum.

Pinheiro, José and Bates, Douglas (2000). Mixed-Effects Models in S and S-PLUS. New York: Springer-Verlag.