March 2012
Insights
Predictive Modeling: Blending Business Expertise With Sophisticated Science

Predictive models are now used by insurers worldwide to help run their businesses. In fact, in many developed markets, they are now the new standard for setting prices, particularly in personal lines such as auto and homeowners insurance. They are also being used in commercial insurance. The true worth of predictive models is only realized when the data are analyzed by industry experts who understand what the data truly mean and how they can best be used to advance product development and pricing.

An apt comparison is a violinist and the violin. In the hands of a proficient violinist, the violin is a machine that can produce a piece that may be technically correct, but in the hands of a master who can understand and interpret the notes and knows where emphasis is needed, great music is made. The violinist's gift and the power of the machine create a more nuanced performance. So, too, with predictive modeling: the model, or machine, can produce data, but only with a deeper understanding of those data is a masterful approach to the market accomplished. In essence, insurers must move beyond the noise that is part of raw data culled from predictive models and fine-tune information that management, with its keen market ear, can use to improve the ability of its company to compete.
Background
The term "predictive modeling" covers a wide range of statistical tools used by analytical teams to understand data generated by the business process. There are two basic elements in the data that are collected: metrics that represent the business idea being measured, and facts that represent the information collected for the records that produce a particular metric.
In statistical parlance, the metrics are often called responses, and the facts are called predictors. (This article will discuss supervised predictive models — unsupervised models will be discussed at a later time.)
Before presenting the advantages and disadvantages among the various classes of tools, it is important to have a solid understanding of what the predictive model is trying to achieve. The goal is to produce a sensible model that explains recent historical experience and is likely to be predictive of future experience.
This idea is achieved by recognizing that the response data in the model have two key components: signal and noise. The signal represents a systematic pattern that will likely repeat itself in future data. The noise represents the randomness that is inherent in a statistical process (Figure 1).

Figure 1. Basic structure of a predictive model: Response variable = Systematic component (signal, a function of the predictors) + Unsystematic component (noise, reflecting the stochastic process)

The systematic component is often referred to as the model structure. It reflects the relationship that the response has with the various predictors in the data set. Predictive models relate the characteristics of the predictors to the response. Theoretically, the modeler has an unlimited choice of functional forms that can relate the predictors to the response. As mentioned, the goal of the model-building process is to find the "best" form, one that is likely to be predictive of the future.
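To make the signal-plus-noise decomposition in Figure 1 concrete, the following is a minimal, hypothetical sketch in Python (using numpy) in which the observed response is a systematic function of a single predictor plus Poisson noise. The predictor, parameter values and age bands are illustrative assumptions, not figures from this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical predictor: driver age for 10,000 policies
age = rng.integers(18, 80, size=10_000)

# Systematic component (signal): expected claim frequency falls with age
expected_frequency = np.exp(-1.5 - 0.02 * (age - 18))

# Unsystematic component (noise): actual claim counts are Poisson-distributed
observed_claims = rng.poisson(expected_frequency)

# The observed response is signal plus noise; averaging within age bands
# recovers the signal while much of the noise cancels out
for lo, hi in [(18, 30), (30, 50), (50, 80)]:
    band = (age >= lo) & (age < hi)
    print(f"age {lo}-{hi}: expected {expected_frequency[band].mean():.3f}, "
          f"observed {observed_claims[band].mean():.3f}")
```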
With this background, we can now introduce a second idea so that we can compare multiple model forms and identify the most appropriate: overfitting versus underfitting. In its simplest form, we are trying to understand the level of complexity that is needed to create a predictive model. Consider the spectrum depicted in Figure 2.

The left-hand side is a model structure in which the aggregate average is defined as the model. For example, in auto insurance, this would be akin to giving everyone an identical rate. Because of the Law of Large Numbers inherent in insurance, this model has strong predictive power: from quarter to quarter, the aggregate average will be a reasonable prediction of future aggregate averages. Unfortunately, this model will generate adverse selection and will fail because it has no explanatory power.

The right-hand side represents a model structure that is so complex that it effectively repeats the historical data perfectly. In this case, the model has strong explanatory power. Unfortunately, this model assumes that someone's past experience is a perfect replication of their future experience (in auto insurance, it is akin to assuming that if an insured had no losses last year, then the insured will have no losses next year). This model will fail because it has no predictive power.

The best model balances both predictive power and explanatory power. If the model is too simple, the resulting form is said to underfit the data, which is to say it ignores signals in the data and results in a model with weak explanatory power. If the model is too complex, the form overfits the data, reflecting too much noise and resulting in a model with weak predictive power.
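The trade-off can be illustrated with a small, hypothetical sketch: polynomial models of increasing complexity are fit to simulated data, and training error is compared with holdout error. The data, degrees and error metric are assumptions chosen purely for illustration, but they show the characteristic pattern in which training error keeps falling while holdout error eventually worsens.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: a simple underlying signal plus noise
x = np.linspace(0, 1, 60)
signal = 1.0 + 2.0 * x
y = signal + rng.normal(scale=0.5, size=x.size)

# Split observations into training and holdout halves
train, hold = np.arange(0, 60, 2), np.arange(1, 60, 2)

for degree in [0, 1, 5, 12]:
    coefs = np.polyfit(x[train], y[train], degree)
    mse_train = np.mean((y[train] - np.polyval(coefs, x[train])) ** 2)
    mse_hold = np.mean((y[hold] - np.polyval(coefs, x[hold])) ** 2)
    print(f"degree {degree:2d}: train MSE {mse_train:.3f}, holdout MSE {mse_hold:.3f}")
```

The underfit model (degree 0) misses the signal; the overfit models (high degree) chase noise, which shows up as holdout error rising even as training error falls.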
Figure 2. Overfitting versus underfitting. Model complexity (number of parameters) runs from the overall mean at one extreme (underfit: predictive, but poor explanatory power) to one parameter per observation at the other (overfit: explains history, but poor predictive power); the best models lie in between.
Machines to Help
As mentioned earlier, there is a limitless number of structures that can relate the response to the predictors in the data. This flexibility can present quite a daunting challenge to the analytical team studying the data. In practice, there are many statistical tools that can be utilized to build the model structure. In this section, we focus on two of the most commonly used tools in data analysis. These tools can create some of the best predictive models when used with appropriate oversight and business knowledge.

A common tool used in many applications is the generalized linear model (GLM). The model structure starts with a nonlinear function applied to a linear combination of predictors to describe the response. For example, a typical auto-rating algorithm can be expressed in these terms. There are many common search tools that allow the analytical team to search the data for the right structural relationships. Common examples include:
• Stepwise regression. Here, statistical tests and search permutations are used to identify the right set of factors and interaction terms.
• Balance tests. In these tests, aggregate observed values are compared with aggregate fitted values to identify potential factors and interactions to include in the model.
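As a rough illustration of the GLM structure just described, here is a minimal sketch using the statsmodels library to fit a Poisson GLM with a log link to simulated policy data. The factor names (age_band, territory), relativities and data are hypothetical assumptions, not drawn from any actual rating plan.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 5000

# Hypothetical policy-level data: two rating factors, exposure and claim counts
policies = pd.DataFrame({
    "age_band": rng.choice(["18-25", "26-40", "41-60"], size=n),
    "territory": rng.choice(["urban", "rural"], size=n),
    "exposure": rng.uniform(0.25, 1.0, size=n),
})
true_rate = (0.10 * np.where(policies["age_band"] == "18-25", 1.8, 1.0)
                  * np.where(policies["territory"] == "urban", 1.3, 1.0))
policies["claims"] = rng.poisson(true_rate * policies["exposure"])

# Poisson GLM with a log link: the fitted claim frequency is a nonlinear
# (exponential) function of a linear combination of the predictors, which is
# the multiplicative structure of a typical auto rating algorithm
model = smf.glm(
    "claims ~ C(age_band) + C(territory)",
    data=policies,
    family=sm.families.Poisson(),
    offset=np.log(policies["exposure"]),
).fit()

print(np.exp(model.params))  # base level plus multiplicative relativities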
There are many complexities that can be added to the model to enhance its structure. For example, recent research and development on saddle technologies allows the user to integrate extremely complex interactions in the model structure (Figure 3).
Figure 3. Interaction detection with saddle technology
In GLM tools, there are many routines that allow the analyst to simplify and complicate the model structure. Because of the volatility of insurance data, strict reliance on these statistical routines will often produce models that overfit the data, so we strongly recommend that additional rigor and testing be applied to confirm that an identified pattern is real. Rigorous tests can be grouped into three broad categories:
• Statistical tests
• Consistency tests
• Judgment

The GLM is widely accepted because it is able to frame the solution within a variety of rigorous perspectives and, most importantly, is able to reflect the business team's judgment and insurance expertise in the model form.

Another common tool used in modeling is the classification and regression tree (CART) (Figure 4). These decision trees are effectively sophisticated approaches that identify the best way to split the data into homogeneous groups. For example, tier placement rules are used to segment risks, and many of these algorithms have been built with decision trees (although many of the newer tier algorithms tend to be built with more parametric models). The decision tree uses a greedy search algorithm to recursively expand and prune the tree, and the analytical team has a wide array of controls it can use to both quantify and test the value of various splits.

The decision tree is a valuable tool for identifying potential structures within the data, and our experience with this tool has generated many strong models. As with a GLM, there are many statistical and consistency tests that can be performed on the data, but the major drawback with a tree is that it is very challenging to validate lower-level splits. This inability to apply judgment to the statistical tool creates a much greater risk of overfitting the data and weakening the model's predictive power. Because of this concern, the tree would often be used to provide indications of the model structure but not the model parameters.
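For illustration, the sketch below uses scikit-learn's DecisionTreeRegressor, a CART-style tree, on simulated data to show how pruning controls (such as max_depth and ccp_alpha) and a validation set help guard against overfitting from lower-level splits. The data and settings are assumptions chosen for demonstration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
n = 4000

# Hypothetical risk characteristics and observed loss cost (signal plus noise)
X = rng.uniform(0, 1, size=(n, 4))
signal = 100 + 80 * (X[:, 0] > 0.5) + 40 * (X[:, 1] > 0.7)
y = signal + rng.normal(scale=60, size=n)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Compare an unpruned tree with two pruned alternatives: the unpruned tree
# explains the training data well, but its lower-level splits chase noise,
# which shows up as a weaker validation score
for max_depth, ccp_alpha in [(None, 0.0), (3, 0.0), (None, 50.0)]:
    tree = DecisionTreeRegressor(max_depth=max_depth, ccp_alpha=ccp_alpha,
                                 random_state=0).fit(X_train, y_train)
    print(f"max_depth={max_depth}, ccp_alpha={ccp_alpha}: "
          f"train R^2 {tree.score(X_train, y_train):.2f}, "
          f"validation R^2 {tree.score(X_valid, y_valid):.2f}")
```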
Figure 4. Typical decision tree
A Cautionary Tale
A common mistake for firms is over-reliance on the automated algorithms that are part of all statistical software. Modern technology has made the difficult disciplines of machine learning and data mining more accessible. On the one hand, this is a great asset to the analytical team: the application of sophisticated tools to data is bound to find new, previously undetected patterns. Even so, there is a very real risk that, without a more rigorous understanding of why a pattern exists, the model is more likely to be responding to noise in the data.

There are a variety of statistical techniques that can be built into applications to improve the predictive power of the model (Figure 5). Towers Watson consultants have technical and business expertise working with these methods. When using these techniques, it's important to apply business acumen and judgment to assess whether the suggested predictive relationships actually make sense. Only this approach can maximize the predictive value that these techniques offer while avoiding the risk of following noise in the data.
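One simple, hypothetical illustration of such a check is to compare experience on two random halves of the data: a genuine pattern repeats in both halves, while an apparent pattern driven by noise is small and unstable. The factors and frequencies below are simulated purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 10_000

# Hypothetical data: one genuine rating factor and one pure-noise factor
df = pd.DataFrame({
    "vehicle_group": rng.choice(list("ABCD"), size=n),
    "noise_factor": rng.choice(list("XY"), size=n),
})
rate = 0.08 * df["vehicle_group"].map({"A": 0.8, "B": 1.0, "C": 1.2, "D": 1.5})
df["claims"] = rng.poisson(rate)

# Consistency test: the real pattern (vehicle_group) repeats across halves,
# while apparent differences on the pure-noise factor are small and unstable
half = rng.random(n) < 0.5
for factor in ["vehicle_group", "noise_factor"]:
    print(factor)
    print(pd.DataFrame({
        "half 1": df[half].groupby(factor)["claims"].mean(),
        "half 2": df[~half].groupby(factor)["claims"].mean(),
    }).round(3))
```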
Figure 5. Modeling techniques: sample methods and their value
• Descriptive: Simple univariate linear regression (basic patterns in data identified); Principal components (unsupervised dimension reduction)
• Classification: Clustering (homogeneous grouping of data for dimension reduction); Classification trees (decision-tree-fitting approach that splits data into different sets of training and validation data)
• Regression: Generalized linear models and generalized additive models (sophisticated parametric models for rating and scoring); Generalized estimating equations and multivariate adaptive regression splines (systematic approach to model construction); Neural networks, genetic algorithms and other methods (recursive nonlinear models to identify structure)
Know Your Role
We believe that exploratory data mining and machine learning are likely to be most useful for predictive models as a way to initially clean and filter data, particularly if companies do not have much experience with the data, or the data need to be broken down into more manageable and interpretable pieces. This might apply to sources such as web clickstream data (where explanatory factors are not obvious) or to a post/ZIP code analysis, which could generate as many as 300 variables for further analysis. This phase of the modeling process is often referred to as exploratory analysis, and for new data sources, this is where raw data fields become potential explanatory variables. Even with new data sources, an expert must use domain knowledge to improve on the initial variables uncovered by the exploratory mining techniques.
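As a hypothetical sketch of this kind of exploratory reduction, the following uses scikit-learn's PCA to compress a wide block of simulated postcode-level variables into a handful of candidate explanatory variables for later modeling. The dimensions and structure are illustrative assumptions, not a description of any specific data source.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)

# Hypothetical postcode-level data: 2,000 postcodes, 300 raw external variables
raw = rng.normal(size=(2000, 300))
# Inject some shared structure so a few components carry most of the information
raw[:, :50] += rng.normal(size=(2000, 1)) * 2.0

# Standardize the raw fields, then compress them into 10 candidate
# explanatory variables for the main modeling exercise
pca = PCA(n_components=10)
scores = pca.fit_transform(StandardScaler().fit_transform(raw))

print(scores.shape)                            # (2000, 10)
print(pca.explained_variance_ratio_.round(3))  # share of variance captured
```

The resulting component scores are only candidates; as noted above, an expert still needs to judge whether they correspond to anything meaningful before they enter a model.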
Expert Analysis
The most critical step in establishing a system of best practices is the application of analytical techniques to the data. A very basic predictive modeling approach uses data to create risk buckets and prices accordingly. However, without analysis, there can be no in-depth understanding of why a surcharge should be applied; in short, the machinery of predictive modeling on its own produces only a low-level action. This is particularly true when multiple factors and their interactions are being evaluated. For instance, if asked to examine how age and gender interact, a purely machine-based approach may look at insurance experience and credit factors before assigning a value to the data produced. But this approach may not offer a satisfying answer that reflects the nuances in the data: for instance, males in the higher age brackets may not need to be assigned the same charges as younger males.
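As a hypothetical sketch of how such an interaction might be examined, the following fits a Poisson GLM (using statsmodels) that includes an age-by-gender interaction and prints the fitted relativities, which an expert would then judge for reasonableness. The data are simulated so that the male surcharge applies only at younger ages; nothing here reflects actual industry experience.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 20_000

# Hypothetical data in which the male surcharge applies only at younger ages
df = pd.DataFrame({
    "age_band": rng.choice(["young", "middle", "older"], size=n),
    "gender": rng.choice(["F", "M"], size=n),
})
rate = 0.08 * np.where((df["age_band"] == "young") & (df["gender"] == "M"), 1.6, 1.0)
df["claims"] = rng.poisson(rate)

# Include the age-by-gender interaction and inspect the fitted relativities;
# the expert then judges whether the pattern makes business sense
model = smf.glm("claims ~ C(age_band) * C(gender)", data=df,
                family=sm.families.Poisson()).fit()
print(np.exp(model.params).round(2))
```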
Figure 6. Industry loss ratios by company class, 2005 to 2010; more sophisticated rating methods produce better loss ratios. Companies are grouped into low, medium and high rating sophistication; loss ratios include ALAE.
But when years of industry experience and analytical skill are applied, insurance actuaries can adapt techniques to tackle very specific insurance issues, such as spatial smoothing in geographic and vehicle classification analyses, that overcome instances where data are either quite sparse or fail to fully explain what is going on. As noted below, it is the calibration, carefully fitted by expert analysis, that allows insurers to achieve real-time results.
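A heavily simplified, hypothetical sketch of spatial smoothing is shown below: sparse postcode relativities are credibility-weighted toward a distance-weighted average of neighboring postcodes. The kernel, credibility constant and data are illustrative assumptions; production approaches are considerably more refined.

```python
import numpy as np

rng = np.random.default_rng(7)
n_codes = 200

# Hypothetical postcode centroids, exposure counts and raw loss relativities
coords = rng.uniform(0, 100, size=(n_codes, 2))
exposure = rng.integers(5, 500, size=n_codes)
raw_relativity = rng.normal(1.0, 0.3, size=n_codes)

# Distance-weighted smoothing: each postcode's relativity is pulled toward
# nearby postcodes, with more pull where the postcode itself has little data
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
kernel = np.exp(-(dist / 10.0) ** 2)           # nearby postcodes count more
neighbour_avg = kernel @ (raw_relativity * exposure) / (kernel @ exposure)

credibility = exposure / (exposure + 100.0)    # sparse postcodes get low weight
smoothed = credibility * raw_relativity + (1 - credibility) * neighbour_avg

print(np.c_[raw_relativity[:5], smoothed[:5]].round(3))
```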
Proven Success
Predictive modeling coupled with expert analysis is an effective way of improving business performance. Figure 6, for example, shows the relative performance of U.S. auto insurers by rating sophistication. Insurers can use this proven success as a base to expand predictive modeling to other business lines and business practices. Once a reliable predictive model is in place, there are no limitations. By combining the right predictive modeling tools, good data, imagination and analytical skill, insurers can use predictive modeling to gain and retain competitive advantage.
The Complete Solution
Predictive modeling makes it possible to hone underwriting and bring it to a new level. The U.S. automobile industry illustrates the potential to dramatically impact loss ratios. And its power as a machine that drives pricing and other key functions is just starting to be harnessed. But this promising tool must be coupled with expert experience, understanding and analysis, or it will remain just that: a tool. Intellectual capacity will give insurers the competitive margin needed to succeed in a more competitive market, and will reveal the true essence and meaning of predictive modeling.

For more information, contact:
Serhat Guven
+1 210 767 3811
[email protected]
About Towers Watson
Towers Watson is a leading global professional services company that helps organizations improve performance through effective people, risk and financial management. With 14,000 associates around the world, we offer solutions in the areas of employee benefits, talent management, rewards, and risk and capital management.
Copyright © 2012 Towers Watson. All rights reserved. TW-NA-2012-24488