Model Stability and the Subprime Mortgage Crisis
Xudong An†, Yongheng Deng§, Eric Rosenblatt‡, Vincent W. Yao‡
September 12, 2010
Abstract We study the potential model instability problem with respect to mortgage default risk and examine to what extent it helps explain the default shock during the recent crisis. We find that econometric default risk models based on historical data can be unstable over time. Due to temporal shifts in the parameters, default prediction of the 2006 vintage subprime loans based on hazard and Logit models estimated with 2003 vintage loan data can generate over 40 percent fewer defaults than the actual number, assuming perfect forecast of house price change. We also find that the combined impact of parameter instability and bad forecast of HPI growth enlarges the under-prediction of default rate but the marginal impact of parameter instability is larger than that of bad HPI forecast. Our findings have important implications regarding model limitations and risk, model improvements, economic capital, and regulatory reform. Keywords: subprime mortgage, default risk, model stability, hazard model, Logit model
The authors are grateful to John Clapp, David Geltner, Richard Green, Michael Lea, David Ling, Tony Sanders, and Brent Smith for helpful discussions and suggestions. We also thank participants in the Maastricht-MIT-NUS 2009 Real Estate Finance and Investment Symposium, the 2010 Weimer School of Advanced Studies in Real Estate and Land Economics, the Finance Seminar at San Diego State University for helpful comments. † Department of Finance, College of Business Administration, San Diego State University, 5500 Campanile Dr., San Diego, CA 92182-8236;
[email protected], (619) 594-3027, (619) 594-3272(fax). § Institute of Real Estate Studies, National University of Singapore; 21 Heng Mui Keng Terrace, #04-02, Singapore, 119613;
[email protected], (65) 6516-8291, (65) 6774-1003 (fax). ‡ Fannie Mae, 3900 Wisconsin Avenue, Washington, DC 20016. E-mails:
[email protected] and
[email protected]. 1
Electronic copy available at: http://ssrn.com/abstract=1676724
1. Introduction
The recent financial crisis was originally triggered by large, unexpected losses on mortgages and mortgage-related securities.1 The default shock has prompted an investigation of what went wrong with the credit risk models. Some see the problem as coming from the data input. For example, Satyajit Das, a former Citigroup banker, told Bloomberg reporters: “The models are fine. But they have an input problem. It becomes a number we pluck out of the air. They could be wrong, and the ratings could be misleading.”2 Others, however, blame model instability. For example, Alan Greenspan (2008) suggested: “The whole intellectual edifice, however, collapsed in the summer of last year because the data inputted into the risk management models generally covered only the past two decades, a period of euphoria.”3
1 Major mortgage investors had to substantially write down their mortgage assets and rating agencies had to adjust their ratings to reflect revised expectations of default losses during the crisis. Many mortgage lenders went into bankruptcy due to unexpected losses.
2 “CDO Boom Masks Subprime Losses, Abetted by S&P, Moody's, Fitch,” Bloomberg News, May 31, 2007.
3 “The Financial Crisis and the Role of Federal Regulators,” the House Committee on Oversight and Government Reform hearing on October 23, 2008.
In this paper, we investigate the potential model instability problem with respect to mortgage default risk and examine to what extent model instability explains the default shock in the crisis. Our study concerns two econometric models that are well established in the academic literature and widely adopted by the mortgage industry: the Logit model and the Cox proportional hazard model. Although mortgage lenders and investors usually keep their
specifications of those models proprietary, and thus we cannot evaluate the models of a particular lender or investor, we hope to form insights about the general features of those econometric models through this study.
We find that both the conventional Logit model and the hazard model, with reasonable specifications, show inter-temporal instability. For example, based on Wald tests, subprime mortgage loans originated in 2006 have significantly different default sensitivity to house price appreciation (depreciation) than those originated in 2003.
To assess how far model instability explains the default shock during the subprime mortgage crisis, we use parameters estimated with the 2003 vintage data to forecast default probabilities of the 2006 vintage loans and study the aggregate prediction accuracy. Using the actual realizations of the default risk factors of the 2006 vintage, such as house price appreciation (HPI growth) and leverage, we find that the hazard model estimated with the 2003 vintage data predicts about 40% fewer defaults than actually occurred in the 2006 vintage, while the Logit model predicts about 41% fewer defaults.
Alternatively, we take imperfect HPI growth forecasts into consideration. If one assumes the same HPI growth during 2006-2009 as during 2003-2006, the under-prediction is more severe: the predicted default rate is less than 50 percent of the actual rate. Stated differently, the actual default rate of the 2006 vintage loans during the 2006-2009 period is more than twice as high as that predicted by the 2003 vintage models. Once bad forecasts of other variables such as interest rates and unemployment are also taken into consideration, the ex post default rate could be several times higher than predicted.
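The aggregate accuracy measure used in this comparison can be illustrated with a minimal sketch. It assumes that per-loan default probabilities have already been produced by a model fitted on an earlier vintage; the function name and all numbers are hypothetical and are not taken from the paper's data.

```python
def under_prediction(predicted_probs, actual_defaults):
    """Fraction by which expected defaults fall short of actual defaults.

    predicted_probs: per-loan default probabilities from the earlier-vintage model
    actual_defaults: number of defaults actually observed in the later vintage
    """
    expected_defaults = sum(predicted_probs)  # expected count under the model
    return 1.0 - expected_defaults / actual_defaults

# Hypothetical illustration: 100 loans, each given a 12% default probability,
# against 20 observed defaults.
shortfall = under_prediction([0.12] * 100, actual_defaults=20)
```

A positive value means the model predicts fewer defaults than occurred; a value of 0.40 would correspond to "40% fewer defaults than the actual number".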
This under-prediction is consistent with the relation between expected and actual losses of the 2006 vintage loans: the 2006 vintage loans were originated with spreads similar to those of the 2003 vintage, which suggests that mortgage lenders expected similar losses from the two vintages; however, the ex post default rate of the 2006 vintage is over 3 times higher than that of the 2003 vintage. Meanwhile, comparing the impacts of parameter instability and bad forecast of HPI growth, we find that the marginal impact of a bad HPI forecast is smaller than that of a bad model.
The model instability problem we examine in this paper is similar to the so-called “Lucas critique” regarding econometric policy evaluations (Lucas, 1976). The unprecedented crisis in the subprime mortgage market and the widely believed structural break in the mortgage and financial markets provide us with a unique opportunity to study this issue in the field of risk management.
Our findings in this paper have a number of implications regarding model
limitations and risk, model improvements, economic capital and regulatory reform. The rest of the paper proceeds as follows: in the next section, we review the state of the art econometric models of mortgage default risk and discuss the specifications of the hazard model and Logit model that we are focusing on in this paper; we report our data and explain our sampling technique in section 3; in section 4, we discuss estimates of the two econometric models and parameter stability tests; we explain our prediction experiments and assess how model instability explains the default shock in section 5; and we provide concluding remarks in a final section. 2. Mortgage Default Risk Models 2.1. Econometric models of mortgage default risk The past forty years have seen a growing literature on mortgage default risk based on ex post loan performance4, which helps lenders and investors to understand determinants of mortgage 4 There is also a literature that tries to understand the implied (ex ante) default risk through mortgage prices (see, Kau, Kenan and Kim 1994, Capozza Kazarian and Thomson 1998 and many others). 4
default risk and lays the foundation for default risk prediction, pricing and management. Econometric models for mortgage default risk have evolved from simple linear regressions to the more sophisticated Logit and hazard models.
Linear regression
von Furstenberg (1969, 1970a, 1970b) develops the first academic default risk model, a linear regression based on aggregate data. The author regresses the aggregate default rate of FHA/VA loans on loan characteristics such as loan-to-value (LTV) ratio and age of the loan, and finds that home equity at loan origination is the most important predictor of default. Subsequently, a number of studies apply that technique to different loan samples (e.g. loans originated by S&Ls) and alternative mortgage instruments (ARMs and GPMs).5 Later studies also experiment with more explanatory variables such as borrower income, payment-to-income ratio, metro unemployment rate, and mark-to-market LTV. Those studies provide important guidance for lenders’ underwriting practice and lead to major revisions in underwriting criteria.6 Linear regressions with aggregate data are still used in recent studies of subprime mortgage default (see, e.g. Mian and Sufi 2009).
Probit and Logit models
The literature on mortgage default proliferates as disaggregate loan-level data become available. While some early studies such as Herzog and Earley (1970) still apply linear regressions to disaggregate loan-level data (with a 0/1 dependent variable)7, many more studies apply probit and Logit models based on the economic theory of discrete choice. Jackson and Kasserman (1980) is the first to use a probit model based on individual FHA loans. Campbell and Dietrich (1983) and many others use Logit models.
5 See, for example, von Furstenberg and Green (1974), Follain and Struyk (1977), Vandell (1978), Jackson and Kasserman (1980), Foster and Van Order (1984, 1985), Clauretie (1987), and Quigley and Van Order (1991).
6 For example, lenders made important changes in response to pressure on revealed redlining practice and Fannie Mae revised standards on ARMs based on academic studies (Vandell, 1993).
7 Other examples include Williams, Beranek and Kenkel (1974) and Webb (1982).
The use of probit/Logit models
allows researchers to explore more risk factors at the loan level such as loan purpose, loan term, etc. Meanwhile, contemporaneous equity position replaces original equity position in newer studies (e.g. Zorn and Lea 1989, Cunningham and Capone 1990). Transaction costs of default, trigger events and related borrower characteristics appear more frequently in the models (see, e.g. Vandell and Thibodeau 1985, Hendershott and Schultz 1993, Archer, Ling and McGill 1996, 1997, Capozza, Kazarian and Thomson 1997).
As option pricing theory came to be applied to mortgage valuation, more and more studies tested whether default is significantly related to the put option being “in the money” (see, e.g. Quigley and Van Order 1995, Archer, Ling and McGill 1996, Philips, Rosenblatt and Vanderhoff 1996). Recent applications of Logit models create an event history for each loan and are thus better suited to studying the impacts of time-varying variables (see, e.g. Clapp et al 2001, Ambrose and Sanders 2003, Clapp, Deng and An 2006, An, Clapp and Deng 2010). For subprime mortgage default, Rajan, Seru and Vig (2010) and Keys, Mukherjee, Seru and Vig (2010) have applied Logit models in their studies.
Hazard model
Developed in the biostatistics literature and first used for mortgage prepayment risk studies (e.g. Green and Shoven 1986, and Quigley 1987), the proportional hazard model has prevailed in mortgage default risk studies over the past two decades (see Van Order 1990, Quigley and Van Order 1991, Schwartz and Torous 1993, Deng, Quigley and Van Order 1996 and many others). In contrast with the Logit model with event-history data, which assumes that the borrower’s choices in each month are i.i.d. events, the hazard model is based on conditional default probability and implicitly handles path dependency. It is thus more appealing for modeling borrower behavior, which is usually path-dependent. Moreover, the Logit model is restricted by the assumption of no correlation among competing risks via unobservable variables. By contrast, the hazard model has the flexibility to allow correlated competing risks (Clapp, Deng and An 2006). Recently, a
number of papers have applied the Cox proportional hazard model to study subprime mortgage default (e.g. Demyanyk and Hemert 2008, Gerardi, Shapiro and Willen 2008, Elul 2009, Haughwout, Okah and Tracy 2009, Green, Rosenblatt and Yao 2010, An, Yao and Rosenblatt 2010)8. However, proportional hazard model is not as convenient as the multinomial Logit model in dealing with the competing risks of mortgage default and prepayment. To address that issue, Deng, Quigley and Van Order (2000) apply the competing risks hazard model to mortgage prepayment and default, and a number of following studies have adopted that methodology. Deng, Quigley and Van Order (2000) and Deng and Quigley (2002) also model the unobserved heterogeneity with a mass-point mixed competing risks hazard model. Alexander, Grimshaw, McQueen and Slade (2002) and Pennington-Cross (2003) apply that technique to subprime mortgage default. Clapp, Deng and An (2006) extend the unobserved heterogeneity concept to multinomial Logit model. Covariates included in a hazard model are usually similar to those in a Logit model. Although other models such as discriminant analysis, neural networks analysis, and classification trees analysis appear in the literature9, Logit model and hazard model have been the dominant econometric models used in the academic literature for mortgage default risk. They have also become the standard tool of mortgage default risk analysis in the mortgage industry. A Cox proportional hazard model assumes that the hazard rate of default of a mortgage loan at period T since its origination follows the following form:
$$h_i(T; X_i(t)) = h_0(T)\,\exp\left(X_i(t)'\beta\right), \quad i = 1, \dots, n. \tag{1}$$
8 Ciochetti et al (2003), Chen and Deng (2003), and An, Deng and Sanders (2009) apply the model to CMBS loan default.
9 See Morton (1975), Episcopos, Pericli and Hu (1998), and Feldman and Gross (2005), respectively.
Here $h_0(T)$ is the baseline hazard function, which depends only on the age (duration) of the loan, $T$;10 $X_i(t)$ is a vector of covariates for individual loan $i$ that enter the hazard proportionally and may be time-varying or time-invariant risk factors. In a Logit model, the default probability of a loan at age $T$ is:
$$\Pr\nolimits_i(T; Z_i(t)) = \frac{\exp\left(Z_i(t,T)'\gamma\right)}{1 + \exp\left(Z_i(t,T)'\gamma\right)}. \tag{2}$$
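As a minimal numerical sketch of the two functional forms in equations (1) and (2): the function names and the coefficient labels `beta` and `gamma` below are illustrative assumptions, not the authors' code, and the covariate vectors are plain Python lists.

```python
import math

def cox_hazard(baseline_hazard, covariates, beta):
    """Equation (1): baseline hazard at loan age T scaled by exp(X' beta)."""
    index = sum(b * x for b, x in zip(beta, covariates))
    return baseline_hazard * math.exp(index)

def logit_default_prob(covariates, gamma):
    """Equation (2): exp(Z' gamma) / (1 + exp(Z' gamma)).

    The covariates are assumed to already contain the loan-duration dummies.
    """
    index = sum(g * z for g, z in zip(gamma, covariates))
    # Branch on the sign of the index so math.exp never overflows.
    if index >= 0:
        return 1.0 / (1.0 + math.exp(-index))
    e = math.exp(index)
    return e / (1.0 + e)
```

One property the sketch makes visible: for two loans of the same age, the ratio of their hazards in (1) depends only on the difference in their covariates, not on the baseline hazard, which is what "proportional" refers to.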
In equation (2), the dependence of default probability on loan age (duration) is modeled by including loan duration dummy variables in the covariate set $Z_i(t,T)$.
2.2. Model specification
Our model specification generally follows the existing literature. We include the following covariates in our models:
Negative equity
von Furstenberg (1970a, 1970b), Williams, Beranek and Kenkel (1974) and many others find that home equity at loan origination is the most important predictor of mortgage default.
Later studies use contemporaneous LTV, which takes house price change and loan amortization into consideration, and find it to be a significant risk factor (see, e.g. Foster and Van Order 1984, Vandell and Thibodeau 1985, Deng 1997). Recent studies on subprime default risk also find this variable to be a significant risk factor (e.g. Alexander, Grimshaw, McQueen and Slade 2002)11. In this paper, we calculate the borrower’s negative equity as the difference between the contemporaneous house value and the market value of the loan. The home price index (HPI) and loan amortization are incorporated in our calculation. Additionally, we account for multiple loans (liens) on some properties and thus use the combined loan amount in the negative equity calculation.
10 Notice that the loan duration time T is different from the natural time t, which allows identification of the model.
11 Alternatively, Demyanyk and Hemert (2008) and Elul (2009) use house price appreciation.
FICO score
FICO score is a numerical summary of the borrower’s history of debt repayment.
Although prime mortgage lenders screen out low credit score borrowers, some researchers still find that the level of the credit score matters to default risk among the loans originated (e.g. Clapp et al 2001). For subprime mortgage loans, many recent studies have found it to be a significant risk factor (see, e.g. Pennington-Cross 2003, Demyanyk and Hemert 2008, Elul 2009, Keys, Mukherjee, Seru and Vig 2010). Since many mortgage loans have cosigners (husbands or wives), we use the minimum of the FICO scores of the two in this study.
Backend ratio
The literature has long included payment-to-income ratio as a default risk factor (e.g. Herzog and Earley 1970, Archer, Ling and McGill 1996, Deng and Gabriel 2006). The payment-to-income ratio (frontend ratio) and debt-to-income ratio (backend ratio) are indeed two important mortgage underwriting variables besides FICO score. For many subprime mortgage loans, however, those two variables far exceed the traditional underwriting thresholds, and recent studies have found the debt-to-income ratio to be a significant determinant of default risk (Demyanyk and Hemert 2008, Green, Rosenblatt and Yao 2010).
Loan type
Fixed-rate mortgages (FRMs) behave very differently from adjustable-rate mortgages (ARMs) (see, e.g. Cunningham and Capone 1990, Philips, Rosenblatt and Vanderhoff 1995, Calhoun and Deng 2002). Some research has also found that 15-year FRMs are less risky than 30-year FRMs (e.g. Alexander, Grimshaw, McQueen and Slade 2002, Deng and Gabriel 2006). In this study, we focus on FRMs but include an indicator for 15-year FRMs as an explanatory variable.
Property type
Recent research has found that condominium loans are less likely to default, everything else equal (Agarwal, Ambrose and Sanders 2009). Therefore, we test whether different property types, i.e. single unit, two-to-four unit and condominium, have different default risk.
Loan purpose
Different loan purposes indicate that borrowers are at different stages of their housing tenure as well as in different financial situations. While Clapp et al (2001) find that refinance loans are more likely to default among prime mortgage loans, recent research such as Elul (2009) finds that subprime refinance loans are less likely to default, everything else equal. A related variable considered in the literature is whether the property is an existing/new unit (see, e.g. von Furstenberg and Green 1974, Campbell and Dietrich 1983, Deng and Gabriel 2006). We consider three different loan purposes in this study: home purchase, rate/term refinance, and cash out refinance.
Documentation type
An important feature of subprime mortgage loans is that many loans do not have full documentation of income, assets or employment. The low documentation may be caused by borrowers’ difficulty in verifying their income, assets or employment, or in some extreme situations borrowers simply state income they do not have (stated income loans). Recent studies have found that low doc loans have higher default risk (e.g. Demyanyk and Hemert 2008, Rajan, Seru and Vig 2009).
Occupancy type
Demyanyk and Hemert (2008) and Agarwal, Ambrose and Sanders (2009) find that investor properties are more likely to default. In this paper, we consider the following three occupancy types: owner-occupied, second/vacation home, and investor property.
Mortgage brokerage type
Many popular media outlets have ascribed the subprime crisis to mortgage brokers. Green, Rosenblatt and Yao (2010) find that broker and correspondent loans have higher default risk than retail loans. We therefore include brokerage type in our models.
Origination loan balance
The size of the loan is thought to be related to the transaction cost of default (e.g. Clapp et al 2001, Deng and Gabriel 2006), and recent studies of subprime mortgage default risk have found it to be significantly related to default (see, e.g. Demyanyk and Hemert 2008, Elul 2009).
Origination LTV
Some researchers believe that LTV at origination (or down payment) not only affects the equity position of the borrower throughout the life of the loan, but also reveals the borrower’s default propensity, indicates the borrower’s ability to save, or affects the borrower’s default decision as a sunk cost (see Yezer, Phillips and Trost 1994, Deng, Quigley and Van Order 1996, 2000, Kelly 2009, Green, Rosenblatt and Yao 2010). Additionally, lenders may exercise different levels of due diligence on high LTV and low LTV loans. Therefore, in addition to considering combined LTV in the negative equity calculation, we include origination LTV.
Prepayment penalty
Most prime residential mortgage loans are free to prepay. By contrast, many subprime loans have a prepayment penalty clause in the mortgage contract. Researchers believe that prepayment penalties limit subprime borrowers’ ability to refinance into more affordable loans and thus increase the chance of default (e.g. Demyanyk and Hemert 2008, Elul 2009, Agarwal, Ambrose and Sanders 2009).
Unemployment rate
One possible reason for mortgage default is that the borrower loses his or her job and is thus unable to make the mortgage payment. Therefore, the mortgage default risk literature has long included local area unemployment as a risk factor (see, e.g. Williams, Beranek and Kenkel 1974, Campbell and Dietrich 1983, Deng, Quigley and Van Order 2000). Recent research has also found it to be a significant risk factor for subprime loans (e.g. Demyanyk and Hemert 2008, Elul 2009).
Excess premium
There is increasing evidence that mortgage lenders can possess private information about borrower/loan quality that is not reflected in underwriting documents (see, e.g. Elul 2009, Rajan, Seru and Vig 2009, An, Deng and Gabriel 2010, Keys, Mukherjee, Seru and Vig 2010). Therefore, we include excess premium as a proxy for the lender’s private information about loan quality. The variable is constructed as the residual of a mortgage spread regression that includes all observable default risk factors on the right-hand side.12
Other variables
We also consider some other variables such as jumbo loan status, growth of per capita disposable income and growth of population in the metro area, corporate credit spread, and HPI volatility. They are not included in the final model due to multicollinearity problems. Ideally, we would include borrower characteristics such as age, gender, ethnicity and profession, number of dependents, and neighborhood variables such as whether the property is in central city, neighborhood homeownership rate, poverty level, crime rate, percent of homes foreclosed, etc. However, we do not have data on those variables.
3. Data
3.1. Data sources
Our data are mainly from First American CoreLogic LoanPerformance (hereafter LP). The LP database contains loan-level data on over 80 percent of all securitized subprime mortgages, which is also over half of all subprime mortgage loans originated in the US.
12 The regression results are available upon request.
LP provides detailed information on each subprime mortgage loan, including note rate, original loan balance, LTV, loan term (30 year, 15 year, etc.), loan type (fixed-rate, 5-1 ARM, etc.), loan purpose (home purchase, rate/term refinance, cash out refinance), borrower credit score, occupancy status, number of units, originator type (broker, retail lender, etc.), and prepayment penalty type. LP also tracks the performance (default, prepayment, maturity, or current) of each loan in every month. Therefore, we construct the event history of each loan, starting from its origination and ending at default, prepayment, maturity, or our data collection point, whichever is the earliest. We also merge other information such as HPI growth, interest rates, MSA-level unemployment rate and income growth into our loan-level data. Treasury rates and interest rate swap rates are matched into the data to calculate the mortgage spread. HPI is from Fannie Mae and is at the zip code level. Treasury interest rates and corporate bond yields are from the Federal Reserve, and MSA-level income growth and unemployment rate are from Moody’s Economy.com.
3.2. Sampling
The LP database contains about 14 million subprime mortgage loans. For our study purposes, we focus on first-lien, fixed-rate mortgage loans, which are about 19 percent of all loans.13 We further apply a number of filters: we first exclude loans originated before 1995, since LP has relatively less accurate information about those loans; seasoned loans are excluded, since information such as loan balance and LTV for those loans is not as of loan origination; we also exclude loans with interest-only periods and those not in metropolitan areas (MSAs); finally, loans with missing or wrong information on property type, refinance indicator, occupancy status, backend ratio, FICO score, documentation level or mortgage note rate are excluded.
13 A large fraction of the subprime mortgage loans are ARMs, e.g. about 38 percent of the LP sample are 2/28 ARMs.
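The event-history construction described in section 3.1 can be sketched as follows. This is a minimal illustration under stated assumptions: months are integer indices, the monthly status labels are hypothetical stand-ins for the LP performance codes, and the function name is not from the paper.

```python
def build_event_history(orig_month, status_by_month, cutoff_month):
    """Expand one loan into loan-month rows, stopping at the first terminal event.

    orig_month:      integer index of the origination month
    status_by_month: dict mapping month index to a status string; months not
                     listed are treated as 'current' (an assumption of this sketch)
    cutoff_month:    last observed month (the data collection point)
    """
    terminal = ("default", "prepaid", "matured")
    rows = []
    for month in range(orig_month, cutoff_month + 1):
        status = status_by_month.get(month, "current")
        rows.append({
            "age": month - orig_month,      # loan duration T
            "month": month,                 # calendar time t
            "default": int(status == "default"),
        })
        if status in terminal:
            break  # the history ends at default, prepayment or maturity
    return rows
```

A loan that reaches the cutoff month without a terminal event is censored: all of its rows carry `default = 0`, which is also how prepaid loans appear in a binary default model.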
We further adopt a sampling technique for the purposes of our study: we select a 10% random sample of loans from three vintages, those originated in 2000, 2003 and 2006. The numbers of subprime mortgage loans of those three vintages are 8,533, 31,836 and 26,876, respectively. Then for each vintage, we look at a three-year window of loan performance after loan origination. For example, for loans originated in 2000, we focus on their performance in 2000, 2001 and 2002. In so doing, we have three non-overlapping samples.
3.3. Descriptive statistics
In table 1, we report the performance of the three vintages of subprime mortgage loans. Default is defined as delinquency of over 90 days, and censoring means that the loan is alive at the end of the three-year window. The default rate varies across the three vintages: the 2003 vintage has a cumulative default rate of about 7 percent over the three-year window, in contrast to 16 percent for the 2000 vintage and 22 percent for the 2006 vintage. Apparently the strong house price appreciation the 2003 vintage experienced during 2003-2005 helped most of the 2003 vintage loans stay current, while the 2001-2002 economic downturn and the sharp house price decline starting in 2006 contributed to the high default rates of the 2000 and 2006 vintages. Overall, default rates of all three vintages of subprime mortgage loans are much higher than those of prime mortgage loans reported in previous studies (see, e.g., Philips, Rosenblatt and VanderHoff 1995, Deng, Quigley and Van Order 2000, Clapp, Deng and An 2006).
Figure 1 compares the cumulative default rates of the three vintages over the life of the loan. In every quarter after loan origination in the three-year window, the 2003 vintage has a lower default rate than the 2000 and 2006 vintages. The default rate of the 2006 vintage starts lower than that of the
2000 vintage but surpasses it one year after loan origination. Over 20 percent of loans originated in 2006 default within two years of origination.
Table 2 compares the loan characteristics of the three vintages. The 2000 vintage has a much lower average loan amount but a much higher average coupon rate. Mortgage spread is defined as the difference between the mortgage coupon rate and the comparable-maturity Treasury rate14. The 2000 vintage has an average mortgage spread of 505 bps, while the 2003 and 2006 vintages have average mortgage spreads of 339 bps and 343 bps, respectively. The relative magnitude of the mortgage spreads of the 2000 and 2003 vintages appears to reflect the aforementioned default rate difference between these two vintages. However, this risk-return relationship does not hold when we compare the 2006 vintage with the 2003 vintage: while they have similar average mortgage spreads, the 2006 vintage has a cumulative default rate over 3 times higher than that of the 2003 vintage over a three-year window. This finding is consistent with the so-called “default shock”: lenders and investors experienced default rates several times higher than expected during the housing and subprime mortgage crisis.
Average FICO score improves over time. In fact, the average FICO scores of the 2003 and 2006 vintages both exceed 620, the traditional FICO score cutoff for prime mortgage loans. This pattern is consistent with much anecdotal evidence that subprime lending was increasingly driven by non-credit reasons as the market evolved. This observation is also supported by the increase in the share of borrowers with low/no documentation. In 2000, only 19 percent of subprime loans have low/no doc, while the percentage increased to 32 and 28 percent in 2003 and 2006, respectively.
14 10-year Treasury rate for 30-year FRMs and 7-year Treasury rate for 15-year FRMs.
Combined LTV also increases monotonically from 2000 to 2006. In fact, table 3 shows that the 2006 vintage has a substantially higher proportion of high LTV loans. Nearly 18 percent of loans originated in 2006 have LTV higher than 97 percent, while that number is less than 3 percent for the 2000 vintage. The proportion of less risky 15-year loans decreases over time. In 2000, 19 percent of subprime FRMs are 15-year loans, while in 2006 this number is only 5 percent. Loan purpose composition also varies over time. The percentage of rate/term refinance loans is 10 percent in 2000; it increased to 18 percent in 2003 and then fell back to 10 percent in 2006. We also notice that a large proportion of the 2003 vintage loans were originated by mortgage brokers or correspondent lenders. Prepayment penalties are prevalent in all three vintages.
Table 4 further presents a comparison of the time-varying covariates of the three vintages. The most significant difference comes from HPI growth. The 2000 and 2003 vintages experienced an average HPI growth of 7 percent and 14 percent, respectively. In contrast, the 2006 vintage had an average HPI decline of 4 percent. Correspondingly, the average negative equity of the 2006 vintage is much higher than those of the 2000 and 2003 vintages. Both the 2000 and 2006 vintage loans experienced an average 1 percentage point increase in the unemployment rate, while the 2003 vintage saw a decline in the unemployment rate (an improvement in employment).
4. Model Estimation and Tests of Model Stability
4.1. Model estimation Both the hazard model and the Logit model are estimated using the maximum likelihood estimation methods as discussed in Clapp, Deng and An (2006).
Table 5 reports our hazard model estimates on the three separate samples, which are constructed based on the event-history data of the three vintage loans. The first column of the coefficients is for the 2000 vintage sample. Most of the estimates are conforming to our expectation. For example, default probability decreases with FICO score. The higher the original loan balance, the lower the likelihood that the loan will default post-origination, everything else equal. Low/no doc loans, investment property loans, and loans with prepayment penalty all have higher default risk than their reference groups, respectively. 15-year FRM and condo loans have lower default risk. Backend ratio is marginally significant with the expected sign of coefficient. Those loans with original LTV higher than 80 percent do not show a significant different default risk than those with LTV lower than or equal to 80 percent. We do see a significant positive relationship between negative equity and default probability – the larger the negative equity, the more likely the loan will default. Interestingly, Excess premium is significantly related to default probability, which supports the notion that loan originators do possess valuable private information regarding loan default risk and they incorporate that information in loan pricing. The 2003 vintage estimates show more significant default risk factors. For example, backend ratio is now highly significant with the expected sign of coefficient. Loans with higher than 80 percent original LTV are riskier than those with original LTV less than or equal to 80 percent, everything else equal. 2- to 4-unit property loans have higher risk than 1-unit loans. Both rate/term refinance and cash out refinance loans are shown to be less risky than home purchase loans. In addition, broker/correspondent loans tend to be riskier. Change in unemployment rate becomes a significant risk factor with the expected impact. 
FICO score, log loan balance, low/no doc, 15-year FRM, condo loan, investment property, Excess premium and negative equity continue to be significant and have the same signs as in the 2000 vintage estimates.

The significant default risk factors of the 2006 vintage are similar to those of the 2003 vintage, except that LTV greater than 80 percent, condo loan, and broker/correspondent loan are no longer significant at the 5 percent level (table 5). However, the magnitudes of many coefficients are very different from those of the 2003 vintage estimates.

In table 6, we present our estimates of the Logit model. Here we concentrate on default probability, so prepaid and censored observations are counted as non-default and a binary Logit model is estimated. First, the estimates for all three vintages are similar to the hazard model estimates reported in table 5. Second, comparing the estimates across the three vintage samples, the patterns are also similar to those discussed above.

4.2. Tests of parameter stability
To formally assess whether the parameters estimated with the three separate samples are statistically different, we conduct Wald tests as discussed in Andrews and Fair (1988). Basically, denote
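$\theta$ and $\theta^*$ as the true parameters of any two models (based on two different vintage loans), and $\hat{\theta}$ and $\hat{\theta}^*$ as their estimates. We test the following hypothesis:

$$H_0: \theta = \theta^* \qquad (3)$$

The Wald statistic is:

$$W = (\hat{\theta} - \hat{\theta}^*)' \left[ \mathrm{var}(\hat{\theta}) + \mathrm{var}(\hat{\theta}^*) \right]^{-1} (\hat{\theta} - \hat{\theta}^*) \qquad (4)$$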
Under the null hypothesis, the Wald test statistic is $\chi^2$ distributed with degrees of freedom equal to the number of parameters in the model (the number of rows in the first or third matrix in equation (4)).

Wald test results for the hazard model are reported in table 5, alongside the estimates. Moving from the 2000 vintage model to the 2003 vintage model, a number of parameters are statistically different. Default probability of the 2003 vintage is more sensitive than that of the 2000 vintage to FICO score and log loan balance, as the magnitudes of those two coefficients are significantly larger in the 2003 vintage model than in the 2000 vintage model. Interestingly, loans with LTV higher than 80 percent have significantly higher default risk in the 2003 vintage model but not in the 2000 vintage model. This may be because, when more subprime loans are available, higher-risk borrowers self-select into high-LTV loans. Rate/term refinance and cash out refinance also become significant in the 2003 vintage model, which could be due to the relatively worse performance of home purchase loans in the 2003 vintage. However, three of the prepayment penalty dummy variables become insignificant. Finally, the sensitivity of default probability to Excess premium declines significantly, which is consistent with the finding in An, Yao and Rosenblatt (2010) that Excess premium becomes less predictive of default, possibly because subprime lenders put less effort into collecting soft information when originate-to-distribute becomes easier.

Comparing the parameters of the 2003 and 2006 models, we again see significant parameter instability. The sensitivity of default probability to FICO score and to the change in unemployment rate becomes smaller in the 2006 vintage model, while LTV greater than 80 percent, condo loan, and broker/correspondent loan become insignificant in the 2006 vintage model. The most remarkable changes come from log loan balance and negative equity. Log loan balance is negatively associated with default probability in the 2003 vintage model but becomes positively related to default probability in the 2006 vintage model. The negative equity coefficient does not change sign, but its magnitude in the 2006 vintage model is more than three times that in the 2003 vintage model (0.78 versus 0.247 in table 5). In other words, the 2006 vintage subprime borrowers are much more sensitive to negative equity in their default decisions. This is quite intuitive: some borrowers might not choose to default while house prices are rising, even if they have some negative equity in their houses, but many borrowers might choose to default when house prices are falling, even if they have only a small amount of negative equity.

Wald test results for the Logit models are reported in table 6. They are very similar to the results for the hazard models. A number of parameters are unstable over time, but the greatest instability comes from the negative equity variable. Borrowers became much more sensitive to negative equity (that is, to declines in house prices) in their default decisions during the crisis. When house prices dropped dramatically during the crisis, this increased sensitivity made things worse, because the larger coefficient multiplies the increased negative equity to cause even more defaults.
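As a rough illustration of the test, the sketch below applies the Wald statistic to a single coefficient, using the estimates and standard errors reported in table 5. It assumes that the two vintage samples are independent, so that the variance of the difference is the sum of the two squared standard errors, and that the per-coefficient statistic has one degree of freedom. The function names are hypothetical and are not taken from the paper.

```python
import math


def wald_statistic(b1, se1, b2, se2):
    """Single-coefficient Wald statistic (assumes independent samples)."""
    return (b1 - b2) ** 2 / (se1 ** 2 + se2 ** 2)


def chi2_sf_1df(w):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(w / 2.0))


# Negative equity coefficient, hazard model (table 5): 2003 vs. 2006 vintage.
w_neg_equity = wald_statistic(0.247, 0.042, 0.780, 0.045)
p_neg_equity = chi2_sf_1df(w_neg_equity)

print(f"Wald statistic: {w_neg_equity:.2f}, p-value: {p_neg_equity:.2e}")
```

With these rounded inputs the statistic comes out at about 75, close to the 75.12 reported in table 5; the small gap is what one would expect from rounding of the published coefficients and standard errors.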
5. Default Shock and the Subprime Mortgage Crisis
Econometric default risk models rely heavily on historical data. Mortgage lenders and investors typically use mortgage loan performance observed in previous periods to estimate how default risk factors such as house price appreciation affect mortgage default probability and loss severity. Such models are then used to predict future default losses under simulated paths of house price appreciation.15 If models are unstable over time, then even with the most accurate predictions of the risk factor dynamics, predictions of default probability (and loss) will be significantly off target.

The subprime mortgage crisis is characterized by an unusually large fraction of subprime mortgage loans originated during 2005-2007 going into default during 2007-2009. This wave of defaults came as a shock to many lenders, investors and rating agencies. As shown in the previous analysis, the 2006 vintage subprime mortgage loans were originated with mortgage spreads very similar to those of the 2003 vintage; however, the ex post default rate of the 2006 vintage is over three times that of the 2003 vintage (22.21 percent versus 7.08 percent in table 1).

In this section, we conduct a simple econometric experiment to decompose this “default shock”, that is, to see how much of the default rate “surprise” is due to the unprecedented house price drop (HPI input error) and how much is due to the changing sensitivity of the parameters (parameter instability). Notice that the subprime mortgage market started to explode in 2003, while default rates of subprime loans really took off in 2006. Therefore, using the 2003 vintage model to predict the 2006 vintage data is an interesting experiment regarding model instability.

We obtain the parameter estimates from the 2003 vintage sample and use them as default risk factor loadings.16 In the first experiment, we use the actual subsequent values of the default risk factors for the 2006 vintage loans, together with the parameters estimated on the 2003 vintage sample, to predict the default rate of the 2006 vintage. This experiment tells us how accurately we can predict default probability if we have a perfect prediction of the default risk factors. Notice that this is not a completely feasible forward-looking prediction, but it separates the parameter instability problem from the model input error problem.

15 Those predictions, together with scenario analysis and sensitivity analysis, are then used to assist mortgage underwriting, pricing and risk management.
16 We set the insignificant parameters to zero because they are not statistically different from zero.

With both the hazard model and the Logit model, we make quarter-by-quarter predictions. Figure 2 plots the cumulative default rates predicted by the two models against the actual cumulative default rate. While the two models make very similar predictions, both substantially under-predict defaults. Table 7 presents the aggregate results. Again, both the hazard model and the Logit model under-predict the default probability of the 2006 vintage loans. While the actual cumulative default rate of the 2006 vintage loans is 22.2 percent over a three-year window, our hazard model prediction is only 13.3 percent and our Logit model prediction is only 13.0 percent. Normalized by the actual default rate, figure 3 shows that the hazard model predicts about 40 percent fewer defaults than actually occurred, while the Logit model predicts about 41 percent fewer.

Prior to the crisis, the predicted future house price path was probably much higher than the actual subsequent price path. Therefore, in our second experiment, in addition to using the 2003 vintage model estimates to predict default of the 2006 vintage, we assume a naïve house price model: one that predicts that the HPI growth rate in each zip code during 2006-2008 remains the same as that of 2003-2005. In so doing, we are able to see the combined impact of parameter instability and a bad HPI forecast. Table 8 presents the cumulative predicted defaults over a three-year window and compares them to the actual figures. Both models predict a default rate of less than 11 percent, while the actual default rate is about 22 percent. Therefore, the combined impact of parameter instability and a bad HPI forecast is larger than the impact of parameter instability alone: together they lead the models to predict over 50 percent
fewer defaults than actually occurred. Stated differently, the actual default rate is more than twice the predicted default rate. However, an interesting observation from a comparison of table 7 and table 8 is that the marginal impact of the bad HPI forecast is much smaller than that of parameter instability: prediction accuracy falls from 100 percent to 60 percent because of parameter instability, but only from 60 percent to 48 percent because of the naïve HPI forecast. This may help explain why the default wave came as a “surprise” even though many lenders and investors conducted scenario analysis, and some of them might have already predicted much lower HPI growth for the 2006-2008 period: using a wrong model is more detrimental than applying an unrealistic HPI growth assumption.
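The decomposition can be traced with simple arithmetic on the counts in tables 7 and 8. The sketch below is only an illustration of that arithmetic: the dictionary layout and the function name `decompose` are hypothetical, and "accuracy" is taken to mean predicted defaults as a percentage of actual defaults, as in figure 3.

```python
ACTUAL_DEFAULTS = 5969  # 2006 vintage, tables 7 and 8

# Predicted defaults from the models estimated on the 2003 vintage.
predicted = {
    "hazard": {"actual_hpi": 3579, "naive_hpi": 2852},
    "logit": {"actual_hpi": 3507, "naive_hpi": 2931},
}


def decompose(pred):
    """Split the prediction shortfall into its two sources, in percentage points."""
    acc_actual = 100.0 * pred["actual_hpi"] / ACTUAL_DEFAULTS
    acc_naive = 100.0 * pred["naive_hpi"] / ACTUAL_DEFAULTS
    return {
        "accuracy_actual_hpi": acc_actual,
        "accuracy_naive_hpi": acc_naive,
        # Shortfall that remains even with a perfect HPI forecast.
        "shortfall_parameter_instability": 100.0 - acc_actual,
        # Additional shortfall once the naive HPI forecast is used.
        "additional_shortfall_naive_hpi": acc_actual - acc_naive,
    }


for model, pred in predicted.items():
    parts = decompose(pred)
    print(model, {name: round(value, 1) for name, value in parts.items()})
```

For both models the shortfall attributed to parameter instability (roughly 40 percentage points) is several times the additional shortfall from the naïve HPI forecast (roughly 10 to 12 percentage points).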
6. Conclusions and Discussions
The subprime mortgage market experienced explosive growth in the early and mid-2000s and then collapsed in 2007. During the past three years, massive defaults of subprime mortgage loans have caused catastrophic losses in the financial markets. Much of the default loss came as a shock to the investment community, as evidenced either by the mortgage spreads charged by lenders at loan origination, which were not proportionate to the subsequent default risk, or by the large write-downs that mortgage lenders and investors took on their mortgage assets during the crisis. This has spurred retrospection on what went wrong with the risk management models. In this spirit, we investigate the stability of econometric default risk models and conduct econometric experiments to examine to what extent model instability explains the default shock.

Estimating separate hazard and Logit models for three loan vintages, all with a three-year observation window, we find that the prevailing econometric mortgage default probability models can be highly unstable over time. Not only do default risk factors such as HPI growth differ significantly across the three vintages, but the coefficients of a number of variables, especially that of the negative equity variable, also differ significantly across the three vintage models. Comparing the 2003 vintage loans with the 2006 vintage loans, the 2003 vintage experienced the highest house price run-up in history within three years of origination, while loans originated in 2006 were exposed to an unprecedented house price decline during 2006-2008. Meanwhile, the default probability of the 2006 vintage loans is over three times as sensitive to house price change as that of the 2003 vintage.

Our simulation suggests that both the hazard model and the Logit model estimated with 2003 vintage data under-predict the default probability of the 2006 vintage loans. Assuming a perfect forecast of HPI and other default risk factors, the hazard model predicts about 40 percent fewer defaults than actually occurred, while the Logit model predicts about 41 percent fewer. When the house price forecast is not accurate, the under-prediction is more severe. Assuming a naïve house price prediction, the two econometric models under-predict the default rate by over 50 percent.

The findings in this paper have a number of implications. First, we have to exercise extra caution in interpreting and applying empirical results based on historical data, especially non-representative data. The house price run-up during 2003-2006 was atypical. If we were to use only data from that atypical period in default risk forecasting, we would obtain exceptional results, as we show in this paper. It is certainly not easy to identify non-representative data ex ante. Remedies include using larger samples and longer histories, and scrutinizing all the data we analyze. Second, judged by the aggregate post-sample prediction accuracy, we need improvements in default risk models as well as in house price forecasting models.
Certainly, the current paper does not explore the optimal specification within the current hazard or Logit model framework, and we do believe improvements can be made in that regard.
However, models with a more “structural framework” may be more promising. For example, as many people believe that we have had regime shifts in the mortgage and housing markets, models that can capture those regime shifts may help improve our ability to forecast mortgage default risk. Third, default risk models can be misleading if used inappropriately, and model risk has to be understood in risk management operations. Model limitations may be masked by other factors during normal times, but when a structural change leads to a different data generating mechanism, model risk can become significant and costly. Fourth, from a regulatory perspective, the Basel II framework should be reformed to address credit cycles and avoid the pro-cyclicality of the usual risk assessment models. Finally, economic capital is important to mortgage bankers and to the investment community. In that regard, again, a technical problem will be how to get around the pro-cyclicality of the usual risk management models.
References
Agarwal, Sumit, Brent W. Ambrose, Souphala Chomsisengphet and Anthony B. Sanders. 2009. The Neighbor’s Mortgage: Does Living in a Subprime Neighborhood Impact your Probability of Default? SSRN working paper.
Alexander, William P., Scott D. Grimshaw, Grant R. McQueen and Barrett A. Slade. 2002. Some Loans Are More Equal than Others: Third-Party Originations and Defaults in the Subprime Mortgage Industry. Real Estate Economics 30(4), 667-697.
Ambrose, B. W. and A. B. Sanders. 2003. Commercial Mortgage Backed Securities: Prepayment and Default. Journal of Real Estate Finance and Economics, 26(2/3): 179-196.
An, Xudong, John C. Clapp and Yongheng Deng. 2010. Omitted Mobility Characteristics and Property Market Dynamics: Application to Mortgage Termination. Journal of Real Estate Finance and Economics 41(3).
An, Xudong, Yongheng Deng and Stuart A. Gabriel. 2010. Asymmetric Information, Adverse Selection, and the Pricing of CMBS. Journal of Financial Economics, forthcoming.
An, Xudong, Yongheng Deng and Anthony B. Sanders. 2009. Default Risk of CMBS Loans: What Explains the Regional Variations? National University of Singapore, IRES Working Paper 2009-009.
Andrews, Donald W. and Ray C. Fair. 1988. Inference in Nonlinear Econometric Models with Structural Change. Review of Economic Studies 55: 615-640.
Archer, W. R., D. C. Ling and G. A. McGill. 1996. The Effect of Income and Collateral Constraints on Residential Mortgage Terminations. Regional Science and Urban Economics, 26: 235-261.
Archer, W. R., D. C. Ling and G. A. McGill. 1997. Demographic Versus Option-Driven Mortgage Terminations. Journal of Housing Economics, 6(2): 137-163.
Calhoun, Charles, and Yongheng Deng. 2002. A Dynamic Analysis of Fixed- and Adjustable-Rate Mortgage Terminations. Journal of Real Estate Finance and Economics, 24: 9-33.
Campbell, T. and J. K. Dietrich. 1983. The Determinants of Default on Conventional Residential Mortgages. Journal of Finance, 38(5): 1569-1581.
Capozza, D. R., D. Kazarian, and T. A. Thomson. 1997. Mortgage Default in Local Markets. Real Estate Economics, 25(4): 631-655.
Capozza, D. R., D. Kazarian, and T. A. Thomson. 1998. The Conditional Probability of Mortgage Default. Real Estate Economics, 26(3): 359-390.
Chen, Jun and Yongheng Deng. 2003. Commercial Mortgage Workout Strategy and Conditional Default Probability: Evidence from Special Serviced CMBS Loans. USC Lusk Center for Real Estate Working Paper, 2003-1008.
Ciochetti, Brian A., Yongheng Deng, Gale Lee, James Shilling and Rui Yao. 2003. A Proportional Hazards Model of Commercial Mortgage Default with Originator Bias. Journal of Real Estate Finance and Economics 27(1), 5-23.
Clapp, John M., Yongheng Deng and Xudong An. 2006. Unobserved Heterogeneity in Models of Competing Mortgage Termination Risks. Real Estate Economics 34(2), 243-273.
Clapp, J. C., G. M. Goldberg, J. P. Harding and M. LaCour-Little. 2001. Movers and Shuckers: Interdependent Prepayment Decisions. Real Estate Economics, 29(3): 411-450.
Clauretie, T. M. 1987. The Impact of Interstate Foreclosure Cost Differences and the Value of Mortgages on Default Rates. Journal of the American Real Estate and Urban Economics Association, 15(3): 152-67.
Cunningham, D. F. and C. A. Capone, Jr. 1990. The Relative Termination Experience of Adjustable to Fixed-Rate Mortgages. Journal of Finance, 45(5): 1687-1703.
Deng, Yongheng. 1997. Mortgage Termination: An Empirical Hazard Model with Stochastic Term Structure. Journal of Real Estate Finance and Economics, 14(3), 309-331.
Deng, Yongheng, John M. Quigley and Robert Van Order. 1996. Mortgage Default and Low Down-payment Loans: The Cost of Public Subsidy. Regional Science and Urban Economics, 26: 263-285.
Deng, Yongheng, John M. Quigley and Robert Van Order. 2000. Mortgage Terminations, Heterogeneity and the Exercise of Mortgage Options. Econometrica, 68(2): 275-307.
Deng, Yongheng and John M. Quigley. 2002. Woodhead Behavior and the Pricing of Residential Mortgages. Lusk Center for Real Estate Working Paper, No. 2001-1005.
Deng, Yongheng, and Stuart A. Gabriel. 2006. Risk-Based Pricing and the Enhancement of Mortgage Credit Availability among Underserved and Higher Credit-Risk Populations. Journal of Money, Credit and Banking, 38(6), 1431-1460.
Demyanyk, Yuliya S. and Otto Van Hemert. 2009. Understanding the Subprime Mortgage Crisis. Review of Financial Studies, forthcoming.
Elul, Ronel. 2009. Securitization and Mortgage Default: Reputation vs. Adverse Selection. SSRN working paper.
Episcopos, A., A. Pericli, and J. Hu. 1998. Commercial Mortgage Default: A Comparison of Logit with Radial Basis Function Networks. Journal of Real Estate Finance and Economics, 17(2): 163-178.
Feldman, David and Shulamith Gross. 2005. Mortgage Default: Classification Trees Analysis. Journal of Real Estate Finance and Economics 30(4), 369-396.
Follain, J. and R. Struyk. 1977. Homeownership Effects of Alternative Mortgage Instruments. Journal of the American Real Estate and Urban Economics Association, 5(1): 1-43.
Foster, C. and R. Van Order. 1984. An Option-Based Model of Mortgage Default. Housing Finance Review, 3(4): 351-372.
Foster, C., and R. Van Order. 1985. FHA Terminations: A Prelude to Rational Mortgage Pricing. Journal of the American Real Estate and Urban Economics Association, 13: 292-316.
Gerardi, Kristopher, Adam Hale Shapiro and Paul S. Willen. 2008. Subprime Outcomes: Risky Mortgages, Homeownership Experiences, and Foreclosures. Federal Reserve Bank of Boston working paper.
Green, J. and J. B. Shoven. 1986. The Effect of Interest Rates on Mortgage Prepayment. Journal of Money, Credit and Banking 18, 41-50.
Green, Richard K., Eric Rosenblatt and Vincent Yao. 2010. Sunk Costs and Mortgage Default. SSRN working paper.
Haughwout, Andrew, Ebiere Okah and Joseph Tracy. 2009. Second Chances: Subprime Mortgage Modification and Re-Default. Federal Reserve Bank of New York Staff Report.
Hendershott, P. H., and W. R. Schultz. 1993. Equity and Nonequity Determinants of FHA Single-Family Mortgage Foreclosures in 1980s. Journal of the American Real Estate and Urban Economics Association, 21(4): 405-430.
Herzog, J. and J. Earley. 1970. Home Mortgage Delinquency and Foreclosure. New York: National Bureau of Economic Research.
Jackson, J. and D. Kaserman. 1980. Default Risk on Home Mortgage Loans: A Test of Competing Hypotheses. Journal of Risk and Insurance, 4: 678-690.
Kau, J. B., D. C. Keenan and T. Kim. 1994. Default Probabilities for Mortgages. Journal of Urban Economics, 35: 278-296.
Kelly, Austin. 2009. Skin in the Game: Zero Down Payment Mortgage Default. Journal of Housing Research, 17(2), 75-99.
Keys, Benjamin, Tanmoy Mukherjee, Amit Seru and Vikrant Vig. 2010. Did Securitization Lead to Lax Screening? Evidence from Subprime Loans. Quarterly Journal of Economics 125(1), 307-362.
Lucas, Robert. 1976. Econometric Policy Evaluation: A Critique. In Brunner, K. and A. Meltzer, The Phillips Curve and Labor Markets, Carnegie-Rochester Conference Series on Public Policy 1: 19-46. New York: Elsevier.
Mian, Atif and Amir Sufi. 2009. The Consequences of Mortgage Credit Expansion: Evidence from the U.S. Mortgage Default Crisis. Quarterly Journal of Economics, 124(4), 1449-1496.
Morton, T. G. 1975. A Discriminant Function Analysis of Residential Mortgage Delinquency and Foreclosure. Journal of the American Real Estate and Urban Economics Association, 3(1): 73-90.
Pennington-Cross, Anthony. 2003. Credit History and the Performance of Prime and Nonprime Mortgages. Journal of Real Estate Finance and Economics 27(3), 279-301.
Philips, R. A., E. Rosenblatt and J. H. VanderHoff. 1996. The Probability of Fixed and Adjustable Rate Mortgage Termination. Journal of Real Estate Finance and Economics 13(2): 95-104.
Philips, R. A. and J. H. VanderHoff. 2004. The Conditional Probability of Foreclosure: An Empirical Analysis of Conventional Mortgage Loan Defaults. Real Estate Economics, 32(4): 571-587.
Quigley, John M. 1987. Interest Rate Variations, Mortgage Prepayments and Household Mobility. Review of Economics and Statistics 69(4), 636-643.
Quigley, John M. and Robert Van Order. 1991. Defaults on Mortgage Obligations and Capital Requirements for U.S. Savings Institutions: A Policy Perspective. Journal of Public Economics 44(3): 353-370.
Quigley, John M. and Robert Van Order. 1995. Explicit Tests of Contingent Claims Models of Mortgage Default. Journal of Real Estate Finance and Economics, 11(2): 99-117.
Rajan, Uday, Amit Seru and Vikrant Vig. 2010. Statistical Default Models and Incentives. American Economic Association Papers and Proceedings, 100(2), 1-5.
Schwartz, Eduardo S. and Walter N. Torous. 1993. Mortgage Prepayment and Default Decisions: A Poisson Regression Approach. Journal of the American Real Estate and Urban Economics Association, 21(4): 431-449.
Vandell, K. D. 1978. Default Risk under Alternative Mortgage Instruments. Journal of Finance, 33(5): 1279-98.
Vandell, Kerry D. and T. Thibodeau. 1985. Estimation of Mortgage Defaults Using Disaggregate Loan History Data. Journal of the American Real Estate and Urban Economics Association, 13(3): 292-316.
Vandell, Kerry D. 1993. Handing Over the Keys: A Perspective on Mortgage Default Research. Journal of the American Real Estate and Urban Economics Association, 21, 211-246.
Van Order, Robert. 1990. The Hazards of Default. Secondary Mortgage Markets, 1990 (fall): 29-31.
von Furstenberg, G. 1969. Default Risk on FHA-Insured Home Mortgage as a Function of the Term of Financing: A Quantitative Analysis. Journal of Finance, 24(2): 459-77.
von Furstenberg, G. 1970a. Interstate Differences in Mortgage Renting Risks: An Analysis of Causes. Journal of Financial and Quantitative Analysis, 5: 229-42.
von Furstenberg, G. 1970b. The Investment Quality of Home Mortgages. Journal of Risk and Insurance, 37(3): 437-45.
von Furstenberg, G. and R. J. Green. 1974. Home Mortgages Delinquency: A Cohort Analysis. Journal of Finance, 29(4): 1545-48.
Webb, B. G. 1982. Borrower Risk under Alternative Mortgage Instruments. Journal of Finance, 37(1): 169-83.
Williams, A. O., W. Beranek and J. Kenkel. 1974. Default Risk in Urban Mortgages: A Pittsburgh Prototype Analysis. Journal of the American Real Estate and Urban Economics Association, 2(2): 101-2.
Yezer, Anthony M. J., Robert F. Phillips and Robert P. Trost. 1994. Bias in Estimates of Discrimination and Default in Mortgage Lending: The Effects of Simultaneity and Self-Selection. Journal of Real Estate Finance and Economics 9, 197-215.
Zorn, Peter and Michael Lea. 1989. Mortgage Borrower Repayment Behavior: A Microeconomic Analysis with Canadian Adjustable Rate Mortgage Data. Journal of the American Real Estate and Urban Economics Association, 17(1): 118-136.
[Figure 1: line chart. Vertical axis: cumulative default rate (percentage, 0 to 25). Horizontal axis: loan age (quarter, 1 to 12). Series: 2000 vintage, 2003 vintage, 2006 vintage.]
Figure 1: Cumulative default rates of the three vintage loans
[Figure 2: line chart. Vertical axis: percentage default (0 to 25). Horizontal axis: loan age (quarter, 1 to 11). Series: actual cumulative default rate, hazard model prediction, Logit model prediction.]
Figure 2: Predicted cumulative default rates of the 2006 vintage loans
[Figure 3: bar chart. Vertical axis: percentage of actual defaults (0 to 100). Hazard model: actual default 100, prediction with actual HPI 60, prediction with naïve HPI 48. Logit model: actual default 100, prediction with actual HPI 59, prediction with naïve HPI 49.]
Figure 3: Model predicted defaults as a percentage of actual defaults, 2006 vintage loan
Table 1: Performances of the three vintage loans
                     2000                    2003                    2006
              Number   Percentage     Number   Percentage     Number   Percentage
Default        1,396        16.36      2,255         7.08      5,969        22.21
Prepayment     2,571        30.13     14,440        45.36      4,868        18.11
Censor         4,566        53.51     15,141        47.56     16,039        59.68
Total          8,533       100.00     31,836       100.00     26,876       100.00
Note: Default is defined as over 90-day delinquency. Censor means that the loan is alive at the data cutoff point, which is 2002Q4, 2005Q4 and 2008Q4 for the three vintages, respectively.
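As a quick consistency check, the percentages in table 1 can be recomputed from the loan counts. The short sketch below does this and also computes the ratio of the 2006 to the 2003 default rate; the variable names are illustrative only.

```python
# Loan counts by outcome for each vintage, copied from table 1.
counts = {
    2000: {"default": 1396, "prepayment": 2571, "censor": 4566},
    2003: {"default": 2255, "prepayment": 14440, "censor": 15141},
    2006: {"default": 5969, "prepayment": 4868, "censor": 16039},
}


def shares(outcome_counts):
    """Return each outcome as a percentage of all loans in the vintage."""
    total = sum(outcome_counts.values())
    return {name: 100.0 * n / total for name, n in outcome_counts.items()}


default_rate = {vintage: shares(c)["default"] for vintage, c in counts.items()}
ratio_2006_to_2003 = default_rate[2006] / default_rate[2003]

for vintage, rate in default_rate.items():
    print(vintage, f"{rate:.2f}%")
print(f"2006 / 2003 default rate ratio: {ratio_2006_to_2003:.2f}")
```

The ratio comes out slightly above three.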
Table 2: Comparison of the loan characteristics of the three vintages
                                         Mean                       Standard Deviation
                               2000      2003      2006        2000      2003      2006
Original loan amount ($)     90,568   161,410   170,762      74,401   105,792   119,415
Coupon rate                    0.11      0.07      0.08        0.02      0.01      0.01
Mortgage spread (%)            5.05      3.39      3.43        1.58      1.21      1.29
FICO score                      602       638       626       63.88     64.45     61.62
Backend ratio                  0.38      0.38      0.39        0.11      0.10      0.10
Combined LTV                  76.28     78.93     79.38       15.55     15.24     17.42
LTV>80%                        0.37      0.42      0.36        0.48      0.49      0.48
Low/No doc                     0.19      0.32      0.28        0.39      0.47      0.45
Jumbo size loan                0.03      0.07      0.04        0.17      0.26      0.20
30-year FRM                    0.81      0.89      0.95        0.39      0.31      0.23
15-year FRM                    0.19      0.11      0.05        0.39      0.31      0.23
1-unit property                0.90      0.89      0.92        0.30      0.31      0.27
2- to 4-unit property          0.07      0.06      0.04        0.25      0.24      0.19
Condo                          0.03      0.05      0.04        0.18      0.21      0.20
Rate/term refinance            0.10      0.15      0.10        0.30      0.36      0.30
Cash out refinance             0.65      0.65      0.67        0.48      0.48      0.47
Home purchase                  0.25      0.20      0.22        0.44      0.40      0.42
Owner-occupied home            0.88      0.89      0.92        0.33      0.31      0.27
Second/vacation home           0.01      0.01      0.01        0.10      0.09      0.10
Investment property            0.11      0.10      0.07        0.32      0.30      0.26
Broker/correspondent loan      0.04      0.19      0.06        0.20      0.39      0.23
Retail loan                    0.03      0.10      0.02        0.17      0.30      0.13
Prep penalty 1-year            0.05      0.08      0.05        0.21      0.26      0.23
Prep penalty 2-year            0.02      0.04      0.03        0.12      0.19      0.18
Prep penalty 3-year            0.21      0.50      0.54        0.41      0.50      0.50
Prep penalty over 3-year       0.26      0.08      0.10        0.44      0.28      0.30
Number of loans               8,533    31,836    26,876
Table 3: Combined LTV distributions of the three vintages
Combined LTV (%)      2000      2003      2006
(0,60)               16.45     12.30     12.44
[60,70)              13.08     12.85     11.10
[70,75)              11.16      8.90      7.64
[75,80)              23.39     21.46     18.11
[80,85)              13.08     10.74     10.09
[85,90)              14.33     16.92     14.62
[90,95)               5.06      8.61      8.06
[95,97)               1.00      0.34      0.31
[97,100)              1.89      7.65     17.51
[100,∞)               0.56      0.23      0.12
Total               100.00    100.00    100.00
Table 4: Comparison of the time varying covariates of the three vintages
                                                 Vintage      Mean   Std Dev   Minimum   Maximum
HPI growth since origination                        2000      0.07      0.07     -0.15      0.66
                                                    2003      0.14      0.14     -0.16      0.79
                                                    2006     -0.04      0.13     -1.07      0.37
Contemporaneous negative equity                     2000     -0.48      1.16    -58.91      0.21
                                                    2003     -0.62      0.73    -27.16      0.13
                                                    2006     -0.32      0.74    -56.15      0.65
MSA-level income growth                             2000      0.01      0.03     -0.16      0.21
                                                    2003      0.02      0.04     -1.52      0.40
                                                    2006      0.01      0.02     -0.10      0.16
Change in MSA-level unemployment rate               2000      0.01      0.01     -0.12      0.08
(percentage point)                                  2003     -0.01      0.01     -0.11      0.15
                                                    2006      0.01      0.01     -0.10      0.14
Observations (loan-quarters)                        2000    62,025
                                                    2003   213,701
                                                    2006   192,940
Note: Negative equity is calculated with the contemporaneous house value (based on zip code level HPI) and the market value of the mortgage loan outstanding. See Clapp, Deng and An (2006) for more details.
Table 5: Hazard model parameter estimates and Wald test results of the three vintage loans
                                              Coefficient (S.E.)                            Wald Statistics
                                 2000                2003                2006                2000-2003   2003-2006
FICO score                       -0.548*** (0.032)   -0.719*** (0.026)   -0.627*** (0.016)   17.49***     9.37**
Backend ratio                     0.052    (0.027)    0.079*** (0.022)    0.068*** (0.014)    0.62        0.21
Log of original loan balance     -0.074*   (0.031)   -0.213*** (0.025)    0.031*   (0.015)   12.13***    72.19***
LTV>80%                          -0.096    (0.066)    0.321*** (0.05)     0.042    (0.029)   24.96***    23.03***
Low/No doc                        0.324*** (0.071)    0.451*** (0.048)    0.448*** (0.029)    2.22        0.00
15-year FRM                      -0.532*** (0.095)   -0.511*** (0.083)   -0.327*** (0.086)    0.03        2.38
2- to 4-unit property             0.196    (0.106)    0.203*   (0.091)    0.138*   (0.069)    0.00        0.32
Condo                            -0.399*   (0.197)   -0.579*** (0.145)   -0.098    (0.066)    0.55        9.12**
Rate/term refinance               0.009    (0.1)     -0.362*** (0.071)   -0.485*** (0.05)     9.14**      1.97
Cash refinance                   -0.113    (0.067)   -0.37***  (0.054)   -0.427*** (0.032)    8.95**      0.83
Second/vacation home             -0.178    (0.306)   -0.281    (0.29)     0.185    (0.124)    0.06        2.17
Investment property               0.446*** (0.085)    0.277*** (0.074)    0.413*** (0.052)    2.26        2.24
Broker/correspondent loan         0.118    (0.126)    0.115*   (0.055)   -0.097    (0.056)    0.00        7.39**
Prep penalty 1-year               0.207    (0.128)    0.192*   (0.094)    0.275*** (0.06)     0.01        0.55
Prep penalty 2-year               0.447*   (0.195)   -0.098    (0.117)    0.079    (0.069)    5.75*       1.7
Prep penalty 3-year               0.177*   (0.071)   -0.049    (0.049)   -0.007    (0.033)    7.01**      0.53
Prep penalty over 3-year          0.154*   (0.067)   -0.082    (0.083)    0.076    (0.048)    4.89*       2.71
Excess premium                    0.423*** (0.027)    0.3***   (0.018)    0.233*** (0.013)   14.32***     9.65**
Negative equity                   0.281**  (0.087)    0.247*** (0.042)    0.78***  (0.045)    0.13       75.12***
Change in unemployment rate       0.041    (0.034)    0.194*** (0.011)    0.138*** (0.014)   17.97***    10.4**
N                                62,025              213,701             192,940
-2LogL                           23,535              42,941              113,793
Note: *, ** and *** indicate significant at 0.05, 0.01 and 0.001 level, respectively. The baseline estimates are not shown in this table.
Table 6: Logit model parameter estimates and Wald test results of the three vintage loans
                                              Coefficient (S.E.)                            Wald Statistics
                                 2000                2003                2006                2000-2003   2003-2006
FICO score                       -0.561*** (0.033)   -0.733*** (0.026)   -0.649*** (0.017)   16.91***     7.44**
Backend ratio                     0.053    (0.028)    0.082*** (0.022)    0.07***  (0.014)    0.65        0.2
Log of original loan balance     -0.077*   (0.032)   -0.218*** (0.025)    0.032*   (0.015)   12.06***    72.95***
LTV>80%                          -0.096    (0.068)    0.332*** (0.051)    0.043    (0.03)    25.53***    23.8***
Low/No doc                        0.333*** (0.072)    0.46***  (0.048)    0.462*** (0.03)     2.14        0.00
15-year FRM                      -0.541*** (0.097)   -0.528*** (0.084)   -0.332*** (0.087)    0.01        2.65
2- to 4-unit property             0.201    (0.109)    0.2*     (0.092)    0.142*   (0.072)    0.00        0.25
Condo                            -0.405*   (0.199)   -0.582*** (0.146)   -0.098    (0.068)    0.51        8.99**
Rate/term refinance               0.01     (0.102)   -0.361*** (0.073)   -0.503*** (0.052)    8.78**      2.55
Cash refinance                   -0.115    (0.068)   -0.375*** (0.055)   -0.445*** (0.033)    8.87**      1.17
Second/vacation home             -0.183    (0.31)    -0.283    (0.292)    0.201    (0.129)    0.05        2.29
Investment property               0.46***  (0.087)    0.278*** (0.076)    0.429*** (0.053)    2.51        2.64
Broker/correspondent loan         0.119    (0.129)    0.117*   (0.056)   -0.1      (0.058)    0.00        7.37**
Prep penalty 1-year               0.207    (0.13)     0.202*   (0.096)    0.285*** (0.063)    0.00        0.53
Prep penalty 2-year               0.453*   (0.2)     -0.103    (0.119)    0.093    (0.072)    5.72*       2.01
Prep penalty 3-year               0.177*   (0.072)   -0.055    (0.049)   -0.007    (0.034)    7.05**      0.65
Prep penalty over 3-year          0.154*   (0.069)   -0.081    (0.084)    0.079    (0.05)     4.68*       2.68
Excess premium                    0.433*** (0.028)    0.31***  (0.018)    0.242*** (0.013)   13.93***     9.18**
Negative equity                   0.285**  (0.088)    0.24***  (0.042)    0.792*** (0.046)    0.21       78.43***
Change in unemployment rate       0.041    (0.035)    0.221*** (0.014)    0.147*** (0.014)   22.8***     13.93***
N                                62,025              213,701             192,940
-2LogL                           12,470              22,609              48,759
Note: *, ** and *** indicate significant at 0.05, 0.01 and 0.001 level, respectively. The baseline estimates are not shown in this table.
Table 7: Impact of parameter instability on default prediction
                     Hazard model prediction     Logit model prediction
                      Number    Percentage        Number    Percentage
Predicted default      3,579         13.32         3,507         13.05
Actual default         5,969         22.21         5,969         22.21
Sample size           26,876        100.00        26,876        100.00
Note: These are cumulative (predicted and actual) defaults of the 2006 vintage loan. The prediction is based on the model estimated with 2003 vintage data and the actual realization of the 2006 vintage covariates.
Table 8: Combined impact of parameter instability and HPI input error on default prediction
                     Hazard model prediction     Logit model prediction
                      Number    Percentage        Number    Percentage
Predicted default      2,852         10.61         2,931         10.91
Actual default         5,969         22.21         5,969         22.21
Sample size           26,876        100.00        26,876        100.00
Note: These are cumulative (predicted and actual) defaults of the 2006 vintage loans. The prediction is based on the model estimated with 2003 vintage data and assumes that the 2006 vintage loans experience the same zip code-level HPI growth during 2006-2008 as the 2003 vintage loans did during 2003-2005.
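The naïve HPI assumption in this note can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes quarterly HPI growth rates that compound over time, and the zip code and growth figures are made up. Under the naïve forecast, the 2003-2005 growth path of each zip code is simply reused as the 2006-2008 path.

```python
def cumulative_growth(quarterly_growth):
    """Cumulative HPI growth since origination, compounding quarterly rates."""
    level = 1.0
    path = []
    for g in quarterly_growth:
        level *= 1.0 + g
        path.append(level - 1.0)
    return path


def naive_forecast(history_2003_2005):
    """Reuse each zip code's 2003-2005 growth path as its 2006-2008 forecast."""
    return {
        zip_code: cumulative_growth(growth)
        for zip_code, growth in history_2003_2005.items()
    }


# Hypothetical input: two quarters of 3 percent growth in one zip code.
history = {"00000": [0.03, 0.03]}
forecast = naive_forecast(history)
print(forecast["00000"])
```

Feeding such a forecast path into the 2003 vintage model in place of the realized 2006-2008 path is what distinguishes table 8 from table 7.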