Multidimensional Distance to Collapse Point and Sovereign Default Prediction*

Roberto Savona**
University of Brescia, C.da S. Chiara 50, 25122 Brescia, Italy
[email protected]

Marika Vezzoli
University of Brescia, C.da S. Chiara 50, 25122 Brescia, Italy
[email protected]

This draft: July 2008
Abstract: This paper focuses on the predictability of sovereign debt crises, proposing a two-step procedure centered on the idea of a multidimensional distance-to-collapse point. The first step is non-parametric and devoted to constructing a generalized early warning system that signals a potential crisis. The second is parametric and contextualizes country default within a theory-based process depending on the first-step estimators. Empirical evidence shows that our methodology correctly predicts about 80 per cent of future default events.
* We thank Chuck Stone (University of California, Berkeley), Craig Lewis (Vanderbilt University), Paolo Manasse (University of Bologna), and seminar participants at Bicocca University, the University of Brescia, and the FMA 2008 European Conference for useful comments and suggestions. We also thank Dan Steinberg (Salford Systems) for providing the CART software.
** Corresponding author.
1. Introduction

A central topic in both academic and policy circles is financial crisis prediction. The rapid growth of the international debt of developing nations in the 1970s and the collapse of an increasing number of emerging market economies during the 1980s focused attention on monitoring systems that detect symptoms of debt crisis. Although significant progress has been made, accurately predicting the occurrence of crises is still an open question for both academics and policymakers. As pointed out by Eichengreen (2002), the reasons are threefold. First, forecasting requires modeling structural relationships that interact in nonlinear and state-contingent ways. Second, complex macroeconomic systems exhibit multiple equilibria, which in turn imply a non-trivial sensitivity to small perturbations (Morris and Shin, 2000). Third, financial markets are affected by a circularity effect: economic forecasts recursively affect market behavior, but forecasts themselves arise as expectations of market behavior, muting the signals coming from economic fundamentals. Together, these arguments raise questions about the stability of reduced-form models. Economists are extremely skeptical about predictions from these models, especially in light of the Lucas Critique. Nonetheless, many see such a methodological effort as a primary goal, since the potential costs to the economy induced by debt crises are substantial.

In this paper we offer a new approach to crisis prediction. By introducing a semi-parametric approach centered on the idea of a measurable distance-to-collapse point, we develop a two-step procedure. The first step is non-parametric and devoted to constructing a generalized early warning system that signals a potential crisis every time a group of indicators exceeds specific thresholds; the second is parametric and incorporates the first-step country default predictors within a generalized probit specification.

There are two main innovations in our paper. First, we generalize Kaminsky, Lizondo and Reinhart (1998) (KLR), who introduced a nonparametric method for extracting information from several variables about an impending crisis. In particular, their methodology involves monitoring the behavior of a number of leading indicators, attributing a crisis signal whenever such variables depart from normal levels, which are chosen so as to minimize the false-to-good signal ratio. Our first-step procedure extends KLR by: (i) simultaneously combining all the signals arising from the predictors; (ii) ranking the indicators according to their forecasting ability; (iii) estimating a probability of default conditional on the simultaneous signals issued by the various indicators; and (iv) allowing the indicators to be quantitative or qualitative variables. Second, our two-step procedure extends KLR by incorporating these estimates into a parametric model that readily accommodates out-of-sample forecasting.

Some recent papers implementing structural models in the spirit of Merton (1974) share important similarities with our methodology. Gray, Merton and Bodie (2006) use a contingent claims approach to quantify sovereign credit risk. By adapting the original Merton model to the case of the sovereign balance sheet, the authors make transparent how risk transfers across sectors and accumulates in the public sector, ultimately
leading to a default by the government. A second paper, by Hilscher and Nosbusch (2007), models the probability of sovereign default by assuming that a country defaults if a measure of macroeconomic fundamentals falls below a measure of the country's debt level; the default probability is then computed as the standard normal c.d.f. of the negative distance-to-default, defined as the number of standard deviations by which fundamentals exceed the debt level. Generalizing the approach of Hilscher and Nosbusch (2007), our two-step procedure uses a multidimensional concept of the distance-to-default, in which we care about the distance of each indicator from its threshold. Our choice to inspect country credit risk semi-parametrically, combining algorithmic and parametric modeling approaches, bypasses the reliability problem associated with predictive models, while maintaining a theory-based explanation of the process through which a country fails.

From a purely technical viewpoint, our work is closely related to Vezzoli and Stone (2007) (VS), which is inspired by the seminal work of Manasse and Roubini (2007). VS remove some important limitations associated with traditional data mining approaches, namely the Classification And Regression Trees (CART) algorithm. In a sense, Manasse and Roubini (2007) also serves as our starting point, because we inspect the same database, containing annual observations for 47 emerging economies over the period 1970-2002, which we extend to 66 countries and restrict to the time interval 1975-2002.

The empirical analysis provides convincing evidence on the ability of our approach to predict potential crises. More precisely, the first step delivers a potential sovereign risk rating system with corresponding empirical default probabilities that are conditional on signals. The second-step regression approach acts as a useful tool to show how nonparametric distances can be combined to generalize and parameterize the multidimensional distance to normality. We find that 7 variables, out of 22 potential predictors pertaining to the traditional categories commonly used to assess the debt-service capacity of a sovereign, are sufficient to achieve a significant share of correctly classified observations. Interestingly, inflation seems to be the most important factor in splitting high and low default probabilities. Measures of predictive ability reveal that our model does a good job predicting default. When considering thresholds of 0.05, 0.1, 0.25 and 0.5, the corresponding percentages of correctly classified defaults are 0.8182, 0.8030, 0.6818 and 0.5152.

The structure of the paper is as follows. Section 2 describes the reasons for sovereign default by inspecting theoretical and methodological issues. Section 3 introduces the procedure we use to predict sovereign default. Section 4 describes the data. Section 5 reports results. Section 6 concludes.
2. On the reasons for sovereign default

2.1. Theoretical issues

Inspecting the economic reasons for default is particularly challenging for sovereign counterparts. The underlying reasons for financial crises are conditional on factors that contain idiosyncratic and systematic components that vary over time. Sovereign credit risk is further complicated by the fact that [1] a sovereign default is endogenously triggered by political decisions, which in turn implies a trade-off between debt payment costs and costs connected to reputation or impediments to international trade.

A first question is how we define sovereign default. Our perspective focuses on debt-service capacity, which is the key element used by the various agencies to assess the relative likelihood that a borrower will default on its obligations. This broad definition guarantees homogeneity in the computation of the probability of default. That is, what differentiates sovereign and corporate defaults is not the definition of default per se, which is broadly speaking the same, but the way a country defaults and the variables that predict it.

It is firstly worth noting that country default may be triggered by a currency crisis, a banking crisis or both (the twin crises). It seems plausible to presume that a currency crisis is harder to predict, since forecasting banking crises is based on the monitoring of slowly evolving fundamentals. Empirical evidence is consistent with this supposition. Goldstein, Kaminsky and Reinhart (2000), for example, examine the links between currency and banking crises and changes in sovereign credit ratings [2]. They find mixed evidence on the ability of the rating agencies to anticipate financial crises. Reinhart (2002) points out that neither rating agency predicted banking crises, but there is evidence that Moody's sovereign ratings have some (very low) predictive power for currency crises. Other forecasting limitations arise because models often fail to incorporate political factors that help explain, at least in part, the willingness to meet debt payments [3]. In addition, other technical issues further complicate the assessment of sovereign creditworthiness. These include the composition of external debt [4] and the choice of potential predictors [5]. Despite these limitations, there is nonetheless sufficient convergence among academics and policymakers on the factors associated with the debt-service capacity of a sovereign.
[1] See Bulow and Rogoff (1989), Eaton and Gersovitz (1981), and Gibson and Sundaresan (2001).
[2] They analyzed the ratings by Institutional Investor and Moody's for 20 countries.
[3] Theoretical models have recently been proposed, as in Leblang (2002), who inspects the impact of speculative attacks conditional on specific political contexts.
[4] As a reference, recall that this factor was used in Frankel and Rose (1996) and Berg and Pattillo (1999) to inspect currency crises by estimating a probit model.
[5] This is particularly true for exports, real overvaluation, changes in reserves, and financial variables such as M2 over reserves. As discussed in Eichengreen (2002), a deceleration of export growth in the year before the crisis does a better job in some cases but not in others, and the same is observable for overvaluation, even if detrending the real rate does little better. The M2-to-reserves ratio is instead a good predictor in general, but it becomes noisy when its growth rate is used.
KLR [6] identify the following indicators, classified by the corresponding category:

• Capital account: international reserves, capital flows, short-term capital flows, foreign direct investment, and the differential between domestic and foreign interest rates.
• Debt profile: public foreign debt, total foreign debt, short-term debt, share of debt classified by type of creditor and by interest structure, debt service, and foreign aid.
• Current account: real exchange rate, current account balance, trade balance, exports, imports, terms of trade, price of exports, savings, and investments.
• International variables: foreign real GDP growth, interest rates, and price level.
• Financial liberalization: credit growth, change in the money multiplier, real interest rates, and spread between bank lending and deposit interest rates.
• Other financial variables: central bank credit to the banking system, the gap between money demand and supply, money growth, bond yields, domestic inflation, the “shadow” exchange rate, the parallel market exchange rate premium, the central exchange rate parity, the position of the exchange rate within the official band, and M2/international reserves.
• Real sector: real GDP growth, output, output gap, employment/unemployment, wages, and changes in stock prices.
• Fiscal variables: fiscal deficit, government consumption, and credit to the public sector.
• Institutional/structural factors: openness, trade concentration, dummies for multiple exchange rates, exchange controls, duration of fixed exchange rate periods, financial liberalization, banking crisis, past foreign exchange market crisis, and past foreign exchange market events.
• Political variables: dummies for elections, incumbent electoral victory or loss, change of government, legal executive transfer, illegal executive transfer, left-wing government, and new finance minister; also, the degree of political instability (a qualitative variable based on judgment).
2.2. Methodological issues

On the question of the relationship between observable predictors and country risk, different methodologies have been explored, based on philosophical assumptions about the nature of default. In general, the credit risk literature stresses the distinction between reduced-form models, in which default is assumed to be an inaccessible event whose probability is specified through a stochastic intensity process, and structural models, in which default is explicitly modeled as a triggering event based on the balance-sheet notion of solvency. A third, and in some sense parallel, perspective is given by pure statistical approaches, classifiable as “primitive” reduced-form models, which explain the relationship between country default and a number of possible predictors.

[6] Even though KLR focus on currency crises, the variables also apply to the general case of sovereign default.
More specifically on the issue of sovereign default, the financial literature has explored all these methodological possibilities, examining (i) the prices of sovereign debt, and (ii) the relationship between country default and predictors. As noted in Duffie and Singleton (2003) [7], the empirical literature primarily focuses on intensity models. For example, Claessens and Pennacchi (1996) derive pseudo default probabilities from Mexican Brady bond prices; Bhanot (1998) analyzes implied default recovery rates of coupon payments for Brady bonds; Merrick (2001) implements a constant risk-neutral default intensity model for Russian and Argentinian bonds; Pagès (2000) applies an affine-based intensity model [8] to Brazilian Brady bonds; Duffie et al. (2003) implement a three-factor affine intensity model for pricing sovereign Russian bonds; and Pan and Singleton (2006) explore the term structure of credit default swap (CDS) spreads for Mexico, Turkey, and Korea.

The literature on sovereign default based upon structural models is not as wide as that for reduced-form models. Keswani (2005) applied the two-factor structural model of Longstaff and Schwartz (1995) to Brady bonds. Gray, Merton and Bodie (2006) introduce a contingent claims approach to quantify sovereign credit risk, adapting the original Merton model to the case of the sovereign balance sheet, and Gapen et al. (2005) empirically implement this model for 12 emerging economies; Hilscher and Nosbusch (2007) propose a model of sovereign spreads based on a very similar concept of the Mertonian “distance-to-default”.

Finally, the literature on “primitive” reduced-form models is very extensive and mainly focuses on parametric approaches. Considering the papers which inspired our methodology, we first recall KLR (1998), where the authors elaborate a system in which a crisis is signaled when pre-selected leading economic indicators exceed thresholds estimated by minimizing the “false alarm-to-good signal ratio”, that is, the ratio of false signals to good signals. This method was further developed in a composite way by Goldstein, Kaminsky and Reinhart (2000), which elaborates on Kaminsky (1998), who proposed a composite indicator defined as a weighted sum of individual indicators. Frankel and Rose (1996) estimate a probit model to inspect currency crises, as do Berg and Pattillo (1999), although the latter do a better job in-sample using the KLR indicators as covariates. Two other recent papers are extremely interesting for our methodological inspiration. The first is Manasse, Roubini and Schimmelpfennig (2003), who developed an early warning system using a classification tree approach that outperforms the logit model in predicting crises. Extending such an approach, Manasse and Roubini (2007) then proposed a collection of “rules of thumb” that help predict potential crises in the spirit of KLR, while simultaneously using the pre-selected indicators.
[7] Duffie and Singleton (2003), p. 171.
[8] Essentially, affine intensity models are multidimensional versions of intensity processes in which the explanatory variables are modeled as independent stochastic processes. See Appendix A in Duffie and Singleton (2003) for further details.
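To make the KLR-type signal extraction concrete, the sketch below shows how a single indicator's threshold can be chosen by minimizing the false-alarm-to-good-signal (noise-to-signal) ratio over a grid of candidate cutoffs. It is only an illustrative sketch: it assumes a pandas Series of indicator values and a contemporaneous binary crisis flag, whereas KLR evaluate signals over a fixed horizon before the crisis.

```python
import numpy as np
import pandas as pd


def klr_threshold(indicator: pd.Series, crisis: pd.Series, grid_size: int = 100):
    """Pick the cutoff minimizing the false-alarm-to-good-signal ratio.

    A signal is issued whenever the indicator exceeds the candidate cutoff;
    'good' signals coincide with crisis observations, 'false' ones do not.
    """
    best_cut, best_ratio = None, np.inf
    candidates = np.quantile(indicator.dropna(), np.linspace(0.50, 0.99, grid_size))
    for cut in candidates:
        signal = indicator > cut
        good = (signal & (crisis == 1)).sum() / max((crisis == 1).sum(), 1)    # hit rate
        false = (signal & (crisis == 0)).sum() / max((crisis == 0).sum(), 1)   # false-alarm rate
        if good > 0 and false / good < best_ratio:
            best_cut, best_ratio = cut, false / good
    return best_cut, best_ratio
```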
3. Methodology

3.1. Motivating our approach

As discussed above, prior research uses a model-based approach that typically relies on linear relations between the crisis and a set of informative variables. Although these approaches are intellectually appealing, especially for structural-form models, sovereign defaults have multiple sources of risk that do not conform to the underlying distributional assumptions upon which these models rely. And when crises become more complex, these models fail to present a simple picture of sovereign default. It is indeed questionable, at least from an empirical viewpoint, that sovereign default can be isomorphically approximated by a short put option on the assets of the country, as the original Merton model suggests.

As in structural models, our approach is based on the concept of distance-to-default (DD), in which default is explicitly modeled as a triggering event based on the balance-sheet notion of solvency. In the classical Merton model, DD is defined as the difference between the market value of the firm and the face value of the firm's debt, scaled by the volatility of the firm. In order to preserve this concept for sovereign default, we focus on fundamentals. Indeed, as in Hilscher and Nosbusch (2007), we assume that whenever the fundamentals fall below a certain point the country fails. Linking a crisis prediction system to the monitoring of a number of variables that correlate with sovereign default, relative to certain levels considered “normal”, seems to be a flexible approach that can also accommodate the limitations of traditional structural models. In a broad sense, the type of monitoring we envision may be viewed as a sort of Merton-model equivalent, after relaxing the hypotheses underlying the DD. Indeed, even though the concept of DD is formalized within a general equilibrium model in which the Modigliani-Miller theorem holds, and is conceived to explain the economic reason for the default of a firm, such a process is essentially reduced to a metric distance, namely the distance to a specific point. Exporting this reasoning to the signal approach, it seems reasonable to think of signals as a sort of Multidimensional Distance to Collapse Point (MDCP), so defined in order to distinguish it from the original DD, which could be used to build a Merton-type model to predict sovereign default.

In sum, our approach exhibits some similarities with: (i) KLR (1998), as regards the signal approach, while simultaneously considering a set of informational variables, both quantitative and qualitative; (ii) Manasse and Roubini (2007), as regards the non-parametric approach, while modifying the classical data mining approach using a new algorithm tailored to the data structure we deal with [9]; (iii) Hilscher and Nosbusch (2007), as regards the concept of distance-to-default [10], while generalizing it to the multivariate context. Our methodology is then semi-parametric, since it combines the nonparametric signal approach with a regression-based approach in order to generalize and parameterize the multidimensional distance to normality.

[9] While Manasse and Roubini (2005) straightforwardly estimate regression trees, we implement a new algorithm recently introduced in Vezzoli and Stone (2007), which is well suited to the panel data structure of the sovereign default database we inspect.
[10] Which is the MDCP in our case.
3.2. First-step procedure: a tailored data mining approach

3.2.1. Methodological context

Due to the complex nature of the default of a sovereign, which clearly differs from that of a corporate, our first “philosophical” perspective relies on algorithmic modeling, allowing the data to speak about sovereign default while also seeing how well the data conform to a particular model. Indeed, we read sovereign defaults as the output of a non-stylizable system in which a set of variables acts as inputs. In this setting, the central question is to describe the conditional distribution of the defaults, $Y$, given $\mathbf{X}$, where $\mathbf{X} = (X_1, \dots, X_r)$ is a collection of $r$ vectors of predictors, both quantitative and qualitative. More formally, the main objective is to recursively partition the predictor space into subsets in which the distribution of $Y$ is successively more homogeneous. This is what data mining techniques try to do. In a sense, they detect complex and latent structures nested within large data sets. The output from such an analysis is a system that helps classify and predict sovereign defaults through a sequence of splitting rules. Strictly speaking, this is the underlying logic of the Classification And Regression Tree (CART) model, popularized in the statistical community by the seminal work of Breiman, Friedman, Olshen and Stone (1984), and recently applied to sovereign debt crisis prediction by Manasse and Roubini (2007).

Technically, such recursive partitioning of the predictor space is carried out by means of a binary tree, which graphically depicts the topology of the space through a series of subsequent nodes that collapse into distinct partitions. Let $T$ denote a tree with $m = 1, \dots, M$ terminal nodes, i.e. the disjoint regions $\tilde{T}_m$, and let $\Theta = \{\theta_1, \dots, \theta_M\}$ be the parameter that associates each $m$-th $\theta$ value with the corresponding node. A generic dependent variable $Y$ conditional on $\Theta$ has distribution $f(Y \mid \Theta)$, and according to whether $Y$ is quantitative or qualitative the model is called a regression tree or a classification tree. For our purposes, $Y$ is a dummy variable that takes the value 1 in the event of a sovereign default and zero otherwise. Computationally, the general problem of finding a good tree is solved by minimizing $[Y - f(Y \mid \Theta)]^2$ with respect to $\Xi = \{T, \Theta\}$. This entails selecting the optimal number of regions and the corresponding splitting values [11]. Importantly, CART models assume that (i) $Y$ is i.i.d. within each region, and (ii) $Y$ is independent across regions, leading to the following data distribution:

$$p(Y \mid \mathbf{X}, \Xi) = \prod_{m=1}^{M} f(Y \mid \Theta) \qquad (1)$$

[11] Due to the technical difficulty of solving such a minimization problem, many use a greedy algorithm to grow the tree, sequentially choosing splitting rules for nodes based upon some maximization criterion, and then controlling for overfitting by pruning the largest tree according to a specific model choice rule such as cost-complexity pruning, cross-validation, or multiple tests of the hypothesis that two adjoining regions should merge into a single one. See Hastie, Tibshirani and Friedman (2001) for technical details.
where $\mathbf{X} = (X_1, \dots, X_r)$ is the collection of predictors and $m = 1, \dots, M$ indexes the set of homogeneous clusters with respect to the predictor values, given specific thresholds. Equation (1) can then be used to assess the probability of default conditional on predictors and clusters. Since the number of clusters (terminal nodes or regions) and the threshold values (the splitting rules) are the result of a minimization process, the methodology gives a rating mapping that is optimal for both the number of classes and the corresponding probabilities of default. To this end, consider that CART seeks to obtain maximum homogeneity within the regions by minimizing an impurity index, frequently measured by the Gini index for classification trees, or by the sum of squared errors for regression trees. In other terms, such a technique also delivers a sort of “implied rating system validation”, since the partitions are homogeneous in terms of maximum predictability by construction. These are essentially the main motivations of the paper by Manasse and Roubini (2007), who relied on the classification tree in predicting sovereign debt crises; the reason they used the classification-type model is that the dependent variable was assumed to be qualitative. In the next section we will show that if $Y$ is a dummy variable, as is the case for sovereign default, regression and classification trees converge, thus allowing the use of a regression-type model also for dummy variables. This is not a purely technical detail, because regression trees deliver results that can be read as the probability that the observations within a specific region assume the state 1 or 0, i.e., the default probability or the survival probability, respectively.

Another critical point concerns the two basic assumptions of CART models, namely that $Y$ is i.i.d. within each region and independent across regions. One of the main problems of such models is indeed connected to the structure of the data set. Unfortunately, neither the first nor the second assumption holds for panel data, which is the typical structure of data on defaults of different countries occurring over specific time intervals. Indeed, due to their inner mechanism, CART models basically act as a technique that does not pay attention to the intrinsic data structure, in which autocorrelations and other latent dependencies could play a major role.

3.2.2. More technical issues on regression trees

More technicalities on regression trees are now needed to better explain our approach. The first point concerns the convergence between classification and regression trees when the dependent variable is a dummy. This is particularly interesting, because using regression trees leads to a probability estimate. To understand why, consider first equation (1), where it is easy to observe that $f$ is a parametric family indexed by $\theta_m$, which gives the structure of the tree, since each $\theta$ represents a specific region $\tilde{T}_m$, via $\sum_{m=1}^{M} \theta_m I(\mathbf{X} \in \tilde{T}_m)$, in which $I$ is the indicator function. Consider also that, since the regression tree seeks to minimize the sum of squared errors, the best $\hat{\theta}_m$ is simply the average of the $Y$ values within the $m$-th node, for $x_i \in \tilde{T}_m$, estimated as

$$\hat{\theta}_m = N_m^{-1} \sum_{x_i \in \tilde{T}_m} y_i$$

with $i = 1, \dots, N$ indexing the total number of observations and $N_m$ the number of observations within the $m$-th region. For classification trees, instead, $\tilde{T}_m$ is modeled via the proportion of a specific category, say $k$ with $k = 1, \dots, K$, estimated as $\hat{p}(k \mid m) = N_m^{-1} \sum_{x_i \in \tilde{T}_m} I(y_i = k)$. It is then straightforward to note that, reducing $k$ to 2 states, denoted 1 and 0 and corresponding to default and survival respectively, $\hat{p}(1 \mid m) = N_m^{-1} \sum_{x_i \in \tilde{T}_m} I(y_i = 1) = N_m^{-1} \sum_{x_i \in \tilde{T}_m} y_i \equiv \hat{\theta}_m$ and $\hat{p}(0 \mid m) = 1 - \hat{p}(1 \mid m)$. Hence, when the
dependent variable is a dummy, the use of a regression tree allows $\hat{\theta}_m$ to be interpreted as the probability that the cases in the $m$-th node assume the state 1, namely the probability of default. By contrast, a pure classification tree would consider the Bernoulli model, partitioning the cases between the nodes with almost the same proportion of 1s and 0s within each of them.

The second technical point concerns the iterative procedure by which an optimal regression tree is obtained. Indeed, the binary recursive partitioning chooses the best splitting rule by maximum error reduction. Mathematically, if we denote by $s^{*}$ the best split value (or category) [12] and by $R(m) = N^{-1} \sum_{x_i \in \tilde{T}_m} (y_i - \hat{\theta}_m)^2$ the loss function, which essentially measures the variability within each node, the fitting criterion is given by $\Delta R(s^{*}, m) = \max_{s} \Delta R(s, m)$ with $\Delta R(s, m) = R(m) - [R(m_1) + R(m_2)]$.

Finally, the last technical issue concerns the overfitting problem. As mentioned before, a common strategy to control for this problem is to grow an overfitted tree $T_{\max}$ and then prune it back so as to minimize the cost-complexity function $R_{\alpha}(T) = R(T) + \alpha |\tilde{T}|$, in which $\alpha \geq 0$ denotes the cost-complexity parameter and $|\tilde{T}|$ the number of terminal nodes in $T$. As is evident, $\alpha$ modulates the trade-off between the size of the tree and its goodness of fit to the data: large values of $\alpha$ result in smaller trees $T_{\alpha}$, and vice versa.

[12] According to whether the variable is quantitative (a value) or qualitative (a category).
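The two points above — leaf means of a regression tree on a 0/1 target acting as default probabilities, and $\alpha$ as the pruning penalty in $R_\alpha(T)$ — can be illustrated with a short scikit-learn sketch on synthetic data (the paper itself relies on the Salford CART software; names and numbers below are illustrative only).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                                   # three synthetic predictors
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 1).astype(float)    # 0/1 "default" dummy

# Regression tree on a dummy target: each leaf estimate theta_hat_m is the leaf mean,
# i.e. the share of defaults in that node -- an estimated default probability.
# ccp_alpha plays the role of alpha in the cost-complexity function R_alpha(T).
tree = DecisionTreeRegressor(ccp_alpha=0.01).fit(X, y)
leaf_id = tree.apply(X)                                         # terminal-node membership
node_prob = {m: y[leaf_id == m].mean() for m in np.unique(leaf_id)}
print(node_prob)                                                # matches tree.predict(X), leaf by leaf
```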
The previous points summarize the main technicalities of the CART approach, which intuitively appears to be a possible and useful technique for inspecting sovereign default. It allows for non-linear relationships, and predictors can be quantitative or qualitative. Again, since the number of nodes and the corresponding splitting thresholds are the output of an optimization procedure, it may be possible to obtain a rating system in which the partitions are homogeneous with respect to the predictor values and able to predict an impending crisis in the spirit of the signal approach. Moreover, since the predictors are used simultaneously, what we could obtain is a multidimensional distance to collapse point derived as a collection of “rules of thumb”. Unfortunately, in addition to the above-mentioned limitations of CART, i.e. that $Y$ is assumed to be i.i.d. within each region and independent across regions, two other drawbacks are serious enough to cast doubt on the potential of this approach: the trade-off between accuracy and complexity, and instability. Accuracy refers to the model's ability to provide correct predictions, which can be thought of as a measure of good signals about the event occurrence (in our case, the default); a model achieves high accuracy by increasing its complexity, resulting in a prediction system that provides good signals but is difficult to read, since they are spread over an overfitted tree structure. Instability is instead a problem linked to the extreme sensitivity that CART exhibits with respect to the input space: minor modifications of the sample can lead to significant changes in the tree structure. Together, these reasons explain why we need a better approach, tailored to the data structure we have to inspect.

3.2.3. A tailored data mining approach

Starting from the limitations discussed in the previous section, a data mining procedure called CRAGGING has recently been proposed in VS (2007) with the aim of removing such problems, generalizing the base scheme into an approach able to deliver better and more robust results. From a statistical standpoint, this algorithm is designed to reconcile accuracy, stability and interpretability of a prediction system. It is in fact well known in the statistical community that, although recent technical improvements have led to new methods [13] that achieve better predictions and control for instability, some problems still remain concerning the pervasiveness of these methods on the data structure, as well as their inner complexity, which usually produces multiple outputs. Within this context, the algorithm introduced in VS (2007) represents a possible reconciling solution designed in the spirit of Occam's razor, since it delivers, in a sense, a data compressor which is also a scientific explanation/formulation generator.

To better understand the underlying logic, it is worth recalling that each data mining model is estimated through a rotational estimation procedure, known as cross-validation, in which the sample is partitioned into subsets such that the analysis is initially performed on a single subset, while the other subset(s) are retained for subsequent use in confirming and validating the initial analysis. The initial subset of data is called the training set; the other subset(s) are called validation or test sets. This is done with the aim of improving out-of-sample predictability. This technical note is important because much of the innovation of CRAGGING, which stands for CRoss-validation AGGregatING, is indeed focused on this point. In sum, the idea underlying the algorithm is to repeatedly rotate [14] the subsets on which the analysis is initially performed (the training sets) so as to, first, generate multiple predictors and, second, combine them to obtain a univariate and stable tree. This is the reason why CRAGGING can be viewed as a generalization of CART.

[13] These methods, called ensemble learning, linearly combine single functions of the input variables, in the spirit of Taylor or Fourier series expansions. In more depth, ensemble predictions $\tilde{f}(\mathbf{X})$ are made as a linear combination of base or weak learners $\hat{f}(\mathbf{X})$, each a function of the input variables $\mathbf{X} = (X_1, \dots, X_r)$.
[14] More precisely, we should use the term perturb, since we refer to perturbation and combination, all the more so since CRAGGING is formally an ensemble learning algorithm pertaining to P&C (Perturb and Combine) methods.

Following VS (2007), let us start by considering panel data $L$ with observations $t_j = 1, \dots, T_j$ for each unit $j$, where $j$ denotes the unit (in our case a country), $j = 1, \dots, J$;
hence, the total number of observations is $N = \sum_{j=1}^{J} T_j$. Denote by $y_{jt}$ the dependent variable and by $x_{jt}$ the vector of predictors for unit $j$ in period $t$. The $J$ units are randomly partitioned into $V$ test sets of equal size, denoted $L_v$ with $v = 1, \dots, V$ and reported below in Table 4, each containing $J_v$ units. Denoting by $L_v^c$ a generic $v$-th training set, where the superscript stands for complement, since $L_v^c = L - L_v$ contains $J_v^c$ units, the expression

$$\hat{f}_{\alpha}\left(x_{jt} \mid L^{c}_{v \setminus l}\right) \quad \text{with } l \in L_v^c \qquad (2)$$

denotes the fitted function (the base learner) for unit $j$ in period $t$, computed by the tree grown over $L_v^c$ excluding the $l$-th unit, using a cost-complexity parameter $\alpha$. The following criterion is then used to improve the accuracy of the model:

$$\tilde{f}^{\,crag}_{\alpha,v}(x_{jt}) = \frac{1}{J_v^c} \sum_{l \in L_v^c} \hat{f}_{\alpha}\left(x_{jt} \mid L^{c}_{v \setminus l}\right) \quad \text{with } j \in L_v, \; t = 1, \dots, T \qquad (3)$$

which is the average of the functions in (2) (the ensemble learner) fitted over the units contained in the test set $L_v$. This procedure is carried out for every $L_v$, and the fitted functions (2) are linearly combined so that $\tilde{f}^{\,crag}_{\alpha,v}(x)$ acts as a good predictor for future $x$ in $L_v$. As for the choice of the tuning parameter $\alpha$, which is of fundamental importance for the accuracy of the model, CART is first performed over the entire $L$ to obtain a starting value of the parameter, which is then reduced by an arbitrary quantity, recomputing (3) with the new parameter value. The optimal $\alpha$ corresponds to the value for which the out-of-sample error estimate over all the $L_v$ is minimized, i.e.
$$\arg\min_{\alpha} L^{crag}_{\alpha} = \arg\min_{\alpha} \frac{1}{V} \sum_{v=1}^{V} L^{crag}_{\alpha,v} \qquad (4)$$

where $L[\cdot]$ is a loss function, namely the estimation error for every $L_v$,

$$L^{crag}_{\alpha,v} = \frac{1}{N_v} \sum_{j \in L_v} \sum_{t=1}^{T} L\left[y_{jt} \,;\, \tilde{f}^{\,crag}_{\alpha,v}(x_{jt})\right] \qquad (5)$$
in which $N_v = \sum_{j=1}^{J_v} T_j$ is the number of observations in the test set.
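The loop defined by equations (2)-(5) can be sketched as follows, using scikit-learn regression trees as the base learner on a small synthetic panel; this is our own illustrative rendering, not the VS (2007) code, and variable names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Tiny synthetic panel: 20 countries x 10 years, three predictors, a 0/1 default flag.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
df = pd.DataFrame(X, columns=["x1", "x2", "x3"])
df["country"] = np.repeat(np.arange(20), 10)
df["default"] = (X[:, 0] + rng.normal(scale=0.5, size=200) > 1).astype(float)
folds = np.array_split(rng.permutation(np.arange(20)), 5)      # the V test sets L_v (countries)

def cragging_loss(panel, features, target, unit_col, folds, alpha):
    """Out-of-sample loss L^crag_alpha of equations (4)-(5) for a single alpha."""
    losses = []
    for test_units in folds:                                   # loop over the test sets L_v
        test = panel[panel[unit_col].isin(test_units)]
        train = panel[~panel[unit_col].isin(test_units)]       # training set L_v^c
        preds = []
        for l in train[unit_col].unique():                     # leave one unit out of L_v^c: eq. (2)
            sub = train[train[unit_col] != l]
            tree = DecisionTreeRegressor(ccp_alpha=alpha).fit(sub[features], sub[target])
            preds.append(tree.predict(test[features]))
        f_crag = np.mean(preds, axis=0)                        # eq. (3): average of the base learners
        losses.append(np.mean((test[target].to_numpy() - f_crag) ** 2))  # eq. (5), squared loss
    return float(np.mean(losses))                              # eq. (4)

alphas = np.linspace(2.18, 0.0, 18)                            # progressively reduced from the CART value
best_alpha = min(alphas, key=lambda a: cragging_loss(df, ["x1", "x2", "x3"], "default", "country", folds, a))
print(best_alpha)
# Final step (not shown): stack the test-set predictions obtained with best_alpha into an
# N x 1 vector, use it in place of Y, and grow a single tree with the same alpha.
```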
As a final step, CRAGGING and CART are combined in the following way. First, the predictions computed in every test set, conditional on the optimal cost-complexity parameter, are stacked to obtain an $N \times 1$ vector. Second, this vector is used to replace the dependent variable $Y$, and a single tree is grown with the same optimal cost-complexity parameter. Combining CRAGGING with CART, what we obtain is a parsimonious final model, with good predictions (accuracy), better interpretability and minimized instability. At this point, it is somewhat clear what changes in the above general formulation when CRAGGING is applied to sovereign default data sets, where the dependent variable is a dummy. As noted, in this case the use of a regression tree allows $\hat{\theta}_m$ to be interpreted as the probability that the cases in the $m$-th node assume the state 1, namely the probability of default. This means that the general function $\tilde{f}$ has to be read as the predicted probability of default [15], conditional on (i) the predictors and (ii) the terminal nodes. This leads us to reformulate equation (1) as
$$p(Y \mid \mathbf{X}, \Xi) = \hat{f}_{\alpha^{*}}(\mathbf{X}).$$

Armed with this new technology, we are able to better predict sovereign default while also producing an optimal rating mapping, maintaining the panel data structure and using the same signal approach as in KLR (1998) and Manasse and Roubini (2007). As one can note, however, the overall architecture of this data mining approach has a limitation: the final outcome $\hat{f}_{\alpha^{*}}(\mathbf{X})$ is a physical average probability of default for the node population, since it arises as a simple average of defaults over pooled time-series and cross-section data. We have no information about country-specific and time-varying probabilities of default. To overcome this problem we run a second-step procedure to parameterize the non-parametric MDCP estimated in the first step.

[15] In this case equation (5) is the mean squared error, i.e. $L^{crag}_{\alpha,v} = \frac{1}{N_v} \sum_{j \in L_v} \sum_{t=1}^{T} \left[y_{jt} - \tilde{f}^{\,crag}_{\alpha,v}(x_{jt})\right]^2$.

3.3. Second-step procedure: semi-parametric MDCP
As pointed out by Hilscher and Nosbusch (2007), using a pure structural approach to model sovereign default is more challenging than modeling corporate default. Standard structural models, which are based on the idea that a firm defaults when its assets fall below its liabilities, cannot be straightforwardly exported to the sovereign default mechanism. A possible way out is to focus on fundamentals. Indeed, the underlying idea of the distance-to-default can be preserved by assuming that whenever the fundamentals fall below a certain point, the country fails. This is what we do, introducing a multidimensional semi-parametric approach.

Let $F$ denote a measure of the fundamentals of a given country, following the process

$$dF_{t'} = \sigma(F)\, dW_{t'} \qquad (6)$$

where $W$ is a standard Brownian motion on a probability space $(\Omega, \mathcal{F}, P)$ satisfying the usual conditions, for which $W_0 = 0$, $W$ has independent increments and $(W_{t'} - W_t) \sim N(0, t' - t)$ for $0 \leq t < t'$ [16], with $N$ denoting the normal distribution; and where $\sigma$ is the fixed volatility of $F$. We assume that there is a critical value of fundamentals $c$ such that the country fails at $t'$ whenever $F_{t'} \leq c$. We call $c$ the collapse point for $F$. Then, the probability of default at $t'$ conditional on the current fundamentals $F_t$ is

$$P_t(F_{t'} \leq c \mid F_t) = P_t\big(F_t + \sigma(F)\, dW_{t'} - c \leq 0\big) = P_t\!\left(dW_{t'} \leq -\frac{F_t - c}{\sigma(F)}\right) \equiv N(-h_t) \qquad (7)$$

where $N(-h_t)$ is the probability that a standard normal variable is less than $-h_t$, with $h_t = \dfrac{F_t - c}{\sigma(F)}$.

[16] See Protter (1990) for further technical definitions.
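Numerically, equation (7) is a one-line computation; the sketch below uses SciPy and made-up values for the fundamentals, the collapse point and the volatility.

```python
from scipy.stats import norm

F_t, c, sigma = 120.0, 100.0, 15.0        # illustrative fundamentals, collapse point, volatility
h_t = (F_t - c) / sigma                   # distance to the collapse point
p_default = norm.cdf(-h_t)                # equation (7): P_t(F_t' <= c | F_t) = N(-h_t)
print(round(h_t, 3), round(p_default, 4)) # 1.333 0.0912
```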
Suppose now that we deal with an $n$-dimensional distance-to-collapse point

$$h_t = (h_t^{(1)}, \dots, h_t^{(n)}) = \left(\frac{F_t^{(1)} - c^{(1)}}{\sigma^{(1)}(F^{(1)})}, \dots, \frac{F_t^{(n)} - c^{(n)}}{\sigma^{(n)}(F^{(n)})}\right).$$

Formally, $h$ is an $n$-dimensional Markov factor process living in a state space $S = \mathbb{R}^n$, so that $h_t$ is a vector of real numbers. The Markovian assumption implies that, for any given function $g: S \to \mathbb{R}^n$ and for any fixed time $t' > t$, we have $E_t[g(h_{t'})] = f(h_t)$ for some function $f: S \to \mathbb{R}^n$, where $E_t$ denotes the expectation conditional on all information available at time $t$. In our study we assume that the function $f$ is defined by some vector $b$ in $\mathbb{R}^n$ so that, for all $h$, we have:

$$f(h) = b \cdot h = b_1 h^{(1)} + \dots + b_n h^{(n)}. \qquad (8)$$
Roughly speaking, equation (8) is a linear combination of the $n$ distance-to-collapse points which keeps equation (7) valid for the multivariate case of different measures of fundamentals. It is now easy to note that, by using the function $g(h) = N(b \cdot h)$, we obtain a probit model without intercept, with the $n$ distance-to-collapse points as covariates,

$$E_t[g(h_{t'})] = N(b \cdot h) \equiv E_t[P_{t'}(h)] \qquad (9)$$
where $N(b \cdot h)$ is the cumulative standard normal distribution function, $b$ is a vector of coefficients to be estimated, and $P$ denotes the probability of default. In this way we model a MDCP in which the output of the final tree is the input of the probit model. In other terms, with this second step we contextualize country default within a theory-based process depending on a linear combination of the volatility-scaled distances to the thresholds.

Analytically, the covariates $h_t = (h_t^{(1)}, \dots, h_t^{(n)}) = \left(\frac{F_t^{(1)} - c^{(1)}}{\sigma^{(1)}(F^{(1)})}, \dots, \frac{F_t^{(n)} - c^{(n)}}{\sigma^{(n)}(F^{(n)})}\right)$ are constructed from the $n$ variables in the final tree partition, where $F - c$ is the distance from the corresponding threshold used as a splitting rule, and where $\sigma$ is computed by stacking time-series and cross-sectional observations for each variable. In this way, the probit regression delivers the linear combination of the volatility-scaled distances to the thresholds that best calibrates the transformed data towards 0 or 1, depending on the observed status of the countries (1 for default and 0 for non-default). The use of the probit model is clearly motivated by assumptions (6)-(9), for which the shock to fundamentals is normally distributed with mean 0 and constant variance $\sigma^2$; i.e., the probability that a standard normal variable is less than $b \cdot h$ computed at $t$ is equivalent to the probability of default at $t'$, with $t' > t$.

The regression approach we propose is then useful to understand how non-parametric distances can be combined in order to reduce the $n$-dimensional measures of fundamentals to a “generalized” distance to default which summarizes all relevant information to better predict the likelihood of default for a given country across time. We thereby remove the limitation of the non-parametric approach, which gives us only physical long-run probabilities of default. Indeed, the standard normal c.d.f. of the $b$ vector times the normalized distances $h$, observed across time and for each country, delivers country-specific, time-varying probabilities of default.
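A minimal sketch of the second step, assuming statsmodels and synthetic data: the covariates are the volatility-scaled distances $h$, and passing them to Probit without adding a constant yields the no-intercept specification of equation (9). Thresholds and coefficients below are made up for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_obs, n_vars = 400, 3
F = rng.normal(size=(n_obs, n_vars))                 # stacked fundamentals (illustrative)
c = np.array([0.5, -0.2, 0.1])                       # collapse points, e.g. first-step thresholds
sigma = F.std(axis=0)
H = (F - c) / sigma                                  # volatility-scaled distances h^(i)
y = (H @ np.array([-0.8, 0.4, 0.3]) + rng.normal(size=n_obs) > 0).astype(int)  # synthetic defaults

probit = sm.Probit(y, H).fit(disp=0)                 # no intercept: only the n distances enter
print(probit.params)                                 # estimates of b
print(probit.predict(H)[:5])                         # fitted default probabilities N(b·h)
```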
Of particular interest is also the fact that, under multivariate normality of $h_t^{(1)}, \dots, h_t^{(n)}$, the usual statistics for the probit coefficient estimates can be used for parametric tests on the thresholds. Hence, the corresponding p-values of the coefficients indicate the statistical significance of the splitting rules of the final tree under the common asymptotic test assumptions.

4. Data
The data used in this paper come from S&P's, the World Bank's Global Development Finance (GDF), the IMF, the Government Finance Statistics database (GFS) and Freedom House (2002), and include annual observations over the period 1975-2002 for 66 emerging economies. Our dataset is similar to that used in Manasse and Roubini (2007), though we extended the overall number of sovereigns and reduced the time interval [17]. Missing values and the low number of defaults in the subperiod 1970-1974 basically explain why our final dataset differs from Manasse and Roubini (2007). For each country the debt crisis indicator is the one provided by S&P's, for which a sovereign default is defined according to whether a country does not meet its financial obligations (interest and/or capital payments) on the due date, and the potential predictors were selected according to most of the empirical findings on the debt-service capacity of a sovereign.

Table 1 reports the list of variables used in this study. Note that the 22 potential predictors, both quantitative and qualitative, cover all the categories in KLR (1998) [18], including measures of external debt, public debt, solvency and liquidity, financial variables, the real sector and institutional factors [19]. The dataset focuses on countries that were not in default in the year before the crisis. In this way, we concentrate on default prediction when the current year is not yet a crisis year [20]. Tables 1 and 2 report, respectively, the list of variables and the corresponding summary statistics on distributions winsorized at the 5th and 95th percentile levels, in order to control for the impact of extreme values on the summary statistics.

As a whole, we analyzed 70 crisis episodes, reported below in Table 3, which seem to suggest that over time crises exhibit a cyclical pattern: most of the defaults fall in the period 1981-1986, and a new cycle seems to have commenced starting from 1996, around the Asian crisis of 1997. On this point, it is interesting to note that many viewed the period 1997-2002 as a sort of “stress test” for rating agencies: the crises of Russian bonds in 1998, the Pakistani Eurobond in 1999, Ecuador in 2000, Argentina in 2001 and, finally, Uruguay in 2002 led to a focus also on contingent liability and international liquidity considerations (Bhatia, 2002).

[17] The period inspected by Manasse and Roubini (2005) starts from 1970.
[18] Many of these are regressors included in the IMF's Early Warning System (EWS) model of currency crises. The reason is that a possible link between currency and debt crises has been documented (see, e.g., Reinhart, 2002).
[19] For political factors we used, in particular, the index of political rights compiled by Freedom House (2002), which takes values on a scale from one (most “free”) to seven (least “free”).
[20] This is not trivial, since sovereign default exhibits a substantial difference with respect to corporate default: for corporates, default is an absorbing state, i.e. after the crisis the firm cannot move to a non-default state, while for countries default is not absorbing, because after the crisis a sovereign can move back to a non-default state.

Summary statistics for the potential crisis predictors are reported in Table 2 for the quantitative variables. To give some preliminary insights into the predictors' ability to signal
sovereign defaults, we split the overall sample into defaults and non-defaults, computing the mean and standard deviation of each variable. Then, we ran parametric paired t-tests on the default and non-default means [21]. Note that the factors that are likely to affect a sovereign crisis (with the expected direction of the effect) appear to be: reserves (+), exports (+), GDP (+), external debt (−), short-term debt to reserves (−), M2 to reserves (−) and inflation (−). On the other hand, the current account, exchange rate volatility and the Freedom House (2002) index of political rights exhibit less discriminant power between the default and non-default sub-samples.

[21] The variables with a positive expected direction of the effect on sovereign risk worsen in the run-up to a crisis and then improve when the country exits from the crisis. For the variables with a negative expected direction, the corresponding values increase before the crisis and then decline and stabilize after the crisis.
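As a sketch of the winsorization and the default/non-default comparison just described, the snippet below uses SciPy on a made-up panel; note that for two independent sub-samples this amounts to a two-sample t-test on the group means (column names are illustrative).

```python
import numpy as np
import pandas as pd
from scipy import stats
from scipy.stats.mstats import winsorize

rng = np.random.default_rng(3)
df = pd.DataFrame({                                   # illustrative stand-in for the panel
    "inflation": rng.lognormal(2.5, 1.0, 300),
    "default": rng.integers(0, 2, 300),
})

# Winsorize at the 5th and 95th percentiles to limit the impact of extreme values.
df["inflation_w"] = np.asarray(winsorize(df["inflation"].to_numpy(), limits=[0.05, 0.05]))

d = df.loc[df["default"] == 1, "inflation_w"]
nd = df.loc[df["default"] == 0, "inflation_w"]
t_stat, p_val = stats.ttest_ind(d, nd, equal_var=False)   # compare default vs non-default means
print(d.mean(), nd.mean(), t_stat, p_val)
```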
5. Empirical analysis

5.1. First-step procedure

The panel data inspected in our empirical analysis, denoted $L$, contains $N = \sum_{j=1}^{J} T_j = 1{,}355$ observations for $J = 66$ countries. As described in the methodological section, we first randomly divided the 66 countries into $V = 11$ equally sized sets ($L_1, \dots, L_{11}$), each containing 6 countries ($J_v = 6$), i.e. roughly 10 per cent of the overall sample, as reported in Table 4. Second, we estimated $\hat{f}_{\alpha}(x_{jt} \mid L^c_{v \setminus l})$, namely the fitted function of predicted probabilities computed over the $v$-th training set $L_v^c = L - L_v$ excluding the $l$-th country, using a specific cost-complexity parameter $\alpha$. Hence, the training set used to estimate equation (2) contains 60 countries and the test set used in estimating equation (3) contains 6 countries. The optimal tuning parameter, say $\alpha^{*}$, is determined with the objective of minimizing the out-of-sample error over all the $L_v$, as formalized in equation (4).

Computationally, we first computed the cost-complexity parameter that best modulates the trade-off between tree size and goodness of fit to the data, say $\alpha_{ind}$, where “ind” stands for independent, recalling the CART assumption that the data are i.i.d. This parameter is chosen so as to minimize the relative error, which is the normalized measure of accuracy used to remove scale dependence. More formally, if we define $R(\mu) = E[Y - \mu]^2$ as the mean squared error using the constant $\mu = E[Y]$ as a predictor of $Y$ (which is also the variance of $Y$), the relative mean squared error $RE$ is
$$RE = \frac{\bar{L}_{v}}{R(\mu)} \qquad (10)$$
in which the numerator is the mean squared error over all the $L_v$ test sets. Table 5 reports the results for the best $\alpha_{ind}$. Note that the optimal tree that minimizes the relative error has six nodes, with $\alpha_{ind} = 2.1762$. This signifies that a high penalty on the number of final nodes is required to control for the overfitting problem. This $\alpha_{ind}$ is the starting value of the tuning parameter used in performing the CRAGGING algorithm.

To be clearer in our exposition, consider the first learning sample $L_1^c = L - L_1$. Analytically, we removed each country in $L_1^c$ one at a time, computing $\hat{f}_{\alpha_{ind}}(x \mid L^c_{1 \setminus l})$ (equation (2)). This function was then fitted over all the units in $L_1$, using the predictions to compute the average $\tilde{f}^{\,crag}_{\alpha_{ind},1}(x)$ (equation (3)). At this point, we calculated the prediction error in the test set $L_1$, $L^{crag}_{\alpha_{ind},1}$, which is the first argument within the summation operator in equation (4). All of this was executed for each $L_v$ test set.

The procedure just described was repeated, changing the value of $\alpha_{ind}$ in order to minimize the loss function, i.e. $\arg\min_{\alpha} L^{crag}_{\alpha} = \arg\min_{\alpha} \frac{1}{V} \sum_{v=1}^{V} L^{crag}_{\alpha,v}$. In detail, we progressively reduced the parameter over a range of 18 values, growing 1,080 trees for each $L_v^c$ [22] and obtaining a total of 11,880 trees, i.e. 1,080 trees for each $v$-th training set with $V = 11$. The results are in Table 6, where for each of the 18 values of $\alpha$ we report the prediction error for each $v$-th training set, as well as the value of $L^{crag}_{\alpha}$ according to equation (5). Note that the optimal $\alpha$, i.e. the tuning parameter that minimizes $L^{crag}_{\alpha}$, is zero. This result is extremely interesting in relation to the overfitting problem. Indeed, since $\alpha$ is the key element modulating the trade-off between complexity and accuracy, the common expectation is that a very low value of this parameter results in high estimation errors, which then require pruning as the strategy to find the optimal tuning parameter. But, surprisingly, our methodology delivered $\alpha^{*} = 0$. In other words, we obtained a minimized MSE for a tuning parameter that allows us to significantly reduce the size of the tree, obtaining a parsimonious partition that is easy to read without pruning. For a comparison with common CART models, observe in Table 5 that when $\alpha$ is set to zero the number of nodes is 36.
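For completeness, the relative error of equation (10), used above to select $\alpha_{ind}$ and reported in Tables 5 and 6, can be computed as the test-set MSE normalized by the variance of $Y$; the helper below is an illustrative sketch.

```python
import numpy as np

def relative_error(y_test: np.ndarray, y_pred: np.ndarray, y_all: np.ndarray) -> float:
    """Equation (10): test-set MSE divided by R(mu), the MSE of the constant predictor mu = E[Y]."""
    mse_test = np.mean((y_test - y_pred) ** 2)
    r_mu = np.mean((y_all - y_all.mean()) ** 2)       # equals the variance of Y
    return float(mse_test / r_mu)
```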
Lastly, CRAGGING and CART were merged by replacing the dependent variable (sovereign default) with the predictions in $L_1, \dots, L_{11}$ obtained with the optimal cost-complexity parameter $\alpha^{*}$, then growing a single tree on the learning set and validating it on the test set. In doing this we used the same partition of 60 countries for the learning set and 6 for the test set [23].

[22] Since each training set contains 60 countries and $V = 11$, the number of trees per training set is 60 times the 18 values of $\alpha$.
[23] As before, out of a total of 66 countries, the learning and the test set contained 90 and 10 per cent of the sample, respectively, namely 60 and 6 countries. In detail, the test set contained the Czech Republic, Lithuania, Malaysia, Nicaragua, Ukraine and Zimbabwe, while the learning set contained the remaining 60 countries.

This is the final model, in the words of VS (2007), namely a
18
single final tree computed as usual by choosing the optimal tuning parameter α in order to minimize the test set relative error. Figure 1 reports the optimal α while figure 2 reports the structure of the final tree with corresponding optimal thresholds to be used in classifying a country within a specific cluster. Consider, first, the optimal tuning parameter. As indicated in figure 1, the value that minimize the test-set relative errors is 0.0532, corresponding to a 10-nodes tree partition. Observe, also, that imposing small values for α, say 0 ≤ α ≤ 0.15, the results are less sensitive towards the cost-complexity parameter. In other terms, the final model do not require high penalty to control the overfitting problem while minimizing the prediction error. The final tree is therefore accurate and exhibits a simple structure. On this last point, let consider, now, figure 2. Of 22 potential predictors, our model selected 7 significant variables: inflation (INF); exchange rate (EXCHR); total external debt (TEDY); short term debt to reserves (STDR); real growth (RGRWT); reserves growth (RESG); export growth (WXG). Interestingly, inflation appears as the most significant variable in predicting a crisis. The value of the corresponding threshold is 33.44 per cent which basically splits the overall sample into: (i) episodes with low inflation (smaller or equal than 33.44 per cent) in which the probability of default is low; (ii) episodes with high inflation (greater than 33.44 per cent) in which the probability of default is high. Table 7 reports the detailed results in which the splitting rules help classify the sovereign within a specific class giving the corresponding probability of default fˆα ∗ (X ) . This is equivalent to a rating
system in which we first classify a sovereign within a rating cluster and then attribute the corresponding default probability, conditional on the predictors and on the rating class membership.

An in-depth analysis of the data gives some interesting insights about sovereign crises. If we focus on the riskiest clusters, we note that high inflation, together with external over-indebtedness and a low exchange rate (which implies a deterioration of the current account and also indicates a rigid exchange regime, since volatility moves with the level), gives rise to a significant probability of sovereign default. On the other hand, a high exchange rate, connected with improvements in the current account and flexible regimes (high volatility), reduces the probability of default (see terminal node 10 in Figure 2), which is coherent with the view that contagion (particularly for the crises of Argentina, Brazil and Chile) has been limited by more flexible exchange rates (Eichengreen, 2002). Hence, inflation targeting seems to be a good ingredient for controlling the riskiness of a sovereign, and this point seems to support recent work suggesting that inflation targeting should move in tandem with floating exchange rates (see, e.g., Mishkin and Schmidt-Hebbel, 2002). However, even if inflation is low, negative real GDP growth associated with short-term indebtedness and a bounded negative reserves growth (the splitting rule is > −4.29) can lead to a significant probability of default (see terminal node 3); on the other hand, a significant reduction in reserves growth (less than −4.29 per cent) leads, ceteris paribus, to a lower default probability. At first sight this could appear paradoxical, since the prevailing theory considers that the larger the reserves, the lower the probability of default. If, however, we observe the so-called parent node 4, i.e. the node from which we obtain terminal nodes 2 and 3, we note that the splitting rule considers episodes with a short-term debt to reserves ratio greater than 1.47. Hence, our tree basically controls for cases in which STDR could exceed the threshold because reserves plummeted. In other words, what one should consider is the actual short-term debt increase, and this explains why the higher probability of default is associated with reserves growth greater than −4.29 per cent.

5.2. Second-step procedure

In the second-step procedure we parameterized the non-parametric distances computed with the final model. A probit model was estimated using as covariates the volatility-scaled difference between each observation in the panel data and the corresponding threshold, i.e.,

$$P(Y_{jt+1} = 1 \mid H) = N(H'b) \qquad (11)$$

where $N(\cdot)$ is the standard normal distribution function; $H$ is the $JT \times n$ matrix of distance-to-collapse points estimated for the $J$ units (countries) over the $T$ periods for each of the $n$ variables, i.e. $H = (h^{(1)}_{jt}, \dots, h^{(n)}_{jt})$, where the $h$ are vectors of $JT$ volatility-scaled differences $\frac{F^{(i)}_{jt} - c^{(i)}}{\sigma^{(i)}(F^{(i)})}$ for $i = 1, \dots, n$, with $F$ the value of the $i$-th variable for the
j-th country at t, c the i-th threshold, σ the standard deviation of each F variable; b is the n × 1 vector of coefficient that linearly combine the distances-to-collapse points. Computationally, our regression approach was carried out as follows: •
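Before turning to the estimation details, a minimal sketch of how the distance-to-collapse covariates in H might be assembled is given below. It assumes a hypothetical pandas DataFrame `panel` with one row per country-year, the indicator columns and a `node` column holding the terminal-node membership from the final tree; the threshold values are those of figure 2 and equation (12).

import pandas as pd

def distance_to_collapse(panel: pd.DataFrame, thresholds: dict) -> pd.DataFrame:
    """Compute h = (F - c) / sigma for each indicator, where sigma is the
    standard deviation of F computed within the observation's terminal node.
    `thresholds` maps indicator name -> threshold c, possibly node-dependent
    via a callable (as for INF and TEDY)."""
    H = pd.DataFrame(index=panel.index)
    for var, c in thresholds.items():
        c_obs = panel.apply(c, axis=1) if callable(c) else c      # node-dependent threshold
        sigma_node = panel.groupby("node")[var].transform("std")  # "node heterogeneity" in sigma
        H[var] = (panel[var] - c_obs) / sigma_node
    return H

# Thresholds as in the final tree (figure 2 / table 7); column names are hypothetical
thresholds = {
    "INF":   lambda row: 26.05 if (row["INF"] <= 26.05 and row["RGRWT"] > -1.26) else 33.44,
    "TEDY":  lambda row: 44.02 if row["INF"] <= 33.44 else 43.78,
    "RGRWT": -1.26, "EXCHR": 3.74, "STDR": 1.47, "RESG": -4.29, "WXG": 5.92,
}
# H = distance_to_collapse(panel, thresholds)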
Computationally, our regression approach was carried out as follows:
• In the probit estimation, the c thresholds used in computing the covariates were those of the final tree obtained in the first-step procedure. In detail, for each observation we first considered the terminal-node classification and then used the corresponding threshold to standardize the data relative to the standard deviation. In doing so, only two variables had thresholds that were "conditional" on the node classification, namely INF and TEDY. Indeed, observing the final tree structure in figure 2, we note that the thresholds for INF are 33.44 and, for those cases clustered within node number 4, 26.05. Again, TEDY is used both for high-inflation and relatively low-inflation cases, although with different splitting rules. The corresponding value of c_TEDY used in the probit analysis therefore differed across units depending on the node classification: 43.78 for INF > 33.44 and 44.02 otherwise.
• In computing the distances to the thresholds, we did not impose our expectations about the raw distances: for some indicators a crisis is signaled whenever F exceeds c (INF, TEDY, STDR, RESG), while for others the reverse is true (RGRWT, EXCHR, WXG), i.e. a default occurs whenever F falls below c. We simply computed F − c and then inspected the sign of the coefficient estimates in order to verify the plausibility of the parameters with respect to the expected direction of the effect on the default probability. Furthermore, σ was calculated over stacked time-series and cross-sectional observations by conditioning on the ownership node; it is indeed reasonable to assume "node heterogeneity" with respect to the dispersion measure of each fundamental. Hence, the model is

\[
P\left(Y_{jt+1}=1\right) = N\!\Bigg(
b_1\,\frac{INF - c_{INF}}{\sigma(INF)_{node}}
+ b_2\,\frac{TEDY - c_{TEDY}}{\sigma(TEDY)_{node}}
+ b_3\,\frac{RGRWT - (-1.26)}{\sigma(RGRWT)_{node}}
+ b_4\,\frac{EXCHR - 3.74}{\sigma(EXCHR)_{node}}
+ b_5\,\frac{STDR - 1.47}{\sigma(STDR)_{node}}
+ b_6\,\frac{RESG - (-4.29)}{\sigma(RESG)_{node}}
+ b_7\,\frac{WXG - 5.92}{\sigma(WXG)_{node}}
\Bigg) \qquad (12)
\]

with

\[
c_{INF} = \begin{cases} 26.05 & \text{for } INF \le 26.05 \ \wedge\ RGRWT > -1.26 \\ 33.44 & \text{otherwise} \end{cases}
\qquad
c_{TEDY} = \begin{cases} 44.02 & \text{for } INF \le 33.44 \\ 43.78 & \text{for } INF > 33.44 \end{cases}
\]

which was estimated winsorizing the explanatory variables at the 5th and 95th percentiles in order to control for extreme values.[24] Equation (12) is the semi-parametric model (S-P Model) used to assess the time-varying and country-specific probability of default in our analysis.
• In estimating the pooled probit of Y on H we used the Huber-White sandwich robust variance estimator with country-specific variances. In this way we controlled for "individual heterogeneity" as well as for other potential biases in unknown directions that we suspected in the cross-section data, obtaining an appropriate asymptotic covariance matrix in the QMLE probit estimation.[25] This, in turn, implied that an asymptotically normal z-test could be applied to test the significance of the estimated coefficients, namely the significance of the thresholds in predicting defaults.
• In order to compare our methodology with competing alternatives, we also estimated two logit models using the same explanatory variables as in (12). As for the S-P Model, we winsorized the data at the 5th and 95th percentiles and used the Huber-White sandwich robust variance estimator. The two logit models, Logit1 and Logit2, differ from one another only in the intercept: Logit1 includes the constant while Logit2 does not.
• The S-P Model and the two logit models were estimated on the same database employed for the final tree (the test set), which contains 1,265 non-defaults and 66 defaults, since the splitting rules (thresholds) are based on that dataset.

[24] This procedure is common in default and bankruptcy studies, e.g., Shumway (2001) and Hilscher and Nosbusch (2007).
[25] See Freedman (2006) for further technicalities on the Huber sandwich estimator.
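A hedged sketch of the estimation step, continuing the previous one, is given below. It assumes that statsmodels' Probit and Logit accept the cluster-robust ("sandwich") covariance options shown, and the column names (`default_next_year`, `country`) are hypothetical.

import statsmodels.api as sm

def winsorize(s, lower=0.05, upper=0.95):
    """Clip a series at its 5th and 95th percentiles, as in the text."""
    lo, hi = s.quantile(lower), s.quantile(upper)
    return s.clip(lo, hi)

X = H[["INF", "TEDY", "RGRWT", "EXCHR", "STDR", "RESG", "WXG"]].apply(winsorize)
y = panel["default_next_year"]            # Y_{j,t+1}: 1 if a default occurs in t+1

# Pooled probit with no intercept (as in eq. 12); Huber-White sandwich
# covariance clustered by country ("individual heterogeneity")
sp_model = sm.Probit(y, X).fit(cov_type="cluster",
                               cov_kwds={"groups": panel["country"]})
print(sp_model.summary())                 # z-tests on the thresholds' significance

# Naive benchmarks: Logit1 (with constant) and Logit2 (without)
logit1 = sm.Logit(y, sm.add_constant(X)).fit(cov_type="cluster",
                                             cov_kwds={"groups": panel["country"]})
logit2 = sm.Logit(y, X).fit(cov_type="cluster",
                            cov_kwds={"groups": panel["country"]})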
5.3. Empirical results
Table 8 reports the estimation results for the S-P Model as well as for Logit1 and Logit2. The S-P Model and Logit1 exhibit estimated slopes whose signs are in line with expectations, while
Logit2 shows anomalous negative coefficients for INF and TEDY. In more depth, we note that whenever INF, TEDY, STDR and RESG increase in magnitude, the likelihood of default is higher; conversely, the probability of a crisis increases when RGRWT, EXCHR and WXG are lower. Hence, the behavior of the crisis indicators is coherent with the results of the final tree. Whether one considers the original factors or the distances to the thresholds scaled by the standard deviation conditional on the ownership cluster node, the signs of the coefficient estimates are economically meaningful. From a statistical viewpoint, however, some factors do not reach significance at the 0.1 level: EXCHR and WXG for Logit1 and the S-P Model, and STDR and RESG for Logit2. As discussed in section 3.3, the robust z-tests on the S-P Model coefficients give direct information on the statistical significance of the thresholds used in the final tree partition. We therefore conclude that all the splitting rules appear statistically significant except those for EXCHR and WXG; for these two factors, the corresponding thresholds are not significant in predicting the likelihood of country default.
The ability of the S-P Model to predict sovereign crises was evaluated using Cramer's measure[26] and the percentage of correctly classified observations, split between default and non-default observations, together with the usual log likelihood and MSE. These measures are a good mix for assessing the goodness of fit and the forecasting accuracy of our model relative to Logit1 and Logit2. The choice of Cramer's measure and of the percentage of correctly classified observations at different thresholds is motivated by the unbalanced sample: standard fit measures, which implicitly assume a predictive rule based on a 0.5 threshold, could lead to misleading conclusions about the fit and predictive ability of the model.[27] The model evaluation criteria are reported in table 8, for Cramer's measure, log likelihood and MSE, and in table 9, for correctly classified observations.
Consider, first, the goodness-of-fit measures. Our model significantly outperforms Logit1 and Logit2 with respect to Cramer's measure: the value is 0.4665 for the S-P Model, while for Logit1 and Logit2 the values are 0.1913 and 0.0547, respectively. The same conclusion arises from the log likelihood and the MSE, although these statistics do not control for the unbalanced panel and may therefore deliver biased goodness-of-fit assessments. The measures of predictive ability in table 9 reveal that the S-P Model does a better job in default prediction: at thresholds of 0.05, 0.1, 0.25 and 0.5, the corresponding percentages of correctly classified defaults are 0.8182, 0.8030, 0.6818 and 0.5152, which significantly outperform Logit1 and Logit2. An in-depth analysis of the data in table 9 also shows that the Type I error (incorrectly classified non-defaults) is greater for the S-P Model, although its correctly classified non-defaults still range from 0.7454 to 0.9542. This signifies that, despite the trade-off between Type I and Type II errors, our model maintains high predictability for both defaults and non-defaults. Indeed, if we consider asymmetric classification costs, placing, as is natural, more weight on the Type II error (incorrectly classified defaults), it is easy to check that our model outperforms Logit1 and Logit2 at every threshold value.
[26] Cramer (1999) suggests using the measure \(\mathrm{average}\,(1-\hat{P} \mid Y_i = 0) - \mathrm{average}\,(1-\hat{P} \mid Y_i = 1)\), where \(\hat{P}\) is the fitted probability.
[27] See Greene (2008), pp. 790-793, on this point.
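Continuing the same sketch, the evaluation criteria can be computed as follows; the cut-off rule (predict a default whenever the fitted probability exceeds the threshold) is our reading of table 9, and the function names are illustrative.

import numpy as np

def cramer_measure(y, p_hat):
    """Cramer (1999): average (1 - p_hat) over non-defaults minus
    average (1 - p_hat) over defaults."""
    y, p_hat = np.asarray(y), np.asarray(p_hat)
    return np.mean(1 - p_hat[y == 0]) - np.mean(1 - p_hat[y == 1])

def classification_rates(y, p_hat, cutoffs=(0.05, 0.10, 0.25, 0.50)):
    """Share of defaults, non-defaults and all observations correctly
    classified when a default is predicted for p_hat > cutoff (table 9 layout)."""
    y, p_hat = np.asarray(y), np.asarray(p_hat)
    rows = []
    for c in cutoffs:
        pred = (p_hat > c).astype(int)
        rows.append({"cutoff": c,
                     "default": np.mean(pred[y == 1] == 1),
                     "non_default": np.mean(pred[y == 0] == 0),
                     "overall": np.mean(pred == y)})
    return rows

p_hat = sp_model.predict(X)               # fitted probabilities from the sketch above
print(cramer_measure(y, p_hat))
for row in classification_rates(y, p_hat):
    print(row)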
To summarize, our results suggest that the parsimonious pooled probit model used to parameterize the MDCP forecasts relatively well and outperforms naïve logit models. Parametric tests on the threshold values indicate that, except for the exchange rate and export growth, all the selected variables are significant predictors as well as significant discriminators. This signifies that a "rule of thumb" approach in the spirit of EWS, which simultaneously monitors a parsimonious set of variables, can be parameterized in order to reconcile structural, reduced-form and primitive default models. In a sense, what we obtain is a generalization of the Mertonian-type distance-to-default, in which the thresholds are computed so as to minimize the default prediction error and the probit helps inspect how the distances to the thresholds can be combined, obtaining a multidimensional distance to collapse point.

6. Conclusion
In this paper we introduced a novel approach to predicting sovereign debt crises, proposing a possible remedy to the unreliability of reduced-form models, which are usually conceived assuming linear structural relationships. Indeed, due to the high complexity and multidimensional nature of sovereign default, predictive models often deliver irrelevant theory and questionable conclusions. At one extreme, we could inspect the issue from a purely algorithmic perspective, without making assumptions about the data generating process. At the other, we could reduce sovereign default to a data process generated by independent draws from predictor variables, parameters and random noise, as is the case for logistic regression. Averaging these two philosophical perspectives, we propose a two-step procedure in which, first, we construct a non-parametric early warning system that signals a potential crisis whenever a set of indicators exceeds specific thresholds; second, we parameterize the distances to the corresponding thresholds using a simple pooled probit model. What we obtain is a semi-parametric model able to reduce a multidimensional distance to what we defined as the "collapse point", since within our context the threshold is the key element for understanding whether fundamentals are bad enough to signal an impending crisis. The choice to inspect country credit risk semi-parametrically, by combining the algorithmic and parametric modeling cultures, effectively bypasses the problem of the reliability of predictive models while maintaining a theory-based explanation of the process through which a country fails.
The empirical analysis, computed on a comprehensive database of sovereign defaults over the period 1975-2002, offers convincing evidence about the predictive ability of our model. We find that, of 22 potential predictors, both qualitative and quantitative, the most significant variables are: inflation, exchange rate, total external debt, short term debt to reserves, real growth, reserves growth and export growth. Together, these factors correctly predict around 80 per cent of the actual defaults. Inflation appears as the most significant predictor, discriminating between high (inflation above 33.44 per cent) and low (inflation below 33.44 per cent)
probability of default. However, even though inflation targeting seems to be a good ingredient for controlling sovereign risk, we also find that negative real GDP growth associated with short-term indebtedness and a negative but bounded reserves growth can lead to a significant probability of default. These findings seem to corroborate our view that complex data-mining techniques and simple parametric models are not necessarily conflicting. Indeed, algorithmic and parametric models can be matched, delivering useful tools for both statisticians and policymakers. Future research on our agenda concerns corporate defaults: we are indeed convinced that our S-P Model may offer new possibilities and new insights on this issue.
References
Berg, A. and C. Pattillo (1999). "Predicting Currency Crises: The Indicators Approach and an Alternative." Journal of International Money and Finance 18: 561-586.
Bhanot, K. (1998). "Recovery and Implied Default in Brady Bonds." Journal of Fixed Income 8: 47-51.
Bhatia, A. (2002). "Sovereign Credit Ratings Methodology: An Evaluation." IMF Working Paper WP 02/170.
Breiman, L. (2001b). "Statistical Modeling: The Two Cultures." Statistical Science 16(3): 199-231.
Breiman, L., J. Friedman, R. Olshen and C. Stone (1984). Classification and Regression Trees. Wadsworth Inc., California.
Bulow, J. and K. Rogoff (1989). "Sovereign Debt: Is to Forgive to Forget?" American Economic Review 79: 43-50.
Claessens, S. and G. Pennacchi (1996). "Estimating the Likelihood of Mexican Default from the Market Prices of Brady Bonds." Journal of Financial and Quantitative Analysis 31: 109-126.
Duffie, D. and K. Singleton (2003). Credit Risk. Princeton University Press.
Duffie, D., L. Pedersen and K. Singleton (2003). "Modeling Sovereign Yield Spreads: A Case Study of Russian Debt." Journal of Finance 58: 119-159.
Eaton, J. and M. Gersovitz (1981). "Debt with Potential Repudiation: Theoretical and Empirical Analysis." Review of Economic Studies 48: 289-309.
Eichengreen, B. (2002). "Predicting and Preventing Financial Crises: Where Do We Stand? What Have We Learned?" Kiel Week annual conference, Kiel, Germany, 24-25 June 2002.
Frankel, J. and A. Rose (1996). "Currency Crashes in Emerging Markets." Journal of International Economics 41: 351-366.
Freedman, D. (2006). "On the So-Called 'Huber Sandwich Estimator' and Robust Standard Errors." The American Statistician 60: 299-302.
Gapen, M., D. Gray, C. Lim and Y. Xiao (2005). "Measuring and Analyzing Sovereign Risk with Contingent Claims." IMF Working Paper WP 05/155.
Gibson, R. and S. Sundaresan (2001). "A Model of Sovereign Borrowing and Sovereign Yield Spreads." Working Paper, Graduate School of Business, Columbia University.
Goldstein, M., G. Kaminsky and C. Reinhart (2000). Assessing Financial Vulnerability: An Early Warning System for Emerging Markets. Washington, D.C.: Institute for International Economics.
Gray, D., R. Merton and Z. Bodie (2006). "A New Framework for Analyzing and Managing Macrofinancial Risks of an Economy." NBER Working Paper 12637.
Hastie, T., R. Tibshirani and J. Friedman (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York.
Hilscher, J. and Y. Nosbusch (2007). "Determinants of Sovereign Risk: Macroeconomic Fundamentals and the Pricing of Sovereign Debt." Working Paper, Brandeis University International Business School.
Kaminsky, G., S. Lizondo and C. Reinhart (1998). "Leading Indicators of Currency Crises." IMF Staff Papers 45: 1-48.
Keswani, A. (2005). "Estimating a Risky Term Structure of Brady Bonds." The Manchester School 73(1): 99-127.
Leblang, D. (2002). "The Political Economy of Speculative Attacks in the Developing World." International Studies Quarterly 46(1): 69-91.
Longstaff, F. and E. Schwartz (1995). "A Simple Approach to Valuing Risky Fixed and Floating Rate Debt." Journal of Finance 50: 789-819.
Manasse, P. and N. Roubini (2007). "Rules of Thumb for Sovereign Debt Crises." IMF Working Paper WP 03/05.
Manasse, P., N. Roubini and A. Schimmelpfennig (2003). "Predicting Sovereign Debt Crises." IMF Working Paper WP 03/221.
Merrick, J. (2001). "Crisis Dynamics of Implied Default Recovery Ratios: Evidence from Russia and Argentina." Journal of Banking and Finance 25(10): 1921-1939.
Merton, R. (1974). "On the Pricing of Corporate Debt: The Risk Structure of Interest Rates." Journal of Finance 29: 449-470.
Mishkin, F.S. and K. Schmidt-Hebbel (2001). "One Decade of Inflation Targeting in the World: What Do We Know and What Do We Need to Know." NBER Working Paper 8497.
Morris, S. and H.S. Shin (2000). "Rethinking Multiple Equilibria in Macroeconomic Modeling." NBER Macroeconomics Annual.
Pagès, H. (2000). "Estimating Brazilian Sovereign Risk from Brady Bond Prices." Working Paper, Bank of France.
Pan, J. and K. Singleton (2006). "Default and Recovery Implicit in the Term Structure of Sovereign CDS Spreads." Working Paper, Graduate School of Business, Stanford University.
Protter, P. (1990). Stochastic Integration and Differential Equations. Springer, New York.
Reinhart, C. (2002). "Credit Ratings, Default, and Financial Crises: Evidence from Emerging Markets." World Bank Economic Review 16(2): 151-170.
Vezzoli, M. and C. Stone (2007). "Cragging." Manuscript, Department of Statistics, University of California, Berkeley.
Table 1: Dependent and independent variables

Acronym   Name                                                Description
Dependent variable
Crisis    S&P's Sovereign Default                             Dummy variable
Independent variables
1. Dummy variables
MAC       Market Access Dummy
DAFR      Regional Dummies                                    Aggregation World Economic Outlook (WEO) - Africa
DADV      Regional Dummies                                    Aggregation WEO - Asia and other Asia (China, Cyprus, Israel, Korea)
DAPD      Regional Dummies                                    Aggregation WEO - India (India, Indonesia, Malaysia, Papua New Guinea, Philippines, Sri Lanka, Thailand)
DWHD      Regional Dummies                                    Aggregation WEO - Western Hemisphere (Central and South America)
DMED      Regional Dummies                                    Aggregation WEO - Middle East and Europe (Jordan, Lebanon, Oman, Pakistan, Turkey)
DTRANS    Regional Dummies                                    Aggregation WEO - Countries in Transition: Central and Eastern European countries, Russia, the other states of the former Soviet Union
DOil      Oil Dummy                                           Oil-producing nation defined by WEO (fuel main export)
2. Regressors from the Fund's (IMF) EWS for currency crises
CAY       Current account                                     Current account balance in percent of GDP
ResG      Reserves growth                                     Percent change in reserves (year-on-year)
WX        Exports (WEO)                                       Exports in billions of US$
WXG       Export growth (WEO)                                 Percent change in exports of goods and services (in US$, year-on-year)
STDR      Short-term debt to reserves                         External short-term debt to reserves ratio
M2R       M2 to reserves                                      M2 to reserves ratio
OVER      Percentage deviation of the exchange rate from its trend (from regression on linear trend)   Estimated value of the residual divided by the predicted value
EXCHR     Exchange rate                                       Exchange rate (average per US$)
3. Macroeconomic variables
INF       Inflation                                           Percent change in CPI (year-on-year)
NRGWT     Nominal growth                                      Nominal GDP growth (year-on-year, in percent)
RGRWT     Real growth                                         Real GDP growth (year-on-year, in percent)
4. Debt variable
TEDY      Total external debt                                 Total external debt in percent of GDP (public and private)
5. Political economy variable
PR        Index of Political Rights                           Country rating of political rights (yearly)

Sources: IMF, Standard & Poor's, World Bank.
Table 2: Descriptive statistics of winsorized predictors - 1975-2002

          Mean                       St. Dev.                   Test of Equality of Means
          Default     Non-Default    Default      Non-Default   t-stat       p-value
CAY       -3.8554     -4.1803        4.9354       5.8274        0.4574       0.6474
ResG      33.4        15.4006        71.1454      37.8688       3.6401       0.0003
WX        4.8344      7.1635         6.4602       10.6075       -1.8181      0.0693
WXG       0.9577      8.5168         14.3252      15.3359       -4.0276      0.0001
STDR      4.5033      1.6785         5.0025       1.7158        11.3800      0.0000
M2R       16.4078     8.8448         18.8095      8.6566        6.5154       0.0000
INF       32.8178     13.354         31.7452      14.1084       10.2130      0.0000
NRGWT     30.5163     16.7789        34.4861      13.9131       7.1462       0.0000
RGRWT     -0.9535     3.7111         5.4259       3.986         -9.3272      0.0000
TEDY      67.3861     44.8676        35.1804      22.1615       7.9680       0.0000
EXCHR     138.7322    188.4298       257.8426     391.6183      -1.0490      0.2944
OVER      -9.1312     -98.3111       393.9637     1023.0445     0.7261       0.4679
PR        4.5086      4.1676         1.7787       1.7543        1.5822       0.1138

Default observations: 70; Non-Default observations: 1265
The table reports summary statistics on winsorized distributions at the 5th and 95th percentile levels. Mean and standard deviations are computed splitting between default and non-default observations. The last two columns report the t-test and the corresponding p-value computed testing the hypothesis of equality of means.
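As a rough illustration of how a table of this kind can be reproduced from a country-year panel, the sketch below computes group means, standard deviations and a two-sample test of equality of means. The DataFrame and column names are hypothetical, and since the exact test variant used by the authors is not stated, an unequal-variance (Welch) t-test is assumed.

import pandas as pd
from scipy import stats

def descriptive_table(panel: pd.DataFrame, variables, default_col="default"):
    """Means and standard deviations by default status plus a test of
    equality of means, mimicking the layout of table 2."""
    rows = []
    for v in variables:
        d = panel.loc[panel[default_col] == 1, v].dropna()
        nd = panel.loc[panel[default_col] == 0, v].dropna()
        t, p = stats.ttest_ind(d, nd, equal_var=False)   # Welch t-test (assumption)
        rows.append({"variable": v,
                     "mean_default": d.mean(), "mean_non_default": nd.mean(),
                     "sd_default": d.std(), "sd_non_default": nd.std(),
                     "t_stat": t, "p_value": p})
    return pd.DataFrame(rows).set_index("variable")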
Table 3: Crisis episodes

Year of the crisis   Nobs. Defaults   Countries
1975                 1                Zimbabwe
1976                 2                Congo, Dem Rep, Peru
1978                 3                Jamaica, Peru, Turkey
1979                 2                Nicaragua, Sudan
1980                 2                Bolivia, Peru
1981                 8                Costa Rica, Dominican Rep, El Salvador, Honduras, Jamaica, Madagascar, Poland, Romania
1982                 7                Argentina, Ecuador, Haiti, Malawi, Mexico, Nigeria, Turkey
1983                 13               Brazil, Burkina Faso, Chile, Ivory Coast, Morocco, Niger, Panama, Peru, Philippines, Sierra Leone, Uruguay, Venezuela, Zambia
1984                 1                Egypt
1985                 1                Cameroon
1986                 7                Bolivia, Gabon, Madagascar, Morocco, Paraguay, Romania, Sierra Leone
1987                 2                Jamaica, Uruguay
1988                 1                Malawi
1989                 1                Jordan
1990                 2                Uruguay, Venezuela
1991                 3                Algeria, Ethiopia, Russia
1994                 1                Kenya
1995                 1                Venezuela
1997                 2                Sierra Leone, Sri Lanka
1998                 3                Indonesia, Pakistan, Ukraine
1999                 2                Ecuador, Gabon
2000                 2                Ivory Coast, Zimbabwe
2001                 1                Argentina
2002                 2                Gabon, Indonesia
Total                70

The table reports the crisis episodes inspected in this paper, specifying for each year the total number of defaults and the corresponding countries.
Table 4: Test sets compositions

Lv #   Countries
1      Colombia, Costa Rica, Ecuador, Lesotho, Panama, Slovak Republic
2      Bolivia, Cameroon, Egypt, El Salvador, Kazakhstan, Poland
3      Burundi, Kenya, Madagascar, Senegal, Sri Lanka, Venezuela
4      Congo, Haiti, Hungary, Korea, Mexico, Zambia
5      Brazil, Jamaica, Morocco, Romania, Sudan, Uruguay
6      Argentina, Ivory Coast, Honduras, India, Malawi, Russia
7      Algeria, Chile, Gabon, Pakistan, Peru, Turkey
8      Czech Republic, Lithuania, Malaysia, Nicaragua, Ukraine, Zimbabwe
9      Burkina Faso, Dominican Republic, Lebanon, Niger, Paraguay, Philippines
10     Bangladesh, Botswana, Ethiopia, Mauritius, New Guinea, Sierra Leone
11     Indonesia, Jordan, Mali, Nigeria, Thailand, Tunisia
In this table we analytically report the test sets composition used to estimate equation (3) in the CRAGGING.
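The test sets of table 4 amount to an 11-fold, leave-countries-out partition of the panel. A minimal sketch of such a split (with a hypothetical `panel` DataFrame and only the first two folds written out) is given below; it does not reproduce the cragging estimator itself.

# Country groups as listed in table 4 (first two folds shown for brevity)
test_sets = {
    1: ["Colombia", "Costa Rica", "Ecuador", "Lesotho", "Panama", "Slovak Republic"],
    2: ["Bolivia", "Cameroon", "Egypt", "El Salvador", "Kazakhstan", "Poland"],
    # ... folds 3-11 as in table 4
}

def leave_countries_out(panel, test_sets, country_col="country"):
    """Yield (fold, train, test) splits of the panel, holding out all observations
    of the countries in each fold (the Lv test sets of table 4)."""
    for fold, countries in test_sets.items():
        mask = panel[country_col].isin(countries)
        yield fold, panel[~mask], panel[mask]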
Table 5: CART and α_ind

Number of nodes   Cross-Validated Relative Error   Alpha
36                1.0337 ± 0.0743                  0.0000
34                1.0265 ± 0.0739                  0.0993
33                1.025 ± 0.0741                   0.1375
32                1.0247 ± 0.0739                  0.1616
31                1.0232 ± 0.0738                  0.1792
30                1.0202 ± 0.0737                  0.1836
29                1.0127 ± 0.0732                  0.1875
27                1.0078 ± 0.0729                  0.1942
26                1.0049 ± 0.0727                  0.2593
22                1.0065 ± 0.072                   0.3204
21                0.9892 ± 0.0706                  0.4242
20                0.981 ± 0.0698                   0.5487
19                0.9755 ± 0.0692                  0.5871
18                0.9809 ± 0.069                   0.6029
16                0.9751 ± 0.0687                  0.6177
15                0.9373 ± 0.0658                  0.6519
14                0.9584 ± 0.0638                  0.7066
11                0.9422 ± 0.0624                  0.7726
10                0.9204 ± 0.0608                  1.2191
9                 0.9095 ± 0.0583                  1.3751
8                 0.9029 ± 0.0504                  1.7323
6                 0.8792 ± 0.0477                  2.1762*
5                 0.8831 ± 0.0452                  2.179
3                 0.9346 ± 0.036                   3.1232
1                 1 ± 0.0001                       3.9027
The table reports the optimal cost complexity parameter (alpha) selected so as to minimize the relative error, i.e., the normalized measure of accuracy used in the cross-validation of CART model.
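Table 5 is the standard cost-complexity pruning sequence of CART. As a rough analogue, and not the CART software actually used by the authors, scikit-learn exposes the same idea through the ccp_alpha parameter; in the sketch below `features` and `target` are a hypothetical predictor matrix and 0/1 default indicator, and the cross-validated error is simply 1 minus accuracy rather than the relative error reported in the table.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Cost-complexity pruning path: candidate alphas implied by the fitted tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(features, target)

# Pick the alpha that minimizes the 10-fold cross-validated error
cv_errors = [1 - cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0),
                                 features, target, cv=10).mean()
             for a in path.ccp_alphas]
best_alpha = path.ccp_alphas[int(np.argmin(cv_errors))]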
Table 6: CRAGGING and optimal α

         Test set number (Lv)
Alpha    1        2        3        4        5        6        7        8        9        10       11       L^crag_α
0*       0.0193   0.0300   0.0509   0.0285   0.0977   0.0488   0.0684   0.0253   0.0432   0.0170   0.0244   0.04123
0.05     0.0193   0.0300   0.0509   0.0285   0.0977   0.0488   0.0684   0.0253   0.0432   0.0170   0.0244   0.04123
0.1      0.0193   0.0300   0.0509   0.0285   0.0977   0.0488   0.0684   0.0253   0.0432   0.0170   0.0244   0.04123
0.15     0.0193   0.0300   0.0509   0.0285   0.0977   0.0488   0.0684   0.0253   0.0432   0.0170   0.0244   0.04123
0.2      0.0193   0.0300   0.0509   0.0285   0.0979   0.0488   0.0684   0.0253   0.0432   0.0170   0.0244   0.04125
0.25     0.0193   0.0300   0.0509   0.0285   0.0979   0.0488   0.0684   0.0253   0.0432   0.0170   0.0244   0.04125
0.3      0.0193   0.0300   0.0509   0.0285   0.0979   0.0488   0.0684   0.0253   0.0432   0.0170   0.0244   0.04125
0.35     0.0193   0.0300   0.0509   0.0285   0.0979   0.0488   0.0684   0.0253   0.0432   0.0170   0.0244   0.04125
0.4      0.0193   0.0300   0.0509   0.0285   0.0980   0.0488   0.0684   0.0253   0.0432   0.0170   0.0244   0.04125
0.45     0.0193   0.0300   0.0509   0.0285   0.0989   0.0488   0.0684   0.0253   0.0432   0.0171   0.0244   0.04135
0.5      0.0193   0.0300   0.0509   0.0285   0.0993   0.0488   0.0684   0.0253   0.0432   0.0171   0.0244   0.04138
0.55     0.0193   0.0301   0.0509   0.0285   0.0994   0.0488   0.0684   0.0253   0.0432   0.0171   0.0244   0.04140
0.6      0.0194   0.0302   0.0509   0.0285   0.0994   0.0488   0.0684   0.0253   0.0432   0.0171   0.0244   0.04142
0.65     0.0194   0.0302   0.0509   0.0285   0.0994   0.0488   0.0684   0.0253   0.0432   0.0171   0.0244   0.04142
0.7      0.0196   0.0302   0.0509   0.0285   0.0996   0.0488   0.0684   0.0253   0.0432   0.0171   0.0244   0.04145
0.75     0.0205   0.0302   0.0509   0.0285   0.0997   0.0488   0.0684   0.0253   0.0432   0.0172   0.0244   0.04155
1        0.0212   0.0303   0.0509   0.0285   0.1004   0.0488   0.0685   0.0253   0.0432   0.0174   0.0244   0.04172
2        0.0264   0.0372   0.0509   0.0285   0.1004   0.0599   0.0712   0.0255   0.0434   0.0210   0.0246   0.04445

In this table we report the optimal tuning parameter, computed by minimizing the out-of-sample prediction error over the 11 test sets. Specifically, for each value of alpha we show the MSE computed in every test set as well as the overall average L^crag_α.
Table 7: Sovereign rating mapping and probability of default

Rating Class      Rules and critical thresholds                      Probability of default   Terminal Node #
1 - low risk      INF≤26.05, RGRWT>-1.26                             0.0272                   4
2                 26.05<INF≤33.44, RGRWT>-1.26, TEDY≤44.02           0.0398                   5
3                 INF>33.44, TEDY≤43.78                              0.0726                   7
4                 INF≤33.44, RGRWT≤-1.26, STDR≤1.47                  0.0759                   1
5                 INF≤33.44, RGRWT≤-1.26, STDR>1.47, RESG≤-4.29      0.0872                   2
6                 26.05<INF≤33.44, RGRWT>-1.26, TEDY>44.02           0.1630                   6
7                 INF>33.44, TEDY>43.78, EXCHR>3.74                  0.1929                   10
8                 INF≤33.44, RGRWT≤-1.26, STDR>1.47, RESG>-4.29      0.2349                   3
9                 INF>33.44, TEDY>43.78, EXCHR≤3.74, WXG>5.92        0.3564                   9
10 - high risk    INF>33.44, TEDY>43.78, EXCHR≤3.74, WXG≤5.92        0.6804                   8

This table reports the splitting rules used to classify observations within the nodes of the final tree, also indicating the corresponding probabilities of default.
Table 8: Model estimates

                     Logit1          Logit2          S-P Model
INF                  0.0361          -0.0156         0.3813
                     (4.697)***      (-1.912)*       (10.325)***
TEDY                 0.0221          -0.0336         0.4189
                     (3.972)***      (-8.067)***     (6.12)***
RGRWT                -0.1273         -0.2457         -0.7877
                     (-4.067)***     (-9.732)***     (-7.877)***
EXCHR                -0.0005         -0.0009         -0.2927
                     (-1.473)        (-1.99)**       (-1.184)
STDR                 0.2808          0.0604          0.7482
                     (5.216)***      (0.92)          (3.749)***
RESG                 0.0081          0.0021          0.0766
                     (2.73)***       (0.77)          (0.793)
WXG                  -0.0117         -0.0188         -0.2651
                     (-1.361)        (-2.977)***     (-3.35)***
Constant             -5.1858
                     (-10.917)***
Cramer's measure     0.1913          0.0547          0.4665
Log Likelihood       -190.982        -301.521        -186.501
MSE                  0.0421          0.0638          0.0382

Total observations: 1335   Default observations: 66   Non-Default observations: 1201

The table reports results on the Logit1, Logit2 and S-P Model estimates. z-statistics are in parentheses. *, **, *** denote coefficients significantly different from zero at the 0.1, 0.05 and 0.01 levels.
Table 9: Predictive ability

Threshold values    Correctly Classified   Correctly Classified   Correctly Classified
                    Default                Non-Default            Obs
Logit1
0.05                0.7576                 0.7841                 0.7827
0.10                0.5758                 0.8960                 0.8784
0.25                0.3788                 0.9744                 0.9417
0.50                0.1515                 0.9947                 0.9484
Logit2
0.05                0.6515                 0.4907                 0.4996
0.10                0.3636                 0.7295                 0.7094
0.25                0.1970                 0.9110                 0.8718
0.50                0.0606                 0.9762                 0.9259
S-P Model
0.05                0.8182                 0.7454                 0.7494
0.10                0.8030                 0.8167                 0.8160
0.25                0.6818                 0.9101                 0.8976
0.50                0.5152                 0.9542                 0.9301
The table reports share of default, non-default and total observations correctly classified using different thresholds for Logit1, Logit2 and S-P Model.
Figure 1: Final model and optimal α
[Figure: test-set relative error (vertical axis) plotted against the cost-complexity parameter alpha (horizontal axis), with the corresponding number of tree nodes reported on the upper axis. Annotation: α* = 0.0532, number of nodes = 10, minimized test set relative error = 0.4512 ± 0.0806.]
The figure reports the optimal alpha selected in the final model so as to minimize the test set relative error, which in turn delivers the number of nodes to be considered in the final tree partition.
Figure 2: Final Tree structure
Node 1 INF