Non-parametric Bootstrap Method in Risk Management

Available online at www.sciencedirect.com

ScienceDirect Procedia Economics and Finance 24 (2015) 701 – 709

International Conference on Applied Economics, ICOAE 2015, 2-4 July 2015, Kazan, Russia

Katarína Valášková (a,*), Erika Spuchľáková (b), Peter Adamko (c)

(a,b) University of Žilina in Žilina, Faculty of Operation and Economics of Transport and Communications, Department of Economics, Univerzitná 1, Žilina 010 26, Slovak Republic

(c) University of Žilina in Žilina, Faculty of Operation and Economics of Transport and Communications, Department of Quantitative Methods and Economic Informatics, Univerzitná 1, 010 26 Žilina, Slovak Republic

Abstract

The process of risk management is a relatively new part of business practice, and it arose in response to very rapid changes in market conditions. There are two basic approaches in the field of risk management: one determines the distribution on the basis of expert estimates, and the other estimates the probability distribution from observed data. There are two types of methods for determining the distribution of risk factors: parametric and non-parametric. The aim of this paper is to analyze the group of non-parametric methods, their algorithms and theoretical background, with the main emphasis on the theoretical and practical application of the non-parametric bootstrap method.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Selection and/or peer-review under responsibility of the Organizing Committee of ICOAE 2015.

Keywords: risk management; non-parametric approach; bootstrap method.

* Corresponding author. Tel.: +421-41-5133247. E-mail address: [email protected]

2212-5671 © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Selection and/or peer-review under responsibility of the Organizing Committee of ICOAE 2015. doi:10.1016/S2212-5671(15)00678-4

1. Introduction

Risk management is a complex process whose main aim is to identify the areas with a high level of risk, to define and assess the risk, and to establish measures that either eliminate it or reduce it to the required level (Buc, Kliestik, 2013). If no data are available from which the probability distribution could be estimated, we rely on expert estimates to determine the distribution of the risk factors. In such cases, as stated by Hnilica and Fotr (2009), only the following types of distribution are to be used:

Uniform distribution: it can be used only if estimates of the maximum and minimum possible values are known. It carries a high degree of uncertainty, which is the same for all values.

$$X \sim R(a,b) \;\Rightarrow\; f(x) = \frac{1}{b-a}\, I_{(a,b)}(x); \qquad x, a, b \in \mathbb{R},\ a < b \tag{1}$$

Triangular distribution: this is the distribution most commonly used to model expert estimates. It differs from the uniform distribution in that one value c from the interval (minimum, maximum) = (a, b) is regarded as the most probable. The density of the triangular distribution is:

$$f(x) = \begin{cases} \dfrac{2(x-a)}{(b-a)(c-a)} & a \le x \le c \\[2ex] \dfrac{2(b-x)}{(b-a)(b-c)} & c \le x \le b \\[2ex] 0 & \text{otherwise} \end{cases} \qquad a, b, c, x \in \mathbb{R} \tag{2}$$
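The density (2) can be checked numerically. The following is a minimal Python sketch, assuming a < c < b; the function name `triangular_pdf` and the values a = 10, b = 40, c = 20 are illustrative and do not come from the paper.

```python
def triangular_pdf(x, a, b, c):
    """Density (2) of the triangular distribution; assumes a < c < b."""
    if a <= x <= c:
        return 2.0 * (x - a) / ((b - a) * (c - a))
    if c < x <= b:
        return 2.0 * (b - x) / ((b - a) * (b - c))
    return 0.0


# Hypothetical expert estimates: minimum, maximum and most probable value.
a, b, c = 10.0, 40.0, 20.0

# Midpoint Riemann sum over (a, b): a valid density integrates to about 1.
steps = 10000
h = (b - a) / steps
area = sum(triangular_pdf(a + (i + 0.5) * h, a, b, c) for i in range(steps)) * h

# The density is highest at the most probable value c, where it equals 2 / (b - a).
peak = triangular_pdf(c, a, b, c)
print(area, peak)
```

The peak height 2 / (b − a) follows directly from (2) by setting x = c, which is a quick way to sanity-check expert inputs before using them in a simulation.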

Sometimes it is possible to use quantiles instead of the maximum and minimum values, which eliminates the problem that the estimated minimum or maximum could be exceeded (Pacakova, 2003).

BetaPERT distribution: it is a special type of beta distribution and is very similar to the triangular one, as it assumes knowledge of the minimum, maximum and most probable values. Unlike the triangular distribution, however, it concentrates more probability around the most probable value. As stated in Wilcox (2003), betaPERT is based on the beta distribution with the density:

$$f_{\beta}(x) = \begin{cases} \dfrac{x^{w}(1-x)^{v}}{B(w+1,\, v+1)} & 0 \le x \le 1 \\[2ex] 0 & \text{otherwise} \end{cases} \qquad B(v+1,\, w+1) = \int_{0}^{1} t^{v}(1-t)^{w}\, dt \tag{3}$$

Here v and w are shape parameters derived from the three basic values. A further parameter λ is introduced, which determines the weight given to the most probable value. The default value of this parameter is λ = 4. Different alternatives of the λ parameter are depicted in Fig. 1.

Fig. 1. (a) density of betaPERT distribution for different λ; (b) density of triangular distribution.

The expected value is then

$$\mu = \frac{a + b + \lambda c}{2 + \lambda} \tag{4}$$

and the parameters v and w are calculated as follows:

$$v = \frac{\lambda (c-a)}{b-a}, \qquad w = \lambda - v \tag{5}$$

The resulting density of the betaPERT distribution is

$$f_{PERT}(x) = \begin{cases} \dfrac{1}{b-a}\, f_{\beta}\!\left(\dfrac{x-b}{a-b}\right) & a \le x \le b \\[2ex] 0 & \text{otherwise} \end{cases} \tag{6}$$
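Formulas (3) to (6) can be tied together in a short numerical check. The sketch below is a minimal Python illustration under stated assumptions: the helper names (`beta_fn`, `f_beta`, `f_pert`, `integrate`), the weight λ = 4 and the values a = 10, b = 40, c = 20 are illustrative choices, not taken from the paper. It verifies that the density (6) integrates to about 1 and that its mean agrees with (4).

```python
import math


def beta_fn(p, q):
    """Beta function B(p, q), expressed through the gamma function."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)


def f_beta(y, v, w):
    """Beta density from (3) on the unit interval."""
    if y < 0.0 or y > 1.0:
        return 0.0
    return y ** w * (1.0 - y) ** v / beta_fn(w + 1.0, v + 1.0)


def f_pert(x, a, b, c, lam=4.0):
    """BetaPERT density (6) with shape parameters from (5); assumes a < c < b."""
    if x < a or x > b:
        return 0.0
    v = lam * (c - a) / (b - a)
    w = lam - v
    return f_beta((x - b) / (a - b), v, w) / (b - a)


def integrate(g, lo, hi, steps=20000):
    """Midpoint Riemann sum of g over [lo, hi]."""
    h = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * h) for i in range(steps)) * h


# Hypothetical expert estimates and weight of the most probable value.
a, b, c, lam = 10.0, 40.0, 20.0, 4.0

total = integrate(lambda x: f_pert(x, a, b, c, lam), a, b)
numeric_mean = integrate(lambda x: x * f_pert(x, a, b, c, lam), a, b)
formula_mean = (a + b + lam * c) / (2.0 + lam)  # expected value from (4)
print(total, numeric_mean, formula_mean)
```

Larger values of `lam` pull more probability mass towards c, which is the behaviour Fig. 1(a) illustrates for different λ.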

Another way to determine the distribution is to estimate individual quantiles, starting with the median, then the 25% and 75% quantiles, and then halving the resulting intervals again. A curve is fitted through the quantiles; it represents the sought distribution function, whose derivative is the sought density (Bartosova, 2008). However, these functions are very sensitive to changes in individual values, and the choice of interpolation function can also introduce inaccuracies.

On the other hand, if historical data are available, we can construct the probability distribution from them, by either parametric or non-parametric methods. The parametric approach is described only briefly, as the main focus of this article is the non-parametric one. Parametric methods assume that the observed data come from a known distribution with unknown parameters, which are estimated from these data (Misankova et al., 2014). The correctness of the choice of a particular distribution is then tested. The most commonly used distributions are the normal, logistic, lognormal, gamma and Pareto distributions (Cisko, Kliestik, 2013).

2. Non-parametric methods

The non-parametric approach represents another way to describe the course of a financial time series. The general formula of a non-parametric model is:

$$X_t = m(X_{t-1},\dots,X_{t-p}) + \sigma(X_{t-1},\dots,X_{t-p})\,\varepsilon_t, \qquad t = 1,\dots,n \tag{7}$$

where m(·) and σ(·) are unknown smooth functions (σ is non-negative) and $\varepsilon_t$ are iid random variables with zero mean and unit variance. No further requirements are placed on the smoothness of the functions m(·) and σ(·). This model is called the non-parametric autoregressive conditional heteroscedastic (NARCH) model, or the non-parametric autoregressive (NAR) model if the function σ(·) is constant (Engle, 1982). However, the model is useful only if p = 1 or 2. For higher values of p, estimating the non-parametric forms of the functions m(·) and σ(·) is very complicated, because a huge number of observations is required. We therefore assume that p equals 1.

Non-parametric models are not used only in autoregression (Fink, Kreiss, 2013). The basic principle can be described by classical regression. Consider the simplest model with two (financial) variables $X_t$ and $Y_t$, whose relation is represented by a smooth function m:

$$Y_t = m(X_t) + \varepsilon_t, \qquad t = 1,\dots,n \tag{8}$$

Here m(·) is the unknown function and $\{\varepsilon_t\} \sim IID(0,1)$. The most important objective of the non-parametric approach is to estimate the function m(·) from the observed data. Let $\hat{m}$ denote an estimate of the function m(·) based on the observed data $X_1,\dots,X_n$ and $Y_1,\dots,Y_n$ (Fan, Yao, 2003).

$$\bar{\varepsilon}_n = \frac{1}{n}\sum_{t=1}^{n} \varepsilon_t \to 0 \tag{9}$$

The simplest estimation, based on the law of large numbers (9), assumes that the estimation of the function m (-) is the arithmetic mean:

$$\hat{m}(x) = \frac{1}{n}\sum_{t=1}^{n} Y_t \tag{10}$$

However, this estimate is not very appropriate, as it does not take into account that the values $X_t$ can differ significantly from the argument x. Therefore a weighted average is used, in which the weights $w_t(x)$ depend on the distance of $X_t$ from x:

$$\hat{m}(x) = \frac{1}{n}\sum_{t=1}^{n} w_t(x)\, Y_t \tag{11}$$
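To make (11) concrete, the following is a minimal Python sketch of a distance-weighted average. The paper only states that the weights depend on the distance of X_t from x; the Gaussian kernel, the bandwidth `h` and the function names used here are assumptions made for illustration. The weights are scaled so that they average to one, so the factor 1/n in (11) yields a proper weighted mean.

```python
import math


def kernel_weights(x, xs, h):
    """Weights w_t(x) that shrink with the distance |x - X_t| (Gaussian kernel, assumed)."""
    raw = [math.exp(-0.5 * ((x - xt) / h) ** 2) for xt in xs]
    total = sum(raw)
    n = len(xs)
    return [n * r / total for r in raw]  # scaled so the weights average to 1


def m_hat(x, xs, ys, h=1.0):
    """Weighted-average estimate of m(x) as in (11)."""
    weights = kernel_weights(x, xs, h)
    return sum(w * y for w, y in zip(weights, ys)) / len(ys)


# Hypothetical observations with Y_t = X_t ** 2 and no noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 4.0, 9.0, 16.0, 25.0]

local = m_hat(3.0, xs, ys, h=0.1)   # small bandwidth: dominated by the point X_t = 3
flat = m_hat(3.0, xs, ys, h=1e6)    # huge bandwidth: approaches the plain mean (10)
print(local, flat)
```

With a very large bandwidth all weights are practically equal and the estimate collapses to the arithmetic mean in (10), which shows why (10) ignores the position of x.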

There are several non-parametric methods for determining the distribution of risk factors. The simplest of them is the bootstrap method, which generates possible values of the factors from the known historical data.

2.1. Non-parametric bootstrap method

The bootstrap method, presented by Efron in 1979, is used in a number of statistical problems. It simulates what would happen if we observed repeated samples from the basic set, by creating new random samples from the available data (resampling, replication) (Hardle, Muller, Sperlich, 2004). These random samples can be smaller than the original random sample, and they can be created with or without replacement. The best results are achieved if the bootstrap samples are drawn with replacement and have the same size as the original sample (Franke, Neumann, Stockis, 2004). The bootstrap method is based on creating B samples of size n from the original sample of the same size n. The new samples are formed by selecting elements from the original set with replacement (Leng, Tsai, 2014). The statistical analysis is then carried out on the basis of statistics calculated from these B bootstrap samples. Thus we can obtain relatively reliable results without any restrictive assumptions. The following table depicts the basic principle of the bootstrap method.

Table 1. Basic principle of the bootstrap method.

X (original)  →  random selection with repetition
                 X1*   X2*   X3*   …   XB*
X1               X3    X2    X3    …   X1
X2               X1    X2    X6    …   X6
X3               X6    X3    X5    …   X2
X4               X4    X4    X2    …   X1
X5               X1    X1    X3    …   X4
X6               X6    X2    X1    …   X4
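The resampling scheme of Table 1 can be sketched in a few lines of Python. This is a minimal illustration: the function name `bootstrap_samples`, the seed and the choice B = 3 are assumptions made for the example.

```python
import random


def bootstrap_samples(data, B, seed=None):
    """Draw B bootstrap samples, each of the same size as data, with replacement."""
    rng = random.Random(seed)
    n = len(data)
    return [[rng.choice(data) for _ in range(n)] for _ in range(B)]


# Labels stand in for the observed values X1, ..., X6 of Table 1.
original = ["X1", "X2", "X3", "X4", "X5", "X6"]
samples = bootstrap_samples(original, B=3, seed=1)

for number, sample in enumerate(samples, start=1):
    print(f"X{number}*:", " ".join(sample))
```

Because the draws are made with replacement, some original values typically appear several times in a bootstrap sample while others are missing, exactly as in the columns of Table 1.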

Suppose we have a random sample $X_1,\dots,X_n$ of observed values. As $X_1,\dots,X_n$ are random variables, they have a distribution function F; let $\theta = \theta(F)$ be an unknown parameter which we would like to estimate using the statistic $T_n = T_n(X_1,\dots,X_n)$. In other words, we want to find a point estimate of the parameter θ: $\hat{\theta} = T_n = T_n(X_1,\dots,X_n)$. Once we have such a statistic (estimate), we can examine its properties (Fleming, Kirby, Ostdiek, 2005). The most common statistical questions concern the bias of the estimate, its variance and standard deviation, confidence intervals, or critical values for hypothesis testing. The bootstrap was at first presented as a universal tool that automatically delivers precise results for all problems. Since then it has been updated many times, and it is one of the most popular non-parametric methods; it can be used in the form of the autoregressive bootstrap (in the case of non-parametric time series) and the wild bootstrap (which differs from the autoregressive one in the way the bootstrap residuals are formed). The big advantage of this method is that no assumption about the data distribution is necessary.

According to Praskova (2004), let $X_1, X_2,\dots,X_n$ be independent, identically distributed random variables with an unknown distribution function F. We would like to estimate some characteristic $\theta = \theta(F)$ of this distribution function. Let $S_n = S_n(X_1, X_2,\dots,X_n)$ be a statistic used to estimate the parameter θ. The bootstrap method means that the unknown distribution function F is replaced by the empirical distribution function:

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I_{(-\infty,\, x]}(X_i) \tag{12}$$

Thus if $\hat{X}_1, \hat{X}_2,\dots,\hat{X}_n$ is a random sample from $F_n$ (i.e. a bootstrap sample), each of them takes one of the values $X_1, X_2,\dots,X_n$ with probability 1/n. If the original sample is replaced by the bootstrap sample and the distribution function F by the empirical distribution function, we get the estimate $\hat{\theta} = \theta(F_n)$. The distribution of the original data can be estimated by generating many bootstrap samples (Jaros, Melichar, Svadlenka, 2014). The risk management process focuses on the estimation of the mean value, the volatility and some quantile or VaR. First of all, the mean value is estimated by the sample mean:

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \tag{13}$$

The variance is estimated by the sample variance:

$$S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X}_n)^2 \tag{14}$$

To estimate quantiles, for $\alpha \in (0,1)$, the sample quantile is used:

$$\hat{x}_{\alpha} = \hat{F}_n^{-1}(\alpha) = \inf\{x : \hat{F}_n(x) \ge \alpha\} \tag{15}$$
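The estimators (12) to (15) translate directly into code. The sketch below is a minimal Python illustration; the function names and the four-element sample are hypothetical and serve only to show how each formula is evaluated.

```python
def ecdf(sample, x):
    """Empirical distribution function (12): share of observations not exceeding x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)


def sample_mean(sample):
    """Sample mean (13)."""
    return sum(sample) / len(sample)


def sample_variance(sample):
    """Sample variance (14) with divisor n - 1."""
    mean = sample_mean(sample)
    return sum((xi - mean) ** 2 for xi in sample) / (len(sample) - 1)


def sample_quantile(sample, alpha):
    """Sample quantile (15): smallest observation whose ECDF value reaches alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(sample)
    n = len(ordered)
    for position, value in enumerate(ordered, start=1):
        if position / n >= alpha:
            return value
    return ordered[-1]


# Hypothetical observations.
values = [3, 1, 2, 4]
print(ecdf(values, 2.5), sample_mean(values), sample_variance(values))
print(sample_quantile(values, 0.5), sample_quantile(values, 0.51))
```

The infimum in (15) is always attained at one of the observed values, because the empirical distribution function only jumps at observations; that is why a scan over the ordered sample is sufficient.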

The resulting probability distribution reflects the randomness with which the data were generated. Parameter estimates computed from these data are subject to considerable uncertainty, which is a consequence of searching for a point estimate. An idea of this uncertainty can be obtained from the distribution of the individual parameters. First, we assume that the observations $\hat{X}_1, \hat{X}_2,\dots,\hat{X}_n$ are a random sample from the distribution of the original data. Then the selection with replacement from $\hat{X}_1, \hat{X}_2,\dots,\hat{X}_n$ is repeated many times. Finally, each resample is described by certain statistical characteristics (Gavlakova, Kliestik, 2014). This procedure provides a distribution of the estimated parameters, which is used to specify the interval estimate (5% quantile, 95% quantile). The interval estimate supplements the point estimate and indicates how far the actual values may differ from the expected ones. One modification of this method adds to the measured values a minimum that is lower than all observed values and a maximum that is higher than all of them (Kollar, Bartosova, 2014). This helps eliminate the problem that the observed data may not cover all possible values of the studied risk factor.


Among the main advantages of the bootstrap method (Kollar, Kliestik, 2014), it should be mentioned that it is very user friendly; that it frees us from two constraints of the classical approach, namely the assumption of normally distributed data and the need to focus on statistics whose theoretical properties can be analyzed mathematically; and finally that it speeds up the convergence to the actual distribution and is suitable for smaller samples.

3. Application of a bootstrap method

The application of the bootstrap method in its simplified version is carried out in a chosen business entity, with the main emphasis on the elimination of risk factors in the field of bearings production. KINEX BEARINGS is a company producing rolling bearings for different segments of industry. It ranks first among the top suppliers of special bearings for the textile industry and is a European leader in the production of cylindrical roller bearings for railway vehicles. High product quality is a matter of course for the company. The company wants to minimize the production of defective products to meet the high standards of its customers. It wants to achieve zero faultiness to minimize the risk of losing customers, and so it has decided to train its employees to produce top-quality products. It therefore needs to analyze the time new employees need before they are able to produce bearings without any defects. The time needed to train new employees has a normal distribution with two unknown parameters μ and $\sigma^2$. In a sample of 55 randomly selected employees the following values of $x_i$ (in days) were found:

27 23 28 29 27 31 39 32 26 38 31 25 36 33 34 30 18 27 27 31 42 28 35 32 26 20 30 32 29 24 40 32 22 34 38 21 29 33 37 27 15 41 26 18 22 27 31 28 16 23 31 24 29 32 22

(16)

The confidence level (1-α) = 0.95 is used to determine the limits of the confidence interval for the mean of the basic set μ . The basic set is determined by all employees of the company KINEX who are a part of bearings production. As the variance σ 2 of the random variable X in the basic set is not known, the interval estimation of the parameter μ is calculated using the following formula:

$$\bar{x} - t_{1-\frac{\alpha}{2}}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{1-\frac{\alpha}{2}}\,\frac{s}{\sqrt{n}} \tag{17}$$

The sample data set is the basis for calculating the sample characteristics: the sample mean $\bar{x}$ (13) and the sample variance $s^2$ (14). The values in this example are:

$$\bar{x} = \frac{\sum_{i=1}^{55} x_i}{n} = \frac{1588}{55} = 28.873, \qquad s^2 = \frac{\sum_{i=1}^{55} (x_i - 28.873)^2}{54} = \frac{2084.109}{54} = 38.595 \tag{18}$$

The calculated sample standard deviation is $s = \sqrt{38.595} = 6.21$. For the confidence level 0.95, the quantile $t_{0.975}$ of the t distribution with $\nu = n - 1 = 54$ degrees of freedom is determined. According to the statistical tables, the value of $t_{0.975}$ is 2.03. Using formula (17), the interval estimate is calculated as follows:

$$28.873 - 2.03 \cdot \frac{6.21}{\sqrt{55}} < \mu < 28.873 + 2.03 \cdot \frac{6.21}{\sqrt{55}} \tag{19}$$
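Formula (17) can also be evaluated directly on the sample (16). The sketch below is a minimal Python check using only the standard library; the critical value 2.03 is taken from the text rather than computed, and the variable names are illustrative.

```python
import math
import statistics

# Training times in days for the 55 sampled employees, data set (16).
times = [
    27, 23, 28, 29, 27, 31, 39, 32, 26, 38, 31,
    25, 36, 33, 34, 30, 18, 27, 27, 31, 42, 28,
    35, 32, 26, 20, 30, 32, 29, 24, 40, 32, 22,
    34, 38, 21, 29, 33, 37, 27, 15, 41, 26, 18,
    22, 27, 31, 28, 16, 23, 31, 24, 29, 32, 22,
]

n = len(times)
mean = statistics.mean(times)          # sample mean (13)
variance = statistics.variance(times)  # sample variance (14), divisor n - 1
s = math.sqrt(variance)

t_crit = 2.03  # value of t_0.975 quoted in the text from statistical tables (assumption)
half_width = t_crit * s / math.sqrt(n)
lower, upper = mean - half_width, mean + half_width

print(n, sum(times), round(mean, 3), round(variance, 3))
print(round(lower, 3), round(upper, 3))
```

Keeping the critical value as a named constant makes it easy to substitute a different table value or one computed by a statistical library.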

Calculating the lower and upper limits gives 27.172 < μ < 30.573. At the 95% confidence level, the average time needed to train KINEX employees to produce top-quality bearings with zero faultiness is between 27.172 and 30.573 days. An employee with the required education and qualification needs the above-mentioned time; if more days are needed, the company pays extra for training, which increases the production costs. Within the mentioned limits the employee is trained well enough to produce bearings of the required quality, with minimal production costs and minimal risk of defective production.

The bootstrap method can be calculated using MS Excel. We use the same input data of the company KINEX (16) to analyze the training time needed to reduce defective production. First of all, we define the sample, which includes the information about the training time of the selected group of 55 employees. As this sample is relatively small, while the number of employees in the company is not very high (about 500), we increase the number of observations artificially using bootstrap replication. These replications (resampling) can be easily implemented in MS Excel using the following formula:

=INDEX(sample;ROWS(sample)*RAND()+1;COLUMNS(sample)*RAND()+1)

The formula always chooses one element from the original sample with replacement. We copy the formula to 11 columns, so that in each replication there is the same number of observations as in the original selection. Depending on the number of replications (B), we copy the formula to B rows. In our case, B = 500. For each resample we calculate the mean, median and standard deviation. The following table shows the first ten rows of the bootstrap calculation.

Table 2. Statistics of individual bootstrap replications.

average     median   standard deviation
28.63636    30       6.8516016
27.81818    28       5.3385391
26.36364    24       6.4247784
28          27       7.9021797
30.18182    27       8.2882112
27.81818    27       5.3489355
30          29       6.2271806
27.90909    28       6.0207973
31.27273    31       5.2862505
29.72727    29       7.350359
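For readers working outside a spreadsheet, the same replication table can be sketched in Python. This is a hedged equivalent, not the authors' implementation: `bootstrap_table`, the seed and the flattening of the Excel range into one list are assumptions. The replicate size `m` is left as a parameter: the text copies the formula to 11 columns, and the averages in Table 2 are all multiples of 1/11, so m = 11 appears to match the spreadsheet, while m = 55 would match the size of the original sample.

```python
import random
import statistics

# Training times in days for the 55 sampled employees, data set (16).
times = [
    27, 23, 28, 29, 27, 31, 39, 32, 26, 38, 31,
    25, 36, 33, 34, 30, 18, 27, 27, 31, 42, 28,
    35, 32, 26, 20, 30, 32, 29, 24, 40, 32, 22,
    34, 38, 21, 29, 33, 37, 27, 15, 41, 26, 18,
    22, 27, 31, 28, 16, 23, 31, 24, 29, 32, 22,
]


def bootstrap_table(data, B=500, m=None, seed=None):
    """Return B rows of (mean, median, standard deviation), one per resample of size m."""
    rng = random.Random(seed)
    m = len(data) if m is None else m
    rows = []
    for _ in range(B):
        # randrange picks a random position, playing the role of the
        # random row and column index in the INDEX(...RAND()...) formula.
        resample = [data[rng.randrange(len(data))] for _ in range(m)]
        rows.append((
            statistics.mean(resample),
            statistics.median(resample),
            statistics.stdev(resample),
        ))
    return rows


table = bootstrap_table(times, B=500, m=11, seed=2015)
for mean, median, sd in table[:10]:
    print(f"{mean:10.5f} {median:6} {sd:10.7f}")
```

Fixing the seed makes the table reproducible, which the spreadsheet version is not, since RAND() is re-evaluated on every recalculation.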

To illustrate the results, the graph of probability density of the bootstrap resampling averages was constructed.

Fig.2. Probability density of the Bootstrap resampling averages


We use the MS Excel array function FREQUENCY; data_array is the value of the median for each replication row, and bins_array contains the possible values that the parameter (training time) can take. Finally, we calculate the confidence interval. To do so, it was necessary to define variables such as the total sample, alpha, the confidence level, and the lower and upper bounds of the interval. To calculate the sample size, we use the Excel function COUNT (the input data is the calculated bootstrap mean); alpha was set at 0.05, and thus the confidence level (1 − alpha) is 0.95 (95%). The lower bound was determined by the SMALL function, where the input data are the bootstrap medians (array) and the total sample multiplied by alpha (nth_position); the upper bound was determined by the same function with different arguments: the bootstrap medians (array) and the total sample multiplied by 1 − alpha (nth_position). The results are shown in the following table:

Table 3. Confidence interval of bootstrap replications.

Total sample   Alpha   Conf. level   Lower bound   Upper bound
500            0.05    0.95          26            34
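The bound calculation described above can be sketched as follows. This is a minimal Python illustration: `small` mimics the behaviour of Excel's SMALL function, while `percentile_bounds` and the synthetic input are assumptions introduced for the example. Note that the positions alpha · B and (1 − alpha) · B with alpha = 0.05 leave 5% of the replications in each tail, so they bracket the central 90% of the replications; using alpha / 2 in both positions would bracket the central 95%.

```python
def small(values, k):
    """k-th smallest value (1-based), in the spirit of Excel's SMALL(array; k)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    return sorted(values)[k - 1]


def percentile_bounds(replicate_stats, alpha=0.05):
    """Lower and upper bounds taken at positions alpha*B and (1 - alpha)*B."""
    total = len(replicate_stats)
    k_low = max(1, round(total * alpha))
    k_high = round(total * (1 - alpha))
    return small(replicate_stats, k_low), small(replicate_stats, k_high)


# Synthetic stand-in for 500 bootstrap statistics: the integers 500, 499, ..., 1.
demo_stats = list(range(500, 0, -1))
lower, upper = percentile_bounds(demo_stats, alpha=0.05)
print(lower, upper)
```

Passing the list of bootstrap medians (or means) from the replication table to `percentile_bounds` gives bounds of the kind reported in Table 3.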

The calculation in MS Excel was carried out to show that bootstrap samples, their statistics and confidence intervals can be calculated relatively easily, without any special statistical programs. The bootstrap method is based on resampling, and a random number generator is used. Therefore, every time the F9 key is pressed (or any change is made in the file), the generated resamples change, and with them the calculated bootstrap statistics and the graph. The results of the MS Excel bootstrap calculation differ slightly from the first one: at the same 95% confidence level, the average time needed to train KINEX employees to produce top-quality bearings with zero faultiness is between 26 and 34 days. The bounds of the first interval are slightly exceeded, so an employee is given a longer time to gain the required training.

4. Conclusion

When dealing with various risk factors, parametric methods are preferred most of the time; however, estimation, confidence intervals, hypothesis testing and analysis of variance are valid only if some important conditions are fulfilled. One such condition may be that the sample comes from a given distribution, either fully known or known up to certain parameters. Many methods tolerate departures from these conditions, especially when large samples are considered. In the case of small samples, however, non-parametric methods are to be used. Unlike parametric methods, they rely on much more general assumptions, and their determination and calculation are relatively easy and undemanding. Non-parametric methods are commonly used in statistics where small sample sizes are used to analyze nominal data. A non-parametric method is used when the researcher does not know anything about the parameters of the sample chosen from the population. Hence, this method is sometimes referred to as a parameter-free or distribution-free method.

One of the most popular non-parametric methods is the bootstrap method, which simulates what would happen if we observed repeated samples from the basic set, by creating new random samples from the available data (resampling, replication). One of the biggest advantages of the bootstrap method is its simplicity. There is no need to assume a specific data distribution based on some theoretical distribution. We use the empirical distribution, and we can analyze statistics whose theoretical properties cannot be analyzed mathematically.

Acknowledgements

The contribution is an output of the science project VEGA 1/0656/14 - Research of Possibilities of Credit Default Models Application in Conditions of the SR as a Tool for Objective Quantification of Businesses Credit Risks.

References

Bartosova, V., 2008. Financial analysis and planning. EDIS Publishers, Zilina, pp. 82.
Buc, D., Kliestik, T., 2013. Aspects of statistics in terms of financial modelling and risk. In: Proceeding of the 7th International Days of Statistics and Economics, Prague, pp. 215-224.
Cisko, S., Kliestik, T., 2013. Financny manazment podniku II. EDIS Publishers, Zilina, pp. 775.
Engle, R. F., 1982. Autoregressive Conditional Heteroskedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica 50, 987-1007.
Fan, J., Yao, Q., 2003. Nonlinear Time Series, Nonparametric and Parametric Methods. Springer, New York, pp. 551.
Fink, T., Kreiss, J. P., 2013. Bootstrap for Random Coefficient Autoregressive Models. Journal of Time Series Analysis 34 (6), pp. 646-667.
Fleming, J., Kirby, C., Ostdiek, B., 2005. Bootstrap Tests of Multiple Inequality Restrictions on Variance Ratios. Economics Letters, Forthcoming.
Franke, J., Neumann, M. H., Stockis, J. P., 2004. Bootstrapping nonparametric estimators of the volatility function. Journal of Econometrics 118, 189–218.
Gavlakova, P., Kliestik, T., 2014. Credit Risk Models and Valuation. In: 4th International Conference on Applied Social Science. Information Engineering Research Institute, Singapore. Advances in Education Research 51, 139-143.
Härdle, W., Müller, M., Sperlich, S., Werwatz, A., 2004. An introduction to Non- and Semiparametric Models. Springer, Berlin, pp. 300.
Hnilica, J., Fotr, J., 2009. Aplikovana analyza rizika. Grada, Praha, pp. 259.
Jaros, J., Melichar, V., Svadlenka, L., 2014. Impact of the Financial Crisis on Capital Markets and Global Economic Performance. In: 18th International Conference on Transport Means, Lithuania, pp. 431-434.
Kollar, B., Bartosova, V., 2014. Comparison of Credit Risk Measures as an Alternative to VaR. In: 2nd International Conference on Social Sciences Research. Advances in Social and Behavioral Sciences 5, pp. 167-171.
Kollar, B., Kliestik, T., 2014. Simulation approach in credit risk models. In: 4th International Conference on Applied Social Science. Information Engineering Research Institute, Singapore. Advances in Education Research 51, 150-155.
Leng, C., Tsai, C. L., 2014. A Hybrid Bootstrap Approach to Unit Root Tests. Journal of Time Series Analysis 35 (4), 299-321.
Misankova, M., Kocisova, K., Frajtova-Michalikova, K., Adamko, P., 2014. CreditMetrics and Its Use for the Calculation of Credit Risk. In: 2nd International Conference on Economics and Social Science. Information Engineering Research Institute. Advances in Education Research 61, 124-129.
Pacakova, V. et al., 2003. Štatistika pre ekonómov. Iura Edition, Bratislava, pp. 268.
Praskova, Z., 2004. Metoda bootstrap. Robust 5, 299-314.
Wilcox, R. R., 2003. Applying contemporary statistical techniques. Academic Press, Burlington, pp. 605.
