EMPIRICAL REGRESSION QUANTILE PROCESS WITH POSSIBLE APPLICATION TO RISK ANALYSIS

JANA JUREČKOVÁ, MARTIN SCHINDLER, AND JAN PICEK

Abstract. The processes of the averaged regression quantiles, and of their modifications, provide useful tools in regression models whose covariates are not fully under our control. As an application we mention probabilistic risk assessment in the situation where the return depends on some exogenous variables. The processes make it possible to evaluate the expected α-shortfall (0 ≤ α ≤ 1) and other measures of risk recently accepted in the financial literature, and they also help to measure risk in environmental analysis and elsewhere.

1991 Mathematics Subject Classification. Primary 62J02, 62G30; Secondary 90C05, 65K05, 49M29, 91B30.
Key words and phrases. Averaged regression quantile, one-step regression quantile, R-estimator, risk measurement.
The authors gratefully acknowledge the support of the Grant GAČR 15-00243S.
1. Introduction

In everyday life and practice we encounter various risks, depending on various contributors. The risk contributors may be partially under our control, and information on them is important, because it helps us to make good decisions about system design. This problem appears not only in the financial market, insurance and social statistics, but also in environmental analysis dealing with exposures to toxic chemicals (coming from power plants, road vehicles, agriculture), and elsewhere; see [30] for an excellent review of such problems. Our aim is to analyze the risks with the aid of probabilistic risk assessment.

Various coherent risk measures, some satisfying suitable axioms, have recently been defined in the literature. We refer to [5], [6], [31], [41], [43], [36], [1], [42], [9], [37], [38], and to the papers cited therein, for discussions and some projects. For possible applications in insurance we refer to [10].

A generally accepted measure of risk is the expected shortfall, based on quantiles of a portfolio return; its properties have recently been intensively studied. Acerbi and Tasche in [1] speak of the "expected loss in the 100α% worst cases", or briefly of the "expected α-shortfall", 0 < α < 1, which is defined as

(1.1)    −IE{Y | Y ≤ F^{−1}(α)} = −(1/α) ∫_0^α F^{−1}(u) du,

where F is the distribution function of the asset Y. This quantity can be estimated by approximating the quantile function F^{−1}(u) by the sample quantiles.
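As a concrete illustration of this plug-in idea (our sketch, not part of the paper), the expected α-shortfall (1.1) can be estimated by averaging the ⌈nα⌉ smallest observed returns:

```python
# A minimal numerical sketch: plug-in estimate of the expected
# alpha-shortfall (1.1), replacing F^{-1} by the sample quantile
# function, i.e. averaging the worst ceil(n*alpha) returns.
import numpy as np

def expected_shortfall(y, alpha):
    """Estimate -E[Y | Y <= F^{-1}(alpha)] from a sample y."""
    y = np.sort(np.asarray(y))
    k = max(1, int(np.ceil(alpha * len(y))))   # number of worst cases
    return -y[:k].mean()

rng = np.random.default_rng(1)
returns = rng.standard_t(df=4, size=1000)      # heavy-tailed returns
print(expected_shortfall(returns, alpha=0.05))
```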
Quantile regression is an important method for investigating the risk of an asset in the situation where it depends on some exogenous variables. An averaged regression quantile, introduced in [22], or some of its modifications, serves as a convenient tool for global risk measurement in such a situation, when the amount of covariates is not under our control. The typical model for the relation of the loss to the covariates is the regression model

(1.2)    Y_{ni} = β_0 + x_{ni}^⊤ β + e_{ni},    i = 1, . . . , n,

where Y_{n1}, . . . , Y_{nn} are observed responses and e_{n1}, . . . , e_{nn} are independent model errors, possibly non-identically distributed with unknown distribution functions F_i, i = 1, . . . , n. The covariates x_{ni} = (x_{i1}, . . . , x_{ip})^⊤, i = 1, . . . , n, are random or nonrandom, and β* = (β_0, β^⊤)^⊤ = (β_0, β_1, . . . , β_p)^⊤ ∈ IR^{p+1} is an unknown parameter. For the sake of brevity, we also use the notation x*_{ni} = (1, x_{i1}, . . . , x_{ip})^⊤, i = 1, . . . , n.

An important tool in the risk analysis is the regression α-quantile

    β̂*_n(α) = (β̂_{n0}(α), (β̂_n(α))^⊤)^⊤ = (β̂_{n0}(α), β̂_{n1}(α), . . . , β̂_{np}(α))^⊤.

It is a (p+1)-dimensional vector defined as a minimizer

(1.3)    β̂*_n(α) = arg min_{b ∈ IR^{p+1}} Σ_{i=1}^n [α(Y_i − x*_i^⊤ b)^+ + (1 − α)(Y_i − x*_i^⊤ b)^−],

where z^+ = max(z, 0) and z^− = max(−z, 0), z ∈ IR^1.
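For illustration, here is a minimal sketch of computing β̂*_n(α) in (1.3). The use of statsmodels' QuantReg solver is our assumption; the paper itself works with the linear-programming formulation of Section 2.

```python
# Sketch: the regression alpha-quantile (1.3) via statsmodels' QuantReg.
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(0)
n, p = 200, 2
X = rng.normal(size=(n, p))                     # covariates x_i
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.standard_t(df=3, size=n)

Xs = sm.add_constant(X)                         # rows x_i^* = (1, x_i^T)
alpha = 0.9
beta_hat = QuantReg(y, Xs).fit(q=alpha).params  # (p+1)-vector beta*_n(alpha)
print(beta_hat)
```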
The solution β̂*_n(α) = (β̂_0(α), (β̂(α))^⊤)^⊤ minimizes the (α, 1 − α) convex combination of the residuals (Y_i − x*_i^⊤ b) over b ∈ IR^{p+1}, where the choice of α depends on the balance between underestimating and overestimating the respective losses Y_i. An increasing α ↗ 1 reflects a greater concern about underestimating the losses Y, compared with overestimating them.

The methodology is based on the averaged regression α-quantile, which is the following weighted mean of the components of β̂*_n(α), 0 ≤ α ≤ 1:

(1.4)    B̄_n(α) = x̄*_n^⊤ β̂*_n(α) = β̂_{n0}(α) + (1/n) Σ_{i=1}^n Σ_{j=1}^p x_{ij} β̂_j(α),    x̄*_n = (1/n) Σ_{i=1}^n x*_i.
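Given such a fit, the averaged regression quantile (1.4) is simply the fitted hyperplane evaluated at the mean regressor x̄*_n; a short continuation of the sketch above (reusing Xs, y and beta_hat):

```python
# Continuing the sketch above: B_bar_n(alpha) of (1.4) is the fit
# evaluated at the mean regressor x_bar^*.
x_bar_star = Xs.mean(axis=0)            # (1, x_bar^T)
B_bar = x_bar_star @ beta_hat           # B_bar_n(alpha) = x_bar*' beta*_n(alpha)
print(B_bar)                            # equals the mean of the fitted values
```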
In [22] it was shown that B̄_n(α) − β_0 − x̄_n^⊤ β is asymptotically equivalent to the [nα]-quantile e_{n:[nα]} of the model errors, if they are identically distributed. Hence, B̄_n(·) can help to make an inference on the expected α-shortfall (1.1) even under the nuisance regression.

Besides B̄_n(α), its various modifications can also be used, some of them easier to interpret. The methods are nonparametric, thus applicable also to heavy-tailed and skewed distributions; notice that [8] speak about a considerable improvement over normality when trying to use different distributions. An extension to autoregressive models is possible and will be a subject of further study; there the main tool will be the autoregression quantiles, introduced in [29], and their averaged versions. The autoregression quantile will reflect the value-at-risk based on the past assets, while its averaged version will try to mask the past history.

The behavior of B̄_n(α) with 0 < α < 1 has been illustrated in [4] and [27], and summarized in [25]; there it is shown that B̄_n(α) is a nondecreasing step function of α ∈ (0, 1). The extreme B̄_n(1) with α = 1 was studied in [19]. Notice that the upper bound on the number J_n of breakpoints of β̂*_n(·), and also of B̄_n(·), is (n choose p+1) = O(n^{p+1}). However, Portnoy in [34] showed that, under some conditions on the design matrix X_n, the number J_n of breakpoints is of order O_p(n log n) as n → ∞, and thus much smaller.
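The step-function behavior can be observed numerically (again reusing Xs and y from the sketch above): evaluating B̄_n(α) on a grid of α should produce a nondecreasing trajectory, up to solver tolerance.

```python
# Sketch: B_bar_n(alpha) on a grid of alpha; per [25] it is a
# nondecreasing step function of alpha in (0, 1).
alphas = np.linspace(0.05, 0.95, 19)
B_traj = np.array([Xs.mean(axis=0) @ QuantReg(y, Xs).fit(q=a).params
                   for a in alphas])
print(np.all(np.diff(B_traj) >= -1e-8))   # True: nondecreasing trajectory
```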
An alternative to the regression quantile is the two-step regression α-quantile, introduced in [21]. Here the slope components β are estimated by a specific rank-estimate β̃_{nR}, which is invariant to shift in location. The intercept component is then estimated by the α-quantile of the residuals of the Y_i's from β̃_{nR}. The averaged two-step regression quantile B̃_n(α) is asymptotically equivalent to B̄_n(α) under a wide choice of the R-estimators of the slopes. However, the finite-sample behavior of B̃_n(α) generally differs from that of B̄_n(α); it is affected by the choice of R-estimator, but the main difference is that the number of breakpoints of B̃_n(α) exactly equals n (a sketch of this two-step construction is given below).

Being aware of various important applications of the problem, we shall study this situation in more detail. The averaged regression quantile B̄_n(α) is monotone in α, while the two-step averaged regression quantile B̃_n(α) can be made monotone by a suitable choice of the R-estimate β̃_{nR}. Hence, we can consider their inversions, which in turn estimate the parent distribution F of the model errors. As such they both provide a tool for inference. The behavior of these processes and of their approximations is analyzed and numerically illustrated.
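A heavily hedged sketch of the two-step construction: the slope estimator below is a hypothetical stand-in (median-regression slopes, which are shift-invariant) for the R-estimate β̃_{nR} of [21], not the authors' choice.

```python
# Sketch of the two-step averaged alpha-quantile of [21], with an
# explicitly hypothetical stand-in for the R-estimator of the slopes.
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

def two_step_averaged_quantile(y, X, alpha):
    Xs = sm.add_constant(X)
    slopes = QuantReg(y, Xs).fit(q=0.5).params[1:]   # stand-in for beta_nR
    resid = y - X @ slopes                           # residuals from the slopes
    intercept = np.quantile(resid, alpha)            # alpha-quantile of residuals
    return intercept + X.mean(axis=0) @ slopes       # B_tilde_n(alpha)
```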
2. Behavior of B̄_n(α) over α ∈ (0, 1)

Let us first describe one possible form of the averaged regression quantile B̄_n(α), as a weighted mean of the basic components of the vector Y. Consider again the minimization (1.3), with α ∈ [0, 1] fixed. This was treated in [26] as a special linear programming problem, and later on various modifications of this algorithm were developed. Its dual program is a parametric linear program, which can be written simply as

(2.1)    maximize Y_n^⊤ â(α)
         under X*_n^⊤ â(α) = (1 − α) X*_n^⊤ 1_n,  â(α) ∈ [0, 1]^n,  0 ≤ α ≤ 1,

where

(2.2)    X*_n = (x*_{n1}, . . . , x*_{nn})^⊤  is of order n × (p + 1).
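For a fixed α, the dual program (2.1) is an ordinary linear program; here is a sketch using scipy.optimize.linprog (our assumption; parametric-LP algorithms solve it simultaneously for all α):

```python
# Sketch: solving (2.1) for one fixed alpha. The optimal a_hat(alpha)
# are the regression rank scores discussed below.
import numpy as np
from scipy.optimize import linprog

def regression_rank_scores(y, Xs, alpha):
    """maximize y'a  s.t.  Xs' a = (1 - alpha) Xs' 1_n,  0 <= a <= 1."""
    n = len(y)
    res = linprog(c=-y,                              # linprog minimizes, so negate
                  A_eq=Xs.T,
                  b_eq=(1.0 - alpha) * (Xs.T @ np.ones(n)),
                  bounds=[(0.0, 1.0)] * n,
                  method="highs")
    return res.x
```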
The components of the optimal solution â(α) = (â_{n1}(α), . . . , â_{nn}(α))^⊤ of (2.1), called regression rank scores, were studied in [12], where it was shown that â_{ni}(α) is a continuous, piecewise linear function of α ∈ [0, 1] with â_{ni}(0) = 1 and â_{ni}(1) = 0, i = 1, . . . , n. Moreover, â(α) is invariant in the sense that it does not change if Y is replaced with Y + X*_n b*, ∀ b* ∈ IR^{p+1} (see [12] for details).

Let {x*_{i_1}, . . . , x*_{i_{p+1}}} be the optimal base in (2.1) and let {Y_{i_1}, . . . , Y_{i_{p+1}}} be the corresponding responses in model (1.2). Then B̄_n(α) equals a weighted mean of {Y_{i_1}, . . . , Y_{i_{p+1}}}, with the weights based on the regressors. Indeed, we have the following theorem.

Theorem 1. Assume that the regression matrix (2.2) has full rank p + 1 and that the distribution functions F_1, . . . , F_n of the model errors are continuous and increasing
in (−∞, ∞). Then with probability 1

(2.3)    B̄_n(α) = Σ_{k=1}^{p+1} w_{k,α} Y_{i_k},    Σ_{k=1}^{p+1} w_{k,α} = 1,

and

(2.4)    B̄_n(α) ≤ B̄_n(1) < max_{i≤n} Y_i,

where the vector Y_n(1) = (Y_{i_1}, . . . , Y_{i_{p+1}})^⊤ corresponds to the optimal base of the linear program (2.1). The vector w_α = (w_{1,α}, . . . , w_{p+1,α})^⊤ of coefficients satisfies

(2.5)    w_α^⊤ = n^{−1} 1_n^⊤ X*_n (X*_{n1})^{−1},  while  Σ_{k=1}^{p+1} w_{k,α} = 1,

where X*_{n1} is the submatrix of X*_n with the rows x*_{i_1}^⊤, . . . , x*_{i_{p+1}}^⊤.
Proof. The regression quantile β̂*_n(α) is a step function of α ∈ (0, 1). If α is a continuity point of the regression quantile trajectory, then we have the following identity, proven in [22]:

(2.6)    B̄_n(α) = (1/n) Σ_{i=1}^n x*_i^⊤ β̂*_n(α) = −(1/n) Σ_{i=1}^n Y_i â′_{ni}(α),

where â′_{ni}(α) = (d/dα) â_{ni}(α). Moreover, (2.1) implies

(2.7)    Σ_{i=1}^n â′_{ni}(α) = −n,    Σ_{i=1}^n x_{ij} â′_{ni}(α) = −Σ_{i=1}^n x_{ij},    1 ≤ j ≤ p.

Notice that â′_{ni}(α) ≠ 0 iff α is a point of continuity of β̂*_n(·) and Y_i = x*_i^⊤ β̂*_n(α). To every fixed continuity point α correspond exactly p + 1 such components, such that the corresponding x*_i belongs to the optimal base of program (2.1). Hence there exist coefficients w_{k,α}, k = 1, . . . , p + 1, such that

    B̄_n(α) = −(1/n) Σ_{i=1}^n Y_i â′_{ni}(α) = Σ_{k=1}^{p+1} w_{k,α} Y_{i_k}.

The equalities Y_i = x*_i^⊤ β̂*_n(α) hold just for the p + 1 components of the optimal base x*_{i_1}, . . . , x*_{i_{p+1}}. Let X*_{n1} be the submatrix of X*_n with the rows x*_{i_1}^⊤, . . . , x*_{i_{p+1}}^⊤, and let (â′_1(α))^⊤ = (â′_{i_1}(α), . . . , â′_{i_{p+1}}(α)). Then X*_{n1} is regular with probability 1,

    (â′_1(α))^⊤ = −1_n^⊤ X*_n (X*_{n1})^{−1},  and  w_α^⊤ = −(1/n) (â′_1(α))^⊤ = (1/n) 1_n^⊤ X*_n (X*_{n1})^{−1},

hence Σ_{k=1}^{p+1} w_{k,α} = 1. This and (2.6) imply (2.3) and (2.5). The inequality (2.4) was proven in [19]. □
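A numerical check of Theorem 1 (a sketch reusing Xs, y, alpha and beta_hat from the earlier sketches): the p + 1 observations of the optimal base are those interpolated by the fit (zero residuals, up to solver tolerance), and the weights of (2.5) should sum to one and reproduce B̄_n(α).

```python
# Numerical check of (2.3) and (2.5), continuing the earlier sketches.
resid = y - Xs @ beta_hat
base = np.argsort(np.abs(resid))[:Xs.shape[1]]    # indices i_1, ..., i_{p+1}
Xn1 = Xs[base]                                    # submatrix X*_{n1}
w = Xs.mean(axis=0) @ np.linalg.inv(Xn1)          # w_alpha' = n^{-1} 1' X* (X*_{n1})^{-1}
print(w.sum())                                    # ~ 1, as in (2.5)
print(w @ y[base], Xs.mean(axis=0) @ beta_hat)    # both ~ B_bar_n(alpha)
```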
Let us now consider B̄_n(α) as a process in α ∈ (0, 1). Assume that all model errors e_{ni}, i = 1, . . . , n, are independent and identically distributed according to a continuous increasing distribution function F. We are interested in the averaged regression quantile process

    B̌_n(α) = n^{1/2} x̄*_n^⊤ {β̂*_n(α) − β(α)},    0 < α < 1.
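In a simulation where β* and F are known, this process can be evaluated on a grid; the sketch below reads β(α) as β* + F^{−1}(α) e_1, the usual population counterpart of the regression α-quantile under the location-shift convention (our assumption), reusing Xs, y and the true parameters of the first sketch.

```python
# Simulation sketch of the process B_check_n(alpha) on a grid of alpha.
from scipy.stats import t as student_t

beta_true = np.array([1.0, 0.5, -0.3])   # (beta_0, beta^T)^T used above
e1 = np.eye(3)[0]                        # shifts only the intercept
grid = np.linspace(0.1, 0.9, 17)
proc = [np.sqrt(len(y)) * Xs.mean(axis=0)
        @ (QuantReg(y, Xs).fit(q=a).params
           - (beta_true + student_t.ppf(a, df=3) * e1))
        for a in grid]
print(np.round(proc, 2))
```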