Stochastic Models in Risk Theory

Michael Zazanis
Department of Statistics
Athens University of Economics and Business

Summer 2005
Chapter 1  Discrete Distributions

1.1 Sums of discrete independent random variables

Let X, Y be independent random variables with values in Z. Suppose that a_n = P(X = n), b_n = P(Y = n) denote the distributions of X and Y respectively. The distribution of Z := X + Y is then given by

P(Z = n) = \sum_{k=-\infty}^{\infty} P(X + Y = n, Y = k) = \sum_{k=-\infty}^{\infty} P(X = n - k, Y = k)
         = \sum_{k=-\infty}^{\infty} P(X = n - k) P(Y = k) = \sum_{k=-\infty}^{\infty} a_{n-k} b_k.
For the most part we will restrict ourselves to distributions on the non-negative integers. In this case, if X, Y take values in N, then

P(Z = n) = \sum_{k=0}^{n} a_{n-k} b_k,   for n ∈ N.
If {a_n}, {b_n}, n ∈ N, are real sequences then the sequence {c_n} where c_n = \sum_{k=0}^{n} a_{n-k} b_k is called the convolution of the two sequences. We write c_n = (a ∗ b)_n.
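As a quick illustration of the convolution formula above, the following minimal Python sketch convolves two probability vectors numerically; the particular pmfs (a fair die and a coin flip) are illustrative choices, not taken from the text, and numpy is assumed to be available.

import numpy as np

# P(X = n) for n = 0,...,6 (a fair die shifted to include 0 with probability 0)
a = np.array([0.0, 1/6, 1/6, 1/6, 1/6, 1/6, 1/6])
# P(Y = n) for n = 0, 1 (a fair coin)
b = np.array([0.5, 0.5])

# c[n] = sum_k a[n-k] * b[k], i.e. the distribution of Z = X + Y
c = np.convolve(a, b)
print(c, c.sum())   # a probability vector summing to 1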
1.2 The Probability Generating Function

The probability generating function (p.g.f.) of a discrete random variable X (with values in N) is defined as

φ(z) := E z^X = \sum_{n=0}^{\infty} P(X = n) z^n.    (1.1)

The series above converges at least for all z ∈ [−1, 1]. We note that if p_k = P(X = k), so that φ(z) = \sum_{k=0}^{\infty} p_k z^k, and if φ^{(k)}(z) denotes the derivative of order k at z, then

p_k = \frac{1}{k!} φ^{(k)}(0),   k = 0, 1, 2, . . . ,    (1.2)

and

E[X(X − 1) · · · (X − k + 1)] = φ^{(k)}(1).    (1.3)
The latter is called the descending factorial moment of order k. Ordinary moments can be easily obtained from these. Finally we note that the probability distribution {p_n} obviously determines uniquely the p.g.f. φ(z) and, conversely, the p.g.f. uniquely determines the probability distribution via (1.2). In particular we point out that, if X, Y are independent random variables with p.g.f.'s φ_X(z), φ_Y(z) respectively, then the p.g.f. of their sum Z = X + Y is given by φ_Z(z) = φ_X(z) φ_Y(z). To see this it suffices to note that φ_Z(z) = E[z^{X+Y}] = E[z^X z^Y] = E z^X E z^Y, the last equality holding because of the independence of X, Y. The above relation extends readily to the case of any finite number of independent random variables. In particular, if X_i, i = 1, 2, . . . , n, are i.i.d. (independent, identically distributed) random variables with (common) probability generating function φ_X(z), then their sum S_n := X_1 + · · · + X_n has p.g.f. given by φ_{S_n}(z) = (φ_X(z))^n. While the p.g.f. of the sum S_n is readily obtained in terms of the p.g.f. of each of the terms X_i, the corresponding probability distributions are in general hard to compute. Based on the above discussion it should be clear that

P(S_n = k) = \frac{1}{k!} \frac{d^k}{dz^k} (φ_X(z))^n \Big|_{z=0},

a quantity that, in the general case, is not easy to evaluate. Alternatively, if p_k = P(X = k) then P(S_n = k) = p^{∗n}_k := (p ∗ · · · ∗ p)_k, the n-fold convolution of the sequence {p_n} with itself. We give some examples of discrete probability distributions.
1.3 Discrete distributions

1.3.1 The Bernoulli and the Binomial distribution

The random variable ξ which takes the value 0 with probability q := 1 − p and the value 1 with probability p, where p ∈ [0, 1], is called a Bernoulli random variable. It is the most elementary random variable imaginable and a useful building block for more complicated r.v.'s. Its p.g.f. is given by φ(z) = 1 − p + zp, its mean is p and its variance is pq.
If ξ_i, i = 1, 2, . . . , n, are independent Bernoulli random variables with the same parameter p, then their sum X := ξ_1 + ξ_2 + · · · + ξ_n is Binomial with parameters n and p. Its distribution is given by

P(X = k) = \binom{n}{k} p^k (1 − p)^{n−k},   k = 0, 1, 2, . . . , n,

and its p.g.f. by

φ(z) = \sum_{k=0}^{n} \binom{n}{k} p^k (1 − p)^{n−k} z^k = (1 − p + pz)^n.

The mean and variance of the binomial can be readily obtained from its representation as a sum of independent Bernoulli random variables. Indeed, EX = E[ξ_1 + · · · + ξ_n] = np and Var(X) = Var(ξ_1) + · · · + Var(ξ_n) = npq. Note that, if X ∼ Binom(n, p), Y ∼ Binom(m, p), and X, Y are independent, then X + Y ∼ Binom(n + m, p).
1.3.2 The Poisson distribution

X is Poisson with parameter α > 0 if its distribution is given by

P(X = k) = \frac{1}{k!} α^k e^{−α},   k = 0, 1, 2, . . . .

Its p.g.f. is given by

φ(z) = \sum_{k=0}^{\infty} z^k \frac{1}{k!} α^k e^{−α} = e^{−α} \sum_{k=0}^{\infty} \frac{1}{k!} (αz)^k = e^{−α} e^{zα} = e^{−α(1−z)}.

The mean and variance of the Poisson can be easily computed and are given by EX = Var(X) = α. One of the most important properties of the Poisson distribution is that it arises as the limit of the binomial distribution Binom(n, α/n) as n → ∞ (i.e. in the case of a large number of independent trials, say n, each with a very small probability of success, α/n). This is easy to see by examining the probability generating function of the Binom(n, α/n) distribution and letting n → ∞. Indeed,

\lim_{n→\infty} \left(1 − \frac{α}{n} + \frac{α}{n} z\right)^n = \lim_{n→\infty} \left(1 − \frac{α(1 − z)}{n}\right)^n = e^{−α(1−z)},

which establishes that Binom(n, α/n) → Poi(α) as n → ∞. We also point out that, if X_1, X_2 are independent Poisson random variables with parameters α_1, α_2 respectively, then X_1 + X_2 ∼ Poi(α_1 + α_2). The easiest way to see this is to consider the p.g.f.:

E z^{X_1+X_2} = E z^{X_1} E z^{X_2} = e^{−α_1(1−z)} e^{−α_2(1−z)} = e^{−(α_1+α_2)(1−z)}.
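The Poisson limit of the binomial can also be checked numerically. The following small sketch (an illustration, not part of the original notes) compares the Binom(n, α/n) and Poi(α) probabilities for increasing n using only the Python standard library.

from math import comb, exp, factorial

alpha = 2.0
for n in (10, 100, 1000):
    p = alpha / n
    binom_pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(6)]
    pois_pmf  = [exp(-alpha) * alpha**k / factorial(k) for k in range(6)]
    err = max(abs(b - q) for b, q in zip(binom_pmf, pois_pmf))
    print(n, round(err, 6))   # the maximum discrepancy shrinks as n grows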
1.3.3 The geometric distribution

If X is geometric with parameter p its distribution is given by

P(X = k) = q^{k−1} p,   k = 1, 2, 3, . . . ,    (1.4)

where p ∈ (0, 1) and q = 1 − p, and its p.g.f. by

φ(z) = \sum_{k=1}^{\infty} q^{k−1} p z^k = \frac{(1 − q)z}{1 − qz}.    (1.5)

The parameter p is usually referred to as the "probability of success" and X is then the number of independent trials necessary until we obtain the first success. An alternative definition counts not the trials but the failures Y until the first success. Clearly Y = X − 1 and

P(Y = k) = q^k p,   k = 0, 1, 2, . . . ,    (1.6)

with corresponding p.g.f.

E z^Y = \frac{1 − q}{1 − qz}.    (1.7)

It is easy to check that EY = q/p and Var(Y) = q/p^2. Also, EX = 1 + EY = 1/p and Var(X) = Var(Y) = q/p^2.
1.3.4 The negative binomial distribution

The last example we will mention here is the negative binomial (or Pascal) distribution. Recall that the binomial coefficient is defined for all a ∈ R and n ∈ N as

\binom{a}{n} = \frac{a(a − 1) · · · (a − n + 1)}{n!}.

If a is a positive integer then \binom{a}{n} = 0 for all n > a. If however a is a negative integer or a (non-integer) real then \binom{a}{n} ≠ 0 for all n ∈ N. Also recall the binomial theorem, valid for |x| < 1 and all α ∈ R:

(1 + x)^α = \sum_{k=0}^{\infty} \binom{α}{k} x^k.    (1.8)

(If α is a positive integer then \binom{α}{k} = 0 for all k = α + 1, α + 2, . . . and thus the infinite series (1.8) turns into a finite sum: (1 + x)^α = \sum_{k=0}^{α} \binom{α}{k} x^k.)

Note in particular that the binomial coefficient \binom{−α}{n} can be written as

\binom{−α}{n} = \frac{(−α)(−α − 1) · · · (−α − n + 2)(−α − n + 1)}{n!} = (−1)^n \frac{(α + n − 1)(α + n − 2) · · · (α + 1)α}{n!} = (−1)^n \binom{α + n − 1}{n}.

Thus we have the identity

(1 − x)^{−α} = \sum_{k=0}^{\infty} \binom{−α}{k} (−x)^k = \sum_{k=0}^{\infty} \binom{α + k − 1}{k} x^k.    (1.9)

If p ∈ (0, 1) and q = 1 − p then the negative binomial distribution with parameters p and α > 0 is defined as

P(X = k) = \binom{α + k − 1}{k} p^α q^k,   k = 0, 1, 2, . . . .    (1.10)

In order to check that the above is indeed a probability distribution it suffices to note that \binom{α + k − 1}{k} > 0 for all k ∈ N when α > 0, and that \sum_{k=0}^{\infty} \binom{α + k − 1}{k} p^α q^k = p^α (1 − q)^{−α} = 1, on account of (1.9).

The probability generating function of the negative binomial distribution is given by

φ(z) = \sum_{k=0}^{\infty} \binom{α + k − 1}{k} p^α q^k z^k = \left(\frac{p}{1 − qz}\right)^α.

If X is a random variable with this distribution then EX = φ'(1) = αq \frac{p^α}{(1 − q)^{α+1}}, or

EX = α \frac{q}{p}.

Similarly, EX(X − 1) = φ''(1) = α(α + 1) q^2 \frac{p^α}{(1 − q)^{α+2}} = α(α + 1)\left(\frac{q}{p}\right)^2. Thus EX^2 = α(α + 1)\left(\frac{q}{p}\right)^2 + α\frac{q}{p}, and hence Var(X) = α(α + 1)\left(\frac{q}{p}\right)^2 + α\frac{q}{p} − \left(α\frac{q}{p}\right)^2 = α\frac{q}{p}\left(1 + \frac{q}{p}\right), or

Var(X) = α \frac{q}{p^2}.

When α = m ∈ N the negative binomial random variable can be thought of as a sum of m independent geometric random variables with distribution (1.6). This follows readily by comparing the corresponding generating functions.
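The representation of the negative binomial (with integer α = m) as a sum of m geometrics can also be checked by simulation. The sketch below is illustrative only (the parameter values are arbitrary) and assumes numpy; note that numpy's geometric sampler counts trials rather than failures, so one is subtracted per summand.

import numpy as np

rng = np.random.default_rng(0)
p, m, trials = 0.3, 4, 200_000
q = 1 - p

# sum of m independent "failures before first success" geometric variables
s = rng.geometric(p, size=(trials, m)).sum(axis=1) - m

# compare with the negative binomial mean alpha*q/p and variance alpha*q/p^2 (alpha = m)
print(s.mean(), m * q / p)
print(s.var(), m * q / p**2)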
Chapter 2  Distributions on R

The statistics of a real random variable X are determined by its distribution function F(x) := P(X ≤ x), x ∈ R. It is clear that F is nondecreasing and that \lim_{x→−\infty} F(x) = 0, \lim_{x→\infty} F(x) = 1. F is defined to be right-continuous. Note that P(a < X ≤ b) = F(b) − F(a). If x is a point of discontinuity of F then x is called an atom of the distribution and P(X = x) = F(x) − F(x−) > 0. If on the other hand x is a point of continuity of F then P(X = x) = 0. F can have at most countably many discontinuity points. If there exists a nonnegative f such that

F(x) = \int_{−\infty}^{x} f(y) dy,   x ∈ R,

then F is called an absolutely continuous distribution and f is (a version of) the density of F. Most of the distributions we will consider here will have densities, though occasionally we will find it useful to think in terms of more general distribution functions. Most of the time we will also be thinking in terms of distributions on R+, i.e. distributions for which F(0−) = 0. The function \bar F(x) := 1 − F(x) is called the tail of the distribution function.

The moment of order k of a distribution is defined as

m_k := \int_{−\infty}^{\infty} x^k dF(x),

provided that the integral exists. The moment generating function that corresponds to a distribution F is defined as

M(θ) := E e^{θX} = \int_{−\infty}^{\infty} e^{θx} dF(x)

for all values of θ for which the integral converges. If there exists ε > 0 such that M(θ) is defined in (−ε, +ε) then the corresponding distribution is called light-tailed. In that case one can show that repeated differentiation inside the integral is permitted and thus M^{(k)}(θ) = \int_{−\infty}^{\infty} x^k e^{θx} dF(x) for θ ∈ (−ε, +ε). Thus we see that F has moments of all orders, M^{(k)}(0) = m_k, and

M(θ) = \sum_{k=0}^{\infty} \frac{θ^k}{k!} m_k.

This justifies the name "moment generating function". There exist however many distributions for which the moment generating function does not exist for all values of θ ∈ R. We shall see such examples in the sequel. In fact it is possible that the integral defining the moment generating function exists only for θ = 0. This is the case, for instance, for the "double-sided Pareto" distribution with density f(x) = \frac{α}{2|x|^{α+1}}, |x| ≥ 1, α > 0. Convergence problems, such as the ones just mentioned, are usually sidestepped by examining the characteristic function \int_R e^{itx} dF(x). In this case the defining integral converges for all t ∈ R. Also, particularly when dealing with nonnegative random variables, it is often customary to examine the so-called Laplace transform, defined as \int e^{−sx} dF(x). For nonnegative random variables the Laplace transform always exists for s ≥ 0. The only difference between Laplace transforms and moment generating functions is of course the sign in the exponent, and thus all statements regarding moment generating functions carry over to Laplace transforms mutatis mutandis.

Scale and location parameters. Let X be a random variable with distribution F (and density f). If Y = aX + b (where a > 0 and b ∈ R) then the distribution G(x) := P(Y ≤ x) of Y is given by

G(x) = P(X ≤ (x − b)/a) = F\left(\frac{x − b}{a}\right).

a is called a scale parameter while b is a location parameter. The density g of G is given by

g(x) = \frac{1}{a} f\left(\frac{x − b}{a}\right).

Note in particular that EY = aEX + b and Var(Y) = a^2 Var(X). Thus if X is "standardized" with mean 0 and standard deviation 1, then Y has mean b and standard deviation a. Also, if M_X(θ) = E e^{θX} is the moment generating function of X, then the moment generating function of Y is

M_Y(θ) = E e^{θ(aX+b)} = e^{θb} M_X(aθ).    (2.1)
2.1 Some distributions and their moment generating functions

In this section we give the definition of several continuous distributions that will play an important role in the sequel. Many of their properties will be explored in later sections.
2.1.1 The normal distribution

This is the most important distribution in probability theory. The standard normal distribution has density given by

ϕ(x) = \frac{1}{\sqrt{2π}} e^{−\frac{1}{2}x^2},   x ∈ R.    (2.2)

The distribution function of the standard normal, denoted by

Φ(x) := \int_{−\infty}^{x} ϕ(y) dy,   x ∈ R,    (2.3)

cannot be expressed in terms of elementary functions. Its values are available in tables. If X has the standard normal density then one can readily check (by a symmetry argument) that EX = 0. Also, an integration by parts shows that Var(X) = 1. We denote the standard normal distribution by N(0, 1). The general normal random variable can be obtained via a location-scale transformation: if X is N(0, 1) then Y = σX + μ (with σ > 0) has mean μ and variance σ^2. Its density is given by

f(x) = \frac{1}{σ} ϕ\left(\frac{x − μ}{σ}\right) = \frac{1}{σ\sqrt{2π}} e^{−\frac{(x−μ)^2}{2σ^2}}    (2.4)

and of course its distribution function by F(x) = Φ\left(\frac{x−μ}{σ}\right). It is denoted by N(μ, σ^2).

The moment generating function of the standard normal distribution is given by

M(θ) = \int_{−\infty}^{\infty} e^{θx} \frac{1}{\sqrt{2π}} e^{−\frac{1}{2}x^2} dx = e^{\frac{1}{2}θ^2} \int_{−\infty}^{\infty} \frac{1}{\sqrt{2π}} e^{−\frac{1}{2}(x^2 − 2θx + θ^2)} dx = e^{\frac{1}{2}θ^2} \int_{−\infty}^{\infty} \frac{1}{\sqrt{2π}} e^{−\frac{1}{2}(x−θ)^2} dx = e^{\frac{1}{2}θ^2},    (2.5)

where in the last equality we have used the fact that \frac{1}{\sqrt{2π}} e^{−\frac{1}{2}(x−θ)^2} is a probability density function. Thus, using (2.1), for a N(μ, σ^2) normal distribution the corresponding moment generating function is given by

M(θ) = e^{μθ + \frac{1}{2}θ^2σ^2},   θ ∈ R.    (2.6)
Note that the moment generating function is defined for all θ ∈ R.

While Φ(x) cannot be expressed in closed form in terms of elementary functions, some particularly useful bounds for the tail of the distribution, \bar Φ(x) := 1 − Φ(x), are easy to derive. We mention them here for future reference.

Proposition 1. For all x > 0 we have

\left(\frac{1}{x} − \frac{1}{x^3}\right) \frac{e^{−\frac{1}{2}x^2}}{\sqrt{2π}} ≤ 1 − Φ(x) ≤ \frac{1}{x} \frac{e^{−\frac{1}{2}x^2}}{\sqrt{2π}}.    (2.7)

Proof: The tail is given by \bar Φ(x) = \int_x^{\infty} \frac{1}{\sqrt{2π}} e^{−\frac{1}{2}u^2} du. The upper bound follows immediately from the inequality

\int_x^{\infty} e^{−\frac{1}{2}u^2} du ≤ \int_x^{\infty} \frac{u}{x} e^{−\frac{1}{2}u^2} du = \frac{1}{x} \int_x^{\infty} e^{−\frac{1}{2}u^2} d\left(\tfrac{1}{2}u^2\right) = \frac{1}{x} e^{−\frac{1}{2}x^2}

(remember that x > 0). The lower bound can be obtained from the following integration by parts:

0 ≤ \int_x^{\infty} \frac{3}{u^4} e^{−\frac{1}{2}u^2} du = −\frac{1}{u^3} e^{−\frac{1}{2}u^2}\Big|_x^{\infty} − \int_x^{\infty} \frac{1}{u^2} e^{−\frac{1}{2}u^2} du
  = \frac{1}{x^3} e^{−\frac{1}{2}x^2} − \int_x^{\infty} \frac{1}{u^2} e^{−\frac{1}{2}u^2} du
  = \frac{1}{x^3} e^{−\frac{1}{2}x^2} − \frac{1}{x} e^{−\frac{1}{2}x^2} + \int_x^{\infty} e^{−\frac{1}{2}u^2} du.    (2.8)
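The bounds (2.7) are easy to verify numerically. The following sketch (an illustration only; scipy is assumed for the exact normal tail) checks that the tail 1 − Φ(x) sits between the two bounds for a few values of x.

from math import sqrt, pi, exp
from scipy.stats import norm

for x in (1.0, 2.0, 4.0):
    phi = exp(-x**2 / 2) / sqrt(2 * pi)   # standard normal density at x
    lower = (1 / x - 1 / x**3) * phi
    upper = phi / x
    tail = norm.sf(x)                     # 1 - Phi(x)
    print(x, lower <= tail <= upper, lower, tail, upper)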
2.1.2 The exponential distribution

The distribution function is

F(x) = 0 for x < 0,   F(x) = 1 − e^{−λx} for x ≥ 0

(where λ > 0 is called the rate), with corresponding density

f(x) = 0 for x < 0,   f(x) = λ e^{−λx} for x ≥ 0.

The mean of the exponential distribution is 1/λ and the variance 1/λ^2. Its moment generating function is given by

\int_0^{\infty} e^{θx} λ e^{−λx} dx = \frac{λ}{λ − θ},   for θ < λ.
2.1.3 The Gamma distribution

The density function is given by

f(x) = 0 for x ≤ 0,   f(x) = β \frac{(βx)^{α−1}}{Γ(α)} e^{−βx} for x > 0,

where α > 0. β is often called the scale parameter, while α is the shape parameter. The Gamma function, which appears in the above expression, is defined via the integral

Γ(x) = \int_0^{\infty} t^{x−1} e^{−t} dt,   x > 0,    (2.9)

and satisfies the functional equation Γ(x + 1) = xΓ(x). In particular, when x is an integer, say n, Γ(n) = (n − 1)!. (This can be verified by evaluating the integral in (2.9).) We also mention that Γ(1/2) = \sqrt{π}.

The corresponding distribution function is

F(x) = 0 for x ≤ 0,   F(x) = \int_0^x β \frac{(βu)^{α−1}}{Γ(α)} e^{−βu} du for x > 0,

which can be expressed in terms of the incomplete gamma function, defined as Γ(z, α) := \int_0^z t^{α−1} e^{−t} dt.

The moment generating function of the Gamma distribution is

M(θ) = \int_0^{\infty} e^{xθ} β \frac{(βx)^{α−1}}{Γ(α)} e^{−βx} dx = \left(\frac{β}{β − θ}\right)^α.

Note that M(θ) above is defined only in the interval −∞ < θ < β, because when θ ≥ β the defining integral does not converge. It is easy to see that for α = 1 the Gamma distribution reduces to the exponential.

A special case of the Gamma distribution is the so-called Erlang distribution, obtained for integer α = k (we have also renamed β into λ). Its density is given by

f(x) = 0 for x < 0,   f(x) = λ \frac{(λx)^{k−1}}{(k − 1)!} e^{−λx} for x ≥ 0,

with corresponding distribution function

F(x) = 0 for x < 0,   F(x) = 1 − \sum_{i=0}^{k−1} \frac{(λx)^i}{i!} e^{−λx} for x ≥ 0.

Its moment generating function is of course \left(\frac{λ}{λ − θ}\right)^k. One of the reasons for the importance of the Erlang distribution stems from the fact that it describes the sum of k independent exponential random variables with rate λ.
2.1.4 The Pareto distribution

The Pareto density has the form

f(x) = 0 for x ≤ c,   f(x) = \frac{α c^α}{x^{α+1}} for x > c,

with corresponding distribution function

F(x) = 0 for x ≤ c,   F(x) = 1 − \left(\frac{c}{x}\right)^α for x > c,

where α > 0. The Pareto distribution is a typical example of a subexponential distribution. The nth moment of the Pareto distribution is given by the integral \int_c^{\infty} x^n α c^α x^{−α−1} dx, provided that it is finite. Hence the nth moment exists if α > n and in that case it is equal to \frac{α c^n}{α − n}. In particular the mean exists only if α > 1 and in that case it is equal to \frac{α c}{α − 1}.

An alternative form of the Pareto, which is non-zero for all x ≥ 0, is given by

f(x) = 0 for x < 0,   f(x) = \frac{α}{c(1 + x/c)^{α+1}} for x ≥ 0,
F(x) = 0 for x < 0,   F(x) = 1 − \frac{1}{(1 + x/c)^α} for x ≥ 0,

where α > 0.
2.1.5 The Cauchy distribution

The standardized Cauchy density is given by

f(x) = \frac{1}{π} \frac{1}{1 + x^2},   x ∈ R,

with distribution function

F(x) = \frac{1}{2} + \frac{1}{π} \arctan(x),   x ∈ R.

It has "fat" polynomial tails: in fact, using de l'Hôpital's rule we see that

\lim_{x→\infty} x \bar F(x) = \lim_{x→\infty} \frac{\bar F(x)}{x^{−1}} = \lim_{x→\infty} \frac{f(x)}{x^{−2}} = \lim_{x→\infty} \frac{x^2}{π(1 + x^2)} = \frac{1}{π}.

Thus it does not have a mean or a variance, because the integrals that define them do not converge. It is useful in modelling phenomena that can produce large claims.
2.1.6 The Weibull distribution

The distribution function is given by

F(x) = 0 for x ≤ 0,   F(x) = 1 − e^{−x^β} for x > 0,

with corresponding density

f(x) = 0 for x ≤ 0,   f(x) = β x^{β−1} e^{−x^β} for x > 0.

The nth moment of this distribution is given by

\int_0^{\infty} x^n β x^{β−1} e^{−x^β} dx = \int_0^{\infty} y^{n/β} e^{−y} dy = Γ\left(\frac{n}{β} + 1\right).
2.2 Sums of independent random variables in R+

Suppose that F, G are two distributions on R+. Their convolution is defined as the function

F ∗ G(x) = \int_0^x F(x − y) dG(y),   x ≥ 0.    (2.10)

If X, Y are independent random variables with distributions F and G respectively, then F ∗ G is the distribution of their sum X + Y. Indeed,

P(X + Y ≤ x) = \int_0^{\infty} P(X + Y ≤ x | Y = y) dG(y) = \int_0^{\infty} P(X ≤ x − y | Y = y) dG(y)
             = \int_0^{\infty} F(x − y) dG(y) = \int_0^x F(x − y) dG(y).

In the above string of equalities we have used the independence of X and Y to write P(X ≤ x − y | Y = y) = F(x − y), and the fact that F(x − y) = 0 for y > x to restrict the range of integration. In view of this last remark it is clear that F ∗ G = G ∗ F. We will also write F^{∗n} to denote the n-fold convolution F ∗ F ∗ · · · ∗ F (with n factors), with the understanding that F^{∗1} = F and F^{∗0} = I, where I(x) = 1 if x ≥ 0 and I(x) = 0 when x < 0.

When both F and G are absolutely continuous with densities f and g respectively, then H = F ∗ G is again absolutely continuous with density

h(x) = \int_0^x f(x − y) g(y) dy.

We will denote the convolution of the two densities by h = f ∗ g. For instance, if f(x) = λe^{−λx}, g(x) = μe^{−μx}, then

f ∗ g(x) = λμ e^{−λx} \int_0^x e^{−(μ−λ)y} dy = λμ e^{−λx} \frac{1 − e^{−(μ−λ)x}}{μ − λ} = \frac{λμ}{μ − λ}\left(e^{−λx} − e^{−μx}\right).
Note that, if X, Y are independent, then the moment generating function of the sum X + Y is given by M_{X+Y}(θ) = E e^{θ(X+Y)} = E e^{θX} e^{θY} = M_X(θ) M_Y(θ). If X_i, i = 1, 2, . . . , n, are independent, identically distributed random variables with distribution F and moment generating function M_X(θ), then S := X_1 + · · · + X_n has distribution function F^{∗n} and moment generating function M_S(θ) = (M_X(θ))^n.

Convolutions are in general hard to evaluate explicitly. As an exception to this statement we mention the exponential distribution and the uniform distribution. In the case of the exponential distribution, F(x) = 1 − e^{−λx}, x ≥ 0, we have

F^{∗n}(x) = 1 − \sum_{k=0}^{n−1} \frac{(λx)^k}{k!} e^{−λx}.

(This is the well known Erlang distribution.) More generally, if F(x) = 1 − \sum_{k=0}^{m−1} \frac{(λx)^k}{k!} e^{−λx} then F^{∗n}(x) = 1 − \sum_{k=0}^{nm−1} \frac{(λx)^k}{k!} e^{−λx} and, more generally yet, if F is Gamma(α, λ) then F^{∗n} is Gamma(nα, λ).

The uniform distribution is another case in point. Note that the Laplace transform of x^n, x ≥ 0, is given by \int_0^{\infty} x^n e^{−sx} dx = n!/s^{n+1}. Also, the Laplace transform of (x − τ)_+^n, x ≥ 0, τ ≥ 0 (where x_+ denotes the positive part of the real number x, i.e. x_+ = x if x ≥ 0 and x_+ = 0 if x < 0) is given by

\int_0^{\infty} (x − τ)_+^n e^{−sx} dx = e^{−τs} \int_τ^{\infty} (x − τ)^n e^{−s(x−τ)} dx = \frac{n! e^{−τs}}{s^{n+1}}.    (2.11)

Consider now the sum of n independent random variables, uniformly distributed in [0, a]. The Laplace transform of each one of them is given by

\frac{1}{a} \int_0^a e^{−sx} dx = \frac{1 − e^{−sa}}{sa}.

Thus, the Laplace transform of the n-fold convolution is

\frac{(1 − e^{−sa})^n}{(sa)^n} = \frac{1}{a^n} \sum_{k=0}^{n} \binom{n}{k} (−1)^k \frac{e^{−ska}}{s^n}.
Based on (2.11), we have the identity

\frac{e^{−ska}}{s^n a^n} = \int_0^{\infty} \frac{1}{a(n − 1)!} \left(\frac{x}{a} − k\right)_+^{n−1} e^{−sx} dx.

Thus

dF^{∗n}(x) = \sum_{k=0}^{n} \binom{n}{k} (−1)^k \frac{1}{a(n − 1)!} \left(\frac{x}{a} − k\right)_+^{n−1} dx

and, integrating,

F^{∗n}(x) = \sum_{k=0}^{n} (−1)^k \frac{1}{k!(n − k)!} \left(\frac{x}{a} − k\right)_+^{n}.

In particular, for the uniform distribution in [0, 1] the above expression becomes

F^{∗n}(x) = \sum_{k=0}^{n} (−1)^k \frac{1}{k!(n − k)!} (x − k)_+^{n}.

x      U^{∗10}(x)    N(5, 5/6)
0.0    0.00000000    0.00000002
0.5    0.00000000    0.00000041
1.0    0.00000028    0.00000589
1.5    0.00001589    0.00006305
2.0    0.00027943    0.00050756
2.5    0.00246917    0.00308500
3.0    0.01346285    0.01422982
3.5    0.05045252    0.05017411
4.0    0.13890157    0.13666088
4.5    0.29451868    0.29194118
5.0    0.50000000    0.50000000
5.5    0.70548132    0.70805882
6.0    0.86109843    0.86333912
6.5    0.94954748    0.94982589
7.0    0.98653715    0.98577018
7.5    0.99753083    0.99691500
8.0    0.99972057    0.99949244
8.5    0.99998411    0.99993695
9.0    0.99999972    0.99999411
9.5    1.00000000    0.99999959
10.0   1.00000000    0.99999998
Table 2.1: Comparison of the distribution U^{∗10}(x) (the sum of 10 independent uniform random variables in [0,1]) with the approximating normal distribution, which has mean 5 and variance 5/6. The table shows that the approximation based on the Central Limit Theorem is quite good in this case.
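The entries of Table 2.1 can be reproduced from the closed-form expression for F^{∗n} together with the normal c.d.f. The sketch below is an illustration (scipy is assumed for Φ); it evaluates both columns at a few points for n = 10.

from math import comb, factorial, sqrt
from scipy.stats import norm

def unif_conv_cdf(x, n):
    """F^{*n}(x) for the sum of n independent Uniform[0,1] random variables."""
    s = sum((-1)**k * comb(n, k) * max(x - k, 0.0)**n for k in range(n + 1))
    return s / factorial(n)

n = 10
for x in (3.0, 5.0, 7.0):
    print(x, unif_conv_cdf(x, n), norm.cdf(x, loc=n / 2, scale=sqrt(n / 12)))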
[Figure 2.1: Convolution of the uniform distribution, 10 times with itself, and the approximating normal. The agreement is too close for the graph to show any difference.]

2.3 Random Sums

Suppose that X_i, i = 1, 2, . . . , is a sequence of non-negative random variables with distribution function F and moment generating function M_X(θ) := \int_0^{\infty} e^{θx} dF(x). Suppose also that N is a discrete random variable, independent of the X_i's, i = 1, 2, . . .. Let S_N = \sum_{i=1}^{N} X_i. The distribution and the moments of S_N can be obtained by conditioning on N. For instance

P(S_N ≤ x) = \sum_{n=0}^{\infty} P(N = n) P(X_1 + · · · + X_n ≤ x) = \sum_{n=0}^{\infty} P(N = n) F^{∗n}(x).    (2.12)
The mean and the variance of S_N can be computed in the same fashion:

E S_N = \sum_{n=0}^{\infty} P(N = n) E[X_1 + · · · + X_n] = \sum_{n=0}^{\infty} P(N = n) n E X_1 = EN\, EX_1.    (2.13)

Also,

E\left(\sum_{i=1}^{n} X_i\right)^2 = E\left[\sum_{i=1}^{n} X_i^2 + \sum_{i ≠ j} X_i X_j\right] = n E X_1^2 + n(n − 1)(E X_1)^2,

and thus

E S_N^2 = \sum_{n=0}^{\infty} P(N = n) E[(X_1 + · · · + X_n)^2] = \sum_{n=0}^{\infty} P(N = n)\left(n E X_1^2 + n(n − 1)(E X_1)^2\right)
        = E(X_1^2) EN + (E X_1)^2 \sum_{n=0}^{\infty} n(n − 1) P(N = n)
        = E(X_1^2) EN + (E X_1)^2 EN^2 − (E X_1)^2 EN
        = Var(X_1) EN + (E X_1)^2 EN^2.    (2.14)

From (2.13) and (2.14) we obtain

Var(S_N) = Var(X_1) EN + Var(N)(E X_1)^2.    (2.15)
Finally we can also compute the moment generating function of S_N by conditioning:

M_{S_N}(θ) = E e^{θ S_N} = \sum_{n=0}^{\infty} P(N = n) E e^{θ \sum_{i=1}^{n} X_i} = \sum_{n=0}^{\infty} P(N = n) \left(E e^{θ X_1}\right)^n = \sum_{n=0}^{\infty} P(N = n) (M_X(θ))^n.

If we denote by φ_N(z) = \sum_{n=0}^{\infty} P(N = n) z^n the p.g.f. of N, we see from the above that

M_{S_N}(θ) = φ_N(M_X(θ)).    (2.16)
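Formulas (2.13) and (2.15) are easy to check by simulation. The sketch below (illustrative parameters, numpy assumed) draws a compound sum with a Poisson number of normal summands and compares the empirical mean and variance with EN·EX_1 and Var(X_1)EN + Var(N)(EX_1)^2.

import numpy as np

rng = np.random.default_rng(1)
lam, mu, sigma = 3.0, 2.0, 0.5        # E[N] = Var(N) = lam; X_i ~ Normal(mu, sigma^2)
trials = 100_000

N = rng.poisson(lam, size=trials)
S = np.array([rng.normal(mu, sigma, size=n).sum() for n in N])

print(S.mean(), lam * mu)                          # (2.13)
print(S.var(), lam * sigma**2 + lam * mu**2)       # (2.15) with EN = Var(N) = lam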
Chapter 3  Simple Models for Premium Calculation

3.1 The rationale for insurance from the point of view of the buyer

The simplest formulation of the problem of insurance from the point of view of the insurance buyer assumes that the buyer, who has a utility function u such that u' > 0 and u'' < 0, has initial wealth W and faces the possibility of a loss which will be represented by a random variable X with distribution function F(x) := P(X ≤ x). (We assume that F(x) = 0 for x < 0.) He is willing to buy insurance which will cover his losses completely by paying a premium p provided that

u(W − p) ≥ \int_0^{\infty} u(W − x) dF(x).    (3.1)

The above equation states that the utility of the buyer if he pays the premium, and thus reduces his fortune to W − p, is greater than his expected utility if he doesn't. In this latter case, his utility on the event that X = x is u(W − x), and the right hand side of the above equation represents his expected utility. It is clear from (3.1) that the insurance buyer would be willing to pay any premium p less than p∗ in order to buy full insurance for the risk X, where p∗ satisfies the equation

u(W − p∗) = \int_0^{\infty} u(W − x) dF(x).

It is interesting to note that, in view of our assumptions regarding the utility function u, p∗ > EX, and thus there exist values of p > EX that would be acceptable to the insurance buyer. This can be seen by means of Jensen's inequality. If u is a concave function (as is the case here since u'' < 0) then for every y,

u(z) < u(y) + (z − y) u'(y).
Setting y = W − p and z = W − x we obtain u(W − x) < u(W − p) + (p − x) u'(W − p). Integrating this last inequality against dF(x) we obtain

\int_0^{\infty} u(W − x) dF(x) < u(W − p) \int_0^{\infty} dF(x) + u'(W − p) \int_0^{\infty} (p − x) dF(x)

or

\int_0^{\infty} u(W − x) dF(x) < u(W − p) + (p − EX) u'(W − p).

Since u'(W − p) > 0 by assumption, it is clear that there exist values of p greater than EX such that

\int_0^{\infty} u(W − x) dF(x) < u(W − p).

Thus the insurance buyer would be willing to buy insurance and pay for it a premium higher than the expected size of his loss, EX.
3.2 Safety in numbers

While the decision to buy insurance is based on the buyer's risk aversion, from the point of view of the insurance company a statistical argument based on the law of large numbers applies. Before we explain the situation further we remind the reader of two fundamental results of probability theory.

Theorem 1 (Strong Law of Large Numbers). If X_i, i = 1, 2, . . ., are independent, identically distributed random variables with finite expectation m = EX_1, then, as n → ∞,

\frac{X_1 + X_2 + · · · + X_n}{n} → m   with probability 1.

(The statement above means that P(\lim_{n→\infty} \frac{X_1 + · · · + X_n}{n} = m) = 1.) A less strict law of large numbers is the following.

Theorem 2 (Weak Law of Large Numbers). If X_i, i = 1, 2, . . ., are independent, identically distributed random variables with finite expectation m = EX_1, then for any ε > 0,

\lim_{n→\infty} P\left(\left|\frac{X_1 + X_2 + · · · + X_n}{n} − m\right| ≥ ε\right) = 0.

The second fundamental result is the Central Limit Theorem, which may be formulated as follows.

Theorem 3 (Central Limit Theorem). Let {X_i} be a sequence of independent, identically distributed random variables with common distribution F(x) := P(X ≤ x) such that their common mean m := \int x dF(x) and variance σ^2 := \int (x − m)^2 dF(x) are both finite. Then, as n → ∞,

P\left(\frac{X_1 + X_2 + · · · + X_n − nm}{σ\sqrt{n}} ≤ x\right) → Φ(x)   for all x ∈ R,    (3.2)

where

Φ(x) := \int_{−\infty}^{x} \frac{1}{\sqrt{2π}} e^{−\frac{1}{2}y^2} dy

is the standard normal distribution. Note that the Central Limit Theorem can be seen to describe the statistical fluctuations around the Law of Large Numbers, since it can also be stated as follows:

P(nm − xσ\sqrt{n} ≤ X_1 + · · · + X_n ≤ nm + xσ\sqrt{n}) ≈ Φ(x) − Φ(−x).
3.3 The point of view of the insurance company

Suppose then that the insurance company has a portfolio consisting of n policies. Over one period each one of these policies generates a claim, the ith claim being a random variable X_i, i = 1, 2, . . . , n. These are assumed to be independent, identically distributed random variables with common distribution F and mean m. The premium the insurance company receives for each policy is p. If at the beginning of the period the free reserves of the insurance company are u, then at the end of the period they are u + np − \sum_{i=1}^{n} X_i. If the premium p is greater than the expected size of the claim (a situation which may be acceptable to the insurance buyer, as we saw in the previous section), then by virtue of the law of large numbers the insurer is likely to profit and extremely unlikely to be unable to cover the total losses. To see this, note that the event that the income from the premiums np plus the initial free reserves u does not suffice to cover the losses happens with probability P\left(\frac{1}{n}\sum_{i=1}^{n} X_i > p + u/n\right). If p > m then by virtue of the weak law of large numbers we have \lim_{n→\infty} P\left(\frac{1}{n}\sum_{i=1}^{n} X_i > p + u/n\right) = 0. The above limiting argument makes it very implausible that for large but finite n the losses will exceed the premium income and safety reserves. In order to quantify this, a study of the fluctuations of the sums of random variables is necessary. This study starts with the central limit theorem.
3.3.1 Net premium computation using the Central Limit Theorem

Let us start by setting u/n =: v, the total free reserves per contract, and c := p + v. Then, in order to set the probability that the losses will exceed the available reserves equal to α (a small number, typically 0.1% – 1%), we should choose c such that

P(X_1 + X_2 + · · · + X_n ≥ nc) = α.

In order to take advantage of the Central Limit Theorem (CLT), let us rewrite the above equation as

P\left(\frac{X_1 + X_2 + · · · + X_n − nm}{σ\sqrt{n}} ≥ \frac{c − m}{σ}\sqrt{n}\right) = α.    (3.3)

Assuming n to be large enough to justify our appeal to the CLT, we then have

\frac{c − m}{σ}\sqrt{n} = z_{1−α},   where Φ(z_{1−α}) = 1 − α.

Thus

c = m + z_{1−α} σ \frac{1}{\sqrt{n}}    (3.4)

and hence

p = −\frac{u}{n} + m + z_{1−α} σ \frac{1}{\sqrt{n}}.    (3.5)

The above argument often provides the basis for rate setting, but it should be used with care. There are three reasons that could lead to erroneous estimation of the probability of ruin:

a) The risks in the portfolio are inhomogeneous to such a degree that the equidistribution approach is not justified.

b) The claim distribution is heavy tailed (σ = ∞) and therefore the CLT cannot be applied.

c) We are interested in "rare" events, out in the tail of the distribution of X_1 + X_2 + · · · + X_n, where the CLT does not hold.

We will have the opportunity to look at the issues raised in a) and b). Regarding c), let us examine more closely the approximation involved in (3.3). In order to be justified in applying the CLT we should be prepared to let n → ∞. However, the right hand side of the inequality in (3.3) also goes to infinity with n, and therefore the only thing we learn from the CLT is that P\left(\frac{X_1 + X_2 + · · · + X_n − nm}{σ\sqrt{n}} > \frac{c − m}{σ}\sqrt{n}\right) → 0 as n → ∞.

Assuming that the CLT approximation can be used,

P\left(\frac{X_1 + X_2 + · · · + X_n − nm}{σ\sqrt{n}} > \frac{c − m}{σ}\sqrt{n}\right) ≈ \bar Φ\left(\frac{c − m}{σ}\sqrt{n}\right),

where \bar Φ(x) = 1 − Φ(x). Then it is easy to see that

\lim_{n→\infty} \frac{1}{n} \log P\left(\frac{X_1 + X_2 + · · · + X_n − nm}{σ\sqrt{n}} > \frac{c − m}{σ}\sqrt{n}\right) = \lim_{n→\infty} \frac{1}{n} \log \bar Φ\left(\frac{c − m}{σ}\sqrt{n}\right).

This last limit will be computed in the next section using the inequalities of Proposition 1, and we will see that

\lim_{n→\infty} \frac{1}{n} \log \bar Φ\left(\frac{c − m}{σ}\sqrt{n}\right) = −\frac{1}{2}\left(\frac{c − m}{σ}\right)^2.

We conclude from the above analysis that, if the probability of ruin P(X_1 + X_2 + · · · + X_n ≥ nc) is set equal to α (a small number), then for large n

\frac{1}{n} \log α ≈ −\frac{1}{2}\left(\frac{c − m}{σ}\right)^2

or equivalently

c = m + σ \sqrt{\frac{2 \log(1/α)}{n}}.    (3.6)

This last equation is to be compared with (3.4).
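As a small numerical illustration of the premium rule (3.5), the sketch below (arbitrary, illustrative numbers; scipy assumed for the normal quantile) computes the premium for a given portfolio size, claim mean and standard deviation, initial reserves and target ruin probability.

from math import sqrt
from scipy.stats import norm

m, sigma = 100.0, 60.0     # mean and standard deviation of a single claim
n, u = 400, 2000.0         # number of policies and initial free reserves
alpha = 0.01               # target probability that claims exceed the reserves

z = norm.ppf(1 - alpha)                      # z_{1-alpha}
p = m + z * sigma / sqrt(n) - u / n          # equation (3.5)
print(p)   # mean claim plus a loading of z*sigma/sqrt(n), minus u/n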
3.4 Logarithmic Asymptotics

Suppose that X_i, i = 1, 2, 3, . . . , are i.i.d. with distribution function F, corresponding mean m = \int_R x F(dx), variance σ^2, and moment generating function M(θ) := \int_R e^{θx} F(dx). The weak law of large numbers guarantees that

\lim_{n→\infty} P(S_n ≥ nx) = 0   for x > m    (3.7)

and similarly that

\lim_{n→\infty} P(S_n ≤ nx) = 0   for x < m.    (3.8)

Correspondingly, if the premium charged per policy, x, is higher than the expected claim size, m, then the probability of ruin goes to zero, whereas if it is less than m then ruin is certain. One important question however not answered by the weak law of large numbers is how fast the above probabilities go to zero. We will see that they go to zero exponentially fast, i.e. that

P(S_n ≥ nx) ≍ e^{−n I(x)}   for x > m.    (3.9)

In the above formula note that the exponential rate of decay I(x) is a function of x. The meaning of (3.9) is made precise if we state it as

\lim_{n→\infty} \frac{1}{n} \log P(S_n ≥ nx) = −I(x)   for x > m.    (3.10)

Where does the exponential behavior come from? Write P(S_n ≥ nx) as

P(S_n − nm ≥ n(x − m)) = P\left(\frac{S_n − nm}{σ\sqrt{n}} ≥ \sqrt{n}\left(\frac{x − m}{σ}\right)\right)    (3.11)

and appeal to the central limit theorem: for n large, \frac{S_n − nm}{σ\sqrt{n}} is approximately normally distributed with mean 0 and standard deviation 1, and hence

P(S_n ≥ nx) = P\left(\frac{S_n − nm}{σ\sqrt{n}} ≥ \sqrt{n}\left(\frac{x − m}{σ}\right)\right) ≈ \frac{1}{\sqrt{2π}} \int_{\sqrt{n}\frac{x−m}{σ}}^{\infty} e^{−\frac{1}{2}u^2} du ≈ \frac{σ}{(x − m)\sqrt{2πn}} e^{−n\frac{(x−m)^2}{2σ^2}}.
Are the above asymptotics justified? In one case at least, yes. Suppose that the r.v.'s X_i are i.i.d. normal with mean m and variance σ^2 (N(m, σ^2)). Then S_n/n has distribution N(m, σ^2/n). Hence in this case (3.11) becomes an exact relationship and we have

P(S_n ≥ nx) = \int_{\sqrt{n}\frac{x−m}{σ}}^{\infty} \frac{1}{\sqrt{2π}} e^{−\frac{1}{2}u^2} du.    (3.12)

Taking into account the bounds in Proposition 1 we have

\log\left(\left(\frac{1}{n^{1/2}}\frac{σ}{x−m} − \frac{1}{n^{3/2}}\frac{σ^3}{(x−m)^3}\right)\frac{1}{\sqrt{2π}} e^{−\frac{1}{2}n\left(\frac{x−m}{σ}\right)^2}\right) ≤ \log P(S_n ≥ nx) ≤ \log\left(\frac{1}{n^{1/2}}\frac{σ}{x−m}\frac{1}{\sqrt{2π}} e^{−\frac{1}{2}n\left(\frac{x−m}{σ}\right)^2}\right)

or

−\frac{1}{2}\log n + \log\left(\frac{σ}{x−m} − \frac{1}{n}\frac{σ^3}{(x−m)^3}\right) − \frac{1}{2}\log 2π − \frac{1}{2}n\left(\frac{x−m}{σ}\right)^2 ≤ \log P(S_n ≥ nx) ≤ −\frac{1}{2}\log n + \log\frac{σ}{x−m} − \frac{1}{2}\log 2π − \frac{1}{2}n\left(\frac{x−m}{σ}\right)^2.

Dividing the above inequality by n and letting n → ∞ (taking into account that \frac{1}{n}\log n → 0) we obtain

\lim_{n→\infty} \frac{1}{n} \log P(S_n ≥ nx) = −\frac{1}{2}\left(\frac{x − m}{σ}\right)^2.    (3.13)

Hence, setting I(x) = \frac{1}{2}\left(\frac{x−m}{σ}\right)^2, we obtain (3.10) for normal random variables. Can we generalize this to non-normal random variables? Can we generalize it to sequences that are not independent and identically distributed? As we will see, the answer is in the affirmative on both counts. We start with a relatively simple bound known as the Chernoff bound.
3.5 Chernoff bounds

In the same framework as before, X_i, i = 1, 2, . . . , are assumed to be i.i.d. r.v.'s with moment generating function M(θ). We start with the obvious inequality

1(S_n ≥ nx) e^{nxθ} ≤ e^{θ S_n},

which holds for all θ ≥ 0 since the exponential is non-negative. Taking expectations in the above inequality we obtain

P(S_n ≥ nx) ≤ e^{−nxθ} E[e^{θ(X_1 + X_2 + · · · + X_n)}] = e^{−nxθ} M(θ)^n,   θ ≥ 0.

The above inequality provides an upper bound for P(S_n ≥ nx) for each θ ∈ R+. Since the left hand side in the above inequality does not depend on θ, we can obtain the best possible bound by setting

P(S_n ≥ nx) ≤ \inf_{θ≥0} e^{−n\{xθ − \log M(θ)\}} = e^{−n \sup_{θ≥0}\{xθ − \log M(θ)\}}.

Define now the rate function

I(x) := \sup_{θ∈R} \{xθ − \log M(θ)\}.    (3.14)

With this definition the Chernoff bound becomes

P(S_n ≥ nx) ≤ e^{−n I(x)}.    (3.15)
As we will see, in many cases this upper bound can be turned into an asymptotic equality. This is the content of Cramér's theorem.

Theorem 4. The cumulant \log M(θ) is a convex function of θ.

Proof: To establish this we will show that the second derivative \frac{d^2}{dθ^2}\log M(θ) is non-negative. Indeed,

\frac{d^2}{dθ^2} \log M(θ) = \frac{M''(θ)}{M(θ)} − \left(\frac{M'(θ)}{M(θ)}\right)^2.

However note that

M''(θ) = \frac{d^2}{dθ^2} E[e^{θX}] = E[X^2 e^{θX}]

and hence

\frac{M''(θ)}{M(θ)} = E\left[X^2 \frac{e^{θX}}{M(θ)}\right] = E_{\tilde P}[X^2].

Similarly

\frac{M'(θ)}{M(θ)} = E\left[X \frac{e^{θX}}{M(θ)}\right] = E_{\tilde P}[X],

and thus

\frac{d^2}{dθ^2} \log M(θ) = E_{\tilde P}[X^2] − \left(E_{\tilde P}[X]\right)^2 = E_{\tilde P}\left(X − E_{\tilde P}[X]\right)^2 ≥ 0.
3.6 Examples of rate functions

Bernoulli Random Variables. Suppose that P(X_i = 1) = 1 − P(X_i = 0) = p (i.e. the random variables take only the values 0 and 1, with probabilities 1 − p and p respectively). In this case \log M(θ) = \log(p e^θ + 1 − p). To maximize xθ − \log M(θ) we set its derivative equal to zero:

x = \frac{p e^θ}{1 − p + p e^θ},   or   e^θ = \frac{x}{1 − x}\,\frac{1 − p}{p},

and, taking logarithms,

θ = \log\frac{x}{1 − x} + \log\frac{1 − p}{p}.

Therefore

I(x) = x \log\frac{x}{p} + (1 − x) \log\frac{1 − x}{1 − p},   0 < x < 1

(and I(x) = +∞ for x outside [0, 1]). In the following graph the rate function of the geometric distribution (with p = 1/2) is shown.
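The Bernoulli rate function just derived can be compared with a Monte Carlo estimate of P(S_n ≥ nx), confirming that the Chernoff bound e^{−nI(x)} dominates the probability. The sketch below uses illustrative values of p, x and n and assumes numpy.

import numpy as np

p, x, n = 0.5, 0.7, 50
I = x * np.log(x / p) + (1 - x) * np.log((1 - x) / (1 - p))   # Bernoulli rate function

rng = np.random.default_rng(2)
sums = rng.binomial(n, p, size=500_000)      # S_n for Bernoulli(p) summands
prob = np.mean(sums >= n * x)
print(prob, np.exp(-n * I))                  # the Chernoff bound exceeds the probability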
3.7 Properties of the rate function

Let D = {x : I(x) < ∞} be the domain of definition of I. It is easy to see that D is either the whole of R or an interval that may extend infinitely to the right or the left. If the upper or lower end of the interval is finite, it may or may not belong to D, depending on the case. Thus in any case D is a convex set in R.

1. I(x) is a convex function (on its domain of definition). It suffices to show that, for each λ ∈ [0, 1] and x, y ∈ D, I(xλ + y(1 − λ)) ≤ λI(x) + (1 − λ)I(y). Indeed,

I(xλ + y(1 − λ)) = \sup_θ \{θ(xλ + y(1 − λ)) − \log M(θ)\}
                 = \sup_θ \{λ(θx − \log M(θ)) + (1 − λ)(θy − \log M(θ))\}
                 ≤ λ \sup_θ \{θx − \log M(θ)\} + (1 − λ) \sup_θ \{θy − \log M(θ)\}
                 = λI(x) + (1 − λ)I(y).
2. I(x) ≥ 0 for all x ∈ D, and I(m) = 0. (In particular this implies that I is minimized at x = m.) We begin with the remark that for θ = 0, θx − \log M(θ) = 0. Thus I(x) ≥ 0. Next, use Jensen's inequality: M(θ) = E e^{θX} ≥ e^{θEX} for all θ for which M(θ) < ∞. Thus \log M(θ) ≥ θm, or θm − \log M(θ) ≤ 0. Since also I(m) ≥ 0, we conclude that I(m) = 0.

3. For each x ∈ D there exists θ∗ such that

\frac{M'(θ∗)}{M(θ∗)} = x.    (3.16)

We will not present a complete proof of this. A justification might be given along the following lines: since for fixed x the function f(θ) = θx − \log M(θ) is concave in θ and smooth (M(θ) has derivatives of all orders), it suffices to find θ∗ so that f'(θ∗) = 0, or equivalently x − M'(θ∗)/M(θ∗) = 0.
3.8 The twisted distribution

Let F(y) be a distribution function on R with moment generating function M(θ). The distribution function \tilde F(y) defined via

\tilde F(dy) = \frac{e^{θy} F(dy)}{M(θ)}

is called the twisted distribution that corresponds to F. It is easy to see that

\tilde F(y) = \int_{−\infty}^{y} \frac{e^{θu}}{M(θ)} F(du)

is a non-decreasing function of y and, as y → ∞, \tilde F(y) → 1. The mean of the twisted distribution is given by

\int_{−\infty}^{\infty} y \frac{e^{θy}}{M(θ)} F(dy) = \frac{1}{M(θ)} \int_{−\infty}^{\infty} y e^{θy} F(dy) = \frac{1}{M(θ)} \frac{d}{dθ} \int_{−\infty}^{\infty} e^{θy} F(dy) = \frac{M'(θ)}{M(θ)}.

In particular, when θ = θ∗, the solution of (3.16),

\int_{−\infty}^{\infty} y \frac{e^{θ∗ y}}{M(θ∗)} F(dy) = \frac{M'(θ∗)}{M(θ∗)} = x.    (3.17)

Regarding our notation, it will be convenient to think of two different probability measures: the probability measure P, under which the random variables X_i, i = 1, 2, . . . , have distribution F, and the twisted measure \tilde P, under which the r.v.'s X_i have distribution \tilde F. Expectations with respect to the probability measure \tilde P will be denoted by \tilde E.
3.9 Cramér's Theorem

Theorem 5. Suppose that {X_n} is an i.i.d. sequence of real random variables with moment generating function M(θ) which exists in an open neighborhood of zero. Then, if m = EX_1 and S_n := X_1 + · · · + X_n,

\lim_{n→\infty} \frac{1}{n} \log P(S_n ≥ nx) = −I(x),   x ≥ m.

Proof: Note that, in view of the discussion regarding the Chernoff bound, \frac{1}{n}\log P(S_n ≥ nx) ≤ −I(x), whence

\limsup_{n→\infty} \frac{1}{n} \log P(S_n ≥ nx) ≤ −I(x),   x ≥ m.

Thus, in order to establish the theorem it suffices to show the corresponding inequality for the lim inf, i.e.

\liminf_{n→\infty} \frac{1}{n} \log P(S_n ≥ nx) ≥ −I(x)   for x > m.    (3.18)

Start with the inequality

P(S_n ≥ nx) ≥ P(n(x + ε) ≥ S_n ≥ nx) = E[1(n(x + ε) ≥ S_n ≥ nx)]
            ≥ e^{−n(x+ε)θ∗} M(θ∗)^n E\left[1(n(x + ε) ≥ S_n ≥ nx) \frac{e^{θ∗ S_n}}{M(θ∗)^n}\right]
            = e^{−n(x+ε)θ∗} M(θ∗)^n \tilde P(n(x + ε) ≥ S_n ≥ nx).    (3.19)

The twisted distribution can be used to establish (3.18) as follows. Set θ = θ∗ so that, under \tilde P, the mean of X_i is x. Also,

\tilde P(n(x + ε) ≥ S_n ≥ nx) = \tilde P\left(\sqrt{n}\,ε ≥ \frac{S_n − nx}{\sqrt{n}} ≥ 0\right),    (3.20)

and since \tilde E X = x, we can appeal to the Central Limit Theorem to conclude that

\lim_{n→\infty} \tilde P\left(\sqrt{n}\,ε ≥ \frac{S_n − nx}{\sqrt{n}} ≥ 0\right) = \frac{1}{2},

or equivalently

\lim_{n→\infty} \tilde P(n(x + ε) ≥ S_n ≥ nx) = \frac{1}{2}.    (3.21)

Taking logarithms in (3.19) we have

\log P(S_n ≥ nx) ≥ −n(x + ε)θ∗ + n \log M(θ∗) + \log \tilde P(n(x + ε) ≥ S_n ≥ nx),

from which we obtain

\liminf_n \frac{1}{n} \log P(S_n ≥ nx) ≥ −(x + ε)θ∗ + \log M(θ∗) + \liminf_n \frac{1}{n} \log \tilde P(n(x + ε) ≥ S_n ≥ nx).    (3.22)

In view of (3.21) and the fact that ε was arbitrary, we obtain

\liminf_n \frac{1}{n} \log P(S_n ≥ nx) ≥ −(xθ∗ − \log M(θ∗)) = −I(x),

the last equality following from the fact that θ∗ is the value that maximizes the quantity xθ − \log M(θ).
Chapter 4  Large claim risk theory

4.1 Subexponential Distributions

Definition 1 (Subexponential Distribution Functions). A distribution function F with support (0, ∞) is subexponential if, for all n ≥ 2,

\lim_{x↑\infty} \frac{\overline{F^{∗n}}(x)}{\bar F(x)} = n.    (4.1)

The class of subexponential distribution functions will be denoted by S. The class of subexponential distributions includes all distributions with polynomial (i.e. "fat") tails.

Let X_i, i = 1, 2, . . . , n, be i.i.d. random variables with distribution F. Denote by S_n := X_1 + · · · + X_n their sum and by M_n := max(X_1, X_2, . . . , X_n). Then

P(S_n > x) = \overline{F^{∗n}}(x),
P(M_n > x) = \overline{F^n}(x) = 1 − F^n(x) = (1 − F(x))(1 + F(x) + · · · + F^{n−1}(x)) = \bar F(x) \sum_{k=0}^{n−1} F^k(x).

Hence,

\lim_{x→\infty} \frac{P(M_n > x)}{\bar F(x)} = \lim_{x→\infty} \sum_{k=0}^{n−1} F^k(x) = n.    (4.2)

From (4.1) we see that P(S_n > x) ∼ n\bar F(x), while from (4.2) it follows that P(M_n > x) ∼ n\bar F(x). Thus subexponentiality implies that

P(S_n > x) ∼ P(M_n > x),   x → ∞,
which means that the tail of the sum and the tail of the maximum are the same, i.e. that very large values of the sum are mainly due to a single large contribution. This behavior does not occur with distributions that have exponential tails.

Example: Let F(x) = 1 − e^{−λx}. Then

P(M_n > x) = \bar F(x) \sum_{k=0}^{n−1} F^k(x) ∼ n e^{−λx},   as x → ∞.

On the other hand,

P(S_n > x) = \sum_{k=0}^{n−1} \frac{(λx)^k}{k!} e^{−λx}.

Hence in that case

\lim_{x→\infty} \frac{P(M_n > x)}{P(S_n > x)} = 0.
Lemma 1. If F ∈ S then, given ε > 0, there exists K < ∞ such that, for all n ≥ 2,

\frac{\overline{F^{∗n}}(x)}{\bar F(x)} ≤ K(1 + ε)^n,   x ≥ 0.    (4.3)

Proof: We start with the following simple identity:

\frac{\overline{F^{∗(n+1)}}(x)}{\bar F(x)} = 1 + \frac{F(x) − F^{∗(n+1)}(x)}{\bar F(x)} = 1 + \int_0^x \frac{\overline{F^{∗n}}(x − y)}{\bar F(x)} dF(y).    (4.4)

Let

α_n = \sup_{x≥0} \frac{\overline{F^{∗n}}(x)}{\bar F(x)}.    (4.5)

Then

α_{n+1} = \sup_{x≥0} \frac{\overline{F^{∗(n+1)}}(x)}{\bar F(x)} = 1 + \sup_{x≥0} \int_0^x \frac{\overline{F^{∗n}}(x − y)}{\bar F(x)} dF(y)
        ≤ 1 + \sup_{0≤x≤T} \int_0^x \frac{\overline{F^{∗n}}(x − y)}{\bar F(x)} dF(y) + \sup_{x≥T} \int_0^x \frac{\overline{F^{∗n}}(x − y)}{\bar F(x − y)} \frac{\bar F(x − y)}{\bar F(x)} dF(y)
        ≤ 1 + A_T + α_n \sup_{x≥T} \frac{F(x) − F^{∗2}(x)}{\bar F(x)},

where A_T = \frac{1}{\bar F(T)} < ∞. Also,

\frac{F(x) − F^{∗2}(x)}{\bar F(x)} = \frac{−\bar F(x) + \overline{F^{∗2}}(x)}{\bar F(x)} = \frac{\overline{F^{∗2}}(x)}{\bar F(x)} − 1 → 2 − 1 = 1   as x → ∞,

since F ∈ S. Hence we can choose T large enough so that

\sup_{x≥T} \frac{F(x) − F^{∗2}(x)}{\bar F(x)} ≤ 1 + ε.

Thus α_{n+1} ≤ 1 + A_T + α_n(1 + ε), and hence α_n ≤ (1 + A_T) ε^{−1} (1 + ε)^n.
4.2 An asymptotic expression for the ruin probability in the subexponential case

Consider now the random sum Y = \sum_{i=1}^{N} X_i, where the X_i are i.i.d. subexponential random variables with distribution function F and N is an independent random variable with values in N and distribution P(N = n) = p_n, n = 0, 1, 2, . . .. We will assume that EN = m < ∞. Then the following proposition holds.

Proposition 2. If Ψ(u) = P(Y > u) then

Ψ(u) ∼ m \bar F(u).    (4.6)

Proof: Note that

\frac{Ψ(u)}{\bar F(u)} = \sum_{k=0}^{\infty} p_k \frac{\overline{F^{∗k}}(u)}{\bar F(u)}.    (4.7)

Letting u go to infinity, and using Lemma 1 to justify passing the limit inside the infinite sum (dominated convergence), we obtain

\lim_{u→\infty} \frac{Ψ(u)}{\bar F(u)} = \sum_{k=0}^{\infty} p_k \lim_{u→\infty} \frac{\overline{F^{∗k}}(u)}{\bar F(u)} = \sum_{k=0}^{\infty} p_k k = m.
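Proposition 2 can be illustrated by Monte Carlo simulation. The sketch below (illustrative Pareto and Poisson parameters, numpy assumed) estimates Ψ(u) for a compound sum with Pareto claims and compares it with EN·\bar F(u); for large u the two quantities are of the same order, as (4.6) predicts.

import numpy as np

rng = np.random.default_rng(3)
alpha, c, lam = 1.5, 1.0, 5.0      # Pareto tail index and scale, Poisson mean
trials = 200_000

N = rng.poisson(lam, size=trials)
# Pareto claims via inverse transform: X = c * U**(-1/alpha)
Y = np.array([(c * (1 - rng.random(n)) ** (-1 / alpha)).sum() for n in N])

for u in (50.0, 100.0, 200.0):
    psi_hat = np.mean(Y > u)           # Monte Carlo estimate of Psi(u)
    approx = lam * (c / u) ** alpha    # EN * Fbar(u)
    print(u, psi_hat, approx)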
Chapter 5  Exact and approximate computation of the loss distribution in a Portfolio

5.1 The compound Poisson distribution for integer-valued claims

Consider S = X_1 + X_2 + · · · + X_N, where N is a Poisson random variable with mean α = α_1 + α_2 + · · · + α_m and the X_i are independent, identically distributed, integer-valued random variables with distribution given by P(X_i = j) = p_j for j = 1, 2, . . . , m (note that the distributions are assumed to have finite support). Then

P(S = n) = \sum_{k=0}^{\infty} P^{∗k}(n) e^{−α} \frac{α^k}{k!},    (5.1)

where P^{∗0}(n) = 1(n = 0), P^{∗1}(n) = p_n, and

P^{∗k}(n) = \sum_{l=0}^{m} P^{∗(k−1)}(n − l) p_l,   k = 2, 3, . . . ,   n = 0, 1, . . . , km.

The computation of the above convolutions is of course far from simple in general. Fortunately, in the discrete case there exists a rapid recursive way to do this. Let f(n) := P(S = n). Then

f(0) = P(S = 0) = P(N = 0) = e^{−α},
f(n) = \frac{1}{n} \sum_{j=1}^{m} jα p_j f(n − j),   n = 1, 2, 3, . . . .    (5.2)
To derive this recursive relationship note that

g(z) = \sum_{n=0}^{\infty} f(n) z^n    (5.3)

and

\log g(z) = \sum_{j=1}^{m} α p_j (z^j − 1).    (5.4)

Since \frac{d}{dz} g(z) = g(z) \frac{d}{dz} \log g(z),

\sum_{n=1}^{\infty} n f(n) z^{n−1} = \left(\sum_{i=0}^{\infty} f(i) z^i\right)\left(\sum_{j=1}^{m} jα p_j z^{j−1}\right).    (5.5)

Comparing coefficients we have n f(n) = \sum_{j=1}^{m} f(n − j) jα p_j.
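The recursion (5.2) translates directly into a few lines of code. The following sketch (an illustration with an arbitrary claim-size distribution on {1, 2, 3}, not taken from the text; numpy assumed) computes the compound Poisson probabilities up to a chosen truncation point.

import numpy as np

def compound_poisson_pmf(alpha, p, nmax):
    """Recursion (5.2): pmf of S = X_1 + ... + X_N with N ~ Poisson(alpha).
    p[j] = P(X = j) for j = 1, ..., m (p[0] is taken to be 0)."""
    m = len(p) - 1
    f = np.zeros(nmax + 1)
    f[0] = np.exp(-alpha)
    for n in range(1, nmax + 1):
        f[n] = sum(j * alpha * p[j] * f[n - j]
                   for j in range(1, min(m, n) + 1)) / n
    return f

p = [0.0, 0.5, 0.3, 0.2]                 # illustrative claim-size distribution
f = compound_poisson_pmf(alpha=2.0, p=p, nmax=40)
print(f.sum())                           # close to 1 (mass beyond nmax is tiny)
print(f[:5])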
5.2 The normal approximation for random sums

Here we consider the random sum S := \sum_{i=1}^{N} X_i, where N is independent of the X_i, which are assumed to be i.i.d. with finite second moment. We denote the mean of X_1 by m and its variance by σ^2, and otherwise make no other assumptions regarding the statistics of X or N. The normal approximation then suggests that the random variable \frac{S − ES}{\sqrt{Var(S)}} is approximately normally distributed. Of course we have seen that ES = m EN and Var(S) = m^2 Var(N) + σ^2 EN, hence, if G(x) := P(S ≤ x),

G(x) ≈ Φ\left(\frac{x − m EN}{\sqrt{m^2 Var(N) + σ^2 EN}}\right).    (5.6)

The above relationship is of course not exact. To obtain a bound for the error we start with the Berry-Esseen bound for the error in the central limit theorem.

Theorem 6 (Berry-Esseen). Suppose that {X_i} are i.i.d. with zero mean, variance σ^2 = EX_1^2, and finite third moment, γ^3 = E|X_1|^3. If S_n = X_1 + · · · + X_n, then

\left|P\left(S_n/σ\sqrt{n} ≤ x\right) − Φ(x)\right| ≤ \frac{c_1}{\sqrt{n}}\left(\frac{γ}{σ}\right)^3   for all x ∈ R.

The value of the constant c_1 is less than 1. The above theorem can be used to obtain a bound for the normal approximation error. From the Berry-Esseen bound,

−\frac{c_1}{\sqrt{n}}\left(\frac{γ}{σ}\right)^3 ≤ P\left(S_n/σ\sqrt{n} ≤ x\right) − Φ(x) ≤ \frac{c_1}{\sqrt{n}}\left(\frac{γ}{σ}\right)^3.    (5.7)

The above is valid for n = 1, 2, . . . , while for n = 0, regardless of how S_n/\sqrt{n} is defined, |P(S_n/σ\sqrt{n} ≤ x) − Φ(x)| ≤ 1. (We could agree for instance to set S_N/\sqrt{N} = 0 when
N = 0.) Then from (5.7) we have

\left|P\left(S_N/σ\sqrt{N} ≤ x\right) − Φ(x)\right| ≤ |1(x ≥ 0) − Φ(x)| P(N = 0) + \sum_{n=1}^{\infty} P(N = n)\left|P\left(S_n/σ\sqrt{n} ≤ x\right) − Φ(x)\right|
    ≤ P(N = 0) + c_1\left(\frac{γ}{σ}\right)^3 \sum_{n=1}^{\infty} P(N = n)\frac{1}{\sqrt{n}}
    = P(N = 0) + c_1\left(\frac{γ}{σ}\right)^3 P(N > 0)\, E[N^{−1/2} \mid N > 0].

We will need the following lemma.

Lemma 2. For any random variable M with values on the positive integers 1, 2, 3, . . . we have

(EM)^{−1/2} ≤ E M^{−1/2} ≤ (E M^{−1})^{1/2}.    (5.8)

Proof: We have (E M^{−1/2})^2 ≤ E(M^{−1/2})^2 = E M^{−1} (from the Cauchy-Schwarz inequality), or

E M^{−1/2} ≤ (E M^{−1})^{1/2}.    (5.9)

On the other hand, from Jensen's inequality we have E M^{−1/2} ≥ (EM)^{−1/2}. This last inequality, together with (5.9), establishes the lemma. Let us now consider the case when N is Poisson with mean α.
Lemma 3. Let N_α be a Poisson random variable with mean α. Then E[N_α^{−1/2} \mid N_α > 0] ≍ α^{−1/2} as α → ∞.

Proof: We have P(N_α > 0) = 1 − e^{−α} and

E[N_α \mid N_α > 0] = (1 − e^{−α})^{−1} \sum_{n=1}^{\infty} n \frac{α^n}{n!} e^{−α} = \frac{α}{1 − e^{−α}}.

Then we can use inequality (5.8) on the random variable N_α, conditioned on the event N_α > 0, to obtain

E[N_α^{−1/2} \mid N_α > 0] ≤ \left(E[N_α^{−1} \mid N_α > 0]\right)^{1/2}.

In order to estimate the size of

E[N_α^{−1} \mid N_α > 0] = \frac{e^{−α}}{1 − e^{−α}} \sum_{n=1}^{\infty} \frac{1}{n} \frac{α^n}{n!}

we set

g(α) := \sum_{n=1}^{\infty} \frac{1}{n} \frac{α^n}{n!}

and note that g'(α) = \sum_{n=1}^{\infty} \frac{α^{n−1}}{n!}, whence we obtain that g'(α) = α^{−1}(e^α − 1). Thus

g(α) = \int_0^α (e^x − 1) x^{−1} dx
and

E[N_α^{−1} \mid N_α > 0] = \frac{1}{e^α − 1} \int_0^α \frac{e^x − 1}{x} dx.

By virtue of Lemma 2 we have

\sqrt{\frac{1 − e^{−α}}{α}} ≤ E[N_α^{−1/2} \mid N_α > 0] ≤ \frac{1}{\sqrt{α}} \left(\frac{α}{e^α − 1} \int_0^α \frac{e^x − 1}{x} dx\right)^{1/2}.

Since

\lim_{α→\infty} \frac{α}{e^α − 1} \int_0^α \frac{e^x − 1}{x} dx = 1

(e.g. use de l'Hôpital's rule), we conclude that

E[N_α^{−1/2} \mid N_α > 0] ≍ \frac{1}{\sqrt{α}}.

We can thus state the following.

Theorem 7. For the compound Poisson model we have the asymptotic relationship

\left|P\left(S_{N_α}/σ\sqrt{N_α} ≤ x\right) − Φ(x)\right| ≍ \frac{c_1}{\sqrt{α}}\left(\frac{γ}{σ}\right)^3.
5.3 The translated Gamma approximation

The idea here is to approximate S := \sum_{i=1}^{N} X_i by means of a translated Gamma distribution, which has density

f(x) = \frac{β^α (x − k)^{α−1}}{Γ(α)} e^{−β(x−k)},   x ≥ k,    (5.10)

where, as usual, β > 0, α > 0. In particular, if Z has distribution Gamma(α, β) then X := k + Z has the translated Gamma distribution with density given by (5.10). Let us denote by m := ES and σ^2 := Var(S) the mean and variance of the random sum S and by

γ := \frac{E[(S − ES)^3]}{(Var(S))^{3/2}}

the skewness coefficient. On the other hand, the mean, variance, and skewness coefficient of the translated Gamma can be obtained easily as follows. Clearly, EX = k + EZ while Var(X) = Var(Z), and similarly the skewness coefficients of X and Z are the same. Since Z follows the Gamma(α, β) distribution we have EZ = α/β and Var(Z) = α/β^2. Also, EZ^3 = α(α + 1)(α + 2)/β^3 and thus

E[(Z − EZ)^3] = E[Z^3 − 3Z^2 EZ + 3Z(EZ)^2 − (EZ)^3]
             = \frac{α(α + 1)(α + 2)}{β^3} − 3\frac{α(α + 1)}{β^2}\frac{α}{β} + 3\frac{α^3}{β^3} − \frac{α^3}{β^3}
             = \frac{2α}{β^3}.

Thus

Skewness coeff. of X = Skewness coeff. of Z = \frac{2α/β^3}{(α/β^2)^{3/2}} = \frac{2}{\sqrt{α}}.

The unknown parameters of the translated Gamma approximation are then determined from the known m, σ, γ by solving the system

m = k + \frac{α}{β},
σ^2 = \frac{α}{β^2},
γ = \frac{2}{\sqrt{α}}.
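The moment-matching system above can be solved in closed form: α = 4/γ^2, β = \sqrt{α}/σ and k = m − α/β. The sketch below (illustrative values of m, σ, γ, not taken from the text) carries out this computation.

from math import sqrt

m, sigma, gamma = 10.0, 2.0, 0.5    # mean, std deviation and skewness of the random sum S

alpha = 4.0 / gamma**2              # from gamma = 2 / sqrt(alpha)
beta = sqrt(alpha) / sigma          # from sigma^2 = alpha / beta^2
k = m - alpha / beta                # from m = k + alpha / beta
print(alpha, beta, k)               # shape, rate and shift of the approximating Gamma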