Chapter 5. Discrete random variables

Statistics 2R: Probability

5.1 Definition
This chapter is concerned with discrete random variables. We start by stating the definition of a discrete random variable.


Definition 5.1 (Discrete random variables). A random variable whose range R_X is finite or countably infinite is called a discrete random variable.

The random variable studied in example 4.1 is an example of a discrete random variable. Other examples of discrete random variables would be the number of times you miss a 2R lecture, the number of elephants in a national park, etc. In section 4.1 we saw how the probability distribution of a random variable can be derived in general. Much more often, however, the distribution of a random variable is given directly, without referring to the underlying sample space. The probability mass function (p.m.f.) is one way of specifying the distribution of a discrete random variable: it gives the probability that a random variable X takes a value x.

[Figure 5.1: two blank coordinate systems for sketching, each with vertical ticks at 0.25, 0.5, 0.75, 1 and horizontal ticks at x = 1, 2. Panel (a): probability mass function p(x); panel (b): cumulative distribution function F(x).]
Figure 5.1. Probability mass function (p.m.f.) and cumulative distribution function (c.d.f.) of the random variable X of example 5.1.

Definition 5.2 (Probability mass function (p.m.f.)). Let X be a discrete random variable. The function p : R → R, x ↦ p(x) = P{X = x} is then called the probability mass function (p.m.f.).

Example 5.1. Suppose you toss a fair coin twice and denote by X the number of heads obtained. Find the probabilities P{X = 0}, P{X = 1}, and P{X = 2} and write down and sketch the probability mass function (p.m.f.) p(·) of X (use figure 5.1(a)).
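The four outcomes of two fair tosses are equally likely, so the p.m.f. can be found by enumerating them. As a check, here is a small Python sketch (an editorial aside, not part of the original notes; exact fractions are used to avoid rounding):

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair coin tosses: HH, HT, TH, TT, all equally likely.
outcomes = list(product("HT", repeat=2))

# X = number of heads; p(x) = P{X = x}.
pmf = {}
for omega in outcomes:
    x = omega.count("H")
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, len(outcomes))

print(pmf)  # p(0) = 1/4, p(1) = 1/2, p(2) = 1/4
```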

Proposition 5.3. Let X be a discrete random variable with probability mass function (p.m.f.) p(·). Then:
i. p(x) ∈ [0, 1] for all x ∈ R_X.
ii. p(x) = 0 for all x ∉ R_X.
iii. P{X ∈ A} = Σ_{x∈A} p(x) for all A ⊆ R_X.
iv. Σ_{x∈R_X} p(x) = 1.

Any function p(·) with the properties i., ii., and iv. is a valid probability mass function (p.m.f.), i.e. it uniquely determines the distribution of X; more precisely, for any given function p(·) which satisfies i., ii. and iv., there exists a discrete random variable X which has p(·) as its probability mass function (p.m.f.).

Proof.
i. p(x) ≡ P{X = x} ∈ [0, 1] (axiom ii. and proposition 1.7).
ii. p(x) = P{X = x} = 0 if x ∉ R_X.
iii. Because A ⊆ R_X is at most countable, we can write {X ∈ A} as the at most countable disjoint union ∪_{x∈A} {X = x}, so that, by axiom iii.,

    P{X ∈ A} = P(∪_{x∈A} {X = x}) = Σ_{x∈A} P{X = x} = Σ_{x∈A} p(x).

iv. Taking A = R_X in iii. gives Σ_{x∈R_X} p(x) = P{X ∈ R_X} = P(Ω) = 1. □

The probability mass function (p.m.f.), which gives the probability that a discrete random variable X exactly equals x, is not the only function one can define to specify the probability distribution of a random variable X. Another function which can be used to specify the distribution of a random variable is the cumulative distribution function (c.d.f.), which gives the probability that the random variable X is at most x.

Definition 5.4 (Cumulative distribution function (c.d.f.)). The function

    F(x) = P{X ≤ x}

is called the cumulative distribution function (c.d.f.) of the random variable X.

We will see later on in this course that whilst the probability mass function (p.m.f.) is specific to discrete random variables, the cumulative distribution function (c.d.f.) can be defined for any type of random variable.

Proposition 5.5. Let X be a discrete random variable with probability mass function (p.m.f.) p(·) and cumulative distribution function (c.d.f.) F(·).
i. P{X > x} = 1 − F(x) for all x ∈ R.
ii. P{a < X ≤ b} = F(b) − F(a) for a ≤ b ∈ R.
iii. lim_{x→−∞} F(x) = 0 and lim_{x→+∞} F(x) = 1.
iv. F(·) is a step function with discontinuities at all possible values X can take, i.e. for x ∈ R_X the height of the "jump" of F(·) at x is

    p(x) = F(x) − F(x−).

v. F(x) = Σ_{k∈R_X, k≤x} p(k).

Figure 5.2 illustrates proposition 5.5 for a discrete random variable with range R_X = {−1, 0, 1} and P{X = −1} = a, P{X = 0} = b, and P{X = 1} = c (a + b + c = 1).

[Figure 5.2: left panel, the p.m.f. p(x) with bars of heights a, b and c at x = −1, 0 and 1; right panel, the c.d.f. F(x), a step function with F(x) = a for −1 ≤ x < 0, F(x) = a + b for 0 ≤ x < 1, and F(x) = a + b + c = 1 for x ≥ 1.]
Figure 5.2. Illustration of proposition 5.5, which relates the probability mass function (p.m.f.) p(·) to the cumulative distribution function (c.d.f.) F(·).

Proof.
i. P{X > x} = 1 − P({X > x}∁) = 1 − P{X ≤ x} = 1 − F(x).
ii. We have that {X ≤ b} = {X ≤ a} ∪ {a < X ≤ b} is a disjoint union, thus by the third axiom

    F(b) = P{X ≤ b} = P({X ≤ a} ∪ {a < X ≤ b}) = P{X ≤ a} + P{a < X ≤ b} = F(a) + P{a < X ≤ b}.

Solving this equation for P{a < X ≤ b} gives P{a < X ≤ b} = F(b) − F(a).
iii. We have that

    lim_{x→−∞} F(x) = lim_{x→−∞} P{X ≤ x} = P(∅) = 0 and
    lim_{x→+∞} F(x) = lim_{x→+∞} P{X ≤ x} = P(Ω) = 1.

iv. Let x_i and x_{i+1} be two consecutive elements of the range R_X of X. In order to show that F(·) is a step function we need to show that it is constant on the interval [x_i, x_{i+1}). Let x_i ≤ x < x_{i+1}. Then

    F(x) = P{X ≤ x} = P{X ≤ x_i} + P{x_i < X ≤ x} = F(x_i),

where the second probability is 0 because X cannot take any value that lies in the interval (x_i, x]. Furthermore,

    P{X < x} = lim_{ξ↗x} P{X ≤ ξ} = lim_{ξ↗x} F(ξ) = F(x−),

so that the height of the jump at x is

    P{X = x} = P{X ≤ x} − P{X < x} = F(x) − F(x−) = p(x).

v. As X is a discrete random variable we have that {X ≤ x} = ∪_{k∈R_X, k≤x} {X = k}. As this is a countable and disjoint union, we can apply the third axiom and obtain

    F(x) = P{X ≤ x} = P(∪_{k∈R_X, k≤x} {X = k}) = Σ_{k∈R_X, k≤x} P{X = k} = Σ_{k∈R_X, k≤x} p(k). □

Any function F(·) which satisfies iii. and iv. is a valid cumulative distribution function (c.d.f.) of a discrete random variable, i.e. it uniquely determines the distribution of the random variable X. The corollary below summarises how we can compute probabilities from the cumulative distribution function (c.d.f.).

Corollary 5.6. For a discrete random variable X we have that

    P{X ≤ x} = F(x)
    P{X < x} = F(x−) = F(x) − p(x)
    P{X > x} = 1 − F(x)
    P{X ≥ x} = 1 − F(x−) = 1 − F(x) + p(x).

Proof. We only need to show the formulae for P{X < x} and P{X ≥ x}. Using p(x) = F(x) − F(x−) from proposition 5.5 iv.,

    P{X < x} = P{X ≤ x} − P{X = x} = F(x) − p(x) = F(x−), and
    P{X ≥ x} = P{X > x} + P{X = x} = 1 − F(x) + p(x) = 1 − F(x−). □

Figure 5.3 illustrates the formulae from proposition 5.5 and corollary 5.6.

Example 5.2. Find the cumulative distribution function (c.d.f.) for the random variable X from example 5.1 (number of heads when tossing a fair coin twice). Sketch the c.d.f. F(·) in figure 5.1(b).

    F(x) =
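A sketched c.d.f. can be checked mechanically: proposition 5.5 v. builds F from the p.m.f., and corollary 5.6 then gives the four tail probabilities. A Python sketch for the coin-tossing variable (an editorial illustration, not part of the notes):

```python
from fractions import Fraction

# p.m.f. of X = number of heads in two fair coin tosses (example 5.1).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def F(x):
    """c.d.f.: F(x) = P{X <= x} = sum of p(k) over k <= x (proposition 5.5 v.)."""
    return sum(p for k, p in pmf.items() if k <= x)

# Corollary 5.6 at x = 1: all four probabilities from F and p.
x = 1
p_x = pmf.get(x, Fraction(0))
print(F(x), F(x) - p_x, 1 - F(x), 1 - F(x) + p_x)
# P{X <= 1} = 3/4, P{X < 1} = 1/4, P{X > 1} = 1/4, P{X >= 1} = 3/4
```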

[Figure 5.3: the c.d.f. F drawn near a jump at x and over an interval (a, b]; marked are F(a), F(x−), F(x), F(b), the jump height P{X = x} = F(x) − F(x−), and the probabilities P{X ≤ x} = F(x), P{X < x} = F(x−), P{X > x} = 1 − F(x), P{X ≥ x} = 1 − F(x−), and P{a < X ≤ b} = F(b) − F(a).]
Figure 5.3. Illustration of the formulae from proposition 5.5 and corollary 5.6.

5.2 Expected value

Example 5.3. Consider the game from example 4.1, where you lose / win £1 with probability 1/3 each. If you play the game n = 1000 times, you would expect the relative frequencies of the events {X = −1}, {X = 0}, and {X = 1} each to be approximately 1/3. Your total profit after n games is then

    −1 · (# times you lose £1) + 0 · (# times neither player wins) + 1 · (# times you win £1)
    = −1 · n · rf_n({X = −1}) + 0 · n · rf_n({X = 0}) + 1 · n · rf_n({X = 1}).

So you would expect an average profit per game of

    −1 · rf_n({X = −1}) + 0 · rf_n({X = 0}) + 1 · rf_n({X = 1}) ≈ −1 · 1/3 + 0 · 1/3 + 1 · 1/3 = 0.

Thus in the long run you will on average play even. ⊳

This motivates the following definition of the expected value.
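The relative-frequency argument of example 5.3 can also be checked by simulation; a Python sketch (an editorial aside, not part of the notes):

```python
import random

random.seed(1)  # reproducible sketch

# One game: profit of -1, 0, or +1, each with probability 1/3 (example 5.3).
n = 100_000
profits = [random.choice((-1, 0, 1)) for _ in range(n)]

average = sum(profits) / n
print(average)  # close to 0, the expected profit per game
```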


Definition 5.7 (Expected value). Let X be a discrete random variable with range R_X. Then we define the expected value (expectation) of X to be

    E(X) ≡ Σ_{x∈R_X} x · p(x),

provided the sum Σ_{x∈R_X} |x| · p(x) converges.

The expected value is the "average" of the distribution of the random variable X, just like the arithmetic mean x̄ is the "average" of a sample x₁, …, xₙ. In 2X you will learn the Laws of Large Numbers: if you repeatedly carry out the same experiment, then the sample mean X̄ of the observed values of X will (under suitable regularity conditions) converge to the expected value E(X). Figure 5.4 gives the expected value for three simple distributions.

[Figure 5.4: three bar charts of p.m.f.s on {1, …, 5} with bar heights between 0.1 and 0.4.]
Figure 5.4. (a) E(X) = 3 and Var(X) = 1.2; (b) E(Y) = 2 and Var(Y) = 1.2; (c) E(Z) = 3 and Var(Z) = 2. The distributions in panels (a) and (c) have the same expected value. The distributions in panels (a) and (b) have the same variance.

Example 5.4. Find the expected value E(X) of the random variable X from example 5.1 (number of heads when tossing a fair coin twice).

Example 5.5. Let X be a discrete random variable with probability mass function¹

    p(x) = 6/(π² x²)   for x ∈ N.

Then we have that

    Σ_{x∈N} |x| · p(x) = Σ_{x∈N} |x| · 6/(π² x²) = (6/π²) · Σ_{x∈N} 1/x,

and the harmonic series Σ_{x∈N} 1/x diverges to +∞, thus E(X) does not exist for this distribution. ⊳

Calculating the sum in the definition of the expected value can sometimes be cumbersome. In these cases the following alternative formula can be useful.²

Proposition 5.8. Let X be a random variable whose range is (a subset of) the non-negative integers, i.e. R_X = N₀. Then

    E(X) = Σ_{x∈N₀} P{X > x} = Σ_{x∈N₀} (1 − F(x)).

Proof.
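The divergence in example 5.5 can be seen numerically: the probabilities p(x) = 6/(π²x²) sum to 1, but the partial sums of |x| · p(x) keep growing like a harmonic series. A Python sketch (an editorial illustration, not part of the notes):

```python
import math

def p(x):
    """p.m.f. of example 5.5: p(x) = 6 / (pi^2 x^2) for x = 1, 2, 3, ..."""
    return 6 / (math.pi**2 * x**2)

# The probabilities sum to (approximately) 1 ...
total = sum(p(x) for x in range(1, 200_000))

# ... but the partial sums of |x| * p(x) = (6/pi^2) / x grow without bound.
partials = [sum(x * p(x) for x in range(1, n + 1)) for n in (10, 1_000, 100_000)]
print(total, partials)
```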

    E(X) = Σ_{x∈R_X} x · p(x) = 0 · p(0) + 1 · p(1) + 2 · p(2) + 3 · p(3) + 4 · p(4) + …

         = p(1)
         + p(2) + p(2)
         + p(3) + p(3) + p(3)
         + p(4) + p(4) + p(4) + p(4)
         + …

Summing this array column by column instead of row by row, the first column is P{X > 0}, the second P{X > 1}, and so on:

    E(X) = P{X > 0} + P{X > 1} + P{X > 2} + P{X > 3} + …
         = Σ_{k∈N₀} P{X > k} = Σ_{k∈N₀} (1 − F(k)).
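For the coin-tossing variable of example 5.1 the rearrangement can be verified directly; a Python sketch comparing the two formulae (an editorial aside, not part of the notes):

```python
from fractions import Fraction

# p.m.f. of X = number of heads in two fair coin tosses (example 5.1).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Direct definition: E(X) = sum_x x * p(x).
expectation = sum(x * p for x, p in pmf.items())

# Tail-sum formula of proposition 5.8: E(X) = sum_{x >= 0} P{X > x}.
def tail(x):
    return sum(p for k, p in pmf.items() if k > x)

tail_sum = sum(tail(x) for x in range(max(pmf)))  # terms vanish for x >= 2
print(expectation, tail_sum)  # both equal 1
```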

A more formal way of writing down this idea is to write x = 1 + ⋯ + 1 (x times), so that

    E(X) = Σ_{x∈R_X} x · p(x) = Σ_{x∈N₀} Σ_{k∈N₀: k<x} p(x) = Σ_{k∈N₀} Σ_{x∈N₀: x>k} p(x) = Σ_{k∈N₀} P{X > k}. □

[Figure 5.5: the c.d.f. F(x) of the random variable of example 5.1 drawn for 0 ≤ x ≤ 2 together with the horizontal asymptote at 1; the regions between F and 1 have areas P{X > 0}, P{X > 1}, P{X > 2}, …]
Figure 5.5. Illustration of proposition 5.8. The expected value E(X) corresponds to the area shaded in grey.

Geometrically speaking, the proposition says that for a non-negative random variable the expected value E(X) equals the area between the cumulative distribution function F(x) and the horizontal asymptote at 1. Figure 5.5 visualises this.

In 2X you will prove the following important formulae that tell us how to compute the expected value of the sum and the product of two random variables.

Proposition 5.9. Let X and Y be random variables (which do not need to be independent), then

    E(X + Y) = E(X) + E(Y).

If X and Y are independent, then E(X · Y) = E(X) · E(Y).

Often we are not only interested in the expected value E(X) of X, but also in the expected value of functions g(X) of X.

Proposition 5.10. Let X be a discrete random variable with probability mass function pX(·) and let g : R → R be a function. Then

    E(g(X)) = Σ_{x∈R_X} g(x) · pX(x)

(provided the (possibly infinite) sum on the right-hand side is absolutely convergent).

Note that unless g(·) is a linear function, E(g(X)) ≠ g(E(X)).

Proof. Write Y = g(X). Then

    E(g(X)) = E(Y) = Σ_{y∈R_Y} y · pY(y) = Σ_{x∈R_X} g(x) · pX(x). □

Example 5.6. Consider again the random variable X from example 5.1 (number of heads when tossing a fair coin twice). Find E(X²).

Proposition 5.10 provided us with a formula for computing E(g(X)) for a general function g(·). If g(·) is a linear transformation, this formula can be simplified even further, as the following proposition shows.

Proposition 5.11. Let X be a (discrete) random variable and a, b ∈ R be deterministic constants, then

    E(a · X + b) = a · E(X) + b.

("The expected value is a linear operator.")

Proof.

Corollary 5.12. Let X₁, …, Xₙ be random variables and a₁, …, aₙ ∈ R be deterministic constants, then

    E(Σ_{i=1}^n a_i · X_i) = Σ_{i=1}^n a_i · E(X_i).

Proof.
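Proposition 5.10 and the warning E(g(X)) ≠ g(E(X)) can be illustrated with the coin-tossing variable of example 5.1 and g(x) = x²; a Python sketch (an editorial aside, not part of the notes):

```python
from fractions import Fraction

# p.m.f. of X = number of heads in two fair coin tosses (example 5.1).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expect(g):
    """E(g(X)) = sum_x g(x) * p(x)  (proposition 5.10)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(lambda x: x)          # E(X) = 1
mean_sq = expect(lambda x: x**2)    # E(X^2) = 3/2
print(mean_sq, mean**2)  # 3/2 vs 1: E(g(X)) differs from g(E(X)) here

# Linearity (proposition 5.11): E(aX + b) = a E(X) + b.
a, b = 3, 5
assert expect(lambda x: a * x + b) == a * mean + b
```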


5.3 Variance

Example 5.7. Consider a modification of the game from example 4.1. You now win / lose £a with probability 1/3 each. For every a ∈ R, the expected value of X is

    E(X) = Σ_{x∈{−a,0,a}} x · pX(x) = −a · 1/3 + 0 · 1/3 + a · 1/3 = 0.

Whatever value a takes, the game is a fair game. Whilst most of you might be willing to play this game if a = £1 is at stake, would you still play the game if a = £1,000,000 is at stake? Probably not. Although it is clear that in either game you will eventually play approximately even for every a ∈ R, for a single game there is much less "risk" involved if only £1 is at stake. In other words, the larger a, the more variation there is in the game. ⊳

How can we define a measure of variation (or spread / riskiness)? One criterion we could look at is how far X will be on average from its expected value µ = E(X). We measure how far X is from its expected value by looking at the squared distance (X − µ)².

Definition 5.13 (Variance). The variance of a random variable X is defined to be

    Var(X) ≡ E((X − µ)²),

where µ = E(X), provided E(X²) exists (and is finite).

Note that, like the expected value, the variance does not need to exist. The random variable from example 5.5 does not have a (finite) variance either.

It is important to keep in mind that the expected value and the variance measure two different aspects of a distribution. The expected value gives the mean of the random variable and thus contains information about the (average) location. The variance, however, is a measure of the spread of the distribution. Figure 5.4 illustrates this using three simple examples.

The variance is the expected squared distance to the expected value. Its scale is thus the squared scale of X. If X is for example measured in metres, the variance would be in square metres. This can make interpreting the variance difficult. Thus we define the standard deviation as the square root of the variance: this ensures that it is on the same scale as the random variable X itself.

Definition 5.14 (Standard deviation). The standard deviation of a random variable is defined to be

    σ(X) ≡ √Var(X).

Example 5.8. Compute the variance and standard deviation for the random variable X of example 5.1 (number of heads when tossing a fair coin twice). In example 5.4 we have found that µ = E(X) = 1.

    x                  0    1    2    Σ
    p(x)
    (x − µ)² · p(x)

This implies that

    Var(X) = E((X − µ)²) = Σ_{x∈R_X} (x − µ)² · p(x) =

    σ(X) =

The following proposition summarises important properties of the variance Var(X), most importantly that the variance cannot be negative; otherwise our definition of the standard deviation would not have made sense.

Proposition 5.15. (a) The variance of a random variable cannot be negative, i.e. Var(X) ≥ 0.
(b) If Var(X) = 0, then P{X = µ} = 1, i.e. X is (almost surely) constant.

Proof. (a) Var(X) = E((X − µ)²) ≥ E(0) = 0, since (X − µ)² ≥ 0.
(b) Note that 0 = Var(X) = Σ_{x∈R_X} (x − µ)² · p(x) implies for every x ∈ R_X that either p(x) or (x − µ)² is 0, i.e. p(x) = 0 if x ≠ µ. □

Whilst we can use the definition of the variance to compute the variance (as we did in example 5.8), there is a more convenient formula:


Proposition 5.16. Var(X) = E(X²) − µ², where µ = E(X).

Proof. We have that

    Var(X) = E((X − µ)²) = E(X² − 2 · X · µ + µ²)
           = E(X²) − E(2 · X · µ) + E(µ²)
           = E(X²) − 2 · µ · E(X) + µ²
           = E(X²) − 2 · µ² + µ² = E(X²) − µ². □

Example 5.9. Consider the discrete uniform distribution on the set {1, 2, 3, 4}, i.e.

    p(x) = 1/4  if x ∈ {1, 2, 3, 4},  and p(x) = 0 else.

Compute E(X) and Var(X).

    x            1    2    3    4    Σ
    p(x)
    x · p(x)
    x² · p(x)

Finally, we will state two important formulae that tell us how to compute the variance of linear transformations and of sums of random variables.

Proposition 5.17. (a) Let X be a (discrete) random variable and a, b ∈ R be deterministic constants. Then

    Var(a · X + b) = a² · Var(X).

(b) Let X and Y be two (discrete) random variables which are independent. Then

    Var(X + Y) = Var(X) + Var(Y).

Corollary 5.18. Let X₁, …, Xₙ be independent random variables and a₁, …, aₙ ∈ R be deterministic constants, then

    Var(Σ_{i=1}^n a_i · X_i) = Σ_{i=1}^n a_i² · Var(X_i).

Proof. (a) Write Y = a · X + b, and denote by µ = E(X) and η = E(Y). Then

    η = E(a · X + b) = a · µ + b,

so that

    Var(a · X + b) = E((Y − η)²) = E((a · X + b − (a · µ + b))²) = a² · E((X − µ)²) = a² · Var(X).

(b) You will prove this in 2X. □

Learning outcomes for chapter 5. By the end of this chapter you should be able to …
– explain what is meant by a discrete random variable, the probability mass function (p.m.f.), the cumulative distribution function (c.d.f.), the expected value and the variance, as well as state their formal definitions;
– obtain probabilities from the p.m.f. and/or the c.d.f.;
– compute the expected value and variance for a given discrete distribution; and
– derive theoretical properties from the definitions of the p.m.f., c.d.f., expected value and variance.

Additional reading. Chapter 8 and sections 11.1, 11.2, and 14.1 of McColl / Sections 4.3 to 4.6 of Ross.
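The shortcut Var(X) = E(X²) − µ² and proposition 5.17(a) can be checked on the discrete uniform distribution of example 5.9; a Python sketch (an editorial aside, not part of the notes):

```python
from fractions import Fraction

# Discrete uniform distribution on {1, 2, 3, 4} (example 5.9).
pmf = {x: Fraction(1, 4) for x in (1, 2, 3, 4)}

mean = sum(x * p for x, p in pmf.items())        # E(X)
mean_sq = sum(x**2 * p for x, p in pmf.items())  # E(X^2)
var = mean_sq - mean**2                          # proposition 5.16
print(mean, var)  # 5/2 and 5/4

# Proposition 5.17(a): Var(aX + b) = a^2 Var(X).
a, b = 2, 7
pmf_y = {a * x + b: p for x, p in pmf.items()}   # distribution of Y = aX + b
mean_y = sum(y * p for y, p in pmf_y.items())
var_y = sum(y**2 * p for y, p in pmf_y.items()) - mean_y**2
assert var_y == a**2 * var
```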
