Preparatory Course Econometrics
Probability Theory - Statistical Inference - Matrix Algebra
Prof. Dr. Christian Conrad, Dipl.-Vw. Daniel Rittler
University of Heidelberg
Winter term 2011/12
Preparatory Course Econometrics
Contents
1. Probability framework for statistical inference
2. Fundamentals of asymptotic distribution theory
3. Point estimation
4. Interval estimation
5. Statistical hypothesis testing
6. Fundamentals of matrix algebra
Preparatory Course Econometrics
Literature
Statistics:
Hogg, R. V. and A. T. Craig, Introduction to Mathematical Statistics, Prentice Hall, 1995.
Mosler, K. and F. Schmid, Wahrscheinlichkeitsrechnung und schließende Statistik, Springer, 2004.
Econometrics:
W. H. Greene, Econometric Analysis, 6th edition, Prentice Hall, 2008.
J. H. Stock and M. W. Watson, Introduction to Econometrics, 2nd edition, Addison-Wesley, 2007.
J. M. Wooldridge, Econometric Analysis of Cross Section and Panel Data, 2nd edition, MIT Press, 2002.
1 Probability framework for statistical inference
1.1 Calculus of probability
1.2 Probability measure
1.3 Random variables
1.4 Joint distributions
1.5 Conditional distributions
1 Probability framework for statistical inference 1.1 Calculus of probability
1.1.1 Notation
Our starting point is a random experiment with possible outcomes ω and the set of all outcomes Ω, called the sample space. An event A is a collection of outcomes and hence a subset of Ω.
Problem: Tossing a die once
Describe the sample space and the events A: "the outcome is an odd number" and B: "the outcome is an even number".
1 Probability framework for statistical inference 1.1 Calculus of probability
1.1.2 Elementary set operations
Union: A ∪ B = {x : x ∈ A or x ∈ B};
Intersection: A ∩ B = {x : x ∈ A and x ∈ B};
Complement: Aᶜ = {x : x ∉ A};
Relative complement: A\B = {x : x ∈ A and x ∉ B}.
1.1.3 Properties of set operations
A and B are disjoint if A ∩ B = ∅;
A and B are a partition of C if A ∩ B = ∅ and A ∪ B = C;
Commutativity: A ∪ B = B ∪ A and A ∩ B = B ∩ A;
Associativity: A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C;
Distributive laws: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C);
De Morgan's laws: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ.
1 Probability framework for statistical inference 1.2 Probability measure
Starting point: experiment with sample space Ω and relevant events A
Definition
A function P that assigns a real number to each event A is called a probability measure (probability) if
(i) 0 ≤ P(A) ≤ 1 for all events A,
(ii) P(Ω) = 1,
(iii) P(A1 ∪ A2 ∪ ...) = P(A1) + P(A2) + ... for all A1, A2, ... with Ai ∩ Aj = ∅ for i ≠ j.
Implications:
P(∅) = 0
P(Aᶜ) = 1 − P(A)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A) ≤ P(B) if A ⊆ B
P(A1 ∪ ... ∪ An) = P(A1) + ... + P(An) for all A1, A2, ..., An with Ai ∩ Aj = ∅ for i ≠ j
1 Probability framework for statistical inference 1.2 Probability measure
Problem: Tossing a fair coin two times Consider the random experiment "tossing a fair coin two times". Describe all possible outcomes; the sample space; all possible events; the probability measure.
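A minimal Python sketch (not part of the original slides) that enumerates this experiment; the event definitions A and B below are illustrative choices, not taken from the problem.

```python
from itertools import product

# Sample space of "tossing a fair coin two times": ordered pairs of h/t.
omega = list(product("ht", repeat=2))        # [('h','h'), ('h','t'), ('t','h'), ('t','t')]

# Probability measure for a fair coin: every outcome has probability 1/4.
P = {outcome: 1 / len(omega) for outcome in omega}

def prob(event):
    """P(A) for an event A given as a set of outcomes."""
    return sum(P[w] for w in event)

# Two example events (illustrative names, not from the slides):
A = {w for w in omega if w[0] == "h"}        # "first toss shows heads"
B = {w for w in omega if w.count("h") >= 1}  # "at least one heads"

print(prob(set(omega)))                                # P(Omega) = 1
print(prob(A | B), prob(A) + prob(B) - prob(A & B))    # inclusion-exclusion holds
```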
1 Probability framework for statistical inference 1.2 Probability measure
1.2.1 Conditional probability and statistical independence
Definition
Let A and B be two events in Ω with P(B) > 0. Then
P(A|B) := P(A ∩ B) / P(B)
is called the conditional probability of event A under the condition of event B.
Note that P(·|B) satisfies P(A|B) ≥ 0, P(Ω|B) = 1, and
P(A1 ∪ A2 | B) = P(A1|B) + P(A2|B) if A1 ∩ A2 = ∅.
1 Probability framework for statistical inference 1.2 Probability measure
Example: Consider a sample space Ω with cardinality |Ω| = n. Further, A and B are events with |A| = k and |B| = l. Finally, |A ∩ B| = m. Each element of the sample space occurs with equal probability.
Determine P(A), P(B), P(A ∩ B), and P(A|B);
Illustrate the problem graphically;
Interpret the conditional probability P(A|B).
1 Probability framework for statistical inference 1.2 Probability measure
Definition The definition of the conditional probability yields the multiplication theorem P(A ∩ B) = P(A|B) · P(B)
Definition
The events A and B are called statistically independent if P(A|B) = P(A). Hence, the occurrence of event B carries no information about the likelihood of event A.
Note: If A and B are statistically independent, then P(A ∩ B) = P(A) · P(B).
1 Probability framework for statistical inference 1.3 Random variables
1.3.1 Random variables
Definition
Consider the sample space Ω and the probability measure P. A mapping X : Ω → Ω′ ⊆ R is called a random variable.
The probability distribution (distribution) PX of X is given by
PX(B) = P(X⁻¹(B)) = P({ω ∈ Ω | X(ω) ∈ B}),
with B ⊆ Ω′.
Interpretation: A random variable X assigns a real number X(ω) = x to each outcome ω ∈ Ω of a random experiment. While X is a function, x is a real number, called the realised value. X induces a new sample space Ω′ and a new probability function PX on Ω′.
1 Probability framework for statistical inference 1.3 Random variables
Example: Two gamblers, S and T, toss two fair coins. S pays T two dollars if both coins show "heads". T pays S one dollar if exactly one coin shows "tails". If both coins show "tails", neither gambler receives a payment. X denotes the payment of S. Determine the distribution of the random variable X.
Sample space: Ω = {(h, h), (h, t), (t, h), (t, t)}.
The function X assigns the corresponding payoff of S to each outcome ω:
X(h, h) = −2, X(h, t) = 1, X(t, h) = 1, X(t, t) = 0.
New sample space: Ω′ = {−2, 0, 1}.
Distribution of X:
PX({−2}) = P(X⁻¹({−2})) = P({ω ∈ Ω | X(ω) = −2}) = P({(h, h)}) = 1/4
PX({1}) = P(X⁻¹({1})) = P({ω ∈ Ω | X(ω) = 1}) = P({(t, h), (h, t)}) = 1/2
PX({0}) = P(X⁻¹({0})) = P({ω ∈ Ω | X(ω) = 0}) = P({(t, t)}) = 1/4
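A quick Monte Carlo cross-check of this distribution (an illustrative sketch, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Payoff of gambler S as a function of the two coin outcomes.
def payoff(c1, c2):
    if c1 == "h" and c2 == "h":
        return -2          # S pays T two dollars
    if c1 == "t" and c2 == "t":
        return 0           # no payment
    return 1               # exactly one tails: T pays S one dollar

coins = rng.choice(["h", "t"], size=(100_000, 2))
x = np.array([payoff(c1, c2) for c1, c2 in coins])

# Relative frequencies should be close to P_X({-2}) = 1/4, P_X({0}) = 1/4, P_X({1}) = 1/2.
for value in (-2, 0, 1):
    print(value, np.mean(x == value))
```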
1 Probability framework for statistical inference 1.3 Random variables
1.3.2 Cumulative distribution function

Definition
Consider the random variable X : Ω → R with probability distribution PX. The function FX : R → [0, 1] with
FX(x) := PX((−∞, x]) = P(X ≤ x)
is called the cumulative distribution function (CDF) of X.
Properties of the CDF:
1. FX is nondecreasing in x.
2. FX is right-continuous, that is, lim_{x→x0, x>x0} FX(x) = FX(x0).
3. lim_{x→−∞} FX(x) = 0.
4. lim_{x→∞} FX(x) = 1.
Moreover, PX((a, b]) = FX(b) − FX(a).
1 Probability framework for statistical inference 1.3 Random variables
1.3.3 Discrete and continuous random variables
A random variable X is discrete if FX(x) is a step function. The range of X consists of a countable set of real numbers x1, x2, .... The probability function takes the form
P(X = xi) = πi, i = 1, 2, ...
where 0 ≤ πi ≤ 1 and Σi πi = 1, so that
FX(xi) = P(X ≤ xi) = Σ_{xt ≤ xi} P(X = xt).

A random variable X is continuous if FX(x) is continuous in x. Probabilities are represented by the probability density function (PDF)
fX(x) = d/dx FX(x)
so that
FX(x) = P(X ≤ x) = ∫_{−∞}^{x} fX(u) du.
A function fX(x) is a PDF if and only if fX(x) ≥ 0 for all x ∈ R and ∫_R fX(x) dx = 1.
1 Probability framework for statistical inference 1.3 Random variables
Problem: Consider the function
fX(x) = e^(−x) if x ≥ 0, and fX(x) = 0 else.
i) Check whether fX is a probability density function.
ii) Derive the cumulative distribution function of X.
iii) Show fX and FX graphically.
iv) Determine P(X > 0.5) and P(0.5 < X ≤ 1).
v) Interpret the results of iv) graphically.
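A numerical sketch for parts i) and iv), assuming the standard exponential form of fX given above (the quadrature and scipy calls are for cross-checking only, not part of the slides):

```python
import numpy as np
from scipy import integrate, stats

f = lambda x: np.exp(-x)                       # density for x >= 0, 0 otherwise

# i) f integrates to one over [0, infinity), so it is a valid PDF.
total, _ = integrate.quad(f, 0, np.inf)
print(total)                                   # ~1.0

# ii) CDF of the standard exponential: F_X(x) = 1 - exp(-x) for x >= 0.
F = lambda x: 1 - np.exp(-x)

# iv) P(X > 0.5) and P(0.5 < X <= 1).
print(1 - F(0.5))
print(F(1) - F(0.5), stats.expon.cdf(1) - stats.expon.cdf(0.5))
```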
1 Probability framework for statistical inference 1.3 Random variables
1.3.4 Moments of random variables
Definition
For any real function g, we define the expectation E[g(X)] as follows. If X is discrete,
E[g(X)] = Σi g(xi) P(X = xi),
and if X is continuous,
E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx.
Note that ∫_{−∞}^{∞} g(x) fX(x) dx is well defined and finite if ∫_{−∞}^{∞} |g(x)| fX(x) dx < ∞. For g(x) = x, we have the ordinary definition of the expectation.
1 Probability framework for statistical inference 1.3 Random variables
Definition
For m > 0, we define the m-th moment of X as E(X^m) and the m-th central moment of X as E[(X − E(X))^m].
Special moments:
Mean: µX = E(X)
Variance: σX² = E[(X − µX)²]
σX = √(σX²) is called the standard deviation of X.
Z = (X − µX)/σX is called the standard score of X.
Problem: Show that E(Z) = 0 and Var(Z) = 1.
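As a quick illustration of the standard score (a sketch, not from the slides), the following simulation standardises draws from an arbitrarily chosen distribution, here Exp(scale = 2) with known mean 2 and standard deviation 2, and checks that the standardised values have mean ≈ 0 and variance ≈ 1:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=1_000_000)   # illustrative distribution with finite variance

mu, sigma = 2.0, 2.0          # population mean and standard deviation of Exp(scale=2)
z = (x - mu) / sigma          # standard score

print(z.mean(), z.var())      # close to 0 and 1
```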
1 Probability framework for statistical inference 1.3 Random variables
Problem: Consider the continuous random variable X and the function g(X) = a + bX with a, b ∈ R. Derive E(g(X)) and Var(g(X)).
1 Probability framework for statistical inference 1.3 Random variables
1.3.5 The binomial distribution
A random experiment is called a Bernoulli experiment if we are only interested in whether an event A occurs or not. Hence, the random variable Xi is given by
Xi = 1 if A occurs, and Xi = 0 if A does not occur.

Definition
A random variable Xi with codomain {0, 1} and parameter p ∈ (0, 1) is Bernoulli distributed (Xi ∼ Be(p)) if P(Xi = 1) = p.
Consider n independent Be(p)-distributed random variables X1, ..., Xn. The random variable X = Σi Xi with codomain {0, 1, ..., n} and parameters n ∈ N and p ∈ (0, 1) is binomially distributed (X ∼ B(n, p)) if
P(X = k) = (n choose k) · p^k · (1 − p)^(n−k).
1 Probability framework for statistical inference 1.3 Random variables
1.3.6 The normal distribution
Definition
A continuous random variable with density
fX(x) = (1 / (√(2π) σX)) · exp(−(x − µX)² / (2σX²))
with µX, x ∈ R and σX > 0 is called univariate normal. The mean and the variance of the normal distribution are µX and σX². It is conventional to write X ∼ N(µX, σX²).
The random variable Z := (X − µX)/σX ∼ N(0, 1) is called standard normally distributed.
1 Probability framework for statistical inference 1.4 Joint distributions
1.4.1 Joint CDF
Definition
A pair of random variables (X, Y) is a function from the sample space Ω into R². The joint CDF of (X, Y) is given by
F(X,Y)(x, y) = P(X ≤ x, Y ≤ y).
Properties of the joint CDF:
F(X,Y)(x, y) is nondecreasing in x and y.
F(X,Y)(x, y) is right-continuous in x and y.
(i) lim_{x→−∞} F(X,Y)(x, y) = 0 and lim_{y→−∞} F(X,Y)(x, y) = 0.
(ii) lim_{x,y→∞} F(X,Y)(x, y) = 1.
1 Probability framework for statistical inference 1.4 Joint distributions
1.4.2 Joint PDF
Definition
The joint PDF f(X,Y)(x, y) of (X, Y) is given by
f(X,Y)(x, y) = ∂²/∂x∂y F(X,Y)(x, y)
if (X, Y) is continuous, and by
f(X,Y)(x, y) = P(X = xi ∩ Y = yj)
if (X, Y) is discrete.
Properties of f(X,Y)(x, y) for continuous random variables:
f(X,Y)(x, y) ≥ 0 for all x and y.
∫_{−∞}^{∞} ∫_{−∞}^{∞} f(X,Y)(x, y) dx dy = 1.
F(X,Y)(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f(X,Y)(u, v) dv du.
1 Probability framework for statistical inference 1.4 Joint distributions
1.4.3 Marginal distributions
Definition
Consider the bivariate continuous random variable (X, Y). The marginal distribution of X is given by
FX(x) = P(X ≤ x) = lim_{y→∞} F(X,Y)(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{∞} f(X,Y)(u, y) dy du.

Definition
Consider the bivariate continuous random variable (X, Y). The marginal density of X is given by
fX(x) = d/dx FX(x) = ∫_{−∞}^{∞} f(X,Y)(x, y) dy.
1 Probability framework for statistical inference 1.4 Joint distributions
1.4.4 Statistical independence
Definition The random variables X and Y with the joint CDF F(X,Y) (x, y) are statistically independent if F(X,Y) (x, y) = FX (x) · FY (y)
for all x, y ∈ R.
Interpretation: The statistical independence of X and Y is equivalent to the statistical independence of the events AX = {X ≤ x} and BY = {Y ≤ y} for all x, y ∈ R. Equivalently, X and Y are statistically independent if f(X,Y) (x, y) = fX (x) · fY (y) for all x, y ∈ R.
1 Probability framework for statistical inference 1.4 Joint distributions
1.4.5 Expectation and Covariance
Definition
For any real-valued function g(x, y),
E[g(X, Y)] = Σi Σj g(xi, yj) P(X = xi, Y = yj), if X, Y are discrete;
E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(X,Y)(x, y) dx dy, if X, Y are continuous.

Definition
The covariance between X and Y is Cov(X, Y) = σXY = E[(X − µX)(Y − µY)].
The correlation between X and Y is Corr(X, Y) = ρXY = σXY / (σX σY).
1 Probability framework for statistical inference 1.4 Joint distributions
Properties of covariance and correlation
Corr(X, Y) ∈ [−1, 1] is a measure of linear dependence, free of units of measurement;
Cov(a + bX, c + dY) = bd · Cov(X, Y);
Cov(X, Y) = E[XY] − E[X]E[Y];
X, Y statistically independent ⇒ Cov(X, Y) = Corr(X, Y) = 0;
Cov(X, Y), Corr(X, Y) ≠ 0 ⇒ X, Y are statistically dependent;
Cov(X, Y) = Corr(X, Y) = 0 does not imply that X, Y are statistically independent.
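The last implication is easy to see numerically. The sketch below (not from the slides) uses X standard normal and Y = X², an illustrative pair that is uncorrelated but clearly dependent:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=1_000_000)     # symmetric around 0
y = x ** 2                         # a deterministic function of x, hence dependent on x

# Sample covariance and correlation are close to zero although Y is determined by X.
print(np.cov(x, y)[0, 1])          # ~0
print(np.corrcoef(x, y)[0, 1])     # ~0
```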
1 Probability framework for statistical inference 1.4 Joint distributions
Problem: Consider the bivariate random variable (X, Y) with density function
f(X,Y)(x, y) = e^(−x−y) if x ≥ 0 and y ≥ 0, and f(X,Y)(x, y) = 0 else.
i) Determine the marginal density functions fX(x) and fY(y).
ii) Check whether X and Y are statistically independent.
iii) Determine Cov(X, Y) without computation.
1 Probability framework for statistical inference 1.4 Joint distributions
Expectation and variance of X + Y and X − Y
Consider the random variables X and Y. The expectation of X + Y is given by
E(X + Y) = E(X) + E(Y).
The variance of X + Y is given by
Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)
and the variance of X − Y is given by
Var(X − Y) = Var(X) + Var(Y) − 2Cov(X, Y).
An implication is that if X and Y are independent, then Var(X + Y) = Var(X) + Var(Y), since Cov(X, Y) = 0.
Problem: Prove that Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y).
1 Probability framework for statistical inference 1.4 Joint distributions
Expectation and variance of specifically distributed random variables
Consider the random variables X ∼ B(n, p) and Y ∼ B(m, p) with X, Y independent. Then X + Y ∼ B(n + m, p) with E(X + Y) = (n + m)p and Var(X + Y) = (n + m)p(1 − p).
Consider the random variables X ∼ N(µX, σX²) and Y ∼ N(µY, σY²) with X, Y independent. Then X + Y ∼ N(µX + µY, σX² + σY²) with E(X + Y) = µX + µY and Var(X + Y) = σX² + σY².
1 Probability framework for statistical inference 1.5 Conditional Distributions
Definition
Consider the random variables X and Y. The conditional density of Y given X = x is
fY|X(y|X = x) = f(X,Y)(x, y) / fX(x), for fX(x) > 0.
The conditional mean or conditional expectation of Y given X = x is given by
m(x) = E(Y|X = x) = ∫_{−∞}^{∞} y · fY|X(y|X = x) dy.
The conditional variance of Y given X = x is given by
σ²(x) = Var(Y|X = x) = E[(Y − m(x))² | X = x].
Evaluated at X = x, the conditional mean m(x) and conditional variance σ²(x) are realized values of the random variables m(X) = E(Y|X) and σ²(X) = Var(Y|X).
1 Probability framework for statistical inference 1.5 Conditional Distributions
Laws of iterated expectations
Simple law of iterated expectations: E(E(Y|X)) = E(Y)
Extended law of iterated expectations: E(E(Y|X, Z)|X) = E(Y|X)
Conditioning theorem: E(g(X)Y|X) = g(X)E(Y|X)
Problem: Prove the simple law of iterated expectations for continuous random variables X and Y.
2 Fundamentals of asymptotic theory
2.1 Convergence of random variables
2.2 Laws of large numbers
2.3 Central limit theorem
2.4 Asymptotic transformations
2 Fundamentals of asymptotic theory 2.1 Convergence of random variables
Inequalities:
Jensen's inequality: If g is a convex function, then g(E(X)) ≤ E(g(X)).
Chebyshev's inequality: P(|X − E(X)| ≥ ε) ≤ Var(X)/ε².
2 Fundamentals of asymptotic theory 2.1 Convergence of random variables
2.1.1 Convergence in distribution
Definition
A sequence {Xn} of i.i.d. random variables is said to converge in distribution to a random variable X if
lim_{n→∞} FXn(x) = FX(x)
for every number x ∈ R at which FX is continuous. Convergence in distribution is denoted as Xn →d X.
Properties:
Since FX(a) = P(X ≤ a), convergence in distribution means that the probability for Xn to be in a given range is approximately equal to the probability that the value of X is in that range, provided n is sufficiently large.
2 Fundamentals of asymptotic theory 2.1 Convergence of random variables
2.1.2 Convergence in probability
Definition
A sequence {Xn} of i.i.d. random variables converges in probability towards X if for all ε > 0
lim_{n→∞} P(|Xn − X| ≥ ε) = 0.
Convergence in probability is denoted as Xn →p X.
Properties:
Let Pn be the probability that Xn is outside the ball of radius δ centered at X. Then, for Xn to converge in probability to X, for every δ > 0 and ε > 0 there must exist a number Nδ such that for all n ≥ Nδ the probability Pn is less than ε.
Convergence in probability implies convergence in distribution.
2 Fundamentals of asymptotic theory 2.1 Convergence of random variables
2.1.3 Convergence in square mean
Definition
A sequence {Xn} of i.i.d. random variables converges in square mean towards X if E|Xn|² < ∞ for all n and
lim_{n→∞} E[(Xn − X)²] = 0,
where E denotes the expected value. Convergence in square mean is denoted as Xn →L² X.
Properties:
Convergence in square mean tells us that the expectation of the square of the difference between Xn and X converges to zero.
Convergence in square mean implies convergence in probability, and hence implies convergence in distribution.
2 Fundamentals of asymptotic theory 2.2 Laws of large numbers
2.2 Law of large numbers
Definition
The weak law of large numbers states that the sample average X̄n of a sequence of i.i.d. random variables converges in probability towards the expected value µX:
X̄n →p µX as n → ∞.
Properties: The weak law states that for a specified large n, the average X̄n is likely to be near µX. Thus, it leaves open the possibility that |X̄n − µX| > ε happens an infinite number of times.
Problem: Prove the weak law of large numbers.
2 Fundamentals of asymptotic theory 2.3 Central limit theorem
2.3 Central limit theorem
Definition
Let {Xn} be a sequence of i.i.d. random variables with mean µX and variance σX². The central limit theorem states that the standardised sample average
Zn = (X̄n − µX) / (σX/√n)
converges in distribution to the standard normal distribution as n approaches infinity, that is,
Zn →d N(0, 1).
Problem: Show that the binomially distributed random variable Xn = Σ_{i=1}^{n} Xi, with Xi ∼ Be(p), converges in distribution (after standardisation) towards a normally distributed random variable.
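An illustrative simulation sketch for this problem (not part of the slides): sums of Be(p) variables, standardised with mean np and variance np(1 − p), behave like N(0, 1) draws for large n.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, p, reps = 1_000, 0.3, 50_000

# X_n = sum of n Bernoulli(p) variables, i.e. X_n ~ B(n, p).
x = rng.binomial(n, p, size=reps)

# Standardise with E(X_n) = np and Var(X_n) = np(1 - p).
z = (x - n * p) / np.sqrt(n * p * (1 - p))

# Compare simulated tail probabilities with standard normal ones.
for c in (1.0, 1.96):
    print(c, np.mean(z > c), 1 - stats.norm.cdf(c))
```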
2 Fundamentals of asymptotic theory 2.4 Asymptotic transformations
2.4.1 Continuous mapping theorem
Let {Xn} be a sequence of random variables, X a random variable, and g a continuous real-valued function. The continuous mapping theorem states that
i) Xn →d X ⇒ g(Xn) →d g(X);
ii) Xn →p X ⇒ g(Xn) →p g(X).

2.4.2 Slutsky's theorem
Let {Xn} and {Yn} be sequences of random variables. If {Xn} converges in distribution to the random variable X, and {Yn} converges in probability to the constant c, then Slutsky's theorem states that
i) Xn + Yn →d X + c;
ii) Yn Xn →d cX.
Problem: Consider the i.i.d. random variables X1, ..., Xn with mean µ and variance σ². Show that the sample variance Sn² = (1/n) Σ_{i=1}^{n} (Xi − X̄)² converges in probability to σ².
3 Point estimation
3 Point estimation 3.1 Fundamentals 3.2 Properties of an estimator
3 Point estimation 3.1 Fundamentals
3.1 Fundamentals
Starting point:
Independently and identically distributed random variables X1, ..., Xn;
Sample with sample size n: realized values x1, ..., xn with
sample mean x̄ = (1/n) Σ_{i=1}^{n} xi;
sample variance SX² = (1/n) Σ_{i=1}^{n} (xi − x̄)².
Notes:
X1, ..., Xn are randomly drawn such that sample mean and sample variance are random as well.
Population mean E(X) and variance Var(X) are unknown.
Sample mean x̄ and sample variance SX² are used to estimate the population mean E(X) and variance Var(X).
3 Point estimation 3.1 Fundamentals
Estimator and estimate
Definition
Let X1, ..., Xn be a sequence of i.i.d. random variables.
A function ϑ̂(X1, ..., Xn) is called an estimator for ϑ ∈ Θ ⊆ R.
The realized value ϑ̂(x1, ..., xn) of an estimator ϑ̂(X1, ..., Xn) is called the estimate for ϑ based on the sample x1, ..., xn.
Properties:
ϑ̂ is a random variable.
Θ is called the parameter space.
The mean Eϑ(ϑ̂) and variance Varϑ(ϑ̂) of ϑ̂ depend on the true parameter ϑ.
3 Point estimation 3.2 Properties of an estimator
3.2 Properties of an estimator
Definition
An estimator ϑ̂ for an unknown parameter ϑ is
unbiased if E(ϑ̂) = ϑ for all ϑ ∈ Θ;
asymptotically unbiased if lim_{n→∞} E(ϑ̂n) = ϑ;
consistent if ϑ̂n converges in probability to ϑ, that is, for all ε > 0 we have
lim_{n→∞} P(|ϑ̂n − ϑ| < ε) = 1;
efficient if it is unbiased and has the smallest variance of all unbiased estimators.
3 Point estimation 3.2 Properties of an estimator
Problems:
Show that an estimator ϑ̂n with
lim_{n→∞} E(ϑ̂n) = ϑ and lim_{n→∞} Var(ϑ̂n) = 0
is consistent for ϑ.
Consider the following three estimators for E(X):
a) µ̂a = (1/n) Σ_{i=1}^{n} Xi
b) µ̂b = (1/(n−2)) Σ_{i=1}^{n} Xi
c) µ̂c = (1/3) Σ_{i=1}^{3} Xi
Evaluate the properties of the estimators.
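A simulation sketch (not from the slides) that compares the three estimators for normally distributed data with illustrative parameters; the printed bias and variance show how each estimator behaves as n grows:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, reps = 5.0, 2.0, 20_000           # illustrative population parameters

for n in (10, 100, 1000):
    x = rng.normal(mu, sigma, size=(reps, n))
    mu_a = x.mean(axis=1)                    # (1/n) * sum of all observations
    mu_b = x.sum(axis=1) / (n - 2)           # (1/(n-2)) * sum of all observations
    mu_c = x[:, :3].mean(axis=1)             # (1/3) * (X1 + X2 + X3)
    for name, est in (("a", mu_a), ("b", mu_b), ("c", mu_c)):
        print(n, name, "bias:", round(est.mean() - mu, 4), "var:", round(est.var(), 4))
```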
3 Point estimation 3.2 Unbiased estimators
Unbiased estimators
Consider the sequences X1, ..., Xn and Y1, ..., Yn of i.i.d. random variables with E(X) = µX, Var(X) = σX² and E(Y) = µY, Var(Y) = σY². Unbiased estimators are
X̄ = (1/n) Σ_{i=1}^{n} Xi for the population mean E(X);
S′X² = (1/(n−1)) Σ_{i=1}^{n} (Xi − X̄)² for the population variance Var(X);¹
SXY = (1/(n−1)) Σ_{i=1}^{n} (Xi − X̄)(Yi − Ȳ) for the covariance between X and Y.

¹ Note that SX² = (1/n) Σ_{i=1}^{n} (Xi − X̄)² is not unbiased but asymptotically unbiased.
4 Confidence intervals
4 Confidence intervals 4.1 Fundamentals 4.2 Confidence intervals for the population mean
4 Confidence intervals 4.1 Fundamentals
4.1 Fundamentals
Starting point:
So far: point estimators ϑ̂(X1, ..., Xn); for continuous random variables P(ϑ̂(X1, ..., Xn) = ϑ) = 0.
Now: derivation of intervals [ϑ̂l, ϑ̂u] such that
P(ϑ ∈ [ϑ̂l, ϑ̂u]) = 1 − α.
The interval [ϑ̂l, ϑ̂u] is called a confidence interval and covers the true parameter ϑ with probability 1 − α, i.e., in (1 − α) · 100 % of repeated samples.
Since we are interested in confidence intervals for the population mean, we have to know the sampling distribution of the estimator.
4 Confidence intervals 4.1 Fundamentals
Sampling distributions

Definition
Consider the sequence X1, ..., Xn of random variables with Xi i.i.d. ∼ N(µ, σ²).
X̄n = (1/n) Σ_{i=1}^{n} Xi is a point estimator for µ with
X̄n ∼ N(µ, σ²/n) and Z = (X̄n − µ)/(σ/√n) ∼ N(0, 1).
Z is called the Gauss statistic.

Definition
Consider the sequence X1, ..., Xn of random variables with Xi i.i.d. ∼ N(0, 1). Then
Y = Σ_{i=1}^{n} Xi² ∼ χ²(n).
We say that Y is χ²-distributed with n degrees of freedom.
4 Confidence intervals 4.1 Fundamentals
Definition
Consider the random variables X ∼ N(0, 1) and Y ∼ χ²(n), with X, Y independent. Then the distribution of the random variable T with
T = X / √(Y/n)
is called the t-distribution with n degrees of freedom.

Definition
Consider the sequence X1, ..., Xn of random variables with Xi i.i.d. ∼ N(µ, σ²). Then
t = (X̄n − µ)/(S′/√n) = (X̄n − µ)/(S/√(n − 1)) ∼ t(n − 1).
The statistic is called the t-statistic.
4 Confidence intervals 4.1 Fundamentals
Definition
Consider the random variables X ∼ χ²(m) and Y ∼ χ²(n), with X, Y independent. Then the distribution of the random variable F with
F = (X/m) / (Y/n)
is called the F-distribution with m and n degrees of freedom.
4 Confidence intervals 4.2 Confidence intervals for the population mean
4.2 Confidence intervals for the population mean
Problem:
Consider the sequence X1, ..., Xn of random variables with Xi i.i.d. ∼ N(µ, σ²). Derive a central (1 − α)-confidence interval for µ
for σ² known;
for σ² unknown.
4 Confidence intervals 4.2 Confidence intervals for the population mean
Problem: Consider the analysis of the gas consumption X (in litres per 100 km) of a new car. Assume that X is normally distributed with E(X) = µ and Var(X) = σ². Based on n = 16 test runs, the following values are observed:
3.3; 4.1; 3.5; 4.0; 4.0; 3.6; 2.9; 3.1; 3.8; 4.1; 3.7; 4.2; 3.9; 3.5; 3.6; 3.0.
Compute a central confidence interval (α = 0.05) for µ, assuming that σ² = 0.15 is known.
Compute a central confidence interval (α = 0.05) for µ, assuming that σ² is unknown.
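A computational sketch of both intervals (not part of the slides), using the normal quantile for the known-variance case and the t(n − 1) quantile with S′ for the unknown-variance case:

```python
import numpy as np
from scipy import stats

x = np.array([3.3, 4.1, 3.5, 4.0, 4.0, 3.6, 2.9, 3.1,
              3.8, 4.1, 3.7, 4.2, 3.9, 3.5, 3.6, 3.0])
n, alpha = len(x), 0.05
xbar = x.mean()

# Case 1: sigma^2 = 0.15 known -> standard normal quantile.
sigma = np.sqrt(0.15)
z = stats.norm.ppf(1 - alpha / 2)
print(xbar - z * sigma / np.sqrt(n), xbar + z * sigma / np.sqrt(n))

# Case 2: sigma^2 unknown -> S' (ddof=1) and the t(n-1) quantile.
s = x.std(ddof=1)
t = stats.t.ppf(1 - alpha / 2, df=n - 1)
print(xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n))
```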
4 Confidence intervals 4.2 Confidence intervals for the population mean
Comments on the t-distribution:
If Xi i.i.d. ∼ N(µ, σ²), then the t-distribution is the finite-sample distribution of the t-statistic. Construction of exact confidence intervals for each sample size is possible.
For n → ∞, the difference between the t-distribution and the N(0, 1) quantiles is negligible. The t-distribution is only relevant when the sample size is small. (However, for the t-distribution to be correct, you must be sure that the population distribution of Xi is normal.)
5 Statistical hypothesis testing
5 Statistical hypothesis testing 5.1 Fundamentals 5.2 Tests for the population mean
5 Statistical hypothesis testing 5.1 Fundamentals
5.1 Fundamentals Starting point: Independently and identically distributed random variables X1 , . . . , Xn ; Sample with sample size n and realized values x1 , . . . , xn ; Approach: Make a decision, based on the sample X1 , ..., Xn , whether a hypothesis concerning the parameter ϑ has to be rejected or cannot be rejected.
Definition
Let ϑ be the unknown parameter, Θ the parameter space, and Θ0, Θ1 a partition of Θ. The test problem can be written as
H0 : ϑ ∈ Θ0   against   H1 : ϑ ∈ Θ1,
where H0 is the null hypothesis and H1 is the alternative hypothesis.
5 Statistical hypothesis testing 5.1 Fundamentals
Central idea of hypothesis testing:
Starting point is the unbiased estimator ϑ̂ = ϑ̂(X1, ..., Xn) with known distribution.
Based on the conditional distribution of ϑ̂ (conditional on H0), we derive a decision rule which evaluates whether the realized sample x1, ..., xn is compatible with H0 or not.
Under validity of H0, we determine the region of acceptance (A) and the region of rejection (R) for the conditional distribution of ϑ̂ such that
P(ϑ̂ ∈ region of rejection | H0 is true) = α.
α is called the level of significance.
Based on ϑ̂ we derive the test statistic V = V(X1, ..., Xn | H0), which is compared with the region of rejection. For v(x1, ..., xn) ∈ R we reject H0; for v(x1, ..., xn) ∈ A we do not reject H0.
5 Statistical hypothesis testing 5.1 Fundamentals
Construction of a test:
1. Formulate the hypotheses H0 and H1;
2. Determine the level of significance α;
3. Choose the test statistic V = V(X1, ..., Xn) with known distribution fV(v|H0);
4. Determine the region of rejection R with P(V ∈ R|H0) = α;
5. Decision rule: Reject H0 ⇔ v(x1, ..., xn) ∈ R.
5 Statistical hypothesis testing 5.1 Fundamentals
Type I and type II errors: Type I error: rejection of H0 even though H0 is true; Type II error: no rejection of H0 even though H0 is false; Probability of a type I error: α = P(V(X1 , ..., Xn ) ∈ R|H0 is true); Probability of a type II error: β = P(V(X1 , ..., Xn ) ∈ A|H0 is false). p-value The p-value is the probability of drawing a statistic that is at least as adverse to the null hypothesis as the value actually computed with the data of the sample, assuming that the null hypothesis is true.
5 Statistical hypothesis testing 5.2 Tests for the population mean
5.2 Tests for the population mean
Let X1, ..., Xn be i.i.d. N(µ, σ²).
Case 1: Test for unknown µ; σ² known. Two-sided test.
1. H0 : µ = µ0 against H1 : µ ≠ µ0;
2. e.g. α = 5 %;
3. If H0 is true, then
V = ((X̄ − µ0)/σ) · √n ∼ N(0, 1);
4. Region of rejection: Reject H0 if |X̄ − µ0| is large,
α = P(|V| > z_{1−α/2});
5. Reject H0 ⇔ |v(x1, ..., xn)| > z_{1−α/2}.
5 Statistical hypothesis testing 5.2 Tests for the population mean
Case 2: Test for unknown µ; σ² unknown. Two-sided test.
1. H0 : µ = µ0 against H1 : µ ≠ µ0;
2. e.g. α = 5 %;
3. If H0 is true, then
V = ((X̄ − µ0)/S′) · √n ∼ t(n − 1);
4. Region of rejection: Reject H0 if |X̄ − µ0| is large,
α = P(|V| > t_{1−α/2}(n − 1));
5. Reject H0 ⇔ |v(x1, ..., xn)| > t_{1−α/2}(n − 1).
5 Statistical hypothesis testing
Problem: A producer of ball bearings knows from long-standing experience that the size X of the produced balls is normally distributed. The producer guarantees a size of µ0 = 2 cm and a standard deviation of σ0 = 0.16 cm. A random sample of n = 25 balls has yielded x̄ = 1.89 cm and S = 0.2 cm. Check whether the producer's claim concerning the size of the balls can be maintained at the five percent level.
Assume that the producer's information on σ is correct.
Assume that σ is not known.
Compute the p-values of both test statistics.
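A computational sketch of both tests and their p-values (not from the slides); it treats the reported S = 0.2 cm as the estimate entering the t-statistic with n − 1 degrees of freedom:

```python
import numpy as np
from scipy import stats

n, mu0, xbar = 25, 2.0, 1.89

# Case 1: sigma = 0.16 known -> Gauss test.
sigma = 0.16
v1 = (xbar - mu0) / sigma * np.sqrt(n)
p1 = 2 * (1 - stats.norm.cdf(abs(v1)))
print(v1, p1)

# Case 2: sigma unknown -> t-test with the reported S = 0.2.
s = 0.2
v2 = (xbar - mu0) / s * np.sqrt(n)
p2 = 2 * (1 - stats.t.cdf(abs(v2), df=n - 1))
print(v2, p2)
```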
5 Statistical hypothesis testing 5.2 Tests for the population mean
Difference-in-mean test
Let X1, ..., Xn be i.i.d. N(µX, σX²) and Y1, ..., Ym be i.i.d. N(µY, σY²), X, Y independent, with
X̄ − Ȳ ∼ N(µX − µY, σX²/n + σY²/m).
Case 1: σX², σY² known.
1. H0 : µX = µY against H1 : µX ≠ µY;
2. e.g. α = 5 %;
3. If H0 is true, then
V = (X̄ − Ȳ) / √(σX²/n + σY²/m) ∼ N(0, 1);
4. Region of rejection: α = P(|V| > z_{1−α/2});
5. Reject H0 ⇔ |v(x1, ..., xn, y1, ..., ym)| > z_{1−α/2}.
5 Statistical hypothesis testing 5.2 Tests for the population mean
Case 2: σX², σY² unknown.
1. H0 : µX = µY against H1 : µX ≠ µY;
2. e.g. α = 5 %;
3. If H0 is true and n, m ≥ 40, then due to the central limit theorem
V = (X̄ − Ȳ) / √(S′X²/n + S′Y²/m) ∼ N(0, 1) approximately;
4. Region of rejection: α = P(|V| > z_{1−α/2});
5. Reject H0 ⇔ |v(x1, ..., xn, y1, ..., ym)| > z_{1−α/2}.
5 Statistical hypothesis testing 5.2 Tests for the population mean
Problem: Consider the California test score data set of Stock and Watson (2007). Based on data for 420 districts, we analyse whether the class size has a significant effect on the test score of a student. We have the variables
district average of the test score: TS
student-teacher ratio: STR
Compare districts with "small" (STR < 20) and "large" (STR ≥ 20) class sizes:

Class size i | mean TS̄i | std. dev. S_TS,i | ni
small        | 657.4     | 19.4             | 238
large        | 650.0     | 17.9             | 182

Check whether there is a significant difference between TS̄small and TS̄large for α = 0.05. Compute the p-value.
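A sketch of the large-sample difference-in-mean test based on the summary statistics in the table (not part of the slides); the reported standard deviations are treated as S′ values:

```python
import numpy as np
from scipy import stats

# Summary statistics: small classes vs. large classes.
ts_small, s_small, n_small = 657.4, 19.4, 238
ts_large, s_large, n_large = 650.0, 17.9, 182

se = np.sqrt(s_small**2 / n_small + s_large**2 / n_large)
v = (ts_small - ts_large) / se
p_value = 2 * (1 - stats.norm.cdf(abs(v)))

alpha = 0.05
print(v, p_value, abs(v) > stats.norm.ppf(1 - alpha / 2))
```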
6 Fundamentals of matrix algebra
6 Fundamentals of matrix algebra 6.1 Basic principles 6.2 Multivariate statistics
6 Fundamentals of matrix algebra 6.1 Basic principles
Basic principles
A matrix A is an m × n rectangular array of numbers, written as
A =
[ a1,1  a1,2  ···  a1,n ]
[ a2,1  a2,2  ···  a2,n ]
[  ...   ...  ...   ... ]
[ am,1  am,2  ···  am,n ]
= (ai,j)i=1,...,m, j=1,...,n.
The transpose of a matrix, denoted A′, is obtained by flipping the matrix on its diagonal. Thus A′ = (aj,i)j=1,...,n, i=1,...,m.
Example:
A =
[ 1   2  3 ]
[ 0  −6  7 ]
A′ =
[ 1   0 ]
[ 2  −6 ]
[ 3   7 ]
6 Fundamentals of matrix algebra 6.1 Basic principles
Special matrices
A matrix A is
square if m = n;
symmetric if A = A′, which requires ai,j = aj,i;
diagonal if the off-diagonal elements are all zero, so that ai,j = 0 if i ≠ j;
upper (lower) triangular if all elements below (above) the diagonal equal zero.
An important diagonal matrix is the identity matrix, which has ones on the diagonal. The k × k identity matrix is denoted as
Ik =
[ 1  0  ···  0 ]
[ 0  1  ···  0 ]
[ ...      ... ]
[ 0  0  ···  1 ]
6 Fundamentals of matrix algebra 6.1 Basic principles
Basic operations
Matrix addition
A = (ai,j)i=1,...,m, j=1,...,n, B = (bi,j)i=1,...,m, j=1,...,n.
C = A + B = (ci,j)i=1,...,m, j=1,...,n = (ai,j + bi,j)i=1,...,m, j=1,...,n.
Example:
[ 1  3  1 ]   [ 0  0  5 ]   [ 1+0  3+0  1+5 ]   [ 1  3  6 ]
[ 1  0  0 ] + [ 7  5  0 ] = [ 1+7  0+5  0+0 ] = [ 8  5  0 ]

Scalar multiplication
A = (ai,j)i=1,...,m, j=1,...,n, λ ∈ R.
λ · A = (λ · ai,j)i=1,...,m, j=1,...,n.
Example:
4 · [ 1   2  3 ] = [ 4    8  12 ]
    [ 0  −6  7 ]   [ 0  −24  28 ]
6 Fundamentals of matrix algebra 6.1 Basic principles
Matrix multiplication
A = (ai,j)i=1,...,l, j=1,...,m, B = (bi,j)i=1,...,m, j=1,...,n.
C = A · B = (ci,j)i=1,...,l, j=1,...,n with ci,j = Σ_{k=1}^{m} ai,k · bk,j.
Example:
[  1  0  2 ]   [ 3  1 ]   [ 5  1 ]
[ −1  3  1 ] · [ 2  1 ] = [ 4  2 ]
               [ 1  0 ]
with
5 = 1 · 3 + 0 · 2 + 2 · 1;  1 = 1 · 1 + 0 · 1 + 2 · 0;
4 = −1 · 3 + 3 · 2 + 1 · 1;  2 = −1 · 1 + 3 · 1 + 1 · 0.
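The matrix examples above can be reproduced with numpy (a cross-checking sketch, not part of the slides):

```python
import numpy as np

A = np.array([[1, 2, 3], [0, -6, 7]])
print(A.T)                                   # transpose

B = np.array([[1, 3, 1], [1, 0, 0]])
C = np.array([[0, 0, 5], [7, 5, 0]])
print(B + C)                                 # matrix addition -> [[1, 3, 6], [8, 5, 0]]
print(4 * A)                                 # scalar multiplication

D = np.array([[1, 0, 2], [-1, 3, 1]])
E = np.array([[3, 1], [2, 1], [1, 0]])
print(D @ E)                                 # matrix product -> [[5, 1], [4, 2]]
```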
6 Fundamentals of matrix algebra 6.1 Basic principles
Properties
Matrix addition
i) A + B = B + A
ii) (A + B) + C = A + (B + C)

Matrix multiplication
i) (A · B) · C = A · (B · C)
ii) in general A · B ≠ B · A. Example:
[ 1  2 ]   [ 0  1 ]   [ 0  1 ]         [ 0  1 ]   [ 1  2 ]   [ 3  4 ]
[ 3  4 ] · [ 0  0 ] = [ 0  3 ] ,  but  [ 0  0 ] · [ 3  4 ] = [ 0  0 ] .
iii) A · (B + C) = A · B + A · C and (B + C) · A = B · A + C · A
iv) Multiplication with the identity matrix: for any m × n matrix M, M · In = Im · M = M.
v) A matrix A is called idempotent if A · A = A.
vi) Transpose of a product: (A · B · C)′ = C′ · B′ · A′
6 Fundamentals of matrix algebra 6.1 Basic principles
Quadratic form
Consider a symmetric matrix A ∈ R^(n×n) and a vector x ∈ R^(n×1). The expression
x′Ax = Σ_{i=1}^{n} Σ_{j=1}^{n} xi ai,j xj
is called a quadratic form.
Implications:
A is positive definite if x′Ax > 0 for all x ≠ 0.
A is negative definite if x′Ax < 0 for all x ≠ 0.
Problem: Show that the matrix
A =
[  2  −1   0 ]
[ −1   2  −1 ]
[  0  −1   2 ]
is positive definite.
6 Fundamentals of matrix algebra 6.1 Basic principles
Rank and inverse of a matrix
The rank of the m × n matrix (n ≤ m)
A = (a1, ..., an)
is the number of linearly independent columns aj and is written as rank(A). A has full rank if rank(A) = n.
Properties:
A square k × k matrix A is said to be nonsingular if it has full rank, i.e. rank(A) = k. This means that there is no k × 1 vector c ≠ 0 such that Ac = 0.
If a square k × k matrix A is nonsingular, then there exists a unique k × k matrix A⁻¹, called the inverse of A, which satisfies AA⁻¹ = A⁻¹A = Ik.
If A is positive or negative definite, then A is nonsingular.
Problem: Compute the inverse of the matrix
A =
[ 8  2 ]
[ 2  1 ]
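A numerical sketch for the two problems above (not from the slides): positive definiteness of a symmetric matrix can be checked via its eigenvalues, and the inverse via numpy.linalg.inv:

```python
import numpy as np

# Positive definiteness: a symmetric matrix is positive definite
# iff all its eigenvalues are strictly positive.
A = np.array([[2, -1, 0], [-1, 2, -1], [0, -1, 2]])
print(np.linalg.eigvalsh(A))                 # all > 0

# Inverse of the 2x2 matrix from the problem.
B = np.array([[8, 2], [2, 1]])
B_inv = np.linalg.inv(B)
print(B_inv)
print(B @ B_inv)                             # ~ identity matrix I_2
```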
6 Fundamentals of matrix algebra 6.1 Basic principles
Trace of a matrix
The trace of a k × k square matrix A is defined to be the sum of the elements on the main diagonal, i.e.,
tr(A) = Σ_{i=1}^{k} ai,i.
Properties for square matrices A and B and real λ:
tr(λA) = λ tr(A);
tr(A′) = tr(A);
tr(A + B) = tr(A) + tr(B);
tr(Ik) = k;
If A is an m × n matrix and B is an n × m matrix, then tr(AB) = tr(BA).
6 Fundamentals of matrix algebra 6.2 Multivariate statistics
6.2 Multivariate statistics Mean vector
Definition
Consider an n-dimensional random vector x′ = (x1, ..., xn). The vector µ′ = (µ1, ..., µn) with
µi = E(xi) = ∫_R xi fxi(xi) dxi
is called the mean vector of x.
6 Fundamentals of matrix algebra 6.2 Multivariate statistics
Covariance matrix
Definition
Consider an n-dimensional random vector x′ = (x1, ..., xn). The covariance matrix is given by
Σ = cov(x) = E((x − µ)(x − µ)′)
=
[ E[(x1 − µ1)²]          E[(x1 − µ1)(x2 − µ2)]  ···  E[(x1 − µ1)(xn − µn)] ]
[ E[(x2 − µ2)(x1 − µ1)]  E[(x2 − µ2)²]          ···  E[(x2 − µ2)(xn − µn)] ]
[ ...                    ...                    ...  ...                   ]
[ E[(xn − µn)(x1 − µ1)]  E[(xn − µn)(x2 − µ2)]  ···  E[(xn − µn)²]         ]
Properties of Σ:
Σ = E(xx′) − µµ′;
The diagonal entries are the variances E((xi − µi)²) =: σi², i = 1, ..., n;
The off-diagonal entries are the covariances E((xi − µi)(xj − µj)) =: σij, i, j = 1, ..., n; i ≠ j.
6 Fundamentals of matrix algebra 6.2 Multivariate statistics
Covariance matrix of two random vectors X and Y
Definition
Consider an n-dimensional random vector x′ = (x1, ..., xn) with E(x) = µ and an m-dimensional random vector y′ = (y1, ..., ym) with E(y) = ν. The covariance between x and y is defined as
Σxy = cov(x, y) = E((x − µ)(y − ν)′)
=
[ E[(x1 − µ1)(y1 − ν1)]  ···  E[(x1 − µ1)(ym − νm)] ]
[ ...                    ...  ...                   ]
[ E[(xn − µn)(y1 − ν1)]  ···  E[(xn − µn)(ym − νm)] ]
Properties:
cov(x, y) contains the covariances between components of x and y;
cov(x) contains the covariances between components of x;
Defining the vector z′ := (x′ y′) yields all covariances, that is,
cov(z) =
[ cov(x)     cov(x, y) ]   [ Σx    Σxy ]
[ cov(y, x)  cov(y)    ] = [ Σyx   Σy  ]
6 Fundamentals of matrix algebra 6.2 Multivariate statistics
Mean vector and covariance matrix of a linear combination
Consider an n-dimensional random vector x′ = (x1, ..., xn) with mean vector µ, covariance matrix Σ, and a vector of constants a′ = (a1, ..., an). z = a′x is a random scalar with
z = a′x = a1x1 + a2x2 + ... + anxn.
Hence the mean of z = a′x is given by
E(z) = a1E(x1) + a2E(x2) + ... + anE(xn) = a′E(x) = a′µ.
The variance is
Var(z) = E((z − E(z))²) = E((a′x − a′µ)²) = E((a′(x − µ))²).
Since a′(x − µ) is a scalar, it is identical to (x − µ)′a. Hence,
Var(z) = E(a′(x − µ)(x − µ)′a) = a′E((x − µ)(x − µ)′)a = a′Σa.
6 Fundamentals of matrix algebra 6.2 Multivariate statistics
Example: Consider the 2-dimensional random vector x′ = (x1, x2) with mean vector µ, covariance matrix Σ, and a vector of constants a′ = (a1, a2). Mean and variance of the linear combination are given by
µz = a1µ1 + a2µ2
and
Var(z) = a′Σa = (a1  a2) ·
[ σ1²   σ12 ]
[ σ21   σ2² ]
· (a1  a2)′ = a1²σ1² + a2²σ2² + 2a1a2σ12.
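A simulation sketch (not from the slides) verifying E(a′x) = a′µ and Var(a′x) = a′Σa for an illustrative choice of µ, Σ and a:

```python
import numpy as np

rng = np.random.default_rng(5)

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])       # illustrative covariance matrix
a = np.array([0.5, -1.5])                        # illustrative weights

x = rng.multivariate_normal(mu, Sigma, size=500_000)
z = x @ a

print(z.mean(), a @ mu)                          # E(a'x) = a'mu
print(z.var(ddof=1), a @ Sigma @ a)              # Var(a'x) = a'Sigma a
```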
6 Fundamentals of matrix algebra 6.2 Multivariate statistics
Mean vector and covariance matrix of a linear combination
Consider an n-dimensional random vector x′ = (x1, ..., xn) with mean vector µ, covariance matrix Σ, and an n × p matrix of constants A. z = A′x is a random vector.
The mean vector and the covariance matrix are given by
E(A′x) = A′µ,  Var(A′x) = A′ΣA.
Problem: Name the dimensions of A′µ and A′ΣA.
6 Fundamentals of matrix algebra 6.2 Multivariate statistics
Correlation matrix
Definition
Consider an n-dimensional random vector x′ = (x1, ..., xn) with mean vector µ and covariance matrix Σ. The correlation matrix is given by
ρ =
[ 1     ρ12   ···  ρ1n ]
[ ρ21   1     ···  ρ2n ]
[ ...   ...   ...  ... ]
[ ρn1   ρn2   ···  1   ]
Properties of ρ:
ρij := σij / (σi σj) ∈ [−1, 1], i, j = 1, ..., n.
6 Fundamentals of matrix algebra 6.2 Multivariate statistics
Relationship between Σ and ρ
Consider an n-dimensional random vector x′ = (x1, ..., xn) with mean vector µ, covariance matrix Σ and correlation matrix ρ. We define the matrices
D =
[ σ1   0    ···  0  ]
[ 0    σ2   ···  0  ]
[ ...  ...  ...  ...]
[ 0    0    ···  σn ]
and
D⁻¹ =
[ 1/σ1  0     ···  0    ]
[ 0     1/σ2  ···  0    ]
[ ...   ...   ...  ...  ]
[ 0     0     ···  1/σn ]
where σi is the standard deviation of the random variable xi.
The relationship between Σ and ρ is given by
Σ = DρD;  ρ = D⁻¹ΣD⁻¹.
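A small numpy sketch of the relation ρ = D⁻¹ΣD⁻¹ for an illustrative covariance matrix (not part of the slides):

```python
import numpy as np

Sigma = np.array([[4.0, 1.2, 0.0],
                  [1.2, 1.0, -0.3],
                  [0.0, -0.3, 2.25]])            # illustrative covariance matrix

D = np.diag(np.sqrt(np.diag(Sigma)))             # diag(sigma_1, ..., sigma_n)
D_inv = np.linalg.inv(D)

rho = D_inv @ Sigma @ D_inv                      # correlation matrix
print(rho)
print(D @ rho @ D)                               # recovers Sigma
```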
6 Fundamentals of matrix algebra 6.2 Multivariate statistics
Rank of Σ and ρ
Var(a′x) ≥ 0 for all a;
Since Var(a′x) = a′Σa, Σ is positive semi-definite;
Since D is nonsingular and ρ = D⁻¹ΣD⁻¹, ρ is positive semi-definite and rank(ρ) = rank(Σ) ≤ n;
For rank(Σ) = n, Σ is positive definite, since a′Σa > 0 for all a ≠ 0;
For rank(Σ) < n, there exists an a ≠ 0 such that a′x is a constant and hence a′Σa = 0. This indicates that Σ is positive semi-definite but not positive definite.
If rank(Σ) < n, at least one of the components of x is a linear combination of the other components. This means that the information of this variable is redundant, since it is already provided by the other variables.
6 Fundamentals of matrix algebra 6.2 Multivariate statistics
Further properties E(x + y) = E(x) + E(y); E(Ax + b) = AE(x) + b; cov(Ax + b) = Acov(x)A′ ; cov(Ax + a, By + b) = Acov(x, y)B′ .
6 Fundamentals of matrix algebra 6.2 Multivariate statistics
Fundamentals of the multivariate normal distribution
Definition
A continuous random vector x′ = (x1, ..., xn) with density
f(x1,...,xn)(x1, ..., xn) = (1 / ((2π)^(n/2) √det(Σ))) · exp(−(1/2) (x − µ)′ Σ⁻¹ (x − µ)),
where Σ is a positive definite n × n matrix, Σ⁻¹ is the inverse of Σ and det(Σ) is the determinant of Σ, is called n-variate normal.
Mean vector and covariance matrix of the normal distribution are given by µ and Σ. It is conventional to write x ∼ Nn(µ, Σ).
The transformation y = Σ^(−1/2)(x − µ) is called the standardization of x. We write y ∼ Nn(0, In).
If x ∼ Nn(µ, Σ) and y = Ax + b, where A ∈ R^(m×n) and b ∈ R^(m×1), then y ∼ Nm(Aµ + b, AΣA′).
6 Fundamentals of matrix algebra 6.2 Multivariate statistics
Marginal distributions and conditional density
Consider the random vector z′ = (y′, x′) with x′ = (x1, ..., xn) and y′ = (y1, ..., ym), where
z = (y′, x′)′ ∼ Nm+n(µ, Σ) with µ = (µy′, µx′)′ and Σ =
[ Σy    Σyx ]
[ Σxy   Σx  ].
If Σx and Σy are positive definite, then
x ∼ Nn(µx, Σx), y ∼ Nm(µy, Σy),
that means, the marginal distributions are normal.
The conditional distribution of y|x is given by
y|x ∼ Nm(µy|x, Σy|x),
where
µy|x = µy + Σyx Σx⁻¹ (x − µx) = b0 + Bx;
Σy|x = Σy − Σyx Σx⁻¹ Σxy
with
b0 = µy − Bµx and B = Σyx Σx⁻¹.
6 Fundamentals of matrix algebra 6.2 Multivariate statistics
Example
Consider the random vector z′ = (y, x) where
z = (y, x)′ ∼ N2(µ, Σ) with µ = (µy, µx)′ and Σ =
[ Σy    Σyx ]
[ Σxy   Σx  ].
In this case, we have
Σy = σy², Σyx = σyx, Σxy = σxy, Σx = σx².
Hence, the distribution of y|x is given by
y|x ∼ N(µy|x, σ²y|x),
where
µy|x = µy − (σyx/σx²)·µx + (σyx/σx²)·x = β0 + β1 x,
with β0 = µy − (σyx/σx²)·µx and β1 = σyx/σx².
Remember: β0 and β1 are the coefficients of a linear regression of y on x.
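A simulation sketch (not from the slides): drawing (y, x) from a bivariate normal with illustrative parameters and fitting a least-squares line of y on x recovers β0 and β1 approximately:

```python
import numpy as np

rng = np.random.default_rng(6)

mu_y, mu_x = 2.0, 1.0
s_y2, s_x2, s_yx = 1.5, 2.0, 0.8                 # illustrative variances and covariance

mean = [mu_y, mu_x]
cov = [[s_y2, s_yx], [s_yx, s_x2]]
y, x = rng.multivariate_normal(mean, cov, size=500_000).T

beta1 = s_yx / s_x2
beta0 = mu_y - beta1 * mu_x
print(beta0, beta1)                              # theoretical coefficients

# Least-squares fit of y on x (slope and intercept).
b1_hat, b0_hat = np.polyfit(x, y, deg=1)
print(b0_hat, b1_hat)                            # close to beta0, beta1
```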