Pattern Recognition Prof. Christian Bauckhage
outline lecture 05

- recap
- discrete Markov chains
- basic probability theory
- summary
- exercises
recap

basic terms and concepts from linear algebra:
- vector space, inner product space, normed space
- linear combinations (convex, conic, affine, linear)
- span, linear independence, bases
- $L_p$ norms / distances for $\mathbb{R}^m$
- standard simplex in $\mathbb{R}^m$
discrete Markov chains
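stochastic vector

$q \in \mathbb{R}^m$ is a stochastic vector if $q \succeq 0$ and $\|q\|_1 = 1$
⇔ $q \in \mathbb{R}^m$ is stochastic if $q \in \Delta^{m-1}$

[figure: a stochastic vector $q$ in the standard simplex, with vertex $e_1$ marked]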
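The definition translates directly into a check. The following is a minimal sketch in plain Python, assuming vectors are given as lists of floats; the function name `is_stochastic_vector` and the tolerance `tol` are illustrative choices, not part of the lecture. The sum is compared with a small tolerance rather than exact equality because of floating-point rounding.

```python
import math

def is_stochastic_vector(q, tol=1e-12):
    """Return True if q is entrywise non-negative and its L1 norm equals 1."""
    nonnegative = all(qi >= 0 for qi in q)
    # for a non-negative vector, ||q||_1 is simply the sum of its entries
    return nonnegative and math.isclose(sum(q), 1.0, abs_tol=tol)

print(is_stochastic_vector([0.2, 0.5, 0.3]))   # True
print(is_stochastic_vector([1.0, 0.0, 0.0]))   # True: the vertex e_1 of the simplex
print(is_stochastic_vector([0.6, 0.6, -0.2]))  # False: sums to one but has a negative entry
```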
stochastic matrix

$P \in \mathbb{R}^{m \times n}$ is column (row) stochastic if each of its columns (rows) is a stochastic vector
$P \in \mathbb{R}^{m \times m}$ is bi-stochastic if it is both column and row stochastic

alternative terminology:
column stochastic ⇔ left stochastic
row stochastic ⇔ right stochastic
bi-stochastic ⇔ doubly stochastic
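A sketch of the three matrix definitions in plain Python, assuming a matrix is stored as a list of rows (so its columns are obtained with `zip(*P)`); the helper names and the example matrices are hypothetical.

```python
import math

def _is_stochastic(v, tol=1e-12):
    # non-negative entries that sum to one
    return all(x >= 0 for x in v) and math.isclose(sum(v), 1.0, abs_tol=tol)

def is_column_stochastic(P):
    """Every column of P (a list of rows) is a stochastic vector."""
    return all(_is_stochastic(col) for col in zip(*P))

def is_row_stochastic(P):
    """Every row of P is a stochastic vector."""
    return all(_is_stochastic(row) for row in P)

def is_bi_stochastic(P):
    # bi-stochasticity is only defined for square matrices
    return len(P) == len(P[0]) and is_column_stochastic(P) and is_row_stochastic(P)

P = [[0.5, 0.2],
     [0.5, 0.8]]      # columns sum to one, rows do not
B = [[0.3, 0.7],
     [0.7, 0.3]]      # rows and columns sum to one

print(is_column_stochastic(P), is_row_stochastic(P))  # True False
print(is_bi_stochastic(B))                             # True
```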
the literature typically considers row stochastic vectors and row stochastic matrices; in this course, we will consider column stochastic vectors and column stochastic matrices
conceptually, this is no big deal . . . live with it
⇒ from now on: stochastic matrix ⇔ column stochastic matrix
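The two conventions are related by transposition: a row stochastic matrix $P_{\text{row}}$ multiplied from the left by a row vector gives the same numbers as the column stochastic matrix $P = P_{\text{row}}^T$ multiplied from the right by a column vector. A small plain-Python sketch with a hypothetical 2 × 2 example:

```python
def transpose(A):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matvec(A, x):
    """Column convention of this course: r = A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def vecmat(x, A):
    """Row convention of the literature: r^T = x^T A."""
    return [sum(xi * A[i][j] for i, xi in enumerate(x)) for j in range(len(A[0]))]

# hypothetical row stochastic matrix: each row sums to one
P_row = [[0.9, 0.1],
         [0.4, 0.6]]
q = [0.25, 0.75]

r_row = vecmat(q, P_row)               # literature: r^T = q^T P_row
r_col = matvec(transpose(P_row), q)    # this course: r = P q with P = P_row^T
print(r_row, r_col)
```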
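Lemma: If $P \in \mathbb{R}^{m \times n}$ is a stochastic matrix and $q \in \mathbb{R}^n$ is a stochastic vector, then $r = P q \in \mathbb{R}^m$ is a stochastic vector.

Proof. Since $P_{ij} \geq 0$ and $q_j \geq 0$, every entry $r_i = \sum_j P_{ij} q_j$ is non-negative, so $r \succeq 0$ and $\|r\|_1 = \sum_i |r_i| = \sum_i r_i$. Moreover,
$$\|r\|_1 = \sum_i r_i = \sum_i \sum_j P_{ij} q_j = \sum_j q_j \sum_i P_{ij} = \sum_j q_j = 1$$
because every column of $P$ sums to one. ∎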
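The Lemma can be sanity-checked numerically. This sketch draws a random column stochastic matrix and a random stochastic vector (by normalizing non-negative random entries, an arbitrary choice for illustration) and checks that $r = Pq$ is again non-negative with unit $L_1$ norm. It is a check, not a proof.

```python
import random

def random_stochastic_vector(n, rng):
    # normalize non-negative random weights so they sum to one
    w = [rng.random() for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]

def random_stochastic_matrix(m, n, rng):
    """Column stochastic m x n matrix, stored as a list of rows."""
    cols = [random_stochastic_vector(m, rng) for _ in range(n)]
    return [list(row) for row in zip(*cols)]

def matvec(P, q):
    return [sum(p * qj for p, qj in zip(row, q)) for row in P]

rng = random.Random(0)
P = random_stochastic_matrix(4, 3, rng)
q = random_stochastic_vector(3, rng)
r = matvec(P, q)
print(min(r) >= 0, abs(sum(r) - 1.0) < 1e-12)   # True True
```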
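Lemma: If $P \in \mathbb{R}^{m \times k}$ and $Q \in \mathbb{R}^{k \times n}$ are stochastic matrices, then $R = PQ \in \mathbb{R}^{m \times n}$ is a stochastic matrix.

Proof. Writing $Q = [q_1, q_2, \ldots, q_n]$ and $R = [r_1, r_2, \ldots, r_n]$ in terms of their columns, we note that $R = PQ \Leftrightarrow r_i = P q_i$ for all $i$; since each $q_i$ is a stochastic vector, the previous Lemma shows that each $r_i$ is a stochastic vector. ∎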
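The product Lemma can be checked the same way: this sketch multiplies two randomly generated column stochastic matrices and inspects every column of $R = PQ$. Matrices are lists of rows; the sizes and the seed are arbitrary.

```python
import random

def random_stochastic_matrix(m, n, rng):
    """Column stochastic m x n matrix stored as a list of rows."""
    cols = []
    for _ in range(n):
        w = [rng.random() for _ in range(m)]
        s = sum(w)
        cols.append([x / s for x in w])
    return [list(row) for row in zip(*cols)]

def matmul(P, Q):
    k, n = len(Q), len(Q[0])
    return [[sum(P[i][l] * Q[l][j] for l in range(k)) for j in range(n)]
            for i in range(len(P))]

rng = random.Random(1)
P = random_stochastic_matrix(3, 5, rng)   # m x k
Q = random_stochastic_matrix(5, 2, rng)   # k x n
R = matmul(P, Q)                          # m x n

# every column r_i = P q_i of R should be a stochastic vector
for col in zip(*R):
    print(min(col) >= 0, round(sum(col), 12))
```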
stochastic matrices and vectors play a crucial role in Markov process models
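Markov chains

used to model systems that have $m$ possible states and, at any one time, are in one and only one of their $m$ states
the set $Q = \{q_1, \ldots, q_m\}$ of states is called the state space
state transitions happen according to certain probabilities
for instance, a Markov model of an SIR epidemic

[figure: state diagram S → I → R; S moves to I with probability $i$ and stays with probability $1-i$, I moves to R with probability $r$ and stays with probability $1-r$, R stays with probability 1]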
types of Markov chains

discrete-time Markov chain (DTMC) ⇔ a stochastic model that has the Markov property
$$p(X_{t+1} = q_{i_{t+1}} \mid X_t = q_{i_t}, \ldots, X_1 = q_{i_1}) = p(X_{t+1} = q_{i_{t+1}} \mid X_t = q_{i_t})$$

homogeneous discrete-time Markov chain ⇔ a discrete-time Markov chain such that
$$p(X_{t+1} = q_i \mid X_t = q_j) = p(X_t = q_i \mid X_{t-1} = q_j) = p(q_i \mid q_j) = p_{ij}$$
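To make the Markov property concrete, the following sketch samples a state sequence from a homogeneous DTMC using the column convention of this course, so the distribution of $X_{t+1}$ given $X_t = q_j$ is column $j$ of $P$. States are encoded as indices $0, \ldots, m-1$; the 3-state matrix is a hypothetical example.

```python
import random

def sample_chain(P, start, steps, rng):
    """Sample a state sequence from a homogeneous DTMC.

    P[i][j] = p(X_{t+1} = q_i | X_t = q_j) (column convention), so the
    distribution of the next state is column j of P. Only the current state
    is used to draw the next one, which is exactly the Markov property.
    """
    states = [start]
    m = len(P)
    for _ in range(steps):
        j = states[-1]
        column = [P[i][j] for i in range(m)]
        states.append(rng.choices(range(m), weights=column)[0])
    return states

# hypothetical 3-state chain; state 2 is absorbing (P[2][2] = 1)
P = [[0.5, 0.0, 0.0],
     [0.5, 0.7, 0.0],
     [0.0, 0.3, 1.0]]
print(sample_chain(P, start=0, steps=10, rng=random.Random(42)))
```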
Markov processes
the dynamics of a homogeneous DTMC are governed by
$$q_t = P\, q_{t-1}$$
where $P$ ⇔ transition matrix and $q$ ⇔ state vector, such that $p_{ij}$ ⇔ $p(i \leftarrow j)$ and $q_i$ ⇔ $p(i)$
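The update $q_t = P q_{t-1}$ propagates the whole state distribution rather than a single sampled state. A plain-Python sketch with a hypothetical two-state transition matrix (state 0 = sunny, state 1 = rainy):

```python
def step(P, q):
    """One step of q_t = P q_{t-1}, where P[i][j] = p(i <- j)."""
    return [sum(P[i][j] * q[j] for j in range(len(q))) for i in range(len(P))]

def evolve(P, q0, T):
    """Return the list [q_0, q_1, ..., q_T]."""
    qs = [q0]
    for _ in range(T):
        qs.append(step(P, qs[-1]))
    return qs

# hypothetical column stochastic transition matrix
P = [[0.9, 0.5],
     [0.1, 0.5]]
for t, q in enumerate(evolve(P, [1.0, 0.0], 5)):
    print(t, [round(x, 4) for x in q])
```

For this particular $P$, the state vector approaches $(5/6, 1/6)^T$, the vector satisfying $q = Pq$, and every $q_t$ stays stochastic, as the first Lemma guarantees.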
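example: Markov model of an SIR epidemic

$$\begin{bmatrix} S_t \\ I_t \\ R_t \end{bmatrix} = \begin{bmatrix} 1-i & 0 & 0 \\ i & 1-r & 0 \\ 0 & r & 1 \end{bmatrix} \begin{bmatrix} S_{t-1} \\ I_{t-1} \\ R_{t-1} \end{bmatrix}$$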
example: Markovian SIR dynamics

[figure: $S_t$, $I_t$, $R_t$ plotted over time for $i = \frac{3}{4}, r = \frac{1}{2}$ and for $i = \frac{1}{4}, r = \frac{1}{2}$]
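The SIR example can be simulated in a few lines. This sketch iterates the SIR transition matrix with exact fractions. The initial state $q_0 = (1, 0, 0)^T$ (everybody susceptible) and the horizon of 10 steps are assumptions, since the lecture does not state them; the parameter pairs $i = 3/4$ or $i = 1/4$ with $r = 1/2$ are likewise assumed.

```python
from fractions import Fraction

def sir_step(state, i, r):
    """One step of the Markovian SIR model q_t = P q_{t-1} with
    P = [[1-i, 0, 0], [i, 1-r, 0], [0, r, 1]]."""
    S, I, R = state
    return ((1 - i) * S, i * S + (1 - r) * I, r * I + R)

def simulate(i, r, T, q0=(1, 0, 0)):
    # assumption: the whole population starts out susceptible
    traj = [q0]
    for _ in range(T):
        traj.append(sir_step(traj[-1], i, r))
    return traj

for i in (Fraction(3, 4), Fraction(1, 4)):
    traj = simulate(i, Fraction(1, 2), T=10)
    peak_t = max(range(len(traj)), key=lambda t: traj[t][1])
    print(f"i = {i}: I_t peaks at t = {peak_t}, R_10 = {float(traj[-1][2]):.4f}")
```

Exact fractions make it easy to confirm that $S_t + I_t + R_t = 1$ holds at every step, as the column stochastic transition matrix guarantees.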
basic probability theory
degree of belief in the truth of various propositions
examples of propositions:
A = it will rain this afternoon
B = this is a fair coin
C = this coin is twice as likely to come up heads as tails
D = this image shows a face
$Y_i$ = party $i$ will win the upcoming election
3 requirements for consistent reasoning:
1) transitivity
2) closure
3) conditional probability
transitivity

if we believe X more than Y and Y more than Z, then we must believe X more than Z
⇒ implies an ordering
⇒ assign real numbers to beliefs ⇔ the larger the value associated with a proposition, the more we believe it
0 = prob(false) ⇔ disbelief
1 = prob(true) ⇔ certainty
closure

if we specify how much we believe that X is true, we implicitly specify our disbelief
⇒ sum rule
$$prob(X) + prob(\neg X) = 1$$
conditional probability
if we first state how much we believe that Y is true, and then state how much we believe that X is true given that Y is true, we implicitly specify how much we believe that both X and Y are true
⇒ product rule
$$prob(X, Y) = prob(X \mid Y)\, prob(Y)$$
the sum and product rules define the algebra of probability; further results can be derived from them
Bayes’ theorem
$$prob(X \mid Y) = \frac{prob(Y \mid X)\, prob(X)}{prob(Y)}$$
this is because
$$prob(X, Y) = prob(Y, X) \Leftrightarrow prob(X \mid Y)\, prob(Y) = prob(Y \mid X)\, prob(X)$$
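A worked instance of Bayes' theorem with hypothetical numbers: X = "this image shows a face" with prior prob(X) = 1/100, Y = "a face detector fires", prob(Y | X) = 95/100 and prob(Y | ¬X) = 5/100. The denominator is expanded as prob(Y) = prob(Y | X) prob(X) + prob(Y | ¬X) prob(¬X), which follows from the product and sum rules. Exact fractions avoid rounding; the function name `posterior` is an illustrative choice.

```python
from fractions import Fraction

def posterior(likelihood, prior, likelihood_not):
    """prob(X | Y) via Bayes' theorem for a binary proposition X.

    likelihood     = prob(Y | X)
    prior          = prob(X)
    likelihood_not = prob(Y | not X)
    """
    evidence = likelihood * prior + likelihood_not * (1 - prior)   # prob(Y)
    return likelihood * prior / evidence

# hypothetical numbers: X = "this image shows a face", Y = "a detector fires"
p = posterior(Fraction(95, 100), Fraction(1, 100), Fraction(5, 100))
print(p, float(p))
```

Even with a fairly reliable detector, the posterior is only 19/118 ≈ 0.16, because the prior is small.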
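applying the product rule and then the sum rule (for propositions conditioned on X) yields
$$prob(X, Y) + prob(X, \neg Y) = \big[prob(Y \mid X) + prob(\neg Y \mid X)\big]\, prob(X) = prob(X)$$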
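marginalization

let $\{Y_i\}_{i=1}^n$ be a set of mutually exclusive and exhaustive propositions (exactly one of them is true); then
$$\sum_{i=1}^n prob(Y_i \mid X) = 1 \quad \text{and} \quad \sum_{i=1}^n prob(X, Y_i) = prob(X)$$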
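Marginalization on a small discrete example: a hypothetical joint table over a binary X and a three-valued Y whose values are mutually exclusive and exhaustive. Summing the joint over Y gives prob(X), and the resulting conditionals prob(Y_i | X) sum to one.

```python
from fractions import Fraction as F

# hypothetical joint table prob(X = x, Y = y_i); Y has three mutually
# exclusive and exhaustive values, X is true/false
joint = {
    (True,  "y1"): F(1, 10), (True,  "y2"): F(2, 10), (True,  "y3"): F(1, 10),
    (False, "y1"): F(3, 10), (False, "y2"): F(1, 10), (False, "y3"): F(2, 10),
}
ys = ["y1", "y2", "y3"]

# marginalization: prob(X) = sum_i prob(X, Y_i)
p_X = sum(joint[(True, y)] for y in ys)

# conditionals prob(Y_i | X) = prob(X, Y_i) / prob(X) sum to one
cond = {y: joint[(True, y)] / p_X for y in ys}
print(p_X, cond, sum(cond.values()))
```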
towards the continuum
if there are infinitely many mutually exclusive and exhaustive possibilities (e.g. Y = height of a person), then
$$\int_{-\infty}^{\infty} prob(Y \mid X)\, dY = 1 \quad \text{and} \quad \int_{-\infty}^{\infty} prob(X, Y)\, dY = prob(X)$$
$prob(X, Y)$ is then, technically, a probability density function:
$$prob(X, y_1 \leq Y \leq y_2) = \int_{y_1}^{y_2} pdf(X, Y)\, dY$$
to get probabilities out of densities, we have to integrate
we will henceforth drop this distinction and simply write $p(X, Y)$ to indicate either $prob(X, Y)$ or $pdf(X, Y)$
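Integrating a density to obtain a probability, sketched for a one-dimensional case (X dropped) with a standard normal density and a simple midpoint rule; both the density and the quadrature are illustrative assumptions. `math.erf` provides the closed-form value for comparison.

```python
import math

def normal_pdf(y, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def prob_between(pdf, y1, y2, n=10_000):
    """Approximate the integral of pdf over [y1, y2] with the midpoint rule."""
    h = (y2 - y1) / n
    return h * sum(pdf(y1 + (k + 0.5) * h) for k in range(n))

approx = prob_between(normal_pdf, -1.0, 1.0)
exact = math.erf(1.0 / math.sqrt(2.0))   # closed form for the standard normal
print(approx, exact)
```

Density values themselves are not probabilities and may exceed 1; for example, `normal_pdf(0.0, sigma=0.1)` is about 3.99, whereas the integral over any interval stays within [0, 1].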
if X and Y are independent, then
$$p(X, Y) = p(X)\, p(Y)$$
because $p(X, Y) = p(X \mid Y)\, p(Y)$ and $p(X \mid Y) = p(X)$
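Independence can be checked on a discrete joint table by comparing $p(X, Y)$ with the product of its marginals. A sketch with two hypothetical coin experiments; the function name `is_independent` is illustrative.

```python
from fractions import Fraction as F
from itertools import product

def is_independent(joint, xs, ys):
    """Check p(x, y) == p(x) p(y) for every pair, with marginals computed from the joint."""
    p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(joint[(x, y)] == p_x[x] * p_y[y] for x, y in product(xs, ys))

xs, ys = ["H", "T"], ["H", "T"]
# two fair coins tossed separately: every outcome has probability 1/4
two_coins = {(x, y): F(1, 4) for x, y in product(xs, ys)}
# second coin always shows the same face as the first
copied = {("H", "H"): F(1, 2), ("H", "T"): F(0), ("T", "H"): F(0), ("T", "T"): F(1, 2)}
print(is_independent(two_coins, xs, ys), is_independent(copied, xs, ys))  # True False
```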
random variable

a variable $X$ whose value is subject to chance
it can assume different values, each according to an associated probability
to express that $X \in \mathbb{R}$ is distributed according to $p(x)$, we write $X \sim p(x)$

[figure: a density $p_X(x)$]
the average value of some function $f(x)$ under a distribution $p(x)$ is called the expectation of $f(x)$; we have
$$E[f(x)] = \sum_x f(x)\, p(x)$$
for discrete distributions and
$$E[f(x)] = \int f(x)\, p(x)\, dx$$
for continuous ones
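Both forms of the expectation can be illustrated in plain Python: the exact sum for a fair die, and a Monte Carlo average over samples $x \sim p(x)$, which approximates the sum or integral when it cannot be evaluated directly. The Monte Carlo estimator is added here for illustration and is not part of the lecture.

```python
import random
from fractions import Fraction

def expectation(f, pmf):
    """E[f(x)] = sum_x f(x) p(x) for a discrete distribution given as {x: p(x)}."""
    return sum(f(x) * p for x, p in pmf.items())

def mc_expectation(f, sampler, n, rng):
    """Monte Carlo estimate of E[f(x)]: average f over n samples x ~ p(x)."""
    return sum(f(sampler(rng)) for _ in range(n)) / n

die = {x: Fraction(1, 6) for x in range(1, 7)}   # fair six-sided die
print(expectation(lambda x: x, die))             # E[X]   = 7/2
print(expectation(lambda x: x * x, die))         # E[X^2] = 91/6

rng = random.Random(0)
est = mc_expectation(lambda x: x, lambda g: g.randint(1, 6), 100_000, rng)
print(est)   # close to 3.5
```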
special case

expectation of a random variable $X$
$$E[X] = \int x\, p(x)\, dx \equiv \mu$$

[figure panels: symmetric, unimodal distribution; skewed, unimodal distribution; multimodal distribution]
special cases
averaging a function of several variables
$$E[f(x, y)] = \iint f(x, y)\, p(x, y)\, dx\, dy$$
averaging a function of several variables over one variable (the result is a function of $y$)
$$E_x[f(x, y)] = \int f(x, y)\, p(x)\, dx$$
conditional expectation
$$E_x[f \mid y] = \int f(x)\, p(x \mid y)\, dx$$
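$$E\big[E[f(x)]\big] = E[f(x)]$$
because $E[f(x)]$ is a constant, so that
$$E\big[E[f(x)]\big] = \int \Big[\int f(x)\, p(x)\, dx\Big]\, p(z)\, dz = \int p(z)\, dz \int f(x)\, p(x)\, dx = \int f(x)\, p(x)\, dx = E[f(x)]$$
where the second-to-last step uses $\int p(z)\, dz = 1$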
the variability of $f(x)$ around the mean $E[f(x)]$ is called the variance of $f(x)$; we have
$$var[f(x)] = E\Big[\big(f(x) - E[f(x)]\big)^2\Big]$$
and note that
$$var[f] = E\big[f^2 - 2 f\, E[f] + E^2[f]\big] = E[f^2] - 2\, E[f]\, E[f] + E^2[f] = E[f^2] - E^2[f]$$
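The identity $var[f] = E[f^2] - E^2[f]$ can be checked exactly on a fair die with rational arithmetic; the helper names are hypothetical.

```python
from fractions import Fraction

def expectation(f, pmf):
    """E[f(x)] = sum_x f(x) p(x) for a discrete distribution {x: p(x)}."""
    return sum(f(x) * p for x, p in pmf.items())

def variance_definition(f, pmf):
    """var[f] = E[(f - E[f])^2]"""
    m = expectation(f, pmf)
    return expectation(lambda x: (f(x) - m) ** 2, pmf)

def variance_shortcut(f, pmf):
    """var[f] = E[f^2] - E^2[f]"""
    return expectation(lambda x: f(x) ** 2, pmf) - expectation(f, pmf) ** 2

def identity(x):
    # f(x) = x, so var[f] is the variance of the random variable itself
    return x

die = {x: Fraction(1, 6) for x in range(1, 7)}
print(variance_definition(identity, die), variance_shortcut(identity, die))  # 35/12 35/12
```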
special case

variance of a random variable $X$
$$var[X] = E\Big[\big(X - E[X]\big)^2\Big] = E\big[(X - \mu)^2\big] \equiv \sigma^2$$
variance of a random variable $X$ in integral form
$$var[X] = \int (x - \mu)^2\, p(x)\, dx = \int x^2\, p(x)\, dx - 2\mu \int x\, p(x)\, dx + \mu^2 \int p(x)\, dx = \int x^2\, p(x)\, dx - \mu^2 \equiv \sigma^2$$
once again

variance = expected squared deviation from the expected value
for two random variables $X$ and $Y$, we have
$$cov[X, Y] = E_{XY}\Big[\big(X - E[X]\big)\big(Y - E[Y]\big)\Big] = E_{XY}[XY] - E[X]\, E[Y]$$
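The two expressions for the covariance agree, as a small exact computation on a hypothetical discrete joint distribution shows:

```python
from fractions import Fraction as F

# hypothetical joint distribution p(x, y) over a few (x, y) pairs
joint = {(0, 0): F(1, 4), (1, 1): F(1, 2), (2, 1): F(1, 4)}

def E(g):
    """E_XY[g(X, Y)] = sum over (x, y) of g(x, y) p(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mx, my = E(lambda x, y: x), E(lambda x, y: y)
cov_def = E(lambda x, y: (x - mx) * (y - my))    # E[(X - E[X])(Y - E[Y])]
cov_short = E(lambda x, y: x * y) - mx * my      # E[XY] - E[X] E[Y]
print(cov_def, cov_short)
```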
covariance matrix
for two random vectors $x$ and $y$, we have
$$cov[x, y] = E_{xy}\Big[\big(x - E[x]\big)\big(y - E[y]\big)^T\Big] = E_{xy}[x y^T] - E[x]\, E[y]^T \equiv C$$
in particular, we have
$$cov[x, x] = E[x x^T] - \mu \mu^T \quad \text{where} \quad \mu = E[x]$$
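For random vectors, the same identity holds entrywise. This sketch treats a hypothetical set of four 2-D points as an empirical distribution (each point with probability 1/4), so expectations become averages, i.e. division by n rather than the n − 1 of the unbiased sample estimator.

```python
from fractions import Fraction as F

# hypothetical data: four 2-D vectors, each taken with probability 1/4
X = [(1, 2), (3, 3), (2, 5), (6, 6)]
n, d = len(X), len(X[0])

# mean vector mu = E[x]
mu = [F(sum(x[k] for x in X), n) for k in range(d)]

# definition: C = E[(x - mu)(x - mu)^T]
C_def = [[sum((x[a] - mu[a]) * (x[b] - mu[b]) for x in X) / n for b in range(d)]
         for a in range(d)]
# shortcut: C = E[x x^T] - mu mu^T
C_short = [[F(sum(x[a] * x[b] for x in X), n) - mu[a] * mu[b] for b in range(d)]
           for a in range(d)]
print(C_def == C_short, C_def)
```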
we now know about
basic terminology and concepts of probability theory
show that all of the following are indeed identical
$$p(X, Y, Z) = p(X \mid Y, Z)\, p(Y, Z) = p(X \mid Y, Z)\, p(Y \mid Z)\, p(Z) = p(Y \mid X, Z)\, p(Z \mid X)\, p(X) = p(Y, Z \mid X)\, p(X) = \ldots$$
show that, for a constant $c$ and a random variable $X$
a) $E[c + X] = c + E[X]$
b) $E[cX] = c\, E[X]$
c) $var[c + X] = var[X]$
d) $var[cX] = c^2\, var[X]$

show that, for two random variables $X$ and $Y$
a) $E[X + Y] = E[X] + E[Y]$

show that, for two independent random variables $X$ and $Y$
a) $E[XY] = E[X]\, E[Y]$