Probability and stochastic process is important in: The statistical modeling of
sources that generate the information;. The digitization of the source output;.
Wireless Information Transmission System Lab.
Chapter 2 Probability and Stochastic Processes
Institute of Communications Engineering
National Sun Yat-sen University
Table of Contents Probability Random Variables, Probability Distributions, and Probability Densities Functions of Random Variables Statistical Averages of Random Variables Some Useful Probability Distributions Upper Bounds on the Tail Probability Sums of Random Variables and the Central Limit Theorem
Stochastic Processes Statistical Averages Power Density Spectrum Response of a Linear Time-Invariant System to a Random Input Signal Sampling Theorem for Band-Limited Stochastic Processes Discrete-Time Stochastic Signals and Systems Cyclostationary Processes 2
Introduction Probability and stochastic process is important in: The statistical modeling of sources that generate the information; The digitization of the source output; The characterization of the channel through which the digital information is transmitted; The design of the receiver that processes the informationbearing signal from the channel; The evaluation of the performance of the communication system.
3
2.1 Probability Sample space or certain event of a die experiment:
S = {1,2,3,4,5,6} The six outcomes are the sample points of the experiment. An event is a subset of S, and may consist of any number of sample points. For example: A = {2,4}
The complement of the event A, denoted by A , consists of all the sample points in S that are not in A: A = {1,3,5,6} 4
2.1 Probability Two events are said to be mutually exclusive if they have no sample points in common – that is, if the occurrence of one event excludes the occurrence of the other. For example: A = {2,4}; B = {1,3,6} A and A are mutually exclusive events. The union (sum) of two events in an event that consists of all the sample points in the two events. For example: C = {1,2,3} D = B ∪ C = {1,2,3,6} A∪ A = S 5
2.1 Probability The intersection of two events is an event that consists of the points that are common to the two events. For example: E = B ∩ C = {1,3} When the events are mutually exclusive, the intersection is the null event, denoted as φ. For example: A∩ A = φ
6
2.1 Probability Associated with each event A contained in S is its probability P(A). Three postulations: P(A)≥0. The probability of the sample space is P(S)=1. Suppose that Ai , i =1, 2, …, are a (possibly infinite) number of events in the sample space S such that Ai ∩ A j = φ ; i ≠ j = 1,2,... Then the probability of the union of these mutually exclusive events satisfies the condition: ⎛ ⎞ P⎜⎜ ∪ Ai ⎟⎟ = ∑ P( Ai ) ⎝ i ⎠ i 7
2.1 Probability Joint events and joint probabilities (two experiments) If one experiment has the possible outcomes Ai , i =1,2,…,n, and the second experiment has the possible outcomes Bj , j =1,2,…,m, then the combined experiment has the possible joint outcomes (Ai ,Bj), i =1,2,…,n, j =1,2,…,m. Associated with each joint outcome (Ai ,Bj) is the joint probability P (Ai ,Bj) which satisfies the condition: 0 ≤ P ( Ai , B j ) ≤ 1 Assuming that the outcomes Bj , j =1,2,…,m, are mutually exclusive, it follows that: m ∑ P ( Ai , B j ) = P ( Ai ) j =1
If all the outcomes of the two experiments are mutually exclusive, n m n then: P( A ,B ) = P( A ) =1
∑∑ i =1 j =1
i
∑
j
i =1
8
i
2.1 Probability Conditional probabilities The conditional probability of the event A given the occurrence of the event B is defined as: P ( A, B ) P( A | B) = P( B) provided P(B)>0. P ( A, B ) = P ( A | B ) P ( B ) = P ( B | A) P ( A) P( A, B ) is interpreted as the probability of A ∩ B. That is, P( A, B ) denotes the simultaneous occurrence of A and B. If two events A and B are mutually exclusive, A ∩ B = φ , then P( A | B) = 0. If B is a subset of A, we have A ∩ B = B and P( A | B ) = 1. 9
2.1 Probability Bayes’ theorem: If Ai , i = 1, 2,..., n, are mutually exclusive events such that n
∪A =S i
i =1
and B is an arbitrary event with nonzero probability, then P ( Ai , B ) P ( B | Ai ) P ( Ai ) P ( Ai | B ) = = n P( B) ∑ P( B | Aj ) P( Aj ) j =1
P(Ai) represents their a priori probabilities and P(Ai|B) is the a posteriori probability of Ai conditioned on having observed the received signal B. 10
2.1 Probability Statistical independence If the occurrence of A does not depend on the occurrence of B, then P ( A | B ) = P ( A). P ( A, B ) = P ( A | B ) P ( B ) = P ( A) P ( B )
When the events A and B satisfy the relation P(A,B)=P(A)P(B), they are said to be statistically independent. Three statistically independent events A1, A2, and A3 must satisfy the following conditions: P( A1 , A2 ) = P( A1 ) P( A2 ) P( A1 , A3 ) = P( A1 ) P ( A3 ) P( A2 , A3 ) = P ( A2 ) P( A3 ) P( A1 , A2 , A3 ) = P( A1 ) P ( A2 ) P( A3 ) 11
2.1.1 Random Variables, Probability Distributions, and Probability Densities Given an experiment having a sample space S and elements s ∈ S , we define a funciton X ( s ) whose domain is S and whose range is a set of numbers on the real line. The function X(s) is called a random variable. Example 1: If we flip a coin, the possible outcomes are head (H) and tail (T), so S contains two points labeled H and T. Suppose we define a function X(s) (s = H) ⎧+ 1 such that: X (s ) = ⎨ (s = T) ⎩-1 Thus we have mapped the two possible outcomes of the coin-flipping experiment into the two points ( +1,-1) on the real line. Example 2: Tossing a die with possible outcomes S={1,2,3,4,5,6}. A random variable defined on this sample space may be X(s)=s, in which case the outcomes of the experiment are mapped into the integers 1,…,6, or, perhaps, X(s)=s2, in which case the possible outcomes are mapped into the integers {1,4,9,16,25,36}. 12
2.1.1 Random Variables, Probability Distributions, and Probability Densities Give a random variable X, let us consider the event {X≤x} where x is any real number in the interval (-∞,∞). We write the probability of this event as P(X ≤x) and denote it simply by F(x), i.e., F ( x) = P( X ≤ x), -∞ < x < ∞ The function F(x) is called the probability distribution function of the random variable X. It is also called the cumulative distribution function (CDF). 0 ≤ F ( x) ≤ 1
F (−∞) = 0 and F (∞) = 1.
13
2.1.1 Random Variables, Probability Distributions, and Probability Densities Examples of the cumulative distribution functions of two discrete random variables.
14
2.1.1 Random Variables, Probability Distributions, and Probability Densities An example of the cumulative distribution function of a continuous random variable.
15
2.1.1 Random Variables, Probability Distributions, and Probability Densities An example of the cumulative distribution function of a random variable of a mixed type.
16
2.1.1 Random Variables, Probability Distributions, and Probability Densities The derivative of the CDF F(x), denoted as p(x), is called the probability density function (PDF) of the random variable X. dF ( x ) p( x) = , dx
−∞ < x < ∞
x
F ( x) =
∫ p(u )du ,
−∞ < x < ∞
−∞
When the random variable is discrete or of a mixed type, the PDF contains impulses at the points of discontinuity n of F(x): p ( x ) = ∑ P ( X = xi )δ ( x − xi ) i =1
17
2.1.1 Random Variables, Probability Distributions, and Probability Densities Determinin g the probabilit y that a random variable X falls in an interval ( x1 , x2 ) , where x2 > x1. P ( X ≤ x2 ) = P ( X ≤ x1 ) + P ( x1 < X ≤ x2 ) F ( x2 ) = F ( x1 ) + P ( x1 < X ≤ x2 ) ⇒ P ( x1 < X ≤ x2 ) = F ( x2 ) − F ( x1 ) x2
= ∫ p ( x ) dx x1
The probabilit y of the event {x1 < X ≤ x2 }is simply the area under the PDF in the range x1 < X ≤ x2 . 18
2.1.1 Random Variables, Probability Distributions, and Probability Densities Multiple random variables, joint probability distributions, and joint probability densities: (two random variables) Joint CDF : F ( x1 , x2 ) = P ( X 1 ≤ x1 , X 2 ≤ x2 ) = ∫
x1
∫
x2
-∞ -∞
p (u1 , u 2 ) du1d u 2
∂2 F ( x1 , x2 ) Joint PDF : p ( x1 , x2 ) = ∂x1∂x2
∫
∞
−∞
∫
p ( x1 , x2 ) dx1 = p ( x2 )
∞
−∞
p ( x1 , x2 ) dx 2 = p ( x1 )
The PDFs p ( x1 ) and p ( x2 ) obtained from integratin g over one of the variables are called marginal PDFs. ∞
∫ ∫
∞
-∞ -∞
p ( x1 , x2 ) dx1d x2 = F (∞, ∞ ) = 1
Note that : F (− ∞,−∞ ) = F (− ∞, x2 ) = F ( x1 ,−∞ ) = 0. 19
2.1.1 Random Variables, Probability Distributions, and Probability Densities Multiple random variables, joint probability distributions, and joint probability densities: (multidimensional random variables)
Suppose that X i , i = 1,2,...,n, are random variables. Joint CDF F ( x1 , x2 ,..., xn ) = P( X 1 ≤ x1 , X 2 ≤ x2 ,..., X n ≤ xn ) x2
xn
-∞ -∞
−∞
=∫
x1
∫ ...∫
p(u1 , u2 ,..., un )du1du2 ...dun
∂n Joint PDF p( x1 , x2 ,..., xn ) = F ( x1 , x2 ,..., xn ) ∂x1∂x2 ...∂xn ∞
∫ ∫
∞
−∞ −∞
p( x1 , x2 ,..., xn )dx2 dx3 = p( x1 , x4 ,..., xn )
F ( x1 , ∞, ∞, x4 ,..., xn ) = F ( x1 , x4 , x5 ,..., xn ). F ( x1 ,−∞,−∞, x4 ,..., xn ) = 0. 20
2.1.1 Random Variables, Probability Distributions, and Probability Densities Conditional probability distribution functions (consider two random variables): Probability of the event (X1 ≤ x1 | x2 − Δx2 < X 2 ≤ x2 ) is the probability that the random variable X 1 ≤ x1 conditioned on x2 − Δx2 < X 2 ≤ x2 . Bayes’ Theorem
P(X1 ≤ x1 | x2 − Δx2 < X 2
x1
∫ ∫ ≤ x )=
x2
− ∞ x2 − Δx2 x2
2
p(u1 , u2 )du1du2
∫
x2 − Δx2
p (u 2 )du2
x2 − Δx2
p (u , u )du du − ∫ ∫ p(u , u )du du ∫ ∫ = ⎡ p(u )du − ⎤ ( ) p u du ∫ ⎢⎣ ∫ ⎥⎦ x1
x2
−∞ −∞
x1
1
2
1
2
−∞ −∞ x2 − Δx2
2
−∞
x2
−∞
2
F ( x1 , x2 ) − F ( x1 , x2 − Δx2 ) = F ( x2 ) − F ( x2 − Δx2 )
21
1
2
2
1
2
2
P ( X 1 ≤ x1 | X 2 = x2 )
= lim P ( X 1 ≤ x1 | x2 − Δx2 < X 2 ≤ x2 ) Δx2 → 0
2.1.1 Random Variables, Probability Distributions, and Probability Densities Conditional probability distribution functions (consider two random variables): Divide both numerator and denominator by Δx2 and let Δx2 → 0 The conditional CDF of X 1 given the random variable X 2 is given by: ∂F ( x1 , x2 ) / ∂x2 P ( X1 ≤ x1 | X 2 = x2 ) ≡ F ( x1 | x2 ) = ∂ F ( x2 ) / ∂ x2 x1 x2 ⎡ ∂ ∫ ∫ p ( u1 , u2 ) du1du2 ⎤ ⎢⎣ −∞ −∞ ⎥⎦
=
∂ ⎡ ∫ p (u2 ) du2 ⎤ ⎣⎢ −∞ ⎦⎥ x2
∂ x2
∫ =
x1
−∞
∂x2
p ( u1 , x2 ) du1 p ( x2 )
Observe that F ( −∞ | x2 ) = 0 and F ( ∞ | x2 ) = 1. 22
( *)
2.1.1 Random Variables, Probability Distributions, and Probability Densities Conditional probability distribution functions (consider two random variables): Differentiating Equation (*) with respect to x1 , we obtain p ( x1 , x2 ) p ( x1 | x2 ) = p ( x2 ) We may express the joint PDF p ( x1 , x2 ) in terms of the conditional PDFs, p ( x1 | x2 ) or p ( x2 | x1 ) , as: p ( x1 , x2 ) = p ( x1 | x2 ) p ( x2 ) = p ( x2 | x1 ) p ( x1 )
23
2.1.1 Random Variables, Probability Distributions, and Probability Densities Conditional probability distribution functions (consider multidimensional variables): Joint PDF of the random variables X i , i = 1, 2,..., n,
p ( x1 , x2 ,..., xn ) = p ( x1 , x2 ,..., xk | xk +1 ,..., xn ) p ( xk +1 ,..., xn )
Joint conditional CDF: F ( x1 , x2 ,..., xk | xk +1 ,..., xn ) x1
xk
−∞
−∞
...∫ ∫ =
p ( u1 , u2 ,..., uk , xk +1 ,..., xn ) du1du2 ...duk p ( xk +1 ,..., xn )
F ( ∞, x2 ,..., xk , xk +1 ,..., xn ) = F ( x2 ,..., xk , xk +1 ,..., xn ) F ( −∞, x2 ,..., xk | xk +1 ,..., xn ) = 0 24
2.1.1 Random Variables, Probability Distributions, and Probability Densities Statistically independent random variables: If the experiments result in mutually exclusive outcomes, the probability of an outcome is independent of an outcome in any other experiment. The joint probability of the outcomes factors into a product of the probabilities corresponding to each outcome. Consequently, the random variables corresponding to the outcomes in these experiments are independent in the sense that their joint PDF factors into a product of marginal PDFs. The multidimensional random variables are statistically independent if and only if: F ( x1 , x2 ,..., xn ) = F ( x1 )F ( x2 )...F ( xn ) p ( x1 , x2 ,..., xn ) = p ( x1 ) p ( x2 )... p ( xn ) 25
2.1.2 Functions of Random Variables Problem: given a random variable X, which is characterized by its PDF p(x), determine the PDF of the random variable Y=g(X), where g(X) is some given function of X. When the mapping g from X to Y is one-to-one, the determination of p(y) is relatively straightforward. However, when the mapping is not one-to-one, as in the case, for example, when Y=X2, we must be very careful in our derivation of p(y).
26
2.1.2 Functions of Random Variables Transformation of a random variable
27
2.1.2 Functions of Random Variables Example 2.1-1: Y=aX+b; where a and b are constant and assume that a>0. y −b FY ( y ) = P (Y ≤ y ) = P ( aX + b ≤ y ) = P ( X ≤ ) a ( y-b ) ⎛ y −b⎞ = ∫ a p X ( x ) dx = FX ⎜ ⎟ -∞ ⎝ a ⎠ 1 ⎛ y −b⎞ pY ( y ) = p X ⎜ ⎟ a ⎝ a ⎠
( F ( y ) 對 y 作微分 ) Y
28
2.1.2 Functions of Random Variables Example 2.1-1 (cont.): Y=aX+b; where a and b are constant and assume that a>0.
29
2.1.2 Functions of Random Variables Example 2.1-2: Y=aX3+b; a>0. The mapping between X and Y is one-to-one.
FY ( y ) = P (Y ≤ y ) = P ( aX 3 + b ≤ y ) 1 1 ⎡ ⎡ 3⎤ ⎛ y −b⎞ 3⎤ ⎛ y −b⎞ = P⎢ X ≤ ⎜ ⎟ ⎥ ⎟ ⎥ = FX ⎢⎜ ⎝ a ⎠ ⎥⎦ ⎢⎣⎝ a ⎠ ⎥⎦ ⎢⎣ Differenti ation with respect to y yields the desired
relationsh ip between th e two PDFs 1 ⎡ ⎛ y − b ⎞1 3 ⎤ pY ( y ) = p X ⎢⎢ ⎜ a ⎟ ⎥⎥ 23 ⎠ ⎦ ⎣⎝ 3a[( y − b) a ] 30
2.1.2 Functions of Random Variables Example 2.1-3: Y=aX2+b; a>0. The mapping between X and Y is not one-to-one.
31
2.1.2 Functions of Random Variables Example 2.1-3 (cont.): Y=aX2+b; a>0.
⎛ FY ( y ) = P (Y ≤ y ) = P ( aX + b ≤ y ) = P ⎜⎜ X ≤ ⎝ 2
y −b ⎞ ⎟ a ⎟⎠
⎛ y −b ⎞ ⎛ y −b ⎞ ⎟ − FX ⎜ − ⎟ FY ( y ) = FX ⎜⎜ ⎟ ⎜ ⎟ a a ⎝ ⎠ ⎝ ⎠ Differentiating with respect to y, we obtain the PDF of Y in terms of the PDF of X in the form:
pX ⎡ ⎣ pY ( y ) = 2a ⎡ ⎣
( y − b ) / a ⎤⎦
pX ⎡− ( y − b ) / a ⎤ ⎣ ⎦ + ( y − b ) / a ⎤⎦ 2a ⎡⎣ ( y − b ) / a ⎤⎦ 32
2.1.2 Functions of Random Variables Example 2.1-3 (cont.): Y=g(X)=aX2+b; a>0 has two solutions: y −b y −b , x2 = − x1 = a a pY(y) consists of two terms corresponding to these two solutions: p X ⎡ x1 = ⎣ pY ( y ) = g ′ ⎡ x1 = ⎣
p X ⎡ x1 = − ( y − b ) / a ⎤ ⎣ ⎦ + ( y − b ) / a ⎤⎦ g ′ ⎡⎣ x1 = − ( y − b ) / a ⎤⎦
( y − b ) / a ⎤⎦
In general, suppose that x1,x2, …, xn are the real roots of g(x)=y. The PDF of the random variable Y=g(X) may n p X ( xi ) be expressed as p ( y ) = Why? ∑ Y ′ i =1 g ( xi ) 33
2.1.2 Functions of Random Variables Function of one random variable (discrete case): Suppose X is a discrete random variable that can have one of n values x1, x2, …, xn. Let g(X) be a scalar-valued function. Y=g(X) is a discrete random variable that can have one of m, m≤n, values y1, y2, …, ym. If g(X) is a one-to-one mapping, then m will be equal to n. If g(X) is many-to-one, then m will be smaller than n. The CDF of Y can be obtained easily from the CDF function of X as: P (Y = y ) = P( X = x ) i
∑
i
where the sum is over all values of xi that map to yi . 34
2.1.2 Functions of Random Variables Function of one random variable (continuous case): If X is a continuous random variable, then the pdf of Y=g(X) can be obtained from the pdf of X. Let y be a particular value of Y and let x(1), x(2), …, x(n) be roots of the equation y=g(x). That is y=g(x(1))=g(x(2))= … = y=g(x(n)). P ( y < Y ≤ y + Δy ) = f Y ( y )Δy as Δy → 0
= P[{x : y < g ( x ) ≤ y + Δy}]
35
2.1.2 Functions of Random Variables Function of one random variable (continuous case): =y
x
36
2.1.2 Functions of Random Variables Function of one random variable (continuous case):
(
) ( ) + P ( x ( ) < X < x ( ) + Δx ( ) ) = f ( x ( ) ) Δx ( ) + f ( x ( ) ) Δx ( ) + f ( x( ) ) Δx( ) = f ( y ) Δy
P ( y < Y < y + Δy ) = P x(1) < X < x(1) + Δx(1) + P x( 2) − Δx ( 2) < X < x( 2) 3
3
1
1
3
2
X
2
3
X
3
X
Since the slope g ′( x ) of g ( x ) is Δy / Δx, we have ( 2 ) Δy (3 ) Δ y Δx (1) = Δy (1) Δx = ( 2 ) Δx = g′ x g′ x g ′ x (3 )
( ) ( ) ( ) f (x ( ) ) f (x ( ) ) f (x ( ) ) Δy + f ( y )Δy = ( ) Δy + ( ) Δy ( ) g ′(x ) g ′(x ) g ′(x ) f (x ( ) ) ⇒ f (y) = ∑ g ′(x ( ) ) 1
2
X
Y
X
1
k
Y
i =1
3
X
3
2
i
X
i
37
Y
2.1.2 Functions of Random Variables Function of several random variables (continuous case): Goal: find the joint distribution of n random variables Y1, Y2, …, Yn given the distribution of n related random variables, X1, X2, …, Xn, where Yi = g i ( X 1 , X 2 ,..., X n ), i = 1,2,..., n Let’s start with a mapping of two random variables onto two other random variables: Y1 = g1 ( X 1 , X 2 ) Y2 = g 2 ( X 1 , X 2 )
(
)
Suppose x1(i ) , x2(i ) , i = 1,2,..., k are the k roots of y1 = g1 ( x1 , x2 ) and y 2 = g 2 (x1 , x2 ). We need to find the region in the x1 , x2 plane such that y1 < g1 (x1 , x2 ) < y1 + Δy1 and y2 < g 2 ( x1 , x2 ) < y2 + Δy2 38
2.1.2 Functions of Random Variables Function of several random variables (continuous case): There are k such regions as shown in the following figure for the case of k=3.
39
2.1.2 Functions of Random Variables Function of several random variables (continuous case): Each region consists of a parallelog ram and the area of each Δy Δy parallelog ram is equal to 1 2 where J ( x1 , x2 ) is J x1(i ) , x2(i )
(
)
the Jacobian of the transform ation is defined as ∂g1 ∂x J ( x1 , x2 ) = 1 ∂g 2 ∂x1
∂g 1 ∂x 2 ∂g 2 ∂x 2
By summing the contributi on from all regions, we obtain the joint pdf of Y1 and Y2 as k
f Y1 ,Y2 ( y1 , y 2 ) = ∑ i =1
(
f X 1 , X 2 x1(i ) , x2(i )
(
J x1(i ) , x2(i )
40
)
)
2.1.2 Functions of Random Variables Function of several random variables (continuous case): W e can generalize this result to the n -variate case as f Y ( y1 , y 2 , ..., y n ) =
(
k
f X x1( i ) = g 1− 1 , x 2( i ) = g 2− 1 ,..., x n( i ) = g n− 1
i =1
J (i )
∑
)
where J ( i ) denotes the Jacobian of the transformation defined by the following determinant when evaluating at the i − th solution x1( i ) = g 1− 1 , x 2( i ) = g 2− 1 ,..., x n( i ) = g n− 1 of y = g ( x ) = [ g 1 ( x ) ,g 2 ( x ) , ..., g n ( x ) ]:
J (i ) =
∂ g1 ∂ x1
∂ g1 ∂x2
∂ g1 ∂xn
∂g n ∂ x1
∂g n ∂x2
∂g n ∂xn 41
i i i at x1( ) = g1− 1 , x 2( ) = g 2− 1 ,..., x n( ) = g n− 1
2.1.2 Functions of Random Variables Example 2.1-4 n
Yi = ∑ a ijX j ,
i = 1, 2,..., n
j =1
Y = AX X = A −1Y
wher e A is an n × n matrix. n
⇒ X i = ∑ b ijYj,
i = 1,2 ,...,n
j =1
{b }are the elements of the inverse matrix ij
A -1 .
The Jacobian of this transform ation is J = det A. pY ( y1, y 2,..., yn ) n n n ⎛ ⎞ 1 ⎜ = p X ⎜ x1 = ∑ b1 j ⋅ yj , x 2 = ∑ b 2 j ⋅ yj ,..., xn = ∑ b nj ⋅ yj ⎟⎟ j =1 j =1 j =1 ⎝ ⎠ det A
42
2.1.2 Functions of Random Variables The mean or expected value of X, which characterized by its PDF p(x), is defined as: ∞ E ( X ) ≡ m x = ∫ xp ( x ) dx −∞
This is the first moment of random variable X. The n-th moment is defined as: ∞ n E ( X ) = ∫ x n p ( x ) dx −∞
Define Y=g(X), the expected value of Y is: E (Y ) = E [g ( X )] = ∫ g ( x) p ( x)dx ∞
−∞
43
2.1.3 Statistical Averages of Random Variables The n-th central moment of the random variable X is: ∞ n E (Y ) = E ( X − m x ) = ∫ ( x − m x ) n p ( x ) dx −∞ When n=2, the central moment is called the variance 2 of the random variable and denoted as σ x : ∞ 2 σ x = ∫ ( x − m x ) 2 p ( x ) dx
[
]
−∞
σ = E ( X ) − [E ( X )] = E ( X 2 ) − m x2 2 x
2
2
In the case of two random variables, X1 and X2, with joint PDF p(x1,x2), we define the joint moment as: E( X X ) = ∫ k 1
n 2
∞
∫
∞
−∞ −∞
x1k x2n p ( x1 , x2 )dx1dx2 44
2.1.3 Statistical Averages of Random Variables The joint central moment is defined as:
[
E ( X 1 − m1 ) k ( X 2 − m2 ) n =∫
∞
∫
∞
−∞ −∞
]
( x1 − m1 ) k ( x2 − m2 ) n p ( x1 , x2 )dx1dx2
If k=n=1, the joint moment and joint central moment are called the correlation and the covariance of the random variables X1 and X2, respectively. The correlation between Xi and Xj is given by the joint moment: ∞ ∞ E ( X i X j ) = ∫ ∫ xi x j p ( xi , x j ) dxi dx j −∞ −∞
45
2.1.3 Statistical Averages of Random Variables The covariance between Xi and Xj is given by the joint central moment: μij ≡ E[( X i − mi )(X j − m j )]
∫ (x − m )(x − m )p( x , x )dx dx = ∫ ∫ (x x − x m − x m + m m )p ( x , x )dx dx = ∫ ∫ x x p ( x , x )dx dx − m m = ∫ ∫ x x p ( x , x )dx dx − m m − m m + m m
=∫
∞
∞
i
−∞ −∞ ∞
j
i
i
j
j
i
j
j
i
i
j
i
j
i
j
i
j
i
j
∞
−∞ −∞ ∞
j
∞
−∞ −∞ ∞
i
i
j
i
j
i
j
i
j
i
j
i
j
i
j
i
j
∞
−∞ −∞
i
j
= E ( X i X j ) − mi m j
The n×n matrix with elements μij is called the covariance matrix of the random variables, Xi, i=1,2, …, n. 46
2.1.3 Statistical Averages of Random Variables Two random variables are said to be uncorrelated if E(XiXj)=E(Xi)E(Xj)=mimj. Uncorrelated → Covariance μij = 0. If Xi and Xj are statistically independent, they are uncorrelated. If Xi and Xj are uncorrelated, they are not necessary statistically independently. Two random variables are said to be orthogonal if E(XiXj)=0. Two random variables are orthogonal if they are uncorrelated and either one or both of them have zero mean. 47
2.1.3 Statistical Averages of Random Variables Characteristic functions The characteristic function of a random variable X is defined as the statistical average:
E (e
jvX
∞
) ≡ ψ ( jv ) = ∫ e jvx p ( x ) dx −∞
Ψ(jv) may be described as the Fourier transform of p(x). The inverse Fourier transform is: 1 ∞ − jvx p( x) = ψ ( jv ) e dv ∫ 2π − ∞ First derivative of the above equation with respect to v: ∞ dψ ( jv) = j ∫ xe jvx p( x)dx −∞ dv 48
2.1.3 Statistical Averages of Random Variables Characteristic functions (cont.) First moment (mean) can be obtained by: dψ ( jv) E ( X ) = mx = − j dv v =0 Since the differentiation process can be repeated, n-th moment can be calculated by: n ψ ( jv ) d n n E ( X ) = (− j ) dv n v = 0
Suppose the characteristic function can be expanded in a Taylor series about the point v=0: n ∞ ∞ ⎡ d nψ ( jv ) ⎤ v n ( jv ) n ψ ( jv ) = ∑ ⎢ ⇒ ψ = ( jv ) E ( X ) ∑ ⎥ n n! dv n =0 ⎣ n =0 ⎦ v = 0 n! 49
2.1.3 Statistical Averages of Random Variables Characteristic functions (cont.) Determining the PDF of a sum of statistically independent random variables:
n ⎡ ⎞⎤ ⎛ Y = ∑ X i ⇒ ψ Y ( jv ) = E ( e ) = E ⎢ exp ⎜ jv ∑ X i ⎟ ⎥ i =1 ⎠⎦ ⎝ i =1 ⎣ ∞ ∞ ⎛ n ⎡ n jvX i ⎤ jvx i ⎞ = E ⎢∏ e ... = e ⎟⎟ p ( x1 , x 2 ,..., x n ) dx1dx 2 ...dx n ⎜ ⎥ ∫− ∞ ∫− ∞ ⎜ ∏ ⎣ i =1 ⎦ ⎠ ⎝ i =1 Since the random variables are statistica lly independen t, n
jvY
(
)
n
p ( x1 , x 2 ,..., x n ) = p ( x1 ) p ( x 2 )... p ( x n ) ⇒ ψ Y ( jv ) = ∏ ψ X i ( jv ) i =1
If X i are iid (independe nt and identicall y distribute d) ⇒
ψ Y ( jv ) = [ψ X ( jv ) ]n 50
2.1.3 Statistical Averages of Random Variables Characteristic functions (cont.) The PDF of Y is determined from the inverse Fourier transform of ΨY(jv). Since the characteristic function of the sum of n statistically independent random variables is equal to the product of the characteristic functions of the individual random variables, it follows that, in the transform domain, the PDF of Y is the nfold convolution of the PDFs of the Xi. Usually, the n-fold convolution is more difficult to perform than the characteristic function method in determining the PDF of Y.
51
2.1.3 Statistical Averages of Random Variables Characteristic functions (for n-dimensional random variables) If Xi, i=1,2,…,n, are random variables with PDF p(x1,x2,…,xn), the n-dimensional characteristic function is defined as: ⎡ ⎛ n ⎞⎤ ψ ( jv1 , jv2 ,..., jvn ) = E ⎢exp⎜ j ∑ vi X i ⎟⎥ ⎠⎦ ⎣ ⎝ i =1
⎛ n ⎞ = ∫ ...∫ exp⎜ j ∑ vi xi ⎟ p ( x1 , x2 ,..., xn ) dx1dx2 ...dxn −∞ −∞ ⎝ i =1 ⎠ ∞
∞
For two dimensional characteristic function: ∞ ∞ ψ ( jv1 , jv 2 ) = ∫ ∫ e j ( v x + v x ) p ( x1 , x2 ) dx1dx 2 1 1
2 2
−∞ −∞
∂ 2ψ ( jv1 , jv 2 ) E( X1X 2 ) = − ∂v1∂v2 52
v1=v2 =0
2.1.4 Some Useful Probability Distributions Binomial distribution:
P ( X = 0 ) = 1 − P ( X = 1) = p n
Let Y = ∑ X i where the X i , i = 1, 2,..., n are statistically iid, i =1
what is the probability distribution function of Y ? ⎛n⎞ k n−k P (Y = k ) = ⎜⎜ ⎟⎟ p (1 − p ) ⎝k ⎠
⎛n⎞ n! ⎜⎜ ⎟⎟ ≡ ⎝ k ⎠ k !(n − k )!
n
PDF of Y is : p ( y ) = ∑ P (Y = k )δ ( y − k ) k =0
⎛n⎞ k = ∑ ⎜⎜ ⎟⎟ p (1 − p ) n − k δ ( y − k ) k =0 ⎝ k ⎠ n
53
2.1.4 Some Useful Probability Distributions Binomial distribution: The CDF of Y is:
[y]
⎛n⎞ k F ( y ) = P (Y ≤ y ) = ∑ ⎜⎜ ⎟⎟ p (1 − p ) n − k k =0 ⎝ k ⎠
where [y] denotes the largest integer m such that m≤y. The first two moments of Y are: E (Y ) = np E (Y 2 ) = np (1 − p ) + n 2 p 2
σ 2 = np (1 − p ) The characteristic function is: ψ ( jν ) = (1 − p + pe jv ) n 54
2.1.4 Some Useful Probability Distributions Uniform Distribution The first two moments of X are: 1 E ( X ) = (a + b) 2 1 2 2 E ( X ) = ( a + b 2 + ab ) 3 1 σ 2 = (a − b) 2 12 The characteristic function is: e jvb − e jva ψ ( jν ) = jv (b − a ) 55
2.1.4 Some Useful Probability Distributions Gaussian (Normal) Distribution The PDF of a Gaussian or normal distributed random variable is: 1 − ( x − m x ) 2 / 2σ 2 p( x) = e 2π σ where mx is the mean and σ2 is the variance of the random variable. ( u − mx ) dt = du t= 2σ 2σ The CDF is: x 1 1 ( x − m x )/ 2σ −t 2 − (u − m x )2 / 2σ 2 F ( x) = e du = e dt ∫ ∫ π −∞ 2π σ − ∞ x − mx x − mx 1 1 1 = + erf ( = − erfc ) 1 ( ) 2 2 2 2σ 2σ 56
2.1.4 Some Useful Probability Distributions Gaussian (Normal) distribution:
57
2.1.4 Some Useful Probability Distributions Gaussian (Normal) Distribution erf( ) and erfc( ) denote the error function and complementary error function, respectively, and are defined as: erf ( x ) =
2
π
∫
x
0
e dt and erfc ( x ) = −t 2
2
π
∫
∞
x
−t 2
e dt = 1 − erf ( x )
erf(-x)=-erf(x), erfc(-x)=2-erfc(x), erf(0)=erfc(∞)=0, and erf(∞)=erfc(0)=1. For x>mx, the complementary error functions is proportional to the area under the tail of the Gaussian PDF.
58
2.1.4 Some Useful Probability Distributions Gaussian (Normal) Distribution The function that is frequently used for the area under the tail of the Gaussian PDF is denoted by Q(x) and is defined as: Q ( x) =
1 2π
∫
∞
x
e
−t2 / 2
dt =
59
1 ⎛ x ⎞ erfc ⎜ ⎟ 2 ⎝ 2⎠
x≥0
2.1.4 Some Useful Probability Distributions Gaussian (Normal) Distribution The characteristic function of a Gaussian random variable with mean mx and variance σ2 is: ∞ 1 jvm x − (1 / 2 )v 2σ 2 − ( x − m x )2 / 2σ 2 ⎤ jvx ⎡ ψ ( jv ) = ∫ e ⎢ e dx = e ⎥ −∞ ⎣ 2π σ ⎦ The central moments of a Gaussian random variable are: k ⎧ 1 ⋅ 3 ⋅ ⋅ ⋅ ( k − 1 ) (even k ) σ k E ( X − mx ) = μ k = ⎨ (odd k ) ⎩0
[
]
The ordinary moments may be expressed in terms of the central moments as: k ⎛k ⎞ i k E X = ∑ ⎜⎜ ⎟⎟mx μ k −i i =0 ⎝ i ⎠
[ ]
60
2.1.4 Some Useful Probability Distributions Gaussian (Normal) Distribution The sum of n statistically independent Gaussian random variables is also a Gaussian random variable. n
Y = ∑ Xi i =1
n
n
ψ Y ( jv ) = ∏ψ X ( jv ) = ∏ e i =1
i
jvmi − v 2σ i2 / 2
=e
jvm y − v 2σ y2 / 2
i =1
n
n
i =1
i =1
where m y = ∑ mi and σ y2 = ∑ σ i2 Therefore, Y is Gaussian - distribute d with mean m y and variance σ y2 . 61
2.1.4 Some Useful Probability Distributions Chi-square distribution If Y=X2, where X is a Gaussian random variable, Y has a chisquare distribution. Y is a transformation of X. There are two type of chi-square distribution: Central chi-square distribution: X has zero mean. Non-central chi-square distribution: X has non-zero mean.
Assuming X be Gaussian distributed with zero mean and variance σ2, we can apply (2.1-47) to obtain the PDF of Y with a=1 and b=0;
pY ( y ) =
p X [ x1 = g ′[ x1 =
(y − b)/ a ] + ( y − b) / a ] 62
p X [ x1 = − g ′[ x1 = −
( y − b)/ a ] (y − b)/ a ]
2.1.4 Some Useful Probability Distributions Central chi-square distribution The PDF of Y is:
pY ( y ) =
1 − y / 2σ 2 e , y≥0 2πyσ
The CDF of Y is:
FY ( y ) = ∫
y
0
1 pY (u )du = 2π σ
∫
y
0
1 −u / 2σ 2 e du u
The characteristic function of Y is:
ψ Y ( jv) =
1 (1 − j 2vσ 2 )1/ 2 63
2.1.4 Some Useful Probability Distributions Chi-square (Gamma) distribution with n degrees of freedom. n
Y = ∑ X i2 , X i , i = 1,2,..., n, are statistically independent and i =1
identically distributed (iid) Gaussian random variables with zero mean and variance σ 2 . The characteristic function is: 1 ψ Y ( jv) = (1 − j 2vσ 2 ) n / 2 The inverse transform of this characteristic function yields 1 n/ 2 −1 -y/ 2 σ 2 the PDF: p ( y ) = y e , y≥0 Y ⎛1 ⎞ σ n 2 n / 2 Γ⎜ n ⎟ ⎝2 ⎠ 64
2.1.4 Some Useful Probability Distributions Chi-square (Gamma) distribution with n degrees of freedom (cont.). Γ( p ) is the gamma function, defined as : ∞
Γ( p ) = ∫ t p −1e −t dt , p > 0 0
Γ( p ) = (p - 1)!
p an integer > 0
⎛1⎞ ⎛3⎞ 1 Γ⎜ ⎟ = π Γ⎜ ⎟ = π ⎝2⎠ ⎝2⎠ 2 When n=2, the distribution yields the exponential distribution.
65
2.1.4 Some Useful Probability Distributions Chi-square (Gamma) distribution with n degrees of freedom (cont.). The PDF of a chi-square distributed random variable for several degrees of freedom.
66
2.1.4 Some Useful Probability Distributions Chi-square (Gamma) distribution with n degrees of freedom (cont.). The first two moments of Y are: E (Y ) = nσ 2 E (Y 2 ) = 2nσ 4 + n 2σ 4
σ y2 = 2nσ 4 The CDF of Y is: FY ( y ) = ∫
1
y
0
σ 2 n
n/2
⎛1 ⎞ Γ⎜ n ⎟ ⎝2 ⎠
u
67
n / 2 −1 −u / 2σ 2
e
du ,
y≥0
2.1.4 Some Useful Probability Distributions Chi-square (Gamma) distribution with n degrees of freedom (cont.). The integral in CDF of Y can be easily manipulated into the form of the incomplete gamma function, which is tabulated by Pearson (1965). When n is even, the integral can be expressed in closed form. Let m=n/2, where m is an integer, we can obtain: FY ( y ) = 1 − e
− y / 2σ 2
m −1
1 y k ( 2) , ∑ k = 0 k! 2σ
68
y≥0
2.1.4 Some Useful Probability Distributions Non-central chi-square distribution If X is Gaussian with mean mx and variance σ2, the random variable Y=X2 has the PDF:
pY ( y ) =
y mx 1 − ( y + m x2 ) / 2σ 2 e cosh( ), 2 σ 2πyσ
y≥0
The characteristic function corresponding to this PDF is: 1 jm x2 v /(1− j 2 vσ 2 ) ψ Y ( jv) = e 2 1/ 2 (1 − j 2vσ )
69
2.1.4 Some Useful Probability Distributions Non-central chi-square distribution with n degrees of freedomn
Y = ∑ X i2 , X i , i = 1,2,..., n, are statistically independent and i =1
identically distributed (iid) Gaussian random variables with mean mi , i = 1,2,..., n, and identical variance equal to σ 2 . The characteristic function is:
ψ Y ( jv ) =
1 (1 − j 2vσ 2 ) n / 2
70
n ⎞ ⎛ ⎜ jv ∑ mi2 ⎟ i =1 ⎟ exp ⎜ ⎜ 1 − j 2 vσ 2 ⎟ ⎟ ⎜ ⎠ ⎝
2.1.4 Some Useful Probability Distributions Non-central chi-square distribution with n degrees of freedom The characteristic function can be inverse Fourier transformed to yield the PDF: y ( n − 2) / 4 − ( s 2 + y ) / 2σ 2 s pY ( y ) = ( ) e I ( y ), n / 2 −1 2 2 2 2σ s σ where, s 2 is called the non-centrality parameter: 1
y≥0
n
s 2 = ∑ mi2 i =1
and Iα ( x ) is the α th-order modified Bessel function of the first kind, which may be represented by the infinite series: ∞ ( x / 2)α + 2 k , Iα ( x ) = ∑ x≥0 k = 0 k ! Γ (α + k + 1) 71
2.1.4 Some Useful Probability Distributions Non-central chi-square distribution with n degrees of freedom The CDF is: FY ( y ) = ∫
y
0
1 ⎛u ⎞ 2 ⎜ 2 ⎟ 2σ ⎝ s ⎠
(n − 2 )/ 4
e
− ( s 2 + u ) / 2σ 2
I n / 2 −1 ( u
s
σ
The first two moments of a non-central chi-squaredistributed random variable are:
E (Y ) = nσ 2 + s 2 E (Y 2 ) = 2nσ 4 + 4σ 2 s 2 + (nσ 2 + s 2 ) 2
σ y2 = 2nσ 4 + 4σ 2 s 2 72
2
) du
2.1.4 Some Useful Probability Distributions Non-central chi-square distribution with n degrees of freedom When m=n/2 is an integer, the CDF can be expressed in terms of the generalized Marcum’s Q function: Qm ( a , b ) = ∫
∞
b
⎛x⎞ x⎜ ⎟ ⎝a⎠
m −1
e−( x
= Q1 ( a , b ) + e
2
+ a2 ) / 2
(
k
⎛b⎞ ) ∑ ⎜ ⎟ I k ( ab ) k =0 ⎝ a ⎠
− a2 +b2 / 2
(
I m −1 ( ax ) dx
m −1
)
k
∞
⎛b⎞ Q1 ( a , b ) = e b>a>0 where ∑ ⎜ ⎟ I k ( ab ), k =0 ⎝ a ⎠ 2 u By using x 2 = 2 and let a 2 = s 2 , it is easily shown: − a2 +b2 / 2
σ
σ
⎛s y⎞ FY ( y ) = 1 − Qm ⎜ , ⎜ σ σ ⎟⎟ ⎝ ⎠ 73
2.1.4 Some Useful Probability Distributions Rayleigh distribution Rayleigh distribution is frequently used to model the statistics of signals transmitted through radio channels such as cellular radio. Consider a carrier signal s at a frequency ω0 and with an amplitude a: s = a ⋅ exp( jω 0t ) The received signal sr is the sum of n waves: n
sr = ∑ ai exp[ j (ω 0t + θ i ) ] ≡ r exp[ j (ω 0t + θ ) ] i =1
n
where r exp( jθ ) = ∑ ai exp( jθ i ) i =1
74
2.1.4 Some Useful Probability Distributions Rayleigh distribution
n
n
i =1
i =1
Define : r exp( jθ ) = ∑ ai cos θ i + j ∑ ai sin θ i ≡ x + jy n
n
i =1
i =1
We have : x ≡ ∑ ai cos θ i and y ≡ ∑ ai sin θ i where : r 2 = x 2 + y 2 x = r cos θ y = r sin θ
Because (1) n is usually very large, (2) the individual amplitudes ai are random, and (3) the phases θi have a uniform distribution, it can be assumed that (from the central limit theorem) x and y are both Gaussian variables with means equal to zero and variance: σ x2 = σ y2 ≡ σ 2 75
2.1.4 Some Useful Probability Distributions Rayleigh distribution Because x and y are independent random variables, the joint distribution p(x,y) is ⎛ x2 + y 2 ⎞ 1 ⎟ p( x, y) = p( x) p( y) = exp⎜⎜ − 2 2 ⎟ 2πσ ⎝ 2σ ⎠ The distribution p(r,θ) can be written as a function of p(x,y) : p ( r , θ ) = J p ( x, y ) J≡
∂x / ∂r
∂x / ∂θ
∂y / ∂r
∂y / ∂θ
=
cos θ
− r sin θ
sin θ
r cos θ
⎛ r2 ⎞ ⎟ exp ⎜⎜ − p (r ,θ ) = 2 ⎟ 2πσ ⎝ 2σ ⎠ r
76
=r
2.1.4 Some Useful Probability Distributions Rayleigh distribution Thus, the Rayleigh distribution has a PDF given by: 2π
pR (r ) =
∫ 0
⎧ r − r 2 / 2σ 2 ⎪ e p ( r , θ ) dθ = ⎨ σ 2 ⎪⎩ 0
r≥0 otherwise
The probability that the envelope of the received signal does not exceed a specified value R is given by the corresponding cumulative distribution function (CDF): r
FR ( r ) = ∫ 0
u
σ
2
e
− u 2 / 2σ 2
du = 1 − exp
77
− r 2 / 2σ 2
,
r≥0
2.1.4 Some Useful Probability Distributions Rayleigh distribution Mean:
∞
rmean = E[ R ] = ∫ rp ( r ) dr =σ 0
∞
π 2
= 1.2533σ
Variance: σ r2 = E[ R 2 ] − E 2 [ R ] = ∫ r 2 p ( r ) dr − 0
σ 2π 2
π⎞ ⎛ = σ ⎜ 2 − ⎟ = 0.4292σ 2 2⎠ ⎝ rmedian 1 p (r )dr Median value of r is found by solving: = ∫ 2 0 rmedian = 1 .177 σ k⎞ k 2 k/2 ⎛ Monents of R are: E[R ] = (2σ ) Γ⎜1 + ⎟ ⎝ 2⎠ 2
Most likely value:= max { pR(r) } = σ. 78
2.1.4 Some Useful Probability Distributions Rayleigh distribution
79
2.1.4 Some Useful Probability Distributions Rayleigh distribution Probability That Received Signal Doesn’t Exceed A Certain r Level (R) FR ( r ) = ∫ p (u ) du 0
⎛ u2 ⎞ ⎟ du = ∫ 2 exp ⎜⎜ − 2 ⎟ σ ⎝ 2σ ⎠ 0 r
u
r
⎛ u ⎞ ⎟ = − exp ⎜⎜ − 2 ⎟ ⎝ 2σ ⎠ 0 2
⎛ r2 ⎞ ⎟ = 1 − exp ⎜⎜ − 2 ⎟ ⎝ 2σ ⎠ 80
2.1.4 Some Useful Probability Distributions Rayleigh distribution ∞ Mean value: rmean = E[ R] = ∫ rp(r )dr ∞
0-0=0
0
∞ ⎛ r2 ⎞ ⎛ r2 ⎞ = ∫ 2 exp ⎜ − 2 ⎟ dr = ∫ −rd exp ⎜ − 2 ⎟ σ ⎝ 2σ ⎠ ⎝ 2σ ⎠ 0 0 ∞ ∞ 2 ⎛ r ⎞ ⎛ r2 ⎞ = −r ⋅ exp ⎜ − 2 ⎟ + ∫ exp ⎜ − 2 ⎟ dr ⎝ 2σ ⎠ 0 0 ⎝ 2σ ⎠
r2
∞
= 2πσ ⋅ ∫ 0
⎛ r2 ⎞ 1 exp ⎜ − 2 ⎟ dr 2πσ ⎝ 2σ ⎠
π 1 = 2πσ ⋅ = σ = 1.2533σ 2 2 81
2.1.4 Some Useful Probability Distributions Rayleigh distribution: Mean square value: ∞
E[ R 2 ] = ∫ r 2 p ( r ) dr 0
∞
0-0=0
∞ 2 ⎛ r2 ⎞ ⎛ ⎞ r 2 ⎟ dr = ∫ − r d exp ⎜⎜ − ⎟ = ∫ 2 exp ⎜⎜ − 2 ⎟ 2 ⎟ σ ⎝ 2σ ⎠ ⎝ 2σ ⎠ 0 0
r3
∞
∞ 2 ⎛ ⎞ ⎛ ⎞ r r 2 ⎟ + ∫ 2r ⋅ exp ⎜⎜ − ⎟ dr = − r ⋅ exp ⎜⎜ − 2 ⎟ 2 ⎟ ⎝ 2σ ⎠ 0 0 ⎝ 2σ ⎠
2
∞
⎛ r ⎞ 2 ⎟ = − 2σ ⋅ exp ⎜⎜ − = 2 σ 2 ⎟ 2 σ ⎝ ⎠0 2
2
82
2.1.4 Some Useful Probability Distributions Rayleigh distribution Variance:
σ r2 = E[ R 2 ] − E 2 [ R ] = ( 2σ ) − (σ ⋅ 2
π 2
)2
π⎞ ⎛ = σ ⋅ ⎜ 2 − ⎟ = 0.4292σ 2 2⎠ ⎝ 2
83
2.1.4 Some Useful Probability Distributions Rayleigh distribution Most likely value Most Likely Value happens when: dp(r) / dr = 0
⎛ r 2 ⎞ 2r 2 ⎛ r2 ⎞ dp ( r ) 1 ⎟=0 ⎟− = 2 exp⎜⎜ − ⋅ exp⎜⎜ − 2 ⎟ 4 2 ⎟ dr σ ⎝ 2σ ⎠ 2 ⋅ σ ⎝ 2σ ⎠ ⇒ r =σ ⇒
p ( r ) r =σ
⎛ r2 ⎞ ⎟ = 2 exp⎜⎜ − 2 ⎟ σ ⎝ 2σ ⎠ r =σ r
84
⎛ 1⎞ exp⎜ − ⎟ 2 ⎠ 0.6065 ⎝ = =
σ
σ
2.1.4 Some Useful Probability Distributions Rayleigh distribution Characteristic function ∞ r − r 2 / 2σ 2 jvr ψ R ( jv ) = ∫ 2 e e dr 0 σ ∞ r ∞ r − r 2 / 2σ 2 − r 2 / 2σ 2 =∫ 2e cos (vr )dr + j ∫ 2 e sin (vr )dr 0 σ 0 σ 1 ⎛ 1 1 2 2⎞ 2 − v 2σ 2 / 2 π vσ e =1 F1 ⎜1, ;− v σ ⎟ + j 2 ⎝ 2 2 ⎠ ⎛ 1 ⎞ where 1F1 ⎜1, ;− a ⎟ is the confluent hyupergeom etric function : ⎠ ⎝ 2 ∞ Γ (α + k )Γ (β )x k , β ≠ 0,−1,−2,... 1 F1 (α ; β ; x ) = ∑ k = 0 Γ (α )Γ (β + k )k ! 85
2.1.4 Some Useful Probability Distributions Rayleigh distribution Characteristic function (cont.) ⎛ 1 ⎞ Beaulieu (1990) has shown that 1F1 ⎜1, ;− a ⎟ may be ⎝ 2 ⎠ expressed as : k ∞ a ⎛ 1 ⎞ −a ; − = − F 1 , a e ⎜ ⎟ ∑ 1 1 2 ⎠ ⎝ k = 0 (2 k − 1)k !
86
2.1.4 Some Useful Probability Distributions Generalized Rayleigh distribution Consider R =
n
2 X ∑ i , where X i , i = 1,2,..., n, are statistica lly i =1
independen t, identicall y distribute d zero mean Gaussian radom variables . Y=R2 is chi-square distributed with n degrees of freedom. PDF is given by: r n −1 − r 2 / 2σ 2 p R (r ) = e , r≥0 (n − 2 )/ 2 n ⎛ 1 ⎞ 2 σ Γ⎜ n ⎟ ⎝2 ⎠
87
2.1.4 Some Useful Probability Distributions Generalized Rayleigh distribution When n is an even number, i.e., n=2m, the CDF of R can be expressed in the closed form: FR (r ) = 1 − e
− r 2 / 2σ 2
k
1⎛ r ⎞ ⎜⎜ 2 ⎟⎟ , ∑ k = 0 k ! ⎝ 2σ ⎠ m −1
2
r≥0
For any integer n, the k-th moment of R is:
( ) (
E R = 2σ k
)
2 k/2
⎡1 ⎤ Γ ⎢ (n + k )⎥ ⎣2 ⎦, ⎛1 ⎞ Γ⎜ n ⎟ ⎝2 ⎠ 88
k ≥0
2.1.4 Some Useful Probability Distributions Rice distribution When there is a dominant stationary (non-fading) signal component present, such as a line-of-sight (LOS) propagation path, the small-scale fading envelope distribution is Rice. direct wav es
scattered waves
sr = r ' exp[ j (ω 0t + θ )] + A exp( jω 0t ) ≡ [( x + A) + jy ] exp( jω 0t ) ≡ r exp[ j (ω 0t + θ )] r = ( x + A) + y 2
2
2
x + A = r cos θ y = r sin θ 89
2.1.4 Some Useful Probability Distributions Rice distribution By following similar steps described in Rayleigh distribution, we obtain: ⎧ r ⎛ r 2 + A2 ⎞ ⎛ Ar ⎞ for ( A ≥ 0,r ≥ 0 ) ⎪ 2 exp ⎜ − ⎟ I0 ⎜ 2 ⎟ 2 p(r ) = ⎨σ r 2σ r ⎠ ⎝ σ r ⎠ ⎝ ⎪ 0 for (r < 0 ) ⎩ where ⎛ Ar ⎞ 1 2π ⎛ Ar cos θ ⎞ I0 ⎜ 2 ⎟ = exp ⎜ ⎟dθ 2 ∫ ⎝ σ r ⎠ 2π 0 ⎝ σr ⎠ is the modified zeroth-order Bessel function. ∞ ⎛ xi ⎞ I 0 (x) = ∑ ⎜ i ⎟ ! 2 i ⋅ i =0 ⎝ ⎠ 90
2.1.4 Some Useful Probability Distributions Rice distribution The Rice distribution is often described in terms of a parameter K which is defined as the ratio between the deterministic signal power and the variance of the multi-path. It is given by K=A2/(2σ2) or in terms of dB: A2 k (dB) = 10 ⋅ log 2 [dB] 2σ The parameter K is known as the Rice factor and completely specifies the Rice distribution. As AÆ0, KÆ-∞ dB, and as the dominant path decreases in amplitude, the Rice distribution degenerates to a Rayleigh distribution. 91
2.1.4 Some Useful Probability Distributions Rice distribution
92
2.1.4 Some Useful Probability Distributions Rice distribution n
sr = ∑ ai exp[ j (ω 0t + θ i ) ] + A exp( jω 0t ) i =1
⎡ n ⎤ = ⎢∑ ai exp( jθ i ) ⎥ j (ω 0t ) + A exp( jω 0t ) ⎣ i =1 ⎦ ≡ r ' exp( jθ ) exp( jω 0t ) + A exp( jω 0t ) = r ' exp[ j (ω 0t + θ ) ] + A exp( jω 0t )
≡ [( x + A) + jy ]exp( jω 0t ) ≡ r exp[ j (ω 0t + θ ) ] n
where r ' exp( jθ ) = ∑ ai exp( jθ i ) i =1
93
2.1.4 Some Useful Probability Distributions Rice distribution
n
n
i =1
i =1
Define : r ' exp( jθ ) = ∑ ai cos θ i + j ∑ ai sin θ i ≡ x + jy n
n
i =1
i =1
We have : x ≡ ∑ ai cos θ i and y ≡ ∑ ai sin θ i and r 2 = ( x + A) 2 + y 2 x + A = r cos θ y = r sin θ
Because (1) n is usually very large, (2) the individual amplitudes ai are random, and (3) the phases θi have a uniform distribution, it can be assumed that (from the central limit theorem) x and y are both Gaussian variables with means equal to zero and variance: σ x2 = σ y2 ≡ σ 2 94
2.1.4 Some Useful Probability Distributions Rice distribution Because x and y are independent random variables, the joint distribution p(x,y) is ⎛ x2 + y 2 ⎞ 1 ⎟ p( x, y) = p( x) p( y) = exp⎜⎜ − 2 2 ⎟ 2πσ ⎝ 2σ ⎠ The distribution p(r,θ) can be written as a function of p(x,y) : p ( r , θ ) = J p ( x, y ) ∂x / ∂r ∂x / ∂θ cos θ J≡ = ∂y / ∂r ∂y / ∂θ sin θ 95
− r sin θ =r r cos θ
2.1.4 Some Useful Probability Distributions Rice distribution ⎛ x2 + y2 ⎞ ⎟⎟ exp⎜⎜ − p (r ,θ ) = 2 2σ ⎠ 2πσ ⎝ ⎛ ( r cos θ − A) 2 + ( r sin θ ) 2 ⎞ r ⎟⎟ exp⎜⎜ − = 2 2πσ 2σ ⎠ ⎝ ⎛ r 2 + A 2 − 2 Ar cos θ ⎞ r ⎟⎟ exp⎜⎜ − = 2 2πσ 2σ ⎠ ⎝ r
⎛ r 2 + A2 ⎞ 1 ⎛ Ar cos θ ⎞ ⎟ exp = exp⎜⎜ − ⎜ ⎟ 2 2 ⎟ σ 2σ ⎠ 2π ⎝ σ ⎠ ⎝ r
96
2.1.4 Some Useful Probability Distributions Rice distribution The Rice distributi on has a probabilit y density function (pdf) given by : 2π
p(r ) =
∫ p ( r , θ ) dθ 0
⎧ r ⎛ r 2 + A2 ⎞ 1 ⎟⎟ ⎪ 2 exp ⎜⎜ − 2 = ⎨σ 2σ ⎠ 2π ⎝ ⎪ 0 ⎩
2π
⎛ Ar cos θ ⎞ ∫0 exp⎜⎝ σ 2 ⎟⎠dθ otherwise
97
r≥0
2.1.4 Some Useful Probability Distributions Rice distribution Just as the Rayleigh distribution is related to the central chisquare distribution, the Rice distribution is related to the noncentral chi-square distribution. To illustrate this relation, let Y = X 12 + X 22 , where X 1 and X 2 are statistically independent Gaussian random variables with means mi , i = 1, 2, and common variance σ 2 . Y has a non-central chi-square distribution with noncentrality parameter s 2 = m12 + m22 . The PDF of Y, obtained from Equation 2.1-118 for n = 2, is: 1 − ( s 2 + y ) / 2σ 2 ⎛ 2 ⎞ , pY ( y ) = e I y y≥0 0⎜ 2 2 ⎟ 2σ σ ⎠ ⎝ 98
2.1.4 Some Useful Probability Distributions Rice distribution Define a new random varible R =
y . The PDF of R, obtained
from Equation 2.1 - 140 by a simple change of variable, is : ⎛ rs ⎞ p R (r ) = 2 I 0 ⎜ 2 ⎟, r ≥ 0 σ ⎝σ ⎠ This PDF characterizes the statistic of the envelope of a signal corrupted by additive narrowband Gaussian noise. It is also used to model the signal statistics of signals transmitted through some radio channels. The CDF of R is obtained from Equation 2.1-124: ⎛s r⎞ FR (r ) = 1 − Q1 ⎜ , ⎟, r ≥ 0 (Q1 is given in 2.1 - 123) ⎝σ σ ⎠ r
e (
)
− r 2 + s 2 / 2σ 2
99
2.1.4 Some Useful Probability Distributions Generalized Rice distribution R=
n
2 X ∑ i , where X i , i = 1,2,..., n are statistica lly independen t i =1
Gaussian random variables with means mi , i = 1,2,..., n, and identical variances equal to σ 2 . The random variable R 2 = Y has a non - central chi - square distributi on with n degrees of freedom and non - centrality parameter s 2 given by Equation 2.1 - 119. Its PDF is given by Equation 2.1 - 118 : r n/2 ⎛ rs ⎞ − (r 2 + s 2 )/ 2σ 2 p R (r ) = 2 (n − 2 ) / 2 e I n / 2 −1 ⎜ 2 ⎟, σ s ⎝σ ⎠ 100
r≥0
2.1.4 Some Useful Probability Distributions Generalized Rice distribution The CDF is: FR (r ) = P (R ≤ r ) = P
(
)
y ≤ r = P (Y ≤ r 2 ) = FY (r 2 )
where is FY (r 2 ) given by Equation 2.1 - 121. In the special case when m=n/2 is an integer, we have:
⎛s r⎞ FR (r ) = 1 − Qm ⎜ , ⎟, r ≥ 0 ⎝σ σ ⎠ The k-th moment of R is: ⎡1 ⎤ Γ ⎢ (n + k )⎥ 2 2 2 ⎞ ⎛ k / 2 n k n s + 2 ⎣ ⎦ k − s / 2σ 2 ⎜ E R = 2σ e , ; 2 ⎟⎟; 1 F1 ⎜ ⎛1 ⎞ ⎝ 2 2 2σ ⎠ Γ⎜ n ⎟ ⎝2 ⎠
( ) (
)
101
k ≥0
2.1.4 Some Useful Probability Distributions Nakagami m-distribution Frequently used to characterize the statistics of signals transmitted through multi-path fading channels. PDF is given by Nakagami (1960) as: m
2 ⎛ m ⎞ 2 m −1 − mr 2 / Ω e p R (r ) = ⎜ ⎟ r Γ (m ) ⎝ Ω ⎠
( )
Ω = E R2 m=
(
Ω2
)
2
1 m≥ 2
,
E[ R − Ω ] The parameter m is defined as the ratio of moments, called the fading figure. 2
102
2.1.4 Some Useful Probability Distributions Nakagami m-distribution The n-th moment of R is: n/2 ( ) Γ m+n/2 ⎛Ω⎞ E (R n ) = ⎜ ⎟ Γ (m ) ⎝ m ⎠
By setting m=1, the PDF reduces to a Rayleigh PDF.
PDF for the Nakagami m-distribution. Shown with Ω=1. 103
2.1.4 Some Useful Probability Distributions Lognormal distribution: Let X = ln R , where X is normally distribute d with mean m and variance σ 2 .
The PDF of R is given by: 2 ⎧ 1 − ( ln r − m ) / 2σ 2 e ⎪ p ( r ) = ⎨ 2π σ r ⎪0 ⎩
(r ≥ 0) (r < 0)
The lognormal distribution is suitable for modeling the effect of shadowing of the signal due to large obstructions, such as tall buildings, in mobile radio communications. 104
2.1.4 Some Useful Probability Distributions Multivariate Gaussian distribution Assume that Xi, i=1,2,…,n, are Gaussian random variables with means mi, i=1,2,…,n; variances σi2, i=1,2,…,n; and covariances μij, i,j=1,2,…,n. The joint PDF of the Gaussian random variables Xi, i=1,2,…,n, is defined as 1 ⎡ 1 ⎤ ′ −1 ( ) ( ) − − − p( x1 , x2 ,..., xn ) = x m M x m exp x x ⎥ n/2 1/ 2 ⎢ (2π ) (det M ) ⎣ 2 ⎦ M denotes the n × n covariance matrix with elements {μij}; x denotes the n × 1 column vector of the random variables; mx denote the n × 1 column vector of mean values mi, i=1,2,…,n. M-1 denotes the inverse of M. x' denotes the transpose of x. 105
2.1.4 Some Useful Probability Distributions Multivariate Gaussian distribution (cont.) Given v the n-dimensional vector with elements υi, i=1,2,…,n, the characteristic function corresponding to the ndimentional joint PDF is: 1 ⎛ ⎞ jv ′x ′ ′ = exp ⎜ jm X v − v Mv ⎟ ψ ( jv ) = E e 2 ⎝ ⎠
(
)
106
2.1.4 Some Useful Probability Distributions Bi-variate or two-dimensional Gaussian The bivariate Gaussian PDF is given by: p ( x1 , x2 ) =
1 2πσ 1σ 2 1 − ρ 2
⎡ σ 2 ( x − m )2 − 2 ρσ σ ( x − m )( x − m ) + σ 2 ( x − m )2 ⎤ 1 1 2 1 1 2 2 1 2 2 ⎥ × exp ⎢ − 2 1 2 2 2 2πσ 1 σ 2 (1 − ρ ) ⎢⎣ ⎥⎦ ⎡ σ 12 μ12 ⎤ ⎡ m1 ⎤ , μ12 = E ⎡⎣( x1 − m1 )( x2 − m2 ) ⎤⎦ mX = ⎢ ⎥, M = ⎢ 2⎥ ⎣ m2 ⎦ ⎣ μ12 σ 2 ⎦
μij , i ≠ j , 0 ≤ ρ ij ≤ 1 ρ ij = σ iσ j ⎡ σ 22 ⎡ σ 12 ρσ 1σ 2 ⎤ 1 −1 M=⎢ ⎥, M = 2 2 2 2 ⎢ ρσ σ σ 1 σ σ ρ − ) ⎣ − ρσ 1σ 2 2 ⎣ 1 2 ⎦ 1 2 ( 107
− ρσ 1σ 2 ⎤ ⎥ σ 12 ⎦
2.1.4 Some Useful Probability Distributions Bi-variate or two-dimensional Gaussian ρ is a measure of the correlation between X1 and X2. When ρ=0, the joint PDF p(x1,x2) factors into the product p(x1)p(x2), where p(xi), i=1,2, are the marginal PDFs. When the Gaussian random variables X1 and X2 are uncorrelated, they are also statistically independent. This property does not hold in general for other distributions. This property can be extended to n-dimensional Gaussian random variables: if ρij=0 for i≠j, then the random variables Xi, i=1,2,…,n, are uncorrelated and, hence, statistically independent.
108
2.1.4 Some Useful Probability Distributions Linear transformation of n Gaussian random variables Let Y=AX , X=A-1Y, where A is a nonsingular matrix. The Jacobian of this transformation is J=1/det A. The joint PDF of Y is: ′ −1 −1 1 ⎡ 1 −1 ⎤ ( ) ( ) exp − − − A y m M A y m p(y ) = x x ⎥ (2π )n / 2 (det M )1/ 2 det A ⎢⎣ 2 ⎦ 1 ⎡ 1 ⎤ ′ −1 = exp ⎢− (y − m y ) Q (y − m y )⎥ n/2 1/2 (2π ) (det Q ) ⎣ 2 ⎦ where m y = Am x and Q = AMA′ The linear transformation of a set of jointly Gaussian random variables results in another set of jointly Gaussian random variables. 109
2.1.4 Some Useful Probability Distributions Performing a linear transformation that results in n statistically independent Gaussian random variables Gaussian random variables are statistically independent if they are pairwise-uncorrelated, i.e., if the covariance matrix Q is diagonal: Q=AMA’=D, where D is a diagonal matrix. Since M is the covariance matrix, one solution is to select A to be an orthogonal matrix (A’=A-1) consisting of columns that are the eigenvectors of the covariance matrix M. Then, D is a diagonal matrix with diagonal elements equal to the eigenvalues of M. To find the eigenvalues of M, solve the characteristic equation: det(M-λI)=0. (See Example 2.1-5 on P. 51) 110
2.1.5 Upper Bounds on the Tail Probability Chebyshev inequality Suppose X is an arbitrary random variable with finite mean mx and finite variance σx2. For any positive number δ:
σ x2 P( X − mx ≥ δ ) ≤ 2 δ Proof: ∞
σ = ∫ ( x − mx ) 2 p ( x ) dx ≥ ∫ 2 x
| x − m x | ≥δ
−∞
≥δ 2∫
| x − m x | ≥δ
( x − mx ) 2 p ( x ) dx
p ( x ) dx = δ 2 P (| X − mx |≥ δ )
111
2.1.5 Upper Bounds on the Tail Probability Chebyshev inequality It is apparent that the Chebyshev inequality is simply an upper bound on the area under the tails of the PDF p(y), where Y=X-mx, i.e., the area of p(y) in the intervals (-∞,δ) and (δ,∞). Chebyshev inequality may be expressed as: σ x2 1 − [ FY (δ ) − FY ( −δ )] ≤ 2 δ or, equivalently, as
σ x2 1 − [ FX ( mx + δ ) − FX ( mx − δ )] ≤ 2 δ 112
2.1.5 Upper Bounds on the Tail Probability Chebyshev inequality Another way to view the Chebyshev bound is working with the zero mean random variable Y=X-mx. Define a function g(Y) as:
⎧1 g (Y ) = ⎨ ⎩0
(Y ≥ δ ) (Y < δ )
and E [g (Y )] = P ( Y ≥ δ )
Upper-bound g(Y) by the quadratic
(Y/δ)2,
⎛Y ⎞ ( ) ≤ g Y ⎜ ⎟ i.e. ⎝δ ⎠
( )
2 2 2 σ ⎞ ⎛ σ Y E Y y The tail probability E [g (Y )] ≤ E ⎜ ⎟ = x = = 2 2 2 ⎜δ2 ⎟ δ δ δ ⎝ ⎠
113
2
2
2.1.5 Upper Bounds on the Tail Probability Chebychev inequality A quadratic upper bound on g(Y) used in obtaining the tail probability (Chebyshev bound)
For many practical applications, the Chebyshev bound is extremely loose. 114
2.1.5 Upper Bounds on the Tail Probability Chernoff bound The Chebyshev bound given above involves the area under the two tails of the PDF. In some applications we are interested only in the area under one tail, either in the interval (δ, ∞) or in the interval (-∞, δ). In such a case, we can obtain an extremely tight upper bound by over-bounding the function g(Y) by an exponential having a parameter that can be optimized to yield as tight an upper bound as possible. Consider the tail probability in the interval (δ, ∞). ⎧⎪1 g (Y ) ≤ e and g (Y ) is defined as g (Y ) = ⎨ ⎪⎩0 where v ≥ 0 is the parameter to be optimized. v (Y −δ )
115
(Y ≥ δ ) (Y < δ )
2.1.5 Upper Bounds on the Tail Probability Chernoff bound
The expected value of g(Y) is
(
E [g ( y )] = P (Y ≥ δ ) ≤ E e v (Y −δ ) This bound is valid for any υ ≥0. 116
)
2.1.5 Upper Bounds on the Tail Probability Chernoff bound The tightest upper bound is obtained by selecting the value that minimizes E(eυ(Y-δ)). A necessary condition for a minimum is: d E e v (Y −δ ) = 0 dv
(
)
d d E (e v (Y −δ ) ) = E ( e v (Y −δ ) ) dv dv = E[(Y − δ )e v (Y −δ ) ] = e − vδ [ E (Ye v (Y −δ ) ) − δ E (e vY )] = 0
(
)
( )
→ E YevY − δE e vY = 0 117
2.1.5 Upper Bounds on the Tail Probability Chernoff bound Let vˆ be the solution, the upper bound on the one - sided tail probability is :
( )
P(Y ≥ δ ) ≤ e −vˆδ E e vˆY
An upper bound on the lower tail probability can be obtained in a similar manner, with the result that
( )
P (Y ≤ δ ) ≤ e − vˆδ E e vˆY
δ > 1 : P(Y ≥ δ ) ≤ 120
2
δ 2
)
1− 1+δ 2
e
e
−δ
⎛ 1 ⎞ ⎜ 2 for Chebyshev bound ⎟ ⎝δ ⎠
2.1.5 Upper Bounds on the Tail Probability Chernoff bound for random variable with a nonzero mean: Let Y=X-mx, we have: P (Y ≥ δ ) = P ( X − m x ≥ δ ) = P ( X ≥ m x + δ ) ≡ P ( X ≥ δ m ) ⎧⎪1 ( X ≥ δm ) Define: g ( X ) = ⎨ ( X < δm ) ⎪⎩0
and upper-bounded as: g ( X ) ≤ e
v ( X −δ m )
ˆ We can obtain the result: P ( X ≥ δ m ) ≤ e − vˆδ m E ( evX ) For δ0) approaches zero as n approaches infinity. The upper bound converges to zero relatively slowly, i.e., inversely with n. 124
2.1.6 Sums of Random Variables and the Central Limit Theorem The Chernoff bound
⎛1 n ⎞ P (Y − m y ≥ δ ) = P ⎜ ∑ X i − m x ≥ δ ⎟ ⎝ n i =1 ⎠ ⎧⎪ ⎡ ⎛ n ⎛ n ⎞ ⎞ ⎤ ⎫⎪ = P ⎜ ∑ X i ≥ nδ m ⎟ ≤ E ⎨exp ⎢v⎜ ∑ X i − nδ m ⎟ ⎥ ⎬ ⎪⎩ ⎝ i =1 ⎠ ⎠ ⎦ ⎪⎭ ⎣ ⎝ i =1
where δ m = m x + δ and δ > 0. ⎧⎪ ⎡ ⎛ n ⎡ ⎛ n ⎞ ⎤ ⎫⎪ ⎞⎤ − vnδ m E ⎨exp ⎢v⎜ ∑ X i − nδ m ⎟ ⎥ ⎬ = e E ⎢exp ⎜ v ∑ X i ⎟ ⎥ ⎪⎩ ⎠ ⎦ ⎪⎭ ⎣ ⎝ i =1 ⎣ ⎝ i =1 ⎠ ⎦ =e
− vnδ m
∏ E (e ) = [e n
i =1
vX i
− vδ m
( )]
Ee
vX
n
(*)
(Note : X is any one of the X i ). 125
2.1.6 Sums of Random Variables and the Central Limit Theorem The Chernoff bound (cont.) The parameter υ that yields the tightest upper bound is obtained by differentiating Equation (*) and setting the derivative equal to zero. This yields the equation:
(
)
( )
E Xe vX − δ m E e vX = 0
The bound on the upper tail probability is ⎛1 n ⎞ − vˆδ m vˆX n P⎜ ∑ X i ≥ δ m ⎟ ≤ e E (e ) , δ m > m x ⎝ n i =1 ⎠
[
]
In a similar manner, the lower tail probability is upperbounded as: − vˆδ m vˆX n P(Y ≤ δ m ) ≤ e E (e ) , δm < mx
[
]
126
2.1.6 Sums of Random Variables and the Central Limit Theorem Central limit theorem Consider the case in which the random variables Xi, i=1,2,…,n, being summed are statistically independent and identically distributed, each having a finite mean mx and a finite variance σx2. Define a normalized random variable Ui with a zero mean and unit variance. X − mx Ui = i , i = 1,2,..., n
σx
1 n U i . Note that Y has zero mean and unit variance. Let Y = ∑ n i =1 We wish to determine the CDF of Y in the limit as n→∞. 127
2.1.6 Sums of Random Variables and the Central Limit Theorem Central limit theorem The characteristic function of Y is: n ⎡ ⎛ ⎢ ⎜ jv ∑ U i ψ Y ( jv ) = E ( e jvY ) = E ⎢ exp ⎜ i =1 ⎢ n ⎜ ⎜ ⎢ ⎝ ⎣ n ⎛ jv ⎞ = ∏ ψU i ⎜ ⎟ ⎝ n⎠ i =1
⎡ ⎛ jv ⎞ ⎤ = ⎢ψ U ⎜ ⎟⎥ n ⎠⎦ ⎣ ⎝
⎞⎤ ⎟⎥ ⎟⎥ ⎟⎥ ⎟⎥ ⎠⎦
n
Where U denotes any of the Ui, which are identically distributed. 128
2.1.6 Sums of Random Variables and the Central Limit Theorem Central limit theorem Expand the characteristic function of U in a Taylor series: ⎛
ψU ⎜ j ⎝
v ⎞ v v 2 j E U E U = + − + 1 ( ) ( ) ⎟ n 2! n⎠ n 2
( jv )
3
( n ) 3! 3
E (U 3 ) − ...
Since E (U ) = 0, and E (U 2 ) = 1, the above equation simplifies to: v2 1 ⎛ v ⎞ ψU ⎜ j ⎟ = 1 − 2 n + n R ( v, n ) n⎠ ⎝ where R ( v,n ) /n denotes the remainder. We note that R ( v,n ) approaches zero as n → ∞. ⎡ v R ( v, n ) ⎤ ψ Y ( jv ) = ⎢1 − + ⎥ n n 2 ⎣ ⎦ 2
n
⎡ v 2 R ( v, n ) ⎤ ⇒ lnψ Y ( jv ) = n ln ⎢1 − + ⎥ n n 2 ⎣ ⎦ 129
2.1.6 Sums of Random Variables and the Central Limit Theorem Central limit theorem
1 2 1 3 For small value of x, ln (1 + x ) = x − x + x − ... 2 3 ⎡ v 2 R (v, n ) 1 ⎛ v 2 R (v, n ) ⎞ 2 ⎤ ⎟⎟ + ...⎥ lnψ Y ( jv ) = n ⎢ − + − ⎜⎜ − + 2 ⎝ 2n n n ⎠ ⎢⎣ 2n ⎥⎦ 1 2 −v 2 / 2 lim lnψ Y ( jv ) = − v ⇒ lim ψ Y ( jv ) = e n→∞ n →∞ 2 The above equation is the characteristic function of a Gaussian random variable with zero mean and unit variance. We can conclude that the sum of statistically independent and identically distributed random variables with finite mean and variance approaches a Gaussian CDF as n→∞. 130
2.1.6 Sums of Random Variables and the Central Limit Theorem Central limit theorem Although we assumed that the random variables in the sum are identically distributed, the assumption can be relaxed provided that additional restrictions are imposed on the properties of the random variables. If the random variables Xi are i.i.d., sum of n = 30 random variables is adequate for most applications. If the functions fi(x) are smooth, values of n as low as 5 can be used. For more information, referred to Cramer (1946) and Papoulis “Probability, Random Variables, and Stochastic Processes”, pp. 214-221, 3rd Edition. 131
2.2 Stochastic Processes Many of random phenomena that occur in nature are functions of time. In digital communications, we encounter stochastic processes in: The characterization and modeling of signals generated by information sources; The characterization of communication channels used to transmit the information; The characterization of noise generated in a receiver; The design of the optimum receiver for processing the received random signal. 132
2.2 Stochastic Processes Introduction At any given time instant, the value of a stochastic process is a random variable indexed by the parameter t. We denote such a process by X(t). In general, the parameter t is continuous, whereas X may be either continuous or discrete, depending on the characteristics of the source that generates the stochastic process. The noise voltage generated by a single resistor or a single information source represents a single realization of the stochastic process. It is called a sample function.
133
2.2 Stochastic Processes Introduction (cont.) The set of all possible sample functions constitutes an ensemble of sample functions or, equivalently, the stochastic process X(t). In general, the number of sample functions in the ensemble is assumed to be extremely large; often it is infinite. Having defined a stochastic process X(t) as an ensemble of sample functions, we may consider the values of the process at any set of time instants t1>t2>t3>…>tn, where n is any positive integer. In general, the random variables X ti ≡ X (ti ), i = 1,2,..., n, are
(
)
characteri zed statistica lly by their joint PDF p xt1 , xt 2 ,..., xt n . 134
2.2 Stochastic Processes Stationary stochastic processes
Consider another set of n random variables X ti +t ≡ X ( ti + t ) ,
i = 1, 2,..., n, where t is an arbitrary time shift. These random
(
)
variables are characterized by the joint PDF p xt1 +t , xt2 +t ,..., xtn +t .
The jont PDFs of the random variables X ti and X ti +t ,i = 1, 2 ,...,n, may or may not be identical. When they are identical, i.e., when
(
)
(
p xt1 , xt2 ,..., xtn = p xt1 +t , xt2 +t ,..., xtn +t
)
for all t and all n, it is said to be stationary in the strict sense.
When the joint PDFs are different, the stochastic process is non-stationary. 135
2.2.1 Stochastic Processes Averages for a stochastic process are called ensemble averages. The nth moment of the random variable X ti is defined as :
( )= ∫
E X
n ti
∞
−∞
( )
xtni p xti dxti
In general, the value of the nth moment will depend on the time instant ti if the PDF of X ti depends on ti .
( ) ( )
When the process is stationary , p xti + t = p xti for all t. Therefore, the PDF is independen t of time, and, as a consequenc e, the nth moment is independen t of time.
136
2.2.1 Stochastic Processes Two random variables: $X_{t_i} \equiv X(t_i)$, $i = 1, 2$. The correlation is measured by the joint moment:
$$E\left(X_{t_1} X_{t_2}\right) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_{t_1} x_{t_2}\, p(x_{t_1}, x_{t_2})\, dx_{t_1}\, dx_{t_2}$$
Since this joint moment depends on the time instants t1 and t2, it is denoted by φ(t1, t2). φ(t1, t2) is called the autocorrelation function of the stochastic process. For a stationary stochastic process, the joint moment is:
$$E\left(X_{t_1} X_{t_2}\right) = \phi(t_1, t_2) = \phi(t_1 - t_2) = \phi(\tau)$$
Moreover, with $t_1' = t_1 + \tau$,
$$\phi(-\tau) = E\left(X_{t_1} X_{t_1+\tau}\right) = E\left(X_{t_1+\tau} X_{t_1}\right) = E\left(X_{t_1'} X_{t_1'-\tau}\right) = \phi(\tau)$$
Average power in the process X(t): $\phi(0) = E\left(X_t^2\right)$. 137
2.2.1 Stochastic Processes Wide-sense stationary (WSS) A wide-sense stationary process has the property that the mean value of the process is independent of time (a constant) and that the autocorrelation function satisfies the condition φ(t1, t2) = φ(t1 - t2). Wide-sense stationarity is a less stringent condition than strict-sense stationarity. Unless otherwise specified, in any subsequent discussion involving correlation functions, the less stringent condition (wide-sense stationarity) is implied.
138
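As a numerical illustration (my addition, not from the slides), the two WSS conditions can be checked empirically on an ensemble of sample functions; the filtered-white-noise process and all parameter values below are assumptions made only for this sketch.

```python
# Sketch: empirically checking the WSS properties of filtered white noise.
# Assumptions: the first-order recursive filter and its coefficient 0.9 are illustrative.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
ensembles, samples = 2000, 500
w = rng.standard_normal((ensembles, samples))          # white Gaussian noise
x = lfilter([1.0], [1.0, -0.9], w, axis=1)             # each row is one sample function

# Mean value at two different time instants (should be approximately equal and ~0).
print("mean at t1:", x[:, 200].mean(), "  mean at t2:", x[:, 400].mean())

# Autocorrelation phi(t1, t2) should depend only on the difference t1 - t2.
tau = 5
phi_a = np.mean(x[:, 200 + tau] * x[:, 200])           # phi(205, 200)
phi_b = np.mean(x[:, 400 + tau] * x[:, 400])           # phi(405, 400)
print("phi at lag 5, two anchor times:", phi_a, phi_b)
```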
2.2.1 Stochastic Processes Auto-covariance function The auto-covariance function of a stochastic process is defined as:
$$\mu(t_1, t_2) = E\left\{\left[X_{t_1} - m(t_1)\right]\left[X_{t_2} - m(t_2)\right]\right\} = \phi(t_1, t_2) - m(t_1)\, m(t_2)$$
When the process is stationary, the auto-covariance function simplifies to:
$$\mu(t_1, t_2) = \mu(t_1 - t_2) = \mu(\tau) = \phi(\tau) - m^2$$
For a Gaussian random process, higher-order moments can be expressed in terms of the first and second moments. Consequently, a Gaussian random process is completely characterized by its first two moments. 139
2.2.1 Stochastic Processes Averages for a Gaussian process Suppose that X(t) is a Gaussian random process. At time instants $t = t_i$, $i=1,2,\dots,n$, the random variables $X_{t_i}$, $i=1,2,\dots,n$, are jointly Gaussian with mean values $m(t_i)$, $i=1,2,\dots,n$, and auto-covariances:
$$\mu(t_i, t_j) = E\left[\left(X_{t_i} - m(t_i)\right)\left(X_{t_j} - m(t_j)\right)\right],\quad i, j = 1,2,\dots,n$$
If we denote the n × n covariance matrix with elements $\mu(t_i, t_j)$ by M and the vector of mean values by $\mathbf{m}_x$, the joint PDF of the random variables $X_{t_i}$, $i=1,2,\dots,n$, is given by:
$$p(x_1, x_2, \dots, x_n) = \frac{1}{(2\pi)^{n/2} (\det M)^{1/2}} \exp\left[-\frac{1}{2}\left(\mathbf{x} - \mathbf{m}_x\right)' M^{-1} \left(\mathbf{x} - \mathbf{m}_x\right)\right]$$
If the Gaussian process is wide-sense stationary, it is also strict-sense stationary. 140
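A brief sketch (my addition) of how this joint Gaussian PDF can be evaluated numerically; the exponential covariance model $\mu(t_i,t_j)=e^{-|t_i-t_j|}$, zero mean, and the sample times are assumptions chosen only for illustration.

```python
# Sketch: evaluating the joint Gaussian PDF of samples of a Gaussian process.
# Assumptions: exponential covariance exp(-|ti - tj|) and zero mean are illustrative.
import numpy as np
from scipy.stats import multivariate_normal

t = np.array([0.0, 0.5, 1.0, 2.0])                    # sample instants t_i
M = np.exp(-np.abs(t[:, None] - t[None, :]))          # covariance matrix mu(ti, tj)
m_x = np.zeros_like(t)                                # mean vector m(t_i)

x = np.array([0.2, -0.1, 0.3, 0.0])                   # one observed sample vector
print("joint PDF value:", multivariate_normal(mean=m_x, cov=M).pdf(x))

# Direct evaluation of the formula on the slide, for comparison.
d = x - m_x
quad = d @ np.linalg.inv(M) @ d
p = np.exp(-0.5 * quad) / ((2 * np.pi) ** (len(t) / 2) * np.sqrt(np.linalg.det(M)))
print("formula value:   ", p)
```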
2.2.1 Stochastic Processes Averages for joint stochastic processes Let X(t) and Y(t) denote two stochastic processes and let $X_{t_i} \equiv X(t_i)$, $i=1,2,\dots,n$, and $Y_{t_j'} \equiv Y(t_j')$, $j=1,2,\dots,m$, represent the random variables at times $t_1 > t_2 > \cdots > t_n$ and $t_1' > t_2' > \cdots > t_m'$, respectively. The two processes are characterized statistically by their joint PDF:
$$p\left(x_{t_1}, x_{t_2}, \dots, x_{t_n}, y_{t_1'}, y_{t_2'}, \dots, y_{t_m'}\right)$$
The cross-correlation function of X(t) and Y(t), denoted by $\phi_{xy}(t_1, t_2)$, is defined as the joint moment:
$$\phi_{xy}(t_1, t_2) = E\left(X_{t_1} Y_{t_2}\right) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_{t_1} y_{t_2}\, p(x_{t_1}, y_{t_2})\, dx_{t_1}\, dy_{t_2}$$
The cross-covariance is:
$$\mu_{xy}(t_1, t_2) = \phi_{xy}(t_1, t_2) - m_x(t_1)\, m_y(t_2)$$
141
2.2.1 Stochastic Processes Averages for joint stochastic processes When the processes are jointly and individually stationary, we have $\phi_{xy}(t_1, t_2) = \phi_{xy}(t_1 - t_2)$ and $\mu_{xy}(t_1, t_2) = \mu_{xy}(t_1 - t_2)$. Moreover, with $t_1' = t_1 + \tau$,
$$\phi_{xy}(-\tau) = E\left(X_{t_1} Y_{t_1+\tau}\right) = E\left(X_{t_1'-\tau} Y_{t_1'}\right) = E\left(Y_{t_1'} X_{t_1'-\tau}\right) = \phi_{yx}(\tau)$$
The stochastic processes X(t) and Y(t) are said to be statistically independent if and only if:
$$p\left(x_{t_1}, x_{t_2}, \dots, x_{t_n}, y_{t_1'}, y_{t_2'}, \dots, y_{t_m'}\right) = p\left(x_{t_1}, x_{t_2}, \dots, x_{t_n}\right) p\left(y_{t_1'}, y_{t_2'}, \dots, y_{t_m'}\right)$$
for all choices of $t_i$ and $t_i'$ and for all positive integers n and m. The processes are said to be uncorrelated if
$$\phi_{xy}(t_1, t_2) = E\left(X_{t_1}\right) E\left(Y_{t_2}\right) \;\Rightarrow\; \mu_{xy}(t_1, t_2) = 0$$
142
2.2.1 Stochastic Processes Complex-valued stochastic process A complex-valued stochastic process Z(t) is defined as:
$$Z(t) = X(t) + jY(t)$$
where X(t) and Y(t) are stochastic processes. The joint PDF of the random variables $Z_{t_i} \equiv Z(t_i)$, $i=1,2,\dots,n$, is given by the joint PDF of the components $(X_{t_i}, Y_{t_i})$, $i=1,2,\dots,n$. Thus, the PDF that characterizes $Z_{t_i}$, $i=1,2,\dots,n$, is:
$$p\left(x_{t_1}, x_{t_2}, \dots, x_{t_n}, y_{t_1}, y_{t_2}, \dots, y_{t_n}\right)$$
The autocorrelation function is defined as:
$$\phi_{zz}(t_1, t_2) \equiv \tfrac{1}{2} E\left(Z_{t_1} Z_{t_2}^*\right) = \tfrac{1}{2} E\left\{\left(X_{t_1} + jY_{t_1}\right)\left(X_{t_2} - jY_{t_2}\right)\right\} \qquad (**)$$
$$= \tfrac{1}{2}\left\{\phi_{xx}(t_1, t_2) + \phi_{yy}(t_1, t_2) + j\left[\phi_{yx}(t_1, t_2) - \phi_{xy}(t_1, t_2)\right]\right\}$$
143
2.2.1 Stochastic Processes Averages for joint stochastic processes: When the processes X(t) and Y(t) are jointly and individually stationary, the autocorrelation function of Z(t) becomes:
$$\phi_{zz}(t_1, t_2) = \phi_{zz}(t_1 - t_2) = \phi_{zz}(\tau)$$
Furthermore, $\phi_{zz}(\tau) = \phi_{zz}^*(-\tau)$, because from (**), with $t_1' = t_1 - \tau$:
$$\phi_{zz}^*(\tau) = \tfrac{1}{2} E\left(Z_{t_1}^* Z_{t_1-\tau}\right) = \tfrac{1}{2} E\left(Z_{t_1'+\tau}^* Z_{t_1'}\right) = \tfrac{1}{2} E\left(Z_{t_1'} Z_{t_1'+\tau}^*\right) = \phi_{zz}(-\tau)$$
144
2.2.1 Stochastic Processes Averages for joint stochastic processes: Suppose that Z(t) = X(t) + jY(t) and W(t) = U(t) + jV(t) are two complex-valued stochastic processes. The cross-correlation function of Z(t) and W(t) is defined as:
$$\phi_{zw}(t_1, t_2) \equiv \tfrac{1}{2} E\left(Z_{t_1} W_{t_2}^*\right) = \tfrac{1}{2} E\left[\left(X_{t_1} + jY_{t_1}\right)\left(U_{t_2} - jV_{t_2}\right)\right]$$
$$= \tfrac{1}{2}\left\{\phi_{xu}(t_1, t_2) + \phi_{yv}(t_1, t_2) + j\left[\phi_{yu}(t_1, t_2) - \phi_{xv}(t_1, t_2)\right]\right\}$$
When X(t), Y(t), U(t), and V(t) are pairwise stationary, the cross-correlation functions become functions of the time difference. With $t_1' = t_1 - \tau$:
$$\phi_{zw}^*(\tau) = \tfrac{1}{2} E\left(Z_{t_1}^* W_{t_1-\tau}\right) = \tfrac{1}{2} E\left(Z_{t_1'+\tau}^* W_{t_1'}\right) = \tfrac{1}{2} E\left(W_{t_1'} Z_{t_1'+\tau}^*\right) = \phi_{wz}(-\tau)$$
145
2.2.2 Power Density Spectrum A signal can be classified as having either a finite (nonzero) average power (infinite energy) or finite energy. The frequency content of a finite energy signal is obtained as the Fourier transform of the corresponding time function. If the signal is periodic, its energy is infinite and, consequently, its Fourier transform does not exist. The mechanism for dealing with periodic signals is to represent them in a Fourier series. 146
2.2.2 Power Density Spectrum A stationary stochastic process is an infinite-energy signal, and, hence, its Fourier transform does not exist. The spectral characteristic of a stochastic signal is obtained by computing the Fourier transform of the autocorrelation function. The distribution of power with frequency is given by the function:
$$\Phi(f) = \int_{-\infty}^{\infty} \phi(\tau)\, e^{-j2\pi f\tau}\, d\tau$$
The inverse Fourier transform relationship is:
$$\phi(\tau) = \int_{-\infty}^{\infty} \Phi(f)\, e^{j2\pi f\tau}\, df$$
147
2.2.2 Power Density Spectrum
$$\phi(0) = \int_{-\infty}^{\infty} \Phi(f)\, df = E\left(X_t^2\right) \ge 0 \qquad (2.2\text{-}13)$$
Since φ(0) represents the average power of the stochastic signal, which is the area under Φ(f), Φ(f) is the distribution of power as a function of frequency. Φ(f) is called the power density spectrum of the stochastic process. If the stochastic process is real, φ(τ) is real and even, and, hence, Φ(f) is real and even. If the stochastic process is complex, φ(τ) = φ*(-τ) and Φ(f) is real because:
$$\Phi^*(f) = \int_{-\infty}^{\infty} \phi^*(\tau)\, e^{j2\pi f\tau}\, d\tau = \int_{-\infty}^{\infty} \phi^*(-\tau')\, e^{-j2\pi f\tau'}\, d\tau' = \int_{-\infty}^{\infty} \phi(\tau')\, e^{-j2\pi f\tau'}\, d\tau' = \Phi(f)$$
148
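As an illustration (my addition), the autocorrelation/power-spectrum pair can be checked numerically for a simple process; the exponential autocorrelation used below, whose exact transform is known in closed form, is an assumption made for the sketch.

```python
# Sketch: power density spectrum as the Fourier transform of the autocorrelation.
# Assumption: phi(tau) = exp(-|tau|), whose exact transform is 2/(1 + (2*pi*f)^2).
import numpy as np

dt = 0.001
tau = np.arange(-50.0, 50.0, dt)
phi = np.exp(-np.abs(tau))

f = np.array([0.0, 0.1, 0.5, 1.0])
# Numerical Fourier transform of phi(tau) at a few frequencies (simple Riemann sum).
Phi_num = np.array([(phi * np.exp(-2j * np.pi * fk * tau)).sum().real * dt for fk in f])
Phi_exact = 2.0 / (1.0 + (2.0 * np.pi * f) ** 2)
print("numerical:", np.round(Phi_num, 4))
print("exact:    ", np.round(Phi_exact, 4))
```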
2.2.2 Power Density Spectrum Cross-power density spectrum For two jointly stationary stochastic processes X(t) and Y(t), which have a cross-correlation function $\phi_{xy}(\tau)$, the Fourier transform is:
$$\Phi_{xy}(f) = \int_{-\infty}^{\infty} \phi_{xy}(\tau)\, e^{-j2\pi f\tau}\, d\tau$$
$\Phi_{xy}(f)$ is called the cross-power density spectrum.
$$\Phi_{xy}^*(f) = \int_{-\infty}^{\infty} \phi_{xy}^*(\tau)\, e^{j2\pi f\tau}\, d\tau = \int_{-\infty}^{\infty} \phi_{xy}^*(-\tau)\, e^{-j2\pi f\tau}\, d\tau = \int_{-\infty}^{\infty} \phi_{yx}(\tau)\, e^{-j2\pi f\tau}\, d\tau = \Phi_{yx}(f) \qquad (2.2\text{-}15)$$
If X(t) and Y(t) are real stochastic processes,
$$\Phi_{xy}^*(f) = \int_{-\infty}^{\infty} \phi_{xy}(\tau)\, e^{j2\pi f\tau}\, d\tau = \Phi_{xy}(-f) \;\Rightarrow\; \Phi_{yx}(f) = \Phi_{xy}(-f)$$
149
2.2.3 Response of a Linear Time-Invariant System to a Random Input Signal Consider a linear time-invariant system (filter) that is characterized by its impulse response h(t) or, equivalently, by its frequency response H(f), where h(t) and H(f) are a Fourier transform pair. Let x(t) be the input signal to the system and let y(t) denote the output signal.
$$y(t) = \int_{-\infty}^{\infty} h(\tau)\, x(t - \tau)\, d\tau$$
Suppose that x(t) is a sample function of a stationary stochastic process X(t). Since convolution is a linear operation performed on the input signal x(t), the expected value of the integral is equal to the integral of the expected value:
$$m_y = E\left[Y(t)\right] = \int_{-\infty}^{\infty} h(\tau)\, E\left[X(t - \tau)\right] d\tau = m_x \int_{-\infty}^{\infty} h(\tau)\, d\tau = m_x H(0)$$
where the second equality follows from the stationarity of X(t). The mean value of the output process is a constant. 150
2.2.3 Response of a Linear Time-Invariant System to a Random Input Signal The autocorrelation function of the output is:
$$\phi_{yy}(t_1, t_2) = \tfrac{1}{2} E\left(Y_{t_1} Y_{t_2}^*\right) = \tfrac{1}{2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\beta)\, h^*(\alpha)\, E\left[X(t_1 - \beta)\, X^*(t_2 - \alpha)\right] d\alpha\, d\beta$$
$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(\beta)\, h^*(\alpha)\, \phi_{xx}(t_1 - t_2 + \alpha - \beta)\, d\alpha\, d\beta$$
If the input process is stationary, the output is also stationary:
$$\phi_{yy}(\tau) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h^*(\alpha)\, h(\beta)\, \phi_{xx}(\tau + \alpha - \beta)\, d\alpha\, d\beta$$
151
2.2.3 Response of a Linear Time-Invariant System to a Random Input Signal The power density spectrum of the output process is:
$$\Phi_{yy}(f) = \int_{-\infty}^{\infty} \phi_{yy}(\tau)\, e^{-j2\pi f\tau}\, d\tau = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h^*(\alpha)\, h(\beta)\, \phi_{xx}(\tau + \alpha - \beta)\, e^{-j2\pi f\tau}\, d\tau\, d\alpha\, d\beta$$
$$= \Phi_{xx}(f)\, \left|H(f)\right|^2$$
(by the change of variable $\tau_0 = \tau + \alpha - \beta$). The power density spectrum of the output signal is the product of the power density spectrum of the input and the squared magnitude of the frequency response of the system. 152
2.2.3 Response of a Linear Time-Invariant System to a Random Input Signal When the autocorrelation function φyy(τ) is desired, it is usually easier to determine the power density spectrum Φyy(f) and then compute the inverse transform:
$$\phi_{yy}(\tau) = \int_{-\infty}^{\infty} \Phi_{yy}(f)\, e^{j2\pi f\tau}\, df = \int_{-\infty}^{\infty} \Phi_{xx}(f)\, \left|H(f)\right|^2 e^{j2\pi f\tau}\, df$$
The average power in the output signal is:
$$\phi_{yy}(0) = \int_{-\infty}^{\infty} \Phi_{xx}(f)\, \left|H(f)\right|^2 df$$
Since $\phi_{yy}(0) = E\left(|Y_t|^2\right)$, we have:
$$\int_{-\infty}^{\infty} \Phi_{xx}(f)\, \left|H(f)\right|^2 df \ge 0$$
valid for any H(f). 153
2.2.3 Response of a Linear Time-Invariant System to a Random Input Signal Suppose we let |H(f)|² = 1 for an arbitrarily small interval f1 ≤ f ≤ f2 and H(f) = 0 outside this interval. Then we have:
$$\int_{f_1}^{f_2} \Phi_{xx}(f)\, df \ge 0$$
This is possible if and only if Φxx(f) ≥ 0 for all f. Conclusion: Φxx(f) ≥ 0 for all f.
154
2.2.3 Response of a Linear Time-Invariant System to a Random Input Signal
Example 2.2-1 Suppose that a low-pass filter (an RL circuit with resistance R and inductance L) is excited by a stochastic process x(t) having a power density spectrum Φxx(f) = N0/2 for all f. A stochastic process having a flat power density spectrum is called white noise. Let us determine the power density spectrum of the output process. The transfer function of the low-pass filter is:
$$H(f) = \frac{R}{R + j2\pi f L} = \frac{1}{1 + j2\pi f L/R} \;\Rightarrow\; \left|H(f)\right|^2 = \frac{1}{1 + (2\pi L/R)^2 f^2}$$
155
155
2.2.3 Response of a Linear Time-Invariant System to a Random Input Signal
Example 2.2-1 (cont.) The power density spectrum of the output process is:
$$\Phi_{yy}(f) = \frac{N_0}{2} \cdot \frac{1}{1 + (2\pi L/R)^2 f^2}$$
The inverse Fourier transform yields the autocorrelation function:
$$\phi_{yy}(\tau) = \frac{N_0}{2} \int_{-\infty}^{\infty} \frac{e^{j2\pi f\tau}}{1 + (2\pi L/R)^2 f^2}\, df = \frac{R N_0}{4L}\, e^{-(R/L)|\tau|}$$
156
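A quick approximate numerical cross-check of this closed-form result (my addition); the values of R, L, and N0 below are arbitrary assumptions, and the finite frequency grid introduces a small truncation error.

```python
# Sketch: numerically inverting Phi_yy(f) for Example 2.2-1 and comparing it with
# the closed-form result (R*N0/4L) * exp(-(R/L)*|tau|).
# Assumptions: R = 1000 ohm, L = 1e-3 H, N0 = 2e-8 W/Hz are arbitrary example values.
import numpy as np

R, L, N0 = 1000.0, 1e-3, 2e-8
df = 100.0
f = np.arange(-5e7, 5e7, df)                        # frequency grid (truncated, finite)
Phi_yy = (N0 / 2.0) / (1.0 + (2.0 * np.pi * L / R) ** 2 * f ** 2)

for tau in (0.0, 1e-6, 5e-6):
    numeric = (Phi_yy * np.exp(2j * np.pi * f * tau)).sum().real * df
    exact = (R * N0 / (4.0 * L)) * np.exp(-(R / L) * abs(tau))
    print(f"tau = {tau:g} s: numeric {numeric:.4e}, exact {exact:.4e}")
```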
2.2.3 Response of a Linear Time-Invariant System to a Random Input Signal Cross-correlation function between y(t) and x(t):
$$\phi_{yx}(t_1, t_2) = \tfrac{1}{2} E\left(Y_{t_1} X_{t_2}^*\right) = \tfrac{1}{2} \int_{-\infty}^{\infty} h(\alpha)\, E\left[X(t_1 - \alpha)\, X^*(t_2)\right] d\alpha = \int_{-\infty}^{\infty} h(\alpha)\, \phi_{xx}(t_1 - t_2 - \alpha)\, d\alpha = \phi_{yx}(t_1 - t_2)$$
The result is a function of t1 - t2 only, so the stochastic processes X(t) and Y(t) are jointly stationary. With t1 - t2 = τ, we have:
$$\phi_{yx}(\tau) = \int_{-\infty}^{\infty} h(\alpha)\, \phi_{xx}(\tau - \alpha)\, d\alpha$$
In the frequency domain, we have $\Phi_{yx}(f) = \Phi_{xx}(f)\, H(f)$. If the input process is white noise, the cross-correlation of the input with the output of the system yields the impulse response h(t) to within a scale factor. 157
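A minimal discrete-time sketch (my addition) of this identification idea: cross-correlating a white-noise input with the output recovers a scaled impulse response. The 5-tap FIR filter used below is an arbitrary assumption.

```python
# Sketch: estimating an impulse response from input/output cross-correlation
# when the input is white noise (discrete-time analogue of the slide's remark).
# Assumption: the 5-tap FIR filter below is an arbitrary example system.
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(3)
h = np.array([0.5, 1.0, -0.3, 0.2, 0.1])      # "unknown" impulse response
x = rng.standard_normal(1_000_000)            # white input, variance 1
y = lfilter(h, [1.0], x)

# phi_yx(k) = E[y(n) x(n-k)] ~ sigma_x^2 * h(k) for a white input.
N = len(x)
h_est = np.array([np.dot(y[k:], x[:N - k]) / (N - k) for k in range(len(h))])
print(np.round(h_est, 3))                     # should be close to h
```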
2.2.4 Sampling Theorem for Band-Limited Stochastic Processes A deterministic signal s(t) that has a Fourier transform S(f) is called band-limited if S(f) = 0 for |f| > W, where W is the highest frequency contained in s(t). A band-limited signal is uniquely represented by samples of s(t) taken at a rate of fs ≥ 2W samples/s. The minimum rate fN = 2W samples/s is called the Nyquist rate. Sampling below the Nyquist rate results in frequency aliasing.
158
2.2.4 Sampling Theorem for Band-Limited Stochastic Processes The band-limited signal sampled at the Nyquist rate can be reconstructed from its samples by use of the interpolation formula:
$$s(t) = \sum_{n=-\infty}^{\infty} s\!\left(\frac{n}{2W}\right) \frac{\sin\left[2\pi W\left(t - \frac{n}{2W}\right)\right]}{2\pi W\left(t - \frac{n}{2W}\right)}$$
where {s(n/2W)} are the samples of s(t) taken at t = n/2W, n = 0, ±1, ±2, …. Equivalently, s(t) can be reconstructed by passing the sampled signal through an ideal low-pass filter with impulse response h(t) = sin(2πWt)/(2πWt). 159
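The interpolation formula can be tried out numerically; this small sketch (my addition) reconstructs a band-limited test signal from its Nyquist-rate samples. The bandwidth, the two-tone test signal, and the truncation of the infinite sum to a finite number of terms are assumptions.

```python
# Sketch: reconstructing a band-limited signal from Nyquist-rate samples with the
# interpolation (sinc) formula, truncated to a finite number of terms.
# Assumptions: W = 4 Hz and the two-tone test signal are illustrative choices.
import numpy as np

W = 4.0                                        # bandwidth in Hz
def s(t):                                      # band-limited test signal (tones below W)
    return np.cos(2 * np.pi * 1.0 * t) + 0.5 * np.sin(2 * np.pi * 3.0 * t)

n = np.arange(-400, 401)                       # finite truncation of the infinite sum
tn = n / (2 * W)                               # sample instants n/2W
samples = s(tn)

t = np.linspace(-1.0, 1.0, 201)                # reconstruction instants
# np.sinc(x) = sin(pi*x)/(pi*x), so sinc(2W(t - n/2W)) matches the slide's kernel.
kernel = np.sinc(2 * W * (t[:, None] - tn[None, :]))
s_hat = kernel @ samples

print("max reconstruction error:", np.max(np.abs(s_hat - s(t))))
```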
2.2.4 Sampling Theorem for Band-Limited Stochastic Processes The signal reconstruction process based on ideal interpolation (the sampled signal passed through an ideal low-pass interpolation filter) is illustrated in the accompanying figure, which is not reproduced in this text version. 160
2.2.4 Sampling Theorem for Band-Limited Stochastic Processes A stationary stochastic process X(t) is said to be band-limited if its power density spectrum Φ(f) = 0 for |f| > W. Since Φ(f) is the Fourier transform of the autocorrelation function φ(τ), it follows that φ(τ) can be represented as:
$$\phi(\tau) = \sum_{n=-\infty}^{\infty} \phi\!\left(\frac{n}{2W}\right) \frac{\sin\left[2\pi W\left(\tau - \frac{n}{2W}\right)\right]}{2\pi W\left(\tau - \frac{n}{2W}\right)}$$
where {φ(n/2W)} are the samples of φ(τ) taken at τ = n/2W, n = 0, ±1, ±2, …. 161
2.2.4 Sampling Theorem for Band-Limited Stochastic Processes If X(t) is a band-limited stationary stochastic process, then X(t) can be represented as:
$$X(t) = \sum_{n=-\infty}^{\infty} X\!\left(\frac{n}{2W}\right) \frac{\sin\left[2\pi W\left(t - \frac{n}{2W}\right)\right]}{2\pi W\left(t - \frac{n}{2W}\right)} \qquad (A)$$
where {X(n/2W)} are the samples of X(t) taken at t = n/2W, n = 0, ±1, ±2, …. This is the sampling representation for a stationary stochastic process.
162
2.2.4 Sampling Theorem for Band-Limited Stochastic Processes The samples are random variables that are described statistically by appropriate joint probability density functions. The signal representation in (A) is easily established by showing that:
$$E\left\{\left[X(t) - \sum_{n=-\infty}^{\infty} X\!\left(\frac{n}{2W}\right) \frac{\sin\left[2\pi W\left(t - \frac{n}{2W}\right)\right]}{2\pi W\left(t - \frac{n}{2W}\right)}\right]^2\right\} = 0$$
Equality between the sampling representation and the stochastic process X(t) holds in the sense that the mean square error is zero. 163
2.2.5 Discrete-Time Stochastic Signals and Systems A discrete-time stochastic process X(n), consisting of an ensemble of sample sequences {x(n)}, is usually obtained by uniformly sampling a continuous-time stochastic process. The mth moment of X(n) is defined as:
$$E\left(X_n^m\right) = \int_{-\infty}^{\infty} X_n^m\, p(X_n)\, dX_n$$
The autocorrelation sequence is:
$$\phi(n, k) = \tfrac{1}{2} E\left(X_n X_k^*\right) = \tfrac{1}{2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} X_n X_k^*\, p(X_n, X_k)\, dX_n\, dX_k$$
The auto-covariance sequence is:
$$\mu(n, k) = \phi(n, k) - \tfrac{1}{2}\, E(X_n)\, E\left(X_k^*\right)$$
164
2.2.5 Discrete-Time Stochastic Signals and Systems For a stationary process, we have φ(n, k) ≡ φ(n - k), μ(n, k) ≡ μ(n - k), and
$$\mu(n - k) = \phi(n - k) - \tfrac{1}{2}\left|m_x\right|^2$$
where $m_x = E(X_n)$ is the mean value. A discrete-time stationary process has infinite energy but a finite average power, which is given as:
$$E\left(\left|X_n\right|^2\right) = \phi(0)$$
The power density spectrum for the discrete-time process is obtained by computing the Fourier transform of φ(n):
$$\Phi(f) = \sum_{n=-\infty}^{\infty} \phi(n)\, e^{-j2\pi f n}$$
165
2.2.5 Discrete-Time Stochastic Signals and Systems The inverse transform relationship is:
$$\phi(n) = \int_{-1/2}^{1/2} \Phi(f)\, e^{j2\pi f n}\, df$$
The power density spectrum Φ(f) is periodic with period fp = 1; in other words, Φ(f + k) = Φ(f) for k = 0, ±1, ±2, …. This periodicity is a characteristic of the Fourier transform of any discrete-time sequence.
166
2.2.5 Discrete-Time Stochastic Signals and Systems Response of a discrete-time, linear time-invariant system to a stationary stochastic input signal. The system is characterized in the time domain by its unit sample response h(n) and in the frequency domain by the frequency response H(f):
$$H(f) = \sum_{n=-\infty}^{\infty} h(n)\, e^{-j2\pi f n}$$
The response of the system to the stationary stochastic input signal X(n) is given by the convolution sum:
$$y(n) = \sum_{k=-\infty}^{\infty} h(k)\, x(n - k)$$
167
2.2.5 Discrete-Time Stochastic Signals and Systems Response of a discrete-time, linear time-invariant system to a stationary stochastic input signal. The mean value of the output of the system is:
$$m_y = E\left[y(n)\right] = \sum_{k=-\infty}^{\infty} h(k)\, E\left[x(n - k)\right] = m_x \sum_{k=-\infty}^{\infty} h(k) = m_x H(0)$$
where H(0) is the zero-frequency [direct current (DC)] gain of the system.
168
2.2.5 Discrete-Time Stochastic Signals and Systems The autocorrelation sequence for the output process is:
$$\phi_{yy}(k) = \tfrac{1}{2} E\left[y^*(n)\, y(n + k)\right] = \tfrac{1}{2} \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} h^*(i)\, h(j)\, E\left[x^*(n - i)\, x(n + k - j)\right]$$
$$= \sum_{i=-\infty}^{\infty}\sum_{j=-\infty}^{\infty} h^*(i)\, h(j)\, \phi_{xx}(k - j + i)$$
By taking the Fourier transform of φyy(k) and substituting the relation in Equation 2.2-49, we obtain the corresponding frequency-domain relationship:
$$\Phi_{yy}(f) = \Phi_{xx}(f)\, \left|H(f)\right|^2$$
Φyy(f), Φxx(f), and H(f) are periodic functions of frequency with period fp = 1. 169
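A short numerical sketch (my addition) of the relation Φyy(f) = Φxx(f)|H(f)|² for a discrete-time system; the Butterworth filter, the white Gaussian input, and all parameters are illustrative assumptions.

```python
# Sketch: output PSD of a discrete-time LTI filter driven by white noise,
# compared with Phi_xx(f) * |H(f)|^2.
# Assumptions: unit-variance white input (flat PSD) and a 4th-order Butterworth filter.
import numpy as np
from scipy import signal

rng = np.random.default_rng(2)
x = rng.standard_normal(500_000)                # white input: two-sided PSD ~ 1
b, a = signal.butter(4, 0.25)                   # illustrative low-pass filter
y = signal.lfilter(b, a, x)

f, Pyy = signal.welch(y, fs=1.0, nperseg=4096)  # estimated output PSD (one-sided)
_, H = signal.freqz(b, a, worN=f, fs=1.0)       # frequency response on the same grid

# With a one-sided estimate, the white-input PSD is 2, so Pyy should track 2*|H|^2.
ratio = Pyy[10:500] / (2.0 * np.abs(H[10:500]) ** 2)
print("median Phi_yy / (Phi_xx |H|^2):", np.median(ratio))   # should be close to 1
```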
2.2.6 Cyclostationary Processes For signals that carry digital information, we encounter stochastic processes with statistical averages that are periodic. Consider a stochastic process of the form:
$$X(t) = \sum_{n=-\infty}^{\infty} a_n\, g(t - nT)$$
where {an} is a discrete-time sequence of random variables with mean $m_a = E(a_n)$ for all n and autocorrelation sequence $\phi_{aa}(k) = \tfrac{1}{2} E\left(a_n^* a_{n+k}\right)$. The signal g(t) is deterministic. The sequence {an} represents the digital information sequence that is transmitted over the communication channel, and 1/T represents the rate of transmission of the information symbols. 170
2.2.6 Cyclostationary Processes The mean value is:
$$E\left[X(t)\right] = \sum_{n=-\infty}^{\infty} E(a_n)\, g(t - nT) = m_a \sum_{n=-\infty}^{\infty} g(t - nT)$$
The mean is time-varying and is periodic with period T. The autocorrelation function of X(t) is:
$$\phi_{xx}(t + \tau, t) = \tfrac{1}{2} E\left[X(t + \tau)\, X^*(t)\right] = \tfrac{1}{2} \sum_{n=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} E\left(a_n^* a_m\right) g^*(t - nT)\, g(t + \tau - mT)$$
$$= \sum_{n=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} \phi_{aa}(m - n)\, g^*(t - nT)\, g(t + \tau - mT)$$
171
2.2.6 Cyclostationary Processes We observe that
$$\phi_{xx}(t + \tau + kT,\, t + kT) = \phi_{xx}(t + \tau,\, t)$$
for k = ±1, ±2, …. Hence, the autocorrelation function of X(t) is also periodic with period T. Such a stochastic process is called cyclostationary or periodically stationary. Since the autocorrelation function depends on both the variables t and τ, its frequency-domain representation requires the use of a two-dimensional Fourier transform. The time-average autocorrelation function over a single period is defined as:
$$\bar{\phi}_{xx}(\tau) = \frac{1}{T} \int_{-T/2}^{T/2} \phi_{xx}(t + \tau,\, t)\, dt$$
172
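A small simulation sketch (my addition) of these periodicity properties for a binary PAM-type process: the autocorrelation repeats when both time arguments are shifted by one symbol period T, yet it still depends on where within the period it is evaluated. The i.i.d. ±1 symbols, rectangular pulse, and symbol period of 8 samples are assumptions.

```python
# Sketch: empirical autocorrelation of a PAM-type cyclostationary process,
# checking phi_xx(t + tau, t) = phi_xx(t + tau + T, t + T).
# Assumptions: i.i.d. +/-1 symbols, rectangular g(t), T = 8 samples (all illustrative).
import numpy as np

rng = np.random.default_rng(4)
T, nsym, runs = 8, 50, 20000
tau = 2                                          # lag in samples
t0, t1 = 3, 7                                    # two instants within one symbol period

acc = np.zeros(3)
for _ in range(runs):
    a = rng.choice([-1.0, 1.0], size=nsym)       # i.i.d. +/-1 information symbols a_n
    x = np.repeat(a, T)                          # rectangular pulse shaping, period T
    acc += [x[t0 + tau] * x[t0],                 # phi_xx(t0 + tau, t0)
            x[t0 + T + tau] * x[t0 + T],         # shifted by one period: should match
            x[t1 + tau] * x[t1]]                 # different t within the period: differs
phi = acc / runs
print("phi(t0+tau, t0)     ~", phi[0])           # ~1 (both samples in the same symbol)
print("phi(t0+T+tau, t0+T) ~", phi[1])           # ~1 (periodicity in t with period T)
print("phi(t1+tau, t1)     ~", phi[2])           # ~0 (depends on t, not only on tau)
```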
2.2.6 Cyclostationary Processes Thus, we eliminate the time dependence by dealing with the average autocorrelation function. The Fourier transform of $\bar{\phi}_{xx}(\tau)$ yields the average power density spectrum of the cyclostationary stochastic process. This approach allows us to characterize a cyclostationary process simply in the frequency domain in terms of the power spectrum. The power density spectrum is:
$$\bar{\Phi}_{xx}(f) = \int_{-\infty}^{\infty} \bar{\phi}_{xx}(\tau)\, e^{-j2\pi f\tau}\, d\tau$$
173