Chapter 3 Source Coding


Table of Contents

3.1 Mathematical Models for Information Sources
3.2 A Logarithmic Measure of Information
    3.2.1 Average Mutual Information and Entropy
    3.2.2 Information Measures for Continuous Random Variables
3.3 Coding for Discrete Sources
    3.3.1 Coding for Discrete Memoryless Sources
    3.3.2 Discrete Stationary Sources
    3.3.3 The Lempel-Ziv Algorithm


Table of Contents (cont.)

3.4 Coding for Analog Sources – Optimum Quantization
    3.4.1 Rate-Distortion Function
    3.4.2 Scalar Quantization
    3.4.3 Vector Quantization
3.5 Coding Techniques for Analog Sources
    3.5.1 Temporal Waveform Coding
    3.5.2 Spectral Waveform Coding
    3.5.3 Model-Based Source Coding


Introduction

There are two types of sources: analog sources and discrete (digital) sources. Whether a source is analog or discrete, a digital communication system is designed to transmit information in digital form, so the output of the source must be converted to a format that can be transmitted digitally. This conversion of the source output to a digital form is generally performed by the source encoder, whose output may be assumed to be a sequence of binary digits. In this chapter, we treat source encoding based on mathematical models of information sources and provide a quantitative measure of the information emitted by a source.


3.1 Mathematical Models for Information Sources

The output of any information source is random, so the source output is characterized in statistical terms. To construct a mathematical model for a discrete source, we assume that each letter in the alphabet {x1, x2, …, xL} has a given probability pk of occurrence:

$$p_k = P(X = x_k), \quad 1 \le k \le L, \qquad \sum_{k=1}^{L} p_k = 1$$
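As an illustration of this model (a minimal Python sketch, not part of the original slides; the four-letter alphabet and its probabilities are assumed values), a discrete source can be simulated by drawing independent letters according to the probabilities pk:

```python
import random

# Hypothetical DMS: alphabet {x1, x2, x3, x4} with assumed probabilities p_k.
alphabet = ["x1", "x2", "x3", "x4"]
probs = [0.5, 0.25, 0.125, 0.125]

assert abs(sum(probs) - 1.0) < 1e-12  # the p_k must sum to 1

def emit(n):
    """Draw n statistically independent letters from the memoryless source."""
    return random.choices(alphabet, weights=probs, k=n)

print(emit(10))
```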

3.1 Mathematical Models for Information Sources

Two mathematical models of discrete sources:

If the output sequence from the source is statistically independent, i.e., the current output letter is statistically independent of all past and future outputs, then the source is said to be memoryless. Such a source is called a discrete memoryless source (DMS).

If the discrete source output is statistically dependent, we may construct a mathematical model based on statistical stationarity. By definition, a discrete source is said to be stationary if the joint probabilities of two sequences of length n, say a1, a2, …, an and a1+m, a2+m, …, an+m, are identical for all n ≥ 1 and for all shifts m. In other words, the joint probabilities for any arbitrary length sequence of source outputs are invariant under a shift in the time origin.

3.1 Mathematical Models for Information Sources

Analog source: An analog source has an output waveform x(t) that is a sample function of a stochastic process X(t). Assume that X(t) is a stationary stochastic process with autocorrelation function φxx(τ) and power spectral density Φxx(f). When X(t) is a band-limited stochastic process, i.e., Φxx(f) = 0 for |f| ≥ W, the sampling theorem may be used to represent X(t) as:

$$X(t) = \sum_{n=-\infty}^{\infty} X\!\left(\frac{n}{2W}\right) \frac{\sin\!\left[2\pi W\left(t - \frac{n}{2W}\right)\right]}{2\pi W\left(t - \frac{n}{2W}\right)}$$

By applying the sampling theorem, we may convert the output of an analog source into an equivalent discrete-time source.
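The sampling-theorem expansion can be checked numerically. The sketch below is only an illustration under assumed values (W = 100 Hz, a 60 Hz test tone, and a truncated sum); it reconstructs X(t) at an arbitrary time instant from the samples X(n/2W) using the sin(x)/x interpolation kernel.

```python
import numpy as np

W = 100.0                 # assumed bandwidth in Hz
fs = 2 * W                # Nyquist sampling rate 2W
n = np.arange(-200, 201)  # finite set of sample indices (truncated sum)
x_samples = np.cos(2 * np.pi * 60.0 * n / fs)   # samples of a band-limited test tone

def reconstruct(t):
    """Evaluate the truncated sinc-interpolation sum at time t (seconds)."""
    # np.sinc(u) = sin(pi*u)/(pi*u), so sinc(2*W*t - n) equals the kernel
    # sin[2*pi*W*(t - n/2W)] / [2*pi*W*(t - n/2W)] in the expansion above.
    return np.sum(x_samples * np.sinc(fs * t - n))

t0 = 0.0123
print(reconstruct(t0), np.cos(2 * np.pi * 60.0 * t0))   # close, up to truncation error
```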

3.1 Mathematical Models for Information Sources

Analog source (cont.): The output samples {X(n/2W)} of the stationary source are generally continuous, and they cannot be represented in digital form without some loss in precision. We may quantize each sample to a set of discrete values, but the quantization process results in a loss of precision, and consequently the original signal cannot be reconstructed exactly from the quantized sample values.

3.2 A Logarithmic Measure of Information

Consider two discrete random variables with possible outcomes xi, i = 1, 2, …, n, and yj, j = 1, 2, …, m. When X and Y are statistically independent, the occurrence of Y = yj provides no information about the occurrence of X = xi. When X and Y are fully dependent, such that the occurrence of Y = yj determines the occurrence of X = xi, the information content is simply that provided by the event X = xi. The mutual information between xi and yj, i.e., the information provided by the occurrence of the event Y = yj about the event X = xi, is defined as:

$$I(x_i; y_j) = \log \frac{P(x_i \mid y_j)}{P(x_i)}$$

3.2 A Logarithmic Measure of Information

The units of I(xi; yj) are determined by the base of the logarithm, which is usually selected as either 2 or e. When the base of the logarithm is 2, the units of I(xi; yj) are bits. When the base is e, the units of I(xi; yj) are called nats (natural units). The information measured in nats is equal to ln 2 times the information measured in bits, since

$$\log_a b = \frac{\log_c b}{\log_c a} \quad \Rightarrow \quad \ln a = \ln 2 \cdot \log_2 a = 0.69315 \, \log_2 a$$

When X and Y are statistically independent, P(xi | yj) = P(xi), so I(xi; yj) = 0. When X and Y are fully dependent, P(xi | yj) = 1, and hence I(xi; yj) = −log P(xi).

3.2 A Logarithmic Measure of Information

Self-information of the event X = xi is defined as I(xi) = −log P(xi). Note that a high-probability event conveys less information than a low-probability event. If there is only a single event x with probability P(x) = 1, then I(x) = 0.

Example 3.2-1: A discrete information source emits a binary digit with equal probability. The information content of each output is:

$$I(x_i) = -\log_2 P(x_i) = -\log_2 \tfrac{1}{2} = 1 \text{ bit}, \qquad x_i = 0, 1$$

For a block of k binary digits, if the source is memoryless, there are M = 2^k possible k-bit blocks, each with probability 2^−k. The self-information of a block is:

$$I(x_i') = -\log_2 2^{-k} = k \text{ bits}$$

3.2 A Logarithmic Measure of Information

The information provided by the occurrence of the event Y = yj about the event X = xi is identical to the information provided by the occurrence of the event X = xi about the event Y = yj, since:

$$I(x_i; y_j) = \log \frac{P(x_i \mid y_j)}{P(x_i)} = \log \frac{P(x_i \mid y_j) P(y_j)}{P(x_i) P(y_j)} = \log \frac{P(x_i, y_j)}{P(x_i) P(y_j)} = \log \frac{P(y_j \mid x_i) P(x_i)}{P(x_i) P(y_j)} = \log \frac{P(y_j \mid x_i)}{P(y_j)} = I(y_j; x_i)$$

Example 3.2-2: X and Y are binary-valued {0, 1} random variables that represent the input and output of a binary channel. The input symbols are equally likely.

3.2 A Logarithmic Measure of Information

Example 3.2-2 (cont.): The output symbols depend on the input according to the conditional probabilities:

$$P(Y=0 \mid X=0) = 1 - p_0, \qquad P(Y=1 \mid X=0) = p_0$$
$$P(Y=1 \mid X=1) = 1 - p_1, \qquad P(Y=0 \mid X=1) = p_1$$

To find the mutual information about X = 0 and X = 1 given that Y = 0, we first need:

$$P(Y=0) = P(Y=0 \mid X=0)P(X=0) + P(Y=0 \mid X=1)P(X=1) = \tfrac{1}{2}(1 - p_0 + p_1)$$
$$P(Y=1) = P(Y=1 \mid X=0)P(X=0) + P(Y=1 \mid X=1)P(X=1) = \tfrac{1}{2}(1 - p_1 + p_0)$$

3.2 A Logarithmic Measure of Information

Example 3.2-2 (cont.): The mutual information about X = 0 given that Y = 0 is:

$$I(x_1; y_1) = I(0; 0) = \log_2 \frac{P(Y=0 \mid X=0)}{P(Y=0)} = \log_2 \frac{2(1 - p_0)}{1 - p_0 + p_1}$$

The mutual information about X = 1 given that Y = 0 is:

$$I(x_2; y_1) \equiv I(1; 0) = \log_2 \frac{2 p_1}{1 - p_0 + p_1}$$

If the channel is noiseless, p0 = p1 = 0:

$$I(0; 0) = \log_2 2 = 1 \text{ bit}$$

If the channel is useless, p0 = p1 = 0.5:

$$I(0; 0) = \log_2 1 = 0 \text{ bits}$$
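These expressions are easy to evaluate numerically. The short Python sketch below (not part of the slides; the function name and the trial values of p0 and p1 are illustrative assumptions) reproduces the noiseless and useless limits:

```python
import math

def mutual_info_given_y0(p0, p1):
    """I(0;0) and I(1;0) in bits for the binary channel of Example 3.2-2,
    assuming equally likely inputs P(X=0) = P(X=1) = 1/2."""
    p_y0 = 0.5 * (1 - p0) + 0.5 * p1                      # P(Y=0)
    i_00 = math.log2((1 - p0) / p_y0)                     # log2 P(Y=0|X=0)/P(Y=0)
    i_10 = math.log2(p1 / p_y0) if p1 > 0 else float("-inf")
    return i_00, i_10

print(mutual_info_given_y0(0.0, 0.0))   # noiseless channel: I(0;0) = 1 bit
print(mutual_info_given_y0(0.5, 0.5))   # useless channel:   I(0;0) = 0 bits
print(mutual_info_given_y0(0.1, 0.1))   # intermediate case
```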

3.2 A Logarithmic Measure of Information

Conditional self-information is defined as:

$$I(x_i \mid y_j) = \log \frac{1}{P(x_i \mid y_j)} = -\log P(x_i \mid y_j)$$

$$I(x_i; y_j) = \log P(x_i \mid y_j) - \log P(x_i) = I(x_i) - I(x_i \mid y_j)$$

We interpret I(xi | yj) as the self-information about the event X = xi after having observed the event Y = yj. Since both I(xi | yj) and I(xi) are greater than or equal to zero, the mutual information between a pair of events can be positive, negative, or zero.

3.2.1 Average Mutual Information and Entropy

The average mutual information between X and Y is:

$$I(X; Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) I(x_i; y_j) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) \log \frac{P(x_i, y_j)}{P(x_i) P(y_j)}$$

I(X; Y) = 0 when X and Y are statistically independent, and I(X; Y) ≥ 0 in general (Problem 3-4).

The average self-information H(X) is:

$$H(X) = \sum_{i=1}^{n} P(x_i) I(x_i) = -\sum_{i=1}^{n} P(x_i) \log P(x_i)$$

When X represents the alphabet of possible output letters from a source, H(X) represents the average self-information per source letter, and it is called the entropy of the source.
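The entropy is straightforward to compute from a probability vector. The following sketch (assumed example distributions, not from the slides) also previews the fact, discussed next, that the entropy is maximized by equally probable letters:

```python
import math

def entropy(p):
    """H(X) = -sum_i p_i log2 p_i in bits; terms with p_i = 0 contribute nothing."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits
print(entropy([0.25] * 4))                  # 2.0 bits = log2(4), the maximum for n = 4
```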

3.2.1 Average Mutual Information and Entropy

In the special case in which the letters from the source are equally probable, P(xi) = 1/n for all i, we have:

$$H(X) = -\sum_{i=1}^{n} \frac{1}{n} \log \frac{1}{n} = \log n$$

In general, H(X) ≤ log n for any given set of source letter probabilities. In other words, the entropy of a discrete source is a maximum when the output letters are equally probable.

3.2.1 Average Mutual Information and Entropy

Example 3.2-3: Consider a source that emits a sequence of statistically independent letters, where each output letter is either 0 with probability q or 1 with probability 1 − q. The entropy of this source is:

$$H(X) \equiv H(q) = -q \log q - (1 - q) \log(1 - q)$$

This is the binary entropy function; its maximum value occurs at q = 0.5, where H(0.5) = 1 bit. (Figure: plot of the binary entropy function H(q) versus q.)

3.2.1 Average Mutual Information and Entropy

The average conditional self-information is called the conditional entropy and is defined as:

$$H(X \mid Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) \log \frac{1}{P(x_i \mid y_j)}$$

H(X | Y) is the information (or uncertainty) in X after Y is observed. The average mutual information can be written in terms of H(X) and H(X | Y):

$$I(X; Y) = \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) \log \frac{P(x_i, y_j)}{P(x_i) P(y_j)}$$
$$= \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) \left[ \log \{ P(x_i \mid y_j) P(y_j) \} - \log P(x_i) - \log P(y_j) \right]$$
$$= \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) \left[ \log P(x_i \mid y_j) - \log P(x_i) \right]$$
$$= -\sum_{i=1}^{n} P(x_i) \log P(x_i) + \sum_{i=1}^{n} \sum_{j=1}^{m} P(x_i, y_j) \log P(x_i \mid y_j)$$
$$= H(X) - H(X \mid Y)$$
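The identity I(X;Y) = H(X) − H(X|Y) can be verified numerically for any joint distribution. In the sketch below the 2×2 joint probability table is an arbitrary assumed example, not taken from the text:

```python
import math

# Hypothetical joint distribution P(x_i, y_j), used only to illustrate the identity.
P = [[0.30, 0.10],
     [0.05, 0.55]]

def H(p):
    return -sum(v * math.log2(v) for v in p if v > 0)

px = [sum(row) for row in P]                               # marginal P(x_i)
py = [sum(P[i][j] for i in range(2)) for j in range(2)]    # marginal P(y_j)

H_X = H(px)
# H(X|Y) = sum_ij P(x_i, y_j) log2[ 1 / P(x_i|y_j) ], with P(x_i|y_j) = P(x_i,y_j)/P(y_j)
H_X_given_Y = sum(P[i][j] * math.log2(py[j] / P[i][j])
                  for i in range(2) for j in range(2) if P[i][j] > 0)
# I(X;Y) computed directly from its definition
I_XY = sum(P[i][j] * math.log2(P[i][j] / (px[i] * py[j]))
           for i in range(2) for j in range(2) if P[i][j] > 0)

print(I_XY, H_X - H_X_given_Y)   # the two values agree
```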

3.2.1 Average Mutual Information and Entropy

Since I(X; Y) ≥ 0, it follows that H(X) ≥ H(X | Y), with equality if and only if X and Y are statistically independent. H(X | Y) can be interpreted as the average amount of uncertainty (conditional self-information) in X after we observe Y, and H(X) as the average amount of uncertainty (self-information) prior to the observation. I(X; Y) is the average amount of information (mutual information) provided about the set X by the observation of the set Y. Since H(X) ≥ H(X | Y), it is clear that conditioning on the observation Y does not increase the entropy.

3.2.1 Average Mutual Information and Entropy

Example 3.2-4: Consider Example 3.2-2 for the case p0 = p1 = p. Let P(X = 0) = q and P(X = 1) = 1 − q. The entropy is:

$$H(X) \equiv H(q) = -q \log q - (1 - q) \log(1 - q)$$


3.2.1 Average Mutual Information and Entropy

As in the preceding example, when the conditional entropy H(X | Y) is viewed in terms of a channel whose input is X and whose output is Y, H(X | Y) is called the equivocation and is interpreted as the amount of average uncertainty remaining in X after observation of Y.

3.2.1 Average Mutual Information and Entropy

The entropy of two or more random variables is:

$$H(X_1 X_2 \ldots X_K) = -\sum_{j_1=1}^{n_1} \sum_{j_2=1}^{n_2} \cdots \sum_{j_K=1}^{n_K} P(x_{j_1} x_{j_2} \ldots x_{j_K}) \log P(x_{j_1} x_{j_2} \ldots x_{j_K})$$

Since P(x1 x2 … xK) = P(x1) P(x2 | x1) P(x3 | x1 x2) … P(xK | x1 x2 … xK−1), the joint entropy obeys the chain rule:

$$H(X_1 X_2 \ldots X_K) = H(X_1) + H(X_2 \mid X_1) + H(X_3 \mid X_1 X_2) + \cdots + H(X_K \mid X_1 X_2 \ldots X_{K-1}) = \sum_{i=1}^{K} H(X_i \mid X_1 X_2 \ldots X_{i-1})$$

Since H(X) ≥ H(X | Y), taking X = Xm and Y = X1 X2 … Xm−1 gives

$$H(X_1 X_2 \ldots X_K) \le \sum_{m=1}^{K} H(X_m)$$
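Both the chain rule and the bound H(X1X2…XK) ≤ ΣH(Xm) can be checked on a small joint PMF. The two-variable table below is an assumed example (a minimal sketch, not from the slides):

```python
import math

# Hypothetical joint PMF of two binary random variables (X1, X2).
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H_of(values):
    return -sum(p * math.log2(p) for p in values if p > 0)

p1 = {a: sum(p for (x1, _), p in P.items() if x1 == a) for a in (0, 1)}   # P(X1)
p2 = {b: sum(p for (_, x2), p in P.items() if x2 == b) for b in (0, 1)}   # P(X2)

H12 = H_of(P.values())
H1, H2 = H_of(p1.values()), H_of(p2.values())
# H(X2|X1) = sum P(x1,x2) log2[ P(x1) / P(x1,x2) ]
H2_given_1 = sum(p * math.log2(p1[x1] / p) for (x1, _), p in P.items() if p > 0)

print(H12, H1 + H2_given_1)   # chain rule: these are equal
print(H12, H1 + H2)           # H(X1 X2) <= H(X1) + H(X2)
```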

3.2.2 Information Measures for Continuous Random Variables

If X and Y are random variables with joint PDF p(x, y) and marginal PDFs p(x) and p(y), the average mutual information between X and Y is defined as:

$$I(X; Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x) \, p(y \mid x) \log \frac{p(y \mid x) \, p(x)}{p(x) \, p(y)} \, dx \, dy$$

Although the definition of the average mutual information carries over to continuous random variables, the concept of self-information does not. The problem is that a continuous random variable requires an infinite number of binary digits to represent it exactly. Hence, its self-information is infinite and, therefore, its entropy is also infinite.

3.2.2 Information Measures for Continuous Random Variables

The differential entropy of the continuous random variable X is defined as:

$$H(X) = -\int_{-\infty}^{\infty} p(x) \log p(x) \, dx$$

Note that this quantity does not have the physical meaning of self-information, although it may appear to be a natural extension of the definition of entropy for a discrete random variable.

The average conditional entropy of X given Y is defined as:

$$H(X \mid Y) = -\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x, y) \log p(x \mid y) \, dx \, dy$$

The average mutual information may be expressed as:

$$I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$$

3.2.2 Information Measures for Continuous Random Variables

Suppose X is discrete and has possible outcomes xi, i = 1, 2, …, n, and Y is continuous and is described by its marginal PDF p(y). When X and Y are statistically dependent, we may express p(y) as:

$$p(y) = \sum_{i=1}^{n} p(y \mid x_i) P(x_i)$$

The mutual information provided about the event X = xi by the occurrence of the event Y = y is:

$$I(x_i; y) = \log \frac{p(y \mid x_i) P(x_i)}{p(y) P(x_i)} = \log \frac{p(y \mid x_i)}{p(y)}$$

The average mutual information between X and Y is:

$$I(X; Y) = \sum_{i=1}^{n} \int_{-\infty}^{\infty} p(y \mid x_i) P(x_i) \log \frac{p(y \mid x_i)}{p(y)} \, dy$$

3.2.2 Information Measures for Continuous Random Variables

Example 3.2-5: Let X be a discrete random variable with two equally probable outcomes x1 = A and x2 = −A. Let the conditional PDFs p(y | xi), i = 1, 2, be Gaussian with mean xi and variance σ²:

$$p(y \mid A) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(y-A)^2 / 2\sigma^2}, \qquad p(y \mid -A) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(y+A)^2 / 2\sigma^2}$$

The average mutual information is:

$$I(X; Y) = \frac{1}{2} \int_{-\infty}^{\infty} \left[ p(y \mid A) \log \frac{p(y \mid A)}{p(y)} + p(y \mid -A) \log \frac{p(y \mid -A)}{p(y)} \right] dy$$

where

$$p(y) = \frac{1}{2} \left[ p(y \mid A) + p(y \mid -A) \right]$$
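The integral in Example 3.2-5 has no simple closed form, but it is easy to evaluate numerically. The sketch below (assumed values A = σ = 1, a finite grid, and base-2 logarithms so the result is in bits) is an illustration only:

```python
import numpy as np

A, sigma = 1.0, 1.0                      # assumed signal amplitude and noise std
y = np.linspace(-10, 10, 20001)          # finite integration grid
dy = y[1] - y[0]

p_pos = np.exp(-(y - A) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
p_neg = np.exp(-(y + A) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
p_y = 0.5 * (p_pos + p_neg)              # mixture density p(y)

integrand = 0.5 * (p_pos * np.log2(p_pos / p_y) + p_neg * np.log2(p_neg / p_y))
I_XY = np.sum(integrand) * dy            # rectangle-rule approximation of the integral
print(I_XY)   # tends to 1 bit as A/sigma grows and to 0 as A/sigma -> 0
```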

3.3 Coding for Discrete Sources

Consider the process of encoding the output of a source, i.e., the process of representing the source output by a sequence of binary digits. A measure of the efficiency of a source-encoding method can be obtained by comparing the average number of binary digits per output letter from the source to the entropy H(X). The discrete memoryless source (DMS) is by far the simplest model that can be devised for a physical source, and few physical sources closely fit this idealized mathematical model. It is always more efficient to encode blocks of symbols instead of encoding each symbol separately; by making the block size sufficiently large, the average number of binary digits per output letter from the source can be made arbitrarily close to the entropy of the source.

3.3.1 Coding for Discrete Memoryless Sources

Suppose that a DMS produces an output letter or symbol every τs seconds. Each symbol is selected from a finite alphabet of symbols xi, i = 1, 2, …, L, occurring with probabilities P(xi), i = 1, 2, …, L. The entropy of the DMS in bits per source symbol is:

$$H(X) = -\sum_{i=1}^{L} P(x_i) \log_2 P(x_i) \le \log_2 L \qquad \text{(Problem 3-5)}$$

The equality holds when the symbols are equally probable. The average number of bits per source symbol is H(X), and the source rate in bits/s is defined as H(X)/τs.

3.3.1 Coding for Discrete Memoryless Sources

Fixed-length code words: Consider a block encoding scheme that assigns a unique set of R binary digits to each symbol. Since there are L possible symbols, the number of binary digits per symbol required for unique encoding is:

$$R = \log_2 L \quad \text{when } L \text{ is a power of 2}$$
$$R = \lfloor \log_2 L \rfloor + 1 \quad \text{when } L \text{ is not a power of 2}$$

where ⌊x⌋ denotes the largest integer less than or equal to x. The code rate in bits per symbol is R, and since H(X) ≤ log2 L, it follows that R ≥ H(X).

3.3.1 Coding for Discrete Memoryless Sources

Fixed-length code words: The efficiency of the encoding for the DMS is defined as the ratio H(X)/R. When L is a power of 2 and the source letters are equally probable, R = H(X). If L is not a power of 2 but the source symbols are equally probable, R differs from H(X) by at most 1 bit per symbol. When log2 L >> 1, the efficiency of this encoding scheme is high. When L is small, the efficiency of the fixed-length code can be increased by encoding a sequence of J symbols at a time. To achieve this, we need L^J unique code words.

3.3.1 Coding for Discrete Memoryless Sources

Fixed-length code words: By using sequences of N binary digits, we can accommodate 2^N possible code words, so we need N ≥ J log2 L. The minimum integer value of N required is N = ⌊J log2 L⌋ + 1. The average number of bits per source symbol is N/J = R, so the inefficiency has been reduced by approximately a factor of 1/J relative to symbol-by-symbol encoding. By making J sufficiently large, the efficiency of the encoding procedure, measured by the ratio H(X)/R = JH(X)/N, can be made as close to unity as desired. The methods described above introduce no distortion, since the encoding of source symbols or blocks of symbols into code words is unique; such encoding is called noiseless.
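The convergence of the efficiency H(X)/R toward unity with increasing block length J can be seen with a few lines of Python. The sketch below assumes an alphabet of L = 5 equally probable letters (so H(X) = log2 L); the values are illustrative only:

```python
import math

L = 5                        # assumed alphabet size (not a power of 2)
H = math.log2(L)             # entropy of an equally probable L-letter DMS

for J in (1, 2, 4, 8, 16):
    N = math.floor(J * math.log2(L)) + 1   # minimum integer number of bits per J-symbol block
    R = N / J                              # bits per source symbol
    print(J, N, round(R, 4), round(H / R, 4))   # efficiency H(X)/R approaches 1
```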

3.3.1 Coding for Discrete Memoryless Sources

A block coding failure (or distortion), with probability Pe, occurs when the encoding process is not unique.

Source coding theorem I (Shannon): Let X be the ensemble of letters from a DMS with finite entropy H(X). Blocks of J symbols from the source are encoded into code words of length N from a binary alphabet. For any ε > 0, the probability Pe of a block decoding failure can be made arbitrarily small if J is sufficiently large and

$$R \equiv \frac{N}{J} \ge H(X) + \varepsilon$$

Conversely, if R ≤ H(X) − ε, Pe becomes arbitrarily close to 1 as J is made sufficiently large.

3.3.1 Coding for Discrete Memoryless Sources

Variable-length code words: When the source symbols are not equally probable, a more efficient encoding method is to use variable-length code words. In the Morse code, the letters that occur more frequently are assigned short code words and those that occur infrequently are assigned long code words. Entropy coding devises a method for selecting and assigning the code words to source letters.


3.3.1 Coding for Discrete Memoryless Sources

Variable-length code words:

Letter   P(ak)   Code I   Code II   Code III
a1       1/2     1        0         0
a2       1/4     00       10        01
a3       1/8     01       110       011
a4       1/8     10       111       111

Code I is not uniquely decodable (example: the sequence 1001). Code II is uniquely decodable and instantaneously decodable: the digit 0 indicates the end of a code word, and no code word is longer than three binary digits. Prefix condition: no code word of length l is identical to the first l binary digits of another, longer code word.
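For the source in the table above, the average length of Code II can be compared directly with the entropy; the short sketch below (not from the slides) shows that Code II attains H(X) exactly because all probabilities are negative powers of 2:

```python
import math

probs   = [0.5, 0.25, 0.125, 0.125]     # P(a_k) from the table above
code_ii = ["0", "10", "110", "111"]     # Code II, a prefix code

R_bar = sum(p * len(c) for p, c in zip(probs, code_ii))   # average code word length
H = -sum(p * math.log2(p) for p in probs)                 # source entropy
print(R_bar, H)   # both equal 1.75 bits per letter
```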

3.3.1 Coding for Discrete Memoryless Sources

Variable-length code words: Code III has a tree structure (figure: code tree for Code III). The code is uniquely decodable, but it is not instantaneously decodable, and it does not satisfy the prefix condition.

Objective: devise a systematic procedure for constructing uniquely decodable variable-length codes that minimize the average code word length

$$\bar{R} = \sum_{k=1}^{L} n_k P(a_k)$$

where nk is the length of the code word assigned to letter ak.

3.3.1 Coding for Discrete Memoryless Sources

Kraft inequality: A necessary and sufficient condition for the existence of a binary code with code words having lengths n1 ≤ n2 ≤ … ≤ nL that satisfy the prefix condition is

$$\sum_{k=1}^{L} 2^{-n_k} \le 1$$

Proof of the sufficient condition: Consider a code tree that is embedded in the full binary tree with 2^n terminal nodes, where n = nL. (Figure: full binary tree for nL = 4.)
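Checking the Kraft inequality for a proposed set of code word lengths takes one line; the sketch below (a hypothetical helper, not part of the slides) applies it to the lengths of Code II and to a set of lengths for which no prefix code can exist:

```python
def kraft_sum(lengths):
    """Return sum_k 2^(-n_k) for the given code word lengths n_k."""
    return sum(2.0 ** (-n) for n in lengths)

print(kraft_sum([1, 2, 3, 3]))   # Code II lengths: 1.0 <= 1, so a prefix code exists
print(kraft_sum([1, 1, 2]))      # 1.25 > 1, so no prefix code has these lengths
```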

3.3.1 Coding for Discrete Memoryless Sources

Kraft inequality, proof of the sufficient condition (cont.): Select any node of order n1 as the first code word C1. This choice eliminates 2^(n−n1) terminal nodes (or the fraction 2^(−n1) of the 2^n terminal nodes). From the remaining available nodes of order n2, we select one node for the second code word C2. This choice eliminates 2^(n−n2) terminal nodes. This process continues until the last code word is assigned at terminal node L. At the node of order j < L, the fraction of terminal nodes eliminated so far is ∑_{k=1}^{j−1} 2^(−n_k) < 1, so there is always a node of order nj available to be assigned to the next code word; hence a prefix code with the given lengths can always be constructed.

3.4.1 Rate-Distortion Function

The rate-distortion function of a memoryless Gaussian source with respect to the mean-square-error distortion measure is:

$$R_g(D) = \begin{cases} \dfrac{1}{2} \log_2 \dfrac{\sigma_x^2}{D}, & 0 \le D \le \sigma_x^2 \\ 0, & D > \sigma_x^2 \end{cases}$$

where σx² is the variance of the Gaussian source output.

3.4.1 Rate-Distortion Function

Rate-distortion function for a memoryless Gaussian source (cont.): No information need be transmitted when the distortion D ≥ σx². The distortion D = σx² can be obtained by using zeros in the reconstruction of the signal, and for D > σx² we can use statistically independent, zero-mean Gaussian noise samples with a variance of D − σx² for the reconstruction.

3.4.1 Rate-Distortion Function

Theorem: source coding with a distortion measure (Shannon, 1959). There exists an encoding scheme that maps the source output into code words such that for any given distortion D, the minimum rate R(D) bits per symbol (sample) is sufficient to reconstruct the source output with an average distortion that is arbitrarily close to D.

Thus the rate-distortion function R(D) for any source represents a lower bound on the source rate that is possible for a given level of distortion. The distortion-rate function for the discrete-time, memoryless Gaussian source follows from Equation 3.4-6:

$$D_g(R) = 2^{-2R} \sigma_x^2 \qquad \text{or} \qquad 10 \log_{10} D_g(R) = -6R + 10 \log_{10} \sigma_x^2 \ \text{[dB]}$$
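The two forms of the Gaussian rate-distortion trade-off are easy to tabulate; the sketch below (illustrative values, default variance σx² = 1) shows the familiar rule of thumb that each additional bit per sample reduces the distortion by about 6 dB:

```python
import math

def R_g(D, var=1.0):
    """Rate-distortion function of a memoryless Gaussian source, in bits/sample."""
    return 0.5 * math.log2(var / D) if D < var else 0.0

def D_g_dB(R, var=1.0):
    """Distortion-rate function in dB: 10*log10(2^(-2R) * var)."""
    return 10 * math.log10(2 ** (-2 * R) * var)

print(R_g(0.25))               # 0.5 * log2(4) = 1 bit/sample
print(D_g_dB(1), D_g_dB(2))    # about -6 dB and -12 dB: roughly 6 dB per bit
```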

3.4.1 Rate-Distortion Function

Theorem: upper bound on R(D). The rate-distortion function of a memoryless, continuous-amplitude source with zero mean and finite variance σx², with respect to the mean-square-error distortion measure, is upper-bounded as:

$$R(D) \le \frac{1}{2} \log_2 \frac{\sigma_x^2}{D}, \qquad 0 \le D \le \sigma_x^2$$

The theorem implies that the Gaussian source requires the maximum rate among all sources for a specified level of mean-square-error distortion. Thus the rate-distortion function R(D) of any continuous-amplitude, memoryless source with zero mean and finite variance σx² satisfies the condition R(D) ≤ Rg(D).

3.4.1 Rate-Distortion Function

The distortion-rate function of the same source satisfies the condition:

$$D(R) \le D_g(R) = 2^{-2R} \sigma_x^2$$

A lower bound on the rate-distortion function also exists; it is called the Shannon lower bound for a mean-square-error distortion measure:

$$R^*(D) = H(X) - \frac{1}{2} \log_2 (2\pi e D)$$

where H(X) is the differential entropy of the continuous-amplitude, memoryless source. The corresponding distortion-rate function is:

$$D^*(R) = \frac{1}{2\pi e} \, 2^{-2[R - H(X)]}$$

For any continuous-amplitude, memoryless source:

$$R^*(D) \le R(D) \le R_g(D) \qquad \text{and} \qquad D^*(R) \le D(R) \le D_g(R)$$

3.4.1 Rate-Distortion Function

The differential entropy of the memoryless Gaussian source is:

$$H_g(X) = \frac{1}{2} \log_2 (2\pi e \sigma_x^2) \quad \Rightarrow \quad R^*(D) = R_g(D)$$

By setting σx² = 1, we obtain from Equation 3.4-12:

$$10 \log_{10} D^*(R) = -6R - 6\left[H_g(X) - H(X)\right]$$

$$10 \log_{10} \frac{D_g(R)}{D^*(R)} = 6\left[H_g(X) - H(X)\right] \ \text{dB} = 6\left[R_g(D) - R^*(D)\right] \ \text{dB}$$

The differential entropy H(X) is upper-bounded by Hg(X) (shown by Shannon, 1948).

3.4.1 Rate-Distortion Function

Differential entropies and rate-distortion comparisons of four common PDFs for signal models:

PDF         p(x)                                           H(X)                             Rg(D) − R*(D)   Dg(R) − D*(R)
                                                                                            (bits/sample)   (dB)
Gaussian    [1/(√(2π) σx)] e^(−x²/2σx²)                     (1/2) log2(2π e σx²)             0               0
Uniform     1/(2√3 σx),  |x| ≤ √3 σx                        (1/2) log2(12 σx²)               0.255           1.53
Laplacian   [1/(√2 σx)] e^(−√2 |x|/σx)                      (1/2) log2(2 e² σx²)             0.104           0.62
Gamma       [3^(1/4)/√(8π σx |x|)] e^(−√3 |x|/2σx)          (1/2) log2(4π e^0.423 σx²/3)     0.709           4.25

3.4.1 Rate-Distortion Function

Consider a band-limited Gaussian source with spectral density:

$$\Phi(f) = \begin{cases} \dfrac{\sigma_x^2}{2W}, & |f| \le W \\ 0, & |f| > W \end{cases}$$

When the output of this source is sampled at the Nyquist rate, the samples are uncorrelated and, since the source is Gaussian, they are also statistically independent. Hence the equivalent discrete-time Gaussian source is memoryless. The rate-distortion function in bits/s is:

$$R_g(D) = W \log_2 \frac{\sigma_x^2}{D}, \qquad 0 \le D \le \sigma_x^2$$

The distortion-rate function is:

$$D_g(R) = 2^{-R/W} \sigma_x^2$$

3.4.2 Scalar Quantization

In source encoding, the quantizer can be optimized if we know the PDF of the signal amplitude at the input to the quantizer. Suppose that the sequence {xn} at the input to the quantizer has a PDF p(x), and let L = 2^R be the desired number of levels. We wish to design the optimum scalar quantizer that minimizes some function of the quantization error q = x̃ − x, where x̃ is the quantized value of x and f(x̃ − x) denotes the desired function of the error. The distortion resulting from quantization of the signal amplitude is then:

$$D = \int_{-\infty}^{\infty} f(\tilde{x} - x) \, p(x) \, dx$$

Lloyd-Max quantizer: an optimum quantizer that minimizes D by optimally selecting the output levels and the corresponding input range of each output level.
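For the mean-square-error criterion f(x̃ − x) = (x̃ − x)², the distortion integral can be evaluated numerically for any candidate quantizer. The sketch below uses a uniform 8-level quantizer with an assumed (not optimized) step size and a zero-mean, unit-variance Gaussian input, so it illustrates the distortion integral rather than the Lloyd-Max optimum:

```python
import numpy as np

L = 8                     # number of output levels (R = 3 bits)
delta = 1.0               # assumed step size, not the Lloyd-Max optimum
edges = (np.arange(L + 1) - L / 2) * delta       # decision thresholds
levels = (np.arange(L) - L / 2 + 0.5) * delta    # output levels (interval midpoints)

x = np.linspace(-8.0, 8.0, 400001)               # integration grid
dx = x[1] - x[0]
p = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)     # Gaussian input PDF

# Map each input amplitude to its output level; amplitudes beyond the outermost
# thresholds map to the extreme levels.
idx = np.clip(np.digitize(x, edges[1:-1]), 0, L - 1)
x_q = levels[idx]

D = np.sum((x_q - x) ** 2 * p) * dx              # D = integral of (x_q - x)^2 p(x) dx
print(D)
```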

3.4.2 Scalar Quantization

For a uniform quantizer, the output levels are specified as x̃k = (2k − 1)Δ/2, corresponding to an input signal amplitude in the range (k − 1)Δ ≤ x < kΔ.

3.4.3 Vector Quantization

Example 3.4-1: By quantizing the rectangular region with squares having area Δ², the total number of levels that results is the area of the rectangle divided by Δ², i.e.,

$$L_x' = \frac{ab}{\Delta^2}$$

The difference in bit rate between the scalar and vector quantization methods is:

$$R_x - R_x' = \log_2 \frac{(a + b)^2}{2ab}$$
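The bit-rate saving in Example 3.4-1 depends only on the rectangle's aspect ratio. A short sketch (with assumed side lengths a and b) makes this concrete:

```python
import math

def rate_difference(a, b):
    """R_x - R'_x = log2[(a + b)^2 / (2ab)] bits per two-dimensional vector."""
    return math.log2((a + b) ** 2 / (2 * a * b))

print(rate_difference(4.0, 1.0))   # elongated rectangle: larger saving (~1.64 bits)
print(rate_difference(1.0, 1.0))   # a = b: log2(2) = 1 bit
```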

3.4.3 Vector Quantization

Example 3.4-1 (cont.): A linear transformation (rotation by 45°) will decorrelate x1 and x2 and render the two random variables statistically independent; scalar quantization and vector quantization then achieve the same efficiency. Vector quantization will always equal or exceed the performance of scalar quantization. Vector quantization has been applied to several types of speech encoding methods, including both waveform and model-based methods.


3.5 Coding Techniques for Analog Sources

Three types of analog source encoding methods:

Temporal waveform coding: the source encoder is designed to represent digitally the temporal characteristics of the source waveform.

Spectral waveform coding: the signal waveform is usually subdivided into different frequency bands, and either the time waveform in each band or its spectral characteristics are encoded for transmission.

Model-based coding: based on a mathematical model of the source.

3.5.1 Temporal Waveform Coding

Pulse-code modulation (PCM): Let x(t) denote a sample function emitted by a source, and let xn denote the samples taken at a sampling rate fs ≥ 2W, where W is the highest frequency in the spectrum of x(t). In PCM, each sample of the signal is quantized to one of 2^R amplitude levels, where R is the number of binary digits used to represent each sample. The quantization process may be modeled mathematically as:

$$\tilde{x}_n = x_n + q_n$$

where qn represents the quantization error, which we treat as additive noise.

3.5.1 Temporal Waveform Coding

Pulse-code modulation (PCM): For a uniform quantizer, the quantization noise is well characterized statistically by the uniform PDF:

$$p(q) = \frac{1}{\Delta}, \qquad -\frac{\Delta}{2} \le q \le \frac{\Delta}{2}, \qquad \Delta = 2^{-R}$$

The mean square value of the quantization error is:

$$E(q^2) = \frac{\Delta^2}{12} = \frac{1}{12} \times 2^{-2R}$$
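These two relations give the usual back-of-the-envelope noise figures for uniform PCM. The sketch below (illustrative values of R, with the signal amplitude normalized so that Δ = 2^−R) tabulates the quantization-noise power and its level in dB:

```python
import math

def quantization_noise_power(R):
    """E[q^2] = Delta^2 / 12 with Delta = 2^(-R) (unit-amplitude normalization)."""
    delta = 2.0 ** (-R)
    return delta ** 2 / 12.0

for R in (4, 8, 12):
    Nq = quantization_noise_power(R)
    print(R, Nq, round(-10 * math.log10(Nq), 1), "dB below full scale")
```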

3.5.1 Temporal Waveform Coding

Pulse-code modulation (PCM): Many source signals, such as speech waveforms, have the characteristic that small signal amplitudes occur more frequently than large ones. A better approach is therefore to employ a non-uniform quantizer. A non-uniform quantizer characteristic is usually obtained by passing the signal through a non-linear device that compresses the signal amplitude, followed by a uniform quantizer. In the reconstruction of the signal from the quantized values, the inverse logarithmic relation is used to expand the signal amplitude. The combined compressor-expandor pair is termed a compandor.

3.5.1 Temporal Waveform Coding

Pulse-code modulation (PCM): In North America, μ-law compression is used:

$$y = y_{\max} \, \frac{\log_e \left[1 + \mu \left(|x| / x_{\max}\right)\right]}{\log_e (1 + \mu)} \, \operatorname{sgn} x, \qquad \operatorname{sgn} x = \begin{cases} +1, & x \ge 0 \\ -1, & x < 0 \end{cases}$$

In Europe, A-law compression is used:

$$y = \begin{cases} y_{\max} \, \dfrac{A \left(|x| / x_{\max}\right)}{1 + \log_e A} \, \operatorname{sgn} x, & 0 \le \dfrac{|x|}{x_{\max}} \le \dfrac{1}{A} \\[2ex] y_{\max} \, \dfrac{1 + \log_e \left[A \left(|x| / x_{\max}\right)\right]}{1 + \log_e A} \, \operatorname{sgn} x, & \dfrac{1}{A} \le \dfrac{|x|}{x_{\max}} \le 1 \end{cases}$$
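A μ-law compandor is simple to express in code. The sketch below (an illustration assuming μ = 255, the value commonly used in North American PCM telephony, and normalized x_max = y_max = 1) implements the compression characteristic above together with its inverse expansion:

```python
import numpy as np

mu, x_max, y_max = 255.0, 1.0, 1.0    # assumed compandor parameters

def compress(x):
    """mu-law compression characteristic y(x)."""
    return y_max * np.sign(x) * np.log1p(mu * np.abs(x) / x_max) / np.log1p(mu)

def expand(y):
    """Inverse of the logarithmic characteristic, used in reconstruction."""
    return x_max * np.sign(y) * ((1.0 + mu) ** (np.abs(y) / y_max) - 1.0) / mu

x = np.array([-0.5, -0.01, 0.0, 0.01, 0.5])
print(compress(x))
print(expand(compress(x)))   # recovers x exactly (before any quantization)
```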