Blind decoding of Linear Gaussian channels with ISI, capacity, error exponent, universality

Farkas Lóránt

arXiv:0801.0540v1 [cs.IT] 3 Jan 2008

Abstract—A new, straightforward universal blind detection algorithm is given for linear Gaussian channels with ISI. A new error exponent is derived which, in many cases, is better than Gallager's random coding error exponent.

I. INTRODUCTION

In this paper, the discrete Gaussian channel with intersymbol interference (ISI)

$$y_i = \sum_{j=0}^{l} x_{i-j} h_j + z_i \qquad (1)$$

will be considered, where the vector $h = (h_0, h_1, \ldots, h_l)$ represents the ISI and $\{z_i\}$ is white Gaussian noise with variance $\sigma^2$. A similar continuous-time model was studied by Gallager [6], who showed that it can be reduced to the form

$$y_n = v_n x_n + w_n \qquad (2)$$

where the $v_n$ are eigenvalues of the correlation operator. The same is true for the discrete model (1), but the reduction requires knowledge of the covariance matrix $R(i,j) = \sum_{k=0}^{l} h(k-i)h(k-j)$, whose eigenvectors should be used as new basis vectors. Here, however, such knowledge will not be assumed; our goal is to study universal coding for the class of ISI channels of form (1). As motivation, note that the alternative method of first identifying the channel by transmitting a known "training sequence" has some drawbacks. Because the length of the training sequence is limited, the channel estimate can be imprecise, and the data sequence is then decoded according to an incorrect likelihood function. This results in an increase in error rate [2], [3] and a decrease in capacity [4]. Moreover, since the training sequence carries no payload information, the longer it is, the fewer information bits can be transmitted. One might hope to solve this by making the training sequence long enough for precise channel estimation and the data block correspondingly longer, but this seldom works, because of the delay constraint and because the channel changes slowly in time. We therefore give a straightforward method of coding and decoding without any Channel Side Information (CSI). To achieve this, we generalise the result of Csiszár and Körner [5] to Gaussian channels with white noise and ISI, using an analogue of the Maximal Mutual Information (MMI) decoder of [5]. Thanks to Imre Csiszár for his numerous corrections to this work.
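For concreteness, model (1) is easy to simulate directly. The sketch below is illustrative only: the tap vector, block length and noise level are arbitrary example values, not parameters taken from the paper.

```python
import numpy as np

def isi_channel(x, h, sigma, rng=None):
    """Pass the input block x through the ISI channel of (1):
    y_i = sum_j x_{i-j} h_j + z_i, with x_k = 0 for k < 0."""
    rng = np.random.default_rng() if rng is None else rng
    # full convolution gives n + l output samples, matching y in R^{n+l}
    y_clean = np.convolve(x, h)
    z = rng.normal(0.0, sigma, size=y_clean.shape)   # noise variance sigma^2
    return y_clean + z

# example values (illustrative only)
n, sigma = 1000, 0.5
h = np.array([1.0, 0.4, -0.2])                 # ISI taps h_0, h_1, h_2
x = np.random.default_rng(0).normal(size=n)    # unit-power Gaussian input
y = isi_channel(x, h, sigma)
print(y.shape)                                 # (n + l,) = (1002,)
```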

We will show that our new method is not only universal, i.e. independent of the channel, but that its error exponent is in many cases better than Gallager's [6] lower bound for the case when complete CSI is available to the receiver. Gallager's error exponent had previously been improved for some channels using an MMI decoder, for instance for discrete memoryless multiple-access channels [7]. We do not use Generalised Maximum Likelihood decoding [9], but a generalised version of MMI decoding: first the channel parameters are approximated by maximum likelihood estimation, and then the message whose mutual information with the estimated parameters is maximal is selected. By using an extension of the powerful method of types, we can derive the capacity and a random coding error exponent in a simple way. At the end we obtain a more general result: we show how the method of types can be extended to a continuous, non-memoryless environment. The structure of this correspondence is as follows. In Section II we generalise typical sequences to the ISI channel environment. The main goal of Section III is to give a new method of blind detection. In Section IV we show by numerical results that for some parameters the new error exponent is better than Gallager's random coding error exponent. In Section V we discuss the result and give a general formula for channels with fading.

II. DEFINITIONS

Let $\gamma_n$ be a sequence of positive numbers with limit 0. The sequence $x \in \mathbb{R}^n$ is $\gamma_n$-typical to an $n$-dimensional continuous distribution $P$, denoted by $x \in T_P$, if

$$\left| \frac{-\log p(x)}{n} - H(P) \right| < \gamma_n \qquad (3)$$

where $p(x)$ denotes the density function and $H(P)$ the differential entropy of $P$. Similarly, sequences $x \in \mathbb{R}^n$, $y \in \mathbb{R}^{n+l}$ are jointly $\gamma_n$-typical to a $(2n+l)$-dimensional joint distribution $P_{X,Y}$, denoted by $(x, y) \in T_{P_{XY}}$, if

$$\left| -\log p_{X,Y}(y, x) - H(P_{X,Y}) \right| < n\gamma_n.$$

In the same way, a sequence $y$ is $\gamma_n$-typical to the conditional distribution $P_{Y|X}$, given that $X = x$, denoted by $y \in T_{P_{Y|X}}(x)$, if

$$\left| -\log p_{Y|X}(y|x) - H(Y|X) \right| < n\gamma_n.$$
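As a numerical illustration of definition (3), the sketch below draws an i.i.d. standard normal sequence and checks how far $-\log p(x)/n$ is from the differential entropy $H(P) = \tfrac{1}{2}\ln(2\pi e)$. Natural logarithms are used, and the threshold $\gamma_n = n^{-1/4}$ anticipates the choice made later in the paper; the experiment is an illustration, not part of the development.

```python
import numpy as np

def is_typical(x, gamma_n):
    """Check the gamma_n-typicality condition (3) for an i.i.d. N(0,1) vector x."""
    n = len(x)
    # -log density of the n-dimensional standard normal, in nats
    neg_log_p = 0.5 * n * np.log(2 * np.pi) + 0.5 * np.dot(x, x)
    h = 0.5 * np.log(2 * np.pi * np.e)        # differential entropy per symbol
    return abs(neg_log_p / n - h) < gamma_n

rng = np.random.default_rng(1)
n = 10_000
gamma_n = n ** (-0.25)
x = rng.normal(size=n)
print(is_typical(x, gamma_n))   # True with high probability for large n
```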

To simplify the proofs, in the following $P_X = P$ is always the $n$-dimensional i.i.d. standard normal distribution, the optimal input distribution of a Gaussian channel with power constraint 1. The conditional distribution will be chosen as $P^{h,\sigma}_{Y|X}$ with density

$$p^{h,\sigma}_{Y|X}(y|x) = \frac{1}{\sigma^{n} (2\pi)^{(n+l_n)/2}} \exp\!\left( \frac{-\|y - h * x\|^2}{2\sigma^2} \right)$$

where $h = (h_0, h_1, h_2, \ldots, h_l)$ and $(h * x)_i = \sum_{j=0}^{l} x_{i-j} h_j$, with $x_k = 0$ understood for $k < 0$. So in this case

$$H_{h,\sigma}(Y|X) = H(h * X + Z \mid X) = H(Z) = n\left[ \frac{\ln(2\pi e)}{2} + \ln(\sigma) \right].$$

The limit of the entropy of $Y = X * h + Z$ as $n \to \infty$ is

$$\lim_{n\to\infty} \frac{1}{n} H_{h,\sigma}(Y) = \frac{1}{2}\ln(2\pi e) + \frac{1}{2\pi} \int_0^{2\pi} \ln\big( \sigma + f(\xi) \big)\, d\xi$$

where $f(\lambda) = \sum_{k=-\infty}^{\infty} \left( \sum_{j=0}^{l-|k|} h_j h_{j+|k|} \right) e^{ik\lambda}$, see [1] (here $R_{m,n} = r(m-n)$ with $r(k) = \sum_{j=0}^{l-|k|} h_j h_{j+|k|}$ is the correlation matrix). So the limit of the average mutual information per symbol,

$$\lim_{n\to\infty} I_n(h, \sigma) = \lim_{n\to\infty} \frac{H_{h,\sigma}(Y) - H_{h,\sigma}(Y|X)}{n},$$

is equal to

$$I(h, \sigma) \triangleq \frac{1}{2\pi} \int_0^{2\pi} \ln\!\left( 1 + \frac{f(\xi)}{\sigma} \right) d\xi;$$

moreover, the sequence $I_n(h, \sigma)$ is non-increasing (see [1]).

We will consider a finite set of channels that grows subexponentially with $n$ and is, in the limit, dense in the set of all ISI channels. To this end, define the set of approximating ISI vectors as

$$H_n = \{ h \in \mathbb{R}^{l_n} : h_i = k_i \gamma_n,\ |h_i| < P_n,\ k_i \in \mathbb{Z},\ \forall i \in \{1, 2, \ldots, l_n\} \}$$

where $l_n$ is the length of the ISI, $P_n$ is the power constraint per symbol, and $\gamma_n$ is the "coarse graining", intuitively the precision of the detection. Similarly, we define the set of approximating variances as

$$V_n = \{ \sigma \in \mathbb{R}^+ : \sigma = k \gamma_n,\ 1/2 < \sigma < P_n,\ k \in \mathbb{Z} \}.$$

These two sets form the approximating set of parameters, denoted by $S_n = H_n \times V_n$. Below we set $l_n = [\log_2(n)]$, $P_n = n^{1/16}$, $\gamma_n = n^{-1/4}$.

Definition: The ISI type of a pair $(x_i, y) \in \mathbb{R}^n \times \mathbb{R}^{n+l_n}$ is the pair $(h(i), \sigma(i)) \in S_n$ defined by

$$h(i) = \operatorname*{argmin}_{h(i) \in H_n} \| y - h(i) * x_i \|, \qquad \sigma(i)^2 = \operatorname*{argmin}_{\sigma(i) \in V_n} \left| \sigma(i)^2 - \min_{h(i) \in H_n} \frac{\| y - h(i) * x_i \|^2}{n} \right|.$$

Note that this type concept does not apply to separate input or output sequences, only to pairs $(x, y)$.

III. LEMMAS, THEOREM

We summarise the results of this section. The first Lemma shows that the above definition of ISI type is consistent, in the sense that $y$ is conditionally $P^{h,\sigma}_{Y|X}$-typical given $x$, at least when $\|y - h * x\|^2$ is not too large. Lemma 2 gives the properties that our method needs and shows that almost all randomly generated sequences have these properties. Lemma 4 gives an upper bound on the set of output signals that are "problematic", i.e. typical to two codewords: those that can be the result of two different codewords over two different channels. Lemma 5 shows that if the channel parameters are estimated via maximum likelihood (ML), then the codewords and the noise cannot be strongly correlated. Lemma 6 gives a formula for the probability of the event that an output sequence is typical with another codeword with respect to another channel. All Lemmas are used in Theorem 1, which gives the main result and defines the detection method precisely.

Lemma 1: When

$$\left| \frac{\| y - h(i) * x_i \|^2}{n} - \sigma(i)^2 \right| < \gamma_n,$$

that is, the detected variance is in the interior of the set of approximating variances, then $y \in T_{P^{h(i),\sigma(i)}_{Y|X}}(x_i)$.

Proof: Indeed, if

$$\left| \frac{\| y - h(i) * x_i \|^2}{n} - \sigma(i)^2 \right| < \gamma_n$$

then

$$\left| \frac{\| y - h(i) * x_i \|^2}{2\sigma(i)^2} - \frac{n}{2} \right| < n\gamma_n.$$

With

$$-\log\!\left(P^{h(i),\sigma(i)}_{Y|X}(y|x_i)\right) = \frac{n}{2} \log(2\pi\sigma(i)^2) + \frac{\| y - h(i) * x_i \|^2}{2\sigma(i)^2}$$

we get

$$\left| -\log\!\left(P^{h(i),\sigma(i)}_{Y|X}(y|x_i)\right) - H_{P^{h(i),\sigma(i)}_{Y|X}}(Y|X) \right| = \left| \frac{\| y - h(i) * x_i \|^2}{2\sigma(i)^2} - \frac{n}{2} \right|,$$

and by the definition $y \in T_{P^{h(i),\sigma(i)}_{Y|X}}(x_i)$ if

$$\left| -\log\!\left(P^{h(i),\sigma(i)}_{Y|X}(y|x_i)\right) - H_{P^{h(i),\sigma(i)}_{Y|X}}(Y|X) \right| < n\gamma_n.$$
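The ISI type defined in Section II can be illustrated numerically without enumerating the exponentially large grid $H_n$: one may minimise $\|y - h * x\|$ over all real $h$ by least squares and then round the minimiser to the $\gamma_n$-grid. The sketch below takes this shortcut; it is only an illustration of the Definition (the rounding step approximates the exact grid minimisation, and the helper names are our own), not the construction used in the proofs.

```python
import numpy as np

def isi_type(x, y, l_n, gamma_n, P_n):
    """Approximate the ISI type (h(i), sigma(i)^2) of the pair (x, y)."""
    n = len(x)
    # Convolution matrix A with A @ h = h * x for a causal filter of length l_n + 1
    A = np.zeros((len(y), l_n + 1))
    for j in range(l_n + 1):
        A[j:j + n, j] = x
    # Unconstrained least-squares estimate of h, then rounding to the gamma_n grid
    h_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
    h_hat = np.clip(np.round(h_ls / gamma_n) * gamma_n, -P_n, P_n)
    # Detected variance: grid point nearest to ||y - h_hat * x||^2 / n
    resid = y - A @ h_hat
    sigma2 = np.dot(resid, resid) / n
    sigma_hat = np.clip(np.round(np.sqrt(sigma2) / gamma_n) * gamma_n, 0.5, P_n)
    return h_hat, sigma_hat ** 2

# illustrative use, with example values (not the paper's asymptotic choices)
rng = np.random.default_rng(2)
n = 2000
gamma_n, P_n, l_n = n ** -0.25, n ** (1 / 16), 2
h_true = np.array([1.0, 0.4, -0.2])
x = rng.normal(size=n)
y = np.convolve(x, h_true) + rng.normal(0.0, 0.5, size=n + l_n)
print(isi_type(x, y, l_n, gamma_n, P_n))
```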

Lemma 2: For arbitrarily small $\delta > 0$, if $n$ is large enough, there exists a set $A \subset T_P$ with $P^n(A) > 1 - \delta$, where $P$ is the $n$-dimensional standard normal distribution, such that for all $x \in A$ and $k, l \in \{0, 1, \ldots, l_n\}$, $k \ne l$,

$$\left| \frac{\sum_{j=0}^{n} x_{j-k} x_{j-l}}{n} \right| < \gamma_n \qquad (4)$$

$$\left| \frac{\sum_{j=0}^{n} x_{j-k} x_{j-k}}{n} - 1 \right| < \gamma_n \qquad (5)$$

$$\left| -\frac{1}{n} \ln p(x) - H(P) \right| < \gamma_n \qquad (6)$$

Proof: Take $n$ i.i.d. standard Gaussian random variables $X_1, X_2, \ldots, X_n$ and fix $k \ne l$. By Chebyshev's inequality,

$$\Pr\left\{ \left| \frac{\sum_{i=1}^{n} X_{i-k} X_{i-l}}{n} \right| > \xi \right\} < \frac{1}{n}\,\frac{1}{\xi^2}.$$

From this, with $\xi = \gamma_n$,

$$\Pr\left\{ \left| \frac{\sum_{i=1}^{n} X_{i-k} X_{i-l}}{n} \right| > \gamma_n \right\} < \frac{1}{(\gamma_n n^{1/2})^2} = \delta_n,$$

which means that there exists a set in $\mathbb{R}^n$ whose $P^n$-measure is at least $1 - \delta_n$ and on which every sequence satisfies

$$\left| \frac{\sum_{j=0}^{n} x_{j-k} x_{j-l}}{n} \right| < \gamma_n.$$

Similarly, such sets exist for all $k \ne l$ in $\{0, 1, \ldots, l_n\} \times \{0, 1, \ldots, l_n\}$. By a completely analogous procedure we can construct sets that satisfy (5) and (6). The intersection of these sets has $P$-measure at least $1 - 2\delta_n(l_n^2 + 1)$. As $\delta_n l_n^2 \to 0$, this proves the Lemma.

The Lebesgue measure will be denoted by $\lambda$; its dimension is not specified, as it will always be clear from the context.
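The concentration behind Lemma 2 is easy to observe numerically. The sketch below estimates how often conditions (4) and (5) fail for i.i.d. standard normal sequences with $\gamma_n = n^{-1/4}$ and $l_n = [\log_2 n]$; it is a quick experiment, not part of the proof.

```python
import numpy as np

def lagged_products_ok(x, l_n, gamma_n):
    """Check conditions (4) and (5) for one sequence x (zero-padded for negative indices)."""
    n = len(x)
    xp = np.concatenate([np.zeros(l_n), x])       # x_k = 0 for k < 0
    for k in range(l_n + 1):
        for l in range(l_n + 1):
            s = np.dot(xp[l_n - k:l_n - k + n], xp[l_n - l:l_n - l + n]) / n
            if k != l and abs(s) >= gamma_n:
                return False
            if k == l and abs(s - 1) >= gamma_n:
                return False
    return True

rng = np.random.default_rng(3)
n, trials = 4000, 200
l_n, gamma_n = int(np.log2(n)), n ** -0.25
fails = sum(not lagged_products_ok(rng.normal(size=n), l_n, gamma_n) for _ in range(trials))
print(f"empirical failure rate: {fails / trials:.3f}")   # close to 0 for large n
```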

Lemma 3: If $n$ is large enough, then the set $A$ in Lemma 2 satisfies

$$2^{H(P) - 2n\gamma_n} < \lambda(A) < 2^{H(P) + n\gamma_n},$$

and for any $m$-dimensional continuous distribution $Q(\cdot)$,

$$\lambda(T_Q) < 2^{H(Q) + n\gamma_n},$$

where $T_Q$ is the set of typical sequences of $Q$, see (3).

Proof: Since

$$1 > P(A) = \int_A p(x)\, \lambda(dx) > 1 - \delta$$

by the previous Lemma, and since $2^{-(H(P_X) - n\gamma_n)} > p(x) > 2^{-(H(P_X) + n\gamma_n)}$ on $T_{P_X}$ and $A \subset T_{P_X}$, we get

$$2^{H(P) + n\gamma_n} > \lambda(A) > (1 - \delta)\, 2^{H(P) - n\gamma_n} > 2^{H(P) - 2n\gamma_n}$$

if $n$ is large enough. Similarly, from

$$1 > Q(T_Q) = \int_{T_Q} q(x)\, \lambda(dx) > \lambda(T_Q)\, 2^{-(H(Q) + n\gamma_n)}$$

we get $\lambda(T_Q) < 2^{H(Q) + n\gamma_n}$.

The next lemma is an analogue of the Packing Lemma in [5].

Lemma 4: For all $R > 0$, $\delta > 0$, there exist at least $2^{n(R-\delta)}$ different sequences $x_1, \ldots, x_M$ in $\mathbb{R}^n$ which are elements of the set $A$ from Lemma 2 and such that, for each pair of ISI channels with $h, \hat{h} \in H_n$, $\sigma, \hat{\sigma} \in V_n$, and for all $i \in \{1, 2, \ldots, M\}$,

$$\lambda\!\left( T_{P^{h,\sigma}_{Y|X}}(x_i) \cap \bigcup_{j \ne i} T_{P^{\hat{h},\hat{\sigma}}_{Y|X}}(x_j) \right) \le 2^{-\left[ n \left| I(\hat{h},\hat{\sigma}) - R \right|^+ - H_{h,\sigma}(Y|X) \right]} \qquad (7)$$

provided that $n \ge n_0(m, \delta)$.

Proof: We shall use the method of random selection. For fixed $n, m$, let $\mathcal{C}_m$ be the family of all ordered collections $C = \{x_1, x_2, \ldots, x_m\}$ of $m$ not necessarily different sequences in $A$. Notice that if some $C = \{x_1, x_2, \ldots, x_m\} \in \mathcal{C}_m$ satisfies (7) for every $i$ and every pair of Gaussian channels $(h, \sigma), (\hat{h}, \hat{\sigma})$, then the $x_i$ are necessarily distinct. For any collection $C \in \mathcal{C}_m$, denote the left-hand side of (7) by $u_i(C, h, \hat{h})$. Since for $x \in T_P$

$$\lambda\{ T_{P^{h,\sigma}_{Y|X}}(x) \} \le 2^{H_{h,\sigma}(Y|X) + n\gamma_n},$$

it follows from Lemma 3 that a $C \in \mathcal{C}_m$ satisfies (7) if, for all $i$,

$$u_i(C) \triangleq \sum_{h, \hat{h} \in H_n,\ \sigma, \hat{\sigma} \in V_n} u_i(C, h, \hat{h}) \cdot 2^{n[I(\hat{h},\hat{\sigma}) - R] - H_{h,\sigma}(Y|X)}$$

is at most 1. Notice that if $C \in \mathcal{C}_m$ and

$$\frac{1}{m} \sum_{i=1}^{m} u_i(C) \le \frac{1}{2}, \qquad (8)$$

then $u_i(C) \le 1$ for at least $\frac{m}{2}$ indices $i$. Further, if $C'$ is the subcollection consisting of these indices, then $u_i(C') \le u_i(C) \le 1$ for every such index $i$. Hence the Lemma will be proved if, for an $m$ with

$$2 \cdot 2^{n(R-\delta)} \le m \le 2^{n(R - \frac{\delta}{2})}, \qquad (9)$$

we find a $C \in \mathcal{C}_m$ which satisfies (8). Choose $C \in \mathcal{C}_m$ at random, according to the uniform distribution over $A$. In other words, let $W^m = (W_1, W_2, \ldots, W_m)$ be independent RV's, each uniformly distributed over $A$. In order to prove that (8) is true for some $C \in \mathcal{C}_m$, it suffices to show that

$$E u_i(W^m) \le \frac{1}{2}, \qquad i = 1, 2, \ldots, m. \qquad (10)$$

To this end, we bound $E u_i(W^m, h, \hat{h})$. Recalling that $u_i(C, h, \hat{h})$ denotes the left-hand side of (7), we have

$$E u_i(W^m, h, \hat{h}) = \int \Pr\Big\{ y \in T_{P^{h,\sigma}_{Y|X}}(W_i) \cap \bigcup_{j \ne i} T_{P^{\hat{h},\hat{\sigma}}_{Y|X}}(W_j) \Big\}\, \lambda(dy). \qquad (11)\text{--}(12)$$


As the $W_j$ are independent and identically distributed, the probability under the integral is bounded above by

$$\sum_{j: j \ne i} \Pr\{ y \in T_{P^{h,\sigma}_{Y|X}}(W_i) \cap T_{P^{\hat{h},\hat{\sigma}}_{Y|X}}(W_j) \} = (m-1) \cdot \Pr\{ y \in T_{P^{h,\sigma}_{Y|X}}(W_i) \} \cdot \Pr\{ y \in T_{P^{\hat{h},\hat{\sigma}}_{Y|X}}(W_j) \}. \qquad (13)\text{--}(14)$$

As the $W_j$'s are uniformly distributed over $A$, we have for every fixed $y$

$$\Pr\{ y \in T_{P^{h,\sigma}_{Y|X}}(W_i) \} = \frac{\lambda\{ x : x \in T_{P_X},\ y \in T_{P^{h,\sigma}_{Y|X}}(x) \}}{\lambda\{A\}}$$

if $y \in T_{P^{h,\sigma}_{Y}}$, and $\Pr\{ y \in T_{P^{h,\sigma}_{Y|X}}(W_i) \} = 0$ otherwise. The set in the numerator is non-void only if $y \in T_{P^{h,\sigma}_{Y}}$; in this case it can be written as $T_{\bar{P}_{X|Y}}(y)$, where $\bar{P}$ is the conditional distribution for which

$$P_X(a)\, P^{h,\sigma}_{Y|X}(b|a) = P^{h,\sigma}_{Y}(b)\, \bar{P}_{X|Y}(a|b).$$

Thus, by Lemma 3 and Lemma 2,

$$\Pr\{ y \in T_{P^{h,\sigma}_{Y|X}}(W_i) \} \le \frac{2^{H_{h,\sigma}(X|Y) + n\gamma_n}}{2^{H(X) - 2n\gamma_n}} = 2^{-n(I(h,\sigma) - 3\gamma_n)}.$$

So, if we upper bound $\lambda(T_{P^{h,\sigma}_{Y}})$ by $2^{H_{h,\sigma}(Y) + n\gamma_n}$ - with the use of Lemma 3 - then from (14), (12) and (9) we get

$$E u_i(W^m, h, \hat{h}) \le \lambda(T_{P^{h,\sigma}_{Y}})\, (m-1)\, 2^{-n[I(h,\sigma) + I(\hat{h},\hat{\sigma}) - 6\gamma_n]} \le 2^{-n[I(\hat{h},\hat{\sigma}) - R + \delta - 7\gamma_n] + H_{h,\sigma}(Y|X)}.$$

Let $n$ be so large that $7\gamma_n < \delta/2$; then we get

$$E u_i(W^m) \le |H_n|^2 |V_n|^2\, 2^{-n(\delta/2)},$$

which proves (10), since $|H_n|^2 |V_n|^2$ grows only subexponentially.

Lemma 5: For $x \in A$ from Lemma 2, $y$ as in (1), $\tilde{h} = \operatorname{argmin}_{h \in H_n} \| y - h * x \|$, and $z = y - \tilde{h} * x$,

$$\left| \frac{\sum_{j=1}^{n} z_j x_{j-k}}{n} \right| < \gamma_n, \qquad k \in \{0, \ldots, l_n\}.$$

Proof (indirect): Suppose that

$$\frac{\sum_{j=1}^{n} z_j x_{j-k}}{n} = \lambda_k > \gamma_n$$

for some $k \in \{0, \ldots, l_n\}$. Then let

$$\hat{h}_j = \begin{cases} \tilde{h}_j & \text{if } j \ne k, \\ \tilde{h}_j + \gamma_n & \text{if } j = k. \end{cases}$$

We will show that $\| y - \hat{h} * x_i \| < \| y - \tilde{h} * x_i \|$, which contradicts the definition of $\tilde{h}$. Now,

$$\| y - \hat{h} * x_i \|^2 = \sum_{j=1}^{n} \Big( y_j - \sum_{g=0}^{l_n} \hat{h}_g x_{j-g} \Big)^2 = \sum_{j=1}^{n} \Big( y_j - \sum_{g=0}^{l_n} \tilde{h}_g x_{j-g} - \gamma_n x_{j-k} \Big)^2 = \sum_{j=1}^{n} ( z_j - \gamma_n x_{j-k} )^2 = \sum_{j=1}^{n} ( z_j^2 - 2\gamma_n z_j x_{j-k} + \gamma_n^2 x_{j-k}^2 ).$$

On account of (4) and (5),

$$\sum_{j=1}^{n} ( z_j^2 - 2\gamma_n z_j x_{j-k} + \gamma_n^2 x_{j-k}^2 ) \le \| z_i \|^2 - 2 n \gamma_n \lambda_k + \gamma_n^2 (n + \gamma_n) \le \| z_i \|^2 - (n - \gamma_n) \gamma_n^2 = \| y - \tilde{h} * x_i \|^2 - n\gamma_n^2 + \gamma_n^3 < \| y - \tilde{h} * x_i \|^2.$$

Lemma 6: Let $\delta > 0$, and let $x \in A$ be from Lemma 2. Let $(h, \sigma) \in H_n \times V_n$ and $(h^o, \sigma^o) \in H_n \times V_n$ be two arbitrary (ISI function, variance) pairs. Let $y$ and $x$ be such that

$$y \in T_{P^{h,\sigma}_{Y|X}}(x), \qquad \sigma = \| y - h * x \| / n, \qquad (15)$$

$$h = \operatorname*{argmin}_{h \in H_n} \| y - h * x \|. \qquad (16)$$

Then

$$P^{h^o,\sigma^o}_{Y|X}(y|x) \le 2^{-n\left[ d((h,\sigma)\|(h^o,\sigma^o)) - \delta \right] - H_{h,\sigma}(Y|X)}. \qquad (17)$$

Here

$$d((h,\sigma)\|(h^o,\sigma^o)) = -\frac{1}{2}\log\!\left(\frac{\sigma^2}{(\sigma^o)^2}\right) - \frac{1}{2} + \frac{\sigma^2 + \|h - h^o\|^2}{2(\sigma^o)^2}$$

is an information divergence for Gaussian distributions, positive if $(h,\sigma) \ne (h^o,\sigma^o)$.

Proof:

$$P^{h^o,\sigma^o}_{Y|X}(y|x_i) = 2^{-n\left[ -\frac{1}{n}\log\!\left( \frac{P^{h^o,\sigma^o}_{Y|X}(y|x_i)}{P^{h,\sigma}_{Y|X}(y|x_i)} \right) \right] + \log\!\left(P^{h,\sigma}_{Y|X}(y|x_i)\right)}, \qquad (18)$$

and $y \in T_{P^{h,\sigma}_{Y|X}}(x)$ by the definition, so

$$-\log\!\left(P^{h,\sigma}_{Y|X}(y|x_i)\right) \ge H_{h,\sigma}(Y|X) - n\gamma_n > H_{h,\sigma}(Y|X) - \frac{n\delta}{3} \qquad (19)$$

if $n$ is large enough. With this,

$$\log\!\left( \frac{P^{h^o,\sigma^o}_{Y|X}(y|x_i)}{P^{h,\sigma}_{Y|X}(y|x)} \right) = \log\!\left( \frac{ \frac{1}{(2\pi)^{n/2} (\sigma^o)^{n}} \exp\!\left(-\frac{\|y - h^o * x\|^2}{2(\sigma^o)^2}\right) }{ \frac{1}{(2\pi)^{n/2} \sigma^{n}} \exp\!\left(-\frac{\|y - h * x\|^2}{2\sigma^2}\right) } \right) = n \log\!\left(\frac{\sigma}{\sigma^o}\right) + \frac{\|y - h * x\|^2}{2\sigma^2} - \frac{\|y - h^o * x\|^2}{2(\sigma^o)^2} \le \frac{n}{2} \log\!\left(\frac{\sigma^2}{(\sigma^o)^2}\right) + \frac{n}{2} + n\gamma_n - \frac{\|y - h^o * x\|^2}{2(\sigma^o)^2}. \qquad (20)$$


Introduce the notation $z = y - h * x$. Then

$$\| y - h^o * x \|^2 = \| z + h * x - h^o * x \|^2 = \sum_{j=1}^{n} \Big( z_j + \sum_{k=0}^{l_n} (h_k - h^o_k) x_{j-k} \Big)^2 = \sum_{j=1}^{n} \Big( z_j^2 + 2 z_j \sum_{k=0}^{l_n} (h_k - h^o_k) x_{j-k} + \Big( \sum_{k=0}^{l_n} (h_k - h^o_k) x_{j-k} \Big)^2 \Big)$$

$$= \| z \|^2 + 2 \sum_{k=0}^{l_n} (h_k - h^o_k) \sum_{j=1}^{n} z_j x_{j-k} + \sum_{k=0}^{l_n} (h_k - h^o_k)^2 \sum_{j=1}^{n} x_{j-k}^2 + \sum_{j=1}^{n} \sum_{k \ne m} (h_m - h^o_m)(h_k - h^o_k) x_{j-k} x_{j-m}.$$

Using Lemma 5, $\left| \sum_{j=1}^{n} z_j x_{j-k} \right| < n\gamma_n$, together with (4) and (5), this is

$$\ge \| z \|^2 - 4 n l_n P_n \gamma_n + (n - n\gamma_n) \| h - h^o \|^2 - n (l_n P_n)^2 \gamma_n.$$

With this we can bound (20):

$$-\frac{1}{n} \log\!\left( \frac{P^{h^o,\sigma^o}_{Y|X}(y|x_i)}{P^{h,\sigma}_{Y|X}(y|x_i)} \right) \ge -\frac{1}{2} \log\!\left( \frac{\sigma^2}{(\sigma^o)^2} \right) - \frac{1}{2} + \frac{\sigma^2 + \| h - h^o \|^2}{2(\sigma^o)^2} - \frac{\gamma_n}{2} - 4 l_n P_n \gamma_n - 4 \gamma_n l_n P_n^2 - (l_n P_n)^2 \gamma_n \qquad (21)$$

(here we used that $\| z \|^2 = n\sigma^2$). If $n$ is large enough, then

$$\frac{\delta}{6} \ge \max\!\left( 4 l_n P_n \gamma_n,\ 4 \gamma_n l_n P_n^2,\ (l_n P_n)^2 \gamma_n \right),$$

since $\lim_{n\to\infty} P_n^2 l_n^2 \gamma_n = 0$. Using (6), we continue from (21):

$$\ge d((h,\sigma)\|(h^o,\sigma^o)) - \frac{\gamma_n}{2} - \left( 4 l_n P_n \gamma_n + 4 \gamma_n l_n P_n^2 + (l_n P_n)^2 \gamma_n \right) \ge d((h,\sigma)\|(h^o,\sigma^o)) - \frac{2\delta}{3},$$

which, together with (18) and (19), proves (17).

Theorem 1: For every $R > 0$, $\varepsilon > 0$, and blocklength $n > n_0(R, \varepsilon)$, there exists a code $(f, \varphi)$ (an encoding/decoding function pair) with rate $\ge R - \varepsilon$ such that for every ISI channel with parameters $h^o \in \mathbb{R}^{l_n}$, $|h^o_i| < P_n$, $\sigma^o < P_n$, $\sigma^o \ne 0$, the average error probability satisfies

$$P_e(h^o, \sigma^o, f, \varphi) \le 2^{-n\, E_r(R, h^o, \sigma^o)}. \qquad (22)$$

Here

$$E_r(R, h^o, \sigma^o) \triangleq \min_{h, \sigma} \left\{ d((h,\sigma)\|(h^o,\sigma^o)) + \left| I(h,\sigma) - R \right|^+ \right\}, \qquad (23)$$

where $d((h,\sigma)\|(h^o,\sigma^o))$ is the information divergence defined in Lemma 6.

Remark 1: The expression minimised above is a continuous function of $h^o$, $\sigma^o$, $h$, $\sigma$, $R$.

Proof: Let $\delta = \varepsilon/3$, and let

$$C = \{x_1, x_2, \ldots, x_M\}$$

be the set of code sequences from Lemma 4, so $M \ge 2^{n(R-\delta)}$. The encoding function sends the $i$-th codeword for message $i$: $f(i) = x_i$. The decoding happens as follows. Let $h(j), \sigma(j)$ denote the ISI type of the pair $(x_j, y)$ for all $j \in \{1, 2, \ldots, M\}$. Using these parameters we define the decoding rule

$$\varphi(y) = i \iff i = \operatorname*{argmax}_{j} I(h(j), \sigma(j));$$

in case of non-uniqueness we declare an error. Now we bound the error probability:

$$P_e = P_{\mathrm{out}} + \frac{1}{M} \sum_{i=1}^{M} P^{h^o,\sigma^o}_{Y|X}\big( \varphi(y) \ne i \mid x_i, E^c \big),$$

where $P_{\mathrm{out}}$ denotes the probability of the event $E$ that the detected variance, for some $i \in \{1, 2, \ldots, M\}$, does not satisfy $\left| \sigma(i)^2 - \frac{\|y - h(i) * x_i\|^2}{n} \right| < \gamma_n$. We first bound the probability of this event. If $E$ occurs, then $\sigma(i)$ is an extremal point of the approximating set of parameters, so $\| y - h(i) * x_i \|^2 > n P_n$. Since $h = (0, 0, \ldots, 0)$ is an element of the approximating set of ISI vectors, this means that the power of the incoming sequence is greater than $n P_n$, the probability of which can be bounded by a Gaussian tail integral, giving

$$P_{\mathrm{out}} \le \left( 2\,\operatorname{erfc}\!\left( P_n^{1+1/8} \right) \right)^{l_n n}.$$
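To make the decoding rule $\varphi(y) = \operatorname{argmax}_j I(h(j), \sigma(j))$ concrete, the sketch below evaluates the mutual-information functional $I(h,\sigma) = \frac{1}{2\pi}\int_0^{2\pi} \ln(1 + f(\xi)/\sigma)\, d\xi$ numerically for each candidate codeword, using an estimated ISI type of each pair $(x_j, y)$. It is an illustrative implementation only: the ISI-type estimate reuses the least-squares shortcut sketched after the Definition in Section III rather than the exact grid minimisation, the noise variance is used in place of $\sigma$ in the integrand, and all helper names are ours.

```python
import numpy as np

def spectral_density(h, num_points=4096):
    """f(lambda) = |sum_j h_j e^{-i j lambda}|^2, the spectral density of the ISI filter."""
    H = np.fft.fft(h, num_points)
    return np.abs(H) ** 2

def mutual_information(h, sigma2, num_points=4096):
    """Numerical (1/2pi) * integral over [0, 2pi) of ln(1 + f(xi)/sigma2)."""
    f = spectral_density(h, num_points)
    return np.mean(np.log1p(f / sigma2))

def estimate_type(x, y, l_n):
    """Least-squares stand-in for the ISI type of (x, y) (see the earlier sketch)."""
    A = np.zeros((len(y), l_n + 1))
    for j in range(l_n + 1):
        A[j:j + len(x), j] = x
    h_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ h_hat
    return h_hat, np.dot(resid, resid) / len(x)

def decode(y, codebook, l_n):
    """Blind MMI-style decoder: pick the codeword whose estimated ISI type
    maximises the mutual-information functional."""
    scores = []
    for x_j in codebook:
        h_j, sigma2_j = estimate_type(x_j, y, l_n)
        scores.append(mutual_information(h_j, sigma2_j))
    return int(np.argmax(scores))
```

A codebook of i.i.d. Gaussian codewords and an output y produced by the isi_channel sketch from the Introduction can be fed directly to decode; no knowledge of the true taps or noise level is used.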