Coding schemes for the binary symmetric channel with known interference

Giuseppe Caire, Eurecom Institute, France
Amir Bennatan, Tel-Aviv University, Israel
David Burshtein, Tel-Aviv University, Israel
Shlomo Shamai, Technion, Israel

Abstract

We consider a binary symmetric channel with additive binary interference known to the transmitter but unknown to the receiver. Depending on whether the interference signal is known non-causally or causally, this channel falls in the cases studied by Gel'fand and Pinsker and by Shannon, respectively, for which coding theorems and single-letter capacity expressions are known. In this work we present effective code constructions for both problems. In particular, we show that the non-causal interference knowledge case can be turned into an equivalent binary multiple-access channel, for which standard superposition coding and successive decoding can be used. We also discuss two alternatives for the causal interference knowledge case, one based on time-sharing and the other based on coding over a ternary alphabet with non-uniform probability assignment.

1 Introduction

Memoryless channels with input $x$, output $y$ and state-dependent transition probability $p(y|x,s)$, where the channel state $s$ is i.i.d., known to the transmitter and unknown to the receiver, date back to Shannon [1], who considered the case of a state sequence known causally, and to Kuznetsov and Tsybakov [2], who considered the case of a state sequence known non-causally. Gel'fand and Pinsker [3] proved the capacity formula

$$C = \max_{P(u|s),\; x = f(u,s)} \big[ I(U;Y) - I(U;S) \big] \qquad (1)$$
for the non-causal case, where $U$ is an auxiliary random variable with conditional distribution $P(u|s)$ and $x = f(u,s)$ is a deterministic function of $u$ and $s$. For the causal case with i.i.d. state sequence, Shannon proved the capacity formula [1]

$$C = \max_{P(u),\; x = f(u,s)} I(U;Y) \qquad (2)$$
which can be obtained as a special case of (1) by restricting the supremization to $U$ independent of $S$ [4]. Following Costa's famous "Writing on dirty paper" title [5], coding techniques for a non-causally known state sequence are generally referred to as "Dirty-Paper coding". By analogy, the case of causally known interference is referred to as "Dirty-Tape coding". Binary Dirty-Paper and Dirty-Tape problems are relevant in data hiding subject to a maximum distortion, where the host signal is a black-and-white image, or the least-significant bit-layer of a gray-scale image, and where the signal is received through some memoryless transformation modeled as a BSC. Dirty-Paper applies when the host signal is available at the encoder before transmission, while Dirty-Tape is relevant for on-line data hiding, where the host signal is revealed causally to the encoder.

The channel output is given by $y = x \oplus s \oplus z$, where addition is over the binary field $\mathrm{GF}(2)$, $s$ is the host signal, assumed i.i.d. with uniform probability $P(s=0) = P(s=1) = 1/2$, $z$ is the BSC noise, Bernoulli-$p$, and $x$ is the channel input. Due to the additive nature of the channel at hand, we shall refer to the host signal as interference in the following.
A coding scheme is defined by a sequence of encoding functions

$$\phi_n : \{1, \ldots, 2^{nR}\} \times \{0,1\}^n \rightarrow \{0,1\}^n$$

for $n = 1, 2, \ldots$, such that for the information message $w \in \{1, \ldots, 2^{nR}\}$ and the interference realization $s \in \{0,1\}^n$ the corresponding codeword is $x = \phi_n(w, s)$.
If the transmitter has non-causal knowledge of $s$, then each symbol $x_i$ is allowed to depend on the entire interference signal (Dirty-Paper). On the contrary, if the transmitter has causal knowledge of $s$, then $x_i$ is allowed to depend only on the interference signal up to time $i$ (Dirty-Tape). The maximum distortion constraint is reflected by the input constraint

$$\frac{1}{n}\, d_H\big(\phi_n(w,s), \mathbf{0}\big) \leq W \qquad (3)$$

where $d_H(x, y)$ denotes the Hamming distance between two $n$-vectors $x$ and $y$.
1.1 The binary Dirty-Paper capacity

When the encoder knows $s$ non-causally, it can be shown (see [6] and references therein) that the maximum achievable rate is given by

$$C_{\rm DP} = \begin{cases} K(W) & \text{for } \widetilde{W} \leq W \leq 1/2 \\ \frac{W}{\widetilde{W}}\, K(\widetilde{W}) & \text{for } 0 \leq W < \widetilde{W} \end{cases} \qquad (4)$$

where the function

$$K(W) \triangleq h(W) - h(p) \qquad (5)$$

is defined in terms of the binary entropy function $h(\cdot)$,¹ and where $\widetilde{W}$ is the point at which the straight line through the origin is tangent to the curve $K(W)$.

It is instructive to sketch the proof of achievability, based on random coding and random binning. Fix $W \in [\widetilde{W}, 1/2]$ and generate a codebook $\mathcal{C}$ of $2^{n(1-h(p)-\epsilon)}$ random binary sequences of length $n$, partitioned into $2^{nR}$ subsets of $2^{n(1-h(W)+\epsilon)}$ sequences each, so that $R = h(W) - h(p) - 2\epsilon$. Given the message $w$ and the interference sequence $s$, the encoder looks in the $w$-th subset for a sequence $u$ such that $d_H(u, s) \leq nW$.
Since each subset is large enough, such a sequence can be found with probability close to 1 for sufficiently large $n$. Then, the encoder sends $x = u \oplus s$, which satisfies the input constraint (3) by construction. The decoder observes the standard BSC output $y = x \oplus s \oplus z = u \oplus z$, where $z$ is the noise realization, finds the unique codeword $\hat{u} \in \mathcal{C}$ jointly typical with $y$ (i.e., such that $d_H(\hat{u}, y) \leq n(p + \delta)$ for some small $\delta > 0$),² and outputs the message as the index of the subset containing $\hat{u}$. Since the codebook is small enough, for sufficiently large $n$ the probability of error can be made smaller than $\epsilon$. Eventually, the rate $R = h(W) - h(p) - 2\epsilon$ can be transmitted with error probability not larger than $\epsilon$ for sufficiently large $n$. The above achievability argument can be made rigorous. The converse is provided, for example, in [6].

¹ When information rates are measured in bits, log and exp are base 2; when they are measured in nats, log and exp are base $e$.
² Clearly, a better decoder, such as the minimum-distance decoder, achieves the same rate.

In the range $0 \leq W < \widetilde{W}$, capacity is achieved by time-sharing with duty-cycle $W/\widetilde{W}$ between the above scheme operated at Hamming distortion $\widetilde{W}$ and "silence" (zero rate and zero Hamming distortion).

We notice the different roles of $\mathcal{C}$ and its subsets: the codebook $\mathcal{C}$ must be a good channel code for the BSC with transition probability $p$, while each subset must be a good Hamming quantizer for the interference sequence (a Bernoulli i.i.d. unbiased source) with Hamming distortion $W$. Unlike the case of the AWGN channel with Gaussian interference studied by Costa [5], the binary Dirty-Paper capacity is strictly less than the capacity if interference were not present, which is given by

$$C_{\rm no\text{-}int} = h(W \circledast p) - h(p) \qquad (6)$$

where we define $a \circledast b \triangleq a(1-b) + b(1-a)$. Clearly, $h(W \circledast p) \geq h(W)$ for all $p$, hence (6) is always larger than (4).
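To make the capacity expressions concrete, the following short numerical sketch evaluates (4), (5) and (6); it is our own illustration, not part of the original development. It assumes rates in bits, and locates the tangent point $\widetilde{W}$ by bisection on the tangency condition $K(W) = W\, K'(W)$ implied by the definition after (4).

```python
# Sketch: binary Dirty-Paper capacity (4)-(5) and no-interference capacity (6).
import numpy as np

def h(x):  # binary entropy in bits
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def tangent_point(p, tol=1e-12):
    """Find W~ in (p, 1/2) with h(W) - h(p) = W * h'(W) (line through origin)."""
    f = lambda W: h(W) - h(p) - W * np.log2((1 - W) / W)  # h'(W) = log2((1-W)/W)
    lo, hi = p + 1e-9, 0.5 - 1e-9                          # f(lo) < 0 < f(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return lo

def C_dirty_paper(W, p):            # eq. (4)
    Wt = tangent_point(p)
    K = lambda w: h(w) - h(p)       # eq. (5)
    return K(W) if W >= Wt else (W / Wt) * K(Wt)

def C_no_interference(W, p):        # eq. (6), with a*b = a(1-b) + b(1-a)
    return h(W * (1 - p) + p * (1 - W)) - h(p)

p = 0.1
for W in (0.05, 0.1, 0.2, 0.3, 0.5):
    print(W, C_dirty_paper(W, p), C_no_interference(W, p))
```

As expected from the discussion above, the no-interference curve dominates the Dirty-Paper curve at every $W$.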

1.2 The binary Dirty-Tape capacity

In the Dirty-Tape case, Shannon's capacity formula (2) is given explicitly by

$$C_{\rm DT} = 2W\,(1 - h(p)), \qquad 0 \leq W \leq 1/2 \qquad (7)$$
The proof of (7) is a simple example of the idea of "coding over strategies" underlying Shannon's result. The auxiliary variable in (2) takes on values in the set $\mathcal{T}$ of memoryless functions (or "strategies") mapping the state $s$ into the input $x$. When both $s$ and $x$ are binary, then $\mathcal{T} = \{t_0, t_1, t_{\rm id}, t_{\rm not}\}$, i.e., the identically *zero* and identically *one* functions, identity and negation, respectively. The transition probability assignment of the associated channel with input $t \in \mathcal{T}$ and output $y$, given by $P(y|t) = \sum_s P(s)\, P(y \,|\, x = t(s), s)$, is given by the table

              t_0    t_1    t_id    t_not
  P(y=0|t)    1/2    1/2    1-p     p
  P(y=1|t)    1/2    1/2    p       1-p
Due to the concavity of capacity as a function of the input constraint $W$ and to the fact that the input symbol $t_0$ has associated cost zero, we have [7]

$$C_{\rm DT} = W \sup_{t \in \mathcal{T}} \frac{D\big(P(\cdot|t)\,\|\,P(\cdot|t_0)\big)}{\mathcal{E}(t)} = 2W\,(1 - h(p)) \qquad (8)$$

where $\mathcal{E}(t) \triangleq \mathbb{E}[w_H(t(s))]$ is the average cost of strategy $t$, and where the supremization is achieved by either $t = t_{\rm id}$ or $t = t_{\rm not}$. In order to show that (8) is indeed the capacity, we find an input probability assignment such that $I(T;Y)$ equals (8). The input constraint is given by

$$\sum_{t \in \mathcal{T}} P(t)\,\mathcal{E}(t) = P(t_1) + \frac{1}{2}\big(P(t_{\rm id}) + P(t_{\rm not})\big) \leq W \qquad (9)$$
We observe that the optimal input probability assignment must put zero probability mass on the input $t_1$, since $t_1$ would increase the input average Hamming weight without increasing mutual information (notice that $P(y|t_1) = P(y|t_0)$). Moreover, by symmetry, it must be $P(t_{\rm id}) = P(t_{\rm not})$ (this choice makes the output distribution uniform). Hence, we choose $P(t_{\rm id}) = P(t_{\rm not}) = \Delta$ and $P(t_0) = 1 - 2\Delta$ for some $\Delta \in [0, 1/2]$. The resulting mutual information is given by

$$I(T;Y) = H(Y) - H(Y|T) = 1 - (1 - 2\Delta) - 2\Delta\, h(p) = 2\Delta\,(1 - h(p)) \qquad (10)$$

By letting $\Delta = W$ for $0 \leq W \leq 1/2$, we obtain (8).

[Figure 1: Capacity of the input-constrained BSC with causally and non-causally known interference, for $p = 0.1$. The plot shows $C$ (bit/symbol) vs. $W \in [0, 0.5]$ for the no-interference capacity (reaching $1 - h(0.1) = 0.531$ bits), the Dirty-Paper capacity, the Dirty-Tape capacity, and the function $K(W)$.]

Fig. 1 shows the binary Dirty-Paper and the Dirty-Tape capacities vs. the input constraint $W$. Causal knowledge of the interference sequence incurs a noticeable capacity loss with respect to non-causal knowledge.
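The derivation above is easy to check numerically. The following sketch, our own illustration with $p = 0.1$ as in Fig. 1, builds the strategy channel $P(y|t)$ from the table and verifies that $I(T;Y)$ equals $2\Delta(1 - h(p))$ as in (10); at $\Delta = 1/2$ it returns the unconstrained value $1 - h(0.1) = 0.531$ bits.

```python
# Sketch: verify eq. (10) on the "coding over strategies" channel.
import numpy as np

def h(x):
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

p = 0.1
# rows: strategies (t_0, t_1, t_id, t_not); columns: y = 0, 1
Pyt = np.array([[0.5, 0.5], [0.5, 0.5], [1 - p, p], [p, 1 - p]])

def I_TY(q):
    """Mutual information I(T;Y) in bits for input distribution q on T."""
    Py = q @ Pyt
    HY = -(Py * np.log2(np.clip(Py, 1e-12, 1))).sum()
    HYgT = -(q[:, None] * Pyt * np.log2(np.clip(Pyt, 1e-12, 1))).sum()
    return HY - HYgT

for Delta in (0.05, 0.25, 0.5):
    q = np.array([1 - 2 * Delta, 0.0, Delta, Delta])  # zero mass on t_1
    print(Delta, I_TY(q), 2 * Delta * (1 - h(p)))     # the two numbers agree
```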

2 Superposition coding for the Dirty-Paper problem

A standard approach to implement the random binning scheme consists of using nested linear codes [4], identifying the bins with the cosets of $\mathcal{C}_1$ in the partition $\mathcal{C}/\mathcal{C}_1$. Let $\mathcal{C}$ be a linear binary code, and let $\mathcal{C}_1$ be a linear subcode of $\mathcal{C}$. We have the coset decomposition

$$\mathcal{C} = \bigcup_{i=1}^{q} \big( c_i \oplus \mathcal{C}_1 \big), \qquad q = |\mathcal{C}| / |\mathcal{C}_1| \qquad (11)$$

where $c_i \oplus \mathcal{C}_1$ is the $i$-th coset of $\mathcal{C}_1$ in $\mathcal{C}$ and $c_i$ is a coset leader (i.e., a coset representative with minimum Hamming weight). We define minimum-distance decoding for a code $\mathcal{C}$ as

$$g_{\mathcal{C}}(y) \triangleq \arg\min_{c \in \mathcal{C}} d_H(y, c) \qquad (12)$$

for any $y \in \{0,1\}^n$, and the modulo-$\mathcal{C}$ operator as

$$y \bmod \mathcal{C} \triangleq y \oplus g_{\mathcal{C}}(y) \qquad (13)$$

Hence, $y \bmod \mathcal{C}$ is the coset leader of the coset $y \oplus \mathcal{C}$ in the partition $\{0,1\}^n / \mathcal{C}$. Fig. 2 shows the block diagram of the structured binning scheme (see also [4]). The encoder maps the information message $w$ into the coset leader $c_w$ of the partition $\mathcal{C}/\mathcal{C}_1$. Then, the transmitted sequence is given by

$$x = (c_w \oplus s) \bmod \mathcal{C}_1 \qquad (14)$$

The code $\mathcal{C}_1$ is chosen such that the leaders of its cosets in $\{0,1\}^n$ have Hamming weight not larger than $nW$. Therefore, $x$ satisfies the input constraint.

[Figure 2: Dirty-Paper coding scheme with linear binary codes. The transmitter consists of a coset mapper followed by the modulo-$\mathcal{C}_1$ operation; the receiver consists of a decoder followed by a coset demapper.]

The received sequence is given by

$$y = x \oplus s \oplus z = c_w \oplus g_{\mathcal{C}_1}(c_w \oplus s) \oplus z \qquad (15)$$
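The following toy implementation makes the data flow of (12)-(15) explicit at a very small block length. It is a sketch under our own assumptions: the systematic generator matrix, $n = 10$, and $p = 0.05$ are arbitrary illustrative choices, and both decoding steps are done by brute-force minimum distance.

```python
# Toy nested-linear-code binning scheme of Fig. 2, eqs. (12)-(15).
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, k1, k2 = 10, 4, 3                      # dim(C) = k1+k2, dim(C1) = k1
P = rng.integers(0, 2, (k1 + k2, n - k1 - k2))
G = np.hstack([np.eye(k1 + k2, dtype=int), P])   # systematic generator of C
G1, G2 = G[:k1], G[k1:]                   # C1 = span(G1); cosets led by w @ G2

def span(M):
    msgs = np.array(list(itertools.product([0, 1], repeat=M.shape[0])))
    return msgs @ M % 2

C1, C = span(G1), span(G)

def mod_C1(v):                            # eqs. (12)-(13), brute force
    return v ^ C1[(v ^ C1).sum(axis=1).argmin()]

def encode(w, s):                         # eq. (14)
    return mod_C1((w @ G2 % 2) ^ s)

def decode(y):                            # min-distance decode to C, demap coset
    c_hat = C[(y ^ C).sum(axis=1).argmin()]
    return c_hat[k1:k1 + k2]              # systematic bits carry the coset index w

p = 0.05
w = rng.integers(0, 2, k2)
s = rng.integers(0, 2, n)                 # uniform i.i.d. interference
z = (rng.random(n) < p).astype(int)       # BSC noise
x = encode(w, s)                          # its weight is the quantization distortion
y = x ^ s ^ z                             # eq. (15): a codeword of C plus noise
print("sent", w, "decoded", decode(y))
```

At these toy sizes the subcode is of course a poor quantizer, so the Hamming weight of $x$ is not guaranteed to be small; the point is only to make the structure of Fig. 2 concrete.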

Notice that, by construction, $c_w \oplus g_{\mathcal{C}_1}(c_w \oplus s)$ is a codeword of $\mathcal{C}$. Hence, the decoder "sees" a standard BSC with noise $z$ and transmitted codeword in $\mathcal{C}$, and decodes by some decoding scheme (not necessarily minimum distance). Let $\hat{c}$ be the decoding outcome (not necessarily a codeword of $\mathcal{C}$ since decoding might not be complete). If $\hat{c} \in \mathcal{C}$, then it belongs to some coset of the partition $\mathcal{C}/\mathcal{C}_1$ and the index $\hat{w}$ of this coset is output. Otherwise, an error is declared. By using random binary linear codes (i.e., whose parity-check matrix is generated with i.i.d. uniformly distributed elements) it can be easily shown that there exist sequences of nested linear codes such that, for sufficiently large $n$, $\mathcal{C}$ has exponentially vanishing error probability for BSC parameter $p$ and $\mathcal{C}_1$ has Hamming distortion not larger than $W$ for an i.i.d. uniformly distributed source, provided that the rate of $\mathcal{C}$ is below $1 - h(p)$ and the rate of $\mathcal{C}_1$ is above $1 - h(W)$ [9, 10, 8].

Although the above scheme is optimal, it presents some problems for practical code construction. In fact, while $\mathcal{C}$ can be decoded by any suitable method, not necessarily minimum-distance, the modulo-$\mathcal{C}_1$ operation requires minimum-distance decoding or, at least, that $\mathcal{C}_1$ admits complete decoding, inducing a partition of the whole space $\{0,1\}^n$ into decision regions. Unfortunately, minimum-distance decoding has exponential complexity. We are tempted to use for both $\mathcal{C}$ and $\mathcal{C}_1$ some randomlike code construction (e.g., LDPC [8, 11] or turbo codes [12]), which proved to be very effective in approaching the capacity of various binary-input channels under low-complexity iterative decoding based on Belief Propagation (BP) [13]. Unfortunately, while BP decoding is very effective for channel coding, it fails miserably for quantization. The reason for this failure is intuitively explained as follows: in channel coding the received signal $y$ is typically close to the transmitted codeword (within a Hamming sphere of radius $\approx np$). It turns out that, if the code has an appropriate Tanner graph, then BP is able to recover a large fraction of the symbols of the transmitted codeword, achieving vanishing bit-error rate (BER) in the limit of large block length if $p$ is below a certain iterative decoding threshold that depends on the ensemble defining the code. There exist several constructions of ensembles for which this threshold can be made very close to the Shannon limit, with coding rate very close to $1 - h(p)$. On the other hand, in quantization the source signal $s$ is uniformly distributed over $\{0,1\}^n$ and it is typically far from the codewords of $\mathcal{C}_1$. It turns out that BP applied to quantization yields very poor performance, since with high probability it does not even converge to a codeword of $\mathcal{C}_1$. Relaxing the requirement on the quantization code $\mathcal{C}_1$, and searching in some ensemble of structured codes for which minimum-distance decoding can be implemented with moderate complexity, is of little help. In fact, since $\mathcal{C}_1$ must be a linear subcode of $\mathcal{C}$, the Tanner graph of $\mathcal{C}$ must satisfy a certain structure, thus preventing the use of standard randomlike ensembles (such as LDPCs) for $\mathcal{C}$.

Because of these shortcomings, we take a different route and construct Dirty-Paper codes based on superposition coding and successive decoding. Consider two codes $\mathcal{C}_0$ and $\mathcal{C}_1$, referred to as the quantization code and the auxiliary code in the following. The quantization code is randomly generated with uniform i.i.d. elements, while the auxiliary code is randomly generated according to an i.i.d. Bernoulli-$\delta$ probability distribution, with $\delta \in [0, 1/2]$ to be specified later. The superposition code is defined as $\mathcal{C} = \mathcal{C}_0 \oplus \mathcal{C}_1$, i.e.,

$$\mathcal{C} = \big\{ c_0 \oplus c_1 \;:\; c_0 \in \mathcal{C}_0,\; c_1 \in \mathcal{C}_1 \big\} \qquad (16)$$
The encoder chooses a codeword $c_0 \in \mathcal{C}_0$ such that $d_H(c_0, c_1(w) \oplus s) \leq nW$, where $c_1(w) \in \mathcal{C}_1$ is the auxiliary codeword carrying the information message $w$, and sends the sequence

$$x = c_0 \oplus c_1(w) \oplus s$$

where $w_H(x) = d_H(c_0, c_1(w) \oplus s) \leq nW$, so that the input constraint is satisfied. The decoder receives the signal

$$y = x \oplus s \oplus z = c_0 \oplus c_1(w) \oplus z \qquad (17)$$
The error probability of this scheme is given by the probability of the union of the following events: $E_0$, the event that no $c_0 \in \mathcal{C}_0$ satisfies $d_H(c_0, c_1(w) \oplus s) \leq nW$, and $E_1$, the event that the decoder fails to recover $c_1(w)$. Two observations are in order:

1. Since $s$ is uniformly distributed over $\{0,1\}^n$ and independent of $c_1(w)$, then also $c_1(w) \oplus s$ is uniformly distributed over $\{0,1\}^n$. By choosing $\mathcal{C}_0$ in the ensemble of random linear codes with rate $R_0 > 1 - h(W)$ we can find a sequence of codes for which $P(E_0)$ vanishes exponentially as $n \rightarrow \infty$.

2. Again because of the fact that $c_1(w) \oplus s$ is uniformly distributed over $\{0,1\}^n$ and it is independent of $c_1(w)$, the codeword $c_0$ is independent of $c_1(w)$. Hence, (17) coincides with a multiple-access channel (MAC) where two virtual users with codebooks $\mathcal{C}_0$ and $\mathcal{C}_1$ send independent messages by sending the codewords $c_0$ and $c_1(w)$, respectively.
Given the formal analogy with the MAC, we conclude that $P(E_1)$ is also vanishing if the rate pair $(R_0, R_1)$ (where $R_0$ is the rate of $\mathcal{C}_0$ and $R_1$ is the rate of $\mathcal{C}_1$) is inside the capacity region of the MAC. This condition is sufficient, i.e., it yields an achievability result. In fact, the decoder is only interested in reliable decoding of $c_1(w)$, while in the MAC both $c_0$ and $c_1(w)$ must be reliably decoded. However, as we shall see in the following, for an appropriate choice of the parameter $\delta$ governing the ensemble of $\mathcal{C}_1$, the additional condition of decoding reliably also $c_0$ incurs no loss of optimality. Moreover, decoding of the superposition code can be accomplished by (low-complexity) successive decoding of $\mathcal{C}_0$ and $\mathcal{C}_1$. We have:

Lemma 1. Consider the MAC

$$y = x_0 \oplus x_1 \oplus z \qquad (18)$$

where all variables are in $\{0,1\}$, where $z$ is Bernoulli-$p$, and where user 1 has the input satisfying the Hamming weight constraint $\frac{1}{n} w_H(x_1) \leq W$. The capacity region is given by all pairs $(R_0, R_1)$ satisfying

$$R_1 \leq h(W \circledast p) - h(p), \qquad R_0 + R_1 \leq 1 - h(p) \qquad (19)$$
Proof. The proof follows immediately from the general MAC capacity region. We give the details for the sake of completeness. For any input product distribution $P_{X_0} P_{X_1}$, define the region $\mathcal{R}(P_{X_0}, P_{X_1})$ given by

$$R_0 \leq I(X_0; Y | X_1), \qquad R_1 \leq I(X_1; Y | X_0), \qquad R_0 + R_1 \leq I(X_0, X_1; Y)$$

The capacity region of the MAC (18) is given by the closure of the convex hull of the union of all regions $\mathcal{R}(P_{X_0}, P_{X_1})$, for all $P_{X_0} P_{X_1}$ satisfying the input constraint, i.e., for which $P_{X_1}$ satisfies $\Pr(X_1 = 1) \leq W$. We observe that

$$I(X_1; Y | X_0) \leq h(W \circledast p) - h(p) \qquad (20)$$
since the RHS in (20) is the capacity of the input-constrained BSC (under conditioning with respect to $X_0$, the contribution of user 0 can be removed from the received signal). We observe also that

$$I(X_0; Y | X_1) \leq 1 - h(p) \qquad (21)$$

since the RHS in (21) is the capacity of the BSC. Finally, we observe that

$$I(X_0, X_1; Y) \leq H(Y) - h(p) \leq 1 - h(p) \qquad (22)$$
By letting $X_0$ be Bernoulli-$1/2$ and $X_1$ be Bernoulli-$W$, the upper bounds (20), (21) and (22) are simultaneously achieved. Since the resulting region is closed and convex, no closure and convex hull operations are needed.

Now, we choose $\delta$ such that $\delta \circledast p = W$. For any $W \in [p, 1/2]$ there exists $\delta \in [0, 1/2]$ for which this condition holds. Hence, the rate pair $(R_0, R_1) = (1 - h(W),\, h(W) - h(p))$ corresponds to a vertex of the MAC capacity region (see Fig. 3). We conclude that pairs of quantization and auxiliary codes $\mathcal{C}_0$ and $\mathcal{C}_1$ can be found that achieve rates arbitrarily close to the binary Dirty-Paper capacity under successive decoding. Namely, we first decode $\mathcal{C}_0$ by treating $c_1(w) \oplus z$ as noise. Then, we subtract the decoded codeword of the first stage from the received signal and decode $\mathcal{C}_1$ based on $y \oplus \hat{c}_0$.
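The superposition scheme and its successive decoder can be illustrated end-to-end at toy block length. In the following sketch the block length, codebook sizes and $(p, W)$ are small illustrative values of ours, chosen only so that brute-force minimum-distance decoding runs instantly; $\delta$ is set by solving $\delta \circledast p = W$.

```python
# Toy superposition coding with successive decoding, eqs. (16)-(17).
import numpy as np

rng = np.random.default_rng(1)
n, p, W = 24, 0.05, 0.25
delta = (W - p) / (1 - 2 * p)            # solves delta*(1-p) + (1-delta)*p = W
R0, R1 = 4, 2                            # log2 of the codebook sizes (toy values)

C0 = rng.integers(0, 2, (2**R0, n))      # quantization code: uniform i.i.d.
C1 = (rng.random((2**R1, n)) < delta).astype(int)   # auxiliary: Bernoulli-delta

w = rng.integers(0, 2**R1)               # message, carried by C1
s = rng.integers(0, 2, n)                # interference, known at the encoder
t = C1[w] ^ s                            # quantization target
c0 = C0[(t ^ C0).sum(axis=1).argmin()]   # pick nearest c0 (failure = event E0)
x = c0 ^ C1[w] ^ s                       # transmitted; weight = d_H(c0, t)
z = (rng.random(n) < p).astype(int)
y = x ^ s ^ z                            # eq. (17): y = c0 xor c1(w) xor z

# Successive decoding: c0 first (treating c1 xor z as noise), then c1.
c0_hat = C0[(y ^ C0).sum(axis=1).argmin()]
w_hat = ((y ^ c0_hat) ^ C1).sum(axis=1).argmin()
print("weight(x)/n =", x.sum() / n, " sent", w, " decoded", w_hat)
```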

3 Time-sharing and ternary coding for the Dirty-Tape problem

The Dirty-Tape capacity (7) can be approached by a simple time-sharing scheme: a fraction $2W$ of the transmission positions, fixed in advance and known to the decoder, carries a binary codeword $b$ through the strategies $t_{\rm id}$ (for $b_i = 0$) and $t_{\rm not}$ (for $b_i = 1$), while the remaining positions use the zero-cost strategy $t_0$. On the active positions the decoder observes $y_i = b_i \oplus z_i$, i.e., a standard BSC with parameter $p$, so that any good binary code of rate close to $1 - h(p)$ can be used. Since on the active positions the transmitted symbol is $x_i = s_i \oplus b_i$, which is Bernoulli-$1/2$, the probability

$$\Pr\left( \frac{1}{n}\, w_H(x) > W + \epsilon \right) \qquad (23)$$

can be made smaller than $\epsilon$ for sufficiently large $n$, as follows immediately from the law of large numbers. Hence, this scheme achieves rate $2W(1 - h(p))$.
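A minimal sketch of this time-sharing encoder is given below; the predetermined pattern, the toy parameters, and the use of uncoded bits in place of a capacity-approaching BSC code are our own simplifications.

```python
# Sketch: time-sharing Dirty-Tape encoder (causal: only s_i is used at time i).
import numpy as np

rng = np.random.default_rng(2)
n, p, W = 40, 0.05, 0.25
active = np.arange(int(2 * W * n))       # predetermined time-sharing pattern

def encode_causal(bits, s):
    """x_i = s_i (t_id) if bit=0, 1 xor s_i (t_not) if bit=1, else 0 (t_0)."""
    x = np.zeros(n, dtype=int)
    x[active] = s[active] ^ bits
    return x

bits = rng.integers(0, 2, len(active))   # stand-in for a coded block
s = rng.integers(0, 2, n)
z = (rng.random(n) < p).astype(int)
x = encode_causal(bits, s)
y = x ^ s ^ z
print("active positions see a BSC:", np.array_equal(y[active], bits ^ z[active]))
print("average weight of x:", x.mean(), "(concentrates around W =", W, ")")
```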

An alternative approach consists of constructing codes over the ternary alphabet $\{t_0, t_{\rm id}, t_{\rm not}\}$ with input probability assignment $(1 - 2W,\, W,\, W)$. However, for finite length and randomlike code constructions this approach does not yield better results, as shown by the following random coding error exponent analysis. The Gallager random coding error exponent is given by $E_r(R) = \max_{\rho \in [0,1]} \left[ E_0(\rho) - \rho R \right]$. For the BSC with parameter $p$ and uniform input probability we have

$$E_0(\rho) = \rho \log 2 - (1+\rho) \log\left[ p^{1/(1+\rho)} + (1-p)^{1/(1+\rho)} \right] \qquad (24)$$

The exponent for the Dirty-Tape channel under the previously described time-sharing approach is given , where denotes the BSC exponent. by 



Y





)

)







Y

;

2

3

0

'





)

Y

0

2

'

2

3

'









816

Error exponent for the binary Dirty−Tape channel, p=0.01, W=0.1 0.12

0.1

Ternary coding

Time−sharing with binary coding

Er(R)

0.08

0.06

0.04

0.02

0

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

R (nat) 







Figure 4: Random coding exponents for the binary Dirty-Tape channel with





and









. 









For the random coding error exponent achieved by ternary coding over the alphabet $\{t_0, t_{\rm id}, t_{\rm not}\}$ with input probability $q = (1 - 2W,\, W,\, W)$ we have

$$E_0^{\rm ter}(\rho) = -\log \sum_{y} \left[ \sum_{t} q(t)\, P(y|t)^{1/(1+\rho)} \right]^{1+\rho} \qquad (25)$$
It can be shown that $2W\, E_r^{\rm BSC}(R/(2W)) \geq E_r^{\rm ter}(R)$ for all $R$. In other words, standard coding for the BSC (with a uniform input probability) and multiplexing the codeword with dummy $t_0$ symbols according to a predetermined time-sharing pattern outperforms direct ternary code construction. Fig. 4 shows the random coding exponents for the case $p = 0.01$ and $W = 0.1$. This result can be interpreted as follows: under no Hamming weight input constraint, the symbol $t_0$ should not be used, since its conditional mutual information is zero. We are forced to use it in order to satisfy the input constraint, since this is the symbol with zero cost. However, it is better to use this symbol in predetermined positions (time-sharing) rather than constructing codes over the input alphabet including this symbol.
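The comparison between the two exponents is easy to reproduce numerically. The following sketch, our own illustration, evaluates $E_r(R)$ for the BSC via (24), the time-sharing exponent $2W\, E_r^{\rm BSC}(R/(2W))$, and the ternary exponent obtained from (25), with $p = 0.01$ and $W = 0.1$ as in Fig. 4; rates are in nats.

```python
# Sketch: random coding exponents of eqs. (24)-(25), rates in nats.
import numpy as np

p, W = 0.01, 0.1

def E0_bsc(rho):                                   # eq. (24)
    return rho * np.log(2) - (1 + rho) * np.log(
        p**(1 / (1 + rho)) + (1 - p)**(1 / (1 + rho)))

def E0_ter(rho):                                   # eq. (25)
    q = np.array([1 - 2 * W, W, W])                # mass on t_0, t_id, t_not
    Pyt = np.array([[0.5, 0.5], [1 - p, p], [p, 1 - p]])
    inner = (q[:, None] * Pyt**(1 / (1 + rho))).sum(axis=0)
    return -np.log((inner**(1 + rho)).sum())

def Er(E0, R):                                     # Gallager: max over rho in [0,1]
    return max(E0(rho) - rho * R for rho in np.linspace(0, 1, 1001))

for R in (0.02, 0.05, 0.1):
    ts = 2 * W * Er(E0_bsc, R / (2 * W))           # time-sharing exponent
    ter = Er(E0_ter, R)
    print(R, ts, ter)                              # ts >= ter, as claimed
```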



4 Conclusions

We have shown that the binary Dirty-Paper capacity can be approached by superposition coding and successive decoding. This makes the code design simpler, since we can separately construct a structured quantization code $\mathcal{C}_0$ allowing complete minimum-distance decoding, and an auxiliary Hamming-weight constrained randomlike code $\mathcal{C}_1$ that can be decoded using low-complexity belief propagation iterative decoding. The superposition coding approach puts fewer constraints on the code construction than the conventional nested linear coding approach, since it does not require that one code be the set of coset representatives of the partition induced by the other in some linear supercode.

Despite these advantages, our first attempts at explicitly constructing codes for the binary Dirty-Paper channel were not fully satisfactory. We believe that the reason for these results is twofold. On one hand, careful optimization of the GQC-LDPC code ensemble is called for. On the other hand, successive decoding should be replaced by iterative BP decoding of the whole superposition code, as currently proposed in "Turbo multiuser decoding" of coded CDMA (see for example [17, 18] and references therein). Iterative BP decoding relaxes the constraint on the BER performance of the convolutional Hamming quantizer as a channel code, which clearly appears to be the main limiting factor preventing good practical performance in the code design example reported above. Further results, including iterative decoding and careful optimization of the GQC-LDPC component code, are reported in [19].
For the binary Dirty-Tape case we have shown that conventional binary linear coding with time-sharing achieves capacity and a generally better error exponent than the direct (ternary) coding approach. This result provides a theoretical justification of some heuristically proposed watermarking/data-hiding schemes, where the information is first encoded and then superimposed on the host signal according to some predetermined pattern (time-sharing). Since in several cases the Dirty-Paper and the Dirty-Tape capacities are not too far apart, such heuristic approaches can achieve a large fraction of the maximum achievable data-hiding rate.

References

[1] C. Shannon, "Channels with side information at the transmitter," IBM J. Res. & Dev., pp. 289–293, 1958.
[2] A. Kuznetsov and B. Tsybakov, "Coding in a memory with defective cells," Probl. Pered. Inform., vol. 10, no. 2, pp. 52–60, 1974.
[3] S. Gelfand and M. Pinsker, "Coding for channel with random parameters," Problems of Control and Information Theory, vol. 9, no. 1, pp. 19–31, January 1980.
[4] R. Zamir, S. Shamai, and U. Erez, "Nested linear/lattice codes for structured multiterminal binning," IEEE Trans. on Inform. Theory, vol. 48, no. 6, pp. 1250–1276, June 2002.
[5] M. Costa, "Writing on dirty paper," IEEE Trans. on Inform. Theory, vol. 29, no. 3, pp. 439–441, May 1983.
[6] S. Pradhan, J. Chou and K. Ramchandran, "Duality between source coding and channel coding with side information," UCB/ERL Technical Memorandum No. M01/34, UC Berkeley, Dec. 2001.
[7] S. Verdú, "On channel capacity per unit cost," IEEE Trans. on Inform. Theory, vol. 36, no. 5, pp. 1019–1030, September 1990.
[8] R. Gallager, Low-Density Parity-Check Codes, M.I.T. Press, Cambridge, MA, 1963.
[9] V. Blinovskii, "A lower bound on the number of words of a linear code in an arbitrary sphere with given radius in $F_2^n$," Probl. Pered. Inform. (Problems of Inform. Trans.), vol. 23, no. 2, pp. 50–53, 1987.
[10] R. Dobrushin, "Asymptotic optimality of group and systematic codes for some channels," Theor. Probab. Appl., vol. 8, pp. 52–66, 1963.
[11] T. Richardson and R. Urbanke, "The capacity of low-density parity check codes under message passing decoding," IEEE Trans. on Inform. Theory, vol. 47, no. 2, pp. 599–618, Feb. 2001.
[12] C. Berrou, A. Glavieux and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes," IEEE Intern. Conf. on Commun. ICC '93, pp. 1064–1070, Geneva, Switzerland, May 1993.
[13] Special issue on codes on graphs and iterative decoding, IEEE Trans. on Inform. Theory, vol. 47, no. 2, Feb. 2001.
[14] P. Frenger, P. Orten, T. Ottosson, and A. Svensson, "Multirate convolutional codes," Tech. Rep. 21, Dept. of Signals and Systems, Communication Systems Group, Chalmers University of Technology, Goteborg, Sweden, Apr. 1998.
[15] A. Bennatan and D. Burshtein, "On the application of LDPC codes to arbitrary discrete-memoryless channels," submitted for publication, IEEE Trans. on Inform. Theory. Also presented at the Int. Symp. Inf. Theory, Yokohama, Japan, 2003.
[16] A. Bennatan and D. Burshtein, "Iterative decoding of LDPC codes over arbitrary discrete-memoryless channels," 41st Annual Allerton Conference on Commun., Control and Computing, Monticello, IL, Oct. 1–3, 2003.
[17] J. Boutros and G. Caire, "Iterative multiuser joint decoding: unified framework and asymptotic analysis," IEEE Trans. on Inform. Theory, vol. 48, no. 7, July 2002.
[18] G. Caire, S. Guemghar, A. Roumy and S. Verdú, "Maximizing the spectral efficiency of coded CDMA," to appear in IEEE Trans. on Inform. Theory, 2004.
[19] G. Caire, A. Bennatan, D. Burshtein and S. Shamai, "Coding schemes for channels with known interference," in preparation, 2004.