Source and Channel Coding With Action-dependent Partially Known Two-sided State Information

© 2010 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

KITTIPONG KITTICHOKECHAI, TOBIAS J. OECHTERING, MIKAEL SKOGLUND, AND RAGNAR THOBABEN

Stockholm 2010
Communication Theory
School of Electrical Engineering
Kungliga Tekniska Högskolan
IR-EE-KT 2010:039

Source and Channel Coding With Action-dependent Partially Known Two-sided State Information
Kittipong Kittichokechai, Tobias J. Oechtering, Mikael Skoglund, and Ragnar Thobaben
School of Electrical Engineering and the ACCESS Linnaeus Center, Royal Institute of Technology (KTH), Stockholm, Sweden

Abstract—We consider a source coding problem where the encoder can take actions that influence the availability and/or quality of side information which is available partially and noncausally at the encoder and the decoder. We characterize the associated achievable tradeoffs between rate, distortion, and cost. In addition, we state and discuss a capacity result for the channel coding dual problem, for which a formula duality between special cases is identified.

Fig. 1. Rate distortion with action-dependent partial side information. (The encoder maps $X^n$ to indices $w \in [1, 2^{nR_1}]$ and $t \in [1, 2^{nR_2}]$; the action decoder maps $w$ to $A^n(w)$; the side information $S^n = (S_e^n, S_d^n)$ is generated via $P_{S_e,S_d|X,A}$ and split by the mappings $l_e$, $l_d$ into $S_e^n$ and $S_d^n$; the decoder outputs $\hat{X}^n$.)

I. INTRODUCTION

Source coding with side information and channel coding with state information have found many applications, e.g., in high-definition television, where the noisy analog version of the TV signal is the side information at the decoder, and in wireless communications, where the fading coefficient is the state information at the transmitter [1]. In [2], Wyner and Ziv considered source coding with side information at the receiver, while the problem of coding for channels with noncausal state information available at the encoder was solved by Gel'fand and Pinsker in [3]. In practice, the encoder and/or the decoder may not have full knowledge of the state information. Heegard and El Gamal in [4] studied the channel with rate-limited noncausal state information available at the encoder and/or the decoder. Further, Cover and Chiang provided a unifying framework for characterizing the channel capacity and the rate distortion function of systems with partial state information in [1].

Recently, Weissman has studied the channel with action-dependent states [5], and the source coding dual problem with action-dependent side information has been studied by Permuter and Weissman [6], where the system can take actions that affect the quality of the side information. One interesting potential application of the idea of having action-dependent state/side information is in networked control problems, where the control action can influence the communication strategy, i.e., the dual control problem. To gain more insight, we study this fundamental problem of source and channel coding with action-dependent partial state information known noncausally at the encoder and the decoder.

In Section II we extend the source coding problem in [6] to the more general case where the action-dependent partial side information sequences $S_e^n$ and $S_d^n$, possibly correlated, are available at the encoder and the decoder, respectively. In particular, the extended source coding system differs from that considered in [6] in that the action is taken based on an index received from the encoder over a rate-limited link. This might also model a sensor network scenario where the


encoder can take actions to acquire more side information at the encoder and the decoder, i.e., a noisy correlated version of other sensors' measurements. In Section III we present the capacity of the channel with partially known two-sided action-dependent state information, which is a modification of [5], and discuss the duality of special cases of the source and channel coding problems. We provide the proof of the source coding result in Section IV.

Notation: We denote discrete random variables, their corresponding realizations or deterministic values, and their alphabets by upper case, lower case, and calligraphic letters, respectively. The term $X_m^n$ denotes the sequence $\{X_m, \ldots, X_n\}$ when $m \le n$, and the empty set otherwise. We use the shorthand notation $X^n$ for $X_1^n$, and $X^{n \setminus i}$ denotes the set $\{X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n\}$.

II. THE SOURCE CODING PROBLEM

In this section we study source coding with action-dependent partial side information, and we characterize the rate distortion and cost function of the source coding problem depicted in Figure 1. At the encoder, given the partial side information $S_e^n$, the length-$n$ source sequence of the discrete memoryless source $X^n$ is compressed into a source description with a certain rate and then transmitted to the decoder. The decoder observes this description together with the partial side information $S_d^n$ and produces the reconstruction sequence $\hat{X}^n$ subject to a maximum average distortion. We consider here finite source and reconstruction alphabets $\mathcal{X}$ and $\hat{\mathcal{X}}$. Similarly to [6], the system can take actions that influence the availability or quality of the side information at some cost. However, the system model considered here is slightly different from that in [6] in that the concept of sequential processing has to be taken into consideration in

our system for generating the partial side information at the encoder.

In more detail, let $x^n$ be the source sequence, drawn independently according to $\prod_{i=1}^n P_X(x_i)$, and let $d^{(n)}(x^n, \hat{x}^n)$ and $\Lambda^{(n)}(a^n)$ be the distortion and cost functions, respectively. The encoder $f_1^{(n)}: \mathcal{X}^n \to \mathcal{W}^{(n)}$, $\mathcal{W}^{(n)} = \{1, 2, \ldots, 2^{nR_1}\}$,^1 generates an index $w = f_1^{(n)}(x^n)$ and sends it to the action decoder and the decoder. Given the index $w$, the action decoder $g_a^{(n)}: \mathcal{W}^{(n)} \to \mathcal{A}^n$ selects the action sequence $a^n = g_a^{(n)}(w)$. Then, with the input $(x^n, a^n)$, whose current symbols do not depend on the previous channel output, the side information $(s_e^n, s_d^n) \in \mathcal{S}_e^n \times \mathcal{S}_d^n$ is generated as the output of the memoryless channel with transition probability $P_{S_e^n, S_d^n | X^n, A^n}(s_e^n, s_d^n | x^n, a^n) = \prod_{i=1}^n P_{S_e, S_d | X, A}(s_{e,i}, s_{d,i} | x_i, a_i)$. The side information is then mapped to the partial side information for the encoder and the decoder by the mappings $l_e^{(n)}(s_e^n, s_d^n) = s_e^n$ and $l_d^{(n)}(s_e^n, s_d^n) = s_d^n$. Next, the encoder $f_2^{(n)}: \mathcal{X}^n \times \mathcal{S}_e^n \to \mathcal{T}^{(n)}$, $\mathcal{T}^{(n)} = \{1, 2, \ldots, 2^{nR_2}\}$, generates another index $t = f_2^{(n)}(x^n, s_e^n)$ and sends it to the decoder. Given the indices $t$ and $w$ and the side information $s_d^n$, the decoder $g^{(n)}: \mathcal{T}^{(n)} \times \mathcal{W}^{(n)} \times \mathcal{S}_d^n \to \hat{\mathcal{X}}^n$ reconstructs the source sequence as $\hat{x}^n = g^{(n)}(t, w, s_d^n)$. The total rate of transmitting indices from the encoder to the decoder is the sum rate $R = R_1 + R_2$.

^1 For simplicity, we assume that $2^{nR_1}$ is an integer; similar assumptions apply in the remainder of the paper.

Definition 1: A rate distortion cost triple $(R, D, C)$ is said to be achievable if for any $\delta > 0$ and all sufficiently large $n$ there exists an $(n, 2^{nR})$ rate distortion cost code $(f_1^{(n)}, f_2^{(n)}, g_a^{(n)}, g^{(n)})$ such that $E[d^{(n)}(X^n, \hat{X}^n)] \le D + \delta$ and $E[\Lambda^{(n)}(A^n)] \le C + \delta$. The rate distortion and cost function $R(D, C)$ is the infimum of the achievable rates with distortion $D$ and cost $C$.

Theorem 2: For a bounded single-letter distortion $d(x, \hat{x})$ and cost measure $\Lambda(a)$, the rate distortion and cost function for the source with action-dependent partial side information available at the encoder and the decoder is given by

$$R(D, C) = \min\,[I(X; A) + I(U; X, S_e | A) - I(U; S_d | A)], \quad (1)$$

where the joint distribution of $(X, A, S_e, S_d, U)$ is of the form $P_X(x) P_{A|X}(a|x) P_{S_e, S_d | X, A}(s_e, s_d | x, a) P_{U | X, S_e, A}(u | x, s_e, a)$ and the minimization is over all $P_{A|X}$, $P_{U|X,S_e,A}$, and $\tilde{g}: \mathcal{U} \times \mathcal{S}_d \to \hat{\mathcal{X}}$ under the distortion and cost constraints

$$E[d(X, \tilde{g}(U, S_d))] \le D, \qquad E[\Lambda(A)] \le C.$$

Here $U$ is an auxiliary random variable with $|\mathcal{U}| \le |\mathcal{A}||\mathcal{X}| + 3$.

Proof: The proof is provided in Section IV.

We can also express $R(D, C)$ in (1) as
$$\begin{aligned}
R(D, C) &= \min\,[I(X; A) + H(U|A) - H(U|A, X, S_e) - H(U|A) + H(U|A, S_d)] \\
&\overset{(*)}{=} \min\,[I(X; A) - H(U|A, X, S_e, S_d) + H(U|A, S_d)] \\
&= \min\,[I(X; A) + I(U; X, S_e | A, S_d)], \qquad (2)
\end{aligned}$$
where $(*)$ follows from the Markov chain $U - (X, A, S_e) - S_d$ and the minimization is over the same distributions as in (1).
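As a concrete numerical companion to (1), the following Python sketch evaluates the objective $I(X;A) + I(U; X, S_e | A) - I(U; S_d | A)$ for one fixed joint pmf with binary alphabets. The distributions and helper names (`entropy`, `H`) are illustrative placeholders of ours, not values from the paper; the true $R(D, C)$ is the minimum of this quantity over $P_{A|X}$, $P_{U|X,S_e,A}$, and $\tilde{g}$ subject to the distortion and cost constraints.

```python
# Sketch: evaluate the objective of (1) for one fixed joint pmf.
import itertools
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a pmf given as a flat array."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

PX = np.array([0.5, 0.5])
PA_X = np.array([[0.9, 0.1],
                 [0.2, 0.8]])           # P_{A|X}(a|x), rows indexed by x
PS_XA = np.full((2, 2, 2, 2), 0.25)     # P_{Se,Sd|X,A}, uniform placeholder
PU_XSeA = np.full((2, 2, 2, 2), 0.5)    # P_{U|X,Se,A}, uniform placeholder

# Joint pmf over (X, A, Se, Sd, U), axis order (x, a, se, sd, u).
joint = np.zeros((2, 2, 2, 2, 2))
for x, a, se, sd, u in itertools.product(range(2), repeat=5):
    joint[x, a, se, sd, u] = (PX[x] * PA_X[x, a] * PS_XA[x, a, se, sd]
                              * PU_XSeA[x, se, a, u])

X_, A_, SE, SD, U_ = range(5)

def H(axes):
    """Joint entropy of the variables sitting on the given axes of `joint`."""
    drop = tuple(i for i in range(5) if i not in axes)
    return entropy(joint.sum(axis=drop).ravel())

I_X_A = H((X_,)) + H((A_,)) - H((X_, A_))
# I(U; X,Se | A) = H(U|A) - H(U|A,X,Se); likewise for I(U; Sd | A).
I_U_XSe_A = (H((U_, A_)) - H((A_,))) - (H((U_, A_, X_, SE)) - H((A_, X_, SE)))
I_U_Sd_A = (H((U_, A_)) - H((A_,))) - (H((U_, A_, SD)) - H((A_, SD)))
print("objective =", I_X_A + I_U_XSe_A - I_U_Sd_A)
```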

Lemma 3: The rate distortion and cost function $R(D, C)$ given in (1) is a non-increasing convex function of $D$ and $C$.

Proof: Since the domain of minimization in (1) or (2) increases with $D$ and $C$, $R(D, C)$ is non-increasing in $D$ and $C$. For convexity, we consider two distinct points $(R_i, D_i, C_i)$, $i = 1, 2$, which lie on the boundary of $R(D, C)$. Suppose $(P_{A_i|X}, P_{U_i|X,S_e,A_i}, \tilde{g}_i(U_i, S_d))$, $i = 1, 2$, achieve these respective points, i.e., $R_i = R(D_i, C_i) = I(X; A_i) + I(U_i; X, S_e | A_i, S_d)$, $i = 1, 2$. Let $Q \in \{1, 2\}$ be a random variable independent of $X$ with $P_Q(1) = 1 - P_Q(2) = \lambda$, where $0 \le \lambda \le 1$. Define $A = (Q, \tilde{A})$ and $U = (Q, \tilde{U})$, and consider the distribution
$$P_{A|X}(a|x) P_{U|X,S_e,A}(u|x,s_e,a) = P_{Q,\tilde{A}|X}(q, \tilde{a}|x) P_{\tilde{U}|X,S_e,Q,\tilde{A}}(\tilde{u}|x,s_e,q,\tilde{a}) = P_Q(q) P_{A_q|X}(a|x) P_{U_q|X,S_e,A_q}(u|x,s_e,a), \quad \forall\, q, a, x, u, s_e,$$
i.e., $\tilde{A} = A_q$ and $\tilde{U} = U_q$ if $Q = q$, for $q = 1, 2$. Consider also the function $\tilde{g}(U, S_d) = \tilde{g}(Q, \tilde{U}, S_d) = \tilde{g}_Q(U_Q, S_d)$ and $\Lambda(A) = \Lambda(Q, \tilde{A}) = \Lambda(A_Q)$.

Thus, we have
$$D = E[d(X, \tilde{g}(U, S_d))] = \lambda E[d(X, \tilde{g}_1(U_1, S_d))] + (1 - \lambda) E[d(X, \tilde{g}_2(U_2, S_d))] = \lambda D_1 + (1 - \lambda) D_2$$
and
$$C = E[\Lambda(A)] = \lambda E[\Lambda(A_1)] + (1 - \lambda) E[\Lambda(A_2)] = \lambda C_1 + (1 - \lambda) C_2.$$
Then it follows that
$$\begin{aligned}
R(\lambda D_1 + (1-\lambda) D_2,\, \lambda C_1 + (1-\lambda) C_2) &= R(D, C) \\
&\le I(X; A) + I(U; X, S_e | A, S_d) \\
&= I(X; Q) + I(X; \tilde{A} | Q) + I(\tilde{U}; X, S_e | Q, \tilde{A}, S_d) \\
&= \lambda I(X; A_1) + (1 - \lambda) I(X; A_2) + \lambda I(U_1; X, S_e | A_1, S_d) + (1 - \lambda) I(U_2; X, S_e | A_2, S_d) \\
&= \lambda R(D_1, C_1) + (1 - \lambda) R(D_2, C_2),
\end{aligned}$$
where $I(X; Q) = 0$ follows from the independence of $Q$ and $X$. ∎

III. THE CHANNEL CODING PROBLEM

In this section we state the capacity of the channel with action-dependent partial state information depicted in Figure 2. Let $n$ denote the block length. Given that the message $m$ is selected uniformly at random from the set of messages $\mathcal{M}^{(n)} = \{1, 2, \ldots, 2^{nR}\}$, the action encoder $f_a^{(n)}: \mathcal{M}^{(n)} \to \mathcal{A}^n$ selects the action sequence $a^n = f_a^{(n)}(m)$. Given $a^n(m)$, the partial state sequences at the encoder and the decoder, $(s_e^n, s_d^n) \in \mathcal{S}_e^n \times \mathcal{S}_d^n$, are generated as the output of the memoryless channel with transition probability $P_{S_e^n, S_d^n | A^n}(s_e^n, s_d^n | a^n) = \prod_{i=1}^n P_{S_e, S_d | A}(s_{e,i}, s_{d,i} | a_i)$.

Fig. 2. Channel with action-dependent partial state information. (The action encoder maps $M$ to $A^n(M)$; the states $S^n = (S_e^n, S_d^n)$ are generated via $P_{S_e,S_d|A}$ and split by $l_e$, $l_d$; the channel encoder observes $S_e^n$ and produces $X^n$; the channel $P_{Y^n|X^n,S^n}$ outputs $Y^n$; the decoder observes $Y^n$ and $S_d^n$ and produces $\hat{M}$.)

Given $(m, s_e^n)$, the channel encoder $f^{(n)}: \mathcal{M}^{(n)} \times \mathcal{S}_e^n \to \mathcal{X}^n$ generates the channel input sequence $x^n = f^{(n)}(m, s_e^n)$. The channel is modelled as a memoryless channel with transition probability $P_{Y^n | X^n, S_e^n, S_d^n}(y^n | x^n, s_e^n, s_d^n) = \prod_{i=1}^n P_{Y | X, S_e, S_d}(y_i | x_i, s_{e,i}, s_{d,i})$. Given the channel output $y^n$ and the state information $s_d^n$, the decoder $g^{(n)}: \mathcal{Y}^n \times \mathcal{S}_d^n \to \mathcal{M}^{(n)}$ generates the decoded message $\hat{m} = g^{(n)}(y^n, s_d^n)$.

Definition 4: A rate $R$ is said to be achievable if for any $\delta > 0$ and all sufficiently large $n$ there exists an $(|\mathcal{M}^{(n)}|, n)$ code $(f_a^{(n)}, f^{(n)}, g^{(n)})$ with $\frac{1}{n} \log |\mathcal{M}^{(n)}| \ge R - \delta$ and $P_e^{(n)} \le \delta$, where $P_e^{(n)}$ is the average probability of the error event $\hat{M} \ne M$. The channel capacity is the supremum of all achievable rates.

Theorem 5: The capacity of the channel with action-dependent partial states available noncausally to the encoder and the decoder is given by
$$C = \max\,[I(A, U; Y, S_d) - I(U; S_e | A)], \quad (3)$$
where the joint distribution of $(A, S_e, S_d, U, X, Y)$ is of the form $P_A(a) P_{S_e, S_d | A}(s_e, s_d | a) P_{U | A, S_e}(u | a, s_e) \mathbf{1}_{\{X = \tilde{f}(U, S_e)\}} \cdot P_{Y | X, S_e, S_d}(y | x, s_e, s_d)$, the maximization is over all $P_A$, $P_{U|A,S_e}$, and $\tilde{f}: \mathcal{U} \times \mathcal{S}_e \to \mathcal{X}$, and $U$ is an auxiliary random variable with $|\mathcal{U}| \le |\mathcal{A}||\mathcal{S}_e||\mathcal{X}| + 1$.

Proof: The proof follows from [5] with the modifications that the state is $S^n = (S_e^n, S_d^n)$, the pair $(Y^n, S_d^n)$ is considered as the new channel output, and the set of distributions is restricted to satisfy the Markov relations $U - (A, S_e) - S_d$ and $X - (U, S_e) - (A, S_d)$. ∎

Remark 6: Since $I(A, U; Y, S_d) - I(U; S_e | A)$ is convex in $P_{X|U,S_e}(\cdot)$ when all other distributions are fixed, the maximum in (3) cannot be increased by a stochastic encoder.

Duality: In the following, we discuss the duality of source and channel coding with partial state information in the sense that the roles of the elements in the two systems are complementary, as discussed in [1]. As special cases of the results in (1) and (3), the rate distortion and cost function of the source with action-dependent partially known side information at the decoder ($S_e$ is degenerate, i.e., $|\mathcal{S}_e| = 1$) is given by
$$R^*(D, C) = \min\,[I(X; A) + I(U; X | A) - I(U; S_d | A)] = \min\,[I(A, U; X) - I(U; S_d | A)], \quad (4)$$
where the minimization is over $P_{U,A|X}$ and $\tilde{g}: \mathcal{U} \times \mathcal{S}_d \to \hat{\mathcal{X}}$, and the capacity of the channel with action-dependent partially known state information only at the encoder (the state information $S_d$ is degenerate, i.e., $|\mathcal{S}_d| = 1$) is given by
$$C^* = \max_{P_A,\, P_{U|A,S_e},\, \tilde{f}: \mathcal{U} \times \mathcal{S}_e \to \mathcal{X}}\,[I(A, U; Y) - I(U; S_e | A)]. \quad (5)$$
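Mirroring the earlier sketch for (1), the following Python snippet evaluates the capacity objective of (3) for one joint pmf of the factorized form above. All pmfs, the toy channel, and the encoder $f$ are illustrative placeholders of ours, not constructions from the paper; the true capacity is the maximum of this quantity over $P_A$, $P_{U|A,S_e}$, and $\tilde{f}$.

```python
# Sketch: evaluate the objective of (3) for one fixed joint pmf.
import itertools
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

PA = np.array([0.5, 0.5])
PS_A = np.full((2, 2, 2), 0.25)         # P_{Se,Sd|A}, uniform placeholder
PU_ASe = np.full((2, 2, 2), 0.5)        # P_{U|A,Se}, uniform placeholder

def f(u, se):                           # deterministic encoder X = f(U, Se)
    return u ^ se

PY_XSS = np.empty((2, 2, 2, 2))         # P_{Y|X,Se,Sd}: a toy BSC(0.1) in x
PY_XSS[...] = 0.1
for x in range(2):
    PY_XSS[x, :, :, x] = 0.9

# Joint pmf over (A, Se, Sd, U, X, Y); X is determined by (U, Se).
joint = np.zeros((2,) * 6)
for a, se, sd, u, y in itertools.product(range(2), repeat=5):
    x = f(u, se)
    joint[a, se, sd, u, x, y] += (PA[a] * PS_A[a, se, sd]
                                  * PU_ASe[a, se, u] * PY_XSS[x, se, sd, y])

A_, SE, SD, U_, X_, Y_ = range(6)

def H(axes):
    drop = tuple(i for i in range(6) if i not in axes)
    return entropy(joint.sum(axis=drop).ravel())

I_AU_YSd = H((A_, U_)) + H((Y_, SD)) - H((A_, U_, Y_, SD))
# I(U; Se | A) = H(U|A) + H(Se|A) - H(U,Se|A).
I_U_Se_A = (H((U_, A_)) - H((A_,))) + (H((SE, A_)) - H((A_,))) \
           - (H((U_, SE, A_)) - H((A_,)))
print("objective =", I_AU_YSd - I_U_Se_A)
```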

From (4) and (5), we can recognize the duality and the equivalent relationship between the rate distortion and cost function and the channel capacity via the following transformation:

$R^*(D, C) \leftrightarrow C^*$
minimization $\leftrightarrow$ maximization
$X$ (source symbol) $\leftrightarrow$ $Y$ (received symbol)
$\hat{X}$ (decoded symbol) $\leftrightarrow$ $X$ (transmitted symbol)
$S_d$ (state at the decoder) $\leftrightarrow$ $S_e$ (state at the encoder)
$U$ (auxiliary) $\leftrightarrow$ $U$ (auxiliary)
$A$ (action) $\leftrightarrow$ $A$ (action).

IV. PROOF OF THEOREM 2

We prove Theorem 2 in two parts, namely achievability and converse.

A. Achievability

We use the definitions and properties of $\epsilon$-robust typicality as in [7, Appendix]; i.e., for $\epsilon > 0$, the set of $\epsilon$-robustly typical sequences with respect to $P_X(\cdot)$ is denoted by
$$T_\epsilon^{(n)}(X) = \left\{ x^n \in \mathcal{X}^n : \left| \tfrac{1}{n} N(x|x^n) - P_X(x) \right| \le \epsilon P_X(x) \text{ for all } x \in \mathcal{X} \right\}, \quad (6)$$

where $N(x|x^n)$ is the number of occurrences of $x$ in the sequence $x^n$.

We show that any $R > R(D, C)$ is achievable. Fix $P_{A|X}$, $P_{U|X,S_e,A}$, and the function $\tilde{g}: \mathcal{U} \times \mathcal{S}_d \to \hat{\mathcal{X}}$.

Codebook Generation: Let $\mathcal{W}^{(n)} = \{1, 2, \ldots, 2^{nR_1}\}$, $\mathcal{T}^{(n)} = \{1, 2, \ldots, 2^{nR_2}\}$, and $\mathcal{V}^{(n)} = \{1, 2, \ldots, 2^{nR'}\}$. For each $w \in \mathcal{W}^{(n)}$ we generate an $n$-tuple $a^n(w)$ i.i.d. according to $\prod_{i=1}^n P_A(a_i)$. Further, for each $w \in \mathcal{W}^{(n)}$ we generate $2^{n(R_2 + R')}$ codewords $\{u^n(t, v, w)\}_{t \in \mathcal{T}^{(n)}, v \in \mathcal{V}^{(n)}}$ i.i.d. according to $\prod_{i=1}^n P_{U|A}(u_i | a_i(w))$. We reveal the codebooks to the encoder, the action decoder, and the decoder. Consider $0 < \epsilon_0 < \epsilon_1 < \epsilon_2 < \epsilon_3 < \epsilon \le 1$.

Encoding: Given $x^n$, the encoder first looks for the smallest $w$ such that $(x^n, a^n(w)) \in T_{\epsilon_1}^{(n)}(X, A)$ and transmits it to the decoder and the action decoder to generate the side information $(s_e^n, s_d^n)$. If no such $w$ exists, the encoder transmits $w = 1$. Then, given $x^n$, $s_e^n$, and $w$, the encoder in the second stage looks for a pair $(t, v)$ with the smallest $t$ and $v$ such that $(x^n, u^n(t, v, w), a^n(w), s_e^n) \in T_{\epsilon_3}^{(n)}(X, U, A, S_e)$. If such a pair exists, the corresponding index $t$ is sent to the decoder. Otherwise, the encoder sends $t = 1$.

Decoding: Given the indices $t$, $w$, and the side information $s_d^n$, the decoder looks for the unique $\tilde{v} \in \mathcal{V}^{(n)}$ such that $(s_d^n, u^n(t, \tilde{v}, w), a^n(w)) \in T_\epsilon^{(n)}(S_d, U, A)$. If such a $\tilde{v}$ exists, the decoder reconstructs $\hat{x}^n$ with $\hat{x}_i = \tilde{g}(u_i(t, \tilde{v}, w), s_{d,i})$. Otherwise, the decoder puts out $\hat{x}^n$ with $\hat{x}_i = \tilde{g}(u_i(t, 1, w), s_{d,i})$.
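To make the typicality test in (6) and the first encoding stage concrete, here is a minimal Python sketch. The alphabets, pmfs, blocklength, rate, and slack $\epsilon_1$ are toy placeholders of ours, not parameters from the paper.

```python
# Sketch: the robust-typicality test (6) and the first-stage codeword search.
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)

def robustly_typical(seq, pmf, eps):
    """Test (6): |N(x|x^n)/n - P(x)| <= eps * P(x) for every symbol x."""
    n = len(seq)
    counts = Counter(seq)
    return all(abs(counts.get(x, 0) / n - p) <= eps * p for x, p in pmf.items())

# Toy first encoding stage: scan w = 1, 2, ... for the smallest w whose
# action codeword a^n(w) is jointly robustly typical with x^n; fall back
# to w = 1 if the search fails, as in the scheme above.
n, R1, eps1 = 24, 0.4, 0.4                              # placeholder parameters
PXA = {(x, a): 0.25 for x in (0, 1) for a in (0, 1)}    # placeholder joint pmf
xn = rng.integers(0, 2, n).tolist()
codebook = {w: rng.integers(0, 2, n).tolist()           # a^n(w) i.i.d. ~ P_A
            for w in range(1, 2 ** int(n * R1) + 1)}

w_hat = next((w for w, an in codebook.items()
              if robustly_typical(list(zip(xn, an)), PXA, eps1)), 1)
print("chosen action index w =", w_hat)
```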

Analysis of average distortion: Let $(\bar{t}, \bar{v}, \bar{w})$ denote the indices of the sequences $a^n$ and $u^n$ chosen at the encoder. We define the "error" events as follows:
$$\begin{aligned}
E_1 &= \big\{ X^n \notin T_{\epsilon_0}^{(n)}(X) \big\} \\
E_2 &= \big\{ (X^n, A^n(w)) \notin T_{\epsilon_1}^{(n)}(X, A) \text{ for all } w \in \mathcal{W}^{(n)} \big\} \\
E_3 &= \big\{ (X^n, A^n(\bar{w}), S_e^n, S_d^n) \notin T_{\epsilon_2}^{(n)}(X, A, S_e, S_d) \big\} \\
E_4 &= \big\{ (X^n, U^n(t, v, \bar{w}), A^n(\bar{w}), S_e^n) \notin T_{\epsilon_3}^{(n)}(X, U, A, S_e) \text{ for all } (t, v) \in \mathcal{T}^{(n)} \times \mathcal{V}^{(n)} \big\} \\
E_5 &= \big\{ (X^n, S_d^n, U^n(\bar{t}, \bar{v}, \bar{w}), A^n(\bar{w})) \notin T_\epsilon^{(n)}(X, S_d, U, A) \big\} \\
E_6 &= \big\{ (S_d^n, U^n(\bar{t}, \tilde{v}, \bar{w}), A^n(\bar{w})) \in T_\epsilon^{(n)}(S_d, U, A) \text{ for some } \tilde{v} \in \mathcal{V}^{(n)},\ \tilde{v} \ne \bar{v} \big\}.
\end{aligned}$$

The total "error" probability that the encoding and decoding processes do not lead to a decoded codeword $u^n$ that is jointly typical with $s_d^n$ and $x^n$ is bounded by
$$\Pr(E) \le \Pr(E_1) + \Pr(E_2 \cap E_1^c) + \Pr(E_3 \cap E_2^c) + \Pr(E_4 \cap E_3^c) + \Pr(E_5 \cap E_4^c) + \Pr(E_6 \cap E_5^c),$$
where $E_i^c$ denotes the complement of the event $E_i$.

1) By [7, Lemma 17], $\Pr(X^n \in T_{\epsilon_0}^{(n)}(X)) \ge 1 - \delta_{\epsilon_0}(n)$. Since $\delta_{\epsilon_0}(n)$ can be made arbitrarily small with increasing $n$ if $\epsilon_0 > 0$, we have $\Pr(E_1) \to 0$ as $n \to \infty$.

2) Consider the events $E_2$ and $E_1^c$. We have
$$\begin{aligned}
\Pr(E_2 \cap E_1^c) &= \sum_{x^n \in T_{\epsilon_0}^{(n)}(X)} p(x^n) \cdot \Pr\Big( \bigcap_{w \in \mathcal{W}^{(n)}} \big\{ (x^n, A^n(w)) \notin T_{\epsilon_1}^{(n)}(X, A) \big\} \Big) \\
&= \sum_{x^n \in T_{\epsilon_0}^{(n)}(X)} p(x^n) \cdot \Big[ 1 - \Pr\big( A^n(w) \in T_{\epsilon_1}^{(n)}(A | x^n) \big) \Big]^{2^{nR_1}} \\
&\overset{(a)}{\le} \sum_{x^n} p(x^n) \cdot \Big[ 1 - \big(1 - \delta_{\epsilon_0, \epsilon_1}(n)\big) 2^{-n[I(X;A) + \delta_{\epsilon_1}]} \Big]^{2^{nR_1}} \\
&\overset{(b)}{\le} \delta_{\epsilon_0, \epsilon_1}(n) + \exp\big( -2^{n[R_1 - I(X;A) - \delta_{\epsilon_1}]} \big),
\end{aligned}$$
where $(a)$ follows by using [7, Lemma 25] and $(b)$ follows from $(1 - xy)^m \le 1 - x + \exp(-ym)$ for $0 \le x, y \le 1$, $m > 0$ [8, Lemma 10.5.3]. Since $\delta_{\epsilon_0, \epsilon_1}(n) \to 0$ as $n \to \infty$, we have $\Pr(E_2 \cap E_1^c) \to 0$ as $n \to \infty$ if $R_1 > I(X;A) + \delta_{\epsilon_1}$.

3) Consider the event $E_2^c$, in which there exists an index $\bar{w}$ such that $(x^n, a^n(\bar{w})) \in T_{\epsilon_1}^{(n)}(X, A)$. Since $(S_e^n, S_d^n)$ is i.i.d. according to $\prod_{i=1}^n P_{S_e, S_d | X, A}(s_{e,i}, s_{d,i} | x_i, a_i(\bar{w}))$, by using [7, Lemma 22] we have $\Pr\big( (x^n, a^n(\bar{w}), S_e^n, S_d^n) \in T_{\epsilon_2}^{(n)}(X, A, S_e, S_d) \mid X^n = x^n, A^n = a^n(\bar{w}) \big) \to 1$ as $n \to \infty$, and thus $\Pr(E_3 \cap E_2^c) \to 0$ as $n \to \infty$.

4) Consider the event that the encoder cannot find a pair $(t, v)$ such that $(X^n, U^n(t, v, \bar{w}), A^n(\bar{w}), S_e^n) \in T_{\epsilon_3}^{(n)}(X, U, A, S_e)$. Since $E_3^c \subseteq \{ (X^n, A^n(\bar{w}), S_e^n) \in T_{\epsilon_2}^{(n)}(X, A, S_e) \}$, we have
$$\begin{aligned}
\Pr(E_4 \cap E_3^c) &\le \sum_{(x^n, a^n, s_e^n) \in T_{\epsilon_2}^{(n)}(X, A, S_e)} p(x^n, a^n, s_e^n) \cdot \Pr\Big( \bigcap_{t \in \mathcal{T}^{(n)}, v \in \mathcal{V}^{(n)}} \big\{ (x^n, U^n(t, v, \bar{w}), a^n(\bar{w}), s_e^n) \notin T_{\epsilon_3}^{(n)}(X, U, A, S_e) \big\} \Big) \\
&\le \Big[ 1 - \big(1 - \delta_{\epsilon_2, \epsilon_3}(n)\big) 2^{-n[I(U; X, S_e | A) + \delta_{\epsilon_3}]} \Big]^{2^{n(R_2 + R')}} \\
&\overset{(\star)}{\le} \delta_{\epsilon_2, \epsilon_3}(n) + \exp\big( -2^{n[R_2 + R' - I(U; X, S_e | A) - \delta_{\epsilon_3}]} \big),
\end{aligned}$$
where $(\star)$ follows by using [7, Lemmas 20 and 24], where $U^n$ is i.i.d. according to $\prod_{i=1}^n P_{U|A}(u_i | a_i(\bar{w}))$. That is, $\Pr(E_4 \cap E_3^c) \to 0$ as $n \to \infty$ if $R_2 + R' > I(U; X, S_e | A) + \delta_{\epsilon_3}$.

5) Consider the event $E_4^c$, in which there exists $(\bar{t}, \bar{v})$ such that $(x^n, u^n(\bar{t}, \bar{v}, \bar{w}), a^n(\bar{w}), s_e^n) \in T_{\epsilon_3}^{(n)}(X, U, A, S_e)$. We have the Markov chain $U - (X, A, S_e) - S_d$, and $S_d^n$ is i.i.d. according to $\prod_{i=1}^n P_{S_d | X, A, S_e}(s_{d,i} | x_i, a_i(\bar{w}), s_{e,i})$ from the memoryless property. By using the Markov lemma [7, Lemma 23], we have $\Pr\big( (x^n, u^n(\bar{t}, \bar{v}, \bar{w}), a^n(\bar{w}), s_e^n, S_d^n) \in T_\epsilon^{(n)}(X, U, A, S_e, S_d) \big) \to 1$ as $n \to \infty$. This implies that $\Pr(E_5 \cap E_4^c) \to 0$ as $n \to \infty$.

6) Since the event $E_5^c \subseteq \{ (S_d^n, A^n(\bar{w})) \in T_\epsilon^{(n)}(S_d, A) \}$, it then follows that
$$\Pr(E_6 \cap E_5^c) \le \sum_{(s_d^n, a^n) \in T_\epsilon^{(n)}(S_d, A)} p(s_d^n, a^n) \cdot \Pr\Big( \bigcup_{\tilde{v} \in \mathcal{V}^{(n)}, \tilde{v} \ne \bar{v}} \big\{ (s_d^n, U^n(\bar{t}, \tilde{v}, \bar{w}), a^n) \in T_\epsilon^{(n)}(S_d, U, A) \big\} \Big) \le 2^{nR'} \cdot 2^{-n[I(U; S_d | A) - \delta_\epsilon]},$$
so $\Pr(E_6 \cap E_5^c) \to 0$ as $n \to \infty$ if $R' < I(U; S_d | A) - \delta_\epsilon$. Combining the above rate conditions and eliminating $R'$, the sum rate $R = R_1 + R_2$ can be chosen arbitrarily close to $I(X; A) + I(U; X, S_e | A) - I(U; S_d | A)$.
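As a numerical aside on the bounds in steps 2) and 4): once the rate exceeds the relevant mutual information term plus the typicality slack, the residual term $\exp(-2^{n[R_1 - I(X;A) - \delta_{\epsilon_1}]})$ vanishes doubly exponentially in $n$. A tiny sketch with illustrative constants (not values from the paper):

```python
# Sketch: double-exponential decay of the covering-failure bound in step 2.
import math

I_XA, R1, delta = 0.30, 0.35, 0.01      # placeholder values, R1 > I_XA + delta
for n in (20, 40, 80, 160):
    bound = math.exp(-2 ** (n * (R1 - I_XA - delta)))
    print(f"n = {n:3d}: exp(-2^(n*0.04)) = {bound:.3e}")
```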




Finally, we consider the case when there is no "error" in the encoding and decoding processes. In this case we have $(x^n, u^n(\bar{t}, \bar{v}, \bar{w}), s_d^n) \in T_\epsilon^{(n)}(X, U, S_d)$ and $\tilde{v} = \bar{v}$. The distortion is then bounded by
$$d^{(n)}(x^n, \hat{x}^n) = \frac{1}{n} \sum_{i=1}^n d(x_i, \tilde{g}(u_i, s_{d,i})) = \frac{1}{n} \sum_{a,b,c} N(a, b, c | x^n, u^n, s_d^n)\, d(a, \tilde{g}(b, c)) \le \sum_{a,b,c} P_{X,U,S_d}(a, b, c)(1 + \epsilon)\, d(a, \tilde{g}(b, c)) = E[d(X, \tilde{g}(U, S_d))](1 + \epsilon),$$
where the inequality follows from the definition in (6). Now we consider the average distortion, which can be bounded as
$$E[d^{(n)}(X^n, \hat{X}^n)] \le \Pr(E) \cdot d_{\max} + (1 - \Pr(E)) \cdot E[d(X, \tilde{g}(U, S_d))](1 + \epsilon),$$
where $d_{\max}$ is assumed to be the maximal distortion incurred by the "error" events. Since $\epsilon$ can be made arbitrarily small with increasing $n$, this shows that for any $\delta > 0$ and all sufficiently large $n$, if $R > I(X; A) + I(U; X, S_e | A) - I(U; S_d | A)$ and $E[d(X, \tilde{g}(U, S_d))] \le D$, we can achieve $E[d^{(n)}(X^n, \hat{X}^n)] \le D + \delta$. Similarly, by considering that $a^n$ is typical and $E[\Lambda(A)] \le C$, for any $\delta > 0$ and all sufficiently large $n$ we have the bound on the cost constraint $E[\Lambda^{(n)}(A^n)] \le C + \delta$. Thus, any $(R, D, C)$ with $R > R(D, C)$ is achievable. ∎
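The achievability argument pins the operational distortion and cost to the single-letter expectations $E[d(X, \tilde{g}(U, S_d))]$ and $E[\Lambda(A)]$. The sketch below estimates these two expectations by Monte Carlo for a toy binary model; every distribution and mapping in it is an illustrative placeholder of ours, not a construction from the paper.

```python
# Sketch: Monte Carlo estimates of the single-letter distortion and cost.
import numpy as np

rng = np.random.default_rng(1)
m = 100_000                                   # number of single-letter samples

# Placeholder single-letter model: binary everything, Hamming distortion,
# and unit cost for taking action a = 1.
X = rng.integers(0, 2, m)
A = X ^ (rng.random(m) < 0.3)                 # toy action channel P_{A|X}
Sd = X ^ (rng.random(m) < 0.2)                # toy side information channel
U = X ^ (rng.random(m) < 0.1)                 # toy description P_{U|X,Se,A}
g = lambda u, sd: u                           # reconstruction ignores S_d here

print("E[d(X, g(U,Sd))] ≈", np.mean(X != g(U, Sd)))   # ≈ 0.1
print("E[Λ(A)]          ≈", np.mean(A))               # ≈ 0.5
```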

B. Converse

We will show that for any achievable rate distortion cost triple $(R, D, C)$ we have $R \ge R(D, C)$. Let $(R, D, C)$ be any achievable rate distortion cost triple. Then, for any $\delta > 0$ and all sufficiently large $n$, there exists an $(n, 2^{nR})$ code such that $E[d^{(n)}(X^n, \hat{X}^n)] = E[\frac{1}{n} \sum_{i=1}^n d(X_i, \hat{X}_i)] \le D + \delta$ and $E[\frac{1}{n} \sum_{i=1}^n \Lambda(A_i)] \le C + \delta$. Let $(T, W) \in \{1, 2, \ldots, 2^{nR_2}\} \times \{1, 2, \ldots, 2^{nR_1}\}$ denote the encoded version of $X^n$, where $R = R_1 + R_2$ is the sum rate. Then the standard properties of the entropy function give
$$\begin{aligned}
nR \ge H(T, W) &\overset{(*)}{=} H(T, W, A^n) = H(A^n) + H(T, W | A^n) \\
&\ge [H(A^n) - H(A^n | X^n, S_e^n)] + [H(T, W | A^n, S_d^n) - H(T, W | A^n, X^n, S_e^n, S_d^n)] \\
&= \underbrace{H(X^n, S_e^n) - H(X^n, S_e^n | A^n) + H(X^n, S_e^n | A^n, S_d^n)}_{=P} \underbrace{- H(X^n, S_e^n | A^n, S_d^n, T, W)}_{=Q}, \quad (7)
\end{aligned}$$
where in $(*)$ we used the fact that $A^n = g_a^{(n)}(W)$ and $g_a^{(n)}(\cdot)$ is a deterministic function. Further,
$$\begin{aligned}
P &= H(X^n) + H(S_e^n | X^n) + H(S_e^n, S_d^n | X^n, A^n) - H(S_e^n | X^n, A^n) - H(S_d^n | A^n) \\
&\ge H(X^n) + H(S_e^n, S_d^n | X^n, A^n) - H(S_d^n | A^n) \\
&\overset{(\star)}{\ge} \sum_{i=1}^n H(X_i) + H(S_{e,i}, S_{d,i} | X_i, A_i) - H(S_{d,i} | A_i) \\
&= \sum_{i=1}^n H(X_i) + H(S_{e,i} | X_i, A_i) - H(X_i | A_i) - H(S_{e,i} | X_i, A_i) + H(X_i, S_{e,i} | A_i, S_{d,i}) \\
&= \sum_{i=1}^n I(X_i; A_i) + H(X_i, S_{e,i} | A_i, S_{d,i}), \quad (8)
\end{aligned}$$
where in $(\star)$ we used the memoryless property for the first two terms and the fact that conditioning reduces entropy for the last term, and
$$\begin{aligned}
Q &= -H(X^n | A^n, S_d^n, T, W) - H(S_e^n | A^n, S_d^n, T, W, X^n) \\
&\ge -\sum_{i=1}^n H(X_i | A^n, S_d^n, T, W, X^{i-1}) - H(S_{e,i} | A^n, S_d^n, T, W, X^{i-1}, X_i) \\
&= -\sum_{i=1}^n H(X_i, S_{e,i} | U_i, A_i, S_{d,i}), \quad (9)
\end{aligned}$$
where $U_i \triangleq (A^{n \setminus i}, S_d^{n \setminus i}, T, W, X^{i-1})$, $i = 1, 2, \ldots, n$. Combining (7)-(9), we have
$$\begin{aligned}
nR &\ge \sum_{i=1}^n I(X_i; A_i) + I(U_i; X_i, S_{e,i} | A_i, S_{d,i}) \\
&\overset{(a)}{\ge} \sum_{i=1}^n R\big( E[d(X_i, \tilde{g}_i(U_i, S_{d,i}))],\, E[\Lambda(A_i)] \big) \\
&\overset{(b)}{\ge} n \cdot R\Big( \frac{1}{n} \sum_{i=1}^n E[d(X_i, \tilde{g}_i(U_i, S_{d,i}))],\, \frac{1}{n} \sum_{i=1}^n E[\Lambda(A_i)] \Big) \\
&\overset{(c)}{\ge} n \cdot R(D, C),
\end{aligned}$$
where $(a)$ follows from the definition of the rate distortion and cost function in (1) and (2), the fact that $U_i - (A_i, X_i, S_{e,i}) - S_{d,i}$ forms a Markov chain, and the fact that $\hat{X}_i = g_i^{(n)}(W, T, S_d^n) = \tilde{g}_i(U_i, S_{d,i})$ for some $\tilde{g}_i(\cdot)$; $(b)$ follows from Jensen's inequality and the convexity of $R(D, C)$; and $(c)$ follows from the non-increasing property of $R(D, C)$ and the constraints $E[\frac{1}{n} \sum_{i=1}^n d(X_i, \hat{X}_i)] \le D + \delta$ and $E[\frac{1}{n} \sum_{i=1}^n \Lambda(A_i)] \le C + \delta$, which have to hold for any $\delta > 0$.

For the bound on the cardinality of the set $\mathcal{U}$, it can be shown by using the support lemma [9] that $U$ should have $|\mathcal{A}||\mathcal{X}| - 1$ elements to preserve $P_{A,X}$, plus four more for $I(U; X, S_e | A)$, $I(U; S_d | A)$, the distortion, and the cost constraints. This finally concludes the proof. ∎

ACKNOWLEDGEMENT

The authors wish to thank the anonymous reviewers and Hieu Do for helpful comments and discussions.

REFERENCES

[1] T. M. Cover and M. Chiang, "Duality between channel capacity and rate distortion with two-sided state information," IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1629–1638, Jun. 2002.
[2] A. D. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. Inf. Theory, vol. 22, no. 1, pp. 1–10, Jan. 1976.
[3] S. I. Gel'fand and M. S. Pinsker, "Coding for channel with random parameters," Probl. Contr. Inf. Theory, vol. 9, no. 1, pp. 19–31, 1980.
[4] C. Heegard and A. El Gamal, "On the capacity of computer memory with defects," IEEE Trans. Inf. Theory, vol. 29, no. 5, pp. 731–739, Sep. 1983.
[5] T. Weissman, "Capacity of channels with action-dependent states," submitted to IEEE Trans. Inf. Theory.
[6] H. Permuter and T. Weissman, "Source coding with a side information 'vending machine'," submitted to IEEE Trans. Inf. Theory.
[7] A. Orlitsky and J. R. Roche, "Coding for computing," IEEE Trans. Inf. Theory, vol. 47, no. 3, pp. 903–917, Mar. 2001.
[8] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. New York: Wiley, 2006.
[9] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. London, U.K.: Academic Press, 1981.