Noisy Feedback Schemes and Rate-Error Tradeoffs from Stochastic Approximation

Utsaw Kumar, J. Nicholas Laneman, and Vijay Gupta
Department of Electrical Engineering, University of Notre Dame
Email: {ukumar, jnl, vgupta2}@nd.edu

Abstract—It is known that noiseless feedback does not increase the capacity of memoryless channels. However, such feedback can considerably increase the reliability or reduce the coding complexity of schemes that approach capacity. One might hope for the same to be (at least partially) true of noisy feedback. This paper develops a new class of coding schemes for additive white noise channels with feedback corrupted by additive white noise, focusing much of the results and discussion on the Gaussian case. These schemes are variants of the well-known Schalkwijk-Kailath (SK) coding scheme and are based upon simple techniques from stochastic approximation. Specifically, instead of the classic Robbins-Monro approach to stochastic approximation originally used in the SK scheme, we employ more recent techniques from Kushner. The resulting schemes enable a tradeoff between transmission rate and error performance in the presence of noisy feedback even as the number of iterations becomes large.

I. INTRODUCTION

Communication over channels with access to a feedback link has long been studied in the information theory community. Shannon [1] proved that the channel capacity of a memoryless channel is, somewhat surprisingly, not increased by noiseless feedback. However, feedback is still useful in a number of ways, including reducing the delay and complexity of capacity-approaching schemes. For example, the simple coding scheme proposed by Schalkwijk and Kailath [2] achieves capacity for the additive white Gaussian noise (AWGN) channel with noiseless feedback, while simultaneously achieving an error probability decay that is doubly exponential in the coding interval, compared to just exponential for one-way schemes.

Motivated by these advantages, a large body of work has looked at extending the Schalkwijk-Kailath (SK) scheme to more general cases. As some representative examples, Ozarow extended the SK scheme to the two-user multiple access channel [3] and the broadcast channel [4], Kramer [5] extended the scheme to interference networks, and Merhav and Weissman [6] extended the "dirty-paper coding" scheme to the case of feedback with an SK-like scheme. More general channels than an AWGN link have also been considered. Shayevitz and Feder [7], [8] recently provided an interpretation of the SK scheme that allows construction of such schemes for arbitrary channels. In particular, their interpretation unifies the classic scheme of Horstein [9] for binary symmetric channels and the SK scheme for AWGN channels.

Another direction in which SK schemes have been extended is to consider an imperfect feedback link. Unfortunately, SK

and related schemes generally appear to depend critically on the absence of noise in the feedback link and are not robust to non-ideal feedback, e.g., AWGN in the feedback path [2], [10]. For feedback noise of infinite support, and specifically AWGN, Schalkwijk and Kailath [2] discuss two possible approaches for dealing with the feedback noise. In the first strategy, the decoder transmits its estimate on the feedback link. Although the probability of error at the decoder decreases to zero, the rate relative to the capacity of the channel approaches zero simultaneously. In the second strategy, the decoder transmits the output of the forward channel on the feedback link. In this case, although the relative rate approaches unity, the mean-squared error does not decay to zero. In fact, as [11] shows, as long as the encoding scheme is linear, feeding back the output of the channel cannot result in positive achievable rates with a vanishing probability of error. A general scheme that works well for the case of noisy feedback is still unknown, which remains a drawback for the SK scheme both intellectually and practically.

The recent work in [12] extends the SK scheme to handle feedback noise of finite support, e.g., quantization noise, by utilizing some ideas from the Witsenhausen counterexample in distributed control [13]. In particular, their modification of the Schalkwijk-Kailath scheme provides a positive rate and a vanishing probability of error, in spite of the noise in the feedback link. However, more work remains to be done for schemes that function in spite of more general noisy feedback.

In this paper, we take a fresh look at the problem of communicating across links with noisy feedback. In these initial results, we mainly focus on an AWGN link with feedback corrupted by AWGN. Inspired by the stochastic approximation based interpretation of the SK scheme, we present a generalization of the scheme for a point-to-point additive noise channel with noisy feedback.
We employ modern techniques from the stochastic approximation literature [14] to develop a class of SK-like schemes that enable a trade-off between the transmission rate and the error performance even as the number of iterations grows large. Although our schemes do not simultaneously achieve capacity with vanishing probability of error, they permit tuning the performance with respect to the two metrics. The remainder of the paper is organized as follows. We discuss the channel model in Section II. We present our scheme, a few special cases, and their analysis in Section III.

Fig. 1. Block diagram for the AWGN channel with feedback corrupted by AWGN. (The encoder sends $f(X_1(k), Y_1(k))$ over the forward channel with noise $Z_2(k)$; the decoder receives $Y_2(k)$, forms the estimate $\hat{\theta}(k)$ of the message $\theta$, and feeds back $g(X_2(k-1), Y_2(k-1))$, which the encoder receives as $Y_1(k)$ through the feedback noise $Z_1(k)$.)

In Section IV we present our simulation results. We briefly discuss the performance of our scheme in the presence of non-Gaussian noise channels in Section V. We finally conclude and present some directions for future work in Section VI.

II. CHANNEL MODEL

Consider the case of point-to-point communication across a continuous-time AWGN channel with feedback corrupted by AWGN. The transmitted signal must satisfy an average power constraint $P_{av}$, but no constraints are imposed on its bandwidth or peak power. The noise in the forward link and the noise in the feedback link are mutually independent and white, with two-sided power spectral densities $\sigma_2^2$ and $\sigma_1^2$, respectively. Since the capacity of a memoryless noisy channel is not increased by noiseless feedback, it cannot be increased by noisy feedback; thus, the capacity of such a channel is $C = P_{av}/(2\sigma_2^2)$ nats/second. As in [2], we work with a discrete-time equivalent model in which messages are pulse-amplitude modulated with symbol period $\Delta$; the coding interval of a message is $T$ seconds, an integer multiple of $\Delta$. The discrete-time equivalent of the channel is illustrated in Fig. 1.

We adopt the following notation throughout the paper: node 1 is the transmitter and node 2 is the intended receiver; $Y_i$ represents the observations at node $i = 1, 2$. Node $i$ stores and updates a "state" value $X_i$ recursively, which we define in Section III. $\hat{\theta}(k)$ represents the estimate at the receiver at time $k$. The noise in the forward link $\{Z_2(k)\}_{k=1}^{\infty}$ and the noise in the feedback link $\{Z_1(k)\}_{k=1}^{\infty}$ are mutually independent, white, and with distributions $\mathcal{N}(0, \sigma_2^2)$ and $\mathcal{N}(0, \sigma_1^2)$, respectively. The transmitter maps the message to one of $M$ disjoint, equal-length subintervals of the interval $[0, 1]$ on the real line. The message to be transmitted corresponds to the midpoint $\theta$ of a particular message interval.
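As a concrete illustration of the message mapping just described, the following sketch (the helper names are ours, not from the paper) maps a message index to the midpoint $\theta$ of its subinterval of $[0, 1]$, and decodes an estimate back to an index:

```python
def message_to_theta(m: int, M: int) -> float:
    """Map message m in {0, ..., M-1} to the midpoint of the m-th of
    M disjoint, equal-length subintervals of [0, 1]."""
    assert 0 <= m < M
    return (m + 0.5) / M

def theta_to_message(theta_hat: float, M: int) -> int:
    """Decode an estimate of theta to the index of the subinterval
    containing it (clamped to the valid range)."""
    return min(M - 1, max(0, int(theta_hat * M)))
```

For example, with M = 4 the four message points are 0.125, 0.375, 0.625, and 0.875, and any estimate that lands inside the correct subinterval decodes without error.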
In our development, the functions f(·) and g(·) that define the signals transmitted over the two links are memoryless, although more general approaches can be considered. We now summarize the class of encoding and decoding schemes we propose and show how they reduce to the SK scheme for a special choice of parameters.

III. FEEDBACK COMMUNICATION SCHEMES

Our algorithm is an extension of the SK algorithm that is based on modern stochastic approximation techniques [14]. In

general, at time step $k$ the encoder sends
$$f(X_1(k)) = \frac{\alpha}{k^{\gamma}}\left(X_1(k) - \theta\right), \qquad (1)$$
where $\alpha > 0$ and $\gamma \in [0, 1]$. The parameter $\alpha$ plays an identical role as in the SK scheme, and as we will see, the parameter $\gamma$ prevents the feedback noise from dominating the transmit power. The signal $Y_2(k) = f(X_1(k)) + Z_2(k)$ is received at the decoder. The decoder uses the following algorithm to generate the estimate $\hat{\theta}(k)$ of the message $\theta$ at time $k$:
$$X_2(k) = \underbrace{X_2(k-1) - \frac{1}{\alpha k^{1-\gamma}}\, Y_2(k)}_{P(k)} + \underbrace{\frac{\beta}{k^{1-\gamma}}\left(\hat{\theta}(k-1) - X_2(k-1)\right)}_{Q(k)}, \qquad (2)$$
$$\hat{\theta}(k) = \left(1 - \frac{1}{k}\right)\hat{\theta}(k-1) + \frac{1}{k}\, X_2(k), \qquad (3)$$

where $\beta \in [0, 1]$. The values $g(\cdot) = X_2(k)$ are then sent back to the encoder, which uses $X_1(k) = X_2(k-1) + Z_1(k)$ for the next iteration. We arbitrarily set the initial values $X_1(1)$, $X_2(0)$, and $\hat{\theta}(0)$ to $1/2$, which is known to both the encoder and the decoder. The parameter $\beta$ weights the "averaging" term $Q(k)$ relative to the "observation" term $P(k)$, and $\beta \in [0, 1]$ ensures that the averaging does not dominate [14, Sec. 3.3]. Note that throughout the paper, we focus our technical analysis on the case for which $\beta = 0$, i.e., $X_2(k) = P(k)$. We illustrate the advantage of non-zero $\beta$ in Section IV with the help of numerical examples. We now show how this algorithm reduces to the SK algorithm.

A. Special Cases: Schalkwijk-Kailath Schemes

For $\gamma = 0$ and $\beta = 0$, (1)-(2) reduce to the SK scheme, which for perfect feedback is a Robbins-Monro stochastic approximation algorithm [15]. If the feedback is noisy, [2] discusses two possible approaches for dealing with the feedback noise. In either case, the encoder sends $f(X_1(k)) = \alpha(X_1(k) - \theta)$, where the initial value $X_1(1)$ is arbitrarily set to, e.g., $1/2$. At every time step, the decoder receives $Y_2(k) = f(X_1(k)) + Z_2(k)$ and computes $X_2(k) = X_2(k-1) - \frac{1}{\alpha k} Y_2(k)$, with $X_2(0) = X_1(1)$. Note that the "smoothing" of (3) does not affect the convergence rate of the estimate $\hat{\theta}(k)$ if $\gamma = 0$ [14, Sec. 3.3]. Thus the smoothing can be ignored in this case without loss of optimality, i.e., $\hat{\theta}(k) = X_2(k)$.

In the first strategy, the decoder transmits $g(\cdot) = X_2(k-1) = \hat{\theta}(k-1)$. We call this scheme SK-Estimate. The scheme can be summed up as
$$Y_2(k) = \alpha(X_1(k) - \theta) + Z_2(k),$$
$$X_2(k) = X_2(k-1) - \frac{1}{\alpha k}\, Y_2(k), \qquad (4)$$
$$X_1(k) = Y_1(k) = X_2(k-1) + Z_1(k).$$
The recursive equations can be solved to yield the following relation after $n$ iterations:
$$\hat{\theta}(n) = X_2(n) = \theta - \frac{1}{\alpha n}\sum_{k=1}^{n} Z_2(k) - \frac{1}{n}\sum_{k=2}^{n} Z_1(k). \qquad (5)$$
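As a sanity check on the SK-Estimate recursions above, the following minimal simulation (our own sketch; the parameter values $\sigma_1^2 = \sigma_2^2 = 0.1$ and $\alpha = 2$ from the simulation section are used for illustration) runs (4) and confirms that the estimate concentrates around $\theta$:

```python
import math
import random

def sk_estimate(theta, n, alpha=2.0, var1=0.1, var2=0.1, seed=0):
    """Run the SK-Estimate recursions (4) for n steps and return the
    decoder state X2(n), which serves as the estimate of theta."""
    rng = random.Random(seed)
    X2 = 0.5                                 # X2(0) = X1(1) = 1/2
    for k in range(1, n + 1):
        # Feedback of the previous estimate; noiseless at k = 1 since
        # X1(1) is the agreed-upon initial value.
        Z1 = rng.gauss(0.0, math.sqrt(var1)) if k > 1 else 0.0
        X1 = X2 + Z1
        Y2 = alpha * (X1 - theta) + rng.gauss(0.0, math.sqrt(var2))
        X2 -= Y2 / (alpha * k)               # decoder update
    return X2
```

Consistent with (5), the conditional variance of the returned estimate is $\sigma_2^2/(\alpha^2 n) + (n-1)\sigma_1^2/n^2$, so it vanishes as $\Theta(1/n)$.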

From (5), the conditional distribution of $\hat{\theta}(n)$ given $\theta$ is
$$\hat{\theta}(n)\,|\,\theta \sim \mathcal{N}\!\left(\theta,\; \frac{\sigma_2^2}{\alpha^2 n} + \frac{(n-1)\sigma_1^2}{n^2}\right).$$
The mean-squared error $\sigma^2(n) = E(|\hat{\theta}(n) - \theta|^2)$ approaches zero as $\Theta(1/n)$. The probability of error is the probability that $\hat{\theta}(n)$ lies outside the message interval of size $1/M$, which is given by
$$P_e = 2\,\mathrm{erfc}\!\left(\frac{1}{2M\sigma(n)}\right).$$
If we set $M(n) = n^{\frac{1}{2}(1-\epsilon)}$ for small $\epsilon > 0$, we can make the probability of error go to zero while simultaneously ensuring a non-zero rate of $R(n) = (1-\epsilon)\ln n/(2T)$ nats/sec. The probability of error would hence be given by
$$P_e = 2\,\mathrm{erfc}\!\left(\frac{1}{2\, n^{\frac{1}{2}(1-\epsilon)}\, \sigma(n)}\right). \qquad (6)$$
It can be seen that $P_e \to 0$ as $n \to \infty$ only if $\epsilon > 0$. The critical rate is defined as
$$R_c := \left.\frac{\ln M}{T}\right|_{\epsilon=0} = \frac{\ln n}{2T} \;\text{nats/sec}. \qquad (7)$$
Thus, for a non-zero asymptotic rate, we can see that the number of iterations $n$ increases exponentially with $T$, explaining the wideband assumption. To calculate capacity, we need to compute the average transmitted power
$$P_{av}(n) = \frac{1}{T}\, E\!\left[\sum_{k=1}^{n} \alpha^2 (X_1(k) - \theta)^2\right]. \qquad (8)$$
We can see that
$$X_1(n+1)\,|\,\theta \sim \mathcal{N}\!\left(\theta,\; \frac{\sigma_2^2}{\alpha^2 n} + \frac{(n-1)\sigma_1^2}{n^2} + \sigma_1^2\right). \qquad (9)$$
Assuming a uniform prior distribution for the message point $\theta$, we have $E(X_1(1) - \theta)^2 = 1/12$. The rest of the terms in (8) can be computed using (9), yielding
$$P_{av}(n) = \frac{\alpha^2}{T}\left[\frac{1}{12} + \sum_{k=1}^{n-1}\left(\frac{\sigma_2^2}{\alpha^2 k} + \frac{(k-1)\sigma_1^2}{k^2} + \sigma_1^2\right)\right]. \qquad (10)$$
Using (10) to compute the channel capacity, we can find the relative rate expression for this scheme as
$$\frac{R(n)}{C} = \frac{(1-\epsilon)\ln n}{\dfrac{\alpha^2}{12\sigma_2^2} + \displaystyle\sum_{k=1}^{n-1}\left(\frac{1}{k} + \frac{\alpha^2(k-1)\sigma_1^2}{k^2\sigma_2^2}\right) + \frac{(n-1)\alpha^2\sigma_1^2}{\sigma_2^2}}. \qquad (11)$$
We can see from this expression that the relative rate gradually approaches zero as $n$ increases. This is because the denominator increases linearly with the feedback noise variance while the numerator increases only logarithmically. For the special case of perfect feedback, that is $\sigma_1^2 = 0$, it can be shown that
$$\lim_{n\to\infty} P_{av}(n) = 2\sigma_2^2 R_c \;\;\Rightarrow\;\; R_c = \frac{P_{av}}{2\sigma_2^2}, \qquad (12)$$
which is the channel capacity of the additive white Gaussian noise channel.

In the second strategy, the observation $Y_2(k-1)$ is sent back, i.e., $g(\cdot) = Y_2(k-1)$. We call this scheme SK-Observation. The scheme can be summed up as
$$Y_2(k) = \alpha(X_1(k) - \theta) + Z_2(k),$$
$$X_2(k) = X_2(k-1) - \frac{1}{\alpha k}\, Y_2(k), \qquad (13)$$
$$Y_1(k+1) = Y_2(k) + Z_1(k+1),$$
$$X_1(k+1) = X_1(k) - \frac{1}{\alpha k}\, Y_1(k+1).$$
Solving the recursive relations, we obtain
$$X_1(n+1) = \theta - \frac{1}{\alpha n}\sum_{k=1}^{n}\left(Z_2(k) + Z_1(k+1)\right),$$
$$\hat{\theta}(n) = X_2(n) = X_1(n+1) + \frac{1}{\alpha}\sum_{k=1}^{n}\frac{1}{k}\, Z_1(k).$$
Hence,
$$\hat{\theta}(n)\,|\,\theta \sim \mathcal{N}\!\left(\theta,\; \frac{\sigma_2^2 + \sigma_1^2}{\alpha^2 n} + \frac{\sigma_1^2}{\alpha^2}\sum_{k=1}^{n}\frac{1}{k^2}\right).$$
The first term in the variance goes to zero, but the second term gradually converges to a nonzero constant. Thus, the mean-squared error $\sigma^2(n) = E(|\hat{\theta}(n) - \theta|^2)$ does not converge to zero. The probability of error expression is identical to (6) with $\sigma^2(n)$ given by the variance of $\hat{\theta}(n)$. We can see that
$$X_1(n+1)\,|\,\theta \sim \mathcal{N}\!\left(\theta,\; \frac{\sigma_2^2 + \sigma_1^2}{\alpha^2 n}\right).$$
We can calculate the relative rate expression for this scheme as before:
$$P_{av}(n) = \frac{1}{T}\, E\!\left[\sum_{k=1}^{n} \alpha^2 (X_1(k) - \theta)^2\right] = \frac{\alpha^2}{T}\left[\frac{1}{12} + \frac{\sigma_2^2 + \sigma_1^2}{\alpha^2}\sum_{k=1}^{n-1}\frac{1}{k}\right], \qquad (14)$$
so that
$$\frac{R(n)}{C} = \frac{(1-\epsilon)\ln n}{\dfrac{\alpha^2}{12\sigma_2^2} + \dfrac{\sigma_2^2 + \sigma_1^2}{\sigma_2^2}\displaystyle\sum_{k=1}^{n-1}\frac{1}{k}}. \qquad (15)$$
This expression gradually approaches unity.
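The non-vanishing error floor of SK-Observation is easy to reproduce numerically. The sketch below (our own; the simulation-section parameters $\sigma_1^2 = \sigma_2^2 = 0.1$, $\alpha = 2$ are used for illustration, and the one-step feedback delay is folded into the loop) implements the recursions in (13) and estimates the mean-squared error by Monte Carlo:

```python
import math
import random

def sk_observation(theta, n, alpha=2.0, var1=0.1, var2=0.1, seed=0):
    """Run the SK-Observation recursions (13) for n steps and return
    the decoder state X2(n)."""
    rng = random.Random(seed)
    X1 = X2 = 0.5                    # X1(1) = X2(0) = 1/2
    for k in range(1, n + 1):
        Y2 = alpha * (X1 - theta) + rng.gauss(0.0, math.sqrt(var2))
        X2 -= Y2 / (alpha * k)       # decoder update
        # The encoder mirrors the decoder's step using the fed-back
        # channel output, corrupted by the feedback noise.
        Y1 = Y2 + rng.gauss(0.0, math.sqrt(var1))
        X1 -= Y1 / (alpha * k)
    return X2

def mc_mse(theta, n, trials=150):
    """Monte Carlo estimate of E|X2(n) - theta|^2 over independent runs."""
    return sum((sk_observation(theta, n, seed=s) - theta) ** 2
               for s in range(trials)) / trials
```

With $\sigma_1^2 = 0.1$ and $\alpha = 2$, the predicted floor $(\sigma_1^2/\alpha^2)\sum 1/k^2$ is roughly $0.025 \cdot \pi^2/6 \approx 0.04$, and increasing $n$ further does not drive the empirical mean-squared error below it.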

B. Our Algorithm

The last term in the variance of (9) adds up in (8) to increase the denominator in (11) linearly in $n$. Thus, a large portion of the power is spent in sending information about the feedback noise to the decoder. To overcome this problem we introduce $\gamma$ into our scheme (1)-(3). This helps keep the contribution from the feedback noise small, ensuring that the denominator does not grow linearly with the feedback noise variance.

Let us first consider the case in which $\gamma \neq 0$ and $\beta = 0$, so that $X_2(k) = P(k)$. It can be seen that
$$X_2(n) = \theta - \frac{1}{\alpha n}\sum_{k=1}^{n} k^{\gamma} Z_2(k) - \frac{1}{n}\sum_{k=2}^{n} Z_1(k). \qquad (16)$$
Since $Z_2(k)$ and $Z_1(k)$ are Gaussian,
$$X_2(n)\,|\,\theta \sim \mathcal{N}\!\left(\theta,\; \frac{\sigma_2^2}{\alpha^2 n^2}\sum_{k=1}^{n} k^{2\gamma} + \frac{(n-1)\sigma_1^2}{n^2}\right).$$
Knowing the distribution of $X_2(n)$, we can find the distribution for $\hat{\theta}(n)$. It can be seen from (3) and (16) that
$$\hat{\theta}(n) = \frac{1}{n}\sum_{k=1}^{n} X_2(k) = \frac{1}{n}\sum_{k=1}^{n}\left[\theta - \frac{1}{\alpha k}\sum_{j=1}^{k} j^{\gamma} Z_2(j) - \frac{1}{k}\sum_{j=2}^{k} Z_1(j)\right]$$
$$= \theta - \frac{1}{\alpha n}\sum_{j=1}^{n} j^{\gamma} Z_2(j)\sum_{k=j}^{n}\frac{1}{k} - \frac{1}{n}\sum_{j=2}^{n} Z_1(j)\sum_{k=j}^{n}\frac{1}{k}. \qquad (17)$$
Hence, $\hat{\theta}(n)\,|\,\theta$ is also conditionally Gaussian,
$$\hat{\theta}(n)\,|\,\theta \sim \mathcal{N}\!\left(\theta,\; \sigma^2(n)\right), \qquad (18)$$
where $\sigma^2(n)$ can be calculated as
$$\sigma^2(n) = \underbrace{\frac{\sigma_2^2}{\alpha^2 n^2}\sum_{j=1}^{n} j^{2\gamma}\left(\sum_{k=j}^{n}\frac{1}{k}\right)^2}_{a(n)} + \underbrace{\frac{\sigma_1^2}{n^2}\sum_{j=2}^{n}\left(\sum_{k=j}^{n}\frac{1}{k}\right)^2}_{b(n)}.$$
The terms $a(n)$ and $b(n)$ above represent the contributions of the forward and feedback noises, respectively, to the mean-squared error. Expanding out the squares in $b(n)$ and simplifying, we obtain
$$b(n) = \frac{\sigma_1^2}{n^2}\left[2n - \sum_{k=1}^{n}\frac{1}{k} - \left(\sum_{k=1}^{n}\frac{1}{k}\right)^2\right]. \qquad (19)$$
Hence, $b(n)$ decreases to zero at least as $\Theta(1/n)$ irrespective of the value of $\gamma > 0$, and the mean-squared error for our scheme is more robust than the SK scheme to the feedback noise variance $\sigma_1^2$ for large $n$. On the other hand, it can be shown that $a(n)$ converges to zero, and $\hat{\theta}(n)$ is a consistent estimator of $\theta$, only if $\gamma \in [0, 1/2)$. The probability of error expression is identical to (6) with $\sigma^2(n)$ given by (18).

The feedback noise variance $\sigma_1^2$ of course affects the rate performance, which we now discuss. Since
$$X_1(n+1)\,|\,\theta \sim \mathcal{N}\!\left(\theta,\; \frac{\sigma_2^2}{\alpha^2 n^2}\sum_{k=1}^{n} k^{2\gamma} + \frac{(n-1)\sigma_1^2}{n^2} + \sigma_1^2\right),$$
the average power can be calculated as
$$P_{av}(n) = \frac{1}{T}\, E\!\left[\alpha^2 (X_1(1) - \theta)^2 + \sum_{k=2}^{n}\frac{\alpha^2}{k^{2\gamma}}(X_1(k) - \theta)^2\right]$$
$$= \frac{\alpha^2}{T}\left[\frac{1}{12} + \sum_{k=2}^{n}\frac{1}{k^{2\gamma}}\left(\frac{\sigma_2^2}{\alpha^2 k^2}\sum_{j=1}^{k-1} j^{2\gamma} + \frac{(k-1)\sigma_1^2}{k^2} + \sigma_1^2\right)\right]. \qquad (20)$$
Using (20) to compute the channel capacity, we can calculate the relative rate expression as
$$\frac{R(n)}{C} = \frac{(1-\epsilon)\ln n}{\dfrac{\alpha^2}{12\sigma_2^2} + \displaystyle\sum_{k=2}^{n}\frac{1}{k^{2\gamma}}\left(\frac{1}{k^2}\sum_{j=1}^{k-1} j^{2\gamma} + \frac{\alpha^2(k-1)\sigma_1^2}{k^2\sigma_2^2} + \frac{\alpha^2\sigma_1^2}{\sigma_2^2}\right)}, \qquad (21)$$
which approaches unity only if $\gamma \in [1/2, 1]$. More generally, since (18) and (21) are increasing in $\gamma$, our algorithm allows for a tradeoff between error performance and transmission rate.

Kushner et al. [14] discuss the smoothing of estimates (3), which is useful only if $\gamma > 0$. They also suggest feeding back the averages into the estimation equation to get better estimates. Motivated by their work, we introduce the averaging term $Q(k)$ in (2), which empirically improves the mean-squared error $E(|\hat{\theta}(n) - \theta|^2)$. To this point we have not analytically characterized performance with $\beta \neq 0$, but in the next section we provide simulation results as an illustration.

IV. SIMULATION RESULTS
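Simulations of this kind can be reproduced with a short script. The following sketch (our own; parameter values follow the setup described in this section, and the returned average power is simply the empirical mean of $f^2$) implements the full scheme (1)-(3):

```python
import math
import random

def run_scheme(theta, n, alpha=2.0, gamma=0.4, beta=0.0,
               var1=0.1, var2=0.1, seed=0):
    """Simulate (1)-(3) for n steps; return (theta_hat(n), average
    transmitted power per channel use)."""
    rng = random.Random(seed)
    X2 = th = 0.5                        # X2(0) = theta_hat(0) = 1/2
    energy = 0.0
    for k in range(1, n + 1):
        Z1 = rng.gauss(0.0, math.sqrt(var1)) if k > 1 else 0.0
        X1 = X2 + Z1                     # noisy feedback of X2(k-1)
        f = (alpha / k ** gamma) * (X1 - theta)           # eq. (1)
        energy += f * f
        Y2 = f + rng.gauss(0.0, math.sqrt(var2))          # forward channel
        X2 = (X2 - Y2 / (alpha * k ** (1 - gamma))
              + (beta / k ** (1 - gamma)) * (th - X2))    # eq. (2)
        th = (1 - 1 / k) * th + X2 / k                    # smoothing, eq. (3)
    return th, energy / n
```

Running this with $\gamma = 0.4$ versus $\gamma = 0$ illustrates the tradeoff discussed above: both settings produce accurate estimates, but with $\gamma = 0$ a large, non-decaying portion of the power budget (on the order of $\alpha^2\sigma_1^2$ per channel use) is spent retransmitting the feedback noise.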

Figs. 2 and 3 illustrate performance by plotting the mean-squared error and the relative transmission rate for SK-Estimate, SK-Observation, and our algorithm for different values of $\gamma$ and $\beta$. For all algorithms, we assume: the message $\theta$ to be uniformly distributed in $[0, 1]$; both the forward and backward noises to have variance $\sigma_1^2 = \sigma_2^2 = 0.1$; and the parameter $\alpha$ to be 2, which need not be optimal. We display the curves for $\beta = 0$ and $0.4$. The figures clearly show the advantage of additional averaging and smoothing of the estimates in (2)-(3): the mean-squared error is reduced to give better estimates while minimally affecting the transmission rate performance.

Fig. 2. Mean-squared error versus iterations for AWGN with noisy feedback for different schemes; successively higher intermediate curves correspond to our scheme with $\gamma = 0.4$, $0.5$, $0.6$, and $0.7$, all for $\alpha = 2$. Dashed curves correspond to additional smoothing of the estimates with $\beta = 0.4$, and the dash-dotted curves correspond to no additional smoothing, i.e., $\beta = 0$.

Fig. 3. Relative transmission rate versus iterations for AWGN with noisy feedback for different schemes; successively higher intermediate curves correspond to our scheme with $\gamma = 0.4$, $0.5$, $0.6$, and $0.7$, all for $\alpha = 2$. Dashed curves correspond to additional smoothing of the estimates with $\beta = 0.4$, and the dash-dotted curves correspond to no additional smoothing, i.e., $\beta = 0$.

V. NON-GAUSSIAN NOISE

Stochastic approximation algorithms are in general non-parametric [2], [14]. Therefore, our coding scheme also applies if the additive white noise is non-Gaussian. Using Sacks' theorem on asymptotic distributions [16], we can say that the estimate $\hat{\theta}(n)$ is asymptotically Gaussian, so the error performance for large values of $n$ would be similar to the Gaussian case. A lower bound on channel capacity for non-Gaussian white noise channels is given by $P_{av}/(2\sigma_2^2)$; thus, the relative rate plots with respect to this lower bound would look similar to those for the Gaussian case as well.

VI. CONCLUSIONS AND FUTURE WORK

In this paper, inspired by the stochastic approximation based interpretation of the SK scheme, we presented some initial results on a generalization of the scheme that permits us to trade off the relative rate against the mean-squared error for communication over a link with noisy feedback. In particular, our algorithm allows transmission at a positive information rate with reasonably low mean-squared error even when the feedback link is noisy. Notice that, although we considered a delay of one time step in the feedback link throughout the paper, our algorithm would still converge even for a delay of $d > 1$ time steps. Moreover, it would obtain the same mean-squared error in $d$ additional time steps. Hence, for a large number of iterations, the effect on either the rate or the mean-squared error would be negligible.

This work is merely a first cut in this direction and can be further developed in multiple ways. One immediate direction is to optimize the different parameters to obtain the best possible mean-squared error performance for a given rate. More generally, non-linear stochastic approximation algorithms might achieve better mean-squared error performance. An advantage of stochastic approximation based estimation is that the estimation procedures do not change if we change the driving function at the encoder, as long as the message point remains a zero of that particular function. Finally, we would like to extend the scheme to more general channels and noise distributions. The insights of [7], [8] are expected to be important in this regard.

REFERENCES

[1] C. E. Shannon, "The zero-error capacity of a noisy channel," IRE Trans. Inf. Theory, vol. IT-2, pp. 8–19, Sep. 1956.
[2] J. P. M. Schalkwijk and T. Kailath, "A coding scheme for additive noise channels with feedback—I: No bandwidth constraint," IEEE Trans. Inf. Theory, vol. IT-12, no. 2, pp. 172–182, Apr. 1966.
[3] L. H. Ozarow, "The capacity of the white Gaussian multiple access channel with feedback," IEEE Trans. Inf. Theory, vol. 30, no. 4, pp. 623–629, Jul. 1984.
[4] L. H. Ozarow and S. K. Leung-Yan-Cheong, "An achievable region and outer bound for the Gaussian broadcast channel with feedback," IEEE Trans. Inf. Theory, vol. 30, no. 4, pp. 667–671, Jul. 1984.
[5] G. Kramer, "Feedback strategies for white Gaussian interference networks," IEEE Trans. Inf. Theory, vol. 48, no. 6, pp. 1423–1438, Jun. 2002.
[6] N. Merhav and T. Weissman, "Coding for the feedback Gel'fand-Pinsker channel and the feedforward Wyner-Ziv source," in Proc. IEEE Int. Symp. Information Theory (ISIT), Sep. 2005, pp. 1506–1510.
[7] O. Shayevitz and M. Feder, "Communication with feedback via posterior matching," in Proc. IEEE Int. Symp. Information Theory (ISIT), Nice, France, Jun. 2007, pp. 391–395.
[8] ——, "The posterior matching feedback scheme: Capacity achieving and error analysis," in Proc. IEEE Int. Symp. Information Theory (ISIT), Toronto, ON, Canada, Jul. 2008, pp. 900–904.
[9] M. Horstein, "Sequential transmission using noiseless feedback," IEEE Trans. Inf. Theory, vol. 9, no. 3, pp. 136–143, Jul. 1963.
[10] M. Gastpar, "On noisy feedback in Gaussian networks," in Proc. Allerton Conf. Communications, Control, and Computing, Monticello, IL, Sep. 2005.
[11] Y.-H. Kim, A. Lapidoth, and T. Weissman, "The Gaussian channel with noisy feedback," in Proc. IEEE Int. Symp. Information Theory (ISIT), Nice, France, Jun. 2007, pp. 1416–1420.
[12] N. C. Martins and T. Weissman, "Coding for additive white noise channels with feedback corrupted by quantization or bounded noise," IEEE Trans. Inf. Theory, vol. 54, no. 9, pp. 4274–4282, Sep. 2008.
[13] H. S. Witsenhausen, "A counterexample in stochastic optimum control," SIAM J. Control, vol. 6, no. 1, pp. 131–147, 1968.
[14] H. J. Kushner and G. G. Yin, Stochastic Approximation and Recursive Algorithms and Applications. New York: Springer-Verlag, 2003.
[15] H. Robbins and S. Monro, "A stochastic approximation method," Ann. Math. Stat., vol. 22, pp. 400–407, Sep. 1951.
[16] J. Sacks, "Asymptotic distributions of stochastic approximation procedures," Ann. Math. Stat., vol. 29, pp. 373–405, May 1959.