
IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 61, NO. 2, FEBRUARY 2013

Maximum Expected Rates of Block-Fading Channels with Entropy-Constrained Channel State Feedback

Víctor M. Elizondo and Milan S. Derpich

Abstract—We obtain the maximum average data rates achievable over block-fading channels when the receiver has perfect channel state information (CSI) and only an entropy-constrained quantized approximation of this CSI is available at the transmitter. We assume that channel gains in consecutive blocks are independent and identically distributed, and consider a short-term power constraint. Our analysis is valid for a wide variety of channel fading statistics, including Ricean and Nakagami-m fading. In this situation, the problem translates into designing an optimal entropy-constrained quantizer to convey approximated CSI to the transmitter, and into defining a rate-adaptation policy for the latter so as to maximize the average downlink data rate. A numerical procedure is presented which yields the thresholds and reconstruction points of the optimal quantizer, together with the associated maximum average downlink rates, by finding the roots of a small set of scalar functions of two scalar arguments. Utilizing this procedure, it is found that achieving the maximum downlink average capacity C requires, in some cases, time sharing between two regimes. In addition, it is found that, for an uplink entropy constraint H̄ < log₂(L), a quantizer with more than L cells provides only a small capacity increase, especially at high SNRs.

Index Terms—Channel state information feedback, information rates, fading channels, quantization, radio communication.

I. INTRODUCTION

Manuscript received August 14, 2011; revised January 23 and May 19, 2012. The associate editor coordinating the review of this letter and approving it for publication was O. Simeone. V. M. Elizondo and M. S. Derpich are with the Department of Electronic Engineering, Universidad Técnica Federico Santa María, Valparaíso, Chile (e-mail: [email protected], [email protected]). This work was supported by CONICYT grant ACT-53 and FONDECYT grants 3100109 and 1120468. Digital Object Identifier 10.1109/TCOMM.2012.12.110537

It is well known that the achievable data rates for reliable communication over a fading wireless channel depend on the availability of channel state information (CSI) at the transmitting and receiving ends [1], [2]. For single-input single-output (SISO) flat fading channels, the CSI consists of the channel gain and phase. If perfect CSI is available at the transmitter (perfect CSIT) and at the receiver (perfect CSIR), the channel is slowly fading and the transmission is subject to a long-term average power constraint, then the average capacity is achieved by adapting rate and power to the channel gain in a time water-filling fashion [3], [4]. By contrast, if an instantaneous (per-block) maximum power constraint is imposed, the fades are ergodic and the transmission blocks are long enough so that the fade statistics over each block converge to their ensemble

statistics, then the ergodic channel capacity is achievable without CSIT [3], [5]. Otherwise, if the fading is so slow that the channel gain can be regarded as constant within each block (which corresponds to a block-fading scenario), then CSIT is beneficial. In this case, with perfect CSIT and a per-block power constraint, the capacity is achieved by transmitting at maximum power, with only the data rate being adapted to the channel gain in each transmission block [3]. If perfect CSIR is available and the receiver feeds back this CSI via an uplink with limited information throughput, then only imperfect CSI will be available at the transmitter. In a block-fading situation, the uncertainty at the transmitter about the true channel gain in each block implies a trade-off between throughput and reliability: the larger the data rate chosen by the transmitter, the higher the probability of exceeding the channel capacity during the transmission block [6]. This poses the problem of encoding the CSI at the receiver and decoding it at the transmitter (i.e., choosing rate and power) in a rate-distortion-optimal fashion, where the distortion is some measure of the decrease in downlink throughput, as in [3], [7], or of the increase in error probability, as in [8]. The capacity of memoryless block-fading SISO channels with long-term power-constrained downlink transmission and fixed-rate constrained CSI feedback was studied in [9]. A similar situation was considered in [10], assuming a multi-layer downlink coding scheme in which data blocks are decoded perfectly or totally lost if the transmission data rate is, respectively, below or above the channel capacity during the block. The idea in [10] was to design a quantizer with a fixed number of quantization cells so as to maximize the expected downlink rate, i.e., the expected number (or long-term average) of successfully decoded bits.
0090-6778/13$31.00 © 2013 IEEE

Also under a constraint on the number of CSI quantization cells, [11] studied the maximization of downlink throughput considering a noisy feedback channel. There also exist numerous results related to downlink throughput maximization problems for multiple-input multiple-output (MIMO) wireless channels (see, e.g., [2], [12]–[14] and the references therein). Although not directly related to the SISO problem (which is the focus of this work), it is worth mentioning that, in all the MIMO results in [2], [12]–[14] and the references therein, the only constraint on the quantizer (where there is a quantizer) is its cardinality. In [15], the maximum SISO downlink average throughput under a long-term power constraint and for a fixed number of quantization cells is analyzed. The performance of zero-outage schemes (referred to as MASA schemes) was compared against that of average reliable throughput schemes (referred to as ART schemes), which allow for outages to occur. It is shown in [15] that, in some regimes, when the additional feedback load of the ART policies (associated with informing the transmitter of a previous outage) is taken into account, MASA schemes outperform ART schemes. In that context, the feedback load refers to the entropy of the messages (quantized CSI plus ACKs and NACKs) that are sent to the transmitter. However, in [15] this entropy is evaluated a posteriori, i.e., after the quantizers have been optimized without considering the uplink entropy as a constraint. Thus, in all these papers, the design of optimal CSI quantizers has been addressed considering only a constraint on the number of quantization intervals (or cells). However, if one asks "what is the maximum throughput that can be attained if there is a constraint on the amount of information that can be sent to the transmitter for representing the CSI?", then it is more appropriate to consider an entropy constraint (instead of a cardinality constraint) for the CSI quantizer. Indeed, the entropy of the quantized output, say H, is a lower bound on the average number of bits required to represent this output. At the same time, by using Huffman coding, it is possible to find prefix-free bitwords for each quantized CSI outcome with an average length not greater than H + 1 bits per CSI realization. Moreover, in a situation in which K i.i.d. CSI realizations are quantized at a time (which would happen, for example, in an OFDM system with K independently fading carriers), joint entropy coding would yield bitwords with an average length of K times H plus, at most, 1 bit. Since, in general, having fewer information bits to feed back for each CSI realization requires less average power, bandwidth or time, the latter benefits can be directly associated with a low entropy.
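As a quick numerical check of these bounds (H ≤ average codeword length < H + 1), the following sketch builds a Huffman code for an illustrative set of quantizer-cell probabilities; the probabilities are placeholders, not values taken from this paper.

```python
import heapq
import math

def huffman_lengths(probs):
    """Return prefix-free codeword lengths for the given probabilities,
    built by repeatedly merging the two least likely nodes (Huffman)."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    lengths = [0] * len(probs)
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, ids1 = heapq.heappop(heap)
        p2, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:
            lengths[i] += 1          # each merge adds one bit to every member
        heapq.heappush(heap, (p1 + p2, ids1 + ids2))
    return lengths

probs = [0.5, 0.25, 0.15, 0.10]      # illustrative cell probabilities
lens = huffman_lengths(probs)
avg_len = sum(p * l for p, l in zip(probs, lens))
H = -sum(p * math.log2(p) for p in probs)   # entropy in bits
assert H <= avg_len < H + 1                 # the bound cited above
```

For these placeholder probabilities the average length (1.75 bits) sits just above the entropy (about 1.74 bits), illustrating how tight the H + 1 bound can be for a single realization.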
This constitutes a practical motivation for considering entropy, instead of cardinality, as a constraint for the CSI quantizer. However, to the best of the authors' knowledge, there are no available results on average downlink throughput maximization in which the quantizer utilized to encode CSI for the transmitter is designed subject to a constraint on the entropy of its output. With the motivations stated in the previous paragraph, in this paper we study the problem of finding entropy-constrained quantizers, with any given number of quantization intervals, for encoding block channel gains for the transmitter, that yield the largest average downlink data rate. In our setup, the downlink channel is assumed to experience i.i.d. block fading, with the associated gains and phases perfectly known to the receiver. We consider a wide family of fading statistics, general enough to include Ricean and Nakagami-m fading channels with one or more degrees of freedom. As in [10], the uplink over which the quantized CSI is fed back is an error-free, zero-delay channel. To solve this problem, we propose a numerical method which yields the optimal quantization thresholds and reconstruction points for any given number of quantization intervals and average channel signal-to-noise ratio (SNR). The optimization problem is partly similar to the quantizer design problems addressed in [16]–[19] because of the common entropy constraint. However, as we shall see, since the distortion measure in this case is the decrease in


average downlink rate (not mean squared error), the resulting situation is vastly different from the one encountered in standard entropy-constrained quantization. The CSI entropy-constrained coding problem turns out to be non-convex, and our analysis reveals that it has, in general, several local optima. Its Lagrangian formulation, for L = N + 1 quantization cells, leads to a system of 3N + 2 non-linear equations in 3N + 2 unknowns, each taking values over the non-negative real numbers. Since each of these equations must be solved numerically, and due to the high dimensionality of the search space, directly solving this system of equations is a task of high numerical complexity. The numerical procedure introduced in this paper greatly reduces this complexity by turning the problem into finding the roots of a small set of scalar functions of two scalar arguments, only one of which has unbounded support. The evaluation of each of these functions involves solving N line-search problems with respect to monotonic functions. By applying this procedure, it is found that, in general, the maximum average downlink capacity C for a given uplink entropy H̄ is a non-concave function. Since in our formulation time sharing between two regimes yields an average capacity and entropy equal to the weighted averages of the capacity/entropy values of each regime, the region of all achievable (C, H̄) pairs is given by the convex hull of the C versus H̄ curve. On the other hand, it is found that if H̄ is log₂ of the number of available quantization cells, then arranging the thresholds so as to obtain equiprobable cells is nearly optimal. Our results also allow one to find the gain in downlink average throughput from using an optimal entropy-constrained quantizer instead of an optimal cardinality-constrained quantizer.
For instance, when the average downlink SNR is 0 dB, an entropy-coded quantizer with 3 levels and an average rate of 1 bit per CSI realization yields an 8% increase in average downlink throughput over an optimal fixed-rate quantizer with two levels (i.e., requiring the same average rate). The performance of the latter fixed-rate quantizer corresponds to the one found in [10]. It is also found that, for any given maximum uplink entropy constraint H̄ < log₂(L), the increase in maximum downlink capacity obtained by using a quantizer with more than L cells is relatively small. Moreover, our analysis also suggests that, for any given H̄, the maximum average downlink capacity is achieved using a quantizer with a finite number of cells. This contrasts with what is obtained for an exponentially distributed source with MSE as the distortion measure, wherein the optimal quantizer turns out to be uniform with infinitely many levels [16]. In the following section we present a precise model description, introduce some notation, and formally state the problem of interest. To illustrate some of the properties of this problem and its solutions, we first analyze the case N = 1 (two quantization cells), which can be solved explicitly, in Section III. We then extend the analysis to the case N > 1 in Section IV, where we introduce the numerical procedure to solve the problem in its generality. Section V shows and analyzes the results obtained with this procedure for the cases N = 2 and N = 3 under Rayleigh fading. Finally, Section VI draws conclusions.


Fig. 1. Transmitter and receiver connected by downlink and uplink channels.

II. PROBLEM FORMULATION

We consider a block-fading downlink additive white Gaussian noise (AWGN) channel, a transmitter, a receiver and an error-free, zero-delay uplink channel, as depicted in Fig. 1. In the transmitter, the binary message sequence W is mapped into consecutive blocks of K symbols. During each block b ∈ N, a real-valued sequence {x_b[k]}_{k=1}^{K} is transmitted over the downlink channel. The random block-channel gain magnitude for the b-th block, √g_b, is assumed constant within each block. Channel gains in consecutive blocks are i.i.d. according to a probability density function (PDF) satisfying the following:

Assumption 1: The PDF of the channel gain's squared magnitude, f_g, has the form

    f_g(u) = K_1 e^{−K_2 u} β(u),                                   (1)

for appropriate constants K_1, K_2 > 0, where the differentiable function β : R_0^+ → R_0^+ is such that the ratio (dβ(u)/du)/β(u) is non-increasing with respect to u over [0, ∞).

The structure of the PDF of g in (1) is fairly general. For example, if the channel gain magnitude is Ricean distributed, then the PDF of g has the form

    f_g(u) = (1/(2σ²)) e^{−(u+ν²)/(2σ²)} I_0(ν√u / σ²),            (2)

where I_0(·) is the modified Bessel function of the first kind of order zero. From direct comparison with (1), we obtain, for this case, K_1 = (1/(2σ²)) exp(−ν²/(2σ²)), K_2 = 1/(2σ²) and β(u) = I_0(ν√u / σ²). It can be verified (numerically) that the latter form of β(·) satisfies the conditions required by Assumption 1. Likewise, if channel gain magnitudes are governed by a Nakagami-m distribution, then the PDF of g takes the form

    f_g(u) = (m^m / (Γ(m) ω^m)) u^{m−1} e^{−(m/ω)u},               (3)

and we have K_1 = m^m/(Γ(m)ω^m), K_2 = m/ω and β(u) = u^{m−1}. If m ≥ 1, it is easy to verify that β(u) also satisfies the conditions required by Assumption 1.¹

¹The necessity of the condition upon β(·) in Assumption 1 will become evident in Lemma 1 (Section III), in which it allows us to prove the convexity of a function playing a key role in the problem under study.

Returning to Fig. 1, the real-valued random process n_b[k], k ∈ {1, ..., K}, is AWGN with sample variance N_0. Thus, if {n_b[k]}_{k=1}^{K} were the samples, taken at the Nyquist frequency, of continuous-time AWGN band-limited to B [Hz], then the two-sided PSD of the latter would be N_0. On the other hand, the information-bearing signal x_b[k] is subject to a per-block power constraint of the form

    (1/K) Σ_{k=1}^{K} x_b[k]² ≤ M,  ∀b ∈ N,                        (4)

where M > 0. With this constraint, if the block length K is large, then the maximum achievable data rate during any given block b can be well approximated by Shannon's capacity formula [20] as C_b = ln(1 + γ g_b) nats/s/Hz, where

    γ ≜ M / N_0                                                     (5)

is the mean SNR at the receiver for a channel power gain g with unit mean value.

At the other end of the downlink channel, the receiver is assumed to acquire a perfect estimate of g_b prior to (or at the beginning of) the b-th transmission block. This channel power gain is instantaneously quantized and entropy coded, with the resulting bits being sent over a zero-delay, error-free uplink channel. These assumptions about the feedback channel have been considered before in [7]–[10], [21]. The zero-delay condition can be expected to be a good approximation when the time spent feeding the quantized CSI back to the transmitter is much shorter than the duration of a downlink frame. In turn, it is possible to have an almost error-free feedback channel if the feedback SNR is sufficiently large and/or strong forward error correction is employed for the CSI bits. Naturally, if these conditions are not present in a given situation, then our results provide upper bounds on the achievable performance. As foreshadowed in the Introduction, a small entropy of the quantized CSI can be translated into using less average power, bandwidth or time to convey this CSI to the transmitter. At this point, it is perhaps worth noting that if only a single CSI realization is quantized and fed back at the beginning of each downlink block, then attaining these benefits may require one to match the channel coding and modulation scheme in the feedback link to the variable bitword lengths coming out of the entropy coder. For instance, placing an "off-the-shelf" channel coder and modulator in the feedback channel would yield an uplink that conveys only sequences of fixed-length data blocks. Such a choice would entail significant inefficiencies when transmitting variable-length bitwords, in comparison to sending fixed-length bitwords.
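To make (5) and the per-block rate C_b = ln(1 + γ g_b) concrete, here is a minimal simulation sketch, assuming (for illustration only) Rayleigh fading with unit-mean channel power gain and γ = 1 (0 dB):

```python
import math
import random

def block_rate(g, gamma):
    """Per-block maximum data rate C_b = ln(1 + gamma * g_b), in nats/s/Hz."""
    return math.log(1.0 + gamma * g)

random.seed(1)
gamma = 1.0                        # mean SNR, gamma = M / N0 (0 dB), illustrative
# Rayleigh-faded magnitude -> exponentially distributed power gain g, unit mean
gains = [random.expovariate(1.0) for _ in range(200_000)]
avg_rate = sum(block_rate(g, gamma) for g in gains) / len(gains)
# With perfect CSIT this sample mean approaches E[ln(1 + g)] for the chosen fading
```

The sample average converges to E[ln(1 + γg)], the rate that perfect CSIT would allow on average; the quantized-feedback schemes studied below necessarily achieve less.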


Fig. 2. Illustration of the i-th quantization cell.

However, in this scenario, wherein a single CSI realization is quantized and fed back at a time, the feedback channel coding and modulation can be chosen so as to handle variable-length bitwords (or the associated unequal-probability outcomes of the quantizer) as efficiently as is possible for fixed-rate quantizers. This can be done, e.g., by employing variable-length error-correcting codes [22] or joint source-channel coding (see, e.g., [23]–[25] and the references therein). Although the design of such coders and modulators is beyond the scope of this work, we illustrate this fact with an example (presenting a simple scheme similar in spirit to [26]), which can be found in Section VII-A in the Appendix.

Upon receiving the quantized CSI, the transmitter chooses a transmission data rate r_b from a discrete set of data rates. To define the quantizer and its reconstruction values, let N + 1 be the number of quantization intervals (or cells), and let {μ_i}_{i=0}^{N+1} denote the set of thresholds, where

    0 = μ_0 ≤ μ_1 ≤ ··· ≤ μ_N < μ_{N+1} = ∞.                       (6)

Define also the quantization cells V_i ≜ [μ_i, μ_{i+1}), i = 0, ..., N.²

²It is easy to show that restricting the quantization regions to be intervals entails no loss of optimality.

As in [10], whenever the channel power gain g_b falls within cell V_i, the transmitter outputs a codeword {x_b[k]}_{k=1}^{K} satisfying (4), belonging to the i-th codebook amongst N + 1 codebooks, one for each cell. This codebook is capacity-achieving for some nominal channel power gain u_i associated with the cell V_i, i.e.,

    g_b ∈ V_i ⟺ r_b = r_i ≜ ln(1 + γ u_i) nats/s/Hz, ∀i ∈ {0, ..., N}.   (7)

Thus, the power-gain levels {u_i}_{i=0}^{N} can be seen as the set of reconstruction points (or code-points) of the quantizer, as represented in Fig. 2. Since in each block the transmitter sends information over the downlink using a capacity-achieving code for a nominal channel gain u_i, all the transmitted bits are correctly decoded if g_b ≥ u_i. Otherwise, if g_b < u_i, then r_i is not supported by the channel, and the receiver declares an outage, discarding all the information received during the b-th block. From this, it is clear that a necessary condition for a set of reconstruction values to be optimal is

    u_i ∈ V_i,  ∀i = 0, ..., N.                                     (8)

Let the random variable u_b, taking values in {u_i}_{i=0}^{N} with probabilities Pr{u_b = u_i} = Pr{g_b ∈ V_i}, denote the output of the quantizer for the b-th block. As already mentioned in Section I, we focus on quantizers that satisfy an entropy constraint, which we now formally state as

    H(u_b) = −Σ_{i=0}^{N} Pr{u_b = u_i} ln Pr{u_b = u_i} ≤ H̄ nats/block,  ∀b ∈ N,   (9)


with H̄ ≥ 0 representing the maximum entropy allowed for the quantizer's output.

We are interested in finding the quantizer (i.e., the thresholds {μ_i}_{i=1}^{N} and reconstruction points {u_i}_{i=0}^{N}) satisfying the entropy constraint (9) and maximizing the average data rate in the downlink channel, defined as the average number of correctly decoded bits. The average number of correctly decoded bits (sometimes referred to as average "goodput" or average reliable throughput) is also the objective function in [7], [10], [12]. It is a reasonable figure of merit if one assumes that forward error correction followed by interleaving has been applied to the data sent over the downlink, so that, with high probability, downlink blocks lost due to outage do not cause irrecoverable errors. Otherwise, if all data is to be decoded correctly, ACKs and NACKs would have to be sent back over the uplink to request retransmission of lost data. As we shall see from the examples in Section V, at least for Rayleigh fading and for γ equal to 0 dB and 30 dB, the optimal entropy-constrained CSI quantizers are such that only the first cell (V_0) allows for outage events. This suggests that if sending ACKs and NACKs is necessary, then it would add at most Pr{g_b ∈ V_0} bits per CSI realization to the uplink data rate. (The extra bit rate is upper bounded by Pr{g_b ∈ V_0} bits/block because an ACK or NACK needs to be sent during block b + 1 only when g_b ∈ V_0, which means at most one extra bit every time g_b ∈ V_0.) Since fades in consecutive blocks are i.i.d. and ergodic, averages over a large number of blocks converge to ensemble averages. Thus, for notational simplicity, in what follows we drop the frame and channel-use indexes b and k.
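The entropy in (9) and the average goodput discussed above depend on the quantizer only through the cell probabilities and the CDF F_g. A small sketch, assuming for illustration Rayleigh fading with unit-mean power gain (so F_g(u) = 1 − e^{−u}) and arbitrary two-cell parameters:

```python
import math

F = lambda u: 1.0 - math.exp(-u)   # CDF of g for unit-mean Rayleigh fading

def cell_probs(mu):
    """Pr{g in V_i} for cells V_i = [mu_i, mu_{i+1}), with mu_0 = 0, mu_{N+1} = inf."""
    edges = [0.0] + list(mu) + [math.inf]
    return [F(b) - F(a) for a, b in zip(edges[:-1], edges[1:])]

def entropy_nats(p):
    """H(u_b) = -sum_i p_i ln p_i -- the quantity constrained by H_bar in (9)."""
    return -sum(q * math.log(q) for q in p if q > 0)

def expected_rate(mu, u, gamma):
    """Average goodput: rate r_i = ln(1 + gamma*u_i) is earned only when g >= u_i,
    i.e., with probability F(mu_{i+1}) - F(u_i); outage otherwise."""
    edges = [0.0] + list(mu) + [math.inf]
    return sum(math.log(1.0 + gamma * ui) * (F(edges[i + 1]) - F(ui))
               for i, ui in enumerate(u))

mu, u = [0.7], [0.3, 0.9]          # illustrative two-cell quantizer, not optimized
p = cell_probs(mu)
H, R = entropy_nats(p), expected_rate(mu, u, 1.0)
```

The optimization problem stated next searches over μ and u to maximize R subject to H ≤ H̄; the values above are just a feasible starting point.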
With this, the quantizer design problem can be stated as that of finding the thresholds {μ_i}_{i=1}^{N} and code-points {u_i}_{i=0}^{N} satisfying (6) and (8) that maximize the average data rate in the downlink channel without exceeding the entropy constraint in the uplink. More precisely, combining (6), (7), (8) and (9), we state the optimization problem in canonical form as

    minimize over {μ_i}_{i=1}^{N}, {u_i}_{i=0}^{N}:
        J ≜ Σ_{i=0}^{N} ln(1 + γ u_i) (F_g(u_i) − F_g(μ_{i+1}))            (10a)
    subject to:
        −Σ_{i=0}^{N} (F_g(μ_{i+1}) − F_g(μ_i)) ×                            (10b)
            ln(F_g(μ_{i+1}) − F_g(μ_i)) − H̄ ≤ 0                            (10c)
        μ_i − u_i ≤ 0,  ∀i ∈ {0, ..., N}                                    (10d)
        u_i − μ_{i+1} ≤ 0,  ∀i ∈ {0, ..., N}                                (10e)

with μ_0 = 0 and μ_{N+1} = ∞, and where F_g(u) ≜ ∫_0^u f_g(x) dx is the cumulative distribution function (CDF) of g. This optimization problem is difficult to solve primarily because the entropy constraint (10c) is non-convex. As we shall see, this leads to the existence of several local solutions, which, in principle, requires one to run an optimization program several times with a potentially large number of different starting values. In the following sections we solve this optimization problem, first explicitly for the case N = 1, and then numerically by means of Lagrangian optimization and a novel procedure which greatly reduces the overall complexity of the task.


III. PROBLEM SOLUTION FOR TWO CELLS

We now address the optimization problem stated in (10) for the case N = 1, corresponding to two quantization cells. In this case, (10) reduces to:

    minimize over μ_1, {u_i}_{i=0}^{1}:
        J = ln(1 + γ u_0)(F_g(u_0) − F_g(μ_1)) + ln(1 + γ u_1)(F_g(u_1) − 1)        (11a)
    subject to:
        −[F_g(μ_1) ln(F_g(μ_1)) + (1 − F_g(μ_1)) ln(1 − F_g(μ_1))] − H̄ ≤ 0        (11b)
        −u_0 ≤ 0,                                                                    (11c)
        u_0 − μ_1 ≤ 0,                                                               (11d)
        μ_1 − u_1 ≤ 0.                                                               (11e)

This problem can be solved explicitly, without using Lagrange multipliers, by noticing that the entropy associated with the two cells depends only on the threshold μ_1. Supposing μ_1 is given, the optimal value of u_0 is found by differentiating the objective function J with respect to it and equating to zero:

    ∂J/∂u_0 = 0 ⟺ F_g(μ_1) = ((1 + γ u_0)/γ) ln(1 + γ u_0) f_g(u_0) + F_g(u_0).   (12)

We see from this equation that for every u_0 ∈ R^+ there exists a unique μ_1 > u_0 for which u_0 minimizes J. On the other hand, in order to determine the optimal value of u_1 given μ_1, we notice that this value has to minimize the term ln(1 + γ u_1)(F(u_1) − 1) in (11a). Although, in general, such a value cannot be found explicitly, the following lemma guarantees that it is unique and that it can easily be found numerically:

Lemma 1: Let f be a PDF satisfying Assumption 1. Define F(u) ≜ ∫_0^u f(x) dx. Then, for any μ > 0, the function

    α(u) ≜ ln(1 + γ u)(F(u) − F(μ))                                                 (13)

is convex for all u ∈ [0, μ].

The proof of this lemma, which will play a key role later in Section IV, can be found in the Appendix, at the end of this document.

It turns out that the value of u which minimizes ψ(u, μ) ≜ ln(1 + γ u)[F(u) − F(μ)] increases monotonically with μ. More precisely, define the function U(μ) ≜ {ũ : (d/du) ψ(u, μ)|_{u=ũ} = 0}. Then, under Assumption 1, it holds that

    (d/dμ) U(μ) ≥ 0,  ∀μ ∈ (0, ∞).                                                  (14)

To prove this claim, suppose that (∂/∂u) ψ(u, μ) = 0 at u = u′. This means that

    0 = (d/du) ψ(u, μ)|_{u=u′}
      = (d/du)[ln(1 + γ u) F(u)]|_{u=u′} − (γ/(1 + γ u′)) F(μ)
      ≥ (d/du)[ln(1 + γ u) F(u)]|_{u=u′} − (γ/(1 + γ u′)) F(μ + Δ)
      = (d/du) {ln(1 + γ u)[F(u) − F(μ + Δ)]}|_{u=u′}

for any Δ ≥ 0, with the inequality being a consequence of the fact that F(·) is non-decreasing. Recalling from Lemma 1 that ψ(u, μ) is convex in u, we conclude that (∂/∂u) ψ(u, μ + Δ) becomes zero at a single value of u greater than or equal to u′. This proves that (d/dμ) U(μ) ≥ 0 holds for all μ ∈ (0, ∞).

The latter result implies that the optimal value for u_0 must belong to the interval (0, ξ(γ)), where ξ(γ) ≜ U(∞). Also, by applying Lemma 1 with μ = ∞, it is readily found that the unique value of u_1 that minimizes J in (11a) also equals ξ(γ); the convexity granted by Lemma 1 means that ξ(γ) can easily be obtained numerically by line search. Moreover, Lemma 1 implies that sgn(∂J/∂u_1) = sgn(u_1 − ξ(γ)), which leads to the conclusion that the optimal value of u_1 given μ_1 is

    u_1 = max{μ_1, ξ(γ)}.                                                           (15)

It then follows that, if u_0 is part of an optimal quantizer for some entropy constraint, then the optimal μ_1 can be explicitly obtained from u_0 by using (12), and then the optimal u_1 can be directly derived from μ_1 using (15). In this manner, by increasing u_0 from 0 to ξ(γ) and evaluating μ_1 and u_1 with the latter equations, one generates a family of quantizers containing the optimal quantizer for every value of H̄.

Fig. 3 (top) shows the curve of downlink capacity versus uplink entropy obtained by the method described in the previous paragraph, for two quantization cells, under an average SNR γ = 1 (0 dB, left) and γ = 1000 (30 dB, right). More precisely, for every u_0 ∈ {ℓ ξ(γ)/1000}_{ℓ=1}^{1000}, the corresponding μ_1 was calculated using (12). Then, for this value of μ_1, the optimal code-point u_1 was determined using (15). In this way, for each value of u_0, a different quantizer was obtained. The pair {downlink capacity, output entropy} associated with each quantizer yielded a single point in the top two graphs in Fig. 3. In this example, channel gain magnitudes are Rayleigh distributed, so g has an exponential PDF, chosen to yield unit mean, i.e., f_g(u) = e^{−u}, ∀u ∈ [0, ∞). Notice that, for every value of H̄ below the maximum, there exist two solutions to (12) and (15), corresponding to different local constrained minimizers of J in (11). There is one quantizer associated with each of these two solutions. One of these yielded the upper section of the curve in each of the plots shown in Fig. 3 (top), and the other one yielded the lower section. Of course, the optimal quantizers are those responsible for the upper part of their respective curves, i.e., those which yield the maximum downlink capacity for a given uplink entropy constraint. As expected, in general, downlink capacity grows when the entropy of the quantized CSI available to the transmitter (uplink entropy) is increased.

Note that the maximum downlink capacity for γ = 1 (SNR = 0 dB), 0.52347 bits/s/Hz, occurs at an entropy of 0.85644 bits/block, not coinciding with the maximum uplink entropy, which, for a two-cell quantizer, is 1 bit/block. This is to be expected for a quantizer with a fixed number of cells, since there is no reason why all cells being equally likely (the only situation in which the entropy is maximized) should yield the largest downlink capacity. Also, the maximum downlink capacities for a two-cell quantizer at 0 dB and 30 dB SNRs coincide with what was obtained in [10], where average downlink capacity was maximized without an entropy constraint on the quantized output.
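The construction just described (sweep u_0, obtain μ_1 from (12) and u_1 from (15)) can be sketched numerically. The sketch below assumes Rayleigh fading with unit-mean power gain, for which the stationarity condition defining ξ(γ) reduces to ln(1 + γu) = γ/(1 + γu); the tolerances and the chosen u_0 are arbitrary illustrative values.

```python
import math

gamma = 1.0                                 # mean SNR (0 dB), illustrative
f = lambda u: math.exp(-u)                  # Rayleigh fading: exponential pdf of g
F = lambda u: 1.0 - math.exp(-u)            # and its CDF

def xi(gamma, tol=1e-12):
    """xi(gamma) = minimizer of ln(1+gamma*u)*(F(u)-1): bisect on the (monotone)
    derivative, which for f(u)=exp(-u) reduces to
    ln(1+gamma*u) - gamma/(1+gamma*u) = 0."""
    g = lambda u: math.log(1.0 + gamma * u) - gamma / (1.0 + gamma * u)
    lo, hi = 0.0, 1.0
    while g(hi) < 0:                        # grow the bracket until it holds the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def quantizer_from_u0(u0, gamma):
    """One member of the family of two-cell quantizers: mu1 via (12), u1 via (15)."""
    F_mu1 = (1.0 + gamma * u0) / gamma * math.log(1.0 + gamma * u0) * f(u0) + F(u0)
    mu1 = -math.log(1.0 - F_mu1)            # invert the exponential CDF
    u1 = max(mu1, xi(gamma))                # equation (15)
    return mu1, u1

mu1, u1 = quantizer_from_u0(0.2, gamma)     # illustrative point of the sweep
```

Sweeping u_0 over (0, ξ(γ)) and recording the resulting entropy/goodput pairs reproduces, in principle, the kind of capacity-versus-entropy curve reported above.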


Fig. 3. Solutions to (12) and (15) (two quantization cells) for Rayleigh fading with unit-mean channel power gain. Top: Downlink capacity vs. uplink entropy under an average SNR γ = 1 (0 dB, left) and γ = 1000 (30 dB, right). Bottom: Thresholds and code-points for the optimal solution as a function of uplink entropy under an average SNR γ = 1 (0 dB, left) and γ = 1000 (30 dB, right).

At an average SNR of 30 dB, Fig. 3 (top right) shows that the maximum downlink capacity occurs closer to the maximum uplink entropy, and that equally likely cells, corresponding to this maximum entropy, are nearly optimal. Interestingly, the maximum-capacity curve at this SNR shows a breakpoint at an entropy of about 0.5 bits/block (where the curve crosses itself in Fig. 3, top right), which makes it non-concave. For this case, this implies that better performance for entropies below 0.8 bits/block can be achieved by time sharing between two quantizers: one with a single cell and zero entropy, and another with two cells and an entropy of about 0.88 bits/block. By choosing one regime more frequently than the other, it is possible to achieve a downlink capacity and an uplink entropy equal, respectively, to the weighted averages of the capacities and entropies of both regimes. In this manner, all capacity/entropy points within the convex hull of the capacity vs. entropy curves can be achieved.

Fig. 3 (bottom) shows the evolution of thresholds and code-points for the optimal solution as the entropy of the quantized output varies. For γ = 1 (0 dB), on the left, u_1 = μ_1 at all entropies. We see also that higher capacities are achieved by bringing thresholds and code-points closer together. For the 30 dB case, shown in Fig. 3 (bottom right), the breakpoint in capacities coincides with a change in the arrangement of thresholds and code-points in the quantizer. Except in a neighborhood to the right of this breakpoint, we find that the code-point u_1 coincides with its left-boundary threshold μ_1.

IV. PROBLEM SOLUTION FOR MORE THAN TWO QUANTIZATION CELLS

A. Preliminaries

The straightforward approach presented in the previous section cannot be extended directly to the case in which there are more than two quantization cells. Lagrangian optimization can be utilized instead, which, as we shall see, leads to a system of non-linear equations that must be solved numerically. The non-convexity of the optimization problem (10) and the existence of several local minimizers satisfying the constraints impose the need to solve this system of non-linear equations possibly


many times with different initial values. However, we will introduce an algorithm, in the same spirit as the strategy illustrated in the previous section, that allows one to simplify the Lagrangian optimization problem to a sequence of simple line-search problems, each with a single solution.

Before stating the Lagrangian associated with (10), we note from (10a) that, at the optimum, the inequality constraints stated in (10e) are not active. Similarly, constraint (10d) for the case i = 0 is not active, since increasing u_0 above μ_0 = 0 would raise the average downlink data rate without increasing the entropy of the quantized output. Taking the above observations into account, the Lagrangian associated with (10) adopts the following form:

L = −Σ_{i=0}^{N} ln(1 + γu_i) (F_g(μ_{i+1}) − F_g(u_i)) + λ ( −Σ_{i=0}^{N} [F_g(μ_{i+1}) − F_g(μ_i)] ln(F_g(μ_{i+1}) − F_g(μ_i)) − H̄ ) + Σ_{i=1}^{N} ψ_i (μ_i − u_i),   (16)

with F_g(μ_0) = 0, F_g(μ_{N+1}) = 1, and where λ and {ψ_i}_{i=1}^{N} are Lagrange multipliers.

Differentiating L with respect to u_0 and equating to zero gives

∂L/∂u_0 = −(γ/(1 + γu_0)) (F_g(μ_1) − F_g(u_0)) + ln(1 + γu_0) f_g(u_0) = 0.   (17a)

Notice that (17a) is identical to (12), which implies that u_0 is a solution to (17a) only if

u_0 ∈ (0, ξ(γ)].   (17b)

Differentiating L with respect to the other code-point values and equating to zero yields

∂L/∂u_j = −(γ/(1 + γu_j)) (F_g(μ_{j+1}) − F_g(u_j)) + ln(1 + γu_j) f_g(u_j) − ψ_j = 0,  1 ≤ j ≤ N.   (17c)

Hence, we obtain N + 1 non-linear equations, which must hold simultaneously. On the other hand, differentiating the Lagrangian with respect to the thresholds, we obtain

∂L/∂μ_j = −ln(1 + γu_{j−1}) f_g(μ_j) − λ f_g(μ_j) ln( (F_g(μ_j) − F_g(μ_{j−1})) / (F_g(μ_{j+1}) − F_g(μ_j)) ) + ψ_j = 0,  1 ≤ j ≤ N,   (17d)

yielding N additional equations to be satisfied simultaneously. Finally, the Karush-Kuhn-Tucker (KKT) conditions [27], [28] provide another set of N + 1 equations,

λ ( −Σ_{i=0}^{N} (F_g(μ_{i+1}) − F_g(μ_i)) ln(F_g(μ_{i+1}) − F_g(μ_i)) − H̄ ) = 0,   (17f)

ψ_j (μ_j − u_j) = 0,  1 ≤ j ≤ N,   (17g)

plus the requirement that

λ ≥ 0,   (17h)

ψ_j ≥ 0,  ∀j = 1, . . . , N,   (17i)

all to be satisfied simultaneously. Although it is possible to solve this system of 3N + 2 non-linear equations by standard numerical algorithms, the existence of numerous local minima requires one to apply these algorithms repeatedly, each time with different initial values. This shortcoming is worsened by the fact that the vector of initial values (Lagrange multipliers plus threshold and code-point values) lies in a (3N + 2)-dimensional space, which implies that a large number of initial guesses is required to obtain a “reasonably good” coverage of the search space. In the following section we will show how these inconveniences can be avoided by using an approach similar to the one presented in Section III. More precisely, we derive a method which, for this problem, allows one to find all local minima in a systematic, sequential manner, greatly reducing the number of required computations.

B. An Efficient Algorithm for More than Two Cells

In this section we exploit the recursive nature of (17) to reduce its induced system of non-linear equations to a sequence of line-search problems over bounded intervals, in a spirit similar to the one behind the approach followed in Section III. To begin with, recall that the KKT conditions imply that, for every constrained (local) minimizer of (10a), the multiplier ψ_j > 0 only if u_j = μ_j (i.e., only if the associated constraint is active). The next corollary of Lemma 1 provides an easy-to-verify condition for u_j = μ_j to hold in such a minimizer:

Corollary 1: Let f_g be a PDF satisfying Assumption 1. Suppose {u_i}_{i=0}^{N}, {μ_i}_{i=1}^{N} are a solution to optimization problem (10). Then u_j = μ_j holds for some j ∈ {1, . . . , N} if and only if

(d/du) [ −ln(1 + γu) (F_g(μ_{j+1}) − F_g(u)) ] |_{u=μ_j} ≥ 0.   (18)

Proof: The result follows directly from Lemma 1, since it implies the convexity of the function −ln(1 + γu)(F_g(μ_{j+1}) − F_g(u)), and from (10a), upon recalling that the entropy of the quantized output does not depend on the choice of code-point values.

Corollary 1 will allow us to find u_j and μ_{j+1} from λ, u_{j−1}, μ_j and μ_{j−1} in a simple manner. For this purpose, define, for j = 1, . . . , N,

ρ_j(u_j, θ_{j+1}) ≜ ln(1 + γu_j) f_g(u_j) − (γ/(1 + γu_j)) (1 − F_g(u_j)) + (γ/(1 + γu_j)) θ_{j+1},   (19a)

s_j(θ_{j+1}) ≜ ln(1 + γu_{j−1}) f_g(μ_j) + λ f_g(μ_j) ln( (θ_{j−1} − θ_j) / (θ_j − θ_{j+1}) ),   (19b)

where

θ_j ≜ 1 − F_g(μ_j)   (19c)

is the complementary CDF of g evaluated at μ_j. With these functions, the combined conditions (17c), (17d), (17g) and (17i) can be written in the following equivalent form:

ψ_j = ρ_j(u_j, θ_{j+1}) = s_j(θ_{j+1}) ≥ 0,  j = 1, . . . , N,   (20a)

ψ_j (μ_j − u_j) = 0,  j = 1, . . . , N.   (20b)
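The two functions in (19) are straightforward to evaluate numerically. Below is a minimal sketch which assumes, purely for illustration, a unit-mean exponential power gain, i.e., F_g(x) = 1 − e^{−x} (the Rayleigh-fading case used later in Section V); all numeric parameter values are placeholders:

```python
import math

def fg(x):
    """PDF of the channel power gain g; exponential (Rayleigh fading) assumed."""
    return math.exp(-x)

def Fg(x):
    """CDF of g."""
    return 1.0 - math.exp(-x)

def rho(u_j, theta_next, gamma):
    """(19a): affine and increasing in theta_{j+1}, with slope gamma/(1+gamma*u_j)."""
    a = gamma / (1.0 + gamma * u_j)
    return math.log(1.0 + gamma * u_j) * fg(u_j) - a * (1.0 - Fg(u_j)) + a * theta_next

def s(theta_next, u_prev, mu_j, theta_prev, lam, gamma):
    """(19b): convex and increasing in theta_{j+1}; theta_j follows from (19c)."""
    theta_j = 1.0 - Fg(mu_j)
    return fg(mu_j) * (math.log(1.0 + gamma * u_prev)
                       + lam * math.log((theta_prev - theta_j) / (theta_j - theta_next)))
```

A solution to (20a) is a value of θ_{j+1} at which the two curves meet at a non-negative height, which is precisely what Fig. 4 depicts.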

Fig. 4. Plots of ρ_j(μ_j, θ_{j+1}) and s_j(θ_{j+1}) (defined in (19)), as functions of θ_{j+1}. Left: a case in which ρ_j(μ_j, θ_{j+1}^{(0)}) > 0. Right: a case in which ρ_j(μ_j, θ_{j+1}^{(0)}) < 0.

Figure 4 shows a (qualitative) description of ρ_j(μ_j, θ_{j+1}) and s_j(θ_{j+1}) as functions of θ_{j+1}. It can be seen that both functions are monotonically increasing, the first one being affine, the second one convex. A look at Fig. 4 immediately suggests that, for any given j, and depending on the values of the parameters λ, γ, u_{j−1}, μ_{j−1} and μ_j, there will be, in general, more than one pair of values of u_j and μ_{j+1} satisfying (20). Let us find out which solutions actually exist by first considering the conditions under which the constraint u_j ≥ μ_j is active or inactive:

• Inactive constraint: In this case, u_j > μ_j, which implies ψ_j = 0 (see (20b)). In view of (20), this is equivalent to having

0 = s_j(θ_{j+1}) = ρ_j(u_j, θ_{j+1}).   (21)

The first equality of (21) is satisfied by a unique value of the argument of s_j(·), say θ_{j+1}^{(0)}, shown in Fig. 4. From (19b), this value is given explicitly by

θ_{j+1}^{(0)} = θ_j − (θ_{j−1} − θ_j) (1 + γu_{j−1})^{1/λ}.   (22)

From the definition of θ_{j+1}, this quantity must be non-negative. On the other hand, Corollary 1 implies that u_j > μ_j if and only if

ρ_j(μ_j, θ_{j+1}^{(0)}) < 0,   (23)

a situation exemplified and illustrated in Fig. 4 (right). In addition, Lemma 1 states that, if this inequality is satisfied, then there exists a unique u_j satisfying the second equality of (21). Thus, there exists a solution to (20) for which the constraint u_j ≥ μ_j is inactive if and only if θ_{j+1}^{(0)} ≥ 0 and (23) holds. In this case, we say solution (0) exists for j.

• Active constraint: In this case, u_j = μ_j, and ψ_j can be positive. By looking at (20), a solution satisfying this condition exists if and only if there is θ_{j+1} for which

ρ_j(μ_j, θ_{j+1}) =(a) s_j(θ_{j+1}) ≥ 0,   (24)

which corresponds to intersections of the plots of ρ_j(μ_j, ·) and s_j(·) occurring in the first quadrant of Fig. 4. Since, with respect to θ_{j+1}, the function ρ_j(μ_j, θ_{j+1}) is affine and increasing, and the function s_j(θ_{j+1}) is convex and monotonically increasing, it follows that (24) is satisfied for either none, one or two values of θ_{j+1}. Indeed, these properties imply that two solutions to equality (a) in (24) will exist if and only if

ρ_j(μ_j, θ_{j+1}^{*}) > s_j(θ_{j+1}^{*}),   (25)

where

θ_{j+1}^{*} ≜ θ_j − ((1 + γμ_j)/γ) λ f_g(μ_j)   (26)

is the unique value of x at which ∂ρ_j(μ_j, x)/∂x = ∂s_j(x)/∂x; see Fig. 4. It is also easy to show that if (25) holds, these two solutions, say θ_{j+1}^{(1)} and θ_{j+1}^{(2)}, will lie at opposite sides of θ_{j+1}^{*}. Also, it is straightforward to verify that both solutions are not larger than θ_j. Therefore, θ_{j+1}^{(1)} ≤ θ_{j+1}^{*} ≤ θ_{j+1}^{(2)} ≤ θ_j. Returning to our original problem, each solution to equality (a) in (24) will also be a solution to (20) if and only if it is non-negative and it yields a non-negative value for s_j(·). This allows one to discard solution θ_{j+1}^{(1)} if ρ_j(μ_j, θ_{j+1}^{(0)}) > 0 or if s_j(θ_{j+1}^{*}) < 0. Conversely, if these two conditions do not hold, solution θ_{j+1}^{(1)} will be non-negative and yield ρ_j(μ_j, θ_{j+1}^{(1)}) ≥ 0, i.e., it will be a valid solution to (20). In this case, it is easy to verify that

θ_{j+1}^{(1)} ≥ θ_{j+1}^{c} ≜ θ_j − ((1 + γμ_j)/γ) ln(1 + γμ_j) f_g(μ_j),   (27)

a situation which is illustrated in Fig. 4 (right).

Making use of all these conditions, and letting μ_{j+1}^{(i)}, with i = 0, 1, 2, be the threshold value associated with θ_{j+1}^{(i)}, we can devise the following procedure to find the solutions to (20) for a single, given j:

Procedure 1: Suppose λ ≥ 0, and that u_{j−1}, μ_{j−1} and μ_j are given, with 0 ≤ μ_{j−1} ≤ u_{j−1} ≤ μ_j. Then, in order to find the solutions to (20) for j:

1) Calculate θ_{j+1}^{(0)}, θ_{j+1}^{*} and θ_{j+1}^{c} explicitly using (22), (26), and (27), respectively.
2) If ρ_j(μ_j, θ_{j+1}^{*}) ≥ s_j(θ_{j+1}^{*}):
   a) Find θ_{j+1}^{(2)} by solving equality (a) in (24) with respect to θ_{j+1} by line search over [θ_{j+1}^{*}, θ_j]. If ρ_j(μ_j, θ_{j+1}^{(2)}) ≥ 0, set u_j = μ_j and θ_{j+1} = θ_{j+1}^{(2)}, in which case solution (2) exists.
   b) If ρ_j(μ_j, θ_{j+1}^{*}) > s_j(θ_{j+1}^{*}) and s_j(θ_{j+1}^{*}) > 0 and ρ_j(μ_j, θ_{j+1}^{(0)}) ≤ 0, find θ_{j+1}^{(1)} by solving equality (a) in (24) by line search over (θ_{j+1}^{c}, θ_{j+1}^{(0)}) (unless ρ_j(μ_j, θ_{j+1}^{(0)}) = 0, in which case θ_{j+1}^{(1)} = θ_{j+1}^{(0)}). Set u_j = μ_j and θ_{j+1} = θ_{j+1}^{(1)}. In this case, solution (1) exists.
3) If ρ_j(μ_j, θ_{j+1}^{(0)}) < 0, find u_j by solving ρ_j(u_j, θ_{j+1}^{(0)}) = 0 (cf. (21)) by line search over (μ_j, μ_{j+1}^{(0)}) and set θ_{j+1} = θ_{j+1}^{(0)}. In this case, solution (0) exists.
4) If j < N, discard solutions in which θ_{j+1} < 0.

The last step, where solutions yielding θ_{j+1} < 0 are discarded whenever j < N, responds to the fact that θ_{j+1} is a complementary CDF value. This requirement is dropped only if one is calculating the last threshold (i.e., when j = N), to allow a higher-level routine to iteratively adjust λ so that θ_{N+1} = 0. (Such a routine is exemplified by Procedure 3 below.) Of course, unless j = N, solution (0) can be discarded before doing the corresponding line search if θ_{j+1}^{(0)} < 0. The same applies to solution (1) if θ_{j+1}^{*} < 0.

In each of the line searches mentioned in Procedure 1, there exists a single solution over the corresponding search interval, since, in all cases, the involved function is monotonic within it. This makes each step straightforward to execute. Notice also that for every j ≥ 1, and depending on the values of u_{j−1}, μ_{j−1} and μ_j, there are between zero and three pairs of values for u_j, μ_{j+1}, namely solutions (0), (1) and (2), which satisfy (20) for that j.

The above procedure can be applied sequentially to find all pairs u_j, μ_{j+1}, for j = 1, 2, . . . , N, that satisfy (20). More precisely, one can first choose values for u_0 ∈ (0, ξ(γ)] and λ ∈ R_0^+, with which one can calculate μ_1 explicitly from (17a). Then, setting j = 1, Procedure 1 can be carried out to find at most three pairs of values for u_1, μ_2 satisfying (20) for j = 1. Each solution can be considered a branch in a tree structure. Increasing j by one and repeating the procedure for each emerging branch, until j = N, yields the complete tree of valid solutions, each of which is associated with a path in the tree.
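Step 1 of Procedure 1 relies on the closed form (22). A small sketch of it, again assuming an exponential power gain for illustration (so that θ = e^{−μ}); the numeric values of λ, γ, u_{j−1}, μ_{j−1} and μ_j are placeholders:

```python
import math

def theta0(theta_prev, theta_j, u_prev, lam, gamma):
    """(22): the root of s_j, i.e. the candidate theta_{j+1} for solution (0)."""
    return theta_j - (theta_prev - theta_j) * (1.0 + gamma * u_prev) ** (1.0 / lam)

def s(theta_next, theta_prev, theta_j, u_prev, lam, gamma, f_mu):
    """(19b), with fg(mu_j) passed in as f_mu."""
    return f_mu * (math.log(1.0 + gamma * u_prev)
                   + lam * math.log((theta_prev - theta_j) / (theta_j - theta_next)))

gamma, lam, u_prev = 1.0, 2.0, 0.3                     # placeholder parameters
theta_prev, theta_j = math.exp(-0.5), math.exp(-1.0)   # mu_{j-1} = 0.5, mu_j = 1
t0 = theta0(theta_prev, theta_j, u_prev, lam, gamma)
# s_j vanishes at t0, which is the first equality in (21).
```

Substituting (22) back into (19b) makes the logarithm equal to −(1/λ)·ln(1 + γu_{j−1}), so s_j(θ_{j+1}^{(0)}) cancels exactly.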
Thus, for each choice of u_0 and λ, there can be at most 3^N different paths, each associated with a sequence of code-points and thresholds satisfying (20) for all j = 1, 2, . . . , N. However, we shall see in the following that the number of valid paths obtained in practice is much smaller than 3^N. Following the notation for solutions adopted in Procedure 1, we label each solution path in the tree using the numbers of the solution types associated with its segments, thus referring to a sequence ℓ = {ℓ_j}_{j=1}^{N} ∈ {0, 1, 2}^N. For a given choice of u_0 and λ, we define the set of valid solution paths as

Ω(u_0, λ) ≜ { ℓ ∈ {0, 1, 2}^N : ℓ is a valid solution path for u_0, λ }.   (28)

It is important to note that there is a one-to-one relationship between every solution {u_j, μ_{j+1}}_{j=1}^{N} satisfying (20) for a given choice of u_0, λ and every path in Ω(u_0, λ). Indeed,

the path associated with any such solution can be easily determined by analyzing each of its code-point/threshold pairs using the reasoning that led to Procedure 1. Now, suppose one wants to know whether a given path ℓ is associated with a valid solution to (20) and then find this solution. Instead of applying Procedure 1 to find the entire tree of valid paths and then checking whether ℓ is one of them, one can apply the following algorithm, derived directly from Procedure 1, for this purpose:

Procedure 2: Let u_0 ∈ (0, ξ(γ)], λ ≥ 0 and the corresponding μ_1 given by (17a). Let ℓ ∈ {0, 1, 2}^N be a path. Then the following steps can be taken to determine whether ℓ ∈ Ω(λ, u_0) and to find its associated thresholds and code-points:

1) Set j = 1.
2) Calculate θ_{j+1}^{(0)}, θ_{j+1}^{*} and θ_{j+1}^{c} explicitly using (22), (26), and (27), respectively.
3) If ℓ_j = 0:
   a) If {θ_{j+1}^{(0)} ≥ 0 or N = j} and ρ_j(μ_j, θ_{j+1}^{(0)}) < 0: set μ_{j+1} = μ_{j+1}^{(0)} and calculate u_j by solving ρ_j(u_j, θ_{j+1}^{(0)}) = 0 by line search over (μ_j, μ_{j+1}^{(0)}).
   b) Else, declare ℓ ∉ Ω(λ, u_0) and quit.
4) Else, if ℓ_j = 1:
   a) If ρ_j(μ_j, θ_{j+1}^{(0)}) < 0 and ρ_j(μ_j, θ_{j+1}^{*}) > s_j(θ_{j+1}^{*}) > 0 and {θ_{j+1}^{*} ≥ 0 or N = j}: calculate θ_{j+1}^{(1)} by solving equality (a) in (24) by line search over (θ_{j+1}^{c}, θ_{j+1}^{(0)}). If {θ_{j+1}^{(1)} ≥ 0 or j = N} and s_j(θ_{j+1}^{(1)}) ≥ 0, output the solution u_j = μ_j, θ_{j+1} = θ_{j+1}^{(1)}. Else, declare ℓ ∉ Ω(λ, u_0) and quit.
   b) Else, declare ℓ ∉ Ω(λ, u_0) and quit.
5) Else (if ℓ_j = 2):
   a) If ρ_j(μ_j, θ_{j+1}^{*}) ≥ s_j(θ_{j+1}^{*}): calculate θ_{j+1}^{(2)} by solving equality (a) in (24) by line search over (θ_{j+1}^{*}, θ_j]. If {θ_{j+1}^{(2)} ≥ 0 or j = N} and s_j(θ_{j+1}^{(2)}) ≥ 0, output the solution u_j = μ_j, θ_{j+1} = θ_{j+1}^{(2)}. Else, declare ℓ ∉ Ω(λ, u_0) and quit.
   b) Else, declare ℓ ∉ Ω(λ, u_0) and quit.
6) If j = N, declare ℓ ∈ Ω(λ, u_0) and quit. Else, set j = j + 1 and go to step 2).

With the above procedure, the resulting values of all thresholds and code-points are a function of ℓ, u_0 and λ. For convenience, we denote this function by ϕ(ℓ, u_0, λ), defined as

ϕ(ℓ, u_0, λ) ≜ {u_i, μ_{i+1}}_{i=0}^{N} from Procedure 2, if ℓ ∈ Ω(λ, u_0); and ϕ(ℓ, u_0, λ) ≜ †, otherwise,

where we have chosen † as a special symbol to indicate that the path ℓ does not yield a valid solution. For future use, we also define

φ(ℓ, u_0, λ) ≜ μ_{N+1} from ϕ(ℓ, u_0, λ).   (29)

Although being a solution to (20) is a necessary condition for code-points and thresholds to be a solution to (17), two


additional conditions must be satisfied for sufficiency. The first one is the entropy constraint, expressed in the KKT equation (17f). The second is the construction condition μ_{N+1} = ∞ or, equivalently, θ_{N+1} = 0. For a given path ℓ, u_0 and λ, this condition can be expressed as φ(ℓ, u_0, λ) = 0. Therefore,

G ≜ { (ℓ, u_0, λ) : ℓ ∈ {0, 1, 2}^N, u_0 ∈ (0, ξ(γ)], λ ≥ 0, φ(ℓ, u_0, λ) = 0 }   (30)

is the set of all paths ℓ, values of the first code-point u_0 and Lagrange multipliers λ associated with solutions to (17) for some entropy constraint, while S ≜ {ϕ(ℓ, u_0, λ) : (ℓ, u_0, λ) ∈ G} corresponds to the set of all such solutions. Finally, by defining h({u_i, μ_{i+1}}_{i=0}^{N}) and c({u_i, μ_{i+1}}_{i=0}^{N}), respectively, as the entropy and capacity associated with {u_i, μ_{i+1}}_{i=0}^{N}, the original optimization problem (10) can be stated as

maximize_{(ℓ, u_0, λ) ∈ G : h(ϕ(ℓ, u_0, λ)) ≤ H̄}  c(ϕ(ℓ, u_0, λ)).   (31)
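Evaluating h(·) and c(·) for a candidate set of thresholds and code-points is straightforward. The sketch below assumes, for illustration only, a unit-mean exponential power gain (F_g(x) = 1 − e^{−x}), and follows the rate term of (16), in which transmission in cell i succeeds only when g ≥ u_i:

```python
import math

def Fg(x):
    """CDF of the power gain; exponential (Rayleigh fading) assumed."""
    return 1.0 if x == float("inf") else 1.0 - math.exp(-x)

def h_and_c(u, mu_inner, gamma):
    """u = [u_0, ..., u_N]; mu_inner = [mu_1, ..., mu_N]; mu_0 = 0 and
    mu_{N+1} = infinity are implicit. Returns (uplink entropy in bits/block,
    expected downlink rate in bits/s/Hz)."""
    edges = [0.0] + list(mu_inner) + [float("inf")]
    cell_probs = [Fg(edges[i + 1]) - Fg(edges[i]) for i in range(len(u))]
    h = -sum(p * math.log2(p) for p in cell_probs if p > 0.0)
    c = sum(math.log2(1.0 + gamma * u[i]) * (Fg(edges[i + 1]) - Fg(u[i]))
            for i in range(len(u)))
    return h, c

# A two-cell (N = 1) example with placeholder thresholds and code-points.
h, c = h_and_c(u=[0.5, 2.0], mu_inner=[1.0], gamma=1.0)
```

This is exactly the pair of quantities that step 3 of Procedure 3 below evaluates for each candidate (ℓ, u_0, λ).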

Thus, we have reduced the problem from solving a set of 3N + 2 non-linear equations over a (3N + 2)-dimensional space to a moderate number of searches (one per valid path) over two dimensions, requiring the evaluation of a scalar function that takes N line searches to compute. The following is a procedure that can be utilized to solve (31):

Procedure 3:
1) Define a grid of values of u_0 over (0, ξ(γ)), say U, and a grid of values of λ ∈ [0, λ_max], say Λ, for some λ_max.
2) For each u_0 in U:
   a) For each λ in Λ, run Procedure 1 recursively to find all valid paths.
   b) Detect pairs of consecutive values of λ between which a sign change of θ_{N+1} occurs for some valid path. In each of the intervals formed by such pairs, find the value of λ for which θ_{N+1} = 0 by line search, running Procedure 2 for the corresponding path.
3) Calculate h(ϕ(ℓ, u_0, λ)) and c(ϕ(ℓ, u_0, λ)) for each of the (ℓ, u_0, λ) values found in the previous step. Select the combination that yields the largest capacity with an entropy not greater than H̄.

In the following section we present an example in which Procedure 3 was utilized to find the solution to (31) for N = 3.

V. Example

In this section we apply Procedure 3 to find the set G (see (30)) for the case N = 3, i.e., for quantizers having four cells, assuming Rayleigh fading, with √g being unit-mean. The latter set contains the paths ℓ and the values of u_0 and λ that characterize a solution to (17), i.e., quantizer thresholds and code-points that yield a local maximum (or minimum) downlink capacity for some fixed maximum uplink entropy constraint H̄. The obtained results are presented in Fig. 5, for γ = 1 (SNR = 0 dB), on the left, and for γ = 1000 (SNR = 30 dB), on the right.

The two top graphics in this figure correspond to downlink capacity v/s uplink entropy plots for all solutions to (17). It can be seen that, although for each SNR there exist several such solutions, for each entropy constraint the number of these solutions is significantly smaller than 3^N = 27. For the 0 dB case, the absolute maximum downlink capacity is 0.6466 bits/s/Hz, attained with an uplink entropy of 1.842 bits/block, by a solution with associated path ℓ = (1, 1, 1). The curve that yields this maximum ends at that point and, to the eye, it seems as if there were a missing segment connecting it to the curve that reaches the right boundary of the plot, at an entropy of 2 bits/block. The absence of this segment can be attributed to the fact that, in the Lagrangian formulation of the problem, the uplink entropy is an inequality (and not an equality) constraint.

The solution yielding the maximum downlink capacity for a given uplink entropy is given by the highest curve in each of the capacity v/s entropy plots. Interestingly, for entropies somewhat below log2(3) and log2(2) bits/block, the highest-capacity curve coincides with the optimal solution with 3 and 2 quantization cells, respectively. This can be seen by noticing from the bottom plots in Fig. 5 that, slightly below these entropy values, the optimal solution is such that one, two or three thresholds, respectively, tend to infinity, effectively leaving three, two and then one cell as the entropy is decreased.³ Such behaviour suggests that, for a finite uplink entropy constraint, the maximum downlink capacity over all quantizers is achieved with a quantizer having a finite number of cells.

As already observed for the same SNR and two cells, the region of achievable downlink average capacities is given by the convex hull formed by all the curves in the capacity v/s entropy plots. Unlike what is observed for γ = 1, since the composite maximum capacity curve for γ = 1000 (Fig.
5, top right) is not concave, achieving the maximum downlink capacity for some entropy constraint values requires time-sharing between two regimes.

A. Comparison with Fixed-Rate, Cardinality-Constrained Quantization

The point of maximum capacity for each number of quantization levels corresponds to the solution obtained when a quantizer with that number of levels is optimized for maximum average downlink throughput without an entropy constraint. Therefore, those peak points are the solutions found in [10] for single-layer coding, where the quantizer was optimized under a cardinality constraint only. From this fact, we can conclude from Fig. 5 (top left) that at SNR = 0 dB, and for the same average uplink rate of an optimal fixed-rate quantizer from [10] with two levels (that is, 1 bit per CSI realization), which achieves 0.52347 bits/s/Hz, an optimal entropy-coded quantizer with 4 quantization cells yields (approximately) 0.56 bits/s/Hz. This represents an increase of roughly 8% in downlink average throughput for the same average uplink rate. The corresponding increase with respect to a fixed-rate quantizer with 3 cells goes from 0.6 bits/s/Hz (at a fixed

³We note that any number of the available cells can be shrunk into single points or, more generally, zero-probability cells. Therefore, the set of solutions achievable with a given number of cells includes all the solutions attainable with fewer cells.

Fig. 5. Downlink capacity v/s quantized output entropy for the solutions to (17), for Rayleigh fading with unit-mean channel power gain, using up to 4 quantization cells (N ≤ 3). Left: average SNR γ = 1 (0 dB). Right: average SNR γ = 1000 (30 dB).

rate of log2 3 = 1.585 bits/CSI realization) to (approximately) 0.63 bits/s/Hz (using an optimal entropy-coded quantizer with 4 cells). For an SNR of 30 dB, Fig. 5 (top right) reveals that an optimal entropy-constrained quantizer provides smaller gains over fixed-rate optimal quantizers. Notice also that fixed-rate quantizers can only operate at a limited set of uplink rates (given by log2 N bits/CSI realization). Thus, variable-rate entropy-coded quantization is the only scheme that allows one to send quantized CSI using other average uplink rates (for example, rates below 1 bit/CSI realization).

VI. Conclusions

We have proposed a numerical procedure to find the maximum downlink average capacity over block-fading channels, under a fixed per-block power constraint, when the receiver has perfect CSI and an error-less, delay-free, entropy-constrained uplink channel is available to convey quantized CSI to the transmitter. This procedure, which has a smaller numerical complexity than trying to directly solve the Lagrangian equations associated with the problem, also yields the quantizer thresholds and code-points that achieve the optimal solution. Our results are valid for a broad class of channel fade distributions, including Nakagami and Rician fading. We have applied the procedure proposed here to find optimal quantizers having 2 and 4 quantization cells. The obtained

results revealed that, for a given number of quantization cells, say L, maximum capacity is achieved at an uplink entropy slightly below log2(L) bits/block. Furthermore, our results show that for any uplink entropy below log2(L) bits/block, there is little to be gained (in average downlink capacity) by using more than L quantization cells or intervals. This suggests that for any finite uplink entropy, the optimal quantizer has a finite number of quantization intervals. Our analysis also revealed that for high average SNRs, achieving the maximum average downlink capacity requires time sharing between two regimes with different uplink entropies and associated capacities.

As a final remark, we would like to mention that, after several attempts, the authors have found that the results and strategies developed here for a short-term power constraint do not seem to be applicable to the long-term power-constrained version of the problem. Indeed, the latter problem appears to be significantly harder to solve than both the one addressed in this paper and the long-term problem without the uplink entropy constraint.

VII. Appendix

A. Efficient Channel Coding for the Output of an Entropy-Coded Quantizer

Here we provide an example to illustrate how a discrete random source with small entropy (which can be associated with


the variable-length bitwords coming out of an entropy coder) can be efficiently transmitted using a matched channel coder. The latter coder is able to transmit the low-entropy source using less power and with a smaller message error probability than what is obtained when transmitting equiprobable symbols using 4-QAM modulation and maximum-likelihood decoding.

For this purpose, suppose we have two quantizers, each with four quantization cells. The first quantizer is not entropy coded, and each of its outcomes, represented using two bits, has equal probability. For simplicity, suppose that this quantizer is followed by a rate-1/2 error-correcting channel coder and that 4-QAM is utilized. Assume each symbol in the 4-QAM constellation has unit energy and that there is a memoryless channel in the uplink, with complex circularly symmetric white Gaussian noise with variance σ_n². Therefore, each quantized outcome (or message) is mapped into channel-symbol sequences of length two (that is, two consecutive 4-QAM symbols are sent for each message), which yields an average energy of 2. At the other end, the decoding is done by picking the most likely transmitted symbol sequence given the received signal. For this scheme, it can be found (either analytically or via simulations) that at an SNR of 10 dB, the message error probability is approximately 0.9 × 10⁻³.

For the entropy-coded quantizer, suppose that its four possible outcomes, say m1, m2, m3, m4, have probabilities 1/2, 1/4, 1/8 and 1/8, respectively. An entropy coder for this quantizer (for example, a Huffman coder) would output 1, 2, 3 and 3 bits, respectively, for each of its outcomes. In order to send these four outcomes (or messages) over the feedback channel, consider a time-varying digital modulator generating symbols from the constellations shown in Fig. 6 (top). Each of these symbols (except symbol o) has unit energy.
An outcome from the entropy-coded quantizer is fed back to the transmitter by sending a sequence of three channel symbols, each coming from the first, second and third constellations, as specified in the table of Fig. 6. Notice that messages with higher probability are mapped to symbol sequences with smaller total energy, the latter being proportional to the length of the bitwords yielded by a Huffman entropy coder (i.e., proportional to −log2(p_{m_i}), where p_{m_i} is the probability of the i-th message or quantizer outcome). This corresponds to the sequence-length (or energy) distribution that minimizes average energy, which in this case equals 7/4. Therefore, the average energy required by the channel coding scheme for the entropy-coded quantizer is just 7/8 of the mean energy associated with the fixed-rate quantizer. On top of this power-reduction gain, if this “variable-rate” scheme matched to the entropy-coded quantizer is combined with a maximum-likelihood sequence decoder at an SNR of 10 dB, then each message is decoded with a message error probability not greater than 0.6 × 10⁻³. This shows that if the outcomes of the CSI quantizer have uneven probabilities (which amounts to a smaller entropy than with equiprobable outcomes), then it is possible to transmit the quantized CSI using less power and with a smaller probability of error than when transmitting the outcomes of a fixed-rate quantizer.
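The 7/4 figure follows directly from matching sequence lengths to message probabilities; a minimal check (the mapping itself is the one tabulated in Fig. 6):

```python
# Message m_i is carried by -log2(p_i) unit-energy symbols (the remaining
# slots carry the zero-energy symbol o), mirroring the Huffman word lengths.
probs = [0.5, 0.25, 0.125, 0.125]      # P(m1), ..., P(m4)
lengths = [1, 2, 3, 3]                 # energetic symbols per message

avg_energy = sum(p * n for p, n in zip(probs, lengths))   # -> 7/4
# The fixed-rate benchmark spends two unit-energy 4-QAM symbols per message,
# so the entropy-matched scheme uses avg_energy / 2 = 7/8 of that energy.
saving_ratio = avg_energy / 2.0
```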


B. Proof of Lemma 1

Proof: We have that

dα(u)/du = (γ/(1 + γu)) (F(u) − F(μ)) + ln(1 + γu) f(u).   (32)

Differentiating again,

d²α(u)/du² = −(γ/(1 + γu))² (F(u) − F(μ)) + 2 (γ/(1 + γu)) f(u) + ln(1 + γu) f′(u)
           = (γ/(1 + γu)) [ 2 f(u) + (γ/(1 + γu)) (F(μ) − F(u)) ] + ln(1 + γu) f′(u)
           ≥ (γ/(1 + γu)) 2 f(u) + ln(1 + γu) f′(u),   (33)

where we use f′ as a short-hand notation for df/du. For every u such that f(u) = 0, it readily follows from the structure of f(·) (see (1)) that f′(u) = 0, and therefore (33) immediately yields d²α(u)/du² ≥ 0. The same holds for u = 0. Thus, it is only left to consider values of u > 0 such that f(u) > 0.

If dα(u)/du ≤ 0, then, from (32),

(γ/(1 + γu)) (F(μ) − F(u)) ≥ ln(1 + γu) f(u).   (34)

Since we are only considering the cases in which f(u) > 0, (34) implies F(μ) − F(u) > 0. This allows one to substitute (34) into (33), obtaining

d²α(u)/du² ≥ (ln(1 + γu)/(F(μ) − F(u))) 2 f(u)² + ln(1 + γu) f′(u) = ln(1 + γu) [ 2 f(u)²/(F(μ) − F(u)) + f′(u) ].

Thus, for every u > 0 at which dα(u)/du ≤ 0, the following implication holds:

2 f(u)² + [F(μ) − F(u)] f′(u) ≥ 0  ⟹  d²α(u)/du² ≥ 0.   (35)

On the other hand, if dα(u)/du > 0, then

(γ/(1 + γu)) (F(μ) − F(u)) < ln(1 + γu) f(u).   (36)

Since we are considering values of u such that f(u) > 0, we can substitute (36) into (33), obtaining

d²α(u)/du² ≥ (γ/(1 + γu)) 2 f(u) + ln(1 + γu) f′(u)   (37)
           ≥ (γ/(1 + γu)) [ 2 f(u) + (F(μ) − F(u)) f′(u)/f(u) ].   (38)

It then follows that (35) also holds if dα(u)/du > 0. We therefore conclude that, irrespective of the sign of dα(u)/du,

2 f(u)² + [F(μ) − F(u)] f′(u) ≥ 0  ⟹  d²α(u)/du² ≥ 0.   (39)

Now, from the definition of f, we have that

F(μ) − F(u) = ∫_u^μ f(x) dx = ∫_u^μ K₁ e^{−K₂x} β(x) dx = (1/K₂) f(u) − (1/K₂) f(μ) + ∫_u^μ (K₁/K₂) e^{−K₂x} β′(x) dx.


Message | Probability | Energy | Error probability @ SNR = 10 dB
m1      | 1/2         | 1      | 0.42 × 10⁻³
m2      | 1/4         | 2      | 0.56 × 10⁻³
m3      | 1/8         | 3      | 0.53 × 10⁻³
m4      | 1/8         | 3      | 0.53 × 10⁻³
Mean energy: 7/4

Fig. 6. Top: Three consecutive constellations of a digital modulator adapted for transmitting the outcomes of an entropy-coded quantizer with 4 quantization cells. Symbols a, b, c, d have unit energy. Bottom: A mapping between quantization outcomes (messages m1 to m4) and channel symbol sequences. The error probabilities refer to message errors, and the associated SNR is 1/σ_n².

On the other hand,

f′(u) = −K₂ f(u) + K₁ e^{−K₂u} β′(u)   (40)
      = −K₂ f(u) + (β′(u)/β(u)) f(u).   (41)

Therefore,

2 f(u)² + [F(μ) − F(u)] f′(u)
  = 2 f(u)² − K₂ f(u) [ (1/K₂) f(u) − (1/K₂) f(μ) + ∫_u^μ (K₁/K₂) e^{−K₂x} β′(x) dx ] + [F(μ) − F(u)] f(u) (β′(u)/β(u))
  = f(u) ( f(u) + f(μ) + ∫_u^μ ( β′(u)/β(u) − β′(x)/β(x) ) f(x) dx ).   (42)

Since β′(x)/β(x) is a non-increasing function, it follows that the RHS of (42) is positive. Therefore, we obtain from (39) that α(u) is also convex for all u ∈ (0, μ] such that f(u) > 0. This completes the proof.
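For a concrete instance of (42), take the exponential power-gain density f(x) = e^{−x} (K₁ = K₂ = 1, β(x) = 1, so the integral term vanishes): then 2f(u)² + [F(μ) − F(u)]f′(u) reduces exactly to f(u)(f(u) + f(μ)), which is strictly positive. A quick numerical check under that assumed density:

```python
import math

def f(x):          # exponential power-gain PDF; here f'(x) = -f(x)
    return math.exp(-x)

def F(x):          # its CDF
    return 1.0 - math.exp(-x)

def lhs(u, mu):    # left-hand side of (42)
    return 2.0 * f(u) ** 2 + (F(mu) - F(u)) * (-f(u))

def rhs(u, mu):    # right-hand side of (42) when beta is constant
    return f(u) * (f(u) + f(mu))
```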



Víctor M. Elizondo graduated as Ingeniero Civil Electrónico and obtained the M.Sc. degree from Universidad Técnica Federico Santa María in 2011. During this period, he was supported by research scholarships from research grants CONICYT ACT-53, FONDECYT 3100109 and PIIC-2010. He is now with LAN Airlines, working as an Electronic Engineer. His main research interests include wireless communications, optimization theory and information theory.


Milan S. Derpich (S'08–M'09) received the Ingeniero Civil Electrónico degree from the Universidad Técnica Federico Santa María (UTFSM), Valparaíso, Chile, in 1999. During his time at the university he was supported by a full scholarship from the alumni association, and upon graduating he received several university-wide prizes. Mr. Derpich also worked for the electronic circuit design and manufacturing company Protonic Chile S.A. between 2000 and 2004. In 2009 he received the Ph.D. degree in electrical engineering from the University of Newcastle, Australia. He received the Guan Zhao-Zhi Award at the Chinese Control Conference 2006, and the Research Higher Degrees Award from the Faculty of Engineering and Built Environment, University of Newcastle, Australia, for his Ph.D. thesis. Since 2009 he has been with the Department of Electronic Engineering at Universidad Técnica Federico Santa María, Chile. His main research interests include rate-distortion theory, communications, networked control systems, sampling and quantization.