
TOUMPAKARIS LAYOUT

12/18/08

3:40 PM

Page 32

ADVANCES IN SIGNAL PROCESSING FOR COMMUNICATIONS

Transceiver Design for MIMO Wireless Systems Incorporating Hybrid ARQ

Jungwon Lee and Hui-Ling Lou, Marvell Semiconductor
Dimitris Toumpakaris, University of Patras
Edward W. Jang and John M. Cioffi, Stanford University

ABSTRACT

Hybrid ARQ (HARQ), an extension of ARQ that incorporates forward error correction coding, is a retransmission scheme employed in current communications systems. The use of HARQ can contribute to efficient utilization of the available resources and the provision of reliable services in latest-generation systems. This article focuses on wireless systems using HARQ with emphasis on the multiple-input multiple-output (MIMO) paradigm. MIMO-HARQ offers new opportunities because of the additional degrees of freedom introduced by the multiple antennas at the transmitter and receiver. The architecture of MIMO transceivers that are based on bit-interleaved coded modulation and employ HARQ is described. Additionally, receiver implementations are presented and compared in terms of complexity, memory requirements, and performance.

INTRODUCTION

Automatic repeat request (ARQ) protocols are used to improve the reliability of communications networks. In systems employing ARQ, the receiver asks for retransmission of packets that are corrupted. Because only error detection is required to determine whether a packet should be accepted, the coding overhead is small, and the system throughput is not considerably affected, especially when the channel quality is good. However, when the channel deteriorates, the retransmissions may result in significant throughput loss. A possible remedy is to use an error correcting code separate from ARQ in order to provide a more reliable channel, but this also reduces the throughput of the system.

Instead of following this layered approach, hybrid ARQ (HARQ) systems attempt to reap the benefits of both ARQ and forward error correction (FEC) by combining the two schemes [1]. The HARQ receiver handles error detection and correction as well as retransmission requests simultaneously. A retransmission is requested only when the receiver detects an uncorrectable error. Moreover, the packets are kept at the receiver to be used again for decoding after each retransmission. By combining error correction and retransmission, and appropriately choosing an FEC scheme whose aim is to correct the most frequent errors, HARQ can achieve better throughput performance than ARQ for a given channel. Because HARQ can contribute to more efficient use of the available resources, it has been included in latest-generation wireless systems such as IEEE 802.16e [2] and 3GPP-LTE [3].

The simplest HARQ scheme, called Chase (or code) combining HARQ (CC-HARQ) or Type I HARQ, consists of retransmitting the same symbol sequence repeatedly until the receiver decodes the packet successfully. More sophisticated incremental redundancy (IR-) HARQ schemes (Type II/III) transmit different symbol sequences in general. The difference emanates from employing different coding schemes for the same data, using different coding polynomials, or modulating different subsets of the encoder output.

The focus of this article is the use of HARQ in latest-generation wireless systems that employ bit-interleaved coded modulation (BICM). The architecture of BICM-based wireless systems employing HARQ is described by considering a special case of IR-HARQ as an example, where a different subset of bits of a mother code is sent during each transmission. Because IR-HARQ can benefit from a coding gain, it generally performs better than CC-HARQ for optimal receiver implementations. However, suboptimal practical implementations may affect the performance of IR-HARQ, especially in multiple-input multiple-output (MIMO) systems. On the other hand, when CC-HARQ is used, data are only coded once at the transmitter. Moreover, it is possible to reduce complexity and memory by combining received symbols, as explained in more detail in the following. Hence, for systems that cannot afford large complexity and storage requirements, it may be preferable to use CC-HARQ instead of IR-HARQ.

0163-6804/09/$25.00 © 2009 IEEE
IEEE Communications Magazine • January 2009

Arguably, the most important recent physical layer enhancement of wireless systems is the use of MIMO transmission. Multiple antennas provide additional degrees of freedom, leading to significant capacity increase. Multiple antennas can also be used to provide beamforming gains and reduce the outage probability. Therefore, it is of particular interest to examine how HARQ can be incorporated into MIMO transceivers and its impact on the system performance, complexity, and storage requirements. The performance of MIMO-HARQ depends not only on noise and temporal channel variations, which affect single-input single-output (SISO) HARQ as well, but also on the interference between the signals transmitted by the multiple antennas. As described later in this article, in some cases the design involves a trade-off between system performance and receiver complexity and memory requirements. Simplifying the receiver of a MIMO-HARQ system to reduce storage and complexity may increase sensitivity to interstream interference.

Not only can HARQ be viewed as a retransmission technique that exploits time diversity; it can also be used in the context of systems that employ macrodiversity. If a mobile station communicates with two or more base stations that can exchange information, the system can combine the signals of the base stations before decoding using the same techniques as for HARQ. In that case, the HARQ receiver storage requirements translate to requirements on the necessary bandwidth for the communication between the base stations. Therefore, results derived for HARQ can be applied to such systems, which may become increasingly common in the future.

This article is organized as follows. In the next section the architecture of a SISO transceiver using BICM and HARQ is presented. The MIMO case is then considered with different receiver implementations. The following section contains some discussion of MIMO system design based on the employed HARQ scheme, receiver complexity, and storage requirements. Finally, some concluding remarks are provided.

SISO-HARQ SYSTEMS EMPLOYING BICM

Figure 1 depicts the transmitter of a MIMO system employing BICM and HARQ. In a typical SISO system the architecture of Fig. 1a can be employed, with the difference that the last block, which maps symbols to different antennas, is not necessary because the system only uses one antenna. A bit sequence d = [d[0], d[1], …, d[L – 1]] of length L is encoded using a rate-r mother code to produce the encoded bit sequence c = [c[0], c[1], …, c[L/r – 1]]. For example, in IEEE 802.16e-compliant systems, a rate-1/3 convolutional turbo code (CTC) can be employed [2]. For each block of L bits, 3L bits are produced. The first L (systematic) bits are the original input bits. The encoder block also contains the interleaving operations, if any. A subset of the mother code bits is selected for transmission. When CC-HARQ is used, the bit selection module always outputs the same sequence. For IR-HARQ, the indices of the selected bits depend on the transmission index.

■ Figure 1. Transmitter architectures for MIMO systems employing BICM and HARQ. For SISO systems, the blocks mapping symbols to antennas are not used. (a) MIMO-HARQ transmitter: encoder, bit selection, bits-to-symbols mapping, symbols-to-antennas mapping. (b) MIMO-HARQ transmitter employing symbol vector selection: encoder, bits-to-symbols mapping, symbols-to-antennas mapping, symbol vector selection.

■ Figure 2. Examples of bit selection for IR-HARQ transmission. (a) Rate-1/2 output. (b) Rate-5/6 output.

An example of IR-HARQ bit selection for IEEE 802.16 systems is given in Fig. 2. Although the CTC case is examined in the figure, the bit selection is similar when convolutional coding, block turbo coding, or LDPC codes are used. In Fig. 2a, 2L bits are sent during each transmission. During the first transmission, b(0) = [c[0], c[1], …, c[2L – 1]] = [d[0], d[1], …, d[L – 1], c[L], c[L + 1], …, c[2L – 1]]. During the second transmission, b(1) = [c[2L], c[2L + 1], …, c[3L – 1], c[0], c[1], …, c[L – 1]] = [c[2L], c[2L + 1], …, c[3L – 1], d[0], d[1], …, d[L – 1]], where the fact that the first L bits of the CTC are the systematic bits is used.
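The bit selection patterns of Fig. 2 can be illustrated with a short sketch. It assumes that bits are read sequentially from the mother code and that the read position wraps around at the end, which reproduces the two transmissions written out above; the function name `select_bits` and the toy block length are illustrative and not part of any standard.

```python
def select_bits(mother_code, tx_index, bits_per_tx):
    """Return the bits sent in transmission `tx_index` (0-based).

    Assumption: bits are taken sequentially from the mother code and the
    read position wraps around to the start when the end is reached.
    """
    n = len(mother_code)
    start = (tx_index * bits_per_tx) % n
    return [mother_code[(start + k) % n] for k in range(bits_per_tx)]


# Toy rate-1/3 mother code for L = 5 data bits. Each bit is labeled by its
# index so that the selection pattern is easy to read.
L = 5
c = list(range(3 * L))          # c[0..L-1] systematic, c[L..3L-1] parity

# Rate-1/2 output (Fig. 2a): 2L bits per transmission.
b0 = select_bits(c, 0, 2 * L)   # c[0] .. c[2L-1]
b1 = select_bits(c, 1, 2 * L)   # c[2L] .. c[3L-1], then c[0] .. c[L-1]

# Rate-5/6 output (Fig. 2b): 6L/5 bits per transmission.
r0 = select_bits(c, 0, 6 * L // 5)
r1 = select_bits(c, 1, 6 * L // 5)   # parity bits only
r2 = select_bits(c, 2, 6 * L // 5)   # remaining parity, then first 3L/5 data bits
```

For CC-HARQ the same call with a fixed `tx_index` of 0 would be used for every transmission, since the selected sequence does not change.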


■ Figure 3. Receiver architectures for MIMO systems employing BICM and HARQ. (a) MIMO-HARQ receiver. (b) Symbol-level combining MIMO-HARQ receiver, which forms ỹ[m] = Σi H(i)*[m] y(i)[m] and H̃[m] = Σi H(i)*[m] H(i)[m]. (c) Bit-level combining MIMO-HARQ receiver. (d) Pre-equalization symbol-level combining MIMO-HARQ receiver. (e) Post-equalization symbol combining MIMO-HARQ receiver. (f) Bit-level combining MIMO-HARQ receiver employing equalization.

In the scheme of Fig. 2a, therefore, the systematic bits are included in both transmissions, whereas different parity bits are sent during the odd and even transmissions. In Fig. 2b, 6L/5 bits are sent during each transmission. The second transmission only contains parity bits, whereas the third transmission carries the remaining 3L/5 parity bits together with only the first 3L/5 systematic bits. The rate-5/6 scheme of Fig. 2b is more susceptible to errors, but requires fewer resources, because fewer bits need to be sent during each transmission.

The selected bit sequence b(i) is then sent to a bits-to-symbols mapper. Typical modulation schemes are binary phase shift keying (BPSK), quaternary PSK (QPSK), 16-quadrature amplitude modulation (16-QAM), and 64-QAM. The symbol sequence s(i) = [s(i)[0], s(i)[1], …, s(i)[M – 1]] is then sent to the channel using single- or multicarrier schemes. Both IEEE 802.16e and 3GPP-LTE rely on multicarrier transmission. The length M of the symbol sequence depends on the modulation scheme and is equal to the length of b(i) divided by the number of bits mapped to each symbol s(i)[m].

As mentioned previously, in general, the bit sequence d may be re-encoded at each transmission i. For example, in [4] the re-encoded bit sequence results in a sequence c(i). By appropriate design of the coded sequence c(i) and the bit selection process, the coding gain of IR-HARQ can be improved. Code design that also exploits the MIMO channel to improve the performance of HARQ is an active area of research [5]. Although IR-HARQ code design is a very interesting topic per se, this article attempts to address the implementation of a transceiver for HARQ and the design trade-offs from a generic point of view. Clearly, the exact performance and complexity trade-offs will depend on the details of the HARQ scheme employed. The specific IEEE 802.16e IR-HARQ scheme assumed in the remainder of the article merely serves as an example and to facilitate the discussion of the transceiver architectures.

Figure 3a presents a receiver for the HARQ system of Fig. 1a. Flat fading is considered, and the effect of the channel can be modeled as multiplication with a complex number. Therefore, the only difference from the MIMO case shown in the figure is that for SISO-HARQ, the complex matrix H(i)[m], where m is the symbol index, comprises only one complex element, h(i)[m]. For orthogonal frequency-division multiplexing (OFDM) systems, h(i)[m] equals the frequency response of the subcarrier through which symbol m is transmitted. A maximum likelihood (ML) detector uses all channel outputs y(i) = [y(i)[0], y(i)[1], …, y(i)[M – 1]] corresponding to different transmissions i, together with the channel estimates h(i) = [h(i)[0], h(i)[1], …, h(i)[M – 1]], to calculate the log-likelihood ratio (LLR) for each bit of the mother code c. The detector therefore also incorporates knowledge of the employed code, the bit selection pattern, and the bit-to-symbol mapping. For example, in Fig. 2a, after the first transmission, the LLRs of the first 2L bits of the mother code are calculated, whereas the LLRs for the remaining L bits are set to zero. The LLRs are then sent to the decoder that produces an estimate d̂ for the original bit sequence d. If the receiver determines that d̂ is corrupted, a second transmission is requested. The decision may be based on parity bits included in d, such as a cyclic redundancy check (CRC) code, or metrics obtained while decoding.

After the second transmission, both y(0) and y(1) are used, together with the channel estimates h(0) and h(1), to yield LLRs for all the bits of the mother code. The LLRs for the systematic bits are more reliable after the second transmission, because new information has been received. Decoding proceeds with LLRs for all bits of the mother code. If the decoded sequence d̂ is still found to be corrupted, a third transmission is requested, and so on. The standard retransmission strategies of ARQ, such as stop-and-wait, go-back-N, and selective-repeat, can also be used for HARQ. As the number of transmissions grows, the ML detector and LLR calculator block becomes increasingly complex.
The block has to be designed for the maximum allowed number of transmissions, N. The required memory is also an issue, because all received symbols and channel estimates need to be stored until decoding succeeds or the maximum number of transmissions is reached. As explained below, when CC-HARQ is used, the design of the receiver can be simplified and the required memory reduced without affecting performance. Receiver simplification can also be achieved for IR-HARQ by controlling the bit selection scheme at the transmitter.

When CC-HARQ is employed, the same bit sequence b is produced during all transmissions. Therefore, the symbol sequence s(i) sent to the channel does not depend on i. It can be shown that instead of feeding all received sequences y(i) and channel estimates h(i) directly to the ML detector and LLR calculator block, they can be used to derive an equivalent sequence ỹ and an equivalent channel estimate sequence h̃ by performing maximal-ratio combining (MRC) [6]. Only ỹ and h̃ need to be used by the ML detector and LLR calculator to produce the LLRs for the mother code bits. This symbol-level combining scheme is shown in Fig. 3b. For the SISO case, the matrix sequences H(i) in the figure should be replaced by sequences h(i) of scalars, and the sequences of received vectors by sequences of scalars y(i). The symbol-level combining scheme is equivalent to the receiver of Fig. 3a and is therefore optimal. The storage requirements at the receiver are reduced by a factor equal to the limit on total transmissions, N. Combining y(i) and h(i) consists of multiplying with the complex conjugates h(i)*[m] of the channel estimates [6].

The approach of Fig. 3b can also be used in IR-HARQ systems as long as the bit selector is designed so that the alignment between bits and symbols does not change. For example, for 16-QAM, if each symbol is formed using bits [c[4m], c[4m + 1], c[4m + 2], c[4m + 3]], a new symbol containing [c[4m], c[4m + 1], c[4m + 2], c[4m + 3]] will be generated after a certain number of retransmissions. Then the received symbols containing the same bits in the same order can be combined before detection. For example, in one of the modulation and coding schemes used in IEEE 802.16e systems where the data packet length is 54 bytes, the CTC rate is 1/3, and 64-QAM is used, 54 × 8 × 3 = 1296 bits correspond to 1296/6 = 216 64-QAM symbols, since each 64-QAM symbol carries 6 bits. The simplified receiver can be used, because bits 6m to 6m + 5 will always be mapped to symbol m.

For IR-HARQ, the required storage at the receiver is proportional to the maximum number of different symbols s that can be generated, which equals L/(r × b), where r is the rate of the mother code, L is the length of the original data sequence, and b is the number of bits transmitted in each symbol. Hence, more storage is required than with CC-HARQ. This also means that the ML detector and LLR calculator block becomes more complex, because it needs to process an equivalent symbol sequence of length L/(r × b), which is larger than M. If bits and symbols are not aligned, it may not be possible to use the receiver of Fig. 3b, because the number of different symbols s may exceed L/(r × b) and become very large.
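As a rough illustration of SISO symbol-level combining for CC-HARQ, the sketch below accumulates ỹ[m] = Σi h(i)*[m] y(i)[m] and h̃[m] = Σi |h(i)[m]|², and compares a distance metric computed from the combined pair with the one computed from all stored copies. The division by h̃ in `combined_metric` is the normalization assumed here so that the two metrics differ only by a constant. The channel gains, noise samples, and function names are made up for the example and are not taken from [6].

```python
def mrc_update(y_eq, h_eq, y, h):
    """Fold one (re)transmission into the running MRC accumulators.

    y_eq[m] accumulates conj(h[m]) * y[m]; h_eq[m] accumulates |h[m]|^2.
    """
    for m in range(len(y)):
        y_eq[m] += h[m].conjugate() * y[m]
        h_eq[m] += abs(h[m]) ** 2
    return y_eq, h_eq


def joint_metric(s, ys, hs):
    """Distance metric of candidate symbol s using every stored copy."""
    return sum(abs(y - h * s) ** 2 for y, h in zip(ys, hs))


def combined_metric(s, y_eq, h_eq):
    """Metric from the combined pair; equal to joint_metric up to a
    constant that does not depend on s (assumed normalization by h_eq)."""
    return abs(y_eq - h_eq * s) ** 2 / h_eq


# Toy data: one QPSK symbol position (M = 1) observed in three transmissions.
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
hs = [0.9 + 0.2j, -0.3 + 0.5j, 0.1 - 1.1j]           # hypothetical channel gains
noise = [0.05j, -0.1, 0.02 + 0.03j]                  # hypothetical noise samples
ys = [h * qpsk[2] + n for h, n in zip(hs, noise)]

# Only one complex value and one real value are kept per symbol position,
# regardless of the number of transmissions.
y_eq, h_eq = [0j], [0.0]
for y, h in zip(ys, hs):
    mrc_update(y_eq, h_eq, [y], [h])

best_joint = min(qpsk, key=lambda s: joint_metric(s, ys, hs))
best_combined = min(qpsk, key=lambda s: combined_metric(s, y_eq[0], h_eq[0]))
```

Because the two metrics differ only by a term independent of the candidate symbol, both searches select the same symbol, which is consistent with the statement that symbol-level combining incurs no loss.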
In order to simplify the receiver, ML detection and LLR calculation can be performed separately for each y(i), as shown in Fig. 3c. The LLRs per mother code bit and per transmission are then simply added together to produce the LLR value sent to the decoder. This bit-level combining scheme has suboptimal performance. However, the performance loss in SISO systems is generally small. By using bit-level combining, the storage requirements at the receiver are reduced when the bits of the mother code are fewer than the maximum number of received symbols and channel estimates that need to be stored. This is true, in general, unless the allowed maximum number of retransmissions is small. The complexity of the ML detector and LLR calculator block is also reduced.

Since the performance loss of the bit-level combining receiver is not significant for SISO-HARQ systems, IR-HARQ generally performs better than CC-HARQ even when bit-level combining is used. This happens because the loss incurred by the suboptimal implementation of the receiver is usually smaller than the coding gain of IR-HARQ. However, as explained in the following section, because of interstream interference, bit-level combining in MIMO-HARQ may result in significant performance degradation. Thus, for practical MIMO receiver implementations subject to complexity and memory constraints, the choice between CC-HARQ and IR-HARQ may not always be straightforward. It should also be noted that some cases have been identified where CC-HARQ performs better than IR-HARQ in SISO systems even when the optimal receiver of Fig. 3a is employed. Examples include systems where the effect of fading on different IR-HARQ codewords varies, especially when the codewords that contain the systematic part of the code are severely affected [7].
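The bit-level combining of Fig. 3c amounts to keeping one accumulated LLR per mother code bit. A minimal sketch follows, assuming the rate-1/2 selection pattern of Fig. 2a; the LLR values are made up and the helper name `accumulate_llrs` is hypothetical.

```python
def accumulate_llrs(acc, tx_llrs, tx_bit_indices):
    """Add the LLRs of one transmission to the running per-bit totals.

    acc[k] holds the accumulated LLR of mother-code bit c[k]; bits that
    have not been sent yet keep an LLR of zero (no information).
    """
    for llr, k in zip(tx_llrs, tx_bit_indices):
        acc[k] += llr
    return acc


L = 4
acc = [0.0] * (3 * L)                       # one entry per mother-code bit

# Transmission 0 of the rate-1/2 pattern carries c[0] .. c[2L-1].
idx0 = list(range(0, 2 * L))
llr0 = [1.5, -0.5, 2.0, 0.25, -1.0, 0.75, -2.5, 1.0]    # made-up LLR values
accumulate_llrs(acc, llr0, idx0)
after_first = list(acc)                     # snapshot sent to the decoder

# Transmission 1 carries c[2L] .. c[3L-1] followed by c[0] .. c[L-1].
idx1 = list(range(2 * L, 3 * L)) + list(range(0, L))
llr1 = [0.5, -1.5, 1.0, -0.25, 1.0, 1.5, -0.5, 0.5]     # made-up LLR values
accumulate_llrs(acc, llr1, idx1)
```

The memory footprint is fixed at L/r accumulated values, independent of how many transmissions have been received, which is the storage advantage discussed above.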


HARQ APPLIED TO MIMO SYSTEMS

As shown in Fig. 1a, compared to the SISO case, the transmitter for MIMO-HARQ includes an additional symbols-to-antennas mapping block after the generation of the modulated symbols s(i) that determines from which antenna each symbol will be transmitted. In the general case, a given symbol s may be transmitted from more than one antenna, or the antennas may transmit linear combinations of the original symbols. Specifically, the symbols-to-antennas mapper generates a sequence of nt × 1 symbol vectors x(i) = [x(i)[0], x(i)[1], …, x(i)[K – 1]] based on the symbol sequence s(i) = [s(i)[0], s(i)[1], …, s(i)[M – 1]], where nt is the number of transmit antennas. Each vector x is sent through the MIMO channel, resulting in an nr × 1 vector y at the receiver, where nr is the number of receive antennas. As in the SISO case, flat fading is considered, and the effect of the MIMO channel is modeled using a sequence H(i) of nr × nt matrices.

The capacity and diversity gains that can be achieved depend on the correlation between the received signals (i.e., the condition of the channel matrix). Ideally, a well-conditioned channel matrix is desired. Therefore, in addition to noise and fading, the two factors affecting transmission in SISO systems, MIMO systems are also subject to interstream interference. When the interference is high, transmission may be severely affected even when the received power per antenna is large. HARQ can be used in MIMO systems to combat interstream interference in addition to noise and channel gain fluctuations caused by fading. Although not shown in the figure, the symbols-to-antennas mapper may also employ a space-time block code (STBC).

When CC-HARQ is employed, the simplified transmitter of Fig. 1b can be used. The transmitter can also be used for IR-HARQ, as long as the alignment between bits and signal vectors does not change.
A bits-to-symbols mapper creates a symbol sequence s′ based on the encoded bit sequence c, and is followed by a symbols-to-antennas mapper that transforms s′ to a symbol vector sequence x′. The transmitter is simpler because the symbol vector sequence x′ can be precomputed. However, the main benefit is the simplification of the receiver, as described in the following.

Similar to the SISO case, as shown in Fig. 3a, the nr × 1 received symbol vectors can be sent to an ML detector and LLR calculator block that combats interstream interference in addition to compensating for noise and channel fading. The ML detector and LLR calculator block is more complex than in the SISO case, because matrix and vector operations are involved. Once the LLRs are produced, decoding proceeds in exactly the same way as in SISO systems.

Some questions now emerge. When the alignment between bits and symbol vectors is fixed, can symbol-level combining be used for MIMO-HARQ similar to the SISO case? As shown later, that is indeed possible by an extension of the SISO-MRC scheme. Can the receiver be simplified if the alignment between bits and symbol vectors is not fixed or symbol-level combining is not possible in early retransmissions, and what are the implications for the system performance and complexity? As described below, the performance penalty when trying to simplify the receiver of MIMO-HARQ systems using bit-level combining is larger than in SISO systems. The performance degradation increases further when equalization is used instead of ML detection. Therefore, when realistic MIMO receiver implementations are desired, a careful assessment of the performance loss of IR-HARQ because of bit-level combining should be made. These questions are addressed in more detail in the remainder of this section.

CC-HARQ is considered first. The observations can be extended to the case of IR-HARQ where bit-to-symbol vector alignment is preserved. The same symbol vector sequence x is sent during each retransmission. As in the SISO case, instead of using the receiver of Fig. 3a, the architecture of Fig. 3b can be employed. It can be shown that an MRC-like combining scheme can be used to form an equivalent nt × 1 symbol vector sequence ỹ and an equivalent channel matrix sequence H̃ from the received symbol vector and channel matrix estimate sequences y(i) and H(i), respectively. Each H̃[m] is a Hermitian matrix of size nt × nt [6].
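The MRC-like combining for MIMO can be sketched for a single symbol vector index m, following the sums shown in Fig. 3b: ỹ = Σi H(i)* y(i) and H̃ = Σi H(i)* H(i), where * denotes the conjugate transpose. The code below uses plain Python lists to stay self-contained; the 2 × 2 channel matrices are hypothetical, and noise is omitted so that the relation ỹ = H̃ x can be checked directly.

```python
def hermitian(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[r][c].conjugate() for r in range(len(A))]
            for c in range(len(A[0]))]


def matmul(A, B):
    """Matrix product of two list-of-rows matrices."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B)))
             for c in range(len(B[0]))] for r in range(len(A))]


def combine(y_eq, H_eq, y, H):
    """Add H^H y and H^H H of one transmission to the accumulators."""
    Hh = hermitian(H)
    Hy = matmul(Hh, [[v] for v in y])         # nt x 1 column
    HH = matmul(Hh, H)                        # nt x nt, Hermitian
    nt = len(H_eq)
    for r in range(nt):
        y_eq[r] += Hy[r][0]
        for c in range(nt):
            H_eq[r][c] += HH[r][c]
    return y_eq, H_eq


x = [1 + 1j, -1 + 1j]                         # transmitted symbol vector (nt = 2)
channels = [                                  # hypothetical H(0) and H(1), nr = 2
    [[1.0 + 0.5j, 0.2 - 0.1j], [0.3 + 0.0j, 0.8 + 0.4j]],
    [[0.6 - 0.2j, -0.4 + 0.3j], [0.1 + 0.7j, 1.1 - 0.3j]],
]

y_eq = [0j, 0j]
H_eq = [[0j, 0j], [0j, 0j]]
for H in channels:
    y = [row[0] * x[0] + row[1] * x[1] for row in H]   # noiseless y = H x
    combine(y_eq, H_eq, y, H)
```

The accumulators keep the same size (nt entries for ỹ and an nt × nt matrix for H̃) no matter how many transmissions are folded in.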
Thus, the MIMO-HARQ problem is converted to an equivalent single-transmission MIMO problem, because the sizes of ỹ and H̃ remain the same after each retransmission. An ML detector and LLR calculator block that uses only one symbol vector sequence and one channel estimate sequence can then be used. Therefore, the memory requirements of the symbol-level combining receiver of Fig. 3b are lower than those of the receiver of Fig. 3a. The receiver is also simplified by reusing the same ML detector and LLR calculator block after each transmission. Moreover, numerical techniques such as QR decomposition can be used for implementation [6]. The receivers of Figs. 3a and 3b are equivalent, so there is no loss in performance. For IR-HARQ, the receiver of Fig. 3b can be used by considering all different symbol vectors that may be generated. The length of ỹ and H̃ will be at least as large as K, the length of x(i).

When the alignment between bits and symbol vectors is not fixed, the bit-level combining receiver of Fig. 3c can be employed if using symbol-level combining is impractical. However, this architecture is not optimal and may result in significant performance degradation when the different paths of the MIMO channel are correlated. The main cause is not the combining of bits instead of symbol vectors, but the separate detection and LLR calculation after each transmission. When the channel matrix H is ill conditioned, erroneous decisions may be made about the individual elements of a symbol vector x even when the quality of the received symbol vector is good. On the other hand, when symbol-level combining is used, detection and LLR calculation are performed after gathering information from all retransmissions. From the viewpoint of the architecture of Fig. 3b, the condition of the equivalent matrix H̃[m] is better than that of some of the matrices H(i)[m]. Although bit-level combining is suboptimal, it is also less complex, because the same blocks are reused in Fig. 3c regardless of the number of retransmissions. The required storage is also reduced because only the accumulated LLRs of the bits of the mother code need to be stored.

In some systems the ML detector and LLR calculator block may be too complex to implement, even in the simplified receiver of Fig. 3b. In this case equalization across the spatial streams can be used (recall that flat fading is assumed). Linear or decision feedback equalizers (DFEs) (zero-forcing [ZF] or minimum mean square error [MMSE]) can be employed. The MIMO equalization schemes described above are well known and not particular to HARQ. They can be implemented efficiently, for example, using QR decomposition. This brief overview is given in order to facilitate the discussions in the remainder of this section.

When CC-HARQ is employed, the receiver of Fig. 3d can be used. First, the spatial streams are decoupled using an equalizer. Then, for each element x̂j of the equalized symbol vector sequence x̂, separate LLR calculators are used that take into account the mapping of the transmit symbols into symbol vectors and the corresponding channel estimates. In general, the x̂j are soft values and are not sliced to the nearest constellation symbol.
Each time symbol vectors from a new transmission arrive, they are combined with the symbol vectors of all previous transmissions, and the equivalent symbol vectors are re-equalized using the equivalent channel matrix sequence. Similar to the ML case, the pre-equalization symbol-level combining operation does not result in information loss. For this reason, the scheme of Fig. 3d exhibits the best performance among all equalization-based architectures [8]. After each retransmission, the equivalent vector sequence ỹ is stored at the receiver, together with the equivalent channel matrix sequence H̃. Each H̃[m] is Hermitian and of size nt × nt. Hence, K × nt × (nt + 1)/2 complex entries are required for H̃ and K × nt complex entries for ỹ.

In order to reduce storage, a post-equalization symbol-level combining scheme, shown in Fig. 3e, can be used. The received signal vectors y(i) are equalized after each retransmission, and the resulting symbol vector sequences x̂(i) are combined before LLR calculation. Only y(i) is used to obtain x̂(i). It can be shown that the optimal way to combine the x̂(i) is using MRC, which consists of multiplying each element x̂j(i) of x̂(i) with a complex weight that depends on the channel estimate H(i) and accumulating the result with the values from previous transmissions [8]. The resulting weighted sum is normalized before LLR calculation. Post-equalization symbol-level combining reduces receiver memory because instead of a sequence of K Hermitian matrices, only K × nt normalization weights γj(i) need to be stored in addition to the weighted and accumulated x̂(i). However, post-equalization symbol-level combining exhibits performance loss compared to pre-equalization combining [8, 9]. Therefore, for fixed bit-to-symbol vector alignment, use of post-equalization combining is motivated by the need to reduce the storage at the receiver. Even when nt is small, the savings can be significant when K is large. The storage requirements can be reduced further (by K × nt complex values per transmission) by combining the x̂j(i) using equal weights [9] at the cost of additional performance degradation. The receiver of Fig. 3e can also be used for IR-HARQ. The difference is that K should be replaced by the number of all possible symbol vectors x that may be sent from the transmitter before reaching the transmission limit N.

When the bit-to-symbol vector alignment is not fixed or the number of symbol vectors is large, the structure of Fig. 3f can be employed, whose difference from that of Fig. 3e is that the LLRs are calculated directly after equalization. The performance of the receiver of Fig. 3f is inferior to that of the other schemes. The largest part of the performance degradation is caused by the separate equalization after each transmission without combining information from different transmissions. What needs to be stored now are the LLRs of the bits of the mother code c that are sent to the channel.
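To make the post-equalization combining of Fig. 3e more concrete, the sketch below equalizes each transmission with a zero-forcing equalizer and accumulates weighted estimates together with a per-stream normalization term. The weight used here, 1/[(H*H)^-1]jj (the inverse of the ZF noise enhancement on stream j), is a common choice assumed for illustration; it is real-valued and is not claimed to be the exact weight derived in [8]. The 2 × 2 channels are hypothetical and noise is omitted so that the result can be checked.

```python
def inv2(A):
    """Inverse of a 2 x 2 matrix given as a list of rows."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def zf_equalize(y, H):
    """Zero-forcing estimate x_hat = H^-1 y for a square 2 x 2 channel."""
    Hi = inv2(H)
    return [Hi[r][0] * y[0] + Hi[r][1] * y[1] for r in range(2)]


def zf_weights(H):
    """Per-stream weight 1 / [(H^H H)^-1]_jj (assumed, not taken from [8])."""
    Hh = [[H[r][c].conjugate() for r in range(2)] for c in range(2)]
    G = [[sum(Hh[r][k] * H[k][c] for k in range(2)) for c in range(2)]
         for r in range(2)]
    Gi = inv2(G)
    return [1.0 / Gi[j][j].real for j in range(2)]


def post_eq_update(acc, gamma, y, H):
    """Equalize one transmission and fold it into the weighted sums."""
    x_hat = zf_equalize(y, H)
    w = zf_weights(H)
    for j in range(2):
        acc[j] += w[j] * x_hat[j]     # weighted, accumulated estimate
        gamma[j] += w[j]              # normalization term for stream j
    return acc, gamma


x = [1 - 1j, -1 - 1j]                 # transmitted symbol vector (nt = 2)
channels = [                          # hypothetical H(0) and H(1)
    [[1.0 + 0.5j, 0.2 - 0.1j], [0.3 + 0.0j, 0.8 + 0.4j]],
    [[0.6 - 0.2j, -0.4 + 0.3j], [0.1 + 0.7j, 1.1 - 0.3j]],
]

acc, gamma = [0j, 0j], [0.0, 0.0]
for H in channels:
    y = [row[0] * x[0] + row[1] * x[1] for row in H]   # noiseless y = H x
    post_eq_update(acc, gamma, y, H)

x_comb = [acc[j] / gamma[j] for j in range(2)]         # normalized estimates
```

Per symbol vector, only the nt accumulated estimates and the nt normalization terms are retained, in line with the 2 × K × nt storage figure for this architecture.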
Hence, if this receiver is considered for HARQ with fixed bit-to-symbol vector alignment, the total number of different symbols that are sent to the channel needs to be taken into account in order to determine whether memory can be reduced compared to other architectures. Table 1 summarizes the MIMO-HARQ receiver architectures presented in this section and their memory requirements.


COMPARISON OF RECEIVER ARCHITECTURES AND EXAMPLES

In the previous section it was argued that the receiver implementation depends on the transmission scheme (CC- or IR-HARQ), whether the alignment between bits and symbol vectors is fixed, and the constraints in memory and complexity. Simplifying the receiver may come at a price. As an example of the performance degradation caused by suboptimal receiver implementations, an IEEE 802.16e-compliant system using partial usage of subchannels (PUSC) and spatial multiplexing (Matrix B) is considered [2]. Two transmit and two receive antennas are employed, communicating through a vehicular Type A channel with a high degree of spatial correlation and Doppler speed equal to 120 km/h. The data are encoded using the mother rate-1/3 CTC. Bits are punctured sequentially to produce sequences of equal length, as in Fig. 2.

| Receiver implementation | Storage requirements | Comments |
|---|---|---|
| Generic (Fig. 3a) | K × N × nr × (1 + nt) | K: length of symbol vector sequence per HARQ transmission. N: maximum number of transmissions. nt/nr: number of transmit/receive antennas. Can be used with any HARQ scheme. Optimal. |
| Symbol-level combining with ML detection (Fig. 3b) | For CC-HARQ: K × nt × (1 + (nt + 1)/2). For IR-HARQ: K × C × nt × (1 + (nt + 1)/2) | C: number of distinct equivalent symbol vectors and equivalent channel estimate sequences. C = L/(r × b × nt × K) for fixed bit-to-symbol vector alignment. Optimal. |
| Bit-level combining with ML detection (Fig. 3c) | For CC-HARQ: K × nt × b. For IR-HARQ: L/r | r: rate of mother code. L: length of original data sequence (in bits). b: bits per symbol. Suboptimal. |
| Pre-equalization symbol-level combining (Fig. 3d) | For CC-HARQ: K × nt × (1 + (nt + 1)/2). For IR-HARQ: K × C × nt × (1 + (nt + 1)/2) | C: number of distinct equivalent symbol vectors and equivalent channel estimate sequences. C = L/(r × b × nt × K) for fixed bit-to-symbol vector alignment. Inferior to generic. |
| Post-equalization symbol-level combining (Fig. 3e) | For CC-HARQ: 2 × K × nt. For IR-HARQ: 2 × K × C × nt | C: number of distinct equivalent symbol vectors and equivalent channel estimate sequences. C = L/(r × b × nt × K) for fixed bit-to-symbol vector alignment. Inferior to pre-equalization combining. |
| Bit-level combining with equalization (Fig. 3f) | For CC-HARQ: K × nt × b. For IR-HARQ: L/r | r: rate of mother code. L: length of original data sequence (in bits). b: bits per symbol. Inferior to all the above. |

■ Table 1. Comparison of memory requirements and performance of MIMO-HARQ receiver implementations.
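The storage expressions of Table 1 are easy to evaluate side by side. The Python sketch below simply transcribes them; the function name, the dictionary keys, and the example parameter values are hypothetical, and the results are left in the units of the table rather than converted to a common unit such as bits.

```python
from fractions import Fraction


def storage_requirements(K, N, nt, nr, b, L, r, scheme="CC"):
    """Evaluate the Table 1 storage expressions for CC- or IR-HARQ.

    K: symbol vectors per transmission, N: maximum number of transmissions,
    nt/nr: transmit/receive antennas, b: bits per symbol,
    L: data length in bits, r: rate of the mother code.
    """
    if scheme not in ("CC", "IR"):
        raise ValueError("scheme must be 'CC' or 'IR'")
    # C applies to IR-HARQ with fixed bit-to-symbol vector alignment.
    C = 1 if scheme == "CC" else L / (r * b * nt * K)
    per_vector = nt * (1 + (nt + 1) / 2)
    bit_level = K * nt * b if scheme == "CC" else L / r
    return {
        "generic (Fig. 3a)": K * N * nr * (1 + nt),
        "symbol-level, ML (Fig. 3b)": K * C * per_vector,
        "bit-level, ML (Fig. 3c)": bit_level,
        "pre-equalization (Fig. 3d)": K * C * per_vector,
        "post-equalization (Fig. 3e)": 2 * K * C * nt,
        "bit-level, equalization (Fig. 3f)": bit_level,
    }


# Hypothetical example: 2 x 2 antennas, 64-QAM (b = 6), 432 data bits,
# rate-1/3 mother code, 72 symbol vectors per transmission, at most 4 transmissions.
params = dict(K=72, N=4, nt=2, nr=2, b=6, L=432, r=Fraction(1, 3))
cc = storage_requirements(scheme="CC", **params)
ir = storage_requirements(scheme="IR", **params)
```

Passing the code rate as a `Fraction` keeps C and L/r exact; a float works as well if small rounding errors are acceptable.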

In Fig. 4a the bit-level combining receiver of Fig. 3f is employed using zero-forcing linear equalization (ZF-BLC). 64-QAM and the rate-1/2 code of Fig. 2a are considered. IR-HARQ has a coding gain of more than 1 dB over CC-HARQ because of the additional parity bits that are transmitted. However, when the optimal pre-equalization symbol-combining receiver of Fig. 3d is used with CC-HARQ (MRC-ZF), the system exhibits a gain of almost 2 dB over IR-HARQ. CC-HARQ also outperforms IR-HARQ when ML detection is used instead of equalization, as seen from curves MRC-ML and ML-BLC that correspond to the receivers of Figs. 3b and 3c, respectively. The performance advantage of IR-HARQ can be recaptured using the receiver of Fig. 3a at the cost of increased complexity and memory requirements. When a rate-5/6 code is used, the coding gain of IR-HARQ over CC-HARQ is much larger than for the rate-1/2 code (on the order of 4 dB, as shown in Fig. 4b). Therefore, although symbol-level combining improves the performance of CC-HARQ, IR-HARQ still achieves a gain of approximately 1 dB. The gain is attained for both equalizer-based and ML-based implementations.

■ Figure 4. MIMO system, Type-A vehicular channel, 120 km/h, high inter-stream correlation, PUSC, spatial multiplexing: a) 64-QAM, code rate = 1/2, packet size = 54 bytes; b) 64-QAM, code rate = 5/6, packet size = 60 bytes. [Both panels plot BER versus SNR [dB] for six curves: 1 transmission, ZF; 2 transmissions, CC, MRC-ZF; 2 transmissions, CC, ZF-BLC; 2 transmissions, IR, ZF-BLC; 2 transmissions, CC, MRC-ML; 2 transmissions, IR, ML-BLC.]

CONCLUDING REMARKS This article examines the implementation of HARQ in wireless systems employing BICM, mainly in the MIMO context. Because of the introduction of new dimensions, a number of different architectures can be used for the receiver. In general, the designer needs to take into account the complexity and memory constraints, channel characteristics, and maximum allowed number of retransmissions before deciding on the MIMO-HARQ scheme. In order to improve the performance of HARQ with low receiver complexity, proper bit-to-symbol vector alignment can be used to enable symbol-level combining at the receiver. Moreover, new code designs could focus on developing IR-HARQ schemes that are robust to suboptimal receiver implementations.

REFERENCES [1] S. Lin, D. J. Costello, Jr., and M. J. Miller, “Automatic-Repeat Request Error-Control Schemes,” IEEE Commun. Mag., vol. 22, Dec. 1984, pp. 5–17.
[2] IEEE Std. 802.16e-2005, “IEEE Standard for Local and Metropolitan Area Networks, Part 16: Air Interface for Fixed Broadband Wireless Access Systems, Amendment 2: Physical and Medium Access Control Layers for Combined Fixed and Mobile Operation in Licensed Bands,” Feb. 2006.
[3] 3GPP TS 25.201 V8.0.0 (2008-03), “3rd Generation Partnership Project; Technical Specification Group Radio Access Network; Physical Layer - General Description (Release 8).”
[4] K. R. Narayanan and G. Stüber, “A Novel ARQ Technique Using the Turbo Coding Principle,” IEEE Commun. Lett., vol. 1, no. 2, Mar. 1997, pp. 49–51.
[5] Z. Ding and M. Rice, “Hybrid-ARQ Code Combining for MIMO Using Multidimensional Space-Time Trellis Codes,” Proc. IEEE ISIT ’07, Glasgow, Scotland, June 2007.
[6] E. W. Jang et al., “Optimal Combining Schemes for MIMO Systems with Hybrid ARQ,” Proc. IEEE ISIT ’07, Nice, France, June 2007.
[7] J.-F. Cheng, “Coding Performance of Hybrid ARQ Schemes,” IEEE Trans. Commun., vol. 54, no. 6, June 2006, pp. 1017–29.
[8] D. Toumpakaris et al., “Storage-Performance Tradeoff for Receivers of MIMO Systems Using Hybrid ARQ,” Proc. 9th IEEE Int’l. Wksp. Sig. Processing Advances in Digital Commun., Recife, Brazil, July 2008.
[9] E. N. Onggosanusi et al., “Hybrid ARQ Transmission and Combining for MIMO Systems,” Proc. IEEE ICC, vol. 5, May 2003, pp. 3205–09.

BIOGRAPHIES JUNGWON LEE [S’00, M’05] ([email protected]) received a Ph.D. degree in electrical engineering from Stanford University in 2005. From 2000 to 2003 he worked as an intern for National Semiconductor, Telcordia Technologies, and AT&T Shannon Labs Research, and as a consultant for Ikanos Communications. Since 2003 he has worked for Marvell Semiconductor Inc., Santa Clara, California, where he is now a principal engineer/senior manager. His specific research interests are in wireless and wireline communication theory with emphasis on OFDM and single-carrier system design, transmission optimization, resource allocation, cross-layer design, and estimation and detection theory.

DIMITRIS TOUMPAKARIS [S’98, M’04] ([email protected]) received his Diploma in electrical and computer engineering from the National Technical University of Athens, Greece, in 1997, and his M.S. and Ph.D. degrees in electrical engineering from Stanford University in 1999 and 2003, respectively. He was a senior design engineer in Marvell Semiconductor Inc., Santa Clara, California, from 2003 to 2006. He has also worked as an intern for Bell-Labs, CERN, and France Télécom, and as a consultant for Ikanos Communications and Marvell Semiconductor Inc. He is currently an assistant professor in the Wireless Telecommunications Laboratory, Department of Electrical and Computer Engineering, University of Patras, Greece. His current research interests include information theory with emphasis on multi-user communications systems, digital communication, synchronization and estimation, and cross-layer optimization.

EDWARD W. JANG [S’04] ([email protected]) received his B.S. degree in electrical engineering from Seoul National University, Korea, in 2002, and his M.S. degree in electrical engineering from Stanford University in 2004. He is currently pursuing his Ph.D. degree at Stanford University. His research interests include transmission schemes for systems with a limited feedback rate and MIMO systems with HARQ.

HUI-LING LOU ([email protected]) is a senior engineering director at Marvell Semiconductor, Santa Clara, California, leading teams responsible for physical layer standards, systems, and architecture design and development for mobile WiMax chip sets, and investigating next generation wireless technologies. She has also formed and led physical layer standards and systems teams that designed, developed and productized Marvell’s first 802.11n, Bluetooth, and digital FM chip sets. Prior to Marvell, she spent nine years at Bell Laboratories Research, Murray Hill, New Jersey, where she designed algorithms, systems, and efficient hardware architectures for cellular and digital broadcasting systems. She also developed a reconfigurable trellis codec chip for Amati Communications as a consultant in 1992. She completed her M.S.E.E. and Ph.D. degrees at Stanford University in 1988 and 1992, respectively. She has more than 60 patents, granted and pending, and has published more than 50 peer-reviewed publications.

JOHN M. CIOFFI [F’96] ([email protected]) received his B.S. in electrical engineering in 1978 from the University of Illinois and his Ph.D. in electrical engineering in 1984 from Stanford University. He was with Bell Laboratories, 1978–1984, and IBM Research, 1984–1986. He has been a professor of electrical engineering at Stanford since 1986. He founded Amati Com. Corp in 1991 (purchased by TI in 1997) and was officer/director from 1991–1997. He currently is on the board of directors of ASSIA (Chairman), ClariPhy, Teranetics, Vector Silicon Inc., and the Marconi Foundation. He is on the advisory boards of Focus Ventures, Quantenna, and Amicus. His specific interests are in the area of high-performance digital transmission. Various awards include International Marconi Fellow (2006); Holder of Hitachi America Professorship in Electrical Engineering at Stanford (2002); Member, National Academy of Engineering (2001); IEEE Kobayashi Medal (2001); IEEE Millennium Medal (2000); IEE J. J. Thomson Medal (2000); 1999 U. of Illinois Outstanding Alumnus; 1991 and 2007 IEEE Communications Magazine best paper; 1995 ANSI T1 Outstanding Achievement Award; NSF Presidential Investigator (1987–1992); and ISSLS 2004, ICC 2006, 2007, and 2008 Conference Best-Paper awards. He has published over 250 papers and holds over 80 patents, of which many are heavily licensed including key necessary patents for the international standards in ADSL, VDSL, DSM, and WiMAX.