
IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 53, NO. 8, AUGUST 2005

Multistage Iterative Decoding With Complexity Reduction for Concatenated Space–Time Codes Andres Reial, Member, IEEE, and Stephen G. Wilson, Senior Member, IEEE

Abstract—Space–time encoders exploiting concatenated coding structures are efficient in attaining the high rates available to large-dimensional multiple-transmitter, multiple-receiver wireless systems under fading conditions, while also providing maximal diversity benefits. We present a multistage iterative decoding structure that takes full advantage of the concatenated nature of the transmission path, treating the modulator and channel stages as an additional encoder in serial concatenation. This iterative decoder architecture allows an encoder employing decoupled coding and modulation to reach the performance of coded modulation systems. It also admits reduced-complexity decoding with a computational load that is nonexponential in the number of antennas or the transmission bit rate, and makes practical decoding for large transmitter arrays possible. The performance curves for these methods follow the shape of the Fano bound, with only a modest power penalty.

Index Terms—Diversity, iterative decoding, reduced-complexity decoding, space–time coding (STC).

I. INTRODUCTION

High bandwidth efficiency is achievable via the use of multiantenna transmitters and receivers. Initial results indicating that systems with several transmit and receive antennas offer large ergodic channel capacities, proportional to the smallest array dimension, were presented in [1]. Furthermore, under fading conditions, which occur naturally due to multipath effects in wireless communications, rates very close to the ergodic capacity are also available with arbitrarily small outage probabilities, making it practical to use fixed-rate transmission on any realization of the fading channel ensemble [2], [3]. Experimental transmission rates up to several tens of b/s/Hz have been reported for high signal-to-noise ratios (SNRs) [4]; however, full exploitation of the channel potential under all conditions requires taking advantage of the available high diversity orders. When channel side information is not available at the transmitter, linear processing [5]–[7] or coding [8], [9] at the transmitter can be used to guarantee transmit diversity. Although a wide variety of space–time coding (STC) approaches exists, providing high diversity orders for high data rates over

Paper approved by N. C. Beaulieu, the Editor for Wireless Communication Theory of the IEEE Communications Society. Manuscript received June 10, 2000; revised January 29, 2003 and August 18, 2004. This work was supported by the National Science Foundation under Grant NCR-9714646. This paper was presented in part at CISS 2000, Princeton, NJ. A. Reial is with Ericsson Research, Ericsson AB, 22183 Lund, Sweden (e-mail: [email protected]). S. G. Wilson is with the Department of Electrical and Computer Engineering, University of Virginia, Charlottesville, VA 22904 USA (e-mail: [email protected]). Digital Object Identifier 10.1109/TCOMM.2005.852848

large transmitter arrays has remained a difficult problem. The main obstacle is the decoding process, which often requires computational complexity exponential in the number of transmit antennas, due either to the space–time encoder state complexity or to the size of the transmit vector alphabet. In response, a number of results proposing concatenated encoder structures for space–time (ST) applications that can circumvent the state-complexity bottleneck have been presented recently [10]–[12], with promising diversity and data-rate results. Reference [12] also provides a theoretical justification of why concatenated ST codes can be expected to perform well in the context of fading multiple-input, multiple-output (MIMO) channels.

In this paper, we focus on efficient decoding algorithms to complement the serially concatenated ST encoder architectures. Our goal is to take advantage of the growing outage capacity and transmit at a rate proportional to the array size, while achieving maximal diversity benefits, and to accomplish this with a practical decoding complexity, even for large antenna arrays.

The paper is organized as follows. In Section II, we introduce the transmission system model. Next, the valuable features of concatenated codes are reviewed in Section III as they relate to the necessary conditions for full-diversity operation, and a concatenated encoder structure is presented that is well suited for scalable-rate transmission. This section is a summary of the analysis previously presented in [11] and [12]. An extended iterative decoding architecture and its reduced-complexity variants are then developed in Section IV. It is exactly the concatenated-encoder viewpoint, both explicit and implicit, of the transmission chain that allows us to handle the decoding efficiently, without the traditional exponential complexity. The performance results are presented in Section V, and in Section VI, we conclude with some comments about the sensitivity of the reduced decoding to adverse channel conditions and a summary.

We note that a multitude of related work has been published during the review cycle of this paper. In [13] and [14], differential STC systems are presented for flat-fading channels, using a concatenation of simple convolutional codes. Constituent code and interleaver design guidelines for serially concatenated ST codes are proposed in [15]. Other concatenated ST encoder structures have been proposed, for example, in [16].

II. TRANSMISSION MODEL

The general system model for transmission over an L-transmitter, M-receiver fading channel, denoted by an M-by-L matrix H, is depicted in Fig. 1. Although the performance of ST transmission depends on the cooperation of the binary encoder and the signal-space modulator, we find it convenient to view the two as separate entities. Source symbols

0090-6778/$20.00 © 2005 IEEE

REIAL AND WILSON: MULTISTAGE ITERATIVE DECODING WITH COMPLEXITY REDUCTION

Fig. 1. General ST transmission system model.

from a q-ary alphabet are encoded by an encoder E, producing code symbols c from a larger alphabet, where u_i and c_i denote the ith bit in the information and code symbol labels, respectively. Each of the L transmitters uses a q-ary signal constellation, and a one-to-one mapping in the vector modulator maps each c to a transmit vector s. In general notation, s_i denotes the ith element of vector s, and s(i -> a) denotes the vector s with its ith element replaced by a. We consider block transmission of information bits, and the codeword is a sequence of N transmit vectors, interpreted as an L-by-N matrix, where N is the number of vector transmissions per block. At each transmission, the receiver observes an M-tuple

r = Hs + n

where n is a white Gaussian noise vector of circularly symmetric elements, with zero mean and variance N_0/2 per constellation dimension. The vector-symbol SNR for complex constellations is defined as E_s/N_0, and the bit SNR as E_b/N_0 = E_s/(b N_0). We assume that the channel matrix H is known at the receiver, but no channel side information is available at the transmitter, and adopt the standard quasi-static fading model, where the channel is constant for the duration of a block. Multipath delay spread, and thus intersymbol interference (ISI), is assumed negligible. The linear growth of capacity with the array size and the achievable diversity LM are justified when we assume that the multipath consists of many uncorrelated rays, resulting in complex Gaussian path gains and Rayleigh fading statistics. However, the proposed decoding scheme does not rely on any specific channel statistics, and is applicable to an arbitrary channel realization.

III. STC DESIGN CRITERIA AND CONCATENATED ENCODERS

A number of STC encoder structures employing concatenation have been proposed recently [10], [17], [18], primarily as replacements or augmentations of the traditional trellis [8] or block STC [9]. Next, to motivate the following decoder design, we briefly review the main features of the concatenated coding structures that make them well suited for STC tasks. The rigorous development of the presented properties and criteria may be found in [11] and [12].
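The quasi-static block-transmission model above can be exercised numerically. The sketch below is purely illustrative (the array sizes, block length, and noise level are arbitrary choices, not values from the paper): it draws one Rayleigh channel realization H, holds it fixed for the block, and forms the received M-tuples column by column.

```python
import numpy as np

rng = np.random.default_rng(0)

L, M = 4, 4        # transmit / receive array sizes (illustrative)
N = 8              # vector channel uses per block (illustrative)

# quasi-static Rayleigh fading: H is drawn once and held for the block
H = (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L))) / np.sqrt(2)

# QPSK symbols at every transmitter, one column per channel use
qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
S = qpsk[rng.integers(0, 4, size=(L, N))]

# circularly symmetric AWGN, variance N0/2 per real dimension
N0 = 0.5
noise = np.sqrt(N0 / 2) * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))

R = H @ S + noise  # the received M x N block, one M-tuple per transmission
```

Each column of R is one received M-tuple r = Hs + n; a full simulation would redraw H independently for every block to sample the fading ensemble.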


A. State Complexity

It was shown in [8] that an ST trellis code transmitting at a data rate of b b/vcu (bits per vector channel use) over an L-transmitter, M-receiver channel must have at least

2^(b(L-1))    (1)

states to provide the maximal diversity order of LM. For large b and L, straightforward decoding of codes with such state complexity is not feasible. However, it is well known that the family of concatenated and interleaved coding structures (turbo codes: [19], [20]) exhibits inherently high composite state complexity, possibly exponential in the block length. The effective high number of states is achieved via interleaving. On the other hand, these codes can be iteratively decoded with effort proportional to the relatively simple constituent-code complexity, making them good candidates for STC, since they satisfy the necessary condition (1).

B. Performance on Gaussian Channels

The traditional concatenated coding structures are known for their ability to perform near the Shannon capacity limits on Gaussian channels. Viewing each realization of the block-fading multiantenna channel as a separate MIMO Gaussian channel, it is clear that when near-capacity performance can be achieved on an arbitrary channel realization, the average bit-error rate (BER) over the fading ensemble must also be within a certain bounded distance of the information-theoretic limits. This is a useful design justification for ST codes. Contending that the performance of a practical code is close to the information-theoretic limit on any channel realization implies that, for a fixed reference BER, the ratio between the SNR of the observed performance and the capacity-limit SNR is bounded. This may be thought of as a bounded SNR gap in log-log coordinates. That is,

SNR / SNR_lim <= Delta    (2)

for some constant Delta, where SNR_lim is the SNR at which the capacity limit reaches the reference BER. In this case, averaging the practical BER over the channel ensemble results in an asymptotic slope of -LM in log-log coordinates, and thus, LM-fold diversity is guaranteed [11], [12]. In other words, a bounded performance gap is a sufficient condition for asymptotic full diversity on a fading channel ensemble. Thus, a strong concatenated code, whose performance is within a bounded distance from the Fano bound for any channel realization, will also provide the maximal diversity order at high SNR.

C. Average Performance and Rank/Determinant Properties

In the context of [8], we can isolate certain features of the concatenated encoder that underlie good STC fading performance. For the codewords C_1 and C_2, denote A = (C_1 - C_2)(C_1 - C_2)^H. For full-diversity operation, it is required that A have full rank, rank(A) = L. Denoting the nonzero eigenvalues of A as lambda_1, ..., lambda_L, the coding gain of the STC is proportional to (lambda_1 lambda_2 ... lambda_L)^(1/L).

Interleaved concatenation is well suited to satisfying the conditions of [8].
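The rank and determinant criteria can be checked numerically for any candidate codeword pair. The sketch below is illustrative only (the function name and the toy BPSK codeword pair are inventions for this example, not from the paper): it forms the difference matrix, its eigenvalues, and the geometric-mean coding-gain proxy.

```python
import numpy as np

def st_pair_metrics(C1, C2):
    """Rank and coding-gain proxy for an ST codeword pair of L x N matrices."""
    D = C1 - C2
    A = D @ D.conj().T                 # L x L difference matrix A = D D^H
    eig = np.linalg.eigvalsh(A)        # A is Hermitian positive semidefinite
    nz = eig[eig > 1e-10]              # nonzero eigenvalues
    rank = len(nz)
    gain = float(np.prod(nz)) ** (1.0 / rank) if rank else 0.0
    return rank, gain

# toy BPSK codeword pair over L = 2 transmitters, N = 2 vector uses
C1 = np.array([[1, 1], [1, -1]], dtype=complex)
C2 = np.array([[-1, 1], [1, 1]], dtype=complex)
rank, gain = st_pair_metrics(C1, C2)
```

A full-rank A for every codeword pair gives the maximal diversity order; a large product of eigenvalues gives a large coding gain.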


Fig. 2. Serially concatenated STC system model.

The weight multiplication, arising from the use of an interleaver in a typical (parallel or serial) concatenated encoder, ensures a certain minimum Hamming distance at the composite encoder output. More importantly, it also thins the weight spectrum, reducing the probability of codeword pairs with small distance. In addition, an interleaver prior to the vector modulator can efficiently disperse the Hamming weight that determines the elements of the codeword difference matrix, or equivalently, disperse the difference weight to produce many instances of nonzero Euclidean distance among all entries of C_1 - C_2, pseudorandomly and uniformly. This produces codeword pairs whose difference matrix has full rank and large determinant with high probability, which, in turn, leads to favorable rank and Euclidean distance properties. Therefore, a serially concatenated coding scheme with sufficient interleaving can be the ST encoder of choice for high data rates and/or antenna-array sizes.

Asymptotic full diversity requires that all codeword pairs yield a full-rank difference matrix. However, nonasymptotically, it is entirely feasible to build a system with the desired BER slope down to any target BER if we can control the rank properties of the code statistically, i.e., guarantee that the fraction of rank-deficient pairs is sufficiently low. It can also be shown [11], [12] that the probability of rank-deficient events can be driven arbitrarily low, simply by increasing the block length.

D. Encoder Architecture

The serially concatenated coding system may be seen in Fig. 2 by considering the full details of implementing the encoder function E. The outer convolutional encoder operates at its own rate, with its own numbers of input and output bits per trellis step; analogous notation is used for the inner convolutional encoder, which is recursive. Suitable code rates and input alphabet sizes can be chosen independently for each encoder, because the bit stream is serialized between the encoders. The first interleaver ensures that most events generating low weight out of the outer encoder will produce significant output weight after the inner encoder.
This guarantees a certain minimum Hamming distance, thins the code distance spectrum, and provides the interleaving gain by ensuring that the minimum-distance events have minimum information weight. The second interleaver spreads the output weight uniformly among the transmitters for the vector modulator. It also makes the likelihoods of the bits constituting a given transmit vector appear approximately independent to the decoder, avoiding bursty errors in decoding. Spread-random bit interleavers (SIL) [21] are used for both interleavers because of their pseudorandom structure and guaranteed spreading properties.
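The spreading property can be illustrated with the common S-random rejection construction, used here as an assumed stand-in for the SIL of [21] (the exact SIL construction differs in its details): input indices placed within s output positions of each other are guaranteed to differ by at least s.

```python
import random

def s_random_interleaver(n, s, seed=0, restarts=1000):
    """Pseudorandom permutation with a spreading guarantee: any two
    entries written within s output positions of each other carry
    input indices that differ by at least s."""
    rnd = random.Random(seed)
    for _ in range(restarts):
        pool = list(range(n))
        rnd.shuffle(pool)
        perm = []
        while pool:
            for k, cand in enumerate(pool):
                # accept cand only if it is far from the last s outputs
                if all(abs(cand - p) >= s for p in perm[-s:]):
                    perm.append(pool.pop(k))
                    break
            else:
                break                 # dead end: restart from a new shuffle
        if len(perm) == n:
            return perm
    raise RuntimeError("no spread permutation found; reduce s")

pi = s_random_interleaver(64, 4)
```

The construction succeeds with high probability as long as s is well below sqrt(n); larger spread factors require longer blocks, consistent with the block-length conditions cited below.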

Fig. 3. (a) Two-state nonrecursive and (b) recursive systematic constituent encoders.
Many constituent encoder structures are possible. Because the averaged performance over a fading channel ensemble is dominated by channel realizations with low effective SNR, we chose simple two-state codes, known for their robustness at low SNR in iterative decoding [22]. The information transmission rate in b/vcu depends on the rates of the constituent encoders. Rate-1/2 systematic encoders were used (Fig. 3), followed by puncturing of parity bits, to produce the desired overall encoder rate. For the well-known SIL and constituent encoder structures, lower bounds on the block length can be formulated as necessary conditions for full-diversity operation [11], [12].

IV. DECODING

Maximum-likelihood (ML) or maximum a posteriori (MAP) decoding complexity for STC schemes is typically determined by the state complexity of the encoder and its effective input alphabet size. Concatenated encoders usually admit a suboptimal, but well-performing, iterative decoding algorithm, which mitigates the concerns about the composite encoder complexity. Unfortunately, no simplification is generally available for the vector symbol alphabet. Due to the matrix channel effects, the elements of the receiver alphabet cannot be decomposed into scalar components, but since the alphabet size is exponential in L, we should also avoid decoding methods which require exhaustive coverage of all vector hypotheses. Successive-cancellation approaches have been used [4], [23] to decouple the code symbols, but these limit the achievable diversity, analogous to the observation that individual symbol decisions in the decoding scheme of [2] can have a diversity order well below the maximal value. Furthermore, true soft information, accounting for the imperfect cancellation, is not easily available. Consistent with the concatenated structure of the encoder, we will instead look for a generalized iterative algorithm that is amenable to alphabet-size reduction.
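A two-state recursive systematic constituent encoder like the one in Fig. 3(b) reduces, in its simplest form, to an accumulator for the parity bit. The sketch below is an assumed minimal instance (the paper's exact generator polynomials and puncturing pattern are not specified here) and also shows rate adjustment by puncturing parity bits.

```python
def rsc2_encode(bits):
    """Rate-1/2 systematic encoder with a single memory element (two
    states): the parity is an accumulator, p_k = p_{k-1} XOR u_k, so
    the encoder is recursive. Output order per step: (u_k, p_k)."""
    p = 0
    out = []
    for u in bits:
        p ^= u               # feedback through the single memory element
        out.extend([u, p])   # systematic bit, then parity bit
    return out

coded = rsc2_encode([1, 0, 1, 1])
# puncture every second parity bit to raise the overall rate
punctured = [bit for i, bit in enumerate(coded) if i % 4 != 3]
```

With this puncturing, four information bits map to six transmitted bits (rate 2/3); any desired overall rate is reached by the choice of puncturing pattern.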


Fig. 4. Traditional two-stage SISO decoder structure.

Fig. 5. Generalized three-stage SISO decoder structure.

A. Generalized Iterative Decoding

Iterative decoding of concatenated codes with soft-input, soft-output (SISO) constituent decoders [20], an approximation of the true MAP algorithm, requires initial computation of the marginal likelihoods for all transmitted bits. The complexity of this procedure is exponential in L in the case of the optimal, total-probability method, which involves computing the likelihoods of all vector hypotheses. However, the number of handled hypotheses can be reduced significantly if the decoding structure is augmented with an additional feedback stage, as explained next.

The cascading of the transmission stages in Fig. 2 can be viewed in two different ways. Traditionally, the encoding function is viewed as being limited to the explicit concatenated encoder, followed by a composite channel consisting of the modulator, the matrix H, and the receiver noise source. The composite channel input symbols are the code symbols and the output symbols are the received vectors. A simple iterative decoding loop [20], shown in Fig. 4, approximates MAP decoding. However, we can also interpret the combination of the modulator and the matrix channel as a "virtual" third encoder in serial concatenation, followed by a simple Gaussian noise channel. If the outer encoder is treated as a stand-alone stage, then this combination on its own (with the channel


input being the inner code's output) can also be decoded in a single loop. Joining the two loops generalizes the conventional two-stage decoding structure to three stages, leading to the configuration shown in Fig. 5. The extra feedback loop allows the information from a previous iteration, describing the coded bits, to augment the information derived from the probability distribution of the vector symbols, and to improve the quality of the current bit estimates. Clearly, when the second loop is cut, this scheme becomes equivalent to the traditional two-stage processing, where the probabilities of the code symbols are calculated only once, based on the received data alone, and some degradation in decoding results should be expected, compared with the three-stage approach. However, adding the third stage is useful only when the second interleaver is present, as the independence of the probability distributions for adjacent bits, as they are passed between the encoders, is critical to the success of iterative decoding.

The full decoding complexity in an n-stage SISO loop is directly proportional to the number of states and the size of the input alphabet in each of the constituent encoders,



Fig. 6. Subset selection for reduced-complexity decoding.

scaled by the number of iterations I. The binary encoders can be chosen with small constraint lengths and input alphabet sizes. The "virtual" encoder is memoryless, so it has only one state. However, its input alphabet is the full vector-symbol alphabet, whose size is exponential in L, so the edge complexity makes full-complexity decoding infeasible for large L. We will propose a suboptimal approximation, effectively considering only a small number of edges and pruning the rest.

B. Reduced-Complexity Solution

The SISO-style channel decoder (a simplification of the general case in [20]) must produce posterior extrinsic probability distributions for the bits making up the symbols. For the one-state encoder, these assume the form

P(c_i = a) = K sum over s in Omega with mu_i(s) = a of p(r | Hs) prod over j != i of P(c_j = mu_j(s))    (3)

where p(r | Hs) is the channel measurement-based Gaussian likelihood of the hypothesis s; by Hs, we denote the output of the matrix channel when the vector modulator input is s, and mu_j(s) denotes the jth bit of the label of s. The product term provides the bit prior probabilities available from a previous iteration. These quantities are not strictly probabilities, due to the lack of proper normalization, which for the final result is ensured via a suitably chosen constant K. In the full-complexity implementation, Omega is the entire vector alphabet.

The reduction in computation is achieved by judiciously composing a set Omega of reduced cardinality, isolating the dominant terms in (3). We use the likelihood and prior metrics separately to form the desired set as a union of two parts, denoting the sets of edges carrying preferred likelihood and prior metrics, respectively. Since the two parts are decoupled, it is important, especially at initial iterations, to merge their contributions on each edge we select. This is achieved by a search that uses an element of either part as an initial value and attempts to maximize the composite metric. This subset composition algorithm, implemented as an edge-selection preprocessor attached to the channel decoder, is depicted in Fig. 6. We next cover the selection and search algorithms in more detail. The time indexes are omitted, since a single code vector is considered at any time.

C. Likelihood-Based Selection

For a nondiagonal channel transformation H, the decision regions in the transform space are not simple Voronoi regions, and simple quantization or algebraic detection in the M-dimensional receiver space is not sufficient to determine the ML estimate of the transmit vector and other high-likelihood hypotheses. To avoid an exhaustive search, we operate in the original transmitter space, where the constellations at each transmitter have simple decision regions. To that end, we apply a pseudoinverse to the received vector r, and quantize the result to reach an estimate of the transmitted vector.

The pseudoinverse can be formed in the zero-forcing (ZF) sense, (H^H H)^(-1) H^H, or in the minimum mean-square error (MMSE) sense, with a noise-dependent diagonal regularization term added before the inversion. However, the effective noise after the pseudoinverse is spatially correlated, since the transformed noise covariance is generally not an identity matrix, and the quantization result is not necessarily close to the ML estimate.
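The pseudoinverse front end, together with the neighbor-based safety mechanism described next, can be sketched as follows (the neighbor count and subset size are illustrative placeholders, not the paper's tuned parameters): form a ZF estimate, perturb each coordinate toward its nearest constellation neighbors, and keep the hypotheses with the best Gaussian likelihood metric.

```python
import numpy as np

def likelihood_subset(H, r, constellation, n_neigh=2, J=8):
    """ZF estimate, per-transmitter neighbor perturbations, then keep
    the J hypotheses with the best likelihood metric ||r - H s||^2."""
    L = H.shape[1]
    x = np.linalg.pinv(H) @ r                  # pseudoinverse front end
    s0 = constellation[np.abs(x[:, None] - constellation[None, :]).argmin(axis=1)]
    cands = [s0.copy()]
    for i in range(L):                         # perturb one coordinate at a time
        order = np.abs(constellation - x[i]).argsort()
        for a in constellation[order[1:1 + n_neigh]]:   # nearest other symbols
            s = s0.copy()
            s[i] = a
            cands.append(s)
    cands = np.array(cands)
    # smaller ||r - H s||^2 corresponds to a larger Gaussian likelihood
    metrics = np.array([np.linalg.norm(r - H @ s) ** 2 for s in cands])
    keep = np.argsort(metrics)[:J]
    return cands[keep], metrics[keep]

qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
rng = np.random.default_rng(2)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
s_true = qpsk[rng.integers(0, 4, size=4)]
subset, metrics = likelihood_subset(H, H @ s_true, qpsk)  # noiseless sanity check
```

In the noiseless sanity check, the ZF estimate quantizes exactly to the transmitted vector, so the best retained hypothesis has metric zero; with noise, the perturbed neighbors guard against quantization to the wrong Voronoi region.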


On well-conditioned and large-norm channel realizations, the estimate is likely to be of high quality, but for badly conditioned channels, the equilikelihood surfaces lose their spherical form and become severely stretched ellipsoids in the L-dimensional transmit space. To provide a safety mechanism for handling such cases, we also evaluate the vicinity of the initial estimate in the transmitter space. For the ith transmitter, a list of the constellation symbols nearest to the ith pseudoinverse output is produced. We then evaluate the received-vector likelihoods at the neighboring points, and pick the vectors with the best likelihoods to form the likelihood-based selection set. The inverse modulator map then provides the desired binary words that comprise the set. The concise algorithm can be presented as follows:
1) compute the pseudoinverse estimate of the transmit vector and quantize it;
2) for all transmitters i = 1, ..., L, find the nearest constellation neighbors of the ith estimate element [table lookup; the search can be performed offline];
3) for all transmitters and all nearest neighbors, compute the likelihood of the estimate with its ith element replaced by the neighbor;
4) pick the vectors with the best likelihoods from among the initial and perturbed candidates;
5) map the selected vectors to binary codewords.
Our implementation has complexity proportional to the number of evaluated neighbors per transmitted vector. We used a ZF pseudoinverse and a small number of nearest neighbors per transmitter in the evaluations in Section V.

D. Prior-Based Selection

Unlike the codewords with good likelihoods, the codewords with the best prior probabilities can be determined exactly, with the guarantee that precisely the trellis edges with the largest prior components are, indeed, selected.

We start by making a hard decision on each input-bit probability to determine the preferred codeword: each bit is set to one if its prior probability of being one exceeds that of being zero, and to zero otherwise. This is the codeword with maximum prior probability (4).

To find the subsequent entries, we look for bit-inversion patterns that cause a minimal decrease in the prior product. To that end, we first evaluate the effect of inverting the single ith bit of the codeword, namely the ratio of the flipped bit's prior probability to the hard-decision bit's prior probability (5).

The effect of inverting several bits is expressed simply by the product of the relevant ratios. For further convenience, we take the negative log of these quantities to work with sums, and sort them in ascending order.
We now have an ordered list of nonnegative numbers w_1 <= w_2 <= ..., with the objective of finding the distinct index sets whose sums are as small as possible. For each set, call its largest element the last edge label.

Fig. 7. Search-tree structure for Lq = 3.

The structure of the search can be visualized in the form of a tree with levels and nodes, where each level corresponds to the number of elements in the index set. Each edge has a weight w_i associated with it, and each node is described by a set and its sum. Fig. 7 illustrates the structure for Lq = 3. The sequence of edges leading from the root to any of the nodes has a monotonically increasing label sequence, and therefore, also a nondecreasing weight sequence. Determining the best prior codeword is equivalent to selecting the root node, corresponding to the empty set and zero sum.

The nodes selected in the previous steps are recorded, and the next selected node can only be a direct descendant of one of the nodes already chosen; violating this principle would skip nodes with better sums, because of the monotonic edge-weighting scheme. When a node is chosen, its immediate descendants are obtained either by extending its set with the next-larger edge label or by replacing its last edge label with the next-larger one. We can therefore execute each selection by picking the top entry out of an ordered list of candidates stemming from previous selections, each already calculated when that selection was made, and ordered by the candidate sums. After each step, we only have to compute the descendants of the newly selected node and insert them at their proper positions in the list. If the last edge label of a node is already the largest element of the weight list, the node is pruned from further evolution. The algorithm therefore has a simple and efficient structure.
1. Determine the best prior codeword as in (4).
2. Compute the flip-effect values as in (5).
3. Create the additive sorted list of log penalties.
4. Select the root node (no bits flipped).
5. Repeatedly pick the candidate with the smallest sum; place its descendants at their proper positions in ascending order of metrics, or prune them if no larger edge label exists.
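The tree search for the smallest-sum flip patterns can be implemented with a priority queue. The sketch below follows the extend/replace-last descendant rule described above; the use of Python's heapq and the function name are our choices, not necessarily the paper's implementation.

```python
import heapq

def best_flip_patterns(w, J):
    """Enumerate the J index sets with the smallest sums over the
    sorted nonnegative penalties w, starting from the empty set
    (the root node, i.e., the unmodified hard-decision codeword)."""
    w = sorted(w)
    n = len(w)
    results = [(0.0, ())]                       # root: no bits flipped
    heap = [(w[0], (0,))] if n else []          # the root's only descendant
    while heap and len(results) < J:
        s, g = heapq.heappop(heap)
        results.append((s, g))
        last = g[-1]
        if last + 1 < n:
            # descendant 1: extend the set with the next-larger label
            heapq.heappush(heap, (s + w[last + 1], g + (last + 1,)))
            # descendant 2: replace the last label with the next-larger one
            heapq.heappush(heap, (s - w[last] + w[last + 1], g[:-1] + (last + 1,)))
    return results

patterns = best_flip_patterns([0.3, 0.9, 1.4], 4)   # the 4 best flip sets
```

Each index set is generated exactly once under this descendant rule, and the nondecreasing edge weights guarantee that sums are produced in globally ascending order.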



E. Space-Filling Initialization

At the first iteration, there is no useful prior information available to direct the search. But since the likelihood-based initial starting points may not fully capture the distribution of all bit positions, and may exclude some states of some bits altogether, it is advantageous to expand the search space by providing additional initial points, in lieu of the guided estimates that are available on subsequent iterations.

If, in the absence of any prior information, all codewords are considered equiprobable, it is desirable to spread the initial points of the search paths as widely as possible in the codeword space, to give all bit states a roughly equal opportunity to be reached. If the number of initial points is large, the starting points can simply be distributed randomly, with uniform distribution, over the vector alphabet. But if it is small, a more systematic method is required to guarantee approximately uniform coverage.

The problem is equivalent to designing a codebook that is spread in the available space as uniformly as possible, in order to provide maximal intercodeword distances. The desired "code" consists of the chosen number of codewords, each drawn from the binary label alphabet. This is equivalent to designing a binary (n, k, d) block code with n equal to the label length, 2^k equal to the number of initial points, and d as large as possible. The constructive method of the Varshamov bound [24, p. 439] can be applied to form a parity-check matrix for the code [11], from which the generator matrix is obtained.

We then create the distinct binary input patterns and compute the desired initialization points as their images under the generator matrix. This process can be performed offline, and thus does not contribute to the overall complexity.

F. Composite Search

Given the initial points, we need to find a set of codewords with good composite metrics, although global optimality cannot be guaranteed. We use a method similar to the likelihood-based selection above, with the difference that the search occurs in the Hamming space and has a greedy nature. For each initial estimate, we find the composite metrics for all bit vectors having a Hamming distance of 1 from it. From this set, we choose the entry with the best metric as the survivor of the current step. The process is repeated until there is no improvement, or for a fixed maximum number of steps, each time taking the survivor as the starting point for the next step.

Given the resulting ordered lists of codewords and their metrics, we finally choose the unique entries with the best metrics to form the retained set. All other edges in the trellis of the virtual encoder are pruned.
1. For all initial points, perform up to the maximum number of search steps, or until no improvement in the composite metric: form the Hamming-distance-1 neighborhood, determine the best entry, and save it as the survivor.
2. Out of the saved words, choose the unique entries with the best metrics; they comprise the set used in (3).
The complexity of the search is proportional to the number of initial points, the number of search steps, and the codeword length, and the final selection is a simple sort.

G. Reduced-Complexity Parameter Selection

Up to now, we have expressed the computational complexity of the subset-selection scheme in terms of the selection-space parameters. Practical decoding for very large arrays is made feasible by our finding that the subset results closely approximate the full-complexity implementation if the selection parameters depend only linearly on L (or on b, which information theory allows to grow proportionally to L). We have chosen the selection-set sizes proportional to the array size, with an additional expansion factor for larger constellations, where individual bits are closely coupled, to obtain more consistent results. Thus, the subset algorithm can be implemented with a per-information-bit effort that grows only polynomially, rather than exponentially, in L. We note that for small values of L, the asymptotic growth rates do not yet come into effect, and for the array sizes considered here, we have found experimentally that the complexity growth, judged by the processing time in the computer simulations, appears to be effectively linear in L. In other words, for small-to-modest L, the processing time per bit remains approximately independent of L.

V. PERFORMANCE RESULTS

We next present numerical simulation results for the concatenated STC encoder of Section III-D, with two-state constituent encoders, quaternary phase-shift keying (QPSK) signaling at each transmitter, and decoding using ten three-stage iterations, except where noted. The block length is indicated for each figure. We focus on large transmit-array results, the main intended application for the method. Since we are not aware of previously proposed systems targeting full diversity while allowing the transmission rate to grow proportionally to the array size, we chose the information-theoretic limits as the comparison criterion for our results. It can be shown [11] that for large array sizes, the asymptotic diversity order is reached only at high SNR and very low BER, precluding a direct diversity-slope evaluation based on the performance in the observable region.
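The information-theoretic comparison limits can be sketched numerically. The snippet below is illustrative (function names, the equal-power assumption, and normalizations are ours): a per-realization MIMO capacity, and the bit-error lower bound implied by the Fano inequality, h2(Pe) >= 1 - C/b for rate b over a channel of capacity C.

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def fano_ber_bound(C, b):
    """Smallest bit-error probability permitted by the Fano inequality
    at rate b bits per use over a channel of capacity C bits per use."""
    target = 1.0 - C / b
    if target <= 0:
        return 0.0                       # rate below capacity: no positive bound
    lo, hi = 1e-12, 0.5
    for _ in range(100):                 # bisection; h2 is increasing on (0, 0.5)
        mid = 0.5 * (lo + hi)
        if h2(mid) < target:
            lo = mid
        else:
            hi = mid
    return hi

def realization_capacity(H, snr):
    """Capacity of one quasi-static realization, equal power per antenna."""
    M, L = H.shape
    G = np.eye(M) + (snr / L) * (H @ H.conj().T)
    return float(np.log2(np.linalg.det(G).real))
```

Averaging fano_ber_bound(realization_capacity(H, snr), b) over many channel draws gives an averaged Fano-type reference curve of the kind used for comparison below.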
Instead, we will compare our performance against the average lower bound on the BER arising from the application of the Fano inequality, averaged over all channel realizations encountered in the average BER calculations. The bound is indicated as a dotted line in the plots. The practical coding system is deemed satisfactory in terms of diversity if the slope of its average error curve agrees with that of the Fano bound at the same BER.

We first compare the two-stage and three-stage decoding processes for L = M = 4, where the array size is not yet prohibitively large for full-complexity decoding. We can, therefore, directly


Fig. 8. Comparison of two-stage and three-stage iterative decoding for L = M = 4, b = 4 b/vcu, small block length N = 2000.

compare the different decoding structures. In Fig. 8, it can be seen that the edge-pruning approach to reducing complexity does not incur a significant performance penalty, compared with the full-complexity implementation. The SNR penalty is negligible when the search-space parameters are chosen proportional to the array size, as indicated above. As anticipated, the three-stage decoder significantly improves upon the two-stage decoding, because output extrinsic information from the inner decoder is exploited, and the reduced-complexity implementation also fully retains the benefits derived from the additional feedback loop.

The main potential of this work lies in the observation that the reduced-complexity algorithm yields favorable results for larger antenna-array sizes as well. Fig. 9 presents the performance of an L = M = 16 system, using a block length of only 200, a system for which the computational savings are significant. Although the full-complexity decoding results are not available, the closeness to the averaged Fano bound suggests that practical performance is not severely affected, even when delay constraints may limit the block length. Finally, Fig. 10 (16-QAM, L = M = 8, b = 16 b/vcu) shows that the bandwidth efficiency can be further improved by using higher-order modulation. As discussed above, larger constellations require a slightly expanded search space, and the expansion factor is applied in these results.

In most cases, the loss relative to the capacity-limit curve is dominated by the channel realizations with low effective SNR, where iterative decoding falls further behind ML decoding [12]. BER results are usually improved if the block length is increased. It is noteworthy that the BER region of interest for large arrays has moved to very low SNRs, as predicted by the Shannon-limit SNRs that follow from [2].

VI. DISCUSSION AND CONCLUSION

Like many reduced-complexity decoding algorithms in which a large fraction of the trellis edges are discarded, the proposed

1307

Fig. 9. Reduced-complexity decoding for L = M = 16, b = 16 b/vcu, small block length N = 200.

Fig. 10. Reduced-complexity decoding for L = M = 8, achieving high spectral efficiency b = 16 b/vcu with 16-QAM, N = 2000.

method also imposes the risk that, under some conditions, the edges carrying the correct hypothesis are omitted, and the decoding process is led astray. The effects may range from mild SNR penalty to a catastrophic failure to converge up to an unreasonably high SNR on a given channel realization. In terms of the ensemble performance, the average BER is most affected by channel realizations where the actual code performance significantly deviates from the capacity limit SNR . Since the latter is fundamentally determined by [11], the realizations with a weak norm are not intrinsically problematic. The degradation usually occurs on channels that in some sense severely warp the receiver space, which, in turn, results in highly ellipsoidal noise contours for the pseudoinverse results. The choice of initial points for the composite search then becomes both critical and difficult, and

1308

IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 53, NO. 8, AUGUST 2005

may result in a serious loss in performance. We have used the bias-free ZF pseudoinverse option, since although the effective noise enhancement would be reduced by the MMSE variant, our aim is locating the proper Voronoi region, as opposed to minimizing the Euclidean distance. These difficult channel realizations usually also have large condition numbers, and require more iterations to converge. Some of the factors that affect the robustness of decoding when some of the edges with significant metrics are inadvertently pruned include the constituent encoder structure and constraint length, the ratio of channel matrix dimensions and , and the size of the constellation [11]. The iterative decoding process presented in this paper can be likened to iterative decoding of bit-interleaved coded modulation (BICM-ID) that was investigated in [25] for single-antenna systems; the three-stage iterative process proposed here exhibits a similar beneficial effect of reducing the random bit modulation. In essence, the additional stage in the iterative structure recovers most of the energy penalty usually associated with disjoint coding and modulation, which, on the other hand, offers additional system flexibility. However, since it is the low-effective-SNR behavior that strongly affects the average performance, several key design criteria differ, e.g., we recommend the Gray labeling as the preferred constellation labeling scheme [11]. Our efforts are focused on lessening the sensitivity of dewith coding to the possible exclusion of some edge labels , in which context the Gray lasignificant metrics from beling scheme works best. However, the mixing effect of the makes this system somewhat less sensitive matrix channel to the labeling scheme than those operating over single-input channels. The serially concatenated encoder structure has several properties that make it well suited for ST applications in terms of diversity. 
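The bias-free ZF initialization and the condition-number diagnostic discussed above can be sketched as follows. This is a minimal illustration under our own assumptions (matrix size, threshold value, BPSK symbols), not the implementation used in the reported simulations:

```python
import numpy as np

def zf_initial_estimate(H, y):
    """Bias-free zero-forcing (pseudoinverse) estimate of the
    transmitted vector: x_hat = pinv(H) @ y.  Unlike an MMSE
    filter, this estimate is unbiased, so it lands in the correct
    Voronoi region whenever the (possibly enhanced) noise does not
    push it across a decision boundary."""
    return np.linalg.pinv(H) @ y

def is_ill_conditioned(H, threshold=30.0):
    """Flag channel realizations with a large condition number;
    these warp the receiver space and tend to require more decoder
    iterations.  The threshold is an illustrative choice."""
    return np.linalg.cond(H) > threshold

# Example: a 4x4 i.i.d. complex Gaussian (Rayleigh-like) channel.
rng = np.random.default_rng(0)
H = (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))) / np.sqrt(2)
x = np.array([1, -1, 1, -1], dtype=complex)  # BPSK symbols for simplicity
y = H @ x                                    # noiseless receive vector
x_hat = zf_initial_estimate(H, y)
print(np.allclose(x_hat, x))                 # noiseless ZF recovers x exactly
```

In the noiseless case the pseudoinverse inverts a full-rank channel exactly; with noise, the ellipsoidal error contours mentioned in the text make the hard decision on `x_hat` unreliable precisely when `is_ill_conditioned(H)` fires.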
However, it is the novel iterative decoding algorithm that makes it practical to exploit these desirable qualities when the advantages relative to conventional ST encoders really become significant, particularly for large array sizes. The relatively simple encoder is well complemented by a suboptimal decoding algorithm, requiring polynomial, instead of exponential, decoding complexity.1 The crucial step in the presented decoding approach lies in using the bit extrinsic probabilities, available from previous stages of decoding the convolutional constituent codes, to capture most of the probability mass while handling only a small number of possible transmitted vector hypotheses, instead of exhaustive coverage. This composite approach provides an appealing STC solution for cases where high data rates are desired with large transmitter arrays.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their useful comments and suggestions.
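The candidate-list principle above, using extrinsic bit probabilities to cover most of the probability mass with only a few vector hypotheses, can be sketched as follows. This is a hypothetical illustration, not the authors' edge-pruning algorithm; the `max_flips` heuristic and the reliability ordering are our own simplifying assumptions:

```python
import itertools
import numpy as np

def candidate_hypotheses(p_bit, max_flips=2):
    """Build a short list of bit-vector hypotheses from extrinsic
    bit probabilities p_bit[i] = P(bit i = 1), instead of
    enumerating all 2**b vectors.  Start from the bitwise hard
    decision and flip only the least reliable bits; for roughly
    independent bits this ordering captures most of the
    probability mass with very few candidates."""
    p_bit = np.asarray(p_bit, dtype=float)
    hard = (p_bit > 0.5).astype(int)
    # Reliability of each bit decision: |p - 0.5|; small = unreliable.
    order = np.argsort(np.abs(p_bit - 0.5))
    hyps = [tuple(hard)]
    for k in range(1, max_flips + 1):
        # Flip k-subsets of the (max_flips + 1) least reliable bits.
        for flip_set in itertools.combinations(order[:max_flips + 1], k):
            cand = hard.copy()
            cand[list(flip_set)] ^= 1
            hyps.append(tuple(cand))
    # De-duplicate while preserving order.
    seen, out = set(), []
    for h in hyps:
        if h not in seen:
            seen.add(h)
            out.append(h)
    return out

# Example: 6 bits, two of them unreliable.
p = [0.95, 0.9, 0.55, 0.85, 0.48, 0.99]
cands = candidate_hypotheses(p, max_flips=2)
print(len(cands))  # 7 candidates instead of 2**6 = 64
```

Each surviving candidate would then be scored against the received vector, so the per-symbol workload grows with the list size rather than exponentially with the bit rate.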


1We note that the complexity of D may be reduced further, to O[L], by simplifying the computation of the extrinsic information P(v = v; O), with a modest performance degradation [11].

REFERENCES

[1] E. Telatar, “Capacity of multi-antenna Gaussian channels,” Eur. Trans. Telecommun., vol. 10, no. 6, pp. 585–595, 1999.
[2] G. J. Foschini, “Layered space–time architecture for wireless communication in a fading environment when using multi-element antennas,” Bell Labs Tech. J., vol. 1, pp. 41–59, Autumn 1996.
[3] G. J. Foschini and M. J. Gans, “On limits of wireless communications in a fading environment when using multiple antennas,” Wireless Pers. Commun., vol. 6, pp. 311–335, Mar. 1998.
[4] G. D. Golden, C. J. Foschini, R. A. Valenzuela, and P. W. Wolniansky, “Detection algorithm and initial laboratory results using V-BLAST space–time communication architecture,” Electron. Lett., vol. 35, pp. 14–16, Jan. 1999.
[5] G. Wornell and M. Trott, “Efficient signal processing techniques for exploiting transmit antenna diversity on fading channels,” IEEE Trans. Signal Process., vol. 45, no. 1, pp. 191–205, Jan. 1997.
[6] A. Narula, M. D. Trott, and G. W. Wornell, “Performance limits of coded diversity methods for transmitter antenna arrays,” IEEE Trans. Inf. Theory, vol. 45, no. 11, pp. 2418–2433, Nov. 1999.
[7] G. G. Raleigh and J. M. Cioffi, “Spatio-temporal coding for wireless communication,” IEEE Trans. Commun., vol. 46, no. 3, pp. 357–366, Mar. 1998.
[8] V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space–time codes for high data rate wireless communication: Performance criterion and code construction,” IEEE Trans. Inf. Theory, vol. 44, no. 3, pp. 744–765, Mar. 1998.
[9] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space–time block codes from orthogonal designs,” IEEE Trans. Inf. Theory, vol. 45, no. 7, pp. 1456–1467, Jul. 1999.
[10] A. Stefanov and T. M. Duman, “Turbo-coded modulation for wireless communications with antenna diversity,” in Proc. IEEE Veh. Technol. Conf., Sep. 1999, pp. 1565–1569.
[11] A. Reial, “Concatenated space–time coding,” Ph.D. dissertation, Univ. Virginia, Charlottesville, VA, May 2000.
[12] A. Reial and S. G. Wilson, “Concatenated space–time coding for large antenna arrays,” IEEE Trans. Inf. Theory, submitted for publication.
[13] A. Grant and C. Schlegel, “Differential turbo space–time coding,” in Proc. IEEE Inf. Theory Workshop, Sep. 2001, pp. 120–122.
[14] C. Schlegel and A. Grant, “Concatenated space–time coding,” in Proc. 12th IEEE PIMRC, vol. 1, Sep. 2001, pp. C139–C143.
[15] L. Xiaotong and R. S. Blum, “Guidelines for serially concatenated space–time code design in flat Rayleigh fading channels,” in Proc. IEEE SPAWC, 2001, pp. 247–250.
[16] D. Tujkovic, “Recursive space–time trellis codes for turbo-coded modulation,” in Proc. IEEE Globecom, vol. 2, Dec. 2000, pp. 1010–1015.
[17] A. Reial and S. G. Wilson, “Concatenated space–time coding,” in Proc. CISS, sec. TA6, Mar. 2000, pp. 12–17.
[18] G. Bauch, “Concatenation of space–time block codes and “turbo” TCM,” in Proc. IEEE Int. Conf. Commun., vol. 2, Jun. 1999, pp. 1202–1206.
[19] C. Berrou, A. Glavieux, and P. Thitimajshima, “Near-Shannon-limit error-correcting coding and decoding: Turbo codes,” in Proc. IEEE Int. Conf. Commun., 1993, pp. 1064–1070.
[20] S. Benedetto, D. Divsalar, G. Montorsi, and F. Pollara, “Serial concatenation of interleaved codes: Performance analysis, design, and iterative decoding,” TDA Prog. Rep. 42-126, Aug. 15, 1996.
[21] S. Dolinar and D. Divsalar, “Weight distributions for turbo codes using random and nonrandom permutations,” TDA Prog. Rep. 42-122, Aug. 15, 1995.
[22] C. Berrou, “Some clinical aspects of turbo codes,” in Proc. Int. Symp. Turbo Codes, Brest, France, 1997, pp. 26–31.
[23] D. Shiu and J. M. Kahn, “Layered space–time codes for wireless communications using multiple transmit antennas,” in Proc. IEEE Int. Conf. Commun., vol. 1, Jun. 1999, pp. 436–440.
[24] S. G. Wilson, Digital Modulation and Coding. Englewood Cliffs, NJ: Prentice-Hall, 1996.
[25] X. Li, A. Chindapol, and J. A. Ritcey, “Bit-interleaved coded modulation with iterative decoding and 8-PSK signaling,” IEEE Trans. Commun., vol. 50, no. 8, pp. 1250–1257, Aug. 2002.


Andres Reial (S’96–M’00) received the B.S.E.E. degree in 1993 from the Maharishi International University, Fairfield, IA, and the M.S.E.E. and Ph.D. degrees in 1997 and 2000, respectively, from the University of Virginia, Charlottesville. He was a Research Assistant with the Communications, Control, and Signal Processing Laboratory at the University of Virginia from 1997 to 2000. Since 2000, he has been with Ericsson Research, Lund, Sweden. His current research interests include advanced signal processing and receiver structures for wideband CDMA, as well as multiantenna transceiver systems.


Stephen G. Wilson (S’65–M’68–SM’99) received the B.S.E.E. degree from Iowa State University, Ames, the M.S.E.E. degree from the University of Michigan, Ann Arbor, and the Ph.D. degree from the University of Washington, Seattle. He is currently a Professor of electrical and computer engineering at the University of Virginia, Charlottesville. His research interests are in applications of information theory and coding to modern communication systems, specifically digital modulation and coding techniques for satellite channels, wireless networks, spread spectrum technology, and ST coding for multipath channels. Prior to joining the University of Virginia faculty, he was a Staff Engineer for The Boeing Company, Seattle, WA, engaged in system studies for deep-space communication, satellite air traffic control systems, and military spread-spectrum modem development. He is the author of Digital Modulation and Coding (Englewood Cliffs, NJ: Prentice-Hall, 1996). He has authored over 100 technical articles on communication system design and signal processing systems. He also acts as consultant to several industrial organizations in the area of communication system design and analysis, and digital signal processing. Dr. Wilson is presently an Area Editor for Coding Theory and Applications of the IEEE TRANSACTIONS ON COMMUNICATIONS. He has been recognized for outstanding teaching with Distinguished Professor awards from the University of Virginia Alumni Association, the State Council on Higher Education in Virginia, and the ASEE-Southeastern Section.
