IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 53, NO. 5, MAY 2005
Doubly Iterative Decoding of Space–Time Turbo Codes With a Large Number of Antennas
Ezio Biglieri, Fellow, IEEE, Alessandro Nordio, Member, IEEE, and Giorgio Taricco, Senior Member, IEEE
Abstract—We examine the performance of a reduced-complexity doubly iterative decoder for space–time turbo codes on a quasi-static fading channel. The decoder works by using preliminary soft values of the coded symbols, obtained after a limited number of turbo iterations, to reduce the spatial interference from the received signal. Then, new turbo iterations are performed to improve on the quality of the soft values, and so on. Using a number of approximations, we obtain a receiver interface that achieves a good tradeoff between performance and complexity, and allows the use of turbo space–time codes for a large number of transmit and receive antennas.
Index Terms—Fading channels, multiple antennas, space–time codes, turbo codes.
I. INTRODUCTION
MULTIPLE-antenna techniques have recently been recognized as capable of greatly increasing the spectral efficiency of wireless systems (see, e.g., [13] and the references therein). For this reason, a considerable research effort is being spent to design space–time codes that approach the impressive values of channel capacity available. A problem here is the complexity of optimum decoding: maximum-likelihood receiver interfaces exhibit a complexity that grows exponentially with the modulation size and the number of antennas, and quickly becomes impractical as either parameter grows. Thus, in addition to the search for good space–time codes, it is important to seek receiver interfaces that achieve close-to-optimum performance while keeping a moderate complexity: this would remove the practical restriction to small signal constellations or few antennas. As for code design, original studies of trellis space–time codes [20] showed a state complexity exponential in the number of antennas. More recently, it was argued [6] that good performance for low-to-intermediate signal-to-noise ratios can be achieved by using “textbook” codes, designed for use over the additive white Gaussian noise channel, with coded symbols distributed, in a round-robin fashion, among transmit antennas. This suggests that capacity-approaching codes could be used as space–time codes, provided that their decoding complexity is tolerable. As for the complexity curse, suboptimal receiver
Paper approved by H. El-Gamal, the Editor for Space-Time Coding and Spread Spectrum of the IEEE Communications Society. Manuscript received September 16, 2003; revised October 1, 2004. This paper was presented in part at the Second Joint Workshop on Communications and Coding (JWCC 2003), Nuits-St.-Georges, France, October 19–21, 2003 and at the International Conference on Communications, Paris, France, June 20–24, 2004. The authors are with the Dipartimento di Elettronica, Politecnico di Torino, 10129 Torino, Italy (e-mail: [email protected]; alessandro.nordio@polito.it; [email protected]). Digital Object Identifier 10.1109/TCOMM.2005.847156
interfaces may include linear filters, successive cancellations of spatial interference, or sphere decoding. The first suboptimal architecture, called the Bell Labs Layered Space–Time (BLAST) and based on the layer concept, was introduced by Foschini et al. [8], [9]. In [1], it was shown that this architecture, for quasi-static fading channels and perfect channel state information available at the receiver, can perform at 2.5–3 dB from channel outage probability.¹ Sphere-decoding techniques, possibly in a simplified form, were advocated in [10], [21], [22]. In [6], [7], linear and vertical-BLAST interfaces were analyzed, and their performance evaluated with space–time codes. Turbo algorithms, iterating between a receiver interface and a decoder, can also be employed to design low-complexity architectures. This idea was developed in the past for multiuser detectors processing encoded signals [14], [23], and subsequently extended to multiple-antenna systems, in which spatial interference has to be mitigated in order to achieve reliable decoding (see, e.g., [10], [15]). In [4], an iterative spatial-interference canceler was shown to provide a good tradeoff between complexity and performance. In the receiver interface, the signals are first combined through a linear minimum mean-square error (MMSE) filter, then spatial interference is reduced by feeding back hard decisions provided by the decoder. Use of turbo space–time codes was advocated and analyzed, among others, in [12], [17]–[19], [24]. In particular, [17] describes a multiple-antenna system which employs turbo-coded modulation and an iterative demodulation-decoding approach yielding a performance 2 to 3 dB away from channel outage probability. However, since the receiver complexity increases exponentially with the number of transmit antennas and modulation size, this system is not suited for a large number of antennas.
A reduced-complexity receiver, also proposed in [17], allows the number of antennas to increase, but this simplification comes at the price of a considerable loss in performance.² The main goal of this letter is to describe a class of low-complexity iterative receivers suitable for use with turbo codes and with a relatively large number of antennas. By applying an idea previously advocated in [2], and, in the context of code-division multiple access (CDMA), in [16], we make the receiver doubly iterative, in the sense that preliminary results obtained from a few iterations of the turbo decoding algorithm are used to reduce spatial interference. After this reduction, further turbo-decoding iterations are performed in order to improve on the interference cancellation, and so on.
This letter is organized as follows. After a description of the transmitter and of the channel model (Section II), Section III provides a detailed description of the proposed receiver, along with a short description of the system introduced in [17], which we use for comparison. The complexities of these two systems are compared in Section IV. Finally, Section V provides simulation results. These show that the receiver proposed here improves on the one advocated in [17], as it can actually achieve an attractive tradeoff between performance and complexity, with the latter kept at reasonable levels even in the presence of a large number of antennas.
¹ Alternately, the matched-filter bound (a genie-aided (GA) decoder that artificially removes all the symbols from the interfering antennas) can be used as a benchmark.
² Other classes of codes could also be used in lieu of turbo codes.
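The doubly iterative schedule outlined in the Introduction can be summarized by a short control-flow sketch. This is only an illustration under stated assumptions: the callables `cancel` and `turbo_decode` are hypothetical placeholders for the interference canceler and for a run of turbo iterations, and the sketch restarts the decoder at every outer pass, which may differ from the authors' implementation.

```python
def doubly_iterative_decode(received, cancel, turbo_decode, outer_iters, inner_iters):
    """Alternate interference cancellation (outer loop) and turbo decoding (inner loop).

    received     -- filtered observations (any sequence type the callables accept)
    cancel       -- hypothetical callable: cancel(received, soft) -> cleaned observations
    turbo_decode -- hypothetical callable: turbo_decode(observations, n) -> soft estimates
                    produced after n turbo iterations
    outer_iters  -- number of interference-cancellation iterations (0 = no cancellation)
    inner_iters  -- turbo iterations per decoding pass
    """
    soft = None  # no soft estimates exist before the first decoding pass
    for _ in range(outer_iters + 1):
        # The first pass decodes the filtered signal as-is; later passes first
        # subtract the interference reconstructed from the previous soft estimates.
        cleaned = received if soft is None else cancel(received, soft)
        soft = turbo_decode(cleaned, inner_iters)
    return soft
```

With `outer_iters = 0` the sketch reduces to a plain receiver without cancellation, which corresponds to the case of zero interference-cancellation iterations.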
Fig. 1. Receiver interface with the decoding scheme of [18].
Fig. 2. Block diagram of the doubly iterative minimum mean-square error (MMSE) receiver.
II. SYSTEM MODEL
A. Transmitter
We consider a multiple-input, multiple-output (MIMO) system with transmit and receive antennas. We assume that a vector of source bits enters a channel “turbo” encoder with rate and block length . We denote by the vector of the encoded bits after interleaving; these are multiplexed to streams, each one being sent to a modulator. There are two interleavers in this transmission scheme: i) the turbo-code interleaver, whose role is well known and understood, and ii) the space–time-code row interleaver, whose action is beneficial to the overall system performance as discussed in [4]. The matrix space–time codeword collects the modulated symbols sent through the channel. The row index of indicates space, while the column index indicates time: that is, , the th column of , is the -tuple of channel symbols transmitted simultaneously at discrete time . The symbol , transmitted by the th antenna at time , has average energy and is chosen from a two-dimensional constellation whose cardinality is . The spectral efficiency is then bits per dimension pair. We denote by the modulator, i.e., the function mapping the components of the column vector into those of the -vector . Notice that is the th encoded bit sent at time ( , ). Specifically, since we have an independent modulator for each one of the antennas, we can write

for , where is the th mapping function corresponding to the th modulator and

Accordingly, we have
B. Channel Model
The received signal is modeled by the matrix

(1)

where are matrices for . Similarly, we assume that and . is the number of independent fading states of equal lengths experienced by a single codeword of length (it is assumed that divides ). To simplify notation, we drop the fading state subscript , and understand that refers to the specific fading state corresponding to the current time position. is a matrix of complex circularly distributed Gaussian random variables, with zero mean and independent real and imaginary parts having the same variance , the noise power spectral density. Thus, the noise affecting the received signal is spatially and temporally white, with , where denotes the identity matrix and Hermitian transposition. The channel is described by the complex Gaussian random matrices , whose entries are circularly distributed with variance of their real and imaginary parts equal to . The matrices are independent of both and , and remain constant during the transmission of an entire codeword (this is the quasi-static, or block-fading, assumption [5]). We finally assume that the channel state information (CSI) is perfectly known at the receiver.
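The transmitter and channel of this section can be illustrated with a minimal simulation sketch. It assumes the standard linear model Y = HX + Z for (1), a single fading state, unit-variance channel entries, and a QPSK mapping in which bit 0 maps to the positive component; the function names and these normalizations are assumptions made for the example, not taken from the paper.

```python
import math
import random

def qpsk_map(b0, b1):
    """Map two bits to a unit-energy QPSK symbol (assumed mapping: 0 -> +, 1 -> -)."""
    return complex(1 - 2 * b0, 1 - 2 * b1) / math.sqrt(2)

def complex_gaussian(rng, variance):
    """Circular complex Gaussian sample; real and imaginary parts each get variance/2."""
    s = math.sqrt(variance / 2)
    return complex(rng.gauss(0, s), rng.gauss(0, s))

def transmit(bits, t):
    """Modulate the coded bits and distribute the symbols round-robin over t antennas.

    Returns a t x N matrix (list of rows): row = antenna, column = time.
    Symbols that do not fill a complete column are dropped.
    """
    symbols = [qpsk_map(bits[i], bits[i + 1]) for i in range(0, len(bits) - 1, 2)]
    n = len(symbols) // t
    return [[symbols[col * t + row] for col in range(n)] for row in range(t)]

def channel(X, r, noise_var, rng):
    """Quasi-static fading: one r x t matrix H for the whole codeword, plus white noise."""
    t, n = len(X), len(X[0])
    H = [[complex_gaussian(rng, 1.0) for _ in range(t)] for _ in range(r)]
    Y = [[sum(H[i][j] * X[j][k] for j in range(t)) + complex_gaussian(rng, noise_var)
          for k in range(n)]
         for i in range(r)]
    return H, Y
```

For instance, `transmit([0, 0, 1, 1, 0, 1, 1, 0], 2)` yields a 2 x 2 codeword, and `channel(X, 3, 0.1, random.Random(0))` returns a 3 x 2 channel matrix together with a 3 x 2 received matrix.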
III. RECEIVERS
Here we briefly describe two receiver interfaces matched to the coding scheme described in the previous section. The first one, proposed by Stefanov and Duman (SD) in [17] and referred to in the following as “SD interface,” is depicted in Fig. 1. The second one, the “MMSE iterative interface” proposed in [4], is depicted in Fig. 2. Both systems employ a turbo decoder based on the classical “maximum a posteriori probability” (MAP) or log-MAP algorithm. We hasten to point out here that MAP decoding requires as inputs the log-likelihood ratios (LLRs) of the transmitted bits; their computation requires knowledge of the channel signal-to-noise ratio (or at least an accurate estimate thereof). In the following, the problem of computing the LLRs will be examined in the context of receiver complexity.
A. SD Interface [17]
This interface computes the LLRs of the encoded bits directly from the received signal through the standard formula

(2)

where is the th column of and is the channel matrix (at time ). The LLRs are first deinterleaved, then passed to the turbo decoder which, after a constant number of iterations which we denote by , outputs , the estimate of the uncoded bit vector . Since the complexity of LLR computation grows exponentially with the product , for a large number of antennas a reduced-complexity computation is necessary, as advocated in [17]. This partitions the transmit antennas in groups, containing antennas, respectively, with . The LLRs of the transmitted bits belonging to the same group are computed by nulling the interference from other groups (see [17, Sec. IV] for additional details). However, while this technique dramatically decreases the complexity of LLR computation (it grows exponentially with rather than with ), at the same time it impairs the receiver performance to an extent that may turn out to be unacceptable.
B. MMSE Iterative Receiver
The receiver we advocate here employs a linear MMSE interface as front end. This consists of the linear filter modeled by the matrix that minimizes the mean-square error

(3)

with
. The normalized filtered signal is

(4)

where , , and , so that the matrix has zeros on its main diagonal. In (4), the first term of the sum represents the desired signal, the second term the spatial interference, and the third term the filtered noise. For this approach to be sensible, we require that the matrix exists or, in other words, that has no zeros on its main diagonal. Now, since

we can calculate the probability that is not invertible as a multidimensional integral on a domain in defined as the union of the subdomains

which are hypersurfaces of dimension at most (from the inverse function theorem [11]). Hence, their union has dimension and the integral of the -dimensional continuous probability density function (pdf) of over this domain is zero. Thus, the probability that is not invertible is zero. We observe further that our approach may cause noise enhancement, due to the multiplication of the noise matrix by . However, we have the following.
1) The expected value of is the scalar matrix , so that, for large and , tends to a scalar matrix itself, and noise enhancement has no effect in this case, as the signal-to-noise ratio is unchanged.
2) To avoid possible noise enhancement, one could solve the constrained MMSE problem

which does not seem to be amenable to a simple analytical solution.
Fig. 2 depicts the block diagram of our iterative MMSE receiver. The filtered signal is sent to an interference-canceler block, which outputs a matrix generated according to the following algorithm. At iteration , we set , while for

(5)

In this way, we expect to reduce the spatial interference affecting . The spatial interference is completely removed if , i.e., with correct decoding. Here, comprises the “extrinsic” soft estimates of the transmitted bits, provided by the turbo decoder in the form of log-probabilities

(6)

For example, with unit-energy quaternary phase-shift keying (QPSK) modulation³ the soft estimates fed back to the interference canceler are obtained in the form
(7)

for and . In general, at iteration , after interference cancellation we can write

(8)

where accounts for the noise and the residual spatial interference at iteration .
1) Approximations: To proceed further, we need a number of approximations to the quantities involved in our iterative algorithm. Define first to be the th column of ; we approximate this random complex vector with one whose distribution is (conditionally on ) circularly Gaussian, with zero mean and covariance matrix

(9)

³ The performance of the iterative receivers considered in this transactions letter depends critically on the size of the signal constellation, and a degradation is expected if this increases. Reasons of space prevent us from expanding on this issue.
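The soft-symbol feedback and the cancellation step described above can be sketched as follows. The sketch assumes that the normalized filtered signal at one time instant has the form x + Bx + noise, with B having a zero main diagonal as stated after (4), and that each LLR is defined as ln P(component positive)/P(component negative), so that the mean of a ±1 component is tanh(LLR/2). These conventions and the function names are assumptions for illustration and need not match (5)–(7) exactly.

```python
import math

def soft_qpsk(llr_i, llr_q):
    """Expected value of a unit-energy QPSK symbol given the LLRs of its two bits.

    Assumed convention: LLR = ln P(component = +1/sqrt(2)) / P(component = -1/sqrt(2)),
    so the mean of each +/-1 component is tanh(LLR / 2).
    """
    return complex(math.tanh(llr_i / 2), math.tanh(llr_q / 2)) / math.sqrt(2)

def cancel_interference(y_filt, B, x_soft):
    """Subtract the reconstructed spatial interference at one time instant.

    y_filt -- normalized filtered samples, one per transmit antenna
    B      -- t x t interference matrix (zero main diagonal assumed)
    x_soft -- soft symbol estimates fed back by the decoder
    """
    t = len(y_filt)
    return [y_filt[i] - sum(B[i][j] * x_soft[j] for j in range(t) if j != i)
            for i in range(t)]
```

When the soft estimates coincide with the transmitted symbols, the subtraction removes the interference term entirely, in line with the remark that the spatial interference vanishes with correct decoding; with all-zero LLRs the soft symbols are zero and nothing is subtracted.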
With this approximation, at every iteration the LLRs of the encoded bits are given by

(10)

where is the th column of matrix . Since the form (10) may still be too complex for practical applications, we resort to a further simplification consisting of approximating expectation (9) with one taken with respect to the whole codeword. Specifically, at every iteration we set

(11)

which yields the following suboptimal LLRs:

(12)

As a final simplification of the decoding algorithm, we drop the off-diagonal entries of to obtain

(13)

where is the antenna index corresponding to the encoded bit index . Explicitly

Notice how the complexity of the LLR evaluation grows linearly with , and exponentially with the constellation size. As an example, if is the mapping function corresponding to unit-energy QPSK with , we have

(14)

(15)

assuming that and are mapped to the real and imaginary parts of .
2) Estimating : We have shown that the computation of the LLRs at the th iteration depends on the matrices . These are generally difficult to calculate, except at the beginning of the iterative decoding procedure. In fact, for we have the following closed form:

(16)

where we assume that the transmitted symbols are independent and identically distributed (i.i.d.) with zero mean and variance , so that . This assumption holds for large codeword lengths and ideal interleaving. In order to use the simplified LLRs of (13), denoting and the th row of and , respectively, we have

(17)

The situation becomes more involved as we consider the subsequent iteration steps. In fact, for , we have

which includes a contribution depending on , whose statistical distribution is hopelessly difficult to find (it depends on all the transmission parameters and the decoding process itself). This difficulty can be overcome by some additional simplifying assumptions, supported by numerical evidence.
1) We neglect the correlation existing between the noise term and the output decision

As a consequence of this assumption, we have the following matrix inequality:⁴

2) The signal term and the noise term in are assumed to be uncorrelated, i.e.,

(This assumption is especially reasonable when the spatial interference is sufficiently small with respect to the useful signal.)
3) The expectation of the matrix is assumed to be approximately independent of the iteration index .
Assumptions 2) and 3) imply

(18)

and our decoding algorithm will use in lieu of . This matrix is nonnegative definite; hence, its approximation must also be nonnegative definite. Thus, whenever fails to have this property (which tends to occur when the decoding algorithm is close to convergence, and spatial interference is almost eliminated) it is reset to the value assumed in the absence of spatial interference, namely, . This matrix lower-bounds (in terms of matrix inequalities) all possible covariance matrices according to Assumption 1 above, i.e.,

Now, the diagonal elements of can be written as

(19)

⁴ Here and in the following, the notation A ≥ B means that A − B is nonnegative definite.
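The per-antenna LLR computation behind (13)–(15) can be illustrated with a small sketch. It assumes that, after cancellation, each antenna observes y = x + n with n circular Gaussian of total variance sigma2, and that the LLR is ln P(bit = 0)/P(bit = 1) with bit 0 mapped to the positive component; both conventions are assumptions of the example. The general routine sums over every constellation point, so its cost grows exponentially with the number of bits per symbol, whereas for unit-energy QPSK the same result collapses to a scaling of the real and imaginary parts.

```python
import math

# Assumed unit-energy QPSK labelling: bit 0 -> +1/sqrt(2), bit 1 -> -1/sqrt(2).
A = 1 / math.sqrt(2)
QPSK = {(0, 0): complex(A, A), (0, 1): complex(A, -A),
        (1, 0): complex(-A, A), (1, 1): complex(-A, -A)}

def symbol_llrs(y, sigma2, constellation):
    """Exact per-bit LLRs for one symbol under the Gaussian disturbance model.

    Enumerates all constellation points, hence exponential in bits per symbol.
    """
    n_bits = len(next(iter(constellation)))
    llrs = []
    for k in range(n_bits):
        num = sum(math.exp(-abs(y - x) ** 2 / sigma2)
                  for bits, x in constellation.items() if bits[k] == 0)
        den = sum(math.exp(-abs(y - x) ** 2 / sigma2)
                  for bits, x in constellation.items() if bits[k] == 1)
        llrs.append(math.log(num / den))
    return llrs

def qpsk_llrs(y, sigma2):
    """Closed form for unit-energy QPSK: each LLR is 2*sqrt(2)/sigma2 times one component."""
    scale = 2 * math.sqrt(2) / sigma2
    return [scale * y.real, scale * y.imag]
```

Evaluating both routines on the same sample, for example `y = 0.3 - 0.8j` with `sigma2 = 0.5`, gives matching values, which shows why the QPSK case needs no exhaustive enumeration.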
The effect of resetting to can also be explained in terms of the variance of the interference-plus-noise additive disturbance. This operation does not allow the variance to become smaller than a given limiting value, and, since the signal-to-interference-plus-noise ratio (SINR) is inversely proportional to the variance, it keeps the SINR below a given threshold. This limitation has a favorable effect that can be intuitively explained as follows. When the turbo decoder receives an estimated SINR higher than the actual one, the corresponding extrinsic output information is assumed to be more reliable than it should be. Correspondingly, the soft outputs become closer to their corresponding hard values, thereby producing a stronger effect on interference cancellation. Thus, interference cancellation may be based on wrong decisions, and hence impair the receiver performance.
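The limiting effect of the reset can be illustrated with a scalar analogue. The sketch below assumes that only the per-antenna disturbance variances (the diagonal entries) are tracked and that any estimate falling below its interference-free value is replaced by that value; the paper resets the whole matrix when it fails to be nonnegative definite, so this is a simplified illustration rather than the authors' procedure, and the function names are hypothetical.

```python
def floor_variances(estimated, interference_free):
    """Reset each estimated disturbance variance that falls below its
    interference-free value (scalar analogue of the matrix reset)."""
    return [max(e, f) for e, f in zip(estimated, interference_free)]

def sinr_values(variances, signal_power=1.0):
    """SINR per antenna, inversely proportional to the disturbance variance."""
    return [signal_power / v for v in variances]
```

Because the variance never drops below the floor, the SINR passed to the turbo decoder never exceeds the interference-free value, which prevents overconfident soft outputs from driving the cancellation.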
IV. RECEIVER COMPLEXITY
Here we compare the complexity of the two interfaces examined in this transactions letter. Although the actual complexity of a given algorithm depends on its actual hardware implementation (especially on pipelining and parallelization), in this work, complexity is expressed in terms of number of operations. Among these we have real additions and real multiplications, whose number is denoted by and , respectively. Since computation of (2), of (12), and of the log-MAP involves numerous computations of logarithms and exponentials, for simplicity we denote with the symbol the “log-sum” operator which can be approximated by the “ ” operator . Its complexity is denoted by . Since most quantities processed here are complex numbers, we assume that the complexity of an addition in equals and that of a multiplication in equals .
A. Log-MAP Complexity
The iterative algorithm usually implemented to decode turbo codes is based on the log-MAP algorithm. Since in our case we consider a turbo encoder made of two parallel recursive systematic convolutional (RSC) encoders of rate with states, the turbo decoder performs, at each iteration, two calls to the log-MAP algorithm in order to decode both RSC codes. The log-MAP decoder algorithm works on the trellis generated by the RSC encoder. In general, a bit entering the RSC encoder induces the transition from encoder state at time to state at time , , where is the number of trellis sections in a codeword, and where , , being the set of encoder states. If the encoder rate is , then the transition generates coded bits, , where is the systematic bit, while are the parity bits. We assume that each coded bit is mapped to a binary constellation according to , , for , and that the transmitted symbols are corrupted by additive Gaussian noise with variance . The log-MAP algorithm outputs the log-a posteriori probability ratio on bit , defined as

(20)

for each encoded bit , given the noisy received codeword . Applying Bayes’ rule to (20) we obtain

(21)

where the term

(22)

is the prior information on bit .⁵ After some algebra, and by exploiting the nature of the trellis, the LLRs in (20) can be rewritten using the log-sum operator
(23)

where and are the sets of ordered pairs corresponding to the transitions caused by or , respectively. In (23), and are obtained using standard forward and backward Bahl–Cocke–Jelinek–Raviv (BCJR) recursions [3]. Moreover, it can be shown that

(24)

where is a constant while

(25)

is the transition probability which is directly computed from the LLRs (2) or (13) depending on the implemented receiver.
⁵ In the turbo decoder, extrinsic information is exchanged between the two RSC log-MAP decoders. The prior information used by, say, log-MAP decoder 1 is the extrinsic information computed and provided by log-MAP decoder 2. Prior information is initialized to 0 at the first iteration of the turbo decoder, corresponding to equally likely input bits.
1) Computation of : For a given constituent encoder and for each one of the trellis steps, metrics must be computed. Each metric requires for a total of

operations.
Backward loop. The number of operations is

Forward loop. In the complexity of the forward loop we also take into account the computation of (20), to obtain

operations.
Overall log-MAP complexity. Since the turbo decoding algorithm is iterative, at each iteration the log-MAP routine is executed once for each constituent encoder. However, does not depend on iteration and is computed only once. The overall number of operations involved in of the log-MAP algorithm is then approximately
where the integer denotes the number of iterations of the turbo decoder.
B. Iterative MMSE Receiver Complexity
1) MMSE Filter Computation: The complexity of the computation of the MMSE filters and is negligible for large . However, the application of the filter to the received signal, that is, the product requires operations.
2) Interference Cancellation: Interference cancellation involves the complex matrix product , which requires operations.
3) LLR Computation: According to (13), the evaluation of for all possible requires operations. Hence, the total number of operations needed for LLR evaluation for all the coded bits in the codeword is, according to (13)

which, in the case of QPSK modulation, from (14) and (15) reduces to

4) Estimation of : The evaluation of (19) for costs for . For , we use (17), whose evaluation has negligible complexity for large .
5) Computation of : The soft symbol estimates obtained from the LLR provided by the turbo decoder require calls to the function, according to (7). In practice, calculation of this function, whose complexity we indicate as , can be approximated by using a lookup table.
6) Overall Complexity: Assume we have interference cancellation iterations, and turbo-decoder iterations. Each turbo-decoder iteration has complexity , which yields an overall receiver complexity

(26)

C. SD Interface Complexity
1) LLR Computation: Equation (2) requires the evaluation of , where is computed once per frame. This requires operations per codeword. Moreover, operations are required for the computation of the LLRs.
2) Overall Complexity: The overall receiver complexity is

(27)

TABLE I
COMPARISON BETWEEN MMSE ITERATIVE AND SD INTERFACES: RATIO BETWEEN THEIR COMPLEXITIES
D. Complexity Ratio
Comparison of the two interfaces is based on the ratio of their complexities
Table I shows the values of for obtained with QPSK modulation . The turbo code is made of two constituent RSC codes with rate and four states. The puncturing matrix alternately takes the parity bit at the output of the two RSC encoders, which results in a code rate and . We choose iterations for the turbo decoder and interference-canceling iterations for the MMSE receiver. Depending on the hardware implementation, the relative costs of , , , and may vary; however, if we employ the log-max approximation we can assume the costs , , and to be about equivalent. In general, the value of above is only moderately sensitive to the ratio between the costs of these basic operations. We also observe that the ratios exhibited in Table I reflect, with a good approximation, the ratios of computer times necessary for the simulation of the two interfaces. As Table I shows, the MMSE iterative receiver is convenient in terms of complexity when the number of transmit antennas is larger than four. For , the complexity of the SD receiver is not affordable.
V. NUMERICAL RESULTS
In this section, we provide numerical results showing the performance of the proposed receiver. At the transmitter, we use QPSK modulation and random interleaving, and a turbo code with rate made of two RSC encoders of rate with four states and octal generators . The codeword spans symbols. In all cases, we consider iterations of the turbo decoder for each interference-cancellation iteration, and we approximate the log-sum computation with the log-max operator. We consider first the case F = 1, which corresponds to a channel with long coherence and no interleaving, so that a single fading value affects the whole codeword. Fig. 3 shows the performance of the MMSE receiver. The frame error rate (FER) is plotted versus the signal-to-noise ratio for a system with 16 transmit and receive antennas. The solid line without markers represents the outage probability (unconstrained, i.e., obtained for Gaussian signals), while the solid lines with markers describe the performance of our MMSE receiver for , , and interference-cancellation iterations, and where the LLRs are computed according to (13).
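The log-sum operator and the log-max approximation used in the simulations can be sketched as follows. The exact form below is the standard Jacobian logarithm, ln(e^a + e^b) = max(a, b) + ln(1 + e^(-|a - b|)); the function names are chosen for this example only.

```python
import math

def log_sum(a, b):
    """Exact log-sum, ln(exp(a) + exp(b)), evaluated in a numerically stable way."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def log_max(a, b):
    """Log-max approximation: drop the correction term and keep only the maximum."""
    return max(a, b)
```

The correction term lies between 0 and ln 2, so the approximation error is largest when the two arguments are equal and vanishes when one argument dominates; this is why the approximation trades a small accuracy loss for the removal of all logarithms and exponentials.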
Fig. 3. Performance of our doubly iterative MMSE receiver on a block-fading channel with F = 1. The FER is plotted versus the signal-to-noise ratio E =N for a system with 16 transmit and receive antennas. The solid line without markers represents the outage-probability ideal limit; the solid lines with markers describe the performance of our MMSE receiver for k = 0 and k = 4 interference-cancellation iterations, with the LLRs computed according to (13). Here q = 8.
Since (13) is a simplification of (12) that neglects the off-diagonal terms of the matrix , one would expect a receiver based on the LLR computation given by (12) to perform better. However, for , this receiver exhibits a poorer performance. This is interpreted by observing that, for , the computation of (12) requires the estimation of the entire matrix using (18), while only its diagonal is required by (13). The errors in the estimate of are enhanced by its inversion, leading to a severe performance degradation. As another check on the validity of our approximations, observe that, if a “GA” receiver had perfect knowledge of the residual interference , then the matrix could be computed by using (11). This GA receiver exhibits a marginal gain with respect to our receiver. This moderate difference in performance validates the approximations introduced in (13). For the same system parameters, Fig. 3 compares our receiver with that proposed in [17], denoted here as “SD” (dashed lines). Notice that the doubly iterative MMSE receiver with interference-cancellation iterations and FER is about 2.5 dB away from channel outage probability, and is about 7 dB better than the “SD, ” receiver at FER .
VI. CONCLUSION
We have described a limited-complexity doubly iterative decoder for space–time turbo codes on a quasi-static fading channel. Preliminary soft values of the coded symbols, obtained from a few turbo iterations, are used to reduce the spatial interference from the received signal. After this reduction, new turbo iterations are performed to improve on the quality of the soft values, and so on. A number of approximations simplify the calculations of the soft values. Complexity analysis shows that the system advocated here is much simpler than similar architectures proposed in the past, it achieves a good tradeoff between performance and complexity, and allows the use of turbo space–time codes even for a relatively large number of transmit and receive antennas (for example, ).
REFERENCES
[1] L. Ariyavisitakul, “Turbo space-time processing to improve wireless channel capacity,” IEEE Trans. Commun., vol. 48, no. 8, pp. 1347–1359, Aug. 2000.
[2] G. Bauch, J. Hagenauer, and N. Seshadri, “Turbo processing in transmit antenna diversity systems,” Ann. Télécommun., vol. 56, no. 7–8, pp. 455–471, Jul.–Aug. 2001.
[3] S. Benedetto and E. Biglieri, Principles of Digital Transmission With Wireless Applications. New York: Kluwer, 1999.
[4] E. Biglieri, A. Nordio, and G. Taricco, “Suboptimum receiver interfaces and space-time codes,” IEEE Trans. Signal Process., vol. 51, no. 11, pp. 2720–2728, Nov. 2003.
[5] E. Biglieri, J. Proakis, and S. Shamai (Shitz), “Fading channels: Information-theoretic and communications aspects,” IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2619–2692, Oct. 1998.
[6] E. Biglieri, G. Taricco, and A. Tulino, “Performance of space-time codes for a large number of antennas,” IEEE Trans. Inf. Theory, vol. 48, no. 7, pp. 1794–1803, Jul. 2002.
[7] E. Biglieri, G. Taricco, and A. Tulino, “Decoding space-time codes with BLAST architectures,” IEEE Trans. Signal Process., vol. 50, no. 10, pp. 2547–2552, Oct. 2002.
[8] G. J. Foschini, “Layered space-time architecture for wireless communications in fading environments when using multiple antennas,” Bell Labs Tech. J., pp. 41–59, Autumn 1996.
[9] G. J. Foschini, D. Chizhik, M. J. Gans, C. Papadias, and R. A. Valenzuela, “Analysis and performance of some basic space-time architecture,” IEEE J. Sel. Areas Commun., vol. 21, no. 3, pp. 303–320, Apr. 2003.
[10] B. M. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple-antenna channel,” IEEE Trans. Commun., vol. 51, no. 3, pp. 389–399, Mar. 2003.
[11] J. Hubbard and B. B. Hubbard, Vector Calculus, Linear Algebra and Differential Forms: A Unified Approach, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 2002.
[12] Y. Liu, M. P. Fitz, and O. Y. Takeshita, “Full rate space-time turbo codes,” IEEE J. Sel. Areas Commun., vol. 19, no. 5, pp. 969–980, May 2001.
[13] A. Paulraj, R. Nabar, and D. Gore, Introduction to Space-Time Wireless Communications. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[14] H. V. Poor, “Iterative multiuser detection,” IEEE Signal Process. Mag., vol. 21, no. 1, pp. 81–88, Jan. 2004.
[15] A. Sanderovich, M. Peleg, and S. Shamai (Shitz), “LDPC coded MIMO multiple access communications,” in Proc. Int. Zürich Seminar on Communications, Zürich, Switzerland, Feb. 18–20, 2004, pp. 106–110.
[16] Z. Shi and C. Schlegel, “Joint iterative decoding of serially concatenated error control coded CDMA,” IEEE J. Sel. Areas Commun., vol. 19, no. 8, pp. 1646–1653, Aug. 2001.
[17] A. Stefanov and T. M. Duman, “Turbo-coded modulation for systems with transmit and receive antenna diversity over block fading channels: system model, decoding approaches, and practical considerations,” IEEE J. Sel. Areas Commun., vol. 19, no. 5, pp. 958–968, May 2001.
[18] A. Stefanov and T. M. Duman, “Performance bounds for turbo-coded multiple-antenna systems,” IEEE J. Sel. Areas Commun., vol. 21, no. 3, pp. 374–381, Mar. 2003.
[19] H. Su and E. Geraniotis, “Space-time turbo codes with full antenna diversity,” IEEE Trans. Commun., vol. 49, no. 1, pp. 47–57, Jan. 2001.
[20] V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-time codes for high data rate wireless communication: Performance criterion and code construction,” IEEE Trans. Inf. Theory, vol. 44, no. 2, pp. 744–765, Mar. 1998.
[21] H. Vikalo and B. Hassibi, “Modified Fincke-Pohst algorithm for low-complexity iterative decoding over multiple antenna channels,” in Proc. 2002 IEEE Int. Symp. Information Theory (ISIT 2002), Lausanne, Switzerland, Jun. 30–Jul. 5, 2002, p. 390.
[22] H. Vikalo and B. Hassibi, “Toward closing the capacity gap on multiple antenna channels,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP ’02), May 13–17, 2002, pp. 2385–2388.
[23] X. Wang and H. V. Poor, “Iterative (turbo) soft interference cancellation and decoding for coded CDMA,” IEEE Trans. Commun., vol. 47, no. 7, pp. 1046–1061, Jul. 1999.
[24] X. Zhuang and F. W. Vook, “Coding-assisted MIMO joint detection in turbo-coded OFDM,” in Proc. Vehicular Technology Conf. (VTC 2002-Fall), Vancouver, BC, Canada, Sep. 2002, pp. 23–27.