IEEE COMMUNICATIONS LETTERS, VOL. 17, NO. 5, MAY 2013
Reducing the Complexity of Quasi-ML Detectors for MIMO Systems Through Simplified Branch Metric and Accumulated Branch Metric Based Detection
Xiaoming Dai, Runmin Zou, Shaohui Sun, Yingmin Wang
Abstract—A detection scheme that combines a simplified branch metric based search with a refined calculation is proposed to reduce the computational complexity of quasi-maximum-likelihood (QML) detectors, such as the QR-decomposition combined with M-algorithm (QRD-M), the list sequential sphere decoder (LISS), and list sphere decoding (LSD), for multiple-input multiple-output (MIMO) multiplexing systems, while maintaining performance similar to that of the original methods. More specifically, the proposed method first uses a multiplication-free accumulated branch metric to identify the potential candidates with significantly reduced search complexity. The optimal solution is then determined from these initial candidates by the standard squared Euclidean distance metric. The key idea is that, for a properly designed parameter, the best candidate identified by the simplified branch metric coincides with high probability with the one identified by the conventional metric. Numerical results demonstrate that the QRD-M detector using the proposed method achieves significantly better performance than the conventional one with comparable (or even lower) computational complexity.
Index Terms—List sequential sphere decoder (LISS), multiple-input multiple-output (MIMO), quasi-maximum-likelihood (QML), QR-decomposition combined with M-algorithm (QRD-M).

I. Introduction
The QRD-M-like algorithms, such as QRD-M and LSD [1]–[3], have received great attention recently for their good complexity-performance tradeoff and their amenability to hardware implementation. The computational complexity of the QRD-M-like quasi-ML detectors is mainly determined by the number of branch metric calculations, which is related to the constellation size $Q$ and the number of transmit antennas $N_T$. The requirements on high-rate quasi-ML MIMO detectors will, in general, mandate the use of dedicated hardware implementations. For application-specific integrated circuit (ASIC) based implementations, the computational complexity of a real multiplication is one to two orders of magnitude higher than that of a real addition [5]. In this work, we propose a multiplication-free branch metric and a simplified accumulated branch metric (ABM) to substitute for the conventional squared Euclidean distance based ones when exploring the potentially promising candidates during the preliminary search stage. After the candidate list has been identified at the last detection stage, the best solution is redetermined based on the standard squared Euclidean distance metric. The optimum solution among the candidates identified at the final stage with the simplified branch metric and accumulated branch metric coincides with that of the conventional method with high probability, owing to the quasi-monotonic relationship between the standard Euclidean distance based metrics and the proposed ones. Therefore, the quasi-ML detectors using the proposed technique achieve performance similar to that of the conventional ones with reduced computational complexity. As an illustrative example, the QRD-M detector is employed in this work. Numerical results show that the proposed QRD-M detector outperforms the conventional one by retaining a larger number of nodes at the interim stages, with comparable (or even lower) computational complexity.

Manuscript received January 15, 2013. The associate editor coordinating the review of this letter and approving it for publication was A. Alagan. X. Dai, S. Sun, and Y. Wang are with the State Key Laboratory of Wireless Mobile Communications of the China Academy of Telecommunication Technology (CATT), Beijing, 100080, P. R. China (e-mail: [email protected]; [email protected]). R. Zou is with the School of Information Science and Engineering, Central South University, Changsha 410083, China. The work of R. Zou was supported in part by the National Science Foundation (NSF) of China under Grant 61174210. Digital Object Identifier 10.1109/LCOMM.2013.040213.130117

II. Conventional Soft-Output QRD-M Decoding
A. System Model
We consider a space-time bit-interleaved coded modulation (ST-BICM) MIMO multiplexing system with $N_T$ transmit and $N_R = N_T$ receive antennas. Let the vector $\mathbf{b}$ of size $K$ denote the source information bits entering the rate-$R_c$ turbo channel encoder, and let $\mathbf{c}$ denote the vector of encoded bits. The vector $\mathbf{c}$ is grouped into blocks of $M_c$ bits ($M_c$ is the number of bits per constellation symbol) and then multiplexed onto $N_T$ sub-streams. Each block is mapped onto a $2^{M_c}$-ary quadrature amplitude modulation (QAM) complex symbol by Gray mapping. The standard complex baseband model between the transmitted and received signals is given by $\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n}$, where $\mathbf{y} \in \mathbb{C}^{N_R \times 1}$ is the received vector, $\mathbf{H} \in \mathbb{C}^{N_R \times N_T}$ denotes the channel matrix, $\mathbf{s} \in \mathbb{C}^{N_T \times 1}$ represents the transmitted vector, and $\mathbf{n} \in \mathbb{C}^{N_R \times 1}$ is independent and identically distributed (i.i.d.) zero-mean circularly symmetric complex Gaussian noise with variance $\sigma^2$. Each symbol vector $\mathbf{s}$ is associated with a bit-level (label) vector $\mathbf{x}$. Slightly abusing common terminology, we denote the entries of $\mathbf{x}$ as $x_{i,l}$, $i = 1, \cdots, N_T$, $l = 1, \cdots, \log_2 Q$, where the indices $i$ and $l$ refer to the $l$-th bit in the label of the constellation point corresponding to the $i$-th entry of $\mathbf{s}$.

B. Basic Configuration of the QRD-M Method
Assuming i.i.d. complex Gaussian noise and perfect channel estimation at the receiver, the ML detector is given by
$$\mathbf{s}_{\mathrm{ML}} = \arg\min_{\mathbf{s} \in \mathcal{X}^{N_T}} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2. \tag{1}$$
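To make (1) concrete, the following is a minimal sketch of exhaustive ML detection over all $Q^{N_T}$ candidate vectors. The function names `qam4` and `ml_detect`, the unit-energy 4-QAM constellation, the random channel, and the noiseless test setup are illustrative assumptions, not part of the original letter.

```python
import itertools
import numpy as np

def qam4():
    """Unit-energy 4-QAM constellation (assumed here for illustration)."""
    return np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)

def ml_detect(y, H, constellation):
    """Exhaustive search for argmin_s ||y - H s||^2 over all Q^NT vectors."""
    n_t = H.shape[1]
    best_s, best_metric = None, np.inf
    for cand in itertools.product(constellation, repeat=n_t):
        s = np.array(cand)
        metric = np.linalg.norm(y - H @ s) ** 2
        if metric < best_metric:
            best_s, best_metric = s, metric
    return best_s, best_metric

# Small noiseless example: the ML solution must equal the transmitted vector.
rng = np.random.default_rng(0)
n_t = n_r = 3
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
s_true = rng.choice(qam4(), size=n_t)
y = H @ s_true
s_hat, metric = ml_detect(y, H, qam4())
```

The loop visits $Q^{N_T}$ candidates, which is why the exhaustive search becomes impractical for larger constellations and antenna counts and motivates list-based quasi-ML detectors.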
$$\|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2 = \underbrace{|\tilde{y}_{N_T} - r_{N_T,N_T} s_1^Q|^2}_{\text{branch metric calculation at stage 1}} + \underbrace{|\tilde{y}_{N_T-1} - r_{N_T-1,N_T-1} s_2^Q - r_{N_T-1,N_T} s_1^Q|^2}_{\text{branch metric calculation at stage 2}} + \cdots + \underbrace{|\tilde{y}_1 - r_{1,1} s_{N_T}^Q - r_{1,2} s_{N_T-1}^Q - \cdots - r_{1,N_T} s_1^Q|^2}_{\text{branch metric calculation at stage } N_T} \tag{3}$$
In (3), the products of the entries of $\mathbf{R}$ with the candidate symbols constitute the symbol replica generation, and the running sum of the stage terms is the accumulated branch metric.
Based on QR decomposition, the ML metric in (1) can be rewritten as
$$\mathbf{s}_{\mathrm{ML}} = \arg\min_{\mathbf{s} \in \mathcal{X}^{N_T}} \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2 \tag{2}$$
where $\tilde{\mathbf{y}} = \mathbf{Q}^H \mathbf{y}$, $\mathbf{Q}$ is an $N_R \times N_T$ unitary matrix, and $\mathbf{R}$ is an $N_T \times N_T$ upper triangular matrix with real-valued positive entries on its main diagonal. The cost metric in (2) can be expanded as in (3), where $s_i^Q$ denotes one of the $Q$ constellation points retained at stage $i = 1, \cdots, N_T$. The QRD-M algorithm is a breadth-first tree search that evaluates all the branches it will consider at a given stage of the detection tree and then discards all but the $M_i$ best nodes at the $i$-th detection stage before continuing to the next stage. Based on (2), the log-likelihood ratio (LLR) of the a posteriori probability (APP) of each coded bit conditioned on the received signal $\mathbf{y}$ is normally calculated using the max-log approximation as follows:
$$L(x_{i,l} \mid \mathbf{y}) = L(x_{i,l} \mid \tilde{\mathbf{y}}) \approx \min_{\mathbf{s} \in \mathcal{L} \cap \mathcal{X}_{i,l}^{(0)}} \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2 - \min_{\mathbf{s} \in \mathcal{L} \cap \mathcal{X}_{i,l}^{(1)}} \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2 = \min_{\mathbf{s} \in \mathcal{L} \cap \mathcal{X}_{i,l}^{(0)}} \{\gamma(\mathbf{s})\} - \min_{\mathbf{s} \in \mathcal{L} \cap \mathcal{X}_{i,l}^{(1)}} \{\gamma(\mathbf{s})\} \tag{4}$$
where $\mathcal{X}_{i,l}^{(0)}$ and $\mathcal{X}_{i,l}^{(1)}$ designate the sets of symbol vectors that have the $l$-th bit in the label of the $i$-th scalar symbol equal to 0 and 1, respectively, and $\mathcal{L}$ denotes the final candidate list identified by the QRD-M detector at the last detection stage.
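The breadth-first search and the max-log LLR of (4) can be sketched as follows. This is a minimal illustration under stated assumptions: a single value `M` is used at every stage, the Gray-labelled 4-QAM table `BITS`/`CONST` and the names `qrd_m` and `max_log_llr` are hypothetical, and `numpy.linalg.qr` does not force a positive diagonal in $\mathbf{R}$ (which leaves the metric unchanged). A hypothesis with no candidate in the list yields an infinite LLR here; practical detectors usually clip such values.

```python
import numpy as np

# Hypothetical Gray-labelled 4-QAM: bits (b0, b1) -> ((1-2*b0) + 1j*(1-2*b1)) / sqrt(2)
BITS = [(0, 0), (0, 1), (1, 0), (1, 1)]
CONST = np.array([(1 - 2 * b0) + 1j * (1 - 2 * b1) for b0, b1 in BITS]) / np.sqrt(2)

def qrd_m(y, H, constellation, M):
    """Breadth-first tree search keeping the M best nodes per stage."""
    n_t = H.shape[1]
    Qm, R = np.linalg.qr(H)
    y_t = Qm.conj().T @ y
    survivors = [(0.0, [])]  # (accumulated branch metric, symbols of rows k..n_t-1)
    for k in range(n_t - 1, -1, -1):  # stage 1 processes the last row of R
        expanded = []
        for abm, tail in survivors:
            # Symbol replica of the already-detected symbols
            interf = sum(R[k, k + 1 + j] * sym for j, sym in enumerate(tail))
            for sym in constellation:
                bm = abs(y_t[k] - R[k, k] * sym - interf) ** 2
                expanded.append((abm + bm, [sym] + tail))
        expanded.sort(key=lambda t: t[0])
        survivors = expanded[:M]
    return survivors

def max_log_llr(survivors, constellation, bits):
    """Max-log LLR per (4): min metric over bit=0 minus min metric over bit=1."""
    n_t, n_b = len(survivors[0][1]), len(bits[0])
    llr = np.zeros((n_t, n_b))
    for i in range(n_t):
        for l in range(n_b):
            best = [np.inf, np.inf]
            for metric, s in survivors:
                idx = int(np.argmin(np.abs(constellation - s[i])))
                b = bits[idx][l]
                best[b] = min(best[b], metric)
            llr[i, l] = best[0] - best[1]
    return llr

# Noiseless 3 x 3 example with 4-QAM
rng = np.random.default_rng(1)
n_t = 3
H = (rng.standard_normal((n_t, n_t)) + 1j * rng.standard_normal((n_t, n_t))) / np.sqrt(2)
idx = rng.integers(0, 4, size=n_t)
s_true = CONST[idx]
true_bits = np.array([BITS[i] for i in idx])
y = H @ s_true
survivors = qrd_m(y, H, CONST, M=4)
llr = max_log_llr(survivors, CONST, BITS)
hard_bits = (llr > 0).astype(int)  # with the sign convention of (4), L > 0 favours bit 1
```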
III. Proposed Method
On closer inspection of (3) and (4), we note that the primary objective of the QRD-M detector (and of other list-based algorithms such as LISS and LSD) is to identify, by whatever means, a candidate list $\mathcal{L}$ that contains the optimum solution, rather than a list $\mathcal{L}$ that matches that of the full ML detector perfectly. Inspired by this observation, we first propose to utilize an approximate square-root Euclidean distance metric [4] (instead of the commonly used squared Euclidean distance metric [1], [3])
$$\sqrt{a^2 + b^2} \approx \max(|a|, |b|) + \frac{1}{4}\min(|a|, |b|) \tag{5}$$
in the QRD-M detector.

A. Proposed Accumulated Branch Metric Up to Stage $N_T - 1$
Then a simplified accumulated branch metric at stage $i$ is introduced as follows:
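$$\begin{aligned} \gamma(\mathbf{s}) &= |\tilde{y}_{N_T} - r_{N_T,N_T} s_1^Q| + |\tilde{y}_{N_T-1} - r_{N_T-1,N_T-1} s_2^Q - r_{N_T-1,N_T} s_1^Q| \\ &\quad + \cdots + |\tilde{y}_{N_T-i+1} - r_{N_T-i+1,N_T-i+1} s_i^Q - r_{N_T-i+1,N_T-i+2} s_{i-1}^Q - \cdots - r_{N_T-i+1,N_T} s_1^Q| \\ &\approx \max\left( \left|\mathrm{Re}\{\tilde{y}_{N_T} - r_{N_T,N_T} s_1^Q\}\right|, \left|\mathrm{Im}\{\tilde{y}_{N_T} - r_{N_T,N_T} s_1^Q\}\right| \right) \\ &\quad + \frac{1}{4}\min\left( \left|\mathrm{Re}\{\tilde{y}_{N_T} - r_{N_T,N_T} s_1^Q\}\right|, \left|\mathrm{Im}\{\tilde{y}_{N_T} - r_{N_T,N_T} s_1^Q\}\right| \right) + \cdots \end{aligned}$$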
where $\mathrm{Re}\{a\}$ and $\mathrm{Im}\{a\}$ denote the real part and imaginary part of $a$, respectively. The approximate square-root Euclidean distance metric and the simplified accumulated branch metric are first utilized to determine the list of preliminary potential candidates up to stage $N_T - 1$.

B. Proposed Refined ABM Calculation at Stage $N_T$
After identifying the best $M_{N_T-1}$ nodes with the simplified branch metric and the accumulated branch metric at stage $N_T - 1$, we calculate the LLR of each coded bit using the standard squared Euclidean distance metric of (4). The calculation costs per operation for the real multiplication, real addition, and comparison are 10, 1, and 1, respectively [5]. Therefore, the computational complexity of the multiplication-free branch metric and the accumulated branch metric is much lower than that of (3).

Remark 1: The underlying idea of the proposed method is based on the observation that the simplified branch metric and accumulated branch metric exhibit a quasi-monotonic relationship with the standard squared Euclidean distance based ones. The composition of the candidate list $\mathcal{L}$ identified using the proposed two-stage approximation scheme differs, with high probability, from that based on the standard method (3). However, the optimum solution among the final candidates $\mathcal{L}$ identified by the proposed method coincides with that determined by the canonical squared Euclidean distance based method (3), as verified by the numerical results in Section V.
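The two-stage idea can be sketched in a few lines: the approximation (5) drives the search up to stage $N_T - 1$, and the children of the surviving nodes are then scored at stage $N_T$ with the exact squared Euclidean distance. This is an illustrative sketch only. The names `approx_abs` and `proposed_qrd_m`, the single `M` per stage, and the 4-QAM test setup are assumptions, and the `0.25 *` factor stands in for the two-bit right shift a hardware implementation would use.

```python
import numpy as np

def approx_abs(z):
    """Estimate |z| per (5): max(|Re|, |Im|) + min(|Re|, |Im|) / 4."""
    a, b = abs(z.real), abs(z.imag)
    return max(a, b) + 0.25 * min(a, b)

def proposed_qrd_m(y, H, constellation, M):
    """Simplified search for stages 1..NT-1, exact rescoring at stage NT."""
    n_t = H.shape[1]
    Qm, R = np.linalg.qr(H)
    y_t = Qm.conj().T @ y
    survivors = [(0.0, [])]  # (simplified accumulated metric, symbols of rows k..n_t-1)
    for k in range(n_t - 1, 0, -1):  # rows n_t-1 .. 1, i.e. stages 1 .. NT-1
        expanded = []
        for abm, tail in survivors:
            interf = sum(R[k, k + 1 + j] * sym for j, sym in enumerate(tail))
            for sym in constellation:
                bm = approx_abs(y_t[k] - R[k, k] * sym - interf)
                expanded.append((abm + bm, [sym] + tail))
        expanded.sort(key=lambda t: t[0])
        survivors = expanded[:M]
    # Stage NT: expand every survivor and rescore with the exact squared distance
    final = []
    for _, tail in survivors:
        for sym in constellation:
            s = np.array([sym] + tail)
            final.append((np.linalg.norm(y_t - R @ s) ** 2, s))
    final.sort(key=lambda t: t[0])
    return final

# Accuracy of (5) on the unit circle, where the exact magnitude is 1
theta = np.linspace(0.0, 2.0 * np.pi, 3601)
rel_err = np.array([approx_abs(np.exp(1j * t)) - 1.0 for t in theta])

# Noiseless 3 x 3 example with 4-QAM and M = 2
const = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
rng = np.random.default_rng(2)
H = (rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))) / np.sqrt(2)
s_true = rng.choice(const, size=3)
final = proposed_qrd_m(H @ s_true, H, const, M=2)
```

On the unit circle, (5) overestimates the magnitude by at most about 3% (at $|b|/|a| = 1/4$) and underestimates it by at most about 12% (at $|a| = |b|$), which is consistent with the quasi-monotonic behaviour described in Remark 1.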
The performance of the QRD-M detector relies critically on the number of nodes retained at the interim detection stages [1], [3], and to a lesser extent on the accuracy of the quasi-optimum branch metric and accumulated branch metric. This indicates that shifting more resources to a coarse (but much simpler) branch metric calculation over a larger number of candidates, rather than a precise (but much more complex) branch metric calculation over a smaller number of candidates, is a more cost-efficient paradigm for list-based quasi-ML detectors. We elucidate this with the following example.

Example 1: An example of the QRD-M algorithm for a 3 × 3 MIMO multiplexing system with 4-QAM is illustrated in Fig. 1.

Fig. 1: Illustration of the proposed QRD-M detector and the conventional one based on a 3 × 3 MIMO multiplexing system with 4-QAM. Panels (a), (b), and (c).

1 For incorrectly detected symbols, a higher LLR may not signify a more reliable detection since it may be caused by possible outliers (of incorrectly detected symbols). For a more comprehensive exposition of this, the reader is referred to [2]. On closer observation of Fig. 1(a), we can determine that the result for M = 3 equals that of the ML detector in this case.

For BICM-based systems, LLRs of the detected bits with higher magnitudes indicate greater reliability for correctly detected symbols [cf. (4)].1 As expected, Fig. 1(a) shows that the conventional detector with M = 3 obtains the most reliable LLRs (i.e., those with the highest magnitudes) for the respective bits. Fig. 1(c) illustrates that the proposed method with M = 3 achieves
slightly less reliable LLRs of the respective bits than those of the conventional one with M = 3. In contrast, the conventional QRD-M detector with M = 2 makes erroneous decisions on the third, fifth, and sixth bits, i.e., $L(x_{2,1}) = -0.2$, $L(x_{3,1}) = 0.3$, $L(x_{3,2}) = 0.3$ vs. $L(x_{2,1}) = 0.2$, $L(x_{3,1}) = -1.0$, $L(x_{3,2}) = -0.8$, as depicted in Fig. 1(b) and Fig. 1(a), respectively. Therefore, the proposed QRD-M detector achieves better performance by retaining a larger number of nodes at the interim detection stages, with comparable or even lower computational complexity (detailed further in Sections IV and V). This is attributed to the fact that the performance improvement due to the larger $\mathcal{L}$ outweighs the performance degradation resulting from the quasi-optimum branch metric and accumulated branch metric.

IV. Complexity Comparison
Based on the analysis of Section III, we compare the overall computation cost (including real multiplications, additions, and comparisons) of the proposed QRD-M detector and the conventional one for MIMO multiplexing systems with different numbers of transmit antennas (i.e., 4 and 12) and different modulation schemes (i.e., 4-, 16-, and 64-QAM). The results are shown in Table I. Note that, based on the numerical results of Section V, the link performance at an average block error rate (BLER) of $10^{-1}$ of the proposed QRD-M detector is assumed to be similar to that of the conventional one with the same number of branch metric calculations. For $N_T = 4$, Table I shows that the proposed QRD-M detector reduces the overall cost by about 12%, 15%, 18%, and 24% for 4-QAM with $M_{1-4} = 4$; $M_1 = 4$, $M_{2-4} = 6$; $M_1 = 4$, $M_{2-4} = 8$; and $M_1 = 4$, $M_{2-4} = 14$, respectively, compared with the conventional one. For 16-QAM, the computational complexity reduction is 44%, 48%, 51%, and 53% for the QRD-M detector with $M_{1-4} = 12$; $M_{1-4} = 16$; $M_1 = 16$, $M_{2-4} = 20$; and $M_1 = 16$, $M_{2-4} = 24$, respectively. For 64-QAM, a cost reduction of about 67%, 68%, 68.5%, and 70% is achieved with $M_{1-4} = 48$; $M_1 = 48$, $M_{2-4} = 64$; $M_{1-4} = 64$; and $M_1 = 64$, $M_{2-4} = 80$, respectively. As the number of transmit antennas increases, the complexity reduction ratio decreases, as shown in Table I. The complexity reduction ratio is more prominent for larger $M$ and/or higher-order modulation schemes.

V. Simulation Comparison
In this section, we evaluate the complexity and performance of the proposed method in comparison with the conventional one over the extended vehicular A (EVA) channel model with a maximum Doppler frequency of 5 Hz. We consider a turbo-coded $N_T = N_R = 4$ MIMO-OFDM multiplexing system with a 1024-point fast Fourier transform (FFT) and 15 kHz subcarrier spacing. At the transmitter, binary information data bits are first encoded by turbo coding with the original coding rate $R_c = 1/3$ (with generator polynomials $[13, 15]_8$) and then punctured to a coding rate of $R_c = 5/6$ as specified in [6].

For 4-QAM, Fig. 2(a) shows that the gap between the proposed QRD-M detector with $M_{2-4} = 4$ and the conventional one is less than 0.2 dB at a BLER of $10^{-2}$, with about 88% of the complexity of the latter (see Table I). The performance of the proposed detector with $M_{2-4} = 14$ is almost indistinguishable from that of the conventional one, with a larger computational complexity reduction of about 25% ($\approx 1 - \frac{7096}{9404}$). Fig. 2(a) also shows that the proposed QRD-M detector with $M_{2-4} = 14$ outperforms the conventional one with $M_{2-4} = 8$ by about 1 dB at a BLER of $10^{-3}$, with 95% ($\approx \frac{7096}{7460}$) of the computational complexity of the latter.

For 16-QAM, Fig. 2(b) shows that the proposed QRD-M detector with $M_{1-4} = 16$ achieves a performance gain of about 3.7 dB at BLER $= 3 \times 10^{-2}$ over the conventional one with $M_{1-4} = 12$, with only about 62% ($\approx \frac{15368}{24620}$) of the complexity of the latter based on the results of Table I. The performance degradation introduced by the proposed method with $M_{1-4} = 16$ is only 0.2 dB at BLER $= 3 \times 10^{-2}$ compared with the conventional one with the same setting, as shown in Fig. 2(b).

For 64-QAM, Fig. 2(c) illustrates that the proposed QRD-M detector with $M_1 = 64$, $M_{2-4} = 80$ achieves almost identical performance to the conventional one with significantly reduced computational complexity (i.e., about 30% ($\approx \frac{134440}{441128}$) of that of the latter). Fig. 2(c) also shows that the proposed QRD-M detector with $M_{1-4} = 64$ surpasses the conventional one with $M_{1-4} = 48$ by about 2.2 dB at BLER $= 1 \times 10^{-2}$, with only about 40% ($\approx \frac{112616}{274808}$) of the computational complexity of the latter.
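The complexity ratios quoted above follow directly from the total-cost entries of Table I. The short sketch below recomputes them. The dictionary layout and the helper names `reduction` and `cross_ratio` are illustrative choices, and only the $N_T = N_R = 4$ totals are included.

```python
# Total costs for NT = NR = 4 copied from Table I, keyed by (modulation, M1, M2-4)
conventional = {
    ("4-QAM", 4, 4): 6164, ("4-QAM", 4, 6): 6812,
    ("4-QAM", 4, 8): 7460, ("4-QAM", 4, 14): 9404,
    ("16-QAM", 12, 12): 24620, ("16-QAM", 16, 16): 29912,
    ("16-QAM", 16, 20): 35096, ("16-QAM", 16, 24): 40280,
    ("64-QAM", 48, 48): 274808, ("64-QAM", 48, 64): 357752,
    ("64-QAM", 64, 64): 358184, ("64-QAM", 64, 80): 441128,
}
proposed = {
    ("4-QAM", 4, 4): 5456, ("4-QAM", 4, 6): 5784,
    ("4-QAM", 4, 8): 6112, ("4-QAM", 4, 14): 7096,
    ("16-QAM", 12, 12): 13712, ("16-QAM", 16, 16): 15368,
    ("16-QAM", 16, 20): 16984, ("16-QAM", 16, 24): 18600,
    ("64-QAM", 48, 48): 90632, ("64-QAM", 48, 64): 112456,
    ("64-QAM", 64, 64): 112616, ("64-QAM", 64, 80): 134440,
}

def reduction(key):
    """Fractional cost saving of the proposed detector at the same M setting."""
    return 1.0 - proposed[key] / conventional[key]

def cross_ratio(proposed_key, conventional_key):
    """Cost of a proposed configuration relative to a different conventional one."""
    return proposed[proposed_key] / conventional[conventional_key]

# Same-setting savings, in percent, for every configuration
savings = {key: round(100 * reduction(key), 1) for key in conventional}

# Cross-configuration ratios used in the Fig. 2 discussion
r_4qam = cross_ratio(("4-QAM", 4, 14), ("4-QAM", 4, 8))
r_16qam = cross_ratio(("16-QAM", 16, 16), ("16-QAM", 12, 12))
r_64qam = cross_ratio(("64-QAM", 64, 64), ("64-QAM", 48, 48))
```

Comparing a proposed configuration with a smaller conventional one, as `cross_ratio` does, is what supports the claim that a larger list can be afforded at comparable or lower cost.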
The BLER performance gain of the proposed QRD-M detector over the conventional one for $N_T = N_R = 12$ is similar to that for $N_T = N_R = 4$ (not shown due to lack of space).

Fig. 2: BLER comparisons of the proposed QRD-M detector and the conventional one based on a 4 × 4 MIMO multiplexing system. Panels: (a) 4-QAM, (b) 16-QAM, (c) 64-QAM, each with $R_c = 5/6$; axes: BLER versus $E_b/N_0$.

TABLE I: Comparisons of the Number of Arithmetic Operations and Total Computation Cost

Required number of operations per symbol interval:

| Detection method | Operation | Real multiplication | Real addition | Comparison | Bit shift |
|---|---|---|---|---|---|
| Conventional QRD-M | QR decomposition of channel matrix $\mathbf{H}$ | $4N_T^3$ | $4N_T^3 - 3N_T^2$ | 0 | 0 |
| Conventional QRD-M | Multiplication of $\mathbf{Q}^H$ to received signal vector | $4N_T^2$ | $2N_T(2N_T - 1)$ | 0 | 0 |
| Conventional QRD-M | Symbol replica generation | $2N_T^2 Q$ | $N_T(N_T - 1)Q$ | 0 | 0 |
| Conventional QRD-M | Calculation of squared Euclidean distances | $2(Q + \sum_{i=1}^{N_T-1} M_i Q)$ | $7(Q + \sum_{i=1}^{N_T-1} M_i Q)$ | 0 | 0 |
| Proposed QRD-M | QR decomposition of channel matrix $\mathbf{H}$ | $4N_T^3$ | $4N_T^3 - 3N_T^2$ | 0 | 0 |
| Proposed QRD-M | Multiplication of $\mathbf{Q}^H$ to received signal vector | $4N_T^2$ | $2N_T(2N_T - 1)$ | 0 | 0 |
| Proposed QRD-M | Symbol replica generation | $2N_T^2 Q$ | $N_T(N_T - 1)Q$ | 0 | 0 |
| Proposed QRD-M | Calculation of the simplified branch metric | 0 | $8(Q + \sum_{i=1}^{N_T-2} M_i Q)$ | $2(Q + \sum_{i=1}^{N_T-2} M_i Q)$ | $(Q + \sum_{i=1}^{N_T-2} M_i Q)$ |
| Proposed QRD-M | Recalculation of squared Euclidean distances based on $\mathcal{L}$ [cf. (4)] | $2N_T M_{N_T-1} Q$ | $N_T M_{N_T-1} Q$ | 0 | 0 |

Total cost:

| $N_T = N_R$ | Modulation | $M_1$ | $M_{2-4}$ | Conventional | Proposed |
|---|---|---|---|---|---|
| 4 | 4-QAM | 4 | 4 | 6164 | 5456 |
| 4 | 4-QAM | 4 | 6 | 6812 | 5784 |
| 4 | 4-QAM | 4 | 8 | 7460 | 6112 |
| 4 | 4-QAM | 4 | 14 | 9404 | 7096 |
| 4 | 16-QAM | 12 | 12 | 24620 | 13712 |
| 4 | 16-QAM | 16 | 16 | 29912 | 15368 |
| 4 | 16-QAM | 16 | 20 | 35096 | 16984 |
| 4 | 16-QAM | 16 | 24 | 40280 | 18600 |
| 4 | 64-QAM | 48 | 48 | 274808 | 90632 |
| 4 | 64-QAM | 48 | 64 | 357752 | 112456 |
| 4 | 64-QAM | 64 | 64 | 358184 | 112616 |
| 4 | 64-QAM | 64 | 80 | 441128 | 134440 |
| 12 | 4-QAM | 4 | 4 | 98532 | 96320 |
| 12 | 4-QAM | 4 | 6 | 100908 | 97624 |
| 12 | 4-QAM | 4 | 8 | 103284 | 98928 |
| 12 | 4-QAM | 4 | 14 | 110412 | 102840 |
| 12 | 16-QAM | 12 | 12 | 187164 | 152160 |
| 12 | 16-QAM | 16 | 16 | 206280 | 159608 |
| 12 | 16-QAM | 16 | 20 | 225288 | 167016 |
| 12 | 16-QAM | 16 | 24 | 244296 | 174424 |
| 12 | 64-QAM | 48 | 48 | 1188072 | 594168 |
| 12 | 64-QAM | 48 | 64 | 1492200 | 700600 |
| 12 | 64-QAM | 64 | 64 | 1492632 | 700760 |
| 12 | 64-QAM | 64 | 80 | 1796760 | 807192 |

VI. Conclusion
In this work, we have proposed a simplified-search-plus-refined-calculation based paradigm for (quasi-)ML detectors. The proposed technique reduces the complexity of the most computationally intensive part of the branch metric computation for the quasi-ML detectors while still retaining, with high probability, the same best solution as that of the conventional one. Based on the proposed method, the QRD-M detector achieves similar or better performance than the conventional one with lower or comparable computational complexity. The extension of the proposed method to other (quasi-)ML based detectors, such as LSD and LISS [7], is straightforward.

References
[1] K. J. Kim and R. A. Iltis, “Joint detection and channel estimation algorithms for QS-CDMA signals over time-varying channels,” IEEE Trans. Commun., vol. 50, no. 5, pp. 845–855, May 2002.
[2] X. Dai, J. An, X. Li, S. Sun, and Y. Wang, “Reducing the complexity of quasi-maximum-likelihood detectors through companding for coded MIMO systems,” IEEE Trans. Veh. Technol., vol. 61, no. 3, pp. 1109–1123, Mar. 2012.
[3] B. M. Hochwald and S. ten Brink, “Achieving near-capacity on a multiple-antenna channel,” IEEE Trans. Commun., vol. 51, no. 3, pp. 389–399, Mar. 2003.
[4] Prajnanam Project Team and S. S. Mahant Shetti, “Euclidean distance computation algorithm for QAM applications,” VLSI Society of India VSI VISION, vol. 1, no. 2, 2005.
[5] 3rd Generation Partnership Project, R1-031301, “Complexity comparison of OFDM HS-DSCH receivers and advanced receivers for HSDPA and associated text proposal,” Meeting #35, Nov. 2003.
[6] 3rd Generation Partnership Project, “Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access (E-UTRA); Physical channels and modulation,” 3GPP TS 36.211, V8.0.0, Sep. 2007.
[7] S. Bäro, J. Hagenauer, and M. Witzke, “Iterative detection of MIMO transmission using a list-sequential (LISS) detector,” in Proc. 2003 IEEE ICC, pp. 2653–2657.