An Improved Ordered-Block MMSE Detector for Generalized Spatial Modulation
Chiao-En Chen, Member, IEEE, Cheng-Han Li, and Yuan-Hao Huang, Member, IEEE
Abstract: In this letter, an improved ordered-block minimum-mean-squared-error (OB-MMSE) detector for generalized spatial modulation systems is presented. We first propose to use the concentrated distance metric derived from the conditional maximum likelihood estimator as the ordering metric for the OB-MMSE detector, and then design a computationally efficient algorithm for computing this metric. The improved ordering performance of the proposed algorithm allows early termination of the OB-MMSE detector without noticeable performance loss, which can be exploited to further reduce its complexity. Simulation results show that the proposed algorithm achieves a better performance-complexity tradeoff than the existing OB-MMSE detector.
Index Terms: Spatial modulation (SM), generalized spatial modulation (GSM), multiple-input multiple-output (MIMO)
This work was partially supported by the Ministry of Science and Technology (MOST), Taiwan, under grant numbers MOST103-2622-E-194-008-CC1 and MOST103-2221-E-007-126-MY2. C.-E. Chen is with the Department of Electrical/Communications Engineering, National Chung Cheng University, Chiayi, Taiwan (e-mail: [email protected]). C.-H. Li and Y.-H. Huang are with the Department of Electrical Engineering, Institute of Communications Engineering, National Tsing-Hua University, Hsinchu, Taiwan.

I. INTRODUCTION
Spatial modulation (SM) [1] is a recently proposed multiple-input multiple-output (MIMO) scheme which combines conventional amplitude/phase modulation with antenna index modulation. Compared to conventional schemes, SM enables higher throughput, simpler transceiver design, and better energy efficiency tradeoffs. Hence, it has attracted considerable research attention lately. Many of the recent advances in SM-related techniques can be found in the survey papers [2], [3] and the references therein. Based on the SM principle, a generalized scheme called generalized spatial modulation (GSM) [4], [5] has also been presented, which allows more than one transmit antenna to be activated at each time instant. These additional degrees of freedom can be leveraged to achieve higher diversity and/or spectral efficiency, but they also make the design of low-complexity receivers more challenging. In contrast to SM receivers [3], GSM receivers are required to decode information from all the possible transmit antenna combinations (TACs) and hence generally have to search over a much larger solution space. In [6], a low-complexity detector based on a decorrelation receiver was proposed, but its error rate performance is far from that of the optimal maximum likelihood (ML) detector. More recently, a new GSM detector called the ordered-block minimum-mean-squared-error (OB-MMSE) detector has been
proposed [7]. The OB-MMSE detector first sorts the potential TACs using an ordering criterion and then checks each candidate TAC along with the associated MMSE solution for symbol detection. Simulation results show that the OB-MMSE detector can achieve near-ML performance with almost 80% reduction in complexity [7].

In this letter, an improved OB-MMSE detector for GSM is proposed. We first show that the concentrated ML distance metric produces better ordering results than the metric used in [7]. To reduce the complexity of computing this metric, a computationally efficient concentrated ML (CECML) computation algorithm is then proposed to avoid redundant computations. Simulation results show that, with the proposed CECML algorithm, the complexity of OB-MMSE detection at sufficiently high SNR can be reduced by more than 50% without noticeable performance degradation.

Notations: Throughout this letter, matrices and vectors are set in boldface, with uppercase letters for matrices and lowercase letters for vectors. The superscripts $T$, $H$, $-1$, and $\dagger$ denote the transpose, conjugate transpose, inverse, and pseudo-inverse, respectively. $\|\cdot\|$ denotes the Euclidean norm of a vector, and $\mathbf{I}_N$ denotes the $N \times N$ identity matrix.

II. SYSTEM MODEL
Consider a GSM system communicating over a MIMO channel using $N_t$ transmit antennas and $N_r$ receive antennas. The MIMO channel is assumed to be quasi-static and frequency-flat fading, and hence can be represented by an $N_r \times N_t$ channel matrix $\mathbf{H}$. Consequently, the received signal $\mathbf{y} \in \mathbb{C}^{N_r \times 1}$ can be described as $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}$, where $\mathbf{x}$ denotes the transmitted symbol vector, and $\mathbf{w} \in \mathbb{C}^{N_r \times 1}$ denotes the zero-mean circularly-symmetric complex Gaussian noise vector with covariance matrix $\sigma^2 \mathbf{I}_{N_r}$. In the GSM system, only $N_p$ ($2 \le N_p \le N_t$) transmit antennas are activated at each time instant. Consequently, there are $C_{N_t}^{N_p}$ possible TACs. Among these TACs, $N = 2^{\lfloor \log_2 C_{N_t}^{N_p} \rfloor}$ TACs are chosen to convey $\log_2 N$ bits of information. Since only $N_p$ antennas are active, $\mathbf{x}$ consists of exactly $N_p$ non-zero elements and can be expressed as $\mathbf{x} = [\cdots, 0, s_1, 0, \cdots, 0, s_2, 0, \cdots, 0, s_{N_p}, 0, \cdots]^T$, where the symbols $s_1, s_2, \ldots, s_{N_p}$ have unit energy and are assumed to be drawn i.i.d. from an $M$-ary constellation set $\Omega$. As a result, $N_p \log_2 M$ bits of information are conveyed by the $N_p$ symbols, and a total of $\log_2 N + N_p \log_2 M$ bits of information can be transmitted per symbol period.
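The system model above can be illustrated with a minimal NumPy sketch. The function names (`select_tacs`, `gsm_receive`, `bits_per_symbol_period`) are hypothetical, and the rule used to pick the $N$ TACs (the first $N$ combinations in lexicographic order) is an assumption made only for illustration, since the letter does not specify which TACs are selected. Antenna indices are 0-based in the code.

```python
import itertools
import numpy as np

def select_tacs(nt, n_active):
    """Pick N = 2**floor(log2(C(nt, n_active))) TACs.

    Assumption: the first N combinations in lexicographic order are used.
    """
    combos = list(itertools.combinations(range(nt), n_active))  # i1 < i2 < ...
    n_tacs = 1 << (len(combos).bit_length() - 1)  # largest power of two <= C
    return combos[:n_tacs]

def gsm_receive(H, tac, s, sigma2, rng):
    """Return (y, x) for y = H x + w, with x non-zero only on the antennas in tac."""
    nr, nt = H.shape
    x = np.zeros(nt, dtype=complex)
    x[list(tac)] = s
    # Circularly-symmetric complex Gaussian noise with covariance sigma2 * I.
    w = np.sqrt(sigma2 / 2) * (rng.standard_normal(nr) + 1j * rng.standard_normal(nr))
    return H @ x + w, x

def bits_per_symbol_period(n_tacs, n_active, m_ary):
    """log2(N) + Np * log2(M), assuming N and M are powers of two."""
    return (n_tacs.bit_length() - 1) + n_active * (m_ary.bit_length() - 1)
```

For the two configurations simulated later in the letter, `len(select_tacs(16, 2))` gives 64 and `len(select_tacs(16, 3))` gives 512, matching the stated values of $N$.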
For convenience of later discussion, suppose the $i$th TAC set $I_i$ among the $N$ possible TAC sets is used for transmission. Let $I_i = \{i_1, i_2, \ldots, i_{N_p}\}$, where $1 \le i_1 < i_2 < \cdots < i_{N_p} \le N_t$, denote the indices of the active antennas in $I_i$. It follows that the received data vector $\mathbf{y}$ can be represented alternatively as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w} = \sum_{k=1}^{N_p} \mathbf{h}_{i_k} s_k + \mathbf{w} = \mathbf{H}_{I_i}\mathbf{s} + \mathbf{w}. \tag{1}$$

Here $\mathbf{h}_\ell$ denotes the $\ell$th column of $\mathbf{H}$, $\mathbf{s} = [s_1, \ldots, s_{N_p}]^T$, and $\mathbf{H}_{I_i} = [\mathbf{h}_{i_1}, \mathbf{h}_{i_2}, \ldots, \mathbf{h}_{i_{N_p}}]$ is the submatrix of $\mathbf{H}$ corresponding to the TAC set $I_i$.
III. REVIEW ON THE OB-MMSE DETECTOR
The optimal detector for the aforementioned GSM-MIMO system is the ML detector, given by

$$(\hat{j}, \hat{\mathbf{s}}) = \arg\min_{(j,\mathbf{s})} d(I_j, \mathbf{s}) = \arg\min_{(j,\mathbf{s})} \|\mathbf{y} - \mathbf{H}_{I_j}\mathbf{s}\|^2, \tag{2}$$

where the search is over all ordered subsets $\{I_j\}_{j=1}^{N}$ and all symbol vectors $\mathbf{s} \in \Omega^{N_p}$. Clearly, a straightforward exhaustive search has prohibitive complexity and is therefore inefficient for practical implementation. Recently, a low-complexity OB-MMSE detector featuring near-ML performance has been proposed [7]. The main idea of OB-MMSE is to sort all the potential TACs according to an ordering criterion, and then detect the associated symbol vector using a conventional MMSE detector for each TAC. Let the MMSE detected result corresponding to $I_j$ be denoted as $\hat{\mathbf{s}}_{I_j}$. The distance metric $d(I_j, \hat{\mathbf{s}}_{I_j})$ is computed and then compared with a predefined threshold $v_{th}$. If the distance metric is smaller than $v_{th}$, the algorithm outputs the current result $(j, \hat{\mathbf{s}}_{I_j})$ as the final decision and terminates. Otherwise, it proceeds to the next TAC in the ordered list; if no candidate passes the threshold test, the algorithm compares the distance metrics of all $N$ hypotheses and outputs the detection result associated with the hypothesis that has the smallest $d(I_j, \hat{\mathbf{s}}_{I_j})$. Clearly, ordering plays an important role in OB-MMSE. A good ordering criterion ranks the true TAC near the top of the list so that it is tested earlier, which results in early termination.

We now elaborate on how the TACs are ordered in OB-MMSE. The data vector $\mathbf{y}$ is first multiplied by the pseudo-inverse of $\mathbf{h}_\ell$ to obtain $z_\ell = \mathbf{h}_\ell^\dagger \mathbf{y} = \mathbf{h}_\ell^H \mathbf{y} / \|\mathbf{h}_\ell\|^2$ for all $\ell = 1, \ldots, N_t$. Then, for each candidate TAC set $I_j = \{j_1, j_2, \ldots, j_{N_p}\}$, a weighting factor $w_j$ is computed as $w_j = |z_{j_1}|^2 + |z_{j_2}|^2 + \cdots + |z_{j_{N_p}}|^2 = \sum_{k=1}^{N_p} |z_{j_k}|^2$. After the $w_j$'s have been computed for all $j = 1, \ldots, N$, the algorithm sorts the weighting factors in descending order such that $w_{m_1} \ge w_{m_2} \ge \cdots \ge w_{m_N}$. The TAC set $I_{m_1}$ is then tested first, followed by $I_{m_2}$, and so on, up to $I_{m_N}$. The overall complexity of computing these weighting factors has been shown to be $14 N_r N_t + 3 N_t + N N_p$ real floating point operations (flops) [7].

IV. PROPOSED CECML COMPUTATION ALGORITHM
A. New Ordering Criterion Based on the Conditional MLE
We first consider the conditional ML estimate (CMLE)^1 [8] for $\mathbf{s}$, under the condition that $\mathbf{H}_{I_i}$ is given. From the system model (1), the CMLE $\tilde{\mathbf{s}}_{I_i}$ is obtained as $\tilde{\mathbf{s}}_{I_i} = \mathbf{H}_{I_i}^\dagger \mathbf{y} = (\mathbf{H}_{I_i}^H \mathbf{H}_{I_i})^{-1} \mathbf{H}_{I_i}^H \mathbf{y}$. Substituting $\tilde{\mathbf{s}}_{I_i}$ into $d(I_i, \cdot)$, we obtain the concentrated distance metric $\tilde{d}$, which explicitly depends only on the TAC:

$$\tilde{d}(I_i) = \left\| \left( \mathbf{I}_{N_r} - \mathbf{P}_{\mathbf{H}_{I_i}} \right) \mathbf{y} \right\|^2, \tag{3}$$

where $\mathbf{P}_{\mathbf{H}_{I_i}} = \mathbf{H}_{I_i} (\mathbf{H}_{I_i}^H \mathbf{H}_{I_i})^{-1} \mathbf{H}_{I_i}^H$. From (3), it is clear that $\tilde{d}(I_i)$ measures the squared Euclidean distance between $\mathbf{y}$ and the subspace spanned by the columns of $\mathbf{H}_{I_i}$, and hence can be used to measure how likely it is that $\mathbf{y}$ was generated from the TAC set $I_i$. From the Pythagorean theorem, $\|\mathbf{y}\|^2 = \|\mathbf{P}_{\mathbf{H}_{I_j}} \mathbf{y}\|^2 + \|(\mathbf{I}_{N_r} - \mathbf{P}_{\mathbf{H}_{I_j}}) \mathbf{y}\|^2$, so minimizing (3) is equivalent to maximizing $\|\mathbf{P}_{\mathbf{H}_{I_i}} \mathbf{y}\|^2 = \mathbf{y}^H \mathbf{P}_{\mathbf{H}_{I_i}} \mathbf{y}$. We therefore propose a new ordering criterion which sorts the TACs in descending order according to the associated new weighting factor $u_j$, defined as $u_j = \mathbf{y}^H \mathbf{P}_{\mathbf{H}_{I_j}} \mathbf{y}$ for all $j = 1, \ldots, N$.

^1 Here we do not take the constellation constraints of $\mathbf{s}$ into account in order to enable efficient computation as described in Section IV-B.

Fig. 1 shows the empirical CDF of the rank of the true TAC after sorting with the original and the proposed ordering criteria, simulated over 10,000 independent realizations. The proposed ordering criterion has a higher probability of assigning a small rank to the true TAC, which is advantageous for OB-MMSE detection.

Fig. 1: Empirical CDFs of the rank of the true TAC after sorting by the original and the proposed ordering criteria ($N_t = 16$, $N_r = 8$, $N = 64$). [Plot: empirical CDF versus order; curves for the proposed and original criteria at 0 dB and 10 dB.]

Despite the superior ordering performance of the proposed criterion, its main drawback lies in the increased complexity. Straightforward computation of each $u_j$ requires a complex QR factorization, a matrix-vector multiplication, and an inner product. It can be shown that an overall complexity of $O(8 N N_r N_p^2 - \frac{8}{3} N N_p^3 + 8 N N_r N_p)$ real flops is required to compute all the weighting factors. This high complexity motivates us to develop a computationally more efficient algorithm for computing these metrics, as shown in the next subsection.
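The two ordering criteria and the threshold-based detection loop described so far can be put side by side in a short NumPy sketch. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the MMSE step is assumed to be the standard regularized least-squares equalizer followed by nearest-symbol slicing (the letter does not spell it out), QPSK is taken from the simulation setup, and $u_j$ is computed naively with one QR factorization per TAC.

```python
import numpy as np

QPSK = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)  # unit-energy symbols

def original_weights(H, y, tacs):
    """w_j = sum_k |z_{j_k}|^2 with z_l = h_l^H y / ||h_l||^2 (ordering of [7])."""
    z = (H.conj().T @ y) / np.sum(np.abs(H) ** 2, axis=0)
    return np.array([np.sum(np.abs(z[list(t)]) ** 2) for t in tacs])

def projection_weights(H, y, tacs):
    """u_j = y^H P_{H_Ij} y, computed naively with one reduced QR per TAC."""
    u = np.empty(len(tacs))
    for j, t in enumerate(tacs):
        Q, _ = np.linalg.qr(H[:, list(t)])  # orthonormal basis of span(H_Ij)
        u[j] = np.linalg.norm(Q.conj().T @ y) ** 2
    return u

def ob_mmse(H, y, tacs, weights, sigma2, v_th, n_test=None, omega=QPSK):
    """Test TACs in descending order of weights; stop as soon as d < v_th."""
    order = np.argsort(-np.asarray(weights), kind="stable")[:n_test]
    best_d, best_j, best_s, tested = np.inf, None, None, 0
    for j in order:
        Hj = H[:, list(tacs[j])]
        # Assumed MMSE equalizer plus nearest-symbol slicing (not given in the letter).
        G = Hj.conj().T @ Hj + sigma2 * np.eye(Hj.shape[1])
        s_soft = np.linalg.solve(G, Hj.conj().T @ y)
        s_hat = omega[np.argmin(np.abs(s_soft[:, None] - omega[None, :]), axis=1)]
        d = np.linalg.norm(y - Hj @ s_hat) ** 2
        tested += 1
        if d < best_d:
            best_d, best_j, best_s = d, int(j), s_hat
        if d < v_th:
            break  # early termination
    return best_j, best_s, tested
```

Passing `projection_weights(...)` instead of `original_weights(...)` as `weights` switches between the two criteria, and `n_test` limits the search to the first few ordered TACs. The returned `tested` count gives a rough view of how early the loop terminates under each ordering.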
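B. Proposed CECML Computation Algorithm
We first denote the channel matrix associated with the TAC $I$ as $\mathbf{H}_I = [\mathbf{B}, \mathbf{c}]$, where $\mathbf{B}$ and $\mathbf{c}$ are formed by the first $N_p - 1$ column vectors and the last column vector of $\mathbf{H}_I$, respectively. By using the well-known property of projection matrices [9], we have $\mathbf{P}_{\mathbf{H}_I} = \mathbf{P}_{[\mathbf{B},\mathbf{c}]} = \mathbf{P}_{\mathbf{B}} + \mathbf{P}_{(\mathbf{I}-\mathbf{P}_{\mathbf{B}})\mathbf{c}}$, where $\mathbf{P}_{\mathbf{B}}$ and $\mathbf{P}_{(\mathbf{I}-\mathbf{P}_{\mathbf{B}})\mathbf{c}}$ denote the projection matrices associated with the subspaces spanned by the columns of $\mathbf{B}$ and by $(\mathbf{I} - \mathbf{P}_{\mathbf{B}})\mathbf{c}$, respectively. It follows that for any two vectors $\mathbf{f} \in \mathbb{C}^{N_r \times 1}$ and $\mathbf{g} \in \mathbb{C}^{N_r \times 1}$, one can compute

$$\mathbf{f}^H \mathbf{P}_{\mathbf{H}_I} \mathbf{g} = \mathbf{f}^H \mathbf{P}_{[\mathbf{B},\mathbf{c}]} \mathbf{g} = \mathbf{f}^H \mathbf{P}_{\mathbf{B}} \mathbf{g} + \mathbf{f}^H \mathbf{P}_{(\mathbf{I}-\mathbf{P}_{\mathbf{B}})\mathbf{c}} \mathbf{g} = \mathbf{f}^H \mathbf{P}_{\mathbf{B}} \mathbf{g} + \frac{(\mathbf{f}^H \mathbf{c} - \mathbf{f}^H \mathbf{P}_{\mathbf{B}} \mathbf{c})(\mathbf{c}^H \mathbf{g} - \mathbf{c}^H \mathbf{P}_{\mathbf{B}} \mathbf{g})}{\|\mathbf{c}\|^2 - \mathbf{c}^H \mathbf{P}_{\mathbf{B}} \mathbf{c}}. \tag{4}$$

For the special case where $\mathbf{f} = \mathbf{g} = \mathbf{y}$, we obtain

$$\mathbf{y}^H \mathbf{P}_{\mathbf{H}_I} \mathbf{y} = \mathbf{y}^H \mathbf{P}_{\mathbf{B}} \mathbf{y} + \frac{\left| \mathbf{c}^H \mathbf{y} - \mathbf{c}^H \mathbf{P}_{\mathbf{B}} \mathbf{y} \right|^2}{\|\mathbf{c}\|^2 - \mathbf{c}^H \mathbf{P}_{\mathbf{B}} \mathbf{c}}. \tag{5}$$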
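Note that the quadratic and bilinear quantities $\mathbf{y}^H \mathbf{P}_{\mathbf{B}} \mathbf{y}$, $\mathbf{c}^H \mathbf{P}_{\mathbf{B}} \mathbf{y}$, and $\mathbf{c}^H \mathbf{P}_{\mathbf{B}} \mathbf{c}$ in (5) are also in the form of (4), and hence we can similarly represent $\mathbf{B}$ as $[\mathbf{B}', \mathbf{c}']$ and apply the relation in (4) again. We can keep applying (4) until all the quadratic and bilinear forms are expressed in the form $\mathbf{f}^H \mathbf{P}_{\mathbf{d}} \mathbf{g} = (\mathbf{f}^H \mathbf{d})(\mathbf{d}^H \mathbf{g}) / \|\mathbf{d}\|^2$ for some vectors $\mathbf{f}$, $\mathbf{d}$, and $\mathbf{g}$. With this procedure, we can write the explicit expression of $\mathbf{y}^H \mathbf{P}_{\mathbf{H}_I} \mathbf{y}$ for any $N_p \ge 2$ in terms of inner products among the columns of $\mathbf{H}_I$ and $\mathbf{y}$. Here we provide the explicit expression for the $N_p = 3$ case with $\mathbf{H}_I = [\mathbf{t}_1, \mathbf{t}_2, \mathbf{t}_3]$ as an example. With some mathematical manipulations, $\mathbf{y}^H \mathbf{P}_{\mathbf{H}_I} \mathbf{y}$ is given by

$$\begin{align}
\mathbf{y}^H \mathbf{P}_{[\mathbf{t}_1,\mathbf{t}_2,\mathbf{t}_3]} \mathbf{y} &= \mathbf{y}^H \mathbf{P}_{[\mathbf{t}_1,\mathbf{t}_2]} \mathbf{y} + \mathrm{num}/\mathrm{den}, \tag{6} \\
\mathrm{num} &= \left| \mathbf{t}_3^H \mathbf{y} - \mathbf{t}_3^H \mathbf{P}_{[\mathbf{t}_1,\mathbf{t}_2]} \mathbf{y} \right|^2, \tag{7} \\
\mathrm{den} &= \|\mathbf{t}_3\|^2 - \mathbf{t}_3^H \mathbf{P}_{[\mathbf{t}_1,\mathbf{t}_2]} \mathbf{t}_3, \tag{8} \\
\mathbf{y}^H \mathbf{P}_{[\mathbf{t}_1,\mathbf{t}_2]} \mathbf{y} &= \mathbf{y}^H \mathbf{P}_{\mathbf{t}_1} \mathbf{y} + |\mathrm{term}_1|^2 / \mathrm{term}_2, \tag{9} \\
\mathbf{t}_3^H \mathbf{P}_{[\mathbf{t}_1,\mathbf{t}_2]} \mathbf{y} &= \mathbf{t}_3^H \mathbf{P}_{\mathbf{t}_1} \mathbf{y} + (\mathrm{term}_3)(\mathrm{term}_1) / \mathrm{term}_2, \tag{10} \\
\mathbf{t}_3^H \mathbf{P}_{[\mathbf{t}_1,\mathbf{t}_2]} \mathbf{t}_3 &= \mathbf{t}_3^H \mathbf{P}_{\mathbf{t}_1} \mathbf{t}_3 + |\mathrm{term}_3|^2 / \mathrm{term}_2, \tag{11} \\
\mathrm{term}_1 &= \mathbf{t}_2^H \mathbf{y} - \mathbf{t}_2^H \mathbf{P}_{\mathbf{t}_1} \mathbf{y}, \tag{12} \\
\mathrm{term}_2 &= \|\mathbf{t}_2\|^2 - \mathbf{t}_2^H \mathbf{P}_{\mathbf{t}_1} \mathbf{t}_2, \tag{13} \\
\mathrm{term}_3 &= \mathbf{t}_3^H \mathbf{t}_2 - \mathbf{t}_3^H \mathbf{P}_{\mathbf{t}_1} \mathbf{t}_2, \tag{14} \\
\mathbf{y}^H \mathbf{P}_{\mathbf{t}_1} \mathbf{y} &= \frac{|\mathbf{t}_1^H \mathbf{y}|^2}{\|\mathbf{t}_1\|^2}, \quad \mathbf{t}_3^H \mathbf{P}_{\mathbf{t}_1} \mathbf{t}_2 = \frac{(\mathbf{t}_3^H \mathbf{t}_1)(\mathbf{t}_1^H \mathbf{t}_2)}{\|\mathbf{t}_1\|^2}, \tag{15} \\
\mathbf{t}_2^H \mathbf{P}_{\mathbf{t}_1} \mathbf{y} &= \frac{(\mathbf{t}_2^H \mathbf{t}_1)(\mathbf{t}_1^H \mathbf{y})}{\|\mathbf{t}_1\|^2}, \quad \mathbf{t}_2^H \mathbf{P}_{\mathbf{t}_1} \mathbf{t}_2 = \frac{|\mathbf{t}_2^H \mathbf{t}_1|^2}{\|\mathbf{t}_1\|^2}, \tag{16} \\
\mathbf{t}_3^H \mathbf{P}_{\mathbf{t}_1} \mathbf{y} &= \frac{(\mathbf{t}_3^H \mathbf{t}_1)(\mathbf{t}_1^H \mathbf{y})}{\|\mathbf{t}_1\|^2}, \quad \mathbf{t}_3^H \mathbf{P}_{\mathbf{t}_1} \mathbf{t}_3 = \frac{|\mathbf{t}_3^H \mathbf{t}_1|^2}{\|\mathbf{t}_1\|^2}. \tag{17}
\end{align}$$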
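The repeated application of (4) can be written as a short recursive routine. The sketch below is only an illustration (the name `bilinear_proj` is hypothetical): it evaluates $\mathbf{f}^H \mathbf{P}_{[\mathbf{t}_1,\ldots,\mathbf{t}_n]} \mathbf{g}$ for any number of columns using nothing but inner products, with the rank-one form $(\mathbf{f}^H \mathbf{d})(\mathbf{d}^H \mathbf{g})/\|\mathbf{d}\|^2$ as the base case. It deliberately recomputes inner products on every call, so it shows the decomposition itself rather than the savings obtained by caching.

```python
import numpy as np

def bilinear_proj(f, g, cols):
    """Return f^H P_[cols] g via recursion (4); cols is a list of 1-D complex arrays."""
    if len(cols) == 1:
        d = cols[0]
        # Base case: f^H P_d g = (f^H d)(d^H g) / ||d||^2.
        return np.vdot(f, d) * np.vdot(d, g) / np.vdot(d, d).real
    B, c = cols[:-1], cols[-1]  # split [B, c] as in the text
    num = (np.vdot(f, c) - bilinear_proj(f, c, B)) * (np.vdot(c, g) - bilinear_proj(c, g, B))
    den = np.vdot(c, c).real - bilinear_proj(c, c, B).real
    return bilinear_proj(f, g, B) + num / den
```

Calling `bilinear_proj(y, y, [t1, t2, t3]).real` evaluates the same quantity as the explicit $N_p = 3$ expressions; `np.vdot(a, b)` conjugates its first argument, so it computes $\mathbf{a}^H \mathbf{b}$.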
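From the above discussion, it is clear that the computation of $u_j = \mathbf{y}^H \mathbf{P}_{\mathbf{H}_{I_j}} \mathbf{y}$ can always be decomposed into basic arithmetic operations on the inner products between the columns of $\mathbf{H}_{I_j}$ and the data vector $\mathbf{y}$, and on the inner products among the columns of $\mathbf{H}_{I_j}$. Let the weighting factors be sorted as $u_{m_1} \ge u_{m_2} \ge \cdots \ge u_{m_N}$. Since there are only finitely many (more precisely, $N_t$) possible channel vectors, it is expected that when we compute $u_{m_j}$ for some relatively large $j$, many of the required inner product terms have already been computed for $u_{m_1}, \ldots, u_{m_{j-1}}$. This motivates the proposed CECML computation algorithm, in which the redundant computation of these inner product terms is avoided.

A summary of the proposed CECML ordering algorithm for the case of $N_p = 3$ is given in Table I. The algorithm first initializes the elements of $\boldsymbol{\Gamma}_{hh}$, $\boldsymbol{\Phi}_{hh}$, and $\boldsymbol{\gamma}_{hy}$ to zero. The $(\ell, m)$th element of $\boldsymbol{\Gamma}_{hh}$ is used to store the value of $\mathbf{h}_\ell^H \mathbf{h}_m$ after it has been computed. The $(\ell, m)$th element of $\boldsymbol{\Phi}_{hh}$ is set to 1 once $[\boldsymbol{\Gamma}_{hh}]_{\ell,m}$ has been computed and stored, while the $\ell$th element of $\boldsymbol{\gamma}_{hy}$ is used to store the value of $\mathbf{h}_\ell^H \mathbf{y}$ after it has been computed. In lines 2 to 4 of Table I, the values of $[\boldsymbol{\Gamma}_{hh}]_{\ell,\ell}$ and $[\boldsymbol{\gamma}_{hy}]_\ell$ are computed for $\ell = 1, \ldots, N_t$, and the diagonal terms of the flag matrix $\boldsymbol{\Phi}_{hh}$ are updated. This step has a complexity of $O(N_r N_t)$. In lines 5 to 19 of Table I, the algorithm computes $u_j$ for all $j = 1, \ldots, N$. When computing $u_j$, the algorithm first checks the corresponding entries of $\boldsymbol{\Phi}_{hh}$ to see whether $\mathbf{h}_{j_2}^H \mathbf{h}_{j_1}$, $\mathbf{h}_{j_3}^H \mathbf{h}_{j_2}$, and $\mathbf{h}_{j_3}^H \mathbf{h}_{j_1}$ have already been computed. Values that are already available are read from $\boldsymbol{\Gamma}_{hh}$; the others are computed, and the corresponding positions in $\boldsymbol{\Gamma}_{hh}$ and $\boldsymbol{\Phi}_{hh}$ are updated. After all the required inner product terms are available, the algorithm uses (6)-(17) to compute $u_j$.

TABLE I: Summary of the CECML algorithm ($N_p = 3$)
Input: TAC sets $\{I_j\}_{j=1}^{N}$, channel matrix $\mathbf{H} \in \mathbb{C}^{N_r \times N_t}$, and received vector $\mathbf{y}$.
Output: weighting factors $\{u_j\}_{j=1}^{N}$.
1: Set $\boldsymbol{\Gamma}_{hh} = \mathbf{0}_{N_t \times N_t}$, $\boldsymbol{\Phi}_{hh} = \mathbf{0}_{N_t \times N_t}$, and $\boldsymbol{\gamma}_{hy} = \mathbf{0}_{N_t \times 1}$.
2: for $\ell = 1$ to $N_t$ do
3:   $[\boldsymbol{\Gamma}_{hh}]_{\ell,\ell} = \|\mathbf{h}_\ell\|^2$, $[\boldsymbol{\Phi}_{hh}]_{\ell,\ell} = 1$, and $[\boldsymbol{\gamma}_{hy}]_\ell = \mathbf{h}_\ell^H \mathbf{y}$.
4: end for
5: for $j = 1$ to $N$ do
6:   if $[\boldsymbol{\Phi}_{hh}]_{j_2,j_1} == 0$ then
7:     $[\boldsymbol{\Gamma}_{hh}]_{j_2,j_1} = \mathbf{h}_{j_2}^H \mathbf{h}_{j_1}$, $[\boldsymbol{\Phi}_{hh}]_{j_2,j_1} = 1$
8:     $[\boldsymbol{\Gamma}_{hh}]_{j_1,j_2} = [\boldsymbol{\Gamma}_{hh}]_{j_2,j_1}^*$, $[\boldsymbol{\Phi}_{hh}]_{j_1,j_2} = 1$
9:   end if
10:  if $[\boldsymbol{\Phi}_{hh}]_{j_3,j_1} == 0$ then
11:    $[\boldsymbol{\Gamma}_{hh}]_{j_3,j_1} = \mathbf{h}_{j_3}^H \mathbf{h}_{j_1}$, $[\boldsymbol{\Phi}_{hh}]_{j_3,j_1} = 1$
12:    $[\boldsymbol{\Gamma}_{hh}]_{j_1,j_3} = [\boldsymbol{\Gamma}_{hh}]_{j_3,j_1}^*$, $[\boldsymbol{\Phi}_{hh}]_{j_1,j_3} = 1$
13:  end if
14:  if $[\boldsymbol{\Phi}_{hh}]_{j_3,j_2} == 0$ then
15:    $[\boldsymbol{\Gamma}_{hh}]_{j_3,j_2} = \mathbf{h}_{j_3}^H \mathbf{h}_{j_2}$, $[\boldsymbol{\Phi}_{hh}]_{j_3,j_2} = 1$
16:    $[\boldsymbol{\Gamma}_{hh}]_{j_2,j_3} = [\boldsymbol{\Gamma}_{hh}]_{j_3,j_2}^*$, $[\boldsymbol{\Phi}_{hh}]_{j_2,j_3} = 1$
17:  end if
18:  Compute $u_j$ using (6) with the following quantities:
       $\mathbf{t}_1^H \mathbf{y} = [\boldsymbol{\gamma}_{hy}]_{j_1}$, $\mathbf{t}_2^H \mathbf{y} = [\boldsymbol{\gamma}_{hy}]_{j_2}$, $\mathbf{t}_3^H \mathbf{y} = [\boldsymbol{\gamma}_{hy}]_{j_3}$,
       $\|\mathbf{t}_1\|^2 = [\boldsymbol{\Gamma}_{hh}]_{j_1,j_1}$, $\|\mathbf{t}_2\|^2 = [\boldsymbol{\Gamma}_{hh}]_{j_2,j_2}$, $\|\mathbf{t}_3\|^2 = [\boldsymbol{\Gamma}_{hh}]_{j_3,j_3}$,
       $\mathbf{t}_1^H \mathbf{t}_2 = (\mathbf{t}_2^H \mathbf{t}_1)^* = [\boldsymbol{\Gamma}_{hh}]_{j_1,j_2}$,
       $\mathbf{t}_1^H \mathbf{t}_3 = (\mathbf{t}_3^H \mathbf{t}_1)^* = [\boldsymbol{\Gamma}_{hh}]_{j_1,j_3}$,
       $\mathbf{t}_2^H \mathbf{t}_3 = (\mathbf{t}_3^H \mathbf{t}_2)^* = [\boldsymbol{\Gamma}_{hh}]_{j_2,j_3}$.
19: end for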
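The procedure summarized in Table I can be sketched in NumPy as follows. This is a minimal illustration for $N_p = 3$ under stated assumptions rather than the authors' code: the function name `cecml_weights` and the helper `inner` are hypothetical, antenna indices are 0-based, and all $\mathbf{h}_\ell^H \mathbf{y}$ and $\|\mathbf{h}_\ell\|^2$ values are computed in one vectorized step instead of an explicit loop. The arrays `gram`, `done`, and `hy` play the roles of $\boldsymbol{\Gamma}_{hh}$, $\boldsymbol{\Phi}_{hh}$, and $\boldsymbol{\gamma}_{hy}$.

```python
import numpy as np

def cecml_weights(H, y, tacs):
    """u_j = y^H P_{H_Ij} y for Np = 3, caching inner products as in Table I."""
    nt = H.shape[1]
    gram = np.zeros((nt, nt), dtype=complex)  # Gamma_hh: [l, m] holds h_l^H h_m
    done = np.zeros((nt, nt), dtype=bool)     # Phi_hh: True once gram[l, m] is stored
    hy = H.conj().T @ y                       # gamma_hy: [l] holds h_l^H y
    idx = np.arange(nt)
    gram[idx, idx] = np.sum(np.abs(H) ** 2, axis=0)
    done[idx, idx] = True

    def inner(l, m):
        """Return h_l^H h_m, computing and storing it (and its conjugate) on first use."""
        if not done[l, m]:
            gram[l, m] = np.vdot(H[:, l], H[:, m])
            gram[m, l] = np.conj(gram[l, m])
            done[l, m] = done[m, l] = True
        return gram[l, m]

    u = np.empty(len(tacs))
    for j, (j1, j2, j3) in enumerate(tacs):
        n1 = gram[j1, j1].real
        t1y, t2y, t3y = hy[j1], hy[j2], hy[j3]
        t2t1, t3t1, t3t2 = inner(j2, j1), inner(j3, j1), inner(j3, j2)
        term1 = t2y - t2t1 * t1y / n1                   # t2^H y - t2^H P_t1 y
        term2 = gram[j2, j2].real - abs(t2t1) ** 2 / n1  # ||t2||^2 - t2^H P_t1 t2
        term3 = t3t2 - t3t1 * np.conj(t2t1) / n1         # t3^H t2 - t3^H P_t1 t2
        y_p12_y = abs(t1y) ** 2 / n1 + abs(term1) ** 2 / term2
        t3_p12_y = t3t1 * t1y / n1 + term3 * term1 / term2
        t3_p12_t3 = abs(t3t1) ** 2 / n1 + abs(term3) ** 2 / term2
        num = abs(t3y - t3_p12_y) ** 2
        den = gram[j3, j3].real - t3_p12_t3
        u[j] = y_p12_y + num / den
    return u
```

Each off-diagonal inner product $\mathbf{h}_\ell^H \mathbf{h}_m$ is computed at most once across all TACs, which is where the savings over the per-TAC QR approach come from. The TACs are then tested in the order given by `np.argsort(-u)`.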
The complexity of computing all the $u_j$'s using the proposed CECML algorithm can be obtained through careful bookkeeping. Since the proposed scheme produces many intermediate variables as it decomposes the quadratic form $\mathbf{y}^H \mathbf{P}_{[\mathbf{t}_1,\ldots,\mathbf{t}_{N_p}]} \mathbf{y}$ into simple arithmetic operations on inner products, it is very difficult to provide a unified expression that characterizes the complexity for the general $N_p$ case. For the $N_p = 2$ and $N_p = 3$ cases, it can be shown that the complexities required to compute all the $u_j$'s are $8 N N_r + 8 N_t N_r + 24 N$ and $4 N_t^2 N_r + 8 N_t N_r + 69 N$ real flops, respectively. The details of the derivation are omitted due to the page limit.

V. SIMULATION RESULTS
In this section we present the simulation results of the proposed CECML algorithm applied to the OB-MMSE detector, in comparison with the original OB-MMSE detector, the ML detector, and the sphere decoder (SD) [10]. We use QPSK to modulate the symbols, and the channel is assumed to be frequency-flat Rayleigh fading, where the elements of $\mathbf{H}$ are drawn i.i.d. from a complex Gaussian distribution with unit variance. The threshold of the OB-MMSE detector is set to $v_{th} = N_r \sigma^2$, and the signal-to-noise ratio is defined as $\mathrm{SNR} = N_p / \sigma^2$. Each simulation point on the complexity plot is obtained by averaging the number of real flops over 10,000 independent realizations.

In Fig. 2, the bit error rate (BER) performance and the average number of flops of OB-MMSE detection are shown for $N_t = 16$, $N_r = 4$, $N_p = 2$, and $N = 64$. The BER of the optimal ML detector is also plotted for comparison. From the BER curves, it is observed that the OB-MMSE detector using the proposed CECML computation algorithm achieves performance similar to that of the original OB-MMSE detector ($N_{test} = N$) by testing only the first $N_{test} = N/4$ ordered TACs. The simulation results also show that the proposed scheme has almost the same performance as the original OB-MMSE detector while requiring only 60% of its complexity at an SNR of 20 dB. By avoiding redundant computations, the proposed CECML algorithm reduces the complexity at an SNR of 20 dB by roughly 50% compared to that of the naive implementation described in Section IV-A. Compared with the SD, the proposed CECML-OB-MMSE detector achieves a 75% to 85% complexity reduction in this simulation setting with only slight BER degradation. At low SNR, early termination in the proposed CECML-OB-MMSE detector is less effective due to the large $v_{th}$. In this case, the CECML-OB-MMSE detector can have higher complexity than the original OB-MMSE detector because it requires more computation for the weighting factors.

Fig. 2: BER and complexity of various GSM detectors ($N_t = 16$, $N_r = 4$, $N = 64$, $N_p = 2$). [Panels: bit error rate (BER) versus SNR (dB) and flops versus SNR (dB), for the ML, SD, original OB-MMSE, naive implementation, and proposed detectors at several values of $N_{test}$.]

Fig. 3 illustrates the performance of OB-MMSE detection for $N_t = 16$, $N_r = 8$, $N_p = 3$, and $N = 512$. In this setting, the proposed CECML algorithm achieves performance similar to that of the full original OB-MMSE detector ($N_{test} = N$) by testing only the first $N_{test} = N/16$ ordered TACs. This results in roughly 78% complexity reduction at an SNR of 12 dB compared to the original OB-MMSE detector.

Fig. 3: BER and complexity of various GSM detectors ($N_t = 16$, $N_r = 8$, $N = 512$, $N_p = 3$). [Panels: bit error rate (BER) versus SNR (dB) and flops versus SNR (dB), for the ML, SD, original OB-MMSE, naive implementation, and proposed detectors at several values of $N_{test}$.]

VI. CONCLUSION
In this letter, we presented the CECML algorithm, which enables efficient computation of a newly proposed ordering metric for the OB-MMSE detector in a GSM system. The proposed algorithm avoids redundant computations and enables early termination without noticeable performance degradation. Simulation results show that the proposed algorithm provides substantial complexity reduction in the moderate-to-high SNR region and hence exhibits a better performance-complexity tradeoff than the existing OB-MMSE detector.

REFERENCES
[1] R. Mesleh, H. Haas, S. Sinanović, C. W. Ahn, and S. Yun, “Spatial modulation,” IEEE Trans. Veh. Tech., vol. 57, no. 4, pp. 2228–2241, Jul. 2008.
[2] M. D. Renzo, H. Haas, A. Ghrayeb, S. Sugiura, and L. Hanzo, “Spatial modulation for generalized MIMO: Challenges, opportunities and implementation,” Proc. IEEE, vol. 102, no. 1, pp. 56–103, Jan. 2014.
[3] P. Yang, M. D. Renzo, Y. Xiao, S. Li, and L. Hanzo, “Design guidelines for spatial modulation,” IEEE Communications Surveys and Tutorials, pp. 1–24, 2014.
[4] J. Fu, C. Hou, W. Xiang, L. Yan, and Y. Hou, “Generalised spatial modulation with multiple active transmit antennas,” in Proc. IEEE GLOBECOM Workshops, Miami, FL, Dec. 2010, pp. 839–844.
[5] A. Younis, N. Serafimovski, R. Mesleh, and H. Haas, “Generalised spatial modulation,” in Proc. IEEE Asilomar Conf. Signals Syst. Comput., Pacific Grove, CA, Nov. 2010, pp. 1498–1502.
[6] J. Wang, S. Jia, and J. Song, “Generalised spatial modulation system with multiple active transmit antennas and low complexity detection scheme,” IEEE Trans. Wireless Commun., vol. 11, no. 4, pp. 1605–1615, Apr. 2012.
[7] Y. Xiao, Z. Yang, L. Dan, P. Yang, L. Yin, and W. Xiang, “Low-complexity signal detection for generalized spatial modulation,” IEEE Commun. Lett., vol. 18, no. 3, pp. 403–406, Mar. 2014.
[8] P. Stoica and N. Nehorai, “Performance study of conditional and unconditional direction-of-arrival estimation,” IEEE Trans. Acoust. Speech, Signal Process., vol. 38, no. 10, pp. 1783–1795, Oct. 1990.
[9] I. Ziskind and M. Wax, “Maximum likelihood localization of multiple sources by alternating projection,” IEEE Trans. Acoust. Speech, Signal Process., vol. 36, no. 10, pp. 1553–1560, Oct. 1988.
[10] A. Younis, S. Sinanović, M. Di Renzo, R. Mesleh, and H. Haas, “Generalised sphere decoding for spatial modulation,” IEEE Trans. Commun., vol. 61, no. 7, pp. 2805–2815, Jul. 2013.