Low-complexity Detection Based on Belief Propagation in a Massive MIMO System

Wataru Fukuda∗, Takashi Abiko∗, Toshihiko Nishimura∗, Takeo Ohgane∗, Yasutaka Ogawa∗, Yusuke Ohwatari†, and Yoshihisa Kishiyama†

∗Graduate School of Information Science and Technology, Hokkaido University, Kita 14 Nishi 9, Kita-ku, Sapporo, Hokkaido, 060-0814 Japan
Email: {fukuda, abiko}@m-icl.ist.hokudai.ac.jp, {nishim, ohgane, ogawa}@ist.hokudai.ac.jp
†NTT DoCoMo, INC., 3-6 Hikari-no-oka, Yokosuka-shi, Kanagawa, 239-8536 Japan
Email: [email protected], [email protected]

Abstract: A very large MIMO system has the potential to achieve extremely high system throughput. In general, however, algorithms for detecting spatially multiplexed signals require complexity at least proportional to the cube of the number of antenna elements. Thus, implementing an antenna array with on the order of 100 elements becomes very difficult. In this paper, we focus on an algorithm that is based on belief propagation and can be implemented with second-order calculations. The simulation results show that, in MIMO spatial multiplexing with 100 antenna elements, the algorithm provides very good BER performance with reasonably low complexity in comparison with MMSE spatial filtering.
I. INTRODUCTION

Multiple-input multiple-output (MIMO) systems can effectively improve the channel capacity [1] and have already proved their advantages through single-user MIMO services such as WiFi, WiMAX, and LTE. MIMO systems are now in a phase of further evolution. Currently, multiuser MIMO is one of the major issues in the standardization of LTE-Advanced and IEEE 802.11ac [2], [3]. Multiuser MIMO is suitable when the base station or access point is equipped with many antenna elements. As an extension of this concept, very large (or massive) MIMO systems with on the order of 100 antenna elements have recently been proposed, and their capability to achieve extremely high capacity has been demonstrated [4], [5].

Such massive MIMO systems are mainly discussed for a broadcast channel, where each user terminal may detect a few data streams by itself. When we consider a multiple access channel, however, the base station has to detect a number of spatially multiplexed signals simultaneously. In addition, improved usability of higher frequency bands in the future might make it possible to realize massive single-user MIMO systems. Thus, the demand for low-complexity detection algorithms is growing.

Well-known algorithms such as zero-forcing or minimum mean squared error (MMSE) spatial filtering, sphere decoding [6], and QRM-MLD [7], [8] require matrix calculations whose complexity is proportional to the cube of the number of antenna elements. Several studies have aimed to reduce the complexity to second-order calculations. The
typical examples are likelihood ascent search [9] and reactive tabu search [10]. These have a structure similar to the bit inversion algorithm in LDPC decoding and achieve both low complexity and good performance. Subsequently, detection methods based on the belief propagation (BP) algorithm were proposed [11], [12]. The BP-based algorithm is less likely to suffer from the local minimum problem and provides better performance in general. However, to the authors' knowledge, its performance has been evaluated only in combination with other MIMO techniques such as V-BLAST and STBC. Our objective in this paper is to reveal the detection capability of the pure BP-based algorithm for spatially multiplexed signals.

The rest of the paper is organized as follows. After introducing a Tanner graph expression, we restate the BP-based algorithm with a parallel interference canceller (PIC). Then, the BER performance for the uncoded and coded cases is compared with that of MMSE spatial filtering. Finally, the complexities of the two methods are roughly discussed.

II. TANNER GRAPH EXPRESSION OF MIMO CHANNEL

In this paper, we consider MIMO spatial multiplexing with $M$ transmit and $N$ receive antennas as shown in Fig. 1(a). Here, independent signals are transmitted from each transmit antenna without precoding. Let us denote the transmitted signal vector as $\mathbf{x} \in \mathbb{C}^{M \times 1}$, the noise vector as $\mathbf{z} \in \mathbb{C}^{N \times 1}$, and the channel matrix as $\mathbf{H} \in \mathbb{C}^{N \times M}$. Then, the received signal vector $\mathbf{y} \in \mathbb{C}^{N \times 1}$ is written as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{z}, \tag{1}$$

where

$$\mathbf{H} = \begin{bmatrix} h_{11} & h_{12} & \cdots & h_{1M} \\ h_{21} & h_{22} & \cdots & h_{2M} \\ \vdots & \vdots & h_{ij} & \vdots \\ h_{N1} & h_{N2} & \cdots & h_{NM} \end{bmatrix}. \tag{2}$$
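The system model in (1) and (2) can be illustrated with a short numerical sketch. The function name `simulate_mimo_channel`, the BPSK mapping (bit 1 to +1, bit 0 to -1), the unit-variance Rayleigh entries, and the SNR definition (unit symbol power over noise power) are assumptions made for this illustration and are not specified at this point in the paper.

```python
import numpy as np

def simulate_mimo_channel(M, N, snr_db, rng):
    """Draw one realization of y = Hx + z (illustrative assumptions only)."""
    # Assumed channel: i.i.d. complex Gaussian entries with unit variance.
    H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    # Assumed BPSK mapping: bit 1 -> +1, bit 0 -> -1.
    bits = rng.integers(0, 2, size=M)
    x = (2.0 * bits - 1.0).astype(complex)
    # Assumed SNR definition: unit symbol power divided by noise power.
    noise_var = 10.0 ** (-snr_db / 10.0)
    z = np.sqrt(noise_var / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
    y = H @ x + z  # equation (1)
    return H, x, y, noise_var, bits

if __name__ == "__main__":
    H, x, y, noise_var, bits = simulate_mimo_channel(4, 6, 10.0, np.random.default_rng(0))
    print(H.shape, x.shape, y.shape)
```

Each row of `H` holds the weights with which the `M` transmitted symbols are summed at one receive antenna.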
Note that a time-nondispersive (frequency-flat) MIMO channel is assumed for simplicity, although the following discussion is applicable to any channel model.

Figure 1(a) shows that channels exist between all pairs of transmit and receive antennas and are expressed by the channel responses $h_{ij} \in \mathbb{C}$ ($i = 1, 2, \ldots, N$, $j = 1, 2, \ldots, M$). Here, let us consider a certain symbol timing. At each receive antenna, the $M$ transmitted symbols are summed with different weights corresponding to the channel gains. Obviously, the transmitted symbols and received signals are mutually dependent. This property indicates that the MIMO spatial multiplexing system can be modeled by a Tanner graph.

Fig. 1. MIMO channel and its Tanner graph expression: (a) MIMO system model (transmitter #1, ..., #M; receiver #1, ..., #N), (b) corresponding Tanner graph (symbol nodes #1, ..., #M; observation nodes #1, ..., #N).

Figure 1(b) illustrates a Tanner graph expression for Fig. 1(a). Each node at the transmitter side has information on one transmitted symbol. Thus, each of these nodes is referred to as a "symbol node" hereinafter. On the other hand, the nodes at the receiver side store the signal information observed at the receiver. Therefore, we call each of these nodes an "observation node". Based on this Tanner graph, reliability messages can be exchanged between the nodes on both sides. In the next section, we describe the message update rules used here as an example of a BP algorithm implementation.

Before starting the detailed discussion, note that there are many loops in the graph since all nodes are connected to each other. In general, such loops are said to cause performance degradation in message passing. In this model, however, the coupling strength is determined by the channel response and is thus expected to be randomly distributed. In other words, the number of major connections would not become very large as the number of antenna elements increases. Thus, the effective number of high-impact loops is expected to be small. This is one of the reasons that BP-based detection is suitable for massive MIMO systems.

978-1-4673-6337-2/13/$31.00 ©2013 IEEE

Fig. 2. Message update at the ith observation node.

III. ITERATIVE PROCESSING BASED ON BP ALGORITHM

A. Message Update at Observation Nodes
At each observation node, the transmitted symbols are detected, and the result is passed as a message (extrinsic information) to each symbol node. Here, we describe the detection algorithm in terms of the message update rule at the $i$th observation node.

Let us consider generating the extrinsic information for the $j$th symbol node as shown in Fig. 2. This corresponds to detecting the $j$th symbol using the other symbols' information $\beta_{im}$ ($m = 1, \ldots, j-1, j+1, \ldots, M$). Here, $\beta_{im}$ denotes the extrinsic information passed from the $m$th symbol node and contains the log-likelihood ratios (LLRs) of the bits within the symbol. Defining the number of bits per symbol as $L$, we can write $\beta_{im}$ as

$$\beta_{im} = \{ b_1^{(im)}, \ldots, b_L^{(im)} \}, \tag{3}$$

where $b_l^{(im)}$ is the LLR of the $l$th bit of the $m$th symbol passed to the $i$th observation node. Clearly, maximum a posteriori estimation using the received signal $y_i$ and $\beta_{im}$ is optimum but requires complicated calculations. Since our aim is to develop a low-complexity detection algorithm, a PIC is applied instead.

The received signal at the $i$th observation node is written as

$$y_i = \sum_{m=1}^{M} h_{im} x_m + n_i = h_{ij} x_j + \sum_{m=1,\, m \neq j}^{M} h_{im} x_m + n_i. \tag{4}$$
In the above equation, the second term on the right-hand side corresponds to the inter-substream interference¹. This term can be cancelled using the bit LLR information of the corresponding symbols as follows.

¹ This is equal to the multi-access interference when a multiple-access channel of multi-user MIMO systems is considered.

First, the replica $\hat{x}_m^{(i)}$ of the $m$th symbol is reconstructed using $\beta_{im}$ passed from the $m$th symbol node to the $i$th observation node. Let $f$ be the symbol mapping function of the bit LLR information. Then, $\hat{x}_m^{(i)}$ is expressed as

$$\hat{x}_m^{(i)} = f(\beta_{im}) = f(b_1^{(im)}, \ldots, b_L^{(im)}). \tag{5}$$

For example, in the BPSK case, $\hat{x}_m^{(i)}$ is written as

$$\hat{x}_m^{(i)} = (+1)\Pr(s_1^{(m)} = 1) + (-1)\Pr(s_1^{(m)} = 0) \tag{6}$$

$$= \tanh\left( \frac{b_1^{(im)}}{2} \right), \tag{7}$$

where $s_l^{(m)}$ is the $l$th bit value (1 or 0) of the $m$th symbol. Cancelling the interference components in (4) using these replicas and the channel information, we obtain the signal $\tilde{y}_i$, which mainly contains the $j$th symbol, as

$$\tilde{y}_i = y_i - \sum_{m=1,\, m \neq j}^{M} h_{im} \hat{x}_m^{(i)}. \tag{8}$$

Since the replicas are not perfect, residual interference components usually remain in the signal. We regard these components as additional noise and replace the noise power by the equivalent noise power. Finally, the LLR $a_l^{(ij)}$ of the $l$th bit of the $j$th symbol is calculated by

$$a_l^{(ij)} = \log \frac{\Pr(\tilde{y}_i \mid s_l^{(j)} = 1)}{\Pr(\tilde{y}_i \mid s_l^{(j)} = 0)}. \tag{9}$$

The process at the $i$th observation node is accomplished by passing the message including the bit LLRs,

$$\alpha_{ij} = \{ a_1^{(ij)}, \ldots, a_L^{(ij)} \}, \tag{10}$$

as the extrinsic information to the $j$th symbol node.

B. Message Update at Symbol Nodes

One major role of the $j$th symbol node is to calculate the a posteriori probability of the symbol from the messages passed from the observation nodes as shown in Fig. 3. The a posteriori LLR $\gamma_l^{(j)}$ of the $l$th bit is simply obtained as the sum of the extrinsic LLRs,

$$\gamma_l^{(j)} = \sum_{n=1}^{N} a_l^{(nj)}, \tag{11}$$

where $a_l^{(nj)}$ is a bit LLR component included in the message $\alpha_{nj}$ as in (10). As will be described later, this is used for the final decision.

Fig. 3. A posteriori LLR calculation at the jth symbol node.

Next, the message $\beta_{ij}$ to the $i$th observation node is updated using the a posteriori LLR. The extrinsic information should be composed of the information given by the observation nodes other than the $i$th node to avoid propagating duplicate messages. Therefore, each bit LLR in $\beta_{ij}$ is obtained as

$$b_l^{(ij)} = \gamma_l^{(j)} - a_l^{(ij)}. \tag{12}$$

Then, the bit LLR package $\beta_{ij} = \{ b_1^{(ij)}, \ldots, b_L^{(ij)} \}$ is forwarded to the $i$th observation node as in Fig. 4. This completes the process at the $j$th symbol node.

Fig. 4. Message update at the jth symbol node.

C. Iteration

As described above, the messages are iteratively exchanged between the observation nodes and the symbol nodes. Through this process, the reliability of the symbol information is expected to improve. In this paper, each bit in the transmitted symbols is determined using the a posteriori bit LLR at the symbol node after a certain number of iterations.

The iteration starts at the observation nodes, where initial values of $\beta_{im}$ are required. The easiest way is to set them all to zero:

$$\beta_{im} = \{0, \ldots, 0\}. \tag{13}$$

Although this method is very simple, interference cancellation is not carried out at the first iteration since the replica signals also become zero. As a safer way, we also used a method that generates the initial values of $\beta_{im}$ based on maximum a posteriori estimation. Applying this estimation to all symbols, however, is nearly impossible due to its high complexity. Here, we focus on the $D$ major links connected to each observation node and estimate $\beta_{im}$ for these symbols only. First, at each observation node, we select the top $D$ links in descending order of signal strength from the $M$ transmitted symbols. The symbols not chosen are regarded as noise. Consequently, using the indices $g(d)$ ($d = 1, \ldots, D$) of the selected symbols at the $i$th observation node, we calculate the $L$ bit LLRs for the $g(d)$th symbol with the max-log approximation as

$$\alpha_l^{(i g(d))} = \log \frac{\Pr(y_i \mid s_l^{(g(d))} = 1)}{\Pr(y_i \mid s_l^{(g(d))} = 0)} \approx \log \frac{\max\limits_{\substack{s_t^{(g(q))} \in (1,0),\ t = 1, \ldots, L, \\ q = 1, \ldots, D,\ (t,q) \neq (l,d)}} \Pr(y_i \mid s_l^{(g(d))} = 1)}{\max\limits_{\substack{s_t^{(g(q))} \in (1,0),\ t = 1, \ldots, L, \\ q = 1, \ldots, D,\ (t,q) \neq (l,d)}} \Pr(y_i \mid s_l^{(g(d))} = 0)}, \tag{14}$$

where the noise power is replaced by the equivalent noise power. The remaining $(M - D)L$ bit LLRs for the symbols not selected are all set to zero. This process is executed at all observation nodes, and the results are summed to improve the accuracy as

$$b_l^{(1j)} = b_l^{(2j)} = \cdots = b_l^{(Nj)} = \sum_{n=1}^{N} \alpha_l^{(nj)}. \tag{15}$$
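The message exchange in (7), (8), (9), (11), and (12) can be sketched for the BPSK case as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the all-zero initialization (13) rather than the $D$-link initialization, the function name `bp_pic_detect_bpsk` is hypothetical, the equivalent noise power is assumed to be the noise variance plus the residual power $\sum_{m \neq j} |h_{im}|^2 (1 - \hat{x}_m^2)$ (the paper does not give its formula), and the LLR in (9) is evaluated in the closed form $4\,\mathrm{Re}(h_{ij}^{*} \tilde{y}_i)/\sigma_{\mathrm{eq}}^2$ that follows from treating the residual as complex Gaussian noise.

```python
import numpy as np

def bp_pic_detect_bpsk(y, H, noise_var, num_iters=5):
    """Sketch of BP with PIC for BPSK (bit 1 -> +1, bit 0 -> -1).

    beta[i, j]  : LLR sent from symbol node j to observation node i.
    alpha[i, j] : LLR sent from observation node i to symbol node j.
    """
    N, M = H.shape
    beta = np.zeros((N, M))  # all-zero initialization, equation (13)
    gamma = np.zeros(M)
    for _ in range(num_iters):
        # Soft replicas, equation (7).
        x_hat = np.tanh(beta / 2.0)
        # PIC, equation (8): subtract all replicas except the target symbol's.
        contrib = H * x_hat
        total = contrib.sum(axis=1, keepdims=True)
        y_tilde = y[:, None] - (total - contrib)
        # ASSUMPTION: equivalent noise power = noise + residual replica power.
        resid = np.abs(H) ** 2 * (1.0 - x_hat ** 2)
        others = np.maximum(resid.sum(axis=1, keepdims=True) - resid, 0.0)
        eq_noise = noise_var + others
        # Equation (9) for BPSK under a Gaussian assumption on the residual.
        alpha = 4.0 * np.real(np.conj(H) * y_tilde) / eq_noise
        # A posteriori LLR, equation (11), and extrinsic update, equation (12).
        gamma = alpha.sum(axis=0)
        beta = gamma[None, :] - alpha
    bits = (gamma > 0).astype(int)  # final decision from the a posteriori LLR
    return bits, gamma

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M, N = 4, 64
    H = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
    tx_bits = rng.integers(0, 2, size=M)
    y = H @ (2.0 * tx_bits - 1.0)
    print(bp_pic_detect_bpsk(y, H, noise_var=0.01)[0], tx_bits)
```

Computing the interference sum once per observation node and removing the target symbol's own contribution keeps the work per iteration proportional to $MN$, in line with the second-order complexity targeted in the paper.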
The number of multiplications for the initial LLR computation becomes $(2^L)^D N D$ in total. Obviously, the complexity increases exponentially with the number of selected symbols.

TABLE I
SIMULATION PARAMETERS.

Parameter                       BP w/ PIC        MMSE
Detecting method                BP w/ PIC        MMSE
Number of antennas (M × N)      100 × 100        100 × 100
Modulation                      QPSK             QPSK
Channel statistics              Quasi-static Rayleigh fading (both methods)
Noise                           AWGN (both methods)
Frame length                    10 symbols (both methods)
Number of frames                10,000 (both methods)
Channel encoding                Convolutional code, constraint length 3, coding rate 1/2 (both methods)
Channel decoding                Max-Log MAP decoder (both methods)
Number of selected symbols: D   0, 3             —
Number of iterations: I         1, 3, 5, 7       —

D. Complexity
Let us discuss the complexity in terms of the number of multiplications. Processing at the symbol nodes requires additions and subtractions only. In contrast, at the observation nodes, multiplications are needed in both the PIC and the LLR calculation². The number of multiplications per observation node is $M$ for the PIC and $ML$ for the LLR calculation. Thus, with $I$ iterations, the total complexity is expressed by $M(1 + L)NI$.

E. Channel Encoded Case

The above discussion can be extended to the case with channel coding. In the encoded case, we can reconstruct the factor graph as proposed in [13] and exchange the messages in three levels. This method is expected to be effective for channel codes using iterative decoding, such as turbo and LDPC codes. In this paper, as a less complex method, we include the decoding process in the symbol nodes and replace the a posteriori LLR by the decoder output. Specifically, after the a posteriori LLR calculation at the symbol nodes, the deinterleaved LLR information is passed to the channel decoder. Then, the decoder outputs are returned to the symbol nodes after interleaving. At the symbol nodes, the updated messages are calculated using this a posteriori LLR. Including the decoding process in each iteration increases the complexity. When a convolutional code is used, however, the increase is expected to be small.

IV. NUMERICAL EVALUATION

A. BER Performance

We evaluated the performance of the method described above (hereinafter called BP with PIC) by computer simulation and compared its BER performance with that of MMSE spatial filtering. The simulation parameters are shown in Table I. The modulation is QPSK, the numbers of transmit and receive antennas are both set to 100, and the channel responses between antennas follow an i.i.d. Rayleigh distribution that is quasi-static within a frame.
The channel state information is assumed to be perfectly known at the receiver. In the coded case, a convolutional code (constraint length: 3, coding rate: 1/2) and a max-log MAP decoder are used. As parameters of BP with PIC, the number of transmitted signals chosen for computing the initial LLR is set to $D = 0$ or $D = 3$. Here, $D = 0$ indicates the case where the initial LLRs are all set to zero as shown in (13), i.e., no cancellation is performed at the first message update. The BER performance was obtained by transmitting 10,000 frames of 10 symbols each, assuming perfect symbol synchronization.

² It is assumed that the symbol mapping function is stored in a look-up table.

The uncoded BER performance is shown in Fig. 5. The performance of BP with PIC improves significantly as the number of iterations increases. Although there are some differences between $D = 0$ and $D = 3$ in the early stages, the performance for $D = 0$ approaches that for $D = 3$ as the number of iterations increases. Specifically, the difference is within about 0.3 dB at $I = 5$ and 0.1 dB at $I = 7$ at a BER of $10^{-3}$. Note that the gain of BP with PIC over MMSE filtering at a BER of $10^{-3}$ reaches about 8 dB at $I = 5$ and about 10 dB at $I = 7$³.

Fig. 5. Uncoded BER performance of BP with PIC.

Figure 6 shows the coded BER performance. This simulation was performed for $D = 0$ only because no severe degradation was seen in Fig. 5. The gains obtained by BP with PIC in comparison with MMSE spatial filtering are about 4 dB at $I = 3$ and about 5 dB at $I = 5$ and $I = 7$ at a BER of $10^{-3}$. In addition, channel coding reduces the number of iterations required for convergence.

Fig. 6. Coded BER performance of BP with PIC.

B. Complexity

Finally, let us discuss the complexity in terms of the number of multiplications. MMSE filtering requires several matrix calculations. According to our rough estimation⁴, the number of multiplications for the weight calculation is $5NM^2/2$ in total, which becomes 2,500,000 when $N = M = 100$. For BP with PIC, the number of multiplications per symbol is $M(1 + L)NI$ as described above, which becomes 210,000 ($= 100 \times 3 \times 100 \times 7$) under the same condition with $L = 2$ (QPSK) and $I = 7$. Clearly, the complexity of BP with PIC is much smaller if the weight matrix for MMSE filtering must be calculated for every symbol. Even when the weight matrix is calculated only once per frame, the total complexity of BP with PIC is still less than that of MMSE filtering for the assumed frame length (10 symbols⁵). Considering the high detection performance of BP with PIC, we can state that the BP-based detection algorithm is very attractive for massive MIMO systems.

³ Note that the performance of MMSE is also improved by having a huge number of antenna elements.
⁴ Matrix multiplication for the correlation matrix: $NM^2/2$; matrix inversion: $M^3$; matrix multiplication for the weight matrix calculation: $NM^2$.
⁵ The number of OFDM symbols per frame is 7 in LTE.

V. CONCLUSIONS

In this paper, we discussed a method using iterative processing based on BP with PIC for detecting spatially multiplexed signals in a very large MIMO system and evaluated its performance. The results show that the method achieves much higher performance than spatial filtering based on the MMSE criterion with lower or comparable complexity. Thus, we can conclude that BP with PIC is very effective for detecting spatially multiplexed signals in massive MIMO systems. The impact of channel estimation error is an open and urgent issue. In addition, further reduction of complexity and symbol-level combining instead of LLR synthesis at the symbol nodes are left for future work.

REFERENCES
[1] E. Telatar, "Capacity of Multi-antenna Gaussian Channels," European Transactions on Telecommunications, vol. 10, no. 6, pp. 585–595, Nov./Dec. 1999.
[2] A. Ghosh, R. Ratasuk, B. Mondal, N. Mangalvedhe, and T. Thomas, "LTE-advanced: Next-Generation Wireless Broadband Technology," IEEE Trans. Wireless Commun., vol. 17, no. 3, pp. 10–22, June 2010.
[3] L. Liu, R. Chen, S. Geirhofer, K. Sayana, Z. Shi, and Y. Zhou, "Downlink MIMO in LTE-advanced: SU-MIMO vs. MU-MIMO," IEEE Commun. Mag., vol. 50, no. 2, pp. 140–147, Feb. 2012.
[4] T. L. Marzetta, "Noncooperative Cellular Wireless with Unlimited Numbers of Base Station Antennas," IEEE Trans. Wireless Commun., vol. 9, no. 11, pp. 3590–3600, Nov. 2010.
[5] F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta, O. Edfors, and F. Tufvesson, "Scaling up MIMO: Opportunities and Challenges with Very Large Arrays," IEEE Signal Process. Mag., to appear, 2012.
[6] E. Viterbo and J. Boutros, "A Universal Lattice Code Decoder for Fading Channels," IEEE Trans. Inform. Theory, vol. 45, no. 5, pp. 1639–1642, July 1999.
[7] K. J. Kim and J. Yue, "Joint Channel Estimation and Data Detection Algorithms for MIMO-OFDM Systems," Proc. Thirty-Sixth Asilomar Conference on Signals, System and Computers, pp. 1857–1861, Nov. 2002.
[8] H. Kawai, K. Higuchi, N. Maeda, M. Sawahashi, T. Ito, Y. Kakura, A. Ushirokawa, and H. Seki, "Likelihood Function for QRM-MLD Suitable for Soft-Decision Turbo Decoding and Its Performance for OFCDM MIMO Multiplexing in Multipath Fading Channel," Proc. IEEE PIMRC, pp. 1142–1148, Sep. 2004.
[9] K. V. Vardhan, S. K. Mohammed, A. Chockalingam, and B. S. Rajan, "A Low-Complexity Detector for Large MIMO Systems and Multicarrier CDMA Systems," IEEE J. Sel. Areas Commun., vol. 26, no. 3, pp. 473–485, Apr. 2008.
[10] N. Srinidhi, S. K. Mohammed, A. Chockalingam, and B. S. Rajan, "Low-Complexity Near-ML Decoding of Large Non-Orthogonal STBCs using Reactive Tabu Search," Proc. IEEE ISIT, pp. 1993–1997, June/July 2009.
[11] P. Som, T. Datta, A. Chockalingam, and B. S. Rajan, "Improved Large-MIMO Detection Based on Damped Belief Propagation," Proc. IEEE ITW, Jan. 2010.
[12] C. Knievel, M. Noemm, and P. A. Hoeher, "Low-Complexity Receiver for Large-MIMO Space-Time Coded Systems," Proc. IEEE VTC-Fall, Sept. 2011.
[13] T. L. Narasimhan, A. Chockalingam, and B. S. Rajan, "Factor Graph Based Joint Detection/Decoding for LDPC Coded Large-MIMO Systems," Proc. IEEE VTC-Spring, May 2012.