Low-complexity Detection Algorithms Based on Matrix Partition for Massive MIMO
Haijian Wu, Jun Lin, Chuan Zhang† and Zhongfeng Wang
School of Electronic Science and Engineering, Nanjing University, China
National Mobile Communications Research Laboratory, Southeast University, China
E-mail: wuhaijian
[email protected],
[email protected],
[email protected],
[email protected] †
Abstract: Massive Multiple-Input Multiple-Output (MIMO) is one of the key technologies in fifth generation (5G) wireless communication for much higher throughput. However, current detection algorithms for massive MIMO suffer from large computational complexity. The Neumann series based approximate matrix inverse offers a good tradeoff between detection performance and computational complexity. In this paper, a matrix partition (MP) method is proposed that, compared to the traditional Neumann series based method, significantly reduces the number of required multiplications and additions while maintaining comparable or even better detection performance. The MP method builds the pre-conditioner matrix from diagonal blocks of the matrix to be inverted and applies the Neumann series both to the blocks and to the full matrix. Simulation results from a 128 × 16 massive MIMO system with 16-QAM modulation demonstrate that the proposed MP method can reduce the number of multiplications and additions by 68% and 70%, respectively.
I. INTRODUCTION

Massive Multiple-Input Multiple-Output (MIMO) is a promising technology to meet the demand for higher data rates, and it has been adopted in the 5G wireless communication system as one of the key techniques [1]. In a massive MIMO system, a small number of users communicate with a base station equipped with hundreds of antennas, so that multiple data streams can be transmitted simultaneously in the same frequency band, offering better spectral efficiency than traditional point-to-point wireless systems [2]–[4]. Therefore, it has drawn great interest from both academia and industry.

While massive MIMO promises these benefits, constructing a practical and efficient system faces the challenge of significantly increased computational complexity [5]. A number of signal detection algorithms have been proposed recently to reduce the computational complexity while achieving near-optimal performance. They can be divided into linear and non-linear signal detection.

Linear signal detection was proposed to trade off performance with computational complexity. Typical linear detection algorithms are minimum mean square error (MMSE) and zero-forcing (ZF), both of which include a full matrix inversion and a Gram-matrix [6] computation. This paper considers an uplink MIMO system with $N$ antennas at the base station (BS) and $M$ antennas at the transmit terminals. For an $M \times M$ matrix, exact inversion has a complexity of $O(M^3)$. As a result, the computational complexity is high when $M$ is very large. To avoid the full matrix inversion, the Neumann series expansion (NSE) has been employed to approximate the matrix inversion using the diagonal matrix [7]. The performance of NSE depends on the number of terms used and the matrix parameters. The computational complexity is $O(M^3)$ [8] when the number of selected terms of the Neumann series expansion is 3 or more. Several detection approaches that combine MMSE with iterative methods, such as the Jacobi algorithm and the Gauss-Seidel algorithm, have been developed to alleviate the computational complexity [5]. The computational complexity of these detection algorithms is $O(M^2)$. The Jacobi algorithm matches the Neumann series expansion in terms of performance with reduced complexity. Gauss-Seidel needs to compute a lower triangular matrix inversion and uses the most up-to-date estimates of the transmitted signals [8], which gives much better performance than Jacobi and the Neumann series expansion for the same number of iterations.

As one of the non-linear signal detection algorithms, maximum likelihood (ML) detection has optimal performance but suffers from exponentially growing complexity when the number of transmit antennas or the constellation size increases [9]. Another non-linear algorithm is belief propagation (BP), where iterations and updates of conditional probabilities are performed on the parent and child nodes. The BP algorithm converges in a finite number of iterations to the optimal solution if there is no loop in the Bayesian network [10], [11].

The properties of the Gram matrix greatly affect the performance of the detection methods described above. If the $N/M$ ratio is more than 5.83, the detection methods achieve better performance with quicker convergence [12]. The constraint that the Gram matrix be diagonally dominant was also introduced in [12], so the $N/M$ ratio must satisfy these conditions for the algorithms to work well.
In this paper, a matrix partition (MP) method is proposed for signal detection in massive MIMO systems. The proposed detection algorithm works well for massive MIMO systems with a relatively small $N/M$ ratio. The MP method builds the pre-conditioner matrix from diagonal blocks of the matrix to be inverted and applies the Neumann series both to the blocks and to the full matrix. Simulation results from a 128 × 16 massive MIMO system with 16-QAM modulation demonstrate that the proposed MP method can reduce the number of multiplications and additions by 68% and 70%, respectively.

The remainder of this paper is organized as follows. Section II presents the system model of the massive MIMO uplink and the approximate matrix inversion with Neumann series. Section III proposes an approximate matrix inversion based on matrix partition for the MIMO uplink system. Section IV shows the performance analysis of the two approximate matrix inversion methods used in a massive MIMO uplink system. The conclusion is drawn in Section V.

978-1-5386-2062-5/17/$31.00 ©2017 IEEE

Notations: $[\cdot]^T$, $[\cdot]^H$ and $[\cdot]^{-1}$ stand for the transpose, conjugate transpose and matrix inversion, respectively. $\Re\{\cdot\}$ and $\Im\{\cdot\}$ denote the real part and imaginary part of a complex number, respectively. $I_M$ represents an $M \times M$ identity matrix. $O(\cdot)$ refers to the order of magnitude of the computational complexity.

II. BACKGROUND

A. Massive MIMO Uplink Channel Model

As shown in Fig. 1, we consider a typical uplink massive MIMO system, where the base station (BS) employs $N$ antennas at the receiver side. The transmitter side has $M$ single-antenna users. The transmitted bits are modulated onto a set of constellation points, e.g., 16 Quadrature Amplitude Modulation (16-QAM) [13]. The transmitted signal vector is denoted as $x^c = [x_1^c, \cdots, x_M^c]^T$, where $E|x_i^c| = E_x$ and $x_i^c$ is a complex value for $i \in [1, M]$. $y^c = [y_1^c, \cdots, y_N^c]^T$ denotes the received complex-valued signal vector. $H^c$ denotes the $N \times M$ flat Rayleigh fading channel matrix whose entries are independently and identically distributed (i.i.d.) and follow the distribution $\mathcal{CN}(0, 1)$. Let $n^c = [n_1^c, \cdots, n_N^c]^T$ denote the additive white Gaussian noise (AWGN) whose entries are i.i.d. and follow the distribution $\mathcal{CN}(0, \sigma_n^2)$. The signal-to-noise ratio (SNR) is defined as $M E_x / \sigma_n^2$ [14]. The system model for a massive MIMO uplink is described by

$$y^c = H^c x^c + n^c. \tag{1}$$

Fig. 1. Massive MIMO uplink channel model (transmitter with users #1, ..., #M; receiver with antennas #1, ..., #N).

Because the complex-valued model is inefficient to compute, it is usually transformed into a real-valued model. In [5], the equivalent real-valued massive MIMO uplink system model is given by

$$y = H x + n, \tag{2}$$

where

$$H = \begin{bmatrix} \Re\{H^c\} & -\Im\{H^c\} \\ \Im\{H^c\} & \Re\{H^c\} \end{bmatrix}, \quad x = \begin{bmatrix} \Re\{x^c\} \\ \Im\{x^c\} \end{bmatrix}, \quad y = \begin{bmatrix} \Re\{y^c\} \\ \Im\{y^c\} \end{bmatrix} \quad \text{and} \quad n = \begin{bmatrix} \Re\{n^c\} \\ \Im\{n^c\} \end{bmatrix}.$$

B. MMSE Detection for Massive MIMO Uplink System

For the MMSE estimation [15], the estimated transmitted signal is

$$\hat{x}_{MMSE} = \left(H^H H + \frac{\sigma_n^2}{E_x} I_{2M}\right)^{-1} H^H y = \hat{W}^{-1} \hat{y}_{MMSE}, \tag{3}$$

where $\hat{y}_{MMSE} = H^H y$ is regarded as the output of the matched filter, $\hat{G} = H^H H$ is a $2M \times 2M$ Gram matrix and $\hat{W} = H^H H + \frac{\sigma_n^2}{E_x} I_{2M}$ is a $2M \times 2M$ symmetric positive definite matrix [16]. As shown in Eq. (3), the MMSE detection requires a matrix inversion. However, the exact matrix inversion becomes hard to implement in practical systems as the scale of the massive MIMO uplink system increases.

C. Neumann Series Approximation for Matrix Inversion

Since $\hat{W}$ is a symmetric positive definite matrix, the approximate value of $\hat{W}^{-1}$ can be obtained after several iterations with the Neumann series [7]. The Neumann series method reduces the computational complexity compared to exact matrix inversion. Hence, for cost-effective hardware implementations, the Neumann series method takes the place of the exact matrix inversion method. The Neumann series approximation relies on an invertible matrix $X$ that is close to $\hat{W}$ in the sense that

$$\lim_{i \to \infty} (I - X^{-1} \hat{W})^i = 0. \tag{4}$$

Hence $\hat{W}^{-1}$ can be rewritten with the following Neumann series [7]:

$$\hat{W}^{-1} = \sum_{i=0}^{\infty} \left(X^{-1}(X - \hat{W})\right)^i X^{-1}. \tag{5}$$
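The real-valued transformation of Eq. (2) and the exact MMSE estimate of Eq. (3) can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names `to_real_model` and `mmse_detect`, the noise-free test channel, and the values `noise_var=1e-9` and `Ex=10.0` (the average energy of 16-QAM symbols with levels ±1, ±3) are assumptions made for the example, not part of the paper.

```python
import numpy as np

def to_real_model(Hc, yc):
    """Stack real and imaginary parts as in Eq. (2)."""
    H = np.block([[Hc.real, -Hc.imag],
                  [Hc.imag,  Hc.real]])
    y = np.concatenate([yc.real, yc.imag])
    return H, y

def mmse_detect(H, y, noise_var, Ex):
    """Exact MMSE estimate of Eq. (3), using a linear solve instead of an explicit inverse."""
    W = H.T @ H + (noise_var / Ex) * np.eye(H.shape[1])  # regularized Gram matrix
    y_mf = H.T @ y                                       # matched-filter output
    return np.linalg.solve(W, y_mf)

# Hypothetical 128 x 16 uplink with 16-QAM symbols and no noise (assumption for the demo).
rng = np.random.default_rng(0)
N, M = 128, 16
Hc = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
levels = np.array([-3, -1, 1, 3])
xc = rng.choice(levels, M) + 1j * rng.choice(levels, M)
yc = Hc @ xc

H, y = to_real_model(Hc, yc)
x_true = np.concatenate([xc.real, xc.imag])
x_hat = mmse_detect(H, y, noise_var=1e-9, Ex=10.0)
print(np.max(np.abs(x_hat - x_true)))  # small residual caused only by the regularization term
```

Because $H$ is real, $H^H$ reduces to `H.T`. The explicit solve is the $O(M^3)$ step that the approximate methods in this paper aim to avoid.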
When we use the Neumann series to calculate the approximate inverse, the ratio $\alpha$ should be taken into account, where

$$\alpha = \frac{N}{M}. \tag{6}$$

If $\alpha > 5.83$, the probability of convergence of Eq. (4) is high [12]. This means that the approximate inverse of $\hat{W}$ will be close to the exact inverse of $\hat{W}$ for a relatively small number of terms. On the other hand, if $\alpha$ is not large enough, more terms are required to achieve a relatively accurate inverse of $\hat{W}$. In this case, the computational complexity becomes substantially larger and the corresponding hardware implementations are inefficient.

Since $\hat{W}$ is diagonally dominant, we can decompose $\hat{W}$ as $\hat{W} = D + E$, where $D$ is the main diagonal of $\hat{W}$ and $E$ is the off-diagonal part of $\hat{W}$. With $X = D$, Eq. (5) can be rewritten as [17]:

$$\hat{W}^{-1} = \sum_{i=0}^{\infty} (-D^{-1} E)^i D^{-1}. \tag{7}$$

Usually, we take the first $k$ terms of the Neumann series as the approximation of $\hat{W}^{-1}$ [18]:

$$\hat{W}_k^{-1} \approx \sum_{i=0}^{k-1} (-D^{-1} E)^i D^{-1}. \tag{8}$$

When $\alpha$ is not large enough, we need to increase the value of $k$ in order to get a relatively accurate inverse of $\hat{W}$. If $k = 2$, the approximate inversion of $\hat{W}$ requires $O((2M)^2)$ operations. It requires $O((2M)^3)$ operations when $k > 2$. The reciprocals of the diagonal elements of $\hat{W}$ can be computed with a lookup table [19] in real hardware implementations. In this paper, the computational complexity of finding the reciprocal of a value is not considered.
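The truncated series of Eq. (8) can be illustrated with the following sketch. The function name `neumann_inverse`, the random test channel, and the regularization value `0.1` (standing in for $\sigma_n^2 / E_x$) are assumptions for the example only.

```python
import numpy as np

def neumann_inverse(W, k):
    """k-term Neumann approximation of W^{-1} with X = D, as in Eq. (8)."""
    d_inv = 1.0 / np.diag(W)            # reciprocals of the diagonal (D^{-1} as a vector)
    E = W - np.diag(np.diag(W))         # off-diagonal part of W
    T = -d_inv[:, None] * E             # -D^{-1} E
    term = np.diag(d_inv)               # i = 0 term: D^{-1}
    approx = term.copy()
    for _ in range(1, k):
        term = T @ term                 # (-D^{-1} E)^i D^{-1}
        approx += term
    return approx

# Hypothetical 128 x 16 channel in the real-valued form of Eq. (2).
rng = np.random.default_rng(1)
N, M = 128, 16
Hc = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
H = np.block([[Hc.real, -Hc.imag], [Hc.imag, Hc.real]])
W = H.T @ H + 0.1 * np.eye(2 * M)       # 0.1 is an assumed value of sigma_n^2 / E_x

W_inv = np.linalg.inv(W)
errs = [np.linalg.norm(neumann_inverse(W, k) - W_inv) / np.linalg.norm(W_inv)
        for k in range(1, 7)]
for k, e in enumerate(errs, start=1):
    print(f"k={k}: relative error {e:.3e}")
```

Each term beyond the second costs one full matrix product, which is where the $O((2M)^3)$ complexity for $k > 2$ comes from.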
III. APPROXIMATE MATRIX INVERSION METHOD BASED ON MATRIX PARTITION

A. The Block Matrix Method

In order to guarantee the convergence of Eq. (5), $|\lambda_{max}|$ should be smaller than 1, where $\lambda_{max}$ is the eigenvalue of largest magnitude of the matrix $X^{-1}(X - \hat{W})$ [20]. Consequently, using an appropriate pre-conditioner matrix $X$ is an effective way to accelerate convergence. In [21], some extra elements from the matrix $\hat{W}$ were added to the matrix $X$, so that the accuracy of $\hat{W}^{-1}$ is maintained with fewer iterations.

In this paper, a matrix partition (MP) method is proposed to obtain a lower bit error rate (BER) while reducing the computational complexity compared to the regular Neumann series method. The computational complexity for different $k$ values is shown in Table I. For the MP method, the pre-conditioner matrix $X$ is constructed from block matrices in order to include more elements of $\hat{W}$ than the regular Neumann series method, where $X$ is just the diagonal of $\hat{W}$. When $Y = \hat{W} - X$ and the $k$-term Neumann series is adopted, Eq. (5) can be transformed into

$$\hat{W}^{-1} \approx \sum_{i=0}^{k-1} (-X^{-1} Y)^i X^{-1}. \tag{9}$$

Fig. 2. Different partitioned matrix methods: (a) 1 × 1 blocks; (b) 2 × 2 blocks; (c) 4 × 4 blocks.
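The convergence condition on $X^{-1}(X - \hat{W})$ and the block constructions of Fig. 2 can be explored numerically with a sketch like the one below. The helper names `block_diagonal_part` and `spectral_radius`, the random channel, and the regularization value `0.1` are assumptions for illustration; the exact solve inside `spectral_radius` is used only for analysis and is not part of the low-complexity method.

```python
import numpy as np

def block_diagonal_part(W, b):
    """Keep only the b x b diagonal blocks of W (the matrix X of Fig. 2)."""
    n = W.shape[0]
    X = np.zeros_like(W)
    for s in range(0, n, b):
        X[s:s + b, s:s + b] = W[s:s + b, s:s + b]
    return X

def spectral_radius(W, X):
    """Largest eigenvalue magnitude of X^{-1}(X - W); Eq. (5) converges if it is < 1."""
    T = np.linalg.solve(X, X - W)
    return np.max(np.abs(np.linalg.eigvals(T)))

# Hypothetical 128 x 16 channel in the real-valued form of Eq. (2).
rng = np.random.default_rng(2)
N, M = 128, 16
Hc = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
H = np.block([[Hc.real, -Hc.imag], [Hc.imag, Hc.real]])
W = H.T @ H + 0.1 * np.eye(2 * M)       # 0.1 is an assumed value of sigma_n^2 / E_x

for b in (1, 2, 4, M):                   # b = 1 is the regular Neumann choice X = D
    X = block_diagonal_part(W, b)
    print(f"block size {b:2d}: spectral radius {spectral_radius(W, X):.3f}")
```

A smaller spectral radius means fewer series terms are needed for a given accuracy, at the price of a more expensive $X^{-1}$.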
Suppose $\hat{W}$ is an 8 × 8 matrix. Fig. 2 shows three different ways to construct $X$, where the red parts are the nonzero values of the matrix $X$ and the white parts are the nonzero values of the matrix $Y$. Here, $X$ and $Y$ are both 8 × 8 matrices. The original Neumann series method takes the $X$ defined in Fig. 2(a). For the proposed MP method, $X$ consists of block matrices as shown in Figs. 2(b) and (c), where the size of the corresponding block matrix is 2 × 2 and 4 × 4, respectively. As shown in Eq. (9), the computation of $X^{-1}$ is performed by calculating the inverses of the block matrices in $X$ for the proposed MP method. In this paper, the inverse of a block matrix is obtained with the original Neumann series method. Once $X^{-1}$ is computed, our MP method calculates $\hat{W}^{-1}$ using Eq. (9).

In more detail, suppose $X$ takes the form shown in Fig. 2(c). The inverse of the matrix $\hat{W}$ with the presented MP method is then obtained as follows. Without loss of generality, the $k_1$-term Neumann series is used in this example. When $k_1 = 2$, Eq. (9) gives the approximate inverse of $\hat{W}$ as

$$\hat{W}^{-1} \approx X^{-1} - X^{-1} Y X^{-1}, \tag{10}$$

where

$$\hat{W} = \begin{bmatrix} D_1 & E_1 \\ E_2 & D_2 \end{bmatrix}, \quad X = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}, \quad Y = \begin{bmatrix} 0 & E_1 \\ E_2 & 0 \end{bmatrix}. \tag{11}$$

According to Eq. (10) and Eq. (11), the approximate inverse of $\hat{W}$ can be expressed as:

$$\hat{W}^{-1} \approx \begin{bmatrix} D_1^{-1} & 0 \\ 0 & D_2^{-1} \end{bmatrix} - \begin{bmatrix} D_1^{-1} & 0 \\ 0 & D_2^{-1} \end{bmatrix} \begin{bmatrix} 0 & E_1 \\ E_2 & 0 \end{bmatrix} \begin{bmatrix} D_1^{-1} & 0 \\ 0 & D_2^{-1} \end{bmatrix} = \begin{bmatrix} D_1^{-1} & -D_1^{-1} E_1 D_2^{-1} \\ -D_2^{-1} E_2 D_1^{-1} & D_2^{-1} \end{bmatrix}. \tag{12}$$
TABLE I
COMPUTATIONAL COMPLEXITY ANALYSIS

k      Multiplications             Additions
2      $2(2M)^2$                   $2M$
3      $(2M)^3 + 2(2M)^2$          $(2M)^3 + 2M$
4      $2(2M)^3 + 2(2M)^2$         $2(2M)^3 + 2M$
5      $3(2M)^3 + 2(2M)^2$         $3(2M)^3 + 2M$
6      $4(2M)^3 + 2(2M)^2$         $4(2M)^3 + 2M$
7      $5(2M)^3 + 2(2M)^2$         $5(2M)^3 + 2M$
In Eq. (12), the inverses $D_1^{-1}$ and $D_2^{-1}$ are expensive to obtain with an exact inverse method, since the sub-matrices $D_1$ and $D_2$ are both 4 × 4 matrices. Assuming $D_1^{-1}$ and $D_2^{-1}$ are known, the calculation of $\hat{W}^{-1}$ is denoted as First Level Iteration (FLI). Since $D_1$ and $D_2$ are both symmetric positive definite matrices, the Neumann series can also be used to compute the inverses of $D_1$ and $D_2$. In this work, we decompose $D_1$ into its main diagonal $D_{1D}$ and off-diagonal part $E_{1E}$ such that $D_1 = D_{1D} + E_{1E}$. Likewise, we decompose $D_2$ into its main diagonal $D_{2D}$ and off-diagonal part $E_{2E}$ such that $D_2 = D_{2D} + E_{2E}$. Using the $k_2$-term Neumann series, we can compute the approximate inverses of $D_1$ and $D_2$. Without loss of generality, let $k_2 = 2$; then Eq. (8) gives

$$\begin{aligned} D_1^{-1} &\approx D_{1D}^{-1} - D_{1D}^{-1} E_{1E} D_{1D}^{-1}, \\ D_2^{-1} &\approx D_{2D}^{-1} - D_{2D}^{-1} E_{2E} D_{2D}^{-1}. \end{aligned} \tag{13}$$
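The two-level procedure can be sketched as follows: an inner Neumann series approximates the inverse of each diagonal block, and an outer series in the form of Eq. (9) uses the resulting approximate $X^{-1}$. The names `neumann_sum` and `mp_inverse`, the split into two equal blocks, the random channel, and the value `0.1` for $\sigma_n^2 / E_x$ are assumptions for this example. The sketch keeps the trailing $X^{-1}$ factor of the $k$-term series, and it is not claimed to reproduce the authors' implementation.

```python
import numpy as np

def neumann_sum(X_inv, Y, k):
    """sum_{i=0}^{k-1} (-X_inv Y)^i X_inv, the k-term series of Eq. (9)."""
    T = -X_inv @ Y
    term = X_inv.copy()
    total = term.copy()
    for _ in range(1, k):
        term = T @ term
        total += term
    return total

def mp_inverse(W, k_outer, k_inner):
    """Two-level approximation of W^{-1} with two diagonal blocks, as in Fig. 2(c)."""
    n = W.shape[0]
    h = n // 2
    X_inv = np.zeros_like(W)
    for s in (0, h):
        blk = W[s:s + h, s:s + h]
        d_inv = np.diag(1.0 / np.diag(blk))         # inverse of the block's diagonal
        off = blk - np.diag(np.diag(blk))           # off-diagonal part of the block
        X_inv[s:s + h, s:s + h] = neumann_sum(d_inv, off, k_inner)  # inner series
    Y = W.copy()
    Y[:h, :h] = 0.0                                 # Y = W - X keeps only E1 and E2
    Y[h:, h:] = 0.0
    return neumann_sum(X_inv, Y, k_outer)           # outer series

# Hypothetical 128 x 16 channel in the real-valued form of Eq. (2).
rng = np.random.default_rng(3)
N, M = 128, 16
Hc = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
H = np.block([[Hc.real, -Hc.imag], [Hc.imag, Hc.real]])
W = H.T @ H + 0.1 * np.eye(2 * M)                   # 0.1 is an assumed sigma_n^2 / E_x

W_inv = np.linalg.inv(W)
d_inv = np.diag(1.0 / np.diag(W))
E = W - np.diag(np.diag(W))
for name, approx in [("Neumann, k = 6", neumann_sum(d_inv, E, 6)),
                     ("MP, outer 4 / inner 3", mp_inverse(W, 4, 3))]:
    print(name, np.linalg.norm(approx - W_inv) / np.linalg.norm(W_inv))
```

The block products in the outer series involve matrices of half the size, which is the source of the complexity saving analyzed next.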
The calculation of $D_1^{-1}$ and $D_2^{-1}$ is denoted as Second Level Iteration (SLI).

B. Complexity Analysis

For the proposed MP method, the calculation of $\hat{W}^{-1}$ is divided into two parts: FLI and SLI. Therefore, the overall computational complexity is the sum of that of FLI and SLI. The computational complexity of SLI is similar to that of the original Neumann series for different $k$-term. When the number of user antennas is $M$, the sub-matrices $D_1$, $D_2$, $E_1$ and $E_2$ are $M \times M$ matrices and none of them is a diagonal matrix. Therefore, the matrix multiplication in SLI has a complexity of $O(M^3)$. When the number of Neumann series iterations is $k_{FLI}$ and the size of matrix $\hat{W}$ is $2M \times 2M$, the number of multiplications is:

$$N_{FLI\_multi\_k} = \frac{k_{FLI}}{4}(2M)^3. \tag{14}$$

The number of additions is given by

$$N_{FLI\_add\_k} = \frac{k_{FLI}}{4}(2M)^3 - (2M)^2. \tag{15}$$

The number of multiplications and additions in SLI is similar to the previous case, where the number of Neumann series iterations is $k_{SLI}$ and the size of matrix $\hat{W}$ is $2M \times 2M$. They can be expressed as

$$\begin{aligned} N_{SLI\_multi\_k} &= \frac{(k_{SLI} - 2)}{4}(2M)^3 + (2M)^2, \\ N_{SLI\_add\_k} &= \frac{(k_{SLI} - 2)}{4}(2M)^3 + 2M. \end{aligned} \tag{16}$$

The overall computational complexity of calculating $\hat{W}^{-1}$ using the proposed method is given by

$$\begin{aligned} N_{multi\_k} &= \frac{k_{FLI}}{4}(2M)^3 + \frac{(k_{SLI} - 2)}{4}(2M)^3 + (2M)^2, \\ N_{add\_k} &= \frac{k_{FLI}}{4}(2M)^3 - (2M)^2 + \frac{(k_{SLI} - 2)}{4}(2M)^3 + 2M. \end{aligned} \tag{17}$$

Hence, the above equation can be applied to calculate the overall computational complexity of $\hat{W}^{-1}$ for different $k_{FLI}$ and $k_{SLI}$.

IV. PERFORMANCE ANALYSIS

In general, $N$ is much larger than $M$ [2]. If $\alpha$ is not large enough, we need a larger number of Neumann series iterations to obtain a relatively accurate inverse of $\hat{W}$. However, the extra Neumann series iterations increase the complexity of the inverse of $\hat{W}$. By contrast, with the proposed method, we can reduce the computational complexity of matrix inversion and get better detection performance at the same time.

Fig. 3. Performance of massive MIMO detection with M = 16 and N = 128 using Neumann Series. (Axes: BER versus SNR in dB; curves: Neumann series with k = 2 to 7 and MMSE.)

Fig. 3 shows the BER performance of a massive MIMO detection system with $M = 16$, $N = 128$ and 16-QAM modulation, using the Neumann series method with different $k$-term and the MMSE method. $k$ in Fig. 3 denotes the number of iterations. The BER performance improves as the number of Neumann series iterations increases.

Fig. 4. Performance of massive MIMO detection with M = 16 and N = 128 using Block Matrix. (Axes: BER versus SNR in dB.)

In Fig. 4, for the same MIMO detection system simulated in Fig. 3, the performance curves of the MMSE method and the proposed MP method with different $k_{FLI}$ and $k_{SLI}$ values are presented. It shows that the BER performance improves with increasing $k_{FLI}$ and $k_{SLI}$.

Fig. 5. Comparison between the Neumann Series Method and Block Matrix Method with M = 16, N = 128 and 16-QAM. (Axes: BER versus SNR in dB.)

The original $k$-term Neumann series method, the proposed MP method with different $k_{FLI}$ and $k_{SLI}$ values and the MMSE method are compared in Fig. 5. As shown in Fig. 5, when the BER equals $10^{-4}$, the MP method with $k_{FLI} = 4$ and $k_{SLI} = 3$ is 0.1 dB better than the 6-term Neumann series method and 0.36 dB better than the 5-term Neumann series method.

In order to compare the computational complexity of the proposed MP method with that of the original Neumann series method, two factors $\alpha_m$ and $\alpha_a$ are defined as

$$\alpha_m = \frac{N_{B\_m}}{N_{N\_m\_k}}, \qquad \alpha_a = \frac{N_{B\_a}}{N_{N\_a\_k}}. \tag{18}$$

$N_{N\_m\_k}$ and $N_{N\_a\_k}$ denote the number of multiplications and additions used in computing $\hat{W}^{-1}$ using the $k$-term Neumann series method, respectively. $N_{B\_m}$ and $N_{B\_a}$ represent the number of multiplications and additions consumed in computing $\hat{W}^{-1}$ using the MP method with $k_{FLI} = 4$ and $k_{SLI} = 3$, respectively. $\alpha_a$ and $\alpha_m$ with respect to $k$ are shown in Figs. 6 and 7, respectively.

Fig. 6. Addition complexity comparison. (Axes: complexity ratio $\alpha_a$ versus number of Neumann iterations.)

Fig. 7. Multiplication complexity comparison. (Axes: complexity ratio $\alpha_m$ versus number of Neumann iterations.)

As shown in Fig. 5, the original Neumann series method with 6 iterations has BER performance comparable to the MP method with $k_{FLI} = 4$ and $k_{SLI} = 3$. Hence, when similar performance is achieved, the number of multiplications and additions of the MP method is about 32% and 30% of that of the Neumann series method, respectively.

We also simulated the TNS algorithm in [21] with 5 and 6 terms and compared it with our proposed method. Fig. 8 shows that, with $k_{FLI}$ and $k_{SLI}$ set to 4 and 3 respectively in the massive MIMO uplink system, the BER performance of the proposed method is 0.13 dB better than the 5-term TNS and 0.04 dB worse than the 6-term TNS. However, our method greatly reduces the computational complexity. Compared to the 5-term TNS, the proposed algorithm saves 68.454% of the multiplications and 69.507% of the additions.
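The ratios of Eq. (18) can be checked directly from the operation counts of Table I and Eq. (17). The sketch below is only a worked calculation; the function names `neumann_ops` and `mp_ops` are hypothetical, and the closed form for the Neumann counts is read off the pattern of the rows in Table I.

```python
def neumann_ops(M, k):
    """(multiplications, additions) of the k-term Neumann series, k >= 2 (Table I)."""
    n = 2 * M
    return (k - 2) * n**3 + 2 * n**2, (k - 2) * n**3 + n

def mp_ops(M, k_fli, k_sli):
    """(multiplications, additions) of the MP method according to Eq. (17)."""
    n = 2 * M
    cube_quarter = n**3 // 4            # exact, since n = 2M is even
    mult = k_fli * cube_quarter + (k_sli - 2) * cube_quarter + n**2
    add = k_fli * cube_quarter - n**2 + (k_sli - 2) * cube_quarter + n
    return mult, add

M = 16
mp_mult, mp_add = mp_ops(M, k_fli=4, k_sli=3)
for k in range(4, 8):
    ns_mult, ns_add = neumann_ops(M, k)
    print(f"k={k}: alpha_m = {mp_mult / ns_mult:.3f}, alpha_a = {mp_add / ns_add:.3f}")
```

For $M = 16$ and $k = 6$ this gives $\alpha_m = 41984 / 133120 \approx 0.315$ and $\alpha_a = 39968 / 131104 \approx 0.305$, in line with the roughly 32% and 30% quoted above.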
Fig. 8. Comparison between TNS in [21] and Block Matrix Method with M = 16, N = 128 and 16-QAM. (Axes: BER versus SNR in dB.)
V. CONCLUSION

In this paper, we propose a low-complexity detection algorithm for linear detection in massive MIMO uplink systems. The proposed method combines matrix partition with the Neumann series method, which significantly reduces the computational complexity of matrix inversion with comparable or even better BER performance when $\alpha$ is not large enough. The simulation results show that the numbers of multiplications and additions of the proposed method with $k_{FLI} = 4$ and $k_{SLI} = 3$ are reduced by 68% and 70%, respectively, compared with the Neumann series method when similar BER performance is required.

ACKNOWLEDGEMENT

This work was supported in part by the National Nature Science Foundation of China under Grants No. 61604068 and 61774082; the Fundamental Research Funds for the Central Universities under Grant No. 021014380045 and 021014380065.

REFERENCES

[1] F. Rusek, D. Persson, B. K. Lau, E. G. Larsson, T. L. Marzetta, O. Edfors, and F. Tufvesson, “Scaling up MIMO: Opportunities and challenges with very large arrays,” IEEE Signal Processing Magazine, vol. 30, no. 1, pp. 40–60, Jan 2013.
[2] T. L. Marzetta, “Noncooperative cellular wireless with unlimited numbers of base station antennas,” IEEE Transactions on Wireless Communications, vol. 9, no. 11, pp. 3590–3600, November 2010.
[3] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Energy and spectral efficiency of very large multiuser MIMO systems,” IEEE Transactions on Communications, vol. 61, no. 4, pp. 1436–1449, April 2013.
[4] J. Hoydis, S. ten Brink, and M. Debbah, “Massive MIMO in the UL/DL of cellular networks: How many antennas do we need?” IEEE Journal on Selected Areas in Communications, vol. 31, no. 2, pp. 160–171, February 2013.
[5] Z. Zhang, J. Wu, X. Ma, Y. Dong, Y. Wang, S. Chen, and X. Dai, “Reviews of recent progress on low-complexity linear detection via iterative algorithms for massive MIMO systems,” in 2016 IEEE/CIC International Conference on Communications in China (ICCC Workshops), July 2016, pp. 1–6.
[6] R. Guo, X. Li, W. Fu, and Y. Hei, “Low-complexity signal detection based on relaxation iteration method in massive MIMO systems,” China Communications, vol. 12, no. Supplement, pp. 1–8, December 2015.
[7] M. Wu, B. Yin, A. Vosoughi, J. R. Cavallaro, C. Studer, and C. Dick, “Approximate matrix inversion for high-throughput data detection in the large-scale MIMO uplink,” Circuits and Systems (ISCAS), 2013 IEEE International Symposium on. IEEE, pp. 0271–4302, May 2013.
[8] Z. Wu, C. Zhang, Y. Xue, S. Xu, and X. You, “Efficient architecture for soft-output massive MIMO detection with Gauss-Seidel method,” in 2016 IEEE International Symposium on Circuits and Systems (ISCAS), May 2016, pp. 1886–1889.
[9] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, “Closest point search in lattices,” IEEE Transactions on Information Theory, vol. 48, no. 8, pp. 2201–2214, Aug 2002.
[10] T. Usami, T. Nishimura, T. Ohgane, and Y. Ogawa, “BP-based detection of spatially multiplexed 16-QAM signals in a fully massive MIMO system,” in 2016 International Conference on Computing, Networking and Communications (ICNC), Feb 2016, pp. 1–5.
[11] J. Yang, C. Zhang, X. Liang, S. Xu, and X. You, “Improved symbol-based belief propagation detection for large-scale MIMO,” in 2015 IEEE Workshop on Signal Processing Systems (SiPS), Oct 2015, pp. 1–6.
[12] D. Zhu, B. Li, and P. Liang, “On the matrix inversion approximation based on Neumann series in massive MIMO systems,” in 2015 IEEE International Conference on Communications (ICC), June 2015, pp. 1763–1769.
[13] D. Kong, X. G. Xia, and T. Jiang, “A differential QAM detection in uplink massive MIMO systems,” IEEE Transactions on Wireless Communications, vol. 15, no. 9, pp. 6371–6383, Sept 2016.
[14] B. Yin, M. Wu, J. R. Cavallaro, and C. Studer, “Conjugate gradient-based soft-output detection and precoding in massive MIMO systems,” in 2014 IEEE Global Communications Conference, Dec 2014, pp. 3696–3701.
[15] X. Gao, L. Dai, Y. Hu, Z. Wang, and Z. Wang, “Matrix inversion-less signal detection using SOR method for uplink large-scale MIMO systems,” in 2014 IEEE Global Communications Conference, Dec 2014, pp. 3291–3295.
[16] X. Liang, C. Zhang, S. Xu, and X. You, “Coefficient adjustment matrix inversion approach and architecture for massive MIMO systems,” in 2015 IEEE 11th International Conference on ASIC (ASICON), Nov 2015, pp. 1–4.
[17] B. Yin, M. Wu, G. Wang, C. Dick, J. Cavallaro, and C. Studer, “A 3.8Gb/s large-scale MIMO detector for 3GPP LTE-Advanced,” 2014, pp. 3879–3883.
[18] F. Wang, C. Zhang, J. Yang, X. Liang, X. You, and S. Xu, “Efficient matrix inversion architecture for linear detection in massive MIMO systems,” in 2015 IEEE International Conference on Digital Signal Processing (DSP), July 2015, pp. 248–252.
[19] C. Studer, S. Fateh, and D. Seethaler, “ASIC implementation of soft-input soft-output MIMO detection using MMSE parallel interference cancellation,” IEEE Journal of Solid-State Circuits, vol. 46, no. 7, pp. 1754–1765, July 2011.
[20] X. Liang, C. Zhang, S. Xu, and X. You, “Coefficient adjustment matrix inversion approach and architecture for massive MIMO systems,” in 2015 IEEE 11th International Conference on ASIC (ASICON), Nov 2015, pp. 1–4.
[21] H. Prabhu, O. Edfors, J. Rodrigues, L. Liu, and F. Rusek, “Hardware efficient approximative matrix inversion for linear pre-coding in massive MIMO,” in 2014 IEEE International Symposium on Circuits and Systems (ISCAS), June 2014, pp. 1700–1703.