
Hybrid Vector Perturbation Precoding: The Blessing of Approximate Message Passing

arXiv:1710.03791v1 [cs.IT] 10 Oct 2017

Shanxiang Lyu and Cong Ling, Member, IEEE

Abstract—Vector perturbation (VP) precoding is a promising technique for multiuser communication systems operating in the downlink. In this work, we introduce a hybrid framework to improve the performance of lattice reduction (LR) aided (LRA) VP. First, we perform a simple precoding using zero forcing (ZF) or successive interference cancellation (SIC) based on a reduced lattice basis. The signal space after LR-ZF or LR-SIC precoding can be shown to be bounded to a small range; together with the sufficient orthogonality of the lattice basis guaranteed by LR, this paves the way for the subsequent application of an approximate message passing (AMP) algorithm, which further boosts the performance of any suboptimal precoder. Our work shows that the AMP algorithm from compressed sensing can be beneficial for a lattice decoding problem whose signal constraint lies in Z and whose input lattice basis does not necessarily have i.i.d. Gaussian entries. Numerical results show that the developed hybrid scheme provides performance enhancement with a negligible increase in complexity.

Index Terms—Vector perturbation, lattice reduction, approximate message passing

S. Lyu and C. Ling are with the Department of Electrical and Electronic Engineering, Imperial College London, London SW7 2AZ, United Kingdom (e-mail: [email protected], [email protected]).

I. INTRODUCTION

THE broadband mobile internet of the next generation is expected to deliver high-volume data to a large number of users simultaneously. To meet this demand in the broadcast network, it is desirable to precode the transmit symbols according to the channel state information (CSI) with improved time efficiency while retaining reliability. It has been shown that plain channel inversion performs poorly at all signal-to-noise ratios (SNRs), and further regularization cannot improve the performance substantially. In [1], [2], the authors proposed a precoding scheme called vector perturbation (VP), based on Tomlinson-Harashima precoding, which modifies the transmitted data by modulo-lattice operations; the scheme has been shown to achieve near sum capacity of the system without requiring explicit dirty-paper techniques. The optimization target in a VP problem represents a closest vector problem (CVP) from a lattice perspective, which has been proved NP-complete by a reduction from the decision version of CVP [3]. Therefore, the sphere decoding technique [4] adopted in [1], [2] (referred to as sphere precoding) is computationally prohibitive for large-scale systems. This hardness especially looms in the VP precoding problem because there is no prior on the distance from a target to the lattice, and the lattice bases in VP are not Gaussian random, so that Hassibi's expected complexity analysis [5] no longer suits this setting. The complexity issue is indeed one of the three main challenges associated with VP; the other two issues concern its power scaling factor and its large signal space [6], [7].

To bypass the complexity issue of sphere precoding, much work has been done in recent years to explore low-complexity CVP algorithms in multiuser (MU) multiple-input multiple-output (MIMO) communications, e.g., cf. [6], [8], [9], [10], [11], [12], [13]. The spirit of these results is to address the CVP on a sub-lattice or to impose a constraint on the signal space of sphere precoding. E.g., in [11], the authors proposed to instead solve a CVP over a selected sub-basis of smaller dimension, so that the associated complexity of VP depends on the size of the new basis. As for the sparse vector perturbation technique in [13], it also belongs to the class of selective vector perturbation, where only two vectors are selected. The reduction of the target vector in [13] is then applied to all basis vectors sequentially, which resembles a special case of the sequential lattice reduction [14]. There is however no theoretical performance guarantee for these simplified methods, so we have to resort to a lattice reduction (LR) aided (LRA) precoding scheme [10], [15], [16], which has been shown to be diversity-achieving [16]. In addition to their theoretical guarantees, LRA methods particularly suit slow fading channels, where the lattice basis is fixed during a large number of time slots and only the CVP targets are changing. We investigate VP by using LRA methods in this work.

LR has become quite popular in both MIMO precoding and decoding, especially after the pioneering work of Lenstra–Lenstra–Lovász (LLL) [17]. In recent years, in addition to the polynomial LLL algorithm, more researchers are showing interest in strong lattice reduction algorithms such as Minkowski's reduction [18], [19], Korkine-Zolotarev's (KZ) reduction [20] and its boosted version [21]. The performance of LRA precoding is not well understood except for [10], [16], so our primary motivation is to investigate how far LRA methods can go, especially with the blessing of algorithms from compressed sensing.

We propose to use a message passing algorithm to explore the vicinity of sub-optimal solutions under the LRA framework. The approximate message passing (AMP) algorithm was initially proposed by Donoho, Montanari and Maleki in [22], [23], [24] to solve the least-absolute shrinkage and selection operator (Lasso) problem in compressed sensing, and it has much lower complexity than previous benchmark algorithms. Researchers have been adopting message passing algorithms to solve problems in MIMO detection [25], [26], [27] with small constellation sizes, where the assumed Rayleigh fading channel allows modeling the input lattice basis with i.i.d. Gaussian entries. It is noteworthy that directly applying AMP


in MIMO detection problems cannot be diversity-achieving, because a general discrete prior renders the AMP threshold function non-Lipschitz-continuous at high signal-to-noise ratio (SNR), so channel coding is often required (e.g., cf. [25]). If we want to embrace the low-complexity advantage of AMP, several practical issues must be overcome: i) the lattice basis in VP is not Gaussian random, nor is its dual, while [28] shows the entries have to be at least sub-Gaussian, and the generalized AMP (GAMP) [29], [30] only shows convergence of the algorithm with the aid of damping; ii) the problem size may not be infinitely large, and we should make AMP feasible in the non-asymptotic region (say, the base station is equipped with 20 antennas to serve 20 users); iii) the constellation in AMP cannot be the integers Z. Fortunately, AMP in conjunction with a reduced lattice basis can alleviate all these concerns.

The contributions of this paper are twofold:

1) After showing that boosted LLL/KZ (b-LLL/b-KZ) reduced bases are good for AMP, we analyze the energy efficiency of LRA precoding with zero-forcing (ZF) or successive-interference-cancellation (SIC) precoding. b-LLL/b-KZ suit compressed sensing scenarios because they yield bases with small coherence parameters, and an orthogonality metric in lattice theory indeed reflects the same goodness. The proved bound on LR-ZF/SIC not only shows that a sub-optimal estimator has a power scaling factor not far from that of sphere precoding, but also reveals that we can subtract the LR-ZF/SIC estimate to arrive at another estimation problem with a bounded constellation size. Since the bound on the constellation size is derived from a worst-case analysis, we also empirically show that a small constellation size suffices for our new problem.

2) For the first time, the AMP algorithm is successfully deployed to address a lattice decoding problem with an arbitrary input basis and an integer prior Z. A reduced lattice basis may still not satisfy the basis assumption of AMP, so we derive a new algorithm based on the expositions of Montanari [24] and Maleki [31]. This derivation can be associated with a state evolution equation, where the impacts of lattice reduction and parameter selection are revealed explicitly. We propose to impose a ternary prior for AMP, so that the threshold functions have closed forms and the whole algorithm has relatively low complexity. This design helps to explore all the 3^n Voronoi cells adjacent to an LR-SIC/ZF one. Numerical results show that we gain a few dB after concatenating AMP to the previous LRA-ZF/SIC.

The rest of this paper is organized as follows. We review some basic concepts about lattices and VP in Sec. II. The hybrid scheme is explained in Sec. III, which includes demonstrations of why we arrive at another problem with a finite constellation size. Sec. IV presents our AMP algorithm. Lastly, we present simulation results and conclusions.

Notation: Matrices and column vectors are denoted by uppercase and lowercase boldface letters. ⌊·⌉ denotes rounding, |·| denotes the absolute value, ||·|| denotes the Euclidean norm, and † stands for the pseudoinverse. span(S) denotes the vector space spanned by S. π_S(x) and π⊥_S(x) denote the projections of x onto span(S) and onto the orthogonal complement of span(S), respectively. ∝ stands for equality up to a normalization constant. [n] denotes {1, ..., n}, and ⟨x⟩ = Σ_{j=1}^n x_j / n. In the message passing algorithms, we take {a, b} and {i, j} to index the rows and columns of H, respectively. We use the standard asymptotic notation p(x) = O(q(x)) when lim sup_{x→∞} |p(x)/q(x)| < ∞.

II. PRELIMINARIES

A. Lattices

An n-dimensional lattice is a discrete additive subgroup of R^n. A Z-lattice with basis H = [h_1, ..., h_n] ∈ R^{m×n} can be represented by

L(H) = { v | v = Σ_{i∈[n]} c_i h_i, c_i ∈ Z }.

The ith successive minimum of L(H) is the smallest real number r such that L(H) contains i linearly independent vectors of length at most r:

λ_i(H) = inf{ r | dim(span(L(H) ∩ B(0, r))) ≥ i },

in which B(t, r) denotes a ball centered at t with radius r. It is necessary to distinguish whether a lattice basis is good or not. Good means all the lattice vectors are short and nearly orthogonal, and this property is measured by the orthogonality defect (OD):

ξ(H) = ∏_{i=1}^n ||h_i|| / √(det(H^⊤H)).  (1)

We have ξ(H) ≥ 1 due to Hadamard's inequality. Lattice reduction is the process of transforming a bad lattice basis into a good one. Depending on what type of goodness we pursue, and how much complexity we can afford, there are many well-developed reduction algorithms. Here we review the polynomial-time LLL [17] reduction and the exponential-time KZ [20] reduction, because most reduction algorithms can be interpreted as variants of these two. We present the definitions of LLL/KZ reduction; their algorithmic routines can be found in [32]. Let R be the R matrix of a QR decomposition of H, with elements r_{i,j}, and let δ ∈ (1/4, 1] be a Lovász constant.
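For concreteness, the OD in (1) is straightforward to evaluate numerically; the following is a minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def orthogonality_defect(H):
    """Orthogonality defect xi(H) of Eq. (1); xi >= 1, with equality
    iff the basis vectors are mutually orthogonal."""
    col_norms = np.linalg.norm(H, axis=0)          # ||h_i|| for each column
    gram_det = np.linalg.det(H.T @ H)              # det(H^T H)
    return np.prod(col_norms) / np.sqrt(gram_det)
```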

Definition 1 ([17]). A basis H is called LLL reduced if it satisfies the size reduction conditions |r_{i,j}/r_{i,i}| ≤ 1/2 for 1 ≤ i ≤ n, j > i, and the Lovász conditions δ r²_{i,i} ≤ r²_{i,i+1} + r²_{i+1,i+1} for 1 ≤ i ≤ n − 1.

Define β = 1/√(δ − 1/4) ∈ (2/√3, ∞). If H is LLL reduced, it satisfies [17]

ξ(H) ≤ β^{n(n−1)/2}.  (2)
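To make the definition concrete, a minimal textbook LLL routine on the columns of H is sketched below. This is the plain LLL of Definition 1, not the boosted variant of [21]; recomputing the QR factorization at each step keeps the sketch short at the cost of efficiency, Python's round stands in for ⌊·⌉, and a full-column-rank basis is assumed:

```python
import numpy as np

def lll_reduce(H, delta=0.75):
    """Minimal textbook LLL on the columns of H: size reduction
    plus Lovasz-condition swaps (Definition 1)."""
    B = H.astype(float).copy()
    n = B.shape[1]
    def r_matrix(B):
        # upper-triangular R of a QR decomposition of B
        return np.linalg.qr(B)[1]
    k = 1
    while k < n:
        R = r_matrix(B)
        # size-reduce column k against columns k-1, ..., 0
        for j in range(k - 1, -1, -1):
            mu = round(R[j, k] / R[j, j])
            if mu != 0:
                B[:, k] -= mu * B[:, j]
                R = r_matrix(B)
        # Lovasz condition: delta*r_{k-1,k-1}^2 <= r_{k-1,k}^2 + r_{k,k}^2
        if delta * R[k - 1, k - 1] ** 2 > R[k - 1, k] ** 2 + R[k, k] ** 2:
            B[:, [k - 1, k]] = B[:, [k, k - 1]]   # condition fails: swap
            k = max(k - 1, 1)
        else:
            k += 1
    return B
```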

Definition 2 ([33]). A basis H is called KZ reduced if it satisfies the size reduction conditions, and the projection conditions that π⊥_{[h_1,...,h_{i−1}]}(h_i) is the shortest vector of the projected lattice π⊥_{[h_1,...,h_{i−1}]}([h_i, ..., h_n]) for 1 ≤ i ≤ n.

If H is KZ reduced, it satisfies [33]

ξ(H) ≤ (2n/3)^{n/2} ∏_{i=1}^n (√(i+3)/2).  (3)

It has been shown in [21] that the boosted version of LLL/KZ can produce shorter and more orthogonal basis vectors.


Definition 3 ([21]). A basis H is called boosted LLL (b-LLL) reduced if it satisfies the diagonal reduction conditions

δ r²_{i,i} ≤ (r_{i,i+1} − ⌊r_{i,i+1}/r_{i,i}⌉ r_{i,i})² + r²_{i+1,i+1}

for 1 ≤ i ≤ n − 1, and all h_i for 2 ≤ i ≤ n are reduced by an approximate CVP oracle with list size L along with a rejection operation.

Although the definition of b-LLL ensures that it performs no worse than LLL, only the same bound on the OD has been proved: ξ(H) ≤ β^{n(n−1)/2} [21].

Definition 4 ([21]). A basis H is called boosted KZ (b-KZ) reduced if it satisfies the same projection conditions as KZ, and the length reduction conditions

||h_i|| ≤ ||h_i − Q_{L([h_1,...,h_{i−1}])}(π_{[h_1,...,h_{i−1}]}(h_i))||

for 2 ≤ i ≤ n, where Q(·) is a lattice quantizer.

If H is b-KZ reduced, it satisfies

ξ(H) ≤ (√2)^{n−1} (2n/3)^{n/2} ∏_{i=1}^n (√(i+3)/2).  (4)

B. Vector Perturbation and Optimization

Vector perturbation is a precoding technique that aims to minimize the transmitted power associated with the transmission of a certain data vector [1], [2]. Assume the MIMO system is equipped with m transmit antennas and n individual users, and each user has only one receive antenna. The observed signals l at users 1 to n can be collectively expressed as

l = Bt + w,  (5)

where B ∈ R^{n×m} denotes a channel matrix whose entries admit N(0, 1), t ∈ R^m is the transmitted signal, and w ∼ N(0, σ_w² I_n) is additive Gaussian noise. With perfect channel knowledge at the transmitter's side, the transmitted signal t is designed as a perturbed version of the channel inversion precoding B†s:

t = B†(s − Ax),  (6)

where x ∈ Z^n is an integer vector to be optimized and s ∈ A^n is the symbol vector. We set A = {0, ..., A−1}, because any quadrature amplitude modulation (QAM) constellation can be transformed to this format after adjusting (6), which means A has an equivalent QAM size of A². Assume the transmitted signal has unit power, and E_t ≜ ||t|| is a normalization factor. Then the received data at the users can be represented as

l = (s − Ax)/E_t + w.  (7)

Let l′ = E_t l and w′ = E_t w; since Ax mod A = 0, the receive equation can be transformed to

⌊l′⌉ mod A = ⌊s + w′⌉ mod A.  (8)

From (8), we can see that if |w′_i| < 1/2 ∀i, where w′ ∼ N(0, σ_w² E_t² I_n), then s can be faithfully recovered. To decrease the decoding error probability, which is dominated by E_t, we have to address the following optimization problem at the transmitter:

x̂ = arg min_{x∈Z^n} ||B†(s − Ax)||².  (9)

Define y = B†s ∈ R^m and H = AB† ∈ R^{m×n}; then (9) represents a closest vector problem (CVP) on the lattice L(H):

x̂ = arg min_{x∈Z^n} ||y − Hx||².  (10)

This CVP is different from the CVP in MIMO detection [34] because the distance distribution from y to the lattice L(H) is not known, the lattice basis does not generally admit a Gaussian distribution, and the optimization domain of x is Z^n rather than a finite constellation.

III. THE HYBRID SCHEME

Our hybrid scheme to solve the CVP in (10) is described as follows; the rationale is illustrated in Fig. 1.

Fig. 1. Exploring the vicinity of a good candidate x_zf ∈ R³, whose ZF parallelepiped P(H) is the cyan cube. After updating the target vector y ← y − Hx_zf, optimizing min_{x∈{−1,0,1}³} ||y − Hx|| enables locating all the blue lattice points inside the white cubes (some cubes are not plotted to avoid shading).

1) Apply lattice reduction to a not necessarily good input basis of L(H) to get H ← HU, U ∈ GL_n(Z), and use this new basis to obtain a sub-optimal candidate x̂, e.g., x̂ = x_zf = ⌊H†y⌉.
2) Let y ← y − Hx̂ and define a finite constraint B^n.
3) Use our AMP algorithm to solve

x_amp = arg min_{x∈B^n} ||y − Hx||².  (11)

4) Return x̂ ← x̂ + x_amp. (A minimal code sketch of these four steps follows the list of questions below.)

In order to show that the hybrid scheme is valid, we try to answer the following three questions in this paper:

• To make the reduced basis good for AMP, which lattice reduction algorithm should we adopt? Answer: we should use b-LLL/b-KZ. These algorithms excel in the "short and orthogonal" metrics; see Appx. A for more details.
• Is there any theoretical/practical guarantee for transforming x ∈ Z^n to x ∈ B^n? Answer: see Secs. III-A and III-B.
• The AMP algorithms in [22], [23], [24] assume at least that the entries of H are sub-Gaussian with variance O(1/n). Can we tune an AMP algorithm that is suitable for problem (11), preferably with simple routines and closed-form expressions? Answer: see Sec. IV.
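The following Python sketch mirrors steps 1)–4) under the assumption that the basis has already been reduced; amp_solve is any solver for (11) (the AMP routine of Sec. IV, or, for very small n, the brute-force stand-in included below for testing). All function names are ours:

```python
import numpy as np
from itertools import product

def hybrid_vp(H_red, y, amp_solve):
    """Steps 1)-4) on an already LR-reduced basis H_red."""
    x_zf = np.round(np.linalg.pinv(H_red) @ y)   # 1) LR-aided ZF candidate
    y_res = y - H_red @ x_zf                     # 2) shift the target
    x_amp = amp_solve(H_red, y_res)              # 3) refine over B^n
    return x_zf + x_amp                          # 4) combine

def brute_ternary(H, y):
    """Exhaustive search over {-1,0,1}^n; 3^n cost, testing only."""
    cands = np.array(list(product([-1, 0, 1], repeat=H.shape[1])), dtype=float)
    errs = np.linalg.norm(y[None, :] - cands @ H.T, axis=1)
    return cands[np.argmin(errs)]
```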


A. The bound of B^n

In the application to precoding, we show in this section that the estimation range B^n is bounded after LRA precoding. We first analyze the energy efficiency η_n¹ of b-LLL/b-KZ aided ZF/SIC, and then address the bound for B^n based on η_n.

Definition 5. The energy efficiency of an algorithm providing x̂ is the smallest η_n in the constraint

||y − Hx̂|| ≤ η_n ||y − Hx^cvp||,  (12)

where x^cvp = arg min_{x∈Z^n} ||y − Hx||, and we say this algorithm solves η_n-CVP.

The practical implication of η_n is to describe how far a sub-optimal perturbation is from an optimal one.

¹In [10], η_n is referred to as the proximity factor in the CVP context. To avoid confusion with the proximity factor in [34], we simply call it "energy efficiency".
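Definition 5 can be probed empirically on small instances: the sketch below brute-forces x^cvp over a box around the ZF point (assuming the optimum lies within `radius` of that point, which held in our small-scale experiments) and returns the realized ratio in (12). Names are ours:

```python
import numpy as np
from itertools import product

def energy_efficiency_sample(H, y, x_hat, radius=2):
    """Realized ratio ||y - H x_hat|| / ||y - H x_cvp|| of Def. 5.
    Brute-force cost is (2*radius+1)^n, so use very small n only."""
    n = H.shape[1]
    center = np.round(np.linalg.pinv(H) @ y)
    best = np.inf
    for d in product(range(-radius, radius + 1), repeat=n):
        x = center + np.array(d, dtype=float)
        best = min(best, np.linalg.norm(y - H @ x))
    return np.linalg.norm(y - H @ x_hat) / best
```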

Theorem 1. For the serial SIC algorithm², if the lattice basis is reduced by b-LLL, then

η_n = β^n / √(β² − 1),  (13)

where β ∈ (2/√3, ∞); and if the basis is reduced by b-KZ, then

η_n = ( 1 + (8n/9)(n−1)^{1+ln(n−1)/2} )^{1/2}.  (14)

Proof: Since the lower bounds on r²_{1,1}, ..., r²_{n,n} for b-LLL/b-KZ are no worse than those for LLL/KZ, we can use the η_n of classic LLL/KZ when they exist. Eq. (13) is thus adapted from the LLL result in [10, Lem. 1]. Since no result about the η_n of KZ is known, we prove a sharp bound for b-KZ in Appx. B, where the skill involved is essentially due to [35].

Theorem 2. For the parallel ZF algorithm, if the lattice basis is reduced by b-LLL, then

η_n = 2n ∏_{j=1}^n β^{j−1} + 1;  (15)

and if the basis is reduced by b-KZ, then

η_n = 2n ∏_{j=1}^n j^{2+ln(j)/2} + 1.  (16)

Proof: See Appx. C.

Remark 1. Unfortunately, (15) is no better than that of LLL in [10, Lem. 1]. The hardness in the analysis lies in incorporating the effect of the length reduction of b-LLL/b-KZ, while Thm. 2 only employs their projection conditions or diagonal reduction conditions. Since our empirical survey strongly suggests using b-LLL/b-KZ for the ZF estimator (see, e.g., Figs. 6 and 9), we need Thm. 2 to claim their bounds on η_n.

η_n is related to B^n in the following way. B denotes the symbol bound of x̂_i − x^cvp_i. For a reduced basis, we have the following relations: ∀i, ||h_i|| ≤ ω_i λ_i(H) and |r_{i,i}| ≥ λ_1(H)/ϖ_i, where the values of ω_i and ϖ_i can be found in [21]. ||y − Hx^cvp|| is bounded by the covering radius ρ(H) of L(H), so that from the triangle inequality,

||H(x̂ − x^cvp)|| ≤ ||y − Hx̂|| + ||y − Hx^cvp|| ≤ (η_n + 1) ρ(H).

With a unitary transform, we have ||H(x̂ − x^cvp)|| = ||R(x̂ − x^cvp)||. It is then reminiscent of evaluating an inequality from sphere decoding: ||R(x̂ − x^cvp)|| ≤ (η_n + 1) ρ(H). In the nth layer, one has

|x̂_n − x^cvp_n| ≤ (η_n + 1) ρ(H)/|r_{n,n}| ≤ (η_n + 1) ρ(H) ϖ_n/λ_1(H).

Similarly, in the (n−1)th layer,

|x̂_{n−1} − x^cvp_{n−1}| ≤ ||R_{:,1:n−1}(x̂_{1:n−1} − x^cvp_{1:n−1})|| / |r_{n−1,n−1}|
≤ ((η_n + 1) ρ(H) + ω_n λ_n(H) |x̂_n − x^cvp_n|) ϖ_{n−1}/λ_1(H)
≤ (η_n + 1) ϖ_{n−1} ρ(H)/λ_1(H) + ω_n (η_n + 1) ϖ_n ϖ_{n−1} λ_n(H) ρ(H)/λ_1²(H).

By induction, we can obtain the upper bounds of |x̂_{n−2} − x^cvp_{n−2}|, ..., |x̂_1 − x^cvp_1|. The concrete values of these bounds are easily evaluated by plugging in the values of η_i, ω_i and ϖ_i for the chosen LR-aided ZF/SIC algorithm. The theoretical bound of B^n represents a worst-case scenario. Although we have proved the existence of these upper bounds, it is not necessary to evaluate their values because, in practice, LR-aided ZF/SIC are quite close to the optimal one.

²The readers may consult [34] if not familiar with SIC routines.

B. Empirical B^n

In Fig. 2, we plot the maximal error max_i |x^zf_i − x^cvp_i| for the ZF estimator under 10^4 Monte Carlo runs. Four groups of simulations are tested with system size m = n = 8 or m = n = 12, and the constellation size set as A = 8 or A = 32. Among the four histograms in Fig. 2, we can see that max_i |x^zf_i − x^cvp_i| = 1 is the worst-case behavior, and the probability of correct decoding decreases from around 30% to 10% when the system size n increases from 8 to 12. With similar settings, we have also plotted histograms for the SIC estimator in Fig. 3. The probability of correct decoding with n = 8 is about 60%, and it slightly decreases to 50% when n = 12. The maximal symbol errors are most likely to occur with max_i |x^sic_i − x^cvp_i| = 1, and there exists a small probability that max_i |x^sic_i − x^cvp_i| = 2 when n = 12. The change of constellation size A has almost no impact on these histograms.

C. Are Things Ready for AMP?

Regarding the constellation of x, the previous discussions have demonstrated that the error of the ZF/SIC estimator is bounded by a function of the system dimension n and some inherent lattice metrics. This means we are not facing an infinite lattice decoding problem with the Z constellation in Eq. (11), whence the application of AMP becomes possible.

Fig. 2. The error histogram of ZF. The x-axis is max_i |x^zf_i − x^cvp_i|.

Fig. 3. The error histogram of SIC. The x-axis is max_i |x^sic_i − x^cvp_i|.

Moreover, the bound of B^n can be made very small when designing our AMP algorithm.

Regarding the channel matrix H, it is short and nearly orthogonal after lattice reduction. If the basis satisfies a sub-Gaussian assumption, then one can adopt the well-developed AMP [22], [23], [24] or GAMP [29], [30] algorithms to solve our problem in Eq. (11). Indeed, the rigorous proofs [28], [36] showing that the AMP algorithm can track the symbol-wise estimation errors rely on this foundation. We assume that a reduced basis can approximately take the blessing of this sub-Gaussian assumption. Rigorously proving this equivalence seems technically complicated, but our simulation results confirm the plausibility of modeling reduced bases as sub-Gaussian. To improve the accuracy of this approximation, we slightly modify the AMP algorithm to make it work with column-wise i.i.d. basis entries. To be concise, even in the cases where the successive minima constitute a basis, the entries of the channel matrix H cannot be uniformly normalized to O(1/n). This motivates us to adjust the classic AMP algorithm in [22], [23], [24], hoping to reach the simplest routines.

IV. AMP ALGORITHM FOR EQ. (11)

As for the distribution of the noise w_amp = y − Hx, it is not known a priori. We can equip w_amp with a Gaussian distribution whose radius is of the order O(λ_1(H)). To be concise, let p(w_amp) ∼ N(0, σ² I_m), σ² = O(λ_1(H)). This is crucial for obtaining a non-informative likelihood function of x, i.e., p(x) ∼ N(H^{−1}y, σ²(H^⊤H)^{−1}).

By combining the non-informative likelihood function with the signal prior p_X(x_i), we obtain a maximum-a-posteriori (MAP) function for Bayesian estimation:

p(x|y, H) ∝ ∏_{a∈[m]} p_a(x, y_a) ∏_{i∈[n]} p_X(x_i),  (17)

where p_a(x, y_a) = exp(−(1/(2σ²))(y_a − H_a x)²), and the prior p_X(x_i) is to be designed in subsection IV-E. The MAP function p(x|y, H) is not discrete, so the measure events are extended from a power set (e.g., message passing decoding of LDPC codes [37]) to a field R^n in the Lebesgue measure space. The simplified belief propagation (BP) [38] in IV-A is folklore and can be found in the pioneering literature [22], [23], [24], [31]; it is nevertheless included to help understand the derivations in the subsequent subsections. After deriving our AMP algorithm, we will present the threshold functions of certain priors and characterize the symbol-wise estimation errors in Thm. 3.

A. Simplified BP

In the BP algorithm, there are m factor nodes and n variable nodes, indexed by {a, b} and {i, j} respectively. The message from variable i to factor a is given by

m^{t+1}_{i→a}(x_i) = p_X(x_i) ∏_{b∈[m]\a} m^t_{b→i}(x_i),  (18)

where the message from a to i is

m^t_{a→i}(x_i) = ∫_{x\x_i} p_a(x, y_a) ∏_{j∈[n]\i} m^t_{j→a}(x_j) dx.  (19)

These messages are impractical to evaluate in the Lebesgue measure space, and are thus often simplified by various techniques. We attempt to remove the complexity from an expectation propagation [39] perspective. Suppose the message in Eq. (19) is estimated by a Gaussian function with mean α^t_{a→i}/β^t_{a→i} and variance 1/β^t_{a→i}; then

m^t_{a→i}(x_i) = N(H_{ai} x_i; α^t_{a→i}/β^t_{a→i}, 1/β^t_{a→i}).  (20)

By substituting Eq. (20) into Eq. (18), we have

m^{t+1}_{i→a}(x_i) ∝ p_X(x_i) exp( (Σ_{b∈[m]\a} H_{bi} α^t_{b→i}) x_i − (1/2)(Σ_{b∈[m]\a} H²_{bi} β^t_{b→i}) x_i² + O(n H³_{ai} x_i³) )
∝ p_X(x_i) N(x_i; u^t_{i→a}, v^t_{i→a}),  (21)

where

u^t_{i→a} = (Σ_{b∈[m]\a} H_{bi} α^t_{b→i}) / (Σ_{b∈[m]\a} H²_{bi} β^t_{b→i}),  (22)

v^t_{i→a} = 1 / (Σ_{b∈[m]\a} H²_{bi} β^t_{b→i}).  (23)

In the other direction, we work out the messages m^{t+1}_{i→a}(x_i) with Gaussian functions by matching their first- and second-order moments through the following constraints:

m^{t+1}_{i→a}(x_i) = N(x_i; η(u^t_{i→a}, v^t_{i→a}), κ(u^t_{i→a}, v^t_{i→a})),  (24)

η(u^t_{i→a}, v^t_{i→a}) = ∫_x x p_X(x) N(x; u^t_{i→a}, v^t_{i→a}) dx,  (25)

κ(u^t_{i→a}, v^t_{i→a}) = ∫_x x² p_X(x) N(x; u^t_{i→a}, v^t_{i→a}) dx − η²(u^t_{i→a}, v^t_{i→a}),  (26)

where η(·,·) and κ(·,·) are referred to as threshold functions. From Eq. (24), inferring x^{t+1}_{i→a} and its variance ς^{t+1}_{i→a} from m^{t+1}_{i→a}(x_i) by using the MAP principle yields

x^{t+1}_{i→a} = η(u^t_{i→a}, v^t_{i→a}),  (27)

ς^{t+1}_{i→a} = κ(u^t_{i→a}, v^t_{i→a}).  (28)

By plugging the approximation of Eq. (24) into Eq. (19), which becomes a multidimensional Gaussian expectation E(p_a(x, y_a)) with respect to the probability measure ∏_{j∈[n]\i} m^t_{j→a}(x_j), the integration over Gaussian functions becomes

m^t_{a→i}(x_i) ∝ N(H_{ai} x_i; y_a − Σ_{j∈[n]\i} H_{aj} x^{t−1}_{j→a}, σ² + Σ_{j∈[n]\i} |H_{aj}|² ς^{t−1}_{j→a}).  (29)

Comparing Eq. (29) with the previously defined mean α^t_{a→i}/β^t_{a→i} and variance 1/β^t_{a→i}, we have

α^t_{a→i} = (y_a − Σ_{j∈[n]\i} H_{aj} x^{t−1}_{j→a}) / (σ² + Σ_{j∈[n]\i} |H_{aj}|² ς^{t−1}_{j→a}),  (30)

β^t_{a→i} = 1/(σ² + Σ_{j∈[n]\i} |H_{aj}|² ς^{t−1}_{j→a}).  (31)

Thus far, Eqs. (22), (23), (27), (28), (30), (31) define a simplified version of BP, where the tracking of 2mn functions in Eqs. (18) and (19) has been replaced by the tracking of 6mn scalars.

Remark 2. Our derivation equips m^t_{a→i}(x_i) with a density function that can be fully described by its first and second moments; one then obtains the moment equations when passing m^t_{j→a}(x_j) back. In [31, Lem. 5.3.1], Maleki applied the Berry–Esseen theorem to prove that approximating m^t_{a→i}(x_i) with a Gaussian is tight. Although our variance 1/β^t_{a→i} of m^t_{a→i}(x_i) looks different from his, they are indeed equivalent if we set the variance ς^t_{i→a} of m^t_{i→a}(x_i) as σ²ς^t_{i→a}. Moreover, [31, Lem. 5.5.4] also justifies the correctness of the other side of our approximation.

B. Reaching O(m + n) scalars

For a reduced lattice basis H, we denote ||h_1||² = σ_1², ..., ||h_n||² = σ_n². Then the variance of the entries in H can be equipped, e.g., V(H_{bi}) = σ_i²/m, so one can employ this knowledge to further simplify the algorithm in IV-A. Here we define

r^t_{a→i} = α^t_{a→i}/β^t_{a→i} = y_a − Σ_{j∈[n]\i} H_{aj} x^{t−1}_{j→a}.  (32)

By equipping all the β^t_{b→i} with equal magnitude ∀b, referred to as β̄^t_{b→i}, and using Σ_{b∈[m]\a} H²_{bi} ≈ σ_i² due to the law of large numbers, it yields

x^t_{i→a} = η((1/σ_i²) Σ_{b∈[m]\a} H_{bi} r^t_{b→i}, 1/(σ_i² β̄^t_{b→i})),  (33)

ς^t_{i→a} = κ((1/σ_i²) Σ_{b∈[m]\a} H_{bi} r^t_{b→i}, 1/(σ_i² β̄^t_{b→i})).  (34)

For the moment, we can expand the local estimates about r^t_{a→i} and x^t_{i→a} as r^t_{a→i} = r^t_a + δr^t_{a→i}, x^t_{i→a} = x^t_i + δx^t_{i→a}, so that the techniques in [24], [23] can be employed. The crux of these transformations is to neglect elements whose amplitudes are no larger than O(1/n). Subsequently, Eqs. (32) and (33) become

r^t_a + δr^t_{a→i} = y_a − Σ_{j∈[n]} H_{aj}(x^{t−1}_j + δx^{t−1}_{j→a}) + H_{ai} x^{t−1}_i,  (35)

x^t_i + δx^t_{i→a} = η((1/σ_i²) Σ_{b∈[m]} H_{bi}(r^t_b + δr^t_{b→i}) − (1/σ_i²) H_{ai} r^t_a, 1/(σ_i² β̄^t_{b→i})).  (36)

In (35), the terms with index {i} are mutually related while the others are not, so that

r^t_a = y_a − Σ_{j∈[n]} H_{aj}(x^{t−1}_j + δx^{t−1}_{j→a}),  (37)

δr^t_{a→i} = H_{ai} x^{t−1}_i.  (38)

Further expanding the r.h.s. of (36) with the first-order Taylor expansion of η(u, v) at u, in which

∂η(u, v)/∂u |_{u=(1/σ_i²) Σ_{b∈[m]\a} H_{bi} r^t_{b→i}, v=1/(σ_i² β̄^t_{b→i})} = σ_i² β̄^t_{b→i} κ((1/σ_i²) Σ_{b∈[m]\a} H_{bi} r^t_{b→i}, 1/(σ_i² β̄^t_{b→i})),  (39)

it yields

x^t_i + δx^t_{i→a} = η((1/σ_i²) Σ_{b∈[m]} H_{bi}(r^t_b + δr^t_{b→i}), 1/(σ_i² β̄^t_{b→i})) − β̄^t_{b→i} κ((1/σ_i²) Σ_{b∈[m]} H_{bi}(r^t_b + δr^t_{b→i}), 1/(σ_i² β̄^t_{b→i})) H_{ai} r^t_a.

Distinguishing the terms that depend on index {a} leads to

x^t_i = η((1/σ_i²) Σ_{b∈[m]} H_{bi}(r^t_b + δr^t_{b→i}), 1/(σ_i² β̄^t_{b→i})),  (40)

δx^t_{i→a} = −β̄^t_{b→i} κ((1/σ_i²) Σ_{b∈[m]} H_{bi}(r^t_b + δr^t_{b→i}), 1/(σ_i² β̄^t_{b→i})) H_{ai} r^t_a.  (41)

Then we substitute (38) into (40), and (41) into (37), to obtain

x^t_i = η((1/σ_i²) Σ_{b∈[m]} H_{bi} r^t_b + x^{t−1}_i, 1/(σ_i² β̄^t_{b→i})),  (42)

r^t_a = y_a − Σ_{j∈[n]} H_{aj} x^{t−1}_j + φ r^{t−1}_a,  (43)

where

φ = Σ_{j∈[n]} H²_{aj} β̄^{t−1}_{b→j} κ((1/σ_j²) Σ_{b∈[m]} H_{bj}(r^{t−1}_b + δr^{t−1}_{b→j}), 1/(σ_j² β̄^{t−1}_{b→j})).  (44)

C. Further simplification

From (42), the estimated variance for each x^t_i now becomes

ς^t_i = κ((1/σ_i²) Σ_{b∈[m]} H_{bi} r^t_b + x^{t−1}_i, 1/(σ_i² β̄^t_{b→i})).  (45)

As ς^t_i ≈ ς^t_{i→b} ∀b, (31) tells us that

β̄^t_{b→i} = 1/(σ² + Σ_{j∈[n]} σ_j² ς^{t−1}_j / m).  (46)

According to (46), we denote β̄^t_{b→i} by 1/τ_t²; then the whole algorithm can be described by the following four steps:

x^t_i = η((1/σ_i²) Σ_{b∈[m]} H_{bi} r^t_b + x^{t−1}_i, τ_t²/σ_i²),  (47)

ς^t_i = κ((1/σ_i²) Σ_{b∈[m]} H_{bi} r^t_b + x^{t−1}_i, τ_t²/σ_i²),  (48)

r^{t+1}_a = y_a − Σ_{j∈[n]} H_{aj} x^t_j + (Σ_{j∈[n]} σ_j² ς^t_j)/(m τ_t²) · r^t_a,  (49)

τ²_{t+1} = σ² + (Σ_{j∈[n]} σ_j² ς^t_j)/m.  (50)
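Since steps (47)–(48) interact with the prior only through η and κ of (25)–(26), for any discrete prior these threshold functions reduce to posterior moments of a finite mixture and can be evaluated in closed form. A vectorized NumPy sketch (names ours):

```python
import numpy as np

def make_thresholds(support, weights):
    """Build eta/kappa of (25)-(26) for a discrete prior p_X with the
    given support points and probability weights."""
    s = np.asarray(support, dtype=float)
    logp = np.log(np.asarray(weights, dtype=float))
    def eta_kappa(u, v):
        u, v = np.atleast_1d(u), np.atleast_1d(v)
        # log posterior weights: log p(s_l) - (s_l - u_i)^2 / (2 v_i)
        logw = logp[:, None] - (s[:, None] - u[None, :]) ** 2 / (2 * v[None, :])
        logw -= logw.max(axis=0)          # stabilize before exponentiating
        w = np.exp(logw)
        w /= w.sum(axis=0)
        mean = (w * s[:, None]).sum(axis=0)                   # eta(u, v)
        var = (w * s[:, None] ** 2).sum(axis=0) - mean ** 2   # kappa(u, v)
        return mean, var
    return eta_kappa
```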

Let τ̄_t² = (1/n) Σ_{j∈[n]} σ_j² ς^t_j and Θ = diag(1/σ_1², ..., 1/σ_n²). The iterations in (47) to (50) are summarized in Algo. 1.

Algorithm 1: The AMP algorithm.
Input: lattice basis H, target y, number of iterations T, threshold functions η and κ, the minimum symbol error σ².
Output: estimated coefficient x.
1: x^0 = 0, r^0 = y, τ_0² = 10^4;
2: Θ = diag(1/diag(H^⊤H));
3: for t = 1, ..., T do
4:   x^t = η(ΘH^⊤r^t + x^{t−1}, Θτ_t² 1);
5:   τ̄_t² = ⟨Θ^{−1} κ(ΘH^⊤r^t + x^{t−1}, Θτ_t² 1)⟩;
6:   r^{t+1} = y − Hx^t + (n/m)(τ̄_t²/τ_t²) r^t;
7:   τ²_{t+1} = σ² + (n/m) τ̄_t².

D. Discussions

After lattice reduction and transforming the Z constraint to a finite set, we recognize that the AMP/GAMP algorithms in [26], [22], [29] could be employed for our problem after further regularizing the channels (i.e., letting H ← HΘ^{1/2} and updating the prior x ← Θ^{−1/2}x). However, Algo. 1 still gives valuable insights in the following aspects:

i) We can explicitly study the impact of the channel powers {σ_1², ..., σ_n²} on the state evolution equation based on our derivation, as shown in Sec. IV-F. Moreover, τ²_{t+1} in Algo. 1 reveals the averaged estimation variance of x and its convergence behavior, which is computationally advantageous if one needs to observe the convergence behavior and choose a candidate in the set {x^0, x^1, ..., x^T} that has the best fitness value (in that the last x^T, corresponding to a stable fixed point, may not have the best fitness).

ii) The estimated x^t in Algo. 1 reflects the MAP estimation, while AMP with x ← Θ^{−1/2}x needs additional steps to scale x^t back. Regularizing AMP with x ← Θ^{−1/2}x can be detrimental on a finite-accuracy processor. For instance, in the single-precision floating-point arithmetic defined in the IEEE-754 standard, if u in η(u, v) operates in a scaled range where, e.g., only 4 bits of the mantissa are effective, then the other 20 bits of the mantissa are wasted.
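A direct NumPy transcription of Algo. 1 is sketched below; eta_kappa is any threshold-function pair, e.g., one produced by make_thresholds above or the closed forms of Lemma 1. Parameter defaults follow the simulation settings of Sec. V (σ² = 0.05, T = 20); the function name is ours:

```python
import numpy as np

def amp(H, y, eta_kappa, T=20, sigma2=0.05):
    """Sketch of Algo. 1: AMP on min ||y - Hx||^2 with the
    column-power-aware scaling Theta = diag(1/||h_i||^2)."""
    m, n = H.shape
    theta = 1.0 / np.sum(H ** 2, axis=0)   # Theta = diag(1/diag(H^T H))
    x = np.zeros(n)
    r = y.copy()
    tau2 = 1e4                             # large initial variance
    for _ in range(T):
        u = theta * (H.T @ r) + x          # effective observation
        v = theta * tau2                   # per-coordinate variance
        x, var = eta_kappa(u, v)           # step 4 and the kappa of step 5
        tau2_bar = np.mean(var / theta)    # <Theta^{-1} kappa(...)>
        r = y - H @ x + (n / m) * (tau2_bar / tau2) * r   # step 6 (Onsager)
        tau2 = sigma2 + (n / m) * tau2_bar                # step 7
    return x
```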

E. Associating discrete priors

Algo. 1 needs to work with specifically designed threshold functions. From Secs. III-A and III-B, a dominant portion of the "errors" would be corrected if we impose a ternary prior on {−1, 0, 1} for p_X(x_i). We present its threshold functions η_ε(u, v) and κ_ε(u, v) in Lem. 1, which can be proved by a simple algebraic exercise. These threshold functions have closed forms and are easy to compute. The AMP algorithm using (51), (52) due to the ternary prior is referred to as AMPT.

Lemma 1. Let Y = X + W, with X ∼ p_X(x) = (1−ε)δ(x) + (ε/2)δ(x−1) + (ε/2)δ(x+1) and W ∼ N(0, v). Then the conditional mean and conditional variance of X on Y are

η_ε(u, v) ≜ E(X|Y = u) = sinh(u/v) / ((1−ε)/ε · e^{1/(2v)} + cosh(u/v)),  (51)

κ_ε(u, v) ≜ V(X|Y = u) = ((1−ε)/ε · e^{1/(2v)} cosh(u/v) + 1) / ((1−ε)/ε · e^{1/(2v)} + cosh(u/v))².  (52)

We also discuss the possibility of using general discrete Gaussian priors in Appx. E. In addition to some convergence issues, using a large constellation size is computationally intensive and falls outside our low-complexity scope, so we only include a closed-form approximation of it in our simulations. The AMP algorithm using (66), (67) due to the Gaussian prior is referred to as AMPG.

In Fig. 4, we plot η_ε(u, v) for v = 1 and ε ∈ {0.1, 0.3, 0.5, 0.7, 0.9}. η_ε(u, τ_t²/σ_i²) is always Lipschitz continuous because τ_t²/σ_i² ≥ σ²/σ_i² ∀i.
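A minimal sketch of the closed forms (51)–(52) is shown below; the arguments are clipped so that exp and cosh stay finite for large |u|/v, where η_ε saturates towards sign(u) anyway:

```python
import numpy as np

def eta_eps(u, v, eps=0.5):
    """Closed-form ternary thresholds (51)-(52); clipping keeps the
    exponentials finite (an approximation only at extreme arguments)."""
    a = np.clip(u / v, -30.0, 30.0)
    g = (1 - eps) / eps * np.exp(np.clip(1.0 / (2 * v), 0.0, 30.0))
    mean = np.sinh(a) / (g + np.cosh(a))                   # eta_eps
    var = (g * np.cosh(a) + 1) / (g + np.cosh(a)) ** 2     # kappa_eps
    return mean, var
```

Running AMPT then amounts to amp(H, y, lambda u, v: eta_eps(u, v, 0.5)); one may check that make_thresholds([-1, 0, 1], [eps/2, 1-eps, eps/2]) agrees with this closed form up to numerical precision.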


Fig. 4. The threshold function η_ε(u, v).

F. The impact of channel power and sparsity

We first present a theorem about the state evolution equation of our model, whose proof is given in Appx. D.

Theorem 3. Assume the reduced lattice basis can be modeled as H_bi ∼ N(0, σ_i²/m), and let x_0 = x^cvp − x̂. Let x^t denote the estimate inside Algo. 1. Then, almost surely,

lim_{n→∞} { |x^t_i − x_{0,i}|², ∀i } = { E|η(X + τ_{t,i}Z, τ²_{t,i}) − X|², ∀i },

in which τ_{t,i} satisfies the relation

τ²_{t,i} = (1/(mσ_i²)) Σ_{j∈[n]} σ_j² E|η(X + τ_{(t−1),j}Z, τ²_{(t−1),j}) − X|² + σ²/σ_i²,  (53)

where the expectation is taken over two independent random variables Z ∼ N(0, 1) and X ∼ p_X.

We call (53) a state evolution equation. Notice that if we define τ̃_t² ≜ τ²_{t,j}σ_j² = τ²_{t,i}σ_i², then Eq. (53) becomes

τ̃_t² = (1/m) Σ_{j∈[n]} σ_j² E|η(X + (τ̃_{t−1}/σ_j)Z, τ̃²_{t−1}/σ_j²) − X|² + σ²,  (54)

and this relation is reflected by step 7 in Algo. 1.

Based on Eq. (54), we can use Lem. 1 with W ∼ N(0, τ̃²_{t−1}/σ_j²) and the expression of p_X(x) to obtain³

τ̃_t² = (1/m) Σ_{j∈[n]} σ_j² E[(1−ε) g_1(Z, τ̃²_{t−1}) + ε g_2(Z, τ̃²_{t−1})] + σ²,  (55)

where

g_1(Z, τ̃²_{t−1}) = ((1−ε)/ε · e^{σ_j²/(2τ̃²_{t−1})} cosh(Zσ_j/τ̃_{t−1}) + 1) / ((1−ε)/ε · e^{σ_j²/(2τ̃²_{t−1})} + cosh(Zσ_j/τ̃_{t−1}))²,

g_2(Z, τ̃²_{t−1}) = ((1−ε)/ε · e^{σ_j²/(2τ̃²_{t−1})} cosh(Zσ_j/τ̃_{t−1} + σ_j²/τ̃²_{t−1}) + 1) / ((1−ε)/ε · e^{σ_j²/(2τ̃²_{t−1})} + cosh(Zσ_j/τ̃_{t−1} + σ_j²/τ̃²_{t−1}))².

In the AMP algorithm for ternary alphabets, we shall demonstrate the impact of the channel powers {σ_j²} and the sparsity (1−ε). The technique involved is the analysis of fixed points [40]. We define a function based on (55):

Ψ(τ̃²) ≜ (1/m) Σ_{j∈[n]} σ_j² E[(1−ε) g_1(Z, τ̃²) + ε g_2(Z, τ̃²)] + σ².  (56)

Definition 6 ([40]). τ̃² is called a fixed point of Ψ(τ̃²) if Ψ(τ̃²) = τ̃², and this point is called stable if there exists ǫ → 0⁺ such that Ψ(τ̃² + ǫ) < τ̃² and Ψ(τ̃² − ǫ) > τ̃². When Ψ(0) = 0, the stability condition is relaxed to Ψ(τ̃² + ǫ) < τ̃². A fixed point is called unstable if it fails the stability condition.

Proposition 1. There exists a minimum ǫ′ > 0 such that ∀σ² > ǫ′, the highest stable fixed point of Eq. (56) is Ψ((ε/m) Σ_{j∈[n]} σ_j² + σ²) = (ε/m) Σ_{j∈[n]} σ_j² + σ².

Proof: Since we have

lim_{τ̃²→∞} Ψ(τ̃²) = (1/m) Σ_{j∈[n]} σ_j² ( (1−ε)/((1−ε)/ε + 1) + ε/((1−ε)/ε + 1) ) + σ² = (ε/m) Σ_{j∈[n]} σ_j² + σ²,

one can always tune σ² such that Ψ(τ̃²) intersects with f(τ̃²) = τ̃² and the point of intersection becomes stable. This point is the highest one, as ∂Ψ(τ̃²)/∂τ̃² = 0 for all τ̃² > (ε/m) Σ_{j∈[n]} σ_j² + σ², which means Ψ(τ̃²) < τ̃² in this region.

In the proposition, the highest fixed point is unique if ∂Ψ(τ̃²)/∂τ̃² < 1 ∀τ̃² > 0, which means the increment of Ψ(τ̃²) is never larger than that of f(τ̃²) = τ̃².

Prop. 1 reflects the worst-case mean square error (MSE) performance of our algorithm. One implication of Prop. 1 is that a stronger lattice reduction method can help to make the fixed point smaller. E.g., with b-KZ, one has

Σ_{j∈[n]} σ_j² ≤ Σ_{j∈[n]} ((j+3)/4) λ_j(H)²

for n ≥ 2. Another implication is that the performance of AMP should be better if the real spark ε is small. There is however no genie granting which ε fits the actual prior. According to our simulations, ε = 0.5 is a good trade-off.

³Note that this step uses an unconditional variance rather than the conditional variance in Eq. (52).
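The recursion (54)–(56) is easy to track numerically. A Monte Carlo sketch is given below (names ours; it reuses eta_eps from above and assumes a square system, m = n, for brevity); it iterates τ̃² until it settles at a fixed point of Ψ:

```python
import numpy as np

def se_fixed_point(sigmas2, eps=0.5, sigma2=0.05, iters=50, nz=10**5, seed=0):
    """Iterate the state evolution (54)/(56) by Monte Carlo over
    Z ~ N(0,1) and X ~ ternary prior, tracking tau_tilde^2."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(nz)
    X = rng.choice([-1.0, 0.0, 1.0], p=[eps / 2, 1 - eps, eps / 2], size=nz)
    m = len(sigmas2)                     # assumes m = n here
    tau2 = 1e4
    for _ in range(iters):
        mse = 0.0
        for s2 in sigmas2:               # per-column channel powers sigma_j^2
            u = X + np.sqrt(tau2 / s2) * Z
            mean, _ = eta_eps(u, tau2 / s2, eps)   # threshold of Lemma 1
            mse += s2 * np.mean((mean - X) ** 2)
        tau2 = sigma2 + mse / m          # one application of Psi
    return tau2
```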

G. Complexity

The complexity is assessed by counting the number of floating-point operations (flops). For the threshold functions (51), (52) of AMPT, we can use sinh(u/v) ≈ u/v and cosh(u/v) ≈ 1 + u²/(2v²) for small u/v, since sinh x = Σ_{n=0}^∞ x^{2n+1}/(2n+1)! and cosh x = Σ_{n=0}^∞ x^{2n}/(2n)!. Outer bounding (51), (52) is also possible for large u/v, so we can approximate each of (51), (52) with O(1) flops. The O(1) complexity also holds for (66), (67) of AMPG. In conclusion, the complexity of our AMP program is O(mnT). On the contrary, a full enumeration with a ternary constraint requires at least O(3^n) flops, while ZF/SIC requires O(mn²) flops.

Fig. 5. The symbol error rate of different algorithms.

Fig. 6. The impact of lattice reduction, n = 20.

V. SIMULATIONS

In this section, we examine the symbol error rate (SER) performance and the complexity of our hybrid scheme, as well as the impact of lattice reduction and parameter selection. All the SER figures are generated from 2 × 10^4 Monte Carlo runs.

In the first example, we study the SNR versus SER performance of our AMP algorithms. The system size is set as m = n = 8 or m = n = 14. We use b-LLL with list size 1 as the reduction method. The benchmark algorithms serving the comparison include sphere precoding, b-LLL-ZF and b-LLL-SIC. The sphere precoding method [2], which exactly solves the CVP, serves as the lower bound, while b-LLL-ZF/b-LLL-SIC shows the system performance before appending the AMP algorithm. In the AMP algorithm, we set σ² = 0.05 and T = 20. We set ε = 0.5 for AMPT and σ_g² = 0.5 for AMPG. According to Fig. 5, with n = 8, most algorithms approach the sphere precoding lower bound except b-LLL-ZF; this is because the lattice reduction aided methods are all diversity-achieving and their gaps to sphere precoding are modest at small system sizes. With n = 14, we observe a 1dB gain from b-LLL-ZF to b-LLL-ZF-AMPG, and a 2dB gain from b-LLL-ZF to b-LLL-ZF-AMPT. As for the SIC estimators, AMPT and AMPG have about a 1dB gain over the original b-LLL-SIC, and they are within 0.5dB of the optimal sphere precoding.

In the second example, we study the impact of the lattice reduction algorithm. The AMP parameters are chosen as above. We use a larger system size, m = n = 20, and adopt ZF so as to show more significant improvements. From Fig. 6, we can see that a stronger LR algorithm provides a more obvious SER gain in the high SNR region. For instance, b-LLL with 3/9 branches (b-LLL-3/9) has about a 0.5/1dB improvement over the classic LLL reduction. Since the system size is larger than in the first example, we observe at least 2.5dB SER gain with our hybrid scheme. There should be a further gain from b-KZ reduction; it is however excluded from the figure because simulating b-KZ consumes too much time.

In the third example, we examine the effect of choosing different sparks ε in AMPT. The benchmark algorithms are chosen as b-LLL aided ZF/SIC, and we set ε ∈ {0.1, 0.5, 0.9}.

Fig. 7. The impact of spark ε on SER, n = 14.

According to Fig. 7, with ZF initialization, the gain with ε = 0.9 is only about 1dB, while ε = 0.1 and ε = 0.5 have about 1.5dB and 2.5dB gains, respectively. This shows that ε = 0.5 is a good trade-off value. A similar observation is made with SIC initialization.

In the last example, we examine the complexity of our AMP algorithms. We use the estimates in Sec. IV-G to measure the complexity of ZF/SIC and AMP. As for the sphere precoding algorithm, it is implemented after b-LLL so as to decrease its complexity. All algorithms can take the benefits of b-LLL, and the complexity incurred by lattice reduction is not counted for any of them. The actual complexity of sphere precoding depends on the inputs, so we count the number of nodes it visits and assign 2k + 7 flops to a visited node in layer k [21]. From Fig. 8, we can see that AMP with a constant iteration number, e.g., T = 10 or T = 20, adds little complexity budget to that of ZF/SIC. On the contrary, the exponential complexity of sphere precoding makes it at least 200 times more complex than our ZF/SIC+AMP scheme in dimension n = 22.

Fig. 8. The complexity of different algorithms.

VI. CONCLUSIONS

In this work, we have presented a hybrid precoding scheme for VP. The precoding problem in VP amounts to solving a CVP in the geometry of numbers, and this problem is quite general because the signal space lies in the integers Z. After performing certain LR-aided estimations, we demonstrated that the signal space is significantly constrained, which paves the way for the application of the celebrated AMP algorithm from compressed sensing. By using the AMP algorithm with a ternary prior or a Gaussian prior, we obtain threshold functions that enjoy closed-form expressions. Our simulations showed that the hybrid scheme can provide a few dB of SER gain for VP, and that the attached AMP algorithm adds little complexity to that of ZF/SIC. As an extension of this work, one may investigate the hybrid scheme for other signal processing problems.

APPENDIX A
B-LLL/B-KZ ARE GOOD FOR AMP

Both the OD in lattice theory and the coherence parameter µ(H) (without l2-norm normalization⁴) in compressed sensing [41] can serve as metrics to evaluate the goodness of a basis, and we found that both metrics imply the boosted versions of LLL/KZ are better. More generally, we can study the maximal correlation of H:

sin θ_min ≜ min_i ||π⊥_{H\h_i}(h_i)|| / ||h_i||.

The goodness on sin θ_min implies virtues for both µ(H) and ξ(H), where sin arccos(µ(H)) ≥ sin θ_min, and

ξ(H) = ∏_{i=1}^n ||h_i|| / ||π⊥_{[h_1,...,h_{i−1}]}(h_i)|| ≤ (sin θ_min)^{−n}.

According to the proof of Thm. 2, sin θ_min also controls the bound on η_n. In Fig. 9, we plot the maximal correlation performance of different lattice reduction algorithms from dimension 14 to 20. This figure displays the advantages of the boosted algorithms over the non-boosted ones: b-LLL with 1 to 9 branches (b-LLL-1/3/9) all outperform LLL/KZ, and b-KZ provides the best sin θ_min.

⁴µ(H) ≜ max_{1≤i≠j≤n} |h_i^⊤ h_j| / (||h_i|| ||h_j||).

Fig. 9. The maximal correlation sin θ_min of different LR algorithms.

A PPENDIX B E Q . (14) IN T HM . 1

OF

When proving the energy efficiency of b-KZ aided SIC/ZF, the following lemma would be needed. Remind that H = QR is the QR factorization. Lemma 2 ([21]). Suppose a basis H is b-KZ reduced, then this basis conforms to 8i 2 (i − 1)ln(i−1)/2 ri,i , 9   2i 2 2 khi k ≤ 1 + (i − 1)1+ln(i−1)/2 ri,i 9 for 1 ≤ i ≤ n, and λ1 (H)2 ≤

2 rk−j+1,k−j+1 ≤

8j 2 (j − 1)ln(j−1)/2 rk,k 9

(57) (58)

(59)

for 2 ≤ k ≤ n, j ≤ k.

Under the unitary transform Q⊤ , we aim to prove an equivalence of (12) as k¯ y − Rˆ xk ≤ ηn minn k¯ y − Rxk , x∈Z

(60)

with y ¯ = Q⊤ y. Let vcvp = Rxcvp be the closest vector to y ¯, and vsic = Rxsic be the vector founded by SIC. As the SIC parallelepiped generally mismatches the Voronoi region, we need to investigate the relation of xcvp and n cvp sic xsic = ⌊¯ y /r ⌉ as in that in [35]. If x = x , we n n,n n n n only need to investigate ηn−1 in another n − 1 dimensional

¯ − R1:n,1:n−1 xsic

CVP by setting y ¯←y ¯ − rn xsic n : y 1:n−1 ≤ ηn−1 minx∈Zn−1 k¯ y − R1:n,1:n−1 xk. When this situation continues till the first layer, one clearly has η1 = 1. Generally, we can assume that this mismatch first happens in the kth

11

layer, i.e., assume xcvp 6= xsic k , k ∈ {2, . . . , n}, then k cvp 1 |¯ yk /rk,k − xk | ≥ 2 , and k¯ y−v

cvp 2

k ≥

2 rk,k (¯ yk /rk,k



2 xcvp k )



2 rk,k /4.

(61)

2 According to (59) of b-KZ, we have rk−j+1,k−j+1 ≤ 8j 9 (j − ln(j−1)/2 2 sic 1) rk,k , then the SIC solution R1:n,1:k x1:k satisfies k X

2

y

≤ 1 ¯ − R1:n,1:k xsic r2 1:k 4 i=1 i,i   1 2k 1+ln(k−1)/2 2 ≤ rk,k . + (k − 1) 4 9 (62)

Combining (62) and (61), and choose k = n in the worst case, we have  

2 8n 2 1+ln(n−1)/2

y ¯ − vsic ≤ 1 + k¯ y − vcvp k . (n − 1) 9 A PPENDIX C P ROOF OF T HM . 2

The energy efficiency of b-LLL/b-KZ aided ZF precoding is non-trivial to prove because we cannot employ the size reduction conditions to claim an upper bound for (A_i)^{−1}_{1,1} as in [34, Eq. (65)], in which A_i = R^⊤_{i:n,i:n} R_{i:n,i:n}. This condition is crucial, as one already has

sin²θ_i = 1 / (||h_i||² (A_i)^{−1}_{1,1})

according to [34, Appx. I], where θ_i is the angle between h_i and span(h_1, ..., h_{i−1}, h_{i+1}, ..., h_n). The following lemma proves a lower bound for sin²θ_i by only invoking the relation between ||h_i||² and r²_{i,i}.

Lemma 3. Let H be a b-KZ reduced basis; then it satisfies sin²θ_i ≥ (∏_{k=i}^n k^{2+ln(k)/2})^{−1}.

Proof: Define M^k = R^{−1}_{i:k,i:k} along with M^i = 1/r_{i,i}; then

M^k = [ M^{k−1}   −(1/r_{k,k}) M^{k−1} R_{i:k−1,k} ;  0   1/r_{k,k} ].

By using the Cauchy–Schwarz inequality on M^{k−1}_{1,:} R_{i:k−1,k}, we also have

||M^k_{1,:}||² = ||M^{k−1}_{1,:}||² + (1/r²_{k,k}) (M^{k−1}_{1,:} R_{i:k−1,k})²
            ≤ ||M^{k−1}_{1,:}||² (1 + ||R_{i:k−1,k}||²/r²_{k,k}).  (63)

It is evident that ||R_{i:k−1,k}||² ≤ ||h_k||² − r²_{k,k} ≤(a) (2k/9)(k−1)^{1+ln(k−1)/2} r²_{k,k}, where (a) is due to inequality (58), so that ||R_{i:k−1,k}||²/r²_{k,k} ≤ (2k/9)(k−1)^{1+ln(k−1)/2}. Plugging this into (63) then gives

||M^k_{1,:}||² ≤ (1 + (2k/9)(k−1)^{1+ln(k−1)/2}) ||M^{k−1}_{1,:}||² ≤ k^{2+ln(k)/2} ||M^{k−1}_{1,:}||².

By induction, one has

(A_i)^{−1}_{1,1} = ||M^n_{1,:}||² ≤ r^{−2}_{i,i} ∏_{k=i+1}^n k^{2+ln(k)/2},

and thus

sin²θ_i ≥ r²_{i,i} / (||h_i||² ∏_{k=i+1}^n k^{2+ln(k)/2}) ≥ (∏_{k=i}^n k^{2+ln(k)/2})^{−1},

where the second inequality is due to Lem. 2.

With the same technique as above, we can bound sin²θ_i for b-LLL.

Lemma 4. Let H be a b-LLL reduced basis; then it satisfies sin²θ_i ≥ (∏_{k=i}^n β^{k−1})^{−1}.

We proceed to investigate inequality (60). Let v^cvp = Rx^cvp be the closest vector to ȳ, and v^zf = Rx^zf the vector found by ZF. Define v^cvp − v^zf = Σ_{i=1}^n φ_i h_i with φ_i ∈ Z. If v^cvp = v^zf, then the energy efficiency is η_n = 1. If v^cvp ≠ v^zf, then

||v^cvp − v^zf|| ≤ Σ_{j=1}^n ||φ_j h_j||.

At the same time, we have

v^cvp − ȳ = v^cvp − v^zf + v^zf − ȳ = (φ_k + φ^zf_k) h_k + m′,

where m′ ∈ span(h_1, ..., h_{k−1}, h_{k+1}, ..., h_n), v^zf − ȳ = Σ_{i=1}^n φ^zf_i h_i satisfies |φ^zf_i| ≤ 1/2 ∀i, and k ≜ arg max_i ||φ_i h_i||. From Lem. 3, ||(φ_k + φ^zf_k) h_k + m′|| ≥ (∏_{j=k}^n j^{2+ln(j)/2})^{−1} |φ_k + φ^zf_k| ||h_k||, so that

||v^cvp − ȳ|| ≥ (|φ_k|/2) (∏_{j=k}^n j^{2+ln(j)/2})^{−1} ||h_k||

as |φ_k + φ^zf_k| ≥ |φ_k|/2. According to the triangle inequality, one has for b-KZ that

||v^zf − ȳ|| ≤ ||v^zf − v^cvp|| + ||v^cvp − ȳ|| ≤ (2n ∏_{j=1}^n j^{2+ln(j)/2} + 1) ||v^cvp − ȳ||.

One can similarly prove for b-LLL that

||v^zf − ȳ|| ≤ (2n ∏_{j=1}^n β^{j−1} + 1) ||v^cvp − ȳ||.

APPENDIX D
PROOF OF THM. 3

Proof: We follow [36, Sec. 1.3] to analyze the state evolution equation (53). Let the observation equation be y^t = H^t x_0 + w, where the prior of x_0 is denoted by p_X, H_{bi} ∼ N(0, σ_i²/m), and w_i ∼ N(0, σ²). Without the Onsager term, the residual equation becomes

r^t = y^t − H^t x^t.  (64)

Along with independently generated {H^t}, the estimation equation becomes

x^{t+1} = η(ΘH^{t⊤} r^t + x^t, Θτ_t² 1).  (65)

Then we evaluate the first input of the threshold function η:

ΘH^{t⊤} r^t + x^t = ΘH^{t⊤}(H^t x_0 + w − H^t x^t) + x^t = x_0 + (ΘH^{t⊤}H^t − I)(x_0 − x^t) + ΘH^{t⊤} w,

where we denote the second term by u and the third term by v. Regarding the term v, it satisfies V(v_i) = (1/σ_i⁴) × (σ_i²/m) × m × σ², which means v_i ∼ N(0, σ²/σ_i²). As for the statistics of the term u, we need the following basic algebra to measure ΘH^{t⊤}H^t − I: suppose that we have two independent Gaussian columns h_i and h_j whose entries are generated from N(c, σ_i²/m) and N(c, σ_j²/m) respectively. Then ∀i ≠ j, we have E(h_i^⊤ h_j) = mc² and V(h_i^⊤ h_j) = σ_i²σ_j²/m + c²(σ_i² + σ_j²). For i = j, we have E(||h_i||²) = mc² + σ_i² and V(||h_i||²) = 2σ_i⁴/m² + 4c²σ_i²/m. Further denote the covariance matrix of x_0 − x^t as diag(τ̂²_{t,1}, ..., τ̂²_{t,n}), where τ̂²_{t,i} = E|η(X + τ_{t,i}Z, τ²_{t,i}) − X|², X ∼ p_X, Z ∼ N(0, 1). Then the {u_i} are i.i.d. with zero mean and variance

(1/(mσ_i²)) Σ_{j∈[n]} σ_j² τ̂²_{t,j} + τ̂²_{t,i}/m,

in which τ̂²_{t,i}/m ≪ (1/(mσ_i²)) Σ_{j∈[n]} σ_j² τ̂²_{t,j} and is thus negligible. The entries of ΘH^{t⊤} r^t + x^t can be written as x_{0,i} + τ_{t,i}Z, where the variance of τ_{t,i}Z = u_i + v_i satisfies

τ²_{t,i} = (1/(mσ_i²)) Σ_{j∈[n]} σ_j² τ̂²_{t,j} + σ²/σ_i²
        =(a) (1/(mσ_i²)) Σ_{j∈[n]} σ_j² E|η(X + τ_{(t−1),j}Z, τ²_{(t−1),j}) − X|² + σ²/σ_i²,

where (a) comes from evaluating the covariance of x_0 − x^t.

APPENDIX E
HOW TO ASSOCIATE GENERAL DISCRETE GAUSSIAN PRIORS

Since we have only proved that max_i |x̂_i − x^cvp_i| is bounded by a function of the dimension n and some lattice metrics, one may wonder whether a discrete Gaussian prior for p_X(x) brings some benefits. Indeed, the ternary prior can be treated as a special case of the general discrete Gaussian priors. A discrete Gaussian distribution on Z with zero mean and width σ_g is defined as

ρ_{σ_g}(z) = (1/S) e^{−z²/(2σ_g²)},

where S = Σ_{k=−∞}^{∞} e^{−k²/(2σ_g²)}. According to a tail bound on the discrete Gaussian [42, Lem. 4.4], we have

Pr_{z∼ρ_{σ_g}}(|z| > kσ_g) ≤ 2e^{−k²/2}

for any k > 0. This implies that ρ_{σ_g}(z) can be calculated over a finite range. E.g., we have Pr_{z∼ρ_{σ_g}}(|z| > 10σ_g) ≤ 3.86 × 10^{−22}. If σ_g = 0.1, then ρ_{σ_g}(z) becomes equivalent to our ternary prior with ε ≤ 0.5.

Assume that we have observed Y = u from the model Y = X + W, with X ∼ p_X(x) = ρ_{σ_g}(x) and W ∼ N(0, v). Then the threshold functions are given by

η_d(u, v) = (1/S_k) Σ_{l=−k}^{k} l e^{−l²/(2σ_g²) − (l−u)²/(2v)},

κ_d(u, v) = (1/S_k) Σ_{l=−k}^{k} (l − η_d(u, v))² e^{−l²/(2σ_g²) − (l−u)²/(2v)},

where S_k = Σ_{l=−k}^{k} e^{−l²/(2σ_g²) − (l−u)²/(2v)}. We stress, however, that evaluating η_d(u, v) and κ_d(u, v) is generally computationally intensive, and the fixed points of their state evolution equation are unfathomable. Fortunately, the sum of a discrete Gaussian and a continuous Gaussian resembles a continuous Gaussian if the discrete Gaussian is not very bumpy [43, Lem. 9], so we can replace ρ_{σ_g}(x) with N(x; 0, σ_g²) with a properly chosen σ_g². Let the signal prior be p_X(x) = N(x; 0, σ_g²); then it corresponds to another pair of threshold functions that have closed forms:

η_g(u, v) = uσ_g²/(σ_g² + v),  (66)

κ_g(u, v) = vσ_g²/(σ_g² + v).  (67)

Based on Eq. (54), we can obtain its equivalent fixed-point function:

Ψ_g(τ̃²) ≜ σ² + (1/m) Σ_{j∈[n]} τ̃²σ_j²σ_g²/(τ̃² + σ_j²σ_g²).  (68)

Ψ_g(τ̃²) is proportional to τ̃². Define σ²_min ≜ min_j σ_j² and σ²_max ≜ max_j σ_j²; we also have

(n/m) τ̃²σ²_min σ_g²/(τ̃² + σ²_min σ_g²) ≤ Ψ_g(τ̃²) − σ² ≤ (n/m) τ̃²σ²_max σ_g²/(τ̃² + σ²_max σ_g²).

As a consequence, one can easily prove that Eq. (68) has a unique stable fixed point satisfying τ̃² ∈ [τ̃²_min, τ̃²_max], where

τ̃²_min = (1/2)(σ² + (n/m − 1)σ²_min σ_g²) + (1/2)√( (σ² + (n/m − 1)σ²_min σ_g²)² + 4σ²σ²_min σ_g² ),  (69)

and τ̃²_max is defined by replacing σ²_min with σ²_max in (69). In order to make the fixed point small, one should also make the lattice basis short. The setting of σ_g² is however a trade-off: it should be set smaller to yield a lower fixed point, but there should be a minimum for it so that the imposed prior still reflects the discrete Gaussian information.
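For completeness, the truncated sums η_d, κ_d admit a short, numerically stable implementation; the default k = 10 is justified by the tail bound above (names ours):

```python
import numpy as np

def eta_kappa_dgauss(u, v, sigma_g=0.5, k=10):
    """Truncated evaluation of eta_d/kappa_d over l in [-k, k]; the
    tail bound Pr(|z| > k*sigma_g) <= 2 exp(-k^2/2) justifies the cut."""
    l = np.arange(-k, k + 1, dtype=float)
    logw = -l ** 2 / (2 * sigma_g ** 2) - (l - u) ** 2 / (2 * v)
    logw -= logw.max()                  # stabilize before exponentiating
    w = np.exp(logw)
    w /= w.sum()
    mean = np.sum(w * l)                # eta_d(u, v)
    var = np.sum(w * (l - mean) ** 2)   # kappa_d(u, v)
    return mean, var
```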


REFERENCES

[1] C. Peel, B. Hochwald, and A. Swindlehurst, "A vector-perturbation technique for near-capacity multiantenna multiuser communication—Part I: Channel inversion and regularization," IEEE Trans. Commun., vol. 53, no. 1, pp. 195–202, Jan. 2005.
[2] B. M. Hochwald, C. B. Peel, and A. L. Swindlehurst, "A vector-perturbation technique for near-capacity multiantenna multiuser communication—Part II: Perturbation," IEEE Trans. Commun., vol. 53, no. 3, pp. 537–544, 2005.
[3] D. Micciancio and S. Goldwasser, Complexity of Lattice Problems. Boston, MA: Springer US, 2002.
[4] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, "Closest point search in lattices," IEEE Trans. Inf. Theory, vol. 48, no. 8, pp. 2201–2214, 2002.
[5] B. Hassibi and H. Vikalo, "On the sphere-decoding algorithm I. Expected complexity," IEEE Trans. Signal Process., vol. 53, no. 8, pp. 2806–2818, Aug. 2005.
[6] C. Masouros, M. Sellathurai, and T. Ratnarajah, "Computationally efficient vector perturbation precoding using thresholded optimization," IEEE Trans. Commun., vol. 61, no. 5, pp. 1880–1890, 2013.
[7] J. Maurer, J. Jalden, D. Seethaler, and G. Matz, "Vector perturbation precoding revisited," IEEE Trans. Signal Process., vol. 59, no. 1, pp. 315–328, Jan. 2011.
[8] H. S. Han, S. H. Park, S. Lee, and I. Lee, "Modulo loss reduction for vector perturbation systems," IEEE Trans. Commun., vol. 58, no. 12, pp. 3392–3396, 2010.
[9] S. H. Park, H. S. Han, S. Lee, and I. Lee, "A decoupling approach for low-complexity vector perturbation in multiuser downlink systems," IEEE Trans. Wirel. Commun., vol. 10, no. 6, pp. 1697–1701, 2011.
[10] S. Liu, C. Ling, and X. Wu, "Proximity factors of lattice reduction-aided precoding for multiantenna broadcast," in IEEE Int. Symp. Inf. Theory Proc., 2012, pp. 2291–2295.
[11] C. Masouros, M. Sellathurai, and T. Ratnarajah, "Maximizing energy efficiency in the vector precoded MU-MISO downlink by selective perturbation," IEEE Trans. Wirel. Commun., vol. 13, no. 9, pp. 4974–4984, 2014.
[12] D. A. Karpuk and P. Moss, "Channel pre-inversion and max-SINR vector perturbation for large-scale broadcast channels," no. 4, pp. 1–12, 2016.
[13] Y. Ma, A. Yamani, N. Yi, and R. Tafazolli, "Low-complexity MU-MIMO nonlinear precoding using degree-2 sparse vector perturbation," IEEE J. Sel. Areas Commun., vol. 34, no. 3, pp. 497–509, Mar. 2016.
[14] S. Lyu and C. Ling, "Sequential lattice reduction," in 2016 8th Int. Conf. Wirel. Commun. Signal Process. IEEE, Oct. 2016, pp. 1–5.
[15] C. Windpassinger, R. F. H. Fischer, and J. B. Huber, "Lattice-reduction-aided broadcast precoding," IEEE Trans. Commun., vol. 52, no. 12, pp. 2057–2060, 2004.
[16] M. Taherzadeh, A. Mobasher, and A. K. Khandani, "Communication over MIMO broadcast channels using lattice-basis reduction," IEEE Trans. Inf. Theory, vol. 53, no. 12, pp. 4567–4582, 2007.
[17] A. K. Lenstra, H. W. Lenstra, and L. Lovász, "Factoring polynomials with rational coefficients," Math. Ann., vol. 261, no. 4, pp. 515–534, 1982.
[18] H. Minkowski, "Diskontinuitätsbereich für arithmetische Äquivalenz," J. für die reine und Angew. Math. (Crelle's Journal), no. 129, pp. 220–224, 1905.
[19] L. Afflerbach and H. Grothe, "Calculation of Minkowski-reduced lattice bases," Computing, vol. 35, no. 3-4, pp. 269–276, Sep. 1985.
[20] A. Korkine and G. Zolotareff, "Sur les formes quadratiques positives," Math. Ann., vol. 11, no. 2, pp. 242–292, Jun. 1877.
[21] S. Lyu and C. Ling, "Boosted KZ and LLL algorithms," IEEE Trans. Signal Process., vol. 65, no. 18, pp. 4784–4796, Sep. 2017.
[22] D. L. Donoho, A. Maleki, and A. Montanari, "Message-passing algorithms for compressed sensing," Proc. Natl. Acad. Sci. U.S.A., vol. 106, no. 45, pp. 18914–18919, 2009.
[23] ——, "Message passing algorithms for compressed sensing: I. Motivation and construction," in IEEE Inf. Theory Workshop 2010 (ITW 2010). IEEE, Jan. 2010, pp. 1–5.
[24] A. Montanari, "Graphical models concepts in compressed sensing," in Compressed Sensing, Y. C. Eldar and G. Kutyniok, Eds. Cambridge: Cambridge University Press, 2010, pp. 394–438.
[25] S. Wu, L. Kuang, Z. Ni, J. Lu, D. Huang, and Q. Guo, "Low-complexity iterative detection for large-scale multiuser MIMO-OFDM systems using approximate message passing," IEEE J. Sel. Top. Signal Process., vol. 8, no. 5, pp. 902–915, Oct. 2014.
[26] C. Jeon, R. Ghods, A. Maleki, and C. Studer, "Optimality of large MIMO detection via approximate message passing," in 2015 IEEE Int. Symp. Inf. Theory, 2015, pp. 1227–1231.
[27] R. Ghods, C. Jeon, A. Maleki, and C. Studer, "Optimal large-MIMO data detection with transmit impairments," no. 2, 2015.
[28] M. Bayati, M. Lelarge, and A. Montanari, "Universality in polytope phase transitions and message passing algorithms," Ann. Appl. Probab., vol. 25, no. 2, pp. 753–822, Apr. 2015.
[29] S. Rangan, "Generalized approximate message passing for estimation with random linear mixing," in 2011 IEEE Int. Symp. Inf. Theory Proc. IEEE, Jul. 2011, pp. 2168–2172.
[30] S. Rangan, P. Schniter, and A. Fletcher, "On the convergence of approximate message passing with arbitrary matrices," in 2014 IEEE Int. Symp. Inf. Theory. IEEE, Jun. 2014, pp. 236–240.
[31] A. Maleki, "Approximate message passing algorithms for compressed sensing," Ph.D. dissertation, Stanford University, 2011.
[32] P. Q. Nguyen and B. Vallée, Eds., The LLL Algorithm, ser. Information Security and Cryptography. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010.
[33] J. C. Lagarias, H. W. Lenstra, and C. P. Schnorr, "Korkin-Zolotarev bases and successive minima of a lattice and its reciprocal lattice," Combinatorica, vol. 10, no. 4, pp. 333–348, 1990.
[34] C. Ling, "On the proximity factors of lattice reduction-aided decoding," IEEE Trans. Signal Process., vol. 59, no. 6, pp. 2795–2808, 2011.
[35] L. Babai, "On Lovász' lattice reduction and the nearest lattice point problem," Combinatorica, vol. 6, no. 1, pp. 1–13, 1986.
[36] M. Bayati and A. Montanari, "The dynamics of message passing on dense graphs, with applications to compressed sensing," IEEE Trans. Inf. Theory, vol. 57, no. 2, pp. 764–785, 2011.
[37] T. Richardson and R. Urbanke, Modern Coding Theory. Cambridge: Cambridge University Press, 2008.
[38] Y. Weiss and W. T. Freeman, "On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs," IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 736–744, 2001.
[39] T. P. Minka, "Expectation propagation for approximate Bayesian inference," in Proc. 17th Conf. Uncertain. Artif. Intell. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2001, pp. 362–369.
[40] L. Zheng, A. Maleki, H. Weng, X. Wang, and T. Long, "Does lp-minimization outperform l1-minimization?" IEEE Trans. Inf. Theory, pp. 1–1, Jul. 2017.
[41] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing, ser. Applied and Numerical Harmonic Analysis. New York, NY: Springer New York, 2013.
[42] V. Lyubashevsky, "Lattice signatures without trapdoors," in Lect. Notes Comput. Sci., vol. 7237 LNCS, 2012, pp. 738–755.
[43] C. Ling and J.-C. Belfiore, "Achieving AWGN channel capacity with lattice Gaussian coding," IEEE Trans. Inf. Theory, vol. 60, no. 10, pp. 5918–5929, Oct. 2014.
