A Fast Generalized Sphere Decoder for Optimum Decoding of Under-determined MIMO systems

Pranav Dayal and Mahesh K. Varanasi
e-mail: {dayalp, [email protected]}
University of Colorado, Boulder, CO 80309

Abstract

A new Generalized Sphere Decoder (GSD) is proposed for optimum decoding of lattice points in a system with fewer equations than unknowns. The proposed algorithm is significantly faster than the one proposed in [1] and is a natural extension of the sphere decoding philosophy introduced in [2, 3]. The complexity reduction is achieved by partitioning the entire range of maximum likelihood hypotheses for the GSD into ordered subsets. Such a partition enables the rejection of several hypotheses at once. A parameter called the depth of the GSD is shown to offer additional complexity reduction as the gap between the number of equations and the number of unknowns grows. Numerical results are presented to show the application of the proposed GSD for various Multiple-Input Multiple-Output (MIMO) systems.

1 Introduction

The advent of smart closest lattice point search algorithms, such as the Pohst strategy [2] or the Schnorr-Euchner strategy [4], has enabled lattice encoding as an effective means for achieving good performance on both Gaussian and Rayleigh fading channels. The sphere decoding algorithm [3] based on the Pohst strategy has found application in several communication schemes such as the ones in [5-8]. Even though the general closest lattice point search problem is NP-hard, the expected complexity of the sphere decoding algorithm has been observed to be polynomial in the number of unknowns [6, 9] for systems designed for Rayleigh fading channels with additive white Gaussian noise. However, this polynomial nature is not preserved when the system is under-determined, i.e., when the vector of observed statistics lies in a space of dimension smaller than the number of unknowns. The generalized sphere decoder presented in [1] solves the closest lattice point problem with a complexity exponential in the difference between the number of unknowns and the number of equations. In this paper, we present a new generalized sphere decoding algorithm that is significantly faster than the one in [1] and possibly a step towards achieving non-exponential complexity for decoding under-determined systems.

This paper is organized as follows. The description of the existing GSD is presented in Section 2. The new GSD algorithm is explained in Section 3. Further improvement of the proposed strategy is presented in Section 4. Numerical results are presented in Section 5 and the conclusions are given in Section 6.

This work was supported in part by NSF Grant CCR-0112977 and by ARO grant DADD19-03-1-0248.

2 Preliminaries

Since any complex linear model can be reduced to an equivalent real model, the description of the generalized sphere decoding algorithm is presented for the real model only. Consider the $q$-PAM constellation given by $\mathcal{I}_q = \{ j \text{ odd integer} \mid -q+1 \le j \le q-1 \}$, for some even integer $q$. Let $x \in \mathcal{I}_q^M$ be the transmitted vector of information symbols. Let $H \in \mathbb{R}^{N \times M}$ be the equivalent channel transfer matrix with $N < M$ and $\mathrm{rank}(H) = N$. Let the received statistics be given by
\[
y = Hx + n, \tag{1}
\]
where $n \in \mathbb{R}^N$ is the random vector of additive white Gaussian noise. Any linear system that can be reduced to the form in (1) will be referred to as under-determined because the number of equations $N$ is less than the number of unknowns $M$. The matrix $H$ can be interpreted as the generator matrix of the projection of a random lattice to $\mathbb{R}^N$ and $y$ as the vector perturbed from the transmitted lattice point $Hx$. Assume that $H$ is known exactly at the receiver. The maximum-likelihood (ML) estimate $\hat{x}$ for $x$ is obtained by minimizing the Euclidean distance of $y$ from the valid lattice points,

\[
\hat{x} = \arg\min_{x \in \mathcal{I}_q^M} \|y - Hx\|^2 = \arg\min_{x \in \mathcal{I}_q^M} \|F(\rho - x)\|^2, \tag{2}
\]
where $\rho = H^T (HH^T)^{-1} y$ and $F \in \mathbb{R}^{M \times M}$ is the upper triangular matrix corresponding to the generalized Cholesky decomposition [1] of $H^T H$. Thus, $F^T F = H^T H$ and all entries of the last $M - N$ rows of $F$ are zero. The matrix $F$ can be represented as
\[
F = \begin{bmatrix} F_1 & F_2 \\ 0 & 0 \end{bmatrix},
\]
where $F_1 \in \mathbb{R}^{N \times N}$ is upper triangular and $F_2 \in \mathbb{R}^{N \times (M-N)}$. Let the subscripts $G$ and $\bar{G}$ refer to the indices corresponding to the first $N$ and the last $M - N$ elements of a vector, respectively. The minimum distance corresponding to the optimum rule in (2) is now given by

\[
\|y - H\hat{x}\|^2
= \min_{x_{\bar{G}} \in \mathcal{I}_q^{M-N}} \left[ \min_{x_G \in \mathcal{I}_q^{N}} \big\| [F_1, F_2]\rho - F_2 x_{\bar{G}} - F_1 x_G \big\|^2 \right]
= \min_{x_{\bar{G}} \in \mathcal{I}_q^{M-N}} \left[ \min_{x_G \in \mathcal{I}_q^{N}} \big\| \tilde{\rho} - F_1 x_G \big\|^2 \right], \tag{3}
\]
where $\tilde{\rho} = [F_1, F_2]\rho - F_2 x_{\bar{G}}$ in the second equation. The Generalized Sphere Decoder (GSD) in [1] checks all valid constellation points that lead to a squared Euclidean distance in (2) smaller than some positive number $C$. This is done by exhaustively searching through all hypotheses corresponding to $x_{\bar{G}}$ in (3) and employing the "regular" sphere decoder presented in [3] to compute the bracketed expression in (3). The full rank and upper triangular nature of $F_1$ is used by the regular sphere decoder to place constraints on the entries of $x_G$ successively so that $\|\tilde{\rho} - F_1 x_G\|^2 \le C$. Such successive constraints also allow computing only as many entries of $\tilde{\rho}$ as required. The regular sphere decoder will find the valid hypothesis for $x_G$ that minimizes the squared Euclidean distance between $\tilde{\rho}$ and $F_1 x_G$ if the squared minimum distance is less than $C$. Otherwise, a failure of the regular sphere decoder for the given hypothesis of $x_{\bar{G}}$ is declared. When a hypothesis pair $(x_G, x_{\bar{G}})$ is found within the sphere, the value of $C$ is updated and the algorithm continues to search through the remaining hypotheses for $x_{\bar{G}}$. If the regular sphere decoder fails for every possible hypothesis of $x_{\bar{G}}$, then the entire algorithm is repeated with a larger value of the squared search radius $C$. We will refer to the GSD based on the exhaustive search over $x_{\bar{G}}$ and the regular sphere decoder for every value of $x_{\bar{G}}$ as the conventional GSD.

The shortcoming of the conventional GSD algorithm described above can be explained as follows. The under-determined system in consideration is interpreted as consisting of $N$ information symbols drawn from $\mathcal{I}_q$ and one super-symbol drawn from $\mathcal{I}_q^{M-N}$. The conventional GSD requires one to go through each of the $q^{M-N}$ possible hypotheses for the super-symbol at least once, irrespective of whether or not the regular sphere decoder fails subsequently for the given search radius $C$. Hence, the philosophy of the sphere decoding algorithm is not effectively applied to the larger super-symbol to predict a priori the values of $x_{\bar{G}}$ for which no part of the effective statistic $\tilde{\rho}$ need be computed. In other words, the knowledge of the failure of the regular sphere decoder for a particular hypothesis of $x_{\bar{G}}$ is not used to predict the failure for any other hypothesis. The new algorithm we propose next is motivated by this key observation and leads to significantly faster ML decoding than the conventional GSD. The algorithm presented here is also a more natural generalization of the sphere decoder to systems with fewer equations than unknowns.
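To make the structure of the conventional GSD concrete, the following Python sketch spells out its two nested searches in the notation above. It is only an illustration under the stated model; the function names are mine, and the inner exhaustive loop stands in for the regular sphere decoder of [3], which would instead perform the tree search with successive interval constraints.

```python
import itertools
import numpy as np

def conventional_gsd(F1, F2, rho, C, q=2):
    """Illustration of the conventional GSD of [1] in the notation of (3).

    F1: (N, N) upper-triangular block, F2: (N, M-N) block of F,
    rho = H^T (H H^T)^{-1} y, C: initial squared search radius.
    Returns (xG, xGbar, squared distance) of the best point found, or
    (None, None, inf) if no point lies inside the sphere.
    """
    N, K = F2.shape                          # K = M - N super-symbol entries
    pam = list(range(-q + 1, q, 2))          # the q-PAM alphabet I_q
    best = (None, None, np.inf)
    for xGbar in itertools.product(pam, repeat=K):   # every super-symbol hypothesis
        xGbar = np.array(xGbar, dtype=float)
        rho_tilde = np.hstack([F1, F2]) @ rho - F2 @ xGbar
        # Stand-in for the regular sphere decoder of [3]: brute force over xG,
        # keeping a point only if it falls inside the current sphere.
        for xG in itertools.product(pam, repeat=N):
            xG = np.array(xG, dtype=float)
            d = float(np.sum((rho_tilde - F1 @ xG) ** 2))
            if d <= C and d < best[2]:
                best = (xG, xGbar, d)
                C = d                        # shrink the search radius
    return best
```

The point of the sketch is that every one of the $q^{M-N}$ super-symbol hypotheses $x_{\bar{G}}$ is visited, no matter how quickly the inner search fails.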

3 A New Generalized Sphere Decoder

To simplify the description of the proposed algorithm, we shall restrict to $q = 2$ so that the signal constellation $\mathcal{I}_q = \{\pm 1\}$ corresponds to the BPSK set. The extension for $q > 2$ is provided at the end of this section. The Euclidean distance in the ML decoding rule (2), for $x \in \mathcal{I}_q^M$, is given by

\[
\|F(\rho - x)\|^2 = \sum_{i=1}^{N} \left( \sum_{j=i}^{M} F_{ij}(\rho_j - x_j) \right)^2 \tag{4}
\]
\[
= \sum_{i=1}^{N-1} \left( \sum_{j=i}^{M} F_{ij}(\rho_j - x_j) \right)^2 + \left( \sum_{j=N}^{M} F_{Nj}(\rho_j - x_j) \right)^2. \tag{5}
\]

As in the ordinary sphere decoding algorithm, the requirement $\|F(\rho - x)\|^2 \le C$ translates into a constraint on the value of each component in the summation given above. Thus,

\[
\|F(\rho - x)\|^2 \le C \;\Longrightarrow\; \left( \sum_{j=N}^{M} F_{Nj}(\rho_j - x_j) \right)^2 \le C \tag{6}
\]
\[
\Longrightarrow\; -\sqrt{C} \;\le\; \sum_{j=N}^{M} F_{Nj}(\rho_j - x_j) \;\le\; \sqrt{C}. \tag{7}
\]

We first make a bijective transformation given by
\[
b_j = \frac{1 + x_j}{2}, \qquad N \le j \le M, \tag{8}
\]
so that $b_j \in \{0, 1\}$, $N \le j \le M$. Defining $a_j = 2F_{Nj}$, $N \le j \le M$, the constraint in (7) becomes
\[
\left( \sum_{j=N}^{M} F_{Nj}(1 + \rho_j) \right) - \sqrt{C} \;\le\; \sum_{j=N}^{M} a_j b_j \;\le\; \left( \sum_{j=N}^{M} F_{Nj}(1 + \rho_j) \right) + \sqrt{C}. \tag{9}
\]
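As a concrete illustration of the transformation (8)-(9), the short sketch below builds the vector $a$ and the interval $[LB, UB]$ from $F$, $\rho$ and $C$ for the BPSK case; it is a minimal sketch under the definitions above, with function and variable names of my own choosing.

```python
import numpy as np

def interval_bounds(F, rho, C, N):
    """Return (a, LB, UB) for the constraint LB <= a^T b <= UB in (9).

    F: (M, M) generalized Cholesky factor whose last M-N rows are zero,
    rho: length-M statistic, C: squared search radius, N: number of equations.
    """
    M = F.shape[0]
    FN = F[N - 1, N - 1:M]                   # F_{Nj}, j = N, ..., M
    a = 2.0 * FN                             # a_j = 2 F_{Nj}
    centre = float(np.sum(FN * (1.0 + rho[N - 1:M])))
    return a, centre - np.sqrt(C), centre + np.sqrt(C)
```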

Let $n = M - N + 1$ and define the vectors $a = [a_N, a_{N+1}, \ldots, a_M]^T$ and $b = [b_N, b_{N+1}, \ldots, b_M]^T$. Let $LB$ and $UB$ represent the lower and upper bounds given in (9). Among all possible $2^n$ binary sequences for the vector $b$, the ones that do not satisfy (9) will not lead to constellation points within the sphere of radius $\sqrt{C}$. The conventional GSD effectively verifies the validity of the inequality in (9) by an exhaustive search. We now propose a systematic method of checking (9) so that not all $2^n$ possible binary sequences need be searched. The new method also does not sacrifice the optimality inherent in the ML decoding rule.

Let $S$ be the set of all possible $2^n$ binary sequences for $b$. Let $2^S$ denote the power set of $S$, i.e., the set of all subsets of $S$. For every set $A \in 2^S$, we associate two quantities, $lb(A)$ and $ub(A)$, that depend only on $a$ and provide lower and upper bounds for the quantity $a^T b$, so that
\[
lb(A) \le a^T b \le ub(A), \qquad \forall\, b \in A. \tag{10}
\]

Define a relation $\preceq_H$ on the subsets of $2^S$ based on the rule that if $A, B \in 2^S$, then $A \preceq_H B$ implies that $lb(A) \le lb(B)$. For any given value of $y$ and $C$, if $lb(A)$ is strictly greater than the upper bound $UB$ in (9), then no sequence in $A$ satisfies the constraint in (9). Additionally, if $A \preceq_H B$, then there is no sequence in $B$ that satisfies (9) either. The subscript $H$ will be provided to explicitly represent the dependence of a generalized inequality or a mapping on the channel matrix $H$. We also introduce the symbol $\sqsubseteq_H$ based on the upper bound $ub(\cdot)$, so that if $A, B \in 2^S$, then $A \sqsubseteq_H B$ implies that $ub(B) \le ub(A)$. Thus, if $ub(A)$ is strictly less than $LB$ and $A \sqsubseteq_H B$, then none of the elements of either $A$ or $B$ will satisfy the constraint in (9).

Consider the construction of a certain disjoint partition of $S$ that leads to the same ordering with respect to $\preceq_H$ when all the elements of every set in the partition are permuted by a mapping $\pi_H$. Such a partition of $S$ is required to be independent of the received statistics $y$, the channel matrix $H$ and even the search radius $C$. Let the result of the possible orderings be summarized into the following array of $D + 1$ set inequalities,

\[
\begin{array}{llll}
\pi_H(S_{0,1}) & \preceq_H \; \pi_H(S_{0,2}) & \preceq_H \; \cdots & \preceq_H \; \pi_H(S_{0,L_0}) \\
\pi_H(S_{1,1}) & \preceq_H \; \pi_H(S_{1,2}) & \preceq_H \; \cdots & \preceq_H \; \pi_H(S_{1,L_1}) \\
\multicolumn{4}{c}{\cdots} \\
\pi_H(S_{D,1}) & \preceq_H \; \pi_H(S_{D,2}) & \preceq_H \; \cdots & \preceq_H \; \pi_H(S_{D,L_D}),
\end{array} \tag{11}
\]
where $\bigcup_{d=0}^{D} \bigcup_{l=1}^{L_d} S_{d,l} = S$ and $S_{d_1,l_1} \cap S_{d_2,l_2} = \emptyset$ if $(d_1, l_1) \ne (d_2, l_2)$. The elements of $S$ are now tested for the constraint in (9) starting with the zeroth row in the array, then the next, until the $D$-th row:
\[
\left( \pi_H(S_{0,1}), \pi_H(S_{0,2}), \ldots, \pi_H(S_{0,L_0}), \ldots, \pi_H(S_{D,1}), \ldots, \pi_H(S_{D,L_D}) \right).
\]
Thus, one goes through all the row inequalities in (11) sequentially. In any row, if the lower bound $lb(\pi_H(S_{d,l}))$ becomes greater than the upper bound in (9), then $\pi_H(S_{d,l})$ and all the subsequent sets to the right of $\pi_H(S_{d,l})$ in the same row, namely $\pi_H(S_{d,l'})$, $l' > l$, need not be checked and the search advances to the next row. This way we may possibly avoid searching through all the $2^n$ hypotheses.

The immediate question that arises is whether such a fixed partition of $S$ and a channel-dependent permutation $\pi_H$ are indeed possible such that the ordering in (11) holds for all $H$. One solution for this problem is provided next. Since the matrix $H$ is known at the receiver, it is possible to sort the entries of $a$ in increasing order to form the vector $a'$, so that

\[
a'_1 \le a'_2 \le \cdots \le a'_n. \tag{12}
\]

Set $\pi_H$ to be the permutation that sends $a'$ back to $a$. Then, $\pi_H(a') = a$ and for any binary sequence $b \in S$, we have that
\[
a'^T b = a^T \pi_H(b). \tag{13}
\]
Let $w(\cdot)$ denote the Hamming weight of a binary sequence. Consider the following subsets of $2^S$:
\[
S_{0,1} = \{0_n\}, \qquad S_{1,1} = \{b \in S \mid w(b) = 1\},
\]
\[
S_{d,l} = \{b \in S \mid w(b) = d,\ b_l = 1 \text{ and } b_j = 0,\ 1 \le j \le l-1\}, \tag{14}
\]
for $2 \le d \le n$, $1 \le l \le L_d$, and $L_d = n - d + 1$ for $d \ge 2$. It is evident that the sets $S_{d,l}$ form a disjoint partition of $S$ and that $|S_{d,l}| = \binom{n-l}{d-1}$ for $d \ge 2$. The following proposition establishes the ordering that we seek in (11).

Proposition 1. For the sets in (14), $lb(\pi_H(S_{1,1})) = a'_1$, $ub(\pi_H(S_{1,1})) = a'_n$, and
\[
lb(\pi_H(S_{d,l})) = \sum_{j=l}^{d+l-1} a'_j, \tag{15}
\]
\[
ub(\pi_H(S_{d,l})) = a'_l + \sum_{j=n-d+2}^{n} a'_j, \tag{16}
\]
for $2 \le d \le n$, $1 \le l \le L_d$, so that $\pi_H(S_{d,l_1}) \preceq_H \pi_H(S_{d,l_2})$ whenever $d \ge 2$, $l_1 \le l_2$.

Proof: Consider $d \ge 2$. The Hamming weight of any element $b \in S_{d,l}$ is $d$ and thus $a^T \pi_H(b) = a'^T b$ is a sum of $d$ components of $a'$. This sum includes $a'_l$ and precludes all entries of $a'$ with index strictly smaller than $l$. Since the vector $a'$ is ordered, the smallest and largest values of such a sum are given by (15) and (16), respectively. Similarly, $a'_1$ and $a'_n$ are lower and upper bounds for $a^T \pi_H(b)$, $b \in S_{1,1}$. Now,
\[
lb(\pi_H(S_{d,l_1})) \le lb(\pi_H(S_{d,l_2})), \qquad \forall\, d \ge 2,\ l_1 \le l_2, \tag{17}
\]
as can be seen by a term-by-term comparison of the two sides in (17) based on (15). Hence, by definition of $\preceq_H$, $\pi_H(S_{d,l_1}) \preceq_H \pi_H(S_{d,l_2})$ whenever $d \ge 2$, $l_1 \le l_2$.

Before searching through all the elements of a subset $\pi_H(S_{d,l})$ in the order specified in (11), we first check that both of the following conditions are satisfied:

\[
lb(\pi_H(S_{d,l})) \le UB \quad \text{and} \quad ub(\pi_H(S_{d,l})) \ge LB. \tag{18}
\]

If the first condition in (18) is not met, then all the subsets $\pi_H(S_{d,l'})$, $l' \ge l$, can be discarded. If the second condition in (18) is not met, then the set $\pi_H(S_{d,l})$ can be discarded but not the ones given by $\pi_H(S_{d,l'})$, $l' > l$. This is because the ordering for every $d$ is with respect to $\preceq_H$ and not $\sqsubseteq_H$. To make the sets $\pi_H(S_{d,l})$, $1 \le l \le L_d$, ordered with respect to $\sqsubseteq_H$ as well, the assignment of the upper bound $ub(\pi_H(S_{d,l}))$ must be weakened to the following quantity:

\[
ub(\pi_H(S_{d,l})) = \sum_{j=n-d+1}^{n} a'_j, \qquad 2 \le d \le n,\ 1 \le l \le L_d. \tag{19}
\]

The upper bound in (19) is, in fact, the maximum possible value of $a^T b$ among all binary sequences $b$ with Hamming weight $d$. The upper bound in (19) is quite loose, but theoretically, defining $lb(\pi_H(S_{d,l}))$ and $ub(\pi_H(S_{d,l}))$ by (15) and (19), respectively, allows expurgation of all the sets $\pi_H(S_{d,l'})$ for $l' \ge l$ if either of the two conditions in (18) is not met.
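The following sketch illustrates Proposition 1 and the check in (18): it computes the Depth 1 bounds (15)-(16) for a group $S_{d,l}$ from the sorted vector $a'$ and reports whether the group can be skipped, or whether the rest of its row in (11) can be skipped as well. It is a hedged illustration, not the authors' implementation, and the function names are mine.

```python
import numpy as np

def depth1_bounds(a_sorted, d, l):
    """Bounds (15)-(16) of Proposition 1 for the group S_{d,l} (1-based d, l).

    a_sorted: the vector a' sorted in increasing order.  d == 1 returns the
    bounds for S_{1,1}.
    """
    n = len(a_sorted)
    if d == 1:
        return a_sorted[0], a_sorted[-1]
    lb = float(np.sum(a_sorted[l - 1 : l - 1 + d]))             # a'_l + ... + a'_{l+d-1}
    ub = a_sorted[l - 1] + float(np.sum(a_sorted[n - d + 1:]))  # a'_l + (d-1) largest
    return lb, ub

def group_decision(lb, ub, LB, UB):
    """Decision implied by (18): search the group, skip only this group, or
    skip this group together with the rest of its row in (11)."""
    if lb > UB:
        return "skip rest of row"
    if ub < LB:
        return "skip group"
    return "search group"
```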

If both the conditions in (18) are satisfied for the current set $\pi_H(S_{d,l})$, then we go through each element of $\pi_H(S_{d,l})$ and execute a regular sphere decoder as explained next. Let $x_b$ denote the $n$-length vector of BPSK symbols corresponding to the binary sequence $b \in \pi_H(S_{d,l})$. Define the following sub-blocks of the matrix $F$:

\[
F'_1 \in \mathbb{R}^{(N-1) \times (N-1)}, \quad F'_1(i,j) = F(i,j), \quad 1 \le i, j \le N-1,
\]
\[
F'_2 \in \mathbb{R}^{(N-1) \times (M-N+1)}, \quad F'_2(i,j) = F(i, j+N-1), \quad 1 \le i \le N-1,\ 1 \le j \le M-N+1.
\]

The regular sphere decoder is executed with the effective received statistics $\tilde{\rho} = [F'_1, F'_2]\rho - F'_2 x_b$, the effective generator matrix $F'_1$ and the squared search radius set to $C - \big| \sum_{j=N}^{M} F_{Nj}(\rho_j - x_j) \big|^2$. If the sphere decoder succeeds in finding a valid solution for $x_j$, $1 \le j \le N-1$, the value of $C$ is updated to the minimum distance thus found and the corresponding lattice point is stored as the current solution. The bounds $LB$ and $UB$ in (9) are then updated. Since we change the quantities $LB$ and $UB$ while searching through the elements of $\pi_H(S_{d,l})$, we must verify that the constraint in (9) is met before executing the regular sphere decoder each time. After all the sequences in the set $S_{d,l}$ have been tested, the algorithm proceeds to verify the validity of (18) for the remaining groups in the order specified by (11). If no valid lattice point is found after either checking or discarding every set $\pi_H(S_{d,l})$, the value of $C$ is increased by some factor and all the sets are searched again in the same order specified by (11). If all the sets have been checked or discarded, and the current solution is non-empty, the algorithm terminates, declaring the current solution as optimal.

The proposed algorithm is now extended for $q = 2^m$ for some positive integer $m$. Replace the bijective transformation in (8) by

\[
b_j = \frac{q - 1 + x_j}{2}, \qquad N \le j \le M,
\]
so that $0 \le b_j \le q-1$ for $N \le j \le M$. Consider the binary representation $b_j = \sum_{i=0}^{m-1} 2^i b_{i,j}$, with $b_{i,j} \in \{0, 1\}$. We get that $\sum_{j=N}^{M} a_j b_j = \sum_{j=N}^{M} \sum_{i=0}^{m-1} (2^i a_j) b_{i,j}$. Let $n = (M - N + 1)m$. Define the $n$-length vectors $a = [a_N, 2a_N, \ldots, 2^{m-1} a_N, \ldots, a_M, 2a_M, \ldots, 2^{m-1} a_M]$ and $b = [b_{0,N}, \ldots, b_{m-1,N}, \ldots, b_{0,M}, \ldots, b_{m-1,M}]$ and proceed as before from (12).
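The sketch below is a worked example of this bit decomposition for $q = 2^m$: each $q$-PAM coordinate is expanded into $m$ binary variables together with the correspondingly scaled coefficients, so that the linear form $a^T b$ is preserved. The helper name is illustrative only.

```python
import numpy as np

def expand_qpam(a, x, q):
    """Expand q-PAM coordinates x_j and weights a_j into the binary vector b
    and the scaled weight vector used from (12) onward (q = 2^m)."""
    m = int(np.log2(q))
    a_exp, b_exp = [], []
    for aj, xj in zip(a, x):
        bj = (q - 1 + xj) // 2                       # integer in {0, ..., q-1}
        bits = [(bj >> i) & 1 for i in range(m)]     # b_{0,j}, ..., b_{m-1,j}
        a_exp.extend((2 ** i) * aj for i in range(m))
        b_exp.extend(bits)
    return np.array(a_exp, dtype=float), np.array(b_exp, dtype=int)

# The expansion preserves the linear form a^T b:
a = np.array([0.7, -1.2])
x = np.array([3, -1])                                # two 4-PAM symbols
a_exp, b_exp = expand_qpam(a, x, q=4)
assert np.isclose(a_exp @ b_exp, a @ ((4 - 1 + x) / 2))
```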

4 Multiple Depth GSD

In the previous section, we formulated a method for grouping $n$-length binary sequences into disjoint groups $\pi_H(S_{d,l})$ and using the information about a group, namely $ub(\pi_H(S_{d,l}))$ and $lb(\pi_H(S_{d,l}))$, to facilitate the rejection of several sequences. For large $n$, the cardinality of some sets $S_{d,l}$ may be quite large, such as for $l = 1$, $d \approx n/2$. For such sets, the interval $[lb(\pi_H(S_{d,l})), ub(\pi_H(S_{d,l}))]$ may be very wide. Thus, we may not reject many sequences, and the number of hypotheses tested may even be comparable to the total number of possibilities $2^n$. We now propose an extension of the method in Section 3 to accommodate the increase in the average group size with increasing $n$.

The multiple depth GSD is obtained by a simple recursive application of the same key idea of dividing a group of sequences into ordered subgroups. For illustration, consider the equations (11) and the partition of $S$ given by (14). Suppose, for $d \ge 3$ and $1 \le l \le L_d$, both the relations in (18) are true so that all elements of the set $\pi_H(S_{d,l})$ need to be searched. We now further partition the set $S_{d,l}$ into disjoint subsets that can be ordered with respect to $\preceq_H$ after the permutation by $\pi_H$. Consider the following subdivision of $S_{d,l}$ for $d \ge 3$:

\[
S_{d,l} = \bigcup_{v=1}^{n-l-d+2} S_{d,l,v}, \qquad 1 \le l \le n-d+1,
\]
\[
S_{d,l,v_1} \cap S_{d,l,v_2} = \emptyset, \qquad v_1 \ne v_2,
\]
where the subsets $S_{d,l,v}$ are obtained as
\[
S_{d,l,v} = \{ b \in S_{d,l} \mid b_{l+v} = 1,\ b_{l+j} = 0,\ 1 \le j \le v-1 \}, \tag{20}
\]
for $1 \le v \le n - l - d + 2$. The quantities $lb$ and $ub$ are defined as
\[
lb(\pi_H(S_{d,l,v})) = a'_l + \sum_{j=l+v}^{l+v+d-2} a'_j, \tag{21}
\]
\[
ub(\pi_H(S_{d,l,v})) = a'_l + a'_{l+v} + \sum_{j=n-d+3}^{n} a'_j. \tag{22}
\]

A term-by-term comparison of (21) for $v = v_1$ and $v = v_2 > v_1$ shows that $lb(\pi_H(S_{d,l,v_1})) \le lb(\pi_H(S_{d,l,v_2}))$ and so the sets $\pi_H(S_{d,l,v})$ are indeed ordered according to $\preceq_H$, i.e.,
\[
\pi_H(S_{d,l,1}) \preceq_H \pi_H(S_{d,l,2}) \preceq_H \cdots \preceq_H \pi_H(S_{d,l,n-l-d+2}).
\]

All the elements of the set $\pi_H(S_{d,l,v})$ are checked only if the following two conditions are satisfied:
\[
lb(\pi_H(S_{d,l,v})) \le UB \quad \text{and} \quad ub(\pi_H(S_{d,l,v})) \ge LB. \tag{23}
\]

Note that $lb(\pi_H(S_{d,l,1})) = lb(\pi_H(S_{d,l}))$ and so $lb(\pi_H(S_{d,l,1})) \le UB$ because (18) is true. If $lb(\pi_H(S_{d,l,v})) > UB$ for some $v > 1$, then all the sets $\pi_H(S_{d,l,v'})$, $v' \ge v$, can be discarded. If $ub(\pi_H(S_{d,l,v})) < LB$, then the set $\pi_H(S_{d,l,v})$ can be discarded but not $\pi_H(S_{d,l,v'})$ for $v' > v$, because the sets $\pi_H(S_{d,l,v})$ for a fixed $d$ and $l$ are not ordered according to $\sqsubseteq_H$. Once again we can weaken the upper bound $ub$ to
\[
ub(\pi_H(S_{d,l,v})) = a'_l + \sum_{j=n-d+2}^{n} a'_j, \tag{24}
\]
so that $ub(\pi_H(S_{d,l,v})) < LB$ would imply that all the sets $\pi_H(S_{d,l,v'})$, $v' \ge v$, can be discarded.

We refer to the proposed algorithm involving the partition of $S$ into the sets $S_{d,l}$ only as a Depth 1 GSD, and the algorithm that additionally involves the sets $S_{d,l,v}$ as a Depth 2 GSD. Further splitting of $S_{d,l,v}$ into disjoint ordered groups leads to higher depth GSD schemes. In the particular partitioning scheme described above, the subsets of $2^S$ consisting of sequences of Hamming weight up to $X > 1$ are chosen to be the same for both the Depth $X$ and the Depth $X-1$ GSD. As the depth of the algorithm increases, the maximum cardinality of the subsets in the partition of $S$ decreases and the lower and upper bounds $lb$ and $ub$ become tighter.
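To make the recursion concrete, the following sketch complements the Depth 1 helper shown earlier with the Depth 2 bounds (21)-(22); it is an illustrative assumption about one way to code the subdivision, not the authors' implementation.

```python
import numpy as np

def depth2_bounds(a_sorted, d, l, v):
    """Bounds (21)-(22) for the subgroup S_{d,l,v} (1-based d, l, v)."""
    n = len(a_sorted)
    # (21): a'_l plus the d-1 consecutive entries a'_{l+v}, ..., a'_{l+v+d-2}
    lb = a_sorted[l - 1] + float(np.sum(a_sorted[l + v - 1 : l + v + d - 2]))
    # (22): a'_l + a'_{l+v} plus the d-2 largest entries a'_{n-d+3}, ..., a'_n
    ub = a_sorted[l - 1] + a_sorted[l + v - 1] + float(np.sum(a_sorted[n - d + 2:]))
    return lb, ub
```

A Depth 2 GSD would evaluate these bounds for every subgroup of a group that survives (18), skipping the remaining subgroups of that group as soon as the lower bound exceeds $UB$.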

5 Numerical Results

In this section, we present a numerical comparison of the proposed algorithm with the conventional GSD. The purpose of this section is to show the potential advantage available from a good partitioning scheme for $S$ by presenting the results for the specific partitions provided in this paper. In the first subsection, we discuss the uncoded BLAST system with fewer receive than transmit antennas. In the second subsection, we discuss the application of the improved GSD algorithm to the Threaded Algebraic Space-Time (TAST) codes [7].

The proposed GSDs are implemented for depths up to 4. The Depth 1 GSD is obtained from the ordering rule in (11) and the partitioning scheme in (14); the values of $lb(\pi_H(S_{d,l}))$ and $ub(\pi_H(S_{d,l}))$ are taken from (15) and (16), respectively. The Depth 2 GSD is obtained from the additional partitioning in (20), with the values of $lb(\pi_H(S_{d,l,v}))$ and $ub(\pi_H(S_{d,l,v}))$ taken from (21) and (22), respectively. The Depth 3 and Depth 4 GSDs are obtained by further partitioning of $S_{d,l,v}$ as described in Section 4. The conventional GSD is obtained from any of the above GSDs by not exploiting the bounds $lb$ and $ub$ and exhaustively searching through all binary sequences in $S$.

The complexity of the decoders is given in terms of the total number of CPU cycles required to perform the arithmetic operations in the algorithm. We count 1, 4, 7 and 20 cycles for each instance of integer addition/subtraction, floating-point addition/subtraction, floating-point multiplication and the square-root operation, respectively. Since the sets $S_{d,l}$ and $S_{d,l,v}$ are fixed irrespective of the channel realization, they are generated off-line and provided as a multi-dimensional array to the function that implements the GSD algorithm.
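As the text notes, the groups depend only on $n$ and can therefore be tabulated once off-line. The sketch below enumerates the Depth 1 groups $S_{d,l}$ of (14) by brute force; it is only an illustration of how such a table could be built, not the authors' code.

```python
import itertools

def generate_groups(n):
    """Enumerate the Depth 1 groups S_{d,l} of (14) for n-length sequences.

    Returns a dict mapping (d, l) to the list of binary tuples in S_{d,l};
    key (0, 1) holds the all-zero sequence and (1, 1) the weight-1 sequences.
    """
    groups = {(0, 1): [tuple([0] * n)], (1, 1): []}
    for b in itertools.product((0, 1), repeat=n):
        d = sum(b)
        if d == 1:
            groups[(1, 1)].append(b)
        elif d >= 2:
            l = b.index(1) + 1               # 1-based position of the leading one
            groups.setdefault((d, l), []).append(b)
    return groups

# |S_{d,l}| = C(n-l, d-1) for d >= 2; e.g. n = 4 gives |S_{2,1}| = 3.
assert len(generate_groups(4)[(2, 1)]) == 3
```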

5.1 BLAST Scheme

The system considered in this subsection is comprised of an $N_t$ transmit, $N_r$ receive antenna fading channel. Independent symbols from the 4-QAM constellation are transmitted from each antenna in every symbol interval. The received statistics in a given symbol interval for the BLAST scheme can be written as
\[
r = \sqrt{\frac{\gamma}{N_t}}\, M s + w, \tag{25}
\]
where $M \in \mathbb{C}^{N_r \times N_t}$ is the matrix of fading coefficients, $w \in \mathbb{C}^{N_r \times 1}$ is the noise vector and $s \in (\text{4-QAM})^{N_t}$ is the vector of transmitted symbols. The entries of $M$ and $w$ are independent and identically distributed zero-mean complex Gaussian random variables with unit variance. Under the quasi-static assumption, the fading process remains constant for $T$ symbol periods before the next realization. The quantity $\gamma$ is the SNR per bit. The equivalent real model of (25) consists of $2N_r$ equations and $2N_t$ integer unknowns that assume values from $\{\pm 1\}$.

When $N_r \ge N_t$, the regular sphere decoding algorithm is applicable. However, a large number of receive antennas may not always be practical, for instance, on a space-limited mobile receiver. In such a case, ML decoding can be performed using the GSD. We compute the total number of CPU cycles taken by the conventional GSD and the proposed GSDs to decode a common sequence of channel realizations in (25). The operations involved in the Cholesky factorization and other operations that depend only on $M$ are not counted because these can be neglected as $T \to \infty$. The initial value of the squared search radius $C$ is chosen to be $N_r$. If no constellation point is found within the initial sphere, the algorithm is repeated with the squared search radius increased by a factor of 2.

An interesting trend is observed when we plot the ratio of the total CPU cycles for the proposed GSDs to the total CPU cycles for the conventional GSD. In Figure 1, such a comparison is made for $N_r = 2$, $T = 10$, $4 \le N_t \le 9$ and 3000 fading realizations. Not only do the proposed GSDs require fewer CPU cycles on average, but the ratio corresponding to the best of the proposed GSDs decreases with increasing $N_t$. For a fixed-depth GSD, the ratio eventually tends to 1 for large $N_t$, as predicted in Section 4. We therefore propose to increase the depth of the GSD with increasing $N_t$ so that the rate of increase in complexity of the multiple depth GSD is strictly less than that of the conventional GSD.
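The reduction of the complex model (25) to the equivalent real model with $2N_r$ equations and $2N_t$ unknowns follows the standard stacking of real and imaginary parts. The sketch below illustrates that step generically (the scaling factor $\sqrt{\gamma/N_t}$ is assumed to be absorbed into the channel matrix); it is not code from the paper. For 4-QAM, each of the $2N_t$ real unknowns then takes values in $\{\pm 1\}$, so the decoder of Section 3 applies with $q = 2$.

```python
import numpy as np

def complex_to_real(Mc, rc):
    """Stack real and imaginary parts so that rc = Mc @ sc + wc becomes a real
    system with 2*Nr equations and 2*Nt unknowns (the 4-QAM alphabet maps to
    {+1, -1} per real dimension)."""
    H = np.block([[Mc.real, -Mc.imag],
                  [Mc.imag,  Mc.real]])      # (2*Nr, 2*Nt) real channel matrix
    y = np.concatenate([rc.real, rc.imag])   # (2*Nr,) real observation
    return H, y
```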

5.2 Full-Layer TAST Codes

Space-time codes are a means of achieving higher transmit diversity than the BLAST scheme on a quasi-static fading channel. The family of TAST codes presented in [7] is designed to achieve the full transmit diversity $N_t$ and a rate of $L$ ($1 \le L \le N_t$) information symbols per channel use on a MIMO Rayleigh fading channel. The equivalent linear model for decoding each TAST codeword consists of $LN_t$ unknowns and $N_t N_r$ equations. It was recommended in [7] that the TAST codes should transmit at the maximum rate of $L = \min(N_t, N_r)$ symbols per channel use to prevent the system from becoming under-determined. However, for $L = N_r < N_t$, the capacity of the equivalent channel provided by the $L$-layer TAST code decreases to the capacity of an $N_r$ transmit, $N_r$ receive antenna system at the same SNR. The full capacity of the $N_t$ transmit, $N_r$ receive antenna system can be achieved only with the full-layer ($L = N_t$) TAST code, which unfortunately makes the system under-determined for $N_r < N_t$. Hence, the GSD algorithm is required to decode the full-layer TAST code in this scenario, and the size of the ML part of the conventional GSD in the equivalent real model is $2N_t(N_t - N_r)$.

The well-known result that the capacity of the Rayleigh fading channel grows as $\min(N_t, N_r) \log_2(\mathrm{SNR})$ [10, 11] suggests that the difference in the capacities of the $N_t \times N_r$ and $N_r \times N_r$ systems may not be significant for $N_r < N_t$. However, for reasonably small values of $N_t$ and $N_r$, the difference is not negligible. For instance, the capacity at an SNR of 30 dB for the $(N_t = 1, N_r = 1)$ system is 9.14 bits/channel use and that for the $(N_t = 3, N_r = 1)$ system is 9.71 bits/channel use. This is also reflected in the better performance of the 3-layer TAST code compared to the 1-layer TAST code for the 3 transmit antenna, 1 receive antenna channel, as shown in Figure 2. Thus, to enjoy the improved performance of the full-layer TAST code with $N_r < N_t$, one must deal with the high complexity of the generalized sphere decoder.

In Figure 3, the ratio of the total number of CPU cycles to decode 1000 codeword matrices from a 3-layer TAST code for $(N_t = 3, N_r = 1)$ at 30 dB SNR with 4-QAM symbols is shown for the Depth $X$ GSD, $1 \le X \le 4$, relative to the conventional GSD. The Depth 4 GSD decreases the total CPU cycles by a factor of 1.8 compared to the conventional GSD. While the Depth 4 GSD still requires about 460 times more CPU cycles than the sphere decoder for the 1-layer TAST code, it would be interesting to see how this factor for a higher depth GSD or a different set partitioning scheme compares to the factor of 27, which corresponds to a cubic increase with 3 times the number of unknowns.

6 Conclusions

A new generalized sphere decoding algorithm was proposed and shown to be much faster than the existing algorithm presented in [1]. The key idea of partitioning the set of all maximum likelihood hypotheses into disjoint ordered groups was applied in conjunction with the sphere decoding algorithm. It was recognized that the complexity of the proposed algorithm depends upon the largest size of the subgroups in the partition and, therefore, further improvement was achieved with the concept of the multiple-depth GSD algorithm. Higher depths of the GSD algorithm are necessary to effectively reduce the complexity for severely under-determined systems.

Figure 1: Ratio of total CPU cycles with respect to the conventional GSD for BLAST ($N_r = 2$, 4-QAM, 28 dB SNR/bit; horizontal axis: $N_t$; curves: conventional GSD and Depth 1 to Depth 4 GSDs).

Figure 2: Performance of TAST codes with 3 transmit and 1 receive antenna (6 bits/channel use; horizontal axis: average received SNR in dB; vertical axis: frame error probability; curves: 1-layer and 3-layer TAST codes).

Figure 3: Ratio of total CPU cycles for the full-layer TAST code, $N_t = 3$, $N_r = 1$, 30 dB SNR, Depth $X$ GSD versus the conventional GSD: conventional 1, $X = 1$: 0.89, $X = 2$: 0.77, $X = 3$: 0.64, $X = 4$: 0.55.

References

[1] M. O. Damen, K. Abed-Meraim, and J.-C. Belfiore, "Generalised sphere decoder for asymmetrical space-time communication architecture," IEE Electronics Letters, vol. 36, no. 2, pp. 166-167, Jan. 2000.

[2] U. Fincke and M. Pohst, "Improved methods for calculating vectors of short length in a lattice, including a complexity analysis," Math. Comput., vol. 44, pp. 463-471, Apr. 1985.

[3] E. Viterbo and J. Boutros, "A universal lattice code decoder for fading channels," IEEE Trans. Inform. Theory, vol. 45, no. 5, pp. 1639-1642, July 1999.

[4] C. P. Schnorr and M. Euchner, "Lattice basis reduction: Improved practical algorithms and solving subset sum problems," Math. Programming, vol. 66, pp. 181-191, 1994.

[5] B. Hassibi and B. M. Hochwald, "High-rate codes that are linear in space and time," IEEE Trans. Inform. Theory, vol. 48, no. 7, pp. 1804-1824, July 2002.

[6] M. O. Damen, K. Abed-Meraim, and J.-C. Belfiore, "Diagonal algebraic space-time block codes," IEEE Trans. Inform. Theory, vol. 48, no. 3, pp. 628-636, Mar. 2002.

[7] H. El Gamal and M. O. Damen, "Universal space-time coding," IEEE Trans. Inform. Theory, vol. 49, no. 5, pp. 1097-1119, May 2003.

[8] L. Brunel and J. Boutros, "Euclidean space lattice decoding for joint detection in CDMA systems," in Proc. IEEE Information Theory Workshop (ITW'99), June 1999.

[9] M. O. Damen, K. Abed-Meraim, and M. S. Lemdani, "Further results on the sphere decoder," in Proc. IEEE Intl. Symposium on Information Theory, June 2001.

[10] G. J. Foschini, "Layered space-time architecture for wireless communication in fading environments when using multiple antennas," Bell Labs Tech. J., vol. 1, no. 2, pp. 41-59, Autumn 1996.

[11] L. Zheng and D. N. C. Tse, "Diversity and multiplexing: A fundamental tradeoff in multiple antenna channels," IEEE Trans. Inform. Theory, vol. 49, pp. 1073-1096, May 2003.
