On-line Arithmetic for Detection in Digital Communication Receivers
Sridhar Rajagopal and Joseph R. Cavallaro Center for Multimedia Communication Department of Electrical and Computer Engineering Rice University Houston TX 77005. fsridhar,
[email protected]
Abstract

This paper demonstrates the advantages of using on-line arithmetic for traditional and advanced detection algorithms in communication systems. Detection is one of the core computationally intensive physical layer operations in a communication receiver and determines the communication data rates. Detection algorithms typically involve hard decisions (sign-based testing) to find the sign of the transmitted information bit. This implies extraneous computations in a conventional number system, as the sign is obtained only at the end due to the Least Significant Digit First (LSDF) nature of the computations. On-line arithmetic, based on a signed-digit number representation, provides Most Significant Digit First (MSDF) computation. Hence, the computations can stop after the first non-zero MSD (sign) is computed, and additional computations for the successive digits can be avoided. Back-conversion to a conventional number system is also not required, as the sign of the digit represents the detected bit. A comparison of a radix-4 digit-serial on-line detector with an 8-bit parallel conventional arithmetic detector shows a decrease in latency by 1.9X, a 3X increase in throughput and possible savings in area. We show that similar benefits are also possible with on-line arithmetic for decoding algorithms in communication receivers.
Keywords: on-line arithmetic, MSDF, wireless communication receivers, detection, decoding, W-CDMA, VLSI architecture
I. Introduction
Next generation communication systems based on W-CDMA (Wideband Code Division Multiple Access) are being designed to increase data rates by orders of magnitude and to enhance performance significantly in order to support real-time multimedia services [1]. However, the increase in algorithmic complexity required to support multimedia capabilities has not been matched by hardware capable of real-time implementation. Detection is one of the core baseband processing operations in a digital communication receiver and is used to recover the transmitted bits after they have been corrupted by noise and other interference present in the channel. The data rates that can be achieved in a communication system depend on the speed of detection and hence, it is critical to accelerate detection to meet real-time performance requirements. Several advanced detection schemes [2, 3] have been proposed for future communication receivers that provide better performance in terms of lower bit error rates. However, even though these algorithms are sub-optimal, they have not been implemented in practical systems because of their high implementation complexity [4]. In this paper, we demonstrate the use of non-conventional redundant number representations using on-line arithmetic to accelerate the implementation of traditional as well as advanced detection algorithms, to help meet real-time requirements for next generation communication systems.

Figure 1 shows the main blocks in a digital communication receiver. The physical baseband layer in a communication receiver involves operations to detect and decode the information bits that are transmitted. Sophisticated algorithms for channel estimation, detection and decoding are applied at the receiver to determine the transmitted bits. Thus, many high-precision operations are performed to accurately estimate the channel, detect the transmitted bits and decode the information.
Based on these operations, a hard decision (typically by sign-based testing) is made on the information bits that are sent, which are typically ±1's, assuming a Binary Phase Shift Keying (BPSK) system. Thus, high-precision algorithms are used to determine only the sign of the result. This represents extraneous computation even with finite precision implementations, as the sign is obtained only at the end of the computations. Sometimes, intermediate stages in advanced schemes for detection and decoding also use soft decisions for performance benefits, where the entire precision is kept to help make an improved hard decision with higher accuracy at the end of detection.

Figure 1: Block diagram of a digital communication receiver (antenna, RF unit, A/D, demux, channel estimation, detection with hard decision d or soft decision y, decoding, decoded information bits). The detector produces either single-bit precision outputs (hard decisions) of the detected bits or higher precision outputs (soft decisions). The final decision on the transmitted information is made at the decoder.

On-line arithmetic has been shown to be very useful for many signal processing applications such as DCT, FFT, CORDIC, filtering and matrix-based operations [5, 6]. In conventional arithmetic, all digits of the result must be computed before the next operation can be started. In on-line arithmetic, however, the operands as well as the results flow through the computations in a digit-by-digit manner, starting from the most significant digit (MSDF). On-line arithmetic [7, 8, 9] has been developed and researched extensively over the past two decades. Its advantages are that it eliminates carry-propagate addition, reduces the interconnection bandwidth between modules and allows parallelism between several operations. With a serial data flow, on-line arithmetic can be pipelined to implement sophisticated algorithms. As carry propagation is eliminated, on-line operations can be overlapped. On-line arithmetic has been shown to provide a speed-up of 2-16X [10] for conventional numerical operations. The implementation tradeoffs related to the applicability of on-line arithmetic are its need for a non-conventional number system, conversions to and from a conventional system [11] and the inherently serial nature of the operations.

In this paper, we show that on-line arithmetic also has immense potential for use in detection in communication systems. We present on-line implementations for both traditional and advanced detection algorithms. As communication systems are special-purpose applications, there is no real need to maintain a conventional number representation.
On-line arithmetic can be used in a communication receiver with almost the same numerical accuracy and precision, and with significant performance acceleration in terms of speed and latency reduction. Also, an MSDF representation allows us to stop computations as soon as the first non-zero MSD (sign) has been calculated. This not only avoids the computation of the successive digits, but also eliminates the need for back-conversion to a conventional number system. Since the operations of detection and decoding are typical in digital communication receivers, on-line arithmetic can be applied to a wide range of detection and decoding algorithms. As detection and decoding are the final blocks in the chain of computations in the physical layer of the receiver, other blocks such as channel estimation, shown in Figure 1, can also be implemented using on-line arithmetic and pipelined with detection and decoding. The inputs generated serially from the A/D converter in Figure 1 are inherently MSDF and hence, the on-line algorithms can increase the overall speed by overlapping computations with the conversion. Other signal processing applications involving sign-based computations can also benefit from an on-line approach.

The organization of the paper is as follows. Section 2 gives a brief introduction to on-line arithmetic. Section 3 presents the traditional and advanced detection algorithms that are to be implemented using on-line arithmetic. The advantages in speed and area of on-line arithmetic over conventional arithmetic for detection are presented in Section 4. Section 5 shows the potential for also implementing other units in the receiver chain using on-line arithmetic. Section 6 presents the summary and future work. A review of the on-line addition and on-line multiplication used in this paper is given in the Appendix.
II. Background on on-line arithmetic
On-line arithmetic algorithms [11] work in a digit-serial manner, producing the result in an MSDF fashion. To generate the first output digit, $\delta$ digits of the input are required. Thereafter, with each digit of the input, a new digit of the result can be obtained. The on-line delay $\delta$ is typically a small integer, e.g., 1 to 4. Since the outputs are produced serially, the algorithms can be pipelined with a latency of $\delta$ as shown below:

    Input x:    x1 x2 x3 x4 x5 ....
    Input y:    y1 y2 y3 y4 y5 ....
    Output z:   |<- δ ->| z1 z2 z3 z4 z5 ....
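The timing in the diagram above can be sketched with a very simple model. The sketch below is only illustrative: it assumes one digit per clock cycle, that input digit j arrives in cycle j, and that units are chained so each one starts as soon as its predecessor emits a digit. The function names are hypothetical and not taken from the paper.

```python
def output_cycle(j, delta):
    """Cycle in which result digit z_j is available, assuming input
    digit x_j arrives in cycle j and the unit has on-line delay delta."""
    return j + delta


def chain_latency(num_digits, deltas):
    """Cycles until the last result digit leaves a chain of on-line units
    (one digit per cycle; each unit adds only its own on-line delay)."""
    return num_digits + sum(deltas)


# Two chained radix-4 units with delta = 1 each, 4-digit operands.
first_digit_ready = output_cycle(1, 1)
total_cycles = chain_latency(4, [1, 1])
```

Under these assumptions the chain finishes in `num_digits + sum(deltas)` cycles, rather than the sum of the full latencies of each operation, which is the overlap that the digit-serial MSDF data flow makes possible.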
In order to achieve MSDF operations, on-line algorithms need to use a redundant number system [12] for carry-free addition. The on-line representation of a number x is given by [7]

$$ X_j = X_{j-1} + x_{j+\delta}\, r^{-(j+\delta)}, \qquad X_0 = \sum_{i=1}^{\delta} x_i\, r^{-i} $$

where $X_j$ represents the value of the addends at step $j$, $r$ is the radix of the redundant number system and $\delta$ is the on-line delay. The digits $x_i$ belong to a redundant digit set $\{-\rho, \ldots, -1, 0, 1, \ldots, \rho\}$ (assumed symmetric), where $\rho$, with $r/2 \le \rho \le r-1$, represents the amount of redundancy in the number system. For our system, we shall assume $\rho = r - 1$ for maximum redundancy, as this will allow the inputs to the system to be directly acceptable in redundant form for on-line operations. We choose a radix-4 system to reduce the on-line delay to $\delta = 1$ for multiplication and addition [7], as it requires the least number of gates [13]. In our algorithms, we shall assume fixed-point inputs with 8-bit precision. The operations that need to be performed for detection are on-line addition and multiplication, which are presented in the Appendix.
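As an illustration of this representation, the following sketch accumulates the partial values $X_j$ of a radix-4 signed-digit fraction and reads its sign from the first non-zero digit. It assumes a finite digit string with $\rho = r - 1 = 3$, in which case the remaining digits can change the value by strictly less than one unit of the first non-zero position, so the sign cannot flip. The helper names are hypothetical, and exact rationals stand in for hardware registers.

```python
from fractions import Fraction

R, RHO = 4, 3  # radix and digit bound (rho = r - 1, maximally redundant)


def partial_values(digits, delta=1):
    """Return [X_0, X_1, ...] using X_0 = sum_{i<=delta} x_i r^-i and
    X_j = X_{j-1} + x_{j+delta} r^-(j+delta); digits are MSD first."""
    assert all(-RHO <= d <= RHO for d in digits)
    x = sum((Fraction(d, R**i)
             for i, d in enumerate(digits[:delta], start=1)), Fraction(0))
    values = [x]
    for i, d in enumerate(digits[delta:], start=delta + 1):
        x += Fraction(d, R**i)
        values.append(x)
    return values


def msdf_sign(digits):
    """Return (sign, digits_examined): the sign is that of the first
    non-zero digit, so later digits never need to be inspected."""
    for k, d in enumerate(digits, start=1):
        if d != 0:
            return (1 if d > 0 else -1), k
    return 0, len(digits)


digits = [0, 1, -3, -3]          # 1/16 - 3/64 - 3/256 = 1/256 > 0
values = partial_values(digits)
sign, examined = msdf_sign(digits)
```

Here the sign is known after two digits even though the trailing digits pull the value down to 1/256; this is the early-termination property that a hard-decision detector can exploit.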
III. Detection algorithms implemented in a W-CDMA communication receiver
We first show the advantages of on-line arithmetic for the traditional and widely implemented matched filter detector [3] (the single user detector) as an initial example. We then present one of the advanced multiuser detection algorithms proposed to achieve better performance in wireless communication receivers.

A. Synchronous matched filter detector

A synchronous detector assumes that the different users transmit their information to the receiver at the same time, i.e., all users are synchronized in time with respect to each other. This makes detection a simpler task, as the delays of the different users need not be accounted for. This is a valid assumption for the downlink, when a base-station is talking to a mobile receiver. A matched filter detector is also termed a single user detector, as it ignores the effect of interference due to the other users.

The detection problem can be viewed as a least squares problem. Let $r_i \in \mathbb{R}^{N}$ be the received signal and $A \in \mathbb{R}^{N \times K}$ be the cross-correlation matrix obtained from channel estimation. $N$ is the length of the spreading code (also known as the spreading gain or the spreading factor) and $K$ is the number of users in the system. Let $d_i \in \{+1, -1\}^{K}$ be the bits of the $K$ users to be detected. Then, the system can be formulated as given below:

$$ r_i = A d_i + \eta_i \qquad (1) $$
where $\eta_i$ is the Additive White Gaussian Noise (AWGN) in the system. A least-squares solution to the problem can then be derived as

$$ A^H r_i = A^H A\, d_i + A^H \eta_i \qquad (2) $$

$$ d_i = \mathrm{sign}\left( (A^H A)^{-1} A^H r_i \right) \qquad (3) $$
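The step from the model (1) to the detector (3) can be spelled out as follows. This is a sketch under two assumptions that the text does not state explicitly: that $A$ has full column rank, so that $A^H A$ is invertible, and that the bit vector is first relaxed to a real-valued vector $d$ before the sign is taken.

```latex
\begin{align*}
  \hat{d} &= \arg\min_{d} \; \lVert r_i - A d \rVert^2
      && \text{(least-squares fit to (1))} \\
  A^H (r_i - A \hat{d}) &= 0
      \;\Longrightarrow\; A^H A\, \hat{d} = A^H r_i
      && \text{(normal equations, cf. (2))} \\
  \hat{d} &= (A^H A)^{-1} A^H r_i
      = d_i + (A^H A)^{-1} A^H \eta_i
      && \text{(substituting (1))} \\
  d_i &= \mathrm{sign}(\hat{d})
      && \text{(hard decision, (3))}
\end{align*}
```

The third line shows why only the sign of each entry is needed: the relaxed solution equals the transmitted $\pm 1$ bits plus a noise term, so a hard decision recovers the bits whenever the noise term is smaller than one in magnitude.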
Equation (3) represents the decorrelating detector, which is a multiuser detector. A simple approximation to the decorrelating detector can be obtained by assuming that the cross-correlations are zero (there is only a single user) and that the $A^H A$ matrix is the identity. This gives rise to the single user detector, or the matched filter detector, with hard decision outputs:

$$ d_i = \mathrm{sign}(A^H r_i) \qquad (4) $$
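A minimal sketch of the matched filter detector in (4) is given below for a real-valued toy system, where $A^H$ reduces to the transpose. The spreading codes, noise values and function name are made-up illustrative assumptions, and conventional floating-point arithmetic is used rather than on-line arithmetic.

```python
def matched_filter_detect(A, r):
    """Return (soft, hard) where soft = A^T r and hard = sign(soft).
    A is an N x K list of rows; r is a length-N list (real-valued case)."""
    num_users = len(A[0])
    soft = [sum(A[n][k] * r[n] for n in range(len(r)))
            for k in range(num_users)]
    # A soft output of exactly zero is mapped to +1 (arbitrary tie-break).
    hard = [1 if y >= 0 else -1 for y in soft]
    return soft, hard


# Toy system: N = 4 chips, K = 2 users with orthogonal codes.
A = [[1, 1], [1, -1], [1, 1], [1, -1]]
d = [1, -1]                          # transmitted bits
noise = [0.1, -0.2, 0.05, 0.1]       # illustrative noise samples
r = [sum(a * b for a, b in zip(row, d)) + n for row, n in zip(A, noise)]

soft, hard = matched_filter_detect(A, r)
```

Only the signs of the two inner products are used in the end, which is why computing every low-order digit of `soft` is wasted effort in a hard-decision detector.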
B. Asynchronous multiuser detection

In the uplink, when different mobile users are communicating with the base-station, the desired user's bits receive interference from the past or future overlapping symbols of different users along with their current symbols, because the users are asynchronous. This is shown in Figure 2. Multiuser detection cancels the interference from other users to improve the error rate performance, compared to traditional single user detection using only a matched filter [3]. Here, we assume that the delays of the different users are coarse-synchronized within one symbol duration.

Figure 2: Multiple Access Interference. This figure shows the interference due to asynchronous delays of the different users. The received signal contains the sum of past, current and future bits (each shaded differently) of all users.

We consider multistage detection [14, 15], based on the principle of Parallel Interference Cancellation (PIC). This scheme cancels the interference from different users successively in stages and is shown to have computational complexity quadratic in the number of users.

The different delays and phase-shifts make the received signal $r_i$ and the channel estimates complex-valued. For an asynchronous system with Binary Phase Shift Keying (BPSK) modulation, the channel estimate can be arranged as $A_0, A_1 \in \mathbb{C}^{N \times K}$, which correspond to partial correlation information for the successive bit vectors $d_{i-1}, d_i \in \{+1, -1\}^{K}$ that are to be detected. In vector form, the received signal is

$$ r_i = [A_0 \;\; A_1] \begin{bmatrix} d_{1,i-1} \\ \vdots \\ d_{K,i-1} \\ d_{1,i} \\ \vdots \\ d_{K,i} \end{bmatrix} + \eta_i \qquad (5) $$
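The stacked model in (5) can be sketched as below. The matrices, bit vectors and function name are made-up illustrative values (N = 2, K = 2) and are not taken from the paper; Python's built-in complex type stands in for the complex channel estimates.

```python
def received_signal(A0, A1, d_prev, d_cur, noise):
    """Form r_i = [A0 A1] [d_{i-1}; d_i] + noise for N x K matrices A0, A1."""
    stacked_bits = d_prev + d_cur                        # length 2K
    rows = [row0 + row1 for row0, row1 in zip(A0, A1)]   # [A0 A1], N x 2K
    return [sum(a * d for a, d in zip(row, stacked_bits)) + n
            for row, n in zip(rows, noise)]


A0 = [[0.5 + 0.5j, 0], [0, 0.25j]]   # partial correlations with d_{i-1}
A1 = [[1, 0.5], [-0.5j, 1]]          # partial correlations with d_i
d_prev, d_cur = [1, -1], [-1, 1]
r = received_signal(A0, A1, d_prev, d_cur, noise=[0, 0])
```

Each received sample mixes bits from two consecutive symbol intervals of every user, which is the multiple access interference that the multistage detector tries to cancel.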
1. Asynchronous matched filter detection

The bits, $d_i$, of the $K$ users to be detected lie between the received signal $r_i$ and $r_{i-1}$ boundaries. The matched filter detector [2, 3] does a single user detection for each user, neglecting the effects of other users. The synchronous matched filter detector (equation (4)) can be extended to the asynchronous case, producing both soft decisions $y_i$ and hard decisions $d_i$ [...]
Figure 9: Selection function for r = 4 (thresholds written in radix 4; the rows for the positive digits are truncated in the source, ending with a lower threshold of $1.2_4$).

    Range of W_j                       Selected digit   Encoding
    0.2_4 > W_j >= -0.2_4               0               000
    -0.2_4 > W_j >= -1.2_4             -1               111
    -1.2_4 > W_j >= -2.2_4             -2               110
    -2.2_4 > W_j                       -3               101

The selection is done by rounding, using a discretization algorithm (DIS) [22] that performs the rounding based on the first few MSDs. The selection function is given by:

$$ \mathrm{select}(W_j) = \begin{cases} \mathrm{sign}(W_j) \left\lfloor |W_j| + 1/2 \right\rfloor & \text{if } |W_j| \le \rho \\ \mathrm{sign}(W_j) \left\lfloor |W_j| \right\rfloor & \text{otherwise} \end{cases} $$

Figure 8 shows the comparison of an on-line adder with a conventional adder. From the figure, we can observe that the on-line implementation using CSAs makes the digit-serial on-line addition time independent of the working precision, while the time for the bit-parallel conventional adder depends on the d-bit precision. The result digit selection function for a radix-4 system is shown in Figure 9.
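The rounding-based selection can be sketched as follows for the radix-4, maximally redundant case ($\rho = 3$). This is an illustrative model using exact rationals rather than an estimate formed from the first few MSDs; the function names are hypothetical, the 3-bit two's complement encoding is inferred from the codes listed in Figure 9, and the behaviour exactly on a half-way threshold may differ from the figure.

```python
from fractions import Fraction

RHO = 3  # radix-4, maximally redundant digit set {-3, ..., 3}


def select(w):
    """Result digit: sign(w) * floor(|w| + 1/2) if |w| <= rho,
    otherwise sign(w) * floor(|w|), so the digit stays in the digit set."""
    sign = 1 if w >= 0 else -1
    mag = abs(Fraction(w))
    if mag <= RHO:
        return sign * int(mag + Fraction(1, 2))
    return sign * int(mag)


def encode(digit):
    """3-bit two's complement code of a digit in {-3, ..., 3}."""
    return format(digit & 0b111, "03b")


examples = [Fraction(1, 4), Fraction(-3, 4), Fraction(-9, 4), Fraction(-7, 2)]
selected = [select(w) for w in examples]
codes = [encode(p) for p in selected]
```

For these inputs, which lie strictly inside the ranges of Figure 9, the selected digits and their codes agree with the table rows that survive in the figure.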
Figure 10: On-line Multiplication. The figure shows the comparison between a bit-parallel conventional multiplier (Wallace tree multiplier, time $t = t_{conv} \log_2(d)$) and an on-line digit-serial multiplier (Append units, registers $A_{j-1}$ and $B_{j-1}$, partial sum (PS) and stored carry (SC) registers, 4-to-2 and 3-to-2 CSAs and a Select unit, time $t = t_{OL}$).

B. On-line Multiplication

The steps involved in on-line multiplication [7, 22] are:

Initialization:

$$ P_0 = A_0 = B_0 = 0 $$

Recurrence: for $j = 0, 1, \ldots, m+1$ do

$$ W_j = r P_{j-1} + \langle A_{j-1}, a_j \rangle\, b_j + B_{j-1}\, a_j $$

$$ B_j = \langle B_{j-1}, b_j \rangle $$

$$ p_j = \mathrm{select}(W_j) $$

$$ P_j = W_j - p_j $$

where $\langle \cdot, \cdot \rangle$ denotes appending the newly arrived digit to the operand register (the Append units in Figure 10) and the selection function is the same as in on-line addition. Figure 10 shows the comparison of an on-line multiplier with a conventional multiplier.
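A behavioural sketch of these steps is given below. It is not the hardware of Figure 10: it follows the common residual formulation of on-line multiplication, with an explicit $r^{-\delta}$ scaling of the digit products, which is an assumption about how the recurrence above is normalized. It also assumes radix 4, $\rho = 3$, $\delta = 1$, exact rational arithmetic, and an output digit $p_0$ of weight $r^0$; all names are hypothetical.

```python
from fractions import Fraction

R, RHO, DELTA = 4, 3, 1


def select(w):
    """Rounding selection, clamped to the digit set {-RHO, ..., RHO}."""
    sign = 1 if w >= 0 else -1
    mag = abs(w)
    if mag <= RHO:
        return sign * int(mag + Fraction(1, 2))
    return sign * int(mag)


def online_multiply(a_digits, b_digits):
    """Multiply two MSDF signed-digit fractions digit by digit.
    Returns [p_0, ..., p_n]; digit p_j has weight R**-j."""
    n = len(a_digits)
    assert len(b_digits) == n
    a_val = b_val = Fraction(0)   # operand prefixes received so far
    residual = Fraction(0)        # scaled residual P_{j-1}
    product = []
    for j in range(n + 1):
        idx = j + DELTA                          # 1-based digit consumed now
        a_j = a_digits[idx - 1] if idx <= n else 0
        b_j = b_digits[idx - 1] if idx <= n else 0
        weight = Fraction(1, R**idx)
        a_val += a_j * weight                    # <A_{j-1}, a_j>
        w = R * residual + (a_val * b_j + b_val * a_j) / R**DELTA
        b_val += b_j * weight                    # <B_{j-1}, b_j>
        p = select(w)
        product.append(p)
        residual = w - p
    return product


digits = online_multiply([1, -2, 3], [2, 0, -1])      # (11/64) * (31/64)
approx = sum(Fraction(p, R**j) for j, p in enumerate(digits))
```

In this model the product digits appear most significant first, one per step, and after the last step the accumulated value differs from the exact product by at most half a unit of the last output position.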
Acknowledgments
We would like to thank Dr. Behnaam Aazhang for his comments and guidance on this paper. This work was supported in part by Nokia Corporation, Texas Instruments Inc., the Texas Advanced Technology Program under grant 1999-003604-080, and by NSF under grant ANI-9979465.

References

[1] F. Adachi, M. Sawahashi, and H. Suda, “Wideband DS-CDMA for Next-Generation Mobile Communication Systems,” IEEE Communications Magazine, vol. 36, no. 9, pp. 56–69, September 1998.
[2] S. Moshavi, “Multi-User Detection for DS-CDMA Communications,” IEEE Communications Magazine, pp. 124–136, October 1996.
[3] S. Verdú, Multiuser Detection, Cambridge University Press, 1998.
[4] I. Seskar and N. B. Mandayam, “A Software Radio Architecture for Linear Multiuser Detection,” IEEE Journal on Selected Areas in Communication, vol. 17, no. 5, pp. 814–823, May 1999.
[5] M. D. Ercegovac and T. Lang, “On-line Arithmetic: A Design Methodology and Applications in Digital Signal Processing,” VLSI Signal Processing III, pp. 252–263, November 1988, IEEE Press.
[6] J. Bruguera and T. Lang, “2-D DCT using on-line arithmetic,” in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Detroit, MI, May 1995, vol. 5, pp. 3275–3278.
[7] M. D. Ercegovac, “On-line arithmetic: An overview,” Real Time Signal Processing VII, SPIE, vol. 495, pp. 86–93, 1984.
[8] T. Lynch and M. J. Schulte, “A High Radix On-line Arithmetic for Credible and Accurate Computing,” Journal of Universal Computer Science, vol. 1, no. 7, pp. 439–453, July 1995.
[9] R. McIlhenny and M. D. Ercegovac, “On-line Algorithms for Complex Number Arithmetic,” in 32nd Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, October 1998, pp. 172–176.
[10] M. D. Ercegovac and A. L. Grnarov, “On the performance of on-line arithmetic,” in International Conference on Parallel Processing, August 1980, pp. 55–62.
[11] M. D. Ercegovac and T. Lang, “On-the-Fly Conversion of Redundant into Conventional Representations,” IEEE Transactions on Computers, vol. C-36, no. 7, pp. 895–897, July 1987.
[12] A. Avizienis, “Signed-Digit Number Representations for Fast Parallel Arithmetic,” IRE Transactions on Electronic Computers, vol. EC-10, no. 3, pp. 389–400, September 1961.
[13] A. Gorji-Sinaki and M. D. Ercegovac, “Design of a digit-slice on-line arithmetic unit,” in 5th IEEE Symposium on Computer Arithmetic, Ann Arbor, MI, May 1981, pp. 72–80.
[14] M. K. Varanasi and B. Aazhang, “Multistage Detection in Asynchronous Code-Division Multiple-Access Communications,” IEEE Transactions on Communications, vol. 38, no. 4, pp. 509–519, April 1990.
[15] G. Xu and J. R. Cavallaro, “Real-time Implementation of Multistage Algorithm for Next Generation Wideband CDMA Systems,” in Advanced Signal Processing Algorithms, Architectures, and Implementations IX, SPIE, Denver, CO, July 1999, vol. 3807, pp. 62–73.
[16] S. Rajagopal, S. Bhashyam, J. R. Cavallaro, and B. Aazhang, “Efficient Algorithms and Architectures for Multiuser Channel Estimation and Detection in Wireless Base-station Receivers,” Submitted to IEEE Journal in Selected Areas in Communication (JSAC), August 2000.
[17] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems Perspective, chapter 8, Addison-Wesley, second edition, 1993.
[18] S. Rajagopal, S. Bhashyam, J. R. Cavallaro, and B. Aazhang, “Efficient VLSI Architectures for Baseband Signal Processing in Wireless Base-Station Receivers,” in 12th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), Boston, MA, July 2000, pp. 173–184.
[19] H. Suzuki, Y. Chang, and K. K. Parhi, “Low-power bit-serial Viterbi decoder for next generation wide-band CDMA systems,” in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Phoenix, AZ, March 1999, vol. 4, pp. 1913–1916.
[20] B. Sklar, “A Primer on Turbo Code Concepts,” IEEE Communications Magazine, pp. 94–101, December 1997.
[21] J. J. Modi, Parallel Algorithms and Matrix Computation, Oxford University Press, 1988.
[22] M. J. Irwin and R. M. Owens, “Fully Digit On-line Networks,” IEEE Transactions on Computers, vol. C-32, no. 4, pp. 402–406, April 1983.