Hardware Architecture of Time Domain LTE Baseband Signal Processor

Trio Adiono∗, Franciscus M. Satria†, Nur Ahmadi‡, Felix Soewito§

Department of Electrical Engineering, School of Electrical Engineering and Informatics, Bandung Institute of Technology, Jl. Ganesha No. 10 Bandung, 40132, Indonesia

Email: ∗[email protected], †[email protected], ‡[email protected], §[email protected]

Abstract—Fourth-generation wireless technology improves greatly on its predecessor, but this improvement requires more complex and computationally intensive signal processing. In this paper, we propose an architecture for a time domain baseband signal processor for 4G LTE. To reduce the computation time so that it meets the real-time specification, we propose a custom timing synchronization algorithm and FFT implementation. The purpose of the design is to detect the LTE synchronization signals, namely the Primary Synchronization Signal (PSS) and the Secondary Synchronization Signal (SSS). The proposed design is written in Verilog HDL, synthesized using Quartus II software, and successfully implemented on an Altera DE4 FPGA board.

Keywords—LTE, Time Domain, Synchronization, FFT, FPGA.

I. INTRODUCTION

LTE is a wireless communication standard developed by 3GPP as the successor to 3G and HSPA. In the early stages of its development, LTE could reach speeds of up to 100 Mbps for downlink and 50 Mbps for uplink [1]. To achieve higher data transfer rates than its predecessor, LTE uses OFDM as the modulation technique to transmit and receive data [2]. The physical layer downlink processing diagram is depicted in Figure 1.

Fig. 1. LTE physical layer flow diagram (blocks shown: RF, A/D, DFE, Timing Synchronizer, FFT, Channel Estimation / RS Extraction, MIMO Detector, De-Interleaving / Soft Combine, Turbo Decoder, CRC, Bit Stream)

According to [3], LTE receiver implementation poses several design challenges, such as synchronization and the high complexity of the algorithms. In this paper, we propose a custom algorithm for timing synchronization and a hybrid architecture for the FFT module to achieve high data throughput and low latency, so that the system can operate in real time. The proposed system is designed to process FD-LTE with 10 MHz bandwidth.

This paper is organized as follows. Section II presents an overview of the theories used in this design. Section III discusses the system design and architecture implementation. Section IV presents the simulation and synthesis results. The conclusion is drawn in Section V.

II. THEORIES

A. Timing Synchronization

Timing synchronization plays an important role in an LTE receiver system. Before the data can be processed further, the start of each symbol of the received data must be determined, and this is the main function of the timing synchronization block. For a time-domain LTE signal, synchronization can be implemented using the cross-correlation method [4], which is written mathematically as follows.

$$R_{xy}(m) = \sum_{n=-\infty}^{\infty} x(n+m)\, y^{*}(n) \tag{1}$$

If the two signals being correlated are identical, the normalized correlation approaches one at zero lag. If one of the signals is shifted by a positive or negative lag, the correlation peak shifts by the same amount. Thus, this method can be used to determine the start of a symbol. For an LTE signal, the cross-correlation is performed between the Cyclic Prefix (CP) and the end of the same symbol. The CP is a copy of the end of a symbol that is prepended to that symbol. This method is called CP correlation and is shown in Figure 2.

Fig. 2. Cyclic prefix correlation
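The CP correlation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' RTL: the sizes (a 64-sample symbol body with a 16-sample CP), the noise-free zero-padded toy signal, and the function names `cp_metric`, `find_symbol_start`, and `toy_symbol` are all hypothetical. For each candidate start d, the sketch correlates the CP-length window at d with the window one symbol body later and takes the index of the largest magnitude as the symbol start.

```python
import cmath

def cp_metric(rx, fft_size, cp_len):
    """|sum_n rx[d+n+fft_size] * conj(rx[d+n])| for every candidate start d."""
    last_start = len(rx) - fft_size - cp_len
    metric = []
    for d in range(last_start + 1):
        acc = 0j
        for n in range(cp_len):
            # Sample in the candidate CP window times the conjugate of the
            # sample one symbol body (fft_size samples) later.
            acc += rx[d + n + fft_size] * rx[d + n].conjugate()
        metric.append(abs(acc))
    return metric

def find_symbol_start(rx, fft_size, cp_len):
    """Index of the largest correlation magnitude, taken as the symbol start."""
    metric = cp_metric(rx, fft_size, cp_len)
    return max(range(len(metric)), key=lambda d: metric[d])

def toy_symbol(fft_size, cp_len):
    """Unit-magnitude test body with its last cp_len samples prepended as CP."""
    body = [cmath.exp(1j * 0.37 * k * k) for k in range(fft_size)]
    return body[-cp_len:] + body

# Hypothetical toy parameters, far smaller than a real 10 MHz LTE symbol.
FFT_SIZE, CP_LEN, TRUE_START = 64, 16, 23
rx = [0j] * TRUE_START + toy_symbol(FFT_SIZE, CP_LEN) + [0j] * 40
estimated_start = find_symbol_start(rx, FFT_SIZE, CP_LEN)
print(estimated_start, TRUE_START)
```

With noise added to `rx`, the peak becomes less sharp and the estimate can land a few samples early or late, which corresponds to the negative and positive offsets discussed in the results.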

B. FFT-IFFT

In OFDM-based communication systems, the Discrete Fourier Transform (DFT) and Inverse Discrete Fourier Transform (IDFT) are among the most important modules. The IDFT replaces the N oscillators that would otherwise be needed to modulate and send data at the transmitter end, while the DFT is used to demodulate and decode the data at the receiver end. The DFT of N complex data points, $x_n$, is defined by:

$$X_k = \sum_{n=0}^{N-1} x_n\, e^{-\frac{2\pi i}{N} nk}, \qquad k = 0, 1, 2, \cdots, N-1 \tag{2}$$

where $e^{-\frac{2\pi i}{N}}$ is the twiddle factor ($W_N$). Direct evaluation of the DFT is computationally intensive, with a time complexity of $O(N^2)$. The Fast Fourier Transform (FFT), proposed by Cooley and Tukey [5], reduces the time complexity to $O(N \log_2 N)$, where N denotes the FFT size. The Cooley-Tukey algorithm re-expresses the DFT of an arbitrary composite size $N = N_1 N_2$ in terms of smaller DFTs of sizes $N_1$ and $N_2$. The Inverse Fast Fourier Transform (IFFT) is the same operation as the FFT except for a sign change in the exponent and a scale factor of $1/N$, where N is the data length.

978-1-4799-8641-5/15/$31.00 ©2015 IEEE

III. SYSTEM DESIGN AND ARCHITECTURE

A. System Design

The block diagram of the proposed system design is shown in Figure 3.

Fig. 3. System block diagram (blocks shown: Data Input, Timing Synchronizer, CP Removal, FFT, PSS Synchronizer, SSS Synchronizer, Data Output)

The goal of this system design is to detect the Primary Synchronization Signal (PSS) and Secondary Synchronization Signal (SSS) in the received LTE signal so that the NCellID of the transmitting base station can be identified.

B. Timing Synchronizer

The Timing Synchronizer block consists of five sub-modules: Input Buffer, Correlator, CORDIC, Peak Detector (Mean Method), and Maximum Detector. The data flow of the timing synchronizer is illustrated in Figure 4.

Fig. 4. Timing synchronizer's data flow (blocks and signals shown: Data In, Head In, Buffer, Correlator, Cordic, Comparator, Peak Detector, Index, Data Out, Head Out, Error)

The Input Buffer consists of a collection of 12-bit registers that delay the input signal. The delay stores the data while the preceding samples pass through the correlation process. The Correlator module computes the correlation between the CP and the end part of the symbol. This correlation is used to determine the start of a symbol.

Fig. 5. Correlator architecture

The CORDIC module calculates the absolute value of the signal. This value is used in the comparator operation. The Peak Detector filters the output of the correlation operation: it calculates the average value over one frame and uses it to generate a threshold value. Each sample with power less than the threshold is forced to zero. The Maximum Detector finds the maximum value of the correlation output in one frame and records its index (sample position). This index represents the start of a symbol.

C. FFT

The FFT block consists of three main modules: the FFT input buffer (serial-to-parallel), the FFT module, and the FFT output buffer. The FFT input buffer rearranges the input data from serial to parallel. Each parallel input consists of 256 data. In this buffer, the data is also padded with 2 bits at the MSB of the input to prevent overflow during FFT computation. The main module of the FFT block is designed using the Decimation in Frequency (DIF) algorithm with a combination of single-path delay feedback and multi-path delay commutator architectures. These architectures are implemented in parallel to achieve high throughput, low latency, and efficient memory usage [6]. The FFT module is implemented using a fixed-point number representation. Table I shows the radix configuration for each stage of the FFT.

TABLE I. RADIX CONFIGURATION FOR EACH FFT STAGE

Stage   Radix     Configuration
1       Radix-2   multi-path delay commutator
2       Radix-2   multi-path delay commutator
3       Radix-2   single-path delay feedback
4       Radix-2   single-path delay feedback
5       Radix-8   single-path delay feedback
6       Radix-8   single-path delay feedback

To prevent overflow during the computation, the bit width is increased after the additions in each stage. Table II shows the bit width at each stage of the FFT:

TABLE II. BIT WIDTH OF EACH STAGE

Data Format   Input   Stage 1   Stage 2   Stage 3   Stage 4   Stage 5   Stage 6   Output
Sign          1       1         1         1         1         1         1         1
Int           1       2         3         4         5         8         11        1
Frac          12      12        12        12        12        12        12        12
Total         14      15        16        17        18        21        24        14
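The figures in Table II are consistent with a simple growth rule: each radix-r stage adds log2(r) integer bits, since a radix-r butterfly sums r values. The sketch below reproduces the table from the radix list of Table I under that rule. The rule itself is an inference from the two tables rather than something the text states, and the names `STAGE_RADIX`, `bit_width_table`, and `fft_size` are hypothetical.

```python
# Radix of each FFT stage, copied from Table I.
STAGE_RADIX = [2, 2, 2, 2, 8, 8]

def bit_width_table(radices, sign_bits=1, int_bits=1, frac_bits=12):
    """Return (name, sign, int, frac, total) rows for the input and each stage.

    Assumed rule (inferred from Table II, not stated by the authors):
    a radix-r stage adds log2(r) integer bits to absorb the growth of its sums.
    """
    rows = [("Input", sign_bits, int_bits, frac_bits)]
    for stage, radix in enumerate(radices, start=1):
        int_bits += radix.bit_length() - 1  # exact log2 for powers of two
        rows.append((f"Stage {stage}", sign_bits, int_bits, frac_bits))
    return [(name, s, i, f, s + i + f) for name, s, i, f in rows]

def fft_size(radices):
    """Transform length implied by the stage radices (their product)."""
    size = 1
    for radix in radices:
        size *= radix
    return size

table = bit_width_table(STAGE_RADIX)
for row in table:
    print(row)
print("FFT size:", fft_size(STAGE_RADIX))
```

The product of the radices is 1024, which matches the four parallel inputs of 256 data described for the FFT input buffer.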

Fig. 8. Architecture for Stage 6 FFT

The FFT module uses the Radix-2 multi-path delay commutator architecture in stages 1 and 2, as shown in Figure 6, and the Radix-2 single-path delay feedback architecture in stages 3 and 4, as shown in Figure 7. Stages 5 and 6 use the Radix-8 single-path delay feedback architecture, as shown in Figures 7 and 8. After the last stage, a reordering step (Figure 9) is required so that the outputs appear in the same order as the inputs.
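Why a DIF FFT needs a final reordering step can be seen in a small software model. The sketch below is a plain radix-2 DIF FFT, not the mixed radix-2 and radix-8 pipeline of the paper, and the function names are hypothetical. Its butterflies leave the results in bit-reversed order, and a separate pass restores natural order. In a mixed-radix design the permutation is a digit reversal rather than a pure bit reversal, but the idea is the same.

```python
import cmath

def dif_fft_bit_reversed(x):
    """Radix-2 DIF FFT; len(x) must be a power of two.

    The result is returned in bit-reversed index order.
    """
    a = list(x)
    n = len(a)
    half = n // 2
    while half >= 1:
        block = 2 * half
        for start in range(0, n, block):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / block)  # twiddle W_block^k
                u = a[start + k]
                v = a[start + k + half]
                a[start + k] = u + v               # feeds the even outputs
                a[start + k + half] = (u - v) * w  # feeds the odd outputs
        half //= 2
    return a

def bit_reverse_reorder(a):
    """Move each element to the index obtained by reversing its index bits."""
    n = len(a)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i, value in enumerate(a):
        rev = int(format(i, f"0{bits}b")[::-1], 2)
        out[rev] = value
    return out

def naive_dft(x):
    """Direct O(N^2) evaluation of the DFT, used only as a reference."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

samples = [complex(i % 5, -(i % 3)) for i in range(16)]
fft_out = bit_reverse_reorder(dif_fft_bit_reversed(samples))
reference = naive_dft(samples)
print(max(abs(g - r) for g, r in zip(fft_out, reference)))
```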

Fig. 6. Architecture for Stage 1 and 2 FFT

Fig. 7. Architecture for Stage 3, 4, and 5 FFT

Fig. 9. Architecture for reorder bit

Fig. 10. Architecture for output buffer (parallel-to-serial)

The output buffer rearranges the output data of the FFT module from parallel back to serial. In this block, the bit width of the output data is converted back to 12 bits by truncating fractional bits, so that the result does not change abruptly. The architecture of the output buffer is shown in Figure 10.

IV. SIMULATION AND SYNTHESIS RESULTS

A. Timing Synchronizer

The correlation operation can be performed using the MATLAB built-in function xcorr. In this case, however, only the zero-lag value is of interest, so we employ a custom correlation algorithm. Figure 11 shows the result of the correlation operation based on (1). The result shows that the correlation method can be used to find the start of a symbol: the start index of each symbol is indicated by the peak within that symbol.

Fig. 11. CP correlation's result

The verification is performed by comparing the correlation results against the reference data. The simulation is carried out with several test parameters, such as the SNR value and the NCellID of the input signal. The error of the correlation operation, also called the offset (zero, negative, or positive), is computed at different SNR levels and is shown in Table III. Failing to obtain the correct start of a symbol can lead to incorrect demodulation output. With a negative offset, the signal can still be reconstructed, whereas with a positive offset reconstruction is much harder.

TABLE III. CORRELATION ERROR RATE

                       SNR
Description            40 dB     10 dB     5 dB
Zero Offset            52%       46%       38%
Negative Offset        20%       22%       24%
Positive Offset        28%       38%       38%
Average Accuracy       99.73%    99.73%    99.63%
Worst Case Accuracy    98.30%    97.63%    98.10%

Figure 12 shows the functional simulation result of the timing synchronizer module using the ModelSim tool. The input of this simulation is a data bit stream. The module reproduces the input data at its output. It also provides an additional index out output, which shows the index of the maximum value in each symbol. The head out output works as a flag that shows which data is valid for further processing. If an error occurs during processing, the error output goes to logic high.

Fig. 12. ModelSim functional simulation result

The verification is performed by dumping the output data from the ModelSim simulation and comparing it with the output data from the MATLAB calculation. Figure 13 shows that 2 errors are detected in 80 samples. These differences arise because MATLAB and the RTL compute abs(x + jy) differently: in MATLAB the calculation uses an exact approach, whereas in the Verilog HDL implementation it uses the CORDIC approach with limited bit resolution.

Fig. 13. MATLAB vs ModelSim comparison

B. FFT

A functional simulation is performed to verify the proposed FFT block. Figure 14 shows the result of the simulation using the ModelSim tool. The input of this module is the time domain data bit stream from the timing synchronizer module. After the FFT processing is done, the output is in the frequency domain. The data is in the form of downlink data packets, with 600 packets for each symbol.

Fig. 14. ModelSim functional simulation result

The system design is verified by comparing the data dumped from the ModelSim simulation with the output of the default MATLAB FFT function for the same input. Figure 15 shows small differences and 3 abrupt differences in 1800 samples. The small errors are caused by the different number representations used to calculate the FFT: in MATLAB the calculation is performed in floating point, whereas in the Verilog HDL implementation it is performed in fixed point. The abrupt differences are caused by the twiddle factors used in the Verilog implementation. Although abrupt errors occur, they are still within the system tolerance limit and therefore do not change the final results.

Fig. 15. Output comparison from ModelSim and MATLAB
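The effect of fixed-point twiddle factors on the transform output can be illustrated with a small experiment. The sketch below is only a model under stated assumptions: it uses a direct DFT rather than the pipelined FFT, quantizes only the twiddle factors (data and accumulators stay in floating point), and assumes 12 fractional bits to mirror the fraction field of Table II. The function names are hypothetical. Since each quantized twiddle differs from the exact one by at most about 0.71 LSB in magnitude, the error of an N-point sum over unit-magnitude inputs is bounded by roughly 0.71 N / 2^12.

```python
import cmath
import math

def quantize(z, frac_bits):
    """Round real and imaginary parts to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return complex(round(z.real * scale) / scale, round(z.imag * scale) / scale)

def dft(x, frac_bits=None):
    """Direct DFT; twiddles are quantized when frac_bits is given."""
    n = len(x)
    out = []
    for k in range(n):
        acc = 0j
        for i, sample in enumerate(x):
            w = cmath.exp(-2j * cmath.pi * i * k / n)
            if frac_bits is not None:
                w = quantize(w, frac_bits)
            acc += sample * w
        out.append(acc)
    return out

def max_abs_error(x, frac_bits):
    """Largest difference between the exact and quantized-twiddle DFT."""
    exact = dft(x)
    approx = dft(x, frac_bits)
    return max(abs(a - b) for a, b in zip(exact, approx))

N = 64
FRAC_BITS = 12  # assumed to match the 12-bit fraction field of Table II
signal = [cmath.exp(1j * 0.3 * i * i) for i in range(N)]  # unit-magnitude test data
error = max_abs_error(signal, FRAC_BITS)
bound = N * math.sqrt(0.5) / (1 << FRAC_BITS)
print(error, bound)
```

This model shows only the small, evenly spread errors. It does not attempt to reproduce the abrupt differences reported above, whose exact cause in the RTL is not described in enough detail to model.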

C. FPGA Testing

The proposed design is synthesized using Altera Quartus II software with the DE4 FPGA as the target board. The testing procedure is carried out as shown in Figure 16.

Fig. 16. FPGA testing procedures (blocks shown: Computer, data, FPGA Processing, Memory, RAM, Read, Dump data, Verification, Computer)

Fig. 17. FPGA testing of PSS and SSS Synchronizer: (a) FPGA testing of PSS synchronizer, (b) FPGA testing of SSS synchronizer

Verification on the FPGA is performed by downloading the input data into the FPGA memory, after which the processing is done on the FPGA. The results are displayed on the 7-segment display, which is part of the FPGA board (shown in the top left corner of the board). A push button toggles between the PSS and SSS numbers shown on the 7-segment display. The output number is shown in hexadecimal format. For data verification, the reference data is dumped into RAM, read by the computer, and compared with the ModelSim calculation.

Figure 17 shows the testing of the PSS and SSS synchronizers performed on the DE4 FPGA using dummy input from a PC. The input for the test is LTE packet data with NCellID 82. The result shows that PSS is 1 and SSS is 1B, or 27 in decimal format. According to [7], the calculation for NCellID is (27 × 3) + 1 = 82. Therefore, the system implemented on the FPGA gives the desired result.

The performance of the proposed design is measured by latency (processing time) and area. The latency is obtained from the simulation using ModelSim, while the area is obtained from the synthesis result using Quartus II. The latency needed to finish each module's computation is given in Table IV. Using the Quartus II TimeQuest Analyzer, the maximum clock frequency of each module is obtained, as shown in Table V. The highest frequency, 467.73 MHz, is achieved by the FFT input buffer module, while the lowest, 32 MHz, is that of the peak detector module. The area consumption of the proposed design is shown in Table VI. It is represented by the combinational ALUTs, memory ALUTs, and dedicated logic registers. The proposed design uses a total area of 206798.

TABLE IV. PROCESSING TIME

Module                 Processing Time (Clock Cycle)
Timing Synchronizer    3303
FFT                    1810

TABLE V. MAXIMUM FREQUENCY

Module               Frequency (MHz)
Correlator           330
CORDIC               352
Peak Detector        32
Maximum Detector     368
FFT Input Buffer     467.73
FFT Main             64.19
FFT Output Buffer    280.9

TABLE VI. AREA CONSUMPTION

Logic Gate                   Area
Combinational ALUTs          41627
Memory ALUTs                 1848
Dedicated Logic Registers    163323
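The arithmetic behind the FPGA test and Tables IV to VI can be checked with a short script. The cell identity formula, three times the SSS group number plus the PSS number, is the one quoted from [7], and the table values are copied from above. The conversion of clock cycles to time assumes that every module runs at the 32 MHz limit set by the peak detector, which is an assumption made here for illustration. The function and variable names are hypothetical.

```python
def n_cell_id(sss_group, pss_number):
    """Physical cell identity: 3 * (SSS group, 0..167) + (PSS number, 0..2)."""
    if not 0 <= sss_group <= 167:
        raise ValueError("SSS group must be in 0..167")
    if not 0 <= pss_number <= 2:
        raise ValueError("PSS number must be in 0..2")
    return 3 * sss_group + pss_number

def cycles_to_us(cycles, clock_mhz):
    """Processing time in microseconds for a cycle count at a clock in MHz."""
    return cycles / clock_mhz

# Values shown on the 7-segment display in the FPGA test (SSS in hexadecimal).
sss_group = int("1B", 16)
cell_id = n_cell_id(sss_group, 1)

# Table VI entries and their sum.
AREA = {
    "Combinational ALUTs": 41627,
    "Memory ALUTs": 1848,
    "Dedicated Logic Registers": 163323,
}
total_area = sum(AREA.values())

# Table IV cycle counts at the assumed 32 MHz system clock.
SYSTEM_CLOCK_MHZ = 32.0
timing_sync_us = cycles_to_us(3303, SYSTEM_CLOCK_MHZ)
fft_us = cycles_to_us(1810, SYSTEM_CLOCK_MHZ)

print(cell_id, total_area, timing_sync_us, fft_us)
```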

V. CONCLUSION

A hardware architecture for a time domain LTE baseband signal processor has been designed. The proposed design consists of a timing synchronizer and an FFT module. The design is functionally simulated using ModelSim and synthesized using Altera Quartus II. It has been successfully implemented on the DE4 FPGA board and produces the desired results. The total area consumed by the design is 206798, while the maximum frequency of the system is limited to 32 MHz by its peak detector module. Future work is to optimize the area and increase the speed.

REFERENCES

[1] “Overview of the 3GPP LTE physical layer,” White Paper, Freescale Semiconductor, July 2007.

[2] E. Dahlman, A. Furuskär, Y. Jading, M. Lindström, and S. Parkvall, “Key features of the LTE radio interface,” Ericsson Review, vol. 2, pp. 77–80, 2008.

[3] D. Wu, J. Eilert, D. Liu, A. Nilsson, E. Tell, and E. Alfredsson, “System architecture for 3GPP LTE modem using a programmable baseband processor,” in 2009 International Symposium on System-on-Chip (SOC 2009). IEEE, 2009, pp. 132–137.

[4] F. Soewito, “FPGA based timing synchronizer for LTE 4G receiver,” Bachelor’s Thesis, Bandung Institute of Technology, 2015.

[5] J. W. Cooley and J. W. Tukey, “An algorithm for the machine calculation of complex Fourier series,” Mathematics of Computation, vol. 19, no. 90, pp. 297–301, 1965.

[6] T. Adiono and R. Mareta, “Low latency parallel-pipelined configurable FFT-IFFT 128/256/512/1024/2048 for LTE,” in 2012 4th International Conference on Intelligent and Advanced Systems (ICIAS), vol. 2. IEEE, 2012, pp. 768–773.

[7] Physical Channels and Modulation, 3GPP TS 36.211, Technical Specification Group Radio Access Network; Evolved Universal Terrestrial Radio Access (E-UTRA), Rev. 10.0.0, 2011.