2009 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS 2009) December 7-9, 2009
MP2-C-3
Low-Complexity Joint Synchronization of Symbol Timing and Carrier Frequency for CMMB System
Nan Shao, Yun Chen, Huxiong Xu, Simeng Li and Xiaoyang Zeng
State Key Lab of ASIC and System, Department of Microelectronics, Fudan University, Shanghai 201203 China
E-mail: [email protected] Tel: +86-21-5135-5195
Abstract—The recently released Chinese mobile multimedia broadcasting standard, CMMB, adopts cyclic-prefix based OFDM. Traditional timing synchronization for this system requires a correlator that correlates the last $N_g$ samples of the received OFDM symbol with the copies that precede them in the cyclic prefix. At least $N$ data samples ($N$ being the length of the OFDM symbol) and $N_g$ correlation results have to be stored, which consumes a large amount of memory. In this paper, a low-complexity scheme for the joint synchronization of symbol timing and fractional carrier frequency is proposed. The scheme substantially lowers the hardware cost, especially the memory consumption. Simulation results show that the algorithm also meets the requirements of the system over both AWGN and a typical CMMB multipath channel.
I. INTRODUCTION
OFDM has been adopted in many outdoor mobile applications for its high spectral efficiency and its ability to cope with the severe channel impairments encountered in wireless communications. The recently released Chinese mobile multimedia broadcasting standard, CMMB, also employs cyclic-prefix based OFDM. Symbol synchronization is a critical part of a CMMB receiver: it indicates the proper start of each symbol and ensures a correct FFT window. For a cyclic-prefix based OFDM system like CMMB, the conventional maximum-likelihood (ML) estimation [1] works well over AWGN channels. This scheme requires correlating the last $N_g$ samples of an OFDM symbol with the copies that precede them in the cyclic prefix. Direct computation of this correlation imposes a heavy computational burden as well as a very high hardware cost. In [2], an exponentially weighted average (EWA) scheme is adopted to estimate the symbol timing, while the carrier frequency is still estimated by the ML algorithm. In this paper, the EWA algorithm is adopted to estimate both the symbol timing and the carrier frequency. Furthermore, owing to the special architecture of the CMMB system [3], the proposed algorithm operates on the consecutive identical synchronization signals instead of the OFDM symbols and their cyclic prefixes. Implementation results show that the proposed method greatly reduces the hardware cost, especially the number of memory bits.

The remainder of this paper is organized as follows. The CMMB system model is described in Section II. Section III introduces the proposed low-complexity joint estimation of symbol timing and carrier frequency, together with algorithm simulation and performance analysis. Section IV presents the ASIC and FPGA implementations and compares their hardware consumption with that of the traditional ML algorithm. Finally, the conclusion is given in Section V.

978-1-4244-5016-9/09/$25.00 ©2009 IEEE

II. CMMB SYSTEM MODEL
The frame structure of the CMMB system [3] is illustrated in Fig. 1. Every frame is composed of 40 time slots, and every time slot consists of a beacon followed by 53 OFDM symbols. The beacon comprises a transmitter identity (TxID) signal and two consecutive identical synchronization signals (Sync). Each Sync is a truncated PN sequence modulated onto 2048 sub-carriers with a sub-carrier spacing of 4.8828125 kHz. Each OFDM symbol includes two parts: the symbol body, which consists of 4096 samples modulated onto 4096 sub-carriers with a sub-carrier spacing of 2.44140625 kHz, and the cyclic prefix (CP), which is a copy of the last 512 samples of the body. Guard intervals are inserted between the TxID, Sync and the OFDM symbols.
Fig.1. CMMB Frame Structure
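As a quick check of the numbers above, the following sketch derives the sample rate and the durations implied by the quoted sub-carrier counts and spacings. It assumes that the baseband sample rate equals the FFT size times the sub-carrier spacing (the usual relation for FFT-based OFDM, not stated explicitly in the text). The constant and function names are illustrative only.

```python
# Frame parameters quoted in the text above.
SYNC_SUBCARRIERS = 2048
SYNC_SPACING_HZ = 4882.8125      # 4.8828125 kHz
OFDM_SUBCARRIERS = 4096
OFDM_SPACING_HZ = 2441.40625     # 2.44140625 kHz
CP_SAMPLES = 512


def sample_rate_hz(num_subcarriers, spacing_hz):
    """Assumed relation: sample rate = FFT size * sub-carrier spacing."""
    return num_subcarriers * spacing_hz


def duration_us(num_samples, rate_hz):
    """Duration of num_samples samples at rate_hz, in microseconds."""
    return num_samples / rate_hz * 1e6


if __name__ == "__main__":
    fs = sample_rate_hz(OFDM_SUBCARRIERS, OFDM_SPACING_HZ)
    print("sample rate (Hz):", fs)
    print("Sync duration (us):", duration_us(SYNC_SUBCARRIERS, fs))
    print("OFDM body duration (us):", duration_us(OFDM_SUBCARRIERS, fs))
    print("CP duration (us):", duration_us(CP_SAMPLES, fs))
```

Under this assumption the Sync and the OFDM body share the same sample rate, which is why a fixed delay of 2048 samples lines up the two Syncs in the receiver.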
The basic structure of the transmitter and receiver in the CMMB system is shown in Fig. 2. Considering the unknown symbol arrival time $T$ and the fractional carrier frequency offset $f$ (the integer frequency offset is ignored because of ambiguity), the received signal with additive noise $n(k)$ is given by [4]

$$
r(k) = s(k - T)\, e^{j 2\pi k f / N} + n(k). \tag{1}
$$
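The signal model in (1) can be sketched in a few lines, which may help when experimenting with the estimators discussed later. This is a minimal illustration, not the authors' simulator: samples before the arrival time are taken as zero, the noise is complex Gaussian with a per-component standard deviation `noise_std`, and all names are hypothetical.

```python
import cmath
import random


def received_signal(s, timing_offset, cfo, n_fft, noise_std=0.0, seed=0):
    """Sketch of (1): r(k) = s(k - T) * exp(j*2*pi*k*f/N) + n(k).

    s is a list of complex transmit samples. Samples before the arrival
    time T are treated as zero (an assumption made for illustration).
    """
    rng = random.Random(seed)
    r = []
    for k in range(len(s) + timing_offset):
        tx = s[k - timing_offset] if k >= timing_offset else 0j
        rotation = cmath.exp(2j * cmath.pi * k * cfo / n_fft)
        noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        r.append(tx * rotation + noise)
    return r


if __name__ == "__main__":
    tx = [1 + 0j] * 8
    for k, v in enumerate(received_signal(tx, timing_offset=3, cfo=0.25, n_fft=8)):
        print(k, v)
```

With `noise_std=0.0` the output is a delayed copy of `s` whose phase advances by $2\pi f / N$ per sample, which is the rotation the frequency estimator has to recover.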
Fig. 2. Basic transmitting-receiving structure of CMMB system
III. PROPOSED ALGORITHM

The conventional ML algorithm can be expressed as [2]

$$
\hat{T} = \arg\max_{\hat{T}} \left( \left| \mathrm{Re}\left\{ \sum_{k=\hat{T}}^{\hat{T}+N_g-1} r(k)\, r^{*}(k-N) \right\} \right| + \left| \mathrm{Im}\left\{ \sum_{k=\hat{T}}^{\hat{T}+N_g-1} r(k)\, r^{*}(k-N) \right\} \right| \right), \tag{2}
$$

$$
\hat{f} = \frac{1}{2\pi} \tan^{-1} \frac{\mathrm{Im}\left\{ \sum_{k=\hat{T}}^{\hat{T}+N_g-1} r(k)\, r^{*}(k-N) \right\}}{\mathrm{Re}\left\{ \sum_{k=\hat{T}}^{\hat{T}+N_g-1} r(k)\, r^{*}(k-N) \right\}}, \tag{3}
$$

where $N$ is 4096, $N_g$ is 512, and $\hat{T}$ and $\hat{f}$ denote the estimated synchronization point and the estimated frequency offset, respectively. The conventional structure and the optimized structure [5] are illustrated in Fig. 3. In the optimized structure, the moving-sum part is implemented with a memory ring instead of a register chain. The figure shows that, compared with the conventional moving-sum structure, the number of additions is greatly reduced, but a large amount of memory is still needed.

The CMMB frame structure is based on time slots: once the start of a time slot is found, the FFT start points of the following 53 OFDM symbols are determined. Therefore, the consecutive identical Syncs in the beacon can be used for symbol timing, as they behave like a 2048-sample symbol preceded by a 2048-sample cyclic prefix.

Fig. 3. Structure of ML algorithm. (a) The conventional structure. (b) The optimized structure.

A. Symbol Timing Synchronization

Based on the structure of two consecutive identical Syncs, the delay-correlation results can be derived as follows:

$$
\mathrm{corr}(k) = r(k)\, r^{*}(k-2048) =
\begin{cases}
|s(k)|^{2}\, e^{j 2\pi f} + n(k), & k \in I \\
s(k)\, s^{*}(k-2048)\, e^{j 2\pi f} + n(k), & k \notin I
\end{cases} \tag{4}
$$

where $I$ represents the set of samples belonging to the second Sync, and $n(k)$ here collects the noise terms of the product. Fig. 4(a) shows the continuous accumulation of $\mathrm{corr}(k)$. Only the correlation of samples from the two Syncs yields a result proportional to the signal energy, while the correlations of other samples yield noise-like results. Therefore, the accumulation curve has a step shape, and its rising part marks the location of the Syncs. As a result, an IIR low-pass filter can be used to pick out the correlation peak, as Fig. 4(b) shows. The transfer function of the filter is

$$
H(z) = \frac{1}{1 - w z^{-1}}. \tag{5}
$$
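The delay-correlation in (4) followed by the filter in (5) can be sketched as below. This is a floating-point illustration under stated assumptions, not the authors' fixed-point design: the correlation is taken as zero until 2048 samples are available, the timing metric is the sum of the absolute real and imaginary parts of the filter output, the default weight uses $w = 1 - 2^{-11}$ (the value the paper reports choosing), and `toy_burst` is a hypothetical noise-free test signal whose frequency offset is normalised to the Sync length so that the delay-2048 phase is $2\pi f$.

```python
import cmath
import random

SYNC_LEN = 2048          # delay between the two identical Syncs (from the text)
W = 1.0 - 2.0 ** -11     # w = 1 - 2^-M with M = 11


def delay_correlate(r, delay=SYNC_LEN):
    """corr(k) = r(k) * conj(r(k - delay)); taken as 0 while k < delay."""
    return [r[k] * r[k - delay].conjugate() if k >= delay else 0j
            for k in range(len(r))]


def ewa_filter(x, w=W):
    """First-order IIR y(k) = w*y(k-1) + x(k), i.e. H(z) = 1/(1 - w z^-1)."""
    y, acc = [], 0j
    for v in x:
        acc = w * acc + v
        y.append(acc)
    return y


def estimate_timing(r, w=W):
    """Return (index maximising |Re{y}| + |Im{y}|, filter output y)."""
    y = ewa_filter(delay_correlate(r), w)
    metric = [abs(v.real) + abs(v.imag) for v in y]
    return max(range(len(metric)), key=metric.__getitem__), y


def toy_burst(lead=1000, tail=3000, cfo=0.1, seed=1):
    """Hypothetical test signal: random QPSK, two identical Syncs, random QPSK."""
    rng = random.Random(seed)

    def qpsk():
        return complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / 2 ** 0.5

    sync = [qpsk() for _ in range(SYNC_LEN)]
    s = [qpsk() for _ in range(lead)] + sync + sync + [qpsk() for _ in range(tail)]
    return [v * cmath.exp(2j * cmath.pi * k * cfo / SYNC_LEN)
            for k, v in enumerate(s)]


if __name__ == "__main__":
    t_hat, _ = estimate_timing(toy_burst())
    print("estimated end of second Sync:", t_hat)
```

The recursion keeps one complex accumulator per sample instead of a window of stored products, which is the property the rest of the section relies on. The peak of the metric is expected near the last sample of the second Sync.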
The filter coefficient is chosen as $w = 1 - 2^{-M}$ for convenience of implementation, where $M$ is a positive integer. Here $M$ is set to 11, the best choice according to simulation. The symbol timing offset can then be estimated as follows:

$$
\hat{T} = \arg\max_{\hat{T}} \left( \left| \mathrm{Re}\left\{ \sum_{k=0}^{\hat{T}} w^{\hat{T}-k}\, r(k)\, r^{*}(k-2048) \right\} \right| + \left| \mathrm{Im}\left\{ \sum_{k=0}^{\hat{T}} w^{\hat{T}-k}\, r(k)\, r^{*}(k-2048) \right\} \right| \right). \tag{6}
$$

Unlike the conventional ML algorithm, this scheme needs only two accumulators instead of the 512-length register chain or the 512-length memory ring. Furthermore, the delay is only 2048 samples, so half of the delay memory is saved.

Fig. 4. Outputs of the proposed estimator. (a) Continual correlation-sum without proposed algorithm. (b) Continual correlation-sum with proposed algorithm. (c) Angle of the continual correlation-sum with proposed algorithm. [Plots omitted: amplitude in (a) and (b), normalized phase in (c), each versus time in samples.]

B. Carrier Frequency Synchronization

Note that

$$
\sum_{k=0}^{\hat{T}} w^{\hat{T}-k}\, r(k)\, r^{*}(k-2048) = \sum_{k=\hat{T}-2047}^{\hat{T}} w^{\hat{T}-k}\, r(k)\, r^{*}(k-2048) + \sum_{k=0}^{\hat{T}-2048} w^{\hat{T}-k}\, r(k)\, r^{*}(k-2048). \tag{7}
$$

As shown in (4), when $k \notin I$, $\mathrm{corr}(k)$ has a random phase; when $k \in I$, $\mathrm{corr}(k)$ has a time-invariant phase. The first term of (7) therefore has a phase angle exactly equal to $2\pi f$, while the second term, whose summands have random phases, contributes a comparatively negligible phase. As a result, the fractional frequency offset can be obtained approximately by

$$
\hat{f} = \frac{1}{2\pi} \tan^{-1} \frac{\mathrm{Im}\left\{ \sum_{k=0}^{\hat{T}} w^{\hat{T}-k}\, r(k)\, r^{*}(k-2048) \right\}}{\mathrm{Re}\left\{ \sum_{k=0}^{\hat{T}} w^{\hat{T}-k}\, r(k)\, r^{*}(k-2048) \right\}}. \tag{8}
$$

Fig. 5 gives the normalized root mean square error (NRMSE) of the estimated fractional frequency offset for the conventional ML algorithm and the proposed algorithm, over both the AWGN channel and the typical CMMB multipath channel TU6 (the parameters of TU6 are shown in Table I). The NRMSE is defined as

$$
\mathrm{NRMSE} = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \left( \hat{f}_i - f \right)^{2}} \Big/ f_{cs}, \tag{9}
$$

where $f_{cs}$ denotes the sub-carrier spacing and $M$ here counts the estimates being averaged (it is not the filter parameter in $w$).

TABLE I. PARAMETERS OF TU6 MULTIPATH CHANNEL

Path             1     2     3     4     5     6
Delay (us)       0     0.2   0.5   1.6   2.3   5.0
Amplitude (dB)   -3    0     -2    -6    -8    -10

Fig. 5. Normalized Root Mean Square Error of residual CFO estimated by conventional ML algorithm and proposed algorithm. [Plot omitted: NRMSE of carrier estimation versus SNR (dB) from 0 to 30 dB, with curves for Proposed EWA (TU6), ML (AWGN), Proposed EWA (AWGN) and ML (TU6).]

The figure indicates that, although the proposed algorithm is less accurate than ML, it meets the basic requirement of coarse fractional frequency offset estimation, namely that the residual carrier offset be less than 2%.
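The frequency estimate in (8) and the error measure in (9) reduce to a few lines of arithmetic, sketched below. Two choices here are assumptions rather than the paper's stated method: the four-quadrant `atan2` is used in place of the plain arctangent of the ratio, so the estimate lies in (-0.5, 0.5], and the function names are illustrative only.

```python
import math


def estimate_cfo(ewa_at_peak):
    """(8): f_hat = angle of the EWA correlation sum at T_hat, divided by 2*pi.

    ewa_at_peak is the complex filter output taken at the estimated timing
    instant. atan2 is an assumed four-quadrant replacement for tan^-1(Im/Re).
    """
    return math.atan2(ewa_at_peak.imag, ewa_at_peak.real) / (2.0 * math.pi)


def nrmse(estimates, true_offset, subcarrier_spacing):
    """(9): root mean square error of the estimates, divided by f_cs."""
    mse = sum((f_hat - true_offset) ** 2 for f_hat in estimates) / len(estimates)
    return math.sqrt(mse) / subcarrier_spacing


if __name__ == "__main__":
    # A correlation sum with phase 2*pi*0.1 should give an estimate near 0.1.
    z = 1294.0 * complex(math.cos(0.2 * math.pi), math.sin(0.2 * math.pi))
    print("f_hat:", estimate_cfo(z))
    print("NRMSE:", nrmse([0.09, 0.11, 0.10], 0.10, 1.0))
```

Only the phase of the accumulated sum matters for the estimate, so its magnitude, which depends on the signal energy and on $w$, does not need to be normalised first.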
Fig. 6. Hardware structure of proposed algorithm.

IV. HARDWARE IMPLEMENTATION

The proposed algorithm is implemented with 12-bit quantization on both ASIC (SMIC 0.13 um) and FPGA. Fig. 6 presents the structure of the proposed algorithm. As it shows, only two accumulation registers are needed (one each for the real and imaginary parts), instead of the 512-length register chain or the 512-length memory ring in Fig. 3. A traditional implementation of an N-sample delay employs N registers, which is costly: with 12-bit quantization, for example, 24*N register bits are consumed. Here a dual-port memory is used instead, which saves a great many registers. The coefficient $w = 1 - 2^{-M}$ is very convenient for implementation, since multiplication by $2^{-M}$ can be realized by a right shift. For the CMMB system we choose $M$ to be 11.

The angle calculation module can be realized with the CORDIC algorithm [6], an iterative approximation algorithm whose accuracy is determined by the number of iterations. Here 12 iterations are used. Its main operations are shifts and additions, which are convenient to implement. Calculating the frequency offset $\hat{f}$ requires multiplication by a constant coefficient. The coefficient is converted to canonic signed digit (CSD) code [7], and the multiplication is replaced by shift-and-add operations.

For comparison, the modified ML algorithm employing the memory ring shown in Fig. 3 is also implemented in hardware. The ASIC and FPGA implementation results are summarized in Table II. The table shows that the hardware cost, especially the memory consumption, is greatly reduced by the proposed algorithm and hardware structure.

TABLE II. HARDWARE CONSUMPTION OF DIFFERENT ESTIMATORS

FPGA Implementation            Proposed EWA   Modified ML
Combinational ALUTs            867            938
Registers/bit                  340            519
Memory/bit                     49152          110592
9-bit Multiplier               8              8

ASIC Implementation            Proposed EWA   Modified ML
Combinational Area/Gates       8260           8513
Noncombinational Area/Gates    90472          202366
Total Gates                    98732          210879

V. CONCLUSION

In this paper, a low-complexity algorithm for the joint synchronization of symbol timing and carrier frequency in the CMMB system has been presented, together with implementation results. Simulation and synthesis results show that the proposed algorithm meets the synchronization requirements of the CMMB system over both the AWGN channel and the typical CMMB multipath channel, while the hardware cost is greatly reduced compared with the conventional ML algorithm. For further cost reduction, a sign-bit EWA can be used to estimate the start of the time slot. However, the accumulated delay-correlation of sign bits cannot be used for carrier frequency offset estimation. As a result, the proposed scheme uses 12-bit quantization to limit the performance loss.

REFERENCES

[1] J.-J. van de Beek, M. Sandell, M. Isaksson, and P. O. Borjesson, "Low-complex frame synchronization in OFDM systems," Proc. IEEE Int. Conf. Universal Personal Commun., Toronto, Canada, Sept. 27-29, 1995, pp. 982-986.
[2] M. H. Hsieh and C. H. Wei, "A low-complexity frame synchronization and frequency offset compensation scheme for OFDM systems over fading channels," IEEE Trans. Vehicular Technology, vol. 48, no. 5, Sep. 1999, pp. 1596-1609.
[3] "Mobile Multimedia Broadcasting Part 1: Framing Structure, Channel Coding and Modulation for Broadcasting Channel," Chinese broadcasting, film and television industrial standard GY/T 220.1-2006.
[4] Changchuan Yin, Tao Luo, Guangxin Le, "Multi-carrier Wideband Wireless Communication Technology," Beijing University of Post and Telecommunication Publication, 2004.
[5] Xiaojin Li, Yu Zheng, Zongsheng Lai, "A Low Complexity Sign ML Detector for Symbol and Frequency Synchronization of OFDM Systems," IEEE Trans. Consumer Electronics, vol. 52, no. 2, May 2006, pp. 317-320.
[6] Jack E. Volder, "The CORDIC trigonometric computing technique," IRE Trans. Electron. Comput., vol. 8, no. 3, 1959, pp. 330-334.
[7] Reza Hashemian, "A New Method for Conversion of a 2's Complement to Canonic Signed Digit Number System and its Representation," Proc. Asilomar Conf. Signals, Syst., Computers, 1997, pp. 904-907.