DSP Implementation of a BPSK SNR Estimation Algorithm for OFDM Systems in AWGN Channel
A. Doukas, A. Kotsopoulos, G. Kalivas
University of Patras, Department of Electrical and Computer Engineering, Campus of Rion, 26500, Patras, Greece
{adoukas,akotsop,kalivas}@ee.upatras.gr

Abstract-The scope of this paper is to present and analyze the implementation and optimization procedure of the Squared signal to Noise Variance (SNV) Signal-to-Noise Ratio (SNR) estimation algorithm for Orthogonal Frequency Division Multiplexing (OFDM) systems on the TMS320C6711 Digital Signal Processor (DSP) of Texas Instruments (TI), and to evaluate its performance under realistic processing conditions. SNV-SNR is a one-step SNR estimator operating on BPSK-modulated data in Additive White Gaussian Noise (AWGN) channels. First, the algorithm was implemented in standard ANSI-C with a conventional C compiler, as a simulation proof of concept and to define its basic structure. Next, the code was ported to the DSP using C6711-specific C/Assembly for optimization and embedded into a HIPERLAN/2-complying OFDM transceiver, previously implemented in our lab and running on the DSP, so that the estimator is supplied with actual OFDM data. Moreover, a thorough optimization strategy, based on techniques such as double-word access, was carried out to enhance speed and to identify the resulting time and memory constraints. Finally, the overall performance of the algorithm was analyzed in terms of SNR estimation accuracy, execution speed, and memory requirements. The estimation accuracy is very close to that of the software implementation of the algorithm, and the code occupies only 1.8 KB of the I-RAM.

Keywords: SNR estimation, DSP, optimization, OFDM.
I. INTRODUCTION
Over the last few years, the introduction of next-generation applications has led to a demand for ever higher data rates. In Orthogonal Frequency Division Multiplexing (OFDM) [1] based Wireless Local Area Networks (WLANs) such as 802.11a and HIPERLAN/2, this can be achieved by exploiting the channel state information (CSI) and adapting the transmission characteristics accordingly. Adaptive modulation and adaptive bit-loading techniques [2] move in this direction and appear able to enhance the throughput of the system significantly. The Signal-to-Noise Ratio (SNR) is a crucial parameter here, since its estimate provides significant information that can be used to maximize the utilization of the channel via such adaptive techniques.

In order to test the capabilities and constraints of an algorithm under realistic conditions, an implementation on appropriate hardware is critical. Considering the lack of flexibility of custom Application-Specific Integrated Circuits (ASICs) in reconfiguration and reprogrammability, and the recent advances in semiconductor technology that have led to a rapid increase in the clock speed and memory of DSPs, the most suitable solution is to keep the functionality entirely inside a programmable DSP. This way, fast upgrades and modifications are feasible and, at the same time, time-to-market is minimized. In addition, a DSP implementation is crucial because it enables the evaluation of the algorithm in a realistic system, where real processing conditions and quantization errors exist, in order to determine parameters that are vital for its applicability, such as execution speed, on-the-fly estimation time constraints, estimation accuracy, and memory requirements.

Whole or partial system implementations of OFDM-based WLANs on DSPs have been studied in [3-6]. In [3] and [4] the development of a whole OFDM WLAN system using two [3] or more [4] Texas Instruments (TI) DSPs is described. Besides the basic parts of an OFDM system, these implementations also include CSI parts for equalization of the received data. A similar implementation is described in [5], but for a Terrestrial Digital Video Broadcasting (DVB-T) system, focused mainly on the structure of the equalizer and the demodulator. Only in [6] is an implementation of an SNR estimation algorithm presented. That algorithm, Iterative-SNR (I-SNR), estimates the SNR from the received data after a few iterations. Its drawback is its iterative nature, which results in a significant processing load and makes it cumbersome for a real-time implementation.

In this paper we present the implementation, optimization, and evaluation of a one-step SNR estimation algorithm, the Squared-signal-to-Noise Variance SNR (SNV-SNR) estimator [7]. To the best of the authors' knowledge, this is the first time such an implementation is presented in the literature. SNV-SNR operates on a vector of BPSK-modulated data in OFDM systems with AWGN channel characteristics.
We examine its performance in terms of estimation accuracy and hardware resources, including execution speed and memory. A thorough optimization scheme is applied in order to minimize execution time without sacrificing precision, and we verify that the behavior of the implementation conforms to previous simulation results.

The rest of this paper is organized as follows. In Section II, an overview of the HIPERLAN/2-complying OFDM platform, into which we embedded the SNR estimation stage, and of the SNV-SNR estimator is given. The implementation process is presented in Section III, whereas in Section IV we describe the general optimization strategy as well as the way the scheme was applied in our case. In Section V we present the overall performance results of the implemented algorithm and a comparison with a similar published implementation in terms of execution speed, memory requirements, and estimation accuracy. Finally, concluding remarks are given in Section VI.

II. OFDM BASICS AND SNV-SNR ESTIMATOR

A. OFDM Basics
A standard OFDM symbol consists of a useful part and a cyclic prefix. Specifically, in the case of HIPERLAN/2 [8], the useful part contains 48 modulated data symbols, 12 frequency nulls, and 4 pilot carriers, whereas the cyclic prefix is the 16-sample-wide cyclic extension of the useful part and is used for robustness against multipath. A 64-point inverse FFT (IFFT) converts each symbol from the frequency domain into a sequence of complex samples in the time domain. The cyclic prefix is then attached to the start of each symbol, and an application-defined number of symbols constitutes an OFDM frame, with a preamble at the front for synchronization purposes. At the receiver, the preamble is removed after synchronization, a 64-point FFT is applied, the pilot carriers are separated, and the demodulated bits are descrambled to recover the original binary data stream. In order to supply the SNR estimator with actual OFDM data, we used the previously implemented HIPERLAN/2-complying transceiver running on the DSP, illustrated in Fig. 1, together with the additional CSI block, which in our case consists of just the SNR estimation stage.

B. Squared Signal-to-Noise Variance SNR Estimator
The Squared Signal-to-Noise Variance SNR (SNV-SNR) estimator is a one-step estimator that does not use any previously known data, such as pilots or the preamble, but only the received data symbols (DA) from receiver decisions (RX).
Thus it is denoted as Data Aided from Receiver decisions (RXDA). The estimation is performed using a vector of N BPSK-modulated data symbols, which are collected at the input of the demodulator, as shown in Fig. 1. The estimated SNR value is obtained directly from the following formula

\[
\mathrm{SNR} = \frac{\left( \dfrac{1}{N} \sum_{k=1}^{N} r_k \right)^{2}}{\dfrac{1}{N} \sum_{k=1}^{N} r_k^{2} - \left( \dfrac{1}{N} \sum_{k=1}^{N} r_k \right)^{2}} \tag{1}
\]

where $r_k$ and $N$ represent the received complex signal and the vector's size, respectively. The precision of the estimation depends on the length $N$ of the data vector with which the algorithm is provided.
Figure 1. HIPERLAN/2 OFDM transceiver and CSI block.
III. SNV-SNR IMPLEMENTATION
SNV-SNR was implemented on a TMS320C6711 DSP with an internal clock frequency of 200 MHz [9], and the development tool used for debugging and optimization was Code Composer Studio (CCS) 3.1 Platinum. Floating-point arithmetic was chosen as the numerical representation, for two reasons. First, although the C6711 is capable of executing both fixed- and floating-point code, its internal architecture is basically floating-point oriented and thus provides little flexibility for fixed-point implementations of high complexity, such as those that switch between several fixed-point representations. If functions performing conversion between different fixed-point formats were needed, hand-coded routines would have to be involved in order to maintain the desired accuracy, resulting in significant overhead and higher implementation complexity, which would in turn cancel the possible gain of a fixed-point realization over a floating-point one. Second, SNV-SNR involves arithmetic operations that are difficult to handle with fixed-point numbers without causing overflows or underflows, such as divisions, additions, and subtractions repeated over a rather large number of iterations inside loops. Thus, with the evaluation of estimation precision under realistic conditions being the primary goal, it was decided to use floating-point arithmetic along with a powerful optimization strategy for cycle minimization.

The implementation procedure started by coding the algorithm in conventional ANSI-C with a standard C compiler, so as to roughly define the necessary variables and construct the basic structure of the implementation. Next, the algorithm was ported to the C6711 using DSP-specific C and adapted to the HIPERLAN/2-complying platform, so that its correctness could be verified using actual OFDM samples from the transceiver.

The program makes use of a circular buffer to continuously collect the necessary OFDM samples, the number of which is defined in a separate header file at compile time together with every parameter of the transceiver. A semaphore is used to determine the moment when sufficient samples have been gathered, and the sample vector is then passed to the estimator. The SNV-SNR implementation, as well as the OFDM system, was placed in the internal RAM (IRAM) of the DSP, since the code size allowed it and the IRAM is significantly faster than any external memory because it avoids external bus access for input/output.

The size of the data vector with which the estimator is supplied is a trade-off between execution speed and estimation precision. As the measurements from the DSP implementation showed, the algorithm needs a minimum of 400 samples in order to provide sufficiently accurate estimates. The resulting high processing load makes it necessary to apply a powerful optimization scheme to the first implementation; this scheme is analyzed in the next section.

IV. SNV-SNR OPTIMIZATION
In subsection A the overall optimization strategy is presented. The analysis of each step, as it was applied to SNV-SNR, is discussed in subsection B.

A. Optimization strategy overview
Here we introduce a general strategy which can be used in similar implementations using CCS. Our selected strategy, whose main concern is the optimization of execution speed, was structured as follows:
1. Place the whole code in IRAM if possible.
2. Re-arrange/re-write the code, exploiting any algorithmic redundancies via reuse of intermediate results.
3. Avoid function calls, either via macros and inlines or by integrating the function bodies into the main program.
4. Compile the code with full CCS compiler optimization enabled (for example, the -o3 option).
5. Replace basic arithmetic operations, such as additions, multiplications, and reciprocals, with their DSP-specific intrinsic equivalents.
6. Make use of any DSP-specific libraries available, such as dsp67x.lib, after having profiled them to ensure the desired performance.
7. Profile the code and identify the most cycle-consuming parts, most probably loops with high processing load.
8. Re-write these parts using any possible optimization techniques, such as double-word access or further loop unrolling.
9. Exploit any No-Operations (NOPs) in the compiler-produced assembly and replace them with useful code if possible, thus taking full advantage of the pipelining capabilities of the DSP.
10. As a last resort, due to the great programming effort needed, write the most time-consuming parts in hand-coded assembly from scratch.
If the performance of the code is still not satisfactory, go back to step 7 and try to split the sensitive parts into smaller ones, optimizing each one separately.

B. SNV-SNR optimization analysis
The first stage of optimization relied on modifications of the coding structure of SNV-SNR. According to the scheme, the computational flow was modified and new buffers for the storage of reusable intermediate results were defined, in order to exploit code redundancies. Additionally, the usage of function calls was minimized by integrating complicated arithmetic, such as complex divisions, into the main code so as to avoid the related overheads.

In the next stage the powerful optimization features of the C6x compiler that comes with CCS were applied. Full optimization (-o3) and speed-most-critical options were set, along with C67x-specific features, in order to fully utilize the hardware characteristics of our target DSP. Finally, arithmetic operations such as reciprocals were replaced by their equivalent C67x compiler intrinsics, which are C-callable functions that map directly to C67x-optimized assembly code and therefore enhance performance.

Having exhausted compiler-based improvements, the code was profiled and the most cycle-consuming parts were identified. These parts were mostly loops with high processing load. Three options were available for speeding up these code parts: (I) optimized dsp67x library functions, (II) optimization techniques such as loop unrolling and double-word access in combination with compiler intrinsics, and (III) fast algorithms presented in the literature, for example in [10], where the Newton-Raphson Inverse (NRI) method for square root estimation and the Equiripple-Error Magnitude method for complex magnitude estimation are evaluated.

Finally, a profile-and-compare stage was carried out, which showed that the most suitable options were (II), (III), and, in some cases, a combination of the two. Special care was taken with the fast algorithms chosen in (III), in order to ensure that the maximum precision errors they introduce did not significantly worsen the resulting SNR estimates. In the worst case, the maximum deviation of the resulting SNR from the original approach was approximately 0.05 dB.
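To make the idea behind the fast square root of option (III) concrete, the following is a minimal sketch in portable C of a square root computed through a Newton-Raphson iteration on the inverse square root. It is an assumption-laden illustration rather than the routine used on the C6711 or the exact method of [10]: the function name `nri_sqrt`, the linear seed, the normalization through `frexp`/`ldexp`, and the fixed count of four iterations are all choices made here for clarity.

```c
#include <assert.h>
#include <math.h>

/* Square root via Newton-Raphson iteration on the inverse square root.
 * Hypothetical helper; assumes x is finite and non-negative. */
double nri_sqrt(double x)
{
    int e;
    int i;
    double m;
    double p;

    assert(x >= 0.0);
    if (x == 0.0) {
        return 0.0;
    }

    /* Normalise x = m * 2^e with m in [0.25, 1) and e even. */
    m = frexp(x, &e);            /* frexp gives m in [0.5, 1) */
    if (e % 2 != 0) {
        m *= 0.5;
        e += 1;
    }

    /* Linear seed for 1/sqrt(m), exact at m = 0.25 and at m = 1. */
    p = (7.0 - 4.0 * m) / 3.0;

    /* Refinement p <- p * (1.5 - 0.5 * m * p^2); the loop itself uses
     * only multiplications and subtractions, no division. */
    for (i = 0; i < 4; ++i) {
        p = p * (1.5 - 0.5 * m * p * p);
    }

    /* sqrt(x) = m * (1/sqrt(m)) * 2^(e/2) */
    return ldexp(m * p, e / 2);
}
```

The appeal of this formulation on a DSP is that the iteration avoids division entirely, and the number of iterations can be reduced when a better seed is available, trading precision for cycles. Any such shortcut should be checked against the resulting SNR deviation, as was done for the fast algorithms in the text.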
V. PERFORMANCE RESULTS
In this section we present the performance results of SNV-SNR. Fig. 2 compares the estimation accuracy of the DSP-implemented algorithm with that of the software-implemented algorithm. The hardware implementation is very close to the software implementation. The negligible difference is due to the inevitable quantization errors introduced by the DSP and to the algorithms used for fast square root and complex magnitude estimation.

Fig. 3 depicts the estimation accuracy of the algorithm for different numbers of data samples. First, the estimation accuracy is very high in every case. In addition, the trade-off between accuracy and the number of samples used for estimation is evident: the more samples are used, the better the estimate. Taking into consideration that the additional 600 samples between the 400-sample and the 1000-sample implementation add a significant computational load to the DSP, and that the estimation accuracy is only slightly degraded when only 400 samples are used, we chose to perform the estimation using 400 samples.

Fig. 4 compares SNV-SNR and I-SNR in terms of estimation accuracy. Both algorithms perform similarly, except for a slight advantage of I-SNR at lower SNR values. Table I lists the hardware implementation requirements of SNV-SNR together with a comparison with I-SNR in terms of speed. The optimization process achieved a significant decrease in the number of cycles. The penalty for this decrease is a slightly increased code size, which nevertheless still fits in IRAM. Finally, although I-SNR provides slightly better estimates at low SNR values, SNV-SNR is significantly faster and thus better meets real-time constraints.

Figure 2. Hardware vs. Software Implementation. [Plot "SNV-SNR estimation accuracy": estimated SNR versus actual SNR, both from 0 to 20. Curves: actual SNR; hardware estimated SNR for 1000 samples; software estimated SNR for 1000 samples.]

Figure 3. SNV-SNR estimation accuracy with different sample number. [Plot "SNV-SNR estimation accuracy": estimated SNR versus actual SNR, both from 0 to 25. Curves: actual SNR; estimated SNR for 400, 600, and 1000 samples.]

Figure 4. SNV-SNR vs. I-SNR Estimation accuracy comparison. [Plot "Comparison of SNV-SNR vs I-SNR": SNR estimated (dB) versus actual SNR. Curves: actual SNR; SNR estimation with I-SNR; SNR estimation with SNV-SNR.]

VI. CONCLUSIONS
In this work we have presented the implementation and optimization procedure of a single-step SNR estimation algorithm, SNV-SNR, on a TI C6711 DSP, operating on BPSK-modulated data for OFDM systems in an AWGN channel, and we have suggested an optimization process for such implementations. We showed that the estimation accuracy of the implemented algorithm is similar to that of the software implementation of the same algorithm. In addition, we compared the implemented algorithm with a previously implemented iterative SNR estimation algorithm: our implementation achieves slightly worse estimation accuracy but much higher execution speed. Finally, through the optimization procedure we further lowered its time requirements while keeping its code size very low, so that it fits into the IRAM of the DSP.
Table I. SNV-SNR Hardware Requirements and Performance.

Optimization level    Number of samples    Cycles    Code size
No optimization       500                  20400     1.5 KBytes
No optimization       1000                 35000     1.5 KBytes
Full optimization     500                  1520      1.8 KBytes
Full optimization     1000                 2580      1.8 KBytes

Cycles for 64-point FFT: 900
I-SNR cycles: 9000 (6 iterations with 1000 samples)

REFERENCES
[1] R. Van Nee and R. Prasad, OFDM for Wireless Multimedia Communications. Artech House, 2000.
[2] A. M. Wyglinski, F. Labeau, and P. Kabal, "Bit loading with BER-constraint for multicarrier systems," IEEE Trans. on Wireless Comm., vol. 4, pp. 1383-1387, June 2005.
[3] M. F. Tariq, Y. Baltaci, T. Horseman, M. Butler, and A. Nix, "Development of an OFDM based High Speed Wireless LAN Platform using the TI C6x DSP," Proc. IEEE International Conf. on Communications, pp. 522-526, New York, USA, 2002.
[4] B. McNair, L. J. Cimini, and N. Sollenberger, "Implementation of an experimental 384 kb/s radio link for high-speed Internet access," Proc. IEEE 52nd Vehicular Technology Symposium, pp. 323-330, Boston, USA, Sept. 24-28, 2000.
[5] F. Frescura, S. Pielmeier, G. Reali, G. Baruffa, and S. Cacopardi, "DSP Based OFDM Demodulator and Equalizer for Professional DVB-T Receivers," IEEE Trans. on Broadcasting, vol. 41, pp. 323-332, Sep. 1999.
[6] A. Doukas, A. Kotsopoulos, and G. Kalivas, "DSP Implementation of SNR Estimation Algorithm for OFDM Systems," accepted for presentation, IEEE Intern. Conf. Comm. Circuits and Systems, June 25-28, 2006.
[7] A. Doukas and G. Kalivas, "BPSK SNR estimation algorithms for HIPERLAN/2 transceiver in AWGN channel," Proc. IEEE International Conf. on Advanced Communication Technology, pp. 196-200, Phoenix Park, Korea, Feb. 21-23, 2005.
[8] ETSI Broadband Radio Access Network (BRAN); HIPERLAN/2 Technical Specifications, Physical layer, V0.k., August 1999.
[9] Texas Instruments, "TMS320C6711, TMS320C6711B, TMS320C6711C, TMS320C6711D floating-point digital signal processors," Literature Number: SPRS088M, Febr. 1999, Revised Febr. 2005.
[10] M. Allie and R. Lyons, "A root of less evil," IEEE Signal Processing Magazine, vol. 22, no. 2, pp. 93-96, March 2005.