A Low Power Reconfigurable Heterogeneous Architecture for A Mobile SDR System

Zong Wang, School of Engineering & Electronics, University of Edinburgh

Tughrul Arslan, School of Engineering & Electronics, University of Edinburgh

Abstract

The main challenge in designing a mobile wireless Software Defined Radio (SDR) system is to provide a solution that combines high flexibility, hardware-like throughput and low power consumption with ease of programmability. In this paper, the authors propose a new architecture for SDR based on a reconfigurable instruction cell array (RICA). The architecture targets the IEEE 802.11g standard, which includes Viterbi decoding, a key performance bottleneck. One of the salient novel features of this architecture, compared to existing solutions, is the adoption of a multiprocessor frame segmentation scheme when implementing the physical layer of this standard. The paper describes the architecture, the associated software design flow, and its performance efficiency. We demonstrate that the architecture can achieve a raw data throughput of 30.6 Mbps for an 802.11g receiver at a core power of 8.1 mW.

1. Introduction

Software Defined Radio (SDR) is currently one of the most important topics in the area of mobile and personal communications. It is viewed as an enabler of global roaming and as a unique platform for the rapid introduction of new services into existing cellular and local networks. SDR therefore promises mobile communication technology a major increase in flexibility, capability and durability. Figure 1 illustrates trends in the implementation of SDR, matching the performance requirements of future mobile systems on one hand with the added demand for flexibility on the other. Since the concept of SDR was introduced, a significant amount of R&D has been carried out in academic institutions as well as in industry. Researchers have explored solutions from the three aspects stated above, trying to balance them within different budgets. However, few of these have succeeded in making the solution suitable for mobile use, where low power and low cost, together with the ability to reconfigure on the fly, are critical. Most proposed solutions are either difficult to program [3] and/or costly [4].

978-1-4244-2796-3/08/$25.00 © 2008 IEEE

Fig. 1 SDR flexibility trend

In this paper, we propose a solution for mobile SDR based on a novel RICA processing element that provides low-power, hardware-like performance while being programmable entirely in ANSI-C.

2. Architecture overview

2.1 RICA architecture overview [6]

As shown in Fig. 2, RICA's coarse-grain processing fabric, built from Instruction Cells (ICs), executes assembly-like instructions with hardware-level performance, which allows a straightforward, CPU-like design flow.


FPT 2008

Fig. 2 RICA core design high level flow

Applications such as wireless communication standards mapped onto RICA can be written in C or C++ and compiled with a tailored C compiler. From the resulting assembly code, a SIMD pipelined scheduler generates a step-based netlist file. The ICs are connected through a network of programmable switches; data paths are created by reading the configuration bits from the netlist files stored in program memory, and the interconnect configuration is dynamically reprogrammed at run time according to the schedule information. RICA uses an Eclipse-based C IDE as its development tool, so programmers can map applications with little knowledge of hardware description languages. Furthermore, since the ARM controller is also programmed in C, the whole design process can be completed within a C environment, which accelerates development and reduces cost. Another advantage is that both the RICA core and the ARM core are 32-bit architectures, so there is no need to rescale the data exchanged between the two processing cores.

Table I Architecture comparison between different SDR solutions

| | SODA [3] | Sandbridge Sandblaster [9] | IMEC [10] | Ours |
|---|---|---|---|---|
| System conf. | 1 ARM + 4 PEs, statically scheduled | 4 PEs, multi-threading | 1 ARM + 2 BB engines | 1 ARM + 2 RICA cores |
| PE conf. | VLIW with SIMD ops | GPP + SIMD | VLIW + coarse-grained array | SIMD reconfigurable instruction cells |

2.2 Hardware architecture

In commercial wireless communication systems, low-computation algorithms are often handled by DSPs, whereas high-computation algorithms are mapped onto ASICs or FPGAs. Such a system is typically an integrated SoC with a simple controller, such as an ARM processor [3], that handles control and input/output tasks. Considering the complexity of these real-time systems, we integrate the individual baseband processing algorithms on a single RICA core using a unified programming language, ANSI-C, together with the ARM controller. This scheme gives programmers the opportunity to concentrate on optimizing system performance, rather than spending months learning how to use different design tools. Through the implementation of the 802.11g, Bluetooth and WiMAX protocols, we found that most DSP algorithms, such as FFTs, FIR filters, Viterbi decoders and cipher engines, rely on a high degree of SIMD parallelism [7]. Their main operations work on long vectors with short data widths. Although the ICs in RICA are 32-bit integer based, some of them can be configured into 4QI or 2HI vector SIMD operation modes, making RICA a more efficient baseband processing element than conventional DSPs [8].

All the processors in the architecture are connected through an AHB bus interface with a wide 64/128-bit data width, as shown in Fig. 3. Peripherals are connected through the APB bus. Digital signal data, whether fed directly from the A/D converter, the digital front end or other communication modules, is either coupled to the RICA core through a DMA engine, or preprocessed by the ARM microcontroller and then fed into the RICA core over the internal AHB bus (when speed is not critical but frame-level management is required). Each RICA core can be assigned different tasks; this assignment is decided in the high-level software design flow. Once a netlist file has been generated for each RICA core, it can be downloaded into that core individually, either through the internal AHB bus or through the debug interface. The netlist files contain the step context that is used to configure the RICA cores and indicates which instruction cells are active in a given step. In addition, they provide routing information to build the connections between ICs, as well as timing information. The program counter is maintained by the RICA core itself [6].

Fig. 3 Hardware architecture diagram

2.3 System design environment

Mapping of a communication protocol begins with simulation and functional verification in MATLAB, Simulink or a floating-point C implementation. Unlike traditional SoC design, the development flow then separates into system-level design and RICA profile design [7]. Both are performed in fixed-point precision in a C environment; the only difference is that RICA uses an Eclipse-based C IDE, whereas the ARM uses an ARM C development toolkit.

3. System assessment

3.1 Case study

Figure 4 illustrates the 802.11g OFDM physical layer that has been mapped onto our proposed system. Since the 802.11g protocol is able to work in full-duplex mode, meaning that the transmitter and receiver can run simultaneously, both have been implemented.

Fig. 4 802.11g physical layer data flow

The mapping methodology is based on a single RICA core; the configuration of the kernel algorithms is shown in Table II. The same profile can also be used to program the other RICA core in our architecture. Data frames are fed into the individual RICA processors following a frame segmentation method, as shown in Fig. 5. The frame sequence detection unit extracts the 2-byte frame sequence control information from the frame header and then tells the data buffer pool which frame to feed into which RICA processor. We use an even/odd method: every even frame is fed into RICA processor 1, and every odd frame into RICA processor 2. As a result, each RICA processor handles the same number of frames, and their execution times are close. The idle time of each processor is therefore low, and the throughput of the system can be roughly doubled for the same amount of frames. The simulation results, together with a comparison to the SODA architecture, are presented in Table III.

Table II Kernel algorithms configuration of 802.11g in our mapping process

| Kernel algorithm | Configuration | Rx task weight (%) |
|---|---|---|
| IFFT / FFT | 64 points | 9.64 |
| QAM / DQAM | 64 constellation points | 2.96 |
| Conv. Enc. | K=7, Rate=3/4 | 2.16 |
| Viterbi Dec. | K=7, soft input | 83.08 |

Fig. 5 Frame segmentation scheme for multiple RICA cores

Table III Performance comparison between our architecture and SODA

| Throughput | 1 RICA @ 200 MHz | 2 RICAs @ 200 MHz | SODA @ 400 MHz |
|---|---|---|---|
| Tx | 54 Mbps | 54 Mbps | 24 Mbps |
| Rx | 15.6 Mbps | 30.1 Mbps | 24 Mbps |

| Processing load | Steps (1 RICA) | SODA cycles |
|---|---|---|
| Tx | 23.2 Mcycles/s | 806 Mcycles/s |
| Rx | 21.5 Mcycles/s | 1194 Mcycles/s |

3.2 Analysis

Our approach achieves higher receiver and transmitter throughput at a lower clock frequency. Meanwhile, the long data-path kernel loops inside the RICA processor reduce the number of steps (RICA cycles) needed to complete a full physical layer, which also lowers the memory access frequency and the power consumption. Because our power estimation software is not yet complete, the interconnect and routing energy is not yet available. Even so, using RICA as the coprocessor in our approach shows a significant core power advantage over the SODA solution, as shown in Table IV, and over other proposed architectures [9][4]. Moreover, given the programming flexibility of the C language, further optimization techniques can be applied during RICA profile design to improve the system performance significantly. Custom cells can also be created and used as accelerators for special application requirements.

Table IV Power comparison between our RICA and SODA's PE

| Processor | Components excluded | 802.11a, 24 Mbps | 802.11g, 54 Mbps |
|---|---|---|---|
| SODA PE | SIMD pipeline, clock, routing, interconnect | 1879 mW | N/A |
| Our RICA | Routing, interconnect | N/A | 8.1 mW |

4. Conclusion

In this paper, we have proposed a heterogeneous architecture targeting mobile SDR. The hardware platform is composed of an ARM controller and several processors based on the Reconfigurable Instruction Cell Array. The physical layer of the 802.11g protocol has been implemented on this architecture. Simulation results demonstrate the advantage of the architecture for mobile SDR terminals with high throughput requirements and tight power budgets.

5. References

[1] Gerard K. Rauwerda, Paul M. Heysters, and Gerard J. M. Smit, "Towards Software Defined Radios Using Coarse-Grained Reconfigurable Hardware," IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 4, pp. 3-13, Jan. 2008.
[2] John Glossner, Daniel Iancu, Jin Lu, et al., "A software defined communications baseband design," IEEE Communications Magazine, vol. 41, no. 1, pp. 120-128, Jan. 2003.
[3] Yuan Lin, Hyunseok Lee, Yoav Harel, Mark Woh, et al., "SODA: A low-power architecture for software radio," Proc. 33rd Intl. Symposium on Computer Architecture (ISCA), vol. 1, pp. 89-100, Jun. 2006.
[4] Hans-Martin Bluethgen, Cyprian Grassmann, Wolfgang Raab, Ulrich Ramacher, et al., "A programmable baseband platform for software-defined radio," Proc. 2004 Software Defined Radio Technical Conference.
[5] Rob Pelt, Martin Lee, "Low power software defined radio design using FPGAs," Proc. of the SDR 2005 Technical Conference and Product Exposition.
[6] Sami Khawam, Ioannis Nousias, Mark Milward, et al., "The reconfigurable instruction cell array," IEEE Trans. on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 1, pp. 75-85, Jan. 2008.
[7] Zong Wang, Tughrul Arslan, Ahmet Erdogan, "Implementation of hardware encryption engine for wireless communication on a reconfigurable instruction cell array," Proc. of IEEE International Symposium on Electronic Design, Test & Applications (DELTA 2008), vol. 1, pp. 148-152, Jan. 2008.

[8] Stephen M. Blust, "Perspective on software defined radio focusing on reconfiguration and software download," Cingular Wireless (v3.0), Sept. 8, 2003.
[9] Sandbridge white paper: http://www.sandbridgetech.com
[10] IMEC research: http://www.imec.be
