Heterogeneous Multi-Core Architecture for a 4G Communication in High-Speed Railway
Mariem Makni*, Mouna Baklouti†, Smail Niar*, Morteza Biglari-Abhari and Mohamed Abid†
* LAMIH, University of Valenciennes, France. Email: [email protected] [email protected]
† National Engineering School of Sfax, Tunisia. Email: {mouna.baklouti, mohamed.abid}@enis.rnu.tn
University of Auckland, New Zealand. Email: [email protected]
Abstract—The fast development of high-speed railway (HSR) as a high-mobility intelligent transportation system (ITS), together with the growing demand for broadband services among HSR users, introduces new challenges to wireless communication systems. The 4G Long Term Evolution (LTE) standard has been widely used to satisfy the needs of HSR communication systems. The key part of the 4G LTE standard is Orthogonal Frequency Division Multiplexing (OFDM) modulation. To achieve reliable communication and meet the HSR demands of high-performance processing and low energy consumption, we propose a flexible heterogeneous multi-core architecture for an embedded LTE MIMO-OFDM system using a Field Programmable Gate Array (FPGA). In this paper, different multi-core configurations of the LTE MIMO-OFDM system are explored and their performance is evaluated on the Xilinx Zynq FPGA platform. The consumed area, power and execution times of the different configurations are analyzed and compared in order to propose the most efficient architecture for this application.
Keywords—Intelligent Transportation Systems (ITS); High-speed railway (HSR); Multiple-Input Multiple-Output (MIMO); Long Term Evolution (LTE); Orthogonal Frequency Division Multiplexing (OFDM); Field Programmable Gate Array (FPGA)
I. INTRODUCTION
Given the recent development of high-speed railway (HSR) and the complexity, speed and diversity of its operating environments, it is necessary to exploit existing technologies to support safer and more efficient train operations [1]. One of the main parts of the HSR system is the signaling system, also known as the train operation control system. Since HSR services have been introduced or planned in many countries, the train has become the dominant mode of transport on many routes, which confronts HSR systems with severe challenges in passenger flow security monitoring. Video surveillance in high-speed railway systems, also known as Closed-Circuit Television (CCTV), is considered one of the modern applications of Intelligent Transportation Systems (ITS) that make the transport field better managed and more efficient. However, realizing CCTV as an intelligent transportation system is complex and requires the integration of communication and information technologies to achieve high-speed data exchange and reliable data communication. For this reason, one promising solution is the wireless communication technology called 4G Long Term Evolution (LTE).
978-1-4673-9994-4/15/$31.00 ©2015 IEEE
LTE is a recent standard for very high-speed radio communication, which offers interesting prospects and could be an alternative to existing Vehicle-to-Vehicle communication. The 4G LTE standard has been set up for train communication and the transmission of train control data. It supports many applications, such as video surveillance in trains and at stations, and passenger information services. The core of the 4G LTE standard is Orthogonal Frequency Division Multiplexing (OFDM). OFDM is a popular signal modulation technique that is well suited to the requirements of modern wireless network systems. It provides high bandwidth efficiency and reliable signal transmission by dividing the available spectrum into a group of closely spaced orthogonal sub-carriers, instead of transmitting a high-rate data stream on a single carrier. The combination of Multiple-Input Multiple-Output (MIMO) and OFDM technologies, known as MIMO-OFDM, has the advantage of supporting more antennas and providing higher data rates within an available transmission bandwidth [2]. Using deep sub-micron technology to build a Heterogeneous Multi-Processor System-on-Chip (Ht-MPSoC) is one of the promising approaches. In the near future, it will be possible to use hybrid embedded platforms that combine several cores with a large area of reconfigurable Logic Elements (LE) to implement very complex hybrid architectures supporting complex applications. Field Programmable Gate Arrays (FPGAs) such as the Zynq UltraScale+ from Xilinx, the Cyclone V from Altera and the SmartFusion from Microsemi are examples of existing Ht-MPSoCs [3][4][5]. Ht-MPSoCs are composed of different kinds of cores and/or different hardware accelerators. In such systems, multiple processing elements (PEs) are used to satisfy the computational needs of the target application and to meet the power/performance constraints.
To improve performance while respecting the critical embedded design constraints, namely low energy consumption, we propose a new MIMO-OFDM implementation for 4G LTE systems using an FPGA-based Ht-MPSoC. This paper explores several architectures and evaluates their performance, resource utilization and power/energy consumption to find the architecture that gives the best trade-offs. Current embedded signal processing applications such as the LTE MIMO-OFDM Tx/Rx include complex, specific and heterogeneous functions and tasks that require different kinds of processing units and suitable cores. Hence, efficient exploitation of the different sources of parallelism in the target application needs heterogeneous computing resources to achieve the required performance/power consumption trade-offs.
The main role of the IFFT is to transform the signal from the frequency domain to the time domain. Using the Cyclic Prefix (CP) technique is advantageous in a multipath channel because inter-symbol interference (ISI) can be eliminated by prepending a guard interval to each block of data, provided the guard interval is at least as long as the channel delay spread.
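As a minimal sketch of the cyclic prefix step, assuming a block of time-domain samples held in a Python list, the prefix is simply a copy of the last samples of the block placed in front of it. The function names `add_cyclic_prefix` and `remove_cyclic_prefix` are illustrative and do not come from the C implementation evaluated in this paper.

```python
def add_cyclic_prefix(block, cp_len):
    """Return the block with its last cp_len samples copied to the front."""
    if not 0 <= cp_len <= len(block):
        raise ValueError("cp_len must be between 0 and the block length")
    # Slicing from len(block) - cp_len also handles cp_len == 0 correctly.
    prefix = block[len(block) - cp_len:]
    return prefix + block


def remove_cyclic_prefix(block_with_cp, cp_len):
    """Drop the first cp_len samples, as the receiver does before the FFT."""
    return block_with_cp[cp_len:]


symbol = [1, 2, 3, 4, 5, 6, 7, 8]
with_cp = add_cyclic_prefix(symbol, 2)
print(with_cp)  # [7, 8, 1, 2, 3, 4, 5, 6, 7, 8]
```

Because the prefix repeats the tail of the same block, echoes of the previous block that arrive within the guard interval fall on samples the receiver discards.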
The trend towards multi-core technologies on a single chip offers great potential for the computational power and performance required by signal processing applications. The advantages of heterogeneous architectures come from many sources. The most important one is a more efficient adaptation to diverse application needs. In fact, many signal processing applications demand different levels of processing capability and different energy consumption trade-offs, which may be satisfied by using a heterogeneous multi-core architecture.
B. OFDM Receiver
As presented in Fig. 1, the OFDM receiver performs the reverse process of the OFDM transmitter. It includes a serial-to-parallel converter, an FFT, OFDM demodulation and a parallel-to-serial stage. The incoming signal is fed to a serial-to-parallel converter. After the CP is removed, the received signal is sent to an N-point Fast Fourier Transform (FFT), which transforms the time-domain signal to the frequency domain.
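The transmitter/receiver symmetry can be illustrated with a small round trip. This is a sketch under stated assumptions: a naive O(N^2) DFT stands in for the FFT/IFFT blocks, the channel is ideal, and the helper names (`dft`, `transmit_symbol`, `receive_symbol`) are hypothetical rather than taken from the paper.

```python
import cmath


def dft(samples, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT/IFFT)."""
    n = len(samples)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += samples[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        # The inverse transform carries the 1/N normalisation.
        result.append(acc / n if inverse else acc)
    return result


def transmit_symbol(freq_symbols, cp_len):
    """IFFT, then prepend the last cp_len time samples as cyclic prefix."""
    time_samples = dft(freq_symbols, inverse=True)
    return time_samples[len(time_samples) - cp_len:] + time_samples


def receive_symbol(rx_samples, cp_len):
    """Remove the cyclic prefix, then transform back to the frequency domain."""
    return dft(rx_samples[cp_len:])


# Eight QPSK constellation points, one per sub-carrier (illustrative data).
qpsk = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j, 1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
recovered = receive_symbol(transmit_symbol(qpsk, cp_len=2), cp_len=2)
max_error = max(abs(a - b) for a, b in zip(qpsk, recovered))
print(f"largest round-trip error: {max_error:.2e}")
```

Over an ideal channel the recovered symbols match the transmitted ones up to floating-point rounding; a real receiver would add channel estimation and equalization between the FFT and the demodulator.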
In this work, we first analyze the LTE MIMO-OFDM application. Then a set of FPGA-based heterogeneous multi-core architectures is explored to execute the LTE MIMO-OFDM application with the required performance and energy efficiency. The implemented architectures are evaluated with three standard modulation schemes, namely Quadrature Phase Shift Keying (QPSK), 16-Quadrature Amplitude Modulation (16-QAM) and 64-QAM. The purpose is to measure, for each of these modulations, the performance and energy consumption of each multi-processor architecture on our FPGA. The flexibility of the FPGA fabric is a valuable asset when performing repetitive computations for different modulation schemes. The remainder of this paper is organized as follows. Section II presents an overview of the OFDM Tx/Rx processing blocks. Section III discusses related works on the embedded implementation of the LTE system. Section IV describes our proposed FPGA-based heterogeneous multi-core architecture. The experimental results used to evaluate the different architectures are discussed in Section V. Section VI concludes the paper with a brief outlook on future work.
II. ORTHOGONAL FREQUENCY DIVISION MULTIPLEXING (OFDM)
In LTE OFDM modulation, the large bandwidth is divided into multiple parallel sub-channels called sub-carriers, which are transmitted orthogonally in a multipath environment. OFDM avoids inter-symbol interference (ISI) by using a cyclic prefix technique [2]. The LTE OFDM transceiver system has two main modules: a transmitter module and a receiver module. In addition, LTE OFDM is computationally efficient because it uses FFT and IFFT techniques to implement the modulation and demodulation functions. The block diagram of an LTE OFDM transceiver is shown in Fig. 1. The different blocks composing the LTE OFDM system are briefly described in the next subsections.
Fig. 1. Block Diagram of OFDM system
In our work, we use the fixed parameters shown in Table I; only the modulation scheme changes between cases. Table I lists the parameters used for the implementation of the LTE MIMO-OFDM system on FPGA. For a bandwidth of 10 MHz, the transmission bandwidth contains 50 resource blocks, which correspond to the 600 sub-carriers that form the first dimension of the input matrix in Fig. 2. An IFFT/FFT of size 1024 is usually associated with 10 MHz. We specify the 2x2 MIMO mode, which enables multiple-antenna transmission.
TABLE I. MAIN SYSTEM PARAMETERS FOR LTE BANDWIDTH OF 10 MHZ
Parameter | Value
Tx antennas x Rx antennas | 2x2
Channel bandwidth (MHz) | 10
Number of resource blocks | 50 (600 sub-carriers)
FFT size | 1024
OFDM symbols per subframe | 14/12 (normal/extended cyclic prefix)
Modulation type | QPSK / 16-QAM / 64-QAM
A. OFDM Transmitter
The main components of the LTE OFDM transmitter are shown in Fig. 1. It consists of four processing blocks: a serial-to-parallel converter, OFDM modulation, an Inverse Fast Fourier Transform (IFFT) and a parallel-to-serial module.
For all modulation schemes, profiling results of the LTE MIMO-OFDM transceiver system show that IFFT & FFT functions are the most time consuming and critical functions in the LTE MIMO-OFDM Tx/Rx. Therefore, the implementation of these 2 blocks must be optimized to achieve better performance. To give more details about the system process, we present a task flow for our application.
The input OFDM symbols are first sent serially to the transmitter block, where a serial-to-parallel converter distributes them over the sub-channels. The number of sub-channels defines how many bands the total spectrum is subdivided into. The parallel symbols are then sent to the IFFT module.
As shown in Fig. 2, the input OFDM symbols are represented by a matrix of complex numbers of size (600x14x2). Sequential functions then pack the data, add the direct current (DC) sub-carrier and reorder the input data, generating an output data structure named “Tmp”. The structure “Tmp” is then passed to the IFFT module, after which the CP is added. Finally, the output data is sent to the parallel-to-serial block.
III. RELATED WORKS
Several existing studies have evaluated OFDM implementations based on computer simulation and/or theoretical results, but few have evaluated LTE MIMO-OFDM systems implemented on an FPGA platform. The authors in [6] and [7] report simulation results for an OFDM Tx/Rx system modeled in VHDL using the Xilinx ISE tool. They used Matlab to validate and compare the simulation results. However, no implementation on FPGA hardware was considered. The authors in [6] used 8 sub-carriers to simulate the OFDM system. This simple, basic implementation has the advantage of reducing complexity and processing time, but the resulting system has low spectral efficiency. To improve the spectral efficiency, the number of sub-carriers can be increased [8]. In [8], the author investigates the concept and feasibility of an OFDM system by varying some of its principal parameters to improve performance and computing speed.
[Fig. 2 diagram: Input OFDM symbols (600x14x2) -> Pack data, add DC and reorder (sequential block) -> Tmp [28672] -> IFFT processing (parallel block) -> Add Cyclic Prefix (CP) & parallel-to-serial converter (sequential block) -> Output [30720]]
Fig. 2. Task flow of the LTE MIMO-OFDM Transmitter system
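The buffer sizes annotated in Fig. 2 can be cross-checked with a few lines of arithmetic. The sketch below is a consistency check under one assumption that is not stated in the paper: the cyclic prefix lengths are the LTE normal-CP values scaled to a 1024-point FFT (80 samples for the first symbol of each slot, 72 for the other six). All constant names are illustrative.

```python
FFT_SIZE = 1024            # Table I
SYMBOLS_PER_SUBFRAME = 14  # normal cyclic prefix
ANTENNAS = 2               # two transmit streams in the 2x2 MIMO mode
USED_SUBCARRIERS = 600     # first dimension of the input matrix in Fig. 2

# Assumed LTE normal-CP lengths for a 1024-point FFT (not given in the paper).
CP_FIRST_SYMBOL = 80
CP_OTHER_SYMBOLS = 72
SYMBOLS_PER_SLOT = 7
SLOTS_PER_SUBFRAME = 2

# Complex values entering the transmitter.
input_elems = USED_SUBCARRIERS * SYMBOLS_PER_SUBFRAME * ANTENNAS

# One FFT-sized block per OFDM symbol and per antenna.
tmp_elems = FFT_SIZE * SYMBOLS_PER_SUBFRAME * ANTENNAS

# Cyclic prefix samples added per slot, then per subframe and antenna.
cp_per_slot = CP_FIRST_SYMBOL + (SYMBOLS_PER_SLOT - 1) * CP_OTHER_SYMBOLS
output_elems = tmp_elems + cp_per_slot * SLOTS_PER_SUBFRAME * ANTENNAS

print(input_elems, tmp_elems, output_elems)  # 16800 28672 30720
```

Under that assumption the computed sizes agree with the Tmp [28672] and Output [30720] labels in Fig. 2.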
The authors in [9] and [10] propose efficient methods for implementing FFT/IFFT algorithms in an OFDM system. These methods focus on reducing the multiplicative complexity and the computation time of the FFT/IFFT functions, and thereby on increasing the speed and performance of the OFDM system. However, the FPGA-based implementations in these works are written in VHDL and evaluated only at the simulation level. The OFDM-based systems presented in [11] were generated from Matlab and described directly in VHDL. That work relies only on the ModelSim tool to simulate the hardware model in VHDL. Developing and implementing pure hardware systems limits flexibility, in particular given the rising needs of embedded signal processing applications.
The authors in [12] explore and evaluate three different architectures implemented on the ARM processor, the embedded NEON engine and the Vectorblox MXP soft vector processor on a Zynq. They demonstrated that the FPGA-based MXP soft vector processor can outperform the NEON hard vector processor in FPGA-based embedded systems. However, the power gain is relatively low. In addition, the different processing units are not exploited simultaneously.
The ARM big.LITTLE [13] is a recent heterogeneous computing architecture developed by ARM Holdings. This architecture can be found in recent smartphones and tablets, such as the Samsung Galaxy S5. It integrates two different core types with the same instruction-set architecture (ISA): low-power processor cores (LITTLE) and powerful ones (big). This architecture is considered an attractive solution for improving the energy efficiency of mobile processors. There are also non-commercial efforts in the field of Ht-MPSoC. For example, Tumeo et al. [14] proposed a master-slave heterogeneous MPSoC consisting of two PowerPC processors, four Microblaze processors and DMA engines to prototype and evaluate real-time scheduling applications. Using the C language provides the flexibility to change the system software easily and efficiently, which is needed for the LTE MIMO-OFDM system. To achieve this, we propose to explore and present flexible LTE MIMO-OFDM Tx/Rx designs using hard-cores and soft-cores on reconfigurable platforms.
IV. LTE PARALLELIZATION ON HETEROGENEOUS ARCHITECTURE
Our objective is to explore and evaluate a variety of heterogeneous architectural designs in order to select the architecture that provides the largest computational power with the maximum energy efficiency for the embedded LTE application. In this paper we focus on a software implementation of LTE OFDM, because it provides the scalability and flexibility needed to follow updates of future wireless standards.
A. Proposed FPGA-based heterogeneous architecture
The basic design of our proposed heterogeneous architecture is depicted in Fig. 3. The multi-core architecture is composed of both hard-core and soft-core processors. The main role of the hard-core processors is to compute the sequential parts of the program, as indicated in Fig. 2.
The soft-core processors, which act as the processing elements (PEs), handle the parallel computations (the IFFT/FFT functions of the LTE application). As illustrated in Fig. 3, the Processing System (PS) contains a dual-core ARM Cortex-A9. On the other side, the Programmable Logic (PL) part of the FPGA is used to synthesize and map the Microblaze processors. The reconfigurable FPGA fabric includes 9 Microblaze soft-cores; the number of implemented Microblaze processors is limited by the available FPGA logic resources. The interconnections between the PL and PS parts are implemented using AXI4 buses. Communication and synchronization between the ARM and Microblaze processors are realized via a shared on-chip memory (BRAM). The designed heterogeneous multi-core architecture aims to take advantage of the data-level parallelism in the LTE MIMO-OFDM application. For all configurations, the input data sets for each Microblaze are transferred over the AXI bus into the shared memory. When the ARM processor finishes writing the input data, a parametric number of Microblaze processors are activated by setting their flags; they then execute their local program and store the resulting data in the shared memory. Each Microblaze has a 32 KB local memory. It computes the IFFT/FFT blocks on multiple input data simultaneously, using a data partitioning approach based on the execution speed of the implemented cores, as explained in the next subsection.
Several parameters related to the modulation schemes are discussed through the following experiments. For each type of modulation, we implement different configurations and measure their execution time, resource usage and energy dissipation.
V. EXPERIMENTAL RESULTS
Fig. 3. Proposed Architecture: A heterogeneous multi-core system composed of the ARM Cortex A9 processor on the processing system (PS) and Microblaze processors on the programmable logic (PL)
B. Data partitioning approach
For the experiments, we distribute the input data among the processors according to their execution speed. Assume we have n ARM hard-core processors, m Microblaze processors and E elements (in our work, E is equal to the size of the structure “Tmp” presented in Fig. 2). The frequency of the ARM processor is x times higher than the frequency of the Microblaze. The size of the data processed by each Microblaze is:
$$E_{MB} = \frac{E}{m + n \cdot x}.$$
The size of the data processed by each ARM processor is:
$$E_{ARM} = \frac{x \cdot E}{m + n \cdot x}.$$
Thus, the number of elements allocated to an ARM Cortex-A9 is x times the number allocated to a Microblaze. Fig. 4 gives an example of data partitioning.
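[Fig. 4 diagram: the ARM processor receives 17924 elements; each of the four Microblaze (MB) processors receives 2687 elements.]
Fig. 4. Example of data partitioning of the Tmp array for the (1-ARM, 4-Mbze) configuration, with (E, n, m, x) = (28672, 1, 4, 6.67)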
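The speed-proportional split can be sketched as follows. The rounding policy is an assumption chosen so that the result reproduces the values in Fig. 4: each Microblaze share is rounded down and the remaining elements go to the ARM core(s). The function name `partition` and its parameters are illustrative, not taken from the paper's code.

```python
def partition(total_elems, n_arm, m_mb, speed_ratio):
    """Split total_elems so each ARM core gets about speed_ratio times
    the share of a Microblaze. Returns (arm_shares, mb_shares)."""
    if n_arm < 1 or m_mb < 0:
        raise ValueError("need at least one ARM core and m_mb >= 0")
    # Share of one Microblaze, rounded down (assumed rounding policy).
    mb_share = int(total_elems // (m_mb + n_arm * speed_ratio))
    # Everything not given to the Microblaze cores goes to the ARM core(s).
    arm_total = total_elems - m_mb * mb_share
    base, leftover = divmod(arm_total, n_arm)
    arm_shares = [base + (1 if i < leftover else 0) for i in range(n_arm)]
    return arm_shares, [mb_share] * m_mb


# Configuration of Fig. 4: 667 MHz ARM vs. 100 MHz Microblaze gives x = 6.67.
arm, mb = partition(28672, n_arm=1, m_mb=4, speed_ratio=6.67)
print(arm, mb)  # [17924] [2687, 2687, 2687, 2687]
```

Assigning the rounding remainder to the faster core keeps the total equal to E without giving any Microblaze more than its proportional share.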
In this work, we first evaluate the performance on a single ARM processor to profile the application. The profiling results show that the IFFT and FFT functions are the most time-consuming functions of the LTE MIMO-OFDM Tx/Rx applications, accounting for 60% and 50% of the total execution time of the transmitter and receiver, respectively. The LTE transmitter and receiver applications are mapped to evaluate the design trade-offs. The LTE standard supports different modulation schemes. Since both types of processor cores read and write data in the same memory, we need to lock and unlock the shared memory to prevent simultaneous access to the data. For this purpose, we use a test-and-set instruction for process synchronization in multi-core configurations [15].
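The locking idea can be illustrated with a small software model. This is only a sketch: Python threads stand in for the ARM and Microblaze cores, and an internal `threading.Lock` models the atomicity that a hardware test-and-set instruction provides. The class and function names are hypothetical and do not reflect the paper's implementation.

```python
import threading
import time


class AtomicFlag:
    """Software model of a memory word with an atomic test-and-set."""

    def __init__(self):
        self._value = False
        self._atomic = threading.Lock()  # models hardware atomicity

    def test_and_set(self):
        """Atomically set the flag and return its previous value."""
        with self._atomic:
            previous = self._value
            self._value = True
            return previous

    def clear(self):
        with self._atomic:
            self._value = False


class SpinLock:
    """Lock built on test-and-set: spin until the old value was False."""

    def __init__(self):
        self._flag = AtomicFlag()

    def acquire(self):
        while self._flag.test_and_set():
            time.sleep(0)  # yield to other threads while busy-waiting

    def release(self):
        self._flag.clear()


def run_workers(num_workers, increments):
    """Let several workers update one shared counter under the spin lock."""
    lock = SpinLock()
    shared = {"counter": 0}

    def worker():
        for _ in range(increments):
            lock.acquire()
            shared["counter"] += 1  # critical section on shared memory
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared["counter"]


print(run_workers(num_workers=4, increments=200))  # 800
```

A core that reads back `True` from `test_and_set` knows another core holds the lock and keeps spinning; only the core that reads `False` enters the critical section.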
In this paper, the different configurations of the architecture shown in Fig. 3 are implemented on the Xilinx XC7Z020 Zynq [16] on the Zedboard development platform using the Vivado 2013.4 tool. The different blocks of the LTE MIMO-OFDM Tx/Rx system are implemented in C. Our proposed architecture can be considered an extension of the ARM big.LITTLE [13], which is an asymmetric multi-core system. In this work, the 'big' core is the ARM processor, which computes the sequential parts of the program, as illustrated in Fig. 2, whereas the 'LITTLE' core is the Microblaze processor. Hence, our proposed heterogeneous architecture can integrate two 'big' cores and up to nine 'LITTLE' cores on a single chip. The execution times include writing the input data to the shared memory and collecting the resulting data, both of which are done on the ARM processor.
A. Synthesis Results
As shown in Table II, we use the Microblaze soft-core in its area-minimal configuration for the LTE system, which allows our design to be built with less area. In this work, the ARM processor clock is configured to run at 667 MHz. The Xilinx Microblaze processors run at 100 MHz.
Table II. Resource utilization of a minimal implementation of Microblaze (no Floating Point Unit, no MMU)
 | Slice LUTs | Slice Registers | Block RAM
Used | 1281 | 1259 | 2
Available | 53200 | 106400 | 140
Usage % | 2.40 | 1.18 | 1.42
Fig. 5 presents the area results of implementing the different (ARM, Mbze) configurations on the Xilinx Zynq FPGA platform. The main criteria for area usage are the Look-Up Tables (LUTs), the Block RAM (BRAM) resources and the slice registers of the FPGA. Occupation indicates the percentage of FPGA resources utilized by the different configurations. As illustrated in Fig. 5, the configuration with one Microblaze processor consumes 9% of the available LUTs and 16% of the available BRAMs within the FPGA. The configuration with four Microblaze processors occupies 24% of the slice LUTs, 10% of the slice registers and 46% of the block RAMs. Fig. 5 does not distinguish configurations by the number of ARM cores, because the area occupation is only considered for the reconfigurable part (i.e. the soft-cores). The PS side, which integrates the dual-core ARM Cortex-A9, is an ASIC part. As Fig. 5 shows, the configuration with 9 Microblaze processors consumes a large share of the block RAMs, about 96% of the available BRAMs. Here, the BRAM resources limit the number of Microblaze processors.
Fig. 5. Slice percentage occupation of different implementations measured on the Xilinx XC7Z020 Zynq for different Microblaze configurations (m={1,2,4,8,9})
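The three BRAM figures quoted in the text (16% for one Microblaze, 46% for four, about 96% for nine) lie on a straight line, which gives a quick way to see why nine soft-cores is the limit. The sketch below fits a line through the first two reported points; the value it returns for ten cores is an extrapolation, not a measurement, and the function names are illustrative.

```python
def fit_line(point_a, point_b):
    """Return (slope, intercept) of the line through two (x, y) points."""
    (x0, y0), (x1, y1) = point_a, point_b
    slope = (y1 - y0) / (x1 - x0)
    intercept = y0 - slope * x0
    return slope, intercept


# BRAM usage (%) reported for 1 and 4 Microblaze processors.
slope, intercept = fit_line((1, 16.0), (4, 46.0))


def bram_usage_percent(num_microblaze):
    """Estimated BRAM usage from the linear fit (extrapolated beyond m = 9)."""
    return intercept + slope * num_microblaze


for m in (1, 4, 9, 10):
    print(m, bram_usage_percent(m))
```

The fit gives roughly 10% of the BRAMs per Microblaze plus a fixed 6%, reproduces the reported 96% for nine cores, and exceeds 100% for a tenth core, consistent with the BRAM constraint noted in the text.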
Fig. 6 presents the power consumption of the different heterogeneous configurations. In the experiments, we use the Xilinx Power Estimator (XPE) tool to estimate power consumption. This tool can take the number of ARM processors in use into consideration.
Fig. 7. Execution time (ms) of LTE MIMO-OFDM_Tx on (n-ARM, m-Mbze) configurations (n={1,2}, m={1,2,4,8,9}) using different modulations {QPSK, 16QAM, 64QAM}
For the LTE MIMO-OFDM receiver, the execution time of the different configurations is depicted in Fig. 8. For QPSK modulation, the execution time of the (2-ARM, 1-Mbze) configuration is about 3250 ms.
Based on the experimental results (Fig. 6), we can conclude that using a single ARM processor with several processing elements (PEs) reduces the power consumption compared to the configurations that use both ARM cores with the same PEs.
Fig. 8. Execution time (ms) of LTE MIMO-OFDM_Rx on (n-ARM, m-Mbze) configurations (n={1,2}, m={1,2,4,8,9}) using different modulations {QPSK, 16QAM, 64QAM}
Fig. 6. Power consumption of different implementations measured on the Xilinx XC7Z020 Zynq platform for different (n-ARM, m-Mbze) configurations (n={1,2}, m={1,2,4,8,9})
B. Timing Performance
In our experiments, different architectures have been designed to match the needs (energy efficiency) of the LTE MIMO-OFDM system. The configurations that were designed and explored are shown in Fig. 7. Due to the BRAM resource constraint, at most 9 Microblaze processors can be implemented on this FPGA. We evaluate the performance of the different architectures in terms of their execution time on the FPGA.
Fig. 8 demonstrates that the (2-ARM, 9-Mbze) configuration reduces the execution time. However, it consumes more energy than the configurations with a single ARM (Fig. 10). From the execution times presented in Fig. 8, we note that the (1-ARM, 9-Mbze) configuration achieves about a 4.4-fold speedup compared to the (1-ARM, 1-Mbze) configuration. Figs. 9 and 10 summarize the energy consumption of the LTE OFDM Tx/Rx applications. The energy consumed by the whole system is estimated by multiplying the average power by the execution time. The experiments presented in Figs. 9 and 10 confirm that the (2-ARM, 9-Mbze) implementation consumes more energy than the (1-ARM, 9-Mbze) implementation.
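The energy estimate described above (average power multiplied by execution time) can be written as a small helper. The sketch below is illustrative only: the power value in the usage example is a placeholder, since the figures' power values are not reproduced in the text, and the function names are hypothetical. The execution times used are the approximate QPSK transmitter values reported for Fig. 7 (about 500 ms and 300 ms).

```python
def energy_joules(avg_power_w, exec_time_ms):
    """Energy in joules from average power (W) and execution time (ms)."""
    return avg_power_w * exec_time_ms / 1000.0


def speedup(baseline_time_ms, new_time_ms):
    """How many times faster the new configuration is than the baseline."""
    return baseline_time_ms / new_time_ms


def break_even_power_ratio(slow_time_ms, fast_time_ms):
    """Power ratio (fast/slow) above which the faster configuration
    consumes more energy than the slower one."""
    return slow_time_ms / fast_time_ms


# Placeholder power value (2.0 W) with the ~500 ms QPSK transmitter time.
print(energy_joules(avg_power_w=2.0, exec_time_ms=500.0))  # 1.0

# ~500 ms for (1-ARM, 9-Mbze) vs. ~300 ms for (2-ARM, 9-Mbze), QPSK Tx.
print(round(break_even_power_ratio(500.0, 300.0), 2))  # 1.67
```

The break-even ratio makes the trade-off explicit: a faster configuration saves energy only if its average power grows by less than its speedup.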
For QPSK modulation (Fig. 7), the execution times of the (1-ARM, 9-Mbze) and (2-ARM, 9-Mbze) configurations are about 500 ms and 300 ms, respectively. From Fig. 7, we note that the (2-ARM, 9-Mbze) configuration reduces the execution time by about 9% compared to the (2-ARM, 1-Mbze) configuration, at the cost of higher power consumption (Fig. 6). For heterogeneous multi-core configurations, we can see a significant speed-up for both the OFDM transmitter and receiver modules (Figs. 7 and 8).
Fig. 9. Energy consumption (in joules) of LTE MIMO-OFDM_Tx on (n-ARM, m-Mbze) configurations (n={1,2}, m={1,2,4,8,9})
Using a Microblaze soft-core in its area-minimal configuration (Table II) can reduce the power consumption of the system. In addition, the use of small soft-core processors on a single FPGA chip yields performance and energy consumption competitive with existing hard-core processors. From Fig. 9, we can see that for 64QAM modulation, the (1-ARM, 9-Mbze) configuration reduces the energy consumption by 1.8% compared to the (2-ARM, 9-Mbze) configuration.
Fig. 10. Energy consumption (in joules) of LTE MIMO-OFDM_Rx on (n-ARM, m-Mbze) configurations (n={1,2}, m={1,2,4,8,9})
The configurations with a single ARM processor are effective with regard to energy because they consume less power than the configurations with a dual-core ARM processor (Fig. 6). Based on the results, for the implemented configurations, increasing the number of soft-cores in the heterogeneous multi-core system improves energy efficiency and increases the speed of data transmission.
Furthermore, combining small power-efficient processors with big high-performance cores has a significant impact on the energy consumption. For the considered application, the experiments lead us to conclude that the heterogeneous architecture composed of a single ARM and 9 PEs provides the best performance/energy consumption compromise.
VI. CONCLUSION
OFDM is one of the most effective modulation techniques for multi-carrier transmission. In this paper, parametric heterogeneous multi-core architectures have been proposed and explored. The proposed system was implemented on a Xilinx Zynq-7000 FPGA running the embedded LTE MIMO-OFDM application. Energy consumption is one of the main constraints in the embedded LTE OFDM application. Our experimental results demonstrate that a heterogeneous multi-core architecture composed of one ARM and 9 Microblaze processors achieves satisfactory performance with low energy consumption. As an extension of this work, we plan to study the integration of hardware accelerators in the heterogeneous architecture. We will also evaluate the efficiency of sharing hardware accelerators between ARM cores and between Microblaze cores. The whole 4G LTE chain will be executed and tested on different heterogeneous architectures using embedded accelerators, such as the Mali GPU and the NEON SIMD accelerator.
ACKNOWLEDGMENT
This research is funded by the International Campus on Safety and Intermodality in Transportation (CISIT) and the French-Tunisian project PHC-UTIQUE. The authors thank all these institutions.
REFERENCES
[1] Lino Figueiredo, Isabel Jesus, J. A. Tenreiro Machado, Jose Rui Ferreira, J. L. Martins de Carvalho, “Towards the Development of Intelligent Transportation Systems,” IEEE ITS Conference, 2001.
[2] S. B. Weinstein, P. M. Ebert, “Data Transmission by Frequency Division Multiplexing using the Discrete Fourier Transform,” IEEE Transactions on Communications, vol. COM-19, pp. 628-634, Oct. 1971.
[3] Xilinx, “Zynq UltraScale+ MPSoC, product selection guide,” http://www.xilinx.com/publications/prod mktg/ zynq-ultrascale-plusproduct-selection-guide.pdf, 2015.
[4] Altera Corp., http://www.altera.com, 2004.
[5] Microsemi, http://www.microsemi.com/products/fpga-soc/socfpga/smartfusion, 2015.
[6] Nilesh Chide, Shreyas Deshmukh, P. B. Borole, “Implementation of OFDM System using IFFT and FFT,” International Journal of Engineering Research and Applications (IJERA), Jan.-Feb. 2013.
[7] M. A. Mohamed, A. S. Samarah, M. I. Fath Allah, “A Novel implementation of OFDM using FPGA,” IJCSNS International Journal of Computer Science and Network Security, Nov. 2011.
[8] P. G. Lin, “OFDM simulation in MATLAB,” senior project, California Polytechnic State University, San Luis Obispo, June 2010.
[9] Manjunath Lakkannavar, Ashwini Desai, “Design and Implementation of OFDM using VHDL and FPGA,” International Journal of Engineering and Advanced Technology (IJEAT), Aug. 2012.
[10] Pradeepa M., Gowtham P., “Optimized Implementation of FFT Processor for OFDM system,” International Journal of Advances in Engineering & Technology (IJAET), vol. 3, pp. 429-441, May 2012.
[11] Jérémy Nadal, Charbel Abdel Nour, Amer Baghdadi, Hao Lin, “Hardware prototyping of FBMC/OQAM baseband for 5G mobile communication systems,” IEEE International Symposium on Rapid System Prototyping, 2014.
[12] Soh Jun Jie and Nachiket Kapre, “Comparing Soft and Hard Vector Processing in FPGA-based Embedded Systems,” International Conference on Field Programmable Logic and Applications, 2014.
[13] P. Greenhalgh, “Big.LITTLE processing with ARM Cortex-A15 & Cortex-A7: Improving energy efficiency in high-performance mobile platforms,” http://www.arm.com/files/downloads/big LITTLE Final Final.pdf, Sept. 2011.
[14] A. Tumeo, M. Branca, L. Camerini, M. Ceriani, M. Monchiero, G. Palermo, F. Ferrandi, and D. Sciuto, “Prototyping pipelined applications on a heterogeneous fpga multiprocessor virtual platform,” in ASP-DAC, 2009.
[15] D. Alistarh, H. Attiya, S. Gilbert, A. Giurgiu, and R. Guerraoui, “Fast randomized test-and-set and renaming,” in Proc. of 24th DISC, 2010.
[16] Zedboard. [Online]. Available: http://www.zedboard.org/.