
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

Design and Implementation of a Scalable Channel Emulator for Wideband MIMO Systems

Hamid Eslami, Student Member IEEE, Sang V. Tran, Ahmed M. Eltawil, Member IEEE

Abstract—Wireless channel emulation is becoming increasingly important, especially with the advent of Multi Input Multi Output (MIMO) systems, where the system performance is highly dependent on the accurate representation of the channel condition. In this paper, we compare the conventional Finite Impulse Response (FIR) based emulator versus performing the emulation solely in the frequency domain. We show that for Single Input Single Output (SISO) systems, FIR based emulators are computationally efficient but that the complexity rapidly becomes impractical for larger array sizes. On the other hand, frequency domain approaches exhibit a fixed initial complexity cost that grows at a reduced rate as a function of the array size, resulting in significant savings in complexity for higher order arrays. As an illustrative example of this approach, an FPGA architecture implementing a sample 3x3 MIMO system exhibits a resource savings of up to 67% over a similarly constrained FIR approach. The architecture is discussed in detail and implementation results as well as laboratory measurements are presented.

Index Terms— Channel Emulation, Wireless, MIMO, FPGA, OFDM, FFT, Frequency Domain.

I. INTRODUCTION

Over the past decade the combination of Multiple Input Multiple Output (MIMO) techniques with Orthogonal Frequency Division Multiplexing (OFDM) has proliferated into most wideband wireless communication standards (e.g. 802.11n, WiMAX, LTE, etc.). The resilience of OFDM to multipath fading and the high spectral efficiency promise of MIMO are the main reasons motivating this choice. Accurately quantifying the performance of such systems is non-trivial because many of the favorable characteristics of MIMO techniques are highly dependent on the nature of the wireless channel. To gain performance benefits, advanced standards utilize a multitude of modalities that aim to maximize throughput based on the current channel condition. Thus, fully characterizing the performance of a system under all possible channel variations and modalities requires an unreasonably vast sample space of simulation channels. Furthermore, the complexity of the channel model grows quadratically with the size of the MIMO array. For example, an NxM MIMO array requires NM sub-channels for each channel realization. Finally, channel realizations are time varying and change from one coherence time to the next. These demanding requirements have prompted a compromise of utilizing relatively simpler channel models for analysis, while depending on wireless channel emulators to perform accurate high speed simulations/emulations that refine these analytical assumptions. Wireless channel emulation facilitates the test and validation cycles by replicating channel artifacts in a controllable and repeatable laboratory environment in real time.

Wireless channels are commonly emulated using time-varying Finite Impulse Response (FIR) filters. Different approaches such as Distributed Arithmetic (DA), reduced complexity multipliers based on alternative number systems (Redundant Number System, RNS) and Canonical Signed Digits (CSD) have been widely utilized in filtering approaches to date [1]-[4]. However, each approach has its limitations, as follows:

1- While the DA approach lends itself to efficient VLSI implementations, the bit serial nature of the operation and the need for large look up tables limit its application to low bandwidth systems.
2- Reduced multiplier complexity approaches such as CSD-based systems depend on the fact that one of the multiplicands is known a-priori to replace multipliers by efficient shift and add operations. While this approach is widely used for filtering, it cannot be directly applied to channel emulation because the channel coefficients change every coherence time.
3- In general, FIR based systems suffer from a tradeoff between the number of filter taps per sub-channel and the overall number of channels that can be supported. This stems from the fact that, typically, the computational resources (filter taps) are fixed in number; therefore, as the number of sub-channels increases (higher array order), the number of taps used to represent each channel must be reduced.
4- Finally, the need to maintain a shift register structure for the FIR filter limits the delay spreads that can be supported due to the high area and power cost of using Flip-Flops as intermediate storage nodes.

Thus, convolution-based architectures are very well suited for narrow-band, Single Input Single Output (SISO) systems with short delay spreads; however, the complexity scales quadratically with the MIMO array size and rapidly becomes impractical.

Manuscript received December, 2008. Copyright (c) 2009 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to [email protected]. This work was partially supported by the National Institute of Justice (NIJ), Department of Justice (DOJ) under grant number 2006-IJ-CX-K044. Authors are with University of California, Irvine. Author's address: 516 F Engineering Tower, University of California - Irvine, Irvine CA 92697-3425, USA. Email: {heslami, transv, aeltawil}@uci.edu.

Copyright (c) 2009 IEEE. Personal use is permitted. For any other purposes, Permission must be obtained from the IEEE by emailing [email protected]. Authorized licensed use limited to: Univ of Calif Irvine. Downloaded on October 1, 2009 at 18:45 from IEEE Xplore. Restrictions apply.


A review of current and proposed communication standards such as High-Speed Packet Access (HSPA), Worldwide Interoperability for Microwave Access (WiMAX), and their evolutions such as ultra mobile broadband (UMB), third-generation (3G) Long-Term Evolution (LTE), and WiMAX II, indicates a clear trend in industry towards supporting MIMO functionality. While most current proposals include provisions for 2x2 systems, it is clear that in the near future, support for higher order arrays will be required to enable higher capacity networks. In fact, it is not uncommon in the literature to present systems that reach a MIMO order of 8x8 and higher [5]-[6]. This is made possible by advances at all levels of the communication platform. For example, advances at the device level allow unprecedented integration of computational resources at the baseband processing section, while monolithic integration of antennas as Micro Electro-Mechanical Systems (MEMS) components allows scalability to much higher order MIMO arrays than those currently achievable by discrete components [7]-[8]. To support these emerging trends and provide accurate characterization of the system performance over an ever increasing set of possible channel configurations, it is imperative to design new emulation platforms that are highly scalable and computationally efficient in terms of the array size. This reality places FIR based approaches at a disadvantage since the complexity of the platform scales poorly with an increase in the MIMO array size. To clearly quantify the complexity associated with higher order arrays, we present a study of two alternative approaches to emulation. The first approach is based on FIR temporal emulation, while the second approach performs emulation in the frequency domain.
The main contributions of the paper are:

1- It is shown that frequency domain implementations exhibit a fixed initial complexity cost that grows at a reduced rate as a function of the array size, resulting in significant savings in complexity for higher order arrays. Thus, contrary to the common belief that frequency domain processing is computationally expensive, it will be shown that for higher order MIMO arrays it is not only computationally efficient, but also highly modular and scalable by design.
2- Furthermore, it is shown that by performing the emulation in the frequency domain the classical problem of trading off the number of filter taps per channel versus the number of emulated channels is inherently resolved.
3- We report the implementation results for the frequency domain emulation and compare them to conventional time domain emulation. In particular, the architecture for a 3x3 MIMO system is presented and the resource analysis is reported, targeting a Xilinx Virtex-4 SX35 FPGA. The choice of a 3x3 system is limited by the I/O resources available on the FPGA board at hand. Due to the modularity of the design, it is straightforward to increase the array size for larger FPGA boards. To lower the complexity, Cordic-based FFT/IFFT engines are used for spectral processing. A reduction of up to 67% in FPGA resources is obtained compared to a similarly constrained time domain implementation.
4- Finally, laboratory measurements confirming the functionality and performance of the emulator are presented.

To the best of our knowledge, this is the first publication that presents a comprehensive design and implementation of a frequency based channel emulator for wideband MIMO systems.

The paper is organized as follows: Section II summarizes the time domain channel emulation and reviews academic as well as commercially available emulators. In Section III, complexity analysis and storage requirements of the proposed frequency domain emulation are presented and its advantages are highlighted. The system architecture for the emulator is presented in Section IV. Section V discusses the design specification as well as SNR analysis. The VLSI architecture of the proposed emulator is presented in Section VI, where resource utilization for time and frequency domain emulators is also reported and compared. Section VII presents laboratory measurements and evaluates the functionality and performance of the proposed emulator. Finally, Section VIII concludes the paper.

II. PRIOR ART

Typically, the wireless channel is modeled as an FIR filter, where the Channel Impulse Response (CIR) is expressed as follows:

$$h(t) = \sum_{k=1}^{L} c_k \, \delta(t - \tau_k) \qquad (1)$$

where $L$ represents the number of multipaths, $c_k$ is the complex coefficient and $\tau_k$ is the delay associated with the kth path. The received signal (output of the FIR filter) is then represented as:

$$r(t) = s(t) * h(t) \qquad (2)$$

where $*$ represents the convolution operation and $s(t)$ is the transmitted signal. Coefficients $c_k$ are generated according to the channel profile and are updated every coherence time [9]-[11]. In the case of MIMO, where multiple antennas are deployed at the transmitter and the receiver ends, each sub-channel is represented by a separate impulse response as described in (1).

Table 1 presents a survey of a representative set of channel emulators. To present a balanced view, state of the art academic published emulators are compared to commercially available emulators. In [12] the authors reported an ASIC implementation of a multi-access SISO FIR based channel emulator with 5 MHz bandwidth. Later, by replicating the same architecture, other prototypes for SISO and MIMO channel emulators with wider bandwidth were reported [13]-[20]. It is interesting to note that in all reported cases, there is a tradeoff between the average number of taps per sub-channel and the number of sub-channels supported. This is a limitation of the FIR approach and is attributed to the increase in computation resources required due to a quadratic increase in the number of sub-channels supported when migrating to a higher order MIMO configuration.

TABLE 1, SURVEY OF CHANNEL EMULATORS

| Category | Reference | Year (ψ) | B.W. (α), MHz | Max. No. of channels | Max. No. of taps | Ave. No. of taps/channel | Max. length of impulse response | Delay resolution | Max. Doppler, kHz |
|---|---|---|---|---|---|---|---|---|---|
| Academic | Dao Nguyen Dung et al. [15] | 2006 | 20 | 16 | 288 | 18 | NR | 10 ns | NR |
| Academic | A. Dassatti et al. [14] | 2005 | 20 | 6 | 108 | 18 | NR | NR | NR |
| Academic | M. Cui et al. [13] | 2004 | 25 | 16 | 16 | 1 | NA | NA | NR |
| Academic | Olmos J. J et al. [12] | 1999 | 5 | 6 | 20 | 20/6 | 80 µs | 50 ns | 1 |
| Commercial | Azimuth ACE 400 [35] | 2007 | 20 | 32 | 640 | 20 | 40 µs (delay spread) | 25 ns | 0.655 |
| Commercial | Spirent SR5500 [34] | 2006 | 12 | 4 | 96 | 24 | 2 ms | 0.1 ns | 2 |
| Commercial | Propsim C8 [33] | 2004 | 50 | 16 | 384 | 24 | 6.4 ms | 10 ns | 10 |

Legend: NR: Not reported, NA: Not applicable. ψ: Refers to the publication year for academic papers or the datasheet date for commercial systems. α: Refers to the baseband bandwidth supported.

In [13], the authors focus on generating an accurate single tap using a modified Jakes model; the single tap sub-channel is then used to represent a 4x4 flat fading system. In [14], tapped delay line values are stored in memory and a Xilinx FPGA is used in conjunction with an ARM9 microcontroller to perform FIR based filtering. In [15], a total of 17 Xilinx Virtex 2 FPGAs are used to model a 4x4 (16 channels) uni-directional emulator utilizing FIR filter techniques. The need for such a large number of FPGAs to perform a 4x4 emulation clearly indicates the explosion in computational resources required when performing time based emulation and motivates investigating alternative approaches, one of which is frequency domain emulation.

III. FREQUENCY DOMAIN PROCESSING

Processing in the frequency domain offers a strong alternative to the time domain approach and has in fact been used extensively in many applications. For example, frequency domain channel equalization [21] and multi-rate filtering techniques [22] utilize frequency domain processing to achieve significant reductions in computational complexity. It is possible to extend frequency domain processing to channel emulation by applying the overlap-and-add method [23] to ensure equivalence between the linear convolution expressed by (2) and the circular convolution created by spectrum multiplication. To emulate a channel with a delay spread of $\eta$ in the frequency domain, an FFT of length $N_{FFT}$ is taken from a data vector of length $N_d$ with the condition that:

$$N_{FFT} = N_d + \frac{\eta}{T_s}$$

where $T_s$ is the sampling time. This is achieved simply by appending $N_{ds} = \eta / T_s$ zeros to the $N_d$-sample data vector. After multiplying the channel and data spectra, overlap-and-add is performed on the last and the first $N_{ds}$ samples of two consecutive data spectrum vectors to satisfy the linear convolution properties. An $N_{IFFT}$-point inverse FFT is then taken to generate the output signal in time domain with $N_{IFFT} = N_d$ and the last $N_{ds}$ samples are stored in a separate memory. Thus (2) can be rewritten in the frequency domain for an NxM MIMO system as follows:

$$R_m(f) = \sum_{n=1}^{N} S_n(f) \, H_{m,n}(f), \quad 1 \le m \le M \qquad (3)$$

where for an NxM system, $H_{m,n}(f)$ is the spectrum of the channel between the nth transmitter and the mth receiver, $R_m(f)$ is the spectrum at the mth receiver path, and $S_n(f)$ is the spectrum of the transmit data from the nth transmit antenna. $S_n$, $H_{m,n}$ and $R_m$ are all vectors of length $N_{FFT}$.

A. Basic complexity comparison

Processing in the frequency domain is typically perceived to be computationally intensive due to the large overhead associated with performing FFT/IFFT operations. While this is true for SISO systems, large savings in complexity are possible for MIMO systems due to the averaging of the complexity over multiple sub-channels and data points. Table 2 presents a basic complexity comparison between time domain and frequency domain emulation assuming a channel with a maximum impulse response length of P samples and L multipaths. The sampling frequency for both systems is Fs. The data vector length is chosen to be 2P and, based on the above discussion, the FFT length is set to be 2P. The IFFT is taken over P data samples to account for the overlap and add approach. Table 2 illustrates the total complexity to generate an output vector as well as the average number of complex additions, multiplications and storage required per output sample. The results are reported for a radix-2 implementation of FFT/IFFT (decimation in time or frequency) for the frequency domain emulation and an FIR filter with L taps for the time domain. In the frequency domain, one FFT/IFFT operation results in P·M output samples, while in the time domain, M output samples are generated at any given time. Finally, the FFT and IFFT approaches are assumed to be double buffered to accommodate one buffer streaming or storing samples at the sample rate, while the other buffer is used for the FFT/IFFT operation.

TABLE 2, COMPLEXITY REPORT PER OUTPUT SAMPLE FOR TIME-DOMAIN VS. FREQUENCY DOMAIN EMULATORS

Complexity (complex operations):

| | Frequency (ψ) | Time |
|---|---|---|
| Mul. | $N \frac{2P}{2} \log_2 2P + M \frac{P}{2} \log_2 P + N \cdot 2PM$ | $NML$ |
| Add. | $N \cdot 2P \log_2 2P + M \cdot P \log_2 P + N \cdot 2PM$ | $NM(L-1)$ |
| Mem. | $2(N \cdot 2P + MP)$ | $NM(P-1)$ |

Average complexity per output sample (Average Complexity = Complexity / PM for the frequency domain, Complexity / M for the time domain):

| | Frequency (ψ) | Time |
|---|---|---|
| Mul. | $(1 + \frac{2N}{M}) \frac{1}{2} \log_2 P + N(2 + \frac{1}{M})$ | $NL$ |
| Add. | $(1 + \frac{2N}{M}) \log_2 P + N(2 + \frac{2}{M})$ | $N(L-1)$ |
| Mem. | $2(1 + \frac{2N}{M})$ | $N(P-1)$ |

ψ: A P point radix-2 FFT operation requires $\frac{P}{2} \log_2 P$ complex multiplications and $P \log_2 P$ complex additions.

Table 3 presents a numeric example of the complexity comparison for a 4x4 MIMO system with P=1024 and L=18.

TABLE 3, COMPLEXITY PER OUTPUT SAMPLE FOR A 4X4 MIMO, P=1024, L=18 (average complexity per output sample)

| | Frequency | Time |
|---|---|---|
| Mul. | 24 | 72 |
| Add. | 40 | 68 |
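The block processing analyzed in this section can be illustrated with a small software model. The following is a minimal sketch, assuming the textbook overlap-and-add arrangement in which a full-length inverse FFT is taken and the tail of each block is added in the time domain; this may differ from the buffer arrangement of the actual hardware. The function names (`fft`, `fir_reference`, `emulate_frequency_domain`) and the pure-Python recursive FFT are illustrative assumptions, not part of the described implementation.

```python
import cmath

def fft(x, inverse=False):
    """Unscaled recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def fir_reference(s, h):
    """Direct linear convolution r = s * h, as in (2)."""
    r = [0j] * (len(s) + len(h) - 1)
    for i, si in enumerate(s):
        for k, hk in enumerate(h):
            r[i + k] += si * hk
    return r

def emulate_frequency_domain(tx, h, n_d, n_fft):
    """tx[n]: samples sent from antenna n; h[m][n]: taps from Tx n to Rx m.
    Returns rx[m], the block-wise evaluation of (3) with overlap-and-add."""
    n_tx, n_rx, n_ds = len(tx), len(h), n_fft - n_d
    if any(len(h[m][n]) > n_ds + 1 for m in range(n_rx) for n in range(n_tx)):
        raise ValueError("impulse response longer than the zero padding allows")
    # Channel spectra H[m][n]: zero-padded impulse responses, one FFT each.
    H = [[fft(list(h[m][n]) + [0j] * (n_fft - len(h[m][n])))
          for n in range(n_tx)] for m in range(n_rx)]
    length = len(tx[0])
    rx = [[0j] * (length + n_ds) for _ in range(n_rx)]
    for start in range(0, length, n_d):
        # Data spectra S[n]: up to N_d samples followed by zeros up to N_FFT.
        S = []
        for n in range(n_tx):
            block = list(tx[n][start:start + n_d])
            S.append(fft(block + [0j] * (n_fft - len(block))))
        for m in range(n_rx):
            # Per-bin multiply-accumulate over the transmit antennas, eq. (3).
            R = [sum(S[n][f] * H[m][n][f] for n in range(n_tx))
                 for f in range(n_fft)]
            r = [v / n_fft for v in fft(R, inverse=True)]
            # Overlap-and-add: the last n_ds samples spill into the next block.
            for i, v in enumerate(r):
                if start + i < len(rx[m]):
                    rx[m][start + i] += v
    return rx
```

Under these assumptions, the output for each receive path should agree, up to floating-point rounding, with the sum of the direct convolutions `fir_reference(tx[n], h[m][n])` over all transmit antennas, which is the equivalence between (2) and (3) that the overlap-and-add method is meant to guarantee.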

It is important to note that in terms of the average number of multiplications and additions per output sample, both systems exhibit a linear dependency on the array dimension. However, the frequency approach has a fixed initial cost that is a function of the number of FFT points (2P), independent of the number of taps. This makes the frequency domain approach less attractive for SISO systems with short delay spreads, but this extra complexity is amortized over the entire array size for MIMO systems. On the other hand, the time domain approach exhibits a linear cost in terms of the array dimension but also depends on the number of taps supported. This is the reason why, in time domain approaches, there is always a tradeoff between the MIMO array dimensions and the number of taps supported in order to achieve a fixed complexity. Finally, from a storage perspective, the average storage requirement per output sample is fixed for the frequency domain approach, while in the time domain it grows linearly in the array size and depends on the delay spread.

B. Multiplier Complexity

Typically for DSP systems, multipliers present a challenge since they are both power and resource hungry and directly impact the critical path of the design. For that reason, when extra information is available (such as knowing that one of the multiplicands is fixed), numerous simplification techniques are available to reduce the complexity of the multiplier to a series of shifts and adds (e.g. CSD systems). However, as discussed previously, in the time domain emulation approach the channel coefficients change every coherence time and the data is also variable, so these techniques cannot be applied and full multipliers must be used.
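The per-output-sample expressions of Table 2 can be evaluated directly to check the entries of Table 3 and to explore how the fixed FFT overhead is amortized as the array grows. The following is a small sketch; the function names are hypothetical and the formulas are simply transcribed from Table 2 (radix-2 FFT, complex operations).

```python
from math import log2

def freq_domain_per_sample(n_tx, n_rx, p):
    """Average complex multiplications, additions and storage per output
    sample for the frequency domain emulator (Table 2, radix-2 FFT)."""
    ratio = 1 + 2 * n_tx / n_rx               # the (1 + 2N/M) factor
    mul = ratio * 0.5 * log2(p) + n_tx * (2 + 1 / n_rx)
    add = ratio * log2(p) + n_tx * (2 + 2 / n_rx)
    mem = 2 * ratio
    return mul, add, mem

def time_domain_per_sample(n_tx, p, taps):
    """Average complex multiplications, additions and storage per output
    sample for an FIR emulator with `taps` taps per sub-channel (Table 2)."""
    return n_tx * taps, n_tx * (taps - 1), n_tx * (p - 1)

# 4x4 MIMO, P = 1024, L = 18, the configuration of Table 3.
print(freq_domain_per_sample(4, 4, 1024))   # (24.0, 40.0, 6.0)
print(time_domain_per_sample(4, 1024, 18))  # (72, 68, 4092)
```

Sweeping `n_tx` and `n_rx` with the same helpers shows the trend described above: the logarithmic FFT term is shared by all sub-channels, whereas the FIR cost scales with the number of taps for every additional transmit antenna.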

From a frequency domain viewpoint and by referring back to Table 2, one can see that the multiplication cost consists of an overhead, $(1 + \frac{2N}{M}) \frac{1}{2} \log_2 P$, that is associated with the FFT/IFFT operations and a cost, $N(2 + \frac{1}{M})$, that depends on the dimension of the MIMO array but does not depend on the channel taps. The first term grows at a slow rate of $N/M$ with respect to array size while the second term grows linearly. This reality opens up an opportunity for optimization, since the multipliers used in the FFT/IFFT perform fixed rotations of the incoming data. These rotations can be implemented using a Cordic based approach [25], which simplifies the multiplication to a series of shifts and adds, thus further reducing the complexity of a frequency domain approach.

C. Cordic Based FFT

In the Cordic algorithm [25], polar rotations due to the twiddle factors $e^{j\alpha}$ are performed by projecting such rotations onto vectors in the Cartesian coordinate system combined with additions and subtractions. To rotate a vector $r = x + jy$, the Cordic rotation is described by the Givens rotation detailed in (4):

$$x' = x \cos(\alpha) - y \sin(\alpha), \qquad y' = y \cos(\alpha) + x \sin(\alpha) \qquad (4)$$

and the resulting vector is $r' = x' + jy'$. Equation (4) can be rearranged as:

$$x' = \cos(\alpha)\,[x - y \tan(\alpha)], \qquad y' = \cos(\alpha)\,[y + x \tan(\alpha)] \qquad (5)$$

where $(x, y)$ are the projections of the original vector in the Cartesian coordinate system and $(x', y')$ are the rotated version by a rotation angle of $\alpha$. These equations can be further simplified by restricting $\tan(\alpha) = \pm 2^{-i}$, so that the multiplication by the tangent term is reduced to a simple shift operation. Due to the symmetry of the cosine function ($\cos(\alpha) = \cos(-\alpha)$), the cosine multiplication reduces to a scaling factor irrespective of the direction of rotation. Arbitrary angles of rotation are obtainable by performing a series of successively smaller elementary rotations (with Q iterations) designed to satisfy a desired accuracy. Initially the signal is rotated by an integer multiple (m) of $\pi/2$, i.e. $m \cdot \pi/2$, to reside in the same quadrant as $\alpha$. Then, by reducing the residual angle $res_1 = \alpha - m\frac{\pi}{2}$ in each iteration and performing rotations using the corresponding projected vectors in Cartesian coordinates in the direction determined by the sign of the residue, the equivalent rotation converges to $\hat{\alpha}$ given by:

$$\hat{\alpha} = m\frac{\pi}{2} + \sum_{i=1}^{Q} res_i \qquad (6)$$

$$res_{i+1} = res_i - \mathrm{sign}(res_i) \cdot \arctan\left(\frac{1}{2^i}\right) \qquad (7)$$

Based on the above discussion, a complex multiplier performing a fixed rotation can be replaced by Q complex adds/subtracts, where the shifting operations can be performed in wiring and thus do not require extra hardware. Table 4 depicts the average complexity per output sample for Radix-2 and Radix-4 Cordic based emulators. Note the drastic reduction in the number of multipliers required and the linear increase in the number of addition operations. Furthermore, note that a Radix-4 Cordic based FFT butterfly requires 25% fewer resources than a Radix-2 butterfly.
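The shift-and-add rotation described in this subsection can be sketched in floating point as follows. This is a minimal illustration, not the hardware design: the function name `cordic_rotate` is hypothetical, the coarse step is assumed to use the multiple of $\pi/2$ nearest to $\alpha$ so that the residual stays within $\pm\pi/4$, and the accumulated cosine scaling is removed with one division at the end, whereas a fixed-point implementation would fold it into a constant.

```python
import math

def cordic_rotate(x, y, alpha, q=16):
    """Rotate (x, y) by alpha radians using q shift-and-add iterations."""
    # Coarse rotation by m * pi/2: a swap and a sign change, no multiplier.
    m = round(alpha / (math.pi / 2))
    for _ in range(m % 4):
        x, y = -y, x
    res = alpha - m * math.pi / 2          # residual angle, |res| <= pi/4
    gain = 1.0
    for i in range(1, q + 1):
        d = 1.0 if res >= 0 else -1.0      # direction set by the residue sign
        t = 2.0 ** -i                      # tan of the elementary angle (a shift)
        x, y = x - d * y * t, y + d * x * t
        res -= d * math.atan(t)            # residue update, as in (7)
        gain *= math.sqrt(1.0 + t * t)     # 1/cos of the elementary angle
    return x / gain, y / gain
```

With q=16 the residual angle error is bounded by arctan(2^-16), so the result should match a multiplication by $e^{j\alpha}$ to roughly four to five decimal digits for inputs of unit magnitude.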

TABLE 4, COMPLEXITY REPORT PER OUTPUT SAMPLE FOR CORDIC-BASED EMULATORS (average complexity per output sample)

| | Mul | Add | Mem |
|---|---|---|---|
| Radix-2 | $N$ | $(1 + \frac{Q}{2})(1 + \frac{2N}{M}) \log_2 P + 2N\left(1 + \frac{1 + Q/2}{M}\right)$ | $2(1 + \frac{N}{M})$ |
| Radix-4 | $N$ | $(1 + \frac{3Q}{8})(1 + \frac{2N}{M}) \log_2 P + 2N\left(1 + \frac{1 + 3Q/8}{M}\right)$ | $2(1 + \frac{N}{M})$ |

D. Storage Requirements

Finally, from a storage requirement point of view, Table 2 and Table 4 illustrate that on average, per output sample, the storage requirement of the frequency domain approach is fixed and set by the array size. The time domain approach, on the other hand, depends linearly on both the array size and the impulse response length P. It is important to note that frequency domain approaches have a clear advantage due to the fact that compact RAM structures can be used to store frames of FFT/IFFT operations. On the other hand, time domain structures, by design, require access to all storage nodes in a shift register fashion to generate one output sample. This leads to a heavy cost in terms of both area and power since Flip-Flops are much larger than standard RAM cells.

IV. SYSTEM ARCHITECTURE

Fig. 1 illustrates the full system diagram of the channel emulator. As shown in the diagram, the operations are partitioned between software and hardware. The non real time part, which involves the channel generation, is performed in software since it needs to be updated every coherence time. Alternatively, if channel sounding results are available, they can be used directly as the channel source. Typically, channel coefficients change at a relatively low rate, in the Hz or low kHz range, since the coherence time is inversely related to the Doppler frequency experienced by the mobile unit [26]. The model chosen for the MIMO channel emulation is based on the correlation (Kronecker) model presented in [9]. The Kronecker channel model is realized by multiplying a complex Gaussian i.i.d. matrix with the square root of the covariance matrices at both the transmitter and receiver sections as shown in (8):

$$H = R_{rx}^{1/2} \, H_{i.i.d} \, R_{tx}^{1/2} \qquad (8)$$

where for an NxM MIMO array, $R_{rx}$ is the MxM receive covariance matrix, $R_{tx}$ is the NxN transmit covariance matrix and $H_{i.i.d}$ is the MxN identically distributed complex Gaussian matrix. The correlation coefficients ($\rho$) of the receive and transmit covariance matrices are computed based on (9) given below [9]:

$$\rho = \int_{-\pi}^{+\pi} \exp\{-j\, d \cdot k\}\, G(\theta) P(\theta)\, d\theta = \int_{-\pi}^{+\pi} \cos(D \sin\theta)\, G(\theta) P(\theta)\, d\theta + j \int_{-\pi}^{+\pi} \sin(D \sin\theta)\, G(\theta) P(\theta)\, d\theta \qquad (9)$$

where d, k are the positional vectors of the Tx and Rx antenna array, D is the normalized separation between antenna array elements, $G(\theta)$ is the antenna radiation pattern and $P(\theta)$ is the power azimuth spectrum. These parameters can be configured by software for each antenna array realization. The impulse response for each sub-channel is computed and zero-padded prior to an FFT operation. The FFT length is chosen to match the length of the FFT performed on the data vector. Once a block of frequency channel realizations is generated, it is transferred to SRAMs on the hardware for the emulation process. A 66 MHz, 32-bit PCI bus is used to transfer the generated channel samples from the software to the hardware. For an NxM MIMO system with FFT length of $N_{FFT}$, the average time it takes to transfer one set of channel realizations (including all sub-channels) is

$$T_D = \frac{N_{FFT} \cdot N \cdot M}{66} \ (\mu s)$$

and consequently Doppler frequencies up to $F_D = 1/T_D$ can be supported. For instance, for a 4x4 MIMO system with $N_{FFT} = 512$, $F_D = 8$ kHz, which is well beyond the requirements of today's communication systems. Additive White Gaussian Noise (AWGN) is not part of the current emulator; however, it could easily be added in hardware immediately before the digital to analog converter or in software at the receiver side, and it does not affect the performance of the emulator. Further details on the design considerations and FPGA implementation are provided in the following sections.

[Fig. 1, System diagram of frequency domain channel emulator. Hardware: ADC, buffer and FFT on each transmit path; MN complex multiplier and adder units with SRAM channel storage and setup registers; IFFT, buffer and DAC on each receive path. Software: GUI and Kronecker channel model (PDP, AoA, AoD, AS, PAS, type, antenna pattern, antenna spacing, etc.) followed by an FFT that generates the MN channels.]

V. DESIGN CONSIDERATIONS

The system diagram presented in Fig. 1 can be used for any type of channel emulation; however, to focus the discussion and generate concrete VLSI implementation results, we will assume a case study of a MIMO channel emulator required to satisfy 802.11n channels. The emulation process takes place in real time on the FPGA. In the emulator core, the computed frequency domain channel samples are downloaded to the platform. One channel interface is required for each sub-channel. For example, for an NxM MIMO system, MN interfaces are needed. The channel interface is a memory of length 2P ($N_{FFT}$ in general) that stores the spectrum of the channel under emulation. The data interface, on the other hand, has a double buffered configuration where one buffer is storing samples at the sample rate, while the other buffer acts as the FFT/IFFT holding buffer. The buffering size depends on the latency of the FFT/IFFT engines and the required delay spread. The data interface is replicated for each input signal. Before presenting the details of the FPGA architecture, the following sections derive the dynamic range and quantization requirements of both the channel and the data interfaces.

A. Dynamic range requirements for 802.11n

The 802.11n standard (also known as Wi-Fi) is an OFDM based standard for high throughput applications that is inherently designed to support MIMO systems. The IEEE 802.11n task group specifies a set of six channels that encapsulate different environments ranging from a single tap channel to a multipath fading channel with 18 taps and a maximum delay spread of 1050 ns [11]. In OFDM systems, carriers can add constructively, creating a peak power that can be up to 9 dB above the average power. At the output of the channel emulator, the dynamic range of the signal is much wider in order to support Doppler fades that could produce down-fades on the order of -40 dB and up-fades of +3 dB. Fig. 2 illustrates the dynamic range requirements of the signal.

B. Data SNR analysis

To find the optimum bit-width for the data path and evaluate the effect of FFT length on the emulator performance, the output of the fixed point implementation of the proposed emulator is compared to the same architecture performed in floating point precision in MATLAB. A known data sequence (OFDM vector compliant with 802.11n) is used for the comparison. The channel to be emulated is 802.11n model 'F'. Fig. 3 depicts the Signal to Noise Ratio (SNR) results for three different FFT lengths, $N_{FFT}$ = 256, 1024 and 4096, versus the data bit-width (D_Width), with the channel frequency samples quantized to 10 bits. Noise is defined as the difference between the floating-point model and the signal processed on the board. Due to the multiple truncation stages within the FFT Cordic architecture, an extended bit width of (D_Width+2) is used within the FFT/IFFT architecture. Fig. 3 shows that a longer FFT length (larger $N_{FFT}$) results in a slightly lower SNR, 1.5 to 3 dB per step, due to quantization effects associated with an expanded number of iterations within the structure. This could be mitigated by extending the word length. Fig. 3 also shows that, as expected, a 5 to 6 dB gain in SNR is achievable by adding an extra bit to the data samples. At a D_Width of 14 bits with $N_{FFT}$ = 256 the SNR is higher than 40 dB. Clearly, higher SNRs are achievable by extending the D_Width as indicated by the figure.
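The gain of roughly 6 dB per additional bit can be illustrated with a simple quantization experiment. The following sketch only models rounding of the data samples to a fixed-point grid; it does not model the truncation stages inside the Cordic FFT/IFFT or the 10-bit channel samples, so absolute SNR values will not match Fig. 3. The helper names `quantize` and `quantization_snr_db` are hypothetical.

```python
import math

def quantize(v, bits, full_scale=1.0):
    """Round v to a signed fixed-point grid with the given total bit width."""
    step = full_scale / 2 ** (bits - 1)
    q = round(v / step) * step
    # Saturate to the representable range [-full_scale, full_scale - step].
    return max(-full_scale, min(full_scale - step, q))

def quantization_snr_db(samples, bits):
    """SNR in dB, with noise defined as the difference between the
    full-precision samples and their quantized versions."""
    noise = sum((s - quantize(s, bits)) ** 2 for s in samples)
    signal = sum(s * s for s in samples)
    return 10 * math.log10(signal / noise)
```

Evaluating `quantization_snr_db` on the same set of samples for two consecutive bit widths should show a difference close to 6 dB, in line with the trend reported for the data bit-width.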


Fig. 2, Signal dynamic range for 802.11n channel model

Fig. 3, SNR vs. bit-width

Fig. 4, Cordic architecture

C. Cordic based FFT

Fig. 4 depicts the Cordic based Radix-4 FFT butterfly employed. Since FFT rotations are known a priori, this information is used to unroll the iterative nature of the Cordic engine and perform the operation in one clock cycle. A total of 16 Cordic stages are implemented per rotation (Q=16). Furthermore, a single Cordic rotator is used to perform all three multiplications required for a Radix-4 FFT butterfly operation. Finally, the entire FFT/IFFT operation is performed serially via one Cordic engine. The core is operated at 4 times the sampling frequency and is pipelined to generate one output sample per sample clock, thus maintaining real time operation via double buffering of the input and output vectors. The bit width of the FFT is extended to (D_Width+2) while the phasor rotator is represented in 24 bits. Table 5 reports the latency and slice count of the proposed FFT/IFFT architecture versus state of the art implementations. The proposed architecture exhibits the lowest reported latency and a highly competitive slice count as compared to other approaches.

TABLE 5, LATENCY AND SLICE COUNT FOR MULTIPLE FFT IMPLEMENTATIONS

Implementation              No. of slices   Input bit-width   Phasor bit-width   Latency for 1024 Radix-4 (clock cycles)
Proposed                    992             16                24                 3121
Xilinx                      1795            16                16                 3446
12 16-bit multiplier [22]   3714            16                16                 NA
3 Cordic engine [22]        1420            16                16                 NA
1 Cordic engine [22]        626             16                16                 NA
12 16-bit multiplier [23]   2235            16                16                 5440
[24]                        1639            13                13                 5067
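To make the Cordic rotation concrete, the following Python sketch performs a Q-stage rotation using only shifts, adds and a precomputed arctangent table. It is a floating-point illustration of the general rotation-mode Cordic algorithm, not the unrolled fixed-point hardware of Fig. 4, and the function name `cordic_rotate` is hypothetical. Because FFT twiddle angles are known in advance, the per-stage rotation directions could be precomputed, which is the property the text exploits to unroll the engine.

```python
import math

Q = 16  # number of Cordic stages, as in the text

# Elementary angles atan(2^-i) and the constant gain accumulated over Q stages.
ANGLES = [math.atan(2.0 ** -i) for i in range(Q)]
GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(Q))

def cordic_rotate(x, y, theta):
    """Rotate (x, y) by theta radians, |theta| <= pi/2, using Q micro-rotations."""
    z = theta
    for i in range(Q):
        d = 1.0 if z >= 0.0 else -1.0        # rotation direction for this stage
        shift = 2.0 ** -i                     # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * ANGLES[i]                    # residual angle still to rotate
    return x / GAIN, y / GAIN                 # undo the constant Cordic gain

# Example: apply the twiddle factor exp(-j*2*pi*k/N) to the sample 1 + 0j.
N, k = 1024, 100
theta = -2.0 * math.pi * k / N
xr, yr = cordic_rotate(1.0, 0.0, theta)
err = max(abs(xr - math.cos(theta)), abs(yr - math.sin(theta)))
print(f"rotated = ({xr:.6f}, {yr:.6f}), max error = {err:.2e}")
```

With 16 stages the residual angle is bounded by atan(2^-15), so the rotation error in this floating-point sketch stays well below 1e-4; a fixed-point datapath would add its own truncation noise on top of that.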

D. Computational Complexity

As shown in Tables 2, 3 and 4, the complexity of an FIR approach increases linearly as a function of either the array dimensions or the number of taps supported. On the other hand, the Cordic-based frequency approach has a fixed multiplier complexity which scales linearly with the array dimension and migrates most of the complexity into adder operations. Furthermore, there is inherently no dependency on the number of taps (channel impulses in the CIR). To illustrate the tradeoffs associated with using either approach, we define a complexity factor (C) which indicates the relative complexity of a multiplier when referred to an adder; i.e. the complexity cost of using a multiplier is equal to C*(the complexity cost of using an adder). Clearly C depends on the bit width used and the architecture of both the multiplier and adder. A reasonable value to assume is C between 8 and 16, depending on the word length used. This can be justified as follows: if the multiplications required are each T bits wide, then T add operations (each T bits) need to be carried out by a shift and add multiplier. The latency and power consumption are equal to those of T cascaded adders. The word length throughout the design differs depending on the dynamic range required. We therefore assume that the complexity factor satisfies $T/2 \le C \le T$. For the case that T is 16 bits, reasonable values for C are between 8 and 16. Similar analysis can be carried out for different adder architectures or multiplicand lengths. Fig. 5 depicts the total cost in terms of equivalent adder operations needed for both time and frequency approaches (per output sample) versus the complexity factor C under the following assumptions: a) number of taps L=18, b) FFT length N_FFT = 256, c) two MIMO array sizes are considered, 2x2 and 4x4, d) Cordic rotations per stage Q=16. For all practical values of C the frequency domain approach is far more efficient than its time domain counterpart. Fig. 6 depicts the percentage reduction in equivalent adder operations for MIMO emulators when using the Cordic approach versus a regular FFT approach. Two cases are considered, namely a 2x2 and a 4x4 case. For small values of C, it is more beneficial to use a regular multiplier. In fact, using the Cordic approach creates a penalty by increasing the number of operations required. However, for typical values of C (i.e. C>10), using the Cordic approach consistently results in savings in the net operations required. For example, for a nominal value of C=16, the 2x2 and 4x4 systems experience savings of 22% and 27% respectively in the required number of equivalent additions when using a Cordic approach.
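The range assumed for C can be illustrated with a shift-and-add multiplier that counts its own additions. In this Python sketch (an illustration only, not part of the proposed architecture; `shift_add_multiply` is a hypothetical helper), a T-bit unsigned multiplier performs one addition per set bit of the multiplier operand, so the worst case is T additions and a random operand needs about T/2 on average. Reading the lower bound T/2 as this average case is one possible interpretation, not something the text states.

```python
import random

def shift_add_multiply(a, b, T=16):
    """Multiply two unsigned T-bit integers; return (product, additions used)."""
    product, adds = 0, 0
    for i in range(T):
        if (b >> i) & 1:              # add the shifted multiplicand for each set bit
            product += a << i
            adds += 1
    return product, adds

T = 16
random.seed(1)
pairs = [(random.getrandbits(T), random.getrandbits(T)) for _ in range(10_000)]
results = [shift_add_multiply(a, b, T) for a, b in pairs]

worst_case = shift_add_multiply(0xFFFF, 0xFFFF, T)[1]
average = sum(adds for _, adds in results) / len(results)
print(f"worst case: {worst_case} additions, average: {average:.2f} additions")
```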

Fig. 5, Equivalent adder operations vs. complexity factor

Fig. 6, Percentage reduction in equivalent adder operations for a regular multiplier approach versus a Cordic approach (percentage reduction vs. complexity factor "C"; curves: 4x4 system and 2x2 system).

VI. FPGA IMPLEMENTATION

A single Xilinx Virtex-4 SX35 FPGA was used as a target platform to implement the channel emulator. A data flow diagram for such a system is depicted in Fig. 1. For the FPGA used in this project, memory blocks are referred to as RAMB16/FIFO16, each of 18 Kbit storage capacity, and DSP slices are referred to as DSP-48, consisting of a multiplier followed by an adder suited for Multiply-Accumulate (MAC) applications. The architecture is designed to be highly scalable. In general, any NxM MIMO emulator could be prototyped by instantiating N transmit elements, NM channel multipliers, M receive elements and appropriate connections. A transmit or receive element encapsulates the FFT/IFFT engine and the associated buffering. The input ADC generates 12-bit samples with a sampling frequency of 30 MHz. The internal data-width is set to 14 bits as discussed previously. The FFT and IFFT engines use the Cordic rotation core (with 16 bits resolution) as discussed in section V-D. Finally, a 14-bit DAC is used to generate the analog processed base-band signal. The system clock of the FPGA is running at 120 MHz. For this specific implementation, the critical path found within the architecture is 5.495 ns, which implies a maximum system clock of 182 MHz.

A. Resource Utilization

Table 6 outlines the detailed resource requirements for a SISO system drawn from the post place and route report with a system clock that is 4 times the sampling rate. For comparison purposes, implementation results for an FIR-based channel emulator are also reported in Table 6. Xilinx core generator was used to generate all FIR filters for the time domain approach. To establish a fair comparison, both the sampling frequency and the internal frequency are the same as in the frequency domain approach. Both systems are used to emulate channels with a channel duration of P·Ts, where Ts is the sampling time. To this end, a 2P-point FFT and a P-point IFFT are used for frequency domain emulation. Furthermore, the FIR filter assumes a maximum of 18 taps per sub-channel (802.11n compliant). For shorter channels the time domain approach requires fewer resources, while for longer channels the frequency domain approach requires fewer resources.

Table 7 depicts the complexity of different MIMO configurations generated for the target Xilinx Virtex-4 SX35 FPGA in terms of equivalent gate count, while Fig. 7 represents the same data in a graphical format. It is important to note that the gate count cost of the frequency approach changes linearly with a very weak slope as a function of the array size. This is attributed to the efficient Cordic structure and the use of highly integrated RAMs for storage. On the other hand, the time domain approach exhibits a quadratic growth with the array size, due to the quadratic increase in multipliers required to support extra sub-channels as well as the use of shift registers for storage. Note that in a 3x3 MIMO system with a delay spread of N_FFT = 1024 samples, the frequency domain approach exhibits a savings of up to 67% as compared to the time domain approach. Higher MIMO array orders will exhibit higher savings ratios. Finally, it is important to note that the target FPGA (Xilinx Virtex-4 SX35) can only support up to a maximum of a 3x3 MIMO emulator using the FIR approach. This is extended to a possible 7x7 emulator utilizing the same FPGA if a frequency emulation approach is adopted.

TABLE 6, RESOURCE REQUIREMENT REPORT FOR SISO CHANNEL EMULATION

               18-tap complex FIR filter, 16-bit    Frequency FFT, 16-bit
               P=64      P=256     P=1024           P=64      P=256     P=1024
Gate count ψ   94922     144074    340682           312728    319342    325792
Slices         1987      5059      17347            1649      1835      2011
Slices FF      3612      9756      34332            2663      2991      3321
4-LUT          1386      1386      1386             2941      3299      3645
DSP 48         24        24        24               3         3         3

ψ: Gate count is a measure of logic capacity in terms of the number of 2-input NAND gates that would be required to implement the same number and type of logic functions [32].
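The scalable structure described in Section VI (N transmit elements, NM channel multipliers and M receive elements for an NxM emulator) can be sketched in a few lines of Python with NumPy. The sketch processes a single block and uses FFTs and IFFTs of equal length, so the result equals a circular convolution of each input with its sub-channel; the block buffering and the 2P-point FFT / P-point IFFT arrangement of the actual design are not modeled. The function `emulate_block` and the random test data are assumptions made for illustration.

```python
import numpy as np

def emulate_block(x, h, nfft):
    """Frequency-domain MIMO emulation of one block.

    x: (N, nfft) complex inputs, one row per transmit element.
    h: (M, N, L) complex CIR taps for each receive/transmit pair, L <= nfft.
    Returns (M, nfft) outputs, one row per receive element.
    """
    X = np.fft.fft(x, n=nfft, axis=-1)          # N forward FFTs
    H = np.fft.fft(h, n=nfft, axis=-1)          # channel frequency samples
    Y = np.einsum("mnk,nk->mk", H, X)           # N*M per-bin complex multiplies
    return np.fft.ifft(Y, axis=-1)              # M inverse FFTs

rng = np.random.default_rng(0)
N, M, L, nfft = 3, 3, 18, 256
x = rng.standard_normal((N, nfft)) + 1j * rng.standard_normal((N, nfft))
h = rng.standard_normal((M, N, L)) + 1j * rng.standard_normal((M, N, L))
y = emulate_block(x, h, nfft)

# Time-domain reference: sum of circular convolutions over the transmit elements.
ref = np.zeros((M, nfft), dtype=complex)
for m in range(M):
    for n in range(N):
        for l in range(L):
            ref[m] += h[m, n, l] * np.roll(x[n], l)
max_err = np.max(np.abs(y - ref))
print(f"{N} FFTs, {N * M} channel multipliers, {M} IFFTs, max error {max_err:.1e}")
```

The number of transforms grows as N + M while only the per-bin multipliers grow as NM, and nothing in `emulate_block` depends on the tap count L beyond the requirement L <= nfft. This mirrors the weak growth of the frequency domain gate count with array size.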


TABLE 7, GATE COUNT REPORT FOR NXN CHANNEL EMULATION

                        P=64       P=256      P=1024
Time, 18-tap channel
  4x4                   1518752    2305184    5450912
  3x3                   854298     1296666    3066138
  2x2                   379688     576296     1362728
  1x1                   94922      144074     340682
Frequency Domain
  4x4                   1267120    1293576    1319376
  3x3                   947301     967143     986493
  2x2                   629508     642736     655636
  1x1                   312728     319342     325792
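The savings quoted in the text can be recomputed directly from the gate counts of Table 7. The short Python sketch below performs this arithmetic; the dictionary layout and the name `savings_percent` are illustrative, and the 1x1 row is omitted because it repeats the SISO figures of Table 6.

```python
# Equivalent gate counts from Table 7, indexed by array size and then by P.
time_domain = {
    "2x2": {64: 379688, 256: 576296, 1024: 1362728},
    "3x3": {64: 854298, 256: 1296666, 1024: 3066138},
    "4x4": {64: 1518752, 256: 2305184, 1024: 5450912},
}
freq_domain = {
    "2x2": {64: 629508, 256: 642736, 1024: 655636},
    "3x3": {64: 947301, 256: 967143, 1024: 986493},
    "4x4": {64: 1267120, 256: 1293576, 1024: 1319376},
}

def savings_percent(size, p):
    """Gate-count reduction of the frequency approach relative to the FIR approach."""
    return 100.0 * (1.0 - freq_domain[size][p] / time_domain[size][p])

for size in ("2x2", "3x3", "4x4"):
    row = ", ".join(f"P={p}: {savings_percent(size, p):6.1f}%" for p in (64, 256, 1024))
    print(f"{size}  {row}")
```

For the 3x3 array with P=1024 this gives about 67.8%, in line with the savings of up to 67% stated in the text. Negative values, such as the 2x2 array with P=64, mark configurations where the FIR approach is still cheaper.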

Fig. 7, Gate count vs. N for NxN channel emulation (equivalent gate count, x 10^6, vs. array dimension; curves: time and frequency approaches for P=64, P=256 and P=1024).

VII. LABORATORY TESTS

In this section, we present laboratory tests performed on the implemented 3x3 FPGA prototype. Two types of tests are presented. First, functional tests are used to illustrate the general functionality of the system, where actual spectrum analyzer snapshots of SISO and MIMO systems are presented. The performance of the system is then quantified by performing SISO and MIMO tests utilizing the 802.11n (model ‘F’) channel model as the emulated channel. Theoretical and emulated BER versus SNR curves are compared to illustrate the impact of implementation loss and quantization noise.

A. Functional Tests

1) SISO Mode

In the SISO case, a simple two-ray model is emulated to generate a well-known transfer function as follows:

$$h(t) = \delta(t) + \rho e^{j\varphi}\,\delta(t-\tau) \qquad (10)$$

$$H(f) = 1 + \rho e^{j\varphi}\,e^{-j2\pi f\tau} \qquad (11)$$

where h(t) is the CIR of a wireless channel with two multipaths, one at delay zero with unity amplitude and the other at delay τ with a complex coefficient of $\rho e^{j\varphi}$. H(f) is the transfer function of such a channel; it is periodic in frequency with period 1/τ, and its magnitude ranges between 1 − ρ and 1 + ρ. Fig. 8 depicts the spectrum analyzer snapshot showing the response of the implemented channel emulator for ρ = 0.96 and τ = 1 µs with an input of white Gaussian noise. The results confirm the functionality of the emulator and agree with theory, with a fading magnitude of 16.9 dB as shown in the figure.

Fig. 8, Two-ray model transfer function

2) MIMO Mode

In the MIMO mode, a 3x3 system was tested. The input signal is an OFDM signal compliant with the 802.11n standard with a generic MIMO configuration in which sub-channels are added randomly. Three CIRs are picked as follows: $h_{11}(t) = \delta(t)$, $h_{21}(t) = -\rho_1 e^{j\varphi_1}\,\delta(t-\tau_1)$ and $h_{31}(t) = \rho_2 e^{j\varphi_2}\,\delta(t-\tau_2)$. The resulting spectrum at the receiver is a faded version of the transmit signal, as illustrated in Fig. 9. The frequency selective fading nature of the channel is clear in this spectral snapshot.

Fig. 9, MIMO channel spectrum
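The two-ray response of equations (10) and (11) can be checked numerically. The Python sketch below (NumPy assumed; the phase φ = 0 and the frequency grid are arbitrary choices made for illustration, since the text does not state them) evaluates the magnitude response for ρ = 0.96 and τ = 1 µs, then reports the spacing between nulls and the ratio of the largest to the smallest magnitude.

```python
import numpy as np

rho, tau, phi = 0.96, 1e-6, 0.0            # phi is an assumed value
f = np.linspace(0.0, 4e6, 40001)           # 0 to 4 MHz in 100 Hz steps
H = 1.0 + rho * np.exp(1j * phi) * np.exp(-2j * np.pi * f * tau)
mag = np.abs(H)                            # the exponent's sign does not change |H|

# Nulls occur where the two rays cancel; find local minima of |H(f)|.
is_min = (mag[1:-1] < mag[:-2]) & (mag[1:-1] < mag[2:])
null_freqs = f[1:-1][is_min]
null_spacing = np.diff(null_freqs)         # expected to equal 1/tau

ratio = mag.max() / mag.min()              # (1 + rho) / (1 - rho)
print(f"null spacing: {null_spacing.mean() / 1e6:.3f} MHz")
print(f"max/min magnitude ratio: {ratio:.1f} "
      f"(10*log10 = {10 * np.log10(ratio):.1f} dB, "
      f"20*log10 = {20 * np.log10(ratio):.1f} dB)")
```

With these values the nulls are spaced 1/τ = 1 MHz apart and the magnitude ratio is (1 + ρ)/(1 − ρ) = 49. Expressed as 10·log10(49), this is approximately 16.9, which coincides with the fading magnitude quoted in the text.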

B. Performance Measurements

In this class of tests the channel is selected to conform to the 802.11n model ‘F’ channel with 18 taps. The sampling frequency is chosen to be 40 MHz. The data vector is generated with 1024 subcarriers with a cyclic prefix of 128. Two modes were tested, namely a SISO mode with 8-PSK and a 3x3 MIMO mode using Space Time Block Coding (STBC) of rate ¾. To maintain fairness in comparing the SISO and MIMO modes, a sixteen point quadrature amplitude modulation (16-QAM) system is used in the MIMO mode, thus achieving the same rate as the SISO 8-PSK system. SNR is defined as the transmit signal power divided by the noise power at the receiver.

1) SISO Mode

Fig. 10 depicts the performance of the system for different bit widths of the data path versus the theoretical floating point realization in MATLAB. For 10 bits the performance is dominated by the quantization and implementation noise floor, which becomes evident for SNR > 20 dB. Beyond this point, increasing the signal SNR does not improve performance. At 12 bits the signal SNR at which the noise floor impact becomes evident is pushed to > 30 dB. At 14 bits the emulator performance closely matches the ideal performance.

2) MIMO Mode

In the MIMO mode, a 3x3 STBC system was tested. A rate ¾ STBC code is used with a 16-QAM system to maintain the same rate as the SISO system. The performance of this implementation is depicted in Fig. 11. The impact of diversity due to the STBC mode is clear from the increased slope of the SER versus SNR curve. Trends similar to the SISO mode can be observed. At 10 bits, performance is dominated by quantization and implementation losses that cause a flooring behavior that is clearly evident for SNR > 20 dB. The 12-bit and 14-bit implementations closely follow the theoretical curves, with 14 bits giving the least loss from ideal.

Fig. 10, SISO system performance (L = 1024, 1x1 STBC system; SER vs. transmit SNR; curves: MATLAB floating point and bit widths of 10, 12 and 14).

Fig. 11, MIMO STBC system performance (L = 1024, 3x3 STBC system; SER vs. transmit SNR; curves: MATLAB floating point and bit widths of 10, 12 and 14).

VIII. CONCLUSION

This paper presented wireless channel emulation in the frequency domain as an alternative to conventional FIR based approaches. The computational complexity in terms of additions and multiplications per output sample is characterized and compared with that of conventional time domain channel emulation. It is shown that emulation in the frequency domain exhibits an initial fixed computational cost due to Fourier transforms that grows at a reduced rate as a function of the MIMO array size, and is independent of the number of channel taps. For larger MIMO arrays with long channel impulse responses, frequency domain emulation becomes more computationally efficient than FIR based systems. An efficient VLSI implementation is proposed and its implementation results are reported. As an illustrative example of this approach, an FPGA architecture implementing a sample 3x3 MIMO system with a channel impulse length of P=1024 exhibits a resource savings of up to 67% over a similarly constrained FIR approach. The architecture is discussed in detail and implementation results as well as laboratory measurements are presented.

REFERENCES

[1] S. A. White, “Applications of distributed arithmetic to digital signal processing: A tutorial review,” IEEE Acoust., Speech, Signal Processing Mag., pp. 4–19, July 1989.
[2] R. A. García, U. Meyer-Bäse, A. Lloris, and F. J. Taylor, “RNS implementation of FIR filters based on distributed arithmetic using field-programmable logic,” Proc. IEEE Int. Symp. on Circuits and Systems, vol. 1, pp. 486–489, Jun. 1999.
[3] M. A. Soderstrand and R. A. Escott, “VLSI implementation in multiple-valued logic of an FIR digital filter using residue number system arithmetic,” IEEE Trans. on Circuits System., vol. CAS-31, pp. 5-25, Jan. 1986.
[4] S. He and M. Torkelson, “FPGA implementation of FIR filters using pipelined bit-serial canonical signed digit multipliers,” Custom Integrated Circuits Conf., pp. 81–84, 1994.
[5] D. W. Bliss, K. W. Forsythe, A. O. Hero, and A. L. Swindlehurst, “MIMO environmental capacity sensitivity,” Asilomar Conf. on Signals, Systems and Computers, vol. 1, pp. 764-768, 2000.
[6] A. S. Behbahani, R. Merched, and A. Eltawil, “Optimizations of a MIMO relay network,” IEEE Trans. on Signal Processing, vol. 56, no. 10, pp. 5062-5073, Oct. 2008.
[7] B. A. Cetiner and H. Jafarkhani et al., “Multifunctional reconfigurable MEMS integrated antennas for adaptive MIMO systems,” IEEE Commun. Mag., vol. 42, pp. 62–70, Dec. 2004.
[8] B. A. Cetiner, E. Sengul, E. Akay, and E. Ayanoglu, “A MIMO system with multifunctional reconfigurable antennas,” IEEE Antennas and Wireless Propagat. Letters, vol. 5, pp. 463-466, Dec. 2006.
[9] L. Schumacher, K. I. Pedersen, and P. E. Mogensen, “From antenna spacing to theoretical capacities – guidances for simulating MIMO systems,” Proc. of Personal, Indoor and Mobile Radio Comm., pp. 587-592, Sept. 2002.
[10] A. A. Saleh and R. A. Valenzuela, “A statistical model for indoor multipath propagation,” IEEE Journal on Selected Areas in Comm., vol. SAC-5, no. 2, pp. 128-137, Feb. 1987.
[11] V. Erceg, L. Schumacher, et al., “TGn channel models,” IEEE 802.11-03/940r4, May 2004.
[12] J. J. Olmos, A. Gelonch, F. J. Casadevall, and G. Fermenias, “Design and implementation of a wide-band real-time wireless channel emulator,” IEEE Trans. on Vehicular Technology, vol. 48, no. 3, pp. 746-764, May 1999.
[13] M. Cui, M. Hidekazu, and A. Kiyomichi, “FPGA implementation of 4x4 MIMO test-bed for spatial multiplexing systems,” Proc. of Personal, Indoor and Mobile Radio Comm., vol. 4, pp. 3045–3048, Sept. 2004.
[14] A. Dassatti, G. Masera, M. Nicola, A. Concil, and A. Poloni, “High performance channel model hardware emulator for 802.11n,” Proc. of Int. Conf. on Field-Programmable Technology, pp. 303-304, Dec. 2005.


[15] D. N. Dung, et al., “Implementation and evaluation of 4x4 MIMO fading simulator considering antenna characteristics,” Int. Conf. on Comm. and Electronics, pp. 472-477, Oct. 2006.
[16] P. Murphy, F. Lou, A. Sabharwal, and J. P. Frantz, “An FPGA-based rapid prototyping platform for MIMO systems,” Proc. of Asilomar Conf. on Signals, Systems, and Computers, vol. 1, pp. 900-904, Nov. 2003.
[17] C. Mehlfuhrer, F. Kaltenberger, M. Rupp, and G. Humer, “A scalable rapid prototyping system for real-time MIMO OFDM transmission,” Proc. of the 2nd EE/EURASIP Conf. on DSP enabled Radio, Sept. 2005.
[18] M. Wickert and J. Papenfuss, “Implementation of a real-time frequency selective RF channel simulator using a hybrid DSP-FPGA architecture,” IEEE Trans. on Microwave Theory and Techniques, vol. 49, pp. 1390–1397, Aug. 2001.
[19] M. Nicola, A. Dassatti, G. Masera, A. Concil, and A. Poloni, “Mixed hardware-software test-bed for IEEE-802.11n,” Proc. of Int. Conf. on Comm., June 2006.
[20] J. Kolu and T. Jamsa, “A real-time simulator for MIMO radio channels,” Proc. of Wireless Personal Multimedia Comm., vol. 2, pp. 568–572, Oct. 2002.
[21] D. Falconer, S. L. Ariyavisitakul, A. Benyamin-Seeyar, and B. Eidson, “Frequency domain equalization for single-carrier broadband wireless systems,” IEEE Comm. Mag., vol. 40, no. 4, pp. 58-66, Apr. 2002.
[22] J. J. Shynk, “Frequency-domain and multi-rate adaptive filtering,” IEEE Signal Processing Mag., vol. 9, no. 1, pp. 14-37, Jan. 1992.
[23] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, “Discrete-time signal processing,” second edition, Prentice Hall, p. 623, 1999.
[24] H. Eslami and A. M. Eltawil, “A scalable wideband channel emulator for broadband MIMO systems,” Proc. of Int. Conf. on Comm., June 2007.
[25] Y. Hu, “Cordic-based VLSI architecture for digital signal processing,” IEEE Signal Processing Mag., pp. 16-35, 1992.
[26] J. G. Proakis, “Digital communications,” fourth edition, McGraw Hill, pp. 801-810, 2001.
[27] T. Sansaloni, A. Pérez-Pascual, and J. Valls, “Area-efficient FPGA-based FFT processor,” Electronics Letters, vol. 39, no. 19, Sept. 2003.
[28] I. S. Uzun, A. Amira, and A. Bouridane, “FPGA implementations of fast Fourier transforms for real-time signal and image processing,” IEE Proc. Vision, Image and Signal Processing, vol. 152, no. 3, June 2005.
[29] H. Ozcelik, M. Herdin, H. Hofstetter, and E. Bonek, “A comparison of measured 8x8 MIMO systems with a popular stochastic channel model at 5.2 GHz,” in Proc. Int. Conf. on Telecomm., vol. 2, pp. 1542-1546, Feb. 2003.
[30] Amphion Semiconductor Ltd., 1024 Point Block Based FFT/IFFT, Apr. 2002. Available:
[31] http://www.amphion.com/signal.html
[32] Gate Count Capacity Metrics for FPGAs, Xilinx Application Notes. Available: http://www.xilinx.com/support/documentation/application_notes/xapp059.pdf
[33] Elektrobit Propsim C8. Available: http://www.propsim.com/index.php?2029
[34] Spirent SR5500. Available: http://www.spirent.com/index.cfm
[35] Azimuth ACE-400NB. Available: http://www.toyo.co.jp/azimuth/ace-400nb.html

Hamid Eslami (S’06) received his B.S. from the Electrical and Computer Science Department of the University of Tehran, Iran, in 2003 with a concentration in communication systems. He joined the EECS department of the University of California, Irvine in 2005 and received a Masters degree from the University of California, Irvine in 2007. He is currently pursuing a doctoral degree at the same university. His research interests include the design of efficient VLSI architectures for broadband wireless systems.

Sang V. Tran received a B.S. (summa cum laude) in 1987 and an M.S. in Electrical Engineering in 1988, both from the University of California, Los Angeles. From 1999 to 2005, he was with Broadcom Corporation, where he led the development of the programmable audio solution for cable/satellite setup boxes. His research interest is in the area of SoC design of wireless communications systems.

Ahmed M. Eltawil (S’97-M’03) received the B.Sc. and M.Sc. degrees (with honors) from Cairo University, Egypt, in 1997 and 1999, respectively, and the Doctorate degree from the University of California, Los Angeles, in 2003. Since 2005, he has been with the Department of Electrical Engineering and Computer Science, University of California, Irvine, where he is currently an Assistant Professor. He holds the title of Henry Samueli Faculty Fellow and is the Director of the Wireless Systems and Circuits Laboratory. His current research interests are in digital circuit and signal processing architectures with an emphasis on communications systems, where he has published more than 50 technical papers on the subject, including four book chapters. He is a recipient of several research and service awards, including being a co-recipient of a best paper award at ISQED 2006. Dr. Eltawil has been on the technical program committees for several workshops, symposia and conferences in the area of VLSI and system design, including the IEEE International Conference on Computer-Aided Design (ICCAD), among others. Since 2006, he has been a member of the Association of Public Safety Communications Officials (APCO) and has been actively involved in expert policy panel discussions towards the applications of cognitive, low power, software defined technology for critical first responder communication networks. Dr. Eltawil has held several industry positions, including being the director of VLSI Engineering at Innovics Wireless (2000-2003), where he led the team to deliver the first reported diversity enabled third generation W-CDMA mobile transceiver system on a chip. From 2003 to 2005, he was a partner at Silvus Communications, where his work focused on designing scalable Multi-Input Multi-Output VLSI architectures.
