Performance and Implementation of Clustered-OFDM for Wireless ...

Performance and Implementation of Clustered-OFDM for Wireless Communications

Babak Daneshrad, Leonard J. Cimini, Jr., Manny Carloni, Nelson Sollenberger

Abstract: An elegant means by which high-speed burst wireless transmission can be accomplished with small amounts of overhead is through a novel technique referred to as clustered-OFDM [5]. By using OFDM modulation with a long symbol interval, clustered-OFDM overcomes the complex and costly equalization requirements associated with single-carrier systems. Moreover, the need for highly linear power amplifiers typically required in OFDM systems is alleviated through the use of multiple transmit antennas combined with nonlinear coding. The clustering technique also leads to a natural implementation of transmit diversity. This paper reports preliminary results on the performance of a clustered-OFDM system as well as the design and implementation of a clustered-OFDM transmitter. The prototype transmitter can deliver 7.5 Mbps, and it is expected that this data rate could be easily tripled with existing technology in a second-generation system. The paper also describes the architectural trade-offs made in order to reduce the hardware complexity of the boards, as well as some experimental results showing the operation of the transmitter.

1. Introduction and Basic Concepts The need to deliver high speed packetized data over frequency selective wireless channels for applications such as wireless LANs, third generation cellular systems or future PCS systems has inspired researchers to consider a variety of system scenarios. In the case of wireless LANs [1], two major schools of thought exist. The first attempts to overcome the multipath effects of the channel by combining equalization with either QPSK or GMSK modulation [2][3][4], while the second uses multi-carrier OFDM modulation with a long enough symbol interval to eliminate the need for equalization at the receiver [5] [6] [7]. Designers of equalizer-based systems generally avoid the large complexity associated with the recursive least squares (RLS) update algorithms, and use the simple, but slow converging, LMS algorithm. The consequence of this choice is to burden the system with a relatively long training interval, thus reducing the overall system efficiency. Simulations show that the LMS algorithm can take anywhere from 100 to 1000 symbols to converge depending on the channel [4]. The use of OFDM in packet based communications over frequency selective channels serves a dual purpose. First, the need for a training interval is eliminated due to the absence of equalization. This allows OFDM packets to be transmitted and received with relatively little overhead. Second, the hardware complexity of the baseband processing engine is significantly reduced. The price paid for this simplification is the need for a highly linear and inefficient transmit power amplifier that can accommodate the large peak-to-average power (PAP) ratio associated with an OFDM waveform. Nonetheless, OFDM based systems are in use for digital audio broadcasting (DAB), and Asymmetric digital subscriber (ADSL) services. They have also been proposed for future cellular and PCS applications [8] [9]. 
The clustered-OFDM system reduces the PAP ratio through the use of nonlinear coding and clustering. Moreover, the hardware simplicity of the system is demonstrated by the ease with


Figure 1, a) Clustered-OFDM transmitter, b) A possible receiver architecture.

which the prototyped transmitter reported in this paper was implemented. The wire-wrapped prototype took a short time to complete, and can deliver a data rate of 7.5 Mbps. It is expected that the data rate could be tripled without changing the board architecture simply by realizing the transmitter on a printed circuit board (PCB). Figure 1 shows the block diagram of a clustered-OFDM system. The input data stream is first encoded to allow for error/erasure correction in the receiver. The transmit path is then divided among M (M=4 in Figure 1) parallel clusters (sections). Each cluster transmits N/M adjacent subchannels (modulated carriers) over a separate antenna, where N is the total number of subchannels to be transmitted and N/M = 7 for our prototype implementation. Each of the N subchannels is thus a narrowband signal, and as in the case of OFDM, N should be chosen so that the channel can be assumed flat for each subchannel. The clustering of the tones in this manner has advantages of its own. First, the PAP ratio is reduced by 10log(M); second, the size of the table needed for the nonlinear coding is significantly reduced; and third, the transmission of different clusters on separate antennas results in independent fading on each cluster. With the use of error correction coding across frequencies and a minimal amount of information from the receiver regarding the relative performance of the clusters, the clustering approach can result in an effective means for realizing transmit diversity [5]. The advantage of multicarrier modulation combined with transmit diversity for wireless data transmission have been demonstrated in [10] for the extreme case of one carrier per transmit antenna, corresponding to clusters of one carrier each. Moreover, the advantages of transmit diversity over receive diversity can be appreciated by considering the complexity of the RF circuitry needed to implement diversity at the receiver. 
The remainder of this paper is organized as follows. Section 2 presents the performance results of a clustered-OFDM system for different channel assumptions as well as different system configurations. Section 3 provides a detailed description of the implemented clustered-OFDM transmitter, followed by Section 4 which presents measurement results obtained from the prototype system. The paper is then concluded in Section 5.

2. Clustered-OFDM Performance The results presented in this section assume N subchannels each of bandwidth 1/T and separated by 1/T, where T is the symbol interval for the individual subchannels. We also assume no ISI



within a given subchannel. That is, the individual subchannels are narrow enough that the only effect of multipath is flat fading in each subchannel. Although the multipath channel is assumed flat across any given subchannel, it is frequency selective across the entire multicarrier signal bandwidth. Thus, for any one realization of the multipath channel, some subchannels will be good (they will have sufficient received power to meet the bit error probability objective) while others will be bad. Therefore, a reasonable measure of performance can be computed as follows: for a particular realization of the multipath channel, an effective received SNR is computed for each subchannel, n ∈ [1,N], and the number of subchannels meeting some SNR (or, equivalently, some bit error probability) objective is determined. The fading is assumed to be flat in each subchannel, so that the received SNR in the nth subchannel is

$$\mathrm{SNR}_{n,\mathrm{rec}} = \mathrm{SNR}_{P_b,\mathrm{AWGN}} \, P_{\mathrm{margin}} \, |H(f_n)|^2 \tag{1}$$

where |H(f_n)|^2 represents the attenuation due to the flat fading on each subchannel (with E[|H(f)|^2] = 1), SNR_{Pb,AWGN} is the SNR required to achieve the target bit error probability, P_b, in additive white Gaussian noise (AWGN), and P_margin is the fade margin, i.e., the excess power provided above that required to meet the P_b objective in AWGN. Then, by considering many multipath channel realizations, we can compute the number of times a given number of bad subchannels occurs out of the total number of subchannels. From this, we can also evaluate the benefits of coding and clustering. If no coding is used, a clustered system (i.e., multiple transmit antennas) is worse than a nonclustered system (i.e., a single antenna). For no coding and zero delay spread (so the multipath channel is flat over the entire bandwidth on a given antenna), the probability that a given block is good (i.e., no subchannels are bad) is simply p = exp(-1/P_margin), with P_margin expressed as a linear ratio. Therefore, for a single antenna, the probability that the block is bad, which we will call the outage, is P_out,1 = 1 - p. Similarly, for M antennas, P_out,M = 1 - p^M > 1 - p = P_out,1. For example, for a 20-dB fade margin, p = 0.99. Therefore, for one antenna, the outage is 0.01, while, for four antennas, the outage is 0.04. More generally, for n antennas,

$$P(i \text{ good antennas and } n-i \text{ bad antennas}) = \binom{n}{i} \, p^{i} (1-p)^{n-i}.$$

On the other hand, with heavy to moderate coding the clustered approach can provide some advantage. To obtain an estimate of the benefits of coding, we simply count the number of bad subchannels remaining after correcting some number of subchannels and then recompute the outage. Results are given in Figure 2 (flat fading on each antenna), Figure 3 (two-ray power delay profile) and Figure 4 (exponential power delay profile). The outage is plotted versus the number of frequencies which are being corrected in the decoder. In the latter two figures, the rms delay spread, τ_rms, is chosen so that τ_rms R_baud = 0.1 x N.
The target bit error probability is P_b = 10^-8. Results are shown for different margins and two partitions: one antenna with 32 tones and 4 antennas with 8 tones on each.
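The flat-fading outage expressions above can be checked numerically. The following is a minimal sketch, not the simulation used to produce the figures: it assumes p = exp(-1/P_margin) as stated in the text, that every tone on a bad antenna is bad, and that a block survives whenever the number of bad tones does not exceed the number of frequencies the decoder can correct. The function names are illustrative only.

```python
import math

def good_probability(margin_db: float) -> float:
    """p = exp(-1/P_margin): probability that a flat-faded antenna is good."""
    margin = 10.0 ** (margin_db / 10.0)  # fade margin as a linear ratio
    return math.exp(-1.0 / margin)

def outage_flat(margin_db: float, antennas: int, tones_per_antenna: int,
                corrected: int = 0) -> float:
    """Outage for flat fading on each antenna.

    A bad antenna makes all of its tones bad, so the block is good only if
    (bad antennas) * tones_per_antenna <= corrected (assumed decoder model).
    """
    p = good_probability(margin_db)
    max_bad = min(antennas, corrected // tones_per_antenna)
    survive = sum(
        math.comb(antennas, bad) * (1.0 - p) ** bad * p ** (antennas - bad)
        for bad in range(max_bad + 1)
    )
    return 1.0 - survive

if __name__ == "__main__":
    for corrected in (0, 8, 16):
        single = outage_flat(20.0, 1, 32, corrected)
        clustered = outage_flat(20.0, 4, 8, corrected)
        print(f"corrected={corrected:2d}  1x32: {single:.2e}  4x8: {clustered:.2e}")
```

With no correction the sketch reproduces the 0.01 and 0.04 outages quoted for a 20-dB margin; once a full cluster of 8 tones can be corrected, only the four-antenna partition benefits, since the 32 tones of the single-antenna system are either all good or all bad.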



Figure 2, Outage versus the number of frequencies which are being corrected in the decoder. Flat fading case.

Figure 3, Outage versus the number of frequencies which are being corrected in the decoder. Two-ray power delay profile with the rms delay spread, τrms, chosen so that τrmsRbaud = 0.1xN, where N is the total number of tones.

From the plots, the benefits of coding are apparent for either antenna/tone arrangement, especially when the delay spread is significant (or, equivalently, more potential frequency diversity). In particular, for a two-ray power delay profile, for a 1% target outage, a single antenna system with no coding requires about 27 dB of fade margin. However, if 8 frequencies can be corrected (for example, 1/2-rate Reed-Solomon code, with error correction, or a 3/4-rate code with erasure correction), this outage can be achieved with only 17 dB. This is comparable to the improvement which would be obtained with ideal selection diversity. Of course, this is at the expense of reduced bandwidth efficiency. The exponential profile is much more benign and much less coding is needed, as shown in Figure 4. Obviously, in a flat fading environment (see Figure 2), for a single antenna system, coding is not of much use since the 32 tones are either all good or all bad. In addition, for low-bit-error probability situations, with moderate coding, the clustered approach can



Figure 4, Outage versus the number of frequencies which are being corrected in the decoder. Exponential power delay profile with the rms delay spread, τrms, chosen so that τrmsRbaud = 0.1xN, where N is the total number of tones.

provide significant improvements in performance over the single-antenna approach. For example, in Figure 3, with 20-dB of margin, more than an order of magnitude improvement in outage can be achieved over the single antenna system if only 4 frequencies are corrected. The improvements are less substantial for less margin (i.e., higher intrinsic outages) and are negligible for an exponential profile. In Figure 5, we show the number of frequencies which need to be corrected to achieve a 1% outage versus the normalized rms delay spread (i.e., τrms Rbaud). Results are shown for a 10-dB fade margin, two-ray and exponential power delay profiles, and the previously considered two partitions of antennas and tones (i.e., one antenna with 32 tones and 4 antennas with 8 tones on each). As expected, the larger delay spread environment provides more frequency diversity and, therefore, less coding is required to obtain the desired performance. In addition, the clustered approach is uniformly more efficient than the single-antenna configuration. The clustered OFDM approach also facilitates dynamically assigning clusters to antennas (which we term cluster switching) and provides a significant improvement in the outage performance, as we show in the next section. With multicarrier transmission, other impairments such as carrier frequency offset must also be handled carefully. However, we can usually choose the subchannel wide enough to minimize the effects of this impairment [11]. 2.1 Performance Improvement with Cluster Switching It is well-known that the performance of OFDM can be improved (that is, more bit rate achieved for a given bandwidth) by matching the transmitted signal to the multipath channel frequency response [12]. Similarly, we can improve the performance of clustered-OFDM by optimally assigning a given cluster to a particular antenna. If a feedback channel is available, we can learn which subchannels (tones) are bad and switch a bad cluster to a different antenna. 
Irv Kalet [13] has generated some results for the optimal assignment of clusters to four antennas. His results are idealized in that the channel is assumed flat across clusters (i.e., no delay spread). While the results are optimistic, they do provide an estimate of the potential of cluster switching. As




Figure 5, The number of frequencies which need to be corrected to achieve 1% outage versus the normalized rms delay spread (i.e., τrmsRbaud).

shown in Table 1 (which is reproduced from [13]), the improvements can be significant. In particular, for a 20-dB fade margin, cluster switching provides more than two orders of magnitude improvement in the outage. Alternatively, for a fixed outage of 1%, cluster switching can provide about a 10-dB reduction in the required fading margin.

P         Pout (Fixed)    Pout (Adaptive)
0.9000    3.44x10^-1      3.70x10^-2
0.9900    3.94x10^-2      3.97x10^-4
0.9990    3.99x10^-3      4.00x10^-6
0.9999    3.99x10^-4      4.00x10^-8

Table 1: Performance Comparison

2.2 Peak-to-Average Power Ratio Reduction It can be shown that the peak-to-average power ratio, PAP, for a multicarrier signal is equal to N (or, in dB, 10 log N). Clustered-OFDM reduces PAP since fewer tones are transmitted through a given amplifier. For a total of 32 tones (subchannels) but with four antennas transmitting 8 tones each, PAP is reduced by 6 dB (PAP = 9 dB). This translates into a factor of four reduction in the PAP seen by each amplifier, plus a factor of four reduction in the average power for an individual amplifier. Of course, four such amplifiers are required. Also, as mentioned above, the combination of clustering and coding may provide some performance benefits since now more uncorrelated symbols are presented to the decoder making the coding more effective. Next, we give an estimate of these benefits.




PAP of the multitone signal in each cluster can be further reduced by embedding the data sequence in a longer sequence (i.e., coding). This mapping can be implemented using a table lookup from which all of the "large-peak" sequences are excluded (for example, see [14]). An 8-tone signal (with QPSK on each tone) has a 9-dB PAP. By accepting a 12% overhead (i.e., one additional tone), we can reduce PAP to 3.6 dB. The table look-up approach can almost always reduce PAP to less than 4 dB with minimal overhead. Nevertheless, some problems are associated with this approach.
1. The size of the look-up table can be large. For 8 tones (16 bits) mapped into 9 tones (18 bits), there will be 2^16 18-bit entries in the transmitter table and 2^18 16-bit entries in the receiver table.
2. Because the coding is nonlinear, a single-bit error will cause an entire block to be received incorrectly (error propagation).
3. The bandwidth efficiency is reduced due to the code rate.
The reduced efficiency is a minor issue, since the necessary codes have high rates (for example, 8 tones to 9 tones). Problems 1 and 2 can be avoided if a systematic approach to the coding can be found. In [14], several techniques have been discussed, including the use of complementary Golay sequences [15], which can provide mathematical encoding and decoding (eliminating the table look-up) and may have some error detection/correction capability. However, further study is required. To obtain a rough estimate of the effect of the error propagation, assume that 8 tones are mapped into 9 tones and that 8 tones (16 bits) correspond to two 8-bit Reed-Solomon code symbols (GF(256)). Therefore, a single-bit error in transmission will cause two Reed-Solomon symbols to be incorrectly received. Since, to reduce the PAP, 3/4 of the 18-bit sequences are not used, about 25% of the time the error will go undetected, requiring 4 parity symbols to correct.
The rest of the time, an error will be detected (i.e., the received sequence is not in the valid set of received 18-bit sequences), and an erasure will be declared and corrected. In the more usual case, without the nonlinear PAP coding, a single-bit error would cause only one Reed-Solomon symbol to be incorrectly received. This could be corrected either using 2 parity symbols or, if the errored frequency can be flagged as bad, 1 erasure symbol. A simple alternative to reducing the PAP of the multicarrier signal is to clip the signal before amplification and then filter after clipping. Previous studies [16]-[17] have investigated the inband degradation caused by clipping and the resulting PAP after filtering; however, further investigation is needed. This is an appealing solution because (1) It is scalable to a different number of antennas and subchannels per antenna; (2) It avoids look-up tables and the error propagation associated with nonlinear coding; (3) The number of transmitters could be reduced; and (4) Individual tones could be optimally assigned to a transmit antenna. A final possibility is to change the bias on the transmit amplifier in response to the peak value of the data sequence. Since the "large-peak" sequences do not occur very often, this could provide a significant increase in the power efficiency of the amplifier. As with clipping, further study is needed.
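The table-lookup PAP coding discussed in this section can be illustrated with a brute-force sketch. This is not the code construction of [14]; it simply ranks all QPSK sequences on the coded tones by their sampled PAP and keeps the lowest-PAP ones, which is one straightforward way to populate such a table. The QPSK mapping, the oversampling factor, and the function names are assumptions, and because the waveform is sampled the computed peak can slightly underestimate the true continuous-time peak.

```python
import cmath
import itertools
import math

# Assumed QPSK alphabet; PAP does not depend on a common rotation or scaling.
QPSK = (1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j)

def pap_db(symbols, oversample=8):
    """Sampled peak-to-average power ratio (dB) of adjacent tones carrying `symbols`."""
    n = len(symbols)
    total = n * oversample  # number of time samples over one symbol interval
    powers = []
    for k in range(total):
        y = sum(s * cmath.exp(2j * math.pi * q * k / total)
                for q, s in enumerate(symbols))
        powers.append(abs(y) ** 2)
    return 10.0 * math.log10(max(powers) / (sum(powers) / total))

def build_pap_table(data_tones, coded_tones, oversample=8):
    """Keep the 4**data_tones lowest-PAP sequences out of 4**coded_tones candidates."""
    ranked = sorted(itertools.product(QPSK, repeat=coded_tones),
                    key=lambda seq: pap_db(seq, oversample))
    table = ranked[: 4 ** data_tones]
    worst_db = pap_db(table[-1], oversample)  # highest PAP that remains in the table
    return table, worst_db

if __name__ == "__main__":
    print(f"8 aligned tones: {pap_db([1 + 1j] * 8):.2f} dB")
    table, worst_db = build_pap_table(3, 4)   # small case: 3 data tones coded into 4
    print(f"{len(table)} entries, worst PAP in table = {worst_db:.2f} dB")
```

The small 3-into-4 case runs quickly; the same functions accept larger mappings, although the exhaustive search grows as 4 to the power of the number of coded tones and becomes slow in pure Python.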




2.3 Design Parameters Assume M antennas with N subchannels on each antenna. In general, for OFDM, ISI can be eliminated by extending the symbol period, T = 1/∆f (∆f = subchannel spacing), using a guard interval, T_g, equal to the time extent of the multipath channel [12]. In addition, some number of guard frequencies, F_g, are necessary to minimize adjacent channel interference and to facilitate filtering. Including these inefficiencies as well as those caused by coding, the resulting bit rate is

$$R_b = 2 \times R_{PAP} \times R_{code} \times \frac{MN - F_g}{\frac{1}{\Delta f} + T_g} \tag{2}$$

where RPAP is the code rate for the nonlinear PAP reduction code and Rcode is the rate of the forward-error-correction code. In (2), we have ignored the packet efficiency which would include all of the overhead for training, as well as the MAC layer overhead. If we consider frequency and guard intervals which are 10% of the bandwidth and symbol period, respectively, without forward error correction, efficiencies (i.e., Rb/ BT) on the order of 1.4-1.6 bits/sec/Hz can be achieved. The number of tones (subchannels) per cluster, N/M, is limited by the PAP to about 5-10 (i.e., 7-10 dB). The number of transmitters (i.e., clusters), M, is upper-bounded by cost and power and lower-bounded by the desired diversity advantage - a good compromise choice is 4. The tone spacing ∆f (i.e., the subchannel bandwidth) must be large enough to accommodate the expected frequency offset, but small enough to avoid equalization. For a rms delay spread of 150 nsec and a carrier frequency offset of 3-5 KHz, a frequency spacing of 300-500 KHz should be adequate. Given these limitations, we consider two cases: 9 tones per cluster and 7 tones per cluster. The former conveniently accommodates a byte format using a Reed-Solomon code with GF(256) (i.e., 8-bit symbols). Two bytes (16 bits, 8 tones) will be mapped into 9 tones (18 bits) which are then transmitted over one of M antennas. In the latter case, we will use Reed-Solomon code with GF(64) but with a much smaller look-up table for the PAP reduction coding. In what follows we also assume a total bandwidth of about 11 MHz. Example 1: If we assume M=4 clusters and N=36 tones, the resulting subchannel spacing will be 305.6 KHz (i.e., BT/N). If we allow 2 guard tones (i.e., about 300 KHz on either end) for adjacent channel interference and filtering and a guard interval of 300 nsec (two times the rms delay spread), then Rb = 16.9 Mb/s x Rcode. 
However, the use of 9 tones per cluster (i.e., 18-bit sequences) requires a fairly large look-up table for the PAP reduction. The following example uses parameters which are more amenable to implementation. Example 2: Let M=4 and N=28. Then, the subchannel spacing will be 392.9 KHz. With 2 guard tones and 300 nsec of guard interval, Rb = 15.7 Mb/s x Rcode.
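The two examples can be reproduced directly from (2). The sketch below is only a numerical check under the stated parameters: an 11 MHz total bandwidth, 2 guard tones, a 300 nsec guard interval, and PAP code rates of 8/9 (8 tones into 9) and 6/7 (6 tones into 7). Note that the examples quote N as the total number of tones, which corresponds to the product MN in (2); the function and argument names are illustrative.

```python
def bit_rate(total_tones, guard_tones, bandwidth_hz, guard_interval_s,
             r_pap, r_code=1.0):
    """Evaluate (2) for QPSK (2 bits per tone); total_tones plays the role of MN."""
    delta_f = bandwidth_hz / total_tones             # subchannel spacing
    symbol_time = 1.0 / delta_f + guard_interval_s   # extended symbol period
    return 2.0 * r_pap * r_code * (total_tones - guard_tones) / symbol_time

# Example 1: 36 tones (9 per cluster), PAP code maps 8 tones into 9.
example_1 = bit_rate(36, 2, 11e6, 300e-9, r_pap=8 / 9)
# Example 2: 28 tones (7 per cluster), PAP code maps 6 tones into 7.
example_2 = bit_rate(28, 2, 11e6, 300e-9, r_pap=6 / 7)

print(f"Example 1: {example_1 / 1e6:.1f} Mb/s x Rcode")  # 16.9 Mb/s x Rcode
print(f"Example 2: {example_2 / 1e6:.1f} Mb/s x Rcode")  # 15.7 Mb/s x Rcode
```

Dividing either result by the 11 MHz bandwidth gives roughly 1.4 to 1.5 bits/sec/Hz, in line with the efficiency range quoted above.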

3. Clustered-OFDM Transmitter Implementation The Transmitter Architecture Figure 6 shows the block diagram of a single transmitter cluster. Throughout the design a conscious effort was made to minimize the hardware complexity and arrive at a board architecture



Figure 6. A single cluster of the transmitter.

that would result in a simple implementation. The first generation prototype reported in this paper was implemented on a wire-wrapped board with a maximum clock rate of 10 MHz. The board required three different clock signals which were related to one another through the use of the PLL/clock generation circuit shown in Figure 7.

Figure 7, Clock generation board.

Having demonstrated the feasibility of the scheme using the wire-wrapped prototype, higher speed versions of the transmitter can be realized by simply migrating to a printed circuit board implementation. It should be noted that the components used were chosen to support data rates of up to 22 Mbps. In addition to the four identical cluster boards, a fifth board was also used to generate the clock and control circuitry. With such a modular approach, the user has the freedom to vary the number of clusters at will. The clock and control signals, along with power and ground lines, can be shared among the boards via a backplane connection, and any number of transmit cluster boards can be simply plugged into the system.



Datapath Architecture The datapath for a single cluster transmitter is shown in Figure 6. A serial bit stream at 1.875 Mbps enters each of the four clusters (overall transmitted bit rate = 4 clusters x 1.875 Mbps/cluster = 7.5 Mbps), where a serial-to-parallel conversion produces a 12-bit word at 156.25 kHz. In a conventional OFDM system, this word would have been used to modulate 6 complex tones; in our system, however, the 6 tones need to be coded (mapped) into 7 for the purposes of PAP reduction [14]. A nonlinear code was used for this purpose which guarantees the PAP ratio of the 7 tones to be no more than 3.2 dB. Given the nonlinear mapping involved in this procedure, a 4Kx14 ROM-based table lookup is the best means for its implementation (Figure 6). The 14-bit PAP-ROM output word represents the encoded complex symbols of a QPSK constellation, which modulate a cluster of 7 complex tones. In OFDM, modulation onto the tones (subchannels) is performed by way of a Discrete Fourier Transform (DFT). Given our desire to transmit 4 clusters of 7 tones each, the modulator on each section must realize the following equation:

$$Y_m(k) = \sum_{n=0}^{6} \left( x_{2n} + j x_{2n+1} \right) e^{-j \frac{2\pi k (n + 7m)}{28} \cdot \frac{1}{2}}, \qquad k = 0, \ldots, 55, \quad m = 0, 1, 2, 3 \tag{3}$$

where m is the cluster number, Ym(k) is the output sequence which is fed to the D/A for transmission and xn represents the nth bit of the 14-bit word appearing at the output of the PAP ROM. Note that an arbitrary decision was made to assign the even bits to the real part of the symbol and the odd bits to the imaginary part. A closer look at (3) reveals that the output sequence consists of 56 complex samples, twice what is required for a typical 28-point DFT. This is due to the desire to oversample the DFT output sequence by a factor of two, and explains the 1/2 multiplier introduced into the exponential function in (3). The oversampling guarantees a separation of fs/2 between the baseband signal and the first image of the D/A output signal. The separation results in a significant relaxation of the specification for the image cancelling lowpass filters following the D/ A. An OFDM transmit block typically consists of the three components shown in Figure 8 (see [12], [18] for example). The original K-point block (K=2N=56 in our implementation), a cyclic prefix (extension) and possibly a guard interval. In our realization, a total of 8 samples were allowed for the combination of the cyclic prefix and the guard interval. As will become evident shortly, the particular implementation outlined here allows the user complete freedom as to the contents of these 8 samples. Consequently for every 14-bit word that appears at the output of the PAP ROMs (Figure 6), 64 samples need to be read from the DFT ROMs and presented to the D/ A. These 64 samples constitute a complete OFDM symbol (block). In an effort to simplify the hardware it was decided to combine the cyclic prefixing, windowing, and the DFT operation into a ROM lookup table. This avoids the use of elaborate and costly signal processing ICs [19] and also provides the user with a flexible mechanism in which the relative size of the cyclic prefix and guard intervals of the OFDM symbol can be varied. 
The implementation also enables the user to implement any windowing function on the OFDM symbol.
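To make the structure of an OFDM symbol concrete, the following sketch evaluates (3) directly and then extends the 56-sample block to 64 samples. It is an illustration rather than the contents of the prototype's ROMs: the mapping of bits to signal levels (0 to +1, 1 to -1) is an assumption, and the 8 extension samples are filled with a plain cyclic prefix, which is only one of the choices the text says the user is free to make. No windowing is applied.

```python
import cmath
import math

TONES_PER_CLUSTER = 7
BLOCK = 56       # K = 2N samples in the unextended block
EXTENSION = 8    # samples shared by cyclic prefix and guard interval

def to_level(bit):
    """Assumed bit-to-level mapping: 0 -> +1, 1 -> -1."""
    return 1 - 2 * bit

def qpsk_symbols(bits):
    """Even bits form the real part, odd bits the imaginary part, as in (3)."""
    return [complex(to_level(bits[2 * n]), to_level(bits[2 * n + 1]))
            for n in range(TONES_PER_CLUSTER)]

def cluster_block(bits, m):
    """Evaluate (3): 56 complex samples for cluster m from a 14-bit word."""
    symbols = qpsk_symbols(bits)
    return [sum(s * cmath.exp(-2j * math.pi * k * (n + 7 * m) / BLOCK)
                for n, s in enumerate(symbols))
            for k in range(BLOCK)]

def ofdm_symbol(bits, m):
    """Prefix the block with its last 8 samples, giving 64 samples per symbol."""
    block = cluster_block(bits, m)
    return block[-EXTENSION:] + block

if __name__ == "__main__":
    word = [0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # arbitrary 14-bit example
    samples = ofdm_symbol(word, m=2)
    print(len(samples), "samples per OFDM symbol")
```

Because the tone index n + 7m runs over 28 positions while the block holds 56 samples, the output is oversampled by two, matching the 1/2 factor in the exponent of (3).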




Figure 8, A transmitted OFDM symbol (block). The unextended data block spans -K/2 to K/2 on the time axis and is flanked on each side by a cyclic prefix and a guard interval.

With such an implementation, the DFT ROM must have a total of 20 address bits: 14 for the PAP ROM output and 6 for the modulo-64 counter that reads off the 64 samples of the OFDM symbol. This results in a total ROM address space of one million words (2^20) and a ROM access speed equal to the D/A rate of 10 MHz in our implementation. Such high-end memory modules tend to be fairly expensive, especially if the same chip is to be used in the PCB-based version of the prototype, which might utilize D/A rates up to 30 MHz. The issue was resolved by splitting the DFT lookup table between two separate ROMs, each modulating 3.5 complex tones, as given by (4) and (5). The outputs of these ROMs were then added together in a programmable logic device (PLD) to realize the desired total of 7 tones. This partitioning of the DFT task allowed us to replace the 1 MByte ROM with a pair of 8 KByte ROMs. Furthermore, it was desired to allow each transmitter cluster board to be capable of transmitting any one of the four clusters. Consequently, the two 8 KByte DFT ROMs were replaced by two 32 KByte ROMs having two additional address lines to select one of the four different clusters.

$$Y_m^{(I)}(k) = x_6 \, e^{-j \frac{2\pi k (3 + 7m)}{56}} + \sum_{n=0}^{2} \left( x_{2n} + j x_{2n+1} \right) e^{-j \frac{2\pi k (n + 7m)}{56}}, \qquad k = 0, \ldots, 55, \quad m = 0, 1, 2, 3 \tag{4}$$

$$Y_m^{(II)}(k) = j x_7 \, e^{-j \frac{2\pi k (3 + 7m)}{56}} + \sum_{n=4}^{6} \left( x_{2n} + j x_{2n+1} \right) e^{-j \frac{2\pi k (n + 7m)}{56}}, \qquad k = 0, \ldots, 55, \quad m = 0, 1, 2, 3 \tag{5}$$

Here the superscripts (I) and (II) denote the outputs of the two DFT ROMs; their sum is the sequence Y_m(k) of (3).

The final issue to be addressed in the design of the DFT ROMs was the word size for the DFT samples. A preliminary simulation was carried out to measure the amount of in-band interference caused by quantizing the DFT samples. The study showed that with 8 bits of quantization, the inband interference was below -45 dB; using 6 bits the interference level was below -30 dB. Based on these results an 8-bit representation was used for the DFT samples stored in each of the two DFT ROMs.
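The ROM partitioning can be checked with a short sketch that builds two 7-bit-addressed tables, adds their outputs as the PLD does, and compares the sum with a direct evaluation of the full 7-tone DFT. It is a simplified model under several assumptions: bit i of the 14-bit word is taken to be x_i, bits map to levels as 0 to +1 and 1 to -1, only the 56 unextended samples are tabulated (the prototype ROMs hold 64, including the extension), and the samples are kept in floating point rather than quantized to 8 bits.

```python
import cmath
import math

BLOCK = 56

def tone(k, n, m):
    """Sample k of tone n in cluster m, using the exponent of the DFT equations."""
    return cmath.exp(-2j * math.pi * k * (n + 7 * m) / BLOCK)

def level(bit):
    return 1 - 2 * bit  # assumed mapping: 0 -> +1, 1 -> -1

def build_rom(part, m):
    """Table indexed as rom[7-bit address][k]; part 1 holds x0..x6, part 2 holds x7..x13."""
    rom = []
    for address in range(128):
        b = [(address >> i) & 1 for i in range(7)]  # the ROM's local address bits
        row = []
        for k in range(BLOCK):
            if part == 1:   # tones 0-2 plus the real half of tone 3 (bit x6)
                y = level(b[6]) * tone(k, 3, m)
                y += sum(complex(level(b[2 * n]), level(b[2 * n + 1])) * tone(k, n, m)
                         for n in range(3))
            else:           # imaginary half of tone 3 (bit x7) plus tones 4-6
                y = 1j * level(b[0]) * tone(k, 3, m)
                y += sum(complex(level(b[2 * n - 7]), level(b[2 * n - 6])) * tone(k, n, m)
                         for n in range(4, 7))
            row.append(y)
        rom.append(row)
    return rom

def pld_sum(word14, rom1, rom2):
    """Add the two ROM outputs sample by sample, as the PLD adder does."""
    low, high = word14 & 0x7F, (word14 >> 7) & 0x7F
    return [a + b for a, b in zip(rom1[low], rom2[high])]

def direct(word14, m):
    """Reference: all 7 tones evaluated in one pass."""
    x = [(word14 >> i) & 1 for i in range(14)]
    return [sum(complex(level(x[2 * n]), level(x[2 * n + 1])) * tone(k, n, m)
                for n in range(7))
            for k in range(BLOCK)]

if __name__ == "__main__":
    m = 1
    rom1, rom2 = build_rom(1, m), build_rom(2, m)
    err = max(abs(a - b) for a, b in zip(pld_sum(0x2A5C, rom1, rom2), direct(0x2A5C, m)))
    print(f"max difference between split and direct evaluation: {err:.1e}")
    print(2 ** 20, "addresses for one ROM versus", 2 ** 13, "for each of the two ROMs")
```

The address arithmetic in the last line mirrors the text: 14 + 6 address bits for a single table, against 7 + 6 bits for each half, or 15 bits once the two cluster-select lines are added.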




The only other block in Figure 6 which has not yet been described is the synch-word ROM. This ROM contains a synchronization word that must be sent at the beginning of each packet (a packet consists of many OFDM symbols). The synch word is used at the receiver to identify the start of the incoming OFDM symbols. In our implementation, it is stored in a separate ROM and its samples are sent to the D/A by way of the multiplexer built into the PLD (Figure 6). Clock Generation The operation of the datapaths described in the previous subsection is governed by three clocks and a control signal. We will start by describing the relationship of the clocks and continue with the control signal needed to synchronize the clk_DAC_by64 clock with the DFT address generation counter. From Figure 6, it is observed that the serial data coming onto the board is clocked at a 1.875 MHz rate (clk_ref). clk_ref must then be divided by 12 to provide the clock signal for the serial-to-parallel converter as well as the PAP ROM; this signal is referred to as clk_ref_by12 (156.25 kHz). For each 14-bit word produced by the PAP ROM, 64 samples have to be read from the DFT ROMs and processed through to the D/A, which is clocked at 10 MHz (64 x 156.25 kHz) by the signal clk_DAC. The clock generation circuit is shown in Figure 7. A VCXO centered at 10 MHz is used to generate clk_DAC; this signal is then divided in frequency by 64 and phase locked to clk_ref_by12. A simple exclusive-OR gate is used as a phase detector. The phase error signal is then filtered by an active loop filter with transfer function H(s) = (R_2 C s + 1)/(R_1 C s) before being fed back to the VCXO. In addition to synchronizing the frequency of the three system clocks, it is also important to synchronize the 6-bit counter used to generate clk_DAC_by64 with the 6-bit counter that generates the 6 LSBs of the DFT addresses. To appreciate this, consider what happens to the system upon start-up.
The DFT address counters are started immediately, whereas the clk_DAC_by64 signal undergoes some frequency fluctuations until the PLL is locked (Figure 7). As such, we have no guarantee of the relative position of the two counters in steady state. To resolve this issue, the rst_fft_cntr signal was generated by sensing a count of 62 and delaying the signal by one clk_DAC period. This signal was then sent to the DFT address counter, which undergoes a synchronous reset at the next rising edge of clk_DAC. A complete clustered-OFDM transmitter would require four transmitter boards and a single clock generation board. It should be noted that the clustering approach applies only to the transmitter; the receiver must implement a complete N-point DFT to recover the data [19]. However, with the exception of this single processing element, the remainder of the receiver can be realized using inexpensive, off-the-shelf ROMs and PLDs in the same manner as the transmitter. 3.1 Issues for ASIC Implementation The ROM-based approach described in the previous section is ideal for rapid and flexible board-level prototyping; however, it does not lend itself well to application specific integrated circuit (ASIC) implementation. In this section we describe potential architectural trade-offs which may facilitate an ASIC realization of the clustered-OFDM transmitter. Moreover, an ASIC implementation would combine all clusters onto a single IC, in which case some of the circuitry may be multiplexed between the different clusters.
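As a brief aside on the clock generation scheme of the prototype described earlier in this section, the clock ratios and the counter resynchronization can be modelled in a few lines. This is a cycle-level sketch under one plausible reading of the text: the count of 62 is decoded from the divide-by-64 counter, registered for one clk_DAC period, and then applied as a synchronous reset to the DFT address counter. The PLL dynamics are not modelled, and the function name is illustrative.

```python
CLK_REF_HZ = 1.875e6                   # serial data clock (clk_ref)
CLK_REF_BY12_HZ = CLK_REF_HZ / 12      # word clock for S/P and PAP ROM (clk_ref_by12)
CLK_DAC_HZ = 64 * CLK_REF_BY12_HZ      # D/A and DFT ROM clock (clk_DAC)

def run_counters(start_divider, start_address, cycles):
    """Step both 6-bit counters for `cycles` rising edges of clk_DAC.

    divider: counter that produces clk_DAC_by64
    address: counter that supplies the 6 LSBs of the DFT ROM address
    """
    divider, address = start_divider, start_address
    rst_fft_cntr = False                 # registered reset, one period after count 62
    for _ in range(cycles):
        detect_62 = divider == 62        # combinational decode of the divider count
        next_address = 0 if rst_fft_cntr else (address + 1) % 64
        divider = (divider + 1) % 64
        address = next_address
        rst_fft_cntr = detect_62         # delay the decode by one clk_DAC period
    return divider, address

if __name__ == "__main__":
    print(f"clk_ref_by12 = {CLK_REF_BY12_HZ / 1e3} kHz, clk_DAC = {CLK_DAC_HZ / 1e6} MHz")
    print(run_counters(start_divider=17, start_address=40, cycles=128))
```

In this model the reset is active while the divider shows 63, so the address counter returns to zero on the same edge on which the divider wraps to zero, and the two counters stay aligned from then on regardless of their starting values.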




Coding for PAP Reduction

Due to its nonlinear nature, the PAP coding needs to be implemented via table lookup. Although the size of the table increases with the number of tones in each cluster, it should be noted that the speed with which these tables are accessed is equal to the frequency with which the OFDM blocks are generated. As an example, in the implementation reported here, this rate is 156.25 kHz (see Figure 6). At such rates, compact single-transistor ROM cells can be used to minimize the silicon area consumed by the table. The need for multiple ROM tables, one per cluster, is also eliminated in an ASIC implementation, as the same table could be shared among all of the clusters. Finally, it is worth noting that there are research efforts under way [14] to find a systematic approach to the problem of coding for PAP reduction. This would considerably reduce the hardware complexity of future ASICs by allowing the use of datapath architectures rather than table lookup solutions.

DFT Architecture

In a board-level implementation, the realization of the DFT as a table lookup was attractive, since it resulted in simple hardware and simple components. Mapping the same architecture to an ASIC, however, is not judicious. In general, as long as the number of tones is a power of 2, an FFT based on a 2-point butterfly is the most efficient implementation. In cases where the number of tones in each cluster is not a power of 2, an alternative solution is needed. A possible approach would be to store only K samples of each of the sinusoids sin(2πl∆fkT) and cos(2πl∆fkT), for l = 0,...,L-1 and k = 0,...,K-1, where K is the total number of samples per OFDM block (64 in the prototype) and L is the number of tones per cluster. Noting that each tone is modulated using a QPSK constellation, all that is necessary is to multiply the samples of the sinusoids by ±1 and accumulate them in the appropriate manner.
With this approach the memory requirements for the DFT operation are reduced to KL samples. A simple exclusive-OR operation is needed for two’s complement multiplication by -1 as well as an accumulator to sum up the different frequency components. The memory requirements can be further reduced by realizing that all the tones above the carrier can be generated by simply conjugating the corresponding tone below the carrier frequency. Using the knowledge that the tone frequencies are integer multiples of one another, one can store samples of the lowest frequency tone, and then decimate these by the appropriate factor to get the higher frequencies.
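
The multiply-by-±1 and accumulate scheme, together with the decimation of a single stored tone, can be sketched in a few lines. The code below is a minimal illustration rather than the proposed ASIC datapath. It assumes that tone l lies at l∆f with ∆fT = 1/K, so that tone l can be read from the table of the l = 1 tone at index (l·k) mod K. The function names and the layout of the tables are hypothetical.

```python
import cmath
import math

K = 64  # samples per OFDM block (from the text)
L = 7   # tones per cluster in the prototype

# Assumption: delta_f * T = 1/K, so the stored samples are
# cos(2*pi*l*k/K) and sin(2*pi*l*k/K) for l = 0..L-1, k = 0..K-1.
COS = [[math.cos(2 * math.pi * l * k / K) for k in range(K)] for l in range(L)]
SIN = [[math.sin(2 * math.pi * l * k / K) for k in range(K)] for l in range(L)]

# Decimation variant: keep only the K samples of the l = 1 tone.
BASE_COS = COS[1]
BASE_SIN = SIN[1]


def cluster_block(symbols):
    """Return (I, Q) sample lists for one block using the K*L tables.

    symbols: L pairs (a, b), each entry +1 or -1, for the QPSK point a + jb.
    """
    i_out = [0.0] * K
    q_out = [0.0] * K
    for l, (a, b) in enumerate(symbols):
        for k in range(K):
            c, s = COS[l][k], SIN[l][k]
            # (a + jb)(c + js) = (a*c - b*s) + j(a*s + b*c);
            # a and b only flip signs, so no true multiplier is needed.
            i_out[k] += a * c - b * s
            q_out[k] += a * s + b * c
    return i_out, q_out


def cluster_block_decimated(symbols):
    """Same output, but every tone is read from the single l = 1 table."""
    i_out = [0.0] * K
    q_out = [0.0] * K
    for l, (a, b) in enumerate(symbols):
        for k in range(K):
            idx = (l * k) % K  # take every l-th stored sample, modulo K
            c, s = BASE_COS[idx], BASE_SIN[idx]
            i_out[k] += a * c - b * s
            q_out[k] += a * s + b * c
    return i_out, q_out


def reference_block(symbols):
    """Direct evaluation of the sum of complex exponentials, for checking."""
    return [
        sum(complex(a, b) * cmath.exp(2j * math.pi * l * k / K)
            for l, (a, b) in enumerate(symbols))
        for k in range(K)
    ]
```

Under these assumptions both table-based routines agree with the direct sum to within rounding error, while the decimated version needs only K stored samples each of sine and cosine instead of KL.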

4. Experimental Results

A transmitter board as described in this paper was constructed for the transmission of clustered-OFDM signals, using four clusters of seven tones each. The output of the board was coupled to an I/Q mixer stage and up-converted to 1.925 GHz. Random data was then passed through the system. The cluster-select bits (Figure 6) were manipulated to provide the four clusters shown in Figure 9. A close inspection of the spectrum-analyzer plots shows the seven tones, which appear as seven small bumps in the 1.25 MHz bandwidth occupied by each cluster. The figures also reveal the image rejection capability of the transmitter to be better than 20 dB. The rejection could have been improved had we been able to adjust the relative time delays of the I and Q signal paths between the D/A converter and the mixer.




Figure 9, Spectrum analyzer output showing the four possible transmit clusters.




The operation of the nonlinear coding approach for PAP reduction was also verified by measuring the PAP ratio of the transmitted waveform. Prior to a discussion of these measurement results, however, a distinction must be made between the PAP ratio and the crest factor (the crest factor is what is actually measured). The term PAP ratio, used throughout this paper, is defined as the peak-to-average power ratio of the complex envelope of a narrowband, passband modulated signal. We define the term “crest factor” as the ratio of the peak amplitude to the rms value of the passband signal. The justification for introducing the crest factor is that it is easily measured in the laboratory. The square of the crest factor is related to the PAP ratio through a multiplicative factor of two. In other words, 10log(PAP) = 20log(crest factor) - 3 dB [20]. The PAP ratio of the four clusters was determined by measuring the crest factor of the RF signal and normalizing out the effects of the cyclic-prefix interval, shaping, and the guard interval by introducing a factor of 10log(0.22). Using this approach, the measured PAP ratio was 20log(5.25/2.4) - 3 dB - 0.66 dB = 3.14 dB, which compares well with the theoretically predicted 3.2 dB.
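
The conversion from the measured crest factor to the PAP ratio can be checked numerically. The short sketch below only reproduces the arithmetic quoted above; the function name `pap_db_from_crest` is hypothetical, and the 3 dB and 0.66 dB terms are taken from the text as given.

```python
import math


def pap_db_from_crest(peak, rms, correction_db=0.0):
    """PAP ratio in dB from a passband peak/rms measurement.

    Uses 10log(PAP) = 20log(crest factor) - 3 dB, then subtracts an
    additional correction (in dB) for prefix, shaping, and guard effects.
    """
    crest_db = 20.0 * math.log10(peak / rms)
    return crest_db - 3.0 - correction_db


# Peak and rms readings quoted in the text, with the 0.66 dB correction.
measured_pap_db = pap_db_from_crest(5.25, 2.4, correction_db=0.66)

if __name__ == "__main__":
    print(f"measured PAP ratio: {measured_pap_db:.2f} dB")
```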

5. Conclusion

In this paper, we have described a technique referred to as clustered-OFDM which can overcome several of the practical difficulties encountered with either a conventional multicarrier approach or with a single-carrier system with equalization. This technique has several inherent properties which make it suitable for use in applications such as wireless LAN/ATM, Fixed Wireless Access, third generation cellular and PCS systems. The clustered-OFDM technique reduces the peak-to-average power ratio problem and minimizes the receiver training required. In addition, the use of coding across frequencies exploits some of the potential of diversity without requiring multiple receivers. Performance can be further improved through the use of cluster switching and nonlinear PAP reduction.

In addition to being a suitable modulation scheme for high-speed indoor wireless propagation, clustered-OFDM is also “implementation-friendly”. This fact has been demonstrated by the prototype transmitter reported in this paper. A fairly simple wire-wrapped prototype was built and demonstrated using off-the-shelf memories and PLDs.
In its present form, the transmitter board is capable of accommodating data rates of up to 7.5 Mbps with a choice of four different clusters. It is expected that the data rates can be tripled when the design is assembled on a printed circuit board. In addition to its ease of implementation, clustered-OFDM effectively addresses the peak-to-average power problem for transmit amplifiers, and it supports diversity at the transmitter.




References

[1] Radio Equipment and Systems (RES), “High Performance Radio Local Area Network (HIPERLAN); Functional Specifications,” European Telecommunications Standards Institute, Draft proposal, Jan. 25, 1995.

[2] B. Daneshrad, L. J. Cimini, Jr., “Equalization Requirements for 30 Mbps Indoor Wireless Data Transmission,” Proc. of VTC ’96, pp. 71-75, Atlanta, GA, April 1996.

[3] A. R. Nix, “HIPERLAN Compatible Modulation and Equalisation Techniques-What are the Real Choices,” ETSI RES-10 standard contribution RES-10/TTG/93/78, Dec. 1993.

[4] E. Khayata, et al., Minutes of HIPERLAN Meeting, Paris, France, Sept. 1994.

[5] L. J. Cimini, Jr., B. Daneshrad, N. Sollenberger, “Clustered OFDM with Transmitter Diversity and Coding,” Proceedings of IEEE Globecom ’96.

[6] H. Rohling, T. May, “Comparison of PSK and DPSK Modulation in a Coded OFDM System,” Proc. VTC ’97, May 1997, pp. 870-874.

[7] R. F. Ormondroyd, J. J. Maxey, “Comparison of Time Guard-Band and Coding Strategies for OFDM Digital Cellular Radio in Multipath Fading,” Proc. VTC ’97, May 1997, pp. 850-854.

[8] R. E. Ziemer and M. A. Wickert, “Link Design for Third-Generation Wireless Systems for Rural Communities Using OFDM and ATM,” Proc. VTC ’97, May 1997, pp. 1649-1653.

[9] B. Stanchev, J. Kuehne, M. Bronzel, G. Fettweis, “An Integrated FSK-Signaling Scheme for OFDM-Based Advanced Cellular Radio,” Proc. VTC ’97, May 1997, pp. 1629-1633.

[10] S. Sakakura, W. Huang, M. Nakagawa, “Pre-Diversity using Coding, Multi-carrier and Multi-antennas,” Proc. IEEE ICUPC ’95, pp. 605-609.

[11] T. Pollet, M. Van Bladel, M. Moeneclaey, “BER Sensitivity of OFDM Systems to Carrier Frequency Offset and Wiener Phase Noise,” IEEE Trans. on Commun., vol. 43, no. 2/3/4, Feb./March/April 1995, pp. 191-193.

[12] John A. C. Bingham, “Multicarrier Modulation for Data Transmission: An Idea Whose Time Has Come,” IEEE Communications Magazine, May 1990, pp. 5-14.

[13] I. Kalent, private communications.

[14] T. A. Wilkinson, A. E. Jones, “Minimisation of the Peak to Mean Envelope Power Ratio of Multicarrier Transmission Schemes by Block Coding,” Proc. of VTC ’95, pp. 825-829.

[15] M. J. E. Golay, “Complementary Series,” IRE Trans. on Info. Theory, vol. IT-7, no. 2, April 1961, pp. 82-87.




[16] J. Rinne and M. Renfors, “The Behavior of Orthogonal Frequency Division Multiplexing Signals in an Amplitude Limiting Channel,” Proc. of VTC ’94, pp. 381-385.

[17] R. O’Neill, L. B. Lopes, “Envelope Variations and Spectral Splatter in Clipped Multicarrier Signals,” Proc. of PIMRC ’95, pp. 71-75.

[18] A. Peled, A. Ruiz, “Frequency Domain Data Transmission Using Reduced Computational Complexity Algorithms,” Proc. IEEE ICASSP ’80, pp. 964-967, April 1980.

[19] Digital Signal Processing IC Handbook, GEC Plessey Semiconductors PDSP16510.

[20] D. R. Gimlin, C. R. Patisaul, “On Minimizing the Peak-to-Average Power Ratio for the Sum of N Sinusoids,” IEEE Transactions on Communications, vol. 41, no. 4, p. 632, April 1993.

