SYNTHESIS OF A 12-BIT COMPLEX MIXER FOR FPGA IMPLEMENTATION

Qian Liu, J.M.P. Langlois, D. Al-Khalili
Royal Military College of Canada, Kingston, Ontario

V. Szwarc
Communication Research Center, Ottawa, Ontario

R. Inkol
Defence R & D Canada, Ottawa, Ontario
Abstract

This paper presents an FPGA implementation of a multiplier-based complex mixer for communication systems that require high throughput and architectural scalability. The design consists of a Baugh-Wooley adder-tree complex multiplier and a Direct Digital Frequency Synthesizer (DDFS) based on a linear segment interpolation algorithm. The regular structure of this architecture permits deep pipelining and facilitates scaling to meet a given system specification.
Keywords: Complex mixer; complex multiplier; direct digital frequency synthesizer; FPGA implementation

1. INTRODUCTION

The implementation of quadrature receiver systems based on analog techniques is complicated by gain and phase imbalances resulting from component mismatches. Digital techniques do not suffer from these limitations, but depend on ADC performance. Consequently, progress in ADC technology has stimulated interest in quadrature receiver designs based on digital techniques. For a typical digital quadrature receiver, the front-end ADC generates a wideband stream of sampled and digitized IF signal data at rates on the order of 50-100 Mbits/sec and higher. A Digital Down Converter (DDC) processes the IF signal data to extract the desired channel, and reduces the sampling rate to economize on the computational cost of subsequent signal processing. The need to process the stream of digitized IF signal data in real time, coupled with conflicting demands for low power consumption and/or minimum hardware cost, complicates the task of designing the major DDC blocks such as the complex mixer.

This paper describes the architecture of a 12-bit multiplier-based complex mixer targeting Xilinx VirtexE FPGAs. The system consists of a complex multiplier and a DDFS. The complex multiplier employs the Baugh-Wooley algorithm [1] and a parallel adder tree in order to achieve a high throughput. The DDFS employs a novel linear segment interpolation architecture [2] to minimize chip area and maximize Spurious Free Dynamic Range (SFDR). The complex mixer is pipelined to operate at a clock rate of 143 MHz.
CCECE2003 – CCGEI 2003, Montréal, May/mai 2003 0-7803-7781-8/03/$17.00 2003 IEEE
2. SYSTEM DESCRIPTION

A block diagram of a digital quadrature receiver is shown in Figure 1. The RF signal from the antenna first passes through a bandpass filter (BPF) and is amplified by an analog amplifier. Frequency conversion to a suitable IF is then performed by multiplying the amplified signal with a fixed-frequency Local Oscillator (LO) signal and suppressing undesired mixing products with a bandpass filter. After conversion to the digital domain by the ADC, the 12-bit samples are mixed to baseband in the complex mixer block by multiplication with the 12-bit complex sinusoids from the DDFS. The sampling rate of the quadrature output from the complex mixer is then reduced by decimation filters for further processing by the DSP.
Figure 1. Digital quadrature receiver (block labels: BPF, RF Amp, local oscillator, IF Amp, LPF, ADC, Complex Mixer containing Complex Multiplier and DDFS, Decimation Filter, DSP)
3. COMPLEX MIXER ARCHITECTURE

In the digital receiver, the wideband digitized IF signal contains the desired signal channel at carrier frequency $\omega_c = 2\pi f_c$, and can be expressed as:

$$x(t) = X_i(t) + jY_i(t) = g(t)\,e^{j\omega_c t} \qquad (1)$$

where $X_i(t)$ and $Y_i(t)$ are the real and imaginary components of the bandpass signal, and $g(t)$ is the complex baseband signal. The baseband information is extracted from the wideband IF signal by multiplication with a complex sinusoid at the frequency $\omega_c = 2\pi f_c$:

$$x(t)\,e^{-j\omega_c t} = x(t)\,[\cos(\omega_c t) - j\sin(\omega_c t)] = g(t) = X_o(t) + jY_o(t) \qquad (2)$$

$$X_o(t) = X_i(t)\cos(\omega_c t) + Y_i(t)\sin(\omega_c t) \qquad (3)$$

$$Y_o(t) = Y_i(t)\cos(\omega_c t) - X_i(t)\sin(\omega_c t) \qquad (4)$$

where $X_o(t)$ and $Y_o(t)$ are the baseband in-phase and quadrature components.

Figure 2 shows the architecture of a conventional complex mixer. The DDFS generates a complex sinusoidal signal at a frequency determined by the Frequency Control Word (FCW) and its clock frequency. The complex multiplier multiplies the quadrature bandpass signals by this complex sinusoid to generate the in-phase and quadrature baseband signals of the desired channel.
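As a minimal sketch of the down-conversion in Eq. (2) in discrete time, the following Python fragment drives a phase accumulator with a frequency control word and mixes a complex test tone to baseband. The DDFS is modelled as producing cos and −sin, as labelled in Figure 2. The 32-bit accumulator width is taken from the DDFS description in this paper; the function names, the floating-point `math.cos`/`math.sin` stand-in for the amplitude converter, and the test values are assumptions for illustration and do not model the 12-bit fixed-point hardware.

```python
import cmath
import math

ACC_BITS = 32  # phase accumulator / FCW width (32 bits, as stated for the DDFS)


def fcw_for(f_out_hz, f_clk_hz):
    """Nearest frequency control word for the requested output frequency."""
    return round(f_out_hz * (1 << ACC_BITS) / f_clk_hz) % (1 << ACC_BITS)


def ddfs(fcw, n_samples):
    """Yield (cos, -sin) pairs; math.cos/math.sin stand in for the real PSAC."""
    phase = 0
    for _ in range(n_samples):
        theta = 2.0 * math.pi * phase / (1 << ACC_BITS)
        yield math.cos(theta), -math.sin(theta)
        phase = (phase + fcw) % (1 << ACC_BITS)  # the accumulator wraps around


def mix_down(samples, fcw):
    """Multiply complex samples (xi + j*yi) by cos - j*sin, sample by sample."""
    out = []
    for (xi, yi), (c, s) in zip(samples, ddfs(fcw, len(samples))):
        xo = xi * c - yi * s  # s already holds -sin
        yo = xi * s + yi * c
        out.append((xo, yo))
    return out


# Hypothetical test: a 25 MHz complex tone sampled at 100 MHz carrying the
# constant baseband value g; mixing should return g at every sample.
F_CLK = 100e6
FCW = fcw_for(25e6, F_CLK)
g = complex(0.5, -0.25)
tone = [g * cmath.exp(2j * math.pi * n / 4) for n in range(16)]
baseband = mix_down([(z.real, z.imag) for z in tone], FCW)
resolution_hz = F_CLK / (1 << ACC_BITS)  # smallest frequency step of the FCW
```

In this sketch every element of `baseband` is numerically close to (0.5, −0.25), and `resolution_hz` is the tuning step obtained from a 32-bit FCW at a 100 MHz clock.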
Figure 2. Complex mixer architecture (signal labels: X_I, Y_I, X_o, Y_o, cos(ω_c n), −sin(ω_c n), DDFS, FCW)

4. DDFS ARCHITECTURE

The DDFS generates data sequences corresponding to quadrature sinusoidal digital signals with very high frequency resolution and a tuning latency on the order of a few clock cycles [3]. A DDFS is divided into two blocks: a phase accumulator and a Phase-to-Sinusoid Amplitude Converter (PSAC). The phase accumulator consists of a phase register and an adder. On each clock cycle, a new phase angle is obtained by adding the FCW to the previous register value. The PSAC provides an approximation of the sine and cosine functions at this phase angle.

In the conventional approach, the PSAC is implemented with ROM look-up tables. For a resolution of M phase bits and L amplitude bits, the ROM size is $2^{M-2} \times L$ bits when the octant symmetry of the sine and cosine functions is exploited [3]. If high resolution is needed, this approach is slow and expensive in chip area and power consumption.

A concept for reducing the PSAC complexity and improving performance was proposed in [2]. The idea is to use a piecewise-continuous linear segmentation of the sine and cosine functions over the interval [0, π/4], with multiple linear segments of the form:

$$f(x) = y_k + m_k (x - x_k), \quad x_k \le x < x_{k+1} \qquad (5)$$

The segment slopes $m_k$ are selected such that they can be represented as a sum of a few powers of two, which eliminates the need for multiplication. The segment initial amplitudes $y_k$ are in turn selected to maximize the SFDR.

In the present design, the DDFS has a 32-bit input FCW. With a clock reference of 100 MHz, the corresponding frequency resolution is 100 MHz / 2^32 ≈ 0.0233 Hz, i.e. better than 0.025 Hz. The first octant of the sine and cosine functions is approximated with 32 linear segments, producing a 12-bit quadrature sinusoid with an SFDR of 80 dB. The DDFS is pipelined and can operate at 150 MHz in the selected FPGA. Because the DDFS architecture used is not scalable, different SFDR levels can only be achieved with different DDFS designs.

5. COMPLEX MULTIPLIER ARCHITECTURE
A complex multiplication consists of four parallel multiplications, one addition and one subtraction. If implemented directly, the architecture requires complex and irregular interconnects between components, resulting in large routing delays and, consequently, degraded speed. It has been observed that a complex multiplier with greatly reduced interconnects can be constructed when two multipliers and one adder are integrated into a single block [4]. An extra signal controls whether the two products are added or subtracted.
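The two-block organization can be sketched behaviourally as follows. `mult_block` and `complex_multiply` are hypothetical names; the operand mapping follows the functions A * B − C * D and A * D + B * C that the paper assigns to the two blocks, with the IF data on inputs a and c and the sinusoids on inputs b and d. Plain Python integers stand in for the 12-bit operands, so neither pipelining nor the internal partial-product structure is modelled.

```python
def mult_block(a, b, c, d, sub):
    """One multiplier block: a*b - c*d when sub is true, otherwise a*b + c*d."""
    return a * b - c * d if sub else a * b + c * d


def complex_multiply(xi, yi, cos_w, neg_sin_w):
    """(xi + j*yi) * (cos_w + j*neg_sin_w) computed with two multiplier blocks."""
    # With A = xi, B = cos_w, C = yi, D = neg_sin_w:
    real = mult_block(xi, cos_w, yi, neg_sin_w, sub=True)    # A*B - C*D
    imag = mult_block(xi, neg_sin_w, yi, cos_w, sub=False)   # A*D + B*C
    return real, imag


def fits_signed(value, bits):
    """True if value is representable in a signed field of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


# Illustrative operands (assumed values, not taken from the paper)
real, imag = complex_multiply(100, -200, 1500, -700)

# Bounding case: 12-bit 2's complement data (down to -2048) against
# sign-and-magnitude sinusoid samples (at most 2047 in magnitude)
worst = mult_block(-2048, 2047, -2048, -2047, sub=True)
```

Each output is produced by a single block, so only the SUB control and the operand routing differ between the in-phase and quadrature paths. The bounding case stays within a 24-bit signed range, which agrees with the 24-bit output width quoted for the block.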
5.1 Multiplier Block Architecture
A block diagram of the multiplier block is shown in Figure 3. The block computes A * B ± C * D. The inputs ‘a’ and ‘c’ are 12-bit 2’s complement IF data from the ADC, while ‘b’ and ‘d’ are the 12-bit sign-and-magnitude (SM) quadrature sinusoids from the DDFS. The block generates a 24-bit 2’s complement output. The external signal SUB determines whether the two products are added or subtracted. The two Partial Product Generation (PPG) blocks generate 14 partial products in total, which require four accumulation stages in the Partial Product Accumulation (PPA) block. At each stage, the partial products are added in pairs concurrently, roughly halving their number. Two such blocks are needed to generate the quadrature output: one computes A * B − C * D and the other computes A * D + B * C.
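The stage count quoted for the PPA block can be reproduced with a small pairwise-reduction sketch. The function names are hypothetical, plain integers stand in for the partial products, and forwarding an unpaired operand unchanged to the next stage is an assumption about how odd counts are handled.

```python
def adder_tree(operands):
    """Sum operands by adding adjacent pairs at each stage.

    Returns (total, number_of_stages).
    """
    level = list(operands)
    if not level:
        raise ValueError("at least one operand is required")
    stages = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired operand passes through unchanged
            nxt.append(level[-1])
        level = nxt
        stages += 1
    return level[0], stages


def stage_sizes(n):
    """Number of operands remaining after each pairwise stage."""
    sizes = [n]
    while n > 1:
        n = (n + 1) // 2
        sizes.append(n)
    return sizes


total, stages = adder_tree(range(1, 15))  # 14 dummy partial products
sizes = stage_sizes(14)
```

For 14 operands the sizes run 14, 7, 4, 2, 1, which gives the four accumulation stages stated for the PPA block.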
Figure 3. Multiplication block in complex multiplier (block labels: inputs a, b, c, d and sub; PPG 1 and PPG 2; PPA stages 1 to 4; output)

5.2 Partial Product Generation Architecture

The Baugh-Wooley algorithm [1] was originally proposed for direct 2’s complement array multiplication. The advantage of the algorithm is that all partial products are positive and the partial product matrix has a uniform structure, which makes the complex multiplier scalable for FPGA implementation. In the Baugh-Wooley algorithm, the multiplication is decomposed into four segments, i.e., one unsigned multiplication, two negative components, and one sign bit. The following equation gives an example of a 12-bit multiplication:

$$\begin{aligned}
(P)_{10} &= -p_{23}2^{23} + \sum_{i=0}^{22} p_i 2^i \\
&= (A)_{10} \cdot (B)_{10} = \Big(-a_{11}2^{11} + \sum_{i=0}^{10} a_i 2^i\Big)\Big(-b_{11}2^{11} + \sum_{i=0}^{10} b_i 2^i\Big) \\
&= a_{11}b_{11}2^{22} + \sum_{i=0}^{10}\sum_{j=0}^{10} a_i b_j 2^{i+j} - \sum_{i=0}^{10} a_i b_{11} 2^{i+11} - \sum_{i=0}^{10} a_{11} b_i 2^{i+11}
\end{aligned} \qquad (6)$$

After rearranging and combining the summations, seven partial products are generated. Among them, P0 to P4 are generated from the unsigned multiplier block, $\sum_{i=0}^{10}\sum_{j=0}^{10} a_i b_j 2^{i+j}$, and are given as 11 x 2 multipliers:

$$P0 = \sum_{i=0}^{10} a_i 2^i \cdot (b_1 2^1 + b_0 2^0)$$

$$P1 = \sum_{i=0}^{10} a_i 2^i \cdot (b_3 2^3 + b_2 2^2)$$

$$\vdots$$

$$P4 = \sum_{i=0}^{10} a_i 2^i \cdot (b_9 2^9 + b_8 2^8)$$

P5 and P6 are the products of two additions from the Negative Component Addition (NCA) and Adder blocks. The block diagram in Figure 4 shows the three parallel components at the PPG stage.

Figure 4. Decomposition partial product generator (block labels: Mult, NCA, Adder, PPA; inputs A, B and B(11))

5.3 Number Conversion Circuits

The SM output from the DDFS is converted to 2’s complement format in order to take advantage of the 2’s complement multiplication algorithm. If the SM number is positive, the MSB is 0 and the word is identical to the corresponding positive 2’s complement number. If the SM number is negative, the MSB is 1 and the conversion to 2’s complement is performed by inverting all bits except the sign bit, and then adding a ‘1’ to the result. The add/subtract control signal SUB is XORed with the sign bit of the SM input to realize the negation. The addition of ‘1’ for the conversion of a negative SM number generates one extra partial product, which requires sign extension for correct partial product accumulation.
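The conversion rule for sign-and-magnitude words and the four-term 12-bit Baugh-Wooley decomposition can both be checked numerically with a short sketch. All names below are hypothetical, and the `negate` argument is an assumed way of modelling the XOR of SUB with the sign bit. The code evaluates the four terms of the decomposition directly with Python integers; it does not model the seven partial products P0 to P6 or the extra partial product created by the '+1'.

```python
N = 12                      # operand width in bits
SIGN_BIT = 1 << (N - 1)
MAG_MASK = SIGN_BIT - 1     # the 11 magnitude bits
WORD_MASK = (1 << N) - 1


def sm_to_twos(word, negate=False):
    """Convert a 12-bit sign-and-magnitude word to a 12-bit 2's complement word.

    negate=True models XORing the SUB control with the sign bit.
    """
    sign = (word >> (N - 1)) & 1
    mag = word & MAG_MASK
    if sign ^ int(negate):
        inverted = SIGN_BIT | (~mag & MAG_MASK)  # invert all bits except the sign
        return (inverted + 1) & WORD_MASK        # then add 1
    return mag


def to_signed(word):
    """Interpret a 12-bit 2's complement word as a Python integer."""
    return word - (1 << N) if word & SIGN_BIT else word


def baugh_wooley_terms(a_word, b_word):
    """Evaluate the four terms of the 12-bit decomposition for two words."""
    a = [(a_word >> i) & 1 for i in range(N)]
    b = [(b_word >> i) & 1 for i in range(N)]
    sign_term = (a[11] * b[11]) << 22
    unsigned_term = sum((a[i] * b[j]) << (i + j)
                        for i in range(11) for j in range(11))
    neg_a = sum((a[i] * b[11]) << (i + 11) for i in range(11))
    neg_b = sum((a[11] * b[i]) << (i + 11) for i in range(11))
    return sign_term + unsigned_term - neg_a - neg_b


# Illustrative values: the SM word 0x805 encodes -5
sinusoid = to_signed(sm_to_twos(0x805))
product = baugh_wooley_terms(-300 & WORD_MASK, sm_to_twos(0x805))
```

Here `sinusoid` evaluates to −5 and `product` equals (−300) × (−5), so the decomposition agrees with ordinary signed multiplication for these operands.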
5.4 Partial Product Addition Stages
The partial products generated through the Baugh-Wooley algorithm are accumulated to obtain the final output. In an ASIC design, fast multi-operand addition algorithms such as the Wallace tree are the most efficient accumulation methods for reducing the number of partial products to two. However, this approach is not applicable to FPGA designs. A partial product addition design targeting Xilinx FPGAs should use the dedicated carry chains. With dedicated carry chains, a 2-LUT SLICE can implement a 2-bit full adder. The carry propagation delay of this 2-bit full adder is only about 0.1 to 0.2 ns. Without carry chains, two LUTs are needed for each full adder instead of one. It is therefore more advantageous to use the built-in adders throughout the adder tree rather than only at the final addition.

6. SIMULATION RESULTS

Structural VHDL descriptions of the complex multiplier and the DDFS were coded and verified individually. Synopsys FPGA Express was used as the synthesis tool, and the complex mixer was implemented on a Xilinx VirtexE V2000EFG680 device with a speed grade of -8. To achieve the minimum chip area, the design was first implemented without pipeline stages. Simulation results show that this implementation occupies 576 SLICES and can operate at a clock rate of 28 MHz. The clock rate is limited because the total delay is the sum of the DDFS delay and the complex multiplier delay. Pipelining can be used to speed up the whole system at the expense of a larger chip area. For a 9-stage pipelined complex mixer, four pipeline stages are inserted in the DDFS and five in the complex multiplier. The system uses 835 SLICES and can operate at a clock rate of 143 MHz. The results show that the chip area increases by about 45% (from 576 to 835 SLICES), while the speed increases by a factor of about 5. Figure 5 shows a MATLAB simulation of the 24-bit output spectrum, which achieves an SFDR of 66 dB.

7. CONCLUSIONS

A complex mixer was designed for high-speed digital receiver applications. One of its building blocks, the complex multiplier, was designed to be scalable and is pipelined to operate at high throughput rates. An 80-dB SFDR DDFS based on linear interpolation was integrated into the design. The 12-bit complex mixer was implemented on Xilinx VirtexE FPGAs. It can operate at a clock rate of 143 MHz for the selected technology. The system also has fast hopping speed and high spectral purity, which make this design approach attractive for modern digital communication systems.
Acknowledgements The authors wish to thank J.L. Derome at RMC for his support and encouragement. The research was supported in part by grants from Defence R&D Canada-Ottawa.
References

[1] K. Hwang, Computer Arithmetic Principles, Architecture, and Design, John Wiley & Sons Inc., 1979.
[2] J.M.P. Langlois and D. Al-Khalili, “A quadrature direct digital frequency synthesizer architecture using piecewise-continuous linear segments,” Proceedings of the Queen’s Biennial Symposium on Communications, Kingston, Ontario, June 2003, pp. 463-467.
[3] V.F. Kroupa, Ed., Direct Digital Frequency Synthesizers, IEEE Press, 1999.
[4] J.C.H. Latour, “High-Speed Complex Multiplier Integrated Circuit with Emphasis on Low Power Design,” Master’s Thesis, Royal Military College of Canada, Kingston, Ontario, Canada, June 1995.
Figure 5. MATLAB simulation of output samples