

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 55, NO. 6, JULY 2008

A Systolic-Array Architecture for First-Order 3-D IIR Frequency-Planar Filters

H. L. P. Arjuna Madanayake, Student Member, IEEE, and Len T. Bruton, Fellow, IEEE

Abstract—A massively parallel systolic-array architecture is proposed for the real-time VLSI implementation of spatio-temporal 3-D IIR frequency-planar filters at a throughput of one frame per clock cycle (OFPCC). The architecture is based on a differential-form transfer function and has lower circuit complexity than the direct-form architecture. A 3-D look-ahead (LA) form of the transfer function is proposed for maximizing the speed of the implementation, despite the nonseparable 3-D transfer function of the filter. The systolic array enables real-time implementation of 3-D IIR frequency-planar filters at radio-frequency (RF) frame rates and is therefore a suitable building block for 3-D IIR digital filters having beam- and cone-shaped passbands, as required for smart-antenna-array beamforming applications involving the broadband spatio-temporal filtering of plane waves. The fixed-point systolic-array implementation has a throughput of OFPCC, and the tested real-time prototype achieves frame (clock) sample frequencies of up to 90 MHz using one Xilinx Virtex-4 sx35-10ff668 FPGA device.

Index Terms—Digital filter, frequency-planar, radio frequency (RF), sensor, smart antenna arrays, systolic array, wireless.

NOMENCLATURE

Acronyms:
DPP: Distributed parallel processor.
PPCM: Parallel processing core module.
LA: Look ahead.
ZIC: Zero initial condition.
OFPCC: One frame per clock cycle.
OSPCC: One sample per clock cycle.
CPD: Critical path delay.

Quantities:
Continuous-domain output signal.
Point in the 3-D space-time continuum.
Continuous-domain input signal.
Index for a spatio-temporal sample.
Synchronously sampled 3-D input signal.
Synchronously sampled 3-D output signal.
3-D Laplace variables.
3-D spatio-temporal frequency variables.
3-D z-domain variables.
3-D spatio-temporal variables.
3-D input Laplace transform.
3-D output Laplace transform.
3-D input z-transform.
3-D output z-transform.
Subcircuit in a PPCM.
z-transform of the input of a subcircuit.
z-transform of the output of a subcircuit.
Transfer function of a subcircuit.
Internal circuits of a PPCM (indexed 1, 2, 3).
Output of the PPCM at a given location.
Number of sensors in the horizontal direction.
Number of sensors in the vertical direction.
Depth of pipelining in a PPCM.
Inter-sensor spacing.
Inter-frame sample time.
Frame sample frequency.
Clock period (equal to the frame sample period).
Clock frequency for LA of a given order.
CPD of a circuit having LA of a given order.

Manuscript received January 19, 2007; revised July 30, 2007. First published February 2, 2008; last published July 10, 2008 (projected). This paper was recommended by Associate Editor A. Kummert. The authors are with the Department of Electrical and Computer Engineering, University of Calgary, Calgary, AB, Canada, T2N 1N4. Digital Object Identifier 10.1109/TCSI.2008.916612

1549-8328/$25.00 © 2008 IEEE Authorized licensed use limited to: University of Calgary. Downloaded on January 6, 2009 at 11:35 from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

First-order three-dimensional (3-D) frequency-planar digital filters have well-known applications as building blocks for 3-D broadband sensor-array-based beamformers [1], [2] and for video signal processing [3]–[5]. For example, frequency-planar filters are useful building blocks for high-order 3-D beam and 3-D cone filter banks [6]. Although the multidimensional (MD) signal processing theory behind the application of 3-D beam, cone, and frequency-planar filters is well established, real-time hardware circuit architectures suitable for implementing these useful 3-D filters at extremely high frame sample rates (on the order of hundreds of millions of frames per second) are not available. In this paper, we report our progress in MD circuits and systems in the area of massively parallel, speed-maximized, systolic-array-based hardware architectures having low circuit complexity for the real-time implementation of useful 3-D IIR frequency-planar filter building blocks. By employing 3-D differential operators [7]–[9] in the systolic-array design [10]–[20], the proposed architecture and field-programmable gate array (FPGA) circuit prototype achieve up to 57% lower circuit complexity (for the 3-D case) than 2-D/3-D architectures based on direct-form filters. Furthermore, the proposed architecture uses a novel 3-D look-ahead (LA) speed maximization technique that leads to increased real-time throughputs. This work anticipates future very large-scale integration (VLSI) realizations of 3-D IIR filter circuits operating in real time at radio-frequency (RF) frame rates. High-speed real-time 3-D digital filters having beam- and cone-shaped passbands are especially useful in sensor-array beamforming applications, such as filtering the 3-D ultrabroadband spatio-temporal signals received at smart antenna arrays, where digital beam steering may be employed to achieve multiple broadband electronically steerable beams. Such arrays [21]–[31] have applications in radio astronomy (e.g., the Square Kilometre Array project [32]–[35]), radar [29], [36]–[41], communications [42], [43], and navigation [44], [45]. Here, we employ the fact that the 3-D region of support (ROS) of the spectrum of an ideal 3-D broadband plane wave lies on a line through the frequency origin, and that the direction of this line in the 3-D frequency space is equal to the direction of arrival (DOA) of the plane wave in the spatio-temporal domain [6], [46], [47].
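The line-shaped ROS of a plane-wave spectrum can be checked numerically. The following Python sketch is illustrative only: it assumes unit-normalized sensor spacing and frame period, and models a broadband plane wave as an arbitrary waveform that depends on the sample indices only through $n_3 + a n_1 + b n_2$, with integer slopes $a$ and $b$ chosen so that the DFT shows no leakage. The function names are hypothetical.

```python
import numpy as np

def plane_wave(N, a, b, seed=0):
    """Periodic broadband 'plane wave' sampled on an N x N x N grid.

    Every sample depends only on (n3 + a*n1 + b*n2) mod N, so the wave
    propagates along a fixed direction set by the integer slopes a and b.
    """
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(N)  # arbitrary broadband waveform (one period)
    n1, n2, n3 = np.meshgrid(np.arange(N), np.arange(N), np.arange(N), indexing="ij")
    return g[(n3 + a * n1 + b * n2) % N]

def spectral_support(x, rel_tol=1e-9):
    """Indices (k1, k2, k3) of DFT bins that carry non-negligible energy."""
    X = np.abs(np.fft.fftn(x))
    return np.argwhere(X > rel_tol * X.max())

N, a, b = 16, 1, 3
support = spectral_support(plane_wave(N, a, b))
# Every occupied bin should lie on the (wrapped) line k = k3 * (a, b, 1) through the origin.
on_line = all((k1 - a * k3) % N == 0 and (k2 - b * k3) % N == 0 for k1, k2, k3 in support)
print(len(support), on_line)
```

At most one bin per temporal frequency $k_3$ is occupied, and all occupied bins lie on the line through the origin with direction $(a, b, 1)$, the discrete counterpart of the ROS statement above.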
Three-dimensional plane-wave filtering ideally requires that the 3-D beam- or cone-shaped passband of the filter be aligned to encompass this ROS of the spectrum of the plane wave [6], [48]. The 3-D stopband must be capable of significantly attenuating unwanted signals, such as 3-D wideband noise and plane waves having DOAs outside the passband [6], [49]. Such applications provide our motivation for implementing real-time 3-D frequency-planar filters. Systolic-array architectures for implementing 3-D frequency-planar filters may be used as building blocks for second-order and higher order 3-D IIR beam filters [48], [50]–[52] and 3-D IIR cone filter banks. For example, two 3-D IIR frequency-planar filters may be cascaded to realize a 3-D IIR beam filter, and a parallel-form interconnection of such 3-D beam filters can be used to implement 3-D IIR cone filter banks [5]–[7]. These 3-D beam and cone filters are suitable for the highly selective enhancement of propagating spatio-temporal broadband plane waves on the basis of their DOA [6], [7], [46]–[63]. In the architecture proposed here, the 2-D spatial array of input sensors typically employs synchronously sampled broadband antennas, such as helical antennas [64], [65] or slotted bow-tie antennas [66], [67], at each sensor location. These sensor signals are typically each amplified by a low-noise amplifier (LNA) [68]–[70], low-pass filtered (to avoid temporal aliasing), and then temporally sampled using a (potentially time-interleaved [71], [72]) high-speed analog-to-digital (A/D) converter [73], [74].

The overall structure of the proposed systolic architecture is a 2-D mesh of identical parallel processors having lower circuit complexity than the conventional direct-form architecture [9]. Conventional 1-D LA methods of speed maximization inherently require that the 1-D pole locations of the z-transform transfer function be known. Such methods cannot be extended to the general MD case because the denominator of the corresponding MD transfer function is, in general, nonseparable and cannot be factored, so the poles of the transfer function cannot be located as in the 1-D case. Hence, in the MD case, separability of the denominator of the z-transform transfer function has been required [75], [76] if conventional LA methods are to be applied. A significant feature of the proposed method is that a speed-maximized architecture is achieved in spite of the nonseparability of the denominator of the 3-D z-transform of the frequency-planar transfer function.

This paper is organized as follows. Section II reviews 3-D frequency-planar transfer functions and describes the proposed architecture. Section III contains a typical transfer-function design example, along with details concerning finite-precision effects, pipelining, and critical path delay (CPD). In Section IV, simulation and verification details of a real-time implementation are described, employing a midcapacity Xilinx Virtex-4 Sx-35 FPGA device. Section V discusses future work and conclusions. A review of differential-form 3-D IIR frequency-planar digital filter synthesis is provided in the Appendix.

II. SYSTOLIC-ARRAY ARCHITECTURE

The required z-domain input-output transfer function [77], [78] of the 3-D IIR frequency-planar filter is given by

(1)

where the denominator coefficients are direct-form feedback coefficients [53]. Under zero initial conditions (ZICs), the 3-D filter having z-domain transfer function (1) is implemented by the spatio-temporal difference equation

(2)
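To make (1) and (2) concrete, the Python sketch below assumes a standard form for this filter class that is not stated explicitly above: the resistively terminated first-order prototype $1/(R + L_1 s_1 + L_2 s_2 + L_3 s_3)$, mapped to the z-domain by the triple bilinear transform $s_k = (1 - z_k^{-1})/(1 + z_k^{-1})$. Under that assumption, the numerator of (1) is $\prod_k (1 + z_k^{-1})$ and the eight denominator coefficients follow by polynomial multiplication. `filter3d` then evaluates the difference equation in raster order under ZICs, which is the OSPCC style of computation, not the OFPCC array proposed here. The names `R`, `L1` to `L3`, `direct_form_coeffs`, `evaluate`, and `filter3d` are hypothetical.

```python
import numpy as np

def direct_form_coeffs(R, L1, L2, L3):
    """Coefficient cubes (index [i, j, k] <-> z1^-i z2^-j z3^-k) of the assumed
    bilinear-transformed prototype 1 / (R + L1 s1 + L2 s2 + L3 s3)."""
    p = np.array([1.0, 1.0])   # 1 + z^-1
    m = np.array([1.0, -1.0])  # 1 - z^-1

    def cube(u, v, w):
        return np.einsum("i,j,k->ijk", u, v, w)

    num = cube(p, p, p)
    den = R * cube(p, p, p) + L1 * cube(m, p, p) + L2 * cube(p, m, p) + L3 * cube(p, p, m)
    return num, den

def evaluate(coeffs, z):
    """Evaluate sum c[i, j, k] z1^-i z2^-j z3^-k at the point z = (z1, z2, z3)."""
    zi = 1.0 / np.asarray(z)
    return sum(coeffs[i, j, k] * zi[0] ** i * zi[1] ** j * zi[2] ** k
               for i in range(2) for j in range(2) for k in range(2))

def filter3d(x, num, den):
    """Raster-scan evaluation of the 3-D difference equation under zero initial conditions."""
    N1, N2, N3 = x.shape
    y = np.zeros(x.shape)

    def sample(arr, n1, n2, n3):
        # Samples outside the first-octant support are zero (ZICs).
        return arr[n1, n2, n3] if min(n1, n2, n3) >= 0 else 0.0

    for n3 in range(N3):          # time is the outermost loop
        for n1 in range(N1):
            for n2 in range(N2):
                acc = 0.0
                for i in range(2):
                    for j in range(2):
                        for k in range(2):
                            acc += num[i, j, k] * sample(x, n1 - i, n2 - j, n3 - k)
                            if (i, j, k) != (0, 0, 0):
                                acc -= den[i, j, k] * sample(y, n1 - i, n2 - j, n3 - k)
                y[n1, n2, n3] = acc / den[0, 0, 0]
    return y

num, den = direct_form_coeffs(R=1.0, L1=0.5, L2=0.8, L3=1.2)  # hypothetical values
x = np.zeros((4, 4, 6))
x[0, 0, 0] = 1.0                                              # 3-D unit impulse
h = filter3d(x, num, den)
print(den[0, 0, 0], h[0, 0, 0])
```

Because the denominator mixes all three variables, no factor of it depends on $z_3$ alone, which is the nonseparability discussed in Section II-D.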

Raster-scanned 3-D IIR filter implementations, having a throughput of one sample per clock cycle (OSPCC), are well known [5], [7], [11], [79]–[83] for video/image processing applications. However, to our knowledge, real-time 3-D filter architectures having the much higher throughput of one frame per clock cycle (OFPCC), as typically required for synchronously sampled RF applications, are limited to the relatively complex circuit reported in [9]. Here, a "frame" refers to the $N_1 \times N_2$ set of data samples obtained at each time sample. Because of their high CPDs, previous circuits suffer from throughput bottlenecks at RF frame sample rates. In this study, we propose a novel architecture that is shown to achieve OFPCC throughput, to have up to 57% lower circuit complexity than [9], and to support higher throughput owing to its significantly lower CPDs.

Fig. 1. A 2-D interconnection of identical PPCMs enables real-time throughput at OFPCC. Note: $p$ = depth of pipelining.

Fig. 2. Each antenna (sensor) in the rectangular array is coupled to a broadband LNA [70] using a broadband matching network [84]–[87], low-pass filtered, and subsequently sampled using a dedicated A/D converter at the frame sample rate $F = 1/T$. The digital output of the A/D converter is delayed via a first-in-first-out (FIFO) buffer to compensate for pipelining (explained later) and fed to the signal input A of the corresponding parallel-processing-core-module (PPCM) located at $(n_1, n_2)$, $0 \le n_1 < N_1$, $0 \le n_2 < N_2$.

A. Overview of the Proposed Systolic Array

The proposed systolic array (Fig. 1) achieves OFPCC throughput by synchronously computing the difference equation (2) recursively for all spatial sample points. In general, VLSI implementations of systolic-array processors are modular, regular, and locally interconnected and have high throughput, making them highly suitable for real-time high-frequency applications [10]–[20], [88]–[100]. The distributed-parallel-processor (DPP) systolic-array architecture in Fig. 1 is proposed for implementing the 3-D spatio-temporal difference equations of 3-D first-order IIR frequency-planar filters. This architecture consists of an array of identical, synchronous, interconnected parallel processing core modules (PPCMs). The synchronous operation of the PPCMs requires that the array of A/D converters synchronously sample their respective analog input signals to form the 3-D input signal, indexed by the two spatial sensor indices $(n_1, n_2)$ and a temporal (frame) index. The analog input from each sensor must be amplified using a broadband LNA, low-pass filtered to minimize temporal aliasing, and subsequently sampled using a synchronous array of parallel high-speed, high-precision A/D converters, as shown in Fig. 2. Each PPCM essentially implements (1) in real time for a particular spatial location. The straightforward direct-form I implementation of (1) inside each PPCM is described in [9]. However, for high-speed implementations, the direct-form realization is not the best choice because it leads to high VLSI resource consumption and higher CPDs, and hence to lower computational throughput. Here, we reduce the circuit complexity by employing an alternative differential-form realization inside the PPCMs, as described in Section II-C. The throughput limitations of [9] are much improved in the proposed architecture by employing a novel 3-D LA speed optimization method, as described in Section II-D.

B. Review of 3-D IIR First-Order Frequency-Planar Filters in Differential Form

We require an algebraic decomposition of (1) that yields an architecture having the desired high throughput of OFPCC. For this purpose, each PPCM in Fig. 1 employs first-order 1-D differentiators having the 1-D z-domain transfer functions [7], [101]

$$D_k(z_k) = \frac{1 - z_k^{-1}}{1 + z_k^{-1}}, \qquad k = 1, 2, 3, \tag{3}$$

where $z_1^{-1}$ and $z_2^{-1}$ are the horizontal and vertical spatial delay operators and $z_3^{-1}$ is the temporal delay operator. Equation (3) allows (1) to be decomposed in the differentiator form [52], [53], [57], [101], [102]

(4)

where the three feedback coefficients, indexed by $k = 1, 2, 3$, are given by

(5)

As reviewed in the Appendix, the four design parameters in (5) are selected to achieve the required narrow 3-D bandwidth and 3-D passband of the 3-D IIR frequency-planar filter.
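Assuming the differentiators of (3) are the usual bilinear form $(1 - z_k^{-1})/(1 + z_k^{-1})$, the short Python sketch below shows the recursive realization of one such differentiator and checks that its frequency response is $j\tan(\omega/2)$, the bilinear-warped image of the analog differentiator $s = j\Omega$. The function names are hypothetical. Note that the impulse response alternates without decaying (pole at $z = -1$), so the differentiator is only marginally stable on its own and is used here only as a component of the overall stable first-order structure.

```python
import numpy as np

def bilinear_differentiator(x):
    """Apply D(z) = (1 - z^-1)/(1 + z^-1) to a 1-D sequence under zero initial conditions.

    Difference equation: w[n] = x[n] - x[n-1] - w[n-1].
    """
    w = np.zeros(len(x))
    x_prev = w_prev = 0.0
    for n, xn in enumerate(x):
        w[n] = xn - x_prev - w_prev
        x_prev, w_prev = xn, w[n]
    return w

def differentiator_response(omega):
    """Frequency response D(e^{j omega}); analytically equal to j*tan(omega/2)."""
    z_inv = np.exp(-1j * omega)
    return (1 - z_inv) / (1 + z_inv)

impulse = np.zeros(6)
impulse[0] = 1.0
print(bilinear_differentiator(impulse))  # impulse response: 1, -2, 2, -2, 2, -2
omega = np.linspace(-3.0, 3.0, 13)
print(np.allclose(differentiator_response(omega), 1j * np.tan(omega / 2)))
```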




Fig. 3. Each PPCM is a three-input, three-output circuit that computes the 3-D filter difference equations at a throughput of OSPCC, leading to a total throughput of OFPCC for the DPP architecture.

C. PPCM Architecture

The proposed architecture for each PPCM is shown in Fig. 3, where the required transfer function of the subfilter in Fig. 3 is

(6)

It is this further decomposition of the transfer function, as a function of the temporal variable $z_3$, that facilitates LA optimization and pipelining for low critical path delay and maximum throughput [75], [103]. The parallel multipliers with coefficients given by (5) are associated with the spatial differential operators and are implemented as shown in Fig. 3. Clocked FIFO buffers are employed to implement each pipeline having a delay of $pT$, where $T$ is the clock (frame) period and $p$ is the so-called depth of pipelining. These FIFO buffers may be inserted at any point in their respective signal paths in order to reduce the overall critical path delay of the circuit [75], [76], [103]–[105]. In Fig. 4, four alternative decompositions are proposed. Each decomposition yields a subfilter realization of different throughput, resulting from four correspondingly different critical path delays; the higher throughputs are achieved at the expense of greater hardware complexity, as represented by an increased number of multipliers and adders. The final choice depends on the required throughput.

Fig. 4. Four alternative implementations of the temporal subfilter of (6), having: (i) no LA speed optimization; (ii) first-order LA speed optimization; (iii) second-order LA speed optimization; and (iv) third-order LA speed optimization. Note: (iii) and (iv) have the same complexity.

D. Speed Optimization of the Subcircuit With LA Techniques Using Pipelining

Remarks About Pipelining in the Multidimensional Case: It is important to note from (1) that the direct-form denominator of the input-output transfer function of the filter is nonseparable for these frequency-planar filters. It is well known that such 3-D polynomials cannot, in general, be factored (owing to the lack of a Fundamental Theorem of Algebra for MD polynomials [77], [106], [107]). This nonfactorable property of the denominator of (1) implies that the 3-D pole surfaces (i.e., manifolds) of the transfer function of the 3-D frequency-planar filter cannot be found in the form of separable pole factors [77]. Consequently, the well-established 1-D methods for designing LA circuits for IIR filters (which all depend on knowing the pole locations) cannot be extended to nonseparable MD IIR transfer functions such as (1) [77]. In general, the problem of designing LA circuits for MD IIR filters has not been solved. Here, however, we provide an LA solution for the 3-D first-order nonseparable case. We do so by decomposing the first-order 3-D transfer function using differential operators. For brevity and focus, we do not pursue extensions of the idea to higher orders or higher dimensions. In the following, the philosophy is to apply LA speed optimization only in the direction of temporal recursion and to implement the optimization inside each subfilter circuit of each PPCM. Cross-multiplying terms, simplifying (4), and using (3), we obtain

(7)

Cross-multiplying terms in the denominator of (7) and further simplifying, we obtain the following decomposition of (7), as required for speed maximization using LA and pipelining:

(8)
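The essence of this temporal separation can be checked numerically under an assumed differential form $T = 1/(R + L_1 D_1 + L_2 D_2 + L_3 D_3)$ with positive parameters. This is an illustrative sketch with hypothetical names, not necessarily term-by-term identical to (7) and (8). Collecting the temporal term gives $Y = S(z_3)\,[X - L_1 D_1 Y - L_2 D_2 Y]$, where $S(z_3) = (1 + z_3^{-1})/((R + L_3)(1 + c\,z_3^{-1}))$ and $c = (R - L_3)/(R + L_3)$. The only temporal recursion is therefore a first-order section with $|c| < 1$, whose denominator depends on $z_3$ alone.

```python
import numpy as np

def D(z):
    """Bilinear differentiator (1 - z^-1)/(1 + z^-1) evaluated at a complex point z."""
    return (1 - 1 / z) / (1 + 1 / z)

def temporal_split(R, L1, L2, L3, z1, z2, z3, X=1.0):
    """Compare Y = T X with the temporally factored form
    Y = S(z3) * (X - L1 D1 Y - L2 D2 Y), S(z3) = (1 + z3^-1) / ((R + L3)(1 + c z3^-1))."""
    Y = X / (R + L1 * D(z1) + L2 * D(z2) + L3 * D(z3))   # assumed differential form
    c = (R - L3) / (R + L3)                               # temporal feedback coefficient
    S = (1 + 1 / z3) / ((R + L3) * (1 + c / z3))          # first-order temporal subfilter
    rhs = S * (X - L1 * D(z1) * Y - L2 * D(z2) * Y)
    return Y, rhs, c

rng = np.random.default_rng(1)
for _ in range(5):
    # Random points on the unit torus, kept away from z = -1.
    z1, z2, z3 = np.exp(1j * rng.uniform(-3.0, 3.0, size=3))
    Y, rhs, c = temporal_split(0.3, 1.0, 0.7, 0.9, z1, z2, z3)
    print(np.isclose(Y, rhs), abs(c) < 1)
```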




Rearranging (8) and employing (3) and (6) yields the proposed 3-D LA form

(9)

Although the right-hand side of (9) is not a transfer function operating on the input, its denominator is separable in the temporal variable $z_3$, and it is this property that allows us to employ conventional time-domain LA methods.

1) LA Speed Optimization: The proposed 3-D LA speed maximization is based on well-known LA-based pipelining of 1-D IIR digital filters. Essentially, LA uses pole-zero cancellation (assuming infinite precision) to increase the order of the denominator polynomial so that more pipeline stages can be included in the corresponding hardware feedback circuit. For example, consider the first-order transfer function

$$H(z) = \frac{1}{1 - a z^{-1}}, \qquad |a| < 1,$$

and its simple LA-based equivalents

$$H(z) = \frac{1 + a z^{-1}}{1 - a^{2} z^{-2}} = \frac{1 + a z^{-1} + a^{2} z^{-2}}{1 - a^{3} z^{-3}} = \frac{(1 + a z^{-1})(1 + a^{2} z^{-2})}{1 - a^{4} z^{-4}}$$

for LA of order $M = 1$, 2, and 3, respectively. These forms lead to hardware architectures having different circuit complexities and throughputs. With infinite precision, pole-zero cancellation makes all of these forms identical to $H(z)$. For $M = 0$, 1, 2, the LA-optimized filter circuits require one, two, and three multipliers, respectively, while, for LA of order $M = 3$, the number of multipliers can be three or four, depending on whether the numerator is realized in cascade or parallel form. The spatial 2-D inverse z-transform of (9) under ZICs yields the following mixed-domain equation in the mixed 3-D variable $(n_1, n_2, z_3)$:
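The LA identities above can be verified numerically. This plain-Python sketch (helper names hypothetical) filters a unit impulse through each equivalent form of $1/(1 - a z^{-1})$ and counts the coefficient multipliers, treating a coefficient as free when it is 0 or ±1. With $a = 0.8$, the impulse responses should agree to rounding error, and the counts should reproduce the one, two, three, and three-or-four multipliers quoted in the text.

```python
def run_iir(b, a, x):
    """y[n] = sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k], with a[0] == 1 and ZICs."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y

def polymul(p, q):
    """Product of two polynomials in z^-1 given as coefficient lists."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

def la_forms(a):
    """Equivalent forms of H(z) = 1/(1 - a z^-1): name -> (numerator factors, denominator)."""
    return {
        "M=0": ([[1.0]], [1.0, -a]),
        "M=1": ([[1.0, a]], [1.0, 0.0, -a**2]),
        "M=2": ([[1.0, a, a**2]], [1.0, 0.0, 0.0, -a**3]),
        "M=3 cascade": ([[1.0, a], [1.0, 0.0, a**2]], [1.0, 0.0, 0.0, 0.0, -a**4]),
        "M=3 parallel": ([[1.0, a, a**2, a**3]], [1.0, 0.0, 0.0, 0.0, -a**4]),
    }

def multiplier_count(num_factors, den):
    """Count coefficients needing a real multiplier (not 0 or +/-1); leading 1s are free."""
    coeffs = [c for f in num_factors for c in f[1:]] + den[1:]
    return sum(1 for c in coeffs if c != 0.0 and abs(c) != 1.0)

a = 0.8
impulse = [1.0] + [0.0] * 29
reference = run_iir([1.0], [1.0, -a], impulse)   # h[n] = a**n
errors, counts = {}, {}
for name, (factors, den) in la_forms(a).items():
    num = [1.0]
    for f in factors:
        num = polymul(num, f)                    # expand the cascade numerator
    h = run_iir(num, den, impulse)
    errors[name] = max(abs(u - v) for u, v in zip(h, reference))
    counts[name] = multiplier_count(factors, den)
    print(f"{name:13s} multipliers={counts[name]} max|h - h0|={errors[name]:.1e}")
```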

(10)

where

(11)

(12)

Writing the 3-D z-transform pair between each spatio-temporal sequence and its z-domain counterpart, the 3-D inverse z-transform of (4) yields the following spatio-temporal decomposition of the required input-output 3-D difference equation (2):

(13)

(14)

(15)

(16)

It is easily shown that (13)–(16) are bounded-input bounded-output (BIBO) stable if the usual ZICs are imposed [108]. As shown in Fig. 3, each PPCM consists of parallel arithmetic hardware for the computation of the inverse z-transform of (10), using the subfilter subcircuit, and for the computation of (14) and (15), using two subcircuits that are implemented as parallel pipelined subtractor circuits. Equations (6), (7), and (13) are time-synchronously calculated in one clock cycle, leading to the desired OFPCC throughput of the complete DPP of Fig. 1. Note: we chose to explain the operation of the subfilter subcircuit using the mixed-domain equation (10), even though we use the spatio-temporal difference equations (14) and (15) to explain the two subtractor subcircuits. By doing so, we avoid tedious convolution operations while explaining the different decompositions of the subfilter. The temporal signal at the output port of the PPCM is the 1-D inverse z-transform of (10). The pairs of ports implement the inputs and outputs of (14) and (15), respectively, with the PPCMs interconnected according to Fig. 1. The spatial ZICs are implemented according to (13)–(15) at the boundary locations of the array. It is important to note that the temporal recursions are implemented only within the PPCMs, using the subfilters having transfer function (6). The temporal ZICs are obtained by clearing all internal registers of each PPCM at power-on time.

2) On the Four Decompositions of the Subfilter: The transfer function of the subfilter subcircuit may be implemented using a variety of equivalent architectures, each having different numbers of multipliers and adder/subtractor circuits and different critical path delays. Here, we propose four particularly useful equivalent architectures for implementing the subfilter, as follows. Consider the subfilter transfer function (6). Its filter coefficient has magnitude less than one because the design parameters are positive for all 3-D passband directions and bandwidths.
Therefore, the subfilter contains a simple real pole that lies strictly inside the unit circle and is therefore BIBO stable [103], [109]. Stable first-order recursive filter circuits may be fully pipelined by inserting additional delays inside the feedback path, employing the known method of LA speed optimization [75], [76], [103]–[105], [109], [110]. LA speed optimization of order $M$ in a first-order recursive filter enables $M$ additional clocked FIFO delays inside the feedback path [75]. These delays are used for pipelining the feedback loop, leading to reduced critical path delays of approximately $\mathrm{CPD}_0/(M + 1)$, where $\mathrm{CPD}_0$ is the critical path delay without LA speed optimization [75].

The implementation of the subfilter shown in Fig. 4(i) has no LA speed optimization. Such first-order circuits having a single quantizer in the feedback path are free of zero-input and overflow limit-cycle oscillations when implemented using magnitude truncation with overflow saturation arithmetic [109], [111]. The critical path delay after pipelining is $\mathrm{CPD}_0 = T_m + T_a$, where $T_m$ and $T_a$ correspond to the logic delays in a parallel multiplier and an adder/subtractor circuit, respectively [75]. The maximum clock frequency is therefore $1/\mathrm{CPD}_0$ Hz. The alternative implementation shown in Fig. 4(ii) has first-order LA speed optimization and is arrived at by first multiplying both the numerator and the denominator of (6) by the first-order factor that removes the $z_3^{-1}$ term from the denominator, leading to

(17)

This second alternative circuit has a reduced critical path delay of approximately $\mathrm{CPD}_0/2$ and is therefore capable of approximately twice the throughput of the circuit in Fig. 4(i) [75]. The third implementation, shown in Fig. 4(iii), has second-order LA speed optimization and is arrived at by multiplying both the numerator and the denominator of (6) by the second-order factor that leaves only a $z_3^{-3}$ term in the denominator, leading directly to

(18)

The critical path delay is thereby reduced to approximately $\mathrm{CPD}_0/3$, which leads to circuits having approximately three times the maximum clock frequency of Fig. 4(i) [75]. Finally, an implementation of (6) having third-order LA speed optimization is shown in Fig. 4(iv). This implementation is obtained by multiplying both the numerator and the denominator of (17) by the second-order factor that leaves only a $z_3^{-4}$ term in the denominator, leading to

(19)

The critical path delay is now further reduced to approximately $\mathrm{CPD}_0/4$, leading to a maximum clock frequency of approximately four times that of Fig. 4(i) [75]. The proposed implementations of the subfilter are summarized in Table I. Note that higher order LA speed optimization may be used, leading to clock frequencies of approximately $(M + 1)/\mathrm{CPD}_0$, where $M$ is the order of the LA used. However, circuits having order $M > 3$ are not recommended because of their high complexity and potential unsuitability for typical applications due to excessive VLSI resource consumption. The savings in VLSI silicon real estate, in terms of multiplier circuit complexity compared to [9], are 57%, 43%, 35%, and 35% for the 3-D LA forms corresponding to $M = 0$, 1, 2, 3, respectively.

TABLE I COMPARISON OF MULTIPLIER AND ADDER/SUBTRACTOR CIRCUIT COMPLEXITY, OFPCC THROUGHPUT LEVEL, AND GAIN IN THROUGHPUT COMPARED WITH CONVENTIONAL RASTER-SCANNED 3-D IIR FILTERS CLOCKED AT F Hz
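The CPD scaling can be turned into rough clock estimates. The Python sketch below uses hypothetical multiplier and adder delays (not measured Virtex-4 figures) and the idealized rule $\mathrm{CPD}_M \approx \mathrm{CPD}_0/(M + 1)$; real circuits also pay register and routing overheads, so actual gains will be smaller. It also converts the clock rate into an aggregate sample rate, since OFPCC means every PPCM of the array emits one output sample per clock; a $7 \times 7$ array (49 PPCMs) is assumed, and the function name is hypothetical.

```python
def la_clock_estimates(t_mult_ns, t_add_ns, max_order=3):
    """Idealized CPD (ns) and clock frequency (MHz) for LA orders M = 0..max_order.

    Assumes the M + 1 delays in the feedback loop can be retimed to split the
    multiply-add path evenly, i.e. CPD_M ~= (t_mult + t_add) / (M + 1).
    """
    cpd0 = t_mult_ns + t_add_ns
    rows = []
    for M in range(max_order + 1):
        cpd = cpd0 / (M + 1)
        rows.append({"M": M, "cpd_ns": cpd, "f_MHz": 1e3 / cpd})
    return rows

# Hypothetical logic delays; not measured values for the Virtex-4 device.
rows = la_clock_estimates(t_mult_ns=8.0, t_add_ns=2.0)
N1 = N2 = 7  # assumed 7 x 7 array (49 PPCMs)
for r in rows:
    frames_per_s = r["f_MHz"] * 1e6          # OFPCC: one frame per clock cycle
    samples_per_s = frames_per_s * N1 * N2   # each PPCM outputs one sample per clock
    print(f"M={r['M']}: CPD={r['cpd_ns']:.2f} ns, f={r['f_MHz']:.0f} MHz, "
          f"{samples_per_s / 1e9:.1f} Gsample/s")
```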

III. DESIGN, ARCHITECTURE, PIPELINING, AND QUANTIZATION

Here, we describe a proof-of-concept implementation of the above high-speed DPP architecture using a single Xilinx Virtex-4 FPGA chip. Most of the available programmable logic resources in this device have been employed to implement a real-time DPP having an array size of 7 × 7, implying a total of 49 synchronous PPCMs. By extension, larger DPPs may be implemented on FPGA devices that have more programmable logic resources, or on application-specific integrated circuits (ASICs). In general, a modern high-capacity FPGA such as the Xilinx Virtex-4 Lx-200 is equivalent in capacity to 1.5 million ASIC gates [112], so a design requiring about 15 million ASIC gates can be prototyped using approximately ten Virtex-4 Lx-200 FPGAs (note: the Xilinx Virtex-4 Lx-200 has considerably more programmable logic resources than the Sx35 used here). We therefore estimate that real-time DPPs having at least 500 PPCMs may be implemented on currently feasible ASIC processors. This would permit high-performance 2-D and 3-D space-time beamforming wideband cone filters [6], [46], [47] to be implemented for filtering broadband plane waves, for which this 3-D frequency-planar filter is a building block [1], [6], [54], [58], [59].

A. Design of the Transfer Function

Design of the 3-D IIR Frequency-Planar Filter: The filter (4) is obtained using (5) by selecting numerical values for the four design parameters, which yields the three feedback coefficients, so that

(20)

and the subfilter circuit having transfer function (6) is given by

(21)

The 3-D magnitude frequency response follows from (20) and is given by

(22)
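Under an assumed differential form $T = 1/(R + \sum_k L_k D_k)$ with bilinear differentiators, the magnitude response on the unit torus reduces to $|T| = 1/\sqrt{R^2 + (\sum_k L_k \tan(\omega_k/2))^2}$. The passband is then the bilinear-warped plane $\sum_k L_k \tan(\omega_k/2) = 0$, and a smaller $R$ gives a narrower bandwidth. The Python sketch below evaluates this expression on the spatial frequency square for three temporal slices, in the spirit of Fig. 5; the parameter values and names are hypothetical, not the design values behind (20)-(22).

```python
import numpy as np

def planar_magnitude(w1, w2, w3, R, L):
    """|T| on the unit torus for the assumed form T = 1 / (R + sum_k L_k D_k),
    using D_k(e^{j w}) = j tan(w / 2) for the bilinear differentiator."""
    s = L[0] * np.tan(w1 / 2) + L[1] * np.tan(w2 / 2) + L[2] * np.tan(w3 / 2)
    return 1.0 / np.sqrt(R**2 + s**2)

R, L = 0.2, (1.0, 0.6, 0.8)                 # hypothetical design parameters
w = np.linspace(-np.pi + 1e-3, np.pi - 1e-3, 201)
W1, W2 = np.meshgrid(w, w, indexing="ij")   # spatial frequency square
for w3 in (-np.pi / 2, 0.0, np.pi / 2):     # hypothetical temporal slices
    mag = planar_magnitude(W1, W2, w3, R, L)
    # Passband: the warped plane sum_k L_k tan(w_k / 2) = 0, where |T| = 1/R.
    print(f"w3={w3:+.2f}: max |T| = {mag.max():.3f}, "
          f"fraction above 0.7/R = {np.mean(mag > 0.7 / R):.3f}")
```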



Fig. 5. Magnitude frequency response of the first-order 3-D IIR frequency-planar filter, plotted for three discrete temporal frequencies.

This function is shown in Fig. 5 for three discrete temporal frequencies, where it is plotted over the spatial frequency square $-\pi \le \omega_k \le \pi$, $k = 1, 2$. The spatial frequency origin is indicated.

B. Architectural Design of the PPCM

In this example, we employ LA speed optimization of order one, as shown in Fig. 4(ii), because it leads to a realizable FPGA implementation employing approximately 80% of the programmable logic resources and most of the embedded hardware multipliers in the Xilinx Virtex-4 sx35-10ff668 FPGA device.

1) Finite Precision Effects: The PPCM circuits have finite precision because they employ two's complement fixed-point arithmetic. The use of finite precision leads to nonidealities such as quantization noise, overflow, transfer-function errors (magnitude sensitivity to coefficient errors), potential instability, and potential limit-cycle oscillations. The DPP circuit consists of a multidimensional IIR recursive filter circuit having one parallel input per sensor and as many outputs. A complete treatment of the finite precision effects of such a system requires future work. Here, we briefly discuss the primary design aspects for reducing the finite precision effects in the DPP architecture.

a) System word size: Finite word length (FWL) leads to quantization noise, starting at the A/D converter. Temporal oversampling and averaging (useful for reducing the effects of bilinear warping in the passband) increases the effective number of bits of the A/D converter; under the usual white-quantization-noise model, the in-band SNR improves by about 3 dB (half a bit) for every doubling of the A/D sample frequency [113]. The SNR decreases with additional stages of requantization in the signal flow. The quantization noise is also subject to 3-D frequency-planar filtering as it propagates through the filter circuit; this may be investigated using a linear model of uncorrelated additive noise sources. In general, higher precision is preferred for lower noise power, even though it leads to higher critical path delays (i.e., lower throughputs) and higher VLSI resource consumption.

b) Sensitivity: Multiplier coefficient errors due to finite precision cause the implemented 3-D transfer function to deviate from the ideal frequency-planar (bilinear-warped) passband shown in Fig. 5. The worst-case errors, for both the magnitude and the phase of the transfer function, may be estimated for 3-D circuits by adapting the well-known 1-D theory of worst-case multiparameter sensitivity analysis. Digital filters that are derived from resistively terminated passive prototype networks, such as first-order 3-D IIR frequency-planar filters, have desirable low multiparameter sensitivity properties in their passband compared with 3-D IIR filters synthesized using certain numerical optimization methods [61], [102], [114]. The 18-b precision of the multiplier coefficients leads to worst-case magnitude transfer-function errors within 1%, which is sufficient. Wave digital filters (WDFs) and integral-form signal-flow-graph-based 3-D IIR filters can achieve lower sensitivity at the cost of added computational complexity [102], [115].
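The oversampling arithmetic can be sketched under the standard white-quantization-noise model (an assumption; noise-shaping converters behave differently). Each doubling of the oversampling ratio, followed by ideal low-pass filtering, halves the in-band noise power, which is about 3 dB or half an effective bit. The 14-b word length matches the A/D channels of the prototype; the function name is hypothetical.

```python
import math

def ideal_snr_db(bits, oversampling_ratio=1.0):
    """Full-scale sine SNR of an ideal uniform quantizer, plus the gain from
    oversampling and ideal low-pass filtering, assuming white quantization noise."""
    return 6.02 * bits + 1.76 + 10 * math.log10(oversampling_ratio)

for osr in (1, 2, 4, 8):
    snr = ideal_snr_db(14, osr)
    extra_bits = (snr - ideal_snr_db(14)) / 6.02
    print(f"OSR={osr}: SNR={snr:.2f} dB (+{extra_bits:.2f} effective bits)")
```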
Furthermore, it is well known that cascade- and parallel-form high-order digital filters have better sensitivity properties than direct-form structures. Our approach of synthesizing 3-D IIR beam filters using a cascade of first-order 3-D IIR frequency-planar filter building blocks, and of using a parallel-form implementation of such 3-D IIR beam filters to achieve 3-D IIR cone filter banks, is therefore expected to display better sensitivity properties than a direct-form implementation of beam/cone filters.

c) Pole location: For $M$ stages of LA pipelining, the corresponding 1-D pole (in $z_3$) of the 3-D LA speed-maximized 1-D input-output transfer function is located at [75]

(23)

where the unperturbed pole is the 1-D pole of the PPCM when no 3-D LA speed maximization is present, and the perturbation is the finite-precision error in representing the feedback coefficient for LA of order $M$. The pole location in (23) is more sensitive for poles of smaller magnitude. This, however, does not cause instability, because such poles lie close to the origin [75]. The effects of finite precision on the 3-D frequency-planar transfer function, in the presence of the proposed 3-D LA optimization, remain to be investigated. However, we suggest that the net effect on the DPP 3-D transfer function is tolerably low because the effect on each PPCM can be made small using well-established 1-D IIR filter hardware design methods [75], [104], [110].
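The pole-sensitivity argument can be illustrated numerically. Assuming, as in the LA identities of Section II-D, that an order-$M$ LA loop has denominator $1 - p_0^{M+1} z^{-(M+1)}$ and that only this single coefficient is rounded, the perturbed poles are the $(M + 1)$th roots of the rounded coefficient. The Python sketch below (hypothetical pole values and names; 16 fractional bits, matching the internal register precision of the prototype) reports how far the pole nearest $p_0$ moves; the shift grows as $|p_0|$ shrinks, while every pole stays well inside the unit circle.

```python
import numpy as np

def la_loop_poles(p0, M, frac_bits):
    """Poles of the LA feedback loop 1 - q z^-(M+1), where q = p0**(M+1) is rounded
    to `frac_bits` fractional bits. Returns (poles, coefficient error)."""
    exact = p0 ** (M + 1)
    q = np.round(exact * 2**frac_bits) / 2**frac_bits
    poles = np.roots([1.0] + [0.0] * M + [-q])   # roots of z^(M+1) = q
    return poles, q - exact

for p0 in (0.9, 0.5, -0.6, 0.1):                 # hypothetical unperturbed poles
    for M in (1, 2, 3):
        poles, eps = la_loop_poles(p0, M, frac_bits=16)
        shift = np.min(np.abs(poles - p0))        # movement of the pole nearest p0
        print(f"p0={p0:+.1f} M={M}: eps={eps:+.1e}, shift={shift:.1e}, "
              f"max|pole|={np.max(np.abs(poles)):.3f}")
```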



Fig. 6. Zero-input and overflow limit-cycle-free design of the recursive feedback path shown in Fig. 4(ii) using finite precision arithmetic. Magnitude truncation at the quantizer, together with overflow saturation, leads to zero-input and overflow limit-cycle-free operation.

d) Limit cycles: These instabilities may occur in FWL IIR digital filters. If the FWL PPCMs are stable and free of limit cycles, the DPP is in turn stable and free of temporal limit-cycle oscillations. First-order single-quantizer FWL IIR filter circuits can be made free of overflow and zero-input limit cycles by adopting magnitude truncation at the so-called passive quantizer together with overflow saturation [111], [116]. Here, however, we use two's complement truncation (TCT), despite its tendency to generate low-amplitude limit cycles (for TCT, the required passivity is not achieved when quantizing negative numbers [117]), because the magnitude truncation quantizer we ideally require is unavailable in the hardware cores used. Each of the 49 PPCMs employs two's complement fixed-point arithmetic with truncation-type quantization and saturation-type overflow [109], [111]. The XtremeDSP kit-4 FPGA rapid prototyping solution consists of a Nallatech BenADDA daughter card and a Nallatech BenONE mainboard. The BenADDA card carries the Xilinx Virtex-4 Sx35-10ff668 user FPGA and two A/D and D/A channels, each having 14 b of precision. The A/D and D/A channels assume two's complement fixed-point arithmetic with the binary point set at bit 13. The system word size of the DPP architecture is 14 b, with the binary point at bit 13, to accommodate interfacing with the A/D and D/A channels. The internal registers of the PPCMs are quantized to 16 b of fractional precision to help reduce the effects of quantization. In order to reduce temporal zero-input limit cycles and eliminate overflow limit cycles, quantization is implemented at the output of the summation block of the second-order feedback loop shown in Fig. 4(ii) [118]. In Fig. 6, we show the details of the finite precision implementation of the feedback circuit of Fig. 4(ii).
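The contrast between the two quantizers can be reproduced with a simplified integer model of the two-delay feedback loop of Fig. 4(ii). In this sketch, which uses a hypothetical coefficient and hypothetical names, states are integers in units of one LSB of a 14-b word, and each product is formed in floating point and immediately requantized to the LSB grid, ignoring the 16-b intermediate precision. With zero input, TCT leaves a negative state stuck at a nonzero value (a low-amplitude limit cycle), whereas magnitude truncation decays to zero.

```python
import math

WORD_BITS = 14  # system word length; states are integers in LSB units

def saturate(v):
    """Overflow saturation to the two's complement range of a WORD_BITS-bit word."""
    lo, hi = -(1 << (WORD_BITS - 1)), (1 << (WORD_BITS - 1)) - 1
    return max(lo, min(hi, v))

def tct(v):
    """Two's complement truncation: discard LSBs, i.e. round toward minus infinity."""
    return math.floor(v)

def magnitude_truncation(v):
    """Magnitude truncation: round toward zero (the 'passive' quantizer)."""
    return math.trunc(v)

def zero_input_run(coeff, init, quantizer, delay=2, steps=40):
    """Iterate y[n] = sat(Q(coeff * y[n - delay])) with zero input; return the last 4 outputs.

    delay=2 mimics the two-delay feedback loop of a first-order LA section."""
    hist = list(init)            # must hold at least `delay` past outputs
    for _ in range(steps):
        hist.append(saturate(quantizer(coeff * hist[-delay])))
    return hist[-4:]

coeff = 0.9                      # hypothetical loop coefficient (e.g., c**2 in the LA form)
print("TCT:", zero_input_run(coeff, [-5, -5], tct))
print("MT: ", zero_input_run(coeff, [-5, -5], magnitude_truncation))
```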
The system word has a 13-bit binary fraction. The fractional part of the output of the multiplier is held at 16 b of precision, and the fractional part of the output of the summation block of the second-order feedback loop shown in Fig. 4(ii) is quantized to 13 b using TCT, as shown in Fig. 6 [111], [118].

2) Pipelining: Pipelining shortens the critical path delay and hence permits high clock frequencies, which in turn lead to high real-time throughputs. Each PPCM, as shown in Fig. 3, is pipelined into multiple
stages. Furthermore, the critical path delay is reduced by employing first-order LA speed optimization in a subfilter of each PPCM. The pipelining latency is compensated by including clocked FIFO delays of appropriate depth between each A/D converter and the corresponding PPCM input port, such that the input to the PPCM at each array location is delayed accordingly. The pipeline stages of the PPCMs may consist of (a) clocked unit-delay buffers between two arithmetic circuits and (b) internally pipelined arithmetic circuits. Note that, in Fig. 3, all pipelines in a given signal path are represented by a single clocked FIFO buffer. Clocked delays due to pipelining can be compensated by inserting additional clocked FIFO buffers of appropriate length at the output of each PPCM.

IV. SIMULATION, FPGA IMPLEMENTATION, CO-SIMULATION, AND REAL-TIME OFPCC OPERATION

The proposed DPP architecture has been simulated for an array size of 16 × 16 with bit-true cycle-accurate modeling, using Matlab/Simulink and Xilinx System Generator (XSG) FPGA design tools [26], [119], [120]. Thereafter, it was physically implemented in real time for an array size of 7 × 7 on a single FPGA device.

A. Verification of the 3-D Frequency Response by Simulation

We have verified the 3-D frequency response of the FPGA circuit (for the 16 × 16 array) by simulating its 3-D bit-true cycle-accurate unit impulse response and computing the 3-D fast Fourier transform (FFT) of that response, in accordance with the 3-D discrete-time Fourier transform pair [77], [78]. The corresponding 3-D magnitude frequency response was observed to agree almost exactly with the ideal response shown in Fig. 5. The departure of the 3-D passband in Fig. 5 from a straight beam is due to the well-known warping distortion of the triple bilinear transform and is therefore independent of the hardware implementation. Bilinear warping may be avoided by oversampling the input signal by a factor of two, at the expense of reduced throughput, or by employing the polyphase cone-filtering method in [2].

B. Physical Implementation

The FPGA physical implementation was first verified using on-chip hardware co-simulation and thereafter operated at OFPCC throughput in real time at frame sample rates in excess of 90 MHz. The bit-true cycle-accurate simulation (in Section IV-A) of the 16 × 16 DPP FPGA circuit confirms correct operation of the XSG model. However, it does not provide conclusive experimental evidence of circuit operation in real time. To experimentally verify stable real-time operation at the OFPCC throughput, we physically implemented a 7 × 7 PPCM DPP circuit for the example frequency-planar filter (20) using a single Xilinx Virtex-4 sx35-10ff668 FPGA device. The simulated design was scaled down to 7 × 7 to facilitate physical implementation and real-time operation (time-periodic 3-D impulse response


Fig. 7. FPGA circuit for simulating and measuring the quantization noise on a sinusoidal input signal for PPCM located at spatial location (0,0).

test) using the mid-capacity Virtex-4 sx35-10ff668 device we had available.

1) On-FPGA-Chip Stepped-Hardware Co-Simulation: The 7 × 7 physical implementation was verified on-chip for correct impulse response using the Xilinx XtremeDSP kit-4 prototyping system; the outputs from the FPGA output pins were electronically captured into a Matlab/Simulink simulation environment for analysis [120]–[123]. The measured on-chip impulse response is essentially identical to the theoretical response shown in Fig. 9 (top).

2) On-FPGA-Chip Real-Time Implementation at OFPCC Throughput: Real-time tests were performed by free-running the on-chip hardware co-simulation (HCS). During the real-time test, we applied a time-periodic unit impulse input at the input port of the PPCM at location (0,0). For such periodic excitations, the FPGA implementation produces a periodic 3-D impulse response, which is displayed on an oscilloscope in real time by digital-to-analog (D/A) conversion of the 13-bit output words at the output ports of the PPCMs. This produces 49 output traces for the 7 × 7 DPP. For brevity, Fig. 9 (top) shows only 16 of the 49 theoretically computed impulse-response traces, corresponding to PPCM array locations $0 \le n_1 \le 3$, $0 \le n_2 \le 3$. Fig. 9 (bottom) shows oscilloscope images of the measured impulse response of the real-time implementation. These traces correspond to a real-time clock frequency of 30 MHz and hence a frame-sample frequency of $30 \times 10^6$ frames/s, thereby achieving 3-D IIR frequency-planar filtering at a throughput of OFPCC. Because the observed OFPCC throughput is independent of the number of sensors in each spatial direction, this experiment confirms the feasibility of OFPCC throughput together with stable operation under finite-precision arithmetic. VLSI realizations of DPPs having many more PPCMs therefore appear feasible. Further, even though the CPD of 10.357 ns (from the Xilinx design file ./netlist/xflow.results) is equivalent to a maximum clock frequency of about 96 MHz, the realization in Fig. 7 is clocked at 30 MHz to reduce the finite-bandwidth effects of the 60-MHz oscilloscope.

Fig. 8. PSD showing signal and noise power for a 1-D sinusoidal signal $\sin(2\pi (F_s/8) n_3)$ at the input (top, for 14 and 32 b) of the first PPCM in the array, at spatial location (0,0). The PSDs of the outputs $y(0,0,n_3)$ and $y(6,6,n_3)$ (bottom) show the relative degradation of SNR due to multiple quantization stages.

C. Quantization Noise

Using the test FPGA circuit shown in Fig. 7, the effects of quantization on a test signal have been measured (see Fig. 8). The input signal is the 1-D sinusoid $\sin(2\pi (F_s/8) n_3)$ applied at input port (0,0).



Fig. 9. Continuous-time 3-D unit impulse response $h(n_1, n_2, t)$ of the 3-D IIR frequency-planar filter, shown for the first 16 spatial locations $0 \le n_1 \le 3$, $0 \le n_2 \le 3$. For the Xilinx Virtex-4 SX35-10FF668 FPGA-based implementation, the 16 traces are shown as calculated (top) and as measured for real-time DPP operation at a frame sampling (clock) frequency of $F_s = 30$ MHz (bottom). [Note: the minor differences between the theoretical and experimental traces are due to the use of cubic-spline interpolation (top) and the low-pass effect of the 60-MHz-bandwidth analog oscilloscope we used (bottom).]

The PSD of the signal and noise is shown at 32-b precision (for reference) and at 14-b precision at input port (0,0). Outputs are observed at the PPCM at (0,0) and at the end of the longest signal path, which contains the largest number of quantizer stages and the highest-order filtering, namely the output of the last PPCM, at spatial location (6,6). Fig. 8 therefore shows a worst-case scenario for the filter, in which the input signal is highly correlated and the maximum possible number of quantizer stages (wideband noise sources) is included. The TCT quantization used here introduces noise sources having nonzero mean; together with the low-pass nature of the 1-D transfer function between the input of the PPCM at (0,0) and the output of the PPCM at (6,6), this results in relatively low SNRs at very low frequencies. Fortunately, the filter response at such low frequencies is of little practical interest, because most broadband sensors (for example, antennas) show little or no response to very low temporal frequencies. The experiment described here offers some insight into the quantization noise of the proposed filter circuit; however, methods for analyzing quantization effects for broadband 3-D plane-wave input signals require further work.

D. FPGA Resource Consumption

The FPGA implementation on the Virtex-4 sx35-10ff668 device consumes 12088 of the available 15360 slices (78%) and 182 DSP48 embedded hardware multipliers out of
the available 192 multipliers. The multiplier count matches the model in Table I (column 2) for the 7 × 7 array. The post-place-and-route timing report indicates maximum delays of 3.172 ns and 2.959 ns, net skews of 0.625 ns and 0.455 ns, and fanouts of 8092 and 259 for the sysgen_clk and dsp_clki clock nets, respectively.

E. Real-Time Computational Throughput

At OFPCC, the practically achieved frame rate (in frames/s) equals the clock frequency, so the numbers of fixed-point multiplications and additions/subtractions performed per second are the per-frame operation counts multiplied by that frequency. Compared with a differential-form raster-scanned architecture, the gain in throughput is large: such a raster-scanned architecture would support a maximum frame rate of only 980 kHz [11], [79]–[81], [124] if implemented in a similar FPGA at a comparable level of binary precision (13-bit system word size) [80], [124].
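Because OFPCC delivers one complete 7 × 7 spatial frame per clock cycle, the real-time rates above follow from simple products. The sketch below is a back-of-the-envelope calculation, not the paper's reported figures: it assumes a hypothetical 90-MHz clock (the abstract reports frame rates of up to 90 MHz), the 7 × 7 array, and, as an upper bound, that each of the 182 DSP48 multipliers reported in Section IV-D completes one multiplication on every clock cycle. The names `ofpcc_rates` and `OfpccRates` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OfpccRates:
    frames_per_s: int      # at OFPCC the frame rate equals the clock rate
    samples_per_s: int     # n1 * n2 spatial samples per frame
    mults_per_s: int       # upper bound: every multiplier busy on every clock


def ofpcc_rates(f_clk_hz: int, n1: int, n2: int, multipliers: int) -> OfpccRates:
    """Back-of-the-envelope real-time rates for a one-frame-per-clock-cycle array."""
    frames = f_clk_hz                      # one n1 x n2 frame per clock cycle
    return OfpccRates(
        frames_per_s=frames,
        samples_per_s=n1 * n2 * frames,
        mults_per_s=multipliers * frames,  # assumes one multiplication per multiplier per cycle
    )


if __name__ == "__main__":
    # Hypothetical operating point: 90-MHz clock, 7 x 7 array, 182 DSP48 multipliers.
    r = ofpcc_rates(f_clk_hz=90_000_000, n1=7, n2=7, multipliers=182)
    print(f"{r.frames_per_s:,} frames/s")
    print(f"{r.samples_per_s:,} samples/s")
    print(f"{r.mults_per_s:,} multiplications/s (upper bound)")
```

Under these assumptions the array processes 4,410,000,000 spatio-temporal samples/s and at most 16,380,000,000 multiplications/s. Additions/subtractions per second follow the same pattern once a per-frame adder count is fixed.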

V. SUMMARY AND CONCLUSION

We extend the direct-form systolic array proposed in [9] to a low-complexity differential form having improved critical path delay using the proposed 3-D LA form. This enables the use of well-known 1-D LA speed optimization when the 3-D polar surfaces (or manifolds) of the 3-D frequency-planar transfer function are unknown. We have shown how the 3-D look-ahead form of the frequency-planar filter leads to high-throughput VLSI architectures that are practical-BIBO-stable and free of overflow and zero-input limit-cycle instabilities in the temporal dimension, while delivering the massive and previously unreported OFPCC throughput required in smart antenna array applications. These architectures enable antenna-array frame sample frequencies equal to the system clock frequency, leading to RF frame rates. The above 7 × 7 FPGA real-time implementation demonstrates the OFPCC-throughput capability of the DPP systolic array. It also demonstrates 3-D IIR stability, stability under finite-precision arithmetic, pipelining and throughput maximization using the proposed 3-D LA form, and the correct operation of the proposed low-complexity 3-D differential-form circuit. The above results are valid for larger arrays having as many PPCMs as required, and the OFPCC throughput and finite-precision stability are equally valid at any clock frequency. The 3-D IIR first-order frequency-planar filter is a building block for 3-D IIR beam and 3-D IIR cone filter banks. Such filters are useful for the highly selective real-time filtering of broadband spatio-temporal plane waves encountered in smart antenna-array beamforming applications. A typical 3-D IIR OFPCC-capable cone filter bank having five real bands [6] for practically useful broadband smart antenna array applications requires custom VLSI circuits with larger systolic arrays (a typical size would be 30 × 30) and as many as 10 integrated 3-D IIR frequency-planar modules.
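The 1-D look-ahead (LA) speed optimization referred to above [75], [76] can be sketched on a scalar recursion. This is a minimal illustration, assuming a hypothetical first-order recursion $y[n] = a\,y[n-1] + b\,x[n]$ rather than the PPCM transfer function itself: substituting the recursion into itself once gives $y[n] = a^2 y[n-2] + ab\,x[n-1] + b\,x[n]$, which produces the same output but places two delays in the feedback loop, so the loop's multiply-add can be pipelined with one extra register.

```python
import math


def direct_form(a, b, x):
    """Reference recursion y[n] = a*y[n-1] + b*x[n], zero initial state."""
    y, prev = [], 0.0
    for xn in x:
        prev = a * prev + b * xn
        y.append(prev)
    return y


def first_order_look_ahead(a, b, x):
    """Look-ahead form y[n] = a^2*y[n-2] + a*b*x[n-1] + b*x[n].

    The feedback term now depends on y[n-2], so in hardware the loop holds
    two delays and its multiply-add can be split across two pipeline stages.
    """
    a2, ab = a * a, a * b          # precomputed look-ahead coefficients
    y = []
    for n, xn in enumerate(x):
        y_n2 = y[n - 2] if n >= 2 else 0.0
        x_n1 = x[n - 1] if n >= 1 else 0.0
        y.append(a2 * y_n2 + ab * x_n1 + b * xn)
    return y


if __name__ == "__main__":
    x = [1.0, 0.0, -0.5, 0.25, 0.0, 0.0, 1.0, 0.0]
    ref = direct_form(0.8, 0.5, x)
    la = first_order_look_ahead(0.8, 0.5, x)
    # Both forms agree sample by sample (up to floating-point rounding).
    print(all(math.isclose(p, q, abs_tol=1e-12) for p, q in zip(ref, la)))
```

The extra feed-forward term $ab\,x[n-1]$ lies outside the loop and can be pipelined freely, which is why look-ahead trades a small amount of added hardware for a shorter iteration bound.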
The systolic array is demonstrated here, in real time, using a Xilinx Virtex-4 sx35-10ff668 FPGA device at clock (frame) frequencies exceeding 90 MHz. The availability of the proposed 3-D frequency-planar filter architecture is a starting point for future 3-D IIR beam and cone filter VLSI DPP implementations at OFPCC throughput for high-speed applications.

APPENDIX

Fig. 10. Elemental predistorted 3-D pseudo-passive prototype network for 3-D IIR frequency-planar filters in differential form.

A 3-D pseudo-passive network with 3-D inductors, negative resistors, and a terminating resistor (see Fig. 10) has the transfer function [7], [52], [57], [61]

(A1)

The filter (A1) is planar-resonant on a frequency plane defined by its normal vector, and its passband has a finite 3-dB bandwidth. Differential-form synthesis imposes a condition on the element values in each dimension, k = 1, 2, 3. Converting to the z-domain using the 3-D bilinear transform (BLT), applied separately in each dimension k = 1, 2, 3, we obtain

(A2)

where k = 1, 2, 3. The discrete-domain frequency-planar (FP) filter given by (A2) is warped by the bilinear transform [6].

REFERENCES

[1] L. T. Bruton, “A 3-D polyphase-DFT cone filter bank for broad band plane wave filtering,” in Proc. IEEE Int. Symp. Circuits Syst., Vancouver, BC, Canada, 2004, vol. 3, pp. 181–184. [2] B. Kuenzle and L. T. Bruton, “3-D IIR filtering using decimated DFT-polyphase filter bank structures,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 53, no. 2, pp. 394–408, Feb. 2006. [3] R. W. Issler and L. T. Bruton, “Tracking and enhancement of objects in image sequences using 3-D frequency planar combined DFT/LDE filters,” in Proc. IEEE Int. Symp. Circuits Syst., May 1990, vol. 2, pp. 999–1002. [4] B. Kuenzle and L. T. Bruton, “A novel low-complexity spatio-temporal ultra wide-angle polyphase cone filter bank applied to subpixel motion discrimination,” in Proc. IEEE Int. Symp. Circuits Syst., Kobe, Japan, 2005, vol. 3, pp. 2397–2400. [5] C. J. Kulach, L. T. Bruton, and N. R. Bartley, “A real-time video implementation of a three-dimensional first-order recursive discrete-time filter,” in Proc. IEEE Int. Symp. Circuits Syst., May 1996, vol. 2, pp. 699–702. [6] L. T. Bruton, “Three-dimensional cone filter banks,” IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 50, no. 2, pp. 208–216, Feb. 2003.



[7] R. K. Bertschmann, N. R. Bartley, and L. T. Bruton, “A 3-D integrator-differentiator double-loop (IDD) filter for raster-scan video processing,” in Proc. IEEE Int. Symp. Circuits Syst., May 1995, vol. 1, pp. 470–473. [8] A. Madanayake and L. T. Bruton, “FPGA prototyping of spatiotemporal 2-D IIR broadband beam plane-wave filters,” in Proc. IEEE 2006 Asia Pacific Conf. Circuits Syst., Singapore, Dec. 2006, pp. 542–545. [9] A. Madanayake and L. Bruton, “A high performance distributed-parallel-processor architecture for 3-D IIR digital filters,” in IEEE Intl. Symp. Circuits Syst., Kobe, Japan, May 2005, vol. 2, pp. 1457–1460. [10] S. Y. Kung, “VLSI array processors: Designs and applications,” in Proc. IEEE Int. Symp. Circuits Syst., May 1988, vol. 1, pp. 313–320. [11] M. A. Sid-Ahmed, “Systolic and semi-systolic realizations of three dimensional filters,” IEEE Trans. Consum. Electron., vol. 40, no. 2, pp. 107–113, Feb. 1994. [12] C. Cheng and K. K. Parhi, “A novel systolic array structure for DCT,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 52, no. 7, pp. 366–369, Jul. 2005. [13] C. M. Rader, “VLSI systolic arrays for adaptive nulling,” IEEE Signal Process. Mag., vol. 13, no. 4, pp. 29–49, Jul. 1996. [14] P. K. Meher, “Design of a fully-pipelined systolic array for flexible transposition-free VLSI of 2-D DFT,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 52, no. 2, pp. 85–89, Feb. 2005. [15] M. Taheri, G. A. Jullien, and W. C. Miller, “High-speed signal processing using systolic arrays over finite rings,” IEEE J. Sel. Areas Commun., vol. 6, no. 3, pp. 504–512, 1988. [16] G. A. Jullien, W. C. Miller, R. Grondin, L. Del Pup, S. S. Bizzan, and Z. Dapeng, “Dynamic computational blocks for bit-level systolic arrays,” IEEE J. Solid-State Circuits, vol. 29, no. 1, pp. 14–22, Jan. 1994. [17] B. G. Mertzios and A. N. Venetsanopoulos, “Fast implementation of 3-D digital filters via systolic array processors,” Multidimensional Syst. Signal Process., vol. 8, no. 
3, pp. 335–349, Jul. 1997. [18] Z. Hu and F. Gaston, “A bit-level systolic 2D-IIR digital filter without feedback,” in Proc. 13th Asilomar Conf. Signals, Syst., Comput., 1996, vol. 2, pp. 1063–1066. [19] B. K. Mohanty and P. K. Meher, “High throughput and low latency implementation of bit-level systolic architectures for 1-D and 2-D digital filters,” IEE Proc. Comput. Digit. Tech., vol. 146, no. 2, pp. 91–99, Mar. 1999. [20] R. Baghaie and V. Dimitrov, “Systolic implementation of real-valued discrete transforms via algebraic integer quantization,” PERGAMON Int. J. Comput. Math. Appl., vol. 41, no. 1, pp. 1403–1416, 2001. [21] J. C. Liberti, Jr. and T. S. Rappaport, Smart Antennas for Wireless Communications-IS-95 and Third Generation CDMA Applications. Upper Saddle River, NJ: Prentice-Hall, 1999. [22] N. Kiyoshi, “Wideband multi-beam forming method using delayed array sensors and two-dimensional digital filter,” Electron. Commun. Japan (Part III: Fundam. Electron. Sci.), vol. 88, no. 12, pp. 1–12, 2005. [23] Y. Yixin, C. Wan, and C. Sun, “Adaptive design of FIR filter with applications in broadband beamforming,” in Proc. TENCON’04, Nov. 2004, vol. A, pp. 507–510. [24] J. D. Frederick, Y. Wang, and T. Itoh, “A smart antenna receiver array using a single RF channel and digital beamforming,” IEEE Trans. Microw. Theory Tech., vol. 50, no. 12, pp. 3052–3058, Dec. 2002. [25] M. Ghavami and R. Kohno, “Frequency selective broadband beamforming using 2-D digital filters,” in Proc. IEEE Veh. Tech. Conf., 2000, vol. 3, pp. 2522–2526. [26] M. Devlin, “How to make smart antenna arrays,” Xilinx Xcell J. Spring, 2003 [Online]. Available: www.xilinx.com/publications/xcellonline/xcell_45/xc_pdf/xc_nallatech45.pdf [27] T. Do-Hong and P. Russer, “Signal processing for wideband smart antenna array applications,” IEEE Microw. Mag., vol. 5, no. 1, pp. 57–67, Mar. 2004. [28] S. W. Ellingson, “A DSP engine for a 64-element array,” in Proc. Perspectives for Radio Astron.—Technol.
Large Antenna Arrays, 1999, pp. 235–242. [29] C. T. Rodenbeck, K. Sang-Gyu, T. Wen-Hua, M. R. Coutant, H. Seungpyo, L. Mingyi, and C. Kai, “Ultra-wideband low-cost phased-array radars,” IEEE Trans. Microw. Theory Tech., vol. 53, no. 12, pp. 3697–3703, Dec. 2005. [30] S. Sevskiy and W. Wiesbeck, Ultra-Broadband Omnidirectional Printed Dipole Arrays vol. 3A, Jul. 2005, pp. 545–548. [31] Z. Shutao and I. L. J. Thng, “Robust presteering derivative constraints for broadband antenna arrays,” IEEE Trans. Signal Process., vol. 50, no. 1, pp. 1–10, Jan. 2002.


[32] A. P. Chippendale, “Technology Issues for Square Kilometre Array Receiver Design,” Australia Telescope National Facility, 1999. [33] A. Van Ardenne, “Concepts of the square kilometre array; toward the new generation radio telescopes,” in Proc. IEEE Int. Symp. Antennas Propag., 2000, vol. 1, pp. 158–161. [34] J. P. Weem, B. M. Noratos, and Z. Popovic, “Broadband array considerations for SKA,” in Proc. Perspectives for Radio Astron.—Technol. Large Antenna Arrays, 1999, pp. 59–68. [35] J. G. DeVaate, “RF-IC developments for wide band phased array systems,” Perspectives on Radio Astronomy—Technologies for Large Antenna Arrays, The Netherlands Foundation for Research in Astronomy, 1999, pp. 143–147. [36] S. Haykin, “Radar signal processing,” IEEE ASSP Mag., vol. 2, no. 2, pt. 1, pp. 2–18, Apr. 1985. [37] J. M. Sill and E. C. Fear, “Tissue sensing adaptive radar for breast cancer detection-experimental investigation of simple tumor models,” IEEE Trans. Microw. Theory Tech., vol. 53, no. 11, pp. 3312–3319, Nov. 2005. [38] P. V. Genderen, “State-of-the-art and trends in phased array radar,” in Perspectives on Radio Astronomy—Technologies for Large Antenna Arrays. Dwingeloo, The Netherlands: ASTRON, 1999. [39] Z. Zheng-Shu, W. M. Boerner, and M. Sato, “Development of a ground-based polarimetric broadband SAR system for noninvasive ground-truth validation in vegetation monitoring,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 9, pp. 1803–1810, 2004. [40] E. M. Staderini, “UWB radars in medicine,” IEEE Aerosp. Electron. Syst. Mag., vol. 17, no. 1, pp. 13–18, Jan. 2002. [41] A. Andriianov, Generators, Antennas and Registrator for UWB Radar Application 2004, pp. 135–139. [42] D. S. Goshi, Y. Wang, and T. Itoh, “A compact digital beamforming SMILE array for mobile communications,” IEEE Trans. Microw. Theory Tech., vol. 52, no. 12, pp. 2732–2738, Dec. 2004. [43] M. Ghavami, L. B. Michael, and R. Kohno, Ultra Wideband Signals and Systems in Communication Engineering.
New York: Wiley, 2004. [44] K. Gold, R. Silva, R. Worrel, and A. Brown, “Space navigation with digital beam steering GPS receiver technology,” in Proc. 59th Annu. Meeting of ION, Albuquerque, NM, Jun. 2003. [45] R. Silva, R. Worrel, and A. Brown, “Reprogrammable, digital beam steering GPS receiver technology for enhanced space vehicle operations,” in Proc. Core Technol. Space Syst. Conf., Colorado Springs, CO, Nov. 2002. [46] A. C. Tan and H. Sun, “Structurally passive synthesis of three-dimensional recursive cone filters,” in Proc. 32nd Midwest Symp. Circuits Syst., Aug. 1989, vol. 2, pp. 1119–1122. [47] M. Bolle, “A closed-form design method for 3-D recursive cone filters,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 1994, vol. 6, pp. 141–144. [48] L. Bruton and N. Bartley, “Highly selective three-dimensional recursive beam filters using intersecting resonant planes,” IEEE Trans. Circuits Syst., vol. CAS-30, no. 3, pp. 190–193, Mar. 1983. [49] Y. T. Chan and G. A. Lampropolus, “Broadband beamforming on the 2-D transform plane,” in Proc. Int. Conf. Acoust., Speech Signal Process., Apr. 1988, vol. 5, pp. 2753–2756. [50] L. Bruton and N. Bartley, “Authors’ reply to comments on ‘Highly selective three-dimensional recursive beam filters using intersection resonant planes’,” IEEE Trans. Circuits Syst., vol. CAS-33, no. 6, pp. 670–673, Jun. 1986. [51] Q. Liu and L. T. Bruton, “Design of 3-D planar and beam recursive digital filters using spectral transformation,” IEEE Trans. Circuits Syst., vol. 36, no. 3, pp. 365–374, Mar. 1989. [52] Y. Zhang and L. T. Bruton, “Differentiator-type three-dimensional recursive ladder filters having frequency-planar- or frequency-beam-shaped passbands,” IEEE Trans. Circuits Syst. Video Technol., vol. 2, no. 3, pp. 297–305, Sep. 1992. [53] L. T. Bruton and N. R. Bartley, “Three-dimensional image processing using the concept of network resonance,” IEEE Trans. Circuits Syst., vol. CAS-32, no. 7, pp. 664–672, Jul. 1985.
[54] L. Bruton and N. Bartley, “The design of highly selective adaptive three-dimensional recursive cone filters,” IEEE Trans. Circuits Syst., vol. 34, no. 7, pp. 775–781, Jul. 1987. [55] L. Bruton and N. Bartley, “The enhancement and tracking of moving objects in digital images using adaptive three-dimensional recursive filters,” IEEE Trans. Circuits Syst., vol. CAS-33, no. 6, pp. 604–612, Jun. 1986.


[56] T. J. Fowlow and L. T. Bruton, “Attenuation characteristics of three-dimensional planar-resonant recursive digital filters,” IEEE Trans. Circuits Syst., vol. 35, no. 5, pp. 595–599, May 1988. [57] Z. Yuejin and L. T. Bruton, “Applications of 3-D LCR networks in the design of 3-D recursive filters for processing image sequences,” IEEE Trans. Circuits Syst. Video Technol., vol. 4, no. 4, pp. 369–382, Aug. 1994. [58] B. Kuenzle and L. T. Bruton, “A novel low-complexity spatio-temporal ultra wide-angle polyphase cone filter bank applied to sub-pixel motion discrimination,” in Proc. IEEE Int. Symp. Circuits Syst., Kobe, Japan, 2005, vol. 3, pp. 2397–2400. [59] L. T. Bruton and S. Singh, “Plane wave filtering using a novel 3-D conestop filter bank,” in Proc. 45th Midwest Symp. Circuits Syst., 2002, vol. 3, pp. 676–679. [60] L. T. Bruton, “Selective filtering of spatio-temporal plane waves using 3-D cone filter banks,” in Proc. IEEE Pacific Rim Conf. Commun., Comput. Signal Process., Aug. 2001, vol. 1, pp. 67–70. [61] L. Qingli and L. T. Bruton, “Sensitivity analysis of 3-D recursive digital beam filter structures,” in Proc. 22nd Asilomar Conf. Signals, Circuits Syst., 1988, vol. 1, pp. 161–165. [62] T. J. Fowlow and L. T. Bruton, “The design and application of a high quality three dimensional linear trajectory filter,” in Proc. IEEE Int. Symp. Circuits Syst., Jun. 1998, vol. 2, pp. 1033–1036. [63] M. S. Lazar and L. T. Bruton, “On the practical BIBO stability of multidimensional filters,” in Proc. IEEE Int. Symp. Circuits Syst., May 1993, pp. 571–574. [64] J. Wong and H. King, “Broadband quasi-taper helical antennas,” IEEE Trans. Antennas Propag., vol. 27, no. 1, pp. 72–78, Jan. 1979. [65] K. Noguchi, S. I. Betsudan, T. Katagi, and M. Mizusawa, “A compact broadband helical antenna with two-wire helix,” IEEE Trans. Antennas Propag., vol. 51, no. 9, pp. 2176–2181, Sep. 2003. [66] A. S.
Andrenko, “Comparative study of wideband properties of planar solid and strip fractal bow-tie dipoles,” Wireless Commun. Appl. Computat. Electromagn., pp. 178–181, Dec. 3–7, 2005. [67] T. Karacolak and E. Topsakal, “A Double-Sided Rounded Bow-Tie Antenna (DSRBA) for UWB communication,” Antennas Wireless Propag. Lett., vol. 5, no. 1, pp. 446–449, 2006. [68] L. Belostotski, J. W. Haslett, and B. Veidt, “Wide-band CMOS low noise amplifier for applications in radio astronomy,” in Proc. IEEE Int. Symp. Circuits Syst., Kos, Greece, May 2006, pp. 1347–1350. [69] B. Razavi, “A 60-GHz CMOS receiver front-end,” IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 17–22, Jan. 2006. [70] L. Belostotski and J. W. Haslett, “Sub-0.2 dB noise figure wideband room-temperature CMOS LNA with non-50-Ω signal-source impedance,” IEEE J. Solid-State Circuits, vol. 42, no. 11, pp. 2492–2502, Nov. 2007. [71] C. Vogel and H. Johansson, “Time interleaved analog-to-digital converters: Status and future directions,” in Proc. IEEE Int. Symp. Circuits Syst., Kos, Greece, May 2006, pp. 3386–3389. [72] J. Elbornsson, F. Gustafsson, and J. E. Eklund, “Analysis of mismatch noise in randomly interleaved ADC system,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2003, vol. 6, pp. 277–280. [73] A. W. Gunst, A. B. J. Kokkeler, and G. W. Kant, “A/D converter research for SKA,” in Perspectives on Radio Astronomy—Technologies for Large Antenna Arrays. Dwingeloo, The Netherlands: ASTRON, 1999, pp. 261–264. [74] S. Shahramian, A. C. Carusone, and S. P. Voinigescu, “Design methodology for a 40-GSamples/s track and hold amplifier in 0.18-μm SiGe BiCMOS technology,” IEEE J. Solid-State Circuits, vol. 41, no. 10, pp. 2233–2240, Oct. 2006. [75] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley, 1999. [76] K. Parhi and D. Messerschmitt, “Look-ahead computation: Improving iteration bound in linear recursions,” in Proc. IEEE Int. Conf.
Acoust., Speech Signal Process., Apr. 1987, vol. 12, pp. 1855–1858. [77] D. E. Dudgeon and R. M. Mersereau, Multidimensional Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1990. [78] H. Schroeder and H. Blume, One- and Multidimensional Signal Processing—Algorithms and Applications in Image Processing. New York: Wiley, 2000. [79] G. Runge, “Implementation of 3-D IIR filters for stream processing applications,” in Proc. IEE Colloq. Multidimensional Syst.: Problems and Solutions, 1998, vol. 4, pp. 1–4. [80] H. L. P. A. Madanayake and L. T. Bruton, “A low-complexity scanned-array 3-D IIR frequency-planar filter,” in Proc. IEEE Int. Symp. Circuits Syst., Kobe, Japan, May 2005, vol. 3, pp. 2032–2035.

[81] C. J. Kulach, L. T. Bruton, and N. R. Bartley, “Real-time 3-dimensional recursive digital filter for video signals,” in Proc. 29th Asilomar Conf. Signals, Syst., Comput., Nov. 1995, vol. 2, pp. 1001–1005. [82] K. Jaemin and J. W. Woods, “Spatio-temporal adaptive 3-D Kalman filter for video,” IEEE Trans. Image Process., vol. 6, no. 3, pp. 414–424, Mar. 1997. [83] V. Sundarajan and K. K. Parhi, “Synthesis of folded multidimensional DSP systems,” in Proc. IEEE Int. Symp. Circuits Syst., May 1998, vol. 2, pp. 433–436. [84] S. E. Sussman-Fort, “Matching network design using non-Foster impedances,” Intl. J. RF Microw. CAE, vol. 16, no. 1, pp. 135–142, 2006. [85] R. M. Rudish and S. E. Sussman-Fort, “Non-Foster impedance matching improves S/N of wideband electrically-small VHF antennas and arrays,” in Proc. IASTED Conf. Antennas, Radar, Wave Propag., Banff, AB, Canada, Jul. 2005, vol. 1, pp. 318–323. [86] W. K. Chen, Broadband Matching: Theory and Implementations, 2nd ed. Teaneck, NJ: World Scientific, 1988, vol. 1. [87] E. H. Newman, “Real frequency wideband impedance matching with non-minimum reactance equalizers,” IEEE Trans. Antennas Propag., vol. 53, no. 11, pp. 3597–3603, Nov. 2005. [88] N. R. Shanbhag, “An improved systolic architecture for 2-D digital filters,” IEEE Trans. Signal Process., vol. 39, no. 5, pp. 1195–1202, May 1991. [89] S. C. Wen, C.-M. Liu, and C.-W. Jen, “The designs of two-level pipelined systolic arrays for recursive digital filters with maximum throughput,” in Proc. IEEE Int. Symp. VLSI Technol., Syst. Applications, May 1991, vol. 1, pp. 317–321. [90] K. K. Parhi and D. G. Messerschmitt, “Concurrent architectures for two-dimensional recursive digital filtering,” IEEE Trans. Circuits Syst., vol. 36, no. 6, pp. 813–829, Jun. 1989. [91] L. Xiaojian and L. T. Bruton, “High-speed systolic ladder structures for multidimensional recursive digital filters,” IEEE Trans. Signal Process., vol. 44, no. 4, pp. 1048–1055, Apr. 1996. [92] M. A.
Sid-Ahmed, “A systolic realization for 2-D digital filters,” IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 4, pp. 560–565, Apr. 1989. [93] H. J. Kaufman and M. A. Sid-Ahmed, “Hardware realization of a 2-D IIR semisystolic filter with application to real-time homomorphic filtering,” IEEE Trans. Circuits Syst. Video Technol., vol. 3, no. 1, pp. 2–14, Feb. 1993. [94] H. J. Kaufman and M. A. Sid-Ahmed, “A switched-capacitor implementation for video rate 2-D filters,” IEEE Trans. Consum. Electron., vol. 39, no. 2, pp. 136–140, May 1993. [95] J.-J. Lee and G.-Y. Song, “Implementations of the super-systolic array for convolution,” in Proc. IEEE Asia South Pacific—Design Autom. Conf., Jan. 2003, pp. 491–494. [96] Z. Qiu, H. Lai, and X. Du, “The modular systolic designs for 2-D IIR digital filters,” in Proc. Int. Conf. Commun. Technol., Beijing, China, vol. 1, pp. 2345–2348. [97] J. Velten, M. Krips, and A. Kummert, “Realization of N-D signal processing tasks in high-speed applications,” in Proc. 4th Int. Workshop Multidimensional Syst., 2005, pp. 24–29. [98] S. Y. Kung, VLSI Array Processors. Englewood Cliffs, NJ: Prentice-Hall, 1988. [99] P. K. Meher, “Systolic designs for DCT using a low-complexity concurrent convolutional formulation,” IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 9, pp. 1041–1050, Sep. 2006. [100] P. K. Meher, “Hardware-efficient systolization of DA-based calculation of finite digital convolution,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 8, pp. 707–711, Aug. 2006. [101] Z. Yuejin and L. T. Bruton, “Applications of 3-D LCR networks in the design of 3-D recursive filters,” in Proc. IEEE Int. Symp. Circuits Syst., 1993, pp. 906–909. [102] L. T. Bruton and T. E. Strecker, “Two-dimensional discrete filters using spatial integrators,” Proc. Inst. Elect. Eng., vol. 130, no. 6, pt. G, pp. 271–275, Dec. 1983. [103] K. K. Parhi and D. G.
Messerschmitt, “Pipelined VLSI recursive filter architectures using scattered look-ahead and decomposition,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., New York, Apr. 1988, pp. 2120–2123. [104] K. K. Parhi, “Pipelining in algorithms with quantizer loops,” IEEE Trans. Circuits Syst., vol. 38, no. 7, pp. 745–754, Jul. 1991. [105] K. K. Parhi, “Finite word effects in pipelined recursive filters,” IEEE Trans. Signal Process., vol. 39, no. 6, pp. 1450–1454, Jun. 1991. [106] R. C. Wrede and M. Spiegel, Advanced Calculus, 2nd ed. New York: McGraw-Hill, 2005, pp. 43–44.



[107] B. Fine and G. Rosenberger, The Fundamental Theorem of Algebra. New York: Springer-Verlag, 1997. [108] P. Agathoklis and L. T. Bruton, “Practical-BIBO stability of N-dimensional discrete systems,” Proc. Inst. Elect. Eng., vol. 130, no. 6, pt. G, pp. 236–242, Dec. 1983. [109] J. G. Proakis and D. G. Manolakis, Digital Signal Processing—Principles, Algorithms, and Applications, 3rd ed. Upper Saddle River, NJ: Prentice-Hall. [110] K. K. Parhi, “Finite word effects in pipelined recursive filters,” IEEE Trans. Signal Process., vol. 39, no. 6, pp. 1450–1454, Jun. 1991. [111] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice-Hall, pp. 416–418. [112] L.-E. Lundgren, “How to Make an ASIC Prototype,” Oct. 2005 [Online]. Available: http://www.fpgajournal.com/articles_2005/20051018_hardi.html [113] “Optimizing Analog Performance in Mixed-Signal Circuit Design, Seminar Training Guide,” Silicon Labs Inc., Austin, TX, 2007 [Online]. Available: www.silabs.com [114] L. Bruton, “Low-sensitivity digital ladder filters,” IEEE Trans. Circuits Syst., vol. CAS-22, no. 3, pp. 168–176, Mar. 1975. [115] Q. Liu and L. T. Bruton, “Design of 3-D planar and beam recursive digital filters using spectral transformation,” IEEE Trans. Circuits Syst., vol. 36, no. 3, pp. 365–374, Mar. 1989. [116] C. K. Hi and W. G. Bliss, “Limit cycle behavior of pipelined recursive digital filters,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 41, no. 5, pp. 351–355, May 1994. [117] D. L. Jones, “Digital filter structures and quantization error analysis,” Connexions, 2004 [Online]. Available: http://cnx.org/content/col10259/latest/ [118] T. I. Laakso, P. S. R. Diniz, I. Hartimo, and T. C. Macedo, Jr., “Elimination of zero-input and constant-input limit cycles in single-quantizer recursive filter structures,” IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 39, no. 9, pp. 638–646, Sep. 1992. [119] D. Denning, J. Irvine, D.
Stark, and M. Devlin, “Multi-user FPGA Co-simulation Over TCP/IP,” in Proc. 15th IEEE Int. Workshop Rapid Syst. Prototyping, Jun. 2004, pp. 151–156. [120] Z. Pohl, J. Schier, M. Licko, A. Hermanek, M. Tichy, R. Matousek, and J. Kadlec, “Logarithmic arithmetic for real data types and support for Matlab/Simulink based rapid-FPGA-prototyping,” in Proc. Int. Symp. Parallel Distrib. Process., Apr. 2003. [121] F. Rivoallon, “Achieving Breakthrough Performance in Virtex-4 FPGAs,” Xilinx Inc., WP218 (v1.4), May 2006 [Online]. Available: www.xilinx.com [122] M. A. Aseeri, M. I. Sobhy, and P. Lee, “Lorenz chaotic model using Field Programmable Gate Array (FPGA),” in Proc. IEEE 45th Midwest Symp. Circuits Syst., 2002, vol. 1, pp. 527–530. [123] A. Madanayake, L. Bruton, and C. Comis, “FPGA architectures for real-time 2D/3D FIR/IIR plane wave filters,” in Proc. IEEE Int. Symp. Circuits Syst., May 2004, vol. 3, pp. 613–616.


[124] A. Madanayake and L. Bruton, “A fully-multiplexed first-order frequency-planar module for fan, beam, and cone plane-wave filters,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 53, no. 8, pp. 697–701, Aug. 2006.

H. L. P. Arjuna Madanayake (S’03) received the B.Sc. degree (with first-class honors) in electronic and telecommunication engineering from the University of Moratuwa, Moratuwa, Sri Lanka, and the M.Sc. degree in electrical engineering from the University of Calgary, Calgary, AB, Canada, where he is currently working toward the Ph.D. degree. He has been with the MD Signal Processing Group, University of Calgary, since July 2002. He was the engineering team leader for a U.S. Fish and Wildlife Service-funded research project in Sri Lanka (June 2001–June 2002), where he helped develop DSP instruments for the detection of elephant infrasound calls. Mr. Madanayake was a recipient of the Informatics Circle of Research Excellence (iCORE) award for Doctoral Studies for International Students in 2005.

Len T. Bruton (F’81) received the B.Sc. degree in electrical engineering from the University of London, U.K., the M.Eng. degree in electrical engineering from Carleton University, Ottawa, ON, Canada, and the Ph.D. degree in electrical engineering from the University of Newcastle upon Tyne, U.K., in 1964, 1967, and 1970, respectively. He is a Professor with the Department of Electrical and Computer Engineering, University of Calgary, Calgary, AB, Canada, where he carries out research in the fields of analogue and digital signal processing with emphasis on MD-CAS applied to the design and implementation of 2D/3D/4D filtering. He is especially interested in the VLSI real-time applications of spatio-temporal filters for array processing. Dr. Bruton is a member of the Royal Society of Canada and was the recipient of the 2002 Education Award of the IEEE Circuits and Systems (CAS) Society, the 50th Jubilee Medal of the IEEE CAS Society, and the 1994 Outstanding Engineer Award of IEEE Canada. In Canada, he received the 1992 Manning Principal Award for Innovation, and he is one of 162 scientists selected for inclusion in the textbook Great Canadian Scientists by Barry Shell (Polestar Book Publishers, 1997). He received the 1992 Alberta Science and Technology (ASTech) Award for Innovation in Science in recognition of his leadership in science and, in 1993, he received the Federal Government of Canada’s 125th Anniversary of Canadian Confederation Medal in recognition of his significant contributions to compatriots, community, and to Canada.

