www.ietdl.org Published in IET Circuits, Devices & Systems Received on 11th March 2008 Revised on 8th October 2008 doi: 10.1049/iet-cds:20080072

ISSN 1751-858X

Field programmable gate array-based design and realisation of automatic censored cell averaging constant false alarm rate detector based on ordered data variability

A.M. Alsuwailem1, S.A. Alshebeili1, M.H. Alhowaish2, S.M. Qasim1

1 Department of Electrical Engineering, King Saud University, Riyadh 11421, Saudi Arabia
2 Communications and Information Technology Commission, Riyadh, Saudi Arabia
E-mail: [email protected]

Abstract: The design and field programmable gate array (FPGA)-based realisation of an automatic censored cell averaging (ACCA) constant false alarm rate (CFAR) detector based on ordered data variability (ODV) are discussed here. The ACCA–ODV CFAR algorithm has recently been proposed in the literature for detecting radar targets in non-homogeneous background environments. The ACCA–ODV detector estimates the unknown background level by dynamically selecting a suitable set of ranked cells through successive hypothesis tests. The detector does not require any prior information about the background environment. It uses the variability index statistic as a shape parameter to accept or reject the ordered cells under investigation. Recent advances in FPGA technology and the availability of sophisticated design tools have made it possible to realise the computation-intensive ACCA–ODV detector in hardware in a cost-effective way. The architecture is modular and has been implemented and tested on an Altera Stratix II FPGA using Quartus II software. The post place-and-route results show that the proposed design can operate at 100 MHz, the maximum clock frequency of the prototyping board, and at this frequency the total processing time required to perform a single run is 0.21 µs. This amounts to a speedup by a factor of 110 for the FPGA-based hardware implementation compared with the software-based implementation, which takes 23 µs to perform the same operation.

1 Introduction

Radar is an acronym for Radio Detection and Ranging. Radar is an electromagnetic system used for the detection, location and sometimes recognition of objects (or targets). It operates by transmitting electromagnetic energy and then extracting the necessary information about the target from the returned echo signal. Radar system analysis provides quantitative estimates of performance in all the radar's functions, for a variety of target types in a variety of complex environments. Radar has been used, or proposed for use, in a wide range of applications, both in military and civilian systems [1].

© The Institution of Engineering and Technology 2009
IET Circuits Devices Syst., 2009, Vol. 3, Iss. 1, pp. 12–21, doi: 10.1049/iet-cds:20080072

The received signal in a radar system is always accompanied by thermal noise and clutter. Clutter is the term applied to any unwanted radar signal from scatterers that are not of interest to the radar user. Examples of clutter in radar signal detection are reflections from terrain, sea, rain, birds, insects, chaff and so on [2]. The performance of the radar receiver depends greatly on the presence of such disturbances, and the receiver is required to maintain a constant false alarm rate (CFAR) while maximising the probability of target detection. In a radar system, a target is detected when the output of the receiver crosses a predetermined fixed threshold level set to achieve a specified probability of false alarm (Pfa). Modern radars usually make the detection decision automatic by using an adaptive threshold based on a CFAR detector. This detector dynamically determines a detection threshold by estimating the local background noise/clutter power and multiplying this estimate by a scaling constant based on the desired Pfa.

Although software-based implementation is very flexible, running the whole CFAR processing in software may degrade the performance of the processor. Hence, to accelerate the processing, it is proposed to realise the computationally intensive CFAR detector in field programmable gate array (FPGA) hardware. FPGAs are a form of programmable logic. They offer design flexibility like software, but with time performance closer to that of application-specific integrated circuits. Recent advances in FPGA technology have opened up enormous possibilities for implementing sophisticated algorithms of high complexity, in a variety of important applications, using low cost, high performance and high speed reconfigurable hardware. FPGAs have become one of the prevailing technologies for fast prototyping and implementation of complex digital systems [3].

In this paper, a novel FPGA-based design and realisation of a complex automatic censored cell averaging (ACCA) CFAR detector based on ordered data variability (ODV), proposed in [4], are presented.

The rest of this paper is organised as follows. Section 2 describes the background information on CFAR theory and the related work done towards the hardware realisation of different CFAR processors. Section 3 describes the ACCA–ODV detection algorithm. The hardware architecture of the proposed CFAR detector is discussed in detail in Section 4. Section 5 provides the FPGA-based realisation and simulation results. FPGA prototyping results are discussed in Section 6. Finally, Section 7 presents concluding remarks and some directions for future research.

2 Background theory and related work

A typical CFAR processor, as shown in Fig. 1, consists of a matched filter followed by an envelope detector and a non-coherent integrator. The output samples of the non-coherent integrator are fed sequentially into a shift register. The statistic Z, which is proportional to the estimate of the total noise power, is formed by processing the contents of the reference cells surrounding the cell under test (CUT), whose content is Y. To maintain Pfa at the desired value, Z is multiplied by a scaling factor called the threshold multiplier T. The product TZ is the resulting adaptive threshold. The output Y from the CUT is then compared with this threshold in order to make a decision. A target is declared to be present if Y exceeds TZ.

Figure 1 Block diagram of a typical CFAR detector

The processor configuration varies with different CFAR schemes. For example, the cell averaging (CA) CFAR processor sums the contents of the surrounding cells to produce the statistic Z, that is

$$Z = \sum_{i=1}^{N} X_i \tag{1}$$
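The decision rule built on (1) can be illustrated with a minimal Python sketch. The function name `ca_cfar_detect` and the numerical values are hypothetical and serve only to show the comparison of Y with TZ; they are not taken from the implementation described in this paper.

```python
def ca_cfar_detect(reference_cells, cut, multiplier):
    """Return True when the cell under test Y exceeds T * Z, with Z from (1)."""
    z = sum(reference_cells)        # Z: sum of the N reference cells
    return cut > multiplier * z     # a target is declared if Y > T Z


# Hypothetical numbers, chosen only to illustrate the comparison.
noise = [1.0, 0.8, 1.2, 0.9, 1.1, 1.0, 0.7, 1.3]       # N = 8 cells, Z is about 8
print(ca_cfar_detect(noise, cut=6.0, multiplier=0.5))   # Y = 6 against T Z of about 4
print(ca_cfar_detect(noise, cut=3.0, multiplier=0.5))   # Y = 3 against T Z of about 4
```

In practice the multiplier T would be chosen from the design Pfa rather than fixed by hand as it is here.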

For homogeneous environments, the CA–CFAR processor is optimum. However, in the presence of interfering targets, the assumption of a homogeneous environment is no longer valid, and the performance of the CA–CFAR processor degrades seriously. Various classes of CFAR techniques have been proposed to enhance the robustness against non-homogeneous environments for different applications [5]. In particular, ordered statistics (OS)-based CFAR detectors have been shown to provide good performance in the presence of interference. The clutter power estimate Z, in OS–CFAR detectors, is computed by sorting the observations in the reference window in ascending order and setting

$$Z = X_{(k)} \tag{2}$$

where X(k) is the kth ordered sample. The rank k of the order statistic to be used is determined in advance. It can take any value in the range 1 ≤ k ≤ N, and is typically chosen to maximise detection performance. The OS–CFAR detector has a small additional detection loss over the CA–CFAR detector in homogeneous backgrounds, but can resolve closely spaced interferences. However, it requires a longer processing time than the CA–CFAR detector.

Efficient hardware realisations of different CFAR detectors have been considered by many authors. In particular, a configurable hardware architecture for adaptive processing of noisy signals for target detection based on CFAR algorithms has been presented in [6]. The architecture has been designed to deal with parallel/pipeline processing and to be configured for three versions of CFAR algorithms: the cell average, the max and the min CFAR [7]. In [8], a hardware realisation of a CA–CFAR processor using discrete components has been reported. The design suffers from several drawbacks as compared to the previous work, such as low resolution in data element size, small array averaging, no safeguard cells, and a non-configurable and complex design. In [9], a real-time design of a more complex CFAR detector, taking advantage of the opportunity offered by FPGAs, is developed. The system considered in [9] combines ordering with arithmetic averaging. Such a detector is known as trimmed mean (TM) filtering in the signal processing literature [2, 5]. The TM–CFAR detector reduces to the CA–CFAR and OS–CFAR detectors for specific trimming values.

The CFAR detectors discussed so far have been developed under the assumption of a homogeneous background. In practice, the environment is usually non-homogeneous because of the presence of multiple targets and/or clutter edges in the reference window, which consists of a finite number of range samples of the received radar signal. In such situations, OS detectors have been known to yield good performance as long as the non-homogeneous background and outlying returns are properly discarded [5]. However, most of the work in the literature considers some type of censoring based on a priori knowledge or a judicious guess. Techniques based on the automatic censoring of unwanted cells have been proposed in the literature [2]. The ACCA–ODV CFAR detector dynamically selects, through successive hypothesis tests, a suitable set of ranked reference window cells to estimate the unknown background level and set the adaptive threshold accordingly. The advantage of this detector is that it requires neither prior information about the clutter parameters nor the number of interfering targets. The effectiveness of the ACCA–ODV algorithm has been extensively studied in [4] by computing the probability of censoring and the probability of detection in different background environments.
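The remark that the TM–CFAR statistic reduces to the CA and OS statistics for specific trimming values can be checked with a short Python sketch. The function name `tm_cfar_statistic`, its two trimming parameters and the sample window are illustrative assumptions, not the notation of [9].

```python
def tm_cfar_statistic(reference_cells, trim_low, trim_high):
    """Sum the ordered cells left after trimming both ends of the sorted window."""
    ordered = sorted(reference_cells)
    kept = ordered[trim_low:len(ordered) - trim_high]
    if not kept:
        raise ValueError("trimming removed every reference cell")
    return sum(kept)


window = [3.0, 1.0, 9.0, 2.0, 4.0]            # hypothetical; 9.0 plays an interferer
ca = tm_cfar_statistic(window, 0, 0)          # no trimming: the CA sum of (1)
os_k4 = tm_cfar_statistic(window, 3, 1)       # only X(4) survives: (2) with k = 4
tm = tm_cfar_statistic(window, 0, 1)          # largest cell censored
print(ca, os_k4, tm)                          # 19.0 4.0 10.0
```

Censoring only the largest cell removes the interferer from the background estimate, which is the effect the automatic censoring of Section 3 aims to obtain without knowing the number of interferers in advance.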

3 ACCA–ODV detection algorithm

The square-law detected range samples, {Xi: i = 0, 1, ..., N}, are sent serially into a tapped delay line, as shown in Fig. 2. X0 is the test cell. The remaining N cells surrounding the test cell are the auxiliary cells that are used to construct the CFAR procedure. These auxiliary cells are ranked in ascending order according to their magnitudes to yield

$$X_{(1)} \le X_{(2)} \le \cdots \le X_{(p)} \le \cdots \le X_{(N)} \tag{3}$$

Figure 2 Block diagram of ACCA–ODV CFAR detector

The test cell X0 is to be compared with the threshold Tk to decide whether a target is present or not. Selecting

$$T_k = t_k \sum_{i=1}^{j} X_{(i)} \tag{4}$$

leads to a CFAR processor in Rayleigh clutter. The threshold Tk is parameterised by the variable tk. The subscript j is taken to represent the largest rank possible, since the CFAR loss would increase with a decrease in the value of j. In particular, the numerical results obtained in [5] show that the appropriate value of j, when detection is performed in homogeneous environments, is j = N. However, in the presence of k interfering targets in the reference window, the value of j is best selected such that j = N − k. Therefore the main task of the ACCA–ODV censoring algorithm is to determine the best value of k. Once the number of interfering targets is determined automatically, the output of the test cell X0 is compared with the adaptive threshold Tk according to

$$X_0 \underset{H_0}{\overset{H_1}{\gtrless}} T_k \tag{5}$$

where the adaptive threshold Tk (or equivalently the parameter tk) is selected so that the design Pfa is achieved. Hypothesis H1 denotes the presence of a target in the test cell, whereas H0 is the null hypothesis (i.e. no target is present). To determine the number of interfering targets k, the ODV statistic V0 is first compared with the ODV threshold S0, which is selected so that a low probability of false censoring Pfc is maintained. The statistic V0 is defined as follows

$$V_0 = \frac{m_p + X_{(N)}^2}{\left(s_p + X_{(N)}\right)^2} \tag{6}$$

where

$$s_p = \sum_{i=1}^{p} X_{(i)} \tag{7}$$

and

$$m_p = \sum_{i=1}^{p} X_{(i)}^2 \tag{8}$$

The parameter p has to be carefully selected to yield a robust performance in both homogeneous and non-homogeneous environments. Values of p > N/2 have been found to yield a reasonable performance [4].

If V0 < S0, the algorithm decides that X(N) corresponds to a clutter sample without interference, and it terminates. If, on the other hand, V0 > S0, the algorithm decides that the sample X(N) is a return echo from an interfering target. In this case, X(N) is censored and the algorithm proceeds to compare the statistic V1 with the threshold S1 to determine whether X(N − 1) corresponds to an interfering target or a clutter sample without interference. In this case, we have

$$V_1 = \frac{m_p + X_{(N-1)}^2}{\left(s_p + X_{(N-1)}\right)^2} \tag{9}$$

At the (k + 1)th step, the ODV statistic Vk is compared with the threshold Sk and a decision is made according to the test

$$V_k \underset{H_0}{\overset{H_1}{\gtrless}} S_k \tag{10}$$

where

$$V_k = \frac{m_p + X_{(N-k)}^2}{\left(s_p + X_{(N-k)}\right)^2}$$

Hypothesis H1 represents the case where X(N − k), and thus the subsequent samples X(N − k + 1), X(N − k + 2), ..., X(N), correspond to clutter samples with interference, whereas H0 denotes the case where X(N − k) is a clutter sample without interference.

The successive tests are repeated as long as the hypothesis H1 is declared true. The algorithm stops when the cell under investigation is declared homogeneous (i.e. clutter sample only) or, in the extreme case, when all the N − p highest cells have been tested (i.e. k = N − p).

It is quite clear from Fig. 2 that the threshold selection is a key element in the implementation of the ACCA–ODV algorithm. The threshold parameter tk is determined for a design Pfa by [4, 5]

$$P_{fa}(k) = \binom{N}{N-k} \prod_{j=1}^{N-k} \left[ t_k + \frac{N-j+1}{N-k-j+1} \right]^{-1} \tag{11}$$

As for Sk, these thresholds are selected such that a low probability of hypothesis test error is achieved in a homogeneous environment. For the ACCA–ODV algorithm, this probability is defined, at each value of k, as

$$e_k = \mathrm{Prob}(V_k > S_k \mid \text{homogeneous environment}) \tag{12}$$

The ODV thresholds Sk are selected such that a low Pfc is maintained at each step [4]. Hence, the values of Sk are determined by setting

$$e_0 = e_1 = \cdots = e_{N-p-1} = \text{design } P_{fc} \tag{13}$$

Table 1 gives the threshold parameter Sk obtained using the ACCA–ODV algorithm in a Gaussian homogeneous background [4].

Table 1 ODV thresholds Sk in a homogeneous background with exponential probability density function (pdf)

(N, p)     Pfc          S0     S1     S2     S3     S4     S5     S6     S7
(16, 12)   10^-2        0.356  0.246  0.199  0.173  -      -      -      -
           5 × 10^-3    0.389  0.267  0.213  0.183  -      -      -      -
           10^-3        0.456  0.320  0.246  0.206  -      -      -      -
(24, 16)   10^-2        0.332  0.235  0.189  0.162  0.143  0.131  0.122  0.117
           5 × 10^-3    0.362  0.255  0.204  0.173  0.152  0.138  0.129  0.122
           10^-3        0.422  0.305  0.240  0.200  0.174  0.155  0.142  0.133

4 ACCA–ODV CFAR detector architecture

The proposed detector comprises the following main modules: shift register, sorting and censoring module, parallel adder, multiplier, comparator and timing/control unit, as shown in Fig. 3. Fig. 4 represents the flow chart of the ACCA–ODV CFAR algorithm. For illustration, a reference window of length N = 16 is taken into consideration. The number of guard cells and the parameter p are fixed at 2 and 12, respectively.

The serial-in parallel-out shift register consists of N reference cells surrounding the test cell, which is located at the centre tap. The samples of the reference window are divided into two symmetrical groups of N/2 cells each: leading (right side) and lagging (left side). In addition, there are G safeguard cells, which are divided into two symmetrical groups separating the test cell from the reference cells on both sides. Guard cells are used to avoid any spill of signal energy from the test cell into the adjacent reference cells. The raw data samples of the signal to be processed are received from the radar system in digital form and fed to the shift register in a serial manner. The length L of the shift register is given by

$$L = N + G + 1 \text{ cells} \tag{14}$$
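The window layout implied by (14) can be made concrete with a short Python sketch. It assumes that N and G are both even and that the lagging group occupies the lower indices of the register; the function name `split_window` and the index convention are illustrative assumptions rather than details of the hardware.

```python
def split_window(shift_register, n_ref, n_guard):
    """Split a register of L = N + G + 1 cells into the test cell and reference cells."""
    length = n_ref + n_guard + 1                     # shift register length from (14)
    if len(shift_register) != length:
        raise ValueError(f"expected {length} cells, got {len(shift_register)}")
    centre = length // 2                             # the test cell sits at the centre tap
    half_guard = n_guard // 2                        # guard cells on each side
    half_ref = n_ref // 2                            # reference cells on each side
    lagging = shift_register[centre - half_guard - half_ref:centre - half_guard]
    leading = shift_register[centre + half_guard + 1:centre + half_guard + 1 + half_ref]
    return shift_register[centre], lagging + leading


register = list(range(19))                           # N = 16, G = 2, so L = 19
cut, refs = split_window(register, 16, 2)
print(cut)     # 9
print(refs)    # [0, 1, 2, 3, 4, 5, 6, 7, 11, 12, 13, 14, 15, 16, 17, 18]
```

Cells 8 and 10 are the two guard cells and are excluded from the merged reference group that is passed on for sorting.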

The two N/2-cell reference groups are merged into one N-cell output group and sent to the sorting circuit, which arranges the samples in ascending order (the highest values are on the right side). The sorting circuit is the most demanding part of this design as far as circuit size and processing time are concerned, because the sorting must be done sequentially for the N cells. After sorting, the subsequent operations are as shown in detail in Fig. 5, where a certain number of data cells at the array edge are subjected to an automatic censoring mechanism from one side (the right side in our case). The censoring operation is very beneficial in minimising the estimation error of the background, and is performed according to the background configuration. The basic idea of this circuit is to take the p lowest cells (p = 12 in our case) as the initial estimate of the background level and then use this estimate to compute the number of interfering targets k.

Figure 3 ACCA–ODV CFAR detector architecture

Figure 4 Flow chart of ACCA–ODV CFAR detector

Figure 5 Basic blocks of ACCA–ODV CFAR detector

Background noise is estimated by first computing the values of sp and mp. Once computed, they can be used with the sorted samples (X13–X16) to determine the ODV statistics (V0–V3). The ODV statistics are fed, together with the censoring thresholds (S0–S3), to four parallel comparators, the output of which is a binary word of length four bits. This binary code is applied to a mask circuit, which determines the number of cells to be censored. The resulting uncensored cells and the initial population cells (X1–Xp) are applied to the parallel adder simultaneously, and the result is multiplied by the pre-calculated scale value tk using a binary multiplier. Finally, the multiplier output is compared with the original test cell using a magnitude comparator circuit, which produces (as a one-bit flag output) a high state when the test cell is higher than the threshold (tk × Z) and a low state otherwise.

The timing and control module is the heart of the proposed CFAR detector and is responsible for initialising the process and for generating the various synchronisation and initialisation clock pulses and enable signals. It typically consists of several counters with different configurations and time constants. The circuit is initialised by the external reset signal to resume the processor operations. After the system initialisation, the shift register is filled with L useful data cells. This is achieved by shifting the data right L times. This operation requires L clock cycles and is performed only once; hence it is not counted in determining the processor speed. The total time (Ttotal) required to execute a single run for detecting an object is determined by two factors, namely, the number of clock cycles required to perform a single run (Tclk) and the maximum clock frequency (Fmax) of the FPGA chip employed, that is

$$T_{total} = \frac{T_{clk}}{F_{max}} \text{ s} \tag{15}$$

Fmax entirely depends on the FPGA family selected, whereas the required number of clock cycles Tclk is composed of the following: three clock cycles for shifting the data one cell, two clock cycles for the multiply and compare operations (one clock cycle each) and N clock cycles for the sorting algorithm. Therefore Ttotal changes linearly with the number of reference cells N, and is given by

$$T_{total} = \frac{5 + N}{F_{max}} \text{ s} \tag{16}$$

4.1 Parallel architecture of censoring module

In order to reduce the processing time of the censoring module, a two-level architecture based on a parallel approach is adopted [4]. In Level 1, the arithmetic block computes the ODV statistics. Since there is no dependency between the input/output data of these tasks, the statistics Vk can be processed simultaneously. Each of these tasks needs the corresponding ranked cell X(N − k), the corresponding square X²(N − k) and the two quantities sp and mp. Each Vk calculation requires two fixed-point additions, one fixed-point multiplication and one fixed-point division. sp and mp are provided by a previous task requiring p fixed-point additions for each computation. In Level 2, since the decisions dk are provided simultaneously, a logic block is necessary to generate the binary code (Kq−1, ..., K1, K0) of the estimated number of censored cells, by treating the binary inputs dk globally [4]. The sequence of operations for the execution of the censoring algorithm is illustrated in Fig. 6. The censoring circuit automatically eliminates the interfering targets found in the largest four cells of the sorted cells.

For a given (N, p), the binary representation needs q bits such that 2^q − 1 ≥ N − p, and the Boolean functions (gq−1, ..., g1, g0) are defined by analysing the input states for each bit of the code (Kq−1, ..., K1, K0). To better illustrate the binary code generation of k, consider the case (N = 16, p = 12). In this case, the range of k is [0, 4] and therefore its code can be represented by q = 3 bits. Table 2 represents the input states of the binary decision dk for all the values of k [4].

The architecture of the censoring module, as shown in Fig. 6, consists of six different units: the mp, sp and Vk circuits, a comparator, a converter unit and a mask unit. These components are used to perform censoring on the remaining/largest cells of the sorted cells (X16, X15, X14 and X13), as shown in Fig. 5.

Figure 6 Architecture of censoring module

The mp unit computes the formula indicated by (8). This unit consists of a parallel multiplier and an adder: each of the lowest cells is multiplied by itself, and the resulting squared outputs are then applied simultaneously to the parallel adder in order to compute the estimate of mp. The sp estimation circuit consists of a parallel adder, which adds the lowest cells together simultaneously to satisfy (7). As is evident from (9), the Vk unit is composed of two parallel adders, two multipliers and a divider. The task of this unit is first to add mp to the square of X16; the result of this addition is then divided by the square of the sum of sp and X16 to find V0. This operation is repeated three times to find V1, V2 and V3. Then, V0 is compared with S0 to determine the binary decision d0. If V0 is greater than S0 the decision is logic 1, else the decision is logic 0. This operation is repeated for all Vk to find the remaining binary decisions, so four comparators are required to implement this operation in hardware. The result of the four comparators is a binary decision word dk (dk = d3d2d1d0), which is passed to a converter to determine the specific mask code Mk that points to the number of censored cells, and the related threshold factor tk, based on the converter lookup table shown in Table 2.

Table 2 Converter lookup table

Binary decision    Represented code   No. of censored   Mask code        Threshold factor tk
d3  d2  d1  d0     K2  K1  K0         cells k           M3  M2  M1  M0
-   -   -   0      0   0   0          0                 1   1   1   1    0101010101 = (341)10
-   -   0   1      0   0   1          1                 1   1   1   0    0110110111 = (439)10
-   0   1   1      0   1   0          2                 1   1   0   0    1000100101 = (549)10
0   1   1   1      0   1   1          3                 1   0   0   0    1010101100 = (684)10
1   1   1   1      1   0   0          4                 0   0   0   0    1101011000 = (856)10

-: do not care

Figure 7 Architecture of mask unit

The main task of the mask unit is to determine the uncensored cells. It consists of four parallel AND gates as shown in Fig. 7. It is basically an AND operation between each cell of the remaining/largest cells and its mask code (i.e. X16 AND M0, X15 AND M1 etc.). It censors a cell when its mask code is equal to logic 0 and accepts it when the mask code is equal to logic 1. Table 3 illustrates the accepted and censored cells for different situations. Finally, the output of the censoring module consists of those remaining/largest cells that are not censored, and the threshold factor related to the number of censored cells.
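The threshold factors held in the converter can be related to (11) with a short Python sketch. It evaluates (11) for a given multiplier and solves for the multiplier by bisection. The function names, the bisection bounds and the reading of the 10-bit codes of Table 2 as tk scaled by 2^10 are assumptions made here for illustration; the paper does not state the fixed-point scaling or the design Pfa behind these codes.

```python
from math import comb


def pfa(t, n, k):
    """False alarm probability of (11) for multiplier t with k of the n cells censored."""
    prob = float(comb(n, n - k))
    for j in range(1, n - k + 1):
        prob /= t + (n - j + 1) / (n - k - j + 1)
    return prob


def solve_multiplier(design_pfa, n, k, hi=1e3, iterations=200):
    """Bisect for the t that meets the design Pfa; pfa() decreases as t grows."""
    lo = 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if pfa(mid, n, k) > design_pfa:
            lo = mid          # threshold too low, false alarms too frequent
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Assumed scaling: a 10-bit code divided by 2**10 gives the multiplier.
t0 = 341 / 1024
print(round(pfa(t0, 16, 0), 4))                    # 0.0101
print(round(solve_multiplier(1e-2, 16, 0), 4))     # 0.3335
```

Under these assumptions the first code of Table 2 is close to the multiplier that (11) gives for N = 16, k = 0 and a Pfa of 10^-2, which suggests, without confirming, that the lookup values were generated in this way.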

Table 3 Mask lookup table

No. of censored   Masking code       Accepted cells
cells k           M3  M2  M1  M0     X13  X14  X15  X16
0                 1   1   1   1      X13  X14  X15  X16
1                 1   1   1   0      X13  X14  X15  0
2                 1   1   0   0      X13  X14  0    0
3                 1   0   0   0      X13  0    0    0
4                 0   0   0   0      0    0    0    0
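The two-level censoring logic of this section can be summarised in a floating-point Python sketch. It computes sp, mp and all Vk, forms the decisions dk, and derives k as the run of ones starting at d0, which is the pattern of Table 2. The function names and the sample window are hypothetical, and the fixed-point details of the hardware are not modelled.

```python
def odv_statistic(sp, mp, x):
    """V_k of (10): (m_p + x^2) / (s_p + x)^2 for one candidate cell x."""
    return (mp + x * x) / ((sp + x) ** 2)


def censor(sorted_cells, p, thresholds):
    """Return (k, mask, decisions) for an ascending window of N cells."""
    n = len(sorted_cells)
    lowest = sorted_cells[:p]
    sp = sum(lowest)                               # (7)
    mp = sum(x * x for x in lowest)                # (8)
    # Level 1: every V_k is independent, so hardware can evaluate them in parallel.
    decisions = [
        odv_statistic(sp, mp, sorted_cells[n - 1 - k]) > thresholds[k]
        for k in range(n - p)
    ]
    # Level 2: k is the number of consecutive ones starting at d0 (Table 2).
    k = 0
    while k < len(decisions) and decisions[k]:
        k += 1
    # mask[i] gates the (i + 1)th largest cell, so mask[0] is M0; 0 means censored.
    mask = [0 if i < k else 1 for i in range(n - p)]
    return k, mask, decisions


S = [0.356, 0.246, 0.199, 0.173]         # Table 1, (N, p) = (16, 12), Pfc = 10^-2
cells = [1.0] * 14 + [50.0, 60.0]        # hypothetical sorted window, two interferers
k, mask, decisions = censor(cells, p=12, thresholds=S)
print(k, mask)                           # 2 [0, 0, 1, 1]
```

Here the two largest cells are censored, which corresponds to the row k = 2 of Table 3 (mask code M3 M2 M1 M0 = 1100). Counting the leading ones gives the same k as the sequential tests of Section 3, which stop at the first cell declared homogeneous.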

4.2 Architecture of sorting module

Sorting is the operation that puts the elements of a list in a certain order. This operation plays an important role since it consumes long computation time and constitutes a bottleneck in real-time signal processing applications [9]. The sorting algorithm can work in ascending order (data elements are sorted from the smallest to the largest) or in descending order (data elements are sorted from the largest to the smallest). Since the processing time is critical, choosing the fastest and most efficient method of data sorting is of great interest. In this work, the bubble sorting algorithm is adopted. It is one of the best sorting algorithms that combine high speed and simplicity for applications that involve a small number of elements [10, 11].

The bubble sort algorithm compares every two adjacent elements and decides which one is greater. As shown in Fig. 8, each compare–swap switch circuit is simply composed of a comparator circuit, a 2:1 MUX and output flip-flops. Its main task is to compare the two input elements: if the first element is greater than the second, the two elements are swapped; otherwise no change is made. This operation is repeated for each pair of adjacent elements until the end of the entire data array. The process can be performed simultaneously for all adjacent pairs, so as to reduce the processing time through what is called parallel bubble sorting. The next step in this module is to repeat the operation described above on the array obtained from the previous stage, in a serial manner and in synchronised clocked stages, until the end. Note that if the number of elements to be sorted is N, the number of stages will be N; hence, the total number of compare–swap circuits is given by

$$\text{No. of units} = N(N-1)/2 \tag{17}$$

5 FPGA realisation and simulation

The ACCA–ODV CFAR detector has been designed, synthesised and simulated using Altera Quartus II software [12] targeting the Stratix II FPGA. Quartus II provides a complete design environment for designing a system-on-a-programmable-chip. It offers a very rich library of parameterised modules that can be utilised to construct the different processing units used in this design. The designed CFAR detector is modular, which enables the designer to test the various modules individually.
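When modules are tested individually, a software reference model of the sorter of Section 4.2 can be useful for checking the stage and unit counts. The Python sketch below organises the parallel bubble sort as an odd-even transposition network, in which even stages compare the pairs (0,1), (2,3), ... and odd stages compare the pairs (1,2), (3,4), .... That the stages of Fig. 8 are paired exactly in this way is an assumption, and the function name is hypothetical.

```python
def parallel_bubble_sort(cells):
    """Sort ascending in N stages of simultaneous adjacent compare-swaps."""
    data = list(cells)
    n = len(data)
    units = 0
    for stage in range(n):
        # Pairs within one stage do not overlap, so hardware can swap them at once.
        for i in range(stage % 2, n - 1, 2):
            units += 1                     # one compare-swap circuit per pair per stage
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
    return data, units


worst_case = list(range(16, 0, -1))        # 16 cells in descending order
ordered, units = parallel_bubble_sort(worst_case)
print(ordered == sorted(worst_case))       # True
print(units)                               # 120
```

With this pairing, N = 16 gives 16 stages and 120 compare-swap units, in agreement with (17).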

Figure 8 Architecture of compare–swap circuit

For illustration, a shift register of 19 cells (16 bits each), comprising 16 reference cells, 2 guard cells and the test cell, is considered. Hence, the total number of clock cycles Tclk required is 21, the number of stages for the sorting circuit is 16 and the total number of compare–swap switch circuits is 120. For simulation, two memories and their associated read/write control signals and address generation unit are built inside the FPGA device, as shown in the simplified block diagram of Fig. 9. A 256 × 16 ROM is used, which receives the data serially, stores it and is then read by the rest of the hardware. The resulting flags decided by the thresholding modules are stored in a 256 × 1 RAM. Table 4 summarises the FPGA hardware resource utilisation of the different modules and of the proposed CFAR detector. The FPGA implementation result shows that the processor can achieve a maximum operating frequency of 109.37 MHz, which is very close to the clock frequency of the prototyping board (100 MHz). This implies that the total processing time Ttotal (for N = 16) to perform a single run at the 100 MHz board clock is 0.21 µs:

$$T_{total} = 21/100 = 0.21\ \mu\text{s}$$

Figure 9 Block diagram of hardware simulation setup

In the absence of other hardware implementations, the extent of the speedup obtained by our hardware implementation has been assessed by implementing the ACCA–ODV algorithm in software. The full implementation of the ACCA–ODV CFAR detector was carried out in the C language, targeted at a general purpose PC (3.4 GHz Pentium 4 processor with 1 GB of on-board RAM running Microsoft Windows XP Professional). For the same (N, p) configuration, the processing time on this platform is 23 µs. The proposed ACCA–ODV CFAR hardware architecture is therefore about 110 times faster than the software implementation of the same algorithm.
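The timing figures quoted in this section follow from (16) and can be reproduced with a few lines of Python. The function names are illustrative; the numerical inputs are the values reported above.

```python
def clock_cycles(n_ref):
    """T_clk of (16): 3 shift cycles, 1 multiply, 1 compare and N sorting cycles."""
    return 3 + 2 + n_ref


def total_time_us(n_ref, clock_mhz):
    """T_total in microseconds for a clock frequency given in MHz."""
    return clock_cycles(n_ref) / clock_mhz


hardware_us = total_time_us(16, 100.0)         # board clock of 100 MHz
software_us = 23.0                             # reported time of the C implementation
print(clock_cycles(16))                        # 21
print(round(hardware_us, 2))                   # 0.21
print(round(software_us / hardware_us, 1))     # 109.5
```

The ratio of about 109.5 is what is quoted as a speedup of 110.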

6 FPGA prototyping

The proposed ACCA–ODV CFAR detector, including the associated input data ROM and the output target detection RAM, has been implemented on a Stratix II FPGA chip. The EP2S60 digital signal processing (DSP) development kit [13], built around the Stratix II device, has been selected for prototyping the proposed CFAR detector because of its low cost, its configurability and the fact that its operating master clock is 100 MHz, which is very close to the maximum frequency determined by the compiler (109.37 MHz). This kit is a development platform for high performance DSP designs. It is normally employed to design, verify and evaluate systems prior to final stand-alone single chip implementation. The in-circuit memory content editor provided with the Quartus II software provides read and write access to in-system FPGA memories. The data is read in hexadecimal (HEX) format while the processor is running at maximum speed. The designed CFAR detector has been tested and verified by generating 256 data samples drawn from an exponential distribution function. The data set is downloaded to the 256 × 16 ROM and the output array indicating the target presence result is stored in the 256 × 1 RAM.
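A test vector of the kind described above could be prepared along the following lines. This is a minimal sketch: the mean of the exponential distribution, the clipping to the 16-bit range, the seed and the function names are all assumptions, and the output is a plain list of hexadecimal words rather than any particular memory initialisation file format.

```python
import random


def make_test_vector(count=256, mean=1000.0, bits=16, seed=1):
    """Draw exponential samples and clip them to unsigned words of the given width."""
    rng = random.Random(seed)                  # fixed seed for a repeatable data set
    top = (1 << bits) - 1
    return [min(top, int(rng.expovariate(1.0 / mean))) for _ in range(count)]


def to_hex_words(words, bits=16):
    """Format each word as fixed-width uppercase hexadecimal text."""
    digits = (bits + 3) // 4
    return [f"{w:0{digits}X}" for w in words]


samples = make_test_vector()
hex_words = to_hex_words(samples)
print(len(hex_words), len(hex_words[0]))       # 256 4
```

Each of the 256 words fits one 16-bit ROM location; the detector flags read back from the RAM can then be compared against a software model run on the same samples.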

Table 4 FPGA resource utilisation of different modules (synthesis summary for different modules targeting Altera Stratix II FPGA)

Modules                    Shifting and sorting   Censoring             Timing/control          Adder, multiplier and comparator   ACCA–ODV CFAR detector
Total ALUTs                4265/48 352 (9%)       14 032/48 352 (29%)   167/48 352 (<1%)        199/48 352 (<1%)                   18 861/48 352 (39%)
Total registers            4144                   1415                  123                     151                                5943
Total memory bits          0                      0                     4843/2 544 192 (<1%)    0                                  4843/2 544 192 (<1%)
DSP block 9 bit elements   0                      40/288 (14%)          0                       8/288 (3%)                         48/288 (17%)
Maximum frequency          -                      -                     -                       -                                  109.37 MHz

20

& The Institution of Engineering and Technology 2009

IET Circuits Devices Syst., 2009, Vol. 3, Iss. 1, pp. 12– 21 doi: 10.1049/iet-cds:20080072

7 Conclusions

In this paper, we have presented the design of an ACCA–ODV CFAR detector and its realisation on an Altera Stratix II FPGA. The FPGA proves to be an efficient hardware target for the proposed CFAR detector, and the implementation results demonstrate that the hardware-based ACCA–ODV CFAR detector provides high computational speed. However, the total processing time for a single run depends on the number of reference cells and on the maximum clock frequency of the FPGA chip. For N = 16 and p = 12, the implementation results show that the proposed design is compact (39% of the available ALUTs) and takes 0.21 ms to process a single run. The proposed FPGA-based ACCA–ODV CFAR detector was thoroughly tested after implementation. Place and route showed that the design can operate at 100 MHz, the maximum clock frequency of the prototyping board. This provides a speedup of 110 times compared with the software-based implementation.

It is worth noting that the ACCA–ODV CFAR detector has been designed under the assumption that the radar clutter has a Gaussian pdf, which results in a Rayleigh-distributed amplitude. In several applications of radar target detection, the clutter amplitude may not be Rayleigh distributed. This is the case for high-resolution radars, low grazing angles and horizontal polarisation at high frequencies.

Future work will concentrate on the design and realisation of an FPGA-based CFAR detector that regulates the false alarm rate in high-resolution radars. Tradeoffs between implementing functions in software on an embedded soft processor and implementing them in the hardware resources of the FPGA will be studied further, in order to identify the critical parts and to optimise the designed CFAR detector by applying parallel processing and pipelining techniques.
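The Gaussian clutter assumption mentioned above can be illustrated numerically. The sketch below is an illustration only, not part of the proposed design: it draws independent zero-mean Gaussian in-phase and quadrature components and checks that the resulting amplitude has the Rayleigh mean and that the squared amplitude has the mean expected of an exponential variable. The sample count, `sigma` and the seed are arbitrary assumptions.

```python
import math
import random


def simulate_clutter_amplitudes(n_samples=100_000, sigma=1.0, seed=7):
    """Return amplitudes sqrt(I^2 + Q^2) for independent Gaussian I and Q."""
    rng = random.Random(seed)
    amplitudes = []
    for _ in range(n_samples):
        i_part = rng.gauss(0.0, sigma)  # in-phase component
        q_part = rng.gauss(0.0, sigma)  # quadrature component
        amplitudes.append(math.hypot(i_part, q_part))
    return amplitudes


amps = simulate_clutter_amplitudes()
mean_amplitude = sum(amps) / len(amps)
mean_power = sum(a * a for a in amps) / len(amps)

# Theory for sigma = 1: the Rayleigh mean is sqrt(pi/2), and the squared
# amplitude is exponentially distributed with mean 2 * sigma^2 = 2.
print(f"mean amplitude {mean_amplitude:.3f} vs {math.sqrt(math.pi / 2):.3f}")
print(f"mean power     {mean_power:.3f} vs 2.000")
```

When the clutter is not Gaussian, as in the high-resolution cases listed above, these relations no longer hold, which is why a detector built on them may fail to keep the false alarm rate constant.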

8 Acknowledgments

This work was supported by the Prince Sultan Advanced Technologies Research Institute (PSATRI) at King Saud University, Saudi Arabia. The authors would like to thank the reviewers for many useful comments and suggestions that have helped to improve the quality of this paper.

9 References

[1] BARTON D.K.: ‘Radar system analysis and modeling’ (Artech House, Norwood, MA, 2005)

[2] BARKAT M.: ‘Signal detection and estimation’ (Artech House, Norwood, MA, 2005, 2nd edn.)

[3] TODMAN T.J., CONSTANTINIDES G.A., WILTON S.J.E., MENCER O., LUK W., CHEUNG P.Y.K.: ‘Reconfigurable computing: architectures and design methods’, IEE Proc. Comput. Digit Tech., 2005, 152, (2), pp. 193–207

[4] FARROUKI A., BARKAT M.: ‘Automatic censoring CFAR detector based on ordered data variability for nonhomogeneous environments’, IEE Proc. Radar Sonar Navig., 2005, 152, (1), pp. 43–51

[5] GANDHI P.P., KASSAM S.A.: ‘Analysis of CFAR processors in nonhomogeneous background’, IEEE Trans. Aerosp. Electron. Syst., 1988, 24, (4), pp. 427–445

[6] CUMPLIDO R., TORRES C., LOPEZ S.: ‘A configurable FPGA-based hardware architecture for adaptive processing of noisy signals for target detection based on constant false alarm rate (CFAR) algorithms’. Proc. Int. Signal Processing Conf. and Expo (GSPX’2004), USA, September 2004, CDROM

[7] CUMPLIDO R., TORRES C., LOPEZ S.: ‘On the implementation of an efficient FPGA-based CFAR processor for target detection’. Proc. 1st Int. Conf. Electrical and Electronics Engineering, Acapulco, Mexico, June 2004, pp. 214–218

[8] EL-FARAMAWY N.M., EL-BADAWY E.A., SALEM A.I.: ‘Hardware implementation of CA-CFAR processor’. Proc. Int. Conf. Computer and Communication Engineering, Kuala Lumpur, Malaysia, May 2006, pp. 573–578

[9] ALSUWAILEM A.M., ALSHEBEILI S.A., ALAMMAR M.: ‘Design and implementation of a configurable real-time FPGA-based TM-CFAR processor for radar target detection’, J. Act. Passive Electron. Devices, 2008, 3, (3-4), pp. 241–256

[10] FAHMY S., CHEUNG P., LUK W.: ‘Novel FPGA-based implementation of median and weighted median filters for image processing’. Proc. Int. Conf. Field Programmable Logic and Applications (FPL’2005), August 2005, pp. 142–147

[11] BLAIR G.M.: ‘Low cost sorting circuit for VLSI’, IEEE Trans. Circuits Syst., 1996, 43, (6), pp. 515–516

[12] Quartus II user manual: http://www.altera.com/literature/lit-qts.jsp, accessed June 2008

[13] Stratix II development kit EP2S60 DSP user manual: http://www.altera.com.cn/products/devkits/altera/kit-dsp2S60.html, accessed June 2008
