
Generalized Configurable Architectures For FPGA Implementation Of Ordered-Statistic CFAR

Kumar Vijay Mishra1, Ramchandra Kuloor2

1Signal Processing Group, Radar ‘A’, LRDE, Bangalore – 560 093 India
Telephone: 0091 080 25240401 Fax: 0091 080 25240821
2Radar ‘A’, LRDE, Bangalore – 560093 India
Telephone: 0091 080 25240821 Fax: 0091 080 25240821
Email: [email protected]

Abstract

Ordered-Statistic (OS) CFAR is a constant false alarm rate technique that is relatively immune to the presence of interfering targets. This paper presents two scalable hardware architectures for OS-CFAR and its derivatives, which require implementation of sorting algorithms. The first scheme implements a parallel insertion sorter and is configurable only for OS-CFAR. An alternative architecture is suggested which, in addition to insertion sort, implements merge-sort as well. The second implementation is configurable for three modes: OS, OS Greatest-of (OSGO) and OS Smallest-of (OSSO), but at the cost of greater resource usage. Modifications of both architectures are then presented to implement CFAR in range as well as Doppler. Additional means are suggested to enhance the configurability of the architecture to suit it for other CFAR schemes and reduce resource utilization. FPGA implementation results are presented and discussed.

Key words: Ordered-statistic CFAR, sorting algorithms, FPGA.

I. Introduction

In real radar applications, many different noise and clutter background signal situations can occur. The target echo signal practically always appears against a background signal, which is filled with point, area or even extended clutter and additional superimposed noise. In real applications, the clutter itself is a complicated time- and space-variant stochastic process. Owing to this noisy background, adaptive signal processing techniques are required for target detection. Such techniques are used to eliminate noise and enhance the detectability of the target. The most common scheme employed in such scenarios constantly maintains a desired level of performance, i.e. a predetermined constant false alarm rate (CFAR) [1], by adaptively varying the threshold against which the target detection is declared. Usually the threshold is adapted to the local information on the background noise. Therefore all CFAR detections consist of making an estimate of the noise power to calculate the adaptive threshold and then declaring detection of the target by comparing the echo signal amplitude with the threshold. This general detection scheme has been analyzed in several research efforts, where either new combinations of fundamental schemes and modifications of existing techniques have been proposed or the performance in new environments has been analyzed.

Although the theoretical aspects of CFAR detection are well-researched [2], practical hardware implementations are rare. This is because of the high computational requirements demanded by such algorithms in radar signal processing. CFAR processors cannot meet these requirements through technology improvements alone because of the high data rate in radar signal processing. This leads to the design of CFAR hardware architectures based on parallel computational models.

The rest of the paper is organized as follows: Section II outlines the theoretical grounds of OS-CFAR. Section III discusses the selection of a sorting algorithm for the particular case of OS-CFAR. Section IV presents details of the basic hardware architecture for the OS-CFAR algorithm. Section V presents an alternative architecture which is more configurable than the first one. In addition, it also provides the modifications in the existing architecture necessary to implement CFAR in range as well as Doppler. In Section VI FPGA implementation results are presented and further design alterations are suggested for higher configurability and performance. Section VII carries concluding remarks.

II. OS-CFAR Algorithm

Numerous CFAR algorithms have been proposed to date, differing in the method used to obtain the adaptive threshold from a sliding reference window of fixed size to test the amplitude of each individual range cell. Cell-Averaging CFAR [3] is the most common CFAR detector, wherein all the cells in the reference window are summed up to compute their average, and thus the adaptive threshold. The quality of the conventional CA-CFAR threshold in scenarios like steep clutter edges and multiple target environments has been observed to degrade considerably. To alleviate this performance degradation, an algorithm based on order statistics [4], which exhibits reduced sensitivity to spurious targets, was proposed [5]. Here, $N$ background amplitude samples (the normalized cell input to the CFAR processor is the random variable $X$) are ranked in increasing order

$$X_1 \le X_2 \le \dots \le X_i \le \dots \le X_K \le \dots \le X_N \tag{1}$$

The variable $K$ is the rank of the cell whose input is selected to determine the threshold (representative rank). The threshold level $Z_T$ is obtained by multiplying the input from the $K$th ranked cell by a scaling factor $\alpha$

$$Z_T = \alpha X_K \tag{2}$$

The multiplication factor $\alpha$ provides flexibility in choosing the false alarm probability, which for independent and identically distributed (IID) random variables with a Rayleigh probability density function (PDF) is given by

$$P_{FA} = \frac{N!\,(\alpha + N - K)!}{(N - K)!\,(\alpha + N)!} \tag{3}$$

(Note: For a non-integer $\alpha$, the factorial is replaced with the corresponding Gamma function.) As we see, $P_{FA}$ for OS-CFAR is not a function of the scale parameter of the Rayleigh PDF, and hence the constant false alarm rate can be maintained by fixing the values of $N$ and $\alpha$.

Figure 1: Block Diagram of an OS-CFAR Processor. The main components of the processor are registers/memory, a multiplier, a rank computation module and a comparator.

Figure 1 shows the block diagram of an OS-CFAR processor. A reference window of $N$ samples surrounding the test cell $Y$ is taken to compute the rank, and some guard cells are incorporated in order to avoid target strength leakage which may affect the noise estimation. The scaled noise estimate is compared with the test cell to declare the detection $d(Y)$ of the target

$$d(Y) = \begin{cases} 1, & \text{if } Y \ge Z_T \\ 0, & \text{if } Y < Z_T \end{cases} \tag{4}$$
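As a rough software illustration of equations (2) to (4), the sketch below evaluates the false-alarm expression (3) with log-Gamma functions (so that a non-integer α is handled), searches numerically for the α that yields a desired false alarm probability, and applies the detection rule to a one-dimensional list of range cells. The function names (`os_cfar_pfa`, `solve_alpha`, `os_cfar_detect`), the bisection search and the `half_window` and `guard` parameters are assumptions made for this example only; they are not taken from the hardware design described in this paper.

```python
import math


def os_cfar_pfa(alpha, n, k):
    """False alarm probability of equation (3), with x! written as Gamma(x + 1)."""
    log_pfa = (math.lgamma(n + 1) - math.lgamma(n - k + 1)
               + math.lgamma(alpha + n - k + 1) - math.lgamma(alpha + n + 1))
    return math.exp(log_pfa)


def solve_alpha(pfa_target, n, k, upper=1e6, iterations=200):
    """Find alpha for a target P_FA by bisection (P_FA decreases as alpha grows).

    The search interval [0, upper] is an assumption of this sketch.
    """
    lower = 0.0
    for _ in range(iterations):
        mid = 0.5 * (lower + upper)
        if os_cfar_pfa(mid, n, k) > pfa_target:
            lower = mid   # P_FA still too high, a larger alpha is needed
        else:
            upper = mid
    return 0.5 * (lower + upper)


def os_cfar_detect(cells, half_window, guard, k, alpha):
    """Apply Z_T = alpha * X_K and d(Y) of equation (4) to every testable cell.

    half_window: reference cells on each side (N = 2 * half_window)
    guard:       guard cells on each side of the test cell
    k:           representative rank, 1-based, 1 <= k <= N
    Cells too close to the edges to have a full window are left at 0.
    """
    cells = list(cells)
    decisions = [0] * len(cells)
    reach = half_window + guard
    for i in range(reach, len(cells) - reach):
        lagging = cells[i - reach:i - guard]
        leading = cells[i + guard + 1:i + reach + 1]
        ranked = sorted(lagging + leading)       # equation (1)
        threshold = alpha * ranked[k - 1]        # equation (2)
        decisions[i] = 1 if cells[i] >= threshold else 0
    return decisions
```

This version re-sorts the whole window for every test cell, which is exactly the redundancy that the hardware architectures in the following sections are designed to avoid.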

III. Selection of Sorting Technique

Sort algorithms accept an array of numerical data, compare the numbers, and then arrange them in ascending or descending order. The sorted array can then be used to determine the median, minimum, maximum, and any fractional information, such as the upper or lower quartile. Numerous approaches exist for sorting a given array of data. These methods vary in performance and complexity from the simple-but-slow bubble-sort to the fast-but-complicated quick-sort algorithm. Methods of straight insertion perform better than bubble-sort but not as well as quick-sort.

A sort algorithm can be implemented in an FPGA in more than one way [6]. In a traditional O(N^2) bubble-sort implementation, it can take a maximum of N^2 processing cycles to sort a single array. There are O(N) implementation schemes to sort an array serially [7]. An advantage of FPGAs is that they allow operations to be performed in parallel, so that multiple comparisons can be performed in a single pass.

To select an appropriate sort algorithm for OS-CFAR implementation, the flow of data and the data dependencies in the CFAR processor should be examined. As mentioned in [8], let X be the raw data samples of the signal to be processed and N the number of reference cells in the CFAR detector. If we consider a sequence of reference data samples around the test cell (Figure 2), it can be easily deduced that the adjacent frames of the sliding window share all the data samples except those at their edges. This data sharing can be exploited to reuse previous partial results. After a particular window has been processed, its results can be used to compute the result of the next window, without recalculating the partial results afresh, simply by inserting and deleting data at the window edges. Here, for the sake of simplicity but without loss of generality, we have omitted guard cells from the explanation. The procedure to incorporate guard cells in the existing algorithm is discussed in the next section.

Figure 2: Data dependencies for three adjacent reference data sets. A data set is obtained by sliding the previous data set and by inserting and deleting one datum respectively at the leading and the trailing edges of the previous data set.

The role of partial results in easing the calculation calls for a sorting algorithm which doesn’t require sorting the entire window every time. Also, in order to exploit the concurrency of the window samples, a parallel architecture for the sorting algorithm should be used. The sorting algorithm, in this case, should delete the oldest window sample from the sorted array and insert the new sample at its appropriate place in the presorted array. This is, in essence, a variant of the traditional insertion sort algorithm. The variant differs from its conventional counterpart in the following ways:

(1) The size of the array doesn’t grow with the addition of the new element. It is fixed to N (the size of the reference window).
(2) The element which is deleted after every sort process is not the last element in the array but the outgoing element from the right window. This is invariably the oldest element.
(3) Every pass of insertion sort leaves the elements in the array arranged by their numerical values and not in the time sequence in which they entered the reference window.

These observations lead us to attach to every element a pointer that expresses its time-sequence arrival number, or ‘age’, so that even when the reference window is ordered, its elements can still be tracked by their order of arrival.

IV. Scalable Hardware Implementation

To implement the ‘running sorter’ described above for the OS-CFAR reference window, let us first consider the scheme required to implement the running sorter for a simple array. Here, the core hardware consists of two shift-register-based memories and three registers to store status words generated by the control logic. A Sorted_Array shift register holds the values of all the elements in an ordered manner. The second shift register (Age_Array) holds the encoded ages of the elements in the corresponding order. A Replacement_Word, a Comparison_Word and an Insertion_Word are each stored in a separate register. The Replacement_Word indicates the element which should be deleted after every pass of insertion sort. This is invariably the element of highest (or oldest) age.

Figure 3: Logic to generate Replacement_Word. Note that the Age_Array is a shift register. The connections among the smaller blocks inside Age_Array are not shown here for simplification.
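Before looking at how each status word is generated, the behaviour expected of the running sorter (delete the oldest sample, insert the newest one into the already sorted window) can be sketched in software. This is only a behavioural model under stated assumptions: the class name `RunningSorter`, the use of a `deque` to remember arrival order in place of explicit age counters, and the zero-filled initial window are choices made for this illustration and are not part of the proposed hardware.

```python
from bisect import bisect_left, insort
from collections import deque


class RunningSorter:
    """Fixed-size sliding window that is kept sorted between updates."""

    def __init__(self, size, fill=0):
        # Arrival order: the oldest sample sits at the left end of the deque.
        self.arrivals = deque([fill] * size)
        # The same samples, kept in ascending numerical order.
        self.sorted_values = [fill] * size

    def push(self, new_value):
        """Replace the oldest sample by new_value and return the sorted window."""
        oldest = self.arrivals.popleft()
        self.arrivals.append(new_value)
        # Delete one occurrence of the oldest value from the sorted list ...
        del self.sorted_values[bisect_left(self.sorted_values, oldest)]
        # ... and insert the new value at its ordered position.
        insort(self.sorted_values, new_value)
        return self.sorted_values

    def rank(self, k):
        """Return the k-th smallest sample (1-based), i.e. X_K."""
        return self.sorted_values[k - 1]
```

Only one deletion and one insertion are performed per incoming sample, so the partial result of the previous window is reused instead of sorting all N samples again.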

Because the element to be deleted is always the oldest one, the Replacement_Word is generated by tapping the MSB of the ages of all the elements from the Age_Array and ANDing them individually with the MSB of the largest age, i.e. ‘1’ (Figure 3). The Replacement_Word then has all zeros except for a single one at the position of the oldest element.

The Comparison_Word is the result of the comparison of the new element with each of the elements stored in the Sorted_Array. If the new element is smaller than the element stored in the Sorted_Array, the result of the comparator is ‘1’, else ‘0’. Since the array is already arranged in a sorted order, the Comparison_Word will be a stream of ‘1’s and ‘0’s with a single transition from 1 to 0. This transition edge indicates the position of the insertion of the new element (Figure 4).

Figure 4: Logic to create Comparison_Word. Note that the Sorted_Array is a shift register. The connections among the smaller blocks inside Sorted_Array are not shown here for simplification.

The Insertion_Word indicates the position of the insertion of the new element. If the number to be replaced (the oldest number) is greater than the new element, its ‘1’ lies before the transition in the Comparison_Word. If it is smaller, then the ‘1’ lies after the transition in the Comparison_Word. A simple OR operation of Replacement_Word and Comparison_Word will ascertain the correct insertion position around the transition in Comparison_Word (Figure 5). Of course, the first and the last bits of the Insertion_Word should be handled as special cases with a separate but simpler logic.

Figure 5: Logic to generate a single bit of the Insertion_Word. The bitwise OR of Comparison_Word and Replacement_Word is performed for both the cases (oldest_element > new_element and oldest_element < new_element). This checking should be done only at the transition. The logic is replicated for all the bits of the Insertion_Word.

Once the Insertion_Word is available, the Sorted_Array shift register should be updated by
(i) inserting the new_element at its position, and
(ii) shifting the elements between the new_element and the oldest_element.
This explains the use of a shift register for the Sorted_Array. Similarly, the Age_Array should be updated by
(i) inserting the age of the new_element (which is zero here) at the appropriate position,
(ii) shifting the ages between the new_element’s age and the oldest_element’s age, and
(iii) incrementing all other ages by one. Since a pass is over, all the existing elements have grown older by one.

The complete Running Sorter Unit and the flow of logic are shown in Figure 6. The unit is scalable: a longer array or a wider data word can be accommodated simply by replicating the logic for each word. Since the complete sorted array is available as the output, it enables selection of any rank. The rank selection may be done by using a multiplexer which outputs a section of the sorted data based on the rank set by the operator.
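The update cycle described above can be mimicked word by word in software. The sketch below is a loose model and not a description of the actual VHDL: the class name `RunningSorterUnit` and its methods are hypothetical, the array is stored in ascending order by list index (so the single transition in the comparison word appears as 0 to 1 rather than 1 to 0), the oldest element is located by comparing each full age with N - 1 rather than by tapping an MSB, and the insertion position is computed arithmetically instead of through the OR logic of Figure 5.

```python
class RunningSorterUnit:
    """Word-level software model of one running sorter pass."""

    def __init__(self, size):
        self.size = size
        # Reset state: all values zero, ages distinct from 0 (newest) to N - 1 (oldest).
        self.sorted_array = [0] * size
        self.age_array = list(range(size))

    def step(self, new_element):
        """Perform one pass: drop the oldest element, insert new_element."""
        oldest_age = self.size - 1
        # One-hot word marking the oldest element.
        replacement_word = [int(age == oldest_age) for age in self.age_array]
        # '1' wherever the new element is smaller than the stored element.
        comparison_word = [int(new_element < v) for v in self.sorted_array]

        remove_at = replacement_word.index(1)
        insert_at = comparison_word.count(0)   # position of the transition
        if remove_at < insert_at:
            insert_at -= 1                     # deletion shifts the transition left

        del self.sorted_array[remove_at]
        del self.age_array[remove_at]
        # Every surviving element has grown older by one pass.
        self.age_array = [age + 1 for age in self.age_array]
        self.sorted_array.insert(insert_at, new_element)
        self.age_array.insert(insert_at, 0)    # the new element has age zero
        return self.sorted_array

    def select_rank(self, k):
        """Model of the output multiplexer: return the k-th ranked value (1-based)."""
        return self.sorted_array[k - 1]
```

After every call to `step`, the ages remain a permutation of 0 to N - 1, which is what allows the oldest element to be identified uniquely on the next pass.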

Figure 6: The Running Sorter Unit (RSU). The new_element is the input and sorted_array is the output.

However, the RSU so devised cannot be used per se in the OS-CFAR architecture shown in Figure 1. This is because the reference window is actually discontinuous in time sequence owing to the presence of the guard cells. To incorporate this discontinuity, the following changes should be made to the existing RSU:
(i) The whole window can be analyzed as a window with two inputs: the first input being the new data, say Xnew, entering the left window and the second one being the data entering the right window from the right guard cell, say Xright_guard.
(ii) Since the RSU accepts only one new datum at a time, we should sort the reference window twice: first with Xnew and then with Xright_guard. The output should be latched only after two passes of the sort.
(iii) While performing the first sort, the age of the new element should be the lowest (i.e. zero), while for the second pass it should be equal to N/2 + 2. Here we assume only one guard cell on both sides of the test cell.
(iv) A reset signal should be used to initialize the reference window and the appropriate registers with zero values.

V. Configurable Architecture Implementation

A. OS, OSGO and OSSO

The modified RSU discussed above doesn’t permit configurability in the architecture, in the sense that it can implement only the OS-CFAR scheme. Other variants which have computational advantages over OS-CFAR cannot be implemented using the same RSU.

The derivatives of OS-CFAR sort the leading and the lagging windows separately and then manipulate the results of both ranks to achieve a better estimate of the background. Notable alterations include OSGO and OSSO [9]. These can be implemented by using the original RSUs (as in Figure 6) separately for each window, and taking the greater or smaller of the two ranks as the background estimate. However, the original OS-CFAR rank is lost in such a scheme. A smaller additional unit, the Merge Sort Unit (MSU), can be added to this architecture to merge the results of the two RSUs. The parallel version of the MSU is a combination of shift registers, a multiplexer and a comparator. Two shift registers hold the values of the two sorted arrays. Control logic drives the multiplexer to select the first elements of the two shift registers. These elements are compared and the larger element is placed at the next available position of the register which holds the merged array. The shift register which contributed that element of the merged array is shifted to bring its next element to the first place for the comparison. A configuration word controls selection from the three CFAR schemes: OS, OSGO and OSSO. This integrated hardware is shown in Figure 7.

Figure 7: Configurable hardware architecture to select any of the three schemes: OS, OSGO and OSSO.

B. CA, CAGO and CASO

This architecture can also accommodate Cell-Averaging CFAR if desired as a feature in the CFAR processor. Since the reference window is never tampered with and its data is, instead, copied into the RSUs, the architecture can be used to simultaneously implement CA, CAGO [10] and CASO [11] as well, as discussed in [8].

C. Extension to Range and Doppler

Often, a two-dimensional clutter-rejection system is applied to the suppression of weather and ground clutter [12]. In such cases a Discrete Fourier Transform (DFT) unit is used before the CFAR processor. Thereafter, CFAR is performed along the range for each of the Doppler filters. Here, the reference window discussed above is not a single one but is, in fact, a collection of as many parallel reference windows as the number of filters. All the windows could be processed simultaneously, except that the data for each range cell arrive filter by filter. In such cases the partial results of CFAR processing should be stored in a separate memory for each filter (Figure 8).

Figure 8: OS CFAR in Range and Doppler. Note that here the reference window should be viewed as a collection of many parallel reference windows. Additional logic to address the appropriate data in the memory is not shown.

Here, the only partial results required for the next pass of the RSU are the Age_Array and the Sorted_Array. Hence two separate memory blocks should be used to store these results of each RSU.

VI. FPGA Implementation and Results

The proposed architectures were modeled in VHDL [VHSIC (Very High Speed Integrated-Circuit) Hardware Description Language] and parameterized in terms of the number of reference cells, data precision, the rank and the threshold. The VHDL model was synthesized with Altera Quartus II software for a Stratix EP1S60F1020I6 device. Table 1 summarizes the synthesis results for a two-dimensional modified RSU design for half-windows of size 16 each, a data width of 16 bits, a 4-bit threshold control and an fmax of 35 MHz. For the same values of parameters, Table 2 lists the results for the configurable (OS, OSGO and OSSO) architecture for a two-dimensional CFAR.

Table 1: Synthesis Summary for 2-D OS-CFAR design

Total logic elements                   8,047 / 57,120 (14%)
  -- Combinational with no register    4,932
  -- Register only                     79
  -- Combinational with a register     3,036
Total memory bits                      117,760 / 5,215,104 (2%)

Table 2: Synthesis Summary for 2-D OS, OSGO and OSSO CFAR design

Total logic elements                   20,990 / 57,120 (36%)
  -- Combinational with no register    18,355
  -- Register only                     123
  -- Combinational with a register     2,512
Total memory bits                      84,992 / 5,215,104 (1%)

The hardware resource utilization is directly proportional to the length of the reference window and the data precision. However, the sharp increase in logic utilization in the second design (20,990 against 8,047 logic elements, roughly 2.6 times) may be attributed to the O(N^2) implementation of parallel merge-sort. A pipelined serial implementation of merge-sort may be tried to reduce resource utilization here. The proposed architecture produces an output result on each clock cycle after the latency period. The latency arises because of the shift-register nature of the memory which holds the values of the reference cells: the shift register must be full before a result can be output.

A few more modifications or additions to the existing architecture may increase its configurability and performance. The same architecture can be used for other CFAR schemes as well. A parallel adder may be added, after appropriately truncating the sorted set obtained through the MSU, to implement the Trimmed-Mean (TM) CFAR [13]. One can also select the rank after the truncation. This is described as a variant of OS-CFAR in [14]. The resource utilization can be almost halved by using a combination of CA and OS CFAR schemes. In this case one of the RSUs is replaced by a cell-averaging unit and the MSU is dispensed with. The design can now be used to implement derivative CFAR schemes like OSCAGO (OS and CA Greatest-Of) [15] and MOSCA (Mean of OS and CA) [16]. Even for the original configurable OS architecture, a single RSU can be shared between the two reference windows. This would require a clock of double the data rate, but would reduce resource usage considerably. Of course, adjustments should be made to the MSU, which should accept data only after the two passes of the sort.

VII. Conclusion

This work proposes a highly configurable design architecture to implement CFAR schemes which require sorting of the reference window data. Scalability and high performance of the proposed architectures are feasible because of the use of a parallel processing model and the provision of such structures in FPGAs. The architectures can co-exist with CA-CFAR designs and efficiently implement the OS, OSGO and OSSO algorithms. They can also be extended to several derivatives or combinations of fundamental CFAR algorithms.

References:
[1] Skolnik, M. I., “Introduction to Radar Systems”, McGraw-Hill, 2000.
[2] Rohling H., “25 Years Research in Range CFAR Techniques”, Proceedings of International Radar Symposium, IRS – 2003, Dresden, Germany, 30 Sept – 02 Oct 2003, pp. 363-368.
[3] Finn, H. M. and Johnson, R. S., “Adaptive Detection Mode with Threshold Control as a Function of Spatially Sampled Clutter Estimates”, RCA Review, Vol. 29, No. 3, 1968, pp. 414-464.
[4] David H. A. and Nagaraja H. N., “Order Statistics”, 3rd edn., New York: Wiley, 2003.
[5] Rohling H., “Radar CFAR Thresholding in Clutter and Multiple Target Situations”, IEEE Transactions on Aerospace and Electronic Systems, Vol. AES-19, 608-621, July 1983.
[6] Peichel C., “Integer out of sorts? Program an FPGA to put them in order”, EDN, Aug 15, 1997, pp. 95-102.
[7] Knuth D. E., “The Art of Computer Programming – Vol. 3: Sorting and Searching”, Addison-Wesley, 1985.
[8] René Cumplido, César Torres and Santos López, “On the Implementation of an efficient FPGA-based CFAR Processor for Target Detection”, International Conference on Electrical and Electronics Engineering (ICEEE) and Xth Conference on Electrical Engineering (CIE 2004), Acapulco, Guerrero, Mexico, September 8-10, 2004, pp. 214-218.
[9] Elias-Fusté Antonio R., de Mercado Manuel Garcia G. and Davó Elias de los Reyes, “Analysis of Some Modified Ordered Statistic CFAR: OSGO and OSSO CFAR”, IEEE Transactions on Aerospace and Electronic Systems, Vol. AES-26, No. 1, 197-202, January 1990.
[10] Hansen, V. G., “Constant False Alarm Rate Processing in Search Radars”, Proceedings of the IEEE International Radar Conference, 1973, pp. 325-332.
[11] Trunk, G. V., “Range Resolution of Targets Using Automatic Detectors”, IEEE Transactions on Aerospace and Electronic Systems, Vol. AES-14, No. 5, 750-755, July 1978.
[12] Irabu T., Kiuchi E., Hagisava T., Tomita Y., Ibe T. and Shimojo K., “On the Performance of a Two-dimensional Clutter Rejection System”, Proceedings of the IEEE International Radar Conference, 1980, pp. 311-316.
[13] Gandhi P. P. and Kassam S. A., “Analysis of CFAR Processors in Nonhomogeneous Background”, IEEE Transactions on Aerospace and Electronic Systems, Vol. AES-24, No. 4, 427-445, July 1988.
[14] Blake S., “OS-CFAR Theory for Multiple Targets and Nonuniform Clutter”, IEEE Transactions on Aerospace and Electronic Systems, Vol. AES-24, No. 6, 785-790, November 1988.
[15] You H., Jian G., Yingning P. and Dajin L., “A New CFAR Detector Based on Ordered Statistics and Cell Averaging”, CIE, 1996, pp. 106-108.
[16] You H. and Guan J., “A New CFAR Detector with Greatest Of Selection”, Proceedings of IEEE International Radar Conference, 1995, Alexandria, USA, pp. 589-591.


Author Info

Kumar Vijay Mishra obtained his B. Tech. degree in Electronics & Communication Engineering from National Institute of Technology, Hamirpur (Himachal Pradesh) in 2003. He joined Defence Research and Development Organization in 2003 and has been working with Electronics and Radar Development Establishment (LRDE), Bangalore since 2004. His areas of interest include radar signal processing, automatic target detection techniques and FPGA-based hardware design.

Ramchandra Kuloor obtained his B. E. degree in Electronics from the University Visvesvaraya College of Engineering, Bangalore in 1976 and M. E. degree in Electrical Communication Engineering from the Indian Institute of Science, Bangalore in 1978. He joined the Electronics and Radar Development Establishment (LRDE) in 1978 and has been working in the area of radar signal processing and radar systems engineering. His areas of interest include digital pulse compression, radar ECCM techniques and FPGA-based low power signal processor realization. He is a recipient of the IETE-IRSI award 1996-97 and AGNI award for self-reliance in 2002.
