Invited Paper

Wavelet-based ground vehicle recognition using acoustic signals

Howard C. Choe SWI Technologies 4512 Littleton Court, Bryan, TX 77802-3543 Phone: (409) 774-0195; E-mail: [email protected]

Robert E. Karlsen, Grant R. Gerhart, and Thomas Meitzler U.S. Army Tank-automotive and Armaments Command Warren, MI 48397-5000

ABSTRACT

We present, in this paper, a wavelet-based acoustic signal analysis to remotely recognize military vehicles using their sound intercepted by acoustic sensors. Since expedited signal recognition is imperative in many military and industrial situations, we developed an algorithm that provides an automated, fast signal recognition once implemented in a real-time hardware system.

This algorithm consists of wavelet preprocessing, feature extraction and compact signal representation, and a simple but effective statistical pattern matching. The current algorithm does not require any training. Training is replaced by human selection of reference signals (e.g., a squeak or engine exhaust sound) distinctive to each individual vehicle, based on human perception. This allows fast archiving of any new vehicle type in the database once the signal is collected. The wavelet preprocessing provides time-frequency multiresolution analysis using the discrete wavelet transform (DWT). Within each resolution level, feature vectors are generated from statistical parameters and the energy content of the wavelet coefficients. After applying our algorithm to the intercepted acoustic signals, the resultant feature vectors are compared with the reference vehicle feature vectors in the database using statistical pattern matching to determine the type of vehicle from which the signal originated. Certainly, statistical pattern matching could be replaced by an artificial neural network (ANN); however, the ANN would require training data sets and time to train the net. Unfortunately, this is not always possible in many real-world situations, especially collecting data sets from unfriendly ground vehicles to train the ANN. Our methodology using wavelet preprocessing and statistical pattern matching provides robust acoustic signal recognition. We also present an example of vehicle recognition using acoustic signals collected from two different military ground vehicles. In this paper, we do not present the mathematics involved in this research. Instead, the focus is on the application of the various techniques used to achieve successful recognition.

434 / SPIE Vol. 2762
0-8194-2143-X/96/$6.00

1.0 INTRODUCTION

Different types of ground vehicles emit acoustic sounds that are unique to the class of vehicle. In many cases, a trained human operator can distinguish between two vehicles, just as a human being can audibly distinguish between persons A and B by discerning the differences in their voices (tone, pitch, etc.) or other audible features (e.g., walking sound). Certainly, we can distinguish between a regular passenger vehicle and a bus or truck because they have different engines, muffler sounds, mechanical moving parts, etc. These phenomena are even more audible in military vehicles. Many different acoustic features exist in military vehicles that can be used to identify them: track types, engine types (gasoline and diesel), turbine sounds, muffler sounds, engine strokes, the number of engine cylinders, wheel types, different moving parts, and so on. To reliably recognize a vehicle using an acoustic sensor, it is quite important to obtain a distinctive feature or set of distinctive features. Nevertheless, acoustic vehicle recognition can be hampered by speed-related acoustic signal scale changes, Doppler effects, noise introduced by various moving parts, friction between terrain and wheels or tracks, wind effects across acoustic sensors, weather and environmental effects, and echoes.

We selected three distinctive feature signals from each of two military vehicles, V1 and V2,

by playing back and listening to the collected data. Wavelet multiresolution analysis (MRA) is performed on the feature signals for a more compact signal representation and for feature vector generation. The resultant feature vectors are stored in a reference database. These reference feature vectors are then used to match intercepted unknown signals to a reference vehicle. Simple statistical correlation, in addition to data and decision fusion, provides fast recognition of the target vehicle. Figure 1 describes our overall system architecture for ground vehicle recognition.

[Figure 1 block diagram; legible block label: "Feature Vector Matching and Local Decision Fusion".]

Figure 1. Overall system architecture.


Section 2.0 describes how feature signals are selected for feature vector extraction using multiresolution analysis. Section 3.0 briefly discusses data and decision fusion as well as feature vector correlation and matching. Section 4.0 presents the results from our algorithm using real-world data sets collected from V1 and V2. Hardware prototyping issues and concerns are discussed in Section 5.0. We conclude in Section 6.0, and the references are cited in Section 7.0.

2.0 FEATURE SIGNAL SELECTION AND FEATURE VECTOR EXTRACTION

We selected the feature signals, or base signals, by playing back and listening to the collected signals. Thus, human auditory perception is involved in the feature selection. Human perception is used only to extract signals such as squeak and engine sounds. We did not try to correlate the selected feature to the vehicle type; we simply extracted "squeak" and other signal features from both vehicles. By hearing them independently, without a priori knowledge, one cannot distinguish the vehicle that originated the squeak signal.

Each feature signal is sampled to 2048 data points at either 8 kHz or 16 kHz, corresponding to signal durations of 0.256 seconds or 0.128 seconds, respectively. Table 1 shows the selected reference feature signals and the sampling frequency for each.

Table 1. Summary of the selected reference feature signals.

Vehicle   Description          Sampling frequency (kHz)
V1        Squeak sound         16
V1        Sound under motion   8
V1        Exhaust sound        16
V2        Squeak sound         16
V2        Sound under motion   8
V2        Revving sound        16

Figures 2(a-c) and 3(a-c) are plots of the stored reference feature signals from V1 and V2, respectively. Figures 4(a-c) and 5(a-c) are plots of the wavelet coefficients after multiresolution analysis of Figures 2(a-c) and 3(a-c), respectively, using the Daubechies-4 wavelet. Figures 4(a-c) and 5(a-c) are plotted with the following conventions. Since each feature signal was sampled to 2048 (= 2^n, n = 11) data points, the total possible number of resolution levels, n, is 11. Thus, the plots are made with L, Hn, Hn-1, Hn-2, ..., H1, where L represents the low-pass (scaling function) coefficients and H represents the high-pass (wavelet) coefficients. In our case, H1 contains 1024 wavelet coefficients, H2 contains 512 wavelet coefficients, and so on. From each of L and the Hs, we compute the first-order statistical parameters and the energy content, which become the elements of the feature vectors. Thus, a full-length feature vector contains 12 elements. These elements represent the d-c level, the a-c fluctuation, and the energy concentration in each resolution level of the feature signal, and thereby characterize the originating vehicle. Therefore, the 2048-point feature signal can be represented by a 12 (= n + 1) element feature vector.
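The subband-feature idea above can be sketched in a few lines. This is a minimal illustration, not the paper's exact implementation: it assumes the Haar wavelet (the paper's figures use Daubechies-4) and keeps only the energy of each subband as the feature, one element per detail level plus one for the final low-pass band.

```python
import numpy as np

def haar_mra_features(signal):
    """Full-depth Haar DWT of a length-2^n signal; returns one energy
    value per subband (L plus H1..Hn), i.e. n + 1 features. A sketch:
    the paper also uses first-order statistics, not energies alone."""
    x = np.asarray(signal, dtype=float)
    n_levels = int(np.log2(len(x)))
    energies = []
    approx = x
    for _ in range(n_levels):
        # One orthonormal Haar analysis step: pairwise sums (low-pass)
        # and differences (high-pass), each half the current length.
        low = (approx[0::2] + approx[1::2]) / np.sqrt(2)
        high = (approx[0::2] - approx[1::2]) / np.sqrt(2)
        energies.append(np.sum(high ** 2))  # energy of this detail band
        approx = low
    energies.append(np.sum(approx ** 2))    # energy of the final L band
    return np.array(energies[::-1])         # order: L, Hn, ..., H1

# A 2048-point signal yields an 11-level MRA and a 12-element vector.
feat = haar_mra_features(np.random.randn(2048))
print(len(feat))  # 12
```

For a 2048-point signal this yields the 12-element (= n + 1) vector described above; because the transform is orthonormal, the subband energies sum to the energy of the input signal.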

We evaluated the reference signals using the continuous wavelet transform (CWT) to visualize the reference signal properties in the time-frequency domain. The continuous wavelets used for the CWT are the Haar and Morlet wavelets. The CWT plots were generated using the FFT. Figures 6(a-c) and 7(a-c) are the time-frequency representations of the reference signals shown in Figures 2(a-c) and 3(a-c), respectively, using the Morlet wavelet. We also compared these plots with the short-time Fourier transform (STFT) of the reference feature signals. The STFT plots are shown in Figures


8(a-c) and 9(a-c). The STFT is performed with a simple rectangular window of 4 msec width (32 and 64 data points for the 8 kHz and 16 kHz sampling frequencies, respectively). All the CWT and STFT plots have a frequency axis (y-axis) range from 0.0 Hz to 8,000 Hz, except Figures 6(b), 7(b), 8(b), and 9(b), whose maximum frequency is 4,000 Hz. For all the CWT and STFT plots, the intensity (0-255 gray scale) was converted to log scale to display the detail of the time-frequency magnitude. The time-frequency representations of the V1 and V2 feature signals allow us to see whether there are ambiguities between the two vehicles with respect to the selected feature signals. If the CWTs and STFTs of V1 are similar to those of V2, then the selected signal features are neither unique nor distinctive to each vehicle. By comparing the CWT with the STFT of the reference feature signals, we can appreciate the time- and frequency-localization aspects of the wavelet transform.
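The STFT convention described above (rectangular 4 msec window, log-scaled 0-255 gray levels) can be sketched as follows; the real FFT is assumed for the per-frame magnitude spectra.

```python
import numpy as np

def stft_rect(x, fs, win_ms=4.0):
    """STFT with a simple rectangular window: 4 msec gives 32 samples
    at 8 kHz and 64 samples at 16 kHz, as in the paper's plots."""
    win = int(round(fs * win_ms / 1000.0))
    n_frames = len(x) // win
    frames = np.asarray(x[:n_frames * win], dtype=float).reshape(n_frames, win)
    mag = np.abs(np.fft.rfft(frames, axis=1))   # magnitude spectrum per frame
    # Log-scale before mapping to a 0-255 gray scale, so that low-energy
    # time-frequency detail remains visible.
    img = np.log1p(mag)
    img = (255 * img / img.max()).astype(np.uint8)
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)    # 0 .. fs/2
    return img, freqs

# 2048 points at 16 kHz: 32 frames x 33 frequency bins, up to 8,000 Hz.
img, freqs = stft_rect(np.random.randn(2048), fs=16000)
print(img.shape, freqs[-1])  # (32, 33) 8000.0
```

At 8 kHz the window shrinks to 32 samples, so the frequency axis correctly tops out at 4,000 Hz, matching Figures 6(b)-9(b).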

3.0 DECISION-MAKING PROCESSES

Decision-making (the recognition process) involves statistical pattern matching and decision fusion. We applied two different types of feature pattern matching. One is to measure the distance between the reference and unknown feature vectors. The other is to compute the statistical correlation coefficient between the reference and unknown feature vectors.

We also applied multi-layer data and decision fusion to obtain a global decision (the final vehicle type determined). The data-level decision is performed with the distance measure and correlation coefficient evaluated in the pattern matching processes. The feature-level decision is performed using the data-level decisions. Since there are three feature vectors for each vehicle, this results in a local decision for an unknown feature vector generated from a segment of the unknown acoustic signal. A voting scheme is then applied to the local decisions to obtain a global decision for a given acoustic signal. The global decision maker also assigns a probability, or confidence level, to the global decision.
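The two-stage matching above can be sketched as follows. The score weighting that fuses distance and correlation, and the plain majority vote, are illustrative guesses rather than the paper's exact rules.

```python
import numpy as np

def match_vehicle(unknown_feats, ref_db):
    """Local decisions per unknown feature vector (distance and
    correlation against every reference), then a majority vote for the
    global decision with a confidence level."""
    local_votes = []
    for u in unknown_feats:
        best, best_score = None, -np.inf
        for vehicle, refs in ref_db.items():
            for r in refs:
                dist = np.linalg.norm(u - r)    # data-level: distance measure
                corr = np.corrcoef(u, r)[0, 1]  # data-level: correlation
                score = corr - 0.1 * dist       # hypothetical data-level fusion
                if score > best_score:
                    best, best_score = vehicle, score
        local_votes.append(best)                # local (feature-level) decision
    winner = max(set(local_votes), key=local_votes.count)
    confidence = local_votes.count(winner) / len(local_votes)
    return winner, confidence                   # global decision + confidence

# Toy usage: three unknown feature vectors, all close to V1's reference.
ref_db = {"V1": [np.array([1.0, 2.0, 3.0, 4.0])],
          "V2": [np.array([4.0, 3.0, 2.0, 1.0])]}
unknown = [np.array([1.1, 2.0, 3.1, 3.9])] * 3
print(match_vehicle(unknown, ref_db))  # ('V1', 1.0)
```

With three feature vectors per signal, the returned confidence is the fraction of local votes agreeing with the global winner, mirroring the voting scheme described above.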

We recognize that an artificial neural network could replace this decision-making process. However, we chose not to use a neural network since the number of data sets is not sufficient to develop training and test data sets. Also, in many military applications, a large volume of data collection can be expensive or unavailable (e.g., unfriendly ground vehicles). For cases where such data collection is available, e.g., applications in industry, the neural network approach may be suitable.

4.0 RESULTS

In this section, we present the results of our acoustical ground vehicle recognition scheme for a given set of measured data. As mentioned above, the data consist of acoustic signals from two different tracked vehicles, V1 and V2. Of the 57 data sets, 26 were from V1 and 31 from V2. For V1, 8 data sets were sampled at 8 kHz and 18 at 16 kHz. For V2, 10 data sets were sampled at 8 kHz and 21 at 16 kHz. The length of each data file ranges from 0.5 seconds to 3.3 seconds. The collected acoustic data represent the vehicles while stationary (idling and revving) and while mobile (at 5-25 mph, accelerating and reversing).

We used the Haar and Daubechies-4 (D-4) wavelets to perform the DWT multiresolution analysis. The feature vectors were extracted from the feature signals to generate a reference database.

Table 2 shows the results in terms of correct recognition, false recognition, and misses (unidentifiable). The table shows that there is a clear difference in recognition performance between the two wavelets. Wavelet selection is one of the most important factors in the vehicle recognition process, especially for the preprocessing step of feature extraction. If one can gain a priori knowledge of the structure or shape of the acoustic signals of interest, a proper wavelet can be chosen to optimize the recognition performance.

Table 2. Recognition results using the Haar and D-4 wavelets on the data set.

Class     Haar wavelet             Daubechies-4 wavelet
          Correct  False  Miss     Correct  False  Miss
V1 (26)   26       0      0        23       3      0
V2 (31)   30       0      1        30       1      0
Total     98.25%   0.00%  1.75%    92.98%   7.02%  0.00%

The number in ( ) represents the total number of data sets available for each vehicle.

5.0 ISSUES ON REAL-TIME HARDWARE IMPLEMENTATION

Since our ultimate goal is to develop a real-time hardware system implementing our algorithm, we outline a hardware implementation feasibility study in this section. The purpose of this study is to develop a prototype system to demonstrate real-time ground vehicle recognition using a vehicle's acoustic characteristics. CPU benchmarking of the algorithm performing a 2048-point redundant DWT (wavelet packet) for an 11-level multiresolution process is used to assess feasibility. The CPU benchmarking times for the DWT process are listed in Table 3. The benchmarking was performed on a 133 MHz SGI workstation.

Table 3. CPU time benchmarking for performing a 2048-point DWT 11-level MRA.

Wavelet type     User time (msec)    Total time (msec)
Haar             34.515              34.806
Daubechies-4     38.263              38.611

For the prototyping and hardware feasibility study, a commercially available PC-based DSP board is used. This DSP board has a data acquisition section and a programmable DSP microprocessor (TMS320C40). Since performing a 2048-point non-redundant DWT in an 11-level MRA is equivalent to performing 2048-, 1024-, ..., and 4-point FFTs in sequence, we assessed the feasibility of implementing our algorithm on this DSP board by summing the benchmarking times of each 2^n-point FFT, where n = 2, ..., 11. From the vendor's FFT benchmark chart for this DSP board, the summed time ranges approximately from 8.5 msec to 10.0 msec depending upon various options. The throughput of the on-board A/D converter ranges from 1 to 52 kHz per channel, and the TMS320C40 processor runs at 40 MHz.


The non-redundant 2048-point DWT for the 11-level multiresolution process is equivalent to a 4096-point DWT. Since the DWT (N operations) is faster than the FFT (N log2 N operations), the computational time required for this DWT is less than that of the equivalent FFT. This is easily shown by comparing the CPU times in Table 3 with the vendor's benchmarks by simply normalizing the CPU clock cycles. Using the Daubechies-4 wavelet as the test wavelet, we obtained an estimated CPU processing time on the 40 MHz DSP microprocessor board for the 2048-point redundant DWT. The estimated CPU time is 1.16 msec (= 0.348 msec × 133 MHz ÷ 40 MHz). However, since the current algorithm performs the redundant DWT, the actual number of data points processed is 2048 × 11 = 22528. Thus, if a non-redundant DWT (4096 data points) is used, this estimated time (1.16 msec) will be considerably reduced.

However, using the clock cycle may not be a fair comparison for estimating the processing time required on the DSP board from the benchmarking done on the SGI workstation, because the CPU architecture and the CPU instruction cycle time differ between the two machines. A more adequate comparison is to use the number of operations performed for the DWT and the FFT. Since there is a factor of log2 N between the number of operations performed by the DWT and the FFT, we can obtain another set of estimated processing times for the DSP board. The vendor's FFT benchmarking time can be divided by the factor log2 N to obtain the estimated DWT time, assuming the DWT requires overhead time equivalent to the FFT's. By this method, the estimated DWT time ranges approximately from 0.85 to 1.0 msec, which is the time required for performing the redundant DWT. Both methods give a worst-case approximate processing time of 1.0 msec. In our hardware implementation study, we used the redundant DWT time (1.0 msec). This provides a processing cushion should we choose to implement the algorithm in a non-redundant manner. In addition, to allow for unforeseen problems in implementing the DWT on this DSP chip and the possibility of using a wavelet with more than 4 coefficients, we increased the redundant DWT processing time ten-fold to provide a sufficient implementation cushion; thus, we use 10.0 msec for the CPU (TMS320C40) time required for a typical 4096-point DWT. This design delay (10 msec) also provides sufficient processing time for the subsequent feature extraction and decision-making in addition to the 4096-point DWT. The estimated non-redundant processing time would be approximately 0.2 msec.
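Both estimates reduce to simple arithmetic. The input values below are the ones quoted in the text; the log2 N divisor of 10 is inferred from the quoted 0.85-1.0 msec range rather than stated explicitly.

```python
# Method 1: clock-ratio scaling of the measured SGI time (133 MHz)
# down to the TMS320C40 DSP (40 MHz).
t_sgi_ms = 0.348                 # measured D-4 redundant DWT time on the SGI
t_dsp_ms = t_sgi_ms * 133 / 40   # naive clock-frequency scaling
print(round(t_dsp_ms, 2))        # 1.16 msec

# Method 2: operation-count scaling. The DWT is O(N) versus
# O(N log2 N) for the FFT, so divide the vendor's summed FFT time
# by a log2 N factor; a divisor of 10 reproduces the quoted range.
for fft_ms in (8.5, 10.0):
    print(fft_ms / 10)           # 0.85 and 1.0 msec
```

Either way, 1.0 msec is a defensible worst-case figure for the redundant DWT, which is why the ten-fold cushion lands at the 10 msec design delay.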

Our real-time algorithm buffers 2048 data points and processes them as a block for multiresolution analysis (MRA) and feature extraction. In the simulation of the algorithm, we inserted a 512-data-point delay for buffer acquisition, which allows time for the MRA, feature extraction, and subsequent recognition processing. Table 4 provides the time delay (number of samples skipped during buffer delay) required for the processing at various sampling frequencies.

Table 4. Sampling frequencies vs. the number of samples skipped in the design (10 msec), redundant DWT (1.0 msec), and non-redundant DWT (0.2 msec) processing time delays.

Sampling    Sampling time      Number of samples in the given processing delay
frequency                      10 msec    1.0 msec    0.2 msec
8 kHz       0.125000 msec      80         8           2
16 kHz      0.062500 msec      160        16          4
24 kHz      0.041667 msec      240        24          5
32 kHz      0.031250 msec      320        32          7
44 kHz      0.022727 msec      440        44          9
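The sample counts in Table 4 follow from multiplying each sampling rate by the delay and rounding up; since samples per millisecond equals the rate in kHz, a quick check is:

```python
import math

# Samples arriving during each processing delay: ceil(fs_kHz * delay_ms).
delays_ms = [10.0, 1.0, 0.2]
for fs_khz in [8, 16, 24, 32, 44]:
    row = [math.ceil(fs_khz * d) for d in delays_ms]
    print(fs_khz, row)  # e.g. 8 [80, 8, 2] ... 44 [440, 44, 9]
```

The ceiling reproduces every entry in Table 4, including the fractional cases (24 kHz × 0.2 msec = 4.8 → 5 samples).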

Once the algorithm is implemented using the 10 msec design delay, it can be optimized by testing various delay parameters and sampling frequencies. Since the sampling frequency and the processing time delay are controllable once the algorithm is implemented, the system can be tuned to the most suitable sampling rate and delay for ground vehicle acoustic signal recognition applications. Thus, our algorithm can be implemented using COTS (commercial-off-the-shelf) components. It is also flexible enough to fit various other acoustic signal recognition applications.

6.0 CONCLUSIONS

We presented our approach to a challenging acoustical ground vehicle recognition problem. From our simulation using the actual measured data set, our algorithm shows promising results. The feature extraction method using the DWT MRA, in conjunction with the CWT, provided the distinctive features that played a major role in our approach. We also acknowledge that a proper choice of wavelet type will impact the recognition performance; one should examine the acoustic signals, if available, prior to selecting the wavelet or wavelets for the MRA. We also performed a real-time hardware implementation study using COTS components. Our study indicated that our algorithm can be efficiently implemented on a programmable DSP microprocessor integrated with data acquisition components.

We are currently testing the algorithm with data sets from other ground vehicles and evaluating various types of wavelets (linear spline, cubic spline, Battle-Lemarié, Daubechies wavelets with more than 4 taps, etc.) for acoustic signal recognition applications. We plan to develop a working hardware prototype using the algorithm with a PC-based DSP processing board.

7.0 REFERENCES

[1] B. A. Telfer, H. H. Szu, and G. J. Debeck, "Adaptive wavelet classification of acoustic backscatter," SPIE Proceedings in Wavelet Applications, Harold H. Szu (ed.), Vol. 2242, pp. 661-668, 1994.
[2] M. K. Tsatsanis and G. B. Giannakis, "Time-varying system identification and model validation using wavelets," IEEE Transactions on Signal Processing, Vol. 41, No. 12, pp. 3512-3523, December 1993.
[3] C. K. Chui, An Introduction to Wavelets, Academic Press, Inc., Boston, 1992.
[4] Y. Meyer, Wavelets: Algorithms and Applications, SIAM, Philadelphia, 1993.
[5] B. Jawerth and W. Sweldens, "An overview of wavelet based multiresolution analysis," Department of Mathematics, University of South Carolina, Columbia, SC 29208.
[6] I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, 1992.
[7] A. N. Akansu and R. A. Haddad, Multiresolution Signal Decomposition, Academic Press, Inc., Boston, 1992.
[8] S. Haykin, Communication Systems, 2nd Edition, John Wiley & Sons, 1983.
[9] H. Choe, C. Poole, A. Yu, and H. Szu, "Novel identification of intercepted signals from unknown radio transmitter," SPIE Proceedings in Wavelet Applications, Harold H. Szu (ed.), Vol. 2491.
[10] H. Choe, C. Poole, A. Yu, and H. Szu, "Ear-like multiresolution preprocessing neural network for transmitter identification," World Congress on Neural Networks (WCNN) '95, 17-21 July 1995, Washington, D.C.
[11] H. Choe, C. Poole, and A. Yu, "Transmitter identification using wavelet transient analysis and neural network," Progress in Electromagnetics Research Symposium (PIERS), 24-28 July 1995, Seattle, WA.
[12] H. Szu, X. Yang, B. Telfer, and Y. Sheng, "Neural network and wavelet transform for scale-invariant data classification," Physical Review, Vol. 48, No. 2, August 1993.

Figure 2. V1: (a) Squeak sound; (b) Sound under motion; (c) Exhaust sound.

Figure 3. V2: (a) Squeak sound; (b) Sound under motion; (c) Revving sound.