Neural Networks for Vehicle Recognition

Henryk Maciejewski, Jacek Mazurkiewicz, Krzysztof Skowron, Tomasz Walkowiak
Institute of Engineering Cybernetics, Technical University of Wrocław
ul. Janiszewskiego 11/17, 50-372 Wrocław, POLAND
E-mail: [email protected]

Abstract

In this contribution we describe a neural approach to classifying vehicles based on the sound they emit. The engine and the running gear of a vehicle are the sources of the signal. The methodology used does not require a detailed analysis of which part of the object is responsible for which components of the signal. The sound is preprocessed using a wavelet method to obtain feature vectors for a neural classifier. We test and compare two different neural networks as recognition devices: a multilayer perceptron (MLP) and a probabilistic neural network (PNN) based on a Gaussian mixture. Results of the classification of various military vehicles are the subject of the conclusions. The described methods can be used as the basis for intelligent robots.

1 Introduction

The recognition of mobile objects is an active subject of research. The sound generated by a vehicle in motion is the basis for the classification process. The engine and the running gear of a vehicle are the sources of the signal; the signal also depends on external conditions, the kind of pavement, speed, etc. The methodology used does not require a detailed analysis of which part of the object is responsible for which components of the signal. Recognition based on acoustic information is possible if the sound generated by the tracked object includes specific features which allow it to be distinguished from the signals produced by other vehicles. The experimental results show that this assumption is correct. Detecting slight differences is not trivial and requires sophisticated algorithms for acoustic signal processing. The problem becomes more complicated if the analysis has to be done as a real-time task. The recognition algorithm was divided into two stages: preprocessing, to extract a vector of specific features from the signal, and classification, to compare the feature vector with master vectors defined during the learning procedure and to make the final decision. We tried to use two kinds of neural classifiers instead of a classical statistical classifier.

We carried out a set of experiments to record the sounds generated by mobile motor vehicles in different ground conditions. We recorded signals produced by military vehicles - tanks, transporters - moving on roads, unsurfaced roads and firing grounds. Speed, distance from the measurement centre and the direction of movement were extra parameters. The signals were converted into computer files using a DSP system, and they were the basis for constructing the master vectors related to each kind of object.

The research related to mobile object recognition based on acoustic information can be used as the basis for new systems supporting contemporary military equipment. The system observes the scene in a passive way, so it is very hard to find and destroy military devices equipped with such an intelligent attachment. The proposed recognition algorithm can be implemented very efficiently using DSP technology and microcontrollers. This means that the system can be worked out as an ASIC circuit and is easily applicable as a no-service unit in existing military vehicles.

2 Feature extraction

Recognition of vehicles by DSP techniques based on emitted sounds requires that the sampled, digitized acoustic signals from vehicles are preprocessed in order to extract feature vectors that can be used to distinguish between different types of vehicles. This section describes a system for feature vector generation based on wavelet multiresolution analysis (MRA). The generated feature vectors are then input to a classifying system.

[Fig. 1 block diagram: digitized acoustic signal sampled at 5 kHz → highpass FIR filter at 50 Hz → stream of vectors v=[v1,...,vn] → normalization → stream of vectors w=[w1,...,wn] → wavelet multiresolution analysis → stream of feature vectors x=[x1,...,xk] → to classifier.]

Fig. 1. System for signal preprocessing based on wavelet MRA

The overall block diagram of the signal preprocessing system is presented in Fig. 1. In this system, the acoustic signal, sampled at 5 kHz and digitized with a 16-bit A/D converter, is filtered using a digital highpass filter with a cut-off frequency of 50 Hz. The purpose of filtering at this stage is to remove some low-frequency noise attributed to, e.g., wind effects. Blocks (vectors) of n=1024 samples were filtered, which at the sampling frequency of 5 kHz corresponds to a signal duration of about 0.2 s. The filtered vectors are normalized so that the changing power of the acquired acoustic signals is not reflected in the feature vectors. If v=[v1,...,vn] denotes a filtered vector (with n=1024), then the corresponding normalized, power-invariant vector w is:

$$\mathbf{w} = \frac{\mathbf{v}'}{|\mathbf{v}'|} \qquad (1)$$

where

$$\mathbf{v}' = [v_1 - \bar{v},\; \ldots,\; v_n - \bar{v}], \qquad \bar{v} = \frac{1}{n}\sum_{i=1}^{n} v_i \qquad (2)$$

and the symbol $|\cdot|$ denotes the length of a vector.

The wavelet multiresolution analysis is the core stage of the preprocessing system, in which feature vectors are generated from the normalized vectors. One feature vector x=[x1,...,xk] represents one normalized vector w=[w1,...,wn], which corresponds to about 0.2 second of sound. The multiresolution analysis is based on the Haar wavelet transform, which is computationally straightforward. The coefficients of a feature vector are obtained as follows. For a vector w with n=1024 coefficients there are 10 (=log2 n) possible wavelet resolution levels. Each resolution level contributes one coefficient to the feature vector x=[x1,...,xk], so that k=10. The term xm, m=1,2,...,k, represents the mean square of the Haar wavelet coefficients at level m:

$$x_m = \frac{1}{2^{m-1}} \sum_{i=1}^{2^{m-1}} \left( h_i^{(m)} \right)^2 \qquad (3)$$

where:

$$h_i^{(m)} = \frac{w_{2i-1}^{(m+1)} - w_{2i}^{(m+1)}}{\sqrt{2}}, \qquad i = 1, 2, \ldots, 2^{m-1} \qquad (4)$$

and the terms $w_i^{(m)}$ are defined by the following recurrence formula:

$$w_i^{(m)} = \frac{w_{2i-1}^{(m+1)} + w_{2i}^{(m+1)}}{\sqrt{2}}, \qquad i = 1, 2, \ldots, 2^{m-1} \qquad (5)$$

where m=1,2,...,k is the resolution level and

$$w_i^{(k+1)} \equiv w_i, \qquad i = 1, 2, \ldots, n \qquad (6)$$

which means that the coefficient x10 of the feature vector, at the resolution level of 10, is computed directly from the vector w=[w1,...,w1024]. Typically with MRA feature extraction the mean term can also be included in the feature vector, which would give the feature vector in the form x=[x0,x1,...,x10] with $x_0 = w_1^{(1)}$. However, due to the vector normalization described here, this additional term is always zero, and therefore the feature vectors used for classification are of the form x=[x1,...,x10].
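For concreteness, the following is a minimal sketch of this feature extraction step in Python/NumPy. It is our illustration, not the authors' code; the 1/√2 scaling in the Haar recurrence is the standard orthonormal convention and, like all names used here, is an assumption of this reconstruction.

```python
import numpy as np

def haar_mra_features(v):
    n = v.size                        # n = 1024 in the paper
    k = int(np.log2(n))               # k = 10 resolution levels
    v_prime = v - v.mean()            # Eq. (2): subtract the mean
    w = v_prime / np.linalg.norm(v_prime)   # Eq. (1): unit-length vector
    x = np.zeros(k)
    # Eq. (6): the recurrence starts from w^(k+1) = w itself;
    # walk down the levels m = k, k-1, ..., 1.
    for m in range(k, 0, -1):
        pairs = w.reshape(-1, 2)
        h = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)  # Eq. (4): detail coefficients
        w = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)  # Eq. (5): smoothed signal
        x[m - 1] = np.mean(h ** 2)    # Eq. (3): mean square at level m
    return x

# One 0.2 s block of the 5 kHz signal (illustrative random data)
block = np.random.randn(1024)
print(haar_mra_features(block))
```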

3 Probabilistic neural network

Assume that we have objects from one of K classes, 1,...,K, described by a feature vector x. The process of classification can be understood as taking one of K+1 possible decisions 1,...,K,ℜ, where 1,...,K corresponds to claiming that the observed object is from class 1,...,K, whereas ℜ means 'being in doubt', i.e., rejecting the object. Assuming that a probability density function is given for each class (denote it by fl(x)), optimal (minimum-error-rate) Bayes theory (described in many standard textbooks, for example [3]) can be used for classification. If there is no cost associated with being correct, all the prior probabilities (the probabilities that an example is drawn from a given class) are equal, and the rejection threshold is equal to α, the optimal Bayes decision rule is to choose the maximum of:

$$f_1(\mathbf{x}),\; \ldots,\; f_K(\mathbf{x}),\; (1-\alpha)\sum_{l=1}^{K} f_l(\mathbf{x}) \qquad (7)$$

The rejection threshold α is a user-specified constant in the (0,1) range. Values of α near zero give a high rejection rate; if, on the other hand, α > 1-1/K, then the decision ℜ will never be taken. Given an estimator of the probability density function for each class, the optimal Bayes decision rule can serve as the basis for the probabilistic neural network shown in Fig. 2. The density estimator for each class is independent of the other classes; therefore, from this point on we discuss the density of one class only.
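As an illustration (ours, not from the paper), a minimal sketch of decision rule (7), assuming the class densities fl are available as Python callables:

```python
import numpy as np

def bayes_classify(x, densities, alpha):
    """Decision rule (7): return a class 1..K, or 0 for the reject decision."""
    f = np.array([f_l(x) for f_l in densities])  # f_1(x), ..., f_K(x)
    if (1.0 - alpha) * f.sum() >= f.max():       # the 'in doubt' term wins
        return 0
    return int(np.argmax(f)) + 1                 # most probable class
```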

[Fig. 2 block diagram: the input vector x feeds K density estimators f1, f2, ..., fK; their outputs, together with the rejection term (1-α)Σfl, enter a winner-takes-all stage whose output is class 1, class 2, ..., class K, or reject.]

Fig. 2: The Probabilistic Neural Network (PNN)

[Fig. 3 diagram: an RBF network with an input layer of d neurons and a hidden layer of N neurons with activations Ψi = F(x, mi, si); the hidden outputs are weighted by a1, ..., aN and summed.]

Fig. 3: The RBF network for density estimation

Assume that the unknown density can be represented as a linear combination of component densities F(x,θi) (where θi is a parameter vector):

$$f(\mathbf{x}) = \sum_{i=1}^{N} a_i F(\mathbf{x}, \theta_i) \qquad (8)$$

where ai represents the prior probability of the data having been generated from component i of the mixture. These priors must satisfy the constraints:

$$\sum_{i=1}^{N} a_i = 1, \qquad 0 \le a_i \le 1 \qquad (9)$$

The component densities (kernels) F(x,θi) can be chosen as any density function; however, we limit our attention to the Gaussian distribution. Moreover, the covariance matrix of the Gaussian is restricted to a diagonal matrix (a matrix having non-zero values only on its diagonal), so that Σi = diag(si1, si2, ..., sid), and hence:

$$F(\mathbf{x}, \theta_i) = F(\mathbf{x}, \mathbf{m}_i, \mathbf{s}_i) = \frac{1}{(2\pi)^{d/2} \prod_{j=1}^{d} s_i^j}\, \exp\!\left(-0.5 \sum_{j=1}^{d} \left(\frac{x_j - m_i^j}{s_i^j}\right)^{\!2}\right) \qquad (10)$$

Based on equations (8) and (10), the feed-forward RBF neural network presented in Fig. 3 can be defined. The number of neurons in the input layer is set equal to the dimension of the example vector (denoted here as d; in our case 10). Each neuron of the input layer is connected to each of the N neurons in the hidden layer. A centre vector mi and a width vector si are associated with each hidden neuron, and the activation of each hidden neuron is defined by equation (10). All hidden neuron activations are multiplied by the weights ai and summed, giving the output of the network. The PNN in Fig. 2 requires a separate RBF network for each class. The described PNN model has several free parameters which must be determined during a learning process. The parameter values ai, mi, si of each RBF network are determined for each class separately, based on the training set for that class. This is done using a probability density estimation algorithm, described in [2], which lies within the framework of the Expectation-Maximisation (EM) algorithm [3].
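The following is a minimal sketch of evaluating the mixture density (8) with the diagonal Gaussian kernel (10) in Python/NumPy. It is our illustration; the fitted parameters a, M, S are assumed to come from the EM-based estimator of [2], which is not reproduced here.

```python
import numpy as np

def gaussian_kernel(x, m, s):
    """Eq. (10): diagonal-covariance Gaussian F(x, m_i, s_i)."""
    d = x.size
    norm = (2.0 * np.pi) ** (d / 2.0) * np.prod(s)
    return float(np.exp(-0.5 * np.sum(((x - m) / s) ** 2)) / norm)

def mixture_density(x, a, M, S):
    """Eq. (8): f(x) = sum_i a_i F(x, m_i, s_i); the a_i satisfy Eq. (9)."""
    return sum(a_i * gaussian_kernel(x, m_i, s_i)
               for a_i, m_i, s_i in zip(a, M, S))
```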

4 Multilayer perceptron

We used a two-layer feed-forward network to classify the given vectors. The number of inputs of the net was forced by the size of the vectors to be classified (in our case: 10). The number of outputs equalled the number of classes to be recognized: four. We expected the network to respond using a one-of-n code - one output active while the others are inactive. The size of the hidden layer was one of the parameters varied during the experiments. It turned out that the best results were obtained with four hidden neurons, while increasing the size of the hidden layer caused two drawbacks:
- the network tended to learn by heart: we observed better results when recognising patterns used to train the network and worse results on unknown ones;
- the learning process slowed down.
The net was trained with the standard Levenberg-Marquardt gradient algorithm for learning feed-forward networks [4]. Hyperbolic tangent sigmoid and linear transfer functions were used to connect the neurons between layers. For testing we had to define a reliability parameter which decides whether the answer can be treated as a valid one-of-n code. We accepted a sample for a given σ if and only if:
- one output showed a value greater than 1-σ,
- the other outputs showed values less than σ.
Otherwise the sample was treated as unrecognized and the object was rejected. The described parameter allowed us to adjust the reliability of our network; it plays the same role as the parameter α in the PNN. A minimal sketch of this acceptance test is given after this section's description.
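The sketch below (ours, not the authors' code; names such as accept and sigma are illustrative) implements the stated acceptance test in Python/NumPy:

```python
import numpy as np

def accept(outputs, sigma):
    """Return the winning class index if the one-of-n answer is reliable,
    otherwise -1 (the sample is rejected as unrecognized)."""
    winner = int(np.argmax(outputs))
    others = np.delete(outputs, winner)
    if outputs[winner] > 1.0 - sigma and np.all(others < sigma):
        return winner
    return -1
```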

5 Experiments

The PNN and the MLP have been tested on the recognition of four different military vehicles (BWP, MTLB, SKOT and T-72) based on the acoustic signal generated by them. We carried out a set of experiments to record the sounds generated by mobile motor vehicles in different ground conditions. We recorded signals produced by military vehicles moving on roads, unsurfaced roads and firing grounds. Speed, distance from the measurement centre and the direction of movement were extra parameters.

The recorded acoustic signals were digitised at a frequency of 5 kHz. Next, the raw data was segmented into non-overlapping frames of 1024 points. A feature vector was generated for each frame as described in section 2. This preprocessing results in a 10-dimensional sample space. The data consisted of two different (independent) sets: training and testing. Each class in the PNN is represented by an RBF neural network with 10 input neurons and 8 hidden neurons (the network giving the best results). The MLP has 4 hidden neurons (also chosen as the network giving the best results). The achieved results for the testing data set and several rejection thresholds are presented in Tab. 1 for the PNN and Tab. 2 for the MLP. The results are compared in Fig. 4 and Fig. 5.

threshold   correct (%)   error (%)   rejected (%)
0.001         28.61         2.40        68.98
0.01          42.18         4.07        53.75
0.05          52.57         7.73        39.70
0.10          58.26        10.20        31.53
0.15          61.85        12.61        25.55
0.20          64.29        14.75        20.96
0.25          65.91        16.60        17.49
0.30          67.54        18.37        14.09
0.35          68.65        20.04        11.31
0.40          70.28        21.59         8.13
0.45          71.39        23.44         5.18
0.50          72.98        24.99         2.03
0.60          73.49        26.14         0.37
0.65÷1.0      73.68        26.32         0.00

Table 1: Achieved results with respect to the rejection threshold for the PNN

threshold   correct (%)   error (%)   rejected (%)
0.10          30.65         2.88        66.47
0.15          37.67         4.73        57.60
0.20          45.32         7.25        47.43
0.25          50.35         9.57        40.07
0.30          54.86        11.98        33.16
0.35          58.85        14.53        26.62
0.40          62.66        17.15        20.18
0.45          65.80        20.96        13.23
0.50          68.80        24.66         6.54
1.00          71.20        28.80         0.00

Table 2: Achieved results with respect to the rejection threshold for the MLP

[Fig. 4 plot: error rate for the MLP and the PNN plotted against the percentage of rejected data (0-70%).]

Fig. 4: Error rate (%) with respect to percentage of rejected data

[Fig. 5 plot: correct recognition rate for the MLP and the PNN plotted against the percentage of rejected data (0-70%).]

Fig. 5: Correct recognition rate (%) with respect to percentage of rejected data

6 Conclusion

The achieved results are slightly better for the PNN than for the MLP, but the difference is really small; therefore it is impossible to state which network is better. The MLP has 10·4+4+4·4+4 = 64 free parameters (weights), which is much smaller than the 4·(8·10+8·10+8) = 672 of the PNN. This is explained by the fact that the PNN tries to estimate not the class boundary (as the MLP does), but the probability density of each class. The PNN learns faster than the MLP, due to the fact that each class in the PNN is learned separately, which reduces the size of the training set compared with the MLP. Because the learning of each class is independent in the case of the PNN, the PNN classifier can be extended to a higher number of classes without relearning the whole net (only new RBF nets for the new classes must be learned).

This way it is very easy to adapt the system to recognise completely new objects while preserving the existing master data. The database can be updated if there are changes in the construction of an object which affect the sounds generated by the vehicle; in this way the database can be kept from becoming out-of-date. It is worth investigating the performance of a mixture of the PNN and MLP classifiers, as well as sequential classification (classification based not on a single feature vector but on several consecutive vectors). Another area of future work is the problem of selecting the number of hidden neurons in both networks. In this study this free parameter was chosen to give the best recognition results on the testing set, which is not suitable for a real application; it needs further investigation. The research related to mobile object recognition based on acoustic information can be used as the basis for new systems supporting contemporary military equipment. The system observes the scene in a passive way, so it is very hard to find and destroy military devices equipped with such an intelligent attachment. The proposed recognition algorithm can be implemented very efficiently using DSP technology and microcontrollers, as an intelligent robot. This means that the system can be worked out as an ASIC circuit and is easily applicable as a no-service unit in existing military vehicles.

References

[1] B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, 1996.
[2] T. Walkowiak, "Use of RBF neural networks for density estimation", Proceedings of the XIXth National Conference on Circuit Theory and Electronic Networks, Kraków-Krynica, Poland, October 23-26, vol. II, pp. 657-662.
[3] A.P. Dempster, N.M. Laird, D.B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm", Journal of the Royal Statistical Society, vol. B 39, pp. 1-38, 1977.
[4] Ch.M. Bishop, Neural Networks for Pattern Recognition, Clarendon Press, Oxford, 1995.
[5] H.C. Choe, R.E. Karlsen, G.R. Gerhart, T. Meitzler, "Wavelet-based ground vehicle recognition using acoustic signals", Proc. SPIE, vol. 2762, 1996.