Generalization Ability of a Support Vector Classifier Applied to Vehicle Data in a Microphone Network

Andris Lauberts, David Lindgren
Swedish Defence Research Agency

Abstract - Audio recordings of vehicles passing a microphone network are studied with respect to classification ability under different weather and local conditions. The audio database includes recordings from different seasons, at various sensor locations, and with different microphones. A support vector machine (SVM) is used to classify vehicles from normalized, low-frequency spectral features of short time chunks of the audio signals. The classification performance using individual time chunks is estimated, as well as the accuracy of fusing data from the different microphones in the network. The study shows that, by combining temporal and spatial data, a vehicle traversing a microphone network can be correctly classified in up to 90 percent of all runs. A more demanding test, classifying data from totally independent measurement equipment, yields 70 percent correct classifications.

Keywords: Classification, support vector machine, data fusion, vehicle, microphone, network.

1 Introduction

During the past few years, extensive recordings of passing vehicles have independently been carried out by the Swedish Defence Research Agency (FOI) and the company YZComp. Well over 1000 individual and combined passages of 14 different vehicle models have been registered by a sensor system consisting of several microphones and other sensors. An important task of this sensor system was, and still is, identification and/or classification of the passing vehicles. An important track of FOI's research in this respect has been vehicle classification by normalized spectral features. Note that by classification we here mean identification of the vehicle model, for instance a SAAB 900 or a Volvo V70. This paper shows the ability of a support vector machine to solve this task, particularly its ability to generalize between different sensor settings, seasons and measurement equipment.

1.1 Vehicle Classification by Audio

Audio recordings of vehicles passing a microphone do, not surprisingly, show a systematic difference depending on the type of passing vehicle, say a car and a motorcycle. Most people can, by experience and/or little training, recognize the difference and rather accurately distinguish between cars and motorcycles merely by listening to the sound of the recordings. However, natural variations such as different speeds and gears, Doppler shift, and so on, variations that the human brain easily generalizes over, can actually make it rather difficult to construct an automated classifier that does an equally good job as a human when it comes to distinguishing between different vehicles. To make an automated classifier work well, it usually needs to be "trained" by a large set of recordings with known vehicle types: training sets that encompass a major part of all variations within each vehicle class. If a new recording in some sense resembles one of the recordings in the training set, the library, then there is a good chance that the classifier makes the right choice. These libraries, however, need to be excessively large and become very expensive unless the classifier is given the ability to extrapolate or generalize. By ability to generalize we mean that the classifier should make a satisfactory classification although the match with the recordings in the library is not perfect. This is really the classical problem within the field of pattern recognition and machine learning: to learn something general from a limited set of observations, see, for instance, [1]. By using audio recordings from a sensor network with a whole set of microphones placed at different locations, more information/observations is given, and thus the sensor network should give better classifier performance compared to a single microphone. The sensor network, however, also gives an increased richness of variation in the observations, and the demand on the classifier's ability to robustly generalize increases to the same degree.

The audio recordings used in this paper were collected at many different locations and include recordings both in the summer and in the winter, see Table 1 for the YZComp data. (The measurements are further discussed in Section 4.) This rich data set gives a good opportunity to study the generalization ability of a classifier by collecting a library at one location and validating it at another. This and other tests are described in Section 5. The ultimate goal is robust tracking and classification of targets moving in a natural environment under a wide range of conditions, using a minimal knowledge base provided by available data communication in a self-configuring network. This paper concentrates on classification of single moving vehicles using libraries of models trained on known targets, leaving several challenges for future work, for instance the need to compensate for varying environment, weather, dynamics, target orientation, internal sound and bandwidth.

* A. Lauberts (corresponding author), Department of Command and Control, Swedish Defence Research Agency, SE-581 11 Linköping, Sweden. Email: [email protected], phone: +46 13 37 83 38, fax: +46 13 37 80 58.

1.2 Advantage and Limitation

An advantage of the methods used in this paper (Section 2) is that the signals are preprocessed by power normalization. This is believed to reduce the impact of variations in signal power due to different sensor-target distances and different gain settings in the measurement amplifier. In consequence, this relaxes the need for accurate tuning and placement of the sensor system. A known drawback of the methods is that the spectral estimation needs a rather stationary signal; that is, the classifier hardly copes with quick changes (< 1/2 s). In practice this means that the classifiers proposed here are very good at steadily running vehicles, but less effective during fast accelerations and gear shifts. It will be assumed that a recorded signal is due to one vehicle only.
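The effect of the power normalization can be made concrete with a small sketch (NumPy for illustration only; the paper's own implementation is in MATLAB): scaling a feature vector by any positive gain leaves its unit-norm version unchanged, so sensor-target distance and amplifier gain drop out of the features.

```python
import numpy as np

def normalize_periodogram(X):
    """Scale a periodogram (or any feature vector) to unit Euclidean norm."""
    return X / np.linalg.norm(X)

# A toy periodogram, and the same one recorded with twice the amplifier gain.
X = np.array([0.1, 4.0, 0.5, 2.0])
X_high_gain = 2.0 * X

# After normalization the gain difference disappears.
print(np.allclose(normalize_periodogram(X), normalize_periodogram(X_high_gain)))
```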

2 Spectral Features

Inspecting the frequency spectrum of an (audio) signal often gives a "better picture" than inspecting the signal directly. This is particularly so when there are periodic components in the signal, for instance from a revolving engine; these periodic components appear as peaks in the spectrum. The spectral representation also makes it easy to focus on frequency bands known to contain relevant information (feature selection). In this paper, vehicles are classified by audio spectra estimated by the discrete Fourier transform of microphone recordings. The idea is that there are systematic differences between the spectra of different vehicles, and that these differences can be learned by an automated classification program. This is briefly explained below, as well as how to fuse spectra from different microphones and time instances. Signal classification in the frequency domain is well covered in the literature, see for instance [2], although the best way to handle the non-stationarity of vehicle audio appears to be an open question.

2.1 Discrete Fourier Transform (DFT)

Denote by x_i(t), t = 1, 2, ..., n the sampled (audio) signal from one microphone under a fixed sampling frequency and a fixed interval of time (n). The index i is the recording ID; recordings with different IDs can, of course, be due to different vehicles, but may also be repeated recordings of the same vehicle. The standard discrete Fourier transform (DFT) offers one way to estimate the signal power of {x_i(t)}_1^n at the (scaled) frequency f as the periodogram X_i(f), f = 0, 1, ..., n - 1, represented by the n-dimensional vector

X_i = [ X_i(1)  X_i(2)  ···  X_i(n) ]^T.    (1)

This is explained in detail in any elementary textbook on digital signal processing, see for instance [3, pp. 66, 131]. Figure 1 illustrates the low-frequency part of the periodograms from two different vehicles; the periodograms are normalized to unit Euclidean norm.

Figure 1: Example of normalized periodograms (spectral density versus frequency) from two different vehicles.
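A minimal periodogram computation in the spirit of (1) can be sketched as follows (NumPy for illustration; the signals below are synthetic sinusoids, not real microphone data, and the frequencies are hypothetical):

```python
import numpy as np

def periodogram(x):
    """Periodogram of a sampled signal: squared magnitude of its DFT."""
    X = np.fft.fft(x)
    return np.abs(X) ** 2 / len(x)

# Two synthetic "vehicle" signals with different fundamental frequencies.
fs, n = 1000, 1024                   # sampling rate (Hz) and window length
t = np.arange(n) / fs
x1 = np.sin(2 * np.pi * 30 * t)      # 30 Hz fundamental
x2 = np.sin(2 * np.pi * 45 * t)      # 45 Hz fundamental

P1, P2 = periodogram(x1), periodogram(x2)

# The dominant low-frequency peaks sit at different DFT bins,
# which is the kind of systematic difference the classifier exploits.
print(np.argmax(P1[: n // 2]), np.argmax(P2[: n // 2]))
```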

2.2 Limitations of the DFT

A spectral estimate by the DFT needs a rather stationary signal with periodic elements to yield as distinct features (peaks) as in Figure 1. Transient processes such as accelerations, gear shifts and fast Doppler shifts smear out the peaks considerably and make it more difficult to distinguish the periodogram of one vehicle from that of another. To somewhat relax the requirement on stationarity, we take the DFTs of short time chunks (windows) of the signal (0.5 second or less). A too narrow window, however, limits the accuracy (low frequency resolution and severe impact of random noise), so there is a trade-off. Thus, every signal is first segmented into a set of time chunks which in turn yield a set of periodograms. This set of periodograms is then aggregated to give the overall picture of the signal's spectral features. Of course, the spectrum cannot be expected to be constant over time, since the vehicle speed, and thereby the periodic revolutions of the engine, may change during the recording. Also, the Doppler shift of a passing vehicle makes the frequency components move over the spectrum. This means that the periodograms from one signal cannot simply be aggregated by, say, the mean value. More adequate methods of aggregation (fusion) are proposed in Section 3.2.
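The window-length trade-off can be quantified: the DFT bin spacing is fs/n and the window duration is n/fs. For the two window configurations used later in the paper (8192 points at 25 kHz, and 1024 points at 2 kHz) this gives:

```python
# Frequency resolution (DFT bin spacing) is fs / n; window duration is n / fs.
# The two (fs, n) pairs correspond to the preprocessing setups in Section 5.
for fs, n in [(25_000, 8192), (2_000, 1024)]:
    print(f"fs={fs} Hz, n={n}: window {n / fs:.3f} s, resolution {fs / n:.2f} Hz")
```

Both windows stay at roughly half a second or less, with a frequency resolution of a few Hz, consistent with the low-frequency bands (10-300 Hz) extracted as features.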

3 Classification Methods

A periodogram is classified by comparing it to a set of periodograms in a class library. This library is based on (training) recordings where the vehicle class is known beforehand, so if a new periodogram matches a periodogram in the library, it can directly be associated with a vehicle class. In this paper we exclusively study the support vector machine (SVM) classifier, which is one among many methods to represent and use the class library. Below we briefly describe the SVM, which is well known in the literature, see [1, 4], and also how to fuse data from multiple microphones.

3.1 Support Vector Machine (SVM)

The SVM classifier is fundamentally a binary classifier f : R^n -> R, predicting the class label ω_i that takes on values in the set {-1, 1}. For the library (training) periodograms {X_i, ω_i}_1^N, f is the solution to

min_c  Σ_{i=1}^{N} V(f(X_i), ω_i) + ξ c^T K c    (2)

with

f(X_i) = Σ_{j=1}^{N} c_j K(X_i, X_j)    (3)

and the loss V(f(X_i), ω_i) = max[1 - ω_i f(X_i), 0]. Note that c = [c_1 c_2 ··· c_N]^T. K(X_i, X_j) is a reproducing kernel; in this paper we use the polynomial kernel

K(X_i, X_j) = (γ X_i^T X_j + 1)^d    (4)

and the radial basis kernel

K(X_i, X_j) = e^{-γ ||X_i - X_j||^2}.    (5)

γ and d are tuning parameters. The kernel matrix K has K(X_i, X_j) on the ith row and jth column. The term ξ c^T K c in (2) serves to smooth f (Tikhonov regularization); we use ξ = 1. The minimization problem (2) is a quadratic programming problem with a structure that allows very large instances to be solved efficiently. Far from all periodograms in the class library need to be stored.

In the literature there are many schemes for combining a set of binary SVM classifiers to handle multiclass problems, and it is still an open question how best to do this, see [4]. We use an implementation that defines an SVM classifier for every class pair; the results of the binary classifications are then combined in a non-trivial way. We refer to [5] for details.

Estimating Reliability  When periodogram X_i is classified, the classification reliability λ_i, needed for efficient data fusion, is also estimated. λ_i is a number between 0 and 1, where values close to 1 indicate a very confident classification. λ_i is induced by the class posterior probabilities P_ω(X_i), assumed available as an output of the classification program (SVM), one per class, ω = 1, 2, ..., q, where q is the number of classes in the library. It is assumed that the probabilities sum to unity,

Σ_{ω=1}^{q} P_ω(X_i) = 1,   i = 1, 2, ...    (6)

A confident classification is one with one class posterior probability significantly larger than all others. We define λ_i as

λ_i = (sup_ω P_ω(X_i) - q^{-1}) / (1 - q^{-1}),   ω = 1, 2, ..., q.    (7)

3.2 Data Fusion

By data fusion, two or more observations of the same target made by two or more sensors are combined into a unified result or classification. Data fusion also combines observations made at different time instances, so we may distinguish between spatial fusion, where observations are collected from different sensors, and temporal fusion, where observations are collected over time. Decision fusion is a special fusion concept, where each observation is classified individually, whereafter these separate classifications are combined into a unified classification (in contrast to combining raw data). Below, three decision fusion methods are outlined: Bayes rule, voting, and VIP voting.

Bayes Rule  Given a set of independent observations {X_i}_1^{N_s} to be fused, the posterior distribution for class ω is, assuming a uniform prior,

P_ω(X_1, X_2, ..., X_{N_s}) = Π_{i=1}^{N_s} P_ω(X_i) / Σ_{j=1}^{q} Π_{i=1}^{N_s} P_j(X_i).    (8)

As before, P_ω(X_i) is the posterior of class ω given X_i. P_ω(X_1, X_2, ..., X_{N_s}) is evaluated for every class ω, and the set {X_i}_1^{N_s} is classified as the class ω* that yields the supreme value. The expression (8) is derived from the well-known Bayes rule p(·|X_i) p(X_i) = p(X_i|·) p(·).

Voting  The final classification is taken as the majority class of all individual periodogram classifications. Ties are resolved by also considering the second-highest rankings.

VIP Voting  In VIP voting, only the ten most reliable (largest λ_i) classifications are allowed to vote.

4 Measurements

The concept of an interactive sensor network has been tested during experimental runs at different places in Sweden by YZComp (November 2002, May 2003) and FOI (June 2003, October 2003), using arrays of acoustic sensors consisting of microphones in addition to other sensors. The YZComp database contains data on 14 different (military) vehicle models recorded at different speeds. The acoustic sensor setup consisted of 4 microphones (u1, u2, u3 and u4) with sampling rates of 10, 40 and 50 kHz. The measurements have been collected from 12 different sensor setups at different times, locations and ground conditions according to Table 1. The FOI acoustic database contains data on six different vehicle types, recorded using a network of 5 microphone arrays operating at a 44 kHz sampling rate. The October trial shared three vehicle types with the June trial. The vehicles were driven at steady speeds of 20-35 km/h in both directions on a gravel road through forested, broken terrain.

Table 1: Type code of the included vehicles by YZComp, together with the total number of recorded runs. The runs are distributed over 12 sensor locations, A-D (November) and E-L (May), with road conditions D: dirt road, G: grass, W: woods, A: asphalt, alone or combined (A: dirt/grass, B: grass, C: dirt/woods, D: dirt, E: grass, F: asphalt, G: grass, H: dirt, I: grass, J: grass, K: dirt, L: dirt).

Type Code   Weight   Propulsion   Runs
LW1         Light    Wheels         14
LW2         Light    Wheels         67
HW1         Heavy    Wheels         58
LT          Light    Tracks         44
HT3         Heavy    Tracks         48
HT8         Heavy    Tracks         44
HW3         Heavy    Wheels         82
HT9         Heavy    Tracks        102
HW2         Heavy    Wheels         38
HT4         Heavy    Tracks         30
HT2         Heavy    Tracks         34
HT1         Heavy    Tracks         30
HT10        Heavy    Tracks         65

5 Measurement Analysis

The analysis consists of two parts: classification of passages and classification of periodograms. The first part focuses on classifying a vehicle passage using both spatial and temporal fusion of periodograms from all 4 microphones and all time instances. This method is applied to the richer YZComp data only, keeping training and test sets well apart, with the aim of setting an upper performance level using carefully pre-filtered signal data. In the second part, the classification accuracy is estimated for individual periodograms taken independently from all sensor locations and time instances. This method is applied to both FOI and YZComp data, with the aim of seeing how far a classifier trained on one data set can generalize to a different data set.

5.1 Implementation

The analysis is conducted entirely in MATLAB, with CPU-intensive parts in C++. We use the SVM implementation LIBSVM, see [5]. The LIBSVM classifier outputs a score vector with estimated class probabilities P_ω(X_i). The score vector sums to unity, and the actual classification is simply the class corresponding to the maximum P_ω(X_i).
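The kernels (4) and (5) used by the SVM are standard; their kernel-matrix computation can be sketched in NumPy (the paper itself uses LIBSVM via MATLAB, so this is an illustration only, with toy 2-D feature vectors in place of periodograms):

```python
import numpy as np

def polynomial_kernel(X, Y, gamma=1.0, d=3):
    """Eq. (4): K(x, y) = (gamma * x^T y + 1)^d, rows of X against rows of Y."""
    return (gamma * X @ Y.T + 1.0) ** d

def rbf_kernel(X, Y, gamma=1.0):
    """Eq. (5): K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

# Kernel matrices of three toy 2-D "periodograms" against themselves.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K_poly = polynomial_kernel(X, X, gamma=1.0, d=3)
K_rbf = rbf_kernel(X, X, gamma=1.0)
print(K_poly[0, 0], K_rbf[0, 1])
```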

5.2 Classification of Passages

Here we study the classification accuracy when all audio data from a whole passage is used. Classification is basically the decision fusion of periodograms collected from all four microphones, starting from the instant the audio power exceeds a predefined level. Decision fusion by VIP voting is used.

Preprocessing  The preprocessing serves to form a normalized periodogram with appropriate resolution:

1. Resample to 25 kHz.
2. Detect the close passage as the continuous signal part where the signal power exceeds 20% of the maximum power.
3. Calculate periodograms based on DFTs of non-overlapping 8192-point rectangular windows.
4. Extract periodogram components 5 to about 100 (15-300 Hz).
5. Normalize to unit Euclidean norm.

The data reduction, followed by the extraction of certain low frequencies of the close passage, reduces the computational load considerably and also, to some extent, avoids numerical issues. The SVM classifier is applied directly to the preprocessed data.

Validation Approach  In a number of sections below, the classifier performance is evaluated (model validation). The difference between the sections is the selection of passages used to build the vehicle class library, and the selection of test passages used to compute the accuracy of the classification. This accuracy is computed as the share of correctly classified vehicles. The test passages always come from sensor settings different from those making up the class libraries, see Table 1. To be certain not to obtain results that are too optimistic, all single-vehicle passages in the available database are used; even awkward-sounding passages (starting, stopping, accelerating, and so on) are classified without exception. This means that the classifier performance is evaluated on data that include test cases where there is really no hope of making a correct classification, not even for a trained human ear. The number of periodogram components and the kernel function have been selected to yield a classification accuracy of around 99% when testing the classifier on the library data (training data). With higher hit rates, the performance degrades due to overfitting (fitting to noise).
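Preprocessing steps 2, 3 and 5 (passage detection by a power threshold, non-overlapping windowed periodograms, unit-norm normalization) can be sketched as follows. The window length and the 20% threshold follow the text; everything else, including the synthetic signal, is illustrative:

```python
import numpy as np

def detect_passage(x, frame=1024, rel_threshold=0.2):
    """Step 2: keep the contiguous part where frame power exceeds
    rel_threshold times the maximum frame power."""
    n_frames = len(x) // frame
    power = np.array([np.mean(x[i*frame:(i+1)*frame] ** 2)
                      for i in range(n_frames)])
    active = np.flatnonzero(power > rel_threshold * power.max())
    return x[active[0] * frame : (active[-1] + 1) * frame]

def windowed_periodograms(x, window=8192):
    """Step 3: periodograms of non-overlapping rectangular windows,
    each normalized to unit Euclidean norm (step 5)."""
    chunks = [x[i:i + window] for i in range(0, len(x) - window + 1, window)]
    P = np.abs(np.fft.rfft(chunks, axis=1)) ** 2
    return P / np.linalg.norm(P, axis=1, keepdims=True)

# Synthetic recording: silence, a loud 30 Hz "passage", silence.
rng = np.random.default_rng(0)
quiet = 0.01 * rng.standard_normal(8192)
loud = np.sin(2 * np.pi * 30 * np.arange(3 * 8192) / 25_000)
x = np.concatenate([quiet, loud, quiet])

passage = detect_passage(x)
P = windowed_periodograms(passage)
print(P.shape, np.linalg.norm(P, axis=1))
```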

5.2.1 Data Collected Winter Time

This vehicle class library is collected from sensor setting A and validated on B, C and F. These settings comprise the recordings collected winter time and include 7 vehicle models, see Table 1. Table 2 is the confusion matrix resulting from the classification; note that the rows represent the true vehicles and the columns represent the output from the classifier. Table 3 gives the aggregated statistics over all vehicles. The classifier's most significant failure is classifying 7 out of 35 HT3 as HT9. Around 80% correctly classified passages is a rather consistent result here. In this test we used periodogram components 5-130 and the polynomial kernel (4) with d = 5.

Table 2: Confusion matrix for classification at settings B, C and F; the library is from setting A. Rows are true vehicle types (LW2, HW1, HW3, LT, HT3, HT9, HT8) and columns are classifier outputs.
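The "share of correct classifications" reported in the accuracy tables (Tables 3, 5, 7 and 9) is simply the trace of a confusion matrix divided by its total, and the per-vehicle rates are the diagonal over the row sums. A sketch with a hypothetical 3-class matrix (not the paper's actual data):

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = true class, cols = predicted.
C = np.array([[23,  2,  1],
              [ 3, 28,  0],
              [ 1,  4, 20]])

accuracy = np.trace(C) / C.sum()        # share of correctly classified passages
per_class = np.diag(C) / C.sum(axis=1)  # per-vehicle recall
print(f"{accuracy:.0%}", np.round(per_class, 2))
```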

Table 3: Share of correct classifications at different settings for a vehicle class library from setting A.

Setting   Correct
B         78% (51/65)
C         85% (72/85)
F         79% (57/72)
Average   81% (180/222)

5.2.2 Data Collected Summer Time

This vehicle class library is collected from sensor setting K and validated on I and J. These recordings are collected summer time and include 7 vehicle models. Table 4 is the confusion matrix, and Table 5 gives the aggregated statistics over all vehicles. It is seen that the vehicle passages at sensor setting I are somewhat more difficult to classify compared to K. This is believed to be due to rather transient vehicle behavior in I, which is known to be unfavorable for the classification method used; this type of vehicle dynamics is only rarely represented to the same degree in the library. On average, 90% of the passages are correctly classified. In this test we used periodogram components 5-130 and the polynomial kernel (4) with d = 5.

Table 4: Confusion matrix for classification at settings I and J; the library is from setting K. Rows are true vehicle types (HW2, HT2, HW3, HT4, HT1, HT9, HT6) and columns are classifier outputs.

Table 5: Share of correct classifications at different settings for a classification library from setting K.

Setting   Correct
I         86% (31/36)
J         92% (105/114)
Average   91% (136/150)

5.2.3 Data from Winter and Summer Time

This experiment comprises 12 vehicles and recordings collected in both winter and summer: 7 vehicles were recorded in the winter and 7 in the summer, with only two vehicles, HW3 and HT9, in common between the winter and summer recordings, see Table 1 and the following section. The confusion matrix in Table 6 indicates difficulties in discriminating the vehicle pair HW3 and LW2 and the vehicle pair HT3 and HT9. Table 7 shows that around 80% of the vehicles are correctly classified on average. It is seen that sensor settings F and I do not fit the vehicle class library well. In this test we used periodogram components 5-130 and the polynomial kernel (4) with d = 5.

Table 6: Confusion matrix for classification at settings B, C, F, G, I and J; the library is from settings A and K. Rows are true vehicle types (HW2, HT2, LW2, HW1, HW3, LT, HT3, HT4, HT1, HT9, HT6, HT8) and columns are classifier outputs.

Table 7: Share of correct classifications at different settings for a library from settings A and K.

Setting   Correct
B         74% (48/65)
C         84% (71/85)
F         68% (49/72)
G         90% (18/20)
I         67% (24/36)
J         96% (109/114)
Average   81% (319/392)

5.2.4 Winter Library Validated in Summer

This library is collected winter time and validated in the summer, see Table 8 and Table 9. Around 99% of the passages are correctly classified, which is a satisfying figure, although it would have been interesting to study more vehicles; however, these are the only vehicles recorded both in the winter and in the summer. In this test we used periodogram components 5-70 and the radial basis kernel (5) with γ = 4.

Table 8: Confusion matrix for settings in the summer (I, J, K); library from settings in the winter (A, B, C, F).

        HW3   HT9
HW3      37
HT9       1    34

Table 9: Share of correct classifications at settings in the summer (I, J, K); library from settings in the winter (A, B, C, F).

Setting   Correct
I         100% (12/12)
J         100% (36/36)
K          96% (23/24)
Average    99% (71/72)

5.3 Classification of Periodograms

In this second part of the analysis, we consider classification libraries based on either YZComp or FOI field trials. This gives the opportunity to validate the methods on data collected by a totally independent measurement system, compared to the recordings in the class library. The SVM is used with the polynomial kernel (4) and d = 3. Only spatial data fusion is used.

Preprocessing  Prior to the SVM, the signal is preprocessed as follows:

1. Resample to 2 kHz.
2. Calculate periodograms based on DFTs of non-overlapping 1024-point rectangular windows.
3. Automatically select 50 windows with an average power exceeding 50 percent of the maximum power.
4. Automatically remove outliers due to vehicles standing still or strongly accelerating.
5. Extract components 6 to 75 (10-150 Hz).
6. Normalize to unit Euclidean norm.

5.3.1 Data Collected by FOI in October

The confusion diagram in Figure 2 and its companion, Figure 3, show the performance of the SVM classifier applied to FOI Oct-03 data on six vehicles, using microphone no. 3 for training and microphones 5, 6 and 8 for testing on a mix of the same runs used for training. As it should, the dominant squares lie on the main diagonal. We note that for all vehicles, data fusion using voting or a Bayes rule beats the mean estimate from single sensors. Note also that voting, except for the vehicle HT4, is just as good as the Bayes rule, reaching about 90% mean success rate, see Table 10 (FOI Oct-03 - Internal).

Figure 2: Confusion diagram showing performance of the SVM trained on FOI Oct-03 data, mic 3, tested on Oct-03 data, mics 5, 6 and 8, applied to all six vehicles and mixed runs used for training.

Figure 3: Classification performance from the mean of single sensors, voting and Bayes rule for the case described in Figure 2.

5.3.2 FOI Oct-03 Data Validated on June Data

Still focusing on FOI data, the confusion diagram in Figure 4, together with its companion, Figure 5, shows the performance of the SVM classifier trained on FOI Oct-03 data with six vehicles using microphone no. 3, tested on FOI June-03 data with three vehicles in common with the Oct-03 set, using microphones 5, 6 and 8. Vehicle types are sorted to pronounce correct associations, shown by the dominance of squares on the main diagonal in the leftmost 3 x 3 block, where LW1 stands for two similar vehicle types. Major confusions appear for vehicle HW2, sharing some estimates with LW1 and HT2. Note also that vehicle HT2 is only confused, to a small degree, with other heavy vehicle types. In contrast to other cases, data fusion is inferior to the mean estimates, scattering around an 80 percent success rate, see Table 10 (FOI Oct-03 - June-03). A look at the prediction levels per microphone shows that microphone no. 5 performs considerably worse than nos. 6 and 8, easily corrupting the data fusion.

Figure 4: Confusion diagram of the SVM trained on FOI Oct-03 data with six vehicles, mic 3, tested on June-03 data, mics 5, 6 and 8, with three vehicles in common with the Oct-03 runs.

Figure 5: Classification performance from the mean of single sensors, voting and Bayes rule for the case described in Figure 4.

5.3.3 YZComp Data with 5 Vehicles Validated on FOI October Data

A crucial test concerns the validation of the SVM classifier with training and testing data taken from completely different field trials, locations, vehicles and sensors. The confusion diagram in Figure 6, and its companion, Figure 7, shows the performance of the SVM trained on YZComp data with five vehicles at five locations (B, C, E, F and I), using four microphones, tested on FOI Oct-03 data with five other vehicles having types in common with the YZComp set, using three microphones. With a slight exception (HT1), all dominant squares lie on the diagonal. Major confusions appear for vehicle LW1, sharing estimates with the completely different type HT1, and vehicle HT2, sharing estimates with HT4 and HT1. In practically all instances data fusion beats the mean estimate, with voting closely matching the Bayes rule, apart from the big exception at HT2, see Table 10 (YZComp 5 - FOI Oct). We also note an uneven success rate between the different vehicles, with LW1 and HT2 trailing behind the much better success numbers for HW2, HT4 and HT1.

Figure 6: Confusion diagram of the SVM trained on YZComp data with five vehicles, mics u1, u2, u3 and u4, tested on FOI Oct-03 data, mics 5, 6 and 8, using five vehicles with types in common with the YZComp set.

Figure 7: Classification performance from the mean of single sensors, voting and Bayes rule for the case described in Figure 6.

5.3.4 YZComp Data with 10 Vehicles Validated on FOI October Data

A final, even more challenging, test again probes the generalization power of the SVM using training data from one field trial and testing on quite another, this time including five more vehicle types for training than used in the test. The confusion diagram in Figure 8, together with its companion, Figure 9, shows the performance of the SVM classifier trained on YZComp data with ten vehicles at five locations (B, C, E, F and I), using four microphones, tested on FOI Oct-03 data with five vehicles having types in common with the YZComp set, using microphones 5, 6 and 8. Compared with Figure 6, we now see that only types HW2, HT4 and HT1 show dominant squares on the diagonal in the leftmost 5 x 5 block. Major confusions appear for vehicle LW1, estimated as the completely different type HW1, and vehicle HT2, sharing estimates with HT9 plus minor additions from other types. However, even the best-performing three middle types confuse, to a lesser degree, their estimates with other types. This time data fusion only slightly beats the mean estimate, see Table 10 (YZComp 10 - FOI Oct). Compared with Figure 7, the uneven success rate between the different vehicles is even more pronounced, with the LW1 and HT2 rates dropping almost to zero. Evidently the inclusion of five more vehicle types in the training set offers a lot more choices, pointing out the need to restrict the number of known types as far as possible. Another source of error could be individual differences, even between vehicles of the same model.

Figure 8: Confusion diagram of the SVM trained on YZComp data with ten vehicles, mics u1, u2, u3 and u4, tested on FOI Oct-03 data, mics 5, 6 and 8, with five vehicles of types in common with the YZComp set.

Figure 9: Classification performance from the mean of single sensors, voting and Bayes rule for the case described in Figure 8.

Table 10: Performance in percent for different combinations of data sets and classification methods. Code A:BCD stands for library A, validated on B, C and D, etc. Internal: training and testing using a mix of the same runs but different microphones. YZComp 5 and 10: training using 5 and 10 vehicle types, respectively.

Data                       Mean   Voting   Bayes
Classif. of passages:
YZComp A:BCD                      81%
YZComp K:IJ                       91%
YZComp AK:BCFGIJ                  81%
YZComp ABCF:IJK                   99%
Class. of periodograms:
FOI Oct-03 - Internal       75%   87%      93%
FOI Oct-03 - June-03        80%   74%      77%
YZComp 5 - FOI Oct          64%   70%      77%
YZComp 10 - FOI Oct         37%   42%      41%

6 Conclusions

The classification accuracy of support vector machines on normalized spectral features of audio recordings of passing vehicles has been estimated. With reference to Table 10, we summarize the results as follows.

Passages, YZComp data  By using spatial and temporal decision fusion in a network with four microphones, an overall classification accuracy exceeding 80% has been achieved. It appears that classification in the summer is easier (90% accuracy) than in the winter (80% accuracy); however, not all vehicles used in the winter recordings were used in the summer.

FOI data  Success rates of almost 90% are achieved using data from six vehicles and one observing period, decreasing to almost 80% using training data on six vehicles and test data on three vehicles from a completely different observing period.

SVM trained on YZComp data, tested on FOI data  Including only the common part of five vehicles, the success rates are 64% per sensor and 70-77% with data fusion. Including ten vehicle types in the training set, performance drops to only about 40%.

It is challenging to account for the evolving differences between successive spectral estimates caused by speed and engine effects of a particular vehicle. The effect of significant speed changes can, of course, be avoided by simply disregarding spectral estimates with a detected temporal change. Engines, however, display a wide spectrum of signatures, requiring a more detailed study.

References

[1] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. Springer, 2001.

[2] R. H. Shumway, "Discriminant analysis for time series," in Handbook of Statistics, P. R. Krishnaiah and L. N. Kanal, Eds. North-Holland Publishing Company, 1982, pp. 1-46.

[3] S. K. Mitra, Digital Signal Processing: A Computer-Based Approach. McGraw-Hill, 1998.

[4] R. Rifkin and A. Klautau, "In defense of one-vs-all classification," Journal of Machine Learning Research, vol. 5, pp. 101-141, 2004.

[5] C.-C. Chang and C.-J. Lin, LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.