Computer Standards & Interfaces 33 (2011) 142–151
Vehicle model recognition from frontal view image measurements

A. Psyllos a,c,⁎, C.N. Anagnostopoulos b, E. Kayafas a

a Electrical and Computer Engineering School, National Technical University of Athens, Greece
b Cultural Technology & Communication Dpt., University of the Aegean, Mytilene, Greece
c European Commission Joint Research Centre, Institute for the Protection and Security of the Citizen, Italy

⁎ Corresponding author. European Commission Joint Research Centre, Institute for the Protection and Security of the Citizen, Italy. Tel.: +39 0332 789864. E-mail address: [email protected] (A. Psyllos).

doi:10.1016/j.csi.2010.06.005
Article info: Available online 28 June 2010

Keywords: Vehicle; Recognition; Image; Measurement
Abstract: This paper deals with a novel vehicle manufacturer and model recognition scheme, enhanced by color recognition for more robust results. A probabilistic neural network is assessed as a classifier, and it is demonstrated that relatively simple image processing measurements can be used to obtain high-performance vehicle authentication. The proposed system is assisted by previously developed license plate recognition, symmetry axis detection and image phase congruency calculation modules. The reported results indicate a high recognition rate and a fast processing time, making the system suitable for real-time applications. © 2010 Elsevier B.V. All rights reserved.
1. Introduction

Most existing vehicle detection systems deal only with vehicle/non-vehicle detection, classification into general categories, or vehicle tracking, tasks that have been adequately addressed in the literature [5,7,8,19]. However, Vehicle Manufacturer and Model Recognition (VMMR) is a subject with relatively limited research reported in the field, since image acquisition is performed in outdoor environments where illumination conditions are uncontrolled and varying, making recognition a difficult task. Given the wide variety in the appearance of vehicles within a single category alone, it is difficult to categorize vehicles using a simple model. Dlagnekov and Belongie [3] utilized the Scale Invariant Feature Transform (SIFT) features developed by Lowe [10,11], which are invariant to scale and rotation and even partially invariant to illumination differences, making them suitable for VMMR. In their work, rear view vehicle images were used, attaining 89.5% recognition accuracy. The vehicle manufacturer and model were treated as a single class and recognised simultaneously. However, it was reported that the system does not achieve real-time performance, and no results for recognition speed were published. Petrovic and Cootes [14] presented an interesting approach to VMMR from frontal view vehicle images that displayed 93% recognition accuracy; the manufacturer and model were again treated as a single class and recognised simultaneously, and no results for recognition speed were reported. Merler [13] presents a vehicle detection system based on color segmentation and labeling, which performs color recognition and segmentation but lacks VMMR. Lee [9] uses texture descriptors such as contrast, homogeneity,
entropy and momentum for frontal view vehicle images, classified by a three-layer neural network for VMMR, with 94% accuracy. The vehicle manufacturer and model were treated as a single class and recognised simultaneously; processing times are not reported. A comparative knowledge acquisition system appears in Maemoto et al. [12], consisting of several object recognition modules which represent parts of a car image viewed from the rear (a window, tail lights, and so on), based on color recognition. This approach has the drawback of being sensitive to lighting conditions.

In this work, a system that aims at reliable VMMR is presented. It first locates the license plate in a vehicle frontal view image and detects a region of interest over the vehicle, including the headlights, grille and logo area, henceforth called the "vehicle mask image". A symmetry axis module is employed to verify the segmented frontal vehicle mask image and, if needed, to adjust it centrally. Then, manufacturer logo detection and segmentation are performed using an image processing technique called phase congruency calculation (see Section 2.3). The contribution of this work lies mainly in the effective use of relatively simple image processing techniques for LPR, vehicle mask and logo image detection and segmentation, and in the use of a Probabilistic Neural Network (PNN) as a classifier, resulting in fast recognition times suitable for real-time applications. A novelty of this work is the use of a hierarchical image database that contains a set of much smaller databases, one per manufacturer, holding all the models of that manufacturer. This methodology reduces the recognition time significantly, by at least one order of magnitude, since only the model database of the respective manufacturer needs to be searched.¹

¹ If there are M manufacturers and N models per manufacturer, the recognition complexity using a single database is O(M²N²), whereas using a hierarchical database it is O(M²) + O(N²).
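To make the footnote concrete, the following sketch contrasts the two lookup strategies. It is illustrative only (the paper publishes no code); `match_score` stands in for whatever template matcher is used, and the database layout is an assumption:

```python
# Illustrative two-stage lookup; names are ours, not the authors'.
import numpy as np

def match_score(query, template):
    # Stand-in matcher: negative Euclidean distance between feature vectors.
    return -np.linalg.norm(query - template)

def flat_search(query, db):
    # db: {(manufacturer, model): feature_vector}
    # Compares the query against every manufacturer-model template.
    return max(db, key=lambda key: match_score(query, db[key]))

def hierarchical_search(logo_query, mask_query, logo_db, model_db):
    # Stage 1: recognise the manufacturer from the logo (one comparison
    # per manufacturer).
    maker = max(logo_db, key=lambda m: match_score(logo_query, logo_db[m]))
    # Stage 2: search only that manufacturer's model database (one
    # comparison per model of that maker).
    model = max(model_db[maker],
                key=lambda mdl: match_score(mask_query, model_db[maker][mdl]))
    return maker, model
```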
Fig. 1. Vehicle manufacturer and model recognition system architecture.
To determine which manufacturer's models to search, a logo detection and recognition module is employed first and the manufacturer is detected. Next, the model that corresponds to the vehicle mask in which the logo was detected is recognised using the model database for that manufacturer. Frontal view vehicle images are chosen since logos are easier to detect on them than on rear view images of vehicles. Another contribution of this work is the implementation of a fast color recognition scheme, using a simple histogram-based technique, along with VMMR, which completes and integrates the information collected from every vehicle frontal image. In brief, the proposed system consists of eight modules: 1) vehicle license plate recognition, 2) vehicle frontal view segmentation, 3) color recognition, 4) phase congruency calculation, 5) vehicle mask segmentation, 6) vehicle manufacturer Probabilistic Neural Network (PNN) recognition, 7) vehicle SIFT fingerprint measurement and 8) vehicle model PNN recognition, as depicted in Fig. 1. The algorithmic details and the experimental results are presented in the following sections of this paper.

2. Vehicle Manufacturer and Model Recognition (VMMR)

2.1. Vehicle license plate recognition

The vehicle front view image, from a photo camera or a framed video camera sequence, is first converted to greyscale (Portable Gray Map, PGM) with 8-bit resolution and scaled to 640 by 480 pixels; then a License Plate Recognition (LPR) module is applied. LPR uses a Sliding Concentric Window (SCW) segmentation method (Anagnostopoulos [1]), masking, binarization with the Sauvola method [16], followed by connected component labelling and binary measurements, applied in sequence.
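As an illustration of the binarization step, a minimal sketch of Sauvola thresholding [16] follows, using the standard parameters k and R. The windowed statistics are computed here with uniform filters rather than the integral-image formulation, and the function name is ours, not from the paper's implementation:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sauvola_binarize(gray, window=15, k=0.2, R=128.0):
    """Sauvola adaptive thresholding [16] on a 2-D uint8 grayscale image.
    T(x, y) = m(x, y) * (1 + k * (s(x, y) / R - 1)), with local mean m
    and local standard deviation s over a (window x window) neighbourhood."""
    img = gray.astype(np.float64)
    mean = uniform_filter(img, window)                  # local mean m(x, y)
    sq_mean = uniform_filter(img * img, window)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))  # local std s(x, y)
    threshold = mean * (1.0 + k * (std / R - 1.0))
    # Dark plate characters fall below the threshold (-> 0), background -> 255.
    return (img > threshold).astype(np.uint8) * 255
```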
Fig. 2 highlights the image processing steps for license plate detection, and Fig. 3 shows the flowchart of the license plate segmentation algorithm. The input to this module is a grayscale image, and the result is the coordinates of a rectangular area that includes the vehicle plate number. The whole process is described analytically in Anagnostopoulos et al. [2], where it is reported that this module, applied to a large set of 1334 images captured under various illumination conditions, achieves a success rate of 96.5% (1287/1334 plates) for plate segmentation and 89.1% success in the recognition of the entire plate content (1148/1287). The whole process is quite fast (about 100 ms on average).

2.2. Vehicle frontal view image segmentation

In this section, the area of interest ("vehicle mask") is defined as Wmask = 4 × W and Hmask = 2 × H, where Wmask and Hmask are the width and height of the vehicle mask and W and H are the width and height of the segmented license plate, respectively. The coordinates of the license plate were detected as described in Section 2.1, and an example is shown in Fig. 4. It should be noted, however, that this assumption does not hold for vehicles with a non-symmetric frontal license plate location. In that case, a vehicle image symmetry axis module (see our previous work [15] and [18]) is employed to shift the license plate to the symmetric (central) position before proceeding as before.
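The mask geometry above reduces to a few lines. The sketch below assumes the plate is returned as (x, y, W, H) with a top-left-corner convention and centres the 4W × 2H mask on the plate; the exact vertical anchoring is our assumption, since the paper only fixes the mask size:

```python
def vehicle_mask_roi(plate_box, image_shape):
    """Return the 4W x 2H 'vehicle mask' region of interest (Section 2.2).
    plate_box = (x, y, W, H) with (x, y) the plate's top-left corner;
    image_shape = (height, width). Centring on the plate is an assumption."""
    x, y, W, H = plate_box
    cx, cy = x + W / 2.0, y + H / 2.0      # plate centre
    w_mask, h_mask = 4 * W, 2 * H          # Section 2.2 definition
    x0 = int(max(cx - w_mask / 2.0, 0))
    y0 = int(max(cy - h_mask / 2.0, 0))
    x1 = int(min(cx + w_mask / 2.0, image_shape[1]))
    y1 = int(min(cy + h_mask / 2.0, image_shape[0]))
    return x0, y0, x1, y1                  # clipped to the image borders
```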
Fig. 2. Steps for license plate segmentation: a) the initial image, b) the result of the SCW segmentation technique after application of the segmentation rule and thresholding, c) image b after the image masking step, d) plate detection.
Fig. 3. The LPR system architecture.
2.3. Vehicle mask image segmentation

The vehicle mask shown in Fig. 4 was segmented further so as to isolate the manufacturer logo. To accomplish this task, a method based on phase congruency calculation was implemented, in which a dimensionless measure is employed to assess the existence of significant features (Kovesi [6], Anagnostopoulos [2]). Values of phase congruency vary from a minimum of 0 (indicating no significance) to 1 (indicating a very significant feature). Code with default values provided by Kovesi [6] was used in this study, and a characteristic feature curve was calculated for every image as an "image signature" of the vehicle, unique and representative for each of the samples used; see Fig. 5(a). For instance, for the car image of Fig. 5(c), the plot was divided into discrete sections that correspond to the logo (part C), radiator grille (parts B and D) and headlights (parts A and E) using a phase congruency map gradient threshold (dS/dx ≥ 4.5); see Fig. 5(b). The most important part of the "image signature" is the central region of the vehicle mask, where the manufacturer logo usually appears.
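The signature split described above can be sketched as follows, assuming the per-column phase congruency sums are already available as a 1-D array S(x) and using the quoted gradient threshold; the boundary-merging rule is an illustrative choice:

```python
import numpy as np

def split_signature(signature, grad_threshold=4.5):
    """Split a 1-D phase congruency 'image signature' S(x) into parts
    (e.g. headlight / grille / logo regions) at the columns where the
    gradient magnitude |dS/dx| reaches the threshold (Section 2.3)."""
    grad = np.gradient(np.asarray(signature, dtype=float))
    cuts = np.flatnonzero(np.abs(grad) >= grad_threshold)
    # Keep the first column of each run of adjacent cut columns.
    boundaries = [0] + [c for i, c in enumerate(cuts)
                        if i == 0 or c > cuts[i - 1] + 1] + [len(signature)]
    # Return the (start, end) column pairs of the resulting segments.
    return list(zip(boundaries[:-1], boundaries[1:]))
```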
2.4. Vehicle color recognition

The captured vehicle image was cropped using the area over the vehicle mask, covering the hood of the vehicle, as shown in Fig. 6. In order to estimate the vehicle color, histograms of the R, G and B components were created from the segmented hood image. The peak of each histogram (Rmax, Gmax, Bmax) forms the dominant area of interest, which defines the color of the vehicle, as can be seen in Fig. 7.
Fig. 4. Definition of the vehicle image mask (RoI) based on license plate location.
The estimated dominant RGB value was then compared with a given set of 16 standard colors using the Euclidean distance, resulting in a color characterization for the input vehicle image. For example, for the vehicle shown in Fig. 6, color recognition yields the color presented in Fig. 8.

2.5. Vehicle manufacturer recognition

A Probabilistic Neural Network (PNN), described by Specht [17], was used for classification of the vehicle manufacturer (logo). It is a special type of neural network, generally used for pattern classification, which aims at assigning an input pattern to one of a set of predefined classes previously presented to the network during supervised training. The network is first presented with samples of each pattern that it will be expected to recognize; the PNN learns these classes and calculates the probability that an unknown input pattern belongs to each of them. There are at least three layers: an input, a radial and an output layer. The radial units are copied directly from the training data, one per case; each radial unit models a Gaussian function centered at its training case, and there is one output unit per class. The output units simply add up the responses of the radial units belonging to their own class. The outputs are each proportional to kernel-based estimates of the probability density functions (pdf) of the various classes, and normalizing them to sum to 1.0 produces estimates of the class probabilities. The greatest advantage of a PNN is training and classification speed: training consists essentially of copying the training cases into the network, which is fast and straightforward, making it appropriate for real-time image processing applications. As soon as one pattern per class has been observed, the network can begin to classify all future input patterns into one of the predefined classes. A disadvantage of the PNN is its size: the network contains the entire set of training cases, which is expensive in memory and computation, although memory capacity and processor speed keep increasing every year. In our case, training is done off-line and does not affect the real-time recognition phase.

The first layer has Radial Basis Function (RBF) neurons and computes the Euclidean distances from the input vector to the training input vectors, producing a vector whose elements indicate how close the input is to each training input. The second layer has competitive neurons; it calculates its weighted input with a dot product and sums these contributions for each class of inputs to produce, as its net output, a vector of probabilities. Finally, a competitive transfer function on the output of the second layer picks the maximum of these probabilities and produces a 1 for that class and a 0 for the other classes.

Fig. 5. (a) Phase congruency feature map, (b) derivative of phase congruency, (c) vehicle mask segmentation based on the phase congruency map gradient (dS/dx) threshold.

The Probabilistic Neural Network (PNN) is based on the Bayesian classifier:

P(c_i \mid x) = \frac{P(x \mid c_i)\, P(c_i)}{\sum_{j=1}^{N} P(x \mid c_j)\, P(c_j)}    (1)

where P(x | c_i) is the conditional probability density function of x given class c_i and P(c_j) is the prior probability of class c_j. Vector x belongs to class c_i if P(c_i | x) > P(c_j | x) for all j = 1, 2, …, N, j ≠ i. The PNN approximates the probability that vector x belongs to a particular class c_i (i.e. it estimates the likelihood of an input feature pattern being part of a learned category) as a sum of weighted Gaussian distributions centered at each training sample:
P(x \mid c_i) = \frac{1}{(2\pi)^{N/2} \sigma^N N_i} \sum_{j=1}^{N_i} \exp\!\left( -\frac{(x - x_{ij})^T (x - x_{ij})}{2\sigma^2} \right)    (2)

Fig. 6. Vehicle mask color segmentation.
where x_ij is the j-th training vector for patterns in class i, σ is the smoothing parameter, N is the dimension of the input vector and N_i is the number of training patterns in class i. For non-linear decision boundaries, the smoothing parameter σ needs to be as small as possible. If σ is near zero, the network acts as a nearest-neighbour classifier; as σ becomes larger, the network takes into account more of the nearby training vectors. The architecture of our PNN is shown in Fig. 9.
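A compact sketch of Eqs. (1) and (2) is given below. The paper's PNN was written in MATLAB; this NumPy version is an illustrative equivalent, not the authors' code, and it drops the Gaussian normalization constant since it is identical for every class and cancels in Eq. (1):

```python
import numpy as np

def pnn_classify(x, classes, sigma=0.9):
    """PNN of Eqs. (1)-(2). classes: list of (N_i x N) arrays holding the
    training vectors x_ij of each class; sigma is the smoothing parameter
    (0.9 was the optimum reported in Section 3.1.1). Equal priors assumed."""
    scores = []
    for X in classes:                          # radial layer: one unit per sample
        d2 = np.sum((X - x) ** 2, axis=1)      # (x - x_ij)^T (x - x_ij)
        scores.append(np.exp(-d2 / (2.0 * sigma ** 2)).mean())   # Eq. (2)
    scores = np.array(scores)
    posteriors = scores / scores.sum()         # Eq. (1): normalise to sum to 1
    return int(np.argmax(posteriors)), posteriors
```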
Fig. 7. Histograms of RGB components.
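The histogram-peak color estimate of Section 2.4 (Fig. 7) and the nearest-standard-color match amount to the following sketch; the palette is passed in, and the RGB values listed in Table 6 could serve:

```python
import numpy as np

def dominant_color(hood_rgb):
    """Per-channel histogram peaks (Rmax, Gmax, Bmax) of the segmented
    hood image, as in Section 2.4. hood_rgb: (h, w, 3) uint8 array."""
    return np.array([np.bincount(hood_rgb[..., c].ravel(),
                                 minlength=256).argmax()
                     for c in range(3)], dtype=float)

def classify_color(hood_rgb, palette):
    """Nearest standard color by Euclidean distance.
    palette: {name: (R, G, B)}, e.g. the values of Table 6."""
    peak = dominant_color(hood_rgb)
    return min(palette,
               key=lambda name: np.linalg.norm(peak - np.array(palette[name])))
```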
Fig. 8. Detected color.

Fig. 9. PNN system architecture.

2.6. Vehicle fingerprint measurements and vehicle model recognition

In parallel to logo image segmentation, the Scale Invariant Feature Transform (SIFT), introduced by Lowe [10,11], was applied to a series of vehicle images. SIFT is state-of-the-art in the field of image recognition and the method of choice for a wide range of applications. It is based on the idea of representing images by a set of descriptors built from gradient orientation histograms. The points of interest (keypoints) are located as local peaks in a scale-space search and filtered to preserve only those that are likely to remain stable over transformations. The SIFT keypoint descriptor has the following properties: a) smooth changes in location, orientation and scale do not cause radical changes in the feature vector; b) it is fairly compact, expressing the patch of pixels using a 128-element vector; c) it is resilient to deformations such as those caused by perspective effects, and is thus efficient for vehicle recognition in non-controlled conditions. The SIFT methodology is also supported by evidence from vision research in biological systems. In brief, SIFT includes four main steps: 1) scale-space extrema detection, 2) keypoint localization and filtering, 3) orientation assignment and 4) feature description, which are described below.

2.6.1. Scale-space extrema detection

In this stage of computation, salient features are detected and extracted from the image in question. Searches over all scales and image locations are performed to identify interest points that are invariant to scale and orientation. The scale-space L(x, y, σ) is defined as the convolution (*) of the image I(x, y) with a Gaussian kernel G(x, y, σ):

L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)    (3)

where

G(x, y, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-(x^2 + y^2)/2\sigma^2}    (4)

This is implemented efficiently by constructing a Gaussian pyramid (see Figs. 10, 11) and searching for local peaks (keypoints) in a series of difference-of-Gaussian (DoG) images (see Figs. 10, 12). Lowe has shown in [11] that the DoG approximates well the Laplacian of Gaussian, whose extrema are sought. In brief, the difference-of-Gaussian function D(x, y, σ) is computed as:

D(x, y, \sigma) = [G(x, y, r\sigma) - G(x, y, \sigma)] * I(x, y) = L(x, y, r\sigma) - L(x, y, \sigma)    (5)

The image I(x, y) is blurred progressively with a series (intervals) of blurring factors σ_k = rσ_{k−1}, k = 1, 2, …, n, where σ_0 is the initial blurring factor of each series, r = 2^{1/n} and n is the total number of blurrings (convolutions). Each set of these progressively Gaussian-blurred images is called an octave, and the total blurring factor from the first to the last image within the set equals 2 (σ_n / σ_0 = 2). The image is then downsampled by 50% in both directions, which is equivalent to a further blurring with a Gaussian filter and is computationally more efficient. The process described above is repeated for a fixed number of octaves, thus creating the image pyramid.
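A minimal construction of the pyramid of Eqs. (3)-(5) is sketched below, with scipy's Gaussian filter standing in for the convolution; σ0 = 1.6 is Lowe's default and is an assumption here, since the paper does not state its value:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(image, octaves=4, n=4, sigma0=1.6):
    """Gaussian / difference-of-Gaussian pyramid of Eqs. (3)-(5).
    Within an octave, sigma_k = r * sigma_(k-1) with r = 2**(1/n), so the
    total blurring across the octave is 2 (sigma_n / sigma_0 = 2)."""
    image = np.asarray(image, dtype=float)
    r = 2.0 ** (1.0 / n)
    gaussians, dogs = [], []
    for _ in range(octaves):
        sigma = sigma0
        octave = [gaussian_filter(image, sigma)]      # L(x, y, sigma_0)
        for _ in range(n):
            sigma *= r
            octave.append(gaussian_filter(image, sigma))   # L(x, y, sigma_k)
        gaussians.append(octave)
        # Eq. (5): adjacent Gaussian levels subtracted pairwise.
        dogs.append([b - a for a, b in zip(octave, octave[1:])])
        image = octave[-1][::2, ::2]   # downsample 50% in both directions
    return gaussians, dogs
```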
Fig. 10. A two-octave image pyramid description with 4 Gaussian convolutions per octave, taken from Lowe [11].

Fig. 11. A 4-octave Gaussian image pyramid with 4 convolutions per octave.

Fig. 12. Difference of Gaussians (DoG) from Fig. 11.

The difference-of-Gaussian (DoG) images at every octave are then checked for local extrema: a point is considered a minimum or a maximum if it is, by a given threshold, higher or lower than its twenty-six neighbors (8 in the current scale, 9 in the scale above and 9 in the scale below) in the difference-of-Gaussian space, as shown in Fig. 13.

2.6.2. Keypoint localization and filtering

At each candidate location, a detailed model is used to determine location, scale and contrast, fitted by a quadratic Taylor expansion [11]:

F(\mathbf{s}) = F_m + \left(\frac{\partial F}{\partial \mathbf{s}}\right)^{T} \mathbf{s} + \frac{1}{2}\, \mathbf{s}^T \frac{\partial^2 F}{\partial \mathbf{s}^2}\, \mathbf{s}    (6)

where s = (x, y, σ)^T, F(s) is the difference-of-Gaussian function D(s) shifted so that the origin lies at the sample point, and F_m is the value at the sample point of interest. The extremum position s_e is determined by taking the derivative of this function with respect to s and setting it to zero, giving the following value F(s_e) at the extremum:

\mathbf{s}_e = -\left[\frac{\partial^2 D}{\partial \mathbf{s}^2}\right]^{-1} \frac{\partial D}{\partial \mathbf{s}}, \qquad F(\mathbf{s}_e) = F_m + \frac{1}{2}\left(\frac{\partial F}{\partial \mathbf{s}}\right)^{T} \mathbf{s}_e    (7)

Keypoints with a contrast below some threshold value were rejected; this is equivalent to rejecting points with |F(s_e)| < h. Here a threshold value of h = 0.01 was used.

Eliminating keypoints with low contrast is not enough to ensure the stability of the remaining keypoints. It is also necessary to filter out keypoints lying along edges, since the DoG function has a strong response there even when the location along the edge is poorly determined and therefore unstable to small amounts of noise.
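The 26-neighbor test of Fig. 13, combined with the contrast rejection described above, can be sketched as a plain triple loop over one octave's DoG stack (full implementations vectorize this):

```python
import numpy as np

def local_extrema(dog_stack, h=0.01):
    """Scan a (scales x rows x cols) DoG stack for points higher or lower
    than all 26 neighbours (8 in the same scale, 9 above, 9 below),
    keeping only candidates whose contrast exceeds the threshold h."""
    D = np.asarray(dog_stack, dtype=float)
    keypoints = []
    for s in range(1, D.shape[0] - 1):
        for y in range(1, D.shape[1] - 1):
            for x in range(1, D.shape[2] - 1):
                v = D[s, y, x]
                if abs(v) < h:                 # low-contrast rejection
                    continue
                cube = D[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                if v >= cube.max() or v <= cube.min():
                    keypoints.append((s, y, x))  # (scale, row, col) candidate
    return keypoints
```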
As shown in Lowe [11], a poorly defined peak in the DoG function will have a large principal curvature across the edge but a small one in the perpendicular direction. The principal curvatures can be computed from a 2 × 2 Hessian matrix H at the location and scale of the keypoint:

H = \begin{bmatrix} \partial^2 D / \partial x^2 & \partial^2 D / \partial x \partial y \\ \partial^2 D / \partial y \partial x & \partial^2 D / \partial y^2 \end{bmatrix}    (8)

The eigenvalues of H are proportional to the principal curvatures of D and, following the approach used by Lowe [11] and by Harris and Stephens [4], explicitly computing the two eigenvalues can be avoided when only their ratio is known. Let ε be the ratio between the largest-magnitude eigenvalue κ and the smaller one λ, so that ε = κ / λ. It can be shown that the keypoint filtering criterion becomes

\mu = \frac{\mathrm{Tr}(H)^2}{\mathrm{Det}(H)} = \frac{(\varepsilon + 1)^2}{\varepsilon}    (9)

where μ is a threshold parameter and Tr(H) and Det(H) are the trace and determinant of H, respectively. In this paper a threshold value of μ = 30 is used; by substitution in Eq. (9), keypoints are rejected when ε > 28.

Fig. 13. Extrema detection comparing 26 neighbours (9–8–9 at each scale respectively), taken from Lowe [11].

2.6.3. Orientation assignment

Orientation assignment for each keypoint is done by computing the gradient magnitude m(x, y) and orientation θ(x, y) of the scale-space image at the scale of that keypoint:

m(x, y) = \sqrt{[L(x+1, y) - L(x-1, y)]^2 + [L(x, y+1) - L(x, y-1)]^2}    (10)

\theta(x, y) = \tan^{-1}\!\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right)    (11)
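Eqs. (9)-(11) in sketch form, with finite differences approximating the derivatives of D; np.arctan2 is used as a quadrant-aware version of the tan⁻¹ in Eq. (11):

```python
import numpy as np

def edge_response_ok(D, s, y, x, mu=30.0):
    """Eq. (9) edge rejection: keep the keypoint only if
    Tr(H)^2 / Det(H) < mu (mu = 30 corresponds to rejecting eps > 28)."""
    dxx = D[s, y, x + 1] - 2 * D[s, y, x] + D[s, y, x - 1]
    dyy = D[s, y + 1, x] - 2 * D[s, y, x] + D[s, y - 1, x]
    dxy = (D[s, y + 1, x + 1] - D[s, y + 1, x - 1]
           - D[s, y - 1, x + 1] + D[s, y - 1, x - 1]) / 4.0
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    return det > 0 and tr * tr / det < mu

def grad_mag_ori(L, y, x):
    """Gradient magnitude and orientation of Eqs. (10)-(11) at (x, y)
    in the Gaussian-blurred image L of the keypoint's scale."""
    dx = L[y, x + 1] - L[y, x - 1]
    dy = L[y + 1, x] - L[y - 1, x]
    return np.hypot(dx, dy), np.arctan2(dy, dx)
```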
Table 1
PNN training data set.

Manufacturer | Samples
Alfa Romeo | 5
Audi | 5
BMW | 5
Citroen | 5
Fiat | 5
Peugeot | 5
Renault | 5
Seat | 5
Toyota | 5
Volkswagen | 5
Unknown/other | 5

Table 2
PNN setup time.

PNN size | Setup time (ms)
25 × 25 | 576
50 × 50 | 637
100 × 100 | 754
200 × 200 | 1428
One or more orientations are assigned to each keypoint based on local image properties. The dominant orientations for each keypoint were identified from its local image patch: gradients are weighted by their distance from the feature point location and accumulated in a circular histogram of 36 bins covering 10° each. The bin with the greatest number of votes is selected, and the exact orientation is calculated from a parabola fitted to this peak and its 3 closest neighbors. The assigned orientation(s), scale and location for each keypoint enable SIFT to construct a canonical view for the keypoint that is invariant to similarity transforms.

2.6.4. Feature description

The local image gradients are measured in a neighborhood region around each keypoint (feature patch) and transformed into a representation that allows for local shape distortion and changes in illumination. This patch has previously been centered on the keypoint location, rotated according to its dominant orientation and scaled to an appropriate size. The keypoint descriptor is created by sampling the magnitudes and orientations of the image gradient in the patch around the keypoint and building smoothed orientation histograms that capture the important aspects of the patch.

Table 3
PNN classification statistics (recognition rate).

Manufacturer | Correct | Mistaken | Not recognized
Alfa Romeo | 9 | 0 | 1
Audi | 7 | 3 | 0
BMW | 10 | 0 | 0
Citroen | 10 | 0 | 0
Fiat | 9 | 1 | 0
Peugeot | 7 | 2 | 1
Renault | 7 | 2 | 1
Seat | 9 | 0 | 1
Toyota | 8 | 1 | 1
Volkswagen | 9 | 1 | 0
Unknown/other | 8 | 2 | 0
Total | 93 [85%] | 12 [11%] | 5 [4%]
Table 4
PNN classification time.

PNN size | Classification time (ms)
25 × 25 | 154
50 × 50 | 221
100 × 100 | 287
200 × 200 | 793
A 4 × 4 array of histograms, each with 8 orientation bins, describes the rough spatial structure of the patch (see Fig. 14). This 128-D vector (4 × 4 × 8) is then normalized to unit length and thresholded to remove elements with small values.

Fig. 14. Keypoint descriptor construction.
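A stripped-down sketch of the 4 × 4 × 8 descriptor of Fig. 14 follows; full SIFT implementations add Gaussian weighting and trilinear interpolation, which are omitted here, and the 0.2 clipping value is Lowe's [11], assumed rather than taken from this paper:

```python
import numpy as np

def sift_descriptor(mag, ori, center, patch=16):
    """Build the 128-D descriptor: a 4 x 4 grid of 8-bin orientation
    histograms over a (patch x patch) neighbourhood, normalised to unit
    length. mag, ori: gradient magnitude / orientation images of the
    keypoint's scale (Eqs. (10)-(11)); the patch is assumed already
    rotated to the dominant orientation."""
    cy, cx = center
    half = patch // 2
    hist = np.zeros((4, 4, 8))
    for dy in range(-half, half):
        for dx in range(-half, half):
            row = (dy + half) * 4 // patch       # which of the 4 x 4 cells
            col = (dx + half) * 4 // patch
            theta = ori[cy + dy, cx + dx] % (2 * np.pi)
            b = int(theta / (2 * np.pi) * 8) % 8  # 8 orientation bins
            hist[row, col, b] += mag[cy + dy, cx + dx]
    v = hist.ravel()                              # 4 * 4 * 8 = 128 elements
    v = v / (np.linalg.norm(v) + 1e-12)           # normalise to unit length
    v = np.minimum(v, 0.2)                        # clip large values (Lowe)
    return v / (np.linalg.norm(v) + 1e-12)        # renormalise
```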
3. Experimental procedure — results

3.1. Vehicle manufacturer recognition

In Fig. 2 it was shown how the license plate location information is used to crop the vehicle. From a vehicle image database, 110 vehicle images were selected. The image set was divided into 11 distinct classes (10 different manufacturers and one 'unknown'). The vehicle license plate was located and the plate number retrieved for every image. Using the position and dimensions of the license plate, the vehicle mask was detected and segmented with a success rate of 97%. If the vehicle mask is not detected and segmented correctly, the subsequent logo detection step is affected and logo recognition cannot proceed. Such failures are attributed to illumination conditions (excessive or non-uniform lighting, shadows), weather (rain or fog), dirt, partial occlusion by other objects and the camera view angle. Next, the car manufacturer (logo) area was segmented using the phase congruency calculation, as shown in Fig. 5, with a success rate of 95%. In order to classify the vehicle manufacturers from the vehicle images, a series of PNN runs was performed in two phases: a) a Training Phase and b) a Recognition Phase. The PNN was implemented using code developed in MATLAB.
Fig. 15. (a) Raw detected keypoints, (b) low-contrast filtered, (c) edge filtered, (d) final SIFT keypoints with their locations represented by arrows. The length of each arrow represents the scale of the keypoint and its direction the orientation. Note that multiple keypoints with more than one orientation may be found at the same location.
Table 5
Model classification statistics (model recognition rate) for a fixed manufacturer (Volkswagen).

Model | Correct | Mistaken | Not recognized as VW
Polo II (1990–1994) | 4 | 3 | 3
Polo III (1994–2000) | 6 | 2 | 2
Polo IV (2000–2005) | 4 | 4 | 2
Golf III (1991–1997) | 5 | 3 | 2
Golf IV (1997–2003) | 7 | 2 | 1
Golf V (2003–2006) | 5 | 2 | 3
Bora (1999–2006) | 7 | 2 | 1
Passat IV (1993–1996) | 5 | 3 | 2
Passat V (1996–2005) | 6 | 3 | 1
New Beetle (1998–2005) | 4 | 3 | 3
Unknown/other | 6 | 4 | 0
Total | 59 [54%] | 31 [28%] | 20 [18%]
3.1.1. Training Phase

In the Training Phase, the PNN was fed with sets of classes that represent each car manufacturer's recognition pattern. Ten classes of car manufacturers were tested, plus one class for the unknown manufacturer. For every class, five samples were provided as a training set; the training classes are shown in Table 1. The PNN input vector consists of m × n pixel values, taken directly from the segmented logo image converted to a one-dimensional array. Four different network resolutions (m × n) were tested: 25 by 25, 50 by 50, 100 by 100 and 200 by 200, with 11 classes × 5 samples/class, so the total net size was m × n × 55 array elements. Runs were performed varying the spread σ from 0.0 to 5.0, and the optimum recognition rate was achieved at σ = 0.9. A Pentium IV at 2.0 GHz was used for the simulations. The PNN setup time versus the network size is shown in Table 2.

3.1.2. Recognition Phase

During the Recognition Phase, unknown logo image samples segmented from vehicle masks were fed to the trained PNN, and the classifier produced the results of Table 3. Ten manufacturer classes were tested plus one class for the unknown. The manufacturer recognition rate was about 85%. The speed of classification versus the network size is given in Table 4.

3.2. Vehicle model recognition

For each database image, and for the vehicle manufacturer recognized in the previous step (Section 3.1.2), a grid was drawn (see Fig. 16) and a set of SIFT descriptors was calculated following the procedure described above. The keypoints were then filtered so that there is only one keypoint per cell. If no keypoint is found inside a cell, a null keypoint is set, with its descriptor calculated at the centre of the cell exactly as if it were a real keypoint. In this way, all the training images were transformed to a fixed set of 16 × 9 128-element descriptors. A Probabilistic Neural Network (PNN), similar to that used for logo recognition in our previous work [15], was trained from these database image descriptors (Fig. 15).
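The one-keypoint-per-cell filtering with null keypoints described in Section 3.2 could look like the sketch below. The paper does not state which keypoint is kept when a cell holds several, so keeping the first found is our assumption, and `compute_descriptor` stands for the descriptor step of Section 2.6.4:

```python
import numpy as np

def grid_keypoints(keypoints, descriptors, image_shape, grid=(16, 9),
                   compute_descriptor=None):
    """Keep at most one keypoint per grid cell; where a cell is empty,
    plant a 'null keypoint' at the cell centre and compute its descriptor
    there, exactly as if it were a real keypoint (Section 3.2)."""
    h, w = image_shape
    gx, gy = grid
    cell_w, cell_h = w / gx, h / gy
    chosen = {}
    for (x, y), d in zip(keypoints, descriptors):
        cell = (int(x // cell_w), int(y // cell_h))
        chosen.setdefault(cell, d)               # first keypoint in a cell wins
    out = []
    for i in range(gx):
        for j in range(gy):
            if (i, j) in chosen:
                out.append(chosen[(i, j)])
            else:                                # null keypoint at cell centre
                cx, cy = (i + 0.5) * cell_w, (j + 0.5) * cell_h
                out.append(compute_descriptor(int(cx), int(cy)))
    return np.array(out)                         # fixed gx * gy descriptor set
```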
3.2.1. Training Phase

In the Training Phase, the PNN was fed with descriptors created using exactly the same procedure as in Section 3.1.1. A single vehicle manufacturer database set was used for the images, and the models of this manufacturer (Volkswagen) were recognized.

3.2.2. Recognition Phase

During the Recognition Phase, the model of the query image was recognized: the PNN gives the maximum matching probability with a database class for the best match. The results are shown in Table 5. It should be noted, after presentation of the results, that the neural network method is not very appropriate for the classification of keypoints and the recognition of vehicle models, since the success rate was only about 54%. The false recognition rate (wrong model) was 28% for the query images and, moreover, 18% of the query images were not recognized as VW at all. One possible reason is that keypoints are in general not uniformly distributed, and filtering them so as to keep one keypoint per cell dramatically reduces the possible matches and hence the true detection rate: in a non-filtered image, many keypoints may lie very close to each other, leaving other areas without keypoints at all (see Fig. 16). On the other hand, this method is much faster than a conventional nearest-neighbour classifier, since the PNN gives an almost immediate response (less than 1 s). A novel classifier that takes considerably more keypoints into account, and already gives promising preliminary results, is currently under development.

3.3. Vehicle color recognition

Following the procedure described in Section 2.4, a series of vehicle images was segmented and sets of monochromatic RGB histograms for the areas of interest were calculated. The peaks of the RGB histograms were classified using the minimum Euclidean distance to a fixed set of 12 preset colors, and the results are shown in Table 6. The system performed quite well, with a true recognition rate of 90%; a 10% false recognition rate was observed, due to strong light reflectance and partial shadowing in some of the tested images. The speed of color recognition, on the same machine as for VMMR, is about 50 ms on average.

4. Conclusions

In this paper, a novel vehicle manufacturer and model recognition system is presented, enhanced by color recognition for completeness. The research was initiated within a small range of vehicle models and manufacturers. Two series of vehicle mask and manufacturer logo images were obtained through automatic detection and segmentation: LPR was applied, the symmetry axis of the vehicle image mask was used for adjustment where needed, phase congruency calculation was implemented and a hierarchical database of models per manufacturer was created. Logo images were forwarded to a properly trained Probabilistic Neural Network, where the vehicle manufacturer was recognized with a very good recognition rate (85%) and a fast processing time. For the vehicle masks corresponding to the manufacturer recognized in the previous step, the SIFT-based vehicle keypoints were calculated.
Fig. 16. Vehicle mask grid (16 × 6) (real keypoints in yellow, null keypoints in green).
Table 6
Vehicle color recognition results.

Color | RGB value | Correct | Mistaken
White | 255 255 255 | 5 | 0
Black | 000 000 000 | 5 | 0
Silver | 238 224 229 | 4 | 1
Blue | 100 149 237 | 5 | 0
Deep blue | 000 000 128 | 5 | 0
Red | 255 000 000 | 5 | 0
Dark gray | 150 150 150 | 4 | 1
Pink | 255 105 180 | 4 | 1
Yellow | 255 255 000 | 5 | 0
Orange | 255 165 000 | 4 | 1
Green | 084 139 084 | 5 | 0
Green blue | 127 255 212 | 4 | 1
Total | | 45 [90%] | 5 [10%]
These keypoints were forwarded to another properly trained Probabilistic Neural Network, based on the database of models for the manufacturer matched in the previous step, and the vehicle model was recognized with a fairly good recognition rate (54%) and a fast processing time. The color recognition scheme is also very effective, displaying 90% accuracy and a very fast processing time. To further improve the manufacturer and model recognition performance, we plan to extend the system to deal with a wider range of viewpoints and affine transformations, as well as recognition in more complex and occluded scenes under a wider variety of illumination conditions. Research in this direction is ongoing, with encouraging results.

References

[1] C.N. Anagnostopoulos, Artificial Vision and Computational Intelligence Techniques for Industrial Applications and Quality Control, PhD Thesis, Electrical and Computer Engineering Dpt., National Technical University of Athens, 2002.
[2] C.N. Anagnostopoulos, I. Anagnostopoulos, V. Loumos, E. Kayafas, A license plate recognition algorithm for intelligent transportation system applications, IEEE Transactions on Intelligent Transportation Systems 7 (3) (2006) 377–392.
[3] L. Dlagnekov, S. Belongie, Recognizing cars, Tech. Rep. CS2005-0833, University of California San Diego, 2005.
[4] C. Harris, M. Stephens, A combined corner and edge detector, Fourth Alvey Vision Conference, 1988, pp. 147–151 (UK).
[5] T. Kato, Y. Ninomiya, I. Masaki, Preceding vehicle recognition based on learning from sample images, IEEE Transactions on Intelligent Transportation Systems 3 (4) (2002) 252–260.
[6] P. Kovesi, Image features from phase congruency, Videre: A Journal of Computer Vision Research, MIT Press 1 (3) (1999) 1–27.
[7] A.H.S. Lai, N.H.C. Yung, Vehicle-type identification through automated virtual loop assignment and block-based direction-biased motion estimation, IEEE Transactions on Intelligent Transportation Systems 1 (2) (2000) 86–97.
[8] A.H.S. Lai, G.S.K. Fung, N.H.C. Yung, Vehicle type classification from visual-based dimension estimation, IEEE Intelligent Transportation Systems Conference, 2001, pp. 201–206 (USA).
[9] H.J. Lee, Neural network approach to identify model of vehicles, in: Lecture Notes in Computer Science, Vol. 3973, Springer-Verlag, 2006, pp. 66–72.
[10] D. Lowe, Object recognition from local scale-invariant features, International Conference on Computer Vision, 1999, pp. 1150–1157 (Greece).
[11] D. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60 (2) (2004) 91–110.
[12] H. Maemoto, S. Okuma, Y. Yano, Parametric vehicle recognition using knowledge acquisition system, IEEE International Conference on Systems, Man and Cybernetics 4 (2004) 3982–3987.
[13] M. Merler, Car color and logo recognition, CSE 190A Projects in Vision and Learning, University of California, 2006.
[14] V. Petrovic, T. Cootes, Analysis of features for rigid structure vehicle type recognition, British Machine Vision Conference, Vol. 2, 2004, pp. 587–596.
[15] A. Psyllos, C.N. Anagnostopoulos, V. Loumos, E. Kayafas, Image processing & artificial neural networks for vehicle make and model recognition, Proceedings of the 10th International Conference on Applications of Advanced Technologies in Transportation, May 2008 (Greece).
[16] J. Sauvola, M. Pietikäinen, Adaptive document image binarization, Pattern Recognition 33 (2000) 225–236.
[17] D. Specht, Probabilistic neural networks for classification, mapping or associative memory, IEEE International Conference on Neural Networks, Vol. 1, 1988, pp. 525–532.
[18] C. Sun, D. Si, Fast reflectional symmetry detection using orientation histograms, Real-Time Imaging 5 (1) (1999) 63–74.
[19] M. Weber, M. Welling, P. Perona, Unsupervised learning of models for recognition, Lecture Notes in Computer Science, Vol. 1842, Springer-Verlag, 2000, pp. 18–32.
Apostolos Psyllos received his Chemical Engineering degrees (BSc, MSc) and his PhD from the National Technical University of Athens (NTUA) in 1988 and 1993, respectively. He worked as a researcher at the Environmental Research Laboratory of the Research Centre 'Demokritos' in Greece (1996–2000) and at the EU Joint Research Centre in Italy (2000–2001). He worked as a computer engineer at the School of Mathematics & Physics at NTUA (2002–2009). He obtained his second PhD degree from the School of Electrical and Computer Engineering at NTUA in April 2010. He is currently working as a post-doctoral researcher at the EU Joint Research Centre in Italy. Dr. Apostolos Psyllos is a member of the Greek Chamber of Engineers and the Greek Informatics Society. His research interests are image and signal processing, computer vision, remote sensing and artificial intelligence. He has published 1 paper in a journal and 3 papers in conferences on the above subjects.
Christos-Nikolaos E. Anagnostopoulos was born in Athens, Greece in 1975. He received his Mechanical Engineering Diploma from the National Technical University of Athens (NTUA) in 1998, and the Ph.D. degree from the Electrical and Computer Engineering Dpt., NTUA in 2002. In 2003, he joined the University of the Aegean as lecturer in the Cultural Technology and Communication Department. Dr. Christos-Nikolaos E. Anagnostopoulos is a member of the Greek Chamber of Engineers and member of IEEE. His research interests are image processing, computer vision, neural networks and artificial intelligence. He has published more than 60 papers in journals and conferences, in the above subjects.
Eleftherios Kayafas received his B.Sc. degree from Athens University in 1970, and his MSc and PhD degrees in electrical engineering from the University of Salford, England, in 1975 and 1978, respectively. He worked with the Civil Aviation Service (1973–74) and, after his postgraduate studies, with the Hellenic Aerospace Industry (HAI, 1978–79). In 1979 he joined the National Technical University of Athens (NTUA) as a Lecturer in the Electrical Engineering Department, becoming Assistant Professor (1987), Associate Professor (1992) and finally Professor of Applied Electronics (1996). Professor Kayafas is a member of the Greek Chamber of Engineers, a member of IEEE and a member of IMEKO (International Measurement Confederation), where he serves on the TC-4 committee. His research interests are applied electronics, multimedia applications, multimedia communication systems and web engineering. He has published more than 170 papers in journals, book chapters and conferences on the above subjects.