Particle Swarm Optimization for object detection and segmentation

Stefano Cagnoni, Monica Mordonini, Jonathan Sartori

Università di Parma, Dipartimento di Ingegneria dell'Informazione, viale G. Usberti 181/A, 43100 Parma, Italy

Abstract. In this paper we describe the results of a modified Particle Swarm Optimization (PSO) algorithm which has been applied to two image analysis tasks. In the first, accurate region-based segmentation is obtained by analyzing the cumulative results of several runs of the algorithm. In the second, the fast-convergence properties of the algorithm are used to accurately locate and track an object of interest in real time.
1 Introduction

Since its introduction in the late 90's [1, 2], Particle Swarm Optimization (PSO) has increasingly attracted researchers for its efficiency in locating maxima of even highly multi-modal functions. While the basic PSO algorithm aims at finding a single optimum point within the fitness landscape under exploration, several applications require that more than one optimum be found or that the swarm spread over a whole area of interest, featuring high fitness values, as uniformly as possible. This has led to the definition of several variants of basic PSO, in which particles are subdivided into a pre-defined number of sub-swarms, based on some clustering technique [3–5], or through speciation [6–9], to achieve a dynamical reconfiguration of the swarm and allow for an arbitrary number of regions of interest within the search space.

This situation is typical of object recognition tasks, where the goal is to identify all possible occurrences, within an image, of an object of interest which is characterized by a set of specific, even if generally fuzzily defined, features. Similarly, region-based segmentation requires that several regions with homogeneous features be accurately located. In this paper we describe two image analysis applications which are solved by methods based on PSO variants adapted to the specific requirements of the problems under consideration.
2 PSO for object detection and segmentation

The two tasks which were used to evaluate the potential of PSO for application to image analysis are typical of computer vision: region-based segmentation and object detection.

The first task is the 'pasta segmentation problem', proposed as the subject of a competition at GECCO 2006¹. In this competition, each image in a set of 12, acquired under different lighting conditions and presenting lighting artifacts such as the bright spot in Figure 2 (top left), was to be analyzed. In particular, the larger pieces of pasta, laid on complex backgrounds and mixed with smaller ones (pasta noise), were to be segmented using a binary pixel-classification strategy. The final classification was to be obtained by thresholding the image resulting from some pre-processing of the original one (see Figure 1).

The second task on which PSO was tested was license-plate detection. The goal here was to locate the license plate within rear views of cars acquired under different lighting conditions and camera positions. The same dataset which had been used in developing the APACHE license-plate recognition system [10] was used as a benchmark, so that results could be compared with those obtained by the plate detection stage of that system. Further experiments were also made on video sequences, in which the goal was to track the license plate through the frames in real time, once it had been located.

Even if the two tasks are semantically different, they share some common lower-level features, which allowed us to apply very similar versions of PSO to solve the two problems. In particular, in both cases the basic step requires that the image be explored, to focus on regions where interesting features (i.e., features which are expected to characterize the objects we want to locate or segment) can be detected. The main goal of our work was therefore to evaluate to what extent the efficiency of PSO-based search could be exploited within the two applications.
The following subsection describes how the basic PSO equations have been modified to fit the requirements of our applications and to implement the basic step common to both; the applications themselves are described in detail in the following two sections.

¹ See http://cswww.essex.ac.uk/staff/rpoli/GECCO2006/

2.1 PSO fitness and velocity-update equations

In the basic PSO algorithm, the fitness function coincides, point by point, with the function which is to be optimized. In analyzing images using PSO, the search space being the image, using such a local fitness function would lead to explorations which would be extremely sensitive to noise and possibly misleading. If fitness evaluation were just pixel-based, a meaningless isolated pixel yielding high fitness as a result of noise could attract and trap the whole swarm in its neighborhood. In the applications under consideration, PSO is required to produce a uniform distribution of particles over each region of interest. To induce such a behavior, we have modified the basic PSO algorithm in two directions:

– forcing division of the swarm into sub-swarms (defined as subsets of the swarm, within which the distance between any particle and the closest one is below a preset threshold; sub-swarms change dynamically as particles move), able to converge towards different regions of interest;
– favoring dispersion of the particles all over the regions of interest.

Using the so-called K-means PSO [5], in which clusters of particles are formed based on their proximity within the search space, allowed us to achieve the former goal. Achieving the latter required that both the fitness function and the velocity-update equation be modified.

As concerns the fitness function to be maximized, we have added a local fitness term, which evaluates how "interesting" the neighborhood of a pixel is, to the traditional punctual fitness function, whose value is computed based only on information carried by the pixel under consideration:

$$ \mathit{fitness}(x, y) = \mathit{punctual\_fitness}(x, y) + \mathit{local\_fitness}(x, y) \tag{1} $$
The local fitness term depends on the number of particles in the sub-swarm, with high punctual fitness, which are near the pixel located at $(x, y)$, and is given by:

$$ \mathit{local\_fitness} = K_0 \cdot \mathit{number\_of\_neighbors} \tag{2} $$
where $\mathit{number\_of\_neighbors}$ is the number of particles within a pre-defined neighborhood of $(x, y)$ and $K_0$ is a constant. This way, the particles are attracted towards the areas where a larger number of pixels meet the punctual requirement, keeping away from isolated noisy pixels. This modification enhances the density of particles in the most interesting regions. To cover the whole extent of these regions, and not only small areas within them, we also needed to modify the basic PSO velocity-update equation from:

$$ v_P(t) = w \cdot v_P(t-1) + C_1 \cdot \mathit{rand}() \cdot [X_{best} - X(t-1)] + C_2 \cdot \mathit{rand}() \cdot [X_{gbest} - X(t-1)] \tag{3} $$
where $v_P$ is the velocity of the particle, $C_1$ and $C_2$ are two positive constants, $w$ is the inertia weight, $X$ is the position of the particle, $X_{best}$ is the best-fitness position reached by the particle up to time $t-1$, and $X_{gbest}$ is the best-fitness point ever found by the whole swarm, to:

$$ v_P(t) = v_P(t) + \mathit{repulsion}_P \tag{4} $$

where $v_P(t)$ on the right-hand side is the value given by Eq. (3). The repulsion term is computed, separately along each axis, as

$$ |\mathit{repulsion}(i, j)| = \mathit{REPULSION\_RANGE} - |X_i - X_j| \tag{5} $$
where $i$ and $j$ are the particle indices and REPULSION_RANGE is the maximum distance within which the particles may interact. Values of $\mathit{repulsion}(i, j)$ are set to 0 for distances between $i$ and $j$ larger than REPULSION_RANGE. The global repulsion term $\mathit{repulsion}_P$ for particle $P$ is the average of all repulsion terms deriving from the presence of other particles in its neighborhood:

$$ \mathit{repulsion}_P = \frac{1}{n} \sum_{j=1}^{N} \mathit{repulsion}(P, j) \tag{6} $$
$N$ being the number of particles in the swarm and $n$ being the number of particles within the neighborhood of $P$ defined by REPULSION_RANGE.

Finally, one last change has been made to the standard PSO algorithm, aimed at producing more stable sub-swarms: the possibility for a particle with high punctual and local fitness to stand still. In other words, if a particle with a high punctual fitness lies within a region with a high density of particles, then it has a probability of standing still which is linearly dependent on such a density. Such a probability is estimated as:

$$ P\{v_P(t) = 0\} = \frac{n}{N} \tag{7} $$
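As an illustration, the modified rules of this section (Eqs. 1 to 7) could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the experiments: the function names are hypothetical, the neighborhood of a particle is taken to be a Euclidean disc of radius REPULSION_RANGE, and the repulsion is assumed to push a particle away from each neighbor along each axis. The parameter values are those of Eq. (8).

```python
import numpy as np

# Parameter values reported in Eq. (8) of the experimental section.
W, C1, C2 = 0.8, 2.0, 2.0
K0 = 5.0
REPULSION_RANGE = 7.0


def total_fitness(punctual, number_of_neighbors, k0=K0):
    """Eqs. (1)-(2): punctual fitness plus the local term K0 * neighbors."""
    return punctual + k0 * number_of_neighbors


def repulsion_on(positions, p, reach=REPULSION_RANGE):
    """Eqs. (5)-(6): average per-axis repulsion acting on particle p.

    Returns (repulsion vector, n), n being the number of other particles
    within `reach` of p (Euclidean distance: an assumption of this sketch).
    """
    diff = positions[p] - positions              # X_p - X_j along each axis
    near = np.linalg.norm(diff, axis=1) <= reach
    near[p] = False                              # a particle does not repel itself
    n = int(near.sum())
    if n == 0:
        return np.zeros(positions.shape[1]), 0
    magnitude = reach - np.abs(diff[near])       # Eq. (5), one value per axis
    push = np.sign(diff[near]) * magnitude       # assumed direction: away from j
    return push.sum(axis=0) / n, n               # Eq. (6)


def update_velocity(v, x, x_best, x_gbest, repulsion, rng):
    """Eq. (3) followed by Eq. (4)."""
    basic = (W * v
             + C1 * rng.random() * (x_best - x)
             + C2 * rng.random() * (x_gbest - x))
    return basic + repulsion


def stand_still_probability(n, swarm_size):
    """Eq. (7): probability that a high-fitness particle keeps zero velocity."""
    return n / swarm_size
```

With two particles 3 pixels apart along the x axis, the first one receives a repulsion of magnitude 7 - 3 = 4 pointing away from the second, which is what spreads a sub-swarm over a region instead of letting it collapse onto a single point.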
3 PSO-based image segmentation

As described in the previous section, the punctual fitness is the fitness which could be attributed to a pixel (a location in the search space), based only on its intrinsic properties. In the pasta segmentation problem, this translates into a function which measures the similarity of the pixel color to the expected color of pasta pieces or, better, its belonging to a three-dimensional region in the RGB space centered around such a color prototype, which is expressed as:

    if ( |r(x, y) − g(x, y)| ≤ 30 and |g(x, y) − b(x, y)| ≥ 60 )
        punctual_fitness = 30
    else
        punctual_fitness = 0
where r(x, y), g(x, y) and b(x, y) are the red, green and blue values, respectively, of the pixel located at (x, y).

Since the aim of the application was to obtain an accurate segmentation, up to pixel precision, and given the rather large size of the input images and the consequent large number of pixels belonging to the objects of interest, PSO could not produce the final solution directly. Instead, it was used in the pre-processing stages preceding the final thresholding stage which produces the actual output.

Following the PSO rules modified as previously described, the particles tend to move towards the larger pasta regions and to stay there. If one performs a number of runs of a PSO algorithm, assigning each pixel a score which is directly proportional to the number of times a particle walks through it (or stays on it, if the particle stands still), the probability of belonging to a large pasta piece can be estimated for each pixel. To better estimate such a probability, and to avoid possible bias in the results due to the choice of the initial particle locations, each run should start with a different random initialization of the whole swarm.

To give globally higher importance to the regions with a higher density of pasta pixels, and to regularize results, we decided to extend the 'influence area' of each particle from just the current pixel to a larger neighborhood. In other words, when a particle visits a pixel, the score of that pixel is increased, and so is the score of its neighbors, by a smaller amount which depends on their distance from the current pixel. Finally, we stretch the score distribution within the image: the scores which are above a threshold are multiplied by a fixed factor F > 1, whilst the scores which are below the threshold are multiplied by a different factor G < 1.

Fig. 1. Original image (top left) and results of global search after 500 runs (top right), 750 runs (bottom left), and 1000 runs (bottom right)

In Figure 1, the score associated with each pixel is represented as a grey-scale image. Areas which eventually end up having a high density of light pixels (i.e., high scores) correspond to pasta regions. The final result of this stage, which we termed global search, is shown in Figure 2.

This way, the areas where large pieces of pasta are most likely to be found have been roughly detected over the whole image; it is now necessary to focus on such areas to achieve a final refinement of their segmentation. To do so, an algorithm which is very similar to the one used in the previous stage is applied; this time the domain where the swarm can move is limited to smaller regions surrounding pixel clusters whose score was above the threshold in the last phase of the global search stage. These are rectangular regions, extracted as follows:

– a neighborhood with a high density of significant pixels is detected;
– the neighborhood is extended until a bounding box is found for the relevant pixel cluster;
– the bounding box is extended by 1/3 in each direction, to cope with possible false negatives close to the boundary of the piece of pasta corresponding to the cluster.

In initializing this stage, the scores assigned to each pixel at the end of the global search are preserved. The result, obtained after running this local search on all significant regions, is shown in Figure 2, along with the results of the final segmentation, obtained by thresholding the results of the local search.
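The scoring scheme used in the global search (visit counts, influence area, score stretching) could be sketched as follows. The names `deposit` and `stretch`, the square neighborhood, and the specific weighting 1 / (1 + distance) are assumptions made for illustration only; the text does not specify the exact weights or the size of the influence area.

```python
import numpy as np


def deposit(score, x, y, radius=2):
    """Record a visit at pixel (x, y): full credit to the pixel itself,
    smaller credit to its neighbours (assumed to decrease with distance)."""
    height, width = score.shape
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < height and 0 <= xx < width:
                # Assumed weighting: 1 at the centre, 1/2 at distance 1, ...
                score[yy, xx] += 1.0 / (1 + max(abs(dx), abs(dy)))


def stretch(score, threshold, f=2.0, g=1.0):
    """Multiply scores above the threshold by F and the others by G."""
    return np.where(score > threshold, score * f, score * g)
```

Calling `deposit` once per particle position at every generation of every run, and then `stretch` on the accumulated map, would yield a grey-scale score image of the kind described for Figure 1.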
Fig. 2. Original image (top left), results of global search (top right), results of local search (bottom left), and final segmentation (bottom right).
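The last step of the region extraction, and the confinement of the swarm to the resulting region, could look like the sketch below. It assumes boxes stored as (x0, y0, x1, y1) in pixel coordinates, reads "extended by 1/3 in each direction" as moving each side outwards by one third of the corresponding box dimension, and rounds down to whole pixels; all three choices, like the function names, are assumptions of this example.

```python
def extend_box(box, image_size):
    """Grow (x0, y0, x1, y1) by one third of its size on every side,
    without leaving the image."""
    x0, y0, x1, y1 = box
    width, height = image_size
    dx = (x1 - x0) // 3   # assumption: whole pixels, rounded down
    dy = (y1 - y0) // 3
    return (max(0, x0 - dx), max(0, y0 - dy),
            min(width - 1, x1 + dx), min(height - 1, y1 + dy))


def clip_to_box(x, y, box):
    """Keep a particle position inside the search region."""
    x0, y0, x1, y1 = box
    return min(max(x, x0), x1), min(max(y, y0), y1)
```

For example, a 30 x 15 pixel box at (30, 30, 60, 45) in a 100 x 100 image would grow to (20, 25, 70, 50), and a particle leaving it at (80, 10) would be moved back to (70, 25).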
4 PSO-based pattern localization

In the license plate detection problem, the low-level feature on which detection can be based is the density of high values of the horizontal gradient, which correspond to the frequent discontinuities between high- and low-intensity pixels (or vice versa) encountered when the image is scanned row-wise, due to the presence of symbols or symbol elements in the plate. Since a color image is available, we can use both color and gradient information, by first considering only those pixels which satisfy the typical features of plates (black characters on a white background for the most recent European-standard plates), and then considering gradient information. Therefore, the punctual fitness of a pixel, on which the PSO-based plate search relies, has been defined in this application as:

    if ( |r(x, y) − g(x, y)| > 30 or |r(x, y) − b(x, y)| > 30 or |g(x, y) − b(x, y)| > 30 )
        punctual_fitness = 0;
    else {
        right_gradient = |intensity(x, y) − intensity(x + 1, y)|;
        left_gradient = |intensity(x, y) − intensity(x − 1, y)|;
        if ( right_gradient > left_gradient )
            punctual_fitness = right_gradient;
        else
            punctual_fitness = left_gradient;
    }
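The rule above translates almost directly into Python. In the sketch below the function name is hypothetical, and two details that the text does not specify are assumptions: intensity is taken as the mean of the three channels, and pixels on the image border reuse their own value for the missing neighbor.

```python
import numpy as np


def plate_punctual_fitness(rgb, x, y, tolerance=30):
    """Punctual fitness of pixel (x, y) in an H x W x 3 uint8 image."""
    width = rgb.shape[1]
    r, g, b = (int(c) for c in rgb[y, x])
    # Coloured pixels cannot belong to a black-on-white plate.
    if abs(r - g) > tolerance or abs(r - b) > tolerance or abs(g - b) > tolerance:
        return 0

    def intensity(xx):
        xx = min(max(xx, 0), width - 1)                  # assumed border handling
        return int(rgb[y, xx].astype(int).sum()) // 3    # assumed: mean of channels

    here = intensity(x)
    right_gradient = abs(here - intensity(x + 1))
    left_gradient = abs(here - intensity(x - 1))
    return max(right_gradient, left_gradient)
```

Because the function only reads the pixel and its two horizontal neighbors, the gradient is evaluated on demand, only at the positions that particles actually visit.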
The basic PSO step used to solve this problem was virtually the same as in the application described previously. However, it was used within a different algorithm, which is also divided into a global and a local exploration stage: after the most promising areas have been located, the exploration of those regions is refined to determine whether they actually represent a plate.

In the global search stage, we let the swarm fly over the image until at least one sub-swarm of size greater than a prefixed threshold (50% of the number of particles in the whole swarm) has formed, or a given number of iterations has been reached. Then, in the subsequent stage, a local search is performed in the areas where sub-swarms of sufficient size have formed (at least 3 particles), starting from the region occupied by the largest one; during this second stage, we (i) restrict the search to the bounding boxes enclosing the sub-swarms, defined as in the previous application, by clipping particle positions at the boundaries of the bounding box, (ii) re-initialize the search, activating a new (full-size) swarm, and (iii) run the search for a pre-set number of iterations. Also in this case, we refer to this second stage as the local search stage.

At the end of the local search, a new bounding box, containing all the particles having high fitness, is defined. If this box has a width:height ratio close to 5:1 (the ratio which is typical of a license plate), then the plate is considered to have been found. Otherwise, the swarm is expanded along its two dimensions, by forcing low-fitness particles to move only horizontally or vertically, in order to reach higher-fitness points and, possibly, to let the bounding box reach the expected aspect ratio; in case of failure, the current region is discarded and the next area detected during the global search is explored. Figure 3 shows the original image, along with the results of the global and local search, and the final result of the PSO-based algorithm.

Fig. 3. License plate detection. Original image (top left), the sub-swarms at the end of the global search super-imposed on the gradient image (top right), the swarm at the end of the local search super-imposed on the gradient image (bottom left), and the bounding box corresponding to the license plate super-imposed on the input image (bottom right).
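The stopping rule of the global search and the final aspect-ratio test could be sketched as below. Note that the grouping shown here is a plain connected-components pass over the "closer than a threshold" relation, which follows the definition of sub-swarm given in Section 2.1 but is not the K-means PSO clustering [5] used in the paper; the function names and the 20% tolerance on the 5:1 ratio are likewise assumptions.

```python
import numpy as np


def sub_swarms(positions, max_gap):
    """Group particles so that each one is closer than max_gap to at least
    one other member of its group (connected components)."""
    labels = [-1] * len(positions)
    current = 0
    for seed in range(len(positions)):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            dist = np.linalg.norm(positions - positions[i], axis=1)
            for j in np.flatnonzero(dist < max_gap):
                if labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    groups = [[] for _ in range(current)]
    for index, label in enumerate(labels):
        groups[label].append(index)
    return groups


def global_search_done(groups, swarm_size, iteration, max_iterations):
    """Stop when a sub-swarm exceeds 50% of the swarm or iterations run out."""
    largest = max(len(group) for group in groups)
    return largest > 0.5 * swarm_size or iteration >= max_iterations


def looks_like_plate(box, target=5.0, tolerance=0.2):
    """Accept a box (x0, y0, x1, y1) whose width:height ratio is near 5:1."""
    x0, y0, x1, y1 = box
    ratio = (x1 - x0 + 1) / (y1 - y0 + 1)
    return abs(ratio - target) <= tolerance * target
```

Sub-swarms with at least 3 particles would then be visited in order of decreasing size, each one triggering a local search inside its bounding box.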
5 Experimental results

The two applications we have considered were aimed at evaluating the performance which could be obtained using PSO as a search algorithm in image analysis tasks. For both applications the parameters related to swarm motion and to the fitness equation were set as follows:

$$ w = 0.8; \quad C_1 = C_2 = 2.0; \quad K_0 = 5.0; \quad \mathit{REPULSION\_RANGE} = 7 \tag{8} $$
In the application to pasta segmentation, G = 1.0, while F = 2.0 during global search and F = 3.0 during local search.

Considering the different goals of the two applications, it is quite clear that plate detection is much less demanding than pasta segmentation in terms of computation time. Once the search has been successful, which most often happens in the first run, the plate detection algorithm requires only limited post-processing to define an accurate bounding box for the plate. Pasta segmentation, on the contrary, requires not only detection but also accurate segmentation of all objects of interest, and therefore many more runs of PSO to reach stable statistics on the number of visits to each pixel before thresholding can be applied. In fact, using PSO to perform virtually the whole segmentation task seems to force PSO to perform beyond its nature of fast and effective search algorithm. This was clearly reflected by the computation time required by pasta segmentation, for which 25 seconds per image were needed on average on a 1.8 GHz Pentium 4 PC with 1 GB of RAM. For every image, 1000 PSO runs, each lasting 500 generations (updates of the position of the whole swarm of 20 particles), were performed in the global search stage, while a number of runs proportional to the number of regions of interest extracted during the global search was performed during the local search.

Apart from computational inefficiency, segmentation was quite accurate, averaging an accuracy of 91.89% on the 12 images of the set under consideration when the optimum threshold was chosen separately for each image, with values ranging from 86.65% to 97.71%. However, the robustness of the segmentation induced by the two PSO-based stages of the algorithm is such that accuracy depended very little on the threshold value, as shown by Table 1.
Threshold      0.05   0.10   0.15   0.20   0.25   0.30   0.35   0.40   0.50   0.60   0.70   0.80   0.90
Accuracy (%)  70.78  88.89  90.89  91.57  91.69  91.41  90.83  91.14  90.46  86.96  83.68  83.43  83.33

Table 1. Average percent accuracy vs. threshold value for the pasta segmentation problem

In the license plate detection experiment, the data set on which tests were performed included 98 rear images of cars acquired with different backgrounds and lighting conditions. Given the stochastic nature of the algorithm, 10 runs of the algorithm were performed for each image in the data set, with swarms of 20 particles. The algorithm was able to detect the plate correctly (maximum distance of 3 pixels between the actual border of the license plate and the bounding box extracted by the algorithm) in 958 cases out of 980, a success rate of 97.76%, much higher than that of the algorithm used in the APACHE system, whose success rate was below 90%.

A very interesting observation regarding the PSO-based solution with respect to a traditional computer vision algorithm relying on the same information is that, in the latter, the whole horizontal-gradient image typically has to be computed before the analysis can start, while in the former the horizontal gradient is computed only for those pixels which are visited by the swarm, which, on average, means only as many times as 2% to 3% of the total number of pixels in the image. Therefore, results in terms of computation time were even more satisfactory. The average time required to detect a plate was 0.083 s for the successful runs (see Figure 4 for detailed statistics on the distribution of results in time). The runs in which the plate was wrongly detected required on average 0.298 s, while the runs in which the plate could not be detected at all required 1.521 s. The average time over the whole data set was 0.096 s per image, which means that, even without considering time correlation between images and performing the global search over the whole image for each frame, a video stream running at up to 10 frames per second could be analyzed on a 1.8 GHz Pentium 4 PC with 1 GB of RAM.

Fig. 4. Percentage of detections vs. time for the successful runs.

To evaluate the real-time processing capabilities of the algorithm in the presence of a tracking strategy, we made further tests on 7 video sequences recorded at 25 fps, of duration ranging from 1.5 to 5 s. To track the plate, after each successful detection, the swarm was initialized, in the subsequent frame, within a neighborhood of the region where the plate had been previously detected. If the search within the previous frame had been unsuccessful, initialization of the swarm could occur anywhere within the new frame. If plate search had been unsuccessful with such an initialization, a full search was performed all over the image.
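The tracking strategy described above amounts to a small control loop around the detector. In the sketch below, `detect(frame, region)` is a hypothetical callable standing for the PSO-based plate search restricted to a region (it returns a bounding box or None), and the 20-pixel margin defining the "neighborhood" of the previous detection is an assumed value; the fall-back to a full search is applied when the neighborhood-initialized search fails, which is one reading of the text.

```python
def search_region(prev_box, frame_size, margin=20):
    """Area in which the swarm is initialized for the next frame."""
    width, height = frame_size
    if prev_box is None:                       # previous search failed
        return (0, 0, width - 1, height - 1)
    x0, y0, x1, y1 = prev_box
    return (max(0, x0 - margin), max(0, y0 - margin),
            min(width - 1, x1 + margin), min(height - 1, y1 + margin))


def track(frames, frame_size, detect, margin=20):
    """Run the detector on each frame, seeding it with the previous result."""
    whole = (0, 0, frame_size[0] - 1, frame_size[1] - 1)
    boxes, prev = [], None
    for frame in frames:
        region = search_region(prev, frame_size, margin)
        box = detect(frame, region)
        if box is None and region != whole:
            box = detect(frame, whole)         # fall back to a full search
        boxes.append(box)
        prev = box
    return boxes
```

When the plate moves little between frames, almost every frame is handled by the cheap neighborhood search, and the more expensive full search runs only after a miss.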
In this experiment, too, each test was repeated 10 times for each sequence. The average processing time per frame was well below both limits of 0.04 s and 0.033 s required for real-time processing at 25 and 30 fps, respectively, even in the case of the most critical sequence, in which most failures occurred. Table 2 summarizes the results of this experiment.
Sequence   N. of frames   Avg. time (s)   Success (%)
1          96             0.018           100
2          73             0.016           100
3          38             0.017           100
4          48             0.020           95.58
5          49             0.020           98.98
6          145            0.021           97.59
7          117            0.032           93.08
Total      566            0.022           97.83

Table 2. Results of the PSO-based plate detection tracking algorithm
6 Final Remarks

In this paper we have described a PSO-based approach to object detection and segmentation which can be considered rather general, as demonstrated by the fact that the two applications which have been described basically share the same algorithm, despite being semantically different. Of course, the fitness function has to be carefully defined, to reflect the peculiarities of the problem at hand. While the choice of the parameters of the PSO equations seems not to be critical (the same settings worked for both applications), problem-specific parameters (such as K0) may depend on the fitness function which is chosen and on how its output is scaled. The efficiency of PSO search is reflected by the real-time performance achieved in the object-recognition application.
References

1. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proc. IEEE Int. Conference on Neural Networks. Volume IV. (1995) 1942–1948
2. Shi, Y.H., Eberhart, R.: A modified particle swarm optimizer. In: Proc. IEEE Int. Conference on Evolutionary Computation. (1998) 69–73
3. Kennedy, J.: Stereotyping: Improving particle swarm performance with cluster analysis. In: Proc. IEEE Int. Conference on Evolutionary Computation. Volume II. (2000) 1507–1512
4. Veenhuis, C., Köppen, M.: Data swarm clustering. In Abraham, A., Grosan, C., Ramos, V., eds.: Swarm Intelligence in Data Mining. Volume 34 of Studies in Computational Intelligence. Springer (2006) 221–241
5. Passaro, A., Starita, A.: Clustering particles for multimodal function optimization. In: Proc. GSICE/WIVA. (2006) published on CD, ISSN 1970-5077.
6. Chow, C., Tsui, H.: Autonomous agent response learning by a multispecies particle swarm optimization. In: Proc. IEEE Congress on Evolutionary Computation. (2004) 778–785
7. Bird, S., Li, X.: Enhancing the robustness of a speciation-based PSO. In: Proc. IEEE Congress on Evolutionary Computation. (2006) 3185–3192
8. Yen, G., Daneshyari, M.: Diversity-based information exchange among multiple swarms in particle swarm optimization. In: Proc. IEEE Congress on Evolutionary Computation. (2006) 6150–6157
9. Leong, W., Yen, G.: Dynamic population size in PSO-based multiobjective optimization. In: Proc. IEEE Congress on Evolutionary Computation. (2006) 6182–6189
10. Adorni, G., Bergenti, F., Cagnoni, S., Mordonini, M.: License-plate recognition for restricted-access area control systems. In Foresti, G.L., Mähönen, P., Regazzoni, C.S., eds.: Multimedia Video-Based Surveillance Systems: Requirements, Issues and Solutions. Kluwer (2000) 260–271