Application of Machine Learning Techniques for Simplifying the Association Problem in a Video Surveillance System

Blanca Rodríguez, Óscar Pérez, Jesús García, and José M. Molina⋆⋆

Universidad Carlos III de Madrid. Departamento de Informática.
Avenida de la Universidad Carlos III, 22. Colmenarejo 28270. Madrid. Spain.
[email protected], [email protected], [email protected], [email protected]

Abstract. This paper presents the application of machine learning techniques for acquiring new knowledge in the image tracking process, specifically in the blob detection problem, with the objective of improving performance. Data Mining has been applied to the lowest level in the tracking system, blob extraction and detection, in order to decide whether detected blobs correspond to real targets or not. A performance evaluation function has been applied to assess the video surveillance system with and without the Data Mining filter, and the results have been compared.

1 Introduction

Machine learning techniques can be applied to discover new relations among attributes in different domains; this application is named data mining (DM) and is part of the knowledge discovery in databases (KDD) process [1]. The application of data mining techniques to a specific problem takes several perspectives: classification, prediction, optimization, etc. In this work, DM techniques are used to learn a classifier able to determine whether a detected surface on an image should be considered a tentative target or not. This classifier avoids many computations in the association process of the surveillance system. The video surveillance system considered is capable of tracking multiple objects or groups of objects in real conditions [2]. The whole system is composed of several processes:

– A predictive process of the image background; usually, Gaussian models are applied to estimate variation in the background.
– A detector process of moving targets, which works over the previous and current acquired frames.
– A grouping process that correlates adjacent detected pixels to form detected regions. These regions, or blobs, can be defined by a rectangular area or by a contour shape.

⋆⋆ Funded by CICYT (TIC2002-04491-C02-02)


– An association process that evaluates which detected blobs should be considered as belonging to each existing target.
– A tracking system that maintains a track for each existing target; usually, the filters are based on the Kalman filter.

In this work, we propose the application of DM techniques to add new knowledge to the surveillance system. The surveillance system generates a set of files containing parameters of the detected blobs and, manually, we can assign an identifier determining whether each blob is part of a target or just noise. Using this strategy, to add new knowledge (for example, the optical flow), the surveillance system is executed with the optimized parameters and the optical flow information is recorded for each blob. The DM techniques can then use this new information to classify each blob as a real target (if it is part of a target) or as a false target (if it is only noise). In a previous work [3], an evaluation system was proposed; this system will be used to assess the video surveillance system before and after applying machine learning techniques.

2 Surveillance Video System

This section describes the structure of an image-based tracking system. Figure 1 shows a basic architecture of the system. Specifications and details of this video system have appeared in several publications [4], [5], [6].


Fig. 1. Basic Architecture of the Video Surveillance System.

The system starts by capturing the first image, which is used to initialize the background estimation. Then, for each new image, tracks are predicted to the capture time. Pixels contrasting with the background are detected, and blobs related to actual targets are extracted. Background statistics for pixels without detection in this frame are updated to enable detection in the next frame. Then, an association process is used to associate one or several blobs to each target. Unassociated blobs are used to initiate tracks, and each track is updated with its assigned blobs, while, in parallel, a function deletes the tracks not updated during the last few captures. Since data mining is applied to the lowest level in the tracking system, that is, to the 'blobs detection' block, this is the only block briefly described here. The positioning/tracking algorithm is based on the detection of targets by contrast with the local background, whose statistics are estimated and updated along the video sequence. The pixel-level detector extracts moving features from the background by comparing the difference with a threshold:

Detection(x, y) := [Image(x, y) − Background(x, y)] > THRESHOLD · σ    (1)

where σ is the standard deviation of the pixel intensity. With a simple iterative process, taking the sequence of previous images and weighting them to give higher significance to the most recent frames, the background statistics (mean and variance) are updated. Finally, the algorithm for blob extraction marks all connected detected pixels with a unique label, by means of a clustering and region-growing algorithm [7], and the rectangles enclosing the resulting blobs are built.
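As an illustrative sketch (not the paper's implementation; the function names, the single global σ, and the 4-connectivity choice are our assumptions), the pixel-level detector of Eq. (1) and the blob-labelling step might look as follows:

```python
import numpy as np

def detect_pixels(image, background, sigma, threshold=2.0):
    """Pixel-level detector (Eq. 1): flag pixels whose difference from the
    estimated background exceeds THRESHOLD * sigma."""
    return (image.astype(float) - background.astype(float)) > threshold * sigma

def extract_blobs(mask):
    """Group 4-connected detected pixels into blobs and return the
    rectangles (min_x, min_y, max_x, max_y) that enclose them."""
    labels = -np.ones(mask.shape, dtype=int)
    rects, next_label = [], 0
    for sy, sx in zip(*np.nonzero(mask)):
        if labels[sy, sx] != -1:          # already assigned to a blob
            continue
        stack, pixels = [(sy, sx)], []
        labels[sy, sx] = next_label
        while stack:                      # flood-fill the connected region
            y, x = stack.pop()
            pixels.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and labels[ny, nx] == -1):
                    labels[ny, nx] = next_label
                    stack.append((ny, nx))
        ys, xs = zip(*pixels)
        rects.append((min(xs), min(ys), max(xs), max(ys)))
        next_label += 1
    return rects
```

In a real system the flood fill would typically be replaced by an optimized connected-components routine, but the enclosing-rectangle output is the same.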

3 Performance Evaluation System

In this section, the evaluation metric proposed in [3], used to assess the quality of the surveillance system, is briefly described. The typical approach to evaluating detection and tracking performance has been followed: ground truth provides independent and objective data that can be related to the observations extracted and detected from the video sequence. The ground truth has been extracted frame by frame for each scenario. The targets have been selected and the following data have been stored for each target: number of the analyzed frame, track identifier, and minimum and maximum (x, y) coordinates of the rectangle that surrounds the target. Ground truth and real detections are compared by the evaluation system. The evaluation system calculates a number which constitutes the measurement of the quality level of the tracking system. It uses an ideal trajectory as a reference, so the output track should be as similar as possible to this ideal trajectory. By comparing the detected trajectories to the ideal one, a group of performance indicators is obtained to analyse the results and determine the quality of the tracking process. The evaluation function is computed by giving a specific weight to each of the following indicators:

– Error in area (in percentage): the difference between the ideal area and the estimated area. If more than one real track corresponds to an ideal trajectory, the best one is selected (although the multiplicity of tracks is annotated as a continuity fault).
– X-Error and Y-Error: the difference between the x and y coordinates of the bounding box of an estimated object and those of the ground truth.
– Overlap between the real and the detected rectangles (in percentage): the overlap region between the ideal and detected areas is computed and then compared, in percentage, with the original areas. The program takes the lowest value to assess the match between tracking output and ground truth.


– Commutation: the first time a track is estimated, the tracking system marks it with an identifier. If this identifier changes in subsequent frames, the track is considered a commuted track.
– Number of tracks: it is checked whether there is a single detected track matched with the ideal trajectory. Multiple tracks for the same target, or a lack of tracks for a target, indicate continuity faults. Two counters store how many times the ground-truth track is matched with more than one tracked object and how many times it is not matched with any track at all.

With the evaluation function, a number measuring the quality level of the tracking system is calculated by means of a weighted sum of terms based on the evaluation metrics specified above. The lower the evaluation function, the better the quality of the tracking system.
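A minimal sketch of how such a weighted sum could be assembled; the indicator names and weight values below are hypothetical placeholders, since the paper does not list its exact weights:

```python
def evaluation_function(indicators, weights):
    """Weighted sum of per-track performance indicators.

    Lower is better: a perfect track would score 0. Both arguments are
    dicts keyed by indicator name; the names and weights used here are
    hypothetical, not the paper's actual values."""
    return sum(weights[name] * value for name, value in indicators.items())

# Hypothetical per-track indicator values and weights:
indicators = {"area_error": 10.0, "x_error": 2.0, "y_error": 3.0,
              "overlap_penalty": 5.0, "commutations": 1.0,
              "continuity_faults": 0.0}
weights = {"area_error": 1.0, "x_error": 0.5, "y_error": 0.5,
           "overlap_penalty": 1.0, "commutations": 2.0,
           "continuity_faults": 2.0}
score = evaluation_function(indicators, weights)
```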

4 Data Mining for Blobs Classification

This section describes how Data Mining has been applied to the 'Blobs Detection' block of the Video Surveillance System. The architecture of the Data Mining-based Video Surveillance System is shown in figure 2.


Fig. 2. Architecture of the Data Mining-based Video Surveillance System.

The objective of applying Data Mining is to classify detected blobs as 'real targets' or 'false targets', removing the false targets in order to simplify the association process and, in this way, improve the whole system. As already said, the detection of targets is based on the intensity gradient in the background image. However, not all detected blobs are real targets. These false blobs, or false alarms, may appear because of noise, variations in illumination, etc. It is at this point where data mining may be applied to remove false alarms without affecting real targets. The objective of data mining is to find patterns in data in order to make non-trivial predictions on new data [1]. So, having various characteristics (optical flow, gradient intensity, ...) of the detected blobs, the goal is to find patterns that allow us to decide whether a detected blob corresponds to a real target or not. The input data take the form of a set of examples of blobs. Each instance or example is characterized by the values of attributes that measure different aspects of the instance. The learning scheme needed in this case is a classification scheme that takes a set of classified examples from which it is expected to learn a way of classifying unseen examples. That is, we start from a set of characteristics of blobs, together with the decision for each as to whether it is a real target or not, and the problem is to learn how to classify new blobs as 'real target' or 'false target'. The output must include a description of a structure that can be used to classify unknown examples, so that the decision can be explained. A structural description of the blobs in the form of a decision tree is the most suitable output here. Nodes in a decision tree test a particular attribute. Leaf nodes give a classification that applies to all instances that reach the leaf. To classify an unknown instance, it is routed down the tree according to the values of the attributes tested in successive nodes, and when a leaf is reached the instance is classified according to the class assigned to the leaf. For blob classification, the algorithm C4.5 has been used [8]. C4.5 is based on the algorithm ID3 [9]; both were introduced by Quinlan. The basic ideas behind ID3 are:

– In the decision tree, each node corresponds to a non-categorical attribute and each arc to a possible value of that attribute. A leaf of the tree specifies the expected value of the categorical attribute for the records described by the path from the root to that leaf. (This defines what a decision tree is.)
– Each node of the decision tree should be associated with the most informative non-categorical attribute among those not yet considered in the path from the root. (This establishes what a 'good' decision tree is.)
– Entropy, based on Shannon's information theory, is used to measure how informative a node is. (This defines what is meant by 'good'.)

C4.5 is an extension of ID3 that accounts for some practical aspects of real data, such as unavailable values, continuous attribute value ranges, pruning of decision trees, rule derivation, and so on.
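The entropy-based attribute selection that ID3 (and hence C4.5) relies on can be sketched as follows; the attribute names and the tiny example are invented purely for illustration:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attr):
    """Entropy reduction obtained by splitting on one discrete attribute:
    the criterion ID3 uses to pick the most informative node."""
    total, n = entropy(labels), len(labels)
    by_value = {}
    for ex, lab in zip(examples, labels):
        by_value.setdefault(ex[attr], []).append(lab)
    return total - sum(len(subset) / n * entropy(subset)
                       for subset in by_value.values())
```

An attribute that perfectly separates 'real target' from 'false target' examples yields the maximum gain (the full entropy of the label set), while an attribute uncorrelated with the class yields a gain of zero, so ID3 would place the former at the root.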
In the next subsections, the input data used to represent this problem and the obtained output data are described.

4.1 Input Attributes

Three scenarios, described in the next section, have been used. In all cases, training examples are extracted in order to obtain the classifier. These examples correspond to detected blobs, characterized by several attributes and classified as 'true target' or 'false target'. Many attributes have been calculated for each detected blob so that a better classification might be done:

1. Intensity Gradient. This is the parameter initially used for detecting blobs. The mean, standard deviation, minimum and maximum values of the intensity gradient inside the blob have been stored.
2. Optical Flow. This captures the motion of an object, represented as a vector, computed with the Horn and Schunck algorithm [10] over consecutive frames. Since it is a vector, its module and phase may be considered. The mean, standard deviation, minimum and maximum values of the module and the phase of the optical flow inside the blob have been stored.


3. Edge Detection. This marks the points in an image at which the intensity changes sharply. Three methods have been used: the Canny algorithm [11], corner detection, and a high-pass filter. In the three cases, the number of pixels of the blob and its surrounding area that correspond to detected edges has been stored.

Examples of optical flow and edge detection are illustrated in figure 3.
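As an illustration of how the per-blob statistics for the first attribute group might be computed (`np.gradient` is our stand-in for whatever gradient operator the system actually uses, and the function name is ours):

```python
import numpy as np

def gradient_stats(image, rect):
    """Mean, standard deviation, minimum and maximum of the intensity
    gradient magnitude inside a blob's rectangle (x0, y0, x1, y1),
    inclusive bounds. The gradient operator is an illustrative choice."""
    x0, y0, x1, y1 = rect
    patch = image[y0:y1 + 1, x0:x1 + 1].astype(float)
    gy, gx = np.gradient(patch)   # per-axis finite differences
    mag = np.hypot(gx, gy)        # gradient magnitude per pixel
    return mag.mean(), mag.std(), mag.min(), mag.max()
```

The same pattern (compute a per-pixel quantity inside the blob rectangle, then summarize it with mean/std/min/max) applies to the optical-flow module and phase attributes.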


Fig. 3. Optical flow and edge detection.

To classify the extracted blobs as 'true target' or 'false target', the ground truth must be used. When the overlap of a detected blob with a real target is greater than a specific value (40%), the blob is classified as a 'true target'; otherwise, it is classified as a 'false target'. This labelling has been done for the three scenarios we are working with. Figure 4 shows a few training examples.
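A hedged sketch of this labelling rule (the rectangle convention (min_x, min_y, max_x, max_y) and the function names are ours):

```python
def overlap_fraction(blob, target):
    """Fraction of the blob's rectangle covered by a ground-truth target.
    Rectangles are (min_x, min_y, max_x, max_y)."""
    ix = max(0, min(blob[2], target[2]) - max(blob[0], target[0]))
    iy = max(0, min(blob[3], target[3]) - max(blob[1], target[1]))
    blob_area = (blob[2] - blob[0]) * (blob[3] - blob[1])
    return (ix * iy) / blob_area if blob_area else 0.0

def label_blob(blob, targets, min_overlap=0.40):
    """Assign the training label using the 40% overlap criterion."""
    if any(overlap_fraction(blob, t) >= min_overlap for t in targets):
        return "true target"
    return "false target"
```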

Fig. 4. Parameters of detected blobs and their classification.


4.2 Output: Trained System and Classifier's Performance

The algorithm C4.5 has been used with the parameter 'confidence factor' adjusted to 0.0001 [8], in order to obtain a pruned tree that is small enough but still has a small error rate. The trained system obtained, in the form of a decision tree, is shown in figure 5.

Fig. 5. Decision tree obtained by algorithm C4.5 for classifying detected blobs as real or false targets.

As can be observed from the decision tree, only 5 out of the 15 attributes are significant, but they cover the three types of parameters:

– The maximum value (max∆I) and standard deviation (σ∆I) of the Intensity Gradient. In general, a high max∆I means that the detected blob is a true target, and a high σ∆I (probably produced by noise) means that it is not.
– The mean value of the module of the Optical Flow (µ|OF|). In general, a blob with a high µ|OF| corresponds to a true target.
– The Edge Detection values obtained by the Canny algorithm and by the high-pass filter (HPF). In general, blobs with high Canny or HPF values correspond to true targets.

The algorithm also provides the classifier's performance in terms of the error rate. It has been executed with cross-validation, with the objective of getting a reliable error estimate. Cross-validation means that part of the instances is used for training and the rest for classification, and the process is repeated several times with random samples. The confusion matrix is used by C4.5 to show how many instances of each class have been assigned to each class:

Table 1. Confusion Matrix

                   classified as 'Target'   classified as 'False Target'
'Target'                   2958                         451
'False Target'              460                        2607

In this case, 2958 blobs have been correctly classified as targets (True Positives, TP), 2607 blobs have been correctly classified as false targets (True Negatives, TN), 451 blobs have been incorrectly classified as false targets (False Negatives, FN) and 460 blobs have been incorrectly classified as true targets (False Positives, FP). The false negatives produce the deletion of true targets, which may have a cost with respect to not applying machine learning. The percentage of correct classification (PCC) gives the proportion of correctly classified instances:

PCC = (TotalTP + TotalTN) / (TotalTP + TotalFP + TotalTN + TotalFN)    (2)

The global PCC for our scenarios is 85.9327%, that is, 85.9327% of the detected blobs have been correctly classified.
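As a straightforward check, Eq. (2) applied to the counts of Table 1 reproduces the reported figure:

```python
def pcc(tp, tn, fp, fn):
    """Percentage of correctly classified instances, Eq. (2)."""
    return 100.0 * (tp + tn) / (tp + fp + tn + fn)

# Counts taken from the confusion matrix in Table 1:
global_pcc = pcc(tp=2958, tn=2607, fp=460, fn=451)
```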

5 Evaluation of the Data Mining-Based Surveillance Video System

In this section, the Evaluation System described in section 3 is applied to the Surveillance Video System with and without the Data Mining-based filter, and the results are compared. Firstly, the three scenarios that have been used throughout the whole research are briefly described. They are located at an airport where several cameras are deployed for surveillance purposes.

– The first scenario presents two aircraft moving on inner taxiways between airport parking positions. A third aircraft appears at the end, overlapping with one of the other aircraft. Only the aircraft that overlaps is considered.
– In the second scenario, there is an aircraft moving with partial occlusions due to stopped vehicles and aircraft in parking positions in front of the moving object. There are multiple blobs representing a single target that must be re-connected, and at the same time there are four vehicles (vans) moving on parallel roads.
– Finally, in the third scenario, there are three aircraft moving on parallel taxiways whose images overlap when they cross. All three aircraft are considered.


The complexity of the videos, due to overlaps and occlusions, increases from the first to the third scenario. The Evaluation Function is calculated for each track, and the results are shown in figure 6.

Fig. 6. Results of the Evaluation Function.

As previously explained, the lower the evaluation function, the better the tracking system; so, in four of the five cases, the tracking system is improved. The only case in which it gets worse is the simplest video, in which the original tracking system had no problems. It gets worse when the DM filter is used because the aircraft is detected one frame later. This is the cost of possibly removing true targets (false negatives) due to the DM filtering (section 4.2). However, in more complex situations the results are better. Next, an example of the most significant of the evaluation metrics in the evaluation function is given: the number of detected tracks associated to an ideal track. Figure 7 shows the number of tracks associated to track 0 in scenario 3.

Fig. 7. Tracks associated to track 0 in scenario 3, with and without the DM-based filter.

It can easily be seen how the number of tracks, ideally one, improves with the Data Mining-based filter. Without filtering, the number of tracks associated to track 0 differs from 1 in 18 instances; with filtering, this number is reduced to 3, which represents a decrease of 83.33%. We can also see the cost of filtering: during three frames the track is lost, because real targets have been removed.

6 Conclusions

The initial surveillance video system has been improved in scenarios with some complexity by applying Data Mining-based filtering. The Data Mining-based filter decides whether the blobs extracted by the system correspond to real targets or not. Algorithm C4.5 has been used; this algorithm obtains a classifier in the form of a decision tree from a set of training examples. These training examples consist of detected blobs, characterized by several attributes based on the following parameters: intensity gradient, optical flow and edge detection. Besides, the training examples must be classified as 'true target' or 'false target', for which the ground truth, extracted by a human operator, has been used. The resulting surveillance video system has been evaluated with an evaluation function that measures its quality level. This quality level has been improved in all scenarios tested except one, in which the cost of filtering became manifest. Because of filtering, blobs that correspond to real targets may be removed, and this may cause the loss of the track or a later detection, which is what occurred in the mentioned scenario. In any case, that scenario was the simplest one and the initial tracking system had no problems with it; so, we can conclude that in scenarios with more complexity Data Mining-based filtering improves the tracking system. In future work, some actions will be undertaken to continue this approach, such as applying machine learning to higher levels of video processing: data association, parameter estimation, etc.

References

1. Witten, I. H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, San Francisco (2000)
2. Rosin, P. L., Ioannidis, E.: Evaluation of global image thresholding for change detection, Pattern Recognition Letters, vol. 24, no. 14 (2003) 2345–2356
3. Pérez, O., García, J., Berlanga, A., Molina, J. M.: Evolving Parameters of Surveillance Video Systems for Non-Overfitted Learning, 7th European Workshop on Evolutionary Computation in Image Analysis and Signal Processing (EvoIASP), Lausanne, Switzerland (2005)
4. Besada, J. A., Portillo, J., García, J., Molina, J. M., Varona, A., González, G.: Airport surface surveillance based on video images, FUSION 2001 Conference, Montreal, Canada (2001)
5. Besada, J. A., Portillo, J., García, J., Molina, J. M.: Image-Based Automatic Surveillance for Airport Surface, FUSION 2001 Conference, Montreal, Canada (2001)
6. Besada, J. A., Molina, J. M., García, J., Berlanga, A., Portillo, J.: Aircraft Identification integrated in an Airport Surface Surveillance Video System, Machine Vision and Applications, vol. 15, no. 3 (2004)
7. Sonka, M., Hlavac, V., Boyle, R.: Image Processing, Analysis and Machine Vision, Brooks/Cole Publishing Company (1999)
8. Quinlan, J. R.: C4.5: Programs for Machine Learning, Morgan Kaufmann (1993)
9. Quinlan, J. R.: Induction of Decision Trees, Machine Learning, vol. 1 (1986)
10. Horn, B. K. P., Schunck, B. G.: Determining Optical Flow, Artificial Intelligence, 17 (1981) 185–203
11. Canny, J.: A Computational Approach to Edge Detection, IEEE Trans. Pattern Analysis and Machine Intelligence, 8 (1986) 679–698
