The 2nd International Conference on Communications and Information Technology (ICCIT): Wireless Communications and Signal Processing, Hammamet

Learning System for Mobile Robot Detection and Tracking

Sonda Bousnina, Boudour Ammar, Nesrine Baklouti, Adel M. Alimi
REGIM: REsearch Groups on Intelligent Machines, University of Sfax,
National Engineering School of Sfax (ENIS), BP 1173, Sfax, 3038, Tunisia
{sonda.bousnina, boudour.ammar, nesrine.baklouti, adel.alimi}@ieee.org

Abstract—Visual detection and tracking is an important and challenging problem in the area of computer vision, and numerous research efforts have addressed it. In this paper, we present a target-tracking system specific to mobile robots. Our system uses the Gabor filter to extract the robot features. Robot detection is based on the Support Vector Machine (SVM) classifier. Once detection is accomplished, the Kalman filter is employed to track the detected robot. Experimental results have been obtained for a set of video sequences with the moving robot at different positions and with varying backgrounds.



Keywords—object detection; object tracking; feature extraction; Kalman filter; SVM; Gabor filter; robotics.

I. INTRODUCTION

Visual detection and tracking has become one of the most popular research topics, and it plays an important role in computer vision applications such as surveillance, human-computer interaction, vehicle navigation and robotics [2].

In recent years, robotics has witnessed large growth and a profound change in scope. Multirobot systems are becoming pertinent as a result of the increasing number of industrial, service, and exploration robots in current use [1]. Human-to-robot and robot-to-robot interaction are nowadays among the most studied aspects of the discipline [11]. The study of a system for robot detection and tracking therefore seems a very promising idea.

There exist many ways to accomplish object detection using machine learning techniques such as neural networks, AdaBoost and support vector machines [2; 4; 5]. Likewise, there are several methods for feature extraction, using Gabor filters, wavelets, etc. [2; 4; 5]. Concerning object tracking, Kalman filtering, extended Kalman filtering and particle filtering are among the most commonly used algorithms [12; 2].

The work presented in this paper fits in the context of applying visual tracking to the field of robotics: it presents a target-tracking system for mobile robots. The proposed work includes two main steps: robot detection and robot tracking. Robot detection is based on the Support Vector Machine (SVM) classifier, with the Gabor filter used for robot feature extraction. Once detection is accomplished, robot tracking is realized using the Kalman filter.

The principal contributions of the work presented in this paper are summarized below:
• Feature extraction based on the Gabor filter.
• A detection process using the SVM learning algorithm, whose inputs are the Gabor filter descriptors of an image and whose outputs are 1 and 0 for the robot and non-robot classes, respectively.
• A tracking process based on the Kalman filter.

The rest of this paper is structured as follows: the next section presents an overview of the literature related to this work. Section III gives some definitions of the techniques used. Section IV describes the proposed detection and tracking process. Section V summarizes the main achievements of the work and discusses the obtained results. Finally, Section VI draws conclusions and suggests future work.

978-1-4673-1948-5/12/$26.00 ©2012 IEEE

II. RELATED WORKS

Numerous approaches have been proposed for visual detection and tracking. These differ principally in the choice of a suitable object representation for tracking, the image features used, and the motion, appearance and shape of the modeled object [2].

Sun [22] proposed a method for vehicle detection based on Haar wavelet and Gabor filter features. In both cases, the decision boundary is computed using support vector machines (SVMs). Experimental results and comparisons using real data illustrate the effectiveness of both types of features for vehicle detection, with the Gabor features performing better.

Yang [21] studied the classification rates for four different combinations of features and classifiers: SIFT descriptors and Gabor texture features, each classified using MAP classifiers and also using SVMs. The best results were achieved by Gabor texture features with SVM classification.
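As background for the Gabor-plus-SVM pipelines discussed above, a 2D Gabor kernel (a Gaussian envelope modulated by a cosine wave) and a simple filter-bank descriptor can be sketched in NumPy. The kernel size, frequencies and bandwidths below are illustrative assumptions, not values from any of the cited systems:

```python
import numpy as np

def gabor_kernel(size, f, theta, sigma_x, sigma_y):
    """Real 2D Gabor kernel: Gaussian envelope modulated by a cosine wave."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate frame by theta
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 / (2 * sigma_x**2) + y_t**2 / (2 * sigma_y**2)))
    return envelope * np.cos(2 * np.pi * f * x_t)

def gabor_features(image, frequencies=(0.1, 0.2), orientations=4):
    """Filter-bank descriptor: mean absolute response per (frequency, orientation)."""
    feats = []
    for f in frequencies:
        for k in range(orientations):
            kern = gabor_kernel(15, f, k * np.pi / orientations, 3.0, 3.0)
            # Convolve via FFT for brevity (circular convolution is fine for a sketch)
            resp = np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kern, image.shape)).real
            feats.append(np.abs(resp).mean())
    return np.array(feats)

img = np.random.default_rng(0).random((64, 64))
print(gabor_features(img).shape)  # descriptor length = len(frequencies) * orientations
```

Richer descriptors typically keep per-pixel filter responses rather than a single mean per filter; the mean is used here only to keep the sketch short.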

Zhou [18] presented a system for tracking and classifying moving objects using single and multiple cameras in an outdoor environment. Robust tracking is achieved using feature fusion and multiple cameras. The proposed method incorporates spatial position, shape and color information for tracking object blobs, and the trajectories acquired from the individual cameras are integrated by an extended Kalman filter (EKF) to resolve object occlusion. Combining spatial position, shape and color achieves good performance in tracking people and vehicles.

Young [16] uses SIFT features and Kalman filters to learn the motion, which improves robustness in the cases where the tracker loses the object.

Gupta [13] presents a single-object tracking system based on dynamic template matching and frame differencing, in which an industrial camera is used to grab the video frames and track an object. Ammar [5] provided an incremental learning system for human detection and tracking based on AdaBoost learning, together with a comparative study of the HOG descriptors, the SIFT descriptors and their combination for human detection; additionally, an incremental PCA is employed for the tracking process.

III. DEFINITIONS

A. Gabor Filter

The Gabor filter is a linear filter used for object detection [10]. In the spatial domain, a 2D Gabor filter can be defined as a Gaussian kernel function modulated by a sinusoidal plane wave, as follows:

G(x, y; θ, f) = exp(−(x_θ²/(2σ_x²) + y_θ²/(2σ_y²))) · cos(2πf·x_θ)    (1)

x_θ = x·cosθ + y·sinθ    (2)

y_θ = −x·sinθ + y·cosθ    (3)

Where
• θ is the orientation of the Gabor filter,
• f is the frequency of the cosine wave,
• σ_x and σ_y are the standard deviations of the Gaussian envelope along the x and y axes, respectively,
• x_θ and y_θ define the x and y axes of the filter coordinate frame, respectively.

Gabor filters are characterized by self-similarity: they can all be generated from one single mother wavelet through dilation and rotation. Therefore, a set of Gabor filters with diverse frequencies and orientations may be useful for extracting constructive features from an image [15; 16].

The frequency and orientation representations of Gabor filters are similar to those of the human visual system, and they have been found to be particularly suitable for texture representation and discrimination [7].

Thus, Gabor filters have recently received considerable attention and increased interest in image processing. They have been successfully applied in many applications such as texture analysis and segmentation, character recognition, face detection, iris recognition, fingerprint recognition and object detection [14; 15; 20; 21; 22].

B. Support Vector Machines

In computer science, a support vector machine (SVM) is one of a set of associated supervised learning methods that analyze data and recognize patterns, used for classification and regression analysis [8; 17].

As a classifier, the SVM separates data into two classes by finding the maximum-margin hyperplane between them. The margin of the hyperplane is the distance between the hyperplane and the closest data points; the points lying on the border of the margin are the support vectors [7; 18].

In a binary classification problem where feature extraction is performed first, the training data can be defined as:

{(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}; x_i ∈ R^d, y_i ∈ {−1, +1}    (4)

SVMs have been applied to various applications ranging from face detection and recognition, handwritten character and digit recognition, speaker and speech recognition, and information and image retrieval, to prediction [8; 17].

C. Kalman Filter

The Kalman filter (KF) is a set of mathematical equations that provides an efficient recursive computational means to estimate the state of a process while minimizing the mean square error [10; 12].

The Kalman filter model supposes that the true state at time k evolves from the state at time (k − 1) according to the following equation [12]:

X_k = F_k X_{k−1} + B_k U_k + W_k    (5)

Where
• F_k is the state transition model applied to the previous state X_{k−1};
• B_k is the control-input model applied to the control vector U_k;
• W_k is the process noise, assumed to be drawn from a zero-mean multivariate normal distribution with covariance Q_k.

At time k, an observation (or measurement) Z_k of the true state X_k is obtained according to

Z_k = H_k X_k + V_k    (6)

Where
• H_k is the observation model mapping the true state space into the observed space;
• V_k is the observation noise, assumed to be zero-mean Gaussian white noise with covariance R_k.

The Kalman filter can be written as one equation; however, it is most frequently conceptualized as two distinct phases: a predict phase and an update phase [9]. This process is repeated with a new measurement at every time instance; all the measured data is accumulated over time and aids in predicting the state. The state of the filter is characterized by two variables:
• x̂_{k|k}: the a posteriori state estimate at time k, given observations up to and including time k;
• P_{k|k}: the a posteriori error covariance matrix.

The required calculations are as follows. State prediction, before measurements are taken:

x̂_k⁻ = F x̂_{k−1} + B U_{k−1}    (7)

P_k⁻ = F P_{k−1} Fᵀ + Q    (8)

State update, after measurements are taken:

K_k = P_k⁻ Hᵀ (H P_k⁻ Hᵀ + R)⁻¹    (9)

x̂_k = x̂_k⁻ + K_k (Z_k − H x̂_k⁻)    (10)

P_k = (I − K_k H) P_k⁻    (11)

Where K is the Kalman gain matrix and P is the covariance matrix of the state estimate, containing information about the estimated accuracy.

The recursive formulation produces increasingly confident predictions, as the accumulated experience of reducing the prediction error outweighs any individual new measurement. The filter adapts to variable measurement time intervals and is able to provide error estimates [9].

The KF is one of the best-known and most frequently used tools for stochastic state estimation from noisy sensor measurements. It is an optimal estimator for a large class of systems with uncertainty and a very effective estimator for an even larger class [9]. The Kalman filter has been employed in a wide range of applications, mainly the control and prediction of dynamic systems; a KF can, for example, be used to control and track continuous manufacturing processes, aircraft, ships, spacecraft and robots [2; 6; 9].

IV. OVERVIEW OF OUR SYSTEM

The goal of our system is to detect and track a mobile robot in different environments.

We start by learning the characteristics of the robot class from a set of training images capturing the variability in robot appearance. The training images are represented by a set of features which are extracted using the Gabor filter. Then, the decision boundary between the robot and non-robot classes is computed through learning. Once the detection is accomplished, the robot tracking is realized using the Kalman filter (see figure 2).

Datasets: we have created a database of positive and negative samples of the robot for the SVM classification. The training set is composed of 300 negative samples and 700 positive samples of the robot, with diverse sizes varying from 64*64 pixels to 640*480 pixels, in order to allow robust detection for variable positions and dimensions of the robot.

A. Feature Extraction

First, we apply Gabor filtering to an image in order to extract robot features, as illustrated in figure 1. The features are then saved in a descriptor vector.

Figure 1. Robot feature extraction using the Gabor filter: (a) input image of the robot; (b) output image of the robot feature extraction.

A number of Gabor filters with different frequencies and orientations may be useful for extracting constructive features from an image. For that reason, we added new descriptor vectors for every image of the database by using symmetry relative to the x-axis and y-axis and rotation relative to the x-axis and y-axis. We apply the Gabor filter to the database of robot and non-robot images in order to obtain a descriptor vector for each image.

B. Robot detection

Once the feature extraction is done, we append to each descriptor vector a bit referring to the class of the image: bit "1" for the positive samples, referring to the robot class, and bit "0" for the negative samples, referring to the non-robot class.

Given a number of positive and negative training samples, an SVM training algorithm builds a model that assigns new examples to one category or the other. To test an input image of the environment, we apply the Gabor filter extraction to the image and generate a feature vector; this vector is then tested using the SVM classifier, and we finally obtain the detection result.

C. Robot Tracking

To use the Kalman filter for robot tracking, we assume that the motion of the robot is almost constant over frames. The state variables, dynamic matrix and measurement matrix are those usually used for 2D tracking, and the measurements are combined to form an estimate of the robot's location.

We start by initializing the matrices and parameters of the Kalman filter based on the initial position of the detected robot. Then, we use the initial conditions and the model to make predictions and measurements; next, we correct the prediction according to the measurement. We repeat this predictor-corrector process iteratively, accumulating all the measured data through time to help predict the state, until we find the measurement closest to the real state.
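The constant-velocity predictor-corrector loop above can be sketched in NumPy with a state [x, y, vx, vy]. The noise covariances and the simulated diagonal trajectory are illustrative assumptions, not the paper's values, and the control input B·U is omitted:

```python
import numpy as np

dt = 1.0  # one frame per step
F = np.array([[1, 0, dt, 0],   # constant-velocity transition model
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],    # only position (x, y) is measured
              [0, 1, 0, 0]], dtype=float)
Q = 1e-3 * np.eye(4)           # process noise covariance (assumed)
R = 1e-1 * np.eye(2)           # measurement noise covariance (assumed)

def kalman_step(x, P, z):
    # Predict: x_k^- = F x_{k-1},  P_k^- = F P F^T + Q
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: gain, state correction, covariance correction
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new

# Track a robot moving diagonally at 1 px/frame with noisy detections
rng = np.random.default_rng(1)
x, P = np.zeros(4), np.eye(4)
for k in range(1, 50):
    z = np.array([k, k], dtype=float) + rng.normal(0, 0.3, 2)
    x, P = kalman_step(x, P, z)
print(np.round(x[:2]))  # estimated position near (49, 49)
```

In the system described above, z would come from the SVM detection stage at each frame; here it is simulated so the sketch runs standalone.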


Figure 2. System overview of the robot detection and tracking system.
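The detection stage of the system pairs Gabor descriptors with a binary SVM. As an illustrative stand-in for a full SVM library (the paper does not name one), a minimal linear SVM trained by sub-gradient descent on the hinge loss can be sketched as follows; the 2-D toy descriptors are an assumption:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimal linear SVM via full-batch sub-gradient descent on the hinge loss.
    y must be in {-1, +1}; returns weights w and bias b."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1  # samples violating the margin
        if mask.any():
            w -= lr * (lam * w - (y[mask][:, None] * X[mask]).mean(axis=0))
            b -= lr * (-y[mask].mean())
        else:
            w -= lr * lam * w  # only the regularizer pulls on w
    return w, b

def predict(X, w, b):
    """Return 1 for the robot class, 0 for non-robot, per the paper's labeling."""
    return (X @ w + b >= 0).astype(int)

# Toy separable 'descriptors': positives around (2, 2), negatives around (-2, -2)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2, 0.5, (50, 2)), rng.normal(-2, 0.5, (50, 2))])
y = np.array([1] * 50 + [-1] * 50)
w, b = train_linear_svm(X, y)
print(predict(X, w, b)[:5])
```

A production system would instead feed the Gabor descriptor vectors from Section IV.A into an established SVM implementation, typically with a kernel rather than a linear decision function.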

V. TESTS AND EVALUATION

A. Experimentation and results

To evaluate the detection process, we used a test set of 50 positive samples and 20 negative ones. Robot detection is performed by sliding a search window through the frame image and checking whether the image region at each position is classified as robot or non-robot. Since we aim to track a single robot in the image frame, in order to increase the speed of our system we scan the input image and, once a match is found, stop the scan and draw a small green circle at the position of the detected robot.

Regarding the robot tracking system, experimental results have been obtained for a set of video sequences with the moving robot at different positions, with varying backgrounds, and containing different effects such as illumination variation, obscured objects and occlusions. Table III shows the results of the robot detection and tracking for four of the captured video sequences.

In video sequence 1, the robot moves in an indoor environment with bright ambient light and a background texture close to that of the robot; the robot is also captured with different appearances (left and right) and positions, from close and far. The robustness of the tracking method is observed in frames 14, 65 and 116 of the video. In contrast to video 1, in video 2 we tested the robot tracking in a non-luminous environment and obtained successful results, as shown in frames 6, 28 and 105.

In frames 12, 51 and 84 from the third sequence, robot detection and tracking are subjected to image modifications due to appearance variation (close and far positions) and background variation. We also checked the effectiveness of the proposed method in the presence of obstacles and partial occlusion, as illustrated in frames 58, 157 and 388 from video sequence 4.

Despite all the image changes present in the video sequences used, due to robot movements, light intensity changes, and even partial occlusion, the proposed system was able to effectively detect and track the mobile robot.
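The early-exit sliding-window scan described in the experiments can be sketched as follows. The window size, stride, and the brightness-based stand-in classifier are assumptions made so the sketch runs without a trained model:

```python
import numpy as np

def scan_first_match(frame, classify, win=64, stride=16):
    """Slide a win x win window over the frame and stop at the first
    region classified as 'robot' (label 1), mirroring the early-exit scan."""
    h, w = frame.shape
    for top in range(0, h - win + 1, stride):
        for left in range(0, w - win + 1, stride):
            patch = frame[top:top + win, left:left + win]
            if classify(patch) == 1:
                # Center of the detected window (where the green circle is drawn)
                return (top + win // 2, left + win // 2)
    return None  # no robot found in this frame

# Placeholder classifier: call a bright patch a 'robot' (stand-in for Gabor+SVM)
frame = np.zeros((240, 320))
frame[100:160, 200:260] = 1.0
hit = scan_first_match(frame, lambda p: int(p.mean() > 0.5))
print(hit)
```

In the actual system, `classify` would compute the Gabor descriptor of the patch and evaluate the trained SVM; stopping at the first match is what makes the single-robot assumption pay off in speed.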


B. Evaluation and discussion

The accuracy of the Kalman tracking was based on the calculation of the covariance matrix for the state estimate, which contains information about the measure of the estimated accuracy.

To evaluate the performance of our system, we calculate the recall and precision [9], which are defined as:

Precision = t_p / (t_p + f_p)    (12)

Recall = t_p / (t_p + f_n)    (13)

Where:
• t_p (true positives, or true alarms) are the detected robot blobs that match the ground truth: correct results;
• f_p (false positives, or false alarms) are the detected robot blobs not present in the ground truth: unexpected results;
• t_n (true negatives) are the non-robot blobs correctly left undetected: correct absence of results;
• f_n (false negatives, or missed events) are the robot blobs of the ground truth that were not detected: missing results.

Challenges in visual detection and tracking can arise from unexpected object motion, changing appearance patterns of the object and the scene, object-to-object and object-to-scene occlusions, non-rigid object structures, camera motion, etc. Numerous approaches have been proposed for visual detection and tracking; the choices for such a system depend on the context of tracking, the domain where it is performed, and the information required for tracking. Table I compares some related works to our system.

TABLE I. EVALUATION MEASURES FOR THE ROBOT DETECTION

Related works    Methods                   Precision
[21]             SIFT & MAP                84.5%
[21]             SIFT & SVM                76.2%
[21]             GABOR & MAP               73.9%
[21]             GABOR & SVM               89.8%
[5]              SIFT & HOG & AdaBoost     91.32%
Our method       GABOR & SVM               92%

The authors in [21] studied the classification rates for four different combinations of features and classifiers. The best precision is achieved by Gabor texture features with SVM classification (89.8%). The next best combination is SIFT descriptors with MAP classification (84.5%), then global SIFT descriptors classified using SVMs (76.2%), and last, local Gabor texture features classified using MAP classifiers (73.9%).

In [5], the HOG descriptors, the SIFT descriptors and the combination of SIFT and HOG were studied. The result of the combination of HOG and SIFT, 91.32%, is higher than the one given by HOG or SIFT alone (80.9% and 58.19%, respectively). From the evaluation measures on the test set samples for robot detection, our system reaches a precision of 92% and a recall of 93.87%.

In [18], object tracking in an outdoor environment using fusion of features and cameras tracked 13 out of 17 moving objects, an accuracy of 76%; three of the failures were due to occlusion. Their testing videos gave an accuracy of 82%. For our system based on Kalman tracking, the testing videos gave an accuracy of 94%.

These results demonstrate the efficiency and robustness of the proposed detection and tracking system; its main advantage is the high rate of robot detection and tracking.

VI. CONCLUSION AND FUTURE WORKS

There is rising interest in moving-object detection and tracking algorithms in the fields of computer vision, surveillance, robotics, etc. In this paper, we have outlined the problem of robot detection and tracking and proposed a system able to detect and track a moving mobile robot and estimate its position in a noisy dynamic environment. The combination of Gabor texture features and SVM classification gives very good results for detection, and the Kalman filter is an optimal iterative predictor-corrector estimator for tracking. Multirobot systems are becoming pertinent as a result of the increasing number of industrial, service, and exploration robots in current use; large systems of various autonomous but networked units, able to act in and on the environment, will soon be a reality. For future work, we propose a wireless network system of cooperative communicating objects.

ACKNOWLEDGMENT

The authors would like to acknowledge the financial support of this work by grants from the General Direction of Scientific Research (DGRST), Tunisia, under the ARUB program.

REFERENCES

[1] A. Bicchi, A. Fagiolini and L. Pallottino, "Toward a Society of Robots: Behaviors, Misbehaviors and Security", IEEE Robotics & Automation Magazine, Vol. 17, No. 4, pp. 26-36, December 2010.
[2] A. Yilmaz, O. Javed and M. Shah, "Object tracking: A survey", ACM Computing Surveys, Vol. 38, No. 4, Article 13, 45 pages, December 2006.
[3] B. Ammar, F. Chérif and A. M. Alimi, "Existence and uniqueness of pseudo-almost periodic solutions of recurrent neural networks with time-varying coefficients and mixed delays", IEEE Transactions on Neural Networks, Vol. 23, No. 1, pp. 109-118, January 2012.
[4] B. Ammar, N. Rokbani and A. M. Alimi, "Learning System for Standing Human Detection", IEEE International Conference on Computer Science and Automation Engineering (CSAE 2011), pp. 300-304, Shanghai, 10-12 June 2011.
[5] B. Ammar, A. Wali and A. M. Alimi, "Incremental Learning Approach for Human Detection and Tracking", 7th International Conference on Innovations in Information Technology (Innovations'11), pp. 128-133, Abu Dhabi, 25-27 April 2011.
[6] D. Comaniciu, V. Ramesh and P. Meer, "Kernel-based object tracking", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, pp. 564-575, 2003.
[7] D. Shan and R. K. Ward, "Improved Face Representation by Non-uniform Multilevel Selection of Gabor Convolution Features", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 39, No. 6, pp. 1408-1419, 2009.
[8] G. Wang, "A Survey on Training Algorithms for Support Vector Machine Classifiers", Fourth International Conference on Networked Computing and Advanced Information Management, pp. 123-128, Gyeongju, 2008.
[9] G. Welch and G. Bishop, "An Introduction to the Kalman Filter", 2003.
[10] J. Makhoul, F. Kubala, R. Schwartz and R. Weischedel, "Performance measures for information extraction", Proceedings of the DARPA Broadcast News Workshop, Herndon, VA, February 1999.
[11] J. Ruiz-del-Solar, R. Verschae, M. Arena and P. Loncomilla, "Robot Detection System in the Soccer Domain", IEEE Robotics & Automation Magazine, Vol. 17, No. 4, pp. 43-53, December 2010.
[12] J. Ruiz-del-Solar and P. A. Vallejos, "Motion Detection and Object Tracking for an AIBO Robot Soccer Player", in Robotic Soccer, Pedro Lima (Ed.), 2007.
[13] K. Gupta and A. V. Kulkarni, "Implementation of an Automated Single Camera Object Tracking System Using Frame Differencing and Dynamic Template Matching", Proceedings of the 2007 International Conference on Systems, Computing Sciences and Software Engineering (SCSS), pp. 245-250, Bridgeport, CT, USA, December 3-12, 2007.
[14] L. Shao, R. Gao, Y. Liu and H. Zhang, "Transform Based Spatio-Temporal Descriptors for Human Action Recognition", Neurocomputing, Vol. 74, No. 6, pp. 962-973, Amsterdam, The Netherlands, February 2011.
[15] L. Shao and R. Mattivi, "Feature Detector and Descriptor Evaluation in Human Action Recognition", CIVR, pp. 477-487, China, July 2010.
[16] M. K. Young, "Object Tracking in a Video Sequence", ETRI Journal, Vol. 28, No. 3, pp. 275-281, June 2006.
[17] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 1, pp. I-511 - I-518, Cambridge, USA, 2001.
[18] Q. Zhou and J. K. Aggarwal, "Object tracking in an outdoor environment using fusion of features and cameras", Image and Vision Computing, Vol. 24, No. 11, pp. 1244-1255, 2006.
[19] S. Lee, B. Lee and A. Verri, "Applications of Support Vector Machines for Pattern Recognition: A Survey", LNCS 2388, pp. 213-236, Berlin Heidelberg, 2002.
[20] S. Zehang, G. Bebis and R. Miller, "Improving the performance of on-road vehicle detection by combining Gabor and wavelet features", The IEEE 5th International Conference on Intelligent Transportation Systems, Singapore, 2002.
[21] Y. Yang and S. Newsam, "Comparing SIFT descriptors and Gabor texture features for classification of remote sensed imagery", 15th IEEE International Conference on Image Processing (ICIP), pp. 1852-1855, San Diego, CA, 2008.
[22] Z. Sun, G. Bebis and R. Miller, "Improving the Performance of On-Road Vehicle Detection by Combining Gabor and Wavelet Features", The IEEE 5th International Conference on Intelligent Transportation Systems, Proceedings, pp. 130-135, 2002.

TABLE III. SUCCESSFUL RESULTS FOR THE ROBOT DETECTION AND TRACKING

Video 1 (142 frames, 320*240): frames 14, 65 and 116.
Video 2 (377 frames, 640*480): frames 6, 28 and 105.
Video 3 (100 frames, 320*240): frames 12, 51 and 84.
Video 4 (566 frames, 640*480): frames 58, 197 and 388.
