Proceeding of the IEEE International Conference on Information and Automation Shenzhen, China June 2011
Human Tracking in Thermal Catadioptric Omnidirectional Vision Yazhe Tang, Youfu Li, Tianxiang Bai, Xiaolong Zhou
Zhongwei Li
Department of Manufacturing Engineering and Engineering Management, City University of Hong Kong, Kowloon, Hong Kong, China
[email protected] &
[email protected]
Department of Material Science and Engineering Huazhong University of Science and Technology Wuhan, Hubei Province, China
[email protected]
Abstract - We propose a novel tracking system for human tracking in thermal catadioptric omnidirectional (TCO) vision, which enables surveillance in all-weather and wide field of view conditions. In contrast, previous human tracking systems mainly focus on tracking in conventional imaging systems. The proposed tracking method uses the classification posterior probability of a Support Vector Machine (SVM) to form the observation likelihood of a particle filter for efficient tracking, whereas previous works employ only the final output label of the SVM for classification. Because no TCO vision dataset is publicly available, we establish a dataset including TCO videos and extracted human samples to train the classifier and test the proposed tracking method. Moreover, we adjust the tracking window distribution of the particle filter to fit a characteristic of catadioptric omnidirectional vision: the size of the target in the omni-image depends on the distance between the target image and the center of the catadioptric omnidirectional image. Finally, the experimental results show that the proposed tracking method performs stably and well in the TCO vision tracking system.
Index Terms - Tracking, Thermal, Catadioptric omnidirectional vision.
I. INTRODUCTION
Human tracking is an active research topic in computer vision and forms the main part of a surveillance system. In modern society, automatic surveillance systems are becoming increasingly popular in a large variety of applications. However, previous surveillance systems [1] mainly focus on conventional vision systems, which have a limited field of view and depend on illumination. In this paper, we explore a novel tracking system for human tracking in thermal catadioptric omnidirectional (TCO) vision to realize surveillance under all-weather and wide field of view conditions. The proposed surveillance system offers a much wider field of view and is independent of illumination, which permits acquiring more information from the environment. To overcome the illumination restriction, a thermal camera is employed because it can detect warm-blooded animals (humans) in any illumination condition, whether day or night, rain or fog. Furthermore, the price of thermal cameras has dropped dramatically, so they have become popular in civilian applications in recent years. To achieve a large field of view, a catadioptric omnidirectional sensor is used, which reflects the surrounding light (360 degrees in the horizontal plane) into a single camera.
The hardware of the proposed surveillance system therefore mainly consists of a thermal camera, a catadioptric omnidirectional mirror and a stand (Fig. 1). A representative TCO image is given in Fig. 2. Given these merits, the proposed surveillance system could have a wide range of applications, such as airport surveillance, military security, wild animal conservation and so on.
Fig. 1. Configuration of TCO system.
Fig. 2. Representative of TCO image.
978-1-61284-4577-0270-9/11/$26.00 ©2011 IEEE
Tracking in TCO vision faces major challenges because only limited features can be used in thermal imagery and because catadioptric omnidirectional vision has severe inherent distortion. To the best of the authors' knowledge, very few works on detection and tracking in TCO vision are available for reference. Previously, researchers have done excellent work on detection and tracking in thermal vision. Although tracking in thermal imaging is difficult compared with traditional visible imaging, its illumination independence and decreasing price have led more and more researchers to focus on thermal imagery. Recently, there has been a flurry of work on human detection and tracking in thermal vision. [2] employed a Support Vector Machine (SVM) for classification and a Kalman filter integrated with mean shift for tracking in thermal imagery. In [3], the authors presented a two-stage template-based method integrated with an AdaBoosted classifier for pedestrian detection. [4] integrated SVM [5][6] and the Histogram of Oriented Gradients (HOG) [7] to detect pedestrians in thermal imagery. In [8], the authors used a generalized expectation-maximization (EM) algorithm to separate infrared images into background and foreground layers, incorporated an SVM for classification, and then presented a graph matching-based method for tracking.
In the past decade, more researchers have concentrated on omnidirectional vision tracking and detection because of its wide field of view. In [9], a fisheye omnidirectional tracking system is introduced; the authors used optical flow to detect the target and employed a color feature based kernel particle filter (KPF) to realize single target tracking in omnidirectional vision. [10] presented a catadioptric omnidirectional surveillance system that uses multi-background modeling and dynamic thresholding for outdoor tracking in clutter, in order to spot snipers on the battlefield. [11][12] used a particle filter combined with color features to realize tracking in catadioptric omnidirectional vision. [13] introduced a catadioptric omnidirectional pedestrian recognition system for vehicle automation, which proposed a boosted cascade of wavelet-based classifiers combined with a subsequent texture-based neural network. [13][14] first unwarped the catadioptric omnidirectional image to a panoramic image and then performed human recognition.
However, we prefer to track in the original catadioptric omnidirectional image rather than in the panoramic image, because unwarping the omni-image involves interpolation, which is time consuming. In addition, unwarping risks splitting a target located at the border of the panoramic image. In this paper, we mainly address tracking in TCO vision.
The proposed system is therefore initialized manually in the detection phase. Because only limited features are available in thermal imagery, the HOG [7] feature is employed as the descriptor for contour encoding. Like most traditional methods [4] [7] [15], the proposed method adopts an SVM to classify the HOG features extracted from the TCO image. More importantly, the proposed tracking method uses the SVM classification posterior probability to form the observation likelihood of the particle filter for efficient tracking in TCO vision (Fig. 3). Because no TCO vision dataset is publicly available, a dataset including TCO videos and extracted human samples is formed. The extracted samples are used to train the SVM classifier, and the TCO videos are used for testing. In addition, polar coordinates are adopted to fit catadioptric omnidirectional vision. Moreover, the tracking window distribution of the particle filter is adjusted to fit a characteristic of catadioptric omnidirectional vision: the size of the target in the omni-image depends on the distance between the target image and the center of the catadioptric omnidirectional image.
The paper is structured as follows. Section II presents the proposed tracking method, which consists of the theoretical foundation of the particle filter, the SVM classification posterior probability, the extracted TCO samples and the tracking window distribution of the particle filter based on catadioptric omnidirectional vision. Section III presents and discusses the experimental results. Finally, the conclusion follows in Section IV.
[Figure: Input Image -> Extracted HOG -> SVM -> Classification Posterior Probability -> Particle Filter]
Fig. 3. The framework of proposed method
II. PROPOSED TRACKING ALGORITHM
Tracking in TCO vision is difficult because several major challenges must be overcome, such as the limited features in thermal images and the inherent distortion of catadioptric omnidirectional vision. In this section, we introduce a novel tracking method suited to TCO vision. The proposed method integrates an SVM with a particle filter and makes relevant adjustments to fit catadioptric omnidirectional vision for efficient tracking. The details of the algorithm are as follows.
A. Particle Filter
The particle filter [16]-[18], also known as the Sequential Monte Carlo (SMC) method, has been widely used because it handles nonlinear/non-Gaussian problems. In the Bayesian framework, the particle filter recursively estimates the state $x_k$ at time $k$, given the available observations $z_{1:k} = \{z_1, z_2, \ldots, z_k\}$ up to time $k$. Suppose the posterior $p(x_{k-1} \mid z_{1:k-1})$ at time $k-1$ is available; the posterior $p(x_k \mid z_{1:k})$ can then be obtained recursively by prediction and update. The prediction stage uses the probabilistic system transition model $p(x_k \mid x_{k-1})$ to predict the state distribution at time instant $k$.
Prediction step:
$$p(x_k \mid z_{1:k-1}) = \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})\, dx_{k-1} \qquad (1)$$
At time instant $k$, an observation $z_k$ becomes available, and the posterior $p(x_k \mid z_{1:k})$ is obtained by updating the prior with Bayes' rule.
Update step:
$$p(x_k \mid z_{1:k}) = \frac{p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})}{p(z_k \mid z_{1:k-1})} \qquad (2)$$
where $p(z_k \mid x_k)$ is the observation likelihood, which influences the weight distribution of the particles. In this paper, the observation likelihood is obtained from the SVM classification posterior probability. At time instant $k$, each particle represents a hypothetical state $x_k^i$ with corresponding weight $w_k^i$. The principle of the particle filter is thus to use a finite set of $N$ weighted particles $\{x_k^i, w_k^i\}_{i=1,2,\ldots,N}$ to approximate the continuous posterior distribution. A larger number of particles approximates the true distribution more closely, but the computational cost grows with $N$, so there is a trade-off in practice.
B. SVM Classification Posterior Probability
SVM [19] [20] is a very useful algorithm for pattern classification. It can discriminate unlabeled samples based on a trained model. Suppose the training sample set is $\{(x_1, y_1), \ldots, (x_l, y_l)\} \in (X \times Y)^l$ with labels $y_i \in Y = \{-1, 1\}$, where $l$ is the number of training samples. The SVM works by maximizing the margin between the two classes in feature space as follows [20]:
$$\min J(w, b, \xi_i) = \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i \qquad (3)$$
subject to the constraints:
$$y_i [w^T \varphi(x_i) + b] \ge 1 - \xi_i, \quad \xi_i \ge 0$$
where $\xi_i$ is the slack variable, $C$ is the penalty factor, and $\varphi(\cdot)$ is the mapping from input space to feature space. By taking the Lagrangian of (3), the original problem can be rewritten as:
$$\max \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1, j=1}^{l} \alpha_i \alpha_j y_i y_j k(x_i, x_j) \qquad (4)$$
subject to the constraints:
$$\sum_{i=1}^{l} \alpha_i y_i = 0, \quad \alpha_i \ge 0$$
where $k(x_i, x) = \varphi(x_i) \cdot \varphi(x)$ is a kernel function and $\alpha_i$, $i = 1, 2, \ldots, l$, are the Lagrange coefficients obtained from (4). The hyperplane function can be expressed as:
$$f(x) = \sum_{i=1}^{m} \alpha_i y_i x_i^T x + b \qquad (5)$$
where $m$ is the number of support vectors. Further, the hyperplane function (5) can be used to derive the classification posterior probability [21], defined as follows:
$$P(y = 1 \mid f) = \frac{1}{1 + \exp(Af + B)} \qquad (6)$$
where $A$ and $B$ are the parameters of the sigmoid, fitted on training data as in [21]. Equation (6) expresses the posterior probability of the sample being classified into class 1 (the positive set). Conversely, $1 - P(y = 1 \mid f)$ is the probability of the sample being classified into class -1 (the negative set).
C. Likelihood
As mentioned above, when an extracted local feature is classified by the SVM, the result of (6) expresses the extent to which the extracted feature is positive (class 1). That is, $p_p = P(y_i = 1 \mid f)$ represents the probability of sample $x_i$ being classified into the positive set (class 1). In contrast, $p_n = 1 - P(y_i = 1 \mid f)$ is the probability of the negative set (class -1). The SVM classification posterior probability can therefore be employed to obtain the observation model $p(z_k \mid x_k = x_k^i)$ as shown in the following equations:
$$p(z_k \mid x_k^i) \propto \exp(-\lambda \cdot d^2) \qquad (7)$$
$$w_k^i \propto w_{k-1}^i\, p(z_k \mid x_k^i) \qquad (8)$$
where $\lambda$ is the covariance parameter and $d$ is defined by (9). The observation likelihood $p(z_k \mid x_k^i)$ is used to update the weight $w^i$ of the particles in (8). If the extracted sample is classified into the positive set by the SVM, the proposed algorithm calculates the distance $d$. Conversely, the algorithm ignores the result if the extracted sample is classified into the negative set. The distance $d$ expresses the distance between the candidate sample and the standard positive sample. In other words, the smaller the distance $d$, the higher the probability that the sample is positive and the larger its weight.
$$d = 1 - \ldots \qquad (9)$$
D. Extracted TCO Samples
In the traditional detection and tracking community, some public sample datasets are available. For human tracking in TCO vision, however, no existing dataset can be used so far. We therefore manually extract the human (foreground) samples from TCO images, which is a tedious and time-consuming process. The negative samples are obtained automatically from a set of TCO images that contain no humans. Some extracted samples are shown in Fig. 4. The resulting TCO dataset can be used to train the classifier, to test the proposed tracking method, and for further research on detection and tracking in TCO vision.
(a) Positive TCO samples; (b) Negative TCO samples;
Fig. 4. Extracted TCO samples.
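As an illustration of how the posterior of (6) could drive the weight update of (7) and (8), the following is a minimal sketch rather than the authors' implementation. It assumes that the distance is d = 1 - p_p (the right-hand side of (9) is not recoverable here), that a sample counts as negative when its SVM decision value f is not positive, and that ignoring a negative sample means giving it zero likelihood. The function names and the values of A, B and lam are hypothetical.

```python
import numpy as np


def platt_posterior(f, A, B):
    """Eq. (6): map SVM decision values f to P(y = 1 | f)."""
    return 1.0 / (1.0 + np.exp(A * f + B))


def observation_likelihood(f, A, B, lam):
    """Eq. (7): likelihood proportional to exp(-lam * d^2)."""
    p_pos = platt_posterior(f, A, B)
    # ASSUMPTION: d = 1 - p_pos; the source does not preserve Eq. (9) in full.
    d = 1.0 - p_pos
    lik = np.exp(-lam * d ** 2)
    # ASSUMPTION: f <= 0 means "classified negative"; such samples are
    # ignored by giving them zero likelihood.
    return np.where(f > 0.0, lik, 0.0)


def update_weights(weights, f, A, B, lam):
    """Eq. (8): w_k proportional to w_{k-1} * likelihood, then normalize."""
    new_w = weights * observation_likelihood(f, A, B, lam)
    total = new_w.sum()
    if total <= 0.0:
        # No particle landed on a positive sample: keep the old weights.
        return weights.copy()
    return new_w / total


def estimate_state(particles, weights):
    """Weighted mean of the particle states (one row per particle)."""
    return weights @ particles


def resample(particles, weights, rng):
    """Multinomial resampling; returns particles with uniform weights."""
    n = len(weights)
    idx = rng.choice(n, size=n, p=weights)
    return particles[idx], np.full(n, 1.0 / n)
```

With A negative, larger decision values give posteriors closer to 1, hence smaller d and larger weights, which matches the behaviour described in Section II-C. Multinomial resampling is one common choice and is used here only for brevity.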
E. Tracking Window Distribution in TCO vision
Tracking in TCO vision differs from tracking in a traditional system because of the inherent distortion caused by light reflection on the convex catadioptric mirror. The effective omni-image is the image on the catadioptric mirror, which is circular. Naturally, polar coordinates are employed for the TCO tracking system, and the center of the omni-image is aligned with the origin of the polar coordinates. The origin O (0, 0) of the image XY coordinates is located at the top left corner. The center O1 (0, 0) of the polar coordinates can then be obtained (Fig. 5).
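The change of coordinates described above can be sketched as follows. This is an illustrative example only: the function names are hypothetical, and the center (cx, cy) of the omni-image in pixel coordinates is assumed to be known, for instance from calibration.

```python
import math


def image_to_polar(u, v, cx, cy):
    """Convert pixel (u, v), with origin at the top-left corner, to polar
    coordinates (rho, theta) about the omni-image center (cx, cy)."""
    dx = u - cx
    dy = v - cy
    rho = math.hypot(dx, dy)    # radial distance from the center
    theta = math.atan2(dy, dx)  # angle in radians; image y axis points down
    return rho, theta


def polar_to_image(rho, theta, cx, cy):
    """Inverse mapping from polar coordinates back to pixel coordinates."""
    return cx + rho * math.cos(theta), cy + rho * math.sin(theta)
```

Because the image y axis points downward, a positive theta corresponds to a clockwise rotation on screen.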
Fig. 5. Polar coordinate in TCO tracking system.
In a catadioptric omnidirectional vision system, the size S of the target in the image varies with the distance D between the target in the image and the origin of the polar coordinates (Fig. 5). When the target approaches the catadioptric omnidirectional sensor from far to near in world coordinates, the target size S becomes larger and the distance D becomes smaller. That is to say, the size S of the target in the image depends on the distance D. Based on this characteristic of catadioptric omnidirectional vision, the particle filter can distribute the tracking window more reasonably: the size S of the tracking window is scaled according to the distance between the particle and the center of the omni-image. In practice, the real size of the tracking window depends on the system configuration, such as the curvature of the catadioptric mirror, the orientation of the catadioptric mirror, the height of the catadioptric mirror above the ground and so on.
In this paper, we adopt the HOG feature as the contour descriptor of the target. We use a 2 by 2 cell array to form a block; each cell consists of 4 pixels, and a 9-bin histogram of gradient magnitude over orientation is computed for each cell. Each block therefore forms a 36-dimensional vector (2 x 2 cells x 9 bins).
Finally, the proposed tracking method is formed. The SVM classification posterior probability is integrated with the particle filter, and the tracking window distribution based on the characteristic of catadioptric omnidirectional vision is proposed. Accordingly, the proposed tracking method should be able to track humans in TCO vision efficiently. The detailed experiments are shown below.
III. EXPERIMENT
The configuration of the TCO system was introduced at the beginning of this paper. In this section, the experimental results of the proposed tracking method, a gray-level information based particle filter and a gray-level information based online adaptive particle filter are compared. The effectiveness, stability and robustness of the proposed tracking algorithm are verified in a number of different experiments.
1) "Proposed method" is the proposed tracking method, which uses the SVM classification posterior probability to form the observation likelihood of the particle filter.
2) "G-PF" is the particle filter based on gray-level information, which has a fixed reference template.
3) "A-PF" is the online adaptive particle filter based on gray-level information, which updates the reference template over time instead of using a fixed template.
For a fair comparison, we set the same parameters for the three tracking methods, such as the number of particles N, the covariance and so on. The dynamic models of the three tracking methods are random-walk models, $x_k = x_{k-1} + v_k$, where $v_k$ is a zero-mean Gaussian random variable. For Proposed method, the SVM classifier has been trained with 1500 positive samples and 1900 negative samples. Due to limited space, we show only three experimental results, two shot in the daytime and one in the nighttime. It should be noted that in this paper we test only a single target to discuss the feasibility of Proposed method in TCO vision. Multi-target tracking in TCO vision will be addressed in our future work.
In experiment I, a person walks around the TCO sensor in the daytime, with a short occlusion. At the beginning, the three trackers are initialized separately, and they track the target successfully (Fig. 6). Several frames later, A-PF tends to drift, which finally results in losing the target at Frame 149 (Fig. 7). This is not difficult to understand, since A-PF is easily affected by the background, although it is good at adapting to changes of the target. Moreover, only limited gray-level information is available to A-PF in thermal images, instead of the three-channel RGB features of normal visible images. In this experiment, only Proposed method and G-PF track the human target successfully to the end. Although both A-PF and G-PF are based on gray-level information, their results differ because A-PF keeps updating the template to adapt to changes of the target, which makes it prone to drift, whereas G-PF keeps a fixed template, which is effective when the distraction from the background is limited. In addition, we introduce a short occlusion to test how quickly the tracking methods react when the target recovers from the occlusion. As shown in Fig. 8, Proposed method recovers from the occlusion in time when the target reappears at Frame 332. In contrast, G-PF cannot recapture the target promptly. Because the SVM classifier is well trained, the human target can be detected quickly even if the tracking window is not well centered on the target. A-PF and G-PF, by contrast, require a relatively strict template match. If the particles do not land well on the target, A-PF and G-PF risk failure. Especially with a low number of particles N (N=30 in our experiment), the tracking performance of Proposed method is much better than that of A-PF and G-PF in TCO vision.
In experiment II, a person walks from near to far relative to the TCO sensor in the daytime. The three trackers are initialized as in experiment I, and they track the human target well in the first few frames. Several frames later, A-PF begins to drift and loses the target completely at Frame 103 due to the accumulated change of the adaptive template (Fig. 9). In the meantime, Proposed method and G-PF still track well. A few frames later, Proposed method and G-PF lose the target when it disappears in the distance for a while. However, Proposed method recaptures the target immediately when it appears again, because its particles keep searching near the place where the target disappeared (Fig. 10). In contrast, G-PF loses the target permanently because its particles have drifted to a wrong place with a gray-level distribution similar to that of the target.
Fig. 6. Experiment I, Frame 1. From left to right: Proposed method, G-PF, A-PF
Fig. 7. Experiment I, Frame 149. From left to right: Proposed method, G-PF, A-PF
Fig. 8. Experiment I, Frame 332. From left to right: Proposed method, G-PF, A-PF
Fig. 9. Experiment II, Frame 103. From left to right: Proposed method, G-PF, A-PF
Fig. 10. Experiment II, Frame 202. From left to right: Proposed method, G-PF, A-PF
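The random-walk dynamic model used by all three trackers and the distance-dependent tracking window of Section II-E could be sketched as follows. This is a hedged illustration, not the authors' code: the paper states only that the window size depends on the distance D and on the system configuration, so the linear interpolation between an inner and an outer radius, and all parameter names and values, are assumptions made for this example.

```python
import numpy as np


def propagate(particles, sigma, rng):
    """Random-walk model x_k = x_{k-1} + v_k with v_k ~ N(0, sigma^2 I)."""
    return particles + rng.normal(0.0, sigma, size=particles.shape)


def window_size(particles, center, r_inner, r_outer, s_inner, s_outer):
    """Tracking-window side length for each particle.

    ASSUMPTION: the size is interpolated linearly between s_inner at
    radius r_inner and s_outer at radius r_outer; a real system would
    calibrate this mapping for its own mirror and mounting height.
    """
    d = np.linalg.norm(particles - center, axis=1)  # distance D to the center
    t = np.clip((d - r_inner) / (r_outer - r_inner), 0.0, 1.0)
    return s_inner + t * (s_outer - s_inner)
```

Choosing s_inner larger than s_outer reproduces the behaviour described in Section II-E, where the target appears larger as its image moves closer to the center of the omni-image.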
In experiment III, a person walks around the TCO sensor in the nighttime, and the system is again well initialized at the beginning. As in the previous two experiments, A-PF loses the target, this time at Frame 32 (Fig. 11). At Frame 362, G-PF also loses the target completely, not because of occlusion or disappearance but because of background distraction (Fig. 12): the gray-level feature distribution of the surrounding background is very similar to that of the reference template at Frame 362. Finally, only Proposed method tracks the human target well until the end of the experiment.
Fig. 11. Experiment III, Frame 32. From left to right: Proposed method, G-PF, A-PF
Fig. 12. Experiment III, Frame 362. From left to right: Proposed method, G-PF, A-PF
In all experiments, Proposed method not only tracks the human target successfully but also stays well centered on the target, with very limited bias. In contrast, G-PF and A-PF often suffer from bias and drift in difficult situations, which causes them to lose the target in the end. In summary, the above three experiments verify that Proposed method performs well in tracking the human target in TCO vision.
IV. CONCLUSION
In this paper, we present a novel human tracking system in TCO vision. The proposed tracking method makes efficient use of the SVM classification posterior probability to form the observation likelihood of the particle filter for tracking. Moreover, the proposed tracking approach has achieved good performance in tracking humans in TCO vision across a variety of experiments. In the future, we will extend our research to multi-target tracking in TCO vision for real TCO surveillance systems.
REFERENCES
[1] P. Viola, M. J. Jones, and D. Snow, Detecting pedestrians using patterns of motion and appearance, in Proc. ICCV, vol. 2, pp. 734-741, 2003
[2] F. Xu, X. Liu, K. Fujimura, Pedestrian detection and tracking with night vision, IEEE Trans. Intell. Transport. Syst. 6 (2005) 63-71
[3] J. Davis, M. Keck, A two-stage approach to person detection in thermal imagery, Proc. Workshop on Applications of Computer Vision, 2005, IEEE OTCBVS WS Series Bench
[4] F. Suard, A. Rakotomamonjy, A. Bensrhair, Pedestrian detection using infrared images and histograms of oriented gradients, Intelligent Vehicles Symposium 2006, June 13-15, Tokyo, Japan
[5] C. J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 2, 121-167, 1998
[6] V. Vapnik, Statistical Learning Theory, Wiley, 1998
[7] N. Dalal, B. Triggs, Histogram of Oriented Gradients for Human Detection, CVPR, 2005, vol. 2, pp. 886-893
[8] C. X. Dai, Y. F. Zheng, X. Li, Layered representation for pedestrian detection and tracking in infrared imagery, Computer Vision and Image Understanding, 106 (2007) 288-299
[9] S. Y. Yang, G. W. Min and C. Zhang, Tracking unknown moving targets on omnidirectional vision, Vision Research, 49 (2009), pp. 362-367
[10] T. E. Boult, X. Gao, R. Micheals, M. Eckmann, Omni-directional visual surveillance, Image Vis. Comput. 22 (2004), pp. 515-534
[11] J. C. Bazin, K. J. Yoon, I. Kweon, C. Demonceaux, P. Vasseur, Particle filter approach adapted to catadioptric images for target tracking application, IEEE, BMVC, 2009
[12] O. A. Jaime and B. C. Eduardo, Omnidirectional vision tracking with particle filter, ICPR 2006
[13] W. Schulz, M. Enzweiler and T. Ehlgen, Pedestrian recognition from a moving catadioptric camera, Proceedings of the 29th DAGM conference on Pattern recognition, 2007, pp. 456-465
[14] A. L. C. Barczak, J. O. Jr and V. G. Jr, Face tracking using a hyperbolic catadioptric omnidirectional system, Res. Lett. Inf. Math. Sci., 2009 (13), pp. 55-67
[15] W. L. Ye, H. P. Liu, F. C. Sun and M. Gao, Vehicle tracking based on co-learning particle filter, IROS, 2009, pp. 2979-2984
[16] M. S. Arulampalam, S. Maskell, N. Gordon and T. Clapp, A Tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking, IEEE Transactions on Signal Processing, (50), 2002, pp. 174-188
[17] A. Doucet, J. F. G. de Freitas and N. J. Gordon, "An introduction to sequential Monte Carlo methods," in Sequential Monte Carlo Methods in Practice, A. Doucet, J. F. G. de Freitas and N. J. Gordon, Eds. New York: Springer-Verlag, 2001
[18] M. Isard, A. Blake, Condensation-conditional density propagation for visual tracking, IJCV, vol. 29, no. 1, 1998, pp. 5-28
[19] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, 1995
[20] C. J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 2, 121-167, 1998
[21] J. C. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," in Advances in Large Margin Classifiers, Alexander J. Smola, Peter Bartlett, Bernhard Scholkopf, Dale Schuurmans, Eds., MIT Press, 1999