Human Tracking and Following Using Sensor Fusion Approach for Mobile Assistive Companion Robot

Ren C. Luo, Nai-Wen Chang, Shih-Chi Lin and Shih-Chiang Wu
Intelligent Robotics and Automation Laboratory, Department of Electrical Engineering, National Taiwan University
No. 1, Sec. 4, Roosevelt Road, Taipei, Taiwan
[email protected], [email protected], [email protected]

Abstract – The ability to track and follow a target person is indispensable for an intelligent service mobile robot. We propose a robust method for tracking and following a target person with a small mobile robot by integrating a single vision sensor and a laser range finder. Instead of stereo vision, we acquire the distance between the mobile robot and the target person with a single camera. The laser range finder and the vision sensor each have their own drawbacks; to compensate for them, we present a complementary data fusion approach, Covariance Intersection, which reduces the uncertainty of each sensor's measurements and enhances the reliability of the human position information. A virtual spring model serves as the control rule that lets the mobile robot track the target person smoothly. Experimental results validate the robust performance of the method.

Index Terms – Sensor Fusion, Robot Tracking, Intelligent Robot, Virtual Spring Model

I. INTRODUCTION

Mobile robots have in recent years been gradually integrated into our daily lives, and Human Robot Interaction (HRI) has become an important topic. Traditional HRI systems, such as voice commands and other human-computer interfaces, interact with the user passively; here we aim to make the mobile robot interact with users actively. For active interaction, automatically detecting and following users are essential, fundamental functions, and a robot with these functions can offer better service. The challenges include detecting the target person, calculating that person's position and orientation, and doing so with real-time, robust methods. For target person detection, several methods have been developed, such as skin-color distributions [1], motion-based detection [2], and machine learning [3]. These methods use a single camera and therefore lack distance information. To acquire distance information, several approaches detect people with a laser range finder (LRF), which offers high precision, a wide detection range and high reliability; many of them detect the human's legs or torso [4]. Although the laser range finder has good accuracy in distance measurement, its ability to identify humans by itself is limited.


Some researchers therefore combine sensors with complementary characteristics, such as a vision sensor with a laser range finder or ultrasonic sensors, to enhance the accuracy of human detection [5][6][7]. Some human detection methods use the particle filter: Isard and Blake [8] introduced the particle filter for visual tracking. Although the particle filter can handle nonlinear and non-Gaussian systems, it is highly dependent on sampling and requires a good proposal distribution, which is usually difficult to design. To cope with this problem, van der Merwe, Doucet, de Freitas et al. [9] developed the unscented particle filter, which uses the unscented Kalman filter to produce the proposal distribution. In the current literature, mobile robots with a human tracking function are almost as tall as a human being; how to track a human with a small mobile robot has not been discussed in detail. Single-camera methods detect only the human's orientation, not the distance, unless a stereo vision sensor is used. In this paper, we detect and estimate the human's distance and orientation with a single camera. The laser range finder also detects the distance and orientation of the human's legs. Finally, the information from the vision sensor and the laser range finder is fused with Covariance Intersection, which enhances the reliability of the human position estimate.

The paper is organized as follows. Section II describes the system structure of the mobile robot. Section III presents the detection methods of the vision sensor and the laser range finder. Section IV presents the sensor fusion method that integrates the human position information from each sensor. Section V presents the robot following method. Finally, the experimental results and conclusion are given in Section VI.

II. SYSTEM ARCHITECTURE

A. Robot Hardware Architecture

The robot we present is a friendly robot assistant, LuoGuide. The robot was designed by our laboratory and is used to assist and serve in the laboratory. It includes an embedded PC (1.5 GHz, 2 GB) and is equipped with a laser range finder (URG), a color camera, a microphone and a wireless LAN, and it has a storage box that can be used to deliver books or notebooks. The hardware structure is shown in Fig. 2.


Fig. 1 The height relations of the robot and a human. From right to left: a person of average height, a normal-size service robot and our experimental platform.

Fig. 2 Hardware structure of the friendly robot assistant LuoGuide.

B. The Human Tracking and Following Framework

In this paper, we propose a method that uses two kinds of sensor to accomplish human tracking and following. The human detection system runs as a parallel process, handling image data and laser data at the same time; the flow chart is shown in Fig. 3. First, the vision and laser sensors detect the human independently and estimate the distance and orientation. After the human's position is estimated, sensor fusion is performed to reduce the data uncertainty and enhance the reliability of the position estimate. Finally, the fused position is used for motion control to implement real-time human tracking.

III. THE HUMAN TRACKING ALGORITHM

A. Image Processing

To accurately detect the user's position, a method for tracking and following people is proposed. The AdaBoost algorithm is adopted for fast object detection [3]; it is a machine learning approach for visual object detection that processes images extremely rapidly while achieving high detection rates. We use it to implement real-time target person detection. Fast object detection rests on three main contributions. The first is a new image representation called the "integral image", which lets the detector extract features very quickly. The second is the AdaBoost learning algorithm, which selects a small number of important features from a larger set and generates extremely efficient classifiers. The third is a cascade structure that combines increasingly complex classifiers, so that background regions of the image are quickly discarded and more computation is spent on likely object regions. In other words, AdaBoost constructs a "strong" classifier as a linear combination of simple "weak" classifiers. These properties give AdaBoost excellent performance for fast object detection. To detect the target person from any side, we train our classifier with positive samples of the upper half-body taken from the front, the back and the profile, as shown in Fig. 4, so that the person is detected as an object no matter which side the mobile robot sees. AdaBoost produces many weak classifiers during training, which are then cascaded into a strong classifier, as shown in Fig. 5. Our training data contain 1,000 positive samples (humans) and 3,000 negative samples (non-humans), and the strong classifier is split into 12 stages.
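The detection step can be sketched with OpenCV's cascade classifier. The cascade file name below is hypothetical and stands for a 12-stage cascade trained offline (for example with the opencv_traincascade tool) on the upper half-body samples described above; the detection parameters are illustrative, not values reported in the paper.

```python
import cv2

# Hypothetical cascade file standing in for the 12-stage upper-body classifier
# trained offline on 1,000 positive and 3,000 negative samples.
detector = cv2.CascadeClassifier("upper_body_cascade.xml")

def detect_upper_body(frame_bgr):
    """Return bounding boxes (x, y, w, h) of upper-body detections in a BGR frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)  # reduce illumination variation before detection
    # The cascade evaluates Haar-like features on the integral image and discards
    # background regions in its early stages, spending computation on likely objects.
    return detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3,
                                     minSize=(40, 40))
```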

Fig. 4 The training samples include front, back and profile images of the human upper half-body.

Fig. 5 The structure of the human detector.
Fig. 3 The human detection flow chart.



When the human upper half-body is detected by the AdaBoost algorithm, we deduce the geometric relationship between the mobile robot and the human. Suppose the person stands in front of the mobile robot in the world coordinate frame; the relative position is modeled with a pinhole camera. Assume first that there is no offset angle, as shown in Fig. 6. $H_m$ and $H_c$ are the known heights of the human and the camera respectively, $D_m$ is the distance between the human and the robot, $\theta_c$ is the camera's elevation angle, $f$ is the camera focal length, and $(u', v')$ is the human's position in the image plane. From the pinhole model, the projection of the person onto the image plane forms an angle $\phi$ with the center line in the $v$ direction. Once $\phi$ is obtained, the distance $D_m$ between the mobile robot and the human can be calculated by (1):

$D_m = \dfrac{H_m - H_c}{\tan(\theta_c - \phi)}, \quad \phi = \tan^{-1}\!\left(\dfrac{v' - v_0}{f}\right)$    (1)

If the person stands in front of the mobile robot but not on the center line, as shown in Fig. 7, the projection onto the image plane forms an angle $\varphi$ with the center line in the $u$ direction; this is the offset angle between the robot and the human. The offset angle $\varphi$ is obtained from (2), and the length $D_h$ follows from the similar-triangle relation (3):

$\varphi = \tan^{-1}\!\left(\dfrac{u' - u_0}{f}\right)$    (2)

$\dfrac{f}{D_m + f} = \dfrac{h}{D_h + h}, \quad h = \sqrt{f^2 + \Delta u^2}$    (3)
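A minimal sketch of Eqs. (1)-(3) follows. The camera intrinsics, the heights H_m and H_c and the elevation angle theta_c are placeholder values, and all quantities are assumed to be expressed in consistent units, as the equations are written.

```python
import math

def estimate_range_and_bearing(u, v, u0, v0, f,
                               H_m=1.70, H_c=0.30, theta_c=math.radians(20)):
    """Monocular position estimate following Eqs. (1)-(3).

    (u, v)   : detected upper-body point in the image plane
    (u0, v0) : image center (principal point)
    f        : focal length
    H_m, H_c : human and camera heights (placeholder values)
    theta_c  : camera elevation angle (placeholder value)
    """
    # Eq. (1): the vertical offset gives phi, and phi gives the range D_m
    phi = math.atan((v - v0) / f)
    D_m = (H_m - H_c) / math.tan(theta_c - phi)

    # Eq. (2): the horizontal offset gives the offset (bearing) angle
    du = u - u0
    offset_angle = math.atan(du / f)

    # Eq. (3): similar triangles f/(D_m + f) = h/(D_h + h), with h = sqrt(f^2 + du^2)
    h = math.sqrt(f * f + du * du)
    D_h = h * (D_m + f) / f - h

    return D_m, offset_angle, D_h
```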

Fig. 7 Schematic of position estimation when the human stands at an offset angle to the robot.

B. Laser Processing

The laser scanner on the robot platform is a Hokuyo URG-04LX, mounted approximately 30 cm above the ground, so it can only perceive people's legs. This sensor, which determines distance by measuring the time of flight (TOF) of the emitted pulse, is commonly applied in the automation field [10]. The URG-04LX scans at 10 Hz; because the sensor can rotate by a significant amount between the start and end of a given region, it is necessary to interpolate the servo position between readings using this velocity [11], [12], [13]. Generally, a person appears as two closely positioned segments (as shown in Fig. 8). Each leg is perceived as an arc-like shape whose diameter is close to 70 mm to 80 mm. Human detection is based on a pair of such arc-like segments appearing together in a laser scan, with a center error of δ = 40 mm: if the intersection points of the perpendicular bisectors of a segment lie inside a circle of radius δ, the segment is a candidate leg. When there are many leg-like distractors (such as table or chair legs), the leg pair is difficult to distinguish. For our experiments, the URG-04LX is configured to work in 4-meter range mode with an angular resolution of 0.352° (360°/1024 steps). To allow proper error propagation and treatment of the spatial uncertainty in the feature extraction, we experimentally determine the main uncertainty components in the range measurements of the URG-04LX; a more detailed characterization of the sensor is provided in [14].
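The leg detection can be sketched as follows. The segmentation rule and the thresholds (range jump, leg width, leg separation) are illustrative simplifications of the arc and perpendicular-bisector test described above, not the paper's exact implementation.

```python
import math

def scan_to_points(ranges, angle_min, angle_inc):
    """Convert URG-04LX range readings (m) to Cartesian points in the sensor frame."""
    return [(r * math.cos(angle_min + i * angle_inc),
             r * math.sin(angle_min + i * angle_inc))
            for i, r in enumerate(ranges) if 0.02 < r < 4.0]   # 4 m range mode

def split_segments(points, jump=0.08):
    """Split the scan into segments at range discontinuities larger than `jump` (m)."""
    segments, current = [], []
    for p in points:
        if current and math.dist(current[-1], p) > jump:
            segments.append(current)
            current = []
        current.append(p)
    if current:
        segments.append(current)
    return segments

def detect_leg_pair(points, leg_width=(0.05, 0.10), max_separation=0.45):
    """Return the midpoint of a candidate leg pair, or None if no pair is found."""
    legs = []
    for seg in split_segments(points):
        if len(seg) < 3:
            continue
        width = math.dist(seg[0], seg[-1])         # chord of the arc-like segment
        if leg_width[0] <= width <= leg_width[1]:  # roughly the 70-80 mm leg diameter
            legs.append((sum(x for x, _ in seg) / len(seg),
                         sum(y for _, y in seg) / len(seg)))
    for i in range(len(legs)):
        for j in range(i + 1, len(legs)):
            if math.dist(legs[i], legs[j]) < max_separation:
                return ((legs[i][0] + legs[j][0]) / 2,
                        (legs[i][1] + legs[j][1]) / 2)
    return None
```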

Fig. 6 Schematic of the human standing in front of the robot with no offset angle.

$\hat{r} = r + \delta r, \quad \delta r \sim N(0, \sigma_r)$    (4)

where $r$ is the actual range and $\sigma_r$ is its standard deviation.

The uncertainty in the angle of the measurements is caused by the finite resolution of the scanner mirror encoder and the finite measuring width of the laser beam; in most laser scanners this uncertainty is very small and is often neglected [15]. To characterize the measuring inaccuracy of the Hokuyo URG-04LX, we aimed the sensor at targets at 50 cm, 100 cm, 150 cm, 200 cm, 250 cm and 300 cm and recorded 1000 measurements at each distance. The mean bias, variance and distribution at each sample distance are computed and used to reduce the spatial uncertainty and inaccuracy. Detailed results are shown in Fig. 9.
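The bias calibration can be sketched as below. The readings here are placeholder random data; the actual bias and variance values are those plotted in Fig. 9.

```python
import numpy as np

# Placeholder data: 1000 raw readings (mm) per nominal target distance (mm);
# in the experiment these come from the URG-04LX aimed at fixed targets.
rng = np.random.default_rng(0)
readings = {d: rng.normal(d + 0.01 * d, 5.0, size=1000)
            for d in (500, 1000, 1500, 2000, 2500, 3000)}

distances = np.array(sorted(readings))
bias = np.array([readings[d].mean() - d for d in distances])   # mean bias per distance
variance = np.array([readings[d].var() for d in distances])    # spread per distance

# Linear regression of bias against distance (the regression curve of Fig. 9)
slope, intercept = np.polyfit(distances, bias, 1)

def corrected_range(raw_mm):
    """Remove the distance-dependent bias from a raw range reading."""
    return raw_mm - (slope * raw_mm + intercept)
```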

Fig. 8. Laser processing perceives the legs of a person; the center of the leg pair is marked in red.

Fig. 9. (Left) The measured distributions for targets at 50 cm, 100 cm, 150 cm, 200 cm, 250 cm and 300 cm. (Right) The corresponding bias (red), its regression curve (blue) and the variance data (green).

With the methods above, the distance and angle between the robot platform and the human can be determined from each sensor. Section IV therefore introduces the fusion of the CCD camera and laser range finder (LRF) measurements to estimate the human's position.

IV. COMPLEMENTARY FUSION ALGORITHM

In general, when extracting human features with a single camera we obtain reliable orientation information from the vision system but only rough depth information. The LRF is the opposite: it easily measures the accurate distance of an extracted object, but it is hard to detect and recognize features in the environment from range data alone. In the proposed approach, we combine the vision feature with the range feature to reduce the uncertainty in detecting and recognizing the position of the human. Sensor fusion improves the measurement performance when different sensor modalities measure the same target. Fig. 10 shows the complementary correlation; the ellipses represent the measuring uncertainties.

Fig. 10. (a) The complementary correlation of the vision and LRF processes, and (b) the fused correlation.

Two pieces of information, $\mu_v$ and $\mu_l$, are fused to yield an output $\mu_o$, where $\mu_v$ and $\mu_l$ are the estimates from the vision and laser range finder models and $P_{vv}$, $P_{ll}$ and $P_o$ are their covariances. The Covariance Intersection (CI) algorithm [16] is used to fuse them. The intersection is characterized by a convex combination of the covariances:

$P_o^{-1} = \omega P_{vv}^{-1} + (1-\omega) P_{ll}^{-1}$    (5)

$P_o^{-1}\mu_o = \omega P_{vv}^{-1}\mu_v + (1-\omega) P_{ll}^{-1}\mu_l$    (6)

where $\omega \in [0, 1]$ modifies the relative weights assigned to $P_{vv}^{-1}$ and $P_{ll}^{-1}$. Different choices of $\omega$ can be used to optimize the covariance estimate with respect to different performance criteria, such as minimizing the determinant of $P_o$. The update is conservative for every $\omega$ and every cross covariance $P_{vl}$; this can be shown by demonstrating that the matrix

$P_o - E[\tilde{\mu}_o \tilde{\mu}_o^T]$    (7)

is positive semidefinite for any cross covariance $P_{vl}$ between the two prior estimates, where the error is $\tilde{\mu}_o = P_o(\omega P_{vv}^{-1}\tilde{\mu}_v + (1-\omega) P_{ll}^{-1}\tilde{\mu}_l)$ and $\tilde{\mu}_v$, $\tilde{\mu}_l$ are the errors in $\mu_v$ and $\mu_l$.

In other words, the Covariance Intersection method provides an estimate and a covariance matrix whose covariance ellipsoid encloses the intersection region, and the estimate is consistent irrespective of the value of $P_{vl}$. When leg features are observed in a laser scan (LRF features), they have to be associated with the features already known in the state vector. A common way to do this is to compute the Mahalanobis distance, after which the CI algorithm fuses the associated features into an accurate estimate of the human position. The detailed processing is as follows:

Complementary Fusion Algorithm
Data: all LRF features $(\mu_l^i, P_{ll}^i)$ in a local scan, where $\mu_l^i = [x_l^i \; y_l^i]^T$ with corresponding covariance matrix $P_{ll}^i$
Result: an estimate of the tracked human position, updated when the features are identified as the same object
Initialize: object detection = False
For all LRF features in each scan do
    • Estimate the vision observation $\mu_v = [x_v \; y_v]^T$ and its corresponding covariance matrix $P_{vv}$
    • Compute the Mahalanobis distance between the vision observation and each LRF feature in the scan:
      $D_M(\mu_l^i, \mu_v) = (\mu_l^i - \mu_v)^T S^{-1} (\mu_l^i - \mu_v)$
      (the Mahalanobis distance is a dissimilarity measure between $\mu_l^i$ and $\mu_v$ of the same distribution with covariance matrix $S$)
    If the Mahalanobis distance is within a threshold T, assume it is the same tracked object, then
        • Fuse the LRF feature and the vision feature with CI:
          $P_o = [\omega P_{vv}^{-1} + (1-\omega) P_{ll}^{-1}]^{-1}$
          $\mu_o = \omega P_o P_{vv}^{-1}\mu_v + (1-\omega) P_o P_{ll}^{-1}\mu_l$
        • object detection = True
    End
End
If object detection = True then
    • Trigger tracking motion control
End
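A numerical sketch of the association and fusion steps follows. The gating threshold and the grid search over omega (here minimizing det P_o, one of the criteria mentioned above) are illustrative choices.

```python
import numpy as np

def mahalanobis_ok(mu_l, mu_v, S, threshold):
    """Gating step: associate an LRF leg feature with the vision observation."""
    d = mu_l - mu_v
    return float(d @ np.linalg.inv(S) @ d) <= threshold

def covariance_intersection(mu_v, P_vv, mu_l, P_ll, n_grid=101):
    """Fuse the vision and LRF estimates with Covariance Intersection (Eqs. 5-6)."""
    Pv_inv, Pl_inv = np.linalg.inv(P_vv), np.linalg.inv(P_ll)
    best = None
    for w in np.linspace(0.0, 1.0, n_grid):       # choose omega minimizing det(P_o)
        P_o = np.linalg.inv(w * Pv_inv + (1.0 - w) * Pl_inv)
        det = np.linalg.det(P_o)
        if best is None or det < best[0]:
            best = (det, w, P_o)
    _, w, P_o = best
    mu_o = P_o @ (w * Pv_inv @ mu_v + (1.0 - w) * Pl_inv @ mu_l)
    return mu_o, P_o, w
```

For the 2-D position used in the algorithm above, mu_v and mu_l would be length-2 arrays [x, y] with 2x2 covariance matrices.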

Fig. 11 The virtual spring model.

V. ROBOT FOLLOWING CONTROL METHOD

Once the complementary fusion algorithm has accurately estimated the human position, we use a virtual spring model to control the robot. The method is derived from the assumption that the target person and the mobile robot are connected by a virtual spring; Fig. 11 shows the model. The state vector of the mobile robot is

$X = [x_r \; y_r \; \theta_r]^T$    (8)

The inputs $(d, \phi)$ of the virtual spring model are obtained from the complementary fusion algorithm, where $d$ is the distance between the robot and the target person and $\phi$ is the angle from the heading direction of the robot to the virtual spring. The elastic force $F_1$ is proportional to $d$ and $F_2$ is proportional to $\phi$. We assume that the free length of the virtual spring is 0 when the robot tracks successfully. $F_1$ and $F_2$ are defined as

$F_1 = k_1 d, \quad F_2 = k_2 \phi$    (9)

where $k_1$ and $k_2$ are the expansion and bending coefficients of the virtual spring, respectively. The dynamic equations are

$m\dot{v} = -F_1\cos\phi - F_2\sin\phi - k_3 v$
$i\dot{\omega} = (F_1\sin\phi + F_2\cos\phi - k_4\omega)L$    (10)

where $m$ is the mass of the robot, $i$ is its moment of inertia, and $k_3$ and $k_4$ are the viscous friction coefficients of translation and rotation, respectively. Thus we obtain

$\dot{v} = -\frac{F_1}{m}\cos\phi - \frac{F_2}{m}\sin\phi - \frac{k_3}{m}v$
$\dot{\omega} = (F_1\sin\phi + F_2\cos\phi - k_4\omega)\frac{L}{i}$    (11)

Let $v_r$ and $v_l$ denote the velocities of the right and left wheels, respectively, and let $w$ be the distance between the wheels. The robot motion control commands are then obtained as

$\begin{bmatrix} v_r \\ v_l \end{bmatrix} = \begin{bmatrix} 1 & w/2 \\ 1 & -w/2 \end{bmatrix} \begin{bmatrix} v \\ \omega \end{bmatrix}$    (12)
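A sketch of the controller follows. The gains, mass, inertia, L and wheel spacing are illustrative values, and the signs reproduce Eqs. (10)-(12) exactly as printed above.

```python
import math

class VirtualSpringFollower:
    """Virtual spring following controller, Eqs. (9)-(12); parameters are illustrative."""

    def __init__(self, k1=1.0, k2=0.8, k3=0.5, k4=0.5,
                 mass=10.0, inertia=0.5, L=0.2, wheel_base=0.3):
        self.k1, self.k2, self.k3, self.k4 = k1, k2, k3, k4
        self.m, self.i, self.L, self.w = mass, inertia, L, wheel_base
        self.v = 0.0          # translational velocity of the robot
        self.omega = 0.0      # rotational velocity of the robot

    def update(self, d, phi, dt):
        """d, phi: fused distance and angle to the person; dt: control period (s)."""
        F1 = self.k1 * d      # spring expansion force, Eq. (9)
        F2 = self.k2 * phi    # spring bending force, Eq. (9)

        # Dynamic equations (10)-(11), including the viscous friction terms k3, k4
        v_dot = (-F1 * math.cos(phi) - F2 * math.sin(phi) - self.k3 * self.v) / self.m
        omega_dot = (F1 * math.sin(phi) + F2 * math.cos(phi)
                     - self.k4 * self.omega) * self.L / self.i

        self.v += v_dot * dt
        self.omega += omega_dot * dt

        # Wheel speed commands from Eq. (12)
        v_r = self.v + 0.5 * self.w * self.omega
        v_l = self.v - 0.5 * self.w * self.omega
        return v_r, v_l
```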

VI. EXPERIMENTAL RESULTS AND CONCLUSION

The proposed data fusion using the covariance intersection approach is implemented on our robot. Fig. 12 shows the single camera detecting the human and estimating the human's orientation and distance. The laser range finder also detects the orientation and distance of the human's legs, as shown in Fig. 13. When there are many leg-like distractors (such as table or chair legs), the leg pair alone is difficult to distinguish; the human information from the vision sensor and the laser range finder is therefore fused with covariance intersection, which enhances the reliability of the human position estimate.

Fig. 12. The vision process detects the human body: (a) front, (b) flank and (c) back.

Fig. 13. The LRF leg-feature process detects the human: (a) standing, (b) walking, (c) sideways and (d) two people.

Fig. 14 (right) shows the tracking scenario in which our robot combines intensity and range data to locate the human by covariance intersection; the fused result is shown in Fig. 14 (left). Fig. 15 shows the variation of the distance and orientation between the robot and the target person during tracking and following.


Fig. 14. The robot tracking the human by fusing the vision feature (blue) and the LRF feature (red) into the fused estimate (black); the panels correspond to frames 10, 49 and 90.

Fig. 15. The distance and orientation variation over 100 frames. The target starts moving at frames 10 and 53 and stops at frames 49 and 90.

We have proposed a method for real-time tracking and following of a human with a small mobile robot. Experiments show that the distance to the target person can be accurately calculated with a single camera. The complementary sensor fusion compensates for the weaknesses of each sensor's measurements and reduces the uncertainty in the calculation of the target person's position, yielding better reliability. Together with the virtual spring model, we successfully implement real-time tracking and following of the target person on the mobile robot.

REFERENCES

[1] P. Vadakkepat, P. Lim, L. C. De Silva, L. Jing, and L. L. Ling, "Multimodal approach to human-face detection and tracking," IEEE Transactions on Industrial Electronics, vol. 55, no. 3, March 2008.
[2] A. Utsumi and N. Tetsutani, "Human tracking using multiple-camera-based head appearance modeling," in Proc. 6th IEEE Int. Conf. on Automatic Face and Gesture Recognition, May 2004, pp. 657-662.
[3] P. Viola and M. J. Jones, "Robust real-time object detection," in Proc. IEEE Workshop on Statistical and Computational Theories of Vision, 2001.
[4] A. Fod, A. Howard, and M. J. Mataric, "Laser based people tracking," in Proc. IEEE International Conference on Robotics and Automation (ICRA), Washington DC, USA, pp. 3024-3029, 2002.
[5] W. Dai, A. Cuhadar, and P. X. Liu, "Robot tracking using vision and laser sensors," in Proc. IEEE Conference on Automation Science and Engineering, Washington DC, USA, August 23-26, 2008.
[6] A. Scheidig, S. Mueller, C. Martin, and H.-M. Gross, "Generating persons movement trajectories on a mobile robot," in Proc. IEEE International Symposium on Robot and Human Interactive Communication, Hatfield, UK, September 6-8, 2006.
[7] X. Ma, C. Hu, X. Dai, and K. Qian, "Sensor integration for person tracking and following with mobile robot," in Proc. International Conference on Intelligent Robots and Systems, France, August 23-26, 2008.
[8] M. Isard and A. Blake, "Visual tracking by stochastic propagation of conditional density," in Proc. 4th European Conference on Computer Vision, Cambridge, UK, April 1996.
[9] R. van der Merwe, A. Doucet, N. de Freitas, et al., "The unscented particle filter," Technical Report CUED/F-INFENG/TR 380, Cambridge University Engineering Department, August 2000.
[10] Terawaki, "Measuring Distance Type Obstacle Detection Sensor PBS03JN Instruction Manual," Hokuyo Automatic Co., Ltd., April 2004, pp. 1-19.
[11] R. O. Duda and P. E. Hart, "Use of the Hough transform to detect lines and curves in pictures," Communications of the ACM, vol. 15, no. 1, pp. 11-15, January 1972.
[12] P. V. C. Hough, "Method and Means for Recognizing Complex Patterns," US Patent 3,069,654, December 1962.
[13] T. S. Michael and T. Quint, "Sphere of influence graphs in general metric spaces," Mathematical and Computer Modelling, vol. 29, pp. 45-53, 1994.
[14] Hokuyo Automatic, "Scanning laser range finder for robotics," http://www.hokuyo-aut.jp/products/urg/urg.htm, 2005.
[15] M. D. Adams, Sensor Modelling, Design and Data Processing for Autonomous Navigation, Singapore: World Scientific, 1999.
[16] S. J. Julier and J. K. Uhlmann, "A non-divergent estimation algorithm in the presence of unknown correlations," in Proc. American Control Conference, vol. 4, pp. 2369-2373, June 1997.
