2010 International Conference on Pattern Recognition
A Robust Approach for Person Localization in Multi-camera Environment
Luo Sun, Huijun Di, Linmi Tao and Guangyou Xu
Tsinghua National Laboratory for Information Science and Technology
Tsinghua University, 100084 Beijing, P.R. China
{sunluo00, dhj98}@mails.tsinghua.edu.cn, {linmi, xgy-dcs}@tsinghua.edu.cn
Abstract—Person localization is fundamental in human-centered computing, since a person must be localized before being actively served. This paper proposes a robust approach to localizing a person based on geometric constraints in a multi-camera environment. The proposed algorithm has several advantages: 1) no assumption on the positions and orientations of the cameras, except that the cameras should share a certain common field of view; 2) no assumption on the visibility of a particular body part (e.g., feet), except that a portion of the person should be observed in at least two views; 3) reliability in terms of tolerating occlusion, body posture change and inaccurate motion detection. It can also provide error control and be further extended to measure person height. The efficacy of the approach is demonstrated on challenging real-world scenarios.

Keywords—person localization; multi-camera
I. INTRODUCTION
Person localization in indoor environments is important in many applications, such as visual surveillance and ambient assisted living. Recently there has been increasing research interest in human-centered computing [1]. The main motivation is that computers should be able to adapt to people rather than vice versa. For instance, in a smart home for the health care of elderly people, users move freely in the scene, and HCI systems should be able to seamlessly interact with them at any site within the scene. In this regard, knowing the users' location (i.e., person localization) is a precondition for human-centered computing.

There are several ways of localization, including vision-based methods, RFID-based methods, and so on. Vision-based methods are more attractive due to advantages such as no need to wear a sensor. This paper focuses on vision-based methods of person localization which place no restrictions on a person's movement or posture, and which are robust in real situations. However, enabling free movement and posture variation gives rise to difficulties caused by occlusion and the complicated articulated motion of the human body. In this situation, a logical step is to make use of multiple cameras so as to recover information that might be missed in a particular view.

Under a multi-camera configuration, point-wise matching based stereo is one possible option for localization; e.g., the 3D location of a person can be obtained from estimated dense disparities [2], or by direct 3D feature tracking [3]. However, reliable estimation of dense disparities or 3D feature tracking under articulated motion can hardly be guaranteed. Consequently, it is attractive to accomplish person localization without dealing with the point-wise correspondence problem, as in [4]-[7]. In [4], the ground location of a person is obtained by estimating the cluster centroid of a likelihood map on the ground plane that is determined according to color information. Under a homographic constraint, the feet region of a person can be obtained by multiplying the warped multi-view foreground likelihood maps [5], or by computing the intersection of the projected multi-view foreground images [6]. In these methods, either the likelihood calculation based on color information is prone to being unreliable and relies heavily on assumptions about the colors of people's clothes, or the feet must be visible in at least two views, which is not always guaranteed, especially in indoor environments. Only the method in [7] neither relies on color information nor assumes the visibility of the feet; there, the ground location of a person is obtained by projecting the principal axes detected in each view onto the ground and calculating their intersection point. However, it assumes that every camera is oriented such that a perpendicular line in the scene projects to an approximately vertical line in the image, which requires that the camera's image plane be perpendicular to the ground plane, or that the target person be sufficiently far away from the camera. This assumption is usually reasonable for outdoor environments, but it introduces a systematic error in indoor environments where the assumption is not well satisfied.

Inspired by [7], this paper proposes a robust vision-based approach for person localization which eliminates this systematic error without the need for full calibration. We introduce the perpendicular projection of a camera's optical center and explore the multi-view geometric constraints regarding it. By employing these constraints, the localization problem is finally posed as a linear optimization problem. The proposed approach has several advantages: 1) no assumption on the positions and orientations of the cameras, except that the cameras should share a certain common field of view; 2) no assumption on the visibility of a particular body part (e.g., feet), except that a portion of the person should be observed in at least two views; 3) reliability in terms of tolerating occlusion, body posture change and inaccurate motion detection. Our approach can also provide error control regarding the localization result and can be further extended to measure person height via a simple geometric relationship.
II. MULTI-VIEW GEOMETRIC CONSTRAINTS
Suppose a planar surface $\pi$ in space. In the context of localization, $\pi$ is usually the ground plane on which the person moves. In this section, when we say "perpendicular", we mean "perpendicular to $\pi$".

Let us first define the homography mapping of an arbitrary line in space. Suppose the image plane $I$ of a camera facing $\pi$, and assume the homography $H$ from $I$ to $\pi$ is known. Let $UV$ denote an arbitrary line in space and $uv$ the corresponding line on $I$, as shown in Fig. 1. $U_gV_g$ is the homography mapping of $UV$ on $\pi$ under the camera, where $U_g$ (or $V_g$) is the intersection point of $\pi$ and the line going through the camera's optical center $C$ and $U$ (or $V$). The plane coordinates of $U_g$ (or $V_g$) on $\pi$ can be calculated by directly applying the homography $H$ to $u$ (or $v$).

Figure 1. Homography mapping of an arbitrary line.

Figure 2. Three lines perpendicular to plane $\pi$, namely $P_1Q_1$, $P_2Q_2$ and $P_3Q_3$ ($P_1$, $P_2$ and $P_3$ are on $\pi$), watched by two cameras from distinct viewpoints.
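To make this concrete, here is a minimal numpy sketch of the mapping just defined: the plane coordinates of $U_g$ and $V_g$ follow from applying $H$ directly to the image endpoints $u$ and $v$. The function names and the example homography are our own illustration, not from the paper.

```python
import numpy as np

def map_point(H, p):
    """Apply the image-to-ground homography H (3x3) to image point p."""
    q = H @ np.array([p[0], p[1], 1.0])  # lift to homogeneous coordinates
    return q[:2] / q[2]                  # back to plane coordinates

def map_line(H, u, v):
    """Homography mapping U_g V_g of the image line uv onto the ground:
    per Sec. II, H is applied directly to the endpoints u and v."""
    return map_point(H, u), map_point(H, v)

# Hypothetical homography and image line endpoints:
H = np.array([[ 0.9,  0.1, 5.0],
              [-0.2,  1.1, 3.0],
              [1e-4, 2e-4, 1.0]])
U_g, V_g = map_line(H, u=(120.0, 40.0), v=(125.0, 230.0))
```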
In a multi-camera environment, it can be proved that the following two constraints regarding perpendicular lines hold.

Constraint 1: The homography mappings of the same perpendicular line on $\pi$ under different cameras are concurrent, and the intersection point is the line's perpendicular projection. For instance, in Fig. 2 the homography mappings of $P_1Q_1$ on $\pi$ under the two cameras intersect at $P_1$, which is the perpendicular projection of $P_1Q_1$.

Constraint 2: The homography mappings of perpendicular lines at different locations on $\pi$ under the same camera are concurrent, and the intersection point is the perpendicular projection of the camera's optical center. E.g., in Fig. 2 the homography mappings of $P_1Q_1$, $P_2Q_2$ and $P_3Q_3$ on $\pi$ under the first camera intersect at $C_g^1$, which is the perpendicular projection of the optical center $C^1$.

In [7], Constraint 1 is used to determine the ground location of a person, by projecting the principal axes of the person in each view onto the ground and calculating their intersection point. By assuming that a perpendicular line in the scene projects to an approximately vertical line in the image plane, the detection of the principal axis is reduced to estimating the vertical axis's horizontal position from the extracted foreground pixels.
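For illustration, Constraint 1 can be exercised in homogeneous coordinates, where the line through two points and the intersection of two lines are both cross products; this is essentially the intersection step of [7]. The coordinates below are invented so that both mapped lines pass through the foot point $(2, 1)$.

```python
import numpy as np

def line_through(a, b):
    """Homogeneous line through 2D points a and b (cross product)."""
    return np.cross([a[0], a[1], 1.0], [b[0], b[1], 1.0])

def intersect(l1, l2):
    """Intersection of two homogeneous lines, as a 2D point."""
    p = np.cross(l1, l2)
    return p[:2] / p[2]

# Homography mappings of the same perpendicular line under two cameras;
# by Constraint 1 they meet at its perpendicular projection P_1 = (2, 1).
l1 = line_through((2.0, 1.0), (6.0, 5.0))  # mapping under camera 1
l2 = line_through((2.0, 1.0), (0.0, 4.0))  # mapping under camera 2
P1 = intersect(l1, l2)                     # -> array([2., 1.])
```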
To eliminate the above assumption, which is too restrictive for indoor environments, Constraint 2 is exploited in this paper: we make use of the perpendicular projection of the camera's optical center, which coincides with the intersection of the lines fitting each projected person region on the ground. As a result, the objective of person localization can be posed as an optimization problem under the proposed geometric constraints. This is the basis of our approach and will be discussed in the following section. It should be noted that, once the homography from a camera's image plane to the ground plane (used in [7]; it can be determined by the algorithm in [8]) is known, the perpendicular projection of the camera's optical center can be obtained automatically by calculating the vanishing point of the homography mappings of a perpendicular ruler standing at different locations in the scene.
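A sketch of how this calibration step might be implemented: fit a line to the ground-plane mapping of the ruler at each placement, then recover $c_g$ as the least-squares common point of those lines (Constraint 2). All names and coordinates below are hypothetical.

```python
import numpy as np

def fit_line(points):
    """Total-least-squares line n.x = d through 2D points (unit normal n)."""
    pts = np.asarray(points, dtype=float)
    c = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - c)  # smallest right singular vector
    n = vt[-1]                         # is normal to the point spread
    return n, n @ c

def concurrent_point(lines):
    """Point minimizing sum_i (n_i . x - d_i)^2 over lines (n_i, d_i)."""
    A = np.array([n for n, _ in lines])
    b = np.array([d for _, d in lines])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Ground-plane mappings (via H) of a vertical ruler standing at three
# different spots; by Constraint 2 each mapped line, extended, passes
# through c_g = (3, 2).
ruler_mappings = [[(5.0, 2.0), (9.0, 2.0)],
                  [(3.0, 5.0), (3.0, 9.0)],
                  [(6.0, 6.0), (9.0, 10.0)]]
c_g = concurrent_point([fit_line(m) for m in ruler_mappings])  # ~ (3, 2)
```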
III. PERSON LOCALIZATION
To obtain the person region in the image of each camera, we use background modeling and foreground subtraction. The extracted foreground from each view is then projected onto the ground plane under the corresponding homography, yielding a set of projected foreground pixels. The procedure is shown in Fig. 3. Let $M$ denote the number of cameras and $N_i$ $(i = 1, 2, \ldots, M)$ the number of foreground pixels extracted from the $i$th camera. All projected foreground pixels on the ground plane are represented by the set $F = \{p_j^i\}$ $(i = 1, 2, \ldots, M;\ j = 1, 2, \ldots, N_i)$, where $p_j^i$ denotes the $j$th pixel of the projected foreground from the $i$th camera.

Figure 3. Projected foregrounds on the ground plane and the expected localization result $x_p$; $c_g^1$, $c_g^2$ and $c_g^3$ are the ground locations of the optical centers' perpendicular projections.

We define the person location as the ground location of the person's perpendicular axis, along which the volume of the person is distributed symmetrically. Let $X$ denote the random variable that represents the person location. The objective of localization is to maximize the posterior probability distribution $P(X \mid F)$ given the observation $F$, namely

$x_p = \arg\max_X P(X \mid F)$,   (1)

where $x_p$ is the expected localization result. Using the Bayesian rule, we have

$P(X \mid F) \propto P(X) P(F \mid X)$,   (2)

where $P(X)$ is essentially related to the prior knowledge about the current person location. A uniform prior is assumed in our approach, suggesting no prior information about the person location. Under this assumption, we only need to consider the likelihood term in (2). Assuming the projected foreground pixels are independent of each other, we have

$P(X \mid F) \propto P(F \mid X) = \prod_{i=1}^{M} \prod_{j=1}^{N_i} P(p_j^i \mid X)$.   (3)

As discussed in the previous section, we know that for a particular projected pixel $p_j^i$, the distance to the line $c_g^i X$ can be used to measure how well the pixel fits the hypothesized location $X$. Intuitively, the smaller the distance, the better the fit. Therefore, we define $P(p_j^i \mid X)$ in our approach as

$P(p_j^i \mid X) \propto \exp(-D(p_j^i, c_g^i X))$,   (4)

where $D(p_j^i, c_g^i X)$ is the distance from the point $p_j^i$ to the line $c_g^i X$. Substituting (4) into (3), we have

$x_p = \arg\min_X \sum_{i=1}^{M} \sum_{j=1}^{N_i} D(p_j^i, c_g^i X)$.   (5)

If we define $D$ as the square of the algebraic distance from the point $p_j^i$ to the line $c_g^i X$, equation (5) becomes a least-squares problem that can be solved by linear algebra. Meanwhile, the error ellipse of the localization result can also be determined, which would be useful for future probability-based processing.
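The following is a minimal numpy sketch of this least-squares step, under the assumption that the squared algebraic distance is the squared cross product $(p_j^i - c_g^i) \times (X - c_g^i)$, which is linear in $X$; the interface, variable names and toy data are ours.

```python
import numpy as np

def localize(foregrounds, centers):
    """Solve eq. (5) with D = squared algebraic distance: for a pixel p of
    camera i, the residual (p - c_g^i) x (X - c_g^i) is linear in X, so
    the minimizer x_p and an error-ellipse covariance follow from linear
    least squares.
    foregrounds: per-camera (N_i, 2) arrays of projected foreground pixels;
    centers: per-camera ground projections c_g^i of the optical centers."""
    rows, rhs = [], []
    for pts, c in zip(foregrounds, centers):
        c = np.asarray(c, dtype=float)
        d = np.asarray(pts, dtype=float) - c   # pixel minus c_g^i
        rows.append(np.column_stack([-d[:, 1], d[:, 0]]))
        rhs.append(d[:, 0] * c[1] - d[:, 1] * c[0])
    A, b = np.vstack(rows), np.concatenate(rhs)
    x_p, res, *_ = np.linalg.lstsq(A, b, rcond=None)
    sigma2 = res[0] / max(len(b) - 2, 1) if res.size else 0.0
    cov = sigma2 * np.linalg.inv(A.T @ A)       # error ellipse shape
    return x_p, cov

# Hypothetical toy data: two cameras, pixels on lines through X = (4, 3).
fg = [np.array([[2.0, 1.5], [6.0, 4.5]]),   # camera 1, c_g^1 = (0, 0)
      np.array([[4.0, 5.0], [4.0, 7.0]])]   # camera 2, c_g^2 = (4, 9)
x_p, cov = localize(fg, centers=[(0.0, 0.0), (4.0, 9.0)])  # x_p ~ (4, 3)
```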
Given the ground location of the person, the person's height can be measured by a simple geometric relationship when the head region is correctly extracted in any view. E.g., in Fig. 2, by similar triangles the following relationship holds:

$P_1Q_1 = (P_1Q_{1g}^1 / C_g^1Q_{1g}^1) \cdot C_g^1C^1$,   (6)

where $C_g^1C^1$ is the height of the first camera's optical center, which can be determined along with the ground location of the optical center, and $Q_{1g}^1$ is the homography mapping of the head point $Q_1$ on $\pi$ under the first camera.
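Under the same assumptions, eq. (6) is a one-liner; the argument names and sample values below are ours.

```python
import numpy as np

def person_height(P1, Q1g, cg, camera_height):
    """Eq. (6): similar triangles give P1Q1 / (C_g^1 C^1) =
    |P1 - Q1g| / |c_g - Q1g|, so the person's height is that ratio
    times the camera height C_g^1 C^1."""
    P1, Q1g, cg = (np.asarray(v, dtype=float) for v in (P1, Q1g, cg))
    return np.linalg.norm(P1 - Q1g) / np.linalg.norm(cg - Q1g) * camera_height

# Hypothetical values: camera 2.8 m high; the mapped head point Q1g lies
# 3 m beyond the foot P1 and 5 m from c_g along the same ground line.
h = person_height(P1=(4.0, 0.0), Q1g=(7.0, 0.0), cg=(2.0, 0.0),
                  camera_height=2.8)  # -> 3/5 * 2.8 = 1.68 m
```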
IV. EXPERIMENT
The proposed approach has been examined in several real-world scenarios. One is an indoor environment with a table and a chair inside (Scene1). There are four stationary cameras monitoring the environment from distinct viewpoints; the cameras are mounted near the ceiling, tilting toward the floor. Fig. 4 shows the experimental results of person localization and height measurement in this scenario. It is difficult to compare our approach directly with others, as different methods use different definitions of location as well as different geometric constraints. However, the output of most localization methods is the motion trajectories of objects. Thus, we manually label the perpendicular projection of the body centroid on the ground as the ground truth and compare the trajectories obtained by our approach and by [7]. We can see that the trajectory of our approach is much closer to the ground truth than that from [7], where a systematic error exists. We further measure accuracy quantitatively by the mean distance between the localization result and the ground truth under the indoor scenarios listed in Tab. 1. Our approach has a significant advantage over that in [7] in terms of robustness and accuracy in indoor environments.
TABLE I. MEAN DISTANCE FROM CENTROID TRAJECTORY

                      Scene1 (4 cams)   Scene2 (4 cams)   Scene3 (3 cams)   Scene4 (3 cams)
Our approach          5.32 cm           5.19 cm           4.42 cm           4.82 cm
The approach in [7]   22.81 cm          21.33 cm          18.58 cm          19.27 cm
Figure 4. Experimental results in one scenario. The person trajectories on the ground obtained by our approach and by [7], as well as the manually labeled ground truth, are shown in (b), colored white, gray and blue, respectively. (a) shows the person height coinciding with the trajectories. (c) shows the localization procedure at one time instant: the centers of the yellow ellipse (the error ellipse) and the white circle are the localization results determined by our approach and by [7], respectively, and the projected principal axes used in [7] are shown as white lines. The original images and the corresponding foregrounds are shown in (d).
We have also tested the processing capability of our approach on a normally configured PC; the results are shown in Tab. 2. We notice that the processing frame rate of the four-camera configuration is roughly 3/4 of that of the three-camera configuration, and the processing frame rate of the high-resolution configuration is roughly 1/4 of that of the low-resolution configuration. This observation indicates that most of the running time is consumed by per-camera processing, e.g., background modeling, which implies that our localization, which is essentially a linear optimization problem, is computationally efficient.

TABLE II. PROCESSING SPEED OF OUR APPROACH

                                3 cams, 320×240   4 cams, 320×240   4 cams, 640×480
Average processing frame rate   12.2 fps          8.9 fps           2.4 fps
V. CONCLUSION

In this paper, we have proposed a robust vision-based approach for person localization in a multi-camera environment. This is achieved by incorporating the proposed geometric constraints into a probabilistic formulation of the localization problem. Real-world experiments have demonstrated the robustness, accuracy and computational efficiency of the approach, which can be widely used in human-centered computing applications.

ACKNOWLEDGMENT

This research was supported in part by the National Natural Science Foundation of China under Grant Nos. 60873266 and 90820304. We are thankful to our reviewers for their thoughtful comments and suggestions.

REFERENCES

[1] A. Jaimes, N. Sebe, and D. Gatica-Perez, "Human-Centered Computing: A Multimedia Perspective," Proc. ACM International Conf. on Multimedia, ACM Press, pp. 855-864, Oct. 2006.
[2] S. Bahadori, L. Iocchi, G. R. Leone, D. Nardi, and L. Scozzafava, "Real-Time People Localization and Tracking through Fixed Stereo Vision," Applied Intelligence, vol. 26, pp. 83-97, Apr. 2007.
[3] S. L. Dockstader and A. M. Tekalp, "Multiple Camera Tracking of Interacting and Occluded Human Motion," Proceedings of the IEEE, vol. 89, pp. 1441-1455, Oct. 2001.
[4] A. Mittal and L. S. Davis, "M2Tracker: A Multi-View Approach to Segmenting and Tracking People in a Cluttered Scene Using Region-Based Stereo," International Journal of Computer Vision, vol. 51, pp. 189-203, Feb. 2002.
[5] S. Khan and M. Shah, "A Multi-view Approach to Tracking People in Crowded Scenes Using a Planar Homography Constraint," Proc. European Conf. on Computer Vision, pp. 133-146 (IV), 2006.
[6] S. Park and M. M. Trivedi, "Multi-perspective Video Analysis of Persons and Vehicles for Enhanced Situational Awareness," Proc. IEEE International Conf. on Intelligence and Security Informatics, pp. 440-451, 2006.
[7] W. Hu, M. Hu, T. Tan, J. Lou, and S. Maybank, "Principal Axis-Based Correspondence between Multiple Cameras for People Tracking," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 28, pp. 663-671, Apr. 2006.
[8] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, Cambridge University Press, Mar. 2004.