Detection of People Carrying Objects: a Motion-based Recognition Approach

Chiraz BenAbdelkader and Larry Davis
Computer Vision Laboratory, University of Maryland, College Park, MD 20742 USA
chiraz,[email protected]

Abstract

We describe a method to detect instances of a walking person carrying an object, seen from a stationary camera. We take a correspondence-free, motion-based recognition approach that exploits known shape and periodicity cues of the human silhouette. Specifically, we subdivide the binary silhouette into four horizontal segments, and analyze the temporal behavior of the bounding box width over each segment. We posit that the periodicity and amplitudes of these time series satisfy certain criteria for a naturally walking person, and that deviations therefrom indicate that the person might be carrying an object. The method is tested on 41 360x240 color outdoor sequences of people walking and carrying objects at various poses and camera viewpoints. A correct detection rate of 85% and a false alarm rate of 12% are obtained.

1 Introduction

An important class of human activities involves interactions of people with objects in the scene, such as depositing an object, picking up an object, and the exchange of an object between two people. Given the time intervals during which objects are carried by any one person, we expect that a temporal logical reasoning system will be able to infer events of object pickup, object deposit and object exchange. In this paper, we address the visual processing task of determining these time intervals (during which an object is being carried by a person). Carried object detection is also of interest to person identification applications, since carried objects often alter the dynamics and/or appearance of a person's gait, and hence might affect the performance of a gait recognition method.

The clinical gait analysis and ergonomics research communities (among others) have studied the effect of load-carrying on human gait as a function of the load size and the way it is carried [10, 13]. According to these studies, people carrying a (heavy) object adjust the way they walk in order to minimize their energy expenditure (in fact, this is a general concept in gait dynamics that applies to any walking conditions) [9, 12]. Consequently, their cadence tends to be higher and their stride length shorter. Also, the duration of the double-support phase of the gait cycle (i.e. the period of time when both feet are on the ground) tends to be larger for a person carrying an object.

Carried objects can be classified into two (non-mutually exclusive) types: (1) those that alter the way the person walks (i.e. the biomechanics of gait) due to their sheer weight and/or size, and (2) those that alter the way the person appears, because they occlude part of the body when carried. Consequently, there are (at least) two approaches to visual detection of a carried object: we can either determine whether the person's gait is within the normal range (assuming we have a model of 'normal gait'), or we can characterize the changes in appearance (in terms of shape or texture) that are indicative of the presence of a carried object. In clinical gait analysis, gait abnormalities are typically detected by measuring certain gait parameters (temporal, kinematic and kinetic) and comparing them with those of a naturally walking person [15]. It is difficult to compute kinematic parameters with current state-of-the-art computer vision, since this requires accurate tracking of body landmarks. Furthermore, although recent work has shown that it is possible to compute stride length robustly from video [19, 4, 2], the estimation error is no smaller than the difference between natural and load-carrying stride lengths (which is typically on the order of only 1-2 cm [10]). The method of this paper therefore takes the second, non-parametric, approach.

We formulate two constraints in terms of the spatio-temporal patterns of the binary silhouette that we claim are satisfied by a naturally walking person but not by a person carrying an object. This method is view-invariant; however, it can only detect a carried object that protrudes sufficiently outside the body silhouette. It is robust to segmentation and tracking errors, since it analyzes shape over many frames, unlike a static shape analysis approach, for example, that would try to detect a 'bump' in the silhouette from a single frame. We test the method on 41 outdoor sequences spontaneously recorded in the parking lot of a university building, and achieve a detection rate of 85% and a false alarm rate of 12%. To limit the scope of the problem, we make the following assumptions:

- The camera is stationary.
- The person is walking in upright pose. This is a reasonable assumption for a person carrying an object.
- The person walks with a constant velocity for a few seconds.

2 Related Work

Analysis and modeling of the human body and/or its motion are the subject of several areas of computer vision, such as action/activity/gesture recognition, pedestrian detection, and gait recognition [3, 6, 11, 4, 17, 1]. The solution approaches to these problems typically fall under one of two categories: structure-based or structure-free. The former assumes the action or gait to be a sequence of static configurations (poses), and recognizes it by mapping features extracted from each frame to a configuration model. The latter characterizes and recovers the motion generated by the action or gait, without reference to the underlying pose of the moving body.

Haritaoglu's Backpack [8] system is the only work we know of that addresses the specific problem of carried object detection for video surveillance applications. Like our method, it uses both shape and motion cues. It first locates significantly protruding regions of the silhouette via static shape symmetry analysis. Each outlier region is then classified as being part of the carried object or of the body, based on the periodicity of its vertical silhouette profile. Implicit in this method is the assumption that aperiodic outlier regions correspond to the carried object and periodic regions to the body. This can often fail for a variety of reasons. For example, the axis of symmetry (which is computed as the blob's major axis) is very sensitive to detection noise, as well as to the size and shape of the carried object itself. Also, using a heuristically-determined threshold to filter out small non-symmetric regions makes this method less robust.

Like Backpack, we use a silhouette signature shape feature to capture the periodicity of the human body. A major difference lies in that we analyze both the periodicity and amplitude of these shape features over time to detect the carried object, and only use static shape analysis in the final segmentation phase of the object. Another important difference is that we explicitly constrain the location of the object to be in the arms region and/or legs region, since, as noted above, the silhouette signature of the region above the arms is not periodic.

3 Method

A walking person is first detected and tracked over the frames of the video sequence, then classified as either naturally-walking or object-carrying, based on spatio-temporal analysis of the obtained binary silhouettes.

3.1 Foreground Detection and Tracking

Since the camera is assumed static, foreground detection is achieved via a non-parametric background modelling technique that is essentially a generalization of the mixed-Gaussian background modelling approach, and is well suited for outdoor scenes in which the background is often not perfectly static (e.g. occasional movement of tree leaves and grass) [7]. A number of standard morphological cleaning operations are applied to the detected blobs to correct for random noise. Frame-to-frame tracking of a moving object is done via simple overlap of its blob bounding boxes in the current and previous frames.
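The bounding-box-overlap tracking rule of Section 3.1 can be sketched in a few lines. This is a minimal illustration, not the authors' code; the function names (`boxes_overlap`, `update_tracks`) and the track representation are our assumptions.

```python
# Sketch of frame-to-frame tracking by bounding-box overlap: a detected blob
# is matched to the track whose most recent box it overlaps; unmatched blobs
# start new tracks. Boxes are (x_min, y_min, x_max, y_max) tuples.

def boxes_overlap(a, b):
    """Axis-aligned overlap test between two boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

def update_tracks(tracks, detections):
    """Extend each track (a list of boxes) with the first detection that
    overlaps its last box; detections matching no track start new tracks."""
    unmatched = list(detections)
    for track in tracks:
        for box in list(unmatched):
            if boxes_overlap(track[-1], box):
                track.append(box)
                unmatched.remove(box)
                break
    for box in unmatched:
        tracks.append([box])
    return tracks
```

For example, a blob at (0, 0, 10, 20) followed in the next frame by one at (2, 1, 12, 21) is linked into a single track, while a distant blob spawns a new track.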

3.2 Carried Object Detection

Human gait is highly structured both in space and time, due to the bilateral symmetry of the human body and the cyclic, coordinated movement patterns of the various body parts, which repeat at the fundamental frequency of walking. A good model for the motion of the legs is a pair of planar pendula oscillating 180 degrees out of phase [14, 16, 12]. The same can be said about the swinging of the arms [18]. We expect that the presence of a sufficiently large carried object will (at least locally) distort this spatio-temporal structure.

In order to capture the differences between natural gait and load-carrying gait, we analyze the temporal behavior of the widths of horizontal segments of the silhouette. Specifically, we formulate constraints on the periodicity and amplitude of these features, and posit that the violation of these constraints is highly indicative that the person is carrying an object.

Consider the subdivision of the silhouette into 4 segments, shown in Figure 1: three equal contiguous horizontal segments over the lower body region, denoted L1, L2, L3 (bottom segment first), and one segment for the upper body region, denoted U. We also define L = L1 ∪ L2 ∪ L3 (i.e. the lower half of the body). We compute the bounding box width over each of the defined segments of the silhouette, for each blob in the sequence.

[Figure 1 shows a silhouette of height H, with Region U over the upper body and Regions L3, L2, L1, each of height H/6, over the lower body.]

Figure 1. Subdivision of body silhouette into 5 segments for shape feature computation.

The time series thus obtained are denoted by W_L(t), W_L1(t), W_L2(t), W_L3(t) and W_U(t), corresponding to segments L, L1, L2, L3 and U, respectively. Since natural walking gait is characterized by oscillation of the legs and swinging of the arms at the period of gait, we contend that:

T(W_L(t)) = T0 or T0/2    (1)
T(W_U(t)) = T0 or T0/2    (2)

where T(·) denotes the fundamental period of a time series, and T0 the period of walking. The latter is estimated via periodicity analysis of the width W(t) of the entire person's bounding box (i.e. over the combined region L ∪ U). For this we use the autocorrelation method, which is robust to colored noise and non-linear amplitude modulations, unlike Fourier analysis [4]. We first smooth the signal, piecewise detrend it to account for any depth changes, then compute its autocorrelation A(r), where r lies in some interval [-R, R] and R is chosen to be sufficiently larger than T0. The period of W(t), denoted T, is estimated as the average distance between each two consecutive peaks in A(r), as illustrated in Figure 2. However, T0 is estimated as T or 2T depending on the camera viewpoint, as explained in [2].

Figure 2. Computation of gait period via autocorrelation of time series of bounding box width of binary silhouettes.

The third and fourth constraints we formulate are an artifact of the pendular-like motion of the arms and legs, and state that for a naturally-walking person:

med(W_L(t)) ≥ med(W_U(t))    (3)
med(W_L1(t)) = med(W_L2(t)) = med(W_L3(t))    (4)

where med(·) denotes the median of a time series. These constraints are verified via Wilcoxon's matched-pairs signed-ranks test (at significance level 0.05), which is a nonparametric test for determining whether the medians of two samples are equal [5].

4 Experiments and Results

We tested the method on 41 outdoor sequences taken from various camera viewpoints, captured at 30 fps with an image size of 360x240. All sequences were recorded without the subjects' a priori knowledge in the parking lot of a university building (their consent to use the sequences was obtained afterwards). Table 1 summarizes the type of sequences used and the detection results for each category. The detection rate is 85% (35 out of 41) and the false alarm rate is 11.76% (2 out of 17).

                          Natural-walking   Load-carrying   Total
  Not Carrying                  15                2           17
  Carrying, upper body           2                9           11
  Carrying, lower body           2               11           13

Table 1. Carried object detection results on 41 outdoor sequences: rows depict the type of sequence, and columns depict our method's detection result.
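Two building blocks of the Section 3.2 analysis can be sketched as follows: gait-period estimation as the average spacing between consecutive autocorrelation peaks, and the Wilcoxon matched-pairs signed-ranks test on medians. This is an illustrative reconstruction, not the authors' implementation; the function names, the smoothing window, and the use of SciPy's `find_peaks` and `wilcoxon` routines are our assumptions.

```python
# (a) Period of a width series: smooth, detrend, autocorrelate, then average
#     the spacing between consecutive autocorrelation peaks.
# (b) Median comparison of two width series via the Wilcoxon signed-ranks
#     test; a constraint such as (4) is violated when the medians differ.
import numpy as np
from scipy.signal import detrend, find_peaks
from scipy.stats import wilcoxon

def estimate_period(w, smooth=5):
    """Fundamental period of width series w (in frames)."""
    w = np.convolve(w, np.ones(smooth) / smooth, mode="same")  # smoothing
    w = detrend(w)                       # remove slow drift (depth changes)
    a = np.correlate(w, w, mode="full")  # autocorrelation over all lags
    peaks, _ = find_peaks(a)
    return float(np.mean(np.diff(peaks)))  # mean peak-to-peak distance

def medians_differ(x, y, alpha=0.05):
    """True when the paired Wilcoxon test rejects equal medians at level alpha."""
    return wilcoxon(x, y).pvalue < alpha
```

For instance, a synthetic width series 10 + sin(2*pi*t/20) yields an estimated period close to 20 frames, and two series offset by a constant are flagged as having different medians.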

4.1 Discussion

In the following, we discuss the results for a few of the sequences. For each example, the left figure shows one frame of the walking person, the top-right figure shows W_U(t) (in blue), W_L(t) (in green) and W(t) (in red), the center-right figure shows their respective autocorrelation functions (with the same colors), and the bottom-right figure shows W_L1(t) (in blue), W_L2(t) (in green), W_L3(t) (in red), and W_U(t) (in cyan).

4.1.1 Natural-walking Gait

Figure 3 illustrates a person walking fronto-parallel and a person walking non-fronto-parallel to the camera. All four constraints are satisfied in both cases. Note, however, that while in the former case both W_L(t) and W_U(t) have period T0/2, in the latter W_U(t) has period T0. This is because the swinging motion of the arm furthest away from the camera is occluded by the body when the person walks at an angle.

Figure 3. Width series and their corresponding autocorrelation functions for a person walking fronto-parallel (a,b), and a person walking non-fronto-parallel (c,d) to the camera.

4.1.2 Carried Object in Lower Body Region

Figure 4 shows five examples in which the carried object resides mostly in the lower body region (i.e. held at the side with one hand). Consequently, Constraint 3 is satisfied in all cases, since the object makes the lower region even larger than the upper region, while Constraint 4 is violated in all cases. Constraint 1 is satisfied in all cases, while Constraint 2 is satisfied only in the bottom two cases, mainly because the arms hardly swing when holding heavy objects.

4.1.3 Carried Object in Upper Body Region

Figure 5 shows five examples in which the carried object resides mostly in the upper body region (held with both arms, on the shoulder, or on the back). Constraint 3 is violated in the first two cases, because the object makes the upper body appear larger than the lower body. Constraint 4 is violated only in the third case. Constraint 1 is satisfied in all cases, while Constraint 2 is violated in all but the second case, again because the arms hardly swing when holding an object.

4.1.4 False Alarms and False Negatives

False alarms, i.e. falsely detecting a carried object, occur when any of the four constraints is violated not because the person is carrying an object, but for some other reason, such as image noise, segmentation errors, or loose-fitting clothes. The two false alarms in our experiments were both caused by background subtraction errors (the color of the person's clothes was very similar to the background's).

False negatives, i.e. failures of our method to detect that a person is actually carrying an object, typically occur when the object does not protrude outside the body silhouette. Figure 6 shows four examples of false negatives. In the first two cases, the object is carried in one hand and is not detected because it is too small. In the third case, the carried object was quite large, but did not protrude enough outside the body silhouette. Finally, in the fourth case, the object carried on the shoulder is not detected because our method does not analyze the body above the arms region. Note that this person is also carrying an object in the other hand, which is likewise not detected because it is too small.

5 Conclusions and Future Work

We have described a novel method for determining whether a person is carrying an object in monocular sequences seen from a stationary camera. This is achieved via temporal, correspondence-free analysis of binary shape features that exploits the periodic and pendular-like motion of the legs and arms. The method is view-invariant and robust to segmentation and tracking errors. It achieves a detection rate of 85% and a false alarm rate of 12% when tested on 41 mostly non-fronto-parallel video sequences. One way we are working to extend this method is by deducing, from the current time series analysis, the body region where the object is located, in order to segment it and possibly infer its type.

Figure 4. Width series and their corresponding autocorrelation functions for cases when carried objects reside in the lower body region.

Figure 5. Width series and their corresponding autocorrelation functions for cases when carried objects reside in the upper body region.

Figure 6. Width series and their corresponding autocorrelation functions for false negatives, i.e. where the carried object is not detected.

Acknowledgment

The authors would like to thank Harsh Nanda for collection of video data, Ahmed Elgammal for providing background subtraction code, and Ross Cutler of Microsoft Research for providing code and references for periodicity analysis. The support of the National Institute of Justice (FAS No. 01529393) is gratefully acknowledged.

References

[1] C. BenAbdelkader. Gait as a biometric for person identification in video sequences. Technical Report 4289, University of Maryland College Park, 2001.
[2] C. BenAbdelkader, R. Cutler, and L. Davis. Eigengait: A performance analysis with different camera viewpoints and variable clothing. In FGR, 2002.
[3] L. W. Campbell and A. Bobick. Recognition of human body motion using phase space constraints. In ICCV, 1995.
[4] R. Cutler and L. Davis. Robust real-time periodic motion detection, analysis and applications. PAMI, 13(2), 2000.
[5] W. W. Daniel. Applied Non-parametric Statistics. PWS-KENT Publishing Company, 1978.
[6] J. W. Davis and A. F. Bobick. The representation and recognition of action using temporal templates. In CVPR, 1997.
[7] A. Elgammal, D. Harwood, and L. Davis. Non-parametric model for background subtraction. In ICCV, 2000.
[8] I. Haritaoglu, R. Cutler, D. Harwood, and L. Davis. Backpack: Detection of people carrying objects using silhouettes. CVIU, 6(3), 2001.
[9] V. Inman, H. J. Ralston, and F. Todd. Human Walking. Williams and Wilkins, 1981.
[10] H. Kinoshita. Effects of different loads and carrying systems on selected biomechanical parameters describing walking gait. Ergonomics, 28(9), 1985.
[11] J. Little and J. Boyd. Recognizing people by their gait: the shape of motion. Videre, 1(2), 1998.
[12] K. Luttgens and K. Wells. Kinesiology: Scientific Basis of Human Motion. Saunders College Publishing, 7th edition, 1982.
[13] P. Martin and R. Nelson. The effect of carried loads on the walking patterns of men and women. Ergonomics, 29(10), 1986.
[14] T. A. McMahon. Muscles, Reflexes, and Locomotion. Princeton University Press, 1984.
[15] J. Perry. Gait Analysis: Normal and Pathological Function. SLACK Inc., 1992.
[16] J. Piscopo and J. A. Baley. Kinesiology: the Science of Movement. John Wiley and Sons, 1st edition, 1981.
[17] Y. Song, X. Feng, and P. Perona. Towards detection of human motion. In CVPR, 2000.
[18] D. Webb, R. H. Tuttle, and M. Baksh. Pendular activity of human upper limbs during slow and normal walking. American Journal of Physical Anthropology, 93, 1994.
[19] S. Yasutomi and H. Mori. A method for discriminating pedestrians based on rhythm. In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems, 1994.