Using social effects to guide tracking in complex scenes

A. P. French, Computer Science and IT, University of Nottingham, UK
A. Naeem, Computer Science and IT, University of Nottingham, UK
I. L. Dryden, School of Mathematical Sciences, University of Nottingham, UK
T. P. Pridmore, Computer Science and IT, University of Nottingham, UK

Abstract

This paper presents a new methodology for improving the tracking of multiple targets in complex scenes. The new method, Motion Parameter Sharing, incorporates social motion information into tracking predictions. This is achieved by allowing a tracker to share motion estimates within groups of targets which have previously been moving in a coordinated fashion. The method is intuitive and, as well as aiding the prediction estimates, allows the implicit formation of 'social groups' of targets as a side effect of the process. The underlying reasoning and method are presented, together with a description of how the method fits into the framework of a typical Bayesian tracking system. This is followed by some preliminary results which suggest the method is more accurate and robust than algorithms which do not incorporate the social information available in multiple target scenarios.

1. Introduction

Recently, demand has risen for systems which can apply behavioural activity analysis to image sequences showing actions taking place in complex real world scenes. These environments typically involve multiple targets moving through un- or loosely-constrained spaces, such as people moving through shopping centres, railway stations and other potentially crowded public areas. Behavioural analysis aims to model and label individual target and group behaviour; often the motivation is to automatically flag dangerous or suspicious actions. However, a clear prerequisite for such analysis is the output of a tracker which can reliably track multiple targets in close proximity and in highly social situations. This paper presents a new tracking methodology which makes use of the social information that may be present between certain targets in the scene to help improve tracking performance.

2. Approach and previous work

Although the incorporation of social motion information into tracking frameworks is a new concept, there does exist a body of literature on general multiple target tracking. Extra effort is required when extending algorithms designed for single targets to multiple targets. The simultaneous application of multiple single-target trackers to groups of targets in close proximity results in failure, as the trackers very quickly coalesce onto the target(s) producing the best measurement results. Specialized multiple target tracking systems are therefore required which explicitly model the locations, etc., of all targets. Examples of early work on probabilistic multiple target tracking include the Joint Probabilistic Data Association Filter (JPDAF) [1] and the Multiple Hypothesis Tracker (MHT) [2]. Although these algorithms do begin to specifically address the problem of tracking multiple targets, both are prone to error in situations involving significant clutter or multiple targets in close proximity. What is needed are algorithms which can take into account the distribution and location of each target relative to the others. An example of an algorithm which begins to address this problem is the interaction function-equipped MCMC particle filter of Khan et al. [3], which explicitly models the spatial interaction of targets by suppressing measurements which fall within a radius of other target predictions, effectively enforcing the rule that no two targets can occupy the same space. This work has recently been extended to allow for varying numbers of targets [4].

Our work builds on these existing methods, but begins with a significantly different assumption. We believe that although tracking multiple targets does indeed generate certain new problems, it also makes available an abundance of social information which can be incorporated into the tracking process. We argue that such socially-rooted motion can and should be used to help guide tracking of groups of social targets (e.g. people, animals) and of other targets subject to shared physical constraints on their motion (e.g. cars moving down a road). Previous work has focused on methods of coping with the problems associated with the requirement to track multiple interacting targets; we believe that by taking social motion into account some of these problems may be finessed. The proposed approach to tracking multiple targets is justified and explained in Section 3, followed by a description of the general framework in Section 4, a discussion of groups and motion sharing in Section 5 and some experimental results in Section 6. The proposed method is discussed and set fully in context in the Discussion, Section 7.

3. The role of social motion

When more than one target is present in a scene there is a possibility that they may be influencing each other's motion. Take, for example, a group of friends walking through a crowded public space. The friends are likely to be moving in generally the same direction, towards a common goal. A human observer might (explicitly or implicitly) group the targets together, and would also be likely to draw inferences about the group, for example that they know each other and are heading to the same place. Now suppose that part of the group is occluded from the observer by some obstruction. If the observer has inferred that the targets form some cohesive group, they will be able to estimate with reasonable accuracy where the hidden people are, based on the location of their visible friends. Even if the path the visible friends take is non-linear, for example if they change direction or stop to chat, it is still reasonable to assume that the other, hidden members of the group have performed the same actions. Thus, the motion of the occluded friends can be predicted from how the other members of their inferred social group are moving. This situation is illustrated in Figure 1. It should be stressed that a social group need not move as a rigid structure for this type of inference to be performed. As long as the occluded targets' motion is related to that of at least one visible target, some estimate of their likely position and motion can be obtained. Note also that targets do not have to be occluded for social motion information to be of value. Some members of the group might exhibit more erratic motion than others, or appear more similar to background objects or unrelated targets, making them harder to track. The presence of motion estimates from related, more reliably tracked targets can only help in these circumstances.

An observer might infer a social group of targets based on many, currently unknown, factors. Though measures of proximity may be involved, it is highly likely that members of the group will also come into close proximity to other, unrelated individuals. Instantaneous proximity alone is therefore unlikely to be sufficient to support this kind of analysis. Other relevant factors might include the visual appearance of the targets and, in particular, their relative motion over a representative time period. These features have parallels in the Gestalt laws of organization [5], a set of rules which suggest how a person might see a collective whole rather than a set of disparate components. Target proximity has a direct equivalent in the Gestalt view, where objects close to each other are grouped based on nearness. Visual appearance of targets corresponds to similarity in the Gestalt approach, where similar items are grouped together. Targets moving together can be thought of as forming some holistic entity ("closure" or "common fate"), and such a coordinated group can also be thought of as being formed on the basis of "simplicity" (having regularity and smoothness). Any one of these measures, or a combination of them, might validly be used to form groups, depending on the situation. The present authors, however, believe the most general type of inference is that drawn when targets exhibiting coordinated motion, i.e. "moving together" for a period of time, are considered to form a social group.

4. Social tracking framework

We now show how this new methodology can be integrated into a practical tracking framework. We have termed the new method Motion Parameter Sharing (MPS) because of the way motion parameters are shared between socially coordinated targets. At this juncture it is worth stressing that the methodology can in principle be incorporated into any tracking system which maintains knowledge of the states of multiple targets. Motion Parameter Sharing operates by allowing pairs of targets whose state estimates suggest they might be members of the same social group to share motion information. This in turn allows those targets to produce predictions which incorporate hypotheses based on their partners', as well as their own, current motion estimates. To construct a motion parameter sharing tracker we require a way of grouping targets between which it is desirable and sensible to share motion information. As stated earlier, we believe a sensible first parameter over which to define groups is how coordinated the targets' motions have been over a recent time window: targets which have been moving in a similar way can be grouped as being 'socially coordinated'. This is described more fully in Section 5.
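As an illustrative sketch of this grouping idea (our own Python, with assumed helper names and window length, not the paper's implementation), coordination over a recent window might be measured by correlating frame-to-frame speeds:

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0

def speeds(track):
    """Frame-to-frame speeds from a list of (x, y) positions."""
    return [math.hypot(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(track, track[1:])]

def sharing_probability(track_p, track_q, window=12):
    """Approximate sharing strength for a pair of targets: correlation
    of their speeds over the last `window` frames, clamped to [0, 1]."""
    sp = speeds(track_p[-(window + 1):])
    sq = speeds(track_q[-(window + 1):])
    return max(0.0, pearson(sp, sq))
```

Targets whose pairwise sharing strength is high would then be treated as a candidate social group; Section 5 describes the specific measures actually used.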

Figure 1: Left: Multiple targets moving through a space. After a period of observation, a human observer might infer groups between targets which have been moving in a similar way (dotted lines indicate group membership). Right: One member of the inferred group has been occluded, but their position can be predicted from the other, visible targets.


Any MPS-equipped tracking algorithm must also have a way of predicting a target's motion based on a mix of internal motion parameters (e.g. velocity) and parameters shared from the other targets in the same social group. This idea fits nicely into a Bayesian filtering framework, as follows. A Bayesian filter equation for multiple target tracking can be expressed as:

  P(X_t | Z_t) = P(Z_t | X_t) ∫ P(X_t | X_{t-1}) P(X_{t-1} | Z_{t-1}) dX_{t-1}   (1)

where P(X_t | Z_t) is the posterior distribution at time t and P(Z_t | X_t) represents the probability that a measurement was observed given the state X_t of all the targets at time t. The motion model P(X_t | X_{t-1}) makes an estimate of the state at time t from the state at time t-1. This motion model can be adapted so that the state prediction incorporates the motions of other coordinated targets, forming a social component to the posterior distribution estimate. As this happens at the motion model stage of the Bayes filter, it can be incorporated into any algorithm which uses the Bayes filter as a foundation (e.g. MCMC multiple target tracking [3], Condensation [6], etc.). It is built into the motion model as follows. A label variable, y, which indicates which target pairs might share motion parameters, is placed into the motion model. This mirrors the mixed state Condensation particle filter [7], essentially with the pre-computed state transition matrix replaced by dynamically computed motion sharing probabilities for each pair of targets. Mixed state Bayesian tracking is described by:

  P(X_t | X_{t-1}) = P(x_t | y_t, X_{t-1}) P(y_t | X_{t-1})   (2)

where X_t now comprises (x, y); x describes the state of the system and y is a discrete variable labeling the type of motion parameter sharing in operation. Note that this equation makes the Markovian assumption, which allows predictions to be made solely from the previous time step. In Motion Parameter Sharing, however, the motion parameter label y depends not only on the last time step but on a previous time window of N frames, from which information about all the targets' motions is used; for example, correlations might be calculated between the targets' motions. The MPS equation therefore becomes semi-Markovian, with the y parameter requiring a short time history in its calculation, as shown in equation (3):

  P(X_t | χ_{t-1}) ≈ P(x_t | y_t, X_{t-1}) P(y_t | χ'_{t-1})   (3)

where χ_{t-1} represents a complete state history up until time t-1 and χ'_{t-1} represents a partial history over the previous N frames. This history approximation is necessary as calculating motion correlations over complete histories is both inappropriate in most situations and processor intensive. So a posterior distribution is created which mixes a non-social, internal model of motion with a model which incorporates the motion of other, socially grouped targets. As demonstrated by mixed state Condensation [7], this is neatly achieved within a particle filtering framework: some particles are generated using a given target's own estimated motion, and some using that of its fellow group members. The mix varies depending on how the groups are formed and how strongly targets are measured as moving together. Exactly how the social groups are formed, i.e. how P(y_t | χ'_{t-1}) is realized, is flexible, and this is one of the key features of the methodology: exactly how groups are defined and how motions are shared between them can be changed per implementation. Two possible options are described in Section 5, and results from both are presented in Section 6.

5. Groups and parameter sharing

The motion parameter sharing framework allows similarity in any feature(s) to be used to trigger sharing between targets. To date, two methods have been used for defining associations between targets: correlating speed and correlating velocity. These measures are then used as indicators of the strength of the social relationships between the targets. Both methods have been built into Markov chain Monte Carlo tracking algorithms, and results are presented in Section 6.

For the first implementation of MPS [8], speed was correlated across all targets over a short time-history window. Pearson correlation values were then treated as approximations to probabilities and used to define the transition matrix for a mixed state proposal density, similar to the static transition probabilities which define the motion model used in mixed state Condensation [7]. One of two motion models is selected, based on the strength of the correlation of speed between the two targets over a previous time window: one uses the target's internal motion parameters, the other uses the motion parameters of a socially coordinated partner target. The motion model in use for a target when generating new particles can therefore be social or non-social, and is represented by the labeling variable y in equation (3).

In the second implementation, velocity was used as the motion type over which to calculate social motion. Two separate velocity proposals were used, one social and one

not. The choice of proposal depends on the value of an indicator function, h_{p,q}, which defines whether a social relationship exists between two targets. The value of this function depends firstly on whether the two targets are moving in approximately the same direction, and secondly on how well their speeds are correlated. This effectively filters out targets moving in wildly different directions. Performing a reliable 2-d correlation requires velocity estimates over an infeasibly large time window, so the simpler indicator function method was used instead. Targets p and q moving in a similar direction and with similar speeds (correlation above a threshold) are then marked as having potential for sharing motion parameters, i.e. h_{p,q} = 1. The motion proposal step for a new location then becomes x'_{p,t} = x_{p,t-1} + v'_{p,t}, where x_{p,t-1} is the location of target p at time t-1, and the proposal for the velocity of target p with motion parameter sharing from q is:

  v'_{p,t} = v_{q,t} + ε_{p,t}   if h_{p,q} = 1, and
  v'_{p,t} = v_{p,t} + ε_{p,t}   if h_{p,q} = 0   (4)

where ε_{p,t} ~ N_2(0, σ_v² I_2) and I_2 is the 2x2 identity matrix. This was fitted into the Metropolis-Hastings step of an MCMC particle filter.

As described, the criteria for motion parameter sharing (the indicator function h_{p,q}) can vary with the application domain. For example, if required, a true multi-dimensional correlation or mutual information between parameters could be calculated. For correlations above a defined threshold, the targets might be considered part of the same social motion group. These correlations are calculated over a time history. It has been found that 0.5 to 1 second of motion history allows sensible groupings in a variety of cases, though clearly this parameter depends on the timescale of the actions taking place and merits further research.

It is worth reiterating that the presented methods for calculating motion sharing functions in the motion model are only examples from a large set of possible functions. Any probabilistic way of determining how motion parameters are shared can be used in the MPS algorithm. In fact, in the case of MCMC filters these values do not have to be strict probabilities, as is the case when Pearson coefficients are used in their place; any appropriate indicator function can be employed. It should also be noted that both speed and velocity present advantages and disadvantages when used as social grouping parameters in MPS. Correlating over speed groups targets that are heading towards each other, for example. This might be appropriate in some domains, but in many a common direction of travel might be desirable. Hence the use of velocity in the real world tests in Section 6.

6. Experimental results

The proposed methodology was first built into a Markov chain Monte Carlo particle filtering framework [3], using speed as the parameter to correlate over. Artificial test sequences were generated so that the tracking could be compared to a high-accuracy ground truth. The sequence consisted of 4 circular targets moving with constant acceleration along a curved path in images of resolution 720x576. Gaussian image noise (σ = 4.0) was added independently to each frame, and about 150 distracting background circles were added for the 130-frame sequence. This is illustrated in Figure 2.

Figure 2. Example background and paths of targets used in the artificial data experiments. When used, the occlusion occurs between the X's, and the perturbation is indicated on another path.
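As a concrete illustration of the Section 5 proposal step, equation (4) and the indicator function h_{p,q} might be sketched as follows; this is our own minimal Python, with an illustrative angle tolerance, correlation threshold and noise level that are not taken from the paper:

```python
import math
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0

def indicator(vels_p, vels_q, angle_tol=math.pi / 4, corr_thresh=0.5):
    """h_{p,q}: 1 if p and q currently head in roughly the same direction
    AND their speeds correlate above a threshold over the history, else 0."""
    (vxp, vyp), (vxq, vyq) = vels_p[-1], vels_q[-1]
    angle = abs(math.atan2(vyp, vxp) - math.atan2(vyq, vxq))
    angle = min(angle, 2 * math.pi - angle)   # wrap difference into [0, pi]
    if angle > angle_tol:
        return 0                              # headings too different: no sharing
    speeds_p = [math.hypot(vx, vy) for vx, vy in vels_p]
    speeds_q = [math.hypot(vx, vy) for vx, vy in vels_q]
    return 1 if pearson(speeds_p, speeds_q) > corr_thresh else 0

def propose(x_p, v_p, v_q, h_pq, sigma_v=1.0, rng=random):
    """Equation (4): propose a velocity (partner q's if h = 1, own if h = 0)
    plus isotropic Gaussian noise, then the new location x' = x + v'."""
    vx, vy = v_q if h_pq == 1 else v_p
    v_new = (vx + rng.gauss(0.0, sigma_v), vy + rng.gauss(0.0, sigma_v))
    return (x_p[0] + v_new[0], x_p[1] + v_new[1]), v_new
```

In an MCMC filter this proposal would feed the Metropolis-Hastings step, as described above.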

A simple colour distance measure was used as the observation model in all experiments [9]. The Motion Parameter Sharing algorithm was compared to an existing particle filtering algorithm [3]. Each algorithm was run 5 times on the sequence. Results are presented in Table 1.

Table 1: RMS errors (pixels) for a particle filter with and without the MPS social motion component, tested on an artificial sequence.

  Run number                 1     2     3     4     5
  Ordinary particle filter   0.69  0.71  0.67  0.61  0.69
  MPS particle filter        0.57  0.63  0.62  0.68  0.62

Table 1 shows that on 4 out of 5 runs MPS improved tracking. Inspection of run 4 (where MPS performed worse) reveals, however, that the errors occurred before the social group had been determined, and so were not caused by MPS itself. These results reveal a slight increase in performance for MPS (t(8) = -2.04, p = 0.08).
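As a sanity check (our own, not part of the paper), the reported t statistic can be reproduced from the Table 1 values with a pooled two-sample t-test using only the standard library:

```python
import math

def pooled_t(sample_a, sample_b):
    """Two-sample pooled-variance t statistic with n_a + n_b - 2 df."""
    na, nb = len(sample_a), len(sample_b)
    ma, mb = sum(sample_a) / na, sum(sample_b) / nb
    ssa = sum((x - ma) ** 2 for x in sample_a)
    ssb = sum((x - mb) ** 2 for x in sample_b)
    sp2 = (ssa + ssb) / (na + nb - 2)   # pooled variance
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))

ordinary = [0.69, 0.71, 0.67, 0.61, 0.69]
mps = [0.57, 0.63, 0.62, 0.68, 0.62]
t = pooled_t(mps, ordinary)   # ≈ -2.04, matching the reported t(8)
```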

Both of these algorithms had a comparable execution time of around 20 ms on a 1.4 GHz PC.

A perturbation was then added to one of the paths in the sequence. This simulates one member of the group performing an avoidance maneuver and is also an example of severe, localized noise. Again, the MPS tracker was compared to the reference particle filter [3]. Results are presented in Table 2.

Table 2: Perturbation sequence, RMS errors (pixels) for a particle filter with and without the MPS social motion component, measured over 10 runs.

                                                Ordinary particle filter   MPS particle filter
  Sequences with trackers on target at end                 2                      10
  Average RMS error for correct runs (pixels)              2.60                   0.772

From the first row of Table 2 it can be seen that there is a large difference in robustness between the two algorithms, χ2(1, N=20) = 13.3, p = 0.0003.

An occlusion test was also performed, in which one of the targets was removed from the sequence for 30 frames. Results are presented in Table 3.

Table 3: Occlusion sequence, RMS errors (pixels) for a particle filter with and without the MPS social motion component, measured over 10 runs.

                                                Ordinary particle filter   MPS particle filter
  Sequences with trackers on target at end                 0                      10
  Average RMS error for correct runs (pixels)              -                      0.767

There is a very highly significant difference in robustness between the two algorithms' results presented in the first row of Table 3 (χ2(1, N=20) = 20, p < 0.001).

Figure 3. Non-social MCMC and MPS-MCMC tracking 5 pedestrians through a real image sequence obtained from the PETS database (frame numbers 24, 40, 48 and 57 shown).
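The chi-square values reported for Tables 2 and 3 can likewise be reproduced (our own check) from the 2x2 success/failure counts:

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square for a 2x2 contingency table
    [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Table 2 (perturbation): ordinary PF 2/10 on target, MPS 10/10
chi2_perturb = chi2_2x2(2, 8, 10, 0)    # ≈ 13.3
# Table 3 (occlusion): ordinary PF 0/10 on target, MPS 10/10
chi2_occlude = chi2_2x2(0, 10, 10, 0)   # = 20.0
```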
