Robust Object Tracking with Fuzzy Shape Estimation
Jesús García, José M. Molina
Universidad Carlos III de Madrid. Departamento de Informática
Avenida de la Universidad, 30. Leganés 28911. Spain
[email protected]
Juan A. Besada, Javier I. Portillo, José R. Casar
Universidad Politécnica de Madrid. E.T.S.I. Telecomunicación
Ciudad Universitaria s/n. Madrid 28040. Spain
Abstract – This work presents a novel, efficient and robust approach for object tracking based on sequences of video images. A fuzzy system has been developed to weight the update decisions, both for the trajectories and for the shapes estimated for targets, using the set of image regions (blobs) extracted from each frame. Several numeric heuristics, describing the quality of gated groups of blobs and predicted tracks, are considered to generate the confidence levels used in the update process. The rules aim to generate the most appropriate decisions under different conditions, emulating the reasoned decisions taken by an expert, and have been derived from a systematic analysis of performance. The application area is the surveillance of the airport surface, including situations with very closely spaced objects (aircraft and surface vehicles moving on the apron). System performance with real image sequences of representative ground operations is shown at the end.
not demanding any active interaction from the targets to detect them, but lacking identification. One of the most appealing features of video-based systems is that they are potentially capable of providing identification, by means of a tail-number recognition system coordinated with the tracker, while acting as non-cooperative sensors that require no additional equipment on board the controlled targets. Besides, cameras can be configured as a set of local installations providing coverage in areas where other systems lack detection (occluded areas in radar, screening for communications in cooperative systems, etc.). This work focuses on the tracking system operating on sequences of video images, analyzing the main problems to be solved and the particular characteristics taken into account to design the solutions. The general architecture and the main blocks integrated were described in [2]. Basically, the system follows a distributed structure, with a local processor operating on the image sequences provided by each camera. Each processor calculates target trajectories (local tracks) in the projected camera plane in two steps. First, moving targets are detected against their local background to generate detected pixels, which are then connected to form image regions referred to as blobs. Blobs are defined by their spatial borders (generally a rectangular box), centroid location and area. Then, the tracker must distinguish all targets in the scene and track their motion, applying association and filtering processes to the blobs extracted from the processed images.
Keywords: Video Image Tracking, Data Association, Fuzzy Logic.
1. Introduction
A video-based surveillance system offers interesting advantages when alternative sensor technologies are analyzed for a given application. That is the case for automatic surveillance of the airport surface, one of the core components of Advanced Surface Movement, Guidance and Control Systems (A-SMGCS [1]). This function is in charge of the automatic detection, identification and tracking of all targets of interest (aircraft and relevant ground vehicles) in the airport movement area, in all weather conditions. The performance requirements for this function, in terms of accuracy, integrity, global coverage, identification capability, etc., are not met by any single technology, so this function must collect and fuse information from the available complementary sensors and information systems.
A basic requirement for system feasibility is the capability to maintain high levels of continuity and robustness. A single track per target should appear, avoiding as far as possible the loss of real tracks, the generation of random tracks, and the mixing of estimates from different targets under operational conditions. Besides, the system is required to process image sequences in real time at high rates (on the order of 5 frames per second), so the algorithms must continuously process considerable amounts of data on modest computing hardware (PCs). The approach proposed in this work relies on knowledge of target shapes and dimensions to extract and correlate the appropriate set of blobs with each track. So, association decisions will be
Conventional sensors for the airport surface are usually classified as cooperative, based on on-board equipment in the targets and capable of simultaneously providing identification and position, and non-cooperative,
ISIF © 2002
Tracking (MHT) [3] or multi-scan approaches [4]. A possible way to overcome this is to remove these constraints and enumerate all possible grouping and assignment hypotheses [5]. However, this search could demand an excessive computation load to process the sequences in real time, and it would still not solve some problems, such as the assignment of corrupted blobs resulting from merged targets.
basically based on the contours of the estimated target shapes. The dynamic evolution of these estimates (target shapes and trajectories), in accordance with the information extracted from the images, is the key aspect to guarantee good performance figures regarding continuity. A fuzzy system has been developed to evaluate the confidence given to the information contained both in the gated blobs and in the predicted tracks, based on a set of numeric heuristics describing the characteristics of these multiple-blob, multiple-track association scenarios. In the next section, the specific association problems in this application are analyzed, presenting the fuzzy approach to evaluate the confidence levels used to update the estimators describing target shapes and motion parameters. Finally, system output in five example scenarios is presented, indicating the response in complex situations, with real image sequences of representative ground operations.
An all-neighbors approach taking soft decisions, similar to JPDA [6], seems better suited to this problem, since all blobs potentially gated with each track are used to update it, at a moderate computation load. The usual formulation of JPDA derives analytical expressions for weighting factors based on statistical residuals between track predictions and candidate updating plots. However, this approach is not directly applicable to the problem with extended targets, since residuals are computed assuming simplified models that consider only the centroids, the predicted positions and certain error distributions.
2. Data association with video images
The data association logic is one of the basic aspects determining the system capability to cope with dense, multi-target scenarios. Its design must take into account the characteristics and quality of the processed data. In this case, the data are the blobs resulting from the detection subsystem applied to image sequences of airport surface scenes.
It is more appropriate to extend the association, using an explicit representation of target shape and dimensions together with motion estimation, to select the set of updating blobs for each track. The evolution of shape, according to the information obtained from the image sequence, should be decided considering a number of factors to overcome the problems mentioned above. There are no detailed models or analytical expressions to design this process, as there are for JPDA, but an analysis of continuity performance with different strategies, depending on numeric heuristics describing the situations, provides robust rules to take appropriate association decisions.
A first problem to be considered is imperfect image segmentation, due to image irregularities, shadows, occlusions, etc., which results in multiple blobs potentially generated for a single target. This splitting effect may appear randomly, with irregular surface vehicles or shadows attached to real targets, or systematically, when some obstacle or target partially occludes other targets of interest moving behind it. So, blobs must be re-connected by the association system, which decides which ones are to be grouped and associated to each target.
3. Problem of target shape estimation
After the detection and extraction of blobs belonging to moving targets, the tracking system tries to correlate them with the estimated tracks. As mentioned above, association is performed by means of a correlation mask, predicted to the frame time instant from the last update. This mask conforms to the estimated shape of the target, and is also updated with the spatial information contained in the blobs.
On the other hand, closely spaced targets lead to overlapping images, with some targets partially or totally occluded by other targets or obstacles, so that some blobs can be the result of incorrect segmentation and actually represent several targets. Therefore, each frame to be processed presents a set of blob-to-track multi-assignment problems, where several blobs (or none) may be assigned to the same track and, simultaneously, several tracks could overlap and share common blobs. A satisfactory trade-off is required to cope with these two types of situations.
Therefore, track-state vectors with position and kinematic estimates (2D location and velocity referred to the camera plane) are complemented with attributes defining a spatial representation of target extension and shape. So, the predicted target contour is used to gate the blobs extracted in the next frame. For the sake of simplicity, a rectangular box has first been used to represent the target, as indicated in figure 1. Around the predicted position, $(\hat{x}_p, \hat{y}_p)$, a rectangular box is defined, (xmin,
This data association problem is not directly tractable with the usual hypothesis of one-to-one plot-track correspondences, generally assumed by conventional systems, from classic Nearest Neighbor solutions to Bayesian extensions, such as Multiple Hypothesis
xmax, ymin, ymax), with the estimated target dimensions
$(\hat{l}_H, \hat{l}_V)$. Then, an outer gate, computed with parameters
a certain group, and also to assess confidence in the predicted track. They were detailed in [7], and are summarized next:
∆H, ∆V, is used to finally gate the potential blobs updating the track estimates.
• Overlapping: computed as the fraction of blob area contained within the track predicted region. Its value ranges from 1, when the blob is completely included within the inner track predicted gate, to 0, when it lies outside the outer target gate.
• Group density and distance to track: this number depends on the ratio, ρ, between the areas of detected regions and non-detected areas (holes) in the finally reconnected pseudo-blob. So, when the grouped blobs are very scattered, a low value of ρ indicates that different targets have probably originated them. Then, a criterion based on the distance to the track is used to finally compute this heuristic, whose values fall from 1, when the distance is zero, to ρ, for the most separated blob.
• Conflict with other tracks: this heuristic evaluates the likelihood of a blob being in conflict with tracks other than the one whose blob-track confidence is being assessed. It is intended to detect the case when trajectories are so close that track gates overlap, and also to grade the severity of the conflict. Its definition is similar to that of the first heuristic, overlapping, but it is computed only with the inner gates of the additional tracks. When several tracks are in conflict with the one evaluated, the maximum overlapping degree is selected for this heuristic.
• Proximity to image borders: finally, image borders are the areas where tracks are usually initialized, so they are transient areas where tracks are not yet stabilized. This number evaluates whether the blob is close to any of the four image borders.
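As an illustration of the overlapping and conflict heuristics summarized above, the following Python sketch approximates blobs and gates by axis-aligned boxes. The `Box` class and the functions `overlap` and `conflict` are hypothetical names introduced here. The exact definitions, including the graded behaviour between the inner and outer gates, are those of [7] and are not reproduced: this simplified version measures the overlap against the inner gate only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box (xmin, ymin, xmax, ymax) in image coordinates."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        # An empty or inverted box has zero area.
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)

    def intersect(self, other: "Box") -> "Box":
        return Box(max(self.xmin, other.xmin), max(self.ymin, other.ymin),
                   min(self.xmax, other.xmax), min(self.ymax, other.ymax))


def overlap(blob: Box, inner_gate: Box) -> float:
    """Fraction of the blob area lying inside a track's inner gate (assumed form)."""
    blob_area = blob.area()
    if blob_area == 0.0:
        return 0.0
    return blob.intersect(inner_gate).area() / blob_area


def conflict(blob: Box, other_inner_gates) -> float:
    """Maximum overlap of the blob with the inner gates of the other tracks."""
    return max((overlap(blob, gate) for gate in other_inner_gates), default=0.0)
```

For example, a blob `Box(0, 0, 2, 2)` tested against an inner gate `Box(1, 0, 3, 2)` has half of its area inside the gate, so `overlap` returns 0.5.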
[Figure 1 diagram: predicted track state vector at $(\hat{x}_p, \hat{y}_p)$; inner target gate bounded by xmin, xmax, ymin, ymax with dimensions $\hat{l}_H$, $\hat{l}_V$; outer target gate with margins ∆H, ∆V; gated blobs]
Figure 1. Target segmentation with estimated box
This outer gate allows the system to track dynamic variations in target shape along the sequence, for targets that do not perfectly match the predictions due to variations in projected shape (changes of orientation, distance, etc.) or to maneuvers. Besides, it avoids the initialization of tracks around existing ones, a potential source of instability. The process of shape update with new information should reach a trade-off between the following conflicting requirements:
• it must re-connect the different blobs representing a single target to avoid track-splitting effects. Grouping must adapt to gradual variations in target sizes and shapes due to changes in the distances and orientations of targets.
• grouping should be limited to avoid the connection of image regions originated by different objects.
• when different targets approach each other, it should avoid grouping their image regions, since their tracks could be wrongly updated. As a result, some tracks could be discarded and others deformed after including regions from more than one target.
These heuristics provide useful information for assessing the confidence that may be given to each blob before the track update. Additionally, the predicted track may also be characterized with some heuristics, indicating the confidence that this track represents the motion of a real target and detecting when it has deviated from the real trajectory. They are the following:
So, the shape must be dynamically updated with the information contained in blobs, but the changes must be smooth, avoiding instabilities in scenarios with closely spaced objects.
4. Fuzzy shape definition
The final weight of the gated blobs in the update phase should take into account the aspects mentioned before. Although there is no closed-form expression for this, comparable to statistical residuals, some numeric heuristics, computed with a simple geometrical analysis of blobs and predicted tracks, have been shown to provide helpful indications. They can be used to assess the confidence given to each blob, after it is included into
• Number of missed updates: the number of consecutive frames where no blob was included in the track inner gate.
• Track detected area: conversely to the blob overlapping heuristic, it is the proportion of area, within the predicted inner gate, filled with blobs detected in the current frame.
• Proximity to image borders: this value is equivalent to the one computed for blobs.
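The first two track heuristics can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the detection result is taken to be a binary pixel mask (a list of rows of booleans), the inner gate is given in integer pixel indices with exclusive upper bounds, and the names `update_missed` and `track_detected_area` are hypothetical.

```python
def update_missed(missed: int, blobs_in_inner_gate: int) -> int:
    """Count consecutive frames with no blob inside the track inner gate."""
    return 0 if blobs_in_inner_gate > 0 else missed + 1


def track_detected_area(mask, gate) -> float:
    """Proportion of the inner gate filled with detected pixels.

    mask: list of rows of booleans (True = pixel detected in the current frame).
    gate: (x0, y0, x1, y1) in pixel indices, upper bounds exclusive (assumed).
    """
    x0, y0, x1, y1 = gate
    cells = [mask[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(cells) / len(cells) if cells else 0.0
```

A track whose gate is half covered by detected pixels yields 0.5, while a track with no gated blob for several frames accumulates a growing missed-update count until the next detection resets it.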
4.1 Fuzzy System assessing confidence levels
disjunctions of fuzzy statements about the linguistic variables Lhi, and their consequent fuzzy statements about
The heuristics defined above are the inputs to unknown mathematical functions computing certain confidence levels, both for blobs and for predicted tracks. A rule system based on fuzzy logic has been developed in order to approximate these functions. The rules have been obtained by analyzing the performance of conventional tracking systems under different conditions, depending on the values of the heuristics. They represent the most appropriate actions to take under a set of particular conditions to guarantee track continuity, emulating the decisions taken by an expert. Fuzzy reasoning techniques may be adopted to reproduce this behavior under the conditions specified in the rules, and also to generate the proper output for all intermediate cases.
LCONF. LCONF is a linguistic variable representing blob or track confidence levels, with a set of possible values {lα1,…, lαn}. The Mamdani implication [9] has been chosen to assign the meaning to these fuzzy conditional statements: the fuzzy subset of ordered pairs (hi, α), with hi∈Hi and α∈CONF, of the Cartesian product (lhij x lαk), with degree of membership given by min(µlhij(hi), µlαk(α)). Finally, α is the
defuzzification of LCONF, and CONF represents its numerical domain (universe of discourse of LCONF). The final aspect to be considered is the inference strategy used to manipulate the knowledge contained in the FRA, in order to achieve a global judgment about confidence levels. The compositional rule of inference (CRI) proposed by Zadeh [8] (an approximate extension of the familiar rule of modus ponens) serves as the inference mechanism to obtain the fuzzy subset induced in CONF by a fuzzy statement of the form (Lhi is lhir), through each
The first step to build this system should be the selection of adequate descriptions of heuristics and rules relating them with the outputs: confidence levels for blobs and predictions. To better cope with the intrinsic uncertainty that underlies each of the heuristics, their numerical values should be mapped into qualitative symbolic labels, through a fuzzification process [8], transforming them into linguistic variables.
conditional statement of the FRA. That is, the fuzzy subset of CONF whose membership function is obtained by the max-min product of discretized versions of µlhir(hi) and µlhij x lαk(hi, α), represented with relational matrices [8]. Since there will be several conditional statements forming the FRA, the meaning of LCONF will be the intersection of the intermediate meanings resulting from each application of the CRI (min of all the induced consequent membership functions). Finally, the adopted defuzzification process on LCONF will be a modified version of the Center of Gravity procedure [9], treating the labels separately to generate a weighted sum.
Each linguistic variable will have a set of possible values, labels, defined each as a fuzzy subset for which the value of the linguistic variable serves as a label. So, a fuzzy subset A of a universe of discourse U is characterized by a membership function, µA:U->[0,1], assigning for each element y of U, a number µA(y) representing the degree of membership of y in A. The operation of fuzzification has the effect of transforming a nonfuzzy set or quantity into a fuzzy set.
In this specific application, two systems have been implemented to assess the confidence levels for blobs, αb, and tracks, αp, with five inputs (overlap, density, conflict, blob border, track border) and three inputs (missings, coverage, track border), respectively. Each linguistic variable is defined with three fuzzy sets: small (S), medium (M) and large (L). The membership functions for them and the rules defined (14) are detailed in [8].
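To make the mechanics concrete, the following Python sketch runs a toy fuzzy system with the three labels S, M and L. Everything specific in it is an assumption made for illustration: the piecewise-linear membership functions, the label centers used for defuzzification, and the five example rules are hypothetical and are not the membership functions or rules of the paper. Inputs are fuzzified as singletons (memberships are evaluated at the crisp heuristic value), conjunctions use min, and the output is a weighted sum of label centers in which each rule contributes separately, one simple reading of a label-wise center-of-gravity defuzzification.

```python
# Hypothetical membership functions on the normalized domain [0, 1].
def small(x: float) -> float:
    return max(0.0, 1.0 - 2.0 * x)


def medium(x: float) -> float:
    return max(0.0, 1.0 - abs(2.0 * x - 1.0))


def large(x: float) -> float:
    return max(0.0, 2.0 * x - 1.0)


MEMBERSHIP = {"S": small, "M": medium, "L": large}

# Assumed representative value of each output label of LCONF.
CENTER = {"S": 0.0, "M": 0.5, "L": 1.0}

# Illustrative rules only: (antecedent as {heuristic: label}, consequent label).
RULES = [
    ({"overlap": "L", "conflict": "S"}, "L"),
    ({"overlap": "M", "conflict": "S"}, "M"),
    ({"overlap": "S"}, "S"),
    ({"conflict": "M"}, "M"),
    ({"conflict": "L"}, "S"),
]


def confidence(inputs: dict, rules=RULES) -> float:
    """Crisp confidence in [0, 1] from crisp heuristic values."""
    weighted = total = 0.0
    for antecedent, label in rules:
        # Firing strength: min over the conjunction of antecedent terms.
        strength = min(MEMBERSHIP[lab](inputs[var]) for var, lab in antecedent.items())
        weighted += strength * CENTER[label]
        total += strength
    # If no rule fires, fall back to zero confidence (a design choice of this sketch).
    return weighted / total if total > 0.0 else 0.0
```

With these assumed rules, `confidence({"overlap": 0.75, "conflict": 0.0})` fires the first two rules with strength 0.5 each and returns 0.75, an intermediate value between the "medium" and "large" confidence labels.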
Using these concepts, for heuristic hi, a linguistic variable Lhi is introduced, together with its set of values {lhi1,lhi2,...,lhimi}, whose cardinality is mi. Each term lhij in the set labels a fuzzy subset in the universe of discourse Hi, with membership function µlhij(hi). The fuzzification operation adopted, applied to the precomputed numerical heuristics hi, transforms each of them into a fuzzy singleton [8], a fuzzy subset whose support is a single point in Hi, with membership value equal to one.
4.2 Shape and trajectory update
The estimated target shape will vary very smoothly, according to the confidence levels of the gated blobs. The estimated position (the measured centroid used to update the track vector) will depend both on these blob confidence levels, αbi, and on the predicted track confidence, αp, in order to avoid losing tracks when they deviate from the real trajectory. So, estimated shape (dimensions of box) is the
A fuzzy relational algorithm [8] (FRA) will store the knowledge required to obtain the final confidence level, CONF, both for blobs and tracks involved in each decision. It is composed of a finite set of fuzzy conditional statements of the form IF {Lhi is lhij} THEN {LCONF is lαk}, where their antecedent can be conjunctions and/or
most constrained feature, remaining “locked” as long as the blob confidence levels are not high enough, whereas the estimated position (where the bounding box is located) will be a trade-off between the confidence levels estimated for blobs and for tracks.
conform with the conflict-free blobs (with high confidence levels for association). So, the biases produced by maneuvers are corrected.
With the rectangular simplification considered, only two shape parameters are estimated: length and width, $(\hat{l}_H, \hat{l}_V)$. Considering the horizontal coordinate, the two gated blobs with the minimum and maximum extremes for coordinate x, $(x_{bmin}, x_{bmax})$, are taken into account. Denoting their associated confidence levels, computed by the fuzzy system, as $\alpha_{1H}$, $\alpha_{2H}$, the minimum and maximum values are obtained: $\alpha_{minH} = \min[\alpha_{1H}, \alpha_{2H}]$; $\alpha_{maxH} = \max[\alpha_{1H}, \alpha_{2H}]$
[Figure 2 diagram: Track 1 and Track 2 shown as predicted and updated boxes, with one blob in conflict and the conflict-free blobs]
First, the target horizontal length is updated considering the minimum blob confidence value, $\alpha_{minH}$:
$\hat{l}_H[k] = \alpha_{minH}\,(x_{bmax} - x_{bmin}) + (1 - \alpha_{minH})\,\hat{l}_H[k-1]$ (1)
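Equation (1) is a convex combination of the measured extent and the previous estimate, weighted by the lower of the two blob confidences. A minimal Python sketch follows; the function name `update_length` and its argument names are hypothetical, chosen here to mirror the symbols of the equation.

```python
def update_length(l_prev: float, xb_min: float, xb_max: float,
                  alpha_1: float, alpha_2: float) -> float:
    """Horizontal length update of equation (1).

    l_prev: previous estimate of the horizontal length.
    xb_min, xb_max: extreme x coordinates of the gated blobs.
    alpha_1, alpha_2: confidence levels of the two blobs defining the extremes.
    """
    alpha_min = min(alpha_1, alpha_2)
    return alpha_min * (xb_max - xb_min) + (1.0 - alpha_min) * l_prev
```

For instance, `update_length(10.0, 0.0, 20.0, 1.0, 0.0)` returns 10.0: a single low-confidence extreme blob keeps the length locked, whereas `update_length(10.0, 0.0, 20.0, 0.5, 0.8)` moves it halfway to the measured extent, giving 15.0.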
So, the estimated target length will be modified only if both blobs have enough confidence. Then, the estimated target bounds (location of the box) are updated from the blob with the highest confidence, αmaxH, considering also the value of the track confidence, αp. It is required that αp reaches a minimum threshold, Tp, for the track prediction to be weighted with the blob having the highest confidence. Otherwise, the track prediction is discarded and the box is aligned with the best blob, in order to avoid track loss when the deviation between predictions and detected regions increases. For instance, if the left-hand side blob defining the value xbmin had the highest confidence, the estimated target bounds would be updated as follows:
Figure 2. Shape update with conflicts and maneuvers
Finally, the measured target centroid used to update the estimated track vector is extracted from the set of blobs gated in the updated track contour, after applying the logic explained above to generate the target bounds. To do that, only the portions of blobs within the track box are considered, and they are weighted with their areas, $x_a = \sum_i x_i A_i$, as indicated in figure 3.
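The area-weighted centroid can be sketched as follows in Python. Several points are assumptions of this sketch rather than statements of the paper: blobs are approximated by their bounding boxes, the weights are normalized by the total clipped area so that the result is a position inside the box, and the name `pseudo_blob_centroid` is hypothetical.

```python
def pseudo_blob_centroid(blobs, track_box):
    """Area-weighted centroid of the blob portions lying inside the track box.

    blobs: iterable of boxes (xmin, ymin, xmax, ymax).
    track_box: updated track box (xmin, ymin, xmax, ymax).
    Returns (x, y), or None when no blob overlaps the box.
    """
    tx0, ty0, tx1, ty1 = track_box
    sum_x = sum_y = total_area = 0.0
    for xmin, ymin, xmax, ymax in blobs:
        # Keep only the portion of the blob inside the track box.
        cx0, cy0 = max(xmin, tx0), max(ymin, ty0)
        cx1, cy1 = min(xmax, tx1), min(ymax, ty1)
        width, height = cx1 - cx0, cy1 - cy0
        if width <= 0 or height <= 0:
            continue
        area = width * height
        sum_x += area * (cx0 + cx1) / 2.0
        sum_y += area * (cy0 + cy1) / 2.0
        total_area += area
    if total_area == 0.0:
        return None
    return sum_x / total_area, sum_y / total_area
```

With a track box `(0, 0, 10, 10)` and blobs `(0, 0, 2, 2)` and `(8, 8, 12, 12)`, the second blob is clipped to `(8, 8, 10, 10)`, both portions have area 4, and the resulting centroid is `(5.0, 5.0)`.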
• $\alpha_p > T_p$:
$\hat{x}_{min}[k] = \alpha_{maxH}\,x_{bmin} + (1 - \alpha_{maxH})\,(\hat{x}_{min}[k-1] + \hat{v}_x[k-1]\,T)$
$\hat{x}_{max}[k] = \hat{x}_{min}[k] + \hat{l}_H[k]$
[Figure 3 diagram: blobs 1 to 3 with areas Area2 and Area3 inside the track box of height $\hat{l}_V$; gated blob centroids and pseudo-blob centroid $x_a$]
• αp