
IEEE JOURNAL OF OCEANIC ENGINEERING, VOL. 22, NO. 4, OCTOBER 1997


Curved Shape Reconstruction Using Multiple Hypothesis Tracking

Bradley A. Moran, Member, IEEE, John J. Leonard, and Chryssostomos Chryssostomidis

Abstract— Panoramic sweeps produced by a scanning range sensor often defy interpretation using conventional line-of-sight models, particularly when the environment contains curved, specularly reflective surfaces. Combining multiple scans from different vantage points provides the geometric constraints necessary to solve this problem, but not without introducing new difficulties. Most existing multiple-scan implementations ignore the data correspondence issue. The multiple hypothesis tracking (MHT) algorithm explicitly deals with data correspondence. Given canonical observations extracted from raw scans, the MHT applies multiple behavior models to explain their evolution from one scan to the next. This technique identifies different topological features in the world and assigns the corresponding measurements to them. We apply the algorithm to real sonar scans generated specifically for this investigation. The experiments consist of interrogating a variety of two-dimensional prismatic objects, standing on end in a 1.2-m-deep freshwater tank, from multiple vantage points using a 1.25-MHz profiling sonar system. The results support the validity of the algorithm under the initial assumptions and show its performance degrading gradually when these assumptions fail to characterize the environment adequately. We close with recommendations for extending the approach to more natural underwater settings.

Index Terms— Data association, multitarget tracking, shape reconstruction, sonar.

I. INTRODUCTION

The field of tracking and data association was born out of a need to process the vast amount of information generated by the highly sensitive sensors found in modern surveillance systems. In air and naval defense and in air traffic control, for example, radar and sonar arrays generate huge data loads when exposed to multiple targets, decoys, and clutter. Each of these cases requires establishing correspondence between known targets and observations, providing target state estimates that evolve in agreement with the observed data, and explaining the remaining observations.

Manuscript received August 27, 1995; revised June 15, 1997. This work was supported in part by the Advanced Research Projects Agency under Contract MDA 972-88-C-0040. B. A. Moran is with the Sea Grant Autonomous Underwater Vehicles Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139 USA. J. J. Leonard is with the Department of Ocean Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139 USA. C. Chryssostomidis is with the Sea Grant College Program, Massachusetts Institute of Technology, Cambridge, MA 02139 USA. He is also with the Department of Ocean Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139 USA. Publisher Item Identifier S 0364-9059(97)06431-5.

Fig. 1. An example to illustrate multitarget tracking with clutter. The squares represent the state estimates at time k for targets T1 and T2. The arrows based on these estimates, together with the uncertainty ellipses, indicate predictions at time k+1. Observation z3 confuses the assignment of z1 and z2, making the simplest correspondence (each prediction with its nearest measurement) possibly erroneous.

Establishing the correspondence between an observation and a target requires a means to assess the quality of the assignment: a comparison between the actual and the expected values. Predicting the expected values of an observation, in turn, requires a target state estimate based on at least one previously assigned measurement. The inaccuracy and uncertainty of each observation make these mutually dependent tasks difficult. Multitarget tracking addresses both issues simultaneously, resolving the inaccuracy in the values of measurements concurrently with the uncertainty in their origins. Surveillance applications typically implement tracking as a discrete-time recursive filter, updated incrementally each time the sensor generates a new scan. To illustrate the concept, Fig. 1 shows the time-$k$ estimates for two confirmed targets, $T_1$ and $T_2$. These estimates reflect all previously available and current information, denoted $Z^k$. At time $k+1$ a sensor reports three distinct observations, $z_1$, $z_2$, and $z_3$, each of unknown origin. Predictions at time $k+1$, based on the previous state estimates, provide a basis against which to compare the observations. The uncertainty in each prediction determines the size and orientation of its corresponding elliptical validation gate. In the absence of $z_3$, the most natural partitioning assigns measurements $z_1$ and $z_2$ to targets $T_1$ and $T_2$, respectively. Once $z_3$ appears, the obvious solution vanishes. Three equally likely explanations now exist for each measurement: new target, confirmed target, or false alarm.
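To make the combinatorics of Fig. 1 concrete, the short Python sketch below enumerates every joint explanation of a single scan under the rules stated above: each observation comes from a confirmed target whose gate contains it, from a new target, or from a false alarm, and no confirmed target explains two observations in the same scan. The gate memberships assigned to z1, z2, and z3 are a hypothetical reading of Fig. 1, and `enumerate_hypotheses` is an illustrative helper, not part of any tracking library.

```python
from itertools import product

NEW, FALSE_ALARM = "new target", "false alarm"

def enumerate_hypotheses(observations, gates):
    """Enumerate joint explanations of one scan.

    gates[z] lists the confirmed targets whose validation gate contains z.
    Each observation may come from a gated target, a new target, or a false
    alarm; a confirmed target may explain at most one observation per scan.
    """
    choices = [list(gates[z]) + [NEW, FALSE_ALARM] for z in observations]
    hypotheses = []
    for combo in product(*choices):
        confirmed = [c for c in combo if c not in (NEW, FALSE_ALARM)]
        if len(confirmed) == len(set(confirmed)):  # no target used twice
            hypotheses.append(dict(zip(observations, combo)))
    return hypotheses

# Hypothetical gating for Fig. 1: z3 falls inside both validation gates.
gates = {"z1": ["T1"], "z2": ["T2"], "z3": ["T1", "T2"]}
print(len(enumerate_hypotheses(["z1", "z2"], gates)))        # 9
print(len(enumerate_hypotheses(["z1", "z2", "z3"], gates)))  # 30
```

A single ambiguous observation more than triples the number of joint explanations for one scan; compounded over successive scans, this growth is what forces a practical MHT to prune its hypothesis tree.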

0364-9059/97$10.00 © 1997 IEEE



Fig. 2. A typical scan. (a) The raw returns shown using simple line-of-sight inversion. The unfilled triangle represents the location of the sonar. (b) The environment that corresponds to the scan contains an 18-cm-diameter aluminum cylinder and a flooded triangle with thin aluminum walls, standing on end in a 1.3-m-deep freshwater tank. The far (concrete) wall of the tank, rough relative to the wavelength of the 1.25-MHz sensor, contrasts with the (glass) windows in the foreground.

Fig. 3. Canonical observations corresponding to Fig. 2. These RCD's result from a narrow-kernel median filter (three points) and extraction parameters of 1 cm (range tolerance) and 3.6° (minimum angular width).

A. Canonical Observations

As in surveillance applications, our sensor scans an area and generates target observations that evolve from one scan to the next. We define targets as the points on the surface of an object that satisfy the critical alignment conditions of isolated targets discussed in [1]. As a result of knife-edge diffraction, sharp corners satisfy these conditions. Points on a smooth face where the normal vector coincides with the sensor axis also produce isolated returns due to specular reflection. Fig. 2 shows a typical scan from the sensor taken in the MIT Testing Tank facility. The rear wall of the tank violates the isolated target assumption because its surface is rough relative to the 1.2-mm wavelength of the sensor. In contrast, the glass observation windows of the tank are smooth, i.e., specular reflectors. The two objects in the tank are a triangular prism, made by folding aluminum sheets, and an aluminum cylinder with thick walls. Isolated targets allow us to take advantage of the physical sonar models discussed by Kuc [2], Leonard [3], and others.

Observations show that, due to the finite beamwidth of acoustic propagation, consecutive returns from a scanning sensor appear quite similar in range, hence the term regions of constant depth (RCD's). Fig. 3 shows the RCD's extracted from the scan in Fig. 2. To account for minor variations in adjacent returns, we introduce a range tolerance parameter for making comparisons: a sequence of consecutive returns, each differing from its adjacent neighbors by less than this tolerance, constitutes an RCD. In practice, we also introduce an angular parameter that acts as a threshold on the minimum width an RCD must span. An RCD provides two pieces of information: range and bearing. The minimum range of its member returns defines the net range. The angular centroid of the individual returns whose range equals the net range defines the net bearing angle.
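The extraction rule above can be sketched in Python as follows. This is a minimal illustration, assuming one scan is given as increasing bearings (radians) with matching ranges (meters), no wrap-around at the end of a full sweep, and no prior median filtering; the names `extract_rcds`, `range_tol`, and `min_width` are hypothetical stand-ins for the range tolerance and minimum-width parameters described in the text.

```python
import math
from dataclasses import dataclass

@dataclass
class RCD:
    range_m: float      # net range: minimum range of the member returns
    bearing_rad: float  # net bearing: centroid of the returns at the net range
    width_rad: float    # angular span of the region

def extract_rcds(bearings, ranges, range_tol, min_width):
    """Group consecutive returns into regions of constant depth (RCDs)."""
    rcds = []
    start = 0
    n = len(ranges)
    for i in range(1, n + 1):
        # Close the current run at the end of the scan or at a range jump.
        if i == n or abs(ranges[i] - ranges[i - 1]) >= range_tol:
            width = bearings[i - 1] - bearings[start]
            if width >= min_width:  # discard regions that are too narrow
                members = range(start, i)
                net_range = min(ranges[j] for j in members)
                at_min = [bearings[j] for j in members if ranges[j] == net_range]
                rcds.append(RCD(net_range, sum(at_min) / len(at_min), width))
            start = i
    return rcds

bearings = [math.radians(d) for d in range(10)]
ranges = [1.00, 1.004, 1.002, 1.001, 0.50, 0.505, 0.50, 2.0, 2.0, 3.0]
for rcd in extract_rcds(bearings, ranges, range_tol=0.01, min_width=math.radians(1.5)):
    print(f"{rcd.range_m:.3f} m at {math.degrees(rcd.bearing_rad):.1f} deg")
# prints "1.000 m at 0.0 deg" and "0.500 m at 5.0 deg"
```

In this example the two-return run at 2.0 m spans only 1° and is rejected by `min_width`; the three-point median filter mentioned in the Fig. 3 caption would be applied to `ranges` before this step.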

II. DATA ASSOCIATION TECHNIQUES

Fortmann et al. [4] offer the joint probabilistic data association filter (JPDAF) for tracking a known number of targets in the presence of spurious observations. The principal drawback of this approach is that it never explicitly attempts to explain the origin of the observations, partly because the JPDAF lacks a method for track initiation. It is a target-oriented filter that seeks to maintain tracks, assumed to exist already, corresponding to a known number of targets. The joint likelihood track formation method of Morefield [5] offers a potential solution to this issue by relaxing the condition that the number of targets is known.

The multiple hypothesis tracking (MHT) filter offers an alternative approach that combines a Bayesian framework for calculating the probability of associations with an ability to initiate new tracks. Reid [6] pioneered MHT for tracking point targets in the far field, e.g., radar tracking of aircraft. The MHT technique shows promise in a variety of applications, such as surveillance, air traffic control, collision avoidance, and navigation. The MHT distinguishes itself from the previous examples by being a measurement-based approach. Rather than traverse the list of targets to update each state estimate, it attempts to explain the origin of each measurement. To do so, it chooses from three possible explanations: an already known target for which a state estimate exists, a previously undetected target, or a false alarm. Perhaps its most important feature is that it defers making assignment decisions until later iterations of the filter. Rather than rely solely on current observations to make decisions, it also takes into account the observations of subsequent scans. In essence, it selects the most likely assignment based on how subsequent measurements reinforce certain data associations and refute others. Although the MHT algorithm seldom appears in the literature, several important references exist. Mori et al. [7] thoroughly review the mathematics of the MHT, concurrently relaxing the original requirement that target distribution densities be known a priori. Kurien [8] addresses practical aspects of its efficiency for purposes of developing real-time implementations.


Fig. 4. As the sensor moves from one location to another, the RCD’s it produces evolve such that their arcs mutually intersect at the corner. This behavior suggests that we track the observations using a vertex model that explicitly represents its location.

III. THE MODEL-BASED MHT FILTER

In Reid's [6] original implementation, the same dynamic model applies to all targets. Cox and Leonard [9] extend the MHT to a broader class of applications by allowing multiple behavior models, hence the model-based MHT. This more general tracking filter works for different types of targets as well as for different behaviors of the same type of target. Already used to interpret in-air sonar measurements in land robotics, this approach offers the most promise for our application.

The implementation described herein results from a combined effort. Cox and Miller [10] have developed a public domain body of source code that handles the multiple hypothesis portion of the MHT technique. Their framework, the MHT kernel, provides the generic methods shared by all model-based MHT applications: generating continuation hypotheses, propagating likelihoods from parent to child hypotheses, and pruning the hypothesis and track trees. We have supplied the MHT kernel with our own target models and estimation techniques. The methods developed locally for our implementation include:
• track initiation, to calculate the initial state estimate used to start a track from an observation;
• measurement validation, to decide whether a particular measurement might belong to a specific track;
• track updating, to detail how a model updates the state estimates of a target within a single behavior and how it generates estimates across differing behaviors;
• likelihood calculation, to assess the quality of the hypothesis that a measurement belongs to a track.
Much of the complexity of the algorithm lies in the third point, clarified later in this section.

A. Target Model Representations

The objects used in our experiments, described later in Section IV, contain two types of geometric features: vertices and faces.¹ Vertices correspond to the sharp edges of our objects, and faces correspond to the smooth surfaces between them, whether straight or curved. Two types of features imply the need for at least two target behavior models. We have instead developed three different behaviors: one for vertices and two for faces. (Our ability to handle faces improves with the inclusion of two distinct behaviors because a face can be planar or curved.) For each target model, we have a different measurement model used to predict observations. As the canonical observation contains a bearing angle $\phi$ and a range $r$, each of the three measurement models shares the form

$$\mathbf{z}(k) = \begin{bmatrix} \phi(k, \mathbf{x}_t) \\ r(k, \mathbf{x}_t) \end{bmatrix} + \mathbf{w}(k) \tag{1}$$

in which the parameters $\phi$ and $r$ depend on both time and the true state of the feature $\mathbf{x}_t$. The vector $\mathbf{w}(k)$ represents random noise that corrupts the measurements.

¹ We borrow from geometric modeling the terms face and vertex to describe parts of an object [11]. Even though we have implemented these algorithms in two dimensions, we wish to establish a generic approach that readily extends to a full three-dimensional (3-D) treatment of the measurements.

1) The Vertex Feature: As the sensor moves past the sharp corner of an object, we observe that the corresponding RCD's evolve such that their circular arcs remain in contact with the corner, as shown in Fig. 4. This behavior suggests a target model that exploits the geometric constraint of mutual intersection. The appropriate representation consistent with this constraint models the geometry of a vertex target by its Cartesian coordinates. The chosen model represents the geometry of a sharp corner globally because it describes the state of the feature in toto. To predict the bearing and range values for an observation of a vertex made from a known sensor location, we apply the measurement function given by

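$$\mathbf{h}_v(\mathbf{x}_t) = \begin{bmatrix} \operatorname{atan2}(p_y - y_s,\; p_x - x_s) \\ \sqrt{(p_x - x_s)^2 + (p_y - y_s)^2} \end{bmatrix} \tag{2}$$

in which $\mathbf{x}_t = [p_x, p_y]^T$ holds the vertex coordinates and $(x_s, y_s)$ is the sensor location. Equation (2) instantiates the general form given in (1). Note that we must exercise particular care to establish the correct quadrant of the bearing angle when predicting an observation.²

² Most computer languages offer a quadrant-sensitive method. C and C++, for example, have the math library function atan2(y, x).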
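A minimal Python sketch of the vertex measurement function (2), assuming bearings measured counterclockwise from the x axis in radians and omitting the noise term of (1). It also shows why the footnote recommends atan2: a plain arctangent of the ratio loses the quadrant. The function name `predict_vertex` is illustrative only.

```python
import math

def predict_vertex(vertex, sensor):
    """Predict (bearing, range) of a vertex (px, py) seen from a sensor at (xs, ys)."""
    dx = vertex[0] - sensor[0]
    dy = vertex[1] - sensor[1]
    bearing = math.atan2(dy, dx)  # quadrant-correct, in [-pi, pi]
    return bearing, math.hypot(dx, dy)

# A vertex behind and to the left of the sensor.
bearing, rng = predict_vertex((-1.0, -1.0), (0.0, 0.0))
naive = math.atan(-1.0 / -1.0)  # pi/4: points in the opposite direction
print(round(math.degrees(bearing), 1), round(math.degrees(naive), 1), round(rng, 4))
# prints: -135.0 45.0 1.4142
```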



Fig. 5. Observations of smooth faces. The point of contact between an RCD and a face moves along with the sensor. It traces the boundary of the object, remaining tangent to both (a) planar and (b) curved faces.

2) The Face Feature: Unlike the rather straightforward vertex, whose contact point with an RCD remains stationary, the contact or ground point of a face measurement moves as the sensor location changes. The ground point shadows the sensor, with the RCD arc remaining tangent to the face, as shown for both planar and curved faces in Fig. 5. The combination of specular reflection and sensor beamwidth indicates that we should expect this kind of behavior when observing a face. It suggests that we use estimates of the local geometry of the face to track its observations, because the tangent to a smooth face is a differential property, independent of the global shape of a curved face. The technique parallels the practice, found in computational geometry, of reconstructing a space curve from its derivatives using integration. We can trace out the shape of a curve exactly, given initial conditions and continuous knowledge of its differential properties (tangent, curvature, and higher order intrinsics) [12]. Our approach applies this idea using approximations. Given a starting point that depends on the first observation, we step along the shape of the face using local estimates that depend on subsequent observations.

Two target models, named plane and sculpt, estimate the local shape of a face, as shown in Fig. 6 with their accompanying measurement equations $\mathbf{h}$. The state estimate for the plane model, so named because it estimates the two-dimensional (2-D) tangent plane, contains two parameters: an angle of orientation and a distance from the origin. Fig. 6(a) demonstrates how quickly the tangent plane deviates from the true shape of a curved face: the higher the curvature, the less accurate the predictions. Because of this shortcoming, we also implement the sculpt model, named for its capacity to handle free-form sculpted shapes. This higher order state estimate incorporates a circular arc, with center coordinates and a radius of curvature, to represent the local shape of a face. It predicts observations more accurately than a plane because it more closely resembles the underlying shape of a highly curved face, as demonstrated by Fig. 6(b).

Fig. 6. Face model primitive states: (a) plane (orientation angle and distance from the origin) and (b) sculpt (center coordinates and radius of curvature). The measurement model h assumes a sensor located at (x, y). We add π to the bearing angles when taking the negative branch of the absolute values to resolve the ambiguity arising from the relative sensor location (near or far side of a plane; convex or concave sculpt).

B. Measurement Validation

Though not strictly necessary, measurement validation makes the tracking filter more efficient by limiting the number of association hypotheses generated. It only attempts those associations that meet a minimum acceptance criterion. In this sense, measurement validation acts as a threshold that prevents the filter from wasting computational resources on highly unlikely data-to-target assignments. For each new scan, the filter uses the available state estimates from the previous cycle to predict the observations that should have occurred. It applies the appropriate measurement model for each target estimate in the existing tracks. Since an exact match never occurs, the filter builds a tolerance region around the prediction to allow for disagreement. The shape and orientation of this region depend on the uncertainty in the state estimate and the nominal measurement uncertainty. An observation that lies within the region is sufficiently close to its expected value to be feasible, resulting in the generation of an association hypothesis for the target being considered.

Using the notation of Bar-Shalom and Fortmann [13], we now detail how to validate a particular observation using the target state estimate $\hat{\mathbf{x}}(k \mid k)$. The filter first makes a state prediction $\hat{\mathbf{x}}(k+1 \mid k)$ to describe any changes that may have occurred between time $k$, when the estimate was calculated, and the current time $k+1$. Given the known location of the sensor, it predicts the expected observation

$$\hat{\mathbf{z}}(k+1 \mid k) = \mathbf{h}\big(\hat{\mathbf{x}}(k+1 \mid k)\big) \tag{3}$$

by applying the measurement model to the state prediction. The shape of the tolerance region around this prediction follows from the innovation covariance of the measurement

$$\mathbf{S}(k+1) = \nabla\mathbf{h}\,\mathbf{P}(k+1 \mid k)\,\nabla\mathbf{h}^T + \mathbf{R}(k+1). \tag{4}$$

The Jacobian matrix $\nabla\mathbf{h}$ of the measurement function maps the uncertainty of the state prediction, $\mathbf{P}(k+1 \mid k)$, into measurement space, where it is combined with the measurement covariance $\mathbf{R}(k+1)$. Indicating the level of agreement between the actual and predicted observations, the innovation

$$\boldsymbol{\nu}(k+1) = \mathbf{z}(k+1) - \hat{\mathbf{z}}(k+1 \mid k) \tag{5}$$

specifies the difference between the two. When normalized by its covariance $\mathbf{S}(k+1)$, the innovation establishes a statistical distance to assess the likelihood that the measurement belongs to the target. To wit, the inequality

$$\boldsymbol{\nu}^T(k+1)\,\mathbf{S}^{-1}(k+1)\,\boldsymbol{\nu}(k+1) \le g^2 \tag{6}$$

establishes a validation gate whose size follows from the value of the threshold parameter $g$, akin to a number of standard deviations. Observations that satisfy the inequality all fall within an ellipsoid in measurement space centered on the predicted value of the measurement. Table I gives the probability mass contained inside validation gates for various values of $g$. These values follow from the chi-square distribution of the normalized innovation squared, the left side of (6), with number of degrees of freedom equal to the dimensionality of the measurements.

TABLE I
PROBABILITY MASS IN VARIOUS VALIDATION GATES

C. Generating Hypotheses

The MHT kernel generates hypotheses that enumerate all possible explanations for each of the observations in every new scan [10].
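Before turning to the hypothesis types, the face measurement models of Fig. 6 and the validation test (3) through (6) can be sketched together in Python. This is an illustration under stated assumptions, not the authors' code: the plane state is taken as (alpha, d), the angle of the plane normal and its signed distance from the origin; the sculpt state as (xc, yc, rho), the arc center and radius; bearings are in radians; the gate threshold `g` is treated as a number of standard deviations, so the normalized innovation squared is compared with g squared; the Jacobian is computed by central differences rather than analytically; and all function names are hypothetical. The pi offset on the negative branch follows the Fig. 6 caption.

```python
import math

def wrap(angle):
    """Wrap an angle to [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def predict_plane(state, sensor):
    """Plane (alpha, d): the sensor sees the foot of the perpendicular."""
    alpha, d = state
    s = d - (sensor[0] * math.cos(alpha) + sensor[1] * math.sin(alpha))
    # Negative branch: sensor on the far side of the normal, so add pi.
    return (alpha, s) if s >= 0 else (wrap(alpha + math.pi), -s)

def predict_sculpt(state, sensor):
    """Sculpt (xc, yc, rho): the sensor sees the nearest point of the circle."""
    xc, yc, rho = state
    bearing = math.atan2(yc - sensor[1], xc - sensor[0])
    s = math.hypot(xc - sensor[0], yc - sensor[1]) - rho
    # Negative branch: sensor inside the circle (concave face), so add pi.
    return (bearing, s) if s >= 0 else (wrap(bearing + math.pi), -s)

def jacobian(h, x, sensor, eps=1e-6):
    """2 x n Jacobian of h at x by central differences."""
    H = [[0.0] * len(x) for _ in range(2)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        zp, zm = h(xp, sensor), h(xm, sensor)
        H[0][j] = wrap(zp[0] - zm[0]) / (2 * eps)
        H[1][j] = (zp[1] - zm[1]) / (2 * eps)
    return H

def validate(z, x_pred, P_pred, R, h, sensor, g):
    """Gate test (6): return (inside_gate, normalized innovation squared)."""
    n = len(x_pred)
    z_hat = h(x_pred, sensor)                                   # (3)
    H = jacobian(h, x_pred, sensor)
    HP = [[sum(H[r][k] * P_pred[k][c] for k in range(n)) for c in range(n)]
          for r in range(2)]
    S = [[sum(HP[r][k] * H[c][k] for k in range(n)) + R[r][c] for c in range(2)]
         for r in range(2)]                                     # (4)
    nu = [wrap(z[0] - z_hat[0]), z[1] - z_hat[1]]               # (5), bearing wrapped
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    d2 = (S[1][1] * nu[0] ** 2 - (S[0][1] + S[1][0]) * nu[0] * nu[1]
          + S[0][0] * nu[1] ** 2) / det                         # nu^T S^-1 nu
    return d2 <= g * g, d2

def gate_mass_2d(g):
    """Probability mass inside the gate for 2-D measurements (chi-square, 2 dof)."""
    return 1.0 - math.exp(-g * g / 2.0)

plane = [0.0, 1.0]                          # the line x = 1, normal along +x
P_pred = [[1e-4, 0.0], [0.0, 1e-4]]
R = [[1e-4, 0.0], [0.0, 1e-4]]              # assumed measurement covariance
print(validate((0.0, 1.01), plane, P_pred, R, predict_plane, (0.0, 0.0), g=3.0))
print(validate((0.0, 2.00), plane, P_pred, R, predict_plane, (0.0, 0.0), g=3.0))
print(round(gate_mass_2d(3.0), 4))          # 0.9889
```

The first observation lies well inside the gate, while the second, a full meter beyond the predicted plane, is rejected before any hypothesis is generated for it.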
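From the point of view of explaining observations, the following hypotheses are possible.
• Track Initiation: each observation has the opportunity to generate a start hypothesis.
• Track Continuation: the tracks in which the observation validates must be updated.
• False Alarm: spurious observations sometimes occur as a result of random clutter in the environment.
The last of these becomes the default explanation when no other association emerges with sufficiently high likelihood, because we would rather ignore a measurement than incorrectly assign it to a target. Three different track continuation hypotheses offer all possible explanations of the measurements: 1) a track may receive a skip hypothesis to provide for track continuation in the absence of new information; 2) a track may receive an end hypothesis to explain the possibility that a target ceases to exist (disabled for the experiments in Section IV); and 3) a track may receive assignment hypotheses, which require an update for each state estimate. In general, the provisions for deleting tracks and for tracks that skip measurements make the filter more robust to real data, because they allow it to respond uniformly when the sensing system fails to detect an existing target. Fig. 7 illustrates with a block diagram the structure in which this enumeration occurs.

An important point merits attention. Though the face model generates two track continuation hypotheses (plane and sculpt) for each valid observation, our implementation of the MHT filter applies a feasibility check after calculating the state estimates. Conditions occasionally arise in the data that make the estimate calculations numerically ill-conditioned, an infinite-radius sculpt for example. Should this occur, the filter removes the track continuation hypothesis from the newly generated set. We devote more attention to the specifics of this test in our discussion of the state estimate calculations later in this section.

1) The Start Hypothesis: As previously indicated, each observation initiates a new track for both classes of geometric features: sharp corner and smooth face. Ideally, a single observation would uniquely initialize all independent parameters in each of the different target models. A problem arises when the state vector contains more degrees of freedom than one measurement can satisfy. Consider, for example, tracking a point target using a sensor that measures only the bearing angle. A single observation constrains the location of the point to lie along the sensor axis, but it provides no unique solution for the full state vector. Our approach limits the track initiation hypotheses to those behavior models that allow full state initialization from the information available in a single observation. The vertex and plane target models both meet this criterion, but the sculpt model does not. The sculpt behavior emerges from a previously initialized plane behavior after the second validated observation has occurred. For the vertex and plane target models, we create an initial state estimate by applying an inverse measurement function $\mathbf{g}$ to the initial observation

$$\hat{\mathbf{x}}(k \mid k) = \mathbf{g}\big(\mathbf{z}(k)\big) \tag{7}$$

which relies on the assumption that the measurement noise averages out, $E[\mathbf{w}] = \mathbf{0}$.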
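For the vertex and plane models, the inverse function in (7) has a closed form: the vertex lies at range r along bearing phi from the sensor, and the plane passes through that same point with its normal along the sensor axis. The Python sketch below computes these initial states together with a first-order initial covariance, G R G^T, where G is the Jacobian of the inverse function with respect to the measurement, following the Taylor-series idea the text attributes to Smith et al. [14]. The plane parameterization (normal angle alpha, signed distance d), the noise levels, and all function names are assumptions for illustration.

```python
import math

def propagate(G, R):
    """First-order covariance of g(z): G R G^T for 2 x 2 matrices."""
    GR = [[sum(G[i][k] * R[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return [[sum(GR[i][k] * G[j][k] for k in range(2)) for j in range(2)] for i in range(2)]

def init_vertex(z, sensor, R):
    """Vertex start: place the corner at range r along bearing phi."""
    phi, r = z
    xs, ys = sensor
    x0 = [xs + r * math.cos(phi), ys + r * math.sin(phi)]
    G = [[-r * math.sin(phi), math.cos(phi)],   # d(px)/d(phi), d(px)/d(r)
         [r * math.cos(phi), math.sin(phi)]]    # d(py)/d(phi), d(py)/d(r)
    return x0, propagate(G, R)

def init_plane(z, sensor, R):
    """Plane start: normal angle alpha = phi, signed distance d from the origin."""
    phi, r = z
    xs, ys = sensor
    x0 = [phi, xs * math.cos(phi) + ys * math.sin(phi) + r]
    G = [[1.0, 0.0],                                     # d(alpha)/d(phi, r)
         [-xs * math.sin(phi) + ys * math.cos(phi), 1.0]]  # d(d)/d(phi, r)
    return x0, propagate(G, R)

R = [[math.radians(1.0) ** 2, 0.0], [0.0, 0.002 ** 2]]   # assumed noise levels
x0, P0 = init_vertex((math.pi / 2, 0.232), (0.0, 0.0), R)
print([round(v, 3) for v in x0])                  # [0.0, 0.232]
print(init_plane((0.0, 1.0), (2.0, 0.0), R)[0])   # [0.0, 3.0]
```

Because d is allowed to be negative, `init_plane` needs no branch, whereas the forward plane prediction must choose between bearing alpha and alpha + pi.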
Ideally, a single observation would uniquely initialize all independent parameters in each of the different target models. A problem arises when the state vector contains more degrees of freedom than can be satisfied by one measurement. Consider tracking a point target, for example, using a sensor that measures only the bearing angle. A single observation constrains the location of the point to lie along the sensor axis, but it provides no unique solution for the full state vector. Our approach limits the track initiation hypotheses to only those behavior models which allow full state initialization from the available information in a single observation. The vertex and plane target models both meet this criterion, but the sculpt model does not. The sculpt behavior emerges from a previously initialized plane behavior after the second validated observation has occurred. For the vertex and plane target models, we create an initial state estimate by applying an inverse measurement function to the initial observation (7) which relies on the assumption that the measurement noise averages out, .

Fig. 7. The model-based MHT filter. Each track has an opportunity to validate the observations in the current scan ($k+1$). The filter generates a continue hypothesis for each match. In addition, the filter generates both false alarm and track initiation hypotheses for all observations and skip hypotheses for all existing tracks. (All child hypotheses are generated inside the dashed box.) After calculating the likelihoods for the new set of hypotheses, it finally makes a firm decision regarding the observations $N$ scans back in time.

For a sensor located at $(x_s, y_s)$ that records observation $\mathbf{z} = [\phi, r]^T$, we have the vertex and plane state estimates

$$\hat{\mathbf{x}}_v = \begin{bmatrix} x_s + r\cos\phi \\ y_s + r\sin\phi \end{bmatrix} \tag{8}$$

$$\hat{\mathbf{x}}_p = \begin{bmatrix} \phi \\ x_s\cos\phi + y_s\sin\phi + r \end{bmatrix} \tag{9}$$

in which the plane state holds the orientation angle and the signed distance from the origin. Unlike its corresponding measurement prediction, which occasionally adds $\pi$ to the calculated bearing angle because of the restriction that $r \ge 0$, the plane model initializes unambiguously because we allow both positive and negative values for the distance from the origin.

The uncertainty of the estimate is its mean squared error. To calculate the uncertainty in the initial target state estimate $\hat{\mathbf{x}}$, we develop an idea from Smith et al. [14]. Expanding the inverse measurement model $\mathbf{g}$ (the inverse of the measurement function $\mathbf{h}$) in a truncated Taylor series

$$\hat{\mathbf{x}} = \mathbf{g}(\mathbf{z}) \approx \mathbf{g}\big(\mathbf{h}(\mathbf{x}_t)\big) + \nabla\mathbf{g}\,\mathbf{w} = \mathbf{x}_t + \nabla\mathbf{g}\,\mathbf{w} \tag{10}$$

with Jacobian $\nabla\mathbf{g}$, the error in the estimate takes the value $\tilde{\mathbf{x}} = \hat{\mathbf{x}} - \mathbf{x}_t \approx \nabla\mathbf{g}\,\mathbf{w}$. Relying again on the assumption that the measurements are normally distributed about their expected value, the uncertainty becomes

$$\mathbf{P} = E[\tilde{\mathbf{x}}\tilde{\mathbf{x}}^T] \approx \nabla\mathbf{g}\,E[\mathbf{w}\mathbf{w}^T]\,\nabla\mathbf{g}^T = \nabla\mathbf{g}\,\mathbf{R}\,\nabla\mathbf{g}^T \tag{11}$$

in which we recognize the measurement covariance $\mathbf{R}$.

Beyond initial state and uncertainty estimates, a track start hypothesis requires an initial likelihood of being true. One approach, quite common in the limited body of MHT literature, models new features and false alarms as either Poisson or uniform distributions. It then combines these distributions with the detection probability for the sensor to arrive at the probability of the assignment event [6], [15]. The specifics of the particular distribution chosen, in general, depend on both environmental factors and the performance characteristics of the sensor system. In theory, we can measure these parameters and arrive at a reasonable value for the initial probability. Lacking a reasonable estimate for such distributions, we instead strive for a very simple treatment of initial likelihood. We assign constant and uniform values to the likelihood of new features and false alarms such that the possible classifications are collectively exhaustive. Since we have two different types of features, we assign to each new feature hypothesis an initial likelihood of 0.33 and to the false alarm hypothesis a likelihood of 0.34. Because we would rather discard an observation than incorrectly classify it, a false alarm has slightly higher likelihood than a new feature. Subsequent observations of actual features increase the likelihood of the new feature tracks, whereas the likelihood of a false alarm assignment remains static. If none of the subsequent observations validates in any of the feature tracks, the false alarm assignment dominates.

D. Recursive Likelihood Calculation

The model-based MHT considers a vast multitude of possible assignment hypotheses by growing a hierarchical tree of assignments to enumerate the possible explanations for the observations made. Unfortunately, the relentless branching in the assignment tree at each time step causes exponential complexity in the algorithm.
The need arises to prune the tree in order to ensure the computational tractability of the filter. The pruning strategies described later in this section rely on having an estimate of the likelihood of each enumerated hypothesis. Let the notation $\theta(k+1)$ indicate an arbitrary assignment event for the current scan $Z(k+1)$. It associates some new measurements with established tracks, some with new targets, and some with false alarms. We construct a new association history at time $k+1$ for this particular assignment as the joint event

$$\Theta^{k+1} = \{\Theta^{k}, \theta(k+1)\} \tag{12}$$

recursively defined in terms of a time-$k$ association history $\Theta^{k}$ (the parent hypothesis). Our goal is to approximate the likelihood of this event, conditioned on the cumulative measurement set $Z^{k+1}$. In his original development of the MHT, Reid [6] gives the derivation for the actual event probability. Applying Bayes' rule, he writes

$$P(\Theta^{k+1} \mid Z^{k+1}) = \frac{1}{c}\, p\big(Z(k+1) \mid \theta(k+1), \Theta^{k}, Z^{k}\big)\, P\big(\theta(k+1) \mid \Theta^{k}, Z^{k}\big)\, P(\Theta^{k} \mid Z^{k}) \tag{13}$$

which has normalization constant $c$. The three terms in this calculation reflect the following: 1) the probability that the particular observations will occur, conditioned on the current assignment, the previous assignments, and earlier data; 2) the probability that the assignment event correctly associates the current observations, also conditioned on the previous assignment hypothesis and earlier data; and 3) the probability that the parent hypothesis contained the correct assignments. Our approach parallels this derivation. For the first term, we provide the statistical distance of the measurement, which assesses how well it agrees with the estimates. For the second term, we provide the likelihood of the type of event that has occurred, i.e., skip or continue. The previous iteration of the filter makes the last term available, and the constant is common to all hypotheses, so we ignore it. If $\Lambda^{k}$ represents the log-likelihood of the parent hypothesis, then for the current hypothesis we find

$$\Lambda^{k+1} = \lambda_d + \lambda_e + \Lambda^{k} \tag{14}$$

with $\lambda_d$ and $\lambda_e$ referring to the distance and event log-likelihoods. The association event describes the specific means by which a track evolves, i.e., with or without a skipped measurement. For its event log-likelihood, we use

$$\lambda_e = \begin{cases} \ln P_D & \text{if assignment} \\ \ln(1 - P_D) & \text{if skip} \end{cases} \tag{15}$$

with $P_D$ equal to the probability that the sensor will detect the particular feature under consideration. The detection probability of a world feature follows from the specific characteristics of the sensing and extracting systems and the nature of the target. In theory, we can measure these parameters directly, through careful observations of sensor performance under realistic operating conditions. The interrelationships between features and the motion of the sensor make this task extremely difficult, however. In practice, we treat the probability of detection much as we establish the likelihood of a start track hypothesis, as described earlier. Namely, each feature has the same probability of detection, equal to 2/3. We deliberately choose a value higher than 1/2 so that the algorithm favors tracks with consecutive observations, the upper branch in (15), and disfavors those with skipped observations.

The distance likelihood borrows from the validation gate, previously defined in (6). Recall that the gate acts as a tolerance region, centered on the expected observation, which must include the actual measurement in order for the filter to even generate an association hypothesis. The test for this condition requires calculating the statistical distance between the measurement and its expectation. Similarly, for the first term in the hypothesis log-likelihood (14), we assume that the observation has a normal distribution around its predicted value. Equivalently stated, the innovation $\boldsymbol{\nu}$ has the zero-mean Gaussian probability density function

$$p(\boldsymbol{\nu}) = \frac{1}{\sqrt{\det(2\pi\mathbf{S})}} \exp\!\left(-\tfrac{1}{2}\,\boldsymbol{\nu}^T\mathbf{S}^{-1}\boldsymbol{\nu}\right) \tag{16}$$

with covariance $\mathbf{S}$, defined earlier in (4). Reflecting such a distribution, we find for the distance log-likelihood

$$\lambda_d = \begin{cases} -\tfrac{1}{2}\,\boldsymbol{\nu}^T\mathbf{S}^{-1}\boldsymbol{\nu} - \tfrac{1}{2}\ln\det(2\pi\mathbf{S}) & \text{if assignment} \\ 0 & \text{if skip.} \end{cases} \tag{17}$$
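The recursion (14), with the event term (15) and the Gaussian distance term (17), reduces to a few lines of arithmetic. The Python sketch below is a minimal illustration under assumptions: the distance term is taken as the log of the zero-mean Gaussian density of a two-dimensional innovation, and the function names are hypothetical. The constants are the values stated in the text: a detection probability of 2/3, and initial likelihoods of 0.33 for each new-feature hypothesis and 0.34 for a false alarm.

```python
import math

P_DETECT = 2.0 / 3.0               # detection probability stated in the text
LOG_NEW_FEATURE = math.log(0.33)   # initial log-likelihood of a vertex or plane start
LOG_FALSE_ALARM = math.log(0.34)   # initial log-likelihood of a false alarm

def event_loglik(assigned, p_d=P_DETECT):
    """Event term (15): ln P_D for an assignment, ln(1 - P_D) for a skip."""
    return math.log(p_d) if assigned else math.log(1.0 - p_d)

def distance_loglik(nu, S):
    """Distance term (17) for an assignment: log of the 2-D Gaussian density."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    d2 = (S[1][1] * nu[0] ** 2 - (S[0][1] + S[1][0]) * nu[0] * nu[1]
          + S[0][0] * nu[1] ** 2) / det
    return -0.5 * d2 - 0.5 * math.log((2.0 * math.pi) ** 2 * det)

def child_loglik(parent, assigned, nu=None, S=None):
    """Recursion (14): distance term plus event term plus parent log-likelihood."""
    lam_d = distance_loglik(nu, S) if assigned else 0.0  # a skip adds no distance term
    return lam_d + event_loglik(assigned) + parent

S = [[1e-4, 0.0], [0.0, 1e-4]]
skip = child_loglik(LOG_NEW_FEATURE, assigned=False)
close = child_loglik(LOG_NEW_FEATURE, assigned=True, nu=[0.0, 0.005], S=S)
far = child_loglik(LOG_NEW_FEATURE, assigned=True, nu=[0.0, 0.03], S=S)
print(round(skip, 3), round(close, 3), round(far, 3))
```

Because (17) keeps the normalizing term involving det(2πS), the absolute comparison between an assignment and a skip depends on the units chosen for bearing and range; comparisons among assignments to the same track, which share S, depend only on the normalized distance.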

The zero indicates that the skip continuation hypothesis follows from its parent with probability one in the absence of any new information. E. State and Covariance Estimates A real-time tracking application needs to update its parameter estimates at regular intervals because it uses them to make repeated assignments based on both the value of the estimate and on its degree of uncertainty. In light of these considerations, an estimator should meet the following criteria. • Constant time: The estimate requires nearly the same amount of computation time for each filter iteration. This property enables it to keep pace with incoming data indefinitely, continuing to produce timely results even if no upper bound exists on the total number of observations. • Self-diagnostic: It provides a measure of its error, establishing a degree of confidence with which to accept its value. This information facilitates the process of sensor fusion, in which estimates from various sources are combined such that each one is weighted in inverse proportion to its uncertainty. The first criteria favors recursive over batch estimators, the latter of which derive their name from the characteristic processing of the entire data set with the arrival of each new measurement. The ever increasing calculation time inherent in batch processing eventually chokes any real-time application. The second criteria proves equally important. Having a measure of error accompanying each estimate enables the sensible combination of multiple estimates. We expect these estimates, nominally from different sources, to contradict each other. It will prove useful, however, to calculate a composite estimate in which each of the individual components makes a contribution weighted according to its degree of confidence. 1) Vertex Target Model: We estimate vertex target states using a first-order extended Kalman filter (EKF). This fully recursive method normally applies to a dynamic system (18) represents plant noise. 
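At first glance, a vertex belonging to a fixed target appears static. Dynamic behavior emerges, however, when the sensor moves from one location to another (recall Fig. 4). Given estimates at time $k$ of both the state $\hat{\mathbf{x}}(k|k)$ and the error covariance $\mathbf{P}(k|k)$, we make a prediction of the new state and its covariance:

$$\hat{\mathbf{x}}(k+1|k) = \mathbf{F}(k)\,\hat{\mathbf{x}}(k|k) \qquad (19)$$

$$\mathbf{P}(k+1|k) = \mathbf{F}(k)\,\mathbf{P}(k|k)\,\mathbf{F}^{T}(k) + \mathbf{Q}(k) \qquad (20)$$

where $\mathbf{Q}(k)$ is the covariance of the plant noise $\mathbf{v}(k)$.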
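The predict step of (19)–(20), together with the gain and update steps (21)–(23) that follow, can be sketched for a static vertex in a few lines. This is a minimal illustration, not the authors' implementation: it assumes an RCD reduces to a range and bearing from a known sensor position to a point vertex, which may differ from the paper's measurement model (2). The function names, noise values, and sensor positions are hypothetical; only the vertex position is taken from Fig. 8.

```python
import numpy as np


def wrap_angle(a):
    """Wrap an angle in radians into [-pi, pi)."""
    return (a + np.pi) % (2.0 * np.pi) - np.pi


def h_vertex(x, s):
    """ASSUMED vertex RCD model: range and bearing from sensor s to point x.

    Returns the predicted measurement and its Jacobian with respect to x."""
    d = x - s
    q = d @ d
    r = np.sqrt(q)
    z_hat = np.array([r, np.arctan2(d[1], d[0])])
    H = np.array([[d[0] / r, d[1] / r],
                  [-d[1] / q, d[0] / q]])
    return z_hat, H


def ekf_step(x, P, z, s, R, Q=None):
    """One predict/update cycle for a static vertex, so F(k) = I."""
    # Prediction (19)-(20): the state is unchanged; add plant noise if given.
    x_pred = x.copy()
    P_pred = P if Q is None else P + Q
    # Innovation and its covariance (the quantities also used for gating).
    z_hat, H = h_vertex(x_pred, s)
    nu = z - z_hat
    nu[1] = wrap_angle(nu[1])
    S = H @ P_pred @ H.T + R
    # Gain (21), then state and covariance update (22)-(23).
    W = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + W @ nu
    P_new = P_pred - W @ S @ W.T
    return x_new, P_new


# Usage with noise-free, hypothetical observations of the Fig. 8 vertex.
true_vertex = np.array([0.0, 0.232])
sensors = [np.array(p) for p in [(1.0, 0.0), (0.7, 0.8), (0.0, 1.2),
                                 (-0.7, 0.8), (-1.0, 0.0)]]
R = np.diag([0.005**2, np.radians(1.0)**2])   # assumed range/bearing noise
x, P = np.array([0.03, 0.20]), np.diag([0.05**2, 0.05**2])
traces = [np.trace(P)]
for s in sensors:
    z, _ = h_vertex(true_vertex, s)
    x, P = ekf_step(x, P, z, s, R)
    traces.append(np.trace(P))
```

Because $\mathbf{F}(k) = \mathbf{I}$ and no plant noise is added here, the prediction is trivial and every update subtracts a positive semidefinite term from $\mathbf{P}$, so the covariance trace shrinks with each observation, mirroring the trend shown in Fig. 8.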



Fig. 8. Evolution of a vertex estimate using the EKF. The actual vertex lies at position (x, y) = (0.000, 0.232) m. The 2-cm “+” in each frame defines the scale. Estimated states $\hat{\mathbf{x}}$ are drawn with an error ellipse (determined by the covariance matrix $\mathbf{P}$) scaled by a factor of 8. (a) We initialize the EKF using the time-of-flight inversion method. (b) After four observations have arrived, the EKF gives a much better estimate of the vertex location than the initial estimate. (c) After six observations, the error ellipse shrinks and becomes less oblate. (d) A closer look reveals the accuracy of the EKF estimate.

Fig. 9. An example of very simple branching. Each scan in this sequence contains only one RCD, and the face model only allows the plane behavior. Imagine the complexity when the typical scan contains 5–8 RCD’s and we allow both plane and sculpt face model behaviors. Worse still, the structure shown here repeats itself for all existing tracks as well.

Fig. 10. N-scan back pruning. The MHT prunes the hypothesis tree N scans prior to the present, making a firm decision for the assignments above the decision node.

In a truly dynamic system, these predictions reflect changes undergone by the true state since our last measurement. The static vertex assumption, however, renders this prediction trivially simple: the state transition matrix is the identity, $\mathbf{F}(k) = \mathbf{I}$. The filter uses the state prediction in conjunction with the known sensor location to predict the value of the expected measurement as in (2). Recall that the innovation $\boldsymbol{\nu}(k+1)$ and its covariance $\mathbf{S}(k+1)$, the same quantities used in calculating the validation gate, indicate how well the actual observation agrees with its expectation. The EKF updates the state estimate by averaging the predicted state with the new measurement. The filter gain $\mathbf{W}(k+1)$ tells us how to weigh the contributions from these different sources:

$$\mathbf{W}(k+1) = \mathbf{P}(k+1|k)\,\nabla\mathbf{h}^{T}\,\mathbf{S}^{-1}(k+1) \qquad (21)$$

It uses the measurement model Jacobian $\nabla\mathbf{h}$ to map the uncertainty (in the predicted state estimate) into measurement space and then normalizes it with the innovation covariance $\mathbf{S}(k+1)$. The update step applies the filter gain in the following manner:

$$\hat{\mathbf{x}}(k+1|k+1) = \hat{\mathbf{x}}(k+1|k) + \mathbf{W}(k+1)\,\boldsymbol{\nu}(k+1) \qquad (22)$$

$$\mathbf{P}(k+1|k+1) = \mathbf{P}(k+1|k) - \mathbf{W}(k+1)\,\mathbf{S}(k+1)\,\mathbf{W}^{T}(k+1) \qquad (23)$$

Note that $\mathbf{P}(k+1|k+1)$ actually decreases because new information has arrived. Illustrating this trend, Fig. 8 presents a sequence of estimates for a vertex target that corresponds to the corner of a triangle. The state uncertainty estimate, shown as an error ellipse scaled by a factor of 8, not only shrinks as more information arrives but also becomes more circular. The initial oblateness results from the large uncertainty in the observed bearing angle of an RCD feature.

Fig. 11. Data set for first example. The 12-scan time sequence indicated by the unfilled triangles begins at the top left and proceeds clockwise around the triangular object. Most of the 15 RCD’s in these scans correspond to the two sharp corners of the triangular object and the face between them. Two RCD’s clutter the data, having no obvious explanation.

2) Face: Recall that the vertex model globally represents the geometry of a sharp corner. The EKF works well for these targets because the recursively calculated estimates it provides use all of the measurements. We anticipate that the EKF will fail, however, in tracking the plane and sculpt target models that correspond to smooth faces, because both of these models provide only local estimates of the underlying geometry. Nonlinear least-squares estimation shows more promise. Since a recursive formulation for this approach is impossible, we apply the estimator to a moving data window that comprises only the most recent observations. Implemented in this manner, a nominally batch processing approach meets the constant-time criterion from earlier in this section. This particular method appeals to us for the following reasons.

• Target state vectors, with insufficient capacity to approximate the global shape of the face, respond only to the most recent observations.
• In the absence of a true recursive formulation, operating on a small batch of recent data represents the best chance at achieving real-time performance.

The least-squares estimate $\hat{\mathbf{x}}$, given by

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \sum_{j=k-n+1}^{k} \big[\mathbf{z}(j) - \mathbf{h}_j(\mathbf{x})\big]^{T}\,\mathbf{R}^{-1}(j)\,\big[\mathbf{z}(j) - \mathbf{h}_j(\mathbf{x})\big] \qquad (24)$$

in which $\mathbf{h}_j(\mathbf{x})$ is the RCD predicted by the measurement model from the sensor location at time $j$, and $\mathbf{R}^{-1}(j)$ individually weighs the contribution from the bearing and range components of an RCD according to the inverse measurement covariance. The range of the summation incorporates the $n$ most recent observations, thus replacing the state transition matrix of typical EKF formulations. Using the same measurements, we calculate

$$\mathbf{P} = \Big[\sum_{j=k-n+1}^{k} \nabla\mathbf{h}_j^{T}\,\mathbf{R}^{-1}(j)\,\nabla\mathbf{h}_j\Big]^{-1} \qquad (25)$$

as the approximate mean-squared error of this estimate. The optimal size of the batch depends on both the quality and quantity of the measurements. The number we choose for $n$ represents a compromise between two conflicting ideals. The state estimate is highly responsive when it processes fewer measurements. Unfortunately, it responds equally well to the noise in the measurements; a larger batch therefore enhances the stability of the state estimate, though at the cost of increased lag. We have experimented with batch sizes in the range 2–4 and have settled on 3 for the results shown in Section IV [1].

The final concern in calculating the state estimates is their feasibility. Situations arise in which the data ill-condition the least-squares calculation in (24). Consecutive RCD’s that share the same bearing, for example, generate a sculpt state estimate with infinite radius. Aside from the usual problems of calculating an infinite value, an infinite-radius sculpt is indistinguishable from a plane. At the other end of the spectrum, neither we nor the algorithm can differentiate between a sculpt with zero radius and a vertex; under these circumstances, the vertex target model with the EKF works much better than the small batch. To address the feasibility issue, we set upper and lower bounds on the radius parameter of the sculpt target estimate. Though the filter performs the actual calculation of (24) using an unconstrained method [16], [17], the results must pass through the established bounds. Those estimates that fail this simple test never generate hypotheses.
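The moving-window estimator of (24)–(25), including the radius feasibility test, can be illustrated with a minimal Python sketch. Several assumptions apply. The sculpt is parameterized here as a circle center $(x_c, y_c)$ and radius $\rho$. Its RCD is modeled as the range to the nearest point of the circle plus the bearing of the center, which may differ from the paper's parameterization and measurement model. Plain Gauss-Newton iterations also stand in for the unconstrained NAG routine the authors cite [17]. The function names, noise values, radius bounds, and data are all hypothetical.

```python
import numpy as np


def wrap_angle(a):
    """Wrap an angle in radians into [-pi, pi)."""
    return (a + np.pi) % (2.0 * np.pi) - np.pi


def h_sculpt(x, s):
    """ASSUMED sculpt RCD model: x = (xc, yc, rho) is a circle; a sensor at s
    sees it at range |c - s| - rho and at the bearing of the center c."""
    d = x[:2] - s
    q = d @ d
    dist = np.sqrt(q)
    z_hat = np.array([dist - x[2], np.arctan2(d[1], d[0])])
    H = np.array([[d[0] / dist, d[1] / dist, -1.0],
                  [-d[1] / q, d[0] / q, 0.0]])
    return z_hat, H


def fit_sculpt_window(zs, sensors, R, x0, n=3, rho_bounds=(0.01, 5.0), iters=25):
    """Weighted Gauss-Newton fit of a sculpt to the n most recent RCDs.

    Returns (x_hat, P), or None if the normal equations are singular or the
    radius falls outside rho_bounds (such estimates generate no hypotheses)."""
    zs, sensors = zs[-n:], sensors[-n:]          # moving data window
    R_inv = np.linalg.inv(R)                     # inverse measurement covariance
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        A = np.zeros((3, 3))                     # sum of H^T R^-1 H
        b = np.zeros(3)                          # sum of H^T R^-1 residual
        for z, s in zip(zs, sensors):
            z_hat, H = h_sculpt(x, s)
            res = z - z_hat
            res[1] = wrap_angle(res[1])          # bearing residual must be wrapped
            A += H.T @ R_inv @ H
            b += H.T @ R_inv @ res
        try:
            dx = np.linalg.solve(A, b)
        except np.linalg.LinAlgError:
            return None                          # ill-conditioned window
        x += dx
        if np.linalg.norm(dx) < 1e-12:
            break
    if not rho_bounds[0] <= x[2] <= rho_bounds[1]:
        return None                              # fails the feasibility bounds
    P = np.linalg.inv(A)                         # approximate MSE, cf. (25)
    return x, P


# Usage with noise-free, hypothetical RCDs from a 0.3-m-radius sculpt.
true_sculpt = np.array([0.0, 0.0, 0.3])
sensors = [np.array(p) for p in [(1.0, 0.0), (0.8, 0.6), (0.6, 0.8)]]
zs = [h_sculpt(true_sculpt, s)[0] for s in sensors]
R = np.diag([0.005**2, np.radians(1.0)**2])     # assumed range/bearing noise
fit = fit_sculpt_window(zs, sensors, R, x0=[0.05, -0.05, 0.25])
rejected = fit_sculpt_window(zs, sensors, R, x0=[0.05, -0.05, 0.25],
                             rho_bounds=(0.5, 5.0))
```

With a window of three RCD's, the six range and bearing components over-determine the three sculpt parameters. The second call shows how an estimate outside the radius bounds is discarded rather than turned into a hypothesis.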

Fig. 12. MHT time sequence for first example. In this time sequence, we show the decisions made by the MHT filter concerning the origin of each RCD observation in the data set. Track 0 (shown in frames 1, 14, and 15) by convention contains all false alarms. Track 1 (frames 2, 3, 5–7, and 9) shows the RCD’s assigned to the face of the triangle. In two of these frames, the MHT has calculated that the sculpt primitive has higher likelihood than the plane primitive due to the imperfect nature of real data. Tracks 2 and 3 (frames 4, 8, and 10–13) contain the RCD’s assigned to the two vertices of the triangle. In the final frame, the MHT filter erroneously classifies the RCD that corresponds to a vertex of the triangle as a false alarm because it lacks subsequent observations in support of alternative hypotheses.

F. Tree Management

The MHT filter generates a hypothesis tree with exponential complexity. Fig. 9, for example, shows a very simple case with one observation per scan and two target models. A realistic example contains many more hypotheses. We combine two classes of strategies to limit and manage the growth of the hypothesis and track trees: screening and pruning. The former, applied prior to the actual generation of hypotheses, includes validation gates and feasibility checks, both of which we have already discussed.
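Pruning, by contrast, acts on hypotheses after they are generated. The K-best, ratio, and N-scan back strategies can be sketched in Python as follows, under simplifying assumptions. Hypotheses are plain (label, likelihood) pairs, and each leaf of the tree is stored as a tuple of per-scan assignment decisions. The sketch sorts an explicit list, whereas the kernel described in this section ranks assignments with Murty's algorithm to avoid exhaustive enumeration. The function names and data are hypothetical.

```python
from collections import defaultdict


def prune_new_hypotheses(hyps, k_best=100, min_ratio=0.01):
    """K-best and minimum-ratio pruning of the hypotheses generated at one scan.

    hyps is a list of (label, likelihood) pairs; at most k_best survive, and
    each survivor's likelihood ratio to the best must be at least min_ratio."""
    ranked = sorted(hyps, key=lambda h: h[1], reverse=True)[:k_best]
    if not ranked:
        return []
    best = ranked[0][1]
    return [h for h in ranked if h[1] / best >= min_ratio]


def n_scan_back(leaves, n):
    """N-scan back pruning over leaf hypotheses.

    Each leaf is (path, likelihood), where path holds one assignment decision
    per scan. Leaves are grouped into branches by their decisions older than
    n scans, leaf likelihoods are summed per branch, and only the most likely
    branch survives. Returns (surviving leaves, fixed assignment list)."""
    depth = len(leaves[0][0])
    cut = depth - n
    if cut <= 0:
        return leaves, ()                # tree not yet deeper than n scans
    totals = defaultdict(float)
    for path, p in leaves:
        totals[path[:cut]] += p
    best_prefix = max(totals, key=totals.get)
    kept = [(path, p) for path, p in leaves if path[:cut] == best_prefix]
    norm = sum(p for _, p in kept)
    return [(path, p / norm) for path, p in kept], best_prefix


# Usage with hypothetical hypotheses and a three-scan tree.
hyps = [("h1", 0.5), ("h2", 0.3), ("h3", 0.004), ("h4", 0.1)]
kept_k2 = prune_new_hypotheses(hyps, k_best=2)
kept_ratio = prune_new_hypotheses(hyps)
leaves = [(("a", "x", "p"), 0.30), (("a", "y", "q"), 0.25), (("b", "x", "p"), 0.45)]
survivors, fixed = n_scan_back(leaves, n=2)
```

In this example the branch beginning with decision "a" survives N-scan back pruning even though the single most likely leaf lies under "b", because branch likelihood is the sum over its leaves; the surviving older decision becomes part of the simple list of assignments above the decision node.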



Pruning, on the other hand, acts after the hypotheses have been generated. The MHT kernel employs three pruning strategies:

• K-best: Limits the assignment hypotheses actually placed on the tree to the K best;
• Ratio: Limits hypotheses to those whose likelihood ratio vis-à-vis the most likely hypothesis exceeds a minimum threshold α;
• N-scan back: Removes all branches from the tree, save the most likely one, at the node that corresponds to N steps back in time (see Fig. 10).

A superficial glance suggests that the first two of these strategies offer little benefit at high cost, since they would appear to require enumerating all possible assignment hypotheses and ranking them. Techniques exist, however, to rank assignments without exhaustive enumeration [18], [19]. Cox and Miller [15] incorporate Murty’s algorithm [20] in the MHT kernel. This approach, from combinatorial optimization, ranks assignments by cost in polynomial time. By limiting the number of hypotheses generated at the current time step, K-best and minimum ratio pruning limit the branching as well. As a result, their effectiveness increases drastically at subsequent time steps.

The minimum likelihood ratio α and the maximum number of hypotheses K can potentially trivialize each other. In other words, too small a value for K prevents the kernel from enumerating hypotheses of likelihood anywhere near the minimum ratio. Conversely, too large a value for α limits the number of hypotheses so effectively that K is rendered redundant. When balanced, however, they mutually support each other. Experience tells us that setting K to 1/α achieves balanced results. These values affect the computational throughput of the MHT filter, an issue we address in Section V. The results in Section IV reflect values of K = 100 and α = 0.01.

Only those hypotheses that escape the first two strategies join the tree. There they remain, spawning their own progeny, until the final strategy acts N scans later. In N-scan back pruning, shown in Fig. 10, the MHT kernel looks at the decision node that corresponds to N time steps prior to the current time. To determine the likelihood of each branch at this node, the MHT adds the individual likelihood contributions of its leaves into a composite likelihood for the entire branch. The most likely branch survives, with the filter pruning away the others. Above the decision node remains a simple list of assignments.

Fig. 13. Data set for second example. This sequence of 40 scans begins at the left and proceeds to the right. Recall that the unfilled triangles indicate the location of the sensor.

IV. EXPERIMENTAL RESULTS

In this section, we present two time sequences that contain snapshots of the decisions made by the MHT filter as data arrive. The first example contains a sequence of scans that correspond to a sensor moving around a triangular prism. Fig. 11 presents an overview of the setup, with the sensor locations and the extracted RCD’s overlaid upon the actual geometry of the triangle. For the sake of discussion, let the top left sensor location correspond to the first discrete time step; the sensor moves clockwise with each scan, incrementing the clock by one. The data set for this example contains 12 scans with a total of 15 RCD’s. Fig. 12 shows how the MHT filter assigns each of the 15 RCD’s in the cumulative data set. For this example, the filter pruning parameters were set to 100-best hypotheses, minimum ratio 0.01, and 5-scan back pruning. The least-squares estimates for the face were calculated with a maximum batch size of three, as shown in frames 5–7 and 9. One point of note in Fig. 12 is the unexpected presence of a sculpt target state in both frames 3 and 5. Even though the underlying face of the triangle is flat, the sculpt target behavior model better supports the actual data. In frames 6 and 7, the plane target model resumes being the most likely alternative. A further observation regarding this example: the filter assigns the very last observation to the false alarm track (frame 15) even though it quite clearly corresponds to one of the sharp corners, because that observation lacks subsequent measurements to support any alternative data association hypotheses of higher likelihood.

For the second example, we apply the MHT filter to a set of observations corresponding to the smoothly curved face of a parabolic cylinder. The 40 scans in this example contain exactly 40 RCD’s, one per scan, as shown in Fig. 13. The MHT algorithm nevertheless continues to experience high complexity because it still branches at each step.
It always attempts to initiate new tracks using both the plane and vertex target models and to split face tracks into plane and sculpt targets. This example leads to the important result that the MHT filter successfully explains the entire set of observations using a single face track, a subset of which is shown in Fig. 14. Though there are 40 time steps in all for this track, the nine frames in Fig. 14 (every fourth step in the range 4–36) suffice to demonstrate the ability of the MHT filter to track the face without interruption by any incorrectly interpreted vertex targets. The first three frames behave exactly as expected: a sculpt target model whose curvature estimate increases to match that of the underlying geometry. We see an unexpected behavior in frames 5–8, when a plane primitive state best explains the observations. The opposite occurred once before, when a sculpt primitive best explained some of the observations of a flat face in Fig. 12 of the previous example, but it comes as a surprise nevertheless. We attribute this unexpected result to measurement error in the RCD’s. In the end, the filter successfully partitions the measurements that correspond to the curved face, shown in Fig. 15(a). Using a linked rod representation, as described in [1] and [21], we arrive at the final geometry shown in Fig. 15(b).

Fig. 14. Time history of curved face track. Every fourth estimate in the sequence shows that the algorithm switches between the sculpt target model (shown as a circle whose parameters are indicated in the figure) and the plane target model (shown as straight lines) when tracking a curved face.

Fig. 15. Results from the second example: (a) 40 observations that correspond to the smooth face and (b) the final geometry using a linked rod representation. Extraneous vertices correspond either to corners in the internal plywood support frame or to small bubbles in the epoxy coating applied for waterproofing.

V. DISCUSSION

Quite accurate reconstructions are possible because the algorithm groups together returns that originate from the same geometric feature but were obtained from different vantage points, while rejecting spurious measurements. Our approach addresses uncertainty in the origins of measurements (data association or correspondence uncertainty) as well as in the values of measurements (noise uncertainty). It models the uncertainty due to noise by covariance matrices and represents the data association uncertainty by Bayesian probabilities attached to nodes of a hypothesis tree, each node representing a different possible assignment of measurements to features.

Because our experiments thus far have been restricted to static, rigid, 2-D scenes, further research is necessary to extend the approach to more challenging underwater scenes. In theory, the techniques readily extend to 3-D (arc features, for example, become solid-angle sections of a sphere). In practice, however, track initiation becomes more problematic. The MHT formulation relies on the assumption that the target state estimate (in our case, the location of a face, edge, or vertex) can be initialized from a single measurement. Subsequent measurements are evaluated via data-to-target association hypotheses by comparing new observations to predictions based on the estimated target state. In three dimensions, however, target states contain more degrees of freedom than can be initialized from a single measurement. Two solutions appear feasible. The first is to remove responsibility for track initiation from the MHT and provide a specialized module for the purpose, much as Kurien [8] has done. The second is to employ an array of sonar sensors that makes several measurements of a geometric feature at each scan, thereby providing sufficient information to initiate geometric features. Finally, the specular regime applies only to objects whose rms surface roughness is small relative to the acoustic wavelength. Under typical marine conditions of both sedimentation and biological fouling, many surfaces will reflect diffusely, not specularly. As such, they will no longer fit the conditions of being an isolated target.
With a feature extraction technique that can adapt to a range of surface roughness scales, it should be possible to classify a diffuse scatterer such as the top wall in Fig. 2 as a single track.

ACKNOWLEDGMENT The authors thank Dr. I. J. Cox and M. L. Miller of the NEC Research Institute in Princeton, NJ, for their thoughtful insights and software expertise offered to this research.


REFERENCES

[1] B. A. Moran, “Underwater shape reconstruction in two dimensions,” Ph.D. thesis, Mass. Inst. Technol., May 1994.
[2] R. Kuc and M. W. Siegel, “Physically based simulation model for acoustic sensor robot navigation,” IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-9, Nov. 1987.
[3] J. J. Leonard and H. F. Durrant-Whyte, Directed Sonar Sensing for Mobile Robot Navigation. Boston, MA: Kluwer, 1992.
[4] T. E. Fortmann, Y. Bar-Shalom, and M. Scheffe, “Sonar tracking of multiple targets using joint probabilistic data association,” IEEE J. Oceanic Eng., vol. OE-8, pp. 173–184, July 1983.
[5] C. L. Morefield, “Application of 0-1 integer programming to multitarget tracking problems,” IEEE Trans. Automat. Contr., vol. AC-22, pp. 302–312, June 1977.
[6] D. B. Reid, “An algorithm for tracking multiple targets,” IEEE Trans. Automat. Contr., vol. AC-24, pp. 843–854, Dec. 1979.
[7] S. Mori, C. Chong, E. Tse, and R. Wishner, “Tracking and classifying multiple targets without a priori identification,” IEEE Trans. Automat. Contr., vol. AC-31, pp. 401–409, May 1986.
[8] T. Kurien, “Issues in the design of practical multitarget tracking algorithms,” in Multitarget-Multisensor Tracking: Advanced Applications, Y. Bar-Shalom, Ed. Norwood, MA: Artech House, 1990, pp. 43–83.
[9] I. J. Cox and J. J. Leonard, “Modeling a dynamic environment using a Bayesian multiple hypothesis approach,” Artif. Intell., vol. 66, pp. 311–344, 1994.
[10] M. L. Miller, Implementation Notes for MHT With Multiple Target Models. Princeton, NJ: NEC Research Inst., 1993.
[11] M. E. Mortenson, Geometric Modeling. New York: Wiley, 1985.
[12] M. do Carmo, Differential Geometry of Curves and Surfaces. Englewood Cliffs, NJ: Prentice-Hall, 1976.
[13] Y. Bar-Shalom and T. E. Fortmann, Tracking and Data Association. New York: Academic, 1988.
[14] R. Smith, M. Self, and P. Cheeseman, “Estimating uncertain spatial relationships in robotics,” in Autonomous Robot Vehicles, I. J. Cox and G. T. Wilfong, Eds. Berlin, Germany: Springer-Verlag, 1990.
[15] I. J. Cox and M. L. Miller, “On finding ranked assignments with application to multi-target tracking and motion correspondence,” IEEE Trans. Aerosp. Electron. Syst., vol. 32, 1995.
[16] P. E. Gill, W. Murray, and M. H. Wright, Practical Optimization. New York: Academic, 1981.
[17] Numerical Algorithms Group, Oxford, U.K., NAG Fortran Library, Mark 14, 1990.
[18] W. L. Brogan, “Algorithm for ranked assignments with applications to multiobject tracking,” IEEE J. Guidance, vol. 12, pp. 357–364, 1989.
[19] V. Nagarajan, M. R. Chidambara, and R. N. Sharma, “Combinatorial problems in multitarget tracking—A comprehensive survey,” in Proc. Inst. Elect. Eng., 1987, vol. 134, pt. F, no. 1, pp. 113–118.
[20] K. G. Murty, “An algorithm for ranking all the assignments in order of increasing cost,” Oper. Res., vol. 16, pp. 682–687, 1968.
[21] J. J. Koenderink, Solid Shape. Cambridge, MA: MIT Press, 1990.

Bradley A. Moran (M’94) received the A.B. degree cum laude in mathematics from Yale University, New Haven, CT, in 1985, the S.M. degrees in mechanical engineering and in naval architecture and marine engineering, both in 1990, and the Ph.D. degree in ocean engineering in 1994, all from the Massachusetts Institute of Technology (MIT), Cambridge. Prior to his work in data fusion, he had published in the fields of computational geometry and computer-aided design. From 1989 to 1992, he chaired the MIT Student Section of the Society of Naval Architects and Marine Engineers. In 1991, he was awarded the International Society of Offshore and Polar Engineers Scholarship. He remains at MIT in the Sea Grant Underwater Vehicles Laboratory, where, having completed a Post-Doctoral Fellowship, he is now a Research Engineer directing software development for the Odyssey class AUV’s and autonomous ocean sampling networks. His research interests include autonomous guidance, navigation, and control. Dr. Moran is a member of Alpha Lambda Delta and Sigma Xi as well as the IEEE Oceanic Engineering Society.

John J. Leonard received the B.S.E. degree in electrical engineering and science from the University of Pennsylvania, Philadelphia, in 1987 and the Ph.D. degree from the University of Oxford, U.K., in 1994. He performed research for his doctoral thesis, “Directed Sonar Sensing for Mobile Robot Navigation,” from 1987 to 1990, as a member of the Oxford Robotics Research Group. In 1990 and 1991, he was a Visiting Scientist at NEC Research Institute, Princeton, NJ. From 1991 to 1996, he was a Post-Doctoral Fellow and Research Engineer in the Underwater Vehicles Laboratory of the Massachusetts Institute of Technology (MIT) Sea Grant College Program, Cambridge. He is now an Assistant Professor of Ocean Engineering at MIT. His research addresses the problem of sensor data fusion in marine robotics, with application to the problems of navigation and control of underwater vehicles and acoustic scene reconstruction. Dr. Leonard is a member of the IEEE Oceanic Engineering, Robotics and Automation, and Computer Societies, and the Acoustical Society of America. He currently serves as a guest associate editor for the IEEE JOURNAL OF OCEANIC ENGINEERING for underwater vehicle systems.


Chryssostomos Chryssostomidis joined the faculty of the Massachusetts Institute of Technology (MIT), Cambridge, in 1970 as Assistant Professor of Naval Architecture, was promoted to Associate Professor, and in 1982 became a full Professor in the Department of Ocean Engineering. That same year, he became Director of the MIT Sea Grant College Program. He has been Director of the Design Laboratory since its inception in the early 1970’s. In 1989, he established the MIT Sea Grant Underwater Vehicles Laboratory, a new laboratory at MIT that develops technology and systems for advanced autonomous underwater vehicles. In 1994, he was appointed Department Head in Ocean Engineering. Most recently, he has established a multidisciplinary research team to address key issues underlying the design and fabrication of high-speed, high-performance surface ships. He has received many acknowledgments for outstanding contributions. He served as Naval Sea Systems Research Professor from 1985 through 1987. In 1975 and 1976, he served as a von Humboldt Scholar at Ruhr University, Bochum, Germany. Since January 1993, he has held a new Professorship, the Henry L. and Grace Doherty Professor of Ocean Science and Engineering. His publications display his wide range of interests, including design methodology for ships, vortex-induced response of flexible cylinders, conceptual study of a ship for sub-seabed nuclear waste disposal, and the abyssal ocean option for waste management. Dr. Chryssostomidis is a fellow of the Society of Naval Architects and Marine Engineers. He was awarded The Captain Joseph H. Linnard Prize in 1976 (best paper contributed to the Proceedings of the Society of Naval Architects and Marine Engineers). Recently he has been named one of two Professors of Teaching Innovation by the MIT School of Engineering.