IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 46, NO. 5, AUGUST 2000

Information Measures for Object Recognition Accommodating Signature Variability Matthew L. Cooper, Member, IEEE, and Michael I. Miller

Abstract—This paper presents measures characterizing the information content of remote observations of ground scenes imaged via optical and infrared sensors. Object recognition is posed in the context of deformable templates; the special Euclidean group is used to accommodate geometric variation of object pose. Principal component analysis of object signatures is used to represent and efficiently accommodate variation in object signature due to changes in the thermal state of the object surface. Mutual information measures, which are independent of the recognition system, are calculated quantifying both the information gain due to remote observations of the scene and the information loss due to signature variability. Signature model mismatch is quantitatively examined using the Kullback–Leibler divergence. Expressions are derived quadratically approximating the posterior conditional entropy on the orthogonal group for high signal-to-noise ratio. It is demonstrated that quadratic models accurately characterize the posterior entropy for broad ranges of signal-to-noise ratio. Information gain in multiple-sensor scenarios is quantified, and it is demonstrated that the cost of signature uncertainty for the Comanche series of FLIR images collected by the U.S. Army Night Vision Electronic Sensors Directorate is approximately 0.8 bits, with an associated near doubling of mean-squared error uncertainty in pose.

Index Terms—Infrared imaging, mutual information, object recognition, performance analysis of recognition systems.

I. INTRODUCTION

WITHIN the Center for Imaging Science, we have been studying the use of deformable templates for rigid-body Object Recognition (OR) [1]–[3] accommodating the geometric variations of orientation, position, and scale. A major thrust of our work has been the extension of deformable templates to accommodate signature variation. In natural imagery, a major source of variability in the appearance of objects is variation in illumination. Belhumeur and Yuille have developed methods for accommodating variations in the illumination of human faces using principal components analysis decomposition of imagery [4], [5] (see also [6]). Joshi et al. applied principal components analysis to represent the variation of shape of neuro-anatomical submanifolds [7]. In that work, principal components analysis was extended to submanifolds in three-dimensional space.

Manuscript received May 12, 1999; revised January 6, 2000. This work was supported by ONR N00014-98-1-0606, ARO DAAH04-95-1-0494, and the Boeing-McDonnell Foundation. M. L. Cooper was with the Department of Electrical Engineering, Washington University, St. Louis, MO 63130 USA. He is now with Fuji-Xerox Palo Alto Laboratory, Palo Alto, CA 94304 USA (e-mail: [email protected]). M. I. Miller is with the Center for Imaging Science, Whiting School of Engineering, The Johns Hopkins University, Baltimore, MD 21218 USA (e-mail: [email protected]). Communicated by A. O. Hero, Guest Editor. Publisher Item Identifier S 0018-9448(00)05530-9.

Building on such parametric representations, in this paper we extend deformable templates under the geometric group action to represent object signature. The extended template enables us to both incorporate signature information into recognition algorithms and examine recognition performance loss associated with signature variability. The second major thrust of this paper is the establishment of a general class of performance measures for recognition based on information-theoretic measures. These performance measures for recognition are algorithm-independent, serving as benchmarks against which the plethora of proposed recognition algorithms may be compared. Information-theoretic measures have been frequently used in recognition algorithm design. Colmenarez and Huang have developed detection algorithms using models designed to maximize interclass discrimination as measured by relative entropy [8], [9]. Viola and Wells estimate object pose as the rigid transformation of an object model that maximizes an empirical estimate of the mutual information between the sensor data and model [10]. Fisher et al. have developed a feature extraction technique for classification based on mutual information and have, in turn, applied this approach to pose estimation [11], [12]. There is also a large literature on the application of information-theoretic measures to performance analysis of statistical recognition algorithms (e.g., Fano’s inequality, Stein’s lemma, see Cover and Thomas [13] for an overview). Shao et al. have applied mutual information as a performance measure and system design tool in computed tomography [14]. Our work is complementary in that we propose the use of mutual information as a performance analysis and system design tool for model-based object recognition. 
We specifically address the following question posed in the report of the Army Research Office Working Group on ATR [15]: independent of any particular algorithm, how much information do various remote sensors provide about complex scenes? We formulate the recognition problem via a communications model using the “source-channel” characterization of Shannon; the decoder attempts to infer properties of the sources (objects’ configurations in the imaged scene) from observations obtained via the remote-sensing channel. We shall exploit the ability of statistical, model-based approaches to provide confidence measures and, in turn, error bounds for the estimated object configurations. Herein, we quantify the information provided by remote sensor data regarding the objects’ configurations in the observed scene. The measure of information studied throughout is mutual information, originally proposed by Shannon [16]. Mutual information provides a bound on the information regarding the objects’ configurations that may be transmitted via the remote-sensing channel. A significant advantage of the statistical communications point of view is that all of the classical estimation bounds for communication channels are immediately inherited on the nonadditive parameter spaces of recognition. The approach is general and is applicable to object detection, classification, and recognition.

A. Pattern-Theoretic Recognition

The application of Grenander’s pattern theory [17] to recognition proceeds in three steps: 1) the representation of the scene being observed; 2) the formation of remote sensor observations; and 3) the statistical inference of the parameters describing the objects in the remotely observed scene. Objects are represented using templates; their infinite variety of appearance is accommodated via transformations which act on the templates. The rigid-body templates consist of geometric descriptions for the objects of interest (e.g., computer-aided design (CAD) models). The transformations form groups, and the deformable template is the orbit under a group action, its subgroups, and products. Estimation becomes identification of the group action and object class, requiring matching the observed remotely sensed image with the particular instance of the template. Define the idealized image or template with geometric variability introduced via the rigid Euclidean group actions of translation and rotation. Scenes consist of objects of class α ∈ A, the alphabet of possible objects, each with an associated parameter vector s ∈ S, the group of transformations describing position and orientation. The class of variabilities associated with the physics of the sensors is accommodated through the statistical laws describing the transformation of the underlying object representations to the observable imagery. Denote the deformation mechanism through which the ideal image I is observed as I^D, which can be either deterministic or random. The data I^D may have components corresponding to multiple sensors: I^D = (I^D_1, ..., I^D_n). Characterize them via a statistical transition law, the likelihood function p(I^D | I), summarizing completely the mapping from the input I to the output I^D, the likelihood of I^D given I. Shannon’s source-channel model of communication theory is consistent with the Bayesian conceptual separation of the representation of the space of possible images I with its associated prior, and the image formation process of the remote-sensing channel specifying the output I^D with transition law p(I^D | I). The power of this view is that there is one true underlying scene, irrespective of the number of sensor measurements forming I^D. A single inference problem is solved, with the multiple observations due to multiple sensors providing additional information in the posterior distribution. Sensor fusion occurs automatically in this framework.


B. Paper Organization

In Section II, deformable templates for geometric variation are extended to accommodate the signature variation typical of forward-looking infrared (FLIR) imagery. Object signatures are modeled as Gaussian random fields on the surface of the template, with the parameters describing object pose and signature jointly parameterized. Section III applies the extended template to compute the posterior probability density given FLIR imagery over this joint parameter space. In Section IV, information-theoretic performance analysis is presented quantifying the information gain due to remote sensor observations in the context of the inference. Additionally, information loss associated with model mismatch is quantified via the Kullback–Leibler divergence. Section V presents asymptotic quadratic approximations to the posterior density, allowing us to accurately approximate the information measures for orientation estimation. Finally, Section VI quantitatively characterizes sensor fusion; the information measures are computed to assess the relative performance of several sensors and the performance gains due to sensor fusion.

II. REPRESENTING VARIABILITY OF POSE AND SIGNATURE

We examine the pose and signature variation associated with the thermal variations of the surfaces of objects viewed by FLIR sensors. FLIR systems passively sense objects via their reflected or emitted thermal radiation, which may vary dramatically with changes in the object thermal signature (equivalently, the object surface temperature profile). Shown in the top row of Fig. 1 are views of an object at multiple poses. The bottom row shows the object at a single pose with multiple object thermal signatures. Similarly, in optical or video imaging contexts, changes in the illumination of a scene generate considerable signature variability (e.g., [18]). Estimation of the object signature is fundamental to accurate estimation of the object’s class and pose.

A. Geometric Variability

Templates are constructed corresponding to CAD representations of the surfaces of the rigid objects. Denote such an ideal template I₀(l), l ∈ L, with L the set of location indices on the object surface. A rigid template defined by a CAD model is shown in the top row of Fig. 1 at three different poses. Geometric variation, due to variability of pose of the objects, is introduced via the rigid motions of translation and rotation. The set of transformations are matrix Lie group actions on the templates. For ground-based scenes, we use the axis-fixed rotation group, identified with SO(2) and parameterized by rotation angle θ, together with translations in the plane ℝ²; transformations are of the form s = (O, b), O ∈ SO(2), b ∈ ℝ². Elements of SE(2), the Special Euclidean group, are composed according to (O₁, b₁) ∘ (O₂, b₂) = (O₁O₂, O₁b₂ + b₁); this is denoted by the semidirect product SE(2) = SO(2) ⋉ ℝ² [19]. The deformable template over which inference occurs is the orbit under SE(2)

    { I_s : I_s(l) = I₀(O⁻¹(l − b)), s = (O, b) ∈ SE(2) }.   (1)
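The SE(2) composition rule and group action above can be sketched numerically. This is an illustrative sketch, not code from the paper; the function names are our own:

```python
import numpy as np

def se2(theta, b):
    """An SE(2) element as (rotation matrix, translation vector).
    theta: rotation angle in radians; b: length-2 translation."""
    c, s = np.cos(theta), np.sin(theta)
    O = np.array([[c, -s], [s, c]])
    return O, np.asarray(b, dtype=float)

def compose(g1, g2):
    """Semidirect-product composition: (O1, b1)(O2, b2) = (O1 O2, O1 b2 + b1)."""
    O1, b1 = g1
    O2, b2 = g2
    return O1 @ O2, O1 @ b2 + b1

def act(g, x):
    """Action of g = (O, b) on a point x in the plane: x -> O x + b."""
    O, b = g
    return O @ np.asarray(x, dtype=float) + b
```

Composing two elements and acting on a point agrees with acting twice, which is the defining property of the semidirect-product rule.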

B. Modeling Signature Variation via Principal Components

We shall focus on infrared sensors operating in the 8–12-μm band, in which object material properties and operational states as well as meteorological conditions contribute to signature variability. A complete description of the signature of an object specifies the emitted radiance as a function of position on the object surface. The surface can be decomposed into regions of


Fig. 1. The two primary modes of variation of ground-based objects observed via FLIR sensors: pose and signature. In the top row, a T62 tank is shown at three different orientations. In the bottom row, the tank is observed with three thermal signatures at a single pose.

polygonal facets constituting a CAD model representation of the surface, as in the models describing the surface geometry used by the FLIR simulation software package PRISM [20]. The thermal signature is described by a vector with elements given by the radiance at the lattice points of the template (CAD model). Define the signature t = (t(l), l ∈ L) as the radiant intensity indexed over the surface lattice with dimensionality d = |L|. Incorporating signature information, the deformable template becomes the set of all objects generated under the group action with superimposed scalar field t. Thus an element of the template specifies the geometric transformation s of each point l ∈ L indexing the object, as well as the radiant intensity t(l) at that point. Denote an element of the deformable template by I_{s,t}, where the orbit under the group action becomes

    { I_{s,t} : s ∈ SE(2), t(l) ∈ ℝ, l ∈ L }.   (2)

We have generalized the group action to accommodate signature and pose variation. Representing the object signature increases the dimensionality of the parameterization from dim S to dim S + d. We want to minimize this increase. Model the signature t as a scalar-valued Gaussian random field representing the object radiant intensity with mean t̄ and covariance K. Our approach is to characterize signature variation statistically via empirical covariances. Following Joshi et al. [7], we apply eigenexpansion techniques to random fields indexed on the object surface. Eigenexpansion of the covariance operator of the Gaussian random field gives a complete orthonormal basis {Φ_i} for object signature representation

    t(l) = t̄(l) + Σ_{i=1}^{d} t_i Φ_i(l).   (3)

In general, d is large, so we choose the m-term subspace to represent t optimally in the minimum mean-squared error sense [7]. We construct via the PRISM simulator an N-element database of thermal signatures {t₁, ..., t_N}. The eigenbasis is generated directly from the sample covariance with elements

    K(l, l') = (1/N) Σ_{i=1}^{N} (t_i(l) − t̄(l)) (t_i(l') − t̄(l'))

where t̄ = (1/N) Σ_{i=1}^{N} t_i, computed from the database. The sample covariance is decomposed into an eigenfunction/eigenvalue representation numerically via the Singular Value Decomposition (SVD). For deriving the basis, the inner product respecting the surface measure is employed, as the signature random field is defined on the object surface. On the discretized surface lattice L, the basis functions Φ_i satisfy

    K A Φ_i = λ_i Φ_i   (4)

where A is a diagonal matrix with A(l, l) the surface area of the region indexed by l ∈ L. Performing an eigenexpansion of the empirical covariance matrix is equivalent to the classical statistical technique of principal components analysis [21]. Hence the Φ_i are the signature principal components, and are found using Algorithm 1.

Algorithm 1 (Computation of Signature Principal Component Basis):
1) The basis elements Φ_i satisfy (4). Define the N × d matrix D with elements D_{ij} = t_i(l_j) − t̄(l_j) for i = 1, ..., N and j = 1, ..., d, so that K = (1/N) D^T D, where the superscript T denotes the matrix transpose.
2) Define Q = (1/N) D A D^T; Φ_i is an eigenvector of K A if and only if D A Φ_i is an eigenvector of Q.
3) Using the Singular Value Decomposition, Q = U Λ U^T. The columns of U are the eigenvectors of Q


TABLE I SUMMARIES OF THE CONTENTS OF THE EMPIRICAL DATABASES USED. THE STATIC DATABASE ISOLATES METEOROLOGICAL VARIABILITY WHILE THE DYNAMIC DATABASE ISOLATES OPERATIONAL VARIABILITY. THE COMPOSITE DATABASE COMPOSES THE TWO MODES OF VARIABILITY

Fig. 2. Visualizations of the first three eigensignatures given by the first three principal components, from left to right, Φ₁(l), Φ₂(l), Φ₃(l), l ∈ L, from three databases. The top row shows the eigensignatures rendered for the static database. The middle row shows the eigensignatures for the dynamic database. The bottom row shows the eigensignatures for the composite database.

and the respective eigenvalues are the elements of the diagonal matrix Λ.
4) The eigenvectors and eigenvalues of K A are Φ_i = c_i D^T u_i and λ_i = Λ_{ii} for i = 1, ..., N, where u_i is the ith column of U and c_i normalizes Φ_i with respect to the A-weighted inner product.
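The construction of Algorithm 1 can be sketched as follows. This is an illustrative sketch under our own conventions, assuming the surface measure enters as a diagonal area weighting and signatures are stored as rows; the function name and data layout are not from the paper:

```python
import numpy as np

def eigensignatures(T, areas):
    """Principal-component basis for signatures t_1..t_N (rows of T, N x d)
    with respect to the area-weighted inner product <u, v> = sum_l u(l) v(l) A(l).
    Returns (mean, Phi, lam): Phi[:, i] is the i-th eigensignature and
    lam[i] its eigenvalue, so that (K A) Phi_i = lam_i Phi_i for the
    sample covariance K."""
    T = np.asarray(T, dtype=float)
    N, d = T.shape
    tbar = T.mean(axis=0)
    X = T - tbar                          # centered data, N x d
    w = np.sqrt(np.asarray(areas))        # A^{1/2} weighting of the lattice
    # SVD of the area-weighted, centered data matrix
    U, s, Vt = np.linalg.svd(X * w, full_matrices=False)
    Phi = (Vt / w).T                      # undo the weighting: A^{-1/2} v_i
    lam = s**2 / N                        # eigenvalues of the sample covariance
    return tbar, Phi, lam
```

The returned basis is orthonormal in the area-weighted inner product (Phi^T A Phi = I), and each column satisfies the weighted eigenvalue equation (4).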

The PRISM simulation package was used to build databases isolating two primary modes of FLIR signature variations: meteorological and operational. Meteorological variation, namely, changes in the weather and atmospheric conditions, was isolated in the “static” database. Operational variation resulting from changes in the state of the object, including varying the speed of the tank, whether the motor is on or off, how long the motor has been on, whether the main gun is firing or not, etc., was isolated in the “dynamic” database. A third, “composite” database was generated by composing these two modes of signature variation. Details regarding database construction appear in Table I. The number of signatures contained in each database appears in the rightmost column.

Intuitive conclusions may be drawn from the plots and renderings of the first three eigensignatures (the first signature principal components) from the three databases. The first three eigensignatures (left to right, Φ₁, Φ₂, Φ₃) of the static database are shown in the top row of Fig. 2. The first eigensignature of the static database exhibits the uniform thermal emissivity variations of the tank associated with changes in the meteorological conditions. The middle row shows the first three eigensignatures of the dynamic database, each of which expresses localized variations in the regions of the tank containing and adjacent to the motor. The first dynamic eigensignature demonstrates the intense heating of the exhaust system and motor of the tank. The first three eigensignatures of the composite database are shown in the bottom row, expressing both modes of variation isolated in the static and dynamic databases. To analyze the dimensionality of this representation for object signatures, the normalized power spectrum (λ_i/λ₁ plotted versus i) of the eigenbasis representing object signatures is shown in the left panel of Fig. 3. The right panel shows the plot of normalized mean-squared error versus the number of eigenfunctions retained in the expansion of (3).
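The two quantities plotted in Fig. 3 can be computed directly from the eigenvalue spectrum, since the normalized mean-squared truncation error after m terms is the normalized tail sum of eigenvalues. A sketch under the same area-weighted conventions as above (names are our own):

```python
import numpy as np

def spectrum_and_mse(T, areas):
    """For a signature database T (N x d) and facet areas, return the
    normalized eigenvalue spectrum lam/lam[0] and the normalized
    mean-squared truncation error for m = 0..k retained eigensignatures,
    using the area-weighted inner product."""
    T = np.asarray(T, dtype=float)
    A = np.asarray(areas, dtype=float)
    N, d = T.shape
    X = T - T.mean(axis=0)
    w = np.sqrt(A)
    _, s, _ = np.linalg.svd(X * w, full_matrices=False)
    lam = s**2 / N
    total = lam.sum()
    # energy captured by the first m components; the MSE is the tail sum
    tail = total - np.concatenate(([0.0], np.cumsum(lam)))
    return lam / lam[0], tail / total
```

With no terms retained the normalized error is 1; with all terms it is 0, and the curve decreases monotonically, as in the right panel of Fig. 3.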


Fig. 3. The left panel shows the normalized power spectrum (λ_i/λ₁) of the eigenbasis representing FLIR variability. On the right appears the normalized mean-squared error versus the number of terms used in the eigenexpansion, averaged over the three databases.

III. BAYESIAN POSTERIOR DISTRIBUTION

The parameter space for the Bayesian posterior distribution is now clear. Retaining m terms in the eigenexpansion, the object signature is represented as

    t(l) = t̄(l) + Σ_{i=1}^{m} t_i Φ_i(l).   (5)

The deformable template becomes

    { I_{s,t} : s ∈ SE(2), t = (t₁, ..., t_m) ∈ ℝ^m }   (6)

with dimensionality given by dim S + m. Fig. 3 indicates that the dimensionality required for accurate representation of thermal signature is on the order of m = 20. This represents a reduction in the dimensionality of the extended templates from dim S + d to dim S + 20. We identify the object signature with its corresponding subspace projection coefficients. In all simulations that follow, the signature is represented by the first twenty projection coefficients t = (t₁, ..., t₂₀).

Throughout, the prior probability measure associated with the parameters of interest is denoted by π, with posterior π(· | I^D) conditioned on the observation I^D. The posterior is the product of the prior with the data likelihood p(I^D | s, t) incorporating variability introduced by the sensor in the formation of the remote observation of the scene. Throughout, object signature and pose are assumed independent. The joint posterior is the product of the data likelihood and the joint prior

    π(s, t | I^D) ∝ p(I^D | s, t) π(s) π(t).   (7)

The prior probability on the rotation group will be uniform on the circle. The object signature is modeled as a Gaussian random field, parameterized by the subspace projection coefficients t_i with multivariate normal distribution: the t_i are independent normal with zero mean and variance λ_i, the eigenvalue associated with the ith eigensignature

    π(t) = Π_{i=1}^{m} (2πλ_i)^{−1/2} exp( −t_i² / (2λ_i) ).   (8)
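Drawing signatures from the Gaussian prior of (8) amounts to sampling independent coefficients t_i ~ N(0, λ_i) and superimposing the eigensignatures on the mean. A sketch (names are our own):

```python
import numpy as np

def sample_signatures(tbar, Phi, lam, m, n_samples, rng=None):
    """Draw signatures t = tbar + sum_{i<=m} t_i Phi_i with independent
    t_i ~ N(0, lam_i), the Gaussian prior on projection coefficients.
    Phi: d x k matrix of eigensignatures; lam: eigenvalues; m: terms kept."""
    rng = np.random.default_rng(rng)
    coeffs = rng.standard_normal((n_samples, m)) * np.sqrt(lam[:m])
    return tbar + coeffs @ Phi[:, :m].T
```

Each row is one synthetic thermal signature; the sample variance along the ith eigensignature direction converges to λ_i.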

A. The Imaging Model

For the ground scenario, the scenes considered consist of a T62 tank at a known position against a constant background. The video imager uses orthographic projection; the FLIR imager uses perspective projection. For generating FLIR scenes, Gaussian- and Poisson-based charge-coupled device (CCD) sensor models are used [22], [23], coupled to standard infrared radiation from CAD models of the object generated by the PRISM simulator. The projective transformations are the deterministic component of the operation from the ideal image space, including complete object signature information, to the positive measurement space, the two-dimensional (2-D) CCD detector space, including both blur and Poisson noise [24]. We denote the projective transformation of the ideal scene by the operator P, defined as the near-field projection of the ideal image I_{s,t} of the object with template at transformation s onto the detector plane. The Poisson log-likelihood is

    ℓ(I^D | s, t) = Σ_{y ∈ Y} [ I^D(y) log (P I_{s,t})(y) − (P I_{s,t})(y) ]   (9)

ignoring terms without dependence on (s, t), where Y is the detector space with cardinality |Y|. In the high photon limit, the measurements are modeled as a Gaussian random field [25]–[27] with mean the projective transformation of the scene

    ℓ(I^D | s, t) = −(1/2σ²) Σ_{y ∈ Y} ( I^D(y) − (P I_{s,t})(y) )²   (10)

ignoring terms without dependence on (s, t).
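The two sensor log-likelihoods of (9) and (10) can be sketched directly; the projected image P I_{s,t} is supplied as an array, and constant terms are dropped as in the text (function names are our own):

```python
import numpy as np

def poisson_loglik(data, mean_image):
    """Poisson log-likelihood of CCD counts given the projected scene
    mean_image = (P I_{s,t})(y), dropping terms without (s, t) dependence.
    mean_image must be strictly positive."""
    mu = np.asarray(mean_image, dtype=float)
    return float(np.sum(np.asarray(data) * np.log(mu) - mu))

def gaussian_loglik(data, mean_image, sigma2):
    """High-photon Gaussian limit: the quadratic log-likelihood of (10)."""
    r = np.asarray(data, dtype=float) - np.asarray(mean_image, dtype=float)
    return float(-np.sum(r**2) / (2.0 * sigma2))
```

Both are maximized when the projected scene matches the data pixel for pixel, which is what drives the posterior concentration at high SNR.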


B. Sensor Fusion

The Bayesian formulation readily incorporates additional sensors into the inference. As sensors make conditionally independent observations of the scene, the product of the individual data likelihood terms forms the joint data likelihood for all available observations. If there are n observations, I^D = (I^D_1, ..., I^D_n), produced by n different sensors, then

    p(I^D | s, t) = Π_{i=1}^{n} p(I^D_i | s, t).   (11)

Sensors will require individual signature representations for the corresponding object signature. While an FLIR system will sense temperature, optical sensors will be sensitive to the illumination conditions. These variations are modeled via respective representations for the object signature. We shall examine two sensors: the video imaging sensor and the FLIR sensor.
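Under conditional independence, fusion per (11) is simply a sum of per-sensor log-likelihoods before renormalizing the posterior. A sketch over a discrete set of candidate poses (names are our own):

```python
import numpy as np

def joint_loglik(logliks):
    """Conditionally independent sensors: the joint data log-likelihood
    is the sum of per-sensor log-likelihoods (the log of the product in (11))."""
    return float(sum(logliks))

def fused_posterior(prior, per_sensor_logliks):
    """Discrete posterior over candidate poses: prior[k] times the product
    of sensor likelihoods for pose k, renormalized. per_sensor_logliks is
    a list of arrays, one per sensor, each aligned with the pose grid."""
    prior = np.asarray(prior, dtype=float)
    total_ll = np.sum(np.asarray(per_sensor_logliks, dtype=float), axis=0)
    w = prior * np.exp(total_ll - total_ll.max())   # stabilized exponentiation
    return w / w.sum()
```

A sensor that is uninformative (constant log-likelihood across poses) leaves the fused posterior unchanged, so fusion never has to be handled as a special case.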


IV. MUTUAL INFORMATION ANALYSIS

The mutual information between two sets of random variables directly measures the dependence between them [13]; we propose its use to quantify the dependence between the remote observation of the scene I^D and the object parameters X of pose, position, and class. The form of the mutual information for continuous processes is

    I(X; I^D) = ∫∫ p(x, i^D) log [ p(x, i^D) / (p(x) p(i^D)) ] dx di^D   (12)

              = H(X) − H(X | I^D)   (13)

where H(X) is the differential entropy of the parameter vector X and H(X | I^D) is the conditional differential entropy of X given the observation I^D. This statistical measure quantifies the information gain regarding the object pose associated with the sensor data. This form of the mutual information bounds the information regarding the true parameter values contained in any estimator X̂(I^D) of the parameters according to

    I(X; X̂(I^D)) ≤ I(X; I^D)   (14)

per the data-processing inequality [13]. Additionally, bounds on estimator performance may be derived directly based on the mutual information, for example [14]

    E[(X − X̂)²] ≥ (1/2πe) exp{ 2 [H(X) − I(X; I^D)] }.   (15)

This bound illustrates the algorithm-independent and model-dependent nature of this performance measure. While (15) applies to any estimator based on I^D, the bound depends on the sensor likelihood model (concealed in I(X; I^D)): the mapping in which the parameter input X is identified with the corresponding instance of the template deformed by the sensor to I^D. It should be emphasized that the pattern-theoretic representations employed to accommodate geometric variability allow us to construct the Bayes posterior directly on the parameter spaces of the three-dimensional scene. Additionally, the sensor data are not preprocessed. Both alternative parameterizations of the object pose and additional data processing will in general decrease the mutual information, resulting in exponential performance loss per (15).

The mutual information can be computed jointly or individually for the parameters of object class, position, orientation, and signature. For simplicity, we shall focus on the parameters of object signature and orientation, noting that the analysis can be generalized to any parameter space on which we can construct the posterior. To study the impact of signature variability in ground-based object orientation estimation via FLIR, we study three forms of the mutual information: I(O; I^D; t), I(O; I^D; T), and I(O; I^D). In the definitions that follow, fixed parameters and variables of integration will be lowercase, random variables capital. I(O; I^D; t) measures the information gain in orientation given the image data when the object signature t is known and is given by

    I(O; I^D; t) = ∫∫ π(o, i^D | t) log [ π(o | i^D, t) / π(o) ] do di^D.   (16)

I(O; I^D; T) measures the average information gain over the space of possible object signatures given the observation

    I(O; I^D; T) = ∫ I(O; I^D; t) π(dt).   (17)

I(O; I^D) measures the average information gain over the space of signatures given the observation in the case that the object signature is unknown

    I(O; I^D) = ∫∫ π(o, i^D) log [ π(o | i^D) / π(o) ] do di^D.   (18)

For this calculation, we generate the posterior density on orientation by marginalizing over the space of signatures

    π(o | i^D) = ∫ π(o, t | i^D) dt.   (19)

Monte Carlo random sampling is used to evaluate the integrals over I^D, conditioning on the object pose and signature.
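The Monte Carlo evaluation of I(O; I^D) = H(O) − E[H(O | I^D)] on a discrete orientation grid can be sketched with a toy one-pixel sensor and a uniform orientation prior. This is an illustrative sketch, not the paper's simulator; all names are our own:

```python
import numpy as np

def mutual_information_mc(orientations, render, sigma, n_mc, rng=0):
    """Monte Carlo estimate of I(O; I^D) = H(O) - E[H(O | I^D)] in bits,
    for a discrete orientation set with uniform prior and an additive
    Gaussian sensor I^D = render(o) + noise. render(o) returns the ideal
    noiseless image for orientation o as a 1-D array."""
    rng = np.random.default_rng(rng)
    K = len(orientations)
    templates = np.array([render(o) for o in orientations])
    h_prior = np.log2(K)
    h_post = 0.0
    for _ in range(n_mc):
        o = rng.integers(K)                        # sample a true orientation
        data = templates[o] + sigma * rng.standard_normal(templates.shape[1])
        # Gaussian log-likelihood for every candidate orientation
        ll = -np.sum((data - templates)**2, axis=1) / (2 * sigma**2)
        p = np.exp(ll - ll.max())
        p /= p.sum()                               # discrete posterior on the grid
        h_post += -np.sum(p * np.log2(p + 1e-300)) # posterior entropy, bits
    return h_prior - h_post / n_mc
```

At high SNR the posterior concentrates and the estimate approaches log2 K bits (the prior entropy); at very low SNR it approaches zero, mirroring the curves of Fig. 5.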


Fig. 4. Here the T62 tank is displayed with five thermal signatures. In the top row, left to right: profile 140, profile 75, and profile 240. In the bottom row: profile 45 on the left and profile 8 on the right. These differing signatures are used to quantify performance variation due to inaccurate signature information.

The integrals over O are numerically evaluated via the discrete set O_δ ⊂ O. Thus

    Î(O; I^D) = H(O_δ) − (1/n) Σ_{j=1}^{n} H(O_δ | i^D_j)   (20)

is an estimate of I(O; I^D) such that

    Î(O; I^D) → I(O; I^D)   (21)

as the discretization O_δ is refined and the number of Monte Carlo samples n grows.

A. The Cost of Model Mismatch

The information loss associated with the use of incorrect signature information is quantified by the relative entropy between the posterior probability density models. Let π_t(o | i^D) denote the posterior model assuming object signature t; define t* to be the true object signature. The relative entropy or Kullback–Leibler divergence is

    D( π_{t*} ‖ π_t ) = ∫ π_{t*}(o | i^D) log [ π_{t*}(o | i^D) / π_t(o | i^D) ] do   (22)

where π_{t*} denotes the posterior computed with the true object signature.

The various thermal signatures studied are shown in Fig. 4. Fig. 5 examines these forms of mutual information. The top left panel shows I(O; I^D) for several objects viewed via the optical imager. Notice the relative variations of the curves for the three objects: T62 tank (dot-dashed), HMMV (solid), and truck (dashed). The top right panel of Fig. 5 examines the FLIR sensor, superimposing I(O; I^D) (solid line) with I(O; I^D; t) for several thermal signatures t. Notice the performance dependence on the underlying object signature and the increased information gain with the knowledge of the true object signature. The object signature will not generally be known. The mutual information I(O; I^D) characterizes average performance in the absence of signature information. The variation of this nuisance parameter decreases the information gain. We can measure precisely the cost in information of unknown thermal signature averaged over the signatures in the databases. The lower left panel of Fig. 5 shows I(O; I^D; T) (solid curve) and I(O; I^D) (dashed curve), the average information gain in object pose given the FLIR imagery with and without the correct object signature. Conditioning on the signature reduces the entropy, implying that these forms of mutual information are ordered

    I(O; I^D; T) ≥ I(O; I^D).   (23)

The difference between the two, shown in the right panel, is the average cost in bits of the unknown signature in terms of information regarding the object pose.

The information loss due to the posterior density model mismatch is

    D( π_{t*}(O | I^D) ‖ π_t(O | I^D) ).   (24)

Similarly, the information loss in the absence of a priori signature information is

    D( π_{t*}(O | I^D) ‖ π(O | I^D) )   (25)

where π(O | I^D) is computed per (19). This measure, averaged over the space of signatures, equivalently quantifies the average information loss due to signature model mismatch

    D( π(O | I^D, T) ‖ π(O | I^D) ) = H(O | I^D) − H(O | I^D, T)   (26)
                                     = I(O; I^D; T) − I(O; I^D)   (27)
                                     ≥ 0.   (28)

Fig. 6 illustrates the information loss due to model mismatch. In the left panel the observations are generated assuming one of the true signatures t* of Fig. 4. The Kullback–Leibler divergences D(π_{t*} ‖ π_t) and D(π_{t*} ‖ π) appear. Among the curves, the posterior model π of (19) minimizes the information loss. In the right panel, the simulations are repeated for a second true signature. Again, modeling the object signature via marginalizing the joint posterior minimizes the information loss.
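On a discrete orientation grid, the Kullback–Leibler divergence between a true-signature posterior and a mismatched one is a direct sum. A sketch (the function name is our own):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-300):
    """Kullback-Leibler divergence D(p || q) in bits between two discrete
    posterior distributions defined on the same orientation grid."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * (np.log2(p + eps) - np.log2(q + eps))))
```

The divergence is zero exactly when the assumed-signature posterior matches the true one, and positive otherwise, which is the sense in which mismatch always costs information.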


Fig. 5. The top left panel shows I(O; I^D) for the optical imager, demonstrating the dependence of information gain on the object class. The top right panel shows I(O; I^D; t) for the FLIR sensor for several underlying thermal signatures. The bottom left panel compares I(O; I^D; T) (solid line) and I(O; I^D) (dashed line), showing the information loss due to the absence of knowledge of the object signature. The bottom right panel shows the average information loss due to signature variation, as measured by the Kullback–Leibler divergence D(π(O | I^D, T) ‖ π(O | I^D)), versus increasing signal-to-noise ratio (SNR).

Fig. 6. The left panel shows the information loss when the observations are generated by one true signature t*: the Kullback–Leibler divergences D(π_{t*} ‖ π_t) for the assumed signatures t of Fig. 4, and D(π_{t*} ‖ π). The right panel shows the information loss for a second true signature. In both panels, information loss is minimized by the Bayes signature model without a priori assumption of object signature.

V. ASYMPTOTIC APPROXIMATION OF INFORMATION MEASURES The information measures for recognition presented in the previous section are complicated integrals that do not lend themselves to closed-form evaluation. For calculation, we approximate the measures numerically via simulation as before, or analytically under additional assumptions. In this section, building on asymptotic approximations for Bayesian posterior distributions as the signal-to-noise ratio (SNR) approaches infinity, we present approximations for the posterior entropy of the object orientation. The approach is general and may be extended to information measures for any more complex parameter space that


may be locally approximated by a finite-dimensional Euclidean space. We follow the development of Grenander et al. [28]. For large SNR we identify the rotation O ∈ SO(2) with its Euler angle θ via the representation O = O(θ).

The differential posterior entropy conditioned on the observation i^D is

    H(O | I^D = i^D) = − ∫ π(o | i^D) log π(o | i^D) do.   (29)

This form of the posterior entropy is a function of both the underlying object orientation and the SNR of the observation.

Fig. 7. Top row: numerical (solid) and quadratic (dot-dashed and dashed) approximations to H(O_δ | i^D) for, from left to right, θ = 30° and θ = 70°. Bottom row: video imagery of the T62 tank at SNR 5, 15, and 30 dB from left to right.


The probability mass function on the discrete set O_δ, with grid spacing δ, is related to the posterior density as

    p(o) = π(o | i^D) δ,  o ∈ O_δ.   (30)

This gives the discrete posterior entropy

    H(O_δ | i^D) = − Σ_{o ∈ O_δ} p(o) log p(o).   (31)

The asymptotics of Bayesian posterior distributions have been repeatedly examined in the literature. As in [29], we asymptotically model the posterior of θ as normal with mean the true object rotation θ* and variance the inverse of the Fisher information F(θ*). This posterior model is based on a quadratic Taylor series expansion of the high-SNR data likelihood (see [28] for more details). On the discrete set O_δ, the quadratic probability mass function is

    q(o) ∝ exp( −F(θ*) (o − θ*)² / 2 ),  o ∈ O_δ.   (32)

Fig. 7 shows H(O_δ | i^D) calculated both numerically following (31) (solid curves), and quadratically (dashed and dot-dashed curves) according to

    H_q(O_δ | i^D) = − Σ_{o ∈ O_δ} q(o) log q(o).   (33)

In the first quadratic approximation (dot-dashed curves) we compute the Fisher information directly following [28]

    F(θ) = −E[ ∂²/∂θ² ℓ(I^D | θ) ]   (34)

which for the Gaussian sensor model reduces to

    F(θ) = (1/σ²) ‖ ∂/∂θ (P I_θ) ‖²   (35)

where P I_θ denotes the mapping from the rotation θ to the ideal noiseless image produced by the sensor of the object with rotation θ. In [30], the minimum mean-squared-error (MMSE) estimator on SO(2) was derived based on the Hilbert–Schmidt (HS) norm. Motivated by the Cramér–Rao bound (e.g., [31]) relating the Fisher information and the MMSE estimator, we additionally approximate the asymptotic variance using the MMSE

    F(θ*) ≈ 1 / MSE(θ*).   (36)

In both cases, the asymptotic variance is a function of SNR and θ*. The variance decreases with increasing SNR, quadratically modeling the asymptotic concentration of the posterior about θ*. For these experiments the discrete set of possible orientations is refined to obtain more accurate results at high SNR. The results appear in the top row of Fig. 7 for θ = 30° (left panel) and θ = 70° (right panel). The numerical results are the solid curves. The dot-dashed curves are the quadratic approximation based on the empirical variance, and the dashed curves are the quadratic approximations based on the HS squared error. Notice the surprising range of SNR values for which the quadratic approximations closely fit the numerical results. The images in the bottom row show the T62 tank at SNR 5, 15, and 30 dB from left to right. The second estimate of the Fisher information is based on the HS pose estimator [30], [32]. This estimate is a function of the mean-squared error

    MSE(θ*) = E[ ‖ Ô − O(θ*) ‖²_HS ].   (37)

Fig. 8. In the left panel, the conditional entropies H(O | I) for the FLIR sensor (solid), the video sensor (dashed), and the joint case (x) are plotted versus increasing SNR. In the right panel, the corresponding mutual informations I(O; I) are plotted versus increasing SNR.

Fig. 9. Top row: FLIR observations courtesy of Dr. J. Ratches, U.S. Army NVESD. Bottom row: renderings of estimated orientation and signature from corresponding observation of top row.
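The pose estimates rendered in Fig. 9 are produced by gradient-based maximization of a posterior over orientation using numerical derivatives. A minimal sketch of that style of search, with a toy unimodal log-posterior standing in for the actual FLIR posterior (the step size, iteration count, and posterior shape are illustrative choices of ours):

```python
import numpy as np

def numerical_grad(f, x, h=1e-4):
    """Central-difference approximation to df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def map_orientation(log_posterior, theta0, step=0.1, iters=300):
    """Gradient ascent on a scalar log-posterior over orientation (radians)."""
    theta = theta0
    for _ in range(iters):
        theta += step * numerical_grad(log_posterior, theta)
    return theta

# Toy log-posterior peaked at a true orientation of 30 degrees --
# a stand-in for the joint orientation/signature posterior, which
# is not reproduced here.
theta_true = np.radians(30.0)
log_posterior = lambda t: 5.0 * np.cos(t - theta_true)

# Initialize far from the mode and ascend.
estimate = map_orientation(log_posterior, theta0=np.radians(100.0))
```

The numerical differencing plays the role of the derivative approximations of [33]; on the toy posterior the ascent settles at the mode.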

evaluated via Monte Carlo random sampling. We relate the squared error on \theta to the HS squared error on O by Taylor's series expansion,

    \| O(\hat{\theta}) - O(\theta) \|^2_{HS} = 4(1 - \cos(\hat{\theta} - \theta))    (38)

    \approx 2(\hat{\theta} - \theta)^2,    (39)

giving \hat{\sigma}^2_{HS} in (36). The accuracy of this approximation illustrates the direct connection between the information measures and estimator performance embodied by bounds such as (15).

VI. MULTIPLE SENSOR PERFORMANCE CHARACTERIZATION

Sensor fusion is readily accommodated via the use of the joint likelihood per (11). We consider FLIR, simulated by PRISM, and an optical video imager. The left panel of Fig. 8 shows simulation results for the conditional entropy H(O | I) for the video sensor (dashed), for the FLIR sensor (solid), and for the joint case. In the right panel, the mutual information is shown for each of the sensors and for the joint case. Notice the increased information gain due to the second available observation in the joint case.
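The information gain from a second sensor can be made concrete with a small discrete example: a uniform orientation prior, two conditionally independent noisy "sensors" modeled as symmetric channels, and mutual information computed directly from the joint probability mass function. The channel model and error rate are illustrative choices of ours, not the paper's PRISM/video simulation:

```python
import numpy as np

def mutual_info(joint):
    """I(X;Y) in bits from a joint pmf laid out on a 2-D grid."""
    px = joint.sum(axis=1, keepdims=True)   # marginal of X (rows)
    py = joint.sum(axis=0, keepdims=True)   # marginal of Y (columns)
    mask = joint > 0
    return np.sum(joint[mask] * np.log2(joint[mask] / (px @ py)[mask]))

N = 8        # candidate orientations, uniform prior
eps = 0.3    # per-sensor error probability

# Single-sensor channel: correct orientation w.p. 1-eps, uniform error otherwise.
chan = np.full((N, N), eps / (N - 1))
np.fill_diagonal(chan, 1.0 - eps)

joint1 = chan / N                # p(o, y1) under the uniform prior
i1 = mutual_info(joint1)

# Two conditionally independent sensors: p(o, y1, y2) flattened to p(o, (y1, y2)).
joint2 = (chan[:, :, None] * chan[:, None, :]).reshape(N, N * N) / N
i2 = mutual_info(joint2)
```

The joint observation strictly increases the mutual information over either sensor alone, matching the qualitative behavior in the right panel of Fig. 8.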

VII. COMANCHE FLIR DATA

Two sets of validation experiments have been performed. In the first, we consider joint object orientation and signature estimation given FLIR observations. The approach is the maximization of the posterior of (7) over the joint parameter space. The maximization is achieved via gradient descent, using numerical approximations to the derivative of the posterior [33]. For the experiments, data were provided by Dr. J. Ratches of the Night Vision and Electronic Sensors Directorate (NVESD). Three observations are shown in the top row of Fig. 9. The corresponding estimates are shown in the bottom row. In each case, the algorithm was initialized with the orientation broadside, gun facing to the right. The initial object signature was the mean signature from the database. The signature was represented using a low-dimensional subspace of signature principal components. In all three cases the correct object orientation was estimated.

In a second set of experiments, the SNR model for FLIR employed in [25], [26], based on the mean object and background pixel intensity, was used to estimate the operating SNR of the FLIR sensor that collected the NVESD data as 5.07 dB. In the plots of Fig. 10 we show mean-squared


Fig. 10. Left panel: Mean-squared error for orientation estimation via FLIR with and without signature information. Right panel: Mutual information curves for FLIR with and without signature information. In both cases the solid lines correspond to performance with signature information, and the dashed curves without signature information. The operating SNR of the NVESD FLIR data is indicated on the curves.

error loss (left panel) and information loss (right panel) due to object signature variation, labeling the points on the performance curves corresponding to the operating SNR of the FLIR sensor. The mean-squared error curves are computed using the Hilbert–Schmidt estimator [30] of (37), conditioning on the object orientation \theta = 30°. Notice the information loss due to signature uncertainty is 0.86 bits, with an associated near doubling of the mean-squared error in pose. This is consistent with the bound of (15), which predicts a loss by the corresponding multiplicative factor.

VIII. CONCLUSIONS

This paper presents deformable template representations accommodating both geometric and signature variability of objects in ground-based scenes. In addition, a general information-theoretic performance analysis paradigm for recognition is proposed based on the mutual information measure. This measure allows for the quantitative analysis of the information regarding an observed scene contained in remote sensor data in general scenarios. The uncertainty associated with signature variation is quantified via mutual information analysis and, equivalently, the Kullback–Leibler divergence. Accurate quadratic approximations to the posterior entropy, which may be generalized to other information measures, have been developed in the high-SNR regime. Simulation results show that the asymptotic model accurately approximates the information measures over a broad range of SNR, indicating that imaging sensors may be modeled asymptotically in general low-dimensional object recognition and estimation scenarios. The mutual information measures have been used to quantify both performance differences among individual sensors and recognition scenarios, as well as the information gain afforded by the joint use of multiple sensors.

ACKNOWLEDGMENT

Dr. Aaron Lanterman contributed to the development of the idea of expressing signature variation as a scalar random field via principal components analysis. Prof. Anuj Srivastava provided helpful discussions regarding asymptotic approximations to posterior densities in Bayesian object recognition. Prof. A. O. Hero and the anonymous referees prepared constructive reviews that improved our presentation of this work. The Night

Vision Electronic Sensor Directorate provided FLIR imagery to the Center for Imaging Science to validate the simulations.

REFERENCES

[1] A. Srivastava, M. Miller, and U. Grenander, "Multiple target direction of arrival tracking," IEEE Trans. Signal Processing, vol. 43, pp. 1282–1285, May 1995.
[2] M. Miller, A. Srivastava, and U. Grenander, "Conditional-mean estimation via jump-diffusion processes in multiple target tracking/recognition," IEEE Trans. Signal Processing, vol. 43, pp. 2678–2690, Nov. 1995.
[3] M. Miller, U. Grenander, J. O'Sullivan, and D. Snyder, "Automatic target recognition organized via jump-diffusion algorithms," IEEE Trans. Image Processing, vol. 6, pp. 157–174, Jan. 1997.
[4] A. Yuille and P. Hallinan, "Deformable templates," in Active Vision. Cambridge, MA: MIT Press, 1992.
[5] P. Belhumeur, J. Hespanha, and D. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection," IEEE Trans. Pattern Anal. Machine Intell., vol. 19, pp. 711–720, July 1997.
[6] H. Murase and S. Nayar, "Visual learning and recognition of 3-D objects from appearance," Int. J. Computer Vision, vol. 14, pp. 5–24, 1995.
[7] S. Joshi, M. Miller, and U. Grenander, "On the geometry and shape of brain sub-manifolds," Int. J. Pattern Recogn. Artificial Intell., vol. 11, no. 8, pp. 1317–1343, 1997.
[8] A. Colmenarez and T. Huang, "Face detection with information-based maximum discrimination," in Proc. Int. Conf. Computer Vision and Pattern Recognition, 1997.
[9] A. Colmenarez and T. Huang, "Pattern detection with information-based maximum discrimination and error bootstrapping," in Proc. Int. Conf. Computer Vision and Pattern Recognition, 1998.
[10] P. Viola and W. Wells, "Alignment by maximization of mutual information," Int. J. Computer Vision, vol. 24, no. 2, pp. 137–154, 1997.
[11] J. Fisher III and J. Principe, "A methodology for information-theoretic feature extraction," in Proc. World Congr. Computational Intelligence, 1998.
[12] J. Principe, D. Xu, and J. Fisher III, "Pose estimation in SAR using an information-theoretic criterion," Proc. SPIE, vol. 3370, pp. 218–229, 1998.
[13] T. Cover and J. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[14] L. Shao, A. Hero, W. Rogers, and N. Clinthorne, "The mutual information criterion for SPECT aperture evaluation and design," IEEE Trans. Med. Imag., vol. 8, no. 4, pp. 322–336, 1989.
[15] J. Aggarwal, B. Guenther, R. Keyes, P. Kruse, M. Miller, J. Ratches, J. Shapiro, D. Skatrud, T. Turpin, and R. Zwirn, "Report of the Working Group on Automatic Target Recognition," Army Res. Office, 1994.
[16] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, pp. 379–423, 623–656, 1948.
[17] U. Grenander, General Pattern Theory. Oxford, U.K.: Oxford Univ. Press, 1994.
[18] Y. Moses, Y. Adini, and S. Ullman, "Face recognition: The problem of compensating for changes in illumination direction," in Europ. Conf. Computer Vision, 1994, pp. 286–296.
[19] W. Boothby, An Introduction to Differentiable Manifolds and Riemannian Geometry. New York: Academic, 1986.

[20] "Prism 3.1 Users' Manual," Keweenaw Res. Ctr., Michigan Technol. Univ., Houghton, 1987.
[21] R. Muirhead, Aspects of Multivariate Statistical Theory. New York: Wiley, 1982.
[22] A. Lanterman, M. Miller, D. Snyder, and W. Miceli, "The unification of detection, tracking, and recognition for radar/ladar processing," Proc. SPIE, vol. 2562, pp. 150–216, 1995.
[23] A. Lanterman, M. Miller, and D. Snyder, "Implementation of jump-diffusion processes for understanding FLIR scenes," Proc. SPIE, vol. 2485, pp. 309–320, 1995.
[24] D. L. Snyder, A. M. Hammoud, and R. L. White, "Image recovery from data acquired with a charge-coupled-device camera," J. Opt. Soc. Amer. A, vol. 10, pp. 1014–1023, 1993.
[25] J. Kostakis, M. Cooper, T. Green, M. Miller, J. O'Sullivan, J. Shapiro, and D. Snyder, "Multispectral active-passive sensor fusion for ground-based target orientation estimation," Proc. SPIE, vol. 3371, pp. 500–507, 1998.
[26] J. Kostakis, M. Cooper, T. Green, M. Miller, J. O'Sullivan, J. Shapiro, and D. Snyder, "Multispectral sensor fusion for ground-based target orientation estimation: FLIR, LADAR, HRR," Proc. SPIE, vol. 3717, 1999.
[27] S. Hannon and J. Shapiro, "Active-passive detection of multipixel targets," Proc. SPIE, vol. 1222, pp. 2–23, 1990.
[28] U. Grenander, A. Srivastava, and M. Miller, "Asymptotic performance analysis of Bayesian target recognition," IEEE Trans. Inform. Theory, vol. 46, pp. 1658–1665, July 2000.
[29] B. Clarke, "Asymptotic normality of the posterior in relative entropy," IEEE Trans. Inform. Theory, vol. 45, pp. 165–176, Jan. 1999.
[30] U. Grenander, M. Miller, and A. Srivastava, "Hilbert–Schmidt lower bounds for estimators on matrix Lie groups," IEEE Trans. Pattern Anal. Machine Intell., vol. 20, pp. 1–13, Aug. 1998.
[31] H. Van Trees, Detection, Estimation, and Modulation Theory, Part I. New York: Wiley, 1968.
[32] A. Srivastava, "Inferences on Transformation Groups Generating Patterns on Rigid Motions," Ph.D. dissertation, Washington Univ., St. Louis, MO, Aug. 1996.
[33] A. Lanterman, "Jump-Diffusion Algorithms for the Automated Understanding of Forward-Looking Infrared Scenes," M.S. thesis, Washington Univ., Dept. Elec. Eng., St. Louis, MO, May 1995.