Information Gain in Object Recognition Via Sensor Fusion

Matthew Cooper and Michael Miller
Center for Imaging Science, Department of Electrical Engineering
Washington University, St. Louis, Missouri 63130

Abstract:

We have been studying information-theoretic measures, entropy and mutual information, as performance metrics quantifying the information gain provided by a standard suite of sensors. Object pose is described by a single angle of rotation using a Lie group parameterization; observations are generated using CAD models for the targets of interest and sensor simulators. Variability in the data due to the sensor by which the scene is remotely observed is statistically characterized via the data likelihood function. Given observations from multiple sensors, data fusion is automatic in the posterior density. We consider the mutual information between the target pose and the remote observation as a performance measure in the pose estimation context, and we quantitatively examine the additional information gain due to sensor fusion. Furthermore, we relate the information-theoretic performance measures to the probability of error in the pose estimation problem via Fano's classic inequality.

Keywords: Automatic Target Recognition (ATR), Information Theory, ATR Performance Analysis

1 Introduction

We have been studying the use of deformable templates for rigid body Automatic Target Recognition (ATR) [1, 2, 3, 4] accommodating the geometric variations of orientation, position, and scale. The application of pattern theory to the ATR problem proceeds in three steps: (i) the representation of the scene being observed, (ii) the formation of remote observations by multiple sensors, and (iii) inference via computational search and optimization strategies [4]. We formulate the ATR problem via a communications model using the "source-channel" characterization of Shannon; the decoder attempts to infer properties of the sources (targets) in the scene from the observations of the scene viewed through the remote-sensing channel. Such a view allows us to answer the fundamental question of how much information various sensors provide about remote scenes.

(For additional information: http://cis.wustl.edu/. This paper was presented at the 1998 International Conference on Multisource-Multisensor Data Fusion, 9 July 1998. This work was supported by ONR N00014-94-1-0859, ONR/AASERT N00014-94-1-1135, and ARO DAAH04-95-1-0494.)

1.1 Representing Target Variability Via Deformable Templates

Objects are represented using templates; their infinite variety of pose is represented via transformations which act on the templates. In the context of ATR involving objects, the transformations form groups. A deformable template is then the orbit under a group action, its subgroups, and products. Estimation becomes identification of the group action and target type, requiring matching of the observed remotely sensed image with the particular instance of the template. Scenes consist of targets drawn from an alphabet $\mathcal{A}$ of possible targets, each with an associated parameter vector $s \in \mathcal{S}$, the group of transformations describing position and orientation. Templates are constructed corresponding to CAD representations of the 2-dimensional surface manifolds of the rigid objects. Denote such an ideal template as $I_{temp} \triangleq \{I_{temp}(x) : x \in X\}$, with $X$ the space indexing location in the target. A rigid template defined by a CAD model is shown in the top row of Figure 1 at two different poses.

Figure 1: The two primary modes of variation of ground-based targets observed via FLIR sensors: geometric and thermodynamic variability. In the top row, a T62 tank is shown at two different orientations. In the bottom row, the tank is shown in different thermodynamic states at a single pose.

Geometric variation, due to variability of the pose of the targets, is introduced via the rigid motions of translation and rotation. The set of transformations $\mathcal{S}$ are Lie group actions on the templates. For ground-based scenes, we use the axis-fixed $2 \times 2$ rotation group, identified with $SO(2)$, and translations in the plane, $\mathbb{R}^2$. The transformations are of the form $s = (O, a)$, with $O \in SO(2)$ and $a \in \mathbb{R}^2$ a position vector. Then $s \in \mathcal{S} \cong SE(2) = SO(2) \ltimes \mathbb{R}^2$, the Special Euclidean group, where $\ltimes$ denotes the semi-direct product [5], with $s : X \to X$ acting according to $sx \mapsto Ox + a$. The deformable template over which inference occurs is the orbit under $SE(2)$:

$\mathcal{I} \triangleq \{I_{temp}(Ox + a) : (O, a) \in \mathcal{S}\}.$   (1)

Signature variation, exhibited in the bottom row of Figure 1 in which the geometric parameters are fixed, is accommodated by modeling the target thermodynamic state as a scalar Gaussian random field on the target surface. Specifically, we define the target thermodynamic state as the random field $T(l), l \in \mathcal{L}, |\mathcal{L}| = L$, where $\mathcal{L}$ is the discrete set of lattice points on the target surface specified by the template (CAD model). As shown in [6, 7, 8], the truncated eigen-expansion in which the first $C = 20$ thermodynamic principal components with the greatest associated eigenvalues are retained effectively represents the target thermodynamic signature. Define the extended template accommodating both the infinity of geometric and signature variations of targets in optical and FLIR imagery using the equivalent variations of the coefficients $\beta$ in the eigen-expansion of the target thermodynamic signature:

$\mathcal{I} \triangleq \{I_{temp}(Ox + a, T(\beta)) : (s, \beta) \in SE(2) \times \mathbb{R}^{20}\}.$   (2)
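To make the construction concrete, the following sketch (not the authors' implementation) applies a pose $s = (O, a) \in SE(2)$ to a set of template lattice points and reconstructs a thermodynamic signature from a truncated expansion with $C = 20$ components; the lattice, mean signature, eigenvectors, and coefficients are random placeholders standing in for the CAD/PRISM-derived quantities.

```python
import numpy as np

def se2_apply(points, theta, a):
    """Apply s = (O, a) in SE(2) to planar template points: x -> O x + a."""
    O = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return points @ O.T + a

def signature_from_components(mean_T, eigvecs, beta):
    """Truncated eigen-expansion of the thermodynamic state:
    T = mean_T + sum_c beta_c * phi_c, keeping C = len(beta) components."""
    return mean_T + eigvecs[:, :len(beta)] @ beta

# Placeholder template: L lattice points, C = 20 principal components.
L, C = 500, 20
rng = np.random.default_rng(0)
points = rng.uniform(-1.0, 1.0, size=(L, 2))        # stand-in for the CAD lattice
mean_T = rng.normal(size=L)                          # stand-in mean signature
eigvecs = np.linalg.qr(rng.normal(size=(L, C)))[0]   # orthonormal stand-in eigenvectors
beta = rng.normal(size=C)                            # expansion coefficients

posed_points = se2_apply(points, theta=np.pi / 6, a=np.array([3.0, -1.0]))
signature = signature_from_components(mean_T, eigvecs, beta)
```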

1.2 Source-Channel Model for ATR

Denote the system through which the ideal images are observed by the deformation mechanism $D : \mathcal{I} \to \mathcal{I}^D$, which can be either deterministic or random. The data $I^D$ may have multiple components corresponding to the various sensors, $I^D \triangleq (I_1^D, I_2^D, \ldots)$, each characterized via a likelihood function

$L_i(\cdot|\cdot) : \mathcal{I} \times \mathcal{I}_i^D \to \mathbb{R}, \quad i = 1, 2, \ldots$   (3)

The conceptual separation of the representation of the space of possible images $\mathcal{I}$, with its associated prior $\pi(I(s)), I(s) \in \mathcal{I}, s \in \mathcal{S}$, from the image formation process with transition law $L(I^D | I), I^D \in \mathcal{I}^D$, specifying the output of the remote-sensing channel, is consistent with Shannon's source-channel view of communications. The importance of this view is that there is only one true underlying scene, irrespective of the number of sensor measurements forming $I^D = (I_1^D, I_2^D, \ldots)$. Only one inference problem is solved, with the multiple observations due to multiple sensors viewed as providing additional information in the posterior distribution. Sensor fusion occurs automatically in this framework.

The Bayesian formulation naturally incorporates additional sensors into the inference algorithm. Remote sensor observations are conditionally independent given the parameters describing the scene; the data likelihood terms for each sensor are thus multiplied to give the joint data likelihood for all available observations. If there are $S$ observations $\{I_1^D, \ldots, I_S^D\}$ produced by $S$ different sensors,

$L(I_1^D, \ldots, I_S^D | O) = \prod_{i=1}^{S} L(I_i^D | O).$   (4)
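As an illustration of Equation 4, the sketch below fuses per-sensor log-likelihoods over a discrete pose grid by summation and normalizes to obtain the posterior; the sensor likelihoods shown are simple Gaussian stand-ins, not the imaging models of Section 2.

```python
import numpy as np

def fused_posterior(log_likelihoods, poses, log_prior=None):
    """Fuse conditionally independent sensors (Equation 4): sum the per-sensor
    log-likelihoods over a discrete pose grid, add the log-prior, and normalize
    to obtain the posterior over pose."""
    log_post = np.zeros(len(poses)) if log_prior is None else np.array(log_prior, float)
    for loglik in log_likelihoods:                   # one callable per sensor
        log_post = log_post + np.array([loglik(O) for O in poses])
    log_post -= log_post.max()                       # numerical stabilization
    post = np.exp(log_post)
    return post / post.sum()

# Toy usage: two stand-in sensors observing a pose angle near 0.4 rad.
poses = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
video = lambda O: -0.5 * ((O - 0.40) / 0.3) ** 2     # stand-in video log-likelihood
flir = lambda O: -0.5 * ((O - 0.55) / 0.5) ** 2      # stand-in FLIR log-likelihood
posterior = fused_posterior([video, flir], poses)    # sensor fusion is automatic here
```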

We shall focus on two sensors: the optical imaging sensor and the FLIR sensor.

2 The Imaging Models

For the ground scenario, the scenes considered herein consist of a T62 tank at a known position against a constant background. The video imaging process is modeled as a far-field orthographic projection, the narrow field of view implying the mapping of a point in 3-D space to the 2-D detector space according to $(x_1, x_2, x_3) \mapsto (x_1, x_2)$. The measurements are assumed to be Gaussian random fields with mean field corresponding to the orthographic projection of the scene onto the camera [3, 4]. The FLIR imager uses perspective projection, according to the mapping $(x_1, x_2, x_3) \mapsto (x_1/x_3, x_2/x_3)$, creating the vanishing-point effect in which objects which are further away from the sensor appear closer to the center of the detector. For generating FLIR scenes, Gaussian and Poisson-based CCD sensor models are used [9, 10], coupled to standard infra-red radiation from CAD models of the target generated by the PRISM simulator [11]. In the high-photon limit, the measurements $I^D \triangleq \{I_i^D : i = 1, \ldots, N_D\}$ are a Gaussian random field with mean the projective transformation of the scene.
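The two projection models can be summarized in a few lines; this is a schematic of the geometric mappings only (no noise model or CAD rendering), with illustrative coordinates.

```python
import numpy as np

def orthographic(points3d):
    """Far-field orthographic projection: (x1, x2, x3) -> (x1, x2)."""
    return points3d[:, :2]

def perspective(points3d):
    """Perspective projection: (x1, x2, x3) -> (x1/x3, x2/x3); range along x3
    produces the vanishing-point effect."""
    return points3d[:, :2] / points3d[:, 2:3]

pts = np.array([[1.0, 2.0, 10.0],
                [1.0, 2.0, 20.0]])      # same (x1, x2), different range
print(orthographic(pts))                # both map to the same detector location
print(perspective(pts))                 # the farther point maps nearer the center
```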

3 Mutual Information Performance Analysis

Mutual information provides a direct performance measure of ATR systems. The mutual information between two sets of random variables is the direct measure of the dependence between them [12], and we use it to quantify the dependence between the remote observation of the scene and the target pose. Mutual information provides the bound on communication rate for errorless communication of the parameters describing the scene via the remote-sensing channel. View the observation data $I^D$ as a random vector statistically processed to infer the random parameters of target pose. The mutual information is then a function of $I^D$ measuring its information content regarding the unknown parameter.

3.1 Mutual Information

Here we consider ground-based scenes in which target class and position are known. The objective is to estimate target pose, parameterized by a single angle of rotation, $O \in SO(2)$. In the Bayesian setting the inference depends on the posterior density $\pi(O | I^D)$. To assess estimator performance, we measure the information content of the posterior density. Specifically, we compute the mutual information between target pose and the available observation:

$I(O; I^D) = H(O) - H(O | I^D),$   (5)

$H(O) = -\int_{SO(2)} \log_2(\pi(O))\, \pi(dO),$   (6)

$H(O | I^D) = -\int_{SO(2)} \int_{\mathcal{I}^D} \log_2(\pi(O | I^D))\, \pi(dO, dI^D).$   (7)

The prior on target pose is assumed uniform. For our simulations, we integrate over $SO(2)$ numerically using the discrete set of target orientations $\mathcal{O} = \{O_1, \ldots, O_M\} \subset SO(2)$. To avoid the complications in this problem due to target symmetry, we only consider azimuth angles in the set $\{\theta_i : i = 1, \ldots, M\} \subset [0, 2\pi)$. $H(O)$ is approximated as $-\sum_{i=1}^{M} \pi(O_i) \log_2[\pi(O_i)] = \log_2[M]$. The integral over $I^D$ is evaluated via Monte Carlo random sampling, conditioning on the target pose. Figure 2 shows curves for the mutual information $I(O; I^D)$ versus signal-to-noise ratio (SNR) for the T62 tank, the HMMV (personnel carrier), and a truck. Note the variation in the information gain due to the target geometry.
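A minimal Monte Carlo sketch of Equations 5-7 for a uniform prior over $M$ discrete poses is given below; `simulate` and `log_likelihood` are assumed interfaces standing in for a sensor model (e.g., the Gaussian imaging models of Section 2), not part of the original simulation code.

```python
import numpy as np

def mutual_information(poses, simulate, log_likelihood, n_samples=200, rng=None):
    """Monte Carlo estimate of I(O; I^D) = H(O) - H(O|I^D) for a uniform prior
    over a discrete pose set (Equations 5-7).  `simulate(O, rng)` draws an
    observation given pose O; `log_likelihood(obs, O)` evaluates log p(obs | O)."""
    rng = np.random.default_rng() if rng is None else rng
    M = len(poses)
    H_O = np.log2(M)                             # uniform prior: H(O) = log2(M)
    H_O_given_D = 0.0
    for O_true in poses:                         # outer average over poses
        for _ in range(n_samples):               # Monte Carlo over observations
            obs = simulate(O_true, rng)
            loglik = np.array([log_likelihood(obs, O) for O in poses])
            post = np.exp(loglik - loglik.max())
            post /= post.sum()                   # posterior pi(O | I^D)
            H_O_given_D -= np.sum(post * np.log2(post + 1e-300))
    H_O_given_D /= M * n_samples
    return H_O - H_O_given_D
```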


Figure 2: The plot shows mutual information curves versus SNR for three targets for the optical imaging sensor: the T62 tank, the HMMV, and the truck.

3.2 Sensor Fusion

Sensor fusion is readily accommodated by substituting the posterior conditioned on the independent observations of multiple sensors (via Equation 4 for the multiple-sensor likelihood function). The mutual information becomes:

$I(O; I^D_{FLIR}, I^D_{VIDEO}) = H(O) - H(O | I^D_{FLIR}, I^D_{VIDEO}),$   (8)

$H(O | I^D_{FLIR}, I^D_{VIDEO}) = -\int_{SO(2)} \int_{\mathcal{I}^D_{F}} \int_{\mathcal{I}^D_{V}} \log_2(\pi(O | I^D_{FLIR}, I^D_{VIDEO}))\, \pi(dO, dI^D_{FLIR}, dI^D_{VIDEO}).$   (9)

In the top panel of Figure 3, simulation results for the conditional entropy $H(O | I^D)$ for the FLIR sensor, the optical imaging sensor, and the joint case are shown versus increasing signal-to-noise ratio. The availability of a second observation orders the forms of the conditional entropy, and hence the forms of the mutual information, as follows:

$H(O | I^D_{FLIR}, I^D_{VIDEO}) \le H(O | I^D_{FLIR}),$   (10)

$H(O | I^D_{FLIR}, I^D_{VIDEO}) \le H(O | I^D_{VIDEO}),$   (11)

$I(O; I^D_{FLIR}, I^D_{VIDEO}) \ge I(O; I^D_{FLIR}),$   (12)

$I(O; I^D_{FLIR}, I^D_{VIDEO}) \ge I(O; I^D_{VIDEO}).$   (13)

In the bottom panel, the mutual information $I(O; I^D)$ is shown for each of the individual sensors as well as for the joint case. Figure 4 expands on these results by incorporating a millimeter wave range radar (HRR) into the sensor suite. The simulator and models used for the HRR system are described in [13]. The top panel shows conditional entropy curves, and the bottom panel shows mutual information curves.
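A toy numerical check of the orderings in Equations 10-13 is sketched below using two scalar Gaussian "sensors" that observe the pose angle with different noise levels; this stands in for the FLIR/video models only to illustrate that fusing a second conditionally independent observation cannot increase the conditional entropy.

```python
import numpy as np

rng = np.random.default_rng(1)
poses = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
sigma_flir, sigma_video = 0.8, 0.5               # illustrative noise levels

def cond_entropy(sigmas, n_samples=500):
    """Monte Carlo H(O | I^D_1, ..., I^D_k) for scalar Gaussian 'sensors' that
    each observe the pose angle with standard deviation sigma_i."""
    H = 0.0
    for O_true in poses:
        for _ in range(n_samples):
            obs = [O_true + s * rng.normal() for s in sigmas]
            loglik = sum(-0.5 * ((o - poses) / s) ** 2 for o, s in zip(obs, sigmas))
            post = np.exp(loglik - loglik.max())
            post /= post.sum()
            H -= np.sum(post * np.log2(post + 1e-300))
    return H / (len(poses) * n_samples)

H_joint = cond_entropy([sigma_flir, sigma_video])
H_flir, H_video = cond_entropy([sigma_flir]), cond_entropy([sigma_video])
# Up to Monte Carlo noise, H_joint <= min(H_flir, H_video), so the fused
# mutual information is at least as large as either single-sensor value.
print(H_flir, H_video, H_joint)
```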

4 Fano's Inequality & Probability of Error

Fano's classic inequality [12] relates conditional entropy and probability of error in estimation problems:

$P_e \triangleq \Pr\{\hat{O}(I^D) \neq O_T\}$   (14)

$\ge \dfrac{H(O | I^D) - H(P_e)}{H(O | I^D, \hat{O}(I^D) \neq O_T)}$   (15)

$\ge \dfrac{H(O | I^D) - 1}{\log_2(|\mathcal{O}| - 1)}.$   (16)

Equation 16 allows us to lower bound the probability of error in the pose estimation problem. We have previously examined forms of the conditional entropy for multiple-sensor scenarios and noted their implied ordering. This ordering in turn orders the lower bounds for $P_e$ provided by Fano's inequality. In Figure 5 we compare the lower bounds for $P_e$ for the FLIR, optical imager, and joint cases.
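For completeness, a small helper implementing the lower bound of Equation 16 is sketched below; it assumes entropies in bits and an $M$-ary pose alphabet, and is illustrative rather than the code used to generate Figure 5.

```python
import numpy as np

def fano_lower_bound(cond_entropy_bits, num_poses):
    """Lower bound on the probability of error from Equation 16:
    P_e >= (H(O | I^D) - 1) / log2(|O| - 1), clipped to [0, 1] because the
    bound becomes vacuous (negative) once the conditional entropy is small."""
    bound = (cond_entropy_bits - 1.0) / np.log2(num_poses - 1)
    return float(np.clip(bound, 0.0, 1.0))

# Example: 64 candidate poses and an estimated conditional entropy of 3 bits.
print(fano_lower_bound(3.0, 64))   # approximately 0.33
```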


Figure 3: In the top panel, the conditional entropies $H(O | I^D_{FLIR})$ (solid line), $H(O | I^D_{VIDEO})$ (dashed line), and $H(O | I^D_{FLIR}, I^D_{VIDEO})$ (x) are plotted versus increasing SNR. In the bottom panel, the mutual informations $I(O; I^D_{FLIR})$ (solid line), $I(O; I^D_{VIDEO})$ (dashed line), and $I(O; I^D_{FLIR}, I^D_{VIDEO})$ (x) are plotted versus increasing SNR.

5 Conclusion

Using a deformable template representation for the target of interest, we have developed mutual information as a performance metric for ATR scenarios. Specifically, in the pose estimation setting, we have evaluated $I(O; I^D)$, the mutual information between the target pose and the remote observation(s) of the scene. This measure quantifies both the additional information gain due to the use of multiple sensors and the relative performance differences between the sensors considered. Furthermore, Fano's inequality has been used to relate the information-theoretic performance measures to the probability of error in the pose estimation problem.


Figure 4: In the top panel, the conditional entropy curves for the FLIR, optical imaging, and HRR sensors and for the joint case (x) are plotted versus increasing SNR. In the bottom panel, the corresponding mutual information curves are plotted versus increasing SNR.

References

[1] M. Miller, R. Teichman, A. Srivastava, J.A. O'Sullivan, and D. Snyder. Jump-diffusion processes for automated tracking-target recognition. In Proceedings of the Twenty-Seventh Annual Conference on Information Sciences and Systems, pages 617-622, Baltimore, Maryland, March 24-26, 1993. Johns Hopkins University.


Figure 5: The curves show the lower bounds for $P_e$ given by Fano's inequality (Equation 16) versus SNR for the FLIR (solid), the optical imager (dashed), and the joint case (x).

[2] A. Srivastava, M. Miller, and U. Grenander. Multiple target direction of arrival tracking. IEEE Transactions on Signal Processing, 43(5):1282-1285, May 1995.

[3] M. Miller, A. Srivastava, and U. Grenander. Conditional-mean estimation via jump-diffusion processes in multiple target tracking/recognition. IEEE Transactions on Signal Processing, 43(11):2678-2690, November 1995.

[4] M. Miller, U. Grenander, J. O'Sullivan, and D. Snyder. Automatic Target Recognition Organized Via Jump-Diffusion Algorithms. IEEE Transactions on Image Processing, 6(1):157-174, January 1997.

[5] W. Boothby. An Introduction to Differentiable Manifolds and Riemannian Geometry. Academic Press, 1986.

[6] M. Cooper, A. Lanterman, S. Joshi, and M. Miller. Representing The Variation of Thermodynamic State Via Principal Components Analysis. In Proceedings of the Third Workshop On Conventional Weapon ATR, pages 481-490, U.S. Army Missile Command, Redstone Arsenal, AL, November 1996.

[7] M. Cooper, U. Grenander, M. Miller, and A. Srivastava. Accommodating Geometric and Thermodynamic Variability for Forward-Looking Infrared Radar Systems. Proc. SPIE, 3070:162-72, 1997.

[8] M. Cooper and M. Miller. Information Measures for Object Recognition. To appear, Proc. SPIE, 3370, 1998.

[9] A. Lanterman, M. Miller, D. Snyder, and W. Miceli. The unification of detection, tracking, and recognition for Radar/Ladar processing. Proc. SPIE, 2562:150-16, 1995.

[10] D. L. Snyder, A. M. Hammoud, and R. L. White. Image recovery from data acquired with a charge-coupled-device camera. Journal of the Optical Society of America A, 10:1014-1023, 1993.

[11] A. Curran et al. PRISM 3.1 Users' Manual. Keweenaw Research Center, Michigan Technological University, 1987.

[12] T. Cover and J. Thomas. Elements of Information Theory. John Wiley & Sons, 1991.

[13] S. Jacobs and J. O'Sullivan. High Resolution Radar Models for Joint Tracking and Recognition. In Proc. 1997 IEEE National Radar Conference, pages 99-104.
