A METHODOLOGY FOR INTEGRATING IMAGES AND TEXT FOR OBJECT IDENTIFICATION

Patrick Paulson, Ryan Hohimer, Pete Doucette, William Harvey, Gamal Seedahmed, Gregg Petrie, Lou Martucci
Pacific Northwest National Laboratory, Richland, WA

ASPRS 2006 Annual Conference, Reno, Nevada, May 1-5, 2006

ABSTRACT

Often text and imagery contain information that must be combined to solve a problem. One approach begins with transforming the raw text and imagery into a common structure that contains the critical information in a usable form. This paper presents an application in which imagery of vehicles and text from police reports were combined to demonstrate the power of data fusion to correctly identify a target vehicle (e.g., a red 2002 Ford truck named in a police report) from a collection of diverse vehicle images. The imagery was abstracted into a common signature by first capturing the conceptual models of imagery experts in software. Our system then (1) extracted fundamental features (e.g., wheelbase, color), (2) made inferences about the information (e.g., it is a red Ford), and (3) translated the raw information into an abstract knowledge signature designed both to capture the important features and to account for uncertainty. Likewise, the conceptual models of text analysis experts were instantiated in software that generated an abstract knowledge signature that could be readily compared to the imagery knowledge signature. While this experiment's primary focus was to demonstrate the power of text and imagery fusion for a specific example, it also suggested several ways that text and geo-registered imagery could be combined to help solve other types of problems.

DATA FUSION IN IMAGE AND TEXT ANALYSIS

The goal of our research has been to build semantic bridges between data stored in various media types. Linking multiple information sources allows users to form an overall picture of a situation and allows undiscovered relationships between documents to be discerned. The current focus of our research is to fuse the semantic content of imagery and text.

Knowledge Signatures and Information Fusion

Our approach to fusing the semantic information of images and text makes use of the Knowledge Signature architecture recently developed at PNNL (Thomson, Cowell et al. 2004). A knowledge signature is a real-valued vector that captures the semantics of data within a particular context or domain. This approach simplifies analytic tasks requiring the fusion of multiple types of data, since data items can be identified and sorted using the vocabulary and tacit knowledge of the problem domain, independently of the format of the data. The use of knowledge signatures for information fusion is illustrated in Figure 1. Images and text are separately processed by routines called observers. Each observer is designed to determine whether a particular representation of data contains a particular semantic concept. The semantic concepts of interest are described by an ontology capturing domain knowledge. All observations on a piece of data are collected into a knowledge signature that is indexed by the semantic concepts in the ontology. The resulting signatures can be compared to determine whether they carry similar semantic meaning.
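To make the signature structure concrete, the following is a minimal sketch, assuming a signature is a mapping from ontology concept names to observation strengths with -1.0 as the "not observed" sentinel described later in the paper. The paper does not specify its comparison metric; cosine similarity over mutually observed concepts stands in here purely for illustration, and the function names are hypothetical.

```python
# Minimal knowledge-signature sketch. The -1.0 sentinel marks "not observed",
# as in the paper; the cosine comparison is an illustrative assumption, not
# the architecture's actual metric.
import math

UNOBSERVED = -1.0

def similarity(sig_a: dict[str, float], sig_b: dict[str, float]) -> float | None:
    """Cosine similarity over concepts observed in both signatures.

    Returns None when the signatures share no observed concepts; in that
    case refinement (inference over the ontology) is needed first.
    """
    shared = [c for c in sig_a
              if c in sig_b and sig_a[c] != UNOBSERVED and sig_b[c] != UNOBSERVED]
    if not shared:
        return None
    dot = sum(sig_a[c] * sig_b[c] for c in shared)
    norm_a = math.sqrt(sum(sig_a[c] ** 2 for c in shared))
    norm_b = math.sqrt(sum(sig_b[c] ** 2 for c in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```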

[Figure 1 diagram: Imagery is processed by Automated Image Analysis and Image Observers to produce an Image Knowledge Signature; Text is processed by Automated Text Analysis and Text Observers to produce a Text Knowledge Signature; the two signatures meet in Signature Comparison (Ontology) to accomplish Image-Text Fusion.]

Figure 1. Knowledge Signatures and Information Fusion.

In addition to concepts that can be directly observed, the ontology contains knowledge on how to infer the existence of additional concepts. This allows automated reasoning to refine observed knowledge signatures in order to find hidden similarity. An example of the importance of refinement is shown in Figure 2, which shows a portion of two simple knowledge signatures after observation from an image and from a piece of text, and again after refinement. The image observer can directly observe the measurement of a wheelbase. The text observer recognizes that "F150" is a truck model. The value -1 is used as a sentinel to indicate that no observation was made for a concept. Since no concept is observed in both signatures, similarity cannot initially be computed. However, because our ontology contains inference rules indicating that a long wheelbase implies a truck, after refinement both signatures have values for all concepts, allowing a comparison.
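To make the refinement step concrete, here is a minimal sketch, assuming refinement rules take the form of weighted implications between ontology concepts (e.g., "Long Wheelbase" implies "Truck"). The real architecture derives refinements from the ontology; the rule table and the fill-in/max combination below are illustrative assumptions.

```python
# Sketch of signature refinement via concept-implication rules. The rules
# and the combination operator are assumptions; the actual values produced
# by the system (see Figure 2) may differ.
UNOBSERVED = -1.0

# (antecedent concept, consequent concept, rule strength)
RULES = [
    ("Long Wheelbase", "Truck", 1.0),   # a long wheelbase indicates a truck
    ("Truck", "Long Wheelbase", 1.0),   # trucks tend to have long wheelbases
]

def refine(sig: dict[str, float]) -> dict[str, float]:
    refined = dict(sig)
    for antecedent, consequent, strength in RULES:
        a = refined.get(antecedent, UNOBSERVED)
        if a == UNOBSERVED:
            continue
        inferred = a * strength
        current = refined.get(consequent, UNOBSERVED)
        # Fill in unobserved concepts; otherwise keep the stronger evidence.
        refined[consequent] = inferred if current == UNOBSERVED else max(current, inferred)
    return refined
```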

APPLICATION TO VEHICLE CLASSIFICATION

Knowledge Signatures research at PNNL has primarily focused on extracting signatures from text. To show the utility of our approach for information fusion, we implemented an application that compares images of vehicles to textual descriptions of vehicles. The application enforces a 'watch list' at a vehicle chokepoint.

Chokepoint Scenario

The application simulates data collection and evaluation at a vehicle chokepoint. A chokepoint has several characteristics that simplify the data collection process:

• Slow speeds
• One or two lanes
• Tight traffic controls
• Sensors close to vehicles
• Cannot be easily avoided


One benefit of a chokepoint scenario is that by restricting attention to a single lane, vehicle occlusions, a major problem in many systems (Huang and Liao 2004), can be avoided.

Figure 2 contrasts the signature observed from an image with the signature observed from the text "F150 Pick Up", before and after refinement:

                     Image        Text         Image       Text
Concept              (observed)   (observed)   (refined)   (refined)
Long Wheelbase        0.9         -1.0          0.9         1.0
Medium Wheelbase      0.2         -1.0          0.2         0.0
Short Wheelbase       0.0         -1.0          0.0         0.0
Truck                -1.0          0.8          0.9         0.8
Sedan                -1.0          0.0          0.0         0.0
SUV                  -1.0          0.0          0.0         0.0

Figure 2. Inferring Hidden Similarity.

Since non-intrusive sensors are often desired, systems typically make use of video systems with a single camera producing low-resolution images. Chokepoints allow careful positioning of sensor devices and known backgrounds that facilitate image processing. Examples of chokepoints include ferry loading, parking at events, toll booths, and border crossings. Figure 3 shows a chokepoint at a ferry crossing. This chokepoint provides overhead and close side views of vehicles and allows surveillance of vehicles before they enter the closed environment of a ferry.

Evaluation of Image/Text Fusion

Our hypothesis is that knowledge signatures provide a way to determine the similarity of semantic content extracted from imagery and text data. If this is true, then the similarity determined by the system should correspond to the similarity described by domain experts. The goal of the evaluation is to demonstrate the feasibility of using knowledge signatures for information fusion, especially the usefulness of refinement and similarity metrics; less emphasis is placed on feature observation. To test our hypothesis, we surveyed a group of subjects to create ground truth for the similarity of textual descriptions of vehicles to images of vehicles. We then measured the correlation between the system's similarity results and the ground truth data. Since subjects' opinions regarding the similarity of objects differed, the best the system could hope to do is deviate from the consensus opinion no more than the average expert does.


Figure 3. Vehicle chokepoint.

EXPERIMENTAL SETUP

Watch List

The text descriptions in our watch list are brief descriptions similar to those in the police logs published by many local newspapers. The five text descriptions used for the trials are shown in Figure 4.

Light blue Chevy
White Cadillac Escalade
Late model Green 4 door SUV
2005 dark red Ford pickup
Blue 2 door coupe

Figure 4. Text Descriptions.

Vehicle Images

A collection of images was taken for development of the image feature observers. The correspondence between the field-collected measurements of vehicle features and the measurements computed by the observers is shown in Table 1. The images were modified to emulate controlled conditions at a vehicle chokepoint. The primary modification was to simplify the background so that a vehicle silhouette could be easily extracted; in a real-world application we would attempt to control the background during data collection. Figure 5 shows one of the field-collected images before and after modification. The measurements indicated that length measurements were systematically short and that there was significant variance in the height measurements. The errors are consistent with lens distortion and could probably be eliminated with camera calibration. To allow testing of the reasoning and fusion portions of the system, the field measurements were used instead of the values observed by the system software. Additional images were obtained from the Internet; the appropriate published dimensions were obtained for these images.


Figure 5. Image Cleanup.

Collection of Ground Truth from Human Subjects

Five subjects were interviewed about vehicle classification to develop a baseline for comparison with the system's performance. Each subject was given the list of text descriptions and 22 vehicle images. For each text description, the subject identified the 3 images that best matched the description and gave each match a quality rating on a scale of 1-5. These values are given in Table 2, along with the top 3 selections of the Ksigs system.

THE VEHICLE IDENTIFICATION DOMAIN

Knowledge collection and feature extraction were guided by the goal of determining the make, model, year, color, and body style of vehicles. Our observation that these are common concepts in vehicle identification was verified by examining police-call logs reported on the world-wide web. These concepts formed the main groups of our ontology.

Extracting Features From Text

Features were extracted from text using a simple gazetteer mechanism: particular words in the text indicate the existence of particular concepts. For example, 'hardtop' indicates the concept 'Coupe' and 'Chevy' indicates the concept 'Chevrolet'.
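A gazetteer of this kind reduces to a word-to-concept lookup. The following is a minimal sketch under that assumption; the word list and the 1.0 observation strength are illustrative, not the system's actual table.

```python
# Illustrative gazetteer: trigger words mapped to ontology concepts.
GAZETTEER = {
    "hardtop": "Coupe",
    "coupe": "Coupe",
    "chevy": "Chevrolet",
    "chevrolet": "Chevrolet",
    "pickup": "Truck",
    "f150": "F150",
}

def observe_text(description: str) -> dict[str, float]:
    """Return a partial knowledge signature for a text description."""
    signature: dict[str, float] = {}
    for word in description.lower().replace("-", " ").split():
        concept = GAZETTEER.get(word.strip(".,"))
        if concept is not None:
            signature[concept] = 1.0
    return signature

# e.g., observe_text("Light blue Chevy") -> {"Chevrolet": 1.0}
```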

Extracting Features from Images

The chokepoint scenario allows us to make assumptions that simplify feature extraction from images. For example, the camera triggering device can be designed so that the front tire of the vehicle can be assumed to be in a specific region of the image. Ellipse detection is then used to locate the front tire, and the ellipse detector can be directed to likely locations for the rear wheel. This provides an accurate measurement of the wheelbase. The wheel locations also serve as an anchor for algorithms that detect the basic shape of the vehicle, from which height and overall length can be extracted. The median color of the shape provides a good sample of the vehicle color. The HSI color model was used; once the HSI values were determined, they were mapped into domain concepts such as 'High intensity', 'Medium intensity', etc. Intermediate values indicated a partial presence of two or more concepts.
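The "partial presence of two or more concepts" suggests graded membership functions over the HSI axes. Here is a minimal sketch for the intensity axis, assuming simple linear blending with a 0.5 breakpoint; the paper does not give the exact mapping, so the breakpoints and shapes are assumptions.

```python
# Illustrative mapping from a normalized intensity in [0, 1] to concept
# strengths, with intermediate values split between two adjacent concepts.
def intensity_concepts(intensity: float) -> dict[str, float]:
    concepts = {"Low intensity": 0.0, "Medium intensity": 0.0, "High intensity": 0.0}
    if intensity <= 0.5:
        # Blend between Low and Medium.
        concepts["Medium intensity"] = intensity / 0.5
        concepts["Low intensity"] = 1.0 - concepts["Medium intensity"]
    else:
        # Blend between Medium and High.
        concepts["High intensity"] = (intensity - 0.5) / 0.5
        concepts["Medium intensity"] = 1.0 - concepts["High intensity"]
    return concepts

# e.g., intensity_concepts(0.7) -> Medium 0.6, High 0.4 (partial presence of both)
```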


Table 1. Measured and Observed Dimensions (inches)

Vehicle Image  Description                                  Wheelbase          Length             Height
                                                            Meas.    Obs.     Meas.    Obs.      Meas.  Obs.
Image 3        Blue 2005 Ford F150 Supercrew                139      137.88   218      214.48    75     74.91
Image 6        Bronze 2000 Lincoln Continental              110      109.21   200      190.87    55     56.64
Image 7        Silver 2005 Ford SD F-350 4x4 Crewcab        174      173.86   258      252.15    79     81.38
Image 10       Green 2005 Ford Expedition                   119      116.53   199      187.21    74     78.71
Image 11       White 2005 Ford Explorer 4-Door XLT 4x4      113      112.02   184      178.78    68     74.21
Image 12       Silver Ford 2005 Escape XLT 4WD              103      103.58   172      169.78    69     70.70
Image 16       Bronze 2003 Ford Taurus                      108.5    110.34   194.5    183.98    55     58.33
Image 18       Dark Red 2005 Ford SD F-250 4x4 Crew Cab     156      157.00   236      232.05    78     79.13
Image 19       Beige 2003 Ford LX 4-Door Crown Victoria     114.5    113.70   203.5    197.89    58     59.03

Domain Knowledge

Much of our domain knowledge was obtained from a spreadsheet of car and light-truck specifications (Ward's Communications 2004). This document contains information on body styles and dimensions. This knowledge is useful because dimensions can be reliably extracted from images in our scenario. The extracted dimensions were compared with the dimensions of known vehicles, and the result was used to determine the presence of the corresponding concept. As an example, to test for the presence of a Ford Crown Victoria, the height, length, and wheelbase extracted from the image are compared to the known measurements of 58.3, 212, and 114.5 inches. Each comparison yields a value close to 1 for an exact match and near 0 for a bad match; the minimum of these values is taken as the strength of the Ford Crown Victoria concept. Each vehicle described in Ward's was placed in the appropriate class for its body style, number of doors, make, and model.
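The min-over-dimensions rule can be sketched directly. The Gaussian scoring function and its 5% tolerance below are assumptions (the paper only requires a score near 1 for an exact match and near 0 for a bad one); the Crown Victoria specification comes from the text.

```python
# Illustrative dimension comparison: score each extracted dimension against
# the published specification, then take the minimum as the concept strength.
import math

def dimension_match(extracted: float, known: float, tolerance: float = 0.05) -> float:
    """Score in (0, 1]: 1.0 for an exact match, falling off with relative error."""
    relative_error = (extracted - known) / known
    return math.exp(-0.5 * (relative_error / tolerance) ** 2)

def vehicle_match(height: float, length: float, wheelbase: float,
                  spec: tuple[float, float, float]) -> float:
    """Minimum of the per-dimension scores, as described in the text."""
    return min(dimension_match(height, spec[0]),
               dimension_match(length, spec[1]),
               dimension_match(wheelbase, spec[2]))

# Ford Crown Victoria spec from the text: height 58.3, length 212,
# wheelbase 114.5 inches.
# e.g., vehicle_match(59.03, 197.89, 113.70, (58.3, 212.0, 114.5))
```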


Table 2. Evaluation of image/text similarity. Each selection is shown as Image/Quality; quality is rated 1-5, and the Ksigs system produces an ordering only (no quality rating).

"Light blue Chevy"
  Subject 1:  2/4    8/2    21/2
  Subject 2:  2/5    3/3    21/1
  Subject 3:  2/5    16/3   21/3
  Subject 4:  2/4    16/3   6/2
  Subject 5:  2/3    6/3    3/2
  Ksigs:      6      3      2

"White Cadillac Escalade"
  Subject 1:  5/4    11/3   14/2
  Subject 2:  5/5    11/4   13/2
  Subject 3:  5/5    11/4   22/3
  Subject 4:  19/4   16/3   6/2
  Subject 5:  11/3   12/3   13/3
  Ksigs:      5      22     11

"Late model Green 4 door SUV"
  Subject 1:  10/5   13/3   20/2
  Subject 2:  13/5   10/4   9/1
  Subject 3:  13/5   20/5   9/3
  Subject 4:  10/5   12/4   20/3
  Subject 5:  10/4   14/2   20/2
  Ksigs:      10     13     20

"2005 dark red Ford Pickup"
  Subject 1:  18/5   8/2    15/2
  Subject 2:  18/5   3/3    7/3
  Subject 3:  18/5   15/3   8/1
  Subject 4:  18/5   15/2   8/1
  Subject 5:  18/4   3/2    7/1
  Ksigs:      18     15     20

"Blue 2 door coupe"
  Subject 1:  21/4   17/3   2/2
  Subject 2:  21/5   2/4    17/3
  Subject 3:  21/5   17/3   2/1
  Subject 4:  17/3   21/3   4/1
  Subject 5:  17/3   21/2   1/1
  Ksigs:      6      21     2

Many particular concepts might give good matches for the dimensions extracted from an image. If these concepts share classification information (for example, they are all a particular body style or model), then the strength of that body style or model is increased by the concept refinement mechanism. Domain information about color was developed in-house, primarily by making subjective judgments about how hue, saturation, and intensity measurements correspond to verbal color descriptions such as 'dark blue', 'pink', and 'pale yellow'. For example, the existence of 'Teal' is indicated by a hue of cyan or teal, a medium intensity, and a medium or low value for saturation.
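A color rule like the 'Teal' example combines several concept strengths. The following is a minimal sketch, assuming min for conjunction and max for disjunction over concept strengths; the rule structure mirrors the text, but the combination operators are an assumption.

```python
# Illustrative color rule: 'Teal' requires a cyan-or-teal hue (max over the
# alternatives), medium intensity, and medium-or-low saturation, combined
# with min as the conjunction operator.
def teal_strength(concepts: dict[str, float]) -> float:
    hue = max(concepts.get("Hue cyan", 0.0), concepts.get("Hue teal", 0.0))
    saturation = max(concepts.get("Medium saturation", 0.0),
                     concepts.get("Low saturation", 0.0))
    return min(hue, concepts.get("Medium intensity", 0.0), saturation)
```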

Evaluation Metric

The goal of the empirical study is to compare the performance of the system to that of an average human subject. The system returns a sorted list of matches for each question. These matches can be ordered by quality, but that quality is not directly comparable to the quality ratings made by the human subjects.


Our initial metric is the percentage of a subject's selections that match the selections of the other subjects; the quality ratings are not taken into account. Since our knowledge signature metric produces only an ordering and does not compute quality, the top three matches of the Knowledge Signature system are used.

Table 3. Comparison of Ksigs to subjects

Subject                        % of matching responses
Subject 1                      61.7
Subject 2                      58.3
Subject 3                      60.0
Subject 4                      46.7
Subject 5                      50.0
Knowledge Signatures System    58.7

The Knowledge Signature system’s choices matched 58.7% of the subjects’ selections, which bettered the average subject’s score of 55.3%.
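The metric can be sketched as follows, assuming each rater's selections are counted against every other supplied rater's selections for the same description and pooled into a single percentage; the paper states the metric only informally, so this exact pairwise pooling is an assumption.

```python
# Illustrative agreement metric: the percentage of a rater's selected images
# that also appear in another rater's selections for the same description,
# pooled over all supplied comparison raters.
def agreement(rater: dict[str, set[int]],
              others: list[dict[str, set[int]]]) -> float:
    matches = total = 0
    for other in others:
        for description, picks in rater.items():
            total += len(picks)
            matches += len(picks & other[description])
    return 100.0 * matches / total

# e.g., each rater maps "Light blue Chevy" -> {2, 8, 21}, and so on,
# following the selections in Table 2.
```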

RELATIONSHIP TO PREVIOUS APPROACHES

The research described in this paper is related to work in data fusion and information retrieval.

Data Fusion

Most current research on data fusion focuses on two different aspects of fusion. Carvalho et al. (Carvalho, Heinzelman et al. 2003) use the terms low-level and high-level data fusion to refer to these aspects. Their architecture is general and intended to be applied across problem domains, but the particular fusion processes used depend on the particular application and problem domain. In the Knowledge Signatures architecture, the domain is modeled by an ontology that allows the use of pre-defined refinement strategies. High-level fusion techniques attempt to draw information from multiple types of sources and model more abstract concepts within a particular domain. Much research in 'high-level' fusion is domain- or application-specific, with only the extracted features of data items available to the fusion processor. Since our signature observers commonly make use of domain knowledge to facilitate feature extraction, our architecture moves toward what Roemer et al. (Roemer, Kacprzynski et al. 2001) call a hybrid fusion architecture. Soibelman et al. (Soibelman, Liu et al. 2004) describe a methodology to extract information from heterogeneous databases and fuse it into models that can be subjected to data analysis techniques. This system echoes our approach in that multiple sources are merged into a specific model, but it makes no explicit use of a modeled semantic field during fusion. Growe (Growe 1999) uses a semantic graph to model problem-domain information; this knowledge is used to facilitate the fusion of multi-sensor data, and the fused data and domain knowledge are used for road recognition.

Reasoning Support in Data Fusion

While many fusion architectures use domain reasoning to some degree, these systems are largely aimed at either low-level fusion of particular data types or high-level fusion in particular problem domains. The Knowledge Signature architecture models the problem domain being addressed by a set of concepts that are pertinent in that domain. The relationships between the concepts are maintained separately within an ontology, and specific refinement and comparison operations are defined with respect to that ontology. Several other researchers have made preliminary efforts at using ontologies to facilitate information fusion (Fonseca, Egenhofer et al. 2002; Boury-Bisset 2003).

Previous Work in Vehicle Classification

Huang and Liao (Huang and Liao 2004) describe image-processing primitives needed to extract vehicle shapes. They are largely concerned with occlusion detection and use motion and other techniques to find occlusions. Vehicles are classified as sedans, vans, pickups, trucks, van trucks, buses, and trailers; the illustrations for pickup, truck, and van truck show vehicle types commonly found on American roads, but no basis for this hierarchy is given. The paper offers categorization rules that use length, aspect ratio, and compact ratio to classify images into these categories with an overall success rate of 91%, although some categories were not well represented in the trials.

CONCLUSIONS AND FUTURE WORK

Knowledge Signatures provide a way to compare the semantic information associated with data items even when the items have different data formats or are in different media. This report showed how semantic information from textual documents and electronic images can be compared. The semantic similarity metrics presented are novel in that they provide a knowledge-guided mechanism to help eliminate double counting of evidence of similarity due to known data dependence. A sample program demonstrates the use of the technology in a practical application: identifying vehicles given a simple text description. Empirical results show that the system determines similarity as well as human subjects do. Future work includes applying Knowledge Signatures to additional problems, such as target acquisition, and constructing observers for additional data types, such as streaming video.

REFERENCES

Boury-Bisset, A.-C. (2003). Ontology-based Approach for Information Fusion.
Carvalho, H. S., W. B. Heinzelman, et al. (2003). A general data fusion architecture. 6th International Conference on Information Fusion, Cairns, Queensland, Australia.
Fonseca, F. T., M. J. Egenhofer, et al. (2002). "Using Ontologies for Integrated Geographic Information Systems." Transactions in GIS 6(3).
Growe, S. (1999). Knowledge Based Interpretation of Multisensor and Multitemporal Remote Sensing Images. Meeting of the EARSeL SIG, Spain.
Huang, C.-L. and W.-C. Liao (2004). A Vision-Based Vehicle Identification System. 17th International Conference on Pattern Recognition, Cambridge, UK, IEEE.
Roemer, M. J., G. J. Kacprzynski, et al. (2001). Assessment of Data and Knowledge Fusion Strategies for Prognostics and Health Management. IEEE International Conference on Aerospace.
Soibelman, L., L. Y. Liu, et al. (2004). Data fusion and modeling for construction management knowledge discovery. International Conference on Computing in Civil and Building Engineering, Weimar, Germany.
Thomson, J., A. Cowell, et al. (2004). Knowledge Signatures for Information Integration. Posters of the 2004 ODBASE (Ontologies, Databases, and Applications of Semantics) International Conference, Agia Napa, Cyprus, Springer.
Ward's Communications (2004). '05 Model U.S. Car and Light Truck Specifications and Prices.

