Does Ontology Help in Image Retrieval? — A Comparison between Keyword, Text Ontology and Multi-Modality Ontology Approaches

Huan Wang, Song Liu, and Liang-Tien Chia
Centre for Multimedia and Network Technology, School of Computer Engineering
Nanyang Technological University, Singapore 639798
{wa0004an, pg03988006, asltchia}@ntu.edu.sg

ABSTRACT

Ontologies are effective for representing domain concepts and relations in the form of a semantic network. Many efforts have been made to incorporate ontologies into information matchmaking and retrieval, a trend further accelerated by the support ontologies provide for converging high-level concepts with low-level features. In this paper we present a comparison between traditional keyword-based image retrieval and the promising ontology-based image retrieval. To be thorough, we construct ontologies not only on text annotations but also on a combination of text annotations and image features. The experiments are conducted on a medium-sized data set of about 4000 images. The results demonstrate the efficacy of utilizing both text and image features in a multi-modality ontology to improve image retrieval.

Categories and Subject Descriptors
H.5.1 [INFORMATION INTERFACES AND PRESENTATION]: Multimedia Information Systems

General Terms
Algorithms, Performance, Experimentation

1. INTRODUCTION

Image retrieval has always been one of the most active research fields. Most popular web image retrieval systems today are based on keyword search over the text surrounding images. However, this kind of retrieval requires adequate text with correct keyword descriptions of the corresponding images, which is not always available in practice. Moreover, most purely text-based retrieval simply ignores the helpful image features that can be extracted through multimedia analysis, and irrelevant images are retrieved as a result. Content-based image retrieval (CBIR) has been studied for many years; it extracts image features such as dominant color, color histogram, texture and object shape. Its main problem is the semantic gap between low-level image features and high-level human-understandable concepts. Although efforts such as relevance feedback (RF) have been made to bridge this gap, they involve substantial user interaction.

Ontologies are designed to capture shared knowledge and overcome semantic heterogeneity among domains. Knowledge is collected from individual experts and represented in description logics. Machines can understand these uniform representations, so the gap between human-readable knowledge and machine-understandable logic is naturally bridged. A few applications [5] have built ontologies purely from MPEG-7 feature descriptors. For example, [6] designed and validated ontologies as middle-level structures to bridge the semantic gap between low-level features and high-level concepts.

The main contribution of this paper is a comparison between the aforementioned traditional approaches and the ontology approaches. Although vector- or tree-structure approaches have been tried, no actual application or explicit results have been shown. To carry out this experiment, we designed different ontologies in a selected domain and constructed the corresponding domain knowledge. We compare the retrieval performance of the different approaches and discuss their pros and cons. To the best of our knowledge, no existing work has compared keyword-based retrieval with ontology-based retrieval, or pure text ontology retrieval with multi-modality ontology retrieval, in the domain of image retrieval.

The rest of this paper is organized as follows: Section 2 introduces related work. Section 3 discusses the designed ontology model, which is later used in the comparison. The experimental results and conclusions are given in Sections 4 and 5 respectively.
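As a concrete illustration of the low-level CBIR features mentioned in the introduction, the following is a minimal sketch of a coarse color histogram. The 2-bits-per-channel quantization (64 bins) and the L1 distance are illustrative choices, not the features actually used in the paper.

```python
# A coarse RGB colour histogram, one of the low-level CBIR features
# discussed above. Quantisation depth is an illustrative assumption.

def color_histogram(pixels, bits=2):
    """Normalised histogram over (2**bits)**3 quantised RGB bins."""
    bins = [0] * (1 << (3 * bits))
    shift = 8 - bits
    for r, g, b in pixels:
        idx = ((r >> shift) << (2 * bits)) | ((g >> shift) << bits) | (b >> shift)
        bins[idx] += 1
    total = float(len(pixels))
    return [c / total for c in bins]

def histogram_distance(h1, h2):
    """L1 distance between two histograms, a common CBIR similarity measure."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# A tiny synthetic "image": mostly red pixels with a little white.
image = [(250, 10, 10)] * 6 + [(255, 255, 255)] * 2
hist = color_histogram(image)
```

Such a histogram captures dominant color but discards spatial layout, which is one reason low-level features alone cannot close the semantic gap.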

2. RELATED WORK

In this section we briefly review previous work on the different approaches, including content-based image retrieval (CBIR), text-based image search engines and ontology-based retrieval systems. Some of these techniques are applied in our comparison. The most intuitive way to retrieve web images is to use the surrounding textual information. Many text-based image search engines, such as Yahoo and Google, have been designed and made available on the WWW. These search engines find images by text features such as file name and annotation. In our experiment we use the Google image search engine to collect domain-specific images, and we keep these images in the ranking order of the Google image result. This set of images and its ranking order is used as the retrieval result of a representative keyword search engine. Besides text information, image content is also a useful resource for retrieval. CBIR captures image features and organizes them into a meaningful retrieval architecture [7]. However, the semantic gap between image features and high-level concepts is still an open issue. Ontology has been validated in practice [5][6][4] to bridge

Copyright is held by the author/owner(s). MM’06, October 23–27, 2006, Santa Barbara, California, USA. ACM 1-59593-447-2/06/0010.


this semantic gap. Furthermore, attempts [1][3] have been made to combine text information with image features for better retrieval performance. However, these works either depend on domains with uniform or simple image content, or fail to give explicit experimental results.

[Figure 1: Layer structure of the ontologies: the Animal Domain Ontology (Section 3.1), the Textual Description Ontology (Section 3.2), the Visual Description Ontology (Section 3.3) and the generated image concepts (Section 3.4). Ellipses denote pre-defined classes (e.g. Canine, Fox, Wolf, Habitat, Distribution, Fur) and rectangles denote generated classes (e.g. RedFox, RedWolf), connected by rdfs:subClassOf and by properties such as hasDistribution, hasFur, hasPixColor, hasColor and hasContent.]

3. ONTOLOGY STRUCTURE

In this section we briefly discuss the structures of the text and multi-modality ontologies, on which we base our image retrieval and obtain results comparable with text-based image retrieval. As our main focus is the comparison, we do not include the details of ontology construction, which are available at [8]. The experimental domain is canine, a sub-domain of animal. It is a challenging domain due to the animals' varied shapes and complex living environments, so an ontology is effective at capturing and integrating the various aspects of information. The ontology is well defined for semantic research scenarios and open to further extension in our future work.

Table 1: Performance of image classification

    Classification                                 ACCR
    Colorful/Graylike                              0.921
    Photograph/Drawing                             0.842
    Outdoor/Indoor                                 0.806
    HumanRelevantScene/Buildings/Wildlife          0.794
    Greenery/Sand/Stone/Snow/Others                0.814
    WhiteFur/RedFur/GrayFur/BrownFur/NonAnimal     0.634

3.1 Animal Domain Ontology

The animal domain ontology is the basis of the text ontology and the multi-modality ontology that follow. It provides the semantic information of the taxonomy for the target domain and handles the classification of animal species. This work is usually done by a domain expert; in our case, we derive the formal definitions and domain knowledge from the BBC Science & Nature Animal category (http://www.bbc.co.uk/nature/wildfacts/animals a z.shtml), which provides standard, unified descriptions in various aspects for around 620 animals. Twenty subspecies under the canine domain are collected as our experiment subjects. We re-define the hyponym relationship between two concepts as a subclass property in this ontology. For example, a fox is a kind of canine (hyponymy), so fox is defined as a subclass of canine in the animal domain ontology. A motivating example: without this domain ontology, a dhole image would not be returned for a search for wild dog, although it should be.

3.2 Textual Description Ontology

The textual description ontology is based purely on text and is used to encapsulate high-level narrative animal descriptions. Through this ontology, each animal is associated with its domain knowledge, which is why it works better than single keywords at capturing semantic interpretations from different contexts. Several classes have been defined, such as "ScientificName", "Diet", "Habitat", "Distribution" and "ColorDescription", and semantic relationships have been generated to connect the different concepts, including "hasName", "hasDiet", "hasHabitat", "hasDistribution" and "hasColorDescription". The class and relationship definitions of this ontology are also extracted from the BBC Science & Nature Animal category. We further generate general knowledge ontologies, such as a geographical ontology and a color ontology, which are associated with the text ontologies through the relationships defined above.

3.3 Visual Description Ontology

Besides the single-modality textual description ontology, we define a multi-modality ontology that combines textual descriptions with image features. First we build a specific knowledge base in which the classes and relationships are extracted from low-level features. We then incorporate this visual description ontology with the aforementioned text ontology into the multi-modality ontology. This combined ontology works better on images with loosely coupled text annotation. We formulate each image classification scheme as a class in the ontology and define the image categories under this classification scheme as its subclasses. These classes include GreyLikeImage, ColorImage, ContentType, OutdoorScene, IndoorScene, BuildingRelevant, HumanRelevant, WildlifeScene and FurColor. The complete list of relationships extracted from low-level features is as follows: "hasPixColor", "hasPixProp", "hasEnvironment", "hasContent" and "hasFur". Thus we obtain high-level descriptions generated not only from textual information but also from low-level image attributes.
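The mapping from per-image classifier outputs to the property/value pairs of a generated image class can be sketched as follows. The scheme names, the image identifier and the classifier outputs below are invented stand-ins for illustration; the five property names come from the paper.

```python
# Sketch: turning low-level classification results into the property/value
# pairs of a generated image class (Section 3.3). The classifier outputs
# here are hand-made, not real SVM results.

# Each classification scheme from Table 1 maps to one ontology property.
SCHEME_TO_PROPERTY = {
    "color_type":  "hasPixColor",    # Colorful / Graylike
    "media_type":  "hasPixProp",     # Photograph / Drawing
    "environment": "hasEnvironment", # Outdoor / Indoor
    "content":     "hasContent",     # HumanRelevantScene / Buildings / Wildlife
    "fur":         "hasFur",         # WhiteFur / RedFur / GrayFur / BrownFur / NonAnimal
}

def generated_class(image_id, classifier_outputs):
    """Build (subject, property, value) triples describing one image."""
    return [
        (image_id, SCHEME_TO_PROPERTY[scheme], label)
        for scheme, label in classifier_outputs.items()
        if scheme in SCHEME_TO_PROPERTY
    ]

# Hypothetical classifier outputs for one arctic-fox photograph.
outputs = {
    "color_type": "Colorful",
    "media_type": "Photograph",
    "environment": "Outdoor",
    "content": "Wildlife",
    "fur": "WhiteFur",
}
triples = generated_class("image_P1", outputs)
```

The resulting triples are what the generated class contributes to matchmaking: an image whose generated class asserts hasFur WhiteFur can be matched against pre-defined canine classes carrying the same property.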

3.4 Examples of the Generated Classes

Figure 1 shows the structure of the canine ontology in our system. We use ellipses and rectangles to represent pre-defined classes and generated classes respectively; the horizontal line in the middle separates the two kinds of classes. Part of the ontology is omitted due to limited space. Two examples, red fox and red wolf, show how we define concrete animal concepts using the ontology. Both red fox and red wolf are generated classes of the superclass canine. From the textual description ontology, we know that the red fox is distributed in the USA and the red wolf in Asia. The visual description ontology indicates that the fur color of the red wolf is brown and that of the red fox is red, even though in a query they share the same keyword red. The visual description information helps filter out a majority of inaccurate results. For instance, from an indoor background we can reasonably infer that a wild cape fox is unlikely to appear in the image.

[Figure 2: A comparison of image retrieval results between different approaches (1). Each panel plots the number of correct images retrieved against the number of images retrieved (in ranking order, up to 200) for the Google, text ontology, multi-modality ontology and optimal results, for Arctic Fox, Bush Dog, Coyote and Ethiopian Wolf.]

4. EXPERIMENTAL RESULTS

In the experiment we compare the ontology-based image retrieval systems with Google Image Search, which is among the best keyword-based search engines and handles over 2 billion images. The data set consists of a total of 4000 images: the top 200 Google images for each of the 20 canine subspecies. Google Image Search is used in our experiment because it is accessible, so other researchers can easily compare our performance with their own experimental results. We use only the top 200 ranking results of Google Image Search because this set of images is statistically and visually higher in significance and ranking. The web images and their web pages are downloaded by our image crawler. The results of low-level feature extraction using SVMs, shown in Table 1, are used to build the multi-modality ontology. For the comparison we use Google Image Search together with text ontology-based retrieval and multi-modality ontology-based retrieval. For semantic matchmaking of the ontologies, we choose RACER version 1.9 [2] as our reasoner, since it provides consistency checking of the knowledge base, computes entailed knowledge via resolution and processes queries through complex reasoning. To evaluate the performance of high-level information extraction based on low-level features, we list the average correct classification rates (ACCR) in Table 1. In each classification, one third of the data, randomly selected, is used for training and the rest for testing. We repeat each classification 10 times and calculate the ACCR. The last classification does not achieve very good performance because the fur color of a fox is affected by changes in illumination and viewing angle.
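The ACCR protocol described above (a random one-third training split, repeated 10 times, accuracies averaged) can be sketched as follows. The one-dimensional nearest-centroid "classifier" and the toy data are illustrative stand-ins for the SVMs and image features used in the paper.

```python
import random

# Sketch of the ACCR evaluation protocol behind Table 1: one third of the
# data is randomly selected for training, the rest is tested, and the
# accuracy is averaged over 10 repetitions. The classifier is a toy
# 1-D nearest-centroid model, not the paper's SVM.

def nearest_centroid_fit(train):
    """Mean feature value per label."""
    groups = {}
    for x, label in train:
        groups.setdefault(label, []).append(x)
    return {label: sum(xs) / len(xs) for label, xs in groups.items()}

def accr(data, repeats=10, train_frac=1 / 3, seed=0):
    rng = random.Random(seed)
    rates = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = max(1, int(len(shuffled) * train_frac))
        train, test = shuffled[:cut], shuffled[cut:]
        centroids = nearest_centroid_fit(train)
        correct = sum(
            1 for x, label in test
            if min(centroids, key=lambda c: abs(centroids[c] - x)) == label
        )
        rates.append(correct / len(test))
    return sum(rates) / len(rates)

# Well-separated toy data: "Graylike" features near 0.1, "Colorful" near 0.9.
data = [(0.1 + 0.02 * i, "Graylike") for i in range(15)] + \
       [(0.9 - 0.02 * i, "Colorful") for i in range(15)]
rate = accr(data)
```

On this cleanly separated toy data the averaged rate is near 1.0; the fur-color classifications in Table 1 are harder precisely because illumination and viewpoint blur the class boundaries.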

4.1 Keyword versus Text Ontology

We apply semantic matchmaking to the 200 top-ranking Google images with their web pages and present some results on the overall performance of the different approaches in Figure 2. In this test, the 200 images in each category include the training and test data used in the image classification test above. We can see that the overall performance of textual description ontology retrieval is slightly better than keyword-based search. However, text ontology retrieval is still hampered by the lack of text information within the web page. For example, if no related concept or relationship can be extracted from the surrounding text, the generated class of the image is void. The result ranking is based on the degree of match: exact match, subsume match and disjoint. Since a void class is disjoint with every pre-defined canine class, such an image is ranked low in the final result even if it is correct.

4.2 Keyword, Text Ontology versus Multi-Modality Ontology

From the figure, we can see that the multi-modality ontology-based retrieval outperforms the others, returning more relevant images with higher ranking. In the best case, arctic fox, the multi-modality ontology-based retrieval almost overlaps the optimal curve, which returns the N correct images in the first N ranking positions. The result benefits greatly from the high accuracy of the image feature classification for WhiteFur, whose ACCR is 0.826 (this value differs from the one in Table 1, which reports the average ACCR over all fur types). However, in most cases there are gaps between the multi-modality ontology results and the optimal results. We presume this could be due to one or more of the following reasons: first, the performance can be affected by the accuracy of the image feature classification; second, the lack of text information in the web pages results in less correspondence in the text ontology and the multi-modality ontology; third, as we have not yet completed a study on the accuracy of rule-based engines and reasoners, we are not sure whether the reasoner we use provides the best matchmaking result.
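The degree-of-match ranking described in Section 4.1 (exact match, then subsume match, then disjoint) can be sketched as follows. A toy subclass hierarchy stands in for the description-logic reasoning performed by RACER, and the class and image names are invented fragments of the animal domain ontology.

```python
# Sketch of ranking by degree of match (Section 4.1): exact match first,
# subsume match second, disjoint last. A hand-written subclass table
# replaces the RACER reasoner; names are illustrative.

SUBCLASS_OF = {  # child -> parent
    "RedFox": "Fox", "CapeFox": "Fox", "Fox": "Canine",
    "Dhole": "WildDog", "WildDog": "Canine",
    "GreyWolf": "Wolf", "Wolf": "Canine",
}

def ancestors(cls):
    """All superclasses of cls via the transitive subclass relation."""
    out = set()
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        out.add(cls)
    return out

def match_degree(query_cls, image_cls):
    if image_cls == query_cls:
        return 0  # exact match, ranked first
    if query_cls in ancestors(image_cls):
        return 1  # subsume match (e.g. a Dhole image for a WildDog query)
    return 2      # disjoint, which includes void generated classes

def rank(query_cls, images):
    """images: list of (image_id, generated_class). Stable sort by degree."""
    return sorted(images, key=lambda im: match_degree(query_cls, im[1]))

images = [("p1", "GreyWolf"), ("p2", "WildDog"), ("p3", "Dhole")]
ranked = rank("WildDog", images)
```

Note how the subclass table also realizes the motivating example from Section 3.1: a dhole image is ranked above a grey-wolf image for a wild dog query, because Dhole is subsumed by WildDog while GreyWolf is disjoint with it.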

4.3 Comparison of Precision Results

In Web image search and retrieval systems, precision is a very important measure of a retrieval system's performance, because most users browse only a limited number of results; the recall rate is therefore less crucial for Web image retrieval. In Figure 3 we show the number of correct images retrieved in the top 20, 40, 60 and 80 results for all 20 canine subspecies: 1. aardwolf, 2. African wild dog, 3. bat-eared fox, 4. black jackal, 5. cape fox, 6. arctic fox, 7. gray fox, 8. red fox, 9. kit fox, 10. bush dog, 11. coyote, 12. dhole, 13. dingo, 14. Ethiopian wolf, 15. fennec fox, 16. golden jackal, 17. gray wolf, 18. maned wolf, 19. red wolf and 20. spotted hyena. From the figure, we can see that for the top-20 results, nearly all images retrieved by the multi-modality ontology are correct. In most cases, ontology-based image retrieval achieves better precision than keyword-based image search. As we implement only a generic image classification mechanism that is not tailored to the target domain, Table 1 suggests that retrieval based on the image classification results alone may fail to outperform normal text-based image retrieval. However, by combining high-level textual information with low-level image features, we are able to improve the retrieval precision by about 5 to 30 percent.

[Figure 3: A comparison of image retrieval results between different approaches (2). Each panel shows, for all 20 animal subspecies, the number of correct images retrieved among the top 20, 40, 60 and 80 results for the Google, text ontology and multi-modality ontology approaches.]
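The top-k correctness counts behind Figure 3 can be sketched as follows. The relevance judgements in the toy ranking are invented for illustration; in the experiment they come from manually judging each retrieved image.

```python
# Sketch of the precision-at-k measurements reported in Figure 3: the
# number of correct images among the top-k results for several cutoffs.
# The relevance labels below are made up for illustration.

def correct_at_k(ranked_relevance, k):
    """Count relevant results among the first k (truthy = correct image)."""
    return sum(1 for rel in ranked_relevance[:k] if rel)

def precision_at_k(ranked_relevance, k):
    return correct_at_k(ranked_relevance, k) / k

# A toy ranking of 80 results: 1 marks a correct canine image, 0 an
# irrelevant one (7 correct in every block of 10).
ranking = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1] * 8
counts = {k: correct_at_k(ranking, k) for k in (20, 40, 60, 80)}
```

Plotting such per-cutoff counts for each of the 20 subspecies, as in Figure 3, emphasizes precision at shallow cutoffs, which matches how users actually browse Web image results.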

5. CONCLUSION

This paper has presented a comparison between keyword-based and ontology-based image retrieval. The different approaches have their own pros and cons. The keyword-based approach is user friendly and easy to apply, with acceptable retrieval precision, while a semantically rich ontology addresses the need for complete image descriptions and improves retrieval precision. However, the lack of text information, which limits the keyword approach, remains a problem for the text ontology approach; ontologies work better in combination with image features. Although there is a trade-off between complexity and performance, an ontology is a viable choice when better performance is expected on a smaller result set.

6. REFERENCES

[1] Y. A. Aslandogan, C. Thier, C. T. Yu, J. Zou, and N. Rishe. Using semantic contents and WordNet in image retrieval. In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 286–295, 1997.
[2] V. Haarslev and R. Möller. RACER system description. In Proceedings of the International Joint Conference on Automated Reasoning, pages 701–705, 2001.
[3] S. Hammiche, S. Benbernou, M.-S. Hacid, and A. Vakali. Semantic retrieval of multimedia data. In MMDB, pages 36–44, 2004.
[4] B. Hu, S. Dasmahapatra, P. Lewis, and N. Shadbolt. Ontology-based medical image annotation with description logics. In Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence, pages 77–82, 2002.
[5] J. Hunter. Adding multimedia to the Semantic Web: building an MPEG-7 ontology. In International Semantic Web Working Symposium, August 2001.
[6] S. Liu, L.-T. Chia, and S. Chan. Ontology for nature-scene image retrieval. In On the Move to Meaningful Internet Systems 2004: CoopIS, DOA, and ODBASE, pages 1050–1061, 2004.
[7] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain. Content-based image retrieval at the end of the early years. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1349–1379, December 2000.
[8] H. Wang, S. Liu, C. Zhou, and L.-T. Chia. Ontology Construction. Available at: http://cemnet.ntu.edu.sg/pet device/wanghuan/Ontology%20Construction.pdf, April 2006.