2011 International Conference on Semantic Technology and Information Retrieval 28-29 June 2011, Putrajaya, Malaysia
A Framework for Integrating DBpedia in a Multi-Modality Ontology News Image Retrieval System
Khalid, Y.I.A
Knowledge Technology Research Group, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor Darul Ehsan, Malaysia.
[email protected]
Abstract—Knowledge-sharing communities like Wikipedia and automated extraction projects like DBpedia enable the large-scale construction of machine-processable knowledge bases containing relational facts about entities. These resources give researchers a great opportunity to use them as domain concepts bridging low-level features and high-level concepts for image retrieval. Collections of images attached to entities, such as on-line news articles with images, are abundant on the Internet. Still, it is difficult to retrieve accurate information on these entities: using entity names in a search engine yields large lists, but often produces imprecise and unsatisfactory results. Our goal is to populate a knowledge base with on-line image news resources in the BBC sport domain. This system should yield high precision, high recall and diverse sports photos for specific entities. A multi-modality ontology retrieval system, with relational facts about entities for generating expanded queries, will be used to retrieve results. DBpedia will be used as the domain sport ontology description and will be integrated with a textual description and a visual description, both generated by hand. To overcome semantic interoperability problems between ontologies, automated ontology alignment is used. In addition, visual similarity measures based on MPEG-7 descriptors and SIFT features are used to achieve higher diversity in the final rankings.
Keywords—Image Retrieval, Ontology, DBpedia, Text Retrieval, Multi-Modality Ontology, Sport News.
I.
INTRODUCTION
The proliferation of digital images requires tools for extracting knowledge from their content to enable intelligent and efficient multimedia organization, filtering and retrieval. Challenges arise as the number of digital web images continually increases. The majority of digital images on the web use content descriptions to aid searching. However, [14] claimed that image retrieval which focuses only on low-level features or content descriptions makes it difficult for users to articulate their needs through keyword searches. Semantic search has become one of the main showcases in image retrieval because it accentuates natural language concepts. Ontology, which clearly defines concepts and their interrelationships in a domain, has been widely used in many information retrieval fields, including document indexing. Examples include extracting semantic contents from a set of text documents; image
978-1-61284-353-7/11/$26.00 ©2011 IEEE
Noah, S.A.
Knowledge Technology Research Group, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor Darul Ehsan, Malaysia.
[email protected]

retrieval and classification, i.e., using concepts either from image features or surrounding text for content representation; and video retrieval, i.e., using text in video captions for semantic concept detection [7]. Consequently, research [15][5][13][4][10] has often suggested that the best way to organize, update and retrieve images is to combine low-level features and high-level concepts in order to reduce the semantic gap. A definition of the "semantic gap" is offered by [14]: "The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data has for a user in a given situation." Note that most approaches involve an external lexical dictionary or online category structure to determine an entity's ontology; both certainly improve performance. However, some questions remain: Is an ontology just a hierarchical collection of concepts with parent-child relationships? Is an ontology scalable when extended to a large domain? Because of these questions, on-going research is proposed to increase the effectiveness of semantic image retrieval. An integration of multi-modality ontology with a Linked Open Data technology, DBpedia, is recommended. The Linking Open Data project has become one of the main showcases of successful community-driven adoption of semantic web technologies during the last few years. It aims at developing best practices for opening up "data gardens" on the Web, interlinking open data sets and enabling web developers to make use of that information [10]. For this research, DBpedia served as the domain source, having the capability to enhance the retrieval model. The scope of this research is to populate images and provide descriptions for on-line sports news collected from BBC Sports, and to use semantic web technology to populate and annotate image galleries.
Such ontologies will be used not only to classify multimedia, but also to correlate concepts with related texts. This approach ensures consistency in terminology, leading to more accurate and precise query results. The performance was evaluated by comparing the experimental results with other image search engines, such as Getty Images search and Google image search.
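Since the evaluation mentioned above rests on precision and recall, the standard definitions can be sketched as follows. The image identifiers and the ground-truth set in this example are invented for illustration, not drawn from the paper's experiments.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result list scored against a hand-labelled ground truth.
p, r = precision_recall(
    retrieved=["img1", "img2", "img3", "img4"],
    relevant=["img1", "img3", "img5"],
)
# img1 and img3 are correct hits: precision = 2/4 = 0.5, recall = 2/3.
```

A system with high precision but low diversity would score well here yet still fail the paper's goal of returning varied sports photos, which is why the abstract adds visual-similarity-based diversification on top of these measures.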
II.
BACKGROUND AND RELATED RESEARCH
A. Content Based Image Retrieval
The development of Content-Based Image Retrieval (CBIR) systems has been going on for years, and research is still being conducted to advance this field. CBIR is a technique used to extract image features such as dominant color, color histogram, texture, object and shape. Its aim is to overcome the limitations of manual text-based indexing, which is both time-consuming and inconsistent [18]. Examples of CBIR systems include the PicSOM system [11]. However, there remains a gap: in general, each low-level feature captures only one aspect of an image, and there is often no single visual feature that best describes the image's content. Though research has been done on retrieval feedback [4] and region-based image retrieval [20], problems still arise because of the required user interactions [6] and the inability to capture the high-level concepts in some users' queries. That is, computers process images into numerical data as feature representations, whereas humans recognize images semantically, through high-level concepts, making it difficult for low-level feature representations to correspond directly to high-level concepts. As CBIR systems process, store and retrieve images in terms of low-level image features, it is not possible for current CBIR systems to answer queries such as "find me images which contain buildings and cars" or "find me images which describe city life". It is very difficult to ask users to translate high-level concepts or semantics into low-level features [19].

B. Ontology
Ontologies play an important role in solving the issue of semantic heterogeneity among various domains and can be viewed as a means of sharing knowledge. They are designed to bridge the semantic gap between low-level features and high-level human concepts. Research has been done in this area to lessen the semantic gap by linking MPEG-7 descriptors with ontologies.
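As a concrete illustration of the low-level features that CBIR systems index, the sketch below computes a coarse, quantized RGB colour histogram, one of the simplest descriptors such a system might use. The pixel data and the bin count are hypothetical choices for illustration, not features used in the paper.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Quantize each RGB channel into bins and count pixel occurrences.

    `pixels` is a list of (r, g, b) tuples with values in 0..255.
    Returns a dict mapping (r_bin, g_bin, b_bin) -> count.
    """
    step = 256 // bins_per_channel  # width of each quantization bin
    hist = {}
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        hist[key] = hist.get(key, 0) + 1
    return hist

# A tiny synthetic "image": two red-ish pixels and one blue-ish pixel.
pixels = [(250, 10, 10), (240, 20, 5), (10, 10, 250)]
hist = color_histogram(pixels)
# Both red-ish pixels fall in bin (3, 0, 0); the blue-ish one in (0, 0, 3).
```

Such a histogram captures colour distribution but nothing about objects or scenes, which is exactly the semantic-gap limitation the surrounding text describes.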
Research by [7] integrated such a framework into a prototype image retrieval system and claimed it produced a fast and efficient medium for query and retrieval. This claim was supported by [18] in a study using the same MPEG-7 standard, which provided effective low-level feature descriptors, combined with the Jena toolkit [12], which provided ontology-specific services. Image annotation is another strategy proposed to enhance ontology-based retrieval. [13] combined a two-dimensional view of image retrieval, utilizing low-level descriptions and content dimensions with corresponding annotations, in a tool called M-OntoMat-Annotizer, which links low-level MPEG-7 descriptors to conventional semantic web ontologies and annotations [12]. Annotations exist because of user needs: to use a CBIR query interface, users need to understand the role of color, texture and other features. Often they do not, which leads to poorly formed queries and weak results. Users send queries in a variety of different ways, and various methods have
been used to show the advantages of image annotation, both manual and automatic. The same methods were applied in this study, using automatic annotation to give meaning to an image collection, since manual annotation is time-consuming and costly. In [8], ontology was used in an image retrieval system for a museum collection. The researchers claimed that ontology applied in image retrieval can handle the problem of knowledge encapsulation for semantic annotation. That research focused mainly on semantic annotation: semantic web ontology and metadata languages were shown to be an effective way to annotate images for retrieval, while keyword-based queries were less effective in formulating information needs. Other research focuses on image processing and embedding using various methods. According to [21], region-based image retrieval, using object ontology and relevance feedback, can enhance a CBIR system. The claimed advantage lies in the use of unsupervised segmentation methods to divide images into regions that are then indexed. One limitation of that study is its usage of ontology: the ontology was used only as a simple vocabulary listing and was not represented in any ontology language such as the Web Ontology Language (OWL), the Resource Description Framework (RDF) or RDF Schema. The query in that research is formulated in text using predefined keywords; regions that match the keyword query are presented to the user, who can then give feedback on the retrieved images. From the user's input, the system adapts the relevant answers using a Support Vector Machine (SVM). To filter out unrelated images, a Constraint Similarity Measure (CSM) was used. The present paper uses the same methods to manipulate outputs. [4] notes a problem in the object segmentation technique, so another approach was proposed.
This approach, the Visual Ontology Query Interface, is used for querying OWL ontologies and was built using the CBIR technique. That study suggests that through the query interface, users are able to formulate various ontology queries without knowing the SPARQL Protocol and RDF Query Language (SPARQL). However, this application is really suitable only for comic images, not photographs. According to [20], ontologies can be used to relate semantic descriptors to the parametric representation used for visual image processing; this is how an ontology is linked to image features. That paper concluded that using ontology leads to an effective computational and representational mechanism through the ontological query language (OQUEL). Claiming that previous research had not fully closed the semantic gap, [5] suggest an image retrieval system based on a multi-modality ontology model, which integrates both text information and low-level features. This prototype system was developed to overcome the semantic heterogeneity of various images. A semantic matchmaking algorithm was embedded, leaving semantic reasoning and
description logic to create intelligent image retrieval with high precision. To show the effectiveness of the system, it was compared with traditional text-based image search engines and Google. Based on the methods reviewed, it is clear that ontology is a strong option for enhancing an image retrieval system.

C. Linked Data in the Semantic Web
Current image retrieval systems use keywords and domain ontologies for vocabulary control. Semantic Web technology, in particular Linked Data and DBpedia, enhances image retrieval across a particular domain by integrating data and linking documents. The main objective of Linked Data technologies is developing best practices for opening up the "data gardens" on the Web, interlinking open data sets and enabling web developers to make use of that rich source of information [10]. The DBpedia data set is a large multi-domain ontology derived from Wikipedia; it currently describes 2.9 million "things" with 479 million "facts." The ontology is shallow, operates cross-domain and was manually created based on the most commonly used info-boxes within Wikipedia. It currently covers over 205 classes, forming a hierarchy with 1,210 properties (wiki.dbpedia.org, 2009).

D. Architecture
The scope of this research is to explore the impact of integrating a Linked Data technology, the DBpedia ontology, across a particular domain, and subsequently to evaluate the effectiveness of a multi-modality image retrieval system. The proposed approach addresses the semantic gap by classifying atomic concepts using support vector machines and high-level concepts using Bayesian networks. Upon classifying an image, the system should reflect the image's semantics, its features, content and semantic category, as part of a semantic link space. Table I illustrates the layered architecture.
The bottom layer represents the original image collections, collected from various sources, and the layer directly above it represents the image semantics using ontology. The semantic link layer prepares image networks based on the available image semantics (text descriptions and domain knowledge) and the features that correspond to the respective images, which constitute a feature description layer.

III.
MULTI-MODALITY ONTOLOGY RETRIEVAL SYSTEM
This section provides an overview of the components and outlines the workflow of a multi-modality ontology for an image retrieval system. The proposed system includes two components, namely the ontology construction and the operational interface of the image retrieval part, as shown in Fig. 1. The operational interface, which includes the spreading activation, ranking and indexing processes, is beyond the scope of this paper. The focus is on the generation of the multi-modality ontology, which involves the following steps: (1) a set of relevant concepts of the target domain with associated semantic relations, including the class and subclass relations, are extracted from DBpedia; (2) extraction of other concepts from
TABLE I.
SEMANTIC LAYER ARCHITECTURE
Operation Interface
Feature Description
Semantic Link
Semantic Web Representation (XML, RDF, Ontology, etc.)
Raw Image Collection

news articles and text descriptions that come along with the images. There were three separate aspects of this effort. First, domain knowledge of sports, such as sport names, types, events and people, was taken from DBpedia to represent the domain ontology. Second, sport news images with related image descriptions were collected from the web and segmented to set up an image database together with annotated text; this formed part of the low-level features, which were extracted independently. Third, domain knowledge was associated with image features, including the text description ontology, to construct the multi-modality ontology model. In the operational interface, users first submit queries to the system. The search engine then fetches all related images in the database, analyzes them, and produces the inputs required for the inference process. After inference, the relevance of each image to the submitted query is calculated. Finally, the images are ranked according to relevance and returned to the users.

A. Ontology Building
The ontology building process was divided into three phases: construction of a DBpedia-based high-level text description ontology for a specific domain, text-based ontology construction from the news, and visual feature ontology construction. Sport news classes were defined as target concepts. Each target concept was associated with a set of common concepts, including both high-level and low-level information. The process for constructing the ontology is shown in Fig. 2. The sport news ontology was incorporated into the semantic link space using ontology alignment.
Because the input query is free text, a relevance score of each image to the particular concept is calculated after a user submits a query to the system. Images are then ranked according to the corresponding relevance values of particular concepts and returned to the user. Experimental results demonstrated the efficacy of using the proposed multi-modality ontology with DBpedia for image retrieval.
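The retrieval loop just described — score each image against the query concepts, then rank by relevance — can be sketched as follows. The concept annotations and the overlap-based scoring function are simplified assumptions for illustration, not the paper's actual semantic matchmaking algorithm.

```python
def relevance(query_concepts, image_concepts):
    """Toy relevance score: fraction of query concepts annotated on the image."""
    if not query_concepts:
        return 0.0
    hits = sum(1 for c in query_concepts if c in image_concepts)
    return hits / len(query_concepts)

def rank_images(query_concepts, annotated_images):
    """Rank (image_id, concept_set) pairs by descending relevance to the query."""
    scored = [(relevance(query_concepts, cs), img) for img, cs in annotated_images]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(img, score) for score, img in scored]

# Hypothetical annotated collection; concept labels are invented.
images = [
    ("img1.jpg", {"badminton", "indoor", "racket"}),
    ("img2.jpg", {"football", "outdoor"}),
    ("img3.jpg", {"badminton", "net"}),
]
ranking = rank_images({"badminton", "indoor"}, images)
# img1 matches both query concepts, img3 one, img2 none.
```

In the proposed system the query concepts would first be expanded with DBpedia's relational facts before scoring, so an image annotated only with a player's name could still match a query for the sport.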
Figure 1. Multi-Modality Ontology System Approach
Figure 2. Diagram of Ontology Construction Model

B. DBpedia-Based Domain Ontology
DBpedia was used as the domain ontology. DBpedia comprises a large multi-domain ontology derived from Wikipedia; the data set currently describes 3.4 million "things" with over 1 billion "facts" (March 2010) (http://wiki.dbpedia.org) [1]. The DBpedia project extracts various kinds of structured information from Wikipedia editions in 92 languages and combines this information into a huge, cross-domain knowledge base. DBpedia uses the Resource Description Framework (RDF) as a flexible data model for representing extracted information and for publishing it on the Web [1]. The emergence of such an online encyclopedia has gradually taken the place of traditional web directories, which are usually domain specific and loosely structured, and has further helped machines learn human knowledge. DBpedia is an ideal source for knowledge acquisition and ontology construction in various domains. First of all, it is derived from an online collaborative encyclopedia project offering elaborations of more than three million words and phrases, a large number compared to WordNet's word coverage, besides its broad knowledge coverage and up-to-date vocabulary. Each "thing" in the DBpedia data set is identified by a URI reference of the form http://dbpedia.org/resource/Name, where Name is taken from the URL of the source Wikipedia article, which has the form http://en.wikipedia.org/wiki/Name. Thus, each resource is tied directly to an English-language Wikipedia article. Every DBpedia resource is described by a label, a short and a long English abstract, a link to the corresponding Wikipedia page, and a link to an image depicting the object of interest. An automatic hierarchical ontology structure can be directly extracted from such a data set; for example, the superclass Person has a series of descendant concepts, among them the entity concept Badminton Player, with Natalie Munt as an instance.
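The URI convention described above — each DBpedia resource mirroring the name of its source Wikipedia article — can be illustrated with a small helper. The function name is ours, but the URL patterns follow DBpedia's documented scheme.

```python
def wikipedia_to_dbpedia(wikipedia_url):
    """Map an English Wikipedia article URL to the matching DBpedia resource URI."""
    prefix = "http://en.wikipedia.org/wiki/"
    if not wikipedia_url.startswith(prefix):
        raise ValueError("expected an English Wikipedia article URL")
    name = wikipedia_url[len(prefix):]  # the article name, e.g. "Natalie_Munt"
    return "http://dbpedia.org/resource/" + name

uri = wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Natalie_Munt")
# -> "http://dbpedia.org/resource/Natalie_Munt"
```

This one-to-one mapping is what lets the system tie a news entity found in text back to its DBpedia description and, from there, to its place in the class hierarchy.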
DBpedia also provides the Info-box Ontology, a newer info-box extraction method based on hand-generated mappings of Wikipedia info-boxes/templates to a newly created DBpedia Ontology. The mappings adjust for weaknesses in Wikipedia's info-box system, such as using different info-boxes for the same type of thing (class) or using different property names for the same property. Therefore, the instance data within the info-box ontology is much cleaner and better structured than the Info-box Dataset. According to [1], there are three different Info-box Ontology data sets:
• The Ontology Info-box Types dataset contains the rdf:types of the instances, extracted from the info-boxes.
• The Ontology Info-box Properties dataset contains the actual data values that have been extracted from info-boxes. The data values are represented using ontology properties (e.g., 'volume') that may be applied to different things (e.g., the volume of a lake and the volume of a planet).
• The Ontology Info-box Properties (specific) dataset contains properties that have been specialized for a specific class using a specific unit, e.g., the property height is specialized on the class Person using the unit centimetres instead of metres.
Further ontology construction among all the extracted concepts was then carried out, based on the DBpedia info-box dataset and ontology structure, which offers a systematic categorization of all the classes and concepts. All the classes in DBpedia were extracted without removing any concepts. DBpedia thus plays the role of our domain ontology, controlling the vocabulary and classes used in text extraction.
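To make the use of the DBpedia ontology concrete, the sketch below assembles a SPARQL query that would fetch instances of an ontology class (here badminton players) from the public DBpedia endpoint. The class name follows the DBpedia ontology namespace; actually executing the query would require an HTTP client, which is omitted here, and the helper function itself is our own illustration rather than part of the paper's system.

```python
def build_instance_query(ontology_class, limit=10):
    """Build a SPARQL query for labelled instances of a DBpedia ontology class."""
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?instance ?name WHERE {\n"
        f"  ?instance a dbo:{ontology_class} ;\n"
        "            rdfs:label ?name .\n"
        "  FILTER (lang(?name) = 'en')\n"
        "}\n"
        f"LIMIT {limit}"
    )

query = build_instance_query("BadmintonPlayer")
```

Queries of this shape are how the class hierarchy and instance data described above would be pulled into the domain ontology during construction.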
C. Textual-Driven Ontology Construction
The text-driven ontology construction is mainly concerned with extracting concepts (specific instances) from news articles and textual image descriptions, and populating the sport news ontology with them. The sport news ontology used in this research is derived from the work of [9]. It differs from the domain ontology extracted from DBpedia in the interrelationships between the concepts of news and sport: the DBpedia ontology is a domain-dependent ontology, while the sport news ontology is a generic ontology. We combined both because a domain-dependent ontology provides concepts at a fine grain, while a generic ontology provides concepts at a coarser grain. Training data was collected from BBC sports news dated from January to December 2009. All of the test data was collected by hand and subsequently segmented. Text-driven ontology construction focuses on a particular type of sport article, including the description of any accompanying images. Concepts and entities were then extracted from these texts using the OpenCalais Submission Tool [2]. The OpenCalais web service automatically creates rich semantic metadata for content, based on specific named entities such as person, organization and country. This semantic metadata is then populated as instances in the basic sport news ontology described previously.

D. Visual Description Ontology
The images found in the news articles were segmented. The next task was to annotate the images and classify them into low-level, or atomic, concepts and high-level concepts in the domain-specific ontology. In a general sense, concepts are atomic [4] if they are terms that describe a specific object or image segment; examples are a ball, a stick, a net or other well-defined objects.
High-level semantic concepts, on the other hand, are used to describe an environment through a set of atomic concepts associated with it. For example, an image that contains a ball, a hoop, shoes and humans is described as a basketball game. The text description included with the image is then classified according to the concept, again based on the domain ontology. For the visual feature ontology, images are segmented and descriptions provide the color and texture details of the image's content using image annotation tools. In this research, the LabelMe annotation tool [3] was used because of its design quality and its ability to recognize classes of objects instead of only single instances of an object. A traditional dataset, for example, may contain images of dogs all of the same size and orientation; in contrast, LabelMe contains images of dogs at multiple angles, sizes and orientations. It was also designed for recognizing objects embedded in arbitrary scenes, instead of only images that have been cropped, normalized and/or resized to display a single object. With its complex annotation, instead of labeling an entire image (which also limits each image to containing a single object), LabelMe allows the annotation of multiple
objects within an image by specifying a polygonal bounding box containing each object. It contains a large number of object classes and easily allows the creation of new classes. The class labels used in classification are associated with the terms defined in the domain knowledge extracted from DBpedia. For example, in the sport domain, images are first classified as colorful or grayscale; images classified as colorful are then further classified into indoor or outdoor categories. Such classifications are useful for identifying the sporting event; images of a badminton event, for example, are likely to be associated with the indoor category. All three ontologies will be aligned and used in the construction of the multi-modality image retrieval system.

IV.
DISCUSSION & CONCLUSION
The image concepts in this research are based on three main ontologies: the text-based ontology, built from the description and text of the news; the image annotation ontology, from feature extraction; and lastly, the domain ontology, extracted from DBpedia. In conclusion, this research is in progress and only the first phase is completed. Much research has been done on image retrieval [16][17]; however, only limited research focuses on the usage of knowledge-sharing communities like DBpedia in creating and supporting domain knowledge. This research is being conducted in the hope of contributing towards the development of ontology and semantic research, and subsequently enhancing image retrieval systems.

REFERENCES
[1] DBpedia. http://wiki.dbpedia.org
[2] OpenCalais Submission Tools. http://www.opencalais.com
[3] B. C. Russell, A. Torralba, K. P. Murphy, W. T. Freeman. LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, vol. 77, no. 1-3, pp. 157-173, May 2008.
[4] D.N.F.A. Iskandar, J. A. T., S.M.M. Tahaghoghi. A Visual Ontology Query Interface for Content-Based Image Retrieval. Paper presented at HFT 2008, www.hft2008.org, 2008.
[5] Huan, W., L.T. Chia, S. Liu. Image Retrieval with Multi-Modality Ontology. Multimedia Systems, 13, 379-390, 2008.
[6] Huan, W., Song, L., & Liang-Tien, C. Does ontology help in image retrieval?: a comparison between keyword, text ontology and multi-modality ontology approaches. In Proceedings of the 14th Annual ACM International Conference on Multimedia, 2006.
[7] Huan, W., Xing Jiang, Liang-Tien Chia and Ah-Hwee Tan. "Wikipedia2Onto – Building Concept Ontology Automatically, Experimenting with Web Image Retrieval," Informatica, vol. 34, pp. 297-306, 2010.
[8] Hyvönen, E., Mäkelä, E., Salminen, M., Valo, A., Viljanen, K., Saarela, S., et al. MuseumFinland – Finnish museums on the semantic web. Web Semantics: Science, Services and Agents on the World Wide Web, 3(2-3), 224-241, 2005.
[9] Khan, L. Standards for image annotation using Semantic Web. Computer Standards & Interfaces, 29(2), 196-204, 2007.
[10] Kobilarov, G., Scott, T., Raimond, Y., Oliver, S., Sizemore, C., Smethurst, M., et al. Media Meets Semantic Web – How the BBC Uses DBpedia and Linked Data to Make Connections. In The Semantic Web: Research and Applications, pp. 723-737, 2009.
[11] M. Koskela, J. L., S. Laakso, E. Oja. The PicSOM retrieval system: description and evaluations. The Challenge of Image Retrieval, Brighton, 2000.
[12] B. McBride. "Jena: a semantic Web toolkit," IEEE Internet Computing, vol. 6, pp. 55-59, 2002.
[13] Petridis, K., D. Anastasopoulos, et al. M-OntoMat-Annotizer: Image Annotation Linking Ontologies and Multimedia Low-Level Features. In Knowledge-Based Intelligent Information and Engineering Systems, Springer Berlin / Heidelberg, vol. 4253, pp. 633-640, 2006.
[14] Ritendra, D., Dhiraj, J., Jia, L., & James, Z. W. Image retrieval: Ideas, influences, and trends of the new age. ACM Computing Surveys, 40(2), 2008.
[15] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. G. Ives. DBpedia: A Nucleus for a Web of Open Data. In The Semantic Web, vol. 4825, pp. 722-735, Springer Berlin / Heidelberg, 2007.
[16] Shahrul Azman Mohd. Noah, Saiful Bahri Sabtu. Binding semantic to a sketch based query specification tool. Int. Arab J. Inf. Technol., 6(2): 116-123, 2009.
[17] Shahrul Azman M.N., Azilawati A., Tengku Mohd, T.S. & Tengku Siti Meriam, T.W. Exploiting Surrounding Text for Retrieving Web Images. Journal of Computer Science, 4(10): 842-846, 2008.
[18] Smeulders, A., Worring, M., Santini, S., Gupta, A., Jain, R. "Content-based image retrieval at the end of the early years," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 1349-1380, 2000.
[19] Tsai, C.-F. & Hung, C. Automatically Annotating Images with Keywords: A Review of Image Annotation Systems. Recent Patents on Computer Science, 1, 55-68, 2008.
[20] Vasileios, M., Ioannis, K., & Michael, G. S. An Ontology Approach to Object-Based Image Retrieval. 2003.
[21] W. Zheng, Y. Q., James Ford, Fillia S. Makedon. Ontology-based Image Retrieval. WSEAS MMACTEE-WAMUS-NOLASC, 2004.