SIREA: Image Retrieval using Ontology of Qualitative Semantic Image Descriptions

Saqib Majeed, University of Arid Agriculture, Rawalpindi, Pakistan ([email protected])
Zia ul Qayyum and Sohail Sarwar, Iqra University, Islamabad, Pakistan ([email protected], [email protected])

Abstract—Our research addresses semantic retrieval of images from large image repositories by incorporating human cognition into the image retrieval process. The proposed architecture, SIREA, addresses the shortcomings of keyword-based and content-based image retrieval through semantics. The performance of semantic image retrieval is greatly enhanced by a domain ontology that enables intelligent image retrieval via semantic concepts, categories and qualitative attributes. The effectiveness of our approach is evident from the retrieval accuracy and relevance of images retrieved from a large repository of natural scenes.

Keywords—Semantic Image Retrieval; CBIR; RDF; Domain Ontology; Qualitative Spatial Representations
I. INTRODUCTION
Intelligent image retrieval and navigation of digital collections across the overwhelming repositories on the internet is regarded as a great research challenge. Image retrieval these days is not merely confined to querying for an image, accessing heterogeneous digital repositories and fetching images that may or may not be accurate and relevant. Efforts are being made to recognize, among vast image deposits, the correct images, i.e. those closest to human intuition and cognition [1,2,3,4]. The advent of the semantic web, focused on minimizing the disparity of interpretation between "machines" and "humans" with respect to internet artifacts, has further stimulated this area of research, where the exploration and invention of new image retrieval techniques is gaining attention. Despite the availability of contemporary image retrieval techniques [5,6,7,8,9] such as "keyword based searching" and "content based searching", the need for more effective image retrieval techniques, capable of addressing the issues in existing ones, has become inevitable. Keyword-based searching uses inexpressive image labels that cannot fully describe the content/context of an image and may add ambiguity while processing an image query [10]. Moreover, any change in the textual annotations (which are based on a native language) makes searching a challenge. Content-Based Image Retrieval (CBIR) has been put forth as a substitute for keyword-based searching. CBIR manipulates the low-level features of an image, i.e. color values (RGB or YUV, etc.), texture, shape and spatial properties [11], to search for an image. A plethora of algorithms working on low-level image features is available, but they lack a description that a
user can easily comprehend, so they neither satisfy nor comfort human cognition. Semantic image retrieval addresses the above issues by retrieving images based on meanings/definitions associated with every image and every part of an image (each meaning/definition is referred to as a "concept"), with the help of one or more ontologies [12]. Image retrieval based on semantics, specifically retrieval of natural scenes as in our case (as shown in Fig. 1), attempts to provide results close to human perception by matching the similarity of "objects of interest (OOI) [13], a.k.a. concepts". For example, the following queries may be posed on a repository containing "natural scene" images based on concepts like sky, water and foliage:
• "Find images of forest", or
• "Find images with sky and water", or
• "Find images with mountains and foliage and sand".
However, such queries may not always be this simple. Complexity is added when different "relations" among objects are incorporated into the query criteria while searching for certain images. For example:
• "Find landscape images having more water than grass"
• "Find forest images with more snow than rocks and less sand than foliage"
• "Find forest images with foliage after sky and more foliage than sky."
Fig. 1: A landscape image with concepts of water, sky, mountains, snow and foliage.
The above examples imply that a combination of semantic information, i.e. concepts, and qualitative relations (more, less, equal, etc.) has the capacity to model human intuition in the semantic classification and retrieval of images; this is the focal point of the paper emanating from our
research. We have developed a domain ontology to model descriptions of images with respect to image features, semantic hierarchy, semantic concepts and spatial relationships among those concepts. The proposed model works on human-understandable, high-level features of images that have been transformed into Resource Description Framework (RDF) triples. The metadata contains information about sources (i.e., URI, image, category, concept, relationship and region) in the form of RDF triples, which play a key role in performing semantic search over images. These search operations are performed on an image repository carrying different categories of images, where categorization is based on how humans perceive the appearance of an image. This representation of image content with semantic terms makes it possible to query and access images in a more intuitive, easier and preferred way, entailing enhanced accuracy and relevance. Section 2 discusses the literature survey. Section 3 presents the architecture of the proposed system, while Section 4 contains the implementation details. The results and evaluation of the system prototype are elaborated in Section 5. Section 6 concludes the work and outlines intended future directions.

II. LITERATURE SURVEY
We summarize below some of the relevant work in this area, especially related to the query-by-example and content-oriented querying paradigms. Image retrieval systems are of two different types [13]. The first approach offers search over local or global image features such as color or texture. The other approach is based on adding keywords to images as annotations. Humans annotate images well, because they typically have rich knowledge of the domain an image belongs to. However, besides manual indexing being a very tedious task for large collections, human annotation tends to be subjective, so one annotator's effort may not carry over to other users. Systems following the second approach either support manual annotation or try to automate the process; the objective is to minimize the subjectivity of manual annotation by guiding or assisting the human annotator. The perceptual segmentation approach of Depalov et al. [15] was not applied in their work to image categorization and retrieval, but the relative effectiveness of their approach to image segmentation and labeling can be used to perform keyword-based image retrieval. More recent reviews of CBIR techniques by Deb et al. [13] discuss state-of-the-art segmentation, indexing and retrieval for a number of CBIR systems. The gap between low-level image features and high-level semantic expressions is a bottleneck in accessing multimedia data from databases. These surveys reveal one important aspect: almost all existing approaches rely on low-level image features for categorization and retrieval. Image understanding is a key to all content-based image categorization and retrieval systems.

None of the approaches discussed considers qualitative relationships among the high-level concepts of images, except the one proposed by Wang [13], where image retrieval is based on spatial relationships. The framework proposed there consists of two main components: the extraction of low-level features and objects, and the identification, extraction and representation of semantic objects.

III. SYSTEM ARCHITECTURE

"An Ontology Based Image Retrieval Approach using Qualitative Semantic Image Descriptions" has been proposed to improve the precision and relevance of image search results. The architecture, "SIREA: Semantic Image Retrieval with Enhanced Accuracy", is presented in Fig. 2. Matching of RDF triples is employed instead of keywords in order to concentrate on the context of the search terms. The proposed framework has the following components: Image Repository Manager, Image Retrieval Manager, Domain Ontology and Image Repository.
Fig. 2: System Architecture SIREA
The Image Repository Manager receives natural scene images and transforms them into a "processed" form that can be stored and then semantically retrieved. The image repository comprises six types of images, categorized according to the way humans perceive the appearance of an image. The Image Categorizer manages these categories by identifying each image as one of: (i) Forest, (ii) Field, (iii) Sky/clouds, (iv) Water spaces, (v) Landscape, and (vi) Landscape with mountains. A customized version of the Grid Annotator performs the image transformation by dividing each image into a 10 x 10 grid (making 100 parts of each image). Each cell represents a "concept" from the available set of concepts, which are managed by the Concept Analyzer. Each set of 100 concepts represents the category the image belongs to. Eleven concepts, listed below, are used to qualitatively describe an image in totality: (i) Grass, (ii) Flowers, (iii) Sky, (iv) Field, (v) Foliage, (vi) Mountains, (vii) Rocks, (viii) Sand, (ix) Snow, (x) Trunks, (xi) Water.
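The exact RDF vocabulary of SIREA appears in the SPARQL queries of Section 4; as a minimal sketch of how one annotated grid cell could be recorded as triples, assuming the ns: namespace of those queries (its URI here is hypothetical) and hypothetical identifiers img12 and img12_cell1:

PREFIX ns:   <http://sirea.example/ontology#>   # hypothetical namespace URI
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

INSERT DATA {
  ns:img12 ns:hasCategory ns:Landscape ;      # image-level category
           ns:hasPart     ns:img12_cell1 .    # one of the 100 grid cells
  ns:img12_cell1 ns:hasConcept ns:Sky .       # concept annotated on this cell
  ns:Sky rdfs:label "Sky"^^xsd:string .       # label matched by the queries below
}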
This can be better viewed in an image divided into cells and tagged with concepts, as given in Fig. 3.
Fig. 3: A typical processed image produced by the Image Repository Manager, divided into Top, Middle and Bottom regions

The Grid Annotator also represents each processed image in the form of regions, i.e. a Top region, a Middle region and a Bottom region. This representation helps in comprehending queries where retrieval depends on the set of concepts in certain region(s). For example, "Find an image with foliage after sky" is interpreted by our system as "Find an image where sky appears in the Top region and foliage appears in the Middle or Bottom regions". Each of these regions is in turn a collection of rows: the Top region has three rows, the Middle region four rows and the Bottom region three rows. Each row in a region comprises ten cells, and each cell is annotated with one concept. The precision of the mapping between an image cell and a concept is accurate to the degree of 99% [12]. The remaining one percent arises from cells containing two concepts; for example, a single cell may contain both foliage and sky, which introduces some noise. This negligible issue can be addressed by increasing the grid size from 10 x 10 to 100 x 100 or beyond.

The Image Retrieval Manager receives a user query in natural language and performs query reformulation by transforming it into a system query (i.e. RDF triples). Three variations of queries are provided, to search images based on (a) global labels of images, (b) high-level concepts of images, and (c) qualitative semantic concepts of images. These queries are passed to the ontology interface, which performs semantic matching on the query search terms to retrieve images. However, this retrieval does not process the actual images themselves; rather, it uses an abstract representation of the images with respect to a particular information model that was created during the image indexing process. The query result is a set of image references from the domain ontology; the actual images are passed to the Image Ranker after being matched against these references. The Image Ranker retrieves the images related to the user's query and ranks them according to the degree of relevance between images and query triples: images are ranked by the occurrence frequency, within each image, of a particular concept in the user query, and returned, e.g., as the top 10 or top 5 images. A ranking query of this kind is sketched below.

The Domain Ontology is a description of the concepts in a domain, the relationships between those concepts, and the individuals that populate this structure for real-world applications of the ontology. Here the domain ontology consists of six classes, namely (a) Image, (b) Category, (c) Cell, (d) Concept, (e) Row and (f) Region. The Image Repository is a digital collection of processed images that are stored in digital formats and accessible to retrieval systems. The image content may be stored locally or accessed remotely via computer networks.
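A minimal sketch of such a ranking query, assuming the ns: vocabulary of Section 4 and "Foliage" as the concept taken from the user query, might be:

PREFIX ns:   <http://sirea.example/ontology#>   # hypothetical namespace URI
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

SELECT ?img (COUNT(?cell) AS ?freq)          # occurrence frequency per image
WHERE {
  ?img  ns:hasPart    ?cell .                # all 100 cells of each image
  ?cell ns:hasConcept ?con .
  ?con  rdfs:label "Foliage"^^xsd:string .   # concept from the user query
}
GROUP BY ?img
ORDER BY DESC(?freq)                         # most relevant images first
LIMIT 10                                     # e.g. the top 10 images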
IV. IMPLEMENTATION STRATEGY

In this section, the implementation details of the proposed system are discussed. Java and Protégé have been used to implement the system front-end and back-end, respectively. A key phase in our implementation is building the domain ontology: identifying the domain classes, the individuals in those classes, and the relationships between objects of the classes via properties. The classes of the domain ontology, defined in terms of RDF triples in our framework, are the following:

• Image Class: contains the "names of the images" as individuals, i.e. the global labels of images. These global labels are used to access images from the image repository and facilitate query search based on "global labels".
• Category Class: contains each "name of an image category" as an individual. The categories have already been described in Section 3.
• Cell Class: refers to the single operational unit on which all retrieval decisions depend, i.e. the "Cell". The number of individuals in this class equals "number of images x 100", where 100 is the number of cells into which each image is divided.
• Concept Class: the "names of the concepts" are the individuals of this class. The eleven concepts have already been described in Section 3.
• Row Class: contains the "names of rows" as individuals. The number of individuals equals the "number of individuals in the Image Class x 10".
• Region Class: contains the region names Top region, Middle region and Bottom region as individuals for each image; the number of individuals equals the number of individuals in the Image Class x 3.

A snippet of the domain ontology containing the above classes is given in Fig. 4.
Fig. 4: A snippet of the domain ontology in N3 notation
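Fig. 4 is reproduced as an image in the original paper; purely as an illustration, class-membership triples consistent with the classes above, using the hypothetical individuals from the earlier sketch, could be asserted as:

PREFIX ns:  <http://sirea.example/ontology#>   # hypothetical, as before
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

INSERT DATA {
  ns:img12       rdf:type ns:Image .     # a global image label
  ns:Landscape   rdf:type ns:Category .  # one of the six categories
  ns:img12_cell1 rdf:type ns:Cell .      # one of img12's 100 cells
  ns:Sky         rdf:type ns:Concept .   # one of the eleven concepts
  ns:img12_row1  rdf:type ns:Row .       # one of img12's 10 rows
  ns:img12_top   rdf:type ns:Region .    # Top region of img12
}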
Object Properties

To manage the relationships among classes and deal with qualitative semantic issues, a number of properties were introduced; a few of them are briefly described below.

hasCategory associates each instance of the Image Class with the respective instance of the Category Class, i.e. one of the six categories mentioned in Section 3:

SELECT ?image ?cat
WHERE { ?image ns:hasCategory ?cat . }

hasConcept associates a concept with the respective image cells in the dataset. Each image is annotated using the eleven concepts such as Sky, Foliage, etc.

hasCells associates each instance of the Row Class with instances of the Cell Class; each row instance of an image is associated with ten cells.

hasPart associates each image instance with the cells into which the image has been divided (there are 100 cells per image).

The user gives a query in natural language at a simplified user interface, which offers the six categories, the concepts, and the relationships among concepts for the where-clause of the query. The simplified queries are transformed into SPARQL queries that search the RDF triples of the metadata available in the ontology. The SPARQL form of a user query, as produced by the Image Retrieval Manager, is presented below in the following variations.

Retrieval based on image concepts

The user gives a query in natural language: "Find all images having sky and water". The corresponding SPARQL query is:

SELECT ?img (COUNT(?cell) AS ?matches)
WHERE {
  ?img   ns:hasPart    ?cell .
  ?lcell ns:lowerCell  ?cell .
  ?cell  ns:hasConcept ?con .
  { ?con rdfs:label "Water"^^xsd:string }
  UNION
  { ?con rdfs:label "Sky"^^xsd:string }
}
GROUP BY ?img
ORDER BY DESC(?matches)
LIMIT 3

The above query selects images that contain the concepts "sky" and "water", such that the lower cells carrying the water concept lie in the Middle or Bottom region. Since we imposed LIMIT 3, the query returns the following three images, arranged by degree of similarity (more to less).
Fig. 5: Top 3 images retrieved semantically
Semantic retrieval based on qualitative relations

The user gives a query in natural language: "Find all images having more water than grass". The corresponding SPARQL query counts the water cells of each image, counts its grass cells in a subquery, and keeps the images where the former exceeds the latter:

SELECT ?img (COUNT(?wcell) AS ?water)
WHERE {
  ?img   ns:hasPart    ?wcell .
  ?wcell ns:hasConcept ?wcon .
  ?wcon  rdfs:label "Water"^^xsd:string .
  {
    SELECT ?img (COUNT(?gcell) AS ?grass)
    WHERE {
      ?img   ns:hasPart    ?gcell .
      ?gcell ns:hasConcept ?gcon .
      ?gcon  rdfs:label "Grass"^^xsd:string .
    }
    GROUP BY ?img
  }
}
GROUP BY ?img ?grass
HAVING (COUNT(?wcell) > ?grass)
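The region-based interpretation described in Section 3 ("Find an image with foliage after sky") can be sketched in the same style. The cell-to-region link (here a hypothetical ns:inRegion property) and the region labels are assumptions, since the paper does not spell out this part of the vocabulary:

PREFIX ns:   <http://sirea.example/ontology#>   # hypothetical namespace URI
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?img
WHERE {
  ?img   ns:hasPart    ?scell .                 # a cell with sky ...
  ?scell ns:inRegion   ?top .                   # ... in the Top region
  ?top   rdfs:label "Top region"^^xsd:string .
  ?scell ns:hasConcept ?scon .
  ?scon  rdfs:label "Sky"^^xsd:string .
  ?img   ns:hasPart    ?fcell .                 # a cell with foliage ...
  ?fcell ns:inRegion   ?lower .                 # ... in the Middle or Bottom region
  ?lower rdfs:label ?rl .
  FILTER (?rl IN ("Middle region"^^xsd:string, "Bottom region"^^xsd:string))
  ?fcell ns:hasConcept ?fcon .
  ?fcon  rdfs:label "Foliage"^^xsd:string .
}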
V. RESULTS AND EVALUATION

The efficacy of the proposed architecture has been measured by comparing the images retrieved semantically against the queries and descriptions provided by users. One of the most renowned ways to evaluate such systems is "psychophysical evaluation" [19], where humans choose images belonging to certain categories such that those images appear "visually similar". Given a set of images, participants are requested to describe the images: to name them (e.g. image of a forest, image of the sky), to state visible features distinguishing an image from other images in the repository (e.g. images with sky and water, images with foliage and snow), or to query visible traits of images with relations (e.g. images with more foliage than water, images with fewer mountains than sand). Each description/query posed by a user is fed to the developed system, which returns almost the same results as described earlier by the user(s), amenable to human cognition and context. This corresponds to nearly 90% accuracy of the proposed system; the remaining 10% of noise is due to person-specific understanding and description of images. The experimental setup, involving the participants, the images in our repository and the types of queries, is described below.

Participants for Psychophysical Evaluation. Thirty participants were requested to take part in the evaluation of our system through their input. The participants came from varying professional and academic domains (e.g. management sciences, fine arts, medical studies, agriculture and computer sciences), different genders and different age groups (ranging from 20 to 55 years). They were divided into two groups of 15 individuals each. This diversification of the audience was intentional, in order to comprehensively observe human cognition across a blend of demographics.

Dataset and Queries for Psychophysical Evaluation. From a collection of 300 manually classified natural scene images, each group of participants was given the same set of 300 images; every user was given a set of 30 images, and each set of 30 images had five images belonging to each of the six image categories we devised. Each image was represented using the 11 concepts. The experiments were carried out in different ways, where each participant was asked to:
• give a short description of the image considering its high-level concepts (query by high-level concepts of the image), and
• describe an image keeping in view the proportion of every concept with reference to the other concepts in the same image (qualitative semantic concepts of the image).
The following results were obtained from the experiments above.
TABLE I. RESULTS BASED ON HIGH LEVEL SEMANTIC CONCEPTS
TABLE II. RESULTS BASED ON QUALITATIVE RELATIONS
As evident from the results above, the proposed approach achieves an accuracy of almost 90% in a few cases and 100% in most cases. Human bias, i.e. personal interpretation of concepts, introduced some decrease in the accuracy of image retrieval. For example, images of fields and forests were sometimes interpreted as the same, as were landscape images and landscape images with mountains; the concepts of grass and foliage, or rocks and mountains, may be commingled in the same way.

VI. CONCLUSION AND FUTURE DIRECTION
An image retrieval approach based on semantic and qualitative semantic relations has been presented, signified by the descriptive characteristics of an image. Moreover, similarity measures and calculations based on the qualitative spaces (i.e. concepts and categories) associated with each image are described and manoeuvred for better qualitative semantic image retrieval. The effectiveness of the proposed approach, as shown by the results, is another step towards shortening the gap between the interpretations of machines and humans in retrieving images. We intend to augment the framework for other heterogeneities, i.e. incomplete and incompatible RDF triples; the current framework does not consider partially (i.e. incompletely) matched RDF triples, which may contain important information.

REFERENCES
[1] J. Clerk Maxwell, A Treatise on Electricity and Magnetism, 3rd ed., vol. 2. Oxford: Clarendon, 1892, pp. 68-73.
[2] J. P. Eakins, "Automatic image content retrieval – are we getting anywhere?", in Proc. 3rd Int. Conf. on Electronic Library and Visual Information Research, May 1996, pp. 123-135.
[3] T. Hermes, C. Klauck, J. Kreyß, and J. Zhang, "Image retrieval for information systems," in Storage and Retrieval for Image and Video Databases III (W. Niblack and R. Jain, eds.), vol. 2420 of SPIE Proceedings, San Jose, CA, USA, Feb. 1995, pp. 394-405.
[4] W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic, P. Yanker, C. Faloutsos, and G. Taubin, "The QBIC project: querying images by content using color, texture, and shape," in IS&T/SPIE Symp. on Electronic Imaging: Science & Technology, San Jose, CA, USA, Feb. 1993.
[5] A. Rosenfeld, "Picture processing by computer," ACM Computing Surveys, vol. 1, no. 3, pp. 147-176, 1969.
[6] R. Datta, D. Joshi, J. Li, and J. Z. Wang, "Image retrieval: ideas, influences, and trends of the new age," ACM Computing Surveys, 2008.
[7] Letter Symbols for Quantities, ANSI Standard Y10.5-1968.
[8] E. Hyvönen, A. Styrman, and S. Saarela, "Ontology-based image retrieval," University of Helsinki, Department of Computer Science.
[9] F. Jing, M. Li, H.-J. Zhang, and B. Zhang, "A unified framework for image retrieval using keyword and visual features," IEEE Transactions on Image Processing, vol. 14, no. 7, p. 979, July 2005.
[10] H. H. Wang, D. Mohamad, and N. A. Ismail, "Image retrieval: techniques, challenge and trend," in Int. Conf. on Machine Vision, Image Processing and Pattern Analysis, Bangkok, 2009.
[11] H. Tamura and S. Mori, "A data management system for manipulating large images," in Proc. Workshop on Picture Data Description and Management, Chicago, IL, USA, Apr. 1977, pp. 45-54.
[12] Z. U. Qayyum and A. G. Cohn, "Qualitative approaches to semantic scene modelling and retrieval," in Proc. 26th SGAI Int. Conf. on Innovative Techniques and Applications of Artificial Intelligence (Research and Development in Intelligent Systems XXIII), Springer-Verlag, 2006, pp. 346-359.
[13] H. H. Wang, D. Mohamad, and N. A. Ismail, "Semantic gap in CBIR: automatic objects spatial relationships semantic extraction and representation," International Journal of Image Processing (IJIP), vol. 4, no. 3, 2010.
[14] V. N. Gudivada and V. V. Raghavan, "Content-based image retrieval systems," in Proc. 1995 ACM 23rd Annual Conf. on Computer Science, Nashville, TN, USA, 1995, pp. 18-22, ACM Press.
[15] D. Depalov, T. Pappas, D. Li, and B. Gandhi, "Perceptually based techniques for semantic image classification and retrieval," in Human Vision and Electronic Imaging XI (B. E. Rogowitz, T. N. Pappas, and S. J. Daly, eds.), Proceedings of the SPIE, vol. 6057, pp. 354-363, 2006.
[16] H. Zhu, J. Zhong, J. Li, and Y. Yu, "An approach for semantic search by matching RDF graphs," in Proc. Special Track on Semantic Web at the 15th Int. FLAIRS Conf., 2002, pp. 514-519.
[17] P. Enser and C. Sandom, "Towards a comprehensive survey of the semantic gap in visual image retrieval," in Proc. Int. Conf. on Image and Video Retrieval, LNCS, Springer, 2003, pp. 279-287.
[18] W. Wang, Y. Song, and A. Zhang, "Semantics based image retrieval by region saliency," in Proc. Int. Conf. on Image and Video Retrieval, LNCS 2383, 2002, pp. 29-37.
[19] A. Tversky, "Features of similarity," Psychological Review, vol. 84, no. 4, pp. 327-352, 1977.