Visual Description Conversion for Enhancing Search Engines and Navigational Systems
Taro Tezuka and Katsumi Tanaka
Graduate School of Informatics, Kyoto University
{tezuka, tanaka}@dl.kuis.kyoto-u.ac.jp
Abstract. When using web search engines, there are cases where the name of the target object is unavailable and the user can only give a visual description of the object. Existing keyword-based search engines have limited capabilities in such situations. In real-space oriented search engines as well, there are often cases where the user wants to search using the visual characteristics of an object. In car or walk navigation systems, visual descriptions of buildings are often more useful than their names when traveling in an unfamiliar area. As a fundamental technology for converting between the names and visual descriptions of objects, we investigate a method for extracting such pairs from large text sources, such as the Web and encyclopedias. The extracted information is integrated to meet the requirements of such conversions. Keywords: Visual description extraction, web search engine, real-space search, car navigation, meta-search.
1 Introduction
The object name and the visual description are the two most common types of metadata available for concrete objects. Converting between these two attributes is effective in various situations, as indicated below.
Searching: Today's web search engines are highly reliable in finding web pages when the object name is specified. For example, if a product's name is given, its web site appears among the top results in most cases. However, the name of the entity is not always available. Existing search engines do not give reliable results when only a description of the product is given. When the user does not know the name of the object, it is often difficult to reach the product's page by description alone.
Navigation: In car or walk navigation systems, the names of buildings are not helpful if the user is not well acquainted with the area. Visual descriptions are preferred in these situations. Since most existing navigation systems store building data by name, a conversion system between an object's name and its visual description is necessary to meet this requirement.
A conversion module between object names and visual descriptions can be applied in various fields. However, no such general conversion engine exists yet. The goal of this research is to create a system for converting object names into visual descriptions, and vice versa, by applying knowledge mining to large text sources such as the Web and encyclopedias, as indicated in Figure 1.
Fig. 1. Applications of name and visual description conversions
The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 discusses searching and navigation based on visual descriptions. Section 4 discusses the extraction of name and visual description pairs from large text data. Section 5 concludes the paper.
2 Related Work
The use of visual descriptions has been discussed for various applications. Jaimes and Chang described a conceptual framework for indexing different aspects of visual information [1]. Mark et al. modeled visual descriptions specific to geographic objects [2]. The M-OntoMat-Annotizer is a tool for semantic annotation of images and videos [3][4]. It includes the Visual Descriptor Extraction (VDE) tool as a plug-in that links an RDF domain ontology with MPEG-7 visual descriptors. However, in this model the MPEG-7 visual descriptors must be registered manually beforehand. There has also been research on image-based searching. Julia discussed the real-space search environment [6]. Watanabe et al. implemented a system in which the characters contained in a picture are used as a search query [7]. Finding similar images based on color and texture is a popular mechanism; in these systems, images are used as search queries [8][9]. However, searching based on the shapes of objects is still difficult, especially for three-dimensional objects [10][11]. Therefore, at present, our method of searching based on visual descriptions is effective.
Nakaoka et al. discussed search engines for children [12]. They pointed out that children often have difficulty giving the names of the objects they want to search for. A children's search engine must therefore provide means of understanding children's intentions without asking for specific object names. The use of visually significant objects such as landmarks in car navigation has been discussed, for example, by Michon and Denis and by Elias and Brenner [13][14][16]. These studies suggest that landmarks provide an effective means of route navigation. However, visual descriptions of landmarks are more effective than the landmark names, especially for first-time visitors.
3 Searching by Visual Description
As the domain of search engines continues to expand, the search engines themselves must also evolve. In real-space mobile or ubiquitous environments, object names are not always available, and visual descriptions may be the only queries the user can provide. At present, object identification based on RFID (radio frequency identification) tags is not yet widespread, and image recognition technology is still under development. Since automatic identification of objects is unsatisfactory in many situations, our model of converting visual descriptions to object names is an effective means of enhancing search engines. Application examples are indicated below.
Tourism: The user is unable to give the name of the building he/she sees. He/she can only give the present location and describe the building. This also applies when he/she encounters any kind of unfamiliar object while traveling.
Unfamiliar tools: In the case of rarely used tools, such as home workshop tools, the user cannot remember their names and can only give their descriptions.
Web search queries by children: In web search engines for children, visual descriptions play an important role, since children often cannot give the exact name of what they want to search for. Instead, they can describe the object using visual impressions.
In general, a visual-description based search engine, seen from the user's side, consists of the following steps (a minimal sketch follows the list).
1. The user sends a visual description as the query.
2. The system stems the description.
3. The system finds the object name that best matches the stemmed terms.
4. The search engine finds the corresponding web page.
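To make the four steps concrete, the following is a minimal sketch in Python. The name-to-description table, the stemmer, and the matching rule are all illustrative assumptions introduced here, not part of the implemented system; in practice the table would be produced by the mining process described in Section 4.

```python
# Minimal sketch of the search steps above; the table, stemmer, and
# matching rule are hypothetical simplifications.
import re
from collections import Counter

# Hypothetical name-to-description table, as would be produced by the
# mining process in Section 4.
NAME_DESCRIPTIONS = {
    "Tokyo Station": ["old", "red", "brick", "building"],
    "Kyoto Tower": ["tall", "white", "tower", "observation", "deck"],
}

def stem(term):
    # Placeholder stemmer: lower-case and strip a trailing plural "s".
    term = term.lower()
    return term[:-1] if term.endswith("s") and len(term) > 3 else term

def best_matching_name(query):
    # Steps 2-3: stem the query and pick the name whose description terms
    # overlap most with the stemmed query terms.
    query_stems = Counter(stem(t) for t in re.findall(r"\w+", query))
    return max(
        NAME_DESCRIPTIONS,
        key=lambda name: sum(query_stems[stem(t)] for t in NAME_DESCRIPTIONS[name]),
    )

# Step 1: the user gives only a visual description of the object.
query = "an old red brick building near the station"
print(best_matching_name(query))  # -> "Tokyo Station"
# Step 4: the returned name is then sent to an ordinary keyword-based
# web search engine.
```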
On the other hand, visual descriptions can also be used for navigation tasks. Most car or walk navigation systems today have a visualization mechanism based on either a map or a 3D graphics interface. 3D graphics are said to require too much user attention and are therefore not preferred while driving. Describing routes by machine voice is often safer for drivers. In such cases, visual descriptions of buildings are effective.
The visual description is often more useful than the object's name in car or walk navigation. For example, if the name "Tokyo Station" is provided, a user who is not well acquainted with the area has no idea what kind of building to look for. In such cases, providing the visual description "the old red brick building" is helpful.
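As a rough illustration of this direction of the conversion (object name to spoken visual description), the following sketch assumes the same kind of name-to-description table as above; the sentence template is an arbitrary simplification, not the system's actual output format.

```python
# Hypothetical name-to-description lookup for voice navigation; the table
# and the sentence template are illustrative only.
NAME_DESCRIPTIONS = {
    "Tokyo Station": ["old", "red", "brick", "building"],
}

def describe(name):
    # Turn the stored description terms into a short spoken instruction.
    terms = NAME_DESCRIPTIONS.get(name)
    if not terms:
        return "Look for %s." % name
    return "Look for the %s." % " ".join(terms)

print(describe("Tokyo Station"))  # -> Look for the old red brick building.
```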
4 Visual Description Mining from the Web and Encyclopedias
Visual description mining means obtaining pairs of object names and visual descriptions from data sources. A dependency analyzer, a commonly used tool in natural language analysis, is used to extract noun-to-describer relationships. In a preliminary experiment, we extracted actual terms and phrases that indicate visual descriptions. Some of the indicators found in the extracted surrounding texts are listed in Table 1. The frequencies of these terms in the surrounding texts are used to distinguish visual descriptions from general descriptions (a small illustrative sketch follows Table 1).

Table 1. Visual description indicators

Type              | Example
Case indicators   | of, in, on, at, to, beside, inside, by, next to, into, compared to
Action indicators | see, watch, look up, remember, learn, amaze, go, visit
Visual adjectives | large, significant, huge, high, tall, beautiful, plush, impressive, one of a kind, white, brown, red, blue, art-nouveau, Renaissance-like, Modern, European, Oriental
Visual components | illumination, clock tower, floor, landmark, design, facade, windows, roof, brick, stone, steel-frame, neon sign
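As an informal sketch of how such indicator frequencies might separate visual descriptions from general descriptions, the following counts occurrences of a few Table 1 terms in a sentence. The indicator subset and the threshold are assumptions made for illustration, not the values used in our experiments.

```python
import re

# A few indicator terms taken from Table 1 (subset chosen for illustration).
INDICATORS = [
    "large", "huge", "high", "tall", "white", "brown", "red", "brick",
    "clock tower", "facade", "roof", "neon sign", "see", "look up",
]

def indicator_count(sentence):
    # Count whole-word (or whole-phrase) occurrences of indicator terms.
    text = sentence.lower()
    return sum(len(re.findall(r"\b%s\b" % re.escape(term), text))
               for term in INDICATORS)

def is_visual_description(sentence, threshold=2):
    # Treat the sentence as a visual description if it contains enough
    # indicator terms; the threshold of 2 is an arbitrary assumption.
    return indicator_count(sentence) >= threshold

print(is_visual_description("You can see the old red brick building with a clock tower."))
# -> True
```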
The components of the VD-Miner, our visual description mining engine, are as follows.
1. The parser obtains surrounding texts from the source texts.
2. The stemmer converts terms into stems.
3. The dependency analyzer extracts terms describing the target object.
4. The aggregator integrates the terms to quantify the descriptions.
The scripts were written in Perl. The Google Web API was used to collect the source web pages [17]. The dependency analysis is performed using the Japanese dependency analyzer KNP [18]. The results are stored in a relational database built on the PostgreSQL database management system.
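For concreteness, the following is a simplified sketch of the four VD-Miner components as plain functions. The actual system is written in Perl and relies on KNP for dependency analysis; the naive sentence splitting, stemming, and describer extraction below are stand-ins for those components, not the real implementation.

```python
from collections import Counter

def parse_surrounding_texts(pages, object_name):
    # 1. Parser: collect sentences that mention the target object name.
    sentences = [s.strip() for page in pages for s in page.split(".")]
    return [s for s in sentences if object_name in s]

def stem_terms(sentence):
    # 2. Stemmer: here simply lower-cased whitespace tokens; the real system
    #    performs morphological analysis on Japanese text.
    return [t.lower().strip(",") for t in sentence.split()]

def extract_describers(stems, object_name):
    # 3. Dependency analyzer: naively keep all terms other than the object
    #    name itself; the real system uses KNP dependency relations to keep
    #    only terms that modify the object.
    name_tokens = set(object_name.lower().split())
    return [t for t in stems if t and t not in name_tokens]

def aggregate(describer_lists):
    # 4. Aggregator: count describer frequencies to quantify the description.
    counts = Counter()
    for describers in describer_lists:
        counts.update(describers)
    return counts

pages = ["Tokyo Station is an old red brick building. It has a long history."]
sentences = parse_surrounding_texts(pages, "Tokyo Station")
result = aggregate(extract_describers(stem_terms(s), "Tokyo Station") for s in sentences)
print(result.most_common(3))
```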
Figure 2 shows the overall structure of the VD-Miner.
Fig. 2. Components of the VD-Miner
Although our present system uses the Web as its data source, visual descriptions can also be extracted from encyclopedias in the same way. In that case, the mining is easier and the results are more trustworthy, since encyclopedias provide structured data.
5 Conclusion
This paper discussed a system for converting between the names and visual descriptions of objects. We discussed enhanced search interfaces and navigation systems as application examples. We gave an overview of mining visual descriptions from web resources and encyclopedia data. Experiments were performed on significant geographic objects, and the frequencies of visual descriptions contained in the surrounding texts were measured. We also extracted indicators that distinguish visual description phrases from other descriptions, based on the obtained texts. Future work includes refining the indicators of visual descriptions, a mechanism for the automatic extraction of visual descriptions from encyclopedias, and merging the retrieved information into the knowledge base.
References
1. A. Jaimes and S. F. Chang, A conceptual framework for indexing visual information at multiple levels, SPIE Internet Imaging 2000, San Jose, California, 2000
2. D. Mark, B. Smith and B. Tversky, Ontology and geographic objects: An empirical study of cognitive categorization, in C. Freksa and D. Mark (Eds.), Spatial Information Theory: Cognitive and Computational Foundations of Geographic Information Science, Lecture Notes in Computer Science 1661, pp. 283-298, Springer-Verlag, 1999
3. S. Bloehdorn, K. Petridis, C. Saathoff, N. Simou, V. Tzouvaras, Y. Avrithis, S. Handschuh, I. Kompatsiaris, S. Staab, and M. G. Strintzis, Semantic annotation of images and videos for multimedia analysis, 2nd European Semantic Web Conference (ESWC 2005), Heraklion, Greece, 2005
4. K. Petridis, Semantic annotation of images and videos: The M-OntoMat-Annotizer tool, http://www.iti.gr/ kosmas/work.html
5. T. Yumoto, Q. Ma, K. Sumiya and K. Tanaka, A dynamic content integration language for video data and Web content, Proceedings of the 4th International Conference on Web Information Systems Engineering (WISE 2003), pp. 83-92, Roma, Italy, 2003
6. L. Julia, Augmenting humans' experiences, Australian Conference on Computer-Human Interaction (OzCHI 99), Wagga Wagga, Australia, 1999
7. Y. Watanabe, Y. Okada, Y. B. Kim and T. Takeda, Translation camera, Proceedings of the 14th International Conference on Pattern Recognition, Brisbane, Australia, 1998
8. J. R. Smith and S. F. Chang, Local color and texture extraction and spatial query, IEEE International Conference on Image Processing, Vol. 3, pp. 1011-1014, Lausanne, Switzerland, 1996
9. W. Y. Ma, D. Yining, and B. S. Manjunath, Tools for texture/color based search of images, Proceedings of the SPIE The International Society for Optical Engineering, Vol. 3016, pp. 496-507, 1997
10. J. R. Smith and S. F. Chang, VisualSEEk: A fully automated content-based image query system, ACM Multimedia Conference, pp. 87-98, Boston, Massachusetts, 1996
11. T. F. Cootes, A. Hill, C. J. Taylor, and J. Haslam, Use of active shape models for locating structures in medical images, Image and Vision Computing, Vol. 12, No. 6, pp. 355-365, 1994
12. M. Nakaoka, Y. Shirota and K. Tanaka, Web information retrieval using ontology for children based on their lifestyle, Proceedings of the International Special Workshop on Databases for Next Generation Researchers (SWOD 2005), Tokyo, Japan, 2005
13. P. E. Michon and M. Denis, When and why are visual landmarks used in giving directions?, in D. R. Montello (Ed.), Spatial Information Theory: Foundations of Geographic Information Science, Lecture Notes in Computer Science 2205, pp. 292-305, Springer-Verlag, 2001
14. B. Elias and C. Brenner, Automatic generation and application of landmarks in navigation data sets, in P. Fisher (Ed.), Developments in Spatial Data Handling, pp. 469-480, Springer-Verlag, Berlin, 2004
15. M. Raubal and S. Winter, Enriching wayfinding instructions with local landmarks, in M. Egenhofer and D. Mark (Eds.), Geographic Information Science, Lecture Notes in Computer Science 2478, pp. 243-259, Springer-Verlag, 2003
16. T. Tezuka, Y. Yokota, M. Iwaihara and K. Tanaka, Extraction of cognitively-significant place names and regions from web-based physical proximity co-occurrences, in X. Zhou, S. Su, M. P. Papazoglou, M. E. Orlowska, and K. G. Jeffery (Eds.), Web Information Systems - WISE 2004, Lecture Notes in Computer Science 3306, pp. 113-124, Springer-Verlag, 2004
17. Google Web API, http://www.google.com/apis/
18. Japanese dependency analyzer KNP, http://www.kc.t.u-tokyo.ac.jp/nl-resource/knp.html