From Concepts to Implementation and Visualization: Tools from a Team-Based Approach to IR
Uma Murthy1, Ricardo da Silva Torres2, Edward A. Fox1, Logambigai Venkatachalam1, Seungwon Yang1, and Marcos A. Gonçalves3
1 Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
2 Institute of Computing, University of Campinas, SP, Brazil
3 Computer Science Department, Federal University of Minas Gerais, Belo Horizonte, Brazil
{umurthy, fox, lokeya, seungwon}@vt.edu,
[email protected],
[email protected]
EXTENDED ABSTRACT
Researchers have been studying and developing teaching materials for information retrieval (IR), such as [3]. Toolkits that give students hands-on experience have also been built. For example, IR-Toolbox [4] is an effort to close the gap between students' understanding of IR concepts and real-life indexing and search systems. Such tools may help students in non-technical areas, such as Library and Information Science, develop a conceptual model of search engines. However, they do not cover emerging topics and skills, such as content-based image retrieval (CBIR) and fusion search. Open source software (such as the tools listed at http://www.searchtools.com/tools/tools-opensource.html) can be used to teach basic and advanced IR topics, but it requires a student to have high-level technical knowledge and to spend a long time gaining a practical understanding of these topics.
We present a new and rapid approach to teaching basic and advanced IR topics, such as text retrieval, web-based IR, CBIR, and fusion search, to Computer Science (CS) graduate students. We designed projects to help students grasp these IR topics. Students, working in teams, were given a practical application to start with: the Superimposed Application for Image Description and Retrieval (SAIDR) [5]. SAIDR (earlier, SIERRA) allows users to associate parts of images with multimedia information such as text annotations. Users may retrieve information in one of two ways: (1) perform text-based retrieval on annotations; (2) perform CBIR to find images and parts of images that look like a query image (or part of a query image). Each team was asked to build an enhancement for this application, involving text retrieval and/or CBIR, in three weeks. The sub-projects are described in Table 1.
The outcome of this activity was that students learned IR concepts and were able to relate them to a real-world problem (Figure 1). Details of these projects may be found at http://collab.dlib.vt.edu/runwiki/wiki.pl?TabletPcImageRetrievalSuperimposedInformation. We will demonstrate the tools developed along with the IR concepts they illustrate (Table 1). We believe these tools may help others learn about basic and advanced topics in IR.
ACKNOWLEDGMENTS
Thanks go to NSF (DUE-0435059), Microsoft (tablet PC grant), CAPES, FAPESP, and CNPq for helping to support this work. Thanks also go to all the students of CS5604 who worked on the sub-projects (listed at http://collab.dlib.vt.edu/runwiki/wiki.pl?SICBIRStudentsRoster).
Figure 1. Enhancing learning by connecting IR concepts and practical applications.
Table 1. IR Concepts and Tools to Learn about Them
Concept: Text-based retrieval
- comparison with DB retrieval
- collection, organization, and retrieval of web-based info
Sub-projects/tools developed that demonstrate this concept:
Sub-project 1: Replace database retrieval of annotations with Lucene-based [1] retrieval.
Sub-project 2: Associate parts of images with not just text annotations, but also parts of web pages. This extracted information would be indexed for later retrieval.
Sub-project 3: Develop synchronization of information about annotations and images between (local) SAIDR and Flickr [2].
Concept: CBIR
- indexing
- descriptors
- feature vectors
- distance function
Sub-projects/tools developed that demonstrate this concept:
Sub-project 4: Add M-tree indexing for a collection of (fish) images to provide more efficient retrieval and distance computation for similarity queries.
Sub-project 5: Find the best of 6 descriptors for a collection of (fish) images.
Concept: Fusion search
Sub-projects/tools developed that demonstrate this concept:
Sub-project 6: Combine results from text retrieval and CBIR, considering:
• Weights assigned to algorithms (e.g., text retrieval 30% weight vs. 70% for CBIR)
• Combination based on weighted arithmetic, geometric, or harmonic mean
• Score based on similarity, or rank.
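The fusion options listed for Sub-project 6 can be illustrated with a minimal Python sketch. It is not the students' implementation: the function names (rank_to_score, fuse_scores, fuse_rankings), the linear rank-to-score mapping, and the choice to treat an item missing from one result list as having score 0 are all assumptions made for illustration. Only the 30%/70% example weights and the three kinds of weighted mean come from Table 1.

```python
from math import prod  # available in Python 3.8+

MEANS = ("arithmetic", "geometric", "harmonic")


def rank_to_score(rank, list_length):
    """Map a 1-based rank to a score in (0, 1]; rank 1 maps to 1.0.

    Hypothetical linear mapping, used only when fusing by rank
    instead of by similarity score.
    """
    return (list_length - rank + 1) / list_length


def fuse_scores(text_score, cbir_score, w_text=0.3, w_cbir=0.7,
                mean="arithmetic"):
    """Combine one text score and one CBIR score with a weighted mean.

    Scores are assumed to lie in [0, 1]. The default weights follow
    the example in Table 1 (30% text retrieval, 70% CBIR).
    """
    if mean not in MEANS:
        raise ValueError(f"unknown mean: {mean!r}")
    pairs = [(w_text, text_score), (w_cbir, cbir_score)]
    total = w_text + w_cbir
    if mean == "arithmetic":
        return sum(w * s for w, s in pairs) / total
    # Geometric and harmonic means collapse to 0 if any score is 0
    # (assumption: this also avoids dividing by zero below).
    if any(s <= 0 for _, s in pairs):
        return 0.0
    if mean == "geometric":
        return prod(s ** (w / total) for w, s in pairs)
    return total / sum(w / s for w, s in pairs)  # harmonic


def fuse_rankings(text_scores, cbir_scores, **options):
    """Fuse two {item: score} mappings into one ranked list.

    An item missing from one mapping gets score 0.0 there (assumption).
    Ties are broken by item name so the output is deterministic.
    """
    items = set(text_scores) | set(cbir_scores)
    fused = {
        item: fuse_scores(text_scores.get(item, 0.0),
                          cbir_scores.get(item, 0.0), **options)
        for item in items
    }
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Made-up scores for three hypothetical images.
    text = {"img1": 0.9, "img2": 0.4}
    cbir = {"img2": 0.8, "img3": 0.6}
    for mean in MEANS:
        print(mean, fuse_rankings(text, cbir, mean=mean))
```

In this sketch the arithmetic mean still rewards an item that appears in only one result list, whereas the geometric and harmonic means score such an item 0. That is one reason the choice of mean matters when comparing fusion strategies.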
REFERENCES
[1] The Apache Lucene project: http://lucene.apache.org
[2] Flickr, a photo sharing application: http://flickr.com
[3] M. Chau, Z. Huang, and H. Chen. Teaching key topics in computer science and information systems through a web search engine project. JERIC, 3(3):2, 2003.
[4] E. N. Efthimiadis and N. G. Freier. IR-toolbox: an experiential learning tool for teaching IR. In SIGIR '07, p. 914, New York, NY, USA, 2007. ACM.
[5] U. Murthy, R. da S. Torres, and E. A. Fox. SIERRA - A Superimposed Application for Enhanced Image Description and Retrieval. In LNCS 4172/2006, Springer, 540-543.