A Content-Based Image Retrieval Service for Archaeology Collections
Naga Srinivas Vemuri1, Ricardo da S. Torres2, Rao Shen1, Marcos André Gonçalves3, Weiguo Fan1, and Edward A. Fox1
1 Digital Library Research Lab, Virginia Tech, USA. {nvemuri, rshen, mgoncalv, wfan, fox}@vt.edu
2 Institute of Computing, State University of Campinas, Av. Albert Einstein, 1251, CEP 13084-851, Campinas, SP, Brazil. [email protected]
3 Department of Computer Science, Federal University of Minas Gerais, CEP 31270-901, Belo Horizonte, MG, Brazil. [email protected]
Abstract. Archaeological sites yield heterogeneous information, including artifacts, image data, geo-spatial information, chronological data, and other relevant metadata. ETANA-DL, an archaeology digital library, provides various services by integrating the heterogeneous data available in different collections. This demonstration presents an initial prototype for searching DL objects based on their image content, using the Content-Based Image Search Component (CBISC) from Virginia Tech/State University of Campinas.
1 Introduction
Archaeological systems involve heterogeneous data such as different kinds of artifacts, their corresponding images, geo-spatial information, chronological information, and relevant metadata. ETANA-DL [1], an archaeology digital library, integrates this heterogeneous data and provides services to its user societies. Archaeologists consider an artifact's image data vital for documenting, analyzing, and sharing. We address this need by developing an initial search prototype that uses the recently proposed Content-Based Image Search Component (CBISC) [2].
2 Our Approach
The CBISC prototype lets the end user search for similar DL objects based on the image content present in the system. To perform a query, the user uploads a query image and specifies k, the desired number of similar images. The CBISC extracts a feature vector from the query image using the Border/Interior Pixel Classification (BIC) [3] image descriptor, computes the L1 distance between it and the feature vectors of the ETANA-DL image collection, and returns the top k DL artifacts whose image content is most similar to that of the query. Figure 1a shows a sample query image, and Figure 1b shows the top 4 similar images returned for it. On inspection, these returned images appear more relevant to the query than the other images in the collection. The figure as a whole shows the corresponding DL objects returned by the CBISC. We used the BIC image descriptor because it is well suited to the kind of images available in our system.
Our approach differs from other content-based retrieval approaches in that the complete DL object information, along with its image content, is returned for a query image. We believe this approach will be especially useful when collections have inconsistencies in their metadata. In our experience, it is not uncommon for archaeological collections to have inconsistencies in textual metadata. In such a situation, an archaeologist looking at an artifact from a particular dig might discover other artifacts found at the same dig even though the corresponding textual metadata is incorrect. Note that this is a complementary strategy rather than an alternative to textual metadata search.
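The query pipeline described above (BIC feature extraction, L1 distance, top-k ranking) can be illustrated with a minimal Python sketch. It is not the CBISC implementation. The 64-color quantization, the 4-neighborhood interior test, the treatment of image-edge pixels as border, the normalization by pixel count, and all function and variable names are assumptions made for illustration, loosely following the descriptor as presented in [3].

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]        # (R, G, B), each in 0..255
Image = Sequence[Sequence[Pixel]]   # rows of pixels

LEVELS = 4                          # assumed: 4 levels per channel, i.e. 64 colors
BINS = LEVELS ** 3


def quantize(pixel: Pixel) -> int:
    """Map an RGB pixel to one of BINS quantized color indices."""
    r, g, b = (c * LEVELS // 256 for c in pixel)
    return (r * LEVELS + g) * LEVELS + b


def bic_histogram(image: Image) -> List[float]:
    """Return [border histogram | interior histogram], normalized by pixel count.

    A pixel counts as interior when its four neighbors share its quantized
    color; pixels on the image edge are treated as border (an assumption).
    """
    height, width = len(image), len(image[0])
    q = [[quantize(p) for p in row] for row in image]
    hist = [0.0] * (2 * BINS)
    for y in range(height):
        for x in range(width):
            color = q[y][x]
            on_edge = y in (0, height - 1) or x in (0, width - 1)
            interior = not on_edge and all(
                q[ny][nx] == color
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
            )
            hist[BINS + color if interior else color] += 1.0
    total = float(height * width)
    return [v / total for v in hist]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def top_k(query: Sequence[float],
          collection: Dict[str, List[float]],
          k: int) -> List[Tuple[str, float]]:
    """Return up to k (object id, distance) pairs, closest first."""
    scored = [(obj_id, l1_distance(query, vec)) for obj_id, vec in collection.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]
```

In this sketch the collection maps hypothetical DL object identifiers to precomputed feature vectors, so the identifiers returned by `top_k` could be used to fetch the complete DL objects rather than only their images.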
Fig. 1. An example scenario of content-based image query in ETANA-DL.
3 Conclusion
Our present focus is to extend this architecture, before the conference, by providing other services on top of this component. The existing recommendation component of ETANA-DL provides recommendations to individual users using a collaborative filtering mechanism. We shall extend it to provide recommendations based on the image content of a selected DL object. We shall also compare BIC with other image descriptors to evaluate their effectiveness for archaeology image collections. After deployment, we shall evaluate these services through usability studies.
Acknowledgements: This work is funded in part by the National Science Foundation (ITR-0325579). Further support is provided through CNPq, CAPES, FAPESP, CNPq WebMaps and AgroFlow projects, FAEPEX, 5S-QV project grant MCT/CNPq/CTINFO 551013/2005-2 and by the Microsoft eScience grant.
References
1. Ravindranathan, U.: Prototyping Digital Libraries Handling Heterogeneous Data Sources: An ETANA-DL Case Study. Master's Thesis, Computer Science, Virginia Tech, Blacksburg, VA (April 2004), http://scholar.lib.vt.edu/theses/available/etd-04262004-153555/
2. Torres, R. da S., Medeiros, C.B., Goncalves, M.A., Fox, E.A.: A digital library framework for biodiversity information systems. International Journal on Digital Libraries 6 (2006) 3-17
3. Stehling, R.O., Nascimento, M.A., Falcão, A.X.: A compact and efficient image retrieval approach based on border/interior pixel classification. CIKM 2002: 102-109