Multimed Tools Appl DOI 10.1007/s11042-012-1249-z
Instance based personalized multi-form image browsing and retrieval Esin Guldogan & Thomas Olsson & Else Lagerstam & Moncef Gabbouj
© Springer Science+Business Media New York 2012
Abstract It is important to adapt and personalize image browsing and retrieval systems to users' preferences for improved user experience and satisfaction. In this paper, we present a novel instance-based personalized multi-form image representation with implicit relevance feedback and an adaptive weighting approach for image browsing and retrieval systems. In the proposed system, images are grouped into forms, which represent different kinds of information about images, such as location, content, etc. We conducted user interviews on image browsing, sharing, and retrieval systems to understand the browsing and searching behaviors of users. Based on the insights gained from the interview study, we propose an adaptive weighting method and an implicit relevance feedback technique for multi-form structures that aim to improve the efficiency and accuracy of the system. Statistics of past actions are used to model the user's target; thus, on each iteration the weights of the forms are updated adaptively. Moreover, retrieval results are modified according to the user's preferences across iterations in order to improve the personalized user experience. The proposed method has been evaluated and the results are illustrated in the paper. It is shown that satisfactory improvements can be achieved with the proposed approaches in the multi-form scheme.

Keywords Content-based image indexing and retrieval . Image browsing . Implicit feedback . Personalized and adaptive image browsing
This work is supported by the Devices and Interoperability Ecosystem (DIEM) project, part of the Finnish ICT SHOK program coordinated by TIVIT and funded by the Finnish Funding Agency for Technology and Innovation.

E. Guldogan (*) : M. Gabbouj
Department of Signal Processing, Tampere University of Technology, Tampere, Finland
e-mail: [email protected]

M. Gabbouj
e-mail: [email protected]

T. Olsson : E. Lagerstam
Unit of Human-Centered Technology, Tampere University of Technology, Tampere, Finland
1 Introduction

Current social networking systems and public image sharing sites allow image browsing via user-defined or pre-defined categories. A small group of image sharing sites also supports image browsing with geo-tags and collaboratively created user tags (folksonomies). Multimedia content analysis is required in such systems to increase browsing efficiency, retrieval accuracy, and reliability. Multimedia content is analyzed via low-level features such as color, texture, and shape in Content-Based Multimedia Indexing and Retrieval (CBMIR) systems. The use of low-level features does not yield satisfactory retrieval results in many cases, especially when high-level concepts in the user's mind are not easily expressible in terms of low-level features. This challenge is called the "semantic gap" between low-level feature vector representations and the semantic concepts of image content [18]. Recent systems intend to combine low- and high-level features to achieve significantly higher semantic performance [18, 26, 29]. Images have their own stories that cannot be captured by low-level features. Each image may belong to a certain time, place, event, object, etc. that users are interested in finding and browsing. Available social networks for image sharing, editing, and classifying have various types of associated information around the images. Such systems are called multi-form systems, as they allow users to access images through multiple forms of information. For example, Flickr claims to host more than 4 billion images (January 2011), and most of them are tagged by users. During January 2011 alone, 3.6 million geo-tagged images were uploaded to Flickr. The available associated information of the images should be exploited in the indexing, browsing, and searching scheme in order to improve semantic accuracy and user experience. All such associated information around images is referred to as "forms" in this paper.
Each form elicits a different angle of an image for accessing and searching. Multi-form image browsing stands for browsing images through more than one piece of information obtained from an image and its surroundings, such as location, date, etc. Browsing through all available information may not yield satisfactory results, due to the language dependencies of tags and the massive amount of data. Combining content-based features with the available information of multimedia items increases the alternatives and choices of results for the users. Due to the diverse and huge amount of retrieval results based on different forms, adaptive and personalized approaches are required for higher user satisfaction and user experience. Understanding human-technology interaction is essential for increasing the usability and efficiency of applications that can be adapted according to the user's needs. The user's perception should be involved in order to fill the semantic gap in retrieval systems. Kosch and Döller [15] discussed open issues in multimedia database systems, one of their conclusions being the need to account for user perception. Sandhaus and Boll [24] presented an extensive survey on personal and social photo analysis and retrieval. They expressed the importance of involving human perception and the usage of photos in such systems. Users should be given an opportunity to evaluate the results of the system, so that the subjectivity of human perception is taken into account. This approach is called relevance feedback, and it has become a common research topic in the CBMIR area [12, 30]. Relevance feedback is an iterative process, which improves the accuracy of the system by modifying its parameters based on the user's feedback on the results. Relevance feedback requires explicit feedback from the user, where users are expected to label the results as relevant or irrelevant. This process demands a considerable amount of effort, and thus users do not prefer it [2, 23].
Consequently, implicit feedback techniques have recently become more popular [13, 25]. Implicit feedback techniques collect
the information from the user's actions by tracing behaviors during search and browsing tasks. Most relevance feedback methods introduced in the literature treat each iteration as a separate query [30]. For example, Djordjevic and Izquierdo [4] used variance to describe the discrimination power of a feature. The main drawback of such approaches is the need for large amounts of labeled data and several iterations to improve semantic accuracy. More recent approaches employ machine learning [12, 22]. However, the computational complexity increases significantly when a learning process is involved. Additionally, the user's preferences might vary in each session. For multi-object images, a user might use the same image for different search tasks. Therefore, if the system learns that a certain picture is associated with only one task, future searches with this picture will yield results containing the same objects as the previous query. Human-computer interaction most often includes three main elements: user, technological artifact, and task; users perform their tasks with technological artifacts [7]. Interaction between the user and the application includes cognitive processes. Users formulate and develop their action plans according to the task or target. Behavioral targeting is defined as collecting information on users' behaviors in order to adapt the system accordingly. The behavioral targeting method and the understanding of user behavior are the baselines of this study, as a step towards obtaining an accurate image browsing and retrieval system for users. We aim to adapt the system according to users' tasks and behavior, and eventually to improve the efficiency of the application by exploiting statistical data for implicit feedback. In this paper, we use probabilistic user behavior modeling for adapting the system according to the users' actions.
User behavior in an image browsing and retrieval task can be viewed through a probabilistic model P(S_next | H(U)), where S_next represents the next step in the user's actions, H(U) represents the history of the user U, and P is a probability function [19]. Manavoglu et al. [19] studied a mixture-model-based approach for learning individualized behavior models for web users. Torres and Parkes [27] discussed the need for user modeling and adaptability in effective retrieval systems. They form a model based on Bayesian networks and Bayesian user modeling, which can be applied to CBIR applications. Benbunan-Fich et al. [1] presented an exploratory study designed to understand user behavior with new mobile applications. Moghaddam et al. [20] studied a visualization technique that generates a 2D display of the retrieved images. They introduced a user-modeling technique that collects the user's physical interaction with the system to improve browsing efficiency. Kim et al. [14] addressed user interfaces with a user adaptation approach in image retrieval systems. They analyzed human-computer interaction for designing and implementing intelligent user interfaces. User profiling has been utilized in various research domains in the literature for personalization purposes. Kuniavsky [16] addresses the questions of finding out who your customers are, what they want, and what they need. Indeed, this is the starting point of designing and adapting a system according to the user's requirements. Kuniavsky also explains the user profiling approach and expresses the importance of questionnaires for the profiling process in general. Weiss et al. [28] studied user-profile-based personalization in order to select and recommend content with respect to users' interests for automated online video or TV services. In a previous study [9], we introduced an adaptive weighting method for satisfying different user needs based on the user's history.
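As an illustration of this probabilistic view, the per-form selection probabilities can be estimated from a user's click history by simple relative frequencies. The sketch below is a minimal maximum-likelihood estimate under that assumption; the function and variable names are illustrative and not from the paper.

```python
from collections import Counter

def estimate_next_action_probs(history):
    """Estimate P(S_next | H(U)) over forms from a user's past form
    selections. `history` is a list of form names the user clicked in
    past iterations; each form gets its relative selection frequency."""
    counts = Counter(history)
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}

# A user who clicked "content" three times out of four past actions.
probs = estimate_next_action_probs(["content", "content", "location", "content"])
```

A real system would also weight recent actions more heavily or fall back to a prior for unseen forms; plain counts keep the sketch minimal.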
The previous image browsing system supports adaptivity between different forms. However, it does not personalize query results within each form. In this paper, we aim to personalize and adapt the system according to the user’s needs and tasks with implicit feedback. It is essential to understand how the users could utilize multi-form information for searching and browsing. We present a user study to
understand user behavior in image browsing and searching, complemented by an adaptive weighting method and a novel implicit feedback approach. By understanding users' targets and tasks, we aim to improve browsing and retrieval efficiency and to obtain an accurate image categorization structure for successful retrieval results. The contributions of this paper are:
- Use of multi-form structures in image browsing and retrieval,
- An adaptive weighting method based on statistical user behavior modeling for multi-form structures, and
- An implicit feedback technique for improving the efficiency and accuracy of query results.
The rest of the paper is organized as follows: Section 2 briefly describes our user study on user behavior in image browsing and retrieval tasks, and Section 3 discusses the multi-form structures. Section 4 presents the proposed methods for adaptive and personalized image browsing. Section 5 shows the experimental results. Finally, Section 6 concludes the paper along with some future remarks.
2 Understanding user behaviour on image browsing and searching

We conducted a brief interview study on people's existing practices in image browsing and searching. With this we aimed to understand potentially new needs, requirements, and preferred practices for such activities, and to adapt the multi-form image representation according to the users' preferences. Interviewing is an established and widely used method in the behavioral sciences and in human-computer interaction research. In particular, studies of systems involving user interactivity have used interview methods for usability evaluation [3]. Regarding related work with similar approaches, Jaimes [11] studied human factors, such as human memory, context, and subjectivity, in the use of Flickr, and found these to influence automatic content-based retrieval systems. Eakins et al. [5] used an online questionnaire method in order to improve the user interface of CBIR systems.

2.1 User study methodology

We approached the topic from the perspective of existing practices in using applications intended for image browsing and retrieval. We focused on applications that are used by diverse types of end-users, not, for example, applications or databases intended for professional use. However, the target user group for the interviews was selected to include both people with high experience in image browsing, searching, and categorization, and those with less experience in such activities. We conducted 11 thematic semi-structured single interviews with users of related systems. Examples of applications used in this study were Google image search, Flickr, Facebook, Deviantart, Picasa, Orkut, and various online photo galleries. In other words, the selected applications support image-centered activities and other additional tasks, such as social networking (e.g. Facebook).

2.2 Participants

We had 11 participants, 9 of them male and 2 female.
The participants included students, researchers, and engineers from computer science, software systems, electronics,
telecommunication, and information technology. The age distribution ranged from 25 to 45, with a median of 28. 10/11 participants reported searching for images for personal purposes (e.g. hobbies, spending time) and 6/11 for professional purposes (e.g. finding an illustration for a presentation). 8/11 participants said they search for images daily or several times a week, and 3/11 less frequently. 6/11 reported having photography as a hobby, and 5/11 taking photos every now and then. On a monthly basis, they reported taking roughly 10–200 photos. These facts indicate that all of the participants were at least somewhat photography-oriented. This was also evident in sharing practices: 8/11 shared their photos publicly online and 6/11 with specified recipients (e.g. via Facebook or Flickr).

2.3 Analysis of interview data

We transcribed the audio recordings from the interviews. These were analyzed with a qualitative approach (coding, categorization) to interpret and identify the main aspects of the interview data. With such a constructive research approach and a small sample, the results mainly served to help develop the adaptive multi-form browsing scheme—not to model or quantify the users' behaviors in a statistical sense.

2.4 User study results

The study provided us with insights into the participants' current practices with images, as well as their preferences for multi-form image browsing and retrieval. In the following, we give a summary of the main findings. Overall, searching strategies included the use of user-generated keywords and tags, content-based search, and manual browsing of large image sets. Some also mentioned searching for images by user name, date/time, ratings, location, and technical details such as resolution. Some said they browse images in order of time (date of upload or similar), but on the other hand, some said that in most cases one does not know how new or old an image is.
The keywords used in searches included, for example, descriptions of the content, names of places or people, and events. Most users emphasized that the way of searching for images often changes from case to case, and that it is very hard to identify clear patterns in one's behavior. The practices depend on various aspects, such as whether the image was shared by a known or an unknown person, for what purposes the image is needed (spending time & leisure, professional & utilitarian use, finding technically or artistically high-quality photos), or whether one is searching for a specific image seen before versus simply any image that well represents some group or topic. A few participants mentioned that often several rounds of keyword-based search are needed: "If I do not find what I am searching for I try to choose more accurate keyword" (male, 25). The aforementioned aspects also affected the choice of search tool: for example, Google image search was used for generic images, and specific applications were used if one had seen the image before in that particular service. One participant clarified the diversity of practices with an example that he might initiate an image query either by a text description of an event or by the name of a location. Several participants mentioned that the initial retrieval goal mostly determines the rest of the search strategy. This was mentioned for example by a 23-year-old male: "I do not think my search strategy varies based on anything else but on what I want to find for". Overall, content- and location-based retrieval were most often considered the most useful. However, at the same time it was very hard to prioritize between these approaches, as the initial target for searching significantly affects the first steps of a search.
With regard to browsing, the participants mentioned browsing by user name, tags, type of content, service suggestions, event, or location. Also, previously seen images might have been bookmarked or marked as favorites, which serves as an easy way to retrieve them later on. "I bookmark web pages including interesting images, because I cannot do favorites from different web pages" (male, 43). Again, the practice depended not only on the application used, but also on the initial purpose for browsing. Automatic content recognition and tagging (e.g. faces of people), suggestions for additional tags (e.g. based on visual features and object recognition), and tools for finding relevant similar images were identified by the participants as the most important features missing or badly implemented in existing retrieval systems. "It would be nice to see all the images where I am, thus face recognition or automatic tagging would be nice" (male, 25). "By place searching is quite easy, but by content searching could be easier" (male, 31). When asked about needs for content-based search (based on visual features of the image), most participants became interested or even excited—although it was hard for some to envision how it could work in practice. "With this, image categories would be better organized" (female, 28). "It would be nice for example to recognize a certain font from an image" (male, 43). "Automatic tagging based on this would help absolutely" (male, 27). Inquiring about tag-based (folksonomy) search divided opinions. Some considered it useful and had used it successfully before, as discussed above. "It would be helpful when searching something particular—I searched for an image of a fruit and I needed to tag it manually" (male, 25). On the other hand, some considered tags too subjective and dependent on the point of view of the creator (or of the one who tags the image) on the content.
Thus, an image could be tagged according to its type of content, event of creation, people presented, visual features, etc. Additional filtering would be needed to utilize the tags most efficiently. When asked about location-based search, most participants pointed out its usefulness. "If I am about to travel or if I want to show something for my friends. Or if I want to find photos taken from the same location at different times of year." (male, 43). Naturally, scale was seen to add complexity here: i.e., would the search be at country level, city level, street level, etc. Different versions of place names in different languages were also brought up as a challenge. Overall, when asked to prioritize between different searching strategies, 5/11 mentioned that content would be their primary strategy if it were currently functioning properly. Location was preferred by 2/11 participants, and several mentioned it as the second most useful. Surprisingly, time/date came only after content and location, and was mentioned by many as their second or third option. 3/11 refused to make any general-level prioritization, as the way of searching was said to be strongly dependent on the situation and purpose of the search. Nevertheless, the perceived usefulness of content- and location-based search was a slight surprise for us, but on the other hand it provides further evidence that such strategies should be better utilized and enabled in image retrieval systems.
3 Multi-form structures

Large image collections need to be managed and visualized efficiently. Various methods, algorithms, and systems have been proposed to address image categorization and indexing problems, leading to content-based image indexing, browsing, and retrieval (CBIR) applications. CBIR systems often analyze image content via so-called
low-level features such as color, texture, and shape. Such low-level descriptors often cannot fully capture the semantic concepts from the user's perspective. As a potential solution to the semantic gap between low-level visual features and human perception, image classification methods have been proposed to categorize images into semantically meaningful classes [13, 17]. However, most classification schemes are based entirely on visual information. Recent systems associate text information with images via unsupervised annotation [4]. In such systems, the query process is based on natural language over an automatically annotated image database. A major weakness of this approach is its limited defined vocabulary. Image categorization results highly impact the accuracy of annotation, which proportionally affects the retrieval accuracy. Tags and annotations help the system to refer to the high-level concepts in an image [8]. However, on their own they do not comprehensively represent and express the information lying in the image. Similarly, geo-tags can be used to categorize images by their location, which can be useful when the task is based on location information. Based on the heuristic results from the user study, we propose that image browsing and searching applications should categorize images based on each kind of available information separately. It should be noted that one image may belong to more than one category. In this study, we assume that current image systems (online sharing systems, personal photo collections, etc.) have four forms of information available: location, content, date, and tags and annotations. Independent structures of the forms can be selected depending on the data set. In this study we indexed the images independently for each form. For the location form, the image database is indexed hierarchically according to geo-tags, and search results are calculated based on the coordinates.
For the date form, images can be ordered sequentially. For the content form, a hierarchical image indexing method is utilized for content-based indexing of the image database. Finally, for the tag form, the image database is grouped according to common tags. These four forms allow users to query and browse images in four different ways in order to achieve their target. If the user initiates the query with an image (query by example), as represented in Fig. 1, the user might be interested in finding:

a) Semantically similar images that contain the same content
b) Images taken on the same day
c) Images taken at the same or nearby locations
d) Images that are tagged similarly by other users
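To sketch how one image can participate in all four forms at once, consider a minimal record type holding the per-form metadata. The field names and structure are our illustrative assumptions, not the paper's implementation:

```python
from dataclasses import dataclass, field

# The four forms assumed in this study, in a fixed order.
FORMS = ("location", "content", "date", "tags")

@dataclass
class ImageRecord:
    image_id: str
    geo: tuple        # (lat, lon) geo-tag, indexed by the "location" form
    features: list    # low-level feature vector, indexed by the "content" form
    date: str         # capture date, ordered by the "date" form
    tags: set = field(default_factory=set)  # folksonomy tags, "tags" form

    def forms_present(self):
        """Forms under which this image can be indexed; real collections
        may lack some metadata (e.g. no geo-tag)."""
        values = (self.geo, self.features, self.date, self.tags)
        return [f for f, v in zip(FORMS, values) if v]

# A hypothetical fully-annotated image appears in all four form indexes.
img = ImageRecord("img001", geo=(61.4978, 23.7610), features=[0.1, 0.9],
                  date="2011-01-15", tags={"lake", "winter"})
```

One image thus belongs to several form-specific indexes simultaneously, matching the observation that an image may fall into more than one category.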
Consequently, if the user is interested in finding images taken at the same place, there is no need to retrieve images taken at different places. Therefore, an adaptive weighting approach may be utilized on multi-form structures to update the weights of the forms according to the user's past actions. The user's actions also affect the retrieval results via implicit feedback. Therefore, the browsing path, the results, and the weights of the forms will differ on every instance. The adaptive weighting method and the implicit feedback approach are described in more detail in the next section.
4 The proposed personalized and adaptive image browsing approach

In multi-form structures, browsing and retrieval efficiency can be improved by adapting and personalizing the system according to the user's target. Users do not want to put effort into providing explicit feedback, which is a laborious process. Moreover, most recent relevance feedback methods require large amounts of positively and negatively labeled images in order to improve the query results. Implicit feedback is an accepted, efficient way to do
Fig. 1 Example of multi-form representation of image browsing scheme
personalized search. Implicit feedback does not require the user's explicit relevance judgment. Instead, implicit feedback deduces the user's target through the hidden hints provided by the user. However, these hints must be analyzed carefully so as not to incorporate noise into the new query, which may even decrease the retrieval performance. In this study, we introduce a novel approach for utilizing hidden hints from the user and forming the results accordingly. It is assumed that each image selected (clicked) while browsing/querying is a clue for modeling the target of the user. Thus, each selection in the forms is considered an input to the feedback method in order to estimate the user's target. Therefore, all selected images should be considered together rather than as individual inputs. For example, if the user continues to a 4th iteration, selecting a different image on each iteration, all four images should share common information from the user's perspective and will help to find the target. Therefore, all images should be considered as a set of inputs, which helps personalize the results according to the user's needs. The proposed implicit feedback approach works as follows: On each iteration, a visual model is calculated from the selected images. The aim is to find (or create) the most suitable image and feature vector for the query by utilizing the user's previous selections. The created visual model \mu_{I_{QS}} represents the set of user-selected images with low-level features, obtained by averaging the feature vectors of the images in the set:

\mu_{I_{QS}} = \frac{1}{t} \sum_{i=1}^{t} f(I_{QS,i})    (1)

where I_{QS} is the set of images selected (clicked) by the user, t represents the number of images in I_{QS}, and f(\cdot) represents a low-level feature vector of an image.
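Eq. (1) amounts to a per-dimension mean over the clicked images' feature vectors. A minimal NumPy sketch, with toy feature vectors standing in for the real low-level descriptors:

```python
import numpy as np

def visual_model(selected_features):
    """Average the low-level feature vectors of the images the user
    clicked (the set I_QS) into a single query model, as in Eq. (1)."""
    feats = np.asarray(selected_features, dtype=float)
    return feats.mean(axis=0)

# Two clicked images with toy 3-dimensional feature vectors.
model = visual_model([[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]])
# model is the element-wise mean [2.0, 1.0, 1.0]
```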
After creating the simple model, the nearest neighbors of the model are calculated and ranked as a result. It should be noted that this step is performed first, and the distances are passed to the adaptive weighting step, which also adapts the number of images shown to the user from each particular form. Thus, implicit feedback affects the overall retrieval results and determines the user satisfaction and retrieval accuracy. Figures 2 and 3 represent example queries at the first and fourth iterations, respectively, in a sample feature space. On every query, distances from the query image are calculated in the feature space, and the closest items within a defined radius are shown to the user. Circles in the figures represent the closest items that are shown to the user as a result. Figure 3 shows four different images selected by the user, and the visual model created at the 4th iteration. The solid circle represents the modified results, whereas the original query results would have been the dotted circle at the 4th iteration. The new results are closer to the previous query images in the feature space, which expresses the semantic relation in theory. In the interviews, a user stated: "depending on the task, I either browse the images by location or by object in the image". Therefore, we propose that the weights of the forms be updated according to the user's past actions in order to improve efficiency and accuracy. Thus, user behavior in an image browsing and retrieval task can be viewed as a probabilistic model P(S_next | H(U)), where S_next represents the next step in the user's actions, H(U) represents the history of the user U, and P is a probability function. Probability and entropy are calculated from the past actions of the user. Additionally, the distances of the retrieved items give information about the user's preferences and actions, helping to understand the target. When a form has the smallest distance, the weight of that particular form increases proportionally.
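The nearest-neighbor ranking of the visual model can be sketched as follows. The Euclidean metric and the cutoff k are our assumptions for illustration; the distance measure in practice depends on the feature in question:

```python
import numpy as np

def rank_by_model(model, database_features, k=6):
    """Rank database images by Euclidean distance to the visual model
    and return the indices of the k nearest (a sketch of the
    nearest-neighbor step; metric and k are assumptions)."""
    db = np.asarray(database_features, dtype=float)
    dists = np.linalg.norm(db - np.asarray(model, dtype=float), axis=1)
    return list(np.argsort(dists)[:k])

# Toy 1-D features: with a model at 0.9, image 1 (feature 1.0) is closest.
order = rank_by_model([0.9], [[0.0], [1.0], [5.0]], k=2)
```

The resulting distances are then handed to the adaptive weighting step described next, which decides how many of these neighbors each form may show.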
To illustrate the progress of the proposed scheme, example use cases are as follows:

Use case 1: The user initiates the image search with an example image and is interested in finding semantically similar images. Thus the user selects image(s) from the form "content", which are indexed based on their semantic content. In this case, the next iteration will retrieve more semantically similar images.

Use case 2: The user initiates the query with an example image of a certain place and is looking for images taken at nearby places. Hence, the user selects image(s) from the group "location", and in the next iteration the user retrieves more images based on location information.
Fig. 2 Sample 1st iteration represented on a feature space
Fig. 3 Sample 4th iteration represented on a feature space
The weights of the forms can be updated based on the probability of the next action as follows:

w_f = \frac{1}{2} \left[ \frac{P(f)\,\ln(P(f))}{Entropy} + \frac{1/D_f}{\sum_{f=1}^{F} 1/D_f} \right]    (2)

Entropy = \sum_{f=1}^{F} P(f)\,\ln(P(f))    (3)

where w_f represents the weight of the form f, F represents the number of forms, and D_f represents the minimum distance that comes from the form f. The sum of the weights is equal to one:

\sum_{f=1}^{F} w_f = 1    (4)
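A minimal sketch of the weighting in Eqs. (2)–(4): half of each weight comes from the entropy-normalised selection probability, half from the normalised inverse minimum distance. The inverse-distance reading of Eq. (2) and the handling of infinite distances for nonrelevant forms are our reconstruction of the garbled original:

```python
import math

def form_weights(probs, distances):
    """Compute form weights per Eqs. (2)-(3). Forms judged nonrelevant
    can be given an infinite distance so their inverse-distance share
    is zero, consistent with the experimental setup."""
    # Entropy of the selection probabilities, Eq. (3) (negative for P < 1).
    entropy = sum(p * math.log(p) for p in probs.values() if p > 0)
    inv = {f: (0.0 if math.isinf(d) else 1.0 / d) for f, d in distances.items()}
    inv_sum = sum(inv.values())
    return {
        f: 0.5 * ((probs[f] * math.log(probs[f]) / entropy) if probs[f] > 0 else 0.0)
           + 0.5 * (inv[f] / inv_sum)
        for f in probs
    }

# A user who mostly clicks "content"; "date" is nonrelevant (infinite distance).
w = form_weights(
    probs={"content": 0.5, "location": 0.25, "date": 0.25},
    distances={"content": 0.2, "location": 0.4, "date": float("inf")},
)
```

Both halves individually sum to one over the forms, so the combined weights satisfy Eq. (4) by construction.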
On each iteration, the query results are modified and the weights of the forms are calculated according to the user's preferences. Thus, each session may present a different number of images for every single form, with query results varying from the previous session. The proposed adaptive weighting method and implicit feedback technique are evaluated with the experiments presented in the following section.
5 Experimental results

In the experiments, we utilized the public benchmark MIRFlickr, the real-world Corel, and the Caltech101 image databases for evaluating the accuracy of the proposed approaches. The MIRFlickr [10] database contains 25,000 images with 223,500 tags, where the average number of tags per image is 8.94. We constructed a smaller database with 24 classes, each containing 100 images. Sample classes are: sky, water, portrait, baby, animal, etc. The Caltech [6] dataset has 101 categories, each class having 40–800 images. We constructed a smaller database with 10 classes, each containing 100 images. The
sample classes are: airplane, car, motorbike, dolphin, flamingo, etc. However, the MIRFlickr and Caltech databases do not have geo-tags available. For this reason, location and date information were added manually to the images for experimental purposes. The database was divided into equal numbers of images labeled with 24 random city names and dates. Date and location labeling was performed only for UI purposes, in order to support four different forms during the experiments. Moreover, this manually added information is not considered in the numerical evaluation of the proposed method. Images in the date and location forms are assumed to be nonrelevant to the query image, although they are retrieved initially. In order to express the evaluation results objectively, we utilized only the "tag" and "content" information, which are already available in the MIRFlickr database. The improvement revealed by the proposed method can only be assessed via the "content" and "tag" forms, due to the available benchmark databases with ground-truth labels. The "date" and "location" forms are considered nonrelevant to the query images, thus D_l = D_d = \infty. In the Caltech dataset experiments, only the "content" form is considered, as the dataset does not include other information forms. For evaluating the implicit feedback results on the "content" form, a Corel database with 5000 images is used. These images were pre-assigned by a group of human observers to 50 semantic classes, each containing 100 images. Sample classes are: Africa, Beach, Buildings, Buses, Dinosaurs, Flowers, Elephants, Horses, Food, and Mountains. 50 queries were performed on the database by selecting five images randomly from each class. Average precision values were calculated based on the retrieval results from these queries.
In all experiments, the following low-level color, shape, and texture features are used: YUV, HSV, and RGB color histograms with 128 bins, the Gray Level Co-Occurrence Matrix texture feature with parameter value 12, the Canny Edge Histogram, and the Dominant Color descriptor with 3 colors. 24 images are shown to the user as the result of a query, where the number of images per form is equal in the first iteration (6 images per form). After the first iteration, depending on the user’s selection, the implicit feedback approach is applied, after which the weights and the number of images for each form are updated accordingly. In order to assess the proposed implicit feedback method’s performance clearly, the user’s selection is assumed to be “content”, and the semantic retrieval results are illustrated in the figures. Figure 4 presents the average precision values for 4 iterations with only the adaptive weighting method introduced in [9] and with the proposed implicit feedback method. On each iteration, the retrieval results are modified based on the implicit feedback and the numbers of returned images are also updated. The proposed method improves the accuracy of the retrieval gradually on every iteration on the MIRFlickr database.
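The division of the 24 result slots among the forms can be sketched as a proportional allocation over the adaptive form weights. The helper below is a hypothetical illustration under that assumption; the paper does not specify its exact rounding rule.

```python
def allocate_slots(weights, total=24):
    """Distribute `total` result slots across forms proportionally to their
    adaptive weights; leftover slots from rounding go to the forms with the
    largest fractional remainders."""
    norm = sum(weights.values())
    shares = {f: total * w / norm for f, w in weights.items()}
    slots = {f: int(s) for f, s in shares.items()}
    leftover = total - sum(slots.values())
    # Hand out the remaining slots by largest fractional remainder.
    for f in sorted(shares, key=lambda f: shares[f] - slots[f],
                    reverse=True)[:leftover]:
        slots[f] += 1
    return slots
```

With equal weights this reproduces the first-iteration layout of 6 images per form; as the weights drift toward the user's preferred form, that form receives proportionally more of the 24 slots.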
Fig. 4 Retrieval accuracy of the proposed method on MIRFlickr dataset (average precision % over iterations 1–4; curves: only adaptive weighting, adaptive weighting with implicit feedback)
Fig. 5 Average precision-recall curve of the proposed method on MIRFlickr dataset (precision vs. recall; curves: only adaptive weighting, adaptive weighting with implicit feedback)
Figure 5 illustrates the average precision and recall values of 30 image queries over four iterations with only the adaptive weighting method introduced in [9] and with the proposed implicit feedback method. It can be seen from the figures that the proposed implicit feedback approach improves the accuracy of the retrieval results. Figure 6 illustrates the average precision values of 30 image queries over six iterations with only the adaptive weighting method and with the proposed implicit feedback method. The adaptive weighting method considers only the user’s history of form selections and the minimum distance from that particular form. Thus, on every iteration a new query is performed utilizing the selected single query image. The numbers of images shown to the user for a query increase proportionally with the adaptive weights. On the other hand, when the implicit feedback approach is added to the system, every previous query image is considered in constructing the new query and the query path, and thus the system is personalized for each user. Consequently, on every iteration the updated query results and the number of images shown to the user are determined by the previously selected images. If we assume that the selected query images (or clicked images) are the same in six iterations, the results of the first iteration will be the same for both approaches, since there is no information from the user’s history yet. However, on the sixth iteration, although the query images are the same, the results of the two approaches will not be the same.
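The query-path idea can be illustrated with a minimal sketch that combines the feature vectors of all previously selected query images into a single query, weighting recent selections more heavily. The exponential decay and the function name are illustrative assumptions; the paper's actual combination rule may differ.

```python
def combined_query(path_features, decay=0.8):
    """Combine the feature vectors of all previously selected query images
    (the query path) into one query vector, with an exponential decay so
    that recent selections contribute more than older ones."""
    n_dims = len(path_features[0])
    query = [0.0] * n_dims
    total = 0.0
    for age, feat in enumerate(reversed(path_features)):  # newest first
        w = decay ** age
        total += w
        for d in range(n_dims):
            query[d] += w * feat[d]
    return [v / total for v in query]
```

Under this sketch, two users who click the same image on the sixth iteration still obtain different results, because the preceding five selections in each user's path shift the combined query vector differently.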
Fig. 6 Retrieval accuracy of the proposed method on Caltech dataset (average precision % over iterations 1–6; curves: only adaptive weighting, adaptive weighting with implicit feedback)
Fig. 7 Retrieval accuracy of sample classes on Corel database (average precision % for bus, elephant, rose, africans, and beach; bars: 1st iteration, proposed method, variance based method, NN based method)
The implicit feedback method modifies the sixth query by considering the previous five images selected by the user. It can be seen from the figures that the proposed implicit feedback approach improves the accuracy of the retrieval results. The proposed implicit feedback method is compared with the variance-based feature weighting method presented by Djordjevic and Izquierdo [4] and the neighborhood-based method presented by Piras and Giacinto [21]; the results are illustrated in Figs. 7 and 8 to evaluate and compare its improvements in image retrieval accuracy. Figure 7 presents the average precision of the experimental results on sample classes. Retrieval accuracy is highly dependent on the content of the image, so the sample classes are shown individually to support this claim. In Fig. 7, the first iteration shows the accuracy of a regular image retrieval system with low-level features. Accuracy improves noticeably after four iterations, where the proposed method outperforms the other two methods. Figure 8 shows the overall retrieval accuracy over 4 iterations on the Corel database. In the variance-based method, the weights are calculated based on the variances of the distances of the positively and negatively labeled images in the feature space. Max-precision is the highest precision obtained during the experiments for a particular single query with the same parameters.

Fig. 8 Retrieval accuracy on the overall Corel database (average precision % at iterations 1–4; curves: proposed method, variance based method, NN based method, max precision)

It can be observed from the figures that the
proposed method outperforms the other two methods. This is due to the fact that the distance variances of a small number of labeled samples (positive or negative) do not generalize to the whole image database efficiently. For most categories there is a high variance in the low-level features over different images. In the neighborhood-based method, the weights are calculated based on the nearest relevant item and the nearest non-relevant item among the labeled images. The nearest relevant image has the lowest distance to the query item in the feature space. Naturally, the closest images are presented to the user in the first iteration; therefore, the nearest relevant item and the nearest non-relevant item do not change after the first relevance feedback iteration. The proposed method, instead of weighting the feature spaces, generates a model of the user’s perception and retrieves items accordingly in the next iterations. In the proposed method, each image selected on an iteration affects the visual model considerably, whereas in the other methods a few labeled samples do not affect the retrieval results significantly.
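To make the contrast concrete, a simplified sketch of variance-based feature weighting in the spirit of [4] is given below: each feature space is weighted inversely to the variance of the distances between the query and the positively labeled images, so a feature that behaves consistently across relevant images dominates. This is an illustrative reduction under that assumption, not the exact formulation of [4], and it shows why a handful of labeled samples can yield unstable weights.

```python
def variance_based_weights(distances_per_feature):
    """Weight each feature space inversely to the variance of the distances
    between the query and the positively labeled images: low variance means
    the feature is consistent for relevant images, so it gets a high weight.
    With only a few labeled samples, these variance estimates are noisy."""
    weights = {}
    for feat, dists in distances_per_feature.items():
        mean = sum(dists) / len(dists)
        var = sum((d - mean) ** 2 for d in dists) / len(dists)
        weights[feat] = 1.0 / (var + 1e-9)  # epsilon avoids division by zero
    norm = sum(weights.values())
    return {f: w / norm for f, w in weights.items()}
```

Because the variance is estimated from only the handful of images the user labels per iteration, the resulting weights need not generalize to the rest of the database, which matches the behavior observed in Figs. 7 and 8.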
6 Conclusions

In this paper, we proposed an adaptive and personalized image browsing and retrieval scheme with multi-form structures. The user interview study supports our claim that a user’s image browsing and retrieval strategies depend on the initial task, which can be estimated with the help of a statistical user behavior modeling approach. We introduced novel approaches for adaptive image browsing and an implicit feedback method for personalized image search based on the user’s browsing history. Image browsing and retrieval systems may incorporate a user’s perception in order to fill the semantic gap, by involving every clue given by the user and adapting the application according to the user’s needs. Modeling the user’s behavior provides information for estimating the next action of the user, and thus helps update the weights and the retrieved results of the system accordingly. Therefore, we mainly aim at understanding the target of the user and personalizing the multi-form image representation according to the user’s past actions; we do not model the user’s general behavior in the image browsing scheme. With reference to the results of the user interview studies, we confirmed our claim that users’ action strategies in image browsing and retrieval depend on the initial task. The target of the user can be estimated from the past actions on the multi-form image structures. The proposed multi-form image browsing and retrieval approach is a flexible system, since it is independent of the underlying indexing and retrieval methods. Moreover, it allows various extensions, such as different classification schemes. These characteristics make the multi-form scheme feasible for integration into various domains and image databases. Higher practical benefits and better semantic results can be achieved by improving the underlying indexing structures of the forms. In the future, experimental studies will be carried out using different benchmark image databases with 4 forms.
In addition, this work may be extended to multimodal features for multimedia databases.
References

1. Benbunan-Fich R, Benbunan A (2007) Understanding user behavior with new mobile applications. J Strateg Inf Syst 16(4):393–412
2. Bockting S, Ooms M, Hiemstra D, Vet PVD, Huibers T (2008) Evaluating relevance feedback: an image retrieval interface for children. In: Proceedings of the Dutch-Belgian Information Retrieval Workshop, 14–15 Apr, pp 15–20
3. Covey DT (2002) Usage and usability assessment: library practices and concerns. Digital Library Federation, Council on Library and Information Resources reports, January
4. Djordjevic D, Izquierdo E (2007) An object- and user-driven system for semantic-based image annotation and retrieval. IEEE Trans Circuits Syst Video Technol 17(3):313–323
5. Eakins JP, Briggs P, Burford B (2004) Image retrieval interfaces: a user perspective. In: Proceedings of the Third International Conference on Image and Video Retrieval, CIVR 2004, Lecture Notes in Computer Science 3115, Dublin, Ireland, July 21–23, pp 628–637
6. Fei-Fei L, Fergus R, Perona P (2004) Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. IEEE CVPR 2004, Workshop on Generative-Model Based Vision
7. Gray WD, Altmann EM (2001) Cognitive modeling and human-computer interaction. In: Karwowski W (ed) International encyclopedia of ergonomics and human factors, vol 1, pp 387–391
8. Guldogan E, Gabbouj M (2010) Adaptive image classification based on folksonomy. In: Proceedings of the International Workshop on Image Analysis for Multimedia Interactive Services, WIAMIS 2010, Italy, 12–14 April, pp 1–4
9. Guldogan E, Lagerstam E, Olsson T, Gabbouj M (2010) Multi-form hierarchical representation of image categories for browsing and retrieval. In: Proceedings of SMAP 2010, 5th International Workshop on Semantic Media Adaptation and Personalization, Cyprus, December, pp 64–69
10. Huiskes MJ, Lew MS (2008) The MIR Flickr retrieval evaluation. In: ACM International Conference on Multimedia Information Retrieval (MIR ‘08), Vancouver, Canada, pp 39–43
11. Jaimes A (2006) Human factors in automatic image retrieval system design and evaluation. Invited paper, IS&T/SPIE Internet Imaging 2006, San Jose, CA, SPIE 6061, 606103, January
12. Jing F, Li M, Zhang H-J, Zhang B (2004) Relevance feedback in region-based image retrieval. IEEE Trans Circuits Syst Video Technol 14:672
13. Kelly D, Belkin NJ (2001) Reading time, scrolling and interaction: exploring implicit sources of user preference for relevance feedback. In: Proceedings of the 24th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR ‘01), USA, pp 408–409
14. Kim YH, Rhee PK. Automatic adaptation method in intelligent image retrieval system. In: Proceedings of the IEEE Region 10 Conference TENCON 99, South Korea, vol 1, pp 439–442
15. Kosch H, Döller M (2005) Multimedia database systems: where are we now? In: Proceedings of the Int. Assoc. of Science and Technology for Development—Databases and Applications (IASTED-DBA), Innsbruck, Austria
16. Kuniavsky M (2003) Observing the user experience: a practitioner’s guide to user research. Morgan Kaufmann, 560 p, pp 129–155
17. Laaksonen J, Koskela M, Laakso S, Oja E (2000) PicSOM—content-based image retrieval with self-organizing maps. Pattern Recogn Lett 21:1199–1207
18. Liu Y, Zhang D, Lu G, Ma W-Y (2007) A survey of content-based image retrieval with high-level semantics. Pattern Recognit 40(1):262–282
19. Manavoglu E, Pavlov D, Lee Giles C (2003) Probabilistic user behavior models. In: IEEE International Conference on Data Mining, pp 203–210
20. Moghaddam B, Tian Q, Lesh N, Shen C, Huang TS (2004) Visualization and user-modeling for browsing personal photo libraries. Int J Comput Vision 56(1–2):109–130
21. Piras L, Giacinto G (2009) Neighborhood-based feature weighting for relevance feedback in content-based retrieval. In: Workshop on Image Analysis for Multimedia Interactive Services, London, UK, May 6–8, pp 238–241
22. Rao Y, Mundur P, Yesha Y (2006) Fuzzy SVM ensembles for relevance feedback in image retrieval. Learning, pp 350–359
23. Robertson S (2001) Evaluation in information retrieval. Lecture Notes in Computer Science 1980, USA, pp 81–92
24. Sandhaus P, Boll S (2011) Semantic analysis and retrieval in personal and social photo collections. Multimed Tools Appl 51:5–33
25. Shen X, Tan B, Zhai C. Context-sensitive information retrieval using implicit feedback. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ‘05), USA, pp 43–50
26. Smeulders AWM, Worring M, Santini S, Gupta A, Jain R (2000) Content-based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380
27. Torres JM, Parkes A (2000) User modeling and adaptivity in visual information retrieval systems. In: Proceedings of the Workshop on Computational Semiotics for New Media
28. Weiss D, Scheuerer J, Wenleder M, Erk A, Gülbahar M, Linnhoff-Popien C (2008) A user profile-based personalization system for digital multimedia content. In: Proceedings of the 3rd International Conference on Digital Interactive Media in Entertainment and Arts (DIMEA ‘08), Athens, Greece, September, vol 349, pp 281–288
29. Zhou XS, Huang TS (2000) CBIR: from low-level features to high-level semantics. In: Proceedings of SPIE Image and Video Communication and Processing, pp 24–28
30. Zhou X, Huang TS (2003) Relevance feedback for image retrieval: a comprehensive review. ACM Multimed Syst 8:536–554
Dr. Esin Guldogan received her M.Sc. and Ph.D. degrees from Tampere University of Technology, Finland, in 2003 and 2008, respectively. Dr. Guldogan is working as a senior researcher and project manager at the Department of Signal Processing at Tampere University of Technology. Currently, she is a visiting researcher at Nokia Research Center. She has been managing a small team collaborating in the DIEM-MMR, PeCoCo, Intelligent Media, and Large Scale 3D Content Processing projects. Her team focuses on researching and developing content-based multimedia search and classification applications on mobile devices within social networks. Her research interests include:
• Multimedia content-based analysis
• Indexing and retrieval
• Dimension reduction of feature space and multimodal feature fusion
• Multi-form semantic multimedia classification
• Image search on social networks
Thomas Olsson is a researcher and PhD candidate at Tampere University of Technology, Unit of Human-Centered Technology. His research interests include human-technology interaction in mobile and ubiquitous systems, with a special focus on user experience and users’ expectations. He is currently finalizing his dissertation on the user experience of mobile augmented reality, expecting to graduate at the end of 2012.
Else Lagerstam is a researcher (M.Sc.) at the Unit of Human-Centered Technology (IHTE), Department of Software Systems, Tampere University of Technology. She graduated as a Master of Science in February 2011. Her research interests vary from content interaction and ubiquitous computing to sociological themes:
• Mixed & augmented reality
• Collective content experience
• Human and social behavior with the help of technology
Moncef Gabbouj received his BS degree in electrical engineering in 1985 from Oklahoma State University, Stillwater, and his MS and PhD degrees in electrical engineering from Purdue University, West Lafayette, Indiana, in 1986 and 1989, respectively. Dr. Gabbouj has been an Academy Professor with the Academy of Finland since January 2011. He is currently on sabbatical leave at the School of Electrical Engineering, Purdue University, West Lafayette, Indiana, USA (August–December 2011) and the Viterbi School of Engineering at the University of Southern California (January–June 2012). He was Professor at the Department of Signal Processing, Tampere University of Technology, Tampere, Finland, and Head of the Department during 2002–2007. Dr. Gabbouj was a visiting professor at the American University of Sharjah, UAE, in 2007–2008 and Senior Research Fellow of the Academy of Finland in 1997–1998 and 2007–2008. His research interests include multimedia content-based analysis, indexing and retrieval, nonlinear signal and image processing and analysis, voice conversion, and video processing and coding. Dr. Gabbouj was Honorary Guest Professor of Jilin University, China (2005–2010). Dr. Gabbouj is a Fellow of the IEEE. He served as Distinguished Lecturer for the IEEE Circuits and Systems Society in 2004–2005, and is Past-Chairman of the IEEE-EURASIP NSIP (Nonlinear Signal and Image Processing) Board. He was chairman of the Algorithm Group of the EC COST 211quat. He served as associate editor of the IEEE Transactions on Image Processing, and was guest editor of Multimedia Tools and Applications and the European journal Applied Signal Processing. He is the past chairman of the IEEE Finland Section, the IEEE Circuits and Systems Society Technical Committee on Digital Signal Processing, and the IEEE SP/CAS Finland Chapter. He is a member of the IEEE SP and CAS societies. Dr. Gabbouj was the recipient of the 2012 Nokia Foundation Visiting Professor Award, the 2005 Nokia
Foundation Recognition Award, and co-recipient of the Myril B. Reed Best Paper Award from the 32nd Midwest Symposium on Circuits and Systems and the NORSIG Best Paper Award from the 1994 Nordic Signal Processing Symposium. He was also the supervisor of the main author receiving the Best Student Paper Award from the IEEE International Symposium on Multimedia, ISM 2011. He is co-author of 480 publications.