Adding a Semantic Layer to Flickr Images Search Service

D. Barbuto, G. Contaldi, S. Senatore
Dipartimento di Informatica - Università degli Studi di Salerno, via Ponte don Melillo – 84084 Fisciano (SA)
email: [email protected], g.contaldi@studenti.unisa.it, ssenatore@unisa.it

Abstract: The growing number of images on the Web and the spread of social media sharing web sites demand effective tools for searching targeted images. In general, the performance of Web image search depends on the quality of image annotation, but the keywords (or tags) associated with an image are often given without relevance information, reflect the subjective feeling of the taggers, and are far from an objective description of the image. In this paper, we propose a simple approach for social media sharing web sites such as Flickr, Zooomr, etc. that helps users retrieve images semantically correlated to a given tagged image. We present an application scenario for Flickr: in general, Flickr returns all the images that match the input tags, without any semantic analysis or evaluation of the effectiveness of the search results. Our approach adds a semantic layer on top of the Flickr output: it processes the tags associated with the returned images to discover their appropriate semantics, and arranges the results in a more user-friendly view.

processing is applied. After the analysis of the tags, our application returns relevant images, organized in semantic categories (through filtering/merging of redundant or blended information discovered in these categories). Each semantic category has an associated tag cloud and, when possible, is hierarchically described by means of its own subcategories. The remainder of the paper is organized as follows: Section II provides an overview of Flickr and of its main uses and functionalities exploited by web communities; Section III describes the add-on provided by our application, through a formal description and modeling of the approach. Then, Section IV shows the application at work, in order to highlight the new functionalities. Experimental results and conclusions close the paper.

Keywords: tagging, similarity measure, semantic clusters, Flickr images, tf-idf.

Flickr can be considered the most popular photo sharing website; it allows users to upload their own photos into customizable albums that can then be tagged, organized, and publicly posted. In August 2011, Flickr reported that it was hosting more than 6 billion images organized by more than 9 million registered Web users. Annotation is done manually and is meant to provide information about the content of the photos, or contextual and semantic information; in practice, most of the tags associated with an image by Flickr users are imprecise and uninformative. In addition, the average number of tags for each Flickr image is quite small [4], because users generally cannot think of many words [7] in a short moment and do not like to spend time thinking about additional or more precise tags. Moreover, tags are given without an order of importance and are saved according to the input sequence. As a result, the tags are not always representative of an image or, worse, are quite unrelated to it. Indeed, some studies reveal that only around 50% of the tags are actually related to the image [2], and fewer than 10% of the images have their most relevant tag at the top position of the attached tag list [3]. The nature of Flickr tagging has led some communities to study and analyze users' behavior, showing that the main motivation that moves users to tag photos is to make them more accessible to the public [4]. The study presented in [5] reveals how users tag their photos and what types of tags are usually used: in general, users prefer to tag photos with more than one word, widening the spectrum of meaning associated with each photo, even though this may increase the amount of noise [6].

I. INTRODUCTION

Image discovery on the World Wide Web needs sophisticated content-based image retrieval techniques, which are still far from bridging the semantic gap between human concepts, expressed by keyword-based queries, and visual features extracted from the images [1]. Yet the wide diffusion of Web 2.0 applications and of semantics associated with resources proves that users are willing to provide contextual information through annotation. The success of social media web sites such as Flickr is due to the ease of sharing user-generated tags that describe image contents on the Web, according to personal feeling and motivation. Recent user-behavior analysis shows that the annotation of Flickr images is driven by personal ideas and intentions, and by the desire to improve the accessibility and retrieval of the photos for the general public [4]. Thus, users tag their photos to share them with family, friends, and the online community at large, and they use their own terms to describe the images, which can differ from those of other users who tag similar images [8]. Because tagging is a self-centered task, image search and retrieval are tricky activities that need preliminary processing to capture the objective aspects of the human-generated tags associated with an image. To address this issue, our approach aims at returning Flickr photos semantically related to a given one. We start from an existing Flickr photo given as input. The tags associated with the given photo are taken into account to discover similar photos; no further technique of image

II. TAGGING IN FLICKR

The freedom of tagging has sparked the growth of community-contributed multimedia content available online but, on the other side of the coin, has limited access to social media, making photo search and retrieval ineffective. It is enough to query the Flickr tag-based image search service¹ to discover that it provides no option for ranking the tagged images: the result of a query is a sequence of images, each of which contains the query word in its tag list, even when the returned images do not match the meaning of the query, i.e., the actual user request.

clusters related to a given tag. We call collection the set of these clusters, and head the tag associated with the clusters of the collection. Formally:

Definition 1: Given a tag t, a collection Cl(t) = {C1, C2, …, Cn} is the set of the clusters obtained from t. The tag t is called the head of Cl(t): head(Ci) = t, with i = 1, …, n.

Our goal is to find clusters that can be merged because they describe the same semantic concept. We evaluate similarity/relatedness among clusters belonging to different collections.

Definition 2: Let X and Y be two clusters such that head(X) ≠ head(Y). X is related to Y, written rel(X, Y), if there is a tag t ∈ X that appears as the head of Y, i.e., head(Y) = t. Note that, in general, rel(X, Y) ≠ rel(Y, X): the relation is not symmetric.

Definition 3: Let X and Y be two clusters such that head(X) ≠ head(Y). X and Y are mutually related, denoted as sim(X, Y), when it holds:

sim(X, Y) ⇔ rel(X, Y) ∧ rel(Y, X). (1)

Definition 3 strictly depends on the reciprocal inclusion of head-tags: if the head of each of two clusters is included in the other cluster, then the two clusters are considered similar and are candidates to be merged.
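Definitions 1 to 3 can be illustrated with a minimal Python sketch. The `Cluster` class and the function names are assumptions introduced here for illustration, not part of the original application; the toy data reuses only the tags explicitly listed for the Android and Google collections in the example of this section, with heads written in lowercase so that they can be compared with tags.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Cluster:
    """A cluster of tags together with the head of its collection (hypothetical type)."""
    head: str
    tags: FrozenSet[str]


def rel(x: Cluster, y: Cluster) -> bool:
    """Definition 2: X is related to Y if head(Y) is one of the tags of X."""
    return x.head != y.head and y.head in x.tags


def mutually_related(x: Cluster, y: Cluster) -> bool:
    """Definition 3: rel must hold in both directions."""
    return rel(x, y) and rel(y, x)


def merge_candidates(cl_a: List[Cluster], cl_b: List[Cluster]) -> List[Tuple[Cluster, Cluster]]:
    """All pairs of clusters from two collections that are mutually related."""
    return [(x, y) for x in cl_a for y in cl_b if mutually_related(x, y)]


# Toy collections (only the tags listed explicitly in the text).
android = [
    Cluster("android", frozenset({"paranoid", "radiohead", "marvin"})),
    Cluster("android", frozenset({"google", "htc", "tmobile"})),
    Cluster("android", frozenset({"starwars", "c3po"})),
]
google = [
    Cluster("google", frozenset({"g1", "android", "phone"})),
    Cluster("google", frozenset({"yahoo", "flickr", "art"})),
]

pairs = merge_candidates(android, google)
```

Under these assumptions, the only pair returned is the cluster containing "google" in the Android collection together with the cluster containing "android" in the Google collection, since each one includes the head of the other.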

Figure 1. Flickr clusters returned for the tag bottle

Flickr is not sensitive to the order of the tags associated with an image, but it maintains Flickr clusters: given a popular tag, related tags are returned grouped into clusters. For example, exploring the Flickr clusters for the word apple², four different categories are returned: laptop, fruit, smartphone and NYC. Yet, most of the times that we ask for the clusters associated with a word, the results are not very meaningful. Figure 1 shows the Flickr clusters returned for the tag bottle: more than one cluster has a confused and uncertain meaning. The clusters seem to partially overlap and include uncertain and blended information. This is because a word can appear as a tag of a photo even though it is not related to the photo content.

III. FORMAL BACKGROUND

Our approach aims at discovering photos similar to a given one by exploiting the semantics behind the tags associated with those photos. To do this, we consider all the tags associated with the input photo and, for each tag, the related Flickr clusters. Some basic formalism is given in order to better describe the approach. In particular, a cluster-based similarity measure is defined and then expressed in terms of a tag-based similarity measure. As said and shown in Figure 1, Flickr maintains all the

¹ http://www.flickr.com/search/
² http://www.flickr.com/photos/tags/apple/clusters/

Figure 2. A Flickr photo and the relative XML-based tags.
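As an illustration of how the tags of a photo and the clusters of a tag might be requested, the following Python sketch only builds the request URLs and performs no network access. It assumes the public Flickr REST endpoint and the documented methods `flickr.tags.getListPhoto` and `flickr.tags.getClusters`, which may have changed since the paper was written; the helper names, the API key and the photo id are placeholders introduced here, and the paper does not state which calls the application actually uses.

```python
from urllib.parse import urlencode

# Assumed public REST endpoint of the Flickr API.
REST_ENDPOINT = "https://api.flickr.com/services/rest/"


def flickr_url(method: str, api_key: str, **params: str) -> str:
    """Build a REST request URL for a Flickr API method (hypothetical helper)."""
    query = {
        "method": method,
        "api_key": api_key,
        "format": "json",       # ask for JSON instead of the default XML
        "nojsoncallback": 1,    # plain JSON, without a JSONP wrapper
        **params,
    }
    return REST_ENDPOINT + "?" + urlencode(query)


def photo_tags_url(api_key: str, photo_id: str) -> str:
    """URL that requests the tag list of a photo."""
    return flickr_url("flickr.tags.getListPhoto", api_key, photo_id=photo_id)


def tag_clusters_url(api_key: str, tag: str) -> str:
    """URL that requests the clusters of related tags for a given tag."""
    return flickr_url("flickr.tags.getClusters", api_key, tag=tag)


# Placeholder values: a real key and photo id are needed for actual requests.
clusters_url = tag_clusters_url("YOUR_API_KEY", "android")
tags_url = photo_tags_url("YOUR_API_KEY", "0000000000")
```

One cluster request per tag of the input photo would then yield the collections on which the similarity analysis operates.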

Example 1: let us suppose that, after an API call, Flickr returns the photo shown in Figure 2 and the associated tags: android, froyo and google. Then, for each tag, a call for clusters is launched. The result is:
− Cl(“Android”) = {(“paranoid”, “radiohead”, “marvin”), (“google”, “htc”, “tmobile”, …), (“robot”, “scifi”, “cyborg”, …), (“starwars”, “c3po”)}
− Cl(“Froyo”) = {(“frozenyogurt”, “yogurt”, “food”, …)}
− Cl(“Google”) = {(“g1”, “android”, “phone”, …), (“yahoo”, “flickr”, “art”), (“maps”, “street”, …)}
From Definition 2 and Definition 3, the cluster (“google”, “htc”, “tmobile”, …) of Cl(“Android”) and the cluster (“g1”, “android”, “phone”, …) of Cl(“Google”) are similar.

Now, let us suppose that the necessary implication in (1) does not hold, i.e., one of the terms on the right is not true. Considering for simplicity two collections, this means that the head-tag of one collection is included in a cluster of the other collection, but the converse is not true: the head of the latter collection is not included in any of the clusters of the first collection. To evaluate the similarity between these two collections, we start from the partial relation known from (1): we call cluster pivot the cluster that includes the head-tag of the other collection. To obtain the other part of the relation, we evaluate the similarity between the cluster pivot and all the clusters in the other collection (i.e., the collection whose head is in the cluster pivot). More formally:

Definition 4: Let Cl1 and Cl2 be two collections, with two clusters X ∈ Cl1 and Y ∈ Cl2 respectively. Let t be a tag such that head(X) = t and rel(Y, X); then Y represents the cluster pivot.

Example 2: considering the previous example, let us suppose that the cluster (“google”, “htc”, “tmobile”, …) is missing. Then, the cluster pivot is the first cluster with head “Google”, that is (“g1”, “android”, “phone”, …).

The similarity is evaluated between the cluster pivot and the clusters in the other collection. The goal is to find the cluster most similar to the cluster pivot.

Definition 5: Let X = {t1, t2, …, tn} and Y = {r1, r2, …, rm} be two clusters of tags, such that X is the cluster pivot. The similarity between X and Y, sim(X, Y), is:

$$\mathrm{sim}(X, Y) = \frac{\sum_{i=1}^{n} \max_{1 \le j \le m} s(t_i', r_j')}{\min(|X|, |Y|)} \qquad (2)$$
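The similarity in (2) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the lexical similarity s is passed in as a parameter, and the default `exact_match` (1 when two stems coincide, 0 otherwise) and the lowercase "stemmer" are hypothetical stand-ins, not the measure used in the paper. The candidate clusters in the usage lines are invented toy data.

```python
from typing import Callable, List, Optional, Sequence


def exact_match(a: str, b: str) -> float:
    """Hypothetical stand-in for s: 1.0 when the two stems coincide, else 0.0."""
    return 1.0 if a == b else 0.0


def cluster_similarity(
    pivot: Sequence[str],
    other: Sequence[str],
    s: Callable[[str, str], float] = exact_match,
    stem: Callable[[str], str] = str.lower,  # placeholder for a real stemmer
) -> float:
    """Equation (2): sum over pivot tags of the best match in the other cluster,
    divided by the size of the smaller cluster."""
    x = [stem(t) for t in pivot]
    y = [stem(r) for r in other]
    if not x or not y:
        return 0.0  # convention chosen here for empty clusters
    total = sum(max(s(t, r) for r in y) for t in x)
    return total / min(len(x), len(y))


def most_similar(
    pivot: Sequence[str],
    collection: List[Sequence[str]],
    s: Callable[[str, str], float] = exact_match,
) -> Optional[Sequence[str]]:
    """Cluster of the collection with the highest similarity to the pivot."""
    return max(collection, key=lambda c: cluster_similarity(pivot, c, s), default=None)


# Toy usage: the pivot comes from the Google collection; candidates are invented.
pivot = ["g1", "android", "phone"]
candidates = [["robot", "scifi", "cyborg"], ["phone", "mobile", "android"]]
best = most_similar(pivot, candidates)
```

With the exact-match stand-in, two of the three pivot tags find a match in the second candidate, so its similarity is 2/3, while the first candidate scores 0. A graded lexical measure would replace `exact_match` without changing the rest of the computation.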

In (2), |·| stands for the size (cardinality) of a set (cluster); A = {(t_i', r_j') | s(t_i', r_j') > 0} is the set of all the pairs with lexical similarity greater than zero; t_i' and r_j' are the stems of the tags ti and rj, respectively. The value s(t_i', r_j') is computed as follows:

$$s(t_i', r_j') = \mathrm{syn}(t_i', r_j') - \mathrm{err}(t_i', r_j')$$

It is 0 when

syn(t i' , r j' )

=

syn(t i' , r j' )