Iterative Annotation of Multi-relational Social Networks

Stéphane Peters, Ludovic Denoyer, Patrick Gallinari
University Pierre et Marie Curie - LIP6, Paris, France
Email: [email protected]
Abstract—We consider here the task of multi-label classification for data organized in a multi-relational graph. We propose the IMMCA model - Iterative Multi-label Multi-Relational Classification Algorithm - a general algorithm for solving the inference and learning problems for this task. Inference is performed iteratively by propagating scores according to the multi-relational structure of the data. We detail two instances of this general model, implementing two different label propagation schemes on the multi-graph. This is the first collective classification method able to handle multiple relations and to perform multi-label classification in multi-graphs. The target application is image annotation in large social media sharing websites (Flickr). The goal is to assign labels to images when users and images are connected through multiple relations - authorship, friendship, or visual/textual similarities. We show that our model is able to deal with both content and social relations and performs well on real datasets. Additional experiments on artificial data allow us to analyze the behavior of our method in different situations.
I. INTRODUCTION
Recently, social network mining has become a major area of interest for computer science. Different generic tasks have emerged from this research. Analyzing the structure of the networks has motivated intensive academic research, with topics like the detection of communities ([1]) or the inference of hierarchical structures ([2]). Closer to machine learning concerns, and to the focus of this paper, is the classification of "pieces of data" - images, documents, videos, ... - organized according to a social network. This classification task corresponds to different real-world applications such as scientific article classification ([3]), spam filtering ([4]), etc. Different models have been developed for classifying linked data. They come from different streams of research, either from machine learning ([5], [6], [7]) or from data mining ([3], [8], [9], [10]). Most models share the common idea of representing a social network by a graph where nodes represent the data to be classified and edges represent the relations between data elements. These models suffer from different limitations. Early methods, for example ([5], [7]), only considered the relational structure for propagating node labels, completely ignoring the node content (e.g. text). All current models only consider simple graphs with one type of relation. This is clearly a strong limitation in the context of social network applications, where multiple relations (e.g. friendship, comments, authorship) among resources and users bring
rich complementary information. Another limitation, probably easier to remedy, is that, to our knowledge, all current models have been designed for single-label classification problems, i.e. only one label is assigned to each node. This does not fit applications like image annotation, where multiple labels are associated with network nodes. We consider here the task of classification, or annotation, of the nodes of a partially labeled multi-relational graph in a multi-label setting with a large set of possible labels. The targeted applications are classification problems in complex social networks, like automatic image/document/video annotation in large social sharing websites (YouTube1 or Flickr2 for example). We propose a new model called IMMCA, for Iterative Multi-label Multi-Relational Classification Algorithm, which is an extension of the Iterative Classification Algorithm ([9], [10]) to multi-relational graphs and multi-label classification. This model is able to learn, from a partially labeled graph, how to label the other nodes, using both the semantic node content and the complex multi-relational structure of the underlying social network organizing the data. This is, to our knowledge, the first collective classification method designed for multi-relational data and for multi-label classification. The model is evaluated on a dataset collected from Flickr. Note that this formal model, developed for social network applications, can also be applied to any multi-relational dataset. The contributions of the paper are threefold:
• We propose a new learning algorithm (IMMCA) for solving the problem of multi-label, multi-relational graph classification. This model uses both the content information and the different types of relations among data, and learns a distinct propagation model for each type of relation and each label. These relations can come from an underlying social network, but also from similarities computed over the objects to annotate.
• We present two instances of the model, corresponding to two label propagation schemes over the graph structure. They allow us to deal with different learning conditions and to easily adapt the model to different datasets and relation types.
• We present an experimental evaluation on three datasets. The first two are a smaller and a larger dataset extracted from Flickr, and correspond to the image annotation problem: the objective is to learn to tag images with many possible tags. On the small corpus, comparisons with mono-relational baselines show that the proposed model is able to benefit from the richer information present in the different relations. Experiments on the large corpus show that it scales well. The third dataset is an artificial one, used for demonstrating the ability of our model to deal with heterogeneous relations among data and for further analyzing its behavior.

1 http://www.youtube.com
2 http://www.flickr.com

The paper is organized as follows. Section II gives an overview of existing methods for graph classification and image annotation. In Section III, we briefly present the ICA algorithm, which inspired the IMMCA model. Section IV describes the IMMCA algorithm: we present the inference and learning steps in Sections IV-A and IV-B, and then detail in Section IV-C two instances of the model corresponding to two different label propagation schemes. Finally, we describe a large set of experiments on both a real corpus of image annotation extracted from Flickr (Section V-A) and on artificial data (Section V-C).

II. EXISTING WORK
A. Graph labeling
Among the models recently proposed for graph labeling in the machine learning community, one can distinguish two main families. The first one, Regularized Models, has developed transductive or semi-supervised models that minimize an objective function over the graph structure. These models consider that a good labeling should be smooth over the graph structure, so that the score of a node is close to the scores of its neighbors.
This is done by using a regularization term Ω defined over all pairs of connected nodes (i, j) with weights w_{i,j}:

Ω = Σ_{(i,j)} w_{i,j} (f_i − f_j)²

where f_i and f_j are the predicted scores of nodes i and j. This term Ω penalizes differences between the scores of connected nodes. Many variants of this model have been proposed, depending on the nature of the graph data (with content at each node [4], or without content, using only the graph structure [5]), on the nature of the task (transductive [11] or semi-supervised [4]), and on the variant of the regularization term used [7]. Different algorithms have been used for minimizing this loss, from classical gradient descent to algebraic methods implementing random walks. These models have mainly been used for binary classification; they have recently been generalized to the problem of ranking nodes [12] and to multi-label annotation [13]. The second family, Iterative Models, makes use of inference algorithms which label nodes iteratively given their current context, i.e. the current labels of their neighbors. Examples of
such models are the Iterative Classification Algorithm (ICA) [10], [14], [9] and its SICA variant [6], Gibbs Sampling methods [14], or Stacked Learning models [15]. They all learn a local re-estimation function so as to estimate the probability P(l_i | N_i) of labeling node i knowing the labels of its neighbors N_i. Different methods for learning this probability have been developed, for example by using simulation [6] or stacked learning machines [15]. These models have been used for single-label classification tasks, with applications such as the thematic categorization of websites or of scientific articles. All these models have been designed for mono-relational graphs and cannot be used for multi-relational datasets. They are mostly aimed at single-label or binary classification/ranking of nodes.
B. Image Annotation
Besides generic graph classification, specific methods have been developed for classifying networked data in the context of different applications. We focus in this review on image annotation, since this is our target application here. There is a substantial literature on image annotation and tag recommendation in classical (non-graph) settings, using for example generative or latent models ([16], [17]). The methods recently proposed for exploiting links between images mainly rely on tag propagation ([18]) and use classical label propagation techniques based on random walks or semi-supervised learning. The graph is generally a similarity matrix over the images; this matrix may encode visual similarities as well as similarities on metadata, e.g. time and location. Most methods exploiting similarities cannot deal with tags/labels referring to non-observable concepts (e.g. season). Other methods make use of the semantic relations among tags: for example, ([19], [20]) exploit tag co-occurrence in order to improve their predicted rankings. None of the above methods is able to handle the multi-relational structure induced by a social network.
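For concreteness, the smoothness term at the heart of the regularized family above can be computed directly from a weighted edge list. The sketch below is ours (illustrative names, not taken from any of the cited models):

```python
def smoothness(edges, f):
    """Omega = sum over connected pairs (i, j) of w_ij * (f_i - f_j)^2.

    edges: list of (i, j, w_ij) triples; f: predicted score per node.
    The term is small when connected nodes receive similar scores.
    """
    return sum(w * (f[i] - f[j]) ** 2 for i, j, w in edges)

# Nodes 0 and 1 agree, node 2 disagrees with node 1:
omega = smoothness([(0, 1, 1.0), (1, 2, 0.5)], [1.0, 1.0, -1.0])  # 0.5 * (1 - (-1))^2 = 2.0
```

Minimizing Ω alone drives connected nodes toward equal scores; the cited models combine it with a data-fit term on the labeled nodes.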
III. PRELIMINARIES
A. Notations and Tasks
Let us consider a multi-relational graph G = (N, R) defined by:
• A set of nodes N = (n_1, ..., n_N), where N is the number of nodes.
• A set of relations between nodes R = {r_{i,j}}, (i, j) ∈ N². r_{i,j} may encode multiple relations between nodes i and j, as described below.
The nodes correspond to the pieces of information we want to annotate. Typically, they can be images, documents, users, etc. This information is organized into a network represented by the set of relations R. We consider that each node carries content information (visual or textual, for example) encoded into a feature vector n_i ∈ R^D, where D is the dimension of the content space. We also consider that nodes are connected through different types of relations. These relations can correspond, for example, to a friendship relation between users, an authorship relation between images or documents, or similarities between two nodes. Let us denote R the number of types of relations, and

r_{i,j} = (r^1_{i,j}, ..., r^R_{i,j})    (1)

where r^k_{i,j} ≥ 0 is the weight of the relation of type k between n_i and n_j; r^k_{i,j} = 0 means that there is no relation of type k between the two nodes.
We consider a set of possible labels denoted L = (1, ..., L), where L is the number of labels. Let us denote y_i = (y_i^1, ..., y_i^l, ..., y_i^L) the scores of the different labels for node n_i, so that the higher y_i^l is, the more relevant label l is for node n_i. In this article, we address annotation as a learning problem where the goal is to find the labels of the unlabeled nodes of a partially labeled graph. Let us denote N_l = (n_1, ..., n_l) ⊂ N the set of nodes whose y values are known. In the context of a social image sharing website, these nodes correspond to images that have been manually labeled by users; l is the number of labeled nodes. For these labeled nodes, the label scores are defined as:

∀n_i ∈ N_l, ∀l ∈ L:  y_i^l = 1 if n_i belongs to label l, and y_i^l = −1 otherwise.    (2)
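These notations can be made concrete with a short sketch; the array layout below (one weight matrix per relation type, plus a score matrix following Eq. (2)) is our own illustration, not the paper's implementation:

```python
import numpy as np

N, R, L = 4, 2, 3           # number of nodes, relation types, labels
r = np.zeros((R, N, N))     # r[k, i, j] >= 0 is the weight of relation k between n_i and n_j
r[0, 0, 1] = 1.0            # e.g. an authorship relation between nodes 0 and 1
r[1, 1, 2] = 0.7            # e.g. a similarity relation between nodes 1 and 2

# Label scores of a labeled node (Eq. 2): +1 for its labels, -1 elsewhere.
y = -np.ones((N, L))
y[0, 2] = 1.0               # node 0 carries label 2 only
```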
Let N_u = (n_{l+1}, ..., n_N) be the set of unlabeled nodes. The goal is to predict the missing values y_i^l for all n_i ∈ N_u, using the node content information, the relations in R, and the labeled nodes.
B. Iterative Classification Algorithm
ICA is a simple and efficient collective classification method [9], [10]. We have built on this framework for our new algorithm. Note that the ideas presented here for multiple relations and multiple labels could be used with other baseline models than ICA, which could also be extended to this multi-relation, multi-label setting. We briefly describe the ICA principles before introducing our multi-relational, multi-label extension in the next part. ICA has been proposed for classifying the nodes of a partially labeled mono-relational graph. It proceeds in two steps:
• Learning step: a classification function f_θ is trained to classify a node using its content and the labels of its neighbors. This function is learned on the labeled part of the graph using classical machine learning algorithms such as the perceptron or SVMs. The learned function is called the re-estimation function in the following.
• Inference step: inference iteratively selects unlabeled nodes at random and computes their new label using the re-estimation function, considering the current label assignment of their neighbors. This step is repeated until the label assignments stabilize.
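The ICA inference step can be sketched as follows; `f_theta` stands for the learned re-estimation function, and the stopping rule is simplified to a fixed number of sweeps (a minimal illustrative version, not the authors' implementation):

```python
import random

def ica_inference(unlabeled, labels, neighbors, f_theta, content, n_sweeps=10):
    """Iteratively relabel the unlabeled nodes from their neighbors' current labels.

    labels:    dict node -> current label (bootstrap values for unlabeled nodes)
    neighbors: dict node -> list of neighboring nodes
    f_theta:   re-estimation function (node content, neighbor labels) -> label
    """
    nodes = list(unlabeled)
    for _ in range(n_sweeps):
        random.shuffle(nodes)                     # visit unlabeled nodes in random order
        for i in nodes:
            ctx = [labels[j] for j in neighbors[i]]
            labels[i] = f_theta(content[i], ctx)  # re-estimate from the current context
    return labels
```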
Similarity relations are kept only when their weight exceeds α, where α is a manually fixed threshold which limits the number of relations kept. For all the experiments, we have used different training set sizes, keeping 10% of the corpus as a validation set for choosing the gradient step, the L2-regularization parameter λ, and the number of gradient descent iterations. The reported results have been computed on the remaining images. For lack of space, we only report results obtained with 10% of training labeled nodes, but the conclusions of the different experiments remain the same for all the training sizes tested; note that the larger the training set, the better the performance. Performance has been computed as an average over 5 runs. We focus here on different aspects of the experiments:
1) Multi-relational versus Content Only: In Figure 2, we compare our model with a max-margin perceptron classifier operating on the content of the images (textual or visual) without using any relation. For all values of L, our model clearly outperforms the content-only model, particularly with a large
Fig. 2. AVPR for multi-relational IMMCA with the Φ_c^GPS propagation scheme (All Relations) versus content-only models (max-margin perceptron, VC Only), w.r.t. the number of labels L.
number of possible labels. With 1000 labels, our approach is four to five times better than the content-only approach. This can easily be explained by the fact that many Flickr labels are not directly related to the visual content of the pictures, e.g. the type of camera (canon or nikon), dates (2007, 2008, ...), or locations.
2) Mono-relational versus Multi-relational: Here, we compare the performance of mono-relational models - where only one relation is used - with the multi-relational IMMCA model. Our mono-relational, multi-label model is similar to single-label ICA, but uses the scores of the neighbors instead of their assigned labels. We have computed the performance of the mono-relational model for each of the 5 different relations, using the Φ_c^GPS representation, which models the simplest propagation scheme with content information. The performance is presented in Figure 3 for 10% of the images in the training set and different numbers of tags. First, one can see that the authorship relation (AR) clearly dominates the other ones, which means that each author tends to use the same set of tags for his own pictures. The multi-relational model performs better than the mono-relational ones, particularly for the P@3 measure presented in Figure 4 and for large numbers of labels. These results show the ability of our approach to exploit the richer information provided by several types of relations. Moreover, even if the authorship relation alone leads to good performance here, with lower training and inference times, our multi-relational approach allows us to combine information from different sources without knowing the contribution of each of them.
3) Influence of the Propagation Scheme: We have used the four propagation schemes in the experiments presented in Figure 5. The GPS scheme only learns one propagation model for all the labels, while the LPS scheme learns one different model for each label. On the Flickr corpus, GPS clearly outperforms LPS.
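The structural difference between the two schemes can be sketched as follows: GPS shares one weight per relation type across all labels, while LPS keeps one weight per relation type and per label, which mirrors the weight tables reported in Figure 9. The Φ representations of Section IV-C are simplified here to a bare linear propagation step of our own:

```python
import numpy as np

def propagate_gps(r, y, theta):
    """GPS: theta has shape (R,), one shared weight per relation type."""
    return sum(theta[k] * r[k] @ y for k in range(r.shape[0]))

def propagate_lps(r, y, theta):
    """LPS: theta has shape (R, L), one weight per relation type and per label."""
    return sum(r[k] @ y * theta[k][None, :] for k in range(r.shape[0]))
```

Both return an (N, L) score update; LPS can, for instance, weight a relation type highly for one label and negatively for another, which GPS cannot express.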
This can be explained by the fact that LPS has more parameters to learn and thus needs more training examples than GPS. Moreover, on this corpus the AR relation clearly dominates all the others and propagates labels well. We present in Section V-C experiments on artificial data
Fig. 3. AVPR for mono-relational (AR, FR, CR, MR, TR) versus multi-relational (All Relations) IMMCA with the Φ_c^GPS propagation scheme using textual content, for a 10% training set, w.r.t. the number of labels L.

Fig. 4. P@3 for mono-relational versus multi-relational IMMCA with the Φ_c^GPS propagation scheme using textual content, for a 10% training set, w.r.t. the number of labels L.

Fig. 5. AVPR of the multi-relational IMMCA for the two propagation schemes (GPS and LPS), with and without textual content (TC), for a 10% training set, w.r.t. the number of labels L.
showing conditions where the LPS model outperforms the GPS one.
4) Influence of the content information: Figures 5 and 6 show the influence of the use of content information in the model. First, one can note that the textual content of the images (their titles) gives higher performance than color histograms. Indeed, many Flickr labels are completely decorrelated from the visual content (e.g. the labels 2007 or summer) and are easier to find using the titles of the pictures. Moreover, the propagation model without content performs as well as propagation with textual information: for Flickr, content information is extremely noisy and does not bring additional relevant information for tag scoring compared to label propagation. This is clearly due to the nature of this corpus, and other datasets might certainly lead to other conclusions.

Fig. 6. AVPR of multi-relational IMMCA with different content information (GPS with VC, GPS with TC, GPS without content) for a 10% training set, w.r.t. the number of labels L.

B. Large Flickr Corpus
The large Flickr corpus is composed of 47,065 pictures and 100 possible labels. Due to its size, we only consider the textual content (TC) of the pictures and the three following types of relations: AR, FR and TR. The training set is composed of 10% of the pictures and we have used the Φ_c^GPS propagation scheme. The results are summarized in Figure 7. They clearly confirm the ability of the model to deal with multi-relational data and to outperform the mono-relational configurations. These experiments also show that the model scales to large corpora: on a standard computer, learning and inference together take about one hour.

Fig. 7. Results on the large Flickr corpus, with Φ_c^GPS and 10% of the pictures in the training set.

    Relations                   AVPR
    Random Model                3.6 %
    AR                          36.5 %
    FR                          18.4 %
    TR                          12.5 %
    AR, FR and TR conjointly    38.5 %

C. Propagation Model on Artificial Datasets
In order to further analyze the behavior of the two propagation schemes GPS and LPS, we run a set of experiments on an artificial multi-relational dataset generated as follows:
• It is composed of N = 10,000 nodes, L = 3 possible labels and R = 3 types of relations.
• Each node is empty (no content information).
• Relations of type 0 are randomly drawn between nodes.
• Relations of type 1 mostly connect nodes with label 1, and a few nodes with label 0 or 2.
• Relations of type 2 mostly connect nodes with label 2, and a few nodes with label 0 or 1.
Figure 8 shows the performance of both propagation schemes on this dataset, as a function of the training set size. One can see that LPS propagates the labels better and obtains better performance.

Fig. 8. AVPR for the GPS and LPS schemes w.r.t. the training set size.

This experiment shows the ability of LPS to capture the specificity of relations of types 1 and 2, while GPS, which learns the same propagation scheme for all labels, fails to do so. Figure 9 shows the weights θ learned by our models (see Section IV-C). For the GPS model, one can see that it learns positive propagation weights for relations of types 0 and 1; with such weights, it is not able to propagate label 2 accurately. For the LPS model, the weights differ across the three labels. In particular, the model learns a high weight for relation 1 and label 1, which means that it has learned that label 1 propagates through relations of type 1; the same effect appears for label 2 and relation 2. This set of experiments also shows that the learned parameters can be used to understand which labels propagate through which relations.

Fig. 9. Weights learned by the model on the artificial dataset. GPS learns one weight per type of relation, while LPS learns one weight per type of relation and per label. The contrasted weights illustrate the relevancy of the LPS propagation scheme when labels propagate differently depending on the type of relation.

    Weights for    GPS (all labels)    LPS (label 0)    LPS (label 1)    LPS (label 2)
    Relation 0     1.11                0.33             -1.75            -0.14
    Relation 1     0.81                1.08             0.94             -1.04
    Relation 2     -1.7                -1.87            -0.49            0.29

VI. CONCLUSION
We have proposed the Iterative Multi-label Multi-Relational Classification Algorithm (IMMCA), a new model for the image annotation task in social networks which can take advantage of both the content and the multiple relations available in such networks. It is not restricted to social networks and can be applied to any multi-relational graph. This model is an extension of the ICA algorithm to the multi-relational, multi-label classification task. The underlying idea is to learn a label propagation scheme through the different relations of the graph, and then to use this learned propagation scheme to iteratively label each unlabeled node. We have described four instances of our model. Experimental evaluation on both a real-world dataset and an artificial one demonstrates the effectiveness of our approach and helps understand the relevance of the variants of the proposed model.

ACKNOWLEDGMENT
This work was partially supported by the French National Agency of Research (Fragrances, ANR-08-CORD-008-01 and ExDeus/Cedres, ANR-09-CORD-010-04 projects).

REFERENCES
[1] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of community hierarchies in large networks," CoRR, vol. abs/0803.0476, 2008.
[2] A. Clauset, C. Moore, and M. E. J. Newman, "Structural inference of hierarchies in networks," CoRR, vol. abs/physics/0610051, 2006.
[3] M. Bilgic, G. Namata, and L. Getoor, "Combining collective classification and link prediction," in ICDM Workshops, 2007, pp. 381–386.
[4] J. Abernethy, O. Chapelle, and C. Castillo, "Web spam identification through content and hyperlinks," in AIRWeb '08. New York, NY, USA: ACM, 2008.
[5] D. Zhou, J. Huang, and B. Schölkopf, "Learning from labeled and unlabeled data on a directed graph," in ICML, 2005, pp. 1036–1043.
[6] F. Maes, S. Peters, L. Denoyer, and P. Gallinari, "Simulated iterative classification: a new learning procedure for graph labeling," in ECML/PKDD (2), 2009, pp. 47–62.
[7] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf, "Learning with local and global consistency," in NIPS, 2003.
[8] S. Hill, F. J. Provost, and C. Volinsky, "Learning and inference in massive social networks," in MLG, 2007.
[9] P. Sen, G. M. Namata, M. Bilgic, L. Getoor, B. Gallagher, and T. Eliassi-Rad, "Collective classification in network data," University of Maryland, College Park, Tech. Rep. CS-TR-4905, 2008.
[10] D. Jensen, J. Neville, and B. Gallagher, "Why collective inference improves relational classification," in KDD '04. New York, NY, USA: ACM, 2004, pp. 593–598.
[11] X. Zhu, Z. Ghahramani, and J. D. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in ICML, 2003.
[12] S. Agarwal, "Ranking on graph data," in ICML '06. New York, NY, USA: ACM, 2006, pp. 25–32.
[13] L. Denoyer and P. Gallinari, "A ranking based model for automatic image annotation in a social network," Tech. Rep., 2010.
[14] S. Macskassy and F. Provost, "A simple relational classifier," in Proc. 2nd Workshop on Multi-Relational Data Mining, KDD, 2003.
[15] Z. Kou and W. W. Cohen, "Stacked graphical models for efficient inference in Markov random fields," in SDM, 2007.
[16] J. Li and J. Z. Wang, "Real-time computerized annotation of pictures," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, pp. 985–1002.
[17] F. Monay and D. Gatica-Perez, "On image auto-annotation with latent space models," in MULTIMEDIA '03. New York, NY, USA: ACM, 2003, pp. 275–278.
[18] L. Cao, J. Luo, and T. S. Huang, "Annotating photo collections by label propagation according to multiple similarity cues," in MM '08, 2008.
[19] B. Sigurbjörnsson and R. van Zwol, "Flickr tag recommendation based on collective knowledge," in WWW '08. New York, NY, USA: ACM, 2008, pp. 327–336.
[20] L. Wu, L. Yang, N. Yu, and X.-S. Hua, "Learning to tag," in WWW, 2009, pp. 361–370.
[21] Y. Li, H. Zaragoza, R. Herbrich, J. Shawe-Taylor, and J. S. Kandola, "The perceptron algorithm with uneven margins," in ICML, 2002, pp. 379–386.