Exhaustive and Guided Algorithms for Recommendation in a Professional Social Network

Maria Malek, Dalia Sulieman
EISTI-Laris laboratory, PRES Cergy University, FRANCE
[email protected], [email protected]

July 31, 2010

Abstract

This paper proposes a skills recommendation algorithm for a professional social network. The network consists of a set of persons connected by weighted professional ties. To answer the request of an actor, the system recommends a list of other actors that best match the requested criteria. We propose two recommendation algorithms based on three types of knowledge. The first type is information about the person; it is stored at the actor vertex level and constitutes the user profile description. The second type is computed from the network structure itself: starting from the initial actor, we explore the maximum spanning tree rooted at that actor, which reduces the search space of target actors. The third type is the betweenness centrality measure associated with each actor, which estimates the control an actor has over other pairs of actors. We use this measure to extract the best paths from the spanning tree.

1 Introduction

A social network is a set of people or groups of people with some pattern of contacts or interactions between them. Social network analysis is defined as the study of social entities, such as people in organizations, called actors, and of their interactions and relationships. A social network is modelled by a graph in which each vertex is an actor and each edge is a relationship. We can study the structural properties of the network as well as the role and the social prestige of each actor [12, 9, 7]. We can also find different types of subgraphs, such as communities formed by groups of actors with common interests, by isolating groups of individuals with a high link density [5].

The social network can also be a source for the development of recommendations: find an expert in a given field, suggest products to sell, suggest a friend, etc. Such recommendations may be based on path exploration algorithms or on degree analysis ([1, 2]).

In this paper, we propose an algorithm for computing recommendations of skills in a professional social network. Our network consists of a set of persons connected by weighted professional ties. To answer the request of an actor, the system recommends a list of other actors that best match the requested criteria. An example is to search for a person whose expertise matches a given task. We work on a social network composed of authors linked together by similarity links. These authors are extracted from bibliographic data. We propose two recommendation algorithms based on three types of knowledge:

• The first type deals with information concerning the person. This information is stored at the actor vertex level and can be represented by an ontology describing user profiles.
• The second type of information is computed from the network structure itself. Starting from the initial actor, we explore the maximum spanning tree rooted at that actor. We can thus reduce the search space of target actors.
• The third type of information is based on the betweenness centrality measure associated with each actor. This measure estimates the control an actor has over other pairs of actors. We use this measure to extract the best paths from the spanning tree.

The remainder of this article is organized as follows. In section 2 we describe our professional social network and how we extracted it. In section 3 we detail our approach to expert recommendation, where the exhaustive and the guided algorithms are proposed. We then present our experimental results in section 4. In section 5, some related works are described. We finally conclude.

2 Professional network

The social network is composed of authors extracted from bibliographic data. In this graph, the nodes are the authors and the weighted edges carry the degree of similarity between these authors. Each author Z has a given profile $Pro_Z$. This profile is described by a vector of weighted keywords $T_i$; these keywords represent the author's topics of interest.

$$Pro_Z = \{(T_1, P_1), (T_2, P_2), \ldots, (T_m, P_m)\}$$

The goal of the system is to recommend, in response to an author's query, a group of authors ranked according to the similarity between their profiles (terms of interest $T_m$) and the query terms. For that we had to extract a social network that represents the authors and the relations between them.

The social network has been extracted from the Microsoft Academic Search website libra.msra.cn. We first extracted a connected network of authors from this site. The obtained network is described as a weighted directed graph (see figures 1 and 2): the nodes of this graph are the authors and the edges are the citations between these authors, each edge carrying a value that represents the number of citations between the two connected authors. This social network is represented by a matrix L, in which $L_{ij}$ equals n if author i cites author j n times.

This network represents the number of citations between authors, not the similarity between them. We therefore derived from it another social network, the similarity network, as described in the next section.

2.1 The similarity based social network

The similarity social network is represented by an undirected graph whose nodes represent authors and whose edges represent the similarity between authors. For every node, a weighted vector of keywords is extracted and stored to describe the user's profile, as mentioned above.

We suppose that two authors are structurally similar if they cite a certain number of authors in common, or if they are cited by a certain number of authors in common. The similarity relation in this network is based on two matrices, the co-citation matrix and the bibliographic coupling matrix.

2.1.1 Co-citation matrix

The co-citation matrix measures the similarity between authors. It is computed by:

$$C_{ij} = \sum_{k=1}^{n} L_{ki} L_{kj} \qquad (1)$$

where L is the matrix representing the social network of citations as mentioned above (see figure 1). Since the product $L_{ki} L_{kj}$ is non-zero only when author k cites both i and j, $C_{ij}$ counts the citations that i and j receive from common authors: if two authors are cited by a certain number of the same authors, then we can say that these two authors have similar interests.

2.1.2 Bibliographic coupling

The bibliographic coupling matrix is another similarity measure between authors, which is given by:

$$B_{ij} = \sum_{k=1}^{n} L_{ik} L_{jk} \qquad (2)$$

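Since the product $L_{ik} L_{jk}$ is non-zero only when i and j both cite author k, $B_{ij}$ counts the references that the two authors share: if two authors cite a certain number of the same authors (their bibliographies have entries in common), then these two authors are similar.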

2.1.3 Structural similarity graph

The similarity graph is built from the sum of the two previous matrices, the co-citation matrix C and the bibliographic coupling matrix B. A similarity relation between two authors i and j is created if they cite the same authors or are cited by common authors, and if the two nodes satisfy the condition $(B + C)_{ij} \geq$ threshold. In this way we obtain a similarity based social network from the citations based social network (see figure 1).
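The construction of section 2.1 can be illustrated with a small sketch. It is only an illustration under stated assumptions: the function name `similarity_edges`, the dictionary output and the toy matrix are hypothetical, and the positive threshold value is chosen arbitrarily. The citation matrix follows the convention of the text, L[i][j] being the number of times author i cites author j.

```python
def similarity_edges(L, threshold):
    """Return {(i, j): weight} for every pair i < j with (B + C)[i][j] >= threshold.

    L[i][j] is the number of times author i cites author j, as in the text.
    The threshold is assumed to be strictly positive.
    """
    n = len(L)
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            c_ij = sum(L[k][i] * L[k][j] for k in range(n))  # equation (1)
            b_ij = sum(L[i][k] * L[j][k] for k in range(n))  # equation (2)
            if b_ij + c_ij >= threshold:
                edges[(i, j)] = b_ij + c_ij
    return edges


# Hypothetical toy data: authors 0 and 1 both cite authors 2 and 3.
L = [
    [0, 0, 1, 1],
    [0, 0, 2, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
edges = similarity_edges(L, threshold=2)
print(edges)  # {(0, 1): 3, (2, 3): 3}
```

In this toy case the pair (0, 1) is linked through equation (2) and the pair (2, 3) through equation (1); no other pair reaches the threshold.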

Figure 1: From citations graph to similarity graph.


Figure 2: Two non-directed similarity graphs extracted from the global directed graph, the first one is denser.

3 Recommendation algorithm

3.1 The algorithm idea

The idea is to propose a search algorithm which combines the semantic aspect, the structure and the social properties of the network. The semantic part is the information stored about the actor (the person) within each node; in other terms, it consists of the user profile. The structural part is the information described by the network structure; our contribution consists of using the maximum spanning tree in order to enhance the search performance. The social part consists of using the betweenness of actors in order to retain certain paths which are more prestigious than others.

3.1.1 The semantic part

We compute the similarity between the request $R_X$ of an author X and the profile of an author Z:

• $R_X$ is the request of X and is composed of a set of terms $T_i$: $R_X = \{T_1, T_2, \ldots, T_n\}$
• $Pro_Z$ is the profile associated with the actor Z, represented by a set of weighted terms: $Pro_Z = \{(T_1, P_1), (T_2, P_2), \ldots, (T_m, P_m)\}$.

The similarity is given by:

$$sim(R_X, Pro_Z) = \frac{\sum_{j \in inter(R_X, Pro_Z)} Pro_Z.P_j}{\sum_{i=1}^{m} Pro_Z.P_i + |R_X \setminus Pro_Z|} \qquad (3)$$

with $inter(R_X, Pro_Z) = \{k \in \{1, \ldots, m\} \text{ such that } Pro_Z.T_k \in R_X\}$.

3.1.2 The structural part

We extract the maximum spanning tree from the weighted similarity graph using the Kruskal algorithm ([8, 4]), taking the maximum edge values instead of the minimum values. We aim to enhance the search by finding an optimized navigation in the spanning tree, instead of exploring the whole graph or even a part of it.

3.1.3 Nodes' betweenness

The betweenness centrality is given by the equation:

$$C_B(i) = \sum_{j, k} \frac{P_{jk}(i)}{P_{jk}} \qquad (4)$$

where the sum runs over the pairs of nodes j, k other than i, $P_{jk}(i)$ is the number of shortest paths between j and k which pass through the node i, and $P_{jk}$ is the number of shortest paths between j and k. Using the betweenness allows the search to favour certain privileged paths for the requested recommendation.
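Equation (4) can be evaluated directly from its definition on a small unweighted graph. The following is a minimal sketch and not the authors' implementation: the helper names are hypothetical, each unordered pair (j, k) is counted once (an assumption, since the text does not say whether pairs are ordered), and the approach is cubic in the number of nodes, so a faster algorithm would be needed for a network of several thousand authors.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and numbers of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """Betweenness of every node, summing P_jk(i) / P_jk over unordered pairs."""
    info = {s: bfs_counts(adj, s) for s in adj}
    cb = {i: 0.0 for i in adj}
    for j, k in combinations(adj, 2):
        dist_j, sigma_j = info[j]
        if k not in dist_j:
            continue  # j and k are not connected
        for i in adj:
            if i == j or i == k or i not in dist_j:
                continue
            dist_i, sigma_i = info[i]
            # i lies on a shortest j-k path iff the two legs add up exactly
            if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                cb[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return cb


# Hypothetical example: a path a - b - c - d
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(betweenness(path))  # {'a': 0.0, 'b': 2.0, 'c': 2.0, 'd': 0.0}
```

On the path, b lies between the pairs (a, c) and (a, d), and c between (a, d) and (b, d), hence the value 2 for both inner nodes.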

3.2 The algorithm

To elaborate a recommendation, we propose to navigate a spanning tree instead of considering the whole graph. This helps to follow significant navigation paths and to enhance the system performance. The recommendation algorithm searches for a response to the user request by searching the extracted spanning tree.

The algorithm input is a request $R_X$ posed by an author X; this request is formed as a chain of keywords $T_i$: $R_X = \{T_1, T_2, \ldots, T_n\}$.

The algorithm output is the response to the request of author X, presented as a weighted sequence of recommended authors $\{(Z_1, P_1), (Z_2, P_2), \ldots, (Z_n, P_n)\}$, together with the semantic chain connecting the two actors X and $Z_i$ (see figure 4). The semantic chain connecting the two actors X and $Z_i$ consists of the list of terms extracted from the profiles of the nodes (authors) relating X to $Z_i$.

The algorithm is given as follows:

1. Compute the maximum spanning tree (see figure 3).
2. Compute and store the betweenness of all the nodes.
3. Extract from the spanning tree a ranked list of actors to recommend by using the exhaustive algorithm or the guided one.

Figure 3: The maximum spanning tree computed for the similarity graph
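Step 1 of the algorithm, the extraction of the maximum spanning tree with the Kruskal algorithm, can be sketched as follows. This is a minimal illustration assuming a connected graph given as a list of hypothetical (u, v, weight) triples; the union-find helper and all names are illustrative and not taken from the paper.

```python
def maximum_spanning_tree(nodes, edges):
    """Kruskal's algorithm with edges taken in decreasing weight order.

    nodes: iterable of node identifiers
    edges: iterable of (u, v, weight) triples of an undirected graph
    Returns the list of edges kept in the tree.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent links to the representative, halving the path on the way
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for u, v, w in sorted(edges, key=lambda e: e[2], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components: keep it
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree


# Hypothetical similarity graph on four authors
sim_edges = [("a", "b", 5), ("b", "c", 4), ("a", "c", 3), ("c", "d", 1), ("b", "d", 2)]
tree = maximum_spanning_tree("abcd", sim_edges)
print(tree)  # [('a', 'b', 5), ('b', 'c', 4), ('b', 'd', 2)]
```

The only change with respect to the usual minimum spanning tree version is the `reverse=True` sort, which matches the remark in section 3.1.2 about taking the maximum edge values.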

3.3 The exhaustive version

1. Search the spanning tree starting from the user X (figure 4) using the breadth first strategy. We search for the nodes $Z_i$ to recommend to X, that is, those for which $sim(R_X, Pro_{Z_i}) \geq$ threshold.

2. Compute the rating $P_i$ associated with each author $Z_i$. This rating depends on two values: the similarity and the betweenness centrality of the authors on the path to the solution.

$$P_i = \begin{cases} sim(R_X, Pro_{Z_i}) \cdot \dfrac{\sum_{j=1}^{l} C_B(Y_j)}{l} & \text{if } l \geq 1 \\ sim(R_X, Pro_{Z_i}) & \text{otherwise} \end{cases} \qquad (5)$$

$Y_1, Y_2, \ldots, Y_{l-1}$ is the set of authors present on the path relating X to $Z_i$.

Figure 4: Searching the spanning tree using the breadth first search algorithm. An example of an authors list to recommend is [Z4, Z3, Z1, Z2], ranked according to their ratings; the semantic chain between X and Z4 is [pro(X), pro(Y1), pro(Y2), pro(Z4)].
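The exhaustive version can be sketched in a few lines. The code below is one possible reading of equations (3) and (5), not the authors' implementation: it assumes that profiles are dictionaries mapping terms to weights, that the tree is an adjacency dictionary, that betweenness values are precomputed, and that l in equation (5) is the number of intermediate authors between X and the recommended author. All names and the toy data are hypothetical.

```python
from collections import deque


def sim(request, profile):
    """One reading of equation (3): matched weight over profile weight plus missing terms."""
    matched = sum(weight for term, weight in profile.items() if term in request)
    missing = len(set(request) - set(profile))  # |R_X \ Pro_Z|
    denominator = sum(profile.values()) + missing
    return matched / denominator if denominator else 0.0


def exhaustive_search(tree, profiles, cb, root, request, threshold):
    """Breadth first search of the tree; returns (author, rating, distance) by decreasing rating."""
    parent = {root: None}
    queue = deque([root])
    results = []
    while queue:
        node = queue.popleft()
        if node != root:
            score = sim(request, profiles[node])
            if score >= threshold:
                # Intermediate authors on the tree path from the root to this node
                intermediates = []
                step = parent[node]
                while step is not None and step != root:
                    intermediates.append(step)
                    step = parent[step]
                l = len(intermediates)
                if l >= 1:
                    rating = score * sum(cb[y] for y in intermediates) / l
                else:
                    rating = score
                results.append((node, rating, l + 1))
        for neighbor in tree[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    results.sort(key=lambda item: item[1], reverse=True)
    return results


# Hypothetical toy tree: X - Y1 - Z1 and X - Z2
tree = {"X": ["Y1", "Z2"], "Y1": ["X", "Z1"], "Z1": ["Y1"], "Z2": ["X"]}
profiles = {
    "X": {},
    "Y1": {"graphs": 1.0},
    "Z1": {"ranking": 0.5, "clustering": 0.5},
    "Z2": {"ranking": 0.2, "vision": 0.8},
}
cb = {"X": 0.0, "Y1": 0.5, "Z1": 0.0, "Z2": 0.0}
ranked = exhaustive_search(tree, profiles, cb, "X", {"ranking", "clustering"}, 0.05)
print(ranked)
```

In this toy case Z1 matches both request terms and is reached through Y1, so its rating is its similarity multiplied by the betweenness of Y1, while Z2 is a direct neighbour of X and keeps its similarity as rating.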

3.4 The guided version

We propose a second, more efficient version that reaches a solution by finding the search path in the spanning tree more quickly than the breadth first strategy. We use a heuristic to choose the next node to visit among a set of candidate nodes: we apply the A* algorithm, which chooses the node Y that maximises the following heuristic:

$$h(Y) = sim(Pro_X, Pro_Y) \cdot C_B(Y)$$

until we reach the node Z that satisfies:

$$sim(X, Z) \geq threshold.$$

We can prove that our heuristic is monotone and that it decreases slowly along the solution's path; we can also prove that it recognizes the solution. Moreover, our experiments (see next section) show that this version converges more quickly to the solution and explores only 11% to 49% of the spanning tree explored by the exhaustive version.
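The guided exploration can be illustrated by a best-first search driven by h. This sketch is a simplification and not the authors' A* implementation: the text does not specify a path-cost term, so only the heuristic is used to order the frontier, and the heuristic and the stopping test are passed in as hypothetical callables (`heuristic`, `is_solution`).

```python
import heapq


def guided_search(tree, root, heuristic, is_solution):
    """Visit tree nodes by decreasing heuristic value until a solution is found.

    Returns (solution or None, number of nodes visited including the root).
    """
    seen = {root}
    frontier = []  # min-heap on (-h, insertion order, node)
    order = 0

    def push_neighbors(node):
        nonlocal order
        for neighbor in tree[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                heapq.heappush(frontier, (-heuristic(neighbor), order, neighbor))
                order += 1

    push_neighbors(root)
    visited = 1  # the root itself
    while frontier:
        _, _, node = heapq.heappop(frontier)
        visited += 1
        if is_solution(node):
            return node, visited
        push_neighbors(node)
    return None, visited


# Hypothetical toy tree and precomputed heuristic values h(Y)
tree = {"X": ["A", "B"], "A": ["X", "Z"], "B": ["X", "C"], "Z": ["A"], "C": ["B"]}
h = {"A": 0.9, "B": 0.1, "Z": 0.5, "C": 0.3}
found, visited = guided_search(tree, "X", h.get, lambda node: node == "Z")
print(found, visited)  # Z 3
```

Here the search visits X, A and Z and never expands B or C, which is the kind of reduction of the explored part of the tree that the guided version aims for.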

4 Experiments

Table 1 presents some statistics about the social network that describes the similarity between authors: number of nodes, number of edges and graph density (in social networks the graph density is small). Figure 5 presents the degree distribution of this network and shows that it follows a power law distribution.

Nodes number     7065
Edges number     1 009 940
Graph density    4.05 × 10^-2

Table 1: Some statistics about the social network describing the structural similarity between authors.

Figure 5: Degrees distribution of the extracted social network.

We now present an example of an experiment. We suppose that the author "Francesco Masulli" submits a request composed of three terms: T1 = Ranking, T2 = Clustering, T3 = Data mining. By applying the algorithm, we obtain table 2 as output; it shows a group of ranked authors to recommend. This table gives the names of the recommended authors as well as their rating values and their distances from the requesting author.

Author               Rating       Distance
Mikolaj Morzy        0.0072       2
Steven Warner        0.0005       2
Bob Garcia           0.00037      2
Wendy Gersten        0.00029      3
Manuel Lozano        0.00027      2
Matthias Schonlau    9e-005       2
Lyane T Watson       7.89e-005    2
Carl Wunsch          6.78e-005    2
Yang Seok Kim        3.38e-005    2
David W Aha          2.39e-005    2

Table 2: Recommendation results: the found authors, their ratings and their distances from the root author who sent the request.

We have also evaluated the guided version against the exhaustive one. We have run ten experiments: each experiment begins with a request elaborated by an author X (who becomes the root of the spanning tree). For each request, we apply both versions of the algorithm and we collect the following measurements (see table 3):

• The rank of the author found (recommended) by the guided algorithm; recall that the exhaustive algorithm proposes, for the same request, a set of recommended authors and their ranks.
• The number of nodes visited by the guided algorithm.
• The computation time.

We notice that in 8 experiments (see table 3) the guided version finds the author of rank 1, while it finds the author of rank 2 in the 2 other experiments. Only a part of the spanning tree is searched by the guided version: the explored part ranges from 11% to 49%. The computation time is also reduced in 9 of the 10 experiments; experiment 6 is the exception.

      The exhaustive algorithm                          The A* algorithm
N     Recommended author   Rating     Comp. time        Recommended author (rank)   Comp. time   Explored graph
1     Andrew Emili         0.00064    159.41 s          Andrew Emili (1)            109.27 s     39.25%
2     G V Belle            0.00141    159.35 s          G V Belle (1)               17.45 s      21.13%
3     Hans A Kestler       0.00060    150.41 s          Yuichi Asahiro (2)          11.66 s      13.86%
4     Jimin Pei            0.00002    160.61 s          Jimin Pei (1)               32.52 s      20.02%
5     John F Canny         0.00003    159.99 s          John F Canny (1)            21.77 s      11.77%
6     C Wang               0.00010    157.37 s          C Wang (1)                  233.99 s     49.13%
7     J Michael Brady      0.00001    162.68 s          J Michael Brady (1)         118.74 s     41.14%
8     Peter G Neumann      0.00022    160.72 s          Elizabeth J O neil (2)      40.49 s      24.88%
9     Peter Eades          0.00004    153.95 s          Peter Eades (1)             54.47 s      30.95%
10    Liang Chen           0.00019    161.71 s          Liang Chen (1)              14.14 s      16.67%

Table 3: Comparison of the breadth first search algorithm and the A* algorithm. Each experiment corresponds to a request sent by a root author; the recommended authors are those with the highest rating. The first author found by the exhaustive algorithm is also found by A* in 8 experiments.

5 Related work

Graph algorithms have been used for expert recommendation in social networks. These strategies are essentially ([14]):

• Breadth First Search, which broadcasts the query to every person in a social network.
• Random Walk Search (RWS), which randomly chooses one of the current node's neighbors to whom to spread the query.
• Best Connected Search (BCS), proposed by [3], which makes use of the skewed degree distribution of many networks.
• The Weak and Strong Ties algorithms, which are based on the idea that the connections between two individuals can have different strengths. The strength of association varies and is not always symmetric.
• Hamming Distance Search (HDS), which picks the neighbor that has the most uncommon friends with the current user.
• Information Scent Search (ISS), which picks the next person who has the highest match score between the query and the profile.

Expertise search in social networks has been studied by Zhang and Ackerman since 2005 ([14]). Graph search strategies were applied and evaluated on the Enron email data ([14]). The evaluation criteria are the number of people used per query and the depth of the query chain. ISS is not obviously better than the out-degree based strategies (BCS and HDS). Weak ties have been found to be important in helping people get new information, and they are reported to be critical for automated expertise finding. Out-degree strategies such as BCS and HDS have clear advantages over other strategies in networks like Enron's ([14]).

In ([6]), the problem of expertise identification using email communications is treated. A content-based algorithm is compared with a graph based algorithm that uses the HITS algorithm and takes into consideration both text and communication. Results show that the graph based algorithm performs better. The same idea is developed in ([10]), showing that social network analysis techniques such as the expertise propagation algorithm lead to significant performance improvement. In ([15]), the recommendation is formalized as a ranking problem over a heterogeneous social network. Random Walk Search is used to elaborate a recommendation when a person is doing a search or browsing the information.

On the other hand, in ([13]) the relations between authors in three networks of scientific collaborations, and the different interactions between them, are studied. Two authors are connected if they have a paper in common. In ([11]) the structure of the social network of mathematical papers and the relations between authors in the mathematical field are studied: the nodes of this network are the mathematicians and the edges are the common papers between them. The evolution of this network over time (number of authors, number of papers) is also presented.

6 Conclusion

In this paper, we propose an algorithm for computing recommendations of skills in a professional social network. We study a professional network which contains authors connected together, each author having a profile description. The nodes of this network are authors and the edges are the similarity between them. Our objective is to recommend to a given author who submits a query a group of ranked authors as a response. This recommendation is based, on the one hand, on the similarity between authors' profiles and the submitted request and, on the other hand, on the betweenness centrality of the authors found on the search paths. To search the graph we first extract the most representative spanning tree and then we explore this tree.

The first proposed algorithm is an exhaustive one: it uses the breadth first strategy to explore the spanning tree until it finds suitable authors to recommend. The second algorithm uses the A* algorithm for searching the spanning tree instead of the breadth first strategy. We specify an admissible heuristic which depends on the similarity between the submitted request and the user profile on the one hand, and on the betweenness measure on the other hand. Experimental results show that the guided version proposes the best-ranked recommendation in 8 of the 10 experiments and enhances the search performance. By comparing both algorithms we notice that the guided version explores only 11% to 49% of the original search space.

We are now working on the elaboration of the user profile by using a domain ontology representation. We aim to extend our algorithm to search several connected communities. We will also try to use the spanning tree for semantic purposes, which could for example lead to the discovery of semantic matching between different communities.

References

[1] École d'été web intelligence. In WI09. Université de Lyon, 2009.
[2] Lada A. Adamic and Eytan Adar. How to search a social network, July 2005.
[3] Lada A. Adamic, Orkut Buyukkokten, and Eytan Adar. A social network caught in the web. First Monday, 8(6), 2003.
[4] C. Berge. Graphes. Gauthier-Villars, 1983.
[5] Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of community hierarchies in large networks. CoRR, abs/0803.0476, 2008.
[6] Christopher S. Campbell, Paul P. Maglio, Alex Cozzi, and Byron Dom. Expertise identification using email communications. In CIKM, pages 528–531, 2003.
[7] M. G. Everett and S. P. Borgatti. The centrality of groups and classes. Journal of Mathematical Sociology, 23(3):181–201, 1999.
[8] J-C. Fournier. Théorie de Graphes et applications. Lavoisier, 2006.
[9] Linton C. Freeman. Centrality in social networks: Conceptual clarification. Social Networks, 1(3):215–239, 1979.
[10] Yupeng Fu, Rongjing Xiang, Yiqun Liu, Min Zhang, and Shaoping Ma. Finding experts using social network analysis. In Web Intelligence, pages 77–80, 2007.
[11] J.W. Grossman. The evolution of the mathematical research collaboration graph. Congressus Numerantium, 2002.
[12] M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45:167–256, Mar 2003.
[13] M. E. J. Newman. Coauthorship networks and patterns of scientific collaboration. Proceedings of the National Academy of Science of the United States (PNAS), 101:5200–5205, 2004.
[14] Jun Zhang and Mark S. Ackerman. Searching for expertise in social networks: a simulation of potential strategies. In GROUP, pages 71–80, 2005.
[15] Jun Zhang, Mark S. Ackerman, and Lada Adamic. Expertise networks in online communities: structure and algorithms. In WWW '07: Proceedings of the 16th international conference on World Wide Web, pages 221–230, New York, NY, USA, 2007. ACM.
