Decision Support Systems 54 (2013) 1442–1451
http://dx.doi.org/10.1016/j.dss.2012.12.020
ExpertRank: A topic-aware expert finding algorithm for online knowledge communities

G. Alan Wang a,⁎, Jian Jiao b, Alan S. Abrahams a, Weiguo Fan c,e, Zhongju Zhang d

a Department of Business Information Technology, Pamplin College of Business, Virginia Tech, 1007 Pamplin Hall, Blacksburg, VA 24061, United States
b Department of Computer Science, Virginia Tech, 114 McBryde Hall, Blacksburg, VA 24061, United States
c Department of Accounting and Information Systems, Pamplin College of Business, Virginia Tech, 3007 Pamplin Hall, Blacksburg, VA 24061, United States
d Operations and Information Management Department, School of Business, University of Connecticut, 2100 Hillside Road, Unit 1041, Storrs, CT 06269, United States
e School of Information, Zhejiang University of Finance and Economics, Hang Zhou, 310018, P.R. China

⁎ Corresponding author. Tel.: +1 540 231 5074; fax: +1 540 231 3752.
E-mail addresses: [email protected] (G.A. Wang), [email protected] (J. Jiao), [email protected] (A.S. Abrahams), [email protected] (W. Fan), [email protected] (Z. Zhang).
Article history: Received 13 April 2012; received in revised form 26 October 2012; accepted 9 December 2012; available online 4 January 2013.

Keywords: Expert finding; Online community; Ranking; Vector space model; Social network analysis; Social media analytics
Abstract

With increasing knowledge demands and limited availability of expertise and resources within organizations, professionals often rely on external sources when seeking knowledge. Online knowledge communities are Internet-based virtual communities that specialize in knowledge seeking and sharing. They provide a virtual media environment where individuals with common interests seek and share knowledge across time and space. A large online community may have millions of participants who have accrued a large knowledge repository with millions of text documents. However, due to the low information quality of user-generated content, it is very challenging to develop an effective knowledge management system for facilitating knowledge seeking and sharing in online communities. The knowledge management literature suggests that effective knowledge management should make accessible not only written knowledge but also experts who are a source of information and can perform a given organizational or social function. Existing expert finding systems evaluate one's expertise based on either the content of authored documents or one's social status within his or her knowledge community. However, very few studies consider both indicators collectively. In addition, very few studies focus on virtual communities, where information quality is often poorer than that in organizational knowledge repositories. In this study we propose a novel expert finding algorithm, ExpertRank, that evaluates expertise based on both document-based relevance and one's authority in his or her knowledge community. We modify the PageRank algorithm to evaluate one's authority so that it reduces the effect of certain biasing communication behaviors in online communities. We explore three expert ranking strategies that combine document-based relevance and authority: linear combination, cascade ranking, and multiplication scaling. We evaluate ExpertRank using a popular online knowledge community. Experiments show that the proposed algorithm achieves the best performance when both document-based relevance and authority are considered.

© 2012 Elsevier B.V. All rights reserved.
1. Introduction

Due to increasing knowledge demands and the limited availability of expertise and resources within an organization, professionals often seek knowledge from external sources such as the Internet for problem solving, especially in the information technology (IT) industry where finding the current best solution is difficult [12,52,59]. Online knowledge communities (KCs) are among the most popular and effective of such sources.
KCs are a special type of networks of practice [8,51], specializing in knowledge sharing and seeking. They are composed of individuals who share common interests and voluntarily work together to expand their understanding of a knowledge domain through learning and sharing [31–33]. KCs rely on cooperating members as primary resources, who collaboratively share knowledge and help build a community knowledge repository. They provide a virtual media environment where individuals may seek and share knowledge across time and space. The technologies supporting KCs have evolved from traditional listservs and newsgroups into more advanced Web-based discussion forums and interactive communication systems that are rich in social media. The number of registered members in KCs is also growing rapidly. For example, big-boards.com ranks some of the largest online discussion forums that are implemented using non-proprietary
discussion forum platforms. As of October 2012, big-boards listed 186 forums with more than 10 million posts and 1753 forums with more than 1 million posts. Many of the large forums discuss topics such as electronics, automobiles, mechanical equipment, software, healthcare, and sports and gaming. For example, one of the largest IT-related KCs, experts-exchange.com, has attracted more than 10,000 active experts and more than 3.8 million registered members. Together, these knowledge seekers and experts have contributed more than 16 million postings related to IT solutions since 1996. Many large companies, e.g., Dell ("Dell Community"), Microsoft ("Microsoft Developer Network Forums"; "Microsoft Office Discussion Groups"), TurboTax ("Turbotax Live Community"), Amazon ("Amazon Web Service Discussion Forums"), and others, also run vast, proprietary internal KCs for both their employees and customers for self-service technical support [39,53].

Knowledge management literature suggests that an effective knowledge management solution should make accessible not only written knowledge but also experts who are a source of information and can perform a given organizational or social function [58]. Expert finding systems, also known as expertise finding systems or expert recommender systems, are an important tool for KCs to make individuals with sought knowledge accessible. An expert in a KC may help solve technical problems directly or refer knowledge seekers to other sources of information as indirect solutions. Different from traditional organizations, where only those with unique knowledge about an item of interest are considered experts [38], the definition of an expert in an online KC is much broader, in the sense that each community member may have some degree of expertise in a certain area [3].

Existing expert finding techniques often rely on the following indicators to determine one's expertise areas and level of expertise: self-classification, document-based relevance, and social importance. The majority of existing expert finding techniques are applied to organizations where information quality is high and the knowledge hierarchy is well defined. However, information quality in online communities is considerably poorer than that in organizations. Gu and Konana [22] found that information quality in an online community is often inversely related to the size of its membership. In the largest community-based open-access online knowledge repository, Wikipedia, merely 0.09% of the articles met a set of information quality assessment criteria and qualified as featured articles as of September 2012 [54]. It is unclear whether existing expert finding techniques still work in the context of online communities.

Most automated expert finding techniques rely on document-based relevance to predict the expertise level of experts for a given query [1,2,5,6,29,45]. These techniques assume that the relevance of one's authored documents to the query is positively related to one's expertise level on the query. Recent studies find that one's social importance, inferred from the structural characteristics of a social network, can also be used to find experts [62]. However, very few studies have collectively considered both document-based relevance and social network based social importance. In this study we propose a new expert finding algorithm, namely ExpertRank, for finding the experts on a given query in an online knowledge community.
The algorithm evaluates experts based on document-based relevance to a given query as well as the experts' social importance in the community.

The rest of this paper is organized as follows. Section 2 briefly reviews related work. Section 3 defines our research objectives. Section 4 describes the ExpertRank algorithm in detail. Section 5 explains our evaluation methods and experimental results. Conclusions and future work are discussed in Section 6.

2. Related work

The expert finding problem has attracted research attention in various contexts. Davenport and Prusak argue that knowledge exists within people and is "a fluid mix of framed experience, values, contextual information, and expert insight" [15]. Traditionally, expert finding has been applied to organizational or enterprise
knowledge repositories where knowledge is well documented and information quality is high. For example, computer systems such as the Answer Garden [1,2] and the DEMOIR approach [58] have been developed to find the appropriate expert for a given problem in an organization. Both systems achieved satisfactory results in the context of organizations.

Compared with organizations, online KCs usually do not have a universal knowledge structure (e.g., a hierarchical structure). Knowledge is generated when community users engage in online discussions and try to help each other solve problems. Information stored in KCs tends to have low quality because KC participants make voluntary contributions and have no obligation to maintain high information quality. Poor information quality inevitably affects the performance of knowledge management activities that involve information processing. Moreover, information quality is strongly related to the perceived usefulness and acceptance of information technology [30]. It is not known whether organization-oriented expert finding techniques can still be effective and useful in online KCs.

Existing expert finding systems utilize three sources of expertise indicators to make expert recommendations: self-disclosed information such as yellow pages and directories, the content of authored documents (and software artifacts), and interaction history and social network analysis [40]. In the rest of this section we review existing expert finding techniques using each type of expertise indicator.

2.1. Expert finding based on self-disclosed information

Self-disclosure requires expert candidates to explicitly declare their expertise in their posted profiles [15]. Expert recommendation systems that employ this approach include yellowpages.com, guru.com, 88owls.com, and other opt-in directory listings of experts. The manual process is time-consuming, and expertise profiles are unlikely to remain current as each user's expertise continuously expands.

2.2. Expert finding based on authored documents

Documents that were written or reviewed by an expert candidate are also a useful expertise indicator. A number of expert finding systems use text mining techniques to automatically capture authors' expertise [28,34,45]. These expert finding techniques are referred to as document-based techniques. For example, in an organizational context, the Answer Garden system [1,2] analyzes questions and answers sent to experts and categorizes the questions and experts into an ontology. Users of the system can navigate the ontology of questions and find experts as the "leaf nodes" in the ontology. However, the technique works only with a predefined set of experts that is hard to change. In addition, an ontology may not be readily available, and it may take considerable effort to build one. Streeter and Lochbaum proposed an Expert/Expert-Locator (EEL) system that pairs requests for technical information with appropriate technical organizations [45]. Their system automatically constructs a semantic space of organizations and terms using a statistical matrix decomposition technique to represent term-based semantic similarity in text documents. In the field of stock-market prediction, Hill and Ready-Campbell [23] assess an expert's skill as the conformance between that expert's historic predictions and actual stock market results. Experts with the highest ranking are those with the highest average number of correct future stock movement predictions ('outperform' vs. 'underperform'), for outlook period X (typically 40 trading days = 2 work months), made by the expert over the prior calendar year.

Other document-based expert finding techniques rely on information retrieval techniques to determine the relevance between an
expert candidate and a search query. For example, Balog et al. proposed two generative language models to identify experts given a search query and a collection of documents associated with expert candidates [5,6]. Krulwich and Burkey developed ContactFinder, which refers electronic bulletin board users with queries to people who can help them, based on historical bulletin board messages [29]. The system categorizes messages and extracts their topics using a set of heuristics. Document-based expert finding techniques have achieved satisfactory performance in their applications.

2.3. Expert finding based on social network analysis

Document-based techniques can be used to find experts because documents contain terms that are semantically relevant to the authors' expertise areas. Although document-based techniques are effective, they fail to consider each expert candidate's importance or influence in the social network to which they belong. Social cognitive studies have repeatedly shown that one's social influence plays an important role in the perception of one's expertise. Romney et al. found that there is generally a positive relationship between the degree to which a person's messages are shared with others and that person's competence [41]. Kameda et al. reasoned that "cognitively central members of a community can provide social validation for other members' knowledge, and that, concurrently, their knowledge is confirmed by other members, leading to the perception of well-balanced knowledge or expertise in the focal task domain" [26]. Information systems studies also show that peripheral information not related to document content is very important in knowledge seeking activities. The knowledge adoption model posits that both content-based information quality and non-content-based information source credibility are positively related to perceived information usefulness and users' intention to adopt the received knowledge [46,49]. Cross et al. showed that social structural influence was one of the factors determining whether a pair of survey respondents reported exchanging information benefits [13]. Specifically, they observed that social influence played an important role in problem reformulation and in the way solutions became more broadly accepted. Ibarra suggests that network centrality is positively related to administrative and technical innovation involvement in an organization [24]. Similar to formal authority, higher network centrality implies a higher degree of access to and control over valued information resources [9].

Expert finding based on social network characteristics alone is rare. Zhang et al. [62] built a social network based on historical post-reply activities in an online Java programming discussion forum. They proposed and evaluated several link-based expert finding algorithms and found that the PageRank-based algorithm outperformed other network-based algorithms in the online community setting. However, their algorithm can be used only to find experts at the community level. It is not capable of finding experts on a specific topic or query.
2.4. Hybrid expert finding techniques

Recent expert finding systems have started to consider both document-based expertise indicators and social relationships [27,36,62]. For example, Serdyukov et al. proposed a relevance propagation algorithm that considered not only the documents directly related to an expert but also those indirectly related through document–author relationships [43]. They also utilized additional peripheral information such as hyperlinks in documents, the organizational structure, and user feedback. Their evaluation results showed that their approach outperformed methods that considered only directly related documents.

Campbell et al. introduced a HITS (Hypertext Induced Topic Search) based expert finding algorithm in the context of email communications [10,14]. HITS is a graph-based link analysis algorithm originally used for rating the importance of Web pages based on authority and hub scores. The algorithm considered both email contents and communication patterns when determining experts' rankings. Experiments showed that their approach performed better than algorithms that considered only email contents. However, their experiments considered very small networks (fewer than 15 people) and failed to explain the meaning of hub and authority in HITS in the context of expert finding. Fu et al. made a similar attempt to combine document contents and social network information for expert finding [21]. They built social networks based on email communications and the co-occurrences of people on Web pages, and demonstrated a performance improvement over a content-based expert finding technique. However, their technique relied on the identification of a number of seed experts, and its performance was sensitive to the selection of seeds (i.e., non-experts). Moreover, they used an enterprise dataset where documents had relatively high information quality. It is questionable whether the same approach can be applied to online KCs where information quality is low and the size of the community network is very large.

Expert finding problems have also been explored using knowledge sources that are not well structured and managed. Zhang et al. proposed an expert finding algorithm for an online help-seeking community, the Java Forum [61]. Their approach considered both document contents and social network information. However, the effectiveness of the proposed algorithm is unknown because of the lack of empirical evaluations.

We summarize existing expert finding methods in Table 1. Expert finding based on manually self-disclosed information is time-consuming, and it is difficult to keep up with constantly expanding expertise profiles. Both document-based expertise indicators and social influence are important factors for locating experts in automated expert finding techniques. However, most of the techniques apply to organizational contexts where information quality is high. The expert finding techniques proposed by Krulwich and Burkey [29] and Zhang et al. [61] aim to extract expertise profiles based on postings made in online communities where information quality is low. However, Krulwich and Burkey's method fails to consider experts' importance in a social context, and the effectiveness of Zhang et al.'s method is unknown. Though Hill and Ready-Campbell [23] consider online communities, they consider only the sequence of binary ('outperform' vs. 'underperform') stock predictions made by each expert, for each stock ticker symbol, rather than the full textual content of the expert's postings. Information quality in their method is therefore high, as only a small, highly structured subset of available information is used for ranking the expert. Finally, most previous work has focused only on the static ranking of domain experts without considering user-specific information needs, whereas our work provides dynamic expert ranking based on topic queries.

3. Research objectives

In this study we propose a new expert finding technique that aims to make the following contributions. First, the technique can be applied to online KCs which, compared with organizational or enterprise knowledge repositories, do not have a knowledge ontology, have low information quality, and are rich in social media. Second, our expert finding technique dynamically ranks experts in the area specified by a search query. Lastly, our technique employs both document content and social network based characteristics as expertise indicators. Document contents are used to evaluate the relevance between an expert candidate's expertise and a search query. Social influence is important in assessing an expert candidate's competence and knowledge balance. We describe our expert finding algorithm in detail in Section 4.
Table 1
Existing expert finding studies. (Checkmarks indicate the expertise indicators used.)

| Study | Self-disclosure | Document-based | Social network | Domain | Document type | Information quality |
|---|---|---|---|---|---|---|
| Davenport and Prusak, 1998 | ✓ | | | Organization | N/A | High |
| Ackerman and Malone, 1990 | | ✓ | | Organization | N/A | High |
| Balog et al., 2006, 2007 | | ✓ | | Organization | W3C HTML documents | High |
| Streeter and Lochbaum, 1988 | | ✓ | | Organization | Computer accessible documents | High |
| Krulwich and Burkey, 1996 | | ✓ | | Online community | Electronic bulletin board messages | Low |
| Hill and Ready-Campbell, 2011 | | ✓ | | Online community | Online user postings | High |
| Zhang et al., 2007 [62] | | | ✓ | Online community | Online user postings | Low |
| Serdyukov, 2007 | | ✓ | ✓ | Organization | W3C HTML documents | High |
| Campbell et al., 2003 | | ✓ | ✓ | Organization | Emails | High |
| Fu et al., 2007 | | ✓ | ✓ | Organization | Emails | High |
| Zhang et al., 2007 [61] | | ✓ | ✓ | Online community | Online user postings | Low |
| This study | | ✓ | ✓ | Online community | Online user postings | Low |

4. The ExpertRank algorithm

We define the expert ranking problem in online KCs as one that identifies the expertise of a certain expert candidate ca given a query topic q: Expert(ca, q). We argue that an expert candidate's expertise on a certain topic depends not only on whether the candidate has relevant knowledge, but also on the candidate's social importance or influence in the community. The idea is similar to that behind a Web information retrieval technique [25], where the rank of a search result page with respect to a user query is determined by a combination of content-based similarity and link-based importance. We define the expertise of a candidate ca with respect to a query q as a heuristic combination of expertise relevance and authority:

ExpertRank(ca, q) = f(RE(ca, q), AU(ca))    (1)

where RE(ca, q) denotes a relevance score between the documents (i.e., posts in online KCs) authored by the candidate ca and a user query q, and AU(ca) denotes the global authority score of the candidate ca, reflecting the social importance of the member within the community. The function f(·) is a combination strategy that collectively considers expertise relevance and authority to produce an expert ranking.

4.1. Expertise relevance

Expertise relevance indicates whether or not a community member has any level of expertise with regard to a specific topic. To calculate the expertise relevance score, we first consider each community participant as a potential expert candidate. We represent each candidate with an expertise profile built by merging all the documents that the candidate has previously authored into a single document. Stop-word removal and stemming should be performed to remove insignificant words and convert derived words into their root forms before such a document is generated for each candidate. Given a search query, we employ the Vector Space Model [42], a classic information retrieval technique, to calculate a cosine similarity score [4] between each candidate's expertise profile document ca and the query q:

RE(ca, q) = \frac{\sum_{j=1}^{t} w_j d_j}{\sqrt{\sum_{j=1}^{t} d_j^2 \cdot \sum_{j=1}^{t} w_j^2}}    (2)

where w_j denotes the term frequency-inverse document frequency (TF-IDF) weight of term j in an expertise profile document and d_j denotes the TF-IDF weight of term j in the query vector q.
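For concreteness, the following minimal sketch computes the relevance score of Eq. (2) using scikit-learn. The forum data and function name are illustrative assumptions, the stemming step described above is omitted for brevity, and scikit-learn's TF-IDF weighting differs in minor normalization details from the classic formulation.

```python
# A minimal sketch of the expertise relevance score in Eq. (2), assuming
# scikit-learn is available. Candidate posts and the query are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def relevance_scores(candidate_posts, query):
    """candidate_posts: dict mapping a candidate id to a list of authored posts."""
    # Step 1: merge each candidate's posts into a single expertise profile.
    ids = list(candidate_posts)
    profiles = [" ".join(candidate_posts[ca]) for ca in ids]
    # Step 2: TF-IDF weighting with stop-word removal. (The paper also applies
    # stemming, which would require an external stemmer such as NLTK's.)
    vectorizer = TfidfVectorizer(stop_words="english")
    profile_vecs = vectorizer.fit_transform(profiles)
    query_vec = vectorizer.transform([query])
    # Step 3: cosine similarity between each profile and the query, as in Eq. (2).
    scores = cosine_similarity(profile_vecs, query_vec).ravel()
    return dict(zip(ids, scores))

posts = {"u1": ["pivot table refresh fails in Excel"], "u2": ["mail merge in Word"]}
print(relevance_scores(posts, "Excel pivot table"))
```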
4.2. Expert authority

To determine a member's social influence or importance, we need to construct a social network that represents the social interactions of members in an online KC. In a typical online knowledge community, as Fig. 1 illustrates, a community user initiates a discussion topic by creating a new thread while others join the discussion by posting replies to the same thread.

Fig. 1. A user–thread graph and a user–user relationship graph.

The user–thread relationships can be represented using a graph: see Fig. 1(1). The user–thread relationship graph has two types of vertices: users u_i and threads t_i. A directed edge from a user to a thread denotes a thread-starting relationship, where the user creates a new thread by posting a question. A directed edge from a thread to a user denotes a thread-replying relationship, where the user has replied to the thread. We extract user–user relationships by connecting users who participate in the same thread with directed edges from topic starters to repliers: see Fig. 1(2). The user–user graph depicts social interactions in the online community. We use this user–user relationship graph to infer one's social importance in an online KC.

Link analysis algorithms such as PageRank [7] can measure the importance (i.e., authority) of a node in a network based on the link structure of the network. Zhang et al. [62] adapted the concept of PageRank to the context of expert finding. The intuition behind their expert ranking algorithm is that a user's expertise level should be higher than that of the users whom he or she is able to help. It is similar to the intuition behind the PageRank algorithm, where the importance of a Web page depends on the pages that link to it. In the context of online community discussions, a user A receives a vote of support from another user B whose question is answered by A. If several users help answer B's question, the vote of support from B is evenly distributed among those users. Fig. 1(2) illustrates a user–user relationship graph derived from discussions in an online community. A user–user relationship graph can be formally defined as U = <V, E>, where V is a vector of N
users and E consists of directed links from discussion thread starters to repliers. Each element in the user–user relationship matrix U, also known as an adjacency matrix in PageRank, can be calculated as

u_{ij} = \begin{cases} 0 & \text{if } (i, j) \notin E \\ 1/\mathrm{out}(i) & \text{if } (i, j) \in E \end{cases}    (3)

where out(i) is user i's out-degree, i.e., the total number of users who have helped user i. Each element u_{ij} represents the vote of support that user j receives from user i. Therefore, we can develop a PageRank-like algorithm to measure a user's importance or authority (AU) in a recursive way:

AU(i) = d \cdot \sum_{j=1}^{N} AU(j) \cdot u_{ji} + (1 - d) \cdot \frac{1}{N}    (4)

where d is a damping factor similar to that used in PageRank, allowing the recursive cycles to jump to a random user in the community, rather than follow the links derived from discussions, a fraction (1 − d) of the time. It is commonly set to 0.85 [7,56]. An iterative algorithm such as the power iteration method can be used for this recursive computation, starting from an initial assignment of authority scores such as AU(i) = 1/N [7,56]. The iteration ends when the authority scores converge.
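As an illustration, the sketch below builds the adjacency matrix of Eq. (3) from a list of (asker, replier) pairs extracted from threads and runs the power iteration of Eq. (4). The data structures and function name are our own assumptions, not part of the original system.

```python
# A minimal sketch of Eqs. (3)-(4): build the user-user matrix from
# asker -> replier edges and run power iteration. Names are hypothetical.
import numpy as np

def authority_scores(edges, n_users, d=0.85, tol=1e-8, max_iter=100):
    """edges: iterable of (i, j) pairs meaning user j replied to user i's thread."""
    counts = np.zeros((n_users, n_users))
    for i, j in edges:
        if i != j:                      # ignore self-replies
            counts[i, j] += 1
    u = np.zeros_like(counts)
    out_deg = (counts > 0).sum(axis=1)  # out(i): number of users who helped user i
    for i in range(n_users):
        if out_deg[i] > 0:
            u[i, counts[i] > 0] = 1.0 / out_deg[i]  # Eq. (3): split i's vote evenly
    au = np.full(n_users, 1.0 / n_users)            # initial AU(i) = 1/N
    for _ in range(max_iter):
        nxt = d * (u.T @ au) + (1 - d) / n_users    # Eq. (4)
        converged = np.abs(nxt - au).sum() < tol
        au = nxt
        if converged:                               # stop once scores converge
            break
    return au

print(authority_scores([(0, 1), (0, 2), (3, 2)], n_users=4))
```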
4.3. A modified PageRank algorithm

In Eq. (4) we assign an equal weight to each user–user edge when determining the authority score. However, those edges may carry different importance weights. Consider two scenarios: (1) two users are connected by an edge because they participated in one discussion thread; (2) two users are connected by an edge because they participated in a number of different discussion threads. The edge in the second scenario should clearly carry more weight. Moreover, we noticed a problem when we applied PageRank to assess users' authority scores in online KCs. Users who discuss frequently with a small number of other users accumulate high PageRank authority scores. However, they may not deserve high authority ratings, for two reasons. First, those users may share a common interest that attracts only a very small audience; they would deserve high authority scores only when the sought topic is highly relevant to their interests. Second, those users often form a collusion structure that leads to inflated PageRank scores. A perfect collusion structure consists of nodes having no external out-links but only internal out-links to each other. Zhang et al. found that a small group of low-ranked (e.g., 10,000th) users could form a collusion that catapulted them into the top 400 [60]. To reduce the influence of colluding groups, we modified the calculation of the expert authority score as follows:

AU(i) = d \cdot \sum_{j=1}^{N} AU(j) \cdot u'_{ji} + (1 - d) \cdot \frac{1}{N}    (5)

where u'_{ji} denotes an element in a weighted adjacency matrix U'. We propose a weighted reference relationship (WRR) algorithm for calculating the weighted adjacency matrix. Fig. 2 illustrates the details of the WRR algorithm, and Fig. 3 illustrates its idea on an example graph.

Fig. 2. The weighted reference relationship algorithm.

Fig. 3. An illustration of the WRR algorithm.

Step (1) of the WRR algorithm assigns each edge a weight n_{ij}, which is the number of replies user j made to user i (e.g., n_{12} = s_{12} = 1, n_{31} = s_{31} = 2). Self-referencing edges receive zero weight (e.g., n_{44} = s_{44} = 0). The matrix S becomes

S = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 2 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.

In Step (2) the algorithm updates the edge weights in matrix S by subtracting the number of reverse links (both direct and indirect) from user j to user i, with the maximum length of an indirect link being no more than m. Assuming m = 2 and β = 0.5, s'_{31} = s_{31} − s_{13} − β · min(s_{12}, s_{23}) = 2 − 0 − 0.5 · min(1, 1) = 1.5. We then get a new matrix S'

S' = \begin{bmatrix} 0 & 0.5 & 0 & 0 & 0 \\ 0 & 0 & 0.5 & 0 & 0 \\ 1.5 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.

The last step of the algorithm normalizes the weights in each row so that the weights on each row add up to 1. We then get

U' = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0.43 & 0 & 0 & 0.285 & 0.285 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
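The following sketch reproduces the worked example above, implementing Steps (1)–(3) for the m = 2 case. The generalization of the reverse-link subtraction to all edges (with clamping at zero, as the example implies) is our reading of the WRR algorithm, and the names are illustrative.

```python
# A sketch of the WRR weighting (Steps 1-3), assuming m = 2 so that indirect
# reverse links are length-2 paths. s[i, j] = number of replies user j made
# to user i; the generalization beyond the worked example is our assumption.
import numpy as np

def wrr_matrix(s, beta=0.5):
    s = np.array(s, dtype=float)
    n = s.shape[0]
    np.fill_diagonal(s, 0)              # Step 1: self-references get zero weight
    sp = np.zeros_like(s)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Step 2: subtract direct and (beta-discounted) indirect reverse links.
            indirect = sum(min(s[j, k], s[k, i]) for k in range(n) if k not in (i, j))
            sp[i, j] = max(0.0, s[i, j] - s[j, i] - beta * indirect)
    # Step 3: row-normalize to obtain the weighted adjacency matrix U'.
    row_sums = sp.sum(axis=1, keepdims=True)
    return np.divide(sp, row_sums, out=np.zeros_like(sp), where=row_sums > 0)

S = np.array([[0, 1, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [2, 0, 0, 1, 1],
              [0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0]])
print(wrr_matrix(S).round(3))  # row 3 becomes [0.429, 0, 0, 0.286, 0.286]
```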
4.4. Combination strategies

Now we consider the combination strategy (i.e., f(·) in Eq. (1)) that combines expertise relevance and expert authority into a single expert ranking score. We propose three strategies.

Strategy 1. Linear combination. The weighted linear combination method is a commonly used data fusion method and has been successfully applied to various data fusion scenarios [35,37,55]. In this strategy we assume that content-based relevance and link-based authority determine an expertise rank score collectively and simultaneously. The relevance score and authority score are combined as a weighted sum

ER(u_i, q) = a \cdot RE(u_i, q) + (1 - a) \cdot AU(u_i)

where u_i is an expert candidate and a ∈ [0,1] is a coefficient. However, a high rank score achieved using this strategy does not necessarily mean high scores in both relevance and authority. A high rank score may result from a highly skewed authority score despite a low relevance score (i.e., the user's expertise profile is not relevant to the search query at all).

Strategy 2. Cascade ranking. The cascade ranking method uses a sequence of ranking functions to progressively refine a rank order [50]. We first use the content-based relevance score alone to rank users, then select a percentage b ∈ [0,1] of top experts based on the relevance ranking, and finally re-rank the top experts using link-based authority scores. This strategy first removes users whose expertise profiles are less relevant; therefore, those who rank highly in the final list have both an expertise profile highly relevant to the search query and a high authority score.

Strategy 3. Scaling strategy. The scaling strategy has previously been used in supervised classification to combine results predicted by different classifiers, and in some cases it outperforms the
linear combination strategy [47]. Following this strategy, we combine the content-based relevance score and the link-based authority score using multiplication:

ER(u_i, q) = RE(u_i, q) \cdot AU(u_i)^c

where c is a scaling parameter reflecting the contribution of relevance and authority to the final expert ranking.
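A minimal sketch of the three combination strategies follows, assuming the relevance and authority scores have already been computed as dictionaries. The function names and toy scores are illustrative; the cascade cut-off b = 0.4 echoes the value found optimal in Section 5, while the other defaults are arbitrary.

```python
# A minimal sketch of the three combination strategies f(.) in Eq. (1),
# assuming re[u] and au[u] hold each candidate's relevance and authority
# scores. Function names and parameter defaults are illustrative.
def linear(re, au, a=0.5):
    # Strategy 1: weighted sum of relevance and authority.
    return {u: a * re[u] + (1 - a) * au[u] for u in re}

def cascade(re, au, b=0.4):
    # Strategy 2: keep the top b fraction by relevance, re-rank by authority.
    top = sorted(re, key=re.get, reverse=True)[: max(1, int(b * len(re)))]
    return sorted(top, key=lambda u: au[u], reverse=True)

def scaling(re, au, c=1.0):
    # Strategy 3: multiplicative combination RE * AU^c.
    return {u: re[u] * au[u] ** c for u in re}

re = {"u1": 0.8, "u2": 0.5, "u3": 0.1}
au = {"u1": 0.2, "u2": 0.6, "u3": 0.9}
print(cascade(re, au))  # u3's high authority cannot save its low relevance
```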
5. Experiments

In this section we describe the empirical evaluations conducted for the proposed ExpertRank algorithm. Because a number of parameters are involved in the proposed algorithm, we first conducted experiments to determine the optimal values for those parameters. In addition, we tested the three proposed strategies for combining content-based relevance and social network based authority and determined the optimal strategy. Finally, using the optimal parameter values and combination strategy, we compared the proposed ExpertRank algorithm with two baseline methods: (1) a document-based method where only content-based relevance scores were used to rank experts, and (2) a hybrid (document-based and social-network-based) method which used a standard (i.e., non-modified) PageRank algorithm to rank experts.

5.1. Dataset

Microsoft Office Discussion Groups is the official online help-seeking community that allows Microsoft Office users to share
experiences and seek help for problems related to the software suite. The community consists of a number of forums, each focusing on a specific Office product, e.g., Word, Access, and Excel. We chose this community as our experimental testbed because this well-established community has accumulated a large volume of online postings with rich social interactions embedded in the posting–replying activities. Moreover, the community members and Microsoft personnel have manually identified the experts in the community. Each year a number of outstanding members of the community are selected to receive the Microsoft Most Valuable Professionals (MVP) award in recognition of their willingness and ability to help others solve problems. These selected members are grouped into different competence lists in corresponding sub-forums. These official member competence lists can naturally be employed as the gold standard for our empirical evaluations.

We crawled all online postings in the Microsoft Office Discussion Groups from the establishment of the discussion groups in 2005 to the end of 2007, approximately three years of data. Table 2 lists some basic statistics of the crawled data.

Table 2
Microsoft Office Discussion Groups.

| Statistic | Value |
|---|---|
| Number of forums | 19 |
| Number of sub-domains | 76 |
| Number of users | 121,289 |
| Number of users who have asked questions | 105,968 |
| Number of users who have replied to questions | 66,369 |
| Number of question posts | 228,787 |
| Number of replying posts | 624,219 |

Similar to what researchers did in TREC [5], we extracted 19 topic phrases that were the names (i.e., Microsoft Office product names) of the 19 forums and used them as search queries for finding experts in each topic area.

5.2. Evaluation metrics

We employed several commonly used information retrieval metrics to evaluate our experimental results, including precision (P), recall (R), macro-average F-measure (Ma), micro-average F-measure (Mi), P@10, and P@20 [57]. Given a query q in a query set Q, let a_q denote the number of true experts detected (true positives), b_q the total number of experts detected (true positives and false positives), and d_q the number of experts in the gold standard (true positives and false negatives). The metrics are calculated using the following formulas:

P = \frac{\sum_{q \in Q} a_q}{\sum_{q \in Q} b_q}, \quad R = \frac{\sum_{q \in Q} a_q}{\sum_{q \in Q} d_q},

F_q = \frac{2 \cdot \frac{a_q}{b_q} \cdot \frac{a_q}{d_q}}{\frac{a_q}{b_q} + \frac{a_q}{d_q}}, \quad Ma = \frac{\sum_{q} F_q}{|Q|}, \quad Mi = \frac{2 \cdot P \cdot R}{P + R}.

P@n measures precision for the top n experts detected. Ma and Mi are F-measures, i.e., weighted harmonic means of precision and recall [57]. When the micro-average F-measure is calculated, the binary expert decisions for all search queries are collected in a joint pool. When the macro-average F-measure is calculated, a separate F-measure F_q is first calculated for each search query before the F_q values are averaged over all search queries.
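For illustration, the sketch below computes P, R, Ma, and Mi for a toy result set, pooling per-query decisions for the micro-average as described above. The data and function name are fabricated for the example.

```python
# A minimal sketch of the evaluation metrics (P, R, Ma, Mi), assuming
# detected[q] and gold[q] are sets of expert ids per query. Toy data only.
def evaluate(detected, gold):
    a = {q: len(detected[q] & gold[q]) for q in detected}  # true positives
    b = {q: len(detected[q]) for q in detected}            # all detected experts
    d = {q: len(gold[q]) for q in detected}                # gold-standard experts
    P = sum(a.values()) / sum(b.values())                  # pooled (micro) precision
    R = sum(a.values()) / sum(d.values())                  # pooled (micro) recall
    Mi = 2 * P * R / (P + R) if P + R else 0.0             # micro-average F-measure

    def f_q(q):  # per-query F-measure; 0 when nothing is detected or no gold exists
        p = a[q] / b[q] if b[q] else 0.0
        r = a[q] / d[q] if d[q] else 0.0
        return 2 * p * r / (p + r) if p + r else 0.0

    Ma = sum(f_q(q) for q in detected) / len(detected)     # macro-average F-measure
    return P, R, Ma, Mi

detected = {"excel": {"u1", "u2"}, "word": {"u3"}}
gold = {"excel": {"u1"}, "word": {"u3", "u4"}}
print(evaluate(detected, gold))
```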
5.3. The optimal parameter values

To determine the parameters β and m in the WRR algorithm, we applied the WRR algorithm to each of the 19 topic queries and compared the resulting sets of experts to the corresponding MVP competence lists. We first let m = 2 and conducted a sensitivity analysis on the value of β in the range between 0 and 1. Fig. 4 shows the precision, recall, and F-measures over different β values. The coefficient β has a slight impact on the performance of the expert ranking system: as we increased the value of β, the corresponding measures improved slightly and monotonically. In the following experiments we let β = 1 in order to achieve the optimal performance.

Fig. 4. The impact of different β values on expert finding (m = 2).

Once the optimal value for coefficient β was determined, we investigated how the largest geodesic distance, m, would affect the performance of expert finding. We limited the value of m to no more than three because the computational complexity grows exponentially as m increases. Results showed that both F-measure values peaked when m = 2 (see Fig. 5). Therefore, the optimal value of m is 2.

Fig. 5. The impact of different m values on expert finding (β = 1).

5.4. The optimal combination strategy

We conducted experiments to determine the optimal strategy for combining the content-based expertise relevance score and the link-based expert authority score. Fig. 6 shows the expert ranking results using the linear combination strategy. It achieved its best performance (MacroF1 = 0.17) when a = 0; that is, under the linear combination assumption, using the authority score alone to rank experts would be adequate. The cascade ranking strategy appeared to be the optimal strategy, as Fig. 7 shows. The algorithm achieved its best performance (MacroF1 = 0.34) when the top 40% of experts, ranked by expertise relevance score, were re-ranked using authority scores. The best performance of the scaling strategy, as shown in Fig. 8, was comparable with that of cascade ranking. We prefer the
cascade ranking strategy because the optimal value of the parameter c in the scaling strategy is difficult to interpret.

Fig. 6. Expert finding performance using the linear combination strategy.

Fig. 7. Expert finding performance using the cascade ranking strategy.

Fig. 8. Expert finding performance using the scaling strategy.

5.5. Comparing with document-based expert finding

Using the optimal parameter values and combination strategy, we evaluated the effectiveness of our ExpertRank algorithm against a baseline method where only the content-based relevance score was used to find the experts. The results are shown in Fig. 9. ExpertRank significantly outperformed the content-based baseline method (p < 0.001). ExpertRank's F-measures (MacroF1 and MicroF1) were around 34%, whereas those of the baseline method were only about 11%, in line with previous expert finding studies based on documents alone, such as [5].

Fig. 9. Performance comparison between ExpertRank and document-based expert finding.

5.6. Comparing with PageRank-based hybrid expert finding
In this experiment we compared the effectiveness of the proposed WRR algorithm in our ExpertRank method with the PageRank used in hybrid expert finding [62]. The WRR algorithm is designed to reduce the effect of inflated authority scores due to colluded discussion patterns. We used WRR and PageRank to calculate expert candidates' authority scores separately before applying the scores to the proposed ExpertRank algorithm. As shown in Fig. 10, our proposed expert finding technique achieved better results in almost all performance measures when WRR was used to calculate authority scores in hybrid expert finding.
Fig. 10. Performance comparison between ExpertRank and PageRank-based hybrid expert finding.

6. Conclusions and future work
In this paper we proposed a new algorithm for finding experts in online KCs. Online KCs provide useful and accessible knowledge sources in various fields. Unlike organizational knowledge repositories, which are well managed and maintained, the knowledge bases in online KCs do not have a universal knowledge structure and tend to have low information quality due to the voluntary and anonymous nature of their participants. On the other hand, online KCs are rich social media that support interactions between knowledge seekers and contributors. Existing expert finding techniques are mainly document-centric, with few leveraging contextual information such as social media. We proposed a novel expert finding technique, ExpertRank, for finding experts in online KCs. ExpertRank evaluates
experts on both knowledge relevance and authority in the community. We proposed a modified PageRank algorithm that reduces the biasing influence of small interconnected groups on the calculation of authority scores. Experimental results showed that the ExpertRank algorithm outperformed both document-centric expert finding and PageRank-based hybrid expert finding.

This research has several limitations which we plan to address in the future. Currently, we use a simple TF-IDF weighting to index all keywords in user postings when creating a user profile. Better term weighting strategies, such as Okapi [44], could be tried to improve the accuracy of user profiling. In addition, we use only content-based and link-based evidence as expertise indicators. There could be other sources of evidence about user expertise, such as the tenure of a user in a KC, the number of postings from a historic time t to the current time, the number of users helped from a historic time t to the current time, etc. These features could be combined with our current design to make user expertise prediction more effective and accurate. Our current combination strategies are relatively straightforward and simple. We could use more powerful fusion strategies such as genetic algorithms [11,19,20] or genetic programming [16–18,48] to automatically design and fine-tune the fusion strategies. Moreover, our current evaluation is based on only one large online KC. We could validate our proposed algorithm on more diversified KCs to test its efficacy and generalizability.

Expert finding has become increasingly important in large corporations which have amassed a huge volume of data such as employee internal communications (e.g., emails), knowledge exchanges on internal discussion forums, and customer service interactions collected via the Web. Notable examples include GE, Dell, IBM, KPMG, Microsoft, and Google. The research ideas presented in this article could easily be extended and applied to such data to help build expert databases or organizational memory systems that facilitate knowledge exchange among employees.
Acknowledgments

This research is partly supported by the Natural Science Foundation of China (grants #70872089 and #71072129) and the National Science Foundation (grant #DUE-0840719).
References

[1] M.S. Ackerman, T.W. Malone, Answer Garden: a tool for growing organizational memory, Proceedings of the ACM SIGOIS and IEEE CS TCOA Conference on Office Information Systems, Cambridge, MA, 1990, pp. 31–39.
[2] M. Ackerman, D. McDonald, Answer Garden 2: merging organizational memory with collaborative help, Proceedings of the 1996 ACM Conference on Computer Supported Cooperative Work, Boston, MA, 1996, pp. 97–105.
[3] M.S. Ackerman, V. Wulf, V. Pipek, Sharing Expertise: Beyond Knowledge Management, MIT Press, Cambridge, MA, 2002.
[4] R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval, ACM Press, New York, NY, 1999.
[5] K. Balog, L. Azzopardi, M. de Rijke, Formal models for expert finding in enterprise corpora, Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, 2006, p. 50.
[6] K. Balog, T. Bogers, L. Azzopardi, M. de Rijke, A. van den Bosch, Broad expertise retrieval in sparse data environments, Proceedings of the 30th Annual International ACM SIGIR Conference, Amsterdam, The Netherlands, 2007, pp. 551–558.
[7] S. Brin, L. Page, The anatomy of a large-scale hypertextual web search engine, Computer Networks 30 (1998) 107–117.
[8] J.S. Brown, P. Duguid, Knowledge and organization: a social-practice perspective, Organization Science 12 (2) (2001) 198–213.
[9] R.S. Burt, Toward a Structural Theory of Action, Academic Press, New York, NY, 1982.
[10] C.S. Campbell, P.P. Maglio, A. Cozzi, B. Dom, Expertise identification using email communications, Proceedings of the 12th International Conference on Information and Knowledge Management, New Orleans, LA, 2003.
[11] H. Chen, Machine learning for information retrieval: neural networks, symbolic learning, and genetic algorithms, Journal of the American Society for Information Science 46 (1995) 194–216.
[12] D. Constant, L. Sproull, S. Kiesler, The kindness of strangers: the usefulness of electronic weak ties for technical advice, Organization Science 7 (2) (1996) 119.
[13] R. Cross, R.E. Rice, A. Parker, Information seeking in social context: structural influences and receipt of information benefits, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 31 (4) (2001) 438–448.
[14] R. D'Amore, Expertise community detection, Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Sheffield, UK, 2004, pp. 498–499.
[15] T. Davenport, L. Prusak, Working Knowledge: How Organizations Manage What They Know, Harvard Business School Press, Boston, MA, 1998.
[16] W. Fan, M.D. Gordon, P. Pathak, Discovery of context-specific ranking functions for effective information retrieval using genetic programming, IEEE Transactions on Knowledge and Data Engineering 16 (2004) 523–527.
[17] W. Fan, M.D. Gordon, P. Pathak, A generic ranking function discovery framework by genetic programming for information retrieval, Information Processing and Management 40 (4) (2004) 587–602.
[18] W. Fan, M.D. Gordon, P. Pathak, Genetic programming-based discovery of ranking functions for effective web search, Journal of Management Information Systems 21 (4) (2005) 37–56.
[19] W. Fan, M. Gordon, P. Pathak, An integrated two-stage model for intelligent information routing, Decision Support Systems 42 (1) (2006) 362–374.
[20] W. Fan, P. Pathak, M. Zhou, Genetic-based approaches in ranking function discovery and optimization in information retrieval — a framework, Decision Support Systems 47 (4) (2009) 398–407.
[21] Y. Fu, R. Xiang, Y. Liu, M. Zhang, S. Ma, Finding experts using social network analysis, Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, Fremont, CA, 2007, pp. 77–80.
[22] B. Gu, P. Konana, B. Rajagopalan, H.W.M. Chen, Competition among virtual communities and user valuation: the case of investing-related communities, Information Systems Research 18 (1) (2007) 68–85.
[23] S. Hill, N. Ready-Campbell, Expert stock picker: the wisdom of (experts in) crowds, International Journal of Electronic Commerce 15 (3) (2011) 73–102.
[24] H. Ibarra, Network centrality, power, and innovation involvement: determinants of technical and administrative roles, Academy of Management Journal 36 (3) (1993) 471–501.
[25] R. Jin, S. Dumais, Probabilistic combination of content and links, Proceedings of the 24th International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, LA, 2001, pp. 402–403.
[26] T. Kameda, Y. Ohtsubo, M. Takezawa, Centrality in sociocognitive networks and social influence: an illustration in a group decision-making context, Journal of Personality and Social Psychology 73 (2) (1997) 296.
[27] H. Kautz, B. Selman, M. Shah, Referral web: combining social networks and collaborative filtering, Communications of the ACM 40 (3) (1997) 63–65.
[28] B. Krulwich, C. Burkey, ContactFinder agent: answering bulletin board questions with referrals, The 13th National Conference on Artificial Intelligence, Portland, OR, 1996.
[29] B. Krulwich, C. Burkey, ContactFinder agent: answering bulletin board questions with referrals, Proceedings of the 13th National Conference on Artificial Intelligence, Portland, OR, 1996.
[30] A.L. Lederer, D.J. Maupin, M.P. Sena, Y. Zhuang, The technology acceptance model and the world wide web, Decision Support Systems 29 (3) (2000) 269–282.
[31] H. Lin, W. Fan, Z. Zhang, Uncovering critical success factors for web-based knowledge communities, Proceedings of the 16th Biennial Conference of the International Telecommunications Society, Beijing, China, 2006.
[32] H. Lin, W. Fan, L. Wallace, Z. Zhang, An empirical study of knowledge community success, Proceedings of the 40th Hawaii International Conference on System Sciences (HICSS), Big Island, Hawaii, 2007.
[33] H. Lin, W. Fan, Z. Zhang, A qualitative study of web-based knowledge communities: examining success factors, International Journal of e-Collaboration 5 (3) (2009) 39–57.
[34] X. Liu, G.A. Wang, A. Johri, M. Zhou, W. Fan, Harnessing global expertise: a comparative study of expertise profiling methods for online communities, Information Systems Frontiers (2012) 1–13.
[35] C. Marrocco, P. Simeone, F. Tortorella, A linear combination of classifiers via rank margin maximization, Proceedings of the 2010 Joint IAPR International Conference on Structural, Syntactic, and Statistical Pattern Recognition, Cesme, Izmir, Turkey, 2010, pp. 650–659.
[36] D. Mattox, M. Maybury, D. Morey, Enterprise expert and knowledge discovery, Proceedings of the 8th International Conference on Human–Computer Interaction, Munich, Germany, 1999, pp. 23–27.
[37] P. McCullagh, J.A. Nelder, Generalized Linear Models, 2nd ed., Chapman & Hall/CRC, Boca Raton, FL, 1989.
[38] P.A. Morris, Combining expert judgments: a Bayesian approach, Management Science 23 (7) (1977) 679–693.
[39] S. Nambisan, R.A. Baron, Interactions in virtual customer environments: implications for product support and customer relationship management, Journal of Interactive Marketing 21 (2) (2007) 42–62.
[40] T. Reichling, M. Veith, V. Wulf, Expert recommender: designing for a network organization, Learning in Communities (2009) 139–171.
[41] A.K. Romney, S.C. Weller, W.H. Batchelder, Culture as consensus: a theory of culture and informant accuracy, American Anthropologist 88 (2) (1986) 313–338.
[42] G. Salton, A. Wong, C.S. Yang, A vector space model for automatic indexing, Communications of the ACM 18 (11) (1975) 613–620.
[43] P. Serdyukov, H. Rode, D. Hiemstra, University of Twente at the TREC 2007 Enterprise Track: modeling relevance propagation for the expert search task, The 16th Text Retrieval Conference (TREC 2007) Enterprise Track, Gaithersburg, MD, 2007.
[44] K. Sparck Jones, S. Walker, S. Robertson, A probabilistic model of information retrieval: development and comparative experiments part 2, Information Processing and Management 36 (6) (2000) 809–840.
[45] L. Streeter, K. Lochbaum, An expert/expert-locating system based on automatic representation of semantic structure, Proceedings of the Fourth Conference on Artificial Intelligence Applications, San Diego, CA, 1988, pp. 380–388.
[46] S.W. Sussman, W.S. Siegal, Informational influence in organizations: an integrated approach to knowledge adoption, Information Systems Research 14 (1) (2003) 47–65.
[47] D.M.J. Tax, R.P.W. Duin, M. Van Breukelen, Comparison between product and mean classifier combination rules, Proceedings of the Workshop on Statistical Pattern Recognition, Prague, Czech Republic, 1997.
[48] R.S. Torres, A. Falcao, M.A. Goncalves, J.P. Papa, B. Zhang, W. Fan, E.A. Fox, A genetic programming framework for content-based image retrieval, Pattern Recognition 42 (2) (2009) 283–293.
[49] G.A. Wang, X. Liu, W. Fan, A text classification framework for finding helpful user-generated contents in online communities, Proceedings of the 2011 International Conference on Information Systems, Shanghai, China, 2011.
[50] L. Wang, J. Lin, D. Metzler, A cascade ranking model for efficient ranked retrieval, Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, China, 2011, pp. 105–114.
[51] M. Wasko, S. Faraj, Why should I share? Examining social capital and knowledge contribution in electronic networks of practice, MIS Quarterly 29 (1) (2005) 35–57.
[52] M. Wasko, S. Faraj, R. Teigland, Collective action and knowledge contribution in electronic networks of practice, Journal of the Association for Information Systems 5 (11–12) (2004) 494–513.
[53] C. Wiertz, K. de Ruyter, Beyond the call of duty: why customers contribute to firm-hosted commercial online communities, Organization Studies 28 (3) (2007) 347–376.
[54] Wikipedia, Wikipedia: Featured Article Statistics, available at: http://en.wikipedia.org/wiki/Wikipedia:Featured_article_statistics (last accessed October 11, 2012).
[55] S. Wu, Linear combination of component results in information retrieval, Data & Knowledge Engineering 71 (1) (2011) 114–126.
[56] X. Wu, V. Kumar, J. Ross Quinlan, J. Ghosh, Q. Yang, H. Motoda, G.J. McLachlan, A. Ng, B. Liu, P.S. Yu, Top 10 algorithms in data mining, Knowledge and Information Systems 14 (1) (2008) 1–37.
[57] Y. Yang, An evaluation of statistical approaches to text categorization, Information Retrieval 1 (1) (1999) 69–90.
[58] D. Yimam-Seid, A. Kobsa, Expert finding systems for organizations: problem and domain analysis and the DEMOIR approach, in: Sharing Expertise: Beyond Knowledge Management, MIT Press, Cambridge, MA, 2003.
[59] W. Zhang, S. Watts, Knowledge adoption in online communities of practice, Systemes d'Information et Management 9 (1) (2003) 81–102.
[60] H. Zhang, A. Goel, R. Govindan, K. Mason, B. Van Roy, Making eigenvector-based reputation systems robust to collusion, Algorithms and Models for the Web-Graph (2004) 92–104.
[61] J. Zhang, M. Ackerman, L. Adamic, K. Nam, QuME: a mechanism to support expertise finding in online help-seeking communities, The ACM Symposium on User Interface Software and Technology, ACM, 2007, p. 114.
[62] J. Zhang, M.S. Ackerman, L. Adamic, Expertise networks in online communities: structure and algorithms, Proceedings of the 16th International World Wide Web Conference (WWW), Banff, Canada, 2007.
G. Alan Wang is an Assistant Professor in the Department of Business Information Technology, Pamplin College of Business, at Virginia Tech. He received a Ph.D. in Management Information Systems from the University of Arizona, an M.S. in Industrial Engineering from Louisiana State University, and a B.E. in Industrial Management & Engineering from Tianjin University. His research interests include heterogeneous data management, data cleansing, data mining and knowledge discovery, and decision support systems. He has published in Communications of the ACM, IEEE Transactions on Systems, Man, and Cybernetics (Part A), IEEE Computer, Group Decision and Negotiation, Journal of the American Society for Information Science and Technology, and Journal of Intelligence Community Research and Development.
Jian Jiao is a Ph.D. candidate in Computer Science at Virginia Tech and a Software Design Engineer at Microsoft. He holds an M.S. in Computer Science from the Beijing Institute of Technology, and has previous work experience at Microsoft Research Asia and Motorola.
Alan S. Abrahams is an Assistant Professor in the Department of Business Information Technology, Pamplin College of Business, at Virginia Tech. He received a Ph.D. in Computer Science from the University of Cambridge, and holds a Bachelor of Business Science degree from the University of Cape Town. Dr. Abrahams's primary research interest is in the application of decision support systems in entrepreneurship. He has published in a variety of journals including Expert Systems with Applications, Journal of Computer Information Systems, Communications of the AIS, and Group Decision and Negotiation.
Weiguo (Patrick) Fan is a Full Professor of Accounting and Information Systems and Full Professor of Computer Science (courtesy) at the Virginia Polytechnic Institute and State University (Virginia Tech). He received his Ph.D. in Business Administration from the Ross School of Business, University of Michigan, Ann Arbor, in 2002, an M.S. in Computer Science from the National University of Singapore in 1997, and a B.E. in Information and Control Engineering from Xi'an Jiaotong University, P.R. China, in 1995. His research interests focus on the design and development of novel information technologies (information retrieval, data mining, text/web mining, and business intelligence techniques) to support better business information management and decision making. He has published more than 100 refereed journal and conference papers. His research has appeared in journals such as Information Systems Research, Journal of Management Information Systems, IEEE Transactions on Knowledge and Data Engineering, Information Systems, Communications of the ACM, Journal of the American Society for Information Science and Technology, Information Processing and Management, Decision Support Systems, ACM Transactions on Internet Technology, Pattern Recognition, IEEE Intelligent Systems, Pattern Recognition Letters, International Journal of e-Collaboration, and International Journal of Electronic Business.
Zhongju (John) Zhang is an Associate Professor in the School of Business, University of Connecticut. He received his Ph.D. in Management Science (with minors in Economics and Operations Management) from the University of Washington Business School. Zhang's research focuses on problems at the interface of information systems/technologies, marketing, economics, and operations research. His research has been published in academic journals including Information Systems Research, INFORMS Journal on Computing, Journal of Management Information Systems, IEEE Transactions on Engineering Management, Decision Support Systems, European Journal of Operational Research, Communications of the ACM, and Decision Sciences Journal, as well as in various international conference proceedings. Zhang was a co-recipient of the Best IS Publications of the Year 2010. He has also won the Research Excellence Award (2011), the MBA Teacher of the Year (2010), the Best Paper Award (2009), and the Ackerman Scholar Award (2007–2009) at the UConn School of Business. Zhang is a guest associate editor for MIS Quarterly and serves on the editorial boards of Journal of Database Management, Journal of Electronic Commerce Research, and Electronic Commerce Research and Applications.