
Int. J. Collaborative Enterprise, Vol. 2, No. 1, 2011

On the exploitation of relationships in online communities of practice G. Gkotsis* and Nikos Karacapilidis Industrial Management and Information Systems Lab, MEAD, University of Patras, 26500 Rio Patras, Greece Fax: +30-2610-997260 E-mail: [email protected] E-mail: [email protected] *Corresponding author

Nikos Tsirakis Computer Engineering and Informatics Department, University of Patras, 26500 Rio Patras, Greece Fax: +30-2610-960322 E-mail: [email protected]

Abstract: Numerous tools aiming at facilitating or enhancing collaboration among members of diverse communities have already been deployed and tested over the web. Taking into account the particularities of online communities of practice, this paper introduces a framework for mining knowledge that is hidden in such settings. Our motivation stems from the criticism that contemporary tools receive regarding lack of active participation and limited engagement in their use, partially due to their inability to identify and meaningfully exploit important relationships among community members and collaboration-related assets. Particular attention is given to the identification of requirements imposed by contemporary communities and learning contexts. Our overall approach elaborates and integrates issues from the disciplines of data mining and social networking.

Keywords: community of practice; clustering; data mining; knowledge discovery; social network analysis; collaboration.

Reference to this paper should be made as follows: Gkotsis, G., Karacapilidis, N. and Tsirakis, N. (2011) ‘On the exploitation of relationships in online communities of practice’, Int. J. Collaborative Enterprise, Vol. 2, No. 1, pp.75–86.

Biographical notes: G. Gkotsis is a PhD student at the Mechanical Engineering and Aeronautics Department, University of Patras. He received his MSc from the Computer Engineering and Informatics Department, University of Patras in 2005, where he received his diploma in 2002. His research interests are on collaboration, argumentation systems, knowledge management and visualisation, web engineering and hypertext.

Copyright © 2011 Inderscience Enterprises Ltd.


Nikos Karacapilidis is a Professor at the University of Patras, Greece in the field of management information systems. His research interests lie in the areas of intelligent web-based information systems, e-collaboration, knowledge management systems, group decision support systems, computer-supported argumentation, enterprise information systems and semantic web.

Nikos Tsirakis is a PhD candidate in the Computer Engineering and Informatics Department at the University of Patras (Advisor Assistant Professor Christos Makris). In September 1999, he entered the Department of Computer Engineering and Informatics, School of Engineering, University of Patras and he received his diploma as a Computer Engineer in 2004. In October 2006, he received his Master in Computer Science from the same department. His research interests are the design and analysis of data mining algorithms and applications, especially for huge data manipulation, e.g., databases, data streams and XML data, hypertext, software quality assessment and finally web technologies.

1 Introduction

As the diffusion of information grows enormously, contemporary knowledge workers face a series of problems. People struggle to filter relevant information, extract knowledge from it, and apply specific practices to the problem under consideration. This phenomenon, broadly known as ‘information overload’, has raised new and challenging issues that have not yet been fully addressed. At the same time, it is widely acknowledged that one of the best means of keeping a knowledge worker’s competence high is continuous learning (Rosenberg, 2000). In fact, most organisations already support learning through seminars and other traditional activities. Nevertheless, these activities do not meet every learning need.

Collaborative environments that support collaboration among groups of people forming communities of practice (CoPs) are believed to be one of the most promising solutions for promoting what is commonly known as ‘collective intelligence’ or ‘organisational memory’ (Ackerman, 1998). The term CoP is used to define a group of people with “common disciplinary background, similar work activities, tools and shared stories, contexts and values” (Millen et al., 2002). At the same time, modern learning theories strongly support the value of communities and collaborative work as effective settings for learning (Hoadley and Kilner, 2005). More specifically, they emphasise collaborative learning work, which refers to processes, methodologies and environments where professionals engage in a common task and where individuals are accountable to each other. When speaking about collaborative learning, we espouse Wenger’s perspective of learning as a social phenomenon in the context of our lived experience of participation in the world (Wenger, 1998).

A comparison between online and traditional CoPs reveals several differences (Palloff and Pratt, 1999). Even though the goal of this work is not to study them thoroughly, some of them are mentioned here. A traditional CoP often confronts time and space limitations, and its members are assigned a specific role. Furthermore, entrance to a traditional CoP may require a more intentional motivation, while CoP members tend to self-organise through certain physical activities. On the other hand,


online CoPs are more likely to define more ‘fluid’ limitations. This can result in more ‘peripheral’ members with less ‘visibility’ (Zhang and Storck, 2001). Taking the above into consideration, several implications arise concerning online CoPs. In fact, usage analysis of such communities has shown that one of the greatest problems is the fading or even the absence of members’ identities (Haythornthwaite et al., 2000). Related to the above remarks, contemporary tools are criticised with regard to the active participation and engagement of their users; this is partially due to their inability to identify and meaningfully exploit a set of important relationships among community members and collaboration-related assets.

To address this problem, this paper introduces a framework that enables one to reveal meaningful relationships, as well as other valuable information, about the community members’ roles and competencies. The proposed framework exploits and integrates features originally found in the data mining and social networking disciplines, and is intended to be used towards strengthening a community’s integrity.

2 Related work

Contemporary approaches to web-based collaboration environments build on diverse user profiling mechanisms (Fink and Kobsa, 2000). These approaches usually distinguish between static (user-defined) and dynamic (system-updated) profiles. Dynamic attributes are derived by tracking user actions and aim at providing a more personalised environment. Personalisation of the environment may include user interface adaptation by making the most usable actions or information more easily accessible. Moreover, by taking into account a user’s profile, these approaches aim at filtering information that generally resides in a big collection of documents. Information filtering is achieved by reading the content of these documents and combining this content with the user’s profile. The main goal of these approaches is to provide individualised recommendations to the users about the system items (Burke, 2002).

Social network analysis (SNA) allows the examination of social relationships in a group of users and is able to reveal hidden relationships (Wasserman and Faust, 1994). In business, SNA can be a useful tool to reveal relationships and organisational structure other than those formally defined. These relationships are extracted by examining the communication level among employees, usually resulting in a collaboration graph. The analysis of this graph is likely to result in a flow chart that does not necessarily follow the formal organisational structure. Traditionally, social relationships are revealed by acquiring data manually (through questionnaires or interviews). SNA has been applied with success in business for decades and is regarded as a useful tool to analyse how a corporation operates. This includes identifying employees that are ‘key players’ in the communication flow of the company.

From a technological point of view, ‘friend of a friend’ (FOAF, http://foaf-project.org/), which is an extension of the resource description framework (RDF) specified with the web ontology language (OWL), has been well received and may be considered as a part of the effort to enhance the web with more semantics (semantic web). It is worth noting that within this context, Tim Berners-Lee introduced the term ‘giant global graph’ as the graph that unites the World Wide Web and the social graph (Lee, 2007). Based on the above technology, much research has been conducted


(Paolillo and Wright, 2004; Mika, 2005), mainly focusing on technological solutions that exploit the FOAF standard to build large, organised collections of information about users.

From another perspective, data mining plays an important role in the area of collaboration and learning support systems. Data mining can be defined as the effort to generate actionable models through automated analysis of databases. Data mining can only be deployed successfully when it generates insights that are substantially deeper than what a simple view of data can provide. Clustering is one of the basic data mining techniques (Han and Kamber, 2005; Hand et al., 2001), for which numerous approaches have already been proposed. Generally speaking, the goal of clustering is, given a specific dataset, to find ‘naturally’ occurring groups within this dataset.

With the rapid increase of web traffic, understanding user behaviour based on users’ interaction with a website is becoming more and more important. Clustering, in combination with personalisation techniques for this information space, has become a necessity. Web-based systems (including collaboration support systems) are equipped with mechanisms that report on events, conditions, errors and alerts. In addition, when dealing with web-based learning systems, even common log files about users’ requests are considered. The combination of this information and the use of data mining techniques can help one analyse users’ activities in depth.

In summary, the need for extracting and analysing relationships among users has been identified for a long time. The topic of this paper is addressed by two different but complementary approaches. One of them applies data mining techniques that result in the identification of groups of people with similar preferences. The other adopts SNA for gathering the required knowledge with regard to these groups. This approach is applied to an online CoP aiming at revealing issues like management or human resource vulnerabilities. Integrating these approaches, this paper introduces a framework that aims at identifying, measuring and exploiting hidden relationships in a CoP. The overall approach followed in this paper, as described in the next section, takes into account the particularities of these communities by customising practices from data mining and collaborative filtering and uniting them with SNA.

3 Mining hidden knowledge

Let us consider a collaboration support system, which adopts common Web 2.0 features and functionalities. The system allows members of a CoP to easily express pieces of their knowledge through various types of documents. These documents are uploaded on virtual working environments, usually known as workspaces. Let us also assume that the system supports document rating by its users. For such a setting, we introduce a framework for mining hidden knowledge. The proposed framework exploits a set of metrics that are discussed below.

First, we define user similarity sim(i, j) between users i and j. Similarity is calculated by exploiting the user nearest neighbour algorithm (McLaughlin and Herlocker, 2004), which takes as input the items’ ratings provided by users. More specifically, the algorithm for measuring user similarity is based on the Pearson correlation, with respect to the commonly rated items. We define:

$$
\mathrm{sim}(i, j) = \frac{\sum_{a \in A_i \cap A_j} \left(\mathrm{Rating}_i(a) - \overline{\mathrm{Rating}}_i\right)\left(\mathrm{Rating}_j(a) - \overline{\mathrm{Rating}}_j\right)}{\sigma_i \, \sigma_j},
$$


where $\sigma_i$ and $\sigma_j$ represent the standard deviations of the common item ratings for users $i$ and $j$, respectively, $A_i$ is the set of items that user $i$ has rated, $\mathrm{Rating}_i(a)$ is the rating of user $i$ on item $a$, and $\overline{\mathrm{Rating}}_i$ is the average rating of user $i$ on the set of commonly rated items. To avoid overestimating the similarity of users when only a small number of documents have been rated by both users $i$ and $j$, the above equation is amended as follows:

$$
\mathrm{sim}'(i, j) = \frac{\min\{|A_i \cap A_j|, \gamma\}}{\gamma} \cdot \mathrm{sim}(i, j),
$$

where $\gamma$ is a system-defined threshold (Herlocker et al., 2002). In our case, the users of the system are the authors of all documents uploaded on the system’s workspaces. To take into account the ratings of user $i$ on documents created by user $j$, a modification of the above measurement is defined:

$$
\mathrm{sim}''(i, j) = \frac{\overline{\mathrm{Rating}}_{ij}}{\overline{\mathrm{Rating}}_i} \cdot \mathrm{sim}'(i, j),
$$

where $\overline{\mathrm{Rating}}_{ij}$ is the average rating of user $i$ on documents created by user $j$. The above equation takes into account how user $i$ evaluates documents of user $j$ with regard to the rest of their common documents. It follows that this similarity between users $i$ and $j$ is not reciprocal: $\mathrm{sim}''(i, j)$ generally differs from $\mathrm{sim}''(j, i)$. This remark is of great importance. Naturally, depending on a CoP’s activity and the nature of the data, there are cases where the number of user $j$’s documents that are rated by user $i$ is small and therefore the information is limited. Thus, depending on the amount and nature of our data, either $\mathrm{sim}'(i, j)$ or $\mathrm{sim}''(i, j)$ can be selected (see more in Section 3.1).

Secondly, we define $\mathrm{Relationship}_{ij}$ between users $i$ and $j$ as a boolean measurement of direct communication between these users as follows: for each pair $(i, j)$ of members, we create a matrix $A$ where $a_{ij}$ is the number of document ratings that user $i$ has made on documents created by user $j$. From this matrix $A$, we construct a new symmetric matrix $A'$ where $a'_{ij} = \max\{a_{ij}, a_{ji}\}$ (weak cohesion) or $a'_{ij} = \min\{a_{ij}, a_{ji}\}$ (strong cohesion). Assuming that $n_i$ is the overall number of ratings of user $i$, we define a symmetric measurement, called communication degree $d_{ij}$, which represents the communication flow between users $i$ and $j$, as follows:

$$
d_{ij} = \frac{a'_{ij}}{n_i + n_j - a'_{ij}}.
$$

$d_{ij}$ is 1 when users $i$ and $j$ rate exclusively each other’s documents, and 0 if neither user has rated any of the other’s documents (weak cohesion) or at least one of them has not rated any of the other’s documents (strong cohesion). We define the binary function $\mathrm{Relationship}_{ij}$ to indicate whether or not there is adequate direct information exchange between users $i$ and $j$:

$$
\mathrm{Relationship}_{ij} =
\begin{cases}
1 & \text{if } d_{ij} \geq t, \\
0 & \text{if } d_{ij} < t,
\end{cases}
$$

where $t$ is a threshold depending on the nature and needs of the community under consideration. $\mathrm{Relationship}_{ij}$ is the fundamental relationship that is used to construct the social network in the community and will be used for the required analysis.
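As an illustration only, the communication degree and the Relationship function defined above can be sketched in Python as follows. The data layout (a dictionary of rating counts keyed by user pairs), the function names and the sample users are assumptions made for this sketch and are not part of the framework's specification.

```python
# Minimal sketch of the communication degree d_ij and Relationship_ij.
# Assumed (hypothetical) data layout:
#   a[(i, j)] = number of ratings user i made on documents created by user j
#   n[i]      = overall number of ratings made by user i

def communication_degree(a, n, i, j, strong=False):
    """Return d_ij using weak (max) or strong (min) cohesion."""
    a_ij = a.get((i, j), 0)
    a_ji = a.get((j, i), 0)
    a_sym = min(a_ij, a_ji) if strong else max(a_ij, a_ji)
    denominator = n[i] + n[j] - a_sym
    # Guard for users without any ratings (an assumption of this sketch).
    return a_sym / denominator if denominator > 0 else 0.0

def relationship(a, n, i, j, t, strong=False):
    """Return 1 if d_ij reaches the threshold t, otherwise 0."""
    return 1 if communication_degree(a, n, i, j, strong) >= t else 0

# Hypothetical sample data for three community members.
a = {("ann", "bob"): 4, ("bob", "ann"): 2, ("ann", "cat"): 1}
n = {"ann": 5, "bob": 6, "cat": 3}

print(communication_degree(a, n, "ann", "bob"))               # weak cohesion
print(communication_degree(a, n, "ann", "bob", strong=True))  # strong cohesion
print(relationship(a, n, "ann", "bob", t=0.3))
```

In this sample, the choice between weak and strong cohesion changes whether the pair counts as related for the same threshold, which is why the choice should follow the needs of the community under consideration.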


Other metrics adopted within the proposed framework are:

• Clusters: they refer to groups of entities (users in our case), such that entities in one cluster are very similar, while entities in different clusters are quite distinct. Each cluster can combine various plausible criteria (Bock, 1989). Sometimes, there are requirements about the entities in the cluster, such as to share the same or closely related properties, to show small mutual distances or dissimilarities, or to have ‘contacts’ with at least one more entity in the same cluster.

• Degree: it expresses the number of people a CoP member is connected to. Members with high degree may be considered hubs of major importance in the network, since they keep the CoP tightly connected.

• Betweenness: while someone may be tightly connected with someone else, it might be the case that some CoP members express the CoP’s integrity ‘better’. People with high betweenness value are considered to better express a collective consensus. More formally, betweenness of a member can be expressed as the total number of shortest paths between all pairs of members that pass through this member (Freeman, 1977; Anthonisse, 1971).

• Closeness: it expresses the total number of links that a member must go through in order to reach everyone else in the network (Sabidussi, 1966).
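The three network metrics listed above can be illustrated with a short Python sketch. It assumes an undirected, unweighted graph stored as a dictionary of neighbour sets (a hypothetical representation of the Relationship network). Betweenness is computed here with Brandes' accumulation scheme, in which a pair of members connected by several shortest paths contributes fractionally to each intermediate member; this is a common variant rather than necessarily the exact counting intended above.

```python
from collections import deque

def degree(graph):
    """Number of members each member is directly connected to."""
    return {v: len(neighbours) for v, neighbours in graph.items()}

def bfs_distances(graph, source):
    """Shortest-path lengths (in links) from source to every reachable member."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def closeness(graph):
    """Total number of links needed to reach all reachable members (lower is more central)."""
    return {v: sum(bfs_distances(graph, v).values()) for v in graph}

def betweenness(graph):
    """Shortest-path betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack, dist = [], {s: 0}
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was visited from both endpoints.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical chain of four members: a - b - c - d
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(degree(chain), closeness(chain), betweenness(chain))
```

In the sample chain, the two inner members have both the highest degree and the highest betweenness, while the outer members need the most links to reach everyone else.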

3.1 Clustering

Clustering can be defined as the process of organising objects in a database into groups (clusters), such that objects within the same cluster have a high degree of similarity (while objects belonging to different clusters have a high degree of dissimilarity) (Anderberg et al., 1973; Jain and Dubes, 1988; Kaufman and Rousseeuw, 1990). Generally speaking, clustering methods for numerical data have been viewed in opposition to conceptual clustering methods developed in artificial intelligence. More precisely, numerical techniques emphasise the determination of homogeneous clusters according to some similarity measures, but provide low-level descriptions of clusters (Anderberg et al., 1973). Recent works on clustering focus on numerical data, whose inherent geometric properties can be exploited to naturally define distance functions between data points, such as DBSCAN (Ester et al., 1996), BIRCH (Zhang et al., 1996), C2P (Nanopoulos et al., 2001), CURE (Guha et al., 2001), CHAMELEON (Karypis et al., 1999) and WaveCluster (Sheikholeslami et al., 2000). However, data mining applications frequently involve many datasets that also consist of categorical attributes on which distance functions are not naturally defined.

Referring to a specific CoP, a cluster is a collection of users that share similar ratings on items of the same workspace. For example, let SP1, SP2, ..., SPk be the k workspaces used by a community A. For each workspace, we build an array X of size n × n (n is the number of users), where the cell Xij denotes the correlation between user i and user j. Correlation can be either sim′(i, j) or sim′′(i, j), which will result in symmetric undirected or directed arrays, respectively. After the construction of these arrays, a unified array can be built for the whole set of workspaces by calculating the average value of each cell.
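The construction of the per-workspace arrays and of the unified array can be sketched in Python as follows. This is a minimal illustration under stated assumptions: ratings are held in nested dictionaries (hypothetical layout), the deviations are normalised so that sim lies in [−1, 1], pairs with fewer than two common items or with constant ratings are given similarity 0, and only sim′ is shown (sim′′ would additionally need the author of each document).

```python
from math import sqrt

def sim(ratings, i, j):
    """Pearson correlation of users i and j over their commonly rated items."""
    common = ratings[i].keys() & ratings[j].keys()
    if len(common) < 2:
        return 0.0  # assumption: too little evidence to correlate
    mean_i = sum(ratings[i][a] for a in common) / len(common)
    mean_j = sum(ratings[j][a] for a in common) / len(common)
    num = sum((ratings[i][a] - mean_i) * (ratings[j][a] - mean_j) for a in common)
    dev_i = sqrt(sum((ratings[i][a] - mean_i) ** 2 for a in common))
    dev_j = sqrt(sum((ratings[j][a] - mean_j) ** 2 for a in common))
    if dev_i == 0 or dev_j == 0:
        return 0.0  # assumption: constant ratings carry no correlation
    return num / (dev_i * dev_j)

def sim_prime(ratings, i, j, gamma):
    """Damp the similarity of pairs with fewer than gamma common items."""
    common = ratings[i].keys() & ratings[j].keys()
    return min(len(common), gamma) / gamma * sim(ratings, i, j)

def similarity_array(ratings, users, gamma):
    """n x n array X for one workspace; X[r][c] = sim'(users[r], users[c])."""
    return [[sim_prime(ratings, i, j, gamma) for j in users] for i in users]

def unified_array(arrays):
    """Cell-wise average of the per-workspace arrays."""
    n = len(arrays[0])
    return [[sum(x[r][c] for x in arrays) / len(arrays) for c in range(n)]
            for r in range(n)]

# Hypothetical ratings in two workspaces: ratings[user][document] = rating
users = ["u1", "u2", "u3"]
workspace_1 = {"u1": {"d1": 5, "d2": 3, "d3": 1},
               "u2": {"d1": 4, "d2": 3, "d3": 2},
               "u3": {"d1": 1, "d2": 3, "d3": 5}}
workspace_2 = {"u1": {"d4": 2, "d5": 4},
               "u2": {"d4": 2, "d5": 4},
               "u3": {"d4": 3}}

X1 = similarity_array(workspace_1, users, gamma=5)
X2 = similarity_array(workspace_2, users, gamma=5)
U = unified_array([X1, X2])
print(U)
```

Keeping X1 and X2 alongside U preserves the per-workspace view, while U summarises the community as a whole.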
At this point, it is worth noting that the exploitation of data arrays for each workspace (and not the unified data array of the community) can provide a more detailed


view of user and workspace activity. This approach of clustering user data for a specific workspace results in lower level clusters, the so-called micro-clusters (Aggarwal et al., 2003). Typically, micro-clusters reduce data, allow the creation of data summaries and have found usage in data streams and applications with vast amounts of data that evolve very fast over time. Moreover, micro-clusters can be used as an intermediate result to create higher level clusters. In the context of a collaborative environment, micro-clusters can be utilised as indicators for a detailed and comparative view of the CoP under investigation (through incremental clustering) (Can, 1993).

Regarding the clustering procedure, and depending on the nature of the data gathered, two different approaches are followed with respect to the nature of the arrays (one concerning symmetric undirected arrays and one concerning directed arrays). Both of them are described below.

3.1.1 Symmetric undirected arrays

In this case, an algorithm for hierarchical clustering is applied. In hierarchical clustering, there is a partitioning procedure of objects into optimally homogeneous groups (Johnson, 1967). There are two different categories of hierarchical algorithms: those that repeatedly merge two smaller clusters into a larger one, and those that split a larger cluster into smaller ones. In the min-max cut algorithm (Ding et al., 2001), given $n$ data objects and the pair similarity matrix $W = (w_{i,j})$ (where $w_{i,j}$ is the similarity weight between $i$ and $j$), the aim is to partition the data into two clusters $A$ and $B$. The principle of this algorithm is to minimise similarity between clusters and maximise similarity within a cluster. The similarity between clusters $A$ and $B$ is defined as the cutsize

$$
s(A, B) = \sum_{i \in A,\, j \in B} w_{i,j}.
$$

Similarity (self-similarity) within a cluster $A$ is the sum of all similarity weights within $A$: $s(A, A)$. Consequently, the algorithm is required to minimise $s(A, B)$ and maximise $s(A, A)$ and $s(B, B)$, which is formulated by the min-max cut function $\mathrm{Mcut}(A, B)$:

$$
\mathrm{Mcut}(A, B) = \frac{s(A, B)}{s(A, A)} + \frac{s(A, B)}{s(B, B)}.
$$

Linkage $l(A, B)$ is a closeness or similarity measure between clusters $A$ and $B$; it quantifies cluster similarity more efficiently than weight, since it normalises cluster weight:

$$
l(A, B) = \frac{s(A, B)}{s(A, A) \cdot s(B, B)}.
$$

For a single user $i$, the linkage to cluster $A$ is

$$
l(A, i) = \frac{s(A, i)}{s(A, A)}, \quad \text{where } s(A, i) = s(A, B),\ B = \{i\}.
$$

According to this equation, users close to the cut can be found. If a user $i$ belongs to a cluster, the linkage with this cluster will be high. When a user is near the cut, the linkage difference can be used: $\Delta l(i) = l(A, i) - l(B, i)$. A user with small $\Delta l$ is near the cut and is a possible candidate to be moved to the other cluster. Following this concept, more clusters can be created by recursively applying min-max cut until certain stopping criteria are met (such as the number of clusters or a predefined min-max cut value). Another clustering choice based on the above algorithm is the use


of average similarity that is maximised during clustering. The average self-similarity of a cluster $A$ is computed as $\bar{s}(A, A) = s(A, A)/n^2$, where $n = |A|$. A high average self-similarity is a sign of a cluster with homogeneous objects. According to this approach, the loosest cluster (i.e., the one with the smallest average similarity) is split.
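To make the quantities of this subsection concrete, the following Python sketch evaluates the cutsize, the min-max cut function, the linkage measures and the average self-similarity on a small similarity matrix. For illustration, the best bipartition is found here by exhaustive search, which is only feasible for a handful of users and is an assumption of this sketch; it is not the optimisation procedure of Ding et al. (2001). The matrix W and all function names are hypothetical.

```python
from itertools import combinations

def s(W, A, B):
    """Sum of similarity weights between the members of A and B."""
    return sum(W[i][j] for i in A for j in B)

def mcut(W, A, B):
    """Min-max cut objective: small when A and B are well separated."""
    return s(W, A, B) / s(W, A, A) + s(W, A, B) / s(W, B, B)

def linkage(W, A, B):
    """Normalised similarity between clusters A and B."""
    return s(W, A, B) / (s(W, A, A) * s(W, B, B))

def linkage_difference(W, A, B, i):
    """Delta l(i): positive when user i is linked more strongly to A than to B."""
    return s(W, A, [i]) / s(W, A, A) - s(W, B, [i]) / s(W, B, B)

def average_self_similarity(W, A):
    """s(A, A) / |A|^2; the cluster with the smallest value is the loosest."""
    return s(W, A, A) / len(A) ** 2

def best_bipartition(W):
    """Exhaustive search for the bipartition with the smallest Mcut (toy sizes only)."""
    nodes = range(len(W))
    best = None
    for size in range(1, len(W) // 2 + 1):
        for A in combinations(nodes, size):
            B = tuple(v for v in nodes if v not in A)
            if s(W, A, A) == 0 or s(W, B, B) == 0:
                continue  # objective undefined for this split
            score = mcut(W, A, B)
            if best is None or score < best[0]:
                best = (score, A, B)
    return best

# Hypothetical symmetric similarity matrix for four users:
# users 0 and 1 are similar, users 2 and 3 are similar.
W = [[1.0, 0.9, 0.1, 0.1],
     [0.9, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.8],
     [0.1, 0.1, 0.8, 1.0]]

score, A, B = best_bipartition(W)
print(score, A, B)
print(average_self_similarity(W, A), average_self_similarity(W, B))
```

On this matrix the search separates users 0 and 1 from users 2 and 3, and the pair with the lower average self-similarity would be the next candidate for splitting.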

3.1.2 Directed arrays

In this case, the clustering algorithm presented by Chakrabarti et al. (1998) (which is based on Kleinberg’s link analysis algorithm, Kleinberg, 1999) is adopted. Initially, this analysis was applied to documents related to each other through directed relationships (like in the World Wide Web); for every document, authority and hub scores are calculated as the sum of hub and authority scores pointed to and from this document, respectively. More specifically, having set up the user similarity matrix $X$ of size $n \times n$ as before, a weighted directed graph of users is constructed and the non-principal eigenvectors of $X^T X$ are computed. The components of each non-principal eigenvector are assigned to each user. By ordering the values of each eigenvector in increasing order, a partition is declared among users at the largest gap between them. The entries of $X^T X$ represent authority scores and those of $X X^T$ refer to hub scores. The result is groups of users that are close to one another under the authority or hub score value. The algorithm can create up to $2n$ different groups of users, but experimental results have shown that groups with users that have large coordinates in the first few eigenvectors tend to recur and therefore a first set of collections of users can satisfy the clustering procedure (e.g., Kleinberg, 1999).

Having considered these groups of users, cluster analysis can be performed. Such an analysis may provide (depending on the data) a better understanding of the relations between users, especially in systems where multiple types of data appear. It can also help us detect outliers (objects whose values are distributed differently from those of the majority of other objects in the workspace), which may emerge as singletons or small clusters. For a given intensity of communication or actions, clusters emphasise the various intensity levels. Cluster analysis also reveals useful information for interpretation. We can find how many groups are in the same community, while each group can be inspected to reveal patterns of interest.
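The eigenvector-based partitioning described in this subsection can be sketched in plain Python as follows. The sketch is a simplification under stated assumptions: only the first non-principal eigenvector of X^T X is used, it is obtained by power iteration followed by deflation (a choice made here for self-containment), and the array X is a toy example in which users 0 and 1, and users 2 and 3, receive identical similarity values so that the expected grouping is evident.

```python
from math import sqrt

def authority_matrix(X):
    """Return X^T X for a square array X given as a list of rows."""
    n = len(X)
    return [[sum(X[r][j] * X[r][k] for r in range(n)) for k in range(n)]
            for j in range(n)]

def mat_vec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def power_iteration(M, iterations=500):
    """Dominant eigenpair of a symmetric positive semi-definite matrix."""
    n = len(M)
    # Non-uniform start vector so that it is not orthogonal to the target
    # in the symmetric toy cases used here (an assumption of this sketch).
    start_norm = sqrt(sum((1.0 + k) ** 2 for k in range(n)))
    v = [(1.0 + k) / start_norm for k in range(n)]
    for _ in range(iterations):
        w = mat_vec(M, v)
        norm = sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(M, v)))
    return eigenvalue, v

def second_eigenvector(M):
    """First non-principal eigenvector, obtained by deflating the principal one."""
    lam, v = power_iteration(M)
    n = len(M)
    deflated = [[M[r][c] - lam * v[r] * v[c] for c in range(n)] for r in range(n)]
    return power_iteration(deflated)[1]

def split_at_largest_gap(vector):
    """Order users by eigenvector component and cut at the largest gap."""
    order = sorted(range(len(vector)), key=lambda k: vector[k])
    gaps = [vector[order[k + 1]] - vector[order[k]] for k in range(len(order) - 1)]
    cut = gaps.index(max(gaps))
    return order[:cut + 1], order[cut + 1:]

# Hypothetical directed (non-symmetric) similarity array for four users.
X = [[0.9, 0.9, 0.1, 0.1],
     [0.8, 0.8, 0.2, 0.2],
     [0.1, 0.1, 0.7, 0.7],
     [0.2, 0.2, 0.9, 0.9]]

low, high = split_at_largest_gap(second_eigenvector(authority_matrix(X)))
print(sorted(low), sorted(high))
```

Replacing `authority_matrix(X)` with the corresponding product X X^T would give the grouping under hub scores instead; further non-principal eigenvectors could be obtained by repeating the deflation step.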

3.2 Social network analysis

After the clustering procedure, groups of users that share the same or closely related preferences with regard to their documents’ ratings are revealed. More formally, there is a classification of members of every CoP into a set of clusters C1, C2, ..., Cm. For each cluster, the proposed framework computes the values of each of the SNA measures (i.e., degree, betweenness and closeness). More specifically, contrary to clustering where the measurement was correlation (i.e., user similarity), SNA exploits the Relationshipij measurement, which corresponds to communication among members. This analysis can provide useful findings; for example, a specific group of users considered to have high similarity will appear as a cluster. More generally, the combination of clustering and SNA highlights properties in groups of users of particular interest; this extracted knowledge can be provided to the users through notification or recommendation mechanisms.

Finally, concerning the computational and technological aspects of SNA metrics introduced, even though a thorough study of the different algorithms and implementation


approaches concerning SNA does not fall within the scope of this paper, some of them are mentioned. Closeness and betweenness are two metrics of the centrality of a vertex within a graph and can be computed with well-established algorithms that are tightly coupled with common shortest path algorithms (e.g., graph traversal using Dijkstra’s algorithm, Dijkstra, 1959). From an implementation point of view, LEDA (Mehlhorn et al., 1997) is a general-purpose library which provides adequate support to perform the required computations.
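The per-cluster analysis of this section can be illustrated by restricting the Relationship network to the members of each cluster before any SNA measure is computed. The short Python sketch below shows this restriction together with the degree measure; the dictionary-of-sets graph representation, the function names and the sample data are assumptions of the sketch.

```python
def induced_subgraph(graph, cluster):
    """Keep only the members of the cluster and the links among them."""
    members = set(cluster)
    return {v: graph[v] & members for v in members}

def degree_per_cluster(graph, clusters):
    """Degree of every member, counted inside the member's own cluster."""
    return [{v: len(neighbours)
             for v, neighbours in induced_subgraph(graph, cluster).items()}
            for cluster in clusters]

# Hypothetical Relationship network and clustering result.
network = {"a": {"b", "c"}, "b": {"a"}, "c": {"a", "d"}, "d": {"c"}}
clusters = [["a", "b"], ["c", "d"]]

print(degree_per_cluster(network, clusters))
```

Member "a" is connected to two members of the community but to only one member of its own cluster, which shows how the per-cluster view can differ from the community-wide one.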

4 Discussion

A CoP is regarded as a dynamic network of users. Like any other network, there are some parts that may have crucial impact on its efficiency. The identification of a set of qualitative metrics that describe the above network can help members work more effectively and guarantee the CoP’s continuity. The framework we introduce in this paper may be applied to a large number of systems designed to support collaboration and may help people identify information that was not obvious. More specifically, by identifying clusters of users within a community, it is possible to reveal intellectual, social or spiritual differences among community members, as well as the way these users self-organise into smaller groups. The above remarks may either prove to be of vital importance for assuring the community’s survivability, or may possibly expose the need to review the community’s current structure.

Furthermore, by applying SNA on clusters, it is possible to identify whether a social network is centralised or not. Establishment of decentralised networks leads to a healthier network, since this results in fewer members that act as brokers (Hanneman and Riddle, 2005). This role is very powerful: a broker ties two groups of people and acts as a communication bridge (more formally, a broker is identified as a ‘cutpoint’ in the graph, i.e., a node that, if removed, increases the number of disconnected parts). In this manner, members that are found to have crucial impact on the group’s activity or collaboration may be notified and assume their corresponding duties and tasks, thus making sure that information is being propagated properly. Moreover, users that share common ideas and tend to rate other users’ contributions similarly can be brought together. In such a way, communication among members may be assured and collaboration can persist, even if some communication problems occur.
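The notion of a broker as a cutpoint can be made concrete with a small Python sketch. It removes each member in turn and checks whether the number of connected parts of the network grows; this direct approach is chosen here for clarity (linear-time algorithms exist) and the graph representation and sample network are hypothetical.

```python
def count_components(graph, removed=None):
    """Number of connected parts of the graph, optionally ignoring one member."""
    seen = set() if removed is None else {removed}
    components = 0
    for start in graph:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            v = stack.pop()
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return components

def cutpoints(graph):
    """Members whose removal increases the number of disconnected parts."""
    base = count_components(graph)
    return {v for v in graph if count_components(graph, removed=v) > base}

# Hypothetical network: member "x" ties the group {a, b} to the group {c, d}.
network = {"a": {"b", "x"}, "b": {"a", "x"},
           "c": {"d", "x"}, "d": {"c", "x"},
           "x": {"a", "b", "c", "d"}}

print(cutpoints(network))
```

In this sample only "x" is reported, since removing any other member leaves the remaining members connected.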
Finally, one more issue that was taken under consideration and may be the subject of exploitation is the enlargement of legitimate peripheral participation (Lave and Wenger, 1991). By this term, we refer to the ability that a CoP has to attract newcomers and accommodate the unfolding of their abilities. Analysing user actions, identifying user relationships and extracting several metrics regarding the communication level among CoP members is a promising technique for improving community productivity and awareness; members are able to resolve issues faster and more efficiently, while members’ competencies may emerge in a more natural way. More specifically, awareness services can be built aiming at the self-evaluation of the participation of community members, providing them with valuable feedback about their overall contribution to the community and assisting them in collaborative learning as well as in self-reflecting. Using statistical reports populated from the analysis described above, such services can measure the level of the members’ contribution to the collaboration procedure. In such a way, a CoP member can keep track of the actual impact of resources posted by each user. Thus, one can be aware


of the overall impression that other members have about their participation. Finally, the entire CoP may advance faster and more securely, while its evolution can remain under constant and meaningful observation.

5 Conclusions

In this paper, we introduced a framework that can be applied to a wide range of software platforms aiming at facilitating collaboration and learning among users. Our motivation stems from the fact that contemporary collaboration support environments suffer from low user engagement. Having described the basic characteristics of the settings under consideration, we presented an approach that integrates techniques from the data mining and SNA disciplines. More precisely, we formulated two different clustering approaches in order to find the values of some meaningful metrics. Moreover, we combined the outcomes of the proposed clustering methodology with SNA metrics. The result of this effort is to reveal valuable knowledge residing in a CoP. This knowledge may be used in a variety of ways, aiming at enhancing the system’s capability and allowing users to communicate and collaborate more effectively. Future work plans include thorough validation of the proposed framework through real data from diverse CoPs using a particular collaboration support tool (namely CoPe it!, Karacapilidis et al., 2009), and investigation of diverse scenarios for its further exploitation.

References

Ackerman, M. (1998) ‘Augmenting organizational memory: a field study of answer garden’, ACM Transactions on Information Systems (TOIS), Vol. 16, No. 3, pp.203–224.
Aggarwal, C.C., Han, J., Wang, J. and Yu, P.S. (2003) ‘A framework for clustering evolving data streams’, VLDB ’2003: Proceedings of the 29th International Conference on Very Large Data Bases, VLDB Endowment, pp.81–92.
Anderberg, M. et al. (1973) Cluster Analysis for Applications, Academic Press, New York.
Anthonisse, J. (1971) ‘The rush in a directed graph’, Report BN9/71, Stichting Mathematisch Centrum, Amsterdam, The Netherlands.
Bock, H. (1989) ‘Probabilistic aspects in cluster analysis’, Conceptual and Numerical Analysis of Data, pp.12–44.
Burke, R. (2002) ‘Hybrid recommender systems: survey and experiments’, User Modeling and User-Adapted Interaction, Vol. 12, No. 4, pp.331–370.
Can, F. (1993) ‘Incremental clustering for dynamic information processing’, ACM Trans. Inf. Syst., Vol. 11, No. 2, pp.143–164.
Chakrabarti, S., Dom, B., Gibson, D., Kumar, R., Raghavan, P., Rajagopalan, S. and Tomkins, A. (1998) ‘Spectral filtering for resource discovery’, ACM SIGIR workshop on Hypertext Information Retrieval on the Web.
Dijkstra, E. (1959) ‘A note on two problems in connexion with graphs’, Numerische Mathematik, Vol. 1, No. 1, pp.269–271.
Ding, C., He, X., Zha, H., Gu, M. and Simon, H. (2001) ‘A min-max cut algorithm for graph partitioning and data clustering’, Proceedings of the 2001 IEEE International Conference on Data Mining, Washington, DC, USA, pp.107–114.
Ester, M., Kriegel, H., Sander, J. and Xu, X. (1996) ‘A density-based algorithm for discovering clusters in large spatial databases with noise’, Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, AAAI Press, Vol. 6, pp.226–231.


Fink, J. and Kobsa, A. (2000) ‘A review and analysis of commercial user modeling servers for personalization on the world wide web’, User Modeling and User-Adapted Interaction, Vol. 10, No. 2, pp.209–249.
Freeman, L. (1977) ‘A set of measures of centrality based on betweenness’, Sociometry, Vol. 40, No. 1, pp.35–41.
Guha, S., Rastogi, R. and Shim, K. (2001) ‘CURE: An efficient clustering algorithm for large databases’, Information Systems, Vol. 26, No. 1, pp.35–58.
Han, J. and Kamber, M. (2005) Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Hand, D., Mannila, H. and Smyth, P. (2001) Principles of Data Mining, MIT Press, Cambridge, MA, USA.
Hanneman, R. and Riddle, M. (2005) Introduction to Social Network Methods, University of California, Riverside, CA.
Haythornthwaite, C., Kazmer, M., Robins, J. and Shoemaker, S. (2000) ‘Community development among distance learners: temporal and technological dimensions’, Journal of Computer-Mediated Communication, Vol. 6, No. 1, pp.1–26.
Herlocker, J., Konstan, J. and Riedl, J. (2002) ‘An empirical analysis of design choices in neighborhood-based collaborative filtering algorithms’, Information Retrieval, Vol. 5, No. 4, pp.287–310.
Hoadley, C. and Kilner, P. (2005) ‘Using technology to transform communities of practice into knowledge-building communities’, ACM SIGGROUP Bulletin, Vol. 25, No. 1, p.40.
Jain, A. and Dubes, R. (1988) Algorithms for Clustering Data, Prentice-Hall, Inc., p.320.
Johnson, S. (1967) ‘Hierarchical clustering schemes’, Psychometrika, Vol. 32, No. 3, pp.241–254.
Karacapilidis, N., Tzagarakis, M., Karousos, N., Gkotsis, G., Kallistros, V., Christodoulou, S., Mettouris, C. and Nousia, D. (2009) ‘Tackling cognitively-complex collaboration with CoPe it!’, International Journal of Web-Based Learning and Teaching Technologies, Vol. 4, No. 3, pp.22–38.
Karypis, G., Han, E. and Kumar, V. (1999) ‘Chameleon: a hierarchical clustering algorithm using dynamic modeling’, IEEE Computer, Vol. 32, No. 8, pp.68–75.
Kaufman, L. and Rousseeuw, P. (1990) Finding Groups in Data: An Introduction to Cluster Analysis, Wiley Series in Probability and Mathematical Statistics, Applied Probability and Statistics Section (EUA).
Kleinberg, J. (1999) ‘Authoritative sources in a hyperlinked environment’, Journal of the ACM, Vol. 46, No. 5, pp.604–632.
Lave, J. and Wenger, E. (1991) Situated Learning: Legitimate Peripheral Participation, Cambridge University Press.
Lee, T.B. (2007) Giant Global Graph, available at http://dig.csail.mit.edu/breadcrumbs/node/215 (accessed on 17 January 2010).
McLaughlin, M. and Herlocker, J. (2004) ‘A collaborative filtering algorithm and evaluation metric that accurately model the user experience’, Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM New York, NY, USA, pp.329–336.
Mehlhorn, K., Näher, S. and Uhrig, C. (1997) ‘The LEDA platform for combinatorial and geometric computing’, Automata, Languages and Programming, pp.7–16.
Mika, P. (2005) ‘Flink: semantic web technology for the extraction and analysis of social networks’, Web Semantics: Science, Services and Agents on the World Wide Web, Vol. 3, Nos. 2–3, pp.211–223.
Millen, D.R., Fontaine, M.A. and Muller, M.J. (2002) ‘Understanding the benefit and costs of communities of practice’, Commun. ACM, Vol. 45, No. 4, pp.69–73.


Nanopoulos, A., Theodoridis, Y. and Manolopoulos, Y. (2001) ‘C2P: clustering based on closest pairs’, Proceedings of the 27th International Conference on Very Large Data Bases, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp.331–340.
Palloff, R. and Pratt, K. (1999) Building Learning Communities in Cyberspace, Jossey-Bass Publishers.
Paolillo, J. and Wright, E. (2004) ‘The challenges of FOAF characterization’, Proceedings of the 1st Workshop on Friend of a Friend, Social Networking and the (Semantic) Web, Galway, Ireland.
Rosenberg, M.J. (2000) E-Learning: Strategies for Delivering Knowledge in the Digital Age, McGraw-Hill.
Sabidussi, G. (1966) ‘The centrality index of a graph’, Psychometrika, Vol. 31, No. 4, pp.581–603.
Sheikholeslami, G., Chatterjee, S. and Zhang, A. (2000) ‘Wavecluster: a multi-resolution clustering approach for very large spatial databases’, Proceedings of the International Conference on Very Large Data Bases, Springer-Verlag New York, Inc., Secaucus, NJ, USA, Vol. 8, pp.289–304.
Wasserman, S. and Faust, K. (1994) Social Network Analysis: Methods and Applications, Cambridge University Press.
Wenger, E. (1998) Communities of Practice: Learning, Meaning, and Identity, Cambridge University Press.
Zhang, T., Ramakrishnan, R. and Livny, M. (1996) ‘BIRCH: an efficient data clustering method for very large databases’, ACM SIGMOD Record, Vol. 25, No. 2, pp.103–114.
Zhang, W. and Storck, J. (2001) ‘Peripheral members in online communities’, Proceedings of the Seventh Americas Conference on Information Systems, Boston, Massachusetts.