Socialisation in Peer-to-Peer Knowledge Management
Christoph Schmitz & Steffen Staab & Christoph Tempich
(Institut AIFB, Universität Karlsruhe, Germany
[email protected])
Abstract: IT support for knowledge management that builds on rather standard information systems architectures, e.g. a Web server with underlying database technologies, has proven beneficial in many situations where the knowledge processes supported in this way were comparatively rigid and where the value of knowledge could be assessed reasonably easily. However, these assumptions do not hold for less rigid knowledge processes; thus, more decentralised solutions have been proposed. With SWAPSTER we have built a Peer-to-Peer knowledge management platform that avoids some of the issues that impair centralised solutions. This platform is surveyed here. We also show methods that support new ways of socialisation enabled by the Peer-to-Peer platform.
Key Words: Distributed Knowledge Management, Socialisation, Simulation
Category: H.4.1, I.2.4, C.2.4, I.6.3, I.2.11
1 Introduction
IT support for knowledge management that builds on rather standard information systems architectures, e.g. a Web server with underlying database technologies, has proved beneficial in many situations where the knowledge processes supported in this way were comparatively rigid and where the value of knowledge could be assessed reasonably easily. A typical instance of such a KM setting at a consulting company may, e.g., involve debriefing at project touchdowns, where new or supporting insights about the current project practices would be collected, their value assessed and, possibly, inserted into the knowledge base. This knowledge process is, and should be, very rigidly defined, and crucial for the success of the best-practice knowledge base are the assessment of old and new knowledge entries and their easy retrieval. However, once we compare this to the situation that, e.g., consultants face in their daily practice, the assumptions on the knowledge processes do not hold. The consultants react on the spur of the moment; they continuously produce and refine knowledge items, the value of which cannot be assessed immediately, as the influence of such items only appears at the end of the project and, moreover, their assessment requires a lot of time. Thus, a lot of knowledge that is made explicit on their laptops in their daily ad-hoc consulting practice is not available at any centralised resource. This knowledge remains with them and is not accessible to their peers, which hampers both finding and accessing ad-hoc created knowledge and socialising with the right person.
With SWAPSTER we have built a Peer-to-Peer knowledge management platform that avoids some of the issues that impair centralised solutions. This platform is surveyed here. While such a platform in itself appears to be very valuable for finding and accessing ad-hoc created knowledge items, we also show methods that support new ways of socialisation (cf. [8]) enabled by the Peer-to-Peer platform. In the following, we briefly introduce SWAPSTER. Then we sketch two methods and experiments for the online and offline creation of networking topologies that support socialisation between people who have created ad-hoc knowledge.
2 The SWAPSTER Platform for Peer-to-Peer KM
The SWAP environment (Semantic Web And Peer-to-peer; SWAPSTER for short) [3] is a generic infrastructure which was designed to enable knowledge sharing in a distributed network. Nodes wrap knowledge from their local sources (files, e-mails, etc.), and they ask for and retrieve knowledge from their peers. For communicating knowledge, SWAPSTER transmits RDF structures (footnote 1), which are used to convey conceptual structures (e.g., the definition of what a conference is) as well as corresponding data (e.g., data about I-Know-2004). For structured queries as well as for keyword queries, SWAPSTER uses SeRQL, an SQL-like query language that allows for queries combining the conceptual and the data level and for returning newly constructed RDF structures. Some of the core components of SWAPSTER are:

Footnote 1: http://www.w3.org/RDF/

The Local Node Repository stores all the knowledge as well as the meta knowledge a node wants to share with other peers in RDF, accessible by SeRQL queries.

The Knowledge Sources of a node comprise all sources from which knowledge can be extracted. Knowledge sources supported by SWAPSTER are the Windows file system, Outlook e-mail and relational databases. From full-text sources, SWAPSTER extracts only some meta information (‘who wrote a file?’, ‘who sent an e-mail?’, ‘which keywords are contained?’), but not the knowledge contained in the text. For databases, SWAPSTER may fully represent their structures, e.g. their tuples.

The Knowledge Source Integrator (KSI) maps and migrates the different knowledge sources into the local node repository in order to have one well-defined, unique, and coherent store for each node. In addition to the knowledge itself, the KSI adds meta knowledge according to the SWAPSTER meta data model [3], including data such as the provenance of knowledge (‘which node made this particular statement?’). The meta data model is the basis for the online socialisation support algorithm (Section 3.2).

The User Interface allows for query composition and browsing of the local node repository.

Finally, the Communication Adapter (CA) is responsible for relaying communication between nodes in the network. Our current implementation of the CA builds on JXTA.
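To make the idea of a local node repository with provenance meta knowledge more concrete, the following is a minimal sketch in plain Python. It is not the SWAPSTER API and does not use RDF or SeRQL; the class names `Statement` and `LocalNodeRepository`, the `source_peer` field and the example data are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """An RDF-like triple plus one piece of provenance meta knowledge."""
    subject: str
    predicate: str
    obj: str
    source_peer: str  # hypothetical field: which node made this statement


class LocalNodeRepository:
    """Toy stand-in for a local node repository (illustrative only)."""

    def __init__(self):
        self._statements = set()

    def add(self, statement):
        self._statements.add(statement)

    def match(self, subject=None, predicate=None, obj=None):
        """Return statements matching the pattern; None acts as a wildcard."""
        hits = (
            s for s in self._statements
            if (subject is None or s.subject == subject)
            and (predicate is None or s.predicate == predicate)
            and (obj is None or s.obj == obj)
        )
        return sorted(
            hits, key=lambda s: (s.subject, s.predicate, s.obj, s.source_peer)
        )


# Illustrative data: two peers contribute statements about one conference.
repo = LocalNodeRepository()
repo.add(Statement("I-Know-2004", "rdf:type", "Conference", "peer-A"))
repo.add(Statement("I-Know-2004", "hasTopic", "Knowledge Management", "peer-B"))

conference_facts = repo.match(subject="I-Know-2004")
providers = {s.source_peer for s in conference_facts}
```

Keeping the provenance next to each statement is what later allows a peer to remember which other peers supplied knowledge on a given topic.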
3 Socialisation between Peer Nodes
Even though knowledge management measures strive to improve the knowledge flow in organisations to their benefit, communication in networks of people has in general been shown to be rather effective. For instance, Milgram showed in his classical experiment that peer-to-peer passing of information to targeted, though unknown, parties is effectively and efficiently possible [6]. In this experiment, subjects were given the task of having a letter reach an unknown addressee. The subjects as well as the people in the relay chain were only allowed to relay the letter to people they knew. The experiments showed that the letters indeed reached their targets, i.e. the forwarding was effective, and did so within a few steps, i.e. the forwarding was efficient. The bases for the success of Milgram’s experiment were (i) the knowledge that people had about other people and (ii) the network topology that was formed by their ‘knows’ relationships, which was a small-world network topology. The structures of many naturally evolving networks such as collaboration networks or food chains have been shown to have similar properties [12]. Two characteristics of these networks have been observed [12]. First, social networks have a large clustering coefficient, i.e. on average, the neighbours of each peer have many links between each other. Second, while clustered, these networks still have a small characteristic path length (CPL), i.e. the average distance between any two given nodes is small. Many types of regular or random graphs are known to have either a large clustering coefficient or a small CPL, but not both.

SWAPSTER forms a network of nodes (or peers) that ask and relay queries (vaguely) similar to how the people in the Milgram experiment forwarded letters. However, in SWAPSTER the counterpart of the addressee's name and address is given in the form of a SeRQL query for content. For SWAPSTER to be effective and efficient, i.e. to return the knowledge that is searched for without flooding the network with requests, it is hence critical that (i) a peer knows what its other peers know and (ii) a SWAPSTER network has a small-world network topology.

In the following we briefly sketch two methods that further the development of ‘sociable’ network structures: one method for offline or batch topology formation and one method for ad-hoc topology formation. The two are complementary, as the first is more suitable for jump-starting communication with a reasonable network structure, while the second is better suited to reflect changing needs and changing networks.
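The two small-world characteristics named above can be computed directly from an adjacency structure. The sketch below is a minimal illustration in plain Python, assuming an undirected, connected graph stored as a dictionary of neighbour sets with comparable node labels; the function names and the toy graph are assumptions for this example.

```python
from collections import deque


def clustering_coefficient(graph):
    """Average local clustering coefficient of an undirected graph."""
    total = 0.0
    for node, neighbours in graph.items():
        k = len(neighbours)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        # Count each link between two neighbours of `node` exactly once.
        links = sum(
            1 for a in neighbours for b in neighbours
            if a < b and b in graph[a]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def characteristic_path_length(graph):
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cc = clustering_coefficient(toy)        # 7/12
cpl = characteristic_path_length(toy)   # 4/3
```

In the toy graph, nodes 0 and 1 each have a fully connected neighbourhood, node 2 has one of three possible neighbour links, and node 3 contributes nothing, which gives (1 + 1 + 1/3 + 0) / 4 = 7/12.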
3.1 Offline Socialisation
The goal of the algorithms we present in this section is to generate a network topology that has the desirable properties stated above (low diameter and high clustering) in a way that is suitable for P2P environments. This basically means that we can only exploit local knowledge on the peers, i.e. knowledge about their respective neighbors. Starting from a given network, e.g. a random one, we need to rewire the connections between the peers in such a way that the network evolves over time into a small-world network. Moreover, we want the clusters to be homogeneous in the sense that they consist of peers which are concerned with similar topics. To that end, we extended the clustering coefficient as introduced by Watts and Strogatz [12] to include the homogeneity of the clusters in the measure. This weighted clustering coefficient indicates to what extent each peer in the network is surrounded by other peers which are also connected to each other and cover topics similar to its own.
3.1.1 Rewiring Algorithms
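The rewiring strategies in this section aim at raising the weighted clustering coefficient of Section 3.1. The text does not state its formula, so the sketch below shows only one possible formalisation, under the assumption that each link between two neighbours of a peer is weighted by the product of the peer's similarity to both of them, with similarities in [0, 1]. The function names and the toy data are hypothetical.

```python
def weighted_clustering_coefficient(graph, similarity):
    """Clustering coefficient in which each neighbour-neighbour link counts
    only as much as both neighbours resemble the peer itself (assumed form)."""
    total = 0.0
    for node, neighbours in graph.items():
        k = len(neighbours)
        if k < 2:
            continue  # fewer than two neighbours: contributes 0
        score = sum(
            similarity(node, a) * similarity(node, b)
            for a in neighbours for b in neighbours
            if a < b and b in graph[a]
        )
        total += 2.0 * score / (k * (k - 1))
    return total / len(graph)


def make_same_topic(topics):
    """Build a 0/1 similarity: 1.0 if two peers share a topic label."""
    def same_topic(p, q):
        return 1.0 if topics[p] == topics[q] else 0.0
    return same_topic


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

# All triangle members share a topic: same value as the plain coefficient.
homogeneous = weighted_clustering_coefficient(
    graph, make_same_topic({0: "Arts", 1: "Arts", 2: "Arts", 3: "Science"})
)

# Peer 1 covers a different topic: the triangle no longer counts.
mixed = weighted_clustering_coefficient(
    graph, make_same_topic({0: "Arts", 1: "Science", 2: "Arts", 3: "Science"})
)
```

With all similarities equal to 1 this measure reduces to the ordinary clustering coefficient, so it can only be lower than or equal to it, which matches the ordering of the two coefficients reported in the evaluation.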
To put the rewiring into practice, we use two strategies which can be executed by peers that are in a neighborhood which does not satisfy the above-mentioned criteria. These strategies are based on self descriptions consisting of entities or sets of entities from an ontology which describes the peer’s interests. With both strategies, a peer joining the network or wishing to improve its position executes a walk through the network, collecting self descriptions of others on the way. It may then drop old neighbors in favor of newly found ones which are closer to the peer’s own self description.

RandomWalk: The peers on the way are chosen randomly; this is suitable for an unstructured network where no assumptions can be made about the location of more suitable neighbors.

GradientWalk: At each node, the neighbor whose self description is closest to that of the originating peer is selected. This makes sense when the network is already structured, so that “walking towards” a more suitable neighborhood is possible.
3.1.2 Evaluation Setting
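The evaluation applies the two rewiring strategies of Section 3.1.1. As a rough illustration, they can be sketched as follows. This is a simplified reading, not the implementation used in the experiments: the function names, the rule that a peer keeps its number of neighbours, and the rule that GradientWalk does not revisit peers are assumptions. Self descriptions are modelled as topic paths in a tree-shaped taxonomy, compared by graph distance within that tree.

```python
import random


def taxonomy_distance(a, b):
    """Graph distance between two topic paths in a tree-shaped taxonomy."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return len(a) + len(b) - 2 * common


def random_walk(graph, start, steps, rng):
    """RandomWalk: each next hop is chosen uniformly among the neighbours."""
    visited, current = [], start
    for _ in range(steps):
        current = rng.choice(sorted(graph[current]))
        visited.append(current)
    return visited


def gradient_walk(graph, start, steps, distance):
    """GradientWalk: move to the neighbour closest to the originating peer."""
    visited, current = [], start
    for _ in range(steps):
        # Assumption: skip the originator and already visited peers.
        options = [n for n in graph[current] if n != start and n not in visited]
        if not options:
            break
        current = min(options, key=lambda n: (distance(start, n), n))
        visited.append(current)
    return visited


def rewire(graph, peer, candidates, distance):
    """Drop old neighbours in favour of closer peers found on the walk."""
    degree = len(graph[peer])
    pool = (set(candidates) | graph[peer]) - {peer}
    best = set(sorted(pool, key=lambda n: (distance(peer, n), n))[:degree])
    for dropped in graph[peer] - best:
        graph[dropped].discard(peer)
    for added in best - graph[peer]:
        graph[added].add(peer)
    graph[peer] = best


# Hypothetical self descriptions and a chain-shaped network 0-1-2-3-4.
labels = {0: ("Arts", "Music"), 1: ("Science", "Physics"),
          2: ("Science", "Biology"), 3: ("Arts", "Music"), 4: ("Arts", "Film")}
graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}


def distance(p, q):
    return taxonomy_distance(labels[p], labels[q])


sample = random_walk(graph, 0, 3, random.Random(42))  # exploration only
found = gradient_walk(graph, 0, 3, distance)          # [1, 2, 3]
rewire(graph, 0, found, distance)                     # peer 0 now links to 3
```

In this toy run peer 0 gives up its link to the physics peer 1 and connects to peer 3, which carries the same topic label.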
Starting out with a network of 2500 peers in which each peer was randomly connected to 5 others, we applied both rewiring strategies (with equal probabilities) to restructure the network. Each peer was labelled with an item from the top two levels of the DMOZ taxonomy (see Section 3.2.2); the metric used to determine the similarities of the self descriptions was the graph distance within the taxonomy.
We measured the characteristic path length (see footnote 2), the clustering coefficient, and the weighted clustering coefficient over time. Rewiring started at time step 10000.
3.1.3 Results
Figure 1 shows that while the clustering coefficient increased from 0.14% to 52%, the characteristic path length was almost halved (from 4.03 to 2.04). Figure 1 also shows that the weighted clustering coefficient increased even more strongly in relative terms than the clustering coefficient (from 0.009% to 30%); this means that we do not just obtain dense clusters, but dense clusters that cover coherent sets of topics. Our experiments show that a clustered, low-diameter graph structure can be created using the above-mentioned rewiring strategies. The network can thus be structured a priori to benefit query routing and socialisation between neighboring peers.
3.2 Online Socialisation
Self-descriptions, e.g. summaries of one’s documents, typically represent the past interests of a person, while a person’s future knowledge is to some extent indicated by the questions that person asks. REMINDIN’, our query routing algorithm, is based on a social metaphor of asking for knowledge and getting responses when one is interested in some topics [?].
3.2.1 REMINDIN’ Algorithm
REMINDIN’ builds on social metaphors of how such a human network works. We observe that a human who searches for answers to a question may exploit the following assumptions (see footnote 3):
(1) A question is asked to the person whom one assumes to answer the question best. [...] terms only means that he has the most knowledge. In [...] one may consider properties like latency, costs, [...]
(2) One perceives a person as knowledgeable in a certain domain if he/she knew answers to our previous questions.
(3) A general assumption is that if a person is well informed about a specific domain, he/she will probably be well informed about a similar, e.g. the next more general, topic, too.
(4) To quite some extent, people are more or less knowledgeable independently of the domain.
(5) The profoundness of knowledge that one perceives in other persons is not measured on an absolute scale. Rather, it is often considered to be relative to one’s own knowledge.
REMINDIN’ builds on the metaphor of a peer-to-peer network being like a human social network and adopts the above-mentioned assumptions in an algorithmic manner.

Footnote 2: Due to the fact that the all-pairs-shortest-paths computation of the CPL is expensive, we took random 10% samples of all nodes and measured all paths from these, hence the small variations in CPL.
Footnote 3: We do not claim that these observations of social networks are in any way exhaustive or without exceptions.
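The way these assumptions can drive peer selection may be illustrated with a small sketch. It is a loose interpretation and not the REMINDIN’ algorithm itself: the class `PeerMemory`, its methods, the representation of topics as taxonomy paths and the ranking by number of returned statements are all assumptions. Assumption (5), the comparison with the peer's own knowledge, is left out for brevity.

```python
from collections import defaultdict


class PeerMemory:
    """What one peer remembers about answers it received (hypothetical)."""

    def __init__(self):
        # answers[topic][peer] = number of statements that peer returned
        self.answers = defaultdict(lambda: defaultdict(int))

    def record_answer(self, topic, peer, n_statements):
        """Assumption (2): remember who knew answers to previous questions."""
        self.answers[topic][peer] += n_statements

    def select_peers(self, topic, p_max=2):
        """Pick up to p_max peers assumed to answer `topic` best."""
        ranked = []
        # Assumptions (1) and (3): prefer peers known for this topic, then
        # for the next more general topics along the taxonomy path.
        path = topic
        while path and len(ranked) < p_max:
            known = self.answers.get(path, {})
            for peer in sorted(known, key=lambda p: (-known[p], p)):
                if peer not in ranked:
                    ranked.append(peer)
            path = path[:-1]
        # Assumption (4): otherwise fall back to generally knowledgeable peers.
        if len(ranked) < p_max:
            totals = defaultdict(int)
            for per_topic in self.answers.values():
                for peer, count in per_topic.items():
                    totals[peer] += count
            for peer in sorted(totals, key=lambda p: (-totals[p], p)):
                if peer not in ranked:
                    ranked.append(peer)
        return ranked[:p_max]


memory = PeerMemory()
memory.record_answer(("Arts", "Music"), "peer-B", 5)
memory.record_answer(("Arts",), "peer-C", 2)
memory.record_answer(("Science",), "peer-D", 9)

# No peer is known for Jazz, so the more general topics are consulted.
chosen = memory.select_peers(("Arts", "Music", "Jazz"))

# Nothing is known about Sports, so overall knowledgeability decides.
fallback = memory.select_peers(("Sports",))
```

In the first query the selection climbs from the unknown topic to its more general ancestors; in the second it falls back to the peers that have returned the most statements overall.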
3.2.2 Evaluation Setting
In order to evaluate the effectiveness and efficiency of REMINDIN’, we have simulated a peer-to-peer network with plausible datasets of statements and query routing by REMINDIN’. We chose DMOZ, the open directory project (see footnote 4), as an input to set up the local node repositories of the individual peers. DMOZ manually categorises Web pages of general interest into a topic hierarchy. For each topic, one or several editors are responsible for defining subtopics or related topics and for contributing links to outside Web pages to the topic pages of DMOZ. In our simulations an editor corresponds to a single peer. Hence, the local node repository of one peer contains the topic hierarchy and the links to outside Web pages of the topics he is responsible for. Consequently, in the simulation, an editor asks for links to outside Web pages which belong to a certain topic.

We initialized the simulation with a network topology which resembles the JXTA network. In the simulation, peers were chosen randomly and were given a randomly selected query to pose to the remote peers in the network. Each peer used REMINDIN’ to select up to pmax = 2 peers to send the query to. As a baseline we compared REMINDIN’ to a naive algorithm, in which a peer selected the remote peers randomly. A peer that received a query tried to answer it. In some evaluation scenarios, we integrated a randomContribution: the randomContribution percentage of the selected peers was exchanged for randomly selected peers known to the querying peer. Each query was forwarded until the maximal number of hops (hmax) was reached.

We used two measures to evaluate the effectiveness of our approach, i.e. to what extent one may retrieve statements from the peer-to-peer network based only on local knowledge about possibly relevant peers, and its efficiency, i.e. to what extent the network is being flooded by one query. The Recall describes the ratio of the retrieved relevant statements to all relevant statements, with respect to a query, in the peer network. Messages per query indicates how many messages are sent around due to one query.
3.2.3 Results
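The results in this section are reported in terms of the two measures defined in Section 3.2.2. As a rough sketch, recall can be computed per query from sets of statements, and the number of messages per query can be bounded from above by assuming that every peer forwards the query to exactly pmax further peers at each of the hmax hops. The function names and the example values (including the hop limit of 3) are assumptions for illustration and are not taken from the experiments.

```python
def recall(retrieved, relevant):
    """Share of the relevant statements in the network that were retrieved."""
    if not relevant:
        return 0.0  # convention chosen here for queries without any answer
    return len(set(retrieved) & set(relevant)) / len(relevant)


def max_messages_per_query(p_max, h_max):
    """Upper bound on messages if each peer forwards to p_max peers per hop."""
    return sum(p_max ** hop for hop in range(1, h_max + 1))


# Hypothetical query: four relevant statements exist, two of them are found.
relevant = {"s1", "s2", "s3", "s4"}
retrieved = {"s1", "s3", "x"}
query_recall = recall(retrieved, relevant)   # 0.5

# With p_max = 2 and a hypothetical hop limit of 3: 2 + 4 + 8 messages.
bound = max_messages_per_query(2, 3)         # 14
```

The bound grows geometrically with the hop limit, which is why a small pmax combined with well-chosen peers matters for not flooding the network.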
Our evaluations show that peers (1) learn about other peers’ knowledge, (2) find knowledge faster over time, and (3) are able to retrieve a high proportion of the available knowledge, and that (4) peer networks profit from ‘random social contacts’ such as those happening in the famous coffee corners of human organisations. Figure 2 summarises the comparison between REMINDIN’ and the naive approach. In this scenario we used 20,000 queries and 1100 peers. The naive approach had a constant recall of approximately 20%. The recall of REMINDIN’ without random contribution increases steadily over time and reaches 50% after all queries (cf. claims 1 and 2). Note that 20,000 queries in total result in just about 18 queries per peer, a fairly low number. REMINDIN’ with a little random contribution to the peer selection produces even better results (cf. claim 4). After 20,000 queries it reaches a recall of almost 80% (cf. claim 3).

[Figure 1: Characteristic path length and clustering coefficients over time. Series: characteristic path length, clustering coefficient, weighted clustering coefficient; x-axis: simulation time.]

[Figure 2: Peer selection based on REMINDIN’ and a little randomness achieves 80% recall. Panel title: Comparison of Query Routing Algorithms; series: REMINDIN’, REMINDIN’ with Random, Naive algorithm; x-axis: No. of Queries; y-axis: Recall (Statements).]

Footnote 4: http://www.dmoz.org/
4 Related Work
In recent years the shortcomings of centralised knowledge management systems for some applications have become evident, and decentralised solutions have been proposed to the community (cf. [11]). Susarla et al. in [4] focus on incentives to share knowledge in a peer-to-peer system. Bonifacio et al. present in [1] the Kex platform, which has objectives similar to those of the SWAPSTER platform. While they support manual community building, viz. socialising, we here present automatic methods. Another prominent semantics-based peer-to-peer system is Edutella. Its authors acknowledge in [7] the need for cluster building methods in peer-to-peer systems but have so far not presented a sophisticated solution such as the one given in this work. Khambatti et al. [5] point out the value of community formation in P2P systems for efficient communication among peers of related interests. In [10] a number of technologies are summarised which support decentralised knowledge management. However, they all lack the semantic components which are required to enable sophisticated knowledge sharing.
5 Conclusion
In recent years the shortcomings of centralised knowledge management systems for some applications have become evident, and decentralised solutions have been proposed. We have presented the SWAPSTER peer-to-peer platform, which is a decentralised knowledge management platform. For the platform we have discussed two algorithms that enable offline and online socialisation.
We have demonstrated how rewiring strategies can be used to modify or grow network structures in a way that yields dense clusters of peers with similar interests. In our evaluations, we have furthermore shown that our algorithm REMINDIN’ fosters online socialisation. Intriguingly, the knowledge creation ability of coffee corners, viz. bringing people casually together, is also beneficial for our routing algorithm. The system as well as both methods are ideal candidates to support vivid organisational forms such as Virtual Organisations [2] or Communities of Practice [9]. For the latter, Smith et al. in [4] describe technical requirements on a system which could facilitate the emergence of communities of practice. With the presented work we leap one step further in this direction.
References
1. Matteo Bonifacio, Paolo Bouquet, Gianluca Mameli, and Michele Nori. Peer-mediated distributed knowledge management. In van Elst et al. [11]. To appear 2003.
2. Luis M. Camarinha-Matos and Hamideh Afsarmanesh, editors. Processes and Foundations for Virtual Organizations, volume 262 of IFIP International Federation for Information Processing. Kluwer Academic Publishers, 2003.
3. Marc Ehrig et al. The SWAP data and metadata model for semantics-based peer-to-peer systems. In Proc. Multiagent System Technologies, MATES 2003, LNCS 2831, 2003.
4. Clyde W. Holsapple, editor. Handbook on Knowledge Management. Springer, international handbooks on information systems edition, 2003.
5. M.S. Khambatti, K.D. Ryu, and P. Dasgupta. Push-pull gossiping for information sharing in peer-to-peer communities. In Proc. International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), pages 1393–1399, Las Vegas, Nevada, June 2003.
6. Stanley Milgram. The small world problem. Psychology Today, 67(1), 1967.
7. W. Nejdl et al. Super-peer-based routing and clustering strategies for RDF-based peer-to-peer networks. In Proc. of the 2003 WWW Conference, Budapest, Hungary, May 2003.
8. Ikujiro Nonaka and Hirotaka Takeuchi. The Knowledge Creating Company. Oxford University Press, 1995.
9. J. Seely-Brown and P. Duguid. Organisational learning and communities of practice. Organisational Science, 2(1), 1991.
10. Eric Tsui. Technologies for personal and peer-to-peer (p2p) knowledge management. Technical report, CSC Leading Edge Forum (LEF), 2001.
11. Ludger van Elst, Virginia Dignum, and Andreas Abecker, editors. Lecture Notes in Artificial Intelligence (LNAI) 2926. Springer, Berlin, 2003. Revised and Invited Papers.
12. Duncan J. Watts and Steven Strogatz. Collective dynamics of ’small-world’ networks. Nature, 393:440–442, June 1998.