PALADIN: A Pattern Based Approach to ... - Semantic Scholar

Proceedings of I-KNOW ’06 Graz, Austria, September 6 - 8, 2006

PALADIN: A Pattern Based Approach to Knowledge Discovery in Digital Social Networks

Ralf Klamma, Marc Spaniol and Dimitar Denev
(Lehrstuhl Informatik 5, RWTH Aachen University, Germany)
{klamma|spaniol|denev}@i5.informatik.rwth-aachen.de

Abstract: Digital media are used to facilitate social structures, thus building digital social networks. Disturbances in such networks occur on different levels (egocentric level, subgroup level, network level) and have to be analyzed in the multidimensional context of reference disciplines like sociology and knowledge management. This paper presents a first repository of disturbance patterns for the analysis of digital social networks. Based on Actor-Network Theory and Social Network Analysis, new socio-theoretical models for handling complex media settings were developed. On these models a pattern language is defined to describe multidimensional disturbance patterns and to store them in a newly developed pattern repository. The core of the pattern language is the Formal Expression Language for Patterns (FELP), which is used to specify the structural and the content-specific properties of digital social networks. Results can be visualized with open source graph visualization software. To evaluate the approach, a case study has been performed on a repository containing 118 mailing lists and 17.359 individuals. Patterns like troll, spammer and burst have been applied successfully.

Key Words: Semantic Web, Knowledge Discovery, Pattern Language, Social Network Analysis, Actor-Network Theory
Category: H.3.1, H.3.2, H.3.3, J.4

1 Introduction

Information technology develops rapidly and so does computer-mediated communication. The Internet has provided an environment where social networks can be facilitated on the basis of digital media such as email, web logs, and wikis [17, 19]. These developments can be identified in the Semantic Web as well. The Friend of a Friend (FOAF) network [9] is an example of a member network. Over time digital networks expand, the media relations inside them become more complex, and the chance of pathological behaviour (disturbances) increases. Such disturbances exist when members perceive differences between the expected behaviour of the network and the actual one. Examples are trolls and spammers. The troll aims at drawing attention and starting useless discussions. The spammer sends messages that are irrelevant to the community, mainly advertisements or messages containing binary files. Disturbances may hinder communication; however, they may also induce reflection in the network and become starting points for learning processes. Therefore, it is important to detect and predict disturbances in digital social networks.


The multidimensional context of digital social networks is a subject of various disciplines such as sociology, computer science, and media theory. The fact that knowledge about disturbances is gathered mainly through experience and observation hampers their discovery. A pattern language overcomes these difficulties. A pattern from such a pattern language can be used for the description, detection and prediction of disturbances. To enable the automatic application of the patterns, a taxonomy which gives a formal model of the digital social network is necessary. The approach presented in this paper, PAttern LAnguage for DIsturbances in digital social Networks (PALADIN), combines for the first time digital social networks with a pattern-based automatic knowledge discovery technique. Its main characteristics are the model of the digital social networks, the pattern language for the multidimensional disturbances, and the mechanism for the automatic application of the patterns.

2 Related Work

A number of projects exist in the field of digital social network analysis. We present three of them here, with an emphasis on the model each uses for representing a social network. In the COMB project [4] a study of the usage of mailing lists was conducted. The mails received on a mailing list were analyzed, and threads and genres were derived from them. There are eight genres in COMB: dialogue, team announcement, socializing, distribution of completed work, reminder, group decision, distribution of project work, and criticism. Every genre has two properties: form and purpose. The mail content was read and, depending on the content, the categories were marked as present or absent. The threads and the genres were then cross-analyzed.

Ariadne is a Java-based plug-in for the Eclipse IDE that visualizes the social networks which exist in distributed software projects [7]. The social network is a network of developers. It is intended to improve communication and coordination in big software projects. Technical dependencies were defined using the relations between the packages, classes and methods in the source repository. These dependencies and the authorship information of the software components were used for building the social network of the developers. The network was visualized with the JUNG framework [16].

Flink is a system for the extraction, aggregation and visualization of social networks [15]. The sources which Flink uses for creating the network are HTML pages delivered as results of Google queries, FOAF profiles from the Semantic Web, public collections of emails, and bibliographical data. The gathered information was stored as RDF in a Sesame server. The user interface of Flink was developed as a Java web application. Again, the JUNG framework [16] was used to display the network.

All these approaches focus on member networks in one medium only, neglecting complex media sets and dependencies between the actors of a network. None of them tries to apply patterns for analysis, nor do they analyse the possible pathological behaviour of a network.

3 PALADIN

We developed the PALADIN approach to discover and predict disturbances in digital social networks with patterns on the basis of the Actor-Network Theory (ANT) and a pattern language. A pattern of disturbance in PALADIN stands for a certain type of perceived behaviour or social structure in a digital social network that is, or has the potential to become, a factor which deflects the network from its normal activities. We adapted the structure of the Alexandrian patterns [1] to describe the patterns of disturbance. A pattern has a name, disturbance, description, forces, force relations, solution, rationale, and pattern relations. The name of the pattern is short but descriptive. The disturbance is a condition which indicates the existence of the pattern in a social network. The description explains the problem to which the pattern provides a solution and the settings in which the pattern occurs. A force represents an actor relevant to the disturbance. A force relation corresponds to a relation between actors included in the pattern as forces. The solution contains the actions which are proposed to be carried out in the situation to which the pattern refers. The rationale is used for reasoning about the forces and the disturbance. It may include examples and stories of past successes or failures. The pattern relations show whether the pattern being defined has anything in common with other patterns. They are important for the structure of the pattern language.

3.1 ANT Model of the Digital Social Networks

Based on the ANT, we define a model of the digital social networks. It provides a taxonomy which formalizes the networks. An actor in the model stands for a human or an object, without any distinction between the two. An actor may have a set of properties and relations. Any set of social, technical, textual, and conceptual actors involved in a certain activity forms a network. The network can be considered an actor as well and, like every actor, it may have properties and relations. Apart from the network, there are three special types of actors which compose a social network. The member stands for an existing person or a sub-community. The medium enables the members to perform certain activities, the most important of which are establishing communication links and exchanging information. The artifacts are objects created by the members using some media.
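The taxonomy described above can be sketched as a small class hierarchy. This is only an illustration in Python, not the PALADIN implementation: the class names follow the actor types named in the text, while the field names (`properties`, `relations`, `creator`, `medium`, `actors`) and the sample data are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Actor:
    """Any human or object in the model; no distinction is made between them."""
    name: str
    properties: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)  # (label, other Actor) pairs


@dataclass
class Member(Actor):
    """An existing person or a sub-community."""


@dataclass
class Medium(Actor):
    """Lets members establish communication links and exchange information."""


@dataclass
class Artifact(Actor):
    """An object created by a member using some medium."""
    creator: Optional[Member] = None
    medium: Optional[Medium] = None


@dataclass
class Network(Actor):
    """A set of actors; the network is itself an actor."""
    actors: list = field(default_factory=list)


# Hypothetical sample data: one member posting to one mailing list.
alice = Member("alice")
mailing_list = Medium("dev-list", properties={"type": "mailing list"})
post = Artifact("first post", creator=alice, medium=mailing_list)
alice.relations.append(("uses", mailing_list))
network = Network("dev community", actors=[alice, mailing_list, post])
```

Deriving `Network` from `Actor` mirrors the statement that a network can carry properties and relations like any other actor.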

In PALADIN a digital social network is a set of actors along with their relations. It has properties which can be used in the definition of the patterns of disturbance. Every network supports certain types of media. They influence how the communication links between the members are created. Using the members as vertices and the communication links as edges, we build a graph G = (V, E). We define properties for the network and the members which have their roots in Social Network Analysis (SNA). The most important network properties are the small world property [21] and the scale-free network property [2, 3]. In networks which are both small world and scale free, some members are “hubs” [18]. Communication inside the network depends heavily on them, which makes them a potential source of disturbance.

A medium or a set of media is the basis of a digital social network. The media influence the networks, their structures and the way communication is established. Every type of medium defines how the communication links between the members are built. The media considered in PALADIN are email, mailing list, blog, transaction-based web site, wiki, URL, and chat room.

The communication between members is realized through the exchange of artifacts. An artifact is created by a member in a certain medium. The artifacts represent the information circulating in the digital social network. The types of artifacts analyzed in PALADIN are message, thread, burst, conversation, blog entry, comment, web page, transaction, and feedback. The properties most often used in pattern definitions are the author and the date of creation. Most of the patterns are defined following non-formal perception or intuition. An exception is the burst, which is a conceptual artifact based on the activity in the network during a given time period. It can be used for detecting topics which appear, gain popularity, and then fade.
Table 1 shows which kinds of artifacts each medium supports.

Media (columns): Email, Mailing List, Blog, Transaction based Web Site, Wiki, Chat Room, URL

Artifact       Supported media (one + per medium)
Message        + +
Thread         + +
Burst          + + + + +
Conversation   +
Blog Entry     +
Comment        + + +
Web Page       + +
Transaction    +
Feedback       +

Table 1: Media and Artifacts
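The graph G = (V, E) built from members and communication links, and the notion of "hubs", can be illustrated with a short Python sketch. It is not taken from PALADIN: the function names, the representation of a communication link as a pair of member names, and the sample data are assumptions. Degree centrality is computed in its usual normalized form, the number of neighbours divided by the number of other members.

```python
from collections import defaultdict


def build_graph(links):
    """Build an undirected graph: members are vertices, links are edges."""
    adjacency = defaultdict(set)
    for a, b in links:
        if a != b:  # ignore self-links
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def degree_centrality(adjacency):
    """Share of the other members that each member is directly linked to."""
    others = max(len(adjacency) - 1, 1)  # avoid division by zero
    return {member: len(nbrs) / others for member, nbrs in adjacency.items()}


# Hypothetical communication links between four members.
links = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
graph = build_graph(links)
centrality = degree_centrality(graph)
hub = max(centrality, key=centrality.get)
print(hub, centrality[hub])  # ann 1.0
```

A member whose centrality is far above that of the others is a candidate hub, and therefore a point where a disturbance would affect much of the communication.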


The basic social unit which forms a digital social network is the member. Members have properties which are derived from SNA. These properties describe the members' positions and are used to estimate their importance in the network. They include the three types of centrality: degree centrality, closeness centrality and betweenness centrality [11, 13, 8]. The social capital of the members can be estimated from the existence of structural holes [12]. The measure describing them is the efficiency [5]. These properties can be addressed in the patterns, and with their help situations with a potential for disturbances can be detected. According to the activity of the members and their patterns of communication, they may be ascribed different roles [14, 20, 6, 10]. Questioner, answering person, troll, conversationalist and spammer are the roles used in PALADIN. The questioner is a member who seeks help and information, the answering person provides answers to the raised questions without going into discussions, and the conversationalist is a member who actively participates in discussions. The ANT model tries to encompass the structures and properties of digital social networks, looking not only at the member network, as the existing projects do, but also taking into account the media, the artifacts, and the relations among them.

3.2 Pattern Language

The pattern language includes the pattern structure definition, the Formal Expression Language for Patterns (FELP) and an algorithm for applying the patterns in digital social networks. FELP is the language used to define the disturbance of a pattern formally. It consists of variables and rules for constructing formal expressions, and it supports logical, arithmetical, comparison, and aggregate operations. A variable may be a simple variable or a property. Simple variables are bound to actors in the network or are set as pattern parameters; properties are bound to properties or relations of actors. The logical operations include conjunction, disjunction, negation, and the universal and existential quantifiers. The arithmetical operations are addition, subtraction, multiplication, and division. FELP also allows the comparisons equal, not equal, greater than, and less than. The aggregate operations are sum, average, and count. They are executed over the set of variables which satisfy a given condition.

When a disturbance expression is defined, its variables (for example troll, thread and message in the troll pattern) are bound to actors from the model of the digital social networks. During the pattern evaluation process, these variables are substituted with real actors from a given digital social network. The value of the disturbance expression is computed for every possible substitution of the variables. If at least one of these values is positive, a disturbance in the social network has been discovered. The forces and the force relations in that pattern instance correspond to the actors and their relations used in the substitution which has given the positive result. The actions described in the solution of the pattern may be executed in order for the network to overcome the disturbance.

A sample set of patterns has been defined and tested against social networks built from the available database sources. The patterns are based on studies of USENET groups [10, 20]. They reflect the existence of trolls, spammers, conversationalists, questioners, answering persons, bursts and structural holes in the digital social networks.

As an example of a pattern of disturbance, let us examine the troll pattern, which demonstrates the expressive power of FELP. In a mailing list, any person who posts only in threads started by herself or himself, and who has posted more than minPosts messages in at least one such thread, is considered a troll. The FELP disturbance expression is:

(∃[troll |
   (∃[thread | (thread.author = troll) ∧
                (count[message | (message.author = troll) ∧
                                 (message.posted = thread)]) > minPosts])
   ∧
   (¬∃[thread1, message1 | (thread1.author ≠ troll) ∧
                           (message1.author = troll ∧ message1.posted = thread1)])])

3.3 PALADIN Implementation
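The evaluation of a disturbance expression over every possible substitution of actors can be illustrated with the troll pattern from Section 3.2. The following is a minimal Python sketch, not PALADIN's Java code. The `Thread` and `Message` classes, the `find_trolls` function and the sample data are hypothetical, and two assumptions are made: the opening post of a thread counts as a message in it, and the second clause of the expression is read as "the member has no message in a thread started by someone else", in line with the prose definition of a troll.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Thread:
    id: str
    author: str


@dataclass(frozen=True)
class Message:
    author: str
    thread_id: str  # plays the role of message.posted


def find_trolls(threads, messages, min_posts):
    """Return every member for whom the troll expression holds."""
    # count[message | message.author = m and message.posted = thread]
    posts = Counter((m.author, m.thread_id) for m in messages)
    author_of = {t.id: t.author for t in threads}
    members = {t.author for t in threads} | {m.author for m in messages}

    trolls = set()
    for member in members:  # try every substitution for "troll"
        # First clause: an own thread with more than min_posts own messages.
        busy_own_thread = any(
            t.author == member and posts[(member, t.id)] > min_posts
            for t in threads
        )
        # Second clause (negated): a message in somebody else's thread.
        posts_elsewhere = any(
            author == member and author_of.get(thread_id) != member
            for author, thread_id in posts
        )
        if busy_own_thread and not posts_elsewhere:
            trolls.add(member)
    return trolls


# Hypothetical mailing list: alice only posts in her own thread t1.
threads = [Thread("t1", "alice"), Thread("t2", "bob")]
messages = (
    [Message("alice", "t1")] * 4
    + [Message("bob", "t2"), Message("bob", "t1"), Message("carol", "t1")]
)
print(find_trolls(threads, messages, min_posts=3))  # {'alice'}
```

Each member returned corresponds to one substitution with a positive result. The thread and messages that made the first clause true would be the forces of that pattern instance.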

PALADIN is a Java-based application implemented on the basis of the ANT model and the pattern language. It offers a web interface for defining, browsing and storing patterns, actors and actor properties. The repository used for them is an eXist XML database management system. PALADIN includes a general network implementation which focuses on the structural properties of the members. Two specific implementations of social networks, one based on mailing lists and the other on a chat room, were created using data from other research projects. The data are stored in an IBM DB2 database. The JUNG framework is used to visualize the networks. PALADIN implements the pattern application algorithm, including the interpretation of the disturbance patterns for every possible substitution of actors. The first results of the pattern-based knowledge discovery approach showed that it may uncover relations between actors, properties and phenomena in digital social networks which at first glance might be considered independent of each other.

3.4 Results

The case study has been performed with 8 patterns of disturbance over 119 social network instances. In these networks over 17.000 individuals have exchanged more than 200.000 messages over the last three years. The numbers of identified disturbances are presented in Table 2. Even though the results have to be filtered manually for false positives, the numbers show that disturbances in digital social networks are quite common and that their existence may be used to start individual or network-wide learning processes.

Disturbance                        Number of networks
No Answering Person                61
No Questioner                      67
No Conversationalist               76
Existing Spammer                   86
Existing Troll                     2
Existing Burst                     22
Existing Structural Hole           67
Existing Independent Discussions   13

Table 2: Number of disturbances identified in 119 networks
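To make the claim that disturbances are "quite common" easier to judge, the counts in Table 2 can be expressed as shares of the 119 networks. The short Python sketch below only does this arithmetic. The variable and function names are chosen for this example, and the counts are the raw values from the table, before any manual filtering of false positives.

```python
TOTAL_NETWORKS = 119

# Counts copied from Table 2 (raw, unfiltered pattern matches).
DISTURBANCE_COUNTS = {
    "No Answering Person": 61,
    "No Questioner": 67,
    "No Conversationalist": 76,
    "Existing Spammer": 86,
    "Existing Troll": 2,
    "Existing Burst": 22,
    "Existing Structural Hole": 67,
    "Existing Independent Discussions": 13,
}


def share_of_networks(counts, total):
    """Percentage of networks in which each disturbance was identified."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


shares = share_of_networks(DISTURBANCE_COUNTS, TOTAL_NETWORKS)
for name, pct in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name.ljust(35)}{pct:5.1f}%")
```

Under these numbers the spammer pattern matches in roughly 72% of the networks and the troll pattern in under 2%.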

4 Conclusions & Outlook

Communities increasingly use digital media such as email, websites, blogs, forums and chat rooms to exchange knowledge. There are research projects focusing on the analysis of digital social networks. However, there is no unified approach which allows modeling of the various types of media in use nowadays. Furthermore, the predictive power of patterns representing knowledge gathered through observation has been neglected. PALADIN has demonstrated an approach to knowledge discovery that uses patterns and focuses on the disturbances in digital social networks. The application of the troll, spammer, questioner, answering person, and structural hole patterns reveals interesting results and dependencies. Still, there is a need for human filtering of the results. To enable a better analysis of the outcome of the pattern application, an appropriate methodology for the visualization of the disturbances in their multidimensional contexts must be developed. PALADIN can be extended in several directions. The ANT model defines an archetype of the digital social networks. It can be used as a basis for a common ontology which comprehensively describes the characteristics of such networks. Further research can be conducted in the area of identifying patterns of disturbance and storing them using the pattern language. A step forward will be the combination of PALADIN with the simulation of a social network, so that future developments and disturbances might be predicted early enough.

Acknowledgements

This work was supported by the German National Science Foundation (DFG) within the collaborative research center SFB/FK 427 “Media and Cultural Communication” and by the 6th Framework IST programme of the EC through the Network of Excellence in Professional Learning (PROLEARN) IST-2003-507310. We would like to thank our colleagues Luise Springer, Arno Wolter, Lutz Ellrich and Dominik Schmitz for their cooperation.

References

1. C. Alexander. A Pattern Language: Towns, Buildings, Construction (Center for Environmental Structure Series). Oxford University Press Inc, USA, 1978.
2. A.-L. Barabási. The physics of the web. Physics World, 14:33–38, 2001.
3. A.-L. Barabási. Linked: The New Science of Networks. Perseus Publishing, 2002.
4. M. A. Boudourides, M. Mavrikakis, and E. Vasileiadou. E-mail threads, genres & networks in a project mailing list. In Internet Research, 2002.
5. R. Burt. Structural Holes: the Social Structure of Competition. Harvard University Press, 1992.
6. K. S. Cheung, F. S. Lee, R. K. Ip, and C. Wagner. The development of a successful on-line community. International Journal of The Computer, the Internet and Management, 13:71–89, 2005.
7. de Souza, Dourish, Redmiles, Quirk, Trainer, and From. From technical dependencies to social dependencies. In Social Networks Workshop at the CSCW Conference, 2004.
8. A. Degenne and M. Forsé. Introducing Social Networks. SAGE Publications, 1999.
9. L. Dodds. An introduction to FOAF. xml.com, 2004.
10. D. Fisher, M. Smith, and H. T. Welser. You are who you talk to: Detecting roles in usenet newsgroups. In Proceedings of the 39th Hawaii International Conference on System Sciences, 2006.
11. L. Freeman. Centrality in social networks: conceptual clarification. Social Networks, 1:215–239, 1979.
12. M. Granovetter. The strength of weak ties. American Journal of Sociology, 78:1360–1380, 1973.
13. R. A. Hanneman. Introduction to Social Network Methods. University of California, 2001.
14. T. Madanmohan and S. Navelkar. Roles and knowledge management in online technology communities: an ethnography study. International Journal of Web Based Communities, 1:71–89, 2004.
15. P. Mika. Flink: Semantic web technology for the extraction and analysis of social networks. Journal of Web Semantics, 2005.
16. J. O’Madadhain, D. Fisher, S. White, and Y. Boey. The JUNG (Java Universal Network/Graph) framework. Technical report, University of California, 2003.
17. R. Parikh. Towards a theory of social software. In Sixth International Workshop on Deontic Logic in Computer Science, 2002.
18. A. Scharnhorst. Complex networks and the web: Insights from nonlinear physics. The Journal of Computer-Mediated Communication, 8:4, 2003.
19. C. Shirky. Social software: A new generation of tools. Esther Dyson’s Monthly Report, 10, 2003.
20. T. C. Turner, M. A. Smith, D. Fisher, and H. T. Welser. Picturing usenet: Mapping computer-mediated collective action. Journal of Computer-Mediated Communication, 10:7, 2005.
21. D. J. Watts. Small Worlds: The Dynamics of Networks Between Order and Randomness. Princeton University Press, 1999.
