IADIS International Conference WWW/Internet 2008
D.STORE: CAPTURING TEAM INFORMATION SPACES WITH RESOURCE-BASED INFORMATION NETWORKS
Matthias Uflacker, Alexander Zeier
Hasso Plattner Institute for Software Systems Engineering
14482 Potsdam, Germany
ABSTRACT
In this paper we outline a practical approach to capturing the communication and information sharing activities of online collaboration groups such as distributed design teams. Following a non-interfering process, we trace digital communication channels such as project wikis and emails for relationships between people and information resources. To this end, we have implemented d.store, a resource-oriented platform to consolidate extracted associations in project teams and to construct semantic information networks as a formal representation for the relationships at project run-time. With that, the platform becomes a significant element in the social construction of common information spaces on the Web and enables the graph-based analysis of multi-modal communication characteristics in online collaboration of global virtual teams.
KEYWORDS
Distributed Collaboration, Computer-supported Co-operative Work, Information Networks
1. INTRODUCTION
Interpersonal communication, i.e. the active sharing and distribution of information, represents a pivotal and continuous activity in collaboration groups such as design and engineering teams (Poltrock et al., 2003; Sonnenwald and Lievrouw, 1997). The ability to trace the proliferation of this information and to assess the communication characteristics of distributed collaboration is a growing need for global teams and their management. Detailed insight into the who, what and when of a team information space early in the process can expose detrimental characteristics and facilitate the evaluation and improvement of collaborative activities (cf. Bannon and Bodker, 1997). However, measuring and appraising the communication characteristics of social communities is a challenging task. Sharing information in a team is an intrinsically informal and ad-hoc procedure that uses multiple communication channels for synchronous and asynchronous information transfer between two or more participants. Inspecting these communication structures usually requires active third-party involvement, long-term observations or intrusive methods that impair the work of a team or individual.
With the digital footprint of shared information steadily growing, computer-supported observation and analysis of team collaboration becomes increasingly feasible. The Internet, and with it the success of the World Wide Web, continues to shape the way collaborating groups communicate and exchange information, especially in geographically distributed teams. Email has become the de facto standard for informal messaging. Wikis, blogs, community platforms, and desktop-like online collaboration tools have entered enterprise and professional scenarios. More and more project-relevant information is stored and distributed online, becoming instantly accessible to dispersed collaboration partners.
At the same time, the growing number of services provided on the Web today creates highly distributed and decentralized chunks of project-relevant information. Related information is often disconnected from cohering objects and its context. To address this issue and to increase the comprehensibility of team collaboration and information sharing practice, we have implemented d.store, a resource-oriented service platform to capture and provide central access to de-central information and associated relationships on the Web. The aim of this work is to leverage existing or generated online team information and communication traces in order to recreate a global picture
ISBN: 978-972-8924-68-3 © 2008 IADIS
of the information sharing activities in collaborating groups with no additional interaction required from the team. Associations between people and/or information shared over online communication channels such as email and project wikis are extracted from diverse sources and consolidated into semantic social information networks. We define a corresponding graph structure to formally encode relationships between people and information entities based on domain ontologies and to provide a project-centric view on a team's digital information space. This representation enables semantic reasoning about and the graph-based analysis of team communication behavior across multiple communication channels over time.
2. SOCIAL INFORMATION NETWORKS
Social information networks define a set of typed nodes that are connected via directed, typed associations. They capture relations between both people and information, expressing arbitrary associations between individuals in a team and/or information resources. We give a formal definition: Let class: V → C_V be a mapping between nodes v ∈ V and node types c ∈ C_V, assigning one or more node types to every node v. We then define a social information network IN as a directed graph G_IN := (C_V, V, C_E, E, class), where C_V is a set of node types and V is a set of nodes in the network with class(v) ∈ C_V for v ∈ V. C_E is a set of association types, and E is the set of edges in the network. An edge e ∈ E is defined as a 5-tuple ⟨s, p, o, t_s, t_e⟩ where s ∈ V is the source node of the edge, p ∈ C_E is the association type, o ∈ V is the target node, t_s ∈ ℕ is a timestamp marking the beginning of the edge's validity, and t_e ∈ ℕ ∪ {∞} with t_e > t_s denotes the expiration of the edge. Hence, edges describe time-annotated relationships between nodes, where every edge e defines a directed association of type p ∈ C_E between subject s and object o with s, o ∈ V. The two parameters t_s and t_e mark the life span of an association between two nodes. The first value is the point in time at which the association statement started to hold true; the second is set to the time at which the statement became invalid, or to infinity while it is still valid. This allows us to trace the evolution of an information network over the course of a project and to investigate previous states of an information space. The type c ∈ C_V of a network node defines the class of the information object it represents. Note that C_V contains at least the node type person, which defines the class of nodes representing human individuals participating in the collaboration.
The additional node types distinguish between different classes of information objects such as wiki pages, shared documents or email messages.
Figure 1. Schematic representation of an information network
Figure 1 shows the basic concepts behind social information networks. The example is defined by two node types c_1, c_2 ∈ C_V and four nodes a, b, c, d ∈ V with class(a) = class(c) = c_1 and class(b) = class(d) = c_2. Four edges with types p_1, ..., p_4 ∈ C_E represent arbitrary associations between different types of nodes. The temporal properties of associations are not visualized in the figure.
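The formal definition can be illustrated with a minimal Python sketch. It is not taken from d.store: the class and method names are hypothetical, each node is given exactly one type for simplicity, the edge end points chosen for the Figure 1 example are assumed (the figure only fixes the node and association types), and validity is treated as the half-open interval from the start timestamp up to, but excluding, the end timestamp.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Edge:
    """A time-annotated association <s, p, o, ts, te>."""
    s: str                  # source node
    p: str                  # association type
    o: str                  # target node
    ts: int                 # start of validity
    te: float = math.inf    # end of validity; infinity while still valid

    def valid_at(self, t):
        # Assumption: the end timestamp itself is no longer part of the life span.
        return self.ts <= t < self.te


class InformationNetwork:
    """Hypothetical container for node types, nodes, association types and edges."""

    def __init__(self, node_types, edge_types):
        self.node_types = set(node_types)
        self.edge_types = set(edge_types)
        self.node_class = {}    # the class mapping, simplified to one type per node
        self.edges = []

    def add_node(self, node, node_type):
        if node_type not in self.node_types:
            raise ValueError(f"unknown node type: {node_type}")
        self.node_class[node] = node_type

    def add_edge(self, s, p, o, ts, te=math.inf):
        if s not in self.node_class or o not in self.node_class:
            raise ValueError("both end points must be nodes of the network")
        if p not in self.edge_types:
            raise ValueError(f"unknown association type: {p}")
        if not te > ts:
            raise ValueError("te must be greater than ts")
        edge = Edge(s, p, o, ts, te)
        self.edges.append(edge)
        return edge

    def edges_at(self, t):
        """Return the edges whose life span covers time t."""
        return [e for e in self.edges if e.valid_at(t)]


# Node and association types as in Figure 1; end points and timestamps are made up.
net = InformationNetwork({"c1", "c2"}, {"p1", "p2", "p3", "p4"})
for node, node_type in [("a", "c1"), ("c", "c1"), ("b", "c2"), ("d", "c2")]:
    net.add_node(node, node_type)
net.add_edge("a", "p1", "b", ts=10)
net.add_edge("b", "p2", "c", ts=20, te=50)
net.add_edge("c", "p3", "d", ts=30)
net.add_edge("d", "p4", "a", ts=40)

snapshot = [(e.s, e.p, e.o) for e in net.edges_at(60)]
```

In this sketch the edge of type p2 has expired by time 60, so it is absent from `snapshot`, while a query at time 25 would still return it.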
3. DESIGN AND IMPLEMENTATION OF D.STORE
To validate the feasibility of our approach, we have implemented d.store, a resource-oriented platform for the construction of social information networks. Building on the architectural principles of representational state transfer (REST) (Fielding, 2000), the platform provides a resource-based service interface for creating, reading, and updating information networks. Every network and every node in a network is represented by a
resource, uniquely identifiable by a URL. In the case of network nodes, resources represent the context and metadata of analyzed resources, including their relationships to persons and other resources. The functionality of the platform is provided through a uniform resource interface defined by the HTTP/1.1 protocol standard (Fielding et al., 1999): GET, PUT, POST and DELETE operations provide the application logic required for the inspection and manipulation of networks, nodes, and individual relationships. The data model of the platform builds on the Resource Description Framework (RDF) (W3C, 2004 (2)) and the OWL vocabulary extension (W3C, 2004) to store nodes and associations in information networks. To reflect the temporal evolution of a network, the traditional data schema of triple stores has been extended with two additional timestamps that mark the validity period of a relationship statement in a graph. Hence, deletions and updates do not physically overwrite or remove data in the persistence layer; instead, the affected statements are marked as invalid by setting the second timestamp to the time of the update. Passing time parameters to model requests allows querying previous states of networks. The type of a network node is determined in RDF by making the node an individual of a corresponding resource class (via the RDF type property). The available resource classes, along with relevant relationships that can be identified by parsing the content or server-side traces of shared information, are formulated in domain-specific ontologies. Currently, four different node types along with their associations are defined and supported by the platform: Persons, Emails, Web Resources, and Wiki Resources as a specialization of Web Resources. Figure 2 gives an overview of the platform architecture.
Figure 2. Overview of the d.store architecture
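The non-destructive update behavior of the extended triple schema can be sketched as follows. This is an in-memory illustration under our own assumptions, not the d.store persistence layer: the class `TemporalTripleStore`, its method names, and the identifiers `wiki:Home`, `lastEditedBy`, `person:alice` and `person:bob` are hypothetical, and timestamps are plain integers.

```python
import math


class TemporalTripleStore:
    """Append-only store of [s, p, o, ts, te] statements (hypothetical sketch)."""

    def __init__(self):
        self._rows = []  # each row is a mutable list [s, p, o, ts, te]

    def assert_statement(self, s, p, o, now):
        # A new statement is valid from `now` and has no expiration yet.
        self._rows.append([s, p, o, now, math.inf])

    def retract(self, s, p, o, now):
        """Mark matching open statements as expired instead of deleting them."""
        for row in self._rows:
            if row[:3] == [s, p, o] and row[4] == math.inf:
                row[4] = now

    def update(self, s, p, old_o, new_o, now):
        # An update expires the old statement and appends the new one.
        self.retract(s, p, old_o, now)
        self.assert_statement(s, p, new_o, now)

    def query(self, s=None, p=None, at=None):
        """Return triples valid at time `at`; None means the current state."""
        result = []
        for rs, rp, ro, ts, te in self._rows:
            if s is not None and rs != s:
                continue
            if p is not None and rp != p:
                continue
            live = (te == math.inf) if at is None else (ts <= at < te)
            if live:
                result.append((rs, rp, ro))
        return result


store = TemporalTripleStore()
store.assert_statement("wiki:Home", "lastEditedBy", "person:alice", now=100)
store.update("wiki:Home", "lastEditedBy", "person:alice", "person:bob", now=200)

current = store.query(s="wiki:Home")        # state after the update
past = store.query(s="wiki:Home", at=150)   # state before the update
```

Both statements remain in the store after the update; only the end timestamp of the first one changes, which is what makes the query for time 150 answerable.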
The REST API provides stateless access to the network resources managed by the platform. It supports multiple request and response entity representations such as RDF/XML, JavaScript Object Notation (JSON), and HTML. The media type of the server response is negotiated between client and server, as specified by the HTTP/1.1 protocol standard, but can also be selected explicitly via a request parameter. We have developed a number of feeder applications that scan sources of online team communication activity, such as email archives, wiki pages, and server log files, to extract associations between resources and human actors. Identified resources are posted to the platform along with the identified relationships. At the time of writing, we are using feeder applications to extract resource associations from Web-based email list archives, project wikis, and activity logs of shared online folders for a number of global virtual teams under observation. With that, we are able to automatically extract and formalize direct and inferred associations between heterogeneous project-relevant resources in an RDF-based model. The set of relationships encoded in the networks is wide-ranging, covering sender/receiver/replier relationships between persons and emails and author relationships in project wikis, but also associations between information resources from different communication channels (e.g. an email referring to a wiki resource). Continuously receiving information from the feeders, the platform consolidates facts and relationships as specified in the resource ontologies. Based on the assignment of actor IDs (e.g. email addresses and wiki names) and a set of mapping rules, the platform infers direct relationships between project participants and information resources. Triggered by updates in the network, a rule engine executes platform rules that exist
e.g. for the automatic inference of relationships between emails sent to an email list and people subscribed to that list. For brevity, only a very limited description of the REST interface and its parameters can be given here. Table 1 gives an example of the application logic provided by a resource representing a network node.
Table 1. Interface of a node resource
URL pattern: //resources/
Method   Description
GET      Get a representation of the network node containing meta information about the resource it describes (e.g. URL, outgoing relations)
PUT      Update the state of a node, manipulating its set of existing relationships
POST     Create a new property or association between this node and a target node
DELETE   Remove the node from the network
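To make the feeder and rule steps of this section more concrete, the following Python sketch extracts associations from a single email and then applies a list-subscription rule. It is an illustration only and does not reproduce the d.store feeders or rule engine: the association names (`hasSender`, `hasReceiver`, `repliesTo`, `refersTo`), the function names, the example addresses and the subscriber table are all assumptions. Only the standard-library `email` and `re` modules are used.

```python
import re
from email import message_from_string
from email.utils import getaddresses, parseaddr

# Simple hyperlink pattern; real feeders would need more careful URL handling.
URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")


def extract_associations(raw_message):
    """Return (subject, association, object) tuples found in one email."""
    msg = message_from_string(raw_message)
    email_id = msg["Message-ID"]
    associations = []

    sender = parseaddr(msg.get("From", ""))[1]
    if sender:
        associations.append((email_id, "hasSender", sender))

    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    for _name, address in recipients:
        associations.append((email_id, "hasReceiver", address))

    if msg.get("In-Reply-To"):
        associations.append((email_id, "repliesTo", msg["In-Reply-To"].strip()))

    # Only plain single-part bodies are scanned in this sketch.
    body = msg.get_payload() if not msg.is_multipart() else ""
    for url in URL_PATTERN.findall(body):
        associations.append((email_id, "refersTo", url))
    return associations


def infer_list_receivers(associations, subscribers):
    """Rule: an email sent to a list address is received by every subscriber."""
    inferred = []
    for subject, association, obj in associations:
        if association == "hasReceiver" and obj in subscribers:
            for person in subscribers[obj]:
                inferred.append((subject, "hasReceiver", person))
    return inferred


RAW = """\
From: Alice <alice@example.org>
To: team-list@example.org
Message-ID: <1@example.org>
Subject: Prototype notes

See http://wiki.example.org/Prototype for details.
"""

direct = extract_associations(RAW)
inferred = infer_list_receivers(
    direct, {"team-list@example.org": ["alice@example.org", "bob@example.org"]}
)
```

In a setup like the one described above, the direct associations would be posted to the platform by the feeder, while the inferred ones would be produced by a platform rule when the network is updated.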
4. APPLICATION AND PRELIMINARY RESULTS
As a first application of the platform to the analysis of multi-modal communication behavior, we used data collected from eleven global engineering projects, which ran for a period of nine months. The projects were placed in a joint academic partnership between Stanford University and six global institutions, with each project team distributed and composed of two groups of three students. Teams were multi-disciplinary, with disciplines ranging from mechanical engineering and software engineering to product and industrial design. The eleven project teams worked independently on prevailing engineering design tasks that were given out and accompanied by corporate liaisons under realistic project conditions, including budget and time constraints. The design process involved early need-finding activities, user observations, iterative prototyping and evaluation, and closed with a fully functional and documented prototype handed over to the responsible liaison. The organization of individual team processes with regard to scheduled meetings, presentations to the liaisons, and the definition of work packages was entirely the responsibility of the students. However, in addition to the similar team structures and budget constraints, all projects were synchronized in terms of start and end dates, major milestones and documentation deadlines, which makes their information sharing activities comparable. The teams were provided with dedicated IT infrastructure to support synchronous communication (video conferencing, multi-user desktop systems) and asynchronous communication (email lists, project wikis, shared document spaces) with their off-site peer team members. Email archives and server log files provided insights into asynchronous information sharing activities via email and project wikis, both of which turned out to be widely adopted tools in the observed projects.
We started to derive social information networks from approx. 8700 emails (containing more than 2900 hyperlinks and 1700 file attachments), 1200 wiki resources and shared documents in public online folders. The average ratio between emails sent and direct associations to secondary information objects submitted as attachments or hyperlinks was approx. 1 to 0.6. This indicates that email messaging is commonly used to share information that is not only encoded in the message itself but also provided in files or in external information resources on the Web, which supports the relevance of a multi-modal, multi-channel view on team communication processes. With the import of data from multiple projects, we have created semantically rich social information networks, each connecting in the range of 1,000 resources or more. On average, each resource carried more than seven time-annotated triple statements. This figure does not include inferred statements, which increase the effective number of relationships considerably. First results indicate that the performance of the platform, and of the inference engine in particular, scales to our analysis needs. Current steps comprise the visualization and combined analysis of network structures and information content in order to reveal hidden characteristics of team communication in global projects.
5. CONCLUSION AND OUTLOOK
This work has motivated a non-interfering approach to assessing the online information sharing activities of collaboration groups. We have argued that complex communication processes in distributed collaboration need to be made more perceptible in order to increase context awareness and the observability of the exchanged information. To this end, we have proposed a graph-based approach in which associations identified in online information sharing are captured and consolidated into semantic networks. Because the approach processes data that is generated through the use of accepted collaboration tools, no additional interaction or overhead is introduced into the workflow of process participants. The consolidated information can be leveraged instantly to retrieve information and to enhance context awareness in collaborative design activities at project time. At the same time, social information networks provide a basis for structured graph analysis, promoting research into the information sharing practices and communication behavior of distributed teams. The d.store platform offers a resource-oriented service interface to social information networks, providing central access to the context and relationships of decentralized information objects and individuals. Our work continues with a detailed exploration of the collected data and the communication signatures that we can derive from it. In particular, we will use the graph model to apply algorithms that identify characteristic elements in the communication behavior of global virtual teams, and we will analyze the potential impact of these signatures on team performance.
REFERENCES
Bannon, L. and Bodker, S., 1997. Constructing common information spaces. Proceedings of the Fifth European Conference on Computer-supported Cooperative Work. Norwell, MA, pp. 81-96.
Fielding, R. et al., 1999. Hypertext Transfer Protocol – HTTP/1.1. RFC Editor, The Internet Engineering Taskforce, http://www.ietf.org/rfc/rfc2616
Fielding, R., 2000. Architectural styles and the design of network-based software architectures. Ph.D. dissertation, University of California, Irvine.
Poltrock, S. et al., 2003. Information Seeking and Sharing in Design Teams. Proceedings of the 2003 International ACM SIGGROUP Conference on Supporting Group Work. New York, NY, pp. 239-247.
Sonnenwald, D.H. and Lievrouw, L.A., 1997. Collaboration during the design process: A case study of communication, information behavior, and project performance. ISIC ’96: Proceedings of an International Conference on Information Seeking in Context. London, UK, pp. 179-204.
W3C, 2004. OWL Web Ontology Language Reference. W3C Recommendation, http://www.w3c.org/TR/owl-ref/
W3C, 2004 (2). Resource Description Framework (RDF): Concepts and Abstract Syntax. W3C Recommendation, http://www.w3c.org/TR/rdf-concepts/