Mining for "Consistent" Relationship Structures
Srinath Srinivasa¹ Myra Spiliopoulou†
Brandenburgische Technische Universität, PF 101344, Cottbus, Germany
[email protected]
† Institut für Wirtschaftsinformatik, Humboldt-Universität zu Berlin, Spandauer Str. 1, 10178 Berlin, Germany
[email protected]
1 INTRODUCTION
Internet-based commercial activities have been shown to follow a fundamentally different set of equations from conventional commerce [1]. Some of the hallmarks of internet commerce are the intangible nature of the commercial activity and the presence of a constantly changing environment. In this context, it becomes necessary to look for relationship patterns in order to identify hidden models of behavior. Most approaches towards identification of hidden patterns rely on an aggregate measure for the support of a particular assertion. However, when identifying patterns of traversals, interactions or any other relationship structure, it is not sufficient to rely on aggregate measures alone. We also require information about how consistently the patterns have manifested themselves.

In this paper we introduce the notion of consistency for relationship patterns that are mined from log files. In an intuitive sense, the consistency of a particular relationship between two nodes is calculated by introducing a measure called the relationship "level". The relationship level is an indicator of how much the relationship is preferred, despite the cost of executing the relationship.

This work is part of our larger effort to provide support for managing collaborations among a set of independent entities. The proposed model for consistent patterns can find many applications, such as redesigning web sites, identifying opportunities for strategic collaboration, designing workflows for virtual organization structures and modeling the behavior of systems of autonomous entities, such as internet-based marketplaces.

¹ This research was supported by the German Research Society, Berlin-Brandenburg Graduate School in Distributed Information Systems (DFG grant no. GRK 316).
2 CONSISTENT RELATIONSHIPS
The problem domain consists of a set of nodes (which could be sites, pages or any similar logical entity) and a set of log files which depict traversals, interactions or any other kind of relationships that have been recorded across these nodes. Mining for patterns of relationships among the nodes would essentially involve construction of a graph structure depicting the materialized relationships. In a traditional sense, mining for such patterns would involve computing an aggregate measure of the number of manifestations of the different kinds of relationships and providing a value of support for each of the different relationships. However, for making decisions like redesign of web sites or planning strategic collaborations, one would require information not only on the number of manifestations, but also on the consistency of such manifestations.

The notion of consistency proposed here is based on a cost factor that is applied between any two nodes. The essential idea behind the model is to mine for patterns which have at least a minimum support, despite the cost of manifesting the relationships.

Mathematically, the problem domain is depicted by a graph whose vertices are the set of nodes being considered, and whose edges, or the relationships among the vertices, are the traversals recorded in the log files. In order to record the changes over time in the set of traversals and also in the set of nodes, the graph is sampled at equidistant time intervals $[0, i, 2i, \ldots]$. The set of nodes and the relationships between them at the end of time interval $t$ is represented by a time-indexed graph $G_t = (V_t, E_t)$, where $V_t = \{n_1, n_2, \ldots, n_k\}$ is the set of all nodes of $G_t$ at time $t$, and $E_t$ is a set of tuples of the form $(n_i, n_j, cdegree_t, clevel_t)$, $n_i, n_j \in V_t$, which indicates relationships between the nodes at time $t$.
The term $cdegree_t$ contains information about the number of manifestations of the relationship that has been recorded in time interval $t$, and the term $clevel_t$ contains a measure of the relationship "level", or consistency, of the relationship. Figure 1 depicts the concept of relationship level in an intuitive manner. Mathematically, $cdegree_t$ of a relationship $R$ is the number of manifestations of $R$ in time $t$, normalized over the entire set of relationships manifested in $t$.

[Figure 1: Relationship level. Labels: manifestation of relationship R; relationship "level" of R; cost of executing R.]
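The normalization of cdegree can be illustrated with a minimal sketch. It assumes that "normalized over the entire set of relationships" means dividing each relationship's count by the total number of manifestations logged in the interval; the text does not spell out the denominator, so this is one reading. The function name `cdegree` and the toy log are hypothetical and not taken from the paper.

```python
from collections import Counter


def cdegree(records):
    """Return {(src, dst): share of manifestations} for one time interval.

    `records` is an iterable of (src, dst) pairs, one per logged
    manifestation. Assumption: normalization divides by the total number
    of manifestations in the interval.
    """
    counts = Counter(records)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {edge: n / total for edge, n in counts.items()}


# Toy log for a single interval (made-up data).
interval_log = [("a", "b"), ("a", "b"), ("b", "c"), ("a", "b")]
degrees = cdegree(interval_log)
```

Under this reading the cdegree values of one interval sum to 1, so they can be compared across intervals with different amounts of traffic.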
The relationship level of $R$ is calculated as:

$$clevel_0 = (1 - \alpha)\, cdegree_0$$
$$clevel_t = clevel_{t-1} + (1 - \alpha)\, cdegree_t$$

where $0 \le \alpha \le 1$, called the "relationship overhead", is a heuristic measure of the cost involved in relating the two nodes. At any given point in time, the relationship levels of the edges in the graph provide information on how much each of the edges is preferred, relative to its cost.

Using this notion, the entire graph can now be partitioned into different relationship clusters, based on an interestingness criterion or threshold value of relationship level. A level-$n$ cluster of $G_t$ is the graph consisting of all nodes of $G_t$ and only those relationships whose relationship level is at least $n$.

While a level-$n$ cluster provides information about the consistency of each individual relationship, it provides no information about the consistency of the cluster itself. In order to obtain a measure of the consistency of relationship clusters, the concept of relationship level is again applied to sets of relationships. The cohesiveness of a cluster is defined to be the average of the relationship levels of all the relationships in the cluster. A measure of the consistency of the structure depicted by a level-$n$ cluster is hence determined by the cohesiveness "level" of the cluster. This term is called the "degree of organization" of the relationship cluster, and is calculated as follows:

$$OD_0(G') = (1 - \beta)\, CG_0(G')$$
$$OD_t(G') = OD_{t-1}(G') + (1 - \beta)\, CG_t(G')$$

where $OD_t$ is the degree of organization, $CG_t$ is the cohesiveness value for the cluster $G'$, and $0 \le \beta \le 1$, called the "organizational inertia", is a heuristic measure of the cost involved in forming a new organizational structure out of the set of nodes. The value of $\beta$ may or may not be dependent on the values of $\alpha$ of each individual relationship.
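The relationship level, level-n cluster and cohesiveness can be sketched together as follows. This is an illustration under stated assumptions, not the authors' implementation: the update takes the recurrence as the previous level plus (1 - overhead) times the current cdegree, an edge with no earlier level starts from 0 (which reproduces the initial case), and an empty cluster is given cohesiveness 0, which the text leaves undefined. All function names and sample values are hypothetical.

```python
def update_levels(prev_levels, degrees, overhead):
    """Advance every edge's relationship level by one sampling interval.

    Assumed recurrence: level_t = level_{t-1} + (1 - overhead) * cdegree_t,
    where a missing previous level or missing cdegree counts as 0.
    """
    if not 0.0 <= overhead <= 1.0:
        raise ValueError("overhead must lie in [0, 1]")
    edges = set(prev_levels) | set(degrees)
    return {
        e: prev_levels.get(e, 0.0) + (1.0 - overhead) * degrees.get(e, 0.0)
        for e in edges
    }


def level_cluster(levels, n):
    """Keep only the relationships whose level is at least n."""
    return {e: lv for e, lv in levels.items() if lv >= n}


def cohesiveness(cluster):
    """Average relationship level in the cluster (0.0 if empty: assumption)."""
    if not cluster:
        return 0.0
    return sum(cluster.values()) / len(cluster)


# Two made-up samples of cdegree values, one dict per interval.
samples = [
    {("a", "b"): 0.75, ("b", "c"): 0.25},
    {("a", "b"): 0.5, ("a", "c"): 0.5},
]
levels = {}
for degrees in samples:
    levels = update_levels(levels, degrees, overhead=0.5)

cluster = level_cluster(levels, n=0.25)
```

With these toy numbers the edge (a, b), which appears in both intervals, ends at level 0.625, while (b, c) stays at 0.125 and falls out of the level-0.25 cluster.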
Hence a level-n relationship cluster with a degree of organization value of k provides information not only on how consistently the relationships in the cluster are preferred, but also on how consistently the structure itself is maintained. Using the above paradigms, consistent interaction structures can be discerned from transaction data. In [3] the above paradigm is applied to the messages of a newsgroup to determine how users of the newsgroup interact with one another, in terms of replying to each other's messages. The experiment revealed a set of three users who tended to interact among themselves and hence had formed an effective subgroup.
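The degree of organization follows the same pattern as the relationship level, applied to the cohesiveness of a cluster. The sketch below assumes the caller supplies one cohesiveness value per sampling interval for the cluster being tracked, and takes the recurrence as the previous value plus (1 - inertia) times the current cohesiveness, starting from 0. The function name and the input series are hypothetical.

```python
def degree_of_organization(cohesiveness_series, inertia):
    """Return the degree of organization after each sampling interval.

    Assumed recurrence: od_t = od_{t-1} + (1 - inertia) * cg_t, with the
    value before the first sample taken as 0.
    """
    if not 0.0 <= inertia <= 1.0:
        raise ValueError("inertia must lie in [0, 1]")
    od = 0.0
    history = []
    for cg in cohesiveness_series:
        od = od + (1.0 - inertia) * cg
        history.append(od)
    return history


# Made-up cohesiveness values for three consecutive intervals.
od_history = degree_of_organization([0.5, 0.25, 0.25], inertia=0.5)
```

How the membership of the tracked cluster is fixed across intervals is not specified in the text, so the sketch leaves that choice to whoever produces the cohesiveness series.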
3 CONCLUSION
This paper introduced the notion of consistency in mining relationship patterns. The proposed model can be adapted to problems of mining for relationship patterns in different applications. Presently we are also applying the paradigm to finding consistent navigation paths in web logs, based on the paradigms for web log mining introduced in [2]. From a more general perspective, the paradigms proposed here are used to address systems of autonomous players, which we call "ad hoc" systems, where each player of the system has its own individual goals, and players interact with one another to achieve their goals. The task is to design services which can cater to such a system of interactions. The notion of consistent interaction patterns is used to analyze the interaction structures for designing new services.
REFERENCES
[1] M. Goldhaber. The Attention Economy and the Net. First Monday, Vol 2, No 4, April 2 1997, http://www.firstmonday.dk/
[2] M. Spiliopoulou and L. C. Faulstich. WUM: A Tool for Web Utilization Analysis. In EDBT Workshop WebDB'98, Valencia, Spain, March 1998.
[3] S. Srinivasa and M. Spiliopoulou. Degree of Organization: Modeling Collaboration Structures in Information Marketplaces. Technical Report, Brandenburgische Technische Universität, I-05/1999.