
Detection of invisible nodes in a homogeneous network

Yoshiharu Maeno1, Kiichi Ito2, and Yukio Ohsawa3

1 Graduate School of Systems Management, Tsukuba University, 3-29-11 Otsuka, Bunkyo-ku, Tokyo, 112-0012 Japan, [email protected]
2 Graduate School of Media and Governance, Keio University, 5322 Endo, Fujisawa-shi, Kanagawa 252-8520 Japan
3 School of Engineering, the University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8563 Japan

Abstract. We study the crystallization algorithm for detecting invisible nodes in a homogeneous network. A homogeneous network does not possess very large hub nodes. The portion of the data generated from a simulation model in which the invisible nodes are likely to exist is identified with good precision. This demonstrates that human-interactive annealing along with the crystallization algorithm can be applied to homogeneous networks as well as inhomogeneous networks.

1 Introduction

Researchers of chance discovery have recently recognized a new class of problems that previous tools such as KeyGraph [Oh06] fail to visualize: the discovery of important events that are neither visible nor observed as data. Unobserved events often play an important role in the dynamics of observed events. Such invisible but relevant events are named dark events, after dark matter in cosmology. Dark events are essential to understanding communication between employees that is invisible to a manager in a company, investigating unnoticed clues in crime cases, or handling unexpected attacks from terrorist networks.

Human-interactive annealing along with the crystallization algorithm is a new method [Ma07a], [Oh05] in which the locations of dark events are inferred and visualized based on the weak higher-order mixture between independent clusters within observed data. Characteristics of the algorithm for scale-free networks were studied in [Ma06a]. Demonstrations of human-interactive annealing for detecting a hidden leader, inventing new ideas for technology development, and fostering communication by introducing a new catalyst personality were described in [Ma06c], [Ma06d], [Ma07b].

The experimental conditions of the above studies, however, focus on relationships between events that are characterized by an inhomogeneous network such as the Barabasi-Albert model [Ba99]. An inhomogeneous network possesses a few hub nodes. The hub nodes have a very large nodal degree, which contributes to efficient connection between individual nodes. A homogeneous network, in contrast, does not possess very large hub nodes. Terrorists [Po06] are rather characterized by a homogeneous network, while conventional business organizations are characterized by an inhomogeneous network. In this paper, we study the crystallization algorithm for detecting invisible nodes in a homogeneous network. The portion of the data generated from a simulated homogeneous network in which invisible nodes are likely to exist is identified with the algorithm.

2 Crystallization algorithm

This section briefly describes the crystallization algorithm [Ma07a]. The input is observed basket data $b_i = \{e_j\}$. The content of a basket is a set of events grouped under a specific subject. First, events are identified; that is, all the events in the basket data $b_i$ are picked up. An individual event is denoted by $e_j$. Then, the events are clustered into groups, using co-occurrence as the measure of similarity between events. The Jaccard coefficient is calculated for all pairs of events from eq.(1) as the measure of co-occurrence.

$$Ja(e_i, e_j) \equiv \frac{\mathrm{Freq}(e_i \cap e_j)}{\mathrm{Freq}(e_i \cup e_j)}. \qquad (1)$$
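As an illustration of eq.(1), the following is a minimal Python sketch, not the authors' implementation. It assumes that Freq counts the number of baskets: Freq(e_i ∩ e_j) is taken as the number of baskets containing both events, and Freq(e_i ∪ e_j) as the number containing at least one of them. The function name `jaccard_coefficients` and the toy baskets are hypothetical.

```python
from itertools import combinations


def jaccard_coefficients(baskets):
    """Return {(e_i, e_j): Ja(e_i, e_j)} for all event pairs, following eq.(1).

    Assumption: Freq() counts baskets, so the coefficient is computed on the
    sets of basket indices in which each event occurs.
    """
    # Map each event to the set of basket indices in which it occurs.
    occurrences = {}
    for index, basket in enumerate(baskets):
        for event in set(basket):
            occurrences.setdefault(event, set()).add(index)

    coefficients = {}
    for e_i, e_j in combinations(sorted(occurrences), 2):
        both = len(occurrences[e_i] & occurrences[e_j])    # Freq(e_i ∩ e_j)
        either = len(occurrences[e_i] | occurrences[e_j])  # Freq(e_i ∪ e_j)
        coefficients[(e_i, e_j)] = both / either
    return coefficients


# Hypothetical toy basket data, for illustration only.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"d"}]
ja = jaccard_coefficients(baskets)
```

In this toy data, events "a" and "b" share two of the three baskets in which either occurs, so their coefficient is 2/3, while "a" and "d" never co-occur and get 0.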

Here, we employ the k-medoid clustering algorithm for simplicity [Ha01]. It is an EM algorithm similar to the k-means algorithm for numerical data. A medoid event $e_{med(j)}$ is the event located most centrally within a cluster $c_j$. The medoids are initially selected at random. The other $|E| - |C|$ events are classified into the clusters based on their closeness to the medoids. A new medoid is then selected within each cluster so that the sum of closeness from the events within the cluster to the medoid is maximal. The closeness is evaluated by eq.(2). This is repeated until the medoids converge. The resulting clusters are denoted by $c_j$.

$$Cl_j \equiv \sum_{e_j \neq e_{med(j)},\, e_j \in c_j} Ja(e_{med(j)}, e_j). \qquad (2)$$
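The clustering loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the paper: the function name `k_medoids`, the `seed` and `max_iter` parameters, and the tie-breaking rule (the first candidate wins) are all hypothetical, and `similarity` stands for any symmetric function such as the Jaccard coefficient of eq.(1).

```python
import random


def k_medoids(events, similarity, k, seed=0, max_iter=100):
    """Cluster events around k medoids; returns {medoid: [members]}.

    Assumptions: events are sortable and hashable, similarity(x, y) is
    symmetric, and max_iter >= 1.
    """
    rng = random.Random(seed)
    medoids = rng.sample(sorted(events), k)  # medoids are initially random
    for _ in range(max_iter):
        # Assignment step: every non-medoid event joins its closest medoid.
        clusters = {m: [m] for m in medoids}
        for e in events:
            if e not in clusters:
                best = max(medoids, key=lambda m: similarity(m, e))
                clusters[best].append(e)
        # Update step: within each cluster, pick the member that maximizes
        # the summed closeness of eq.(2) to the other members.
        new_medoids = [
            max(members,
                key=lambda c: sum(similarity(c, e) for e in members if e != c))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break  # the medoids have converged
        medoids = new_medoids
    return clusters


# Hypothetical toy example: two groups with similarity 1 inside, 0 across.
group = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
toy_clusters = k_medoids(list(group), lambda x, y: float(group[x] == group[y]), k=2)
```

The iteration cap is a safeguard added for the sketch, since a naive medoid update can in principle oscillate between equally good candidates.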

Finally, dark events are crystallized from the higher-order mixture between independent clusters. A dummy event $DE_i$ is inserted into a basket $b_i$, which results in $b_i \to \{e_j, DE_i\}$. The index $i$ identifies the basket into which the corresponding dummy event was inserted. The dummy event may represent a set of invisible participants in the basket. The higher-order mixture of independent clusters is calculated from the co-occurrence between the dummy event and the events in the clusters within an individual basket. The co-occurrence is evaluated according to eq.(3). The dummy events that have large co-occurrence are picked up as candidate basket data in which to locate missing nodes.

$$Co(DE_i, C) \equiv \sum_{j=0}^{|C|-1} \max_{e_j \in c_j,\, e_j \in b_i} Ja(DE_i, e_j). \qquad (3)$$
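A minimal sketch of the ranking by eq.(3) is given below. It rests on an assumption that is not stated in the text: if Freq counts baskets and each dummy event occurs in exactly one basket, then for an event e in that basket the intersection count is 1 and the union count equals Freq(e), so Ja(DE_i, e) reduces to 1 / Freq(e). The function name `crystallize` and the toy data are hypothetical.

```python
def crystallize(baskets, clusters):
    """Rank baskets by Co(DE_i, C) of eq.(3); returns [(score, basket index)].

    baskets:  list of sets of events (before dummy insertion).
    clusters: list of sets of events, e.g. from a k-medoid clustering.
    Assumption: DE_i occurs only in basket b_i, so Ja(DE_i, e) = 1 / Freq(e)
    for every e in b_i, where Freq(e) is the number of baskets containing e.
    """
    freq = {}
    for basket in baskets:
        for e in basket:
            freq[e] = freq.get(e, 0) + 1

    scores = []
    for i, basket in enumerate(baskets):
        co = 0.0
        for cluster in clusters:
            shared = basket & cluster  # events with e in c_j and e in b_i
            if shared:
                co += max(1.0 / freq[e] for e in shared)
        scores.append((co, i))
    # The largest co-occurrence comes first: these are the candidate baskets.
    return sorted(scores, reverse=True)


# Hypothetical toy data, for illustration only.
toy_baskets = [{"a", "b"}, {"a", "c"}, {"a", "b", "c", "d"}]
toy_event_clusters = [{"a", "b"}, {"c", "d"}]
ranking = crystallize(toy_baskets, toy_event_clusters)
```

Under this reading, a basket scores highly when it touches many clusters and does so through events that are rare in the data, which is one way to interpret the weak mixture between independent clusters.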


3 Simulation study

We present a basic evaluation of the crystallization algorithm using test data generated from a homogeneous scale-free network. The scale-free network is a commonly used model for describing human communication, relationships, or dependence in social problems [Ba99].

3.1 Simulation model

Fig.1 shows a homogeneous scale-free network having 995 nodes. This network is used in the experiment. As a whole, (1) the network seems to have two large groups, a larger one on the left and a smaller one on the right, (2) smaller groups seem to exist inside the two large groups, (3) the boundary between the groups is not clear, and (4) most importantly, the network does not include large hub nodes that provide a central facility between nodes. Most nodes have similar characteristics in terms of nodal degree. This is confirmed by Fig.2.

The homogeneous network is compared with a typical inhomogeneous network (Barabasi-Albert model) having 1000 nodes. The inhomogeneous network tends to contain centrally located hub nodes. The hub nodes influence the way the network operates. However, random deletion of events has little effect on the network's connectivity and effectiveness. In both networks, the occurrence frequency distribution of nodal degree is ruled by a power law, that is, they have a scale-free nature. Both networks have an average nodal degree of 4. The nodal degree of the homogeneous network, however, ranges from 3 to 8, while hub nodes having a degree of 10 to 20 exist in the inhomogeneous network.

The objective of the experiment is to evaluate, with precision as the measure, how much information regarding deleted nodes (invisible nodes) the crystallization algorithm can recover from the test data. The test data was generated in the two steps below.

1. Basket data formation: Basket data representing neighbor nodes is generated from the network. Nodes under the direct influence of an individual node are grouped into a basket. For example, we can imagine a situation where a person starts talking and a conversation takes place among neighboring persons. The area of such influence is specified approximately by the distance from the initiating person. An example of basket data is (954, 1930, 3261, 5093, 5223, 7743, 7808, ...), representing a communication chain starting from the node No.954.
2. Latent structure configuration: A latent structure regarding the nodes of interest is configured in the basket data by deleting those nodes from the basket data. The structure is thereby made invisible. As a result, the deleted nodes and the links interconnecting them become a latent structure hidden behind the basket data. The example basket data becomes (954, 1930, 3261, 5093, 7743, 7808, ...) when the node No.5223 is the latent structure of interest in the experiment. This basket data is the input to the crystallization algorithm. The crystallization algorithm attempts to identify this basket data as a candidate where a node may be hidden.
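The two steps above can be sketched in Python as follows. This is an illustrative reading, not the authors' generator. It assumes that the area of influence is the set of nodes within a fixed number of hops found by breadth-first search, that one basket is formed per initiating node, and that baskets initiated by deleted nodes are kept with the deleted nodes removed. The names `nodes_within_hops` and `make_test_data` and the small path graph are hypothetical.

```python
from collections import deque


def nodes_within_hops(adjacency, start, max_hops):
    """Breadth-first search: all nodes at most max_hops edges from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the area of influence
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


def make_test_data(adjacency, basket_hops, hidden_center, hidden_hops):
    """Return (observed baskets, hidden nodes, indices of affected baskets)."""
    # Step 1: basket data formation, one basket per initiating node.
    baskets = {n: nodes_within_hops(adjacency, n, basket_hops) for n in adjacency}
    # Step 2: latent structure configuration, deleting the nodes of interest.
    hidden = nodes_within_hops(adjacency, hidden_center, hidden_hops)
    observed = {n: basket - hidden for n, basket in baskets.items()}
    # Baskets that lost at least one node are the correct answers.
    affected = {n for n, basket in baskets.items() if basket & hidden}
    return observed, hidden, affected


# Hypothetical path graph 0-1-2-3-4, for illustration only.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
observed, hidden, affected = make_test_data(path, basket_hops=1,
                                            hidden_center=2, hidden_hops=0)
```

With five-hop baskets and a two-hop latent structure, as in the experimental condition of Section 3.2, the call would be `make_test_data(adjacency, 5, 5223, 2)` for an adjacency mapping of the network in Fig.1.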


Fig. 1. Homogeneous network consisting of 995 nodes. Nodes around No.5223 are made invisible as a latent structure in the observed data in the experiment.

3.2 Evaluation

We present a quantitative performance evaluation to see whether the crystallization algorithm can identify, as correct answers, the basket data from which nodes were deleted. In information retrieval, precision is used as an evaluation criterion. Precision is the fraction of relevant data among all the data returned by a search. Here, precision is evaluated as the ratio of correct dummy events among the dummy events picked up by the crystallization algorithm. The correct dummy events are those inserted into the basket data from which events had been deleted. In other words, they are the ones relevant to understanding the latent structure.

A single experimental condition is demonstrated in this paper. A systematic study of various conditions is planned and will be reported elsewhere. Basket data consisting of the nodes within five hops of an individual node is employed in step 1 described in 3.1. One hop is as long as one edge of the network shown in Fig.1. The number of nodes within five hops is about 20% of all nodes on average. This is relatively long-distance communication. The latent structure of interest includes fifteen nodes within two hops of the node No.5223. The node No.5223 has a nodal degree of 6. So to speak, they are a group of strategists supporting the organization's functions. Note, however, that they are not like a CEO governing a whole company. These nodes are deleted in step 2 described in 3.1, so that they are invisible in the basket data for the purpose of the experiment.

Fig. 2. Nodal degree characteristics of the network: (a) inhomogeneous scale-free network consisting of 1000 nodes (Barabasi-Albert model), (b) homogeneous scale-free network consisting of 995 nodes. Both networks have an average nodal degree of 4.

Fig.3 shows the calculated precision of identifying the baskets where a node was made invisible, as a function of the number of picked-up baskets. The top 10 baskets picked up as candidates are all correct, that is, the precision is 100% up to the tenth candidate. This shows that the crystallization algorithm provides relevant suggestions regarding missing events. The crystallization algorithm can be applied to homogeneous networks as well as inhomogeneous networks.
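The precision measure used here can be written down in a few lines of Python. The sketch below is illustrative only. The function name `precision_at_k` and the toy ranking are hypothetical and are not the experimental data behind Fig.3.

```python
def precision_at_k(ranked_baskets, correct_baskets, k):
    """Fraction of the top-k picked-up baskets that are correct answers.

    ranked_baskets:  basket indices ordered by decreasing co-occurrence.
    correct_baskets: set of basket indices from which a node was deleted.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    picked = ranked_baskets[:k]
    return sum(1 for b in picked if b in correct_baskets) / len(picked)


# Hypothetical toy ranking, for illustration only.
toy_ranked = [3, 1, 4, 0, 2]
toy_correct = {1, 3, 4}
curve = [precision_at_k(toy_ranked, toy_correct, k) for k in range(1, 6)]
```

Evaluating the function for k = 1, 2, ... gives a curve of precision against the number of picked-up baskets, which is the kind of plot described for Fig.3.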

4 Conclusion

We studied the crystallization algorithm for detecting invisible nodes in a homogeneous network. A homogeneous network does not possess very large hub nodes. The portion of the data generated from a simulation model in which the invisible nodes are likely to exist is identified with good precision. This demonstrates that human-interactive annealing along with the crystallization algorithm can be applied to homogeneous networks as well as inhomogeneous networks.

Fig. 3. Precision to identify the baskets, where a node was made invisible, as a function of the number of picked up baskets.

References

[Ba99] A. L. Barabasi, R. Albert, and H. Jeong: Mean-field theory for scale-free random networks, Physica A, 272, 173-187 (1999).
[Ha01] T. Hastie, R. Tibshirani, and J. Friedman: The elements of statistical learning: Data mining, inference, and prediction (Springer series in statistics). Springer-Verlag (2001).
[Ma07a] Y. Maeno, and Y. Ohsawa: Human-computer interactive annealing for discovering invisible dark events, to appear, IEEE Transactions on Industrial Electronics (2007).
[Ma07b] Y. Maeno, K. Ito, and Y. Ohsawa: Catalyst personality for fostering communication among groups with opposing preference, submitted to IEA/AIE (International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems).
[Ma06a] Y. Maeno, and Y. Ohsawa: Stable deterministic crystallization for discovering hidden hubs, Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Taipei (2006).
[Ma06c] Y. Maeno, and Y. Ohsawa: Crystallization highlighting hidden leaders, Proceedings of IPMU (Information Processing and Management of Uncertainty in Knowledge-Based Systems) International Conference, Paris (2006).
[Ma06d] Y. Maeno, K. Ito, K. Horie, and Y. Ohsawa: Human-interactive annealing for turning threat to opportunity in technology development, Proceedings of the IEEE/WIC/ACM International Conference on Data Mining, Workshop on Risk Mining, Hong Kong (2006).
[Oh06] Y. Ohsawa eds.: Chance discovery in real world decision making. Springer-Verlag (2006).
[Oh05] Y. Ohsawa: Data crystallization: chance discovery extended for dealing with unobservable events, New Mathematics and Natural Computation, 1, 373-392 (2005).
[Po06] R. L. Popp, and J. Yen: Emergent information technologies and enabling policies for counter-terrorism, IEEE Press (2006).
