Journal of Industrial and Intelligent Information Vol. 2, No. 2, June 2014
Hidden Members and Key Players Detection in Covert Networks Using Multiple Heterogeneous Layers
Wasi H. Butt, Usman Qamar, and Shoab A. Khan
National University of Sciences and Technology, Islamabad, Pakistan
Email: {wasi, usmanq, shoabak}@ceme.nust.edu.pk
Abstract—National security is one of the biggest challenges facing the world today because of the number of terrorist incidents occurring across the world. Suspicious individuals have been observed to work in highly organized groups. They hide among the general public and communicate through different media. The whole network is often organized in such a way that not all members interact directly; rather, they may interact through different mediators and may use different media for interaction. This type of group is difficult to identify completely because of the presence of mediators and the use of different communication media. In order to mitigate the potential risk of terrorist activities, such organized groups need to be destabilized, which is a big challenge for national security organizations. In this paper we propose a new model for detecting such networks. Detection is done by integrating databases that hold records of associations of suspicious individuals from different domains.

Index Terms—covert networks, key player detection, hidden members detection

I. INTRODUCTION

Among the challenges the world faces today, one that requires attention is national security. Many researchers have contributed efforts toward resolving the problem. For several years, social network analysis (SNA) and other graph-based technologies have commonly been used to find hidden suspicious networks [1]-[5]. This article addresses a similar problem. It has been noted that different terrorist attacks in the world were coordinated by small, determined networks of individuals who used a number of stratagems to obscure their identities and connectivity [6]. Terrorists are usually organized in small hidden cells designed to limit the chance of being detected [7], [8]. The threat of such attacks can be minimized if such suspicious groups and their members are identified in a timely manner. The challenge in identifying members of such networks is that such groups are often not directly linked, i.e., members are not directly connected through the same medium of interaction. By viewing only one medium of interaction, we cannot predict the associations between important individuals.

Important individuals in a network are commonly known as key players. The key players act as hubs between many other individuals. They are the most important members of the network because most of the communication passes through them. If they are accurately identified and extracted, the whole subgroup or even the group can be destabilized; however, accurate identification is not possible by observing just one medium of interaction. The idea behind the proposed model is to identify a complete terrorist group with the help of one identified individual and that individual's associations through different layers. If key players are detected correctly, the network can be destabilized, hence minimizing the threat.

The rest of the paper is organized as follows. Section II presents some of the contributions made in this area so far by different researchers. The proposed model is presented in detail in Section III. Section IV presents the implementation and results of the proposed model, while Section V provides the conclusion and future areas of work.
Manuscript received July 21, 2013; revised October 14, 2013.
2014 Engineering and Technology Publishing
doi: 10.12720/jiii.2.2.142-146

II. RELATED WORK

Bin Zhu et al. presented concept visualization to facilitate the understanding of social network concepts [9]. The authors implemented their ideas in a tool, Net Vizer, and presented experimental results indicating that their system facilitated better understanding of betweenness centrality, gatekeepers of subgroups, and structural similarity, and also supported faster comprehension of subgroup identification. Gong Yu presented a social network clustering analysis algorithm [10]. The author claims that the presented algorithm differs from traditional clustering algorithms. The proposed algorithm can group objects in a social network into different classes based on their links and identify relations among classes. Nasrullah Memon and Henrik Legind Larsen presented the framework of an investigative data mining toolkit [11]. The proposed algorithms and techniques have been implemented and applied to some past terrorist attacks, including the Bali bombing, 9/11, the first World Trade Center bombing of 1993, and the Khobar Towers bombing, in order to construct the command structure of the networks.

A framework for destabilizing terrorist networks has been presented by Kathleen M. Carley et al. [12]. The authors describe the limitations of applying traditional static social network analysis to the identification and destabilization of covert networks that have a cellular structure. Dynamic social network analysis is proposed as a solution, in which the addition and removal of members is also taken into account. A strategy to destabilize a network is proposed, and it is recommended to be applied after studying the whole network, including detection of key entities. The process of adding and dropping key entities is also discussed.

An algorithm for rapid computation of group betweenness centrality has been presented by Rami Puzis et al., in which the authors claim a computation time independent of the network size [13]. (Betweenness centrality is a measure of a node's centrality in a network: the number of shortest paths from all vertices to all others that pass through that node.) The algorithm is based on the concept of path betweenness centrality, which is also discussed in the paper. The method is shown to be usable for finding the most prominent group and has been applied to epidemic control in communication networks. The authors also claim that it can be used to evaluate distributions of group betweenness centrality and its correlation with group degree. An improved algorithm for computing betweenness centrality has been presented by Ulrik Brandes [14]. The proposed algorithm is claimed to take O(n + m) space and to run in O(nm) and O(nm + n^2 log n) time on unweighted and weighted networks, respectively, where n is the number of nodes and m is the number of links. Experimental evidence is also provided in the article.

A polynomial-time randomized algorithm for distinguishing high k-path centrality vertices from low k-path centrality vertices in any given (unweighted or weighted) graph has been presented by Tharaka Alahakoon [15]. The author claims the proposed algorithm to be faster, theoretically and experimentally, than the best known deterministic algorithm for computing exact betweenness centrality values (Brandes' algorithm). A multi-grained parallel algorithm for computing betweenness centrality has been proposed by Guangming Tan et al. [16]. The proposed method is based on a novel algorithmic handling of access conflicts for a CREW PRAM algorithm. Data-processor mapping, a novel edge-numbering strategy, and a new triple array data structure recording the shortest paths have been proposed to eliminate conflicts in accessing shared memory. An algorithm named QUBE (Quick algorithm for Updating BEtweenness centrality) has been presented by Min-Joong Lee et al. [17]. The authors claim that their work is the first to propose an efficient algorithm that handles updates of the betweenness centralities of vertices in a graph.

Nasrullah Memon et al. [18] presented the detection of critical regions in covert networks, taking the terrorist network of 9/11 as a case study. They study structural cohesion, which they say was traditionally used only for social network analysis (SNA) but which, according to them, can also be used in numerous other application areas such as investigative data mining for destabilizing terrorist networks. Structural cohesion is defined as the minimum number of actors who, if removed from a group, would disconnect the group. They also discuss several structural cohesion concepts, namely cliques, n-cliques, n-clans, and k-plexes, to draw conclusions about familiarity, robustness, and reachability within subgroups of the 9/11 terrorist network. Moreover, they propose a methodology for detecting critical regions in covert networks, i.e., the nodes whose removal or capture will disrupt most of the network. The same researchers have made another contribution [19], which discusses the study and development of new measures, theories, mathematical models, and algorithms to detect high-value individuals in terrorist networks. Some specific models and tools are also described and applied to a case study of the 7/7 London bombing to demonstrate their applicability to the area. Nasrullah Memon and David L. Hicks introduced an investigative data mining technique for the study of terrorist networks and applied it to detect high-value individuals [20]. The introduced technique was applied to the case of the 7/7 bombing and presented as a case study. Nasrullah Memon, Abdul Rasool Qureshi, Uffe Kock Wiil, and David L. Hicks [21] addressed subgroup detection in terrorist networks. A novel algorithm for subgroup detection is proposed, and an implemented demonstration system is presented. The idea presented in this work is that discovery of the organizational structure of terrorist networks leads investigators to terrorist cells. Therefore, detection of covert networks from terrorists' data is important to terrorism investigation and prevention of future terrorist activity.

Alice Paul [22], in a thesis titled “Detecting Covert Members of Terrorist Networks”, proposed modeling a covert organization as a social network in which edges represent communication between members, and then determining the subset of members to remove that maximizes the amount of communication through the key leader. The author presents a mixed-integer linear program representing this problem and also a decomposition for this optimization problem. Structural characteristics of vertices and subsets that increase communication are also discussed. The author is of the view that future work should develop the presented structural properties as well as heuristics for solving this problem, because the discussed approaches prove impractical for larger graphs, often running out of memory. Belinda A. Chiera [23] of the University of South Australia presented work that takes a first step towards determining how to locate hidden terrorist networks through the novel use of group-based social network metrics to characterize the features of hidden networks. Steve Kramer [24] developed and presented a new technique. The proposed algorithms have been implemented in the Paragon Network Analysis (PNA) software. The author claims that the method is more robust to missing or erroneous data than earlier techniques, especially techniques based on traditional centrality measures or on subgroup connectivity. The presented algorithm is claimed to be able to follow the changes in a terrorist cell as it shifts from a hidden “sleeper” state to an “active” state, and test results are included in support of the claim. It is described that the presented software could assist the intelligence community by warning of impending attacks from hidden cells going into action.
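The betweenness centrality measure defined in the review above can be made concrete with a short sketch. The code below follows the general shape of Brandes' approach for unweighted graphs (one breadth-first search per source, followed by back-propagation of path dependencies); it is an illustration written for this text, not the code of any cited work. The adjacency-dictionary input format, the function name, and the four-node toy graph are assumptions.

```python
from collections import deque

def betweenness_centrality(adj):
    """Betweenness for an unweighted, undirected graph (Brandes-style sketch).

    adj maps each node to a list of its neighbours (assumed input format).
    Returns node -> number of shortest paths between other node pairs that
    pass through it (fractional when several shortest paths tie).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths in sigma.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every undirected pair was visited from both endpoints.
    return {v: c / 2.0 for v, c in bc.items()}

# Hypothetical toy graph: a path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = betweenness_centrality(path)
```

On the toy path, the two inner nodes each lie on two shortest paths and the end nodes on none. Each neighbour should be listed once per node; parallel edges would have to be collapsed before calling the function, since repeated neighbours inflate the path counts.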
III. PROPOSED MODEL

In our proposed model, each medium of communication or interaction is termed a layer of association; for example, email is one layer of association, and bank transaction data is another layer. The proposed model caters for the distribution of an individual's associations across different layers. As presented in the literature, traditional approaches have considered only one layer of association, so individuals who are active in other layers can easily go unnoticed. The novelty of our proposed model is that different heterogeneous layers are taken into account in order to detect a group and its key players, so that the group can be destabilized. The suggested model can be used in the following fashion.

Select a master layer (a domain expert can suggest any layer to be the master layer, commonly the most important layer), which will be used to classify individuals into suspicious and non-suspicious sets on the basis of the contents of the interaction between two individuals. For example, if the email layer is chosen as the master layer, the contents of emails can be used to place sender and receiver in the suspicious or non-suspicious class; similarly, if the bank transactions database is chosen as the master layer, parties to transactions whose amount crosses a specific preset threshold can be placed in the suspicious class and the rest in the non-suspicious class. The set of suspicious individuals is denoted S = {S0, S1, ..., Sn}. The associations found between members of S are added to a set A, in which each element represents the association between two individuals; e.g., if S0 is the sender of a suspicious email and S1 and S2 are the receivers, (S0, S1) and (S0, S2) are added to A.

For each individual in S, find neighbors by searching all layers and add them to S. Neighbors are defined as all individuals that have a direct link with the node under consideration; e.g., ‘b’ is a neighbor of ‘a’ if ‘a’ has sent an email to ‘b’, or ‘a’ has sent a text message to ‘b’, or ‘a’ has transferred an amount to the account of ‘b’, and so on. Also add the association between the individual and its neighbor to A. The process continues until a leaf individual is detected, i.e., an individual that has no further neighbor.

Construct the graph G in which the elements of S are the nodes and A contains the edges between them.

Apply social network analysis concepts (betweenness centrality, gatekeepers of subgroups, and structural similarity) to identify the key players. Identification of key players can lead to destabilizing the terrorist network.

IV. IMPLEMENTATION AND RESULTS

The proposed model was implemented using Visual C#. A database of 0.1 million individuals with 1 million transactions was selected. Each transaction record was taken as one association between two individuals, together with the medium of association. The data was visualized to detect the groups. When only the email layer was considered, the following associations were detected: node 1 sent an email to node 4, so node 1 has an association with node 4; node 15 has an association with node 10; node 10 is associated with node 20; and node 20 has an association with node 22. So two groups were detected: one with two nodes, i.e., node 1 and node 4, and the other with nodes 15, 10, 20, and 22. To detect the key players, degree centrality was chosen and applied, which states that the most important node of the network is the one with the highest degree [25], i.e., the one connected to the maximum number of links. In our scenario for the email layer, the most important node, or key player, is the node that sent or received the maximum number of emails during the time span under observation. The following figure shows the associations and the detected groups when traversing the email layer only. Applying degree centrality, nodes 10 and 20, both with the highest degree of 2, were identified as the key players. The number of emails represents the strength of the bond between two individuals, but that is not included in the scope here.

Finally, all layers were considered, as our proposed approach prescribes, for the detection of groups and key players. The following interesting scenario was detected. Node 1 sends an email to node 4. Node 4 sends an SMS to node 11. Node 11 transfers an amount from his account to node 7's account. Node 7 travels with node 9 on the same flight. Node 9 talks with node 15 on the telephone. Node 15 sends an email to node 10 and also sends an SMS to ensure the delivery of the email. Node 10 sends an email to node 20, talks with him on the phone, and sends him an SMS. Node 20 informs node 22 through email. Node 22 calls node 23 by telephone. Node 23 travels with node 31 on the same flight, and node 31 talks with node 34. Node 34 does the actual job.

The resulting graph after applying the proposed model is a multigraph, as there can be more than one association between two individuals. As shown in the above figure, node 10 has email, SMS, and telephone associations with node 20. The presence of multiple edges between two nodes also shows the strength of the bond between them. Table I shows the results obtained after applying degree centrality to the detected group.

TABLE I. DEGREE CENTRALITIES

Node | Degree | Normalized Centrality (Node Degree/Total Nodes-1) [25]
1 | 1 | 1/12 = 0.083
4 | 2 | 2/12 = 0.17
11 | 2 | 2/12 = 0.17
7 | 2 | 2/12 = 0.17
9 | 2 | 2/12 = 0.17
15 | 3 | 3/12 = 0.25
10 | 5 | 5/12 = 0.42
20 | 4 | 4/12 = 0.33
22 | 2 | 2/12 = 0.17
23 | 2 | 2/12 = 0.17
31 | 2 | 2/12 = 0.17
34 | 1 | 1/12 = 0.083
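The expansion procedure of the proposed model and the degree counts of Table I can be illustrated with a small sketch. It encodes the twelve-node scenario described in the text as (layer, node, node) records, grows the suspicious set from one seed individual by following associations in the selected layers, and counts multigraph degrees. The record format, the layer labels, the function names, and the choice of seed nodes are assumptions made for illustration; the authors' Visual C# implementation is not reproduced here, and the master-layer content classification is replaced by a given seed.

```python
from collections import Counter, deque

# Associations from the scenario in the text, as (layer, a, b) records.
# The tuple format and layer labels are assumptions for illustration.
RECORDS = [
    ("email", 1, 4), ("sms", 4, 11), ("funds", 11, 7), ("air", 7, 9),
    ("phone", 9, 15), ("email", 15, 10), ("sms", 15, 10),
    ("email", 10, 20), ("phone", 10, 20), ("sms", 10, 20),
    ("email", 20, 22), ("phone", 22, 23), ("air", 23, 31),
    ("phone", 31, 34),
]

def expand_group(records, seeds, layers=None):
    """Grow the suspicious set S and the association set A from the seeds.

    Neighbours are searched in every layer named in `layers` (all layers
    when None). Associations are treated as undirected.
    """
    usable = [r for r in records if layers is None or r[0] in layers]
    members = set(seeds)
    associations = []
    used = set()          # indices of records already added to A
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for idx, (layer, a, b) in enumerate(usable):
            if idx in used or node not in (a, b):
                continue
            used.add(idx)
            associations.append((layer, a, b))
            other = b if node == a else a
            if other not in members:
                members.add(other)
                queue.append(other)
    return members, associations

def degree_centrality(associations):
    """Multigraph degree: each association counts once per endpoint."""
    degree = Counter()
    for _, a, b in associations:
        degree[a] += 1
        degree[b] += 1
    return degree

def key_players(degree):
    """Nodes with the highest degree; ties are all returned."""
    top = max(degree.values())
    return sorted(n for n, d in degree.items() if d == top)

# All layers, starting from one identified individual (node 1).
group, assoc = expand_group(RECORDS, seeds=[1])
degree = degree_centrality(assoc)

# Email layer only, seeded from node 15.
email_group, email_assoc = expand_group(RECORDS, seeds=[15], layers={"email"})
email_degree = degree_centrality(email_assoc)
```

With these records, the all-layer run reaches all twelve nodes and gives node 10 a degree of 5, in line with the degree column of Table I, while the email-only run stops at nodes 15, 10, 20, and 22 with nodes 10 and 20 tied at degree 2. The sketch returns raw degrees only; the normalized column follows by dividing each degree by the chosen denominator.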
According to the calculated results, node 10, with the highest degree, is detected as the key player. In the above scenario, if we observe only one medium, we will never be able to identify this whole group, and we would also never reach the actual individual. Secondly, the key player identified on the basis of SNA concepts differs from layer to layer; if we look at only one layer, the detected key player can differ from the actual one and would therefore be false. Table II contains a summary of the results obtained after applying detection on all layers separately and then collectively. The results demonstrate the strength of the proposed approach.

TABLE II. DETECTED GROUPS

Layer | Identified Groups | Detected Key Player | Original Key Player
Email | G1: 1, 4; G2: 15, 10, 20, 22 | Node 10, Node 20 | Node 10
SMS | G1: 4, 11; G2: 15, 10, 20 | Node 10 | Node 10
Telephonic Conversation | G1: 9, 15; G2: 20, 22; G3: 31, 34; G4: 10, 20 | All nodes have equal degree | Node 10
Air Travel | G1: 7, 9; G2: 23, 31 | All nodes have equal degree | Node 10
Funds Transfer | G1: 7, 11 | All nodes have equal degree | Node 10
All (Proposed) | G1: 1, 4, 11, 7, 9, 15, 10, 20, 22, 23, 31, 34 | Node 10 | Node 10

V. CONCLUSIONS

National security is a big challenge in the world today because of the increase in the number of terrorist incidents across the globe. One of the main problems in preventing such incidents is detecting suspicious individuals organized in covert groups in order to carry out such activities. Members of such groups often communicate through hidden mediators using different modes of interaction. This article has proposed a model that considers all available association layers that such individuals may use to coordinate their activities. The model also suggests considering multiple layers to find the most important nodes, which may be eliminated to destabilize such networks and prevent their planned events. The proposed model has been implemented, and results have been presented.

REFERENCES

[1] T. Coffman, S. Greenblatt, and S. Marcus, “Graph-based technologies for intelligence analysis,” Communications of the ACM, Special Issue on Emerging Technologies for Homeland Security, vol. 47, no. 3, pp. 45-47, March 2004.
[2] T. Coffman and S. Marcus, “Pattern classification in social network analysis: A case study,” in Proc. IEEE Aerospace Conference, Big Sky, MT, March 2004.
[3] T. Coffman and S. Marcus, “Dynamic classification of groups through social network analysis and HMMs,” in Proc. IEEE Aerospace Conference, Big Sky, MT, March 2004.
[4] M. Mukherjee and L. Holder, “Graph-based data mining on social networks,” Workshop on Link Analysis and Group Detection, 2004.
[5] J. Baumes et al., “Discovering hidden groups in communication networks,” in Proc. 2nd NSF/NIJ Symposium on Intelligence and Security Informatics, 2004.
[6] S. E. Martonosi et al., “A new framework for network disruption,” Computer Science: Social and Information Networks, 2011.
[7] K. M. Carley, “Estimating vulnerabilities in large covert networks,” in Proc. 9th International Command and Control Research and Technology Symposium, Coronado Resort, CA, 2004.
[8] M. Tsvetovat and K. M. Carley, “Generation of realistic social network datasets for testing of analysis and simulation tools,” Carnegie-Mellon University Technical Report CMU-ISRI-05-130, 2005.
[9] B. Zhu et al., Visualizing Social Network Concepts, Elsevier B.V., 2010.
[10] G. Yu, “Social network analysis based on BSP clustering algorithm,” Communications of the IIMA, vol. 7, no. 4, 2007.
[11] N. Memon and H. L. Larsen, “Investigative data mining toolkit: A software prototype for visualizing, analyzing and destabilizing terrorist networks,” Visualising Network Information, 2006, pp. 14-1–14-24.
[12] K. M. Carley, J. Reminga, and N. Kamneva, “Destabilizing terrorist networks,” in Proc. NAACSOS Conference, 2003.
[13] R. Puzis, Y. Elovici, and S. Dolev, “Fast algorithm for successive computation of group betweenness centrality,” American Physical Society, 2007.
[14] U. Brandes, “A faster algorithm for betweenness centrality,” Journal of Mathematical Sociology, vol. 25, no. 2, pp. 163-177, 2001.
[15] T. Alahakoon, “Path centrality: A new centrality measure in networks,” MS Dissertation, Department of Computer Science and Engineering, College of Engineering, University of South Florida, 2010.
[16] G. Tan, D. Tu, and N. Sun, “A parallel algorithm for computing betweenness centrality,” in Proc. International Conference on Parallel Processing, 2009, pp. 340-347.
[17] M.-J. Lee, J. Lee, and J. Y. Park, “A quick algorithm for updating betweenness centrality,” International World Wide Web Conference Committee, Lyon, France, 2012.
[18] N. Memon, K. C. Kristoffersen, D. L. Hicks, and H. L. Larsen, “Detecting critical regions in covert networks: A case study of 9/11 terrorists network,” The Second International Conference on Availability, Reliability and Security, 2007, pp. 861-870.
[19] N. Memon, N. Harkiolakis, and D. L. Hicks, “Detecting high-value individuals in covert networks: 7/7 London bombing case study,” in Proc. Computer Systems and Applications, 2008.
[20] N. Memon and D. L. Hicks, “Detecting key players in 11-M terrorist network: A case study,” presented at Third International Conference on Availability, Reliability and Security, 2008.
[21] N. Memon, A. R. Qureshi, U. K. Wiil, and D. L. Hicks, “Novel algorithms for subgroup detection in terrorist networks,” presented at International Conference on Availability, Reliability and Security, 2009.
[22] A. Paul, “Detecting covert members of terrorist networks,” Bachelors Thesis, Department of Mathematics, Harvey Mudd College, 2012.
[23] B. A. Chiera, “Group-based social network characterisation of hidden terrorist networks,” in Proc. 1st International Cyber Resilience Conference, Edith Cowan University, Perth, Western Australia, August 2010.
[24] S. Kramer, “A new method for detecting and tracking covert terrorist networks,” white paper, Paragon Science Inc.
[25] L. Tang and H. Liu, Community Detection and Mining in Social Media, ch. 2, September 2010.
Wasi H. Butt is pursuing his PhD in Software Engineering at the Department of Computer Engineering, College of Electrical and Mechanical Engineering, National University of Sciences and Technology, Pakistan. His PhD research topic is Terrorist Group Leader Detection using Social Network Analysis. Before starting his PhD, he received his Master's degree from the same institution in the field of software engineering. He also worked as Assistant Professor in the same department and developed a number of software applications.

Usman Qamar is working as Assistant Professor in the Department of Computer Engineering, College of Electrical and Mechanical Engineering, National University of Sciences and Technology, Pakistan. He received his PhD in Text and Data Mining from The University of Manchester. He is currently supervising a number of MS and PhD projects. His research interests are Databases, Data Mining, Text Mining, Sentiment Analysis, Social Network Analysis, and Business Intelligence.

Shoab A. Khan has a Ph.D. in Electrical and Computer Engineering from Georgia Institute of Technology, Atlanta, GA, USA. He is a professor of electrical and computer engineering at the College of Electrical & Mechanical Engineering, National University of Sciences and Technology (NUST). He is a founding member of the Center for Advanced Studies in Engineering (CASE) and CEO of the Center for Advanced Research in Engineering (CARE). CASE is a premier engineering institution and runs the largest postgraduate engineering program in the country, whereas CARE has risen to be one of the most prominent high-technology engineering organizations in Pakistan. Dr Khan is actively involved in research and development and has 5 awarded US patents and 160+ international publications. Dr Khan has more than 17 years of industrial experience in companies such as Scientific Atlanta, Picture Tel, Cisco Systems, and Avaz Networks. His most renowned engineering work was establishing and then heading a team that carried out pioneering System on Chip (SoC) design work in Pakistan. With his team, he designed the world's highest-density media processor for a carrier-class voice processing system. He has been awarded Tamgh-e-Imtiaz, the Presidential Award, for his contribution to the field of Engineering. He has also been a recipient of the National Education Award 2001 in the category of “Outstanding Services to Science and Technology”, the NCR National Excellence Award in the category of IT Education, a prestigious Cisco Systems research grant, and ICT R&D and PTCL R&D research funding, besides executing projects worth multiple millions of US dollars. This year CARE made history by winning PASHA ICT awards in four different categories and then winning three APICTA Merit awards in Malaysia in three different categories.