
Third International Workshop on Systematic Approaches to Digital Forensic Engineering

Node-based Probing and Monitoring to Investigate Use of Peer-to-Peer Technologies for Distribution of Contraband Material

Deborah W. Keeling, Justice Administration, University of Louisville, [email protected]

Olfa Nasraoui, Knowledge Discovery & Web Mining Lab, Computer Engineering & Computer Science, University of Louisville, [email protected]

Adel Elmaghraby, Computer Engineering & Computer Science Dept., University of Louisville, [email protected]

George Higgins, Justice Administration, University of Louisville, [email protected]

Michael Losavio, Justice Administration and Computer Engineering & Computer Science Dept., University of Louisville, [email protected]

Abstract

We consider the requirements for node-based probing and monitoring for network forensic investigation of the use of peer-to-peer technologies for distribution of contraband material. The architecture of peer-to-peer (P2P) data exchanges must be examined for opportunities to capture data on the transfer of contraband data, with a focus on node structures in P2P exchanges. The examination covers the technical, social and legal aspects of P2P use and leads to the design and testing of forensically sound investigative tools and protocols. Computational research must examine:

1. Undercover node-based probing and monitoring to build an approximate model of network activity
2. Flagging contraband content (keywords, hashes, other patterns)
3. Evaluation against different recipient querying, distribution and routing cases
4. Using the evaluation results to fine-tune the node positioning strategy

Legal and social research is needed to examine the U.S. and transnational legal constraints on the use of particular tools and the presence of possible behavioral signatures.

Keywords: probing, monitoring, peer-to-peer, contraband, network, forensics, legal, social

978-0-7695-3171-7/08 $25.00 © 2008 IEEE DOI 10.1109/SADFE.2008.16


1 Introduction - Node-based Probing and Monitoring of P2P Usage

1.1 Purpose, goals, and objectives

To develop a network forensic protocol that addresses P2P contraband exchanges, we must examine and test the architecture of peer-to-peer data exchanges in order to create legal tools that capture data on the transfer of contraband data.

1.2 Relevant literature

P2P networks offer a very attractive platform for participants in contraband information exchange because of the sense of “anonymity” that prevails during P2P transactions. P2P networks vary in how well they preserve the anonymity of their users. The anonymity stems from the way broadcast-based P2P retrieval works: a query is initiated at a starting node and relayed from that node to all nodes in its neighborhood, and so on, until matching information is found on a peer node, at which point a direct connection is established between the initiating node and that last node to transfer the content. On such a platform it is hard to know who requested certain information, because it is difficult, if possible at all, to determine whether a node requested the data for itself or on behalf of another node. In a way, everybody on the network acts as a universal sender and universal receiver to maintain anonymity. Examples of decentralized, anonymous and oversight-resistant P2P frameworks that pose this challenge include Freenet and GNUnet.

Looking at related work in P2P search, Yang et al. [5] proposed more efficient search engines in P2P spaces that reduce system load while locating resources. Manku et al. [6] proposed “lookahead” algorithms to help improve P2P performance. From a monitoring perspective, tools such as the IR-Wire system [8] may assist in accessing a large set of data on users and activity in P2P networks. Wang et al. [9] showed that even with anonymizing routers, peer-to-peer VoIP traffic can be tracked. Badonnel et al. [10] proposed a management system for tracking online misconduct in P2P networks using configurable honeypots. Constant polling of P2P networks may track first-uploaders of material using tools such as FirstSource. The remaining challenges include designing reliable monitoring algorithms and building an adequate P2P network testbed.

Wall [14] notes characteristics of the Internet that have enabled individuals to commit criminal activity easily; to date, little research has applied criminological theory to understand these activities.
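The broadcast-based retrieval described in Section 1.2 can be illustrated with a minimal sketch. It assumes a toy overlay held in a dictionary and a plain breadth-first flood; the function name `flood_query`, the node labels, and the absence of any time-to-live limit are illustrative assumptions, not features of a specific P2P protocol.

```python
from collections import deque

def flood_query(neighbors, start, holders):
    """Breadth-first flood of a query from `start` (hypothetical helper).

    neighbors: dict mapping node -> list of adjacent nodes (toy overlay)
    holders:   set of nodes that hold matching content
    Returns (holder, relay_path) for the first match, or (None, []).
    """
    came_from = {start: None}   # each node remembers only its predecessor
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in holders and node != start:
            # Walk the predecessor links back to the initiator.
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            path.reverse()
            return path[-1], path
        for nxt in neighbors.get(node, []):
            if nxt not in came_from:      # relay each query only once
                came_from[nxt] = node
                queue.append(nxt)
    return None, []

# Toy overlay: recipient C, relays B1..B3, distributor A (all assumed names).
overlay = {
    "C": ["B1", "B2"],
    "B1": ["C", "B3"],
    "B2": ["C"],
    "B3": ["B1", "A"],
    "A": ["B3"],
}
holder, path = flood_query(overlay, "C", {"A"})
```

Each relay stores only the neighbor it received the query from, which is why an observer at a single relay cannot tell whether that neighbor originated the query or merely forwarded it.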

2 Research Design and Methods

2.1 Computational aspects

We consider that the transaction space for digital contraband defines three possible roles (A, B, and C) that a given node may play in a P2P network-based contraband exchange, as shown in Figure 1: distributor of contraband (A); connection node (B), which only serves to establish a route from A to C via the relay mechanism of the broadcast-based retrieval described above; and recipient node (C), which has requested the contraband. While nodes of type B participate without intent, node A is suspect, and a node of type C could be suspect or innocent depending on the intent of its query. The latter ambiguity arises because in most P2P networks retrieval tries to match the query’s text against the filenames of the content; filenames can be misleading, however, and as a result users can occasionally end up receiving contraband content that does not really match their need. Even so, a node that acts in the role of type C very frequently is likely to be suspect.

[Figure 1 diagram; node labels: Creator/Top Level Distributor (A), Connection (B), Recipient (C)]

Figure 1: Transaction space for P2P contraband exchange

Our approach is based on four components:

2.1.1 Undercover node-based probing and monitoring to build an approximate model of network activity

Our technical approach positions a large number (K) of undercover nodes that serve three purposes (as displayed in Figure 2):

(1) Some of the nodes will act like honeypots, intentionally probing the network by “requesting” contraband content in a similar manner to type C nodes. We call these nodes the “query initiators”.
(2) Other nodes will act as connection nodes (type B) that work as “monitoring nodes”.
(3) An additional possibility is to have nodes act like “fake distributors”, distributing content that has contraband-sounding filenames or metadata but whose actual content is not contraband. Nodes that act in this capacity serve only to pinpoint suspicious requests, thus contributing to densification of the routing data that will be collected and monitored, as explained below.

Once positioned, the undercover nodes of type C will periodically select from a list of queries populated from domain knowledge provided by law enforcement and, optionally, from a volunteer-driven wiki platform. They will then initiate the selected query by broadcasting it to their neighbors. Under most existing P2P protocols, these neighbors (type B) will in turn relay the message to their own neighbors (unless the content was found), and so on. Once content is found on a node, the initiating node will have sufficient information (IP address and port number) to open a direct connection with the distributing node (type A). In this case we have a hit: we record the distributing node’s information (ID) in a suspect database, together with the key for the retrieved content (to avoid duplicate detection), and increment the node’s “activity counter”, which later helps in ranking all nodes by suspiciousness.
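The hit-recording step can be sketched as a small in-memory store. This is only an illustration under stated assumptions: the class name `SuspectDatabase`, the use of an address string as the node ID, and the choice to de-duplicate on the (node ID, content key) pair are all hypothetical, since the text does not specify a schema.

```python
from collections import Counter

class SuspectDatabase:
    """In-memory sketch of the suspect database (hypothetical schema)."""

    def __init__(self):
        self.activity = Counter()   # node ID -> activity counter
        self.seen = set()           # (node ID, content key) pairs already recorded

    def record_hit(self, node_id, content_key):
        """Record a hit; return True if it is new, False if a duplicate."""
        if (node_id, content_key) in self.seen:
            return False            # same content from same node: not recounted
        self.seen.add((node_id, content_key))
        self.activity[node_id] += 1
        return True

    def ranking(self):
        """Nodes ordered by descending activity counter, then by node ID."""
        return sorted(self.activity.items(), key=lambda kv: (-kv[1], kv[0]))

db = SuspectDatabase()
db.record_hit("10.0.0.5:6346", "key-1")
db.record_hit("10.0.0.5:6346", "key-1")   # duplicate, ignored
db.record_hit("10.0.0.5:6346", "key-2")
db.record_hit("10.0.0.9:6346", "key-1")
ranked = db.ranking()
```

A deployed system would need persistent, access-controlled storage; the counter here only stands in for the "activity counter" used for ranking.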
Monitoring will also occur independently, when queries are not initiated by dedicated undercover nodes. Suppose that an authentic suspect node of type C initiates its own suspect query by broadcasting it to its neighbors. Under most existing P2P protocols, these neighbors (type B) will in turn relay the message to their own neighbors (unless the content was found, in which case they are identified as type A), and so on. If one of the undercover nodes (call it Y) happens to be on the message broadcasting route or sub-network, it will intercept the query both on its forward route, coming from immediate neighbor X, and on its backward route (once the content has been found on another node), coming from neighboring node Z. We will thus have reconstructed a small portion of the route, given by X, Y, Z. It is hoped that during the same time window many other undercover nodes will similarly capture other parts of the route, and that from this “collective” information we can attempt to reconstruct a bigger picture of the route. This reconstruction can help in inferring the most likely contraband query initiators (or recipients, thus type C) and content distributors (thus type A). This information will in turn contribute to better estimates of the suspect rankings of the different nodes. One way to perform this “inference” is to use relational probabilistic inference or network influence propagation mechanisms, in which evidence is gradually propagated over several iterations via the links from node to node.

In addition to the above “suspect query”-based route construction and node ranking, the network topology will be periodically estimated with a query-less crawling mechanism, such as the one proposed by Ripeanu et al. [11] to map the Gnutella network. Some of the principles of trust-based ranking in P2P networks proposed by Marti and Garcia-Molina [12] can also be used.

There are other, less anonymity-preserving topologies that are easier to monitor because they use designated nodes as indices of content (such as structured overlay architectures that rely on super-peers or ultra-peers to perform most of the routing). We do not elaborate on these cases, because they are easier to handle than the pure (unstructured) P2P overlays considered above.

Last, there is the possibility of collaborative tagging of suspect nodes, where a large number of volunteers “report” suspect nodes by tagging them whenever they respond to suspect queries. The volunteers can perform this querying and recording manually (though this is tedious), or they can volunteer to install a small add-on to their P2P software.
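As a rough illustration of how the (X, Y, Z) fragments captured by different undercover nodes might be combined, the sketch below merges them into a directed graph of forward hops and lists the nodes at the open ends of the stitched routes. It is a deliberately simple stand-in, not the relational probabilistic inference or influence propagation mentioned above; the function names and node labels are assumptions.

```python
from collections import defaultdict

def stitch_fragments(fragments):
    """Merge (X, Y, Z) fragments into a graph of forward query hops."""
    forward = defaultdict(set)
    for x, y, z in fragments:
        forward[x].add(y)   # Y received the query from X
        forward[y].add(z)   # the reply reached Y from Z, so the query went Y -> Z
    return forward

def route_endpoints(forward):
    """Return (candidate initiators, candidate distributors).

    A node never observed receiving the query is a candidate initiator;
    a node never observed relaying it onward is a candidate distributor.
    With partial coverage these are only candidates: an endpoint may be
    an unobserved relay.
    """
    sources = set(forward)
    targets = {n for outs in forward.values() for n in outs}
    return sources - targets, targets - sources

# Two undercover nodes (U1, U2) each report one fragment of the same route.
fragments = [("C1", "U1", "N2"), ("N2", "U2", "A1")]
initiators, distributors = route_endpoints(stitch_fragments(fragments))
```

In this toy case the two fragments share node N2, so they chain into C1, U1, N2, U2, A1, leaving C1 and A1 as the open ends. The more fragments are collected in the same time window, the fewer false endpoints remain.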
Collaborative tagging by volunteers tests classical guardian activity in deterrence theory.

2.1.2 Flagging Contraband Content (Keyword, File Hashes, Other Patterns)

Contraband content will be recognized primarily through a unique file-hash signature drawn from three sources: (i) known contraband in an index that would be stored and maintained by law enforcement agencies, (ii) contraband that would be intercepted by law enforcement personnel as part of the dynamic monitoring process, and (iii) tags provided by a possibly large number of volunteers in a wiki-style approach. A Web interface for capturing this data and maintaining it in a secure database could be built.

2.1.3 Evaluation against different recipient querying, distribution and routing cases

Evaluation can be based on well-controlled experiments that use a simulated network topology with several ground-truth scenarios representing different recipient querying, distribution and routing cases. For each scenario there will be a ground-truth ranking of the nodes by suspiciousness level. For each application of the above querying and monitoring process, the “estimated” or inferred node rankings will be periodically compared against the known ranking of suspiciousness, and metrics inspired by information retrieval, such as coverage and precision, will be computed.

2.1.4 Closing the loop: using the evaluation to fine-tune node positioning strategy

By repeating experiments with different scenarios and parameters, the above evaluation can shed light on the suitability of the node positioning strategy, as well as on the suitability of any parameters governing the entire inference process, such as the number of nodes that are positioned. This will be formalized using an experimental design methodology based on factor analysis, examining each factor alone and in correlation with other factors. The factors will be the parameters themselves and the various activity scenario types that may be suggested by law enforcement experts.
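The comparison of inferred and ground-truth rankings in Section 2.1.3 could be scored along the following lines. This is a sketch under assumptions: "coverage" is read here as the fraction of truly suspect nodes that appear among the top k inferred nodes (recall at k), which the text does not define explicitly, and the function name and node labels are hypothetical.

```python
def precision_and_coverage(inferred_ranking, true_suspects, k):
    """Score the top-k inferred nodes against a ground-truth suspect set.

    precision = hits / number of nodes examined
    coverage  = hits / number of truly suspect nodes (assumed: recall at k)
    """
    top_k = inferred_ranking[:k]
    hits = sum(1 for node in top_k if node in true_suspects)
    precision = hits / len(top_k) if top_k else 0.0
    coverage = hits / len(true_suspects) if true_suspects else 0.0
    return precision, coverage

# Hypothetical simulated scenario: inferred order vs. known suspects.
inferred = ["n7", "n2", "n9", "n4"]
truth = {"n2", "n4", "n5"}
p_at_4, c_at_4 = precision_and_coverage(inferred, truth, k=4)
```

Repeating this computation over a grid of factor settings (for example the number K of undercover nodes and the scenario type) would give the response values needed for the experimental design described in Section 2.1.4.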


Figure 2: Undercover Probing and Monitoring Structure: an initiator node submits a suspect query, while intermediate undercover nodes record data exchange in their neighborhood, partially reconstructing a route toward the distributor. The distributor node is detected and recognized with certainty if the query initiator is an undercover node; otherwise it may be inferred with uncertainty via the partially reconstructed routes.

2.2 Interaction with legal and social aspects

There may be social science options that can aid in protocol development. We need to identify possible study protocols, if any, for analyzing behavioral characteristics related to online contraband activity. It is essential that we review and document the legal issues associated with particular computational methods of data capture and analysis. Legal limitations on particular implementations of investigative tools must be noted and addressed. This requires that we outline the general issues, domestic and transnational, that must be addressed by technical methods of data capture and analysis for peer-to-peer networks.

3 Conclusion - Implications for criminal justice policy and practice

This protocol offers a valuable schematic for the analysis of P2P networks in network forensic investigations. P2P identity and authentication in digital environments can be of exceptional value. Court cases in the United States have approved the powerful tactical use of electronic data to justify issuance of warrants to search and seize computers. See, e.g., United States v. Gourde, 440 F.3d 1065 (9th Cir. 2006).


Conversely, failure to develop powerful and reliable network forensic tools for P2P data exchanges allows the unchecked growth and expansion of illegal and dangerous online data exchanges. These, in turn, may feed the growth of primary contraband development, particularly the expanded exploitation and abuse of children.

4 References

[1] RIAA Hell: Automatic Wireless MP3 Sharing, http://www.therawfeed.com/2006/01/riaa-hell-automaticwireless-mp3.html, last visited November 5, 2007; Andy Oram, A Thought Experiment: Evading Echelon Through Peer-to-Peer, April 27, 2001, http://www.praxagora.com/andyo/wr anon_p2p_echelon.html, last visited November 6, 2007.
[2] U.S. General Accounting Office (2003). File-sharing Programs: Peer-to-Peer Networks Provide Ready Access to Child Pornography. Washington, D.C.; Testimony Before the Committee on Government Reform, House of Representatives, FILE-SHARING PROGRAMS: Child Pornography Is Readily Accessible over Peer-to-Peer Networks, Statement of GAO Director of Information Management Issues Linda Koontz.
[3] Losavio, Michael, The Law of Possession of Digital Objects: Dominion & Control Issues in Digital Forensics Investigations and Prosecutions, First International Workshop on Systematic Approaches to Digital Forensic Engineering, November, 2005, Taipei, Taiwan.
[4] Losavio, Michael, Wilson, Deborah, Elmaghraby, Adel, “Prevalence, Use and Evidentiary Issues of Digital Evidence of Cellular Telephone Consumer and Small Scale Digital Devices,” Journal of Digital Forensic Practice, December, 2006, V. 1, pp. 291-296.
[5] Yang, Helen, Garcia-Molina, Hector, Improving Search in Peer-to-Peer Systems, http://dbpubs.stanford.edu:8090/pub/2001-47
[6] Gurmeet Singh Manku, Moni Naor and Udi Wieder, Know thy Neighbor's Neighbor: The power of Lookahead in Randomized P2P Networks, http://infolab.stanford.edu/~manku/papers/04stoc-nn.pdf
[7] Shefali Sharma, Linh Thai Nguyen, Dongmei Jia, IR-Wire: A Research Tool for P2P Information Retrieval, http://www.emse.fr/OSIR06/2006-osir-p33-sharma.pdf
[8] Xinyuan Wang, Shiping Chen, Sushil Jajodia, Tracking Anonymous Peer to Peer VoIP Calls on the Internet, http://ise.gmu.edu/~xwangc/Publications/CCS05-VoIPTracking.pdf
[9] Badonnel, State, Chrisment, Festor, "A Management Platform for Tracking Cyber Predators in Peer-to-Peer Networks," icimp, p. 11, Second International Conference on Internet Monitoring and Protection (ICIMP 2007), 2007.
[10] http://p2p.weblogsinc.com/2005/05/05/baytsp-claims-it-can-track-down-original-file-sharers/
[11] M. Ripeanu, I. Foster, and A. Iamnitchi, Mapping the Gnutella network: Properties of large-scale peer-to-peer systems and implications for system design, IEEE Internet Computing, special issue on peer-to-peer networking, vol. 6(1), 2002.
[12] S. Marti and H. Garcia-Molina, Taxonomy of trust: Categorizing P2P reputation systems, Computer Networks 50:472-484, 2006.
[13] Elmaghraby, Graham, Godwin, Losavio, Wilson, "Challenge--Construction of an Adequate Digital Forensics Testbed," SADFE, pp. 147-149, Second International Workshop on Systematic Approaches to Digital Forensic Engineering (SADFE'07), 2007.
[14] Wall, D. S. (2005). The Internet as a conduit for criminal activity. In A. Pattavina (Ed.), Information technology and the criminal justice system (pp. 78-94). Thousand Oaks, CA: Sage.
[15] “Working Between Disciplines – Issues in Building the Digital Forensics Bridge From Computer Science to Judicial Science,” Digital Forensics Research Workshop 2006, Panel Co-chairs: Michael Losavio and Marcus Rogers; Members: Deborah Wilson, Adel Elmaghraby, S. Srinivasan, David Elder. August, 2006, Purdue University, West Lafayette, Indiana, U.S.A.
