A New Approach for Peer-to-Peer Distributed Computation

Alex Movsessian
Student ID: 111171

Supervised By: Dr. Khaled Nagaty
Faculty of Informatics and Computer Science
Department of Computer Science

Submitted to the Faculty of Informatics and Computer Science, The British University in Egypt, in partial fulfillment of the requirements for the Degree of Bachelor of Science
June, 2013

Alex Movsessian

Computer Science

BUE


Abstract

Distributed computing is considered one of the challenging problems in computer science. The entry barriers are significant because of the inherent complexity such systems have to manage. This system offers a new approach to the problems of distributed computing: a multi-tier structure that is accessible yet powerful enough for creating distributed parallel applications with a minimal learning curve and debugging effort, coupled with inherent safety and a modularity that allows for further expansion of features.

Submitted to
Prof. Ahmed Hamad, Dean of the Faculty
Faculty Committee
Dr Khalid Nagaty, Dissertation Advisor


Acknowledgments

I would like to seize this occasion to acknowledge the help, vision and trust that my graduation project advisor, Dr. Khalid Nagaty, offered me. His support facilitated the process and provided considerable motivation, along with the right amount of catalytic pressure, throughout the various project stages. I would also like to thank Professor Ahmed Hamad for the insight and knowledge he shared over the past years. Furthermore, I thank my competent colleagues, whose cooperation and intuition over the past few years have been both advantageous and inspiring.


Table of Contents

Table of Contents
List of Figures
List of Graphs
List of Tables
Chapter 1 - Introduction
  1.1 Motivation
  1.2 Objective
  1.3 Project Management
    1.3.1 Management Methodology
    1.3.2 Tabular Representation
    1.3.3 Gantt Chart Representation
  1.4 Licensing
Chapter 2 - Review of Literature
  2.1 Computer Networking
    2.1.1 OSI Model
    2.1.2 The IP Protocol
    2.1.3 The TCP Protocol
    2.1.4 Client-Server Web Structure
    2.1.5 Peer-to-Peer Networks
  2.2 Distributed Systems
    2.2.1 DHT
    2.2.2 Kademlia
    2.2.3 Distributed File Systems
  2.3 Security
    2.3.1 Symmetric Key Encryption
    2.3.2 AES Algorithm
    2.3.3 Public Key Encryption
    2.3.4 RSA
    2.3.5 Key Exchange
    2.3.4 Hashing Algorithms
    2.3.5 SHA 512
  2.4 Web Services
    2.4.1 HTTP Protocol
    2.4.2 Software as a Service (SaaS)
  2.5 Used Tools
    2.5.1 Python
    2.5.2 The C# Programming Language
    2.5.3 Model View View-Model (MVVM)
Chapter 3 - Related Works
  3.1 Google Map Reduce
  3.2 BitTorrent Sync
  3.3 Skype
  3.4 BitCoin
Chapter 4 - Contribution
  4.1 TCP Connection
  4.2 Authentication
  4.3 Message Exchange
  4.4 Proxification (Tunneling)
  4.5 DHT
  4.6 Web Server
  4.7 Distributed File System
    4.7.1 File Identifiers
    4.7.2 Network Functionality
    4.7.2 Local File Management
  4.8 Omega Python
  4.9 Attack Vectors
Chapter 5 - Experiments and Test Results
  5.1 N-Way Merge
  5.2 Load Balancing for Link Crawler
  5.3 Experimental Data Analysis
    5.3.1 Data Analysis Tools
    5.3.2 Experimental Data
  5.4 Comparison to Other Peer-to-Peer Systems
  5.5 Optimality
Chapter 6 - Conclusion and Future Works
  6.1 Conclusion
  6.2 Future Works
    6.2.1 Distributed SQL-like Database
    6.2.2 Live Streaming
    6.2.3 Native Algorithmic Libraries
    6.2.4 Ad-hoc Application Ports
    6.2.5 Dynamic Cloud Storage
    6.2.6 Monetisation
Appendix A - Project Schedule Gantt Chart
Appendix B - Project Classes UML Diagram
Appendix C - Performance Analysis for Launching Parallel Application
Appendix D - Handshake Process Detailed Log
Appendix E - Creative Commons License Legal Code
Glossary of Key Terms
References
Index
Acronyms


List of Figures

Fig. 1 Relationship between System Components
Fig. 2 The RAD Model (Leffingwel, 2007)
Fig. 3 Creative Commons License Summary (Creative Commons, 2013)
Fig. 4 The OSI Model (Mintzberg, 2010)
Fig. 5 IP Packet Structure (Cisco, 2012)
Fig. 6 TCP Packet Structure (Cisco, 2012)
Fig. 7 Client Server Model
Fig. 8 Illustration of the Bit Torrent Network (Siganos, 2009)
Fig. 9 The Tor Network Logo (The Tor Project, 2012)
Fig. 10 Illustration of the Tor Network (McCoy, 2008)
Fig. 11 SETI@Home Network Logo (SETI@Home Labs, 2013)
Fig. 12 BitTorrent Logo (BitTorrent Labs, 2013)
Fig. 13 Kademlia Binary Tree Example (Maymounkov, 2002)
Fig. 14 Kademlia Lookup Process (Maymounkov, 2002)
Fig. 15 AES (Paar, 2010)
Fig. 16 RSA Key Pair Generation (Holmes, 2012)
Fig. 17 A hash function used in a phonebook to map names to phone numbers (Rogaway, 2004)
Fig. 18 SHA-512 (Grembowski, 2002)
Fig. 19 Basic operation of the HTTP protocol for a client to retrieve a page from a server (Budgen, 2003)
Fig. 20 HTTP Message Structure (Berners-Lee, 1999)
Fig. 21 Python Logo (Python Software Foundation, 2012)
Fig. 22 C# Logo (Microsoft, 2013)
Fig. 23 Illustration of the MVVM Design Pattern (Microsoft, 2008)
Fig. 24 WPF Architecture (Freeman, 2010)
Fig. 25 Illustration for MapReduce Basic Operations (Yang, 2007)
Fig. 27 BitTorrent Sync User Interface for Distributed File Management (BitTorrent Labs, 2013)
Fig. 26 BitTorrent Sync Logo (BitTorrent Labs, 2013)
Fig. 29 Illustration for the Skype P2P network structure (Dean, 2012)
Fig. 28 Skype Logo (Microsoft, 2013)
Fig. 30 Bitcoin Logo (Nakamoto, 2013)
Fig. 31 Illustration for the BitCoin operation. The cloud represents the P2P network and the DHT (Nakamoto, 2013)
Fig. 32 The value of BitCoin versus US dollar indicating its recently growing potential (Maurer, 2013)
Fig. 33 System Diagram
Fig. 34 RSA Key Exchange Automaton
Fig. 35 Envelope Data Structure Used for Transmitting the Messages
Fig. 36 The Proxification Process
Fig. 37 Proxification Example
Fig. 38 The usage of nodes as multiple proxies
Fig. 39 DHT Buckets Structure
Fig. 40 Operation of the local web server
Fig. 41 Hierarchical Data Structure Format
Fig. 42 Sample Parallel Search Query
Fig. 43 Flat table representation of the file structure
Fig. 44 FileSystemWatcher Operation
Fig. 45 FileSystemWatcher Sample Demo
Fig. 46 N-Way Merge
Fig. 47 Web Crawler Load Balancing Operation
Fig. 48 Wireshark Logo (Wireshark, 2013)
Fig. 49 WPE Pro Main Window Screenshot
Fig. 50 Visual Studio Logo (Microsoft, 2012)
Fig. 51 Illustration of TraceRoute Operation (Cisco, 2012)
Fig. 52 No-IP Logo (No-IP, 2012)
Fig. 53 Screenshot of the Web Crawler Load Balancer Application
Fig. 54 Link Loading Response Time Analysis
Fig. 55 Web Scrapper Queue
Fig. 56 Web Scrapper Finished
Fig. 57 Web Scrapper Result Links
Fig. 58 Packet Dump 1
Fig. 59 Packets after handshake
Fig. 60 BubbleSort Pseudocode
Fig. 61 Merging Algorithm Pseudocode
Fig. 62 Example of In-App Advertisement in uTorrent
Fig. 63 Project Schedule Gantt Chart - Part 1
Fig. 64 Project Schedule Gantt Chart - Part 2
Fig. 65 Project Schedule Gantt Chart - Part 3
Fig. 66 Project Classes UML Diagram
Fig. 67 Handshake Log Part 1
Fig. 68 Handshake Log Part 2
Fig. 69 Handshake Log Part 3
Fig. 70 Handshake Log Part 4
Fig. 71 Handshake Log Part 5


List of Graphs

Graph 1 Node Uptime Probability (Maymounkov, 2002)
Graph 2 Average Network Latency
Graph 3 BitTorrent Speed Analysis
Graph 4 Comparison Between Two Approaches of Sorting


List of Tables

Table 1 Project Schedule
Table 2 Summary of AES Algorithm Features (Paar, 2010)
Table 3 HTTP Status Codes (Berners-Lee, 1999)
Table 4 Python Standard Library Modules (Python Software Foundation, 2012)
Table 5 System Components
Table 6 States in RSA Key Exchange
Table 7 Omega Python Primitives
Table 8 Attack Vectors Summary
Table 9 Network Latency
Table 10 Communication Monitor PC Specifications
Table 11 Gathered Experimental Elements
Table 12 Peer-to-Peer Networks Comparison
Table 13 Summary Comparison with Related Works


Chapter 1 - Introduction


1. Introduction

Peer-to-peer networks are increasingly regarded as a major breakthrough in computing. Throughout the last decade, which witnessed a radical increase in the number of interconnected computers, various peer-to-peer platforms were created. However, the majority of these platforms focused on file sharing and anonymity. Very few managed to harness the aggregate computing power of their peers, and those that did worked on an ad-hoc, application-oriented basis rather than in a generic distributed computing manner. This paper presents a new kind of peer-to-peer system structure that addresses the demand for generic, application-independent peer-to-peer computation.

1.1 Motivation

The growing number of users connected to the Internet over the last decade has opened up new frontiers for innovation that were unthinkable only a few years ago. Every day, over 1 billion Internet users create over 2.5 quintillion bytes of data (IBM, 2013). The demand for computing power and data handling has been increasing exponentially for years, and it is not expected to slow down any time soon. This demand has created a growing need to harness as many computers as possible, as efficiently as possible, to cope with the required processing tasks. It has led to the creation of several science-fiction-like parallel computing systems in organisations such as Google, Microsoft, Amazon and various security entities. However, most of the development in parallel and distributed computing has remained almost exclusive to large organisations capable of purchasing and running hundreds of thousands of machines on their premises.

Almost all of the earlier Internet-based, public-resource distributed solutions were built on an ad-hoc basis: a very specific instance of a problem was solved by a group of “peers”, typically using their idle time, with a master command-and-control server handling the data from all of the peers. The quintessential example of such efforts is the SETI@Home project for analysing data about the potential of extraterrestrial life (Anderson, 2002).

There have been very few attempts at large-scale distributed computation using arbitrary public computers connected through the Internet in a general manner similar to that of file sharing, whereby any group of computers can cooperate and share a file relatively easily and securely, with minimal requirements in terms of background knowledge and of setting up the necessary coordination and control infrastructure.

A general-purpose, easy-to-use system for performing distributed computation over the Internet would open up various opportunities for productivity and innovation. Such opportunities could be exploited by a much wider group of users, who could begin with minimal knowledge of distributed computing and still create useful distributed parallel computing solutions for complex problems. The target problems range from moderately large data processing tasks all the way up to advanced data mining, artificial intelligence and image processing algorithms.

The project discussed in this paper tackles this issue and presents a solution in the form of an integrated distributed system structure whose components work together to facilitate the creation of distributed applications on general-purpose Internet-connected machines.


1.2 Objective

The project’s main objective is to provide the components required for a distributed system, integrated and usable together in an easy manner even for novice programmers.

In addition, the project plans to use state-of-the-art, industry-standard security measures in all of its components, to provide the maximum possible level of security and privacy to the users participating in the system. This objective must not be confused with anonymity. The project’s emphasis on security comes from the need to prevent any abuse of the system that might expose users’ machines to security exploits. That is somewhat different from providing anonymity for reasons such as free speech. Anonymity is provided as a side effect of some of the built-in security measures; however, it is not emphasised as a primary goal, unlike in similar projects whose sole goal is to provide anonymous Internet access, and maintaining it is ultimately the responsibility of the users.

The ultimate goal of the project is lowering the entry barrier for distributed computation, making it possible for novice programmers to experiment with and tackle such algorithms and design patterns in a friendly and intuitive manner.

The proposed system is a distributed computing model with several aspects aimed at facilitating distributed file sharing, distributed communication and distributed parallel processing. The primary aspects of the system include:

• Base communication over the Internet using the Transmission Control Protocol (TCP), among networked computers reachable through a publicly announced IP address.

• A “Proxification” layer for establishing “tunnels” to communicate with nodes which do not have a publicly accessible IP address or would like to remain anonymous.

• A unique-identifier-based system for identifying and locating various entities on the network (nodes and stored resources).

• A cryptographic protocol for encrypting all communications end-to-end for all peers participating in the network.

• A Distributed Hash Table (DHT) for locating and storing nodes and resources in the system.

• A distributed file system based on the aforementioned DHT layer for locating files on the network using unique paths.

• An authentication layer operating on top of the distributed file system layer for ensuring the authenticity of returned files based on the previously known identity of their owners.

• An internal HTTP Web Server for serving content from the network to the local client machine through a standards-compliant HTTP web browser or a generic user agent.

• A custom Python-based programming language for facilitating parallel distributed programming, providing a set of powerful and simple primitives for making use of the network infrastructure as an abstract entity.
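As a rough illustration of how the unique-identifier and DHT aspects listed above could fit together, the following minimal sketch maps node addresses and resource paths into one identifier space and picks the node nearest to a resource. It is not the system's actual scheme, which this section does not specify. The use of SHA-512 and of a Kademlia-style XOR distance are assumptions borrowed from the topics covered in the literature review, and the names `make_id`, `xor_distance` and `closest_node`, the addresses and the path are all hypothetical.

```python
import hashlib


def make_id(name):
    """Hypothetical helper: hash a node address or resource path to a 512-bit integer."""
    digest = hashlib.sha512(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def xor_distance(a, b):
    """Kademlia-style distance between two identifiers (an assumption, see lead-in)."""
    return a ^ b


def closest_node(resource_path, node_addresses):
    """Return the node whose identifier is nearest to the resource's identifier."""
    resource_id = make_id(resource_path)
    return min(node_addresses, key=lambda addr: xor_distance(make_id(addr), resource_id))


# Illustrative values only: documentation-range addresses and a made-up path.
nodes = ["198.51.100.7:9000", "203.0.113.20:9000", "192.0.2.55:9000"]
owner = closest_node("/alice/reports/results.csv", nodes)
print(owner)
```

Placing nodes and resources in the same identifier space means that any peer can work out which node should hold a resource without consulting a central server, which is the property a DHT-based design relies on.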


Fig. 1 shows a diagrammatic representation for the relationship among the various parts of the system.

[Figure: stack of system layers, in the order listed in the diagram: Python + HTTP; Distributed Hash Table (DHT); Cryptography (RSA + AES); Proxification; TCP Connection]

Fig. 1 Relationship between System Components


1.3 Project Management

This section discusses the higher-level aspects of managing the software project. It presents the selected software development methodology, together with a tabular representation and a visual Gantt chart representation of the schedule.

1.3.1 Management Methodology

Given the research-centric nature of the project, the requirements and features were expected to evolve throughout its lifetime in response to the results discovered during the research phase and the testing and experimental phase. The Rapid Application Development (RAD) methodology was therefore deemed the most suitable. The RAD model emphasises interleaving project planning with programming the software, either through the creation of fully functional modules or of prototypes used as proofs of concept for potential features (Larsen, 2008). In addition, RAD encourages writing test cases proactively. This is crucial for the system being developed because of the large and complex interaction between its components: they are added at different stages, and a new component should not change previous behaviour or cause unintended changes in the interactions and relationships among the components already built (Roman, 2005). The RAD model has been adopted by various software organisations, and its increasing usage was the primary reason behind several features added to modern Integrated Development Environments (IDEs) such as Visual Studio to facilitate and accelerate the RAD process (Hejlsberg, 2007). Fig. 2 shows the RAD model (Leffingwel, 2007).


Fig. 2 The RAD Model (Leffingwel, 2007)

1.3.2 Tabular Representation

Table 1 Project Schedule

Time Period: September 2012 - November 2012
Tasks:
• Implementation of the server side HTTP Server with support for:
  o Displaying WebPages
  o Displaying Regular files
  o Handling Cookies
  o Handling Forms (GET and PUT requests)
• Initial research about network structure and cryptography and authentication mechanisms

Time Period: December 2012 - January 2013
Tasks:
• Implementation of the network portion:
  o Data and command transfer mechanisms.
  o Authentication, cryptography and other security measures of the protocol.
  o Testing of the network portion of the project on a sample network of 10-20 computers.

Time Period: February - March 2013
Tasks:
  o Implementation of the dynamic programming language.
  o Thesis Writing.

Time Period: April 2013
Tasks:
  o Testing the program on medium-sized networks
  o Building sample dynamic websites using the language.
  o Building sample multi-peer applications using the language: For example, a chat application.
  o Building sample distributed applications based on the programming language [e.g. Merge sort, Chat application, Controlling of multiple client computers simultaneously]

Time Period: May 2013
Tasks:
  o Continuation of testing
  o Thesis Writing


1.3.3 Gantt Chart Representation

A Gantt chart representation of the major milestones is included in Appendix A.

1.4 Licensing

Considering the nature of the produced work, it was concluded that an open source release would fit best, as it would allow maximum collaboration among interested developers in creating related works based on the system, maintaining it, and adding new features and customisations as they please. Putting the work in the public domain, lifting all copyright restrictions, was one option. However, commercial entities could then add features to the system and sell them for profit in unrelated products, earning significant amounts of money without giving back to the open source community and without attributing the author or the community members. To prevent such inappropriate usage, the Creative Commons ShareAlike license was chosen. The license allows use on the condition that source code added to the system is shared back, thus ensuring that the system remains free and open source. The full legal code of the license is provided in Appendix E. Fig. 3 shows a summary of the ShareAlike license agreement (Creative Commons, 2012).


Fig. 3 Creative Commons License Summary (Creative Commons, 2013)


Chapter 2 - Review of Literature


2. Review of Literature

2.1 Computer Networking

2.1.1 OSI Model

The Open Systems Interconnection (OSI) model is a conceptual model representing the abstract layers which form the current communication model over the Internet (Stallings, 1987). Fig. 4 shows the seven layers of the OSI model (Mintzberg, 2010).

Fig. 4 The OSI Model (Mintzberg, 2010)

The Physical layer is concerned with the actual physical infrastructure, such as hubs, repeaters and fibre optic cables. The Data Link layer is concerned with the transmission of data between directly connected network entities. The Network layer is responsible for transmitting a data packet from a host on one network to a host on another network (this is where the IP protocol operates). The Transport layer is concerned with delivering data to the applications at the layers above it (this is where the TCP protocol operates). The layers above the Transport layer deal directly with individual user applications. Some of the upper layers may not be used directly.

2.1.2 The IP Protocol

The IP Protocol is the main protocol used to identify networked nodes, whether in a Local Area Network (LAN) or on the Internet. Its main function is routing messages from one location to another given a destination address (Forouzan, 2002). The most widely used version is currently IPv4, which has a 32-bit address space (about 4.3 billion addresses). It is gradually being replaced by the more recent IPv6, which has a 128-bit address space, to satisfy the increasing demand for connected nodes on the Internet (Deering, 1998). Fig. 5 shows the structure of an IP packet (Cisco, 2012).

Fig. 5 IP Packet Structure (Cisco, 2012)
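The difference between the two address spaces can be made concrete with a few lines of Python. This is only an illustrative sketch: the two addresses are taken from the ranges reserved for documentation and are not part of the project.

```python
import ipaddress

# Number of distinct addresses offered by each address space.
ipv4_space = 2 ** 32
ipv6_space = 2 ** 128

# The standard-library ipaddress module parses both versions.
# Both addresses below are hypothetical examples from documentation ranges.
v4 = ipaddress.ip_address("192.0.2.1")
v6 = ipaddress.ip_address("2001:db8::1")

print(v4.version, v4.max_prefixlen, ipv4_space)
print(v6.version, v6.max_prefixlen, ipv6_space)
```

The `max_prefixlen` attribute reports the address width in bits (32 and 128 respectively), which is where the two space sizes come from.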


2.1.3 The TCP Protocol

The TCP Protocol serves, together with IP, as one of the backbone protocols of the Internet. It transmits messages in ordered sequences and guarantees reliable, error-checked delivery between nodes, assuming no malicious interference (Forouzan, 2002). The primary drawback of TCP is that it was not designed with adversaries in mind (Bellovin, 1989). Thus, when TCP is used alone, a third party can intercept the traffic between two connected parties and listen to it or alter it without any clear method of detection. Fig. 6 shows the structure of a TCP packet (Cisco, 2012).

Fig. 6 TCP Packet Structure (Cisco, 2012)

2.1.4 Client-Server Web Structure

The traditional web, introduced by Tim Berners-Lee in 1990, became the basis for hosting the majority of websites on the Internet to this day. The fundamental intuition behind the web is that it consists of three main entities: clients, servers and service providers (Berners-Lee, 2001).

Clients: The users interested in visiting hosted websites.

Servers: The computers which host the content, and which the clients connect to and obtain their data from.

Service providers: Entities such as Internet Service Providers (ISPs), domain name servers and proxy servers.

Fig.7 shows a diagram of the traditional web model.

Fig. 7 Client Server Model (diagram: Clients 1, 2 and 3 each reach the server through their own ISP)


The major drawbacks of the traditional web are: 

DDoS (Distributed Denial of Service) attacks: These exploit a major flaw in the architecture of the web, namely that the bandwidth a server can offer each client is inversely proportional to the number of clients connecting to it. Put simply, the more clients demand content from a server, the more stressed that server becomes, and eventually, if the attack is strong enough, it no longer has sufficient bandwidth to deliver any content. Even without DDoS attacks specifically, paying for bandwidth is the main cost factor for any major website with a lot of traffic (Hills, 2006).



Vulnerability to censorship by the service providers: For example governments blocking websites for political reasons (Harwit, 2001).



Vulnerability to the abuse of legal frameworks of the Internet: For example, a website’s domain name could be seized by the country operating it (e.g. the United States has the right to seize any .com domain name) (Cukier, 2005).



Vulnerability to spoofing by the service providers: Whereby an ISP or a malicious hacker might redirect the traffic coming to a website to another one, in order to steal information (Beverly, 2009).



Vulnerability to interception: For example, governments recording chat and email conversations of citizens (Sinclair, 2002).

Those drawbacks, coupled with other factors, led to the proliferation of Peer-to-Peer networks, which have been gaining significant momentum since the late 1990s.


2.1.5 Peer-to-Peer Networks

Peer-to-Peer networks are an alternative model for establishing connections between a group of computers interested in communicating and exchanging data or resources. The model does not make use of a centralized server; rather, it relies on the peers themselves to act as both clients and servers within the system (Bellovin, 2001). The first generation of P2P networks included the infamous Napster and Kazaa. The main feature of this generation was a centralized group of servers used for locating peers and files on the network (Bellovin, 2001). That led to the following problems:

The central servers were able to monitor the entire network, which led to several potential privacy concerns.



As the networks grew in terms of number of peers and files, scaling up the coordination and search indexing servers was becoming a critical issue, and in case of downtime of a critical number of the main command and control servers, the entire network would effectively be shut down.



It was relatively easy to block access to such networks, either by filing copyright lawsuits based on the claim that they facilitate sharing copyrighted files, or by merely blocking the domain name entries of the centralized servers from a particular network (Yang, 2008).



Most implementations were based on downloading each file from a single peer, as one piece, much like the traditional server implementation (Pouwelse, 2005).


Thus, despite the hype surrounding P2P in its infancy, the first generation networks did not provide a sufficient solution to the problems faced by the traditional web.

The second generation of P2P networks began gaining momentum in the mid 2000s in response to the problems of the first generation, in particular the shutting down of Napster. Examples include Bit Torrent and Gnutella. This is the current generation of P2P networks in operation around the world.

The major traditional web problems which the second generation P2P networks solved: 

Resilience against DDoS attacks: There are no longer centralized servers for searching hosted files or coordinating peers. Each peer in the network contributes its bandwidth to the swarm of peers requesting a file, even before it has fully downloaded the file. Thus, the more requests there are for a file, the more bandwidth becomes available for it (Garbacki, 2005).



Splitting files into multiple pieces, also known as chunks: A chunk is a part of a file, identified by a start offset and a number of bytes. This facilitates downloading a file from multiple peers simultaneously, where each peer contributes one or more chunks of the file (Epema, 2005).
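The chunking idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scheme of any particular network: the function names `split_into_chunks` and `reassemble` and the chunk size of 4 bytes are hypothetical and chosen only to keep the example small.

```python
def split_into_chunks(total_size, chunk_size):
    """Return (offset, length) pairs covering a file of total_size bytes."""
    chunks = []
    offset = 0
    while offset < total_size:
        length = min(chunk_size, total_size - offset)  # last chunk may be short
        chunks.append((offset, length))
        offset += length
    return chunks


def reassemble(pieces):
    """Join chunk payloads, keyed by start offset, back into the original bytes."""
    return b"".join(pieces[offset] for offset in sorted(pieces))


data = bytes(range(10))                      # stand-in for a file's content
layout = split_into_chunks(len(data), 4)     # chunk size is an arbitrary choice

# Pretend each chunk arrived from a different peer, in arbitrary order.
received = {off: data[off:off + n] for off, n in reversed(layout)}
restored = reassemble(received)
```

Because every chunk carries its own offset, the order in which peers deliver the chunks does not matter; the downloader only needs all offsets to be present before reassembling.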

The major drawbacks for the second generation of P2P networks are: 

Lack of encryption in most default implementations: This did not help in solving the censorship issue, as many ISPs and governments collaborated in installing traffic analyzers which could determine the content being downloaded and link it back to the users who downloaded it. In recent years there has been an increasing number of lawsuits against individuals who downloaded content (e.g. movies) from P2P networks such as Bit Torrent (Hatehet, 2010).

Lack of a centralized way to search the network for files, due to the lack of centralized servers: This drawback did not exist in the first generation, but it had to be accepted in the second generation so that the network would not be an easy target to take down (Tian, 2006).



Dependence on “tracker servers”: Each file hosted on Bit Torrent or similar networks is tracked by one or more servers, whose role is to help a new peer find other peers that have the file so that it can download it from them. If all the tracker servers for a file go down, it is no longer possible to locate and download the file (Wu, 2006).

Fig. 8 shows an illustration of the Bit Torrent network (Siganos, 2009).

Fig. 8 Illustration of the Bit Torrent Network (Siganos, 2009)


In addition to P2P networks for sharing files, other P2P networks have been introduced in recent years for the purpose of providing anonymity for servers. One particularly popular example is the Tor network. Fig. 9 shows the logo of the Tor network, which has become quite famous in recent years through various news outlets (The Tor Project, 2012).

Fig. 9 The Tor Network Logo (The Tor Project, 2012)

The Tor network, which was introduced in 2004 (and other similar networks such as Freenet), provides a method for web servers to host content without being directly observable. Servers are identified and authenticated by their public key, which acts as a pseudo domain name. The basic intuition is that one does not connect directly to the server hosting the content, but instead goes through a series of at least three other nodes in the network before reaching the server, with a public key based authentication mechanism in place to verify that one is connecting to the intended server at the far end of the network (McCoy, 2008). However, the Tor network (and similar ones like Freenet) does not provide the ability to host files within the network itself, as all hosted websites reside on their own individual servers, as is the case in the traditional web. Thus, while the Tor network may provide anonymity for servers, it still has most of the drawbacks of traditional web servers:



DDoS: Tor network servers are still vulnerable to DDoS attacks, just like the regular web. In fact, DDoS attacks on Tor-hosted servers have the additional major problem of slowing down the entire Tor network significantly, as they slow down not just the server hosting the data but all the intermediate nodes of each connection made to the server.



Anonymity: Despite the connection being passed through several nodes, anonymity of the data hosted on Darknet networks such as Tor can still be compromised via two primary methods:



Mass-network sniffing: Whereby an entity capable of dedicating a sufficient number of computer resources could be monitoring and analyzing the communication patterns of a large number of nodes (connected peers) in the Tor network, and by various means of statistical correlation analysis obtain several clues about where the actual servers are hosted.



Server software exploits: Whereby an expert in server software or a hacker finds an exploit in the server software that reveals the server’s actual IP address, allowing it to be taken down.

Fig. 10 shows an illustration of the functional structure of the Tor network (McCoy, 2008).


Fig. 10 Illustration of the Tor Network (McCoy, 2008)

Thus, the state of P2P networks is currently a dichotomy. At one end of the spectrum, networks such as Bit Torrent offer to host files only; at the other end, networks such as Tor offer anonymity for existing servers, which allows hosting full websites on regular servers. Neither provides true security for the content. In P2P file sharing networks such as Bit Torrent, encryption is typically non-existent or optional at best. In Darknets like the Tor network, the true location of the server could still be exposed, and even if the location of the servers remains anonymous, they are still vulnerable to DDoS attacks, just like the traditional web. It is also worth mentioning that besides those two major P2P architectures, which are currently used by millions around the world, there exist other fragmented “Ad-Hoc” P2P networks for the specific purpose of utilizing the computing power of the peers, such as SETI@Home, which aims to gather enough computing power for analyzing mass data and signals in search of signs of intelligence in outer space (Anderson, 2002). Fig. 11 shows the SETI@Home project logo (SETI@Home Labs, 2013).

Ad-hoc networks such as SETI@Home have yet another major drawback. When a new distributed computing task becomes available, programmers usually create a new P2P application from scratch to support it. This leaves them with the challenging task of finding users to join the network and distributing their application to them, in addition to having to make sure that the networking and communication code in the new program is correct and works as intended.

Fig. 11 SETI@Home Network Logo (SETI@Home Labs, 2013)


2.2 Distributed Systems

2.2.1 DHT

Distributed Hash Tables (DHTs) are a form of distributed systems which offer the basic functionality of a hash table, that is: 

Storing key value pairs (k, v).



Looking up a value given its key.

DHTs are intended to be formed by an arbitrary number of nodes connected together in a peer-to-peer network, without any centralized servers or any form of “master” nodes coordinating the participating nodes in the system (Gaurang, 2013). Reliability wise, DHTs are designed to be as resilient as possible, allowing nodes to join and leave freely without causing any significant disruption to the rest of the nodes or the stored values (Naor, 2003). Prior to the invention of DHTs, lookups were still possible; however, they relied heavily on recursive algorithms, which were quite inefficient as they required traversing the entire network in a recursive fashion. This slowed down file lookups in general and opened the door to misuse and to spreading malware by forging the returned results (Kalafut, 2006). With the growth of the Internet, in particular of high-bandwidth always-on computers, DHTs gained significant momentum and found several applications in which hundreds of thousands and up to tens of millions of peers are connected simultaneously. DHTs mostly act as a “middle layer” for such applications, providing a part of their major functionality that could not be achieved otherwise. Applications using DHTs currently include:



Bit Torrent: The popular file sharing system, which uses DHTs to avoid having centralized “tracker” servers which could be taken down by governments or by the sheer amount of overwhelming traffic for popular content. In the case of Bit Torrent, the DHT maps from the hash code of a particular file (the key) onto another file that holds more detailed information about the torrent structure, which includes sub-files, each with its hash and block size (Cohen, 2008). Fig. 12 shows the BitTorrent logo (BitTorrent Labs, 2013).

Fig. 12 BitTorrent Logo (BitTorrent Labs, 2013)

BitCoin: The increasingly popular online virtual currency makes use of a DHT to keep track of all financial transactions made among all peers, to prevent race conditions which could lead to unintended behaviour such as double spending, that is, spending the same currency token twice (Babaioff, 2012).



Freenet: An anonymous peer-to-peer network makes use of DHTs for storing the information about files stored on the network, mapping from their names to their hash codes and actual content (Clarke, 2012).



Botnets: Such as the Kelihos and Storm botnets, which make use of DHTs to coordinate attacking targets without having a “master” command and control server that could be disrupted by law enforcement agencies (Ortloff, 2011).

DHTs have several parameters, some of which include: 

Key size (address space): Assuming the keys are numerical (which could be generalized to any other type), the key size determines the number of different values a key can take. Typically powers of two are used. A key size of 128 bits means 2^128 different key values are possible.



Lookup method: This is how the lookups are performed for a particular key. Methods include either iterative (where one node iterates through the required nodes to find the required value) or recursive. Most DHTs prefer an iterative method for looking up, as it puts less stress over the network.



Stored objects type: Primitive DHTs store numbers or single strings; however, various ones allow the storage of arbitrary objects. The stored object type depends less on the DHT structure than on the actual implementation details.



Redundancy: For improving reliability, redundancy is usually a critical parameter for DHTs, whereby a single key-value pair is stored at multiple nodes to prevent the loss of data in case a node goes offline, and to accelerate the lookup process for popular values.
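The interplay of key size and redundancy can be illustrated with a toy, single-process model. This is a sketch under several assumptions that are not taken from any particular DHT: the class name `ToyDHT`, the 16-bit key size, the use of XOR to decide which nodes are "closest" to a key, and the global view of all nodes (a real DHT node only knows a small part of the network).

```python
import hashlib


def key_id(key, bits=16):
    """Map a string key into a small numeric key space (16 bits, assumed)."""
    digest = hashlib.sha512(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (512 - bits)


class ToyDHT:
    """In-memory stand-in for a DHT; every node is just a local dictionary."""

    def __init__(self, node_ids, replicas=2):
        self.replicas = replicas
        self.nodes = {nid: {} for nid in node_ids}

    def _closest(self, kid):
        # Closeness measured with XOR purely for illustration.
        return sorted(self.nodes, key=lambda nid: nid ^ kid)[: self.replicas]

    def put(self, key, value):
        # Redundancy: the pair is stored on several nodes, not just one.
        for nid in self._closest(key_id(key)):
            self.nodes[nid][key] = value

    def get(self, key):
        for nid in self._closest(key_id(key)):
            if key in self.nodes[nid]:
                return self.nodes[nid][key]
        return None

    def remove_node(self, nid):
        # Simulates a node going offline together with its stored pairs.
        del self.nodes[nid]


dht = ToyDHT([0x1111, 0x5555, 0x9999, 0xDDDD], replicas=2)
dht.put("song.mp3", "peer-A")
primary = dht._closest(key_id("song.mp3"))[0]
dht.remove_node(primary)             # the closest node disappears
survivor = dht.get("song.mp3")       # still answered by the second replica
```

With `replicas=1` the same sequence would lose the value, which is the data-loss scenario the redundancy parameter is meant to prevent.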


2.2.2 Kademlia

One of the major DHT structures widely used on the Internet is the Kademlia structure. Kademlia was designed by Petar Maymounkov and David Mazières in 2002. It is currently used in various file sharing networks such as the Kad network, Bit Torrent network and the Bit Coin network. In Kademlia, each node and each file is supposed to have a unique numerical identifier. In their paper, Maymounkov and Mazières did not put restrictions on which values should be used, as long as they were uniformly random (Maymounkov, 2002). Kademlia introduces the notion of a “distance metric”, which is used to measure how close two nodes are to each other. The idea of a distance metric was used in other DHTs as well, such as an older DHT system called “Chord”. The primary motivation behind it is the intuition that if one wants to find a value, one asks the closest peer one knows about in the system; if that peer does not hold the value, it returns the list of closest peers it knows about, which one then asks in turn, in an iterative process that continues until the value is either found or the search space is exhausted (Maymounkov, 2002). The distance metric in Kademlia is the Exclusive Or (XOR) of two IDs, interpreted as an integer. The advantage is that XOR satisfies the generic distance properties, namely:

Self Distance is always equal to zero: XOR (X, X) = 0



Symmetry: XOR(X, Y) = XOR(Y, X)



Triangular Inequality: XOR(X, Z) ≤ XOR(X, Y) + XOR(Y, Z). The sum of any two sides is always greater than or equal to the third side.

In its lookup process, Kademlia makes no distinction between looking up peers and looking up keys, as both share the same address structure. A lookup function takes an ID and a flag indicating whether values are being looked up. When a node receives a lookup request, it examines the closest nodes it knows about relative to the requested ID. The primary difference is that when looking up values, a node returns the value if it holds it; otherwise it returns the list of IDs of the closest nodes it knows about. The process continues as long as the newly retrieved nodes are closer to the requested ID than the previous ones, which means advancing by at least a single bit at a time. The primary advantage of this design is that all lookups are O(log n), where n is the number of nodes in the network (Maymounkov, 2002). To illustrate the advantage of such a speedy lookup process, the logarithmic bound means, for example, roughly 24 lookup steps if the network has 10 million simultaneously connected nodes (log2 of 10 million is about 23.3) and no more than 32 if the network has 4 billion simultaneously connected nodes. Fig. 13 shows a Kademlia binary tree example (Maymounkov, 2002).


Fig. 13 Kademlia Binary Tree Example (Maymounkov, 2002)
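The distance properties of XOR and the logarithmic lookup bound discussed in this section can be checked numerically. The following is a small sketch, assuming 4-bit identifiers for the exhaustive check and using the hypothetical helper names `xor_distance` and `max_hops`; it is not part of the Kademlia specification.

```python
import math
from itertools import product


def xor_distance(a, b):
    """Kademlia-style distance: the XOR of two IDs read as an integer."""
    return a ^ b


# Exhaustively check the metric properties over all 4-bit IDs.
ids = range(16)
identity = all(xor_distance(x, x) == 0 for x in ids)
symmetry = all(
    xor_distance(x, y) == xor_distance(y, x)
    for x, y in product(ids, repeat=2)
)
triangle = all(
    xor_distance(x, z) <= xor_distance(x, y) + xor_distance(y, z)
    for x, y, z in product(ids, repeat=3)
)


def max_hops(node_count):
    """Upper bound on lookup steps if each step fixes at least one more bit."""
    return math.ceil(math.log2(node_count))


print(identity, symmetry, triangle)
print(max_hops(10_000_000), max_hops(4_000_000_000))
```

The bound grows by only one step each time the network doubles in size, which is why a lookup stays cheap even for networks with billions of nodes.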


Fig. 14 shows the Kademlia lookup process illustrated by an example (Maymounkov, 2002).

Fig. 14 Kademlia Lookup Process (Maymounkov, 2002)

For practical purposes, the original Kademlia paper suggested that in networks with a large address space (for example 160 bits), the lookup be divided into “chunks” of the address. That is, instead of comparing all 160 bits individually, where the bits following the first few have almost zero probability of producing a match, every i-th bit is compared instead, which allows for a more efficient structure for storing the known nodes.

For reliability purposes, the original design paper of Kademlia makes the observation that the longer a node has stayed online so far (its uptime), the more likely it is to remain online in the future. Graph. 1 shows the measured node uptime probability (Maymounkov, 2002).

Graph. 1 Node Uptime Probability (Maymounkov, 2002)

That observation is used when picking which nodes to keep as “favourite” nodes if the distance is the same, or as a general heuristic rule to optimize the lookup process. Implementation wise, the original design paper suggests using UDP (User Datagram Protocol). However, in almost all recent implementations UDP has been dropped in favour of TCP for increased reliability.


2.2.3 Distributed File Systems

A Distributed File System (also known as Clustered File System or Parallel File System) is a system for serving files, enabling accessing them from a variety of computers simultaneously (Baker, 1991). They are either centrally managed (such as the Google File System, or Microsoft’s Distributed File System) or decentralized (either fully or partially by means of several “master” nodes monitoring the file system). Distributed File Systems have the following criteria (Howard, 1988): 

Avoiding Single Point of Failure: Due to the large number of connected nodes, failure of any number of nodes is to be expected and should be considered in the design process.



Performance: Overhead for data exchange should be kept minimal to avoid having exponential performance drawbacks as the number of connected nodes and stored files increase.



Concurrency: Mechanisms should be set to avoid having collisions between multiple versions of files issued by various parties.



Security: Measures should be put in place to prevent a single node or a small number of misbehaving nodes from abusing the system and causing irrecoverable damage.


2.3 Security

2.3.1 Symmetric Key Encryption

A symmetric key encryption algorithm is one that, given an input (called “plaintext”) and a key, outputs what is called “ciphertext”, produced by deterministic transformations of the plaintext based on the key. The corresponding decryption algorithm performs the reverse process, taking a ciphertext and transforming it back into the original plaintext by making use of the same key that was used for encryption (hence the “symmetric key” terminology) (Bellare, 1997). Two major types of symmetric-key encryption methods exist:

Stream Ciphers: Which encrypt a stream one character (or token) at a time. Those have the advantage of being “online”, as in encrypting the data on the fly, with a minimal overhead and time (Shujun, 2001).



Block Ciphers: Which encrypt the data one block at a time, padding the plaintext whenever needed to match the specified size of the block parameter for the algorithm. They do make use of various “rounds”, whereby in each round a group of permutations is performed on the data from the previous round based on a new key derived from the previous round permutations on the original key (Bellare, 2000).

Both types of ciphers produce an output of essentially the same size as the input (a block cipher adds at most one block of padding), so the space overhead is minimal. Runtime requirements are also minimal, as the operations were designed to be fast, and various processors even have built-in special purpose instructions for block cipher algorithms.
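The padding step mentioned for block ciphers can be sketched as follows. The text does not name a padding scheme, so this example assumes the widely used PKCS#7 scheme and a 16-byte (128-bit) block; the function names `pad` and `unpad` are hypothetical. No encryption is performed here, only the size adjustment that precedes it.

```python
BLOCK_SIZE = 16  # bytes, i.e. a 128-bit block (assumed for illustration)


def pad(data, block_size=BLOCK_SIZE):
    """PKCS#7: append n bytes, each of value n, to reach a full block."""
    n = block_size - len(data) % block_size   # always between 1 and block_size
    return data + bytes([n]) * n


def unpad(padded):
    """Strip PKCS#7 padding, rejecting malformed input."""
    n = padded[-1]
    if n < 1 or n > len(padded) or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]


message = b"hello"
padded = pad(message)
print(len(message), len(padded), unpad(padded) == message)
```

Note that an input which already fills whole blocks still receives one full block of padding, so that the receiver can always tell padding apart from data; this is the only size overhead the scheme introduces.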


The Data Encryption Standard (DES) used to be the dominant block cipher until the late 1990s, when advances in computing power rendered it obsolete. The Advanced Encryption Standard (AES) then emerged in 2000 as the winner of a competition held by the U.S. National Institute of Standards and Technology (NIST) to select the next generation cipher for use in protecting top-secret documents (Lipmaa, 2000).


2.3.2 AES Algorithm

Table 2 shows a summary of the AES algorithm features.

Table 2 Summary of AES Algorithm Features (Paar, 2010)

Block size    128 bits (fixed)
Key sizes     128, 192 or 256 bits
Rounds        10, 12 or 14 (varies according to the key size)

No critical attack methods have been developed against AES and it has been in use by the US government starting from 2001 for all data including “top secret” classified ones. The popularity of AES drove various software manufacturers to support it for encrypting the traffic of their applications. Various software vendors included AES libraries in their development tools (such as the .Net AES library and the Java AES library). Hardware manufacturers such as Intel embedded AES instruction sets in their high-end processors as well for speeding up the encryption and decryption process. A throughput of 700 MB per second per thread has been claimed on the Intel i7 chipset, which was credited largely to the parallel enhancements which were possible for AES (Benadjila, 2009). Fig. 15 shows the basic operations of the AES algorithm at a high level (Paar, 2010).


Fig. 15 AES (Paar, 2010)


2.3.3 Public Key Encryption

Public Key Encryption, also known as “asymmetric key encryption”, makes use of two distinct keys which are created together. Either key could be designated as the public key, with the other serving as the private key. A message M encrypted by one of the keys can be decrypted by the other (Cramer, 2003). The primary advantage of public key encryption is authenticity: because one of the keys is never shared, it can be used either for encryption (ensuring that only the party who knows the secret key could have encrypted a certain piece of data) or for decryption (ensuring that only a single party is able to read a certain message). The former is referred to as a “digital signature”. Disadvantages of public key encryption include:

Relatively slower speed for encryption and decryption.



Not supporting messages larger than the key size, which causes the need for relatively sophisticated splitting for the messages.



Adding potential overhead to the encrypted messages if they are less than the key size (Barrett, 1987).

To combat the aforementioned disadvantages, public key encryption is usually used together with symmetric key encryption: public key encryption provides the authenticity that is lacking in symmetric encryption, while symmetric encryption provides the speed and low overhead needed for high-speed communications (Harkins, 1998).


2.3.4 RSA

The RSA algorithm is the most widely known and used design for public key encryption. It was first described in 1977 by Ron Rivest, Adi Shamir and Leonard Adleman. The RSA consists of generating a key pair first. Fig.16 shows an algorithm for performing that process (Holmes, 2012).

Fig. 16 RSA Key Pair Generation (Holmes, 2012)

The encryption and decryption are illustrated through the following equations, where m is the plaintext, c is the ciphertext, (e, n) is the public key and (d, n) is the private key:

c = m^e mod n

m = c^d mod n
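A toy numeric instance of textbook RSA may help to make the key generation, encryption and decryption steps concrete. This is a sketch only: the primes 61 and 53, the public exponent 17 and the message 65 are arbitrary small values chosen for readability, and real deployments use far larger primes together with a padding scheme.

```python
# Key generation with deliberately tiny primes (illustration only).
p, q = 61, 53
n = p * q                      # modulus shared by both keys
phi = (p - 1) * (q - 1)        # Euler's totient of n
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent: inverse of e mod phi (Python 3.8+)

# Encryption with the public key (e, n), decryption with the private key (d, n).
m = 65                         # plaintext, must be smaller than n
c = pow(m, e, n)               # ciphertext
recovered = pow(c, d, n)       # equals the original plaintext

print(n, d, c, recovered)
```

The three-argument form of `pow` performs modular exponentiation without ever building the huge intermediate power, which is what makes the operation practical for realistic key sizes.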

Early implementations of the RSA algorithm suffered from various issues which were eventually resolved, some of which include:

Timing attacks: Given the time at which a message was generated, and knowing that the random number generator used for generating the primes had initialized its seed from that time, it was possible for a third party to reconstruct the key pair. This issue has been resolved by using parameters other than the time for initializing the random number generator, hence the invention of “cryptographic random number generators” (Jun, 1999).



Enumerable RSA keys: Various faulty RSA key generators would generate only a limited number of distinct keys (in the range of a few million), all of which could eventually be enumerated and tested. This issue has been resolved by the invention of more secure RSA key generators (Kelsey, 1998).



Small Key Sizes: Using enough computing power (dozens or hundreds of machines) which is currently obtainable through services which rent processing time such as Amazon EC2 or Microsoft Windows Azure, it is possible to have enough computing power to crack RSA keys of smaller size such as 256 or even 512 bits. Such an issue led to the recommendation of using key sizes of at least 1024 bits, with 2048 becoming an industry standard (Courtois, 2007).


2.3.5 Key Exchange

As previously mentioned, symmetric key encryption algorithms such as AES are quite efficient and fast, yet they do lack the authenticity needed for them to operate on an adversarial open medium such as the Internet. Using AES for encryption is acceptable on the Internet assuming the two parties could first exchange the required encryption key in a safe manner, without any potential adversaries able to intercept it. The RSA key exchange between parties A and B is as follows (on the assumption that party A is the one which would initiate the exchange): 

“A” generates a random number to be used once and only once (called a nonce, as in “number used once”).



“A” encrypts the nonce with B’s public key. Only B could decrypt the nonce.



“A” sends the encrypted nonce to B.



“B” receives the nonce, decrypts it, concatenates a nonce of its own encrypted with “A”’s public key, and sends the new message to “A”.



By this point, “A” would have authenticated “B”, and thus it would send back “B”’s nonce after decrypting it, together with an AES key to use for encrypting the exchanged messages for the rest of the session.



Receiving its nonce, “B” would have authenticated “A” as well, and the secure transmission could begin, with all messages encrypted using the AES cryptographic algorithm.
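The exchange described in the steps above can be walked through with toy numbers. This is a hedged sketch rather than the project's implementation: it uses textbook RSA with tiny primes, the helper names `make_keypair`, `rsa_apply` and `fresh_number` are hypothetical, the session key is a small random integer standing in for a real AES key, and it assumes that "B" returns A's nonce in the clear and that "A" sends the session key encrypted with B's public key, two details the description leaves open.

```python
import secrets


def make_keypair(p, q, e=17):
    """Textbook RSA key pair from two small primes (toy sizes, no padding)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)          # modular inverse, Python 3.8+
    return (e, n), (d, n)        # (public key, private key)


def rsa_apply(key, value):
    """Encrypt or decrypt an integer smaller than the modulus."""
    exponent, modulus = key
    return pow(value, exponent, modulus)


def fresh_number():
    """Random value small enough for both toy moduli used below."""
    return secrets.randbelow(3000) + 2


a_pub, a_priv = make_keypair(61, 53)   # modulus 3233
b_pub, b_priv = make_keypair(89, 97)   # modulus 8633

# Steps 1-3: A creates a nonce and sends it encrypted with B's public key.
nonce_a = fresh_number()
msg1 = rsa_apply(b_pub, nonce_a)

# Step 4: B decrypts A's nonce and returns it together with its own nonce,
# the latter encrypted with A's public key.
nonce_b = fresh_number()
msg2 = (rsa_apply(b_priv, msg1), rsa_apply(a_pub, nonce_b))

# Step 5: A sees its own nonce again, so only B can have answered.
a_authenticated_b = msg2[0] == nonce_a
session_key = fresh_number()           # stand-in for a real AES key
msg3 = (rsa_apply(a_priv, msg2[1]), rsa_apply(b_pub, session_key))

# Step 6: B sees its own nonce again and recovers the session key.
b_authenticated_a = msg3[0] == nonce_b
session_key_at_b = rsa_apply(b_priv, msg3[1])
```

After the last step both sides hold the same session key and each has proof that the other owns the private key matching its public key, which is the property symmetric encryption alone cannot provide.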


2.3.6 Hashing Algorithms

A hashing algorithm is a function that takes an input from a universal key space and maps it to a smaller key space, generating the hash value. Hash codes are used for a variety of purposes such as authentication of data and integrity checking. Hash functions which are used for critical data are referred to as “cryptographic hash functions”, which exhibit various properties to resist potential cryptanalytic attacks (Preneel, 1994). Cryptographic hash functions have the following features:

One-way Function: It is infeasible given a hash code to generate the original value from which it was hashed.



Collision Resistance: It is infeasible to find two values for which their hash code is the same.

Fig. 17 shows a hash function used in a phonebook to map names to phone numbers (Rogaway, 2004).

Fig. 17 A hash function used in a phonebook to map names to phone numbers (Rogaway, 2004)

2.3.5 SHA 512

A particular family of hashing algorithms is the Secure Hash Algorithm (SHA). In its 512-bit variation, it produces hashes of 512 bits in size for a given byte array of input (Grembowski, 2002). In 2001 it was approved by the US National Institute of Standards and Technology (NIST) for use in government data processing, and it has since become the industry standard, with supporting libraries in modern programming frameworks such as .Net and Java. The primary security aspect of SHA is that, for inputs of reasonable sizes, the probability of two different inputs mapping to the same output is negligible. That is partly due to its output space being larger than that of previous hash algorithms (such as CRC32), and partly due to the mathematical properties specified in its design to ensure a sufficient level of complexity.

SHA divides the input into blocks and makes use of a cryptographic compression function to reduce each block to a fixed size, given by the number of bits in the SHA variant (512 bits for SHA-512). Fig. 18 shows the operation of the SHA-512 algorithm (Grembowski, 2002).

Fig. 18 SHA-512 (Grembowski, 2002)
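As a brief illustration of the properties described in this section, the sketch below uses Python's standard `hashlib` module. The helper name `sha512_hex` and the sample inputs are chosen purely for this example. It shows the fixed 512-bit output, the sensitivity of the digest to a one-character change, and that feeding the input piece by piece gives the same result as hashing it in one call.

```python
import hashlib


def sha512_hex(data: bytes) -> str:
    """Return the SHA-512 digest of data as a hexadecimal string."""
    return hashlib.sha512(data).hexdigest()


first = sha512_hex(b"peer-to-peer")
second = sha512_hex(b"peer-to-peeR")  # differs from the first input by one character

# The digest is always 512 bits long, i.e. 128 hexadecimal characters.
digest_bits = len(first) * 4

# The same digest can be built up incrementally from chunks of the input.
incremental = hashlib.sha512()
incremental.update(b"peer-")
incremental.update(b"to-peer")
chunked = incremental.hexdigest()
```

Here `first` and `second` are unrelated-looking strings despite the near-identical inputs, while `chunked` equals `first`.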

2.4 Web Services

2.4.1 HTTP Protocol

The Hypertext Transfer Protocol (HTTP) is the most commonly used protocol on the web for transmitting web pages. It was designed in 1991 by Sir Tim Berners-Lee to satisfy the need of serving web pages to a variety of different computers which had no common protocol for exchanging messages before. HTTP is a stateless protocol, which does not maintain memory about any previous messages exchanged during a communication session. The primary operations it supports are “Request”, with a parameter indicating the requested resource (for example a web page path), and “Response”, which carries the reply to a previously sent request message (Fielding, 1999). HTTP is a client-server protocol, whereby the client entity connects to a server and sends its messages, and the server replies to the client. In HTTP, a client is known as a “user agent”. The most widely adopted HTTP standard is HTTP/1.1, which was released in 1999. Fig. 19 shows the basic operation of the HTTP protocol for a client to retrieve a page from a server (Budgen, 2003).

Fig. 19 Basic operation of the HTTP protocol for a client to retrieve a page from a server (Budgen, 2003)

HTTP supports a number of status codes to indicate certain flags (Berners-Lee, 1996). Table 3 shows the different HTTP status codes and their meanings.

Table 3 HTTP Status Codes (Berners-Lee, 1999)

Status Code | Semantic Meaning
200 | No errors, request satisfied successfully
301 | Web page moved permanently
305 | A proxy is required
403 | Access denied
500 | Internal Server Error
404 | Page not found

In addition, HTTP supports the notion of “cookies”, which are small chunks of text (typically not more than 4 kilobytes) which, once sent by the server, are stored locally by the browser and sent back to the server with every subsequent request (Bristol, 2001). Fig. 20 shows the structure of HTTP header messages (Berners-Lee, 1999).

Fig. 20 HTTP Message Structure (Berners-Lee, 1999)

Various HTTP clients exist, such as Internet Explorer, Mozilla Firefox and Google Chrome. HTTP servers exist as well, either through independent applications such as Microsoft’s Internet Information Services (IIS) or natively in operating systems, such as the “HTTPServer” Windows APIs.

2.4.2 Software as a Service (SaaS)

Software as a Service (SaaS) is a growing industry worth 12 billion US dollars a year. The primary aspect of SaaS revolves around offering applications which make use of significant and expensive computation or networking power (Buxmann, 2008). Such applications are accessible through relatively affordable “thin clients”, a term that refers to computing equipment with the minimal capabilities of an interface to a larger system running in the background (Buxmann, 2008). SaaS has existed in various forms since the dawn of the modern computing era in the 1960s, when mainframes had “terminals” accessible by various researchers and other computer users, who could make use of the powerful mainframe resources with the terminals providing the input/output mechanism (Turner, 2003). More recently, the primary focus of SaaS has shifted to “cloud” platforms. “Cloud” platforms nowadays refer to services offered by major corporations, such as Amazon’s EC2 and Microsoft’s Windows Azure (Mell, 2011). Those corporations make use of their ability to purchase large amounts of computing machinery and house it in what are called “server farms”. A typical server farm would have 10,000-100,000 machines with storage in excess of petabytes and a combined processing power equivalent to that of a supercomputer (Gandhi, 2010). With the computing infrastructure in place, such corporations can quite effortlessly lease portions of it for small individual requests at a minimal cost, as the bulk of the cost is distributed evenly among millions of potential users.

The primary criticism of such services revolves around the lack of clear privacy regulations, whereby governments or certain other entities such as advertisers might have access to customers’ data, as all of the data is encrypted in ways known and accessible to the vendor. Another critical point around such cloud services is the alleged price-fixing, through various agreements between service providers, that prevents the services from becoming as cheap as they could be (Connor, 2007). In addition, various cloud service providers impose limits on the maximum duration, power or bandwidth that can be used by an individual at a time. Such limits prevent certain services from expanding beyond a certain point, after which their operators are forced to come up with their own solutions at a significantly higher investment cost (Cooper, 2012).

2.5 Used Tools

2.5.1 Python

Fig. 21 shows the logo of the Python programming language (Python Software Foundation, 2012).

Fig. 21 Python Logo (Python Software Foundation, 2012)

The Python programming language was designed by Guido van Rossum and first appeared in 1991. It is a general purpose, high-level, object oriented language with emphasis on code readability and simplicity (Van-Rossum, 1994). Python encompasses various programming paradigms such as imperative, object oriented and functional. Python makes use of dynamic typing, whereby the data types are implicit and are resolved at runtime. Every value in Python, including simple values such as integers, is an object. Python features automatic garbage collection for objects in a manner similar to the Java programming language. Python and its reference implementation (which is called CPython) are free and open source, and the major development decisions are made through a committee headed by the language’s creator, Guido van Rossum. Aside from having dynamic typing, the statements and control flow of Python are quite similar to those of other general purpose languages such as C++ or Java, with minor syntactical differences such as the use of whitespace for delimiting blocks rather than braces or bracketing keywords (Chun, 2006). Python was originally conceived as a scripting language, one that is used for creating short ad hoc scripts for small applications; however, it later evolved and became used for creating widely used large applications, mainly due to its ease of use and its large library, which provides a significant amount of ready-to-use functionality at minimal effort. Python currently has two major release tracks being developed in parallel, the 2.7.x track and the 3.x track. Both are quite similar, yet they exhibit several syntactic differences which make them not directly compatible with each other (Chun, 2010). Table 4 includes examples of the Python standard built-in library modules (Lund, 2001).

Table 4 Python Standard Library Modules (Python Software Foundation, 2012)

Module Name | Functionality
zlib | Compression compatible with gzip
zipfile | Work with ZIP archives
csv | CSV File Reading and Writing
hashlib | Secure hashes and message digests
sha | SHA algorithm
os | Miscellaneous operating system interfaces
io | Core tools for working with streams
time | Time access and conversions
thread | Multiple threads of control
mmap | Memory-mapped file support
email | An email and MIME handling package
uu | Encode and decode uuencode files
HTMLParser | Simple HTML and XHTML parser
xml.dom | The Document Object Model API
webbrowser | Convenient Web-browser controller
BaseHTTPServer | Basic HTTP server

2.5.2 The C# Programming Language

Fig. 22 shows the logo of the C# programming language (Microsoft, 2013).

Fig. 22 C# Logo (Microsoft, 2013)

The C# programming language was designed by Microsoft through a development team led by Anders Hejlsberg. It was first released in 2000, together with the first version of the .Net framework. It was created to be multi-paradigm, encompassing the lessons learnt from developing previous programming languages and providing a powerful standard library with a powerful editor and debugger (Microsoft Visual Studio) to meet the increasing demands of modern application development (Williams, 2002). C# initially supported static typing exclusively; however, dynamic typing has been added in recent versions. C# programs are converted into an intermediate language created by Microsoft called the CIL (Common Intermediate Language). Programs are executed using the Common Language Runtime (CLR), which is responsible for handling the various library functionality and performing the necessary operating system abstractions (Kokholm, 2006). C# has various features suitable for modern programming tasks (Gannod, 2008), such as: 

Safe software engineering practices: It supports automatic garbage collection, bounds checking for arrays and detection of the use of uninitialized variables.

Modular design support: The language encourages modularity through supporting several important constructs such as classes and namespaces.



Internationalisation: C# strings are all Unicode, allowing seamless support for various languages without any altering in the implementation of a particular software program.



Portability: C# programs are capable of executing on a variety of platforms and operating systems, ranging from traditional desktop and laptop computers all the way through embedded systems and microcontrollers.

In addition, C# introduced various new constructs which were not common for general purpose languages (Shildt, 2008), such as: 

Lambda functions: Which are anonymous functions, defined and called “inline” (with no need to be previously declared elsewhere).

Properties: This is syntactic sugar over traditional getters and setters, allowing for cleaner-looking code to be produced.



LINQ: Which is a query language similar to SQL, allows performing sophisticated querying operations on aggregate data structures (such as arrays or dictionaries) or resources (such as external databases).

2.5.3 Model View View-Model (MVVM)

Model View View-Model is a design pattern conceived by Microsoft as a more-modern alternative to their existing Graphical User Interface (GUI) building and designing mechanism that was called Windows Forms (Microsoft, 2006). The MVVM pattern evolved from the need to separate the program’s logic and user data from the user interface. Older UI designing patterns such as Windows Forms allowed the proliferation of various programming malpractices, the main one of which was using the GUI as a backing store for data, thus complicating the design process, raising the potential for bugs and increasing the difficulty of debugging and porting applications to other platforms (Smith, 2009). Fig. 23 is an illustration of the MVVM Design pattern, and the interaction among its various components (Microsoft, 2008).

Fig. 23 Illustration of the MVVM Design Pattern (Microsoft, 2008)

As Fig. 23 shows, the principal idea in MVVM is the separation of the user interface from the user data, with the interaction between the two kept independent from the main program logic that is not related to the UI.

In Microsoft’s implementation, MVVM was accomplished through the creation of the Windows Presentation Foundation (WPF) subsystem in the .Net framework. WPF represents each element in the UI as a separate object that could be used either independently on its own (for example a button), or to group other objects (for example a Window). Fig. 24 shows the structural system block diagram of WPF (Freeman, 2010).

Fig. 24 WPF Architecture (Freeman, 2010)

WPF objects can be initialised programmatically through code. However, such a process tends to be quite tedious and time consuming, and produces code that is difficult to maintain. To combat such an issue, Microsoft allowed the creation of the UI objects through a newly devised XML-based language called XAML, which stands for eXtensible Application Markup Language (Louridas, 2007). XAML allows both defining and initialising the UI elements, and performing data binding with the backend data and event handling program logic. Editing XAML files is quite an accessible task through the use of various editors such as Microsoft’s Visual Studio or Microsoft’s Expression Blend. Such flexible editing capabilities allow the creation of portable and dynamic Graphical User Interfaces (GUIs) without the need to explicitly design and compile various files prior to displaying the GUI and binding it with the backend data and program logic.

Chapter 3 - Related Works

3. Related Works

3.1 Google Map Reduce

Google’s Map Reduce is the parallel and distributed computing engine that is used by Google to index the World Wide Web and to handle most of its complicated search and algorithmic queries, such as graph algorithms on large data sets (Dean, 2008).

Fig. 25 shows the operation of a sample Map Reduce query (Yang, 2007).

Fig. 25 Illustration for MapReduce Basic Operations (Yang, 2007)

Map Reduce is optimised specifically for large-scale data centres with guaranteed mutual trust and an absence of adversarial parties. The primary issue with Map Reduce is that it is designed to be managed in a fully centralized manner, offering only a small set of functional programming primitives which are not straightforward to use for implementing general purpose algorithms (Yang, 2007).
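To make the shape of these primitives concrete, the following single-machine sketch counts words in the Map Reduce style. It is an illustration only, not Google's implementation: the function names are hypothetical, and the map, shuffle and reduce phases run sequentially in one process rather than across a data centre.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each group of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle(pairs))
```

For example, `word_count(["a b a", "b c"])` yields a count of 2 for `a`, 2 for `b` and 1 for `c`. Any algorithm run on such an engine has to be restated as a map step and a reduce step of this kind, which is the restriction discussed above.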

3.2 BitTorrent Sync

BitTorrent Sync is a project that makes use of the BitTorrent protocol to create a distributed file system. Fig. 26 shows the BitTorrent Sync logo. It accomplishes its objective by using the protocol to provide shared folders with encrypted data stored on a group of interconnected peers (Bharambe, 2012). The primary concerns about BitTorrent Sync include the lack of sufficient security measures, in addition to several issues linked to the original implementation of BitTorrent, such as issues regarding NAT traversal and Proxification (Rosenberg, 2010). Fig. 27 shows the BitTorrent Sync User Interface for Distributed File Management (BitTorrent Labs, 2013).

Fig. 26 BitTorrent Sync Logo (BitTorrent Labs, 2013)

Fig. 27 BitTorrent Sync User Interface for Distributed File Management (BitTorrent Labs, 2013)

3.3 Skype

Prior to Microsoft’s acquisition of Skype in 2011, it was operated almost exclusively as a peer-to-peer network, with central servers used primarily for authentication purposes. Fig. 28 shows the Skype logo (Microsoft, 2013). The service provided a good example of the degree of stability and reliability which is possible to achieve through peer-to-peer networks in real time applications such as voice and text chatting and exchanging files. A notable feature of Skype was the use of NAT traversal through the use of proxy nodes, in a manner somewhat similar to the “Proxification” to be used in the system discussed in this paper (Baset, 2006). Fig. 29 shows the Skype network structure (Dean, 2012).

Fig. 28 Skype Logo (Microsoft, 2013)

Fig. 29 Illustration for the Skype P2P network structure (Dean, 2012)

3.4 BitCoin

Released in 2008, BitCoin is another peer-to-peer application making use of a DHT to facilitate trading virtual currencies. Fig. 30 shows the BitCoin logo. Throughout recent years, BitCoin has proved the potential of large scale peer-to-peer networks with DHTs in quite rigorous and resource intensive tasks such as the ones required for maintaining the BitCoin currency (Nakamoto, 2008). Fig. 31 shows the BitCoin operation (Nakamoto, 2013).

Fig. 30 Bitcoin Logo (Nakamoto, 2013)

Fig. 31 Illustration for the BitCoin operation. The cloud represents the P2P network and the DHT (Nakamoto, 2013)

The rise in BitCoin’s value has been staggering over the past few years. Fig. 32 shows the value of BitCoin versus the US dollar, indicating its recently growing potential (Maurer, 2013).

Fig. 32 The value of BitCoin versus US dollar indicating its recently growing potential (Maurer, 2013)

Chapter 4 - Contribution

4. Contribution

In this section, the various aspects of the system previously mentioned in the structural diagram in chapter 1 shall be discussed one by one, together with their relation to each other and notable technical details whenever appropriate.

In order to execute the required distributed computation tasks, the system components are classified into four major layers interacting with each other. Table 5 shows the main components of the system.

Table 5 System Components

Component | Function
Communication Layer | Responsible for transmitting the data among the different peers throughout the selected communication medium (either the Internet or a Local Area Network). Its components include: TCP Connection; Proxification.
Cryptography Layer | Responsible for ensuring the security and authenticity of the transmitted data among the peers on the network, in both an end-to-end and a global manner. Its components include: Key exchange procedure; Message exchange module.
Peer-to-Peer Coordination | Responsible for synchronizing the overall structure of the relationship among the peers, facilitating the operations of finding other peers and data on the network. Its components include: Distributed Hash Table (DHT) based on the Kademlia structure; Distributed Hierarchical File System.
Computation Layer | Responsible for providing an abstract interface to the network to facilitate programming. Its components include: Web Server; Omega Python.

Fig. 33 shows the interaction of the various system components.

Fig. 33 System Diagram

4.1 TCP Connection

The first thing a node has to do when it launches is to establish a TCP connection with another node on the network. The program makes use of the .Net NetworkStream and TcpListener classes for the actual process of creating the TCP communication channel. This connection establishment process can be made in one of two ways: 

Bootstrapping: In this case the node is supposed to know the IP address, listening port and unique identifier of another node in the network. Given this information, the program starts the communication and authentication sequence (which is explained in the following section).

Passive Listening: In this case, the node listens for incoming connections on a specific port until at least one node connects to it, thus linking it with the rest of the network.

Following a successful connection, a “Peer” object is created. This is a custom object that holds the NetworkStream object required for data transmission with that particular node, together with the necessary information such as the peer’s public identifier. Note that in either case the program makes use of independent threads for the previously mentioned processes, to allow for maximum use of the machine resources. This matters in particular because the processes are by nature IO-bound rather than CPU-bound, as most of the time is spent waiting for messages to be sent and for replies to come back from the second party on the communication channel. The messages sent through the communication channel are in the form of “Message” objects. “Message” is a custom object created to make sending commands easier. Message objects act as basic hash tables, holding key-value pairs, where the key is a string and the value is a serializable object.
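The relationship between the Peer and Message objects can be sketched in Python as follows. This is a rough analogue rather than the system's C# code: the class and method names mirror the description above, but JSON serialization, the 4-byte length prefix and the use of `socket.socketpair()` in place of a real TCP connection are all assumptions made so that the example is self-contained.

```python
import json
import socket
import struct
from dataclasses import dataclass, field


@dataclass
class Message:
    """A small hash table of string keys mapped to serializable values."""
    fields: dict = field(default_factory=dict)

    def to_bytes(self) -> bytes:
        return json.dumps(self.fields).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "Message":
        return cls(json.loads(raw.decode("utf-8")))


@dataclass
class Peer:
    """Holds the stream used to talk to one remote node, plus its identifier."""
    identifier: str
    stream: socket.socket

    def send(self, message: Message) -> None:
        payload = message.to_bytes()
        # Prefix each message with its length so the receiver knows where it ends.
        self.stream.sendall(struct.pack("!I", len(payload)) + payload)

    def receive(self) -> Message:
        (length,) = struct.unpack("!I", self._read_exact(4))
        return Message.from_bytes(self._read_exact(length))

    def _read_exact(self, count: int) -> bytes:
        data = b""
        while len(data) < count:
            chunk = self.stream.recv(count - len(data))
            if not chunk:
                raise ConnectionError("peer closed the connection")
            data += chunk
        return data


def demo():
    # A connected socket pair stands in for an established TCP connection.
    left, right = socket.socketpair()
    with left, right:
        sender = Peer("peer-a", left)
        receiver = Peer("peer-b", right)
        sender.send(Message({"command": "ping", "ttl": 3}))
        return receiver.receive().fields
```

Calling `demo()` returns the dictionary that was sent, showing a Message surviving a round trip through the stream held by a Peer.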

4.2 Authentication

After the TCP connection is established, an authentication sequence is started, based on the RSA key exchange that was discussed in an earlier chapter. The process can be modelled as an automaton. The party which established the connection is referred to as Bob and the party receiving the connection is referred to as Alice. Fig. 34 depicts the states of the communication process automaton.

Fig. 34 RSA Key Exchange Automaton

Table 6 explains the automaton states and transitions.

Table 6 States in RSA Key Exchange

State | Explanation
Q0 | Bob sent a nonce to Alice encrypted with Alice’s public key.
Q1 | Bob received back the nonce he sent to Alice, which means he has authenticated Alice. In addition, he received a nonce from Alice encrypted with his public key. He sends Alice her nonce back, encrypted with her public key.
Q2 | Alice authenticated Bob and sent him the AES key. Communication can now be established.
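The automaton can be written down as a small transition table. The sketch below is one possible encoding, not the system's own code: the `START` state and the event names are hypothetical labels introduced for this example, while Q0, Q1 and Q2 follow the states described above.

```python
from enum import Enum


class State(Enum):
    START = "start"  # hypothetical state: connection open, nothing sent yet
    Q0 = "Q0"
    Q1 = "Q1"
    Q2 = "Q2"


# (current state, event) -> next state; the event names are illustrative only.
TRANSITIONS = {
    (State.START, "bob_sends_nonce"): State.Q0,
    (State.Q0, "bob_receives_nonce_back"): State.Q1,
    (State.Q1, "alice_sends_aes_key"): State.Q2,
}


def run(events):
    """Feed a sequence of events to the automaton and return the final state."""
    state = State.START
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} is not allowed in state {state.name}")
        state = TRANSITIONS[key]
    return state
```

Feeding the three events in order ends in `State.Q2`, while any event arriving out of order raises an error, which is the point at which an implementation would presumably drop the connection.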

4.3 Message Exchange

Following the communication and basic AES key exchange, a secure channel is established between the two parties. The data is encrypted using AES with a distinct initialization vector per message. The advantage of having a distinct initialization vector is that each message, even if it is identical to a previously sent one and encrypted with the same key, has a different cipher text. This helps to prevent correlation attacks that make use of data collected over long periods of time. The Message objects are serialized, then encrypted and encapsulated into an “Envelope” object. Envelope objects are created for the sole purpose of holding the encrypted Message objects together with the AES initialization vector of each message. Fig. 35 shows the Envelope data structure used for transmitting the messages.

AES 256-bits Encrypted Message Object

AES Initialisation Vector (Unique Per Envelope)

Fig. 35 Envelope Data Structure Used for Transmitting the Messages

Note: A detailed UML structure for this class and the other classes used throughout the system is provided in Appendix B.
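The effect of a per-message initialization vector can be demonstrated with a short sketch. An important caveat: Python's standard library has no AES implementation, so the cipher below is a stand-in keystream built from SHA-256 and is NOT AES and not suitable for real use. The names `seal` and `open_envelope` and the JSON serialization are likewise assumptions of this example. Only the Envelope layout (cipher text plus a fresh IV) follows the description above.

```python
import hashlib
import json
import os
from dataclasses import dataclass


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Stand-in keystream (NOT AES): hash of key, IV and a running counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


@dataclass
class Envelope:
    iv: bytes          # unique per envelope
    ciphertext: bytes  # the encrypted, serialized message


def seal(message: dict, key: bytes) -> Envelope:
    iv = os.urandom(16)  # fresh random IV for every message
    plaintext = json.dumps(message, sort_keys=True).encode("utf-8")
    return Envelope(iv, _xor(plaintext, _keystream(key, iv, len(plaintext))))


def open_envelope(envelope: Envelope, key: bytes) -> dict:
    stream = _keystream(key, envelope.iv, len(envelope.ciphertext))
    return json.loads(_xor(envelope.ciphertext, stream).decode("utf-8"))
```

Sealing the same message twice under the same key produces two envelopes whose cipher texts differ, because each carries its own IV, yet both open to the original message. That is the property the section relies on to resist correlation of repeated messages.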

4.4 Proxification (Tunneling)

A major aspect of the created system is tunnelling support. The tunnelling problem is that the basic TCP connection and authentication process discussed in the previous sections works seamlessly only when both parties know the IP address of each other and are directly accessible to each other. That is not the case in several situations, where one party or both might be interested in keeping their IP addresses anonymous, or might not be able to establish a direct TCP connection to each other due to the presence of a firewall or some other blocking mechanism at one end or both. To combat such an issue, the system supports “Proxification”. The purpose of this is that each node, assuming it is connected to some other node, should be able to use that node as a proxy to communicate with any other node in the network, including nodes which use the same Proxification technique. Fig. 36 is a sample diagram modelling the Proxification process.

[Figure labels: Node 1; Proxy 1; Proxy 2; Node 2]

Fig. 36 The Proxification Process

Internally, “proxified” peers are represented by a “ProxifiedPeer” object which inherits from the “Peer” object. The difference between the “Peer” and “ProxifiedPeer” objects is that ProxifiedPeer also stores an instance of the proxy used to communicate with that peer, and instead of sending messages directly to a NetworkStream object, it sends the messages to the proxy peer, which sends them on to the receiving peer. On the receiving end the messages are stored in a queue and are popped out as soon as they are available by an asynchronous pulling mechanism.
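The inheritance relationship and the relaying behaviour can be sketched as below. This is an in-memory illustration under several assumptions: the `Node` class, the `transmit` and `deliver` methods and the `"to"`/`"body"` message fields are hypothetical, and a direct method call stands in for writing to a NetworkStream. The chain mirrors the four labels of Fig. 36.

```python
from collections import deque


class Node:
    """A participant with an inbox queue and handles to the nodes it can reach."""

    def __init__(self, identifier):
        self.identifier = identifier
        self.inbox = deque()
        self.links = {}  # destination identifier -> Peer handle

    def deliver(self, message):
        if message["to"] == self.identifier:
            self.inbox.append(message)  # queued until the application polls
        else:
            self.links[message["to"]].transmit(message)  # act as a proxy

    def poll(self):
        return self.inbox.popleft() if self.inbox else None


class Peer:
    """Handle for a directly reachable node."""

    def __init__(self, identifier, node=None):
        self.identifier = identifier
        self.node = node

    def send(self, body):
        self.transmit({"to": self.identifier, "body": body})

    def transmit(self, message):
        self.node.deliver(message)  # stands in for a NetworkStream write


class ProxifiedPeer(Peer):
    """Handle for a node that is only reachable through another peer."""

    def __init__(self, identifier, proxy):
        super().__init__(identifier)
        self.proxy = proxy

    def transmit(self, message):
        self.proxy.transmit(message)  # hand the message to the proxy instead


def route_demo():
    proxy1, proxy2, node2 = Node("proxy-1"), Node("proxy-2"), Node("node-2")
    proxy2.links["node-2"] = Peer("node-2", node2)
    proxy1.links["node-2"] = ProxifiedPeer("node-2", Peer("proxy-2", proxy2))
    # Node 1 only knows Proxy 1, yet addresses Node 2 as if it were a normal peer.
    handle = ProxifiedPeer("node-2", Peer("proxy-1", proxy1))
    handle.send("hello")
    return node2.poll()["body"], len(proxy1.inbox), len(proxy2.inbox)
```

Because ProxifiedPeer overrides only the transmission step, calling code can treat direct and proxified peers identically, and proxies can be chained to any depth.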

Fig. 37, which was generated by the system client program, shows a sample connection diagram for three peers connected to each other, with the current peer shown as the root node of the tree. For each peer, the first eight digits of its identifier, its IP address and the date it joined the network are shown. The figure indicates the root node acting as the primary Proxification coordinator for the network of the three peers.

Fig. 37 Proxification Example

It is also notable that the nature of the Proxification process allows for the creation of multiple “cloaking tunnels”. This term refers to a group of interconnected computers chained together, where each node in the chain (or connection graph) proxifies the previous one, effectively creating a difficult-to-trace interconnection in a manner that is similar to, and can be extended to, the Tor network which was discussed earlier. Fig. 38 illustrates a sample usage of the Proxification process for achieving anonymity.

Fig. 38 The usage of nodes as multiple proxies

4.5 DHT

The Distributed Hash Table makes use of the Kademlia design. Internally it is represented by an array of heaps (SortedList C# objects) which hold instances of the connected nodes and their corresponding Peer objects. Each heap holds the nodes which have a specific distance to the current peer. For example, the first heap holds the nodes which have a distance of 1 to the current node. Each heap is sorted primarily using the Kademlia Exclusive Or (XOR) metric. Nodes which have the same XOR distance are sorted by the time at which the connection was established, which, as discussed previously in the DHT chapter, was found to be optimal, since the reliability of a node was found to be a function of how long it has spent connected to the network. As each node is added, its distance to the current node is calculated and the node is inserted at the appropriate position in the corresponding heap. The DHT stores the following kinds of objects: 

Nodes: Nodes in the network other than the current node, to which the current node is connected and which it has authenticated.



Files: Which are accessible through their SHA-512 hash codes.



External Entities: Such as File Paths, which are used for other purposes such as the Distributed File System (DFS) or the parallel processing mechanisms.

Fig. 39 shows the linearised structure of the DHT buckets where the nodes and known peers are being held.

[Figure labels: Bucket 1; Bucket 2; ...; Bucket K]

Fig. 39 DHT Buckets Structure

The buckets are ordered from Bucket 1 up to Bucket K. The ordering of the buckets maps directly to the distance between the current node and the i-th bucket. Nodes with distance 1 are stored in Bucket 1, ones with distance 2 are stored in Bucket 2, and similarly up to Bucket K. Note that there is no Bucket 0 in the actual implementation of the system, as only one node could possibly map to it, and that is the local node itself, which is already identified and stored separately.
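A compact way to see how nodes fall into buckets is sketched below. It assumes the common Kademlia convention that the bucket index is the bit length of the XOR of the two identifiers, which matches the absence of a Bucket 0 (only the local node has distance 0) but is not confirmed by the text as the exact rule used. The `RoutingTable` class, the integer identifiers and the tuple layout are hypothetical.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two node identifiers."""
    return a ^ b


def bucket_index(local_id: int, other_id: int) -> int:
    """Position of the highest differing bit; 0 only for the local node itself."""
    return xor_distance(local_id, other_id).bit_length()


class RoutingTable:
    def __init__(self, local_id: int, id_bits: int):
        self.local_id = local_id
        # Buckets 1..K; there is deliberately no bucket 0.
        self.buckets = {index: [] for index in range(1, id_bits + 1)}

    def add(self, node_id: int, connected_at: float) -> None:
        index = bucket_index(self.local_id, node_id)
        if index == 0:
            return  # the local node is stored separately
        bucket = self.buckets[index]
        # Sort by XOR distance first, then by connection time (earliest first).
        bucket.append((xor_distance(self.local_id, node_id), connected_at, node_id))
        bucket.sort()


table = RoutingTable(local_id=0b0101, id_bits=4)
table.add(0b0100, connected_at=10.0)  # distance 1
table.add(0b0110, connected_at=7.0)   # distance 3
table.add(0b0111, connected_at=5.0)   # distance 2
table.add(0b1101, connected_at=1.0)   # distance 8
```

With these sample identifiers, the node at distance 1 lands in Bucket 1, the nodes at distances 2 and 3 share Bucket 2 in that order, and the node at distance 8 lands in Bucket 4. Keeping each bucket as a sorted list plays the role that SortedList plays in the description above.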

4.6 Web Server

In order to directly access the peer-to-peer structure from a web browser, a custom web server was created. The server makes use of HttpListener .Net objects, which use the Microsoft Windows HTTP APIs for listening and serving web content on a specific port. The created listener is configured to respond to requests on port “6789” with the URL prefix of “localhost”. That means any request from a browser running locally with the prefix “http://localhost:6789” is directed to the listening program, which is able to respond to it. The HTTP server supports all of the standard HTTP requests, including cookies, sending forms and uploading files through the GET and POST HTTP requests discussed earlier. Fig. 40 shows the operation of the local web server.

1. The user enters the URL http://localhost:6789 into the browser. The browser establishes a TCP connection to the local running web server, and sends the HTTP request over the connection.

Fig. 40 Operation of the local web server
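For comparison, a local listener of this kind can be sketched with Python's standard `http.server` module. This is an analogue of the idea, not the system's HttpListener-based server: the handler class, the response text and the `fetch_once` helper are assumptions of the example. The default port follows the 6789 mentioned above, and passing 0 lets the operating system pick a free port.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class LocalHandler(BaseHTTPRequestHandler):
    """Answers every GET request with a short plain-text page."""

    def do_GET(self):
        body = f"Requested path: {self.path}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once(port=6789, path="/index.html"):
    """Start the server, issue one request from a local client, then shut down."""
    server = HTTPServer(("127.0.0.1", port), LocalHandler)
    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    try:
        # The client plays the role of the browser in Fig. 40.
        client = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        client.request("GET", path)
        response = client.getresponse()
        result = response.status, response.read().decode("utf-8")
        client.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Calling `fetch_once()` performs the round trip shown in the figure: a TCP connection to the local listener, an HTTP request, and a response carrying status 200 and the requested path.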

4.7 Distributed File System

The system offers a distributed file system that allows files to be located on the network through a minimal syntax.

4.7.1 File Identifiers

The system follows a hierarchical structure for the files. Each file is identified by a top-most element that indicates the node which created it, and that serves partially for authentication purposes. Afterwards, the file path is given relative to that node ID, using the forward slash symbol to separate its various tokens. Each identifier consists of two parts separated by a forward slash. The first part is the unique identifier of the peer which originally created the file. It is used to authenticate the origin of the file and to prevent collisions if various peers create files with similar file paths. The second part is the file path, in a UNIX format (for example /folder1/file1.jpg). Fig. 41 illustrates the hierarchical data structure format for a group of directories holding various files.

Root (Author Peer ID)

Fig. 41 Hierarchical Data Structure Format

4.7.2 Network Functionality

The distributed file system functions by querying the network for the hash of the file path, which returns a FileInfo object that holds the actual file creation date, signature and author peer. After retrieving the FileInfo object, the node proceeds to search the network for the returned file hash. The network querying is done in parallel, according to the parameter α, which determines how many nodes to query simultaneously. In the current implementation the value of α is set to 3, which is the recommended value in the original Kademlia paper.

The search procedures follow the same Kademlia DHT invariants for ensuring consistency and logarithmic performance guarantees. Fig.42 illustrates a sample parallel search query with three participating peers.

Fig. 42 Sample Parallel Search Query (diagram: a node queries Peer 1, Peer 2 and Peer 3 in parallel and collects the results)
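A minimal sketch of such a parallel query is given below in Python. It is an illustration only and not the system's implementation: peers are modelled as in-memory dictionaries, a single round of α = 3 simultaneous queries is performed rather than the full iterative Kademlia lookup, and the names key_for, closest_peers and parallel_lookup, as well as the stored FileInfo fields, are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

ALPHA = 3  # number of peers queried simultaneously, as stated in the text


def key_for(path):
    """Hash a file path to an integer key (SHA-512, chosen for illustration)."""
    return int.from_bytes(hashlib.sha512(path.encode("utf-8")).digest(), "big")


def closest_peers(peer_ids, key, count):
    """Pick the `count` peers whose IDs are nearest to `key` by XOR distance."""
    return sorted(peer_ids, key=lambda peer_id: peer_id ^ key)[:count]


def parallel_lookup(peers, key, alpha=ALPHA):
    """Ask the alpha closest peers for `key` at the same time.

    `peers` maps a peer ID to a dict standing in for that peer's local store.
    Returns the first value found, or None if no queried peer holds the key.
    """
    targets = closest_peers(list(peers), key, alpha)
    with ThreadPoolExecutor(max_workers=alpha) as pool:
        answers = list(pool.map(lambda peer_id: peers[peer_id].get(key), targets))
    return next((answer for answer in answers if answer is not None), None)


# Five simulated peers; the entry is stored on the peer closest to the key.
peers = {peer_id: {} for peer_id in (1, 2, 3, 4, 5)}
key = key_for("/folder1/file1.jpg")
holder = closest_peers(list(peers), key, 1)[0]
peers[holder][key] = {"author_peer": holder, "file_hash": "example-hash"}

found = parallel_lookup(peers, key)
missing = parallel_lookup(peers, key_for("/no/such/file"))
print(found, missing)
```

In a complete Kademlia lookup the replies would also carry closer peers to query in the next round; that iteration is omitted here to keep the parallel step visible.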


4.7.3 Local File Management

The files stored locally are those belonging to the user represented by the peer currently running the system. Those files are expected to be stored in various local folders. The user should be able to dynamically add, rename, remove or modify any of those files or subfolders without notifying the program beforehand. Those actions are handled in the following manner:

Write, Rename and Delete Operations: The user can freely alter any file as long as it is not being transferred to another party on the system. Otherwise, the operating system displays a message indicating that a lock is held on the file; the lock is not released until either the data transfer finishes or the user manually terminates the program.

Create Operations: When a user creates a new file or folder inside one of the pre-specified shared locations, it is automatically analysed to compute its hash code and added to the list of shared files.

Internally, the files are represented in a flat-table structure, whereby each path is mapped directly to an object holding its detailed information, such as creation time and hash code. Thus, each added directory is flattened into that structure prior to being used. This has various advantages, such as:

Constant-time O(1) lookup: A query for a given path does not require any recursion or backtracking. This is particularly useful in a context where many queries are expected, some of them potentially adversarial.

Ease of modification and deletion: The constant-time O(1) operations of the hash table make it cheap to erase or alter the information regarding a particular file.


Fig. 43 shows the flat-table structure that is used internally for the file storage.

Path | Object
%root%/Folder/File 1 | Object 1
%root%/Folder/Another file | Object 2
%root%/Folder/MoviesFolder/movie 1 | Object 3
... | ...

Fig. 43 Flat table representation of the file structure
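The flattening step can be sketched as follows. This is a hedged Python illustration rather than the system's code (which is written in C#): the FileEntry class and the flatten function are hypothetical names, and SHA-512 is assumed as the hash because it is the algorithm named elsewhere in this document.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass


@dataclass
class FileEntry:
    created: float  # creation/change timestamp as reported by the OS
    sha512: str     # hex digest of the file contents


def flatten(root):
    """Walk `root` once and map '%root%/relative/path' to a FileEntry."""
    table = {}
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            with open(full_path, "rb") as handle:
                digest = hashlib.sha512(handle.read()).hexdigest()
            table["%root%/" + relative] = FileEntry(os.path.getctime(full_path), digest)
    return table


# Build a small throw-away directory tree and flatten it.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "Folder", "MoviesFolder"))
    for relative in ("Folder/File 1", "Folder/MoviesFolder/movie 1"):
        with open(os.path.join(root, *relative.split("/")), "wb") as handle:
            handle.write(relative.encode("utf-8"))
    table = flatten(root)

print(sorted(table))
```

Once the dictionary is built, a lookup such as table["%root%/Folder/File 1"] is a single hash-table access, regardless of how deeply the file was nested.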


In order to detect the changes described above, a parallel asynchronous event handling mechanism was created using the FileSystemWatcher operating system objects. Those objects are capable of monitoring a specified folder for any changes and reporting back asynchronously, with a new thread per event that updates the internal concurrent data structure used for representing the files. Fig. 44 shows the operational process of the FileSystemWatcher objects (Microsoft, 2010).

Fig. 44 FileSystemWatcher Operation (Microsoft, 2010)


Fig.45 shows a sample demo application using the FileSystemWatcher for monitoring a certain file.

Fig. 45 FileSystemWatcher Sample Demo
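FileSystemWatcher is a .NET class, and the Python standard library has no direct equivalent. The following sketch therefore swaps in a different technique, namely comparing two directory snapshots, purely to illustrate the three kinds of change (create, delete, modify) that the watcher reports. The names snapshot and diff are hypothetical, and this polling approach is not what the system itself uses.

```python
import os
import tempfile


def snapshot(root):
    """Map each file path under `root` to its (size, modification time in ns)."""
    state = {}
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            state[path] = (info.st_size, info.st_mtime_ns)
    return state


def diff(before, after):
    """Return the sets of created, deleted and changed paths between snapshots."""
    created = set(after) - set(before)
    deleted = set(before) - set(after)
    changed = {path for path in set(before) & set(after) if before[path] != after[path]}
    return created, deleted, changed


with tempfile.TemporaryDirectory() as root:
    kept = os.path.join(root, "kept.txt")
    removed = os.path.join(root, "removed.txt")
    added = os.path.join(root, "added.txt")
    for path in (kept, removed):
        with open(path, "w") as handle:
            handle.write("1")
    before = snapshot(root)

    os.remove(removed)
    with open(added, "w") as handle:
        handle.write("new")
    with open(kept, "w") as handle:
        handle.write("longer content")  # the size changes, so the edit is visible
    after = snapshot(root)

created, deleted, changed = diff(before, after)
names = [sorted(os.path.basename(p) for p in group) for group in (created, deleted, changed)]
print(names)
```

An event-driven watcher avoids the repeated directory scans that polling requires, which is one reason the operating system facility is preferable for large shared folders.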

4.8 Omega Python

Omega Python is the name of the custom programming language made for use by the system. The language is a superset of standard Python, with additional primitives for abstracting the peer-to-peer network. Table 7 shows the added primitives.

Table 7 Omega Python Primitives

Primitive Name | Description
w_launch_instance | Launches an instance of a specified application on the specified peer, given the application ID and the peer ID, and returns the running instance ID.
w_send_msg | Sends a message to the specified peer ID. Messages are a variant of Python dictionaries, quite similar to the intrinsic Message objects used for the primary system communication.
w_register_listener | Registers an asynchronous event listener, which is executed on a separate C# thread every time a message is received from a node by the particular Python instance that registered the listener.

Omega Python gives developers freedom and simplicity in using the network, as it abstracts the entire underlying structure into the mentioned primitives. Being a superset of Python, it allows developers to make use of the rich Python standard library for operations such as web crawling, data encryption and compression, and XML parsing without the need to create new libraries, and with the flexibility to import existing debugged and tested code. The events are executed in an asynchronous manner, with each event executing on a separate, newly created thread marked with “STAThread”. STA refers to Single-Threaded Apartment, an attribute that is set for the created Omega Python threads and is used to enable communication between the Omega Python components and the rest of the Microsoft Windows operating system components.

4.9 Attack Vectors

The program is open in nature: it is used on the Internet and is accessible to various adversarial entities, ranging from casual code tinkerers all the way up to professional programmers working for malicious entities. An attack vector is a method by which an entity with malicious intentions might exploit the system in order to perform unintended actions on it, such as accessing someone’s computer without proper authentication. The system was designed with various attack possibilities taken into consideration. Table 8 summarises the potential attack vectors and how each was defended against.

Table 8 Attack Vectors Summary

Attack | Description | Defence
Timing Attack | Using information about the time at which a peer identifier or a message encryption key was generated to simulate the random number generator and obtain the corresponding private keys or initialisation vectors. Such an attack was common in the early days of the web to circumvent websites' security certificates (Huan, 2013). | Usage of cryptographic random generators, which take such attacks into account by adding entropy based on various system parameters and random functions.
Buffer-overflow Attack | Sending an illegal stream of characters which might have malicious embedded code, triggering the execution of unintended functions in the operating system. Such an attack is particularly popular for Internet-based applications because the means of sending the input are not directly visible to the end user (Shen, 2012). | Use of protected .Net managed-code libraries for the entire communication back-end. These have safe string streaming mechanisms built into them, throwing exceptions and effectively terminating any ongoing connections with a peer that sends such malicious or ill-formatted input.
Crypto analysis Attack | The most common forms of crypto analysis attacks currently revolve around parallel computation, in the form of resources leased from cloud services such as Windows Azure or Amazon EC2, to try and attack a particular RSA or AES key. | Industry-standard key sizes were used: RSA 2048 bits and AES 256 bits. The best known attacks revolve around RSA 512 bit keys and AES 64 bit keys (Izu, 2012). No attacks are known against the key sizes used, and they are currently the industry standard for cryptographic communications, used by large companies such as Google (Powell, 2012).
Communication Flow Overriding | An entity might attempt to send commands directly to another entity prior to the completion of the appropriate authentication procedures. | A strict automaton governing the authentication process has been designed, as shown in section 4.2. The system is configured to follow the automaton procedure strictly, and to disconnect the peer in case an invalid transition is invoked.
Excessive Resources Exploitation | A peer might attempt to post a large number of requests to a single peer or a small group of peers. | A “reputation” system within each peer maintains, for each peer and corresponding IP address, a historical log of the requests which arrived from and were sent to that peer. Such records allow the peer to ban a particular peer or IP address when resource abuse is detected. This mechanism gives further stability to the network structure and resource availability.
Hash Collisions | A malicious entity might be able to create a peer identifier identical to that of another peer, and would thus be able to digitally sign the files distributed by that peer and establish connections with peers that trust the impersonated peer. | The SHA 512 hashing algorithm is used, which has no known feasible collision attacks, and is approved by the US government for the protection of confidential information (Kumar, 2012).
Executing non trusted code | A malicious peer might be able to execute code not approved by the user (outside of the approved programming framework). | Microsoft’s .Net framework, which was used in the system, has built-in “Code Access Security (CAS)”. That feature prevents code from executing procedures outside of an approved range. Procedures requiring privileges higher than those currently granted require manual approval by the user (Luo, 2012).
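The “reputation” defence against excessive resource exploitation can be sketched as a per-sender request log with a sliding time window. The Python fragment below is a hedged illustration: the class name ReputationLog, the threshold of 3 requests per 10 seconds and the decision to ban a (peer ID, IP address) pair as a unit are all assumptions, not details taken from the system.

```python
from collections import defaultdict, deque


class ReputationLog:
    """Keeps a per-sender history of request times and bans heavy senders."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.history = defaultdict(deque)  # (peer ID, IP) -> request timestamps
        self.banned = set()

    def record(self, peer_id, ip_address, now):
        """Log one request; return False if the sender is (or becomes) banned."""
        key = (peer_id, ip_address)
        if key in self.banned:
            return False
        times = self.history[key]
        times.append(now)
        # Forget requests that have fallen out of the time window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        if len(times) > self.max_requests:
            self.banned.add(key)
            return False
        return True


log = ReputationLog(max_requests=3, window_seconds=10.0)
burst = [log.record("peerA", "203.0.113.5", t) for t in (0.0, 1.0, 2.0, 3.0)]
later = log.record("peerA", "203.0.113.5", 100.0)
other = log.record("peerB", "203.0.113.9", 3.0)
print(burst, later, other)
```

The fourth request inside the window triggers the ban, and the ban persists even after the window has passed, while an unrelated peer is unaffected.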

Chapter 5 - Experiments and Test Results

5. Experiments and Test Results

Two primary applications were created for testing the system. Each application represents a different problem to be solved using the system’s offerings. The applications are intended as a proof of concept for the system’s functionality, and provide real-world cases that can be analysed. Sections 5.1 and 5.2 discuss the structure of the applications algorithmically. Section 5.3 then gives a detailed analysis of the experimental results with respect to various network performance metrics. Section 5.4 compares the system with other networks, in particular BitTorrent and eMule.

5.1 N-Way Merge

The idea behind the N-Way Merge algorithm is to partition a data set that needs to be sorted into equally sized subsets, where each subset is given to a particular peer to sort, with the results returned in the end to a master peer that combines them into a single file. The main use case of N-Way Merge arises when the entity interested in sorting the data does not have sufficient memory to sort it in place. As an example, assume a file of size 1 gigabyte and 10 machines, each with 100 megabytes of free RAM. It would be infeasible to sort the file on any single machine individually. The file would be divided into 10 parts, each of size 100 megabytes, and sent by the master to 10 different peers.


Each peer would sort its 100 megabyte section of the file individually in O((n/m) log(n/m)) time, where n is the number of numbers in the original file and m is the number of peers. In this case m = 10 and n can be assumed to be roughly 256 million (assuming the file holds 32-bit (4-byte) integers). Following the sorting, each peer would return to the master the smallest number in its list. The master would keep a heap of tuples, each holding a number and the machine it came from. The heap has a fixed size of m (the number of machines). At each step, the master would pop the smallest number from the heap and ask the machine which had originally returned that number for its next smallest number. There is no need to ask all machines again, because only the machine whose number was popped has a new smallest value to offer. The runtime complexity of the merge is O(n log m) for the master, which is effectively O(n) since m is typically very small (~10-100), and O(n/m) for each of the m peers, where n is the number of numbers in the original file. The space complexity is O(m) for the master (which can be considered O(1) for the same reason) and O(n/m) for each peer.

Note: Appendix C includes a detailed performance analysis of the execution of the parallel computing application thread.

Fig. 46 shows the general process of the N-Way Merge algorithm.


Fig. 46 N-Way Merge
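The master's merge step can be sketched in a few lines of Python using the standard heapq module. This is a single-process illustration under stated assumptions: each peer's sorted list is modelled as a local list, "asking a machine for its next smallest number" is modelled as advancing an iterator, and the function name n_way_merge is hypothetical.

```python
import heapq


def n_way_merge(sorted_chunks):
    """Merge already-sorted chunks with a heap holding one tuple per chunk."""
    iterators = [iter(chunk) for chunk in sorted_chunks]
    heap = []
    for machine, iterator in enumerate(iterators):
        first = next(iterator, None)
        if first is not None:
            heap.append((first, machine))  # (number, machine it came from)
    heapq.heapify(heap)

    merged = []
    while heap:
        value, machine = heapq.heappop(heap)
        merged.append(value)
        # Only the machine whose number was popped is asked for its next one.
        following = next(iterators[machine], None)
        if following is not None:
            heapq.heappush(heap, (following, machine))
    return merged


data = [9, 4, 7, 1, 8, 2, 6, 3, 5, 0]
m = 3
chunks = [sorted(data[i::m]) for i in range(m)]  # each "peer" sorts its share
merged = n_way_merge(chunks)
print(merged)
```

The heap never holds more than m tuples, which matches the O(m) space bound for the master stated above.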


5.2 Load Balancing for Link Crawler

Another example of the use of the system is load balancing. In this particular case, a peer has a list of web page addresses and is interested in extracting the links inside each web page. Doing such a task locally would consume too much time, and parallelising it locally would not help, as the primary limit would be the bandwidth of the server. Thus, it is considered a quintessential example of the usage of the system. A potential solution for this problem using the designed system can be illustrated through the following example, which is essentially a variation on the well-known producer-consumer problem:

The master would have a thread-safe queue inside of which it places the links.



The master would launch “m” instances of the application at “m” peers.



Each peer would ask the master for the next address in the queue that is not given yet to another peer.



The master would pop the top address on the queue and return it to the requesting peer.



After processing and extracting the links from the given address, the peer would return the list of links inside the given address to the master, potentially with extra flags to handle unexpected situations such as not found or blocked web addresses.



The process repeats until the master no longer has links in the queue, at which point it sends “exit” commands to the peers requesting links to process.
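The steps above can be sketched as a producer-consumer program in Python. To keep the example runnable without network access, peers are modelled as threads and web pages as an in-memory dictionary; the names PAGES, LinkExtractor, extract_links, peer and crawl, as well as the example addresses, are hypothetical and are not taken from the actual application.

```python
import queue
import threading
from html.parser import HTMLParser

# Hypothetical in-memory "web" so that the sketch runs without network access.
PAGES = {
    "http://example.com/a": '<a href="http://example.com/b">b</a>',
    "http://example.com/b": '<a href="http://example.com/a">a</a>'
                            '<a href="http://example.com/c">c</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href attribute of every anchor tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


def extract_links(address):
    """Return (links, flag); the flag marks addresses that could not be loaded."""
    html = PAGES.get(address)
    if html is None:
        return [], "not found"
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links, "ok"


def peer(pending, results, lock):
    """One peer: keep asking for the next address until the queue is empty."""
    while True:
        try:
            address = pending.get_nowait()
        except queue.Empty:
            return  # corresponds to receiving the "exit" command
        outcome = extract_links(address)
        with lock:
            results[address] = outcome


def crawl(addresses, m):
    """Master: fill the thread-safe queue, launch m peers, collect the results."""
    pending = queue.Queue()
    for address in addresses:
        pending.put(address)
    results, lock = {}, threading.Lock()
    peers = [threading.Thread(target=peer, args=(pending, results, lock)) for _ in range(m)]
    for thread in peers:
        thread.start()
    for thread in peers:
        thread.join()
    return results


results = crawl(
    ["http://example.com/a", "http://example.com/b", "http://example.com/missing"],
    m=2,
)
print(results)
```

Because each peer pulls the next address only when it is free, faster peers naturally process more links, which is the load-balancing effect the design aims for.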


Fig. 47 shows the Web Crawler Load Balancing Operation.

Fig. 47 Web Crawler Load Balancing Operation (multiple peers scanning the URLs simultaneously)


5.3 Experimental Data Analysis

For ease of analysis and debugging, while still obtaining a sufficient sample set, a test was conducted with 5 different nodes in 5 different countries, each running the system, in addition to a 6th node running various sample programs, including the two examples mentioned previously and other test data on the network.

5.3.1 Data Analysis Tools

The following tools were used to obtain data for the experimental data analysis:

Wireshark: An open-source tool, considered to be one of the most powerful network packet analysers, used in the industry for the development and debugging of network protocols. Fig. 48 shows the Wireshark Logo (Wireshark, 2013). It has the capability of watching the entire packet stream through a selected network interface. The tool was used to analyse the structure of the transmitted packets.

Fig. 48 Wireshark Logo (Wireshark, 2013)


WPE Pro: It is another popular tool, in particular on the Windows platform. WPE Pro has the capability to monitor the network transfers of a particular process in detail, and to even alter the transmitted data or to spoof transmission data. It was used to obtain ad-hoc data per process for each running instance of the system processes. Fig. 49 shows the main options of the WPE program.

Fig. 49 WPE Pro Main Window Screenshot



Visual Studio Performance Analyser and Threading Visualiser: Both are part of the Visual Studio debugging tools. Fig. 50 shows the Visual Studio logo (Microsoft, 2012). They were used to give a detailed overview of the performance of each individual thread and core relative to the entire machine’s performance. Results obtained by those tools are described in more detail in Appendix C.

Fig. 50 Visual Studio Logo (Microsoft, 2012)



Trace Route: A network diagnosis tool that displays the path taken by packets from a source to a particular destination. It shows in detail the nodes along the path, together with the “hops”, which represent the edges between the nodes in the network graph. Trace Route works by sending Internet Control Message Protocol (ICMP) “Echo” messages through the network and collecting the replies from the nodes encountered along the path. The tool was used together with a visualisation tool called “CountryTraceRoute”, which provides detailed visuals of the command line tool's output and helps in understanding its results more intuitively. Fig. 51 illustrates the operation of the trace route utility (Cisco, 2012).

Fig. 51 Illustration of TraceRoute Operation (Cisco, 2012)


No-IP: No-IP is a provider for dynamic domain name services. Fig. 52 shows the no-ip logo (No-IP, 2012).

Fig. 52 No-IP Logo (No-IP, 2012)

This service allows a machine to have its own domain relative to no-ip. The service was used to give dynamic names to the peers and to the machine used for the coordination of the tests rather than having to enter the long IP addresses, and to change them if a peer got disconnected and connected again.


5.3.2 Experimental Data

5.3.2.1 Network Performance

Five nodes were selected for participating. A 6th node was responsible for coordinating the tasks. Table 9 lists the selected nodes and their geographical locations.

Table 9 Network Latency

Node | Geographical Location
Node 0 (Tasks Coordinator) | Egypt
Node 1 | France
Node 2 | Germany
Node 3 | Russia
Node 4 | China
Node 5 | United States

The network latency from the coordinator peer to the Internet Service Provider (ISP) endpoint was found to be 83 milliseconds. The network latency for each node was analysed using the aforementioned Trace Route tool. Graph. 2 shows the average network latency.


Node | Average Network Latency (ms)
Node 0 (Tasks Coordinator) | 165
Node 1 | 124
Node 2 | 123
Node 3 | 132
Node 4 | 141
Node 5 | 124

Graph. 2 Average Network Latency

Response time was calculated based on the second experimental case, the load balancing link crawler. The application was executed on 6 peers to crawl 12 links. Fig. 53 shows a screenshot of the specified application.


Fig. 53 Screenshot of the Web Crawler Load Balancer Application

The average load time for a link was calculated using data obtained from the latest versions of three web browsers: Microsoft’s Internet Explorer, Mozilla Firefox and Google Chrome. The average response time for loading a link was found to be 2.31 seconds.

Fig. 54 provides extra details for the link loading analysis process for an experimental set of web links.


Fig. 54 Link Loading Response Time Analysis

Afterwards, a connection to the peers was established from a master server to monitor the communication process and record the necessary experimental data. Following the launch of the application on the network, a full packet inspection was made on the running peers in order to capture all the packets, including their data and timing, so that a detailed performance analysis could be based on the response times indicated by the packet logs. Fig. 55 shows the web scraper application status, having finished 2 links and processing 2 links, with the remaining links on the queue.

Fig. 55 Web Scraper Queue


Fig. 56 shows the web scraper application having finished processing the given links.

Fig. 56 Web Scraper Finished


Fig. 57 displays the results of the web scraper, in HTML format.

Fig. 57 Web Scraper Result Links

Table 10 shows the specifications of the PC used to monitor the communications.

Table 10 Communication Monitor PC Specifications

Item | Specifications
CPU | Intel Core2Duo 1.83 GHz
RAM | 2 GB
Operating System | Windows 7 Service Pack 1
IPv6 Support | Yes
Behind NAT | No

Initially, WPE Pro was used to analyse the actual transmitted data, to verify that the encryption process was performed correctly, and to ensure that the networking libraries used were not leaking data unintentionally. It examined all the packets transmitted throughout the handshake process and during the subsequent transmission of sample test messages. Fig. 58 shows a packet dump for the application process obtained using WPE Pro.

Fig. 58 Packet Dump 1


As shown in Fig. 58, the initial handshake commands were, as expected, sent out through a simple XML-based serialisation process, whereby the embedded objects were transmitted as plaintext. Such behaviour was expected, as the encryption process is supposed to begin following the handshake procedure. What was not known prior to analysing the handshake packets was that the serialisation process performed by the libraries used appends to the XML data tokens relating to the originating program, together with its version; in this case the substring “omega, Version 1.0.0.0”. This was previously unknown, as there was no clear documentation of the exact structure of serialised data transmission in the official library manuals. Nevertheless, such a feature could be used for various purposes, including:

Network Filtering: Network administrators using Deep Packet Inspection (DPI) tools could easily set up filters to allow or disallow the communications of the system machines. That could be useful to make sure the usage of the system adheres to the rules of the organisation it is used within.

Version Control: The transmitted version information could be used to create compatibility layers between the different versions of the system quite easily, without the need to add fields to the message structure, which could increase its size unnecessarily.

Following the handshake process, the rest of the packets were, as expected, transmitted in a fully encrypted manner. Fig. 59 shows a packet dump for the communication after the handshake process was established.


Fig. 59 Packets after handshake

In total, 32 packets were sent and 93 were received at each end during the handshake process. The average overall time taken, including the application and network latency, was 1321 milliseconds. The seemingly large number of packets is justified as follows:

The number of packets is proportional to the size of the transmitted messages, as the IP protocol requires splitting larger messages into smaller ones.

The initial messages are encrypted using the RSA algorithm, which, as mentioned previously in the Review of Literature chapter, is not symmetric in the size of messages it encrypts. Thus, the initial messages encrypted using RSA only have a significant overhead factor, increasing their size and the number of packets transmitted.

The messages in the handshake process include the relatively large public key objects (2048 bits for each peer), in addition to the nonce objects (256 bits for each peer) and the AES encryption key (256 bits, sent by the connection-initiating peer).

However, following the handshake process, all messages are transmitted using AES encryption, which, as discussed in the Literature Review chapter, does not have any overhead; the transmitted messages therefore require fewer packets.

Note: A detailed data packet log for the handshake process is included in Appendix D.

Following the completion of the handshake process, the link crawler application was started. Table 11 shows the gathered data regarding the process.


Table 11 Gathered Experimental Elements

Element | Measure
Locating application on the Distributed File System | 500 milliseconds
Initiating the server element | 100 milliseconds
Initiating the client element | 1200 milliseconds (Note: The bulk of the time to launch the client was due to loading the client’s GUI dynamically using the XAML engine.)
Sending command between client and server instances | 300 milliseconds

Note: The reason the time between applications is slightly higher than that between the individual peer instances is that, as previously mentioned, applications send messages through a second layer of encapsulation on top of the already existing encapsulation and abstraction “envelope” mechanism used for communication among the individual peers.


5.4 Comparison to Other Peer-to-Peer Systems

Despite the seemingly similar structure of the current system and other peer-to-peer systems such as BitTorrent and eMule, it is worth restating some of the major differences in network and communication structure between these systems. Table 12 shows a comparison between various peer-to-peer networks.

Table 12 Peer-to-Peer Networks Comparison

Feature | Omega (Current System) | BitTorrent | eMule
DHT Structure | Kademlia | Kademlia | Kademlia
Data Exchange | Message-based | Message-based | Message-based
Authentication | RSA 2048-bit Key Exchange | None | None
Encryption | AES 256 bits | Optional AES 128 bits | None
File Transfers | One-message based | Segmented from multiple peers in parallel. | Segmented from single peer at a time.
Integrity Check | Per file segment | Retroactive (after file transfer) | Retroactive (after file transfer)
Client Language | C# | C++ | C++

Considering that authentication is performed only once, when a peer establishes a connection channel with another peer, it can be considered to have a minimal overall impact on performance. The primary factors affecting performance when comparing the selected peer-to-peer systems are therefore encryption and file transfers, as they execute for the bulk of the duration of a connection between peers.

The AES encryption adds a constant time factor to the time for transmitting messages, as they have to be encrypted first at the sending peer and correspondingly decrypted at the receiving peer. The AES performance was measured on a 2007-era Intel Core2Duo 1.6 GHz processor and found to have a throughput of 11 MB/second. More thorough measurements using a variety of processors and conditions were made by Intel, which found similar results, with a maximum potential of 700 MB/sec on an i7 processor (Intel, 2011). Such throughput is considered sufficient, since the overall network bandwidth cap is much smaller than the AES throughput cap.

Regarding file-transfer performance, the main factor to take into consideration is the size of the data to be transmitted. A sample experiment was conducted on the BitTorrent protocol by participating in a file swarm, both downloading and uploading a group of files. The most noticeable pattern was the rapid fluctuation in the upload/download speeds. Graph. 3 shows an analysis of the speed of BitTorrent, performed by averaging the download and upload speeds while transferring various large files through the BitTorrent network.


Graph. 3 BitTorrent Speed Analysis

Such fluctuation is natural given the parallel nature of the protocol, whereby multiple peers share segmented data and may join and leave the swarm throughout the duration of the data transfer session, with various peers employing techniques such as “choking”, whereby they completely stop transmitting data for periods of time to certain peers with low bandwidth. This issue does not exist in the Omega system, as the data transmission is done in a single segment with a particular peer. That also makes it favourable for peers with slow data connections, which might suffer from the “choking” mechanism in the BitTorrent protocol. A notable exception in which BitTorrent might outperform the Omega system is when the transmitted data is so large that it is not feasible to transmit it in a single session. Such an issue can be resolved through application-aware data transmission, using the Omega Python language abstraction primitives to create a custom layer for segmented data transmission. Such a layer would allow for a far easier and more customisable application-aware data transmission than that offered by BitTorrent.


5.5 Optimality

Regarding the first experimental application that was discussed, the one applying the n-way merge algorithm on a distributed system, various notes were taken regarding its efficiency. Aside from the obvious need for the n-way merge algorithm due to resource constraints (RAM in this case), it is notable that the inherently parallel nature of the n-way algorithm allows for improved performance even when it is used on a single machine. The idea is that dividing up an inherently non-linear operation into smaller groups, and then doing a linear-time operation to combine the results, gives a performance advantage even if a sub-optimal algorithm is used. As an example, one could consider the Bubble Sort algorithm. Fig. 60 shows the pseudo code for the BubbleSort algorithm.

Fig. 60 BubbleSort Pseudocode


Fig.61 shows the pseudo code for the merging algorithm.

Fig. 61 Merging Algorithm Pseudocode

The BubbleSort algorithm is known to be a suboptimal solution for the sorting problem, having a complexity of O(n^2). As an example, if one were to sort a list of 10,000 32-bit integers, the expected number of operations would be proportional to (10,000)^2 = 100,000,000 (one hundred million operations). However, if the 10,000 integers were divided into 10 groups, each of size 1,000 integers, the overall time to sort all the lists would be proportional to 10 x (1,000)^2 = 10,000,000 (ten million operations), in addition to O(n) time for merging the results of the 10 lists, leading to a total of 10,010,000 (ten million and ten thousand operations). Graph. 4 shows a comparison between the two approaches discussed.


[Graph: bar chart comparing the number of operations for Bubble Sort and Distributed Bubble Sort; vertical axis from 0 to 120,000,000]

Graph 4 Comparison Between Two Approaches of Sorting

As one would notice, there is quite a significant difference, namely an order of magnitude. Thus, merely dividing up the task made the operation about 10 times faster. To put this into perspective, consider a similar problem where no linear-time O(n) solution is known for solving individual instances, but a linear-time solution exists for merging them, which is a somewhat common pattern in divide-and-conquer algorithms. Assume the data set is sufficiently large that the classical O(n^2) approach would take 10 days. By keeping the same O(n^2) algorithm unchanged and only dividing the data set into 10 buckets, where each bucket is processed individually and the results are merged at the end with an O(n) algorithm, the same problem would take only about 1 day. Such a case is not uncommon nowadays, in particular for problems where either a linear-time algorithm does not exist or coding it might be quite challenging, yet a relatively simple O(n^2) or higher algorithm exists. By properly dividing the input data set, one might obtain surprisingly efficient results with minimal effort.
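The order-of-magnitude argument can be checked empirically with a small Python experiment that counts comparisons. The sketch below is an illustration under stated assumptions: it uses 1,000 integers instead of 10,000 so that it runs quickly, a plain bubble sort without early exit, and the standard heapq.merge for the combining step (which costs O(n log k) for k groups rather than strictly O(n)). The function names bubble_sort and divided_bubble_sort are hypothetical.

```python
import heapq
import random


def bubble_sort(values):
    """Return (sorted copy, number of comparisons) using plain bubble sort."""
    items = list(values)
    comparisons = 0
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            comparisons += 1
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items, comparisons


def divided_bubble_sort(values, groups):
    """Bubble-sort `groups` equal slices separately, then merge the results.

    Assumes len(values) is divisible by `groups`.
    """
    size = len(values) // groups
    sorted_groups, comparisons = [], 0
    for start in range(0, len(values), size):
        chunk, used = bubble_sort(values[start:start + size])
        sorted_groups.append(chunk)
        comparisons += used
    return list(heapq.merge(*sorted_groups)), comparisons


data = random.Random(0).sample(range(100_000), 1_000)
whole_sorted, whole_count = bubble_sort(data)
divided_sorted, divided_count = divided_bubble_sort(data, groups=10)
print(whole_count, divided_count)
```

With 10 groups the sorting phase performs roughly one tenth of the comparisons, in line with the 100,000,000 versus 10,010,000 estimate given in the text for 10,000 integers.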

Chapter 6 – Conclusion and Future Work

6.1 Conclusion

“The more we reduce ourselves to machines in the lower things, the more force we shall set free to use in the higher.” (Anna C. Brackett, The Technique of Rest, 1892)

The project’s aim was to facilitate the creation of distributed parallel applications, a process that has been considered quite challenging and that demands an advanced degree of experience in fields such as parallel processing, synchronisation and networking.

Researching for the project took considerable time but was fruitful, as it showed clearly what was possible, and to what extent, based on previous work in related areas such as peer-to-peer networking and programming languages. The research showed that the Kademlia DHT structure has proven itself in recent years to be among the most reliable bases for large-scale peer-to-peer networks such as BitTorrent. Security algorithms such as RSA and AES, along with their derivations for key exchange, were investigated in order to find suitable security parameters. Various programming and scripting languages and design patterns were studied in order to arrive at an easy, intuitive yet powerful design for a new programming language capable of harnessing the power of the new system.

Throughout the process of building the system, various design and debugging challenges were faced and dealt with, most of which, such as deadlocks, had been unfamiliar at the practical level. Tackling those issues proved to be an exciting and worthwhile task, one that was necessary for getting the system functional and running.

The objective was accomplished through the creation of a multi-layer distributed system. A TCP-based connection system, coupled with capabilities for bypassing traditional network obstacles, was developed to ensure stable and accessible operation of the underlying network communication channels.

A distributed hash table based on the well-known Kademlia structure was developed in order to establish fast, efficient and flexible storage and lookup facilities among the peers in the network. A distributed hierarchical file system was created in order to share data among the nodes in the system. A custom programming language based on Python was developed to facilitate accessing the system as an abstract block of functionality through a small set of simple primitives.

To ensure the security and privacy of the participants, industry-standard cryptographic functionality was utilised, providing strong protection against most well-known attacks on traditional peer-to-peer systems. Throughout the testing phase it was found that these security measures added only a minimal overhead, which was well justified given the added safety and confidentiality of the system’s communication layer.

Various kinds of algorithms were tested on the system, including variants of the producer-consumer problem, web scraping and load balancing applications, and sample data exchange and communication applications. The tests were followed by a detailed performance analysis of the different interaction operations among the threads on the different parts of the system. The results were deemed satisfactory, fulfilling the primary objective of ease of coding and the secondary objective of smooth and reliable performance. Security-wise, the system was found to behave as expected regarding data protection. Certain aspects of the extra data transmitted through the underlying communication foundation were noticed and were made use of in enhancing the communication protocol. In addition, certain performance optimisation opportunities for various divide-and-conquer algorithmic patterns were noticed and their use was documented.
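For readers less familiar with Kademlia, the storage and lookup facility summarised above rests on an XOR distance between identifiers. The following minimal Python sketch illustrates that metric only and is not the system's implementation: the 160-bit SHA-1 identifiers are an assumption taken from the original Kademlia design, and the names node_id, xor_distance and closest_nodes, as well as the peer labels, are hypothetical.

```python
import hashlib


def node_id(name: str) -> int:
    """Derive a 160-bit identifier from a name (assumption: SHA-1 digest)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance between two identifiers is their bitwise XOR."""
    return a ^ b


def closest_nodes(key: int, known_ids, k: int = 3):
    """Return the k known identifiers closest to key under the XOR metric."""
    return sorted(known_ids, key=lambda n: xor_distance(n, key))[:k]


if __name__ == "__main__":
    peers = [node_id(f"peer-{i}") for i in range(20)]
    key = node_id("some/file/path")
    for peer in closest_nodes(key, peers):
        print(f"{peer:040x}")
```

A value would typically be stored on, and later looked up from, the peers whose identifiers are closest to the key under this metric.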
The system’s modular design opens up various routes for expansion by means of add-on applications which make use of its provided primitives, such as distributed cloud file-sharing systems and secure multimedia chatting. Other potential paths for expansion include adding new layers and APIs to the system’s programming and abstraction mechanisms to support more advanced tasks, such as programming Graphics Processing Units (GPUs), and providing built-in representation techniques and algorithms for common tasks such as graph traversal and web crawling.

Combining the aforementioned features into a single system allowed it to become a new category of network application, one that is fundamentally different in overall power from previous network systems. Table 13 contains a brief summary of the advantages of the implemented system with respect to other similar works.

Table 13 Summary Comparison with Related Works

System | Distributed File Sharing | Scalability | Cryptography | Computation | Anonymity
Implemented System | Hierarchical Distributed File System | Kademlia DHT | RSA 2048 + AES 256 | General-Purpose | Two-Way Proxification

Rows for BitTorrent, SETI@Home and Tor Network: the extracted cell values are N/A (nine entries) and Only Ad-hoc (one entry).

If the system is deployed on a large scale, with the participation and interest of competent developers, it promises to change the way traditional peer-to-peer and distributed computation is performed online, creating new and more efficient solutions for problems which were previously deemed impractical to solve, and opening the gates for a new Internet era of innovation, competition and productivity.

6.2 Future Works

6.2.1 Distributed SQL-like Database

It might be possible to add to the current system’s functionality a distributed SQL-like database, along with dynamic reduction algorithm processing techniques and syntax based on the already conceived distributed computing primitives, to facilitate performing direct SQL queries. The primary challenges would involve read-write authentication and deadlock prevention. A reference for handling such issues could be the BitCoin network that was discussed in the Related Works chapter.

6.2.2 Live Streaming

The current system is targeted towards message-oriented applications. While it might be possible to create streaming services by means of repeated message transmission, offering direct streaming capability within the built-in peer-to-peer network handling mechanisms would require less effort and be more efficient in terms of processing capacity and bandwidth. A similar service that could serve as a frame of reference for the expected issues is Skype, which was previously mentioned in the Related Works section.

6.2.3 Native Algorithmic Libraries

Native libraries would contain sets of functions which could make usage of the distributed system easier and more efficient. Such libraries could include:

• Image Processing Libraries: These could, for instance, split an image into various parts and recognise each part on a separate system. A potential application could be a photograph of a row of cars, where the number on each car plate is recognised on a separate system.

• Speech Recognition Libraries: These libraries convert speech, from either sound files or video files, to text. A potential application could resemble YouTube’s automatic closed caption generation (Steiner, 2010), whereby parts of a video file would be processed individually and the resulting closed caption files combined by a master server. Such a service could help the hearing impaired, those who want to improve their foreign language skills, and search engines performing searches on video content.

• GPU (Graphics Processing Unit) Libraries: These would make use of GPU power together with processor power in a simple and abstracted manner, enabling the creation of more efficient distributed applications.
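The split, process and combine pattern shared by the image and speech examples above can be sketched as follows. This is a minimal local illustration in plain Python rather than the system's own language or API: a thread pool stands in for separate peers, and split_into_tiles, recognise_tile and process_in_parallel are hypothetical names, with recognise_tile being a placeholder for a real recognition routine.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(rows, tile_height):
    """Split a 2D image (a list of pixel rows) into horizontal strips."""
    return [rows[i:i + tile_height] for i in range(0, len(rows), tile_height)]


def recognise_tile(tile):
    """Placeholder for the per-part work: here, just sum the pixel values."""
    return sum(sum(row) for row in tile)


def process_in_parallel(rows, tile_height, workers=4):
    """Process every tile on a worker and combine the results in tile order."""
    tiles = split_into_tiles(rows, tile_height)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as the input tiles.
        return list(pool.map(recognise_tile, tiles))


if __name__ == "__main__":
    image = [[x + y for x in range(8)] for y in range(6)]
    print(process_in_parallel(image, tile_height=2))
```

In a distributed setting each tile would be sent to a different peer, and the combining step would play the role of the master server mentioned in the speech recognition example.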

6.2.4 Ad-hoc Application Ports

This could enable porting existing applications and their protocols, such as BitCoin, and executing them directly on the system, making use of features which are not present in the original applications, such as NAT traversal, end-to-end encryption and Proxification.

6.2.5 Dynamic Cloud Storage

The built-in distributed file system could be attached to a new module which creates a local folder-like space in the operating system, where arbitrary data stored in the folder is backed up automatically to the peer-to-peer system. When used for backup purposes, such a distributed structure would have the primary advantage of preventing data loss caused by major backup providers going offline or by potential government censorship. A prominent example is the MegaUpload website, which was used for backing up and sharing data globally until its closure in 2012. The website made use of a centralised data centre, and following a series of legal problems the data ended up being seized by the US authorities, and various claims of lost data and backups were made (Sidenius, 2012).

The primary concern would be to encrypt the data itself prior to uploading it, which might be possible through a variety of solutions, such as having a separate symmetric key that is encrypted with the identifier (public key) of the author peer. A notable aspect that would have to change is the reliability or load balancing of the data across the network, as the new data would not be expected to be accessed as frequently as regularly stored data. Such a problem could be alleviated by adding new rules to the data storage and redundancy mechanisms of the system, for instance to allow a “black box” storage area at each participating node for storing data from other nodes in an encrypted and plausibly deniable manner.

6.2.6 Monetisation

Being open source and free of charge, the system still has various potentials for generating revenue. The underlying idea is that users mostly prefer not to compile or customise the program themselves, and instead use a ready-built copy of it whenever possible. Such a usage pattern allows for various monetisation options, such as:

• Integrating with a search engine: This would allow users to query for shared files on the system, through websites which share such links, from within the program itself, without having to open a separate web browser instance. Such a feature has been used in popular free software such as Mozilla Firefox, where Google agreed to a yearly deal worth roughly a billion US dollars in order to become the default search engine for Firefox (Hazan, 2013). Another example of a search engine making a deal to be used by default with a free program is IrfanView, a free image editing program whose installer prompts the user to change the browser’s default search engine to the sponsored one.

• In-App Advertisements: Here a small portion of the application would be used to display advertisements, potentially ones related to the content the user is currently viewing. Such an approach has proven successful throughout the past decade. One of the earliest examples of in-app advertisements is Gmail’s targeted ads, which appear in small toolbars when the user is viewing or composing an email and show sponsored content related to certain keywords in the user’s email (Yuan, 2012). Another example of the success of in-app advertising is the popular BitTorrent client uTorrent, which features a small toolbar advertising paid content such as music and movies (Ferri, 2012). A growing frontier for in-app advertisements is mobile applications, in particular on the Android platform, where Google has launched various initiatives and programmes to facilitate offering advertisements inside free applications (Stevens, 2012). Fig. 62 is a partial screenshot of the uTorrent window showing the in-app advertisements.

Fig. 62 Example of In-App Advertisement in uTorrent

Appendix A - Project Schedule Gantt Chart

Fig. 63, 64 and 65 show a Gantt chart representation of the project schedule.

Fig. 63 Project Schedule Gantt Chart - Part 1

Fig. 64 Project Schedule Gantt Chart - Part 2

Fig. 65 Project Schedule Gantt Chart - Part 3

Appendix B - Project Classes UML Diagram

Fig. 66 shows the UML diagram of the classes used in the system.

Fig. 66 Project Classes UML Diagram

Appendix C - Performance Analysis for Launching Parallel Application

Graph. 6 shows the function analysis for thread initialisation.

Graph. 6 Function Analysis for thread initialisation

Graph.7 shows CPU and GPU Utilisation Analysis.

Graph. 7 CPU and GPU Utilisation Analysis

Graph.8 shows the performance per thread.

Graph. 8 Performance Per Thread

Graph.9 shows the CPU core utilisation.

Graph. 9 CPU Cores Utilisation (2 cores in this test)

Appendix D – Handshake Process Detailed Log

Fig. 67-71 show packet logs during the different stages of the handshake process.

Fig. 67 Handshake Log Part 1

Fig. 68 Handshake Log Part 2

Fig. 69 Handshake Log Part 3

Fig. 70 Handshake Log Part 4

Fig. 71 Handshake Log Part 5

Appendix E - Creative Commons License Legal Code

Legal Code

License

THE WORK (AS DEFINED BELOW) IS PROVIDED UNDER THE TERMS OF THIS CREATIVE COMMONS PUBLIC LICENSE ("CCPL" OR "LICENSE"). THE WORK IS PROTECTED BY COPYRIGHT AND/OR OTHER APPLICABLE LAW. ANY USE OF THE WORK OTHER THAN AS AUTHORIZED UNDER THIS LICENSE OR COPYRIGHT LAW IS PROHIBITED.

BY EXERCISING ANY RIGHTS TO THE WORK PROVIDED HERE, YOU ACCEPT AND AGREE TO BE BOUND BY THE TERMS OF THIS LICENSE. TO THE EXTENT THIS LICENSE MAY BE CONSIDERED TO BE A CONTRACT, THE LICENSOR GRANTS YOU THE RIGHTS CONTAINED HERE IN CONSIDERATION OF YOUR ACCEPTANCE OF SUCH TERMS AND CONDITIONS.

1. Definitions

a. "Adaptation" means a work based upon the Work, or upon the Work and other pre-existing works, such as a translation, adaptation, derivative work, arrangement of music or other alterations of a literary or artistic work, or phonogram or performance and includes cinematographic adaptations or any other form in which the Work may be recast, transformed, or adapted including in any form recognizably derived from the original, except that a work that constitutes a Collection will not be considered an Adaptation for the purpose of this License. For the avoidance of doubt, where the Work is a musical work, performance or phonogram, the synchronization of the Work in timed-relation with a moving image ("synching") will be considered an Adaptation for the purpose of this License.

b. "Collection" means a collection of literary or artistic works, such as encyclopedias and anthologies, or performances, phonograms or broadcasts, or other works or subject matter other than works listed in Section 1(f) below, which, by reason of the selection and arrangement of their contents, constitute intellectual creations, in which the Work is included in its entirety in unmodified form along with one or more other contributions, each constituting separate and independent works in themselves, which together are assembled into a collective whole. A work that constitutes a Collection will not be considered an Adaptation (as defined below) for the purposes of this License.

c. "Creative Commons Compatible License" means a license that is listed at http://creativecommons.org/compatiblelicenses that has been approved by Creative Commons as being essentially equivalent to this License, including, at a minimum, because that license: (i) contains terms that have the same purpose, meaning and effect as the License Elements of this License; and, (ii) explicitly permits the relicensing of adaptations of works made available under that license under this License or a Creative Commons jurisdiction license with the same License Elements as this License.

d. "Distribute" means to make available to the public the original and copies of the Work or Adaptation, as appropriate, through sale or other transfer of ownership.

e. "License Elements" means the following high-level license attributes as selected by Licensor and indicated in the title of this License: Attribution, ShareAlike.

f. "Licensor" means the individual, individuals, entity or entities that offer(s) the Work under the terms of this License.

g. "Original Author" means, in the case of a literary or artistic work, the individual, individuals, entity or entities who created the Work or if no individual or entity can be identified, the publisher; and in addition (i) in the case of a performance the actors, singers, musicians, dancers, and other persons who act, sing, deliver, declaim, play in, interpret or otherwise perform literary or artistic works or expressions of folklore; (ii) in the case of a phonogram the producer being the person or legal entity who first fixes the sounds of a performance or other sounds; and, (iii) in the case of broadcasts, the organization that transmits the broadcast.

h. "Work" means the literary and/or artistic work offered under the terms of this License including without limitation any production in the literary, scientific and artistic domain, whatever may be the mode or form of its expression including digital form, such as a book, pamphlet and other writing; a lecture, address, sermon or other work of the same nature; a dramatic or dramatico-musical work; a choreographic work or entertainment in dumb show; a musical composition with or without words; a cinematographic work to which are assimilated works expressed by a process analogous to cinematography; a work of drawing, painting, architecture, sculpture, engraving or lithography; a photographic work to which are assimilated works expressed by a process analogous to photography; a work of applied art; an illustration, map, plan, sketch or three-dimensional work relative to geography, topography, architecture or science; a performance; a broadcast; a phonogram; a compilation of data to the extent it is protected as a copyrightable work; or a work performed by a variety or circus performer to the extent it is not otherwise considered a literary or artistic work.

i. "You" means an individual or entity exercising rights under this License who has not previously violated the terms of this License with respect to the Work, or who has received express permission from the Licensor to exercise rights under this License despite a previous violation.

j. "Publicly Perform" means to perform public recitations of the Work and to communicate to the public those public recitations, by any means or process, including by wire or wireless means or public digital performances; to make available to the public Works in such a way that members of the public may access these Works from a place and at a place individually chosen by them; to perform the Work to the public by any means or process and the communication to the public of the performances of the Work, including by public digital performance; to broadcast and rebroadcast the Work by any means including signs, sounds or images.

k. "Reproduce" means to make copies of the Work by any means including without limitation by sound or visual recordings and the right of fixation and reproducing fixations of the Work, including storage of a protected performance or phonogram in digital form or other electronic medium.

2. Fair Dealing Rights. Nothing in this License is intended to reduce, limit, or restrict any uses free from copyright or rights arising from limitations or exceptions that are provided for in connection with the copyright protection under copyright law or other applicable laws.

3. License Grant. Subject to the terms and conditions of this License, Licensor hereby grants You a worldwide, royalty-free, non-exclusive, perpetual (for the duration of the applicable copyright) license to exercise the rights in the Work as stated below:

a. to Reproduce the Work, to incorporate the Work into one or more Collections, and to Reproduce the Work as incorporated in the Collections;

b. to create and Reproduce Adaptations provided that any such Adaptation, including any translation in any medium, takes reasonable steps to clearly label, demarcate or otherwise identify that changes were made to the original Work. For example, a translation could be marked "The original work was translated from English to Spanish," or a modification could indicate "The original work has been modified.";

c. to Distribute and Publicly Perform the Work including as incorporated in Collections; and,

d. to Distribute and Publicly Perform Adaptations.

e. For the avoidance of doubt:

i. Non-waivable Compulsory License Schemes. In those jurisdictions in which the right to collect royalties through any statutory or compulsory licensing scheme cannot be waived, the Licensor reserves the exclusive right to collect such royalties for any exercise by You of the rights granted under this License;

ii. Waivable Compulsory License Schemes. In those jurisdictions in which the right to collect royalties through any statutory or compulsory licensing scheme can be waived, the Licensor waives the exclusive right to collect such royalties for any exercise by You of the rights granted under this License; and,

iii. Voluntary License Schemes. The Licensor waives the right to collect royalties, whether individually or, in the event that the Licensor is a member of a collecting society that administers voluntary licensing schemes, via that society, from any exercise by You of the rights granted under this License.

The above rights may be exercised in all media and formats whether now known or hereafter devised. The above rights include the right to make such modifications as are technically necessary to exercise the rights in other media and formats. Subject to Section 8(f), all rights not expressly granted by Licensor are hereby reserved.

4. Restrictions. The license granted in Section 3 above is expressly made subject to and limited by the following restrictions:

a. You may Distribute or Publicly Perform the Work only under the terms of this License. You must include a copy of, or the Uniform Resource Identifier (URI) for, this License with every copy of the Work You Distribute or Publicly Perform. You may not offer or impose any terms on the Work that restrict the terms of this License or the ability of the recipient of the Work to exercise the rights granted to that recipient under the terms of the License. You may not sublicense the Work. You must keep intact all notices that refer to this License and to the disclaimer of warranties with every copy of the Work You Distribute or Publicly Perform. When You Distribute or Publicly Perform the Work, You may not impose any effective technological measures on the Work that restrict the ability of a recipient of the Work from You to exercise the rights granted to that recipient under the terms of the License. This Section 4(a) applies to the Work as incorporated in a Collection, but this does not require the Collection apart from the Work itself to be made subject to the terms of this License. If You create a Collection, upon notice from any Licensor You must, to the extent practicable, remove from the Collection any credit as required by Section 4(c), as requested. If You create an Adaptation, upon notice from any Licensor You must, to the extent practicable, remove from the Adaptation any credit as required by Section 4(c), as requested.

b. You may Distribute or Publicly Perform an Adaptation only under the terms of: (i) this License; (ii) a later version of this License with the same License Elements as this License; (iii) a Creative Commons jurisdiction license (either this or a later license version) that contains the same License Elements as this License (e.g., Attribution-ShareAlike 3.0 US)); (iv) a Creative Commons Compatible License. If you license the Adaptation under one of the licenses mentioned in (iv), you must comply with the terms of that license.
If you license the Adaptation under the terms of any of the licenses mentioned in (i), (ii) or (iii) (the "Applicable License"), you must comply with the terms of the Applicable License generally and the following provisions: (I) You must include a copy of, or the URI for, the Applicable License with every copy of each Adaptation You Distribute or Publicly Perform; (II) You may not offer or impose any terms on the Adaptation that restrict the terms of the Applicable License or the ability of the recipient of the Adaptation to exercise the rights granted to that recipient under the terms of the Applicable License; (III) You must keep intact all notices that refer to the Applicable License and to the disclaimer of warranties with every copy of the Work as included in the Adaptation You Distribute or Publicly Perform; (IV) when You Distribute or Publicly Perform the Adaptation, You may not impose any effective technological measures on the Adaptation that restrict the ability of a recipient of the Adaptation from You to exercise the rights granted to that recipient under the terms of the Applicable License. This Section 4(b) applies to the Adaptation as incorporated in a Collection, but this does not require the Collection apart from the Adaptation itself to be made subject to the terms of the Applicable License.

c. If You Distribute, or Publicly Perform the Work or any Adaptations or Collections, You must, unless a request has been made pursuant to Section 4(a), keep intact all copyright notices for the Work and provide, reasonable to the medium or means You are utilizing: (i) the name of the Original Author (or pseudonym, if applicable) if supplied, and/or if the Original Author and/or Licensor designate another party or parties (e.g., a sponsor institute, publishing entity, journal) for attribution ("Attribution Parties") in Licensor's copyright notice, terms of service or by other reasonable means, the name of such party or parties; (ii) the title of the Work if supplied; (iii) to the extent reasonably practicable, the URI, if any, that Licensor specifies to be associated with the Work, unless such URI does not refer to the copyright notice or licensing information for the Work; and (iv) , consistent with Ssection 3(b), in the case of an Adaptation, a credit identifying the use of the Work in the Adaptation (e.g., "French translation of the Work by Original Author," or "Screenplay based on original Work by Original Author").
The credit required by this Section 4(c) may be implemented in any reasonable manner; provided, however, that in the case of a Adaptation or Collection, at a minimum such credit will appear, if a credit for all contributing authors of the Adaptation or Collection appears, then as part of these credits and in a manner at least as prominent as the credits for the other contributing authors. For the avoidance of doubt, You may only use the credit required by this Section for the purpose of attribution in the manner set out above and, by exercising Your rights under this License, You may not implicitly or explicitly assert or imply any connection with, sponsorship or endorsement by the Original Author, Licensor and/or Attribution Parties, as appropriate, of You or Your use of the Work, without the separate, express prior written permission of the Original Author, Licensor and/or Attribution Parties.

d. Except as otherwise agreed in writing by the Licensor or as may be otherwise permitted by applicable law, if You Reproduce, Distribute or Publicly Perform the Work either by itself or as part of any Adaptations or Collections, You must not distort, mutilate, modify or take other derogatory action in relation to the Work which would be prejudicial to the Original Author's honor or reputation. Licensor agrees that in those jurisdictions (e.g. Japan), in which any exercise of the right granted in Section 3(b) of this License (the right to make Adaptations) would be deemed to be a distortion, mutilation, modification or other derogatory action prejudicial to the Original Author's honor and reputation, the Licensor will waive or not assert, as appropriate, this Section, to the fullest extent permitted by the applicable national law, to enable You to reasonably exercise Your right under Section 3(b) of this License (right to make Adaptations) but not otherwise.

5. Representations, Warranties and Disclaimer

UNLESS OTHERWISE MUTUALLY AGREED TO BY THE PARTIES IN WRITING, LICENSOR OFFERS THE WORK AS-IS AND MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND CONCERNING THE WORK, EXPRESS, IMPLIED, STATUTORY OR OTHERWISE, INCLUDING, WITHOUT LIMITATION, WARRANTIES OF TITLE, MERCHANTIBILITY, FITNESS FOR A PARTICULAR PURPOSE, NONINFRINGEMENT, OR THE ABSENCE OF LATENT OR OTHER DEFECTS, ACCURACY, OR THE PRESENCE OF ABSENCE OF ERRORS, WHETHER OR NOT DISCOVERABLE. SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OF IMPLIED WARRANTIES, SO SUCH EXCLUSION MAY NOT APPLY TO YOU.

6. Limitation on Liability. EXCEPT TO THE EXTENT REQUIRED BY APPLICABLE LAW, IN NO EVENT WILL LICENSOR BE LIABLE TO YOU ON ANY LEGAL THEORY FOR ANY SPECIAL, INCIDENTAL, CONSEQUENTIAL, PUNITIVE OR EXEMPLARY DAMAGES ARISING OUT OF THIS LICENSE OR THE USE OF THE WORK, EVEN IF LICENSOR HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

7. Termination

a. This License and the rights granted hereunder will terminate automatically upon any breach by You of the terms of this License. Individuals or entities who have received Adaptations or Collections from You under this License, however, will not have their licenses terminated provided such individuals or entities remain in full compliance with those licenses. Sections 1, 2, 5, 6, 7, and 8 will survive any termination of this License.

b. Subject to the above terms and conditions, the license granted here is perpetual (for the duration of the applicable copyright in the Work). Notwithstanding the above, Licensor reserves the right to release the Work under different license terms or to stop distributing the Work at any time; provided, however that any such election will not serve to withdraw this License (or any other license that has been, or is required to be, granted under the terms of this License), and this License will continue in full force and effect unless terminated as stated above.

8. Miscellaneous

a. Each time You Distribute or Publicly Perform the Work or a Collection, the Licensor offers to the recipient a license to the Work on the same terms and conditions as the license granted to You under this License.


b. Each time You Distribute or Publicly Perform an Adaptation, Licensor offers to the recipient a license to the original Work on the same terms and conditions as the license granted to You under this License. c. If any provision of this License is invalid or unenforceable under applicable law, it shall not affect the validity or enforceability of the remainder of the terms of this License, and without further action by the parties to this agreement, such provision shall be reformed to the minimum extent necessary to make such provision valid and enforceable. d. No term or provision of this License shall be deemed waived and no breach consented to unless such waiver or consent shall be in writing and signed by the party to be charged with such waiver or consent. e. This License constitutes the entire agreement between the parties with respect to the Work licensed here. There are no understandings, agreements or representations with respect to the Work not specified here. Licensor shall not be bound by any additional provisions that may appear in any communication from You. This License may not be modified without the mutual written agreement of the Licensor and You. f. The rights granted under, and the subject matter referenced, in this License were drafted utilizing the terminology of the Berne Convention for the Protection of Literary and Artistic Works (as amended on September 28, 1979), the Rome Convention of 1961, the WIPO Copyright Treaty of 1996, the WIPO Performances and Phonograms Treaty of 1996 and the Universal Copyright Convention (as revised on July 24, 1971). These rights and subject matter take effect in the relevant jurisdiction in which the License terms are sought to be enforced according to the corresponding provisions of the implementation of those treaty provisions in the applicable national law. 
If the standard suite of rights granted under applicable copyright law includes additional rights not granted under this License, such additional rights are deemed to be included in the License; this License is not intended to restrict the license of any rights under applicable law.

Glossary of Key Terms

Ad-hoc Software: Software that is made for a particular purpose, to solve a specific instance of a problem, with no capability of solving general instances of the problem or of being expanded otherwise.

Attack Vector: A method by which a malicious entity might perform unintended and potentially harmful actions on a particular system.

C#: The C# programming language was designed by Microsoft through a development team led by Anders Hejlsberg. It was first announced in 2000, together with the first version of the .NET Framework.

DDoS Attack: A Distributed Denial of Service (DDoS) attack exploits a major flaw in the architecture of the web: the bandwidth available to a server is inversely proportional to the amount consumed by the clients connecting to it.

DHT: Distributed Hash Tables (DHTs) are a form of distributed system offering the basic functionality of a hash table, that is, storing key-value pairs (k, v) and looking up a value given its key (Gaurang, 2013).

Distributed File System: A non-localised file system, one where the files are mostly stored on remote computers rather than on the user’s own.

Exploit: The use of a piece of code or data in a particular system to perform malicious or unintended actions on the system’s structure or users.

FileSystemWatcher: A mechanism in Microsoft’s Windows operating system, exposed through the .NET Framework, for asynchronously watching for changes in files and folders.

Hashing Algorithm: A function that takes an input from a universal key space and maps it to a smaller key space, generating the hash value.

HTTP Protocol: The Hypertext Transfer Protocol (HTTP) is the most commonly used protocol on the web for transmitting web pages.


IP Protocol: The Internet Protocol (IP) is the main protocol used to identify networked nodes, whether in a Local Area Network (LAN) or on the Internet. Its main function is routing messages from one location to another given a certain address (Forouzan, 2002).

Kademlia: One of the major DHT structures widely used on the Internet, designed by Petar Maymounkov and David Mazières in 2002.

Key Size: The number of bits used to represent a key, typically used to calculate the number of possible keys.

Key Value Pair: A pair that defines a particular entry in a lookup table. Typically the key is used to look up the value.

Link Crawler: A program that takes one or more links, downloads their content and retrieves certain data from it, such as the links in the downloaded web pages.

Lookup Table: A mechanism by which values are stored and can be retrieved by means of a smaller input subset. A hash table is an example of a lookup table (Baset, 2006).

MVVM: Model View View-Model (MVVM) is a design pattern conceived by Microsoft as a more modern alternative to its existing mechanism for building and designing Graphical User Interfaces (GUIs), which was called Windows Forms (Microsoft, 2006).

N-Way Merge: An algorithm for sorting data in parallel by dividing it into smaller groups, which are sorted individually and then merged together.

OSI Model: The Open Systems Interconnection (OSI) model is a conceptual model for representing the abstract layers which form the current communication model over the Internet (Stallings, 1987).


Packet Analysis: A method by which the packets transmitted on a network are captured and examined, primarily for the purposes of debugging protocols and validating their functionality.

Peer-to-Peer Networks: Peer-to-Peer networks are an alternate model for establishing connections between a group of computers interested in communicating and exchanging data or resources with one another. The model does not make use of a centralized server; rather, it relies on the peers themselves to act as both clients and servers within the system (Bellovin, 2001).

Public Key Encryption: Public key encryption, also known as “asymmetric key encryption”, makes use of two distinct keys which are created together. The keys could be referred to interchangeably as the public key and the private key. A message M encrypted by one of the keys can be decrypted by the other (Cramer, 2003).

Python: The Python programming language was designed by Guido van Rossum and first appeared in 1991. It is a general-purpose, high-level, object-oriented language with an emphasis on code readability and simplicity (VanRossum, 1994).

SaaS: Software as a Service (SaaS) is a growing industry worth 12 billion US dollars a year. The primary aspects of SaaS revolve around offering applications which make use of significant and expensive computation or networking power (Buxmann, 2008).

Speech Recognition: A process by which computerised audio data is converted into textual strings.

Symmetric Key Encryption: A symmetric key encryption algorithm is one that, given an input (called “plain text”) and a key, outputs what is called “cipher text”, which results from deterministic transformations of the plain text based on the key (Paar, 2010).

TCP Protocol: The TCP Protocol serves as one of the backbone protocols of the Internet, together with IP, for transmitting messages in ordered sequences, guaranteeing reliable and error-checked transmission across various nodes, assuming no malicious interference (Forouzan, 2002).

Web Server: A specialised server used for hosting data so that it becomes accessible, mainly through the HTTP protocol.

WPF: Windows Presentation Foundation (WPF) represents each element in the UI as a separate object that can be used either independently on its own (for example, a button) or to group other objects (Microsoft, 2006).

XAML: An XML-based language used for defining and instantiating WPF objects (Microsoft, 2006).
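The N-Way Merge entry in this glossary describes a divide, sort and merge procedure, which can be illustrated with a short Python sketch. This is only an illustration under stated assumptions and not the implementation used by the system: the function name `n_way_merge_sort` and the parameter `n_groups` are hypothetical, and the groups are sorted one after another in a single process, whereas a parallel version would hand each group to a separate worker or peer.

```python
import heapq


def n_way_merge_sort(items, n_groups=4):
    """Sort items by splitting them into groups, sorting each group, then merging.

    Hypothetical helper for illustration; it runs sequentially in one process.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    data = list(items)
    # Divide: ceiling division yields at most n_groups chunks.
    size = max(1, -(-len(data) // n_groups))
    groups = [data[i:i + size] for i in range(0, len(data), size)]
    # Sort each group individually (the step that could be done in parallel).
    sorted_groups = [sorted(group) for group in groups]
    # Merge: heapq.merge combines already-sorted inputs into one sorted stream.
    return list(heapq.merge(*sorted_groups))


print(n_way_merge_sort([5, 3, 9, 1, 7, 2, 8], n_groups=3))  # [1, 2, 3, 5, 7, 8, 9]
```

The merge step only ever compares the current smallest element of each group, so it stays cheap even when the individual groups are large.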



References



Anderson, D. P., Cobb, J., Korpela, E., Lebofsky, M., & Werthimer, D. (2002). SETI@home: an experiment in public-resource computing. Communications of the ACM, 45(11), 56-61.



Babaioff, M., Dobzinski, S., Oren, S., & Zohar, A. (2012, June). On bitcoin and red balloons. In Proceedings of the 13th ACM Conference on Electronic Commerce (pp. 56-73). ACM.



Baker, M. G., Hartman, J. H., Kupfer, M. D., Shirriff, K. W., & Ousterhout, J. K. (1991, September). Measurements of a distributed file system. In ACM SIGOPS Operating Systems Review (Vol. 25, No. 5, pp. 198-212). ACM.



Barrett, P. (1987, January). Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor. In Advances in Cryptology–CRYPTO (Vol. 86, pp. 311-323).



Baset, S. A., & Schulzrinne, H. (2006, April). An analysis of the Skype peer-to-peer internet telephony protocol. In IEEE INFOCOM (Vol. 6, pp. 23-29).



Bellare, M., Desai, A., Jokipii, E., & Rogaway, P. (1997, October). A concrete security treatment of symmetric encryption. In Foundations of Computer Science, 1997. Proceedings., 38th Annual Symposium on (pp. 394-403). IEEE.



Bellare, M., Kilian, J., & Rogaway, P. (2000). The security of the cipher block chaining message authentication code. Journal of Computer and System Sciences, 61(3), 362-399.


Bellovin, S. (2001, June). Security aspects of Napster and Gnutella. In 2001 Usenix Annual Technical Conference.



Bellovin, S. M. (1989). Security problems in the TCP/IP protocol suite. ACM SIGCOMM Computer Communication Review, 19(2), 32-48.



Benadjila, R., Billet, O., Gueron, S., & Robshaw, M. J. (2009). The Intel AES instructions set and the SHA-3 candidates. In Advances in Cryptology–ASIACRYPT 2009 (pp. 162-178). Springer Berlin Heidelberg.



Berners-Lee, T., & Fischetti, M. (2001). Weaving the Web: The original design and ultimate destiny of the World Wide Web by its inventor. DIANE Publishing Company.



Berners-Lee, T., Fielding, R., & Frystyk, H. (1996). Hypertext transfer protocol--HTTP/1.0.



Bertoni, G., Daemen, J., Peeters, M., & Van Assche, G. (2009). Keccak specifications. Submission to NIST (Round 2).



Beverly, R., Berger, A., & Hyun, Y. (2009, November). Understanding the efficacy of deployed internet source address validation filtering. In Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference (pp. 356-369). ACM.



Bharambe, A. R., Herley, C., & Padmanabhan, V. N. (2012). Analyzing and improving BitTorrent performance. Microsoft Research, Microsoft Corporation One Microsoft Way Redmond, WA, 98052, 2005-03.



BitTorrent Labs (2013). Featured Experiments.



Buxmann, P., Hess, T., & Lehmann, S. (2008). Software as a Service. Wirtschaftsinformatik, 50(6), 500-503.



Chun, W. J. (2006). Core python programming. Prentice Hall PTR.



Clarke, I., Sandberg, O., Wiley, B., & Hong, T. W. (2001, January). Freenet: A distributed anonymous information storage and retrieval system. In Designing Privacy Enhancing Technologies (pp. 46-66). Springer Berlin Heidelberg.



Cohen, B. (2008). The BitTorrent protocol specification.



Connor, J. M. (2007). Global price fixing. Springer.



Cooper, M., Griffin, J., & Knowledge, P. (2012). The role of antitrust in protecting competition, innovation and consumers as the digital revolution matures: The case against the Universal-EMI merger and e-book price fixing.

Cormen, T. H. (2013). Algorithms Unlocked. MIT Press (MA).



Courtois, N. T. (2007). CTC2 and fast algebraic attacks on block ciphers revisited. IACR ePrint report, 152, 2007.



Cramer, R., & Shoup, V. (2003). Design and analysis of practical public-key encryption schemes secure against adaptive chosen ciphertext attack. SIAM Journal on Computing, 33(1), 167-226.



Cukier, K. N. (2005). Who will control the Internet? Washington battles the world. Foreign Affairs, 7-13.



Dean, J., & Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1), 107-113.



Deering, S. E. (1998). Internet protocol, version 6 (IPv6) specification.



Diffie, W., & Hellman, M. (1976). New directions in cryptography. Information Theory, IEEE Transactions on, 22(6), 644-654.



Dingledine, R., Mathewson, N., & Syverson, P. (2004). Tor: The second-generation onion router. Naval Research Lab, Washington, DC.



Duan, M., & Lai, X. (2012). Improved zero-sum distinguisher for full round Keccak-f permutation. Chinese Science Bulletin, 57(6), 694-697.



Feldhofer, M., Wolkerstorfer, J., & Rijmen, V. (2005, October). AES implementation on a grain of sand. In Information Security, IEE Proceedings (Vol. 152, No. 1, pp. 13-20). IET.



Ferri, M. (2012). A Detailed Look Inside The Illegal Movie Market. Working Paper, August.


Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., & Berners-Lee, T. (1999). Hypertext transfer protocol–HTTP/1.1.



Fok, C. L., Roman, G. C., & Lu, C. (2005, June). Rapid development and flexible deployment of adaptive wireless sensor network applications. In Distributed Computing Systems, 2005. ICDCS 2005. Proceedings. 25th IEEE International Conference on (pp. 653-662). IEEE.



Forouzan, B. A. (2002). TCP/IP protocol suite. McGraw-Hill, Inc.



Freeman, A. (2010). Windows Presentation Foundation. In Introducing Visual C# 2010 (pp. 1069-1098). Apress.



Gandhi, A., Gupta, V., Harchol-Balter, M., & Kozuch, M. A. (2010). Optimality analysis of energy-performance trade-off for server farm management. Performance Evaluation, 67(11), 1155-1171.



Gannod, G. C., Burge, J. E., & Helmick, M. T. (2008, May). Using the inverted classroom to teach software engineering. In Proceedings of the 30th international conference on Software engineering (pp. 777-786). ACM.



Gaurang, P. (2013). Distributed hash table.



Grembowski, T., Lien, R., Gaj, K., Nguyen, N., Bellows, P., Flidr, J., ... & Schott, B. (2002). Comparative analysis of the hardware implementations of hash functions SHA-1 and SHA-512. In Information Security (pp. 75-89). Springer Berlin Heidelberg.






Harkins, D., & Carrel, D. (1998). The internet key exchange (IKE). RFC 2409, november.



Harwit, E., & Clark, D. (2001). Shaping the internet in China: Evolution of political control over network infrastructure and content. Asian Survey, 41(3), 377-408.

Hatahet, S., Challal, Y., & Bouabdallah, A. (2010, June). BiTIT: Throttling BitTorrent illegal traffic. In Computers and Communications (ISCC), 2010 IEEE Symposium on (pp. 708-713). IEEE.



Hazan, J. G. (2013). Stop Being Evil: A Proposal for Unbiased Google Search. Mich. L. Rev., 111, 789.



Hejlsberg, A., Wiltamuth, S., & Golde, P. (2006). The C# programming language. Addison-Wesley Professional.



Hills, J. (2006). What's New? War, Censorship and Global Transmission From the TeleGraph to the Internet. International Communication Gazette, 68(3), 195-216.



Howard, J. H., Kazar, M. L., Menees, S. G., Nichols, D. A., Satyanarayanan, M., Sidebotham, R. N., & West, M. J. (1988). Scale and performance in a distributed file system. ACM Transactions on Computer Systems (TOCS), 6(1), 51-81.



Huan, T. (2013). The Application of SSL Protocol in Computer Network Communication. In Intelligence Computation and Evolutionary Computation (pp. 779-783). Springer Berlin Heidelberg.



Izu, T., Morikawa, Y., Nogami, Y., Sakemi, Y., & Takenaka, M. (2012). Detailed cost estimation of CNTW attack against EMV signature scheme. In Financial Cryptography and Data Security (pp. 13-26). Springer Berlin Heidelberg.



Jun, B., & Kocher, P. (1999). The Intel random number generator. Cryptography Research Inc. white paper.



Kalafut, A., Acharya, A., & Gupta, M. (2006, October). A study of malware in peer-to-peer networks. In Proceedings of the 6th ACM SIGCOMM conference on Internet measurement (pp. 327-332). ACM.



Kaluszka, A. (2010). Distributed Hash Tables.



Kelsey, J., Schneier, B., Wagner, D., & Hall, C. (1998, January). Cryptanalytic attacks on pseudorandom number generators. In Fast Software Encryption (pp. 168-188). Springer Berlin Heidelberg.


Kokholm, N., & Sestoft, P. (2006). The C5 generic collection library for C# and CLI. The IT University of Copenhagen.



Kristol, D. M. (2001). HTTP Cookies: Standards, privacy, and politics. ACM Transactions on Internet Technology (TOIT), 1(2), 151-198.



Kumar, M. (2012). Implementation Of Data Encryption Standard (DES) & Implementation of SHA-512 Algorithm for Attaining Digital Signature and Message Authentication (Doctoral dissertation).



Larsen, P. D., Kristoffersen, E. N., Mccrae, D., Kiziltunc, M. K., & Glasson, S. (2008). U.S. Patent Application 12/163,687.



Lipmaa, H., Wagner, D., & Rogaway, P. (2000). Comments to NIST concerning AES modes of operation: CTR-mode encryption.



Louridas, P. (2007). Declarative GUI programming in Microsoft Windows. Software, IEEE, 24(4), 16-19.



Lundh, F. (2001). Python Standard Library. O'Reilly Media, Inc.



Luo, M., & Deters, R. (2012). Improving access control for mobile consumers of services by use of context and trust within the call-stack. In Advances in User Modeling (pp. 256-267). Springer Berlin Heidelberg.



Maurer, B., Nelms, T. C., & Swartz, L. (2013). “When perhaps the real problem is money itself!”: the practical materiality of Bitcoin. Social Semiotics, (ahead-of-print), 1-17.



Maymounkov, P., & Mazieres, D. (2002). Kademlia: A peer-to-peer information system based on the xor metric. Peer-to-Peer Systems, 53-65.



McCoy, D., Bauer, K., Grunwald, D., Kohno, T., & Sicker, D. (2008). Shining light in dark places: Understanding the Tor network. In Privacy Enhancing Technologies (pp. 63-76). Springer Berlin/Heidelberg.




Mell, P., & Grance, T. (2011). The NIST definition of cloud computing (draft). NIST special publication, 800, 145.



Mintzberg, H. (2010). Structure in sevens, designing effective organizations.



Nakamoto, S. (2008). Bitcoin: A peer-to-peer electronic cash system. Consulted, 1, 2012.



Naor, M., & Wieder, U. (2003). A simple fault tolerant distributed hash table. In Peer-to-Peer Systems II (pp. 88-97). Springer Berlin Heidelberg.



Ortloff, S. Botnet shutdown success story-again: Disabling the new hlux/kelihos botnet. Computer Fraud Security.



Paar, C., & Pelzl, J. (2010). Understanding Cryptography. The RSA Cryptosystem.



Pouwelse, J., Garbacki, P., Epema, D., & Sips, H. (2005). The BitTorrent P2P file-sharing system: Measurements and analysis. Peer-to-Peer Systems IV, 205-216.



Powell, J. N., Beaulé, P. E., Antoniou, J., Bourne, R. B., Schemitsch, E. H., Vendittoli, P., ... & Smit, A. (2012). A survey of the Canadian Resurfacing Working Group experience: Rates of conversion from RSA to THR. Journal of Bone & Joint Surgery, British Volume, 94(SUPP XXXVIII), 165-165.



Preneel, B. (1994). Cryptographic hash functions. European Transactions on Telecommunications, 5(4), 431-448.



Ripeanu, M. (2001, August). Peer-to-peer architecture case study: Gnutella network. In Peer-to-Peer Computing, 2001. Proceedings. First International Conference on (pp. 99-100). IEEE.



Rogaway, P., & Shrimpton, T. (2004, January). Cryptographic hash-function basics: Definitions, implications, and separations for preimage resistance, second-preimage resistance, and collision resistance. In Fast Software Encryption (pp. 371-388). Springer Berlin Heidelberg.


Rosenberg, J. (2010). Interactive connectivity establishment (ICE): A protocol for network address translator (NAT) traversal for offer/answer protocols.






Schildt, H. (2008). C# 3.0: The Complete Reference (3rd ed.). McGraw-Hill, Inc.



Shen, X., Wang, H. Y., & Wang, K. (2012). U.S. Patent No. 8,140,828. Washington, DC: U.S. Patent and Trademark Office.



Shujun, L., Xuanqin, M., & Yuanlong, C. (2001). Pseudo-random bit generator based on couple chaotic systems and its applications in stream-cipher cryptography. In Progress in Cryptology—INDOCRYPT 2001 (pp. 316-329). Springer Berlin Heidelberg.



Sidenius, K., Jensen, M., Tommerup, E., Magdalena, F., & Jensen, B. (2012). Copyright vs. Megaupload (Doctoral dissertation).



Siganos, G., Pujol, J. M., & Rodriguez, P. (2009). Monitoring the bittorrent monitors: A bird’s eye view. In Passive and Active Network Measurement (pp. 175-184). Springer Berlin Heidelberg.



Sinclair, G. (2002). The Internet in China: Information revolution or authoritarian solution?. Modem Chinese Studies.



Smith, J. (2009). WPF apps with the Model-View-ViewModel design pattern. MSDN Magazine.



Stallings, W. (1987). Handbook of computer-communications standards; Vol. 1: the open systems interconnection (OSI) model and OSI-related standards. Macmillan Publishing Co., Inc.



Steiner, T., & Hausenblas, M. (2010, November). SemWebVid-making video a first class semantic web citizen and a first class web Bourgeois. In 9th International Semantic Web Conference (ISWC’10).


Stevens, R., Gibler, C., Crussell, J., Erickson, J., & Chen, H. (2012). Investigating user privacy in android ad libraries. IEEE Mobile Security Technologies (MoST).



Tian, Y., Wu, D., & Ng, K. W. (2006, April). Modeling, analysis and improvement for bittorrent-like file sharing networks. In INFOCOM 2006. 25th IEEE International Conference on Computer Communications. Proceedings (pp. 1-11). IEEE.



Turner, M., Budgen, D., & Brereton, P. (2003). Turning software into a service. Computer, 36(10), 38-44.



Van Rossum, G. (1994). Python programming language.



Washington, L. C., & Trappe, W. (2002). Introduction to cryptography: with coding theory. Prentice Hall PTR.



Williams, M. (2002). Microsoft Visual C# (core reference). Microsoft Press.



Yang, A. M., Jiang, S. Y., & Deng, H. (2008, November). A P2P network traffic classification method using SVM. In Young Computer Scientists, 2008. ICYCS 2008. The 9th International Conference for (pp. 398-403). IEEE.



Yang, H. C., Dasdan, A., Hsiao, R. L., & Parker, D. S. (2007, June). Map-reduce-merge: simplified relational data processing on large clusters. In Proceedings of the 2007 ACM SIGMOD international conference on Management of data (pp. 1029-1040). ACM.



Yuan, S., Abidin, A. Z., Sloan, M., & Wang, J. (2012). Internet Advertising: An Interplay among Advertisers, Online Publishers, Ad Exchanges and Web Users. arXiv preprint arXiv:1206.1754.


Index

A
address space, 39
Ad-hoc Application Ports, 143
AES Algorithm, 48
Amazon’s EC^2, 59
anonymity, 16
Anonymity, 34
Authentication, 84
Automaton, 84

B
Bit Coin, 38
Bit Torrent, 38
BitCoin, 75
BitTorrent Sync, 71
Block Ciphers, 46
Bootstrapping, 82
Botnets, 38

C
C# Programming Language, 63
censorship, 29
Collision Resistance, 54
Conclusion, 138

D
Data Analysis Tools, 114
DDoS, 29, 33
DHT Buckets Structure, 92
distributed computation, 16
Distributed File System, 95
Distributed File Systems, 45
Distributed Hash Table, 91
Distributed Hash Tables, 37
Dynamic Cloud Storage, 143

F
File Identifiers, 95
Freenet, 38
Future Works, 142

G
Gantt Chart, 22
Geographical Location, 118
Graphics Processing Unit, 143

H
Hashing Algorithms, 54
HTTP Protocol, 57

I
Image processing, 142
Integrating with a search engine, 145
Integrity Check, 130
Internationalisation, 64
Introduction, 14
IP Protocol, 26
ISPs, 28

K
Kademlia, 40
Key Exchange, 53

L
Lambda functions, 64
Licensing, 22
Link Crawler, 112
LINQ, 64
Load Balancing, 112
Local File Management, 98
Lookup method, 39

M
Management Methodology, 19
Map Reduce, 69
Mass-network sniffing, 34
Message Exchange, 85
Microsoft’s Windows Azure, 59
Model View View-Model (MVVM), 65
Motivation, 14

N
Native Algorithmic Libraries, 142
Network Filtering, 126
Network Latency, 119
Network Performance, 118
No-IP, 117
N-Way Merge, 109

O
Objective, 16
Omega Python, 102
One-way Function, 54
Optimality, 133
OSI Model, 25

P
Passive Listening, 82
Peer-to-Peer Networks, 30
Performance Analyser, 115
Project Management, 19
Project Schedule, 21
Proxification, 17, 87
Public Key Encryption, 50
Python, 17
Python programming language, 61

R
RAD, 19
Relationship between System Components, 18
RSA algorithm, 51

S
Server software exploits, 34
SETI@Home, 36
SHA 512, 55
Skype, 73
Software as a Service (SaaS), 59
Speech Recognition Libraries, 143
spoofing, 29
SQL, 142
Stream Ciphers, 46
Streaming, 142
Symmetric Key Encryption, 46

T
TCP, 16
TCP Connection, 82
TCP Protocol, 27
Timing attacks, 52
Tor Network, 33
Trace Route, 116

V
Version Control, 126

W
Web Server, 17, 93
Wireshark, 114
WPE Pro, 115

X
XAML, 67

Acronyms

AES: Advanced Encryption Standard
Bit: Binary Digit
BT: BitTorrent
CPU: Central Processing Unit
DDoS: Distributed Denial of Service
DES: Data Encryption Standard
FSW: File System Watcher
GPU: Graphics Processing Unit
HTTP: Hypertext Transfer Protocol
IIS: Internet Information Services
IP: Internet Protocol
ISP: Internet Service Provider
LAN: Local Area Network
LINQ: Language-Integrated Query
MVVM: Model View View-Model
NIST: National Institute of Standards and Technology
OOP: Object Oriented Programming
OSI: Open Systems Interconnection
P2P: Peer-to-Peer
RAD: Rapid Application Development
RAM: Random Access Memory


RSA: Rivest Shamir Adleman
SaaS: Software as a Service
SETI: Search for Extraterrestrial Intelligence
SHA: Secure Hash Algorithm
SQL: Structured Query Language
TCP: Transmission Control Protocol
URL: Uniform Resource Locator
WPE: Winsock Packet Editor
WPF: Windows Presentation Foundation
XAML: Extensible Application Markup Language
XML: Extensible Markup Language
XOR: Exclusive Or
