Challenges in P2P Computing

Lionel M. Ni
Department of Computer Science
Hong Kong Univ. of Science & Technology
[email protected]
http://www.cs.ust.hk/~ni

Peer-to-Peer Model
❑ P2P: a class of systems and applications that employ distributed resources to perform a critical function in a decentralized manner
  ❍ Link the resources of all peers
  ❍ Resources: storage, CPU cycles, content, etc.
  ❍ All peers are servers and equal – highly scalable
  ❍ All peers are autonomous (different owners)
  ❍ Peers are both clients and servers
❑ Examples: Napster, Gnutella, KaZaA, BitTorrent, eDonkey…

Taxonomy of P2P File Sharing Networks
❑ Unstructured: file placement is unrelated to the overlay topology
  ❍ With a central server: Napster
  ❍ Fully decentralized: Gnutella
  ❍ Hierarchical: KaZaA
❑ Structured: the overlay topology & file (or file indices) placement are tightly controlled
  ❍ One-dimensional coordinate space: Chord

Pure Decentralized Model
❑ Gnutella
  ❍ Flood queries within the horizon (TTL)
  ❍ Query hit sent back along the request path
  ❍ Select peer/file to download
  ❍ Directly download from selected peers
❑ Disadvantages
  ❍ Not efficient
  ❍ Query loss problem
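The flooding step can be illustrated with a minimal sketch. It assumes a tiny hypothetical overlay (the peer names and links in `OVERLAY` are made up for illustration) and counts how many query messages are sent and how many of them are duplicates that the receiver has already seen. It is not Gnutella's actual wire protocol.

```python
from collections import deque

# Hypothetical overlay: each peer maps to the peers it is connected to.
OVERLAY = {
    "S": ["L", "M"],
    "L": ["S", "M", "P"],
    "M": ["S", "L", "P"],
    "P": ["L", "M", "Q"],
    "Q": ["P"],
}

def flood(overlay, source, ttl):
    """Flood a query from source; return (peers reached, messages sent, duplicates)."""
    seen = {source}
    messages = 0
    duplicates = 0
    # Each entry: (peer, peer it received the query from, remaining TTL).
    queue = deque([(source, None, ttl)])
    while queue:
        peer, sender, remaining = queue.popleft()
        if remaining == 0:
            continue  # horizon reached: the query is not forwarded further
        for neighbor in overlay[peer]:
            if neighbor == sender:
                continue  # never send the query straight back
            messages += 1
            if neighbor in seen:
                duplicates += 1  # the neighbor drops the repeated query
            else:
                seen.add(neighbor)
                queue.append((neighbor, peer, remaining - 1))
    return seen, messages, duplicates

reached, messages, duplicates = flood(OVERLAY, "S", ttl=7)
print(sorted(reached), messages, duplicates)
```

Even in this five-peer overlay, half of the messages are duplicates, which hints at why blind flooding is listed as inefficient. Lowering `ttl` shrinks the horizon and the traffic, at the price of missing peers beyond it.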

Hierarchical Model
❑ Hybrid architecture: KaZaA
  ❍ A decentralized network of centralized clusters
  ❍ Sacrifice anonymity to achieve efficiency
  ❍ Cross between Napster and Gnutella
❑ Each node is either a super-peer or assigned to a super-peer
❑ Search mechanism among super-peers is blind flooding, same as in Gnutella

Existing P2Ps not Scalable
❑ P2P traffic contributes the largest portion of Internet traffic (ACM SIGCOMM’02)
❑ P2P dominates the campus network, consuming 43% of all bandwidth compared to 14% for WWW traffic (OSDI 2002)
❑ Given that 95% of node pairs are less than 7 hops apart and TTL=7, Gnutella’s current search generates 330 TB/month with only 50,000 nodes
❑ Gnutella has around 2 million users online at any time; KaZaA has 3-4 million users online at any time

P2P Computing
❑ Can P2P survive without a business model?
  ❍ Low entrance barrier
  ❍ Low maintenance cost
❑ How to grow a P2P network?
  ❍ Traffic efficient
  ❍ Attract more peers
    • Rich contents – Willing to contribute
    • Fast response time
    • Trustworthy environment – Anonymity; secure; integrity; availability

Traffic Efficient P2P
❑ Topology Mismatch
❑ Efficient Search
❑ Multi-download
❑ Replication and Cache

Message Duplications in Overlay Connections
[Figure: overlay nodes S, L, M, P and Q, with nodes annotated as receiving 2 or 3 copies of the same message]
❑ The same message traverses the same channel twice (LM and MP)
❑ Implosion: duplicated messages are sent to the same node

Topology Mismatch Problem

S is the source. The longest physical link, SC, will be traversed three times when the overlay does not match the physical topology.
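A small sketch can make the link-stress effect concrete. The figure's real topology is not recoverable from the text, so the routes below are hypothetical: peers A, B and D and the mapping in `PHYSICAL_ROUTE` are assumptions chosen only so that the physical link S–C is crossed three times, as in the slide's statement.

```python
from collections import Counter

# Hypothetical mapping from each overlay connection to the physical links it crosses.
# Peers A, B, D and all routes are assumptions made for illustration.
PHYSICAL_ROUTE = {
    ("S", "A"): [("S", "C"), ("C", "A")],
    ("S", "B"): [("S", "C"), ("C", "B")],
    ("A", "D"): [("A", "C"), ("C", "S"), ("S", "D")],
}

def link_stress(overlay_hops, routes):
    """Count how many times each undirected physical link carries the message."""
    stress = Counter()
    for hop in overlay_hops:
        for u, v in routes[hop]:
            stress[frozenset((u, v))] += 1
    return stress

# One query forwarded S -> A, S -> B, then A -> D in the overlay.
STRESS = link_stress([("S", "A"), ("S", "B"), ("A", "D")], PHYSICAL_ROUTE)
for link, count in sorted(STRESS.items(), key=lambda item: -item[1]):
    print("-".join(sorted(link)), count)
```

In this made-up layout the overlay hop A -> D doubles back through C and S, so a single logical query loads the S–C link three times. An overlay that connected D directly to S would avoid the extra crossings.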

Topology Mismatch Problem

Up to 70% of query responses travel along mismatched paths.

What Do We Know Now?
❑ Even with global knowledge of all peering and non-peering nodes, it is difficult, if not impossible, to compute an optimal overlay topology when the system has millions of nodes that join and leave at random.
❑ The reality is even worse:
  ❍ we don’t have global knowledge;
  ❍ we don’t have a central server.
❑ Therefore, we need to explore distributed approaches.

Our Contributions
❑ ACE (ICDCS04) and AOTO (Globecom03)
  ❍ 1-hop neighbor information
  ❍ The simplest and slowest
❑ LTM (Infocom04, IEEE TPDS)
  ❍ Location-aware approach
  ❍ The convergence speed of LTM is the fastest, but it needs synchronization.
❑ SBO (IPDPS04)
  ❍ With half the overhead of ACE/AOTO, reduces the traffic cost the most.

Free Riding Problem
❑ Free riding
  ❍ No individual is willing to contribute towards the cost of something (public goods) when he/she hopes that someone else will bear the cost instead
❑ Free riding leads to degradation of the performance of the system and adds vulnerability to the system
❑ Lacks incentives for cooperation
❑ In P2P, many people just download files contributed by others and never share any of their files
  ❍ Files shared in P2P are ‘public goods’
  ❍ Free rider statistics in Gnutella: 70% of users share no files

Free Riding Challenges
❑ What exactly is the free riding problem and what are the consequences?
❑ Could the problem be solved? How?
❑ How can an “incentive mechanism” be created?
  ❍ Incentive policy: Maze (Peking U.)
  ❍ Credit management: fully distributed
  ❍ Anonymity
  ❍ Availability of open source
❑ Other models: e.g., game theoretic model
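One way to picture a credit-based incentive is the sketch below. It is not Maze's policy: the `CreditLedger` class, the starting balance and the one-credit price are all hypothetical. It also keeps every balance in a single object for brevity, whereas the slide calls for fully distributed credit management.

```python
class CreditLedger:
    """Toy credit scheme: uploading earns credits, downloading spends them."""

    def __init__(self, initial_credit=2):
        self.initial_credit = initial_credit
        self.credits = {}

    def balance(self, peer):
        # New peers start with a small allowance so they can join the system.
        return self.credits.setdefault(peer, self.initial_credit)

    def transfer(self, downloader, uploader, cost=1):
        """Record one download; return False if the downloader cannot pay."""
        if self.balance(downloader) < cost:
            return False
        self.credits[downloader] -= cost
        self.credits[uploader] = self.balance(uploader) + cost
        return True

ledger = CreditLedger(initial_credit=2)
results = [ledger.transfer("rider", "sharer") for _ in range(3)]
print(results, ledger.balance("rider"), ledger.balance("sharer"))
```

A peer that only downloads runs out of credit after its allowance is spent, while a peer that shares keeps earning. A real design would also have to handle peers that rejoin under a new name to reset their balance.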

P2P Anonymity
❑ Should be an option to peers
❑ Different levels of anonymity
  ❍ Publisher Anonymity
    • Hide publisher identity to resist censorship
    • Dilemma: Anonymity or Authenticity
  ❍ Initiator (requester) Anonymity
  ❍ Responder Anonymity
  ❍ Mutual Anonymity

Path-based Initiator Anonymity
[Figure: initiator I, responder R, and intermediate nodes X, Y and Z]

Mix and Onion Routing
[Figure: initiator B sends a layered message to A through relays x and y. B → x: “x: send to y” wrapping “y: send to A” wrapping “A: message”; x → y: “y: send to A” wrapping “A: message”; y → A: “A: message”.]
❑ x only knows B and y, but x does not know whether B is the initiator
❑ The identity of B is hidden from x, y, A.
❑ The message is hidden from x, y.
❑ B knows everything.
❑ Onion routing improves on Mix by using symmetric keys
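The layering idea can be sketched as follows. This is a toy: `keystream_xor` is a hypothetical stand-in cipher built from SHA-256 and is NOT secure, and the keys in `KEYS` are made up. The point is only to show that each relay can peel exactly one layer and learns nothing but the next hop.

```python
import hashlib
import json

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (NOT secure): XOR data with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def wrap(path, keys, message):
    """Build the onion inside-out: the innermost layer is for the last node in path."""
    payload = json.dumps({"next": None, "data": message}).encode()
    onion = keystream_xor(keys[path[-1]], payload)
    for node, next_hop in zip(reversed(path[:-1]), reversed(path[1:])):
        layer = json.dumps({"next": next_hop, "data": onion.hex()}).encode()
        onion = keystream_xor(keys[node], layer)
    return onion

def peel(node, keys, onion):
    """A relay removes its own layer and learns only the next hop."""
    layer = json.loads(keystream_xor(keys[node], onion).decode())
    return layer["next"], layer["data"]

# Hypothetical symmetric keys that the initiator B shares with x, y and A.
KEYS = {"x": b"key-x", "y": b"key-y", "A": b"key-A"}

onion = wrap(["x", "y", "A"], KEYS, "hello")
hop, data = peel("x", KEYS, onion)               # x learns only "send to y"
hop, data = peel("y", KEYS, bytes.fromhex(data))  # y learns only "send to A"
hop, data = peel("A", KEYS, bytes.fromhex(data))  # A recovers the message
print(hop, data)
```

Only B, who built the onion, knows the whole path, which matches the slide's remark that B knows everything. A real system would use an authenticated cipher and fixed-size cells instead of this XOR toy.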

Mutual Anonymity
● Mutual communication anonymity: (1) hiding requester’s identity, (2) hiding responder’s identity, and (3) no others are able to guess the two parties and the shared document.
● Previous studies
  ● Onion routing: initiator knows everything and responder knows nothing
  ● APFS
  ● P5

Mutual Anonymity: APFS
❑ A and B build paths, and publicize their tail nodes (agents).
❑ A and B connect to each other through their tails.
  ❍ A and B anonymously send packets to their tails (TA, TB)
  ❍ Their tails then overtly contact the responder’s tail node.
  ❍ The responder’s tail passes the packets to the anonymous responder.
[Figure: peers A, B, C and D with their tail nodes TA, TB, TC and TD]

Problems with Mutual Anonymity
❑ Hard to achieve in a bidirectional way
  ❍ Without knowledge of each other, how to accomplish the transaction? The other party is anonymous, yet must be known to be good
  ❍ How to trust the opposing party?
❑ High overhead
  ❍ Cryptography cost added (Onion Routing)
  ❍ Consuming a lot of capacity, bandwidth, storage… (splitting file into shares, multicasting…)
❑ Conflict with some design goals of P2P
  ❍ Efficiency
  ❍ Decentralization: agent management
  ❍ Peer discovery: search is more complicated

Trustworthy P2P
❑ Provide a trustworthy P2P environment
  ❍ Peer trustiness
  ❍ Anonymity (option)
  ❍ File integrity
  ❍ Availability
❑ Malicious peers
  ❍ Fake information
  ❍ Create unnecessary traffic
  ❍ Virus spreading
❑ Handling of malicious peers
  ❍ Detection and prevention

Peer Trustiness
❑ Peer-to-peer is fully distributed
  ❍ With no central coordination
  ❍ No central database
  ❍ No global view of the system
  ❍ Peers are autonomous, and may be anonymous
  ❍ Peers are unreliable
  ❍ Transactions are performed between peers
❑ How can a peer trust another peer?

Four Approaches of “Authentic”
❑ Oldest Document
  ❍ The oldest submission is considered authentic
  ❍ Timestamping systems
    • Can you trust the timestamp?
    • Very difficult without a centralized CA
❑ Expert-based
  ❍ After search results are returned, ask experts for opinion.
  ❍ Authoritative nodes keep track of signatures
  ❍ Expert group management
    • Many expert groups with different opinions
    • Experts get feedback from peers to update their opinion
    • Can experts trust feedback without a centralized CA?

Four Approaches of “Authentic” (cont’d)
❑ Voting-based
  ❍ Votes of many experts
  ❍ Voter/Expert management
  ❍ Asking for votes from experts (not all experts will return the vote)
  ❍ Experts may be humans
  ❍ Spoofing of votes, nodes and files
❑ Reputation-based
  ❍ Weight votes, some experts more trustworthy
  ❍ Dynamically adjust the reputation
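The difference between plain voting and reputation-weighted voting can be shown with a short sketch. The function names, the `default_weight` for unknown experts and the fixed `step` used to adjust reputations are assumptions for illustration, not a scheme taken from the slides.

```python
def weighted_verdict(votes, reputation, default_weight=0.1):
    """votes maps expert -> True (authentic) or False (bogus).

    Returns (verdict, score), where score lies in [-1, 1].
    """
    total = 0.0
    weight_sum = 0.0
    for expert, says_authentic in votes.items():
        weight = reputation.get(expert, default_weight)
        total += weight if says_authentic else -weight
        weight_sum += weight
    score = total / weight_sum if weight_sum else 0.0
    return score > 0, score

def update_reputation(reputation, votes, truth, step=0.1, default_weight=0.1):
    """Raise experts who voted correctly, lower the others, clamp to [0, 1]."""
    updated = dict(reputation)
    for expert, says_authentic in votes.items():
        current = updated.get(expert, default_weight)
        delta = step if says_authentic == truth else -step
        updated[expert] = min(1.0, max(0.0, current + delta))
    return updated

VOTES = {"e1": True, "e2": False, "e3": False}
REPUTATION = {"e1": 0.9, "e2": 0.2, "e3": 0.3}
verdict, score = weighted_verdict(VOTES, REPUTATION)
print(verdict, round(score, 3))
```

Here a simple majority would call the file bogus (two votes against one), but the single highly reputed expert outweighs the two weakly reputed ones. `update_reputation` then shifts weights once the truth is known, which is one possible reading of "dynamically adjust the reputation".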

Three elements: Identity, Trust, Reputation
❑ Identity
  ❍ Who is making a statement (responder)
❑ Trust
  ❍ Can I believe the person who is making the statement
❑ Reputation
  ❍ What is the history of trust in the person making the statement
    • Build a reputation history for each pair of peers
  ❍ Reputation management

Reputation-based Trust Management
❑ Trust management
  ❍ A mechanism for establishing mutual trust.
  ❍ Peer A trusting B does not imply that B trusts A
❑ Reputation
  ❍ A measure derived from direct or indirect knowledge of earlier transactions.
❑ Reputation-based trust management
  ❍ One specific form of trust management.

Strategy of Reputation-based Trust Management
❑ Get opinion list based on self observation
❑ Propagate it by reputation collection
  ❍ Voting
  ❍ From friends (friends of friends)
❑ Building a reputation matrix
  ❍ A peer’s reputation may differ from one observing peer to another
  ❍ Calculating own part
  ❍ Cooperating to obtain a global matrix
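A minimal sketch of such a reputation matrix follows. The scores in `LOCAL` and the weighting rule (friends' opinions weighted by how much the asker trusts each friend) are assumptions for illustration; the slides do not specify a formula.

```python
# Hypothetical local opinions from self observation: LOCAL[a][b] is how much a trusts b.
# Rows differ per observer, so trust is not symmetric.
LOCAL = {
    "A": {"B": 0.9, "C": 0.5},
    "B": {"A": 0.2, "D": 0.8},
    "C": {"D": 0.4},
}

def indirect_reputation(local, asker, target):
    """Estimate target's reputation from friends' opinions, weighted by trust in each friend."""
    weighted = 0.0
    weight_sum = 0.0
    for friend, trust_in_friend in local.get(asker, {}).items():
        opinion = local.get(friend, {}).get(target)
        if opinion is None:
            continue  # this friend has never dealt with the target
        weighted += trust_in_friend * opinion
        weight_sum += trust_in_friend
    return weighted / weight_sum if weight_sum else None

def reputation(local, asker, target):
    """Prefer the asker's own observation; otherwise fall back to friends' opinions."""
    direct = local.get(asker, {}).get(target)
    if direct is not None:
        return direct
    return indirect_reputation(local, asker, target)

print(reputation(LOCAL, "A", "B"), reputation(LOCAL, "B", "A"))
print(reputation(LOCAL, "A", "D"))
```

A trusts B far more than B trusts A, and A's view of D is computed purely from what B and C report. Each peer only fills in its own row, so a global matrix would emerge only if peers cooperate and exchange rows.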

More Issues
❑ Secure Score Management
  ❍ Voting among multiple score managers
  ❍ Peer score held by another peer
❑ Threat scenarios
  ❍ Malicious individuals (always bad)
  ❍ Malicious collectives (always bad, trust among bad peers)
  ❍ Camouflaged collectives (sometimes good to trick people)
  ❍ Malicious spies (good all the time but friends with bad folks)

File Authenticity
❑ Given a query, the authentic response has to be distinguished
  ❍ Associated with the reputation of the provider
❑ How to guarantee “authenticity”?
❑ Including file integrity
  ❍ CRC, hashing, MACs, digital signatures
  ❍ Provide secure transfer
❑ File authenticity
  ❍ How do you know you have the right file?
  ❍ Bogus copies
  ❍ Corrupt copies
❑ Need detection/correction mechanisms
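Two of the integrity tools listed above, hashing and MACs, can be sketched with Python's standard `hashlib` and `hmac` modules. The helper names and the sample key and content are hypothetical. The sketch assumes the expected digest or the shared key reaches the downloader through some trusted channel, which is the hard part in a P2P setting.

```python
import hashlib
import hmac

def sha256_hex(content: bytes) -> str:
    """Digest that a provider could publish alongside the file."""
    return hashlib.sha256(content).hexdigest()

def is_intact(content: bytes, expected_digest: str) -> bool:
    """Detect corrupt or bogus copies by comparing digests."""
    return hmac.compare_digest(sha256_hex(content), expected_digest)

def make_tag(key: bytes, content: bytes) -> str:
    """MAC over the content; only holders of the shared key can produce it."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()

def tag_is_valid(key: bytes, content: bytes, tag: str) -> bool:
    return hmac.compare_digest(make_tag(key, content), tag)

ORIGINAL = b"contents of the shared file"
DIGEST = sha256_hex(ORIGINAL)
KEY = b"hypothetical shared secret"
TAG = make_tag(KEY, ORIGINAL)

print(is_intact(ORIGINAL, DIGEST), is_intact(ORIGINAL + b"!", DIGEST))
print(tag_is_valid(KEY, ORIGINAL, TAG), tag_is_valid(b"wrong key", ORIGINAL, TAG))
```

A bare hash only detects accidental or naive corruption, since an attacker who can replace the file can often replace the digest too. The MAC variant also binds the check to a key, and digital signatures would remove the need for a shared secret.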

File Integrity
❑ Is it enough to find the trusted source?
❑ The attacker can modify the file content
  ❍ Man-in-the-middle attacks
❑ How to provide secure transfer?
❑ Need detection/correction mechanisms

Availability
❑ Nodes should be always up
❑ Overlay DoS & DDoS attacks
  ❍ Flooding a node with messages
  ❍ Malicious super-nodes in Gnutella
    • Claims that the victim has all files requested
❑ Attack CPU availability
  ❍ Sending complex queries
❑ Attack file storage
  ❍ Submit bogus documents
❑ Attack quality-of-service
  ❍ Serve a file slowly
  ❍ Send a different file

Why Defend against P2P Overlay DDoS?
❑ The flooding-based search mechanism makes overlay DDoS in P2Ps simple in design and operation
❑ The anonymity design of P2P helps malicious nodes easily hide behind other peers

Challenges
❑ In the future, P2P protocols need to be designed to make it hard for adversaries to construct DDoS attacks by taking advantage of loosely constrained protocol features.
❑ More research is necessary to understand the effects of other types of DoS attacks in various P2P networks.
❑ It is important to design future P2P protocols such that they do not open up new opportunities for attackers to use as amplifiers and back-door communication channels.

Challenges
❑ To protect against man-in-the-middle attacks, nodes can authenticate other nodes. This can be achieved by obtaining a node's public key through a secure channel (e.g., a trusted party such as a certificate vendor, or a web of trust like PGP) and validating its fingerprint.
❑ Make firewalls smarter so that peer-to-peer applications can cooperate with the firewall to allow traffic the administrator wants. Firewalls must become more sophisticated, allowing systems behind the firewall to ask permission to run a particular peer-to-peer application.

Block Malicious or Misbehaving Peers
❑ How to isolate bad peers?
❑ Blacklist
  ❍ Bad peers can change names
❑ Flow control and monitoring
❑ Special monitoring peers (police force)
  ❍ Special privilege
❑ Remain open

Blocking Discipline-free P2P
❑ P2P must be trustworthy-compliant
❑ Discipline-free P2P
  ❍ Not trustworthy-compliant
  ❍ Illegal information
  ❍ Unable or difficult to catch bad peers
❑ Block non-compliant P2P systems
❑ Special force to attack discipline-free P2P

Summary
❑ Why is securing P2P data sharing applications challenging?
❑ Open and autonomous nature
❑ Peers can join and leave freely
❑ Peers cannot necessarily be trusted to
  ❍ route queries or respond correctly
  ❍ store documents when asked to
  ❍ serve documents when requested

Summary (cont’d)
❑ Develop techniques to deal with fail-stop and Byzantine failures that are acceptable from a performance and security standpoint in a P2P context.
❑ The lesson for P2P designers is that without accountability in a network, it is difficult to enforce rules of social responsibility.
❑ Use a security toolbox like JXTA (Project JXTA has security APIs and a library that implements RSA, RC4, MD5, SHA-1, a pseudo-random number generator, and digital signatures) to help P2P implementation.

Summary (cont’d)
❑ Research is required to explore a broad range of fundamental P2P issues such as: peer-node identity, naming, configuration and capabilities; P2P network organization and scope; resource discovery, content lookup, search and distribution; request routing and operation in the presence of mobility; adaptation to expected peer-node instability; monitoring of P2P operations; security of P2P systems involving reputation-based trust for ad-hoc systems or more centralized, CA-like approaches; etc.

Questions? Thank you!
