Position of Modern Peer-to-peer Systems in the Distributed Systems Architecture
Mojca Ciglarič, Tone Vidmar
University of Ljubljana, Faculty of Computer Science, Tržaška 25, 1000 Ljubljana, Slovenia, +386 1 47 68 377
Abstract
Recently, the hottest development in the area of distributed computing has been the appearance of peer-to-peer systems, which employ direct communication and the free resources of desktop computers. In this paper, we describe peer-to-peer systems, compare the definitions of distributed systems with those of peer-to-peer systems, and stress their characteristic aspects.
Keywords: distributed systems, architecture, peer-to-peer computing, content sharing, grid computing
1 INTRODUCTION
The idea of a distributed system has been present ever since man first attempted to connect two computers with each other. Today, the concept of distributed systems is no longer viewed as a novelty. Distributed systems are penetrating virtually all areas of computing. Their widespread usage is constantly encouraged by technical and infrastructural development, and consequently several new ideas are emerging that further increase their importance and popularity. One of them is certainly peer-to-peer (P2P) computing. The idea within peer-to-peer computing is to use the free edge resources and combine their power into something like a giant file system or a supercomputer. In the paper, we first outline a few definitions of a distributed system and apply them to P2P systems. Further, we mention some popular P2P applications, with emphasis on their similarities and differences. We also discuss the characteristics of the P2P concept and how they fit into the frame of distributed systems.
2 RELATED WORK
In early 1978, P. H. Enslow recognized a "distributed processing" cliché, and the paper [Ens78] is a brilliant result of his attempt to clarify the original meaning of the expression distributed system. He discusses the five essential characteristics that a newborn class of systems should exhibit, envisions the course of development and introduces a precise terminology whose value time has not lessened a bit.
Later, we observe several quite similar attempts to redefine the concept of distribution. Let us mention the paper [Ozs91], which presents a three-dimensional classification of the distributed database system implementation alternatives with regard to heterogeneity, distribution and autonomy of system parts. The paper [Leff91] presents a classification of transaction processing systems in a similar manner. However, Coulouris et al. in [Cou01] give a different definition, highlighting the fact that the system components communicate only by passing messages.
Peer-to-peer systems became really popular with the thought-provoking success of Napster in 2000. Since they represent such a new topic, at the moment we only refer to the book [Ora01], which presents the goals of the best-known P2P systems, their problems and the solutions used there. On the other hand, quite a few related sites and essays can be found on the Internet. There are definitions from the Peer-to-peer working group [p2pwg], and Shirky and Dougherty, well-known peer-to-peer analysts [Shi00], [Dou01], share their opinions on what P2P computing is and what it is not. Then there are the homepages of probably every P2P system, application or company, among others all of those mentioned in the paper; most of them offer an overview, a design philosophy and at least some technical information.
3 WHAT IS A DISTRIBUTED SYSTEM?
The idea of distributed systems is quite old: it emerged at the same time as the idea of networking, but only recently have technical possibilities enabled their widespread use. This section lists a few descriptions of a distributed system that will help us with the characterization of P2P systems in the remainder of the paper.
3.1 Five characteristics
Enslow [Ens78] states that at least four physical components of a system might be distributed: hardware, data, processing and control (operating system). Their mutual interleaving, dependency and (co)operation are characterized by the following features:
a) Multiple (possibly heterogeneous) resources, dynamically assigned to specific tasks.
b) Physical distribution of these resources.
c) High-level operating system on top of unique local OSes (at least as a design philosophy).
d) System transparency (the server does not have to be identified).
e) Cooperative autonomy: interaction of physical and logical resources.
3.2 3D Cube
Later, the distinction between distributed database systems and distributed processing systems also somewhat affected the characteristics and the classification. Distributed database systems took the shape of a three-dimensional cube [Ozs91] according to the degrees of distribution, autonomy and heterogeneity, as shown in Fig. 3.1. Interoperable systems, as a subclass of multidatabase (heterogeneous database) systems, are further described in [Bri92] as the most loosely coupled information-sharing systems. Their global function is limited to simple data exchange and does not support full database functionality. Communication protocols are usually standardized.
[Figure: cube with three axes. Distribution (single site / several sites); Autonomy (no, some, full); Heterogeneity (homogeneous, heterogeneous).]
Fig. 3.1: The three dimensions of a distributed database system.
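To make the cube concrete, the following minimal sketch enumerates its cells, assuming each axis takes only the discrete values named in the figure. The function and constant names are illustrative and do not come from [Ozs91].

```python
from itertools import product

# Axis values as listed in the figure; the constant names are illustrative.
DISTRIBUTION = ("single site", "several sites")
AUTONOMY = ("no", "some", "full")
HETEROGENEITY = ("homogeneous", "heterogeneous")


def cube_positions():
    """Return every (distribution, autonomy, heterogeneity) combination."""
    return list(product(DISTRIBUTION, AUTONOMY, HETEROGENEITY))


def positions_with_autonomy(level):
    """Return the cells of the cube whose autonomy equals `level`."""
    return [p for p in cube_positions() if p[1] == level]


if __name__ == "__main__":
    print(len(cube_positions()), "positions in total")
    for position in positions_with_autonomy("full"):
        print(position)
```

Fixing the autonomy axis at one value leaves a slice of 2 x 2 cells, one for each combination of distribution and heterogeneity.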
3.3 A Quintuple
Leff and Pu's classification [Leff91] of distributed processing systems differentiates them according to the values of the following five parameters: set of machines, set of processes, degree of heterogeneity, set of logical data, set of sites. Possible values are one and many, or a low and a high degree. The system is then described by the quintuple of the above values. In this way we can capture the essential features of each system and from them infer the potential benefits and problems known from similar configurations.
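The quintuple can be written down as a small record type. The sketch below is only an illustration under stated assumptions: the class name, the field names and the `shared_values` helper are hypothetical, and the two example systems are invented to show how configurations might be compared.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Quintuple:
    """Five-parameter description of a system (field names are illustrative)."""
    machines: str       # "one" or "many"
    processes: str      # "one" or "many"
    heterogeneity: str  # "low" or "high"
    logical_data: str   # "one" or "many"
    sites: str          # "one" or "many"

    def __post_init__(self):
        # Reject values outside the two-valued scales described in the text.
        for f in fields(self):
            allowed = ("low", "high") if f.name == "heterogeneity" else ("one", "many")
            if getattr(self, f.name) not in allowed:
                raise ValueError(f"{f.name} must be one of {allowed}")

    def shared_values(self, other):
        """Names of the parameters on which two systems agree."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) == getattr(other, f.name)]


if __name__ == "__main__":
    # Two invented configurations, used only to demonstrate the comparison.
    widely_distributed = Quintuple("many", "many", "high", "many", "many")
    single_host = Quintuple("one", "many", "low", "one", "one")
    print(widely_distributed.shared_values(single_host))
```

Comparing quintuples field by field is one simple way to find the "similar configurations" from which known benefits and problems can be carried over.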
3.4 Passing Messages
Coulouris et al. present a very simple and effective definition in [Cou01]: hardware or software components located at networked computers communicate and coordinate their actions only by passing messages. No high-level control is required. This definition is not very restrictive. However, it depends on the concept of message passing. For example: is disk sharing (i.e. reading from and writing to the same disk area) also message passing? One could see it that way, while in general we assume some kind of communication network among the communicating parties.
4 WHAT IS PEER-TO-PEER?
Peer-to-peer is a catchword today, but there is not yet common agreement on what it denotes. Is it a system, a concept, an architecture or merely an idea? Peers are computers that have equal standing, as in rank or class. The idea of a peer application architecture seems appropriate when we consider the gradation of application architectures: stand-alone applications, client-server applications, and peer-to-peer applications, in which the participants exchange the roles of client and server. An ideal peer system is fully decentralized, with no hosts in a special administrative role. However, building such a system can be quite difficult when computers are counted in tens of thousands, so many popular systems (Jabber [Jab], Napster [Nap]) use a so-called hybrid approach to introduce a sense of hierarchy into the system. With an eye on overall system architecture, we can identify two P2P system types: those with a central server or coordinator that serves as a pointer to the other edge computers, assigns the load and combines the results, and those without one.
The Internet was originally designed as a peer-to-peer system, but soon the client-server concept prevailed and the Internet became more and more centralized around a few sites that generate the majority of all traffic. With the expansion of the Internet, we observed how the hosts divided their roles: some of them are servers, offering services and data, while the majority of small, usually home computers (PCs) act as clients. These are sometimes referred to as the edge of the net. Clients just need to know how to ask a question (request a service) and receive a response. If a person with such a client machine wanted, for example, to publish a homepage, he or she usually found a well-known server and uploaded it there. With today's form of "commercial Internet" we observe less cooperative and more selfish behaviour of its users.
The idea within peer-to-peer computing is to use the free edge resources and combine their power into something like a giant file system or a supercomputer. The cooperation should be enforced by means of system design: either the system would not work properly without cooperation, or it would reward proper behaviour and punish misbehaviour. (Napster [Nap] set an example of such a motivating design: while a user was downloading a favourite mp3, others were able to download files from that user's own collection.) System components are peers, highly autonomous, able to decide for themselves when and for how long they want to act as a part of the distributed computing system.
Different authorities in the P2P area emphasize several characteristics unique to P2P systems:
Sharing resources: The P2P working group [p2pwg] defines peer-to-peer computing as the sharing of computer resources and services by direct exchange. It further lists four business areas that would benefit from it: collaboration, edge services (caching), distributed computing and resources, and intelligent agents.
Edges of the net: Shirky [Shi00] is more precise: P2P is a class of applications that take advantage of resources (storage, cycles, content, human presence) available at the edges of the Internet. He also suggests a "litmus test" for a P2P system, considering only the variable connectivity and the high autonomy of network edges.
PIE (Presence, Identity, Edge): Dougherty [Dou01] cites and explains Shirky's definition. He introduces the term PIE to describe the core elements of peer-to-peer applications: Presence (is the resource online?), Identity (uniqueness of the available resources) and Edge resources (typically PCs).
However, the problems that P2P systems deal with are common to all authors regardless of their definition: security, addressing, accessing firewalled computers and dealing with NAT, message routing, presence detection, and so on.
5 COMPARISON OF SELECTED P2P SYSTEMS
In this section, we briefly compare general characteristics and selected representatives of P2P systems. Roughly, we divide P2P systems into four groups according to the field they support:
Distributed search with file (content) sharing: Napster, Gnutella, FastTrack (algorithm used by KaZaA and Morpheus), Freenet (virtual information store).
Distributed (grid) computing: SETI@home, Distributed.net, Entropia, DataSynapse, Avaki.
Real-time communication (chat, instant messaging): Jabber, ICQ.
Collaboration, groupware: Groove, WebDAV.
5.1 Gnutella
The Gnutella network [Gnut] is representative of the wide group of file-sharing systems. The Gnutella protocol enables distributed file search employing either the client-server or the peer-to-peer paradigm. Since peers act as clients and servers at the same time, they are referred to as servents. A servent connects itself to a Gnutella network by establishing a connection to another, already connected servent. Servents communicate by exchanging descriptors: control messages, queries and query hits. Each descriptor has a limited time to live (TTL), set at its origin. With each hop to another servent, the TTL is decreased by 1. A descriptor contains the address (servent ID or IP address) of its origin (and destination, where appropriate). A servent that receives a descriptor with a TTL greater than 1, sent to another destination, forwards it to its neighbour servents. Query hits are sent in response to queries along the same path that the query took: only servents that have seen the original query forward them. When a servent receives a query hit, it establishes a direct HTTP connection for the file download. Files are not downloaded over the Gnutella network. When the resulting file is to be downloaded from a firewalled servent, which cannot accept incoming connections, another descriptor called Push is routed over the Gnutella network. When a firewalled servent receives a Push, it establishes an outbound connection itself to the servent that initiated the Push request, and sends the file. If the outbound connection cannot be established, the other servent is probably behind a firewall too. In that case, the file cannot be transferred.
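The TTL-limited flooding and the reverse-path routing of query hits described above can be sketched as a small simulation. This is not the Gnutella wire protocol: the function name, the in-memory topology and the rule that a servent silently drops a query it has already seen are assumptions of the sketch.

```python
from collections import deque


def flood_query(neighbours, files, origin, wanted, ttl):
    """Flood a query from `origin` and return {servent: path of its query hit}.

    neighbours: dict mapping each servent to the servents it is connected to.
    files:      dict mapping a servent to the set of file names it shares.
    The returned path lists the servents a query hit passes on its way back
    to the origin, i.e. the reverse of the path the query took.
    """
    came_from = {origin: None}  # who each servent first heard the query from
    queue = deque((n, origin, ttl) for n in neighbours[origin])
    hits = {}
    while queue:
        node, sender, remaining = queue.popleft()
        if node in came_from:
            continue  # assumption: duplicates of a known query are dropped
        came_from[node] = sender
        if wanted in files.get(node, ()):
            # Route the query hit back hop by hop along the recorded path.
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            hits[node] = path
        if remaining > 1:  # forward only while the TTL is greater than 1
            for n in neighbours[node]:
                if n != sender:
                    queue.append((n, node, remaining - 1))
    return hits


if __name__ == "__main__":
    # Hypothetical five-servent network: A-B, B-C, C-D, B-E.
    net = {"A": ["B"], "B": ["A", "C", "E"], "C": ["B", "D"], "D": ["C"], "E": ["B"]}
    shared = {"D": {"song.mp3"}, "E": {"song.mp3"}}
    print(flood_query(net, shared, "A", "song.mp3", ttl=2))
    print(flood_query(net, shared, "A", "song.mp3", ttl=3))
```

In this topology a TTL of 2 reaches E but not D, while a TTL of 3 reaches both, which illustrates how the TTL bounds the search horizon. The actual file download would then happen over a direct connection outside the simulated network.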
5.2 Entropia
Entropia [Ent] is a platform that harnesses the available power of networked PCs (a so-called computing grid) to run computationally intensive applications. Server software splits the job into small tasks that are distributed among the PCs. The PCs execute the assigned tasks only in their idle time and afterwards send the results back to the server. The amazing feature of not only Entropia but almost every P2P distributed computing system is that it scales with almost no limitations. Like many other distributed computing companies, Entropia demonstrates its potential by powering non-profit biomedical projects (fightAIDS@home and many others). Entropia employs the Globus Toolkit [Glob], a set of protocols and software tools that make it easier to build computational grids and grid-based applications. Globus is an open-source project and manages the areas of security, information infrastructure (meta-data about the grid), resource and data management, communication, fault detection and portability.
However, all distributed computing systems need a central server that splits the job, assigns tasks to computers and reassembles the results. We do not observe any particular interaction among peers here and therefore, although they deal with similar problems, we should label such systems peer-oriented rather than P2P, as suggested in [Sun].
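The split, assign and reassemble cycle of such a central server can be sketched in a few lines. The example below is a toy under stated assumptions: the job (a sum of squares), the round-robin assignment and all function names are hypothetical, and the "PCs" are plain function calls rather than networked machines. It does not use the Entropia or Globus APIs.

```python
def split_job(numbers, task_size):
    """Server side: cut the job into tasks of at most `task_size` items."""
    return [numbers[i:i + task_size] for i in range(0, len(numbers), task_size)]


def worker(task):
    """Stand-in for a PC processing one task in its idle time."""
    return sum(x * x for x in task)


def run_job(numbers, task_size, worker_count):
    """Server side: split the job, assign the tasks, reassemble the results."""
    tasks = split_job(numbers, task_size)
    # Round-robin assignment; a real scheduler would track availability.
    assignment = {w: [] for w in range(worker_count)}
    for index, task in enumerate(tasks):
        assignment[index % worker_count].append(task)
    # Each worker reports only to the server; workers never talk to each other.
    partial_results = [worker(t) for w in assignment for t in assignment[w]]
    return sum(partial_results)


if __name__ == "__main__":
    print(run_job(list(range(10)), task_size=3, worker_count=2))
```

All communication in the sketch passes through `run_job`, which mirrors the observation above that there is no particular interaction among the peers themselves.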
5.3 Jabber
Jabber [Jab] is a technology that was at first designed to support an instant-messaging system. It is fully based on XML and supports person-to-person conversations, conversations between people and applications, and even conversations among applications with no people directly involved. Jabber uses a client-server architecture, which means that all messages from one client to another go through the server. However, any client is encouraged to negotiate a direct connection to another client for application-specific use. The network model is adopted from the plain old e-mail system: a Jabber server communicates with clients as well as with other servers.
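The e-mail-like network model can be illustrated with a minimal routing sketch. It assumes addresses of the form user@domain, by analogy with e-mail, and models servers as in-memory objects; the class, its methods and the example domains are hypothetical, and no XML or real Jabber protocol exchange is involved.

```python
class Server:
    """Toy message server; `network` maps a domain to the server serving it."""

    def __init__(self, domain, network):
        self.domain = domain
        self.inboxes = {}
        self.network = network
        network[domain] = self

    def register(self, user):
        """Create an inbox for a local user and return the full address."""
        self.inboxes[user] = []
        return f"{user}@{self.domain}"

    def route(self, sender, recipient, body, hops=None):
        """Deliver locally, or hand the message to the recipient's server."""
        hops = (hops or []) + [self.domain]
        user, domain = recipient.split("@", 1)
        if domain == self.domain:
            self.inboxes[user].append((sender, body, hops))
            return hops
        return self.network[domain].route(sender, recipient, body, hops)


if __name__ == "__main__":
    network = {}
    first = Server("a.example", network)
    second = Server("b.example", network)
    alice = first.register("alice")
    bob = second.register("bob")
    # A client always submits messages to its own server.
    print(first.route(alice, bob, "hello"))
```

A message between users on different servers therefore crosses two servers, while the clients themselves never connect to each other unless they negotiate a direct connection separately.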
5.4 Groove
Groove [Groo] is a platform that enables effective collaboration within the enterprise as well as across its boundaries. It offers a virtual space and tools for group interaction, including content sharing, communication and joint activity in real time. Each group possesses its own shared space, replicated on every member's computer. When one of the members adds something to that space, the change is reflected on all the machines. The system offers a variety of services: security, storage, synchronization and connectivity. Groove's architecture is a hybrid of peer and server: support for collaboration is peer-oriented, while system management functions are centralized (component management, routing options, license management, reporting).
6 P2P IN A DISTRIBUTED SYSTEM ARCHITECTURE
Now let us go back to the definitions and characteristics of distributed systems. In this section, we try to fit P2P systems into the definitions cited in Section 3.
Enslow's definition of a distributed system (3.1): We definitely observe multiple heterogeneous resources, physically distributed over the network. But typically there is no high-level OS that would unify and integrate the control over the components. Usually, control is distributed. However, in a few cases we could talk about some kind of distributed OS, or some functions of one, on top of the local OS; Groove is one such example. The idea of a high-level OS could also be replaced with adherence to some standards, protocols or middleware. We observe at least “more-or-less cooperative” autonomy (parts of the system are highly autonomous, and sometimes act too selfishly). Enslow's idea of cooperative autonomy meant that, although able to refuse a request, the components are basically willing to cooperate. Any resource must be able to refuse a request or break off the communication or processing at any moment.
The system transparency concept is further extended into several classes in [Cou01], based on ANSA's definition. In P2P systems, we typically observe location transparency (resources are accessed without knowledge of their location), concurrency transparency (several processes can use the resource simultaneously without having to know about the concurrency control), mobility (migration) transparency (allows the movement of resources; compare the concept of transience, where sources of information constantly change locations), performance transparency (allows the system to be reconfigured as loads vary) and scaling transparency (the system and applications can expand without changes to accommodate a growing load). Often, depending on the system and the type of the resource, we can also talk about access transparency (local and remote resources are accessed in the same way), replication transparency ("invisible" multiple instances of a resource) and failure transparency (concealment of hardware or software faults).
3D cube (3.2): A P2P system can take up any of the four positions with full autonomy: homogeneous or heterogeneous, single site or several sites. But by definition we cannot find a P2P system with low or only moderate autonomy.
The quintuple (3.3): As with 3.2, a P2P system will typically have all five values set to a high degree.
Message passing (3.4): As we can recognize from the cited examples, all the participants in P2P systems communicate only by message passing.
From the cited definitions and examples we can conclude that, although they emerge in very colorful varieties, P2P systems typically show up in only a few architectural forms of distributed systems.
7 CONCLUSIONS
Obviously, P2P systems offer several advantages that end users have already recognized. But there is no doubt that time will bring up new areas of application. In the paper, we tried to clarify the idea behind the peer-to-peer concept with regard to well-known "distributed" ideas. We did not stress the problems that the peer philosophy has yet to overcome (legal, security, etc.), but rather surveyed the basic concepts and examples in this field.
REFERENCES:
[Ens78] Enslow, P. H., Jr.: What is a "Distributed" Data Processing System?, Computer, January 1978, 13-21.
[Ozs91] Ozsu, M. T., Valduriez, P.: Distributed Database Systems: Where Are We Now?, IEEE Computer, August 1991, 68-78.
[Leff91] Leff, A., Pu, C.: A Classification of Transaction Processing Systems, IEEE Computer, June 1991, 63-75.
[Bri92] Bright, M. W., Hurson, A. R., Pakzad, S. H.: A Taxonomy and Current Issues in Multidatabase Systems, IEEE Computer, March 1992.
[Cou01] Coulouris, G., Dollimore, J., Kindberg, T.: Distributed Systems: Concepts and Design, 3rd edition, Addison-Wesley, 2001.
[Ora01] Oram, A. (ed.): Peer-to-Peer: Harnessing the Power of Disruptive Technologies, O'Reilly, 2001.
[p2pwg] Peer-to-peer working group, http://www.peer-topeerwg.org/whatis/index (2002)
[Shi00] Shirky, C.: What is P2P… And What Isn't, The O'Reilly Network, November 2000, http://www.oreillynet.com/pub/a/p2p/2000/11/24/shirky1-whatisp2p.html (2001)
[Dou01] Dougherty, D.: All the Pieces of a PIE, http://www.oreilly.com/catalog/p2presearch/chapter/ch01.html (2001), to appear in: Shirky, C., Truelove, K., Dornfest, R., Gonze, L.: 2001 P2P Networking Overview, O'Reilly, 2001 (in press).
[Ent] Entropia: www.entropia.com (2002)
[Glob] Globus: www.globus.org (2002)
[Jab] Jabber: www.jabber.org (2002)
[Nap] Napster: www.napster.com (2001)
[Gnut] Gnutella protocol: www.gnutella.com, www.clip2.com/GnutellaProtocol04.pdf (2001)
[Groo] Groove: www.groove.net (2002)
[Sun] Sundsted, T.: The practice of peer-to-peer computing, IBM DeveloperWorks, March 2001, www-106.ibm.com/developerworks/library/jp2p/index.html (2002)