Using P2P to Distribute Large-volume Contents – Research Problems, Solutions and Future Directions

Simon G. M. Koo, C. S. George Lee
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907
Email: {koo, csglee}@purdue.edu

Karthik Kannan
Krannert School of Management, Purdue University, West Lafayette, IN 47907
Email: [email protected]
Abstract: The research focus of Peer-to-Peer (P2P) network design has long been on low-complexity content mapping and efficient search mechanisms. Systems such as CAN, Chord, and Pastry are pioneers in the area and have been cited by almost every P2P work. These systems provide an efficient substrate for sharing and distributing small files such as MP3s or images, where searching time is essential to the overall performance. However, for large-volume contents such as multimedia documents or scientific data sets, the efficiency of the searching process becomes a negligible factor in the system’s performance, while data transfer efficiency becomes the determinant of the overall distribution efficiency. In this paper, we describe a new set of research problems that arise when P2P is used to distribute large-volume contents, and propose solutions to these problems.
I. INTRODUCTION AND MOTIVATION
According to a recent survey commissioned by the Motion Picture Association of America (MPAA), about 24 percent of Internet users had downloaded a feature-length film online at least once, and these downloaders had averaged about 11 films each [4]. Motivated by the recent success of the MP3 music industry, both in mitigating illegal downloading and in generating multi-million-dollar profits, we envision that the commercialization of legal movie downloading will have at least as much impact. The problem is that the size of a DVD-quality movie normally ranges from 800 MB to 4 GB, so distribution will be very expensive if Content Distribution Networks (CDNs) or other client/server-based technologies are used as the distribution channel.

Movie distribution is not the only application that involves large-volume content distribution. Applications such as the sharing of massive scientific data or bioinformatic databases among institutions are
calling for low-cost and effective solutions, and are facing the same challenges.

P2P technologies can provide a low-cost and scale-free solution for content distribution. However, the emphasis of most P2P systems, including CAN, Chord, and Pastry, has been on low-complexity content mapping and efficient search mechanisms. Many of these applications are efficient when sharing or distributing small files such as MP3s or images, which are a few megabytes in size. This is because, for such files, searching time is essential to the overall performance: the time spent on data transmission is short compared with that for large files. For large-volume contents such as multimedia documents or scientific data sets, the efficiency of the searching process becomes a negligible factor in the overall performance, while data transfer efficiency becomes the determinant of the distribution efficiency.

Another problem with using P2P for content distribution is that most existing P2P applications are also designed to enhance anonymity in file sharing, which makes it very difficult to control and track the list of people who have downloaded the contents. For content that requires a subscription, such as movies, P2P may therefore not be a desirable choice because it lacks accounting functionality. The same problem arises in scientific-data sharing systems where only authenticated users are allowed to obtain the content.

Currently, BitTorrent (BT) [7] and eDonkey [9] are the only two well-known P2P systems that are conducive to distributing large-volume contents. eDonkey is a proprietary protocol, and no information is available about the details of its operation. BT is used by Linux distributors like Debian and
Mandrake for distributing their versions of Linux. The key feature of BT that makes it conducive to distributing huge files is that BT divides the content into several pieces and allows peers to exchange those pieces instead of the complete file. Such a mechanism has been demonstrated to improve the efficiency of P2P exchanges [24], and analytical work has justified this claim [21]. The intuition is that when content is broken into pieces for P2P exchange, it takes a shorter time before a peer begins to upload to its neighbors while simultaneously downloading from the community. Also, as specified in the BT protocol, peers who do not upload their content pieces will be “snubbed” by others and will not receive any content. Such properties reduce the number of free riders (by forcing everyone to contribute) and improve the overall system performance.

In this paper, we describe a new set of research problems that arise when using P2P to distribute large-volume contents, and present our solutions to these problems. Our main focus is on how to improve distribution efficiency when P2P is used to distribute large-volume contents, and the following research problems will be addressed in Section II:
1) The design of an efficient neighbor-selection algorithm at the tracker level;
2) The design of an incentive-compatible mechanism at the peer level.
We will also discuss other open problems such as content protection, pricing, and authentication in Section III.
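
To make the piece-based intuition from this section concrete, the following minimal Python sketch estimates how long a peer must wait before it holds something it can upload. The function name, its parameters, the example numbers, and the assumptions of equal-sized pieces and a constant download rate are ours, for illustration only; they are not taken from the BT protocol.

```python
def time_until_first_upload(file_size_bytes, download_rate_bytes_per_s, num_pieces=1):
    """Seconds until a peer holds one complete piece it could start uploading.

    Illustrative assumptions: equal-sized pieces, a constant download rate,
    and pieces that arrive one at a time.
    """
    if num_pieces < 1 or download_rate_bytes_per_s <= 0:
        raise ValueError("num_pieces must be >= 1 and the rate must be positive")
    piece_size = file_size_bytes / num_pieces
    return piece_size / download_rate_bytes_per_s


# Hypothetical numbers: an 800 MB file downloaded at 1 MB/s.
whole_file = time_until_first_upload(800_000_000, 1_000_000)  # file kept whole
in_pieces = time_until_first_upload(800_000_000, 1_000_000, num_pieces=800)
print(whole_file, in_pieces)
```

Under these simplifying assumptions, a peer can begin contributing after roughly the time needed for one piece rather than for the whole file, which is the effect attributed above to splitting content into pieces.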
II. RESEARCH PROBLEMS AND SOLUTIONS

In the rest of the paper, we adopt the framework of a hybrid P2P network model [22] as the model for large-volume distribution. In a hybrid P2P network, a central entity, referred to as the “tracker” hereafter, is dedicated to coordination and accounting during the distribution process. Both BT and eDonkey use trackers in their distribution process. The presence of a tracker eliminates the need for complex searching mechanisms at the peer level, and it also allows the content owner to keep track of who has downloaded the content from the framework. For scalability, we designed a hierarchical system for this hybrid P2P content distribution system using DNS-based redirection ([2], [8]), a common technique used in CDNs. Figure 1 shows the idea of a network of trackers.

Fig. 1. Network of trackers.

When peers make a request to a tracker, instead of connecting to a particular tracker, they are redirected to the geographically nearest tracker using DNS-based redirection. Each tracker is responsible for the peers in a regional area. Trackers also need to communicate with other trackers and obtain information about peers from other regions, so that they may select neighbors from other regions for better performance. Details of the design, including inter-regional coordination, fault tolerance, and user accounting, are the focus of our future work.

A. Neighbor Selection Problem

One important function provided by the tracker is to strategically select neighbors for the peers and build an overlay network for the distribution. The current implementation of BT uses random selection. We anticipate that the distribution efficiency can be improved by strategically selecting neighbors. The problem is how the neighbors should be selected. Previous works on neighbor selection ([1], [3], [25]) were all based on pure P2P in a completely distributed environment. With a central entity (the tracker) in hybrid P2P, a centralized optimization algorithm can be used to achieve better performance. In our previous studies [12] we have shown that by increasing the content availability to the peers
from their immediate neighbors, it can significantly improve system performance. The idea is to find neighbors with the most disjoint sets of content pieces, and allow them to exchange pieces with each other. To do so, the optimization problem has to be formulated as an integer programming problem. Integer programming problems are known to be hard to solve, so we proposed a real-time genetic-algorithm-based neighbor-selection strategy to achieve this objective. We showed through simulations that the proposed strategy not only enhances system performance by improving system capacity and reducing average downloading time, but also discourages peers from untruthfully revealing their downloading progress.

B. Incentive-compatible Mechanism Design

At the peer level, we need an incentive-compatible mechanism to encourage resource contributions from peers, because the presence of freeloaders is one of the major causes of service degradation. The design of incentive P2P systems has been addressed in many existing works (see [15], [19]). In addition, the design should aim at improving distribution efficiency, and it should not impose high computational complexity on the peers. Existing (micro-)payment-based mechanisms require additional guarantees on security and trust ([20] provides a strong argument against the use of micropayments), while reputation-based mechanisms (see [5], [10], [11], [14], [15], [17], [18], [23]) have their own shortcomings in the propagation of reputation and the problem of faking. Specific to our problem, the goal is to design a mechanism such that the distribution process is as efficient as possible. We propose a “seeing is believing” mechanism [13] in which a peer decides how much capacity to assign to each neighbor based on what it has experienced. The advantage of this design is its simplicity: each peer makes its own decision based on what it has experienced.
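
As a rough illustration of a peer allocating capacity from its own observations only, the sketch below splits a peer's upload capacity among its neighbors in proportion to the download rate it has measured from each of them. This proportional-share rule, the function name, and the equal split used when nothing has been observed yet are assumptions made for illustration; they are not the utility-based rule of [13].

```python
def allocate_upload(capacity, observed_download):
    """Split upload capacity across neighbors using local observations only.

    capacity:          total upload capacity of this peer (any rate unit).
    observed_download: dict mapping neighbor id -> download rate this peer
                       has actually measured from that neighbor.
    """
    if not observed_download:
        return {}
    total = sum(observed_download.values())
    if total == 0:
        # Assumed bootstrap rule: nothing observed yet, so share equally.
        share = capacity / len(observed_download)
        return {neighbor: share for neighbor in observed_download}
    # Assumed rule: a neighbor's share is proportional to what it delivered.
    return {
        neighbor: capacity * rate / total
        for neighbor, rate in observed_download.items()
    }


# Hypothetical measurements: neighbor "c" has contributed nothing so far.
print(allocate_upload(100.0, {"a": 30.0, "b": 10.0, "c": 0.0}))
```

Because the input is only what the peer itself has measured, no reputation messages need to be exchanged, and a neighbor that delivers nothing receives no share once any other neighbor has contributed.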
The “seeing is believing” approach reduces the chances for faking and compromising, and it rewards the peers who contribute more to their neighbors. The mechanism applies a utility-based Cournot adjustment process [6] in which peers maximize their contributions for a fair or better return, and we show that by adopting this utility-based protocol,
the system will achieve Cournot Equilibrium [16]. In addition, the protocol enhances distribution efficiency, and because it does not rely on any reputation exchange, the overhead it imposes on the system is minimal. The algorithm itself runs in O(N log N) time, where N is the number of neighbors a peer connects to.

III. OPEN PROBLEMS

In addition to the design of the tracker network mentioned in Section II, the following open problems are yet to be addressed:
• Content Protection: Digital Rights Management (DRM) allows the content owner to deny unsubscribed users access to the (encrypted) content, even if they have a complete copy of it. The design of DRM and its integration into the P2P framework are important research issues within this problem.
• Authentication, Authorization and Accounting (AAA): In this paper we assumed that the tracker is responsible for the AAA functionalities. However, the details of the exact design and operation, as well as the integration of AAA with user databases and downloading policy, are still open issues.
• Network Service Pricing: Since part of the distribution process is delegated to the peers (using their resources), how should the content provider price the service in order to reflect this contribution from peers? This is an important issue, especially if the P2P content distribution system is used in a business context such as movie distribution.

IV. CONCLUSION

In this paper, we presented the problem of using P2P for distribution of large-volume contents and the reasons why existing P2P systems are inefficient in handling the problem. We briefly presented our solutions to the neighbor-selection problem using genetic algorithms and our “seeing is believing” incentive-compatible mechanism for resource trading. We also presented some open issues regarding this new P2P research direction. P2P research has long been focused on efficient content locating and searching. The introduction of
distribution efficiency in P2P should open the door for many new research ideas and directions.

REFERENCES

[1] M. Adler, R. Kumar, K. Ross, D. Rubenstein, D. Turner, and D. D. Yao, “Optimal Peer Selection in a Free-Market Peer-Resource Economy,” in Proc. of the Second Workshop on Economics of Peer-to-Peer Systems, Cambridge, MA, June 2004.
[2] A. Barbir, B. Cain, R. Nair, and O. Spatscheck, “Known Content Network (CN) Request-Routing Mechanisms,” RFC 3568, July 2003.
[3] D. S. Bernstein, Z. Feng, B. N. Levine, and S. Zilberstein, “Adaptive Peer Selection,” in Proceedings of the 2nd International Workshop on Peer-to-Peer Systems (IPTPS03), Berkeley, CA, Feb. 2003.
[4] J. Borland, “Survey: Movie-swapping up; Kazaa down,” CNET News.com. [Online] http://news.com.com/2100-10255267992.html, 2004.
[5] S. Buchegger and J.-Y. le Boudec, “A Robust Reputation System for P2P and Mobile Ad-hoc Networks,” in Proc. of the Second Workshop on Economics of Peer-to-Peer Systems, Cambridge, MA, 2004.
[6] C. Buragohain, D. Agrawal, and S. Suri, “A Game Theoretic Framework for Incentives in P2P Systems,” in Proc. of the Third Intl. Conf. on Peer-to-Peer Computing (P2P’03), 2003.
[7] B. Cohen, “Incentives Build Robustness in BitTorrent,” in Proc. of the First Workshop on Economics of Peer-to-Peer Systems, Berkeley, CA, June 2003. [Online]. Available: http://bitconjurer.org/BitTorrent/
[8] M. Day, B. Cain, G. Tomlinson, and P. Rzewski, “A Model for Content Internetworking (CDI),” RFC 3466, Feb. 2003.
[9] eDonkey 2000. [Online]. Available: http://www.edonkey2000.com/
[10] D. Ghosal, B. K. Poon, and K. Kong, “P2P contracts: a framework for resource and service exchange,” Future Generation Computer Systems, 2004.
[11] S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina, “The Eigentrust Algorithm for Reputation Management in P2P Networks,” in Proc. of the Twelfth International World Wide Web Conference, May 2003.
[12] S. G. M. Koo, C. S. G. Lee, and K. Kannan, “A Genetic-Algorithm-Based Neighbor-Selection Strategy for Hybrid Peer-to-Peer Networks,” in Proc. of the 13th IEEE International Conference on Computer Communications and Networks (ICCCN’04), Chicago, IL, Oct. 2004, pp. 469–474.
[13] S. G. M. Koo, C. S. G. Lee, and K. Kannan, “A Resource-trading Mechanism for Efficient Distribution of Large-volume Contents on Peer-to-Peer Networks,” Submitted for publication, 2004.
[14] H. T. Kung and C. Wu, “Differentiated Admission for Peer-to-Peer Systems: Incentivizing Peers to Contribute Their Resources,” in Proc. of the 2nd International Workshop on Peer-to-Peer Systems (IPTPS ’03), Berkeley, CA, Feb. 2003.
[15] K. Lai, M. Feldman, I. Stoica, and J. Chuang, “Incentives for Cooperation in Peer-to-Peer Networks,” in Proc. of the First Workshop on Economics of Peer-to-Peer Systems, Berkeley, CA, June 2003.
[16] D. G. Luenberger, Microeconomic Theory. McGraw-Hill, 1997.
[17] R. T. B. Ma, S. C. M. Lee, J. C. S. Lui, and D. K. Y. Yau, “An Incentive Mechanism for P2P Networks,” in Proc. of the 24th Intl. Conf. on Distributed Computing Systems (ICDCS ’04), Tokyo, Japan, Mar. 2004.
[18] T. Moreton and A. Twigg, “Trading in Trust, Tokens, and Stamps,” in Proc. of the First Workshop on Economics of Peer-to-Peer Systems, Berkeley, CA, June 2003.
[19] T.-W. Ngan, D. Wallach, and P. Druschel, “Enforcing Fair Sharing of Peer-to-Peer Resources,” in Proc. of the 2nd International Workshop on Peer-to-Peer Systems (IPTPS ’03), Berkeley, CA, Feb. 2003.
[20] A. Odlyzko, “The Case Against Micropayments,” in Financial Cryptography 2003, J. Camp and R. Wright, Eds. Springer, 2003.
[21] D. Qiu and R. Srikant, “Modeling and Performance Analysis of BitTorrent-Like Peer-to-Peer Networks,” in Proc. ACM SIGCOMM, Portland, OR, Sept. 2004.
[22] R. Schollmeier, “A Definition of Peer-to-Peer Networking for the Classification of Peer-to-Peer Architectures and Applications,” in Proc. of the First International Conference on Peer-to-Peer Computing (P2P ’01), Linköping, Sweden, Aug. 2001, pp. 101–102.
[23] K. Tamilmani, V. Pai, and A. Mohr, “SWIFT: A System With Incentives For Trading,” in Proc. of the Second Workshop on Economics of Peer-to-Peer Systems, Cambridge, MA, 2004.
[24] X. Yang and G. de Veciana, “Service Capacity of Peer to Peer Networks,” in Proc. of INFOCOM 2004, Hong Kong, Mar. 2004.
[25] L. Zou, E. Zegura, and M. H. Ammar, “The Effect of Peer Selection and Buffering Strategies on the Performance of Peer-to-Peer File Sharing Systems,” in Proceedings of Tenth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), Fort Worth, Texas, 2002.