
PUBLIC DOMAIN P2P FILE SHARING NETWORKS CONTENT AND THEIR EVOLUTION

Jaime Lloret, Juan R. Diaz, Jose M. Jiménez, Manuel Esteve
Department of Communications, Polytechnic University of Valencia
Camino Vera s/n. 46022 Valencia, Spain
[email protected], [email protected], [email protected], [email protected]

ABSTRACT
Since the appearance of Peer-To-Peer file sharing networks a few years ago, many Internet users have chosen this technology to search for programs, films, songs, documents, etc. In this article, six public P2P networks (Gnutella, FastTrack, OpenNap, eDonkey, MP2P and Soulseek) have been analyzed. We have tracked their evolution over two years in terms of connected users, number of shared files and total amount of data shared inside them. The results are compared and discussed with previous measurements taken by other authors in order to show their evolution. We have also measured their type of content and found that the analyzed P2P networks seem to be specialized in different types of files.

The six main public domain P2P file-sharing networks are Gnutella [1], FastTrack [2], OpenNap [3], eDonkey [4], MP2P [5] and Soulseek [6], although other, less popular networks also exist [7].

KEY WORDS
Peer-to-Peer networks, File sharing networks, Evolution.

Most P2P networks have not been designed to address security issues (authentication, file permissions and file integrity); some P2P clients even include adware or spyware. Moreover, as files are often downloaded from unknown users, they may contain viruses or trojans. Even with these risks in mind, the number of Internet users using P2P file-sharing networks is still growing.

1. Introduction

Currently there are many P2P file-sharing networks in existence, and many of them have millions of online users and millions of GBytes shared. Although some users try to download files from the network without any intention of providing any, many users are willing to share what they have with the whole community without caring about who downloads their files. A first step is to differentiate between P2P networks and P2P clients. A P2P network is a set of rules and interactions that allows P2P clients to communicate. A P2P client is a computer application that allows a user to interact with other users in the same network. The number of emerging P2P networks is continuously increasing, and their clients keep gaining capabilities. P2P networks can be classified according to several parameters (structure, degree of centralization, discovery and search algorithm, file downloading system, etc.). In all these types of P2P networks, data is transferred directly between edge clients, without any central server mediating the transfer.

Because of the social impact of public domain P2P file-sharing networks, both industry and academia are spending time and money analyzing several aspects of these networks:
• The legality of the files that are being shared [8].
• The potential risks for home users [9].
• The potential risks in enterprises with workers using their workstations as P2P clients [10].

Sandvine Incorporated reported in [11] that some ISPs’ networks rapidly became congested and that P2P traffic reached nearly 60% of the total traffic in their networks. Although less striking, Internet2 administrators also computed impressive results on 16 February 2004, when 10.46% of the total traffic originated from P2P file sharing [12]. In addition, CAIDA (Cooperative Association for Internet Data Analysis) shows that Internet traffic is mainly dominated by HTTP and P2P file-sharing protocols [13].

Several measurements of public domain P2P file-sharing networks have been published. Some of them were taken in a deceptive manner (e.g., the number of connected users is calculated solely from the number of users that downloaded a certain P2P client program [9, 10]); others give only the maximum number of users (e.g., Napster reached 20 million connected users with 1.5 billion shared files in July 2000 [14]); and others give just the average number of users over a certain period of time (e.g., S. Saroiu et al. [15] show the average number of connected users in a P2P network). Some studies have analyzed which type of P2P file-sharing network is used by Internet users from different regions of the world [16]. Some papers even study the economic cost of downloading a file by analyzing the time required [17].

None of the aforementioned papers and measurements has studied the users and their content from inside the network, using the protocol as a source. The P2P file-sharing networks analyzed in this paper have protocols that make it possible to measure the number of users, the number of files, the amount of shared information and the type of shared files inside each network. Moreover, the number of users and the number of shared files in these networks vary over time, so their evolution must be tracked in order to have up-to-date measurements.

This paper is structured as follows. Section 2 shows the evolution of the selected P2P file-sharing networks; the measurements were taken from 2003 to 2005. Section 3 shows the evolution of their content compared with measurements taken by other authors. The current percentages of video, audio, documents, programs and compressed files shared in those networks are shown in section 4. Finally, section 5 presents our conclusions.

2. P2P file-sharing networks evolution

Figures 1 and 2 show the evolution, in terms of users and number of shared files respectively, of the Gnutella, FastTrack, OpenNap and eDonkey networks. Those measurements were taken from March 2003 to March 2005. Measurements of the MP2P and Soulseek networks were taken from March 2004 to March 2005.

To take accurate measurements of each P2P file-sharing network, the most adequate clients were selected, favoring those that provide the most information on the architecture or the highest update frequency for the measured parameters. The Gnutella network was analyzed with the Limewire client. In the FastTrack network, the measurements were taken with the KaZaA Lite client. The OpenNap network was analyzed with the Xnap client. The eDonkey2000 client was used to analyze the eDonkey network. The MP2P architecture was analyzed by means of the Piolet client. Finally, the Nicotine client was used to analyze the Soulseek network.

File replication in P2P file-sharing networks follows a power law, more concretely Zipf's law, as demonstrated in [18][19][20]. This law explains that a few users hold most of the files in the network. Our measurements show that, in all analyzed P2P networks, the number of files shared per user is greater than in previous years. This makes sense: the number of shared files in a P2P file-sharing network tends to grow because the most popular files tend to replicate in the network. However, the capacity of users' hard disks is limited, so many users burn downloaded files to optical discs and then delete them from their hard disks. Even so, our measurements indicate that users of the Gnutella network have increased their files per user from an average of 306.98 files in 2003 to an average of 348.04 files in 2005. FastTrack users had an average of 213.87 files on their hard disks in 2003; over the next 6 months this number decreased because many users left that P2P network, and later it grew to 276.60 files per user in 2005. OpenNap users began with an average of 586.38 files per user; this number decreased in November 2003 and has grown since that date to 665.01 files per user. Users of the eDonkey network are the ones with the fewest files per user: they began with an average of 74.13 files per user in 2003 and now have 115.27 files per user. Finally, MP2P users have had an average of 244.50 files per user in both years. The Soulseek network does not allow measuring how many files all users share, so we could not determine its average.

Reference [21] shows measurements that we took in each of these networks (number of users, number of shared files and total amount of shared information) during one week. The latest measurements, in 2005, show that the eDonkey network can vary by ± 1 million users in one hour. FastTrack can vary by ± 700,000 users in one hour, OpenNap and Gnutella by ± 50,000 users in one hour and MP2P by ± 10,000 users in one hour. Soulseek is still growing.
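As an illustration of how the growth in files per user can be expressed as a relative change, the following minimal Python sketch computes the percentage variation between the 2003 and 2005 averages quoted above for Gnutella, FastTrack and eDonkey. The function name `percent_change` and the dictionary layout are assumptions made for this example; they are not part of the measurement tools used in the paper.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old * 100.0


# Average files per user (2003, 2005), as quoted in the text above.
files_per_user = {
    "Gnutella": (306.98, 348.04),
    "FastTrack": (213.87, 276.60),
    "eDonkey": (74.13, 115.27),
}

# Percentage variation of files per user for each network.
changes = {
    name: percent_change(old, new)
    for name, (old, new) in files_per_user.items()
}

for name, change in changes.items():
    print(f"{name}: {change:+.2f}%")
```

The printed values should agree, to within rounding, with the files-per-user variation reported for these networks in Table 1.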

3. P2P file-sharing networks content evolution

This section compares our measurements with measurements taken by other authors. J. Chu, K. Labonte and B. Levine [22] took measurements of the Gnutella network using the Limewire client from February 24, 2002 to March 25, 2002. In that paper, they searched for several files (avi, mp3, mpg, etc.) that were the most popular files in the network at that moment, and created three lists: “top 50 all files”, “top 50 audio files” and “top 50 video files”. We have searched for the same files in the Gnutella network, using the same desktop P2P file-sharing client application (Limewire), to see their current availability, and we have observed the following:
• All files in “top 50 all files” have decreased their number of replicas in the network. None of them has increased or maintained its availability. The average decrease is over 75.6%.

Figure 1. Number of users in analyzed networks (x-axis: time, from March 2003 to March 2005; y-axis: number of users; series: Gnutella, FastTrack, OpenNap, eDonkey, MP2P, Soulseek)

• All files in “top 50 audio files” have decreased, by an average of 74.1%.
• Results for the “top 50 video files” list are very different. For more than half of those videos, the number of users sharing them has grown. Most of the videos whose availability has grown are pornographic videos.

According to the measurements in the previous section, the number of files inside the analyzed P2P file-sharing networks continues to grow. However, older files that were once the most replicated become less available as time goes on. This happens because new files appear in the network and become the most replicated, while older files tend to disappear. The most downloaded files are not always the same. This does not apply to pornographic videos, whose availability grows as time goes on.

Subhabrata Sen and Jia Wang [23] took measurements in the Gnutella and FastTrack networks from September 2001 to December 2001. Between these dates, both networks were growing, as they state in their paper. There was an average of 197,445 users in the Gnutella network and an average of 4,450,149 users in the FastTrack network. Comparing those figures with our measurements, we observe that between those dates and the first time we took measurements, the number of users began to decrease. Currently, the number of Gnutella users is growing again, but the number of FastTrack users is still decreasing.

Beverly Yang and Hector Garcia-Molina [24] used experimental results taken from the OpenNap network at the end of 2000. Users in this network had an average of 168 files each. Our measurements show that this average has varied over the years: 586 files per user in March 2003, 444 in October 2003, 619 in March 2004 and 665 in March 2005. Currently, this average tends to grow. Since these measurements vary over time, researchers have to take these types of parameters into account when modeling their networks.

Figure 2. Number of files in analyzed networks (number of shared files versus time)

Table 1 summarizes all the results obtained. It also shows the percentage by which the number of files per user has varied over the 2 years.

4. P2P file-sharing networks content

In order to know what percentage each type of file represents inside the analyzed P2P file-sharing networks, we have classified files into 5 types:
• Video: avi, mpg, asf, mpeg, rm and wmv.
• Audio: mp3, ogg, wma, wav, aac and ac3.
• Documents: pdf, doc, txt, rtf, wps, sxw and pps.
• Programs: exe, com, msi and sh.
• Compressed files: ace, rar, arj, zip and gz.
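The classification above can be sketched as a simple lookup from file extension to file type. The following Python example is only an illustration under stated assumptions: the function names `classify` and `category_shares`, the `"Other"` label for extensions outside the five types, and the choice to exclude such files from the percentages are all hypothetical and are not described in the paper.

```python
from collections import Counter
from pathlib import PurePosixPath

# Extension lists taken from the five file types listed above.
CATEGORIES = {
    "Video": {"avi", "mpg", "asf", "mpeg", "rm", "wmv"},
    "Audio": {"mp3", "ogg", "wma", "wav", "aac", "ac3"},
    "Documents": {"pdf", "doc", "txt", "rtf", "wps", "sxw", "pps"},
    "Programs": {"exe", "com", "msi", "sh"},
    "Compressed": {"ace", "rar", "arj", "zip", "gz"},
}

# Reverse index: extension -> file type.
EXT_TO_CATEGORY = {
    ext: category for category, exts in CATEGORIES.items() for ext in exts
}


def classify(filename: str) -> str:
    """Return the file type for a file name, based on its last extension."""
    ext = PurePosixPath(filename).suffix.lower().lstrip(".")
    # "Other" is an assumption of this sketch, not a category from the paper.
    return EXT_TO_CATEGORY.get(ext, "Other")


def category_shares(filenames):
    """Percentage of each file type among the classified files."""
    counts = Counter(classify(name) for name in filenames)
    counts.pop("Other", None)  # assumption: unclassified files are ignored
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}
```

For example, calling `category_shares` on a list of shared file names collected from one network would give the kind of per-type percentages that the content figures report for that network.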

This classification allows us to know which types of files are most popular and most replicated among P2P file-sharing network users. Figures 4 to 9 show that in all P2P networks except eDonkey, audio is the most common type of file. This is because the earliest P2P file-sharing networks were developed to share audio files only. However, the second most common type of shared file varies considerably between these networks. Table 2 shows the classification for each analyzed network.

The measurements show that many of the shared files are multimedia, and that comparatively few programs or compressed files are shared. The network with the highest percentage of multimedia files (discarding the MP2P network, whose files are audio only) is Soulseek with 98%. The second is Gnutella with 77%. OpenNap has 73%, FastTrack has 66% and the last is eDonkey with 63%.

On the other hand, we want to highlight that the P2P file-sharing network with the largest total amount of shared data is the eDonkey network. This is because video files and compressed files in that network are large; some of them are bigger than 4 GB. If we take into account that users with more bandwidth spend more time online downloading files from the network (as measured by S. Sen and J. Wang in [23]), those users will hold the most replicated files.

Figure 4. Gnutella Content
Figure 5. FastTrack Content
Figure 6. OpenNap Content
Figure 7. eDonkey Content
Figure 8. MP2P Content
Figure 9. Soulseek Content
(Each chart shows the percentage of Videos, Audio, Documents, Programs and Compressed files shared in the network.)

Network    Centralization           Download system         Users      Files      Type of files  Files per user  Protocol  Ports
Gnutella   Decentralized            Multi-source            Grows      Grows      All            +13.38%         TCP       6346, 6347
FastTrack  Partially decentralized  Multi-source            Decreases  Grows      All            +29.33%         TCP       1214
OpenNap    Partially decentralized  Multi-source            Decreases  Decreases  All            +13.41%         TCP       6699, 7777, 8888
eDonkey    Partially decentralized  Segmented multi-source  Grows      Grows      All            +55.49%         TCP, UDP  4661-4666
MP2P       Partially decentralized  Single-source           Grows      Grows      Audio          -0.02%          UDP       41170
Soulseek   Centralized              Single-source           Grows      n/t        All            n/t             TCP       2234, 5534

Table 1. Analyzed networks summary (n/t: measurements not taken)

Network    Video  Audio  Documents  Programs  Compressed
Gnutella   2nd    1st    3rd        4th       5th
FastTrack  3rd    1st    2nd        4th       5th
OpenNap    2nd    1st    4th        5th       3rd
eDonkey    1st    2nd    3rd        4th       5th
MP2P       -      1st    -          -         -
Soulseek   2nd    1st    4th        5th       3rd

Table 2. P2P file sharing networks file classification

5. Conclusion

There is no relationship between the number of online users in a P2P file-sharing network and the number of shared files in the network. The OpenNap and MP2P networks had the same number of users in March 2004; however, OpenNap had three times more shared files than MP2P. Likewise, the eDonkey network has always had more users than the OpenNap network, yet OpenNap had more shared files than eDonkey between March 2003 and January 2004. The network with the highest average number of shared files per user is OpenNap. The total amount of shared data in a P2P network does not depend on the number of shared files inside it. FastTrack is the network with the most shared files; however, the network with the largest total amount of shared data is eDonkey, because many of its shared files are large.

P2P networks whose number of users is stable or decreasing increase their number of files per user, because users replicate the shared files in the network. The number of users of some older P2P file-sharing networks is decreasing because new P2P file-sharing networks with more features are appearing. For instance, the eDonkey network is better than others for larger files, so it is attracting users who look for video files away from the other networks. However, the total number of users of P2P file-sharing networks is growing.

There are more files and users in P2P file-sharing networks, but older files tend to disappear. New files replicate rapidly, but as time goes on their availability decreases because users delete them. However, pornographic files, regardless of their age, grow in availability inside the network, as shown in section 3. We have seen that some P2P networks are able to support variations of up to 1 million users, more or fewer, in one hour.

Our results have been compared and discussed with previous measurements taken by other authors in order to show the evolution of P2P networks. This paper demonstrates that other authors’ measurements for P2P networks (such as the average number of files per user, the most popular files and the type of files shared by Internet users) have varied over the years. As demonstrated in [25], file popularity in P2P file-sharing networks is related to popularity in web search engines. This article also demonstrates that P2P file-sharing networks seem to be specialized in different types of files, although many of them allow sharing all types of files. When a user wants to search for a given type of file, he/she has to take into account the number of users in each of the analyzed networks (measured in section 3) and the percentage of this type of file inside the networks (measured in section 4) to determine where the desired file is most likely to be found.

References:
[1] Eytan Adar and Bernardo Huberman. Free riding on Gnutella. First Monday, 5(10), October 2000.
[2] Nathaniel Leibowitz, Matei Ripeanu, and Adam Wierzbicki. Deconstructing the Kazaa Network. 3rd IEEE Workshop on Internet Applications. June 2003.
[3] OpenNap Website: http://opennap.sourceforge.net/
[4] Oliver Heckmann and Axel Bock. The eDonkey 2000 Protocol. Technical Report KOM-TR-08-2002, Multimedia Communications Lab, Darmstadt University of Technology. December 2002.
[5] MP2P Website: http://www.blubster.com/protocol1.html
[6] Soulseek Website: http://www.slsk.org
[7] Wikipedia Website: http://www.wikipedia.org/wiki/Peer-to-peer
[8] Palisade. Available at: http://www.palisadesys.com/news&events/press_releases/p2pstudyrelease.shtml
[9] United States House of Representatives Committee on Government Reform, Staff Report. File-sharing programs and peer-to-peer networks privacy and security risks. United States House of Representatives Committee on Government. May 2003.
[10] AssetMetrix Research Labs, Corporate. P2P (Peer-To-Peer) Usage and Risk Analysis. Technical Report. 2003.
[11] Sandvine Incorporated. Peer-to-Peer File Sharing: The impact of filesharing on service provider networks. White Paper. September 2002.
[12] Internet2 Website: http://netflow.internet2.edu/weekly/20040216/
[13] CAIDA Website: http://www.caida.org
[14] The Pew Internet & American Life Project’s Online Music, September 28, 2000. Available at: http://www.pewinternet.org/pdfs/PIP_Online_Music_Report2.pdf
[15] Stefan Saroiu, P. Krishna Gummadi and Steven D. Gribble. A Measurement Study of Peer-to-Peer File Sharing Systems. Department of Computer Science & Engineering, University of Washington, Technical Report UW-CSE-01-06-02. 2002.
[16] Sandvine Incorporated. Regional Characteristics of P2P: File sharing as a multi-application, multi-national phenomenon. White Paper. October 2003.
[17] Artur Marques. Freeloading. Available at: http://arturmarques.com/docs/economics/arturmarques_dot_com_freeloading.pdf
[18] Z. Ge, D. R. Figueiredo, S. Jaiswal, J. Kurose, D. Towsley. Modeling Peer-Peer File Sharing Systems. Proceedings of IEEE Infocom 2003. April 2003.
[19] Krishna P. Gummadi, Richard J. Dunn, Stefan Saroiu, Steven D. Gribble, Henry M. Levy, John Zahorjan. Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload. Proceedings of the nineteenth ACM symposium on Operating systems principles. pp. 314-329. 2003.
[20] Qin Lv, Pei Cao, Edith Cohen, Kai Li, and Scott Shenker. Search and replication in unstructured peer-to-peer networks. Proceedings of the sixteenth International Conference on Supercomputing, ACM Press. pp. 84-95. 2002.
[21] J. Lloret, B. Molina, C. Palau, M. Esteve. Public Peer-To-Peer Filesharing Networks’ Evaluation. The 2nd IASTED International Conference on Communication and Computer Networks. MIT Cambridge, MA. Nov. 2004.
[22] J. Chu, K. Labonte, and B. Levine. Availability and locality measurements of peer-to-peer file systems. In Proceedings of SPIE ITCom: Scalability and Traffic Control in IP Networks, vol. 4868. July 2002.
[23] Subhabrata Sen and Jia Wang. Analyzing peer-to-peer traffic across large networks. IEEE/ACM Transactions on Networking (TON). Volume 12, Issue 2. pp. 219-232. 2004.
[24] Beverly Yang and Hector Garcia-Molina. Comparing Hybrid Peer-to-Peer Systems. Proceedings of the 27th Intl. Conf. on Very Large Data Bases. pp. 561-570. October 2001.
[25] J. Lloret, J. R. Diaz, J. M. Jimenez and M. Esteve. The Popularity Parameter in Unstructured P2P Filesharing Networks. WSEAS Transactions on Computers, Issue 6, Volume 3. December 2004.
