Source vs Data-driven Approach for Live P2P Streaming

Thomas Silverston and Olivier Fourmaux
Université Pierre et Marie Curie-Paris6, Laboratoire d'Informatique de Paris 6
8 rue du Capitaine Scott, 75015 Paris, France
{Thomas.Silverston, Olivier.Fourmaux}@lip6.fr

Abstract

Live streaming applications are increasingly common on the Internet. These applications are delay sensitive and need group communication. Presently, protocols designed for this kind of communication do not rely on the classical client/server model used in the Internet but organize the receivers into an overlay network, where they are supposed to collaborate with each other following the peer-to-peer model. Live p2p streaming protocols can be classified in three categories: source-driven, receiver-driven and data-driven protocols. Each of them manages the overlay differently. In this paper we compare them by simulation to determine which approach is the most appropriate for these protocols. We implement a new p2p network simulator and choose two well-known protocols for the simulations: a source-driven and a data-driven protocol. To our knowledge, our work is the first to compare different approaches for live p2p streaming with the same simulator and scenarios. Our simulations show that the organization of nodes on the overlay drastically influences global network performance, and that the data-driven approach seems to be the most appropriate for these protocols because it is less sensitive to the dynamicity of nodes, which is the main problem to solve for these applications.
1. Introduction

The success of the Internet has deeply changed the way it is used. As the number of users increased, new applications appeared and accelerated its development, for example the web and, more recently, p2p file sharing applications. Nowadays, a new kind of application is becoming successful: live p2p streaming applications such as TV or radio on the Internet. They target many people and therefore need group communication functionalities. Moreover, streaming applications are delay sensitive and consume many resources, thus they need mechanisms to manage resources.

In the client/server model usually used in the Internet, the roles of the participants are clearly separated between users, which receive content, and the server, which provides it. In the p2p model, by contrast, roles are not clearly defined: each participant can be a client, a server, or both at the same time, and all entities collaborate with each other. Compared with the client/server model, the p2p model allows more efficient data diffusion because content can be found at more than one point of the network. Thus, not just one server but all receivers share their resources (bandwidth, processing time) to make the broadcast possible.

Originally, the Internet was designed for point-to-point communication. However, new applications require group communication mechanisms (point-to-multipoint) in the Internet. Group communication requires duplicating packets towards all receivers. A first proposal was to implement these mechanisms at the network layer. The advantage of this solution is that it does not overload network links. However, its deployment has been slowed down for several reasons ([9]). Group communication at the network layer requires routers to maintain per-group state, which means that the core of the Internet must perform complex tasks. This goes against the driving principles of the Internet: intelligence at the edge of the network and simplicity in its core. Moreover, Internet providers prevent uncontrolled group communication because they could not forecast the traffic in their network.

Another solution for group communication is to implement these functionalities at the edge of the network, i.e. in the hosts. In this case, the hosts themselves duplicate packets towards other hosts. This is called applicative multicast because group communication functionalities are implemented at the applicative layer. This solution is flexible, easy to deploy and does not require modifications of the Internet infrastructure.
However, packet duplication by the hosts can be a drawback because hosts are not dedicated entities like routers. Routers run continuously and are not prone to many breakdowns or failures, whereas hosts can be very dynamic: they can join and quit the network whenever they want and can suffer breakdowns or failures. If a host in charge of duplicating packets to other hosts leaves the network (explicitly or not), it directly affects the other hosts. In short, implementing group communication functionalities at the applicative layer is easier to deploy than implementing them at the network layer, but it raises another problem: the dynamicity of the hosts.

In live streaming applications, hosts have to achieve group communication. To do so, every host considers that it is directly connected to some other hosts, which constitute its neighborhood. This vision of the network is independent of the underlying network topology. Nodes have an abstract, virtual vision of the network and organize themselves into an overlay network (figure 1). The objective for the nodes is to organize into an efficient virtual topology (overlay) to diffuse messages to every member of the group. Spanning trees, meshes and rings are examples of virtual topologies.
[Figure 1 diagram: clients C1 to C15 shown in the overlay network and in the underlying network, whose nodes are numbered 0 to 7.]
Figure 1. Virtual vision of the network.

Nowadays, applications needing group communication implement these functionalities in the hosts, at the edge of the network. That is why hosts organize themselves into an overlay network. However, application-layer multicast does not perform as well as network-layer multicast. Indeed, on the overlay, each host transmits messages to its neighbors, so only one copy of a message crosses each overlay link, but different overlay links can rely on the same links in the underlying network. Thus, mechanisms have to be added to allow a better utilization of network resources. Hosts on the overlay have to collaborate to share the available resources judiciously and reach their objective: group communication for all hosts. This means that hosts can be organized as a collaborative network on the overlay: a p2p overlay network.

Streaming applications read and decode the stream while receiving it. They are therefore delay sensitive, because packets need to be read in time to ensure a good quality of reception for the user. Moreover, applications such as TV or radio on the Internet need group communication functionalities because they target many receivers. Streaming applications need all the mechanisms presented above: an overlay to ensure group communication and a p2p network to involve all receivers and their resources in the communication.

Protocols for live p2p streaming applications can be classified in three categories: source-driven, receiver-driven and data-driven approaches. The topologies built in the overlay can be identical, but the way they are constructed or used differs depending on the approach.

The objective of this work is to determine the best approach for live p2p streaming. We chose to implement a new discrete event simulator because simulation is an efficient technique to observe the global behavior of the protocols. We simulate a source-driven protocol with PeerCast ([6]) and a data-driven protocol with Donet ([21]). These protocols have been chosen because they are representative of the approach they use and are very often cited. Moreover, both protocols have already been implemented and used successfully on the Internet ([5], [2], [1]).

The rest of this paper is organized as follows. Section 2 discusses the different approaches and related work. Section 3 presents the simulation results and compares the source-driven and data-driven approaches. Finally, we conclude in section 4 and present our future work.
2. Related Work

Streaming applications target many receivers simultaneously and therefore need group communication mechanisms. Thus, protocols organize receivers into a p2p overlay network. These protocols can create totally different overlay topologies, such as spanning trees or meshes. However, what differentiates them is the approach they use to manage the overlay: source-driven, receiver-driven or data-driven.
2.1. Source-driven Approach

In this approach, we distinguish a control plane and a data plane. The data plane is used to send data to all peers, whereas the control plane is used to manage the group and the dynamicity of peers (arrivals and departures). Usually, the data plane results from the control plane. With Peercast ([6]), the control plane is a tree whose root is the source of the stream, and the data plane uses the same tree. In contrast, with Narada ([8]) the control plane is a mesh between peers, and the data plane is a tree built on top of the mesh and rooted at the source. There are other control plane topologies: Nice ([3]) defines a hierarchical organization of peers into clusters, and its data plane is again a tree built on top of the control plane. Zigzag ([18]) works like Nice, with a clustering control plane and a tree data plane, but seems to improve node degree metrics compared to Nice. Bullet ([12]) builds a tree for the control plane and the data plane but adds mechanisms that make it less dependent on the overlay topology.

For all of these protocols, the control planes can be radically different from each other, but the data plane is always a spanning tree built on top of the control plane and rooted at the source. A data plane that is a tree rooted at the source characterizes the source-driven approach. Basically, the source-driven approach transposes network-layer multicast to the applicative layer.
2.2. Data-driven Approach

Unlike the source-driven approach, the data-driven approach does not clearly separate the control plane and the data plane. Group members (peers) exchange control messages about data availability in the network. Each peer chooses its own neighborhood according to the data it wants ([21], [4], [16], [20]). Protocols like Donet ([21]) use epidemic algorithms, and there is not really a control plane or a data plane built in the overlay. An epidemic algorithm is based on the following mechanism: an entity wishing to send a message to all the other entities sends it to randomly selected entities. These entities in turn send it to other randomly selected entities. In the end, this mechanism makes it possible to transmit the message to all receivers without a clearly defined topology built in the overlay.

A main difficulty to solve for group communication implemented at the applicative layer is the dynamicity of the hosts. Indeed, it makes little sense to structure the overlay in a fixed topology if the hosts are very dynamic, because arrivals and departures of hosts can have a deep impact on the whole overlay network. That is why the data-driven approach, which does not really structure the overlay, could be less sensitive to the dynamicity of hosts, and could therefore be the best approach for live p2p streaming protocols.
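The epidemic mechanism described above can be illustrated with a small push-gossip sketch. This is not Donet's actual algorithm, only a minimal model under stated assumptions: time is divided into rounds, each informed peer forwards the message to `fanout` peers drawn uniformly at random, and the names `gossip_rounds`, `num_peers` and `fanout` are hypothetical.

```python
import random

def gossip_rounds(num_peers, fanout, seed=0):
    """Push-gossip sketch (assumed model, not Donet itself).

    Peer 0 plays the role of the source. In every round, each informed
    peer sends the message to `fanout` peers chosen uniformly at random.
    Returns the number of rounds needed until every peer is informed.
    """
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < num_peers:
        targets = set()
        for _ in informed:
            # Targets may already be informed; such sends are simply redundant.
            targets.update(rng.randrange(num_peers) for _ in range(fanout))
        informed |= targets
        rounds += 1
    return rounds

if __name__ == "__main__":
    print(gossip_rounds(num_peers=1000, fanout=3))
```

No tree or mesh is ever built in this sketch: the message reaches all peers only through repeated random forwarding, at the price of redundant transmissions.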
2.3. Receiver-driven Approach

With the receiver-driven approach, the control plane and the data plane are clearly separated, as in the source-driven approach, and the control plane can be a tree, a cluster or a mesh. However, the data plane is a tree rooted at the receiver side instead of the source side. The receivers organize resources (peers of the network) as well as they can to obtain the stream ([11]). The receiver-driven approach is usually tied to data encoding techniques such as layered coding ([14]) or multiple description coding (MDC, [10]), as in [17], [19], [7], [13]. With layered coding, data is encoded in several layers and it is necessary to receive at least the main layer. The other layers only improve the quality of reception. MDC is similar to layered coding, but it is not necessary to receive any particular layer. With all of these techniques, a receiver has to get as many layers as possible to obtain the best quality of reception.

An advantage of the receiver-driven approach is that it allows hosts with limited capacities to take part in the stream broadcast. However, by nature, this approach allows a suboptimal quality of stream reception, since a host may receive only part of the layers. Since we chose to work on the distribution structure of streams, the data encoding used by the receiver-driven approach can be considered as an optimization of the previous approaches. That is why we have not yet included this approach in our current work.
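The difference between layered coding and MDC can be made concrete with a short sketch. It assumes the usual cumulative form of layered coding, in which an enhancement layer is usable only if all lower layers were received; this dependency rule and the function names are assumptions for illustration, not details taken from the cited systems.

```python
def layered_quality(received, num_layers):
    """Layered coding sketch: layer 0 is the main (base) layer.

    Assumption: layer i is usable only if layers 0..i-1 are also present.
    Returns the number of usable layers (0 means nothing can be decoded).
    """
    usable = 0
    for layer in range(num_layers):
        if layer not in received:
            break
        usable += 1
    return usable

def mdc_quality(received, num_descriptions):
    """MDC sketch: every description is decodable on its own, and the
    quality grows with the number of distinct descriptions received."""
    return len(set(received) & set(range(num_descriptions)))

if __name__ == "__main__":
    partial = {1, 2}  # the first layer/description was lost
    print(layered_quality(partial, 3))  # base layer missing
    print(mdc_quality(partial, 3))      # two descriptions still usable
```

With the same partial reception, the layered receiver cannot decode anything because the main layer is missing, whereas the MDC receiver still obtains a reduced-quality stream.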
3. Results of Simulations

We simulate Donet and three implementations of Peercast (Peercast-C2, which builds a tree with 2 children per node, Peercast-C3 with 3 children per node and Peercast-C10 with 10 children per node) to estimate their global performance according to different metrics, and we then compare them using these metrics.

Typically, users of live p2p streaming applications need to receive a stream of good quality, i.e. a continuous stream. In other words, protocols should limit packet losses, which happen when clients leave the network and the peers depending on them have to recover the stream. Since the streaming is live, protocols should also minimize the time between the moment a client enters the network and the moment it receives its first packet of the stream. This is all the more true when a client asks for the stream again as a consequence of departures of peers. Thus, we compare the approaches used by these protocols according to two important metrics: the average time to first packet (t2fp) and the data packet loss rate over the whole network.

All simulations share some invariant assumptions. We generated a topology of a thousand autonomous systems (AS) with BRITE ([15]). We chose AS granularity because inter-AS delays are usually larger than intra-AS delays, since resources inside an AS are well configured to achieve efficient routing. The delays used in the topology are close to delays found in the Internet. Each client is randomly connected to a topology node. The delay towards this node is fixed at 20 ms, as can be observed with residential connections. Finally, each client enters the network at time 0 s.
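The two metrics can be written down as a short sketch. The paper does not give their exact formulas, so the definitions below are assumptions: t2fp is taken as the mean delay between a client's join and its first packet, and the loss rate as the share of expected packets that never arrived, summed over all clients. The function and parameter names are hypothetical.

```python
def average_t2fp_ms(join_times_ms, first_packet_times_ms):
    """Mean delay (ms) between each client's join and its first stream packet.
    Assumed definition; both lists are indexed by client."""
    delays = [first - join
              for join, first in zip(join_times_ms, first_packet_times_ms)]
    return sum(delays) / len(delays)

def loss_rate_percent(packets_expected, packets_received):
    """Network-wide loss rate in percent (assumed definition): packets that
    never arrived divided by packets expected, summed over all clients."""
    expected = sum(packets_expected)
    lost = expected - sum(packets_received)
    return 100.0 * lost / expected

if __name__ == "__main__":
    # Two clients joining at 0 ms, as in the simulations above.
    print(average_t2fp_ms([0, 0], [100, 300]))
    print(loss_rate_percent([100, 100], [100, 98]))
```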
3.1. Average Time to First Packet

In these simulations we measure the time it takes to receive the first packet of the stream for each protocol and its different implementations. We varied the number of clients from 1 to 1000. Each plotted value is an average over a hundred simulation runs, in order to represent the global behavior of the network.
[Figure 2 plot: time to first packet (T2fp) in milliseconds (0 to 5000) versus number of clients in the overlay network (0 to 1000); curves: average T2fp for Peercast-C2, Peercast-C3, Peercast-C10 and Donet.]
Figure 2. Average time to first packet for all simulated protocols.

Figure 2 summarizes the average time to first packet for all the simulated protocols. For Peercast, the average time to first packet increases logarithmically with the number of clients. For Donet, it first increases, then stabilizes and stays almost constant as the number of clients grows. This initial increase is well understood: it is due to our implementation of the Donet protocol, in which deputy clients do not have to perform three requests to get the stream whereas other clients do ([21]).

Firstly, figure 2 shows that the average time to first packet is shorter for Donet than for Peercast. Secondly, the average time to first packet for Peercast is longer for implementations with fewer children and increases logarithmically. For the three implementations of Peercast, a tree is built in the overlay and each client has to walk through the tree to find a peer able to accept it as a child, and traversing a balanced tree from the root takes a number of steps that is logarithmic in the number of nodes. However, the more children a node accepts, the more resources it shares with the other peers. With Peercast-C10, more resources are shared in the network, so a client has to perform fewer requests to find a peer able to accept it as a child because the tree is not as deep. In other words, the average time to first packet is shorter in a tree where nodes are able to accept a greater number of children.

Finally, Donet has a shorter average time to first packet than all implementations of Peercast, which means that Donet is better than Peercast according to this metric. This is due to the way Donet establishes peering relationships: clients establish them with peers obtained randomly from deputy clients. A client is more likely to be redirected towards a peer that still has enough resources, contrary to Peercast, where a client has to walk through the tree structure of the overlay. Clients in a Peercast-C10 network have a shorter average time to first packet than with the other Peercast implementations because they share more resources in the network. However, clients in a Donet network have a shorter average time to first packet than clients in a Peercast-C10 network, although they share as many resources. Indeed, with our implementation of Donet, a client can send the stream to up to ten peers, as with Peercast-C10. Donet improves the global performance of the network according to this metric. This improvement is due to the approach used by Donet and to the randomness provided by epidemic algorithms.
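The relation between the number of children per node and the tree depth can be checked with a small calculation. The sketch below computes the minimum depth of a perfectly filled tree, which is a lower bound: the trees built during the simulations are not necessarily balanced and can be deeper. The function name `min_tree_depth` is hypothetical.

```python
def min_tree_depth(num_clients, max_children):
    """Smallest depth d such that a tree rooted at the source, where every
    node accepts at most `max_children` children, can hold `num_clients`
    clients below the root (the source itself is at level 0)."""
    depth = 0
    capacity = 0      # clients that fit in levels 1..depth
    level_size = 1    # number of nodes at the current deepest level
    while capacity < num_clients:
        level_size *= max_children
        capacity += level_size
        depth += 1
    return depth

if __name__ == "__main__":
    for k in (2, 3, 10):
        print(k, min_tree_depth(1000, k))  # prints depths 9, 6 and 3
```

For 1000 clients, the lower bound falls from 9 levels with 2 children per node to 3 levels with 10 children per node, which is consistent with the shorter join path observed for Peercast-C10.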
3.2. Distribution of clients in the overlay structure

These simulations show the distribution of clients according to their distance, in hops, to the source, i.e. according to their level in the overlay topology. In this experiment, we fixed the number of clients to 1000. Each plotted value is an average over a hundred simulation runs, in order to represent the global behavior of the network.

Figure 3(a) shows that, with Peercast-C10, there is a peak of clients at the 3rd and 4th levels of the overlay structure, so most of the clients are 3 or 4 hops away from the source. Some clients can be as far as 7 hops from the source. With Peercast-C3, the distribution is more evenly balanced from the 4th to the 10th level, which corresponds to a distance of 4 to 10 hops from the source. Some clients can be as far as 15 hops from the source. With Peercast-C2, the distribution seems well balanced from the 6th to the 14th level and resembles a Gaussian distribution. Some clients can even reach the 21st level. With Donet, most of the clients are located between the 4th and the 10th level, as with Peercast-C3, and some clients can reach the 14th level.

Among all Peercast implementations, Peercast-C10 has its clients closest to the source. When a node can accept more children, the tree built in the overlay is not as deep for a network of the same size. This corroborates the previous observation about the average time to first packet. With Peercast-C10, clients are closer to the source in hops, so delays and time to first packet are shorter than in all other Peercast implementations.

With Donet, the distribution of clients in the overlay (with regard to the source) seems similar to the distribution of clients with Peercast-C3. However, the average time to first packet is shorter for Donet than for Peercast-C3 (figure 2). Indeed, with Donet, each client gets the first packet through random peering relationships, regardless of its direct distance from the source. In a Donet network, a client does not have to walk through the whole structure to receive the stream. This is due to the data-driven approach used by Donet, which does not structure the overlay and makes it possible for a client not to depend on the overlay structure.

[Figure 3 plots, with curves for Peercast-C2, Peercast-C3, Peercast-C10 and Donet: (a) Distribution of clients in the overlay structure (percentage of clients versus level in the overlay); (b) Data packet losses rates according to number of leaving clients (percentage of data packet losses rate versus number of leaving clients); (c) Data packet losses rates according to leaving level of clients (percentage of data packet losses rate versus level of departures).]

Figure 3. Time to first Packet.
3.3. Data packet losses rates according to dynamicity of clients

In these experiments, we measure the data packet loss rate over the whole network in relation to the dynamicity of clients. We simulated a one-minute stream during which clients explicitly leave the network. We measure data packet loss rates with two sets of simulations: one where we varied the number of leaving clients and one where clients left the network according to their level in the overlay topology. In these experiments, we fixed the network size to 1000 clients. Each plotted value is an average over ten simulation runs, in order to represent the global behavior of the network. Clients leave the network after 30 s.

Figure 3(b) shows that data packet loss rates increase with the number of clients leaving the network. The data packet loss rate of Donet is smaller than those of all Peercast implementations. In addition, the data packet loss rate of Peercast-C10 is smaller than that of Peercast-C3, which is itself smaller than that of Peercast-C2.

Figure 3(c) shows that the Donet data packet loss rate is much smaller than all Peercast loss rates but varies slightly (from 0.1% to 0.2%) for leaving levels between the 4th and the 10th. Peercast loss rates globally decrease with the leaving level but remain higher. This experiment shows that leaving levels do not have the same effect with Donet and Peercast.
In a Donet network (figure 3(b)), clients get a set of randomly chosen peers to establish peering relationships with and continue to discover other peers during the whole streaming session. If a peer leaves the network, the affected peers already know other peers from which to recover the stream and do not suffer many data packet losses. With Peercast, clients are redirected in the structure and have to walk through it again to find a peer able to accept them. This can take quite long and causes many data packet losses, particularly if a client has to walk through the entire overlay structure. In a Donet network, if all the peers known by a client have left the network, the client has to restart a complete join procedure. We observed previously (figure 2) that the average time to first packet is shorter with Donet than with Peercast, so clients in a Donet network will not suffer as many losses as in a Peercast network.

With Donet, the variation between the 4th and the 10th level comes from the distribution of clients, which are mainly located within these levels (figure 3(a)). The increase is due to the number of leaving clients, not to their level. With Peercast, the overlay topology is a tree and there are fewer clients at smaller levels than at higher levels. Nevertheless, data packet loss rates are higher when clients leave at smaller levels than at higher levels. In other words, for Donet, if the leaving level is densely populated (there are many clients in the leaving level), the network will suffer a relatively high data packet loss rate, whereas for Peercast, departures at levels close to the source cause many data packet losses although these levels are sparsely populated. With Peercast, topologies are rigorously structured as a tree, and downstream clients depend directly on upstream clients. Thanks to the data-driven approach, Donet does not structure the overlay: departures of nodes have only local effects and there are no dependencies between levels in its overlay topology.
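The dependency of downstream clients on upstream clients in a tree can be illustrated by counting how many clients are cut off when a single node leaves. The sketch below assumes an idealized tree filled breadth-first, which is not necessarily how Peercast places clients; `build_tree` and `orphaned_by_departure` are hypothetical helper names.

```python
from collections import deque

def build_tree(num_clients, max_children):
    """Fill a tree breadth-first (idealized assumption). Node 0 is the
    source, nodes 1..num_clients are clients.
    Returns (children, level) dictionaries keyed by node id."""
    children = {0: []}
    level = {0: 0}
    open_parents = deque([0])  # nodes that can still accept a child
    for node in range(1, num_clients + 1):
        parent = open_parents[0]
        children[parent].append(node)
        children[node] = []
        level[node] = level[parent] + 1
        open_parents.append(node)
        if len(children[parent]) == max_children:
            open_parents.popleft()
    return children, level

def orphaned_by_departure(children, node):
    """Number of downstream clients that lose the stream when `node` leaves."""
    count = 0
    stack = list(children[node])
    while stack:
        current = stack.pop()
        count += 1
        stack.extend(children[current])
    return count

if __name__ == "__main__":
    children, level = build_tree(1000, 10)
    for node in (1, 11, 1000):
        print(level[node], orphaned_by_departure(children, node))
```

In this idealized 10-ary tree of 1000 clients, one departure at level 1 cuts off 110 downstream clients, one at level 2 cuts off 10, and one at the last level cuts off none. This matches the observation that departures close to the source are the most harmful in a tree, even though those levels hold few clients.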
For all simulations, we observe that the Donet data packet loss rate is smaller than all Peercast data packet loss rates. According to this metric, Donet is better than Peercast. Donet withstands massive departures of clients thanks to redundant information about other peers, and its random mechanisms make clients independent of the overlay structure. Since information redundancy and randomness are mechanisms provided by the approach used by Donet, it is clearly the data-driven approach that allows Donet to outperform the source-driven approach used by Peercast.
4. Conclusion

Live p2p streaming applications are increasingly common on the Internet. Their use of the Internet is somewhat contradictory, because the Internet provides a best-effort service whereas these applications are delay sensitive. The protocols used by these applications have to be well designed, so that users receive the stream with the required properties and network resources are used efficiently.

From our work, we note that p2p live streaming protocols can use several different approaches to manage the overlay and that this choice drastically influences the global performance of the network. We compared two main approaches with our simulator according to two important metrics: average time to first packet and data packet loss rate.

The simulation results allow us to conclude that the data-driven approach is better than the source-driven approach. Indeed, p2p live streaming protocols require group communication functionalities that have to be implemented in the hosts, but hosts can be very dynamic on the Internet. The data-driven approach, thanks to epidemic algorithms, does not structure the clients in the overlay and uses random mechanisms and redundant information, which prevent and anticipate the effects of the dynamicity of the clients. That is why the data-driven approach seems to be the best approach for this kind of communication.

Our ongoing work aims at improving the simulated scenarios through more realistic network parameters and topologies. We could thus obtain a better model to assess the impact of data-driven p2p live streaming applications on the hosts and on the network itself. We are studying the traffic of recent p2p streaming applications, particularly applications using the data-driven approach, to obtain a finer evaluation of the data transmitted. Moreover, we want to evaluate several aspects related to the data-driven approach, and specifically its scalability, since this approach adds redundant information in the network.
References

[1] http://www.coolstreaming.org.
[2] http://www.peercast.org.
[3] S. Banerjee, B. Bhattacharjee, and C. Kommareddy. Scalable application layer multicast. In SIGCOMM '02: Proceedings of the 2002 conference on Applications, technologies, architectures, and protocols for computer communications, pages 205–217, New York, NY, USA, 2002. ACM Press.
[4] S. Banerjee, S. Lee, B. Bhattacharjee, and A. Srinivasan. Resilient multicast using overlays. In SIGMETRICS '03: Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, pages 102–113, New York, NY, USA, 2003. ACM Press.
[5] M. Bawa, H. Deshpande, and H. Garcia-Molina. Streaming live media over peers. Technical Report, 2002.
[6] M. Bawa, H. Deshpande, and H. Garcia-Molina. Transience of peers & streaming media. SIGCOMM Comput. Commun. Rev., 33(1):107–112, 2003.
[7] M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. I. T. Rowstron, and A. Singh. Splitstream: High-bandwidth content distribution in cooperative environments. In IPTPS, pages 292–303, 2003.
[8] Y.-H. Chu, S. G. Rao, and H. Zhang. A case for end system multicast. In Measurement and Modeling of Computer Systems, pages 1–12, 2000.
[9] C. Diot, B. N. Levine, B. Lyles, H. Kassem, and D. Balensiefen. Deployment issues for the IP multicast service and architecture. IEEE Network, 14(1):78–88, 2000.
[10] V. K. Goyal. Multiple description coding: Compression meets the network. IEEE Signal Processing Magazine, 18(5):74–93, September 2001.
[11] M. Hefeeda, A. Habib, B. Boyan, D. Xu, and B. Bhargava. PROMISE: peer-to-peer media streaming using CollectCast. In MM'03, 2003.
[12] D. Kostic, A. Rodriguez, J. Albrecht, and A. Vahdat. Bullet: High bandwidth data dissemination using an overlay mesh. In Proc. ACM SOSP, 2003.
[13] J. Li. Peerstreaming: A practical receiver-driven peer-to-peer media streaming system. MSR-TR-2004-101, 2004.
[14] S. McCanne, V. Jacobson, and M. Vetterli. Receiver-driven layered multicast. In ACM SIGCOMM, volume 26,4, pages 117–130, New York, Aug. 1996. ACM Press.
[15] A. Medina, A. Lakhina, I. Matta, and J. Byers. BRITE: An approach to universal topology generation. In Proc. ACM Mascots, 2001.
[16] V. S. Pai, K. Kumar, K. Tamilmani, V. Sambamurthy, and A. E. Mohr. Chainsaw: Eliminating trees from overlay multicast. In IPTPS, pages 127–140, 2005.
[17] R. Rejaie and A. Ortega. Pals: peer-to-peer adaptive layered streaming. In NOSSDAV '03: Proceedings of the 13th international workshop on Network and operating systems support for digital audio and video, pages 153–161, New York, NY, USA, 2003. ACM Press.
[18] D. Tran, K. Hua, and T. Do. Zigzag: An efficient peer-to-peer scheme for media streaming. In IEEE INFOCOM, 2003.
[19] V. N. Padmanabhan, H. J. Wang, and P. A. Chou. Resilient peer-to-peer streaming. In Proc. IEEE ICNP, 2003.
[20] M. Zhang, L. Zhao, Y. Tang, J.-G. Luo, and S.-Q. Yang. Large-scale live media streaming over peer-to-peer networks through global internet. In P2PMMS'05: Proceedings of the ACM workshop on Advances in peer-to-peer multimedia streaming, pages 21–28, New York, NY, USA, 2005. ACM Press.
[21] X. Zhang, J. Liu, B. Li, and T. P. Yum. Coolstreaming/donet: A data-driven overlay network for peer-to-peer live media streaming. In Proc. IEEE Infocom, 2005.