Journal of Network and Computer Applications 29 (2006) 25–45 www.elsevier.com/locate/jnca
Assessing the efficiency of stream reuse techniques in P2P video-on-demand systems

Leonardo Bidese de Pinho*, Claudio Luis de Amorim

Laboratório de Computação Paralela, Programa de Engenharia de Sistemas e Computação/COPPE, Centro de Tecnologia, Bloco H, Sala 319, Ilha do Fundão, Caixa Postal 68511, CEP 21941-972, Universidade Federal do Rio de Janeiro, Rio de Janeiro, RJ, Brasil

Received 4 February 2004; received in revised form 21 October 2004; accepted 27 October 2004
Abstract

Many works have reported simulated performance benefits of stream reuse techniques such as batching, chaining, and patching for the scalability of VoD systems. However, the relative contribution of such techniques has rarely been evaluated in practical implementations of scalable VoD servers. In this work, we investigated the efficiency of representative stream reuse techniques on the GloVE system, a low-cost scalable VoD platform whose resulting performance depends on the combination of the stream techniques it uses. More specifically, we evaluated the performance of the GloVE system when delivering multiple MPEG-1 videos to clients connected to the server through a 100 Mbps Ethernet switch, with arrival rates varying from 6 to 120 clients/min. We present experimental results including startup latency, occupancy of the server's channels, and aggregate bandwidth that GloVE demanded for several combinations of stream reuse techniques. Overall, our results reveal that stream reuse techniques in isolation offer limited performance scalability to VoD systems, and that only a balanced combination of batching, chaining, and patching explains the scalable performance of GloVE in delivering highly popular videos with low startup latency while using the smallest number of server channels.
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Video-on-Demand (VoD); Continuous media; Stream reuse techniques; Peer-to-Peer (P2P) systems; Scalability
* Corresponding author. Tel.: +55 21 2562 7851; fax: +55 21 2562 7849.
E-mail addresses: [email protected] (L.B. de Pinho), [email protected] (C.L. de Amorim).

1084-8045/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.jnca.2004.10.009
1. Introduction

In recent years, considerable research efforts have been concentrated on the design of scalable Video on Demand (VoD) systems, since they represent a key enabling technology for several classes of continuous media applications, such as distance learning and home entertainment. In particular, a central problem that VoD system designers face is that, in a typical VoD system with many videos, even the transmission of a single high-resolution video stream in compressed format consumes a substantial amount of resources of both the video server and the distribution network, which limits the scalable performance of the VoD system. Given that users can choose any video and start playback at any time, a significant investment in server hardware is required to support large audiences. Therefore, it is fundamental that a VoD server support stream distribution strategies capable of using the server's resources efficiently in order to reduce its cost/audience ratio.

Usually, a VoD server supports a finite pool of stream channels, where the number of channels is often determined by the server's network bandwidth divided by the video playback rate. The reason that conventional VoD systems cannot scale is that they dedicate a different channel to each active client, in a one-to-one approach (hereafter we refer to the equipment that a client uses to access the VoD service as the active client or simply the client). As a result, the total number of clients a conventional VoD system supports is equal to the limited number of server channels.

The efficiency of a VoD system can be characterized by two metrics: scalability and startup latency. The former accounts for the number of simultaneous active clients a VoD system can support while transmitting video streams without interruptions to the video playback, whereas the latter is the average time between a client request and the start of the video playback.

Due to the scalability limitation of the one-to-one approach, several scalable stream reuse techniques (also known as "stream merging" or "streaming" techniques) have been proposed in the literature. Basically, such techniques allow multiple clients to share the stream contents that are delivered through each of the server's channels, in a one-to-many approach. Two important examples of scalable streaming techniques are Chaining (Sheu et al., 1997), where earlier arriving clients transmit video streams to new clients, forming stream chains across the network, and Patching (Hua et al., 1998), in which extra short streams, namely the patches, are added to active multicast streams so that clients that missed the beginning of any of the streams can catch up.

In previous work, we proposed a new streaming technique, the Cooperative Video Cache (CVC) (Ishikawa and Amorim, 2001), which combines chaining and patching under a P2P model. Essentially, CVC extends the functionality of clients' playout buffers (which clients use to hide network jitter and the inherent rate variations of variable-bit-rate (VBR) videos), by treating them as parts of a globally distributed memory that caches and supplies stream contents to active clients in a cooperative way. In Pinho et al. (2002), Pinho et al. (2003) we reported performance results of a VoD prototype we developed, namely the Global Video Environment (GloVE), which confirmed that CVC adds scalable
performance to conventional VoD servers in practical environments such as video distribution in an intranet.

In contrast with previous works that evaluated the performance of scalable streaming techniques through simulations, in this work we assess the efficiency of representative streaming techniques in practical situations. More specifically, since the GloVE prototype allows the evaluation of streaming techniques either separately or in any combination, we examined their relative contribution to the scalability of the VoD systems that GloVE supports. In addition, we addressed the impact of the popularity distribution of videos on the effective use of the VoD system's resources. In summary, the main contributions of our work are:

† We assessed the relative contribution of stream reuse techniques to the scalable performance of VoD systems in practical settings.
† We quantified the influence of the video prefetching rate on the performance behavior of a VoD server.
† We evaluated the impact of the distribution of video popularity on the VoD system's performance.

The remainder of this paper is structured as follows. In Section 2, we briefly discuss related work. Section 3 presents stream reuse techniques. In Section 4, we describe the main characteristics of the GloVE platform. Section 5 analyzes GloVE's performance results for several configurations of streaming techniques under various workloads. Finally, we conclude in Section 6.
2. Related work

In this section, we describe related work on stream reuse techniques and peer-to-peer systems.

2.1. Works on stream reuse techniques

The techniques we evaluated in this paper were Batching (Dan et al., 1996), Chaining (Sheu et al., 1997), Patching (Hua et al., 1998), and CVC (Ishikawa and Amorim, 2001). We used the Batching technique in a different way than proposed by its authors: in our approach, the clients instead of the server apply the Batching technique (refer to Ghose and Kim (2000) for a classification of Batching policies). Regarding Chaining and Patching, we implemented both techniques in a straightforward way. An extension to Patching that we did not evaluate is Transition Patching (Cai and Hua, 1999), which applies patches to the patch streams themselves.

The original CVC (Ishikawa and Amorim, 2001) introduced the cooperative stream cache over the distributed playout buffers of the clients using only the concepts of Chaining and Patching. In recent works (Pinho et al., 2002; Pinho et al., 2003), we showed that
the Batching technique could improve the performance of CVC in VoD systems that only used unicast. Finally, the work in Bar-Noy et al. (2004) presented a comparison of stream merging algorithms, but the authors did not report results of any practical implementation of the algorithms they analyzed.

2.2. Works on peer-to-peer systems

The explosive growth of peer-to-peer systems started with the development of file-sharing software, notably Napster (Napster) and Gnutella (Gnutella). Napster employed a centralized server to keep the metadata of all the contents the peers stored across the Internet. Gnutella introduced the distributed metadata approach, which obviated the need for central management. GloVE is similar to Napster in the sense that both use a central metadata manager.

Recently, closely related works in peer-to-peer video streaming have been developed: CoopNet (Padmanabhan et al., 2002), P2Cast (Guo et al., 2002), ZigZag (Tran et al., 2003), GnuStream (Jiang et al., 2003), DirectStream (Guo et al., 2003), and PROMISE (Hefeeda et al., 2003) are valuable academic examples, whereas AllCast, vTrails, and C-span (C-star) are examples of commercial software. The idea behind CoopNet is to use clients as stream providers of contents they received from the server previously, but only over periods of server overload (called a "flash crowd"). P2Cast used clients to create application-level multicast trees for streaming, where these clients could send patches to other clients to recover the video sequences they missed. ZigZag also proposed an application-level multicast tree scheme, but its main goal was to support small end-to-end delay. The current version of GloVE uses the IP multicast facility provided by typical LANs, but can be extended to work with application-level multicast as well. GnuStream introduced the novelty of working over the widespread Gnutella network, which already offered a huge amount of continuous media content. DirectStream is an on-demand video streaming service that uses a directory server to administer active clients in a manner very similar to that of GloVE. PROMISE offered support for client heterogeneity, reliability, and limited capacity, which made the system suitable for the Internet, as shown in the experiments the authors reported. This is not the case for our current version of GloVE. In relation to commercial products, detailed technical information was unavailable, so no direct relationship could be made with GloVE. Also, the academic works above reported simulated performance of their proposals, in contrast with our work, which evaluated a prototype using realistic network traffic.
3. Stream reuse techniques

This section briefly presents scalable streaming techniques that are representative examples of the class of multicast-based stream reuse techniques proposed in the literature (Ma and Shin, 2002). These techniques can be categorized as proactive or reactive, depending on whether the VoD server periodically transmits video streams through its channels independently of client requests, or sends streams through its channels in response to client requests.
3.1. Proactive streaming techniques

3.1.1. Broadcasting
Usually, broadcast schemes (Hu, 2001) divide the video stream into segments that are transmitted periodically to clients through dedicated server channels. In this way, while a video segment plays back, the server schedules and transmits the subsequent video segment in due time, thus guaranteeing no interruptions to the video playback. Despite its high scalability, broadcasting wastes many channels and imposes high latency (typically on the order of hundreds of seconds).

3.2. Reactive streaming techniques

3.2.1. Batching
In this approach, requests issued within a certain time interval for the same video are first enqueued, and the group of requests is served afterwards by a single multicast stream (Dan et al., 1996). Two scheduling alternatives can be used, based either on the queue's length or on the average waiting time of a request. In the former, a multicast stream is sent to the waiting clients after the queue length reaches a determined value. In the latter, arriving clients are inserted into the queue, forming a group that waits until a determined time, after which a video stream is transmitted to that waiting group. Note that early clients will wait longer than later ones, which tends to create variations in startup latency, especially when batching allows relatively long queuing times in an attempt to reduce the occupancy of the server's channels.

3.2.2. Chaining
The idea behind Chaining (Sheu et al., 1997) is to create chains of client buffers. Specifically, as long as an early client holds the initial part of a video in its playout buffer, the early client rather than the video server can service subsequent requests for the video, thus alleviating the demand on the server. We assume clients start video playback from the beginning. Note that chaining follows the peer-to-peer model, which is highly scalable. In practice, however, the scalability of chaining depends on the aggregate bandwidth the distribution network supports, since it ultimately limits the number of active streams across the network.

3.2.3. Patching
In this technique (Hua et al., 1998), the server starts a full video multicast to the first client that requested the video. It has to be a full stream transmission because the server has to provide all video segments in sequence to the first client. However, subsequent clients that request the same video are added to the existing video multicast group while storing the streaming contents they receive in their local buffers. Concurrently, the server transmits to each new client an extra video segment, namely the patch, which contains the initial video segment the new client missed. Once the patch arrives, the client can start to play back the video patch, after which it switches to its local buffer to continue playing back the remaining video segments. Assuming the VoD system uses a stream rate equal to its playback rate, the length of the patch stream will be proportional to the time interval between the request issued by the first client and the current request. Thus, a potential disadvantage of patching is that it requires playout buffers with enough capacity to cope with large patches.

3.2.4. Cooperative video cache (CVC)
This technique implements a global video cache management over the video streams that have been stored in the clients' local buffers across the communication network. In addition, it uses a combination of chaining and patching when reusing video streams from the global video cache. Basically, CVC reuses the distributed video contents by chaining client buffers or applying patches to recycle active multicast streams in such a way that minimizes communication traffic (Ishikawa and Amorim, 2001).

In past works (Pinho et al., 2002; Pinho et al., 2003), we added Batching to the original CVC to enable clients arriving within a given time interval to form multicast groups. Batching is particularly suitable when CVC is coupled with a unicast pull-based server (VoD servers are defined as either push-based or pull-based depending on whether the server or the client, respectively, determines the transmission rate). In this case, the opportunities for stream reuse come from the multicast streams that clients generate. A client can multicast a video stream provided that it has control over the delivery time of video segments. An efficient manner to enforce time control is to synchronize the video transmission with either its reception rate or its consumption rate. Since the reception rate is inadequate for pull-based servers, the consumption rate becomes the natural control option. However, using the latter is limited by the fact that a client only starts to play back the video when its playout buffer reaches a minimum content level, which we call the 'prefetch limit'. Thus, a client cannot multicast the video it receives until the prefetch limit is reached. In the meantime, the server has to handle new requests, thus increasing the occupancy of its channels. This situation is exacerbated when the client arrival rate is high, since many server channels will be busy. Batching enables CVC to hide this negative effect, because it allows incoming requests to join the multicast group associated with the first client prefetching the video. As a result, once the playout buffer of the first client hits the prefetch limit, it starts transferring blocks not only to its video decoder but also to its multicast group.

In this work, we concentrate on the use of reactive streaming techniques to build scalable VoD servers with low startup latency.
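To make the interplay of the two reuse options concrete, the short Python sketch below outlines how a CVC-style manager could pick a stream source for a newcomer that requests a video from its beginning. It is a minimal illustration written for this description, not the GloVE implementation; the function and field names are hypothetical, and the reuse windows are expressed in blocks as in Section 5.2.

```python
# Minimal sketch (not the GloVE code): choosing a stream source for a client
# that requests a video from block 0. Field and function names are hypothetical.

def choose_reuse(providers, patch_limit, discard_limit):
    """providers: active clients, each described by the last block it played
    ('play_pos'), whether it still buffers block 0 ('has_block_0'), and whether
    it already feeds a multicast group ('multicast').
    patch_limit / discard_limit: reuse windows expressed in blocks."""
    # Chaining: an earlier client that has not yet discarded the beginning of
    # the video can serve the newcomer a complete stream from its buffer.
    for p in providers:
        if p["has_block_0"] and p["play_pos"] <= discard_limit:
            return ("chain", p, 0)                    # full stream, no patch needed
    # Patching: join an active multicast and fetch a short patch that covers
    # the prefix the newcomer missed, if that prefix is still small enough.
    patchable = [p for p in providers
                 if p["multicast"] and p["play_pos"] <= patch_limit]
    if patchable:
        best = min(patchable, key=lambda p: p["play_pos"])
        return ("patch", best, best["play_pos"])      # patch length in blocks
    # Otherwise the request falls back to a brand-new stream from the server.
    return ("server", None, 0)
```

This sketch tries chaining before patching, a PEC-like ordering; Section 4.4 discusses the two orderings that GloVE actually supports.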
4. The GloVE system

In this section, we briefly describe the main features of GloVE (Global Video Environment), a scalable peer-to-peer VoD system that we built and used to assess the efficiency of stream reuse techniques in practical situations. GloVE (Fig. 1) implements the cooperative video cache while supporting a peer-to-peer system with a centralized metadata component, namely the CVC Manager (CVCM). CVCM globally monitors the video contents that are stored in the playout buffers of the clients. The CVC Client (CVCC) software allows the content of the client buffer it manages to be shared with other clients, thus potentially reducing the demand on the CVC Server's (CVCS) channels. GloVE assumes that the communication network has IP multicast support and that the VoD server manages the video as a sequence of blocks that are randomly accessible (in the current design of GloVE, the server always sends unicast streams; in contrast, clients potentially send multicast streams when ordered by CVCM, although they may have only a single receiver in some cases). Next, we describe the GloVE components in more detail.

Fig. 1. The architecture of the Global Video Environment (GloVE).

4.1. The CVC server (CVCS)

CVCS works as a block-based, unicast-only video server. CVCS receives requests from clients and sends back 64 KB blocks to them using the UDP protocol over unicast transmission. CVCS guarantees the ordered arrival of blocks at the clients by retransmitting sub-blocks in case they are corrupted or lost. Clients can use two types of access to the server:

† Coarse access. This access type happens when clients request blocks from the server without controlling the transmission rate. In this case, the response time a client experiences depends on the available server bandwidth, which is inversely proportional to the number of connected clients that receive video streams from the server. For coarse accesses using a Fast Ethernet switch (100 Mbps), the duration of prefetching ranges from less than one second up to the elapsed time equivalent to the size of the prefetching segment of the playout buffer.
† Smooth access. This type of access refers to clients requesting blocks at the average video playback rate. Using smooth access, the duration of prefetching is proportional to the elapsed time equivalent to the size of the prefetching segment of the playout buffer.
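The difference between the two access types is essentially one of request pacing. The fragment below is an illustrative client-side sketch, assuming a hypothetical "GET <block>" request format together with the 64 KB block size and the 0.345 s average block playtime reported in Section 5.2; it is not the actual CVCS/CVCC protocol.

```python
import socket
import time

BLOCK_SIZE = 64 * 1024      # server access unit
BLOCK_PLAYTIME = 0.345      # average seconds of video per block (Section 5.2)

def fetch_blocks(server_addr, first_block, count, smooth=True):
    """Request `count` blocks over UDP. With smooth=True, requests are paced at
    the average playback rate, so prefetching 32 blocks takes about 11 s; with
    smooth=False (coarse access) requests are issued back to back and complete
    as fast as the server's spare bandwidth allows."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(1.0)
    blocks = []
    for i in range(first_block, first_block + count):
        deadline = time.monotonic() + (BLOCK_PLAYTIME if smooth else 0.0)
        for _ in range(3):                           # naive retry on packet loss
            sock.sendto(f"GET {i}".encode(), server_addr)
            try:
                data, _ = sock.recvfrom(BLOCK_SIZE + 64)
                blocks.append(data)
                break
            except socket.timeout:
                continue
        time.sleep(max(0.0, deadline - time.monotonic()))  # smooth-access pacing
    return blocks
```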
4.2. The CVC client (CVCC)

CVCC runs on the client device and manages its local playout buffer. CVCC uses multiple threads (Fig. 2) to implement the following main tasks: (1) to request/receive blocks to fill the buffer, (2) to retrieve blocks from the buffer and send them to a third-party video decoder software, and (3) to send blocks through a multicast stream when signaled by the CVC manager. These tasks are executed by five threads:

† ReadServer thread. It runs when there is no opportunity for reusing video content from another client, and it fills up the playout buffer by retrieving video blocks from the server.
† ReadMulticast thread. It controls the reception of video blocks that come from the provider's multicast group and stores them in the playout buffer; it also unblocks the Recover thread if some block is missing.
† PlaySend thread. It stays blocked until the playout buffer is filled up to the prefetch limit, when it pipelines blocks from the playout buffer to the video decoder and also towards its multicast group if signaled by the Control thread.
† Control thread. It remains blocked waiting for a message from CVCM to either start or stop supplying video streams.
† Recover thread. This thread wakes up when a video block is missed, and requests a new provider from CVCM. If a new provider is available, the ReadMulticast thread is restarted; otherwise, the Recover thread enables ReadServer to retrieve blocks from the server.

The current GloVE design assumes that the decoder is able to receive content from the standard input. In this way, the decoder is linked to the application through a pipe, allowing the client to read blocks in a manner synchronized with its video playback rate.
Fig. 2. The framework of the CVC Client (CVCC).
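As an illustration of the PlaySend role described above, the following Python sketch waits for the prefetch limit, then pipes blocks from the playout buffer to a decoder reading from standard input and, when the manager has designated the client as a provider, forwards the same blocks to its multicast group. The class and parameter names are assumptions made for this example, not the CVCC source.

```python
import queue
import subprocess
import threading
import time

PREFETCH_LIMIT = 32   # blocks buffered before playback may start (Section 5.2)

class PlaySendSketch(threading.Thread):
    def __init__(self, playout_buffer: queue.Queue, decoder_cmd, mcast_send=None):
        super().__init__(daemon=True)
        self.buf = playout_buffer       # filled by ReadServer / ReadMulticast
        self.decoder_cmd = decoder_cmd  # decoder that consumes video on stdin
        self.mcast_send = mcast_send    # callable installed by the Control thread

    def run(self):
        # Stay blocked until the playout buffer holds the prefetch limit.
        while self.buf.qsize() < PREFETCH_LIMIT:
            time.sleep(0.1)
        decoder = subprocess.Popen(self.decoder_cmd, stdin=subprocess.PIPE)
        while True:
            block = self.buf.get()      # blocks are queued in playback order
            if block is None:           # end-of-video sentinel
                break
            decoder.stdin.write(block)  # the pipe paces us at the consumption rate
            if self.mcast_send is not None:
                self.mcast_send(block)  # also feed the client's multicast group
        decoder.stdin.close()
        decoder.wait()
```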
4.3. The CVC manager (CVCM)

CVCM usually runs on the same node as the server, and keeps track of all the active clients in the system by maintaining information on the video blocks each client last received and last transmitted. Based on these data, CVCM uses a finite-state machine (FSM) to categorize each client's state dynamically. The FSM defines three states, with transitions driven by the incoming requests, as follows:

† Prefetching video blocks, in which CVCM uses Batching for reusing contents. A client stays in this state until its playout buffer reaches the prefetch limit.
† Playback without discarding, in which CVCM creates a new full multicast stream. A client changes to this state if it had not become a provider during the prefetching phase. After this phase, it can only be used to create a new multicast, provided that the incoming video stream has not exceeded the playout buffer's size.
† Providing reusable stream. In this state, CVCM inserts a new client into the group of receivers of an already active multicast stream, and orders the server to send a patch to the incoming client. A client enters this state if it is asked to provide streams during one of the two above states. A client remains in this state for a period of time equivalent to half of the playout buffer's length.

4.4. Stream reuse policies

Given several options for stream reuse, CVCM has to implement effective policies that maximize the overall system performance. Since each stream reuse technique has pros and cons, and since CVC can dynamically select a suitable reuse technique, we established the following two main policies:

† Policy with emphasis on chaining (PEC), where the CVC manager always tries first to create a new multicast stream from an active client that is playing the video and has not yet discarded the initial block. If this is not possible, CVC tries to reuse some active multicast stream using Patching. In general, this strategy creates long chains of active clients, where the majority of the multicast streams have just a single receiver. This facilitates the task of recovering from broken chains. However, the PEC policy may lead to excessive aggregate bandwidth consumption.
† Policy with emphasis on patching (PEP), which gives higher priority to reusing the current multicast stream, so it requires less aggregate bandwidth. However, it increases the complexity of recovering from broken chains, as several clients share the same stream. Also, the PEP policy is likely to use short patch streams often.

4.5. Operation modes

Another significant feature of CVCM is that it allows the evaluation of alternative operation modes using different combinations of Chaining, Patching, and Batching. In this work, we evaluated our VoD prototype for the five operation modes we defined, as follows.
† Conventional mode, which mimics a conventional VoD system without any stream reuse technique.
† Chaining mode, which implements chaining as we proposed, creating client chains where each stream has only one receiver.
† Batching mode, in which clients that request the same video form a single group, to which the first client that receives the stream will multicast.
† Patching+Batching mode, which implements a combination of the patching and batching modes.
† CVC+Batching mode, which implements CVC with the Batching extension. For this mode, we concentrated our performance analysis on the policy with emphasis on patching (PEP) only, since it produced the best results according to our past experiments (Pinho et al., 2002; Pinho et al., 2003).
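The operation modes above are simply different subsets of the three reactive techniques, combined with one of the two reuse policies. The sketch below shows one way such a configuration could be expressed; the flag and mode names are illustrative and do not come from the GloVE code.

```python
# Hypothetical configuration sketch: each GloVE operation mode evaluated in
# Section 5 can be seen as a combination of the three reactive techniques.

OPERATION_MODES = {
    "conventional":      {"batching": False, "chaining": False, "patching": False},
    "chaining":          {"batching": False, "chaining": True,  "patching": False},
    "batching":          {"batching": True,  "chaining": False, "patching": False},
    "patching+batching": {"batching": True,  "chaining": False, "patching": True},
    "cvc+batching":      {"batching": True,  "chaining": True,  "patching": True},
}

def reuse_order(mode, policy="PEP"):
    """Return the order in which a CVCM-like manager would try reuse options.

    policy: 'PEC' tries chaining before patching, 'PEP' the opposite
    (Section 4.4). Batching applies only while a provider is still prefetching.
    """
    flags = OPERATION_MODES[mode]
    order = []
    if flags["batching"]:
        order.append("batch-with-prefetching-client")
    pair = ["chaining", "patching"] if policy == "PEC" else ["patching", "chaining"]
    order += [t for t in pair if flags[t]]
    order.append("new-server-stream")
    return order

# Example: reuse_order("cvc+batching", "PEP")
# -> ['batch-with-prefetching-client', 'patching', 'chaining', 'new-server-stream']
```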
5. Experimental methodology

In this section, we present our testbed, the evaluation methodology, and the experimental results we obtained.

5.1. Hardware platform

The experiments we describe below were performed on a 6-node cluster of Intel Pentium 4 2.4 GHz machines with 1 GB of RAM, running Rocks Linux 2.3.2 with kernel 2.4.18 and using a Fast Ethernet switch with IP multicast support (3Com SuperStack II 3300, version 2.69). One node executed both the video server (CVCS) and the CVC manager (CVCM), whereas each of the other five nodes executed multiple instances of clients.

5.2. Evaluation methodology

We used an MPEG-1 video (352×240, 29.97 frames/s) of the Star Wars IV movie, with an average transmission rate near 1.45 Mbps. We allowed multiple clients per node, for which we developed an emulator of the MPEG-1 decoder, which used a trace file containing the playback time of each segment of the video. The trace file was obtained by collecting the block consumption pattern of the real decoder over a previous execution of the video. The workloads we assumed range from medium to high client arrival rates according to a Poisson process, with values of 6, 10, 20, 30, 60, 90, and 120 clients/min. We assumed that the CVCS video server supports at most 56 simultaneous MPEG-1 streams, according to previous measurements we made using the same network infrastructure. Also, we evaluated the playout buffer for sizes of 64, 128, and 256 blocks, which could store video sequences of 22, 44, and 88 s, respectively.

We evaluated the performance of the VoD system under practical network conditions for all defined operation modes, except the conventional one. By practical conditions we mean that, instead of simulation (MacGregor, 2002), we employed an emulation approach by setting the GloVE variables as follows: (1) 64 KB blocks as the system access unit; (2) prefetch limit of 32 blocks; (3) patch limit equal to half of the playout buffer size minus a tolerance of four blocks; and (4) discarding limit equal to the playout buffer size minus the prefetch limit minus a tolerance of eight blocks. The CVC manager introduced a tolerance of four blocks due to possible differences between the video contents that the playout buffers actually stored and the metadata on the playout buffers across the system, which the manager maintained.

The values of the GloVE variables determine the time window in which contents can potentially be reused. Given the average consumption rate of the video we used in the tests, the average block playtime was equal to 0.345 s. The duration of the prefetch phase was determined by the stream source, as follows. For coarse access to the server, the prefetch duration depended on the number of active streams in the server, with a minimal value near 350 ms. For smooth access to the server, the prefetch duration was about 11 s. When a client was the streaming source, the prefetch duration depended on the kind of reuse employed, ranging from a few seconds with patching up to 22 s with batching. As an example, consider a playout buffer of 128 blocks and a patch limit of 64 blocks. In this case, the streams that the clients generated could be reused for patching within a 20 s interval (block playtime × patch limit) after their creation. Moreover, a client could provide a new complete stream up to 30 s (block playtime × discarding limit) after starting playback.

For the experiments with multiple videos, we emulated a collection of eight videos, where the video popularity followed a general Zipf distribution with α=0.7 (Dan et al., 1996). In addition, we also investigated the sensitivity of the VoD system to α=0 and α=1, which represent a uniform distribution and a pure Zipf distribution, respectively.

5.3. Experimental results

We present the performance results of our experiments in three separate parts: single-video performance, multiple-video performance, and a sensitivity analysis of the server's performance to the popularity distribution of the videos. We used 3D graphs to plot the performance results of our VoD server for several performance metrics. Specifically, we plotted the number of busy channels and active streams for several combinations of client arrival rate, playout buffer size, and operation mode. The main reason for using 3D rather than 2D curves was that 3D curves are much more suitable for showing visually the main performance differences between the combined effects of stream reuse techniques with multiple variables.

5.3.1. Experiments with a single video

5.3.1.1. Channel utilization. Fig. 3 shows the utilization of the server's channels using block request rates with smooth and coarse access for different GloVE modes and playout buffer sizes. Note that channel occupancy indicates the relative degree of scalability of each particular technique. Specifically, the lower the channel occupancy a technique produced for a given number of active clients, the higher was its scalability. Fig. 3(a) shows that the CVC+Batching mode occupied only one channel, which is the minimum amount of channels (MAC), for playout buffers with size larger than 64 blocks.
Fig. 3. Utilization of server channels under each GloVE operation mode, (a) CVC+Batching, (b) Chaining, (c) Patching+Batching, (d) Batching.
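Several of the effects discussed next follow from how often a new request arrives within the reuse window left open by a previous client. As a rough, illustrative check (not a measurement from the testbed), the snippet below estimates the probability of missing a reuse window of a given length under the Poisson arrival assumption of Section 5.2.

```python
import math

def p_miss_window(rate_per_min, window_s):
    """Probability that the gap to the previous request exceeds the reuse
    window, assuming Poisson arrivals (exponential inter-arrival times)."""
    lam = rate_per_min / 60.0                 # arrivals per second
    return math.exp(-lam * window_s)

# 8 s chaining and 10 s patching windows quoted for 64-block playout buffers
for rate in (6, 10, 20, 60):
    print(rate, round(p_miss_window(rate, 8), 3), round(p_miss_window(rate, 10), 3))
# At 6 clients/min roughly 45% of the gaps exceed 8 s, while at 60 clients/min
# almost none do, which is in line with the occupancy trends discussed below.
```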
For arrival rates lower than 20 clients/min, CVC+Batching showed a small increase in channel occupancy because the average interval between consecutive requests was at least 6 s. Recalling that 64-block playout buffers allowed stream reuse after starting playback within intervals of at most 8 s for Chaining and 10 s for Patching, the chance of clients missing the reuse window was high for arrival rates below 20 clients/min, which explains the increase in channel occupancy at these rates. As the reuse window depended on prefetching for the modes that included Batching, the opportunity for stream reuse was higher for smooth accesses to the server. The reason is that smooth accesses generated long prefetchings, leading to lower channel occupancy in comparison with coarse accesses, which issued short prefetching operations.

Regarding buffer size, the Chaining mode (Fig. 3(b)) also presented a different behavior with 64-block playout buffers for low arrival rates. In particular, the reuse window was limited to Chaining within 8 s intervals, which restricted MAC to eight channels. Another characteristic of the Chaining mode was the significant influence of prefetching for arrival rates higher than 20 clients/min. In this case, the higher the arrival rate, the higher the probability that a client arrived while previous ones were either prefetching or providing video streams. Note that when coarse accesses were used, the prefetching phase was shorter and the channel occupancy presented slightly lower values.

Fig. 3(c) shows that the Patching+Batching mode was strongly influenced by the server access type. The fact was that this mode required initial stream reuse using
Batching, because a multicast stream had to be active to allow reuse via Patching. Given that prefetching for short streams used coarse access to the server, the use of channels tended to be very high for arrival rates lower than 60 clients/min. In contrast, with smooth access the opportunity for stream reuse using Batching increased significantly, so that MAC was reached at 20 clients/min. Note that the playout buffer size influenced the chances of using Patching, especially when using larger buffers for low arrival rates, which led to better use of channels.

As can be seen in Fig. 3(d), the server access type also determined the efficiency of the Batching mode. Specifically, when using the server with smooth access, channel occupancy was very similar to that achieved by the Patching+Batching mode. However, Batching did not benefit from larger buffers, in contrast with Patching+Batching, which improved for low arrival rates. The difference between the channel occupancy curves of the two modes came from the longer average prefetching that the Batching mode generated for arrival rates ranging from 20 to 90 clients/min. When Patching was used, the average prefetching was substantially reduced, decreasing the opportunity of stream reuse for the next arriving clients.

5.3.1.2. Active streams. Fig. 4 shows the number of active streams in the system for each operation mode. The values in the figure indicate the aggregate bandwidth required by each mode. Note that the minimum number of active streams was at least two, which included one stream each from the server and from the first client.

Fig. 4. Number of active streams under each GloVE operation mode, (a) CVC+Batching, (b) Chaining, (c) Patching+Batching, (d) Batching.

For the CVC+Batching mode, as shown in Fig. 4(a), the number of active streams was practically the same whether using smooth or coarse accesses to the server. The size of the playout buffer only influenced the number of active streams for low arrival rates. Another factor that affected the number of active streams was the type of stream reuse. Whenever Patching could be used, no full streams were generated. Otherwise, Chaining was attempted, generating a new stream whenever possible. The latter alternative occurred often for low arrival rates and small playout buffers.

Intuitively, the Chaining mode (Fig. 4(b)) led to the same number of active streams and clients, since Chaining generated streams that had only one receiver, and the Chaining mode was independent of buffer size and server access type.

Fig. 4(c) presents the curves of active streams for the Patching+Batching mode. As the figure shows, this mode was influenced by both the playout buffer size and the server access type, especially for arrival rates lower than 60 clients/min. For smooth access, the server had more opportunities for stream reuse through Patching, which led to fewer active streams. For coarse access, as the arrival rate decreased, stream reuse through Batching increasingly produced streams with just one receiver, and many more streams came from the server because no stream reuse was possible.

The number of active streams for the Batching mode, shown in Fig. 4(d), depended on both the arrival rate and the server access type. For arrival rates higher than 30 clients/min, the influence of the server access type was negligible, with many clients sharing the same streams. For arrival rates lower than 30 clients/min, the majority of streams for a server with coarse access came from the server itself, because there was no opportunity to use Batching. When using a server with smooth access, Batching dominated the generation of streams.

5.3.1.3. Startup latency. The latency between a video request and its playback is shown in Fig. 5. Intuitively, an efficient and balanced VoD system should provide short startup latency to all clients with minimum time variance between them.

Fig. 5. Error bars showing playback start latency (max, avg, min) under different modes using playout buffers of 128 blocks, (a) with smooth prefetching, (b) with coarse prefetching.

Fig. 5(a) presents the startup latency for a server with smooth accesses. The figure shows the average and range (minimum and maximum) values we measured for each client arrival rate. The average latency for the CVC+Batching mode was shorter than 10 s, which was slightly less than the latency of 11 s that clients experienced when they received streams from the server at the playback rate. The minimum latency value was near 5 s, which occurred when Patching was used and the patch stream was equivalent to half of the prefetch limit (16 blocks × block playtime = 5.5 s). The maximum value was near 18 s, which happened when a client was inserted into the group of a prefetching client (maximum latency = 2 × prefetch limit = 22 s). For the Chaining mode, the average latency was near 11 s, and the minimum and maximum values were very close to the average. This occurred because in Chaining all video streams followed the playback rate, and no waiting time elapsed between issuing the video request and starting the prefetching of the video stream. The Patching+Batching mode behaved similarly to CVC+Batching. In particular, it led to a slight decrease in the average latency for arrival rates between 20 and 60 clients/min, because the use of Patching increased within this range. For the Batching mode, the average latency was near 13 s, because many clients were inserted into the group of a client that had already performed the prefetching of the video, which also increased the minimum startup latency to about 8 s.

Fig. 5(b) presents the startup latency for a server with coarse accesses. Notice that all modes had a minimum latency near 350 ms, which was experienced by the first client that received a stream from the server. For the CVC+Batching mode, the average latency was close to 4 s, with a maximum value less than 12 s, which happened to a client that was inserted into the group of a prefetching client (350 ms + 11 s). For the Chaining mode, the average latency was approximately 9 s for arrival rates from 10 to 30 clients/min. However, for higher arrival rates, the average latency started to decrease due to the prefetching effect, which required additional server channels. The maximum startup latency was similar to that of CVC+Batching. The Patching+Batching mode had an average startup latency that depended on the client arrival rate. As the arrival rate increased, so did the average startup latency, because more clients were served through Patching instead of by the VoD server. The maximum startup latency was close to 9 s, and followed that of Patching. For the Batching mode, the average latency increased with the arrival rate up to 60 clients/min, at which point the average latency was 12 s, maintaining this value for higher arrival rates. Batching produced the maximum value of 17 s for the average startup latency with this server access type. This can be explained as the result of a new client being inserted into the multicast group of a previous client that was prefetching the same video.

5.3.2. Experiments with multiple videos

5.3.2.1. Channel utilization. Fig. 6 presents the utilization of the server's channels for different GloVE modes and eight videos stored in the VoD server. The videos' popularity followed a Zipf distribution with α=0.7. Note that at least one channel was used for each different video the clients requested. In the worst case, each new client request used one channel, which ultimately represented the behavior of a conventional VoD system.

Fig. 6. Utilization of server channels under each GloVE operation mode, (a) CVC+Batching, (b) Chaining, (c) Patching+Batching, (d) Batching.

The curves of Fig. 6(a) show that the server used the minimum amount of channels (MAC) when GloVE reached at least 20 clients/min with 256-block playout buffers. The curve for 128-block playout buffers indicates that MAC occurred at 60 clients/min. Recall that for a single video, MAC was reached near 10 clients/min. The same relationship remained for 64-block playout buffers, which demonstrates the direct influence of the number of videos on the occupancy of the server's channels. The negative impact of the server access type on the occupancy of the server's channels was significant for the intermediate playout buffer size and intermediate arrival rates. This result contrasts with the case of a single video, where the negative impact was restricted to the smallest playout buffers at low arrival rates.

Similarly to the single-video case, the Chaining mode (Fig. 6(b)) continued to suffer the negative consequence of prefetching, especially for arrival rates higher than 30 clients/min. At this arrival rate, we noticed that the minimum server occupancy was 16 channels. For the same reason as in the single-video case, Chaining performed better using coarse accesses to the server.

For the Patching+Batching mode, the observations we pointed out for the single-video case also held for multiple videos, as can be seen in Fig. 6(c). The main difference was that there were considerably fewer opportunities to reuse streams through either Batching or Patching. For arrival rates ranging from 60 to 120 clients/min, MAC was reached with smooth accesses to the server. Note that with coarse access to the server, MAC was 20 channels and was reached only at the highest client arrival rate.
In Fig. 6(d), the channel occupancy behavior was very similar to that of the Patching+Batching mode analyzed previously. However, it led to slightly higher channel occupancy for arrival rates ranging from 10 to 90 clients/min.

5.3.2.2. Active streams. The number of active streams appears in Fig. 7. The minimum number of active streams tended to be 16, provided that all the videos were requested at least twice.

Fig. 7. Number of active streams under each GloVE operation mode, (a) CVC+Batching, (b) Chaining, (c) Patching+Batching, (d) Batching.

In CVC+Batching mode (Fig. 7(a)), the curves remained similar for both coarse and smooth accesses. The size of the playout buffer affected the performance for arrival rates from low to medium. As explained in the case of a single video, the emphasis on Patching led to the lowest number of active streams. Naturally, Chaining tended to be used more at lower arrival rates and with smaller playout buffers. In Chaining mode (Fig. 7(b)), the number of active streams was equal to the number of active clients, as in the single-video case. A comparison of the Patching+Batching and Batching modes, shown in Fig. 7(c) and (d), suggests that both modes performed similarly. The main differences appeared for playout buffer sizes of 128 and 256 blocks. With either playout buffer size, the Patching+Batching mode generated fewer streams due to the opportunity of stream reuse through Patching, which was unavailable in the Batching-only mode. Also, while the former achieved a minimum count of 16 streams, the latter created at least 26 streams.

5.3.2.3. Startup latency. The startup latency between a client request and the start of video playback is shown in Fig. 8.

Fig. 8. Error bars showing playback start latency (max, avg, min) under different modes using playout buffers of 128 blocks, (a) with smooth prefetching, (b) with coarse prefetching.

Fig. 8(a) shows the startup latency when using a server with smooth access. Again, all values the figure presents were very similar to those of a single video, so the previous observations remain valid. Fig. 8(b) presents the startup latency for a server with coarse accesses. In contrast with smooth access, the average values for this access mode revealed significant changes to the startup latency, especially for the Patching+Batching and Batching modes. As the number of clients that received streams from the server rose significantly, the average startup latency decreased to near 1 s for arrival rates lower than 60 clients/min. For higher arrival rates, the average startup latency started to increase.

5.3.3. Effect of the popularity distribution of videos on server's performance

Fig. 9 illustrates the impact of the distribution of video popularity on the occupancy of the server's channels.
Fig. 9. Utilization of server channels under different values of Zipf skew, (a) smooth access, (b) coarse access.
Table 1
Summary of results, average (Avg) and corresponding standard deviation (SD), measured for a single video and a playout buffer size of 128 blocks

                      Server channels                Amount of streams              Latency (s)
                      Smooth         Coarse          Smooth         Coarse          Smooth         Coarse
Operation mode        Avg    SD      Avg    SD       Avg    SD      Avg    SD       Avg    SD      Avg    SD
CVC+Batching          1.2    0.6     1.4    0.9      8.7    7.1     8.7    4.1      10.1   0.5     4      0.4
Chaining              17.3   4.3     14     3.6      55.7   0.6     55.5   4.7      11     0.1     7.9    1.3
Patching+Batching     5.8    9.6     23.6   18.3     10.4   6.8     26.5   18.4     10.3   0.4     2.2    0.7
Batching              6.6    13.8    17.1   18.9     18.2   1.1     26.1   19       13.2   0.3     9.9    3
Table 2
Summary of results, average (Avg) and corresponding standard deviation (SD), measured for multiple videos and a playout buffer size of 128 blocks

                      Server channels                Amount of streams              Latency (s)
                      Smooth         Coarse          Smooth         Coarse          Smooth         Coarse
Operation mode        Avg    SD      Avg    SD       Avg    SD      Avg    SD       Avg    SD      Avg    SD
CVC+Batching          14.5   7.9     15.7   8.9      31.2   12.1    31.9   11.7     10.9   0.1     4.5    0.6
Chaining              30.9   2.8     25.8   4.2      55.6   0.9     55.3   1.5      10.9   0       5.8    0.6
Patching+Batching     24.6   16.1    49.2   9.2      35.2   14.8    51.9   7.1      11     0.1     1.2    0.6
Batching              26     15.9    49.8   8.3      43.4   11.2    53.8   4.1      12.3   0.8     1.5    1.1
As can be seen in Fig. 9(a) and (b), the variations in the distribution of popularity of videos did not significantly impact the number of channels the VoD system required to service the client requests. Indeed, none of the operation modes presented variations higher than 20% in the occupancy of the server's channels.

5.4. Discussion

In the following, we discuss the overall results presented in the previous section. We consider the average values we obtained with playout buffers of 128 blocks for arrival rates of 6, 10, 20, 30, 60, 90, and 120 clients/min (except for startup latency and the popularity distribution of videos, which did not consider the value of 6 clients/min).
Table 3
Average channel usage for different Zipf skews and a playout buffer size of 128 blocks

                      Zipf with α=0          Zipf with α=0.7        Zipf with α=1
Operation mode        Smooth    Coarse       Smooth    Coarse       Smooth    Coarse
CVC+Batching          13.6      14.8         12.1      13.1         11.3      11.9
Chaining              30.7      25.9         30.3      24.6         27.0      22.9
Patching+Batching     23.4      47.7         20.5      48.1         18.5      47.3
Batching              24.7      48.2         22.1      48.8         20.3      47.6
Table 1 presents the results for a server with a single video. These values show that the CVC+Batching mode achieved the best results for all the performance metrics we evaluated.

Table 2 summarizes the average results for multiple videos. Again, the CVC+Batching mode outperformed the other modes for all the performance metrics we tested.

Finally, Table 3 shows the average channel occupancy for several distributions of video popularity. The results in the table clearly demonstrate the low impact of the α value on channel occupancy. Also, they support the evidence that the system should perform well in widely different environments.
6. Conclusions

In this work, we compared the efficiency of reactive stream reuse techniques in a practical peer-to-peer VoD system, the Global Video Environment (GloVE), which we developed in the Parallel Computing Laboratory at COPPE/UFRJ. In particular, we described the design and evaluation of different operation modes of GloVE according to the stream reuse techniques it implements, namely Batching, Chaining, Patching, and CVC. Also, we measured the influence of the clients' access pattern to the server on system behavior. Finally, we analyzed the impact of the popularity distribution of videos on GloVE's behavior.

Overall, the CVC+Batching mode outperformed the other modes on all metrics we assessed, for both cases of single and multiple videos stored in the server. Also, CVC+Batching performance was not significantly affected by the client access pattern to the server, which supports the evidence that such a combination of stream reuse techniques can work with different VoD servers. Furthermore, we showed that skews in the popularity distribution of videos did not significantly impact the VoD system's performance. Thus, we speculate that the VoD system will attain good performance for other heterogeneous audiences with different access patterns to the videos.
Acknowledgements

The authors would like to thank the anonymous referees for comments that helped improve the paper. We also thank Edison Ishikawa for his helpful comments on all phases of this work, and Leonardo Bragato for the design and implementation of the CVCS video server. This research was partially supported by the Brazilian agencies CAPES, CNPq, and FINEP.

References

AllCast, http://www.allcast.com.
Bar-Noy A, Goshi J, Ladner RE, Tam K. Comparison of stream merging algorithms for media-on-demand. Multimedia Syst 2004;9:411–23.
C-star, http://www.centerspan.com.
Cai Y, Hua KA. An efficient bandwidth-sharing technique for true video on demand systems. In: Proceedings of the ACM multimedia, Los Angeles, CA, USA; 1999, pp. 211–14.
Dan A, Sitaram D, Shahabuddin P. Dynamic batching policies for an on-demand video server. Multimedia Syst 1996;4(3):112–21.
Ghose D, Kim HJ. Scheduling video streams in video-on-demand systems: a survey. Multimedia Tools Applic 2000;11:167–95.
Gnutella, http://www.gnutella.com.
Guo Y, Suh K, Kurose J, Towsley D. P2Cast: peer-to-peer patching scheme for VoD service. In: Proceedings of the 12th world wide web conference (WWW), Budapest, Hungary; 2002, pp. 301–09.
Guo Y, Suh K, Kurose J, Towsley D. A peer-to-peer on-demand streaming service and its performance evaluation. In: Proceedings of the international conference on multimedia and expo (ICME), 2; 2003, pp. 649–52.
Hefeeda M, Habib A, Botev B, Xu D, Bhargava B. PROMISE: peer-to-peer media streaming using CollectCast. In: Proceedings of the ACM multimedia, Berkeley, CA, USA; 2003, pp. 45–54.
Hu A. Video-on-demand broadcasting protocols: a comprehensive study. In: Proceedings of the IEEE INFOCOM, vol. 1, Anchorage, AK, USA; 2001, pp. 508–17.
Hua KA, Cai Y, Sheu S. Patching: a multicast technique for true video-on-demand services. In: Proceedings of the ACM multimedia, Orlando, FL, USA; 1998, pp. 191–200.
Ishikawa E, Amorim C. Cooperative video caching for interactive and scalable VoD systems. In: Proceedings of the first international conference on networking, part 2, lecture notes in computer science, 2094; 2001, pp. 776–85.
Jiang D, Dong Y, Xu D, Bhargava B. GnuStream: a P2P media streaming system prototype. In: Proceedings of the international conference on multimedia and expo (ICME), 2; 2003, pp. 325–28.
Ma H, Shin KG. Multicast video-on-demand services. ACM SIGCOMM Comput Commun Rev 2002;32(1):31–43.
MacGregor I. The relationship between simulation and emulation. In: Proceedings of the winter simulation conference (WSC'02), San Diego, CA, USA; 2002, pp. 1683–88.
Napster, http://www.napster.com.
Padmanabhan VN, Wang HJ, Chou PA, Sripanidkulchai K. Distributing streaming media content using cooperative networking. In: Proceedings of the NOSSDAV, Miami, FL, USA; 2002, pp. 177–86.
Pinho LB, Ishikawa E, Amorim CL. GloVE: a distributed environment for low cost scalable VoD systems. In: Proceedings of the 14th symposium on computer architecture and high performance computing (SBAC-PAD), Vitória, ES, Brazil; 2002, pp. 117–24.
Pinho LB, Ishikawa E, Amorim CL. GloVE: a distributed environment for scalable video-on-demand systems. Int J High Perform Comput Appl (IJHPCA) 2003;17(2):147–61.
Sheu S, Hua KA, Tavanapong W. Chaining: a generalized batching technique for video-on-demand. In: Proceedings of the international conference on multimedia computing and systems; 1997, pp. 110–17.
Tran D, Hua K, Do T. ZigZag: an efficient peer-to-peer scheme for media streaming. In: Proceedings of the IEEE INFOCOM, San Francisco, CA, USA; 2003, pp. 1283–92.
vTrails, http://www.vtrails.com.