A dependable multisource streaming system for peer-to-peer-based video on demand services provisioning Abdelhamid Nafaa, Baptiste Gourdin & Liam Murphy

Multimedia Tools and Applications: An International Journal, ISSN 1380-7501, Volume 59, Number 1. Multimed Tools Appl (2012) 59:169–220. DOI 10.1007/s11042-011-0755-8






Published online: 8 February 2011. © Springer Science+Business Media, LLC 2011

Abstract In this article, we present the design, implementation, and analysis of a scalable VOD (Video On Demand) distribution architecture for IP networks. The focus of our work is on the underlying multisource streaming architecture upon which the P2P (Peer-to-Peer) based VOD services provisioning system relies. While multipoint-to-point multisource streaming is the core building block for a distributed VOD services provisioning system, it also introduces new reliability challenges, as the streaming failure probability increases with the number of sources in a session. A major contribution of our work is the design of a suite of distinct yet complementary reliability/failover mechanisms that can be leveraged to improve the dependability of multisource streaming, and the viability of P2P-based VOD systems in general. Our work shows that the reliability/failover mechanisms can be arranged, combined, and alternated in advanced adaptation policies in order to deal with different conditions exhibited by the network. Another contribution of our work consists of implementing and assessing the performance of the different reliability mechanisms and adaptation policies in a real prototype system. We evaluate both the accuracy of streaming problem diagnosis and the efficiency of the reliability mechanisms in two adaptation strategies: one responsive to loss variation, and the other responsive to delay variation.

Keywords Multimedia communication · Multisource streaming · Error control · VOD services provisioning · QoS · P2P communication

1 Introduction

Video-On-Demand (VOD) service distribution is gaining unprecedented interest from the IPTV and the Internet video streaming industries alike, due to its growing success with consumers as a preferred way to access video content.

This material is partially based upon works supported by the Science Foundation Ireland under Grant No. 09/SIRG/I1560. The work was also partially supported by the Enterprise Ireland VidAs project, Technology Development Commercialisation Fund, CFTD/07/203—VIDAS.

A. Nafaa (*) · B. Gourdin · L. Murphy
School of Computer Science & Informatics, University College Dublin, Dublin, Ireland
e-mail: [email protected]


Unlike the rigid linear video services programming that is the norm in broadcast TV, the VOD on-demand content access model provides greater flexibility by allowing end-users to browse and consume video content in a non-scheduled way. Besides providing a VOD offering to meet evolving end-user requirements, it is now very important for a service provider to design scalable VOD streaming systems in order to provision video services in a cost-effective way, thereby achieving competitiveness in the marketplace.

A peer-to-peer (P2P) based VOD streaming system is an ideal candidate to achieve high system scalability. In a P2P streaming system, the VOD streaming session requested by a given Set-Top Box (STB) is effectively provisioned via a multisource streaming session from other neighboring STBs. This scalable VOD streaming solution has the advantage of scaling naturally with the number of STBs (equivalently, end-users) active in the network.

As mentioned earlier, designing a scalable VOD services distribution architecture is paramount for the viability of VOD services in an IP network context. At the same time, it is important to reduce the per-VOD-service cost. A P2P-based architecture is an appropriate candidate as the computing and bandwidth cost is pushed towards the network edge (at STBs). This allows service providers to gain competitiveness in terms of delivery efficiency as they reduce the cost associated with maintaining costly back-end VOD servers. Gains in storage and bandwidth can be commercially exploited by passing on the savings to end-users through cheaper VOD service provisioning fees, a higher quality of experience, or a larger video content library.

A scalable P2P-based VOD services distribution architecture should rely on a dependable multisource streaming architecture able to manage all possible streaming issues inherent to IP networks. Clearly, while multisource streaming is an indispensable enabler for a P2P-based VOD service provisioning system, it also brings new streaming reliability challenges. Relying on multiple streaming sources in a VOD session unavoidably increases the streaming vulnerability and failure probability. From a service provider perspective, the reliability of the underlying multisource streaming is of extreme importance to roll out viable and predictable P2P-based VOD services and ultimately honor existing QoS (Quality of Service) and other SLA (Service Level Agreement) commitments in place.

The first contribution of this work consists of designing and implementing a full-scale multisource streaming architecture based on IETF and industry-wide standards and best practices. We describe this novel multisource streaming architecture together with all necessary building blocks, such as: a proprietary packet-level video file format tailored for optimized multisource streaming, a number of content pre-processing mechanisms designed to improve the reliability and efficiency of the multisource streaming, and a coordinated signaling architecture that manages the underlying resources in the P2P network. Another important contribution of this paper resides in the design and implementation of various complementary reliability/failover mechanisms to overcome the most common streaming issues.
These reliability and failover mechanisms comprise (i) adaptive error control protocols that dynamically adjust to network conditions and provide both preventive and reactive packet loss recovery, (ii) integrated delay drift correction mechanisms, (iii) dynamic streaming load balancing among contributing peers, and (iv) dynamic switching and replacement of contributing peers. The performance of each of these failover mechanisms is individually assessed with respect to the three most relevant performance metrics: efficiency, accuracy, and responsiveness. Further, we show that the different failover mechanisms can be combined and alternated in a high-level coherent


adaptation policy that monitors the network conditions and addresses them in the most appropriate way.

In the following, we first introduce, in Section 2, the main constraints related to designing a scalable VOD solution and review related research initiatives, with a focus on realized system deployments. Section 3 describes the P2P-based VOD provisioning system architecture which is thereafter the subject of our performance study. Section 4 is dedicated to a thorough description of our reliability and failover mechanisms for multisource streaming, while Section 5 discusses how the failover mechanisms can be combined in higher-level policies and adaptation strategies to methodically diagnose and tackle common streaming problems. It first shows what performance can be expected from a full-scale system deployment, and then evaluates the failover mechanisms and adaptation strategies in different realistic test scenarios; tradeoffs are highlighted in each case, and guidelines are given to better devise holistic adaptation strategies depending on the system deployment specificities and use case characteristics. Section 6 covers the experimental results obtained from the system prototype. Finally, we draw the main conclusions in Section 7.

2 Architectures for large-scale VOD distribution over IP networks

Today, there are a number of examples of large-scale video distribution systems over IP networks, with different network delivery models and different levels of complexity. The most relevant research and scientific works are discussed in the following.

Most research on P2P streaming has focused on the increasingly popular live P2P streaming. Approaches such as AnySee [11] and PROMISE [7] were proposed to overcome the scalability issues that arise when delivering live streams to a very large number of receivers. These research works focus on building an efficient multicast overlay among receivers to effectively distribute and share live video feeds in a large network. The issue is rather in establishing appropriate peering processes among nearby peers that forward to each other subsets of the received streams. If a receiver wants to access a given live stream, it joins the overlay multicast group associated with that live stream and starts receiving complementary sub-streams from several other close-by receivers (peers) that are part of the same overlay multicast (i.e., tuned to the same channel). This approach proved to be very scalable in supporting large numbers of receivers by locally distributing the load of serving a new receiver, instead of throwing the entire load on the original live streaming server. Several live streaming offerings such as Octoshape and Abacast [9] rely today on a peer-assisted live streaming architecture to off-load the streaming servers and scale up to support millions of users. The AnySee system was reported to scale up to 60,000 users in a broadcast session over the Internet, while CNN.com reported over 200,000 simultaneous streams thanks to the combination of the scalability of P2P streaming with the predictability of CDN servers.

Although there exist many approaches today to perform P2P live streaming over IP networks, P2P VOD streaming is essentially a different problem as it involves streaming pre-encoded content and, as such, adds the content availability dimension to the problem. With limited uplink and storage capacities at STBs, it is important to design advanced models able to efficiently translate content popularity into content availability in the network. Additionally, the very design of a VOD-targeted multisource streaming architecture has different requirements compared to a multisource streaming architecture targeted at a live streaming use case. In the case of a VOD-targeted multisource streaming architecture,


many additional mechanisms can be integrated into the content pre-processing in order to favor more efficient adaptation during the streaming phase.

There are two main approaches to increasing the scalability of a streaming server: (i) vertical scalability improvement, which consists of employing caching techniques, content partitioning, and other hardware-related techniques to improve the server delivery capabilities; and (ii) horizontal scalability improvement, which consists of employing a highly-available cluster of computers where the load entailed by serving VOD services is shared among different nodes (computers) [13]. The Content Delivery Network (CDN) has recently arisen as the main approach to delivering large volumes of content over IP networks [2]. The content is cached in a distributed overlay network of nodes covering large geographic areas so as to accommodate a large number of end-users [2]. The CDN approach can be considered an advanced evolution of the horizontal scalability model, while the P2P (Peer-to-Peer) streaming approach can be seen as the extreme evolution of the horizontal scalability improvement model.

The broad problem of VOD services provisioning in P2P architectures has been relatively well covered in the recent literature. In [1], the authors address the issue of providing VoD services using P2P mesh-based networks. The focus is put on investigating scheduling techniques to make sure that content is progressively downloaded from other peers so that the receiver can instantly access and render the video content. Guidelines are given to build play-as-you-download P2P swarming systems with high playback rates and low start-up delays.

P2VoD [5] introduces an interesting concept of caching combined with an application-level multicast overlay tree. Each client in P2VoD has a variable-size FIFO buffer to cache the most recent content of the video stream it receives. P2VoD makes good use of live streaming properties to satisfy the requirements of instantaneous on-demand video access. The recently received content portions are indeed only temporarily cached at clients, which essentially means that on-demand access is guaranteed only if the arrival rate of clients is steady and uniformly distributed, so that all portions needed by a newly arriving client are available in the P2P network when needed. This is a strong assumption that cannot be relied on to build a professional VOD service. Again, the predictability of such a VOD system is very uncertain as the service provider has no control over the available bandwidth at each peer, not to mention the fact that the peers themselves are free to go offline at any time.

In [8], the authors give very valuable insight into how P2P solutions may improve the scalability of the so-called "physical" or managed network. This view is contrasted with the limitations of the so-called "cloud" or Internet-based P2P networks. By considering the bandwidth availability of underlying links in a physical network, the authors argue that it is possible to improve P2P IPTV services by a factor of 3. A cost/benefit analysis of different pricing strategies is also provided for P2P IPTV services, taking into account incentives that might be built into the system and other models usually used to incentivize content sharing.

Many works have addressed the problem of content availability in P2P networks.
In [3], the authors examine the factors that contribute to the variability in download time of a self-organizing P2P file distribution application. This study suggests that self-organizing P2P file distribution needs external/centralized help in order to provide QoS guarantees. Relying on the self-organization of the P2P network, where the content is spread based only on its popularity over a large time scale, is not an appropriate strategy to minimize the VOD rejection rate.

Multisource streaming is not a new concept; it has been fairly well covered in the multimedia communications literature. Multisource streaming is usually


associated with scalable video coding (SVC) and multi-layer video streaming [15], where the video content is sub-streamed at the encoding level. While content sub-streaming at the encoding level offers higher resiliency against packet loss and delay, it is significantly less bandwidth-efficient compared to packet-level content sub-streaming. In [10], the multisource streaming concept is further augmented with a Forward Error Correction (FEC) protocol in order to improve the reliability of P2P media streaming services.

The work in [6] addresses the broad P2P media streaming problem space with a view to optimizing the system resources for an open Internet-based use case. The authors particularly investigate distributed solutions to the resource allocation problem in a P2P media streaming system, with very little coverage of the multisource streaming aspect. A distributed algorithm is proposed to inject the content fragments into the P2P network and incentivize peer sharing. This differs slightly from our approach to P2P-based streaming in terms of content injection, performance modeling, and resource allocation; more details on these aspects can be found in [14].

Very much like our multisource streaming architecture, [16] introduces a receiver-driven adaptation mechanism that adjusts to the available bandwidth from each contributing peer. This approach relies on layered video content and TCP-friendly video streaming from each contributing peer. The receiving peer monitors the bandwidth available at each active contributing peer in order to determine the number of layers to be streamed by each contributing peer. This multisource streaming system is part of a large P2P overlay network for media streaming. The system is evaluated using simulations only, and it addresses network condition changes with a simplistic strategy of stream layer allocation per contributing peer.

The authors in [4] combine multicast with multisource streaming to design a P2P live streaming system such as the one described above. The peers transmit their own streams and are able to share, with other active peers, parts of video streams as they are received. This multisource streaming architecture is based on the Mutualcast concept, where all active peers receive all video content available in the network. The focus of this work is rather on optimizing the coordination among peers to share parts of the received streams. Using a related live P2P streaming use case, the authors in [12] derive proper peer connectivity to minimize bandwidth bottlenecks; an efficient pattern of delivery is also designed to improve live content delivery and minimize the content bottleneck. Again, this work's focus is rather on system dimensioning issues, which we address in [14]. The reliability of multisource streaming against network uncertainty is not covered by this work.

To the best of our knowledge, there are no existing works addressing the reliability of multisource streaming with in-depth analysis of all relevant QoS issues that might arise in realistic system deployments. The design, specification, implementation, and analysis of actual multisource streaming architectures have been poorly covered by the relevant literature in multimedia communications. In this paper, we present a reliable multisource streaming architecture tailored for large-scale P2P-based VOD services provisioning over IP networks. It is designed based on IETF streaming standards and video streaming industry best practices.
This architecture includes several reliability/failover mechanisms that have been designed and implemented to overcome different sorts of common streaming issues such as delay, loss, and jitter. Further, these reliability mechanisms can be combined in coherent adaptation policies that monitor all aspects of the serving multisource session and undertake corrective actions that can be escalated with the severity of the streaming problems. Our architecture is validated through an implementation in a full-scale prototype.


3 Overall VidAs architecture description

Before describing the basic components of the VidAs system, it is important to first briefly describe the main characteristics of the video content fragmentation (sub-streaming) process, which is central to the peer-assisted VOD streaming architecture. The video content in VidAs is first transformed from a track-based video file format into a packet-level video file format ready for streaming.1 Afterwards, the aggregated packet-level video file is further sub-streamed into several complementary sub-streamed files (content fragments); each sub-streamed video file is meant to be streamed by a different contributing STB to the requesting STB. The objective here is to reduce the contribution (in terms of bit rate) of each contributing STB so as to overcome the limited uplink capacities in asymmetrical broadband networks. The advantage of packet-level (in contrast with frame-level) video content sub-streaming resides in the fact that the original video file is split into sub-streams of packets with a more or less predictable constant data rate for each sub-stream, leading to a more deterministic streaming system.

As illustrated in Fig. 1, an original video stream is sub-streamed at packet level into different complementary sub-streamed video files. The couple (Start, Step) is used to uniquely identify a given sub-stream of a particular video content. The parameter Start represents the first RTP (Real-Time Transport Protocol) Sequence Number (SN) of a sub-stream, while Step represents the stride between successive RTP sequence numbers of packets belonging to the sub-stream. For instance, the sub-stream containing 20% of the packets in the aggregated video stream is identified by the filter start = 1/step = 5, meaning the sub-stream is composed of the RTP packets with SN = 1, 6, 11, 16, 21, etc.

The packet-level video file fragmentation requires managing different aspects to allow for effective multisource streaming. The packets should be sequence-numbered and timestamped in the same space to allow the receiving STB to multiplex the different sub-streams into one original aggregated packet stream that is thereafter appropriately decoded and displayed (see Fig. 1). Therefore, both RTP [17] sequence numbers and time stamps are generated at video fragmentation time with a fixed starting RTP sequence number and time stamp, so that the video content stays consistent as it spreads through the overall network upon subsequent VOD sessions' completion. It is now clear that the entire concept of the peer-assisted VOD architecture for broadband networks is based on multisource streaming which, in turn, relies on the sub-streaming of video content into complementary versions. Note that "complementary sub-stream" and "content fragment" will be used interchangeably in the remainder of the paper.

In the following, we describe the overall VidAs network architecture, describing the role of all entities and the interactions among them. At the highest level, the VidAs system comprises three basic components:

1. SuperNode (SN): this is where the main intelligence in the system lies—it allocates resources (uplink bandwidth of contributing peers) for each incoming VOD request. In order to perform this, the SN tracks two main resources: (i) the currently available uplink bandwidth at each peer, and (ii) the content stored in each peer. When a given receiving peer requests a specific VOD content, a VOD request is sent from that receiving peer to the SN.

1 Note that the VidAs streaming system is fully compliant with the IETF standards for MPEG-4 Part 10 (resp., H.264) video fragmentation and streaming over IP networks [17]. It also complies with all ISMA (Internet Streaming Media Alliance) practices and recommendations for MPEG-4 video streaming over IP networks.


Fig. 1 Packet-level video file sub-streaming. The original VidAs file before extraction (Content = 100% / Step = 1 / First = 1) carries packets #1, #2, #3, … with a 40 ms inter-image interval. Extraction produces complementary sub-streamed video files, e.g. two halves (Content = 50% / Step = 2, First = 1 and First = 2) or five fifths (Content = 20% / Step = 5, First = 1 through First = 5), each preserving the original packets' timing.

The SN looks up its database to determine the most appropriate set of contributing STBs that might stream complementary streams to the receiving peer. Clearly, the SN handles the VOD session initiation signaling, while the load of serving the actual video streams is handled by the contributing peers and eventual caches;

2. Peer: this can be a Set-Top Box or some other device with computing capabilities which typically sits close to a user's viewing equipment (TV, projector, or monitor). The peer is assumed to have significant storage capacity, materialized by a hard disk that can contain sub-streams from different video titles of a video content library. Upon sending a VOD request to the SN, a requesting peer will receive a response from the SN with a list of potential contributing peers that will provide complementary VOD sub-streams. The requesting peer then initiates real-time sub-streaming sessions with the contributing peers through the RTSP (Real-Time Streaming Protocol) and RTP protocols. The RTSP protocol is essentially a signaling channel used to control the actual media streaming channeled through RTP;

3. Cache: this is a large data store. The cache can be considered a passive peer (STB) in the sense that it can contribute to VOD sessions, but will never request a VOD session in the network. The main role of the cache is to offset the limited uplink bandwidth capacities available at STBs—the cache can contribute to VOD sessions with much higher data volumes. Typically, in an Internet video

Author's personal copy 176

Multimed Tools Appl (2012) 59:169–220

streaming scenario, the caches will be located in a CDN provider's network in order to ensure high availability and predictable performance. The caches can be hosted by the broadband operator if the peer-assisted VOD platform is deployed as part of an IPTV solution.

The different components of the peer-assisted VOD architecture described above are depicted in Fig. 2. Again, in contrast to Fig. 1's example, the video fragmentation into sub-streams may also generate sub-streamed files with balanced data volumes. For instance, the video content may be split into 10 complementary sub-streams, each containing 10% of the original video stream; or 5 complementary sub-streams, each containing 20% of the original video stream; or 2 complementary sub-streams, each containing 50% of the original video stream.
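To make the (Start, Step) filtering described above concrete, the following is a minimal sketch in Python; the names (SubStreamFilter, matches) are ours for illustration and are not taken from the VidAs implementation.

```python
# Minimal sketch of packet-level sub-streaming as described in Section 3.
# Names are illustrative, not from the VidAs code base.

from dataclasses import dataclass

@dataclass(frozen=True)
class SubStreamFilter:
    """A (Start, Step) couple uniquely identifying one sub-stream."""
    start: int  # RTP sequence number of the first packet in the sub-stream
    step: int   # stride between successive RTP sequence numbers

    def matches(self, seq_num: int) -> bool:
        """True if the RTP packet with this sequence number belongs here."""
        return seq_num >= self.start and (seq_num - self.start) % self.step == 0

# Example from the text: the filter start=1/step=5 selects 20% of the packets,
# namely SN = 1, 6, 11, 16, 21, ...
f = SubStreamFilter(start=1, step=5)
selected = [sn for sn in range(1, 30) if f.matches(sn)]
assert selected == [1, 6, 11, 16, 21, 26]

# Balanced fragmentations mentioned above: 10 x 10%, 5 x 20%, or 2 x 50%.
five_way = [SubStreamFilter(start=s, step=5) for s in range(1, 6)]
# Every packet belongs to exactly one of the five complementary sub-streams.
assert all(sum(f.matches(sn) for f in five_way) == 1 for sn in range(1, 1000))
```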

4 Failover mechanisms for a reliable multisource streaming

In the following, we describe in detail the adaptive reliability mechanisms integrated into our multisource streaming architecture. The multisource streaming reliability is articulated around three different failover mechanisms, namely delay drift correction, traffic load balancing among the contributing peers, and peer switching. Each of these failover mechanisms is thoroughly described in a separate sub-section below.

Fig. 2 High-level view of the VidAs architecture. A requesting peer (RP) is served by ten contributing peers (CP1, CP2, …) and a caching router, under the control of the Supernode; a third-party access control and credential backend handles STB authentication. Legend: TCP-based long-lived SN-to-STB session; RTSP/RTP content sub-streaming session; STB authentication session. Scenario: H.264 HD quality at 3.5 Mbps, with 10 STBs each contributing at 250 Kbps and 1 cache contributing at 1 Mbps (10 × 250 Kbps + 1 Mbps = 3.5 Mbps). Protocol view: the receiving STB communicates with the SN to get a LocList (list of contributing peers), then starts 11 complementary and simultaneous RTSP/RTP sessions, with real-time buffering and display at the receiving STB.


4.1 Delay drift correction

When dealing with multiple real-time sub-streaming sessions, it is important to make sure that all sub-streams received at the receiving peer are perfectly synchronized in terms of flow data rate. Synchronization is necessary so that the multiplexed (aggregated) sub-streams will always contain all the packets of the aggregated video stream needed for playout. As illustrated in Fig. 3, the time gap between received packets and packets being played is monitored at video part level in order to detect any delay drift between sub-streams. This delay gap measures precisely the length of the buffered video data ready to be played. This buffer needs to be large enough to allow for possible failover actions in case of a failure caused by the network or by a contributing peer involved in the multisource streaming session. In the example described in Fig. 3, Part2 is running the risk of buffer underflow caused by accumulating streaming delays. This event should consequently trigger appropriate delay correction actions.

The delay drift detection and correction are performed at video sub-stream level, although the consequences will ultimately be felt at the video part level. The information related to unsynchronized packets (too-late or too-early arrivals) is tracked in the packet descriptor list associated with each sub-stream; this allows the receiving peer to identify the sub-stream (contributing peer) causing the delay drift. The Content Guardian is the entity at the multisource streaming receiver side that oversees all the streaming operations, including initiating the VOD session, preventing failures, detecting failures, and coordinating the failover mechanisms. Most of the intelligence integrated into the multisource streaming system lies in the Content Guardian.2

Fig. 3 The case for the delay drift problem. Five contributing peers (Peer 1 through Peer 5) feed sub-stream reception systems whose output is aggregated into video parts (Part1, Part2, Part3) before being handed to the video player. For each part, the delay gap between the writing head (transferring from buffer to file) and the playing head measures the buffered video packets. In the example, the peers serving Part2 are too slow for that video part, and the Content Guardian oversees the aggregation.


Note that the delay drift can also be detected directly at a higher level, such as the video part level (see Fig. 4). In our multisource streaming architecture, there is a hierarchical delay drift detection that starts at the Content Guardian level and is thereafter further extended to the Stream Guardian. A delay drift is first detected at content part level by the Content Guardian, which identifies the sub-stream (resp. Stream Guardian) behind the delay drift using the sequence numbers of late packets. At this point, the Stream Guardian handling the failing sub-stream in question is asked to provide detailed drift measurements from the Descriptor List. More details about how the delay drift is precisely estimated are provided in Section 6.2.

Figure 4 gives more details on the process of delay drift detection and correction at sub-streaming session level. When the Stream Guardian detects a delay drift by observing a persistently increasing (resp., decreasing) delay gap, it sends a request to the sub-stream sender (contributing peer) asking it to speed up or slow down the video streaming pace for a fixed time period. We refer to this fixed time period as a Delay Lag in the case of a Slow-Down request, or a Delay Lead in the case of a Hurry-Up request. In the case where the delay drift is caused by a slow streaming pace at the contributing peer, which is observed through a shrinking delay gap at the receiving peer, the receiving peer sends a Hurry-Up request via an out-of-band signaling protocol (RTSP, in our case) to the contributing peer. The Hurry-Up request contains a delay lead period (in milliseconds) indicating to the contributing peer how much acceleration is needed. In practice, the contributing peer will slightly reduce the duration time (inter-packet interval) of several packets until an overall reduction equal to the delay lead period has been reached. In contrast, if the delay drift is caused by rapid streaming, detected through a too-large delay gap, then the receiving peer will send a Slow-Down request to the contributing peer, indicating a specific delay lag period to be applied. In this case, the duration time of several packets will be slightly increased until the overall delay increase equals the delay lag specified in the Slow-Down request conveyed through RTSP.

It is worth mentioning that a delay drift is established only if the delay gap (buffer length) associated with a specific sub-stream goes outside a specific interval; for instance, if the delay gap drops below 3 s or increases over 7 s. As long as the delay gap is within these thresholds, the delay drift correction procedure is not triggered. Again, the delay drift detection can be achieved in the same manner at video part level, by tracking the delay gap at video part level only. By using such a delay drift detection technique, based on the receiver's video consumption pace and a fixed target data buffer length, we avoid the relativity issues that might arise when focusing on bringing all sub-stream buffers to a comparable level.

Figure 28 in the Appendix Section shows the details of the delay drift correction operation that takes place in each sub-streaming session between the receiving peer and the contributing peer experiencing delay problems. In the following, each step of the delay drift correction procedure is detailed with references to the diagram illustrated in Fig. 28.
After initiating the multiple sub-streaming sessions with all contributing peers (1), the receiving peer starts receiving (2) the sub-streams as explained in Fig. 3. Afterwards, the receiving peer detects a delay drift in the sub-stream transmitted by the contributing peer CP5 (3); the delay detection can be performed using different monitoring techniques, as revealed earlier.

2 In the following, Stream Guardian and Content Guardian are sometimes used interchangeably. The multisource streaming intelligence is indeed implemented in a distributed/hierarchical manner, with the Content Guardian overseeing all active sub-streams and coordinating possible failover actions, while the Stream Guardian is responsible for monitoring sub-streaming QoS performance and enforcing failover actions.

Fig. 4 Streaming delay drift correction. The contributing peer's sub-stream sending system feeds the receiving peer's sub-stream reception system over RTP, with RTSP as the signaling channel and the Stream Guardian attached to the reception system. (1) A streaming delay drift is detected at the receiving peer; (2) a Hurry-Up or Slow-Down message is transmitted via RTSP to the contributing peer in order to correct the streaming speed; (3) the contributing peer speeds up (or slows down) its streaming for a fixed time period.

At this point, the receiving peer RP sends a delay correction request (Hurry-Up or Slow-Down) to the contributing peer CP5 (4) to inform the latter that a given delay lead (or lag) period has been observed and should be progressively corrected. The delay correction requests are sent from the receiving peer to the contributing peer through the RTSP protocol, which has been augmented to support these new requests. The contributing peer CP5 starts correcting its transmission pace by the delay lead (or lag) period (5). This correction is smoothly spread over multiple packets so as not to cause an abrupt spike in the streaming data rate. If this delay drift correction is not successful over the long run and another delay drift appears, the receiving peer can start another delay drift correction. Upon several unsuccessful delay drift corrections (6), the receiving peer sends a permanent delay drift correction to the contributing peer CP5 (7). Besides a delay lead (or lag) period, the receiving peer also specifies a time period over which the delay lead (or lag) correction should be applied. Simply put, if the receiving peer observes that CP5's sub-stream accumulates 500 ms of lateness every 10 min, then the receiving peer asks CP5 to regularly apply a 500-ms delay lead correction after each 10 min of streaming, in a permanent way. This way, the number of individual delay correction requests is reduced by permanently correcting the discrepancy between the clocks of the receiving peer and the contributing peer.

The delay drift described above may also be tackled through a different failover mechanism if the delay issue grows bigger and/or becomes persistent. A traffic load reduction at the contributing peer may be performed in order to reduce the processing load of the failing contributing peer. This failover mechanism is called load balancing in the sense that it rebalances the streaming load among the contributing peers already active, in an effort to relieve the failing contributing peer. Finally, the failover mechanism of last resort is to transfer the entire streaming contribution of the failing contributing peer to a new contributing peer provided by the SuperNode. This mechanism is referred to as peer switching. These two additional failover mechanisms are further described below.
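As an illustration of the thresholding just described, the sketch below monitors the per-sub-stream delay gap and issues a Hurry-Up or Slow-Down request when the gap leaves the example [3 s, 7 s] interval; the function names, the RTSP plumbing, and the choice of steering back towards the middle of the interval are our assumptions, not the paper's specification.

```python
# A minimal sketch of the Stream Guardian's delay gap monitoring (Section 4.1).
# The 3 s / 7 s thresholds come from the example in the text; everything else
# (names, midpoint target, message encoding) is an illustrative assumption.

MIN_GAP_S = 3.0   # below this buffered-video level, the sub-stream risks underflow
MAX_GAP_S = 7.0   # above this level, the contributing peer is streaming too fast

def check_delay_drift(delay_gap_s: float, send_rtsp_request) -> None:
    """Trigger a delay correction when the buffered-video gap leaves [3 s, 7 s].

    delay_gap_s: length (in seconds) of video buffered for this sub-stream,
                 i.e. the gap between the writing head and the playing head.
    send_rtsp_request: callable used to send the augmented RTSP request.
    """
    target_ms = (MIN_GAP_S + MAX_GAP_S) / 2 * 1000  # steer back to the middle
    if delay_gap_s < MIN_GAP_S:
        # Shrinking gap: contributing peer is too slow; ask it to shorten its
        # inter-packet intervals until the delay lead is absorbed.
        send_rtsp_request("HURRY-UP",
                          delay_lead_ms=int(target_ms - delay_gap_s * 1000))
    elif delay_gap_s > MAX_GAP_S:
        # Growing gap: contributing peer is too fast; ask it to stretch its
        # inter-packet intervals by the delay lag.
        send_rtsp_request("SLOW-DOWN",
                          delay_lag_ms=int(delay_gap_s * 1000 - target_ms))
    # Within [3 s, 7 s]: no drift is established, no correction is triggered.

# Example: a 2 s buffer is below the 3 s floor, so a Hurry-Up is sent asking
# the contributing peer to catch up by 3000 ms towards the 5 s midpoint.
check_delay_drift(2.0, lambda msg, **kw: print(msg, kw))
```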


4.2 Contributing peers' load balancing

This failover technique consists in changing the contribution load of a subset of contributing peers to tackle a partial failure of a given contributing peer. Typically, if a contributing peer is experiencing a high loss rate, the receiving peer decides to reduce the streaming load of that contributing peer. The reduced streaming volume will be met by an increase in the streaming volume of one (or more) other contributing peers. The rationale behind this is that the high loss rate experienced is likely to be caused by network congestion, which can be alleviated by reducing the sub-streaming data rate of the failing contributing peer.

As revealed earlier, the offered traffic load of a contributing peer (its contribution) is uniquely determined by the couple (Start, Step). Start refers to the sequence number of the first packet to be streamed as part of the sub-stream provided by the contributing peer, while Step refers to the sequence number difference between two successive packets in that sub-stream. The contribution level of a given contributing peer is determined by the ratio 1/Step. A peer's contribution is also referred to with the term "filter", which reveals the way this is implemented in practice. In fact, while a contributing peer parses the whole video Part and prepares itself to stream it, the identification of its contribution (Start, Step) is used to create a filter that controls which packets actually get transmitted over the network. The filter transmits only the subset of packets that are part of the peer's contribution, as identified by the couple (Start, Step).

Figure 5 illustrates the main idea behind the load balancing concept. In this graphical example, there are 3 contributing peers, each with an initial contribution of one third (1/3) of the aggregated video stream.

Fig. 5 Contributing peers' load balancing. CP1: before, Start = 1/Step = 3; after, Start = 1/Step = 3 and Start = 8/Step = 12. CP2: before, Start = 2/Step = 3; after, Start = 2/Step = 12 and Start = 5/Step = 12. CP3: before, Start = 3/Step = 3; after, Start = 3/Step = 3 and Start = 11/Step = 12. The Content Guardian orders the load balancing: increase the contributions of CP1 and CP3 from 33% to 41% each, and decrease the contribution of CP2 from 33% to 18%.

Upon streaming problems (high loss rates or high delays) with the contributing peer CP2, the Content Guardian at the receiving peer executes a load balancing procedure to partially relieve the contributing peer CP2 of its load, and to distribute the reduced load over the contributing peers CP1 and CP3. In Fig. 5's example, CP2's contribution is changed from 1/3 to 2/12, while CP1's contribution changes from 1/3 to 5/12, and CP3's contribution changes from 1/3 to 5/12. After receiving the load balancing request, the contributing peers CP1 and CP3 add a second filter to their first filter in order to increase their respective contributions accordingly. On the other hand, the contributing peer CP2 deletes its initial filter and uses 2 new filters (Start = 2/Step = 12 and Start = 5/Step = 12). It is important to note that the Content Guardian is responsible for calculating the new filters before sending a load balancing request to the contributing peers. As in the delay drift correction mechanism, the load balancing requests are sent via the RTSP protocol from the receiving peer to the individual contributing peers.

In order to determine appropriate candidate contributing peers to increase their loads, the receiving peer uses the LocList sent by the SN to determine how much each contributing peer can contribute, based on the level of content availability. In fact, the LocList sent in response to a VOD request mentions the list of contributing peers, their exact contributions to the session, and the overall availability of the content on each contributing peer's hard drive.

Figure 6 gives more details on the load balancing procedure, with more focus on the RTSP signaling messages exchanged between the contributing peer and the receiving peer. As with the delay drift correction procedure, most of the signaling messages between the receiving peer and the contributing peer are conveyed via the RTSP protocol. It is worth recalling that each sub-streaming session has RTP, RTCP, and RTSP connections established between the contributing peer and the receiving peer.

Fig. 6 Contributing peers' load balancing—a protocol view. (1) The new start/step couple is signaled to the contributing peer via out-of-band signaling (RTSP protocol); the contributing peer updates its sending filter. (2) The contributing peer acknowledges the load balancing operation via in-band signaling included in the RTP FEC packets transmitted downstream to the Stream Guardian. (3) The sub-stream reception system filters are updated; the FEC recovery system and retransmission system are also updated. (4) The receiving peer acknowledges the completion of the load balancing operation via out-of-band signaling using the RTSP protocol.


RTP conveys the video packets in the downlink sense, while RTCP is used to send quality-of-service performance metrics in the uplink sense, and RTSP is used to initiate and control the sub-streaming session. Referring to Fig. 6, the receiving peer sends a load balancing request to the contributing peer via the RTSP channel. Afterwards, the contributing peer applies the new contribution filter and sends an acknowledgement through in-band signaling within the RTP data flow; the new filter confirmation is indicated in the headers of FEC packets, which are part of the RTP video sub-stream. At this point, the receiving peer updates its modules, such as the FEC recovery system, to reflect the new changes in the streaming contribution.

A detailed view of the peers' contribution load balancing procedure is illustrated in Fig. 29. First, the receiving peer requests a VOD content from the SuperNode SN (1), specifying the content title T. After checking its local database, the SuperNode SN provides the LocList, which contains the list of candidate contributing peers CP1, CP2, CP3, CP4, and CP5 along with their respective contributions towards the aggregated video stream. The LocList also contains information pertaining to each contributing peer, specifying all fragments (sub-streams) of the title T possessed by each contributing peer mentioned in the LocList. This information allows the receiving peer to re-allocate the streaming load among the list of contributing peers in case of a failure event during the multisource streaming session.

After initiating the multiple sub-streaming sessions with all contributing peers (3), the receiving peer starts receiving (4) the sub-streams as explained in Fig. 6. The receiving peer observes a persisting high loss rate (resp. high delay) in the sub-stream transmitted by the contributing peer CP5 (5); this detection is achieved through continuous monitoring of the packet loss rate (resp. delay gap) at the receiving peer. Based on the previously received LocList, the receiving peer determines a new streaming load allocation in an effort to reduce the contribution of the contributing peer CP5 (6). This procedure consists in identifying existing contributing peers that can take some streaming load from the failing contributing peer CP5. The process of re-allocating the streaming load may involve an interaction with the SuperNode in certain cases, although it is preferable to have this task independently handled at the receiving peer for responsiveness reasons. In either case, the receiving peer has to inform the SuperNode of the new streaming contribution received from each contributing peer active in the multisource streaming session—see step (10) in Fig. 29 within the Appendix Section.

In Fig. 29's example, CP3 and CP4 are selected to share part of CP5's streaming load. During the decision process (6), the receiving peer might eventually interact, through message exchanges (using RTSP or another protocol), with the candidate contributing peers CP3 and CP4 to check whether their respective available uplink bandwidths allow them to take on additional load from the failing contributing peer CP5. The receiving peer sends a load balancing request to CP4 (7) and CP3 (8) to increase their respective contributions. When the receiving peer starts receiving the CP3 and CP4 sub-streams with the increased volume as requested, it sends a load balancing request to the contributing peer CP5 to reduce its contribution by the same combined data volume increase at CP3 and CP4 (9).
After successfully completing the load balancing procedure, the receiving peer sends an update to the SuperNode (10). This update allows the SN to keep its records up to date with respect to the actual situation in the network. More specifically, the SN will flag the contributing peer CP5 as non-reliable for future VOD sessions, and reduce the uplink bandwidth available at the contributing peers CP3 and CP4 according to their recent contribution increases.
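To make the filter arithmetic concrete, the sketch below reproduces Fig. 5's rebalancing and checks that the new filter sets remain complementary; the helper names and the verification style are illustrative, not taken from the prototype.

```python
# A sketch reproducing Fig. 5's load balancing arithmetic. Filter sets are
# expressed as lists of (start, step) couples; helper names are illustrative.

from fractions import Fraction

def covers(filters, seq_num):
    """Number of filters in the set that select this RTP sequence number."""
    return sum(seq_num >= s and (seq_num - s) % step == 0 for s, step in filters)

def contribution(filters):
    """Fraction of the aggregated stream carried by a filter set (sum of 1/Step)."""
    return sum(Fraction(1, step) for _, step in filters)

# Before: three equal contributions of 1/3 each.
before = {"CP1": [(1, 3)], "CP2": [(2, 3)], "CP3": [(3, 3)]}
# After relieving CP2 (Fig. 5): CP2 keeps 2/12, CP1 and CP3 grow to 5/12 each.
after = {
    "CP1": [(1, 3), (8, 12)],
    "CP2": [(2, 12), (5, 12)],
    "CP3": [(3, 3), (11, 12)],
}

assert contribution(after["CP1"]) == Fraction(5, 12)
assert contribution(after["CP2"]) == Fraction(2, 12)
assert contribution(after["CP3"]) == Fraction(5, 12)

# The new filter sets must remain complementary: once past the largest start
# offset, every packet is streamed by exactly one contributing peer.
all_filters = [f for fs in after.values() for f in fs]
assert all(covers(all_filters, sn) == 1 for sn in range(12, 1200))
```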


4.3 Contributing peer switching

Peer switching consists of replacing a badly performing contributing peer with a new contributing peer provided by the SuperNode. Peer switching is usually executed after prior failover mechanisms (such as delay drift correction, FEC redundancy increase, and contributing peer load balancing) have been tried unsuccessfully. It is used as a failover mechanism of last resort in our multisource streaming system, since it cannot be performed entirely locally to the multisource streaming session: the involvement of the SuperNode is needed to complete a peer switching operation.

Figure 7 illustrates the peer switching procedure, where a badly performing contributing peer CP5 is replaced by two new contributing peers, CP4 and CP6. The peer switching procedure can, indeed, replace a badly performing peer CP5 with one or several new peers that completely take over all of CP5's load. Figure 7 shows the peer switching procedure from a multisource streaming perspective, while Fig. 30 illustrates the peer switching procedure from a signaling point of view, detailing the message exchange procedure.

Referring to Fig. 7, the contributing peer CP5 starts showing failure signs during the course of the multisource streaming session with the receiving peer. The Content Guardian CG detects this by analyzing the loss rate and delay performance experienced by the sub-stream transmitted by the contributing peer CP5. After trying different other types of failover mechanisms, the Content Guardian decides to proceed with a peer switching procedure. Before the peer switching procedure takes place, the Content Guardian reports the failure of the contributing peer CP5 to the SuperNode, and asks the latter for possible replacement contributing peer(s). In our example, the replacing contributing peers CP4 and CP6 will equally share the load of the failing contributing peer CP5. Also, a typical multisource session would involve more than three initial contributing peers.

Fig. 7 Contributing peers' switching—sub-stream level view. The peer to replace, CP5, was serving the full stream (contribution requested: 1/1, first packet to stream: 1); the replacement peers CP4 and CP6 are initialized with new contribution filters (CP4: contribution 1/2, first packet to stream 32565; CP6: contribution 2/2, first packet to stream 32566). (1) Initialization of the new peers with the new contribution filters (start/step couples and the packet to begin with) by the Content Guardian; (2) end of the connection with the failing peer.


Upon receiving a list of possible replacing contributing peers along with their respective contributions to cover the failing contributing peer CP5's streaming load, the receiving peer starts a sub-streaming session with the replacing contributing peers CP4 and CP6, indicating which packet sequence number to start with. More specifically, if the receiving peer has currently received up to the packet number 32564, the replacing contributing peer CP4 will be asked to contribute with start = 32565/step = 2, while the replacing contributing peer CP6 will be asked to contribute with start = 32566/step = 2. The objective here is to ask the replacing contributing peers CP4 and CP6 to resume the multisource streaming session from where it was left by the failing contributing peer CP5. Once the peer switching operation is successfully completed, the receiving peer will stop the sub-streaming session with the failing contributing peer CP5, and then report to the SuperNode that the peer switching has been successfully completed.

Figure 30 (in the Appendix Section) illustrates the overall peer switching procedure from a signaling and message exchange perspective. First, the receiving peer requests a VOD content from the SuperNode SN (1), specifying the content title T. After checking its local database, the SuperNode SN provides the LocList, which contains the list of candidate contributing peers CP1, CP2, CP3, CP4, and CP5 along with the contribution of each towards the aggregated video stream. After initiating the multiple sub-streaming sessions with all contributing peers (3), the receiving peer starts receiving (4) the sub-streams as introduced earlier. As the contributing peer CP5 starts failing (5), the receiving peer observes a persisting high loss rate (resp. high delay) in the sub-stream transmitted by the contributing peer CP5 (6); this detection is based on continuously monitoring the packet loss and delay at the receiving peer. After trying less intrusive failover mechanisms (7), such as contributing peers' load balancing, the receiving peer asks the SuperNode for a peer switching procedure (8) to replace the failing contributing peer CP5. The SuperNode looks up its database to find a suitable replacing contributing peer that possesses (i) the content portion that the failing contributing peer CP5 was providing, and (ii) enough uplink bandwidth to sub-stream that content portion. The SuperNode sends a peer switching reply (9) containing a candidate contributing peer CP6 to replace the failing contributing peer CP5. Note that the SuperNode SN may as well suggest multiple replacing contributing peers to replace the failing contributing peer and share its initial contribution load.

The receiving peer is now in a position to initiate a sub-streaming session with the replacing contributing peer CP6 (10). The sub-streaming session with the replacing peer CP6 is initiated from the current point in time in the aggregated streaming session, as illustrated in Fig. 7. When the receiving peer starts receiving the sub-stream from the replacing peer CP6 (11), it stops the ongoing sub-streaming session with the failing contributing peer CP5 (12). Once the peer switching operation is successfully completed, the receiving peer confirms the success of the operation to the SuperNode. The SuperNode can then update its database as described earlier. It is worth recalling that the peer switching technique is performed only in case of major problems and when the other failover mechanisms do not show success.
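The resume-point arithmetic can be sketched as follows. This is our own generalization of Fig. 7's example, assuming packets up to 32564 have already been received; the function name is illustrative.

```python
# A minimal sketch of how replacement filters can be derived when a failing
# contributing peer is switched out (Fig. 7). Names are illustrative.

def switching_filters(failed_start, failed_step, next_seq, n_replacements):
    """Split the failing peer's (start, step) contribution among n replacements,
    resuming at the first not-yet-received sequence number next_seq.

    Returns one (start, step) couple per replacement peer.
    """
    new_step = failed_step * n_replacements
    # Find the first sequence number >= next_seq that belongs to the failing
    # peer's sub-stream, then hand out consecutive slots of that sub-stream,
    # one to each replacement peer in turn.
    first = next_seq + (failed_start - next_seq) % failed_step
    return [(first + i * failed_step, new_step) for i in range(n_replacements)]

# Fig. 7's example: CP5 was streaming the whole video Part (start=1/step=1),
# packets up to 32564 have been received, and CP4 and CP6 take over.
print(switching_filters(failed_start=1, failed_step=1, next_seq=32565,
                        n_replacements=2))
# -> [(32565, 2), (32566, 2)]  (CP4 resumes at 32565, CP6 at 32566)
```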
How all the failover mechanisms are invoked and alternated to respond to multisource streaming problems is detailed in the next sub-section.

4.4 Adaptation strategies

The Content Guardian at the receiving peer uses advanced policies to first diagnose the problem and then employ, or alternate between, the three failover mechanisms introduced


above: delay drift correction, contributing peers' load balancing, and contributing peer switching.

Delay drift correction is a lightweight operation whose associated cost and risk are very low. Speeding up a contributing peer's transmission rate might lead to increased packet loss and higher delays when the uplink bandwidth is limited; in our case, the transmission rate speed-up by a given delay lead is stretched over a long enough time period to avoid an aggressive data rate spike. The use of the already-established RTSP signaling channel allows a responsive reaction by the Content Guardian and reduces traffic overhead.

Contributing peer load balancing is more intrusive than delay drift correction and less intrusive than contributing peer switching. Most of the time, after performing a load balancing operation, we keep the same number of sub-streaming sessions, although their respective contributions change. All the information needed for a load balancing operation is already available at the Content Guardian from the LocList initially received as a response to the VOD request. Load balancing can be applied without interaction with, or authorization from, the SuperNode if the contribution increases involved at the candidate contributing peers are not very high and remain below a specific threshold. Prior interaction with the SuperNode is required for a high increase in the contribution of a given contributing peer, as this increase might sometimes go beyond the available uplink bandwidth.

Contributing peer switching requires an explicit interaction with the SuperNode to receive the details pertaining to one or several potential replacing contributing peers. The SuperNode looks up its database to identify potential contributing peers that have the content part being contributed by the failing contributing peer; the SN further checks whether the potential contributing peers possess enough uplink bandwidth to support this contribution. Peer switching is considered intrusive, and a last-resort failover mechanism, since it involves the additional latency associated with the handshake exchange between the CP and the SN, in addition to the initiation of the new RTSP/RTP sub-streaming session with the replacing contributing peer. The newly initiated sub-streaming sessions need to be synchronized with the ongoing sub-streaming sessions that are active between the receiving peer and the well-performing contributing peers. The sub-streaming sessions' synchronization is achieved using the delay drift correction mechanism with Hurry-Up and Slow-Down requests.

Depending on the nature of the multisource streaming problem, some of the above failover mechanisms are more efficient than others. We focus on resolving three main problems that usually arise with multisource streaming: poor link quality, network congestion, and an overloaded contributing peer.

a) Poor link quality means that one or several physical network links are causing excessive packet loss along the path between a contributing peer and the receiving peer. This usually manifests itself through a high BER that leads to a high packet loss rate, significantly degrading the video quality. Poor link quality usually leads to a more or less constant packet loss, with delays that stay stable and do not increase. Executing a load balancing operation will not have much effect, and the experienced loss rate (before FEC processing) usually stays stable and high.
b) Network congestion appears when one or multiple nodes are congested along the path from a contributing peer to the receiving peer. Congestion leads to an increase in both the loss rate and the delays. Ideally, congestion is avoided by reducing the data rate along the congested path.
c) Overloaded contributing peer happens when the processing power of a contributing peer is saturated, slowing down all running processes and the network transmission
speed with it. This may lead to increased delays and high loss rates. It is important to reduce the processing load of the contributing peer to overcome this situation.

Besides the above-mentioned streaming problems, the individual sub-streaming sessions usually experience delay drifts that are due to unsynchronized clocks at the contributing peer and the receiving peer. These regular delay drifts are addressed by temporary and permanent delay corrections, as discussed in Section 4.1. Additionally, network communications may exhibit random and transient packet loss not tied to a specific problem. These are addressed with the combination of two reliability mechanisms (retransmission and FEC). In essence, the Content Guardian at the receiving peer relies on performance measurements taken on each received sub-stream to detect possible problems and undertake appropriate preventive and corrective actions to avoid the interruption of the multisource streaming session. The available measurements are essentially based on loss rate and delay monitoring. The loss rate may be measured at different levels: (i) before recovering with FEC, which essentially describes the loss rate exhibited by the network; (ii) after recovering with FEC, which measures the efficiency of the FEC scheme and rate being employed; and (iii) after retransmission, which is the final loss rate experienced by the sub-stream and the one that will impact video quality. The loss rate measured after retransmission is the one primarily used by the Content Guardian to take preventive and corrective actions. However, if the loss rate before FEC goes beyond a certain threshold (e.g., 10%), then the Content Guardian will also consider undertaking preventive and corrective actions, both to reduce the experienced loss rate and to save the extra bandwidth used by FEC and retransmission. An interesting approach employed in our system consists of averaging the above three loss rates and using the resulting averaged loss rate at the Content Guardian to evaluate the streaming performance of each sub-stream and its associated contributing peer. As with the loss rate, delays may be measured at different levels by the receiving peer; delay measurement could be performed at the aggregated video stream level, the video Part level, or the sub-stream level. In our case we measure the delay at the sub-streaming session level, as introduced in Fig. 3. The delay measurement consists of measuring the gap between the contributing peer's timing and the receiving peer's timing. More concretely, the Content Guardian measures both the time elapsed between the reception of two given RTP packets and the RTP Timestamp difference between these two packets. This effectively reveals how synchronized the contributing and receiving peers are. It also shows whether the sub-stream in question is running late or ahead.
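As an illustration of these three measurement levels, the sketch below derives LrBF, LrAF, and LrAR from per-interval packet counters and averages them as described; the counter names are ours, not the prototype's.

class LossMonitor:
    """Per-sub-stream loss accounting over one report interval (a sketch)."""

    def __init__(self):
        self.expected = 0         # packets the sub-stream should have carried
        self.received = 0         # packets that arrived off the wire
        self.fec_recovered = 0    # packets rebuilt from FEC redundancy
        self.retx_recovered = 0   # packets recovered via retransmission

    def loss_rates(self):
        """Return (LrBF, LrAF, LrAR) for the current interval."""
        if self.expected == 0:
            return (0.0, 0.0, 0.0)
        lost = self.expected - self.received
        lr_bf = lost / self.expected                          # before FEC
        lr_af = (lost - self.fec_recovered) / self.expected   # after FEC
        lr_ar = ((lost - self.fec_recovered - self.retx_recovered)
                 / self.expected)                             # after retransmission
        return (lr_bf, lr_af, lr_ar)

    def averaged_loss(self):
        """Single averaged figure used to rank a contributing peer."""
        return sum(self.loss_rates()) / 3.0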

5 Multisource streaming failover mechanisms coordination

In the following two sub-sections we present two adaptation policies meant to overcome fluctuating network conditions and allow for a more dependable multisource streaming. The first adaptation policy is designed to tackle loss problems, while the second is dedicated to tackling delay problems encountered during the course of a multisource streaming session. Each of these adaptation policies is indicative of how an advanced streaming adaptation policy can be articulated around different failover mechanisms, relying on the monitoring of different streaming performance metrics at the multisource streaming receiver. One can indeed design different adaptation scenarios
reflecting different sensitivities to specific problems that might be sensed through network measurements and reporting. The objective in this section is to show how intelligence can be embedded in the multisource streaming system in order to systematically detect streaming issues, diagnose the most likely cause of each issue, and progressively undertake appropriate corrective actions. It is important to note that each sub-streaming session between the receiving peer and a contributing peer is independently managed by the Content Guardian, since each sub-streaming session is subject to independent network conditions. Both the delay and loss streaming adaptation policies, introduced in the next two sub-sections, run concurrently and enforce corrective actions (i.e., failover mechanisms) in an independent way. Streaming performance metrics associated with each sub-streaming session are separately gathered and tracked to guide the diagnosis and corrective actions under the adaptation strategy or policy. Accordingly, the QoS adaptation and control in each sub-streaming session is managed by a specific set of adaptation policies independently of other ongoing sub-streaming sessions.

5.1 Adaptation strategy for high packet losses

Figure 8 shows the adaptation strategy followed by the Content Guardian at the receiving peer to address persisting high loss rates on a given sub-stream. It clearly shows how the problem is progressively diagnosed and tackled on a step-by-step basis. This state diagram is executed by the Content Guardian for each active sub-stream between the receiving peer and a contributing peer. First, the Content Guardian moves from the idle state to an active state when a new loss rate report arrives. It is worth recalling that the loss rate reports are conveyed from the contributing peer to the receiving peer through the RTCP channel.
Fig. 8 Baseline adaptation scenario for increased loss problems (state diagram: Wait Report → High Loss Rate? → Lowered Loss Rate? → Stable Delays? → Low FEC Level? → Increase FEC / Perform Load Balancing (Network Congestion) / Max Number of Load Balancing? → Perform Peer Switching (Link Quality Problem))

In our implementation, the loss rate is weight-averaged with past reports to reduce the volatility of the network conditions measurement without degrading the responsiveness of the system. Also, the loss rate measurement used by the Content Guardian in the decision process can be any one of the three types of loss rate measurement mentioned earlier: loss rate before FEC, loss rate after FEC, or loss rate after retransmission; one can also use any combination of the three loss rate measurements available at the CG. When a new loss rate report is received at the Content Guardian, the latter checks the loss rate measurement in the report, which is smoothed over a few past measurements. If the loss rate is lower than a pre-defined threshold, then the Content Guardian saves the loss rate measurement and does not take any further action; it returns to the idle state (1). At this point, the underlying error control mechanisms, i.e., the combination of FEC and retransmission, dynamically adjust to tackle the spiking loss rate measurement. In other words, a high variation in the loss rate could be due to transient harsh network conditions that do not imply the persistent loss rate increase usually caused by congestion or a failing link. This approach ensures that the Content Guardian does not engage failover mechanisms to counter transient harsh network conditions, which would end up being counter-productive and destabilizing the streaming system. If the measured loss rate is higher than the predefined threshold, then the Content Guardian proceeds to undertake further failover actions (2). At this point, the underlying error control mechanisms are producing very little benefit, so the persisting high loss rate needs to be tackled with different mechanisms. The most likely cause of a persisting high loss rate is either congestion or bad link quality along the streaming path. It is worth mentioning that the predefined maximum loss rate threshold is set to 3% in our study, although it can be varied at deployment stage to accommodate specific streaming system requirements. If the measured loss rate is actually decreasing (3) compared to previous measurements, the Content Guardian further checks whether the currently measured delays are stable compared to previous measurements. If the measured delays are stable (stay low), then the Content Guardian concludes that there is a potential link quality problem along the path crossed by the sub-stream; in this case, a contributing peer switching procedure is triggered (4). The rationale here is that a spike in the measured loss rate that is not accompanied by increased delays is probably caused by a bad link with a steady high BER (Bit Error Rate). In fact, this situation means that the spiking loss rate is being suppressed by the underlying error control mechanisms, while the delays stay low and stable, since a high BER along the path has no effect on delays. On the other hand, if there is a reduced loss rate but the delays are increasing, then the Content Guardian proceeds to the idle state (5) and leaves the adaptation strategy for high delays to deal with this situation; see the next sub-section for more details on the adaptation strategy used to counter high delays. After confirming that the loss rates are both high and persistent, the Content Guardian proceeds to check the level of FEC being transmitted (6). If the FEC redundancy level is low, then the Content Guardian orders an increase in the FEC redundancy level (7).
Here, the FEC redundancy rate is compared to the loss rate measured before FEC; if the FEC redundancy rate is significantly lower than the loss rate measured before FEC, then the FEC redundancy level is deemed too low. Note that the FEC redundancy rate is usually self-adjusting, and it reacts dynamically to an increased loss rate without the need for external intervention from the Content Guardian. If instead the FEC redundancy level is already high (8), then the Content Guardian considers using peer contribution load balancing.
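The overall decision flow of Fig. 8 can be condensed into the following sketch (state numbers in the comments refer to the figure); the predicate and action helpers are placeholders, and the load balancing cap is an assumed configuration parameter.

MAX_LOSS_THR = 0.03    # 3% trigger threshold used in this study
MAX_LB_OPS = 3         # assumed cap on load balancing operations

def on_loss_report(sub):
    """One pass of the Fig. 8 loss adaptation policy for sub-stream 'sub'."""
    if sub.smoothed_loss() <= MAX_LOSS_THR:
        return                                   # (1) stay idle
    if sub.loss_decreasing():
        if sub.delays_stable():
            perform_peer_switching(sub)          # (4) likely bad link
        return                                   # (5) defer to delay policy
    if sub.fec_rate() < sub.loss_before_fec():
        increase_fec(sub)                        # (7) FEC level too low
    elif sub.load_balancing_count < MAX_LB_OPS:
        perform_load_balancing(sub)              # (9) suspected congestion
    else:
        perform_peer_switching(sub)              # (10) last-resort switch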


The most likely cause associated with a persistent high loss rate is either network congestion or a link reliability problem (when the delays are stable). At this point, the Content Guardian checks whether the maximum number of load balancing operations has been reached on the concerned sub-stream. If the maximum number of load balancing operations has not been reached yet (9), then a new load balancing operation is triggered to reduce the contribution of the failing contributing peer. It is assumed here that the problem is most likely caused by network congestion (resp. peer saturation) that can be alleviated by relieving the contributing peer through a reduced streaming volume contribution. On the other hand, if the maximum number of load balancing operations has been reached, then the Content Guardian starts a peer switching operation (10); this particular situation (a persistent loss rate despite several load balancing actions) means that the path between the concerned contributing peer and the receiving peer contains a bad physical link or a severe congestion that cannot be overcome by reducing the contribution of the contributing peer. A link problem usually refers to a persistent failure along the network path that can be caused by a bad link, a bad node, or a bad peer connection. The only solution to this problem is to completely switch contributing peers and replace the failing one with a new replacing contributing peer provided by the SuperNode. After performing a peer switching operation, the Content Guardian ends this adaptation policy instance, as the sub-streaming session is effectively ended by the peer switching operation.

5.2 Adaptation strategy for high delays

Figure 9 shows the adaptation strategy followed by the Content Guardian (CG) at the receiving peer to counter a delay drift problem observed through a shrinking/increasing buffer length (i.e., a decreasing/increasing delay gap).
Fig. 9 Baseline adaptation scenario for increased delay problems (state diagram: Wait Report → New Delay Drift Problem? → Max. Num. of Delay Corrections? → Hurry-Up or Slow-Down / Permanent Hurry-Up/Slow-Down (Clock Sync Issues) → Max Num. Permanent Delay Correct.? → Already Tried L.B.? → Perform Load Balancing (Saturated Peer) → Improvement? → Peer Switching)
It clearly shows, on a step-by-step basis, how the problem is progressively diagnosed and addressed. Again, this state diagram is executed by the Content Guardian for each active sub-stream. First, the Content Guardian moves from the idle state to an active state when a new delay report arrives. When a new delay report is received at the Content Guardian, the latter checks the delay measurement in the report, which reflects the last measured delay gap of the sub-stream being monitored. If the measured buffer length is beyond the maximum or minimum thresholds, as discussed in Fig. 3, then the Content Guardian concludes that there is a delay drift problem. Again, the delay drift could be caused by unsynchronized clocks at the contributing peer and the receiving peer. If the Content Guardian establishes that there is a delay drift (1), it further checks how many delay correction operations (Hurry-Up or Slow-Down) have been performed on that specific sub-stream and the corresponding contributing peer. If the maximum number of delay correction operations has not been reached (2), then the Content Guardian performs an additional Hurry-Up or Slow-Down operation, depending on whether the problem is caused by an advance or a lateness in the streaming pace, respectively. The Content Guardian then moves to the idle state. The delay correction requests are sent via the RTSP control channel, specifying the delay correction (in milliseconds) to be applied. The delay correction is called a delay lead in the case of a Hurry-Up request, or a delay lag in the case of a Slow-Down. The contributing peer acknowledges the reception of the delay correction request and provides a further notice to the receiving peer when the correction has been entirely and successfully enforced. If the maximum number of delay drift corrections has been reached, then the Content Guardian asks the failing contributing peer to apply a permanent delay correction, which specifies a time period over which a specific delay lead (for Hurry-Up) or delay lag (for Slow-Down) should be applied (3). The Content Guardian then moves to the idle state. The maximum number of delay corrections is configurable and can be expressed in terms of a number of corrections or in terms of absolute delay in milliseconds. The permanent delay drift correction is applied to reduce the number of one-off Hurry-Up/Slow-Down correction requests sent by the receiving peer to the failing contributing peer. The corresponding delay lead (resp. lag) in the permanent Hurry-Up (resp. Slow-Down) request is determined based on past trends in the evolution of the sub-stream's buffer length. The most common problem that a permanent delay drift correction is devised to address is that of unsynchronized clocks between a receiving peer and a contributing peer. Instead of regularly sending delay corrections, the receiving peer factors in all the delay corrections ordered since the start of the sub-streaming session and then issues a permanent delay correction that will be constantly enforced at the contributing peer to overcome the estimated clock de-synchronization. Clearly, the receiving peer's clock should be used as the reference in our multisource streaming system, since this is where the video content is being rendered; most importantly, it is a way to synchronize all the contributing peers around the streaming consumption pace at the receiving peer.
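A possible way to fold the accumulated one-off corrections into a permanent correction is sketched below; the skew estimate and its units are our assumptions, not the prototype's exact calculation.

def permanent_correction_rate(one_off_corrections_ms, elapsed_s):
    """Estimate a steady clock skew from past one-off corrections.

    one_off_corrections_ms -- signed corrections issued so far
                              (+lead for Hurry-Up, -lag for Slow-Down)
    elapsed_s              -- seconds since the sub-streaming session began
    Returns the drift, in ms per second, that the permanent Hurry-Up/
    Slow-Down request should compensate at the contributing peer.
    """
    return sum(one_off_corrections_ms) / elapsed_s

# Example: three 120 ms Hurry-Ups over 10 minutes of streaming suggest a
# permanent lead of 360 / 600 = 0.6 ms per second of streaming.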
If the new delay report shows that the delay drift is persisting through several past measurements (4), then the Content Guardian checks whether the maximum number of permanent delay drift corrections has been reached or not. If the maximum number of permanent delay drift corrections has not been reached yet (5), then a new permanent delay drift correction is performed. Otherwise, if the maximum number of permanent delay drift corrections has been reached (6), the Content Guardian checks whether load balancing has already been tried on the sub-stream associated with the failing
contributing peer. In our particular implementation, we tolerate up to two permanent delay drift corrections before moving on to more radical corrective actions to tackle the persisting delay problems. If the load balancing mechanism has not yet been tried to reduce the streaming load of the failing contributing peer, then the Content Guardian performs a load balancing operation (7) and then moves to the idle state. It is assumed here that the problem is most likely caused by a saturated or overloaded contributing peer; a network congestion problem is unlikely here, because the persistent delay increases are not accompanied by high loss rates. If this situation were, in fact, caused by congestion, then repeatedly increasing the transmission pace would end up causing buffer overflows at the congested node along the path and result in packet loss. These high loss rates would in turn be tackled by the adaptation policy for high loss rates introduced in Fig. 8 and the above sub-section. Finally, if a load balancing operation has already been tried, the Content Guardian checks whether an improvement of the delay drift has been observed compared to past delay measurements (8). At this point, if an improvement has been observed, then the Content Guardian performs an additional (final) load balancing operation (9), as this seems to be helping improve the situation. Finally, if no improvement has been achieved by the last load balancing operation, then the Content Guardian performs a peer switching operation (10). In this case, the most likely cause of the problem is severe network congestion or an unstable and extremely overloaded contributing peer. After performing a peer switching operation, the Content Guardian ends this adaptation policy instance, as the sub-streaming session is effectively ended by the peer switching operation.

5.3 Optimization and interworking of delay/loss adaptation policies

The streaming video quality and the underlying codec-integrated resiliency techniques have different sensitivities to different QoS performance metrics such as delay or loss. It is, therefore, important to address both the delay and loss issues through separate adaptation policies, in order to independently address each QoS performance metric according to its potential impact on the video streaming quality. Additionally, certain network/streaming problems can only be tackled by specifically addressing either the delay or the loss problem. For instance, bad link quality with a high BER can only be detected by monitoring the packet loss rate, while a saturated contributing peer or a light congestion will be detected by monitoring the delay variations at the receiving peer. By using separate adaptation policies it is possible to independently tune each adaptation policy to fit either (i) a specific audiovisual stream with a specific loss and delay resilience, or (ii) a specific deployment scenario with different end-to-end delays and a different number of nodes along the streaming path. On the other hand, we identified some situations where there is close complementarity and interdependency between the two QoS adaptation policies introduced earlier. In particular, simultaneous readings of both QoS performance metrics (delay and loss) can improve the network failure diagnosis process, leading to more accurate counteractions.
Network congestion can be diagnosed in a straightforward manner if the receiving peer measures an increasing delay and an increasing loss rate at the same time on a given sub-streaming session. The following section describes the behavior of each adaptation strategy, highlighting possible improvements that can be introduced to the adaptation strategies in order to accommodate a specific deployment scenario.
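A compact sketch of this joint diagnosis follows; the trend accessors are placeholders for the Content Guardian's smoothed measurements.

def diagnose(sub):
    """Map the (loss, delay) trends of a sub-stream to a likely cause."""
    high_loss = sub.smoothed_loss() > sub.loss_threshold
    rising_delay = sub.delay_trend_increasing()

    if high_loss and rising_delay:
        return "network congestion"          # both metrics degrade together
    if high_loss:
        return "poor link quality"           # steady BER-driven loss only
    if rising_delay:
        return "overloaded peer or clock drift"
    return "healthy"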


6 Experimental results

In the following, we thoroughly describe the way the different failover mechanisms work and how they are combined and alternated in meaningful and coherent adaptation strategies against loss and delay variation. We also highlight all the relevant adaptation strategy parameters that can be adjusted to fit different system use cases. The objective here is to highlight the flexibility of the adaptation strategies rather than to design an optimal one that fits a particular deployment's requirements. The following sub-sections also focus on thoroughly assessing the performance of the failover mechanisms in terms of efficiency, responsiveness, and accuracy. We use different test scenarios with a broad range of varying network conditions to test the dependable multisource streaming architecture in a realistic experimental test bed. All the test scenarios introduced in the following sub-sections are run on the same test bed configuration, which involves four contributing peers (Peer I, Peer II, Peer III, and Peer IV) in addition to a Receiving Peer where most of the measurement and logging is performed. All involved peers, contributing and receiving, have the same hardware specification: a PC with an Intel Core Duo CPU T9300 at 2.50 GHz and 4.0 GB of RAM. An important observation worth mentioning is that the processing power at the receiving and contributing peers is high enough to handle the different tasks of streaming adaptation, error control, and retransmission. Any streaming quality drift or degradation is therefore the consequence of the network conditions variation that we deliberately induced using the Network Emulator (NetEm).

6.1 Performance assessment of loss adaptation strategy

In this section, we conduct several tests to assess the performance of the loss adaptation strategy from different angles, in order to ultimately highlight important performance aspects such as responsiveness, efficiency, and accuracy. As discussed earlier, the Content Guardian relies on different loss rate measurements to (i) react to a given situation, (ii) undertake the most suitable corrective measures, and finally (iii) appropriately calibrate the adaptation mechanisms. Any adaptation to an increasing loss rate is triggered by the measurement of an excessive loss rate that is well above a pre-configured maximum threshold (MaxThr). If the measured final loss rate (FLR) is over the pre-defined threshold (MaxThr), then an adaptation action is deemed necessary at the receiving peer. The final loss rate (FLR) is a combination of three different loss rates measured at different levels: loss rate before FEC processing (LrBF), loss rate after FEC processing (LrAF), and loss rate after retransmission (LrAR). The final loss rate that triggers the adaptation is calculated following formula (1):

$$FLR = \alpha \cdot LrBF + \beta \cdot LrAF + \lambda \cdot LrAR, \text{ with } \alpha = 0.33, \beta = 0.33, \text{ and } \lambda = 0.33 \quad (1)$$

The three basic loss rate measurements (LrBF, LrAF, and LrAR) are each smoothed over the past three measurements in order to avoid over-reaction to transient bad network conditions. This essentially means that we average each loss rate measurement with the past two measurements, in order to obtain a stable measurement variation while still taking historic measurements into account. The measurement smoothing ultimately helps temper the aggressiveness of the adaptation mechanisms by ignoring transient extreme swings in network conditions.
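The sketch below implements formula (1) together with the 3-report smoothing just described; the equal weights are the study's defaults and could be re-biased as discussed next.

from collections import deque

ALPHA = BETA = LAMBDA = 0.33            # weights of formula (1)

class FlrTracker:
    """Smooth LrBF/LrAF/LrAR over the last three reports and combine them."""

    def __init__(self, window=3):
        self.history = {k: deque(maxlen=window) for k in ("bf", "af", "ar")}

    def update(self, lr_bf, lr_af, lr_ar):
        for key, value in (("bf", lr_bf), ("af", lr_af), ("ar", lr_ar)):
            self.history[key].append(value)

    def _smoothed(self, key):
        reports = self.history[key]
        return sum(reports) / len(reports) if reports else 0.0

    def flr(self):
        return (ALPHA * self._smoothed("bf")
                + BETA * self._smoothed("af")
                + LAMBDA * self._smoothed("ar"))

# An adaptation action is triggered whenever flr() exceeds MaxThr
# (3% of packet loss in our experiments).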


Based on the specific deployment situation of the reliable multisource streaming system, one can give a different relative importance to each of the three loss rate measurements used in the calculation of FLR. Intuitively, the final loss rate calculation should be biased towards the loss rate after retransmission (LrAR) if the audiovisual content being streamed does not use a loss-resilient codec and is consequently very sensitive to packet loss. This way, a slightly high LrAR would immediately trigger countermeasures to overcome an increasing loss rate. On the other hand, one can avoid using the LrAR for higher system responsiveness, since the calculation of LrAR for a given period lags the other loss rate measurements (LrBF and LrAF): it takes a few more RTTs (several retransmission attempts) before a packet is deemed lost. Note that the maximum threshold (MaxThr) that the FLR is compared to is also adjustable, to accommodate the sensitivity of a specific video codec or to reflect a specific network behavior. In our experiments the maximum threshold is set to 3% of packet loss, but it is adjustable as part of the overall loss adaptation strategy customization. Upon detecting an abnormal loss rate and establishing that a possible congestion is affecting one of the received sub-streams, the Content Guardian tries to assess the severity of the congestion and precisely quantify the bandwidth available along the affected path. At this point, the Content Guardian asks the concerned contributing peer (through a load balancing procedure) to reduce its streaming contribution to fit the available bandwidth along the path; the streaming load reduction is an estimation based on the characterization of the achievable throughput, and adjusts the sub-streaming volume to conceal the experienced loss rate. One or multiple other contributing peers active in the multisource streaming session are then asked to increase their streaming contributions to make up for the streaming reduction described above. The Content Guardian uses the loss rate measured before FEC processing (LrBF) to calculate the amount of streaming contribution reduction, as given in formula (2). The LrBF gives an unbiased view of how much throughput is achievable on the network, as it is calculated before recovering with FEC redundancy or retransmission.

$$ContributionReduction = LrBF \times CurrentContribution \quad (2)$$

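A sketch of how formula (2) could dimension a load balancing operation is given below; the peer objects, their fields, and the proportional redistribution of the shed volume are illustrative choices, not the prototype's exact policy.

def dimension_load_balancing(failing_peer, healthy_peers, lr_bf):
    """Shrink the failing peer's share per formula (2) and spread the
    shed volume over the healthy contributing peers."""
    reduction = lr_bf * failing_peer.contribution      # formula (2)
    failing_peer.contribution -= reduction

    # Redistribute proportionally to current shares (one possible policy);
    # large increases would first be cleared with the SuperNode.
    total = sum(p.contribution for p in healthy_peers)
    for peer in healthy_peers:
        peer.contribution += reduction * (peer.contribution / total)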
Formula (2) drives the load balancing procedure by ordering the affected contributing peer to reduce its streaming contribution by the appropriate streaming volume, in line with what can actually be carried along the path between the receiving peer and the failing contributing peer. The LrBF measurement needs to be very accurate here. This accuracy is essential to minimizing the number of successive load balancing procedures at the failing contributing peer, while favoring high network utilization by maximizing the streaming volume of the failing contributing peer.

6.1.1 Test scenario

In this test scenario, we emulate a network congestion occurring along the path crossed by one of the sub-streams received by the receiving peer. This is achieved through the NetEm network emulator, which allows reducing the available bandwidth at a specific point in time. While network congestion is usually caused by one (or multiple) buffer overflow(s) in the router(s) along a specific network path, the resulting effects are very much like those provoked by a progressive bandwidth reduction. Network congestion is, in fact, perceived by end-point protocols as a sharp reduction in the bandwidth, causing highly clustered packet loss. Note that network congestion is also typically characterized by a
steady build-up in the end-to-end delay. The diagnosis and handling of delay-related streaming problems is addressed by a separate adaptation strategy, presented in Section 6.2. In the following, we evaluate different aspects of the loss adaptation system, such as the accuracy of the QoS measurement system, the responsiveness of the overall adaptation system, and the efficiency of the individual failover mechanisms. In this test scenario, Peer I and Peer II each contribute 33% of the multisource streaming session, while Peer III and Peer IV each provide a contribution of 17%. More precisely, Peers I, II, III, and IV contribute to the multisource streaming session with the Start/Step filters 1/3, 2/3, 3/6, and 6/6, respectively. During the multisource streaming session, a severe congestion is introduced along the path crossed by Peer IV's sub-stream. We assess the performance of the overall system under two levels of congestion: a severe one provoked by limiting the bandwidth to 120 Kbps, and a less severe one provoked by limiting the bandwidth to 180 Kbps. In the following section, we present two sets of performance evaluation results. The second set of results shows the improvement of the loss adaptation strategy obtained by using better QoS performance metric measurements and problem diagnosis mechanisms; concretely, this consists of using more data points to smooth the measured loss rate before FEC (LrBF), which allows a better characterization of the network conditions and hence a better-dimensioned load balancing operation.

6.1.2 Experimental results analysis

In the following, we thoroughly investigate the behavior of the adaptation strategy mentioned above and identify possible improvements. Figures 10 and 11 show how the load balancing is achieved in real time after a bandwidth reduction to 120 Kbps and 180 Kbps at Peer IV; obviously, limiting the bandwidth to 120 Kbps has more severe consequences than limiting the bandwidth to 180 Kbps.
Fig. 10 Successive load balancing procedures—120 Kbps reduction in bandwidth (peer contribution in % vs. time in ms, Peers I–IV)
Fig. 11 Successive load balancing procedures—180 Kbps reduction in bandwidth (peer contribution in % vs. time in ms, Peers I–IV)

Figures 10 and 11 represent the instantaneous received streaming volume from each contributing peer, as measured by the receiving peer. At the beginning, Peer I provides 33% of the stream and Peer II provides another 33%, while Peer III and Peer IV provide 17% each. As illustrated in Fig. 10, when the bandwidth is limited to 120 Kbps, the Content Guardian detects the problem and then engages a load balancing procedure at t=32.05 s. At the same time, the instantaneous measured LrBF increases very aggressively, to around 60% (see Fig. 14). This procedure is completed at t=32.85 s, at which point Peer IV has been successfully requested to coarsely reduce its streaming contribution. It takes roughly 800 ms to complete the procedure. Based on the LrBF measured along the path crossed by Peer IV's sub-stream, the load balancing procedure results in increasing Peer III's contribution to 18% and decreasing Peer IV's contribution to 14%. As the first load balancing procedure fails to reduce the experienced loss rate before FEC recovery (LrBF), another, more aggressive, load balancing procedure is executed at t=35.15 s and completed at t=36.45 s. This results in increasing Peer III's contribution to 28% and reducing Peer IV's contribution to 4%. On the other hand, when the bandwidth is reduced to 180 Kbps, the measured LrBF is less alarming, with a peak loss rate of 43%, as illustrated in Fig. 15. In this case, the Content Guardian gradually overcomes the congestion via three successive load balancing procedures engaged at t=31.35 s, t=34.20 s, and t=37.60 s, and completed at t=32.15 s, t=35.20 s, and t=38.70 s, respectively. Clearly, the first load balancing procedure is very smooth, due to the fact that the LrBF used is averaged over time periods before and after the bandwidth reduction. The second load balancing procedure relies on an LrBF measured after the first bandwidth reduction, which leads to a larger contribution reduction at the failing peer (Peer IV). Finally, the third load balancing procedure represents a final adjustment to completely eliminate the residual loss rate (see Fig. 15).


While the overall time spent on adaptation in the 180-Kbps-reduction scenario is much larger than in the 120-Kbps-reduction scenario, the adaptation is much more accurate in the former. In fact, the FEC redundancy transmission is completely eliminated in the 180-Kbps-reduction scenario, while a residual loss rate keeps the FEC redundancy transmission active in the 120-Kbps-reduction scenario. Figures 12 and 13 illustrate the instantaneous bandwidth consumption at Peer IV when the bandwidth is reduced to 120 Kbps and 180 Kbps, respectively. The instantaneous bandwidth consumption is broken down into the bandwidth consumed by the video stream, the FEC packet transmissions, and the retransmitted packets; the total bandwidth consumption is also shown in each case. It is clearly visible how the bandwidth consumed by retransmission soars with the spiking loss rate when the FEC redundancy transmission is at a reduced level and cannot recover the lost packets. At this point, the FEC redundancy transmission progressively increases to tackle the increasing loss rates, which increases the packet loss recovery rate and thus reduces the number of retransmission requests. It can be observed from Fig. 12 that after the first load balancing procedure there was still a significant residual loss rate, which kept the levels of retransmission and FEC redundancy transmission high. Obviously, the Content Guardian over-reacted in the 120-Kbps-reduction scenario through a steep streaming contribution change during the second load balancing procedure; also, only two load balancing procedures were executed, as opposed to three load balancing procedures when the bandwidth is reduced to 180 Kbps. Again, this is explained by the fact that the measured LrBF was quite understating the current network conditions when the Content Guardian was calculating the parameters of the first load balancing procedure. The fact that the LrBF measurement is smoothed over the three past measurements led to the dilution of the last measurement report by the first two, which showed no loss rate.
Fig. 12 Instantaneous bandwidth consumption—120 Kbps reduction in bandwidth (Kbps vs. time in s; video stream, FEC, retransmission, total data rate)
Fig. 13 Instantaneous bandwidth consumption—180 Kbps reduction in bandwidth (Kbps vs. time in s; video stream, FEC, retransmission, total data rate)
Fig. 14 Loss rate measurements—120 Kbps reduction in bandwidth (loss rate in % vs. time in ×500 ms; LrBF, LrAF, LrAR)

As illustrated in Fig. 14, at t=31.85 s, when the Content Guardian engaged the first load balancing procedure in the 120-Kbps-reduction scenario, the LrBF measurements were showing a 12% loss rate, which led to a streaming contribution reduction of only 2% of the aggregated video stream (i.e., a 12% reduction in Peer IV's contribution). Afterwards, the measured LrBF spiked to 32% at t=31.35 s, then 38% at t=31.60 s, and finally peaked at 58% at t=31.90 s. By the time the first load balancing procedure was completed at t=32.85 s, the LrBF measurement was showing an average loss rate of 33% at t=32.85 s, 40% at t=33.25 s, 45% at t=33.50 s, and 50% at t=33.65 s. When the second load balancing procedure was engaged at t=35.15 s, the measured LrBF was oscillating between 31% and 50%, which led to an aggressive reduction of Peer IV's contribution during the second load balancing operation. In contrast, the measured LrBF in the 180-Kbps-reduction scenario is much lower, with a maximum of 33%. During the first load balancing procedure at t=31.35 s, the measured LrBF showed a packet loss percentage oscillating between 6%, 10%, and 14%. This accordingly provoked a conservative 10% reduction of Peer IV's current streaming contribution, corresponding to a new contribution of 15%, instead of the initial 17%. Afterwards, during the second load balancing procedure at t=34.20 s, the measured LrBF showed 30% packet loss, which consequently led to the reduction of Peer IV's streaming contribution to 10%; this roughly corresponds to a 35% reduction compared to the last streaming contribution level of 15%. Finally, during the last load balancing procedure at t=37.60 s, the measured LrBF was showing between 6% and 11% packet loss, which caused the Content Guardian to further correct Peer IV's contribution by an additional 10% of its current contribution level, bringing Peer IV's final contribution to 9%. There is a certain element of randomness attached to the way a network congestion manifests itself in terms of measured loss rates over time. Although congestion usually leads to a fixed amount of bandwidth reduction, an important disparity in the measured loss rate can be experienced, due to the very nature of buffer overflow and the random packet drops occurring during a congestion event. In fact, depending on how the offered load adjusts to a reduction in the available bandwidth, the congestion may have different consequences on the packet flows in terms of loss rates. Obviously, if the flows are responsive (e.g., TCP or TFRC), the network congestion will be short-lived, with minor consequences. As discussed above, the sub-streaming session can be affected in a non-uniform way, leading to different loss rates over time. In the 120-Kbps-reduction scenario, we observed that at the beginning of the network congestion the sub-streaming session was not badly affected by the congestion, and the first load balancing led to a small reduction in the streaming contribution, due to the fact that the measured LrBF was showing only a 12% loss rate. However, the packet loss rate as perceived by the receiving peer spiked afterwards to reach a peak of 58%, which led to a fairly aggressive and clearly inaccurate reduction of Peer IV's streaming contribution. This load balancing dimensioning problem was caused by the fact that, as the first load balancing was engaged, the past three LrBF measurements showed a large discrepancy, with the first two indicating values close to zero.
This causes the Content Guardian to fail to fully capture the current network conditions, which unavoidably leads to an under-sized first load balancing procedure that falls short of eliminating the incurred LrBF. One can improve the adaptation accuracy, at the cost of slightly deteriorated responsiveness, by only considering LrBF measurements showing an increasing loss rate. This way, the load balancing operation will be dimensioned based on LrBF measurements taken after the congestion event. This should particularly improve the performance of the first load balancing operation.


It is important to keep the number of consecutive load balancing procedures as low as possible, as it affects the adaptation responsiveness and determines how long the sub-streaming session will be subject to packet loss, with possible effects on the overall video quality. At the same time, one should avoid aggressively reducing the streaming volume of the contributing peer concerned by the network congestion; this would underutilize the network path between the failing contributing peer and the receiving peer, and put too much of a burden on the other contributing peers taking on the extra streaming load. Reducing the LrBF measurement smoothing period (i.e., using just the last report or two) would lead to an even higher instability in the system, which would likely over-react to transient deteriorating network conditions and trigger load balancing procedures when the temporary increase in loss rate could be tackled by the adaptive FEC and opportunistic retransmissions. On the other hand, increasing the LrBF smoothing period would increase the load balancing accuracy while significantly deteriorating the system responsiveness against quickly changing network conditions. While it is very important to keep the LrBF measurements smoothed over the past three reports in order to appropriately dimension the first load balancing procedure and avoid too much instability in the system performance, one should use as many LrBF measurement data points as possible for the second load balancing procedure. Using all LrBF measurement reports from the completion of the first load balancing to the engagement of the second load balancing procedure greatly increases the accuracy of the network conditions characterization and leads to a more accurate second load balancing procedure. This helps avoid a third load balancing procedure in certain cases. Based on the observations above, we implemented a dynamic load balancing dimensioning approach that uses the last three LrBF measurement reports for the first load balancing procedure, and all reports received between the first and the second load balancing procedures to dimension the second one. In the same manner, all LrBF measurements performed between the second and the third load balancing procedures are used to dimension the latter. This approach is referred to as dynamic smoothing (Fig. 15); a sketch of it is given below. Figure 16 shows clearly that the dynamic smoothing greatly improves the performance of the load balancing. The first load balancing procedure reduces Peer IV's contribution from 16% to 14%, just like the 3-report smoothing approach introduced above. This is to be expected, since the dynamic smoothing approach still uses the LrBF measurement averaged over the last three reports to dimension the first load balancing. The improvement is particularly noticeable when it comes to the second load balancing procedure. The improved efficiency in characterizing the network conditions through several LrBF measurements indeed leads to an improved load balancing accuracy, reducing Peer IV's contribution from 14% to just 6%. At the end of the second load balancing procedure, Peer IV's bandwidth consumption is around 80 Kbps with the dynamic smoothing approach, instead of 50 Kbps with the regular approach. This greatly improves the underlying network resource utilization (see Figs. 12 and 18). On the other hand, Fig. 17 shows that there are still three consecutive load balancing procedures in the 180-Kbps-reduction scenario when using the dynamic smoothing approach. However, these successive load balancing procedures are much more accurate, reducing the contribution volume of Peer IV by just what is needed to eliminate the experienced loss rate. After the three load balancing procedures, Peer IV's contribution is reduced to 12% with the dynamic smoothing approach, instead of 9% with the regular approach. This translates into an additional bandwidth utilization of over 50 Kbps (see Figs. 13 and 19).
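The dynamic smoothing policy can be sketched as follows; the class is illustrative and simply mirrors the windowing rule described above.

class DynamicLrbfSmoother:
    """First load balancing: last 3 LrBF reports; subsequent ones: every
    report received since the previous load balancing completed."""

    def __init__(self):
        self.reports = []
        self.marker = None       # index right after the last load balancing

    def add_report(self, lr_bf):
        self.reports.append(lr_bf)

    def smoothed(self):
        if self.marker is None:
            window = self.reports[-3:]           # regular 3-report smoothing
        else:
            window = self.reports[self.marker:]  # all reports since last op
        return sum(window) / len(window) if window else 0.0

    def on_load_balancing_completed(self):
        self.marker = len(self.reports)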


Fig. 15 Loss rate measurements—180 Kbps reduction in bandwidth (loss rate in % vs. time in ×500 ms; LrBF, LrAF, LrAR)
Fig. 16 Successive load balancing procedures—120 Kbps reduction in bandwidth (dynamic LrBF smoothing)
Fig. 17 Successive load balancing procedures—180 Kbps reduction in bandwidth (dynamic LrBF smoothing)

The improvement in terms of network utilization comes with a slightly extended instability period. In fact, when using the regular smoothing approach, the third load balancing procedure is completed at t=37.60 s, while it is completed at t=39.15 s when using the dynamic smoothing approach in the 180-Kbps-reduction scenario (see Figs. 11 and 17). This is also the case in the 120-Kbps-reduction scenario. The slightly extended adaptation period can be afforded by a service provider if the implied packet loss rate is recoverable through a combination of adaptive FEC redundancy transmission and opportunistic retransmissions. As illustrated in Figs. 18 and 19, the adaptive FEC redundancy transmission and opportunistic retransmission continue well after the completion of the last load balancing procedure, so as to control the residual loss rate. One can clearly observe from Fig. 20 that, after the three consecutive load balancing operations, there is still a fairly elevated LrBF associated with the extended adaptation period, which is entirely recovered through adaptive FEC redundancy transmission. Figure 21 shows how the delay evolves throughout the experiment for all the involved contributing peers; it plots the delay drift as calculated at the receiving peer. The delay drift for each sub-stream is calculated following formula (3) below:

$$DelayDrift = (TS(i) - TS(j)) - (RT(i) - RT(j)) \quad (3)$$

Here RT(i) represents the reception time of the last packet received on the sub-stream in question, while RT(j) represents the reception time of the packet currently being consumed by the video player. The reception time difference depends on the receiving peer's clock, as it quantifies the time elapsed between the receptions of the two packets, measured using the receiving peer's clock. On the other hand, the difference between the Timestamps of the two packets quantifies the time interval between the transmissions of the two packets, measured using the contributing peer's clock.
Fig. 18 Instantaneous bandwidth consumption—120 Kbps reduction in bandwidth (dynamic LrBF smoothing)
Fig. 19 Instantaneous bandwidth consumption—180 Kbps reduction in bandwidth (dynamic LrBF smoothing)
Fig. 20 Instantaneous loss rate measurements—180 Kbps reduction in bandwidth (dynamic LrBF smoothing)

Note that the delay drift is measured once every 500 ms. A DelayDrift with a negative value means that the sub-stream is running late; in this case, a delay drift correction through an RTSP-HurryUp request is most appropriate. In contrast, a DelayDrift with a positive value means that the sub-stream is running ahead, and a delay correction through an RTSP-SlowDown request is suitable.
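Formula (3) and the ±500 ms trigger translate directly into the following sketch; for readability we assume the RTP timestamps have already been converted to milliseconds.

DRIFT_TRIGGER_MS = 500

def delay_drift_ms(ts_i, ts_j, rt_i, rt_j):
    """Formula (3): (TS(i) - TS(j)) - (RT(i) - RT(j)), in milliseconds.
    Positive: the sub-stream runs ahead; negative: it runs late."""
    return (ts_i - ts_j) - (rt_i - rt_j)

def correction_for(drift_ms):
    """Choose the delay drift correction, if any, for a measured drift."""
    if drift_ms > DRIFT_TRIGGER_MS:
        return ("RTSP-SlowDown", drift_ms)       # apply a delay lag
    if drift_ms < -DRIFT_TRIGGER_MS:
        return ("RTSP-HurryUp", -drift_ms)       # apply a delay lead
    return None                                  # within tolerance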
Fig. 21 Instantaneous delay drift measurement—180 Kbps reduction in bandwidth (dynamic LrBF smoothing; delay drift in ms over time, Peers I–IV)

We can observe from Fig. 21 that there is a steady minor delay drift for all the received sub-streams. This is explained by the fact that, after the stream shaping and the re-scheduling of packet transmission times (via the Duration field), the system does not account for the time actually spent by the multisource streaming server to transmit each packet (including system calls and hardware latencies). This minor transmission delay varies from one contributing peer to another and largely depends on the STB processing capabilities and its level of saturation. Another factor contributing to this minor and steady delay drift is the inherent de-synchronization between the clocks used at the different peers in the network. A mechanism has been implemented at the contributing peer side to significantly reduce this steady drift and limit it to less than 50 ms over 10 min; it consists of accounting for the processing time between the transmissions of successive RTP packets. The remaining, negligible delay drift can be corrected during the streaming session through the delay drift correction mechanisms. All the results presented in Section 6.2 use this permanent delay correction at the contributing peer to account for the minor delays introduced by both our implementation and the underlying hardware processing latencies. As illustrated in Fig. 21, an important delay drift starts building up for Peer IV's sub-stream at around t=300 s. This increase in the delay drift is characteristic of network congestion events. It can be noticed that the delay is each time confined through successive load balancing operations, until it is concealed after the third load balancing operation. An important observation from the above experiment is that network congestion is better addressed through multiple load balancing operations that reduce the offered load of the concerned sub-stream and conceal the congestion effects, namely increasing loss rates and delays. The maximum loss rate threshold used to trigger such load balancing operations should be well dimensioned, so as not to overreact to the transient packet loss usually exhibited by networks. A major improvement of the system's accuracy in countering network congestion via load balancing procedures consists of using more loss rate measurement data points between consecutive load balancing operations (i.e., dynamic LrBF smoothing) in order to better assess the current network conditions. This greatly improves the network utilization by relying more on (i) past loss rate (LrBF) measurements and (ii) the underlying error control mechanisms to absorb residual packet loss.

6.1.3 Discussion

It is essential to appropriately distinguish between the different sorts of issues, in order to tackle them with the most appropriate combination of failover mechanisms. The different network problems that might lead to an unusually high loss rate are: network congestion, bad link quality, and transient high BER. These three network problems manifest themselves differently and are best addressed through different combinations of failover mechanisms. For instance, a bad link typically manifests itself through a steady, fixed high loss rate with stable low delays. The loss rate usually stays fixed even after a load balancing operation. Ideally, if diagnosed early, the bad link problem should be addressed through a peer switching operation rather than a load balancing operation.
On the other hand, a transient high loss rate manifests itself through a temporary spike in the loss rate that may be accompanied by slightly increased delays. Ideally, a transient high packet loss rate should be entirely handled by the underlying error control mechanisms (FEC redundancy transmission and opportunistic retransmission) without engaging other failover mechanisms. There is clearly an important trade-off between network measurement accuracy and early problem diagnosis that needs to be thoroughly investigated and addressed. The way instantaneous loss rate measurements are averaged and smoothed with past
measurements plays an important role in establishing whether a measured high loss rate is transient or persistent, which ultimately determines whether the problem is caused by a transient loss spike or by bad link quality. For instance, in a very responsive multisource streaming system there is a very fine line between diagnosing severe network congestion and diagnosing a transient high loss rate (e.g., light congestion). In the same manner, it is fairly hard to distinguish between persistent bad link quality and a transient high BER. Taking the time to clearly establish the network problem, through extended observation periods and longer-term measurement smoothing, would lead to a serious loss in responsiveness, which may cause severe video quality degradation due to unsustainably high loss rates. The tuning of all these parameters is left to the discretion of the system administrator, who should be guided by the specificities of the environment where the system is deployed. The loss adaptation strategy should rely on additional delay measurements to appropriately diagnose the network problem and better face it with the appropriate failover mechanisms. As revealed earlier, a network congestion can be diagnosed through increasing delays and packet loss at the beginning, and then fixed high delays and loss rates when the network congestion is at its peak (see Fig. 21). This is explained by the very nature of network congestion, which is caused by a buffer overflow in a router along the path of the concerned sub-stream. Typically, a load balancing operation should reduce or eliminate the network congestion, as explained above.

6.2 Performance assessment of delay adaptation strategy

In this part of the performance evaluation section, we focus on the delay adaptation strategy. The loss adaptation strategy is therefore disabled, in order to fully exhibit the specificities and constraints of delay drifts and how these can best be tackled with delay drift correction operations. The Content Guardian constantly monitors the buffer length of each received sub-stream in order to determine the amplitude of potential delay drifts. This is achieved by comparing the Timestamp of the packet that was last received, TS(i), with the Timestamp of the packet that is currently being consumed by the video player, TS(j). The difference between these two Timestamps is then compared to the difference between the reception times of packets i and j. If the difference is null, then the sub-streaming session between the contributing peer and the receiving peer does not suffer from de-synchronization. Otherwise, the Content Guardian establishes that there is a delay drift that needs to be quantified. The delay drift is quantified by formula (3) introduced above. This delay drift measurement technique essentially consists of assessing the synchronization between the packet sending pace at the contributing peer and the packet consumption pace at the receiving peer. While the Timestamp is designed to synchronize both the sender and the receiver on the same streaming pace, inherent clock discrepancies lead to de-synchronization in media streaming systems. The RTP [17] Internet standard includes mechanisms to periodically resynchronize the communicating end-systems through RTCP-SR (Sender Report) messages; in this case, the sender regularly resynchronizes the receiver.
However, in the case of multisource streaming it is important to have the streaming timing and synchronization coordinated by the receiving peer, which has a global view of all active sub-streaming sessions. Further, in its capacity as the focal point of multisource stream reception and of video rendering/consumption, the receiving peer should take on this important task of sub-stream synchronization.

If the DelayDrift in formula (3) above returns a positive value, the Content Guardian concludes that the sub-stream transmission is running ahead of the pace of video decoding and retrieval. This means that the contributing peer needs to be slowed down using an RTSP-SlowDown request with the DelayDrift value as a delay lag. In contrast, if the DelayDrift value is negative, the Content Guardian concludes that the contributing peer is running late and needs to be sped up using an RTSP-HurryUp request with a delay lead equal to the absolute value of DelayDrift. Note that a delay drift correction operation is only triggered if the DelayDrift value is over +500 ms or below −500 ms. A minimal sketch of this receiver-side measurement and decision logic is given below.
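The following sketch illustrates the principle just described; it is an illustration under stated assumptions rather than the system's actual code, and the sign convention follows the text (positive drift = sub-stream running ahead). Formula (3) itself appears earlier in the paper, so the expression below should be read as an assumed equivalent.

```python
DRIFT_THRESHOLD_MS = 500  # corrections are triggered beyond +/-500 ms

def delay_drift_ms(ts_i_ms, ts_j_ms, rx_i_ms, rx_j_ms):
    """Drift between packet i (last received) and packet j (being consumed).

    ts_*: RTP Timestamps converted to milliseconds of media time.
    rx_*: reception times measured with the receiving peer's own clock.
    """
    media_gap = ts_i_ms - ts_j_ms  # media-time distance between i and j
    clock_gap = rx_i_ms - rx_j_ms  # wall-clock distance between their receptions
    # Positive: packets arrive faster than they are consumed (advance);
    # negative: the contributing peer is running late.
    return media_gap - clock_gap

def decide_correction(drift_ms):
    if drift_ms > DRIFT_THRESHOLD_MS:
        return ("RTSP-SlowDown", drift_ms)      # apply the drift as a delay lag
    if drift_ms < -DRIFT_THRESHOLD_MS:
        return ("RTSP-HurryUp", abs(drift_ms))  # apply the drift as a delay lead
    return None                                 # within tolerance: no action
```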


The above method of calculating the delay drift is just one example of how this can be done. It is also possible to quantify the delay drift on a relative basis, by measuring only the buffer length (in units of time) of the received aggregate video stream and comparing it against fixed thresholds. The Content Guardian can then compare the delay drift measured on each received sub-stream with the one measured at the aggregated stream buffer, and undertake correction actions on each individual sub-stream as necessary. Although this can be an easier way to implement a delay drift control mechanism, it is preferable to rely on the receiving peer's clock to measure delay drifts, which inherently ensures that all received sub-streams are assessed against a constant reference.

6.2.1 Test scenario

In this sub-section, we describe the experimentation scenarios used to evaluate the performance of delay drift detection and correction under varying conditions. The objective is to focus on three important QoS attributes, namely responsiveness, accuracy, and stability.

First, responsiveness in our delay drift correction subsystem consists of promptly detecting a delay drift and establishing its characteristics, then quickly addressing it with the appropriate delay correction mechanism. As with the loss adaptation strategy, the delay drift measurements at the Content Guardian are smoothed in order to absorb transient variations and avoid overreacting to a temporary spike in delays. The delay drift measurements are indeed averaged over three consecutive delay drift measurements (i.e., a measurement over 1.5 s). It is worth recalling that the maximum acceptable delay drift is set conservatively at ±500 ms, which leaves the Content Guardian a considerable margin to detect and correct a delay drift before it has devastating consequences on the video streaming quality of experience.

Second, accuracy in our delay drift correction subsystem consists of precisely (i) quantifying any given delay drift through measurements at the Content Guardian, and (ii) correcting the actual delay drift with precision. The former mainly depends on the way the delay drift is calculated, as given in formula (3), while the latter represents the ability of the Content Guardian to bring the delay drift back as close to zero as possible upon a delay correction request.

Third, stability in our delay correction subsystem resides in its ability to yield consistent performance in a highly volatile environment where delays exhibit coarse variations over time. This also includes the ability of the delay adaptation subsystem to sustain good synchronization between the contributing peer and the receiving peer as the delay drift unexpectedly moves in opposite directions, i.e., from a negative delay drift (lateness) to a positive delay drift (advance).
Figure 22 illustrates the experimentation setup used to evaluate the delay drift correction mechanism in our multisource streaming testbed. It highlights three different test scenarios, with different network conditions and delay variation patterns between the contributing peers (Peer I, Peer II, Peer III, and Peer IV) and the receiving peer. In this experiment, a multisource streaming session is launched with four different contributing peers, with different network conditions along each path.


Each of the sub-streams associated with the first three contributing peers is subject to a different test scenario, while the sub-stream associated with Peer IV (not shown in Fig. 22 below) is used as a baseline with no delay variation affecting it. The delay variations between each contributing peer and the receiving peer are implemented using the NetEm network emulator, which uses the Linux traffic control facilities to alter the normal course of packet routing by introducing delays, jitter, losses, etc.

The sub-stream contributed by Peer I is subject to a steady delay increase over an extended period of time. This builds up a negative delay drift (lateness) that in turn periodically triggers a delay drift correction (RTSP-HurryUp) each time the measured delay drift falls below −500 ms. After three successive RTSP-HurryUp requests, the Content Guardian requests a final delay drift correction (RTSP-HurryUp) together with a permanent delay correction, which aims to constantly correct the delay drift at Peer I by regularly speeding up the streaming pace with a pre-calculated delay lead over a predefined period of time. The objective of this scenario is to assess the performance of the delay drift correction mechanism in the face of the most common delay issues that arise with streaming media; this type of delay issue is usually caused by network congestion along the path between the streaming sender and receiver.

The sub-stream contributed by Peer II is subject to a steady delay increase, followed by a steady delay decrease, and then a steady delay increase again. This scenario exhibits a typical network congestion that builds up in severity over 180 s before dissipating at the same pace; finally, the congestion builds up once more at the same pace. This delay variation scenario calls for delay drift corrections featuring both streaming rate acceleration and deceleration. It is important to assess the ability of the delay drift correction system to keep both the contributing and the receiving peers synchronized, and to keep the delay drift within the predefined range [−500 ms, +500 ms].

Fig. 22 Delay drift correction—description of the test scenario. Peer I: between t=20 s and t=500 s, the delay is increased by 100 ms every 5 s (9.60 s total at t=500 s); the goal is to assess the accuracy of the permanent RTSP-HurryUp (dimensioned from the CumulDelayDrift measurements). Peer II: the delay is increased by 100 ms every 5 s between t=20 s and t=200 s (3.60 s total), decreased by 100 ms every 5 s between t=200 s and t=380 s (back to 0 s), then increased again by 100 ms every 5 s between t=380 s and t=560 s (3.60 s total); the goal is to assess the accuracy of subsequent opposite delay corrections (Hurry-Up, followed by Slow-Down, and another Hurry-Up). Peer III: the delay is increased by 100 ms every 10 s between t=80 s and t=260 s (1.80 s total), changed by 1.8 s in one go at t=260 s, then increased by 400 ms every 10 s between t=320 s and t=500 s (7.2 s total); the goal is to assess the efficiency of the delay correction mechanism against coarse delay variations.

Finally, the sub-stream contributed by Peer III is subject to extreme delay variations in opposite directions. First, the delay is steadily increased at a slow pace of 100 ms every 10 s over a 260 s period; then a very coarse delay decrease of 1.8 s suddenly occurs at t=260 s. Afterwards, between t=320 s and t=500 s, the delay is aggressively increased again, by an additional 400 ms after every 10 s of streaming. This scenario features very steep delay variations in order to exhibit the behavior of the delay correction mechanism under very extreme network conditions, although such exceptional conditions would usually be accompanied by high losses—in which case the loss adaptation mechanisms would promptly take action, as shown in earlier sub-sections.

It is worth noting that the test scenarios described above are not synchronized in time with the delay drift measurements performed at the receiving peer: there is a slight shift of about 15 s on the time axis, resulting from the fact that NetEm's delay variation scenario is launched after the actual multisource streaming session has started. A sketch of how such a delay ramp can be scripted against NetEm is given below.
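As an illustration of how the testbed ramps can be driven, the following sketch scripts the Peer I scenario (delay +100 ms every 5 s) using the standard tc/netem interface; the interface name eth0 is a placeholder, the script must run with root privileges, and the exact timings follow Fig. 22.

```python
import subprocess
import time

def ramp_delay(iface="eth0", step_ms=100, period_s=5, steps=96):
    """Add 'step_ms' of one-way delay every 'period_s' seconds.

    96 steps of 100 ms reproduce the Peer I ramp: 9.6 s of added
    delay by the end of the run (t=500 s in the Fig. 22 timeline).
    """
    # Install a netem qdisc with no added delay yet.
    subprocess.run(["tc", "qdisc", "add", "dev", iface, "root",
                    "netem", "delay", "0ms"], check=True)
    for i in range(1, steps + 1):
        time.sleep(period_s)
        # Update the emulated delay in place.
        subprocess.run(["tc", "qdisc", "change", "dev", iface, "root",
                        "netem", "delay", f"{i * step_ms}ms"], check=True)

if __name__ == "__main__":
    ramp_delay()
```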

Fig. 23 Details of the delay drift correction operation (instantaneous delay drift measurements for Peer II, annotated with the one-off and permanent Hurry-Up/Slow-Down corrections)

Figure 23 illustrates how the delay drift correction mechanism reacts when faced with the delay variation scenario described in Fig. 22 above, under the Peer II NetEm scenario. A first important observation is that the Content Guardian is able to precisely measure the delay drift, with an error margin of the order of a few milliseconds. As the delay variation scenario starts its first phase at t=20 s, we can see that the delay drift increases by 100 ms for every 5 s that passes. After three consecutive one-off delay drift corrections (at t=59.18 s, t=95.18 s, and t=128.25 s respectively), a fourth one takes place at t=161.33 s, accompanied by a permanent delay correction action that is regularly enforced at Peer II thereafter. The amplitude of these delay corrections is clearly shown in Fig. 23. The permanent delay correction is expressed as a per-mil (‰) figure that essentially measures the number of milliseconds by which the streaming pace is corrected after each second.

The permanent delay correction keeps being enforced over time in order to avoid an excessive number of punctual delay corrections. To dimension it, the Content Guardian keeps track of all the delay corrections (CumulDelayDrift) issued in the past. This is calculated as follows:

CumulDelayDrift(i) = CumulDelayDrift(i − 1) + DelayDrift        (4)

The Content Guardian also keeps track of the time elapsed in the session since the first delay correction message was issued. Once three consecutive delay corrections of the same nature (HurryUp or SlowDown) have been executed, the Content Guardian issues one last delay correction before calculating a permanent delay correction as follows:

PermanentDelayCorrection = CumulDelayDrift / CorrectionTime        (5)

Here, CorrectionTime is the time measured between the issuance of the first delay correction and the moment when the Content Guardian decides to proceed with a permanent delay correction. Again, only the past three consecutive delay corrections of the same nature are considered. For instance, the CorrectionTime for the first permanent delay correction in Fig. 23 is equal to (161.33 s − 59.18 s) = 102.15 s, while CumulDelayDrift is equal to the sum of all delay drift corrections issued during that CorrectionTime period: (705 ms + 601 ms + 706 ms) = 2,012 ms. Using these values in formula (5) gives a permanent correction of about 19.7 ms for each second of streaming (≈19‰).

The delay drift has a negative sign when the sub-stream is behind the scheduled pace, and a positive sign when the sub-stream is running ahead of the scheduled pace; the scheduled pace is the one indicated by the Timestamps, in conformance with the video stream's temporal resolution. It should be noted that video stream shaping was de-activated for the purpose of the delay control study.

Clearly, the permanent Hurry-Up issued at t=161.33 s was successful in stabilizing the steadily increasing delay, as can be observed between t=170 s and t=200 s. During this period, the measured delay drift stayed relatively stable even though the sub-stream was still experiencing a delay increase of 100 ms every 5 s (i.e., 20 ms per second). Based on the measurements of the cumulative delay drift experienced so far, the Content Guardian calculated a permanent (HurryUp) delay correction of 19 ms per second, which means that the contributing peer Peer II reduces the value of the Duration field of each RTP packet by 1.9%. This permanent delay correction was determined using the values of CumulDelayDrift and CorrectionTime as calculated above; its accuracy is very appreciable, with an error margin of ±1 ms.

Another important observation is that upon each delay correction operation there is a spike in the delay drift measurement at the receiving peer. This is tied to the fact that the Content Guardian uses the RTSP/TCP protocols to order delay drift correction operations in a synchronous manner, which means that the Content Guardian has to wait for an acknowledgement of the delay drift correction's completion. During this time, the Content Guardian does not log delay drift measurements, though packets keep being streamed and the video quality is unaffected. The entire delay control sub-system operates independently of the actual media streaming process, and this logging implementation glitch has no impact on the overall multisource streaming architecture.
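To make the dimensioning above concrete, the short sketch below reproduces the formula (4)/(5) computation with the Fig. 23 numbers; the helper name is illustrative rather than taken from the implementation.

```python
def permanent_correction_ms_per_s(drift_corrections_ms, first_t_s, permanent_t_s):
    """Formula (5), with formula (4) unrolled as a plain sum."""
    cumul_delay_drift_ms = sum(drift_corrections_ms)  # formula (4)
    correction_time_s = permanent_t_s - first_t_s     # CorrectionTime
    return cumul_delay_drift_ms / correction_time_s   # ms of correction per second

# The three one-off corrections considered for the first permanent
# correction of Fig. 23, issued between t=59.18 s and t=161.33 s.
rate = permanent_correction_ms_per_s([705, 601, 706], 59.18, 161.33)
print(f"{rate:.1f} ms per second of streaming")  # ~19.7 ms/s, i.e. about 19 per mil
```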


The test scenario associated with Peer II is further analyzed in a later sub-section (see Fig. 26). Again, it is worth mentioning that there is a small delay gap between the time when a delay correction operation is issued and conveyed through the RTSP protocol, the time when it is enforced at the contributing peer, and the time when it is finally reflected in the delay drift measurements at the receiving peer. In the following sub-section, we discuss the performance of the Content Guardian in dealing with the different delay fluctuations at the different active contributing peers (Peer I, Peer II, and Peer III).

6.2.2 Experimental results analysis

Figure 24 illustrates the instantaneous delay drift measurements for the different sub-streaming sessions of the multisource streaming session. It highlights how the Content Guardian manages each sub-streaming session independently, on a real-time basis, depending on the network conditions experienced by each sub-stream. The timing reference used to resynchronize each sub-streaming session is tied to the Timestamp timeline, which serves as a global timing reference for the overall VOD stream. More specifically, the Timestamp timeline is monitored by the receiving peer, which uses its internal clock to detect any delay discrepancies between the Timestamp timeline and the actual streaming pace of each contributing peer involved in the multisource streaming session. Clearly, the video stream consumption pace at the receiving peer drives all the underlying delay corrections that take place at the active contributing peers.

A more detailed analysis of each individual sub-stream involved in the experiment is provided in Figs. 25, 26 and 27. The focus is on evaluating the performance of our delay drift control mechanism with respect to the three test scenarios introduced earlier in Fig. 22.

Fig. 24 Delay drift correction—a global view of the multisource streaming session (instantaneous delay drift measurements at the receiving peer for Peer I, Peer II, Peer III, and the baseline Peer IV)

Fig. 25 Delay drift correction for Peer I (instantaneous delay drift measurements, annotated with the one-off Hurry-Up corrections and the permanent Hurry-Up)

Figure 25 illustrates the streaming performance and assesses the efficiency of the delay correction sub-system with respect to the test scenario associated with Peer I. This scenario consists of steadily increasing the communication delay by 100 ms every 5 s, for a total cumulated delay of 9.6 s at t=500 s. Although this scenario is not realistic for an actual deployment, it is nonetheless very useful for evaluating the delay correction accuracy in the face of progressively worsening network congestion. Such a steady streaming delay increase also behaves like a serious clock discrepancy between the receiving peer and the contributing peer (Peer I).

As can be observed from Fig. 25, the Content Guardian issues three successive delay corrections (RTSP-HurryUp) in order to cope with the accumulating video streaming lateness. The first three delay corrections are issued at t=86.17 s, t=119.29 s, and t=149.27 s, with corrections of 699 ms, 605 ms, and 610 ms respectively. Each time, the streaming drift is corrected and brought back to around −200 ms. The delay correction falls short of re-adjusting the delay drift to a neutral (0 s) level because of the time necessary to enforce the delay correction operation with the entailed RTSP signaling, which roughly performs with a 200 ms responsiveness—this includes the RTT (round trip time) and the necessary RTSP acknowledgements.

Finally, as the three previous delay corrections fail to permanently stabilize the delay drift, the Content Guardian issues a permanent delay correction request with an initial delay correction of 800 ms and a recurring correction of 37‰ (37 ms of correction after each second). It is worth recalling that the positive and negative signs associated with delay corrections indicate to the contributing peer whether it should reduce or increase the pre-calculated inter-packet time interval. One can easily verify the calculation behind this permanent delay correction by taking the cumulated measured drift between the first delay drift correction (at t=86.169 s) and the permanent delay correction (at t=182.182 s), and then dividing this compound delay drift by the time interval between the first delay drift correction and the permanent delay correction.

Fig. 26 Delay drift correction for Peer II (instantaneous delay drift measurements, annotated with the one-off and permanent Hurry-Up/Slow-Down corrections)

This gives a delay drift (in ms) per millisecond, which should be multiplied by 1,000 to obtain the correction in milliseconds per second of video streaming. The permanent delay correction issued at t=182.18 s is successful in stabilizing the ramping delay drift, but results in a residual negative delay drift of about 7 ms for each 5 s of streaming. While this is admittedly a negligible error margin when put in the larger context of 500 s of streaming, it should be possible to narrow it further by accounting for the signaling latencies when dimensioning the permanent delay drift correction.

Figure 26 illustrates the performance of the delay drift control mechanism in dealing with the test scenario associated with Peer II. The focus of this test scenario is on frequent changes in the end-to-end delay build-up, with a delay increase phase followed by a delay decrease phase and then another delay increase phase. This behavior mimics a typical network congestion that slowly builds up to a peak, at which point the end-systems' responsive flows react by reducing their throughput, leading to a steady attenuation of the congestion severity.

Surprisingly, the delay drift measured during the second phase of the Peer II test scenario evolves much more aggressively than in the first phase, although the NetEm-induced delay variation pace is the same as in the first phase (i.e., a variation of 100 ms per each 5 s of streaming).

Fig. 27 Delay drift correction for Peer III (instantaneous delay drift measurements, annotated with the one-off Hurry-Up and Slow-Down corrections and the permanent Hurry-Up)

This is due to the fact that a permanent Hurry-Up delay correction was enforced at t=161.33 s, with a continuous correction of 95 ms every 5 s, to counter a constant delay increase of 100 ms every 5 s. Clearly, when the test scenario enters its second phase, there is already an initial excess streaming pace speed-up (delay decrease) of 95 ms every 5 s enforced at Peer II as a result of the permanent HurryUp request issued at t=161.33 s. This brings the overall incurred delay drift during the second phase of the test scenario to an advance of 195 ms for each 5 s of streaming. As a consequence, the successive delay correction (RTSP-SlowDown) operations are closer to each other in time, with an average of 20 s between consecutive delay corrections—half the time interval observed between successive delay corrections in the previous phase of the test scenario.

At t=283.45 s, the Content Guardian issues a permanent RTSP-SlowDown request with a one-off correction of 790 ms followed by a permanent correction of 24‰ (i.e., a streaming pace slow-down of 120 ms every 5 s). This permanent streaming pace slow-down is meant to counteract the existing streaming delay advance of 195 ms every 5 s. However, in our streaming delay control architecture a new permanent delay drift correction always replaces the ongoing one. This means that at t=283.45 s, the sub-stream's streaming pace is effectively running at an advance of only 100 ms every 5 s (20 ms per second)—the permanent correction is over-dimensioned here. Upon the enforcement of this permanent RTSP-SlowDown, there remains a residual delay increase of about 21 ms every 5 s (i.e., around 5 ms of latency per second), which can be observed in Fig. 26 during the period between t=300 s and t=380 s. At this point in time, the aggressive delay advance is contained until the third phase of the test scenario starts at t=380 s.
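The behavior above hinges on two sender-side rules: the correction is applied by scaling the nominal inter-packet Duration, and a newly signalled permanent correction replaces (rather than adds to) the ongoing one. The sketch below captures these two rules; the class and field names are illustrative, not taken from the implementation.

```python
class SubStreamPacer:
    """Sender-side pacing with a permanent per-mil correction."""

    def __init__(self, nominal_duration_ms):
        self.nominal_duration_ms = nominal_duration_ms
        self.permanent_permil = 0.0  # negative = HurryUp, positive = SlowDown

    def set_permanent_correction(self, permil):
        # A new permanent correction always replaces the ongoing one.
        self.permanent_permil = permil

    def next_interval_ms(self):
        # e.g. -19 per mil shortens every interval by 1.9% (HurryUp),
        # while +24 per mil stretches it by 2.4% (SlowDown).
        return self.nominal_duration_ms * (1.0 + self.permanent_permil / 1000.0)

pacer = SubStreamPacer(nominal_duration_ms=40.0)  # 40 ms nominal spacing (assumed)
pacer.set_permanent_correction(-19)               # the permanent HurryUp of Fig. 23
print(pacer.next_interval_ms())                   # 39.24 ms instead of 40 ms
```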


The limited accuracy of the second permanent delay correction is due to the fact that our delay correction system loses some accuracy when operating under challenging network conditions. Although it is very unlikely that our multisource streaming system will be deployed in the face of delay variations as extreme as those exhibited by the test scenarios considered here, it is important to carefully study the performance of the delay correction sub-system and tackle any outstanding design issues in order to produce a more stable and viable multisource streaming system. An initial observation regarding the permanent delay correction is that it is more challenging to precisely dimension a correction when the delay drifts at an aggressive pace: by the time the permanent delay drift correction is issued and enforced at the contributing peer, the delay will have drifted a little further.

Finally, the third phase of the test scenario associated with Peer II consists of a steady delay increase of 100 ms every 5 s over 180 s, which should bring the overall added delay to 3.60 s at t=560 s. As observed during the second phase of the test scenario, the effective delay drift builds up more aggressively, owing to the permanent streaming slow-down (RTSP-SlowDown) that is already in place. In fact, when the third phase of the test scenario starts, the delay advance of 100 ms per 5 s is replaced by a delay lateness of 100 ms per 5 s. This means that the previously issued permanent RTSP-SlowDown further weighs on the delay drift to produce an overall streaming pace lateness of about 215 ms for each 5 s. This can be verified by calculating the average time interval between the different one-off delay corrections (RTSP-HurryUp), which occur at t=403.19 s, t=424.17 s, and t=442.32 s. Referring to Fig. 26, a steep permanent delay correction (RTSP-HurryUp) is finally issued at t=460.25 s in order to contain the steadily increasing streaming lateness. This permanent delay correction is dimensioned for a regular correction of −36‰, which corresponds to speeding up the streaming pace by 180 ms every 5 s. After the permanent delay correction, Peer II's streaming pace is left with a residual delay drift of approximately 35 ms every 5 s, as can be observed in Fig. 26 from the correction onwards up to t=560 s.

Figure 27 illustrates the performance of the delay correction mechanism with respect to the test scenario associated with Peer III, as described in Fig. 22. When the delay is slowly increased between t=80 s and t=260 s, the Content Guardian responds with a delay correction operation whenever the measured delay drift goes beyond the −500 ms threshold. A very important observation is that the two successive delay drift corrections show improved accuracy compared to the previous test scenarios. This is mainly because the delay increase happens at a much lower pace (100 ms every 10 s instead of every 5 s of streaming), which gives the Content Guardian more margin to dimension the delay drift correction appropriately, based on accurate delay drift measurements. In fact, after each delay drift correction, the measured delay drift is brought back to −100 ms, as opposed to −200 ms in the two previous test scenarios. Another side benefit is the ability of the Content Guardian to contain the delay drift and responsively execute corrections before it goes beyond −600 ms.
At t=260 s, an abrupt decrease of the end-to-end delay by 1.8 s occurs, which dramatically moves the delay drift from a lateness of −600 ms to an advance of about 1.2 s. At this point, the Content Guardian issues an RTSP-SlowDown with a correction level of 1,294 ms. This precisely reduces the delay drift to an acceptable level of around −100 ms for the next 60 s of streaming.

At t=320 s, the sub-stream associated with Peer III is subjected to an aggressive steady delay increase of 400 ms every 10 s of streaming. This causes the Content Guardian to issue three successive delay correction operations (RTSP-HurryUp), which take place at t=373.62 s, t=394.68 s, and t=416.96 s with a delay drift correction of −803 ms each time. Because the delay increase happens at such an aggressive pace, these three successive corrections bring the delay drift back to only about −500 ms: by the time each delay drift correction is enforced, the delay drift has slipped by an additional 400 ms.


The latency involved in the signaling is obviously behind this lag in the response time needed to enforce a delay drift correction. When considered from the perspective of the overall multisource streaming session, the permanent delay correction turns out to be very effective in containing prolonged delay drifts for a limited signaling overhead. The error margin of approximately 5 ms per second is largely acceptable in the context of a multisource streaming system that employs different mechanisms to face both delay and loss variations. In fact, a scenario as harsh as the one associated with Peer III is unlikely to happen, since such steep delay increases would almost certainly be accompanied by high loss rates (i.e., network congestion), meaning that a load balancing operation would be ordered by the loss control sub-system to reduce both the loss rate and the delay.

6.2.3 Discussion

Clearly, the whole delay correction sub-system shows better accuracy and responsiveness when the delay variations are not too aggressive. With a slowly deteriorating delay drift, the Content Guardian can promptly react and tackle the drift before it widely exceeds the predefined thresholds [−500 ms, +500 ms]. When the delay drifting pace is limited, the response time of the delay drift correction improves significantly relative to the signaling time, which leads to better control performance. The delay drift correction accuracy should also improve in more stable network conditions where the end-to-end delays change smoothly at a constant pace: in this case, the delay correction dimensioning yields better accuracy in capturing future evolutions of the delay, which is particularly relevant for permanent delay correction operations.

Most of the main parameters of our delay drift correction sub-system may be tailored to accommodate the requirements of a particular use case and a specific deployment environment, in terms of streaming content characteristics, network RTT, typical network condition fluctuations, Quality of Experience (QoE) targets, existing QoS guarantees (SLAs), etc. For instance, in our specific implementation, the Content Guardian executes delay drift correction decisions based on a final delay drift measurement that is smoothed over the past 5 actual delay drift measurements; each of these actual measurements effectively consists of assessing the difference between the Timestamp-based delay and the reception-time-based delay for a reference RTP packet and the last received RTP packet. Two consecutive actual delay drift measurements are spaced 3 s apart. Therefore, the Content Guardian tracks delay drift trend changes over the last 15 s, which significantly influences the responsiveness of the entire delay correction sub-system (a sketch of this smoothing policy is given below). Steep delay drift changes lead to a swift reaction (see the third test scenario, Fig. 27), with less than 6 s between successive delay corrections, while a smooth delay variation leads to a less prompt reaction by the Content Guardian (see the first test scenario, Fig. 25). The service provider can either set very conservative acceptable delay drift thresholds and enforce smooth delay drift measurement averaging, or instead raise the delay drift thresholds to the highest levels acceptable from a QoE perspective and enforce very responsive delay drift measurement averaging.
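A minimal sketch of the smoothing policy just described, assuming raw drift measurements taken every 3 s and a decision value averaged over the last 5 of them (a 15 s observation window); the class and method names are illustrative.

```python
from collections import deque

class DriftSmoother:
    """Average the last few raw drift measurements before any decision."""

    def __init__(self, window=5):
        self.samples = deque(maxlen=window)

    def add_measurement(self, raw_drift_ms):
        self.samples.append(raw_drift_ms)  # called once every 3 s

    def final_drift_ms(self):
        # No decision until the window is full, so a single transient
        # spike cannot trigger a correction on its own.
        if len(self.samples) < self.samples.maxlen:
            return None
        return sum(self.samples) / len(self.samples)
```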
Another important observation, in relation to the major system parameters driving the delay correction performance, is that the reference RTP packet—used to measure and consequently correct the delay drift—may be changed throughout the streaming session, producing widely different system behaviors. In our specific implementation, the reference RTP packet is re-set each time the final delay drift indicates very stable performance.


This clears all the saved cumulative delay drift measurements (used for permanent delay corrections), and effectively restarts the whole cycle of delay drift correction. The idea behind this design choice is to reset the delay correction mechanism after each network congestion event (i.e., each major delay variation event), since successive congestion events are usually uncorrelated and should be dealt with accordingly. All the adjustable parameters in our failover mechanisms can be tuned as part of the overall adaptation strategy design; this includes the delay/loss measurement thresholds upon which the adaptation strategies react, the smoothing periods for the different QoS performance metrics, etc.

7 Conclusion

In this article, we presented a highly scalable architecture for P2P-based VOD services provisioning over IP networks. Our focus was on evaluating the viability of such an architecture in dealing with network uncertainties and other streaming issues that grow in importance with the scale of the streaming system. Our P2P streaming system relies on a dependable multisource streaming sub-system that implements various reliability mechanisms, such as dynamic FEC protocol switching, opportunistic packet retransmission, streaming load balancing, delay drift correction, and peer switching. These different, and often complementary, reliability mechanisms can be combined into meaningful adaptation strategies that are driven by network conditions, as diagnosed through analyzing events exhibited by the network or the contributing peers. In this work, we assessed the performance of two simple yet powerful adaptation strategies: a loss adaptation strategy and a delay adaptation strategy. The objective was not to design the best adaptation strategy, but rather to evaluate the accuracy and efficiency of the individual failover mechanisms. By relying on accurate and efficient failover mechanisms, it is clearly possible to tailor an adaptation strategy to meet a particular deployment's requirements and accommodate specific use cases—based on network capacity, traffic pattern, streaming content nature, SLAs in place, target QoE (Quality of Experience), etc.

A major contribution of this paper consists of validating the design of our P2P-based VOD provisioning system and the underlying multisource streaming sub-system through full-scale system prototyping and implementation. The different implementation choices have been highlighted in each case by focusing on the performance/complexity trade-off. We have shown that, when combined with the appropriate reliability mechanisms, multisource streaming can be very dependable in a professional VOD service provisioning offering. Using simple mechanisms, it is possible to deal with a broad range of typical streaming/networking problems at a marginal cost in terms of overhead, complexity, and responsiveness. An extensive performance evaluation of our system under different extreme configurations shows that all major streaming issues can be handled fairly well using the suite of multisource streaming failover mechanisms.

An interesting future research direction would be to combine the delay adaptation strategy and the loss adaptation strategy into a holistic and comprehensive overall strategy that individually monitors and correlates the delay and loss variations to diagnose streaming problems more efficiently and promptly. This essentially consists of dimensioning the corrective actions and failover mechanisms based on joint delay and loss variation measurements. At the same time, such a comprehensive adaptation strategy could alternate between loss- and delay-specific failover mechanisms (e.g., delay correction and load balancing) to more accurately address those streaming problems that have both delay and loss consequences, such as congestion or a saturated contributing peer.


Appendix

Figures 28, 29 and 30 below illustrate the details of the processes of delay drift correction, streaming load balancing, and peer switching, respectively.

Fig. 28 Streaming delay drift correction—step by step (between the receiving peer RP and the contributing peers CP1–CP5):
1. Initiating sub-streaming sessions with peers CP1, CP2, CP3, CP4, CP5
2. Simultaneous sub-streams reception
3. Delay drift detection on the CP5 sub-stream
4. Delay correction with a fixed time period (one-off Hurry-Up or Slow-Down request)
5. Delay correction
6. After several delay correction attempts, the delay drift persists on the CP5 sub-stream
7. Permanent delay correction (permanent Hurry-Up or Slow-Down request)

Fig. 29 Contributing peers' load balancing—step by step (between the receiving peer RP, the contributing peers CP1–CP5, and the SuperNode SN):
1. New VOD session request
2. List of contributing peers
3. Initiating sub-streaming sessions with peers CP1, CP2, CP3, CP4, CP5
4. Simultaneous sub-streams reception
5. Persisting high loss/delay on the CP5 sub-stream
6. Determining a new load allocation
7.–8. Increase the streaming contribution of two healthy peers
9. Decrease the streaming contribution of CP5
10. Completion of load balancing

Fig. 30 Contributing peers' switching—signaling view (between the receiving peer RP, the contributing peers CP1–CP5, the replacing peer CP6, and the SuperNode SN):
1. New VOD session request
2. List of contributing peers
3. Initiating sub-streaming sessions with peers CP1, CP2, CP3, CP4, CP5
4. Simultaneous sub-streams reception
5. Failure of CP5
6. Failure detection
7. Try load balancing
8. Peer switching request for CP5
9. Suggestion of CP6 as replacement peer
10. Initiating session with CP6
11. Sub-stream reception
12. Stopping session with CP5
13. Completion of peer switching
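As a complement to Fig. 30, the following self-contained sketch walks through the same failover sequence; all class and method names are hypothetical stand-ins, and the real system performs these steps over RTSP signaling.

```python
class SuperNode:
    def suggest_replacement(self, failed_peer):
        return "CP6"  # steps 8-9: the SuperNode proposes a replacement peer

class ReceivingPeer:
    def __init__(self, super_node, peers):
        self.super_node = super_node
        self.active = set(peers)

    def try_load_balancing(self, exclude):
        # Step 7: in the Fig. 30 scenario the remaining peers cannot
        # absorb the lost load, so load balancing fails.
        return False

    def start_substream(self, peer):
        self.active.add(peer)      # steps 10-11: session setup and reception

    def stop_substream(self, peer):
        self.active.discard(peer)  # step 12: tear down the failed session

    def handle_substream_failure(self, failed_peer):
        if self.try_load_balancing(exclude=failed_peer):
            return
        replacement = self.super_node.suggest_replacement(failed_peer)
        # The new sub-stream is started before the failed session is
        # stopped (steps 10-12), so the aggregate stream is not starved.
        self.start_substream(replacement)
        self.stop_substream(failed_peer)  # step 13: peer switching completed

rp = ReceivingPeer(SuperNode(), ["CP1", "CP2", "CP3", "CP4", "CP5"])
rp.handle_substream_failure("CP5")
print(sorted(rp.active))  # CP5 has been replaced by CP6
```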

References

1. Annapureddy S et al (2007) Exploring VoD in P2P swarming systems. In: Proc. of IEEE INFOCOM 2007. Anchorage, AK, USA, pp 2571–2575
2. Apostolopoulos J et al (2002) On multiple description streaming with content delivery networks. In: Proc. of IEEE INFOCOM 2002, vol. 3. New York, NY, USA, pp 1736–1745, June
3. Bindal R, Cao P (2006) Can self-organizing P2P file distribution provide QoS guarantees? ACM SIGOPS Operating Systems Review, special issue on self-organizing systems, vol. 40, issue 3, pp 22–30, July
4. de Asis Lopez-Fuentes F, Steinbach E (2008) Adaptive multi-source video multicast. In: Proc. of the IEEE International Conference on Multimedia and Expo, IEEE ICME 2008. Hannover, Germany, pp 457–460, 23–26 April
5. Do TT et al (2004) P2VoD: providing fault tolerant video-on-demand streaming in peer-to-peer environment. In: Proc. of the 2004 IEEE International Conference on Communications, vol. 3. Paris, France, pp 1467–1472, 20–24 June
6. Dongyan X, Hefeeda M, Hambrusch S, Bhargava B (2002) On peer-to-peer media streaming. In: Proc. of the 22nd IEEE International Conference on Distributed Computing Systems, IEEE ICDCS'02. Vienna, Austria, pp 363–371, 2–5 July
7. Hefeeda M et al (2003) PROMISE: peer-to-peer media streaming using CollectCast. In: Proc. of ACM MULTIMEDIA 2003. Berkeley, CA, USA, pp 45–54, 2–8 November


8. Huang Y et al (2007) When is P2P technology beneficial to IPTV services? In: Proc. of NOSSDAV 2007. Urbana-Champaign, IL, USA, June
9. Huntington D (2005) U.S. Patent No. 6,970,937 (filed Jun 15, 2001): user-relayed data broadcasting. Abacast Inc., November 29
10. Itaya S, Enokido T, Takizawa M, Yamada A (2005) A scalable multimedia streaming model based on a multi-source streaming concept. In: Proc. of the 11th IEEE International Conference on Parallel and Distributed Systems, IEEE ICPADS 2005, vol. 1. Fukuoka, Japan, pp 15–21, 20–22 July
11. Liao X et al (2006) AnySee: peer-to-peer live streaming. In: Proc. of IEEE INFOCOM 2006. Barcelona, Spain, pp 1–10, April
12. Magharei N, Rejaie R (2007) PRIME: peer-to-peer receiver-driven mesh-based streaming. In: Proc. of the 26th IEEE INFOCOM 2007. Anchorage, AK, USA, pp 1415–1423, 7–12 May
13. Mundur P et al (2004) End-to-end analysis of distributed video-on-demand systems. IEEE Transactions on Multimedia, vol. 6, no. 1, February
14. Nafaa A (2010) Design, implementation, and analysis of a scalable VOD services delivery system: a P2P architecture. IEEE Transactions on Parallel and Distributed Systems, to appear in June
15. Schierl T, Ganger K, Hellge C, Wiegand T, Stockhammer T (2006) SVC-based multisource streaming for robust video transmission in mobile ad hoc networks. IEEE Wireless Communications Magazine, vol. 13, issue 5, pp 96–103, October
16. Agarwal V, Rejaie R (2005) Adaptive multi-source streaming in heterogeneous peer-to-peer networks. In: Proc. of the SPIE Multimedia Computing and Networking Conference 2005, vol. 5680. San Jose, CA, USA, pp 13–25, 19–20 January
17. Kikuchi Y et al (2000) RFC 3016: RTP payload format for MPEG-4 Audio/Visual streams. Internet Engineering Task Force, November

Dr. Abdelhamid Nafaa is a Research Fellow with University College Dublin. Before joining UCD, he was an assistant professor at the University of Versailles-SQY and acted as a technology consultant for U.S.- and EU-based companies in the area of reliable wireless multimedia communication and large-scale video distribution systems. He has been involved in several EU and EI R&D projects.


Baptiste Gourdin is a postgraduate student at the Stanford Computer Security Lab, where he works on research issues related to web and mobile device security. Prior to that, he was with the Performance Engineering Lab at University College Dublin, where he worked on multi-source video streaming. He holds an Engineering degree in computer systems, networks and security from EPITA in France.

Prof. Liam Murphy received a B.E. in Electrical Engineering from University College Dublin (UCD) in 1985, and an M.Sc. and Ph.D. in Electrical Engineering and Computer Sciences from the University of California, Berkeley in 1988 and 1992, respectively. He is currently an Associate Professor in Computer Science at UCD, where he is Director of the Performance Engineering Laboratory. His current research projects involve mobile and wireless systems, computer network convergence issues, and web services performance.
