Seamless Content Delivery over Mobile 3G+/4G Networks

Mobile Netw Appl DOI 10.1007/s11036-010-0259-1

Theodore Zahariadis & Karsten Grüneberg & Luca Celetto

© Springer Science+Business Media, LLC 2010

Abstract Widespread and affordable mobile broadband access opens up opportunities for the delivery of new streaming services everywhere and at any time. However, what is expected to fundamentally change the way people use the network is the ability to produce, seamlessly deliver and share their own multimedia content. In this paper we describe the content distribution and adaptation architecture that we have implemented and tested, and the results obtained using new video coding formats (e.g. SVC, MVC) and new methods for increasing the robustness of video delivery.

Keywords scalable video coding · multi-view coding · multi-description coding · cross-layer adaptation · 3G+/4G mobile networks

Part of this work has been funded by the EC under projects FP7-ICT-214063 SEA, FP7-ICT-248036 COAST and FP7-ICT-249065 nextMedia.

T. Zahariadis (*) Synelixis Solutions, Farmakidou 10, Chalkida 34100, Greece e-mail: [email protected]
K. Grüneberg Fraunhofer Heinrich-Hertz-Institut, Einsteinufer 37, 10587 Berlin, Germany e-mail: [email protected]
L. Celetto ST Microelectronics, Via C. Olivetti 2, Agrate Brianza 20141, Italy e-mail: [email protected]

1 Introduction

The next generation of mobile networks, such as HSPA, LTE and Mobile WiMAX, has gained much interest because these networks offer higher throughput and a lower cost per bit of information, and are more “friendly” to IP traffic. Moreover, automatic handover between heterogeneous mobile networks (vertical handover) is already foreseen in ETSI and ITU standards. Wireless networks, however, both for mobile and stationary users, are still prone to some common drawbacks. They are heterogeneous in bandwidth, reliability and receiver device characteristics. In wireless channels, packets can be delayed (due to queuing, propagation, transmission and processing delays), lost, or discarded due to complexity/resource limitations or display capabilities of the receiver. Hence, the experienced packet losses can be up to 10% or more, and the time allocated to the various users and the resulting goodput for multimedia bit stream transmission can also vary significantly over time. Many multimedia applications are designed to cope with a certain amount of packet loss, depending on the sequence characteristics, the compression schemes and the error concealment strategies available at the receiver (e.g. packet losses of up to 5% or more can be tolerated at times). Consequently, unlike file transfers, real-time multimedia applications do not require a complete absence of packet losses, but rather cooperation between the application layer and the lower layers to select the optimal wireless transmission strategy that maximizes multimedia performance. To achieve a high level of acceptability and proliferation of wireless multimedia, in particular wireless video, several key requirements need to be satisfied by multimedia streaming solutions over such channels:


• Easy adaptability to wireless bandwidth fluctuations. Sources of the fluctuation include multipath fading, mobility, handover, competing traffic and so on.
• Robustness to partial data losses. Degradation of quality due to partial data loss should have minimal impact on the final user perceived quality.
• Support for heterogeneous wireless clients. Available bandwidth, computing capabilities, buffer availability, display resolutions and power limitations, amongst other things, should be taken into account.

In this paper, we describe an innovative network architecture for seamless content delivery that maintains the integrity and, wherever applicable, adapts and enriches the quality of the media across the whole distribution chain, focusing on mobile networks. The proposed system is based on new Media Aware Network Element (MANE) nodes that take advantage of the new media coding formats in order to offer enhanced Perceived Quality of Service (PQoS). The proposed architecture has been implemented, tested and validated over real testbeds and real-time emulators of various networks (HSPA, LTE, WiMAX).

2 Logical network architecture

When building a multimedia service delivery architecture, it is desirable to have as much information about the lower layers (up to the network layer) as possible, along with the scalability functionality that comes with the media codecs. To this end, wherever applicable in the proposed architecture, we introduce intelligent media/network aware entities. These could be a layer of new or enhanced networked nodes. We introduce two MANE types: a) a streaming Home Media Gateway (sHMG) layer of nodes, located at the edge of the extended home environment, and, more importantly, b) a streaming Network Media Gateway (sNMG) layer of nodes, located at the edge between the core and the access network (Fig. 1). The proposed MANE nodes offer functions like network and terminal awareness, content enrichment and content protection, multimedia storage, dynamic content adaptation and enriched PQoS by dynamically combining multiple multimedia content layers from various sources. In the longer term, they may be integrated into the IP Multimedia Subsystem (IMS) as defined by ETSI TISPAN. Moreover, as they have knowledge of the underlying networks, they provide information on the network conditions/characteristics, which may be utilised by the Cross Layer Control (CLC) mechanism to adapt the multimedia streams to the next network in the delivery path. This will be extremely important in the case of low-bandwidth but guaranteed-QoS mobile networks.

On the other hand, the 3GPP System Architecture Evolution (SAE) develops a framework for the migration of the 3GPP system to a higher-data-rate, lower-latency, packet-optimized system that supports multiple Radio Access Technologies (RATs). It is intended to be the all-IP core network of LTE, as well as a connection point for other access technologies, both 3GPP and non-3GPP. This includes most mobile network technologies, e.g. HSPA, UMTS, GPRS and WiMAX, and several landline access networks such as xDSL. A simplified mapping between the 3GPP SAE concept and the proposed architecture is depicted in Fig. 2. For the integration of the sNMG into the SAE environment, the reference point to the SAE anchor, the SGi interface, is of major relevance. For maintaining the QoS throughout the entire distribution path, it is also important to allow mapping of service QoS from the sNMG to the SAE’s QoS policy. For example, the SAE architecture allows pushing QoS profiles based on subscriber IDs into the network all the way to the radio access networks.

3 Cross-layer adaptation

Over the described network architecture, we propose to provide content adaptation [1, 2]. Within the paper, we consider the evolving H.264 SVC (Scalable Video Coding) and MVC (Multi View Coding) as the major foreseen content encoding technologies over heterogeneous mobile networks and for large audiences. SVC will offer layered temporal/spatial/quality content scalability, while MVC will introduce a truly personalised video delivery experience by allowing the user to select among the different views embedded in a single video stream. The proposed cross-layer and end-to-end signalling solution is based on the MPEG-21 approach [3]. However, signalling has been adapted to follow the IETF (SDP [4] and RTSP [5]) approach.

Taking into account the proposed architecture, the network nodes and the final terminal capabilities (ranging from laptops to mobile phones), we adopt a general adaptation network architecture as shown in Fig. 3. In this view, we assume that in the path from the Content Provider (including content prosumers) to the terminal, we may have N+1 Adaptation Engines (AE). Each engine is responsible for adapting the video stream to the next network in the path, i.e. AEi adapts the video stream to the characteristics/capabilities of Network i, always taking into account the final terminal capabilities and user requirements. As the adaptation options may be limited, some adaptation engines may perform stream adaptation, while others may just forward (relay) network, streaming, terminal or user characteristics to the next AE along the connection path. It is important to note, however, that the last adaptation engine (AEL) will also have the responsibility to terminate the adaptation in case the terminal is not able to handle it.

Fig. 1 Proposed content delivery network architecture

The architecture of Fig. 3 is further analysed into Cross Layer Adaptation and Control functional nodes as shown in Fig. 4. The major CLC nodes are the Adaptation Decision Module (ADM) and the Adaptation Execution Module (AEM), which offer the following functionality:

• Adaptation Decision Module (ADM): This module decides whether an adaptation has to take place and, if so, which one. Based on a multi-criteria decision framework, on network and terminal capability sensing, and on information regarding the connecting terminals, the ADM allows tuning of the encoding/streaming parameters, optimizing the end-to-end rate, the image distortion/quality and the resilience strategies at the application layer.
• Adaptation Execution Module (AEM): This is the context-aware module that actually performs the A/V handling. AEM functions include dropping or combining SVC layers or MVC views and initiating MDC distribution over different paths.

Additionally, based on the business model, the ownership of the nodes and the capabilities of the terminal, three supporting entities may also be defined:

• Content Storage Module (CSM): This module is used to store or cache the A/V content segments (layers, views, descriptions). It may act as an A/V server or as a peer node in a P2P environment supporting on-the-fly content enrichment.
• Network Awareness Module (NAM): This module has knowledge of the physical characteristics of the network (multiple access, QoS classes, coverage). Moreover, it may be able to measure or probe network parameters, e.g. number of users, available bandwidth, etc.
• Terminal Awareness Module (TAM): This module has knowledge of the physical characteristics of the terminal (display, network interfaces, processing power, decoding capabilities). Moreover, it may be able to measure parameters at the terminal, e.g. CPU load, battery life, free storage space, etc.

Fig. 2 Mapping of the proposed MANE to the SAE architecture

Fig. 3 Adaptation network architecture

3.1 The adaptation engine architecture

We assume that the received stream may be SVC, MVC, MDC (Multi Description Coding) or P2P (Peer to Peer) video chunks. A more detailed view of the Adaptation Engine is shown in Fig. 5. As can be seen, the stream may be P2P, SVC/AVC, MVC or MDC and may be adapted by one or more AEM sub-modules.

At the first level, the NAM/TAM modules sense the terminal characteristics and the network characteristics and conditions (e.g. available bandwidth, network congestion, bit rate ratio, error packet ratio, jitter, etc.) and provide them as input to the ADM module.

At the second level, the ADM modules communicate horizontally to exchange information and make decisions. The ADM module also handles the SDP signalling flows. In this way it is aware of the video streams (e.g. SVC, MVC) and their characteristics (e.g. number of enhanced layers, number of views, etc.). It is important to note that each ADM makes a decision for the network that will follow. In VOD cases, when an adaptation needs to take place (e.g. dropping an enhanced layer in SVC or a view in MVC), it is preferable to perform that adaptation closer to the server, so that the required bandwidth is reduced and consequently the network load drops. Thus, the adaptation decision is propagated to the ADM modules that are closer to the Video Server. This assumption is not valid in P2P networks or in content-aware networks that keep caches of the content in the network for later use [9].

At the third level, the ADM communicates with the AEM to perform the content adaptation.

Fig. 4 Cross Layer Modules communication
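As an illustration of the kind of decision an ADM might take, the following minimal sketch picks how many SVC layers to forward to the next network, given a bandwidth figure from the NAM and a decoding limit from the TAM. It is not the decision framework of the implemented system: the class names, the `headroom` parameter, the example bit rates and the rule of always keeping the base layer are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NetworkReport:
    """Hypothetical NAM output for the next network in the path."""
    available_kbps: float


@dataclass
class TerminalReport:
    """Hypothetical TAM output for the receiving terminal."""
    max_decodable_layers: int


def select_layer_count(layer_kbps: Sequence[float],
                       network: NetworkReport,
                       terminal: TerminalReport,
                       headroom: float = 0.9) -> int:
    """Return how many SVC layers (base layer first) to forward.

    layer_kbps[i] is the bit rate added by layer i. Layers are accepted in
    order while their cumulative rate fits into headroom * available_kbps
    and the terminal can still decode them. The base layer is always kept,
    which is an assumption of this sketch.
    """
    budget = network.available_kbps * headroom
    cumulative = 0.0
    count = 0
    for rate in layer_kbps[:terminal.max_decodable_layers]:
        if cumulative + rate > budget:
            break
        cumulative += rate
        count += 1
    return max(count, 1)


# Illustrative (made-up) bit rates: base layer plus two enhanced layers.
LAYERS = [400.0, 300.0, 500.0]
phone = TerminalReport(max_decodable_layers=3)
for name, kbps in [("LTE", 2000.0), ("HSPA cell edge", 900.0), ("WiMAX", 300.0)]:
    kept = select_layer_count(LAYERS, NetworkReport(kbps), phone)
    print(f"{name}: forward {kept} of {len(LAYERS)} layers")
```

In a chain of adaptation engines, each ADM would run such a rule for the network that follows it; in the VOD case described above, the most restrictive result would then be passed towards the server so that the layers are dropped as early as possible.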

4 System evaluation and results

Due to the complexity of the platform and the great heterogeneity of the target access network technologies and topologies (P2P), validation and testing has not been a trivial task. A number of networks and network components had to be set up and integrated before the proposed functionality could be validated. The resulting test platform consists of three islands: a) an emulated System Architecture Evolution (SAE) testbed, b) a P2P testbed, based on the PlanetLab network [6], and c) a 3G+/4G infrastructure testbed, which consists of a test Node B, providing UMTS (R99) and HSDPA access, and a public WiFi network. In this paper we concentrate on the emulated testbed and the 3G+/4G testbed.

Fig. 5 Adaptation engine architecture

In the scenario that we describe (Fig. 6), we assume that a high quality SVC encoded video (base layer + at least 2 enhanced layers) or a 3D video (e.g. MVC with left and right views) is streamed (either from a professional server or from a prosumer laptop) to a mobile terminal, and adaptation is supported both by the terminals and the network. We assume that a user is located in an LTE cell and receives an SVC or an MVC encoded video. Then, the user moves to an HSPA cell. Initially the connection is good. However, as the terminal roams towards the edges of the cell, the network cannot support this video quality and the stream has to be adapted (by dropping the enhanced layers or the different views). As a result, the user still receives the video with a decent quality, taking into account the reduced bandwidth. Then the user moves to the WiMAX mobile network, which is even more crowded. The network cannot support the provided video quality, so further adaptation/quality degradation takes place, but the video continues to run smoothly.

Fig. 6 The SVC streaming, multi-RAT, terminal assisted adaptation

In order to validate the adaptation results, some video sequences have been recorded and then evaluated by means of objective measurements and formal subjective assessment by means of “expert viewing” tests. The results of these tests provide an index of “efficiency” of the system under evaluation.

4.1 Objective tests

In general, objective metrics for measuring digital video performance are required for specification of system performance requirements, comparison of competing service offerings, service level agreements, network maintenance, and optimization of the use of limited network resources such as transmission bandwidth. To be accurate, digital video quality measurements must be based on the perceived quality of the actual video being received by the final users, rather than the measured quality of traditional video test signals (e.g. colour bars). This is because the performance of digital video systems is variable and depends upon the dynamic characteristics of both the original video (e.g., spatial detail, motion) and the digital transmission system (e.g., bit rate, error rate).

4.1.1 SVC streaming

In order to visualise the lost and received frames, the bandwidth and the highest decoded layer, VLC [7] was modified so that the received frames are shown as red bars (green bars appear periodically, showing IDR frames, which are random access points of the video sequence), while the size of the bars shows the number of enhanced layers that are received. Figure 7a) shows SVC video with base layer and enhanced layers 1 and 2, streamed without adaptation. The quality is high, but there are frames missing (frozen video) and this lowers the PQoS. In the case of Fig. 7b), adaptation takes place and enhanced layer 2 is dropped when the network is not able to support the required bandwidth. Thus, SVC is streamed with base layer and enhanced layer 1 only. In this case, the quality is somewhat lower, but the required bandwidth is also lower and there are no frames missing. As a result the PQoS is comparably higher. Figures 8 and 9 show the throughput when roaming between different networks with and without adaptation and the frame delay in both cases.

Fig. 7 Mariposa SVC Test Sequence a) without adaptation, b) with adaptation

Fig. 8 SVC Streaming - throughput a) without adaptation, b) with adaptation

Fig. 9 SVC Streaming - Frame delay a) without adaptation, b) with adaptation

4.1.2 MVC streaming

As with SVC streaming, in order to visualise the lost and received frames, VLC was also modified for MVC decoding. When no frames are received (frozen stream), the red bars disappear. Figure 10 shows the case of (a) streaming without adaptation (frames are missing) and (b) streaming with adaptation. In case (a), the MVC is streamed as 3D (both left and right view). The quality is better and the 3D effect is supported, but there are frames missing (frozen video) and this lowers the PQoS. In case of adaptation (b), when the network is not able to support the required bandwidth, one view is dropped and the video switches from 3D to 2D. As every second frame is associated with a view, alternating red and white bars in Fig. 10b) indicate that one of the views is not decoded. In this case, the quality is somewhat lower, but the amount of video stream data received is also lower and there are no missing frames. So, the PQoS is comparably higher.

Fig. 10 Grasshopper 3D/MVC Test Sequence a) without adaptation, b) with adaptation

In order to evaluate the continuity of the streaming service, we have measured the service continuity (received frames as a function of time) or, equivalently, the lost frames (frozen video). Figure 11 shows MVC/3D streaming without (a) and with adaptation (b). When adaptation is adopted, the number of lost frames (represented as black lines in Fig. 11) is much lower, and thus the continuity index is improved. As a result the PQoS is significantly improved.

4.2 Subjective tests

Subjective tests were based on ITU-R BT.500, adapted to the project needs. ITU-R BT.500 describes the test conditions and the test setup for subjective visual tests. To obtain meaningful and consistent results, at least 20 persons with different expertise in video quality, ranging from experts to inexperienced persons, rated the different video clips. All tests used the Double Stimulus Continuous Quality Scale (DSCQS) test method, which is described in ITU-R BT.500-11 [8]. The core of the DSCQS test method is the display order of the encoded and original sequences and the scale which is used to express the quality of a certain video sequence. In the DSCQS test method each test cell consists of a coded video and its uncoded (original) version. The order of coded and original video is random. One pair of coded and original video sequences is repeated once before the subjects are asked to rate the quality of the two video sequences. In our case, instead of the original and the encoded video, the video without adaptation (demonstrating a number of lost frames and frozen video) and the video with adaptation (changing the number of SVC layers, switching between 3D and 2D, using MDC or not, and smoother flow) were played back. For rating, a continuous scale ranging from 0 (low quality) to 100 (high quality) was used. The participants were asked to rate both video sequences (the not-adapted and the adapted one) in terms of the overall quality of each clip. Note that the participants were not informed that one of the video sequences was the not-adapted one and the other the adapted one, but were just asked to rate the quality of both test clips.

Fig. 11 Grasshopper MVC/3D Test Sequence a) without adaptation, b) with adaptation (panels “Frozen Frames Without Adaptation” and “Frozen Frames With Adaptation”; horizontal axis: Frame #)
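The service continuity measurement behind Fig. 11 can be illustrated with a short sketch. The paper does not give a formula for the continuity index, so the definition used here (fraction of frames received) is an assumption, as are the function names and the per-frame boolean trace format.

```python
from typing import List, Tuple


def frozen_runs(received: List[bool]) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of consecutive lost frames."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, ok in enumerate(received):
        if not ok and start is None:
            start = i                      # a frozen period begins
        elif ok and start is not None:
            runs.append((start, i - start))  # the frozen period ends
            start = None
    if start is not None:                  # trace ends while still frozen
        runs.append((start, len(received) - start))
    return runs


def continuity_index(received: List[bool]) -> float:
    """Fraction of frames received (assumed definition): 1.0 means no freeze."""
    if not received:
        raise ValueError("empty frame trace")
    return sum(received) / len(received)


# Hypothetical trace: True = frame received, False = frame lost.
trace = [True, True, False, False, True, False]
print(frozen_runs(trace))
print(continuity_index(trace))
```

Each run returned by `frozen_runs` corresponds to one black stretch in a plot such as Fig. 11, so comparing the adapted and not-adapted traces with these two functions gives a numeric counterpart to the visual comparison.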

Before the actual tests started, each participant went through a short training phase to become familiar with the testing procedure. To adjust the range from a badly encoded video to a perfect reconstruction, the first three pairs of each test were chosen to reflect the whole range of possible quality during this test. The votes for these first three pairs of videos were later ignored. They were just used to allow each subject to set their personal range from “bad” to “perfect” (Figs. 12 and 13). The vote for each coded sequence was then calculated as:

\[ Q_{Loss} = Q_{original} - Q_{received} \]

where $Q_{Loss}$ is the quality of experience that is lost, $Q_{original}$ is the quality of the original (streamed) video when no losses occur and $Q_{received}$ is the quality of the received and recorded video subject to transmission losses. As the DSCQS method allows the coded video to be rated better than the original (as may happen in cases where the coded video is visually transparent), the received video might likewise in some cases be rated better than the original one. So, $Q_{Loss}$ was clipped in order to be zero or positive and was then mapped to a standard Opinion Score (OS) by calculating:

\[ OS = 5 - Q_{Loss}/20 \]

All data was then statistically processed to obtain the Mean Opinion Score (MOS) by averaging the votes of all participants. In addition, the Standard Deviation and the 95% Confidence Interval (CI) were computed. It is assumed that a lack of overlap between the 95% CIs provides a strong indication of the existence of differences (from the statistical point of view) between adjacent MOS values.

Fig. 12 DSCQS voting scale

Fig. 13 MOS scale with typical interpretation

In Fig. 14, we show streaming of two test sequences (Mariposa, an SVC test sequence, and Grasshopper, a 3D/MVC test sequence) with and without adaptation. In Steps 1.x the sequences are streamed from a professional server, while in Steps 2.x, user generated content is simulated by streaming the sequences over an ADSL line. Initially, the video receiver is located in an LTE cell (Step x.1), then moves to an HSPA cell (Step x.2) and then to the WiMAX network (Step x.3). As can be seen, the adapted sequence starts with a small reduction (due to the initially higher set-up delay). However, as the scenarios evolve, the average opinion is comparably improved. For example, on the Mariposa sequence, the adaptation resulted in an average reduction of 0.51 points, while the not-adapted soccer sequence resulted in an average reduction of 1.37 points. On the Grasshopper sequence the adaptation resulted in a reduction of 0.4 points, while the not-adapted sequence resulted in an average reduction of 0.82 points.

Fig. 14 Mariposa and Grasshopper test sequences (not adapted vs. adapted); MOS (1 to 5) per step (Steps 1.1 to 2.3)

The tests were repeated with sequences that had different context. The movement in the test sequence “harbour” is very slow, while in the test sequence “soccer” the objects move much faster (Fig. 15). It was shown that when there is a lot of action in the sequence, PQoS is significantly increased by video adaptation and sequence continuation, even if the video quality drops.
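The scoring procedure of Section 4.2 (quality loss on the 0 to 100 DSCQS scale, clipping at zero, mapping to the 5-point opinion scale, then averaging over participants) can be written down as a small sketch. The function names are hypothetical, and the 95% confidence interval uses the normal approximation 1.96·s/√N, which is an assumption here because the paper does not state how the interval was computed.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def opinion_score(q_original: float, q_received: float) -> float:
    """Map a pair of DSCQS ratings (0..100) to an opinion score.

    Q_Loss = Q_original - Q_received is clipped at zero, so a received clip
    rated above the original does not push the score beyond 5.
    """
    q_loss = max(0.0, q_original - q_received)
    return 5.0 - q_loss / 20.0


def mos_with_ci(scores: List[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Return (MOS, sample standard deviation, half-width of the 95% CI).

    Needs at least two scores; z = 1.96 assumes a normal approximation.
    """
    m = mean(scores)
    s = stdev(scores)
    half_width = z * s / math.sqrt(len(scores))
    return m, s, half_width


# Hypothetical votes of three participants: (original, received) ratings.
votes = [(80.0, 60.0), (70.0, 75.0), (90.0, 50.0)]
scores = [opinion_score(o, r) for o, r in votes]
mos, sd, ci = mos_with_ci(scores)
print(f"MOS = {mos:.2f}, SD = {sd:.2f}, 95% CI = +/- {ci:.2f}")
```

Two conditions would then be considered statistically different when the intervals `mos - ci .. mos + ci` of the adapted and not-adapted clips do not overlap.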

Fig. 15 Harbour and Soccer test sequences (not adapted vs. adapted); MOS (1 to 5) per step (Steps 1.1 to 2.3)

5 Conclusions

In wireless channels, packets may be delayed (due to queuing, propagation, transmission and processing delays), lost, or discarded due to complexity/resource limitations or display capabilities of the receiver. The experienced packet losses can be up to 10% or more, and the time allocated to the various users and the resulting goodput for multimedia bit stream transmission can also vary significantly over time. However, unlike file transfers, real-time multimedia applications do not require a complete absence of packet losses, but rather cooperation between the application layer and the lower layers to select the optimal wireless transmission strategy that maximizes multimedia performance. To achieve a high level of acceptability and proliferation of wireless multimedia, in particular wireless video, several key requirements need to be satisfied by multimedia streaming solutions over such channels. Within this paper we have proposed a new network architecture and an adaptation method, and have evaluated the results over various networks. The results may be summarized as follows:

• The PQoS is reduced very rapidly if the video stream is frozen (frames are lost). On the contrary, actions that keep the video streaming are positively considered, even if this results in lower video quality. However, the video content is also a parameter (e.g. the reactions are quite different in case of sports/football games or artistic films).
• Dropping the higher enhanced layers of an SVC stream or dropping the left or the right view of a 3D/MVC stream may avoid frozen video in case of bad network conditions. This is due to lower bandwidth requirements. Moreover, this will result in lower CPU load and thus longer battery life. PSNR and VQM will be better. On the other hand, the memory is not really affected, but the same buffer size is used for increasing the video resilience. The PQoS in this case is increased.
• The benefit in PQoS of adapting a video stream increases with the motion and the video context. In case the video has action (e.g. soccer), video continuity is very important and the users really appreciate avoiding frozen frames.

In addition to the proposed architecture and the adaptation mechanisms, we are currently continuing research on ways and methods by which content awareness and in-network cached content can further enrich the PQoS [9].

References

1. FP7 ICT SEA (“SEAmless Content Delivery”) STREP project partially funded by the EC, under contract ICT-214063, www.ist-sea.eu
2. Zahariadis T, Negru O, Álvarez F (14–16 April 2008) Scalable Content Delivery Over P2P Convergent Networks. IEEE International Symposium of Consumer Electronics (ISCE 2008), Portugal
3. ISO, “ISO/IEC TR 21000-1:2001 - Information technology – Multimedia framework (MPEG-21) – Part 1: Vision, Technologies and Strategy”. Retrieved 2009-10-31
4. Handley M, Jacobson V (April 1998) SDP: Session Description Protocol. RFC 2327
5. Schulzrinne H, Rao A, Lanphier R (April 1998) Real Time Streaming Protocol (RTSP). RFC 2326
6. http://www.planet-lab.org/
7. www.videolan.org
8. Rec. ITU-R BT.500-11 (2002) Methodology for the subjective assessment of the quality of television pictures
9. FP7 ICT COAST (“COntent Aware Searching, retrieval and sTreaming”) STREP project funded by the EC, under contract ICT-248036, www.coast-fp7.eu