Connected TV and Beyond

Sergios Soursos
Intracom S.A. Telecom Solutions
[email protected]

Nikos Doulamis
National Technical University of Athens
[email protected]

Abstract: Nowadays, a paradigm shift is under way in the world of Digital Broadcast TV. This change, similar to that of the mobile market, promises a future where modern TV sets and set-top boxes will become the merging point of TV and computers. The ‘Connected TV’ will allow users to access content available either from the broadcast channels or the Internet. There are already some independent attempts at realizing this concept as well as standardization efforts that aim at closing the gap between different implementations. However, more focus is placed on the integration of Internet content while the traditional broadcast part is neglected. In this work, we outline the current status and propose a framework for creating a true hybrid solution, where end users can enrich broadcast content with Internet-based enhancements so that they can enjoy an improved and personalized viewing experience. In fact, this approach allows for content composition, by merging the base (broadcast) content with add-on (Internet) content. This will further facilitate the opening of the TV market, the emergence of new business models and the offering of more advanced and personalized services.

I. INTRODUCTION

During the last decade, the Internet world has experienced the bloom of digital (multimedia) content. Large amounts of user-generated content are uploaded every day to various web sites (e.g. YouTube, Flickr), the film and music industries see the Internet as an efficient means to distribute their content, and different versions of the same content are offered in an on-demand or live, free or pay-per-view manner. New means of dissemination and transportation of large content have emerged, e.g. peer-to-peer (P2P) or P2P-assisted distribution. At the same time, people use the Internet to interact, to exchange comments on specific pieces of content as well as to share content and information. Social networking has evolved into an everyday activity that drives the evolution of Internet-based applications, especially since mobile and handheld devices have entered the market.

On the other hand, the digital era has transformed the broadcast TV industry only to some extent. The digitization of the TV signal and the introduction of IPTV platforms have offered a variety of better-quality channels and High-Definition (HD) video content. However, broadcast TV still remains a flat service and lacks personalization capabilities. There exist a number of commercial solutions that merge the two worlds (Internet and TV) by offering TV sets that can access both (e.g. Apple TV, Google TV). This new trend, like the transformation of the mobile world a few years ago with the rise of smartphones and successful mobile applications, is called Internet-connected TV or simply Connected TV.

Nevertheless, apart from the creation of smart TV devices able to consume Internet-based content, the two worlds remain disconnected: there is no correlation between the broadcast TV program and the Internet-based content that may be consumed at the same time. At most, the two pieces of content may be indirectly related because they refer to the same event or information. 2011 is supposed to be the “year of the cable cut”, where TV sets become independent of the broadcast provider and can fetch content directly from the Internet. It becomes obvious that there is strong competition between traditional and over-the-top (OTT) providers, which does not necessarily promote the evolution of TV. What is missing from the current TV ecosystem is a personalized viewing experience, in which end users compose infotainment services by combining content from the Internet and the broadcast worlds and consume them as a single, composite piece of content.

In this paper, we propose a framework for offering personalized TV services, the Enhanced Connected TV, where Internet-based content enhancements come to complement existing broadcast content. The remainder of this paper is organized as follows: in Sec. II we briefly present the status of the TV ecosystem and its future directions, and in Sec. III we describe our vision of the future of TV through the merging of the broadcast and Internet worlds. We continue in Sec. IV by describing in more detail the architecture of our proposal, highlighting the underlying technologies, at the media and network level, that will further enable the TV evolution. A summary is included in Sec. V.

II. CURRENT STATUS

Connected TV, also referred to as Smart TV, describes the current trend of integrating Internet features into modern TV sets and set-top boxes (STBs). Moreover, it involves the technological convergence between computers and TV sets/STBs, in the same way that smartphones realized the convergence of mobile devices and computers.
In this new generation of TV, focus has been placed on Internet TV, over-the-top content and on-demand streaming media, while less focus has been put on the traditional broadcast features. In this context, there are several initiatives that try to standardize certain aspects of the Connected TV specification.

One of the first established forums related to TV activities is the TV-Anytime Forum [1], formed in 1999. Its goal is “to develop specifications to enable audio-visual and other services based on mass-market high volume digital storage in consumer platforms.” The ultimate objective is to specify the functionality of a Personal Digital Recorder (PDR) which will allow storing large quantities of multimedia content for personal use. As a next step, the Network Digital Recorder (NDR) will facilitate the transfer of content between local storage devices, using home networks, and enable users to “micro-navigate” around content.

The Open IPTV Forum (OIPF) [2] is an industrial initiative that was founded in 2007 with the purpose of producing a fully scalable and interoperable IPTV system and increasing the Quality-of-Experience by blending added-value services and TV services from the Open Internet and Managed Networks. To achieve this, OIPF will specify a scalable architecture, the required interfaces, the functionality of the network and terminal elements, and a common User-Network Interface (UNI) for accessing Managed Networks and the Open Internet, and will describe interactive and personalized services and technology choices for all major functionality.

Hybrid Broadcast Broadband TV (HbbTV) [3] is a European industry initiative aiming at “harmonizing the broadcast and broadband delivery of news, information and entertainment to the end consumer through TVs and set-top boxes with an optional web connection.” In 2010, HbbTV became a standard. The specification document defines “a platform for signalling, transport, and presentation of enhanced and interactive applications designed for running on hybrid terminals that include both a DVB compliant broadcast connection and a broadband connection to the internet.” Following other standards, HbbTV describes the entire framework for building applications for TV systems, how linear¹ and non-linear² A/V content can be embedded into such applications and how broadband content can be related to specific broadcast TV content.

Moreover, there exist a number of Internet-based OTT providers that offer on-demand video streams (mostly movies) with advanced playback features.
Netflix offers Internet video streaming of selected titles and maintains a personalized video-recommendation system based on ratings and reviews by its customers. Hulu is an OTT subscription service offering ad-supported on-demand streaming video of TV shows, movies, webisodes and other new media, trailers, clips, and behind-the-scenes footage from various networks and studios. Vudu is a media technology company responsible for the development of interactive media services and devices that are used to distribute full-length movies over the Internet to television, using a hybrid peer-to-peer TV technology. YouView is a proposed open, Internet-connected television platform, allowing consumers to access a range of services (television channels, radio stations, on-demand services and Internet content) using a compliant device built to a common technical standard and through a broadband Internet connection.

There are also many STB manufacturers that have adopted certain Connected TV concepts and have included them in their commercial products. Among them, Apple TV and Google TV are the most popular ones. Apple TV is a digital media receiver designed to play IPTV digital content originating from the iTunes Store, Netflix, YouTube, Flickr, MobileMe or any computer running iTunes onto an enhanced-definition or high-definition widescreen television. Apple TV allows consumers to use an HDTV set to consume content that originates from limited Internet services or a local network and supports downloading/streaming podcasts. Google TV is a Smart TV platform from Google which integrates Google’s Android operating system and the Google Chrome browser to create an interactive television overlay on top of existing Internet television and WebTV sites, adding a 10-foot user interface. Consumers can access HBO, CNBC, and content from other providers. Partners have built applications that allow customers to access content in unique ways.

Regarding the TV set manufacturers, some of them have developed their own connected TV platforms, like Samsung Smart TV, LG NetCast, Panasonic Viera and Philips Net TV, while others have supported existing initiatives, like Sony Internet TV which runs Google TV.

¹ Broadcast A/V content intended to be viewed in real time by the end user.
² A/V content that does not have to be consumed linearly from beginning to end.

III. THE VISION OF ENHANCED CONNECTED TV

It has become obvious that the future of TV lies in the integration of the Broadcast and Internet worlds and in the deployment of consuming devices which can handle content from both environments. Important steps in this direction are being made by the numerous TV and STB manufacturers. Also, well-coordinated standardization activities are being carried out by the aforementioned initiatives and forums. However, the focus of the Connected TV approach is on devices and interfaces and not on the actual content. Current trends allow for embedding of (non-)linear A/V content into custom applications, or vice-versa, and enable devices to connect to both worlds. An important omission remains, though: the inability of the end user to combine content from different sources.
Such a capability would allow end users to personalize their viewing experience by combining the base broadcast stream with third-party (compatible) add-on stream(s) (e.g. a different subtitle stream or a different audio track stream). This, in turn, would directly relate the Internet-based content to certain broadcast streams, as opposed to the indirect relation of content and web-based applications realized by the current approaches, where there is no actual requirement for one piece of content to be viewed while the other is being consumed.

Our vision is to further progress the TV set from an all-in-one device connected to both broadcast and Internet, to an advanced device that can dynamically combine content from both worlds, empowered by the advances in media and network technologies. We call “enhancement” the Internet-based content that comes to complement the existing broadcast content and implement added-value services. The enhancement is merged with the broadcast content at the stream level and can take the form of an overlay stream or of audio/visual filters acting on the base stream.

The benefits of this approach are clear: it is a fully hybrid solution, as opposed to HbbTV, which only relates applications with broadcast content and embeds the latter into the former. The resulting architecture can follow the existing standards and allow for true personalization and customization of media streams. It also realizes the true convergence and inter-operation of the media provisioning platforms and systems and renders the management of (network) resources more efficient: there is no longer a need to transfer the full quality/information stream to all customers; instead, the base stream is broadcast and personalized content is requested dynamically through alternative networks.

A. Application Scenarios

The traditional “pull” model of added-value services is also adopted in the Connected TV environment. However, the ability to relate and combine content from different sources permits a more innovative and direct approach: the “push” model. According to this model, once the end user starts watching a TV program, the system can recommend (in the form of notifications) available enhancements that are related to the current content, considering also the user’s preferences, the network and device capabilities and charging options. The user can in turn select any of the recommendations and activate them. The recommended enhancements can take several forms and serve different goals.

One popular type of enhancement is embedded advertising. Current advertisements are shown in breaks of the broadcast program. One step forward is to consider the preferences of the viewer and location information. Embedded advertising goes a step further: advertisements are related to the broadcast content. When the viewer selects a car that appears in a movie scene, a small notification window can pop up and provide information about the model with a link to the full advertisement.
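The selection step in this advertising scenario can be illustrated with a minimal sketch. It assumes that the enhancement carries, for each advertised object, a visibility interval and a bounding box in normalized screen coordinates; the `TrackedObject` schema, the `find_selected_object` helper and the example URLs are hypothetical and are not taken from any of the standards discussed here. A real enhancement would need a box that follows the object from frame to frame rather than one static box per interval.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TrackedObject:
    """A scene object with an attached advertisement (hypothetical schema)."""
    label: str
    start_s: float   # time the object enters the scene, in seconds
    end_s: float     # time the object leaves the scene, in seconds
    box: Tuple[float, float, float, float]  # (x, y, width, height), normalized to 0..1
    ad_url: str      # link to the full advertisement


def find_selected_object(objects: List[TrackedObject],
                         t: float, x: float, y: float) -> Optional[TrackedObject]:
    """Return the first object visible at time t whose box contains (x, y)."""
    for obj in objects:
        bx, by, bw, bh = obj.box
        visible = obj.start_s <= t < obj.end_s
        inside = bx <= x <= bx + bw and by <= y <= by + bh
        if visible and inside:
            return obj
    return None


scene = [
    TrackedObject("car", 12.0, 18.5, (0.10, 0.40, 0.30, 0.25), "https://ads.example/car"),
    TrackedObject("watch", 15.0, 16.0, (0.70, 0.20, 0.05, 0.05), "https://ads.example/watch"),
]

# The viewer points at (0.25, 0.50) fourteen seconds into the scene.
hit = find_selected_object(scene, t=14.0, x=0.25, y=0.50)
print(hit.label if hit else "no advertisement here")
```

The notification window would then be populated from the returned object, for instance from its `ad_url` field.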
Of course, user preferences and locality information can still be used when deciding which objects in the scene should be attached to an advertisement and what information the full advertisement should display. Other characteristics of the Connected TV can be combined with this feature as well: non-linear A/V content can be paused while the full advertisement is displayed, and linear A/V content can be buffered for the duration of the advertisement and replayed once the user returns to the viewing mode.

Another type of broadcast enhancement is A/V aids for sight- or hearing-impaired people. Once the TV set is configured accordingly, it automatically fetches the appropriate enhancements from the Internet in order to improve the viewing experience of impaired viewers. Another type of A/V enhancement is HD, or even beyond-HD, broadcast enrichment, as well as 3D and immersive enhancements. Depending on the capabilities of their displaying/decoding device, users can enrich or adapt their viewing experience.

Other added-value offerings in the form of Internet-based broadcast enhancements may include multi-language subtitles and/or audio tracks for broadcast movies as well as lyrics and a “karaoke” mode for music video clips. Moreover, overlay-style information can be used in many contexts, like football matches (e.g. info on the players’ statistics), movies (e.g. info about an actor’s filmography) and documentaries (e.g. Wikipedia-like info about buildings and monuments, in combination with embedded geo-location data and map services).

B. Usage and Business Models

The market of Connected TV offerings is already following the successful paradigm of the application store for smartphones: the user can browse from the TV set all the available services that are compatible with the specific manufacturer, based on the business relationships with third-party (OTT) providers. In this context, we envision a market/store where, apart from selecting applications or services, the end user will also be able to select (types of) enhancements, i.e. an “enhancements store”. Note that the “push” model is still in place, and the enhancements will actually be displayed as soon as the system detects that the current TV program supports one (or more) of the chosen enhancements.

In such an environment, third-party companies offering broadcast enhancements can enter the market. At the same time, OTT providers can switch from competing with traditional broadcast TV providers to offering complementary services as well, i.e. broadcast enhancements through the broadband channel. Any business entity that provides broadcast enhancements needs to reach an agreement with the content owner/distributor so as to be able to create and, most importantly, synchronize the enhancement streams. The benefit for the broadcasters and the content owners/distributors is that they allow the enrichment of their content, which will in turn render it more popular and hence increase their customer base and profits.
For those ISPs that adopt the content-aware networking paradigm, the benefit can be two-fold: i) by taking advantage of the in-network mechanisms, they will be able to advertise themselves as “content-friendly”, allowing for in-time delivery of content and thus attracting more customers, and ii) they will also be able to participate, as third-party enhancement providers, in the aforementioned business chain.

IV. EMERGING ARCHITECTURE AND TECHNOLOGIES

Enhanced Connected TV focuses on the inherent synchronization of broadcast and Internet flows and their merging into a single media stream. It consists of three main planes (see Fig. 1): the Media Plane, which refers to the Internet and Broadcast flows originating from the content providers; the Synchronization and Adaptation Plane, whose purpose is to compose the two heterogeneous media streams and which is further divided into the Media Synchronization and Network Adaptation sub-planes; and the Composition Plane, which comprises the home platform interfaces able to decode and synchronize the two media flows.

Fig. 1. The Concept of Enhanced Connected TV

A. Media Plane

Two types of media streams are considered: the base layer and the enhancement layer streams. Broadcast, or base layer, streams refer to the media content transmitted over the broadcast delivery channel. Broadcast media are encoded and transmitted using the DVB framework, e.g. DVB-T or DVB-T2. The broadcast content is enriched with a new or extended media container and transport stream format, able to describe the required synchronization marks. For the description language, existing standards such as MPEG-7 [4] and/or MPEG-21 [5] can be used.

Internet, or enhancement layer, streams are generated by either the broadcaster or third-party companies and published over the IP network. Internet media are encoded using different specifications than the broadcast streams. Popular encodings that can serve this purpose are Scalable Video Coding (SVC) [6] and Multiple Description Coding (MDC) [7]. Each Internet flow is associated with a synchronization scheme that describes the encoding specifications and the way of synchronizing this stream with the broadcast base layer.

B. Synchronization and Adaptation Plane

Synchronization is accomplished in two main aspects: media synchronization and network adaptation. Media synchronization enables the inherent consumption of broadcast content and IP services at the content level. New media container formats must be introduced that allow for the inherent composition of the broadcast and the Internet media parts. Spatio-temporal video analysis algorithms must be investigated for estimating synchronization marks in broadcast streams, for example through detection and tracking of objects of interest or categorization of events in video streams. Finally, the enhancement layers will be transmitted using a new media transport stream able to inherently compose broadcast and Internet media flows into one single consumption flow. Media synchronization facilitates the off-line media content composition and synchronization at the terminal.
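As an illustration of how a terminal might use synchronization marks, the following minimal sketch places enhancement cues on the base-stream timeline. It assumes that each mark is exposed as an identifier with a presentation time in seconds and that each cue is timed relative to one mark; the paper does not define a concrete format, so `EnhancementCue`, `schedule_cues` and the mark identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EnhancementCue:
    """One unit of enhancement content, timed relative to a sync mark (hypothetical schema)."""
    mark_id: str       # identifier of a synchronization mark in the base stream
    offset_s: float    # seconds after the mark at which the cue starts
    duration_s: float  # how long the cue stays active, in seconds
    payload: str       # e.g. a subtitle line or a reference to an overlay


def schedule_cues(marks: Dict[str, float],
                  cues: List[EnhancementCue]) -> List[Tuple[float, float, str]]:
    """Map cues onto the base-stream timeline as (start, end, payload), sorted by start.

    Cues that reference a mark missing from the base stream are skipped,
    because they cannot be synchronized.
    """
    timeline = []
    for cue in cues:
        if cue.mark_id not in marks:
            continue
        start = marks[cue.mark_id] + cue.offset_s
        timeline.append((start, start + cue.duration_s, cue.payload))
    return sorted(timeline)


# Sync marks read from the broadcast stream: mark id -> presentation time (s).
base_marks = {"scene-1": 0.0, "scene-2": 95.0}

subtitle_cues = [
    EnhancementCue("scene-2", 1.5, 2.0, "Hello again."),
    EnhancementCue("scene-1", 3.0, 2.5, "Good evening."),
    EnhancementCue("scene-9", 0.0, 1.0, "Unreachable cue."),
]

for start, end, text in schedule_cues(base_marks, subtitle_cues):
    print(f"{start:7.2f} -> {end:7.2f}  {text}")
```

Timing cues relative to marks rather than to absolute stream time is one possible design choice: it keeps a third-party enhancement valid when the broadcaster shifts or re-edits the base content, as long as the marks themselves are preserved.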
However, to maintain the integrity and quality of the composite media stream and to provide personalized media services with high Quality of Experience (QoE), network synchronization strategies must be incorporated for the Internet flows (enhancement layers), so that they are received just in time with the broadcast streams at the receiving end. Due to the high diversity between Internet content and broadcast delivery platforms, network adaptation requires new co-operative research strategies between the underlying networks and scalable media descriptors. A content-aware network architecture [8], [9] must be established to allow for easy and fast publication and discovery of content items/services. This architecture must also guarantee just-in-time reception of the Internet-based media flows to be synchronized with the broadcast streams. For this reason, dynamic adaptive streaming algorithms are required, which incorporate information from both the media plane (e.g., SVC) and the network plane, through monitoring of the network status and extraction of media characteristics from the embedded content description. Additionally, the delivery requirements of the end users, concerning the quality and resolution of the received media streams, must be analysed, and the requirements of the enhancement layers must be propagated to the network so that network resources are adjusted to satisfy, as far as possible, the end-user constraints. Finally, the network conditions should adjust the way media content is encoded and processed, assuming scalable multiple description schemes.

The aforementioned co-operative media-network adaptation requires the acquisition of the contextual delivery conditions in the sense of terminal capabilities, network status, media characteristics and users’ preferences. Therefore, network awareness mechanisms should be incorporated along with content/resource awareness tools.

C. Composition Plane

The composition plane is used for synthesizing the two heterogeneous media flows. It incorporates tools for decoding the two heterogeneous streams in a framework in which the different media parts are synchronized at both the spatio-temporal and the resolution level. The basic interactive mechanisms as well as the personalization schemes are included here, enabling users to create their own personalized media experience.
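A minimal sketch of composition at the temporal and resolution level is given below. It assumes that an overlay enhancement is authored against a reference resolution and carries its own active time window; the `Overlay` schema and the helper functions are hypothetical, and actual rendering onto decoded frames is left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels
Size = Tuple[int, int]            # (width, height) in pixels


@dataclass(frozen=True)
class Overlay:
    """An overlay enhancement authored for a reference resolution (hypothetical schema)."""
    start_s: float
    end_s: float
    ref_size: Size  # resolution the overlay was authored for
    rect: Rect      # placement in reference pixels
    text: str


def scale_rect(rect: Rect, ref_size: Size, target_size: Size) -> Rect:
    """Rescale a rectangle from the reference resolution to the target resolution."""
    sx = target_size[0] / ref_size[0]
    sy = target_size[1] / ref_size[1]
    x, y, w, h = rect
    return (round(x * sx), round(y * sy), round(w * sx), round(h * sy))


def overlays_for_frame(overlays: List[Overlay], pts_s: float,
                       frame_size: Size) -> List[Tuple[Rect, str]]:
    """Return (rect, text) for every overlay active at pts_s, scaled to frame_size."""
    return [
        (scale_rect(o.rect, o.ref_size, frame_size), o.text)
        for o in overlays
        if o.start_s <= pts_s < o.end_s
    ]


# An overlay authored for 1280x720 is composed onto a 1920x1080 base frame.
stats = Overlay(10.0, 20.0, (1280, 720), (64, 600, 400, 80), "Player statistics")
print(overlays_for_frame([stats], pts_s=12.0, frame_size=(1920, 1080)))
```

Here the base frame's presentation time selects which overlays are active, and the ratio between the frame size and the authoring resolution determines where they are drawn.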
It is expected that a single terminal device will incorporate the entire functionality of the composition plane.

D. Overall Architecture

Figure 2 depicts the envisioned architecture, which consists of three main entities: the Home Gateway/Set-Top Box, located at the user’s premises, the Content-Aware Network Node, residing in the operators’ premises, and the Content Server, located at the content provider’s premises. Each entity comprises certain functionality that is depicted in the form of a stack and follows the same color coding as Fig. 1, referring to the network adaptation (purple), media synchronization (red) and media composition (grey) planes.

Fig. 2. Overview of the Architecture

The Content Server (CS) resides in the content provider premises, i.e., those of the broadcaster and of the provider of enhancement layers. It is responsible for providing the metadata referring to the transmitted content and the media format in use. Moreover, it is where the spatio-temporal marking of content takes place, which will later allow for the synthesis of the media streams. If the broadcast media (base layer) is a live feed, then the synchronization marks (for time and object tracking) are placed by the broadcaster. Otherwise, the process of placing synchronization marks can take place offline and be performed by the broadcaster, the enhancements provider or even the content creator. The media is then encoded using an extension of the MPEG standards and is transmitted over the broadcast channel using the DVB framework. For the enhancement layer, either the broadcaster or the enhancements provider will provide the respective media along with a description of how to associate the timing and placement of the enhancement with the synchronization marks already placed in the base layer. The CS is also responsible for the publication of the enhancement layer to the content-aware Internet infrastructure. Finally, being the source of the enhancement layer, the CS will initiate the delivery of the requested layer(s).

The Home Gateway/Set-Top Box (HG/STB) resides at the end user’s premises and is responsible for the personalization of the broadcast content according to the user’s needs. The HG/STB performs automatic content metadata extraction once a media stream is received from the broadcast network. It is also responsible for initiating the discovery mechanism so as to find the related enhancement layers and, moreover, for requesting the delivery of a specific enhancement layer, chosen either by the user or by a personalization agent. Finally, the HG/STB is responsible for synchronizing the two streams and creating one composite stream. Thus, the HG/STB must be able to decode the streams, read the synchronization marks, and synchronize and repackage the streams, considering the (different) media formats and the respective metadata.

The Content-Aware Network Node (CANN) interconnects with other CANNs belonging to the same or different ISPs and together they form a content-aware overlay. This overlay incorporates a variety of functionality.
First of all, it deals with the publication and discovery of content metadata, allowing for base and enhancement layers to be associated and discovered. Moreover, a context acquisition mechanism is included. In the proposed framework, context has four dimensions: i) the network context, which describes the current network status, the network policies and the operator’s preferences, ii) the terminal device context, which describes the capabilities and limitations of the terminals, iii) the media context, which refers to the type as well as the encoding parameters and the QoS requirements of the media streams, and iv) the user context, in the sense of users’ preferences. Context acquisition is achieved through edge CANNs, attached to the access networks, which acquire information about the media, the terminals and the users, and through core CANNs, which mainly collect information about the network. Finally, the CANNs are responsible for ensuring the just-in-time delivery of the enhancement layers to the HG/STBs. To do so, adaptive delivery mechanisms, based on QoS and in-network caching concepts and considering the contextual information, will be implemented by the CANNs. Hybrid media-network adaptation mechanisms that address the quite different requirements of the two content layers may also be included here.

V. SUMMARY

In this paper, we have proposed a new approach for a hybrid Broadcast-Internet TV framework. Current trends for Connected TV involve technologies for allowing TV sets to consume multimedia content from both the Broadcast and Internet worlds and to embed A/V content in applications. However, the true merging of Broadcast and Internet should include the ability to combine content from different sources (broadcast or Internet-based). Our approach, the Enhanced Connected TV, can be considered as an extension of existing standardization attempts (e.g. HbbTV). The main objective is to allow for composition and synchronization of media streams. A base (broadcast) layer combined with Internet-based enhancement layers that use new or extended media containers will revolutionize and transform the TV market, paving the way for new business models leading to a fully open market.
The proposed architecture consists of three planes that incorporate a number of functionalities, and its key architectural entities have been identified.

REFERENCES

[1] TV-Anytime Forum, URL: http://www.tv-anytime.org/
[2] Open IPTV Forum, Open IPTV Forum Whitepaper, January 2009.
[3] ETSI Technical Specification 102 796 v1.1.1, Hybrid Broadcast Broadband TV, June 2010.
[4] B. S. Manjunath, P. Salembier, and T. Sikora, Introduction to MPEG-7: Multimedia Content Description Language, John Wiley and Sons, Ltd, England, 2002.
[5] I. S. Burnett, F. Pereira, R. Van de Walle, and R. Koenen, The MPEG-21 Book, John Wiley and Sons, Ltd, England, 2006.
[6] H. Schwarz, D. Marpe, and T. Wiegand, Overview of the Scalable Video Coding Extension of the H.264/AVC Standard, IEEE Trans. on CSVT, vol. 17, no. 9, pp. 1103-1120, 2007.
[7] V. K. Goyal, Multiple Description Coding: Compression Meets the Network, IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 74-94, Sept. 2001.
[8] T. Koponen, M. Chawla, B-G. Chun, A. Ermolinskiy, K. H. Kim, S. Shenker, and I. Stoica, A Data-oriented (and Beyond) Network Architecture, in Proc. ACM SIGCOMM '07, Kyoto, Japan, Aug. 2007.
[9] V. Jacobson, D. K. Smetters, J. D. Thornton, M. F. Plass, N. H. Briggs, and R. L. Braynard, Networking Named Content, in CoNEXT '09. New York, NY, USA: ACM, 2009, pp. 1-12.