Data Broadcasting and Interactive Television
REGIS J. CRINON, DINKAR BHAT, DAVID CATAPANO, GOMER THOMAS, JAMES T. VAN LOO, AND GUN BANG
Invited Paper
This paper provides an overview of the digital television (DTV) data broadcast service and interactive service technologies that have been deployed over the last ten years. We show how these trials have led to the development of data protocol and software middleware specifications worldwide. Particular attention is given to the series of standards established by the Advanced Television Systems Committee. Experimental deployments to both Personal Computer (PC) and Set-Top-Box (STB) receivers are considered, with an emphasis on the services that have introduced new business models for DTV operators.
Keywords—Datacasting, digital television (DTV) infrastructure, digital television (DTV) middleware, digital television (DTV) services, interactive television.
I. INTRODUCTION
The introduction of digital television (DTV) services has opened up many new vistas. One of these is the ability to include data in a DTV broadcast stream along with the audio and video. This capability can be used to provide an enhanced experience for television viewers (interactive television data broadcasting), and it can be used to deliver data for applications that have no direct connection to television programming (general purpose data broadcasting). This paper deals with both classes of “datacasting” (data broadcasting) applications. An important enabler for interactive television is new functionality in TV receivers, including frame buffers and new capture logic for demultiplexing and parsing digital streams. Because these sophisticated new receiver environments offer more real estate and higher resolution
Manuscript received July 20, 2005; revised October 15, 2005.
R. J. Crinon and J. T. Van Loo are with Microsoft Corporation, Redmond, WA 98052 USA (e-mail: [email protected]; jamesvan@microsoft.com).
D. Bhat, D. Catapano, and G. Thomas are with Triveni Digital, Inc., Princeton Junction, NJ 08550 USA (e-mail: [email protected]; [email protected]; [email protected]).
G. Bang is with the Electronics and Telecommunications Research Institute, Daejeon 305-350, Korea (e-mail: [email protected]).
Digital Object Identifier 10.1109/JPROC.2005.861020
graphics, over-the-air broadcast, satellite, and cable operators have realized that there is an opportunity to supplement their audio/video services with new types of interactive and data enhancement services.
An early form of interactive television came with the advent of the electronic program guide (EPG) application, which allows consumers to navigate through a large set of digital channels. The development of the World Wide Web in the early 1990s and the rapid growth in Internet services gave an additional incentive to DTV operators to look beyond simple EPG applications. To understand the various potential benefits and pitfalls of interactive TV, cable and over-the-air broadcasters launched a series of interactive TV trials starting in the mid-1990s, including nationwide trials by the Public Broadcasting Service (PBS) and a number of its member stations in 2001 and 2002, which were based on the Advanced Television Enhancement Forum (ATVEF) specification. At the same time, general purpose datacasting was being considered by many public TV stations as a means of furthering their public service mission and by some commercial TV stations as a possible source of new revenue. The interest in new data services was further motivated by the need to download software upgrades for the new DTV receiver systems. For example, in the mid-1990s, the Tele-TV service was designed to support a data download capability for delivering operating system patches and enhancements to its set-top-box terminals.
All of these factors led to a wide recognition of the need for standards, and the Advanced Television Systems Committee (ATSC) in particular started working on the A/9x series of standards to address the data delivery aspects of data broadcast and interactive television services; in other words, the fundamental transmission protocols and signaling mechanisms needed for the deployment of any such services. In parallel to this effort came the realization by the industry that standardizing the software run-time environment and application interfaces in DTV receivers is necessary if broad deployment of such services is to happen. The idea here was to provide a minimum set of software interfaces and hardware
resources that service providers can assume to be available in any DTV receiver. In ATSC this standardization effort took the name DTV Application Software Environment (DASE) and was completed in 2003.
With the prospect of standardized software infrastructure in every receiver, consumer electronics companies, information technology companies, and cable operators in the United States got together to solve the problem of how such new DTV receivers could interface with the communication and conditional access protocols used by cable operators. From this collaboration came the unidirectional cable interface based on the concept of a “point of deployment” (POD) module, a Personal Computer Memory Card International Association (PCMCIA) card that translates the cable operator's conditional access protocol to a standardized interface into the DTV receiver based on the Digital Transmission Content Protection (DTCP) specification. CableCARD is the marketing term that has been adopted by the consumer electronics and cable companies for the POD module.
Data broadcasting and interactive television are slowly becoming a reality as, one by one, the standards and technologies fall into place. The next milestone will certainly be a bidirectional cable agreement between consumer electronics and information technology companies and cable operators to allow full interactivity between a DTV receiver and a cable head-end. Once the agreement and the supporting standards are ready, it will only be a matter of time before the broad consumer market gets access to datacasting and interactive services. So, what are the protocol and software technologies that are going to make this all possible? What are the remaining challenges? This is what we propose to discuss in this paper.
The paper is organized around two main topics: general purpose data broadcasting and interactive television. Although these application domains have much in common, they have a number of key differences. General purpose datacasting services are often targeted at enterprises (such as businesses or schools), rather than consumers. Their target receiving devices are typically PCs or portable devices, rather than TV receivers. The data items are typically not delivered in real time and must be cached on the hard disk drive of the receiver before the application consuming the data can start using them. The broadcast data items are often shared among multiple users on an LAN.
II. GENERAL PURPOSE DATA BROADCASTING
A. Business Motivation
In the United States, most of the stations involved in data delivery over digital terrestrial television broadcasts so far have been public broadcasting stations, largely because of differences in mission and funding between public broadcasting stations and commercial broadcasting stations.
1) Public Broadcast Stations: Most public television stations view their public service mission in broad terms, encompassing not only public service programming, but also
provision of other educational and information services to their communities, and even a role in pioneering innovative broadcast technologies. The funding sources for public television stations both reflect and reinforce this view. Most of them get a mix of money from government grants, corporate sponsorships and grants, and individual memberships and gifts. This funding is seldom in the nature of direct payment for service, but is based instead on a general perception that the stations are providing valuable public services. Thus, public television stations can embrace data broadcasting in support of education, emergency management, and other public services without having to show a direct return on the investment. Their enhanced reputation from using innovative new technology to provide new types of services leads indirectly to increased funding. The result is that at the time of this writing, around 50 public television stations have installed a DTV data broadcast system of some kind, with which they are deploying a variety of applications. 2) Commercial Stations: Commercial television stations typically need to show a more direct connection between the services they offer and the payment for them. Their traditional revenue sources are based on advertising targeted to consumers, so it is natural for them to look toward data broadcast services aimed at consumers and supported by advertising, for example data-enhanced TV programming or data-only “TV programs.” However, broadcasters are reluctant to invest in data-enhanced TV programming until enough consumers have data-capable DTV sets, and consumers are reluctant to pay for data-capable DTV sets until there is a significant amount of data-enhanced programming available, a classical “chicken and egg” situation. This is aggravated by the fact that until very recently there was no standard for data-enhanced programming that was widely accepted across both terrestrial and cable environments. Another option for commercial TV broadcasters is leasing bandwidth to third parties for use as a data delivery pipe for applications that may be unrelated to TV broadcasting. It has taken a while for this to develop, since it requires a whole new business orientation for broadcasters, with new sales channels and new marketing expertise. There were several early attempts to start up services based on delivering various kinds of digital files to consumers over DTV signals, but the cost of the necessary custom receivers created a formidable economic barrier, and the attempts were not very successful. More recently the cost of data-capable receivers (and disk storage) has come down, and several efforts are under way now to offer push video on demand (VOD) services, whereby video files are broadcast to data-capable set-top boxes in the hands of individual consumers and stored on disk there. The consumer can then view the videos on demand, either on a pay-per-view or subscription basis. Early indications for the success of such services are promising, although it is too early to tell for sure yet. The bottom line is that so far only a handful of commercial stations are broadcasting data.
B. Applications
Digital TV enables a great many different types of data broadcast applications, which may be classified in a number of ways [1]. Three of the more important classification criteria are coupling, target audience, and type of data (from a technology standpoint).
1) Coupling: One classification basis for applications is the degree to which the application is coupled to the normal TV programming. Tightly coupled data are intended to enhance TV programming in real time. An example of such an application is the display on demand of player, team, league, and game stats during a sports event. The timing of such data is often closely synchronized with the video and audio frames being shown. Loosely coupled data are related to the program but not closely synchronized in time. For example, an educational program might broadcast supplementary self-test quizzes as data along with the program. Noncoupled data are typically carried in separate “data-only” virtual channels. Some examples of applications enabled under this category are delivery of software updates to DTV receivers, delivery of emergency alerts to public safety agencies, and push VOD services.
2) Target Audience: Another important classification basis for applications is the target audience, i.e., whether the broadcast data is targeted at enterprises or at the consumer mass market. For data broadcast applications targeted at consumers, a key requirement for success is that it must be possible to produce low cost DTV receivers that can receive and consume the data. At the time of this writing, DTV technology is just reaching the point of maturity where such low cost data-capable receivers are becoming possible. For applications targeted at enterprises, the cost of the DTV receivers is less crucial. Data may be sent only to specialized receivers in an enterprise, which then may distribute it, for example over an LAN, to other users in the enterprise. Since the value of such applications may easily outweigh the cost of specialized receivers, they are more viable from a business perspective. Almost all enterprise-targeted applications involve loosely coupled data. Examples of such applications include distribution of educational material to public schools and distribution of emergency alerts to public safety agencies.
3) Technology: Applications may also be classified by the type of data, for example streams (such as streaming video or audio), files (such as video clips or text documents), and network datagrams (such as Internet Protocol (IP) packets). The type of data can be further broken down into synchronized, synchronous, or asynchronous data. Synchronized data has a strong timing relationship with another stream, such as a video stream. Each item in a synchronized data stream is intended to be presented at a specific time, relative to a clock established in another stream in the broadcast. Synchronous data has an internal timing association among its own data items, but not with any other stream in the broadcast. Asynchronous data has no internal or external timing relationships.
C. Application Requirements
The different datacasting applications described above require a variety of different types of functionality in order to meet their objectives [2], such as the following.
• Bandwidth management: Often broadcast bandwidth must be allocated among multiple applications, varying by date and time, with priorities among the applications.
• Scheduling: Different types of schedules may be required, such as one-time delivery, periodic delivery at specified intervals, continuous carouseling, etc. Start times, durations, priorities, and bitrates may need to be specified. For some applications it may be necessary to specify separately the times at which the data is fetched from its source and the times at which it is broadcast.
• Flow control: To efficiently utilize the available bandwidth in a transport stream for data broadcasting, the multiplexer must exchange bandwidth information, usually called flow control messages, with the data server, and the data server must control its output bandwidth accordingly.
• Queue management: Different content items may be assigned different priorities during scheduling. For instance, critical items that need to go out at specific times may get the highest priority at the scheduled time, whereas noncritical items may get lower priorities. Further, certain items may be given fixed bandwidth profiles, while others are sent on a best-effort basis.
• Error recovery: Parts of content may get corrupted or lost during transmission. While the transmission system typically has built-in error correction (for example, Reed–Solomon and Trellis forward error correction (FEC) in the 8-VSB (vestigial sideband) modulation system used for ATSC DTV broadcasts), additional correction at a higher layer can be useful in lossier conditions.
• Receiver targeting: Often content has to be targeted to reach a specific set of receivers; for instance, in a distance education application only students registered for a course must receive course material. Receivers may be targeted by identity or attribute.
• Encryption: Content may need to be encrypted, for instance in a homeland security application. Securely delivering the encryption key to receivers is a challenge. Encryption must work in tandem with receiver targeting, since the key must only be available to targeted receivers.
• Receiver service discovery: The basic structure of a DTV broadcast stream is specified in the so-called MPEG-2 Systems Standard [4], which is a member of the family of MPEG-2 standards developed by the Moving Picture Experts Group, a working group of the International Organization for Standardization (ISO). The broadcast stream consists of a sequence of transport stream packets, and each packet contains in its header an identifier called a PID that identifies which specific audio stream, video stream, data stream, etc., the packet
belongs to. Service discovery means finding the broadcast band and the specific PID or PIDs for the transport stream packets carrying the data for the desired service, identifying the encapsulation format used, identifying the correct IP addresses (for data carried in IP packets), etc. (A packet-level sketch follows this list.)
• Receiver acknowledgments: For certain applications like emergency alert notification, it is imperative that content be received. In such applications, receivers must be made aware that they should acknowledge certain data items and must be told where they should send their acknowledgments.
• Status reporting: Both the content server and the receivers must report status so that they can be monitored and controlled. The server must report information like the transmitted items, total bits sent out, bandwidth usage, number of acknowledgments received from receivers in the field, etc. The receiver must report on the bandwidth currently being received, the current cache size, etc.
• Remote monitoring and control: Operators of data servers need tools to monitor and control the data server remotely. Features that may be needed in a remote controller include the ability to start and stop the server, monitor bandwidth usage, monitor warnings and errors, and change settings like the identifier value for the MPEG-2 program element used for broadcasting the data.
• Autolaunching content on arrival: Sometimes it is useful for content to be automatically launched on arrival by the appropriate application. For instance, when a Hypertext Markup Language (HTML) page describing an emergency is delivered, the page may be launched immediately on arrival.
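To make the service discovery requirement concrete, the sketch below shows the packet-level starting point: isolating the 13-bit PID (and a few related header fields) from a single 188-byte MPEG-2 transport stream packet. This is our own minimal illustration, not code from any of the cited standards.

```python
def parse_ts_header(packet: bytes) -> dict:
    """Extract the header fields a receiver needs to route one
    188-byte MPEG-2 transport stream packet."""
    if len(packet) != 188 or packet[0] != 0x47:          # 0x47 is the sync byte
        raise ValueError("not a valid transport stream packet")
    return {
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],    # 13-bit packet identifier
        "payload_unit_start": bool(packet[1] & 0x40),    # a section or PES packet begins here
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,          # detects lost packets per PID
    }
```

A demultiplexer applies such logic to every incoming packet and forwards those whose PID matches an entry discovered through the signaling tables described in Section II-D.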
D. Protocols
In order to support efficient interoperability among data broadcast products, a major requirement is that the data must be encapsulated in standard formats and delivered using standard protocols. In addition, there must be a well-defined in-band service description framework for signaling the presence of current data services and announcing future data services. The ATSC Data Broadcast Standard [3] describes basic encapsulation formats and protocols, along with the signaling and announcement information that can be used in different scenarios. The encapsulation formats and protocols are arranged in a layered fashion, as illustrated in Fig. 1. They are all based on the MPEG-2 transport stream structure. The different encapsulation formats have different structures layered above that, as described below.
Data piping encapsulation is intended for proprietary applications with special requirements that are not met by the other standard encapsulation protocols. The data items are simply packed into MPEG-2 transport stream packets in an application-dependent way. Since the standard encapsulation protocols meet the needs of most applications, this form of encapsulation is not widely used in practice today.
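Data piping is simple enough to sketch in a few lines: the application's bytes are placed directly into transport stream packet payloads, and everything above the packet framing is left to the application. The following is our own simplified illustration; it ignores multiplex timing and PID management, which a real data server must handle.

```python
def pack_data_pipe(pid: int, data: bytes) -> list[bytes]:
    """Pack an opaque byte stream into 188-byte MPEG-2 transport packets.
    With data piping, how the receiver reassembles these bytes is entirely
    application-defined; only the packet framing is standard."""
    assert 0 <= pid < 0x2000, "PID is a 13-bit value"
    packets = []
    for cc, offset in enumerate(range(0, len(data), 184)):
        chunk = data[offset:offset + 184]
        chunk += b"\xff" * (184 - len(chunk))              # stuff the final packet
        header = bytes([
            0x47,                                          # sync byte
            (0x40 if offset == 0 else 0x00) | (pid >> 8),  # flag the first packet
            pid & 0xFF,
            0x10 | (cc & 0x0F),                            # payload only; continuity counter
        ])
        packets.append(header + chunk)
    return packets
```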
Fig. 1. Protocol stack defined in the ATSC data broadcast standard.
Fig. 2. Addressable section encapsulation for protocol datagrams.
Data streaming encapsulation is intended for streaming data that make use of the program clock reference (PCR) and presentation time stamp (PTS) values defined in the MPEG-2 Systems standard [4] for timing. The data stream is packed into packetized elementary stream (PES) packets, the same type that is defined in the MPEG-2 Systems Standard for carrying audio and video streams for TV programming. In current practice, however, most streaming data is IP streaming media, typically carried in the form of IP multicast streams encapsulated by protocol datagram encapsulation.
Protocol datagram encapsulation is intended for carrying network protocol traffic, such as IP traffic, in a DTV broadcast stream. The protocol encapsulation format specified in the ATSC Data Broadcast Standard, often called multiprotocol encapsulation (MPE), encapsulates network protocol data units in Digital Storage Media Command and Control (DSM-CC) addressable sections [8], as shown in Fig. 2. These addressable sections are in turn packed into MPEG-2 transport stream packets. This method is capable of handling diverse protocols, including those in the IP family, the Internetwork Packet Exchange (IPX) family, and the Open Systems Interconnection (OSI) family. Logical Link Control/Subnetwork Access Point (LLC/SNAP) headers [5], [6] are used within the addressable sections to identify the network protocol being carried. However, in recognition of the widespread usage of the IP protocol, these headers are optional (and in fact deprecated) for the IP protocol, but are required for all other network protocols. IP multicasting is especially attractive for datacasting because it is a broadcast protocol, it is well understood, receivers can implement IP stacks easily, and data received over the broadcast medium can be treated much the same as regular IP traffic. In particular, it can be rerouted over LANs if required.
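On the receiving side, recovering a datagram from an addressable section amounts to reading a short header and stripping the trailing CRC. The sketch below is a simplification we provide for illustration; the field offsets follow the DSM-CC datagram_section layout, but an implementation should be written against the standard's syntax tables, not this summary.

```python
def extract_datagram(section: bytes):
    """Recover the network datagram carried in one DSM-CC addressable
    section (simplified MPE receive path)."""
    assert section[0] == 0x3E, "table_id used for DSM-CC addressable sections"
    section_length = ((section[1] & 0x0F) << 8) | section[2]
    llc_snap_flag = bool(section[5] & 0x02)
    # The 6-byte receiver (MAC) address is split across the header:
    # bytes 3-4 carry MAC_address_6 and _5, bytes 8-11 carry MAC_address_4.._1.
    mac = (section[3], section[4], section[8], section[9], section[10], section[11])
    end = 3 + section_length - 4          # the section ends with a 4-byte CRC-32
    payload = section[12:end]
    if llc_snap_flag:                     # LLC/SNAP identifies non-IP protocols
        payload = payload[8:]             # IP is normally sent without this header
    return mac, payload
```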
Fig. 3. Data download encapsulation.
Data download encapsulation is intended primarily for files, in either a one-time delivery or continuous carousel mode. It can also be used for delivery of unbounded data streams. The data download protocol specified in the ATSC Data Broadcast Standard is based on the DSM-CC data download protocol [9]. It provides an efficient way of packaging bounded data objects like files into modules, versioning them, sending them once or cyclically, sending control messages to enable receivers to deal appropriately with the data, and including time stamps for time synchronization if required. A data carousel, where bounded modules are sent cyclically, is a common usage of this protocol. Fig. 3 illustrates data download encapsulation. Each data object is organized into one or more discrete data modules, and in order to carry data modules in the MPEG-2 transport stream, each module is segmented into the payloads of a number of download data block (DDB) sections, each with a maximum payload of 4066 bytes. The download protocol may be constructed with one or two layers of control information, contained in download info indication (DII) and download server initiate (DSI) sections. The DDB, DII, and DSI sections are packed into MPEG-2 transport stream packets. For simple data structures, the one-layer protocol will suffice. The one-layer protocol limits the number of modules that may be referenced to a small number and does not allow any logical grouping of modules. Hence, for more complex data structures, the two-layer protocol may be used, allowing a larger number of modules and providing a way to logically group modules. For delivery of unbounded data streams, a continuous sequence of new versions of a module may be transmitted.
In practice, many file delivery applications today do not use the data download protocol. Instead they encapsulate files into IP packets with either a proprietary file encapsulation protocol or a standard file encapsulation protocol such as the Society of Motion Picture and Television Engineers (SMPTE) Unidirectional HTTP (UHTTP) [7], and then encapsulate the IP packets into MPEG-2 transport stream packets with protocol datagram encapsulation.
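The module-segmentation step at the heart of the download protocol can be sketched as follows. This is our own simplified illustration: real DDB sections also carry DSM-CC message headers and a CRC, and the DII/DSI control messages are omitted entirely.

```python
DDB_MAX_PAYLOAD = 4066  # maximum DDB section payload under ATSC A/90

def segment_module(module_id: int, version: int, data: bytes) -> list[dict]:
    """Split one carousel module into numbered DDB payloads so that
    receivers can reassemble the module in order."""
    return [
        {
            "module_id": module_id,
            "module_version": version,   # bumped when the underlying file changes
            "block_number": block,       # position of this payload within the module
            "payload": data[offset:offset + DDB_MAX_PAYLOAD],
        }
        for block, offset in enumerate(range(0, len(data), DDB_MAX_PAYLOAD))
    ]
```

A carousel sender cycles through these blocks repeatedly; a receiver that tunes in mid-cycle simply collects blocks until it has seen every block_number of the current module_version.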
The next question is how receivers find data services in a transport stream. Under the ATSC Data Broadcast Standard, this can be done by means of the data service table (DST). The DST may have entries for one or more applications. Each application may have one or more “taps,” which point to resources of the application. A tap always specifies an MPEG-2 program element in an MPEG-2 transport stream, and may include additional selection information, like a specific IP address in the case of IP packets encapsulated in addressable sections. The DST is signaled in the virtual channel table (VCT) and the program map table (PMT), defined in the ATSC PSIP standard [10] and the MPEG-2 Systems standard, respectively.
In order to facilitate the implementation of data broadcast services in the ATSC space, an implementation guide [11] is available, which provides a set of guidelines in accordance with the data broadcast standard. The information therein applies to broadcasters, service providers, and equipment manufacturers. For those geographical areas in the world where the digital video broadcasting (DVB) DTV standards are used, the DVB specification for data broadcasting [12] contains data encapsulation protocols very similar to those in the ATSC standard. However, this specification takes a very different approach to signaling.
For IP multicast services, it is useful to supplement the signaling information in the DST with messages formatted according to the Internet Engineering Task Force (IETF) Session Description Protocol (SDP) [13] and the IETF Session Announcement Protocol (SAP) [14]. These SDP/SAP messages describe various properties of a multicast service, including the IP addresses and UDP ports used for the service. In the ATSC domain, the use of SAP/SDP in conjunction with the DST is required for IP multicast services, as described in the ATSC standard on delivery of IP multicast sessions over ATSC data broadcast [15]. Typically, a tap in the DST for an application carrying an IP multicast service gives the IP address of the SDP/SAP announcements. This ATSC standard also describes other aspects of the management of multicast services within virtual channels under different conditions, and the associated signaling requirements.
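As an illustration of what a receiver extracts from such an announcement, the sketch below scans the two SDP line types that matter here, the connection ("c=") and media ("m=") lines. The sample announcement is invented for the example.

```python
def multicast_endpoints(sdp_text: str):
    """Extract the multicast group and UDP port(s) from an SDP message."""
    group, ports = None, []
    for line in sdp_text.splitlines():
        if line.startswith("c="):        # connection line: "c=IN IP4 <group>/<ttl>"
            group = line.split()[-1].split("/")[0]
        elif line.startswith("m="):      # media line: "m=<type> <port> <proto> <fmt>"
            ports.append(int(line.split()[1]))
    return group, ports

# Hypothetical announcement for a datacast service:
sample = "v=0\r\ns=catalog\r\nc=IN IP4 239.1.2.3/32\r\nm=application 5500 udp 33\r\n"
print(multicast_endpoints(sample))       # -> ('239.1.2.3', [5500])
```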
In the U.S. cable world, the SCTE (Society of Cable Telecommunications Engineers) standard on IP multicast for digital MPEG networks [16] describes the protocols for carriage of multicast services. While the DSM-CC addressable section encapsulation format in [16] is almost identical to that in [15], the signaling implementation is different. In [15] each application that carries a multicast service is considered an independent network (so a single MPEG-2 program may consist of multiple networks), and hence the scope of IP addresses is restricted to that application. In [16] each MPEG-2 program is considered to be a network, so the scope of IP addresses is the entire program. The approach in [16] may require more care in the management of multicast addresses, since it may require coordination among multiple independent content providers whose applications are being inserted into the same MPEG-2 program. The content providers must make sure that there are no collisions in multicast addresses. However, the ability under [15] to scope multicast addresses more efficiently comes at the price of a more complex implementation of the IP stacks in the data receivers, since the number of virtual network interfaces in the receiver can change dynamically as applications come and go in a program.
For some applications it is desirable to synchronize the display of large data objects, such as complete HTML pages containing embedded images, with a video stream. It is possible to do this using the PCR values that are used for audio and video synchronization. However, the PCR values may have discontinuities, for example where commercials are spliced in. This makes it difficult to place meaningful PTS values in the headers of the encapsulated data objects in the usual way, since it may take a long time for a data object to download, and the timeline may suffer a discontinuity in the meantime. To deal with this problem, ATSC developed a trigger standard [17], which uses separate triggers to decouple the data objects from their activation timing. The triggers are very small objects that carry the activation time for data objects and can be transmitted appropriately even in the presence of timeline discontinuities. The trigger standard also enables delivery of events to receivers where the trigger itself carries a user-defined payload instead of referring to a data object.
One useful application of datacasting with unique signaling requirements is delivery of in-band updates or upgrades of firmware, middleware, device drivers, and application software in terminal devices like consumer TV sets or set-top boxes. The ATSC Software Download Data Service standard [18] provides a protocol to support this application, based on the two-layer data download protocol. This protocol has built-in signaling to indicate the manufacturer and model of the device for which each downloadable software module is intended. Similar protocols are defined by the Specification for System Software Update in DVB Systems [19] for those geographical regions using the DVB digital TV standards, and by the ANSI/SCTE Host-POD Interface Standard [20] for cable environments in the United States (where ANSI is the American National Standards Institute).
So far this section has focused on protocols and encapsulation formats for transmitting data in the forward channel, but for many applications a back channel is also important. One way to enable interactivity between users and applications is to transfer application data to the terminal device's local cache and allow the user to run the application from the local cache. This is sometimes called “local interactivity.” On the other hand, it may be necessary to interact with data and software on remote servers through an interaction channel, for instance using an Internet connection through a cable modem. This is called “remote interactivity,” and the ATSC Interaction Channel Protocols standard [21] defines the protocols for the return channel. (See Section III of this paper for a more complete discussion of interactive television.)
While the datacasting standards are sufficient for transmitting and organizing data services in terms of applications, they do not provide sufficient signaling to support all the application requirements defined in Section II-C above, like targeting, support for receiver acknowledgments, etc.
Fig. 4. General data broadcast scenario.
However, the standard signaling can be extended to fill the gap through the use of in-band Extensible Markup Language (XML) messages, which will be called “catalogs” in this paper, that provide fine-grained information about what data are in the datacast stream and how receivers should handle them. The service framework described in [3] and [15] can be used to locate the catalogs themselves. In the case of data services based on encapsulated IP packets, the catalogs for an application can be periodically transmitted on a particular IP multicast address and port. The DST can tell the receiver where to find the SDP/SAP messages for the application, and the SDP/SAP messages can tell the receiver the IP addresses and UDP ports for the catalogs, as well as the IP addresses and UDP ports for the actual data.
The catalogs can contain information to satisfy many of the application requirements described in Section II-C above. Consider the case when data is targeted and encrypted. Access control information in a catalog can tell whether items are intended for targeted receivers only or for all receivers. If they are for targeted receivers, then the IDs of the targeted receivers can be listed (in the case of targeting by ID). If the content is encrypted, then access control information about the encryption algorithm and the decryption keys can be contained in the catalog. Similarly, for FEC and compression, the catalog can provide information about the FEC and compression algorithms that were used. With this information in the catalog, receivers can correctly download data items. For enabling receiver acknowledgments, the catalog can specify the location where receivers must send the acknowledgment messages. More details about this approach can be found in [2].
E. Infrastructure
In order to understand the requirements for the data broadcast infrastructure, it is first necessary to understand the general data broadcast environment.
1) Data Broadcast Environment: Fig. 4 illustrates a general data broadcast scenario, where data files, media streams, and/or protocol packets for a variety of applications are broadcast to a variety of different types of data receivers. There are typically three roles involved in such a scenario:
• manager of the broadcast pipe (broadcaster);
• managers of the content flow (content providers);
• end users of the content (content recipients).
Depending on the specific application, these roles may be filled by members of the same organization, by members of three different organizations, or by any combination in between. In any case, any data broadcast infrastructure implementation should support the needs of all three of these roles.
2) Head-End: There are two logical architectural components at the head-end: scheduling workstations to meet the needs of the content providers and a data server to meet the needs of the broadcaster. In actual implementations these may be combined on a common platform, or they may be on separate, distributed platforms.
A scheduling workstation allows a content provider to perform detailed scheduling for retrieval and broadcast of individual data items. The content provider can change the scheduling at any time. The scheduling information is transferred to the data server as needed. The content provider may specify to the data server the receivers to which the content must be targeted, and whether it should be encrypted. It may also specify whether content must be compressed or encoded with error correction. Sometimes it is necessary for a content provider to distribute the same data to multiple data servers, to be broadcast at different rates and times. Convenient scheduling tools are needed to accommodate these different situations.
The data server is by necessity located at the broadcaster's facility, since it is the component that actually inserts the data into the broadcast stream. The data server may also allocate bandwidth among multiple different content providers or applications, and may keep track of the bandwidth actually used by the different content providers or applications for billing purposes. Once the data server receives the data items pushed by the providers, it applies encryption, error correction, etc., as required. The data server enforces and meters the bandwidth usage for each content provider. It can then turn over the data to an MPEG-2 gateway according to the schedules provided. The gateway encodes the data into MPEG-2 packets and feeds it to a multiplexer for actual insertion into the broadcast stream. Alternatively, the data server could encode the data into MPEG-2 format itself, and feed it to the multiplexer. An advantage of the former approach is that the data server is freed from the task of interfacing with different multiplexers. On the other hand, the gateway is an extra component in the system, which can add to the cost.
Many TV stations use variable bitrate (VBR) encoders, in which the bandwidth used for encoding the video depends on the nature of the picture. When there is a great deal of detail in the frames and a great deal of change from one frame to another, more bandwidth is needed to encode the video than when there is little detail and little change from frame to frame. When VBR encoders are used, the broadcast stream has some so-called “opportunistic” bandwidth which can be used for datacasting, i.e., bandwidth which is available when the video is not very demanding but not available at other times.
Since the multiplexer is aware of how much opportunistic bandwidth is available at any time, it can provide the information in real time to a data server through a handshake protocol such as the SMPTE 325M protocol [22]. Of course, the data server must support the same protocol as the multiplexer in order to take advantage of the opportunistic bandwidth.
In various application scenarios a single content provider may need to provide data to several data servers, and conversely multiple content providers may need to provide data to a single data server. Broadcasting networks may have distributed data servers, for instance at a national uplink site and at local stations. The national data servers insert content of national interest, while the local data servers insert content of local interest.
3) Receiver End: The data receiver is the architectural component in the chain responsible for receiving data and making it available to end users. It is equipped with a device responsible for tuning to the appropriate data broadcast channel and extracting data. The device may be a set-top box or a PC receiver card. If it is a PC-based device, it may be connected internally through a peripheral component interconnect (PCI) bus, or externally through a universal serial bus (USB) port. The former approach is a little tidier (and supported higher data rates until the advent of USB 2.0), but the latter is more convenient and portable. If the data consist of IP packets, and if the device is equipped with an appropriate software driver (like a network driver interface specification (NDIS) miniport driver on the Microsoft Windows software platform), the device may be visible to the PC user as a regular network interface card (NIC), albeit one that can only carry inbound traffic. If the underlying data are in the form of IP datagrams and the device acts as an NIC, then it can simply pass the decapsulated data to the application level via the usual IP stack. In fact, the device can often act as a router, and forward the IP data over an LAN to other users. The application-level software can then extract targeted data, apply decryption, decompression, and FEC, and take other actions as required. It may also be responsible for sending acknowledgment messages regarding the status of received data items to the data server. If data items are incomplete because packets are corrupted or dropped during transmission, then the server can resend only those packets, and the receiver must be able to reconstruct the data items correctly. The receiver can store items on a hard disk, perform cache management appropriately, and autolaunch applications as deemed necessary by the application. The receiver can act as an edge-server, in which case the stored data is made available to other users over an LAN. In enterprise applications, receivers acting as edge-servers or routers may be popular, since the receiver cost per end user is much less.
F. Case Studies
1) WRAL/WRAZ: WRAL, a Capital Broadcasting Company station in Raleigh, NC, was the first TV station in the United States to go on the air with digital TV, in June 1996, and was among the first TV stations in the United States to
start data broadcasting, in 1999. Fig. 5 shows a screen shot from the WRAL TotalCast service. The TotalCast service promotes the WRAL brand by delivering files to consumer PCs in the Raleigh area with such content as a mini-Website derived from the WRAL.com Web site, video clips that can be viewed on demand (from WRAL newscasts, local programs, specials, and documentaries), audio clips (from the North Carolina News Network), computer games, and real-time stats during basketball and baseball contests.
Fig. 5. Screenshot from WRAL TotalCast service.
2) Kentucky Education Television (KET): KET, an agency of the state of Kentucky, operates a network of 16 TV transmitters covering the entire state. The KET staff began dreaming of data broadcasting in early 1999 and realized that dream when they started transmitting data in May 2003. They are currently datacasting state legislative sessions and committee hearings in the form of IP streaming media, and also datacasting weather alerts and warnings. These data services are accessible to anyone with a PC, a DTV adapter, and software available from the KET Web site. In addition, they are datacasting health services information available only to Kentucky Family and Health Services personnel, and they are working with the Department of Transportation on a project to deliver content to information kiosks in roadside rest areas throughout the state (locations which are very difficult and expensive to connect to the Internet).
3) KLCS: TV station KLCS, owned and operated by the Los Angeles Unified School District (LAUSD), installed a data broadcasting system in the fall of 2003, and soon after that implemented a fully automated system that allows teachers in the school district to order educational videos from the extensive KLCS video archive via a Web interface and have them delivered by data broadcasting directly to servers in their schools. When a video arrives at a school, the teacher who ordered it is automatically sent an e-mail notification containing the URL where the video can be accessed over the school's LAN. The response to the initial pilot implementation was so positive that KLCS is now in the process of extending the system to all the high schools
in the city, and hopes to extend it to the other schools in the city before too long.
4) Nebraska Educational Telecommunications (NET): NET operates a network of 9 transmitters and 14 translators covering the state of Nebraska. They began data broadcasting very early, and they now have several applications, all education specific. One application provides a Web site where teachers can access a searchable library index and request files containing video, audio, text, still photos, or multimedia materials and have them delivered via data broadcasting directly to their desktop. Another version of this has an edge-server that serves multiple schools connected by a very high-speed data link. If the ordered content is already on the edge-server, it goes to the user directly. Otherwise the content is delivered to the edge-server via data broadcasting, with e-mail notification to the teacher when it arrives. They are currently working with the state Department of Roads and Department of Tourism to implement a content delivery system for kiosks at roadside rests.
G. Prospects for the Future
The number of data broadcast projects using ATSC broadcast facilities has started to rise sharply within the last couple of years, and there are several factors that are likely to make this trend continue for the foreseeable future.
• The cost of data-capable receivers will continue to fall.
• The examples of successful data broadcast projects will encourage others to replicate them in other locations.
• New ideas for data broadcast applications will continue to arise as broadcasters and entrepreneurs get a better understanding of the potential of the technology.
Not only will there be an increasing number of applications targeted at enterprise receivers, but consumer applications are likely to play an increasingly important role in the near future.
III. INTERACTIVE SOFTWARE ENVIRONMENTS
A. Business Motivations
The introduction of interactive television technologies has been driven by two consumer experiences: personalization of television services and bringing e-commerce into the living room, often referred to as “t-commerce.” Personalization of services has been motivated by the fact that it increases consumer loyalty and also gives operators an option to localize their services (local preferences, local businesses). Services falling in this category include the following.
• EPGs: This is perhaps the oldest form of interactive TV. An EPG allows viewers to navigate through a large set of DTV channels. Over the years, EPG applications have been enhanced with personalized features allowing each viewer to preselect a preferred set of channels or to control access to channels.
• VOD applications: VOD is an interactive application that has received a lot of attention over the last few years. A VOD application provides viewers with a selection of movies/TV shows available immediately.
Upon selecting a movie, the viewer initiates a session with a remote server. Over the years, VOD applications have been enhanced with search applications to facilitate browsing through large movie databases.
• Passive commercial enhancements: During a commercial, an interactive TV application provides a series of options that a TV viewer can select to get more information on various aspects of the product/service being advertised. The options and enhancements are typically shown on the screen as a graphics overlay. This concept is also applicable to news and sports enhancements.
• TV game enhancements that allow viewers to play along and measure up to the contestants on TV.
• Polling applications: Viewers are presented with an option to vote in support of a candidate. Interactive polling applications are particularly well suited for reality shows or for music contests, for example.
T-commerce has been driven by the desire to move some of the e-commerce services available from the PC today to the TV screen. This includes the following.
• Active commercial enhancements that complement the passive commercial enhancements described above with the possibility of engaging in a financial transaction to purchase the product or service being advertised.
• Games involving gambling or money pools of some kind (card games, sports bets).
• Home shopping and auctions: Typically offered on dedicated channels, an application allows viewers to purchase the item on display.
• Purchases of music or educational materials that can be downloaded from a service after purchase.
Categorization of interactive applications can follow other criteria. For example, some interactive services require a return channel that allows the interactive TV application to communicate with a remote transaction server. This is the case for most of the t-commerce applications, but some of the personalization applications (VOD, polls) also rely on such infrastructure. These applications are referred to as transactional applications. The remaining applications are local interactive applications, because they rely solely on data stored in the DTV receiver. Another categorization, bounded versus unbounded applications, is sometimes used to differentiate applications that are related to programming from those that are not.
The business challenges associated with the deployment and viability of some of these services should not be underestimated: for example, any application enhancing a commercial can take viewers away from watching subsequent commercials. If proper safeguards are not in place, the value of the following commercials can be reduced, and this, of course, could have a big impact on the overall advertising business.
B. History
With the advent of Moving Picture Experts Group (MPEG)-based digital video and audio compression technologies in the early 1990s, cable and telecommunication
companies started exploring the possibility of enhancing the traditional TV viewing experience with new interactive services like walled gardens, games, home shopping, and educational programs. The biggest trial was conducted by Time Warner in Orlando, FL, in 1994. It was called the Full Service Network (FSN) and had a few thousand homes connected to a fiber-to-the-curb network for easy access to services such as home shopping and VOD. Other trials followed shortly thereafter, and a number of interactive TV application providers such as Wink, OpenTV, Liberate, PowerTV, and Canal+ Technologies competed aggressively for participation in these trials. Most of these interactive software providers worked closely with the two main set-top-box providers, Motorola Broadband (formerly General Instrument) and Scientific Atlanta, to gain certification on their platforms. Some of them participated in the deployments of services such as the Tele-TV service that Bell Atlantic, Nynex, and Pacific Telesis planned to deploy over microwave transmission channels.
In parallel to these activities, the rapid evolution of the World Wide Web offered over-the-air broadcasters the opportunity to roll out field trials on their own with services like Intercast. Intercast services brought together Intel, NBC, and several content providers such as CNBC, CNN, and the Weather Channel. Intercast used data transmitted in the vertical blanking interval of the video signal to provide Web page enhancements to TV programming on the PC. This effort was quickly followed by a more TV-centric effort called the Advanced Television Enhancement Forum (ATVEF), also initiated by Intel in cooperation with broadcast operators like PBS. The ATVEF specification defined an HTML-based content format for broadcasting trigger events to interactive applications which, following their activation at specific instants, would invoke preloaded graphics and image- and text-based enhancements to the TV content. In 2000, the Declarative Data Essence (DDE) Ad Hoc Group of the SMPTE D27 Technical Committee standardized ATVEF as a first-level interactive TV content authoring format [7].
Another product, called WebTV, was deployed in the mid-1990s; it was all about bringing PC content to the TV environment. The product of a cooperation between WebTV, Sony, and Philips, WebTV provided consumers with a receiver device to get access to the Internet from their TV. Fonts, color, and management of screen real estate were all specific to interlaced TV displays. Shortly after its acquisition by Microsoft Corporation, WebTV released WebTV Plus, which consisted of a device including a TV tuner and a TV listings service in addition to the base WebTV service.
These brand new interactive TV trials prompted standardization bodies like ATSC to invest in the development of asynchronous and synchronized data protocol and middleware standards for enabling the development, distribution, and management of ITV and datacasting applications around a unified infrastructure. ATSC started working on the development of the Data Broadcast and DTV Application Software Environment (DASE) specifications in 1996; the
standards ATSC A/90 [3] and ATSC A/100 [23] were completed in 2000 and 2003, respectively.1 The Multiprotocol Encapsulation, Data Carousel, and Object Carousel Protocols described in A/90 have been used extensively. Lately, an agreement between the consumer electronics companies and the cable industry on a removable security module (CableCARD) for unidirectional, and soon bidirectional, services will certainly provide the framework for broad deployment of interactive television.
1[Online]. Available: http://www.atsc.org/standards.html
C. Technologies
An overview of the various interactive television software technologies is provided in the following sections. At the bottom of the stack come the MPEG-2 Systems protocols (see Section III-C1), then the Service Information protocols (see Section III-C2), then the data communication protocols (see Section III-C3), and finally the application programming interfaces (APIs; see Section III-C4). As each of these layers represents an extremely rich topic, each of the following subsections describes only a very specific aspect of the protocols or software interfaces in use today.
1) MPEG-2 Systems: The MPEG-2 Systems specification [4] defines the MPEG-2 transport stream packetization layer, multiplexing layer, and synchronization layer used in most of the broadcast systems in the world today. The specification includes the carriage of program specific information in the PMT, which provides the means for receivers to identify the MPEG-2 packets that convey the data relative to a particular datacasting or interactive service. In this regard, the functionality provided by the PMT is a simple extension of what it is used for today (that is, identification of video and audio elementary streams in a multiplex). By providing a mechanism to list the elementary streams composing a DTV virtual channel, the PMT provides a simple means to bind data elementary streams to a particular DTV virtual channel. The virtual channel may be exclusively a data service channel, in which case no audio or video elementary stream is listed. In an ATSC network, the PMT is transmitted at least once every 400 ms.
2) Service Information: The ATSC Service Information protocol for over-the-air broadcast systems is called the Program and System Information Protocol (PSIP). PSIP provides a method for publishing the type of interactive TV applications bound to a virtual channel. More specifically, the service type information provided in the virtual channel table tells the EPG receiver application whether the virtual channel is exclusively a data service channel (in which case no audio or video elementary stream is associated with the data elementary streams), an audio/data channel, or a video/audio/data channel. PSIP also provides a mechanism for announcing the schedule of each event by means of auxiliary tables called the event information table (when the data service is provided along with video and/or audio) or the data event table (when the data service is a standalone service).
3) Data Communication Protocols: There are many forms of data communication protocols used in interactive television today. The DSM-CC specification [9] defines
some of these protocols, like the Data Download Protocol and the Object Carousel Protocol. It also describes the method for tunneling other popular protocols like IP in special MPEG-2 Systems structures called DSM-CC addressable sections, as mentioned in Section II-D above. The Object Carousel Protocol has played a key role in the development of interactive television systems, mainly because it serves as a bridge between the traditional, low-level MPEG-2 Systems-based delivery mechanisms used in any DTV service and the interactive software middleware running in a PC or a digital set-top box. More specifically, the Object Carousel Protocol is a “file-in-the-sky” system which defines a hierarchical organization among simple data modules transmitted in an MPEG-2 transport stream. The result is a set of directory and data objects linked together by a sophisticated referencing mechanism. A datacasting or interactive application running in a digital receiver can therefore use Object Carousels to navigate efficiently through large sets of data. Depending on the amount of dynamic memory or hard disk drive space available, a receiver may elect to cache none, a portion, or all of the Object Carousel. The organization of the directory and data objects cached in a receiver is typically identical to the directory/object organization prescribed by the object carousel, as any discrepancy between the two requires additional metadata in the object carousel (to represent the intended data organization in the cache). In effect, the Object Carousel Protocol is the bridge between the data transmission part and the cache management part of a datacasting or interactive data service.
To understand the essential role that the Object Carousel Protocol has played in the deployment of interactive DTV services, it is worth going over the history and fundamentals of the protocol. The protocol was designed in early 1995 as part of the MPEG DSM-CC standard [9]. The motivation for designing the protocol was to extend the use of the file access and directory listing APIs defined for conventional bidirectional VOD networks to unidirectional broadcast networks. The result of the work was an additional protocol layer called BIOP (Broadcast Inter-ORB Protocol, the true technical term for the Object Carousel Protocol) on top of the simple, flat organization of the DSM-CC Data Carousel Protocol. In effect, the Object Carousel Protocol allows the payload of data modules to represent a collection of directory names, file names, or data files. As shown in Fig. 6, the Data Carousel Protocol is the periodic transmission of data modules carried in MPEG-2 sections. See [24] for a complete description of the protocol. It must be noted that, depending on a variety of considerations such as access time by the datacasting or interactive TV application and robustness against transmission errors, data modules may be repeated several times in a single carousel period; in the figure, for example, each module has its own repetition rate within the period. The DSM-CC Object Carousel Protocol defines a family of objects that make up the payload of each data module. The object design follows a common framework defined by BIOP [9].
Fig. 6. Object carousel with multiple data modules.
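As an illustration of the periodic module transmission of Fig. 6, the following toy sketch interleaves modules with different repetition counts within one carousel period. Module names and counts are invented, and the DSM-CC wire format is not modeled; only the interleaving idea is shown.

```python
# A toy illustration of how modules with different repetition counts can
# interleave within one carousel period, as in Fig. 6. Names and counts
# are invented; this is not the DSM-CC wire format.

def one_carousel_period(repetitions: dict[str, int]) -> list[str]:
    """Interleave modules so that module m appears repetitions[m] times."""
    lanes = [[m] * n for m, n in repetitions.items()]
    period = []
    # Emit one copy of each still-pending module per pass, so repeats
    # spread across the period instead of clustering back to back.
    while lanes:
        for lane in lanes:
            period.append(lane.pop())
        lanes = [lane for lane in lanes if lane]
    return period

# e.g. one_carousel_period({"M1": 2, "M2": 1, "M3": 3})
# -> ['M1', 'M2', 'M3', 'M1', 'M3', 'M3']
```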
A BIOP object may not be split across multiple data modules, but a data module may contain zero, one, or multiple BIOP objects. The DSM-CC Object Carousel Protocol defines the following set of BIOP objects.
• The BIOP File Message: This is the object conveying the data files consumed by the datacasting and/or interactive data service application using the object carousel.
• The BIOP Directory Message: This is the object conveying the list of subdirectory, file, stream, or stream event names falling under the current directory.
• The BIOP ServiceGateway Message: This object is a special directory object which is used to designate the top directory of the hierarchical structure of an object carousel.
• The BIOP Stream Message: This object is essentially a reference to an audio, video, or data stream.
• The BIOP Stream Event Message: This object represents asynchronous or synchronized triggers for specific actions to be implemented by the datacasting and/or interactive data service application that uses the object carousel. Stream event objects may or may not be bound to streams in the object carousel.
The BIOP objects listed above share the same structure: a message header, followed by a message subheader, followed by the message payload. The message header is a generic IOP header defined by the Common Object Request Broker Architecture (CORBA), and the message subheader contains information specific to the BIOP protocol, such as the objectKey and a string identifying the kind of object (“srg” for service gateways, “dir” for directories, “fil” for files, “str” for streams, and “ste” for stream events). The mechanism used to bind these objects together is the interoperable object reference (IOR). An IOR is a reference to a BIOP object and, as such, contains all the information needed to locate the BIOP object in an MPEG-2 transport stream. An IOR consists of one or more tagged profiles. A tagged profile is either a reference to a BIOP object carried in the same object carousel or a reference to a BIOP object carried in an external object carousel (defined in the DSM-CC standard as “BIOPProfileBody” and “LiteOptionsProfileBody,” respectively). A BIOPProfileBody includes one or more references to the MPEG-2 program element(s) carrying acquisition information about the data module where the target BIOP object resides. This information is captured in the connection binder substructure of the IOR. The IOR also includes an objectKey substructure, which is the unambiguous identifier of the BIOP object in the object carousel. The objectKey is typically used by the Object Carousel Protocol software stack in a digital receiver to parse the objects in a data module sequentially until the target BIOP object has been found.
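The following schematic sketch suggests how a receiver-side carousel stack might resolve an IOR: the profile body identifies a module, and the objectKey is matched by scanning the module's BIOP objects in order, as described above. The classes are hypothetical simplifications, not the BIOP wire format.

```python
# A schematic sketch of objectKey-based resolution. The classes are
# deliberate simplifications of the BIOP structures, not the wire format.

from dataclasses import dataclass

@dataclass
class BiopObject:
    object_key: bytes
    kind: str              # "srg", "dir", "fil", "str", or "ste"
    payload: bytes

@dataclass
class Ior:
    carousel_id: int
    module_id: int
    object_key: bytes

def resolve(ior: Ior, modules: dict[tuple[int, int], list[BiopObject]]) -> BiopObject:
    """Scan the referenced module sequentially until the objectKey matches."""
    for obj in modules[(ior.carousel_id, ior.module_id)]:
        if obj.object_key == ior.object_key:
            return obj
    raise KeyError(f"objectKey {ior.object_key!r} not found in module {ior.module_id}")
```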
On the other hand, a LiteOptionsProfileBody contains the reference to another object carousel, either within the same MPEG-2 transport stream or in another MPEG-2 transport stream. This information is included in a ServiceDomain substructure of the IOR. Fig. 7 shows a simple object carousel made of three files organized in two directories. The right portion of the figure shows the IORs included in the service gateway and directory objects: the IORs for the directory objects are included in the service gateway object, and the IORs for the BIOP file objects are included in the directory objects. The IOR of the service gateway object is typically included in an external location (the DownloadServerInitiate message of a data carousel) that the receiver can retrieve easily to mount the object carousel at some predefined location of its file system.
The ATSC A/95 [25] standard specifies the constraints that must be applied to designs of object carousels delivered in ATSC transport streams. The ATSC design includes extensions for providing URI-based references to objects in the object carousel, thereby providing a mechanism for abstracting references to BIOP objects away from their physical location in an object carousel. The primary advantage of this abstraction is to decouple application software development from the location of the data files the application consumes. More specifically, an application developer can simply refer to objects in the object carousel via the URI reference mechanism without having to pay attention to the final location of such objects in the object carousel. The application does not need to be modified if the organization of the files in the object carousel changes. On the transmission side, the broadcast, satellite, or cable operator has complete freedom to map and schedule the delivery of the BIOP objects in the carousel. It is thus the IOR mechanism of the BIOP protocol that effectively brings the transmission operations world and the application software development world together.
The joint ATSC/CableLabs standardization activities, started more than two years ago, are now providing the opportunity to align the object carousel designs between the over-the-air/satellite and the cable industries. In particular, enhancements are being made around the infrastructure needed to make synchronized triggers based on BIOP stream event objects work. The current design assumes easy and direct access to MPEG-2 timing information to bind media time to the MPEG-2 System Time Clock used by the receiver in an unambiguous fashion, even in the presence of program clock reference (PCR) discontinuities. However, this is generally not the case, and ATSC, after realizing this almost five years ago, introduced the A/93 (Synchronized/Asynchronous Trigger) standard in 2002 to provide an alternative solution to the industry.
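A small sketch of the timing problem behind synchronized triggers: a stream event carries a media time, which the receiver must map onto its running System Time Clock using an anchor pair observed from the stream. All names here are our own inventions, and the 90-kHz base is the MPEG-2 presentation-timestamp granularity; A/93 [17] defines the actual mechanisms.

```python
# A schematic sketch, not the A/93 mechanism: mapping a trigger's media
# time onto the receiver's System Time Clock (STC) via one anchor pair.

MPEG2_CLOCK_HZ = 90_000  # 90 kHz PTS base used by MPEG-2 Systems

def media_time_to_stc(media_time_s: float,
                      anchor_media_time_s: float,
                      anchor_stc_ticks: int) -> int:
    """Convert a trigger's media time to an STC value, given one anchor
    (a media time observed simultaneously with an STC reading)."""
    delta = media_time_s - anchor_media_time_s
    return anchor_stc_ticks + round(delta * MPEG2_CLOCK_HZ)

def fire(trigger_media_time_s: float, anchor: tuple[float, int],
         current_stc_ticks: int) -> bool:
    """True once the receiver clock passes the trigger's STC deadline.
    A PCR discontinuity invalidates the anchor, so a real stack must
    re-anchor after each discontinuity (the complication noted above)."""
    deadline = media_time_to_stc(trigger_media_time_s, *anchor)
    return current_stc_ticks >= deadline
```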
Fig. 7. Organization of directories, files, and their references in an object carousel.
Fig. 8. Resources available to an interactive application.
4) Application Programming Interfaces: The DTV Application Software Environment (DASE) and the Advanced Common Application Platform (ACAP) are the two specifications that ATSC has developed toward the definition of a common DTV receiver software platform and the APIs that interactive TV application developers can use. In the following subsections, an overview of such a software platform is provided.
a) Resources: The semantics of the application often require that it interact with platform resources. Fig. 8 illustrates the resources available to the application. The first stages of the pipeline relate to transport streams. The first stage tunes. The second stage isolates specific streams within the transport stream. The third stage, the section filter, isolates specific programs or specific tables within specific programs. The functions above the pipeline can exploit this feature to isolate the protocols of interest, for example, the protocol that publishes service metadata. The later stages of the pipeline relate to the content. The decryption element recovers the media streams. The decode elements are specific to the media stream. In addition to audio, the specification supports four devices that present to the screen. The devices render into a frame buffer, the planes of which divide into four sets. The first plane set, from back to front, stores a background image. The second plane set is for the video. The third plane set is for graphics. The fourth plane set is for captions. The graphics device represents color samples as three color components plus a fourth component that blends the sample with the current contents of the plane set. The application can composite source images with the plane set and, since the plane set stores the weights, can also composite the plane set for graphics with the plane sets behind.
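The blending rule just described reduces to standard source-over compositing, illustrated below with plain arithmetic; no receiver API is implied.

```python
# A worked illustration of the blending rule described above: the fourth
# component (alpha) stored with each graphics sample weights the graphics
# color against whatever is behind it (video or background).

def blend(graphics_rgb, alpha, behind_rgb):
    """Per-sample source-over compositing: out = a*src + (1-a)*dst."""
    return tuple(
        round(alpha * g + (1.0 - alpha) * b)
        for g, b in zip(graphics_rgb, behind_rgb)
    )

# A half-transparent white caption box over dark video:
# blend((255, 255, 255), 0.5, (20, 20, 20)) -> (138, 138, 138)
```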
The figure also illustrates the components that support services and applications. The application can create filters that parse the service metadata to isolate services of interest. The application can, with proper permissions, then select the service. The application can also create filters to isolate applications bound to the service and then request that the platform launch the application. The platform extracts the application from the object carousel, creates a context in which the application executes, and launches the application.
b) Scope: As the discussion above suggests, the scope of the interfaces available to applications is considerable. The application can, with proper permissions, perform these operations:
• create preferences; inquire preferences; configure platform defaults, such as localization of text, in response to preferences;
• select languages; configure the default language for the audio device and the caption device;
• configure the transport hardware; register interest in transport events, for example, the allocation or release of transport resources;
• configure the media hardware; control the media streams; register interest in media events, for example, changes to the video format;
• configure the return channel; create sockets and exchange datagrams;
• create section filters; extract specific tables from the transport stream;
• create service filters; browse the service metadata; register interest in events, for example, the introduction, replacement, or deletion of a service;
• select services; resolve the service name to the service address; select the service; control the service execution; register interest in service events, for example, transitions of the execution state machine;
• create application filters; browse the available applications; control the application execution; register interest in application events, for example, transitions of the execution state machine;
• browse the object carousel; load specific objects from the object carousel; register interest in object carousel events, for example, the addition, replacement, or subtraction of objects;
• access persistent storage;
• create widgets; configure the widget position, order, and focus; composite images into the graphics plane set; register interest in interaction device events, for example, focus status.
While other media solutions often include comparable functions, the television design center is evident through subtle details. The target device receives broadcast streams, the details of which often change without application intervention. The transport stream that publishes service metadata might announce the addition or subtraction of services. The object carousel might increment the application version number, so as to indicate that the previous application version is stale. The details of the video stream, such as its aspect ratio, can change. These details are often of interest to applications. If the application is to register its presentation with the video below, for example, the application must understand the origin and size of the video stream. For these reasons the design provides extensive events to which applications can subscribe so as to detect changes to the platform state.
It is probable that the execution of one application will affect other applications. The allocation of resources to one application, for example, can interfere with other applications, so the design defines permission classes to protect transport resources and media resources. The design also anticipates a master application that executes in the background. Since its purpose is to coordinate the execution of other applications, it survives service selection. It can filter the available applications, control their execution state machines, and arbitrate resource conflicts. The master application also can extract data streams otherwise not available to applications.
To account for the distance to the screen, the default widget set is distinctive. The design provides remote control events as well as keyboard events. The design lets applications interpose filters to isolate specific events. The application need not have focus to receive interaction device events. If the application is a program guide, for example, the application can listen for remote control events and, upon receipt of the events, activate its widgets.
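The subscription pattern that runs through these interfaces can be summarized with a minimal, hypothetical event bus: applications register listeners for platform events and are called back even without input focus. All class and event names below are invented for illustration.

```python
# A minimal sketch of the register-interest pattern described above.
# Class and event names are hypothetical, not a standardized API.

from collections import defaultdict
from typing import Callable

class EventBus:
    def __init__(self):
        self._listeners: defaultdict[str, list[Callable]] = defaultdict(list)

    def subscribe(self, event_type: str, listener: Callable) -> None:
        self._listeners[event_type].append(listener)

    def publish(self, event_type: str, payload) -> None:
        # Every subscriber is called back, focused or not.
        for listener in self._listeners[event_type]:
            listener(payload)

bus = EventBus()
# A program-guide application listens for remote-control keys even while
# another application holds focus, then activates its own widgets.
bus.subscribe("remote_key", lambda key: print(f"guide saw key {key}"))
bus.subscribe("video_format", lambda fmt: print(f"video is now {fmt}"))
bus.publish("remote_key", "GUIDE")
```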
c) Secure execution: The first line of defense to secure execution is the Java language. The language provides constructs that allow the author to control access. If an operation is private, for example, just the code of the class that declares the operation can access it. If a class is final, there can be no subclass that might implement different operations. The compilation phase then enforces the rules. The second line of defense is the exchange representation, the bytecode design, which again controls access rights. The bytecode design does not support the concept of direct storage access; rather, access is through object references that do not expose the storage location. The application can inspect just the data within its protection domain. The third line of defense is the verification phase, which occurs just before code execution. The verification phase enforces rules the evaluation of which requires the emulation of code execution. If the application attempts to cast an object to a superclass, for example, the verification phase confirms that the object is indeed a subclass. The application cannot forge objects that evade the access rules.
The core packages then provide a design pattern available to further control access. The various packages define permission classes that can contain three components: 1) the class to which the permission relates; 2) the specific target, for example, a specific file; and 3) the specific actions, for example, read, write, or delete. The permission class just provides the language to articulate the available operations. To complete the design, the standard provides the mechanism to relate specific permissions to specific applications. Before the platform executes the application, it evaluates the permission requests. The platform first determines the code source, that is, the network location at which the application resides, and the certificate(s) of the organization(s) that signed the application. The platform traverses a certificate chain until it encounters a certificate of an organization that the platform trusts, at which point the platform grants the specific permissions.
The standard builds on these concepts to protect sensitive data and scarce resources. The standard defines permissions for each of the elements of Fig. 8. The object carousel provides, in addition to the application itself, a permission request file. The file describes the resources to which the application requests access. If the certificate chain confirms that the platform can trust the application, the platform grants access to the resources. If the application should attempt to access other scarce resources, the platform detects that the resource requires a permission that the application does not possess. The access attempt fails and the platform raises an exception. While these mechanisms ensure that an application cannot access resources without permission, the standard supports simultaneous execution of multiple applications, so applications might contend for scarce resources. The standard provides a prioritization mechanism, but since the assignments are made at the time the object carousel is built, conflicts can still occur. The standard provides a solution: if the platform detects resource contention, it escalates to the master application, which prioritizes the applications. The platform then applies the result to resolve resource conflicts.
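A condensed sketch of the grant logic described above: the platform reads the application's permission request file, walks the certificate chain to a trusted root, and only then grants the requested permissions; a later access check raises an exception when a permission was never granted. All names are hypothetical; the real design follows the Java security architecture.

```python
# A condensed, hypothetical sketch of permission granting and checking.

from dataclasses import dataclass, field

@dataclass
class Permission:
    target_class: str    # e.g. the resource class the permission relates to
    target: str          # e.g. a specific file
    actions: frozenset   # e.g. frozenset({"read"})

@dataclass
class Application:
    cert_chain: list                       # issuer names, leaf first
    requested: list = field(default_factory=list)
    granted: set = field(default_factory=set)

TRUSTED_ROOTS = {"BroadcasterRootCA"}      # invented trust anchor

def grant(app: Application) -> None:
    """Grant requested permissions only if the chain reaches a trusted root."""
    if not any(issuer in TRUSTED_ROOTS for issuer in app.cert_chain):
        return                             # untrusted: nothing is granted
    app.granted.update((p.target_class, p.target, tuple(sorted(p.actions)))
                       for p in app.requested)

def check(app: Application, p: Permission) -> None:
    """Raise, as the platform does, if the permission was never granted."""
    key = (p.target_class, p.target, tuple(sorted(p.actions)))
    if key not in app.granted:
        raise PermissionError(f"{p.target_class} on {p.target} not granted")
```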
Fig. 9. ATSC standardization timeline for interactive TV middleware.
Fig. 10. Conceptual model for data broadcasting service.
d) State of the standardization process: The evolution of the ACAP specification involved three distinct projects. Fig. 9 illustrates the process, which began with the inception in September 1998 of the specialists’ group for the DASE specification. The specification became a candidate standard in November 2002 and a standard in March 2003. While the DASE specification and the Open Cable Application Platform (OCAP) specification are both built on similar runtimes and common core interfaces, the television packages and signatures of the two specifications differ. Since the designs had comparable scopes, it was thought that harmonization of the designs was feasible. A joint task force of specialists from both organizations was formed in August 2002 and completed its investigations in September 2003. The recommendation was to build on the transport protocol specifications of the Advanced Television Systems Committee and to adopt the application interface specifications of CableLabs.
5) Essential Elements for Head-End System: A model for an end-to-end data broadcasting service system is shown in Fig. 10. The model is composed of several components, including modules for content generation, transmission, and reception. The content generation processing unit includes components for A/V acquisition, content authoring, audio/video encoding, generation of system-level program information (PSI) and program guide data (PSIP), management and storage of the audio/video/data assets, and the return channel server(s) used to support the interactive services. The outputs of each of these components are then multiplexed into one or more MPEG-2 transport streams, which, in turn, are modulated as an 8-VSB signal. A more detailed description of the modules is provided below.
• Data content generation: The first step in the creation of a DTV data broadcasting service is the development of an application based on Java as a procedural language or on XML as a declarative language. The choice to use either a procedural or declarative
framework is driven by several factors (complexity of graphics, for example), and it is generally recognized that a procedural framework is to be preferred for scenarios calling for extensive user interactivity, while a declarative environment is to be preferred for scenarios calling for rapid development of applications that typically require less computational power. For this reason, procedural environments are more suitable for the development of interactive services such as games and quiz shows. On the other hand, declarative applications are to be preferred for applications designed to present large amounts of information, such as actor and actress information or drama synopses.
• Transport: The role of the transmission module is to encode the video and audio content and to encapsulate a procedural or declarative application and its data according to the transport protocols referenced in [24]. The application and its data are generally encapsulated into an object carousel before being multiplexed with the audio and video elementary streams into an MPEG-2 transport stream. The necessary MPEG-2 Systems program information and the relevant application information table (AIT) are also multiplexed in, so that receivers can discover the presence and location of the data elementary stream conveying the object carousel data.
• Program scheduler: The role of the program scheduler is to generate and manage the schedule for the audio and/or video program as well as the broadcast and interactive data services. Minimally, the schedule consists of a start time, a duration, metadata such as a storage path, and, optionally for data services, information binding the application to a specific audio and/or video program. The program scheduler typically controls much of the head-end system, including the data servers and the video and audio encoders.
• Data server: The data server has three distinct functions. Its first responsibility is to store and manage the application code; its second role is to encapsulate and packetize the application data before it is transmitted to receivers, typically by means of the object carousel protocol. Its final responsibility is to generate the AIT, which holds all the information (application name, location of resources, arguments) that a receiver needs to run the application properly.
• PSIP/SI server: The ATSC PSIP server and the system information server provide the information needed for associating audio, video, and data elementary streams with a DTV virtual channel. System information consists of the packet identifier (PID) values for each video, audio, and data elementary stream.
• Multiplexer: The multiplexer combines the audio, video, and data MPEG-2 transport packets produced by the encoders and the data servers. Multiplexing of the packets is done according to the bitrate assigned to each of the elementary streams (a simple allocation sketch appears after Fig. 11). The resulting MPEG-2 transport stream also includes the PSIP data and, of course, the program system information (the MPEG-2 Systems program association table and program map table) that digital receivers rely on to discover the services.
• Modulation: Finally, for the ATSC terrestrial broadcasting service, the MPEG-2 transport stream is modulated into a radio-frequency signal. The modulation is done according to the eight-level vestigial sideband (8-VSB) scheme specified by ATSC.
• Reception: DTV terminals receive the 8-VSB signal and decode it into an MPEG-2 transport stream, which is then demultiplexed into individual elementary streams using the program specific information. The terminal also includes the procedural or declarative middleware responsible for executing the interactive data application. Generally, the graphics associated with a data application are overlaid on the video content or, alternatively, are shown on the side panels of a wide aspect ratio (16:9) display. For most interactive applications, users can use the remote control unit or a keyboard to navigate through the various states of the application during the audiovisual program. It is not uncommon for several interactive applications to run concurrently with a single audiovisual program.
Fig. 11 shows screenshots of experimental data broadcasting services in South Korea, illustrating various types of data services. A program-related data service is shown in Fig. 11(a): the service provides auxiliary information for any viewer interested in specific player information or statistics. Such a system was prototyped during the soccer World Cup in 2002. A non-program-related interactive service is shown in Fig. 11(b): it is a card game that a user can select and play while watching a TV program. The interactive data services shown in Fig. 11(a) and (b) are both unidirectional services because they can be provided without the need for a return channel. On the other hand, interactive services like shopping or quiz shows [Fig. 11(c) and (d)] are true bidirectional interactive services requiring a return channel to establish a session with a remote transaction server.
Fig. 11. Data broadcasting service screenshots. (a) Player information. (b) Card game. (c) Shopping. (d) Quiz.
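As referenced in the multiplexer description above, the sketch below allocates transport-packet slots to elementary streams in proportion to their assigned bitrates (a weighted round robin on accumulated credit). Real multiplexers must also honor decoder buffer models and PCR timing; the PIDs and rates here are invented.

```python
# A toy bitrate-proportional multiplexer: each elementary stream gets
# transport-packet slots in proportion to its assigned bitrate.

def allocate_slots(bitrates: dict[int, float], slots: int) -> list[int]:
    """Return a schedule of `slots` PIDs, proportional to bitrate shares."""
    total = sum(bitrates.values())
    credits = {pid: 0.0 for pid in bitrates}
    schedule = []
    for _ in range(slots):
        for pid, rate in bitrates.items():
            credits[pid] += rate / total   # each stream earns its share
        pid = max(credits, key=credits.get)  # most-owed stream goes next
        credits[pid] -= 1.0
        schedule.append(pid)
    return schedule

# Video (PID 0x31) at 12 Mb/s, audio (0x34) at 0.384 Mb/s, data (0x41)
# at 2 Mb/s: allocate_slots({0x31: 12.0, 0x34: 0.384, 0x41: 2.0}, 10)
```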
IV. CONCLUSION
This paper has provided an overview of the data broadcast and interactive television service technologies that have been used in field trials and commercial deployments over the last ten years. These technologies include data communication protocols for the forward and return channels and DTV receiver middleware APIs that content providers can rely on to deploy their services. The convergence of standards toward similar protocols and DTV receiver runtimes, the appearance of PC/DTV convergence products such as PCs running the Windows XP Media Center Edition platform, and the migration of audio/video decoding capability from the set-top box into the DTV receiver all point to a gradual change of the traditional TV experience toward a future where DTV will provide an enhanced viewing experience with interactive data services.
Future interactive TV services will allow operators to provide increased personalization to their customers. This drive toward personalization will bring new consumer behaviors, which in turn may open the door to new pay services such as T-commerce or product documentation services. Additional technical solutions such as digital rights management are needed, but they are now being evaluated and slowly deployed.
There are, however, important problems that remain to be solved before any broad deployment of interactive TV can happen. One of them is certainly the issue of conformance for interactive applications. Indeed, service providers will increasingly be seeking guarantees that their applications run and present content as originally intended. The acquisition, caching, and execution environments in DTV receivers can all have an impact on how the data is processed and rendered. Therefore, we can expect the development of minimum hardware and software guidelines for DTV receivers in the near future. Another challenge is the development of logical interfaces into a broadcaster head-end that let service providers define the interactive TV experience and the business rules associated with their service. This means that new specifications for metadata describing interactive data services are going to be needed. It is only when these logical interfaces are available that DTV operators will have a way to integrate the revenues and costs associated with operating interactive data services in their plants.
ACKNOWLEDGMENT
The authors would like to thank the anonymous reviewers, who have all provided great feedback.
REFERENCES
[1] G. Thomas, “ATSC datacasting: opportunities and challenges,” in Proc. NAB 2000 Broadcast Engineering Conf., pp. 307–314.
[2] D. Catapano and G. Thomas, “Fine-grained announcement of datacast services,” in Proc. NAB 2004 Broadcast Engineering Conf., pp. 249–256.
[3] ATSC data broadcast standard, ATSC Standard A/90, 2000.
[4] Information technology—Generic coding of moving pictures and associated audio—Part 1: Systems, ISO/IEC Standard 13818-1, 2000.
[5] Information technology—Telecommunications and information exchange between systems—Local and metropolitan area networks—Specific requirements—Part 1: Overview of local area network standards, ISO/IEC/TR3 Standard 8802-1, 1997.
[6] Information technology—Telecommunications and information exchange between systems—Local and metropolitan area networks—Specific requirements—Part 2: Logical link control, ISO/IEC Standard 8802-2, 1998.
[7] Television—Declarative data essence—Unidirectional Hypertext Transport Protocol, SMPTE Standard 364M-2001.
[8] Information technology—Generic coding of moving pictures and associated audio information—Part 6: Extensions for DSM-CC, additions to support data broadcasting, ISO/IEC Standard 13818-6 Amendment 1, 2000.
[9] Information technology—Generic coding of moving pictures and associated audio information—Part 6: Extensions for DSM-CC, ISO/IEC Standard 13818-6, 1998.
[10] Program and system information protocol for terrestrial broadcast and cable (Revision B), ATSC Standard A/65B, 2003.
[11] Implementation guidelines for the ATSC data broadcast standard, ATSC Recommended Practice A/91, 2001.
[12] DVB specification for data broadcasting, ETSI Standard EN 301 192 V1.4.1, 2004.
[13] “SDP: Session description protocol,” Internet Engineering Task Force, RFC 2327, Apr. 1998.
[14] “Session announcement protocol,” Internet Engineering Task Force, RFC 2974, Oct. 2000.
[15] Delivery of IP multicast sessions over ATSC data broadcast, ATSC Standard A/92, 2002.
[16] IP multicast for digital MPEG networks, ANSI/SCTE Standard 42, 2002.
[17] Synchronized/asynchronous trigger, ATSC Standard A/93, 2002.
[18] Software download data service, ATSC Standard A/97, 2004.
[19] Specification for system software update in DVB systems, ETSI Standard ETSI TS 102 006 V1.3.1, 2004.
[20] HOST-POD interface standard, ANSI/SCTE Standard 28, 2004.
[21] ATSC interaction channel protocols, ATSC Standard A/96, 2004.
[22] Digital television—Opportunistic data broadcast flow control, SMPTE Standard 325M-1999.
[23] DTV application software environment level 1 (DASE-1), ATSC Standard A/100, 2003.
[24] R. Chernock, R. Crinon, M. Dolan, and J. Mick, Jr., Data Broadcasting—Understanding the ATSC Data Broadcast Standard, ser. McGraw-Hill Video/Audio Professional Series. New York: McGraw-Hill, 2001.
[25] Transport stream file system, ATSC Standard A/95, Feb. 2003.
Regis J. Crinon (Member, IEEE) received the M.S.E.E. degree from the University of Delaware, Newark, in 1984 and the Ph.D. degree in electrical and computer engineering from Oregon State University, Corvallis, in 1994. He started his career at Tektronix Inc., Beaverton, OR, where he codeveloped the three-dimensional NTSC and PAL chrominance/luminance separation for the Emmy-award-winning Profile video editing system. In 1987, he was a Visiting Scientist at the Advanced Television Research Program, Massachusetts Institute of Technology, Cambridge. He then joined Thomson Consumer Electronics, Indianapolis, IN, where he was the data services system architect for the Tele-TV system. At Sharp Laboratories of America, Camas, WA, he worked on MPEG-4 Video and Systems. More recently, he was with Intel Corporation, Hillsboro, OR, where he was the engineering manager for the development of a prototype end-to-end PC-based datacasting system. He has been with Microsoft Corporation, Redmond, WA, since 2002, where he is currently a Lead Program Manager in the Digital Media Division. He is also an Adjunct Faculty Member at Oregon State University, where he has taught courses in the area of digital video processing and has served as a Ph.D. student advisor. He also coauthored the book Data Broadcasting—Understanding the ATSC Data Broadcast Standard (McGraw-Hill, 2001).
Dr. Crinon has been an active participant in the MPEG Systems standardization process. In 1999, he was recognized twice by MPEG for outstanding contributions to MPEG Systems standards. He also was the chairman of the ATSC T3/S13 Data Broadcast Specialist Group from 2000 until 2002 and received the ATSC Bernard J. Lechner Outstanding Technical Contributor Award in 2002.
Dinkar Bhat received the B.Tech. degree in electrical engineering from the Indian Institute of Technology at Madras (now Chennai), the M.S. degree in computer science from the University of Iowa, Iowa City, and the Ph.D. degree in computer science from Columbia University, New York. He is Principal Engineer at Triveni Digital, Inc., Princeton, NJ, where he has made many substantial contributions to products for data broadcasting, bitstream monitoring and analysis, and PSIP generation and grooming. He has published in leading journals, such as the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, and in IEEE, Society of Motion Picture Television Engineers (SMPTE), and National Association of Broadcasters (NAB) conferences. He holds two patents in the area of digital television.
David Catapano received the B.S. degree in applied computer science from the University of Wisconsin-Parkside, Kenosha. He worked for 14 years at the Xerox Corp. in Rochester, NY, developing advanced digital printing technology and products. He is currently Senior Director of Product Development at Triveni Digital, Inc., Princeton, NJ, with overall responsibility for development of all Triveni Digital products, and he has been development manager and primary architect of the Triveni Digital SkyScraper data broadcast product line since its inception in 2000. He was an early contributor to the ATSC specialist group T3/S17 (DASE), and he oversaw the development of an early prototype JavaTV/DASE environment. He holds three patents for printer-related technology.
Gomer Thomas (Member, IEEE) received B.A. degrees in mathematics from Pomona College, Claremont, CA, and the University of Cambridge, Cambridge, England, and the Ph.D. degree in mathematics from the University of Illinois, Urbana-Champaign. He is Principal Scientist at Triveni Digital, Inc., Princeton, NJ, with current focus on Triveni Digital’s SkyScraper data broadcast product line. At various points in his career, he has carried out research and development in abstract algebra, real-time systems, heuristic combinatorial algorithms, distributed data management, and digital television. He has contributed to standards work in the ATSC specialist groups T3/S8 (Transport) and T3/S13 (Data Broadcasting). He has presented numerous technical talks and papers at National Association of Broadcasters (NAB), IEEE, Society of Motion Picture Television Engineers (SMPTE), Society of Cable Telecommunications Engineers (SCTE), Public Broadcasting System (PBS), and Society of Broadcast Engineers (SBE) conferences and meetings.
James T. Van Loo received the B.S.E.E. degree from the University of Michigan and the M.S.E.E. degree from the University of Florida. He began his career with the flight simulation component of General Electric, where his interests were algorithm design for flight simulation hardware and algorithms to create the scene content. He then joined Sun Microsystems where he contributed to graphics workstation design. He later was active in multimedia object framework standards where he was one of the authors of the Multimedia Home Platform specification of the Digital Video Broadcast consortium. In his current position at Microsoft Corporation, Redmond, WA, he contributes to television standards specification. His research interest is in the intersection of aesthetics and science.
Gun Bang received the M.S. degree in computer engineering from Hallym University, Chuncheon, Korea, in 1997, and he is currently enrolled in the Ph.D. program of the Computer Science department at Korea University, Seoul, Korea. From 1998 to 1999, he developed a video telephone system based on the H.263 codec at the NADA Research Institute, Seoul, Korea. He has been a Senior Engineer at the Electronics and Telecommunications Research Institute, Daejeon, Korea, since 2000. He has been an active participant in the ATSC T3/S2 Advanced Common Application Platform Specialist Group since 2002. He also participated in the development of an ATSC DASE-based data broadcasting prototype for the FIFA Korea-Japan World Cup in 2002, and he is currently the secretary for the TTA (Telecommunications Technology Association) TC3/PG312 Working Group responsible for the development of a terrestrial data broadcasting standard in Korea. He is also active in ISO/IEC JTC1/SC29/WG11 (MPEG). His current research interests focus on data broadcasting, digital rights management for digital broadcasting, and personalized TV.