Proceedings of the 34th Hawaii International Conference on System Sciences - 2001

A TV Program Generation System Using Digest Video Scenes and a Scripting Markup Language

Yukari Shirota (1,2), Takako Hashimoto (1,2), Akiyo Nadamoto (3), Taeko Hattori (4), Atsushi Iizawa (1,2), Katsumi Tanaka (3), and Kazutoshi Sumiya (5)

(1) Information Broadcasting Laboratories, Inc. {shirota,takako,izw}@ibl.co.jp
(2) Software Research Center, Imaging System Business Group, Ricoh Company, Ltd. {shirota,takako,izw}@src.ricoh.co.jp (The authors are partly on loan from Ricoh Company, Ltd. to Information Broadcasting Laboratories, Inc.)
(3) Division of Information and Media Science, Graduate School of Science and Technology, Kobe University {nadamoto,tanaka}@db.cs.kobe-u.ac.jp
(4) Department of Computer and Systems Engineering, Graduate School of Science and Technology, Kobe University [email protected]
(5) Division of Urban Information Systems, Research Center for Urban Safety and Security, Kobe University [email protected]

Abstract

This paper describes a TV program generation system that uses digest video scenes retrieved from video streams with program indexes. The key features of the system are: (1) TV programs can be dynamically generated from digest video scenes selected according to user preferences. (2) Directions can be added using a happiness or sadness level based on the user preferences. (3) Personalized TV programs can be made for individual viewers. The procedure taken by the system is as follows: (1) conjunctive expressions between scenes are automatically generated, (2) emotional expressions are automatically generated according to user preferences, (3) TV program metaphors are defined, (4) direction templates corresponding to the metaphors are defined, (5) these expressions and definitions are coded using a markup language, and (6) contents such as virtual characters and movies are synchronized. The resulting program can be shown on a TV set.

Key words: digital broadcasting, digest making, TV program direction, emotional expression

1. Introduction

Digital television (TV) broadcasting technology has made rapid progress, and commercial digital TV broadcast services have started in several countries.

In Japan, a commercial digital broadcast service by BS (Broadcast Satellite) is scheduled to start in December 2000. Following the advent of digital satellite broadcasting, preparations are being made for digital terrestrial television broadcasting, with the goal of launching the service in 2003. Digital TV broadcasting technology will bring drastic changes in the usage patterns of conventional TVs. Storing TV program contents on viewers' TV sets will make it possible for viewers to watch their own personalized TV programs, retrieved and reorganized from the stored programs. Viewers can thus watch their own programs in a nonlinear way. One of the key issues in the development of personal TV is that it should be possible to broadcast program indexes and related data as well as conventional TV program content. Here, a program index is data that describes the content of TV programs. Related data includes related Web information as well as EPG (electronic program guide) data. This metadata can also be stored on viewers' terminals and can be utilized for retrieval and reorganization of stored TV programs. By using stored TV program contents and program index data, it becomes possible to retrieve fragments of video scenes in which a viewer is interested. Hashimoto et al. [3,4,5,6] have developed an experimental system for TV digest generation from stored TV contents and program index data. The major functions of the digest generation system are as follows.


(1) Generation of an intuitive index from program index data: The program index data is a collection of pairs of keywords and shot numbers, where the keywords describe each fragment of individual video shots. The granularity of the program index is too fine for viewers to use it directly for video scene retrieval, but our digest generation system can generate more intuitive index data from those keywords.
(2) Retrieval of important scenes by the intuitive index: Using a generated intuitive index, each viewer can retrieve a collection of video scenes matching his or her interests. The result of the retrieval is a collection of video scenes.
In order to transform the collection of retrieved scenes into a TV program, Shirota et al. [9] developed a narration generation system, which generates (a) a narration to connect two video scenes and (b) narrative sentences reflecting a viewer's favorite scenes and his or her viewpoint. Furthermore, Nadamoto et al. [8] developed several technologies to transform favorite Web pages into a TV program: (1) automatic transformation of text with images (e.g., Web pages) into CG animation, which makes it possible to watch Web pages like TV programs, and (2) an XML-based markup language for representing TV program content, called Scripting-XML (abbreviated to S-XML).
Based on the above technologies, in this paper we propose a new type of digital TV service, personalized TV program generation, and describe technical issues concerning personalized-program generation. Our personalized-program generation is based on the above digest/narration generation functions and a new markup language called Personalized Program Markup Language (abbreviated to PPML), which is newly designed for personalized-program generation and incorporates several S-XML functions. Viewers can watch the generated personalized TV program with a specific player called the Personalized Program Viewer (abbreviated to PPV). The key features of our personalized-program generation system are:
(1) TV programs can be dynamically generated from digest video scenes selected according to user preferences.
(2) Directions can be added using a happiness or sadness level based on the user preferences, and directions such as program presenter characters and camera operations can be coded.

(3) Personalized TV programs can be made for individual viewers.
In a word, our approach is a TV program generation technique that uses real video scenes with annotated narrations and virtual animation characters on a CG TV studio set.

2. Related Work

In this section, we provide pointers to research work addressing a variety of issues in TV program generation technology: synchronized multimedia languages and digital TV broadcast services. We also describe our approach in contradistinction to the cited works.

2.1 Markup Languages for TV Program Generation

2.1.1 SMIL Boston

Synchronized Multimedia Integration Language (SMIL) Boston [10,14] builds upon the W3C SMIL 1.0 Recommendation and adds important extensions, including reusable modules, generic animation, improved interactivity, and TV integration, all written in the Extensible Markup Language (XML). The SMIL Boston Working Draft proposes several extensions to SMIL 1.0, such as integration with TV broadcasts, animation functionality, improved support for navigation of timed presentations, and the ability to integrate SMIL markup into other XML-based languages.
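For illustration, the following is a minimal SMIL 1.0-style fragment in which two video scenes play in sequence, each synchronized in parallel with its narration audio. The seq and par elements are standard SMIL, but the file names are placeholders, not examples from the cited work:

  <!-- Minimal SMIL 1.0-style sketch: two scenes played in sequence,
       each with video and narration audio rendered in parallel.
       The src file names are hypothetical. -->
  <smil>
    <body>
      <seq>
        <par>
          <video src="scene1.mpg"/>
          <audio src="narration1.wav"/>
        </par>
        <par>
          <video src="scene2.mpg"/>
          <audio src="narration2.wav"/>
        </par>
      </seq>
    </body>
  </smil>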

2.1.2 TVML

TV program Making Language (TVML) [7,13], proposed by NHK (Japan Broadcasting Corporation) Science and Technical Research Laboratories, is a tool for producing an entire TV program on the desktop. It is a kind of scripting language by which a CG-based TV program is described. A TV program script written in TVML is played like a conventional TV program by the TVML player. That is, a TVML script is translated into CG animation with synthesized speech, virtual camera movement, and real video.

2.1.3 BML and B-XML

In Japan, the XML Working Group of the Association of Radio Industries and Businesses (ARIB) is currently developing an XML-based multimedia content format that can be used for data broadcasting services in both BS and terrestrial broadcasting. The application languages are called BML (Broadcast Markup Language) [1] and B-XML (Broadcast XML). Basically, B-XML is an extended version of BML, with extensions designed to help process any DTDs.

2.1.4 S-XML

We have proposed Scripting-XML (S-XML) [8] for passive Web browsing using the TV-program metaphor. In S-XML notation, content-related tags are separated from style-related tags, because we wish to allow for the possibility that scripts might be presented in more than one style.

2.2 Services for Personalized TV Programs

2.2.1 WebTV

WebTV [15] provides a way to access the Internet using a TV without a computer. Users can send and receive e-mail and surf the Internet while watching TV at the same time. Furthermore, Web pages related to the TV program currently on the air can be displayed automatically. WebTV adopts many techniques for efficient use of TV displays and input devices: gray-scale fonts, alpha-blending, and a simple remote controller interface. WebTV seems to be one of the solutions for the fusion of TV and computers. Web content items are, however, browsed in the conventional read-and-click manner.

2.2.2 TV Anytime

The TV Anytime Forum [12] was launched with the goal of creating a new type of multimedia service that makes the best use of three very different media: the real-time services of broadcasting, the highly flexible services of the Internet, and high-volume digital storage devices. Before the advent of the TV Anytime Forum, there was DAVIC (the Digital Audio-Visual Council) [2]. The standardization of new multimedia services using a storage device, TV Anytime/Anywhere, has been promoted since last year, after completion of the DAVIC 1.5 specifications regarding broadcasting services.


2.3 Our Approach

PPML is a language newly designed for personalized-program generation. We designed PPML incorporating several S-XML functions. In this section, we compare the functions of the markup languages cited above with those of PPML. Table 1 shows the features of each language described in Section 2.1. The comparison is based on four functions: (1) synchronization, (2) display layout, (3) dramatic presentation, and (4) CG animation.

SMIL can describe sequential presentation and parallel presentation using the <seq> and <par> tags, respectively. SMIL is one of the most powerful languages for the production of synchronized multimedia content. However, it cannot handle virtual animation characters like on-line human agents and cannot express dramatic presentation. TVML can handle virtual animation characters, synthesized speech, virtual camera movement, and real video. However, it cannot synchronize contents and cannot specify the layout of elements.

BML/B-XML can synchronize and display contents using tag formats, specifying how to switch text contents on an anchor element. Unfortunately, dramatic presentation and CG animation are not considered. S-XML provides for the description of Web pages. It can describe the behavior of CG animation and synchronization formats. The notation of a course of events (i.e., introduction, development, turn, and conclusion) is provided for dramatic presentation. PPML functions are almost all based on S-XML. Actually, PPML activates SMIL and TVML functions in its processing (see Figure 1). Display layout functions proposed in SMIL are also considered. WebTV aims at reasonable display management of Web contents in TV environments. The content targeted by WebTV consists of ordinary Internet resources (i.e., Web pages), not video programs on the air. TV Anytime is a working proposal for a TV environment that includes a huge variety of storage devices. The aim of TV Anytime is the same as that of our idea. However, reconstruction and abstraction of the program contents are not considered.

3. A TV Program Generation System Using Digest Scenes

In this section, we explain our TV program generation system using digest scenes and the scripting markup language PPML. The whole system consists of the following two subsystems: (1) a personalized digest making system (abbreviated to PDMS), and (2) a TV program generation system based on the use of PDMS. The output data of PDMS becomes the input data for the TV program generation system (see Figure 2). The advantage of recent digital data broadcasting is that it can deliver additional data as indices attached to the TV program contents. Using these indices, PDMS can calculate the importance level parameters, construct a digest of the program, and generate an explanation for each extracted scene. The output data from PDMS, which is the input data to the TV program generation system, consists of the following three-tuple data: (1) video scene range, (2) superimposition, and (3) importance level parameters. The video scene range (1) points to the start/end frame numbers in the video contents. The superimposition (2) is an explanation for the scene. The importance level parameters (3) express the importance level of the scene parts. One program digest consists of multiple video scenes. For example, in a baseball game, there are several important and interesting scenes. Every extracted scene has the above three-tuple data.
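As a rough sketch only (the paper does not give the concrete markup of the PDMS output), the three-tuple for one extracted scene might be represented as follows; all tag and attribute names and the importance value are hypothetical, while the frame numbers and superimposition text are taken from the example in Section 3.2.1:

  <!-- Hypothetical representation of the PDMS output for one extracted scene;
       names and the importance value are illustrative only. -->
  <scene>
    <range start="10111" end="25443"/>          <!-- video scene range: start/end frame numbers -->
    <superimpose>1st inning Carp: The batter Brown solo home run.</superimpose>
    <importance level="0.8"/>                   <!-- importance level parameter -->
  </scene>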

Table 1: Comparison Table of Markup Languages for TV Program Generation

             Synchronization   Display Layout   Dramatic Presentation   CG Animation   Target Object
SMIL         Good              Good             N/A                     N/A
TVML         N/A               N/A              Poor                    Good
BML/B-XML    Good              Good             N/A                     N/A
S-XML        Good              N/A              Medium                  Good           Web
PPML         Good              Good             Good                    Good

Figure 1: Relationship between S-XML and PPML (diagram). In the scripting layer, S-XML and PPML describe emotional expressions and the retrieval result data from program contents databases; in the presentation layer, the TVML player and the SMIL player render the result, with synchronization control on a display.

3.1 Automatic Generation of Explanations

In the development of PDMS, we developed explanation generation methods; the details of the methods are described in [3,4,5]. However, the generated explanations were too simple and poorly expressed.


Figure 2: System Architecture of the TV Program Generation System (diagram). The personalized digest making system (PDMS) supplies the video scene range, superimposition, and importance level parameter for each digest scene; the explanation generation module adds a prescript, an event script, a postscript, and an emotional level parameter, from which emotional IDs are selected through an emotional ID mapping table and TV program direction templates are selected through a direction template mapping table. As preparatory operations, TV program metaphor definitions and TV program direction template definitions are written in PPML, stored in the TV program metaphor DB and the TV program direction template DB, and retrieved by the PPML interpreter.

We consider that our target users are ordinary people, including children and the elderly, who cannot be expected to be familiar with computer output expressions that are not user-friendly or easy to understand. Thus, it is necessary to explore a method for translating those hard-to-understand explanations into a more user-friendly form for ordinary users. Our goal is to make explanations that enable users to sympathize with the contents. With this in mind, we have developed the explanation generation module in the following ways [9]:
(a) Conjunctive expression: The system focuses on the relationship between scenes and creates conjunctive relationships by comparing each scene with both the previous scene and the following scene. With the conjunctive expressions, the explanations become more natural and easier for TV viewers to understand.
(b) Emotional (standpoint-dependent) expression: In addition, we also generate emotional expressions for the explanations. While the existing method outputs only neutral expressions, our new method generates an expression that includes a happiness or sadness level based on user preferences. The key point is how to calculate the viewer's emotional level. If the viewer's preferences are clear, for example, if the viewer is a supporter of the Giants, then it is not so difficult to calculate the emotional level. As far as we know, existing computer-based systems have not included this standpoint-dependent approach.
(c) Hierarchical expression: To make the expressions more natural, our system uses structural data from the target contents. In almost all contents, the structure is a hierarchy. If the structure is a hierarchy, the system generates explanations step by step, from the top level to the bottom level. For example, in the case of a baseball game, there is a consist-of hierarchy: "game" - "inning" - "at bat" - "pitching." While an announcer briefly describes the game contents, following the hierarchy, he first explains the whole game and then explains each inning step by step. In addition, there are three kinds of explanations corresponding to one event, such as a home run or a slam dunk: (1) a preface to explain the circumstances before the event happens, (2) the event itself, and (3) an afterword to explain the resultant circumstances after the event has happened.
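One conceivable way to encode the selection of conjunctive and emotional expressions (purely an illustration; the paper does not specify such a format) is a small mapping table keyed on the viewer's emotional level and on how it changes between consecutive scenes, using phrases that appear in Figure 3:

  <!-- Hypothetical expression mapping; element names, attributes, and the
       level/trend scheme are illustrative only. The phrases are quoted
       from Figure 3. -->
  <expressionMap>
    <emotion level="positive" phrase="Great!"/>
    <emotion level="negative" phrase="Too bad!"/>
    <conjunction trend="continues" phrase="Furthermore"/>
    <conjunction trend="continues" phrase="Moreover"/>
    <conjunction trend="reverses"  phrase="But,"/>
  </expressionMap>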

By introducing these three kinds of expressions, we have developed an automatic explanation generation module in our TV program generation system (see the right part of Figure 2). Figure 3 illustrates an example of the generated explanations that would accompany a baseball game; for instance:


"On October third, the game Giants vs. Carps was held in Tokyo Dome stadium. Carp's 1st inning” // Suppose that the viewer is a supporter of the team Carp. “The batter Brown hit a solo home run. It’s great! Carp started the scoring 1-0.” “The top of 1st inning ended, Carp started the scoring 1-0.”

The output data of the explanation generation module consists of the following four-tuple data: (1) preface explanation, (2) event explanation, (3) afterword explanation, and (4) emotional level parameter. The explanations (1), (2), and (3) are words that a virtual character announcer will speak; they already incorporate the emotional expressions, the conjunctive expressions, and the hierarchical structure expressions. The emotional level parameter (4) is a number calculated by the explanation generation module. If the value is positive, the parameter shows the happiness or satisfaction level of the viewer; if the value is negative, it shows the unhappiness or discontent level of the viewer. The parameter can be used to decide which directions should be selected. For example, if the parameter shows that the viewer's feelings are happy, then happy and bright stage effects should be selected. The current version of the digest system outputs only one emotional level parameter. In the future, however, we will be able to describe and input multiple parameters to express emotions in more detail.
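As an illustration of how the emotional level parameter could drive direction selection through the emotional ID mapping table of Figure 2 (the element names, thresholds, and effect descriptions below are hypothetical; the paper only states that positive values indicate happiness and negative values unhappiness):

  <!-- Hypothetical emotional ID mapping table; names, thresholds, and effects
       are illustrative only. Positive levels select happy, bright stage
       effects; negative levels select subdued ones. -->
  <emotionalIdMap>
    <entry minLevel="3"               emotionalId="delighted" effect="bright lighting, cheerful fanfare"/>
    <entry minLevel="0" maxLevel="3"  emotionalId="pleased"   effect="normal lighting, upbeat jingle"/>
    <entry maxLevel="0"               emotionalId="unhappy"   effect="dim lighting, quiet music"/>
  </emotionalIdMap>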

Figure 3: An Example of Generated Explanations (diagram). The figure arranges explanations for a game on Oct. 3rd, Giants vs. Carp in Tokyo Dome, along the hierarchy game - top of the 1st - bottom of the 1st - top of the 2nd (hierarchical expression); connects them with conjunctions such as "But,", "Furthermore", and "Moreover" (conjunctive expression); and attaches interjections such as "Great!", "Oops!", and "Too bad!" that reflect the Carp supporter's standpoint (emotional expression). Example event explanations include "The batter Etoh hit a solo home run, 1 point was added," "The batter Takahasi got an RBI (runs batted in) hit," "The batter Kawai got an RBI hit," "The batter Matsui hit a three-run home run, 3 points were added," "The top of the 1st inning ended, Carp started the scoring 1-0," and "The bottom of the 1st inning ended, Giants reversed."

3.2 Markup Language PPML

Now we will explain the scripting markup language PPML. As shown in Figure 2, PPML is used for three kinds of definitions: (1) TV program metaphor definition, (2) digest scene definition, and (3) TV program direction template definition.

3.2.1 Digest Scene Definition

The TV program generation system internally produces a digest scene definition file by compiling the output of PDMS and the output of the explanation generation module (see the center of Figure 2). We show an example of a digest scene definition entry in the following (scene range followed by superimposition):
10111 25443 "1st inning Carp: The batter Brown solo home run. Carp started the scoring 1-0."
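The markup surrounding these values is not reproduced above. Purely as a sketch, a digest scene definition entry that compiles the PDMS output and the explanation generation module's output might look as follows; all tag and attribute names are hypothetical, the range and superimposition come from the example above, the scripts from Figure 3, and the emotional level value is invented:

  <!-- Hypothetical digest scene definition entry; tag and attribute names are
       illustrative only, not PPML's actual syntax. -->
  <digestScene>
    <range start="10111" end="25443"/>
    <superimpose>1st inning Carp: The batter Brown solo home run. Carp started the scoring 1-0.</superimpose>
    <prescript>Carp's 1st inning.</prescript>
    <eventscript>The batter Brown hit a solo home run. It's great!</eventscript>
    <postscript>The top of the 1st inning ended; Carp started the scoring, 1-0.</postscript>
    <emotionalLevel value="3"/>
  </digestScene>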

3.2.2 TV Program Metaphor Definition

Before the system begins automatic TV program generation, a system operator has to prepare the following files in advance: a TV program metaphor definition file and a TV program direction template definition file. First, we explain the TV program metaphor definition. The digest scenes are finally shown as a TV program. We call the style of a TV program the "program metaphor" [8,11]. For example, there are a news program metaphor, a discussion program metaphor, and a variety program metaphor. The "news program metaphor" determines that the content should be presented as a news show, where an anchor person (possibly with an assistant) narrates the content. The "variety program metaphor" determines that the content should be presented as a variety show, where several funny characters present the content accompanied by amusing behavior. According to each program metaphor, the system operator prepares a corresponding template script by which the specified content is instantiated and presented. For example, appropriate studio sets and camera work specifications are prepared for each program metaphor. The corresponding template script is called a "metaphor definition file" in our system. The important parameters of a metaphor definition file are as follows:
(1) metaphor ID: The "metaphor ID" describes which style is defined.
(2) commentator: Characters who figure in a TV program, such as a main commentator. The parameters are represented as "commentatori (i=1,2,3,...)."
(3) props: Props that appear in a TV program, such as a chalkboard. The parameters are represented as "propi (i=1,2,3,...)."
(4) image and sound: Image files and music files, including sound effect files. The parameters are represented as "imagei" and "soundi (i=1,2,3,...)."


These parameters defined in a metaphor definition file are used in writing TV program direction templates. The parameters must be consistent throughout the final TV program. If a commentator is not defined in the metaphor definition file, we cannot use that commentator in the direction templates. In other words, data consistency is required between a metaphor definition file and the direction templates. For consistency, the system operator writes the "metaphorID" at the top of both the file and the template. The system finds the candidate templates with the same "metaphorID" as the "metaphorID" defined in the metaphor definition file. We show a fragment of a TV program metaphor definition file in the following:
one anchor person
// studio set and virtual characters
file="/usr/…/spark.iv"
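Because the markup of the fragment above is not reproduced, the following is only a sketch of what a complete metaphor definition file could look like, using the parameters listed in Section 3.2.2; every element and attribute name, the metaphor ID, and all file names other than /usr/…/spark.iv are hypothetical:

  <!-- Hypothetical metaphor definition file; element and attribute names are
       illustrative only, not PPML's actual syntax. -->
  <metaphorDefinition metaphorID="news">
    <commentator1 name="anchor"/>                 <!-- one anchor person -->
    <studioSet file="/usr/…/spark.iv"/>           <!-- studio set and virtual characters -->
    <prop1 name="chalkboard"/>
    <image1 file="title.png"/>
    <sound1 file="opening_jingle.wav"/>
  </metaphorDefinition>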

A TV program direction template refers to these parameters and to the digest scene definition through "&" references. The following fragment illustrates such a template:
// parallel
sound=&sound1
// movie
area="display", filename=&movie
// event script
name=&commentator1, text=&eventscript
name=commentator1, pitch=3
area="display", text=&superimpose
// postscript
name=&commentator1, text=&postscript
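As a final sketch of how such a template might be instantiated (the notation and all substituted file names are hypothetical; the paper does not show this step), the PPML interpreter could replace each "&" reference with the corresponding value from the metaphor definition file and the digest scene definition:

  // Hypothetical instantiation of the direction template above; illustrative only.
  // parallel
  sound="opening_jingle.wav"                                                  // bound from &sound1
  // movie
  area="display", filename="scene_10111_25443.mpg"                            // bound from &movie
  // event script
  name="anchor", text="The batter Brown hit a solo home run. It's great!"     // bound from &commentator1 and &eventscript
  name="anchor", pitch=3
  area="display", text="1st inning Carp: The batter Brown solo home run. Carp started the scoring 1-0."   // bound from &superimpose
  // postscript
  name="anchor", text="The top of the 1st inning ended; Carp started the scoring, 1-0."                   // bound from &postscript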
