Automatic Detection of Flash Movie Genre using Bayesian Approach


Dawei Ding1, Jun Yang3
Qing Li1, Liping Wang1, Liu Wenyin2
{dwding, itqli, csliu}@cityu.edu.hk, [email protected], [email protected]

1 Dept. of Computer Engineering and Information Technology, City University of Hong Kong, HKSAR, China
2 Dept. of Computer Science, City University of Hong Kong, HKSAR, China
3 Language Technology Institute, School of Computer Science, Carnegie Mellon University

Abstract
As Flash, a relatively new rich media format, becomes more and more popular on the Web, genre becomes increasingly important for Flash movie management as a complement to topical principles of classification. Genre classification can identify Flash movies authored in a style most likely to satisfy a user's information need. In this paper, we present a method for detecting the genre of Flash movies quickly and easily by employing a Bayesian approach. A feature set for representing genre information is proposed and used to build automatic genre classification algorithms. The performance of the proposed approach is evaluated by training a Bayesian classifier on real-world data sets. Classification results from our experiments on thousands of Flash movies demonstrate the usefulness of this approach.

Keywords: Genre detection, Bayesian classifier, Flash movie

1. Introduction
Flash™, introduced by Macromedia Inc., is a relatively new format for vector-based interactive movies, which can be embedded in web pages and delivered over the Web. Since its advent in 1997, Flash has experienced remarkable growth on a global scale and has become a prevailing media format on the Web. Statistics from Macromedia [1] state that by June 2003, 97% (or 436 million) of Internet users were able to view Flash movies using Macromedia Flash Player, the rendering tool of Flash. Seven of the top 10 global websites, and 62% of the top 50, have adopted Flash content in their pages, including AOL Time Warner Network, Yahoo!, and eBay. Besides their predominant presence as multimedia and interactive components of static, textual websites, Flash movies are being created in a variety of genres, such as cartoons, commercial presentations, MTV movies, and computer games. A genre class is a group of movies that are written in a similar style, and hence is an important factor in retrieving useful movies and

0-7803-8603-5/04/$20.00 ©2004 IEEE.

focus on the Flash movie genre dimension of subjectivity. Up to now, genre classification for Flash has had to be performed manually. Therefore, techniques for automatic genre classification would be a valuable addition to the development of Flash retrieval systems.

2. Related Work
As mentioned in the introduction, despite the fast growth of Flash media on the Web, very limited research has been devoted to Flash retrieval and classification. As the first endeavor in content-based Flash retrieval, Yang et al. [4] proposed a generic framework named FLAME, which embodies a 3-tier architecture for the representation, indexing, and rudimentary retrieval of Flash movies on the Web. The semantic model suggested by Ding et al. [3] shows the potential of leveraging co-occurrence analysis of elements in the context of scenes to improve the performance of Flash retrieval. Although manually annotated genre information has been used to manage Flash movies on the Web, to the best of our knowledge there is no prior published work on automatic genre classification of Flash movies.

On the other hand, a number of works have been devoted to the genre classification of other multimedia data, with specific approaches proposed for each type of data. Kessler, Nunberg, and Schütze [5] use punctuation as one of the surface cues for classifying text into genres. Truong et al. [2] suggested a set of computational features, including editing effects, motion, and color, for automatic video categorization. For audio, Tzanetakis et al. [6] classify music given as raw audio into genres based on musical surface and rhythm features.

3. Flash Classification

3.1. Genre Definition

Conceivably, genre information is a very important type of semantic feature that can be used to index Flash movies and to compose effective queries. In particular, it allows users to reduce their “search space” substantially by choosing the genre they are interested in or by browsing the movie collection by genre. Its importance can be seen from the extensive use of “topic genres” or “directories” in commercial search engines like Google. Therefore, we show in this section how to extract such information from a combination of low-level and high-level features using machine-learning techniques. Specifically, we define 7 genres for Flash movies, primarily by their purpose and also by their appearance, namely Game, Music TV (MTV), Cartoon, Interface, Banner/Logo, Intro, and Others. The first 3 genres carry their common-sense meanings. “Interface” movies are those used as standalone, online interfaces (rather than as components in an interface), which support all the functions and interactions available in a web page. “Banner/Logo” movies are commercial advertisements in the form of a banner or logo embedded in web pages. An “Intro” movie is a short snippet that introduces something, which can be a product, a movie, a company or studio, or even a family member. It can be regarded as an advertisement in a broad sense, but unlike a banner or logo movie it has a temporal duration. The movies that do not belong to any of the above genres are put under “Others”.

3.2. Feature Extraction
The classification of Flash movies follows a standard machine learning approach: first, a classifier is trained on a training set of manually classified movies; the classifier is then applied to label unclassified movies. The specific features used to train the classifier are derived by observing the characteristics of the movies in each genre as well as the differences between movies of different genres. The observations made on each Flash genre are detailed in Table 1. Based on these observations, we select the following features of Flash movies to train the classifier:

1. Movie length in terms of frame count
2. Size of the Flash (.SWF) file
3. Area of a movie frame as the product of frame width and height
4. Ratio of the frame width to its height
5. Amount of user interactions per frame
6. Amount of action scripts per frame
7. Average number of hyperlinks (clickable objects) per frame
8. Number of event sounds normalized by the frame count
9. Whether the movie has a long streaming sound
10. Whether the movie contains embedded images and/or videos

All of these features are computational in the sense that they can be automatically extracted from the raw data file of a Flash movie, or specifically from the features of the three basic movie elements, namely objects, actions, and interactions [4]. JavaSWF (http://www.javaswf.org), a Flash-to-XML converter, is used to convert the binary contents of a Flash movie into a series of encoded XML tags, so that the details of how the movie is organized and how its plot unfolds can be recognized by our analysis program. The extraction methods for these features are explained in turn as follows.

• Movie Information
Certain additional elementary features about the Flash movie itself can be rather useful from the retrieval point of view:

Table 1: Characteristics of the movies in each genre
Columns: Genre | File Size | Movie Size | Length | Stream Sound | Event Sound | Interaction | Script | Hyper Links | Image/Video
Rows: Game, MTV, Cartoon, Interface, Banner/Logo, Intro
[The cell values (Large, Small, >Small, >Medium, Long, Short, Rich, Limited, Yes, No) cannot be aligned with their rows and columns in this copy.]

1. Length: The temporal length of a Flash movie, which is effectively the length of the movie's main timeline.
2. Frame Size: The combination of the width and height of the frames in a Flash movie.
3. Sound: The number and length of the stream sounds and event sounds, which can be extracted when parsing a Flash movie.

• Scene Complexity
Based on our understanding of the authoring process of Flash movies, we find that the scene complexity of a movie generally indicates some of the producer’s intentions in making it. A Flash movie may incorporate many kinds of media objects, e.g., video, images, vector graphics, and text. In many cases, whether or not a movie is composed with complex visual features may affect how its meaning is expressed. Scene complexity is computed from 3 factors:
1. Average number of images: The average number of images per frame of the movie.
2. Average number of colors: The number of colors used in the movie, which indicates to what degree the movie can be seen as, e.g., “vivid”.
3. Average number of vertices: A measure of the complexity of the vector graphics. It reflects the design style of the movie, e.g., “sketch” or “complex”.

• Interactivity
This feature measures the interactivity of a Flash movie by the amount and complexity of the user interactions it involves. It describes the number of actions in a movie relative to its length, in particular the numbers of active and passive actions. It is computed from 2 factors:
1. Number of passive actions: A passive action is a “chance” for the movie to interact with a user by waiting for a user action, such as clicking a button. After the Flash movie is parsed, all of its actions are analyzed and the actions related to user interaction are recorded.
2. Number of active actions: An active action is an action that can be performed actively by the movie. This factor is in fact the number of scripts contained in the Flash movie, and it measures whether the movie is “script-driven” or not.
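As a rough illustration of how the ten features listed in this section could be assembled into a feature vector, the following Python sketch starts from a hypothetical per-movie summary record. The `ParsedMovie` class, its field names, and the 50% cutoff used to decide that a streaming sound is "long" are assumptions made for this example only; the paper does not specify them, and the actual values would come from the JavaSWF XML output.

```python
from dataclasses import dataclass


@dataclass
class ParsedMovie:
    """Hypothetical summary of one parsed .SWF file (all field names are assumptions)."""
    frame_count: int          # length of the main timeline, in frames
    file_size_bytes: int      # size of the .SWF file
    frame_width: float        # must be positive
    frame_height: float       # must be positive
    passive_actions: int      # user-interaction "chances", e.g. button clicks
    active_actions: int       # action scripts contained in the movie
    hyperlinks: int           # clickable objects
    event_sounds: int
    stream_sound_frames: int  # frames covered by streaming sound
    embedded_media: int       # embedded images and/or videos


# Assumed cutoff: a stream sound counts as "long" if it covers at least
# half of the timeline. The paper does not give this threshold.
LONG_STREAM_FRACTION = 0.5


def extract_features(movie):
    """Return (real_valued, discrete) features in the order of the list above."""
    frames = max(movie.frame_count, 1)  # guard against empty timelines
    real_valued = [
        float(movie.frame_count),                 # 1. movie length
        float(movie.file_size_bytes),             # 2. file size
        movie.frame_width * movie.frame_height,   # 3. frame area
        movie.frame_width / movie.frame_height,   # 4. width-to-height ratio
        movie.passive_actions / frames,           # 5. interactions per frame
        movie.active_actions / frames,            # 6. scripts per frame
        movie.hyperlinks / frames,                # 7. hyperlinks per frame
        movie.event_sounds / frames,              # 8. event sounds per frame
    ]
    discrete = [
        int(movie.stream_sound_frames / frames >= LONG_STREAM_FRACTION),  # 9
        int(movie.embedded_media > 0),                                    # 10
    ]
    return real_valued, discrete
```

Keeping the real-valued and discrete features in two separate lists mirrors the split into the sets of real-value and discrete features that the classifier in Section 3.3 relies on.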

3.3. Bayesian Classifier
As can be seen, the feature set is a mixture of low-level, primitive features and high-level, semantic features. Moreover, while some of the features are real-valued, others are discrete (or binary). This makes the choice of classifier particularly difficult, as many classifiers can deal with either exclusively real-valued inputs or exclusively discrete inputs. Our investigation leads us to use the naïve/Gaussian Bayesian classifier, which can take both real-valued and discrete inputs and produce categorical outputs. With this classifier, our classification problem can be formulated as the solution of equation (1), which means that the predicted genre $f^*_{class}$ of a given movie is the one with the maximum probability, given the set of real-valued features $F_R = \{f_1, \ldots, f_m\}$ and the set of discrete features $F_D = \{f_{m+1}, \ldots, f_n\}$ of the movie.

$$f^*_{class} = \arg\max_i P(f_{class} = i \mid F_R, F_D) = \arg\max_i \frac{P(F_R, F_D \mid f_{class} = i)\, P(f_{class} = i)}{P(F_R, F_D)} \tag{1}$$

In equation (1), $P(F_R, F_D)$ can be removed since it is unaffected by the choice of $i$. Moreover, if we assume that all the features are independent given the genre and that each real-valued feature $f_j$ for the movies in the $i$th class (i.e., $f_{class} = i$) conforms to the Gaussian distribution $N(\mu_{ij}, \sigma_{ij}^2)$ with mean $\mu_{ij}$ and variance $\sigma_{ij}^2$, the equation can be expanded as:

$$f^*_{class} = \arg\max_i P(f_{class} = i) \prod_{j=1}^{m} N(f_j; \mu_{ij}, \sigma_{ij}^2) \prod_{j=m+1}^{n} P(f_j \mid f_{class} = i) \tag{2}$$

where $P(f_{class} = i)$ is the prior probability of a given movie belonging to the $i$th genre, computed as the fraction of training samples in the $i$th genre among all the samples. $P(f_j \mid f_{class} = i)$ ($j > m$) is the probability that the $j$th (discrete) feature of a movie in genre $i$ has the value $f_j$; it is estimated as the fraction of the sample movies in the $i$th genre whose $j$th feature equals $f_j$. $N(f_j; \mu_{ij}, \sigma_{ij}^2)$ is the Gaussian probability density function, where $\mu_{ij}$ is computed as the mean of $f_j$ among the movies in the $i$th genre and $\sigma_{ij}^2$ as the variance of $f_j$ among these movies.

In practice, we do not include the genre “Others” in the training and application of the classifier, since this genre is just the “garbage collector” for movies without a salient genre. Moreover, we do not put every movie into its “most probable” genre; rather, our algorithm defines a threshold and assigns a movie to its most suitable genre only if its probability (of belonging to this genre) exceeds the threshold; otherwise the movie is classified as “Others”.

4. Performance Evaluation
The performance of Flash classification using the Bayesian classifier has been tested on real-world movies collected by our crawler. Among tens of thousands of collected movies, we randomly select 2,000 movies and manually classify them into the 6 genres described in Section 3.1, i.e., all genres except “Others”. (The movies are selected such that every movie belongs to one of the 6 genres.) The result of the manual classification is the ground truth of our experiment. Notably, the distribution of the movies among the genres is very uneven, with genre “Interface” having 843 movies and “MTV” fewer than 50. This unevenness does not compromise the accuracy of our experiments. Instead, it justifies the introduction of the prior probability of each genre $P(f_{class} = i)$ (see Section 3.3) in the classifier, which is estimated from this distribution. Given that the training samples are chosen randomly, this estimate describes the probability of a movie belonging to a certain genre without knowing the content of that movie.

The experiment is conducted using 10-fold cross-validation. We randomly divide the 2,000 training movies into 10 equal groups of 200 movies each. In each round, we use 9 groups to train the Bayesian classifier (i.e., to estimate the parameters of Equation (1)), then use the trained classifier to predict the genre of the movies in the remaining group. The classification accuracy of a round is computed as the fraction of correctly classified movies among the 200 movies in the testing group. We repeat the experiment for 10 rounds, each time using a different group as the testing data (and the rest as training data). The classification accuracy averaged over the 10 rounds is plotted in Figure 1.

As we can see, the classification accuracy is around 80% for 4 genres (Banner/Logo, Cartoon, Game, and MTV), while the accuracy on “Interface” is only about 40%, which drags the average accuracy down to 72.4%. A possible reason for the poor performance on “Interface”, as we observe, is the lack of salient features for the movies in this genre. Despite this, the overall performance is rather decent. In a practical context where “recall” is not the main concern, we can further improve the classification by assigning a movie to a genre only if its probability of belonging to this genre is significantly larger than its probability of belonging to any other genre. If a movie has several “most probable” genres, we can either put it into these genres simultaneously or just put it under “Others”. This is not a problem in the context of a search engine, since each genre contains a very large number of movies, so missing some of them makes little difference in practice.

Figure 1: Classification accuracy for various genres (vertical axis: classification accuracy, 0 to 0.9; horizontal axis: categories Banner & Logo, Cartoon, Game, Interface, Intro, and MTV)
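To make the classifier of Section 3.3 and the 10-fold protocol described above more concrete, here is a minimal Python sketch, not the authors' implementation. It assumes features arrive as two parallel lists per movie (real-valued and discrete). The class name `MixedNaiveBayes`, the function `cross_validate`, the variance floor, and the small probability floor for discrete values never seen in training are all assumptions added for numerical safety; the paper itself uses raw fractions. Scores are computed in log space, which selects the same genre as the product in equation (2).

```python
import math
import random
from collections import Counter


class MixedNaiveBayes:
    """Naive Bayes with Gaussian real-valued and categorical discrete features."""

    def __init__(self, threshold=0.0, min_variance=1e-9, floor=1e-9):
        self.threshold = threshold        # posterior must exceed this value
        self.min_variance = min_variance  # assumed guard against zero variance
        self.floor = floor                # assumed probability for unseen values

    def fit(self, real_rows, disc_rows, labels):
        counts = Counter(labels)
        self.classes = sorted(counts)
        # Prior: fraction of training samples in each genre.
        self.prior = {c: counts[c] / len(labels) for c in self.classes}
        self.gauss, self.tables = {}, {}
        for c in self.classes:
            idx = [k for k, y in enumerate(labels) if y == c]
            params = []
            for j in range(len(real_rows[0])):
                vals = [real_rows[k][j] for k in idx]
                mu = sum(vals) / len(vals)
                var = sum((v - mu) ** 2 for v in vals) / len(vals)
                params.append((mu, max(var, self.min_variance)))
            self.gauss[c] = params
            # Discrete likelihood: fraction of the genre's movies with each value.
            self.tables[c] = [
                {v: n / len(idx) for v, n in Counter(disc_rows[k][j] for k in idx).items()}
                for j in range(len(disc_rows[0]))
            ]
        return self

    def log_scores(self, real, disc):
        """Logarithm of the product maximized in equation (2), per genre."""
        scores = {}
        for c in self.classes:
            s = math.log(self.prior[c])
            for x, (mu, var) in zip(real, self.gauss[c]):
                s += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
            for v, table in zip(disc, self.tables[c]):
                s += math.log(table.get(v, self.floor))
            scores[c] = s
        return scores

    def predict(self, real, disc):
        scores = self.log_scores(real, disc)
        best = max(scores, key=scores.get)
        # Posterior of the best genre, normalized over all genres.
        total = sum(math.exp(s - scores[best]) for s in scores.values())
        return best if 1.0 / total > self.threshold else "Others"


def cross_validate(real_rows, disc_rows, labels, folds=10, seed=0):
    """Mean accuracy over `folds` rounds, each holding out one group."""
    order = list(range(len(labels)))
    random.Random(seed).shuffle(order)
    accuracies = []
    for f in range(folds):
        test = order[f::folds]
        held_out = set(test)
        train = [k for k in order if k not in held_out]
        model = MixedNaiveBayes().fit(
            [real_rows[k] for k in train],
            [disc_rows[k] for k in train],
            [labels[k] for k in train],
        )
        hits = sum(model.predict(real_rows[k], disc_rows[k]) == labels[k] for k in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / folds
```

With the default `threshold=0.0`, every movie is assigned to its most probable genre, which corresponds to the evaluation setting; a higher threshold sends low-confidence movies to “Others”, trading recall for precision as discussed above.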

5. Conclusion
As the number and variety of Flash movies on the Web increase, tools for the automatic organization of this content have to be created. In this regard, genre information is one of the most distinguishing features for Flash retrieval. In this paper, we have presented an approach that analyzes the structural features of Flash movies and integrates this information into automatic genre detection using a Bayesian classifier.

6. References
[1] Flash adoption statistics. http://www.macromedia.com/software/player_census/flashplayer
[2] Ba Tu Truong, Svetha Venkatesh, and Chitra Dorai. Automatic Genre Identification for Content-Based Video Categorization. In Proc. International Conference on Pattern Recognition (ICPR'00), pp. 4230-4233, September 2000.
[3] Dawei Ding, Qing Li, Bo Feng, and Liu Wenyin. A Semantic Model for Flash Retrieval Using Co-occurrence Analysis. In Proc. ACM Multimedia 2003, Berkeley, CA, November 2003.
[4] Jun Yang, Qing Li, Liu Wenyin, and Yueting Zhuang. Search for Flash Movies on the Web. In Proc. 3rd International Conference on Web Information Systems Engineering (Workshops) (WISEw'02), IEEE Computer Society, December 2002.
[5] Brett Kessler, Geoffrey Nunberg, and Hinrich Schütze. Automatic Detection of Text Genre. In Proc. ACL/EACL, 1997.
[6] G. Tzanetakis, G. Essl, and P. Cook. Automatic Musical Genre Classification of Audio Signals. In Proc. Int. Symposium on Music Information Retrieval (ISMIR), Bloomington, Indiana, 2001.
