Late-Breaking Work: Engineering of Interactive Systems
#chi4good, CHI 2016, San Jose, CA, USA
Recommending Movies Based on Mise-en-Scène Design
Yashar Deldjoo Politecnico di Milano
[email protected]
Franca Garzotto Politecnico di Milano
[email protected]
Mehdi Elahi Politecnico di Milano
[email protected]
Pietro Piazzolla Politecnico di Milano
[email protected]
Paolo Cremonesi Politecnico di Milano
[email protected]
Abstract In this paper, we present ongoing work that will ultimately result in a movie recommender system based on the mise-en-scène characteristics of movies. We believe that users' preferences for movies can be well described in terms of the mise-en-scène, i.e., the design aspects of movie making that influence aesthetics and style. Examples of mise-en-scène characteristics are lighting, colors, background, and movements. Our recommender system opens new opportunities for the design of user interfaces that offer a personalized way to search for interesting movies through the analysis of film styles, rather than through the traditional classification of movies by explicit attributes such as genre and cast.
Author Keywords movie recommendation, film making, video processing
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). Copyright is held by the author/owner(s). CHI’16 Extended Abstracts, May 7–12, 2016, San Jose, CA, USA. ACM 978-1-4503-4082-3/16/05. http://dx.doi.org/10.1145/2851581.2892551
Introduction Recommender Systems (RSs) are applications capable of filtering large amounts of information and selecting the items that are likely to be attractive to users [18]. In particular, they play an important role in video-on-demand web applications (e.g., YouTube and Netflix) characterized by huge catalogs of movies: the ultimate aim of RSs is to find and recommend to users the movies that are most likely to appeal to them. However, RSs cannot make relevant recommendations of movies before some information about these movies is available. Recommendations are typically made using implicit and explicit preferences of users on movies’ attributes (e.g., genre, director, and actors) [6]. However, users’ preferences can also be described by the mise-en-scène characteristics of movies [8, 7], i.e., the design aspects of a movie production that characterize its aesthetics and style. Lighting, colors, background, and movements in a movie are all examples of mise-en-scène features. Although viewers may not consciously notice movie style, it still influences their experience of the movie. The mise-en-scène highlights similarities in the narratives, as movie makers typically shape the overall movie style to reflect the story, and it can be used to categorize movies at a finer level than the traditional movie features [7]. In this research work, we propose to exploit automatically extracted visual features of movies, based on mise-en-scène design characteristics, in the context of recommender systems. We propose a novel recommender system that automatically analyzes video content and extracts a set of representative stylistic visual features grounded in Applied Media Aesthetics [22], i.e., the theory concerned with the relation between aesthetic media attributes (e.g., light, camera movements, and colors) and the perceptual reactions they are able to evoke in consumers of media communication, particularly movies. Our results pose new challenges for the design of user interfaces able to integrate the stylistic features of movies into a comprehensive and practical recommender system, as the perceived quality of a recommender system is determined by its algorithm as well as by its usability [5, 4, 3].
This is a novel and multidisciplinary approach to video recommendation systems, from both the design and the engineering perspectives. We expect it to be relevant both to the research area and to industry applications such as social video sharing. More specifically, in this work we have conducted a preliminary data analysis in order to investigate two conjectures: (i) whether stylistic visual features extracted from trailers are a good representation of the corresponding features extracted from the original full-length movies, and (ii) whether the stylistic visual features are informative indicators of the movies. We briefly present the results of the analysis and discuss the ultimate goals we pursue. An extended version of this work has been published in [7].
Technical Background A prerequisite for RSs is the availability of information about “explicit” content features of the items. In movie RSs, such features are associated with the items as structured meta-information (e.g., movie genre, director, and cast) or unstructured meta-information (e.g., plot, tags, and textual reviews). In contrast, we propose a stylistic-based movie recommendation technique that exploits “implicit” content characteristics of items, i.e., features that are “encapsulated” in the items and must be computationally “extracted” from them. For example, two movies may belong to the same genre and still differ in movie style. “The Fifth Element” and “War of the Worlds” are both sci-fi movies about an alien invasion. However, they are shot completely differently, with Luc Besson (The Fifth Element) using bright colors while Steven Spielberg (War of the Worlds) prefers dark scenes. Although a viewer may not consciously notice the two different movie styles, they still affect the viewer’s experience of the movie. There are countless ways to create a movie based on the same script simply by changing the mise-en-scène [11].
Figure 1: Above: Out of the Past (1947), an example of highly contrasted lighting. Below: The Wizard of Oz (1939), an example of flat lighting.
Furthermore, mise-en-scène characteristics of the movies can bring additional benefits to RSs. For example, mise-en-scène can be used to tackle the Cold Start problem, which occurs when the system is unable to accurately recommend a new item to the existing users [10]. This situation typically occurs in social movie-sharing web applications (e.g., YouTube), where users upload a very large volume of video every day, much of which may come with no meta-data and no user preferences. Traditional techniques would fail to consider these new items, even if they may be relevant for recommendation purposes, as the recommender has no content to analyze other than the video files. To the best of our knowledge, this problem has not yet been effectively solved [19].
Artistic Motivation In this section, we describe the artistic background to the idea of stylistic visual features for movie recommendation. We do this by describing the stylistic visual features from an artistic point of view and explaining the relation between these visual features and the corresponding aesthetic variables in the movie-making domain.
Figure 2: Above: an image from Django Unchained (2012). The red hue is used to heighten the scene's sense of violence. Below: an image from Lincoln (2012). The blue tone is used to convey the coldness and fatigue experienced by the characters.
The study of aesthetic elements, and of how their combination contributes to establishing the meaning conveyed by an artistic work, is the subject of different disciplines such as semiotics and traditional aesthetic studies. The shared notion is that humans respond to certain stimuli in ways that are predictable, up to a given extent. One consequence of this notion is that similar stimuli are expected to provoke similar reactions, which in turn may allow similar works of art to be grouped together by the reaction they are expected to provoke. Among these disciplines, Applied Media Aesthetics [22], in particular, is concerned with the relation between a number of media elements, such as light, camera movements, and colors, and the perceptual reactions they are able to evoke in consumers of media communication, mainly videos and films. Such media elements, which together build the visual images composing the media, are investigated following a rather formalistic approach that suits the purposes of this paper. By analyzing cameras, lenses, lighting, etc., as production tools, as well as their aesthetic characteristics and uses, Applied Media Aesthetics tries to identify patterns in how such elements operate to produce the desired effect in communicating emotions and meanings. The image elements that are usually addressed as fundamental in the literature, e.g., in [9], even if with slight differences due to the specific context, are lights and shadows, colors, space representation, and motion. It has been shown, e.g., in [17, 1], that some aspects of these elements can be computed from the video data stream as statistical values. We call these computable aspects features. We now look more closely at the features investigated in this paper for content-based video recommendation, to provide an overview of how they are used to produce perceptual reactions in the audience.
Lighting There are at least two different purposes for lighting in movies: chiaroscuro and flat lighting. While the first is a lighting technique characterized by high contrast between light and shadow areas that puts the emphasis on an unnatural effect, the latter is a neutral, realistic,
way of illuminating, whose purpose is to enable recognition of stage objects. Figure 1 illustrates the difference between these two alternatives.
Colors The expressive quality of colors is closely related to that of lighting, sharing the same ability to set or magnify the feeling derived from a given situation. Even if an exact correlation between colors and the feelings they may evoke is not currently supported by enough scientific data, colors nonetheless have an expressive impact that has been investigated thoroughly, e.g., in [20]. An interesting metric to quantify this impact has been proposed in [21] as perceived color energy, a quantity that depends on a color’s saturation, brightness, and the size of the area the color covers in an image. Hue also plays a role: the energy is higher when the hue tends toward red and lower when it tends toward blue. These tendencies are shown in the examples of Figure 2.
Motion The illusion of movement given by screening a sequence of still frames in rapid succession is the very reason for cinema’s existence. In a video or movie, there are different types of motion to consider:
• Profilmic movements: Every movement that concerns elements shot by the camera falls in this category, e.g., the motion of performers or vehicles. The movement can be real or perceived. By deciding the type and quantity of motion an ‘actor’ has, where an actor is any possible protagonist of a scene, the director defines, among other things, the level of attention to, or expectations from, the scene. Examples are the hero walking slowly in a dark alley, or a fast car chase.
• Camera movements: These are the movements that alter the point of view on the narrated events. Camera movements, such as the pan, truck, pedestal, or dolly, can be used for different purposes. Some usages are descriptive, e.g., to introduce landscapes or actors or to follow performers’ actions, while others concern the narration, e.g., to relate two or more different elements (such as anticipating a car’s route to show an unseen obstacle) or to move toward or away from events.
• Sequence movements: As shots change, using cuts or other transitions, the rhythm of the movie changes accordingly. Generally, a faster rhythm is associated with excitement, and a slower rhythm suggests a more relaxed pace [2].
In this paper, we followed the approach in [17], considering the motion content of a scene as a feature that aggregates and generalizes both profilmic and camera movements.
Research Objectives We expect to achieve a number of objectives by the end of this ongoing research work:
• development and evaluation of a novel movie recommendation system, based on automatic extraction of visual stylistic features from the multimedia content; the extracted features represent mise-en-scène characteristics of the movies;
• as a broader goal, design and development of a novel video retrieval platform, including the HCI, that improves searching and recommendation capabilities, based on aesthetic attributes (i.e., visual features) that are derived from movie styles as determined by professional movie makers and that accurately match viewers’ perceptions;
• extraction of audio features that can effectively describe the movies in the audio feature space and will be used together with visual features to improve the representation model.
Visual features While stylistic visual features have been only marginally explored in the recommender systems community, they have been extensively studied in other fields such as Computer Vision and Video Retrieval [17, 14]. By reviewing the state-of-the-art works in these disciplines, we have identified and selected five visual features that have shown promising results in representing movie content and that are among the most informative and distinctive: (1) Average Shot Length, (2) Color Variance, (3) Average Motion, (4) Motion Variation, and (5) Lighting Key.
Average Shot Length: a single camera action is called a shot, and the average shot length, i.e., the length of the video divided by its total number of shots, is indicative of the pace of the movie. For example, action movies typically contain rapid shot changes in comparison to drama movies. Hence, the average shot length is expected to be low in action movies and high in dramas.
Color Variance: the variance of colors in movies is known to be highly correlated with their genre. Indeed, directors tend to use a large variety of bright colors for comedy movies and darker combinations of colors for horror movies. For each key frame represented in the LUV color space, we compute the generalized color variance [17], which is indicative of the color variation in that key frame.
Average Motion and Motion Variation: motion in a video can be caused either by movements of the camera (camera motion) or by movements of the objects being filmed (object motion). While the average shot length mainly reflects the former, the latter should also be captured accurately. For this purpose, motion features are extracted. We used optical flow [13], which is indicative of motion, as a robust estimate of the pixel velocities over a sequence of images.
Lighting Key: lighting is considered a discriminating factor among movie genres, and it should be measured effectively because it plays a key role in controlling the type of emotion induced in the viewer. For example, comedy movies often contain an abundance of light with a low key-to-fill ratio, i.e., a low ratio between the brightest and dimmest light. This concept is known in cinematography as high-key lighting. On the other hand, horror or noir movies exploit low-key lighting, i.e., a low amount of light and a high key-to-fill ratio.
We have extracted these visual features automatically from each video and used them to generate recommendations. We conducted a preliminary analysis, which is described in the next section.
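The five features can be illustrated with a minimal sketch. It is not the extraction pipeline used in this work: all function names are hypothetical, and inputs are assumed to be already decoded (shot counts, LUV pixel triples, per-frame mean optical-flow magnitudes, and brightness values in [0, 1]). Two formulas are assumptions made for illustration: the generalized color variance is taken to be the determinant of the color covariance matrix (the standard statistical definition of generalized variance), and the lighting key is taken to be the product of the mean and the standard deviation of brightness.

```python
from statistics import mean, pstdev


def average_shot_length(n_frames, n_shots):
    """Feature 1: mean number of frames per shot."""
    if n_shots <= 0:
        raise ValueError("n_shots must be positive")
    return n_frames / n_shots


def generalized_color_variance(pixels):
    """Feature 2 (assumed form): determinant of the 3x3 covariance
    matrix of the (L, U, V) triples of one key frame."""
    n = len(pixels)
    mu = [sum(p[c] for p in pixels) / n for c in range(3)]
    cov = [[sum((p[r] - mu[r]) * (p[c] - mu[c]) for p in pixels) / n
            for c in range(3)] for r in range(3)]
    (a, b, c), (d, e, f), (g, h, k) = cov
    # Cofactor expansion of the determinant along the first row.
    return a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)


def motion_statistics(flow_magnitudes):
    """Features 3 and 4 (assumed form): mean and standard deviation of
    per-frame mean optical-flow magnitudes."""
    return mean(flow_magnitudes), pstdev(flow_magnitudes)


def lighting_key(brightness):
    """Feature 5 (assumed form): mean times standard deviation of the
    brightness values of one key frame, each in [0, 1]."""
    return mean(brightness) * pstdev(brightness)


# Toy usage with made-up numbers.
asl = average_shot_length(n_frames=240, n_shots=8)
avg_motion, motion_var = motion_statistics([1.0, 3.0])
```

In a real pipeline the shot boundaries, key frames, and optical flow [13] would come from a video-processing library; here they are treated as given so that only the feature definitions are shown.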
Preliminary Analysis We have conducted a preliminary experiment using a dataset of 167 movies sampled randomly from 4 main genres, i.e., Action, Comedy, Drama, and Horror. Some of the movies were from mixed genres. The dataset consisted of both full-length movies and their trailers. About 95% of the movies are recent (year of production between 1990 and 2015); only 5% were produced before the 1990s. In this preliminary experiment, we are interested in investigating (i) whether visual features extracted from trailers are, in general, a good approximation of the corresponding features extracted from the original full-length movies, and (ii) whether the visual features are informative indicators of the movies.
We have computed the similarity between the visual features extracted from the full-length movies and those extracted from the trailers, using the well-known cosine similarity metric [15, 16]. The average similarity is 0.78 out of 1 (the median is 0.80). More than 75% of the movies have a similarity greater than 0.7 between the full-length movie and the trailer, and less than 3% of the movies have a similarity below 0.5. Overall, the cosine similarity shows a substantial agreement between the full-length movies and the trailers. This is an interesting outcome, which indicates that the trailers can be considered good representatives of the corresponding full-length movies. The similarity is high for all visual features except feature 2 (color variance) and feature 4 (motion variation): the average similarity values are 0.71, 0.57, 0.76, 0.56, and 0.92 for the first to fifth visual feature, respectively. Features 2 and 4 are less similar between full-length movies and trailers, suggesting that their adoption, if extracted from trailers, would provide less accurate recommendations.
We have also performed a Wilcoxon significance test comparing the features extracted from the full-length movies with those extracted from the trailers. The results show no significant difference for the features average motion and lighting key, which is consistent with trailers being representative of the full-length movies with respect to these two features. For the other features, significant differences have been obtained. This suggests that some of the extracted features may be either less correlated or not very informative.
In order to identify the visual features that are more useful in terms of recommendation quality, we have computed entropy as a measure [12] of the informativeness of the data. Our results show that the entropy scores of almost all visual stylistic features are large, meaning that the informative content is rich: the entropy values are 0.83, 0.61, 0.70, 0.76, and 0.93 for the first to fifth visual feature, respectively. The most informative feature in terms of entropy score is feature 5, i.e., lighting key, and the least informative is feature 2, i.e., color variance. This observation is consistent with our other findings, e.g., those obtained from the Wilcoxon test and from the similarity analysis.
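The two measures used in this analysis can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used for the experiment: the function names are hypothetical, the feature values are made up, and the entropy is assumed to be the Shannon entropy of a histogram of feature values, normalized to [0, 1] by the maximum entropy for the chosen number of bins (the source does not specify the estimator or the binning).

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def normalized_entropy(values, n_bins=10):
    """Shannon entropy of an equal-width histogram of `values`,
    divided by log2(n_bins) so that the result lies in [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo or n_bins < 2:
        return 0.0  # a constant feature carries no information
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    counts = Counter(min(int((x - lo) / width), n_bins - 1) for x in values)
    n = len(values)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(n_bins)


# Toy usage: made-up feature vectors for one movie and its trailer.
movie_features = [0.42, 0.10, 0.55, 0.31, 0.80]
trailer_features = [0.40, 0.25, 0.50, 0.12, 0.78]
similarity = cosine_similarity(movie_features, trailer_features)
```

A paired significance test such as the Wilcoxon signed-rank test would be applied separately to the per-movie values of each feature; it is omitted here to keep the sketch dependency-free.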
Conclusion In this paper, we present ongoing work on building a video recommender system that uses mise-en-scène characteristics of movies in order to generate recommendations. Our recommender system will encompass a technique to automatically analyze video content and to extract a set of representative stylistic features, i.e., Average Shot Length, Color Variance, Average Motion, Motion Variation, and Lighting Key. We present preliminary results of an analysis showing that (i) the trailers are representative of the full-length movies, and (ii) the stylistic visual features are informative of the movie content. For future work, we plan to design and develop an online web application with a novel HCI that will provide a personalized way to search for interesting movies through the analysis of film styles, rather than through the traditional classification of movies by explicit attributes such as genre and cast. We also plan to design and conduct a real user study in order to evaluate the quality of the recommendations as well as the usability of the system.
Acknowledgements This work is supported by Telecom Italia S.p.A., Open Innovation Department, Joint Open Lab S-Cube, Milan.
REFERENCES
1. Warren Buckland. 2008. What Does the Statistical Style Analysis of Film Involve? A Review of Moving into Pictures. More on Film History, Style, and Analysis. Literary and Linguistic Computing 23, 2 (2008), 219–230. DOI: http://dx.doi.org/10.1093/llc/fqm046
2. Kazimierz Choroś. 2009. Video Shot Selection and Content-Based Scene Detection for Automatic Classification of TV Sports News. In Internet – Technical Development and Applications, Ewaryst Tkacz and Adrian Kapczynski (Eds.). Advances in Intelligent and Soft Computing, Vol. 64. Springer Berlin Heidelberg, 73–80.
3. Dan Cosley, Shyong K Lam, Istvan Albert, Joseph A Konstan, and John Riedl. 2003. Is seeing believing?: how recommender system interfaces affect users’ opinions. In Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, 585–592.
4. Paolo Cremonesi, Franca Garzotto, and Roberto Turrin. 2012a. Investigating the persuasion potential of recommender systems from a quality perspective: An empirical study. ACM Transactions on Interactive Intelligent Systems (TiiS) 2, 2 (2012), 11.
5. Paolo Cremonesi, Franca Garzotto, and Roberto Turrin. 2012b. User effort vs. accuracy in rating-based elicitation. In Proceedings of the sixth ACM conference on Recommender systems. ACM, 27–34.
6. Marco de Gemmis, Pasquale Lops, Cataldo Musto, Fedelucio Narducci, and Giovanni Semeraro. 2015. Semantics-Aware Content-Based Recommender Systems. In Recommender Systems Handbook. Springer, 119–159.
7. Yashar Deldjoo, Mehdi Elahi, Paolo Cremonesi, Franca Garzotto, Pietro Piazzolla, and Massimo Quadrana. 2016. Content-based Video Recommendation System based on Stylistic Visual Features. Journal on Data Semantics Special Issue on Recommender Systems (2016).
8. Yashar Deldjoo, Mehdi Elahi, Massimo Quadrana, Paolo Cremonesi, and Franca Garzotto. 2015. Toward Effective Movie Recommendations Based on Mise-en-Scène Film Styles. In Proceedings of the 11th Biannual Conference on Italian SIGCHI Chapter. ACM, 162–165.
9. Chitra Dorai and Svetha Venkatesh. 2001. Computational Media Aesthetics: Finding Meaning Beautiful. IEEE MultiMedia 8, 4 (Oct. 2001), 10–12. DOI: http://dx.doi.org/10.1109/93.959093
10. Mehdi Elahi, Francesco Ricci, and Neil Rubens. 2013. Active learning strategies for rating elicitation in collaborative filtering: a system-wide perspective. ACM Transactions on Intelligent Systems and Technology (TIST) 5, 1 (2013), 13.
11. J. Gibbs. 2002. Mise-en-scène: Film Style and Interpretation. Wallflower. https://books.google.it/books?id=j4dqY_phZlEC
12. Isabelle Guyon, Nada Matic, Vladimir Vapnik, and others. 1996. Discovering Informative Patterns and Data Cleaning. (1996).
13. Berthold K Horn and Brian G Schunck. 1981. Determining optical flow. In 1981 Technical Symposium East. International Society for Optics and Photonics, 319–331.
14. Weiming Hu, Nianhua Xie, Li Li, Xianglin Zeng, and Stephen Maybank. 2011. A survey on visual content-based video indexing and retrieval. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on 41, 6 (2011), 797–819.
15. Pasquale Lops, Marco De Gemmis, and Giovanni Semeraro. 2011. Content-based recommender systems: State of the art and trends. In Recommender systems handbook. Springer, 73–105.
16. Michael J. Pazzani and Daniel Billsus. 2007. The Adaptive Web. Springer-Verlag, Berlin, Heidelberg, Chapter Content-based Recommendation Systems, 325–341. http://dl.acm.org/citation.cfm?id=1768197.1768209
17. Zeeshan Rasheed, Yaser Sheikh, and Mubarak Shah. 2005. On the use of computable features for film classification. Circuits and Systems for Video Technology, IEEE Transactions on 15, 1 (2005), 52–64.
18. Francesco Ricci, Lior Rokach, and Bracha Shapira. 2011. Introduction to recommender systems handbook. In Recommender Systems Handbook, Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul Kantor (Eds.). Springer Verlag, 1–35.
19. Neil Rubens, Mehdi Elahi, Masashi Sugiyama, and Dain Kaplan. 2015. Active Learning in Recommender Systems. In Recommender Systems Handbook chapter 24: Recommending Active Learning. Springer US, 809–846.
20. Patricia Valdez and Albert Mehrabian. 1994. Effects of color on emotions. Journal of Experimental Psychology: General 123, 4 (1994), 394.
21. Hee Lin Wang and Loong-Fah Cheong. 2006. Affective understanding in film. Circuits and Systems for Video Technology, IEEE Transactions on 16, 6 (June 2006), 689–704. DOI: http://dx.doi.org/10.1109/TCSVT.2006.873781
22. Herbert Zettl. 2002. Essentials of Applied Media Aesthetics. In Media Computing, Chitra Dorai and Svetha Venkatesh (Eds.). The Springer International Series in Video Computing, Vol. 4. Springer US, 11–38.