Visualizing steps for shot detection

Marc Ritter and Maximilian Eibl
Chemnitz University of Technology
D-09111, Chemnitz, Germany
{ritm,eibl}@cs.tu-chemnitz.de
Abstract

This article introduces the current research and teaching framework developed at the Chair of Media Informatics at Chemnitz University of Technology. In the demo session we demonstrate and visualize its functionality using a scientific method for shot detection.
1 Introduction
The documentation and digital archiving of steadily increasing amounts of data is one of the challenging topics of current research. The Retrieval Group of the research project sachsMedia — Cooperative Producing, Storage, Retrieval and Distribution of Audiovisual Media — is currently engaged in extracting and indexing important information about predefined objects from the video sources of local television stations, in preparation for subsequent, custom-driven search processes [sachsMedia, 2009]. The close relationship of the project-specific fields of speech analysis (SPR), video analysis (VID), and metadata handling (MDH) led to the development of the common framework AMOPA, which is described in more detail in section 2.

The literature proposes a huge variety of methods for object detection. Applied to videos, a considerable part of this algorithmic class fails due to abrupt changes of content between consecutive frames (scene changes). The detection of shot boundaries is therefore widely discussed and has become a major preprocessing step, usually used to minimize failures in subsequent object detection. We use an approach by [Liu et al., 2006], proposed at the scientific competition TRECVID, and explain its integration into our framework in section 3.
2 A framework to enhance video analysis
The Java-based research framework AMOPA (Automated MOving Picture Annotator) is easily extensible and allows rapid prototyping of arbitrary process-driven workflow concepts as traditionally used in image processing. Figure 1 shows the framework and several of its components, which are available as open source projects. The open source library FFMPEG is used to open, close, and access any supported kind of video stream. The project Streambaby (http://code.google.com/p/streambaby) and its subcomponent FFMPEG-Java directly invoke the C functions from Java code via Java Native Access (JNA).
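To illustrate the binding mechanism, the following minimal sketch shows the general JNA pattern of mapping a Java interface onto a native library. The function avcodec_version() is part of the public FFMPEG C API; the surrounding class is a hypothetical example for illustration, not code taken from FFMPEG-Java, and assumes libavcodec is installed on the system.

```java
import com.sun.jna.Library;
import com.sun.jna.Native;

public class FfmpegViaJna {

    // Maps a tiny part of libavcodec onto a Java interface; JNA resolves
    // each interface method to the native C function of the same name.
    public interface AVCodecLibrary extends Library {
        AVCodecLibrary INSTANCE =
                (AVCodecLibrary) Native.loadLibrary("avcodec", AVCodecLibrary.class);

        // unsigned avcodec_version(void) from the FFMPEG C API
        int avcodec_version();
    }

    public static void main(String[] args) {
        int version = AVCodecLibrary.INSTANCE.avcodec_version();
        // FFMPEG packs the version as major << 16 | minor << 8 | micro.
        System.out.printf("libavcodec %d.%d.%d%n",
                version >> 16, (version >> 8) & 0xff, version & 0xff);
    }
}
```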
Figure 1: Architecture of the research framework AMOPA. (From: [Ritter, 2009])

The implementation of the process concept is based on an extended version of the toolkit Java Media Utility, provided in 2006 in an early state of development by the engineer Paolo Mosna [Mosna, 2007]. Linear workflows can be modeled using its engine framework: connected processes pass their results as defined objects along the image processing chain (IPC). Each chain contains one input process, one output process, and several intermediate processes, which can be dynamically instantiated and parameterized at runtime by means of an XML file. All custom processes are started as separate, fully functional threads with optional object sharing. To facilitate and accelerate the creation of customized image processing chains, a visualization toolkit consisting of two components is currently under development. The first is a graphical editor oriented at the comfortable usability of GraphEditPlus (http://www.thedeemon.com/GraphEditPlus); the second implements a global window manager capable of handling any graphical output from registered processes at any time.
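The following sketch illustrates the underlying chain idea with plain Java threads connected by queues: each process runs in its own thread and hands result objects to its successor. All class and method names here are hypothetical stand-ins for illustration and do not reproduce the actual AMOPA/JMU interfaces.

```java
import java.awt.image.BufferedImage;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// One stage of an image processing chain, running as its own thread.
abstract class ChainProcess<I, O> extends Thread {
    protected final BlockingQueue<I> input;
    protected final BlockingQueue<O> output = new LinkedBlockingQueue<O>();

    ChainProcess(BlockingQueue<I> input) { this.input = input; }

    // Transforms one object travelling along the chain.
    protected abstract O process(I element) throws Exception;

    @Override
    public void run() {
        try {
            while (!isInterrupted()) {
                output.put(process(input.take()));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // terminate this stage cleanly
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

// Example intermediate process: converts each incoming frame to greyscale.
class GreyscaleProcess extends ChainProcess<BufferedImage, BufferedImage> {
    GreyscaleProcess(BlockingQueue<BufferedImage> input) { super(input); }

    @Override
    protected BufferedImage process(BufferedImage frame) {
        BufferedImage grey = new BufferedImage(frame.getWidth(),
                frame.getHeight(), BufferedImage.TYPE_BYTE_GRAY);
        grey.getGraphics().drawImage(frame, 0, 0, null);
        return grey;
    }
}
```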
3 Applied shot detection
The international scientific Text REtrieval Conference (TREC) series "encourage[s] research in information retrieval from large text collections" (http://trec.nist.gov). Its track on video retrieval evaluation became an independent evaluation (TRECVID) in 2003, in which Shot Boundary Detection remained a major task until 2007 [Smeaton et al., 2006].
Figure 2: Adapted processing chain of the AT&T method in AMOPA. All processes are marked in dark grey; the white finite state machine detectors Cut and Dissolve are aggregated by the connected controlling process used for automated shot detection.
3.1 Definitions
According to [Smeaton et al., 1999], a shot is defined as a sequence of frames "resulting from continuous recording by a single camera", whereas a scene is frequently composed of multiple shots; the video itself, in turn, naturally consists of a collection of scenes. Transitions between two adjacent shots can be divided into four major types: a "hard" cut spans two consecutive frames and occurs at a complete change of shots, while fades, dissolves, and wipes are examples of gradual transitions spanning more than two frames [Zhang et al., 1993].
3.2 The method of AT&T
[Liu et al., 2006] suggested a promising solution to reliably detect different types of shot transitions. Since hard cuts and dissolves occur most frequently, we decided to implement the corresponding detectors within our framework for further analysis. Figure 2 illustrates the adapted image processing chain of the related work from AT&T. The video frames are read by the input process FrameReader and passed to the feature extraction component (left box). The shot boundary detection is done by the connected detectors (right box), which use the results from feature extraction. Finally, the module ResultFusion resolves overlapping shot transition candidates.

The integrated block-based motion detection is one of the key features for detecting shot boundaries reliably. First, the current image is segmented into non-overlapping blocks, preferably of size 48 × 48. Subsequently, a motion vector from each template block to the next frame is calculated using a search range of 32 × 32. The difference between the best match within the search range and the underlying template is called the matching error. These steps are repeated for every block, and the current overall matching error ME_A is computed.
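A sketch of this computation is given below, assuming greyscale frames as 2-D int arrays, the sum of absolute differences (SAD) as block distance, and the mean over all blocks as the overall error ME_A; the distance measure and the aggregation are assumptions here, since the text does not fix them.

```java
public final class BlockMatching {

    static final int BLOCK = 48;   // size of the non-overlapping template blocks
    static final int SEARCH = 32;  // 32 x 32 search range around each block

    /** Overall matching error ME_A between two consecutive frames,
     *  taken here as the mean of the per-block matching errors. */
    public static double overallMatchingError(int[][] cur, int[][] next) {
        int h = cur.length, w = cur[0].length;
        double sum = 0;
        int blocks = 0;
        for (int by = 0; by + BLOCK <= h; by += BLOCK) {
            for (int bx = 0; bx + BLOCK <= w; bx += BLOCK) {
                sum += blockMatchingError(cur, next, by, bx, h, w);
                blocks++;
            }
        }
        return blocks == 0 ? 0 : sum / blocks;
    }

    /** Smallest SAD between the template block and any candidate
     *  position inside the search range of the next frame. */
    static double blockMatchingError(int[][] cur, int[][] next,
                                     int by, int bx, int h, int w) {
        double best = Double.MAX_VALUE;
        for (int dy = -SEARCH / 2; dy <= SEARCH / 2; dy++) {
            for (int dx = -SEARCH / 2; dx <= SEARCH / 2; dx++) {
                int y0 = by + dy, x0 = bx + dx;
                if (y0 < 0 || x0 < 0 || y0 + BLOCK > h || x0 + BLOCK > w) {
                    continue; // candidate block would leave the frame
                }
                long sad = 0;
                for (int y = 0; y < BLOCK; y++)
                    for (int x = 0; x < BLOCK; x++)
                        sad += Math.abs(cur[by + y][bx + x] - next[y0 + y][x0 + x]);
                best = Math.min(best, sad);
            }
        }
        return best;
    }
}
```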
Figure 3: Course of the variables of the cut detector (AverageME: upper plot; ME_A: middle plot; detected cuts are represented by the peaks of the different states of the finite state machine in the lower plot) on the randomly chosen video BG 26797 from the TRECVID 2008 data set, with an overall length of 3,605 frames.

The actual shot detection is performed by the shot detectors, which are implemented as finite state machines (FSMs) starting at state 0. As an example, Figure 3 shows the course of the variables used in the cut detector along a sample video. The detector owns a state variable AverageME that is updated by a convex linear infinite impulse response filter in state 0. It changes into transition candidate state 2 if AverageME is a multiple of its predecessor and if the current ME_A is higher than within the last five frames. The verification state 3 is reached if the current ME_A remains higher than before. If the dissimilarity between the shot candidates is high, a detected shot is marked in state 1; in all other cases, state 0 is invoked. First runs indicate that the hard cut detector performs well on data sets from different TRECVID years as well as on material from local TV stations.
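The sketch below outlines such a cut-detector state machine along the lines described above. The IIR weight, the candidate factor, and the dissimilarity threshold are placeholders, and the exact candidate condition is our reading of the description; the tuned parameters of the original detector from [Liu et al., 2006] are not reproduced here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public final class CutDetectorFsm {

    private static final double ALPHA = 0.9;            // convex IIR weight (assumed)
    private static final double CANDIDATE_FACTOR = 2.0; // "multiple of predecessor" (assumed)
    private static final double DISSIM_THRESHOLD = 0.5; // placeholder threshold

    private int state = 0;        // the FSM starts at state 0
    private double averageMe = 0; // running AverageME
    private final Deque<Double> lastMe = new ArrayDeque<Double>(); // last five ME_A values

    /** Feeds one frame's overall matching error; returns true when a cut is detected. */
    public boolean step(double meA, double shotDissimilarity) {
        boolean cut = false;
        switch (state) {
            case 0: { // idle: update AverageME with a convex linear IIR filter
                double prev = averageMe;
                averageMe = ALPHA * averageMe + (1 - ALPHA) * meA;
                if (averageMe > CANDIDATE_FACTOR * prev && meA > maxRecent()) {
                    state = 2; // transition candidate
                }
                break;
            }
            case 2: // candidate: verify that ME_A remains elevated
                state = (meA > averageMe) ? 3 : 0;
                break;
            case 3: // verification: mark a cut when the shots are dissimilar enough
                if (shotDissimilarity > DISSIM_THRESHOLD) {
                    state = 1;
                    cut = true;
                } else {
                    state = 0;
                }
                break;
            case 1: // detected: return to idle for the next frame
            default:
                state = 0;
        }
        if (lastMe.size() == 5) lastMe.removeFirst(); // keep a five-frame window
        lastMe.addLast(meA);
        return cut;
    }

    private double maxRecent() {
        double max = 0;
        for (double v : lastMe) max = Math.max(max, v);
        return max;
    }
}
```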
4 Conclusions
Although our framework is still under development, we have briefly shown that it can be equipped with arbitrary algorithms. The chaining for the applied shot detection is not necessarily novel, but the possibility to select and visualize (parts of) algorithms at any stage in time and at comparatively low cost provides a convenient base for further development and examination of state-of-the-art algorithms. For more detail, please refer to [Ritter, 2009]. A more sophisticated version of the presented shot detection algorithm was introduced by its authors in 2007; a profound description can be found in [Liu et al., 2007].
Acknowledgments

This work and the project sachsMedia were funded by Unternehmen Region, the BMBF innovation initiative for the Neue Länder, Germany.
References

[Liu et al., 2006] Zhu Liu, Eric Zavesky, David Gibbon, Behzad Shahraray, and Patrick Haffner. AT&T Research at TRECVID 2006. Workshop contribution, AT&T Labs-Research, 2006. http://www-nlpir.nist.gov/projects/tvpubs/tv6.papers/att.pdf, 13.05.2009.

[Liu et al., 2007] Zhu Liu, Eric Zavesky, David Gibbon, Behzad Shahraray, and Patrick Haffner. AT&T Research at TRECVID 2007. Workshop contribution, AT&T Labs-Research, 200 Laurel Avenue South, Middletown, NJ 07748, 2007. http://www-nlpir.nist.gov/projects/tvpubs/tv7.papers/att.pdf, 13.05.2009.

[Mosna, 2007] Paolo Mosna. JMU: Java Media Utility, 2007. http://sourceforge.net/projects/jmu, 13.05.2009.

[Ritter, 2009] Marc Ritter. Visualisierung von Prozessketten zur Shot Detection. In Workshop Audiovisuelle Medien: WAM 2009, Chemnitzer Informatik-Berichte, pages 135–150. Chemnitz University of Technology, Saxony, Germany, 2009. http://archiv.tu-chemnitz.de/pub/2009/0095/index.html, 15.06.2009.

[sachsMedia, 2009] sachsMedia. InnoProfile Projekt sachsMedia — Cooperative Producing, Storage, Retrieval and Distribution of Audiovisual Media, 2009. http://www.tu-chemnitz.de/informatik/Medieninformatik/sachsmedia, http://www.unternehmen-region.de/de/1849.php, 14.05.2009.

[Smeaton et al., 1999] Alan F. Smeaton, J. Gilvarry, G. Gormley, B. Tobin, S. Marlow, and N. Murphy. An evaluation of alternative techniques for automatic detection of shot boundaries. In School of Electronic Engineering, pages 8–9, 1999.

[Smeaton et al., 2006] Alan F. Smeaton, Paul Over, and Wessel Kraaij. Evaluation campaigns and TRECVID. In MIR '06: Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval, pages 321–330, New York, NY, USA, 2006. ACM Press. http://www-nlpir.nist.gov/projects/trecvid/, 14.05.2009.