
Detection of Representative Frames of A Shot using Multivariate Wald-Wolfowitz Test 

P. P. Mohanta, S. K. Saha, B. Chanda
1 ECS Unit, Indian Statistical Institute, Kolkata, India
2 CSE Department, Jadavpur University, Kolkata, India
E-mail: [email protected], sks [email protected], [email protected]

978-1-4244-2175-6/08/$25.00 ©2008 IEEE

Abstract

For efficient indexing, browsing and retrieval of video data, and also for video summarization, extraction of representative frames is essential. Once a video stream is segmented into shots, the representative frames or key-frames for each shot are selected. Automatic selection of suitable representatives for a wide variety of shots is still a challenge, as the number of such frames in a shot may vary depending on the variation in the content. In this work, we propose a novel scheme that relies on Wald-Wolfowitz runs test based hypothesis testing to detect the subshots within a shot; then, for each subshot, the frame rendering the highest fidelity is extracted as the key-frame. Experimental results show that the scheme works satisfactorily for a wide variety of shots.

1. Introduction

Due to advances in video technology, the volume of digital video data has increased dramatically. For efficient access to such data, video summarization, indexing and retrieval have become an active area of research. For all such applications, video segmentation is the fundamental step. The segmentation scheme detects the boundaries of the shots in a video. The frames in a shot, i.e., the no-change frames, represent a continuous action with either fixed or very slowly changing objects and background, and share common semantics. Once the shot boundaries are detected, the next task may be to determine the representative frame(s) for each shot. Such frames are known as key-frames. The key-frames arranged chronologically may be considered as the video storyboard [1] that may be offered to the user as a summary for browsing the video. So, the key-frames can act as the access points/bookmarks [1] for a video stream.

The key-frames can also be used to group the shots to form a scene, which is one step further toward summarization and abstraction. In general, key-frame detection methodologies assume that the video has already been segmented into shots. Then, from each detected shot, the key-frames are selected according to certain criteria. Some methods [10, 9] simply consider the first, or the first and last, frame of each shot as the key-frames. Pentland et al. [8] have deployed temporal sampling. Zhonghua et al. [13] have suggested a scheme based on the ratio of object and background to select one key-frame for each shot. All these schemes select a fixed number of key-frames per shot, and the selection criteria ignore the visual content of the frames. In reality, in a shot of longer duration, there may be sufficient variation to warrant multiple key-frames. In case of a 'miss' in the shot detection process, one may have a merged shot, and for that too multiple key-frames are desired. Thus, the number of key-frames for a shot cannot be pre-judged; it has to be decided dynamically depending on the shot content.

A few clustering-based approaches [14, 3] have also been reported. Gunsel et al. [4] have relied on a threshold method. In such cases, the performance depends on the proper selection of the threshold and the initialization of the clustering parameters. A motion analysis based technique is presented in [12]. Frame difference based schemes are reported in [1, 5]. Ciocca et al. [1] have presented a dynamic technique where the visual content of the frames is described in terms of colour histogram, edge direction histogram and wavelet statistics. Key-frames are selected by an algorithm based on high curvature points of the cumulative frame difference. However, the algorithm depends on certain parameters, and tuning them is crucial. Past studies reveal that, although several schemes have been tried, a content-based, automatic and dynamic technique for key-frame extraction is still in demand.

In this work, we present a hypothesis testing based methodology which satisfies the desired features. The paper is organized as follows. Section 2 presents the proposed methodology. Experimental results are presented in Section 3, and concluding remarks are given in Section 4.

2. Proposed Methodology

The proposed technique assumes that the video stream has already been segmented into shots, and the frames in a shot are fed as input to the key-frame detection methodology. The proposed scheme is not prejudiced by any pre-conceived number of key-frames per shot. Essentially, the frames in a shot are very similar in terms of their content, with very little variation between successive frames. But in a shot of long duration, or in a shot depicting a complex event, frames separated by a considerable gap may differ enough to qualify for separate representatives. The proposed scheme tries to find the subshots which differ significantly from their neighbouring subshots. Then, for each subshot, a representative frame is selected according to suitable criteria. Thus, the proposed methodology revolves around the determination of subshots, which is carried out by non-parametric hypothesis testing based on the Wald-Wolfowitz runs test [11].

2.1. Wald-Wolfowitz Runs Test

The Wald-Wolfowitz runs test is used to solve the non-parametric two-sample problem. Suppose there are two samples $X$ and $Y$ of size $m$ and $n$ respectively, and the corresponding distributions are $F_X$ and $F_Y$. $H_0$, the null hypothesis to be tested, and $H_1$, the alternative hypothesis, are as follows:

$H_0$: $X$ and $Y$ come from the same population, i.e. $F_X = F_Y$.
$H_1$: They come from different populations, i.e. $F_X \neq F_Y$.

In the classical Wald-Wolfowitz test, it is assumed that the sample points are univariate. The $N = m + n$ observations are sorted in ascending order and each is assigned the label $X$ or $Y$ depending on the sample to which it belongs. The test statistic $W$ is computed from $R$, the number of runs (a run is a maximal sequence of consecutive identical labels), as follows.

$$W = \frac{R - \frac{2mn}{N} - 1}{\sqrt{\frac{2mn\,(2mn - N)}{N^{2}\,(N - 1)}}}$$
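As a minimal sketch of the univariate statistic, the snippet below pools two samples, counts the runs of labels in the sorted list, and standardizes the count using the usual mean 2mn/N + 1 and variance 2mn(2mn - N)/(N^2 (N - 1)) of the number of runs under the null hypothesis. The function name `runs_test_statistic` is hypothetical, and the sketch assumes there are no ties between the two samples.

```python
import math


def runs_test_statistic(x, y):
    """Return (R, W) for the two-sample Wald-Wolfowitz runs test.

    Illustrative sketch only: assumes univariate samples without ties
    across x and y, and samples large enough for a positive variance.
    """
    m, n = len(x), len(y)
    N = m + n
    # Pool the observations and sort them, remembering the source sample.
    pooled = sorted([(v, "X") for v in x] + [(v, "Y") for v in y])
    labels = [label for _, label in pooled]
    # A new run starts wherever two neighbouring labels differ.
    R = 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    # Mean and variance of the number of runs under the null hypothesis.
    mean = 2.0 * m * n / N + 1.0
    var = 2.0 * m * n * (2.0 * m * n - N) / (N * N * (N - 1.0))
    if var <= 0:
        raise ValueError("samples are too small for the normal approximation")
    W = (R - mean) / math.sqrt(var)
    return R, W


if __name__ == "__main__":
    # Two well-separated samples give few runs and a strongly negative W.
    print(runs_test_statistic([1, 2, 3, 4], [5, 6, 7, 8]))
```

Under the normal approximation, a one-sided test at the 5% level would reject the null hypothesis when W falls below roughly -1.645; the choice of a one-sided region is an assumption made here for illustration.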

Since, under the null hypothesis $H_0$, the test statistic $W$ approximately follows the standard normal distribution, the critical region may be chosen for a given level of significance, which is the maximum probability of rejecting a true $H_0$. If $W$ falls within the critical region, $H_0$ is rejected. Physically, a low number of runs $R$ denotes that the two samples are less interleaved in the ordered list, which leads to the interpretation that they come from different populations.

This test can be applied to detect whether two sets of frames belong to the same population or not. Two sequences of frames in a shot can be taken as the two samples $X$ and $Y$, and the null hypothesis, i.e. that they belong to the same population, can be tested. As each frame is represented by a feature vector, the sample points are no longer univariate. Hence a multivariate generalization of the scheme is required. Friedman and Rafsky [2] have suggested a multivariate generalization that uses the minimal spanning tree (MST) of the sample points as an alternative to the univariate sorted list. The steps are as follows:

• Organize the sample points in an MST.

• Remove the edges connecting points from different samples to obtain a set of disjoint subtrees.
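The MST-based steps can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it uses Euclidean distance between feature vectors, builds the MST with a plain Prim's algorithm, and takes the number of subtrees left after removing cross-sample edges as the multivariate analogue of the number of runs, as in Friedman and Rafsky's construction. The function names `mst_edges` and `multivariate_runs` are hypothetical.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns the N - 1 MST edges as (i, j) index pairs.
    """
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link into the tree
    best_from = [-1] * n         # tree node providing that link
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        # Update the cheapest links of the remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def multivariate_runs(points, labels):
    """Number of subtrees left after cutting cross-sample MST edges."""
    cross = sum(1 for i, j in mst_edges(points) if labels[i] != labels[j])
    # Removing k edges from a tree leaves k + 1 connected subtrees.
    return cross + 1


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(multivariate_runs(pts, list("XXXYYY")))
```

A small count means the two samples occupy separate regions of feature space, mirroring the univariate case where few runs indicate little interleaving. The standardization of this count is not recoverable from the text above and is deliberately left out of the sketch.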
