
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 17, NO. 6, JUNE 2007


Effective Detection of Various Wipe Transitions

Shan Li, Student Member, IEEE, and Moon-Chuen Lee

Abstract—Automatic detection of wipes and their frame ranges is important for the purpose of reliable video parsing and video database indexing. Wipes are difficult to detect because of the complexity and variety of the transition effects. Many of the existing wipe detection algorithms could detect only a few wipe effects. The false/miss detection problem caused by motion is also very serious. In this paper, we propose a novel wipe detection algorithm that can detect most wipe effects with accurate frame ranges. We carefully model a wipe based on its nature and then use the model to filter out possible confusion caused by motion or other transition effects. More precisely, properties of independence and completeness are proposed to characterize an ideal wipe; frame ranges of potential wipes are located by finding sequences which are a close approximation to an ideal wipe. Bayes rule is applied to each potential wipe to statistically estimate an adaptive threshold for the purpose of wipe verification. Experiment results on videos with different genres show that the proposed methodology can be used to detect various wipe effects effectively. Index Terms—Multimedia analysis, shot segmentation, video processing, wipe detection.

I. INTRODUCTION

As digital video data are now widely available, it is essential that video processing techniques be able to browse and retrieve videos according to the contents of the video clips. Video segmentation is the first step of video indexing. It temporally segments a video sequence into adjacent shots and detects the transition that connects each pair of adjacent shots. It is important to make such a low-level video processing task fully automatic.

Different types of transitions may be used to connect adjacent shots, including abrupt changes (video cuts) and gradual transitions. An abrupt change, as shown in Fig. 1(a), is defined as the direct concatenation of two shots. It can be detected by locating the isolated peaks in the time series of inter-frame discontinuity values [14], [19], [20]. Using different features to measure discontinuity values between frames, a cut detection scheme can classify the discontinuity values into either shot cut or non-shot cut.

A gradual transition involves a number of frames in two adjacent shots. Gradual transitions can be roughly categorized into two types: dissolves and wipes, as shown in Fig. 1(b) and (c). In a dissolve, the first sequence is fading out, and the second is fading in. Assuming that the pixels’ color values change linearly

Manuscript received July 11, 2006; revised December 9, 2006. This work was supported in part by the Hong Kong Earmarked Grant CUHK4377/02E and in part by the CUHK Direct Grant under Project ID 2050260. This paper was recommended by Associate Editor L. Guan. The authors are with the Computer Science and Engineering Department, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong (e-mail: sli@cse.cuhk.edu.hk; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSVT.2007.896621

during a dissolve transition, most approaches detect dissolves by estimating errors of linear regression or by approximating the frame-based intensity variance curve of a dissolve with a parabola [1], [13], [19].

Previous research has mostly focused on the detection of cuts and dissolves. The wipe, another common transition applied in video editing, has received less attention, possibly due to the complexity and diversity of its transition patterns. It is, however, important to detect wipe transitions correctly for three reasons. First, as shots provide the ground for nearly all high-level video processing, it is important to correctly detect the boundaries of shots, including those that are connected by wipe transitions. Second, during video production, each transition type is chosen carefully in order to support the content and context of the video sequences [10]. Automatic detection of wipes and their frame ranges can be used for automatic recognition of video genres. Third, wipe transitions could disturb the detection of other gradual transitions. Correct detection of the wipes and their frame ranges may, therefore, help to improve the detection performance for other transition types.

In wipe transitions, the pixels in the current shot are replaced by those in the next shot step by step until the current shot is completely replaced by the next one. The transition may involve one or several moving borders that can be of any shape. There are more than 30 types of wipe effects commonly used in video editing (refer to the Adobe Premiere video editing software). The wipe effects may vary in their border shapes, moving speeds, and moving directions. Fig. 2 gives examples of wipe effects with five different shapes for illustration.

Compared to cuts and dissolve transitions, wipes are relatively difficult to detect and have not been well studied in previous work. Some approaches examine the statistical or structural properties of the wipe frames and develop wipe detectors based on a statistical model [1], [6]. They, however, have a high false alarm rate. Besides, they have difficulty in locating the frame range of a wipe transition accurately. In [17], Wu et al. proposed a wipe detection method based on the standard deviation of projected pixel-wise differences from images. The method can only detect a limited number of wipe effects. Some more recent approaches [4], [8], [14] detect wipe effects by investigating the orientation of boundary lines in spatial-temporal images. They can only detect the specific types of wipes that produce slanted lines in spatial-temporal images. Besides, object/camera motions could produce slanted lines as well, which causes false alarms. In [12], the authors chose to use B-spline interpolation to measure the linearity of the wipe frames projected on different directions. Similarly, this method only detects a few special wipe effects that produce linearity in selected directions. Pei et al. [15] proposed to use motion vectors to find the scene change region of each frame. A wipe is declared once the accumulation of the change regions covers a

1051-8215/$25.00 © 2007 IEEE


Fig. 1. Different transition types. (a) Cut. (b) Dissolve. (c) Wipe.

Fig. 2. Examples of wipe effects. (a) Source and destination shot. (b) Barn doors. (c) Iris points with thick borders. (d) Iris round. (e) Iris shapes. (f) Pinwheel.

large portion of the frame. The method suffers from a high false alarm rate, because effects other than wipes may also gradually generate many change regions. The approaches proposed in [3], [18] estimate the change regions by calculating pixel-wise differences between two adjacent frames and use the center and variance of the change regions to characterize the wipe patterns. These methods can only characterize a few types of simple wipe effects, such as horizontal, vertical, circular, and diagonal wipes. Mackowiak et al. [11] proposed detecting wipes using video descriptors such as motion activity and dominant color; yet only horizontal and vertical wipes were considered in their work.

From our study of the existing wipe detection methods, we notice that many of them suffer from some or all of the following problems: 1) inability to detect various wipe effects simultaneously; 2) difficulty in distinguishing wipes from motions, which could cause miss detections and false alarms; 3) difficulty in locating the frame range of wipes accurately. One major reason for the failure of existing methods to detect various types of wipe effects is that they have not been able to generalize the intrinsic properties of various types of wipes. Any ad hoc method based on a few specific types of wipes would not be able to detect a variety of wipes.

In this paper, we propose a generic method to detect different wipe transitions. In the proposed approach, wipes are distinguished from object/camera motions and other gradual transitions using two common properties: independence and completeness. The property of independence means that every pixel will change its value only once, whereas the property of completeness means that all pixels will have their values changed after the completion of an ideal wipe transition. We formulate a cost function based on the two properties to measure a given sequence’s deviation from an ideal

LI AND LEE: EFFECTIVE DETECTION OF VARIOUS WIPE TRANSITIONS


Fig. 3. Wipe transition sequences and their scene change regions between consecutive frames. (a) Inset wipe from top-left to bottom-right. (b) Checker wipe from left to right.

wipe. The frame boundary of a wipe is found when the cost function reaches its local minimum. In order to address the ill-posed threshold determination problem, we use Bayes rule to statistically estimate an adaptive threshold for wipe detection. The rest of the paper is organized as follows. Section II characterizes a wipe transition. Section III presents the model of wipe detection based on the generalized properties of different wipe effects. Section IV introduces the detection of a special type of wipes known as motion wipes. Experiment results are presented in Section V. Section VI concludes the paper.
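As an illustration of the independence and completeness properties introduced above, the following Python sketch builds a synthetic, motionless left-to-right wipe and a linear dissolve, then counts how often each pixel changes between consecutive frames. The helper names (`make_horizontal_wipe`, `change_counts`, and the two property checks), the toy frame size, and the use of exact frame differencing are assumptions made for illustration only; the detector described in this paper works on motion-compensated blocks rather than raw pixel differences.

```python
import numpy as np

def make_horizontal_wipe(shot1, shot2, num_frames):
    """Ideal sharp-border wipe: frame t shows shot2 in the columns swept so far."""
    width = shot1.shape[1]
    frames = [shot1.copy()]
    for t in range(1, num_frames + 1):
        border = round(width * t / num_frames)  # border position at frame t
        frame = shot1.copy()
        frame[:, :border] = shot2[:, :border]
        frames.append(frame)
    return frames

def change_counts(frames):
    """Per pixel, count how many consecutive-frame pairs show a value change."""
    counts = np.zeros(frames[0].shape, dtype=int)
    for prev, cur in zip(frames, frames[1:]):
        counts += (prev != cur)
    return counts

def satisfies_independence(counts):
    """Independence: no pixel changes more than once."""
    return bool(np.all(counts <= 1))

def satisfies_completeness(counts):
    """Completeness: every pixel has changed by the end of the sequence."""
    return bool(np.all(counts >= 1))

# Two flat toy shots (hypothetical data).
shot1 = np.zeros((8, 12), dtype=np.uint8)
shot2 = np.full((8, 12), 255, dtype=np.uint8)

wipe_counts = change_counts(make_horizontal_wipe(shot1, shot2, num_frames=6))

# Linear dissolve between the same two shots: every pixel changes at every step.
dissolve = [np.round(shot1 + (shot2.astype(float) - shot1) * t / 6) for t in range(7)]
dissolve_counts = change_counts(dissolve)

print("wipe:", satisfies_independence(wipe_counts), satisfies_completeness(wipe_counts))
print("dissolve:", satisfies_independence(dissolve_counts), satisfies_completeness(dissolve_counts))
```

On this toy data the wipe changes every pixel exactly once, whereas the dissolve changes every pixel at each step, so only the wipe passes both checks.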

TABLE I COMPARISON OF DIFFERENT EFFECTS

II. CHARACTERIZING A WIPE TRANSITION

Suppose a wipe transition spreads over $N$ frames. Denote the scene change region between frames $t-1$ and $t$ as $R_t$; $S_1$ and $S_2$ represent the current shot and the next shot, respectively. An ideal wipe sequence $S$ (i.e., during an ideal wipe transition, the two shots are assumed to be motionless) can be modeled as

$$S(x, y, t) = \begin{cases} S_2(x, y), & (x, y) \in \bigcup_{i=1}^{t} R_i \\ S_1(x, y), & \text{otherwise} \end{cases} \tag{1}$$

where $S(x, y, t)$ is the pixel intensity at position $(x, y)$ in frame $t$, $t = 1, \ldots, N$, and $N$ is the total number of frames in the sequence. $R_t$ is computed as

$$R_t = \{(x, y) \mid S(x, y, t) \neq S(x, y, t-1)\} \tag{2}$$

Equations (1) and (2) indicate that in frame $t$, the pixels whose values have been changed so far in the sequence belong to shot $S_2$, and the remaining pixels belong to shot $S_1$. Fig. 3 illustrates different wipe effects and their scene change regions between consecutive frames. From Fig. 3, two observations can be made.

1) In an ideal wipe transition, any two scene change regions $R_i$ and $R_j$ from the set $\{R_1, \ldots, R_N\}$ do not overlap (i.e., they are mutually exclusive). This characteristic is referred to as the property of independence in this paper.
2) The union of all the elements in $\{R_1, \ldots, R_N\}$ should cover the entire frame. This characteristic is referred to as the property of completeness.

The independence property indicates that every pixel changes its value only once during an ideal wipe transition, and the completeness property indicates that all pixels will have their values changed after the completion of a wipe transition. It is essential that camera/object motions and other gradual transitions, such as dissolves and fades, do not possess the above two properties (see Table I). With camera motions, the pixels keep changing their values, whereas during object motion, some pixels keep changing and others remain unchanged. In dissolves or other gradual transitions [see Fig. 1(b)], the pixels change constantly and gradually from shot $S_1$ to shot $S_2$. Cuts [see Fig. 1(a)] do satisfy both properties in a sense. However, all pixels change abruptly in one single frame, so cuts can be easily distinguished from wipes.

Note that in an ideal wipe, both $S_1$ and $S_2$ are assumed to have no motion. In practice, the scene change parts could be sensitive to motions. To make the proposed properties valid for a real wipe, it is essential that $R_t$ represents the real scene change regions between two frames. That is, for a given frame within


Fig. 5. Pixel trajectory during a thick-border wipe. (a) The moving border jumps over the pixel. (b) The moving border passes through the pixel.

Fig. 4. General wipe detection algorithm.

a sequence, we should be able to locate the regions whose contents have been changed from the previous frame, despite the existence of motions in the video shots.

III. PROPOSED WIPE DETECTION METHOD

A. Outline of the Detection Model

Based on the properties of independence and completeness, a wipe detection method is proposed in this section to detect various wipe effects and their frame ranges. Two major steps are involved in the proposed wipe detection model. First, for a video segment between each pair of shot cut boundaries, we find any potential wipe sequences. This involves subsampling the original video sequence, detecting scene change regions, and locating the frame boundaries of the potential wipe sequence. The second step applies statistical inference to each potential wipe to identify whether or not it is a wipe. Fig. 4 outlines the overall algorithm for detecting wipes between two consecutive cut boundaries. In the following sections, we explain the different steps of the algorithm in detail.

B. Sequence Subsampling

Wipe transitions usually involve one or several moving borders. The ideal wipe effects have sharp borders (i.e., the borders are of zero thickness). However, some wipes may have thick borders, such as the effect shown in Fig. 2(c). In thick-border wipes, pixels may behave in two different ways. In one case, the moving border jumps over some pixels, and the two proposed properties are satisfied. In the other case, the moving border passes through the pixels, and the independence property is violated. Fig. 5 shows the trajectory of a pixel’s value in the two situations.

From Fig. 5(b), we notice that during a thick-border wipe, the independence property is violated as a pixel’s value could change more than once (i.e., $k$ times). $k$ is related to the moving border’s thickness and moving speed as given in (3).

Subsampling the wipe sequence could reduce the violation. If one frame is sampled from every $k$ frames, the pixel would change its value at most twice during a wipe transition; with a shorter sampling interval, the pixel value could change more times. Compared to the size of a video frame, the border thickness is relatively small. Statistics based on a large number of videos suggest that the thickness of a wipe border is usually smaller than 10% of the frame length, and that a wipe usually spans 15–30 frames. Substituting these figures into (3) bounds $k$, and the sampling interval is set accordingly. It is clear that a sampled sharp-border wipe sequence also satisfies the properties of independence and completeness. Therefore, we can first subsample all video sequences before performing wipe detection.

C. Robust Detection of Scene Change Regions

The scene change region computed from direct comparison of pixel-based features is sensitive to motions. To solve this problem, we make use of the motion vectors that already exist in MPEG video streams. In MPEG-1/MPEG-2 formats, the motion vectors of each macro block can be rebuilt from I, B, and P frames. Most motion vector reconstruction (i.e., video decoders) can find a position $(x', y')$ in the previous frame that corresponds to the position $(x, y)$ in the current frame. Intra-coded macro blocks, which carry no motion vector, are assigned a corresponding position directly. If the two macro blocks are quite dissimilar, the position $(x, y)$ at the current frame can be declared as a scene change region. Equation (2) thus becomes (4), in which a threshold is used to filter out small differences. We use statistics to estimate the threshold, which adapts to the video sequence itself. We expect that samples of the same scene blocks


Fig. 6. Detection of scene change regions. (a) Test sequence: wipe transition starts from frame 7, and ends at frame 28. (b) Detected scene change regions between consecutive frames. White dots represent scene change blocks.

are likely to display normal noise distributions [16]. The distribution of the same block’s value over time is interpreted as a Gaussian model (5). The model is specified by a mean and a covariance matrix; the covariance is expressed through an $n \times n$ scalar matrix, where $n$ is the size of the square block. Each block is recursively updated to combine information of the most recent frame with the knowledge contained in the current parameters of the Gaussian model and in the prior information. Suppose that the motion vector of a block at frame $t$ points to a block at frame $t-1$; the model of the former block is then updated as in (6), where the learning rate is proportional to the posterior probability that the current block has matched the distribution of its motion-compensated predecessor, given the previous observations. The updating operation between every two frames allows the detector to compensate for noise. A nonmatch is found if the current block’s value is beyond 2 standard deviations of that distribution. The block at the current position is then considered as a scene change region. Note that if a block in the current frame appears for the first time, the Gaussian model of the block can be initialized as in (7), using the set of neighboring blocks that have been updated from previous frames.

Detection of the scene change regions over a wipe sequence is shown in Fig. 6. To reduce noise, Gaussian blurring of size [5 5] was applied to the detected scene change regions. Most of the small isolated white dots in Fig. 6(b) were removed after forcing the blurred image of scene change regions to be binary.

D. Locating Potential Wipe Sequences Using a Cost Function

As mentioned in Section II, motions and other gradual transition types such as dissolves and fades do not satisfy the proposed properties. A cost function (8) is formulated based on the accumulated scene change regions to measure the deviation of a given sequence $s$ from ideal wipe behaviors. It is computed from the number of times that each region has changed and the total number of regions in a frame; its tuning factor is set to 0.5 by default. The first component of (8) measures a video sequence’s deviation from the independence property. Its value is large when a region has changed many times, and is 0 when the region has changed 0 or 1 times. The latter component measures the deviation from the completeness property; its value is 0 when the number of changes is nonzero. The cost function thus penalizes regions that did not change at all and those that changed more than once. The cost function can be used to locate potential wipe sequences having small deviations.

Assuming that the cut boundaries have already been found using any existing detection method (e.g., [9]), we aim to find


Fig. 7. Boundary determination. (a) Statistics of change regions between two consecutive frames. The accumulation process is triggered at frame 7. (b) Cost function curve, starting at frame 7 and ending at frame 28, where the cost function value reaches its minimum.

potential wipe sequences between each pair of cut boundaries. Since wipe effects usually result in a significant number of scene change regions between consecutive frames, the ratio of the number of detected change regions to the total number of regions in a frame can be used to trigger the accumulation of the change regions. Once the ratio exceeds a threshold $T$, the current frame is considered as the start frame of the potential wipe. The end frame of the potential wipe sequence is found when the cost value reaches a minimum within the window, or when the ratio becomes smaller than the threshold $T$. The window size can be chosen as the shortest duration of a wipe in a video; as a wipe usually spans 15–24 frames, the window size is set accordingly in our work. Fig. 7 evaluates the cost value of the input video sequence shown in Fig. 6. The accumulation of scene change regions is triggered at frame 7. The minimum of the cost function is obtained at frame 28, showing that the cost function reaches its minimum at the wipe boundary.

The potential sequence obtained so far may not be a wipe, since other sequences with falsely detected start frames may also reach a local minimum of the cost function. We therefore need to check whether the potential wipe is a real wipe.
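The triggering and boundary search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the exact form of the cost in (8) is not reproduced here, so `wipe_cost` uses an assumed stand-in that merely has the described behavior (zero independence penalty for 0 or 1 changes, zero completeness penalty for any nonzero count, tuning factor 0.5). The function names, the `trigger_ratio` and `window` parameters, and the toy change masks are all hypothetical.

```python
import numpy as np

def wipe_cost(counts, tuning=0.5):
    """Assumed stand-in for the cost function: mean per-region penalty."""
    counts = np.asarray(counts)
    independence_penalty = np.maximum(counts - 1, 0)      # 0 for 0 or 1 changes
    completeness_penalty = (counts == 0).astype(float)    # 0 once a region changed
    per_region = tuning * independence_penalty + (1.0 - tuning) * completeness_penalty
    return float(per_region.mean())

def locate_potential_wipe(change_masks, trigger_ratio, window):
    """Return (start, end, cost) as indexes into change_masks, or None.

    change_masks[i] is a boolean array marking the regions that changed
    between frame i and frame i + 1.
    """
    ratios = [float(np.mean(mask)) for mask in change_masks]
    start = next((i for i, r in enumerate(ratios) if r > trigger_ratio), None)
    if start is None:
        return None
    counts = np.zeros(np.shape(change_masks[0]), dtype=int)
    best_cost, best_end = float("inf"), start
    for i in range(start, len(change_masks)):
        if ratios[i] <= trigger_ratio:       # change activity dropped below threshold
            break
        counts += np.asarray(change_masks[i], dtype=int)
        cost = wipe_cost(counts)
        if cost < best_cost:
            best_cost, best_end = cost, i
        elif i - best_end >= window:         # no lower cost found within the window
            break
    return start, best_end, best_cost

# Toy sequence over 10 regions: 2 static steps, a 5-step wipe, then 3 steps
# in which every region changes (as with strong camera motion).
masks = [np.zeros(10, dtype=bool) for _ in range(2)]
for k in range(5):
    mask = np.zeros(10, dtype=bool)
    mask[2 * k:2 * k + 2] = True
    masks.append(mask)
masks += [np.ones(10, dtype=bool) for _ in range(3)]

result = locate_potential_wipe(masks, trigger_ratio=0.1, window=2)
print(result)
```

In this toy run the accumulation starts at the first step with visible changes, and the cost falls to zero exactly when the last pair of regions has been swept; the later all-region changes only raise the cost, so the search stops after the window elapses.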

E. Wipe Transition Detection

To identify whether a potential wipe sequence is a wipe, we apply a detection method similar to the work in [7], treating the detection problem as a statistical decision process. More precisely, the detection process can be regarded as statistical inference on two hypotheses.

$H_1$: the potential wipe sequence is a real wipe sequence.
$H_0$: the potential wipe sequence is not a real wipe sequence.

Given the cost value $C(s)$ of the sequence $s$, the optimal decision can be made by comparing the conditional probabilities $P(H_1 \mid C(s))$ and $P(H_0 \mid C(s))$: if $P(H_1 \mid C(s)) > P(H_0 \mid C(s))$, choose $H_1$; else choose $H_0$. Using the Bayes rule, the decision can equivalently be based on checking the following inequality:

$$\frac{p(C(s) \mid H_1)}{p(C(s) \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P(H_0)}{P(H_1)} \tag{9}$$

The likelihood functions $p(C(s) \mid H_1)$ and $p(C(s) \mid H_0)$ are estimated from the training sequences, by calculating the distributions of the cost values of wipe sequences and nonwipe sequences (i.e., regular shots), respectively. Since there is no prior parametric modeling of the distributions of the likelihood functions, they must be estimated nonparametrically, directly from the database of examples, by computing the histograms of the cost values within wipes and nonwipe shots, respectively. In this work, we derive the ideal distributions using Parzen windows. Two kernel functions with exponential forms, one for wipes and one for nonwipe sequences, are used to estimate the two likelihoods [5]; they are given in (10), with estimated parameters.

The right-hand side of (9) defines an adaptive threshold for decision making based on the two hypotheses. In general, $P(H_1)$ is considered as an a priori probability; it is related to the number of frames elapsed since the last shot cut boundary. In order to integrate the context information of the specific shot


Fig. 8. Example sequences with motion wipes: (a) Vertical motion wipe (type i). (b) Horizontal motion wipe (type ii). (c) Push (type iii).

transition type, Hanjalic et al. [2], [7] suggested modifying the a priori probability using a conditional probability given the context. The context is a set of characterizing events; the resulting expression is (11).

Studies on a large number of video shots suggest that the a priori probability can be modeled as the cumulative probability of a Poisson function [7], given in (12). Its arguments are the distance between the last shot cut boundary and the first frame of the potential wipe sequence, and the average shot length in the training dataset.

The context term is defined as the product of the conditional probabilities of different events. Two events can be derived from two observations.

1) A wipe usually involves 15–24 frames (0.5–0.8 s in 30 fps videos); so a much shorter or longer sequence is less likely to be a real wipe.
2) The first frame and the last frame of a potential wipe sequence should be much different, assuming that the wipe transition involves two shots with different contents.

Regarding the first event, its probability could be considered as a function of the length of the examined sequence. To ensure that the value of the function changes smoothly with different lengths, the probability function is formulated as a Gaussian error function (13). Its parameters are estimated from the training dataset, by making the average probability values of all wipes in the training dataset larger than 0.5. The parameters can be found experimentally to optimize the detection performance on the training datasets; (10, 2) was selected from our training database.

Regarding the second event, its probability should be inversely proportional to the difference between the two frames, as expressed in (14). Equation (14) involves the first frame and the last frame of the potential wipe sequence, the histogram-based difference between the two frames, and a normalization factor, defined as the mean value of the histogram-based difference over the segment between the last shot cut boundary and the first frame of the examined sequence.

IV. DETECTION OF MOTION WIPES
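Before turning to motion wipes, the statistical verification step of Section III-E can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the authors' detector: a Gaussian kernel is swapped in for the exponential-form kernels of (10), the contextual prior of (11) to (14) is passed in as a plain number (`prior_wipe`) instead of being computed, and the training cost values, the bandwidth, and all function names are invented for the example.

```python
import math

def parzen_density(x, samples, bandwidth):
    """Parzen-window density estimate at x (Gaussian kernel: an assumption)."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

def is_wipe(cost, wipe_costs, nonwipe_costs, prior_wipe, bandwidth=0.05):
    """Bayes decision: accept the wipe hypothesis when
    p(cost | wipe) * P(wipe) > p(cost | nonwipe) * P(nonwipe).

    This is the likelihood-ratio test against the prior ratio, written as a
    product so that a zero likelihood never causes a division by zero.
    """
    likelihood_wipe = parzen_density(cost, wipe_costs, bandwidth)
    likelihood_nonwipe = parzen_density(cost, nonwipe_costs, bandwidth)
    return likelihood_wipe * prior_wipe > likelihood_nonwipe * (1.0 - prior_wipe)

# Hypothetical training cost values for wipe and nonwipe sequences.
wipe_costs = [0.02, 0.05, 0.08, 0.04]
nonwipe_costs = [0.35, 0.50, 0.42, 0.60]

print(is_wipe(0.05, wipe_costs, nonwipe_costs, prior_wipe=0.3))  # low cost
print(is_wipe(0.45, wipe_costs, nonwipe_costs, prior_wipe=0.3))  # high cost
# The same borderline cost is accepted or rejected depending on the prior,
# which is what makes the threshold adaptive.
print(is_wipe(0.20, wipe_costs, nonwipe_costs, prior_wipe=0.5))
print(is_wipe(0.20, wipe_costs, nonwipe_costs, prior_wipe=0.1))
```

The last two calls show the design choice at work: the decision boundary on the cost value is not fixed in advance but moves with the prior probability supplied for each candidate sequence.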

The wipe transitions mentioned so far are assumed to have no foreground or background movement (i.e., the two involved shots are not moved globally). This type of wipe, referred to as motionless wipe, is the most common form of wipe transition. However, there is another type of wipe, known as motion wipe, where the two shots involved may move in or move out during the transition. According to the movement of the two shots, motion wipes can be classified into three types: 1) the destination frame gradually moves in to replace the initial frame; 2) the initial frame gradually moves out to uncover the destination frame; or 3) the destination frame moves in while the initial frame moves out. Fig. 8 shows some sample sequences of motion wipes. The detection of the former two types of effects is similar to the detection of motionless wipes as introduced in Section III, with a small modification to the calculation of scene change regions. The third type of motion wipe [see Fig. 8(c)] resembles camera panning and is, therefore, difficult to distinguish from a video sequence with camera panning. One possible solution may involve global motion detection and vertical/horizontal line detection. In this paper, we aim to detect the first two types of wipes, leaving the third type to future work.

As motion wipes could also have thick/blurred borders, the subsampling technique introduced in Section III-B can also be applied to the video sequence before performing further detection steps. Using a method similar to the one introduced in Section III-C, we can find scene change regions between two


TABLE III TEST VIDEO DATASET

Fig. 9. Entering and exiting regions from two motion wipes. (a) Exiting regions extracted from Fig. 8(a). (b) Entering regions extracted from Fig. 8(b).

TABLE II TRAINING VIDEO DATASET

adjacent frames. In this way, exiting regions and entering regions can be obtained by finding, respectively, the regions disappearing from the previous frame and the new regions entering the next frame. Exiting regions can be used to detect type i motion wipes, while entering regions can be used to detect type ii motion wipes. Fig. 9 shows the entering and exiting regions extracted from the two motion wipe sequences shown in Fig. 8(a) and (b), respectively. The white parts indicate the exiting regions or entering regions. Fig. 9 shows that the exiting regions and the entering regions in their two respective video sequences satisfy both the independence and completeness properties. Regarding the exiting or entering regions as the scene change regions between two adjacent frames, motion wipes can be detected using the same statistical detection method proposed in Section III-D, except that the cost values are now calculated from the exiting regions and the entering regions, respectively. If either one of the two cost values is detected as a wipe by the statistical detector, a motion wipe can be declared.

V. EXPERIMENTS

A. Test Data

In the experiments, we use a large amount of video data digitized from TV programs and the TREC05 video repository. All videos were captured at 30 fps, with a pixel resolution of 640 × 480. All the selected video datasets contain considerable camera/object motions and transition effects such as cuts, fade in/out, and dissolves. The parameters of the proposed method were estimated by using different training video datasets as outlined in Table II. The proposed method was then tested on a collection of video clips with a total length of 178 min. The test video data, as summarized in Table III, contain 34 different wipe shapes, including horizontal/vertical/diagonal wipes, Venetian, checker,

Fig. 10. Sensitivity of the proposed method with respect to the introduced threshold T. (a) P–R comparison. (b) Frame-based P–R comparison.

paint splatter, random blocks, iris, sliding, etc. Among the 180 motionless wipes, there are 138 sharp-border wipes and 42 thick-border wipes. Among the 71 motion wipes, there are 60 sharp-border wipes and 11 thick-border wipes. Other than wipes, there are a total of 1216 cuts, 97 dissolves/fades, and 63 other unclassified gradual transitions.

B. Measurement of Detection Performance

The detection performance is measured in precision $P$ and recall $R$. The frame-based precision $P_f$ and recall $R_f$ are also used to further evaluate the “quality” of each correct detection. They are defined, respectively, as

$$P = \frac{|W \cap W_d|}{|W_d|}, \quad R = \frac{|W \cap W_d|}{|W|}, \quad P_f = \frac{|F \cap F_d|}{|F_d|}, \quad R_f = \frac{|F \cap F_d|}{|F|} \tag{15}$$

where $W$ is the set of wipes in the video, $W_d$ is the set of all detected wipes, and $W \cap W_d$ is the set of correctly detected wipes; $F$ is the set of frames in the wipe, $F_d$ is the set of all detected frames, and $F \cap F_d$ is the set of correctly detected frames of the wipe.
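The two pairs of measures can be computed with the same set arithmetic, as the following Python sketch shows. The function name `precision_recall`, the wipe identifiers, and the frame ranges are hypothetical values chosen for illustration; in particular, the sketch treats a detection as correct when its identifier matches, and does not model any overlap criterion for deciding correctness.

```python
def precision_recall(actual, detected):
    """Precision and recall between a ground-truth set and a detected set.

    Works for sets of wipes (event-based rates) and for sets of frame
    indexes belonging to one wipe (frame-based rates).
    """
    actual, detected = set(actual), set(detected)
    correct = actual & detected
    precision = len(correct) / len(detected) if detected else 0.0
    recall = len(correct) / len(actual) if actual else 0.0
    return precision, recall

# Event-based rates: four true wipes, five detections, three of them correct.
true_wipes = {"w1", "w2", "w3", "w4"}
detected_wipes = {"w1", "w2", "w3", "x1", "x2"}
event_precision, event_recall = precision_recall(true_wipes, detected_wipes)

# Frame-based rates for one correctly detected wipe: the true wipe covers
# frames 7..28 and the detection covers frames 9..30 (hypothetical ranges).
true_frames = range(7, 29)
detected_frames = range(9, 31)
frame_precision, frame_recall = precision_recall(true_frames, detected_frames)

print(event_precision, event_recall)
print(frame_precision, frame_recall)
```

Keeping the event-based and frame-based rates separate makes it possible to tell a detector that finds most wipes but places their boundaries loosely from one that finds fewer wipes with tight boundaries.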


TABLE IV MOTIONLESS WIPE DETECTION RESULTS

TABLE V MOTION WIPE DETECTION RESULTS

TABLE VI AVERAGE DETECTION PERFORMANCE (100%)

TABLE VII DETECTION STATISTICS FOR SHARP-BORDER AND THICK-BORDER WIPES

C. Results

The threshold $T$ is used in the method to activate the wipe detection process. It is tuned step by step and reaches its optimum value when the tradeoff between precision and recall is most balanced. The optimization is initially carried out using the training dataset summarized in Table II, from which the optimum value is obtained. Fig. 10 shows the change in the average precision, recall, frame-based precision, and frame-based recall rates for the test data, where $T$ is varied to evaluate the sensitivity. It is seen from Fig. 10 that a small change in $T$ has only a slight effect on performance.

Using the optimum $T$, we evaluated the proposed method on the real video datasets summarized in Table III. The detection results for motionless and motion wipes are shown in Tables IV and V, respectively. Similar to the work in [12], in our implementation, any partial detection of more than 10% of the frames in a wipe is considered as a correct detection. The average performance results are summarized in Table VI. The results presented in Tables IV–VI are for both sharp- and thick-border wipes, whereas Table VII shows the performance statistics for sharp- and thick-border wipes separately.

As shown in Table VI, we obtain relatively high and well-balanced precision and recall rates. For motionless wipes, the


average precision rate and recall rate are 93.4% and 94.4%, respectively. For motion wipes, the average precision rate and recall rate are 90.0% and 88.7%, respectively. The high precision rates indicate that the proposed method could be quite robust to various transition effects (other than wipes) and camera/object motions in the test data. The high recall rates, on the other hand, suggest that our method is effective in detecting a good diversity of wipe patterns. In the test video datasets, there are more than 30 types of shape effects associated with the 180 wipes. The detection results for these wipes show that the proposed method is not restricted to wipes with any regular (linear) or rigid changing patterns. Concerning the frame accuracy of the correct detections, our method can locate the frame range of the wipe sequences satisfactorily, with the average frame-based precision rate of 89.1% and recall rate of 93.3% for motionless wipes. Regarding the overall detection performance, it can achieve an average precision rate of 87.1% and an average recall rate of 88.9% for motion wipes. The slightly lower detection performance for the motion wipes could have been caused by the global movements in motion wipes which tend to bring more noise to the detection process. There are a few miss detections and false alarms in the detection results. Most false alarms are found in the sequences where the object/camera motion patterns are similar to wipe effects. For example, in one program in CCTV 4 news, a flag sweeping over the frame was wrongly detected as a wipe [see Fig. 11(a)]. The proposed method can miss some wipes where the two involved shots are within the same scene, such as a wipe involving two video shots with similar contents within a sports field in a football match [see Fig. 11(b)]. Some other miss detections could be caused by very fast object or camera motions in the wipes. 
In this situation, the fast motions cannot be detected by the motion estimation technique used in our detection method. The proposed method will then wrongly identify a considerable number of scene change regions between two consecutive frames.

The time complexity of the proposed motion wipe or motionless wipe detection method is expressed in terms of the frame size and the length of the video segment between two cut boundaries. In experiments carried out on a Pentium-R 2.79-GHz machine with 512 MB of memory, the average time for wipe detection is around 0.033 s/frame for video sequences with a 640 × 480 frame size.

D. Performance Comparison

As stated in Section III-E, a cut-off threshold for the cost values can be predefined in order to decide whether a potential wipe sequence is a wipe, without using the proposed statistical detection technique (SDT). We performed wipe detection experiments by manually setting this threshold to different values, and the two methods are compared in Table VIII. The threshold was tuned between 0 and 1 at an interval of 0.025, and we report the setting with the optimal performance (i.e., where the tradeoff between precision and recall is most balanced). Note that using different values of the threshold does not affect the frame accuracy of the detections; therefore, the two frame-accuracy measures are not included in Table VIII.

The results shown in Table VIII suggest that applying SDT in wipe verification could greatly improve the detection performance. The proposed method achieves a 91.7% average precision rate and a 91.6% average recall rate, outperforming the method using the predefined threshold, which achieves an average performance no better than around 83.7% precision and 82.7% recall. By incorporating a priori knowledge and contextual information of wipes, the proposed method with SDT can adaptively compute a threshold for wipe verification and can therefore improve the wipe detection performance given the input information of a potential wipe sequence. The computational complexity of wipe detection with or without SDT is the same, since applying the SDT takes only constant time. Moreover, in our implementation, the processing time of the proposed method using SDT has been shortened by computing the prior probabilities offline.

Fig. 11. (a) Example of false alarm. (b) Example of miss detection.

TABLE VIII PERFORMANCE OF PROPOSED DETECTION METHOD WITH AND WITHOUT SDT

TABLE IX COMPARISON OF MOTIONLESS WIPE DETECTION METHODS

TABLE X COMPARISON OF MOTION WIPE DETECTION METHODS

We also performed comparison experiments using several benchmark wipe detection methods. The first approach used for comparison is the statistical method (SM) [1], which exploits the linear change in the means and the variances of the frames in the wipe region. The threshold set used in the method is obtained experimentally so as to give the best results on average. The second approach is the method proposed in [3], referred to as the trajectory method (TM). In this case, a moving strip is obtained by binarizing the difference image between two consecutive frames. A temporal window is used in our implementation to monitor the trajectory of the moving strip. The B-spline interpolation curve fitting technique (BICF) presented in [12] is also used for comparison. In this method, six time-evolving signals are calculated from projections of direction-emphasized frames on four angles. Wipe regions are found when any one of the six signals is fitted with a B-spline interpolated function with small errors. Finally, the motion vector based

method (MVM) proposed in [15] is used for comparison to evaluate the performance of a technique that explicitly utilizes motion vector fields in the compressed domain. The motionless wipe detection results of the above four methods and our proposed method are presented in Table IX. Among the four methods, SM and MVM can detect motion wipes. The motion wipe detection results of SM, MVM and our proposed method are shown in Table X.

In motionless wipe detection, the proposed method achieved the best performance in terms of precision and recall. It managed to reduce the false alarms generally caused by other types of gradual transitions and camera/object motions. On the other hand, SM raised a large number of false alarms (193), which could have been caused by the intensive object/camera motions in the videos. MVM also produced a significant number of false alarms (35). TM and BICF produced a good number of misses (61 by TM, and 44 by BICF); their inferior performance could have been caused by their design for detecting only several types of wipe effects. In addition, MVM only checks for the property of completeness, which accounts partially for its weak performance, since fast motions and other effects could also gradually change the contents of many blocks in the video frame, leading to false alarms. The imprecise motion vectors in the compressed domain could also cause weak detection performance.

In motion wipe detection, the proposed method also outperforms the other two methods in terms of precision and recall


rate. From Table X, MVM can also detect motion transitions effectively, yet its precision rate is significantly lower than that of the proposed method. The statistical method has the worst precision rate (26.6%).

When measuring the detection performance based on the accuracy of the detected frame range of wipes, the proposed method also achieved satisfactory results; its good performance could be attributed to its application of both the independence property and the completeness property in determining the wipe boundaries. SM, TM and BICF also detected the wipe boundaries with good accuracy. The use of a predefined threshold to terminate the detection process in MVM may explain why it detected less accurate wipe positions in both motionless and motion wipe detection.
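The evaluation protocol used in this section can be made concrete with a small sketch. The Python code below is only an illustration under stated assumptions, not the authors' evaluation code. The helper names (`overlap`, `evaluate`, `pick_threshold`), the inclusive frame-range representation, the one-to-one matching of detections to wipes, the pooling of frame-based rates over correct detections, the rule that a candidate is accepted when its cost does not exceed the threshold, and the reading of "most balanced" as maximizing the smaller of precision and recall are all assumptions. The only details taken from the text are that a detection covering more than 10% of the frames in a wipe counts as correct, and that the cost threshold is swept from 0 to 1 at an interval of 0.025.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (first_frame, last_frame); assumed representation


def overlap(a: Interval, b: Interval) -> int:
    """Number of frames shared by two inclusive frame ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def evaluate(truth: List[Interval], detected: List[Interval],
             min_cover: float = 0.10) -> Tuple[float, float, float, float]:
    """Return (precision, recall, frame_precision, frame_recall).

    A detection is correct if it covers more than `min_cover` of the frames
    of a not-yet-matched ground-truth wipe (10% rule from the text).
    Frame-based rates are pooled over correct detections only (assumption).
    """
    matched = set()
    correct = hit_frames = det_frames = true_frames = 0
    for d in detected:
        best = None  # (truth index, overlap in frames)
        for i, t in enumerate(truth):
            if i in matched:
                continue
            ov = overlap(d, t)
            if ov > min_cover * (t[1] - t[0] + 1) and (best is None or ov > best[1]):
                best = (i, ov)
        if best is not None:
            i, ov = best
            matched.add(i)
            correct += 1
            hit_frames += ov
            det_frames += d[1] - d[0] + 1
            true_frames += truth[i][1] - truth[i][0] + 1
    precision = correct / len(detected) if detected else 0.0
    recall = correct / len(truth) if truth else 0.0
    frame_precision = hit_frames / det_frames if det_frames else 0.0
    frame_recall = hit_frames / true_frames if true_frames else 0.0
    return precision, recall, frame_precision, frame_recall


def pick_threshold(candidates: List[Tuple[float, bool]], step: float = 0.025) -> float:
    """Sweep a cost threshold over [0, 1] and return the most balanced setting.

    `candidates` holds (cost, is_true_wipe) pairs for potential wipes.
    A candidate is accepted when cost <= threshold (assumption), and
    "most balanced" is read as maximizing min(precision, recall) (assumption).
    """
    n_true = sum(1 for _, is_wipe in candidates if is_wipe)
    best_t, best_score = 0.0, -1.0
    for k in range(int(round(1.0 / step)) + 1):
        t = k * step
        accepted = [is_wipe for cost, is_wipe in candidates if cost <= t]
        tp = sum(accepted)
        p = tp / len(accepted) if accepted else 0.0
        r = tp / n_true if n_true else 0.0
        if min(p, r) > best_score:
            best_t, best_score = t, min(p, r)
    return best_t
```

For instance, with ground-truth wipes at frames 100 to 119 and 300 to 329, and detections at 105 to 124, 500 to 510, and 301 to 302, this sketch counts one correct detection: the second detection overlaps no wipe, and the third covers only 2 of 30 frames, which is below the 10% rule. That gives a precision of 1/3, a recall of 1/2, and frame-based precision and recall of 0.75 each.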

VI. CONCLUSION

This paper proposes an effective method for detecting different types of wipe effects. We have modeled a wipe based on its unique properties and distinguished wipes from motions and other gradual transitions such as dissolves and fades. With the introduction of the cost function, we detect the wipe boundaries with good precision by searching for the local minimum of the cost function. The miss detection rate caused by motions is reduced by using a robust method to estimate the scene change regions. The wipe detection performance is optimized by incorporating a priori knowledge and contextual information of wipes in a statistical detection framework. Experimental results on videos of different genres show that the proposed methodology can detect various wipe effects effectively.

For the rare cases of wipes with very fast motions, the proposed method may generate more miss detections because the motion estimation used may not be able to track such motions. We could consider using motion estimation methods involving multiscale processing in our future work. For example, the fast hierarchical matching algorithm proposed in [13] could be combined with the robust scene change regions estimation technique to address the very fast motion problem. Further, the proposed method assumes that a wipe occurs at an inter-scene level. Therefore, the present method could have problems detecting wipe transitions within the same scene; this problem should be addressed in our future work. Nevertheless, wipe effects within the same scene are not so common, since the two shots involved in a wipe usually have distinct color/content layouts in order to convey a clear wipe effect.
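The multiscale idea mentioned above for very fast motions can be illustrated with a generic coarse-to-fine block matching sketch on a mean pyramid. This is not the algorithm of [13] and not part of the proposed method; it is a minimal Python illustration under assumptions: the function names (`mean_pyramid`, `block_sad`, `hierarchical_motion`), the sum-of-absolute-differences matching cost, the three pyramid levels, and the search radius of 2 are all hypothetical choices. The point it shows is that a small search window at the coarsest level already covers a large displacement at full resolution.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale frame as rows of pixel values (assumed format)


def mean_pyramid(img: Image, levels: int) -> List[Image]:
    """Level 0 is the input; each higher level averages 2x2 blocks of the level below."""
    pyramid = [img]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        h, w = len(prev) // 2, len(prev[0]) // 2
        pyramid.append([
            [(prev[2 * y][2 * x] + prev[2 * y][2 * x + 1]
              + prev[2 * y + 1][2 * x] + prev[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)]
            for y in range(h)
        ])
    return pyramid


def block_sad(cur: Image, ref: Image, top: int, left: int, size: int,
              dy: int, dx: int) -> float:
    """Sum of absolute differences between a block of `cur` and the block of
    `ref` displaced by (dy, dx); infinite if the displaced block leaves the frame."""
    if (top + dy < 0 or left + dx < 0
            or top + dy + size > len(ref) or left + dx + size > len(ref[0])):
        return float("inf")
    return sum(abs(cur[top + y][left + x] - ref[top + dy + y][left + dx + x])
               for y in range(size) for x in range(size))


def hierarchical_motion(cur: Image, ref: Image, top: int, left: int, size: int,
                        levels: int = 3, radius: int = 2) -> Tuple[int, int]:
    """Coarse-to-fine motion vector (dy, dx) for one block of `cur`.

    The block at (top, left) in `cur` is assumed to lie inside the frame.
    """
    cur_pyr, ref_pyr = mean_pyramid(cur, levels), mean_pyramid(ref, levels)
    dy = dx = 0
    for lvl in range(levels - 1, -1, -1):  # coarsest level first
        scale = 2 ** lvl
        top_l, left_l, size_l = top // scale, left // scale, max(size // scale, 1)
        best = (float("inf"), dy, dx)
        # Search a small window around the vector predicted by the coarser level.
        for cy in range(dy - radius, dy + radius + 1):
            for cx in range(dx - radius, dx + radius + 1):
                cost = block_sad(cur_pyr[lvl], ref_pyr[lvl],
                                 top_l, left_l, size_l, cy, cx)
                if cost < best[0]:
                    best = (cost, cy, cx)
        dy, dx = best[1], best[2]
        if lvl > 0:  # the next level has twice the resolution
            dy, dx = 2 * dy, 2 * dx
    return dy, dx
```

With three levels and a radius of 2, the coarsest search alone spans displacements of up to 8 pixels at full resolution, whereas a single-level search with the same radius cannot reach beyond 2 pixels.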

REFERENCES

[1] A. M. Alattar, “Wipe scene change detector for use with video compression algorithms and MPEG-7,” IEEE Trans. Consum. Electron., vol. 44, no. 1, pp. 43–51, Feb. 1998.
[2] G. Boccignone, A. Chianese, V. Moscato, and A. Picariello, “Foveated shot detection for video segmentation,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 3, pp. 365–377, Mar. 2005.
[3] P. Campisi, A. Neri, and L. Sorgi, “Wipe effect detection for video sequences,” in Proc. IEEE Workshop Multimedia Signal Process., Dec. 2002, pp. 161–164.
[4] M. S. Drew, Z.-N. Li, and X. Zhong, “Video dissolve and wipe detection via spatio-temporal images of chromatic histogram differences,” in Proc. IEEE Int. Conf. Image Process., Sep. 2000, vol. 3, pp. 929–932.
[5] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. New York: Wiley, 2001.
[6] W. A. C. Fernando, C. N. Canagarajah, and D. R. Bull, “Wipe scene change detection in video sequences,” in Proc. ICASSP, 1999, pp. 294–298.
[7] A. Hanjalic, “Shot-boundary detection: Unraveled and resolved?,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 2, pp. 90–105, Feb. 2002.
[8] H. Kim, J. Lee, and M.-H. Song, “An efficient graphical shot verifier incorporating visual rhythm,” in Proc. IEEE Int. Conf. Multimedia Comput. Syst., Jun. 1999, vol. 1, pp. 827–834.
[9] S. Li and M. C. Lee, “An improved sliding window method for shot change detection,” in Proc. 7th IASTED Int. Conf. Signal and Image Process., Honolulu, HI, Aug. 2005, pp. 246–254.
[10] R. Lienhart, “Reliable transition detection in videos: A survey and practitioner’s guide,” Int. J. Image Graph., vol. 1, no. 3, pp. 469–486, 2001.
[11] S. Mackowiak and M. Relewicz, “Wipe transition detection based on motion activity and dominant colors descriptors,” in Proc. 4th Int. Symp. Image Signal Process. Anal., 2005, pp. 480–483.
[12] J. Nam and A. H. Tewfik, “Detection of gradual transitions in video sequences using B-splines interpolation,” IEEE Trans. Multimedia, vol. 7, no. 4, pp. 667–679, Aug. 2005.
[13] K. M. Nam, J. S. Kim, R. H. Park, and Y. S. Shim, “A fast hierarchical motion vector estimation algorithm using mean pyramid,” IEEE Trans. Circuits Syst. Video Technol., vol. 5, no. 4, pp. 344–351, Aug. 1995.
[14] C. W. Ngo, T. C. Pong, and H. J. Zhang, “On clustering and retrieval of video shots through temporal slices analysis,” IEEE Trans. Multimedia, vol. 4, no. 4, pp. 446–458, Dec. 2002.
[15] S.-C. Pei and Y.-Z. Chou, “Effective wipe detection in MPEG compressed video using macro block type information,” IEEE Trans. Multimedia, vol. 4, no. 3, pp. 309–319, 2002.
[16] C. R. Wren, A. Azarbayejani, T. Darrell, and A. P. Pentland, “Pfinder: Real-time tracking of the human body,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 780–785, 1997.
[17] M. Wu, W. Wolf, and B. Liu, “An algorithm for wipe detection,” in Proc. ICIP, 1998, pp. 893–897.
[18] H. Yu, G. Bozdagi, and S. Harrington, “Feature-based hierarchical video segmentation,” in Proc. IEEE Int. Conf. Image Process., 1997, vol. 2, pp. 498–501.
[19] R. Zabih, J. Miller, and K. Mai, “A feature-based algorithm for detecting and classifying production effects,” Springer-Verlag Multimedia Syst., vol. 7, pp. 119–128, 1999.
[20] H. J. Zhang, A. Kankahalli, and K. Mai, “Automatic partitioning of full-motion video,” ACM/Springer Multimedia Syst., vol. 1, no. 1, pp. 10–28, 1993.

Shan Li (S’07) was born in Jiangxi, China. She received the B.S. and M.S. degrees in computer science from Southwest Jiaotong University, Sichuan, China, in 1999 and 2002, respectively. She is currently working toward the Ph.D. degree in the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong. Her current research interests include video segmentation and retrieval, image processing, and multimedia applications.

Moon-Chuen Lee received the B.Sc. degree from University College London, London, U.K., the M.Sc. degree from the Imperial College of Science and Technology London, London, U.K., and the Ph.D. degree in computer science from the University of London, London, U.K. He is currently an Associate Professor at the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong. His research interests lie in the areas of content-based image retrieval, multimedia applications, network security, and mobile positioning.
