THPM 13.4 Efficient Multiresolution Scene Change Detection by Wavelet Transformation
Zheng-Yun Zhuang, Chiou-Ting Hsu, Herng-Yow Chen, Ming Ouhyoung, Ja-Ling Wu
Communications and Multimedia Laboratory, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan

Abstract
Scene change detection is a primitive operation for a wide range of video applications. Meanwhile, wavelet transformation has been validated as a powerful tool for multiresolution digital signal processing. In this paper, a novel scene change detection algorithm based on wavelet analysis is proposed, and a browsing system is established that performs scene change detection on video streams via multiresolution wavelet analysis.

I. Introduction
Scene change detection can be used to cut a video sequence into content-continuous segments. This helps in taking another view of the video, in video database indexing, and in previewing long videos in a systematic way[4]. Besides, prior knowledge of scene changes also helps to improve the quality of video compression. Human beings can easily sense the scene changes in normal videos; for a computer, however, scene changes can only be treated as dissimilarities between nearby frames in a given video sequence. Previous researchers have contributed to this problem in various ways. To date, scene change detection (also referred to as video shot boundary detection) algorithms can be categorized into six different classes[2]: intuitive pixel-difference-based and edge-tracking-based approaches, histogram-based approaches, statistics-based approaches, intensity-based (compression difference) ones[3], and motion-vector-based algorithms. What we propose here is a multiresolution wavelet-based approach that not only detects scene changes efficiently and accurately, but is also capable of detecting scene changes in real time while playing video, once off-line preprocessing has been done. A user can use the system's default threshold and obtain a scene change hit rate of 95%. If this is not satisfactory, he or she can modify the threshold, play the video again segment by segment, and check manually whether all of the scene changes are proper.

In our algorithm, the Haar wavelet basis (Haar Transform) is adopted to decompose each video frame in the YIQ domain and to extract the meaningful feature coefficients suitable for detecting scene changes[1]. We measure the distance between every pair of nearby frames by this metric. Finally, we set a threshold and pick out the peak values from all the inter-frame distances. As a result, scene changes are easily recognized.

II. Wavelet-Based Scene Change Detection Algorithm
The system operates as described below.

Preprocessing Phase
In the preprocessing phase, the features of each frame, say the i-th one, are extracted by the following steps:

0-7803-3734-4/97 $5.00 © 1997 IEEE

1. Read that frame, F_i.
2. Convert it to a power-of-2 sized image, obtaining C(F_i), where C is the conversion function that will be described later.
3. Apply the 2-D HT (Haar Transform), giving HT_2D(C(F_i)).
4. Retrieve the coefficients we are interested in (by truncation and quantization); these are the characteristic of that frame. We then have
   Characteristic_i = T-Q(HT_2D(C(F_i)))
   where T-Q represents the truncation and quantization function.
5. Augment our scene change database SCDB, originally null, with the characteristic information (also called the signature) from step 4.

Step 2 is needed for feeding data into the HT filter. What must be emphasized here is that the power-of-2 sized image is a cut-off version of the original frame: we simply cut away the border data and keep the center region, instead of scaling the whole image to fit the nearest power-of-2 size. Our conversion strategy makes sense because it is faster than scaling, and because most videos are subjective, with the subject usually centrally located. Step 3 uses the Haar basis to perform a multiresolution wavelet decomposition on the power-of-2 sized image; HT_2D is simply an extension of the 1-D HT. The characteristic of the frame (also called the signature) is entered into our database during step 5.

Dynamic Scene Change Detection
Now the database and the data structures needed are properly constructed. What we do now is traverse all pairs of adjacent tuples in our database in the following manner:
1. Get a pair of nearby tuples in the database for that video. In relational algebra,
   T1 = σ_{Frame# = i}(SCDB), T2 = σ_{Frame# = i+1}(SCDB)
2. Display the frame which the first tuple indicates.
3. Compute the distance between these frames using our metric M, that is, compute

‖ π_characteristic(T1), π_characteristic(T2) ‖_M
4. If the distance measured in step 3 is greater than our threshold T, a scene change occurs.

III. System Implementation
Our system is built on Windows 95 in Visual C++. In order to verify our scene change algorithm and to find misdetections and false alarms, a user can simultaneously observe both the video playback window (c.f. the lower-left window of Fig. 2) and the inter-frame distance graph (c.f. the upper window of Fig. 2). When a scene change is detected during playback, the system stops the video at the segment ending and waits for the next playback activation. If a user is not satisfied with the current segmentation, one can change the threshold, play the video again, and verify the segmentation until one is satisfied.

IV. Performance Evaluation
We analyzed our system's performance on a PC with a Cyrix 6x86-166 CPU and an ET6000 display card, using the five raw video clips listed in Figure 4 as tests. The complexity of the preprocessing phase is proportional to the size of the video, and so is that of the detection phase, since each loops once over every frame in the video. The elapsed time for each of the steps is shown in Fig. 1.

[3] Herng-Yow Chen and Ja-Ling Wu, "A Multi-Layer Video Browsing System," IEEE Transactions on Consumer Electronics, Vol. 41, No. 3, pp. 842-850, August 1995.
[4] Herng-Yow Chen, "Some Issues on the Design and Implementation of a Continuous Media System," Ph.D. proposal, National Taiwan University, 1997.

V. Conclusion and Future Works
What we propose here is a novel wavelet-based approach to scene change detection on video streams. This multiresolution approach not only differs from the existing approaches mentioned previously, but also detects scene changes efficiently and accurately. Meanwhile, our system provides a graceful user interface. Our algorithm is capable of user-interactive video segmentation, whereas existing applications usually are not. Experiments show that our false-alarm and misdetection rates are comparable with those of the available algorithms. Future research will address metric refinement, detecting digital special effects (such as fade-in/fade-out and dissolve), applying other kinds of transformations, considering more related frames instead of only two nearby ones, and developing a snapshot browsing system.

VI. References
[1] Charles E. Jacobs, Adam Finkelstein, and David H. Salesin, "Fast Multiresolution Image Querying," ACM SIGGRAPH '95 Conference Proceedings, pp. 277-286, 1995.
[2] John S. Boreczky and Lawrence A. Rowe, "Comparison of Video Shot Boundary Detection Techniques," Storage and Retrieval for Still Image and Video Databases IV, SPIE 2670, pp. 170-179, 1996.

Figure 2: System overview

Phase: Scene Change Detection
  Step                 Time (sec)
  Database Retrieval   0.0001
  Distance Measuring   0.0015
Figure 1: Elapsed time for each step

Figure 4: Five videos used for testing our system with the default threshold (frames sampled at 20 fps). (SC: scene change sensed by a human; DSC: scene change detected by the computer)

Figure 3: Scene change chart. The values plotted dynamically while the video plays are the inter-frame distances between the characteristics of adjacent frames, measured by our metric.

