Flashlight Detection in Indoor Sport Videos for Highlight Generation
Nunnapus Benjamas1, Nagul Cooharojananone2 and Chuleerat Jaruskulchai1
1 Department of Computer Science, Faculty of Science, Kasetsart University, Bangkok, Thailand. Emails: g4664200, [email protected]
2 Advance Virtual and Intelligent Computing Center (AVIC), Department of Mathematics, Faculty of Science, Chulalongkorn University, Bangkok, Thailand. Emails: [email protected]
ABSTRACT
In this paper, we present a novel method for generating summary highlights of indoor sport videos using flashlight detection and replay segment detection. Replay segments are found by detecting the frames that contain the special shot transitions that sandwich replays. The detected flashlights and replays are then used for efficient summarization of indoor sport videos. The proposed algorithm first detects shot boundaries and flashlights from region color histogram differences. It then identifies replay segments by grouping the detected gradual shot transitions, which span multiple frames. Our algorithm detects flashlights and replays accurately enough to capture inherently important events.
Keywords: Flashlight detection, Indoor sport videos, Video summarization.
1. INTRODUCTION
The growing volume of sport videos calls for effective management. Most viewers prefer to select only the segments that are interesting and suitable for their purposes. A summary can therefore be a short video that highlights the interesting events in a sport video. Many summarization techniques focus on detecting specific highlights using cinematic features, object-based features, text information from closed captions, and audio features. In the literature, cinematic and object-based features are employed to detect key events. Broadcaster-dependent logo transitions that are used before and after replays are detected by Pan et al. [1], while Pan et al. [2] and Kobla et al. [3] use the observation that frame-to-frame differences fluctuate strongly during slow-motion replays. Kobla et al. [3] detect slow-motion replays in sport programs as a feature for sports/non-sports video classification. Rui et al. [4] used the announcer's excited speech and baseball hits for TV baseball programs; Ma et al. [5] proposed an audio attention model to measure the importance curve of an audio track.

Our assumption is that a flashlight, that is, a camera flash, indicates an interesting frame. In indoor sports, photographers capture interesting moments with a flash in order to increase the sharpness and brightness of the subject. We exploit this to select the interesting parts of a video by detecting flashlights. This research applies the idea to the fighting sports domain, specifically to boxing, which is very popular.

Furthermore, we propose an algorithm that detects replays by automatically detecting the gradual shot transitions that immediately precede and follow replay segments. Pan et al. [1] also detect replays by detecting logos in scene transitions: they first determine the logo template, then detect all logo transitions in the video using that template, and finally identify the replay segments. In contrast, we detect the gradual transitions and the associated replay segments directly, which gives a simpler and more efficient detection approach.

In the rest of the paper, Section 2 describes the algorithms for shot boundary detection and flashlight detection, Section 3 presents our experimental results, and Section 4 concludes and outlines future work.
2. THE PROPOSED ALGORITHMS
In this section, we describe the details of our proposed method. The method is composed of two main steps: shot boundary detection and flashlight detection.

2.1 Shot Boundary Detection
Shot boundary detection is usually the first step in video processing. Sports video clips almost always contain both cuts and gradual transitions, such as wipes and dissolves. A cut is an abrupt shot change that occurs in a single frame. A gradual transition is a shot change in which the image of the next shot appears gradually over multiple frames. We use the histogram difference method because histograms are less sensitive to object motion than other methods. In addition, Boreczky and Rowe [6] compared histogram, discrete cosine transform, motion vector, and block matching methods and found that the histogram method was a good trade-off between recall and precision. Our shot detection is therefore based on RGB histogram values. Other researchers try to remove flashlights, because both shot cuts and flashlights cause a large change in histogram difference and flashlights would otherwise be detected as false shot changes. In contrast, our algorithm detects flashlights in order to determine the interesting parts of sport videos.

Our algorithm is adapted from the region histogram algorithm described by Boreczky and Rowe [6]. Our shot detector consists of four steps. First, we set three thresholds (Tcount, Thigh, Tlow). Thigh is the high-difference threshold and Tlow is the low-difference threshold, both applied to the histogram difference of a region between consecutive frames. Tcount is the count threshold, applied to the number of region differences that exceed Thigh or Tlow. Second, we divide each frame into 16 blocks in a 4x4 pattern. Third, histogram differences are computed for each region between consecutive frames. If the number of region differences that exceed Thigh is greater than Tcount, a cut or a high flashlight is assumed; otherwise a low flashlight is assumed. Finally, if the number of region differences that exceed Tlow is greater than Tcount for five successive frames, a gradual transition is declared; otherwise a low flashlight is assumed. A replay segment is declared when the gradual transitions that sandwich a replay are detected.

2.2 Flashlight Detection
In real-world footage such as fighting videos, many flashlights occur during a match. A flashlight usually lasts less than 0.02 second. Thus, for normal videos with 25 to 30 frames per second, one flashlight will affect at most one frame. A flashlight changes the content of a video frame: because the frame interval is longer than the duration of the flash, it may produce a bright frame. How strongly the flash appears in the video frame depends on the photographer's camera settings and on the positions of the still camera and the video camera. As shown in Figure 1, the histogram changes at the frame where the flashlight occurs and returns to its original distribution after the flashlight frame. For a shot cut, in contrast, the histogram distribution does not return to the original.

Figure 1: Histogram changes in a video due to a flashlight (frames 1 to 5).

A very bright flashlight frame causes a large change in histogram difference. A low-brightness flashlight, however, changes the histogram too little to be detected, so relying on the histogram difference of the whole frame alone misses many flashlights. To address the low-brightness problem, each frame is divided into 16 blocks in a 4x4 pattern, and a 256-bin RGB histogram is computed for each region. In flashlight detection, we use the ratio of the consecutive-frame histogram difference to the histogram difference three frames later to decide whether a frame is a flashlight or a shot cut, as given in equation (1):

\[
\frac{D(i, i-1)}{D(i+3, i-1)}
\begin{cases}
> T_{ratio} & \text{flashlight} \\
\le T_{ratio} & \text{cut}
\end{cases}
\tag{1}
\]

D(i, i-1) denotes the histogram difference between frame i and its preceding frame i-1; likewise, D(i+3, i-1) is the histogram difference between frame i+3 and frame i-1.

Flashlight detection is performed after shot boundary detection to identify which frames are flashlight frames. Our flashlight detector consists of three steps. First, we use four thresholds (Tcount, Thigh, Tlow, Tratio). Tcount, Thigh and Tlow are the thresholds used in shot boundary detection; Tratio is the threshold on the ratio in equation (1). Second, histogram differences are computed for each region between consecutive frames. If the number of region differences that exceed Thigh is greater than Tcount and the ratio is higher than Tratio, a high flashlight is declared. If the number of region differences that exceed Thigh is greater than Tcount but the ratio is not higher than Tratio, a shot cut is declared. Lastly, if the number of region differences that exceed Tlow is less than Tcount and the ratio is higher than Tratio, a low flashlight is declared.
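The region-based test with Thigh and Tcount and the ratio test of equation (1) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The following are assumptions made only for illustration: one 256-bin histogram per RGB channel, a normalised L1 distance as the region histogram difference, the mean of the 16 region differences as the frame-level difference D, and the function names, the small `eps` guard and the threshold values used in the usage note. The low-flashlight and gradual-transition rules are left out.

```python
import numpy as np

GRID = 4      # 4x4 blocks, as described in the text
BINS = 256    # 256 bins; one histogram per RGB channel is an assumption


def region_histograms(frame):
    """Return an array of shape (16, 3, 256): one histogram per block and channel.

    `frame` is an (H, W, 3) uint8 array with H and W of at least 4 pixels.
    """
    height, width, _ = frame.shape
    hists = np.zeros((GRID * GRID, 3, BINS))
    for row in range(GRID):
        for col in range(GRID):
            block = frame[row * height // GRID:(row + 1) * height // GRID,
                          col * width // GRID:(col + 1) * width // GRID]
            for ch in range(3):
                hists[row * GRID + col, ch] = np.bincount(
                    block[..., ch].ravel(), minlength=BINS)
    return hists


def region_differences(frame_a, frame_b):
    """Per-region histogram difference, normalised to the range [0, 1].

    The L1 distance is an assumed metric; the paper does not name one.
    """
    hist_a = region_histograms(frame_a)
    hist_b = region_histograms(frame_b)
    pixels = hist_a[:, 0, :].sum(axis=1)            # pixels in each block
    l1 = np.abs(hist_a - hist_b).sum(axis=(1, 2))   # over channels and bins
    return l1 / (2 * 3 * pixels)


def classify_frame(frames, i, t_high, t_count, t_ratio, eps=1e-9):
    """Label frame i as 'flashlight', 'cut' or 'none' (high-threshold branch only).

    Requires 1 <= i and i + 3 < len(frames).
    """
    d_prev = region_differences(frames[i], frames[i - 1])
    if np.count_nonzero(d_prev > t_high) <= t_count:
        return "none"
    # Ratio of equation (1): D(i, i-1) / D(i+3, i-1), with D taken as the
    # mean region difference (an assumption) and eps guarding against 0.
    d_after = region_differences(frames[i + 3], frames[i - 1])
    ratio = d_prev.mean() / (d_after.mean() + eps)
    return "flashlight" if ratio > t_ratio else "cut"
```

As a usage example with synthetic frames and hypothetical thresholds (t_high = 0.5, t_count = 8, t_ratio = 2.0), a single bright frame inserted into a sequence of dark frames is labelled a flashlight, because frame i+3 looks like frame i-1 again and the ratio becomes very large. A sequence that stays bright after frame i is labelled a cut, because the two differences are equal and the ratio is close to 1.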
3. EXPERIMENTAL RESULTS
The original data comes from the VCD of S1-World Championship Volume 1. All videos are AVI files with a frame size of 325×288 pixels and a frame rate of 25 frames/s, amounting to about 6 gigabytes of data with a total duration of about 25 minutes. The characteristics of the test set are summarized in Table 1.
Table 1: Boxing videos used in the experiment
Video name    Video1    Video2    Video3    Video4    Total
Duration      00:08:01  00:04:42  00:05:20  00:06:15  00:24:18
# of frames   12028     7100      7938      9407      36473
Flash         69        53        56        70        248
Cut           42        34        36        37        149
Gradual       4         4         5         5         18
Replay        2         2         2         2         8
We apply our algorithm to the four boxing videos. Each video includes three rounds of a match. The beginning of each round starts with a zoom. The end of the first and second rounds is followed by a gradual transition and a replay segment. In Video3 and Video4, the end of the third round is also followed by a gradual transition. These clips contain cuts, wipes, dissolves, pans and zooms.

For flashlight detection, we measured the number of flashlights that were correctly detected, the number of false positives, and the number of missed flashlights. Other studies do not report flashlight detection results, so we cannot compare our results with others. For shot boundary detection, we measured the number of shot boundaries that were correctly detected, the number of false positives, and the number of missed shot boundaries. A gradual transition was counted as correctly detected if any frame of the transition was declared a shot boundary. A replay segment was counted as correctly detected if any of its first ten frames was declared the start of a replay segment and any of its last ten frames was declared the end of a replay segment.

We use recall and precision as the evaluation criteria. Recall is the percentage of desired frames that are retrieved. Precision is the percentage of retrieved frames that are desired. From the test results we compute equations (2) and (3):

\[
\mathrm{recall} = \frac{R_a}{R} \tag{2}
\]

\[
\mathrm{precision} = \frac{R_a}{A} \tag{3}
\]

Ra denotes the number of frames that were correctly detected. R denotes the sum of the number of correctly detected frames and the number of missed frames. A denotes the sum of the number of correctly detected frames and the number of false-positive frames. The results of the experiment are given in Table 2.

Table 2: Results of the experiment
                 Total  Correct  Detected  Recall (%)  Precision (%)
Flash            248    174      226       70.2        77.0
Cut              149    112      155       75.2        72.3
Gradual          18     15       15        83.3        100
Replay Segment   8      5        5         62.5        100
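As a worked check of equations (2) and (3), the short Python sketch below recomputes recall and precision from the counts in Table 2. It assumes that the "Total" column corresponds to R, "Correct" to Ra and "Detected" to A; the function and variable names are illustrative only.

```python
def recall_precision(total, correct, detected):
    """Return (recall, precision) in percent.

    total    = R  (correctly detected + missed)
    correct  = Ra (correctly detected)
    detected = A  (correctly detected + false positives)
    """
    return 100.0 * correct / total, 100.0 * correct / detected


# Counts copied from Table 2: (Total, Correct, Detected)
table2 = {
    "Flash": (248, 174, 226),
    "Cut": (149, 112, 155),
    "Gradual": (18, 15, 15),
    "Replay Segment": (8, 5, 5),
}

for name, (total, correct, detected) in table2.items():
    recall, precision = recall_precision(total, correct, detected)
    print(f"{name}: recall {recall:.1f}%, precision {precision:.1f}%")
```

Rounded to one decimal place, the computed values agree with the Recall and Precision columns of Table 2.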
We calculate recall and precision over the four boxing videos combined. Overall, the algorithm gives acceptable results: recall 70.2% and precision 77.0% for flashlight detection, recall 75.2% and precision 72.3% for shot cut detection, recall 83.3% and precision 100% for gradual transition detection, and recall 62.5% and precision 100% for replay segment detection. For this algorithm recall is much more important than precision, since viewers will not tolerate missing flashlights and replays but may still enjoy watching interesting non-flashlight events.

In flashlight detection, the misses and false detections arise because 1) some flashlight frames have very low brightness, which does not change the histogram difference enough, so these flashlights are missed, and 2) large amounts of camera motion cause a large number of false positives. In cut detection, recall and precision are lower than in gradual transition detection: large camera motion and flashlights occurring at the end of a shot cause many false positives, and similar backgrounds in adjacent shots cause many missed cuts. In gradual transition detection and replay segment detection, some errors are due to different frames in adjacent shots.
4. CONCLUSION AND FUTURE WORK
In this paper, we presented a novel method for flashlight detection and replay segment detection. The detected flashlights and replays are used for efficient summarization of indoor sport videos. A collection of frames containing flashlights can be composed into a summary video, because it forms a short video that highlights the interesting events in an indoor sport video. The method can also be applied to other types of video, for example home videos (ceremonies) and news programs (news conferences). While the histogram difference method works reasonably well with the RGB color model, it tends to miss low-brightness flashlights. To address this problem, we tested the algorithm with the HSB color model, which separates the chrominance from the luminance components of color. Overall, using the histogram difference method with the HSB color model did not improve the detection of low-brightness flashlights. We are now working on detecting the motion of the boxers to make video summarization more efficient.
5. REFERENCES
[1] Pan, H., Li, B. and Sezan, M. I., “Automatic detection of replay segments in broadcast sports programs by detection of logos in scene transitions,” In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 2002.
[2] Pan, H., Van Beek, P. and Sezan, M. I., “Detection of slow-motion segments in sports video for highlights generation,” In Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 2001.
[3] Kobla, V., DeMenthon, D. and Doermann, D., “Identifying sports videos using replay, text, and camera motion features,” In Proc. of the SPIE Conf. on Storage and Retrieval for Media Databases, Jan. 2000.
[4] Rui, Y., Gupta, A. and Acero, A., “Automatically extracting highlights for TV baseball programs,” In Proc. of ACM Multimedia, Los Angeles, CA, 2000, pp. 105-115.
[5] Ma, Y.F., Lu, L., Zhang, H.J. and Li, M.J., “An Attention Model for Video Summarization,” In Proc. of 10th ACM International Conference on Multimedia, 2002.
[6] Boreczky, J.S. and Rowe, L.A., “Comparison of video shot boundary detection techniques,” In Proc. of SPIE Storage and Retrieval for Still Images and Video Databases IV, pp. 170-179, 1996.