VIDEO OUTAGE DETECTION: ALGORITHM AND EVALUATION

Amy R. Reibman and Allan R. Wilks
AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ 07932

ABSTRACT

We present a Video Outage Detection Algorithm (VODA) that detects catastrophic failures in video systems by mimicking human behavior. VODA uses a continuity detector for video, audio, and motion, along with blackness, silence, and stillness detection. An outage is declared when all individual features exhibit a sudden drop; an alarm is generated when an outage lasts over two seconds. We analyze the performance of VODA on a large corpus of movie and TV episodes, and we show it has good performance when applied to over 160 hours of recorded broadcast TV.

I. INTRODUCTION

When delivering video services, it is essential to deliver a high-quality video signal with interesting content. Good quality video only appears on the viewer’s TV when everything goes right. Impairments can be introduced by the content provider, the service provider, the network provider, or on the customer premises. Hence, end-to-end quality assessment is important so that the operator knows the quality of video being presented to the viewers. This requires measuring at various points along the processing chain. The goal of a well-designed video monitoring system is to detect and report problems in the distribution chain so that the problems can be fixed before any customers call to complain.

A variety of faults can occur. A packet loss can cause annoying degradation of the video signal. Insufficient bit-rate can cause a video encoder to output video with heavy and annoying compression artifacts. Poor content acquisition can cause aliasing artifacts, for example, if format conversion from HD to SD (or vice versa) is not done correctly. Viewers can tolerate most of these, provided the content being sent is interpretable in some way or another by the human eye. The worst possible failure is a black screen.
There has been a variety of recent work on measuring video quality in a video distribution system when no reference is available for comparison. Verscheure et al. [1] examine the combined effect of compression and packet loss using quantizer values from the received bitstream and packet loss rate. The MSE due to packet loss is estimated in [2], using key information extracted from the video bitstream. Naccari et al. [3] consider this problem in the pixel domain, incorporating the effect of motion-compensated concealment. The impact of error concealment is also used to compute quality given packet losses in [4]. Visibility of individual packet losses is predicted in [5]. PSNR due to compression in the absence of packet loss has been studied by [6], [7], [8].

In this paper, we assume the video bitstream is well formed and loss free. Therefore, we focus on catastrophic outage detection in the delivered video signal, and particularly on the situation in which the video signal suddenly goes black. One traditional method to monitor this in video distribution networks is to have humans watching banks of TV monitors, and setting off an alarm when one (or more) goes dark. There are many drawbacks to this approach, including cost, operator fatigue, and operator inattention because of boredom. Further, there is no record of when a screen went black or how long it has been black.

Automated black-detection systems typically rely on thresholds. The video black levels, as well as the audio levels, can be monitored and an alarm triggered when the video remains black for too long, or the audio remains soft for too long. However, setting the appropriate thresholds is challenging, because desirable content often has long durations of quiet audio or periods of dark video. Further, commercials are often inserted after one or more black frames [9]. Different programs and different TV stations have different production strategies, and so each would require a different threshold for blackness duration.

However, involved viewers know almost immediately when a sudden outage has occurred. At the end of the final episode of The Sopranos (episode 21 of season 6 entitled “Made in America”), the video abruptly goes to black and the audio cuts off for 10 seconds before the credits roll. During the initial airing of this episode, many of the over 11 million viewers [10] initially thought their video distribution system had failed (see footnote 1). Suspense had built up during the episode, and the sudden blackness and silence were unexpected.

Footnote 1: Some quotes are “Had the cable gone out?” [11], “Maybe it’s your cable. Maybe it’s your VCR.” [12], “I’m finding it a little hard to believe that so many people actually thought their cable went out at the end of the Sopranos finale.” [13], “What? Did the power go out?” [14]

Our goal is to design and evaluate a video outage detection algorithm (VODA) for catastrophic failures that mimics this human reaction. Humans suspect an outage when, in the middle of active motion and audio, the video suddenly
goes both black and silent. Thus, our algorithm relies on a continuity detector for audio, video, and motion. An additional continuity detector for speech could also be added, but is not considered here. The goal of our algorithm is to generate an alarm so that an outage can be diagnosed and remedied before viewers begin to call technical support.

Our VODA can operate at various stages of the processing chain. It is essential to have one instantiation as close as possible to the customer, otherwise outages between the VODA measurement point and the customer may be missed. However, it is also useful to have VODA operating at a few earlier locations in the processing chain so that when an outage does occur, it can be tracked back from the consumer to the point of failure. Our VODA is designed to operate across a wide range of content without manual intervention, and to have a low false-alarm rate and a high detection rate. We describe our VODA in Section II, and analyze its performance on a wide range of video content in Section III.

II. VIDEO OUTAGE DETECTION ALGORITHM

Our VODA detects outage by analyzing three values computed from the received audiovisual content: the mean image value, the audio power, and a motion feature. An alarm is generated to report an outage when these all suddenly drop below a threshold simultaneously, and all remain low for a long enough time period. If any of these values decreases gradually until it is below its threshold, an alarm is not generated.

To compute the mean image value, let $I(t,p)$ be the image value at frame-time $t$ at pixel $p$. Then

$$\mu(t) = \frac{1}{N} \sum_{p} I(t,p) \qquad (1)$$

is the mean image value at frame-time $t$, where $N$ is the number of pixels in each frame. Let $A(t)$ be the set of audio samples to be played during frame-time $t$. The audio power during frame-time $t$ is

$$a(t) = \frac{1}{N_a(t)} \sum_{s \in A(t)} s^2 \qquad (2)$$

where $N_a(t)$ is the number of audio samples in this frame-time.

Our motion feature is the fraction of macroblocks that moved. To define this motion feature, we compute motion between two adjacent decoded video frames. We partition the current frame into macroblocks of size 16 by 16 pixels. For a given macroblock $b$, let

$$S_b(v) = \sum_{p \in b} \left| I(t,p) - I(t-1,p+v) \right| \qquad (3)$$

be the sum of absolute differences between two frames, where $v$ is a candidate displacement. If $S_b(v) < S_b(0)$ for some $v \neq 0$, then this macroblock is labeled as being better predicted using motion than without. Our motion feature, $m(t)$, indicates the fraction of such macroblocks across the entire frame. To reduce the number of computations necessary to compute this motion feature, we compute the sums in (3) using only a subset of the pixels in macroblock $b$. In the current paper, we compute these for every fourth pixel. We also simplify the motion search and skip boundary macroblocks to speed computation. The motion feature is reported as a percentage between 0 and 100.

Using each feature, we individually detect a “feature outage”. For a given feature $f \in \{\mu, a, m\}$, we define thresholds $T_f$ and $T_{fd}$ and a lag $L_f$. A boolean outage vector $O_f(t)$ is TRUE in some interval $[t_1, t_2)$ if and only if:

(a) $f(t) < T_f$ for $t_1 \le t < t_2$,
(b) $f(t_2) \ge T_f$, and
(c) $f(t_1 - k) - f(t_1) \ge T_{fd}$ for $1 \le k \le L_f$.

Conditions (a) and (b) say that the feature is below the $T_f$ threshold throughout the interval $[t_1, t_2)$, rising to at least that threshold at time $t_2$. Condition (c) says that at time $t_1$ there is a precipitous drop, of size at least $T_{fd}$, compared to each of the previous $L_f$ values of the feature. The image-mean and audio features share one value of the lag; the motion feature uses its own.

To detect catastrophic failure, where an outage happens on all features simultaneously, we take into account the fact that the decoder may not always have perfect synchronization between the video and the audio streams. Hence we compute $O'_a$ by extending the audio outage in each direction by $J$ frames to account for possible misalignment. Mathematically,

$$O'_a(t) = \bigvee_{k=-J}^{J} O_a(t+k), \qquad O(t) = O_\mu(t) \wedge O'_a(t) \wedge O_m(t)$$

for all $t$.

At this stage, each frame is labeled as being inside a catastrophic outage or not. However, clearly an outage that lasts only one frame should not trigger an alarm. As the duration of an outage increases, so does the probability that a failure has occurred. We set off an alarm when the duration of an outage exceeds $D$, for a time threshold $D$.

There are a number of parameters required to compute this algorithm: $T_\mu$, $T_a$, $T_m$, $T_{\mu d}$, $T_{ad}$, $T_{md}$, $J$, and $D$. If the performance of the algorithm is sensitive to the setting of these thresholds, then we have not made any progress; different thresholds would need to be set for different channels and different programs. However, we show in Section III that the algorithm performs very similarly using a wide range of settings for these parameters.

Fig. 1. Detected outages for full-length movies using (a) audio and (b) mean image. [Plots of outage number (sorted) versus duration of outage (sec); curves: sudden audio or image outage, audio or image outage, catastrophic outage.]

Fig. 2. Detected outages for TV episodes using (a) audio and (b) mean image. [Plots of outage number (sorted) versus duration of outage (sec); curves: sudden audio or image outage, audio or image outage, catastrophic outage.]
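The per-feature outage conditions and the catastrophic combination described in Section II can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the argument names (`T`, `Td`, `L`, `J`), and the treatment of the sequence boundaries are assumptions made for the example, and the motion feature is assumed to be computed elsewhere and supplied as a per-frame list.

```python
from typing import List, Sequence


def mean_image_value(pixels: Sequence[float]) -> float:
    """Mean image value of one frame, given its pixel values (flattened)."""
    return sum(pixels) / len(pixels)


def audio_power(samples: Sequence[float]) -> float:
    """Mean squared amplitude of the audio samples of one frame-time."""
    return sum(s * s for s in samples) / len(samples)


def feature_outage(f: Sequence[float], T: float, Td: float, L: int) -> List[bool]:
    """Label the frames that lie inside a sudden feature outage.

    An outage starts at t1 when f(t1) < T and f(t1) is at least Td below
    each of the L preceding values; it continues while f stays below T.
    Assumption: an outage cannot start within the first L frames, and an
    outage still in progress at the end of the sequence is kept.
    """
    n = len(f)
    out = [False] * n
    t = 0
    while t < n:
        sudden_drop = (
            t >= L
            and f[t] < T
            and all(f[t - k] - f[t] >= Td for k in range(1, L + 1))
        )
        if not sudden_drop:
            t += 1
            continue
        # Inside an outage: mark frames until the feature recovers to T.
        while t < n and f[t] < T:
            out[t] = True
            t += 1
    return out


def extend(outage: Sequence[bool], J: int) -> List[bool]:
    """Extend every outage by J frames in each direction (logical OR window)."""
    n = len(outage)
    return [any(outage[max(0, t - J):min(n, t + J + 1)]) for t in range(n)]


def catastrophic_outage(o_mu: Sequence[bool], o_a: Sequence[bool],
                        o_m: Sequence[bool], J: int) -> List[bool]:
    """Frame-wise AND of the image, extended-audio, and motion outages."""
    o_a_ext = extend(o_a, J)
    return [x and y and z for x, y, z in zip(o_mu, o_a_ext, o_m)]
```

Because the scan only opens an outage at a frame that satisfies the drop condition, a feature that decays gradually below its threshold never produces a labeled frame, which is the behavior the text asks for. Extending only the audio outage mirrors the text's concern about audio and video misalignment in the decoder.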
III. EXPERIMENTAL RESULTS

Our final goal in this paper is to evaluate our video outage detection algorithm on a system which records TV from analog cable. Before doing so, we first explore a set of typical movies and TV episodes, both to evaluate our initial assumption that a dramatic, simultaneous drop in image-mean, motion, and audio levels is rare, and to examine the impact of the parameters on VODA performance. After choosing fixed parameter settings, we then apply our VODA to 4 distinct TV channels, each monitored for 40 consecutive hours.

III-A. Content statistics of movies and TV episodes

To understand the effects of the cinematography on the measures defined above, we examine the statistics for 101 full-length movies and 216 TV episodes, each compressed using MPEG-4 at resolution 720 by 480. We compute the statistics for $O_\mu$ and $O_a$ individually for the 16,457,205 frames of movies and 15,402,936 frames of TV episodes.

Figures 1 and 2 show the detected frame-outages for movies and TV episodes, respectively, using several different algorithms. Figures 1(a) and 2(a) show the outages detected using only the audio level (threshold $T_a$), while Figures 1(b) and 2(b) show the outages detected using only the mean-image level (threshold $T_\mu$). Outages detected using VODA are shown in both versions. As can be seen, when the drop thresholds $T_{\mu d}$ or $T_{ad}$ are not enforced, there are thousands of frames in outages, with many outages lasting several seconds. This is because there are often long durations in movies and TV of silence, or very dark scenes. Hence any algorithm that only looks at instantaneous thresholds will not be effective at detecting outages. Sudden image or audio outages (detected using the drop thresholds $T_{\mu d}$ or $T_{ad}$) are much less frequent, although still relatively common. Thus, when the image becomes dark or the audio becomes quiet, it is more common for this to happen gradually than suddenly.

When we apply our catastrophic VODA, however, far fewer detected outages occur in this representative content. There is no outage as long as one second in the full-length movie content (Figure 1). Further, there are only two outages longer than 2 seconds in the TV episodes (Figure 2); each is in an episode of The Sopranos, with the longest one from the final episode. Interestingly, this detected outage of over 9 seconds is consistent with people’s reaction to this episode. Not surprisingly, many sudden image outages occur near the end of movies and the end of TV shows, just prior to the credits.

Next, we explore the impact of using different thresholds on the performance of VODA. Figures 3 and 4 show the impact of parameters $J$ and $T_\mu$, and $T_m$ and $T_{md}$, respectively, for various values of each parameter. Varying $T_{\mu d}$ over a range of values (not shown) does not change performance. As can be seen, changing $J$ from the default value of 4 to either 2 or 6 does not change most outages significantly. When $T_\mu$ increases to 19, 25% fewer outages are detected and those that are detected are shorter. Changing $T_m$ doesn’t alter performance significantly, but increasing $T_{md}$ from 10 to 12 prevents VODA from detecting the outage in the final episode of The Sopranos. Based on these observations, we set the time threshold, $D$, to be 2 seconds.

Fig. 3. Impact of $J$ and $T_\mu$ on outage detection: TV data. [Plots of outage number (sorted) versus duration of outage in TV data (sec); curves for J = 2, 4, and 6 frames and for T_µ = 17, 18, and 19.]
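The step from per-frame outage labels to alarms, and to the sorted duration curves plotted in the figures, can be sketched as below. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, and the default frame rate of 30000/1001 frames per second is an assumption (the paper does not state one, although it is roughly consistent with the frame count reported for 160 hours of recordings in Section III-B).

```python
from typing import List, Sequence, Tuple

# Assumed frame rate (NTSC); not stated in the text.
ASSUMED_FPS = 30000 / 1001


def outage_runs(outage: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return (start_frame, length_in_frames) for each run of TRUE labels."""
    runs = []
    start = None
    for t, flag in enumerate(outage):
        if flag and start is None:
            start = t
        elif not flag and start is not None:
            runs.append((start, t - start))
            start = None
    if start is not None:
        # The outage is still in progress at the end of the sequence.
        runs.append((start, len(outage) - start))
    return runs


def sorted_durations(outage: Sequence[bool], fps: float = ASSUMED_FPS) -> List[float]:
    """Outage durations in seconds, sorted ascending (as plotted in the figures)."""
    return sorted(length / fps for _, length in outage_runs(outage))


def alarms(outage: Sequence[bool], fps: float = ASSUMED_FPS,
           D: float = 2.0) -> List[Tuple[int, float]]:
    """Return (start_frame, duration_sec) for outages lasting longer than D seconds."""
    result = []
    for start, length in outage_runs(outage):
        duration = length / fps
        if duration > D:
            result.append((start, duration))
    return result
```

With a threshold of 2 seconds, a single dropped frame or a brief cut to black between a program and a commercial produces a short run that is recorded in the duration list but never raises an alarm.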
Fig. 4. Impact of $T_m$ and $T_{md}$ on outage detection: TV data. [Plots of number of outages versus duration of outage in TV data (sec); curves for T_m = 1, 2, and 3 and for T_md = 5, 8, 10, 12, and 15.]

III-B. Outage detection for recorded broadcast TV

Finally, we apply our outage detection algorithm to four different TV channels, each recorded for 40 consecutive hours over an analog cable system using the Hauppauge PVR-500 PCI card with analog tuner. In the 17,263,380 frames in this dataset, there are 24 outages, each under half a second long, as well as 5 detected outages between 1.5 and 3 seconds long, all occurring on the CNN Headline News channel.

IV. CONCLUSIONS

We presented a Video Outage Detection Algorithm (VODA) that detects catastrophic outages in which the video suddenly becomes black and silent. While the algorithm is fairly robust to changes in parameter settings, false alarms can still occur. We saw three in a 160-hour interval. Further research may provide more sophisticated features which will improve performance.

In practice, it could be possible to use VODA in a reduced-reference framework. For example, when content is stored in a video-on-demand system, the incoming content could be processed with VODA to determine if it contains long periods of sudden blackness. Then when the system plays the video, an alarm would only be generated when outages are detected that were not in the original content.

V. REFERENCES

[1] O. Verscheure, P. Frossard, and M. Hamdi, “User-oriented QOS analysis in MPEG-2 video delivery,” Real-Time Imaging, vol. 5, pp. 305–314, 1999.
[2] A. R. Reibman, V. Vaishampayan, and Y. Sermadevi, “Quality monitoring of video over a packet network,” IEEE Transactions on Multimedia, vol. 6, no. 2, pp. 327–334, April 2004.
[3] M. Naccari, M. Tagliasacchi, F. Pereira, and S. Tubaro, “No-reference modeling of the channel induced distortion at the decoder for H.264/AVC video coding,” in IEEE ICIP, 2008.
[4] T. Yamada, Y. Miyamoto, and M. Serizawa, “No-reference video quality estimation based on error-concealment effectiveness,” in Packet Video Workshop, 2007.
[5] S. Kanumuri, P. C. Cosman, A. R. Reibman, and V. Vaishampayan, “Modeling packet-loss visibility in MPEG-2 video,” IEEE Transactions on Multimedia, 2006.
[6] D. Turaga, Y. Chen, and J. Caviedes, “No reference PSNR estimation for compressed pictures,” in IEEE ICIP02, 2002.
[7] A. Ichigaya, M. Kurozumi, N. Hara, Y. Nishida, and E. Nakasu, “A method of estimating coding PSNR using quantized DCT coefficients,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 2, pp. 251–259, February 2006.
[8] T. Brandao and M. P. Queluz, “No-reference image quality assessment based on DCT domain statistics,” Signal Processing, vol. 88, no. 4, pp. 822–833, April 2008.
[9] N. Dimitrova, H.-J. Zhang, B. Shahraray, I. Sezan, T. Huang, and A. Zakhor, “Applications of video-content analysis and retrieval,” IEEE Multimedia, July-September 2002.
[10] “http://en.wikipedia.org/wiki/Made_in_America_(The_Sopranos),” Downloaded 1/1/09.
[11] “http://www.news.com.au/couriermail/story/0,23739,21891485-7642,00.html,” Downloaded 1/1/09.
[12] “http://rogerebert.suntimes.com/apps/pbcs.dll/article?AID=/20070610/EDITOR/70611001/-1/RSS,” Downloaded 1/1/09.
[13] “http://www.ew.com/ew/article/0,,20042736,00.html,” Downloaded 1/1/09.
[14] “http://featuresblogs.chicagotribune.com/entertainment_tv/2007/06/are_you_kidding.html#more,” Downloaded 1/1/09.