Measuring ATM video quality of service using an objective picture quality model

Jiuhuai Lu*a, Mainak Chatterjeeb, Mayer D. Schwartza, Mihir K. Ravela, Wilfried M. Osbergera

aTektronix, Inc., Beaverton, OR 97077
bDepartment of Computer Science and Engineering, University of Texas at Arlington, TX 76010
ABSTRACT

End-to-end quality of service for real-time video delivery over ATM networks depends on both video coding and network performance. Historical measures of QoS for data services have involved packet delivery parameters such as cell loss and delay, but these are not directly representative of a user's perception of the delivered video quality. To this end, this paper describes the use of a video quality measurement scheme that employs a human vision based picture quality model to objectively estimate end-user video quality. We present results to demonstrate that this video quality analysis approach can be used to determine performance characteristics for distribution-quality video coding parameters over ATM networks.

Keywords: video quality, MPEG, ATM, human vision model, quality of service, image quality, cell loss, packet loss
1. INTRODUCTION

A common situation in most video network communications is data loss, introduced either by the lossy compression process or by network transmission errors. While it is crucial to minimize the data loss, it is equally important to be able to accurately estimate or predict its effect. One advantage of using ATM networks to transmit video content is the guaranteed quality of service (QoS). In an ATM network environment, a user is able to request video service at a quality characterized by a set of end-to-end parameters and their error bounds. While such a QoS specification is useful, it would be desirable for end-users and network content providers to be able to measure the actual perceptual quality of network video. This would effectively allow content providers to factor users' opinions into engineering and operational tradeoffs in order to utilize available bandwidth efficiently. The performance of MPEG-2 video over ATM networks has been the focus of many research and development activities; however, most of that work is based on visual inspection or simple error measurements [1,2,3]. Establishing the relation between network error characteristics and end-user quality ratings therefore becomes an important step toward understanding network QoS performance. As video data are ultimately consumed in visual tasks by end users, performance measurement of video distribution is most useful when it is directly related to subjective picture quality ratings. Using an ITU-R picture quality rating standard [4], picture quality can be assessed on a perceptual rating scale through psychovisual experiments. In recent years, substantial progress has been made in picture quality models [5,6,7], which are able to predict human quality ratings for degraded video sequences. Using these picture quality models, we may be able to obtain quick, perceptual-quality-based estimates of video application design tradeoffs, such as the optimization between coding rate and distortion in designing and operating MPEG-based video encoders. In fact, video encoder developers have started to use human vision based video quality instruments to evaluate design performance [8].

Network transmission constitutes another source of data error in video distribution. In ATM networks, network traffic and errors in the physical layer cause quality loss in end-to-end video transmission. These errors are commonly characterized by ATM performance parameters that include cell loss, cell delay, cell delay variation, cell errors, and bit errors. These ATM errors incur video artifacts such as tiling, motion jitter, and frame freezing in reconstructed MPEG-2 video. The impact of these errors on visual quality depends on their spatial and temporal locations, and cannot be estimated from the ATM QoS parameters alone. Consequently, it is necessary that the ATM QoS parameters in the networks be associated with quality degradation on a perceptual scale.
* Send email correspondence to [email protected]
Research has been done to characterize network errors in impaired video. Techniques such as the percentage of noticeable errors [1], and the duration, rate, and spatial extent of video glitches [10], have been used to describe the severity of ATM impairment. Subjective assessment of video impairment has also been used to evaluate video over ATM [11]. The demand for network video applications and the time-consuming nature of subjective quality experiments prompt us to investigate the utility of a picture quality model for objectively and automatically evaluating ATM video performance. Such a capability would benefit bandwidth management and network configuration by allowing resource allocation and users' perceived quality to be optimized jointly in video network applications.

With the aim of exploring the relation between network transmission error parameters and their perceptual quality impact on MPEG-2 video over ATM, this paper presents a quantitative analysis of end-to-end video performance using a human vision system based picture quality model [9], Tektronix's PQA-200. While the performance of an ATM network encompasses cell loss, error, and delay behavior, this paper focuses on the quality implications of cell losses for reconstructed MPEG-2 video. By characterizing cell losses, we are also able to examine several other ATM performance parameters that ultimately introduce cell losses: 1) cell delay variations beyond the capacity of ATM multiplexors and the decoder, 2) cell delays beyond the permitted arrival time, and 3) bit errors that cause cell drops. The experiment used five minutes of video containing various types of scenes to analyze average quality impairment ratings. The cell loss impairment was introduced into ATM cells carrying MPEG-2 transport streams by an ATM impairment simulator. The picture quality rating (PQR) results we obtained provide a representation of the relationship between cell losses and video quality impairment. In particular, we were able to quantitatively characterize the sensitivity of network video quality to ATM cell losses.
2. ATM VIDEO QUALITY MEASUREMENT TESTBED ARCHITECTURE

2.1. Overview of the testbed

The ATM impairment data were created in an ATM video testbed, the architecture of which is shown in Figure 1. The source video and the decompressed video were stored on video storage systems before they were compared by the picture quality model. Each of the processing blocks can be individually controlled.
[Figure 1 block diagram: Video source → MPEG-2 Encoder & ATM Interface → ATM Impairment Simulator → ATM Interface and MPEG-2 Decoder → Picture Quality Analyzer → Picture quality ratings]
Figure 1. ATM video quality testbed architecture.

2.2. Video data

Test video data were fed into the testbed to create test cases that have fixed visual content and length but different video delivery parameters. The test video was selected to cover a wide variety of content, including news, sports, talk shows, and synthetic computer graphics, spanning broad ranges of spatial texture, sharpness, and motion activity. The test video comprises sixteen video clips concatenated to form a total length of five minutes. The length and diversity of the content introduce a range of difficulties for encoding and transmission, and hence a range of video delivery performance. The spatial and temporal activity in a video sequence critically affects the extent of degradation in delivery and the severity of the perceived degradation. Because of this effect, a test video covering different content types over an extended length enables robust assessment of transmission errors even at relatively low error rates, and thus ensures accuracy in measuring average end-to-end video performance with minimal content bias. In the experiment, the standard video format of 60 Hz, 720 by 486 pixels, was used in all cases.
2.3. MPEG-2 encoder and decoder

The MPEG-2 encoder converts a digital video input into a compressed video stream. In the compression process, MPEG-2 removes spatial and temporal redundancy by encoding video into intracoded I frames and intercoded P and B frames. One intracoded frame and the subsequent intercoded frames form a group of pictures (GOP). As I frames normally require more bits to encode than P and B frames, the probability that an I frame is affected by a cell loss is typically greater than for a P or B frame. In addition, the duration and spatial pattern of the impairment due to a cell loss differ depending on whether the affected frame is intracoded or intercoded. Errors in an I frame propagate over the subsequent P and B frames until the next I frame, while an error in a B frame does not affect the next frame. Consequently, the GOP structure also contributes to end-to-end video quality.

There are two types of coding rate specification: variable bit-rate (VBR) and constant bit-rate (CBR). While CBR simplifies network traffic management and eases network flow control, CBR video delivers variable picture quality and does not use network bandwidth as efficiently as VBR. On the other hand, VBR often involves complex multiplexing and buffering techniques. As this work focused on characterizing the impairment from ATM cell loss, a CBR MPEG-2 encoding scheme was used to allow separation of the impact of cell loss from other performance factors. In addition, a GOP length of 15, with two B frames for every P frame, was chosen for all encoding scenarios. This is a typical set of conditions for access-network distribution of video. The quality implications of other GOP structures, while a potential subject for further investigation, are beyond the scope of this experiment. The MPEG-2 decoder decodes an MPEG-2 bit stream and outputs the decoded video in CCIR 601 format. Both the encoder and decoder have sufficient buffer sizes that no buffer overflow could cause a cell loss. The encoder and decoder were Tektronix VideoEdge M2-T units with OC-3c/STM-1 ATM interfaces, equipped with commercially available encoder and decoder chips. No error concealment was employed by the decoder for packet loss.

2.4. ATM network configuration

The encoder outputs a packetized MPEG-2 transport stream (TS) that is then carried over ATM Adaptation Layer 5 (AAL-5). The data unit in AAL-5, the Protocol Data Unit (PDU), can carry a number of TS packets, selectable depending on the application. A small PDU, which may limit distribution capacity, minimizes the extent of the impairment in the event of a PDU loss. A PDU is further mapped to the ATM layer, where it is segmented into 8 ATM cells. Each ATM cell has a payload of 48 bytes and a cell header of 5 bytes. In our experiment, we chose the smallest PDU, which takes two MPEG-2 TS packets of 188 bytes each and a PDU trailer of 8 bytes. A 155 Mbps optical fiber link connected the ATM interface of the encoder and the ATM interface of the decoder with a permanent virtual circuit (PVC) connection. When ATM impairments were to be simulated, an ATM impairment generator was inserted between the encoder and decoder. ATM cells are the smallest data unit carried by ATM traffic. In a typical ATM interface for an MPEG-2 decoder, the CRC field in the PDU trailer is checked for errors. When a CRC error occurs, caused either by cell errors or by a cell loss, it triggers the discard of the entire PDU.
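For concreteness, the following short sketch (in Python; the packet and cell sizes are those stated in the text, everything else is purely illustrative) works out the PDU-to-cell mapping used in this testbed: two 188-byte TS packets plus an 8-byte AAL-5 trailer fill exactly eight 48-byte cell payloads, which is why a single lost cell, once the CRC check forces the whole PDU to be discarded, removes eight cells' worth of video data.

# Illustrative sketch of the AAL-5 PDU-to-cell mapping used in this testbed.
# Sizes are taken from the text; everything else is an assumption for illustration.

TS_PACKET_BYTES = 188      # one MPEG-2 transport stream packet
TS_PACKETS_PER_PDU = 2     # smallest PDU option chosen in the experiment
AAL5_TRAILER_BYTES = 8     # AAL-5 PDU trailer
CELL_PAYLOAD_BYTES = 48    # ATM cell payload (the 5-byte cell header is not counted here)

pdu_bytes = TS_PACKETS_PER_PDU * TS_PACKET_BYTES + AAL5_TRAILER_BYTES   # 384 bytes
cells_per_pdu = pdu_bytes // CELL_PAYLOAD_BYTES                         # exactly 8 cells

# A CRC failure discards the whole PDU, so one lost cell effectively removes
# all cells_per_pdu cells of video data from the decoder's point of view.
print(f"PDU size: {pdu_bytes} bytes -> {cells_per_pdu} ATM cells per PDU")
print(f"One lost cell discards {cells_per_pdu} cells of payload")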
Although it has been reported that in some cases deactivating the CRC error check may actually reduce the visible glitch effect [1], most commercially available ATM interface hardware, including ours, forces the error check. As a result, the actual cell loss ratio is at least eight times the cell loss ratio measured at the ATM impairment generator.

2.5. ATM impairment generator

We simulated ATM impairments to video streams using a commercial ATM impairment generator, an Adtech AX4000. The ATM impairment generator is able to simulate several ATM network traffic conditions, including cell errors and cell losses with a specified error probability distribution. By characterizing video impairments resulting from cell losses, we are also able to understand the video distortions related to cell delay variations, since cell delay variations beyond a tolerable limit result in cell losses at ATM switches, multiplexors, or receivers. Other ATM errors, such as cell errors and bit errors, cause cell losses in cases where a receiver opts to drop cells containing errors. In fiber-optic ATM networks, the bit error rate is typically below 10^-9. Cell losses are therefore the predominant network errors directly associated with video picture distortions. The ATM simulator injected cell losses into the video streams according to a geometric distribution. Cell losses were allowed to occur in any cell in a PDU. The severity of cell losses, indicated by the cell loss ratio (CLR), was controlled by the
impairment simulator. In addition to the fact that each cell loss discards an entire PDU, or 8 contiguous ATM cells, a cell loss may trigger the loss of two PDUs. This occurs when the last cell in a PDU is dropped: the following PDU is then lost as well, because the last cell marks the boundary for reading the next PDU. As a consequence, the effective cell loss ratio affecting the test video stream is approximately 9 times the cell loss ratio introduced at the impairment simulator. The cell loss ratio we refer to in the remainder of the paper is the one observed at the simulator; we will use the effective cell loss ratio (ECLR) to describe the one observed at the decoder. It must be noted that, although the ATM impairment characteristics in this experiment were measured with particular settings, such as the AAL-5 PDU size, the results can be generalized. While the ATM performance characteristics may change if corrupted PDUs are not dropped or a larger PDU size is used, the average quality characteristics can be approximated by the impairment rating results obtained in this experiment, as long as the ECLR observed at the decoder is the same.

2.6. Picture quality analyzer

The picture quality analyzer (PQA) is a device designed to predict the perceptual quality rating of a video sequence that an average end-user would give when judging picture quality. The PQA used in our experiment was a human vision based picture quality model described in [9], a functional equivalent of the commercial Tektronix PQA-200. The picture quality analyzer requires two video inputs: the reference, represented by the source, and a second sequence originating from the source but containing the artifacts and noise resulting from lossy compression and transmission. Two types of quality ratings are produced as output: an overall quality rating summary and a time series of ratings measured for each picture in the sequence [9]. Figure 2 provides an architectural overview of the picture quality analyzer, the PQA-200. The front-end processing models the signal processing path from the digital representation of uncompressed color video in storage to electron-gun voltage in a display, and from that voltage to pixel brightness on the display. In the second stage, a series of processing steps models the human visual processing known to exist in the visual pathway. Each picture is filtered and decomposed into four-level spatial-frequency bands. Contrast computation, spatial and temporal masking, and luminance-to-chrominance masking are applied to each picture. After these operations, the degraded video and the reference video are compared channel by channel. The differences are then integrated across all the pixels in a picture to derive a rating for that picture. These picture ratings can be further integrated to form a rating summary for the whole video sequence. To calculate the summary score, the model uses a Minkowski summation with the Minkowski coefficient equal to 4.0 [9].
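As a minimal sketch of how such a summary score can be formed, the snippet below pools a per-picture rating series with a Minkowski summation of exponent 4.0, as stated above. The normalization (averaging over the number of pictures before taking the fourth root) and the sample ratings are assumptions for illustration only, not the exact PQA-200 computation.

import numpy as np

def minkowski_summary(per_picture_ratings, p=4.0):
    """Pool a time series of per-picture ratings into one summary score.

    Uses a Minkowski summation with exponent p = 4.0 as described for the
    picture quality model in the text; averaging before taking the p-th
    root is an assumption made here for illustration.
    """
    r = np.asarray(per_picture_ratings, dtype=float)
    return (np.mean(r ** p)) ** (1.0 / p)

# Example: a mostly low-impairment sequence with a few severe spikes is pulled
# strongly upward by the exponent-4 pooling, mimicking how visible glitches
# dominate the overall rating.
ratings = [3.0] * 95 + [20.0] * 5
print(round(minkowski_summary(ratings), 2))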
[Figure 2 block diagram: Y', Cb', Cr' inputs → Front End Processing → Y, u*, v* → Luma Processing and Chroma Processing (with masking) → Luma JND Map and Chroma JND Map → JND and Correlation Summaries]
Figure 2. Human vision system based picture quality model flow chart.
3. VIDEO QUALITY RATING RESULTS

We compared video distribution performance in the simulated ATM environment at two encoding rates and several cell loss levels. The cell loss ratios examined in the experiment, ranging from 10^-4 to 10^-6, were chosen to be near the
thresholds where the cell loss impairments become visually noticeable. Video sequences containing only MPEG-2 impairments were also included for comparison. The two MPEG-2 coding rates were 2.5 Mbps and 6.0 Mbps.

Picture distortions resulting from cell losses appear in several different patterns, depending on the coding mode of the impaired frame and the type of data carried by the lost cells. When a cell loss occurs in the data of an I frame, glitches can be seen in the form of single or multiple blocks with distorted color or contrast. An error occurring in an I frame propagates until the next I frame; in our video data, the longest glitches lasted for 15 frames, the GOP length set in the encoder. In addition to the errors seen in intracoded frames, P and B frames also showed spatial distortions of macroblocks due to data losses in the motion vectors. Some impaired macroblocks appear to have been moved away from their original locations, resulting in local spatial misalignment. In the absence of ATM errors, visible distortions were found only occasionally at 6.0 Mbps. For video at 2.5 Mbps, however, MPEG-2 encoding artifacts, including tiling and blurring, were easily seen in most frames. These typical MPEG-2 errors were also observable immediately following scene changes at both bit-rates. In comparing cell loss impairments with encoding artifacts, we found that at 2.5 Mbps it was sometimes difficult to differentiate between errors caused by ATM cell loss and errors introduced by the encoding process when the video was played at normal speed.

3.1. Picture quality rating of MPEG-2 video with cell loss impairment

The video sequences with various degrees of impairment from the MPEG-2 encoder and the ATM traffic simulator were compared with the original sequence using the human vision picture quality model of the PQA-200. The analysis tracked the quality over the entire length of the 5-minute video. Higher quality pictures yield lower PQR ratings. The predicted quality ratings were computed with several assumptions about the viewing conditions: it was assumed that viewers would see the material on a television monitor from a viewing distance of four screen heights, with the gamma value of the display monitor set to 2.5 [9]. The picture quality rating series produced by the picture quality model gave performance ratings at a temporal resolution of one rating per field. Figure 3 shows the series of ratings over 5 minutes of the test video containing only MPEG-2 distortions at 6.0 Mbps. The spikes in the rating series indicate severe degradation caused by abrupt scene changes and other rapid switches in temporal activity. Video encoded at the same bit-rate but with cell losses is shown in Figure 4. At a cell loss ratio of 10^-5, randomly distributed spikes mark the locations of impaired pictures in the video sequence. The density of the spikes changes over the 5 minutes, indicating some unevenness in the cell loss impairment: the first half appears to suffer more severe distortions than the second. This may be caused by the fast motion and complex scenes in the first half of the video sequence; coding complex frames at a constant bit-rate requires extra compression and therefore makes each cell more critical to representing the pictures. Increasing the cell loss ratio to 10^-4 resulted in more spikes (Figure 5). However, the PQR ratings for the unimpaired frames still follow the same curve as in the ATM error-free case shown in Figure 3.
The PQA-200 also outputs one summary value for the entire sequence, computed using the Minkowski summation. The overall ratings for the three impairment conditions shown in Figures 3-5 are 3.07, 3.60, and 7.00, respectively, indicating that the picture quality drops off sharply as the CLR increases. As in typical QoS performance monitoring scenarios, we were mainly interested in examining the end-to-end video quality for a given network condition. Figures 6 and 7 show the rating results for 90 seconds of the material at the two MPEG-2 encoding rates, 2.5 and 6.0 Mbps, respectively, with the network cell loss ratio at 10^-4. With an equal percentage of information loss, the higher bit-rate video at 6.0 Mbps clearly suffers more glitches than the lower bit-rate video. This is simply because the higher bit-rate video stream experiences more cell losses per second than the lower bit-rate video at the same CLR.

3.2. Relation of overall video quality ratings to ATM cell loss ratio

As shown in the previous section, the higher bit-rate video suffers more cell losses than the lower bit-rate video. However, the percentage of picture information packed into each ATM cell in the higher bit-rate video is smaller than in the lower bit-rate video. To understand the perceptual quality implications of these scenarios, we compared the overall impairment levels at a number of cell loss ratios to examine how differently the two video streams encoded at different bit-rates would perform in these network environments. In Figure 8, the relation between the CLR and the overall picture quality rating is represented by two curves, one for the 6.0 Mbps video and the other for the 2.5 Mbps video. It shows that, at relatively low CLRs, the increase in degradation level as the CLR increases is approximately equal for both video streams, up to a CLR of 10^-5, or an ECLR of 9x10^-5. Both video sequences approach their corresponding ATM error-free quality levels at CLRs below 10^-6, or equivalently, ECLRs below 9x10^-6. These results match the findings from ATM video subjective tests [11]. However, the picture degradation in the 6.0 Mbps video accelerates much more sharply as the CLR rises beyond 10^-5, and eventually becomes worse than that of the lower bit-rate 2.5 Mbps video before the CLR reaches 10^-4. The relation between the CLR and the picture quality rating suggests that an increase in video bit-rate would not achieve the desired quality gain when the ECLR is higher than 10^-4, and that if the bit-rate goes even higher, the picture quality could become worse.
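A rough back-of-the-envelope calculation (Python) illustrates why the higher bit-rate stream shows more glitches at the same CLR, as noted above. It assumes the entire CBR bit-rate is carried as two 188-byte TS packets per 8-cell PDU (Section 2.4) and uses the approximate ECLR factor of 9 from Section 2.5, so the numbers are indicative only, not measured values.

# Rough illustration (not a measurement): expected cell-loss events per second
# for the two CBR streams at the same network cell loss ratio. Assumes the whole
# coded bit-rate travels as 2 x 188-byte TS packets per 8-cell PDU, which is a
# simplification of the real transport overhead.

CELLS_PER_PDU = 8
PDU_PAYLOAD_BITS = 2 * 188 * 8          # two TS packets per PDU
ECLR_FACTOR = 9                          # effective cell loss amplification (Section 2.5)

def losses_per_second(bit_rate_bps, clr):
    cells_per_second = bit_rate_bps / PDU_PAYLOAD_BITS * CELLS_PER_PDU
    return cells_per_second * clr        # expected lost-cell events per second

for rate in (2.5e6, 6.0e6):
    for clr in (1e-6, 1e-5, 1e-4):
        print(f"{rate/1e6:.1f} Mbps, CLR={clr:.0e}: "
              f"~{losses_per_second(rate, clr):.2f} loss events/s "
              f"(ECLR ~ {ECLR_FACTOR*clr:.0e})")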
Figure 3. Picture quality rating (PQR) versus field number for a 6.0 Mbps MPEG-2 sequence, CLR = 0.
Figure 4. Picture quality rating of the sequence coded at 6.0 Mbps with CLR at 10^-5, or ECLR at 9.0x10^-5.
Figure 5. Picture quality rating of the sequence coded at 6.0 Mbps with CLR at 10^-4, or ECLR at 9.0x10^-4.
Figure 6. Video clip from the 100th to the 190th second; MPEG-2 bit-rate: 2.5 Mbps, CLR: 10^-4 (ECLR: 9x10^-4).
Figure 7. Video clip from the 100th to the 190th second; MPEG-2 bit-rate: 6.0 Mbps, CLR: 10^-4 (ECLR: 9x10^-4).
4. CONCLUSIONS

We examined the QoS performance of end-to-end video transmission over ATM networks on a perceptual scale. Using a video sequence of extended length, we were able to map end-to-end video delivery conditions to user perceptual quality ratings and to characterize picture impairments caused by ATM cell losses. We showed that an increase in video bit-rate may improve
video quality only when the cell loss ratio is below a certain level. This experiment also demonstrated that such a threshold can be quantitatively determined using a picture quality model. Our experimental results also showed that higher bit-rate video is more vulnerable to ATM cell losses. Because video performance optimization requires knowledge of the relations among other video coding and networking parameters, further investigation is needed to determine the video delivery parameters that improve overall picture quality. Using similar methods, many encoding and networking options, such as the choice of error resilience methods and packet sizes, can be evaluated on a picture quality scale.
Figure 8. Overall picture quality rating versus cell loss ratio (plotted as -log CLR), for video encoded at 2.5 Mbps and 6.0 Mbps, with and without ATM errors.
5. REFERENCES

1. S. Gringeri, B. Khasnabish, A. Lewis, K. Shuaib, R. Egorov, and B. Basch, "Transmission of MPEG-2 video streams over ATM," IEEE Multimedia, pp. 58-71, 1998.
2. P. Salama, N. Shroff, and E. J. Delp, "A fast suboptimal approach to error concealment in encoded video streams," Proc. Int. Conf. Image Processing, Santa Barbara, California, vol. 2, pp. 101-104, 1997.
3. R. Swann and N. Kingsbury, "Error resilient transmission of MPEG-II over noisy wireless ATM networks," Proc. Int. Conf. Image Processing, Santa Barbara, California, vol. 2, pp. 85-88, 1997.
4. ITU-R Recommendation BT.500-8, "Methodology for the subjective assessment of the quality of television pictures," 1998.
5. J. Lubin, "The use of psychophysical data and models in the analysis of display system performance," in Digital Images and Human Vision, A. B. Watson, ed., Cambridge, MA: MIT Press, 1993, pp. 163-178.
6. S. Daly, "The visible differences predictor: an algorithm for the assessment of image fidelity," in Digital Images and Human Vision, A. B. Watson, ed., Cambridge, MA: MIT Press, 1993, pp. 179-206.
7. C. J. van den Branden Lambrecht, "A working spatio-temporal model of the human visual system for image restoration and quality assessment applications," Proc. Int. Conf. Acoustics, Speech, and Signal Processing, Atlanta, Georgia, pp. 2293-2296, 1996.
8. J. McVeigh, G. Chen, J. Goldstein, A. Gupta, M. Keith, and S. Wood, "Real time, software MPEG-2 video encoding on a PC," Picture Coding Symposium '99, Portland, Oregon, pp. 331-336, 1999.
9. ANSI document T1A1.5/97-612, "Sarnoff JND Vision Model," 1997.
10. I. Dalgic and F. A. Tobagi, "Glitches as a measure of video quality degradation caused by packet loss," Proc. 7th International Workshop on Packet Video, Brisbane, Australia, pp. 201-206, 1996.
11. J. Zamora, D. Anastassiou, S.-F. Chang, and L. Ulbricht, "Objective and subjective quality of service performance of video-on-demand in ATM-WAN," Signal Processing: Image Communication, vol. 14, no. 6-8, pp. 635-654, 1999.