HEVStream: A Framework for Streaming and Evaluation of High Efficiency Video Coding (HEVC) Content in Loss-prone Networks

James Nightingale, Student Member, IEEE, Qi Wang, Member, IEEE, and Christos Grecos, Senior Member, IEEE

Abstract — High Efficiency Video Coding (HEVC) is the next generation video compression standard currently under development within the ITU-T/ISO sponsored Joint Collaborative Team on Video Coding (JCT-VC). The standardization, and eventual adoption, of HEVC will contribute significantly to the future development of many consumer devices. Areas such as broadcast television, multimedia streaming, mobile communications and multimedia/video content storage will all be impacted by implementation of the emerging HEVC standard. Up to this point in time the research focus of HEVC has been on improvements to video compression efficiency and little work has been conducted into streaming of HEVC. In this work we consider the practical barriers to HEVC streaming in realistic environments and propose HEVStream, a streaming and evaluation framework for HEVC encoded content. Our framework fills the current gap in enabling networked HEVC visual applications and permits the implementation, testing and evaluation of HEVC encoded video streaming under a range of packet loss, bandwidth restriction and network delay scenarios in a realistic testbed environment. We provide a basic error concealment method for HEVC to overcome limitations within the decoder and an RTP packetisation format for HEVC Network Abstraction Layer (NAL) units. Comprehensive results of HEVC streaming experiments under various network circumstances are reported. These results provide an insight into the reduction in picture quality, measured as peak signal to noise ratio (PSNR), that can be expected under a wide range of network constraint and packet loss conditions. We report an average loss of 3.61dB when a bandwidth reduction of 10% is applied. We believe that this work will be amongst the first to report on successful design and implementation of HEVC network applications, and evaluation of the effects of network constraints or limitations on the quality of HEVC encoded video streams1.

Index Terms — HEVC, video streaming, packet loss, evaluation framework.
1 James Nightingale, Qi Wang and Christos Grecos are with the Audio Visual Communications and Networks Group, School of Computing, University of the West of Scotland, Paisley, United Kingdom (e-mail: [email protected]).

I. INTRODUCTION

The screen resolution and display capabilities of a broad range of consumer electronic products, from televisions to mobile phones, have improved significantly in recent times. This is set to increase further with the development of Ultra High Definition Television (UHD TV) [1] with screen
resolutions of up to 7680x4320, and the ever increasing size and resolution of displays on mobile devices such as smartphones and tablet PCs. As very high resolution devices are released to market, and their adoption gains pace, consumers will expect to be able to make full use of the display capabilities of these new high specification devices.

Video streaming is a bandwidth hungry application which currently accounts for a significant percentage of Internet traffic. Cisco have recently reported that consumer Internet video streaming accounted for 56 exabytes of Internet traffic in 2010 and predicted that this will grow to 403 exabytes by 2015 [2]. As consumers adopt higher resolution devices in both wired (e.g. UHD Internet televisions) and wireless (smartphones and tablets) environments, network operators will face new challenges in providing adequate bandwidth to satisfy consumer demand. These new challenges can be met, at least in part, by improving the compression ratio, and thereby reducing the bandwidth requirement, of the current H.264 Advanced Video Coding (H.264/AVC) standard [3]. The emerging High Efficiency Video Coding (HEVC) standard [4], [5] aims to address this issue by providing a 50% increase in compression efficiency over the H.264/AVC standard while maintaining the same level of perceptual visual quality.

At the current stage in the development of HEVC, the standardization effort has not produced any experimental results that demonstrate the effects of packet loss during transmission on HEVC encoded video streams. Although an ad-hoc subgroup (AHG14) of the JCT-VC is currently considering a limited set of packet loss experiments [8] to determine decoder robustness within the current Test Model under Consideration (TMuC, version HM6.0) [9], no significant progress in this area has been reported either in the literature or in the submissions to the JCT-VC standardization effort on HEVC.

In this work we enable and investigate the streaming of HEVC encoded video content in loss prone networks on a realistic testbed, which, unlike the software proposed in [8], will produce random packet loss from both the application of network constraints and transient interference in the wireless components of the testing system. Our major contribution is the design and implementation of an HEVC streaming and evaluation framework that enables a benchmark system towards achieving realistic real-time HEVC video streaming and permits the effective testing of HEVC performance under a varied range of network conditions. In particular, we define comprehensive end-to-end HEVC-specific video processing and delivery operations for a realistic and complete HEVC streaming system. In addition,
we devise a basic yet essential error concealment mechanism for HEVC and integrate it into our evaluation framework to overcome issues with resilience to packet loss in the current implementation of the HEVC decoder [9]. Our testing is performed on a realistic testbed that includes hybrid wired/wireless segments, multiple paths from sender to receiver, and multihomed mobile network functionality (a mobile network that is simultaneously connected to multiple access networks). The effects on HEVC performance of applying combinations of bandwidth, delay or packet loss constraints within the testbed environment can be measured and evaluated using our framework.

We provide experimental results showing the reduction in visual quality, measured in terms of PSNR, arising from the application of a wide range of network constraints and restrictions to HEVC encoded streams. These results provide benchmark performance indicators for HEVC streaming across a range of hybrid wired/wireless network environments utilizing both single and multipath delivery mechanisms. In light of the paucity of existing work to address the above issues, we anticipate that this paper will be amongst the first studies to offer a functional streaming and evaluation tool for HEVC and to report on the delivery of HEVC streams in loss prone networks.

The rest of the paper is organized as follows. Related work on the development of HEVC is reviewed in Section II. Section III describes the proposed streaming and evaluation framework, while our testbed environment and implementation details are presented in Section IV. Section V provides extensive results of our experiments on HEVC streaming in loss prone environments and Section VI concludes the paper.
II. RELATED WORK

This section describes existing work on the development of HEVC. In the interests of brevity, all of the features of HEVC described in this section are taken from the current draft specification [5] (February 2012) and individual JCT-VC documents are not referenced unless required to illustrate a significant point or concept.

The search for a replacement for the current H.264/AVC [3] video encoding standard began in January 2010 with the formation, by the ITU Telecommunication Standardization Sector (ITU-T) and the International Organization for Standardization (ISO), of the Joint Collaborative Team on Video Coding (JCT-VC). The remit of the JCT-VC was to design and develop the next generation video coding standard, which should provide an improvement in compression of at least 50% over the current standard without reduction in visual quality. Development of the proposed standard reached its first major milestone in February 2012 with the completion of the committee stage draft of the HEVC specification [5].

In common with H.264/AVC and its extensions, HEVC consists of both a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL). At the VCL layer, one of the most significant differences between HEVC and H.264 is the coding structure used in each picture. In H.264/AVC each picture is divided into macroblocks, each containing 16x16 luma samples, which can in turn be further divided into smaller blocks (16x8, 8x16, 8x8, 8x4, 4x8 and 4x4). In HEVC, however, pictures are divided into Coding Unit (CU) treeblocks [5] of up to 64x64 luma samples, and the highest level of the treeblock structure is referred to as the Largest Coding Unit (LCU). These treeblock structures can then be recursively split into smaller CUs using a quad-tree segmentation structure. HEVC permits CUs of 64x64, 32x32, 16x16 and 8x8 luma samples. The use of larger coding units in homogeneous regions of a picture with little or no motion between two adjacent pictures leads to a significant compression gain when using intra-prediction and transforms. The minimum CU size is determined by the number of levels of depth in the quad-tree structure. At any given depth level each of the four CUs can either be encoded as a single block at that level or split into smaller CUs at the next depth level. A typical HEVC quad-tree structure is illustrated in Fig. 1.
Fig. 1. HEVC Coding Unit Tree structure
At each CU quad-tree depth level the rate distortion cost (RD cost) of encoding as a single block is compared with the sum of the RD costs of the four smaller blocks, and the lowest cost option is used. Each node of the CU quad-tree structure represents a CU, which may be further split into Prediction Units (PUs) used for intra- and inter-prediction. Unlike CUs, PUs are not limited to a square shape; their size and shape depend upon which prediction type is used. As can be seen in Fig. 1, a CU has sides of length 2n and may be split into four smaller CUs, each side of which has a length n. TABLE I shows the possible PU size and shape options for each prediction mode available in HEVC. HEVC also contains a further coding structure, Transform Units (TUs), which are defined for transform and quantization
purposes. The depth of the TU residual quad-tree structure varies according to which HEVC encoding mode is used. HEVC includes integer transforms for different TU sizes. New Sample Adaptive Offset (SAO) and Adaptive Loop Filter (ALF) filters are available in HEVC and may be applied subsequent to deblocking filtering.

TABLE I
POSSIBLE PREDICTION UNIT SIZES

Coding Mode   Permitted Sizes
Intra         2n x 2n, n x n
Inter         2n x 2n, n x n, 2n x n, n x 2n
Skip          2n x 2n
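To make the recursive splitting decision concrete, the following minimal sketch evaluates a CU either as a single block or as four sub-CUs and keeps the cheaper option. It is an illustration only: rd_cost_single is a hypothetical stand-in for an encoder's rate distortion evaluation and is not part of the HEVC reference software.

```python
# Illustrative sketch of the recursive CU quad-tree split decision described above.
MIN_CU_SIZE = 8   # smallest CU permitted by the draft specification
MAX_CU_SIZE = 64  # LCU size


def best_rd_cost(x, y, size, rd_cost_single):
    """Return the lowest RD cost for the CU at (x, y) of the given size,
    choosing between encoding it whole or splitting it into four sub-CUs."""
    cost_whole = rd_cost_single(x, y, size)
    if size <= MIN_CU_SIZE:
        return cost_whole  # leaf of the quad-tree: no further split allowed

    half = size // 2
    cost_split = sum(
        best_rd_cost(x + dx, y + dy, half, rd_cost_single)
        for dx in (0, half)
        for dy in (0, half)
    )
    return min(cost_whole, cost_split)
```

Starting the recursion at the LCU size (64) reproduces the depth-first traversal of the quad-tree shown in Fig. 1.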
There are two encoding complexity configurations (High Efficiency and Low Complexity) and three coding modes (Intra, Random Access and Low Delay) in the current draft specification of HEVC [5]. The Low Complexity configuration sacrifices coding efficiency in favour of low computational complexity, while the High Efficiency configuration strives to achieve the highest coding efficiency. The Low Delay coding mode uses only previous frames for reference, whereas the Random Access coding mode uses subsequent frames for temporal prediction. The Intra mode does not employ any temporal prediction.

HEVC also includes two new picture types, the first of which is the Clean Random Access (CRA) picture. A CRA picture is similar in function to the open Group-of-Pictures (GOP) intra picture in H.264/AVC and has been allocated a unique NAL unit type. HEVC streams may begin with either an Instantaneous Decoder Refresh (IDR) or a CRA picture, and CRA pictures can be signalled using a recovery point. The second new picture type is the Temporal Layer Access (TLA) picture, used to indicate temporal layer switching points.

HEVC encoding involves motion compensation and spatial intra-prediction, the application of integer transforms to prediction residuals [7], and CABAC-based arithmetic entropy coding. Similarly to H.264, in-loop deblocking filtering is applied to the reconstructed picture.

In HEVC each slice is encoded in a single NAL unit. The size of a slice (and the subsequent NAL unit) may be matched to that of the Maximum Transmission Unit (MTU) of the network over which the video will be streamed. This provides NAL units equal to or less than the MTU size, thereby removing the need for NAL unit fragmentation and eliminating the possibility of NAL units being discarded due to the first fragment, containing header data, being lost during transmission.

To counter the increased computational complexity of HEVC, when compared with H.264/AVC, HEVC is being designed to support parallel processing at the sub-picture level. The standardisation effort with regard to parallelisation is currently ongoing. A recent work by Correa et al. [10] addresses the complexity aspect of HEVC on power-constrained devices by firstly establishing the relationship between CU treeblock depth and computational complexity and secondly proposing a mechanism to reduce computational complexity by constraining the maximum CU treeblock depth.

Work on the development of the networking components required for HEVC is currently emerging both within the JCT-
VC and the IETF. At the Network Abstraction Layer of HEVC, a bitstream format compliant with Annex B of the H.264/AVC standard [12] is employed. In common with H.264/AVC, a one-byte NAL unit header is employed in HEVC. However, in HEVC the NAL unit type field has been extended from five bits to six, permitting a doubling of the number of possible NAL unit type codes available. Although the extra NAL unit types have not as yet been allocated, some will be needed for parallelisation and the proposed scalable and multiview extensions to HEVC. An additional one-byte extension header is also used in Video Coding Layer (VCL) NAL units. The HEVC NAL unit header is compared with that of H.264/AVC in Fig. 2.
Fig. 2. NAL Unit Header Comparison between HEVC and H.264/AVC.
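As an illustration of the header difference, the sketch below extracts the type field from the first header byte of each format. The H.264/AVC bit layout (forbidden bit, two-bit nal_ref_idc, five-bit type) follows the published standard; the bit positions assumed for the draft HEVC header are an assumption for illustration, based only on the six-bit type field described above.

```python
# Extract the NAL unit type from the first header byte.

# H.264/AVC layout: forbidden_zero_bit (1) | nal_ref_idc (2) | nal_unit_type (5)
def h264_nal_type(first_byte: int) -> int:
    return first_byte & 0x1F  # low five bits


# Draft HEVC layout (assumed here for illustration only):
# forbidden_zero_bit (1) | reference flag (1) | nal_unit_type (6)
def hevc_nal_type(first_byte: int) -> int:
    return first_byte & 0x3F  # low six bits


if __name__ == "__main__":
    print(h264_nal_type(0x67))  # 7: an H.264 sequence parameter set
    print(hevc_nal_type(0x42))  # type value under the assumed HEVC layout
```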
The AHG14 ad-hoc subgroup of the JCT-VC have recently released software [11] which permits researchers to manipulate an HEVC encoded bytestream file to remove NAL units and thereby investigate the robustness to packet loss of the reference decoder in the TMuC. Additionally, a first draft of an RTP payload format for HEVC [13] has been published within the IETF. However, to the best of our knowledge, no existing work has reported on the use of the NAL unit loss software [11] either in the literature or within the JCT-VC, nor has any work reported on the streaming of HEVC encoded streams in a realistic loss prone network environment.

III. THE PROPOSED HEVSTREAM FRAMEWORK

Our streaming and evaluation framework for HEVC encoded video streams (HEVStream) is designed to enable a benchmark system towards realistic real-time HEVC
streaming and facilitate the investigation of the effects of packet/NAL unit loss on the visual quality of the received video stream. It consists of three stages: pre-processing, streaming and post-processing. It also contains a number of middleware components and a control overlay mechanism. HEVStream is a trace driven streaming environment in which NAL units are extracted from the HEVC bytestream file by reading from a custom log file produced by a modified HEVC encoder. Trace driven video streaming evaluation techniques are well established in the literature. In [14] Seeling and Reisslein provide a comprehensive survey of trace driven evaluation schemes for the current H.264/AVC standard and its extensions.

A. Pre-Processing Stage

The pre-processing stage consists of two steps: encoding and prioritization. In the first step the raw video sequence is encoded, according to the chosen test condition (see TABLE III in Section V), using a modified version of the HEVC encoder from the current TMuC [9]. In this work we use the common test conditions recommended by the JCT-VC in [17]. The expanded log file (shown in Fig. 3) produced by the modified HEVC encoder (in verbose mode) now includes additional fields that aid the trace driven streaming process. In particular, when the encoder writes a NAL unit to the HEVC bytestream file, we record the memory offset from the start of the file to the first byte of the current NAL unit in a field named OFFSET. This provides a means of directly extracting any individual NAL unit from the bitstream file, thus aiding experiments involving prioritization schemes. It also provides a unique identifier for each NAL unit, which is later used for offline comparisons of sender and receiver trace files.
Fig. 3. Data written to HEVC NAL unit trace file for each NAL unit produced by the encoder.
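The OFFSET field makes the trace-driven extraction straightforward. The sketch below assumes, purely for illustration, a comma separated trace file with OFFSET and LENGTH columns; LENGTH is a hypothetical field giving the NAL unit size in bytes and is not named in the description above.

```python
# Sketch of trace-driven NAL unit extraction using the OFFSET field.
import csv


def extract_nal_units(trace_path, bytestream_path):
    """Yield (record, nal_bytes) pairs, one per line of the encoder trace file."""
    with open(trace_path, newline="") as trace, open(bytestream_path, "rb") as bs:
        for record in csv.DictReader(trace):
            bs.seek(int(record["OFFSET"]))        # jump to the start of the NAL unit
            nal = bs.read(int(record["LENGTH"]))  # read the whole NAL unit (assumed field)
            yield record, nal
```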
The NAL unit DECODE_TIME is used in selective packet dropping schemes such as [15]. It is the time (in milliseconds), relative to the first NAL unit in a stream, by which the current NAL unit must be available at the decoder. NAL units arriving after their DECODE_TIME would not be useful in the decoding process (without causing a buffering wait during playback). The TIMESTAMP field contains the sender timestamp, which is written by the streamer when a packetized NAL unit is passed to the Linux kernel for transmission by the packet scheduler.

The second pre-processing step is NAL unit prioritization. Although we have included the PRIORITY field in the NAL unit trace file, all NAL units are assigned the same priority in this current work. The encoder NAL unit trace file (shown in Fig. 3) can be manipulated to, for example, re-order HEVC NAL units according to some prioritization mechanism and will, in future work, aid the selective dropping of HEVC NAL units from a stream in response to the application of a network
resource constraint. This future work will either further amend the HEVC encoder to write directly to the PRIORITY field using some encoder calculated rate distortion metric, or include a further offline pre-processing step to calculate and write a priority weighting for each NAL unit.

B. Streaming Stage

The streaming stage of our framework consists of four steps: bytestream extraction, packetisation, scheduling and reception. The first three are software modules resident on the streamer node, with the fourth (reception) resident on the client node. Firstly, a custom bytestream extraction tool parses each line in the encoder NAL unit trace file, seeks to the recorded offset in the HEVC bytestream file and extracts the NAL unit.

In the second step, extracted NAL units are packetized for streaming over the network. Two NAL unit packetisation strategies are employed. In common test conditions, where a Picture Parameter Set (PPS) message occurs in the stream before the first VCL NAL unit of a new picture, we employ a single time aggregation packet. We use the STAP-A non-interleaved packetisation mode defined in the draft RTP payload format for HEVC [13]. Using this strategy ensures that the relevant PPS is always delivered to the decoder together with the first NAL unit of the picture that it describes. Whenever a NAL unit is passed from the bytestream extractor for packetisation, the NAL unit type is examined. All non-VCL control NAL units, with the exception of PPS NAL units, are passed forward for encapsulation in a one-NAL-unit-per-RTP-packet fashion. Whenever a PPS NAL unit is encountered, it is buffered in an aggregation packet buffer and the following VCL NAL unit, which is the first coded slice of a new picture, is extracted from the bytestream. The two NAL units (PPS and VCL) are then passed forward for packetisation in a STAP-A packet, with the PPS NAL unit placed first (in transmission order) in the packet. All other VCL NAL units are packetized using a one-NAL-unit-per-RTP-packet strategy. We employ a custom 12-byte pseudo-RTP header (shown in Fig. 4) that carries information useful in the streaming evaluation framework.
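The aggregation rule described above can be summarised by the following sketch. The type tests is_pps and is_vcl are hypothetical placeholders for reading the type code from the NAL unit header; the sketch shows only the grouping of NAL units into RTP payloads, not the construction of the headers themselves.

```python
# Sketch of the packetisation logic: PPS NAL units are held back and aggregated
# (STAP-A style) with the first coded slice of the picture they describe, while
# all other NAL units are sent one per RTP packet.

def packetize(nal_units, is_pps, is_vcl):
    """Yield lists of NAL units, one list per outgoing RTP packet payload."""
    pending_pps = None
    for nal in nal_units:
        if is_pps(nal):
            pending_pps = nal            # buffer until the next coded slice
        elif is_vcl(nal) and pending_pps is not None:
            yield [pending_pps, nal]     # STAP-A style aggregation packet, PPS first
            pending_pps = None
        else:
            yield [nal]                  # one NAL unit per RTP packet
    if pending_pps is not None:
        yield [pending_pps]              # flush a trailing PPS, if any
```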
Fig. 4. Custom pseudo-RTP header for HEVC streaming evaluation.
It also permits both selective packet dropping and the evaluation of many different HEVC streaming scenarios without the need to inspect NAL unit headers or payload data. As the pseudo-RTP header is the same size as a real RTP header, no additional overhead is incurred by the transmission of the evaluation parameters contained in the custom header. The fields in the custom RTP header are populated by the RTP packetisation tool, which combines data parsed from both the encoder NAL unit trace file and the NAL unit headers. RTP
packets are further encapsulated for transmission over UDP. The operations of the sender and the receiver of HEVStream are illustrated in Fig. 5 and Fig. 6, respectively.
Fig. 5. Sender-side operation of HEVStream.

Fig. 6. Client-side operation of HEVStream.
The third step in the streaming stage is packet scheduling. Two versions of the streamer have been developed. In the first, packets are transmitted over a single hybrid wired/wireless path to the client node. The second version interacts with the multihomed mobile network components of our testbed and passes packets to the Concurrent Multipath Transmission for Network Mobility (CMT-NEMO) [16] protocol, which distributes the stream over multiple paths for concurrent transmission to the client. In this work we use a simple earliest-delivery-path-first scheduling mechanism to decide which path should be taken by each packet, as explained below.

Network constraints are applied at wide area network (WAN) emulation routers within our testbed. The constraints that can be applied include a reduction in available bandwidth, the introduction of additional delay and the random loss of a given percentage of packets in an application flow. In multipath transmission scenarios the WAN emulation routers report the current network path state (bandwidth and delay) to the streamer, which uses this data to decide which path will offer the earliest delivery time to the client for any given packet.
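A minimal sketch of the earliest-delivery-path-first decision is given below. The delivery time estimate (bytes queued on a path drained at the reported bandwidth, plus the reported delay) is an illustrative model and not necessarily the exact formula used by the streamer.

```python
# Sketch of the earliest-delivery-path-first scheduling decision.

def pick_path(packet_size_bytes, paths):
    """paths: list of dicts with reported 'bandwidth_bps', 'delay_s' and the
    number of bytes already queued on that path ('queued_bytes').
    Returns the index of the path with the earliest estimated delivery time."""
    def eta(p):
        serialization = (p["queued_bytes"] + packet_size_bytes) * 8 / p["bandwidth_bps"]
        return serialization + p["delay_s"]
    return min(range(len(paths)), key=lambda i: eta(paths[i]))


# Example: two emulated WAN paths with different bandwidth/delay reports.
paths = [
    {"bandwidth_bps": 2_000_000, "delay_s": 0.040, "queued_bytes": 6000},
    {"bandwidth_bps": 1_000_000, "delay_s": 0.015, "queued_bytes": 0},
]
print(pick_path(1400, paths))  # selects whichever path minimises the estimate
```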
The number of middleware components in the system is dependent upon which streaming scenario is employed, as is the nature of the control overlay mechanism used. Scheduling components cooperate with a number of agents running on intermediate nodes within our network. These agents report current network conditions to the scheduler, which can then make informed decisions on how best to match the video stream to the current network conditions. These intermediate hardware and software components are all modified versions of those developed in our previous work [15], [16]. The selective dropping features of these components are available but currently disabled; they will be used in future work on prioritization of HEVC NAL units. Some details of the control overlay mechanism and the network topologies used are provided in Section IV (implementation) of this paper.

Reception is the final step in the streaming stage of our framework. At the client side of the testbed (shown in Fig. 6), the receiver application receives RTP packets from the network interface. It then removes the custom pseudo-RTP header and writes the decoder NAL unit trace file, which details each NAL unit received, the order of reception and the timestamp of when the NAL unit arrived at the client node. Received NAL units are written to the receiver HEVC bytestream file in the order in which they were received.
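The reception step can be sketched as follows. The UDP socket handling and the trace file format are illustrative assumptions; only the 12-byte pseudo-RTP header size is taken from the description above.

```python
# Sketch of the client-side reception step: strip the pseudo-RTP header, log the
# arrival, and append the NAL unit payload to the receiver bytestream file in
# arrival order. The header fields themselves are not interpreted here.
import socket
import time

HEADER_SIZE = 12  # custom pseudo-RTP header, same size as a real RTP header


def receive(port, bytestream_path, trace_path, max_packet=65535):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    with open(bytestream_path, "wb") as bs, open(trace_path, "w") as trace:
        seq = 0
        while True:
            packet, _addr = sock.recvfrom(max_packet)
            payload = packet[HEADER_SIZE:]   # discard the pseudo-RTP header
            bs.write(payload)                # receiver bytestream, arrival order
            trace.write(f"{seq},{time.time():.6f},{len(payload)}\n")
            seq += 1
```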
C. Post-Processing Stage

The post-processing stage in the framework consists of three steps: lost NAL unit identification, decoding with error concealment and visual quality assessment. The encoder NAL unit trace file is compared with the decoder NAL unit trace file. Missing/lost data packets (and the NAL units they contain) are identified and a record for each NAL unit in the encoder NAL unit trace file is written to the receiver NAL unit loss log. The status of each NAL unit is described as one of the reception states shown in TABLE II.
TABLE II
POSSIBLE NAL UNIT RECEPTION STATES

Reception State   Description
100               NAL unit arrived on time and intact.
200               NAL unit arrived intact but late.
300               NAL unit arrived with missing fragments.
400               NAL unit lost during transmission.
500               NAL unit selectively dropped by streamer (currently unused).
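A sketch of the reception state classification implied by TABLE II is given below. The per-NAL-unit record fields (byte counts, arrival time, DECODE_TIME) follow the description in this section; how the tools index and look up the records is an assumption for illustration.

```python
# Sketch of the reception-state classification driven by the two trace files.

def reception_state(sent, received):
    """sent/received: per-NAL-unit records; received is None if nothing arrived.
    arrival_ms is assumed to be on the same relative time base as DECODE_TIME."""
    if received is None:
        return 400                              # lost during transmission
    if received["bytes"] < sent["bytes"]:
        return 300                              # arrived with missing fragments
    if received["arrival_ms"] > sent["decode_time_ms"]:
        return 200                              # intact but too late to decode on time
    return 100                                  # on time and intact
# State 500 (selectively dropped by the streamer) is reserved and unused here.
```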
The comparison between the sender and the receiver trace files not only identifies lost NAL units but also, by comparing the number of bytes in each NAL unit, identifies NAL units with missing fragments. NAL units with missing fragments can cause the decoder to perform unexpectedly. By comparing the sender and receiver timestamps, those packets that arrived after their DECODE_TIME are also identified. The receiver NAL unit loss log file provides summary data on the number of NAL units in each reception state category.

A modified version of the HEVC decoder from the current TMuC [9] extracts NAL units from the receiver bytestream file while concurrently reading from the receiver NAL unit loss log. The modified decoder is therefore made aware of missing NAL units by an additional out-of-band mechanism. Where an incomplete or lost NAL unit causes a decoder error, we trap this error and permit the decoder to continue decoding other pictures in the stream. Error concealment is performed using a simple frame copy method: missing NAL units are compensated for by copying the co-located blocks from the previous picture or last reference picture in the decoder buffer. Finally, the error concealed decoded video is compared with the original raw video sequence using the peak signal to noise ratio (PSNR) metric.

IV. IMPLEMENTATION

The proposed HEVStream system has been fully implemented on a realistic hardware based testbed platform, shown in Fig. 7, which offers both a single path hybrid wired/wireless environment and a multihomed mobile networks environment. Built upon the testbed platform previously used in [15] and [16], it uses the CMT-NEMO protocol [16] for concurrent multipath transmission in multihomed mobile networks. All of the software components are written in either C++ (amended HEVC encoder, packetisation, streamer, receiver and amended HEVC decoder) or Python (pre-processing and post-processing tools) and could be implemented on other platforms with minimal modification.
Fig. 7. Topology of the testbed showing major components of the streaming and evaluation framework.
The intermediate network components of our framework provide feedback to the streamer on current network path conditions (instantaneous available bandwidth and end-to-end delay). In the configuration used for these initial benchmark tests of HEVC streaming, this data is recorded at the streamer but no action is taken in response to changes in network path conditions. All packet loss or delay is directly attributable to the application of constraints at the WAN emulation routers. These routers run the NETEM network emulator of the Linux kernel, configured using the tc tool from the IPROUTE2 package. Bandwidth restriction, packet loss ratio and added delay constraints are applied using this tool.

V. EXPERIMENTAL RESULTS

The JCT-VC has published a list [17] of common test conditions for HEVC, shown in TABLE III. Of the twelve test conditions listed, five are designated as optional in [17]. The experiments conducted for this paper used the seven recommended test conditions. A subset of eight different video test sequences from the recommended sequences for HEVC was used in our experiments, shown in TABLE IV. For each test sequence/testing condition combination, streaming experiments were conducted over a single network path at a range of bandwidths. Experiments were conducted with the available bandwidth set to 110%, 100%, 97.5%, 95% and 90% of the bandwidth requirement for the encoded test sequence/test condition combination. These testing points represent bandwidth restrictions of 2.5%, 5% and 10% of the stream requirement. We used the bandwidth restriction metric in these experiments because, in real world situations, packet loss is more likely to occur due to a reduction in bandwidth or an increase in end-to-end delay (from congestion etc.) than from the application of a policy of deliberately dropping x% of the packets in a stream. The packet loss ratio is, however, calculated, mapped to bandwidth reduction and reported in the experimental results.
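The constraints at the WAN emulation routers are applied with NETEM, as noted above. The snippet below shows one representative way to combine delay, random loss and rate limiting on a Linux router, run from Python for consistency with the framework's tooling; the interface name, numeric values and the particular netem/tbf qdisc combination are placeholders rather than the exact configuration used in our experiments.

```python
# Illustrative NETEM/tc configuration for one WAN emulation router.
import subprocess

IFACE = "eth1"  # placeholder interface facing the client network

commands = [
    # add propagation delay and random packet loss (example values)
    f"tc qdisc add dev {IFACE} root handle 1: netem delay 50ms loss 1%",
    # limit available bandwidth, e.g. to 90% of the stream requirement
    f"tc qdisc add dev {IFACE} parent 1:1 handle 10: tbf rate 4500kbit buffer 1600 limit 3000",
]

for cmd in commands:
    subprocess.run(cmd.split(), check=True)  # requires root privileges
```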
TABLE III
HEVC COMMON TESTING CONDITIONS

Complexity Configuration   Encoding Modes
High Efficiency            Intra (I-HE) (R), Random Access (RA-HE) (R),
                           Low-Delay (LD-HE) (R), Random Access - 10 bit (RA-HE-10) (R),
                           Intra - 10 bit (I-HE-10) (O), Low-Delay - 10 bit (LD-HE-10) (O),
                           Low-Delay - P Slice Only (LD-HE-P) (O),
                           Low-Delay - P Slice - 10 bit (LD-HE-P-10) (O)
Low Complexity             Intra (I-LC) (R), Random Access (RA-LC) (R),
                           Low-Delay (LD-LC) (R), Low-Delay - P Slice only (LD-LC-P) (O)

R = Recommended Test Condition, O = Optional Test Condition

In the single path experiments a subset of the test sequences and test conditions was also encoded using MTU size matching, where we set the maximum size of a NAL unit at 1400 bytes. This figure reflects the maximum payload of an IPv6 packet with a 1500 byte MTU when all headers, and the inclusion of PPS NAL units in a STAP-A type aggregation packet, are considered. Additionally, a limited number of experiments were conducted in a multipath scenario over our multihomed mobile networks testing environment.

For each set of experiments both the packet loss ratio resulting from the application of a bandwidth constraint and the received video quality were measured. The relationship between bandwidth reduction and RTP packet loss is shown in Fig. 8. Although there is a clear relationship between the percentage reduction in bandwidth and the packet loss ratio, it can be seen that the variability in the level of packet loss increases in line with the level of bandwidth reduction. We attribute this variation to the variable bit rate (VBR) nature of the HEVC streams; we observed a wide variation in NAL unit size for different sequence/testing condition combinations.

The mean percentage of each type of NAL unit lost across all experiments is reported in Fig. 9, where it can be seen that 93% of lost packets either contained only VCL NAL units or were STAP-A aggregation packets containing both a PPS NAL unit and a VCL NAL unit. This demonstrates that the significantly larger VCL NAL units are more likely to be dropped in an uncontrolled manner to meet a bandwidth constraint than parameter set NAL units, which are typically less than 30 bytes in length. Lost supplemental enhancement information (SEI) messages represented 4% of the total packet loss.

The relationship between bandwidth reduction and video quality impairment (measured by PSNR) for each of the seven common testing conditions used is shown in Fig. 10 for the Class C Racehorses sequence at 30 fps and in Fig. 11 for the Class B Kimono1 sequence at 24 fps, respectively. These two figures provide typical examples of HEVC streaming performance, with a full set of results for all test sequences reported in TABLE IV. Packets are dropped according to the FIFO queuing discipline in the WAN emulation routers, with no intelligent scheduling, resulting in a high degree of variability in which types of packet are dropped. This variability can be clearly seen in Fig. 11. It is noted that the relatively higher variance between PSNR results for each test condition in Fig. 11, compared with those in Fig. 10, is primarily due to a lower number of testing samples being available for the Kimono1 sequence: it was only used in single path experiments, whereas the Racehorses sequence was used in single path, multipath and MTU matching experiments.
Fig. 8. Packet loss rate for each bandwidth reduction.
Fig. 9. Analysis of NAL units lost across all tests.
The mean difference in PSNR from the HEVC anchor values for each sequence/testing condition combination is reported in Fig. 12. The values for all testing sequences are the mean differences over all sequence/test condition combinations at QP = 22, 27, 32 and 37. At a bandwidth reduction of 10% the mean PSNR drop is 3.61dB, with the results for 5% and 2.5% reductions being 1.67dB and 0.71dB respectively. Values for the Kimono1 and Racehorses sequences are shown for comparison and to illustrate a consistent trend across all sequences.
The relative difference between the results for the two test sequences and the overall mean can be attributed to two factors. Firstly, the test sequences used for illustration were encoded at different QPs: Racehorses was encoded at QP = 22, which resulted in a significantly higher anchor PSNR than that of Kimono1, which was encoded at QP = 37. The higher drop in PSNR for Racehorses therefore reflects a consistent percentage drop in PSNR, as it had a higher anchor PSNR value. Secondly, the use of Racehorses in multipath experiments also contributed to its higher PSNR loss.
Fig. 12. Mean PSNR differences across all tests.
Fig. 10. PSNR results for the Racehorses sequence.
Fig. 13. Mean PSNR differences between single and multipath streaming.
Full results for the mean PSNR difference between the experimental results and the HEVC anchor values are reported in TABLE IV for each test sequence used.
Fig. 11. PSNR results for the Kimono1 sequence.
Fig. 13 reports a comparison of mean PSNR differences for all experiments between single and multipath streaming: single path streaming consistently has a lower PSNR loss, by about 2dB, than multipath streaming. This is explained by the fact that we did not employ any HEVC specific scheduling mechanism in conjunction with CMT-NEMO in this work, but rather relied on a simple earliest-delivery-path-first mechanism.
TABLE IV
MEAN DIFFERENCE IN Y-PSNR FROM ANCHOR

Class   Sequence         Bandwidth Restriction Applied
                         10%       5%        2.5%
B       Kimono1          3.13dB    1.49dB    0.61dB
B       BQTerrace        3.91dB    1.81dB    0.69dB
C       BQMall           3.65dB    1.63dB    0.70dB
C       Racehorses       3.89dB    1.76dB    0.79dB
D       BasketballPass   3.21dB    1.57dB    0.64dB
D       BQSquare         3.52dB    1.66dB    0.73dB
E       Vidyo1           3.62dB    1.71dB    0.72dB
E       Vidyo3           3.66dB    1.79dB    0.83dB
For a 10% reduction in bandwidth the drop in PSNR ranges from 3.13dB for the Kimono1 sequence to 3.91dB for the BQTerrace sequence, with a mean overall difference of 3.61dB. At a 5% reduction in bandwidth the PSNR drop ranged from 1.49dB to 1.81dB, with a mean overall difference of 1.67dB. With a 2.5% reduction in bandwidth the PSNR drop ranged from 0.61dB to 0.83dB, with a mean overall difference of 0.71dB.
VI. CONCLUSIONS

In this work, we have designed and implemented a comprehensive streaming and evaluation framework for HEVC encoded video streams. By utilising this framework, we have provided an insight into the performance implications of streaming HEVC encoded content in loss prone networks. Our framework includes complete end-to-end HEVC-specific video processing and streaming functionality, including a frame copy error concealment method to compensate for missing NAL units, and has permitted testing of HEVC streaming under a range of network conditions. We have provided results of experiments conducted on a realistic, hardware based testbed platform. The effects of applying bandwidth, packet loss and path latency constraints on the quality of received video streams are reported across the range of HEVC's recommended testing conditions. These results provide benchmarks against which future HEVC streaming mechanisms can be measured. Components have been included in the framework to permit selective dropping of network packets/NAL units in response to applied network constraints. Our future work will concentrate on the development of suitable packet/NAL unit prioritization schemes for use in selective dropping schemes for HEVC.

REFERENCES

[1] E. Nakasu, "Super Hi-Vision on the Horizon: A Future TV System That Conveys an Enhanced Sense of Reality and Presence," IEEE Consumer Electronics Magazine, vol. 1, no. 2, pp. 36-42, April 2012.
[2] Cisco, "Cisco visual networking index: forecast and methodology, 2010-2015," White Paper, June 2011. [Online]. Available: http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11481360_ns827_Networking_Solutions_White_Paper.html
[3] T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560-576, July 2003.
[4] D. Marpe et al., "Improved video compression technology and the emerging high efficiency video coding standard," 2011 IEEE International Conference on Consumer Electronics - Berlin (ICCE-Berlin), pp. 52-56, Sept. 2011.
[5] B. Bross, W. J. Han, J. R. Ohm, G. J. Sullivan and T. Wiegand, "High efficiency video coding (HEVC) text specification draft 6," JCT-VC Document, JCTVC-H1003-v21, April 2012.
[6] S. Oudin et al., "Block merging for quadtree-based video coding," 2011 IEEE International Conference on Multimedia and Expo (ICME), pp. 1-6, July 2011.
[7] M. Winken, P. Helle, D. Marpe, H. Schwarz, and T. Wiegand, "Transform coding in the HEVC test model," in Proc. IEEE International Conference on Image Processing (ICIP), Sep. 2011.
[8] S. Wenger, "Loss robustness report (AHG14)," JCT-VC Document, JCTVC-H0014, Feb. 2012.
[9] "HEVC Test Model under Consideration (TMuC), Revision 6.0," JCT-VC Contribution HM 6.0, March 2012.
[10] G. Correa, P. Assuncao, L. Agostini and L. A. Silva Cruz, "Complexity Control of High Efficiency Video Encoders for Power Constrained Devices," IEEE Trans. Consum. Electron., vol. 57, no. 4, pp. 1866-1874, November 2011.
[11] S. Wenger, "NAL Unit Loss Software," JCT-VC Document, JCTVC-H0072, Feb. 2012.
[12] Advanced Video Coding, ITU-T Rec. H.264 and ISO/IEC 14496-10 (MPEG-4 AVC), Version 13, Mar. 2011.
[13] T. Schierl, S. Wenger, Y.-K. Wang and M. M. Hannuksela, "RTP payload format for high efficiency video coding," IETF Internet Draft, Feb. 2012.
[14] P. Seeling and M. Reisslein, "Video transport evaluation with H.264 video traces," IEEE Communications Surveys & Tutorials, vol. PP, no. 99, pp. 1-24, 2011.
[15] J. Nightingale, Q. Wang and C. Grecos, "Optimized transmission of H.264 scalable video streams over multiple paths in mobile networks," IEEE Trans. Consum. Electron., vol. 56, no. 4, pp. 2161-2169, November 2010.
[16] J. Nightingale, Q. Wang and C. Grecos, "Removing path switching cost in video delivery over multiple paths in mobile networks," IEEE Trans. Consum. Electron., vol. 58, no. 1, pp. 38-46, February 2012.
[17] F. Bossen, "Common test conditions and software reference configurations," JCT-VC Document, JCTVC-G1200.
[18] M. Horowitz, S. Xu, E. S. Ryu and Y. Ye, "The effect of LCU size on coding efficiency in the context of MTU size matching," JCT-VC Document, JCTVC-F596, July 2011.
BIOGRAPHIES

James Nightingale (S'09) received the BSc degree in Network Computing from Edinburgh Napier University, UK, and the BSc (Hons) degree in Computer Networks from the University of the West of Scotland, UK, where he is currently a PhD student. His research interests include mobile networks, multihoming and video streaming techniques.

Qi Wang (S'02-M'06) is a Lecturer in Computer Networking with the Audio-Visual Communications and Networks Research Group (AVCN) at the University of the West of Scotland (UWS), UK. Previously, he was a Research Fellow with the University of Strathclyde, UK, and a telecommunications engineer with the State Grid Corporation of China. He received his BEng and MEng degrees in electronic and communication systems from Dalian Maritime University, China, and his PhD degree in mobile networking from the University of Plymouth, UK. He was a recipient of an England ORS award. Recently, he has been involved in a number of international and national projects such as European Union FP6 MULTINET and UK EPSRC DIAS. His research interests include Internet Protocol networks and applications, wireless and mobile networks, and video networking. He is the primary supervisor of several PhD programmes and has published over 30 papers in renowned international journals and conference proceedings. He is a Member of IEEE and serves on the technical program committees of numerous IEEE and other international conferences.

Christos Grecos (M'01-SM'06) is a Professor in Visual Communications Standards and Head of the School of Computing at the University of the West of Scotland (UWS), UK. He leads the Audio-Visual Communications and Networks Research Group (AVCN) at UWS, and his research interests include image/video compression standards, image/video processing and analysis, image/video networking and computer vision. He has published many research papers in top-tier international publications, including a number of IEEE transactions, on these topics. He is on the editorial board of, or has served as guest editor for, many international journals, and he has been invited to give talks at various international conferences. He has been the Principal Investigator for several national and international projects funded by UK EPSRC or the EU. He received his PhD degree in Image/Video Coding Algorithms from the University of Glamorgan, UK. He is a Senior Member of IEEE.