IEEE International Conference on Digital Telecommunications, Côte d'Azur, France, August 2006

Content-Based Time Sampling for Efficient Video Delivery over Networks of Low and Variable Bandwidth

Anastasios D. Doulamis

Dimitrios I. Kosmopoulos

Nikolaos D. Doulamis

National Technical University of Athens, Dept. of Electrical and Computer Engineering [email protected]

National Centre for Scientific Research Demokritos, Institute of Informatics & Telecommunications [email protected]

National Technical University of Athens, Dept. of Electrical and Computer Engineering [email protected]

Abstract – A new content-based algorithm is proposed for efficient delivery of video sequences over networks of low and variable bandwidth. It dynamically discards the audiovisual content that cannot be afforded by the network (e.g., due to bandwidth variations) at a content cost that outperforms the popular linear frame-skipping method. The algorithm estimates a) the amount of information that can currently be delivered by the network and b) the most representative of the frames that would normally be transmitted. All calculations are performed directly in the MPEG compressed domain, requiring minimal decoding of the captured multimedia information. The computational complexity of the proposed scheme is also minimal, so it can be applied in almost real time. A proposed objective evaluation scheme demonstrates the validity of the approach.

I. INTRODUCTION

For efficient transmission and browsing of different types of media over heterogeneous network platforms, there is a need to tailor audiovisual information without compromising quality. An adaptable content-based algorithm, able to discard information that cannot be carried over networks of low and variable bandwidth at a minimum content cost, could bring significant benefits. Adaptation normally regards either the spatial or the temporal domain. In this work, the adaptation is performed in the content domain, taking the temporal variations of the content into consideration as well. Adaptation algorithms for video delivery have attracted many researchers in the past. In the MPEG-4 standardization activities [1], an adaptable encoding scheme has been proposed through the use of Fine Granularity Scalability (FGS) [2]. With the FGS scheme, video is encoded into a base layer (BL) and one or several enhancement layers (EL). The base layer provides the basic video quality, while the enhancement layers improve it. The FGS enhancement layer, in contrast to conventional scalable video coding, is hierarchical and can be truncated anywhere before transmission at the granularity of bits.

Other works apply temporal scalability, such as those in [3] and [4]. In [3], temporal scalability is achieved through efficient management of B frames, while in [4] new frames are introduced in levels based on the FGS approach and the temporal variation of the bandwidth. Other approaches discard similar content, such as the video abstraction and hierarchical summarization methods. Algorithms for shot detection can be considered the first attempts towards content-based video adaptation [5], [6]. Other, more complicated approaches are based on the extraction of multiple key-frames from a shot, able to effectively describe the shot content [7]. In [8], video is decomposed into a sequence of "sub-shots" and a motion intensity index is computed for each of them. All indices are then quantized into predefined bins, each bin is assigned a different sampling rate, and key-frames are sampled from each sub-shot based on the assigned rate. However, video summarization or abstraction schemes are not appropriate for interactive video delivery over networks of varying bandwidth [9]. This is due to the fact that the goal of these algorithms is to extract a small "video abstract" by discarding visual information. Hierarchical summarization approaches structure multimedia content non-linearly to allow delivery of audiovisual information at different content resolution levels [10]. In the work of [11], the authors organize video in a content hierarchy permitting access to any content type in a non-linear way. Linear hierarchical summaries have been reported in the earlier works of [9] and [11] using wavelet filter banks. The main drawback of all the above-mentioned methods, however, is that content organization is performed in a static way, preventing content adaptation in response to bandwidth variations. This means that the amount of delivered audiovisual content cannot be adjusted to the network channel characteristics or the terminal device requirements.


Bandwidth adaptability has been reported in many works, such as [12], [13], [14]. In particular, in [12], when the bandwidth is lower than the required one, the first video frame is delivered while the remaining ones are skipped until the bandwidth requirements are satisfied. Such an approach, however, performs only linear adaptation, since content information is not taken into account. Another policy, considering the variation of motion activity between the sequence where a frame is transcoded and the sequence where that frame is skipped, has been proposed in [13]. A dynamic method for frame skipping has been proposed in [14]. However, despite their dynamic nature, these methods do not exploit content information. In this paper, we introduce a new and efficient content-based video adaptation scheme which adaptively discards the audiovisual content that cannot be afforded by the network (e.g., due to bandwidth variations) at a minimum content cost. In particular, the algorithm estimates the amount of information a) whose delivery can be afforded by the network at a given time instance and b) that best represents the content of a given segment. The proposed scheme exploits the MPEG compressed-domain properties of a video file to allow for almost real-time video content adaptation. In addition, an objective criterion is introduced to measure the performance of the proposed almost real-time video adaptation algorithm and to compare it with other schemes.

II. CONTENT REPRESENTATION

The purpose of content representation is to transform the pixel-based visual description into a feature-based one, which better describes the visual content. In the proposed scheme, the MPEG encoding properties are exploited, since MPEG is the most commonly used standard for video compression and minimal processing is required to obtain the features. Thus, the proposed scheme can be implemented in an almost real-time framework. We use the histogram of the color information, estimated over 8x8 blocks of the Y, Cr and Cb components, and the histogram of the motion vectors, estimated over 16x16-pixel macroblocks. The color histogram is constructed from the DC coefficient of the DCT over each 8x8 block. However, the DC coefficient of the DCT is only available in I frames and not in P and B ones; for P and B frames, we therefore exploit their motion information together with the known DC coefficients of the reference I frame [6]. Color information is independently quantized into a preset number of bins. The vector h_c, which represents the color histogram, is decomposed into the vectors h_Y, h_Cr and h_Cb, which describe the Y, Cr and Cb components of the color information:

h_c = [(h_Y)^T (h_Cr)^T (h_Cb)^T]^T    (1)

where T denotes the transpose operator.
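As a minimal sketch of Eq. (1), the following Python function quantizes the per-block DC values of each color component into a preset number of bins and concatenates the normalized histograms. The bin count and the assumption that the DC values have been mapped back to the 0–255 intensity range are illustrative choices, not taken from the paper.

```python
import numpy as np

def color_histogram(dc_y, dc_cr, dc_cb, bins=16):
    """Build h_c = [(h_Y)^T (h_Cr)^T (h_Cb)^T]^T from the DC
    coefficients of the Y, Cr and Cb components (one value per
    8x8 block), each quantized into `bins` bins."""
    def hist(dc):
        h, _ = np.histogram(dc, bins=bins, range=(0, 255))
        return h / max(h.sum(), 1)  # normalize to a distribution
    return np.concatenate([hist(dc_y), hist(dc_cr), hist(dc_cb)])
```

Each component histogram sums to one, so h_c has length 3 × bins and total mass 3.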

The motion vectors of the MPEG sequence are also quantized into a preset number of bins. Motion vectors, however, are only available in P and B frames and not in I frames. For the I frames, the same vectors as those estimated in the immediately preceding P or B frame are used. This assumption is based on the fact that almost the same motion activity is encountered within successive frames, except at a shot change. If h_m is the motion-vector histogram, consisting of the vectors h_mx and h_my, which refer to the x and y velocity components of the motion, then h_m is given by

h_m = [(h_mx)^T (h_my)^T]^T    (2)
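Eq. (2) can be sketched analogously: the x and y motion-vector components are quantized separately and the two normalized histograms are concatenated. The bin count and the ±16-pixel displacement range are assumptions for illustration.

```python
import numpy as np

def motion_histogram(mv_x, mv_y, bins=8, max_disp=16.0):
    """Build h_m = [(h_mx)^T (h_my)^T]^T by quantizing the x and y
    motion-vector components into `bins` bins each."""
    def hist(v):
        h, _ = np.histogram(v, bins=bins, range=(-max_disp, max_disp))
        return h / max(h.sum(), 1)  # normalize to a distribution
    return np.concatenate([hist(mv_x), hist(mv_y)])
```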

The total feature vector f is given by:

f = [(h_c)^T (h_m)^T]^T    (3)
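In code, Eq. (3) is a plain concatenation of the two histograms into one feature vector per frame:

```python
import numpy as np

def feature_vector(h_c, h_m):
    """Total feature vector f = [(h_c)^T (h_m)^T]^T (Eq. 3)."""
    return np.concatenate([np.asarray(h_c), np.asarray(h_m)])
```

For example, `feature_vector([1, 0], [0, 1])` yields the vector `[1, 0, 0, 1]`.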

III. ALMOST REAL TIME CONTENT ADAPTATION

In this section, we describe the concept behind the algorithm for selectively transmitting frames over a network of low and time-variable bandwidth (see Table 1). We assume that at time t=n a new video frame should be transmitted through the variable network, and that at this time the available bandwidth, say B(n), is less than the minimum required for transmitting a frame in one frame period, say B0. In case the current bandwidth is greater than the required one, all multimedia information can be delivered. The ratio L(n) = B0/B(n) indicates by how many times the information should be reduced in order to be transmitted through the network. Consequently, the following K

K = L(n), if L(n) is an integer; ⌊L(n)⌋ + 1, otherwise    (4)

frames (i.e., n+1, n+2, …, n+K-1) should be skipped (as done, e.g., in [12]) so as to satisfy the current bandwidth requirements. Here ⌊·⌋ denotes the integer-part operator. However, such an approach does not exploit the visual content as described by the features of the previous section: the skipped frames may contain a significant amount of information that is lost. A better solution is proposed here that performs content-based sampling. In this way, among the current frame and the candidate frames for skipping (i.e., n, n+1, n+2, …, n+K-1), the most representative one is selected for delivery (as described in Section IV), whereas the remaining frames are discarded. Without loss of generality, the most representative frame is the one with index t=n+J, where J is an integer with 0 ≤ J ≤ K-1.
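The two steps above can be sketched as follows. The first function implements Eq. (4) directly; the second is a hypothetical stand-in for the selection criterion of Section IV (which is not reproduced here), picking the frame whose feature vector lies closest to the centroid of the K candidates as one plausible notion of "most representative".

```python
import math
import numpy as np

def frames_to_cover(b0, b_n):
    """Eq. 4: number K of frame slots that one transmitted frame
    must cover when the available bandwidth B(n) is below the
    required bandwidth B0."""
    ratio = b0 / b_n  # L(n) = B0 / B(n)
    return int(ratio) if float(ratio).is_integer() else math.floor(ratio) + 1

def most_representative(features):
    """Hypothetical selection rule (the paper's actual criterion is
    given in Section IV): return the index J of the candidate frame
    whose feature vector is closest to the centroid of all K
    candidate feature vectors."""
    feats = np.asarray(features, dtype=float)
    centroid = feats.mean(axis=0)
    dists = np.linalg.norm(feats - centroid, axis=1)
    return int(np.argmin(dists))  # J in [0, K-1]
```

For instance, with B0 = 5 and B(n) = 2 we get L(n) = 2.5, so K = ⌊2.5⌋ + 1 = 3 frame slots.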
