Enabling VCR Functionalities in Streaming Media (Intelligent Interactive Streaming I2Stream)

Alexis M. Tourapis, Member, IEEE, Feng Wu, Member, IEEE, and Shipeng Li, Member, IEEE

Abstract: In this paper we present new architectures and methods that could be used to enable Video Cassette Recorder (VCR) functionalities in digital streaming media. Our methods extend to both scalable and non-scalable media, and could considerably enhance the user's experience while at the same time reducing the decoding complexity and the network bandwidth required for these applications.

Index Terms: Digital video, video coding, digital video cassette recording (VCR), streaming video, interactive video, I2Stream, scalable video

A. M. Tourapis was with Microsoft Research Asia, Beijing 100080, China. He is now with Thomson - Corporate Research, 2 Independence Way, Princeton, NJ 08540 USA (phone: 609-987-7329; fax: 609-987-7299; e-mail: [email protected]). F. Wu is with Microsoft Research Asia, Beijing 100080, China (e-mail: [email protected]). S. Li is with Microsoft Research Asia, Beijing 100080, China (e-mail: [email protected]).

I. INTRODUCTION

Multimedia streaming applications have in recent years become a very important part of our lives, mainly due to the emergence of new, highly efficient video coding standards (MPEG-4 [1] and H.264 [2]), architectures (Microsoft's Windows Media™ [3] and RealNetworks' Helix™ platform [4]-[5]), and digital set-top box devices such as ReplayTV [6] and TiVo [7]. These technologies are set to displace older analog devices and systems such as analog TV and Video Cassette Recorders (VCR). Enabling interactive and VCR-like functionalities, such as fast-forward (FW), rewind (RW), and segment repetition, within these digital systems nevertheless remains a difficult task given the nature of coded video. For analog VCR devices, by contrast, the only constraint was random location/segment seeking, mainly due to the use of tape storage. In analog video each picture is completely independent and can be viewed easily at any time, whereas digital video (DV) is constrained by a large number of interdependencies between different sections, i.e. pictures or Groups of Pictures (GOP). More specifically, a GOP may consist of Intraframe (I), Interframe (P), and Bi-directional (B) pictures. Of these, I pictures are encoded completely independently of other pictures, while P and B pictures use motion compensation and are predicted from previously encoded pictures to achieve much higher coding efficiency. The decoding dependency of P and B pictures immediately implies that, in order to decode and display a specific segment or picture within a video stream, several other pictures might first have to be decoded. This requirement can be very expensive both for the decoder and, in the case of streaming applications, for the network, since more data than usual must be decoded or transmitted at a given instant. If the capabilities of the decoding device (e.g. PocketPC) or the network (e.g. Internet) are limited, such functionalities become very difficult, if not impossible, to achieve within the existing architectures.

To partly solve this problem, most existing systems consider only I pictures when a VCR functionality is requested. To achieve a specific FW or RW ratio, only the I pictures that most closely satisfy the given ratio relative to the current position are transmitted and decoded. Evidently, the functionality of this method is highly limited by the GOP structure. Moreover, since I pictures usually require a considerably higher bitrate than other picture types, this solution is unlikely to be adequate in terms of network bandwidth. Recent standards also tend to encode I pictures only when necessary (e.g. on scene changes or after relatively long intervals), since higher coding performance can be achieved with P and B pictures. In [8], Chen et al. proposed enhancing the RW and random access functions of a video client by converting already received pictures into I pictures and storing them on a local hard disk. This would enable the client to access and decode any picture within the video at any time, thus reducing its computational cost.
Unfortunately, this kind of buffering would also require considerable storage space on the decoder, and it might not be desirable for a client to be able to store the video after it was received (e.g. for copyright reasons). Other methods, instead of using I pictures, skipped entire segments/GOPs [9] in order to achieve certain VCR functionalities, but their quality and performance were rather limited. In [10] an alternative method was proposed in which an additional version of the same video is encoded for FW retrieval at a higher frame rate. During normal playback the original video is transmitted and decoded, while if FW mode is requested, data from the FW video are first GOP synchronized (using key I pictures) and then transmitted. In [11] an extension was proposed in which multiple FW and RW videos were used, enabling support for multiple FW and RW ratios. This method also suggested sub-sampling the FW and RW videos, thus reducing storage and transmission cost. Nevertheless, these methods were all restricted by the GOP synchronization process, and none of them appeared to consider network bandwidth variations.

New technologies have recently been proposed that can partly solve some of these issues. More specifically, in [12]-[14] the concepts of Switching P (SP) and Switching I (SI) pictures were introduced, which allow the use of a more flexible GOP structure within a stream and thus higher coding efficiency compared to fixed GOP structures. VCR functionalities could be achieved through the transmission of SI pictures, which allow easy access at any point of the stream without significantly affecting coding performance. In [15]-[16] a different method was proposed that employs two streams coded in opposite directions with specific GOP structures. It not only enables RW functionality but also considerably reduces decoder complexity and network bandwidth through efficient switching between the two streams. Due to its nature, we will call this the Bwd/Fwd scheme throughout this paper.

In this paper a new architecture is presented that can further reduce the associated costs in decoder complexity and network bandwidth, and essentially enhance VCR functionalities within DV applications. Similar to the Bwd/Fwd scheme, this is achieved by introducing an additional secondary stream, not necessarily of opposite direction, named the Intelligent Interactive Stream (I2Stream). In our architecture the original stream does not need to comply with any specific GOP structure, and can thus be coded with the best coding performance in mind, exploiting all possible interdependencies between video segments. The I2Stream, on the other hand, is coded by exploiting considerably fewer interdependencies, thus achieving the required functionalities at lower decoding and bandwidth cost. Furthermore, in our architecture both the original stream and the I2Stream can be used for normal playback, which can be quite useful for other applications, such as error recovery.

In Section II we first present in more detail the concept of SI pictures and how the Bwd/Fwd scheme can improve VCR functionalities. In Section III the I2Stream architecture is presented, and finally an extension of the I2Stream to scalable streams is discussed in Section IV.

II. SWITCHING PICTURES AND BI-DIRECTIONAL STREAM COMBINING

A. Switching Intra Pictures

Switching Intra (SI) pictures were introduced in [13] for enabling random access, splicing, and error resiliency/recovery within a sequence. Such pictures are very similar in structure to I pictures, i.e. they apply a spatial transform and quantization, with [2] or without [1] spatial prediction of a block from its neighboring pixels, whereas the blocks are reconstructed as in SP pictures [12],[14], i.e. by applying the transform and quantization steps to the predicted block obtained from the intra prediction. In order to support VCR functionalities with such a method, a secondary switching stream consisting only of SI pictures, as can be seen in Figure 1, is required. If for every picture within the original stream there exists a corresponding SI picture that is identical in reconstruction, any picture can be accessed at any time by transmitting only the SI picture information, without having to transmit any further information about other pictures within the sequence. Fewer SI pictures could also be used, i.e. at given intervals, while other pictures (i.e. P2, P4, P6, and P8 in Figure 1) could be accessed by transmitting and decoding, apart from their own encoded information, their corresponding SI picture.

[Figure 1 diagram. Original bitstream: I1, P2, SP3, P4, SP5, P6, SP7, P8, I9, SP10. Switching bitstream: SI3, SI5, SI7, SI10.]

Figure 1: Switching through SI pictures.

Unfortunately SI pictures, although they considerably reduce decoding complexity, are similar in size to, if not slightly larger than, I pictures, and would therefore require considerable network bandwidth for transmission that may not be available. An additional problem is that, to enable SI reconstruction, the original bitstream needs to be coded under some restrictions (i.e. the switching points should be coded with the less efficient SP pictures instead of P pictures), which implies a relatively small loss in coding efficiency for the original bitstream. SI pictures themselves also require significant file storage, which might not be desirable.
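As a rough illustration of the access cost of the SI picture approach, the following Python sketch counts how many pictures must be transmitted and decoded to display a given picture when decoding may start only at certain entry points. The picture layout follows Figure 1. The function name `pictures_to_access` and the assumption that every P or SP picture depends only on the picture immediately before it are ours, and picture sizes are ignored.

```python
from bisect import bisect_right

def pictures_to_access(target, entry_points):
    """Pictures to send and decode to display `target`, assuming decoding
    can start at any picture in `entry_points` (I or SI pictures) and each
    later picture depends only on its immediate predecessor."""
    entries = sorted(entry_points)
    idx = bisect_right(entries, target)
    if idx == 0:
        raise ValueError("no entry point at or before the target picture")
    return target - entries[idx - 1] + 1

# Layout of Figure 1, pictures numbered 1..10.
intra_only = [1, 9]                # I1 and I9 of the original bitstream
with_si = [1, 3, 5, 7, 9, 10]      # the same plus SI3, SI5, SI7 and SI10

for n in (4, 7, 8):
    print(n, pictures_to_access(n, intra_only), pictures_to_access(n, with_si))
```

Under these assumptions, reaching P8 takes eight pictures with the I pictures alone and two (SI7 and P8) with the switching stream. The count says nothing about bandwidth, since an SI picture is about as large as an I picture.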

[Figure 2 plot. Vertical axis: average bitrate (Mbps), 4 to 22. Horizontal axis: speed-up factor, 2 to 12. Curves: Single, Bwd/Fwd.]

Figure 2: Average bitrate for transmitting a 3 Mbps sequence (GOP=14) over the network using a single forward stream versus the Bwd/Fwd scheme with respect to different speed-up factors for Fast Forward mode

B. Bidirectional (Bwd/Fwd) Stream Combining

An alternative method using an additional backward coded stream was introduced in [15]-[16]. In this scheme the I pictures of the two streams are placed alternately at equal intervals, enabling easy random access at key intervals (Figure 3a). Furthermore, by introducing a least cost method based on the distance of each requested picture from the two corresponding I pictures of each bitstream, this scheme considerably reduces bandwidth wastage versus a single bitstream approach (Figure 2). The method also enables transparent RW support of equal complexity. A drift compensated approach was also introduced (Figure 3b), which avoids any undesirable errors from the bitstream switching process by also transmitting an additional residual picture that allows perfect switching between the two streams, essentially similar to SP and SI pictures. Even though such a method is quite attractive, it can be observed from Figure 2 that further improvement is necessary in terms of both bandwidth and decoding complexity. In particular, transmitting a 3 Mbps bitstream with a speed-up factor of more than 4 requires approximately 3 times the bandwidth to be available. One other, relatively minor, drawback of this method is that the entire sequence needs to be already available before the backward stream can be produced. A fixed GOP size is usually also necessary for both backward and forward streams, which also affects coding efficiency.

[Figure 3 diagram. (a) Forward bitstream and backward bitstream of I and P pictures over pictures 1 to 10. (b) The same two bitstreams with the FB and BF drift compensation streams of residual pictures R_FB and R_BF between them.]

Figure 3: Bwd/Fwd Stream without (a) and with (b) Drift Compensation

III. INTELLIGENT INTERACTIVE STREAMING (I2STREAM)

In this paper we introduce a new scheme that, similar to the Bwd/Fwd scheme, also requires an additional stream, named the I2Stream. The I2Stream, due to its format, can solve most of the problems discussed previously and reduce the decoding complexity and associated network bandwidth of VCR applications even further. Unlike the Bwd/Fwd and SI picture schemes, the I2Stream relies on a new GOP structure in which not all P pictures can be used as references for other, future or past, pictures; such pictures can even be discarded immediately after decoding (Figure 4a). This concept of disposable P pictures (or Pd pictures) implies some coding efficiency loss for the I2Stream (around 30-40% if 3 disposable pictures are used) compared to the original stream, which remains unaffected, but it gives us increased flexibility in terms of VCR capabilities. In particular, if there is a request for a specific VCR function, the server or decoder can switch, if needed, to the I2Stream, and it does not need to transmit or decode any of the disposable


pictures if they are not needed for display. This can considerably decrease the network bandwidth and decoder complexity requirements. Furthermore, similar to the Bwd/Fwd scheme, data from the original stream could also be used for accessing specific pictures to improve performance, which could be decided based on a given cost measure (e.g. number of decoded pictures and/or bitrate). This process requires that the positions of each picture within the streams are accessible, e.g. through a metadata file, which also allows us to calculate whether switching is necessary. Drift compensation could also be used to remove any error propagation due to switching. It could be performed from the I2Stream to the original but also vice versa (Figure 4b). This could also be partly avoided by using a fixed GOP structure and allowing switching only at key intervals, at the expense of some loss in coding efficiency of the original. In general, though, the two streams can have relatively distinct GOP structures and can be optimized separately from each other. It is of course also possible to use the same alternate I picture and bidirectional coding structure of the Bwd/Fwd scheme to improve random access and the RW function (Figure 4b).

B pictures (Figure 5) could also be used within the I2Stream, and could either completely replace Pd pictures or be combined with them within the stream. In this case we may allow or disallow B pictures to refer to Pd pictures, depending on whether we prefer higher coding efficiency within the I2Stream or improved VCR functionality performance. In general, though, although B pictures can enhance coding efficiency, they also tend to increase the dependencies between pictures, and are thus likely to limit the VCR capabilities.

[Figure 4 diagram. (a) Original bitstream of I and P pictures, a drift compensation stream of residual pictures R, and the I2Stream bitstream of I, P, and Pd pictures. (b) The same arrangement with drift compensation streams in both directions and a backward coded I2Stream.]

Figure 4: The I2Stream scheme with (a) single and (b) dual compensation switching. The I2Stream can also use backward coding (b) to enhance RW.

Considering that when a VCR functionality such as FW is used, a user expects to browse through a relatively large number of pictures without caring as much about their actual quality as when viewing the original bitstream, higher efficiency can be achieved by coding all or certain pictures (e.g. I pictures) within the I2Stream at lower quality and/or resolution than the original stream. This would considerably improve the compression of the I2Stream, thus reducing the network bandwidth required. This also implies that the drift compensation data needed when switching back to the original stream are likely to be considerably larger, but since these are transmitted only when the VCR function is terminated, they would not significantly affect the performance of our scheme.
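The cost-based choice between the original stream and the I2Stream described in this section could be sketched as follows. The metadata layout (`PictureInfo` with a size and a tuple of reference indices), the function names, and all picture sizes are hypothetical, and drift compensation data are left out of the cost.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PictureInfo:
    size_bytes: int
    refs: tuple = ()      # indices of the pictures this one is predicted from

def pictures_needed(meta, target):
    """Set of picture indices that must be decoded to display `target`."""
    needed, stack = set(), [target]
    while stack:
        n = stack.pop()
        if n not in needed:
            needed.add(n)
            stack.extend(meta[n].refs)
    return needed

def access_cost(meta, target):
    """Cost of displaying `target` as (number of pictures, total bytes)."""
    needed = pictures_needed(meta, target)
    return len(needed), sum(meta[n].size_bytes for n in needed)

def choose_stream(streams, target, measure="bytes"):
    """Name of the stream with the lowest cost under the chosen measure."""
    key = 1 if measure == "bytes" else 0
    return min(streams, key=lambda name: access_cost(streams[name], target)[key])

# Hypothetical metadata for one 8-picture GOP (indices 0..7).
original = [PictureInfo(10000)] + [PictureInfo(3000, (n - 1,)) for n in range(1, 8)]
i2stream = (
    [PictureInfo(12000)]                             # I picture
    + [PictureInfo(4000, (0,)) for _ in range(3)]    # Pd pictures 1-3
    + [PictureInfo(5000, (0,))]                      # reference P picture 4
    + [PictureInfo(4000, (4,)) for _ in range(3)]    # Pd pictures 5-7
)
streams = {"original": original, "i2stream": i2stream}

print(choose_stream(streams, 6), choose_stream(streams, 1))
```

With these made-up sizes, picture 6 is cheaper to reach through the I2Stream (three pictures instead of seven), while picture 1 is cheaper in the original stream. A real metadata file would also need the byte positions of each picture, which the sketch omits.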

[Figure 5 diagram. Original bitstream of I and P pictures, a drift compensation stream of residual pictures R, and an I2Stream bitstream with the pattern P, B, Pr, B, I, B, Pr, B, P, B over pictures 1 to 10.]

Figure 5: I2Stream with B pictures

An estimate of the average number of pictures that need to be transmitted and decoded to access one picture, compared with a single stream and with the Bwd/Fwd scheme, can be seen in Figure 6. We observe that for lower speed-up ratios (up to 6 times) considerably fewer pictures (10-40%) are necessary.
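One simple way to reason about such an estimate is to count, for each picture displayed in FW mode, how many pictures must be decoded under each scheme. The sketch below does this under assumptions that are ours and not taken from the paper: pictures are indexed from 0, I pictures of the forward stream and of the I2Stream sit at multiples of `gop`, backward-stream I pictures are offset by half a GOP, every `period`-th I2Stream picture is a reference and the rest are disposable, the decoder keeps the last decoded reference, and drift compensation pictures are ignored. It is not expected to reproduce the values of Figure 6.

```python
def single_cost(prev, target, gop):
    """Forward I/P stream: continue from `prev` or restart at the GOP's I picture."""
    from_intra = target % gop + 1
    return min(target - prev, from_intra)

def bwd_fwd_cost(prev, target, gop):
    """Bwd/Fwd: also allow decoding backwards from the next backward-stream I picture."""
    half = gop // 2
    k = max(0, -((half - target) // gop))    # ceil((target - half) / gop), at least 0
    next_backward_intra = half + k * gop
    return min(single_cost(prev, target, gop), next_backward_intra - target + 1)

def i2stream_cost(prev, target, gop, period):
    """I2Stream: only reference pictures form the decoding chain."""
    gop_start = target - target % gop
    offset = target - gop_start
    is_disposable = 1 if offset % period else 0
    target_ref = target - offset % period    # last reference at or before target
    if prev >= gop_start:                    # same GOP: reuse the last decoded reference
        prev_ref = gop_start + (prev - gop_start) // period * period
        return (target_ref - prev_ref) // period + is_disposable
    return offset // period + 1 + is_disposable   # new GOP: restart at its I picture

def average_cost(cost, speed_up, num_pictures, **params):
    """Average pictures decoded per displayed picture for one FW run."""
    targets = range(speed_up, num_pictures, speed_up)
    return sum(cost(t - speed_up, t, **params) for t in targets) / len(targets)

for s in (2, 4, 6):
    print(s,
          average_cost(single_cost, s, 121, gop=12),
          average_cost(bwd_fwd_cost, s, 121, gop=12),
          average_cost(i2stream_cost, s, 121, gop=12, period=4))
```

For `gop=12`, `period=4` and a speed-up factor of 4, this model gives averages of 3.0, about 2.67 and 1.0 decoded pictures per displayed picture for the single, Bwd/Fwd and I2Stream cases respectively. The gap narrows or widens with other GOP sizes and reference periods, which is why the model should be read as an illustration rather than a prediction.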

[Figure 6 plot. Vertical axis: average number of decoded pictures, 1 to 7. Horizontal axis: speed-up factor, 2 to 12. Curves: Single, Bwd/Fwd, Proposed.]

Figure 6: Average number of decoded pictures to display one picture for a given speed-up factor.

As we have previously discussed, drift compensation can be used to remove any error propagation due to the switching process. Nevertheless, this process can also have an impact on the quality of the original stream due to the restrictions it imposes on the encoding process (i.e. using SP pictures instead of P). To avoid this, we may restrict switching to the original stream to fixed key positions only. The I2Stream can itself be used for normal playback, with switching performed only when the switching position is reached. It is also possible, under certain conditions, to allow switching to the original at a given point without caring about error propagation and without having to transmit any drift compensation data. This could be quite beneficial and would not impact visual quality significantly, especially if it is known that the error will be compensated soon enough (e.g. by an I picture). This information could easily be included in the metadata file, and the decision can be made on the server or on the decoding device. Furthermore, drift compensation does not always need to be perfect, and for some applications a small amount of drift can be tolerated. Thus, in general, the switching pictures need not always comply with the restrictions imposed by the switching process.

An additional benefit of our I2Stream architecture is that the I2Stream may be used for other applications that require temporal or spatial scalability. The I2Stream allows temporal scalability through the disposable Pd and B pictures, while spatial scalability can be achieved if all I2Stream pictures are coded at a lower resolution; the I2Stream could then be used for playback of a relatively coarse version of the video at a considerably lower bitrate than the original. Apart from enabling scalable streaming applications without more complicated designs such as Fine Granularity Scalability (FGS) coding [17]-[20], this design could also be useful for several other applications, including Picture in Picture (PnP), video browsing, and video searching. For example, if PnP is requested by a user, two completely different video sequences could be transmitted at the same time, one for normal playback using the original stream and a second one for PnP playback. This second stream can be transmitted at a lower resolution and lower frame rate using the I2Stream. The user can at any time select which sequence is used for normal and which for PnP playback, since only a relatively simple switching process between the original stream and the corresponding I2Stream of each sequence is required.

[Figure 7 diagram. Streaming enhancement layer and streaming base layer above the I2Stream enhancement layer, I2Stream switching layer, and I2Stream base layer (I, Pr, Pr, P, Pr, Pr, I over pictures 1 to 7). The diagram also contains seven pictures labelled SF.]

Figure 7: Usage of the I2Stream in scalable streams
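The idea from Section III that drift compensation data may be skipped when an I picture of the original stream follows soon after the switching point could be expressed as a small decision rule. This is only a sketch: the function names and the threshold `max_drift_pictures` are hypothetical, and the list of I picture positions is assumed to come from the metadata file.

```python
from bisect import bisect_left

def pictures_until_refresh(switch_pos, intra_positions):
    """Pictures shown with possible drift before the next I picture of the
    original stream; None when no I picture follows. `intra_positions`
    must be sorted in increasing order."""
    idx = bisect_left(intra_positions, switch_pos)
    if idx == len(intra_positions):
        return None
    return intra_positions[idx] - switch_pos

def send_drift_compensation(switch_pos, intra_positions, max_drift_pictures):
    """True when drift compensation data should accompany the switch back."""
    remaining = pictures_until_refresh(switch_pos, sorted(intra_positions))
    return remaining is None or remaining > max_drift_pictures

intra_positions = [0, 30, 60]     # hypothetical I picture positions in the original stream
print(send_drift_compensation(28, intra_positions, max_drift_pictures=3))
print(send_drift_compensation(10, intra_positions, max_drift_pictures=3))
```

Switching at picture 28 leaves only two pictures before the I picture at 30, so the rule skips the compensation data, while switching at picture 10 sends it. How many drifting pictures are acceptable depends on the application and is not specified in the text.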

IV. I2STREAM IN SCALABLE STREAMS

Even though we have already claimed that the I2Stream can provide some scalability, this capability is relatively limited. On the other hand, all of the above concepts can also be applied to scalable video architectures [17]-[20]. In this case the I2Stream can also be a scalable stream that consists of a base layer (BL), a switching layer (SL), and an enhancement layer (EL) (Figure 7). Structures using backward coding (Figure 8a) and B pictures (Figure 8b) could also be used. The BL essentially provides the same functionality as discussed in the previous sections. If a VCR function is chosen then, if necessary, the I2Stream BL is transmitted instead. If it is necessary to switch back to the original stream, the SL can be transmitted to avoid error propagation in the original. The SL can be coded either as a single error image or with FGS methods. For example, in cases where some drift is allowed, the switch could take place by sending only part of the switching layer, or even by omitting it completely. To satisfy the scalability requirements during playback, switching is always performed between the two base layers, as can be seen in Figure 7. Nevertheless, the I2Stream EL, which is also FGS coded, can also be used to predict the original's enhancement layer, so that performance is not limited. Furthermore, this layer could be used as a mechanism for providing enhanced quality for any given picture, especially if Progressive FGS (PFGS) [19],[20] is used on the original. In PFGS coding the ELs are not completely independent, which implies that considerable drift would occur if they were used independently to enhance the quality of a single picture for display. Finally, as in the non-scalable case, the I2Stream can also be used for error resiliency/recovery purposes.

V. CONCLUSION

In this paper we have introduced a new architecture, called I2Stream, which can enable VCR and interactive capabilities, such as Fast Forward, Rewind, Random Access, and Selective Picture Quality Enhancement, within multimedia content, while minimizing the required network bandwidth and decoder complexity. Our method can be applied to both scalable and non-scalable video content and architectures, and it can also find use in error resiliency and recovery applications.

REFERENCES

[1] ISO/IEC Standard 14496-2:2001, Information technology – Coding of audio-visual objects – Part 2: Visual.
[2] Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, "Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC) – Joint Committee Draft," document JVT-D015d5, Jul. 2002.
[3] Microsoft Windows Media™, Microsoft Corporation Inc., http://www.microsoft.com/windows/windowsmedia/
[4] RealNetworks Helix™ platform, RealNetworks Inc., http://www.realnetworks.com/solutions/leadership/helix.html
[5] The Helix Community, https://www.helixcommunity.org/
[6] SonicBlue Inc., ReplayTV, http://www.replay.com/
[7] TiVo Inc., http://www.tivo.com/
[8] M.S. Chen and D.D. Kandlur, "Downloading and stream conversion: supporting interactive playout of videos in a client station," in Proc. of 2nd Int. Conf. on Multimedia Computing and Systems, pp. 73-80, May 1995.
[9] T.G. Kwon and S. Lee, "PRR: prime round-robin placement for implementing VCR operations," in Proc. of 1995 IEEE Int. Conf. on Systems, Man and Cybernetics, 'Intelligent Systems for the 21st Century', vol. 5, pp. 3920-3925, 22-25 Oct. 1995.
[10] S. Berson, S. Ghandeharizadeh, R.R. Muntz, and X. Ju, "Staggered Striping in Multimedia Information Systems," SIGMOD Conference 1994, pp. 79-90.
[11] D.B. Andersen, "A proposed method for creating VCR functions using MPEG streams," in Proc. of the Twelfth International Conference on Data Engineering, pp. 380-382, 26 Feb.-1 Mar. 1996.
[12] Ragip Kurceren and Marta Karczewicz, "Improved SP-frame Encoding," document VCEG-M73, ITU-T Video Coding Experts Group Meeting, Austin, TX, 02-04 April 2001.
[13] Ragip Kurceren and Marta Karczewicz, "New Macroblock Modes for SP-frames," document VCEG-O47, ITU-T Video Coding Experts Group Meeting, Pattaya, Dec. 2001.
[14] Xiaoyan Sun, Feng Wu, Shipeng Li, and Ragip Kurceren, "The improved JVT-B097 SP coding scheme," document JVT-C114, JVT Meeting, Fairfax, May 2002.
[15] Chia-Wen Lin, Jian Zhou, Jeongnam Youn, and Ming-Ting Sun, "MPEG video streaming with VCR functionality," IEEE Trans. Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 415-425, Mar. 2001.
[16] Chia-Wen Lin, Jeongnam Youn, Jian Zhou, Ming-Ting Sun, and Iraj Sodagar, "MPEG video streaming with VCR functionality," in Proc. IEEE Int. Symp. Multimedia Software Eng., pp. 146-153, Dec. 2000, Taipei, Taiwan.
[17] W. Li, "Fine granularity scalability in MPEG-4 for streaming video," in 2000 Proc. IEEE Int. Symp. on Circuits and Systems, ISCAS 2000, vol. 1, pp. 299-302, Switzerland, May 2000.
[18] W. Li, "Overview of Fine Granularity Scalability in MPEG-4 Video Standard," IEEE Trans. Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 301-317, Mar. 2001.
[19] F. Wu, S. Li, and Y.-Q. Zhang, "A framework for efficient progressive fine granularity scalable video coding," IEEE Trans. Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 332-344, Mar. 2001.
[20] Xiaoyan Sun, Feng Wu, Shipeng Li, and Wen Gao, "The framework for seamless switching of scalable bitstreams," ISO/IEC JTC1/SC29/WG11, Doc. No. W8214, Jeju Island, March 2002.

[Figure 8 diagram. Two panels, each showing the streaming enhancement layer and streaming base layer above the I2Stream enhancement, switching, and base layers over pictures 1 to 7. I2Stream base layer in (a): I, Pr, Pr, P, Pr, Pr, I. I2Stream base layer in (b): I, B, B, P, B, B, I.]

Figure 8: I2Streams with (a) backward coded GOP and (b) B pictures
