COMPRESSED DOMAIN MPEG-2 VIDEO EDITING WITH VBV REQUIREMENT

Ren Egawa
Integrated Circuit Systems
Sarnoff Corporation
Princeton, NJ 08540

A. Aydin Alatan and Ali N. Akansu
New Jersey Center for Multimedia Research
New Jersey Institute of Technology
Newark, NJ 07012
{alatan,ali}@oak.njit.edu

ABSTRACT
A novel method is proposed for efficient MPEG-2 video editing in the compressed domain that preserves the Video Buffer Verifier (VBV) requirements. Different cases are identified according to the VBV modes of the bitstreams to be concatenated. For each case, the minimum number of zero-stuffing bits, or the shortest waiting time between the two streams, is determined analytically so that the resulting bitstream is still VBV-compliant. Simulation results show that the proposed method is applicable to any MPEG-2 bitstream, independent of its encoder.

1. INTRODUCTION
As compression becomes the heart of digital video applications, the need to edit video in the compressed domain has increased substantially. The benefits of compressed-domain editing are abundant. The obvious ones are savings in memory, processing power, and delay, since the video does not have to be decoded and then re-encoded back into the compressed domain. The most significant benefit, however, is the preservation of picture quality, because this lossy decode/re-encode chain is avoided.

Video editing in the compressed domain poses its own technical challenges. One of the major challenges is to ensure that a bitstream formed by concatenating two independent source bitstreams still meets the Video Buffer Verifier (VBV) requirement defined by the MPEG-2 standard [1]. The VBV requirement is a normative part of the MPEG-2 standard; hence, any MPEG-2 source bitstream inherently meets it. However, there is no guarantee that simply hooking up two bitstreams will produce a new bitstream that still meets the VBV requirement. Some studies have shown that redundant data can be stuffed in an ad hoc way to solve this problem [2], [3]. In this paper, we propose a formulation that determines the minimum number of redundant, or zero-stuffing, bits. By inserting these bits (or simply by introducing a precise waiting period that corresponds to this insertion), the concatenated stream is able to meet the VBV requirement. All possible concatenation cases, classified by the VBV modes of the source bitstreams, are determined and analyzed in the sections that follow.

2. MPEG-2 VBV MODEL
The MPEG-2 standard defines a hypothetical buffer model called the Video Buffer Verifier, or simply VBV. The VBV is conceptually connected to the output of an MPEG-2 encoder and the input of an MPEG-2 decoder, as shown in Fig. 1. By monitoring the VBV status, the encoder ensures that the decoder's VBV buffer neither overflows nor underflows. VBV overflow occurs if new data are transmitted to the buffer when it is already full. VBV underflow occurs if not all the bits of a frame have arrived by that frame's decoding time (the time at which the frame's compressed bits are removed from the buffer for decoding). The MPEG-2 standard allows VBV underflow for low-delay bitstreams, that is, bitstreams that contain no B-pictures [4]. However, the standard does not allow VBV overflow under any circumstances [1]. A bitstream that meets the VBV requirement is considered VBV-compliant. Methods for generating a VBV-compliant stream can be found in [3], [4].
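As a rough illustration of the two failure conditions just defined, the following Python sketch checks a sequence of picture sizes against a VBV buffer. It assumes a deliberately simplified channel: bits enter at one constant rate from time 0 until the whole stream has been sent, and picture n is removed instantaneously at t(n) = first_decode + n/frame_rate. The function and parameter names are hypothetical and are not MPEG-2 syntax elements. Because fullness only grows between removals, checking it just before each removal is enough to catch both overflow and underflow.

```python
from typing import List


def check_constant_rate_vbv(sizes: List[int], rate: float, buffer_size: int,
                            first_decode: float, frame_rate: float) -> List[str]:
    """Report VBV violations for a simplified constant-rate channel.

    Assumptions (illustrative only): bits enter the buffer at `rate` bits/s
    from time 0 until the whole stream has been sent, and picture n is
    removed instantaneously at t(n) = first_decode + n / frame_rate.
    """
    total_bits = sum(sizes)
    removed = 0          # bits already taken out by earlier decodes
    violations = []
    for n, size in enumerate(sizes):
        t_n = first_decode + n / frame_rate
        arrived = min(rate * t_n, total_bits)  # the channel stops at end of stream
        fullness = arrived - removed           # fullness just before removal (its peak)
        if fullness > buffer_size:
            violations.append(f"overflow before picture {n}")
        if fullness < size:
            violations.append(f"underflow at picture {n}")
        removed += size
    return violations


# Hypothetical numbers: 3 Mbit/s channel, 30 pictures/s, 1 Mbit buffer.
sizes = [400_000, 100_000, 100_000]
print(check_constant_rate_vbv(sizes, 3_000_000, 1_000_000, 0.2, 30))  # []
print(check_constant_rate_vbv(sizes, 3_000_000, 1_000_000, 0.1, 30))
```

The second call starts decoding before enough bits have arrived, so it reports underflows beginning at picture 0; shrinking buffer_size to 500 000 bits with the first call's timing instead reports an overflow before picture 0.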
Figure 1: Overview of VBV in MPEG-2 Standard (blocks: Source Picture Data, Video Encoder, Compressed MPEG-2 Bitstream Data, Transmission, Video Buffer Verifier (VBV) buffer, MPEG-2 Video Decoder, De-compressed Picture Data)

During the encoding process, the encoder specifies the VBV buffer size and the bit rate, and transmits these settings to the decoder within the bitstream. Subsequently, both encoder and decoder follow a set of MPEG-2 rules that determine, synchronously, the answers to two questions:
• When should data be removed from the VBV buffer for decoding?
• How many bits should be removed from the VBV buffer?
The MPEG-2 standard defines two VBV modes, namely the Constant Bit-Rate (CBR) and Variable Bit-Rate (VBR) modes [1]. The rules for answering the questions above differ between the two modes. Fig. 2 depicts both models. As the figure shows, in CBR mode data are transferred at a fixed, predetermined rate, whereas in VBR mode data are transferred at the maximum rate until the buffer is full. As a reference to the MPEG-2 video syntax, Fig. 3 illustrates a typical bitstream.
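The VBR behavior described above, in which data enter at the maximum rate and entry simply pauses while the buffer is full, can be sketched in the same simplified setting. This is an illustrative model only, not the normative VBV procedure: the names are hypothetical, removals are instantaneous at t(n) = first_decode + n/frame_rate, and the sketch stops at the first underflow instead of modelling how a decoder would wait.

```python
from typing import List, Tuple


def simulate_vbr_vbv(sizes: List[int], r_max: float, buffer_size: int,
                     first_decode: float, frame_rate: float
                     ) -> Tuple[List[float], List[int]]:
    """Simplified VBR-mode VBV: bits enter at r_max while the buffer is not
    full; entry pauses (R = 0) while it is full.

    Returns (fullness just before each removal, indices of underflowed pictures).
    """
    total_bits = sum(sizes)
    sent = 0.0        # bits delivered to the buffer so far
    fullness = 0.0
    t_prev = 0.0
    history: List[float] = []
    underflows: List[int] = []
    for n, size in enumerate(sizes):
        t_n = first_decode + n / frame_rate
        # Deliver at r_max, limited by free space and by the bits left to send.
        # Nothing is removed before t_n, so once the buffer fills it stays
        # full until the removal; a single min() over the interval is exact.
        delivered = min(r_max * (t_n - t_prev), buffer_size - fullness,
                        total_bits - sent)
        fullness += delivered
        sent += delivered
        t_prev = t_n
        history.append(fullness)
        if fullness < size:
            underflows.append(n)
            break  # stop at the first underflow (decoder waiting is not modelled)
        fullness -= size
    return history, underflows


history, underflows = simulate_vbr_vbv([400_000, 100_000, 100_000],
                                       6_000_000, 500_000, 0.1, 30)
print(history, underflows)  # [500000.0, 200000.0, 100000.0] []
```

In this hypothetical run the buffer fills at about 0.083 s and entry pauses until the first removal at 0.1 s. Since fullness is clipped at buffer_size by construction, the model can only underflow, never overflow, which is why no overflow check appears.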
Figure 2: CBR VBV mode vs. VBR VBV mode (VBV buffer fullness in bits vs. time, with the buffer size and removal times t0 to t3 marked; CBR panel: bit-rate = R0 over T0, with vbv_delay0 measured from t_init; VBR panel: bit-rate = Rmax always)

Figure 3: Illustration of B(n), b(n), and D(n) (a bitstream of four pictures showing the positions of group_start_code, picture_start_code, and sequence_end_code relative to b0 to b3, B0 to B3, and D0 to D3)

The notations used in Fig. 2 and Fig. 3 are defined as follows:
• B(n): all bits of the nth picture_start_code (PSC) and the nth coded picture data (CPD). If they exist, a group_start_code (GSC) and/or sequence_start_code (SSC) immediately preceding the PSC, and a sequence_end_code (SEC) following the CPD, are also included in B(n).
• b(n): all the bits in B(n), excluding the CPD and the SEC.
• D(n) = B(n) − b(n) + b(n+1)
• t(n): time at which B(n) must be removed from the VBV
• T(n): period during which D(n) is entering the VBV, i.e., T(j) = vbv_delay(j) − vbv_delay(j+1) + ∆t(j), where ∆t(j) = t(j+1) − t(j)
• vbv_delay(n): a bitstream syntax element indicating the time that B(n) remains in the VBV, counted from the time the first bit of D(n) enters the VBV until time t(n)

In Fig. 2, the diagonal lines represent data entering the buffer, while the vertical lines show data removed from the buffer. In the CBR VBV model, D(n) enters the VBV at the piecewise-constant rate R(n) = D(n)/T(n), except for the last picture, n_last; D(n_last) and b(0) enter the VBV at Rmax, which is specified by bit_rate in the bitstream. In the VBR VBV model, data always enter the VBV at Rmax as long as the buffer is not full. When the buffer is full, no data enter it, i.e., R(n) = 0.

3. BITSTREAM EDITING AND CONCATENATION
Let segmentA and segmentB be two sub-streams extracted from independent MPEG-2 video bitstreams, and assume that they are to be concatenated to form a new video bitstream. Although each segment is VBV-compliant by itself, there is no guarantee that the concatenated stream will still be VBV-compliant. We propose a method that guarantees VBV compliance by matching the buffer behavior of the concatenated stream to that of segmentB. Taking both VBV modes into account, there are at least four cases. Furthermore, even within the same VBV mode, an SEC at the end of segmentA indicates that the bit rate and other parameters may change from segmentA to segmentB, so two additional cases must be added to account for the presence of an SEC. (The MPEG-2 standard does not allow such parameters to change without an SEC at the end of segmentA.) The resulting six concatenation cases are as follows:
• Case 1: segmentA (CBR) without SEC + segmentB (CBR)
• Case 2: segmentA (CBR) with SEC + segmentB (CBR)
• Case 3: segmentA (VBR) without SEC + segmentB (VBR)
• Case 4: segmentA (VBR) with SEC + segmentB (VBR)
• Case 5: segmentA (CBR) with SEC + segmentB (VBR)
• Case 6: segmentA (VBR) with SEC + segmentB (CBR)
The concatenation process for Case 4 and Case 5 is a simple connection, since the latter (VBR) segment is not affected by a mismatch with the former segment. The other cases, however, require special care. To handle them, the proposed method uses one of the two techniques below:
1. Zero Stuffing Bit (ZSB) technique (Case 1 and Case 3)
2. Wait Period (Twait) technique (Case 2 and Case 6)
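The case analysis and the choice of technique amount to a small lookup, sketched below in Python. The enum, table, and function names are hypothetical; the only rules encoded are the six cases listed above, the rule that a change between segments requires an SEC at the end of segmentA, and the case-to-technique assignment just given.

```python
from enum import Enum


class VbvMode(Enum):
    CBR = "CBR"
    VBR = "VBR"


# (mode of segmentA, SEC at end of segmentA?, mode of segmentB) -> case number
CASES = {
    (VbvMode.CBR, False, VbvMode.CBR): 1,
    (VbvMode.CBR, True, VbvMode.CBR): 2,
    (VbvMode.VBR, False, VbvMode.VBR): 3,
    (VbvMode.VBR, True, VbvMode.VBR): 4,
    (VbvMode.CBR, True, VbvMode.VBR): 5,
    (VbvMode.VBR, True, VbvMode.CBR): 6,
}


def concatenation_case(mode_a: VbvMode, sec_a: bool, mode_b: VbvMode) -> int:
    """Map the VBV modes and SEC presence to one of the six cases."""
    try:
        return CASES[(mode_a, sec_a, mode_b)]
    except KeyError:
        # Changing the VBV mode requires an SEC at the end of segmentA.
        raise ValueError("VBV mode change without an SEC at the end of segmentA") from None


def technique(case: int) -> str:
    """Return the handling technique assigned to a concatenation case."""
    if case in (4, 5):
        return "simple connection"
    if case in (1, 3):
        return "ZSB"
    if case in (2, 6):
        return "Twait"
    raise ValueError(f"unknown case {case}")


print(technique(concatenation_case(VbvMode.CBR, False, VbvMode.CBR)))  # ZSB
```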
Although the ZSB technique is easier to implement, it may cause buffer overflow during the transition from segmentA to segmentB, because data are assumed to enter at Rmax after an SEC at the concatenation point. Therefore, only Case 1 and Case 3, where no such overflow risk exists, should use this technique. The Twait technique is derived from the same principle as the ZSB technique. Although it is more difficult to implement, it does not cause buffer overflow during such a transition. Before examining both techniques in detail, we make the following definitions, which apply to the next sections:
• p: index of the last transmitted picture in segmentA
• q: index of the first transmitted picture in segmentB
• F(n): VBV buffer fullness at t(n), before B(n) is removed from the buffer

3.1. Zero Stuffing Bit (ZSB) Technique
The ZSB technique inserts a number of zero bits between the two segments. The exact number of zero-stuffing bits, Nzsb, is given by
Nzsb = Nnext − Nnew
where Nnext and Nnew depend on the cases described below.

3.1.1. Case 1: segmentA (CBR) without SEC + segmentB (CBR)
The algorithm for Nnext and Nnew can be summarized by the pseudo-code below:
    Nnew = vbv_delay(q) * R(p) + b(q)
    if (vbv_delay(p+1) * R(p) + b(p+1)) ≥ (vbv_delay(q) * R(p) + b(q)) then
        Nnext = vbv_delay(p+1) * R(p) + b(p+1)                 (1)
    else
        Nnext = (vbv_delay(p+1) + ∆t * k) * R(p) + b(p+1)       (2)
        where k is the smallest integer satisfying Nzsb > 0
Note that if the concatenation process uses a simple hook-up (essentially choosing Nzsb = 0), F(q) is still the same before and after the concatenation, so VBV compliance can still be met. However, the simple hook-up always introduces a bit-rate drop (a change in bit rate within a picture period) during the transition, before F(q) is matched to the original F(q) of segmentB. Although the MPEG-2 standard allows R(n) in CBR mode to take any value as long as it is lower than Rmax, it strongly recommends that R(n) be constant throughout the entire bitstream and that it match the value of bit_rate coded in the bitstream. If this recommendation is strictly observed, Eq. (1) must be used.

Taking the smallest integer k yields the smallest Nnext, so that the minimum Nzsb is achieved. Note that the value of k always equals the number of VBV underflow occurrences. The MPEG-2 standard allows VBV underflow only for low-delay bitstreams. Therefore, if the segments are not themselves low-delay bitstreams, one must find another pair of points p and q for which the condition of Eq. (1) holds. Alternatively, one may insert an SEC at the end of segmentA and thereby avoid this underflow restriction.

3.1.2. Case 3: segmentA (VBR) without SEC + segmentB (VBR)
The pseudo-code is as follows:
    if F(p+1) = F(q) then
        Nzsb = 0    /* no need to determine Nnew or Nnext */
    else
        Nnew = number of segmentB data bits entering the VBV from the time B(q) starts entering until time t(q)
        if F(p+1) > F(q) then
            Nnext = number of segmentA data bits entering the VBV from the time B(p+1) starts entering until time t(p+1)
        else
            Nnext = number of segmentA data bits entering the VBV from the time B(p+1) starts entering until time t(p+1+k),
                    where k is the smallest integer satisfying Nzsb > 0
For both Case 1 and Case 3, once Nzsb is determined, the zero bits are inserted between segmentA and segmentB. Together, segmentA, the zero bits, and segmentB form a new, complete bitstream.

3.2. Wait Period Technique
This technique determines a waiting period, Twait, during which no bits enter the VBV. Once Twait is determined, the wait is applied after the SEC of segmentA; after Twait has elapsed, segmentB starts entering the VBV. Similar to the ZSB technique, one can write
Twait = Tnext − Tnew
where Tnext and Tnew are explained below.

3.2.1. Case 2: segmentA (CBR) with SEC + segmentB (CBR)
Tnext = vbv_delay(p) + ∆T − T(p), where ∆T is arbitrarily defined but must satisfy Twait ≥ 0.
Tnew = vbv_delay(q) + b(q) / R2max

3.2.2. Case 6: segmentA (VBR) with SEC + segmentB (CBR)
Tnext = the time period from when B(p) finishes entering the VBV until time t(p) + ∆T
Tnew = vbv_delay(q) + B(q) / R2max

4. SIMULATION RESULTS
In this paper, the ZSB technique of Case 1 is selected to demonstrate the effect of the proposed concatenation method. Simulation results for the other cases can be found in [5]. Figs. 4 and 5 show the results with and without the ZSB technique, respectively. In both figures, the VBV buffer fullness path of the concatenated stream (CS) is shown by the solid line, while the original VBV buffer fullness paths of segmentA and segmentB are shown by the dotted lines BS1 and BS2, respectively. The VBV buffer size is 1.65 Mbits. Fig. 4 shows that CS matches BS1 until the last picture data of segmentA, B(p), is removed from the buffer (the two lines lie on top of each other, so the dotted line is not visible). After that point, CS matches BS2 from the time the first picture data of segmentB, B(q), is removed from the buffer. In short, F(q) is the same before and after the concatenation. In contrast, Fig. 5 shows that the F(q) levels of CS and BS2 do not match; hence, CS eventually overflows the VBV buffer. Note also that CS has a negative slope after D(p) finishes entering the VBV. This negative slope is caused by a negative value of T(j) resulting from a large vbv_delay(j+1). A negative slope does not reflect the real world, since a bit rate cannot be negative. Therefore, Fig. 5 depicts a situation in which the MPEG-2 VBV model cannot logically be followed. In summary, the CS path in Fig. 5 shows that the concatenated stream is not VBV-compliant.
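The Case 1 computation exercised in this experiment (Eqs. (1) and (2) of Section 3.1.1) can be sketched in Python as follows. It is an illustrative reading of the pseudo-code, not the authors' implementation: the function and argument names are hypothetical, times are assumed to be in seconds (vbv_delay values read from a bitstream are coded in units of a 90 kHz clock and would first have to be divided by 90 000), rates are in bits/s, and exact fractions are used so that the comparison with Nnew is not disturbed by rounding.

```python
from fractions import Fraction
from typing import Tuple


def case1_zsb(vbv_delay_p1: Fraction, vbv_delay_q: Fraction,
              b_p1: int, b_q: int, rate_p: int,
              delta_t: Fraction) -> Tuple[Fraction, int]:
    """Nzsb for Case 1 (CBR without SEC + CBR), following Eqs. (1) and (2).

    Returns (Nzsb, k); k = 0 means Eq. (1) applied.
    """
    if delta_t <= 0 or rate_p <= 0:
        raise ValueError("delta_t and R(p) must be positive")
    n_new = vbv_delay_q * rate_p + b_q
    n_next = vbv_delay_p1 * rate_p + b_p1        # Eq. (1), i.e. k = 0
    k = 0
    if n_next < n_new:
        # Eq. (2): increase k until Nzsb = Nnext - Nnew becomes positive.
        while n_next - n_new <= 0:
            k += 1
            n_next = (vbv_delay_p1 + delta_t * k) * rate_p + b_p1
    return n_next - n_new, k


# Hypothetical values: R(p) = 4 Mbit/s, delta_t = 1/25 s, header sizes in bits.
n_zsb, k = case1_zsb(Fraction("0.25"), Fraction("0.3"), 200, 150,
                     4_000_000, Fraction(1, 25))
print(n_zsb, k)  # 120050 2
```

In this example Eq. (1) does not hold (Nnext = 1 000 200 < Nnew = 1 200 150), so two extra ∆t steps are needed. The sketch returns an exact rational count and leaves open how that count maps onto actual stuffing in the stream.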
Figure 4: VBV Result with ZSB Technique (annotations: B(p) removed from VBV; B(q) removed from VBV; F(q))

Figure 5: VBV Result without ZSB Technique (annotations: CS's F(q); BS2's F(q); D(p) finishes entering VBV)

5. CONCLUSION
This paper proposes a method for concatenating two independent bitstreams to obtain a new one that still meets the VBV requirement of the MPEG-2 standard. Compared with other methods, the proposed method is efficient, since it determines the minimum number of redundant bits to be inserted between the two original segments. The parameters ∆t and ∆T, which are used to find Nzsb or Twait, can be defined according to the specific application. The method uses only information encoded in a typical MPEG-2 bitstream, such as bit_rate and vbv_buffer_size. Hence, it can be implemented independently of the source encoders, since it neither imposes restrictions on them nor requires additional information from them. The proposed method is therefore flexible enough to be used in any application, and it can safely concatenate any two legal MPEG-2 bitstreams.

6. REFERENCES
[1] ITU-T Rec. H.262 | ISO/IEC 13818-2, MPEG-2 Video, 1996.
[2] J. Meng and S.-F. Chang, "Tools for Compressed-Domain Video Indexing and Editing," Proc. SPIE, Vol. 2671, 1996.
[3] P. J. Brightwell, S. J. Dancer, and M. J. Knee, "Flexible Switching and Editing of MPEG-2 Video Bitstreams," International Broadcasting Convention (IBC 97), 1997.
[4] J. L. Mitchell, W. B. Pennebaker, C. E. Fogg, and D. J. LeGall, MPEG Video Compression Standard, Chapman & Hall, Chapter 15, 1997.
[5] R. Egawa, "Editing MPEG-2 Video in Compressed Domain," MS Thesis, NJIT, Jan. 2000.