3114
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 15, NO. 10, OCTOBER 2006
Embedded Multiple Description Coding of Video

Fabio Verdicchio, Adrian Munteanu, Augustin I. Gavrilescu, Jan Cornelis, and Peter Schelkens, Member, IEEE
Abstract—Real-time delivery of video over best-effort error-prone packet networks requires scalable erasure-resilient compression systems in order to 1) meet the users' requirements in terms of quality, resolution, and frame rate; 2) dynamically adapt the rate to the available channel capacity; and 3) provide robustness to data losses, as retransmission is often impractical. Furthermore, the employed erasure-resilience mechanisms should be scalable in order to adapt the degree of resiliency against transmission errors to the varying channel conditions. Driven by these constraints, we propose in this paper a novel design for scalable erasure-resilient video coding that couples the compression efficiency of the open-loop architecture with the robustness provided by multiple description coding. In our approach, scalability and packet-erasure resilience are jointly provided via embedded multiple description scalar quantization. Furthermore, a novel channel-aware rate-allocation technique is proposed that allows for shaping on-the-fly the output bit rate and the degree of resiliency without resorting to channel coding. As a result, robustness to data losses is traded for better visual quality when transmission occurs over reliable channels, while erasure resilience is introduced when noisy links are involved. Numerical results clearly demonstrate the advantages of the proposed approach over equivalent codec instantiations employing 1) no erasure-resilience mechanisms, 2) erasure resilience with nonscalable redundancy, or 3) data-partitioning principles.

Index Terms—Multiple description coding, packet-erasure resilience, scalable video coding.
I. INTRODUCTION

Multicast applications require several representations of the same content to be sent to a variety of end users with different bandwidth provisions and playback capabilities, such as display resolution, refresh rate, and processing power. In addition, the data delivered through best-effort packet-switched networks may be corrupted or incomplete. Furthermore, participants in multicast sessions should not rely on retransmission mechanisms, as a large number of requests and the consequent reiterated packet transmissions would burden the network, thus increasing the risk of congestion. In such application scenarios, it is essential to develop scalable coding techniques that are capable of adapting the transmitted bitstream to the available channel capacity and user demands and, at the
Manuscript received July 25, 2005; revised February 10, 2006. This work was supported in part by the Belgian Federal Office for Scientific, Technical and Cultural Affairs (IAP Phase V – MOTION), and in part by the Fund for Scientific Research–Flanders (FWO) under Grant G.0053.03 and Postdoctoral Fellowships (for A. Munteanu and P. Schelkens). The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Amir Said. The authors are with the Department of Electronics and Informatics (ETRO), Vrije Universiteit Brussel, Pleinlaan 2, B-1050 Brussels, Belgium (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).
Digital Object Identifier 10.1109/TIP.2006.877495
same time, provide a controllable degree of resiliency against erasures. In an error-free transmission setting, scalable video coding (SVC) technologies [1]–[7] enable media providers to generate a single compressed bitstream from which appropriate subsets, producing different visual qualities, frame rates, and resolutions, can be extracted to meet the preferences and the bit-rate requirements of a broad range of clients. In this context, state-of-the-art SVC techniques [3]–[7] have abandoned the classic closed-loop predictive architecture [1], [2] in favor of the open-loop architecture [4], wherein motion-compensated temporal filtering (MCTF) [8] and spatial multiresolution decomposition are combined with embedded coding. MCTF-based techniques provide full scalability and deliver compression performance superior to closed-loop SVC [8] and competitive against state-of-the-art nonscalable coding on a wide range of bit rates [9]. From a complementary perspective, providing resilience against packet erasures in an error-prone transmission scenario is as important as enabling scalability. In this context, one of the most effective techniques in combating the effect of packet erasures is multiple description coding (MDC), usually coupled with path diversity [10]. Originally proposed as an information-theoretic problem [11], [12], MDC generates two or more complementary and (usually) equally important representations of the input, called descriptions, which are sent to the decoder over different links. In case of channel impairments, the decoder approximates the original signal by using the available descriptions. Distortion in the reconstructed signal decreases upon reception of any additional description and is lower bounded by the distortion attained by single description coding (SDC) operating at the same overall bit rate in an error-free transmission scenario.
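The MDC principle above can be illustrated with the simplest possible scheme, data partitioning by sample splitting. This is a toy sketch, not the coding scheme proposed in this paper: two descriptions are formed from the even- and odd-indexed samples of a signal, and a lost description is crudely estimated from the received one, so reception of both descriptions yields the exact signal while reception of either one alone yields a coarser approximation.

```python
def make_descriptions(samples):
    """Simplest multiple description scheme (data partitioning):
    split the signal into even- and odd-indexed samples."""
    return samples[0::2], samples[1::2]

def reconstruct(even, odd):
    """Use whichever descriptions arrived; a missing description is
    crudely estimated by repeating the neighboring received sample."""
    if even is not None and odd is not None:
        out = []
        for e, o in zip(even, odd):
            out += [e, o]
        return out
    received = even if even is not None else odd
    out = []
    for s in received:
        out += [s, s]   # crude estimate for the lost neighbor
    return out
```

With both descriptions the reconstruction is lossless; with one description the distortion is nonzero but bounded, which is exactly the graceful-degradation behavior MDC is designed for.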
Techniques such as data partitioning [10], pairwise correlating transforms [13], and multiple description scalar quantization [14] provide practical instantiations of the MDC concept. A few proposals have recently appeared that join open-loop scalable video coding with multiple description coding. Bajic and Woods [15] apply data-partitioning approaches to the wavelet coefficients and motion vectors produced by MCTF, thus generating multiple descriptions. Van der Schaar and Turaga [16] propose splitting the sequence of temporally filtered frames to create multiple representations of the input. Critical data, such as the motion vectors (MVs) and the lowest temporal band, are duplicated in all descriptions, thus introducing redundancy. However, the erasure-resilience mechanisms employed in these systems are not scalable, as they follow a rigid design: the degree of resilience against transmission errors is defined prior to encoding based on worst-case assumptions, and no mechanisms
1057-7149/$20.00 © 2006 IEEE
are provided to enable the adaptation of the redundancy to the varying channel conditions. In this paper, we propose a completely different approach for scalable erasure-resilient coding of video, coupling the MCTF-based architecture with MDC. The core idea is to quantize the spatiotemporal decomposition of the input using our recently introduced embedded multiple description scalar quantizers (EMDSQ) [17]–[21]. This results in a number of descriptions for each spatiotemporal subband that are independently decodable and mutually refine the accuracy of the reconstruction. Additionally, since the quantizers are embedded, the resulting descriptions are progressively refinable, thus preserving full scalability. Description losses result in graceful and predictable degradation of the decoded video, and spatiotemporal synthesis is possible without the need to estimate missing descriptions, as in [15] and [16]. The use of EMDSQ allows for steering the number of descriptions [17], [19], [20], similar to data partitioning, and for controlling the redundancy between them [21], similar to correlating transforms. The key aspect is that the encoder can trade robustness to errors for coding efficiency, enabling the adaptability of the video transmission to the varying network conditions. In this respect, the paper investigates a novel framework that controls the degree of resiliency against transmission errors after compression has been performed. Our proposal is completed by the formulation of the rate-allocation problem in the context of unreliable transmission: for any target bit rate and channel condition, the encoder optimizes and transmits only the subset of coded descriptions that minimizes the expected distortion at the decoding site. Considering packet-switched networks, our scheme treats the realistic case wherein descriptions spanning multiple packets are partially received.
This extension is enabled by EMDSQ, which, in addition to scalability and erasure resilience, allows the distortion to be computed for any loss pattern. The remainder of the paper is structured as follows. The generic principles of MCTF-based SVC are outlined in Section II, whereas the proposed architecture is introduced in Section III. The novel multiple description rate-allocation scheme is detailed in Section IV. Finally, Section V reports experimental results, while Section VI draws the conclusions of our work.

II. SINGLE DESCRIPTION MCTF-BASED VIDEO CODING

In this section, we provide a brief overview of the open-loop architecture. The concepts and notations introduced here are needed in the presentation of Sections III and IV. The flow diagram of a classical MCTF-based architecture is depicted in Fig. 1. In a first step, motion-compensated temporal filtering of the input is employed in order to remove the temporal correlation among different frames: in essence, MCTF corresponds to a discrete wavelet transform (DWT) performed along the motion trajectories. The temporal transform is implemented via lifting [23], as illustrated in the simple example of Haar MCTF [6], [22] shown in Fig. 1. The temporal splitting of the input into even and odd frames is followed by motion-compensated prediction, wherein the predict operator produces an approximation of the odd frames based on the even ones. Within
this primal lifting step [23], the error frames are determined. Next, within the dual lifting step [23], the update operator brings the mismatch information back into the even frames. This step aims at producing a temporal average frame for each input pair. Finally, the normalization step (see Fig. 1) constructs the low-pass (L) and high-pass (H) temporally filtered frames, respectively. This decomposition process is iterated, using the L-frames of one temporal level as input to produce the L- and H-frames of the next level, until the desired number of decomposition levels is reached. Notice that arbitrary wavelet temporal decompositions can be synthesized by alternating primal and dual lifting steps [23]. As the MCTF analysis proceeds, the ensuing L- and H-frames are spatially decomposed by means of a multilevel two-dimensional (2-D) DWT, and the resulting subband frames are labeled accordingly. The outcome of such a three-dimensional spatiotemporal decomposition is then quantized and entropy coded in an embedded fashion, along with the MVs, to ensure fine-grain quality scalability for both texture and motion data. Moreover, temporal and resolution scalability are inherently provided by the temporal-and-spatial multiresolution analysis performed by this architecture. Once the compressed data become available, the encoder selects for transmission only the subset of spatiotemporal subbands needed to comply with the resolution and frame-rate requirements of each client. The rate-allocation procedure then matches the output bit rate to the available bandwidth by limiting the quantization accuracy of the selected subbands and MVs. This task is efficiently performed by optimally truncating each embedded stream encoding a subband or an MV, so that the minimum distortion in the decoded sequence is attained given the bit-rate constraint.
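The Haar lifting steps described above (predict, update, normalize) can be sketched as follows. This is a minimal illustration in which the predict and update operators are reduced to the trivial zero-motion case; the actual MCTF applies them along estimated motion trajectories.

```python
import numpy as np

def haar_mctf_level(frames):
    """One level of Haar temporal lifting with a trivial (zero-motion)
    predict operator; real MCTF motion-compensates the even frames."""
    even = frames[0::2]
    odd = frames[1::2]
    # Primal lifting step: predict odd frames from even ones -> error frames
    h = [o - e for o, e in zip(odd, even)]
    # Dual lifting step: update even frames with half of the mismatch,
    # producing a temporal average frame per input pair
    l = [e + 0.5 * d for e, d in zip(even, h)]
    # Normalization step, making the Haar pair orthonormal
    return [x * np.sqrt(2.0) for x in l], [x / np.sqrt(2.0) for x in h]

def haar_mctf_inverse(l, h):
    """Invert normalization, dual, and primal lifting, in that order."""
    l = [x / np.sqrt(2.0) for x in l]
    h = [x * np.sqrt(2.0) for x in h]
    even = [a - 0.5 * d for a, d in zip(l, h)]
    odd = [d + e for d, e in zip(h, even)]
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out
```

Because lifting is inverted step by step, reconstruction is exact for any choice of predict/update operators, and with the normalization above the Haar transform preserves energy, which is the orthonormality used by the distortion analysis later in the paper.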
At the decoding site, each client receives the compressed stream and performs entropy decoding and inverse quantization, thus generating approximations of the motion field and wavelet coefficients. Next, the spatiotemporal transform is inverted to reconstruct the sequence. In the context of reliable transmission, reconstruction errors are solely due to the limited quantization accuracy of motion and texture information, since the spatiotemporal decomposition is perfectly invertible.

III. MULTIPLE DESCRIPTION MCTF-BASED VIDEO CODING

The proposed scalable error-resilient MCTF-based architecture is presented in this section. We begin by introducing the concept of embedded multiple description quantization (Section III-A) and then employ EMDSQ in the open-loop architecture in order to enable scalable robust source coding (Section III-B).

A. Embedded Multiple Description Scalar Quantizers

The basic principle of multiple description scalar quantization, first detailed by Vaishampayan [14], is to first map a continuous-valued source to a finite alphabet, corresponding to central quantization, and then to represent each symbol as an M-tuple of indexes, each resulting from a side-quantizer mapping. The optimal design of central and side quantizers has been extensively studied for the fixed-rate case [14], [24] and recently extended to embedded quantization, leading to multiple description uniform scalar quantizers (MDUSQ) [25]. More
Fig. 1. Open-loop video coding based on motion-compensated temporal filtering.
recently, we proposed generic EMDSQ [17]–[19], producing double-deadzone central quantizers, known to be optimal at high rates and very nearly optimal at low rates [26]. Superior performance of EMDSQ over MDUSQ has been reported for the same level of redundancy [17], [20] and—in contrast to MDUSQ—the design of EMDSQ treats the generic case of an arbitrary number of descriptions M. In the following, we provide a brief explanation of EMDSQ and give as an example the specific instantiation employed in this paper. We denote the set of embedded central quantizers as {Q_p}, 0 ≤ p ≤ p_max, where p = 0 and p = p_max correspond to the finest and coarsest quantization levels, respectively. Similarly, we denote the side quantizers as {Q_p^m}, with 1 ≤ m ≤ M. Partition cells of central and side quantizers at any level p are denoted, respectively, as C_p and C_p^m. In both cases, the partition cells of level p are embedded in the ones of level p + 1, i.e., C_p ⊂ C_{p+1} and C_p^m ⊂ C_{p+1}^m. A sample instantiation of such quantizers is depicted in Fig. 2, while the analytical expressions of the central and side quantizers corresponding to this example can be found in [20]. Embedded quantization in each side encoder proceeds as follows. At the coarsest level p_max, the absolute value of any sample is quantized and the corresponding symbol is emitted, along with the sign information if needed. At any finer level p, the source sample, so far mapped to a cell of level p + 1, is refined to a cell of level p, and the corresponding symbol is produced, accompanied by the sign flag if necessary. At the receiver's end, each side decoder progressively refines the sample as more symbols are decoded; once all descriptions are received up to the same level p, the central-quantizer cell is recovered and its centroid is chosen to reconstruct the sample. The concept of embedded central and side decoding can be extended with respect to [17]–[21] by taking into account the fact that, in general, the side descriptions need not be received at the same quantization level. This corresponds to the realistic case of incompletely received side descriptions. Embeddedness allows for decoding each side description up to any quantization level, provided that causality is preserved. Thus, in general,
Fig. 2. Example of embedded multiple description scalar quantizers. Quantization is shown for positive values only; the negative side of the axis is derived by symmetry. To ensure clarity, not all the partition labels (and corresponding reconstruction points) are depicted.
once the structure of the side quantizers is chosen, we can determine the central partition associated with the possibly different quantization levels p_m received for each description m, 1 ≤ m ≤ M. In this case, the central quantizer is defined by the corresponding central partition as follows:
C(p_1, …, p_M) = ∩_{m=1}^{M} C^m_{p_m}    (1)

with the convention that quantization level p_max + 1 of any description comprises a single cell spanning the entire granular region of the quantizer.

B. Multiple Description MCTF-Based Architecture

The flow diagram of the proposed scalable erasure-resilient video-coding architecture is given in Fig. 3. Temporal-and-spa-
Fig. 3. Proposed architecture coupling MCTF and EMDSQ.
tial decomposition is performed similarly to the SDC case. The resulting L- and H-frames are quantized using EMDSQ, generating M quality-scalable side descriptions. The quantizer indices for each description are subsequently entropy coded. In our instantiation of the proposed architecture, we use the quadtree-limited (QT-L) coding algorithm of [27], [28]. QT-L efficiently exploits the intraband dependencies between the wavelet coefficients and provides compression performance competitive with JPEG2000. Notice that the subbands representing different spatial resolutions are independently compressed, thus preserving spatial scalability. As concerns the motion vectors, we apply the following MDC procedure. First, the MVs relative to any frame are quantized using EMDSQ providing a single quantization level. Next, each description is independently coded using a single-layer instantiation of the prediction-based MV coding scheme of [7]. As a result, M nonscalable descriptions of each motion field are generated. Upon completion of the above process, a set of bitstreams, losslessly representing each spatiotemporal subband and motion field, is generated. Next, given the desired frame rate and resolution, the encoder must determine the optimal truncation points of the selected embedded bitstreams in order to comply with the available bandwidth. When aiming at maximal robustness, a rate-allocation procedure analogous to the one applied in the context of SDC needs to be performed. In this case, the descriptions of a given subband/MV evenly contribute to the stream sent to the decoder. However, as network conditions depend on the particular client being served, this approach results in unnecessary redundancy within the bitstream whenever the channel is more reliable than the worst-case scenario. We address this problem by devising a rate-allocation scheme which selects for transmission only data belonging to a proper subset of the encoded descriptions.
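To make the quantization stage concrete, the following toy sketch mimics the behavior of a two-description embedded multiple description quantizer using staggered uniform bins: each description refines its own bin index level by level, and the decoder intersects whatever side cells it has received to obtain a finer central cell. This is an illustrative stand-in under simplified assumptions, not the actual EMDSQ design of [17]–[21].

```python
import math

def side_index(x, delta, offset):
    """Index of the uniform side-quantizer cell (width delta, shifted
    by offset) that contains x."""
    return math.floor((x - offset) / delta)

def encode_two_descriptions(x, delta0, levels):
    """Toy staggered-bin embedded MDSQ: description 1 uses bins aligned
    at 0, description 2 uses bins offset by half a bin width; each
    finer level halves the bin width, so the index sequences are
    embedded refinements."""
    d1, d2 = [], []
    delta = delta0
    for _ in range(levels):
        d1.append(side_index(x, delta, 0.0))
        d2.append(side_index(x, delta, delta / 2.0))
        delta /= 2.0
    return d1, d2

def central_reconstruct(d1, d2, delta0):
    """Intersect the side cells decoded so far (the two descriptions
    may be received down to different levels) and return the midpoint
    of the resulting central cell."""
    lo, hi = float("-inf"), float("inf")
    delta = delta0
    for i in d1:
        lo, hi = max(lo, i * delta), min(hi, (i + 1) * delta)
        delta /= 2.0
    delta = delta0
    for i in d2:
        lo = max(lo, i * delta + delta / 2.0)
        hi = min(hi, (i + 1) * delta + delta / 2.0)
        delta /= 2.0
    return (lo + hi) / 2.0
```

Decoding only one description yields a coarser but valid reconstruction, while each additional received symbol of either description shrinks the central cell, which is the mutual-refinement property exploited by the proposed architecture.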
This approach grants the capability to compress and transmit video through unequally reliable links (e.g., wired/wireless) via a single coding stage followed by a client-specific optimized selection of the compressed data. Along with the bandwidth-adaptation capability provided by SVC, this feature enables our system to match the variable characteristics of any channel and requires no additional encoding operation. We defer the details of our rate-allocation approach to the next section and conclude the presentation of the proposed architecture by pointing out the conceptual differences
with respect to the two MCTF-based MDC schemes existing in the literature. First, the codec proposed by Bajic and Woods [15] independently encodes neighboring samples, which belong to different partitions; in the case of erroneous transmission, the receiver exploits the correlation among coefficients and interpolates the available samples to estimate the missing ones. It is important to observe that the capability of estimating the missing descriptions provided by [15] is bounded by the existing correlation among transform coefficients, whereas our proposal explicitly introduces redundancy by means of EMDSQ. Furthermore, while we acknowledge that, by adopting overlapping partitions, additional redundancy may be introduced by the codec of [15], we notice that this approach inherently requires repeating the compression stage each time the redundancy (hence, the partitioning scheme) needs to be varied. As shown next, in our approach one matches the redundancy, hence the degree of resiliency, to the channel conditions on-the-fly using optimized rate allocation. In comparison to [15], the MDC scheme proposed by Van der Schaar and Turaga [16] is more flexible, as it forms multiple streams by mapping the compressed frames and motion fields to the various descriptions. When a one-to-one mapping is used, no redundancy is introduced; otherwise, completely redundant copies are included in all descriptions. This solution, however, is not efficient, since its error-resilience properties rely on 1) correlation among filtered frames, which is likely to be low after MCTF, and on 2) the plain repetition of frames and MVs, which fails to increase the quality of the reconstruction as more replicas are received. More importantly, [16] does not propose a procedure to optimally select, on the basis of the estimated channel condition, which frames and MVs need to be replicated and the number of copies to be used.

IV. CHANNEL-AWARE RATE ALLOCATION

Throughout this section, we detail our novel rate-allocation technique with redundancy control. In Section IV-A, we establish the link between the distortion in the reconstructed sequence and the representation accuracy of each subband/MV generated by MCTF. In Section IV-B, we derive the expected distortion in a stochastic framework accounting for source quantization and transmission errors. In Section IV-C, we formulate the rate-allocation problem, which aims at minimizing the expected
distortion at the decoding side given the available bit rate and the packet-loss rate. Finally, the redundancy-control mechanism is described in Section IV-D.

Fig. 4. Temporal reconstruction at level j using approximated motion vectors.

A. Distortion Estimation in MCTF Followed by Quantization

Given a discrete signal having compact support, and an approximation of it, we choose the mean-square error (MSE) as our distortion metric, given by
(2)

We denote the input video sequence, composed of frames of a given size, and its spatiotemporal transform, comprising the chosen numbers of temporal and spatial decomposition levels. Since quantization is applied on transform-domain samples, we seek an expression of the overall distortion in the transform domain, combining the distortions occurring in each subband and the distortion incurred by approximating the motion vectors. This measure is equivalent to the distortion in the reconstructed video sequence when the spatiotemporal transform is orthonormal. This holds when uniform motion occurs in the sequence (i.e., all pixels are uniquely connected [3], [22]), the motion vectors are error-free, and orthonormal wavelets (e.g., Haar) are employed. In order to keep the formulations tractable, we assume orthonormal spatiotemporal transformations throughout this paper. Notice that, although not strictly orthonormal, the biorthogonal wavelets typically employed in practice (such as the 9–7 biorthogonal basis ubiquitously used in compression) are very nearly orthonormal [29]. We begin by considering the distortion contribution resulting from approximating texture information only, given by
(3)

where the terms represent the distortion in the transformed high-pass frames at each time instant of every temporal level and in the low-pass frames of the coarsest temporal level. Next, we account for the contribution to the overall distortion resulting when temporal synthesis is performed using an approximation of the motion field estimated during temporal analysis. This contribution is obtained by considering the effect of the approximated motion information in the temporal lifting structure, as explained next. For simplicity, consider the Haar synthesis of a given temporal level employing inaccurate motion and lossless texture data, as depicted in Fig. 4. Employing a lossy motion field while inverting the temporal transform introduces some distortion in the reconstructed frames, which is indicated as
(4)

We modify (3) in order to include the above contribution, thus obtaining (5), shown at the bottom of the page. A proof of (5) is given in the Appendix. It is worth noticing that, following similar steps, (4) and (5) can be extended to temporal filters other than the Haar pair. In the remainder of this paper, with a slight abuse of notation, we will use the same symbol to indicate the distortion (4) as well as (2). No ambiguity is introduced, as the argument of the function clearly distinguishes distortion occurring in wavelet frames caused by quantization from distortion incurred by using approximated motion vectors. We conclude by rewriting (5) such that the contributions of different spatiotemporal levels to the overall distortion are better isolated. The following notation is introduced. At any temporal level, we denote the set of transformed frames: the high-pass frames of every level and the low-pass frames of the coarsest level. Similarly, we denote the set of motion vectors produced at any level and, at any spatial level, the set of 2-D subbands of each frame. With these notations, (5) compacts into (6), shown at the bottom of the next page. The above formulation of the overall distortion in the reconstructed sequence is particularly useful because the proposed system quantizes each subband and motion field independently, and each contribution to the final distortion is individually evaluated at encoding time, as explained in the following section.
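The orthonormality assumption that justifies summing per-subband contributions as in (6) can be checked numerically: for an orthonormal transform, the MSE measured on the quantized transform coefficients equals the MSE of the reconstructed signal. A minimal sketch, with a random orthonormal matrix standing in for the spatiotemporal wavelet transform:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(64)                 # a "frame" of samples
# Orthonormal analysis matrix (stand-in for the wavelet transform)
q, _ = np.linalg.qr(rng.standard_normal((64, 64)))
y = q @ x                                   # analysis
y_hat = np.round(y * 4) / 4                 # quantize the coefficients
x_hat = q.T @ y_hat                         # synthesis (q is orthogonal)
mse_signal = np.mean((x - x_hat) ** 2)
mse_transform = np.mean((y - y_hat) ** 2)   # sum of per-"subband" MSEs
assert np.isclose(mse_signal, mse_transform)
```

Because orthogonal matrices preserve Euclidean norms, the two MSE values coincide exactly; for the nearly orthonormal 9–7 biorthogonal basis mentioned above, the equality holds only approximately.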
(5)
Fig. 5. (a) Distortion-rate surface, as given by (8), for a two-description (M = 2) representation of a wavelet subband of 288 × 356 pixels; projections on the vertical planes represent the RD curve of each description. (b) Result of resampling the surface for a packet size of 25 KB.
B. Distortion-Rate Function Employing EMDSQ Over a Packet-Lossy Network

For any subband quantized using EMDSQ, with the central quantizer defined by (1), we define the distortion-rate function

(7)

where each rate term is the rate (in bits) needed to entropy-code a given quantization level of the corresponding description. Instead of providing an analytical expression of this function for any rate, we first compute the distortion and the rate associated with any central quantizer, and then derive, from these values, a finite set of distortion-rate samples which is adequate for the purpose of optimized rate allocation. The first step is performed via direct computation of the distortion for any combination of quantization levels, and measurement of the rate associated with each, thus obtaining

(8)

In general, the output rates provided by (8) are irregularly spaced along the rate axes, as shown in Fig. 5(a) for M = 2. Since transmission occurs by means of packets of a given size (in bits), we are only interested in points having output rates evenly distributed with a step equal to the packet size. Therefore, we need to interpolate the values from (8) to obtain the desired sampling as

(9)
An example of such a function is given in Fig. 5(b). Notice that the resampled function possibly comprises points at which a quantization level is only partially decoded for some description. This is feasible since any prefix of the bitstream produced by our quantization/entropy-coding scheme is decodable. Notice that, although (9) does not have a closed form, the overhead of its direct computation for typical numbers of descriptions and quantization levels is negligible compared, for instance, to the computational complexity of the sophisticated motion-estimation strategies usually employed in temporal filtering. So far, we derived the distortion resulting from compressing the source into packets in the error-free case. Given (9), we can formulate the expected distortion at the decoder side assuming packet erasures. Each time transmission is performed, a different approximation of the source is reconstructed depending on the loss pattern, as the decoder processes
(6)
all the packets which do not depend on missing data and discards the others. Then, the average distortion at the decoding site can be estimated as

(10)

where the summation is performed over the set of vectors of correctly received packet counts. Equation (10) averages the distortion yielded by decoding only the initial packets over all loss occurrences: for every description, each term corresponds to the event of correctly receiving all packets up to a given one and losing the next. Subsequent packets, if received, are useless. The weighting factors in (10) depend on the stochastic model assumed for the packet losses. We assume random memoryless erasures with constant probability, that is

(11)

where u(·) is the discrete-time Heaviside function. We notice that in the case of bursty losses one can still use (11), provided that packets comprising different descriptions are properly interleaved [30], as shown in Section V-A. The rate-distortion behavior for any motion vector quantized using EMDSQ is derived following the same steps as in the case of the wavelet subbands. Similarly to texture data, the expected distortion is expressed by (10). The main difference is the use of (4) in evaluating the distortion. Specifically, when computing (8), one-level temporal synthesis must be performed with each approximation of the motion field. This is a more demanding task than deriving (8) for a wavelet subband, assuming equivalent parameter values. Nevertheless, its computational cost is still negligible when a limited amount of motion scalability is supported, i.e., when the number of quantization levels for the motion vectors is small. As described in Section III-B, we apply a single-layer instantiation of the EMDSQ to quantize the motion vectors. Therefore, only the distortion values corresponding to all possible combinations of the received descriptions need to be computed. Next, similarly to the case of wavelet subbands, interpolation is used to estimate the intermediate distortion values in case the compressed motion field spans more packets.

C. Rate-Allocation Based on Lagrangian Optimization
The following setup is assumed for the transmission of compressed video to each client. The coded sequence (or a segment of it) is sent during a transmission slot of a given duration (in seconds) by means of packets of fixed size (in bits). We assume that, during each transmission slot, the encoder is aware of the available bit rate (in bits per second), corresponding to a given number of packets, and possesses a reliable estimate of the packet-loss probability. Without loss of generality, we consider the video segment to be decoded at the original frame rate and resolution. Allocating the available resources means finding a distribution which determines, for every subband and motion field, the optimum number of packets encoding each of its descriptions. For a given distribution, the distortion in the reconstructed sequence is a random variable, whose realizations depend on the transmission errors. The expected distortion at the decoder is obtained by applying statistical expectation to (6). Exploiting linearity and using (10), we obtain the equation shown at the bottom of the page, while the associated packet cost is shown in the equation at the bottom of the next page. The goal of our rate allocation is to find the optimal packet distribution which minimizes the expected distortion in the decoded sequence without exceeding the available resources. In other words
(12)

In the case of single description coding, i.e., M = 1, the above represents the classical problem of optimal rate allocation among quantizers. Existing techniques, e.g., [31] and [32], featuring a modest computational complexity, could be used to find the optimal solution. The main difference occurring when M > 1 is the fact that, for any subband or motion field, the expected distortion does not only depend on the total number of packets used to code it, but also on the division of the packets among the
Fig. 6. (a) Sample path on the distortion-rate surface of Fig. 5. (b) The expected-distortion versus transmitted-packets function corresponding to the selected path.
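The expected-distortion versus transmitted-packets behavior plotted in Fig. 6(b) can be sketched for a simplified case: a single sequence of packets, decoded up to the longest intact prefix, under memoryless erasures. The paper's (10) additionally sums over per-description loss patterns; here `prefix_distortion` is a hypothetical precomputed curve giving the distortion after decoding the first k packets.

```python
def expected_distortion(prefix_distortion, p):
    """Expected distortion over a memoryless channel with packet-loss
    probability p, when the decoder uses the longest intact prefix.
    prefix_distortion[k] is the distortion after decoding the first
    k packets (k = 0 .. N)."""
    n = len(prefix_distortion) - 1
    # All N packets arrive with probability (1 - p)^N
    exp_d = (1.0 - p) ** n * prefix_distortion[n]
    for k in range(n):
        # First k packets received, packet k + 1 lost
        exp_d += (1.0 - p) ** k * p * prefix_distortion[k]
    return exp_d
```

The loss-pattern probabilities sum to one, so a flat distortion curve yields that same constant; lowering p moves the expectation toward the full-reception distortion, which is the trade-off the rate allocator exploits.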
descriptions. In order to solve (12), we first associate to each subband and motion field one a priori strategy for splitting any amount of packets among its descriptions. Moreover, we impose that, for any description, the share of packets is a nondecreasing function of the total amount, i.e., see (13), shown at the bottom of the page. In loose terms, choosing a splitting strategy corresponds to choosing a continuous path on the distortion-rate surface. As the number of packets employed increases, the distortion decreases along the selected path. A pictorial example is given in Fig. 6(a) for the two-description case. Having defined the splitting approach, the corresponding expected distortion solely depends on the scalar packet budget and on the packet-loss rate. A sample function is plotted in Fig. 6(b). As a result, a cost-benefit function is obtained for every subband and motion field, and the rate-allocation problem we are concerned with falls back into the classical framework. Next, rather than solving the constrained minimization (12), we employ the well-established technique of Lagrangian multipliers [33], which performs the unconstrained minimization of the functional (14) for some multiplier λ ≥ 0. The packet distribution which minimizes (14) and the corresponding overall packet cost both depend on the multiplier λ, which is varied until the resulting solution satisfies the packet budget. Such a distribution also minimizes the initial constrained problem (12), subject to the chosen splitting strategy (13). An efficient implementation of the scalar resource allocation which solves (12) for a given splitting strategy is described in
[26] (§8.2) and is used to instantiate our rate allocation. The opis obtained by exploring timal vector packet distribution . However, such a task is impracall possible choices of tical due to the vast number of possible distributions; instead, a much smaller search space can be examined, as explained next. D. Computation-Efficient Redundancy Control The selection of for any not only allows for solving the vector problem (12) using a scalar approach, but also establishes the degree of robustness relative to the transmitted repso as to resentation of . For instance, one may choose evenly share packets among the encoded descriptions; intuitively, in the high loss-rate case, solving (14) assuming such a is likely to yield the lowest expected distortion regardless of . Conversely, when erasure-free transmission is expected, the search for the optimal solution should be carried out which favor only one selecting only the splitting policies so that of the encoded descriptions, i.e., choosing such no redundancy in the stream is introduced. With this respect, in [17] and [19], we demonstrate that, when encoding a source using EMDSQ, reducing the number of descriptions decreases the redundancy within the multiple description reprecomsentation of . Hence, by retaining a subset prising descriptions, one represents the source with a lower redundancy at any reconstruction fidelity. renders the employed erasure-resilience The selection of mechanism scalable as the redundancy level, and, consequently, the degree of resiliency against transmission errors are controllable. We focus on such splitting strategies that evenly share descriptions (thus, preserving packets among the retained the balanced structure of the transmitted descriptions) and send
(13)
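The Lagrangian search described above can be sketched as follows. This is an illustrative Python sketch under assumed inputs (the operating-point tables and the function name `allocate` are hypothetical, not taken from the paper or from [26]): each source offers a discrete set of (packets, expected-distortion) points, every source includes a zero-packet option, and a bisection on the multiplier drives the total packet count to the budget.

```python
def allocate(sources, budget, iters=50):
    """Minimize the sum of expected distortions subject to a total packet
    budget, via unconstrained minimization of D + lambda*n per source and
    bisection on the multiplier lambda. Each source must include a
    zero-packet operating point so that a feasible solution always exists."""
    def best_points(lam):
        total_n, total_d, choice = 0, 0.0, []
        for pts in sources:            # pts: list of (n_packets, distortion)
            n, d = min(pts, key=lambda p: p[1] + lam * p[0])
            total_n += n
            total_d += d
            choice.append(n)
        return total_n, total_d, choice

    lo, hi = 0.0, max(d for pts in sources for _, d in pts) + 1.0
    for _ in range(iters):             # larger lambda -> fewer packets chosen
        lam = 0.5 * (lo + hi)
        n, _, _ = best_points(lam)
        if n > budget:
            lo = lam                   # over budget: penalize packets more
        else:
            hi = lam                   # feasible: try a smaller multiplier
    return best_points(hi)             # hi always yields a feasible allocation
```

With convex per-source operating points this recovers the familiar marginal-analysis solution; non-convex points not lying on the lower convex hull are never selected, which is the usual limitation of the Lagrangian approach.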
Fig. 7. (a) Paths corresponding to the three cases M = {1}, M = {1,2}, and M = {2} (left to right). (b) The three expected-distortion versus transmitted-packets curves corresponding to each path.
no information to code the remaining ones. The splitting strategies of (13) are, thus, restricted to
(15), shown at the bottom of the page. An example of the possible paths allowed by (15) on the distortion-rate surface is shown in Fig. 7(a): in this case, only three choices are possible, corresponding to either side decoding or central decoding. The resulting expected-distortion rate curves are depicted in Fig. 7(b). As expected, when a single description is transmitted, the average distortion it achieves saturates to a larger value than the distortion attained by transmitting both descriptions. Notice that, for a given set of descriptions, imposing (15) limits the number of allowed splitting approaches to the number of nonempty subsets of the description set. As a result, the scalar rate-allocation procedure could be performed using each of the permitted ways of distributing packets among all subbands and motion vectors. Although the search space defined by (15) is intuitively much smaller than the one relative to (13), we further limit the computational requirements of the proposed rate-allocation by imposing that a single set of retained descriptions is associated to all subbands belonging to the same spatial and temporal resolution. We also employ the same set for both the basic-resolution texture data and the motion information of the corresponding temporal level. Therefore, a limited number of vector distributions are tested during our rate-allocation. The solution which attains the lowest expected distortion is then selected to perform the actual transmission. In the remainder of this paper we shall refer to the above approach as the full-search method. In order to keep the complexity of the proposed multiple description rate allocation comparable to the one devised for SDC [7], we impose that higher spatiotemporal levels, i.e., those providing the coarsest spatiotemporal representation of the sequence, are coded using more descriptions, i.e., are more resilient to errors, than lower levels, which contain refinement information. More precisely, denoting the cardinality of a set as |·|, we impose that
(16)

We refer to the above strategy as the restricted-search method. A performance assessment between the full- and restricted-search approaches is reported in Section V-B. To conclude, we remark that, given the packet-loss rate and the overall bit rate constraint, the proposed rate-allocation simultaneously 1) controls the redundancy by selecting the optimum sets of retained descriptions for each subband or motion field, and 2) determines the appropriate truncation point within each of the retained descriptions. This approach not only allows meeting the users' requirements in terms of bit rate, resolution, and frame rate, but also dynamically adapts the degree of resiliency against transmission errors to the given channel condition (i.e., packet-loss rate) after compression of the source has been performed. Furthermore, the proposed restriction of the search space limits the computational requirements of our rate allocation, whose complexity, for typical parameter values, is comparable to the one performed in the SDC case.

V. EXPERIMENTAL RESULTS

In this section, we report the experimental results obtained with an instantiation of the proposed scalable multiple description video coding architecture. The system is based on our scalable MCTF-based wavelet codec [7], equipped with EMDSQ producing two descriptions of both texture and motion vectors. Motion data, however, are compressed in a nonscalable fashion. We evaluate the performance of the channel-aware rate-allocation algorithm featuring the restricted-search method (MDC_r_search) described in Section IV, and compare it against the performance obtained with a rigid rate-allocation approach (MDC_fixed), which transmits, for any subband/MV, all the encoded descriptions, regardless of the channel conditions. Additionally, the experimental results
are compared against those obtained with the equivalent single description video codec (SDC). Furthermore, in order to compare against the state-of-the-art, we developed a scalable MDC system incorporating the data-partitioning concepts of Bajic and Woods [15] into our video codec, also operating with two descriptions. In this way, both our proposed EMDSQ-based system and the data-partitioning-based system (MDC_DP) employ the same texture and motion-vector codecs. Therefore, any performance differences between the two systems will be only due to the way multiple descriptions are created. In all the experiments, we assume a single transmission link through which packets representing different descriptions are transmitted using time multiplexing. The channel acts as an erasure channel, and the packet size is set to 1400 bytes including the (ad-hoc) header. The decoder uses all packets which do not depend on missing data (packet dependencies are indicated in the header), and discards the others. The data-partitioning-based system, unlike the proposed EMDSQ system, attempts to estimate an erased description from the other one (if available), as described in [15]. In case of simultaneous loss of all descriptions, both the EMDSQ and the data-partitioning system assume the missing wavelet coefficients or motion vectors to be zero. The quality of the reconstructed sequence is measured using the peak signal-to-noise ratio (PSNR). Specifically, we employ the average PSNR (Avg_PSNR), defined for YUV 4:2:0 color sequences as shown in the equation at the bottom of the page, where the average is taken over the frames of the sequence and the Y, U, and V color components of the frame corresponding to each time instant. We report in Section V-A two sets of experiments. In the first set, we assume independent packet losses occurring with probability distributed in four intervals, denoted (a)-(d), ranging from very low to high loss rates. The second set of experiments is carried out assuming a bursty channel.
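The Avg_PSNR computation can be illustrated with the following Python sketch. Since the exact equation is typeset at the bottom of the page and not reproduced here, this sketch uses one common convention, pooling all Y, U, and V samples of a frame into a single MSE before averaging the per-frame PSNR values over the sequence; the function name `avg_psnr` is ours.

```python
import numpy as np

def avg_psnr(frames_ref, frames_dec):
    """Average PSNR over a YUV 4:2:0 sequence.
    Each frame is a (Y, U, V) tuple of uint8 arrays; in 4:2:0 the chroma
    planes have half the luma resolution in both dimensions. Here the MSE
    of a frame pools all Y, U, and V samples; the exact weighting used in
    the paper is given by its equation (typeset at the bottom of the page)."""
    psnrs = []
    for (y1, u1, v1), (y2, u2, v2) in zip(frames_ref, frames_dec):
        err = np.concatenate(
            [(a.astype(np.float64) - b.astype(np.float64)).ravel()
             for a, b in ((y1, y2), (u1, u2), (v1, v2))])
        mse = np.mean(err ** 2)
        # 255 is the peak value for 8-bit samples
        psnrs.append(10.0 * np.log10(255.0 ** 2 / mse) if mse > 0 else np.inf)
    return float(np.mean(psnrs))
```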
Simulations are performed using a Gilbert model [34], where the loss rate lies in the aforementioned intervals and the average burst length is set to 3. In all the experiments, we perform the rate-allocation algorithm assuming a single representative value for each interval, corresponding to the expected loss rate. For each codec and each interval (a)-(d), 75 streaming simulations are performed, each resulting in a value of the loss rate belonging to the given interval. Each interval is then divided into smaller, equally sized bins, and Avg_PSNR is averaged over each bin and plotted against the mean loss rate of the bin.
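The Gilbert channel used in these simulations can be sketched as follows (an illustrative Python fragment; the function name and defaults are ours, with the mean burst length of 3 taken from the text): a two-state chain in which every packet sent in the Bad state is lost, the Bad-to-Good probability fixes the mean burst length, and the Good-to-Bad probability is chosen so that the stationary loss rate matches the target.

```python
import random

def gilbert_losses(n_packets, loss_rate, burst_len=3.0, seed=0):
    """Simulate packet erasures with a two-state Gilbert model [34]:
    packets are lost in the Bad state. Returns a list of booleans
    (True = packet lost)."""
    rng = random.Random(seed)
    q = 1.0 / burst_len                    # P(Bad -> Good): mean burst length = 1/q
    r = loss_rate * q / (1.0 - loss_rate)  # P(Good -> Bad) for the target stationary loss rate
    bad = rng.random() < loss_rate         # draw the start state from the stationary distribution
    losses = []
    for _ in range(n_packets):
        losses.append(bad)
        bad = (rng.random() >= q) if bad else (rng.random() < r)
    return losses
```

With these transition probabilities the stationary fraction of Bad slots is r/(r + q) = loss_rate, and burst lengths are geometric with mean 1/q, matching the simulation settings described above.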
In the last set of experiments, given in Section V-B, we compare the performance of the restricted- versus full-search approach assuming independent losses.

A. Simulations for Independent and Bursty Losses

The sequences selected for the first set of experiments are Bus (150 frames) and Football (260 frames), both in CIF format (352 × 288) at 30 frames per second. Streaming simulations are carried out targeting 512 and 1024 Kb/s. It is worth noting that, due to the scalable nature of the proposed codec, a bitstream suitable for any other bit rate is readily obtained without additional encoding operations. In each streaming simulation, the event of a packet being erased by the channel is modeled as a binary random variable (lost/received), where losses occur with a probability belonging to the selected interval (a)-(d). The same loss probability is assumed for any packet, independently of the others, i.e., the random process modeling packet losses is IID. Fig. 8 shows the performance at the two given bit rates for the Bus and Football sequences. The most interesting conclusions are drawn from the behavior at very low and low packet-loss rates, namely intervals (a) and (b). In interval (a), where very few packets are lost, SDC attains the highest Avg_PSNR, while the figure for MDC_fixed is up to 2 dB inferior, due to the redundancy between the two descriptions. The curve representing the performance of MDC_r_search is situated between those of SDC and MDC_fixed. Such behavior confirms the ability of the proposed technique to trim the redundancy within the streamed data, thus obtaining, in the loss-free case, a quality close to the one provided by SDC. Indeed, we observe that, when practically no loss occurs, the Avg_PSNR of MDC_r_search is at most 0.7 and 0.9 dB lower than the SDC one at 512 and 1024 Kb/s, respectively. Such a coding penalty, far less severe than the one incurred by MDC_fixed, is mainly due to the usage of a nonscalable approach for motion vector coding.
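Combining the IID loss model above with the dependency rule stated earlier (the decoder discards every packet that depends on missing data), a streaming simulation can be sketched as below. This is a simplified Python illustration assuming the packets of each description form a dependency chain, so everything after the first erased packet is unusable; the function name `decodable_prefixes` is ours.

```python
import random

def decodable_prefixes(desc_lengths, p_loss, seed=0):
    """Model IID packet erasures (each packet lost independently with
    probability p_loss) and return, per description, how many packets
    the decoder can use: packets within a description depend on their
    predecessors, so everything after the first lost packet is discarded."""
    rng = random.Random(seed)
    prefixes = []
    for n in desc_lengths:
        usable = 0
        for _ in range(n):
            if rng.random() < p_loss:
                break          # this packet and all its dependents are dropped
            usable += 1
        prefixes.append(usable)
    return prefixes
```

Averaging the distortion reached at these truncation points over many such trials yields the empirical counterpart of the expected-distortion curves of Fig. 6(b).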
In the current system, the nonscalable MDC of the MVs cannot match the accuracy attained by SDC without transmitting both MV descriptions. Even then, MDC of the MVs introduces some redundancy, which is not justified by the low packet-loss probability encountered at the left end of interval (a). On the other hand, as the loss rate increases, towards the right end of interval (a) and throughout interval (b), the performance of SDC drops, with its Avg_PSNR curve falling below the MDC ones. We notice that, within interval (b), MDC_r_search is clearly better than MDC_fixed (by up to more than 1 dB) at 512 Kb/s, while the two approaches deliver practically the same performance at 1024 Kb/s. As expected, moving towards significant loss rates, the SDC quality drops substantially. Within interval (c), the Avg_PSNR of SDC is 4 to 8 dB lower than in the error-free case. Towards the end of
Fig. 8. Sequences Bus and Football streamed at 512 and 1024 Kb/s assuming independent losses for various packet-loss rates.
interval (d), the plunge exceeds 11 dB and the user experiences a very poor visual quality. On the other hand, both MDC approaches, which (as expected) tend to overlap as the loss rate increases, are far more resilient and deliver an acceptable video quality. Within interval (c), where substantial losses occur, we observe an average quality less than 4 dB inferior to the one attained by SDC in the error-free case, i.e., the best available quality for the
given bit rate. Similarly, the quality reduction at the highest loss rates [upper bound of interval (d)] is limited to less than 5 dB. Comparing against SDC employing no error-protection mechanisms shows the “price” of multiple-description coding. There is a common belief that having more than one description of the input source must cost a lot in terms of bit rate. This is not at all the case for the proposed EMDSQ-based
Fig. 9. Frame 64 from sequence Bus streamed at 1024 Kb/s through an erasure channel featuring IID losses occurring with probabilities: (a) P = 0.001, (b) P = 0.05, (c) P = 0.125, and (d) P = 0.2.

system. Indeed, the results highlight that, when operating at low packet-loss rates, the system selects the appropriate information, and the cost of having MDC versus SDC is minimal. On the other hand, at mid-to-high packet-loss rates, MDC does its job, namely it provides erasure resilience, while SDC, having no protection, fails dramatically. Notice, though, that single-description compressed streams should not be left unprotected when operating over packet-lossy networks. As an alternative to robust source coding, channel coding approaches based on forward error correction (e.g., [35]) can be employed in order to transform the SDC system into an MDC one and provide resilience against transmission errors. The comparison against the state-of-the-art [15] shows that the proposed EMDSQ-based and the data-partitioning-based
systems achieve similar compression performance at low packet-loss rates. However, at mid-to-high packet-loss rates, the proposed EMDSQ-based codec systematically outperforms the data-partitioning-based system. A sample video frame of the sequence Bus decoded at 1024 Kb/s is given in Fig. 9. The columns in the figure refer to three codec instantiations (SDC, MDC_r_search, and MDC_DP), while the rows depict the frame reconstructed during four streaming simulations, each representative of one packet-loss interval. One notices that, in (a), the visual quality provided by the two MDC approaches is equivalent to the one achieved by SDC. In (b), the increasing packet losses result in blurriness in the frame decoded by SDC, whereas both MDC approaches are scarcely affected, with MDC_r_search delivering the best visual quality.
Fig. 10. Sequence Mobile Calendar streamed at 512 and 1024 Kb/s assuming burst losses with average burst length L = 3 for various packet-loss rates.
As the amount of losses grows larger – intervals (c) and (d) – SDC provides a barely intelligible picture. Moreover, unlike MDC_r_search, which offers the best visual quality, MDC_DP starts to be seriously affected by the increasing losses, exhibiting annoying visual artifacts. The second set of experiments is carried out using the Mobile Calendar CIF sequence (300 frames) at 30 frames per second. The streaming simulations target video transmission at 512 and 1024 Kb/s over packet-lossy channels with burst losses. The simplification resulting from the assumption of IID losses, i.e., the usage of (11), can still be retained provided that bursts of losses are properly spread across independent packets. Such an effect is obtained by interleaving the packets scheduled for transmission [30]. We found that the following pseudorandom interleaving scheme (applied to all MDC codecs) provides robustness to bursts and limits the de-interleaving complexity and delay at the decoder. In our scheme, the packets encoding a given description of a given subband, which have strict dependencies, are sequentially fed into the channel. As such, whenever a burst occurs and a packet is erased by the channel, its subsidiary packets (which are now useless) are likely to be erased as well, causing no additional harm. On the other hand, when placing in the stream the packets encoding the other description of the same subband, we randomly select an insertion point which is no closer to the end of the first description than the expected burst length. This strategy attempts to avoid the simultaneous erasure of both descriptions by a single burst. Fig. 10 reports the performance obtained at 512 and 1024 Kb/s on the Mobile Calendar sequence. Similar considerations to those made above can be repeated here, since the behavior observed in the bursty case is comparable to that of the IID case. The restricted-search method is able to reduce the performance gap between MDC and SDC to within 1 dB at
low loss-rates, i.e., interval (a). Additionally, MDC provides undisputable gains over SDC in intervals (b)-(d). It is also important to remark the significant performance gap between our proposed EMDSQ-based approach and the system based on the data-partitioning principles of [15]. Finally, we point out that a performance penalty of up to 4.5 dB is observed if no interleaving is applied. We conclude that interleaving is a critical component and that the adopted interleaving approach is effective in spreading the effect of bursty losses.

B. Search-Strategy Analysis

In this section, we assess the possible performance loss resulting from restricting our attention only to the splitting strategies relative to subsets satisfying (16). The rationale behind constraint (16) is that the information carried by higher spatial and temporal decomposition levels contributes to the reconstruction accuracy more than the one contained in the lower levels. Therefore, allocating more redundancy to the upper levels should result in a better trade-off and attain, on average, a lower distortion for the given bit rate. Thus, we expect the rate-allocation to favor choices of sets that satisfy condition (16). Moreover, the restricted search reduces the dimension of the search space and, hence, the complexity of our rate-allocation approach. The expected minor performance penalty is confirmed by the experimental results shown in Fig. 11, which reports the performance gap between the full- and restricted-search methods. Although, in some cases, a solution which violates (16) results in a lower expected distortion, the differences between the two approaches are practically negligible. Finally, from a complexity point of view, the execution time required by the restricted-search algorithm is at most 17% larger than the execution time needed by the rate-allocation employed
Fig. 11. Sequence Football streamed at 512 and 1024 Kb/s with independent losses. Difference of average PSNR achieved by full- and restricted-search, respectively.
in the SDC system. Hence, we conclude that opting for the restricted-search strategy keeps the computational cost of the proposed rate-allocation comparable to the one performed in the SDC case.

VI. DISCUSSION AND CONCLUSION

A novel design for scalable erasure-resilient video coding that couples the compression efficiency of the open-loop architecture with the robustness provided by multiple description coding is proposed in this paper. In our approach, scalability and packet-erasure resilience are jointly provided via embedded multiple description scalar quantization. EMDSQ allows for steering the number of descriptions and the redundancy between them. In the context of packet-switched networks, we derive a novel framework that performs optimized rate-allocation and controls the amount of redundancy by selecting the subset of coded descriptions minimizing the expected distortion at the decoding site. The proposed framework adapts the level of robustness to the network conditions without resorting to channel coding. The proposed scalable MDC video coding approach seamlessly adapts the data rate to the available channel bandwidth, just as in scalable SDC. Similarly, having retained a number of descriptions suitable for a certain packet-loss rate, the system can adapt to a lower loss rate by removing the unnecessary redundancy. Hence, the system is scalable in error-resilience terms. In other words, whatever the channel conditions (bandwidth and packet-loss rate), the system can optimize the rate-distortion performance for the given operational points. One concludes that the proposed scalable MDC architecture is suitable for multicast scenarios, as it allows for performing a single coding step followed by a rate-redundancy allocation procedure which transmits to each client the appropriate data according to the user's requirements, available bit rate, and expected packet-loss rate.
The advantages of our approach are demonstrated in the context of video transmission over packet-lossy networks by comparing the performance of the proposed system against the equivalent instantiations employing 1) no erasure-resilience mechanisms (SDC), 2) erasure-resilience with nonscalable redundancy (MDC_fixed), and 3) erasure-resilience based on data-partitioning (MDC_DP). The experimental results show that, while transmitting over reliable links, the coding penalty associated with the proposed
approach versus single-description coding is minimal. Specifically, in PSNR terms, MDC_r_search performs at most 3% below SDC at low packet-loss rates, interval (a). In other words, the “cost” of MDC is negligible. However, the coding gains observed in video transmission over error-prone channels are significant, to say the least. At moderate loss rates, interval (b), and low bit rates, MDC_r_search is up to 4% better than SDC, while at high bit rates, SDC is 10% below MDC_r_search, which almost overlaps MDC_fixed. At high packet-loss rates, intervals (c) and (d), MDC_r_search outperforms SDC by up to 28%. In these intervals, our two MDC instantiations practically coincide. We conclude that when low-to-moderate losses are expected, a restricted-search approach exploiting the unequal “importance” of the various subbands achieves the best quality-robustness tradeoff, especially at low bit rates. Additionally, when substantial losses are expected, the search for the best redundancy allocation can be skipped, since in this case all the encoded descriptions need to be sent. We also observe that scalable multiple-description coding of motion should be used in order to further approach the coding performance of SDC in error-free transmission scenarios. Finally, numerical evidence shows that the proposed EMDSQ-based approach outperforms the equivalent system employing the data-partitioning principles of [15]. This suggests that EMDSQ should be favored over data-partitioning in scalable multiple-description coding of video.
APPENDIX

Proof of (5): In the following, we neglect the spatial transform, assumed orthonormal, as it does not affect the computation of distortion. We begin by considering the synthesis of a decomposition level when approximated input data are employed. As in [7] and [35], we assume that the contributions of the texture and motion approximation errors to the reconstruction error are disjoint; hence, each approximated frame is written as

where the second term represents the frame containing the errors induced when the sole approximated data used
(17)
during the one-level inverse MCTF. Therefore, the distortion in each frame is

and similarly for the remaining frames. We assume that the individual error contributions are uncorrelated; hence, the second term in the above equation is zero. The distortion in the entire reconstructed sequence is given by (17), shown at the top of the page. Notice that the first term in (17) represents the distortion resulting from synthesizing the level using approximated texture data and lossless motion information, as in (3), while the second term accounts for the distortion stemming from the use of lossy decoded motion vectors and is compactly represented using the notation introduced in (4). Isolating the temporal low-pass subband in (17), we obtain

(18)

Notice that the first term in (18) is proportional to the distortion in the sequence comprising level one of the temporal decomposition. When two decomposition levels are employed, such distortion originates from approximations in both texture and motion information in the upper level; hence, following the same reasoning applied to derive (18), we can write the analogous expression for the upper level. Iterating over the remaining decomposition levels, we obtain (5).

REFERENCES
[1] M. van der Schaar and H. Radha, “Adaptive motion-compensation fine-granular-scalability (AMC-FGS) for wireless video,” IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 6, pp. 360–371, Jun. 2002.
[2] Y. He, R. Yan, F. Wu, and S. Li, H.26L-Based Fine Granularity Scalable Video Coding, ISO/IEC JTC1/SC29/WG11, Pattaya, Thailand, 2001.
[3] J.-R. Ohm, “Three-dimensional subband coding with motion compensation,” IEEE Trans. Image Process., vol. 3, no. 5, pp. 559–571, Sep. 1994.
[4] S. J. Choi and J. W. Woods, “Motion-compensated 3-D subband coding of video,” IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 2, pp. 155–167, Feb. 1999.
[5] V. Bottreau, M. Benetiere, B. Felts, and B. Pesquet-Popescu, “A fully scalable 3D subband video codec,” in Proc. IEEE Int. Conf. Image Processing, Thessaloniki, Greece, Oct. 2001, vol. 2, pp. 1017–1020.
[6] D. Turaga, M. van der Schaar, Y. Andreopoulos, A. Munteanu, and P. Schelkens, “Unconstrained motion compensated temporal filtering (UMCTF) for efficient and flexible interframe wavelet video coding,” Signal Process.: Image Commun., vol. 20, no. 1, pp. 1–19, Jan. 2005.
[7] J. Barbarien, A. Munteanu, F. Verdicchio, Y. Andreopoulos, J. Cornelis, and P. Schelkens, “Motion and texture rate-allocation for prediction-based scalable motion-vector coding,” Signal Process.: Image Commun., vol. 20, no. 4, pp. 315–342, Apr. 2005.
[8] M. Flierl and B. Girod, “Video coding with motion-compensated lifted wavelet transforms,” Signal Process.: Image Commun., vol. 19, no. 7, pp. 561–575, Aug. 2004.
[9] P. Schelkens, I. Andreopoulos, J. Barbarien, T. Clerckx, F. Verdicchio, A. Munteanu, and M. van der Schaar, “A comparative study of scalable video coding schemes utilizing wavelet technology,” in Proc. SPIE Photonics East, Wavelet Applications in Industrial Processing, Providence, RI, Oct. 2003, vol. 5266, pp. 147–156.
[10] J. Apostolopoulos, “Error-resilient video compression through the use of multiple states,” in Proc. IEEE Int. Conf. Image Processing, Vancouver, BC, Canada, Sep. 2000, vol. 3, pp. 352–355.
[11] L. Ozarow, “On a source-coding problem with two channels and three receivers,” Bell Syst. Tech. J., vol. 59, no. 10, pp. 1909–1921, Dec. 1980.
[12] A. A. El Gamal and T. M. Cover, “Achievable rates for multiple descriptions,” IEEE Trans. Inf. Theory, vol. 28, no. 11, pp. 851–857, Nov. 1982.
[13] M. T. Orchard, Y. Wang, V. Vaishampayan, and A. R. Reibman, “Redundancy rate-distortion analysis of multiple description coding using pairwise correlating transforms,” in Proc. IEEE Int. Conf. Image Processing, Santa Barbara, CA, Oct. 1997, vol. 1, pp. 608–611.
[14] V. A. Vaishampayan, “Design of multiple description scalar quantizers,” IEEE Trans. Inf. Theory, vol. 39, no. 5, pp. 821–834, May 1993.
[15] I. V. Bajic and J. W. Woods, “Domain-based multiple description coding of images and video,” IEEE Trans. Image Process., vol. 12, no. 10, pp. 1211–1225, Oct. 2003.
[16] M. van der Schaar and D. Turaga, “Multiple description scalable coding using wavelet-based motion compensated temporal filtering,” in Proc. IEEE Int. Conf. Image Processing, Barcelona, Spain, Sep. 2003, vol. 3, pp. 489–492.
[17] A. I. Gavrilescu, A. Munteanu, P. Schelkens, and J. Cornelis, “Embedded multiple description scalar quantizers,” IEE Electron. Lett., vol. 39, no. 13, pp. 979–980, Jun. 2003.
[18] ——, “Generalization of embedded multiple description scalar quantizers,” IEE Electron. Lett., vol. 41, no. 2, pp. 63–65, Jan. 2005.
[19] A. I. Gavrilescu, A. Munteanu, P. Schelkens, and J. Cornelis, “Embedded multiple description scalar quantizers for progressive image transmission,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Hong Kong, Apr. 2003, vol. 5, pp. 736–739.
[20] A. I. Gavrilescu, A. Munteanu, J. Cornelis, and P. Schelkens, “High-redundancy embedded multiple-description scalar quantizers for robust communication over unreliable channels,” in Proc. Int. Workshop Image Anal. Multimedia Interactive Services, Lisbon, Portugal, Apr. 2004.
[21] ——, “A new family of embedded multiple description scalar quantizers,” in Proc. IEEE Int. Conf. Image Processing, Singapore, 2004, vol. 1, pp. 159–162.
[22] B. Pesquet-Popescu and V. Bottreau, “Three-dimensional lifting schemes for motion compensated video compression,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Salt Lake City, UT, 2001, vol. 3, pp. 1793–1796.
[23] I. Daubechies and W. Sweldens, “Factoring wavelet transforms into lifting steps,” J. Fourier Anal. Appl., vol. 4, no. 3, pp. 247–269, Mar. 1998.
[24] V. A. Vaishampayan and J. Domaszewicz, “Design of entropy-constrained multiple description scalar quantizers,” IEEE Trans. Inf. Theory, vol. 40, no. 1, pp. 245–250, Jan. 1994.
[25] T. Guionnet, C. Guillemot, and S. Pateux, “Embedded multiple description coding for progressive image transmission over unreliable channels,” in Proc. IEEE Int. Conf. Image Processing, 2001, vol. 1, pp. 94–97.
[26] D. Taubman and M. W. Marcellin, JPEG2000 – Image Compression: Fundamentals, Standards and Practice. Hingham, MA: Kluwer, 2001.
[27] A. Munteanu, “Wavelet image coding and multiscale edge detection: Algorithms and applications,” Ph.D. dissertation, Dept. ETRO, Vrije Universiteit Brussel, Brussels, Belgium, 2003.
[28] P. Schelkens, A. Munteanu, J. Barbarien, M. Galca, X. Giro-Nieto, and J. Cornelis, “Wavelet coding of volumetric medical datasets,” IEEE Trans. Med. Imag., vol. 22, no. 3, pp. 441–458, Mar. 2003.
[29] J. E. Fowler, “The redundant discrete wavelet transform and additive noise,” Tech. Rep. MSSU-COE-ERC-04-04, Dept. Elect. Comput. Eng., Mississippi State Univ., Mississippi State, MS, Mar. 2004.
[30] Y. J. Liang, J. G. Apostolopoulos, and B. Girod, “Model-based delay-distortion optimization for video streaming using packet interleaving,” presented at the Asilomar Conf. Signals, Systems, and Computers, Pacific Grove, CA, 2002.
[31] Y. Shoham and A. Gersho, “Efficient bit allocation for an arbitrary set of quantizers,” IEEE Trans. Acoust., Speech, Signal Process., vol. 36, no. 9, pp. 1445–1453, Sep. 1988.
[32] E. A. Riskin, “Optimum bit allocation via the generalized Breiman, Friedman, Olshen, and Stone algorithm,” IEEE Trans. Inf. Theory, vol. 37, no. 3, pp. 400–402, Mar. 1991.
[33] H. Everett, “Generalized Lagrange multiplier method for solving problems of optimum allocation of resources,” Oper. Res., vol. 11, no. 3, pp. 399–417, May 1963.
[34] U. Horn, K. Stuhlmüller, M. Link, and B. Girod, “Robust internet video transmission based on scalable coding and unequal error protection,” Signal Process.: Image Commun., vol. 15, no. 1-2, pp. 77–94, Sep. 1999.
[35] R. Puri, K.-W. Lee, K. Ramchandran, and V. Bharghavan, “Forward error correction (FEC) codes based multiple description coding for internet video streaming and multicast,” Signal Process.: Image Commun., vol. 16, no. 8, pp. 745–762, May 2001.
[36] A. Secker and D. Taubman, “Lifting-based invertible motion adaptive transform (LIMAT) framework for highly scalable video compression,” IEEE Trans. Image Process., vol. 12, no. 12, pp. 1530–1542, Dec. 2003.
Fabio Verdicchio was born in Sorrento, Italy, in 1977. He received the M.Sc. degree in electronic engineering from the Università Federico II, Naples, Italy, in 2002. He is currently pursuing the Ph.D. degree in the field of error-resilient video coding at the Department of Electronics and Informatics (ETRO), Vrije Universiteit Brussel (VUB), Brussels, Belgium, which he joined in 2002. His research interests include wavelets, scalable image/video coding, and transmission of compressed video over networks. Mr. Verdicchio contributes to the ISO/IEC JTC1/SC29/WG11 (MPEG) committee (Scalable Video Coding Group) and serves as a Reviewer for the IEEE TRANSACTIONS ON IMAGE PROCESSING.
Adrian Munteanu was born in Constanta, Romania, in 1970. He received the M.Sc. degree in electronics and telecommunications from the Politehnica University of Bucharest, Bucharest, Romania, in 1994, the M.Sc. degree in biomedical engineering from the Technical University of Patras, Patras, Greece, in 1996, and the Ph.D. degree in applied sciences from the Vrije Universiteit Brussel (VUB), Brussels, Belgium, in 2003. Since October 1996, he has been a member of the Department of Electronics and Informatics (ETRO), VUB, and since July 2003, he has been a Postdoctoral Researcher at ETRO. His research interests include scalable still-image and video coding, multiresolution image analysis, image and video transmission over networks, video segmentation and indexing, and statistical modeling. He is the author or coauthor of more than 120 scientific publications, patent applications, and contributions to standards, and he has contributed to four books in his areas of interest.
Augustin I. Gavrilescu was born in Bucharest, Romania, in 1975. He graduated from the Applied Electronics Department, Faculty of Electronics and Telecommunications, Politehnica University of Bucharest (PUB), in 1999. He then pursued postgraduate studies in biomedical engineering (English profile) in the Department of Applied Science, PUB, receiving the M.Sc. degree in biomedical engineering from PUB in 2000. In May 2000, he was awarded an Erasmus/Socrates grant by the Department of Electronics and Informatics (ETRO), Vrije Universiteit Brussel (VUB), Brussels, Belgium, to prepare his M.Sc. thesis. Since October 2000, he has been a member of ETRO, VUB, where he is currently pursuing the Ph.D. degree.
Jan Cornelis was born in Wilrijk, Belgium, in 1950. He received the M.Sc. and Ph.D. degrees from the Vrije Universiteit Brussel (VUB), Brussels, Belgium, in 1973 and 1980, respectively. He is a Professor of electronics, medical imaging, and digital image processing at the VUB, where he is also currently the Vice-Rector for Research. He is the Director of the Department of Electronics and Informatics (ETRO), Faculty of Engineering, VUB, and coordinates the research group on image processing and machine vision (IRIS). The current research directions of the IRIS group include applications in medical imaging, computer vision, remote sensing, mine and minefield detection, and the mapping of algorithms onto architectures for real-time applications. His own research interest is mainly concentrated in the area of image and video compression. He is the author or coauthor of more than 400 scientific publications.
Peter Schelkens (M’97) received the Electrical Engineering degree in applied physics in 1994, the degree of Medical Physicist in 1995, and the Ph.D. degree in applied sciences in 2001, all from the Vrije Universiteit Brussel (VUB), Brussels, Belgium. Since October 1994, he has been a member of the Department of Electronics and Informatics (ETRO), VUB, where he currently holds a professorship. He coordinates a research team in the fields of image and video compression, 3-D graphics, and error-resilient communication. The team is involved in the ISO/IEC JTC1/SC29/WG1 (JPEG2000) and WG11 (MPEG) committees. Prof. Schelkens is the Belgian Head of Delegation for the ISO/IEC JPEG standardization committee and chair/editor of Part 10 of JPEG2000. He is also affiliated as a Scientific Advisor to the Interuniversity Microelectronics Center (IMEC), Leuven, Belgium, and he is a member of the board of directors of the Interdisciplinary Institute for Broadband Technology (IBBT), Ghent, Belgium.