Coding Image Sequence Intensities Along Motion Trajectories Using EC-CELP Quantization

Majid Foodeei (1,2) and Eric Dubois (2,1)

ABSTRACT
The goal of this work is to investigate a high quality, low bit rate image sequence coding scheme which takes full advantage of the temporal redundancies. Such schemes will be important for high-compression video coding, such as the one studied in MPEG-4. In conventional image sequence coding, a motion-compensated DPCM configuration is used to remove the temporal redundancies. The performance of DPCM for such highly correlated sources degenerates substantially at rates below one bit per sample (bps). We have recently shown that for a highly correlated Gauss-Markov source, practical near rate-distortion function performance is possible. We used the entropy-constrained code-excited linear predictive (EC-CELP) quantization scheme to obtain such performance. Hence, for low rate image sequence coding, an EC-CELP configuration can replace the DPCM configuration to quantize the image intensities along motion trajectories and so improve the rate-distortion performance. To apply EC-CELP quantization, we first investigate methods of motion trajectory estimation and propose a temporal block motion compensation configuration. Then, the performance advantage of the EC-CELP configuration over the conventional DPCM structures is shown.

This research was supported in part by a grant from the Canadian Institute for Telecommunications Research (CITR) under the NCE program of the Government of Canada.
(1) Electrical Engineering, McGill University, 3480 University Street, Montreal, PQ, Canada H3A 2A7
(2) INRS-Télécommunications, Université du Québec, 16 Place du Commerce, Verdun, PQ, Canada H3E 1H6
E-mail: foodeei, [email protected]

1. Introduction

Motion trajectories (MTs) trace out the projection of scene points in the image plane during the time they are visible in the image. Motion-compensated coding (along the MTs) is an efficient technique for removing temporal redundancies in image sequence coding. For motion compensation, the MT model is ordinarily limited to a linear model and the motion estimation is carried out over two image fields. Assuming high correlation between the corresponding samples of the current frame and the motion-compensated neighboring frame, predictive coding is then used in a DPCM structure. The residual prediction error image is quantized using a spatial block quantizer. As the last stage of the encoder, lossless coding is applied.

At lower bit rates (less than one bps), the rate-distortion performance of DPCM degenerates substantially. According to classical source coding theory, better compression should be possible by exploiting the temporal redundancies beyond two fields. The practical constraints to be taken into account include the coding delay and the complexity.

The intensities along MTs can be modeled as a highly correlated Gauss-Markov process. For such a source model, we have recently shown [1] [2] that near rate-distortion function (RDF) performance is possible using the EC-CELP coding method. It was shown that the high coding performance is maintained even at low bit rates and with practical delay and complexity. The main goal of this paper is to examine the application of EC-CELP to coding intensities along MTs. The required MT estimation and coding configuration are also studied. The performance of this coding scheme is compared with that of the conventional DPCM structure.

2. Image Deck Trajectory Estimation

Fig. 1 shows a sequence of image fields grouped into coding and MT estimation image decks (only the spatial x dimension is shown). Assuming that the EC-CELP uses temporal coding blocks of N samples (N = 3 in this example), the encoder has a coding delay of N fields. The proposed MT estimation is carried out over N + 1 fields (a deck of N + 1 images). The output of the MT estimator for that deck is an MT parameter field, in which each MT is described by a set of parameters. As in conventional video coding, the parameter fields must also be coded and transmitted to the decoder. These parameters are used to obtain the motion-compensated coding image deck (the intensities along the MTs). As seen in Fig. 2, this deck is highly correlated along the temporal axis. Either forward or backward motion estimation and coding can be used. In these figures, for forward motion estimation, the MTs are forced to pass through the spatial sampling points of the last frame of each estimation deck. Notice that the MT estimation deck includes a frame from the previous coding deck. The reason becomes clear in Section 3, where the required "continuity" of the MTs is explained.

In each of the MT estimation decks k-2, k-1, and k in Fig. 1, one MT is traced. In deck k, the actual MT (dotted) is linear over the estimation deck. The MT of deck k-1 is quadratic. Finally, a higher-order MT is shown for deck k-2.

To obtain the MT parameters, one alternative is to extend the linear displacement field formulation of motion estimation [3] to a piecewise-linear scheme over the fields of the estimation deck. The advantage of this alternative, as shown by the solid displacement vectors, is that all three MTs shown in Fig. 1 (including the one that follows a higher than quadratic motion model) can be approximated. The estimation has to be carried out for consecutive neighboring fields, starting from the spatial sampling points of the last frame in the deck.

The other alternative is to estimate the parameters of a trajectory using all N + 1 fields within the estimation deck. Each MT passes through one spatial sampling point of the last field (the reference field), and the estimation is based either on the conventional linear motion model or on a quadratic model which incorporates both velocity and acceleration. The advantage of this method is a smaller set of motion parameters (when the MT temporal block length is greater than 2). The details of the second alternative are found in [3] [4]. The details of the first alternative will be presented in a future publication.
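The two trajectory descriptions of this section can be contrasted with a small sketch. It assumes a quadratic parameterization x(tau) = x_ref + v*tau + 0.5*acc*tau^2 with tau measured in fields from the reference (last) field; this parameterization, the function names, and the numerical values are illustrative assumptions and are not taken from [3] [4].

```python
import numpy as np

def quadratic_trajectory(x_ref, vel, acc, taus):
    """Positions x(tau) = x_ref + vel*tau + 0.5*acc*tau**2 (assumed model)."""
    x_ref = np.asarray(x_ref, dtype=float)
    vel = np.asarray(vel, dtype=float)
    acc = np.asarray(acc, dtype=float)
    taus = np.asarray(taus, dtype=float)[:, None]  # column for broadcasting
    return x_ref + vel * taus + 0.5 * acc * taus**2

def piecewise_linear_trajectory(x_ref, displacements):
    """Chain field-to-field displacement vectors, starting at the reference field."""
    x_ref = np.asarray(x_ref, dtype=float)
    steps = np.cumsum(np.asarray(displacements, dtype=float), axis=0)
    return np.vstack([x_ref, x_ref + steps])

# Estimation deck of N + 1 = 4 fields; tau = 0 is the reference (last) field.
N = 3
taus = np.arange(0, -(N + 1), -1)            # 0, -1, -2, -3
quad = quadratic_trajectory((10.0, 20.0), (1.5, 0.0), (0.2, 0.0), taus)

# A piecewise-linear description needs one displacement vector per field pair.
disp = np.diff(quad, axis=0)
pw = piecewise_linear_trajectory((10.0, 20.0), disp)

n_params_piecewise = disp.size               # 2 * N scalars per trajectory
n_params_quadratic = 4                       # velocity and acceleration vectors
```

In this sketch both descriptions pass through the same points at the field instants, but the piecewise-linear one carries 2N scalars per trajectory against 4 for the quadratic model, which is where the parameter saving for block lengths greater than 2 comes from.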
3. EC-CELP for Image Deck Coding

To model the highly correlated samples along each of the MTs in Fig. 1 by a stationary first-order Gauss-Markov process, we assume that the effect of the discontinuity at the coding deck borders is taken care of by the MT "continuity" enforcement. By high correlation we mean a correlation parameter a between 0.95 and 1. At lower bit rates, where the DPCM quantization scheme performs poorly, EC-CELP can overcome most of the problems associated with high-quality, low-delay, low bit rate coding of such sources [1] [2]. The difficulties are due to contradicting requirements and to the failure of the traditional high-rate models and analysis. The challenge is to combine effectively the various source coding schemes suited to the source model. EC-CELP implicitly combines the advantages of vector quantization (VQ), predictive coding (PC), and analysis-by-synthesis with the merits of entropy-constrained codebook design. In EC-CELP quantization, to obtain locally optimum variable-length block-predictive coding with respect to a fidelity criterion, an extension of the entropy-constrained VQ (ECVQ) algorithm based on a Lagrangian formulation is derived. For low bit rates and highly correlated sources, EC-CELP followed by an entropy coder has a clear performance advantage over alternative coding configurations such as the entropy-constrained DPCM coder (EC-DPCM) [5], entropy-constrained VQ (ECVQ) [6], and entropy-constrained block transform quantization (EC-BTQ) [7]. As the simulation results for EC-CELP operating on the Gauss-Markov model show (Fig. 3), the performance of EC-CELP is the closest to the RDF without excessive delay and complexity. Since the resulting codebook size is relatively small, the coder complexity is not too high. More importantly, for highly correlated signals, EC-CELP is the only coder which can come close to the RDF with a practical block length of N = 3 and hence low delay.

Fig. 4 shows the block diagram of the EC-CELP encoder and decoder. To examine the advantages of better exploitation of temporal redundancy, let us assume that the EC-CELP is to code the intensities along one single MT and that we are not concerned with the spatial redundancies. The input vector (formed by N = 3 samples from one coding image deck k in Fig. 1), $S = (s(0), s(1), s(2))$, is encoded by the CELP encoder ($\Phi$), which generates the output index $I \in \mathcal{I}$. This index is the input to the entropy encoder ($\Gamma$), which produces the output $C \in \mathcal{C}$ ($\mathcal{C} = \{C_I\}_{I \in \mathcal{I}}$). The inverse of the entropy coder at the decoder ($\Gamma^{-1}$) recovers the index $I$. The CELP decoder ($\Psi$) reconstructs the signal $\hat{S}$ from this index.

The CELP encoder, which has an analysis-by-synthesis structure, uses an exhaustive search through the excitation signal codebook. The filter Zero-State-Response (ZSR) and Zero-Input-Response (ZIR) are separated to reduce the complexity. However, as seen in Fig. 1, the coding deck k is not actually the evolution of the (k-1)-th deck, and hence to update the ZIR memory we need the MT continuation into the last field of the previous coding deck ($s(-1)$ in Fig. 1). Since this value is obtained by interpolation using the reconstructed previous deck, interpolation error noise is introduced into the coder. However, as seen from the results, and because EC-CELP incorporates this noise into the closed loop, the adverse effect is tolerable. This is what was referred to earlier as enforcing the required "continuity" of the MTs. As shown in Fig. 4, the CELP encoder first generates the ZIR error signal $E_{ZIR}$ with the above consideration.
Then each codevector $V(I)$ (with index $I$) from the codebook is passed through the zero-state synthesis filter to obtain the ZSR candidates $\hat{S}_{ZSR}(I)$ for the current input signal vector $S$. The index of the candidate which yields the minimum entropy-constrained cost is sent to the lossless entropy coder. For the case of a stationary Gauss-Markov source, the predictor coefficient value is the same as the Gauss-Markov coefficient a.

The EC-CELP algorithm is an iterative descent algorithm which finds the convex hull of the N-th order operational RDF. The lower bound to the operational RDF is the N-th order RDF, which becomes the RDF as N goes to infinity (for squared error). The average distortion is $D = (1/N)\,E[\rho(S, \hat{S})]$ and the average rate is $R = (1/N)\,E[\mathrm{len}(S)]$ bps, where $\mathrm{len}(S) = |\Gamma(\Phi(S))|$ is the length of the codevector representing $S$ and $\rho(S, \hat{S})$ is the distortion between $S$ and $\hat{S}$. The convex hull is found by minimization of the functional

$$J(\Phi, \Gamma, \Psi) = E\left[\rho(S, \Psi(\Phi(S))) + \lambda\,|\Gamma(\Phi(S))|\right].$$
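The codebook search described above can be sketched for the stationary first-order case. The sketch assumes squared-error distortion, a fixed predictor P(Z) = a z^-1 (so the synthesis filter is 1/(1 - a z^-1)), and a toy codebook with made-up lengths; all function names and numerical values are hypothetical and are not taken from [1] [2].

```python
import numpy as np

def zero_state_response(v, a):
    """Output of 1/(1 - a z^-1) driven by v, starting from zero filter memory."""
    y = np.zeros(len(v))
    prev = 0.0
    for n, vn in enumerate(v):
        prev = vn + a * prev
        y[n] = prev
    return y

def zero_input_response(s_prev, a, n_samples):
    """Output with zero excitation and memory s_prev: a**(n+1) * s_prev."""
    return s_prev * a ** np.arange(1, n_samples + 1)

def ec_celp_search(s, s_prev, codebook, lengths, a, lam):
    """Exhaustive search minimizing squared error + lam * codeword length."""
    s = np.asarray(s, dtype=float)
    zir = zero_input_response(s_prev, a, len(s))
    e_zir = s - zir                      # ZIR error signal, the search target
    best = None
    for i, v in enumerate(codebook):
        zsr = zero_state_response(v, a)  # ZSR candidate for codevector i
        cost = np.sum((e_zir - zsr) ** 2) + lam * lengths[i]
        if best is None or cost < best[0]:
            best = (cost, i, zir + zsr)  # reconstruction = ZIR + ZSR
    cost, index, s_hat = best
    return index, s_hat, cost

# Toy example with N = 3; s_prev plays the role of s(-1) in Fig. 1.
a = 0.98
codebook = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
lengths = np.array([1.0, 2.0, 2.0])      # codeword lengths in bits (made up)
s = [10.8, 10.5, 10.4]

idx_low, s_hat_low, _ = ec_celp_search(s, 10.0, codebook, lengths, a, lam=0.1)
idx_high, s_hat_high, _ = ec_celp_search(s, 10.0, codebook, lengths, a, lam=5.0)
```

With a small multiplier the search picks the codevector that tracks the jump in the input; with a large multiplier it falls back to the shorter all-zero codevector, which illustrates how the Lagrangian cost trades distortion against rate.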
For each $\lambda$, which gives a point on the operational rate-distortion convex hull, the iterative descent algorithm starts from an initial coder and repeatedly updates the mappings $(\Phi^{(t)}, \Gamma^{(t)}, \Psi^{(t)})$ until a stopping criterion for the convergence of the above functional is met. First, for a given $(\Gamma^{(t)}, \Psi^{(t)})$, the mapping $\Phi^{(t+1)}$ is obtained by using $\rho(S, \Psi(\Phi(S))) + \lambda\,|\Gamma(\Phi(S))|$ as the cost in the codebook search module (Fig. 4). In the second step, given $(\Phi^{(t)}, \Psi^{(t)})$, the codevector lengths ($\Gamma^{(t+1)}$) are updated. Here, as in the experiments of [6], we allow noninteger lengths. Finally, given $(\Phi^{(t)}, \Gamma^{(t)})$, the mapping $\Psi^{(t+1)}$ is obtained. This amounts to the centroid rule in the "closed loop" CELP design. As a result, for each nonempty Voronoi cell, a set of linear equations has to be solved; the solution constitutes the new codevector. However, this formulation is necessary only for the general case of a nonstationary Gauss-Markov source model, where the predictor $P(Z)$ in the CELP coder is adaptive. For the stationary Gauss-Markov case, there is no need to solve such linear systems: for each Voronoi cell, it suffices to accumulate an N-dimensional vector and divide it by the cell population. The accumulated vector is a simple function of the ZIR error signals ($E_{ZIR}$) for that cell. The EC-CELP algorithm is outlined in [1] [2] and will be presented in detail in a future publication.
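The second and third update steps can be sketched for the stationary case under two assumptions that are not spelled out in the text: the noninteger lengths are taken as the ideal values -log2(p) of the index probabilities, as in the ECVQ experiments of [6], and the distortion is squared error with a fixed predictor P(Z) = a z^-1. Under these assumptions the squared-error centroid is the cell mean of the ZIR error passed through the inverse filter 1 - a z^-1; this form follows from the squared-error criterion and is not quoted from [1] [2]. Function names are hypothetical.

```python
import numpy as np

def update_lengths(counts):
    """Ideal noninteger lengths -log2(p_i); unused indices get infinite length."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    with np.errstate(divide="ignore"):
        return -np.log2(p)

def stationary_centroid(e_zir_cell, a):
    """Codevector whose zero-state response equals the mean ZIR error of a cell."""
    mean_e = np.mean(np.asarray(e_zir_cell, dtype=float), axis=0)
    v = mean_e.copy()
    v[1:] -= a * mean_e[:-1]   # inverse filter 1 - a z^-1
    return v

def zero_state_response(v, a):
    """Output of 1/(1 - a z^-1) driven by v from zero memory (used as a check)."""
    y, prev = np.zeros(len(v)), 0.0
    for n, vn in enumerate(v):
        prev = vn + a * prev
        y[n] = prev
    return y

# Example: four indices with occupancy counts, and one cell with two members.
lengths = update_lengths([2, 1, 1, 0])
cell = [[1.0, 1.0, 1.0], [3.0, 2.0, 1.0]]
v_new = stationary_centroid(cell, a=0.5)
```

Passing the new codevector back through the zero-state synthesis filter reproduces the cell mean of the ZIR error, so no linear system needs to be solved in this case.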
4. MT Estimation and Coding Configurations

Fig. 5 shows the block diagram of a general motion-compensated image deck coder. For the k-th deck, N and N + 1 input image fields are buffered in decks $D(k)$ and $D_{MT}(k)$ as inputs to the motion-compensated image deck interpolator and the image deck MT estimator modules, respectively. The estimation module calculates the MT parameter field $T(k)$ using one of the methods of Section 2. Using $T(k)$, the interpolator module obtains the image values along the MTs; any of the known two-dimensional interpolation schemes can be used for this. The output of this module is the (motion-compensated) coding image deck $D'(k)$. The motion-compensated image deck encoder encodes $D'(k)$. The MT parameter field $T(k)$ is also coded by this module. The decoder performs the reverse operation to generate the reconstructed motion-compensated image deck $\hat{D}'(k)$. To obtain the image deck at the image grid points, $\hat{D}(k)$, we need to perform scattered-data to grid-data interpolation. Notice that this last operation, which is more complex and more error-prone than the first interpolation, is one disadvantage of the image deck coding proposed here. We used the method of [8] for this grid data interpolation. Since most of the gain of EC-CELP is obtained with a small temporal block size (N = 2 to 3), the relatively regular scattered data (the motion-compensated deck) does not result in large errors. Although we managed to provide simpler alternative interpolation methods for the special scattered data interpolation problem at hand, further work on this module may be needed.

The encoder and decoder blocks should include both temporal and spatial lossy and lossless compression techniques. In this first feasibility study, we are concerned with examining the advantages of better exploitation of temporal redundancy. Hence we assume that the EC-CELP codes the intensities along MTs without regard to the spatial redundancies of the EC-CELP codevector index fields. The motion-compensated image deck $D'(k)$ is the input to the EC-CELP encoder. The output of the EC-CELP, which is trained and designed using a set of test sequences, is a field of integer-valued codevector indices $I$. For the index $I$ and the spatial image coordinate $(x, y)$, we have $\{I_{x,y} \in \mathcal{I}, \forall (x, y)\}$. The entropy of these indices constitutes the rate of the coding scheme. In a real system, lossless coding has to be applied to these index fields. For a simple comparison of the above EC-CELP configuration with common motion-compensated coding, the EC-CELP may be replaced with a DPCM coder and their first-order entropies compared. In our experiments, we also did not consider the coding of the trajectory parameter fields $T(k)$. Similarly, we obtained results for an EC-DPCM as a special case of EC-CELP with temporal block size N = 1 (see the results in the next section). This design approach showed an improvement over the alternative EC-DPCM design method of [5] for a Gauss-Markov source at rates below one bps. From the results of Fig. 3, the performance of EC-CELP is expected to improve as larger N values are used. Owing to the EC design advantage, even an EC-CELP with N = 1 (our EC-DPCM) should outperform the simple DPCM.
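The rate measure used here, the first-order entropy of the index field, can be computed as in the following sketch. It assumes that each index represents a block of N temporal samples at one spatial position, so that the entropy in bits per index is divided by N to give bits per sample, in line with R = (1/N) E[len(S)] in Section 3. The function names and the toy index field are illustrative.

```python
import numpy as np

def first_order_entropy(index_field):
    """Empirical first-order entropy of an integer index field, in bits per index."""
    _, counts = np.unique(np.asarray(index_field).ravel(), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def rate_bits_per_sample(index_field, block_len):
    """Each index codes block_len temporal samples, so divide by block_len."""
    return first_order_entropy(index_field) / block_len

# Toy 2 x 4 index field with index probabilities 1/2, 1/4, 1/8, 1/8.
field = np.array([[0, 0, 1, 1],
                  [0, 0, 2, 3]])
h = first_order_entropy(field)                      # 1.75 bits per index
r = rate_bits_per_sample(field, block_len=2)        # 0.875 bits per sample
```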
As the next phase of the study, the spatial redundancies of the above index field will be reconsidered. One alternative is to apply higher-order lossless coding to the output index field; note that no spatial lossy coder is used in this alternative. As another alternative, a suitable spatial lossy coding technique may be introduced or implicitly incorporated in the structure. These alternatives are the subject of future studies.
5. Results and Conclusions

The deck motion estimation of Section 2 provided satisfactory results in generating a highly correlated signal with "continuity" of MTs (Fig. 2). To focus the simulations on coding gains, to eliminate the effects of motion estimation, and to minimize the grid data interpolation error in this evaluation, we generated a synthetic image sequence with known simple motion. The motion estimation methods of Section 2 resulted in a near perfect trajectory description. The sequence marble is generated using the Persistence of Vision Raytracer (POV) software [9], with a simple fractional-pixel translational motion. As seen in Fig. 6, the sequence does not include occlusion or newly exposed areas except at the borders (camera moving in front of the textured wall).

Fig. 6 shows the EC-CELP advantage by comparing the synthetic sequence marble coded using EC-CELP and DPCM. In agreement with these subjective performance comparisons, the clear advantage of the proposed EC-CELP configuration is seen in the rate-distortion performance comparisons of Table 1. The gains shown are due to better temporal redundancy removal. The EC gain is shown by comparing the DPCM results with the results of EC-CELP with N = 1 (our EC-DPCM). EC-CELP with N = 2 gives another substantial improvement over N = 1. However, the difficulties resulting from the grid data error accumulation, particularly for larger N (1 dB for N = 2), need further investigation. Future work should also include spatial redundancy and the other aspects of the complete coder described in the previous section.

REFERENCES

[1] M. Foodeei and E. Dubois, "Entropy-constrained code-excited linear predictive quantization (EC-CELP)," in IEEE Inter. Symposium on Information Theory, June 1994.
[2] M. Foodeei and E. Dubois, "Quantization theory and EC-CELP advantages at low bit rates," in IEEE Inf. Theory Workshop, Nov. 1994.
[3] E. Dubois and J. Konrad, "Estimation of 2D motion fields from image sequences with application to motion-compensated processing," in Motion Analysis and Image Sequence Processing (M. Sezan and R. Lagendijk, eds.), ch. 3, pp. 53-87, Kluwer Academic Publishers, 1993.
[4] M. Chahine and J. Konrad, "Estimation of trajectories for accelerated motion from time-varying imagery," in Proc. IEEE Int. Conf. Image Processing, Nov. 1994.
[5] N. Farvardin and J. W. Modestino, "Rate-distortion performance of DPCM schemes for autoregressive sources," IEEE Trans. Inf. Theory, vol. 31, pp. 402-418, May 1985.
[6] P. A. Chou, T. Lookabaugh, and R. M. Gray, "Entropy-constrained vector quantization," IEEE Trans. Acoust. Speech Signal Process., pp. 31-42, Jan. 1989.
[7] N. Farvardin and F. Y. Lin, "Performance of entropy-constrained block transform quantizers," IEEE Trans. Inf. Theory, vol. 37, pp. 1433-1439, Sept. 1991.
[8] D. T. Sandwell, "Biharmonic spline interpolation of GEOS-3 and SEASAT altimeter data," Geophysical Research Letters, vol. 14, pp. 139-142, Feb. 1987.
[9] C. Young, team coordinator, Persistence of Vision Raytracer, User's Documentation, version 2.0 ed.
Fig. 1 Coding image deck D(k) with temporal block size N = 3, the corresponding motion-compensated coding deck D'(k), and the MT estimation image deck D_MT(k), with size N + 1. Axes: time (t), x, y.
Fig. 2 Three fields from the Miss America sequence as (a) original and (b) motion-compensated image decks. Panel titles: "Original image deck" (a), "Motion compensated image deck" (b).
Fig. 3 Rate-distortion performance of EC-CELP (N = 3, 8; dashed), ECVQ (N = 3, 8; *), EC-BTQ (N = 8; dotted; expected to be similar to ECVQ for a = 0.99), and RDF (solid) for (a) a = 0.9 and (b) a = 0.98. Vertical axis: rate; horizontal axis: D/var_x (logarithmic scale).
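The solid RDF curve used as the benchmark in Fig. 3 is not written out in the text. A minimal sketch, assuming the standard reverse water-filling characterization of the Gaussian RDF under squared error and a unit-variance first-order Gauss-Markov source with coefficient a, is given below; the function name, the grid size, and the water levels are illustrative choices.

```python
import numpy as np

def gauss_markov_rdf(a, theta, n_grid=1 << 16):
    """Reverse water-filling at water level theta for a unit-variance source.

    Returns (D/var_x, rate in bits per sample)."""
    # Midpoint grid on (0, pi); the spectrum is even, so this half suffices.
    w = (np.arange(n_grid) + 0.5) * np.pi / n_grid
    psd = (1 - a**2) / (1 - 2 * a * np.cos(w) + a**2)
    dist = np.mean(np.minimum(theta, psd))
    rate = np.mean(np.maximum(0.0, 0.5 * np.log2(psd / theta)))
    return dist, rate

a = 0.98
theta_c = (1 - a) / (1 + a)          # minimum of the spectrum
d_c, r_c = gauss_markov_rdf(a, theta_c)
d_low, r_low = gauss_markov_rdf(a, 10 * theta_c)
```

For a = 0.98 the water level reaches the spectral minimum at a rate of log2(1 + a), roughly 0.99 bps, so rates below about one bps fall in the region where part of the spectrum lies under the water level and is not coded.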
Fig. 4 EC-CELP encoder and decoder block diagram (motion-compensated image deck temporal coding). Encoder ($\Phi$): input signal $S$, zero-input response (ZIR) filter 1/(1-P(Z)) giving $E_{ZIR}$, codebook $V(I)$, zero-state response (ZSR) filter 1/(1-P(Z)) giving $\hat{S}_{ZSR}$, codebook search module, index $I$, entropy coder ($\Gamma$), output $C$ to channel. Decoder ($\Psi$): $C$ from channel, inverse entropy coder ($\Gamma^{-1}$), index $I$, codebook, synthesis filter 1/(1-P(Z)), output signal $\hat{S}$.
Fig. 5 Motion-compensated image deck coding configuration block diagram. Blocks: input image, buffer, motion-compensated image deck interpolator, image deck MT estimator, motion-compensated image deck encoder and MT parameter encoder, channel, motion-compensated image deck decoder and MT parameter decoder, grid data interpolator, output image.
Fig. 6 Image sequence marble: (a) field 5 of the original; (b) field 6 of the reconstructed sequence (overall in Table 1) using EC-CELP with N = 2 (similar quality for EC-CELP with N = 1 and DPCM, at the rates of Table 1); (c) field 5 of the reconstructed sequence (overall) using EC-CELP with N = 2, showing the adverse effect of grid data interpolation.
Coder            PSNR, coder (dB)   PSNR, overall (dB)   Rate (bps)
DPCM             40.4               40.4                 1.2
EC-CELP N = 1    40.1               40.1                 0.6
EC-CELP N = 2    40.3               39.4                 0.4

Table 1 Performance comparison between DPCM and EC-CELP (N = 1, 2) for the image sequence marble, excluding a few image border samples ("overall" refers to the reconstructed sequence and, when applicable, includes the grid data interpolation effect).
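The PSNR definition behind Table 1 is not stated. A minimal sketch, assuming the common definition 10 log10(peak^2 / MSE) with a peak value of 255 for 8-bit intensities and a hypothetical `border` parameter for excluding a few border samples, is:

```python
import numpy as np

def psnr_db(original, reconstructed, peak=255.0, border=0):
    """PSNR in dB over the image interior; `border` pixels are cropped per side."""
    x = np.asarray(original, dtype=float)
    y = np.asarray(reconstructed, dtype=float)
    if border > 0:
        x = x[border:-border, border:-border]
        y = y[border:-border, border:-border]
    mse = np.mean((x - y) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak**2 / mse))

# Toy check: a uniform error of one grey level gives 20*log10(255) dB.
orig = np.zeros((8, 8))
rec = orig + 1.0
psnr_full = psnr_db(orig, rec)

# Large errors confined to the border are ignored once the border is cropped.
rec_border = rec.copy()
rec_border[0, :] = 100.0
psnr_cropped = psnr_db(orig, rec_border, border=1)
```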