IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 16, NO. 1, JANUARY 1998
Guest Editorial
Very Low Bit-Rate Video Coding II

I. PERSPECTIVE AND CHALLENGES

In the last few years, novel and very demanding applications, such as desktop videoconferencing and mobile communications, have given a renewed impetus to research in the area of very low bit-rate video coding. The main technical issue of very low bit-rate video communication is clearly the video coding technique, which is required to accomplish the necessary bit rate with sufficient image quality and reasonable hardware cost. Established coding standards, such as MPEG 2 and CCITT H.261, mainly attempt to diminish the statistical redundancy of image data. These coding schemes do not follow any perceptual image model and, when used at high compression ratios, they produce reconstructed images with annoying artifacts, such as block and mosquito artifacts, blurring, and contour smoothing. Therefore, there is an increasing interest in a framework of very low bit-rate video coding and compression techniques. It is expected that these techniques will eliminate redundant information within and between frames, taking advantage of the properties of the human visual system.

In particular, content-based coding methods try to describe the scenes in terms of uniformly textured regions surrounded by contours, in such a way that the regions correspond, as faithfully as possible, to the objects in the scene. The underlying image model takes into account what is probably the most significant stimulus to the visual system: discontinuities or edge information. Much effort is put into extracting and coding these “contours.” The remaining features are seen as “textures” and are coded roughly as some type of homogeneous distribution of grey level values. Due to the proximity of this image model to subjective perception, a better tradeoff between quality and compression may be reached. Moreover, as the compression ratio increases, the reconstructed video sequence shows a fairly graceful degradation of image quality.
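To give a rough sense of the compression ratios at stake in the very low bit-rate regime, the following minimal sketch compares an uncompressed source rate with a target channel rate. The source format is an assumption made purely for illustration and is not taken from this editorial: QCIF frames (176 x 144 pixels), 4:2:0 chroma subsampling (12 bits per pixel on average), and 10 frames per second. The 10–30 Kbps channel range is the one quoted later in this editorial for the codec of Tham et al. The function names are hypothetical.

```python
# Back-of-the-envelope estimate of the compression ratio that a very low
# bit-rate video codec has to deliver.
#
# ASSUMPTIONS (illustrative only, not from the editorial): QCIF frames of
# 176 x 144 pixels, 4:2:0 chroma subsampling (12 bits per pixel on
# average), and 10 frames per second.


def raw_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Return the uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


def required_compression_ratio(raw_bps, channel_bps):
    """Return how many times the raw stream must shrink to fit the channel."""
    return raw_bps / channel_bps


if __name__ == "__main__":
    raw = raw_bit_rate(176, 144, 12, 10)
    print(f"raw source rate: {raw} bit/s")
    # 10 and 30 kbit/s bracket the range mentioned in the editorial.
    for channel in (10_000, 30_000):
        ratio = required_compression_ratio(raw, channel)
        print(f"{channel} bit/s channel -> about {ratio:.0f}:1")
```

Under these assumptions the source rate is 3,041,280 bit/s, so the codec would need a compression ratio of roughly 304:1 at 10 kbit/s and 101:1 at 30 kbit/s. Other frame sizes or frame rates scale these figures proportionally.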
Indeed, the interest in, and the great potential benefit of, very low bit-rate video coding have been the main reason for the current standards effort for MPEG 4.

II. PAPERS IN THIS ISSUE
The December 1997 and January 1998 issues of the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS document recent results of active research in the field of very low bit-rate video coding. The scope of the included papers illustrates the scientific complexity of the area and the diversity of the approaches followed. In a broad sense, the last issue covered research work of a more “traditional” nature, while this issue includes papers that follow more “cutting edge” approaches. The papers in this issue are divided into the following four categories: subband/wavelet techniques, model/object-based video coding, emerging video coding methods, and system implementation.

Publisher Item Identifier S 0733-8716(98)01659-X.

A. Subband/Wavelet Techniques

Subband- and wavelet-based techniques are the focus of the work described in two papers. Cinkler in “Very Low Bit-Rate Wavelet Video Coding” describes a very low bit-rate coding scheme that utilizes edge-sensitive subband coding for spatial redundancy reduction and window overlapped block-matching motion compensation for temporal redundancy reduction. In this way only significant regions of frame differences are coded, keeping the computational cost low. Significant regions are image areas with temporal and/or spatial activity. The activity of the areas is determined by an adaptive strategy based on the motion vectors (for temporal activity) and the edge map (for spatial activity). The results obtained demonstrate good coding performance without blocking effects in the decoded image.

In “Highly Scalable Wavelet-Based Video Codec for Very Low Bit-Rate Environment,” by Tham et al., a three-dimensional (3-D) wavelet packet decomposition is used, together with a 3-D extension of the zerotree data structure. Scalability is introduced through the use of resolution block coding. The resulting codec exhibits performance comparable to H.263 in the 10–30 Kbps range, with the added benefit of both multirate and multiresolution scalability.

B. Model/Object-Based Video Coding

The increased interest in model- and object-based video coding is reflected in the following five papers. “A Hybrid Model-Based Image Coding System for Very Low Bit-Rate Coding” by Li and Chen proposes a hybrid algorithm for model-based coding that uses a two-stage global motion estimation method. In the first stage, good image features are extracted using a steerable pyramid and tracked by a weighted correlation method. A set of initial motion parameters is computed from the initial correspondence matching.
The motion parameters are refined using a gradient-based method in a second stage. The two stages are applied hierarchically to remove the influence of local details. This hierarchical application allows the method to cope with large object motion. Compared with known correspondence matching approaches, systematic errors due to displacements are avoided because the motion is measured using the two stages. Simulation results, presented in the paper, suggest that the proposed model-based coding scheme performs very well in terms of peak signal-to-noise ratio (PSNR) and compression ratio.

Several object-oriented enhancements to the H.263 standard are introduced by Hartung et al. in “Object-Oriented H.263 Compatible Video Coding Platform for Conferencing Applications,” including segmentation into objects of interest, segmentation-based pre- and post-filtering, model-assisted rate control, and adaptive vector quantization. Both software and real-time hardware implementations of the enhanced codec are discussed. The effectiveness of the enhancements is demonstrated through experimental results and by the ranking achieved by the software version of this codec in the recent MPEG 4 trials.

Han and Woods employ a Markov random field-based model for simultaneous segmentation and motion estimation in “Adaptive Coding of Moving Objects for Very Low Bit Rates.” Through the temporal linking of objects and the use of spatial color information in the motion estimation, both the encoding of object boundaries and the treatment of uncovered regions are facilitated. PSNR performance comparable to that of H.263 is obtained, with greatly improved visual quality.

In “Video Representation with Three-Dimensional Entities,” by Martins and Moura, a very low bit-rate coding technique based on textured object models and pose estimates is introduced. In addition to providing a means for achieving high compression, this approach facilitates an object-based approach to video handling. The construction of the object models and pose estimates from depth and texture measurements is described, as is the reconstruction of video frames from the 3-D scene model. Compression ratios in the range of 15 : 1 to 2700 : 1 are demonstrated for some simple synthetic and real video sequences.

Calvagno et al. use extended Kalman filtering for the joint estimation of motion and object shape in “Three-Dimensional Motion Estimation of Objects for Video Coding.” Advantages of joint motion and shape estimation include more reliable estimates in the presence of noise, improved motion estimates for abrupt changes in motion, and generally more natural motion estimates. The model used also facilitates the efficient coding of motion information.
The effectiveness of this approach for video compression is demonstrated in a simple codec.

C. Emerging Video Coding Methods

Emerging methods to overcome the limitations of existing very low bit-rate approaches are described in the following two papers. Efficient coding of stereo video sequences is the subject of “Very Low Bit-Rate Coding Algorithm for Stereo Video with Spatiotemporal HVS Model and Binary Correlation Disparity Estimator” by Pei and Lai. The work, utilizing the suppression and contrast sensitivity properties of the human visual system, is focused on a coding scheme that combines a spatiotemporal model and a binary correlation disparity estimator. Their work suggests that the proposed scheme reduces the video signal redundancy and computational complexity without degrading the subjective image quality. The authors compare their scheme with other existing and proposed stereo video coding systems, indicating a compression gain of 1.5–2 times, i.e., satisfactory subjective image quality of reconstructed full-color stereo video sequences at 0.25–0.4 bits per pixel.

“Weighted Finite Automata for Video Compression” by Hafner et al. presents a very low bit-rate video compression scheme based on weighted finite automata (WFA). WFA exploit self-similarities, in a fractal-like manner, within a frame and across a sequence of frames to remove spatial and temporal redundancies. The application of WFA is a combination of methods derived from hierarchical quadtree segmentation and vector quantization and can achieve compression performance equivalent to state-of-the-art codec designs. Furthermore, the simple mathematical structure of WFA provides an ideal platform for efficient hybrid compression implementations. The authors present experimental results of an intraframe and interframe algorithm that utilizes WFA and describe entropy coding modules suitable for fast and efficient video compression.

D. System Implementation

The final paper in the issue addresses the subject of efficient hardware implementation of very low bit-rate digital systems. Nachtergaele et al. in “Low-Power Data Transfer and Storage Exploration for H.263 Video Decoder System” describe a power exploration methodology for very low bit-rate decoding applications. The methodology is based on the analysis of the required data transfers between storage and processing sites and the sequence of execution. This can provide a systematic identification of areas for potential optimization of storage usage and data transfers. The authors employ a prototype environment for memory management, called ATOMIUM, that can lead to decoding system designs with significantly reduced power consumption. The results presented in the paper, for an H.263 video decoder, suggest that the power consumption of initial implementations can be reduced by an order of magnitude.

ACKNOWLEDGMENT

The Guest Editors would like to thank the authors of all papers submitted to this issue. The wealth of research in very low bit-rate video coding is evident from the large number of submissions received. The Guest Editors would also like to acknowledge the contribution of the many experts who participated in the reviewing process of submitted manuscripts. The quality of this issue is due in large part to the constructive suggestions that came out of the review process.
In addition, the Guest Editors would like to thank Ms. S. L. McDonald and Dr. A. M. Bush, J-SAC Board Representative, for their guidance. Finally, the Guest Editors would like to acknowledge the support of their respective organizations, Japan Broadcasting Corp. (NHK), Aspex Microsystems Ltd., and University of California, Davis.

KAZUMASA ENAMI, Guest Editor
Japan Broadcasting Corp. (NHK)
Tokyo, Japan

ANARGYROS KRIKELIS, Guest Editor
Aspex Microsystems Ltd.
Uxbridge, UB8 3PH U.K.

TODD R. REED, Guest Editor
University of California, Davis
Davis, CA 95616 USA

A. M. BUSH, J-SAC Board Representative
Kazumasa Enami was born in Nagoya, Aichi Prefecture, Japan, on July 13, 1948. He received the Bachelor of Science degree in electrical engineering from the Tokyo Institute of Technology, Tokyo, in 1971 and the Doctorate of Engineering degree in electrical engineering from the Tokyo Institute of Technology, Tokyo, in 1984. He began his professional career with NHK (Nippon Hoso Kyokai: Japan Broadcasting Corporation) in 1971. He is currently Director of the Multimedia Services Research Division at the Science and Technical Research Laboratories of NHK, where he is responsible for research on digital video signal processing, the design of parallel computer architecture, desktop program production technology, and multimedia services for broadcasting. Dr. Enami is a member of the Board of Directors of the Foundation for Intelligent Physical Agents and of the Board of Directors of the Institute of Television Engineers of Japan (ITE). He is also a member of the Society of Motion Picture and Television Engineers (SMPTE) and the Institute of Electronics, Information and Communication Engineers of Japan (IEICE Japan).
Anargyros (Argy) Krikelis (S’83–M’83) received the Diploma of Electrical Engineering and Electronics from the University of Patras (Greece) and the M.Sc. in digital systems and Ph.D. degrees from Brunel University, Uxbridge, U.K. He is the Chief Scientist, a member of the Management Council, and a founding member of Aspex Microsystems Ltd., leading the applications division of the company. He is also a Research Fellow at the Department of Electrical Engineering and Electronics at Brunel University. His research interests are multimedia processing and applications, massively parallel computer architectures, video and image processing applications of massively parallel computation, associative architectures, programming methods and tools for parallel architectures, parallelizing compilers, and neural computation. He is the author of over 40 contributions to international conferences, journals, and books. His involvement as a technical leader includes projects in image, vision, and video processing and massively parallel processing involving system and application development for areas such as computer vision, digital television, high-energy physics, airborne/spaceborne surveillance and tracking, speech recognition, and database management. Dr. Krikelis has been Guest Editor for special issues of the Journal of Parallel and Distributed Computing, IEEE COMPUTER, and Parallel Computing. He serves on the Editorial Board of IEEE CONCURRENCY and is the Organizing Chair for the Workshop on Parallel Processing and Multimedia, part of the annual International Parallel Processing Symposium (IPPS). He is a member of the IEEE Computer Society.
Todd R. Reed (S’76–M’77–SM’90) received the B.S., M.S., and Ph.D. degrees in electrical engineering from the University of Minnesota, in 1977, 1986, and 1988, respectively. From 1977 to 1983 he was an Electrical Engineer at IBM (San Jose, CA, Rochester, MN, and Boulder, CO), and from 1984 to 1986 he was a Senior Design Engineer for Astrocom Corp., St. Paul, MN. He served as a consultant to the MIT Lincoln Laboratory from 1986 to 1988. In 1988 he was a Visiting Assistant Professor in the Department of Electrical Engineering, University of Minnesota. From 1989 to 1991 he was the head of the image sequence processing research group in the Signal Processing Laboratory, Department of Electrical Engineering, at the Swiss Federal Institute of Technology in Lausanne. He is currently an Associate Professor in the Department of Electrical and Computer Engineering, University of California, Davis. His research interests include image and image sequence processing and coding, multidimensional digital signal processing, and computer vision. Dr. Reed is a member of the European Association for Signal Processing, the Association for Computing Machinery, the Society for Industrial and Applied Mathematics, Tau Beta Pi, and Eta Kappa Nu.