IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 15, NO. 9, DECEMBER 1997
Guest Editorial
Very Low Bit-Rate Video Coding I

I. PERSPECTIVE AND CHALLENGES

The traditional boundaries between the telecommunications, computer, and TV/film industries are blurring. Elements that have historically belonged to each of the areas are being introduced into the other two. Video, sound, and communications are being added to computers; interactivity is being added to television; and video and interactivity are being added to telecommunications. In the research and commercial communities, three major trends are of particular interest:
• the trend toward wireless communications;
• the trend toward interactive computer applications;
• the trend toward integration of audio-visual data into an ever-increasing number of applications.

At the intersections of the traditionally separate areas of interest, these trends must be considered in combination; new expectations and requirements arise, which are not adequately addressed by current or emerging research and development activities. The most important aspect is the use of public switched telephone networks (PSTN’s) or mobile channels as transmission media, which are widely accessible to the general public. Transmission using such media requires that video data be transmitted at very low bit rates, in the range of 8–64 kb/s.

The main technical issue of very low bit-rate video communication is clearly the video coding technique, which is required to accomplish the necessary bit rate with sufficient image quality and reasonable hardware cost. Established coding standards, such as MPEG 2 and CCITT H.261, mainly attempt to diminish the statistical redundancy of image data. The coding schemes do not obey any perceptual image model and, when used for high compression ratios, the reconstructed images show annoying artifacts, such as block and mosquito artifacts, blurring, and contour smoothing. Therefore, there is an increasing interest in a framework of very low bit-rate video coding and compression techniques.
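To give a sense of the compression that such channels demand, the following sketch computes the raw bit rate of an uncompressed sequence and the compression ratio needed to reach the 8–64 kb/s range. The QCIF frame size (176 × 144), 4:2:0 chroma sampling, 8-bit samples, and 10 frames/s are illustrative assumptions, not figures taken from this editorial.

```python
def raw_bit_rate(width, height, fps, bits_per_sample=8, chroma_factor=1.5):
    """Raw bit rate in bit/s.

    chroma_factor=1.5 models 4:2:0 sampling: one full-size luma plane
    plus two quarter-size chroma planes (an assumption for illustration).
    """
    return width * height * chroma_factor * bits_per_sample * fps


def required_compression_ratio(raw_bps, channel_bps):
    """How many times the raw stream must shrink to fit the channel."""
    return raw_bps / channel_bps


# Assumed source format: QCIF at 10 frames/s (hypothetical example values).
QCIF_W, QCIF_H, FPS = 176, 144, 10
raw = raw_bit_rate(QCIF_W, QCIF_H, FPS)

for kbps in (64, 8):  # the two ends of the range quoted in the text
    ratio = required_compression_ratio(raw, kbps * 1000)
    print(f"{kbps} kb/s channel: about {ratio:.0f}:1")
```

Under these assumptions the raw stream is roughly 3 Mb/s, so the codec must compress by a factor of several tens at 64 kb/s and several hundreds at 8 kb/s.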
It is expected that these techniques will eliminate redundant information within and between frames, taking advantage of the properties of the human visual system. In particular, content-based coding methods try to describe the scenes in terms of uniformly textured regions surrounded by contours, in such a way that the regions correspond, as faithfully as possible, to the objects in the scene. The underlying image model takes into account what is probably the most significant stimulus to the visual system: discontinuities, or edge information. Much effort is put into extracting and coding these “contours.” The remaining features are seen as “textures” and coded roughly as some type of homogeneous distribution of grey-level values. Due to the proximity of this image model to subjective perception, a better tradeoff between quality and compression may be reached. Moreover, as the compression ratio increases, the reconstructed video sequence shows a fairly graceful degradation of image quality.

There is a vested commercial interest from existing international coding standards (i.e., MPEG 1 and MPEG 2) and other traditional research activities in video compression (e.g., model-based video compression, vector quantization, fractals, and wavelets) to develop techniques for improved quality in video applications at very low bit rates. It is expected that, because of the tremendous potential commercial interest in video coding applications, the different techniques will compete very hard to achieve the best possible results (and therefore influence telecomputing areas), exploiting contemporary and novel algorithms as well as the potential of dedicated or programmable high-performance digital system solutions. Indeed, the interest in and the great potential benefit of very low bit-rate video coding have been the main reason for the current standards effort for MPEG 4. Very low bit-rate coding is a prominent feature of MPEG 4, which aims to complete a standards definition in late 1998.

Publisher Item Identifier S 0733-8716(97)07689-0.

II. PAPERS IN THIS ISSUE
The December 1997 and January 1998 issues of the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS document recent results of active research in the field of very low bit-rate video coding. The scope of the included papers illustrates the scientific complexity of the area and the diversity of the approaches followed. In a broad sense, this December issue covers research work of a more “traditional” nature, while the January issue will be based on more “cutting edge” approaches. The papers in this issue are divided into the following three categories: enhanced transform video coding techniques, rate distortion, and channel efficiency.

A. Enhanced Transform Video Coding Techniques

The first three papers discuss improvements to traditional transform video coding techniques for achieving better results.

A scalable video coding scheme for low bit rates is described in “Spatially Scalable Video Compression Employing Resolution Pyramids” by Illgner and Müller. Their codec is based on motion compensated predictive coding, which has low delay and, according to the authors, is well suited for communications applications. The coding approach decomposes the frames to be coded into a Gaussian pyramid. Motion estimation and compensation are performed between corresponding pyramid levels of successive frames. This results in a pyramid of displaced frame differences, from which a least squares Laplacian pyramid is derived. The latter pyramid is quantized and coded. The encoder outputs an embedded bit stream, and its control may truncate the bit stream at any point to sustain a fixed bit rate. Simulation results of the scheme show that its coding gain is comparable to simulcast coding.
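The resolution-pyramid idea can be sketched in a few lines. The example below is a generic illustration, not the scheme of Illgner and Müller: it uses a 2 × 2 box filter as a stand-in for a Gaussian kernel, pixel replication for interpolation, and a plain difference pyramid rather than the least squares Laplacian pyramid of the paper. It also omits motion compensation and quantization. All function names are hypothetical.

```python
def downsample(img):
    """Halve resolution by averaging each 2x2 block (box filter used here
    as a simple stand-in for a Gaussian low-pass kernel)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1] +
              img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def upsample(img):
    """Double resolution by pixel replication."""
    out = []
    for row in img:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out


def lowpass_pyramid(img, levels):
    """Level 0 is the full-resolution image; each next level is half size."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr


def laplacian_pyramid(img, levels):
    """Each level stores the detail lost when moving to the coarser level."""
    g = lowpass_pyramid(img, levels)
    lap = []
    for fine, coarse in zip(g, g[1:]):
        up = upsample(coarse)
        lap.append([[f - u for f, u in zip(fr, ur)]
                    for fr, ur in zip(fine, up)])
    lap.append(g[-1])  # the coarsest level is kept as-is
    return lap


def reconstruct(lap):
    """Invert the decomposition, coarse to fine."""
    img = lap[-1]
    for detail in reversed(lap[:-1]):
        up = upsample(img)
        img = [[d + u for d, u in zip(dr, ur)] for dr, ur in zip(detail, up)]
    return img


# Synthetic 8x8 "frame"; sides must be divisible by 2**(levels - 1).
frame = [[float((x * y) % 17) for x in range(8)] for y in range(8)]
lap = laplacian_pyramid(frame, levels=3)
restored = reconstruct(lap)
```

Because each level refines the one below it, a decoder can stop after any level and still obtain a usable lower-resolution picture, which is the property that makes such a decomposition attractive for spatial scalability.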
0733–8716/97$10.00 © 1997 IEEE
Zhang et al. in “Image Sequence Coding Using Multiple Level Segmentation and Affine Motion Estimation” present a very low bit-rate codec using multiple level segmentation and affine motion compensation. The work addresses the issues involved in motion estimation of large regions. The authors propose a codec design that is based on a variable block size algorithm enhanced with global motion compensation. A predefined set of motion models and an inner block segmentation are used adaptively in motion compensation. The presented experimental results suggest, in the majority of cases, a better bit-rate performance [under the same peak signal-to-noise ratio (PSNR) constraints] when compared with fixed block size approaches and variable block size approaches where only translational motion compensation is utilized.

In “Very Low Bit-Rate Video Coding Using Variable Block-Size Entropy-Constrained Residual Vector Quantizers,” by Kwon et al., a multistage residual vector quantizer with transform vector quantizers in the initial stages is used to encode the motion compensated residual in a codec otherwise similar to H.263. Generally improved perceptual quality and PSNR performance are demonstrated as compared to H.263, with the more notable improvements at the lower (e.g., 5.4 kb/s) rates.

B. Rate Distortion

Rate distortion in very low bit-rate video coding techniques is the subject of the next three papers.

Tzovaras et al. in “Optimization of Quadtree Segmentation and Hybrid Two-Dimensional and Three-Dimensional Motion Estimation in a Rate-Distortion Framework” describe a rate-distortion framework and its use in defining a very low bit-rate coding scheme. The scheme is based on quadtree object segmentation and optimized selection of motion estimators. The object segmentation is performed using rate-distortion criteria, and it is fused with motion estimation for each leaf of the tree; i.e., the optimum motion estimator is adopted from a predetermined set of candidate motion estimators. As an extension, the authors use the rate-distortion optimization scheme for optimum allocation of the prediction error corresponding to the motion estimation in the transmitted information. The performance of the proposed scheme is evaluated in two versions of a very low bit-rate coder.

Schuster and Katsaggelos apply Lagrangian relaxation and dynamic programming to the bit allocation problem in “A Theory for Optimal Bit Allocation Between Displacement Vector Field and Displaced Frame Difference.” Using this approach in the H.263 framework, both superior rate-distortion performance (compared to the H.263 TMN4 codec) and advantages in rate control are obtained.

Using prediction and the local statistics of the motion field to reduce the motion vector search area, Kossentini et al. propose a motion estimation technique optimized for rate-distortion performance in “Predictive RD Optimized Motion Estimation for Very Low Bit-Rate Video Coding.” When applied in the H.263 framework, this approach demonstrates superior performance to the TMN5 coder both in compression and computational complexity.
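The Lagrangian view of rate-distortion optimization mentioned above can be illustrated with a small sketch. It is a generic textbook formulation, not the algorithm of any of the three papers: blocks are assumed independent (so no dynamic programming is needed), each block offers a few hypothetical (rate, distortion) operating points, and the multiplier is found by bisection. All names and numbers are invented for illustration.

```python
def pick_options(blocks, lam):
    """For each block, choose the (rate, distortion) pair minimizing
    the Lagrangian cost D + lam * R."""
    return [min(opts, key=lambda rd: rd[1] + lam * rd[0]) for opts in blocks]


def allocate(blocks, rate_budget, lo=0.0, hi=1e6, iters=60):
    """Bisection on the multiplier until the total rate fits the budget.

    Assumes the budget is feasible, i.e. the cheapest option of every
    block fits within rate_budget when lam is very large.
    """
    for _ in range(iters):
        lam = (lo + hi) / 2
        total_rate = sum(r for r, _ in pick_options(blocks, lam))
        if total_rate > rate_budget:
            lo = lam  # too many bits spent: penalize rate more strongly
        else:
            hi = lam  # budget met: try a smaller rate penalty
    return hi, pick_options(blocks, hi)


# Hypothetical operating points per block: (rate in bits, distortion).
blocks = [
    [(10, 100.0), (20, 40.0), (40, 10.0)],
    [(10, 50.0), (20, 30.0), (40, 25.0)],
    [(10, 200.0), (20, 90.0), (40, 20.0)],
]

lam, choice = allocate(blocks, rate_budget=70)
total_rate = sum(r for r, _ in choice)
total_distortion = sum(d for _, d in choice)
print(choice, total_rate, total_distortion)
```

A single multiplier shared by all blocks sends bits to wherever they reduce distortion most per bit; here the third block, whose distortion falls fastest with rate, receives the largest share of the budget.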
C. Channel Efficiency

Channel efficiency in wireless video coding, and ways of improving it, is the theme of the last two papers in the issue.

Efficient techniques for improving the error resilience of audiovisual services in very low bit-rate wireless communication systems are the subject of “Error Resilience in Video and Multiplexing Layers for Very Low Bit-Rate Video Coding Systems” by Lee et al. These techniques code the information simultaneously for synchronization and error protection or correction. The approach described in their work is intended to improve the performance of the multiplexing protocol, which combines the video and audio streams, and also to improve the robustness of the coded video. The reported work demonstrates, through simulation, that the approach is efficient with regard to the use of bits and effective against the bursty errors common in wireless channels. The authors’ results suggest that for a DECT channel their approach gives an order of magnitude improvement in the probability of lost packets in the multiplexer layer when compared with other conventional approaches. Furthermore, the paper shows that in the video layer an improvement of 1 to 2 dB over the ITU-T Recommendation H.263 is possible.

In “Performance of H.263 Video Transmission over Wireless Channels Using Hybrid ARQ,” Liu and El Zarki propose a hybrid automatic repeat request error control approach to deal with the changes in bit error rate and signal-to-noise ratio characteristic of wireless channels. This method is analyzed using a multistate Markov chain to model the channel at the data packet level, and the transmission of H.263 coded video over a wireless channel using this technique is simulated.

ACKNOWLEDGMENT

The Guest Editors would like to thank the authors of all papers submitted to this issue. The wealth of research in very low bit-rate video coding is evident from the large number of submissions received. They would also like to acknowledge the contribution of the many experts who participated in the reviewing process of submitted manuscripts. The quality of this issue is due in large part to the constructive suggestions that came out of the review process. In addition, the Guest Editors would like to thank Ms. S. L. McDonald and Dr. A. M. Bush, JSAC Board representative, for their guidance. Finally, they would like to acknowledge the support of their respective organizations, Japan Broadcasting Corp. (NHK), Aspex Microsystems Ltd., and University of California, Davis.

KAZUMASA ENAMI, Guest Editor
Japan Broadcasting Corp. (NHK)
Tokyo, Japan

ANARGYROS KRIKELIS, Guest Editor
Aspex Microsystems Ltd.
Uxbridge, UB8 3PH U.K.

TODD R. REED, Guest Editor
University of California, Davis
Davis, CA 95616 USA

A. M. BUSH, JSAC Board Representative
Kazumasa Enami was born in Nagoya, Aichi Prefecture, Japan, on July 13, 1948. He received the Bachelor of Science degree in electrical engineering from the Tokyo Institute of Technology, Tokyo, in 1971 and the Doctorate of Engineering degree in electrical engineering from the Tokyo Institute of Technology, Tokyo, in 1984. He began his professional career with the NHK (Nippon Hoso Kyokai: Japan Broadcasting Corporation) in 1971. He is currently Director of the Multimedia Services Research Division at the Science and Technical Research Laboratories of NHK where he is responsible for research on digital video signal processing, the design of parallel computer architecture, desktop program production technology, and multimedia services for broadcasting. Dr. Enami is a member of the Board of Directors of Foundation for Intelligent Physical Agents and of the Board of Directors of the Institute of Television Engineers of Japan (ITE). He is also a member of the Society of Motion Picture and Television Engineers (SMPTE) and the Institute of Electronics, Information and Communication Engineers of Japan (IEICE Japan).
Anargyros (Argy) Krikelis (S’83–M’83) received the Diploma of Electrical Engineering and Electronics from the University of Patras (Greece) and the M.Sc. in digital systems and Ph.D. degrees from Brunel University, Uxbridge, U.K. He is the Chief Scientist, a member of the Management Council, and a founding member of Aspex Microsystems Ltd., leading the applications division of the company. He is also a Research Fellow at the Department of Electrical Engineering and Electronics at Brunel University. His research interests are multimedia processing and applications, massively parallel computer architectures, video and image processing applications of massively parallel computation, associative architectures, programming methods and tools for parallel architectures, parallelizing compilers, and neural computation. He is the author of over 40 contributions to international conferences, journals, and books. His involvement as a technical leader includes projects in image, vision, and video processing and massively parallel processing involving system and application development for areas such as computer vision, digital television, high-energy physics, airborne/spaceborne surveillance and tracking, speech recognition, and database management. Dr. Krikelis has been Guest Editor for special issues of the Journal of Parallel and Distributed Computing, IEEE COMPUTER, and Parallel Computing. He serves on the Editorial Board of IEEE CONCURRENCY and is the Organizing Chair for the Workshop on Parallel Processing and Multimedia, part of the annual International Parallel Processing Symposium (IPPS). He is a member of the IEEE Computer Society.
Todd R. Reed (S’76-M’77-SM’90) received the B.S., M.S., and Ph.D. degrees in electrical engineering from the University of Minnesota, in 1977, 1986, and 1988, respectively. From 1977 to 1983 he was an Electrical Engineer at IBM (San Jose, CA, Rochester, MN, and Boulder, CO), and from 1984 to 1986 he was a Senior Design Engineer for Astrocom Corp., St. Paul, MN. He served as a consultant to the MIT Lincoln Laboratory from 1986 to 1988. In 1988 he was a Visiting Assistant Professor in the Department of Electrical Engineering, University of Minnesota. From 1989 to 1991 he was the head of the image sequence processing research group in the Signal Processing Laboratory, Department of Electrical Engineering, at the Swiss Federal Institute of Technology in Lausanne. He is currently an Associate Professor in the Department of Electrical and Computer Engineering, University of California, Davis. His research interests include image and image sequence processing and coding, multidimensional digital signal processing, and computer vision. Dr. Reed is a member of the European Association for Signal Processing, the Association for Computing Machinery, the Society for Industrial and Applied Mathematics, Tau Beta Pi, and Eta Kappa Nu.