Scanning the Issue

Special Issue on Multimedia Signal Processing, Part I

Digital audio, digital video, computer graphics, and document processing have evolved as separate fields over the past several decades. With recent advances in hardware, software, and digital signal processing, it is now feasible to integrate these different data streams within a single platform, which is the fundamental driving force behind the apparent convergence of computing, telecommunications, broadcast, and entertainment technologies. This special issue reviews the state of the art in multimedia signal processing, including new research results and recent standardization activities, in two parts. In this first part, we consider a general overview of technology and the vision for the future, media integration, digital data bases and libraries, audio processing, and multimedia communications and networking. In the second part, to appear in this PROCEEDINGS in June 1998, topics on image/video processing for multimedia, architecture and implementation, multimedia standards, and applications are presented.

Perhaps we can start by first defining multimedia signal processing. The IEEE Signal Processing Society has a long history of research accomplishments in audio, speech, image, and video processing as individual data streams, whereas computer graphics and document processing have grown within the Computer Society. Multimedia signal processing in principle refers to the combined processing of multiple media streams. A simple example is the simultaneous use of audio, video, and closed-caption data for content-based search and browsing of multimedia data bases. Today, multimedia signal processing is in its infancy, and many multimedia signal-processing algorithms take the form of integrating the results of processing several single media streams. For example, state-of-the-art audio processing algorithms are distinctly different from image and video processing algorithms, and document and graphics processing algorithms are different from audio and video processing algorithms. We expect multimedia signal-processing algorithms to evolve to make use of all the information in these different media streams simultaneously. With this in mind, many researchers see an urgent need for strong interaction among these traditionally separate lines of research in signal processing. As a result, a proposal to establish a new technical committee (TC)
to promote multimedia signal processing was developed in the IEEE Signal Processing Society. The proposal was widely supported, which was a clear indication that most signal-processing researchers were interested in potential multimedia applications. Hence, the Technical Committee on Multimedia Signal Processing (MMSP) was founded during the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) in 1996, with the chartered mission to promote technical activities in emerging multimedia research and development.

Since its founding, the multidisciplinary character of the MMSP TC has offered a forum for interaction among researchers and engineers in a variety of technical areas, especially for conducting research that falls between the domains of traditional areas, e.g., technologies that process multiple signal sources involving speech, music, image, and video. Furthermore, the MMSP TC collaborates with other IEEE societies with similar interests to foster research in multimedia communication, multimedia computing, and multimedia circuits and systems. Clearly, it is at the fringes of existing technical areas, and at the intersection of traditional technologies, that new ideas and research opportunities shine.

This field is developing rapidly, and the MMSP TC strives to integrate this emerging technology into many of its activities. One event the MMSP TC organizes is the IEEE Workshop on Multimedia Signal Processing. The first MMSP workshop was held in June 1997 and featured both technical presentations and a demonstration session in which the latest research results were presented in a real multimedia setting. The workshop proceedings included a CD-ROM containing all the papers, as well as animated demonstrations in hypertext mark-up language format, which enables hyperlinking of audio, images, and video for a truly multimedia presentation.

As a continuation of our TC activities, this special issue on multimedia signal processing features overview articles on state-of-the-art research topics and standardization efforts. This issue begins with the "Scanning the Technology" paper, which provides a thorough and in-depth overview of the existing technology. We then cover multimedia technology in several research fields. The first is media integration, without which multimedia signals could not exist. The storage and retrieval of digital multimedia are also of the essence for applications related to data bases and libraries.
Then comes the issue of how to transmit and receive multimedia data effectively and efficiently, which relates to the communication infrastructure, such as the Internet or broad-band networks, used for multimedia communications. In what follows, we briefly summarize the papers included in this first part of the special issue.

I. GENERAL OVERVIEW OF TECHNOLOGY AND VISION FOR FUTURE

"On the Applications of Multimedia Processing to Communications" by Cox et al. provides a comprehensive overview of past, present, and future multimedia processing technologies. It offers insights into many problems in signal processing and communication. With well-recognized expertise in signal processing, the authors contribute a paper that will have long-lasting value in the field of multimedia signal processing.

II. MEDIA INTEGRATION

In "Toward the Creation of a New Medium for the Multimedia Era," Nakatsu presents the concept of "hypercommunication," which involves a cyberspace that accommodates various people from different places and time zones, as well as human-like agent communications. The author provides a visionary description of how such technology will provide the telecommunications services of the future.

"Audio-Visual Integration in Multimodal Communication" by Chen and Rao provides a review of recent research that examines audio-visual integration. Such interaction is very important in multimodal communication, such as in video telephony and video conferencing. With a thorough coverage of research results in the fields of lip reading, facial animation, and lip synchronization, the authors show that joint processing of audio and video provides advantages that are not available when the audio and video are processed independently.

"Toward Multimodal Human–Computer Interface" by Sharma et al. discusses recent advances in signal-processing technologies that have made novel human–computer interaction modalities possible. Of particular importance is the observation that well-known difficulties in the automated analysis of single-modality signals, e.g., image analysis or speech recognition, may be overcome by a multimodal human–computer interface. This paper addresses questions such as which modalities to integrate and how to integrate them.

In "Face to Virtual Face," Thalmann et al. overview technologies related to face analysis and synthesis, including three-dimensional shape reconstruction, animation techniques, facial-motion tracking, facial-expression recognition, and audio-visual synchronization for face-to-virtual-face multimedia communication in a virtual world. This paper relates these technologies to the recent MPEG-4 synthetic–natural hybrid coding standardization efforts.

III. DIGITAL DATA BASE AND LIBRARIES

One of the key problems in video indexing for content-based retrieval is to develop efficient and effective representations of visual content.
In this category, there are two papers. The first, "Next-Generation Content Representation, Creation, and Searching for New-Media Applications in Education" by Chang et al., discusses state-of-the-art techniques for content representation, searching, creation, and editing. These techniques bring the tasks of content creation close to actual users, enabling them to be active producers of audio-visual information. A number of systems developed by the authors are introduced in detail. The authors also present a very interesting case study of new-media applications based on multimedia education experiments in K–12 schools.

In the second paper of this category, Irani and Anandan propose scene-based mosaic representations of video content in "Video Indexing Based on Mosaic Representations." This paper provides a brief overview of mosaic generation methods. It also discusses visual summaries and object-based dynamic indexing using mosaic representations.

IV. AUDIO PROCESSING

For audio processing, we present two papers. The first, "Structured Audio: Creation, Transmission, and Rendering of Parametric Sound Representations" by Vercoe et al., introduces the concept of "structured audio" to provide semantic and symbolic descriptions for representing audio content. This concept is useful for low-bit-rate audio coding, flexible synthesis, and the manipulation and retrieval of sound. The authors present a thorough overview of the related techniques and a complete summary of potential applications.

In the second paper on this topic, "Fundamental and Technological Limitations of Immersive Audio Systems," Kyriakakis presents a historical overview of the development of immersive audio technologies. The author discusses the performance of existing immersive audio systems and future research directions. Several fundamental and technological limitations that impede the development of seamless immersive audio are summarized. A desktop audio system that uses integrated listener tracking to circumvent such technological limitations is also introduced.

V. MULTIMEDIA COMMUNICATIONS AND NETWORKING

We have two papers focused on communication and networking issues for multimedia applications. "VBR Video: Tradeoffs and Potentials" by Lakshman et al. examines the transport and storage issues for variable-bit-rate video. It not only clarifies the terminology for both networking and video researchers but also explains in detail the tradeoffs among various variable-bit-rate video-transmission techniques. The authors illustrate that increased interaction between video and network design can improve overall video quality without changing the network capacity.

"Error Control and Concealment for Video Communication—A Review" by Wang and Zhu reviews error control and concealment techniques and classifies them into three categories: forward error concealment, postprocessing, and interactive error concealment. In addition to covering
current research activities in the field, the authors also address in detail the practice in international standards.

Part II of this special issue will address topics on image/video processing for multimedia, architecture and implementation, multimedia standards, and applications.

ACKNOWLEDGMENT

The Guest Editors wish to thank the following reviewers for their valuable input and comments on this special issue: R. Ansari, M. R. Banham, D. Begault, S. S. Bhattacharyya, F. Bossen, F. Catthoor, S.-F. Chang, M. R. Civanlar, A. G. Constantinides, P. R. Cook, A. Eleftheriadis, E. Eren, E. Frantzeskakis, W. Gardner, B. Girod, L. Guan, F. Hartung, S. Hemami, H. Iwata, P. Kalra, Z. S. Karim, G. Karlsson, S. Katagiri, A. Katkere, F. Kishino, K. Konstantinides, S. Y. Kung, C.-C. J. Kuo, D. Lee, E.
A. Lee, J. Mao, U. Neumann, T. Pappas, A. Pentland, E. Petajan, F. Pereira, P. Pirsch, N. Shanbhag, R. Sharma, J. O. Smith III, M.-T. Sun, Z. Sun, A. Tabatabai, J. Taylor, A. M. Tekalp, L. Torres, V. Vaishampayan, P. van Beek, P. Vora, Y. Wang, A. Wu, Z. Xiong, and M. Yeung.

TSUHAN CHEN, Guest Editor
Carnegie-Mellon University
Pittsburgh, PA 15213-3890 USA

K. J. RAY LIU, Guest Editor
University of Maryland
College Park, MD 20742 USA

A. MURAT TEKALP, Guest Editor
University of Rochester
Rochester, NY 14627-0126 USA
Tsuhan Chen (Member, IEEE) received the B.S. degree in electrical engineering from the National Taiwan University, Taiwan, R.O.C., in 1987 and the M.S. and Ph.D. degrees in electrical engineering from the California Institute of Technology, Pasadena, in 1990 and 1993, respectively. From January to July 1993, he was a part-time Member of the Technical Staff with the Jet Propulsion Laboratory, Pasadena. From 1993 to 1997, he was with the Visual Communications Research Department, AT&T Bell Laboratories, Holmdel, NJ, later AT&T Labs-Research, Red Bank, NJ, as a Senior Technical Staff Member and then a Principal Technical Staff Member. Since October 1997, he has been with the Department of Electrical and Computer Engineering, Carnegie-Mellon University, Pittsburgh, PA, as an Associate Professor. His research interests include multimedia processing and communication, image/video coding, audio-visual interaction, and multirate signal processing. He has published a number of technical papers and has received two U.S. patents. Dr. Chen is a member of the IEEE Circuits and Systems Society’s Technical Committees on Multimedia Systems and Applications and on Visual Signal Processing and Communication. He is Cofounder and Chairman of the Multimedia Signal Processing Technical Committee of the IEEE Signal Processing Society. He is Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY and IEEE TRANSACTIONS ON IMAGE PROCESSING. He received the Charles Wilts Prize for outstanding independent research at the California Institute of Technology.
K. J. Ray Liu (Senior Member, IEEE) received the B.S. degree from the National Taiwan University in 1983 and the Ph.D. degree from the University of California, Los Angeles, in 1990, both in electrical engineering. Since 1990, he has been with the Electrical Engineering Department and Institute for Systems Research of the University of Maryland at College Park, where he is an Associate Professor. During his sabbatical leave in 1996–1997, he was a Visiting Associate Professor at Stanford University, Stanford, CA. His research interests span various aspects of signal/image processing and communications. He has published more than 130 papers, of which more than 50 are in archival journals and book chapters. His research Web page is http://dspserv.eng.umd.edu. He is an Editor of the Journal of VLSI Signal Processing. He is the Series Editor of a Marcel Dekker series on signal processing and the Coeditor of two books published by IEEE Press. Dr. Liu has received numerous awards, including the 1994 National Science Foundation Young Investigator Award and the IEEE Signal Processing Society's 1993 Senior Award (Best Paper Award). From the University of Maryland, he has received the 1994 George Corcoran Award for outstanding contributions to electrical engineering education and the 1995–1996 Outstanding Systems Engineering Faculty Award in recognition of outstanding contributions in interdisciplinary research. He was an Associate Editor of IEEE TRANSACTIONS ON SIGNAL PROCESSING. He was a Guest Editor of a special issue of the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS on Signal Processing for Wireless Communications and a Founding Member of the Multimedia Signal Processing Technical Committee of the IEEE Signal Processing Society.
A. Murat Tekalp (Senior Member, IEEE) received the M.S. and Ph.D. degrees in electrical, computer, and systems engineering from Rensselaer Polytechnic Institute, Troy, NY, in 1982 and 1984, respectively. From 1984 to 1987, he was a Research Scientist and then a Senior Research Scientist with Eastman Kodak Company, Rochester, NY. He joined the Electrical Engineering Department of the University of Rochester, Rochester, NY, as an Assistant Professor in September 1987, where he currently is a Professor. His current research interests are in the area of digital image and video processing, including image restoration, motion estimation and tracking, object-based coding, digital libraries, and magnetic resonance imaging. He is on the editorial boards of Graphical Models and Image Processing, Visual Communications and Image Representation, and Image Communication. He is an Associate Editor for Multidimensional Systems and Signal Processing. He is the Author of Digital Video Processing (Englewood Cliffs, NJ: Prentice-Hall, 1995). Dr. Tekalp is a member of Sigma Xi. He received the National Science Foundation Research Initiation Award in 1988. He was an Associate Editor for IEEE TRANSACTIONS ON IMAGE PROCESSING (1994–1997) and IEEE TRANSACTIONS ON SIGNAL PROCESSING (1990–1992). He is the past Chair of the IEEE Signal Processing Society Technical Committee on Image and Multidimensional Signal Processing. He was the Technical Program Chair for the 1991 IEEE Signal Processing Society Workshop on Image and Multidimensional Signal Processing and Special Sessions Chair for the 1995 IEEE International Conference on Image Processing. He was the Organizer and First Chair of the Rochester Chapter of the IEEE Signal Processing Society. He also was Chair of the IEEE Rochester Section in 1994–1995.