IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 13, NO. 5, OCTOBER 2011
Special Section on Interactive Multimedia
In recent years, there has been much interest in interactive multimedia due to its many applications and business opportunities. An interactive multimedia application refers to the live sharing of multimedia content, such as video, audio, text, or images, among multiple users in a network. Examples include voice over IP (VoIP), videoconferencing, teleconferencing, distributed collaborative environments, online multiplayer games, and social games. As enterprise and interpersonal communications become increasingly global, such interactive multimedia applications overcome accessibility and distance barriers by bringing people together, leading to tremendous savings in travel time as well as in operational and fuel costs.
Interactive multimedia is currently one of the fastest growing sectors in the market, driven by advances in broadband and networking technologies, QoS standards, and audio/video processing techniques. An interactive session requires real-time processing of data and media streams and must support user interaction at any time. The design of an interactive multimedia system still faces many technological challenges in offering high-quality, low-delay multimedia services. Overcoming these challenges requires the joint efforts of various multimedia communities, including system integration and architecture, signal processing, communication/coding and transmission, network design and measurement, and standardization. Furthermore, the design of good user interfaces for smart interactive systems, and the incorporation of automatic perception of human activity (presence, speech, interaction), remain important areas in their infancy.
This special section brings together high-quality papers from experts in various multimedia areas. All the papers present effective solutions to interactive multimedia challenges.
They represent a diverse range of topics in this timely area, addressing issues in 3-D audio, systems and networking for videoconferencing, quality evaluation, and multimedia processing. We hope that, through this collection of papers, you will agree with us that interactive multimedia draws upon and integrates results from various multimedia communities to offer effective interactive services.
The first paper in this special section, titled “An Interactive 3-D Audio System With Loudspeakers,” addresses the problem of delivering 3-D audio over two loudspeakers in a manner that is robust to the listener’s head position and to room reverberation, an important and difficult problem in applications such as videoconferencing. To achieve this goal, the authors use a simple webcam to track the listener’s head position and orientation, together with an explicit room model that is reasonably robust to modeling errors.
The second paper, “Optimizing Multi-Rate Peer-to-Peer Video Conferencing Applications,” discusses a peer-to-peer multiparty conferencing application in which different receivers in the same group can receive video at different rates (e.g., through the use of a scalable video codec). It uses primal and primal-dual-based distributed algorithms to maximize the aggregate utility of receivers in all groups through multi-tree routing.
Digital Object Identifier 10.1109/TMM.2011.2165349
The third paper, “Enabling Composition-Based Video-Conferencing for the Home,” describes the design and architecture of a practical videoconferencing system for consumer homes. To meet quality-of-service requirements, the paper discusses design considerations that address audiovisual quality, encoding and decoding mechanisms, and interactive delay. A unique and interesting feature of the system is a control interface that can dynamically manipulate and compose streams to offer a better and more enjoyable videoconferencing experience.
The fourth paper, “Subjective Quality Evaluation via Paired Comparison: Application to Scalable Video Coding,” presents a study of the subjective quality assessment of scalable coded video and provides guidelines for adaptation strategies under a given bandwidth constraint. It shows that the priority between spatial resolution and frame rate depends on the bit-rate condition and content type. At low bit rates, spatial resolution is more important, whereas at higher bit rates, a high frame rate is preferable.
Last but not least, the fifth paper, titled “One-Pulse FEC Coding for Robust CELP-Coded Speech Transmission Over Erasure Channels,” discusses a forward error correction (FEC) technique for CELP-coded speech over erasure channels. The paper focuses on the optimization procedure used to compute a single resynchronization pulse, which reduces error propagation in the decoder over subsequent audio frames.
We would like to thank all the authors who contributed to this special section. We are also greatly indebted to the numerous reviewers, who provided valuable comments despite their busy schedules. Their thorough and timely reviews have greatly improved the quality of the papers in this special section. We would like to express our gratitude to Ms. Rebecca Wollman for her responsiveness to our questions and her efforts in putting this special section together in a timely manner.
S.-H. GARY CHAN, Guest Editor
The Hong Kong University of Science and Technology
Hong Kong
[email protected]
JIN LI, Guest Editor
Microsoft Research
Redmond, WA, USA
[email protected]
PASCAL FROSSARD, Guest Editor
Swiss Federal Institute of Technology (EPFL)
Lausanne, Switzerland
[email protected]
GERASIMOS POTAMIANOS, Guest Editor
National Center for Scientific Research (NCSR) “Demokritos”
Athens, Greece
[email protected]
1520-9210/$26.00 © 2011 IEEE
S.-H. Gary Chan (S’89–M’98–SM’03) received the M.S.E. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, in 1994 and 1999, respectively, with a minor in business administration. He received the B.S.E. degree (highest honor) in electrical engineering from Princeton University, Princeton, NJ, in 1993, with certificates in applied and computational mathematics, engineering physics, and engineering and management systems. He is currently an Associate Professor with the Department of Computer Science and Engineering and the Director of the Sino Software Research Institute, The Hong Kong University of Science and Technology (HKUST), Hong Kong. His research interests include multimedia networking, peer-to-peer technologies and streaming, and wireless communication networks. He was a Visiting Research Collaborator at Princeton University (2009), Visiting Associate Professor at Stanford University (2008–2009), Director of the Computer Engineering Program at HKUST (2006–2008), Visiting Assistant Professor in Networking at the University of California at Davis (1998–1999), and Research Intern at the NEC Research Institute, Princeton, NJ (1992–1993). He was a William and Leila Fellow at Stanford University (1993–1994). Dr. Chan is an Associate Editor of the IEEE TRANSACTIONS ON MULTIMEDIA and a Vice-Chair of the Peer-to-Peer Networking and Communications Technical Sub-Committee of the IEEE ComSoc Emerging Technologies Committee. He has been a Guest Editor of the IEEE TRANSACTIONS ON MULTIMEDIA (2011), IEEE SIGNAL PROCESSING MAGAZINE (2011), IEEE COMMUNICATIONS MAGAZINE (2007), and Springer Multimedia Tools and Applications (2007). He was the TPC chair of the IEEE Consumer Communications and Networking Conference (CCNC) in 2010, the Multimedia symposium in IEEE Globecom (2007 and 2006) and IEEE ICC (2007 and 2005), and the Workshop on Advances in Peer-to-Peer Multimedia Streaming in the ACM Multimedia Conference (2005).
He is a member of honor societies Tau Beta Pi, Sigma Xi, and Phi Beta Kappa. At Princeton, he was the 1993 recipient of the Charles Ira Young Memorial Tablet and Medal and the POEM Newport Award of Excellence.
Jin Li received the Ph.D. degree (with honors) from Tsinghua University, Beijing, China, in 1994. He is a Principal Researcher at Microsoft Research, Redmond, WA, where he manages the Multimedia Communication and Storage team. From 1994 to 1996, he served as a Research Associate at the University of Southern California (USC). From 1996 to 1999, he was a Member of the Technical Staff at the Sharp Laboratories of America (SLA), and represented the interests of SLA in the JPEG 2000 and MPEG-4 standard efforts. He joined Microsoft Research in 1999 as a Lead Researcher and one of the founding members of Microsoft Research Asia, Beijing, China. He won a Microsoft Gold Star service award in 1999 for his contribution to founding Microsoft Research Asia. He moved to Microsoft Research, Redmond, WA, in 2001. Since 2000, he has also served as an Affiliated Professor at Tsinghua University. His inventions have been integrated into many Microsoft products, such as WMA9, Live Messenger, Live Mesh, Windows 7, Lync, Windows 8, Windows 8 server, Azure, Bing, and Xbox Live. Blending theory and systems, he excels at interdisciplinary research. He has published work in top-tier journals and conferences in information theory (ISIT, ITW, T-IT), signal processing (ICASSP, ICIP, DCC, VCIP, T-SP, T-IP, T-MM, T-CSVT), communication theory (INFOCOM, ICC/GLOBECOM), network systems (ACM SIGCOMM), computer systems (USENIX ATC, ICDCS, ACM MM), and databases (VLDB, ACM SIGMOD). Dr. Li was the recipient of the Young Investigator Award from Visual Communication and Image Processing’98 in 1998, and the ICME 2009 Best Paper Award. He is or has been an Associate Editor or Guest Editor of the IEEE TRANSACTIONS ON MULTIMEDIA, IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, Journal of Visual Communication and Image Representation, P2P Networking and Applications, and Journal of Communications.
He has served on the TPCs and Organization Committee of many conferences, e.g., as the General Chair of PV2009, the lead Program Chair of ICME 2011, the Vice General Chair of ICCCN 2011, and the Workshop Co-Chair of ACM Multimedia 2011.
Pascal Frossard (S’96–M’01–SM’04) received the M.S. and Ph.D. degrees, both in electrical engineering, from the Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland, in 1997 and 2000, respectively. Between 2001 and 2003, he was a member of the research staff at the IBM T. J. Watson Research Center, Yorktown Heights, NY, where he worked on media coding and streaming technologies. Since 2003, he has been a Professor at EPFL, where he heads the Signal Processing Laboratory (LTS4). His research interests include image representation and coding, nonlinear representations, visual information analysis, joint source and channel coding, multimedia communications, and multimedia content distribution. Dr. Frossard has been the General Chair of IEEE ICME 2002 and Packet Video 2007, and a member of the organizing or technical program committees of numerous conferences. He has been an Associate Editor of the IEEE TRANSACTIONS ON MULTIMEDIA (2004–) and of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (2006–2011). He is an elected member of the IEEE Image and Multidimensional Signal Processing Technical Committee (2007–). He has served as Vice-Chair of the IEEE Multimedia Communications Technical Committee (2004–2006) and as a member of the IEEE Multimedia Signal Processing Technical Committee (2004–2007). He received the Swiss NSF Professorship Award in 2003, the IBM Faculty Award in 2005, and the IEEE TRANSACTIONS ON MULTIMEDIA Best Paper Award in 2011.
Gerasimos Potamianos (M’91) received the Diploma degree in electrical and computer engineering from the National Technical University of Athens, Athens, Greece, in 1988, and the M.S.E. and Ph.D. degrees in electrical and computer engineering from the Johns Hopkins University, Baltimore, MD, in 1990 and 1994, respectively. Following his doctorate, during 1994–1996, he was a Postdoctoral Fellow with the Center for Language and Speech Processing (CLSP) at Johns Hopkins, and from 1996 until 1999, he was a Senior Member of Technical Staff with the then Speech and Image Processing Services Laboratory at AT&T Labs-Research, in Murray Hill and Florham Park, NJ, working on audio-visual automatic speech recognition and synthesis. In 1999, he joined the Human Language Technologies Department at the IBM Thomas J. Watson Research Center, Yorktown Heights, NY, where he eventually became Manager of the Multimodal Conversational Solutions Department. In late 2008, he joined the Telecommunications and Networks Laboratory of the Institute of Computer Science at FORTH, Herakleion, Crete, Greece, as a Senior Researcher, and in April 2009, he joined the Software and Knowledge Engineering Laboratory (SKEL) of the Institute of Informatics and Telecommunications (IIT) at the National Center for Scientific Research (NCSR), “Demokritos” in Athens, Greece, as a Research Director. His research interests span the areas of multimodal speech processing with applications to human-computer interaction and ambient intelligence, with particular emphasis on audio-visual speech processing, automatic speech recognition, multimedia signal processing and fusion, as well as computer vision for human detection and tracking. He has published over 95 articles in these areas that have received over 700 citations and has seven patents granted. Dr. Potamianos is a member of the Technical Chamber of Greece. 
Highlights of his work include his participation in the Johns Hopkins CLSP Summer Workshop (WS’00) on audio-visual speech recognition, teaching at the 2001 ELSNET summer school, a tutorial at ICIP 2003, plenary talks at AVSP 2003 and VisHCI 2006, panel participation at MMSP 2006, and guest editorship of special issues of the EURASIP JASP 2002 and IEEE TASLP 2009 journals. He also received the best paper award at ICME 2005 and was co-author of the best student paper at Interspeech 2007. He is currently a member of the IEEE Speech and Language Technical Committee.